All of lore.kernel.org
 help / color / mirror / Atom feed
* Kernel related (?) user space crash at ARM11 MPCore
       [not found]     ` <20090817140422.GA10764@n2100.arm.linux.org.uk>
@ 2009-08-29 12:27       ` Catalin Marinas
  2009-08-31  8:30         ` Catalin Marinas
  2009-09-03 11:58         ` Dirk Behme
       [not found]       ` <1250529916.11185.80.camel@pc1117.cambridge.arm.com>
  1 sibling, 2 replies; 72+ messages in thread
From: Catalin Marinas @ 2009-08-29 12:27 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 2009-08-17 at 15:04 +0100, Russell King - ARM Linux wrote:
> On Mon, Aug 17, 2009 at 10:28:31AM +0100, Catalin Marinas wrote:
> > On Thu, 2009-08-13 at 18:20 +0100, Catalin Marinas wrote:
> > > Since I can't statically link the above code (ld complaining about some
> > > relocation), it means that the dynamic linker needs to do some
> > > relocations at run-time. Would it need to flush the cache for those
> > > relocations? I don't see any calls to the ARM-specific cache flushing
> > > syscall and the difference on ARM11MPCore from other CPUs is that the
> > > caches are always write-allocate. This may explain why adding a full
> > > cache flush apparently solves the problem, but it's not a solution.
> > 
> > At a first look, it's only data which is relocated rather than code, so
> > cache flushing should be required. More investigation into the dynamic
> > linker is needed here.
> > 
> > What I noticed when running through strace is that the dynamic loader
> > executes a few mprotect() calls on the application code mapped at
> > 0x2a000000. The first one sets permissions to PROT_READ|PROT_WRITE,
> > which implies that it may need to do some modifications. This is
> > followed by setting the PROT_READ|PROT_EXEC back.
> 
> This is probably for one of the GOT such like tables.  I seem to
> remember that function calls to libraries are implemented as something
> like:
> 
> 	ldr	pc, . + 4
> 	.word	0
> 
> and the dynamic linker fixes up the ".word 0" to be the actual address.
> This means that the dynamic linker requires RW access to this table,
> but then has to set it back to RX access so that the instructions can
> be executed.

It looks like this is causing the problem. Setting the protection to RW
and writing data (not instructions) causes the text page to be COW'ed
(page mapped with MAP_PRIVATE). Some cache flushing is missing on VIPT
caches during page copying for COW. With ARM11MPCore, the D-cache is
write-allocate so it never makes it to the main memory for the I-cache
to pick.

I'll look again next week on where to best add the flushing (or just
modify the dynamic linker to avoid COW on text pages). Any suggestions?

-- 
Catalin

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-08-29 12:27       ` Kernel related (?) user space crash at ARM11 MPCore Catalin Marinas
@ 2009-08-31  8:30         ` Catalin Marinas
  2009-09-07 15:29           ` Catalin Marinas
  2009-09-03 11:58         ` Dirk Behme
  1 sibling, 1 reply; 72+ messages in thread
From: Catalin Marinas @ 2009-08-31  8:30 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, 2009-08-29 at 13:27 +0100, Catalin Marinas wrote:
> On Mon, 2009-08-17 at 15:04 +0100, Russell King - ARM Linux wrote:
> > On Mon, Aug 17, 2009 at 10:28:31AM +0100, Catalin Marinas wrote:
> > > On Thu, 2009-08-13 at 18:20 +0100, Catalin Marinas wrote:
> > > > Since I can't statically link the above code (ld complaining about some
> > > > relocation), it means that the dynamic linker needs to do some
> > > > relocations at run-time. Would it need to flush the cache for those
> > > > relocations? I don't see any calls to the ARM-specific cache flushing
> > > > syscall and the difference on ARM11MPCore from other CPUs is that the
> > > > caches are always write-allocate. This may explain why adding a full
> > > > cache flush apparently solves the problem, but it's not a solution.
> > > 
> > > At a first look, it's only data which is relocated rather than code, so
> > > cache flushing should be required. More investigation into the dynamic
> > > linker is needed here.
> > > 
> > > What I noticed when running through strace is that the dynamic loader
> > > executes a few mprotect() calls on the application code mapped at
> > > 0x2a000000. The first one sets permissions to PROT_READ|PROT_WRITE,
> > > which implies that it may need to do some modifications. This is
> > > followed by setting the PROT_READ|PROT_EXEC back.
> > 
> > This is probably for one of the GOT such like tables.  I seem to
> > remember that function calls to libraries are implemented as something
> > like:
> > 
> > 	ldr	pc, . + 4
> > 	.word	0
> > 
> > and the dynamic linker fixes up the ".word 0" to be the actual address.
> > This means that the dynamic linker requires RW access to this table,
> > but then has to set it back to RX access so that the instructions can
> > be executed.
> 
> It looks like this is causing the problem. Setting the protection to RW
> and writing data (not instructions) causes the text page to be COW'ed
> (page mapped with MAP_PRIVATE). Some cache flushing is missing on VIPT
> caches during page copying for COW. With ARM11MPCore, the D-cache is
> write-allocate so it never makes it to the main memory for the I-cache
> to pick.
> 
> I'll look again next week on where to best add the flushing (or just
> modify the dynamic linker to avoid COW on text pages). Any suggestions?

After talking to the toolchain people, it seems that the dynamic linker
is just doing whatever the ELF file says regarding the relocations. The
problem in this case is that when compiling with -pie, one of the crt*.o
files (and _start) used in PIE applications is not position-independent.

I think this was fixed (but not released yet) by CodeSourcery but you
can get this behaviour if some files of an executable were not compiled
with -fpic. So the mprotect cache flushing patch that I posted looks
like a valid workaround.

For mappings with RWX protection, however, the copy_user_highpage
function may need to do some flushing as well.

-- 
Catalin

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-08-29 12:27       ` Kernel related (?) user space crash at ARM11 MPCore Catalin Marinas
  2009-08-31  8:30         ` Catalin Marinas
@ 2009-09-03 11:58         ` Dirk Behme
  1 sibling, 0 replies; 72+ messages in thread
From: Dirk Behme @ 2009-09-03 11:58 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Russell,

Catalin Marinas wrote:
> On Mon, 2009-08-17 at 15:04 +0100, Russell King - ARM Linux wrote:
>> On Mon, Aug 17, 2009 at 10:28:31AM +0100, Catalin Marinas wrote:
>>> On Thu, 2009-08-13 at 18:20 +0100, Catalin Marinas wrote:
>>>> Since I can't statically link the above code (ld complaining about some
>>>> relocation), it means that the dynamic linker needs to do some
>>>> relocations at run-time. Would it need to flush the cache for those
>>>> relocations? I don't see any calls to the ARM-specific cache flushing
>>>> syscall and the difference on ARM11MPCore from other CPUs is that the
>>>> caches are always write-allocate. This may explain why adding a full
>>>> cache flush apparently solves the problem, but it's not a solution.
>>> At a first look, it's only data which is relocated rather than code, so
>>> cache flushing should be required. More investigation into the dynamic
>>> linker is needed here.
>>>
>>> What I noticed when running through strace is that the dynamic loader
>>> executes a few mprotect() calls on the application code mapped at
>>> 0x2a000000. The first one sets permissions to PROT_READ|PROT_WRITE,
>>> which implies that it may need to do some modifications. This is
>>> followed by setting the PROT_READ|PROT_EXEC back.
>> This is probably for one of the GOT such like tables.  I seem to
>> remember that function calls to libraries are implemented as something
>> like:
>>
>> 	ldr	pc, . + 4
>> 	.word	0
>>
>> and the dynamic linker fixes up the ".word 0" to be the actual address.
>> This means that the dynamic linker requires RW access to this table,
>> but then has to set it back to RX access so that the instructions can
>> be executed.
> 
> It looks like this is causing the problem. Setting the protection to RW
> and writing data (not instructions) causes the text page to be COW'ed
> (page mapped with MAP_PRIVATE). Some cache flushing is missing on VIPT
> caches during page copying for COW. With ARM11MPCore, the D-cache is
> write-allocate so it never makes it to the main memory for the I-cache
> to pick.
> 
> I'll look again next week on where to best add the flushing (or just
> modify the dynamic linker to avoid COW on text pages). Any suggestions?

Do you have any suggestions that might help Catalin with this?

Many thanks

Dirk

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-08-31  8:30         ` Catalin Marinas
@ 2009-09-07 15:29           ` Catalin Marinas
  2009-09-07 15:56             ` Dirk Behme
  2009-09-07 17:31             ` Mikael Pettersson
  0 siblings, 2 replies; 72+ messages in thread
From: Catalin Marinas @ 2009-09-07 15:29 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Russell,

On Mon, 2009-08-31 at 09:30 +0100, Catalin Marinas wrote: 
> On Sat, 2009-08-29 at 13:27 +0100, Catalin Marinas wrote:
> > It looks like this is causing the problem. Setting the protection to RW
> > and writing data (not instructions) causes the text page to be COW'ed
> > (page mapped with MAP_PRIVATE). Some cache flushing is missing on VIPT
> > caches during page copying for COW. With ARM11MPCore, the D-cache is
> > write-allocate so it never makes it to the main memory for the I-cache
> > to pick.
> > 
> > I'll look again next week on where to best add the flushing (or just
> > modify the dynamic linker to avoid COW on text pages). Any suggestions?
> 
> After talking to the toolchain people, it seems that the dynamic linker
> is just doing whatever the ELF file says regarding the relocations. The
> problem in this case is that when compiling with -pie, one of the crt*.o
> files (and _start) used in PIE applications is not position-independent.
> 
> I think this was fixed (but not released yet) by CodeSourcery but you
> can get this behaviour if some files of an executable were not compiled
> with -fpic. So the mprotect cache flushing patch that I posted looks
> like a valid workaround.

There is a glibc patch to allow fully position-independent code and
avoid CoW for text pages:

http://sourceware.org/ml/libc-ports/2008-10/msg00009.html

We tested it and it seems to solve the problem without requiring a
kernel patch.

Any opinion on whether the ARM kernel should support dynamic shared
objects where not all objects are position-independent? IOW, whether
text relocations are allowed to be resolved at run-time rather than
compile (static link) time for the dynamic shared objects? AFAICT, there
isn't anything in the ARM EABI which would prevent this, so a kernel
patch may be needed.

There is a similar situation for RWX pages and CoW. In this case, the
copy_user_highpage() function should probably be modified to clean the
D-cache and invalidate the I-cache.

Thanks.

-- 
Catalin

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-07 15:29           ` Catalin Marinas
@ 2009-09-07 15:56             ` Dirk Behme
  2009-09-07 16:43               ` Catalin Marinas
  2009-09-07 17:31             ` Mikael Pettersson
  1 sibling, 1 reply; 72+ messages in thread
From: Dirk Behme @ 2009-09-07 15:56 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Catalin,

Catalin Marinas wrote:
> Hi Russell,
> 
> On Mon, 2009-08-31 at 09:30 +0100, Catalin Marinas wrote: 
>> On Sat, 2009-08-29 at 13:27 +0100, Catalin Marinas wrote:
>>> It looks like this is causing the problem. Setting the protection to RW
>>> and writing data (not instructions) causes the text page to be COW'ed
>>> (page mapped with MAP_PRIVATE). Some cache flushing is missing on VIPT
>>> caches during page copying for COW. With ARM11MPCore, the D-cache is
>>> write-allocate so it never makes it to the main memory for the I-cache
>>> to pick.
>>>
>>> I'll look again next week on where to best add the flushing (or just
>>> modify the dynamic linker to avoid COW on text pages). Any suggestions?
>> After talking to the toolchain people, it seems that the dynamic linker
>> is just doing whatever the ELF file says regarding the relocations. The
>> problem in this case is that when compiling with -pie, one of the crt*.o
>> files (and _start) used in PIE applications is not position-independent.
>>
>> I think this was fixed (but not released yet) by CodeSourcery but you
>> can get this behaviour if some files of an executable were not compiled
>> with -fpic. So the mprotect cache flushing patch that I posted looks
>> like a valid workaround.
> 
> There is a glibc patch to allow fully position-independent code and
> avoid CoW for text pages:
> 
> http://sourceware.org/ml/libc-ports/2008-10/msg00009.html
> 
> We tested it and it seems to solve the problem without requiring a
> kernel patch.

Regarding "fix the tool chain/libc and not the kernel":

The initial issue we are discussing here was found with a user space 
application downloaded as binary. Namely Ubuntu-ARM. While we could 
try to fix some tool chains, we never will be able to fix all tool 
chains out there. There will be ever some binaries compiled with a 
non-patched tool chain, resulting in ARM11 MPCore failures.

Additionally, this Ubuntu ARM port runs fine on single core ARM11 
(e.g. N800) or Cortex A8 (e.g OMAP3 based BeagleBoard). It's my 
understanding that the issue we are discussing here is ARM11 MPCore 
kernel (cache handling?) specific, then. So it should be fixed in 
kernel, too, to be able to run the same binaries the same way on ARM11 
MPCore like on ARM11 single core or Cortex A8.

> Any opinion on whether the ARM kernel should support dynamic shared
> objects where not all objects are position-independent? IOW, whether
> text relocations are allowed to be resolved at run-time rather than
> compile (static link) time for the dynamic shared objects? AFAICT, there
> isn't anything in the ARM EABI which would prevent this, so a kernel
> patch may be needed.

 From what mentioned above, I think a kernel patch is needed.

> There is a similar situation for RWX pages and CoW. In this case, the
> copy_user_highpage() function should probably be modified to clean the
> D-cache and invalidate the I-cache.

Many thanks for your help and best regards

Dirk

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-07 15:56             ` Dirk Behme
@ 2009-09-07 16:43               ` Catalin Marinas
  0 siblings, 0 replies; 72+ messages in thread
From: Catalin Marinas @ 2009-09-07 16:43 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 2009-09-07 at 17:56 +0200, Dirk Behme wrote:
> Catalin Marinas wrote:
> > There is a glibc patch to allow fully position-independent code and
> > avoid CoW for text pages:
> > 
> > http://sourceware.org/ml/libc-ports/2008-10/msg00009.html
> > 
> > We tested it and it seems to solve the problem without requiring a
> > kernel patch.
> 
> Regarding "fix the tool chain/libc and not the kernel":
> 
> The initial issue we are discussing here was found with a user space 
> application downloaded as binary. Namely Ubuntu-ARM. While we could 
> try to fix some tool chains, we never will be able to fix all tool 
> chains out there. There will be ever some binaries compiled with a 
> non-patched tool chain, resulting in ARM11 MPCore failures.

Please don't get me wrong, I'm in favour of a kernel fix even though
future toolchains may avoid this problem. I'm just trying clarify the
issue and get another opinion from Russell or others on this list.

> Additionally, this Ubuntu ARM port runs fine on single core ARM11 
> (e.g. N800) or Cortex A8 (e.g OMAP3 based BeagleBoard). It's my 
> understanding that the issue we are discussing here is ARM11 MPCore 
> kernel (cache handling?) specific, then. 

It's a problem with write-allocate caches, it isn't related to SMP. When
CONFIG_SMP is enabled, the kernel sets the cache policy to writealloc.

-- 
Catalin

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-07 15:29           ` Catalin Marinas
  2009-09-07 15:56             ` Dirk Behme
@ 2009-09-07 17:31             ` Mikael Pettersson
  2009-09-07 21:40               ` Catalin Marinas
  1 sibling, 1 reply; 72+ messages in thread
From: Mikael Pettersson @ 2009-09-07 17:31 UTC (permalink / raw)
  To: linux-arm-kernel

Catalin Marinas writes:
 > Any opinion on whether the ARM kernel should support dynamic shared
 > objects where not all objects are position-independent? IOW, whether
 > text relocations are allowed to be resolved at run-time rather than
 > compile (static link) time for the dynamic shared objects? AFAICT, there
 > isn't anything in the ARM EABI which would prevent this, so a kernel
 > patch may be needed.

I didn't follow the start of this thread, but what exactly
do you mean by "kernel support" for runtime relocations in
shared objects? As far as I can tell, all that would happen
is that some non-pic .so is mapped somewhere, the user-space
linker runs and fixes relocations (directly or lazily), writes
to COW pages, causing the kernel to copy them to new page frames.

In my book that's just "normal" actions for mmaps and pagefaults.

Is the problem that the kernel has buggy cache maintenance when
COW-ing text pages on this arch?

/Mikael

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-07 17:31             ` Mikael Pettersson
@ 2009-09-07 21:40               ` Catalin Marinas
  0 siblings, 0 replies; 72+ messages in thread
From: Catalin Marinas @ 2009-09-07 21:40 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 2009-09-07 at 19:31 +0200, Mikael Pettersson wrote:
> Catalin Marinas writes:
>  > Any opinion on whether the ARM kernel should support dynamic shared
>  > objects where not all objects are position-independent? IOW, whether
>  > text relocations are allowed to be resolved at run-time rather than
>  > compile (static link) time for the dynamic shared objects? AFAICT, there
>  > isn't anything in the ARM EABI which would prevent this, so a kernel
>  > patch may be needed.
> 
> I didn't follow the start of this thread, but what exactly
> do you mean by "kernel support" for runtime relocations in
> shared objects? As far as I can tell, all that would happen
> is that some non-pic .so is mapped somewhere, the user-space
> linker runs and fixes relocations (directly or lazily), writes
> to COW pages, causing the kernel to copy them to new page frames.
> 
> In my book that's just "normal" actions for mmaps and pagefaults.
> 
> Is the problem that the kernel has buggy cache maintenance when
> COW-ing text pages on this arch?

Yes (IMHO), pretty much. Two situations here - (a) COW-ing RW pages with
mprotect(RX) afterwards and (b) COW-wing RWX pages. The latter isn't
usually found in current filesystems, so we can ignore it of now. As for
the former, we don't want to clean the cache at every COW of RW pages
(on VIPT/PIPT hardware), so the best place is during mprotect(RX).

Alternatively (*not* my preferred solution), we tell the toolchain
people that such thing as COW text pages isn't supported, so I suspect
other architectures can cope fine with this.

-- 
Catalin

^ permalink raw reply	[flat|nested] 72+ messages in thread

* smsc911x.c driver and SMP (was Re: Kernel related (?) user space crash at ARM11 MPCore)
       [not found]               ` <87k518yc8a.fsf@brigitte.kvy.fi>
@ 2009-09-11  9:21                 ` Catalin Marinas
  2009-09-11 12:55                   ` Bill Gatliff
  0 siblings, 1 reply; 72+ messages in thread
From: Catalin Marinas @ 2009-09-11  9:21 UTC (permalink / raw)
  To: linux-arm-kernel

(sorry for the repost Antti, the message was sent to the old list and
didn't like the "was:" stuff in the subject and the reply headers)

On Thu, 2009-08-13 at 00:05 +0300, Antti P Miettinen wrote:
> BTW - the smsc911x does not work for me in the
> recent kernels. I had to change back to smc911x. Otherwise my boot
> from NFS root hangs pretty early. I think it used to be the other way
> around in some kernel version.

I added the commit below to my tree and the new driver now seems to work
fine with ARM11MPCore systems:

http://www.linux-arm.org/git?p=linux-2.6.git;a=commitdiff;h=246a6cc1f1fa52e96a08f327337c5616eb634825

I'm waiting for the SMSC guys to look into this issue. It may just be
that some of the read-after-write or read-after-read timing constraints
are violated when more than one CPU tries to access the Ethernet chip.

-- 
Catalin

^ permalink raw reply	[flat|nested] 72+ messages in thread

* smsc911x.c driver and SMP (was Re: Kernel related (?) user space crash at ARM11 MPCore)
  2009-09-11  9:21                 ` smsc911x.c driver and SMP (was Re: Kernel related (?) user space crash at ARM11 MPCore) Catalin Marinas
@ 2009-09-11 12:55                   ` Bill Gatliff
  2009-09-11 13:00                     ` Catalin Marinas
  0 siblings, 1 reply; 72+ messages in thread
From: Bill Gatliff @ 2009-09-11 12:55 UTC (permalink / raw)
  To: linux-arm-kernel

Catalin Marinas wrote:
> (sorry for the repost Antti, the message was sent to the old list and
> didn't like the "was:" stuff in the subject and the reply headers)
>
> On Thu, 2009-08-13 at 00:05 +0300, Antti P Miettinen wrote:
>   
>> BTW - the smsc911x does not work for me in the
>> recent kernels. I had to change back to smc911x. Otherwise my boot
>> from NFS root hangs pretty early. I think it used to be the other way
>> around in some kernel version.
>>     
>
> I added the commit below to my tree and the new driver now seems to work
> fine with ARM11MPCore systems:
>
> http://www.linux-arm.org/git?p=linux-2.6.git;a=commitdiff;h=246a6cc1f1fa52e96a08f327337c5616eb634825
>
> I'm waiting for the SMSC guys to look into this issue. It may just be
> that some of the read-after-write or read-after-read timing constraints
> are violated when more than one CPU tries to access the Ethernet chip.
>
>   

Why take the lock in smsc911x_tx_writefifo?  It looks like you take the 
same lock again in smsc911x_reg_write.


b.g.

-- 
Bill Gatliff
bgat at billgatliff.com

^ permalink raw reply	[flat|nested] 72+ messages in thread

* smsc911x.c driver and SMP (was Re: Kernel related (?) user space crash at ARM11 MPCore)
  2009-09-11 12:55                   ` Bill Gatliff
@ 2009-09-11 13:00                     ` Catalin Marinas
  2009-09-11 15:20                       ` Bill Gatliff
  0 siblings, 1 reply; 72+ messages in thread
From: Catalin Marinas @ 2009-09-11 13:00 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 2009-09-11 at 07:55 -0500, Bill Gatliff wrote:
> Catalin Marinas wrote:
> > (sorry for the repost Antti, the message was sent to the old list and
> > didn't like the "was:" stuff in the subject and the reply headers)
> >
> > On Thu, 2009-08-13 at 00:05 +0300, Antti P Miettinen wrote:
> >   
> >> BTW - the smsc911x does not work for me in the
> >> recent kernels. I had to change back to smc911x. Otherwise my boot
> >> from NFS root hangs pretty early. I think it used to be the other way
> >> around in some kernel version.
> >>     
> >
> > I added the commit below to my tree and the new driver now seems to work
> > fine with ARM11MPCore systems:
> >
> > http://www.linux-arm.org/git?p=linux-2.6.git;a=commitdiff;h=246a6cc1f1fa52e96a08f327337c5616eb634825
> >
> > I'm waiting for the SMSC guys to look into this issue. It may just be
> > that some of the read-after-write or read-after-read timing constraints
> > are violated when more than one CPU tries to access the Ethernet chip.
> 
> Why take the lock in smsc911x_tx_writefifo?  It looks like you take the 
> same lock again in smsc911x_reg_write.

Ah, ok, I haven't tried the 16-bit mode, only the 32-bit one and was
working fine, no deadlocks. But I can't say whether that's the right
fix, it's better for SMSC to look into it.

-- 
Catalin

^ permalink raw reply	[flat|nested] 72+ messages in thread

* smsc911x.c driver and SMP (was Re: Kernel related (?) user space crash at ARM11 MPCore)
  2009-09-11 13:00                     ` Catalin Marinas
@ 2009-09-11 15:20                       ` Bill Gatliff
  2009-09-11 16:06                         ` Catalin Marinas
  0 siblings, 1 reply; 72+ messages in thread
From: Bill Gatliff @ 2009-09-11 15:20 UTC (permalink / raw)
  To: linux-arm-kernel

Catalin Marinas wrote:
> On Fri, 2009-09-11 at 07:55 -0500, Bill Gatliff wrote:
>   
>>
>> Why take the lock in smsc911x_tx_writefifo?  It looks like you take the 
>> same lock again in smsc911x_reg_write.
>>     
>
> Ah, ok, I haven't tried the 16-bit mode, only the 32-bit one and was
> working fine, no deadlocks. But I can't say whether that's the right
> fix, it's better for SMSC to look into it.
>
>   

I certainly won't object to SMSC taking a look at it.  :)  But your 
changes pass the common-sense test AFAICT, apart from my question above.




b.g.

-- 
Bill Gatliff
bgat at billgatliff.com

^ permalink raw reply	[flat|nested] 72+ messages in thread

* smsc911x.c driver and SMP (was Re: Kernel related (?) user space crash at ARM11 MPCore)
  2009-09-11 15:20                       ` Bill Gatliff
@ 2009-09-11 16:06                         ` Catalin Marinas
  2009-10-06  6:12                           ` smsc911x.c driver and SMP Antti P Miettinen
  0 siblings, 1 reply; 72+ messages in thread
From: Catalin Marinas @ 2009-09-11 16:06 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 2009-09-11 at 10:20 -0500, Bill Gatliff wrote:
> Catalin Marinas wrote:
> > On Fri, 2009-09-11 at 07:55 -0500, Bill Gatliff wrote:
> >> Why take the lock in smsc911x_tx_writefifo?  It looks like you take the 
> >> same lock again in smsc911x_reg_write.
> >
> > Ah, ok, I haven't tried the 16-bit mode, only the 32-bit one and was
> > working fine, no deadlocks. But I can't say whether that's the right
> > fix, it's better for SMSC to look into it.
> 
> I certainly won't object to SMSC taking a look at it.  :)  But your 
> changes pass the common-sense test AFAICT, apart from my question above.

Fixed it with this new commit (I'm not posting the patch here as it's
not meant for review on this list):

http://www.linux-arm.org/git?p=linux-2.6.git;a=commitdiff;h=3cc8ee538f6727d1f6773fff6d8e31bc5522bfee

Thanks for spotting the problem.

-- 
Catalin

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
       [not found]           ` <1253435940.498.15.camel@pc1117.cambridge.arm.com>
@ 2009-09-20  9:31             ` Russell King - ARM Linux
  2009-09-20 19:02               ` Russell King - ARM Linux
  2009-09-20 22:02               ` Catalin Marinas
  0 siblings, 2 replies; 72+ messages in thread
From: Russell King - ARM Linux @ 2009-09-20  9:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Sun, Sep 20, 2009 at 09:39:00AM +0100, Catalin Marinas wrote:
> On Sat, 2009-09-19 at 23:40 +0100, Russell King - ARM Linux wrote:
> > On Mon, Aug 17, 2009 at 06:25:16PM +0100, Catalin Marinas wrote:
> > > Assuming that the dynamic linker does instruction modifications as well
> > > and expects the mprotect(RX) to flush the caches, the patch below
> > > appears to fix the problem (not intensively tested). Note that I don't
> > > say this is the right fix but it may work around the problem until
> > > further investigation into the dynamic linker.
> > 
> > Having now re-read the start of the thread, and put all the pieces
> > together, the problem is not to do with SMP per-se, or Icache
> > problems.
> 
> It's the I-D cache coherency.

You may be right, but the current evidence does not support that.
If what you say is true, then all current ARMv6 and ARMv7 non-SMP
systems would be affected.  So far, the bug report is only against
SMP systems, where the cache is always forced to write allocate mode.

> > I'd like to request that someone who can prove that the program works
> > on ARMv6/v7 hardware does the following test:
> > 
> > 1. boot the system with cachepolicy=writealloc
> > 2. re-test the program
> 
> I don't think this would work. All the non-SMP v6/v7 processors I'm
> aware of only support read-allocate caches, even if you try to force
> write-allocate. On the SMP ones (Cortex-A9, ARM11MPCore), write-allocate
> is the default.

Are you sure - I thought some of them did support write allocate.

> I also recall that the cachepolicy argument was only affecting the
> kernel mapping rather than the user one. Is this still the case?

Since changing the ptebits stuff, it affects everything.

> > I'll bet that it will fail - because the writes to userspace to fix up
> > the dynamic linking end up sitting in the data cache rather than hitting
> > the underlying page.  (I would try it myself, but I can't build EABI
> > userspace binaries here - only OABI stuff I'm afraid).
> 
> I've explained this already in this thread.

This thread is soo big that I couldn't find an explaination.  Besides,
my explaination appears to be different from yours.

> > I think what we need to do is to ensure that the copy_user_highpage
> > function is writing back data to the backing RAM, so it is visible
> > to the I-cache when COWs of executable pages occur.  However, while
> > we can pass this the vma, the vm_flags can't currently be used to
> > detect COW of temporarily non-executable pages - which is what we
> > want to detect to avoid having to clean the cache on every page
> > copy.
> 
> copy_user_highpage() would work if we can detect the VM_EXEC flag but in
> this case, the linker does mprotect(RW) before writing to the page (BTW,
> this function could be fixed as well for RWX pages).

"can't currently be used" - yes, I'm aware of this.  We could arrange
to remember that the region had executable permission, and use that as
a trigger for additional handling in copy_user_highpage().

> I don't think it's recommended to clean the D-cache (and invalidate the
> I-cache) every time in copy_user_highpage, therefore cache maintenance
> via mprotect -> change_protection -> flush_cache_range may be a better
> option.

I really don't believe so - try it yourself - run some benchmarks on your
ARMv6 or v7 system, comparing the results both with and without the patch.
Especially pay attention to the process creation/shell script performance.
I think you'll find that with your patch, it'll be worse than ARM systems
running at similar clock rates with VIVT caches.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-20  9:31             ` Russell King - ARM Linux
@ 2009-09-20 19:02               ` Russell King - ARM Linux
  2009-09-20 22:46                 ` Catalin Marinas
  2009-09-20 22:02               ` Catalin Marinas
  1 sibling, 1 reply; 72+ messages in thread
From: Russell King - ARM Linux @ 2009-09-20 19:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Sun, Sep 20, 2009 at 10:31:39AM +0100, Russell King - ARM Linux wrote:
> On Sun, Sep 20, 2009 at 09:39:00AM +0100, Catalin Marinas wrote:
> > I don't think it's recommended to clean the D-cache (and invalidate the
> > I-cache) every time in copy_user_highpage, therefore cache maintenance
> > via mprotect -> change_protection -> flush_cache_range may be a better
> > option.
> 
> I really don't believe so - try it yourself - run some benchmarks on your
> ARMv6 or v7 system, comparing the results both with and without the patch.
> Especially pay attention to the process creation/shell script performance.
> I think you'll find that with your patch, it'll be worse than ARM systems
> running at similar clock rates with VIVT caches.

The figures reveal a 10% reduction in the performance of execve - that's
quite a nasty hit, basically meaning shell scripts will run about 10%
slower (shell scripts typically exec lots of programs.)

Using my proposal measures more favourably - there is no measurable impact
on execve itself (maybe a 0.5% reduction, which I consider to be in the
measurement noise), but a 5.5% reduction in the performance of fork()+exit()
- this is using __cpuc_coherent_kern_range() in
v6_copy_user_highpage_nonaliasing() to ensure the new page is fully
coherent.

Here's my measurements, using the programs from an old version of the
byte benchmarks (not running it as per the benchmark due to the limited
environment on the Realview platform).  Basically, each program was
invoked four times, requesting 30 second runs, and the average taken.

The first two are really for establishing an indication that things are
still as they should be (iow, the hardware isn't playing silly buggers).

Run:
old     b42c634 bf45699 d374bf1 df297bf catalin vma1    vma2    cowcc

Dhrystone 2 RSS=332K:
7640923 7641386 7642194 7639866 7641322 7641308 7640809 7641096 7640737
99.995% 100.00% 100.01% 99.981% 100.00% 100.00% 99.993% 99.997% 99.993%

Pipe based context switching RSS=296K:
246648  244868  244402  242516  248877  248970  237815  246172  247082
99.104% 98.389% 98.202% 97.444% 100.00% 100.04% 95.555% 98.913% 99.279%

fork()+exit() (aka spawn) RSS=292K:
10492   10457   10460   10455   10485   10478   10524   10491   9968
100.07% 99.733% 99.762% 99.714% 100.00% 99.933% 100.37% 100.06% 95.069%

execve() (aka execl) RSS=188K:
2236    2219    2268    2247    2239    2033    2260    2263    2222
99.866% 99.107% 101.30% 100.36% 100.00% 90.799% 100.94% 101.05% 99.241%


The runs are (the board was power cycled before each run):
old - my published tip of tree on Saturday
df297bf - my (slightly modified) prefetch abort fixes (the runs between
    these represent various stages through that work.)
catalin - your patch
vma1 - passing but not using the vma to the copy_user_highpage
    functions.  this appears to be an anomalous run.
vma2 - repeated run
cowcc - adding cache handling to v6_copy_user_highpage_nonaliasing()


One thing I have noticed: it takes the Realview SMP board _two_ attempts
to boot a kernel.  The first attempt tends to cause a spontaneous reboot
when the CLCD controller is enabled, or possibly a hang.  The second
attempt seems to always run fine.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-20  9:31             ` Russell King - ARM Linux
  2009-09-20 19:02               ` Russell King - ARM Linux
@ 2009-09-20 22:02               ` Catalin Marinas
  2009-09-22  5:44                 ` Shilimkar, Santosh
  1 sibling, 1 reply; 72+ messages in thread
From: Catalin Marinas @ 2009-09-20 22:02 UTC (permalink / raw)
  To: linux-arm-kernel

Russell King - ARM Linux wrote:
> On Sun, Sep 20, 2009 at 09:39:00AM +0100, Catalin Marinas wrote:
>> On Sat, 2009-09-19 at 23:40 +0100, Russell King - ARM Linux wrote:
>>> On Mon, Aug 17, 2009 at 06:25:16PM +0100, Catalin Marinas wrote:
>>>> Assuming that the dynamic linker does instruction modifications as well
>>>> and expects the mprotect(RX) to flush the caches, the patch below
>>>> appears to fix the problem (not intensively tested). Note that I don't
>>>> say this is the right fix but it may work around the problem until
>>>> further investigation into the dynamic linker.
>>> Having now re-read the start of the thread, and put all the pieces
>>> together, the problem is not to do with SMP per-se, or Icache
>>> problems.
 >>
>> It's the I-D cache coherency.
> 
> You may be right, but the current evidence does not support that.
> If what you say is true, then all current ARMv6 and ARMv7 non-SMP
> systems would be affected.  So far, the bug report is only against
> SMP systems, where the cache is always forced to write allocate mode.

It is quite unlikely, though not impossible, for the I-cache to have 
stale entries. That's mainly because by the time a page cache page is 
reused for a different file, the corresponding I-cache entries are long 
gone. You could try on a software model to limit the amount of RAM and 
increase the I-cache size (I think AEM supports pseudo-infinite caches).

Data (instruction opcodes) not reaching the RAM because of 
write-allocate D-cache is the main issue, but it would be better to 
cover the I-cache coherency to avoid hard to reproduce bugs on some 
hardware configurations.

>>> I'd like to request that someone who can prove that the program works
>>> on ARMv6/v7 hardware does the following test:
>>>
>>> 1. boot the system with cachepolicy=writealloc
>>> 2. re-test the program
 >>
>> I don't think this would work. All the non-SMP v6/v7 processors I'm
>> aware of only support read-allocate caches, even if you try to force
>> write-allocate. On the SMP ones (Cortex-A9, ARM11MPCore), write-allocate
>> is the default.
> 
> Are you sure - I thought some of them did support write allocate.

I'm not entirely sure but that's what I recall. Anyway, you can run a UP 
kernel on ARM11MPCore.

>> I also recall that the cachepolicy argument was only affecting the
>> kernel mapping rather than the user one. Is this still the case?
> 
> Since changing the ptebits stuff, it affects everything.

Great.

>>> I think what we need to do is to ensure that the copy_user_highpage
>>> function is writing back data to the backing RAM, so it is visible
>>> to the I-cache when COWs of executable pages occur.  However, while
>>> we can pass this the vma, the vm_flags can't currently be used to
>>> detect COW of temporarily non-executable pages - which is what we
>>> want to detect to avoid having to clean the cache on every page
>>> copy.
 >>
>> copy_user_highpage() would work if we can detect the VM_EXEC flag but in
>> this case, the linker does mprotect(RW) before writing to the page (BTW,
>> this function could be fixed as well for RWX pages).
> 
> "can't currently be used" - yes, I'm aware of this.  

Sorry, I missed that line (too early in the morning).

> We could arrange
> to remember that the region had executable permission, and use that as
> a trigger for additional handling in copy_user_highpage().

Can the current dirty mechanism used for UP kernels be extended to cover 
this? The copy_user_highpage() could mark the page as dirty and later 
flush_cache_range() called via mprotect() could check this bit, similar 
to update_mmu_cache(). This would work on Cortex-A9 (where cache 
operations are detected by the snoop unit) but not on ARM11MPCore, where 
we can do it non-lazily.

We have the mechanism in place already, we could call 
flush_dcache_page(to) in copy_user_highpage() (which makes sense since 
the kernel is writing to a page visible to the user). Ideally, 
change_protection() should call update_mmu_cache() as well.

-- 
Catalin

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-20 19:02               ` Russell King - ARM Linux
@ 2009-09-20 22:46                 ` Catalin Marinas
  2009-09-21  8:31                   ` Jamie Lokier
  0 siblings, 1 reply; 72+ messages in thread
From: Catalin Marinas @ 2009-09-20 22:46 UTC (permalink / raw)
  To: linux-arm-kernel

Russell King - ARM Linux wrote:
> On Sun, Sep 20, 2009 at 10:31:39AM +0100, Russell King - ARM Linux wrote:
>> On Sun, Sep 20, 2009 at 09:39:00AM +0100, Catalin Marinas wrote:
>>> I don't think it's recommended to clean the D-cache (and invalidate the
>>> I-cache) every time in copy_user_highpage, therefore cache maintenance
>>> via mprotect -> change_protection -> flush_cache_range may be a better
>>> option.
>> I really don't believe so - try it yourself - run some benchmarks on your
>> ARMv6 or v7 system, comparing the results both with and without the patch.
>> Especially pay attention to the process creation/shell script performance.
>> I think you'll find that with your patch, it'll be worse than ARM systems
>> running at similar clock rates with VIVT caches.
> 
> The figures reveal a 10% reduction in the performance of execve - that's
> quite a nasty hit, basically meaning shell scripts will run about 10%
> slower (shell scripts typically exec lots of programs.)
> 
> Using my proposal measures more favourably - there is no measurable impact
> on execve itself (maybe a 0.5% reduction, which I consider to be in the
> measurement noise), but a 5.5% reduction in the performance of fork()+exit()
> - this is using __cpuc_coherent_kern_range() in
> v6_copy_user_highpage_nonaliasing() to ensure the new page is fully
> coherent.

Thanks for running these benchmarks. The results on both your and my 
patch are affected by invalidating the whole I-cache in 
v6_coherent_user_range() rather than doing it by line (that's historical 
because of some erratum on ARM1136 - maybe we should fix this).

Another thing that's affecting the performance of my patch as it 
currently is (and withtout changing generic code) - the D-cache flushing 
generates a fault in some situations which takes time to process. I can 
fix this by using the VAtoPA translation registers in the 
coherent_user_range function.

Anyway, I think it depends on the type of applications you are running. 
I personally don't see shell performance too important, so we may 
disagree on the best fix here.

For a web server (Apache) where you have plenty of forks, your patch 
might affect the performance quite a lot as you get many 
copy_user_highpage() calls for CoW (BTW, unrelated to this issue, 
www.linux-arm.org, including the Git server, is hosted on a set of 
Marvell MV78100 boards - http://www.linux-arm.org/Main/LinuxArmOrg).

While we can choose benchmarks to show that either option is bad, we 
should probably try to get an optimal solution.

My view is that something similar to flush_dcache_page + 
update_mmu_cache would be better (though maybe not these functions 
directly but could try to reuse PG_arch_1).

> One thing I have noticed: it takes the Realview SMP board _two_ attempts
> to boot a kernel.  The first attempt tends to cause a spontaneous reboot
> when the CLCD controller is enabled, or possibly a hang.  The second
> attempt seems to always run fine.

I noticed this as well only on RealView EB but not all boards. The other 
SMP boards I have are fine. It could be a hardware bug, I don't see 
anything obvious in Linux.

-- 
Catalin

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-20 22:46                 ` Catalin Marinas
@ 2009-09-21  8:31                   ` Jamie Lokier
  2009-09-21  8:41                     ` Russell King - ARM Linux
  2009-09-21  8:49                     ` Catalin Marinas
  0 siblings, 2 replies; 72+ messages in thread
From: Jamie Lokier @ 2009-09-21  8:31 UTC (permalink / raw)
  To: linux-arm-kernel

I'm not able to tell from this thread, and I don't have the hardware
to test either:

I have some userspace ARM code which modifies instructions and data
used by those instructions in a few pages, using mprotect() to make
them writable, modify, and make them PROT_READ|PROT_EXEC again.

There is no execution of the modified code _during_ the modification,
only afterwards.

I expect the behaviour is the same as that toolchain which modifies
instructions in the ELF PLT.  (Which, by the way, an ARM FDPIC-ELF
for no-MMU I've been working on also does).

Would the crash problems being discussed affect that sort of code in
general on released ARM kernels?  Do I need an I-cache flush in
userspace after the mprotect - is that required, and will that always
be enough?  Will it still be required when the fix is in?

My question is about some simple code patching at application startup.
But, generalising to a JIT code generator, does it complicate matters
if code is being executed from a page _at the same time_ as another
thread (perhaps on another CPU) is writing to another part of the
_same page_ - writing code, and it's associated local data, to be
executed shortly after it's written while continuing to execute the
earlier code?

Thanks!
-- Jamie

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-21  8:31                   ` Jamie Lokier
@ 2009-09-21  8:41                     ` Russell King - ARM Linux
  2009-09-21  9:41                       ` Jamie Lokier
  2009-09-21  8:49                     ` Catalin Marinas
  1 sibling, 1 reply; 72+ messages in thread
From: Russell King - ARM Linux @ 2009-09-21  8:41 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 21, 2009 at 09:31:09AM +0100, Jamie Lokier wrote:
> I'm not able to tell from this thread, and I don't have the hardware
> to test either:
> 
> I have some userspace ARM code which modifies instructions and data
> used by those instructions in a few pages, using mprotect() to make
> them writable, modify, and make them PROT_READ|PROT_EXEC again.
> 
> There is no execution of the modified code _during_ the modification,
> only afterwards.

We've had a syscall to handle the cache issues to do with self
modifying code - __ARM_NR_cacheflush.  See the comments associated
with 'cacheflush' in arch/arm/kernel/traps.c about how to use it.
This is the situation which the syscall was designed to address back
in the 1990s.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-21  8:31                   ` Jamie Lokier
  2009-09-21  8:41                     ` Russell King - ARM Linux
@ 2009-09-21  8:49                     ` Catalin Marinas
  2009-09-21  8:54                       ` Russell King - ARM Linux
  1 sibling, 1 reply; 72+ messages in thread
From: Catalin Marinas @ 2009-09-21  8:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 2009-09-21 at 09:31 +0100, Jamie Lokier wrote:
> I'm not able to tell from this thread, and I don't have the hardware
> to test either:
> 
> I have some userspace ARM code which modifies instructions and data
> used by those instructions in a few pages, using mprotect() to make
> them writable, modify, and make them PROT_READ|PROT_EXEC again.

As Russell said, we have system call dedicated to flushing the caches if
you write instructions. Now it's arguable whether using mprotect(RX)
should also flush the cache.

> I expect the behaviour is the same as that toolchain which modifies
> instructions in the ELF PLT.  (Which, by the way, an ARM FDPIC-ELF
> for no-MMU I've been working on also does).

In this thread, the dynamic linker only writes data. The CoW mechanism
has the side-effect of incoherent I-D caches.

-- 
Catalin

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-21  8:49                     ` Catalin Marinas
@ 2009-09-21  8:54                       ` Russell King - ARM Linux
  2009-09-21  9:44                         ` Catalin Marinas
  0 siblings, 1 reply; 72+ messages in thread
From: Russell King - ARM Linux @ 2009-09-21  8:54 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 21, 2009 at 09:49:04AM +0100, Catalin Marinas wrote:
> On Mon, 2009-09-21 at 09:31 +0100, Jamie Lokier wrote:
> > I expect the behaviour is the same as that toolchain which modifies
> > instructions in the ELF PLT.  (Which, by the way, an ARM FDPIC-ELF
> > for no-MMU I've been working on also does).
> 
> In this thread, the dynamic linker only writes data. The CoW mechanism
> has the side-effect of incoherent I-D caches.

As I already said, that is not really the problem, otherwise all ARMv6
CPUs would fail to work.  This is clearly not the case.  The problem is
the data sitting in the data cache with write allocate caches.

The instruction cache issue is an entirely separate problem.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-21  8:41                     ` Russell King - ARM Linux
@ 2009-09-21  9:41                       ` Jamie Lokier
  2009-09-21 10:08                         ` Catalin Marinas
  0 siblings, 1 reply; 72+ messages in thread
From: Jamie Lokier @ 2009-09-21  9:41 UTC (permalink / raw)
  To: linux-arm-kernel

Russell King - ARM Linux wrote:
> On Mon, Sep 21, 2009 at 09:31:09AM +0100, Jamie Lokier wrote:
> > I'm not able to tell from this thread, and I don't have the hardware
> > to test either:
> > 
> > I have some userspace ARM code which modifies instructions and data
> > used by those instructions in a few pages, using mprotect() to make
> > them writable, modify, and make them PROT_READ|PROT_EXEC again.
> > 
> > There is no execution of the modified code _during_ the modification,
> > only afterwards.
> 
> We've had a syscall to handle the cache issues to do with self
> modifying code - __ARM_NR_cacheflush.  See the comments associated
> with 'cacheflush' in arch/arm/kernel/traps.c about how to use it.
> This is the situation which the syscall was designed to address back
> in the 1990s.

Understood already.  Other archs have a similar call.  My question was
multiple questions, specifically in response to this thread:

  - Whether sys_cacheflush is necessary when doing mprotect RW->RX
    after writing new code, or does the mprotect imply it (I presume
    _mmap_ always does);

  - If a kernel bug has been uncovered which makes sys_cacheflush
    insufficient for that in some obscure case under discussion with
    COW pages;

  - Whether it is necessary if only the data words in a code+data
    page are modified (similar to ELF PLT updates), in principle, and

  - If the above answer is no, with only the data words written, the
    bug under discussion, is sys_cacheflush necessary to work with
    unfixed kernels?

Thanks,
-- Jamie

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-21  8:54                       ` Russell King - ARM Linux
@ 2009-09-21  9:44                         ` Catalin Marinas
  2009-09-21 10:07                           ` Russell King - ARM Linux
  0 siblings, 1 reply; 72+ messages in thread
From: Catalin Marinas @ 2009-09-21  9:44 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 2009-09-21 at 09:54 +0100, Russell King - ARM Linux wrote:
> On Mon, Sep 21, 2009 at 09:49:04AM +0100, Catalin Marinas wrote:
> > On Mon, 2009-09-21 at 09:31 +0100, Jamie Lokier wrote:
> > > I expect the behaviour is the same as that toolchain which modifies
> > > instructions in the ELF PLT.  (Which, by the way, an ARM FDPIC-ELF
> > > for no-MMU I've been working on also does).
> > 
> > In this thread, the dynamic linker only writes data. The CoW mechanism
> > has the side-effect of incoherent I-D caches.
> 
> As I already said, that is not really the problem, otherwise all ARMv6
> CPUs would fail to work.  This is clearly not the case.  The problem is
> the data sitting in the data cache with write allocate caches.

I would still call this I-D cache coherency issue since the two caches
have a different view of the RAM but I agree that the D-cache is the one
holding the data (with a slight chance for the I-cache not to be in sync
with main RAM, though we could treat it separately).

We can sort out the D-cache issue with your approach for cleaning it in
the copy_user_highpage() function, but, as I said, we affect the
standard CoW mechanism for data pages quite a lot.

Since the latest dynamic linker was fixed already to avoid writing to a
text page, we would penalise (with either solution) user-space in the
future even if it doesn't need this.

If the linker would call mprotect(RWX), things would be simpler in
copy_user_highpage as we can detect VM_EXEC.

> The instruction cache issue is an entirely separate problem.

We would need to fix this somehow as well. We currently handle the
I-cache in update_mmu_cache() when a page is first mapped if it has
VM_EXEC set. But mprotect() or change_protection() don't seem to call
this. Should we mandate a cacheflush syscall in user space when calling
mprotect(RX)? I don't think people are expecting this.

-- 
Catalin

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-21  9:44                         ` Catalin Marinas
@ 2009-09-21 10:07                           ` Russell King - ARM Linux
  2009-09-21 10:42                             ` Catalin Marinas
                                               ` (3 more replies)
  0 siblings, 4 replies; 72+ messages in thread
From: Russell King - ARM Linux @ 2009-09-21 10:07 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 21, 2009 at 10:44:23AM +0100, Catalin Marinas wrote:
> I would still call this I-D cache coherency issue since the two caches
> have a different view of the RAM but I agree that the D-cache is the one
> holding the data (with a slight chance for the I-cache not to be in sync
> with main RAM, though we could treat it separately).
> 
> We can sort out the D-cache issue with your approach for cleaning it in
> the copy_user_highpage() function, but, as I said, we affect the
> standard CoW mechanism for data pages quite a lot.

Let me restate my approach more clearly:

1. Remember that a VMA has been executable.
2. Only do the additional handing if the VMA has been executable.

> Since the latest dynamic linker was fixed already to avoid writing to a
> text page, we would penalise (with either solution) user-space in the
> future even if it doesn't need this.

Great - that means a previously executable region won't be COWed anymore,
and so my patch will have no measurable impact.

> > The instruction cache issue is an entirely separate problem.
> 
> We would need to fix this somehow as well. We currently handle the
> I-cache in update_mmu_cache() when a page is first mapped if it has
> VM_EXEC set.

The reason I'm pushing you hard to separate the two issues is that the
two should be treated separately.  I think we need to consider ensuring
that freed pages do not have any I-cache lines associated with them,
rather than waiting for them to be allocated and then dealing with the
I-cache problem.

We need to measure what the impact of that would be.

> But mprotect() or change_protection() don't seem to call this.

That's because update_mmu_cache() is a TLB interface, not a cache
interface.  You'd have to call update_mmu_cache() for every individual
page.  See cachetlb.txt.

> Should we mandate a cacheflush syscall in user space when calling
> mprotect(RX)? I don't think people are expecting this.

It is not clear what effect mprotect() with harvard caches should have
on I+D coherency - certainly there's no behavioural requirements in
this regard specified by POSIX.

Given that if you have a RWX region you have to deal with the coherency
issue in userspace already, I don't see mprotect() being any different
in this regard.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-21  9:41                       ` Jamie Lokier
@ 2009-09-21 10:08                         ` Catalin Marinas
  0 siblings, 0 replies; 72+ messages in thread
From: Catalin Marinas @ 2009-09-21 10:08 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 2009-09-21 at 10:41 +0100, Jamie Lokier wrote:
>   - Whether sys_cacheflush is necessary when doing mprotect RW->RX
>     after writing new code, or does the mprotect imply it (I presume
>     _mmap_ always does);

mmap() doesn't do this as it doesn't map the pages itself. The pages are
mapped later during fault processing and it's the
flush_dcache_page/update_mmu_cache combination that takes care of the
D-cache cleaning and I-cache invalidation.

IMHO, mprotect(RX) should do the flushing but let's see Russell's
opinion (that's what my patch proposes).

>   - If a kernel bug has been uncovered which makes sys_cacheflush
>     insufficient for that in some obscure case under discussion with
>     COW pages;

sys_cacheflush isn't insufficient with write-allocate caches, it's only
that the dynamic linker doesn't use it (since it's only writing data but
in a text page).

>   - Whether it is necessary if only the data words in a code+data
>     page are modified (similar to ELF PLT updates), in principle, and

IMHO, no.

>   - If the above answer is no, with only the data words written, the
>     bug under discussion, is sys_cacheflush necessary to work with
>     unfixed kernels?

sys_cacheflush would work around the problem but distributions,
including Debian and Ubuntu, are affected and the dynamic linker would
need to be fixed (for such filesystems it's easier to push a kernel fix
than a toolchain one).

-- 
Catalin

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-21 10:07                           ` Russell King - ARM Linux
@ 2009-09-21 10:42                             ` Catalin Marinas
  2009-09-21 20:10                             ` Jamie Lokier
                                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 72+ messages in thread
From: Catalin Marinas @ 2009-09-21 10:42 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 2009-09-21 at 11:07 +0100, Russell King - ARM Linux wrote:
> On Mon, Sep 21, 2009 at 10:44:23AM +0100, Catalin Marinas wrote:
> > I would still call this I-D cache coherency issue since the two caches
> > have a different view of the RAM but I agree that the D-cache is the one
> > holding the data (with a slight chance for the I-cache not to be in sync
> > with main RAM, though we could treat it separately).
> > 
> > We can sort out the D-cache issue with your approach for cleaning it in
> > the copy_user_highpage() function, but, as I said, we affect the
> > standard CoW mechanism for data pages quite a lot.
> 
> Let me restate my approach more clearly:
> 
> 1. Remember that a VMA has been executable.
> 2. Only do the additional handing if the VMA has been executable.

If we don't do anything with mprotect(RX), I'm fine with this approach.
Can the VM_MAYEXEC flag be used or we would need to use some of the
pgprot bits?

The .S parts of my patch would need to be merged as well but with a
different description. That's because the sys_cacheflush() can easily
generate a kernel oops if it is called on an mmap'ed page where the
pages haven't been mapped yet and the vma is valid.

> > > The instruction cache issue is an entirely separate problem.
> > 
> > We would need to fix this somehow as well. We currently handle the
> > I-cache in update_mmu_cache() when a page is first mapped if it has
> > VM_EXEC set.
> 
> The reason I'm pushing you hard to separate the two issues is that the
> two should be treated separately.  I think we need to consider ensuring
> that freed pages do not have any I-cache lines associated with them,
> rather than waiting for them to be allocated and then dealing with the
> I-cache problem.

Yes, this approach should work as well but I can't tell the impact (some
lmbench tests would be useful, though not sure when I'll have time).

> > But mprotect() or change_protection() don't seem to call this.
> 
> That's because update_mmu_cache() is a TLB interface, not a cache
> interface.  You'd have to call update_mmu_cache() for every individual
> page.  See cachetlb.txt.

It's a TLB interface but it's also used for lazy cache flushing.
According to cachetlb.txt it isn't used for change_protection() I don't
think we can change this without hassle.

> > Should we mandate a cacheflush syscall in user space when calling
> > mprotect(RX)? I don't think people are expecting this.
> 
> It is not clear what effect mprotect() with harvard caches should have
> on I+D coherency - certainly there's no behavioural requirements in
> this regard specified by POSIX.
> 
> Given that if you have a RWX region you have to deal with the coherency
> issue in userspace already, I don't see mprotect() being any different
> in this regard.

Thanks. I won't try to push for an mprotect() change here as I don't
have a very strong opinion (but I've seen people expecting this to
work). Anyway, I just wanted to get a statement from you on this issue.

-- 
Catalin

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-21 10:07                           ` Russell King - ARM Linux
  2009-09-21 10:42                             ` Catalin Marinas
@ 2009-09-21 20:10                             ` Jamie Lokier
  2009-09-21 21:26                               ` Russell King - ARM Linux
  2009-09-21 21:58                               ` Catalin Marinas
  2009-09-21 21:38                             ` Russell King - ARM Linux
  2009-10-15 14:57                             ` Russell King - ARM Linux
  3 siblings, 2 replies; 72+ messages in thread
From: Jamie Lokier @ 2009-09-21 20:10 UTC (permalink / raw)
  To: linux-arm-kernel

Russell King - ARM Linux wrote:
> On Mon, Sep 21, 2009 at 10:44:23AM +0100, Catalin Marinas wrote:
> > I would still call this I-D cache coherency issue since the two caches
> > have a different view of the RAM but I agree that the D-cache is the one
> > holding the data (with a slight chance for the I-cache not to be in sync
> > with main RAM, though we could treat it separately).
> > 
> > We can sort out the D-cache issue with your approach for cleaning it in
> > the copy_user_highpage() function, but, as I said, we affect the
> > standard CoW mechanism for data pages quite a lot.
> 
> Let me restate my approach more clearly:
> 
> 1. Remember that a VMA has been executable.
> 2. Only do the additional handing if the VMA has been executable.

Sorry, I'm a little confused, and I'm trying to understand what I can
safely assume is reliable when using mprotect.

If the problem is data in the D-cache not being flushed to be read as
data from a text page (i.e. nothing to do with I-cache, it's all about
the D-cache between different mappings), why is the previous
executableness of the VMA relevant to the solution?

And here's a little something:

http://www.mail-archive.com/aufs-users at lists.sourceforge.net/msg02093.html

It's about MIPS, but has an awful lot of things in common with the bug
being discussed in this thread: dynamic linker, constants embedded in
the code, using mprotect rx->rw->rx, missing I-cache flush, only
affects COW, copy_user_highpage(), is worked around by switching the
cache from write-back to write-through...

Useful?

I found that while searching to see if mprotect rw->rx implies I-cache
flush.  On IRIX it's explicitly documented to, in fact it has
PROT_EXEC_NOFLUSH in case you want to optimise that away :-) Haven't
found anything to confirm or deny it for Linux or anything else,
though.

Hopefully it's clear that munmap of the region, followed by mmap
PROT_READ|PROTE_EXEC to restore the mapping with different permissions
(when it has a backing file) - hopefully it's clear that _that_ will
do the needed I-cache flush.

-- Jamie

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-21 20:10                             ` Jamie Lokier
@ 2009-09-21 21:26                               ` Russell King - ARM Linux
  2009-09-21 22:14                                 ` Catalin Marinas
  2009-09-21 22:25                                 ` Jamie Lokier
  2009-09-21 21:58                               ` Catalin Marinas
  1 sibling, 2 replies; 72+ messages in thread
From: Russell King - ARM Linux @ 2009-09-21 21:26 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 21, 2009 at 09:10:43PM +0100, Jamie Lokier wrote:
> Russell King - ARM Linux wrote:
> > On Mon, Sep 21, 2009 at 10:44:23AM +0100, Catalin Marinas wrote:
> > > I would still call this I-D cache coherency issue since the two caches
> > > have a different view of the RAM but I agree that the D-cache is the one
> > > holding the data (with a slight chance for the I-cache not to be in sync
> > > with main RAM, though we could treat it separately).
> > > 
> > > We can sort out the D-cache issue with your approach for cleaning it in
> > > the copy_user_highpage() function, but, as I said, we affect the
> > > standard CoW mechanism for data pages quite a lot.
> > 
> > Let me restate my approach more clearly:
> > 
> > 1. Remember that a VMA has been executable.
> > 2. Only do the additional handing if the VMA has been executable.
> 
> Sorry, I'm a little confused, and I'm trying to understand what I can
> safely assume is reliable when using mprotect.
> 
> If the problem is data in the D-cache not being flushed to be read as
> data from a text page (i.e. nothing to do with I-cache, it's all about
> the D-cache between different mappings), why is the previous
> executableness of the VMA relevant to the solution?

We're using the previous state to indicate what the likely future state
of the page is going to be.

> And here's a little something:
> 
> http://www.mail-archive.com/aufs-users at lists.sourceforge.net/msg02093.html
> 
> It's about MIPS, but has an awful lot of things in common with the bug
> being discussed in this thread: dynamic linker, constants embedded in
> the code, using mprotect rx->rw->rx, missing I-cache flush, only
> affects COW, copy_user_highpage(), is worked around by switching the
> cache from write-back to write-through...
> 
> Useful?

Depends.  ARMv7 has the requirement that memory is not mapped in using
different cache settings (we already violate that, and ARM Ltd's aware
of that, but no one yet knows how to solve it in Linux.)

Given that, I'm not sure we want to dig ourselves deeper into having
mappings of differing cache attributes - we actually want to eliminate
them.

> I found that while searching to see if mprotect rw->rx implies I-cache
> flush.  On IRIX it's explicitly documented to, in fact it has
> PROT_EXEC_NOFLUSH in case you want to optimise that away :-) Haven't
> found anything to confirm or deny it for Linux or anything else,
> though.

Linux has no hook for doing this; you can't even detect it via mprotect()
because the old vm_flags are overwritten before any arch code gets a
look-in.

> Hopefully it's clear that munmap of the region, followed by mmap
> PROT_READ|PROTE_EXEC to restore the mapping with different permissions
> (when it has a backing file) - hopefully it's clear that _that_ will
> do the needed I-cache flush.

Not necessarily, especially if the file is mapped using MAP_PRIVATE.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-21 10:07                           ` Russell King - ARM Linux
  2009-09-21 10:42                             ` Catalin Marinas
  2009-09-21 20:10                             ` Jamie Lokier
@ 2009-09-21 21:38                             ` Russell King - ARM Linux
  2009-09-21 22:28                               ` Catalin Marinas
                                                 ` (2 more replies)
  2009-10-15 14:57                             ` Russell King - ARM Linux
  3 siblings, 3 replies; 72+ messages in thread
From: Russell King - ARM Linux @ 2009-09-21 21:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 21, 2009 at 11:07:51AM +0100, Russell King - ARM Linux wrote:
> On Mon, Sep 21, 2009 at 10:44:23AM +0100, Catalin Marinas wrote:
> > I would still call this I-D cache coherency issue since the two caches
> > have a different view of the RAM but I agree that the D-cache is the one
> > holding the data (with a slight chance for the I-cache not to be in sync
> > with main RAM, though we could treat it separately).
> > 
> > We can sort out the D-cache issue with your approach for cleaning it in
> > the copy_user_highpage() function, but, as I said, we affect the
> > standard CoW mechanism for data pages quite a lot.
> 
> Let me restate my approach more clearly:
> 
> 1. Remember that a VMA has been executable.
> 2. Only do the additional handing if the VMA has been executable.

Well, there's a problem with this approach.  Any binary which executes
with read-implies-exec (in other words, the majority of those around)
results in any region with read permission also getting exec permission.

So, mprotect(rw) actually ends up as mprotect(rwx), which means that
effectively _all_ VMAs have been executable.

This approach won't work as well as I'd hope, since this means every
COW fault ends up triggering the cache flush.

However, the same problem affects Catalin's approach too - with these
binaries, every VMA has VM_EXEC set, which means every VMA gets the
cache flushing treatment whenever flush_cache_range() is called.

This is a nasty problem to solve...

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-21 20:10                             ` Jamie Lokier
  2009-09-21 21:26                               ` Russell King - ARM Linux
@ 2009-09-21 21:58                               ` Catalin Marinas
  2009-09-21 22:12                                 ` Jamie Lokier
  1 sibling, 1 reply; 72+ messages in thread
From: Catalin Marinas @ 2009-09-21 21:58 UTC (permalink / raw)
  To: linux-arm-kernel

Jamie Lokier wrote:
> Russell King - ARM Linux wrote:
>> On Mon, Sep 21, 2009 at 10:44:23AM +0100, Catalin Marinas wrote:
>>> I would still call this I-D cache coherency issue since the two caches
>>> have a different view of the RAM but I agree that the D-cache is the one
>>> holding the data (with a slight chance for the I-cache not to be in sync
>>> with main RAM, though we could treat it separately).
>>>
>>> We can sort out the D-cache issue with your approach for cleaning it in
>>> the copy_user_highpage() function, but, as I said, we affect the
>>> standard CoW mechanism for data pages quite a lot.
>> Let me restate my approach more clearly:
>>
>> 1. Remember that a VMA has been executable.
>> 2. Only do the additional handing if the VMA has been executable.
> 
> Sorry, I'm a little confused, and I'm trying to understand what I can
> safely assume is reliable when using mprotect.

With the current kernel, using mprotect(RX) requires sys_cacheflush.

The patch I proposed handles the mprotect(RX) case and as a side-effect 
deals with the linker way of writing the relocations.

> If the problem is data in the D-cache not being flushed to be read as
> data from a text page (i.e. nothing to do with I-cache, it's all about
> the D-cache between different mappings), why is the previous
> executableness of the VMA relevant to the solution?

Well, it's a workaround to the dynamic linker issue we are having. The 
linker usually maps a file with RX rights and then checks for 
relocations. If these are needed, it calls mprotect(RW). Now we just 
assume that if the area was once executable, it is only temporarily 
marked RW.

> And here's a little something:
> 
> http://www.mail-archive.com/aufs-users at lists.sourceforge.net/msg02093.html
> 
> It's about MIPS, but has an awful lot of things in common with the bug
> being discussed in this thread: dynamic linker, constants embedded in
> the code, using mprotect rx->rw->rx, missing I-cache flush, only
> affects COW, copy_user_highpage(), is worked around by switching the
> cache from write-back to write-through...
> 
> Useful?

Very useful, thanks. It looks similar to the approach I proposed but the 
difference I think is that MIPS was already flushing the D-cache in 
flush_cache_range and they just added the I-cache invalidation. In our 
case (VIPT non-aliasing caches) we don't flush the D-cache at all in 
flush_cache_range().

> I found that while searching to see if mprotect rw->rx implies I-cache
> flush.  On IRIX it's explicitly documented to, in fact it has
> PROT_EXEC_NOFLUSH in case you want to optimise that away :-) Haven't
> found anything to confirm or deny it for Linux or anything else,
> though.

Maybe we should raise this on linux-arch and get some consistent 
approach between other Harvard architectures.

If we go this route, I can try to do a few more optimisations to my 
patch. But for the RWX case, we need to handle it via copy_user_highpage 
(though pretty simple check of the vm_flags).

> Hopefully it's clear that munmap of the region, followed by mmap
> PROT_READ|PROTE_EXEC to restore the mapping with different permissions
> (when it has a backing file) - hopefully it's clear that _that_ will
> do the needed I-cache flush.

It will indeed do the I-cache invalidation via update_mmu_cache() during 
fault processing after mmap but not necessarily D-cache cleaning. So if 
you write any instructions, make sure you call sys_cacheflush(). For 
shared mappings (no CoW), writing only data to text pages should be fine.

A possible scenario (though more code analysis is needed to be entirely 
sure) with writing instructions and not calling sys_cacheflush():

- application mmap's a file (shared mapping, otherwise the data written 
to private mappings is lost when unmapping)
- app writes some instructions to text pages. We don't get CoW because 
of the shared mapping but we don't get D-cache cleaning either
- app unmap's the page but the kernel keeps the physical page in its 
page cache. The flush_cache_page() on non-aliasing VIPT doesn't do 
anything on ARM
- app mmap's the page with PROT_READ|PROT_EXEC
- app executes from the page generating a prefetch abort. The kernel 
finds the page in its page cache and maps it into user space, calling 
update_mmu_cache(). However, the dirty bit isn't set (since the kernel 
hasn't touched the page) and the lazy D-cache flushing in 
update_mmu_cache isn't triggered, leaving the I-cache with old entries 
directly from RAM.

-- 
Catalin

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-21 21:58                               ` Catalin Marinas
@ 2009-09-21 22:12                                 ` Jamie Lokier
  2009-09-21 22:31                                   ` Russell King - ARM Linux
  2009-09-21 22:34                                   ` Catalin Marinas
  0 siblings, 2 replies; 72+ messages in thread
From: Jamie Lokier @ 2009-09-21 22:12 UTC (permalink / raw)
  To: linux-arm-kernel

Catalin Marinas wrote:
> A possible scenario (though more code analysis is needed to be entirely 
> sure) with writing instructions and not calling sys_cacheflush():
> 
> - application mmap's a file (shared mapping, otherwise the data written 
> to private mappings is lost when unmapping)
> - app writes some instructions to text pages. We don't get CoW because 
> of the shared mapping but we don't get D-cache cleaning either
> - app unmap's the page but the kernel keeps the physical page in its 
> page cache. The flush_cache_page() on non-aliasing VIPT doesn't do 
> anything on ARM
> - app mmap's the page with PROT_READ|PROT_EXEC
> - app executes from the page generating a prefetch abort. The kernel 
> finds the page in its page cache and maps it into user space, calling 
> update_mmu_cache(). However, the dirty bit isn't set (since the kernel 
> hasn't touched the page) and the lazy D-cache flushing in 
> update_mmu_cache isn't triggered, leaving the I-cache with old entries 
> directly from RAM.

Why isn't the dirty bit set by the last step?

The dirty bit must be set by the writes in the second step, otherwise
how does the kernel know not to discard those writes under memory
pressure?

Btw, regarding "non-aliasing", it's pretty clear that it does alias
the I-cache ;-), in much the same way as different addresses alias in
the D-cache with an "aliasing" cache.  That may well be a clue as to
clean, systematic and sane way of ensuring all the cache ops are in
all the right places.

-- Jamie

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-21 21:26                               ` Russell King - ARM Linux
@ 2009-09-21 22:14                                 ` Catalin Marinas
  2009-09-21 22:25                                 ` Jamie Lokier
  1 sibling, 0 replies; 72+ messages in thread
From: Catalin Marinas @ 2009-09-21 22:14 UTC (permalink / raw)
  To: linux-arm-kernel

Russell King - ARM Linux wrote:
> On Mon, Sep 21, 2009 at 09:10:43PM +0100, Jamie Lokier wrote:
>> And here's a little something:
>>
>> http://www.mail-archive.com/aufs-users at lists.sourceforge.net/msg02093.html
>>
>> It's about MIPS, but has an awful lot of things in common with the bug
>> being discussed in this thread: dynamic linker, constants embedded in
>> the code, using mprotect rx->rw->rx, missing I-cache flush, only
>> affects COW, copy_user_highpage(), is worked around by switching the
>> cache from write-back to write-through...
>>
>> Useful?
> 
> Depends.  ARMv7 has the requirement that memory is not mapped in using
> different cache settings (we already violate that, and ARM Ltd's aware
> of that, but no one yet knows how to solve it in Linux.)

I think so far the hardware can cope with this as long as you don't 
access both mappings at the same time (they would need at least a DSB 
between accesses with different mappings). If a new CPU requires this 
strict rule to be enforced, the solution I see is a hardware one - ACP 
(ARM's Accelerator Coherency Port) - where we get everything, including 
DMA memory, mapped cached.

-- 
Catalin

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-21 21:26                               ` Russell King - ARM Linux
  2009-09-21 22:14                                 ` Catalin Marinas
@ 2009-09-21 22:25                                 ` Jamie Lokier
  2009-09-22  8:43                                   ` Catalin Marinas
  1 sibling, 1 reply; 72+ messages in thread
From: Jamie Lokier @ 2009-09-21 22:25 UTC (permalink / raw)
  To: linux-arm-kernel

Russell King - ARM Linux wrote:
> > Hopefully it's clear that munmap of the region, followed by mmap
> > PROT_READ|PROTE_EXEC to restore the mapping with different permissions
> > (when it has a backing file) - hopefully it's clear that _that_ will
> > do the needed I-cache flush.
> 
> Not necessarily, especially if the file is mapped using MAP_PRIVATE.

If the answer is not necessarily for MAP_SHARED, then we're in trouble
when someone does

    internal_untar("some_files.tar.gz");

        -> Uses open/ftruncate/mmap(PROT_WRITE)/close to write the files.

    dlopen("some_files/code.so")
    code(...)

Which strikes as the sort of thing people might do these days.

For MAP_PRIVATE...  I'm not sure if that will trip people up or not.

-- Jamie

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-21 21:38                             ` Russell King - ARM Linux
@ 2009-09-21 22:28                               ` Catalin Marinas
  2009-09-21 22:37                                 ` Jamie Lokier
  2009-09-21 22:33                               ` Jamie Lokier
  2009-09-22 10:19                               ` Catalin Marinas
  2 siblings, 1 reply; 72+ messages in thread
From: Catalin Marinas @ 2009-09-21 22:28 UTC (permalink / raw)
  To: linux-arm-kernel

Russell King - ARM Linux wrote:
> On Mon, Sep 21, 2009 at 11:07:51AM +0100, Russell King - ARM Linux wrote:
>> On Mon, Sep 21, 2009 at 10:44:23AM +0100, Catalin Marinas wrote:
>>> I would still call this I-D cache coherency issue since the two caches
>>> have a different view of the RAM but I agree that the D-cache is the one
>>> holding the data (with a slight chance for the I-cache not to be in sync
>>> with main RAM, though we could treat it separately).
>>>
>>> We can sort out the D-cache issue with your approach for cleaning it in
>>> the copy_user_highpage() function, but, as I said, we affect the
>>> standard CoW mechanism for data pages quite a lot.
>> Let me restate my approach more clearly:
>>
>> 1. Remember that a VMA has been executable.
>> 2. Only do the additional handing if the VMA has been executable.
> 
> Well, there's a problem with this approach.  Any binary which executes
> with read-implies-exec (in other words, the majority of those around)
> results in any region with read permission also getting exec permission.
> 
> So, mprotect(rw) actually ends up as mprotect(rwx), which means that
> effectively _all_ VMAs have been executable.
> 
> This approach won't work as well as I'd hope, since this means every
> COW fault ends up triggering the cache flush.
> 
> However, the same problem affects Catalin's approach too - with these
> binaries, every VMA has VM_EXEC set, which means every VMA gets the
> cache flushing treatment whenever flush_cache_range() is called.

In this case, I don't have a better solution, other than optimising the 
coherent_user_range function in my patch to avoid generating page faults 
(and run some benchmarks).

Of course, we could improve the generic mm/ code but that's not easy to 
merge as it affects other architectures.

> This is a nasty problem to solve...

I agree.

-- 
Catalin

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-21 22:12                                 ` Jamie Lokier
@ 2009-09-21 22:31                                   ` Russell King - ARM Linux
  2009-09-21 22:34                                   ` Catalin Marinas
  1 sibling, 0 replies; 72+ messages in thread
From: Russell King - ARM Linux @ 2009-09-21 22:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 21, 2009 at 11:12:29PM +0100, Jamie Lokier wrote:
> Btw, regarding "non-aliasing", it's pretty clear that it does alias
> the I-cache ;-), in much the same way as different addresses alias in
> the D-cache with an "aliasing" cache.  That may well be a clue as to
> clean, systematic and sane way of ensuring all the cache ops are in
> all the right places.

If you want your kernel to flush the I and D caches on every page fault,
process creation and exit, be my guest.  I'd rather have a system with
some reasonable performance left in it rather than being relegated back
to the days of VIVT caches.

The key thing is to work out the _minimum_ amount of cache flushing,
not just throw cache flushing blindly in.

MPCore, btw, is PIPT write allocate harvard.  In other words, it is
fully coherent with virtual mappings - the only things it isn't
coherent with is the instruction cache and DMA.  It'd be absolutely
stupid to have to throw the heavy weights of flushing large amounts
of cache for what is basically a small localized problem.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-21 21:38                             ` Russell King - ARM Linux
  2009-09-21 22:28                               ` Catalin Marinas
@ 2009-09-21 22:33                               ` Jamie Lokier
  2009-09-22  9:21                                 ` Catalin Marinas
  2009-09-22 10:19                               ` Catalin Marinas
  2 siblings, 1 reply; 72+ messages in thread
From: Jamie Lokier @ 2009-09-21 22:33 UTC (permalink / raw)
  To: linux-arm-kernel

Russell King - ARM Linux wrote:
> On Mon, Sep 21, 2009 at 11:07:51AM +0100, Russell King - ARM Linux wrote:
> > On Mon, Sep 21, 2009 at 10:44:23AM +0100, Catalin Marinas wrote:
> > > I would still call this I-D cache coherency issue since the two caches
> > > have a different view of the RAM but I agree that the D-cache is the one
> > > holding the data (with a slight chance for the I-cache not to be in sync
> > > with main RAM, though we could treat it separately).
> > > 
> > > We can sort out the D-cache issue with your approach for cleaning it in
> > > the copy_user_highpage() function, but, as I said, we affect the
> > > standard CoW mechanism for data pages quite a lot.
> > 
> > Let me restate my approach more clearly:
> > 
> > 1. Remember that a VMA has been executable.
> > 2. Only do the additional handing if the VMA has been executable.
> 
> Well, there's a problem with this approach.  Any binary which executes
> with read-implies-exec (in other words, the majority of those around)
> results in any region with read permission also getting exec permission.
> 
> So, mprotect(rw) actually ends up as mprotect(rwx), which means that
> effectively _all_ VMAs have been executable.
> 
> This approach won't work as well as I'd hope, since this means every
> COW fault ends up triggering the cache flush.
> 
> However, the same problem affects Catalin's approach too - with these
> binaries, every VMA has VM_EXEC set, which means every VMA gets the
> cache flushing treatment whenever flush_cache_range() is called.
> 
> This is a nasty problem to solve...

Would it be so harmful if mprotect(rw) were treated as mprotect(rw),
while mprotect(r) is treated as mprotect(rx)?  Sure, someone _could_
exec from rw pages, but they _should not_ be doing so if they haven't
set PROT_EXEC.

PROT_READ implying PROT_READ|PROT_EXEC may be needed for historical
compatibility reasons, but PROT_READ|PROT_WRITE implying
PROT_READ|PROT_WRITE|PROT_EXEC - there might not be any code doing that.

I would guess (but it's just a guess) that the sort of people who
execute code from a PROT_WRITE segment would have known to set
PROT_EXEC, and the main cause for concern would be if there are
executables which have writable data immediately after the text
segment, in the same page, to save space in an executable.  Has that
happened?  Even that would be of no concern if that page is always
mapped PROT_EXEC by the loaders.

Hmm :/  Actually I'm quite surprised read-implies-exec is needed.

-- Jamie

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-21 22:12                                 ` Jamie Lokier
  2009-09-21 22:31                                   ` Russell King - ARM Linux
@ 2009-09-21 22:34                                   ` Catalin Marinas
  1 sibling, 0 replies; 72+ messages in thread
From: Catalin Marinas @ 2009-09-21 22:34 UTC (permalink / raw)
  To: linux-arm-kernel

Jamie Lokier wrote:
> Catalin Marinas wrote:
>> A possible scenario (though more code analysis is needed to be entirely 
>> sure) with writing instructions and not calling sys_cacheflush():
>>
>> - application mmap's a file (shared mapping, otherwise the data written 
>> to private mappings is lost when unmapping)
>> - app writes some instructions to text pages. We don't get CoW because 
>> of the shared mapping but we don't get D-cache cleaning either
>> - app unmap's the page but the kernel keeps the physical page in its 
>> page cache. The flush_cache_page() on non-aliasing VIPT doesn't do 
>> anything on ARM
>> - app mmap's the page with PROT_READ|PROT_EXEC
>> - app executes from the page generating a prefetch abort. The kernel 
>> finds the page in its page cache and maps it into user space, calling 
>> update_mmu_cache(). However, the dirty bit isn't set (since the kernel 
>> hasn't touched the page) and the lazy D-cache flushing in 
>> update_mmu_cache isn't triggered, leaving the I-cache with old entries 
>> directly from RAM.
> 
> Why isn't the dirty bit set by the last step?

By dirty here I mean the PG_arch_1 bit set by flush_dcache_page() which 
won't happen at step 2.

> The dirty bit must be set by the writes in the second step, otherwise
> how does the kernel know not to discard those writes under memory
> pressure?

That's another dirty bit (I think PG_dirty) but it isn't taken into 
account by update_mmu_cache(). As I said, more in-depth code analysis 
here is needed to be entirely sure.

> Btw, regarding "non-aliasing", it's pretty clear that it does alias
> the I-cache ;-), 

We just refer to D-cache aliasing here.

> in much the same way as different addresses alias in
> the D-cache with an "aliasing" cache.  That may well be a clue as to
> clean, systematic and sane way of ensuring all the cache ops are in
> all the right places.

We should indeed favour correctness but there may be some corner cases 
which aren't used anyway, so we shouldn't penalise normal usage.

-- 
Catalin

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-21 22:28                               ` Catalin Marinas
@ 2009-09-21 22:37                                 ` Jamie Lokier
  0 siblings, 0 replies; 72+ messages in thread
From: Jamie Lokier @ 2009-09-21 22:37 UTC (permalink / raw)
  To: linux-arm-kernel

Catalin Marinas wrote:
> Of course, we could improve the generic mm/ code but that's not easy to 
> merge as it affects other architectures.

Then again, I seem to recall a troublesome problem with TLBs and
multiple threads in here many months ago that I didn't believe was
fully solved in corner cases, and there's this question I keep
wondering about whether O_DIRECT (used by SQL databases and QEMU among
others) is safe on these ARMs due to DMA to user-space coherence...

Maybe a bit of generic code just has to be changed.

-- Jamie

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-20 22:02               ` Catalin Marinas
@ 2009-09-22  5:44                 ` Shilimkar, Santosh
  2009-09-22  9:01                   ` Catalin Marinas
  0 siblings, 1 reply; 72+ messages in thread
From: Shilimkar, Santosh @ 2009-09-22  5:44 UTC (permalink / raw)
  To: linux-arm-kernel

Catalin,
> -----Original Message-----
> From: linux-arm-kernel-bounces at lists.infradead.org [mailto:linux-arm-
> kernel-bounces at lists.infradead.org] On Behalf Of Catalin Marinas
> Sent: Monday, September 21, 2009 3:32 AM
> To: Russell King - ARM Linux
> Cc: Dirk Behme; linux-arm-kernel at lists.infradead.org
> Subject: Re: Kernel related (?) user space crash at ARM11 MPCore
> 
> Russell King - ARM Linux wrote:
> > On Sun, Sep 20, 2009 at 09:39:00AM +0100, Catalin Marinas wrote:
> >> On Sat, 2009-09-19 at 23:40 +0100, Russell King - ARM Linux wrote:
> >>> On Mon, Aug 17, 2009 at 06:25:16PM +0100, Catalin Marinas wrote:
> >>>> Assuming that the dynamic linker does instruction modifications as
> well
> >>>> and expects the mprotect(RX) to flush the caches, the patch below
> >>>> appears to fix the problem (not intensively tested). Note that I
> don't
> >>>> say this is the right fix but it may work around the problem until
> >>>> further investigation into the dynamic linker.
> >>> Having now re-read the start of the thread, and put all the pieces
> >>> together, the problem is not to do with SMP per-se, or Icache
> >>> problems.
>  >>
> >> It's the I-D cache coherency.
> >
> > You may be right, but the current evidence does not support that.
> > If what you say is true, then all current ARMv6 and ARMv7 non-SMP
> > systems would be affected.  So far, the bug report is only against
> > SMP systems, where the cache is always forced to write allocate mode.
> 
> It is quite unlikely, though not impossible, for the I-cache to have
> stale entries. That's mainly because by the time a page cache page is
> reused for a different file, the corresponding I-cache entries are long
> gone. You could try on a software model to limit the amount of RAM and
> increase the I-cache size (I think AEM supports pseudo-infinite caches).
> 
> Data (instruction opcodes) not reaching the RAM because of
> write-allocate D-cache is the main issue, but it would be better to
> cover the I-cache coherency to avoid hard to reproduce bugs on some
> hardware configurations.
> 
> >>> I'd like to request that someone who can prove that the program works
> >>> on ARMv6/v7 hardware does the following test:
> >>>
> >>> 1. boot the system with cachepolicy=writealloc
> >>> 2. re-test the program
>  >>
> >> I don't think this would work. All the non-SMP v6/v7 processors I'm
> >> aware of only support read-allocate caches, even if you try to force
> >> write-allocate. On the SMP ones (Cortex-A9, ARM11MPCore), write-
> allocate
> >> is the default.
> >
> > Are you sure - I thought some of them did support write allocate.
> 
> I'm not entirely sure but that's what I recall. Anyway, you can run a UP
> kernel on ARM11MPCore.

Even though this thread is mainly focused on SMP + WA cache, can you point out some reference which support your argument that " the non-SMP v6/v7 processors I'm aware of only support read-allocate caches, even if you try to force write-allocate". Mainly Cortex-a8 (ARMv7). The Cortex-A8 TRM says that it does support WBWA cache.

Thanks!!

Regards,
Santosh

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-21 22:25                                 ` Jamie Lokier
@ 2009-09-22  8:43                                   ` Catalin Marinas
  0 siblings, 0 replies; 72+ messages in thread
From: Catalin Marinas @ 2009-09-22  8:43 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 2009-09-21 at 23:25 +0100, Jamie Lokier wrote:
> Russell King - ARM Linux wrote:
> > > Hopefully it's clear that munmap of the region, followed by mmap
> > > PROT_READ|PROTE_EXEC to restore the mapping with different permissions
> > > (when it has a backing file) - hopefully it's clear that _that_ will
> > > do the needed I-cache flush.
> > 
> > Not necessarily, especially if the file is mapped using MAP_PRIVATE.
> 
> If the answer is not necessarily for MAP_SHARED, then we're in trouble
> when someone does
> 
>     internal_untar("some_files.tar.gz");
> 
>         -> Uses open/ftruncate/mmap(PROT_WRITE)/close to write the files.

So the code above would use MAP_SHARED here to make the files visible to
to other processes.

>     dlopen("some_files/code.so")
>     code(...)

That's in a different application I suspect.

I think my scenario previously described could be valid but source code
analysis is needed to be sure. If pages in the kernel page cache have
dirty cache lines, update_mmu_cache() may only invalidate the I-cache
rather than cleaning the D-cache.

If this scenario is valid, we may need to implement flush_cache_range()
or flush_cache_page().

-- 
Catalin

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-22  5:44                 ` Shilimkar, Santosh
@ 2009-09-22  9:01                   ` Catalin Marinas
  2009-09-22  9:34                     ` Shilimkar, Santosh
  0 siblings, 1 reply; 72+ messages in thread
From: Catalin Marinas @ 2009-09-22  9:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 2009-09-22 at 11:14 +0530, Shilimkar, Santosh wrote:
> Even though this thread is mainly focused on SMP + WA cache, can you
> point out some reference which support your argument that " the
> non-SMP v6/v7 processors I'm aware of only support read-allocate
> caches, even if you try to force write-allocate". Mainly Cortex-a8
> (ARMv7). The Cortex-A8 TRM says that it does support WBWA cache.

Please note the "I'm aware of" part. This could be limited to what I
have on my desk.

Regarding the Cortex-A8 - the 7.3.3 section in the latest TRM (Level 1
Memory System -> Memory attributes -> Normal) has a table which says for
the WBWA L1 configuration:

        This is not supported. L1 is always in the no write-allocate
        mode.

The L2 (inner cache) supports the write-allocate mode but this is a
unified cache and doesn't affect the problem we are seeing here.

-- 
Catalin

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-21 22:33                               ` Jamie Lokier
@ 2009-09-22  9:21                                 ` Catalin Marinas
  0 siblings, 0 replies; 72+ messages in thread
From: Catalin Marinas @ 2009-09-22  9:21 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 2009-09-21 at 23:33 +0100, Jamie Lokier wrote:
> Russell King - ARM Linux wrote:
> > So, mprotect(rw) actually ends up as mprotect(rwx), which means that
> > effectively _all_ VMAs have been executable.
> > 
> > This approach won't work as well as I'd hope, since this means every
> > COW fault ends up triggering the cache flush.
> > 
> > However, the same problem affects Catalin's approach too - with these
> > binaries, every VMA has VM_EXEC set, which means every VMA gets the
> > cache flushing treatment whenever flush_cache_range() is called.
> > 
> > This is a nasty problem to solve...
> 
> Would it be so harmful if mprotect(rw) were treated as mprotect(rw),
> while mprotect(r) is treated as mprotect(rx)?  Sure, someone _could_
> exec from rw pages, but they _should not_ be doing so if they haven't
> set PROT_EXEC.

For OABI, I think we rely on executable user stack for signal handling
and the arm_elf_read_implies_exec() always returns 1. There are also
tools like like this - http://linux.die.net/man/8/execstack - or maybe
the toolchain setting the PT_GNU_STACK which would also set the
read_implies_exec for the stack. And the stack must be writable.

-- 
Catalin

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-22  9:01                   ` Catalin Marinas
@ 2009-09-22  9:34                     ` Shilimkar, Santosh
  0 siblings, 0 replies; 72+ messages in thread
From: Shilimkar, Santosh @ 2009-09-22  9:34 UTC (permalink / raw)
  To: linux-arm-kernel


> -----Original Message-----
> From: Catalin Marinas [mailto:catalin.marinas at arm.com]
> Sent: Tuesday, September 22, 2009 2:31 PM
> To: Shilimkar, Santosh
> Cc: Russell King - ARM Linux; Dirk Behme; linux-arm-
> kernel at lists.infradead.org
> Subject: RE: Kernel related (?) user space crash at ARM11 MPCore
> 
> On Tue, 2009-09-22 at 11:14 +0530, Shilimkar, Santosh wrote:
> > Even though this thread is mainly focused on SMP + WA cache, can you
> > point out some reference which support your argument that " the
> > non-SMP v6/v7 processors I'm aware of only support read-allocate
> > caches, even if you try to force write-allocate". Mainly Cortex-a8
> > (ARMv7). The Cortex-A8 TRM says that it does support WBWA cache.
> 
> Please note the "I'm aware of" part. This could be limited to what I
> have on my desk.
> 
> Regarding the Cortex-A8 - the 7.3.3 section in the latest TRM (Level 1
> Memory System -> Memory attributes -> Normal) has a table which says for
> the WBWA L1 configuration:
> 
>         This is not supported. L1 is always in the no write-allocate
>         mode.
> 
> The L2 (inner cache) supports the write-allocate mode but this is a
> unified cache and doesn't affect the problem we are seeing here.

Thanks Catalin for information!!

Regards,
Santosh

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-21 21:38                             ` Russell King - ARM Linux
  2009-09-21 22:28                               ` Catalin Marinas
  2009-09-21 22:33                               ` Jamie Lokier
@ 2009-09-22 10:19                               ` Catalin Marinas
  2009-09-22 17:17                                 ` Catalin Marinas
  2 siblings, 1 reply; 72+ messages in thread
From: Catalin Marinas @ 2009-09-22 10:19 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 2009-09-21 at 22:38 +0100, Russell King - ARM Linux wrote:
> On Mon, Sep 21, 2009 at 11:07:51AM +0100, Russell King - ARM Linux wrote:
> > On Mon, Sep 21, 2009 at 10:44:23AM +0100, Catalin Marinas wrote:
> > > I would still call this I-D cache coherency issue since the two caches
> > > have a different view of the RAM but I agree that the D-cache is the one
> > > holding the data (with a slight chance for the I-cache not to be in sync
> > > with main RAM, though we could treat it separately).
> > > 
> > > We can sort out the D-cache issue with your approach for cleaning it in
> > > the copy_user_highpage() function, but, as I said, we affect the
> > > standard CoW mechanism for data pages quite a lot.
> > 
> > Let me restate my approach more clearly:
> > 
> > 1. Remember that a VMA has been executable.
> > 2. Only do the additional handing if the VMA has been executable.

> Well, there's a problem with this approach.  Any binary which executes
> with read-implies-exec (in other words, the majority of those around)
> results in any region with read permission also getting exec permission.

Some idea (not tested) - we keep XN always set even if the mapping has
PROT_EXEC. We then handle the prefetch abort and with the IFSR patches
from Kirill we can detect the permission fault and clear the XN bit
while also cleaning the D-cache, maybe via update_mmu_cache() or
directly in __do_page_fault().

Most text pages are handled via a prefetch abort anyway, so we don't
have any performance penalty here. For the read-implies-exec pages or
the dynamic linker case, we have an additional prefetch abort but that's
much faster than either flush_cache_range() or copy_to_user_highpage()
patches.

We may need an additional pte bit or just reuse L_PTE_EXEC.

Yet another idea - add a generic flush_cache_range_for_mprotect()
function with a specific implementation for ARM (called via
change_protection).

-- 
Catalin

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-22 10:19                               ` Catalin Marinas
@ 2009-09-22 17:17                                 ` Catalin Marinas
  2009-09-23  6:03                                   ` Dirk Behme
  0 siblings, 1 reply; 72+ messages in thread
From: Catalin Marinas @ 2009-09-22 17:17 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 2009-09-22 at 11:19 +0100, Catalin Marinas wrote:
> Yet another idea - add a generic flush_cache_range_for_mprotect()
> function with a specific implementation for ARM (called via
> change_protection).

The patch below looks like the best option in my opinion but requires
some generic kernel changes (minimal though). The patch contains the
ARM-specific code as well but can be split in two for pushing upstream.

Apart from this patch, I ran some lmbench tests and my workaround
affects mmap tests quite a lot because of the read-implies-exec forcing
flush_cache_range() in several places. Russell's patch for adding cache
flushing during CoW (either coherent_kernel_range or flush_dcache_page)
slows the fork() tests a bit but the lmbench tests are relatively small
and don't cause a lot of page CoW. This may be different for something
like apache.


Add generic flush_prot_range() and ARM-specific implementation

From: Catalin Marinas <catalin.marinas@arm.com>

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm/include/asm/cacheflush.h |    3 +++
 arch/arm/mm/cache-v6.S            |   20 ++++++++++++++++++--
 arch/arm/mm/cache-v7.S            |   19 +++++++++++++++++--
 arch/arm/mm/flush.c               |   19 +++++++++++++++++++
 include/linux/highmem.h           |    8 ++++++++
 mm/hugetlb.c                      |    2 +-
 mm/mprotect.c                     |    2 +-
 7 files changed, 67 insertions(+), 6 deletions(-)

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index 91aec14..d53832b 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -391,12 +391,15 @@ flush_ptrace_access(struct vm_area_struct *vma, struct page *page,
 	}
 }
 #else
+#define ARCH_HAS_FLUSH_PROT_RANGE
 extern void flush_cache_mm(struct mm_struct *mm);
 extern void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned long end);
 extern void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsigned long pfn);
 extern void flush_ptrace_access(struct vm_area_struct *vma, struct page *page,
 				unsigned long uaddr, void *kaddr,
 				unsigned long len, int write);
+extern void flush_prot_range(struct vm_area_struct *vma, unsigned long start,
+			     unsigned long end);
 #endif
 
 #define flush_cache_dup_mm(mm) flush_cache_mm(mm)
diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
index 8364f6c..7baa6ce 100644
--- a/arch/arm/mm/cache-v6.S
+++ b/arch/arm/mm/cache-v6.S
@@ -12,6 +12,7 @@
 #include <linux/linkage.h>
 #include <linux/init.h>
 #include <asm/assembler.h>
+#include <asm/unwind.h>
 
 #include "proc-macros.S"
 
@@ -129,11 +130,13 @@ ENTRY(v6_coherent_kern_range)
  *	- the Icache does not read data from the write buffer
  */
 ENTRY(v6_coherent_user_range)
-
+ UNWIND(.fnstart		)
 #ifdef HARVARD_CACHE
 	bic	r0, r0, #CACHE_LINE_SIZE - 1
-1:	mcr	p15, 0, r0, c7, c10, 1		@ clean D line
+1:
+ USER(	mcr	p15, 0, r0, c7, c10, 1	)	@ clean D line
 	add	r0, r0, #CACHE_LINE_SIZE
+2:
 	cmp	r0, r1
 	blo	1b
 #endif
@@ -151,6 +154,19 @@ ENTRY(v6_coherent_user_range)
 	mov	pc, lr
 
 /*
+ * Fault handling for the cache operation above. If the virtual address in r0
+ * isn't mapped, just try the next page.
+ */
+9001:
+	mov	r0, r0, lsr #12
+	mov	r0, r0, lsl #12
+	add	r0, r0, #4096
+	b	2b
+ UNWIND(.fnend		)
+ENDPROC(v6_coherent_user_range)
+ENDPROC(v6_coherent_kern_range)
+
+/*
  *	v6_flush_kern_dcache_page(kaddr)
  *
  *	Ensure that the data held in the page kaddr is written back
diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index 29e6904..4b733d1 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -13,6 +13,7 @@
 #include <linux/linkage.h>
 #include <linux/init.h>
 #include <asm/assembler.h>
+#include <asm/unwind.h>
 
 #include "proc-macros.S"
 
@@ -153,13 +154,16 @@ ENTRY(v7_coherent_kern_range)
  *	- the Icache does not read data from the write buffer
  */
 ENTRY(v7_coherent_user_range)
+ UNWIND(.fnstart		)
 	dcache_line_size r2, r3
 	sub	r3, r2, #1
 	bic	r0, r0, r3
-1:	mcr	p15, 0, r0, c7, c11, 1		@ clean D line to the point of unification
+1:
+ USER(	mcr	p15, 0, r0, c7, c11, 1	)	@ clean D line to the point of unification
 	dsb
-	mcr	p15, 0, r0, c7, c5, 1		@ invalidate I line
+ USER(	mcr	p15, 0, r0, c7, c5, 1	)	@ invalidate I line
 	add	r0, r0, r2
+2:
 	cmp	r0, r1
 	blo	1b
 	mov	r0, #0
@@ -167,6 +171,17 @@ ENTRY(v7_coherent_user_range)
 	dsb
 	isb
 	mov	pc, lr
+
+/*
+ * Fault handling for the cache operation above. If the virtual address in r0
+ * isn't mapped, just try the next page.
+ */
+9001:
+	mov	r0, r0, lsr #12
+	mov	r0, r0, lsl #12
+	add	r0, r0, #4096
+	b	2b
+ UNWIND(.fnend		)
 ENDPROC(v7_coherent_kern_range)
 ENDPROC(v7_coherent_user_range)
 
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 73b886e..ed07f4d 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -87,6 +87,25 @@ void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned
 	}
 }
 
+void flush_prot_range(struct vm_area_struct *vma, unsigned long start,
+		      unsigned long end)
+{
+	if (cache_is_vipt_nonaliasing()) {
+		if (vma->vm_flags & VM_EXEC) {
+			/*
+			 * Increment the task's preempt_count so that
+			 * in_atomic() is true and do_page_fault() does not
+			 * try to map pages in. If a page isn't mapped yet, it
+			 * will be ignored.
+			 */
+			inc_preempt_count();
+			flush_cache_user_range(vma, start, end);
+			dec_preempt_count();
+		}
+	} else
+		flush_cache_range(vma, start, end);
+}
+
 void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsigned long pfn)
 {
 	if (cache_is_vivt()) {
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 13875ce..067e67d 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -19,6 +19,14 @@ static inline void flush_kernel_dcache_page(struct page *page)
 }
 #endif
 
+#ifndef ARCH_HAS_FLUSH_PROT_RANGE
+static inline void flush_prot_range(struct vm_area_struct *vma,
+				    unsigned long start, unsigned long end)
+{
+	flush_cache_range(vma, start, end);
+}
+#endif
+
 #ifdef CONFIG_HIGHMEM
 
 #include <asm/highmem.h>
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6058b53..7ce4f57 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2213,7 +2213,7 @@ void hugetlb_change_protection(struct vm_area_struct *vma,
 	struct hstate *h = hstate_vma(vma);
 
 	BUG_ON(address >= end);
-	flush_cache_range(vma, address, end);
+	flush_prot_range(vma, address, end);
 
 	spin_lock(&vma->vm_file->f_mapping->i_mmap_lock);
 	spin_lock(&mm->page_table_lock);
diff --git a/mm/mprotect.c b/mm/mprotect.c
index fded06f..a6b7616 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -123,7 +123,7 @@ static void change_protection(struct vm_area_struct *vma,
 
 	BUG_ON(addr >= end);
 	pgd = pgd_offset(mm, addr);
-	flush_cache_range(vma, addr, end);
+	flush_prot_range(vma, addr, end);
 	do {
 		next = pgd_addr_end(addr, end);
 		if (pgd_none_or_clear_bad(pgd))


-- 
Catalin

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-22 17:17                                 ` Catalin Marinas
@ 2009-09-23  6:03                                   ` Dirk Behme
  2009-09-23  9:13                                     ` Catalin Marinas
  0 siblings, 1 reply; 72+ messages in thread
From: Dirk Behme @ 2009-09-23  6:03 UTC (permalink / raw)
  To: linux-arm-kernel

Catalin Marinas wrote:
> On Tue, 2009-09-22 at 11:19 +0100, Catalin Marinas wrote:
>> Yet another idea - add a generic flush_cache_range_for_mprotect()
>> function with a specific implementation for ARM (called via
>> change_protection).

Catalin and Russell: First many thanks for all the discussion and help 
about this!

> The patch below looks like the best option in my opinion but requires
> some generic kernel changes (minimal though). The patch contains the
> ARM-specific code as well but can be split in two for pushing upstream.
> 
> Apart from this patch, I ran some lmbench tests and my workaround

If you talk about "workaround", do you mean patch below or patch in

http://lists.arm.linux.org.uk/lurker/message/20090817.172516.3100340a.en.html

?

> affects mmap tests quite a lot because of the read-implies-exec forcing
> flush_cache_range() in several places. Russell's patch 

Is Russell's patch available publically somewhere? Sorry if I missed it.

Many thanks

Dirk

> for adding cache
> flushing during CoW (either coherent_kernel_range or flush_dcache_page)
> slows the fork() tests a bit but the lmbench tests are relatively small
> and don't cause a lot of page CoW. This may be different for something
> like apache.
 >
> Add generic flush_prot_range() and ARM-specific implementation
> 
> From: Catalin Marinas <catalin.marinas@arm.com>
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> ---
>  arch/arm/include/asm/cacheflush.h |    3 +++
>  arch/arm/mm/cache-v6.S            |   20 ++++++++++++++++++--
>  arch/arm/mm/cache-v7.S            |   19 +++++++++++++++++--
>  arch/arm/mm/flush.c               |   19 +++++++++++++++++++
>  include/linux/highmem.h           |    8 ++++++++
>  mm/hugetlb.c                      |    2 +-
>  mm/mprotect.c                     |    2 +-
>  7 files changed, 67 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
> index 91aec14..d53832b 100644
> --- a/arch/arm/include/asm/cacheflush.h
> +++ b/arch/arm/include/asm/cacheflush.h
> @@ -391,12 +391,15 @@ flush_ptrace_access(struct vm_area_struct *vma, struct page *page,
>  	}
>  }
>  #else
> +#define ARCH_HAS_FLUSH_PROT_RANGE
>  extern void flush_cache_mm(struct mm_struct *mm);
>  extern void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned long end);
>  extern void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsigned long pfn);
>  extern void flush_ptrace_access(struct vm_area_struct *vma, struct page *page,
>  				unsigned long uaddr, void *kaddr,
>  				unsigned long len, int write);
> +extern void flush_prot_range(struct vm_area_struct *vma, unsigned long start,
> +			     unsigned long end);
>  #endif
>  
>  #define flush_cache_dup_mm(mm) flush_cache_mm(mm)
> diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
> index 8364f6c..7baa6ce 100644
> --- a/arch/arm/mm/cache-v6.S
> +++ b/arch/arm/mm/cache-v6.S
> @@ -12,6 +12,7 @@
>  #include <linux/linkage.h>
>  #include <linux/init.h>
>  #include <asm/assembler.h>
> +#include <asm/unwind.h>
>  
>  #include "proc-macros.S"
>  
> @@ -129,11 +130,13 @@ ENTRY(v6_coherent_kern_range)
>   *	- the Icache does not read data from the write buffer
>   */
>  ENTRY(v6_coherent_user_range)
> -
> + UNWIND(.fnstart		)
>  #ifdef HARVARD_CACHE
>  	bic	r0, r0, #CACHE_LINE_SIZE - 1
> -1:	mcr	p15, 0, r0, c7, c10, 1		@ clean D line
> +1:
> + USER(	mcr	p15, 0, r0, c7, c10, 1	)	@ clean D line
>  	add	r0, r0, #CACHE_LINE_SIZE
> +2:
>  	cmp	r0, r1
>  	blo	1b
>  #endif
> @@ -151,6 +154,19 @@ ENTRY(v6_coherent_user_range)
>  	mov	pc, lr
>  
>  /*
> + * Fault handling for the cache operation above. If the virtual address in r0
> + * isn't mapped, just try the next page.
> + */
> +9001:
> +	mov	r0, r0, lsr #12
> +	mov	r0, r0, lsl #12
> +	add	r0, r0, #4096
> +	b	2b
> + UNWIND(.fnend		)
> +ENDPROC(v6_coherent_user_range)
> +ENDPROC(v6_coherent_kern_range)
> +
> +/*
>   *	v6_flush_kern_dcache_page(kaddr)
>   *
>   *	Ensure that the data held in the page kaddr is written back
> diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
> index 29e6904..4b733d1 100644
> --- a/arch/arm/mm/cache-v7.S
> +++ b/arch/arm/mm/cache-v7.S
> @@ -13,6 +13,7 @@
>  #include <linux/linkage.h>
>  #include <linux/init.h>
>  #include <asm/assembler.h>
> +#include <asm/unwind.h>
>  
>  #include "proc-macros.S"
>  
> @@ -153,13 +154,16 @@ ENTRY(v7_coherent_kern_range)
>   *	- the Icache does not read data from the write buffer
>   */
>  ENTRY(v7_coherent_user_range)
> + UNWIND(.fnstart		)
>  	dcache_line_size r2, r3
>  	sub	r3, r2, #1
>  	bic	r0, r0, r3
> -1:	mcr	p15, 0, r0, c7, c11, 1		@ clean D line to the point of unification
> +1:
> + USER(	mcr	p15, 0, r0, c7, c11, 1	)	@ clean D line to the point of unification
>  	dsb
> -	mcr	p15, 0, r0, c7, c5, 1		@ invalidate I line
> + USER(	mcr	p15, 0, r0, c7, c5, 1	)	@ invalidate I line
>  	add	r0, r0, r2
> +2:
>  	cmp	r0, r1
>  	blo	1b
>  	mov	r0, #0
> @@ -167,6 +171,17 @@ ENTRY(v7_coherent_user_range)
>  	dsb
>  	isb
>  	mov	pc, lr
> +
> +/*
> + * Fault handling for the cache operation above. If the virtual address in r0
> + * isn't mapped, just try the next page.
> + */
> +9001:
> +	mov	r0, r0, lsr #12
> +	mov	r0, r0, lsl #12
> +	add	r0, r0, #4096
> +	b	2b
> + UNWIND(.fnend		)
>  ENDPROC(v7_coherent_kern_range)
>  ENDPROC(v7_coherent_user_range)
>  
> diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
> index 73b886e..ed07f4d 100644
> --- a/arch/arm/mm/flush.c
> +++ b/arch/arm/mm/flush.c
> @@ -87,6 +87,25 @@ void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned
>  	}
>  }
>  
> +void flush_prot_range(struct vm_area_struct *vma, unsigned long start,
> +		      unsigned long end)
> +{
> +	if (cache_is_vipt_nonaliasing()) {
> +		if (vma->vm_flags & VM_EXEC) {
> +			/*
> +			 * Increment the task's preempt_count so that
> +			 * in_atomic() is true and do_page_fault() does not
> +			 * try to map pages in. If a page isn't mapped yet, it
> +			 * will be ignored.
> +			 */
> +			inc_preempt_count();
> +			flush_cache_user_range(vma, start, end);
> +			dec_preempt_count();
> +		}
> +	} else
> +		flush_cache_range(vma, start, end);
> +}
> +
>  void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsigned long pfn)
>  {
>  	if (cache_is_vivt()) {
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index 13875ce..067e67d 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -19,6 +19,14 @@ static inline void flush_kernel_dcache_page(struct page *page)
>  }
>  #endif
>  
> +#ifndef ARCH_HAS_FLUSH_PROT_RANGE
> +static inline void flush_prot_range(struct vm_area_struct *vma,
> +				    unsigned long start, unsigned long end)
> +{
> +	flush_cache_range(vma, start, end);
> +}
> +#endif
> +
>  #ifdef CONFIG_HIGHMEM
>  
>  #include <asm/highmem.h>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 6058b53..7ce4f57 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2213,7 +2213,7 @@ void hugetlb_change_protection(struct vm_area_struct *vma,
>  	struct hstate *h = hstate_vma(vma);
>  
>  	BUG_ON(address >= end);
> -	flush_cache_range(vma, address, end);
> +	flush_prot_range(vma, address, end);
>  
>  	spin_lock(&vma->vm_file->f_mapping->i_mmap_lock);
>  	spin_lock(&mm->page_table_lock);
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index fded06f..a6b7616 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -123,7 +123,7 @@ static void change_protection(struct vm_area_struct *vma,
>  
>  	BUG_ON(addr >= end);
>  	pgd = pgd_offset(mm, addr);
> -	flush_cache_range(vma, addr, end);
> +	flush_prot_range(vma, addr, end);
>  	do {
>  		next = pgd_addr_end(addr, end);
>  		if (pgd_none_or_clear_bad(pgd))
> 
> 

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-23  6:03                                   ` Dirk Behme
@ 2009-09-23  9:13                                     ` Catalin Marinas
  2009-09-23 10:38                                       ` Catalin Marinas
  0 siblings, 1 reply; 72+ messages in thread
From: Catalin Marinas @ 2009-09-23  9:13 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 2009-09-23 at 08:03 +0200, Dirk Behme wrote:
> Catalin Marinas wrote:
> > On Tue, 2009-09-22 at 11:19 +0100, Catalin Marinas wrote:
> >> Yet another idea - add a generic flush_cache_range_for_mprotect()
> >> function with a specific implementation for ARM (called via
> >> change_protection).
> 
> Catalin and Russell: First many thanks for all the discussion and help 
> about this!
> 
> > The patch below looks like the best option in my opinion but requires
> > some generic kernel changes (minimal though). The patch contains the
> > ARM-specific code as well but can be split in two for pushing upstream.
> > 
> > Apart from this patch, I ran some lmbench tests and my workaround
> 
> If you talk about "workaround", do you mean patch below or patch in
> 
> http://lists.arm.linux.org.uk/lurker/message/20090817.172516.3100340a.en.html

The patch at the link above. It implements flush_cache_range with a
check for VM_EXEC but as Russell pointed out, that's set all the time,
so the mmap performance is affected. Whether this is on a critical path,
I can't say.

> > affects mmap tests quite a lot because of the read-implies-exec forcing
> > flush_cache_range() in several places. Russell's patch 
> 
> Is Russell's patch available publically somewhere? Sorry if I missed it.

It's not but just add a __cpuc_flush_dcache_page() to the
copy_to_user_highpage() function (I don't think it needs the coherent
flush since the page was first mapped with RX and update_mmu_cache()
should have taken care of the I-cache invalidation). Something like
below:

diff --git a/arch/arm/mm/copypage-v6.c b/arch/arm/mm/copypage-v6.c
index 4127a7b..f19ed4e 100644
--- a/arch/arm/mm/copypage-v6.c
+++ b/arch/arm/mm/copypage-v6.c
@@ -41,6 +41,7 @@ static void v6_copy_user_highpage_nonaliasing(struct page *to,
 	kfrom = kmap_atomic(from, KM_USER0);
 	kto = kmap_atomic(to, KM_USER1);
 	copy_page(kto, kfrom);
+	__cpuc_flush_dcache_page(kto);
 	kunmap_atomic(kto, KM_USER1);
 	kunmap_atomic(kfrom, KM_USER0);
 }

After running some benchmarks, my preference for a fix is:

     1. The patch I just posted which introduces flush_prot_range(). It
        affects generic code but if people here agree with the idea, I'm
        happy to try to push for it upstream. It allows mprotect(rx) to
        flush the caches as some people might expect and doesn't affect
        the fork or mmap performance
     2. Russell's (unposted) patch (or the one above) for flushing the
        D-cache during CoW (no I-cache invalidation necessary). There
        would be a (probably slight) performance drop for applications
        using intensive forking (apache?) but it shouldn't be that
        different from read-allocate caches. The common fork+exec
        combination shouldn't do much CoW so I don't expect this to be
        affected
     3. My initial patch for flush_cache_range but it would be better to
        disable read-implies-exec
     4. Handle the prefetch abort for exec permission and only to the
        flushing there (this patch would be a bit more complicated)

For now, I would say to go with 2 for the -rc releases (and maybe try to
convince upstream people of 1). If option 1 isn't acceptable upstream,
we tell people that mprotect(rx) does not flush the caches, so
sys_cacheflush() should be used instead.

-- 
Catalin

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-23  9:13                                     ` Catalin Marinas
@ 2009-09-23 10:38                                       ` Catalin Marinas
  2009-09-23 12:12                                         ` Mikael Pettersson
  0 siblings, 1 reply; 72+ messages in thread
From: Catalin Marinas @ 2009-09-23 10:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 2009-09-23 at 10:13 +0100, Catalin Marinas wrote:
>      1. The patch I just posted which introduces flush_prot_range(). It
>         affects generic code but if people here agree with the idea, I'm
>         happy to try to push for it upstream. It allows mprotect(rx) to
>         flush the caches as some people might expect and doesn't affect
>         the fork or mmap performance
>      2. Russell's (unposted) patch (or the one above) for flushing the
>         D-cache during CoW (no I-cache invalidation necessary). There
>         would be a (probably slight) performance drop for applications
>         using intensive forking (apache?) but it shouldn't be that
>         different from read-allocate caches. The common fork+exec
>         combination shouldn't do much CoW so I don't expect this to be
>         affected
>      3. My initial patch for flush_cache_range but it would be better to
>         disable read-implies-exec
>      4. Handle the prefetch abort for exec permission and only to the
>         flushing there (this patch would be a bit more complicated)

I did some tests with OABI_COMPAT disabled so that read-implies-exec is
also disabled. Option 3 above has no noticeable performance drop
compared to a vanilla kernel, also with OABI_COMPAT disabled.

Which would make me recommend workaround 3 if OABI compatibility is not
required. I'm not sure what Debian or Ubuntu do (I think the latter has
OABI_COMPAT disabled). If OABI is required, we should go with 2.

I've spent too much time already on this issue. There are several
solutions available, all with drawbacks. If there are no other solutions
to be proposed (other than modifying user-space), maybe we should vote
on the favourite option.

My vote: 2 (until we no longer have OABI filesystems around)

-- 
Catalin

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-23 10:38                                       ` Catalin Marinas
@ 2009-09-23 12:12                                         ` Mikael Pettersson
  2009-09-23 12:42                                           ` Russell King - ARM Linux
  0 siblings, 1 reply; 72+ messages in thread
From: Mikael Pettersson @ 2009-09-23 12:12 UTC (permalink / raw)
  To: linux-arm-kernel

Catalin Marinas writes:
 > On Wed, 2009-09-23 at 10:13 +0100, Catalin Marinas wrote:
 > >      1. The patch I just posted which introduces flush_prot_range(). It
 > >         affects generic code but if people here agree with the idea, I'm
 > >         happy to try to push for it upstream. It allows mprotect(rx) to
 > >         flush the caches as some people might expect and doesn't affect
 > >         the fork or mmap performance
 > >      2. Russell's (unposted) patch (or the one above) for flushing the
 > >         D-cache during CoW (no I-cache invalidation necessary). There
 > >         would be a (probably slight) performance drop for applications
 > >         using intensive forking (apache?) but it shouldn't be that
 > >         different from read-allocate caches. The common fork+exec
 > >         combination shouldn't do much CoW so I don't expect this to be
 > >         affected
 > >      3. My initial patch for flush_cache_range but it would be better to
 > >         disable read-implies-exec
 > >      4. Handle the prefetch abort for exec permission and only to the
 > >         flushing there (this patch would be a bit more complicated)
 > 
 > I did some tests with OABI_COMPAT disabled so that read-implies-exec is
 > also disabled. Option 3 above has no noticeable performance drop
 > compared to a vanilla kernel, also with OABI_COMPAT disabled.
 > 
 > Which would make me recommend workaround 3 if OABI compatibility is not
 > required. I'm not sure what Debian or Ubuntu do (I think the latter has
 > OABI_COMPAT disabled). If OABI is required, we should go with 2.
 > 
 > I've spent too much time already on this issue. There are several
 > solutions available, all with drawbacks. If there are no other solutions
 > to be proposed (other than modifying user-space), maybe we should vote
 > on the favourite option.
 > 
 > My vote: 2 (until we no longer have OABI filesystems around)

OABI is obsolete legacy, so while it should (perhaps) be
kept working, it should not be allowed to negatively affect
the performance of pure EABI systems.

If option 3 is best for EABI then that's my vote.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-23 12:12                                         ` Mikael Pettersson
@ 2009-09-23 12:42                                           ` Russell King - ARM Linux
  2009-09-23 12:51                                             ` Catalin Marinas
  0 siblings, 1 reply; 72+ messages in thread
From: Russell King - ARM Linux @ 2009-09-23 12:42 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Sep 23, 2009 at 02:12:23PM +0200, Mikael Pettersson wrote:
> Catalin Marinas writes:
>  > On Wed, 2009-09-23 at 10:13 +0100, Catalin Marinas wrote:
>  > >      1. The patch I just posted which introduces flush_prot_range(). It
>  > >         affects generic code but if people here agree with the idea, I'm
>  > >         happy to try to push for it upstream. It allows mprotect(rx) to
>  > >         flush the caches as some people might expect and doesn't affect
>  > >         the fork or mmap performance
>  > >      2. Russell's (unposted) patch (or the one above) for flushing the
>  > >         D-cache during CoW (no I-cache invalidation necessary). There
>  > >         would be a (probably slight) performance drop for applications
>  > >         using intensive forking (apache?) but it shouldn't be that
>  > >         different from read-allocate caches. The common fork+exec
>  > >         combination shouldn't do much CoW so I don't expect this to be
>  > >         affected
>  > >      3. My initial patch for flush_cache_range but it would be better to
>  > >         disable read-implies-exec
>  > >      4. Handle the prefetch abort for exec permission and only to the
>  > >         flushing there (this patch would be a bit more complicated)
>  > 
>  > I did some tests with OABI_COMPAT disabled so that read-implies-exec is
>  > also disabled. Option 3 above has no noticeable performance drop
>  > compared to a vanilla kernel, also with OABI_COMPAT disabled.
>  > 
>  > Which would make me recommend workaround 3 if OABI compatibility is not
>  > required. I'm not sure what Debian or Ubuntu do (I think the latter has
>  > OABI_COMPAT disabled). If OABI is required, we should go with 2.
>  > 
>  > I've spent too much time already on this issue. There are several
>  > solutions available, all with drawbacks. If there are no other solutions
>  > to be proposed (other than modifying user-space), maybe we should vote
>  > on the favourite option.
>  > 
>  > My vote: 2 (until we no longer have OABI filesystems around)
> 
> OABI is obsolete legacy, so while it should (perhaps) be
> kept working, it should not be allowed to negatively affect
> the performance of pure EABI systems.
> 
> If option 3 is best for EABI then that's my vote.

However, Catalins benchmarks don't include testing my patch with EABI
executables with RIX disabled, so we really don't know how it would
behave.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-23 12:42                                           ` Russell King - ARM Linux
@ 2009-09-23 12:51                                             ` Catalin Marinas
  2009-09-23 12:55                                               ` Catalin Marinas
  0 siblings, 1 reply; 72+ messages in thread
From: Catalin Marinas @ 2009-09-23 12:51 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 2009-09-23 at 13:42 +0100, Russell King - ARM Linux wrote:
> On Wed, Sep 23, 2009 at 02:12:23PM +0200, Mikael Pettersson wrote:
> > OABI is obsolete legacy, so while it should (perhaps) be
> > kept working, it should not be allowed to negatively affect
> > the performance of pure EABI systems.
> > 
> > If option 3 is best for EABI then that's my vote.
> 
> However, Catalins benchmarks don't include testing my patch with EABI
> executables with RIX disabled, so we really don't know how it would
> behave.

I'll do it shortly, no problem. But since it didn't take VM_EXEC into
account, I thought it wouldn't really matter. But I see where you come
from - just for comparison with my workaround.

-- 
Catalin

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-23 12:51                                             ` Catalin Marinas
@ 2009-09-23 12:55                                               ` Catalin Marinas
  0 siblings, 0 replies; 72+ messages in thread
From: Catalin Marinas @ 2009-09-23 12:55 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 2009-09-23 at 13:51 +0100, Catalin Marinas wrote:
> On Wed, 2009-09-23 at 13:42 +0100, Russell King - ARM Linux wrote:
> > On Wed, Sep 23, 2009 at 02:12:23PM +0200, Mikael Pettersson wrote:
> > > OABI is obsolete legacy, so while it should (perhaps) be
> > > kept working, it should not be allowed to negatively affect
> > > the performance of pure EABI systems.
> > > 
> > > If option 3 is best for EABI then that's my vote.
> > 
> > However, Catalins benchmarks don't include testing my patch with EABI
> > executables with RIX disabled, so we really don't know how it would
> > behave.
> 
> I'll do it shortly, no problem. But since it didn't take VM_EXEC into
> account, I thought it wouldn't really matter. But I see where you come
> from - just for comparison with my workaround.

Forgot to mention - I don't think lmbench is too relevant to CoW
performance since the applications are relatively small. But that's the
problem with any benchmark application which was designed to be, well, a
benchmark application. We would need real-world applications but I don't
have time for further investigation here.

-- 
Catalin

^ permalink raw reply	[flat|nested] 72+ messages in thread

* smsc911x.c driver and SMP
  2009-09-11 16:06                         ` Catalin Marinas
@ 2009-10-06  6:12                           ` Antti P Miettinen
  2010-08-31  0:07                             ` Shinya Kuribayashi
  0 siblings, 1 reply; 72+ messages in thread
From: Antti P Miettinen @ 2009-10-06  6:12 UTC (permalink / raw)
  To: linux-arm-kernel

Catalin Marinas <catalin.marinas@arm.com> writes:
> Fixed it with this new commit (I'm not posting the patch here as it's
> not meant for review on this list):
>
> http://www.linux-arm.org/git?p=linux-2.6.git;a=commitdiff;h=3cc8ee538f6727d1f6773fff6d8e31bc5522bfee

Just a piece of information: with vanilla 2.6.31 boot from NFS root on
my PB11MPCore hangs, with this patch applied, boot succeeds.

-- 
http://www.iki.fi/~ananaza/

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-09-21 10:07                           ` Russell King - ARM Linux
                                               ` (2 preceding siblings ...)
  2009-09-21 21:38                             ` Russell King - ARM Linux
@ 2009-10-15 14:57                             ` Russell King - ARM Linux
  2009-10-15 15:20                               ` Catalin Marinas
  3 siblings, 1 reply; 72+ messages in thread
From: Russell King - ARM Linux @ 2009-10-15 14:57 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 21, 2009 at 11:07:51AM +0100, Russell King - ARM Linux wrote:
> On Mon, Sep 21, 2009 at 10:44:23AM +0100, Catalin Marinas wrote:
> > We would need to fix this somehow as well. We currently handle the
> > I-cache in update_mmu_cache() when a page is first mapped if it has
> > VM_EXEC set.
> 
> The reason I'm pushing you hard to separate the two issues is that the
> two should be treated separately.  I think we need to consider ensuring
> that freed pages do not have any I-cache lines associated with them,
> rather than waiting for them to be allocated and then dealing with the
> I-cache problem.

Having now benchmarked this (making flush_cache_* always invalidate
the I-cache, so free'd pages are I-cache clean), and to me, the results
quite look promising - please try out this patch.

diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index d0d17b6..b9c1cd4 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -160,8 +160,6 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr, pte_t pte)
 	if (mapping) {
 		if (cache_is_vivt())
 			make_coherent(mapping, vma, addr, pfn);
-		else if (vma->vm_flags & VM_EXEC)
-			__flush_icache_all();
 	}
 }
 
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index b279429..5b739b4 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -20,6 +20,12 @@
 
 #ifdef CONFIG_ARM_ERRATA_411920
 extern void v6_icache_inval_all(void);
+#else
+static inline void v6_icache_inval_all(void)
+{
+	const int zero = 0;
+	asm("mcr	p15, 0, %0, c7, c5, 0" : : "r" (zero) : "cc");
+}
 #endif
 
 #ifdef CONFIG_CPU_CACHE_VIPT
@@ -35,16 +41,10 @@ static void flush_pfn_alias(unsigned long pfn, unsigned long vaddr)
 	flush_tlb_kernel_page(to);
 
 	asm(	"mcrr	p15, 0, %1, %0, c14\n"
-	"	mcr	p15, 0, %2, c7, c10, 4\n"
-#ifndef CONFIG_ARM_ERRATA_411920
-	"	mcr	p15, 0, %2, c7, c5, 0\n"
-#endif
+	"	mcr	p15, 0, %2, c7, c10, 4"
 	    :
 	    : "r" (to), "r" (to + PAGE_SIZE - L1_CACHE_BYTES), "r" (zero)
 	    : "cc");
-#ifdef CONFIG_ARM_ERRATA_411920
-	v6_icache_inval_all();
-#endif
 }
 
 void flush_cache_mm(struct mm_struct *mm)
@@ -57,17 +57,12 @@ void flush_cache_mm(struct mm_struct *mm)
 
 	if (cache_is_vipt_aliasing()) {
 		asm(	"mcr	p15, 0, %0, c7, c14, 0\n"
-		"	mcr	p15, 0, %0, c7, c10, 4\n"
-#ifndef CONFIG_ARM_ERRATA_411920
-		"	mcr	p15, 0, %0, c7, c5, 0\n"
-#endif
+		"	mcr	p15, 0, %0, c7, c10, 4"
 		    :
 		    : "r" (0)
 		    : "cc");
-#ifdef CONFIG_ARM_ERRATA_411920
-		v6_icache_inval_all();
-#endif
 	}
+	v6_icache_inval_all();
 }
 
 void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned long end)
@@ -81,17 +76,12 @@ void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned
 
 	if (cache_is_vipt_aliasing()) {
 		asm(	"mcr	p15, 0, %0, c7, c14, 0\n"
-		"	mcr	p15, 0, %0, c7, c10, 4\n"
-#ifndef CONFIG_ARM_ERRATA_411920
-		"	mcr	p15, 0, %0, c7, c5, 0\n"
-#endif
+		"	mcr	p15, 0, %0, c7, c10, 4"
 		    :
 		    : "r" (0)
 		    : "cc");
-#ifdef CONFIG_ARM_ERRATA_411920
-		v6_icache_inval_all();
-#endif
 	}
+	v6_icache_inval_all();
 }
 
 void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsigned long pfn)
@@ -106,6 +96,7 @@ void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsig
 
 	if (cache_is_vipt_aliasing())
 		flush_pfn_alias(pfn, user_addr);
+	v6_icache_inval_all();
 }
 
 void flush_ptrace_access(struct vm_area_struct *vma, struct page *page,
@@ -226,8 +217,6 @@ void flush_dcache_page(struct page *page)
 		__flush_dcache_page(mapping, page);
 		if (mapping && cache_is_vivt())
 			__flush_dcache_aliases(mapping, page);
-		else if (mapping)
-			__flush_icache_all();
 	}
 }
 EXPORT_SYMBOL(flush_dcache_page);

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-10-15 14:57                             ` Russell King - ARM Linux
@ 2009-10-15 15:20                               ` Catalin Marinas
  2009-10-15 15:28                                 ` Russell King - ARM Linux
                                                   ` (2 more replies)
  0 siblings, 3 replies; 72+ messages in thread
From: Catalin Marinas @ 2009-10-15 15:20 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 2009-10-15 at 15:57 +0100, Russell King - ARM Linux wrote:
> On Mon, Sep 21, 2009 at 11:07:51AM +0100, Russell King - ARM Linux wrote:
> > On Mon, Sep 21, 2009 at 10:44:23AM +0100, Catalin Marinas wrote:
> > > We would need to fix this somehow as well. We currently handle the
> > > I-cache in update_mmu_cache() when a page is first mapped if it has
> > > VM_EXEC set.
> > 
> > The reason I'm pushing you hard to separate the two issues is that the
> > two should be treated separately.  I think we need to consider ensuring
> > that freed pages do not have any I-cache lines associated with them,
> > rather than waiting for them to be allocated and then dealing with the
> > I-cache problem.
> 
> Having now benchmarked this (making flush_cache_* always invalidate
> the I-cache, so free'd pages are I-cache clean), and to me, the results
> quite look promising - please try out this patch.
> 
> diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
> index d0d17b6..b9c1cd4 100644
> --- a/arch/arm/mm/fault-armv.c
> +++ b/arch/arm/mm/fault-armv.c
> @@ -160,8 +160,6 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr, pte_t pte)
>  	if (mapping) {
>  		if (cache_is_vivt())
>  			make_coherent(mapping, vma, addr, pfn);
> -		else if (vma->vm_flags & VM_EXEC)
> -			__flush_icache_all();
>  	}
>  }

Before trying the patch, I don't entirely agree with the approach. You
will get speculative fetches in the I-cache via the kernel linear
mapping (where NX is always cleared) on newer processors and may end up
with random faults in user space (not that likely but not impossible
either).

-- 
Catalin

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-10-15 15:20                               ` Catalin Marinas
@ 2009-10-15 15:28                                 ` Russell King - ARM Linux
  2009-10-15 15:56                                   ` Catalin Marinas
  2009-10-15 15:48                                 ` Dirk Behme
  2009-10-25 13:04                                 ` Russell King - ARM Linux
  2 siblings, 1 reply; 72+ messages in thread
From: Russell King - ARM Linux @ 2009-10-15 15:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Oct 15, 2009 at 04:20:22PM +0100, Catalin Marinas wrote:
> On Thu, 2009-10-15 at 15:57 +0100, Russell King - ARM Linux wrote:
> > On Mon, Sep 21, 2009 at 11:07:51AM +0100, Russell King - ARM Linux wrote:
> > > On Mon, Sep 21, 2009 at 10:44:23AM +0100, Catalin Marinas wrote:
> > > > We would need to fix this somehow as well. We currently handle the
> > > > I-cache in update_mmu_cache() when a page is first mapped if it has
> > > > VM_EXEC set.
> > > 
> > > The reason I'm pushing you hard to separate the two issues is that the
> > > two should be treated separately.  I think we need to consider ensuring
> > > that freed pages do not have any I-cache lines associated with them,
> > > rather than waiting for them to be allocated and then dealing with the
> > > I-cache problem.
> > 
> > Having now benchmarked this (making flush_cache_* always invalidate
> > the I-cache, so free'd pages are I-cache clean), and to me, the results
> > quite look promising - please try out this patch.
> > 
> > diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
> > index d0d17b6..b9c1cd4 100644
> > --- a/arch/arm/mm/fault-armv.c
> > +++ b/arch/arm/mm/fault-armv.c
> > @@ -160,8 +160,6 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr, pte_t pte)
> >  	if (mapping) {
> >  		if (cache_is_vivt())
> >  			make_coherent(mapping, vma, addr, pfn);
> > -		else if (vma->vm_flags & VM_EXEC)
> > -			__flush_icache_all();
> >  	}
> >  }
> 
> Before trying the patch, I don't entirely agree with the approach. You
> will get speculative fetches in the I-cache via the kernel linear
> mapping (where NX is always cleared) on newer processors and may end up
> with random faults in user space (not that likely but not impossible
> either).

That means we have no option but to flush the I-cache every time a page
is placed into userspace - we might as well make update_mmu_cache
unconditionally flush the I-cache every time its called.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-10-15 15:20                               ` Catalin Marinas
  2009-10-15 15:28                                 ` Russell King - ARM Linux
@ 2009-10-15 15:48                                 ` Dirk Behme
  2009-10-15 15:53                                   ` Catalin Marinas
  2009-10-25 13:04                                 ` Russell King - ARM Linux
  2 siblings, 1 reply; 72+ messages in thread
From: Dirk Behme @ 2009-10-15 15:48 UTC (permalink / raw)
  To: linux-arm-kernel

Catalin Marinas wrote:
> On Thu, 2009-10-15 at 15:57 +0100, Russell King - ARM Linux wrote:
>> On Mon, Sep 21, 2009 at 11:07:51AM +0100, Russell King - ARM Linux wrote:
>>> On Mon, Sep 21, 2009 at 10:44:23AM +0100, Catalin Marinas wrote:
>>>> We would need to fix this somehow as well. We currently handle the
>>>> I-cache in update_mmu_cache() when a page is first mapped if it has
>>>> VM_EXEC set.
>>> The reason I'm pushing you hard to separate the two issues is that the
>>> two should be treated separately.  I think we need to consider ensuring
>>> that freed pages do not have any I-cache lines associated with them,
>>> rather than waiting for them to be allocated and then dealing with the
>>> I-cache problem.
>> Having now benchmarked this (making flush_cache_* always invalidate
>> the I-cache, so free'd pages are I-cache clean), and to me, the results
>> quite look promising - please try out this patch.
>>
>> diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
>> index d0d17b6..b9c1cd4 100644
>> --- a/arch/arm/mm/fault-armv.c
>> +++ b/arch/arm/mm/fault-armv.c
>> @@ -160,8 +160,6 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr, pte_t pte)
>>  	if (mapping) {
>>  		if (cache_is_vivt())
>>  			make_coherent(mapping, vma, addr, pfn);
>> -		else if (vma->vm_flags & VM_EXEC)
>> -			__flush_icache_all();
>>  	}
>>  }
> 
> Before trying the patch, I don't entirely agree with the approach. You
> will get speculative fetches in the I-cache via the kernel linear
> mapping (where NX is always cleared) on newer processors and may end up
> with random faults in user space (not that likely but not impossible
> either).

Just to get a better understanding: With "newer processors" you mean 
Cortex A9? So for ARM11 MPCore the approach should be fine?

Thanks

Dirk

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-10-15 15:48                                 ` Dirk Behme
@ 2009-10-15 15:53                                   ` Catalin Marinas
  0 siblings, 0 replies; 72+ messages in thread
From: Catalin Marinas @ 2009-10-15 15:53 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 2009-10-15 at 17:48 +0200, Dirk Behme wrote:
> Catalin Marinas wrote:
> > On Thu, 2009-10-15 at 15:57 +0100, Russell King - ARM Linux wrote:
> >> On Mon, Sep 21, 2009 at 11:07:51AM +0100, Russell King - ARM Linux wrote:
> >>> On Mon, Sep 21, 2009 at 10:44:23AM +0100, Catalin Marinas wrote:
> >>>> We would need to fix this somehow as well. We currently handle the
> >>>> I-cache in update_mmu_cache() when a page is first mapped if it has
> >>>> VM_EXEC set.
> >>> The reason I'm pushing you hard to separate the two issues is that the
> >>> two should be treated separately.  I think we need to consider ensuring
> >>> that freed pages do not have any I-cache lines associated with them,
> >>> rather than waiting for them to be allocated and then dealing with the
> >>> I-cache problem.
> >> Having now benchmarked this (making flush_cache_* always invalidate
> >> the I-cache, so free'd pages are I-cache clean), and to me, the results
> >> quite look promising - please try out this patch.
> >>
> >> diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
> >> index d0d17b6..b9c1cd4 100644
> >> --- a/arch/arm/mm/fault-armv.c
> >> +++ b/arch/arm/mm/fault-armv.c
> >> @@ -160,8 +160,6 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr, pte_t pte)
> >>  	if (mapping) {
> >>  		if (cache_is_vivt())
> >>  			make_coherent(mapping, vma, addr, pfn);
> >> -		else if (vma->vm_flags & VM_EXEC)
> >> -			__flush_icache_all();
> >>  	}
> >>  }
> > 
> > Before trying the patch, I don't entirely agree with the approach. You
> > will get speculative fetches in the I-cache via the kernel linear
> > mapping (where NX is always cleared) on newer processors and may end up
> > with random faults in user space (not that likely but not impossible
> > either).
> 
> Just to get a better understanding: With "newer processors" you mean 
> Cortex A9? So for ARM11 MPCore the approach should be fine?

Yes, Cortex-A9.

-- 
Catalin

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-10-15 15:28                                 ` Russell King - ARM Linux
@ 2009-10-15 15:56                                   ` Catalin Marinas
  2009-10-20 11:39                                     ` Catalin Marinas
  0 siblings, 1 reply; 72+ messages in thread
From: Catalin Marinas @ 2009-10-15 15:56 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 2009-10-15 at 16:28 +0100, Russell King - ARM Linux wrote:
> On Thu, Oct 15, 2009 at 04:20:22PM +0100, Catalin Marinas wrote:
> > On Thu, 2009-10-15 at 15:57 +0100, Russell King - ARM Linux wrote:
> > > On Mon, Sep 21, 2009 at 11:07:51AM +0100, Russell King - ARM Linux wrote:
> > > > On Mon, Sep 21, 2009 at 10:44:23AM +0100, Catalin Marinas wrote:
> > > > > We would need to fix this somehow as well. We currently handle the
> > > > > I-cache in update_mmu_cache() when a page is first mapped if it has
> > > > > VM_EXEC set.
> > > > 
> > > > The reason I'm pushing you hard to separate the two issues is that the
> > > > two should be treated separately.  I think we need to consider ensuring
> > > > that freed pages do not have any I-cache lines associated with them,
> > > > rather than waiting for them to be allocated and then dealing with the
> > > > I-cache problem.
> > > 
> > > Having now benchmarked this (making flush_cache_* always invalidate
> > > the I-cache, so free'd pages are I-cache clean), and to me, the results
> > > quite look promising - please try out this patch.
> > > 
> > > diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
> > > index d0d17b6..b9c1cd4 100644
> > > --- a/arch/arm/mm/fault-armv.c
> > > +++ b/arch/arm/mm/fault-armv.c
> > > @@ -160,8 +160,6 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr, pte_t pte)
> > >  	if (mapping) {
> > >  		if (cache_is_vivt())
> > >  			make_coherent(mapping, vma, addr, pfn);
> > > -		else if (vma->vm_flags & VM_EXEC)
> > > -			__flush_icache_all();
> > >  	}
> > >  }
> > 
> > Before trying the patch, I don't entirely agree with the approach. You
> > will get speculative fetches in the I-cache via the kernel linear
> > mapping (where NX is always cleared) on newer processors and may end up
> > with random faults in user space (not that likely but not impossible
> > either).
> 
> That means we have no option but to flush the I-cache every time a page
> is placed into userspace - we might as well make update_mmu_cache
> unconditionally flush the I-cache every time its called.

We have to look at more than one case: COW text pages, mprotext(RX),
Nitin's scenario with swap pages.

We can flush the D-cache in copy_user_page(), maybe lazily via
flush_dcache_page() and invalidate the I-cache in update_mmu_cache() if
PG_arch_1 (ignoring VM_EXEC).

-- 
Catalin

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-10-15 15:56                                   ` Catalin Marinas
@ 2009-10-20 11:39                                     ` Catalin Marinas
  2009-10-25 13:39                                       ` Russell King - ARM Linux
  2009-10-25 14:48                                       ` Russell King - ARM Linux
  0 siblings, 2 replies; 72+ messages in thread
From: Catalin Marinas @ 2009-10-20 11:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 2009-10-15 at 16:56 +0100, Catalin Marinas wrote:
> On Thu, 2009-10-15 at 16:28 +0100, Russell King - ARM Linux wrote:
> > On Thu, Oct 15, 2009 at 04:20:22PM +0100, Catalin Marinas wrote:
> > > On Thu, 2009-10-15 at 15:57 +0100, Russell King - ARM Linux wrote:
> > > > On Mon, Sep 21, 2009 at 11:07:51AM +0100, Russell King - ARM Linux wrote:
> > > > > On Mon, Sep 21, 2009 at 10:44:23AM +0100, Catalin Marinas wrote:
> > > > > > We would need to fix this somehow as well. We currently handle the
> > > > > > I-cache in update_mmu_cache() when a page is first mapped if it has
> > > > > > VM_EXEC set.
> > > > > 
> > > > > The reason I'm pushing you hard to separate the two issues is that the
> > > > > two should be treated separately.  I think we need to consider ensuring
> > > > > that freed pages do not have any I-cache lines associated with them,
> > > > > rather than waiting for them to be allocated and then dealing with the
> > > > > I-cache problem.
> > > > 
> > > > Having now benchmarked this (making flush_cache_* always invalidate
> > > > the I-cache, so free'd pages are I-cache clean), and to me, the results
> > > > quite look promising - please try out this patch.
[...]
> > > Before trying the patch, I don't entirely agree with the approach. You
> > > will get speculative fetches in the I-cache via the kernel linear
> > > mapping (where NX is always cleared) on newer processors and may end up
> > > with random faults in user space (not that likely but not impossible
> > > either).
> > 
> > That means we have no option but to flush the I-cache every time a page
> > is placed into userspace - we might as well make update_mmu_cache
> > unconditionally flush the I-cache every time its called.
[...]
> We can flush the D-cache in copy_user_page(), maybe lazily via
> flush_dcache_page() and invalidate the I-cache in update_mmu_cache() if
> PG_arch_1 (ignoring VM_EXEC).

Something like below (based on your original suggestion for flushing the
D-cache in copy_user_highpage).

BTW, the cache flushing code in Linux can be optimised a bit more on
VIPT caches:

      * __cpuc_flush_dcache_page() could cope with just D-cache clean
        rather than clean+invalidate
      * whole I-cache invalidation was needed for some ARM1136 erratum.
        We can conditionally revert it to invalidating a range
      * Cortex-A9 SMP kernels can go back to lazy cache flushing (I have
        a patch for this that works fine but I need to add some extra
        safety checks of the CPU ID registers to make sure that the
        feature is present)



Flush the D-cache during copy_user_highpage()

From: Catalin Marinas <catalin.marinas@arm.com>

The I and D caches for copy-on-write pages on processors with
write-allocate caches become incoherent causing problems on application
relying on CoW for text pages (dynamic linker relocating symbols in a
text page). This patch flushes the D-cache for such pages (possibly
lazily via update_mmu_cache which also takes care of the I-cache).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm/mm/copypage-v6.c |    1 +
 arch/arm/mm/fault-armv.c  |    3 +--
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/mm/copypage-v6.c b/arch/arm/mm/copypage-v6.c
index 4127a7b..e61fdc8 100644
--- a/arch/arm/mm/copypage-v6.c
+++ b/arch/arm/mm/copypage-v6.c
@@ -43,6 +43,7 @@ static void v6_copy_user_highpage_nonaliasing(struct page *to,
 	copy_page(kto, kfrom);
 	kunmap_atomic(kto, KM_USER1);
 	kunmap_atomic(kfrom, KM_USER0);
+	flush_dcache_page(to);
 }
 
 /*
diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index d0d17b6..4e37ab6 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -160,8 +160,7 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr, pte_t pte)
 	if (mapping) {
 		if (cache_is_vivt())
 			make_coherent(mapping, vma, addr, pfn);
-		else if (vma->vm_flags & VM_EXEC)
-			__flush_icache_all();
+		__flush_icache_all();
 	}
 }
 

-- 
Catalin

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-10-15 15:20                               ` Catalin Marinas
  2009-10-15 15:28                                 ` Russell King - ARM Linux
  2009-10-15 15:48                                 ` Dirk Behme
@ 2009-10-25 13:04                                 ` Russell King - ARM Linux
  2009-10-26 18:18                                   ` Catalin Marinas
  2 siblings, 1 reply; 72+ messages in thread
From: Russell King - ARM Linux @ 2009-10-25 13:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Oct 15, 2009 at 04:20:22PM +0100, Catalin Marinas wrote:
> On Thu, 2009-10-15 at 15:57 +0100, Russell King - ARM Linux wrote:
> > On Mon, Sep 21, 2009 at 11:07:51AM +0100, Russell King - ARM Linux wrote:
> > > On Mon, Sep 21, 2009 at 10:44:23AM +0100, Catalin Marinas wrote:
> > > > We would need to fix this somehow as well. We currently handle the
> > > > I-cache in update_mmu_cache() when a page is first mapped if it has
> > > > VM_EXEC set.
> > > 
> > > The reason I'm pushing you hard to separate the two issues is that the
> > > two should be treated separately.  I think we need to consider ensuring
> > > that freed pages do not have any I-cache lines associated with them,
> > > rather than waiting for them to be allocated and then dealing with the
> > > I-cache problem.
> > 
> > Having now benchmarked this (making flush_cache_* always invalidate
> > the I-cache, so free'd pages are I-cache clean), and to me, the results
> > quite look promising - please try out this patch.
> > 
> > diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
> > index d0d17b6..b9c1cd4 100644
> > --- a/arch/arm/mm/fault-armv.c
> > +++ b/arch/arm/mm/fault-armv.c
> > @@ -160,8 +160,6 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr, pte_t pte)
> >  	if (mapping) {
> >  		if (cache_is_vivt())
> >  			make_coherent(mapping, vma, addr, pfn);
> > -		else if (vma->vm_flags & VM_EXEC)
> > -			__flush_icache_all();
> >  	}
> >  }
> 
> Before trying the patch, I don't entirely agree with the approach. You
> will get speculative fetches in the I-cache via the kernel linear
> mapping (where NX is always cleared) on newer processors and may end up
> with random faults in user space (not that likely but not impossible
> either).

BTW, if this is your view, we should get rid of the I-cache flushing for
the aliasing VIPT caches in the flush_cache_* functions - to make them
behave in the same way as VIPT non-aliasing.

Having these functions behave in one way for VIPT non-aliasing and
a different way for VIPT aliasing caches as far as I/D coherency is
stupid.

However, as a general point, Linux's cache flushing APIs are predicated
on the notion that there is no speculative fetches going on - for
instance, when mappings are removed, the following procedure is
adhered to:

	flush_cache();
	change page tables
	flush_tlb();

I suspect that the right solution to the problems with speculative
fetches is that we need to invent a new cache API to adequately cover
this - and since we'll be the only architecture needing the API, you
can expect it to have a very high maintainence cost.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-10-20 11:39                                     ` Catalin Marinas
@ 2009-10-25 13:39                                       ` Russell King - ARM Linux
  2009-10-26 18:40                                         ` Catalin Marinas
  2009-10-25 14:48                                       ` Russell King - ARM Linux
  1 sibling, 1 reply; 72+ messages in thread
From: Russell King - ARM Linux @ 2009-10-25 13:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Oct 20, 2009 at 12:39:08PM +0100, Catalin Marinas wrote:
> On Thu, 2009-10-15 at 16:56 +0100, Catalin Marinas wrote:
> > On Thu, 2009-10-15 at 16:28 +0100, Russell King - ARM Linux wrote:
> > > On Thu, Oct 15, 2009 at 04:20:22PM +0100, Catalin Marinas wrote:
> > > > On Thu, 2009-10-15 at 15:57 +0100, Russell King - ARM Linux wrote:
> > > > > On Mon, Sep 21, 2009 at 11:07:51AM +0100, Russell King - ARM Linux wrote:
> > > > > > On Mon, Sep 21, 2009 at 10:44:23AM +0100, Catalin Marinas wrote:
> > > > > > > We would need to fix this somehow as well. We currently handle the
> > > > > > > I-cache in update_mmu_cache() when a page is first mapped if it has
> > > > > > > VM_EXEC set.
> > > > > > 
> > > > > > The reason I'm pushing you hard to separate the two issues is that the
> > > > > > two should be treated separately.  I think we need to consider ensuring
> > > > > > that freed pages do not have any I-cache lines associated with them,
> > > > > > rather than waiting for them to be allocated and then dealing with the
> > > > > > I-cache problem.
> > > > > 
> > > > > Having now benchmarked this (making flush_cache_* always invalidate
> > > > > the I-cache, so free'd pages are I-cache clean), and to me, the results
> > > > > quite look promising - please try out this patch.
> [...]
> > > > Before trying the patch, I don't entirely agree with the approach. You
> > > > will get speculative fetches in the I-cache via the kernel linear
> > > > mapping (where NX is always cleared) on newer processors and may end up
> > > > with random faults in user space (not that likely but not impossible
> > > > either).
> > > 
> > > That means we have no option but to flush the I-cache every time a page
> > > is placed into userspace - we might as well make update_mmu_cache
> > > unconditionally flush the I-cache every time its called.
> [...]
> > We can flush the D-cache in copy_user_page(), maybe lazily via
> > flush_dcache_page() and invalidate the I-cache in update_mmu_cache() if
> > PG_arch_1 (ignoring VM_EXEC).
> 
> Something like below (based on your original suggestion for flushing the
> D-cache in copy_user_highpage).
> 
> BTW, the cache flushing code in Linux can be optimised a bit more on
> VIPT caches:
> 
>       * __cpuc_flush_dcache_page() could cope with just D-cache clean
>         rather than clean+invalidate

No it can not - that breaks shared mappings.  The problem is that
flush_dcache_page() is used in two circumstances.  These are described
in more detail in cachetlb.txt, but briefly:

1. After the kernel writes to its mapping for a page cache page, and needs
   to ensure that those writes are visible to shared mmap()s in userspace.

2. Before the kernel reads from its mapping for a page cache page, and
   needs to ensure that it reads up to date data written by userspace
   into those mappings.

So just cleaning the D-cache means that (2) will return stale data.

>       * whole I-cache invalidation was needed for some ARM1136 erratum.
>         We can conditionally revert it to invalidating a range

That's not what the commit (826cbda) says which implemented it.  Also,
since we have broken I-cache flushes even with the erratum enabled,
let's fix the work-around and re-evaluate the situation before changing
anything.

It could be that some of the I-cache problems are caused by the improperly
fixed erratum.

> Flush the D-cache during copy_user_highpage()
> 
> From: Catalin Marinas <catalin.marinas@arm.com>
> 
> The I and D caches for copy-on-write pages on processors with
> write-allocate caches become incoherent causing problems on application
> relying on CoW for text pages (dynamic linker relocating symbols in a
> text page). This patch flushes the D-cache for such pages (possibly
> lazily via update_mmu_cache which also takes care of the I-cache).

Actually, I think this is caused by a missing I-cache flush in
flush_cache_range().  Adding that flush should resolve this problem
in hand (and make VIPT aliasing and VIPT non-aliasing behave in the
same manner.)  That's something which my patch previously posted in
this thread did.

Note also that with ASID tagged VIVT I-cache, we are missing out
on cache flushing.  As you've identified, it's entirely possible
for text page translations to be changed, and according to B3.4.1
bullet 2, a flush is required.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-10-20 11:39                                     ` Catalin Marinas
  2009-10-25 13:39                                       ` Russell King - ARM Linux
@ 2009-10-25 14:48                                       ` Russell King - ARM Linux
  2009-10-26 18:45                                         ` Catalin Marinas
  1 sibling, 1 reply; 72+ messages in thread
From: Russell King - ARM Linux @ 2009-10-25 14:48 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Oct 20, 2009 at 12:39:08PM +0100, Catalin Marinas wrote:
> diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
> index d0d17b6..4e37ab6 100644
> --- a/arch/arm/mm/fault-armv.c
> +++ b/arch/arm/mm/fault-armv.c
> @@ -160,8 +160,7 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr, pte_t pte)
>  	if (mapping) {
>  		if (cache_is_vivt())
>  			make_coherent(mapping, vma, addr, pfn);
> -		else if (vma->vm_flags & VM_EXEC)
> -			__flush_icache_all();
> +		__flush_icache_all();

BTW, why are you penalising VIVT cached systems as well with this
change?  Surely you meant to leave the 'else' still there.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-10-25 13:04                                 ` Russell King - ARM Linux
@ 2009-10-26 18:18                                   ` Catalin Marinas
  0 siblings, 0 replies; 72+ messages in thread
From: Catalin Marinas @ 2009-10-26 18:18 UTC (permalink / raw)
  To: linux-arm-kernel

On Sun, 2009-10-25 at 13:04 +0000, Russell King - ARM Linux wrote:
> On Thu, Oct 15, 2009 at 04:20:22PM +0100, Catalin Marinas wrote:
> > On Thu, 2009-10-15 at 15:57 +0100, Russell King - ARM Linux wrote:
> > > On Mon, Sep 21, 2009 at 11:07:51AM +0100, Russell King - ARM Linux wrote:
> > > > On Mon, Sep 21, 2009 at 10:44:23AM +0100, Catalin Marinas wrote:
> > > > > We would need to fix this somehow as well. We currently handle the
> > > > > I-cache in update_mmu_cache() when a page is first mapped if it has
> > > > > VM_EXEC set.
> > > > 
> > > > The reason I'm pushing you hard to separate the two issues is that the
> > > > two should be treated separately.  I think we need to consider ensuring
> > > > that freed pages do not have any I-cache lines associated with them,
> > > > rather than waiting for them to be allocated and then dealing with the
> > > > I-cache problem.
> > > 
> > > Having now benchmarked this (making flush_cache_* always invalidate
> > > the I-cache, so free'd pages are I-cache clean), and to me, the results
> > > quite look promising - please try out this patch.
> > > 
> > > diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
> > > index d0d17b6..b9c1cd4 100644
> > > --- a/arch/arm/mm/fault-armv.c
> > > +++ b/arch/arm/mm/fault-armv.c
> > > @@ -160,8 +160,6 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr, pte_t pte)
> > >  	if (mapping) {
> > >  		if (cache_is_vivt())
> > >  			make_coherent(mapping, vma, addr, pfn);
> > > -		else if (vma->vm_flags & VM_EXEC)
> > > -			__flush_icache_all();
> > >  	}
> > >  }
> > 
> > Before trying the patch, I don't entirely agree with the approach. You
> > will get speculative fetches in the I-cache via the kernel linear
> > mapping (where NX is always cleared) on newer processors and may end up
> > with random faults in user space (not that likely but not impossible
> > either).
> 
> BTW, if this is your view, we should get rid of the I-cache flushing for
> the aliasing VIPT caches in the flush_cache_* functions - to make them
> behave in the same way as VIPT non-aliasing.

I lost my memory on why these were added in the first instance. As I
understand (now), they were added to avoid having freed page cache pages
with stale I-cache lines. But if we invalidate the I-cache when mapping
pages, I agree, we should have the same behaviour for both aliasing and
non-aliasing caches.

> However, as a general point, Linux's cache flushing APIs are predicated
> on the notion that there is no speculative fetches going on - for
> instance, when mappings are removed, the following procedure is
> adhered to:
> 
> 	flush_cache();
> 	change page tables
> 	flush_tlb();
> 
> I suspect that the right solution to the problems with speculative
> fetches is that we need to invent a new cache API to adequately cover
> this - and since we'll be the only architecture needing the API, you
> can expect it to have a very high maintainence cost.

I don't think speculative prefetches is something specific to ARM.
Anything that can do branch prediction also does some form of
prefetching (I know Alpha even does speculative stores).

But in the case above I don't think it's a problem with VIPT
non-aliasing caches (and ARMv7 doesn't allow aliasing caches). Since we
can only have speculative reads, there is no way to get dirty cache
lines that would later need to be evicted (even so, I think the cache
keeps the physical address for a line so that it doesn't require another
TLB look-up for evicting lines; but this may be implementation
specific).

-- 
Catalin

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-10-25 13:39                                       ` Russell King - ARM Linux
@ 2009-10-26 18:40                                         ` Catalin Marinas
  0 siblings, 0 replies; 72+ messages in thread
From: Catalin Marinas @ 2009-10-26 18:40 UTC (permalink / raw)
  To: linux-arm-kernel

On Sun, 2009-10-25 at 13:39 +0000, Russell King - ARM Linux wrote:
> On Tue, Oct 20, 2009 at 12:39:08PM +0100, Catalin Marinas wrote:
> > BTW, the cache flushing code in Linux can be optimised a bit more on
> > VIPT caches:
> > 
> >       * __cpuc_flush_dcache_page() could cope with just D-cache clean
> >         rather than clean+invalidate
> 
> No it can not - that breaks shared mappings.  The problem is that
> flush_dcache_page() is used in two circumstances.  These are described
> in more detail in cachetlb.txt, but briefly:
> 
> 1. After the kernel writes to its mapping for a page cache page, and needs
>    to ensure that those writes are visible to shared mmap()s in userspace.
> 
> 2. Before the kernel reads from its mapping for a page cache page, and
>    needs to ensure that it reads up to date data written by userspace
>    into those mappings.
> 
> So just cleaning the D-cache means that (2) will return stale data.

I was referring to VIPT (non-aliasing) case only. Point (2) above is
still valid even if only cleaning the D-cache lines. Shared mappings use
the same cache lines according to the physical address.


> >       * whole I-cache invalidation was needed for some ARM1136 erratum.
> >         We can conditionally revert it to invalidating a range
> 
> That's not what the commit (826cbda) says which implemented it.  Also,
> since we have broken I-cache flushes even with the erratum enabled,
> let's fix the work-around and re-evaluate the situation before changing
> anything.

The commit doesn't say but there was an earlier commit (141fa40cff9088)
which was changing from I-cache inv by MVA to full I-cache inv because
of some ARM1136 erratum (371025). When I sent the second, I just
followed the same idea. We could make this conditional but need to run
some benchmarks to check whether there is a real benefit.

> > Flush the D-cache during copy_user_highpage()
> > 
> > From: Catalin Marinas <catalin.marinas@arm.com>
> > 
> > The I and D caches for copy-on-write pages on processors with
> > write-allocate caches become incoherent causing problems on application
> > relying on CoW for text pages (dynamic linker relocating symbols in a
> > text page). This patch flushes the D-cache for such pages (possibly
> > lazily via update_mmu_cache which also takes care of the I-cache).
> 
> Actually, I think this is caused by a missing I-cache flush in
> flush_cache_range().  Adding that flush should resolve this problem
> in hand (and make VIPT aliasing and VIPT non-aliasing behave in the
> same manner.)  That's something which my patch previously posted in
> this thread did.

I think the I-cache flush would later be done via update_mmu_cache(), if
I recall the call trace correctly.

Anyway, flushing both D and I caches in flush_cache_range() would have
the same impact.

> Note also that with ASID tagged VIVT I-cache, we are missing out
> on cache flushing.  As you've identified, it's entirely possible
> for text page translations to be changed, and according to B3.4.1
> bullet 2, a flush is required.

We can do this conditionally at run-time. I don't have such CPU to test
it.

-- 
Catalin

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-10-25 14:48                                       ` Russell King - ARM Linux
@ 2009-10-26 18:45                                         ` Catalin Marinas
  2009-10-26 19:17                                           ` Russell King - ARM Linux
  0 siblings, 1 reply; 72+ messages in thread
From: Catalin Marinas @ 2009-10-26 18:45 UTC (permalink / raw)
  To: linux-arm-kernel

On Sun, 2009-10-25 at 14:48 +0000, Russell King - ARM Linux wrote:
> On Tue, Oct 20, 2009 at 12:39:08PM +0100, Catalin Marinas wrote:
> > diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
> > index d0d17b6..4e37ab6 100644
> > --- a/arch/arm/mm/fault-armv.c
> > +++ b/arch/arm/mm/fault-armv.c
> > @@ -160,8 +160,7 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr, pte_t pte)
> >  	if (mapping) {
> >  		if (cache_is_vivt())
> >  			make_coherent(mapping, vma, addr, pfn);
> > -		else if (vma->vm_flags & VM_EXEC)
> > -			__flush_icache_all();
> > +		__flush_icache_all();
> 
> BTW, why are you penalising VIVT cached systems as well with this
> change?  Surely you meant to leave the 'else' still there.

I removed the else for the case where v6_copy_user_highpage() only does
flush_dcache_page() and the page was mprotect(RW) temporarily.

There was another situation in the recent swap/anonymous pages case
where I replied that there could be a situation where I-cache needs
unconditional invalidation.

-- 
Catalin

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Kernel related (?) user space crash at ARM11 MPCore
  2009-10-26 18:45                                         ` Catalin Marinas
@ 2009-10-26 19:17                                           ` Russell King - ARM Linux
  0 siblings, 0 replies; 72+ messages in thread
From: Russell King - ARM Linux @ 2009-10-26 19:17 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Oct 26, 2009 at 06:45:36PM +0000, Catalin Marinas wrote:
> On Sun, 2009-10-25 at 14:48 +0000, Russell King - ARM Linux wrote:
> > On Tue, Oct 20, 2009 at 12:39:08PM +0100, Catalin Marinas wrote:
> > > diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
> > > index d0d17b6..4e37ab6 100644
> > > --- a/arch/arm/mm/fault-armv.c
> > > +++ b/arch/arm/mm/fault-armv.c
> > > @@ -160,8 +160,7 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr, pte_t pte)
> > >  	if (mapping) {
> > >  		if (cache_is_vivt())
> > >  			make_coherent(mapping, vma, addr, pfn);
> > > -		else if (vma->vm_flags & VM_EXEC)
> > > -			__flush_icache_all();
> > > +		__flush_icache_all();
> > 
> > BTW, why are you penalising VIVT cached systems as well with this
> > change?  Surely you meant to leave the 'else' still there.
> 
> I removed the else for the case where v6_copy_user_highpage() only does
> flush_dcache_page() and the page was mprotect(RW) temporarily.
> 
> There was another situation in the recent swap/anonymous pages case
> where I replied that there could be a situation where I-cache needs
> unconditional invalidation.

The question still stands.  We've not had I-cache flushing here for VIVT
caches.  We don't have a problem with them here.  Why is it necessary
for them to also flush the I-cache?

^ permalink raw reply	[flat|nested] 72+ messages in thread

* smsc911x.c driver and SMP
  2009-10-06  6:12                           ` smsc911x.c driver and SMP Antti P Miettinen
@ 2010-08-31  0:07                             ` Shinya Kuribayashi
  2010-08-31  6:22                               ` Antti P Miettinen
  2010-08-31  8:33                               ` Catalin Marinas
  0 siblings, 2 replies; 72+ messages in thread
From: Shinya Kuribayashi @ 2010-08-31  0:07 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Catalin and Antti,

On 10/6/2009 3:12 PM, Antti P Miettinen wrote:
> Catalin Marinas <catalin.marinas@arm.com> writes:
>> Fixed it with this new commit (I'm not posting the patch here as it's
>> not meant for review on this list):
>>
>> http://www.linux-arm.org/git?p=linux-2.6.git;a=commitdiff;h=3cc8ee538f6727d1f6773fff6d8e31bc5522bfee
> 
> Just a piece of information: with vanilla 2.6.31 boot from NFS root on
> my PB11MPCore hangs, with this patch applied, boot succeeds.

Now I'm setting up similar dev environment by chance; Cortex-A9 MPcore,
slightly old kernel (say .31 or .32), and with smsc911x.c Ethernet
driver required [1].

I'm auditing the required patches right now, and very interested in
above patch, but it seems that blob got lost these days :-(

Do you still have the patch, and would you mind sharing it with me?

[1] http://www.kmckk.co.jp/eng/kzmca9/index.html
-- 
Shinya Kuribayashi
Renesas Electronics

^ permalink raw reply	[flat|nested] 72+ messages in thread

* smsc911x.c driver and SMP
  2010-08-31  0:07                             ` Shinya Kuribayashi
@ 2010-08-31  6:22                               ` Antti P Miettinen
  2010-08-31  9:10                                 ` Shinya Kuribayashi
  2010-08-31  8:33                               ` Catalin Marinas
  1 sibling, 1 reply; 72+ messages in thread
From: Antti P Miettinen @ 2010-08-31  6:22 UTC (permalink / raw)
  To: linux-arm-kernel

Shinya Kuribayashi <shinya.kuribayashi.px@renesas.com> writes:
> Hi Catalin and Antti,
>
> On 10/6/2009 3:12 PM, Antti P Miettinen wrote:
>> Catalin Marinas <catalin.marinas@arm.com> writes:
>>> Fixed it with this new commit (I'm not posting the patch here as it's
>>> not meant for review on this list):
>>>
>>> http://www.linux-arm.org/git?p=linux-2.6.git;a=commitdiff;h=3cc8ee538f6727d1f6773fff6d8e31bc5522bfee
>> 
>> Just a piece of information: with vanilla 2.6.31 boot from NFS root on
>> my PB11MPCore hangs, with this patch applied, boot succeeds.
>
> Now I'm setting up similar dev environment by chance; Cortex-A9 MPcore,
> slightly old kernel (say .31 or .32), and with smsc911x.c Ethernet
> driver required [1].
>
> I'm auditing the required patches right now, and very interested in
> above patch, but it seems that blob got lost these days :-(
>
> Do you still have the patch, and would you mind sharing it with me?
>
> [1] http://www.kmckk.co.jp/eng/kzmca9/index.html
> -- 
> Shinya Kuribayashi
> Renesas Electronics

Hmm.. hard to remember what the patch actually contained - could it be
the one below?  By the way - I have a KZM board so I'd be happy to
test your kernels.

diff --git a/drivers/net/smsc911x.c b/drivers/net/smsc911x.c
index 94b6d26..999e89d 100644
--- a/drivers/net/smsc911x.c
+++ b/drivers/net/smsc911x.c
@@ -84,8 +84,7 @@ struct smsc911x_data {
 	 */
 	spinlock_t mac_lock;
 
-	/* spinlock to ensure 16-bit accesses are serialised.
-	 * unused with a 32-bit bus */
+	/* spinlock to ensure register accesses are serialised */
 	spinlock_t dev_lock;
 
 	struct phy_device *phy_dev;
@@ -118,37 +117,33 @@ struct smsc911x_data {
 	unsigned int hashlo;
 };
 
-/* The 16-bit access functions are significantly slower, due to the locking
- * necessary.  If your bus hardware can be configured to do this for you
- * (in response to a single 32-bit operation from software), you should use
- * the 32-bit access functions instead. */
-
-static inline u32 smsc911x_reg_read(struct smsc911x_data *pdata, u32 reg)
+static inline u32 __smsc911x_reg_read(struct smsc911x_data *pdata, u32 reg)
 {
 	if (pdata->config.flags & SMSC911X_USE_32BIT)
 		return readl(pdata->ioaddr + reg);
 
-	if (pdata->config.flags & SMSC911X_USE_16BIT) {
-		u32 data;
-		unsigned long flags;
-
-		/* these two 16-bit reads must be performed consecutively, so
-		 * must not be interrupted by our own ISR (which would start
-		 * another read operation) */
-		spin_lock_irqsave(&pdata->dev_lock, flags);
-		data = ((readw(pdata->ioaddr + reg) & 0xFFFF) |
+	if (pdata->config.flags & SMSC911X_USE_16BIT)
+		return ((readw(pdata->ioaddr + reg) & 0xFFFF) |
 			((readw(pdata->ioaddr + reg + 2) & 0xFFFF) << 16));
-		spin_unlock_irqrestore(&pdata->dev_lock, flags);
-
-		return data;
-	}
 
 	BUG();
 	return 0;
 }
 
-static inline void smsc911x_reg_write(struct smsc911x_data *pdata, u32 reg,
-				      u32 val)
+static inline u32 smsc911x_reg_read(struct smsc911x_data *pdata, u32 reg)
+{
+	u32 data;
+	unsigned long flags;
+
+	spin_lock_irqsave(&pdata->dev_lock, flags);
+	data = __smsc911x_reg_read(pdata, reg);
+	spin_unlock_irqrestore(&pdata->dev_lock, flags);
+
+	return data;
+}
+
+static inline void __smsc911x_reg_write(struct smsc911x_data *pdata, u32 reg,
+					u32 val)
 {
 	if (pdata->config.flags & SMSC911X_USE_32BIT) {
 		writel(val, pdata->ioaddr + reg);
@@ -156,44 +151,54 @@ static inline void smsc911x_reg_write(struct smsc911x_data *pdata, u32 reg,
 	}
 
 	if (pdata->config.flags & SMSC911X_USE_16BIT) {
-		unsigned long flags;
-
-		/* these two 16-bit writes must be performed consecutively, so
-		 * must not be interrupted by our own ISR (which would start
-		 * another read operation) */
-		spin_lock_irqsave(&pdata->dev_lock, flags);
 		writew(val & 0xFFFF, pdata->ioaddr + reg);
 		writew((val >> 16) & 0xFFFF, pdata->ioaddr + reg + 2);
-		spin_unlock_irqrestore(&pdata->dev_lock, flags);
 		return;
 	}
 
 	BUG();
 }
 
+static inline void smsc911x_reg_write(struct smsc911x_data *pdata, u32 reg,
+				      u32 val)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&pdata->dev_lock, flags);
+	__smsc911x_reg_write(pdata, reg, val);
+	spin_unlock_irqrestore(&pdata->dev_lock, flags);
+}
+
 /* Writes a packet to the TX_DATA_FIFO */
 static inline void
 smsc911x_tx_writefifo(struct smsc911x_data *pdata, unsigned int *buf,
 		      unsigned int wordcount)
 {
+	unsigned long flags;
+
+	spin_lock_irqsave(&pdata->dev_lock, flags);
+
 	if (pdata->config.flags & SMSC911X_SWAP_FIFO) {
 		while (wordcount--)
-			smsc911x_reg_write(pdata, TX_DATA_FIFO, swab32(*buf++));
-		return;
+			__smsc911x_reg_write(pdata, TX_DATA_FIFO,
+					     swab32(*buf++));
+		goto out;
 	}
 
 	if (pdata->config.flags & SMSC911X_USE_32BIT) {
 		writesl(pdata->ioaddr + TX_DATA_FIFO, buf, wordcount);
-		return;
+		goto out;
 	}
 
 	if (pdata->config.flags & SMSC911X_USE_16BIT) {
 		while (wordcount--)
-			smsc911x_reg_write(pdata, TX_DATA_FIFO, *buf++);
-		return;
+			__smsc911x_reg_write(pdata, TX_DATA_FIFO, *buf++);
+		goto out;
 	}
 
 	BUG();
+out:
+	spin_unlock_irqrestore(&pdata->dev_lock, flags);
 }
 
 /* Reads a packet out of the RX_DATA_FIFO */
@@ -201,24 +206,31 @@ static inline void
 smsc911x_rx_readfifo(struct smsc911x_data *pdata, unsigned int *buf,
 		     unsigned int wordcount)
 {
+	unsigned long flags;
+
+	spin_lock_irqsave(&pdata->dev_lock, flags);
+
 	if (pdata->config.flags & SMSC911X_SWAP_FIFO) {
 		while (wordcount--)
-			*buf++ = swab32(smsc911x_reg_read(pdata, RX_DATA_FIFO));
-		return;
+			*buf++ = swab32(__smsc911x_reg_read(pdata,
+							    RX_DATA_FIFO));
+		goto out;
 	}
 
 	if (pdata->config.flags & SMSC911X_USE_32BIT) {
 		readsl(pdata->ioaddr + RX_DATA_FIFO, buf, wordcount);
-		return;
+		goto out;
 	}
 
 	if (pdata->config.flags & SMSC911X_USE_16BIT) {
 		while (wordcount--)
-			*buf++ = smsc911x_reg_read(pdata, RX_DATA_FIFO);
-		return;
+			*buf++ = __smsc911x_reg_read(pdata, RX_DATA_FIFO);
+		goto out;
 	}
 
 	BUG();
+out:
+	spin_unlock_irqrestore(&pdata->dev_lock, flags);
 }
 
 /* waits for MAC not busy, with timeout.  Only called by smsc911x_mac_read

-- 
http://www.iki.fi/~ananaza/

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* smsc911x.c driver and SMP
  2010-08-31  0:07                             ` Shinya Kuribayashi
  2010-08-31  6:22                               ` Antti P Miettinen
@ 2010-08-31  8:33                               ` Catalin Marinas
  2010-08-31  8:42                                 ` Shinya Kuribayashi
  1 sibling, 1 reply; 72+ messages in thread
From: Catalin Marinas @ 2010-08-31  8:33 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 2010-08-31 at 01:07 +0100, Shinya Kuribayashi wrote:
> On 10/6/2009 3:12 PM, Antti P Miettinen wrote:
> > Catalin Marinas <catalin.marinas@arm.com> writes:
> >> Fixed it with this new commit (I'm not posting the patch here as it's
> >> not meant for review on this list):
> >>
> >> http://www.linux-arm.org/git?p=linux-2.6.git;a=commitdiff;h=3cc8ee538f6727d1f6773fff6d8e31bc5522bfee
> >
> > Just a piece of information: with vanilla 2.6.31 boot from NFS root on
> > my PB11MPCore hangs, with this patch applied, boot succeeds.
> 
> Now I'm setting up similar dev environment by chance; Cortex-A9 MPcore,
> slightly old kernel (say .31 or .32), and with smsc911x.c Ethernet
> driver required [1].
> 
> I'm auditing the required patches right now, and very interested in
> above patch, but it seems that blob got lost these days :-(
> 
> Do you still have the patch, and would you mind sharing it with me?

It is in mainline already:

http://git.kernel.org/linus/492c5d943d6a04b124ba3a719dc746dc36b14cfb

-- 
Catalin

^ permalink raw reply	[flat|nested] 72+ messages in thread

* smsc911x.c driver and SMP
  2010-08-31  8:33                               ` Catalin Marinas
@ 2010-08-31  8:42                                 ` Shinya Kuribayashi
  0 siblings, 0 replies; 72+ messages in thread
From: Shinya Kuribayashi @ 2010-08-31  8:42 UTC (permalink / raw)
  To: linux-arm-kernel

On 8/31/2010 5:33 PM, Catalin Marinas wrote:
> It is in mainline already:
> 
> http://git.kernel.org/linus/492c5d943d6a04b124ba3a719dc746dc36b14cfb

Thanks!  Just slightly out of my search range ;-)

$ git describe 492c5d943d6a04b124ba3a719dc746dc36b14cfb
v2.6.35-rc1-1255-g492c5d9

-- 
Shinya Kuribayashi
Renesas Electronics

^ permalink raw reply	[flat|nested] 72+ messages in thread

* smsc911x.c driver and SMP
  2010-08-31  6:22                               ` Antti P Miettinen
@ 2010-08-31  9:10                                 ` Shinya Kuribayashi
  0 siblings, 0 replies; 72+ messages in thread
From: Shinya Kuribayashi @ 2010-08-31  9:10 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On 8/31/2010 3:22 PM, Antti P Miettinen wrote:
> Hmm.. hard to remember what the patch actually contained - could it be
> the one below?

That's the one (also identified by Catalin), thanks both.

> By the way - I have a KZM board so I'd be happy to test your kernels.

Great, thanks for kind offer.  I'm planning to upgrade its kernel
from .28/.29 to .31/.32, which doubles as ARM training for me (as
I'm new to ARM).  When I have a problem, I'd like to ask for help.

Thanks,
-- 
Shinya Kuribayashi
Renesas Electronics

^ permalink raw reply	[flat|nested] 72+ messages in thread

end of thread, other threads:[~2010-08-31  9:10 UTC | newest]

Thread overview: 72+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <4A7AEEB6.5060903@googlemail.com>
     [not found] ` <1250184014.14019.40.camel@pc1117.cambridge.arm.com>
     [not found]   ` <1250501311.9858.24.camel@pc1117.cambridge.arm.com>
     [not found]     ` <20090817140422.GA10764@n2100.arm.linux.org.uk>
2009-08-29 12:27       ` Kernel related (?) user space crash at ARM11 MPCore Catalin Marinas
2009-08-31  8:30         ` Catalin Marinas
2009-09-07 15:29           ` Catalin Marinas
2009-09-07 15:56             ` Dirk Behme
2009-09-07 16:43               ` Catalin Marinas
2009-09-07 17:31             ` Mikael Pettersson
2009-09-07 21:40               ` Catalin Marinas
2009-09-03 11:58         ` Dirk Behme
     [not found]       ` <1250529916.11185.80.camel@pc1117.cambridge.arm.com>
     [not found]         ` <20090919224022.GA738@n2100.arm.linux.org.uk>
     [not found]           ` <1253435940.498.15.camel@pc1117.cambridge.arm.com>
2009-09-20  9:31             ` Russell King - ARM Linux
2009-09-20 19:02               ` Russell King - ARM Linux
2009-09-20 22:46                 ` Catalin Marinas
2009-09-21  8:31                   ` Jamie Lokier
2009-09-21  8:41                     ` Russell King - ARM Linux
2009-09-21  9:41                       ` Jamie Lokier
2009-09-21 10:08                         ` Catalin Marinas
2009-09-21  8:49                     ` Catalin Marinas
2009-09-21  8:54                       ` Russell King - ARM Linux
2009-09-21  9:44                         ` Catalin Marinas
2009-09-21 10:07                           ` Russell King - ARM Linux
2009-09-21 10:42                             ` Catalin Marinas
2009-09-21 20:10                             ` Jamie Lokier
2009-09-21 21:26                               ` Russell King - ARM Linux
2009-09-21 22:14                                 ` Catalin Marinas
2009-09-21 22:25                                 ` Jamie Lokier
2009-09-22  8:43                                   ` Catalin Marinas
2009-09-21 21:58                               ` Catalin Marinas
2009-09-21 22:12                                 ` Jamie Lokier
2009-09-21 22:31                                   ` Russell King - ARM Linux
2009-09-21 22:34                                   ` Catalin Marinas
2009-09-21 21:38                             ` Russell King - ARM Linux
2009-09-21 22:28                               ` Catalin Marinas
2009-09-21 22:37                                 ` Jamie Lokier
2009-09-21 22:33                               ` Jamie Lokier
2009-09-22  9:21                                 ` Catalin Marinas
2009-09-22 10:19                               ` Catalin Marinas
2009-09-22 17:17                                 ` Catalin Marinas
2009-09-23  6:03                                   ` Dirk Behme
2009-09-23  9:13                                     ` Catalin Marinas
2009-09-23 10:38                                       ` Catalin Marinas
2009-09-23 12:12                                         ` Mikael Pettersson
2009-09-23 12:42                                           ` Russell King - ARM Linux
2009-09-23 12:51                                             ` Catalin Marinas
2009-09-23 12:55                                               ` Catalin Marinas
2009-10-15 14:57                             ` Russell King - ARM Linux
2009-10-15 15:20                               ` Catalin Marinas
2009-10-15 15:28                                 ` Russell King - ARM Linux
2009-10-15 15:56                                   ` Catalin Marinas
2009-10-20 11:39                                     ` Catalin Marinas
2009-10-25 13:39                                       ` Russell King - ARM Linux
2009-10-26 18:40                                         ` Catalin Marinas
2009-10-25 14:48                                       ` Russell King - ARM Linux
2009-10-26 18:45                                         ` Catalin Marinas
2009-10-26 19:17                                           ` Russell King - ARM Linux
2009-10-15 15:48                                 ` Dirk Behme
2009-10-15 15:53                                   ` Catalin Marinas
2009-10-25 13:04                                 ` Russell King - ARM Linux
2009-10-26 18:18                                   ` Catalin Marinas
2009-09-20 22:02               ` Catalin Marinas
2009-09-22  5:44                 ` Shilimkar, Santosh
2009-09-22  9:01                   ` Catalin Marinas
2009-09-22  9:34                     ` Shilimkar, Santosh
     [not found] ` <1249981883.27150.14.camel@pc1117.cambridge.arm.com>
     [not found]   ` <4A818CBC.8040000@googlemail.com>
     [not found]     ` <1250006770.30628.1.camel@pc1117.cambridge.arm.com>
     [not found]       ` <4A819C54.3080606@googlemail.com>
     [not found]         ` <1250009043.30628.9.camel@pc1117.cambridge.arm.com>
     [not found]           ` <87ab25vazg.fsf@brigitte.kvy.fi>
     [not found]             ` <1250080338.20332.32.camel@pc1117.cambridge.arm.com>
     [not found]               ` <87k518yc8a.fsf@brigitte.kvy.fi>
2009-09-11  9:21                 ` smsc911x.c driver and SMP (was Re: Kernel related (?) user space crash at ARM11 MPCore) Catalin Marinas
2009-09-11 12:55                   ` Bill Gatliff
2009-09-11 13:00                     ` Catalin Marinas
2009-09-11 15:20                       ` Bill Gatliff
2009-09-11 16:06                         ` Catalin Marinas
2009-10-06  6:12                           ` smsc911x.c driver and SMP Antti P Miettinen
2010-08-31  0:07                             ` Shinya Kuribayashi
2010-08-31  6:22                               ` Antti P Miettinen
2010-08-31  9:10                                 ` Shinya Kuribayashi
2010-08-31  8:33                               ` Catalin Marinas
2010-08-31  8:42                                 ` Shinya Kuribayashi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.