All of lore.kernel.org
 help / color / mirror / Atom feed
* arm_syscall cacheflush breakage on VIPT platforms
@ 2009-09-28  9:29 Imre Deak
  2009-09-28  9:41 ` Russell King - ARM Linux
                   ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Imre Deak @ 2009-09-28  9:29 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

the following test app will cause an unhandled kernel paging request
on VIPT platforms. The triggering condition is the mmap_sem held by
thread_func while the main thread performs cache flushing.

Since the likelihood of this to trigger is relatively low, a patch will
follow that makes similar bugs more visible.

--Imre

#define _GNU_SOURCE
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/mman.h>

static int exit_thread;
static pthread_t tid;

void *thread_func(void *arg)
{
	while (1) {
		int map_size = 4096;
		void *mem;

		if (exit_thread)
			break;
		mem = mmap(NULL, map_size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (mem == MAP_FAILED) {
			perror("mmap");
			break;
		}
		munmap(mem, map_size);
	}
}

int start_mmap_thread(void)
{
	if (pthread_create(&tid, NULL, thread_func, NULL) < 0) {
		perror("pthread_create");
		return -1;
	}
}

int stop_mmap_thread(void)
{
	exit_thread = 1;
	pthread_join(tid, NULL);
}

int main(int argc, char *argv[])
{
	void *mem;
	pid_t tid;
	int r;
	size_t size;
	unsigned long end;
	int i;
	int nr_iter = 1000;

	size = 4096;
	if (posix_memalign(&mem, size, 4096) != 0) {
		fprintf(stderr, "malloc\n");
		return -1;
	}

	start_mmap_thread();

	for (i = 0; i < nr_iter; i++) {
		end = (unsigned long)mem + size - 1;
		r = syscall(__ARM_NR_cacheflush, (unsigned long)mem, end, 0);
		if (r < 0) {
			fprintf(stderr, "syscall: %d\n", r);
			goto out;
		}
	}
out:
	free(mem);

	stop_mmap_thread();

	return 0;
}


[   92.347442] Unable to handle kernel paging request at virtual address 00012000
[   92.354797] pgd = cf1d4000
[   92.357574] [00012000] *pgd=8f1dc031, *pte=00000000, *ppte=00000000
[   92.363983] Internal error: Oops: 817 [#1] PREEMPT
[   92.368804] Modules linked in:
[   92.415679] CPU: 0    Not tainted  (2.6.28-omap1-00042-g96a5ca2-dirty #231)
[   92.422729] PC is at v7_coherent_kern_range+0x18/0x44
[   92.427825] LR is@arm_syscall+0x1c4/0x2b0
[   92.432159] pc : [<c0033b88>]    lr : [<c00306ec>]    psr: 80000053
[   92.432159] sp : cf2a3e80  ip : cf1de0b0  fp : cf2a3fa4
[   92.443725] r10: 40024000  r9 : cf2a2000  r8 : 00000000
[   92.449005] r7 : 000f0002  r6 : 00000000  r5 : 00012fff  r4 : 00012000
[   92.455596] r3 : 0000003f  r2 : 00000040  r1 : 00013000  r0 : 00012000
[   92.462188] Flags: Nzcv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment user
[   92.469482] Control: 10c5387d  Table: 8f1d4018  DAC: 00000015
[   92.475280] Process ct (pid: 768, stack limit = 0xcf2a22e0)
[   92.480895] Stack: (0xcf2a3e80 to 0xcf2a4000)
[   92.485290] 3e80: cfdb23c0 c0281880 cf2a3eb4 cf2a3e98 c0285b08 c0285a68 00000000 c004f980
[   92.493743] 3ea0: cf16401c cfdb2700 cf2a3ecc cf2a3eb8 c0285b54 c0285ae0 cf2df900 cfdb2700
[   92.502166] 3ec0: cf2a3efc cf2a3ed0 c02816b8 c00501fc 40000000 cf2a2000 c0381540 00000000
[   92.510589] 3ee0: 00000301 00000000 00000000 00000000 cf2a3f14 cf2a3f00 c0281880 c0281410
[   92.519012] 3f00: cfdb23c0 c0381540 cf2a3f3c cf2a3f18 c0052ff4 c0281848 003d0f00 60000053
[   92.527435] 3f20: cf2a3f3c cfdb23c0 003d0f00 00000000 cf2a3f8c cf2a3f40 c0054f1c c0052f0c
[   92.535858] 3f40: 409734d8 cf215bc0 00000000 c0156750 cf2a3fa4 cf2a3f60 c00a34d4 c0070930
[   92.544281] 3f60: 00100070 409734d8 40973490 4004c000 00000078 c002cac4 cf2a2000 40033888
[   92.552703] 3f80: cf2a3fa4 cf2a3f90 c002f9c0 00000000 bea99ef4 00000001 00000000 cf2a3fa8
[   92.561126] 3fa0: c002c940 c0030534 00000000 bea99ef4 00012000 00012fff 00000000 40023e08
[   92.569549] 3fc0: 00000000 bea99ef4 00000001 000f0002 00000000 00000000 40024000 bea99d9c
[   92.577972] 3fe0: bea99d68 bea99d58 00008788 4010d6f0 60000050 00012000 805b6021 805b6421
[   92.586395] Backtrace:
[   92.588867] [<c0030528>] (arm_syscall+0x0/0x2b0) from [<c002c940>] (ret_fast_syscall+0x0/0x2c)
[   92.597625]  r6:00000001 r5:bea99ef4 r4:00000000
[   92.602294] Code: e3a02010 e1a02312 e2423001 e1c00003 (ee070f3b)
[   92.609893] mtdoops: Ready 26, 219 (no erase)
[   92.878631] ---[ end trace 6854c4877e56a241 ]---

^ permalink raw reply	[flat|nested] 40+ messages in thread

* arm_syscall cacheflush breakage on VIPT platforms
  2009-09-28  9:29 arm_syscall cacheflush breakage on VIPT platforms Imre Deak
@ 2009-09-28  9:41 ` Russell King - ARM Linux
  2009-09-28  9:54   ` Imre Deak
  2009-09-28  9:48 ` [PATCH] ARM: add warning for invalid kernel page faults Imre Deak
  2009-09-28 12:49 ` arm_syscall cacheflush breakage on VIPT platforms Jamie Lokier
  2 siblings, 1 reply; 40+ messages in thread
From: Russell King - ARM Linux @ 2009-09-28  9:41 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 28, 2009 at 12:29:19PM +0300, Imre Deak wrote:
> Hi,
> 
> the following test app will cause an unhandled kernel paging request
> on VIPT platforms. The triggering condition is the mmap_sem held by
> thread_func while the main thread performs cache flushing.
> 
> Since the likelihood of this to trigger is relatively low, a patch will
> follow that makes similar bugs more visible.

The problem is that, unlike previous cache architectures, if a page is
not present we now get a data abort during cache maintainence.  That
means the cache maintainence instructions used for this call need to
be marked with user fixups, so that the kernel knows how to handle
such an abort.

It is not caused by the holding of mmap_sem.  However, do_cache_op()
should hold something to ensure the VMA doesn't disappear beneath it.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH] ARM: add warning for invalid kernel page faults
  2009-09-28  9:29 arm_syscall cacheflush breakage on VIPT platforms Imre Deak
  2009-09-28  9:41 ` Russell King - ARM Linux
@ 2009-09-28  9:48 ` Imre Deak
  2009-09-28  9:55   ` Russell King - ARM Linux
  2009-09-28 12:49 ` arm_syscall cacheflush breakage on VIPT platforms Jamie Lokier
  2 siblings, 1 reply; 40+ messages in thread
From: Imre Deak @ 2009-09-28  9:48 UTC (permalink / raw)
  To: linux-arm-kernel

From: Imre Deak <imre.deak@nokia.com>

According to the following in arch/arm/mm/fault.c page faults from
kernel mode are invalid if mmap_sem is already held and there is
no exception handler defined for the faulting instruction:

/*
 * As per x86, we may deadlock here.  However, since the kernel only
 * validly references user space from well defined areas of the code,
 * we can bug out early if this is from code which shouldn't.
 */
if (!down_read_trylock(&mm->mmap_sem)) {
	if (!user_mode(regs) && !search_exception_tables(regs->ARM_pc))
		goto no_context;

Since mmap_sem can be held at arbitrary times by another thread this
also means that any page faults from kernel mode are invalid if no
exception handler is defined for them, regardless whether mmap_sem is
held at the time of fault.

To easier detect code that can trigger the above error, add a check
also for the case where mmap_sem is acquired. As this has an overhead
make it a VM debug warning.

This will emit a warning at least for arm_syscall cacheflush users
on VIPT platforms, where the breakage would happen only in a less
likely situation.

Signed-off-by: Imre Deak <imre.deak@nokia.com>
---
 arch/arm/mm/fault.c |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index cc8829d..d719746 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -278,6 +278,20 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 		if (!user_mode(regs) && !search_exception_tables(regs->ARM_pc))
 			goto no_context;
 		down_read(&mm->mmap_sem);
+#ifdef CONFIG_DEBUG_VM
+		if (!user_mode(regs) &&
+		    !search_exception_tables(regs->ARM_pc)) {
+			static unsigned long last_warn_jiffies;
+
+			if (printk_timed_ratelimit(&last_warn_jiffies, 10000)) {
+				printk(KERN_WARNING
+				       "Invalid kernel paging request at virtual address %08lx",
+				       addr);
+				show_pte(mm, addr);
+				WARN_ON(1);
+			}
+		}
+#endif
 	}
 
 	fault = __do_page_fault(mm, addr, fsr, tsk);
-- 
1.6.3.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* arm_syscall cacheflush breakage on VIPT platforms
  2009-09-28  9:41 ` Russell King - ARM Linux
@ 2009-09-28  9:54   ` Imre Deak
  2009-09-28  9:59     ` Russell King - ARM Linux
  0 siblings, 1 reply; 40+ messages in thread
From: Imre Deak @ 2009-09-28  9:54 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 28, 2009 at 11:41:40AM +0200, ext Russell King - ARM Linux wrote:
> On Mon, Sep 28, 2009 at 12:29:19PM +0300, Imre Deak wrote:
> > Hi,
> > 
> > the following test app will cause an unhandled kernel paging request
> > on VIPT platforms. The triggering condition is the mmap_sem held by
> > thread_func while the main thread performs cache flushing.
> > 
> > Since the likelihood of this to trigger is relatively low, a patch will
> > follow that makes similar bugs more visible.
> 
> The problem is that, unlike previous cache architectures, if a page is
> not present we now get a data abort during cache maintainence.  That
> means the cache maintainence instructions used for this call need to
> be marked with user fixups, so that the kernel knows how to handle
> such an abort.
> 
> It is not caused by the holding of mmap_sem.

This particular bug is caused by holding of mmap_sem, without any fixup
being defined.

> However, do_cache_op()
> should hold something to ensure the VMA doesn't disappear beneath it.

Yes, agreed.

--Imre

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH] ARM: add warning for invalid kernel page faults
  2009-09-28  9:48 ` [PATCH] ARM: add warning for invalid kernel page faults Imre Deak
@ 2009-09-28  9:55   ` Russell King - ARM Linux
  2009-09-28 10:00     ` Imre Deak
  0 siblings, 1 reply; 40+ messages in thread
From: Russell King - ARM Linux @ 2009-09-28  9:55 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 28, 2009 at 12:48:24PM +0300, Imre Deak wrote:
> To easier detect code that can trigger the above error, add a check
> also for the case where mmap_sem is acquired. As this has an overhead
> make it a VM debug warning.

It _is_ already easy.  I'm not sure why you want even more noise, and
why you want to break the page fault handling.  From the warning you
received in your previous post, it said:

[   92.422729] PC is at v7_coherent_kern_range+0x18/0x44
[   92.427825] LR is at arm_syscall+0x1c4/0x2b0
...
[   92.588867] [<c0030528>] (arm_syscall+0x0/0x2b0) from [<c002c940>] (ret_fast_syscall+0x0/0x2c)
[   92.597625]  r6:00000001 r5:bea99ef4 r4:00000000
[   92.602294] Code: e3a02010 e1a02312 e2423001 e1c00003 (ee070f3b)

which is quite clear - the fault happened in v7_coherent_kern_range()
and the code line disassembles to:

   0:   e3a02010        mov     r2, #16 ; 0x10
   4:   e1a02312        lsl     r2, r2, r3
   8:   e2423001        sub     r3, r2, #1      ; 0x1
   c:   e1c00003        bic     r0, r0, r3
  10:   ee070f3b        mcr     15, 0, r0, cr7, cr11, {1}

If we look up v7_coherent_kern_range(), we find:

ENTRY(v7_coherent_user_range)
        dcache_line_size r2, r3
        sub     r3, r2, #1
        bic     r0, r0, r3
1:      mcr     p15, 0, r0, c7, c11, 1          @ clean D line to the point of unification
        dsb

So we know which bit of kernel code caused the problem.

If we want to know what address, there is one simple, and one slightly
more complicated way to find out:

[   92.347442] Unable to handle kernel paging request at virtual address 00012000

The above line is the simple way.  The slightly more complicated way is
by looking at the above code, realising that 'r0' is the address which
was being cleaned, and then looking it up in the register dump:

[   92.432159] pc : [<c0033b88>]    lr : [<c00306ec>]    psr: 80000053
[   92.432159] sp : cf2a3e80  ip : cf1de0b0  fp : cf2a3fa4
[   92.443725] r10: 40024000  r9 : cf2a2000  r8 : 00000000
[   92.449005] r7 : 000f0002  r6 : 00000000  r5 : 00012fff  r4 : 00012000
[   92.455596] r3 : 0000003f  r2 : 00000040  r1 : 00013000  r0 : 00012000

I'm not sure what other information you would want.

And we _certainly_ do not want to allow the thread to continue if we
encounter an unexpected kernel page fault.  Jumping to no_context is
definitely the right thing to do.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* arm_syscall cacheflush breakage on VIPT platforms
  2009-09-28  9:54   ` Imre Deak
@ 2009-09-28  9:59     ` Russell King - ARM Linux
  2009-09-28 10:10       ` Imre Deak
  2009-09-28 16:54       ` Catalin Marinas
  0 siblings, 2 replies; 40+ messages in thread
From: Russell King - ARM Linux @ 2009-09-28  9:59 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 28, 2009 at 12:54:29PM +0300, Imre Deak wrote:
> On Mon, Sep 28, 2009 at 11:41:40AM +0200, ext Russell King - ARM Linux wrote:
> > On Mon, Sep 28, 2009 at 12:29:19PM +0300, Imre Deak wrote:
> > > Hi,
> > > 
> > > the following test app will cause an unhandled kernel paging request
> > > on VIPT platforms. The triggering condition is the mmap_sem held by
> > > thread_func while the main thread performs cache flushing.
> > > 
> > > Since the likelihood of this to trigger is relatively low, a patch will
> > > follow that makes similar bugs more visible.
> > 
> > The problem is that, unlike previous cache architectures, if a page is
> > not present we now get a data abort during cache maintainence.  That
> > means the cache maintainence instructions used for this call need to
> > be marked with user fixups, so that the kernel knows how to handle
> > such an abort.
> > 
> > It is not caused by the holding of mmap_sem.
> 
> This particular bug is caused by holding of mmap_sem, without any fixup
> being defined.

No it is not.  The problem has nothing to do with holding of mmap_sem
AT ALL.  In fact, do_cache_op needs to hold mmap_sem itself, to prevent
the VMA going away beneath it.  That's not going to stop it generating
faults, and it's not going to stop it oopsing.

The problem is that we don't have any fixup in place for this situation.

There is nothing wrong in the page fault handler.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH] ARM: add warning for invalid kernel page faults
  2009-09-28  9:55   ` Russell King - ARM Linux
@ 2009-09-28 10:00     ` Imre Deak
  2009-09-28 10:04       ` Russell King - ARM Linux
  0 siblings, 1 reply; 40+ messages in thread
From: Imre Deak @ 2009-09-28 10:00 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 28, 2009 at 11:55:16AM +0200, ext Russell King - ARM Linux wrote:
> On Mon, Sep 28, 2009 at 12:48:24PM +0300, Imre Deak wrote:
> > To easier detect code that can trigger the above error, add a check
> > also for the case where mmap_sem is acquired. As this has an overhead
> > make it a VM debug warning.
> 
> It _is_ already easy.  I'm not sure why you want even more noise, and
> why you want to break the page fault handling.  From the warning you
> received in your previous post, it said:

The problem is that it happens very rarely. Only if at the time of the
fault the mmap_sem happens to be held. With the change the error would
be apparent at the first fault the offending instruction generates.

For debugging this is very useful, I think.

--Imre

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH] ARM: add warning for invalid kernel page faults
  2009-09-28 10:00     ` Imre Deak
@ 2009-09-28 10:04       ` Russell King - ARM Linux
  2009-09-28 10:16         ` Imre Deak
  0 siblings, 1 reply; 40+ messages in thread
From: Russell King - ARM Linux @ 2009-09-28 10:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 28, 2009 at 01:00:48PM +0300, Imre Deak wrote:
> On Mon, Sep 28, 2009 at 11:55:16AM +0200, ext Russell King - ARM Linux wrote:
> > On Mon, Sep 28, 2009 at 12:48:24PM +0300, Imre Deak wrote:
> > > To easier detect code that can trigger the above error, add a check
> > > also for the case where mmap_sem is acquired. As this has an overhead
> > > make it a VM debug warning.
> > 
> > It _is_ already easy.  I'm not sure why you want even more noise, and
> > why you want to break the page fault handling.  From the warning you
> > received in your previous post, it said:
> 
> The problem is that it happens very rarely. Only if at the time of the
> fault the mmap_sem happens to be held. With the change the error would
> be apparent at the first fault the offending instruction generates.

Actually... I don't agree that your code can have any change what so
ever.

Condition 1:
                if (!user_mode(regs) && !search_exception_tables(regs->ARM_pc))
                        goto no_context;

                down_read(&mm->mmap_sem);
+#ifdef CONFIG_DEBUG_VM

Condition 2:

+               if (!user_mode(regs) &&
+                   !search_exception_tables(regs->ARM_pc)) {
+                       static unsigned long last_warn_jiffies;

Condition 1 and condition 2 are identical.  They do not change on whether
the mmap_sem is held or not.  So please explain to me how you actually
get to printing any of your new warnings.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* arm_syscall cacheflush breakage on VIPT platforms
  2009-09-28  9:59     ` Russell King - ARM Linux
@ 2009-09-28 10:10       ` Imre Deak
  2009-09-28 10:28         ` Russell King - ARM Linux
  2009-09-28 16:54       ` Catalin Marinas
  1 sibling, 1 reply; 40+ messages in thread
From: Imre Deak @ 2009-09-28 10:10 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 28, 2009 at 11:59:45AM +0200, ext Russell King - ARM Linux wrote:
> On Mon, Sep 28, 2009 at 12:54:29PM +0300, Imre Deak wrote:
> > On Mon, Sep 28, 2009 at 11:41:40AM +0200, ext Russell King - ARM Linux wrote:
> > > On Mon, Sep 28, 2009 at 12:29:19PM +0300, Imre Deak wrote:
> > > > Hi,
> > > > 
> > > > the following test app will cause an unhandled kernel paging request
> > > > on VIPT platforms. The triggering condition is the mmap_sem held by
> > > > thread_func while the main thread performs cache flushing.
> > > > 
> > > > Since the likelihood of this to trigger is relatively low, a patch will
> > > > follow that makes similar bugs more visible.
> > > 
> > > The problem is that, unlike previous cache architectures, if a page is
> > > not present we now get a data abort during cache maintainence.  That
> > > means the cache maintainence instructions used for this call need to
> > > be marked with user fixups, so that the kernel knows how to handle
> > > such an abort.
> > > 
> > > It is not caused by the holding of mmap_sem.
> > 
> > This particular bug is caused by holding of mmap_sem, without any fixup
> > being defined.
> 
> No it is not.

What I meant is that without holding mmap_sem you'll not trigger the error
message even if you don't have a fixup routine.

> The problem has nothing to do with holding of mmap_sem
> AT ALL.  In fact, do_cache_op needs to hold mmap_sem itself, to prevent
> the VMA going away beneath it.  That's not going to stop it generating
> faults, and it's not going to stop it oopsing.
> 
> The problem is that we don't have any fixup in place for this situation.

Yes, agreed. What I stated is that for _triggerring_ the error message you
need mmap_sem.

> There is nothing wrong in the page fault handler.

Agreed. Only a warning would be nice to make similar cases more apparent
even if mmap_sem is not held.

--Imre

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH] ARM: add warning for invalid kernel page faults
  2009-09-28 10:04       ` Russell King - ARM Linux
@ 2009-09-28 10:16         ` Imre Deak
  2009-09-28 10:27           ` Russell King - ARM Linux
  0 siblings, 1 reply; 40+ messages in thread
From: Imre Deak @ 2009-09-28 10:16 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 28, 2009 at 12:04:42PM +0200, ext Russell King - ARM Linux wrote:
> On Mon, Sep 28, 2009 at 01:00:48PM +0300, Imre Deak wrote:
> > On Mon, Sep 28, 2009 at 11:55:16AM +0200, ext Russell King - ARM Linux wrote:
> > > On Mon, Sep 28, 2009 at 12:48:24PM +0300, Imre Deak wrote:
> > > > To easier detect code that can trigger the above error, add a check
> > > > also for the case where mmap_sem is acquired. As this has an overhead
> > > > make it a VM debug warning.
> > > 
> > > It _is_ already easy.  I'm not sure why you want even more noise, and
> > > why you want to break the page fault handling.  From the warning you
> > > received in your previous post, it said:
> > 
> > The problem is that it happens very rarely. Only if at the time of the
> > fault the mmap_sem happens to be held. With the change the error would
> > be apparent at the first fault the offending instruction generates.
> 
> Actually... I don't agree that your code can have any change what so
> ever.
> 
> Condition 1:
>                 if (!user_mode(regs) && !search_exception_tables(regs->ARM_pc))
>                         goto no_context;
> 
>                 down_read(&mm->mmap_sem);
> +#ifdef CONFIG_DEBUG_VM
> 
> Condition 2:
> 
> +               if (!user_mode(regs) &&
> +                   !search_exception_tables(regs->ARM_pc)) {
> +                       static unsigned long last_warn_jiffies;
> 
> Condition 1 and condition 2 are identical.  They do not change on whether
> the mmap_sem is held or not.  So please explain to me how you actually
> get to printing any of your new warnings.

With the change it's:

Condition 1:

if (!down_read_trylock(&mm->mmap_sem)) {
	if (!user_mode(regs) && !search_exception_tables(regs->ARM_pc))
		goto no_context;
	down_read(&mm->mmap_sem);
} else {
#ifdef CONFIG_DEBUG_VM

Condition 2:
	if (!user_mode(regs) && !search_exception_tables(regs->ARM_pc))


So in condition 2, we managed to acquire mmap_sem, which will be the
majority of the cases for kernel mode faults. In condition 1, we didn't
since it was held by another thread.

--Imre

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH] ARM: add warning for invalid kernel page faults
  2009-09-28 10:16         ` Imre Deak
@ 2009-09-28 10:27           ` Russell King - ARM Linux
  2009-09-28 11:01             ` Imre Deak
  0 siblings, 1 reply; 40+ messages in thread
From: Russell King - ARM Linux @ 2009-09-28 10:27 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 28, 2009 at 01:16:25PM +0300, Imre Deak wrote:
> On Mon, Sep 28, 2009 at 12:04:42PM +0200, ext Russell King - ARM Linux wrote:
> > On Mon, Sep 28, 2009 at 01:00:48PM +0300, Imre Deak wrote:
> > > On Mon, Sep 28, 2009 at 11:55:16AM +0200, ext Russell King - ARM Linux wrote:
> > > > On Mon, Sep 28, 2009 at 12:48:24PM +0300, Imre Deak wrote:
> > > > > To easier detect code that can trigger the above error, add a check
> > > > > also for the case where mmap_sem is acquired. As this has an overhead
> > > > > make it a VM debug warning.
> > > > 
> > > > It _is_ already easy.  I'm not sure why you want even more noise, and
> > > > why you want to break the page fault handling.  From the warning you
> > > > received in your previous post, it said:
> > > 
> > > The problem is that it happens very rarely. Only if at the time of the
> > > fault the mmap_sem happens to be held. With the change the error would
> > > be apparent at the first fault the offending instruction generates.
> > 
> > Actually... I don't agree that your code can have any change what so
> > ever.
> > 
> > Condition 1:
> >                 if (!user_mode(regs) && !search_exception_tables(regs->ARM_pc))
> >                         goto no_context;
> > 
> >                 down_read(&mm->mmap_sem);
> > +#ifdef CONFIG_DEBUG_VM
> > 
> > Condition 2:
> > 
> > +               if (!user_mode(regs) &&
> > +                   !search_exception_tables(regs->ARM_pc)) {
> > +                       static unsigned long last_warn_jiffies;
> > 
> > Condition 1 and condition 2 are identical.  They do not change on whether
> > the mmap_sem is held or not.  So please explain to me how you actually
> > get to printing any of your new warnings.
> 
> With the change it's:
> 
> Condition 1:
> 
> if (!down_read_trylock(&mm->mmap_sem)) {
> 	if (!user_mode(regs) && !search_exception_tables(regs->ARM_pc))
> 		goto no_context;
> 	down_read(&mm->mmap_sem);
> } else {
> #ifdef CONFIG_DEBUG_VM
> 
> Condition 2:
> 	if (!user_mode(regs) && !search_exception_tables(regs->ARM_pc))
> 
> 
> So in condition 2, we managed to acquire mmap_sem, which will be the
> majority of the cases for kernel mode faults. In condition 1, we didn't
> since it was held by another thread.

Now you're talking about different code - the bit I quoted was what was
in your submitted patch, without deletion of intervening lines.  There
was no else clause in your patch.

Please, go back and look at your original patch.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* arm_syscall cacheflush breakage on VIPT platforms
  2009-09-28 10:10       ` Imre Deak
@ 2009-09-28 10:28         ` Russell King - ARM Linux
  2009-09-28 11:00           ` Imre Deak
  0 siblings, 1 reply; 40+ messages in thread
From: Russell King - ARM Linux @ 2009-09-28 10:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 28, 2009 at 01:10:44PM +0300, Imre Deak wrote:
> On Mon, Sep 28, 2009 at 11:59:45AM +0200, ext Russell King - ARM Linux wrote:
> > The problem has nothing to do with holding of mmap_sem
> > AT ALL.  In fact, do_cache_op needs to hold mmap_sem itself, to prevent
> > the VMA going away beneath it.  That's not going to stop it generating
> > faults, and it's not going to stop it oopsing.
> > 
> > The problem is that we don't have any fixup in place for this situation.
> 
> Yes, agreed. What I stated is that for _triggerring_ the error message you
> need mmap_sem.
> 
> > There is nothing wrong in the page fault handler.
> 
> Agreed. Only a warning would be nice to make similar cases more apparent
> even if mmap_sem is not held.

Page faults can happen at other times, and validly be fixed up.  vmalloc
space as an example.  You can't assume that a kernel mode page fault
without an exception fixup is invalid.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* arm_syscall cacheflush breakage on VIPT platforms
  2009-09-28 10:28         ` Russell King - ARM Linux
@ 2009-09-28 11:00           ` Imre Deak
  0 siblings, 0 replies; 40+ messages in thread
From: Imre Deak @ 2009-09-28 11:00 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 28, 2009 at 12:28:15PM +0200, ext Russell King - ARM Linux wrote:
> On Mon, Sep 28, 2009 at 01:10:44PM +0300, Imre Deak wrote:
> > On Mon, Sep 28, 2009 at 11:59:45AM +0200, ext Russell King - ARM Linux wrote:
> > > The problem has nothing to do with holding of mmap_sem
> > > AT ALL.  In fact, do_cache_op needs to hold mmap_sem itself, to prevent
> > > the VMA going away beneath it.  That's not going to stop it generating
> > > faults, and it's not going to stop it oopsing.
> > > 
> > > The problem is that we don't have any fixup in place for this situation.
> > 
> > Yes, agreed. What I stated is that for _triggerring_ the error message you
> > need mmap_sem.
> > 
> > > There is nothing wrong in the page fault handler.
> > 
> > Agreed. Only a warning would be nice to make similar cases more apparent
> > even if mmap_sem is not held.
> 
> Page faults can happen at other times, and validly be fixed up.  vmalloc
> space as an example.  You can't assume that a kernel mode page fault
> without an exception fixup is invalid.

Yes, but vmalloc is handled separately in do_translation_fault. In do_page_fault
you'll need a fixup for kernel mode page fault.

--Imre

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH] ARM: add warning for invalid kernel page faults
  2009-09-28 10:27           ` Russell King - ARM Linux
@ 2009-09-28 11:01             ` Imre Deak
  2009-09-28 11:05               ` [PATCH v2] " Imre Deak
  2009-09-28 11:26               ` [PATCH] " Russell King - ARM Linux
  0 siblings, 2 replies; 40+ messages in thread
From: Imre Deak @ 2009-09-28 11:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 28, 2009 at 12:27:10PM +0200, ext Russell King - ARM Linux wrote:
> [...]
>
> Now you're talking about different code - the bit I quoted was what was
> in your submitted patch, without deletion of intervening lines.  There
> was no else clause in your patch.
> 
> Please, go back and look at your original patch.

Ah, sorry. Missed that line when rebasing. I'll send an updated one.

--Imre

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v2] ARM: add warning for invalid kernel page faults
  2009-09-28 11:01             ` Imre Deak
@ 2009-09-28 11:05               ` Imre Deak
  2009-09-28 11:26               ` [PATCH] " Russell King - ARM Linux
  1 sibling, 0 replies; 40+ messages in thread
From: Imre Deak @ 2009-09-28 11:05 UTC (permalink / raw)
  To: linux-arm-kernel

From: Imre Deak <imre.deak@nokia.com>

According to the following in arch/arm/mm/fault.c page faults from
kernel mode are invalid if mmap_sem is already held and there is
no exception handler defined for the faulting instruction:

/*
 * As per x86, we may deadlock here.  However, since the kernel only
 * validly references user space from well defined areas of the code,
 * we can bug out early if this is from code which shouldn't.
 */
if (!down_read_trylock(&mm->mmap_sem)) {
	if (!user_mode(regs) && !search_exception_tables(regs->ARM_pc))
		goto no_context;

Since mmap_sem can be held at arbitrary times by another thread this
also means that any page faults from kernel mode are invalid if no
exception handler is defined for them, regardless whether mmap_sem is
held at the time of fault.

To easier detect code that can trigger the above error, add a check
also for the case where mmap_sem is acquired. As this has an overhead
make it a VM debug warning.

This will emit a warning at least for arm_syscall cacheflush users
on VIPT platforms, where the breakage would happen only in a less
likely situation.

Signed-off-by: Imre Deak <imre.deak@nokia.com>
---

Added a missing else.

 arch/arm/mm/fault.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index cc8829d..91a9710 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -278,6 +278,21 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 		if (!user_mode(regs) && !search_exception_tables(regs->ARM_pc))
 			goto no_context;
 		down_read(&mm->mmap_sem);
+	} else {
+#ifdef CONFIG_DEBUG_VM
+		if (!user_mode(regs) &&
+		    !search_exception_tables(regs->ARM_pc)) {
+			static unsigned long last_warn_jiffies;
+
+			if (printk_timed_ratelimit(&last_warn_jiffies, 10000)) {
+				printk(KERN_WARNING
+				       "Invalid kernel paging request at virtual address %08lx",
+				       addr);
+				show_pte(mm, addr);
+				WARN_ON(1);
+			}
+		}
+#endif
 	}
 
 	fault = __do_page_fault(mm, addr, fsr, tsk);
-- 
1.6.3.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH] ARM: add warning for invalid kernel page faults
  2009-09-28 11:01             ` Imre Deak
  2009-09-28 11:05               ` [PATCH v2] " Imre Deak
@ 2009-09-28 11:26               ` Russell King - ARM Linux
  2009-09-28 11:33                 ` Imre Deak
  1 sibling, 1 reply; 40+ messages in thread
From: Russell King - ARM Linux @ 2009-09-28 11:26 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 28, 2009 at 02:01:58PM +0300, Imre Deak wrote:
> On Mon, Sep 28, 2009 at 12:27:10PM +0200, ext Russell King - ARM Linux wrote:
> > [...]
> >
> > Now you're talking about different code - the bit I quoted was what was
> > in your submitted patch, without deletion of intervening lines.  There
> > was no else clause in your patch.
> > 
> > Please, go back and look at your original patch.
> 
> Ah, sorry. Missed that line when rebasing. I'll send an updated one.

Now I see what you're getting at.  Yes, we can make this a debugging
option, but I believe it should exhibit the right behaviour.

In other words (and as I already covered) it should invoke the standard
no_context thing if there isn't a fixup handler in place rather than
continuing blindly on to try to handle the fault.  So, it should be:

	if (!down_read_trylock(&mm->mmap_sem)) {
		if (!user_mode(regs) && !search_exception_tables(regs->ARM_pc))
			goto no_context;
		down_read(&mm->mmap_sem);
	} else {
+#ifdef CONFIG_DEBUG_VM
+		if (!user_mode(regs) && !search_exception_tables(regs->ARM_pc))
+			goto no_context;
+#endif
...

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH] ARM: add warning for invalid kernel page faults
  2009-09-28 11:26               ` [PATCH] " Russell King - ARM Linux
@ 2009-09-28 11:33                 ` Imre Deak
  2009-09-28 11:34                   ` Russell King - ARM Linux
  0 siblings, 1 reply; 40+ messages in thread
From: Imre Deak @ 2009-09-28 11:33 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 28, 2009 at 01:26:24PM +0200, ext Russell King - ARM Linux wrote:
> On Mon, Sep 28, 2009 at 02:01:58PM +0300, Imre Deak wrote:
> > On Mon, Sep 28, 2009 at 12:27:10PM +0200, ext Russell King - ARM Linux wrote:
> > > [...]
> > >
> > > Now you're talking about different code - the bit I quoted was what was
> > > in your submitted patch, without deletion of intervening lines.  There
> > > was no else clause in your patch.
> > > 
> > > Please, go back and look at your original patch.
> > 
> > Ah, sorry. Missed that line when rebasing. I'll send an updated one.
> 
> Now I see what you're getting at.  Yes, we can make this a debugging
> option, but I believe it should exhibit the right behaviour.
> 
> In other words (and as I already covered) it should invoke the standard
> no_context thing if there isn't a fixup handler in place rather than
> continuing blindly on to try to handle the fault.  So, it should be:
> 
> 	if (!down_read_trylock(&mm->mmap_sem)) {
> 		if (!user_mode(regs) && !search_exception_tables(regs->ARM_pc))
> 			goto no_context;
> 		down_read(&mm->mmap_sem);
> 	} else {
> +#ifdef CONFIG_DEBUG_VM
> +		if (!user_mode(regs) && !search_exception_tables(regs->ARM_pc))
> +			goto no_context;
> +#endif

Ok, agreed. Do you need an updated patch?

--Imre

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH] ARM: add warning for invalid kernel page faults
  2009-09-28 11:33                 ` Imre Deak
@ 2009-09-28 11:34                   ` Russell King - ARM Linux
  2009-09-29 10:07                     ` [PATCH v3] ARM: add debug check " Imre Deak
  0 siblings, 1 reply; 40+ messages in thread
From: Russell King - ARM Linux @ 2009-09-28 11:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 28, 2009 at 02:33:27PM +0300, Imre Deak wrote:
> On Mon, Sep 28, 2009 at 01:26:24PM +0200, ext Russell King - ARM Linux wrote:
> > On Mon, Sep 28, 2009 at 02:01:58PM +0300, Imre Deak wrote:
> > > On Mon, Sep 28, 2009 at 12:27:10PM +0200, ext Russell King - ARM Linux wrote:
> > > > [...]
> > > >
> > > > Now you're talking about different code - the bit I quoted was what was
> > > > in your submitted patch, without deletion of intervening lines.  There
> > > > was no else clause in your patch.
> > > > 
> > > > Please, go back and look at your original patch.
> > > 
> > > Ah, sorry. Missed that line when rebasing. I'll send an updated one.
> > 
> > Now I see what you're getting at.  Yes, we can make this a debugging
> > option, but I believe it should exhibit the right behaviour.
> > 
> > In other words (and as I already covered) it should invoke the standard
> > no_context thing if there isn't a fixup handler in place rather than
> > continuing blindly on to try to handle the fault.  So, it should be:
> > 
> > 	if (!down_read_trylock(&mm->mmap_sem)) {
> > 		if (!user_mode(regs) && !search_exception_tables(regs->ARM_pc))
> > 			goto no_context;
> > 		down_read(&mm->mmap_sem);
> > 	} else {
> > +#ifdef CONFIG_DEBUG_VM
> > +		if (!user_mode(regs) && !search_exception_tables(regs->ARM_pc))
> > +			goto no_context;
> > +#endif
> 
> Ok, agreed. Do you need an updated patch?

Yes, I would like a patch, and preferably tested please.  We've had a
number of not-tested patches hitting the kernel recently and I'd like
to cut down on those.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* arm_syscall cacheflush breakage on VIPT platforms
  2009-09-28  9:29 arm_syscall cacheflush breakage on VIPT platforms Imre Deak
  2009-09-28  9:41 ` Russell King - ARM Linux
  2009-09-28  9:48 ` [PATCH] ARM: add warning for invalid kernel page faults Imre Deak
@ 2009-09-28 12:49 ` Jamie Lokier
  2009-09-28 13:16   ` Imre Deak
  2 siblings, 1 reply; 40+ messages in thread
From: Jamie Lokier @ 2009-09-28 12:49 UTC (permalink / raw)
  To: linux-arm-kernel

Imre Deak wrote:
> Hi,
> 
> the following test app will cause an unhandled kernel paging request
> on VIPT platforms. The triggering condition is the mmap_sem held by
> thread_func while the main thread performs cache flushing.
> 
> Since the likelihood of this to trigger is relatively low, a patch will
> follow that makes similar bugs more visible.

I would expect the likelihood of triggering would be higher for at
least one of Java, Mono, Parrot or any of the modern Javascript
engines.  cacheflush is used by anything which generates code, and
mmap is used for loading libraries - things those applications do a lot.

-- Jamie

^ permalink raw reply	[flat|nested] 40+ messages in thread

* arm_syscall cacheflush breakage on VIPT platforms
  2009-09-28 12:49 ` arm_syscall cacheflush breakage on VIPT platforms Jamie Lokier
@ 2009-09-28 13:16   ` Imre Deak
  2009-09-28 13:19     ` Jamie Lokier
  2009-09-28 13:23     ` Russell King - ARM Linux
  0 siblings, 2 replies; 40+ messages in thread
From: Imre Deak @ 2009-09-28 13:16 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 28, 2009 at 02:49:22PM +0200, ext Jamie Lokier wrote:
> Imre Deak wrote:
> > Hi,
> > 
> > the following test app will cause an unhandled kernel paging request
> > on VIPT platforms. The triggering condition is the mmap_sem held by
> > thread_func while the main thread performs cache flushing.
> > 
> > Since the likelihood of this to trigger is relatively low, a patch will
> > follow that makes similar bugs more visible.
> 
> I would expect the likelihood of triggering would be higher for at
> least one of Java, Mono, Parrot or any of the modern Javascript
> engines.

True, the above statement is only valid for certain use patterns. I was
mainly interested in applications that do user range cache flushing as
part of their DMA requests and they didn't have threads with frequent
syscalls that required mmap_sem, so the problem remained hidden for a
long time.

--Imre

^ permalink raw reply	[flat|nested] 40+ messages in thread

* arm_syscall cacheflush breakage on VIPT platforms
  2009-09-28 13:16   ` Imre Deak
@ 2009-09-28 13:19     ` Jamie Lokier
  2009-09-28 13:25       ` Russell King - ARM Linux
  2009-09-28 13:31       ` Imre Deak
  2009-09-28 13:23     ` Russell King - ARM Linux
  1 sibling, 2 replies; 40+ messages in thread
From: Jamie Lokier @ 2009-09-28 13:19 UTC (permalink / raw)
  To: linux-arm-kernel

Imre Deak wrote:
> On Mon, Sep 28, 2009 at 02:49:22PM +0200, ext Jamie Lokier wrote:
> > Imre Deak wrote:
> > > Hi,
> > > 
> > > the following test app will cause an unhandled kernel paging request
> > > on VIPT platforms. The triggering condition is the mmap_sem held by
> > > thread_func while the main thread performs cache flushing.
> > > 
> > > Since the likelihood of this to trigger is relatively low, a patch will
> > > follow that makes similar bugs more visible.
> > 
> > I would expect the likelihood of triggering would be higher for at
> > least one of Java, Mono, Parrot or any of the modern Javascript
> > engines.
> 
> True, the above statement is only valid for certain use patterns. I was
> mainly interested in applications that do user range cache flushing as
> part of their DMA requests and they didn't have threads with frequent
> syscalls that required mmap_sem, so the problem remained hidden for a
> long time.

Aieee.  Is sys_cacheflush architecturally the Right Way to do DMA to
userspace, or is it just luck that it happens to work?

Does that include O_DIRECT regular file I/O as used by databases on
these ARMs?  (Nobody ever gives a straight answer)

-- Jamie

^ permalink raw reply	[flat|nested] 40+ messages in thread

* arm_syscall cacheflush breakage on VIPT platforms
  2009-09-28 13:16   ` Imre Deak
  2009-09-28 13:19     ` Jamie Lokier
@ 2009-09-28 13:23     ` Russell King - ARM Linux
  1 sibling, 0 replies; 40+ messages in thread
From: Russell King - ARM Linux @ 2009-09-28 13:23 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 28, 2009 at 04:16:24PM +0300, Imre Deak wrote:
> On Mon, Sep 28, 2009 at 02:49:22PM +0200, ext Jamie Lokier wrote:
> > Imre Deak wrote:
> > > Hi,
> > > 
> > > the following test app will cause an unhandled kernel paging request
> > > on VIPT platforms. The triggering condition is the mmap_sem held by
> > > thread_func while the main thread performs cache flushing.
> > > 
> > > Since the likelihood of this to trigger is relatively low, a patch will
> > > follow that makes similar bugs more visible.
> > 
> > I would expect the likelihood of triggering would be higher for at
> > least one of Java, Mono, Parrot or any of the modern Javascript
> > engines.
> 
> True, the above statement is only valid for certain use patterns. I was
> mainly interested in applications that do user range cache flushing as
> part of their DMA requests and they didn't have threads with frequent
> syscalls that required mmap_sem, so the problem remained hidden for a
> long time.

Note that currently this API does not really support that action.  It's
there to synchronise the I and D caches when you want to write self
modifying code.

DMA coherency is going to be an extension to it which I'm going to be
looking at in the coming weeks - but I'm not entirely happy about
extending the API this way.  It has to be done because too many
people are demanding a way to do DMA from userspace, so I guess we're
going to have to forego the "lets do it properly" approach - esp. as
tha's been illustrated to fail already.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* arm_syscall cacheflush breakage on VIPT platforms
  2009-09-28 13:19     ` Jamie Lokier
@ 2009-09-28 13:25       ` Russell King - ARM Linux
  2009-09-28 13:56         ` Jamie Lokier
  2009-09-28 13:31       ` Imre Deak
  1 sibling, 1 reply; 40+ messages in thread
From: Russell King - ARM Linux @ 2009-09-28 13:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 28, 2009 at 02:19:26PM +0100, Jamie Lokier wrote:
> Aieee.  Is sys_cacheflush architecturally the Right Way to do DMA to
> userspace, or is it just luck that it happens to work?
> 
> Does that include O_DIRECT regular file I/O as used by databases on
> these ARMs?  (Nobody ever gives a straight answer)

Most definitely not.  As far as O_DIRECT goes, I've no idea what to do
about that, or even if it's a problem.  I just don't use it so it's
not something I care about.

I wouldn't even know _how_ to use it or even how to provoke any bugs
in that area.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* arm_syscall cacheflush breakage on VIPT platforms
  2009-09-28 13:19     ` Jamie Lokier
  2009-09-28 13:25       ` Russell King - ARM Linux
@ 2009-09-28 13:31       ` Imre Deak
  2009-09-28 13:42         ` Russell King - ARM Linux
  1 sibling, 1 reply; 40+ messages in thread
From: Imre Deak @ 2009-09-28 13:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 28, 2009 at 03:19:26PM +0200, ext Jamie Lokier wrote:
> Imre Deak wrote:
> > On Mon, Sep 28, 2009 at 02:49:22PM +0200, ext Jamie Lokier wrote:
> > > Imre Deak wrote:
> > > > Hi,
> > > > 
> > > > the following test app will cause an unhandled kernel paging request
> > > > on VIPT platforms. The triggering condition is the mmap_sem held by
> > > > thread_func while the main thread performs cache flushing.
> > > > 
> > > > Since the likelihood of this to trigger is relatively low, a patch will
> > > > follow that makes similar bugs more visible.
> > > 
> > > I would expect the likelihood of triggering would be higher for at
> > > least one of Java, Mono, Parrot or any of the modern Javascript
> > > engines.
> > 
> > True, the above statement is only valid for certain use patterns. I was
> > mainly interested in applications that do user range cache flushing as
> > part of their DMA requests and they didn't have threads with frequent
> > syscalls that required mmap_sem, so the problem remained hidden for a
> > long time.
> 
> Aieee.  Is sys_cacheflush architecturally the Right Way to do DMA to
> userspace, or is it just luck that it happens to work?

No, it's not sys_cacheflush but using dma_cache_maint for user range.
And yes I know that at the moment it's not the Right Way to use it on
a user range, but the alternative of flushing each page separately is
just prohibitively slow.

Hopefully by adding the necessary fixups for the cache ops (and taking
mmap_sem) will make this an ok thing to do. An alternative is to
mlock the range so no faults are triggered for it, but that's again a
not-supported-thing to do from a driver.

> Does that include O_DIRECT regular file I/O as used by databases on
> these ARMs?  (Nobody ever gives a straight answer)

Not in my case, it's just a regular anonymous mapping pinned with
get_user_pages.

--Imre

^ permalink raw reply	[flat|nested] 40+ messages in thread

* arm_syscall cacheflush breakage on VIPT platforms
  2009-09-28 13:31       ` Imre Deak
@ 2009-09-28 13:42         ` Russell King - ARM Linux
  2009-09-28 13:55           ` Aguirre Rodriguez, Sergio Alberto
  2009-09-28 14:10           ` Laurent Pinchart
  0 siblings, 2 replies; 40+ messages in thread
From: Russell King - ARM Linux @ 2009-09-28 13:42 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 28, 2009 at 04:31:09PM +0300, Imre Deak wrote:
> On Mon, Sep 28, 2009 at 03:19:26PM +0200, ext Jamie Lokier wrote:
> > Imre Deak wrote:
> > > On Mon, Sep 28, 2009 at 02:49:22PM +0200, ext Jamie Lokier wrote:
> > > > Imre Deak wrote:
> > > > > Hi,
> > > > > 
> > > > > the following test app will cause an unhandled kernel paging request
> > > > > on VIPT platforms. The triggering condition is the mmap_sem held by
> > > > > thread_func while the main thread performs cache flushing.
> > > > > 
> > > > > Since the likelihood of this to trigger is relatively low, a patch will
> > > > > follow that makes similar bugs more visible.
> > > > 
> > > > I would expect the likelihood of triggering would be higher for at
> > > > least one of Java, Mono, Parrot or any of the modern Javascript
> > > > engines.
> > > 
> > > True, the above statement is only valid for certain use patterns. I was
> > > mainly interested in applications that do user range cache flushing as
> > > part of their DMA requests and they didn't have threads with frequent
> > > syscalls that required mmap_sem, so the problem remained hidden for a
> > > long time.
> > 
> > Aieee.  Is sys_cacheflush architecturally the Right Way to do DMA to
> > userspace, or is it just luck that it happens to work?
> 
> No, it's not sys_cacheflush but using dma_cache_maint for user range.

dma_cache_maint can't be used either, because it's only valid for the
kernel's RAM mapping.

> And yes I know that at the moment it's not the Right Way to use it on
> a user range, but the alternative of flushing each page separately is
> just prohibitively slow.

That's the way it's going to have to be done I'm afraid, especially
when you realise you require the physical address for flushing
non-coherent L2 caches.  Since you need to translate to a struct page
anyway, getting that is just essential.

> Hopefully by adding the necessary fixups for the cache ops (and taking
> mmap_sem) will make this an ok thing to do. An alternative is to
> mlock the range so no faults are triggered for it, but that's again a
> not-supported-thing to do from a driver.

As I do keep pointing out (and people keep ignoring) there is no real
way to do DMA direct from user mappings.  It's something that the Linux
kernel Just Doesn't Support.

You ask any mainline person, and that's basically the reply you get.
It's been asked about many times.  The answer is always the same.

I believe that the reason for this is that it is _impossible_ to come
up with a way to do DMA from userspace in a cross-architecture way.
There's too many architectural details to make it work.

Eg, for at least one architecture, you need to get the right colouring
of all pages to be DMA'd and program that colour index into the DMA
controller for it to be coherent.

I really don't know what the answer is, and the pressure that I'm being
placed under on this is going to lead us into a botched solution that's
going to have long term problems.  We'll probably end up having to have
multiple interfaces, and userspace will have to work out which is the
right one to use.

I'd much rather just say "no, userspace DMA is *never* going to ever
be supported" and call it a day, but I suspect no one's going to like
that either.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* arm_syscall cacheflush breakage on VIPT platforms
  2009-09-28 13:42         ` Russell King - ARM Linux
@ 2009-09-28 13:55           ` Aguirre Rodriguez, Sergio Alberto
  2009-09-28 14:07             ` Jamie Lokier
  2009-09-28 14:10           ` Laurent Pinchart
  1 sibling, 1 reply; 40+ messages in thread
From: Aguirre Rodriguez, Sergio Alberto @ 2009-09-28 13:55 UTC (permalink / raw)
  To: linux-arm-kernel

> From: linux-arm-kernel-bounces at lists.infradead.org [linux-arm-kernel-bounces at lists.infradead.org] On Behalf Of Russell King - ARM Linux [linux at arm.linux.org.uk]
> Sent: Monday, September 28, 2009 8:42 AM
> On Mon, Sep 28, 2009 at 04:31:09PM +0300, Imre Deak wrote:
> > On Mon, Sep 28, 2009 at 03:19:26PM +0200, ext Jamie Lokier wrote:
> > > Imre Deak wrote:
> > > > On Mon, Sep 28, 2009 at 02:49:22PM +0200, ext Jamie Lokier wrote:
> > > > > Imre Deak wrote:
> > > > > > Hi,
> > > > > >
> > > > > > the following test app will cause an unhandled kernel paging request
> > > > > > on VIPT platforms. The triggering condition is the mmap_sem held by
> > > > > > thread_func while the main thread performs cache flushing.
> > > > > >
> > > > > > Since the likelihood of this to trigger is relatively low, a patch will
> > > > > > follow that makes similar bugs more visible.
> > > > >
> > > > > I would expect the likelihood of triggering would be higher for at
> > > > > least one of Java, Mono, Parrot or any of the modern Javascript
> > > > > engines.
> > > >
> > > > True, the above statement is only valid for certain use patterns. I was
> > > > mainly interested in applications that do user range cache flushing as
> > > > part of their DMA requests and they didn't have threads with frequent
> > > > syscalls that required mmap_sem, so the problem remained hidden for a
> > > > long time.
> > >
> > > Aieee.  Is sys_cacheflush architecturally the Right Way to do DMA to
> > > userspace, or is it just luck that it happens to work?
> >
> > No, it's not sys_cacheflush but using dma_cache_maint for user range.
> 
> dma_cache_maint can't be used either, because it's only valid for the
> kernel's RAM mapping.
> 
> > And yes I know that at the moment it's not the Right Way to use it on
> > a user range, but the alternative of flushing each page separately is
> > just prohibitively slow.
> 
> That's the way it's going to have to be done I'm afraid, especially
> when you realise you require the physical address for flushing
> non-coherent L2 caches.  Since you need to translate to a struct page
> anyway, getting that is just essential.
> 
> > Hopefully by adding the necessary fixups for the cache ops (and taking
> > mmap_sem) will make this an ok thing to do. An alternative is to
> > mlock the range so no faults are triggered for it, but that's again a
> > not-supported-thing to do from a driver.
> 
> As I do keep pointing out (and people keep ignoring) there is no real
> way to do DMA direct from user mappings.  It's something that the Linux
> kernel Just Doesn't Support.
> 
> You ask any mainline person, and that's basically the reply you get.
> It's been asked about many times.  The answer is always the same.
> 
> I believe that the reason for this is that it is _impossible_ to come
> up with a way to do DMA from userspace in a cross-architecture way.
> There's too many architectural details to make it work.
> 
> Eg, for at least one architecture, you need to get the right colouring
> of all pages to be DMA'd and program that colour index into the DMA
> controller for it to be coherent.
> 
> I really don't know what the answer is, and the pressure that I'm being
> placed under on this is going to lead us into a botched solution that's
> going to have long term problems.  We'll probably end up having to have
> multiple interfaces, and userspace will have to work out which is the
> right one to use.
> 
> I'd much rather just say "no, userspace DMA is *never* going to ever
> be supported" and call it a day, but I suspect no one's going to like
> that either.

Just FYI...

In OMAP3 specifically, we were looking for this to happen, since we have
big buffers that we need to share with other subsystems.

For example, when we take a 8 megapixel RAW 10-bit image (16MB) and
we want to send it to the DSP bridge driver, doing a memcpy to another kernel
allocated and mmaped buffer is a very suboptimal idea.
(which is our only option as per your statement)

Is not that we don't like it, but having a global solution will be more demanded in the
future, so not everybody has to work around this in their subsystem.

Its a real problem, and its not hard to hit nowadays.

Regards,
Sergio
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* arm_syscall cacheflush breakage on VIPT platforms
  2009-09-28 13:25       ` Russell King - ARM Linux
@ 2009-09-28 13:56         ` Jamie Lokier
  0 siblings, 0 replies; 40+ messages in thread
From: Jamie Lokier @ 2009-09-28 13:56 UTC (permalink / raw)
  To: linux-arm-kernel

Russell King - ARM Linux wrote:
> On Mon, Sep 28, 2009 at 02:19:26PM +0100, Jamie Lokier wrote:
> > Aieee.  Is sys_cacheflush architecturally the Right Way to do DMA to
> > userspace, or is it just luck that it happens to work?
> > 
> > Does that include O_DIRECT regular file I/O as used by databases on
> > these ARMs?  (Nobody ever gives a straight answer)
> 
> Most definitely not.  As far as O_DIRECT goes, I've no idea what to do
> about that, or even if it's a problem.  I just don't use it so it's
> not something I care about.

O_DIRECT is a slightly obscure open() flag, which means bypass the
page cache when possible.

Although obscure, it is often used by databases and virtual machines,
and some file-copying utilities.  Databases includes MySQL,
PostgreSQL, Sqlite.

Direct I/O results in a read() or write() transferring directly
between a userspace-mapped page and the block device underlying a file
(if no highmem bounce buffer is used).  If the block driver uses DMA,
then the DMA goes to the userspace-mapped page.

I say often, because O_DIRECT has a fallback where it uses the regular
page cache path sometimes.  Extending a file and filling holes always
uses the page cache.  Reads and in-place writes which are page-aligned
and filesystem-block-aligned result in direct I/O.

You can generally tell what happened from timing: reading twice will
be fast the second time through the page cache, but takes the same
time using direct I/O because it goes to the device each time; writing
is fast the first time into the page cache (which is write-back), but
direct I/O writes take as much time as the device needs.

> I wouldn't even know _how_ to use it or even how to provoke any bugs
> in that area.

Here are some simple tests:

Read a file with O_DIRECT:

   dd if=somefile iflag=direct bs=1M | md5sum -

Read a disk partition with O_DIRECT:

   dd if=/dev/sda1 iflag=direct bs=16M | md5sum -

Write a file with O_DIRECT:

   dd if=/dev/zero of=testfile bs=1M count=16 # Preallocate the file
   dd if=somedata of=testfile oflag=direct bs=1M # Write in place

As above to write to a disk partition.

It's not hard to imagine how that translates to DMA using the block
device driver.

(Note, if you test, it's not supported on all filesystems, just the
"major" ones like ext2/3/4, reiserfs, xfs, btrfs etc.  NFS supports
O_DIRECT but might not use DMA in the same way.  I don't think it
applies to any of the flash filesystems.  As said earlier, you can
tell if direct I/O is being used from the timing).

If there are DMA cache coherence issues, I would expect _some_
combination of dd commends to result in a corrupt file, either on disk
afterwards, or in page cache which is detectable by md5sum.  It might
be necessary to choose a particular block size and data pattern to
show it.

Unfortunately I don't have any ARM hardware with the type of caches
which have been discussed re. the DMA to/from userspace issues to
perform those tests, or to refine them to highlight an effect, or to
rule it out.

Usually I'd say DMA to userspace is dirty and arch-specific, and
people must do special things or even not use it, on some archs.
But O_DIRECT is a generic filesystem feature on all Linux kernels (and
other OSes), and is used by certain widely used apps, so needs to
either work correctly, or if that's really too difficult, then
O_DIRECT should be prevented from being enabled at all.  (All apps can
cope with the fallback to non-direct I/O).

I simply couldn't tell from the prior discussions about userspace DMA
not being possible due to cache incoherence, whether that would affect
O_DIRECT I/O or not.  But if you need help working it out, or making a
test, I can probably help with that.

-- Jamie

^ permalink raw reply	[flat|nested] 40+ messages in thread

* arm_syscall cacheflush breakage on VIPT platforms
  2009-09-28 13:55           ` Aguirre Rodriguez, Sergio Alberto
@ 2009-09-28 14:07             ` Jamie Lokier
  0 siblings, 0 replies; 40+ messages in thread
From: Jamie Lokier @ 2009-09-28 14:07 UTC (permalink / raw)
  To: linux-arm-kernel

Aguirre Rodriguez, Sergio Alberto wrote:
> In OMAP3 specifically, we were looking for this to happen, since we have
> big buffers that we need to share with other subsystems.
> 
> For example, when we take a 8 megapixel RAW 10-bit image (16MB) and
> we want to send it to the DSP bridge driver, doing a memcpy to another kernel
> allocated and mmaped buffer is a very suboptimal idea.
> (which is our only option as per your statement)

I used to think that about moving compressed video data around.  I
wanted my video decoding application to avoid copying compressed data
(which is about 20Mbit/s).  But then I did some back-of-the-envelope
calculations and decided it was so much easier to copy:

You're copying 16MB.  Let me guess - 32 bit data bus?  According to
Wikipedia, OMAP3 CPU is 600 to 1000MHz.  I've no idea what you're RAM
bus speed is, but is it somewhere around 200MHz?

Just guesstimating here: 16MB on 32-bit bus at 200MHz, copying to RAM
and back again = 200/8 = 0.04 seconds.

Is 0.04 seconds copying time worth devising zero-copy schemes for?  If
that's every frame of a video I'd say yes (but you'd have other
problems first); if it's non-video camera processing then I'd say no.
It's _nice_ to shave off 0.04 seconds, but weight it up against the
difficulties (and overheads) of making reliable DMA in this case.

-- Jamie

^ permalink raw reply	[flat|nested] 40+ messages in thread

* arm_syscall cacheflush breakage on VIPT platforms
  2009-09-28 13:42         ` Russell King - ARM Linux
  2009-09-28 13:55           ` Aguirre Rodriguez, Sergio Alberto
@ 2009-09-28 14:10           ` Laurent Pinchart
  2009-09-28 14:15             ` Jamie Lokier
  2009-09-28 14:20             ` Bill Gatliff
  1 sibling, 2 replies; 40+ messages in thread
From: Laurent Pinchart @ 2009-09-28 14:10 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Russel,

On Monday 28 September 2009 15:42:32 Russell King - ARM Linux wrote:
> On Mon, Sep 28, 2009 at 04:31:09PM +0300, Imre Deak wrote:
> > On Mon, Sep 28, 2009 at 03:19:26PM +0200, ext Jamie Lokier wrote:
> > > Imre Deak wrote:
> > > > I was mainly interested in applications that do user range cache
> > > > flushing as part of their DMA requests and they didn't have threads
> > > > with frequent syscalls that required mmap_sem, so the problem
> > > > remained hidden for a long time.
> > >
> > > Aieee.  Is sys_cacheflush architecturally the Right Way to do DMA to
> > > userspace, or is it just luck that it happens to work?
> >
> > No, it's not sys_cacheflush but using dma_cache_maint for user range.
> 
> dma_cache_maint can't be used either, because it's only valid for the
> kernel's RAM mapping.
> 
> > And yes I know that at the moment it's not the Right Way to use it on
> > a user range, but the alternative of flushing each page separately is
> > just prohibitively slow.
> 
> That's the way it's going to have to be done I'm afraid, especially
> when you realise you require the physical address for flushing
> non-coherent L2 caches.  Since you need to translate to a struct page
> anyway, getting that is just essential.
> 
> > Hopefully by adding the necessary fixups for the cache ops (and taking
> > mmap_sem) will make this an ok thing to do. An alternative is to
> > mlock the range so no faults are triggered for it, but that's again a
> > not-supported-thing to do from a driver.
> 
> As I do keep pointing out (and people keep ignoring) there is no real
> way to do DMA direct from user mappings.  It's something that the Linux
> kernel Just Doesn't Support.

I don't think people are ignoring you, but they are mostly trying to find a 
way to make it (somehow) work.

> You ask any mainline person, and that's basically the reply you get.
> It's been asked about many times.  The answer is always the same.
> 
> I believe that the reason for this is that it is _impossible_ to come
> up with a way to do DMA from userspace in a cross-architecture way.
> There's too many architectural details to make it work.
> 
> Eg, for at least one architecture, you need to get the right colouring
> of all pages to be DMA'd and program that colour index into the DMA
> controller for it to be coherent.
>
> I really don't know what the answer is, and the pressure that I'm being
> placed under on this is going to lead us into a botched solution that's
> going to have long term problems.  We'll probably end up having to have
> multiple interfaces, and userspace will have to work out which is the
> right one to use.

Do we really need a cross-architecture solution ? The pressure to implement a 
working userspace DMA solution seem to come mostly from embedded system 
developers, and embedded systems usually don't mind arch-specific APIs.
 
> I'd much rather just say "no, userspace DMA is *never* going to ever
> be supported" and call it a day, but I suspect no one's going to like
> that either.

In that case developers will all create their own incompatible solutions and 
the situation will likely get worse. Do you think it would be possible to 
create a clean DMA-to-userspace API specific to the ARM architecture ?

-- 
Laurent Pinchart

^ permalink raw reply	[flat|nested] 40+ messages in thread

* arm_syscall cacheflush breakage on VIPT platforms
  2009-09-28 14:10           ` Laurent Pinchart
@ 2009-09-28 14:15             ` Jamie Lokier
  2009-09-28 14:22               ` Laurent Pinchart
  2009-09-28 20:18               ` Steven Walter
  2009-09-28 14:20             ` Bill Gatliff
  1 sibling, 2 replies; 40+ messages in thread
From: Jamie Lokier @ 2009-09-28 14:15 UTC (permalink / raw)
  To: linux-arm-kernel

Laurent Pinchart wrote:
> > I'd much rather just say "no, userspace DMA is *never* going to ever
> > be supported" and call it a day, but I suspect no one's going to like
> > that either.
> 
> In that case developers will all create their own incompatible solutions and 
> the situation will likely get worse. Do you think it would be possible to 
> create a clean DMA-to-userspace API specific to the ARM architecture ?

I hate to spell out the obvious, but a fine solution is to _not_ DMA
directly to userspace, but to kmalloc() a large buffer in your own
driver, DMA into the buffer (it's kernel memory so that's ok), and
_then_ mmap() that buffer into userspace after the DMA.  Going the
other way, mmap(), write from userspace, munmap(), then do the DMA to
the device.

That's trivial to implement, and the developer's we're talking about
should have no difficulty writing a simple driver like that.  They
have a driver already, it's just a matter of adding the mmap method.

Russell, is there any reason why the above would not work?

-- Jamie

^ permalink raw reply	[flat|nested] 40+ messages in thread

* arm_syscall cacheflush breakage on VIPT platforms
  2009-09-28 14:10           ` Laurent Pinchart
  2009-09-28 14:15             ` Jamie Lokier
@ 2009-09-28 14:20             ` Bill Gatliff
  1 sibling, 0 replies; 40+ messages in thread
From: Bill Gatliff @ 2009-09-28 14:20 UTC (permalink / raw)
  To: linux-arm-kernel

Laurent Pinchart wrote:
> Do we really need a cross-architecture solution ? The pressure to implement a 
> working userspace DMA solution seem to come mostly from embedded system 
> developers, and embedded systems usually don't mind arch-specific APIs.
>   

So what if the first attempt is ARM-only?  Let the PPC guys have theirs, 
and so on--- all that experience should give us ideas on how to do the 
One Size Fits All version.  In the meantime, at least we'll have 
something we can use.

It's obviously a tricky problem to solve.  Think of the arch-specific 
version as being "prototype".  We'll be prototyping for a while, I'm 
sure, and the fact that we don't have such an API yet tells me that we 
need a few prototypes to identify how--- if it's even possible--- to 
proceed with a generic implementation.

> In that case developers will all create their own incompatible solutions and 
> the situation will likely get worse.

... and that creates the additional problem of cleaning up the mess 
while at the same time finding and implementing a cross-architecture 
solution.



b.g.

-- 
Bill Gatliff
bgat at billgatliff.com

^ permalink raw reply	[flat|nested] 40+ messages in thread

* arm_syscall cacheflush breakage on VIPT platforms
  2009-09-28 14:15             ` Jamie Lokier
@ 2009-09-28 14:22               ` Laurent Pinchart
  2009-09-28 14:50                 ` Jamie Lokier
  2009-09-28 20:18               ` Steven Walter
  1 sibling, 1 reply; 40+ messages in thread
From: Laurent Pinchart @ 2009-09-28 14:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Monday 28 September 2009 16:15:10 Jamie Lokier wrote:
> Laurent Pinchart wrote:
> > > I'd much rather just say "no, userspace DMA is *never* going to ever
> > > be supported" and call it a day, but I suspect no one's going to like
> > > that either.
> >
> > In that case developers will all create their own incompatible solutions
> > and the situation will likely get worse. Do you think it would be
> > possible to create a clean DMA-to-userspace API specific to the ARM
> > architecture ?
> 
> I hate to spell out the obvious, but a fine solution is to _not_ DMA
> directly to userspace, but to kmalloc() a large buffer in your own
> driver, DMA into the buffer (it's kernel memory so that's ok), and
> _then_ mmap() that buffer into userspace after the DMA.  Going the
> other way, mmap(), write from userspace, munmap(), then do the DMA to
> the device.
> 
> That's trivial to implement, and the developer's we're talking about
> should have no difficulty writing a simple driver like that.  They
> have a driver already, it's just a matter of adding the mmap method.
> 
> Russell, is there any reason why the above would not work?

Performance reasons mostly. Think about transferring uncompressed 1080p at 
30fps for instance.

-- 
Laurent Pinchart

^ permalink raw reply	[flat|nested] 40+ messages in thread

* arm_syscall cacheflush breakage on VIPT platforms
  2009-09-28 14:22               ` Laurent Pinchart
@ 2009-09-28 14:50                 ` Jamie Lokier
  2009-09-28 16:28                   ` Imre Deak
  0 siblings, 1 reply; 40+ messages in thread
From: Jamie Lokier @ 2009-09-28 14:50 UTC (permalink / raw)
  To: linux-arm-kernel

Laurent Pinchart wrote:
> On Monday 28 September 2009 16:15:10 Jamie Lokier wrote:
> > Laurent Pinchart wrote:
> > > > I'd much rather just say "no, userspace DMA is *never* going to ever
> > > > be supported" and call it a day, but I suspect no one's going to like
> > > > that either.
> > >
> > > In that case developers will all create their own incompatible solutions
> > > and the situation will likely get worse. Do you think it would be
> > > possible to create a clean DMA-to-userspace API specific to the ARM
> > > architecture ?
> > 
> > I hate to spell out the obvious, but a fine solution is to _not_ DMA
> > directly to userspace, but to kmalloc() a large buffer in your own
> > driver, DMA into the buffer (it's kernel memory so that's ok), and
> > _then_ mmap() that buffer into userspace after the DMA.  Going the
> > other way, mmap(), write from userspace, munmap(), then do the DMA to
> > the device.
> > 
> > That's trivial to implement, and the developer's we're talking about
> > should have no difficulty writing a simple driver like that.  They
> > have a driver already, it's just a matter of adding the mmap method.
> > 
> > Russell, is there any reason why the above would not work?
> 
> Performance reasons mostly. Think about transferring uncompressed 1080p at 
> 30fps for instance.

Did you confuse the mail you quoted with a different one?  What part
of using mmap+munmap 30 times per second has any impact at all on your
DMA performance?

-- Jamie

^ permalink raw reply	[flat|nested] 40+ messages in thread

* arm_syscall cacheflush breakage on VIPT platforms
  2009-09-28 14:50                 ` Jamie Lokier
@ 2009-09-28 16:28                   ` Imre Deak
  2009-09-28 19:35                     ` Jamie Lokier
  0 siblings, 1 reply; 40+ messages in thread
From: Imre Deak @ 2009-09-28 16:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 28, 2009 at 04:50:28PM +0200, ext Jamie Lokier wrote:
> Laurent Pinchart wrote:
> > On Monday 28 September 2009 16:15:10 Jamie Lokier wrote:
> > > Laurent Pinchart wrote:
> > > > > I'd much rather just say "no, userspace DMA is *never* going to ever
> > > > > be supported" and call it a day, but I suspect no one's going to like
> > > > > that either.
> > > >
> > > > In that case developers will all create their own incompatible solutions
> > > > and the situation will likely get worse. Do you think it would be
> > > > possible to create a clean DMA-to-userspace API specific to the ARM
> > > > architecture ?
> > > 
> > > I hate to spell out the obvious, but a fine solution is to _not_ DMA
> > > directly to userspace, but to kmalloc() a large buffer in your own
> > > driver, DMA into the buffer (it's kernel memory so that's ok), and
> > > _then_ mmap() that buffer into userspace after the DMA.  Going the
> > > other way, mmap(), write from userspace, munmap(), then do the DMA to
> > > the device.

One case where I don't see how this would work is when you want to pass
on the read data to another device using DMA as well. For example when
the raw captured data is written to flash storage. Unless you have some
way of letting know the target device that the area is kmalloc'd, but
that seems to be not so standard again.

> > > That's trivial to implement, and the developer's we're talking about
> > > should have no difficulty writing a simple driver like that.  They
> > > have a driver already, it's just a matter of adding the mmap method.
> > > 
> > > Russell, is there any reason why the above would not work?

The need for large physically contiguous allocations at run time.
Preallocation is not so nice if you have a bunch of multimedia
peripherals in your device.

--Imre

^ permalink raw reply	[flat|nested] 40+ messages in thread

* arm_syscall cacheflush breakage on VIPT platforms
  2009-09-28  9:59     ` Russell King - ARM Linux
  2009-09-28 10:10       ` Imre Deak
@ 2009-09-28 16:54       ` Catalin Marinas
  1 sibling, 0 replies; 40+ messages in thread
From: Catalin Marinas @ 2009-09-28 16:54 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 2009-09-28 at 10:59 +0100, Russell King - ARM Linux wrote:
> On Mon, Sep 28, 2009 at 12:54:29PM +0300, Imre Deak wrote:
> > On Mon, Sep 28, 2009 at 11:41:40AM +0200, ext Russell King - ARM Linux wrote:
> > > On Mon, Sep 28, 2009 at 12:29:19PM +0300, Imre Deak wrote:
> > > > Hi,
> > > > 
> > > > the following test app will cause an unhandled kernel paging request
> > > > on VIPT platforms. The triggering condition is the mmap_sem held by
> > > > thread_func while the main thread performs cache flushing.
> > > > 
> > > > Since the likelihood of this to trigger is relatively low, a patch will
> > > > follow that makes similar bugs more visible.
> > > 
> > > The problem is that, unlike previous cache architectures, if a page is
> > > not present we now get a data abort during cache maintainence.  That
> > > means the cache maintainence instructions used for this call need to
> > > be marked with user fixups, so that the kernel knows how to handle
> > > such an abort.
> > > 
> > > It is not caused by the holding of mmap_sem.
> > 
> > This particular bug is caused by holding of mmap_sem, without any fixup
> > being defined.
> 
> No it is not.  The problem has nothing to do with holding of mmap_sem
> AT ALL.  In fact, do_cache_op needs to hold mmap_sem itself, to prevent
> the VMA going away beneath it.  That's not going to stop it generating
> faults, and it's not going to stop it oopsing.
> 
> The problem is that we don't have any fixup in place for this situation.

Something like this should solve the problem (not fully tested but
implemented as part of the mprotect workaround for the COW issue):


Handle possible translation errors in the ARMv6 and ARMv7 coherent user

Handle possible translation errors in the ARMv6 and ARMv7 coherent user
range functions. This is needed because applications using the
sys_cacheflush system call can pass a memory range which isn't mapped
yet even though the corresponding vma is valid.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---

diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
index 8364f6c..7baa6ce 100644
--- a/arch/arm/mm/cache-v6.S
+++ b/arch/arm/mm/cache-v6.S
@@ -12,6 +12,7 @@
 #include <linux/linkage.h>
 #include <linux/init.h>
 #include <asm/assembler.h>
+#include <asm/unwind.h>
 
 #include "proc-macros.S"
 
@@ -129,11 +130,13 @@ ENTRY(v6_coherent_kern_range)
  *	- the Icache does not read data from the write buffer
  */
 ENTRY(v6_coherent_user_range)
-
+ UNWIND(.fnstart		)
 #ifdef HARVARD_CACHE
 	bic	r0, r0, #CACHE_LINE_SIZE - 1
-1:	mcr	p15, 0, r0, c7, c10, 1		@ clean D line
+1:
+ USER(	mcr	p15, 0, r0, c7, c10, 1	)	@ clean D line
 	add	r0, r0, #CACHE_LINE_SIZE
+2:
 	cmp	r0, r1
 	blo	1b
 #endif
@@ -151,6 +154,19 @@ ENTRY(v6_coherent_user_range)
 	mov	pc, lr
 
 /*
+ * Fault handling for the cache operation above. If the virtual address in r0
+ * isn't mapped, just try the next page.
+ */
+9001:
+	mov	r0, r0, lsr #12
+	mov	r0, r0, lsl #12
+	add	r0, r0, #4096
+	b	2b
+ UNWIND(.fnend		)
+ENDPROC(v6_coherent_user_range)
+ENDPROC(v6_coherent_kern_range)
+
+/*
  *	v6_flush_kern_dcache_page(kaddr)
  *
  *	Ensure that the data held in the page kaddr is written back
diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index 29e6904..4b733d1 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -13,6 +13,7 @@
 #include <linux/linkage.h>
 #include <linux/init.h>
 #include <asm/assembler.h>
+#include <asm/unwind.h>
 
 #include "proc-macros.S"
 
@@ -153,13 +154,16 @@ ENTRY(v7_coherent_kern_range)
  *	- the Icache does not read data from the write buffer
  */
 ENTRY(v7_coherent_user_range)
+ UNWIND(.fnstart		)
 	dcache_line_size r2, r3
 	sub	r3, r2, #1
 	bic	r0, r0, r3
-1:	mcr	p15, 0, r0, c7, c11, 1		@ clean D line to the point of unification
+1:
+ USER(	mcr	p15, 0, r0, c7, c11, 1	)	@ clean D line to the point of unification
 	dsb
-	mcr	p15, 0, r0, c7, c5, 1		@ invalidate I line
+ USER(	mcr	p15, 0, r0, c7, c5, 1	)	@ invalidate I line
 	add	r0, r0, r2
+2:
 	cmp	r0, r1
 	blo	1b
 	mov	r0, #0
@@ -167,6 +171,17 @@ ENTRY(v7_coherent_user_range)
 	dsb
 	isb
 	mov	pc, lr
+
+/*
+ * Fault handling for the cache operation above. If the virtual address in r0
+ * isn't mapped, just try the next page.
+ */
+9001:
+	mov	r0, r0, lsr #12
+	mov	r0, r0, lsl #12
+	add	r0, r0, #4096
+	b	2b
+ UNWIND(.fnend		)
 ENDPROC(v7_coherent_kern_range)
 ENDPROC(v7_coherent_user_range)
 

-- 
Catalin

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* arm_syscall cacheflush breakage on VIPT platforms
  2009-09-28 16:28                   ` Imre Deak
@ 2009-09-28 19:35                     ` Jamie Lokier
  2009-09-29  9:10                       ` Imre Deak
  0 siblings, 1 reply; 40+ messages in thread
From: Jamie Lokier @ 2009-09-28 19:35 UTC (permalink / raw)
  To: linux-arm-kernel

Imre Deak wrote:
> > > Jamie Lokier wrote:
> > > > I hate to spell out the obvious, but a fine solution is to _not_ DMA
> > > > directly to userspace, but to kmalloc() a large buffer in your own
> > > > driver, DMA into the buffer (it's kernel memory so that's ok), and
> > > > _then_ mmap() that buffer into userspace after the DMA.  Going the
> > > > other way, mmap(), write from userspace, munmap(), then do the DMA to
> > > > the device.
> 
> One case where I don't see how this would work is when you want to pass
> on the read data to another device using DMA as well. For example when
> the raw captured data is written to flash storage. Unless you have some
> way of letting know the target device that the area is kmalloc'd, but
> that seems to be not so standard again.

An understandable desire.

But if you want to do that, DMA to userspace and then from userspace is
not a particularly efficient way to do it anyway - because both DMAs
would have to walk the page tables in get_user_pages.

I believe someone posted an architecture/RFC/patches (I forget) for
passing memory blocks between devices for camera/video type
applications a few months ago.

I was advocating transfering to/from userspace, and the person
providing that framework said it was too slow to go via userspace.

> > > > That's trivial to implement, and the developer's we're talking about
> > > > should have no difficulty writing a simple driver like that.  They
> > > > have a driver already, it's just a matter of adding the mmap method.
> > > > 
> > > > Russell, is there any reason why the above would not work?
> 
> The need for large physically contiguous allocations at run time.
> Preallocation is not so nice if you have a bunch of multimedia
> peripherals in your device.

The kmalloc+mmap approach does not require any large contiguous
allocations, unless that's a property of your hardware, in which case
nothing will avoid it.

-- Jamie

^ permalink raw reply	[flat|nested] 40+ messages in thread

* arm_syscall cacheflush breakage on VIPT platforms
  2009-09-28 14:15             ` Jamie Lokier
  2009-09-28 14:22               ` Laurent Pinchart
@ 2009-09-28 20:18               ` Steven Walter
  2009-09-29  0:50                 ` Jamie Lokier
  1 sibling, 1 reply; 40+ messages in thread
From: Steven Walter @ 2009-09-28 20:18 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 28, 2009 at 10:15 AM, Jamie Lokier <jamie@shareable.org> wrote:
> Laurent Pinchart wrote:
>> > I'd much rather just say "no, userspace DMA is *never* going to ever
>> > be supported" and call it a day, but I suspect no one's going to like
>> > that either.
>>
>> In that case developers will all create their own incompatible solutions and
>> the situation will likely get worse. Do you think it would be possible to
>> create a clean DMA-to-userspace API specific to the ARM architecture ?
>
> I hate to spell out the obvious, but a fine solution is to _not_ DMA
> directly to userspace, but to kmalloc() a large buffer in your own
> driver, DMA into the buffer (it's kernel memory so that's ok), and
> _then_ mmap() that buffer into userspace after the DMA. ?Going the
> other way, mmap(), write from userspace, munmap(), then do the DMA to
> the device.

It's not that simple when you have VIVT caches.

For example, consider the to-device case.  Userspace mmap()s the
buffer and writes into it.  They indicate to the device driver that
the data is ready for DMA.  Unfortunately, a simple dma_map_single of
the kmalloc'd buffer is not sufficient.  That will only clean
cachelines that fall into the kernel address range, and userspace
stored into the buffer using the userspace address range.

The problem exists for the from-device case on the second and
subsequent DMA.  After the app reads from the buffer the first time,
the cache may potentially contain stale cache lines that need to be
invalidated, and dma_unmap_single will not invalidate them.
-- 
-Steven Walter <stevenrwalter@gmail.com>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* arm_syscall cacheflush breakage on VIPT platforms
  2009-09-28 20:18               ` Steven Walter
@ 2009-09-29  0:50                 ` Jamie Lokier
  0 siblings, 0 replies; 40+ messages in thread
From: Jamie Lokier @ 2009-09-29  0:50 UTC (permalink / raw)
  To: linux-arm-kernel

Steven Walter wrote:
> On Mon, Sep 28, 2009 at 10:15 AM, Jamie Lokier <jamie@shareable.org> wrote:
> > Laurent Pinchart wrote:
> >> > I'd much rather just say "no, userspace DMA is *never* going to ever
> >> > be supported" and call it a day, but I suspect no one's going to like
> >> > that either.
> >>
> >> In that case developers will all create their own incompatible solutions and
> >> the situation will likely get worse. Do you think it would be possible to
> >> create a clean DMA-to-userspace API specific to the ARM architecture ?
> >
> > I hate to spell out the obvious, but a fine solution is to _not_ DMA
> > directly to userspace, but to kmalloc() a large buffer in your own
> > driver, DMA into the buffer (it's kernel memory so that's ok), and
> > _then_ mmap() that buffer into userspace after the DMA. ?Going the
> > other way, mmap(), write from userspace, munmap(), then do the DMA to
> > the device.
> 
> It's not that simple when you have VIVT caches.
> 
> For example, consider the to-device case.  Userspace mmap()s the
> buffer and writes into it.  They indicate to the device driver that
> the data is ready for DMA.  Unfortunately, a simple dma_map_single of
> the kmalloc'd buffer is not sufficient.  That will only clean
> cachelines that fall into the kernel address range, and userspace
> stored into the buffer using the userspace address range.

Read the mail again.

The point is to write to the buffer and then call munmap() before
doing DMA.

> The problem exists for the from-device case on the second and
> subsequent DMA.  After the app reads from the buffer the first time,
> the cache may potentially contain stale cache lines that need to be
> invalidated, and dma_unmap_single will not invalidate them.

That's why it unmaps the buffer over DMA operations.

If munmap() doesn't unmap properly you have bigger problems than DMA.

Does munmap() work for this, or does that contain a nasty surprise too?

[ msync() would be logical a place to put explicit pre-DMA and
post-DMA cache requests, to avoid needing to unmap.  But I don't hold
much hope for msync() doing the right thing at this time.  If it turns
out that explicit cache cleaning and flushing are needed, though,
msync() would be a logical place to put it, perhaps with new flags. ]

-- Jamie

^ permalink raw reply	[flat|nested] 40+ messages in thread

* arm_syscall cacheflush breakage on VIPT platforms
  2009-09-28 19:35                     ` Jamie Lokier
@ 2009-09-29  9:10                       ` Imre Deak
  0 siblings, 0 replies; 40+ messages in thread
From: Imre Deak @ 2009-09-29  9:10 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 28, 2009 at 09:35:29PM +0200, ext Jamie Lokier wrote:
> Imre Deak wrote:
> > > > Jamie Lokier wrote:
> > > > > I hate to spell out the obvious, but a fine solution is to _not_ DMA
> > > > > directly to userspace, but to kmalloc() a large buffer in your own
> > > > > driver, DMA into the buffer (it's kernel memory so that's ok), and
> > > > > _then_ mmap() that buffer into userspace after the DMA.  Going the
> > > > > other way, mmap(), write from userspace, munmap(), then do the DMA to
> > > > > the device.
> > 
> > One case where I don't see how this would work is when you want to pass
> > on the read data to another device using DMA as well. For example when
> > the raw captured data is written to flash storage. Unless you have some
> > way of letting know the target device that the area is kmalloc'd, but
> > that seems to be not so standard again.
> 
> An understandable desire.
> 
> But if you want to do that, DMA to userspace and then from userspace is
> not a particularly efficient way to do it anyway - because both DMAs
> would have to walk the page tables in get_user_pages.
> I believe someone posted an architecture/RFC/patches (I forget) for
> passing memory blocks between devices for camera/video type
> applications a few months ago.

True, that would be the most ideal, you would even save the cache flush.
It assumes also that the data can be passed as-is, but perhaps that's a
fair assumption.

Another example where mmap would be inflexible is when the buffer is
shared between processes, or you are provided by a framework - like
gstreamer - with a ready buffer to transfer.

> I was advocating transfering to/from userspace, and the person
> providing that framework said it was too slow to go via userspace.
> 
> > > > > That's trivial to implement, and the developer's we're talking about
> > > > > should have no difficulty writing a simple driver like that.  They
> > > > > have a driver already, it's just a matter of adding the mmap method.
> > > > > 
> > > > > Russell, is there any reason why the above would not work?
> > 
> > The need for large physically contiguous allocations at run time.
> > Preallocation is not so nice if you have a bunch of multimedia
> > peripherals in your device.
> 
> The kmalloc+mmap approach does not require any large contiguous
> allocations, unless that's a property of your hardware, in which case
> nothing will avoid it.

No, I didn't mean the hardware, but was just wondering what the
benefit would be over get_user_pages. As I see you can either do:

a. kmalloc the whole buffer which will be physically contiguous. If
   this is mapped to user space so that it won't create aliases with
   the kernel direct mapping, dma_map_single can be used which will
   do proper cache sync'ing.

b. kmalloc in chunks (practically pages) and pass it to dma_map_sg,
   which will only flush cache lines for the kernel direct mapping.
   Thus in addition for each page you'll have to flush cache lines
   for the user mapping. For me this doesn't provide much benefit
   over using get_user_pages.

--Imre

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v3] ARM: add debug check for invalid kernel page faults
  2009-09-28 11:34                   ` Russell King - ARM Linux
@ 2009-09-29 10:07                     ` Imre Deak
  0 siblings, 0 replies; 40+ messages in thread
From: Imre Deak @ 2009-09-29 10:07 UTC (permalink / raw)
  To: linux-arm-kernel

From: Imre Deak <imre.deak@nokia.com>

According to the following in arch/arm/mm/fault.c page faults from
kernel mode are invalid if mmap_sem is already held and there is
no fixup handler defined for the faulting instruction:

/*
 * As per x86, we may deadlock here.  However, since the kernel only
 * validly references user space from well defined areas of the code,
 * we can bug out early if this is from code which shouldn't.
 */
if (!down_read_trylock(&mm->mmap_sem)) {
	if (!user_mode(regs) && !search_exception_tables(regs->ARM_pc))
		goto no_context;

Since mmap_sem can be held at arbitrary times by another thread this
also means that any page faults from kernel mode are invalid if no
fixup handler is defined for them, regardless whether mmap_sem is
held at the time of fault.

To easier detect code that can trigger the above error, add a check
also for the case where mmap_sem is acquired. As this has an overhead
make it a VM debug check.

Signed-off-by: Imre Deak <imre.deak@nokia.com>
---

Instead of a warning kill the process if the problem is detected.

 arch/arm/mm/fault.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index cc8829d..b5e9c88 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -278,6 +278,12 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 		if (!user_mode(regs) && !search_exception_tables(regs->ARM_pc))
 			goto no_context;
 		down_read(&mm->mmap_sem);
+	} else {
+#ifdef CONFIG_DEBUG_VM
+		if (!user_mode(regs) &&
+		    !search_exception_tables(regs->ARM_pc))
+			goto no_context;
+#endif
 	}
 
 	fault = __do_page_fault(mm, addr, fsr, tsk);
-- 
1.6.3.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2009-09-29 10:07 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-09-28  9:29 arm_syscall cacheflush breakage on VIPT platforms Imre Deak
2009-09-28  9:41 ` Russell King - ARM Linux
2009-09-28  9:54   ` Imre Deak
2009-09-28  9:59     ` Russell King - ARM Linux
2009-09-28 10:10       ` Imre Deak
2009-09-28 10:28         ` Russell King - ARM Linux
2009-09-28 11:00           ` Imre Deak
2009-09-28 16:54       ` Catalin Marinas
2009-09-28  9:48 ` [PATCH] ARM: add warning for invalid kernel page faults Imre Deak
2009-09-28  9:55   ` Russell King - ARM Linux
2009-09-28 10:00     ` Imre Deak
2009-09-28 10:04       ` Russell King - ARM Linux
2009-09-28 10:16         ` Imre Deak
2009-09-28 10:27           ` Russell King - ARM Linux
2009-09-28 11:01             ` Imre Deak
2009-09-28 11:05               ` [PATCH v2] " Imre Deak
2009-09-28 11:26               ` [PATCH] " Russell King - ARM Linux
2009-09-28 11:33                 ` Imre Deak
2009-09-28 11:34                   ` Russell King - ARM Linux
2009-09-29 10:07                     ` [PATCH v3] ARM: add debug check " Imre Deak
2009-09-28 12:49 ` arm_syscall cacheflush breakage on VIPT platforms Jamie Lokier
2009-09-28 13:16   ` Imre Deak
2009-09-28 13:19     ` Jamie Lokier
2009-09-28 13:25       ` Russell King - ARM Linux
2009-09-28 13:56         ` Jamie Lokier
2009-09-28 13:31       ` Imre Deak
2009-09-28 13:42         ` Russell King - ARM Linux
2009-09-28 13:55           ` Aguirre Rodriguez, Sergio Alberto
2009-09-28 14:07             ` Jamie Lokier
2009-09-28 14:10           ` Laurent Pinchart
2009-09-28 14:15             ` Jamie Lokier
2009-09-28 14:22               ` Laurent Pinchart
2009-09-28 14:50                 ` Jamie Lokier
2009-09-28 16:28                   ` Imre Deak
2009-09-28 19:35                     ` Jamie Lokier
2009-09-29  9:10                       ` Imre Deak
2009-09-28 20:18               ` Steven Walter
2009-09-29  0:50                 ` Jamie Lokier
2009-09-28 14:20             ` Bill Gatliff
2009-09-28 13:23     ` Russell King - ARM Linux

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.