* [PATCH v2 0/4] Optimise cache-flushing system call
From: Will Deacon @ 2013-05-24 11:31 UTC
  To: linux-arm-kernel

Hi guys,

This is a follow-on from the patches I originally posted here:

  http://lists.infradead.org/pipermail/linux-arm-kernel/2013-March/157810.html

but with some notable differences:

  - I've temporarily dropped the iovec system call while I try to work
    out a sane threshold value between flushing by line and nuking L1.

  - Added syscall restarting to address DoS issues raised by Catalin.

  - Added access_ok check now that vma searching code is removed.

  - Based on 3.10-rc2.

As per usual, all comments are welcome.

Cheers,

Will


Will Deacon (4):
  ARM: entry: allow ARM-private syscalls to be restarted
  ARM: cacheflush: split user cache-flushing into interruptible chunks
  ARM: cacheflush: don't round address range up to nearest page
  ARM: cacheflush: don't bother rounding to nearest vma

 arch/arm/include/asm/cacheflush.h  |  3 +-
 arch/arm/include/asm/thread_info.h | 11 +++++++
 arch/arm/kernel/entry-common.S     |  4 +--
 arch/arm/kernel/traps.c            | 64 +++++++++++++++++++++++++++++---------
 4 files changed, 63 insertions(+), 19 deletions(-)

-- 
1.8.2.2

* [PATCH v2 1/4] ARM: entry: allow ARM-private syscalls to be restarted
From: Will Deacon @ 2013-05-24 11:31 UTC
  To: linux-arm-kernel

System calls will only be restarted after signal handling if they (a)
return an error code indicating that a restart is required and (b) have
`why' set to a non-zero value, to indicate that the signal interrupted
them.

This patch leaves `why' set to a non-zero value for ARM-private syscalls,
and only zeroes it for syscalls that are not implemented.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/kernel/entry-common.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index bc5bc0a..8f38d24 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -437,10 +437,10 @@ local_restart:
 	ldrcc	pc, [tbl, scno, lsl #2]		@ call sys_* routine
 
 	add	r1, sp, #S_OFF
-2:	mov	why, #0				@ no longer a real syscall
 	cmp	scno, #(__ARM_NR_BASE - __NR_SYSCALL_BASE)
 	eor	r0, scno, #__NR_SYSCALL_BASE	@ put OS number back
-	bcs	arm_syscall	
+	bcs	arm_syscall
+2:	mov	why, #0				@ no longer a real syscall
 	b	sys_ni_syscall			@ not private func
 ENDPROC(vector_swi)
 
-- 
1.8.2.2

* [PATCH v2 2/4] ARM: cacheflush: split user cache-flushing into interruptible chunks
From: Will Deacon @ 2013-05-24 11:31 UTC
  To: linux-arm-kernel

Flushing a large, non-faulting VMA from userspace can potentially result
in a long time spent flushing the cache line-by-line without preemption
occurring (in the case of CONFIG_PREEMPT=n).

Whilst this doesn't affect the stability of the system, it can certainly
affect the responsiveness and CPU availability for other tasks.

This patch splits up the user cacheflush code so that it flushes in
chunks of a page. After each chunk has been flushed, we may reschedule
if appropriate and, before processing the next chunk, we allow any
pending signals to be handled before resuming from where we left off.
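
The chunk-and-resume pattern described above can be sketched outside the
kernel as follows. This is only an illustration under stated assumptions:
CHUNK_SIZE stands in for PAGE_SIZE, FLUSH_INTERRUPTED stands in for
-ERESTART_RESTARTBLOCK, and flush_state, fake_signal_pending() and
fake_flush() are hypothetical names that do not appear in the patch.
Unlike the patch, the sketch clamps the final chunk to the end address.

```c
#define CHUNK_SIZE		4096UL	/* assumption: stands in for PAGE_SIZE */
#define FLUSH_INTERRUPTED	1	/* assumption: stands in for -ERESTART_RESTARTBLOCK */

/* Hypothetical saved progress, mirroring the start/end pair in the patch. */
struct flush_state {
	unsigned long start;
	unsigned long end;
};

/* Test doubles: number of polls before a "signal" arrives (negative = never). */
static int signal_after = -1;
static unsigned long bytes_flushed;

static int fake_signal_pending(void)
{
	if (signal_after < 0)
		return 0;
	if (signal_after == 0) {
		signal_after = -1;	/* the signal is delivered once */
		return 1;
	}
	signal_after--;
	return 0;
}

static void fake_flush(unsigned long start, unsigned long end)
{
	bytes_flushed += end - start;
}

/*
 * Work through [start, end) one chunk at a time. Before each chunk, check
 * for a pending signal; if one is found, record how far we got in *save
 * and return so that the caller can resume later from save->start.
 */
static int flush_chunked(unsigned long start, unsigned long end,
			 struct flush_state *save)
{
	do {
		unsigned long chunk_end;

		if (fake_signal_pending()) {
			save->start = start;
			save->end = end;
			return FLUSH_INTERRUPTED;
		}

		/* Clamp the last chunk so that we never run past end. */
		chunk_end = (end - start > CHUNK_SIZE) ? start + CHUNK_SIZE : end;
		fake_flush(start, chunk_end);
		start = chunk_end;
	} while (start < end);

	return 0;
}
```

Resuming is then a second call with the saved pair, i.e.
flush_chunked(save.start, save.end, &save), which is roughly the role
do_cache_op_restart plays in the patch.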

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/include/asm/thread_info.h | 11 +++++++
 arch/arm/kernel/traps.c            | 63 +++++++++++++++++++++++++++++++++-----
 2 files changed, 66 insertions(+), 8 deletions(-)

diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
index 1995d1a..5c3964b 100644
--- a/arch/arm/include/asm/thread_info.h
+++ b/arch/arm/include/asm/thread_info.h
@@ -43,6 +43,16 @@ struct cpu_context_save {
 	__u32	extra[2];		/* Xscale 'acc' register, etc */
 };
 
+struct arm_restart_block {
+	union {
+		/* For user cache flushing */
+		struct {
+			unsigned long start;
+			unsigned long end;
+		} cache;
+	};
+};
+
 /*
  * low level task data that entry.S needs immediate access to.
  * __switch_to() assumes cpu_context follows immediately after cpu_domain.
@@ -68,6 +78,7 @@ struct thread_info {
 	unsigned long		thumbee_state;	/* ThumbEE Handler Base register */
 #endif
 	struct restart_block	restart_block;
+	struct arm_restart_block	arm_restart_block;
 };
 
 #define INIT_THREAD_INFO(tsk)						\
diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 18b32e8..47092f2 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -499,6 +499,52 @@ static int bad_syscall(int n, struct pt_regs *regs)
 	return regs->ARM_r0;
 }
 
+static long do_cache_op_restart(struct restart_block *);
+
+static inline int
+__do_cache_op(unsigned long start, unsigned long end)
+{
+	int ret;
+	unsigned long chunk = PAGE_SIZE;
+
+	do {
+		if (signal_pending(current)) {
+			struct thread_info *ti = current_thread_info();
+
+			ti->restart_block = (struct restart_block) {
+				.fn	= do_cache_op_restart,
+			};
+
+			ti->arm_restart_block = (struct arm_restart_block) {
+				.cache = {
+					.start	= start,
+					.end	= end,
+				},
+			};
+
+			return -ERESTART_RESTARTBLOCK;
+		}
+
+		ret = flush_cache_user_range(start, start + chunk);
+		if (ret)
+			return ret;
+
+		cond_resched();
+		start += chunk;
+	} while (start < end);
+
+	return 0;
+}
+
+static long do_cache_op_restart(struct restart_block *unused)
+{
+	struct arm_restart_block *restart_block;
+
+	restart_block = &current_thread_info()->arm_restart_block;
+	return __do_cache_op(restart_block->cache.start,
+			     restart_block->cache.end);
+}
+
 static inline int
 do_cache_op(unsigned long start, unsigned long end, int flags)
 {
@@ -510,17 +556,18 @@ do_cache_op(unsigned long start, unsigned long end, int flags)
 
 	down_read(&mm->mmap_sem);
 	vma = find_vma(mm, start);
-	if (vma && vma->vm_start < end) {
-		if (start < vma->vm_start)
-			start = vma->vm_start;
-		if (end > vma->vm_end)
-			end = vma->vm_end;
-
+	if (!vma || vma->vm_start >= end) {
 		up_read(&mm->mmap_sem);
-		return flush_cache_user_range(start, end);
+		return -EINVAL;
 	}
+
+	if (start < vma->vm_start)
+		start = vma->vm_start;
+	if (end > vma->vm_end)
+		end = vma->vm_end;
 	up_read(&mm->mmap_sem);
-	return -EINVAL;
+
+	return __do_cache_op(start, end);
 }
 
 /*
-- 
1.8.2.2

* [PATCH v2 3/4] ARM: cacheflush: don't round address range up to nearest page
From: Will Deacon @ 2013-05-24 11:31 UTC
  To: linux-arm-kernel

The flush_cache_user_range macro takes a pair of addresses describing
the start and end of the virtual address range to flush. Due to an
accidental oversight when flush_cache_user_range was introduced, the
address range was rounded up so that the start and end addresses were
page-aligned.

For historical reference, the interesting commits in history.git are:

10eacf1775e1 ("[ARM] Clean up ARM cache handling interfaces (part 1)")
71432e79b76b ("[ARM] Add flush_cache_user_page() for sys_cacheflush()")

This patch removes the alignment code, reducing the amount of flushing
required for ranges that are not an exact multiple of PAGE_SIZE.
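
To make the saving concrete, the following sketch compares the span
covered with and without page rounding. It is an illustration only:
SKETCH_PAGE_SIZE is assumed to be 4096, and span_page_rounded() and
span_exact() are hypothetical helpers, not kernel functions. The
low-level routine presumably still operates on whole cache lines, which
the sketch ignores.

```c
#define SKETCH_PAGE_SIZE	4096UL	/* assumption: 4KiB pages */
#define SKETCH_PAGE_MASK	(~(SKETCH_PAGE_SIZE - 1))

/* Span covered when start is rounded down and end is rounded up to a page. */
static unsigned long span_page_rounded(unsigned long start, unsigned long end)
{
	unsigned long lo = start & SKETCH_PAGE_MASK;
	unsigned long hi = (end + SKETCH_PAGE_SIZE - 1) & SKETCH_PAGE_MASK;

	return hi - lo;
}

/* Span covered when the addresses are passed through unchanged. */
static unsigned long span_exact(unsigned long start, unsigned long end)
{
	return end - start;
}
```

For example, a 192-byte range from 0x10f80 to 0x11040 straddles a page
boundary, so the rounded form covers two full pages (8192 bytes) where
the exact form covers only the 192 bytes requested.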

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Jonathan Austin <jonathan.austin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/include/asm/cacheflush.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index bff7138..39c3faa 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -268,8 +268,7 @@ extern void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr
  * Harvard caches are synchronised for the user space address range.
  * This is used for the ARM private sys_cacheflush system call.
  */
-#define flush_cache_user_range(start,end) \
-	__cpuc_coherent_user_range((start) & PAGE_MASK, PAGE_ALIGN(end))
+#define flush_cache_user_range(s,e)	__cpuc_coherent_user_range(s,e)
 
 /*
  * Perform necessary cache operations to ensure that data previously
-- 
1.8.2.2

* [PATCH v2 4/4] ARM: cacheflush: don't bother rounding to nearest vma
From: Will Deacon @ 2013-05-24 11:31 UTC
  To: linux-arm-kernel

do_cache_op finds the lowest VMA contained in the specified address
range and rounds the range to cover only the mapped addresses.

Since commit 4542b6a0fa6b ("ARM: 7365/1: drop unused parameter from
flush_cache_user_range") the VMA is not used for anything else in this
code and seeing as the low-level cache flushing routines return -EFAULT
if the address is not valid, there is no need for this range truncation.

This patch removes the VMA handling code from the cacheflushing syscall.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/kernel/traps.c | 17 ++---------------
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 47092f2..9af88fd 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -548,24 +548,11 @@ static long do_cache_op_restart(struct restart_block *unused)
 static inline int
 do_cache_op(unsigned long start, unsigned long end, int flags)
 {
-	struct mm_struct *mm = current->active_mm;
-	struct vm_area_struct *vma;
-
 	if (end < start || flags)
 		return -EINVAL;
 
-	down_read(&mm->mmap_sem);
-	vma = find_vma(mm, start);
-	if (!vma || vma->vm_start >= end) {
-		up_read(&mm->mmap_sem);
-		return -EINVAL;
-	}
-
-	if (start < vma->vm_start)
-		start = vma->vm_start;
-	if (end > vma->vm_end)
-		end = vma->vm_end;
-	up_read(&mm->mmap_sem);
+	if (!access_ok(VERIFY_READ, start, end - start))
+		return -EFAULT;
 
 	return __do_cache_op(start, end);
 }
-- 
1.8.2.2

* [PATCH v2 4/4] ARM: cacheflush: don't bother rounding to nearest vma
From: Russell King - ARM Linux @ 2013-05-24 11:59 UTC
  To: linux-arm-kernel

On Fri, May 24, 2013 at 12:31:27PM +0100, Will Deacon wrote:
> do_cache_op finds the lowest VMA contained in the specified address
> range and rounds the range to cover only the mapped addresses.
> 
> Since commit 4542b6a0fa6b ("ARM: 7365/1: drop unused parameter from
> flush_cache_user_range") the VMA is not used for anything else in this
> code and seeing as the low-level cache flushing routines return -EFAULT
> if the address is not valid, there is no need for this range truncation.
> 
> This patch removes the VMA handling code from the cacheflushing syscall.

The only thing which access_ok() tells you is that the addresses are
_potentially_ valid user addresses.  That's not what the VMA check is
there for.

That check is there to make sure userspace doesn't do something idiotic,
and to keep the use of this API limited to specific actions such as self
modifying code, and not a general purpose cache flushing API covering
multiple VMAs.

* [PATCH v2 4/4] ARM: cacheflush: don't bother rounding to nearest vma
From: Will Deacon @ 2013-05-24 12:56 UTC
  To: linux-arm-kernel

On Fri, May 24, 2013 at 12:59:17PM +0100, Russell King - ARM Linux wrote:
> On Fri, May 24, 2013 at 12:31:27PM +0100, Will Deacon wrote:
> > do_cache_op finds the lowest VMA contained in the specified address
> > range and rounds the range to cover only the mapped addresses.
> > 
> > Since commit 4542b6a0fa6b ("ARM: 7365/1: drop unused parameter from
> > flush_cache_user_range") the VMA is not used for anything else in this
> > code and seeing as the low-level cache flushing routines return -EFAULT
> > if the address is not valid, there is no need for this range truncation.
> > 
> > This patch removes the VMA handling code from the cacheflushing syscall.
> 
> The only thing which access_ok() tells you is that the addresses are
> _potentially_ valid user addresses.  That's not what the VMA check is
> there for.

Agreed, but it becomes necessary if we remove the vma check, since then
kernel addresses could be passed in unnoticed. The moment we get a fault,
we'll stop and return -EFAULT.
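
As a rough illustration of what a range check of this kind establishes,
consider the sketch below. It is not the kernel's access_ok()
implementation: range_ok() and SKETCH_USER_LIMIT are hypothetical, and
the limit value is an arbitrary assumption. The check only confirms that
the whole range lies below the user/kernel boundary; it says nothing
about whether the addresses are actually mapped.

```c
/* Assumption: an arbitrary user/kernel boundary chosen for the sketch. */
#define SKETCH_USER_LIMIT	0xbf000000UL

/*
 * Return non-zero if [addr, addr + size) lies entirely below limit.
 * Written as two comparisons so that addr + size is never computed and
 * therefore cannot wrap around.
 */
static int range_ok(unsigned long addr, unsigned long size,
		    unsigned long limit)
{
	return size <= limit && addr <= limit - size;
}
```

A range that passes this check may still fault on first access, which is
where the -EFAULT return from the low-level flushing code comes in.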

> That check is there to make sure userspace doesn't do something idiotic,
> and to keep the use of this API limited to specific actions such as self
> modifying code, and not a general purpose cache flushing API covering
> multiple VMAs.

Why make the distinction? You can already create single VMAs up to around
2GB and use the syscall in mainline today to flush that area by line. With
these patches we avoid touching mmap_sem, simplify the semantics of the
call, remove the possibility of DoS with non-preemptible kernels (which also
exists in mainline today) and measurably improve performance (~2%
improvement on a browser benchmark test).

If userspace does something idiotic, that should be fine as long as the
idiocy is confined to the task issuing the system call.

Will
