* [PATCH v2] ARM: uprobes need icache flush after xol write
@ 2014-04-09  5:58 Victor Kamensky
  2014-04-09  5:58 ` Victor Kamensky
  0 siblings, 1 reply; 66+ messages in thread
From: Victor Kamensky @ 2014-04-09  5:58 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Guys,

This is the second version of the patch that flushes the icache and
dcache after a uprobes xol write, so that the written instruction is
visible in the icache. Please see [1] for the initial discussion.

This patch follows Russell's suggestion: the function that flushes the
caches after an instruction is written to an xol slot is shared with the
similar function already implemented for the ptrace code.

To reuse the common implementation while avoiding any vma use in
xol_get_insn_slot, I split flush_ptrace_access into two functions: the
first retrieves the required conditions from the vma, encodes them in a
flags variable, and then calls the second function, which contains the
common code.
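
In condensed form (the full code is in the patch below), the xol side of
that split ends up as:

	void flush_uprobe_xol_access(struct page *page, unsigned long uaddr,
				     void *kaddr, unsigned long len)
	{
		unsigned int flags = FLAG_UA_CORE_IN_MM | FLAG_UA_IS_EXEC;

		__flush_ptrace_access(page, uaddr, kaddr, len, flags);
	}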

I also had to change xol_get_insn_slot to map the page into the kernel
explicitly within the function, without using the copy_to_page helper,
because the ARM cache flush code needs both the kernel address through
which the instruction write happens and the virtual address in the
user-land process where the instruction will end up. I hope this
callback is universal enough that other CPUs can implement their own
cache invalidation/sync after a uprobes xol instruction write based on
the provided parameters.
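
On the generic side the slot write then looks roughly like this
(condensed from the patch below):

	xol_page_kaddr = kmap_atomic(area->page);
	/* write the instruction through the kernel mapping */
	memcpy(xol_page_kaddr + (xol_vaddr & ~PAGE_MASK),
	       &uprobe->arch.ixol, sizeof(uprobe->arch.ixol));
	/* the arch hook gets both the user vaddr and the kernel address */
	arch_uprobe_flush_xol_access(area->page, xol_vaddr,
				     xol_page_kaddr + (xol_vaddr & ~PAGE_MASK),
				     sizeof(uprobe->arch.ixol));
	kunmap_atomic(xol_page_kaddr);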

I've tested it on an Arndale board with my SystemTap test case that
previously hit the cache problem. Disassembly of
flush_uprobe_xol_access on the Arndale shows that the compiler does a
good job and optimizes out all the flag checks, effectively leaving, on
this CPU, either a call to flush_icache_alias or a call to
v7_coherent_user_range (__cpuc_coherent_kern_range).

Also tested basic user-level debugging.

I am wondering on which ARM boards/CPUs we could test the
cache_is_vivt() and cache_is_vipt_aliasing() cases ...

Just to summarize, please note that in [1] there were a couple of other
suggestions:

   Oleg suggested using flush_icache_user_range, but Russell's argument
was that the meaning of the function gets lost, and that on ARM it is
not implemented in a way that could address the issue anyway. Please
see [2] for details. Note that it would have the vma problem either
way; that would still have to be hacked around.
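
For reference, if I read arch/arm/include/asm/cacheflush.h right, the
current ARM definition reduces to nothing more than the dcache flush
that is already known to be insufficient here:

	#define flush_icache_user_range(vma,page,addr,len) \
		flush_dcache_page(page)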

   Dave Martin suggested using flush_icache_range, which is effectively
a better way to call __cpuc_coherent_[kern|user]_range(s, e), as was
originally suggested. But Russell explained that it won't be enough in
the case of user-land process pages, where the variety of cache types
has to be covered. For kernel pages it would be OK, and it is used that
way in multiple places such as kprobes, modules, etc.
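
For comparison, a minimal sketch of the kernel-address case that
flush_icache_range() does cover (kprobes/modules-style patching of
kernel-resident code, not part of this patch):

	memcpy(kaddr, insn, len);	/* write via a kernel mapping */
	flush_icache_range((unsigned long)kaddr,
			   (unsigned long)kaddr + len);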

Thanks,
Victor

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/245595.html
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/245427.html

[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/245605.html

Victor Kamensky (1):
  ARM: uprobes need icache flush after xol write

 arch/arm/include/asm/cacheflush.h |  2 ++
 arch/arm/kernel/uprobes.c         |  6 ++++++
 arch/arm/mm/flush.c               | 41 +++++++++++++++++++++++++++++++++------
 include/linux/uprobes.h           |  3 +++
 kernel/events/uprobes.c           | 33 +++++++++++++++++++++++++------
 5 files changed, 73 insertions(+), 12 deletions(-)

-- 
1.8.1.4


* [PATCH v2] ARM: uprobes need icache flush after xol write
  2014-04-09  5:58 [PATCH v2] ARM: uprobes need icache flush after xol write Victor Kamensky
@ 2014-04-09  5:58 ` Victor Kamensky
  2014-04-09 18:23   ` David Long
  2014-04-09 18:45   ` Oleg Nesterov
  0 siblings, 2 replies; 66+ messages in thread
From: Victor Kamensky @ 2014-04-09  5:58 UTC (permalink / raw)
  To: linux-arm-kernel

After an instruction is written into the xol area, on the ARM v7
architecture the code needs to flush the dcache and icache to sync
them up for the given set of addresses. A bare
'flush_dcache_page(page)' call is not enough - it is possible for a
stale instruction to sit in the icache for the given xol area slot
address.

Introduce a weak arch_uprobe_flush_xol_access function that by
default calls 'flush_dcache_page(page)', and on ARM define a new one
that calls the flush_uprobe_xol_access function.

The flush_uprobe_xol_access function shares/reuses the implementation
of the flush_ptrace_access function and takes care of writing an
instruction into a user-land address space across the variety of
different cache types on ARM CPUs. Because flush_uprobe_xol_access
does not have a vma around, flush_ptrace_access was split into two
parts: a first part that retrieves the set of conditions from the
vma, and a common part that receives those conditions as flags.

Because on ARM the cache flush function needs the kernel address
through which the instruction write happened, xol_get_insn_slot was
changed to explicitly map the page and do the memcpy rather than use
the copy_to_page helper. This way xol_get_insn_slot knows the kernel
address and passes it to the arch_uprobe_flush_xol_access function.

Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
---
 arch/arm/include/asm/cacheflush.h |  2 ++
 arch/arm/kernel/uprobes.c         |  6 ++++++
 arch/arm/mm/flush.c               | 41 +++++++++++++++++++++++++++++++++------
 include/linux/uprobes.h           |  3 +++
 kernel/events/uprobes.c           | 33 +++++++++++++++++++++++++------
 5 files changed, 73 insertions(+), 12 deletions(-)

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index 8b8b616..e02712a 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -487,4 +487,6 @@ int set_memory_rw(unsigned long addr, int numpages);
 int set_memory_x(unsigned long addr, int numpages);
 int set_memory_nx(unsigned long addr, int numpages);
 
+void flush_uprobe_xol_access(struct page *page, unsigned long uaddr,
+			     void *kaddr, unsigned long len);
 #endif
diff --git a/arch/arm/kernel/uprobes.c b/arch/arm/kernel/uprobes.c
index f9bacee..a0339a6 100644
--- a/arch/arm/kernel/uprobes.c
+++ b/arch/arm/kernel/uprobes.c
@@ -113,6 +113,12 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
 	return 0;
 }
 
+void arch_uprobe_flush_xol_access(struct page *page, unsigned long vaddr,
+				  void *kaddr, unsigned long len)
+{
+	flush_uprobe_xol_access(page, vaddr, kaddr, len);
+}
+
 int arch_uprobe_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
 {
 	struct uprobe_task *utask = current->utask;
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 3387e60..69a0bd08 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -104,17 +104,21 @@ void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsig
 #define flush_icache_alias(pfn,vaddr,len)	do { } while (0)
 #endif
 
+#define FLAG_UA_IS_EXEC 1
+#define FLAG_UA_CORE_IN_MM 2
+#define FLAG_UA_BROADCAST 4
+
 static void flush_ptrace_access_other(void *args)
 {
 	__flush_icache_all();
 }
 
-static
-void flush_ptrace_access(struct vm_area_struct *vma, struct page *page,
-			 unsigned long uaddr, void *kaddr, unsigned long len)
+static inline
+void __flush_ptrace_access(struct page *page, unsigned long uaddr, void *kaddr,
+			   unsigned long len, unsigned int flags)
 {
 	if (cache_is_vivt()) {
-		if (cpumask_test_cpu(smp_processor_id(), mm_cpumask(vma->vm_mm))) {
+		if (flags & FLAG_UA_CORE_IN_MM) {
 			unsigned long addr = (unsigned long)kaddr;
 			__cpuc_coherent_kern_range(addr, addr + len);
 		}
@@ -128,18 +132,43 @@ void flush_ptrace_access(struct vm_area_struct *vma, struct page *page,
 	}
 
 	/* VIPT non-aliasing D-cache */
-	if (vma->vm_flags & VM_EXEC) {
+	if (flags & FLAG_UA_IS_EXEC) {
 		unsigned long addr = (unsigned long)kaddr;
 		if (icache_is_vipt_aliasing())
 			flush_icache_alias(page_to_pfn(page), uaddr, len);
 		else
 			__cpuc_coherent_kern_range(addr, addr + len);
-		if (cache_ops_need_broadcast())
+		if (flags & FLAG_UA_BROADCAST)
 			smp_call_function(flush_ptrace_access_other,
 					  NULL, 1);
 	}
 }
 
+static
+void flush_ptrace_access(struct vm_area_struct *vma, struct page *page,
+			 unsigned long uaddr, void *kaddr, unsigned long len)
+{
+	unsigned int flags = 0;
+	if (cpumask_test_cpu(smp_processor_id(), mm_cpumask(vma->vm_mm))) {
+		flags |= FLAG_UA_CORE_IN_MM;
+	}
+	if (vma->vm_flags & VM_EXEC) {
+		flags |= FLAG_UA_IS_EXEC;
+	}
+	if (cache_ops_need_broadcast()) {
+		flags |= FLAG_UA_BROADCAST;
+	}
+	__flush_ptrace_access(page, uaddr, kaddr, len, flags);
+}
+
+void flush_uprobe_xol_access(struct page *page, unsigned long uaddr,
+			     void *kaddr, unsigned long len)
+{
+	unsigned int flags = FLAG_UA_CORE_IN_MM|FLAG_UA_IS_EXEC;
+
+	__flush_ptrace_access(page, uaddr, kaddr, len, flags);
+}
+
 /*
  * Copy user data from/to a page which is mapped into a different
  * processes address space.  Really, we want to allow our "user
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index edff2b9..534e083 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -32,6 +32,7 @@ struct vm_area_struct;
 struct mm_struct;
 struct inode;
 struct notifier_block;
+struct page;
 
 #define UPROBE_HANDLER_REMOVE		1
 #define UPROBE_HANDLER_MASK		1
@@ -127,6 +128,8 @@ extern int  arch_uprobe_exception_notify(struct notifier_block *self, unsigned l
 extern void arch_uprobe_abort_xol(struct arch_uprobe *aup, struct pt_regs *regs);
 extern unsigned long arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr, struct pt_regs *regs);
 extern bool __weak arch_uprobe_ignore(struct arch_uprobe *aup, struct pt_regs *regs);
+extern void __weak arch_uprobe_flush_xol_access(struct page *page, unsigned long vaddr,
+						void *kaddr, unsigned long len);
 #else /* !CONFIG_UPROBES */
 struct uprobes_state {
 };
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 04709b6..b9142d5 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1287,6 +1287,7 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
 {
 	struct xol_area *area;
 	unsigned long xol_vaddr;
+	void *xol_page_kaddr;
 
 	area = get_xol_area();
 	if (!area)
@@ -1296,14 +1297,22 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
 	if (unlikely(!xol_vaddr))
 		return 0;
 
-	/* Initialize the slot */
-	copy_to_page(area->page, xol_vaddr,
-			&uprobe->arch.ixol, sizeof(uprobe->arch.ixol));
 	/*
-	 * We probably need flush_icache_user_range() but it needs vma.
-	 * This should work on supported architectures too.
+	 * We don't use copy_to_page here because we need kernel page
+	 * addr to invalidate caches correctly
 	 */
-	flush_dcache_page(area->page);
+	xol_page_kaddr = kmap_atomic(area->page);
+
+	/* Initialize the slot */
+	memcpy(xol_page_kaddr + (xol_vaddr & ~PAGE_MASK),
+	       &uprobe->arch.ixol,
+	       sizeof(uprobe->arch.ixol));
+
+	arch_uprobe_flush_xol_access(area->page, xol_vaddr,
+				     xol_page_kaddr + (xol_vaddr & ~PAGE_MASK),
+				     sizeof(uprobe->arch.ixol));
+
+	kunmap_atomic(xol_page_kaddr);
 
 	return xol_vaddr;
 }
@@ -1346,6 +1355,18 @@ static void xol_free_insn_slot(struct task_struct *tsk)
 	}
 }
 
+void __weak arch_uprobe_flush_xol_access(struct page *page, unsigned long vaddr,
+					 void *kaddr, unsigned long len)
+{
+	/*
+	 * We probably need flush_icache_user_range() but it needs vma.
+	 * This should work on most of architectures by default. If
+	 * architecture needs to do something different it can define
+	 * its own version of the function.
+	 */
+	flush_dcache_page(page);
+}
+
 /**
  * uprobe_get_swbp_addr - compute address of swbp given post-swbp regs
  * @regs: Reflects the saved state of the task after it has hit a breakpoint
-- 
1.8.1.4


* [PATCH v2] ARM: uprobes need icache flush after xol write
  2014-04-09  5:58 ` Victor Kamensky
@ 2014-04-09 18:23   ` David Long
  2014-04-09 18:45   ` Oleg Nesterov
  1 sibling, 0 replies; 66+ messages in thread
From: David Long @ 2014-04-09 18:23 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/09/14 01:58, Victor Kamensky wrote:
> After instruction write into xol area, on ARM V7
> architecture code need to flush dcache and icache to sync
> them up for given set of addresses. Having just
> 'flush_dcache_page(page)' call is not enough - it is
> possible to have stale instruction sitting in icache
> for given xol area slot address.
>
> Introduce arch_uprobe_flush_xol_access weak function
> that by default calls 'flush_dcache_page(page)' and on
> ARM define new one that calls flush_uprobe_xol_access
> function.
>
> flush_uprobe_xol_access function shares/reuses implementation
> with/of flush_ptrace_access function and takes care of writing
> instruction to user land address space on given variety of
> different cache types on ARM CPUs. Because
> flush_uprobe_xol_access does not have vma around
> flush_ptrace_access was split into two parts. First that
> retrieves set of condition from vma and common that receives
> those conditions as flags.
>
> Because on ARM cache flush function need kernel address
> through which instruction write happened, changed
> xol_get_insn_slot to explicitly map page and do memcpy
> rather than use copy_to_page helper. In this case
> xol_get_insn_slot knows kernel address and passes it to
> arch_uprobe_flush_xol_access function.
>
> Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
> ---
>   arch/arm/include/asm/cacheflush.h |  2 ++
>   arch/arm/kernel/uprobes.c         |  6 ++++++
>   arch/arm/mm/flush.c               | 41 +++++++++++++++++++++++++++++++++------
>   include/linux/uprobes.h           |  3 +++
>   kernel/events/uprobes.c           | 33 +++++++++++++++++++++++++------
>   5 files changed, 73 insertions(+), 12 deletions(-)
>
> diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
> index 8b8b616..e02712a 100644
> --- a/arch/arm/include/asm/cacheflush.h
> +++ b/arch/arm/include/asm/cacheflush.h
> @@ -487,4 +487,6 @@ int set_memory_rw(unsigned long addr, int numpages);
>   int set_memory_x(unsigned long addr, int numpages);
>   int set_memory_nx(unsigned long addr, int numpages);
>
> +void flush_uprobe_xol_access(struct page *page, unsigned long uaddr,
> +			     void *kaddr, unsigned long len);
>   #endif
> diff --git a/arch/arm/kernel/uprobes.c b/arch/arm/kernel/uprobes.c
> index f9bacee..a0339a6 100644
> --- a/arch/arm/kernel/uprobes.c
> +++ b/arch/arm/kernel/uprobes.c
> @@ -113,6 +113,12 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
>   	return 0;
>   }
>
> +void arch_uprobe_flush_xol_access(struct page *page, unsigned long vaddr,
> +				  void *kaddr, unsigned long len)
> +{
> +	flush_uprobe_xol_access(page, vaddr, kaddr, len);
> +}
> +
>   int arch_uprobe_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
>   {
>   	struct uprobe_task *utask = current->utask;
> diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
> index 3387e60..69a0bd08 100644
> --- a/arch/arm/mm/flush.c
> +++ b/arch/arm/mm/flush.c
> @@ -104,17 +104,21 @@ void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsig
>   #define flush_icache_alias(pfn,vaddr,len)	do { } while (0)
>   #endif
>
> +#define FLAG_UA_IS_EXEC 1
> +#define FLAG_UA_CORE_IN_MM 2
> +#define FLAG_UA_BROADCAST 4
> +
>   static void flush_ptrace_access_other(void *args)
>   {
>   	__flush_icache_all();
>   }
>
> -static
> -void flush_ptrace_access(struct vm_area_struct *vma, struct page *page,
> -			 unsigned long uaddr, void *kaddr, unsigned long len)
> +static inline
> +void __flush_ptrace_access(struct page *page, unsigned long uaddr, void *kaddr,
> +			   unsigned long len, unsigned int flags)
>   {
>   	if (cache_is_vivt()) {
> -		if (cpumask_test_cpu(smp_processor_id(), mm_cpumask(vma->vm_mm))) {
> +		if (flags & FLAG_UA_CORE_IN_MM) {
>   			unsigned long addr = (unsigned long)kaddr;
>   			__cpuc_coherent_kern_range(addr, addr + len);
>   		}
> @@ -128,18 +132,43 @@ void flush_ptrace_access(struct vm_area_struct *vma, struct page *page,
>   	}
>
>   	/* VIPT non-aliasing D-cache */
> -	if (vma->vm_flags & VM_EXEC) {
> +	if (flags & FLAG_UA_IS_EXEC) {
>   		unsigned long addr = (unsigned long)kaddr;
>   		if (icache_is_vipt_aliasing())
>   			flush_icache_alias(page_to_pfn(page), uaddr, len);
>   		else
>   			__cpuc_coherent_kern_range(addr, addr + len);
> -		if (cache_ops_need_broadcast())
> +		if (flags & FLAG_UA_BROADCAST)
>   			smp_call_function(flush_ptrace_access_other,
>   					  NULL, 1);
>   	}
>   }
>
> +static
> +void flush_ptrace_access(struct vm_area_struct *vma, struct page *page,
> +			 unsigned long uaddr, void *kaddr, unsigned long len)
> +{
> +	unsigned int flags = 0;
> +	if (cpumask_test_cpu(smp_processor_id(), mm_cpumask(vma->vm_mm))) {
> +		flags |= FLAG_UA_CORE_IN_MM;
> +	}
> +	if (vma->vm_flags & VM_EXEC) {
> +		flags |= FLAG_UA_IS_EXEC;
> +	}
> +	if (cache_ops_need_broadcast()) {
> +		flags |= FLAG_UA_BROADCAST;
> +	}
> +	__flush_ptrace_access(page, uaddr, kaddr, len, flags);
> +}
> +
> +void flush_uprobe_xol_access(struct page *page, unsigned long uaddr,
> +			     void *kaddr, unsigned long len)
> +{
> +	unsigned int flags = FLAG_UA_CORE_IN_MM|FLAG_UA_IS_EXEC;
> +
> +	__flush_ptrace_access(page, uaddr, kaddr, len, flags);
> +}
> +

I guess this is cleaner than duplicating some part of 
flush_ptrace_access inside your new flush function.

>   /*
>    * Copy user data from/to a page which is mapped into a different
>    * processes address space.  Really, we want to allow our "user
> diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> index edff2b9..534e083 100644
> --- a/include/linux/uprobes.h
> +++ b/include/linux/uprobes.h
> @@ -32,6 +32,7 @@ struct vm_area_struct;
>   struct mm_struct;
>   struct inode;
>   struct notifier_block;
> +struct page;
>
>   #define UPROBE_HANDLER_REMOVE		1
>   #define UPROBE_HANDLER_MASK		1
> @@ -127,6 +128,8 @@ extern int  arch_uprobe_exception_notify(struct notifier_block *self, unsigned l
>   extern void arch_uprobe_abort_xol(struct arch_uprobe *aup, struct pt_regs *regs);
>   extern unsigned long arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr, struct pt_regs *regs);
>   extern bool __weak arch_uprobe_ignore(struct arch_uprobe *aup, struct pt_regs *regs);
> +extern void __weak arch_uprobe_flush_xol_access(struct page *page, unsigned long vaddr,
> +						void *kaddr, unsigned long len);
>   #else /* !CONFIG_UPROBES */
>   struct uprobes_state {
>   };
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 04709b6..b9142d5 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -1287,6 +1287,7 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
>   {
>   	struct xol_area *area;
>   	unsigned long xol_vaddr;
> +	void *xol_page_kaddr;
>
>   	area = get_xol_area();
>   	if (!area)
> @@ -1296,14 +1297,22 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
>   	if (unlikely(!xol_vaddr))
>   		return 0;
>
> -	/* Initialize the slot */
> -	copy_to_page(area->page, xol_vaddr,
> -			&uprobe->arch.ixol, sizeof(uprobe->arch.ixol));
>   	/*
> -	 * We probably need flush_icache_user_range() but it needs vma.
> -	 * This should work on supported architectures too.
> +	 * We don't use copy_to_page here because we need kernel page
> +	 * addr to invalidate caches correctly
>   	 */
> -	flush_dcache_page(area->page);
> +	xol_page_kaddr = kmap_atomic(area->page);
> +
> +	/* Initialize the slot */
> +	memcpy(xol_page_kaddr + (xol_vaddr & ~PAGE_MASK),
> +	       &uprobe->arch.ixol,
> +	       sizeof(uprobe->arch.ixol));
> +
> +	arch_uprobe_flush_xol_access(area->page, xol_vaddr,
> +				     xol_page_kaddr + (xol_vaddr & ~PAGE_MASK),
> +				     sizeof(uprobe->arch.ixol));
> +
> +	kunmap_atomic(xol_page_kaddr);
>
>   	return xol_vaddr;
>   }
> @@ -1346,6 +1355,18 @@ static void xol_free_insn_slot(struct task_struct *tsk)
>   	}
>   }
>
> +void __weak arch_uprobe_flush_xol_access(struct page *page, unsigned long vaddr,
> +					 void *kaddr, unsigned long len)
> +{
> +	/*
> +	 * We probably need flush_icache_user_range() but it needs vma.
> +	 * This should work on most of architectures by default. If
> +	 * architecture needs to do something different it can define
> +	 * its own version of the function.
> +	 */
> +	flush_dcache_page(page);
> +}
> +
>   /**
>    * uprobe_get_swbp_addr - compute address of swbp given post-swbp regs
>    * @regs: Reflects the saved state of the task after it has hit a breakpoint
>

I can confirm this fixes the problem I was seeing on Arndale.  It 
builds for x86 but I have not tested it there.

It looks to me like this is the best way to solve the problem.  You 
should run checkpatch.pl on your patch though: there are some warnings 
about unnecessary curly braces on if statements and a couple of lines 
over 80 characters long.  Minor issues, but easily fixed.  Assuming you 
fix those you can add:

Reviewed-by: David A. Long <dave.long@linaro.org>


-dl


* [PATCH v2] ARM: uprobes need icache flush after xol write
  2014-04-09  5:58 ` Victor Kamensky
  2014-04-09 18:23   ` David Long
@ 2014-04-09 18:45   ` Oleg Nesterov
  2014-04-09 19:13     ` Victor Kamensky
  1 sibling, 1 reply; 66+ messages in thread
From: Oleg Nesterov @ 2014-04-09 18:45 UTC (permalink / raw)
  To: linux-arm-kernel

Victor, sorry, I can't even read this thread now, will try to reply
tomorrow... But at first glance,

On 04/08, Victor Kamensky wrote:
>
> Because on ARM cache flush function need kernel address

...

> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -1287,6 +1287,7 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
>  {
>  	struct xol_area *area;
>  	unsigned long xol_vaddr;
> +	void *xol_page_kaddr;
>  
>  	area = get_xol_area();
>  	if (!area)
> @@ -1296,14 +1297,22 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
>  	if (unlikely(!xol_vaddr))
>  		return 0;
>  
> -	/* Initialize the slot */
> -	copy_to_page(area->page, xol_vaddr,
> -			&uprobe->arch.ixol, sizeof(uprobe->arch.ixol));
>  	/*
> -	 * We probably need flush_icache_user_range() but it needs vma.
> -	 * This should work on supported architectures too.
> +	 * We don't use copy_to_page here because we need kernel page
> +	 * addr to invalidate caches correctly
>  	 */
> -	flush_dcache_page(area->page);
> +	xol_page_kaddr = kmap_atomic(area->page);
> +
> +	/* Initialize the slot */
> +	memcpy(xol_page_kaddr + (xol_vaddr & ~PAGE_MASK),
> +	       &uprobe->arch.ixol,
> +	       sizeof(uprobe->arch.ixol));
> +
> +	arch_uprobe_flush_xol_access(area->page, xol_vaddr,
> +				     xol_page_kaddr + (xol_vaddr & ~PAGE_MASK),
> +				     sizeof(uprobe->arch.ixol));
> +
> +	kunmap_atomic(xol_page_kaddr);
>  
>  	return xol_vaddr;
>  }
> @@ -1346,6 +1355,18 @@ static void xol_free_insn_slot(struct task_struct *tsk)
>  	}
>  }
>  
> +void __weak arch_uprobe_flush_xol_access(struct page *page, unsigned long vaddr,
> +					 void *kaddr, unsigned long len)
> +{
> +	/*
> +	 * We probably need flush_icache_user_range() but it needs vma.
> +	 * This should work on most of architectures by default. If
> +	 * architecture needs to do something different it can define
> +	 * its own version of the function.
> +	 */
> +	flush_dcache_page(page);
> +}

In this case I'd suggest moving this kmap/memcpy horror into an arch_
helper. IOW, something like below.

If arm needs the kernel address we are writing to, let it do kmap/etc in
its implementation.

Oleg.


--- x/kernel/events/uprobes.c
+++ x/kernel/events/uprobes.c
@@ -1291,18 +1291,16 @@ static unsigned long xol_get_insn_slot(s
 	if (unlikely(!xol_vaddr))
 		return 0;
 
-	/* Initialize the slot */
-	copy_to_page(area->page, xol_vaddr,
-			&uprobe->arch.ixol, sizeof(uprobe->arch.ixol));
-	/*
-	 * We probably need flush_icache_user_range() but it needs vma.
-	 * This should work on supported architectures too.
-	 */
-	flush_dcache_page(area->page);
-
+	arch_uprobe_xxx(area->page, xol_vaddr, ...);
 	return xol_vaddr;
 }
 
+void __weak arch_uprobe_xxx(page, xol_vaddr, ...)
+{
+	copy_to_page(page, xol_vaddr, ...)
+	flush_dcache_page(page);
+}
+
 /*
  * xol_free_insn_slot - If slot was earlier allocated by
  * @xol_get_insn_slot(), make the slot available for
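
Fleshed out, with the parameter names being placeholders only, the weak
default above might read:

void __weak arch_uprobe_xxx(struct page *page, unsigned long vaddr,
			    void *src, unsigned long len)
{
	/* generic default: plain copy plus dcache flush, as today */
	copy_to_page(page, vaddr, src, len);
	flush_dcache_page(page);
}

with the caller passing &uprobe->arch.ixol and sizeof(uprobe->arch.ixol)
in place of the "...", so an arch override has everything it needs to do
its own kmap and cache maintenance.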


* [PATCH v2] ARM: uprobes need icache flush after xol write
  2014-04-09 18:45   ` Oleg Nesterov
@ 2014-04-09 19:13     ` Victor Kamensky
  2014-04-09 19:19       ` Russell King - ARM Linux
                         ` (2 more replies)
  0 siblings, 3 replies; 66+ messages in thread
From: Victor Kamensky @ 2014-04-09 19:13 UTC (permalink / raw)
  To: linux-arm-kernel

On 9 April 2014 11:45, Oleg Nesterov <oleg@redhat.com> wrote:
> Victor, sorry, I can't even read this thread now, will try to reply
> tomorrow... But at first glance,

Sure, np. Will wait for you to free up.

> On 04/08, Victor Kamensky wrote:
>>
>> Because on ARM cache flush function need kernel address
>
> ...
>
>> --- a/kernel/events/uprobes.c
>> +++ b/kernel/events/uprobes.c
>> @@ -1287,6 +1287,7 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
>>  {
>>       struct xol_area *area;
>>       unsigned long xol_vaddr;
>> +     void *xol_page_kaddr;
>>
>>       area = get_xol_area();
>>       if (!area)
>> @@ -1296,14 +1297,22 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
>>       if (unlikely(!xol_vaddr))
>>               return 0;
>>
>> -     /* Initialize the slot */
>> -     copy_to_page(area->page, xol_vaddr,
>> -                     &uprobe->arch.ixol, sizeof(uprobe->arch.ixol));
>>       /*
>> -      * We probably need flush_icache_user_range() but it needs vma.
>> -      * This should work on supported architectures too.
>> +      * We don't use copy_to_page here because we need kernel page
>> +      * addr to invalidate caches correctly
>>        */
>> -     flush_dcache_page(area->page);
>> +     xol_page_kaddr = kmap_atomic(area->page);
>> +
>> +     /* Initialize the slot */
>> +     memcpy(xol_page_kaddr + (xol_vaddr & ~PAGE_MASK),
>> +            &uprobe->arch.ixol,
>> +            sizeof(uprobe->arch.ixol));
>> +
>> +     arch_uprobe_flush_xol_access(area->page, xol_vaddr,
>> +                                  xol_page_kaddr + (xol_vaddr & ~PAGE_MASK),
>> +                                  sizeof(uprobe->arch.ixol));
>> +
>> +     kunmap_atomic(xol_page_kaddr);
>>
>>       return xol_vaddr;
>>  }
>> @@ -1346,6 +1355,18 @@ static void xol_free_insn_slot(struct task_struct *tsk)
>>       }
>>  }
>>
>> +void __weak arch_uprobe_flush_xol_access(struct page *page, unsigned long vaddr,
>> +                                      void *kaddr, unsigned long len)
>> +{
>> +     /*
>> +      * We probably need flush_icache_user_range() but it needs vma.
>> +      * This should work on most of architectures by default. If
>> +      * architecture needs to do something different it can define
>> +      * its own version of the function.
>> +      */
>> +     flush_dcache_page(page);
>> +}
>
> In this case I'd suggest to move this kmap/memcpy horror into arch_ helper.
> IOW, something like below.
>
> If arm needs the kernel address we are writing to, let it do kmap/etc in
> its implementation.

Fair enough. I can do that, as well as address Dave's comment about
checkpatch.

But before I do that, let's reach some conclusion on Russell's latest
remarks about copy_{to,from}_user_page usage in uprobes. It is a bigger
question, covering more than the ARM issue, and possibly has bigger
implications for the uprobes design. I have some comments/thoughts about
it, but since I am not the uprobes maintainer, I will wait for you to
address it first.

Thanks,
Victor

> Oleg.
>
>
> --- x/kernel/events/uprobes.c
> +++ x/kernel/events/uprobes.c
> @@ -1291,18 +1291,16 @@ static unsigned long xol_get_insn_slot(s
>         if (unlikely(!xol_vaddr))
>                 return 0;
>
> -       /* Initialize the slot */
> -       copy_to_page(area->page, xol_vaddr,
> -                       &uprobe->arch.ixol, sizeof(uprobe->arch.ixol));
> -       /*
> -        * We probably need flush_icache_user_range() but it needs vma.
> -        * This should work on supported architectures too.
> -        */
> -       flush_dcache_page(area->page);
> -
> +       arch_uprobe_xxx(area->page, xol_vaddr, ...);
>         return xol_vaddr;
>  }
>
> +void __weak arch_uprobe_xxx(page, xol_vaddr, ...)
> +{
> +       copy_to_page(page, xol_vaddr, ...)
> +       flush_dcache_page(page);
> +}
> +
>  /*
>   * xol_free_insn_slot - If slot was earlier allocated by
>   * @xol_get_insn_slot(), make the slot available for
>


* [PATCH v2] ARM: uprobes need icache flush after xol write
  2014-04-09 19:13     ` Victor Kamensky
@ 2014-04-09 19:19       ` Russell King - ARM Linux
  2014-04-11  3:42       ` [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing David Long
  2014-04-11  3:45       ` David Long
  2 siblings, 0 replies; 66+ messages in thread
From: Russell King - ARM Linux @ 2014-04-09 19:19 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 09, 2014 at 12:13:56PM -0700, Victor Kamensky wrote:
> Fair enough. Can do that, as well as address Dave's comment about
> checkpatch.
> 
> But before I do that let's reach some conclusion wrt latest Russell's
> remarks about copy_{to,from}_user_page usage in uprobes. It is bigger
> question covering more than ARM issue and possibly having bigger
> implication on uprobes design. I have some comments/thoughts about
> it, but since I am not uprobes maintainer, will wait for you to address
> it first.

I encourage folk to read this thread:

http://lkml.iu.edu//hypermail/linux/kernel/1404.1/00725.html

and involve David in the loop - I suspect you need to sell whatever
solution you agree on to David (as one of the other arch maintainers)
as well.  Thanks.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.


* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-09 19:13     ` Victor Kamensky
  2014-04-09 19:19       ` Russell King - ARM Linux
@ 2014-04-11  3:42       ` David Long
  2014-04-11  3:45       ` David Long
  2 siblings, 0 replies; 66+ messages in thread
From: David Long @ 2014-04-11  3:42 UTC (permalink / raw)
  To: linux-arm-kernel

OK, here is an alternative solution for the kernel-dcache / user-icache
flush issue in uprobes which I think follows Dave Miller's suggested
approach.  As a reminder:  the goal is to make sure the user-space
icache does not have stale data after the kernel rewrite of an
instruction in the user's uprobe "execute out of line" (xol) page. It
seems only ARM currently finds the flush_dcache_page() call
insufficient, but then apparently only two architectures (other than
ARM) support uprobes.

I've modified events/uprobes.c to simply call the copy_to_user_page()
function instead of doing a memcpy() followed by a flush_dcache_page()
call.  This results in a net reduction of one line of code in that file.
Then I modified copy_to_user_page() and/or the flushing function(s) it
calls to treat a NULL vma pointer as meaning:  "assume the user icache
address range is now invalid".  In the majority of cases this is pretty
basic and should be safe as nothing could have been doing this
previously.  In some cases this now results in flushing more icache
than is necessary.  For the mips, sh, sparc, and alpha architectures
something more complicated is necessary and I have not currently done
that.  I am not certain this approach can be made to work cleanly for
those architectures, although there is probably always the last resort
of flushing all icache.  On the other hand, it appears only x86,
powerpc, and (god-willing) ARM currently support uprobes.

I have only tested this on ARM (arndale) at this point.

The preliminary patch follows in my next email.

(BTW, is depending on the C compiler short-circuiting conditionals
acceptable style?)
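
(The construct in question is the NULL-vma test used throughout the
patch, e.g.:

	if (!vma || vma->vm_flags & VM_EXEC)
		flush_icache_range((unsigned long)dst,
				   (unsigned long)dst + len);

where the right-hand side is only evaluated when vma is non-NULL.)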

-dl


* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-09 19:13     ` Victor Kamensky
  2014-04-09 19:19       ` Russell King - ARM Linux
  2014-04-11  3:42       ` [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing David Long
@ 2014-04-11  3:45       ` David Long
  2014-04-11  4:36         ` David Miller
                           ` (2 more replies)
  2 siblings, 3 replies; 66+ messages in thread
From: David Long @ 2014-04-11  3:45 UTC (permalink / raw)
  To: linux-arm-kernel

Replace the memcpy and dcache flush in generic uprobes with a call to
copy_to_user_page(), which will do a proper flushing of the kernel and
user caches.  Also modify the implementation of copy_to_user_page
to assume that a NULL vma pointer means the user icache corresponding
to this write is stale and needs to be flushed.  Note that this patch
does not fix copy_to_user_page for the sh, alpha, sparc, or mips
architectures (which do not currently support uprobes).

Signed-off-by: David A. Long <dave.long@linaro.org>
---
 arch/arc/include/asm/cacheflush.h        | 2 +-
 arch/arm/mm/flush.c                      | 4 ++--
 arch/arm64/mm/flush.c                    | 2 +-
 arch/avr32/mm/cache.c                    | 2 +-
 arch/hexagon/include/asm/cacheflush.h    | 2 +-
 arch/m68k/include/asm/cacheflush_mm.h    | 2 +-
 arch/microblaze/include/asm/cacheflush.h | 2 +-
 arch/mips/mm/init.c                      | 2 +-
 arch/parisc/include/asm/tlbflush.h       | 4 ++++
 arch/parisc/kernel/cache.c               | 4 ++--
 arch/score/include/asm/cacheflush.h      | 2 +-
 arch/score/mm/cache.c                    | 2 +-
 arch/sh/mm/cache.c                       | 2 +-
 arch/sparc/mm/leon_mm.c                  | 2 +-
 arch/tile/include/asm/cacheflush.h       | 2 +-
 arch/unicore32/mm/flush.c                | 2 +-
 arch/xtensa/mm/cache.c                   | 4 ++--
 kernel/events/uprobes.c                  | 7 +------
 18 files changed, 24 insertions(+), 25 deletions(-)

diff --git a/arch/arc/include/asm/cacheflush.h b/arch/arc/include/asm/cacheflush.h
index 6abc497..64d67e4 100644
--- a/arch/arc/include/asm/cacheflush.h
+++ b/arch/arc/include/asm/cacheflush.h
@@ -110,7 +110,7 @@ static inline int cache_is_vipt_aliasing(void)
 #define copy_to_user_page(vma, page, vaddr, dst, src, len)		\
 do {									\
 	memcpy(dst, src, len);						\
-	if (vma->vm_flags & VM_EXEC)					\
+	if (!vma || vma->vm_flags & VM_EXEC)					\
 		__sync_icache_dcache((unsigned long)(dst), vaddr, len);	\
 } while (0)
 
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 3387e60..dd19ad4 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -114,7 +114,7 @@ void flush_ptrace_access(struct vm_area_struct *vma, struct page *page,
 			 unsigned long uaddr, void *kaddr, unsigned long len)
 {
 	if (cache_is_vivt()) {
-		if (cpumask_test_cpu(smp_processor_id(), mm_cpumask(vma->vm_mm))) {
+		if (!vma || cpumask_test_cpu(smp_processor_id(), mm_cpumask(vma->vm_mm))) {
 			unsigned long addr = (unsigned long)kaddr;
 			__cpuc_coherent_kern_range(addr, addr + len);
 		}
@@ -128,7 +128,7 @@ void flush_ptrace_access(struct vm_area_struct *vma, struct page *page,
 	}
 
 	/* VIPT non-aliasing D-cache */
-	if (vma->vm_flags & VM_EXEC) {
+	if (!vma || vma->vm_flags & VM_EXEC) {
 		unsigned long addr = (unsigned long)kaddr;
 		if (icache_is_vipt_aliasing())
 			flush_icache_alias(page_to_pfn(page), uaddr, len);
diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
index e4193e3..cde3cb4 100644
--- a/arch/arm64/mm/flush.c
+++ b/arch/arm64/mm/flush.c
@@ -38,7 +38,7 @@ static void flush_ptrace_access(struct vm_area_struct *vma, struct page *page,
 				unsigned long uaddr, void *kaddr,
 				unsigned long len)
 {
-	if (vma->vm_flags & VM_EXEC) {
+	if (!vma || vma->vm_flags & VM_EXEC) {
 		unsigned long addr = (unsigned long)kaddr;
 		if (icache_is_aliasing()) {
 			__flush_dcache_area(kaddr, len);
diff --git a/arch/avr32/mm/cache.c b/arch/avr32/mm/cache.c
index 6a46ecd..cd3d378 100644
--- a/arch/avr32/mm/cache.c
+++ b/arch/avr32/mm/cache.c
@@ -156,7 +156,7 @@ void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
 		unsigned long len)
 {
 	memcpy(dst, src, len);
-	if (vma->vm_flags & VM_EXEC)
+	if (!vma || vma->vm_flags & VM_EXEC)
 		flush_icache_range((unsigned long)dst,
 				(unsigned long)dst + len);
 }
diff --git a/arch/hexagon/include/asm/cacheflush.h b/arch/hexagon/include/asm/cacheflush.h
index 49e0896..9bea768 100644
--- a/arch/hexagon/include/asm/cacheflush.h
+++ b/arch/hexagon/include/asm/cacheflush.h
@@ -86,7 +86,7 @@ static inline void copy_to_user_page(struct vm_area_struct *vma,
 					     void *dst, void *src, int len)
 {
 	memcpy(dst, src, len);
-	if (vma->vm_flags & VM_EXEC) {
+	if (!vma || vma->vm_flags & VM_EXEC) {
 		flush_icache_range((unsigned long) dst,
 		(unsigned long) dst + len);
 	}
diff --git a/arch/m68k/include/asm/cacheflush_mm.h b/arch/m68k/include/asm/cacheflush_mm.h
index fa2c3d6..afefdeb 100644
--- a/arch/m68k/include/asm/cacheflush_mm.h
+++ b/arch/m68k/include/asm/cacheflush_mm.h
@@ -212,7 +212,7 @@ static inline void flush_cache_range(struct vm_area_struct *vma,
 
 static inline void flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr, unsigned long pfn)
 {
-	if (vma->vm_mm == current->mm)
+	if (!vma || vma->vm_mm == current->mm)
 	        __flush_cache_030();
 }
 
diff --git a/arch/microblaze/include/asm/cacheflush.h b/arch/microblaze/include/asm/cacheflush.h
index ffea82a..9eef956 100644
--- a/arch/microblaze/include/asm/cacheflush.h
+++ b/arch/microblaze/include/asm/cacheflush.h
@@ -108,7 +108,7 @@ static inline void copy_to_user_page(struct vm_area_struct *vma,
 {
 	u32 addr = virt_to_phys(dst);
 	memcpy(dst, src, len);
-	if (vma->vm_flags & VM_EXEC) {
+	if (!vma || vma->vm_flags & VM_EXEC) {
 		invalidate_icache_range(addr, addr + PAGE_SIZE);
 		flush_dcache_range(addr, addr + PAGE_SIZE);
 	}
diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c
index 6b59617..e428551 100644
--- a/arch/mips/mm/init.c
+++ b/arch/mips/mm/init.c
@@ -232,7 +232,7 @@ void copy_to_user_page(struct vm_area_struct *vma,
 		if (cpu_has_dc_aliases)
 			SetPageDcacheDirty(page);
 	}
-	if ((vma->vm_flags & VM_EXEC) && !cpu_has_ic_fills_f_dc)
+	if ((!vma || vma->vm_flags & VM_EXEC) && !cpu_has_ic_fills_f_dc)
 		flush_cache_page(vma, vaddr, page_to_pfn(page));
 }
 
diff --git a/arch/parisc/include/asm/tlbflush.h b/arch/parisc/include/asm/tlbflush.h
index 9d086a5..7aad1b6 100644
--- a/arch/parisc/include/asm/tlbflush.h
+++ b/arch/parisc/include/asm/tlbflush.h
@@ -68,6 +68,10 @@ static inline void flush_tlb_page(struct vm_area_struct *vma,
 	/* For one page, it's not worth testing the split_tlb variable */
 
 	mb();
+	if (!vma) {
+		flush_tlb_all();
+		return;
+	}
 	sid = vma->vm_mm->context;
 	purge_tlb_start(flags);
 	mtsp(sid, 1);
diff --git a/arch/parisc/kernel/cache.c b/arch/parisc/kernel/cache.c
index ac87a40..ff09f05 100644
--- a/arch/parisc/kernel/cache.c
+++ b/arch/parisc/kernel/cache.c
@@ -278,7 +278,7 @@ __flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr,
 {
 	preempt_disable();
 	flush_dcache_page_asm(physaddr, vmaddr);
-	if (vma->vm_flags & VM_EXEC)
+	if (!vma || vma->vm_flags & VM_EXEC)
 		flush_icache_page_asm(physaddr, vmaddr);
 	preempt_enable();
 }
@@ -574,7 +574,7 @@ void flush_cache_range(struct vm_area_struct *vma,
 void
 flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr, unsigned long pfn)
 {
-	BUG_ON(!vma->vm_mm->context);
+	BUG_ON(vma && !vma->vm_mm->context);
 
 	if (pfn_valid(pfn)) {
 		flush_tlb_page(vma, vmaddr);
diff --git a/arch/score/include/asm/cacheflush.h b/arch/score/include/asm/cacheflush.h
index 1d545d0..63e7b4e 100644
--- a/arch/score/include/asm/cacheflush.h
+++ b/arch/score/include/asm/cacheflush.h
@@ -41,7 +41,7 @@ static inline void flush_icache_page(struct vm_area_struct *vma,
 #define copy_to_user_page(vma, page, vaddr, dst, src, len)	\
 	do {							\
 		memcpy(dst, src, len);				\
-		if ((vma->vm_flags & VM_EXEC))			\
+		if (!vma || (vma->vm_flags & VM_EXEC))			\
 			flush_cache_page(vma, vaddr, page_to_pfn(page));\
 	} while (0)
 
diff --git a/arch/score/mm/cache.c b/arch/score/mm/cache.c
index f85ec1a..8464575 100644
--- a/arch/score/mm/cache.c
+++ b/arch/score/mm/cache.c
@@ -210,7 +210,7 @@ void flush_cache_range(struct vm_area_struct *vma,
 void flush_cache_page(struct vm_area_struct *vma,
 		unsigned long addr, unsigned long pfn)
 {
-	int exec = vma->vm_flags & VM_EXEC;
+	int exec = !vma || vma->vm_flags & VM_EXEC;
 	unsigned long kaddr = 0xa0000000 | (pfn << PAGE_SHIFT);
 
 	flush_dcache_range(kaddr, kaddr + PAGE_SIZE);
diff --git a/arch/sh/mm/cache.c b/arch/sh/mm/cache.c
index 616966a..ba2313a 100644
--- a/arch/sh/mm/cache.c
+++ b/arch/sh/mm/cache.c
@@ -70,7 +70,7 @@ void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
 			clear_bit(PG_dcache_clean, &page->flags);
 	}
 
-	if (vma->vm_flags & VM_EXEC)
+	if (!vma || vma->vm_flags & VM_EXEC)
 		flush_cache_page(vma, vaddr, page_to_pfn(page));
 }
 
diff --git a/arch/sparc/mm/leon_mm.c b/arch/sparc/mm/leon_mm.c
index 5bed085..dca5e18 100644
--- a/arch/sparc/mm/leon_mm.c
+++ b/arch/sparc/mm/leon_mm.c
@@ -192,7 +192,7 @@ void leon_flush_dcache_all(void)
 
 void leon_flush_pcache_all(struct vm_area_struct *vma, unsigned long page)
 {
-	if (vma->vm_flags & VM_EXEC)
+	if (!vma || vma->vm_flags & VM_EXEC)
 		leon_flush_icache_all();
 	leon_flush_dcache_all();
 }
diff --git a/arch/tile/include/asm/cacheflush.h b/arch/tile/include/asm/cacheflush.h
index 92ee4c8..7b7022c 100644
--- a/arch/tile/include/asm/cacheflush.h
+++ b/arch/tile/include/asm/cacheflush.h
@@ -66,7 +66,7 @@ static inline void copy_to_user_page(struct vm_area_struct *vma,
 				     void *dst, void *src, int len)
 {
 	memcpy(dst, src, len);
-	if (vma->vm_flags & VM_EXEC) {
+	if (!vma || vma->vm_flags & VM_EXEC) {
 		flush_icache_range((unsigned long) dst,
 				   (unsigned long) dst + len);
 	}
diff --git a/arch/unicore32/mm/flush.c b/arch/unicore32/mm/flush.c
index 6d4c096..10ddab3 100644
--- a/arch/unicore32/mm/flush.c
+++ b/arch/unicore32/mm/flush.c
@@ -36,7 +36,7 @@ static void flush_ptrace_access(struct vm_area_struct *vma, struct page *page,
 			 unsigned long uaddr, void *kaddr, unsigned long len)
 {
 	/* VIPT non-aliasing D-cache */
-	if (vma->vm_flags & VM_EXEC) {
+	if (!vma || vma->vm_flags & VM_EXEC) {
 		unsigned long addr = (unsigned long)kaddr;
 
 		__cpuc_coherent_kern_range(addr, addr + len);
diff --git a/arch/xtensa/mm/cache.c b/arch/xtensa/mm/cache.c
index ba4c47f..d34c06c 100644
--- a/arch/xtensa/mm/cache.c
+++ b/arch/xtensa/mm/cache.c
@@ -221,10 +221,10 @@ void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
 		unsigned long t = TLBTEMP_BASE_1 + (vaddr & DCACHE_ALIAS_MASK);
 
 		__flush_invalidate_dcache_range((unsigned long) dst, len);
-		if ((vma->vm_flags & VM_EXEC) != 0)
+		if (!vma || (vma->vm_flags & VM_EXEC) != 0)
 			__invalidate_icache_page_alias(t, phys);
 
-	} else if ((vma->vm_flags & VM_EXEC) != 0) {
+	} else if (!vma || (vma->vm_flags & VM_EXEC) != 0) {
 		__flush_dcache_range((unsigned long)dst,len);
 		__invalidate_icache_range((unsigned long) dst, len);
 	}
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 04709b6..2e976fb 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -241,7 +241,7 @@ static void copy_from_page(struct page *page, unsigned long vaddr, void *dst, in
 static void copy_to_page(struct page *page, unsigned long vaddr, const void *src, int len)
 {
 	void *kaddr = kmap_atomic(page);
-	memcpy(kaddr + (vaddr & ~PAGE_MASK), src, len);
+	copy_to_user_page(NULL, page, vaddr, kaddr + (vaddr & ~PAGE_MASK), src, len);
 	kunmap_atomic(kaddr);
 }
 
@@ -1299,11 +1299,6 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
 	/* Initialize the slot */
 	copy_to_page(area->page, xol_vaddr,
 			&uprobe->arch.ixol, sizeof(uprobe->arch.ixol));
-	/*
-	 * We probably need flush_icache_user_range() but it needs vma.
-	 * This should work on supported architectures too.
-	 */
-	flush_dcache_page(area->page);
 
 	return xol_vaddr;
 }
-- 
1.8.1.2


* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-11  3:45       ` David Long
@ 2014-04-11  4:36         ` David Miller
  2014-04-11 14:26           ` Victor Kamensky
  2014-04-11 14:56           ` Oleg Nesterov
  2014-04-11 13:08         ` Oleg Nesterov
  2014-04-23 10:45         ` Catalin Marinas
  2 siblings, 2 replies; 66+ messages in thread
From: David Miller @ 2014-04-11  4:36 UTC (permalink / raw)
  To: linux-arm-kernel

From: David Long <dave.long@linaro.org>
Date: Thu, 10 Apr 2014 23:45:31 -0400

> Replace memcpy and dcache flush in generic uprobes with a call to
> copy_to_user_page(), which will do a proper flushing of kernel and
> user cache.  Also modify the inmplementation of copy_to_user_page
> to assume a NULL vma pointer means the user icache corresponding
> to this right is stale and needs to be flushed.  Note that this patch
> does not fix copy_to_user page for the sh, alpha, sparc, or mips
> architectures (which do not currently support uprobes).
> 
> Signed-off-by: David A. Long <dave.long@linaro.org>

You really need to pass the proper VMA down to the call site
rather than pass NULL, that's extremely ugly and totally
unnecessary.


* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-11  3:45       ` David Long
  2014-04-11  4:36         ` David Miller
@ 2014-04-11 13:08         ` Oleg Nesterov
  2014-04-23 10:45         ` Catalin Marinas
  2 siblings, 0 replies; 66+ messages in thread
From: Oleg Nesterov @ 2014-04-11 13:08 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/10, David Long wrote:
>
>  static void copy_to_page(struct page *page, unsigned long vaddr, const void *src, int len)
>  {
>  	void *kaddr = kmap_atomic(page);
> -	memcpy(kaddr + (vaddr & ~PAGE_MASK), src, len);
> +	copy_to_user_page(NULL, page, vaddr, kaddr + (vaddr & ~PAGE_MASK), src, len);

No, no, this is not what we want... And I do not think we should change
copy_to_user_page().

I'll write another email...

Oleg.


* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-11  4:36         ` David Miller
@ 2014-04-11 14:26           ` Victor Kamensky
  2014-04-11 14:35             ` Oleg Nesterov
  2014-04-11 14:55             ` Victor Kamensky
  2014-04-11 14:56           ` Oleg Nesterov
  1 sibling, 2 replies; 66+ messages in thread
From: Victor Kamensky @ 2014-04-11 14:26 UTC (permalink / raw)
  To: linux-arm-kernel

On 10 April 2014 21:36, David Miller <davem@davemloft.net> wrote:
> From: David Long <dave.long@linaro.org>
> Date: Thu, 10 Apr 2014 23:45:31 -0400
>
>> Replace memcpy and dcache flush in generic uprobes with a call to
>> copy_to_user_page(), which will do a proper flushing of kernel and
>> user cache.  Also modify the inmplementation of copy_to_user_page
>> to assume a NULL vma pointer means the user icache corresponding
>> to this right is stale and needs to be flushed.  Note that this patch
>> does not fix copy_to_user page for the sh, alpha, sparc, or mips
>> architectures (which do not currently support uprobes).
>>
>> Signed-off-by: David A. Long <dave.long@linaro.org>
>
> You really need to pass the proper VMA down to the call site
> rather than pass NULL, that's extremely ugly and totally
> unnecesary.

Agreed that the VMA is really needed.

Here is a variant that I tried while waiting for Oleg's response:

From 4a6a9043e0910041dd8842835a528cbdc39fad34 Mon Sep 17 00:00:00 2001
From: Victor Kamensky <victor.kamensky@linaro.org>
Date: Thu, 10 Apr 2014 17:06:39 -0700
Subject: [PATCH] uprobes: use copy_to_user_page function to copy instr to xol
 area

Use the copy_to_user_page function to copy the instruction into the xol
area. copy_to_user_page guarantees that all caches are correctly
flushed during such a write (including the icache as well, if needed).

Because copy_to_user_page needs a vm_area_struct, a vma
field was added to struct xol_area. It holds the cached vma value
for the xol_area.

Also, by using copy_to_user_page we make sure that we use the same
code that the ptrace write path uses.

Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
---
 kernel/events/uprobes.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 04709b6..1ae4563 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -117,6 +117,7 @@ struct xol_area {
      * the vma go away, and we must handle that reasonably gracefully.
      */
     unsigned long         vaddr;        /* Page(s) of instruction slots */
+    struct vm_area_struct    *vma;        /* VMA that holds above address */
 };

 /*
@@ -1150,6 +1151,7 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area)

     ret = install_special_mapping(mm, area->vaddr, PAGE_SIZE,
                 VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO, &area->page);
+    area->vma = find_vma(mm, area->vaddr);
     if (ret)
         goto fail;

@@ -1287,6 +1289,7 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
 {
     struct xol_area *area;
     unsigned long xol_vaddr;
+    void *xol_page_kaddr;

     area = get_xol_area();
     if (!area)
@@ -1297,8 +1300,11 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
         return 0;

     /* Initialize the slot */
-    copy_to_page(area->page, xol_vaddr,
-            &uprobe->arch.ixol, sizeof(uprobe->arch.ixol));
+    xol_page_kaddr = kmap_atomic(area->page);
+    copy_to_user_page(area->vma, area->page, xol_vaddr,
+              xol_page_kaddr + (xol_vaddr & ~PAGE_MASK),
+              &uprobe->arch.ixol, sizeof(uprobe->arch.ixol));
+    kunmap_atomic(xol_page_kaddr);
     /*
      * We probably need flush_icache_user_range() but it needs vma.
      * This should work on supported architectures too.
-- 
1.8.1.4

Now, while waiting for Oleg's response, I am trying to run some performance
tests comparing the different solutions to understand the situation better.

One of my concerns was the ARM SMP function call that
could be reached from copy_to_user_page. But I am having a hard time
finding a CPU that really does it ... It does not look like such a call
will happen on relatively modern ARM CPUs (I tried Arndale
and Pandaboard ES). So my concerns about the use
of copy_to_user_page are largely alleviated.

Thanks,
Victor


* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-11 14:26           ` Victor Kamensky
@ 2014-04-11 14:35             ` Oleg Nesterov
  2014-04-11 14:55             ` Victor Kamensky
  1 sibling, 0 replies; 66+ messages in thread
From: Oleg Nesterov @ 2014-04-11 14:35 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/11, Victor Kamensky wrote:
>
> On 10 April 2014 21:36, David Miller <davem@davemloft.net> wrote:
> > You really need to pass the proper VMA down to the call site
> > rather than pass NULL, that's extremely ugly and totally
> > unnecesary.
>
> Agreed that VMA is really needed.

I do not ;) but I am still trying to finish my email...

> index 04709b6..1ae4563 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -117,6 +117,7 @@ struct xol_area {
>       * the vma go away, and we must handle that reasonably gracefully.
>       */
>      unsigned long         vaddr;        /* Page(s) of instruction slots */
> +    struct vm_area_struct    *vma;        /* VMA that holds above address */
>  };
> 
>  /*
> @@ -1150,6 +1151,7 @@ static int xol_add_vma(struct mm_struct *mm,
> struct xol_area *area)
> 
>      ret = install_special_mapping(mm, area->vaddr, PAGE_SIZE,
>                  VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO, &area->page);
> +    area->vma = find_vma(mm, area->vaddr);

No, this can't work. This vma can be unmapped/freed/etc.

Oleg.


* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-11 14:26           ` Victor Kamensky
  2014-04-11 14:35             ` Oleg Nesterov
@ 2014-04-11 14:55             ` Victor Kamensky
  1 sibling, 0 replies; 66+ messages in thread
From: Victor Kamensky @ 2014-04-11 14:55 UTC (permalink / raw)
  To: linux-arm-kernel

On 11 April 2014 07:26, Victor Kamensky <victor.kamensky@linaro.org> wrote:
> On 10 April 2014 21:36, David Miller <davem@davemloft.net> wrote:
>> From: David Long <dave.long@linaro.org>
>> Date: Thu, 10 Apr 2014 23:45:31 -0400
>>
>>> Replace memcpy and dcache flush in generic uprobes with a call to
>>> copy_to_user_page(), which will do a proper flushing of kernel and
>>> user cache.  Also modify the inmplementation of copy_to_user_page
>>> to assume a NULL vma pointer means the user icache corresponding
>>> to this right is stale and needs to be flushed.  Note that this patch
>>> does not fix copy_to_user page for the sh, alpha, sparc, or mips
>>> architectures (which do not currently support uprobes).
>>>
>>> Signed-off-by: David A. Long <dave.long@linaro.org>
>>
>> You really need to pass the proper VMA down to the call site
>> rather than pass NULL, that's extremely ugly and totally
>> unnecesary.
>
> Agreed that VMA is really needed.
>
> Here is variant that I tried while waiting for Oleg's response:
>
> From 4a6a9043e0910041dd8842835a528cbdc39fad34 Mon Sep 17 00:00:00 2001
> From: Victor Kamensky <victor.kamensky@linaro.org>
> Date: Thu, 10 Apr 2014 17:06:39 -0700
> Subject: [PATCH] uprobes: use copy_to_user_page function to copy instr to xol
>  area
>
> Use copy_to_user_page function to copy instruction into xol area.
> copy_to_user_page function guarantee that all caches are correctly
> flushed during such write (including icache as well if needed).
>
> Because copy_to_user_page needs vm_area_struct, vma
> field was added into struct xol_area. It holds cached vma value
> for xol_area.
>
> Also using copy_to_user_page we make sure that we use the same
> code that ptrace write code uses.
>
> Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
> ---
>  kernel/events/uprobes.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 04709b6..1ae4563 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -117,6 +117,7 @@ struct xol_area {
>       * the vma go away, and we must handle that reasonably gracefully.
>       */
>      unsigned long         vaddr;        /* Page(s) of instruction slots */
> +    struct vm_area_struct    *vma;        /* VMA that holds above address */
>  };
>
>  /*
> @@ -1150,6 +1151,7 @@ static int xol_add_vma(struct mm_struct *mm,
> struct xol_area *area)
>
>      ret = install_special_mapping(mm, area->vaddr, PAGE_SIZE,
>                  VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO, &area->page);
> +    area->vma = find_vma(mm, area->vaddr);
>      if (ret)
>          goto fail;
>
> @@ -1287,6 +1289,7 @@ static unsigned long xol_get_insn_slot(struct
> uprobe *uprobe)
>  {
>      struct xol_area *area;
>      unsigned long xol_vaddr;
> +    void *xol_page_kaddr;
>
>      area = get_xol_area();
>      if (!area)
> @@ -1297,8 +1300,11 @@ static unsigned long xol_get_insn_slot(struct
> uprobe *uprobe)
>          return 0;
>
>      /* Initialize the slot */
> -    copy_to_page(area->page, xol_vaddr,
> -            &uprobe->arch.ixol, sizeof(uprobe->arch.ixol));
> +    xol_page_kaddr = kmap_atomic(area->page);
> +    copy_to_user_page(area->vma, area->page, xol_vaddr,
> +              xol_page_kaddr + (xol_vaddr & ~PAGE_MASK),
> +              &uprobe->arch.ixol, sizeof(uprobe->arch.ixol));
> +    kunmap_atomic(xol_page_kaddr);
>      /*
>       * We probably need flush_icache_user_range() but it needs vma.
>       * This should work on supported architectures too.

The above comment and the following call to 'flush_dcache_page(area->page);'
should be removed, of course. Just noticed that ...

Thanks,
Victor

> --
> 1.8.1.4
>
> Now, while waiting for Oleg's response, I am trying to run some performance
> tests between the different solutions to understand the situation better.
>
> One of my concerns was the ARM smp function call that
> could be reached from copy_to_user_page. But I have a hard time
> finding a CPU that really does it ... It does not look like such a call
> will happen on relatively modern ARM CPUs (tried Arndale
> and Pandaboard ES). So my concerns with the use
> of copy_to_user_page are largely alleviated.
>
> Thanks,
> Victor

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-11  4:36         ` David Miller
  2014-04-11 14:26           ` Victor Kamensky
@ 2014-04-11 14:56           ` Oleg Nesterov
  2014-04-11 15:22             ` Oleg Nesterov
                               ` (2 more replies)
  1 sibling, 3 replies; 66+ messages in thread
From: Oleg Nesterov @ 2014-04-11 14:56 UTC (permalink / raw)
  To: linux-arm-kernel

First of all: I do not pretend I really understand the problems with
icache/etc coherency and how flush_icache_range() actually works on
alpha. Help.

For those who were not cc'ed. A probed task has a special "xol" vma,
it is used to execute the probed insn out of line.

Initial implementation was x86 only, so we simply copied the probed
insn into this vma and everything was fine, flush_icache_range() is
nop on x86.

Then we added flush_dcache_page() for powerpc,

	// this is just kmap() + memcpy()
	copy_to_page(area->page, xol_vaddr,
			&uprobe->arch.ixol, sizeof(uprobe->arch.ixol));

	/*
	 * We probably need flush_icache_user_range() but it needs vma.
	 * This should work on supported architectures too.
	 */
	flush_dcache_page(area->page);

but this doesn't work on arm. So we need another fix.

On 04/11, David Miller wrote:
>
> From: David Long <dave.long@linaro.org>
> Date: Thu, 10 Apr 2014 23:45:31 -0400
>
> > Replace memcpy and dcache flush in generic uprobes with a call to
> > copy_to_user_page(), which will do a proper flushing of kernel and
> > user cache.  Also modify the implementation of copy_to_user_page
> > to assume a NULL vma pointer means the user icache corresponding
> > to this write is stale and needs to be flushed.  Note that this patch
> > does not fix copy_to_user_page for the sh, alpha, sparc, or mips
> > architectures (which do not currently support uprobes).
> >
> > Signed-off-by: David A. Long <dave.long@linaro.org>
>
> You really need to pass the proper VMA down to the call site
> rather than pass NULL, that's extremely ugly and totally
> unnecessary.

I agree that we should not change copy_to_user_page(), but I am not sure
the code above should use copy_to_user_page().

Because the code above really differs in my opinion from the only user of
copy_to_user_page(), __access_remote_vm().

	1. First of all, we do not know vma.

	   OK, we can down_read(mmap_sem) and do find_vma() of course.
	   This is a bit unfortunate, especially because the architectures
	   we currently support do not need this.

	   But,

	2. The problem is, it would be very nice to remove this vma, or
	   at least hide it somehow from find_vma/etc. This is the special
	   mapping we do not want to expose to user-space.

	   In fact I even have the patches which remove this vma, but they
	   do not work with compat tasks unfortunately.

	3. Unlike __access_remote_vm() we always use current->mm, and the
	   memory range changed by the code above can only be used by
	   "current" thread.

	   So (perhaps) flush_icache_user_range() can even treat this case
	   as "mm->mm_users) <= 1" and avoid ipi_flush_icache_page (just in
	   case, of course I do not know if this is actually possible).

So can't we do something else? Let's forget about arch/arm for the moment,
suppose that we want to support uprobes on alpha. Can't we do _something_
like below?

And perhaps we can do something like this on arm? It can do kmap(page) /
page_address(page) itself.

Oleg.

--- x/arch/alpha/kernel/smp.c
+++ x/arch/alpha/kernel/smp.c
@@ -751,15 +751,9 @@ ipi_flush_icache_page(void *x)
 		flush_tlb_other(mm);
 }
 
-void
-flush_icache_user_range(struct vm_area_struct *vma, struct page *page,
-			unsigned long addr, int len)
-{
-	struct mm_struct *mm = vma->vm_mm;
-
-	if ((vma->vm_flags & VM_EXEC) == 0)
-		return;
 
+void __flush_icache_page_xxx(struct mm_struct *mm, struct page *page) // addr, len ?
+{
 	preempt_disable();
 
 	if (mm == current->active_mm) {
@@ -783,3 +777,24 @@ flush_icache_user_range(struct vm_area_s
 
 	preempt_enable();
 }
+
+void flush_icache_page_xxx(struct page *page)
+{
+	struct mm_struct *mm = current->mm;
+
+	down_read(&mm->mmap_sem);
+	__flush_icache_page_xxx(mm, page);
+	up_read(&mm->mmap_sem);
+}
+
+void
+flush_icache_user_range(struct vm_area_struct *vma, struct page *page,
+			unsigned long addr, int len)
+{
+	struct mm_struct *mm = vma->vm_mm;
+
+	if ((vma->vm_flags & VM_EXEC) == 0)
+		return;
+
+	__flush_icache_page_xxx(mm, page);
+}
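
For completeness, the generic uprobes side could then do the copy without
knowing about any vma (a rough sketch only, matching the helper above):

	/* sketch: slot initialization in xol_get_insn_slot() */
	copy_to_page(area->page, xol_vaddr,
			&uprobe->arch.ixol, sizeof(uprobe->arch.ixol));
	flush_dcache_page(area->page);
	flush_icache_page_xxx(area->page);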

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-11 14:56           ` Oleg Nesterov
@ 2014-04-11 15:22             ` Oleg Nesterov
  2014-04-11 15:30               ` Russell King - ARM Linux
  2014-04-11 15:32               ` Peter Zijlstra
  2014-04-11 15:37             ` Victor Kamensky
  2014-04-11 15:42             ` Linus Torvalds
  2 siblings, 2 replies; 66+ messages in thread
From: Oleg Nesterov @ 2014-04-11 15:22 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/11, Oleg Nesterov wrote:
>
> Can't we do _something_
> like below?

If not, I'd propose the patch below.

I can be easily wrong, but it seems that arch/arm can reimplement
arch_uprobe_flush_xol_icache() and do flush_ptrace_access()-like
code. It needs kaddr, but this is not a problem.

Btw. From arch/arm/include/asm/cacheflush.h

	#define flush_icache_user_range(vma,page,addr,len) \
		flush_dcache_page(page)

but it has no users?

And I am just curious, why arm's copy_to_user_page() disables preemption
before memcpy?

Oleg.

--- x/kernel/events/uprobes.c
+++ x/kernel/events/uprobes.c
@@ -1274,6 +1274,17 @@ static unsigned long xol_take_insn_slot(
 	return slot_addr;
 }
 
+void __weak arch_uprobe_flush_xol_icache(struct page *page,
+					 unsigned long vaddr, int len)
+{
+	/*
+	 * We need copy_to_user_page/flush_icache_user_range but this
+	 * needs vma. If this doesn't work on your arch, reimplement.
+	 */
+	flush_dcache_page(page);
+
+}
+
 /*
  * xol_get_insn_slot - allocate a slot for xol.
  * Returns the allocated slot address or 0.
@@ -1294,11 +1305,8 @@ static unsigned long xol_get_insn_slot(s
 	/* Initialize the slot */
 	copy_to_page(area->page, xol_vaddr,
 			&uprobe->arch.ixol, sizeof(uprobe->arch.ixol));
-	/*
-	 * We probably need flush_icache_user_range() but it needs vma.
-	 * This should work on supported architectures too.
-	 */
-	flush_dcache_page(area->page);
+	arch_uprobe_flush_xol_icache(area->page, xol_vaddr,
+			sizeof(uprobe->arch.ixol));
 
 	return xol_vaddr;
 }
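
As an illustration of the arm side (a rough, untested sketch of the idea
only -- it covers just the VIPT non-aliasing case and glosses over the
VIVT/aliasing caches and the which-CPU-dirtied-the-lines question
discussed elsewhere in this thread), the override could look roughly like:

	/* arch/arm: possible override of the weak hook above (sketch) */
	void arch_uprobe_flush_xol_icache(struct page *page,
					  unsigned long vaddr, int len)
	{
		unsigned long kaddr = (unsigned long)kmap_atomic(page);
		unsigned long start = kaddr + (vaddr & ~PAGE_MASK);

		/* clean D-cache to the point of unification and
		 * invalidate the I-cache, for the xol slot only */
		__cpuc_coherent_kern_range(start, start + len);
		kunmap_atomic((void *)kaddr);
	}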

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-11 15:22             ` Oleg Nesterov
@ 2014-04-11 15:30               ` Russell King - ARM Linux
  2014-04-11 17:24                 ` Oleg Nesterov
  2014-04-11 17:43                 ` David Miller
  2014-04-11 15:32               ` Peter Zijlstra
  1 sibling, 2 replies; 66+ messages in thread
From: Russell King - ARM Linux @ 2014-04-11 15:30 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 11, 2014 at 05:22:07PM +0200, Oleg Nesterov wrote:
> On 04/11, Oleg Nesterov wrote:
> >
> > Can't we do _something_
> > like below?
> 
> If not, I'd propose the patch below.
> 
> I can be easily wrong, but it seems that arch/arm can reimplement
> arch_uprobe_flush_xol_icache() and do flush_ptrace_access()-like
> code. It needs kaddr, but this is not a problem.
> 
> Btw. From arch/arm/include/asm/cacheflush.h
> 
> 	#define flush_icache_user_range(vma,page,addr,len) \
> 		flush_dcache_page(page)
> 
> but it has no users?

I wonder whether you've read this yet:

	http://lkml.iu.edu//hypermail/linux/kernel/1404.1/00725.html

where I proposed removing flush_icache_user_range() since it's not used
on a great many architectures.

> And I am just curious, why arm's copy_to_user_page() disables preemption
> before memcpy?

flush_ptrace_access() needs to run on the CPU which ended up with the
dirty cache line(s) to cope with those which do not have hardware
broadcasting of cache maintenance operations.
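
For reference, the pattern in question is roughly this (a simplified
sketch, not the verbatim arch/arm source):

	void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
			       unsigned long uaddr, void *dst, const void *src,
			       unsigned long len)
	{
		/* stay on the CPU whose D-cache lines we are about to dirty */
		preempt_disable();
		memcpy(dst, src, len);
		flush_ptrace_access(vma, page, uaddr, dst, len);
		preempt_enable();
	}

and flush_ptrace_access() only falls back to an smp_call_function()
broadcast when cache_ops_need_broadcast() says the hardware cannot
broadcast the maintenance operations itself.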

This is why the hacks that you're doing are just that - they're hacks
and are all broken in some way.

So, let me re-quote what I asked in a previous thread:

| Given that we've already solved that problem, wouldn't it be a good idea
| if the tracing code would stop trying to reinvent broken solutions to
| problems we have already solved?

I fail to see what your problem is with keeping the vma around, and
using that infrastructure.  If it needs optimisation for uprobes, then
let's optimise it.  Let's not go inventing a whole new interface
solving the same problem.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-11 15:22             ` Oleg Nesterov
  2014-04-11 15:30               ` Russell King - ARM Linux
@ 2014-04-11 15:32               ` Peter Zijlstra
  2014-04-11 16:00                 ` Russell King - ARM Linux
  1 sibling, 1 reply; 66+ messages in thread
From: Peter Zijlstra @ 2014-04-11 15:32 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 11, 2014 at 05:22:07PM +0200, Oleg Nesterov wrote:
> And I am just curious, why arm's copy_to_user_page() disables preemption
> before memcpy?

Without looking, I suspect it's because the VIVT caches, they need to get
shot down on every context switch.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-11 14:56           ` Oleg Nesterov
  2014-04-11 15:22             ` Oleg Nesterov
@ 2014-04-11 15:37             ` Victor Kamensky
  2014-04-11 16:22               ` Oleg Nesterov
  2014-04-11 15:42             ` Linus Torvalds
  2 siblings, 1 reply; 66+ messages in thread
From: Victor Kamensky @ 2014-04-11 15:37 UTC (permalink / raw)
  To: linux-arm-kernel

On 11 April 2014 07:56, Oleg Nesterov <oleg@redhat.com> wrote:
> First of all: I do not pretend I really understand the problems with
> icache/etc coherency and how flush_icache_range() actually works on
> alpha. Help.
>
> For those who were not cc'ed. A probed task has a special "xol" vma,
> it is used to execute the probed insn out of line.
>
> Initial implementation was x86 only, so we simply copied the probed
> insn into this vma and everything was fine, flush_icache_range() is
> nop on x86.
>
> Then we added flush_dcache_page() for powerpc,
>
>         // this is just kmap() + memcpy()
>         copy_to_page(area->page, xol_vaddr,
>                         &uprobe->arch.ixol, sizeof(uprobe->arch.ixol));
>
>         /*
>          * We probably need flush_icache_user_range() but it needs vma.
>          * This should work on supported architectures too.
>          */
>         flush_dcache_page(area->page);
>
> but this doesn't work on arm. So we need another fix.
>
> On 04/11, David Miller wrote:
>>
>> From: David Long <dave.long@linaro.org>
>> Date: Thu, 10 Apr 2014 23:45:31 -0400
>>
>> > Replace memcpy and dcache flush in generic uprobes with a call to
>> > copy_to_user_page(), which will do a proper flushing of kernel and
>> > user cache.  Also modify the implementation of copy_to_user_page
>> > to assume a NULL vma pointer means the user icache corresponding
>> > to this write is stale and needs to be flushed.  Note that this patch
>> > does not fix copy_to_user_page for the sh, alpha, sparc, or mips
>> > architectures (which do not currently support uprobes).
>> >
>> > Signed-off-by: David A. Long <dave.long@linaro.org>
>>
>> You really need to pass the proper VMA down to the call site
>> rather than pass NULL, that's extremely ugly and totally
>> unnecessary.
>
> I agree that we should not change copy_to_user_page(), but I am not sure
> the code above should use copy_to_user_page().
>
> Because the code above really differs in my opinion from the only user of
> copy_to_user_page(), __access_remote_vm().
>
>         1. First of all, we do not know vma.
>
>            OK, we can down_read(mmap_sem) and do find_vma() of course.
>            This is a bit unfortunate, especially because the architectures
>            we currently support do not need this.

Question, maybe a silly one but I don't know the answer: why can't we just
look up the vma once and cache the result in a place like xol_area (along with
xol_area.vaddr) and use it all the time? IOW, under what circumstances can the
vma for the xol area disappear or change, so that we need a constant lookup for it?
The comment in xol_area

>    /*
>    * We keep the vma's vm_start rather than a pointer to the vma
>     * itself.  The probed process or a naughty kernel module could make
>     * the vma go away, and we must handle that reasonably gracefully.
>     */
>     unsigned long         vaddr;        /* Page(s) of instruction slots */

alludes to some of those conditions, but I don't quite follow.
Should we not go after the "probed process" ability to unmap the xol area?
The xol area is like the vdso, signal page and other mappings injected by
the kernel into the user process address space; mmap calls should ignore
those. I wonder what would happen if a process tried to unmap the
vdso region.

>            But,
>
>         2. The problem is, it would be very nice to remove this vma, or
>            at least hide it somehow from find_vma/etc. This is the special
>            mapping we do not want to expose to user-space.
>
>            In fact I even have the patches which remove this vma, but they
>            do not work with compat tasks unfortunately.

I don't think it is the right route. The xol area, as well as the vdso, signal
page, etc, should be visible as regular VMAs. There are other aspects of the
system where they are needed. Like core file collection - I would like to have
the xol area present in my core file if the traced process crashed. Like
/proc/<pid>/maps - I would like to see my memory layout through
this interface and I would like to see the xol area there because I
can see xol area addresses by some other means.

>         3. Unlike __access_remote_vm() we always use current->mm, and the
>            memory range changed by the code above can only be used by
>            "current" thread.
>
>            So (perhaps) flush_icache_user_range() can even treat this case
>            as "mm->mm_users) <= 1" and avoid ipi_flush_icache_page (just in
>            case, of course I do not know if this is actually possible).
>
> So can't we do something else? Let's forget about arch/arm for the moment,
> suppose that we want to support uprobes on alpha. Can't we do _something_
> like below?

The appeal of the copy_to_user_page approach is that I don't need to know
how to handle the sync-up of icache and dcache on that architecture, it is
already done by someone else when they programmed the basic ptrace
breakpoint write behavior.

Thanks,
Victor

> And perhap we can do something like this on arm? it can do kmap(page) /
> page_address(page) itself.
>
> Oleg.
>
> --- x/arch/alpha/kernel/smp.c
> +++ x/arch/alpha/kernel/smp.c
> @@ -751,15 +751,9 @@ ipi_flush_icache_page(void *x)
>                 flush_tlb_other(mm);
>  }
>
> -void
> -flush_icache_user_range(struct vm_area_struct *vma, struct page *page,
> -                       unsigned long addr, int len)
> -{
> -       struct mm_struct *mm = vma->vm_mm;
> -
> -       if ((vma->vm_flags & VM_EXEC) == 0)
> -               return;
>
> +void __flush_icache_page_xxx(struct mm_struct *mm, struct page *page) // addr, len ?
> +{
>         preempt_disable();
>
>         if (mm == current->active_mm) {
> @@ -783,3 +777,24 @@ flush_icache_user_range(struct vm_area_s
>
>         preempt_enable();
>  }
> +
> +void flush_icache_page_xxx(struct page *page)
> +{
> +       struct mm_struct *mm = current->mm;
> +
> +       down_read(&mm->mmap_sem);
> +       __flush_icache_page_xxx(mm, page);
> +       up_read(&mm->mmap_sem);
> +}
> +
> +void
> +flush_icache_user_range(struct vm_area_struct *vma, struct page *page,
> +                       unsigned long addr, int len)
> +{
> +       struct mm_struct *mm = vma->vm_mm;
> +
> +       if ((vma->vm_flags & VM_EXEC) == 0)
> +               return;
> +
> +       __flush_icache_page_xxx(mm, page);
> +}
>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-11 14:56           ` Oleg Nesterov
  2014-04-11 15:22             ` Oleg Nesterov
  2014-04-11 15:37             ` Victor Kamensky
@ 2014-04-11 15:42             ` Linus Torvalds
  2 siblings, 0 replies; 66+ messages in thread
From: Linus Torvalds @ 2014-04-11 15:42 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 11, 2014 at 7:56 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> First of all: I do not pretend I really understand the problems with
> icache/etc coherency and how flush_icache_range() actually works on
> alpha. Help.

According to the alpha architecture rules, the instruction cache can
be completely virtual, and is not only not coherent with the data
cache, it's not even necessarily coherent with TLB mapping changes (ie
it's purely virtual, and you need to flush it if you change
instruction mappings). The virtual caches do have an address space
number, so you can have multiple separate virtual address spaces.

The way to flush it is with the "imb" instruction (which is not
actually an instruction at all, it's a jump to PAL-code, alpha's
"explicit microcode")

That means that when you modify data that could be code, you do need
to do an "imb" _and_ you do need to do it cross-cpu even for
thread-local cases in case your thread migrates to another CPU with
stale I$ data (the ASN will be the same). You can use the usual VM
cpu-mask to tell which other CPU's you'd need to do it on, though.

But alpha does not need page or addr/len, because "imb" is "make the
whole instruction cache coherent".

Your patch looks correct for alpha, afaik.

               Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-11 15:32               ` Peter Zijlstra
@ 2014-04-11 16:00                 ` Russell King - ARM Linux
  2014-04-11 18:39                   ` Peter Zijlstra
  0 siblings, 1 reply; 66+ messages in thread
From: Russell King - ARM Linux @ 2014-04-11 16:00 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 11, 2014 at 05:32:49PM +0200, Peter Zijlstra wrote:
> On Fri, Apr 11, 2014 at 05:22:07PM +0200, Oleg Nesterov wrote:
> > And I am just curious, why arm's copy_to_user_page() disables preemption
> > before memcpy?
> 
> Without looking, I suspect it's because the VIVT caches, they need to get
> shot down on every context switch.

So... let's think about that for a moment... if we have a preemption event,
then that's a context switch, which means...

No, this is obviously not the reason, because such an event on a fully
VIVT system would result in the caches being flushed, meaning that we
wouldn't need to do anything if we could be predictably preempted at that
point.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-11 15:37             ` Victor Kamensky
@ 2014-04-11 16:22               ` Oleg Nesterov
  0 siblings, 0 replies; 66+ messages in thread
From: Oleg Nesterov @ 2014-04-11 16:22 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/11, Victor Kamensky wrote:
>
> On 11 April 2014 07:56, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> >         1. First of all, we do not know vma.
> >
> >            OK, we can down_read(mmap_sem) and do find_vma() of course.
> >            This is a bit unfortunate, especially because the architectures
> >            we currently support do not need this.
>
> Question, maybe a silly one but I don't know the answer: why can't we just
> look up the vma once and cache the result in a place like xol_area (along with
> xol_area.vaddr) and use it all the time? IOW, under what circumstances can the
> vma for the xol area disappear or change, so that we need a constant lookup for it?
> The comment in xol_area
>
> >    /*
> >    * We keep the vma's vm_start rather than a pointer to the vma
> >     * itself.  The probed process or a naughty kernel module could make
> >     * the vma go away, and we must handle that reasonably gracefully.
> >     */
> >     unsigned long         vaddr;        /* Page(s) of instruction slots */
>
> alludes to some of those conditions, but I don't quite follow.
> Should we not go after the "probed process" ability to unmap the xol area?
> The xol area is like the vdso,

But it is not like vdso. And (unlike vsyscall page) vdso can be unmapped
too (unless it is FIX_VDSO).

> mmap call should ignore
> those..

This is not that simple, this means more ugly uprobe_ hooks in mm/.
And I think we simply do not want/need this.

I didn't write the comment above, but "reasonably gracefully" should mean
"we should not allow unmap/remap/etc(xol_area) crash the kernel, the task
can crashif it does this, we do not care".

The same for vdso, except in this case the kernel can simply forget about
this area after it does setup_additional_pages().

> >         2. The problem is, it would be very nice to remove this vma, or
> >            at least hide it somehow from find_vma/etc. This is the special
> >            mapping we do not want to expose to user-space.
> >
> >            In fact I even have the patches which remove this vma, but they
> >            do not work with compat tasks unfortunately.
>
> I don't think it is the right route. The xol area, as well as the vdso, signal
> page, etc, should be visible as regular VMAs. There are other aspects of the
> system where they are needed. Like core file collection - I would like to have
> the xol area present in my core file if the traced process crashed.

It must never crash in xol_area, or we have a kernel bug. (we do have such
a bug which I am trying to fix right now ;)

> /proc/<pid>/maps - I would like to see my memory layout through
> this interface and I would like to see the xol area there because I
> can see xol area addresses by some other means.

But it is not "your memory", to some degree. I mean, it would be nice if
it was not.

This should be more like vsyscall page. And indeed, we can move this into
the FIXMAP area. The only problem is that a 32bit task can't use this area on
a 64-bit machine.

> The appeal of the copy_to_user_page approach is that I don't need to know
> how to handle the sync-up of icache and dcache on that architecture,

Yes, sure, this is true.

> it is
> already done by someone else when they programmed the basic ptrace
> breakpoint write behavior.

Yes, but (rightly or not) I still think that uprobes differs from ptrace.
Perhaps we do not have other choice though.

Oleg.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-11 15:30               ` Russell King - ARM Linux
@ 2014-04-11 17:24                 ` Oleg Nesterov
  2014-04-11 17:38                   ` Oleg Nesterov
  2014-04-11 17:50                   ` Linus Torvalds
  2014-04-11 17:43                 ` David Miller
  1 sibling, 2 replies; 66+ messages in thread
From: Oleg Nesterov @ 2014-04-11 17:24 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/11, Russell King - ARM Linux wrote:
>
> On Fri, Apr 11, 2014 at 05:22:07PM +0200, Oleg Nesterov wrote:
>
> I wonder whether you've read this yet:
>
> 	http://lkml.iu.edu//hypermail/linux/kernel/1404.1/00725.html

it seems that the only result of this discussion is "stop trying to
reinvent" you already quoted. Thanks.

> where I proposed removing flush_icache_user_range() since it's not used
> on a great many architectures.

Or at least it and its usage can be cleaned up somehow...

> > And I am just curious, why arm's copy_to_user_page() disables preemption
> > before memcpy?
>
> flush_ptrace_access() needs to run on the CPU which ended up with the
> dirty cache line(s) to cope with those which do not have hardware
> broadcasting of cache maintenance operations.

Aha, thanks.

But you know, perhaps I'll ask you another stupid question later. Because
it still seems to me that we can do something better/cheaper in uprobe case.
Nevermind.

> This is why the hacks that you're doing are just that - they're hacks
> and are all broken in some way.

OK.

> I fail to see what your problem is with keeping the vma around,

We can't pin vm_area_struct.

> Let's not go inventing a whole new interface
> solving the same problem.

OK. How about the patch below?

Oleg.
---

index 2adbc97..9d45a4a 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1274,6 +1274,33 @@ static unsigned long xol_take_insn_slot(struct xol_area *area)
 	return slot_addr;
 }
 
+static void arch_uprobe_copy_ixol(struct xol_area *area, unsigned long vaddr,
+				 	struct arch_uprobe *auprobe)
+{
+#ifndef ARCH_UPROBE_XXX
+	copy_to_page(area->page, vaddr, &auprobe->ixol, sizeof(&auprobe->ixol));
+	/*
+	 * We probably need flush_icache_user_range() but it needs vma.
+	 * If this doesn't work define ARCH_UPROBE_XXX.
+	 */
+	flush_dcache_page(area->page);
+#else
+	struct mm_struct *mm = current->mm;
+	struct vm_area_struct *vma;
+
+	down_read(&mm->mmap_sem);
+	vma = find_exact_vma(mm, area->vaddr, area->vaddr + PAGE_SIZE);
+	if (vma) {
+		void *kaddr = kmap_atomic(area->page);
+		copy_to_user_page(vma, area->page,
+					vaddr, kaddr + (vaddr & ~PAGE_MASK),
+					&auprobe->ixol, sizeof(&auprobe->ixol));
+		kunmap_atomic(kaddr);
+	}
+	up_read(&mm->mmap_sem);
+#endif
+}
+
 /*
  * xol_get_insn_slot - allocate a slot for xol.
  * Returns the allocated slot address or 0.
@@ -1291,15 +1318,7 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
 	if (unlikely(!xol_vaddr))
 		return 0;
 
-	/* Initialize the slot */
-	copy_to_page(area->page, xol_vaddr,
-			&uprobe->arch.ixol, sizeof(uprobe->arch.ixol));
-	/*
-	 * We probably need flush_icache_user_range() but it needs vma.
-	 * This should work on supported architectures too.
-	 */
-	flush_dcache_page(area->page);
-
+	arch_uprobe_copy_ixol(area, xol_vaddr, &uprobe->arch);
 	return xol_vaddr;
 }
 

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-11 17:24                 ` Oleg Nesterov
@ 2014-04-11 17:38                   ` Oleg Nesterov
  2014-04-11 18:00                     ` David Miller
  2014-04-11 17:50                   ` Linus Torvalds
  1 sibling, 1 reply; 66+ messages in thread
From: Oleg Nesterov @ 2014-04-11 17:38 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/11, Oleg Nesterov wrote:
>
> +static void arch_uprobe_copy_ixol(struct xol_area *area, unsigned long vaddr,
> +				 	struct arch_uprobe *auprobe)
> +{
> +#ifndef ARCH_UPROBE_XXX
> +	copy_to_page(area->page, vaddr, &auprobe->ixol, sizeof(&auprobe->ixol));
> +	/*
> +	 * We probably need flush_icache_user_range() but it needs vma.
> +	 * If this doesn't work define ARCH_UPROBE_XXX.
> +	 */
> +	flush_dcache_page(area->page);
> +#else
> +	struct mm_struct *mm = current->mm;
> +	struct vm_area_struct *vma;
> +
> +	down_read(&mm->mmap_sem);
> +	vma = find_exact_vma(mm, area->vaddr, area->vaddr + PAGE_SIZE);
> +	if (vma) {
> +		void *kaddr = kmap_atomic(area->page);
> +		copy_to_user_page(vma, area->page,
> +					vaddr, kaddr + (vaddr & ~PAGE_MASK),
> +					&auprobe->ixol, sizeof(&auprobe->ixol));
> +		kunmap_atomic(kaddr);
> +	}
> +	up_read(&mm->mmap_sem);
> +#endif

And perhaps the patch is not complete. "if (vma)" is not enough, a probed
task can mmap something else at this vaddr.

copy_to_user_page() should only change the contents of area->page, so memcpy
should be fine. But I am not sure that flush_icache_user_range() or
flush_ptrace_access() is always safe on every arch if "struct page *page"
doesn't match vma.

Oleg.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-11 15:30               ` Russell King - ARM Linux
  2014-04-11 17:24                 ` Oleg Nesterov
@ 2014-04-11 17:43                 ` David Miller
  1 sibling, 0 replies; 66+ messages in thread
From: David Miller @ 2014-04-11 17:43 UTC (permalink / raw)
  To: linux-arm-kernel

From: Russell King - ARM Linux <linux@arm.linux.org.uk>
Date: Fri, 11 Apr 2014 16:30:41 +0100

> | Given that we've already solved that problem, wouldn't it be a good idea
> | if the tracing code would stop trying to reinvent broken solutions to
> | problems we have already solved?
> 
> I fail to see what your problem is with keeping the vma around, and
> using that infrastructure.  If it needs optimisation for uprobes, then
> let's optimise it.  Let's not go inventing a whole new interface
> solving the same problem.

+1

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-11 17:24                 ` Oleg Nesterov
  2014-04-11 17:38                   ` Oleg Nesterov
@ 2014-04-11 17:50                   ` Linus Torvalds
  2014-04-11 18:02                     ` David Miller
  2014-04-14 18:59                     ` Oleg Nesterov
  1 sibling, 2 replies; 66+ messages in thread
From: Linus Torvalds @ 2014-04-11 17:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 11, 2014 at 10:24 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> +static void arch_uprobe_copy_ixol(struct xol_area *area, unsigned long vaddr,
> +                                       struct arch_uprobe *auprobe)
> +{
> +#ifndef ARCH_UPROBE_XXX
> +       copy_to_page(area->page, vaddr, &auprobe->ixol, sizeof(&auprobe->ixol));
> +       /*
> +        * We probably need flush_icache_user_range() but it needs vma.
> +        * If this doesn't work define ARCH_UPROBE_XXX.
> +        */
> +       flush_dcache_page(area->page);
> +#else
> +       struct mm_struct *mm = current->mm;
> +       struct vm_area_struct *vma;
> +
> +       down_read(&mm->mmap_sem);
> +       vma = find_exact_vma(mm, area->vaddr, area->vaddr + PAGE_SIZE);
> +       if (vma) {
> +               void *kaddr = kmap_atomic(area->page);
> +               copy_to_user_page(vma, area->page,
> +                                       vaddr, kaddr + (vaddr & ~PAGE_MASK),
> +                                       &auprobe->ixol, sizeof(&auprobe->ixol));
> +               kunmap_atomic(kaddr);
> +       }
> +       up_read(&mm->mmap_sem);
> +#endif

Yeah, no, this is wrong.

the fact is, the *only* possible use for the whole "vma" argument is
the "can this be executable" optimization, afaik.

So I really think we should just have a fixed
"flush_icache_page(page,vaddr)" function. Maybe add a "len" argument,
to allow architectures that have to loop over cachelines to do just a
minimal loop.

Then, to do the vma optimization, let's introduce a new

    arch_needs_icache_flush(vma, page)

function, which on cache coherent architectures can just be zero (or
one, since the icache flush itself will be a no-op, so it doesn't
really matter), and on others it can do the "have we executed from
this page", which may involve just looking at the vma..

Then the uprobe case can just do

    copy_to_page()
    flush_dcache_page()
    flush_icache_page()

and be done with it.
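
For the xol slot that would look roughly like this (sketch; note the
existing flush_icache_page() takes (vma, page) -- the (page, vaddr)
form below is the interface being proposed here):

	/* Initialize the slot */
	copy_to_page(area->page, xol_vaddr,
			&uprobe->arch.ixol, sizeof(uprobe->arch.ixol));
	flush_dcache_page(area->page);
	/* proposed fixed signature: page + vaddr (+ maybe len), no vma;
	 * a no-op on cache-coherent architectures */
	flush_icache_page(area->page, xol_vaddr);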

Hmm?

                Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-11 17:38                   ` Oleg Nesterov
@ 2014-04-11 18:00                     ` David Miller
  2014-04-11 18:25                       ` Oleg Nesterov
  0 siblings, 1 reply; 66+ messages in thread
From: David Miller @ 2014-04-11 18:00 UTC (permalink / raw)
  To: linux-arm-kernel

From: Oleg Nesterov <oleg@redhat.com>
Date: Fri, 11 Apr 2014 19:38:53 +0200

> And perhaps the patch is not complete. "if (vma)" is not enough, a probed
> task can mmap something else at this vaddr.
> 
> copy_to_user_page() should only change the contents of area->page, so memcpy
> should be fine. But I am not sure that flush_icache_user_range() or
> flush_ptrace_access() is always safe on every arch if "struct page *page"
> doesn't match vma.

The architectures want the VMA for two reasons:

1) To get at the 'mm'.  The 'mm' is absolutely essential so that we can look
   at the MM cpumask and therefore determine what cpus this address space has
   executed upon, and therefore which cpus the flush needs to be broadcast to.

2) To determine if the VMA is executable, in order to avoid the I-cache flush
   if possible.

I think you can get at the 'mm' trivially in this uprobes path, and we can just
as well assume that the VMA is executable since this thing is always writing
instructions.

So we could create a __copy_to_user_page() that takes an 'mm' and a boolean
'executable' which uprobes could unconditionally set true, and copy_to_user_page()
would then be implemented in terms of __copy_to_user_page().
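
Concretely, something like this (a sketch of the proposed interface only;
no such __copy_to_user_page() exists today):

	/* proposed: mm + explicit 'executable' instead of a vma */
	void __copy_to_user_page(struct mm_struct *mm, struct page *page,
				 unsigned long uaddr, void *dst, const void *src,
				 unsigned long len, bool executable);

	#define copy_to_user_page(vma, page, uaddr, dst, src, len)	\
		__copy_to_user_page((vma)->vm_mm, page, uaddr,		\
				    dst, src, len,			\
				    !!((vma)->vm_flags & VM_EXEC))

uprobes would call __copy_to_user_page(current->mm, ...) with executable
unconditionally true, and the arch code can still use mm_cpumask(mm) to
decide which cpus need the broadcast flush.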

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-11 17:50                   ` Linus Torvalds
@ 2014-04-11 18:02                     ` David Miller
  2014-04-11 18:11                       ` Linus Torvalds
  2014-04-11 18:13                       ` Victor Kamensky
  2014-04-14 18:59                     ` Oleg Nesterov
  1 sibling, 2 replies; 66+ messages in thread
From: David Miller @ 2014-04-11 18:02 UTC (permalink / raw)
  To: linux-arm-kernel

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 11 Apr 2014 10:50:31 -0700

> So I really think we should just have a fixed
> "flush_icache_page(page,vaddr)" function. Maybe add a "len" argument,
> to allow architectures that have to loop over cachelines to do just a
> minimal loop.

It's not enough, we need to have the 'mm' so we can know what cpus this
address space has executed upon, and therefore what cpus need the broadcast
flush.

See my other reply, we can just make a __copy_to_user_page() that takes 'mm'
and a boolean 'executable' which uprobes can unconditionally pass as true.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-11 18:02                     ` David Miller
@ 2014-04-11 18:11                       ` Linus Torvalds
  2014-04-11 18:19                         ` David Miller
  2014-04-11 18:13                       ` Victor Kamensky
  1 sibling, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2014-04-11 18:11 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 11, 2014 at 11:02 AM, David Miller <davem@davemloft.net> wrote:
>
> It's not enough, we need to have the 'mm' so we can know what cpus this
> address space has executed upon, and therefore what cpus need the broadcast
> flush.

Ok. But still, it shouldn't need "vma".

> See my other reply, we can just make a __copy_to_user_page() that takes 'mm'
> and a boolean 'executable' which uprobes can unconditionally pass as true.

Sure, that doesn't look disgusting. That said, I thought at least one
architecture (powerpc) did more than just check the executable bit: I
think somebody actually does a per-page "has this been mapped
executably" thing because their icache flush is *so* expensive. So
that boolean "executable" bit is potentially architecture-specific.

And quite frankly, using the "vma->vm_flags" sounds potentially
*incorrect* to me, since it really isn't about the vma. If you change
a page through a non-executable vma, you'd want to flush the icache
entry for that page mapped in a totally different vma. So I really get
the feeling that passing in "vma" is actively *wrong*. The vma
interface really makes little to no sense.

Hmm?

            Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-11 18:02                     ` David Miller
  2014-04-11 18:11                       ` Linus Torvalds
@ 2014-04-11 18:13                       ` Victor Kamensky
  2014-04-11 18:36                         ` Oleg Nesterov
  1 sibling, 1 reply; 66+ messages in thread
From: Victor Kamensky @ 2014-04-11 18:13 UTC (permalink / raw)
  To: linux-arm-kernel

On 11 April 2014 11:02, David Miller <davem@davemloft.net> wrote:
> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Fri, 11 Apr 2014 10:50:31 -0700
>
>> So I really think we should just have a fixed
>> "flush_icache_page(page,vaddr)" function. Maybe add a "len" argument,
>> to allow architectures that have to loop over cachelines to do just a
>> minimal loop.
>
> It's not enough, we need to have the 'mm' so we can know what cpus this
> address space has executed upon, and therefore what cpus need the broadcast
> flush.

But in the uprobes case, the xol slot where the instruction write happened will be
used only by the current CPU. The way I read the uprobes code, another core
that hits the same uprobe address will use a different xol slot. The xol slot
size is a cache line, so it will not be moved around. So as long as we
know for sure that, while the task performs the single step on the uprobe xol
area instruction, it won't be migrated to another core, we don't need to
broadcast to any other cores.

Thanks,
Victor

> See my other reply, we can just make a __copy_to_user_page() that takes 'mm'
> and a boolean 'executable' which uprobes can unconditionally pass as true.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-11 18:11                       ` Linus Torvalds
@ 2014-04-11 18:19                         ` David Miller
  2014-04-11 18:24                           ` Linus Torvalds
  0 siblings, 1 reply; 66+ messages in thread
From: David Miller @ 2014-04-11 18:19 UTC (permalink / raw)
  To: linux-arm-kernel

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 11 Apr 2014 11:11:33 -0700

> And quite frankly, using the "vma->vm_flags" sounds potentially
> *incorrect* to me, since it really isn't about the vma. If you change
> a page through a non-executable vma, you'd want to flush the icache
> entry for that page mapped in a totally different vma. So I really get
> the feeling that passing in "vma" is actively *wrong*. The vma
> interface really makes little to no sense.
> 
> Hmm?

The vm_flags check is about "could it have gotten into the I-cache
via this VMA".

If the VMA protections change, we'd do a flush of some sort during
that change.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-11 18:19                         ` David Miller
@ 2014-04-11 18:24                           ` Linus Torvalds
  2014-04-11 18:58                             ` David Miller
  0 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2014-04-11 18:24 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 11, 2014 at 11:19 AM, David Miller <davem@davemloft.net> wrote:
>
> The vm_flags check is about "could it have gotten into the I-cache
> via this VMA".

.. and that's obviously complete bullshit and wrong. Which is my point.

Now, it's possible that doing things right is just too much work for
architectures that don't even matter, but dammit, it's still wrong. If
you change a page, and it's executably mapped into some other vma, the
icache is possibly stale there. The whole _point_ of our cache
flushing is to make caches coherent, and anything that uses "vma" to
do so is *wrong*.

So your argument makes no sense. You're just re-stating that "it's
wrong", but you're re-stating it in a way that makes it sounds like it
could be right.

The "this page has been mapped executably" approach, in contrast, is
*correct*. It has a chance in hell of actually making caches coherent.

             Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-11 18:00                     ` David Miller
@ 2014-04-11 18:25                       ` Oleg Nesterov
  0 siblings, 0 replies; 66+ messages in thread
From: Oleg Nesterov @ 2014-04-11 18:25 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/11, David Miller wrote:
>
> From: Oleg Nesterov <oleg@redhat.com>
> Date: Fri, 11 Apr 2014 19:38:53 +0200
>
> > And perhaps the patch is not complete. "if (vma)" is not enough, a probed
> > task can mmap something else at this vaddr.
> >
> > copy_to_user_page() should only change the contents of area->page, so memcpy
> > should be fine. But I am not sure that flush_icache_user_range() or
> > flush_ptrace_access() is always safe on every arch if "struct page *page"
> > doesn't match vma.
>
> The architectures want the VMA for two reasons:
>
> 1) To get at the 'mm'.  The 'mm' is absolutely essential so that we can look
>    at the MM cpumask and therefore determine what cpus this address space has
>    executed upon, and therefore which cpus the flush needs to be broadcast to.
>
> 2) To determine if the VMA is executable, in order to avoid the I-cache flush
>    if possible.

Yes, thanks, this is clear.

> I think you can get at the 'mm' trivially in this uprobes path,

sure, it is always current->mm.

> and we can just
> as well assume that the VMA is executable since this thing is always writing
> instructions.

yes.

> So we could create a __copy_to_user_page() that takes an 'mm' and a boolean
> 'executable' which uprobes could unconditionally set true, and copy_to_user_page()
> would then be implemented in terms of __copy_to_user_page().

This needs a lot of per-arch changes. Plus, it seems, in general VM_EXEC
is not the only thing __copy_to_user_page() should take into account...

But at least we are starting to agree that we need something else ;)

Oleg.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-11 18:13                       ` Victor Kamensky
@ 2014-04-11 18:36                         ` Oleg Nesterov
  0 siblings, 0 replies; 66+ messages in thread
From: Oleg Nesterov @ 2014-04-11 18:36 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/11, Victor Kamensky wrote:
>
> On 11 April 2014 11:02, David Miller <davem@davemloft.net> wrote:
> But in the uprobes case, the xol slot where the instruction write happened will be
> used only by the current CPU. The way I read the uprobes code, another core
> that hits the same uprobe address will use a different xol slot. The xol slot
> size is a cache line, so it will not be moved around.

Yes.

> So as long as we
> know for sure that, while the task performs the single step on the uprobe xol
> area instruction, it won't be migrated to another core, we don't need to
> broadcast to any other cores.

It can migrate to another CPU before it does single-step.

Oleg.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-11 16:00                 ` Russell King - ARM Linux
@ 2014-04-11 18:39                   ` Peter Zijlstra
  0 siblings, 0 replies; 66+ messages in thread
From: Peter Zijlstra @ 2014-04-11 18:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 11, 2014 at 05:00:29PM +0100, Russell King - ARM Linux wrote:
> On Fri, Apr 11, 2014 at 05:32:49PM +0200, Peter Zijlstra wrote:
> > On Fri, Apr 11, 2014 at 05:22:07PM +0200, Oleg Nesterov wrote:
> > > And I am just curious, why arm's copy_to_user_page() disables preemption
> > > before memcpy?
> > 
> > Without looking, I suspect it's because the VIVT caches, they need to get
> > shot down on every context switch.
> 
> So... let's think about that for a moment... if we have a preemption event,
> then that's a context switch, which means...
> 
> No, this is obviously not the reason, because such an event on a fully
> VIVT system would result in the caches being flushed, meaning that we
> wouldn't need to do anything if we could be predictably preempted at that
> point.

Yeah; I've since realized I was completely wrong about that. Thanks for
explaining though.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-11 18:24                           ` Linus Torvalds
@ 2014-04-11 18:58                             ` David Miller
  2014-04-11 19:24                               ` Linus Torvalds
  0 siblings, 1 reply; 66+ messages in thread
From: David Miller @ 2014-04-11 18:58 UTC (permalink / raw)
  To: linux-arm-kernel

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 11 Apr 2014 11:24:58 -0700

> On Fri, Apr 11, 2014 at 11:19 AM, David Miller <davem@davemloft.net> wrote:
>>
>> The vm_flags check is about "could it have gotten into the I-cache
>> via this VMA".
> 
> .. and that's obviously complete bullshit and wrong. Which is my point.
> 
> Now, it's possible that doing things right is just too much work for
> architectures that don't even matter, but dammit, it's still wrong. If
> you change a page, and it's executably mapped into some other vma, the
> icache is possibly stale there. The whole _point_ of our cache
> flushing is to make caches coherent, and anything that uses "vma" to
> do so is *wrong*.
> 
> So your argument makes no sense. You're just re-stating that "it's
> wrong", but you're re-stating it in a way that makes it sounds like it
> could be right.
> 
> The "this page has been mapped executably" approach, in contrast, is
> *correct*. It has a chance in hell of actually making caches coherent.

You're right that using VMA as a hint during ptrace accesses is bogus.
If it's writeable, shared, and executable, we won't do the right thing.

Since we do most of the cache flushing stuff during normal operations
at the PTE modification point, perhaps a piece of page state could be
used to handle this.  We already use such a thing for D-cache alias
flushing.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-11 18:58                             ` David Miller
@ 2014-04-11 19:24                               ` Linus Torvalds
  0 siblings, 0 replies; 66+ messages in thread
From: Linus Torvalds @ 2014-04-11 19:24 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 11, 2014 at 11:58 AM, David Miller <davem@davemloft.net> wrote:
>
> Since we do most of the cache flushing stuff during normal operations
> at the PTE modification point, perhaps a piece of page state could be
> used to handle this.  We already use such a thing for D-cache alias
> flushing.

So looking at the powerpc code, I thought ppc already did this, but it
seems to do something different: it lazily does the icache flush at
page fault time if the page has been marked by dcache flush (with the
PG_arch_1 bit indicating whether the page is coherent in the I$).
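
The scheme being referred to is roughly this (a paraphrased sketch of the
idea, not the actual powerpc code; the bit name and its polarity vary by
architecture):

	/* flush_dcache_page(): defer the sync, just record that the
	 * page is no longer I/D coherent */
	clear_bit(PG_arch_1, &page->flags);

	/* update_mmu_cache(), i.e. page fault time, when a PTE for
	 * the page is installed: */
	if ((vma->vm_flags & VM_EXEC) &&
	    !test_and_set_bit(PG_arch_1, &page->flags))
		flush_dcache_icache_page(page);	/* sync I$ for this page */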

But I don't see it trying to actually flush the icache of already
mapped processes when modifying the dcache.

So while we *could* do that, apparently no architecture does this.
Even the one architecture that I thought did it doesn't really try to
make things globally coherent.

(My "analysis" was mainly using "git grep", so maybe I missed something).

           Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-11 17:50                   ` Linus Torvalds
  2014-04-11 18:02                     ` David Miller
@ 2014-04-14 18:59                     ` Oleg Nesterov
  2014-04-14 20:05                       ` Victor Kamensky
  1 sibling, 1 reply; 66+ messages in thread
From: Oleg Nesterov @ 2014-04-14 18:59 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/11, Linus Torvalds wrote:
>
> On Fri, Apr 11, 2014 at 10:24 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> > +static void arch_uprobe_copy_ixol(struct xol_area *area, unsigned long vaddr,
> > +                                       struct arch_uprobe *auprobe)
> > +{
> > +#ifndef ARCH_UPROBE_XXX
> > +       copy_to_page(area->page, vaddr, &auprobe->ixol, sizeof(&auprobe->ixol));
> > +       /*
> > +        * We probably need flush_icache_user_range() but it needs vma.
> > +        * If this doesn't work define ARCH_UPROBE_XXX.
> > +        */
> > +       flush_dcache_page(area->page);
> > +#else
> > +       struct mm_struct *mm = current->mm;
> > +       struct vm_area_struct *vma;
> > +
> > +       down_read(&mm->mmap_sem);
> > +       vma = find_exact_vma(mm, area->vaddr, area->vaddr + PAGE_SIZE);
> > +       if (vma) {
> > +               void *kaddr = kmap_atomic(area->page);
> > +               copy_to_user_page(vma, area->page,
> > +                                       vaddr, kaddr + (vaddr & ~PAGE_MASK),
> > +                                       &auprobe->ixol, sizeof(&auprobe->ixol));
> > +               kunmap_atomic(kaddr);
> > +       }
> > +       up_read(&mm->mmap_sem);
> > +#endif
>
> Yeah, no, this is wrong.

Yesss, agreed.

> So I really think we should just have a fixed
> "flush_icache_page(page,vaddr)" function.
> ...
> Then the uprobe case can just do
>
>     copy_to_page()
>     flush_dcache_page()
>     flush_icache_page()


And I obviously like this idea because (iiuc) it more or less matches
flush_icache_page_xxx() I tried to suggest.

But we need a short term solution for arm. And unless I misunderstood
Russell (this is quite possible), arm needs to disable preemption around
copy + flush.

Russell, so what do you think we can do for arm right now? Does the patch
above (and subsequent discussion) answer the "why reinvent" question ?

Oleg.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-14 18:59                     ` Oleg Nesterov
@ 2014-04-14 20:05                       ` Victor Kamensky
  2014-04-14 21:40                         ` Victor Kamensky
  2014-04-15 15:46                         ` Oleg Nesterov
  0 siblings, 2 replies; 66+ messages in thread
From: Victor Kamensky @ 2014-04-14 20:05 UTC (permalink / raw)
  To: linux-arm-kernel

On 14 April 2014 11:59, Oleg Nesterov <oleg@redhat.com> wrote:
> On 04/11, Linus Torvalds wrote:
>>
>> On Fri, Apr 11, 2014 at 10:24 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>> > +static void arch_uprobe_copy_ixol(struct xol_area *area, unsigned long vaddr,
>> > +                                       struct arch_uprobe *auprobe)
>> > +{
>> > +#ifndef ARCH_UPROBE_XXX
>> > +       copy_to_page(area->page, vaddr, &auprobe->ixol, sizeof(&auprobe->ixol));
>> > +       /*
>> > +        * We probably need flush_icache_user_range() but it needs vma.
>> > +        * If this doesn't work define ARCH_UPROBE_XXX.
>> > +        */
>> > +       flush_dcache_page(area->page);
>> > +#else
>> > +       struct mm_struct *mm = current->mm;
>> > +       struct vm_area_struct *vma;
>> > +
>> > +       down_read(&mm->mmap_sem);
>> > +       vma = find_exact_vma(mm, area->vaddr, area->vaddr + PAGE_SIZE);
>> > +       if (vma) {
>> > +               void *kaddr = kmap_atomic(area->page);
>> > +               copy_to_user_page(vma, area->page,
>> > +                                       vaddr, kaddr + (vaddr & ~PAGE_MASK),
>> > +                                       &auprobe->ixol, sizeof(&auprobe->ixol));
>> > +               kunmap_atomic(kaddr);
>> > +       }
>> > +       up_read(&mm->mmap_sem);
>> > +#endif
>>
>> Yeah, no, this is wrong.
>
> Yesss, agreed.
>
>> So I really think we should just have a fixed
>> "flush_icache_page(page,vaddr)" function.
>> ...
>> Then the uprobe case can just do
>>
>>     copy_to_page()
>>     flush_dcache_page()
>>     flush_icache_page()
>
>
> And I obviously like this idea because (iiuc) it more or less matches
> flush_icache_page_xxx() I tried to suggest.

Would not page granularity be too expensive? Note you need to do that on
each probe hit, and you are flushing the whole data and instruction page every
time. IMHO it will work correctly if you flush just the few dcache/icache lines
that correspond to the xol slot that got modified. Note copy_to_user_page takes
a len that describes the size of the area that has to be flushed. Given that we
are flushing an xol area page in this case, and nothing except one xol slot is
of any interest to the current task - if the CPU can flush one dcache and
icache page as quickly as it can flush a range, my remark may not matter.

I personally would prefer if we could have a function like copy_to_user_page,
but without the requirement to pass a vma to it.

Thanks,
Victor

> But we need a short term solution for arm. And unless I misunderstood
> Russell (this is quite possible), arm needs to disable preemption around
> copy + flush.
>
> Russell, so what do you think we can do for arm right now? Does the patch
> above (and subsequent discussion) answer the "why reinvent" question ?
>
> Oleg.
>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-14 20:05                       ` Victor Kamensky
@ 2014-04-14 21:40                         ` Victor Kamensky
  2014-04-15 16:26                           ` Oleg Nesterov
  2014-04-15 15:46                         ` Oleg Nesterov
  1 sibling, 1 reply; 66+ messages in thread
From: Victor Kamensky @ 2014-04-14 21:40 UTC (permalink / raw)
  To: linux-arm-kernel

On 14 April 2014 13:05, Victor Kamensky <victor.kamensky@linaro.org> wrote:
> On 14 April 2014 11:59, Oleg Nesterov <oleg@redhat.com> wrote:
>> On 04/11, Linus Torvalds wrote:
>>>
>>> On Fri, Apr 11, 2014 at 10:24 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>>> > +static void arch_uprobe_copy_ixol(struct xol_area *area, unsigned long vaddr,
>>> > +                                       struct arch_uprobe *auprobe)
>>> > +{
>>> > +#ifndef ARCH_UPROBE_XXX
>>> > +       copy_to_page(area->page, vaddr, &auprobe->ixol, sizeof(&auprobe->ixol));
>>> > +       /*
>>> > +        * We probably need flush_icache_user_range() but it needs vma.
>>> > +        * If this doesn't work define ARCH_UPROBE_XXX.
>>> > +        */
>>> > +       flush_dcache_page(area->page);
>>> > +#else
>>> > +       struct mm_struct *mm = current->mm;
>>> > +       struct vm_area_struct *vma;
>>> > +
>>> > +       down_read(&mm->mmap_sem);
>>> > +       vma = find_exact_vma(mm, area->vaddr, area->vaddr + PAGE_SIZE);
>>> > +       if (vma) {
>>> > +               void *kaddr = kmap_atomic(area->page);
>>> > +               copy_to_user_page(vma, area->page,
>>> > +                                       vaddr, kaddr + (vaddr & ~PAGE_MASK),
>>> > +                                       &auprobe->ixol, sizeof(&auprobe->ixol));
>>> > +               kunmap_atomic(kaddr);
>>> > +       }
>>> > +       up_read(&mm->mmap_sem);
>>> > +#endif
>>>
>>> Yeah, no, this is wrong.
>>
>> Yesss, agreed.
>>
>>> So I really think we should just have a fixed
>>> "flush_icache_page(page,vaddr)" function.
>>> ...
>>> Then the uprobe case can just do
>>>
>>>     copy_to_page()
>>>     flush_dcache_page()
>>>     flush_icache_page()
>>
>>
>> And I obviously like this idea because (iiuc) it more or less matches
>> flush_icache_page_xxx() I tried to suggest.
>
> Would not page granularity be too expensive? Note you need to do that on
> each probe hit, and you are flushing the whole data and instruction page every
> time. IMHO it will work correctly if you flush just the few dcache/icache lines
> that correspond to the xol slot that got modified. Note copy_to_user_page takes
> a len that describes the size of the area that has to be flushed. Given that we
> are flushing an xol area page in this case, and nothing except one xol slot is
> of any interest to the current task - if the CPU can flush one dcache and
> icache page as quickly as it can flush a range, my remark may not matter.
>
> I personally would prefer if we could have a function like copy_to_user_page,
> but without the requirement to pass a vma to it.

I have been collecting some experimental data around this
discussion. I did not find anything particularly surprising, and
I am not sure how much it will matter, but since I collected it
already I will share it anyway. The results cover only one
architecture, so they should be taken with a grain of salt.

Tests were conducted on ARM hardware: an Arndale board with a
dual-core Exynos 5250 and a Pandaboard ES with an OMAP 4460.

The uprobes/SystemTap test was arranged in the following way.
A SystemTap module counted the number of times a function was
called. The SystemTap action is very simple, a counter increment,
which is close to a no-op and therefore exposes the tracing overhead.

The traced user-land function executed approximately 8000 instructions
(an unoptimized empty loop of 1000 iterations, with each
iteration taking 8 instructions). That function was repeatedly
called in a loop 1 million times, and that interval was timed.
SystemTap/uprobes tracing was then enabled and the change in the
traced user-land execution time was observed.
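
For reference, the traced function and timing loop were shaped roughly
like this (an illustrative sketch only, not the actual test source; all
names here are made up):

/* sketch of the traced workload; build without optimization */
#include <stdio.h>
#include <time.h>

/* ~8000 instructions: unoptimized empty loop of 1000 iterations,
 * roughly 8 instructions per iteration; the uprobe is placed here */
static void __attribute__((noinline)) traced_func(void)
{
	volatile int i;

	for (i = 0; i < 1000; i++)
		;
}

int main(void)
{
	struct timespec t0, t1;
	long n;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (n = 0; n < 1000000; n++)	/* 1 million calls, interval is timed */
		traced_func();
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("%.1f us per call\n",
	       ((t1.tv_sec - t0.tv_sec) * 1e9 +
	        (t1.tv_nsec - t0.tv_nsec)) / 1e3 / 1000000.0);
	return 0;
}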

Test scenarios and variations
-----------------------------

Here are the scenarios in which measurements took place:

vanilla - no tracing; 1 million calls of the function that executes
8000 instructions

Oleg's fix - the fix Oleg proposed in [1]. Basically it uses
copy_to_user_page and does a dynamic look-up of the xol area vma
on every trace

my ARM-specific fix - this one was proposed as [2]. It is
close to the discussed possible solution in which we would have
something similar to copy_to_user_page but that does not
require a vma. My code shares the ARM backend of
copy_to_user_page with the xol access flush function

Oleg's fix + forced broadcast - one of the concerns I had is the
situation where an SMP cross-call broadcast has to happen to flush
the icache. On both of my boards that was not needed, so to
simulate such a situation I changed the ARM backend of
copy_to_user_page to always call smp_call_function(flush_ptrace_access_other, ...)


The test application had two possible dimensions:

1) number of threads that run the loop over the traced function, to
see how tracing copes with multicore; the default is a single
thread, but the test could run another loop on the second core.

2) number of mappings in the target process; the target process could
map 1000 files to create a bunch of vmas. This is to test
how much the dynamic look-up of the xol area vma matters.

Results
-------

The numbers shown in the table are times in microseconds to execute
the tested function, with and without tracing present.

Please note that the tracing overhead includes all the pieces related
to tracing, not only the cache flush that is under discussion. Those
pieces are: taking the arch exception; the uprobes layer; the uprobes
arch-specific layer (before/after); xol look-up, update and cache
flush; the uprobes single-stepping logic; the SystemTap module callback
generated for the .stp tracing script; etc.


                       Arndale       Pandaboard ES

vanilla              5.0 (100%)         11.5 (100%)

Oleg's fix          9.8 (196%)         28.1 (244%)

Oleg's fix         10.0 (200%)         28.7 (250%)
+ 1000 mappings

Victor's fix        9.4 (188%)         26.4 (230%)

Oleg's fix         13.7 (274%)         39.8 (346%)
+ broadcast
1 thread

Oleg's fix         14.1 (282%)         41.6 (361%)
+ broadcast
2 threads


Observations
------------

x) generally, uprobes tracing is fairly expensive: one trace
roughly costs around 10000 instructions/cycles

x) the way the cache is flushed matters somewhat, but
because of the large overall tracing overhead those differences
may not matter much

x) comparing 'Oleg's fix' with 'Oleg's fix + 1000 mappings'
shows that the vma look-up is noticeable but the difference is
marginal. No surprise here: the rb-tree search is fast.

x) the need to broadcast the icache flush has the most noticeable
impact. I am not sure how essential that is. Neither of the tested
platforms (Exynos 5250 and OMAP 4460) needed that operation, and I
am not sure which CPUs would really have this issue ...

x) the fix I did for ARM, which shares ARM code with
copy_to_user_page but does not need a vma, performs best.
Essentially it differs from 'Oleg's fix' in that there is no vma
lookup at all. But I have to admit the resulting code looks a bit
ugly. I wonder whether the gain matters enough ... maybe
not.


Dynamic xol slots vs cached to uprobe xol slots
-----------------------------------------------

This section may be a bit off topic, but it introduces an
interesting data point.

When I looked at the current uprobes single-step out-of-line code
and compared it with the code from the past (utrace
times), I noticed the main essential difference is how xol slots
are handled: currently, for each hit, the uprobes code allocates an
xol slot and needs a dcache/icache flush. But in the past the
xol slot was attached/cached to the uprobe entry, so if there
were enough xol slots the dcache/icache flush would happen
only once and later traces would not need it. For cases
where there were not enough xol slots, an LRU algorithm was
used to rotate xol slots among uprobes.

The previous uprobes mechanism was more immune to the cost
of modifying the instruction stream, because modifying the
instruction stream was a one-time operation and after that,
under normal circumstances, traced apps did not touch the
instructions at all. I am quite sure that the semi-static
xol slot allocation scheme had its own set of issues.

I tried to hack together a cached xol slot scheme and measure
the time difference it brings.

                         Arndale

hack static           8.9 (188%)
xol slot

The 8.9 microseconds gives an idea of all the other overheads
during uprobes tracing except xol allocation and the
icache/dcache flush. I.e. the cost of dynamically allocating the
xol slot, the dcache/icache flush, and the impact of the cache
flush on the application is between roughly 0.5 and 1.1
microseconds (9.4 and 10.0 minus 8.9), as long as no
cache-operation broadcast is involved. That cost is not that big,
so on a modern CPU that does not need cache flush broadcasts, the
dynamic xol scheme looks OK to me.

Raw Results and Test Source Code
--------------------------------

I am not publishing my test source code and raw results
here because they are quite big. The raw results were
collected with the target test running under perf, so
it can be seen how the different schemes affect cache
and TLB misses. If anyone is interested in the source and
raw data, please let me know and I will post it here.

Thanks,
Victor

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/246595.html

[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/245743.html

> Thanks,
> Victor
>
>> But we need a short term solution for arm. And unless I misunderstood
>> Russell (this is quite possible), arm needs to disable preemption around
>> copy + flush.
>>
>> Russel, so what do you think we can do for arm right now? Does the patch
>> above (and subsequent discussion) answer the "why reinvent" question ?
>>
>> Oleg.
>>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-14 20:05                       ` Victor Kamensky
  2014-04-14 21:40                         ` Victor Kamensky
@ 2014-04-15 15:46                         ` Oleg Nesterov
  2014-04-15 16:46                           ` Victor Kamensky
  2014-04-15 17:19                           ` David Long
  1 sibling, 2 replies; 66+ messages in thread
From: Oleg Nesterov @ 2014-04-15 15:46 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/14, Victor Kamensky wrote:
>
> On 14 April 2014 11:59, Oleg Nesterov <oleg@redhat.com> wrote:
> > On 04/11, Linus Torvalds wrote:
> >>
> >> So I really think we should just have a fixed
> >> "flush_icache_page(page,vaddr)" function.
> >> ...
> >> Then the uprobe case can just do
> >>
> >>     copy_to_page()
> >>     flush_dcache_page()
> >>     flush_icache_page()
> >
> >
> > And I obviously like this idea because (iiuc) it more or less matches
> > flush_icache_page_xxx() I tried to suggest.
>
> Would not page granularity to be too expensive? Note you need to do that on
> each probe hit and you flushing whole data and instruction page every time.
> IMHO it will work correctly when you flush just few dcache/icache lines that
> correspond to xol slot that got modified. Note copy_to_user_page takes
> len that describes size of area that has to be flushed. Given that we are
> flushing xol area page at this case; and nothing except one xol slot is
> any interest for current task, and if CPU can flush one dcache and icache
> page as quickly as it can flush range, my remark may not matter.

We can add "vaddr, len" to the argument list.

> I personally would prefer if we could have function like copy_to_user_page
> but without requirement to pass vma to it.

I won't argue, but you need to convince maintainers.


And as a reminder, we need something simple/nonintrusive for arm right now.
Again, I won't argue if we turn copy_to_page() + flush_dcache_page() into
__weak arch_uprobe_copy_ixol(), and add the necessary hacks into arm's
implementation. This is up to you and Russell.
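
Roughly, the shape I mean is the following (sketch only, not a patch):

/* kernel/events/uprobes.c - default, overridable per arch */
void __weak arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
				  void *src, unsigned long len)
{
	copy_to_page(page, vaddr, src, len);
	flush_dcache_page(page);
}

and in xol_get_insn_slot():

	arch_uprobe_copy_ixol(area->page, xol_vaddr,
			      &uprobe->arch.ixol, sizeof(uprobe->arch.ixol));

with arm then providing its own arch_uprobe_copy_ixol() that does the
dcache/icache maintenance it needs for just the slot range.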



But. Please do not add copy_to_user_page() into copy_to_page() (as your patch
did). This is certainly not what uprobe_write_opcode() wants, we do not want
or need "flush" in this case. The same for __create_xol_area().

Note also that currently copy_to_user_page() is always called under mmap_sem.
I do not know if arm actually needs ->mmap_sem, but if you propose it as a
generic solution we should probably take this lock.

Also. Even if we have copy_to_user_page_no_vma() or change copy_to_user_page()
to accept vma => NULL, I am not sure this will work fine on arm when the probed
application unmaps xol_area and mmaps something else at the same vaddr. I mean,
in this case we do not care if the application crashes, but please verify that
something really bad can't happen.

Let me repeat just in case, I know nothing about arm/. I can't even understand
how, say, flush_pfn_alias() works, and how it should work if 2 threads call it
at the same time with the same vaddr (or CACHE_COLOUR(vaddr)).

Oleg.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-14 21:40                         ` Victor Kamensky
@ 2014-04-15 16:26                           ` Oleg Nesterov
  0 siblings, 0 replies; 66+ messages in thread
From: Oleg Nesterov @ 2014-04-15 16:26 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/14, Victor Kamensky wrote:
>
> Oleg's fix - Oleg's fix proposed on [1]. Basically it uses
> copy_to_user_page and it does dynamic look-up of xol area vma
> every tracing

I guess I was not clear. No, I didn't really try to propose this change,
I do not like it ;)

I showed this hack in reply to multiple and persistent requests to reuse
the ptrace solution we already have.

> my arm specific fix - this one was proposed on as [2].

I didn't even try to read the changes in arm/, I can't understand them
anyway. I leave this to you and Russell.

But, once again, please do not add arch_uprobe_flush_xol_access(), add
arch_uprobe_copy_ixol().

> x) fix that I did for ARM that shares ARM code with
> copy_to_user_page but does not need vma performs best.

The patch which adds copy_to_user_page(vma => NULL) into copy_to_page() ?
Please see the comments in my previous email.

> When I looked at current uprobes single step out line code
> and compared it with code that was in the past (utrace
> times) I noticed main essential difference how xol slots
> are handled: Currently for each hit uprobes code allocate
> xol slot and needs dcache/icache flush. But in the past
> xol slot was attached/cached to uprobe entry

Can't comment, I am not familiar with the old implementation.

But yes, the current implementation is not perfect. Once again, it would
be nice to remove this vma. Even if this is not possible, we can try to
share this memory. We do not even need lru, we can make it "per cpu" and
avoid the broadcasts. On x86 this is simple, we have __switch_to_xtra()
which can re-copy ->ixol[] and do flush_icache_range() if UTASK_SSTEP.
Not sure this is possible on arm and other arches. But let's not discuss
this right now, this is a bit off-topic currently.

Oleg.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-15 15:46                         ` Oleg Nesterov
@ 2014-04-15 16:46                           ` Victor Kamensky
  2014-04-15 17:19                           ` David Long
  1 sibling, 0 replies; 66+ messages in thread
From: Victor Kamensky @ 2014-04-15 16:46 UTC (permalink / raw)
  To: linux-arm-kernel

On 15 April 2014 08:46, Oleg Nesterov <oleg@redhat.com> wrote:
> On 04/14, Victor Kamensky wrote:
>>
>> On 14 April 2014 11:59, Oleg Nesterov <oleg@redhat.com> wrote:
>> > On 04/11, Linus Torvalds wrote:
>> >>
>> >> So I really think we should just have a fixed
>> >> "flush_icache_page(page,vaddr)" function.
>> >> ...
>> >> Then the uprobe case can just do
>> >>
>> >>     copy_to_page()
>> >>     flush_dcache_page()
>> >>     flush_icache_page()
>> >
>> >
>> > And I obviously like this idea because (iiuc) it more or less matches
>> > flush_icache_page_xxx() I tried to suggest.
>>
>> Would not page granularity to be too expensive? Note you need to do that on
>> each probe hit and you flushing whole data and instruction page every time.
>> IMHO it will work correctly when you flush just few dcache/icache lines that
>> correspond to xol slot that got modified. Note copy_to_user_page takes
>> len that describes size of area that has to be flushed. Given that we are
>> flushing xol area page at this case; and nothing except one xol slot is
>> any interest for current task, and if CPU can flush one dcache and icache
>> page as quickly as it can flush range, my remark may not matter.
>
> We can add "vaddr, len" to the argument list.
>
>> I personally would prefer if we could have function like copy_to_user_page
>> but without requirement to pass vma to it.
>
> I won't argue, but you need to convince maintainers.
>
>
> And to remind, we need something simple/nonintrusive for arm right now.
> Again, I won't argue if we turn copy_to_page() + flush_dcache_page() into
> __weak arch_uprobe_copy_ixol(), and add the necessary hacks into arm's
> implementatiion. This is up to you and Russel.

For the short-term ARM-specific solution I will follow up on [1]. Yes, I
will incorporate your request to provide arch_uprobe_copy_ixol() instead
of arch_uprobe_flush_xol_access, will address Dave Long's comments about
checkpatch, and will remove the special handling for the broadcast
situation (FLAG_UA_BROADCAST), since in further discussion it was
established that a task can migrate while doing uprobes xol single-stepping.

I don't think my patch does the things you describe below. Anyway, I will
repost a new version of the short-term ARM-specific fix proposal today PST
and we will see what Russell, you and everyone else say about it.

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/245952.html
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/245743.html

Thanks,
Victor

>
>
> But. Please do not add copy_to_user_page() into copy_to_page() (as your patch
> did). This is certainly not what uprobe_write_opcode() wants, we do not want
> or need "flush" in this case. The same for __create_xol_area().
>
> Note also that currently copy_to_user_page() is always called under mmap_sem.
> I do not know if arm actually needs ->mmap_sem, but if you propose it as a
> generic solution we should probably take this lock.
>
> Also. Even if we have copy_to_user_page_no_vma() or change copy_to_user_page()
> to accept vma => NULL, I am not sure this will work fine on arm when the probed
> application unmaps xol_area and mmaps something else at the same vaddr. I mean,
> in this case we do not care if the application crashes, but please verify that
> something really bad can't happen.
>
> Let me repeat just in case, I know nothing about arm/. I can't even understand
> how, say, flush_pfn_alias() works, and how it should work if 2 threads call it
> at the same time with the same vaddr (or CACHE_COLOUR(vaddr)).
>
> Oleg.
>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-15 15:46                         ` Oleg Nesterov
  2014-04-15 16:46                           ` Victor Kamensky
@ 2014-04-15 17:19                           ` David Long
  2014-04-15 17:38                             ` David Miller
  2014-04-15 17:43                             ` Oleg Nesterov
  1 sibling, 2 replies; 66+ messages in thread
From: David Long @ 2014-04-15 17:19 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/15/14 11:46, Oleg Nesterov wrote:
> 
> But. Please do not add copy_to_user_page() into copy_to_page() (as your patch
> did). This is certainly not what uprobe_write_opcode() wants, we do not want
> or need "flush" in this case. The same for __create_xol_area().
> 

It looked to me like a call to a new __copy_to_user_page(current->mm, ...) in xol_get_insn_slot()
would be in line with David Miller's suggestion and would cure the problem on ARM (and hopefully
be more philosophically correct for all architectures):


diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 04709b6..b418626 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1287,6 +1287,7 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
 {
        struct xol_area *area;
        unsigned long xol_vaddr;
+       void *kaddr;
 
        area = get_xol_area();
        if (!area)
@@ -1297,13 +1298,11 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
                return 0;
 
        /* Initialize the slot */
-       copy_to_page(area->page, xol_vaddr,
-                       &uprobe->arch.ixol, sizeof(uprobe->arch.ixol));
-       /*
-        * We probably need flush_icache_user_range() but it needs vma.
-        * This should work on supported architectures too.
-        */
-       flush_dcache_page(area->page);
+       kaddr = kmap_atomic(area->page);
+       __copy_to_user_page(current->mm, area->page, xol_vaddr,
+                       kaddr + (xol_vaddr & ~PAGE_MASK),
+                       &uprobe->arch.ixol, sizeof(uprobe->arch.ixol), true);
+       kunmap_atomic(kaddr);
 
        return xol_vaddr;
 }


Opinions?  It's possible this approach isn't good enough.  Cache operations and VM
are not my strong suit.

-dl

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-15 17:19                           ` David Long
@ 2014-04-15 17:38                             ` David Miller
  2014-04-15 17:49                               ` Oleg Nesterov
  2014-04-15 17:43                             ` Oleg Nesterov
  1 sibling, 1 reply; 66+ messages in thread
From: David Miller @ 2014-04-15 17:38 UTC (permalink / raw)
  To: linux-arm-kernel

From: David Long <dave.long@linaro.org>
Date: Tue, 15 Apr 2014 13:19:27 -0400

> On 04/15/14 11:46, Oleg Nesterov wrote:
>> 
>> But. Please do not add copy_to_user_page() into copy_to_page() (as your patch
>> did). This is certainly not what uprobe_write_opcode() wants, we do not want
>> or need "flush" in this case. The same for __create_xol_area().
>> 
> 
> It looked me like a call to a new __copy_to_user_page(current->mm, ...) in xol_get_insn_slot()
> would be in line with David Miller's suggestion and would cure the problem on ARM (and hopefuly
> be more philosophically correct for all architectures):

It occurs to me that because of the specific environment in which this
executes we can make the interface take advantage of the invariants at
this call site:

1) We are copying into userspace and thus current->mm

2) We are only storing an instruction or two

So it would be just like the dynamic linker lazy resolving a PLT slot in
userland.

Furthermore, we can do the stores using something akin to put_user(),
directly store them into userspace.

This avoids completely any D-cache aliasing issues.  The kernel stores
to the same address, and therefore the same cache lines, as userspace
would.

So for example, since even on the most braindamaged sparc64 cpus the
remote cpus will have their I-cache snoop the store we do locally, we
just need to do a simple flush on the local cpu where the store is
happening.  So I'd like to do something like this:

	stw	%r1, [%r2 + 0x0]
	flush	%r2

PowerPC could probably do something similar.

The copy_to_user_page() interface has to deal with storing into a non-current
'mm', and therefore via the kernel-side copy of the page, and that's why the
D-cache aliasing issues are so painful to deal with.

Just a thought...

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-15 17:19                           ` David Long
  2014-04-15 17:38                             ` David Miller
@ 2014-04-15 17:43                             ` Oleg Nesterov
  2014-04-15 17:46                               ` David Miller
  2014-04-15 19:39                               ` David Long
  1 sibling, 2 replies; 66+ messages in thread
From: Oleg Nesterov @ 2014-04-15 17:43 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/15, David Long wrote:
>
> On 04/15/14 11:46, Oleg Nesterov wrote:
> >
> > But. Please do not add copy_to_user_page() into copy_to_page() (as your patch
> > did). This is certainly not what uprobe_write_opcode() wants, we do not want
> > or need "flush" in this case. The same for __create_xol_area().
> >
>
> It looked me like a call to a new __copy_to_user_page(current->mm, ...) in xol_get_insn_slot()
> would be in line with David Miller's suggestion and would cure the problem on ARM (and hopefuly
> be more philosophically correct for all architectures):

David, let me repeat. I am not going to argue even if I obviously like
the Linus's suggestion more.

I only argued with the suggestions to follow the __access_remote_vm()
logic.

> @@ -1297,13 +1298,11 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
>                 return 0;
>
>         /* Initialize the slot */
> -       copy_to_page(area->page, xol_vaddr,
> -                       &uprobe->arch.ixol, sizeof(uprobe->arch.ixol));
> -       /*
> -        * We probably need flush_icache_user_range() but it needs vma.
> -        * This should work on supported architectures too.
> -        */
> -       flush_dcache_page(area->page);
> +       kaddr = kmap_atomic(area->page);
> +       __copy_to_user_page(current->mm, area->page, xol_vaddr,
> +                       kaddr + (xol_vaddr & ~PAGE_MASK),
> +                       &uprobe->arch.ixol, sizeof(uprobe->arch.ixol), true);
> +       kunmap_atomic(kaddr);

This all is fine, but you need to introduce __copy_to_user_page() first.

And you need to do this for every arch. And you need to verify that it
does not need mmap_sem. It seems that at least alpha needs this lock.
And as Russell pointed out (again, I could easily have misunderstood him)
you need preempt_disable() around memcpy + flush, so down_read(mmap_sem)
should probably go into __copy_to_user_page(). And I am not sure this
helper needs a "struct mm_struct *mm" argument, but this is minor.

Finally, let me repeat, you should verify that this
__copy_to_user_page(page, uaddr, kaddr) will not do something bad if uaddr
is not mmapped, or its mapping does not match area->page.

Good luck ;)

Oleg.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-15 17:43                             ` Oleg Nesterov
@ 2014-04-15 17:46                               ` David Miller
  2014-04-15 18:03                                 ` Oleg Nesterov
  2014-04-15 19:39                               ` David Long
  1 sibling, 1 reply; 66+ messages in thread
From: David Miller @ 2014-04-15 17:46 UTC (permalink / raw)
  To: linux-arm-kernel

From: Oleg Nesterov <oleg@redhat.com>
Date: Tue, 15 Apr 2014 19:43:30 +0200

> Finally, let me repeat, you should verify that this
> __copy_to_user_page(page, uaddr, kaddr) will not something bad if uaddr
> is not mmapped, or its mapping do not match area->page.

Just directly access userspace with the usual exception mechanism we
use for copy_to_user(), put_user(), et al. and if it faults you'll get
-EFAULT and handle it.

This also avoids the D-cache aliasing issues entirely as I explained
in my other reply.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-15 17:38                             ` David Miller
@ 2014-04-15 17:49                               ` Oleg Nesterov
  2014-04-15 17:50                                 ` David Miller
  0 siblings, 1 reply; 66+ messages in thread
From: Oleg Nesterov @ 2014-04-15 17:49 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/15, David Miller wrote:
>
> From: David Long <dave.long@linaro.org>
> Date: Tue, 15 Apr 2014 13:19:27 -0400
>
> > On 04/15/14 11:46, Oleg Nesterov wrote:
> >>
> >> But. Please do not add copy_to_user_page() into copy_to_page() (as your patch
> >> did). This is certainly not what uprobe_write_opcode() wants, we do not want
> >> or need "flush" in this case. The same for __create_xol_area().
> >>
> >
> > It looked me like a call to a new __copy_to_user_page(current->mm, ...) in xol_get_insn_slot()
> > would be in line with David Miller's suggestion and would cure the problem on ARM (and hopefuly
> > be more philosophically correct for all architectures):
>
> It occurs to me that because of the specific environment in which this
> executes we can make the interface take advantage of the invariants at
> this call site:
>
> 1) We are copying into userspace and thus current->mm
>
> 2) We are only storing an instruction or two

Not sure this can help, but also

  3) nobody else will even read/execute this piece of memory

> Furthermore, we can do the stores using something akin to put_user(),
> directly store them into userspace.
>
> This avoids completely any D-cache aliasing issues.  The kernel stores
> to the same address, and therefore the same cache lines, as userspace
> would.

Heh. Of course I thought about this. This can not work afaics.

We do not want to write to, say, page cache if the probed application
mmaps a file at the same vaddr.

Oleg.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-15 17:49                               ` Oleg Nesterov
@ 2014-04-15 17:50                                 ` David Miller
  2014-04-15 18:07                                   ` Oleg Nesterov
  0 siblings, 1 reply; 66+ messages in thread
From: David Miller @ 2014-04-15 17:50 UTC (permalink / raw)
  To: linux-arm-kernel

From: Oleg Nesterov <oleg@redhat.com>
Date: Tue, 15 Apr 2014 19:49:00 +0200

> We do not want to write to, say, page cache if the probed application
> mmaps a file at the same vaddr.

If user has write access to that page... not our problem.

And if he doesn't we'll fault.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-15 17:46                               ` David Miller
@ 2014-04-15 18:03                                 ` Oleg Nesterov
  2014-04-15 18:30                                   ` David Miller
  0 siblings, 1 reply; 66+ messages in thread
From: Oleg Nesterov @ 2014-04-15 18:03 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/15, David Miller wrote:
>
> From: Oleg Nesterov <oleg@redhat.com>
> Date: Tue, 15 Apr 2014 19:43:30 +0200
>
> > Finally, let me repeat, you should verify that this
> > __copy_to_user_page(page, uaddr, kaddr) will not something bad if uaddr
> > is not mmapped, or its mapping do not match area->page.
>
> Just directly access userspace with the usual exception mechanism we
> use for copy_to_user(), put_user(), et al. and if it faults you'll get
> -EFAULT and handle it.
>
> This also avoids the D-cache aliasing issues entirely as I explained
> in my other reply.

Yes, yes, this is obvious.

But I have no idea what else we should do to take care of icache.

Oleg.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-15 17:50                                 ` David Miller
@ 2014-04-15 18:07                                   ` Oleg Nesterov
  2014-04-15 18:27                                     ` David Miller
  0 siblings, 1 reply; 66+ messages in thread
From: Oleg Nesterov @ 2014-04-15 18:07 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/15, David Miller wrote:
>
> From: Oleg Nesterov <oleg@redhat.com>
> Date: Tue, 15 Apr 2014 19:49:00 +0200
>
> > We do not want to write to, say, page cache if the probed application
> > mmaps a file at the same vaddr.
>
> If user has write access to that page... not our problem.

Well, I am not sure.

Yes, this won't allow you to write the exploit. But if this actually
happens because an application is buggy, this can lead to really
hard-to-debug problems.

Oleg.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-15 18:07                                   ` Oleg Nesterov
@ 2014-04-15 18:27                                     ` David Miller
  2014-04-15 18:46                                       ` Oleg Nesterov
  0 siblings, 1 reply; 66+ messages in thread
From: David Miller @ 2014-04-15 18:27 UTC (permalink / raw)
  To: linux-arm-kernel

From: Oleg Nesterov <oleg@redhat.com>
Date: Tue, 15 Apr 2014 20:07:00 +0200

> On 04/15, David Miller wrote:
>>
>> From: Oleg Nesterov <oleg@redhat.com>
>> Date: Tue, 15 Apr 2014 19:49:00 +0200
>>
>> > We do not want to write to, say, page cache if the probed application
>> > mmaps a file at the same vaddr.
>>
>> If user has write access to that page... not our problem.
> 
> Well, I am not sure.
> 
> Yes, this won't allow you to write the exploit. But if this actually
> happens because an application is buggy, this can lead to really
> hard-to-debug problems.

And this is different from a stray userland scribble onto such a page
how?

This is all so much simpler if you don't insist on doing all of this
hand-holding for the user, that's not the kernel's job.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-15 18:03                                 ` Oleg Nesterov
@ 2014-04-15 18:30                                   ` David Miller
  2014-04-15 18:47                                     ` Russell King - ARM Linux
  2014-04-15 18:50                                     ` David Miller
  0 siblings, 2 replies; 66+ messages in thread
From: David Miller @ 2014-04-15 18:30 UTC (permalink / raw)
  To: linux-arm-kernel

From: Oleg Nesterov <oleg@redhat.com>
Date: Tue, 15 Apr 2014 20:03:11 +0200

> On 04/15, David Miller wrote:
>>
>> From: Oleg Nesterov <oleg@redhat.com>
>> Date: Tue, 15 Apr 2014 19:43:30 +0200
>>
>> > Finally, let me repeat, you should verify that this
>> > __copy_to_user_page(page, uaddr, kaddr) will not something bad if uaddr
>> > is not mmapped, or its mapping do not match area->page.
>>
>> Just directly access userspace with the usual exception mechanism we
>> use for copy_to_user(), put_user(), et al. and if it faults you'll get
>> -EFAULT and handle it.
>>
>> This also avoids the D-cache aliasing issues entirely as I explained
>> in my other reply.
> 
> Yes, yes, this is obvious.
> 
> But I have no idea what else we should do to take care of icache.

Russell, if for example userland stores into the instruction stream
to patch an instruction, what needs to be done on ARM?

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-15 18:27                                     ` David Miller
@ 2014-04-15 18:46                                       ` Oleg Nesterov
  0 siblings, 0 replies; 66+ messages in thread
From: Oleg Nesterov @ 2014-04-15 18:46 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/15, David Miller wrote:
>
> From: Oleg Nesterov <oleg@redhat.com>
> Date: Tue, 15 Apr 2014 20:07:00 +0200
>
> > On 04/15, David Miller wrote:
> >>
> >> From: Oleg Nesterov <oleg@redhat.com>
> >> Date: Tue, 15 Apr 2014 19:49:00 +0200
> >>
> >> > We do not want to write to, say, page cache if the probed application
> >> > mmaps a file at the same vaddr.
> >>
> >> If user has write access to that page... not our problem.
> >
> > Well, I am not sure.
> >
> > Yes, this won't allow you to write the exploit. But if this actually
> > happens because an application is buggy, this can lead to really
> > hard-to-debug problems.
>
> And this is different from a stray userland scribble onto such a page
> how?

Sure. But I think that the kernel should not write to the file just because
a buggy application did the extra munmap() or passed a wrong address to
mmap/munmap.

> This is all so much simpler if you don't insist on doing all of this
> hand-holding for the user, that's not the kernel's job.

Well, I do not agree, but perhaps because I do not really understand
what you mean.

Oleg.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-15 18:30                                   ` David Miller
@ 2014-04-15 18:47                                     ` Russell King - ARM Linux
  2014-04-15 18:53                                       ` David Miller
  2014-04-15 18:50                                     ` David Miller
  1 sibling, 1 reply; 66+ messages in thread
From: Russell King - ARM Linux @ 2014-04-15 18:47 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Apr 15, 2014 at 02:30:54PM -0400, David Miller wrote:
> Russell, if for example userland stores into the instruction stream
> to patch an instruction, what needs to be done on ARM?

Hi David,

I'm afraid I won't be too responsive this week, but I'll try to provide
input where I can.  So I'll try to cover all points in the previous
discussion in this reply.

I think your suggestion that we should be writing directly to userspace
from uprobes is definitely the correct way forward, as that nicely takes
any d-cache aliasing issues completely out of the picture.

However, unlike your "most braindamaged sparc" CPU, the i-cache doesn't
snoop d-cache stores at all.  That said, this is something that we already
deal with since self-modifying code has to work, so (from userspace) we
have a syscall that is used to sort that out.  Internally in the kernel,
this translates to:

	ret = flush_cache_user_range(start, end)

This deals with whatever the CPU requires to be able to correctly execute
code which has been previously written in the range - and only actions on
the currently mapped userspace.

I hope this helps.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-15 18:30                                   ` David Miller
  2014-04-15 18:47                                     ` Russell King - ARM Linux
@ 2014-04-15 18:50                                     ` David Miller
  2014-04-15 19:29                                       ` Russell King - ARM Linux
  1 sibling, 1 reply; 66+ messages in thread
From: David Miller @ 2014-04-15 18:50 UTC (permalink / raw)
  To: linux-arm-kernel

From: David Miller <davem@davemloft.net>
Date: Tue, 15 Apr 2014 14:30:54 -0400 (EDT)

> Russell, if for example userland stores into the instruction stream
> to patch an instruction, what needs to be done on ARM?

Looking around I suspect something like:

	mcrne	p15, 0, INSN_ADDR, c7, c5, 1

after the instruction stores will do it.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-15 18:47                                     ` Russell King - ARM Linux
@ 2014-04-15 18:53                                       ` David Miller
  0 siblings, 0 replies; 66+ messages in thread
From: David Miller @ 2014-04-15 18:53 UTC (permalink / raw)
  To: linux-arm-kernel

From: Russell King - ARM Linux <linux@arm.linux.org.uk>
Date: Tue, 15 Apr 2014 19:47:26 +0100

> However, unlike your "most braindead sparc" CPU, the i-cache doesn't
> snoop d-cache stores at all.

All that matters is remote snooping on SMP, not local snooping.

> However, this is something that we already deal with since
> self-modifying code has to work, so (from userspace) we have a
> syscall that is used to sort that out.  Internally in the kernel,
> this translates to:
> 
> 	ret = flush_cache_user_range(start, end)
> 
> This deals with whatever the CPU requires to be able to correctly execute
> code which has been previously written in the range - and only actions on
> the currently mapped userspace.

Looking around it seems the I-cache line mcr operation should do the
right thing for most chips.

You could simply make a new cpuc op for writing an instruction or two
to userspace and doing the I-cache line mcr op afterwards.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-15 18:50                                     ` David Miller
@ 2014-04-15 19:29                                       ` Russell King - ARM Linux
  2014-04-15 19:51                                         ` David Miller
  0 siblings, 1 reply; 66+ messages in thread
From: Russell King - ARM Linux @ 2014-04-15 19:29 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Apr 15, 2014 at 02:50:06PM -0400, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Tue, 15 Apr 2014 14:30:54 -0400 (EDT)
> 
> > Russell, if for example userland stores into the instruction stream
> > to patch an instruction, what needs to be done on ARM?
> 
> Looking around I suspect something like:
> 
> 	mcrne	p15, 0, INSN_ADDR, c7, c5, 1
> 
> after the instruction stores will do it.

It does still need to be pushed out of the D-cache first though.  So,
for ARMv7 for example:

	str	NEW_INSN, [INSN_ADDR]		@ store new instruction
	mcr	p15, 0, INSN_ADDR, c7, c11, 1	@ clean d line
	mcr	p15, 0, INSN_ADDR, c7, c5, 1	@ flush i line

would do it.  We of course need the user access marking on that (so that
any fault doesn't oops the kernel) - not only for the store, but also the
following two instructions which could fault (and oops unless they're
marked with a fixup) if someone were to munmap() this page in another
thread.  All those fixups can just do the "lets return -EFAULT" from
the operation.
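
In C terms, the shape of that sequence is roughly the following
(illustrative sketch only; flush_cache_user_range() is the helper
mentioned above, and both steps can legitimately fault):

/* write one instruction directly to the user address and sync caches */
static int write_user_insn(unsigned long vaddr, u32 insn)
{
	if (put_user(insn, (u32 __user *)vaddr))
		return -EFAULT;	/* store faulted, e.g. munmap() in another thread */

	/* clean the d-line(s) and invalidate the i-line(s) covering the insn */
	return flush_cache_user_range(vaddr, vaddr + sizeof(insn));
}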

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-15 17:43                             ` Oleg Nesterov
  2014-04-15 17:46                               ` David Miller
@ 2014-04-15 19:39                               ` David Long
  2014-04-15 19:53                                 ` David Miller
  1 sibling, 1 reply; 66+ messages in thread
From: David Long @ 2014-04-15 19:39 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/15/14 13:43, Oleg Nesterov wrote:
> On 04/15, David Long wrote:
>>
>> On 04/15/14 11:46, Oleg Nesterov wrote:
>>>
>>> But. Please do not add copy_to_user_page() into copy_to_page() (as your patch
>>> did). This is certainly not what uprobe_write_opcode() wants, we do not want
>>> or need "flush" in this case. The same for __create_xol_area().
>>>
>>
>> It looked me like a call to a new __copy_to_user_page(current->mm, ...) in xol_get_insn_slot()
>> would be in line with David Miller's suggestion and would cure the problem on ARM (and hopefuly
>> be more philosophically correct for all architectures):
>
> David, let me repeat. I am not going to argue even if I obviously like
> the Linus's suggestion more.
>
> I only argued with the suggestions to follow the __access_remote_vm()
> logic.
>
>> @@ -1297,13 +1298,11 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
>>                  return 0;
>>
>>          /* Initialize the slot */
>> -       copy_to_page(area->page, xol_vaddr,
>> -                       &uprobe->arch.ixol, sizeof(uprobe->arch.ixol));
>> -       /*
>> -        * We probably need flush_icache_user_range() but it needs vma.
>> -        * This should work on supported architectures too.
>> -        */
>> -       flush_dcache_page(area->page);
>> +       kaddr = kmap_atomic(area->page);
>> +       __copy_to_user_page(current->mm, area->page, xol_vaddr,
>> +                       kaddr + (xol_vaddr & ~PAGE_MASK),
>> +                       &uprobe->arch.ixol, sizeof(uprobe->arch.ixol), true);
>> +       kunmap_atomic(kaddr);
>
> This all is fine, but you need to introduce __copy_to_user_page() first.
>

Yes, I have that coded for 14 out of 24 architectures (the easy ones). 
The remaining ones need more work.  Some may prove problematic.  The 
whole approach could yet prove impractical.  More recent suggested 
approaches may be better too (or Linus's relatively simple change).

> And you need to do this for every arch. And you need to verify that it
> does not need mmap_sem. It seems that at least alpha needs this lock.
> And as Russel pointed out (again, again, I could easily misunderstood him)
> you need preempt_disable() around memcpy + flush, so down_read(mmap_sem)
> should probably go into __copy_to_user_page(). And I am not sure this
> helper needs "struct mm_struct *mm" argument, but this is minor.
>

I will look at the locking.  I'm not sure yet if anything needs struct 
mm_struct either.

> Finally, let me repeat, you should verify that this
> __copy_to_user_page(page, uaddr, kaddr) will not something bad if uaddr
> is not mmapped, or its mapping do not match area->page.
>
> Good luck ;)
>
> Oleg.
>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-15 19:29                                       ` Russell King - ARM Linux
@ 2014-04-15 19:51                                         ` David Miller
  0 siblings, 0 replies; 66+ messages in thread
From: David Miller @ 2014-04-15 19:51 UTC (permalink / raw)
  To: linux-arm-kernel

From: Russell King - ARM Linux <linux@arm.linux.org.uk>
Date: Tue, 15 Apr 2014 20:29:15 +0100

> On Tue, Apr 15, 2014 at 02:50:06PM -0400, David Miller wrote:
>> From: David Miller <davem@davemloft.net>
>> Date: Tue, 15 Apr 2014 14:30:54 -0400 (EDT)
>> 
>> > Russell, if for example userland stores into the instruction stream
>> > to patch an instruction, what needs to be done on ARM?
>> 
>> Looking around I suspect something like:
>> 
>> 	mcrne	p15, 0, INSN_ADDR, c7, c5, 1
>> 
>> after the instruction stores will do it.
> 
> It does still need to be pushed out of the D-cache first though.  So,
> for ARMv7 for example:
> 
> 	str	NEW_INSN, [INSN_ADDR]		@ store new instruction
> 	mcr	p15, 0, INSN_ADDR, c7, c11, 1	@ clean d line
> 	mcr	p15, 0, INSN_ADDR, c7, c5, 1	@ flush i line
> 
> would do it.  We of course need the user access marking on that (so that
> any fault doesn't oops the kernel) - not only for the store, but also the
> following two instructions which could fault (and oops unless they're
> marked with a fixup) if someone were to munmap() this page in another
> thread.  All those fixups can just do the "lets return -EFAULT" from
> the operation.

Right, you'd need the exception table business used in put_user() et al.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-15 19:39                               ` David Long
@ 2014-04-15 19:53                                 ` David Miller
  2014-04-16  1:42                                   ` Victor Kamensky
  0 siblings, 1 reply; 66+ messages in thread
From: David Miller @ 2014-04-15 19:53 UTC (permalink / raw)
  To: linux-arm-kernel

From: David Long <dave.long@linaro.org>
Date: Tue, 15 Apr 2014 15:39:18 -0400

> Yes, I have that coded for 14 out of 24 architectures (the easy
> ones). The remaining ones need more work.  Some may prove problematic.
> The whole approach could yet prove impractical.  More recent suggested
> approaches may be better too (or Linus's relatively simple change).

BTW, another reason perhaps to write directly to userspace and have
a uprobes_*() specific interface is that you'll only need to do
3 architectures, the ones that support uprobes.

That way, this is just part of what every arch needs to implement in
order to support uprobes.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-15 19:53                                 ` David Miller
@ 2014-04-16  1:42                                   ` Victor Kamensky
  2014-04-16  2:22                                     ` David Miller
  2014-04-16  2:24                                     ` David Miller
  0 siblings, 2 replies; 66+ messages in thread
From: Victor Kamensky @ 2014-04-16  1:42 UTC (permalink / raw)
  To: linux-arm-kernel

On 15 April 2014 12:53, David Miller <davem@davemloft.net> wrote:
> From: David Long <dave.long@linaro.org>
> Date: Tue, 15 Apr 2014 15:39:18 -0400
>
>> Yes, I have that coded for 14 out of 24 architectures (the easy
>> ones). The remaining ones need more work.  Some may prove problematic.
>> The whole approach could yet prove impractical.  More recent suggested
>> approaches may be better too (or Linus's relatively simple change).
>
> BTW, another reason perhaps to write directly to userspace and have
> a uprobes_*() specific interface is that you'll only need to do
> 3 architectures, the ones that support uprobes.
>
> That way, this is just part of what every arch needs to implement in
> order to support uprobes.

David, Russell, I hope I correctly understood your idea of writing
directly into user space. Please check the patch below. In my tests
it works OK. Look for arch_uprobe_copy_ixol in
arch/arm/kernel/uprobes.c.

However, I ran into an issue while trying that: I had to
add VM_WRITE to the xol area mapping. I.e. the area has to be
writable and executable in order for put_user or __copy_to_user
to work. It does not seem right to have such a mapping
inside a user-space process; it looks like a possible exploitation
point, especially since the xol page is left around even
when tracing is complete. I feel very uneasy about
this direction, unless I am missing something and there is
another way to do it.

Another concern about this approach: will the
flush_cache_user_range function take care of the SMP case
where remote snooping is not supported? I think the use
case Russell mentioned about self-modifying code
and the special system call implies yes.

>From e325a1a1bdddc7bd95301f0031d868bc69ddcddb Mon Sep 17 00:00:00 2001
From: Victor Kamensky <victor.kamensky@linaro.org>
Date: Mon, 7 Apr 2014 17:57:36 -0700
Subject: [PATCH] ARM: uprobes need icache flush after xol write

After writing an instruction into the xol area, on the ARM v7
architecture the code needs to flush the dcache and icache to sync
them up for the given set of addresses. Having just the
'flush_dcache_page(page)' call is not enough - it is
possible to have a stale instruction sitting in the icache
for the given xol area slot address.

Introduce a weak arch_uprobe_copy_ixol function
that by default calls the uprobes copy_to_page function and
then flush_dcache_page, and on ARM define a new one
that handles the xol slot copy in an ARM-specific way.

The ARM implementation of arch_uprobe_copy_ixol just
does __copy_to_user, which does not have dcache aliasing
issues, and then flush_cache_user_range to push the dcache
out and invalidate the corresponding icache entries.

Note that in order to write into the uprobes xol area, VM_WRITE
had to be added to the xol area mapping.

Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
---
 arch/arm/kernel/uprobes.c |  8 ++++++++
 include/linux/uprobes.h   |  3 +++
 kernel/events/uprobes.c   | 27 ++++++++++++++++++---------
 3 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/arch/arm/kernel/uprobes.c b/arch/arm/kernel/uprobes.c
index f9bacee..4836e54 100644
--- a/arch/arm/kernel/uprobes.c
+++ b/arch/arm/kernel/uprobes.c
@@ -113,6 +113,14 @@ int arch_uprobe_analyze_insn(struct arch_uprobe
*auprobe, struct mm_struct *mm,
     return 0;
 }

+void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
+               void *src, unsigned long len)
+{
+    if (!__copy_to_user((void *) vaddr, src, len))
+        flush_cache_user_range(vaddr, len);
+}
+
+
 int arch_uprobe_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
 {
     struct uprobe_task *utask = current->utask;
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index edff2b9..c52f827 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -32,6 +32,7 @@ struct vm_area_struct;
 struct mm_struct;
 struct inode;
 struct notifier_block;
+struct page;

 #define UPROBE_HANDLER_REMOVE        1
 #define UPROBE_HANDLER_MASK        1
@@ -127,6 +128,8 @@ extern int  arch_uprobe_exception_notify(struct
notifier_block *self, unsigned l
 extern void arch_uprobe_abort_xol(struct arch_uprobe *aup, struct
pt_regs *regs);
 extern unsigned long arch_uretprobe_hijack_return_addr(unsigned long
trampoline_vaddr, struct pt_regs *regs);
 extern bool __weak arch_uprobe_ignore(struct arch_uprobe *aup, struct
pt_regs *regs);
+extern void __weak arch_uprobe_copy_ixol(struct page *page, unsigned
long vaddr,
+                     void *src, unsigned long len);
 #else /* !CONFIG_UPROBES */
 struct uprobes_state {
 };
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 04709b6..9e22002 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1149,7 +1149,7 @@ static int xol_add_vma(struct mm_struct *mm,
struct xol_area *area)
     }

     ret = install_special_mapping(mm, area->vaddr, PAGE_SIZE,
-                VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO, &area->page);
+                VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO|VM_WRITE, &area->page);
     if (ret)
         goto fail;

@@ -1296,14 +1296,8 @@ static unsigned long xol_get_insn_slot(struct
uprobe *uprobe)
     if (unlikely(!xol_vaddr))
         return 0;

-    /* Initialize the slot */
-    copy_to_page(area->page, xol_vaddr,
-            &uprobe->arch.ixol, sizeof(uprobe->arch.ixol));
-    /*
-     * We probably need flush_icache_user_range() but it needs vma.
-     * This should work on supported architectures too.
-     */
-    flush_dcache_page(area->page);
+    arch_uprobe_copy_ixol(area->page, xol_vaddr,
+                  &uprobe->arch.ixol, sizeof(uprobe->arch.ixol));

     return xol_vaddr;
 }
@@ -1346,6 +1340,21 @@ static void xol_free_insn_slot(struct task_struct *tsk)
     }
 }

+void __weak arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
+                  void *src, unsigned long len)
+{
+    /* Initialize the slot */
+    copy_to_page(page, vaddr, src, len);
+
+    /*
+     * We probably need flush_icache_user_range() but it needs vma.
+     * This should work on most of architectures by default. If
+     * architecture needs to do something different it can define
+     * its own version of the function.
+     */
+    flush_dcache_page(page);
+}
+
 /**
  * uprobe_get_swbp_addr - compute address of swbp given post-swbp regs
  * @regs: Reflects the saved state of the task after it has hit a breakpoint
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-16  1:42                                   ` Victor Kamensky
@ 2014-04-16  2:22                                     ` David Miller
  2014-04-16  2:24                                     ` David Miller
  1 sibling, 0 replies; 66+ messages in thread
From: David Miller @ 2014-04-16  2:22 UTC (permalink / raw)
  To: linux-arm-kernel

From: Victor Kamensky <victor.kamensky@linaro.org>
Date: Tue, 15 Apr 2014 18:42:39 -0700

> On 15 April 2014 12:53, David Miller <davem@davemloft.net> wrote:
>> From: David Long <dave.long@linaro.org>
>> Date: Tue, 15 Apr 2014 15:39:18 -0400
>>
>>> Yes, I have that coded for 14 out of 24 architectures (the easy
>>> ones). The remaining ones need more work.  Some may prove problematic.
>>> The whole approach could yet prove impractical.  More recent suggested
>>> approaches may be better too (or Linus's relatively simple change).
>>
>> BTW, another reason perhaps to write directly to userspace and have
>> a uprobes_*() specific interface is that you'll only need to do
>> 3 architectures, the ones that support uprobes.
>>
>> That way, this is just part of what every arch needs to implement in
>> order to support uprobes.
> 
> David, Russell, I hope I correctly understood your idea of writing
> directly into user space. Please check patch below.

Just FYI, your mail client corrupted the patch, at a minimum chopping
up long lines.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-16  1:42                                   ` Victor Kamensky
  2014-04-16  2:22                                     ` David Miller
@ 2014-04-16  2:24                                     ` David Miller
  2014-04-16  3:06                                       ` Victor Kamensky
  1 sibling, 1 reply; 66+ messages in thread
From: David Miller @ 2014-04-16  2:24 UTC (permalink / raw)
  To: linux-arm-kernel

From: Victor Kamensky <victor.kamensky@linaro.org>
Date: Tue, 15 Apr 2014 18:42:39 -0700

> +void __weak arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
> +                  void *src, unsigned long len)
> +{
> +    /* Initialize the slot */
> +    copy_to_page(page, vaddr, src, len);
> +
> +    /*
> +     * We probably need flush_icache_user_range() but it needs vma.
> +     * This should work on most of architectures by default. If
> +     * architecture needs to do something different it can define
> +     * its own version of the function.
> +     */
> +    flush_dcache_page(page);
> +}
> +

I would say that, if anything, flush_dcache_page() is unnecessary
if you just copy straight to userspace.

The default implementation should be copy_to_user(), and that's what
every architecture can use if it needs no I-cache flushing.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-16  2:24                                     ` David Miller
@ 2014-04-16  3:06                                       ` Victor Kamensky
  2014-04-16  3:17                                         ` David Miller
  0 siblings, 1 reply; 66+ messages in thread
From: Victor Kamensky @ 2014-04-16  3:06 UTC (permalink / raw)
  To: linux-arm-kernel

On 15 April 2014 19:24, David Miller <davem@davemloft.net> wrote:
> From: Victor Kamensky <victor.kamensky@linaro.org>
> Date: Tue, 15 Apr 2014 18:42:39 -0700
>
>> +void __weak arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
>> +                  void *src, unsigned long len)
>> +{
>> +    /* Initialize the slot */
>> +    copy_to_page(page, vaddr, src, len);
>> +
>> +    /*
>> +     * We probably need flush_icache_user_range() but it needs vma.
>> +     * This should work on most of architectures by default. If
>> +     * architecture needs to do something different it can define
>> +     * its own version of the function.
>> +     */
>> +    flush_dcache_page(page);
>> +}
>> +
>
> I would say that, if anything, flush_dcache_page() is unnecessary
> if you just copy straight to userspace.
> The default implementation should be copy_to_user(), and that's what
> every architecture can use if it needs no I-cache flushing.

OK, got it. I tried not to touch the existing cases (x86 and ppc), but yes,
it would be a benefit here as well.

But don't you think that a writable and executable uprobes
xol page is a show-stopper for this approach?

Thanks,
Victor

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-16  3:06                                       ` Victor Kamensky
@ 2014-04-16  3:17                                         ` David Miller
  0 siblings, 0 replies; 66+ messages in thread
From: David Miller @ 2014-04-16  3:17 UTC (permalink / raw)
  To: linux-arm-kernel

From: Victor Kamensky <victor.kamensky@linaro.org>
Date: Tue, 15 Apr 2014 20:06:57 -0700

> But don't you think that a writable and executable uprobes xol page is
> a show stopper for this approach?

Not really.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing
  2014-04-11  3:45       ` David Long
  2014-04-11  4:36         ` David Miller
  2014-04-11 13:08         ` Oleg Nesterov
@ 2014-04-23 10:45         ` Catalin Marinas
  2 siblings, 0 replies; 66+ messages in thread
From: Catalin Marinas @ 2014-04-23 10:45 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Apr 10, 2014 at 11:45:31PM -0400, David Long wrote:
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 04709b6..2e976fb 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -241,7 +241,7 @@ static void copy_from_page(struct page *page, unsigned long vaddr, void *dst, in
>  static void copy_to_page(struct page *page, unsigned long vaddr, const void *src, int len)
>  {
>  	void *kaddr = kmap_atomic(page);
> -	memcpy(kaddr + (vaddr & ~PAGE_MASK), src, len);
> +	copy_to_user_page(NULL, page, vaddr, kaddr + (vaddr & ~PAGE_MASK), src, len);
>  	kunmap_atomic(kaddr);
>  }

Rather than changing all the architectures to be able to pass a NULL vma
to copy_to_user_page(), you can create a dummy vma on the stack with the
VM_EXEC flag and pass a pointer to it.
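
For illustration, a sketch of that idea (it assumes the architecture's
copy_to_user_page() only inspects vma->vm_flags and does not dereference
other vma fields; whether that holds must be checked per architecture):

#include <linux/highmem.h>
#include <linux/mm.h>

static void copy_to_page(struct page *page, unsigned long vaddr,
			 const void *src, int len)
{
	/* Dummy vma: only vm_flags is meaningful, VM_EXEC marks the
	 * target mapping as executable for the arch cache-flush code. */
	struct vm_area_struct vma = {
		.vm_flags = VM_EXEC,
	};
	void *kaddr = kmap_atomic(page);

	copy_to_user_page(&vma, page, vaddr,
			  kaddr + (vaddr & ~PAGE_MASK), src, len);
	kunmap_atomic(kaddr);
}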

-- 
Catalin

^ permalink raw reply	[flat|nested] 66+ messages in thread

end of thread

Thread overview: 66+ messages
2014-04-09  5:58 [PATCH v2] ARM: uprobes need icache flush after xol write Victor Kamensky
2014-04-09  5:58 ` Victor Kamensky
2014-04-09 18:23   ` David Long
2014-04-09 18:45   ` Oleg Nesterov
2014-04-09 19:13     ` Victor Kamensky
2014-04-09 19:19       ` Russell King - ARM Linux
2014-04-11  3:42       ` [RFC PATCH] uprobes: copy to user-space xol page with proper cache flushing David Long
2014-04-11  3:45       ` David Long
2014-04-11  4:36         ` David Miller
2014-04-11 14:26           ` Victor Kamensky
2014-04-11 14:35             ` Oleg Nesterov
2014-04-11 14:55             ` Victor Kamensky
2014-04-11 14:56           ` Oleg Nesterov
2014-04-11 15:22             ` Oleg Nesterov
2014-04-11 15:30               ` Russell King - ARM Linux
2014-04-11 17:24                 ` Oleg Nesterov
2014-04-11 17:38                   ` Oleg Nesterov
2014-04-11 18:00                     ` David Miller
2014-04-11 18:25                       ` Oleg Nesterov
2014-04-11 17:50                   ` Linus Torvalds
2014-04-11 18:02                     ` David Miller
2014-04-11 18:11                       ` Linus Torvalds
2014-04-11 18:19                         ` David Miller
2014-04-11 18:24                           ` Linus Torvalds
2014-04-11 18:58                             ` David Miller
2014-04-11 19:24                               ` Linus Torvalds
2014-04-11 18:13                       ` Victor Kamensky
2014-04-11 18:36                         ` Oleg Nesterov
2014-04-14 18:59                     ` Oleg Nesterov
2014-04-14 20:05                       ` Victor Kamensky
2014-04-14 21:40                         ` Victor Kamensky
2014-04-15 16:26                           ` Oleg Nesterov
2014-04-15 15:46                         ` Oleg Nesterov
2014-04-15 16:46                           ` Victor Kamensky
2014-04-15 17:19                           ` David Long
2014-04-15 17:38                             ` David Miller
2014-04-15 17:49                               ` Oleg Nesterov
2014-04-15 17:50                                 ` David Miller
2014-04-15 18:07                                   ` Oleg Nesterov
2014-04-15 18:27                                     ` David Miller
2014-04-15 18:46                                       ` Oleg Nesterov
2014-04-15 17:43                             ` Oleg Nesterov
2014-04-15 17:46                               ` David Miller
2014-04-15 18:03                                 ` Oleg Nesterov
2014-04-15 18:30                                   ` David Miller
2014-04-15 18:47                                     ` Russell King - ARM Linux
2014-04-15 18:53                                       ` David Miller
2014-04-15 18:50                                     ` David Miller
2014-04-15 19:29                                       ` Russell King - ARM Linux
2014-04-15 19:51                                         ` David Miller
2014-04-15 19:39                               ` David Long
2014-04-15 19:53                                 ` David Miller
2014-04-16  1:42                                   ` Victor Kamensky
2014-04-16  2:22                                     ` David Miller
2014-04-16  2:24                                     ` David Miller
2014-04-16  3:06                                       ` Victor Kamensky
2014-04-16  3:17                                         ` David Miller
2014-04-11 17:43                 ` David Miller
2014-04-11 15:32               ` Peter Zijlstra
2014-04-11 16:00                 ` Russell King - ARM Linux
2014-04-11 18:39                   ` Peter Zijlstra
2014-04-11 15:37             ` Victor Kamensky
2014-04-11 16:22               ` Oleg Nesterov
2014-04-11 15:42             ` Linus Torvalds
2014-04-11 13:08         ` Oleg Nesterov
2014-04-23 10:45         ` Catalin Marinas
