All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH v4] ARM: uprobes xol write directly to userspace
@ 2014-04-16  5:31 Victor Kamensky
  2014-04-16  5:31 ` Victor Kamensky
  0 siblings, 1 reply; 18+ messages in thread
From: Victor Kamensky @ 2014-04-16  5:31 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Guys,

Here is my understanding of Dave's and Russell's suggestion on [1] 
to use direct write of xol slot instructions to user space. Now 
posting patch through 'git send-email' since, as it was noted, my
mailer corrupts patches otherwise.

Note default case with __copy_to_user is NOT tested. It addresses
David's remark.

Personally, I am very concerned about this patch because it creates
writable and executable page in traced process. The way how uprobes
is implemented such page will stay in process even if all uprobes
are detached from process. IMHO it may create possible attack hole.
I would prefer to see any executable memory read-only all the time.

On top of that, at least in ARM case xol page address is not even 
randomized, which was perfectly fine with current nowrite/noread,
just execute permissions.

Patch follows this cover letter.

Thanks,
Victor

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/247763.html

Victor Kamensky (1):
  ARM: uprobes xol write directly to userspace

 arch/arm/kernel/uprobes.c |  8 ++++++++
 include/linux/uprobes.h   |  3 +++
 kernel/events/uprobes.c   | 28 +++++++++++++++++++---------
 3 files changed, 30 insertions(+), 9 deletions(-)

-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH v4] ARM: uprobes xol write directly to userspace
  2014-04-16  5:31 [RFC PATCH v4] ARM: uprobes xol write directly to userspace Victor Kamensky
@ 2014-04-16  5:31 ` Victor Kamensky
  2014-04-16 14:51   ` Oleg Nesterov
  0 siblings, 1 reply; 18+ messages in thread
From: Victor Kamensky @ 2014-04-16  5:31 UTC (permalink / raw)
  To: linux-arm-kernel

After instruction write into xol area, on ARM V7
architecture code need to flush dcache and icache to sync
them up for given set of addresses. Having just
'flush_dcache_page(page)' call is not enough - it is
possible to have stale instruction sitting in icache
for given xol area slot address.

Introduce arch_uprobe_ixol_copy weak function
that by default calls __copy_to_user function, and
that sufficient for CPUs that can snoop instruction
writes from dcache. On ARM define new one
that handles xol slot copy in ARM specific way.

Arm implementation of arch_uprobe_ixol_copy function
makes __copy_to_user call which does not have dcache
aliasing issues and then flush_cache_user_range to
push dcache out and invalidate corresponding icache
entries.

Note in order to write into uprobes xol area had
to add VM_WRITE to xol area mapping.

Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
---
 arch/arm/kernel/uprobes.c |  8 ++++++++
 include/linux/uprobes.h   |  3 +++
 kernel/events/uprobes.c   | 28 +++++++++++++++++++---------
 3 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/arch/arm/kernel/uprobes.c b/arch/arm/kernel/uprobes.c
index f9bacee..4836e54 100644
--- a/arch/arm/kernel/uprobes.c
+++ b/arch/arm/kernel/uprobes.c
@@ -113,6 +113,14 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
 	return 0;
 }
 
+void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
+			   void *src, unsigned long len)
+{
+	if (!__copy_to_user((void *) vaddr, src, len))
+		flush_cache_user_range(vaddr, len);
+}
+
+
 int arch_uprobe_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
 {
 	struct uprobe_task *utask = current->utask;
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index edff2b9..c52f827 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -32,6 +32,7 @@ struct vm_area_struct;
 struct mm_struct;
 struct inode;
 struct notifier_block;
+struct page;
 
 #define UPROBE_HANDLER_REMOVE		1
 #define UPROBE_HANDLER_MASK		1
@@ -127,6 +128,8 @@ extern int  arch_uprobe_exception_notify(struct notifier_block *self, unsigned l
 extern void arch_uprobe_abort_xol(struct arch_uprobe *aup, struct pt_regs *regs);
 extern unsigned long arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr, struct pt_regs *regs);
 extern bool __weak arch_uprobe_ignore(struct arch_uprobe *aup, struct pt_regs *regs);
+extern void __weak arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
+					 void *src, unsigned long len);
 #else /* !CONFIG_UPROBES */
 struct uprobes_state {
 };
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 04709b6..1038e57 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1149,7 +1149,7 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area)
 	}
 
 	ret = install_special_mapping(mm, area->vaddr, PAGE_SIZE,
-				VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO, &area->page);
+				VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO|VM_WRITE, &area->page);
 	if (ret)
 		goto fail;
 
@@ -1296,14 +1296,8 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
 	if (unlikely(!xol_vaddr))
 		return 0;
 
-	/* Initialize the slot */
-	copy_to_page(area->page, xol_vaddr,
-			&uprobe->arch.ixol, sizeof(uprobe->arch.ixol));
-	/*
-	 * We probably need flush_icache_user_range() but it needs vma.
-	 * This should work on supported architectures too.
-	 */
-	flush_dcache_page(area->page);
+	arch_uprobe_copy_ixol(area->page, xol_vaddr,
+			      &uprobe->arch.ixol, sizeof(uprobe->arch.ixol));
 
 	return xol_vaddr;
 }
@@ -1346,6 +1340,22 @@ static void xol_free_insn_slot(struct task_struct *tsk)
 	}
 }
 
+void __weak arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
+				  void *src, unsigned long len)
+{
+	/*
+	 * Note if CPU does not support instructions write snooping
+	 * from dcache it needs to define its own version of this
+	 * function that would take care of proper cache flushes.
+	 *
+	 * Nothing we can do if it fails, added if to make unused
+	 * result warning happy. If xol write failed because process
+	 * unmapped xol area by mistake, process will crash in some
+	 * other place.
+	 */
+	if (__copy_to_user((void *) vaddr, src, len));
+}
+
 /**
  * uprobe_get_swbp_addr - compute address of swbp given post-swbp regs
  * @regs: Reflects the saved state of the task after it has hit a breakpoint
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH v4] ARM: uprobes xol write directly to userspace
  2014-04-16  5:31 ` Victor Kamensky
@ 2014-04-16 14:51   ` Oleg Nesterov
  2014-04-16 15:00     ` David Miller
  0 siblings, 1 reply; 18+ messages in thread
From: Oleg Nesterov @ 2014-04-16 14:51 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/15, Victor Kamensky wrote:
>
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -1149,7 +1149,7 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area)
>  	}
>
>  	ret = install_special_mapping(mm, area->vaddr, PAGE_SIZE,
> -				VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO, &area->page);
> +				VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO|VM_WRITE, &area->page);

Yes, this is nasty.

I would like to have a reason to nack this change ;) Unfortunately the current
code is buggy too and we need to protect the kernel from malicious applications
which can rewrite the insn we are going to step over in UTASK_SSTEP state anyway.

> +void __weak arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
> +				  void *src, unsigned long len)
> +{
> +	/*
> +	 * Note if CPU does not support instructions write snooping
> +	 * from dcache it needs to define its own version of this
> +	 * function that would take care of proper cache flushes.
> +	 *
> +	 * Nothing we can do if it fails, added if to make unused
> +	 * result warning happy. If xol write failed because process
> +	 * unmapped xol area by mistake, process will crash in some
> +	 * other place.
> +	 */
> +	if (__copy_to_user((void *) vaddr, src, len));
> +}

Plus, again, this can write to another mapping, say to file-backed memory.

Finally, with this change it won't be possible to share this xol memory with
other tasks.

But it seems that it is pointless to argue.

Oleg.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH v4] ARM: uprobes xol write directly to userspace
  2014-04-16 14:51   ` Oleg Nesterov
@ 2014-04-16 15:00     ` David Miller
  2014-04-16 16:43       ` Oleg Nesterov
  0 siblings, 1 reply; 18+ messages in thread
From: David Miller @ 2014-04-16 15:00 UTC (permalink / raw)
  To: linux-arm-kernel

From: Oleg Nesterov <oleg@redhat.com>
Date: Wed, 16 Apr 2014 16:51:07 +0200

> On 04/15, Victor Kamensky wrote:
>>
>> --- a/kernel/events/uprobes.c
>> +++ b/kernel/events/uprobes.c
>> @@ -1149,7 +1149,7 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area)
>>  	}
>>
>>  	ret = install_special_mapping(mm, area->vaddr, PAGE_SIZE,
>> -				VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO, &area->page);
>> +				VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO|VM_WRITE, &area->page);
> 
> Yes, this is nasty.
> 
> I would like to have a reason to nack this change ;) Unfortunately the current
> code is buggy too and we need to protect the kernel from malicious applications
> which can rewrite the insn we are going to step over in UTASK_SSTEP state anyway.

I think there may be a way to achieve your objectives.

Pass MAP_SHARED into the flags argument of get_unmapped_area(), and
pass the pfn of the xol page in as "pgoff".

This will make the xol page get mapped into the user process at an
address which is "D-cache congruent" to the kernel side mapping.

So all kernel stores to the page will use the same D-cache line that
user space accesses to it will.

So we end up with all of the benefits of storing directly to
userspace, along with what you're trying to achieve.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH v4] ARM: uprobes xol write directly to userspace
  2014-04-16 15:00     ` David Miller
@ 2014-04-16 16:43       ` Oleg Nesterov
  2014-04-16 17:38         ` David Miller
  0 siblings, 1 reply; 18+ messages in thread
From: Oleg Nesterov @ 2014-04-16 16:43 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/16, David Miller wrote:
>
> From: Oleg Nesterov <oleg@redhat.com>
> Date: Wed, 16 Apr 2014 16:51:07 +0200
>
> > On 04/15, Victor Kamensky wrote:
> >>
> >> --- a/kernel/events/uprobes.c
> >> +++ b/kernel/events/uprobes.c
> >> @@ -1149,7 +1149,7 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area)
> >>  	}
> >>
> >>  	ret = install_special_mapping(mm, area->vaddr, PAGE_SIZE,
> >> -				VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO, &area->page);
> >> +				VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO|VM_WRITE, &area->page);
> >
> > Yes, this is nasty.
> >
> > I would like to have a reason to nack this change ;) Unfortunately the current
> > code is buggy too and we need to protect the kernel from malicious applications
> > which can rewrite the insn we are going to step over in UTASK_SSTEP state anyway.
>
> I think there may be a way to achieve your objectives.
>
> Pass MAP_SHARED into the flags argument of get_unmapped_area(), and
> pass the pfn of the xol page in as "pgoff".
>
> This will make the xol page get mapped into the user process at an
> address which is "D-cache congruent" to the kernel side mapping.
>
> So all kernel stores to the page will use the same D-cache line that
> user space accesses to it will.

Thanks... I didn't know..

But did you really mean get_unmapped_area(pgoff => page_to_pfn(area->page)) ?

I simply can't understand how this can work, arm (and x86) really use it as
"pgoff << PAGE_SHIFT" align_offset accounted in unmapped_area() ?

_Perhaps_ this can help as a "random number" unique for every xol mapping ?
Hmm, no, I don't understand this COLOUR_ALIGN() magic on arm, but unlikely
this is true.

Help!

> So we end up with all of the benefits of storing directly to
> userspace, along with what you're trying to achieve.

And in this case we can avoid copy_to_user(), right ?

Oleg.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH v4] ARM: uprobes xol write directly to userspace
  2014-04-16 16:43       ` Oleg Nesterov
@ 2014-04-16 17:38         ` David Miller
  2014-04-16 19:18           ` Oleg Nesterov
  0 siblings, 1 reply; 18+ messages in thread
From: David Miller @ 2014-04-16 17:38 UTC (permalink / raw)
  To: linux-arm-kernel

From: Oleg Nesterov <oleg@redhat.com>
Date: Wed, 16 Apr 2014 18:43:10 +0200

> But did you really mean get_unmapped_area(pgoff => page_to_pfn(area->page)) ?

Yes.

> I simply can't understand how this can work, arm (and x86) really use it as
> "pgoff << PAGE_SHIFT" align_offset accounted in unmapped_area() ?

When a platform has D-cache aliasing issues, it must make sure that
every shared page is mapped to the same D-cache alias in all such
shared mappings.

The way this is done is to map each pgoff to a page in the D-cache.
For example, pgoff 0 would be given a virtual address that maps to the
first page in the D-cache, pgoff 1 to the second, and so forth.

What we're doing with this get_unmapped_area() call is to make it so
that userspace's virtual address will land@the same virtual alias
as the kernel one does.

So that stores on the kernel side will be seen properly, without any
cache flushing, by the user side mapping.

>> So we end up with all of the benefits of storing directly to
>> userspace, along with what you're trying to achieve.
> 
> And in this case we can avoid copy_to_user(), right ?

Yes, that's the whole idea, you can forget about the VM_WRITE etc.
that caused your concerns.

It'd be just memcpy to kernel side mapping of the page + i-cache flush
where necessary.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH v4] ARM: uprobes xol write directly to userspace
  2014-04-16 17:38         ` David Miller
@ 2014-04-16 19:18           ` Oleg Nesterov
  2014-04-16 19:37             ` David Miller
  2014-04-16 19:53             ` Russell King - ARM Linux
  0 siblings, 2 replies; 18+ messages in thread
From: Oleg Nesterov @ 2014-04-16 19:18 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/16, David Miller wrote:
>
> From: Oleg Nesterov <oleg@redhat.com>
> Date: Wed, 16 Apr 2014 18:43:10 +0200
>
> > But did you really mean get_unmapped_area(pgoff => page_to_pfn(area->page)) ?
>
> Yes.
>
> > I simply can't understand how this can work, arm (and x86) really use it as
> > "pgoff << PAGE_SHIFT" align_offset accounted in unmapped_area() ?
>
> When a platform has D-cache aliasing issues, it must make sure that
> every shared page is mapped to the same D-cache alias in all such
> shared mappings.
>
> The way this is done is to map each pgoff to a page in the D-cache.
> For example, pgoff 0 would be given a virtual address that maps to the
> first page in the D-cache, pgoff 1 to the second, and so forth.
>
> What we're doing with this get_unmapped_area() call is to make it so
> that userspace's virtual address will land at the same virtual alias
> as the kernel one does.
         ^^^^^^^^^^^^^^^

and page_address() == xxx + (pfn << PAGE_SHIFT), I seem to understand...

David, thanks a lot. I am not saying I fully understand this all, I'll
try to reread your email tomorrow, but it seems that I see the light.

The last question... area->page = alloc_page(GFP_HIGHUSER), and I am
not sure that arch/arm/mm/highmem.c:kmap_atomic() can't break the
aliasing, __fix_to_virt() in this case will use the same (per-cpu) idx.

Looks like, __kunmap_atomic()->__cpuc_flush_dcache_area() should take
care, but could you please ack/nack my understanding?

Thanks!

> >> So we end up with all of the benefits of storing directly to
> >> userspace, along with what you're trying to achieve.
> >
> > And in this case we can avoid copy_to_user(), right ?
>
> Yes, that's the whole idea, you can forget about the VM_WRITE etc.
> that caused your concerns.
>
> It'd be just memcpy to kernel side mapping of the page + i-cache flush
> where necessary.

Great.

Victor, Russel, what do you think? It seems that this patch can be updated.

Oleg.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH v4] ARM: uprobes xol write directly to userspace
  2014-04-16 19:18           ` Oleg Nesterov
@ 2014-04-16 19:37             ` David Miller
  2014-04-16 20:24               ` David Long
  2014-04-16 19:53             ` Russell King - ARM Linux
  1 sibling, 1 reply; 18+ messages in thread
From: David Miller @ 2014-04-16 19:37 UTC (permalink / raw)
  To: linux-arm-kernel

From: Oleg Nesterov <oleg@redhat.com>
Date: Wed, 16 Apr 2014 21:18:25 +0200

> The last question... area->page = alloc_page(GFP_HIGHUSER), and I am
> not sure that arch/arm/mm/highmem.c:kmap_atomic() can't break the
> aliasing, __fix_to_virt() in this case will use the same (per-cpu) idx.
> 
> Looks like, __kunmap_atomic()->__cpuc_flush_dcache_area() should take
> care, but could you please ack/nack my understanding?

Good point, it might therefore make sense to use a low-mem page.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH v4] ARM: uprobes xol write directly to userspace
  2014-04-16 19:18           ` Oleg Nesterov
  2014-04-16 19:37             ` David Miller
@ 2014-04-16 19:53             ` Russell King - ARM Linux
  2014-04-16 20:23               ` Oleg Nesterov
  1 sibling, 1 reply; 18+ messages in thread
From: Russell King - ARM Linux @ 2014-04-16 19:53 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 16, 2014 at 09:18:25PM +0200, Oleg Nesterov wrote:
> Looks like, __kunmap_atomic()->__cpuc_flush_dcache_area() should take
> care, but could you please ack/nack my understanding?

flush_dcache_area() doesn't touch the I-cache... the hint is in the
name. :)  This is also the function which is used for flush_dcache_page()
which we've already established isn't sufficient (for the same reason.)

Plus... we still would need to know the user address(es) to flush for
the I-cache side...

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH v4] ARM: uprobes xol write directly to userspace
  2014-04-16 19:53             ` Russell King - ARM Linux
@ 2014-04-16 20:23               ` Oleg Nesterov
  0 siblings, 0 replies; 18+ messages in thread
From: Oleg Nesterov @ 2014-04-16 20:23 UTC (permalink / raw)
  To: linux-arm-kernel

It is too late for me to even try to read emails ;) perhaps I am totally
confused.

On 04/16, Russell King - ARM Linux wrote:
>
> On Wed, Apr 16, 2014 at 09:18:25PM +0200, Oleg Nesterov wrote:
> > Looks like, __kunmap_atomic()->__cpuc_flush_dcache_area() should take
> > care, but could you please ack/nack my understanding?
>
> flush_dcache_area() doesn't touch the I-cache... the hint is in the
> name. :)  This is also the function which is used for flush_dcache_page()
> which we've already established isn't sufficient (for the same reason.)

Yes, we still need "i-cache flush where necessary" as David pointed.

> Plus... we still would need to know the user address(es) to flush for
> the I-cache side...

We know it, it is xol_vaddr in xol_get_insn_slot().

Oleg.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH v4] ARM: uprobes xol write directly to userspace
  2014-04-16 19:37             ` David Miller
@ 2014-04-16 20:24               ` David Long
  2014-04-16 21:21                 ` David Miller
  0 siblings, 1 reply; 18+ messages in thread
From: David Long @ 2014-04-16 20:24 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/16/14 15:37, David Miller wrote:
> From: Oleg Nesterov <oleg@redhat.com>
> Date: Wed, 16 Apr 2014 21:18:25 +0200
> 
>> The last question... area->page = alloc_page(GFP_HIGHUSER), and I am
>> not sure that arch/arm/mm/highmem.c:kmap_atomic() can't break the
>> aliasing, __fix_to_virt() in this case will use the same (per-cpu) idx.
>>
>> Looks like, __kunmap_atomic()->__cpuc_flush_dcache_area() should take
>> care, but could you please ack/nack my understanding?
> 
> Good point, it might therefore make sense to use a low-mem page.
> 

The following test code seems to have the same problems with stale user
icache.  It works if I put the dcache flush back in.  Am I missing
something?

-dl



diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 04709b6..10ad973 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -34,6 +34,7 @@
 #include <linux/ptrace.h>      /* user_enable_single_step */
 #include <linux/kdebug.h>      /* notifier mechanism */
 #include "../../mm/internal.h" /* munlock_vma_page */
+#include <linux/mman.h>
 #include <linux/percpu-rwsem.h>
 #include <linux/task_work.h>
 
@@ -1141,7 +1142,7 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area)
        if (!area->vaddr) {
                /* Try to map as high as possible, this is only a hint. */
                area->vaddr = get_unmapped_area(NULL, TASK_SIZE - PAGE_SIZE,
-                                               PAGE_SIZE, 0, 0);
+                                               PAGE_SIZE, page_to_pfn(area->page), MAP_SHARED);
                if (area->vaddr & ~PAGE_MASK) {
                        ret = area->vaddr;
                        goto fail;
@@ -1175,7 +1176,7 @@ static struct xol_area *__create_xol_area(unsigned long vaddr)
        if (!area->bitmap)
                goto free_area;
 
-       area->page = alloc_page(GFP_HIGHUSER);
+       area->page = alloc_page(GFP_USER);
        if (!area->page)
                goto free_bitmap;
 
@@ -1299,11 +1300,8 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
        /* Initialize the slot */
        copy_to_page(area->page, xol_vaddr,
                        &uprobe->arch.ixol, sizeof(uprobe->arch.ixol));
-       /*
-        * We probably need flush_icache_user_range() but it needs vma.
-        * This should work on supported architectures too.
-        */
-       flush_dcache_page(area->page);
+/* Temporary hard-core icache flush for testing */
+       __flush_icache_all();
 
        return xol_vaddr;
 }

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH v4] ARM: uprobes xol write directly to userspace
  2014-04-16 20:24               ` David Long
@ 2014-04-16 21:21                 ` David Miller
  2014-04-16 22:01                   ` Victor Kamensky
  2014-04-16 22:25                   ` Russell King - ARM Linux
  0 siblings, 2 replies; 18+ messages in thread
From: David Miller @ 2014-04-16 21:21 UTC (permalink / raw)
  To: linux-arm-kernel

From: David Long <dave.long@linaro.org>
Date: Wed, 16 Apr 2014 16:24:18 -0400

> On 04/16/14 15:37, David Miller wrote:
>> From: Oleg Nesterov <oleg@redhat.com>
>> Date: Wed, 16 Apr 2014 21:18:25 +0200
>> 
>>> The last question... area->page = alloc_page(GFP_HIGHUSER), and I am
>>> not sure that arch/arm/mm/highmem.c:kmap_atomic() can't break the
>>> aliasing, __fix_to_virt() in this case will use the same (per-cpu) idx.
>>>
>>> Looks like, __kunmap_atomic()->__cpuc_flush_dcache_area() should take
>>> care, but could you please ack/nack my understanding?
>> 
>> Good point, it might therefore make sense to use a low-mem page.
>> 
> 
> The following test code seems to have the same problems with stale user
> icache.  It works if I put the dcache flush back in.  Am I missing
> something?

Weird, if we store to the kernel side it should be just a matter of
clearing the I-cache out.  There should be no D-cache aliasing
whatsoever.  Maybe you could print out area->vaddr and
page_to_virt(area->page) so we can see if area->vaddr is choosen
correctly?

Although I notice that flush_cache_user_range() on ARM flushes both D
and I caches.  And I think that's what userspace ends up triggering
when it uses the system call that exists to support self-modifying and
JIT code generators.

An ARM expert will have to chime in...

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH v4] ARM: uprobes xol write directly to userspace
  2014-04-16 21:21                 ` David Miller
@ 2014-04-16 22:01                   ` Victor Kamensky
  2014-04-16 22:25                   ` Russell King - ARM Linux
  1 sibling, 0 replies; 18+ messages in thread
From: Victor Kamensky @ 2014-04-16 22:01 UTC (permalink / raw)
  To: linux-arm-kernel

On 16 April 2014 14:21, David Miller <davem@davemloft.net> wrote:
> From: David Long <dave.long@linaro.org>
> Date: Wed, 16 Apr 2014 16:24:18 -0400
>
>> On 04/16/14 15:37, David Miller wrote:
>>> From: Oleg Nesterov <oleg@redhat.com>
>>> Date: Wed, 16 Apr 2014 21:18:25 +0200
>>>
>>>> The last question... area->page = alloc_page(GFP_HIGHUSER), and I am
>>>> not sure that arch/arm/mm/highmem.c:kmap_atomic() can't break the
>>>> aliasing, __fix_to_virt() in this case will use the same (per-cpu) idx.
>>>>
>>>> Looks like, __kunmap_atomic()->__cpuc_flush_dcache_area() should take
>>>> care, but could you please ack/nack my understanding?
>>>
>>> Good point, it might therefore make sense to use a low-mem page.
>>>
>>
>> The following test code seems to have the same problems with stale user
>> icache.  It works if I put the dcache flush back in.  Am I missing
>> something?
>
> Weird, if we store to the kernel side it should be just a matter of
> clearing the I-cache out.  There should be no D-cache aliasing
> whatsoever.

I don't think there is dcache aliasing, but dcache should be flushed anyway
otherwise icache will not see updated values even it is invalidated. I.e
dcache should be flushed up to dcache/icache unification layer.
I.e icache does not read anything from dcache directly. It reads from L2.
As it was noted below flush_cache_user_range will take care of both,
but just __flush_icache_all is not enough.

I will try to add non-aliased mapping logic from Dave's patch on top
of previously posted patch (which goes under "write directly"), has
default/arm split, etc ...

Thanks,
Victor

>  Maybe you could print out area->vaddr and
> page_to_virt(area->page) so we can see if area->vaddr is choosen
> correctly?
>
> Although I notice that flush_cache_user_range() on ARM flushes both D
> and I caches.  And I think that's what userspace ends up triggering
> when it uses the system call that exists to support self-modifying and
> JIT code generators.
>
> An ARM expert will have to chime in...

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH v4] ARM: uprobes xol write directly to userspace
  2014-04-16 21:21                 ` David Miller
  2014-04-16 22:01                   ` Victor Kamensky
@ 2014-04-16 22:25                   ` Russell King - ARM Linux
  2014-04-16 23:19                     ` David Long
  2014-04-21 16:16                     ` David Long
  1 sibling, 2 replies; 18+ messages in thread
From: Russell King - ARM Linux @ 2014-04-16 22:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 16, 2014 at 05:21:53PM -0400, David Miller wrote:
> Weird, if we store to the kernel side it should be just a matter of
> clearing the I-cache out.  There should be no D-cache aliasing
> whatsoever.  Maybe you could print out area->vaddr and
> page_to_virt(area->page) so we can see if area->vaddr is choosen
> correctly?
> 
> Although I notice that flush_cache_user_range() on ARM flushes both D
> and I caches.  And I think that's what userspace ends up triggering
> when it uses the system call that exists to support self-modifying and
> JIT code generators.
> 
> An ARM expert will have to chime in...

So, David's patch is against the existing kernel (I checked the blob ID
in the patch.)

It looks like David just replaced flush_dcache_page() with
flush_icache_all() as a hack. So my question is... between
flush_dcache_page() and flush_icache_all(), what was the intermediary
(if any) that was attempted?


Now, I'm going to fill in some details for DaveM.  The type of the I/D
L1 caches found on any particular architecture version of CPU can be:

Arch	D-cache			I-cache
ARMv7	VIPT nonaliasing	VIVT ASID tagged
				PIPT
-------------------------------------------------
ARMv6	VIPT nonalising		VIPT nonaliasing
	VIPT aliasing		VIPT aliasing
-------------------------------------------------
ARMv5	VIVT			VIVT
&older

(For ARMv6, each can be either/or, though practically, there's no point
to I-aliasing and D-nonaliasing since it implies the I-cache is bigger
than the D-cache.)

Our I-caches don't snoop/see the D-cache at all - so writes need to be
pushed out to what we call the "point of unification" where the I and D
streams meet.  For anything we care about, that's normally the L2 cache -
L1 cache is harvard, L2 cache is unified.

Hence, we don't care which D-alias (if any) the data is written, so long
as it's pushed out of the L1 data cache so that it's visible to the L1
instruction cache.

If we're writing via a different mapping to that which is being executed,
I think the safest thing to do is to flush it out of the L1 D-cache at
the address it was written, and then flush any stale line from the L1
I-cache using the user address.  This is quite a unique requirement, and
we don't have anything which covers it.  The closest you could get is
to that using existing calls is:

1. write the new instruction
2. flush_dcache_page()
3. flush_cache_user_range() using the user address

and I think that should be safe on all the above cache types.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH v4] ARM: uprobes xol write directly to userspace
  2014-04-16 22:25                   ` Russell King - ARM Linux
@ 2014-04-16 23:19                     ` David Long
  2014-04-21 16:16                     ` David Long
  1 sibling, 0 replies; 18+ messages in thread
From: David Long @ 2014-04-16 23:19 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/16/14 18:25, Russell King - ARM Linux wrote:
> On Wed, Apr 16, 2014 at 05:21:53PM -0400, David Miller wrote:
>> Weird, if we store to the kernel side it should be just a matter of
>> clearing the I-cache out.  There should be no D-cache aliasing
>> whatsoever.  Maybe you could print out area->vaddr and
>> page_to_virt(area->page) so we can see if area->vaddr is choosen
>> correctly?
>>
>> Although I notice that flush_cache_user_range() on ARM flushes both D
>> and I caches.  And I think that's what userspace ends up triggering
>> when it uses the system call that exists to support self-modifying and
>> JIT code generators.
>>
>> An ARM expert will have to chime in...
> 
> So, David's patch is against the existing kernel (I checked the blob ID
> in the patch.)

Sorry, that patch was against my uprobes-v7 branch which means it was v3.14-rc5
plus the uprobes work you pulled from me for v3.15.  Thanks for reminding me
it's time to update my repo.

> It looks like David just replaced flush_dcache_page() with
> flush_icache_all() as a hack. So my question is... between
> flush_dcache_page() and flush_icache_all(), what was the intermediary
> (if any) that was attempted?

The other combinations I tried were: 1) existing dcache flush followed by
flush_icache_all, which works;  2) existing dcache flush followed by:

	flush_icache_range(xol_vaddr, sizeof uprobe->arch.ixol);

...which also worked (xol_vaddr is the beginning of the two instruction
out-of-line sequence, and the sizeof works out to be 8).

I didn't bother trying flush_icache_user_range() because that is #define'd
to be just flush_dcache_page() on ARM, which I don't understand at all.

> 
> Now, I'm going to fill in some details for DaveM.  The type of the I/D
> L1 caches found on any particular architecture version of CPU can be:
> 
> Arch	D-cache			I-cache
> ARMv7	VIPT nonaliasing	VIVT ASID tagged
> 				PIPT
> -------------------------------------------------
> ARMv6	VIPT nonalising		VIPT nonaliasing
> 	VIPT aliasing		VIPT aliasing
> -------------------------------------------------
> ARMv5	VIVT			VIVT
> &older
> 
> (For ARMv6, each can be either/or, though practically, there's no point
> to I-aliasing and D-nonaliasing since it implies the I-cache is bigger
> than the D-cache.)
> 
> Our I-caches don't snoop/see the D-cache at all - so writes need to be
> pushed out to what we call the "point of unification" where the I and D
> streams meet.  For anything we care about, that's normally the L2 cache -
> L1 cache is harvard, L2 cache is unified.
> 
> Hence, we don't care which D-alias (if any) the data is written, so long
> as it's pushed out of the L1 data cache so that it's visible to the L1
> instruction cache.
> 
> If we're writing via a different mapping to that which is being executed,
> I think the safest thing to do is to flush it out of the L1 D-cache at
> the address it was written, and then flush any stale line from the L1
> I-cache using the user address.  This is quite a unique requirement, and
> we don't have anything which covers it.  The closest you could get is
> to that using existing calls is:
> 
> 1. write the new instruction
> 2. flush_dcache_page()
> 3. flush_cache_user_range() using the user address
> 
> and I think that should be safe on all the above cache types.
> 

OK, still needing the dcache flush makes sense to me.  I thought I was
reading from (the other) David that it shouldn't be necessary, but I
could not understand why.

-dl

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH v4] ARM: uprobes xol write directly to userspace
  2014-04-16 22:25                   ` Russell King - ARM Linux
  2014-04-16 23:19                     ` David Long
@ 2014-04-21 16:16                     ` David Long
  2014-04-21 16:41                       ` Linus Torvalds
  2014-04-21 17:56                       ` Victor Kamensky
  1 sibling, 2 replies; 18+ messages in thread
From: David Long @ 2014-04-21 16:16 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/16/14 18:25, Russell King - ARM Linux wrote:

> Our I-caches don't snoop/see the D-cache at all - so writes need to be
> pushed out to what we call the "point of unification" where the I and D
> streams meet.  For anything we care about, that's normally the L2 cache -
> L1 cache is harvard, L2 cache is unified.
> 
> Hence, we don't care which D-alias (if any) the data is written, so long
> as it's pushed out of the L1 data cache so that it's visible to the L1
> instruction cache.
> 
> If we're writing via a different mapping to that which is being executed,
> I think the safest thing to do is to flush it out of the L1 D-cache at
> the address it was written, and then flush any stale line from the L1
> I-cache using the user address.  This is quite a unique requirement, and
> we don't have anything which covers it.  The closest you could get is
> to that using existing calls is:
> 
> 1. write the new instruction
> 2. flush_dcache_page()
> 3. flush_cache_user_range() using the user address
> 
> and I think that should be safe on all the above cache types.
> 

It doesn't feel to me like we yet have a clear consensus on the appropriate
near or long-term fix for this problem.  I'm worried time is short to get a
fix in for v3.15.  I'm not sure how elegant that fix needs to be.  I've gotten
good test runs using a modified/simplified version of Victor's arch callback
and a slight variation of Russell's sequence of operations from above:

void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
			const void *src, int len)
{
	void *kaddr = kmap_atomic(page);

#ifdef CONFIG_SMP
	preempt_disable();
#endif
	memcpy(kaddr + (vaddr & ~PAGE_MASK), src, len);
	clean_dcache_area(kaddr, len);
	flush_cache_user_range(vaddr, vaddr + len);
#ifdef CONFIG_SMP
	preempt_enable();
#endif
	kunmap_atomic(kaddr);
}


I have to say using clean_dcache_area() to write back the two words that changed
(and rest of the cache line of course) seems more appropriate than flushing a
whole page.  Are there implications in doing that which makes this a bad idea
though?

At any rate, for v3.15 do we want to persue the more complex solutions with
"congruent" mappings and use of copy_to_user(), or just something like the above
(plus the rest of Victors v3 patch)?  I'm sure Oleg is even less happy than me
about yet another arch_ callback but we can hold out the hope that a more elegant
solution can follow in the next release.  One that might introduce risk we can't
accept in v3.15 right now (e.g.: mapping the xol area writeable for all
architectures).

I have also tested (somewhat) both Victor's unmodified v3 and v4 patches on
exynos 5250 and found them to work.

Thanks,
-dl

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH v4] ARM: uprobes xol write directly to userspace
  2014-04-21 16:16                     ` David Long
@ 2014-04-21 16:41                       ` Linus Torvalds
  2014-04-21 17:56                       ` Victor Kamensky
  1 sibling, 0 replies; 18+ messages in thread
From: Linus Torvalds @ 2014-04-21 16:41 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Apr 21, 2014 at 9:16 AM, David Long <dave.long@linaro.org> wrote:
>
> void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
>                         const void *src, int len)
> {
>         void *kaddr = kmap_atomic(page);
>
> #ifdef CONFIG_SMP
>         preempt_disable();
> #endif

That #ifdef seems totally pointless. Make it unconditional, or remove
it entirely (the kmap_atomic() already causes us to be non-preemptible
in practice).

           Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH v4] ARM: uprobes xol write directly to userspace
  2014-04-21 16:16                     ` David Long
  2014-04-21 16:41                       ` Linus Torvalds
@ 2014-04-21 17:56                       ` Victor Kamensky
  1 sibling, 0 replies; 18+ messages in thread
From: Victor Kamensky @ 2014-04-21 17:56 UTC (permalink / raw)
  To: linux-arm-kernel

On 21 April 2014 09:16, David Long <dave.long@linaro.org> wrote:
> On 04/16/14 18:25, Russell King - ARM Linux wrote:
>
>> Our I-caches don't snoop/see the D-cache at all - so writes need to be
>> pushed out to what we call the "point of unification" where the I and D
>> streams meet.  For anything we care about, that's normally the L2 cache -
>> L1 cache is harvard, L2 cache is unified.
>>
>> Hence, we don't care which D-alias (if any) the data is written, so long
>> as it's pushed out of the L1 data cache so that it's visible to the L1
>> instruction cache.
>>
>> If we're writing via a different mapping to that which is being executed,
>> I think the safest thing to do is to flush it out of the L1 D-cache at
>> the address it was written, and then flush any stale line from the L1
>> I-cache using the user address.  This is quite a unique requirement, and
>> we don't have anything which covers it.  The closest you could get is
>> to that using existing calls is:
>>
>> 1. write the new instruction
>> 2. flush_dcache_page()
>> 3. flush_cache_user_range() using the user address
>>
>> and I think that should be safe on all the above cache types.
>>
>
> It doesn't feel to me like we yet have a clear consensus on the appropriate
> near or long-term fix for this problem.  I'm worried time is short to get a
> fix in for v3.15.  I'm not sure how elegant that fix needs to be.  I've gotten
> good test runs using a modified/simplified version of Victor's arch callback
> and a slight variation of Russell's sequence of operations from above:
>
> void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
>                         const void *src, int len)
> {
>         void *kaddr = kmap_atomic(page);
>
> #ifdef CONFIG_SMP
>         preempt_disable();
> #endif
>         memcpy(kaddr + (vaddr & ~PAGE_MASK), src, len);
>         clean_dcache_area(kaddr, len);
>         flush_cache_user_range(vaddr, vaddr + len);

I wonder would flush_cache_user_range addresses other cores
icache invaludation issue? We discussed that
user-land task while doing uprobes single step could be
migrated to another core. So if ARM cpu that has
cache_ops_need_broadcast() to true and migration
happens, other core icache may still has old stale instruction
by this address. Note ARM ptrace breakpoint write code in
flush_ptrace_access deals with such situation.

Given that cache_ops_need_broadcast() true is not typical
and cache invalidation in this case could be slow, but we
would like to be functionally correct even in such situations,
rather than experience very very rare mysterious crash of
user-land process under the trace.

> #ifdef CONFIG_SMP
>         preempt_enable();
> #endif
>         kunmap_atomic(kaddr);
> }
>
>
> I have to say using clean_dcache_area() to write back the two words that changed
> (and rest of the cache line of course) seems more appropriate than flushing a
> whole page.  Are there implications in doing that which makes this a bad idea
> though?
>
> At any rate, for v3.15 do we want to persue the more complex solutions with
> "congruent" mappings and use of copy_to_user(), or just something like the above
> (plus the rest of Victors v3 patch)?  I'm sure Oleg is even less happy than me
> about yet another arch_ callback but we can hold out the hope that a more elegant
> solution can follow in the next release.  One that might introduce risk we can't
> accept in v3.15 right now (e.g.: mapping the xol area writeable for all
> architectures).
>
> I have also tested (somewhat) both Victor's unmodified v3 and v4 patches on
> exynos 5250 and found them to work.

It seems to me that we cannot find common solution
for this issue and arch specific callback should be introduced.

If arch callback is introduced I will be more comfortable to keep
current behavior as default and as far as ARM specific implementation
is concerned it would be good to have code/logic sharing with code that
deals with ptrace breakpoint write.

IMHO such solution would be something like V3 [1] version of the patch,
Note [1] also needs to address Linus's comment about removing
'#ifdef CONFIG_SMP' in code sequence similar to above.

Thanks,
Victor

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/247793.html

> Thanks,
> -dl
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2014-04-21 17:56 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-16  5:31 [RFC PATCH v4] ARM: uprobes xol write directly to userspace Victor Kamensky
2014-04-16  5:31 ` Victor Kamensky
2014-04-16 14:51   ` Oleg Nesterov
2014-04-16 15:00     ` David Miller
2014-04-16 16:43       ` Oleg Nesterov
2014-04-16 17:38         ` David Miller
2014-04-16 19:18           ` Oleg Nesterov
2014-04-16 19:37             ` David Miller
2014-04-16 20:24               ` David Long
2014-04-16 21:21                 ` David Miller
2014-04-16 22:01                   ` Victor Kamensky
2014-04-16 22:25                   ` Russell King - ARM Linux
2014-04-16 23:19                     ` David Long
2014-04-21 16:16                     ` David Long
2014-04-21 16:41                       ` Linus Torvalds
2014-04-21 17:56                       ` Victor Kamensky
2014-04-16 19:53             ` Russell King - ARM Linux
2014-04-16 20:23               ` Oleg Nesterov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.