linux-kernel.vger.kernel.org archive mirror
* [PATCH] arm64: Allow vmalloc regions to be set with set_memory_*
@ 2016-01-12 21:46 Laura Abbott
  2016-01-13  0:01 ` Alexei Starovoitov
  2016-01-13 14:03 ` Ard Biesheuvel
  0 siblings, 2 replies; 15+ messages in thread
From: Laura Abbott @ 2016-01-12 21:46 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland
  Cc: Laura Abbott, Kees Cook, linux-arm-kernel, linux-kernel


The range of set_memory_* is currently restricted to the module address
range because of difficulties in breaking down larger block sizes.
vmalloc maps PAGE_SIZE pages so it is safe to use as well. Update the
function ranges and add a comment explaining why the range is restricted
the way it is.

Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
---
This should let the protections for eBPF work as expected; I don't
know if there is some sort of self-test for that.
---
 arch/arm64/mm/pageattr.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
index 3571c73..274208e 100644
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@ -36,6 +36,26 @@ static int change_page_range(pte_t *ptep, pgtable_t token, unsigned long addr,
 	return 0;
 }
 
+static bool validate_addr(unsigned long start, unsigned long end)
+{
+	/*
+	 * This check explicitly excludes most kernel memory. Most kernel
+	 * memory is mapped with a larger page size and breaking down the
+	 * larger page size without causing TLB conflicts is very difficult.
+	 *
+	 * If you need to call set_memory_* on a range, the recommendation is
+	 * to use vmalloc since that range is mapped with pages.
+	 */
+	if (start >= MODULES_VADDR && start < MODULES_END &&
+	    end >= MODULES_VADDR && end < MODULES_END)
+		return true;
+
+	if (is_vmalloc_addr((void *)start) && is_vmalloc_addr((void *)end))
+		return true;
+
+	return false;
+}
+
 static int change_memory_common(unsigned long addr, int numpages,
 				pgprot_t set_mask, pgprot_t clear_mask)
 {
@@ -51,10 +71,7 @@ static int change_memory_common(unsigned long addr, int numpages,
 		WARN_ON_ONCE(1);
 	}
 
-	if (start < MODULES_VADDR || start >= MODULES_END)
-		return -EINVAL;
-
-	if (end < MODULES_VADDR || end >= MODULES_END)
+	if (!validate_addr(start, end))
 		return -EINVAL;
 
 	data.set_mask = set_mask;
-- 
2.5.0


* Re: [PATCH] arm64: Allow vmalloc regions to be set with set_memory_*
  2016-01-12 21:46 [PATCH] arm64: Allow vmalloc regions to be set with set_memory_* Laura Abbott
@ 2016-01-13  0:01 ` Alexei Starovoitov
  2016-01-13  0:31   ` Daniel Borkmann
  2016-01-13 14:03 ` Ard Biesheuvel
  1 sibling, 1 reply; 15+ messages in thread
From: Alexei Starovoitov @ 2016-01-13  0:01 UTC (permalink / raw)
  To: Laura Abbott
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, Kees Cook,
	linux-arm-kernel, linux-kernel

On Tue, Jan 12, 2016 at 01:46:27PM -0800, Laura Abbott wrote:
> 
> The range of set_memory_* is currently restricted to the module address
> range because of difficulties in breaking down larger block sizes.
> vmalloc maps PAGE_SIZE pages so it is safe to use as well. Update the
> function ranges and add a comment explaining why the range is restricted
> the way it is.
> 
> Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
> ---
> This should let the protections for eBPF work as expected; I don't
> know if there is some sort of self-test for that.

you can test it with:
# sysctl net.core.bpf_jit_enable=1
# insmod test_bpf.ko

On x64 it shows:
test_bpf: Summary: 291 PASSED, 0 FAILED, [282/283 JIT'ed]

arm64 may have fewer JIT'ed tests.


* Re: [PATCH] arm64: Allow vmalloc regions to be set with set_memory_*
  2016-01-13  0:01 ` Alexei Starovoitov
@ 2016-01-13  0:31   ` Daniel Borkmann
  0 siblings, 0 replies; 15+ messages in thread
From: Daniel Borkmann @ 2016-01-13  0:31 UTC (permalink / raw)
  To: Laura Abbott
  Cc: Alexei Starovoitov, Catalin Marinas, Will Deacon, Mark Rutland,
	Kees Cook, linux-arm-kernel, linux-kernel

On 01/13/2016 01:01 AM, Alexei Starovoitov wrote:
> On Tue, Jan 12, 2016 at 01:46:27PM -0800, Laura Abbott wrote:
>>
>> The range of set_memory_* is currently restricted to the module address
>> range because of difficulties in breaking down larger block sizes.
>> vmalloc maps PAGE_SIZE pages so it is safe to use as well. Update the
>> function ranges and add a comment explaining why the range is restricted
>> the way it is.
>>
>> Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
>> ---
>> This should let the protections for eBPF work as expected; I don't
>> know if there is some sort of self-test for that.
>
> you can test it with:
> # sysctl net.core.bpf_jit_enable=1
> # insmod test_bpf.ko
>
> On x64 it shows:
> test_bpf: Summary: 291 PASSED, 0 FAILED, [282/283 JIT'ed]
>
> arm64 may have fewer JIT'ed tests.

Also, in that lib/test_bpf.c file, you can do a test by overwriting/'corrupting'
part of the fp->insnsi instructions right after bpf_prog_select_runtime(fp) to
see if setting the bpf_prog as RO works.
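
A minimal sketch of such a test (hypothetical code, not the actual
lib/test_bpf.c source; it only illustrates the shape of the check):

	/*
	 * bpf_prog_select_runtime() is expected to mark the prog
	 * read-only via set_memory_ro(); a subsequent write to the
	 * filter body should then fault instead of silently landing.
	 */
	bpf_prog_select_runtime(fp);
	fp->insnsi[0].code = 0xff;	/* expect a write fault here */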

Thanks,
Daniel


* Re: [PATCH] arm64: Allow vmalloc regions to be set with set_memory_*
  2016-01-12 21:46 [PATCH] arm64: Allow vmalloc regions to be set with set_memory_* Laura Abbott
  2016-01-13  0:01 ` Alexei Starovoitov
@ 2016-01-13 14:03 ` Ard Biesheuvel
  2016-01-13 16:10   ` Ard Biesheuvel
  1 sibling, 1 reply; 15+ messages in thread
From: Ard Biesheuvel @ 2016-01-13 14:03 UTC (permalink / raw)
  To: Laura Abbott
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, Kees Cook,
	linux-kernel, linux-arm-kernel

On 12 January 2016 at 22:46, Laura Abbott <labbott@fedoraproject.org> wrote:
>
> The range of set_memory_* is currently restricted to the module address
> range because of difficulties in breaking down larger block sizes.
> vmalloc maps PAGE_SIZE pages so it is safe to use as well. Update the
> function ranges and add a comment explaining why the range is restricted
> the way it is.
>
> Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
> ---
> This should let the protections for eBPF work as expected; I don't
> know if there is some sort of self-test for that.


This is going to conflict with my KASLR implementation, since it puts
the kernel image right in the middle of the vmalloc area, and the
kernel is obviously mapped with block mappings. In fact, I am
proposing enabling huge-vmap for arm64 as well, since it seems an
improvement generally, but also specifically allows me to unmap the
__init section using the generic vunmap code (remove_vm_area). But in
general, I think the assumption that the whole vmalloc area is mapped
using pages is not tenable.

AFAICT, vmalloc still uses pages exclusively even with huge-vmap (but
ioremap does not). So perhaps it would make sense to check for the
VM_ALLOC bit in the VMA flags (which I will not set for the kernel
regions either)


> ---
>  arch/arm64/mm/pageattr.c | 25 +++++++++++++++++++++----
>  1 file changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
> index 3571c73..274208e 100644
> --- a/arch/arm64/mm/pageattr.c
> +++ b/arch/arm64/mm/pageattr.c
> @@ -36,6 +36,26 @@ static int change_page_range(pte_t *ptep, pgtable_t token, unsigned long addr,
>         return 0;
>  }
>
> +static bool validate_addr(unsigned long start, unsigned long end)
> +{
> +       /*
> +        * This check explicitly excludes most kernel memory. Most kernel
> +        * memory is mapped with a larger page size and breaking down the
> +        * larger page size without causing TLB conflicts is very difficult.
> +        *
> +        * If you need to call set_memory_* on a range, the recommendation is
> +        * to use vmalloc since that range is mapped with pages.
> +        */
> +       if (start >= MODULES_VADDR && start < MODULES_END &&
> +           end >= MODULES_VADDR && end < MODULES_END)
> +               return true;
> +
> +       if (is_vmalloc_addr((void *)start) && is_vmalloc_addr((void *)end))
> +               return true;
> +
> +       return false;
> +}
> +
>  static int change_memory_common(unsigned long addr, int numpages,
>                                 pgprot_t set_mask, pgprot_t clear_mask)
>  {
> @@ -51,10 +71,7 @@ static int change_memory_common(unsigned long addr, int numpages,
>                 WARN_ON_ONCE(1);
>         }
>
> -       if (start < MODULES_VADDR || start >= MODULES_END)
> -               return -EINVAL;
> -
> -       if (end < MODULES_VADDR || end >= MODULES_END)
> +       if (!validate_addr(start, end))
>                 return -EINVAL;
>
>         data.set_mask = set_mask;
> --
> 2.5.0
>
>


* Re: [PATCH] arm64: Allow vmalloc regions to be set with set_memory_*
  2016-01-13 14:03 ` Ard Biesheuvel
@ 2016-01-13 16:10   ` Ard Biesheuvel
  2016-01-14 23:01     ` Laura Abbott
  2016-01-18 11:56     ` Mark Rutland
  0 siblings, 2 replies; 15+ messages in thread
From: Ard Biesheuvel @ 2016-01-13 16:10 UTC (permalink / raw)
  To: Laura Abbott
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, Kees Cook,
	linux-kernel, linux-arm-kernel

On 13 January 2016 at 15:03, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 12 January 2016 at 22:46, Laura Abbott <labbott@fedoraproject.org> wrote:
>>
>> The range of set_memory_* is currently restricted to the module address
>> range because of difficulties in breaking down larger block sizes.
>> vmalloc maps PAGE_SIZE pages so it is safe to use as well. Update the
>> function ranges and add a comment explaining why the range is restricted
>> the way it is.
>>
>> Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
>> ---
>> This should let the protections for eBPF work as expected; I don't
>> know if there is some sort of self-test for that.
>
>
> This is going to conflict with my KASLR implementation, since it puts
> the kernel image right in the middle of the vmalloc area, and the
> kernel is obviously mapped with block mappings. In fact, I am
> proposing enabling huge-vmap for arm64 as well, since it seems an
> improvement generally, but also specifically allows me to unmap the
> __init section using the generic vunmap code (remove_vm_area). But in
> general, I think the assumption that the whole vmalloc area is mapped
> using pages is not tenable.
>
> AFAICT, vmalloc still uses pages exclusively even with huge-vmap (but
> ioremap does not). So perhaps it would make sense to check for the
> VM_ALLOC bit in the VMA flags (which I will not set for the kernel
> regions either)
>

Something along these lines, perhaps?

diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
index 3571c7309c5e..bda0a776c58e 100644
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@ -13,6 +13,7 @@
 #include <linux/kernel.h>
 #include <linux/mm.h>
 #include <linux/module.h>
+#include <linux/vmalloc.h>
 #include <linux/sched.h>

 #include <asm/pgtable.h>
@@ -44,6 +45,7 @@ static int change_memory_common(unsigned long addr
        unsigned long end = start + size;
        int ret;
        struct page_change_data data;
+       struct vm_struct *area;

        if (!PAGE_ALIGNED(addr)) {
                start &= PAGE_MASK;
@@ -51,10 +53,14 @@ static int change_memory_common(unsigned long addr,
                WARN_ON_ONCE(1);
        }

-       if (start < MODULES_VADDR || start >= MODULES_END)
-               return -EINVAL;
-
-       if (end < MODULES_VADDR || end >= MODULES_END)
+       /*
+        * Check whether the [addr, addr + size) interval is entirely
+        * covered by precisely one VM area that has the VM_ALLOC flag set
+        */
+       area = find_vm_area((void *)addr);
+       if (!area ||
+           end > (unsigned long)area->addr + area->size ||
+           !(area->flags & VM_ALLOC))
                return -EINVAL;

        data.set_mask = set_mask;


>
>> ---
>>  arch/arm64/mm/pageattr.c | 25 +++++++++++++++++++++----
>>  1 file changed, 21 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
>> index 3571c73..274208e 100644
>> --- a/arch/arm64/mm/pageattr.c
>> +++ b/arch/arm64/mm/pageattr.c
>> @@ -36,6 +36,26 @@ static int change_page_range(pte_t *ptep, pgtable_t token, unsigned long addr,
>>         return 0;
>>  }
>>
>> +static bool validate_addr(unsigned long start, unsigned long end)
>> +{
>> +       /*
>> +        * This check explicitly excludes most kernel memory. Most kernel
>> +        * memory is mapped with a larger page size and breaking down the
>> +        * larger page size without causing TLB conflicts is very difficult.
>> +        *
>> +        * If you need to call set_memory_* on a range, the recommendation is
>> +        * to use vmalloc since that range is mapped with pages.
>> +        */
>> +       if (start >= MODULES_VADDR && start < MODULES_END &&
>> +           end >= MODULES_VADDR && end < MODULES_END)
>> +               return true;
>> +
>> +       if (is_vmalloc_addr((void *)start) && is_vmalloc_addr((void *)end))
>> +               return true;
>> +
>> +       return false;
>> +}
>> +
>>  static int change_memory_common(unsigned long addr, int numpages,
>>                                 pgprot_t set_mask, pgprot_t clear_mask)
>>  {
>> @@ -51,10 +71,7 @@ static int change_memory_common(unsigned long addr, int numpages,
>>                 WARN_ON_ONCE(1);
>>         }
>>
>> -       if (start < MODULES_VADDR || start >= MODULES_END)
>> -               return -EINVAL;
>> -
>> -       if (end < MODULES_VADDR || end >= MODULES_END)
>> +       if (!validate_addr(start, end))
>>                 return -EINVAL;
>>
>>         data.set_mask = set_mask;
>> --
>> 2.5.0
>>
>>


* Re: [PATCH] arm64: Allow vmalloc regions to be set with set_memory_*
  2016-01-13 16:10   ` Ard Biesheuvel
@ 2016-01-14 23:01     ` Laura Abbott
  2016-01-18 11:56     ` Mark Rutland
  1 sibling, 0 replies; 15+ messages in thread
From: Laura Abbott @ 2016-01-14 23:01 UTC (permalink / raw)
  To: Ard Biesheuvel, Laura Abbott
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, Kees Cook,
	linux-kernel, linux-arm-kernel

On 01/13/2016 08:10 AM, Ard Biesheuvel wrote:
> On 13 January 2016 at 15:03, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> On 12 January 2016 at 22:46, Laura Abbott <labbott@fedoraproject.org> wrote:
>>>
>>> The range of set_memory_* is currently restricted to the module address
>>> range because of difficulties in breaking down larger block sizes.
>>> vmalloc maps PAGE_SIZE pages so it is safe to use as well. Update the
>>> function ranges and add a comment explaining why the range is restricted
>>> the way it is.
>>>
>>> Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
>>> ---
>>> This should let the protections for eBPF work as expected; I don't
>>> know if there is some sort of self-test for that.
>>
>>
>> This is going to conflict with my KASLR implementation, since it puts
>> the kernel image right in the middle of the vmalloc area, and the
>> kernel is obviously mapped with block mappings. In fact, I am
>> proposing enabling huge-vmap for arm64 as well, since it seems an
>> improvement generally, but also specifically allows me to unmap the
>> __init section using the generic vunmap code (remove_vm_area). But in
>> general, I think the assumption that the whole vmalloc area is mapped
>> using pages is not tenable.
>>
>> AFAICT, vmalloc still uses pages exclusively even with huge-vmap (but
>> ioremap does not). So perhaps it would make sense to check for the
>> VM_ALLOC bit in the VMA flags (which I will not set for the kernel
>> regions either)
>>
>
> Something along these lines, perhaps?
>
> diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
> index 3571c7309c5e..bda0a776c58e 100644
> --- a/arch/arm64/mm/pageattr.c
> +++ b/arch/arm64/mm/pageattr.c
> @@ -13,6 +13,7 @@
>   #include <linux/kernel.h>
>   #include <linux/mm.h>
>   #include <linux/module.h>
> +#include <linux/vmalloc.h>
>   #include <linux/sched.h>
>
>   #include <asm/pgtable.h>
> @@ -44,6 +45,7 @@ static int change_memory_common(unsigned long addr
>          unsigned long end = start + size;
>          int ret;
>          struct page_change_data data;
> +       struct vm_struct *area;
>
>          if (!PAGE_ALIGNED(addr)) {
>                  start &= PAGE_MASK;
> @@ -51,10 +53,14 @@ static int change_memory_common(unsigned long addr,
>                  WARN_ON_ONCE(1);
>          }
>
> -       if (start < MODULES_VADDR || start >= MODULES_END)
> -               return -EINVAL;
> -
> -       if (end < MODULES_VADDR || end >= MODULES_END)
> +       /*
> +        * Check whether the [addr, addr + size) interval is entirely
> +        * covered by precisely one VM area that has the VM_ALLOC flag set
> +        */
> +       area = find_vm_area((void *)addr);
> +       if (!area ||
> +           end > (unsigned long)area->addr + area->size ||
> +           !(area->flags & VM_ALLOC))
>                  return -EINVAL;
>
>          data.set_mask = set_mask;
>
>

The patch works in my test for both vmalloc and module spaces. Can we either
keep my comment or add another that explains why we are excluding the rest
of the kernel range?

Thanks,
Laura

  
>>
>>> ---
>>>   arch/arm64/mm/pageattr.c | 25 +++++++++++++++++++++----
>>>   1 file changed, 21 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
>>> index 3571c73..274208e 100644
>>> --- a/arch/arm64/mm/pageattr.c
>>> +++ b/arch/arm64/mm/pageattr.c
>>> @@ -36,6 +36,26 @@ static int change_page_range(pte_t *ptep, pgtable_t token, unsigned long addr,
>>>          return 0;
>>>   }
>>>
>>> +static bool validate_addr(unsigned long start, unsigned long end)
>>> +{
>>> +       /*
>>> +        * This check explicitly excludes most kernel memory. Most kernel
>>> +        * memory is mapped with a larger page size and breaking down the
>>> +        * larger page size without causing TLB conflicts is very difficult.
>>> +        *
>>> +        * If you need to call set_memory_* on a range, the recommendation is
>>> +        * to use vmalloc since that range is mapped with pages.
>>> +        */
>>> +       if (start >= MODULES_VADDR && start < MODULES_END &&
>>> +           end >= MODULES_VADDR && end < MODULES_END)
>>> +               return true;
>>> +
>>> +       if (is_vmalloc_addr((void *)start) && is_vmalloc_addr((void *)end))
>>> +               return true;
>>> +
>>> +       return false;
>>> +}
>>> +
>>>   static int change_memory_common(unsigned long addr, int numpages,
>>>                                  pgprot_t set_mask, pgprot_t clear_mask)
>>>   {
>>> @@ -51,10 +71,7 @@ static int change_memory_common(unsigned long addr, int numpages,
>>>                  WARN_ON_ONCE(1);
>>>          }
>>>
>>> -       if (start < MODULES_VADDR || start >= MODULES_END)
>>> -               return -EINVAL;
>>> -
>>> -       if (end < MODULES_VADDR || end >= MODULES_END)
>>> +       if (!validate_addr(start, end))
>>>                  return -EINVAL;
>>>
>>>          data.set_mask = set_mask;
>>> --
>>> 2.5.0
>>>
>>>


* Re: [PATCH] arm64: Allow vmalloc regions to be set with set_memory_*
  2016-01-13 16:10   ` Ard Biesheuvel
  2016-01-14 23:01     ` Laura Abbott
@ 2016-01-18 11:56     ` Mark Rutland
  2016-01-28  1:47       ` Xishi Qiu
  1 sibling, 1 reply; 15+ messages in thread
From: Mark Rutland @ 2016-01-18 11:56 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Laura Abbott, Catalin Marinas, Will Deacon, Kees Cook,
	linux-kernel, linux-arm-kernel

On Wed, Jan 13, 2016 at 05:10:31PM +0100, Ard Biesheuvel wrote:
> On 13 January 2016 at 15:03, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> > On 12 January 2016 at 22:46, Laura Abbott <labbott@fedoraproject.org> wrote:
> >>
> >> The range of set_memory_* is currently restricted to the module address
> >> range because of difficulties in breaking down larger block sizes.
> >> vmalloc maps PAGE_SIZE pages so it is safe to use as well. Update the
> >> function ranges and add a comment explaining why the range is restricted
> >> the way it is.
> >>
> >> Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
> >> ---
> >> This should let the protections for eBPF work as expected; I don't
> >> know if there is some sort of self-test for that.
> >
> >
> > This is going to conflict with my KASLR implementation, since it puts
> > the kernel image right in the middle of the vmalloc area, and the
> > kernel is obviously mapped with block mappings. In fact, I am
> > proposing enabling huge-vmap for arm64 as well, since it seems an
> > improvement generally, but also specifically allows me to unmap the
> > __init section using the generic vunmap code (remove_vm_area). But in
> > general, I think the assumption that the whole vmalloc area is mapped
> > using pages is not tenable.
> >
> > AFAICT, vmalloc still uses pages exclusively even with huge-vmap (but
> > ioremap does not). So perhaps it would make sense to check for the
> > VM_ALLOC bit in the VMA flags (which I will not set for the kernel
> > regions either)
> >
> 
> Something along these lines, perhaps?
> 
> diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
> index 3571c7309c5e..bda0a776c58e 100644
> --- a/arch/arm64/mm/pageattr.c
> +++ b/arch/arm64/mm/pageattr.c
> @@ -13,6 +13,7 @@
>  #include <linux/kernel.h>
>  #include <linux/mm.h>
>  #include <linux/module.h>
> +#include <linux/vmalloc.h>
>  #include <linux/sched.h>
> 
>  #include <asm/pgtable.h>
> @@ -44,6 +45,7 @@ static int change_memory_common(unsigned long addr
>         unsigned long end = start + size;
>         int ret;
>         struct page_change_data data;
> +       struct vm_struct *area;
> 
>         if (!PAGE_ALIGNED(addr)) {
>                 start &= PAGE_MASK;
> @@ -51,10 +53,14 @@ static int change_memory_common(unsigned long addr,
>                 WARN_ON_ONCE(1);
>         }
> 
> -       if (start < MODULES_VADDR || start >= MODULES_END)
> -               return -EINVAL;
> -
> -       if (end < MODULES_VADDR || end >= MODULES_END)
> +       /*
> +        * Check whether the [addr, addr + size) interval is entirely
> +        * covered by precisely one VM area that has the VM_ALLOC flag set
> +        */
> +       area = find_vm_area((void *)addr);
> +       if (!area ||
> +           end > (unsigned long)area->addr + area->size ||
> +           !(area->flags & VM_ALLOC))
>                 return -EINVAL;
> 
>         data.set_mask = set_mask;

Neat. That fixes the fencepost bug too (the old check compared the
exclusive end address with >= MODULES_END, wrongly rejecting a range
that ends exactly at MODULES_END).

Looks good to me, though as Laura suggested we should have a comment as
to why we limit changes to such regions. Fancy taking her wording below
and spinning this as a patch?

> >> +       /*
> >> +        * This check explicitly excludes most kernel memory. Most kernel
> >> +        * memory is mapped with a larger page size and breaking down the
> >> +        * larger page size without causing TLB conflicts is very difficult.
> >> +        *
> >> +        * If you need to call set_memory_* on a range, the recommendation is
> >> +        * to use vmalloc since that range is mapped with pages.
> >> +        */

Thanks,
Mark.


* Re: [PATCH] arm64: Allow vmalloc regions to be set with set_memory_*
  2016-01-18 11:56     ` Mark Rutland
@ 2016-01-28  1:47       ` Xishi Qiu
  2016-01-28 10:51         ` Mark Rutland
  0 siblings, 1 reply; 15+ messages in thread
From: Xishi Qiu @ 2016-01-28  1:47 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Ard Biesheuvel, Laura Abbott, Catalin Marinas, Will Deacon,
	Kees Cook, linux-kernel, linux-arm-kernel, zhong jiang

On 2016/1/18 19:56, Mark Rutland wrote:

> On Wed, Jan 13, 2016 at 05:10:31PM +0100, Ard Biesheuvel wrote:
>> On 13 January 2016 at 15:03, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>>> On 12 January 2016 at 22:46, Laura Abbott <labbott@fedoraproject.org> wrote:
>>>>
>>>> The range of set_memory_* is currently restricted to the module address
>>>> range because of difficulties in breaking down larger block sizes.
>>>> vmalloc maps PAGE_SIZE pages so it is safe to use as well. Update the
>>>> function ranges and add a comment explaining why the range is restricted
>>>> the way it is.
>>>>
>>>> Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
>>>> ---
>>>> This should let the protections for eBPF work as expected; I don't
>>>> know if there is some sort of self-test for that.
>>>
>>>
>>> This is going to conflict with my KASLR implementation, since it puts
>>> the kernel image right in the middle of the vmalloc area, and the
>>> kernel is obviously mapped with block mappings. In fact, I am
>>> proposing enabling huge-vmap for arm64 as well, since it seems an
>>> improvement generally, but also specifically allows me to unmap the
>>> __init section using the generic vunmap code (remove_vm_area). But in
>>> general, I think the assumption that the whole vmalloc area is mapped
>>> using pages is not tenable.
>>>
>>> AFAICT, vmalloc still uses pages exclusively even with huge-vmap (but
>>> ioremap does not). So perhaps it would make sense to check for the
>>> VM_ALLOC bit in the VMA flags (which I will not set for the kernel
>>> regions either)
>>>
>>
>> Something along these lines, perhaps?
>>
>> diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
>> index 3571c7309c5e..bda0a776c58e 100644
>> --- a/arch/arm64/mm/pageattr.c
>> +++ b/arch/arm64/mm/pageattr.c
>> @@ -13,6 +13,7 @@
>>  #include <linux/kernel.h>
>>  #include <linux/mm.h>
>>  #include <linux/module.h>
>> +#include <linux/vmalloc.h>
>>  #include <linux/sched.h>
>>
>>  #include <asm/pgtable.h>
>> @@ -44,6 +45,7 @@ static int change_memory_common(unsigned long addr
>>         unsigned long end = start + size;
>>         int ret;
>>         struct page_change_data data;
>> +       struct vm_struct *area;
>>
>>         if (!PAGE_ALIGNED(addr)) {
>>                 start &= PAGE_MASK;
>> @@ -51,10 +53,14 @@ static int change_memory_common(unsigned long addr,
>>                 WARN_ON_ONCE(1);
>>         }
>>
>> -       if (start < MODULES_VADDR || start >= MODULES_END)
>> -               return -EINVAL;
>> -
>> -       if (end < MODULES_VADDR || end >= MODULES_END)
>> +       /*
>> +        * Check whether the [addr, addr + size) interval is entirely
>> +        * covered by precisely one VM area that has the VM_ALLOC flag set
>> +        */
>> +       area = find_vm_area((void *)addr);
>> +       if (!area ||
>> +           end > (unsigned long)area->addr + area->size ||
>> +           !(area->flags & VM_ALLOC))
>>                 return -EINVAL;
>>
>>         data.set_mask = set_mask;
> 
> Neat. That fixes the fencepost bug too.
> 
> Looks good to me, though as Laura suggested we should have a comment as
> to why we limit changes to such regions. Fancy taking her wording below
> and spinning this as a patch?
> 
>>>> +       /*
>>>> +        * This check explicitly excludes most kernel memory. Most kernel
>>>> +        * memory is mapped with a larger page size and breaking down the
>>>> +        * larger page size without causing TLB conflicts is very difficult.
>>>> +        *
>>>> +        * If you need to call set_memory_* on a range, the recommendation is
>>>> +        * to use vmalloc since that range is mapped with pages.
>>>> +        */
> 
> Thanks,
> Mark.
> 

Hi Mark,

After changing the flag, it calls only flush_tlb_kernel_range(), so why not use
cpu_replace_ttbr1(swapper_pg_dir)?

One more question: does the TLB conflict only affect the kernel page table?
There is no problem when splitting a transparent hugepage, right?

Thanks,
Xishi Qiu

> .
> 


* Re: [PATCH] arm64: Allow vmalloc regions to be set with set_memory_*
  2016-01-28  1:47       ` Xishi Qiu
@ 2016-01-28 10:51         ` Mark Rutland
  2016-01-28 11:47           ` Xishi Qiu
  0 siblings, 1 reply; 15+ messages in thread
From: Mark Rutland @ 2016-01-28 10:51 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Ard Biesheuvel, Laura Abbott, Catalin Marinas, Will Deacon,
	Kees Cook, linux-kernel, linux-arm-kernel, zhong jiang,
	steve.capper

On Thu, Jan 28, 2016 at 09:47:20AM +0800, Xishi Qiu wrote:
> On 2016/1/18 19:56, Mark Rutland wrote:
> > On Wed, Jan 13, 2016 at 05:10:31PM +0100, Ard Biesheuvel wrote:
> >> Something along these lines, perhaps?
> >>
> >> diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
> >> index 3571c7309c5e..bda0a776c58e 100644
> >> --- a/arch/arm64/mm/pageattr.c
> >> +++ b/arch/arm64/mm/pageattr.c
> >> @@ -13,6 +13,7 @@
> >>  #include <linux/kernel.h>
> >>  #include <linux/mm.h>
> >>  #include <linux/module.h>
> >> +#include <linux/vmalloc.h>
> >>  #include <linux/sched.h>
> >>
> >>  #include <asm/pgtable.h>
> >> @@ -44,6 +45,7 @@ static int change_memory_common(unsigned long addr
> >>         unsigned long end = start + size;
> >>         int ret;
> >>         struct page_change_data data;
> >> +       struct vm_struct *area;
> >>
> >>         if (!PAGE_ALIGNED(addr)) {
> >>                 start &= PAGE_MASK;
> >> @@ -51,10 +53,14 @@ static int change_memory_common(unsigned long addr,
> >>                 WARN_ON_ONCE(1);
> >>         }
> >>
> >> -       if (start < MODULES_VADDR || start >= MODULES_END)
> >> -               return -EINVAL;
> >> -
> >> -       if (end < MODULES_VADDR || end >= MODULES_END)
> >> +       /*
> >> +        * Check whether the [addr, addr + size) interval is entirely
> >> +        * covered by precisely one VM area that has the VM_ALLOC flag set
> >> +        */
> >> +       area = find_vm_area((void *)addr);
> >> +       if (!area ||
> >> +           end > (unsigned long)area->addr + area->size ||
> >> +           !(area->flags & VM_ALLOC))
> >>                 return -EINVAL;
> >>
> >>         data.set_mask = set_mask;
> > 
> > Neat. That fixes the fencepost bug too.
> > 
> > Looks good to me, though as Laura suggested we should have a comment as
> > to why we limit changes to such regions. Fancy taking her wording below
> > and spinning this as a patch?
> > 
> >>>> +       /*
> >>>> +        * This check explicitly excludes most kernel memory. Most kernel
> >>>> +        * memory is mapped with a larger page size and breaking down the
> >>>> +        * larger page size without causing TLB conflicts is very difficult.
> >>>> +        *
> >>>> +        * If you need to call set_memory_* on a range, the recommendation is
> >>>> +        * to use vmalloc since that range is mapped with pages.
> >>>> +        */
> > 
> > Thanks,
> > Mark.
> > 
> 
> Hi Mark,
> 
> After changing the flag, it calls only flush_tlb_kernel_range(), so why not use
> cpu_replace_ttbr1(swapper_pg_dir)?

We cannot use cpu_replace_ttbr1 here. Other CPUs may be online, and we
have no mechanism to place them in a safe set of page tables while
swapping TTBR1; we'd have to perform a deep copy of the tables, and that
would be horrendously expensive.

Using flush_tlb_kernel_range() is sufficient. As modules don't share a
page or section mapping with other users, we can follow a
Break-Before-Make approach. Additionally, they're mapped at page
granularity so we never split or fuse sections anyway. We only modify
the permission bits.
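
For reference, a simplified sketch of what the pte update in
arch/arm64/mm/pageattr.c boils down to (not the verbatim source):

	static int change_page_range(pte_t *ptep, pgtable_t token,
				     unsigned long addr, void *data)
	{
		struct page_change_data *cdata = data;
		pte_t pte = *ptep;

		/* Only the permission bits change; the output address
		 * and the page-granular entry stay the same, so nothing
		 * is ever split or fused. */
		pte = clear_pte_bit(pte, cdata->clear_mask);
		pte = set_pte_bit(pte, cdata->set_mask);
		set_pte(ptep, pte);
		return 0;
	}

	/* The caller applies this with apply_to_page_range(&init_mm, ...)
	 * and then calls flush_tlb_kernel_range(start, end). */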

> One more question: does the TLB conflict only affect the kernel page table?

It's harder to solve for the text/linear map as we can't do
Break-Before-Make without potentially unmapping something in active use
(e.g. the code used to implement Break-Before-Make).

> There is no problem when splitting a transparent hugepage, right?

There was a potential problem with huge pages causing TLB conflicts,
which didn't always seem to follow a Break-Before-Make approach.

I believe that Kirill Shutemov's recent THP rework means that
section->table and table->section conversions always go via an invalid
entry, with appropriate TLB invalidation, making that safe. I have not
yet had the chance to verify that, however.
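
Schematically, Break-Before-Make for a kernel mapping means going via an
invalid entry with TLB invalidation in between (illustrative snippet,
variable names made up):

	pte_clear(&init_mm, addr, ptep);		/* break */
	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);	/* invalidate */
	set_pte(ptep, new_pte);				/* make */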

Thanks,
Mark.


* Re: [PATCH] arm64: Allow vmalloc regions to be set with set_memory_*
  2016-01-28 10:51         ` Mark Rutland
@ 2016-01-28 11:47           ` Xishi Qiu
  2016-01-28 14:27             ` Mark Rutland
  0 siblings, 1 reply; 15+ messages in thread
From: Xishi Qiu @ 2016-01-28 11:47 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Ard Biesheuvel, Laura Abbott, Catalin Marinas, Will Deacon,
	Kees Cook, linux-kernel, linux-arm-kernel, zhong jiang,
	steve.capper

On 2016/1/28 18:51, Mark Rutland wrote:

> On Thu, Jan 28, 2016 at 09:47:20AM +0800, Xishi Qiu wrote:
>> On 2016/1/18 19:56, Mark Rutland wrote:
>>> On Wed, Jan 13, 2016 at 05:10:31PM +0100, Ard Biesheuvel wrote:
>>>> Something along these lines, perhaps?
>>>>
>>>> diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
>>>> index 3571c7309c5e..bda0a776c58e 100644
>>>> --- a/arch/arm64/mm/pageattr.c
>>>> +++ b/arch/arm64/mm/pageattr.c
>>>> @@ -13,6 +13,7 @@
>>>>  #include <linux/kernel.h>
>>>>  #include <linux/mm.h>
>>>>  #include <linux/module.h>
>>>> +#include <linux/vmalloc.h>
>>>>  #include <linux/sched.h>
>>>>
>>>>  #include <asm/pgtable.h>
>>>> @@ -44,6 +45,7 @@ static int change_memory_common(unsigned long addr
>>>>         unsigned long end = start + size;
>>>>         int ret;
>>>>         struct page_change_data data;
>>>> +       struct vm_struct *area;
>>>>
>>>>         if (!PAGE_ALIGNED(addr)) {
>>>>                 start &= PAGE_MASK;
>>>> @@ -51,10 +53,14 @@ static int change_memory_common(unsigned long addr,
>>>>                 WARN_ON_ONCE(1);
>>>>         }
>>>>
>>>> -       if (start < MODULES_VADDR || start >= MODULES_END)
>>>> -               return -EINVAL;
>>>> -
>>>> -       if (end < MODULES_VADDR || end >= MODULES_END)
>>>> +       /*
>>>> +        * Check whether the [addr, addr + size) interval is entirely
>>>> +        * covered by precisely one VM area that has the VM_ALLOC flag set
>>>> +        */
>>>> +       area = find_vm_area((void *)addr);
>>>> +       if (!area ||
>>>> +           end > (unsigned long)area->addr + area->size ||
>>>> +           !(area->flags & VM_ALLOC))
>>>>                 return -EINVAL;
>>>>
>>>>         data.set_mask = set_mask;
>>>
>>> Neat. That fixes the fencepost bug too.
>>>
>>> Looks good to me, though as Laura suggested we should have a comment as
>>> to why we limit changes to such regions. Fancy taking her wording below
>>> and spinning this as a patch?
>>>
>>>>>> +       /*
>>>>>> +        * This check explicitly excludes most kernel memory. Most kernel
>>>>>> +        * memory is mapped with a larger page size and breaking down the
>>>>>> +        * larger page size without causing TLB conflicts is very difficult.
>>>>>> +        *
>>>>>> +        * If you need to call set_memory_* on a range, the recommendation is
>>>>>> +        * to use vmalloc since that range is mapped with pages.
>>>>>> +        */
>>>
>>> Thanks,
>>> Mark.
>>>
>>
>> Hi Mark,
>>
>> After changing the flag, it calls only flush_tlb_kernel_range(), so why not use
>> cpu_replace_ttbr1(swapper_pg_dir)?
> 
> We cannot use cpu_replace_ttbr1 here. Other CPUs may be online, and we
> have no mechanism to place them in a safe set of page tables while
> swapping TTBR1; we'd have to perform a deep copy of the tables, and that
> would be horrendously expensive.
> 
> Using flush_tlb_kernel_range() is sufficient. As modules don't share a
> page or section mapping with other users, we can follow a
> Break-Before-Make approach. Additionally, they're mapped at page
> granularity so we never split or fuse sections anyway. We only modify
> the permission bits.
> 

Hi Mark,

Is it safe in the following path?

alloc the whole linear map section
cpu A write something on it
cpu B write something on it
cpu C set read only flag and call flush_tlb_kernel_range()

Thanks,
Xishi Qiu

>> One more question: does the TLB conflict only affect the kernel page table?
> 
> It's harder to solve for the text/linear map as we can't do
> Break-Before-Make without potentially unmapping something in active use
> (e.g. the code used to implement Break-Before-Make).
> 
>> There is no problem when splitting a transparent hugepage, right?
> 
> There was a potential problem with huge pages causing TLB conflicts,
> which didn't always seem to follow a Break-Before-Make approach.
> 
> I believe that Kirill Shutemov's recent THP rework means that
> section->table and table->section conversions always go via an invalid
> entry, with appropriate TLB invalidation, making that safe. I have not
> yet had the chance to verify that yet, however.
> 
> Thanks,
> Mark.
> 
> .
> 


* Re: [PATCH] arm64: Allow vmalloc regions to be set with set_memory_*
  2016-01-28 11:47           ` Xishi Qiu
@ 2016-01-28 14:27             ` Mark Rutland
  2016-01-29  1:21               ` Xishi Qiu
  0 siblings, 1 reply; 15+ messages in thread
From: Mark Rutland @ 2016-01-28 14:27 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Ard Biesheuvel, Laura Abbott, Catalin Marinas, Will Deacon,
	Kees Cook, linux-kernel, linux-arm-kernel, zhong jiang,
	steve.capper

On Thu, Jan 28, 2016 at 07:47:09PM +0800, Xishi Qiu wrote:
> On 2016/1/28 18:51, Mark Rutland wrote:
> 
> > On Thu, Jan 28, 2016 at 09:47:20AM +0800, Xishi Qiu wrote:
> >> On 2016/1/18 19:56, Mark Rutland wrote:
> >>> On Wed, Jan 13, 2016 at 05:10:31PM +0100, Ard Biesheuvel wrote:
> >>>> Something along these lines, perhaps?
> >>>>
> >>>> diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
> >>>> index 3571c7309c5e..bda0a776c58e 100644
> >>>> --- a/arch/arm64/mm/pageattr.c
> >>>> +++ b/arch/arm64/mm/pageattr.c
> >>>> @@ -13,6 +13,7 @@
> >>>>  #include <linux/kernel.h>
> >>>>  #include <linux/mm.h>
> >>>>  #include <linux/module.h>
> >>>> +#include <linux/vmalloc.h>
> >>>>  #include <linux/sched.h>
> >>>>
> >>>>  #include <asm/pgtable.h>
> >>>> @@ -44,6 +45,7 @@ static int change_memory_common(unsigned long addr
> >>>>         unsigned long end = start + size;
> >>>>         int ret;
> >>>>         struct page_change_data data;
> >>>> +       struct vm_struct *area;
> >>>>
> >>>>         if (!PAGE_ALIGNED(addr)) {
> >>>>                 start &= PAGE_MASK;
> >>>> @@ -51,10 +53,14 @@ static int change_memory_common(unsigned long addr,
> >>>>                 WARN_ON_ONCE(1);
> >>>>         }
> >>>>
> >>>> -       if (start < MODULES_VADDR || start >= MODULES_END)
> >>>> -               return -EINVAL;
> >>>> -
> >>>> -       if (end < MODULES_VADDR || end >= MODULES_END)
> >>>> +       /*
> >>>> +        * Check whether the [addr, addr + size) interval is entirely
> >>>> +        * covered by precisely one VM area that has the VM_ALLOC flag set
> >>>> +        */
> >>>> +       area = find_vm_area((void *)addr);
> >>>> +       if (!area ||
> >>>> +           end > (unsigned long)area->addr + area->size ||
> >>>> +           !(area->flags & VM_ALLOC))
> >>>>                 return -EINVAL;
> >>>>
> >>>>         data.set_mask = set_mask;
> >>>
> >>> Neat. That fixes the fencepost bug too.
> >>>
> >>> Looks good to me, though as Laura suggested we should have a comment as
> >>> to why we limit changes to such regions. Fancy taking her wording below
> >>> and spinning this as a patch?
> >>>
> >>>>>> +       /*
> >>>>>> +        * This check explicitly excludes most kernel memory. Most kernel
> >>>>>> +        * memory is mapped with a larger page size and breaking down the
> >>>>>> +        * larger page size without causing TLB conflicts is very difficult.
> >>>>>> +        *
> >>>>>> +        * If you need to call set_memory_* on a range, the recommendation is
> >>>>>> +        * to use vmalloc since that range is mapped with pages.
> >>>>>> +        */
> >>>
> >>> Thanks,
> >>> Mark.
> >>>
> >>
> >> Hi Mark,
> >>
> >> After changing the flag, it calls only flush_tlb_kernel_range(), so why not use
> >> cpu_replace_ttbr1(swapper_pg_dir)?
> > 
> > We cannot use cpu_replace_ttbr1 here. Other CPUs may be online, and we
> > have no mechanism to place them in a safe set of page tables while
> > swapping TTBR1; we'd have to perform a deep copy of the tables, and that
> > would be horrendously expensive.
> > 
> > Using flush_tlb_kernel_range() is sufficient. As modules don't share a
> > page or section mapping with other users, we can follow a
> > Break-Before-Make approach. Additionally, they're mapped at page
> > granularity so we never split or fuse sections anyway. We only modify
> > the permission bits.
> > 
> 
> Hi Mark,
> 
> Is it safe in the following path?
> 
> alloc the whole linear map section

I don't understand what you mean by this; you will need to elaborate.
The terms "alloc" and "section" can mean a number of different things in
this context.

> cpu A write something on it
> cpu B write something on it
> cpu C set read only flag and call flush_tlb_kernel_range()

If you want to modify a portion of the linear map, this will not work.
Modifying the linear map in this manner is not safe.

If you want an alias of the linear map which was mapped using pages, and
you wanted to change that alias, that could work.

It depends on what precisely you are trying to do.
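
As a hypothetical sketch of the alias approach (region base and size are
made up for illustration): map the backing pages a second time at page
granularity with vmap() and change only that alias. Note that the
VM_ALLOC-only check proposed earlier in the thread would still reject a
vmap()'d area (which carries VM_MAP), so the check would need extending
for set_memory_* to accept such an alias:

	phys_addr_t phys = region_base;	/* hypothetical region base */
	struct page *pages[16];
	void *alias;
	int i;

	for (i = 0; i < 16; i++)
		pages[i] = phys_to_page(phys + i * PAGE_SIZE);

	/* page-granular alias: changing its permissions never requires
	 * splitting a block mapping */
	alias = vmap(pages, 16, VM_MAP, PAGE_KERNEL);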

Thanks,
Mark.


* Re: [PATCH] arm64: Allow vmalloc regions to be set with set_memory_*
  2016-01-28 14:27             ` Mark Rutland
@ 2016-01-29  1:21               ` Xishi Qiu
  2016-01-29 11:02                 ` Mark Rutland
  0 siblings, 1 reply; 15+ messages in thread
From: Xishi Qiu @ 2016-01-29  1:21 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Ard Biesheuvel, Laura Abbott, Catalin Marinas, Will Deacon,
	Kees Cook, linux-kernel, linux-arm-kernel, zhong jiang,
	steve.capper

On 2016/1/28 22:27, Mark Rutland wrote:

> On Thu, Jan 28, 2016 at 07:47:09PM +0800, Xishi Qiu wrote:
>> On 2016/1/28 18:51, Mark Rutland wrote:
>>
>>> On Thu, Jan 28, 2016 at 09:47:20AM +0800, Xishi Qiu wrote:
>>>> On 2016/1/18 19:56, Mark Rutland wrote:
>>>>> On Wed, Jan 13, 2016 at 05:10:31PM +0100, Ard Biesheuvel wrote:
>>>>>> Something along these lines, perhaps?
>>>>>>
>>>>>> diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
>>>>>> index 3571c7309c5e..bda0a776c58e 100644
>>>>>> --- a/arch/arm64/mm/pageattr.c
>>>>>> +++ b/arch/arm64/mm/pageattr.c
>>>>>> @@ -13,6 +13,7 @@
>>>>>>  #include <linux/kernel.h>
>>>>>>  #include <linux/mm.h>
>>>>>>  #include <linux/module.h>
>>>>>> +#include <linux/vmalloc.h>
>>>>>>  #include <linux/sched.h>
>>>>>>
>>>>>>  #include <asm/pgtable.h>
>>>>>> @@ -44,6 +45,7 @@ static int change_memory_common(unsigned long addr
>>>>>>         unsigned long end = start + size;
>>>>>>         int ret;
>>>>>>         struct page_change_data data;
>>>>>> +       struct vm_struct *area;
>>>>>>
>>>>>>         if (!PAGE_ALIGNED(addr)) {
>>>>>>                 start &= PAGE_MASK;
>>>>>> @@ -51,10 +53,14 @@ static int change_memory_common(unsigned long addr,
>>>>>>                 WARN_ON_ONCE(1);
>>>>>>         }
>>>>>>
>>>>>> -       if (start < MODULES_VADDR || start >= MODULES_END)
>>>>>> -               return -EINVAL;
>>>>>> -
>>>>>> -       if (end < MODULES_VADDR || end >= MODULES_END)
>>>>>> +       /*
>>>>>> +        * Check whether the [addr, addr + size) interval is entirely
>>>>>> +        * covered by precisely one VM area that has the VM_ALLOC flag set
>>>>>> +        */
>>>>>> +       area = find_vm_area((void *)addr);
>>>>>> +       if (!area ||
>>>>>> +           end > (unsigned long)area->addr + area->size ||
>>>>>> +           !(area->flags & VM_ALLOC))
>>>>>>                 return -EINVAL;
>>>>>>
>>>>>>         data.set_mask = set_mask;
>>>>>
>>>>> Neat. That fixes the fencepost bug too.
>>>>>
>>>>> Looks good to me, though as Laura suggested we should have a comment as
>>>>> to why we limit changes to such regions. Fancy taking her wording below
>>>>> and spinning this as a patch?
>>>>>
>>>>>>>> +       /*
>>>>>>>> +        * This check explicitly excludes most kernel memory. Most kernel
>>>>>>>> +        * memory is mapped with a larger page size and breaking down the
>>>>>>>> +        * larger page size without causing TLB conflicts is very difficult.
>>>>>>>> +        *
>>>>>>>> +        * If you need to call set_memory_* on a range, the recommendation is
>>>>>>>> +        * to use vmalloc since that range is mapped with pages.
>>>>>>>> +        */
>>>>>
>>>>> Thanks,
>>>>> Mark.
>>>>>
>>>>
>>>> Hi Mark,
>>>>
>>>> After changing the flag, it calls only flush_tlb_kernel_range(), so why not use
>>>> cpu_replace_ttbr1(swapper_pg_dir)?
>>>
>>> We cannot use cpu_replace_ttbr1 here. Other CPUs may be online, and we
>>> have no mechanism to place them in a safe set of page tables while
>>> swapping TTBR1; we'd have to perform a deep copy of the tables, and that
>>> would be horrendously expensive.
>>>
>>> Using flush_tlb_kernel_range() is sufficient. As modules don't share a
>>> page or section mapping with other users, we can follow a
>>> Break-Before-Make approach. Additionally, they're mapped at page
>>> granularity so we never split or fuse sections anyway. We only modify
>>> the permission bits.
>>>
>>
>> Hi Mark,
>>
>> Is it safe in the following path?
>>
>> alloc the whole linear map section
> 
> I don't understand what you mean by this; you will need to elaborate.
> The terms "alloc" and "section" can mean a number of different things in
> this context.
> 
>> cpu A write something on it
>> cpu B write something on it
>> cpu C set read only flag and call flush_tlb_kernel_range()
> 
> If you want to modify a portion of the linear map, this will not work.
> Modifying the linear map in this manner is not safe.
> 
> If you want an alias of the linear map which was mapped using pages, and
> you wanted to change that alias, that could work.
> 

Hi Mark,

I mean I change the whole section (maybe 1GB?) in the linear map.

In our software, the kernel creates a mapping (linear map) on special memory,
and it is separated from the buddy system; the service manages the special
memory itself.
And the service allocates/frees the memory based on the physical address, so if
the service wants to change the prot dynamically, vmalloc doesn't work, and
fixmap is a little complex.

I think if I map the special memory with 4KB pages, then the service could
use set_memory_ro() directly, right?

Thanks,
Xishi Qiu

> It depends on what precisely you are trying to do.
> 
> Thanks,
> Mark.
> 
> .
> 


* Re: [PATCH] arm64: Allow vmalloc regions to be set with set_memory_*
  2016-01-29  1:21               ` Xishi Qiu
@ 2016-01-29 11:02                 ` Mark Rutland
  2016-01-30  2:48                   ` Xishi Qiu
  0 siblings, 1 reply; 15+ messages in thread
From: Mark Rutland @ 2016-01-29 11:02 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Ard Biesheuvel, Laura Abbott, Catalin Marinas, Will Deacon,
	Kees Cook, linux-kernel, linux-arm-kernel, zhong jiang,
	steve.capper

On Fri, Jan 29, 2016 at 09:21:40AM +0800, Xishi Qiu wrote:
> On 2016/1/28 22:27, Mark Rutland wrote:
> > On Thu, Jan 28, 2016 at 07:47:09PM +0800, Xishi Qiu wrote:
> >> Hi Mark,
> >>
> >> Is it safe in the following path?
> >>
> >> alloc the whole linear map section
> > 
> > I don't understand what you mean by this; you will need to elaborate.
> > The terms "alloc" and "section" can mean a number of different things in
> > this context.
> > 
> >> cpu A write something on it
> >> cpu B write something on it
> >> cpu C set read only flag and call flush_tlb_kernel_range()
> > 
> > If you want to modify a portion of the linear map, this will not work.
> > Modifying the linear map in this manner is not safe.
> > 
> > If you want an alias of the linear map which was mapped using pages, and
> > you wanted to change that alias, that could work.
> > 
> 
> Hi Mark,
> 
> I mean I change the whole section (maybe 1GB?) in the linear map.

If you mean something that was mapped with a section (i.e. a block entry
in some level of page table), then no. The linear map is not open to
this kind of change, as portions of the region may be in use elsewhere
within Linux.

> In our software, the kernel creates a mapping (linear map) on special
> memory, and it is separated from the buddy system; the service manages
> the special memory itself.

This is not what the linear map is for. What exactly is this "special
memory"?

Is it some persistent memory?

Is it normal memory that you wish to use for some communication with
other agents and/or DMA?

Is it normal memory that you simply have a special use-case for?

> And the service allocates/frees the memory based on the physical address, so if
> the service wants to change the prot dynamically, vmalloc doesn't work, and
> fixmap is a little complex.

Without further explanation of your use-case, this doesn't make sense to
me. I don't understand why the physical address matters -- that implies
you have other agents accessing this memory. If that's the case, I don't
see what changing the permissions buys you.

Regardless, it sounds like either we're missing some infrastructure, or
you are mis-using existing APIs.

> I think if I map the special memory with 4KB pages, then the service could
> use set_memory_ro() directly, right?

Perhaps. If it's a vmalloc'd area, then yes (once Ard's patch to allow
that is in). I have more general concerns with your approach, in that I
still do not understand the problem you are trying to solve.

Thanks,
Mark.


* Re: [PATCH] arm64: Allow vmalloc regions to be set with set_memory_*
  2016-01-29 11:02                 ` Mark Rutland
@ 2016-01-30  2:48                   ` Xishi Qiu
  2016-02-03 13:43                     ` Mark Rutland
  0 siblings, 1 reply; 15+ messages in thread
From: Xishi Qiu @ 2016-01-30  2:48 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Ard Biesheuvel, Laura Abbott, Catalin Marinas, Will Deacon,
	Kees Cook, linux-kernel, linux-arm-kernel, zhong jiang,
	steve.capper, Hanjun Guo

On 2016/1/29 19:02, Mark Rutland wrote:

> On Fri, Jan 29, 2016 at 09:21:40AM +0800, Xishi Qiu wrote:
>> On 2016/1/28 22:27, Mark Rutland wrote:
>>> On Thu, Jan 28, 2016 at 07:47:09PM +0800, Xishi Qiu wrote:
>>>> Hi Mark,
>>>>
>>>> Is it safe in the following path?
>>>>
>>>> alloc the whole linear map section
>>>
>>> I don't understand what you mean by this; you will need to elaborate.
>>> The terms "alloc" and "section" can mean a number of different things in
>>> this context.
>>>
>>>> cpu A write something on it
>>>> cpu B write something on it
>>>> cpu C set read only flag and call flush_tlb_kernel_range()
>>>
>>> If you want to modify a portion of the linear map, this will not work.
>>> Modifying the linear map in this manner is not safe.
>>>
>>> If you want an alias of the linear map which was mapped using pages, and
>>> you wanted to change that alias, that could work.
>>>
>>
>> Hi Mark,
>>
>> I mean I change the whole section (maybe 1GB?) in the linear map.
> 
> If you mean something that was mapped with a section (i.e. a block entry
> in some level of page table), then no. The linear map is not open to
> this kind of change, as portions of the region may be in use elsewhere
> within Linux.
> 
>> In our software, the kernel creates a mapping (linear map) on special
>> memory, and it is separated from the buddy system; the service manages
>> the special memory itself.
> 
> This is not what the linear map is for. What exactly is this "special
> memory"?
> 
> Is it some persistent memory?
> 
> Is it normal memory that you wish to use for some communication with
> other agents and/or DMA?
> 
> Is it normal memory that you simply have a special use-case for?
> 
>> And the service allocates/frees the memory based on the physical address, so if
>> the service wants to change the prot dynamically, vmalloc doesn't work, and
>> fixmap is a little complex.
> 
> Without further explanation of your use-case, this doesn't make sense to
> me. I don't understand why the physical address matters -- that implies
> you have other agents accessing this memory. If that's the case, I don't
> see what changing the permissions buys you.
> 
> Regardless, it sounds like either we're missing some infrastructure, or
> you are mis-using existing APIs.
> 
>> I think if I map the special memory with 4KB pages, then the service could
>> use set_memory_ro() directly, right?
> 
> Perhaps. If it's a vmalloc'd area, then yes (once Ard's patch to allow
> that is in). I have more general concerns with your approach, in that I
> still do not understand the problem you are trying to solve.
> 

Hi Mark,

Thanks for your reply. Maybe I didn't express it clearly; sorry about that.

The abstract process is the following (a sketch of steps 2 and 4 follows
the list):
1. Do not create large block mappings; use 4KB pages for all of the memory
(regardless of performance):
setup_arch->paging_init()->map_mem()->__map_memblock()->create_mapping()
2. Alloc a page and get the linear mapping address.
3. Modify the range check in set_memory_ro() so it can handle the linear
mapping range.
4. Use set_memory_ro() to change the prot flags of the page we got in step 2.
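
A minimal sketch of steps 2 and 4 (assuming step 3 has relaxed the range
check so that linear map addresses are accepted):

	struct page *page = alloc_page(GFP_KERNEL);
	unsigned long addr = (unsigned long)page_address(page);

	/* with a 4KB-only linear map there is no block entry to split;
	 * only the pte permission bits of this one page change */
	set_memory_ro(addr, 1);
	/* ... the service uses the read-only page ... */
	set_memory_rw(addr, 1);		/* make it writable again */
	__free_page(page);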

Is it safe?

Thanks,
Xishi Qiu

> Thanks,
> Mark.
> 
> .
> 


* Re: [PATCH] arm64: Allow vmalloc regions to be set with set_memory_*
  2016-01-30  2:48                   ` Xishi Qiu
@ 2016-02-03 13:43                     ` Mark Rutland
  0 siblings, 0 replies; 15+ messages in thread
From: Mark Rutland @ 2016-02-03 13:43 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Ard Biesheuvel, Laura Abbott, Catalin Marinas, Will Deacon,
	Kees Cook, linux-kernel, linux-arm-kernel, zhong jiang,
	steve.capper, Hanjun Guo

On Sat, Jan 30, 2016 at 10:48:02AM +0800, Xishi Qiu wrote:
> Hi Mark,
> 
> Thanks for your reply. Maybe I didn't express it clearly; sorry about that.
> 
> The abstract process is the following:
> 1. Do not create large block mappings; use 4KB pages for all of the memory (regardless of performance):
> setup_arch->paging_init()->map_mem()->__map_memblock()->create_mapping()
> 2. Alloc a page and get the linear mapping address.
> 3. Modify the range check in set_memory_ro() so it can handle the linear mapping range.
> 4. Use set_memory_ro() to change the prot flags of the page we got in step 2.
> 
> Is it safe?

This will be free from the problems associated with splitting/fusing
sections.

My concern was with what exactly this "special memory" was and how it
was being used. It sounds like either some infrastructure is missing, or
you are using the wrong abstractions.

Thanks,
Mark.


end of thread

Thread overview: 15+ messages
2016-01-12 21:46 [PATCH] arm64: Allow vmalloc regions to be set with set_memory_* Laura Abbott
2016-01-13  0:01 ` Alexei Starovoitov
2016-01-13  0:31   ` Daniel Borkmann
2016-01-13 14:03 ` Ard Biesheuvel
2016-01-13 16:10   ` Ard Biesheuvel
2016-01-14 23:01     ` Laura Abbott
2016-01-18 11:56     ` Mark Rutland
2016-01-28  1:47       ` Xishi Qiu
2016-01-28 10:51         ` Mark Rutland
2016-01-28 11:47           ` Xishi Qiu
2016-01-28 14:27             ` Mark Rutland
2016-01-29  1:21               ` Xishi Qiu
2016-01-29 11:02                 ` Mark Rutland
2016-01-30  2:48                   ` Xishi Qiu
2016-02-03 13:43                     ` Mark Rutland
