* [PATCH] arm64: fix the overlap between the kernel image and vmalloc address
@ 2017-04-24  9:22 zhongjiang
  2017-04-24 10:44 ` Mark Rutland
  0 siblings, 1 reply; 11+ messages in thread
From: zhongjiang @ 2017-04-24  9:22 UTC (permalink / raw)
  To: linux-arm-kernel

From: zhong jiang <zhongjiang@huawei.com>

Recently, xiaojun reported the following issue:

[ 4544.984139] Unable to handle kernel paging request at virtual address ffff804392800000
[ 4544.991995] pgd = ffff80096745f000
[ 4544.995369] [ffff804392800000] *pgd=0000000000000000
[ 4545.000297] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[ 4545.005815] Modules linked in:
[ 4545.008843] CPU: 1 PID: 8976 Comm: cat Not tainted 4.11.0-rc6 #1
[ 4545.014790] Hardware name: ARM Juno development board (r1) (DT)
[ 4545.020653] task: ffff8009753fdb00 task.stack: ffff80097533c000
[ 4545.026520] PC is at __memcpy+0x100/0x180
[ 4545.030491] LR is at vread+0x144/0x280
[ 4545.034202] pc : [<ffff0000083a1000>] lr : [<ffff0000081c126c>] pstate: 20000145
[ 4545.041530] sp : ffff80097533fcb0
[ 4545.044811] x29: ffff80097533fcb0 x28: ffff800962d24000
[ 4545.050074] x27: 0000000000001000 x26: ffff8009753fdb00
[ 4545.055337] x25: ffff000008200000 x24: ffff800977801380
[ 4545.060600] x23: ffff8009753fdb00 x22: ffff800962d24000
[ 4545.065863] x21: 0000000000001000 x20: ffff000008200000
[ 4545.071125] x19: 0000000000001000 x18: 0000ffffefa323c0
[ 4545.076387] x17: 0000ffffa9c87440 x16: ffff0000081fdfd0
[ 4545.081649] x15: 0000ffffa9d01588 x14: 72a77346b2407be7
[ 4545.086911] x13: 5299400690000000 x12: b0000001f9001a79
[ 4545.092173] x11: 97fc098d91042260 x10: 0000000000000000
[ 4545.097435] x9 : 0000000000000000 x8 : 9110626091260021
[ 4545.102698] x7 : 0000000000001000 x6 : ffff800962d24000
[ 4545.107960] x5 : ffff8009778013b0 x4 : 0000000000000000
[ 4545.113222] x3 : 0400000000000001 x2 : 0000000000000f80
[ 4545.118484] x1 : ffff804392800000 x0 : ffff800962d24000
[ 4545.123745]
[ 4545.125220] Process cat (pid: 8976, stack limit = 0xffff80097533c000)
[ 4545.131598] Stack: (0xffff80097533fcb0 to 0xffff800975340000)
[ 4545.137289] fca0:                                   ffff80097533fd30 ffff000008270f64
[ 4545.145049] fcc0: 000000000000e000 000000003956f000 ffff000008f950d0 ffff80097533feb8
[ 4545.152809] fce0: 0000000000002000 ffff8009753fdb00 ffff800962d24000 ffff000008e8d3d8
[ 4545.160568] fd00: 0000000000001000 ffff000008200000 0000000000001000 ffff800962d24000
[ 4545.168327] fd20: 0000000000001000 ffff000008e884a0 ffff80097533fdb0 ffff00000826340c
[ 4545.176086] fd40: ffff800976bf2800 fffffffffffffffb 000000003956d000 ffff80097533feb8
[ 4545.183846] fd60: 0000000060000000 0000000000000015 0000000000000124 000000000000003f
[ 4545.191605] fd80: ffff000008962000 ffff8009753fdb00 ffff8009753fdb00 ffff8009753fdb00
[ 4545.199364] fda0: 0000000300000124 0000000000002000 ffff80097533fdd0 ffff0000081fb83c
[ 4545.207123] fdc0: 0000000000010000 ffff80097514f900 ffff80097533fe50 ffff0000081fcb28
[ 4545.214883] fde0: 0000000000010000 ffff80097514f900 0000000000000000 0000000000000000
[ 4545.222642] fe00: ffff80097533fe30 ffff0000081fca1c ffff80097514f900 0000000000000000
[ 4545.230401] fe20: 000000003956d000 ffff80097533feb8 ffff80097533fe50 ffff0000081fcb04
[ 4545.238160] fe40: 0000000000010000 ffff80097514f900 ffff80097533fe80 ffff0000081fe014
[ 4545.245919] fe60: ffff80097514f900 ffff80097514f900 000000003956d000 0000000000010000
[ 4545.253678] fe80: 0000000000000000 ffff000008082f30 0000000000000000 0000800977146000
[ 4545.261438] fea0: ffffffffffffffff 0000ffffa9c8745c 0000000000000124 0000000008202000
[ 4545.269197] fec0: 0000000000000003 000000003956d000 0000000000010000 0000000000000000
[ 4545.276956] fee0: 0000000000011011 0000000000000001 0000000000000011 0000000000000002
[ 4545.284715] ff00: 000000000000003f 1f3c201f7372686b 00000000ffffffff 0000000000000030
[ 4545.292474] ff20: 0000000000000038 0000000000000000 0000ffffa9bcca94 0000ffffa9d01588
[ 4545.300233] ff40: 0000000000000000 0000ffffa9c87440 0000ffffefa323c0 0000000000010000
[ 4545.307993] ff60: 000000000041a310 000000003956d000 0000000000000003 000000007fffe000
[ 4545.315751] ff80: 00000000004088d0 0000000000010000 0000000000000000 0000000000000000
[ 4545.323511] ffa0: 0000000000010000 0000ffffefa32690 0000000000404dcc 0000ffffefa32690
[ 4545.331270] ffc0: 0000ffffa9c8745c 0000000060000000 0000000000000003 000000000000003f
[ 4545.339029] ffe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 4545.346786] Call trace:
[ 4545.349207] Exception stack(0xffff80097533fae0 to 0xffff80097533fc10)
[ 4545.355586] fae0: 0000000000001000 0001000000000000 ffff80097533fcb0 ffff0000083a1000
[ 4545.363345] fb00: 000000003957c000 ffff80097533fc00 0000000020000145 0000000000000025
[ 4545.371105] fb20: ffff800962d24000 ffff000008e8d3d8 0000000000001000 ffff8009753fdb00
[ 4545.378864] fb40: 0000000000000000 0000000000000002 ffff80097533fd30 ffff000008082604
[ 4545.386623] fb60: 0000000000001000 0001000000000000 ffff80097533fd30 ffff0000083a0a90
[ 4545.394382] fb80: ffff800962d24000 ffff804392800000 0000000000000f80 0400000000000001
[ 4545.402140] fba0: 0000000000000000 ffff8009778013b0 ffff800962d24000 0000000000001000
[ 4545.409899] fbc0: 9110626091260021 0000000000000000 0000000000000000 97fc098d91042260
[ 4545.417658] fbe0: b0000001f9001a79 5299400690000000 72a77346b2407be7 0000ffffa9d01588
[ 4545.425416] fc00: ffff0000081fdfd0 0000ffffa9c87440
[ 4545.430248] [<ffff0000083a1000>] __memcpy+0x100/0x180
[ 4545.435253] [<ffff000008270f64>] read_kcore+0x21c/0x3b0
[ 4545.440429] [<ffff00000826340c>] proc_reg_read+0x64/0x90
[ 4545.445691] [<ffff0000081fb83c>] __vfs_read+0x1c/0x108
[ 4545.450779] [<ffff0000081fcb28>] vfs_read+0x80/0x130
[ 4545.455696] [<ffff0000081fe014>] SyS_read+0x44/0xa0
[ 4545.460528] [<ffff000008082f30>] el0_svc_naked+0x24/0x28
[ 4545.465790] Code: d503201f d503201f d503201f d503201f (a8c12027)
[ 4545.471852] ---[ end trace 4d1897f94759f461 ]---
[ 4545.476435] note: cat[8976] exited with preempt_count 2

I found that the issue was introduced by commit f9040773b7bb
("arm64: move kernel image to base of vmalloc area"). That commit makes
the kernel image overlap with the vmalloc area, so the vmalloc area can
now contain section (huge page) mappings, but vmalloc_to_page() is not
aware of that change, and the function is generic code shared by all
architectures.

Fix it by changing the init mappings so that anything in the vmalloc
range is mapped at page granularity, consistent with what the vmalloc
area mapping code expects.

Fixes: f9040773b7bb ("arm64: move kernel image to base of vmalloc area")
Reported-by: tan xiaojun <tanxiaojun@huawei.com>
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
 arch/arm64/mm/mmu.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 17243e4..2d8b34d 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -185,7 +185,7 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
 
 		/* try section mapping first */
 		if (((addr | next | phys) & ~SECTION_MASK) == 0 &&
-		      !page_mappings_only) {
+		      !page_mappings_only && !is_vmalloc_addr((void *)addr)) {
 			/*
 			 * Set the contiguous bit for the subsequent group of
 			 * PMDs if its size and alignment are appropriate.
@@ -256,7 +256,8 @@ static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
 		/*
 		 * For 4K granule only, attempt to put down a 1GB block
 		 */
-		if (use_1G_block(addr, next, phys) && !page_mappings_only) {
+		if (use_1G_block(addr, next, phys) && !page_mappings_only &&
+					!is_vmalloc_addr((void *)addr)) {
 			pud_set_huge(pud, phys, prot);
 
 			/*
-- 
1.7.12.4


* [PATCH] arm64: fix the overlap between the kernel image and vmalloc address
  2017-04-24  9:22 [PATCH] arm64: fix the overlap between the kernel image and vmalloc address zhongjiang
@ 2017-04-24 10:44 ` Mark Rutland
  2017-04-24 13:28   ` zhong jiang
  0 siblings, 1 reply; 11+ messages in thread
From: Mark Rutland @ 2017-04-24 10:44 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

Thanks for reporting the problematic usage of is_vmalloc_addr() and
vmalloc_to_page() here. That is a real problem that we need to address.

On Mon, Apr 24, 2017 at 05:22:09PM +0800, zhongjiang wrote:
> From: zhong jiang <zhongjiang@huawei.com>
>
> Recently, xiaojun report the following issue.
>
> [ 4544.984139] Unable to handle kernel paging request at virtual address ffff804392800000
> [ 4544.991995] pgd = ffff80096745f000
> [ 4544.995369] [ffff804392800000] *pgd=0000000000000000

> [ 4545.425416] fc00: ffff0000081fdfd0 0000ffffa9c87440
> [ 4545.430248] [<ffff0000083a1000>] __memcpy+0x100/0x180
> [ 4545.435253] [<ffff000008270f64>] read_kcore+0x21c/0x3b0
> [ 4545.440429] [<ffff00000826340c>] proc_reg_read+0x64/0x90
> [ 4545.445691] [<ffff0000081fb83c>] __vfs_read+0x1c/0x108
> [ 4545.450779] [<ffff0000081fcb28>] vfs_read+0x80/0x130
> [ 4545.455696] [<ffff0000081fe014>] SyS_read+0x44/0xa0
> [ 4545.460528] [<ffff000008082f30>] el0_svc_naked+0x24/0x28
> [ 4545.465790] Code: d503201f d503201f d503201f d503201f (a8c12027)
> [ 4545.471852] ---[ end trace 4d1897f94759f461 ]---
> [ 4545.476435] note: cat[8976] exited with preempt_count 2
>
> I find the issue is introduced when applying commit f9040773b7bb
> ("arm64: move kernel image to base of vmalloc area"). This patch
> make the kernel image overlap with vmalloc area. It will result in
> vmalloc area have the huge page table. but the vmalloc_to_page is
> not realize the change. and the function is public to any arch.

So the issue is that we have the callchain below for a kernel image
address:

read_kcore()
->is_vmalloc_or_module_addr() // returns true
->vread()
-->aligned_vread()
--->vmalloc_to_page()

In is_vmalloc{,or_module}_addr() we just check the addr against
VMALLOC_START and VMALLOC_END, so they will return true for a kernel
image address.
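
For reference, the generic check is essentially just a range test, roughly:

/* simplified core of is_vmalloc_addr() from include/linux/mm.h */
static inline bool is_vmalloc_addr(const void *x)
{
	unsigned long addr = (unsigned long)x;

	return addr >= VMALLOC_START && addr < VMALLOC_END;
}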

Then, we call vmalloc_to_page(). While this only handles mappings made
at page granularity, the kernel image mapping may have used sections. So
this tries a bogus walk to the pte level.

Evidently, we assume that any memory in the vmalloc area (or module
areas) is mapped at page granularity. Is that always the case?

AFAICT, memremap'd memory isn't necessarily, but vread() should skip
that due to the VM_IOREMAP flag on the vma. The KASAN region should be
below MODULES_VADDR on arm64. I'm not sure if there's anything else.
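
For reference, vread() already treats VM_IOREMAP areas as holes, roughly:

	if (!(vm->flags & VM_IOREMAP))
		aligned_vread(buf, addr, n);
	else	/* IOREMAP area is treated as a memory hole */
		memset(buf, 0, n);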

Does it make sense to teach vmalloc_to_page() about section mappings?

Should we special-case kernel image handling, e.g. with new
is_kernel_image_addr() / kernel_image_to_page() helpers?

Do we need to shuffle things around such that the kernel image is not
between VMALLOC_START and VMALLOC_END?

> I fix it by change the init mapping to make it keep the accordance
> with vmalloc area mapping.
>
> Fixes: f9040773b7bb ("arm64: move kernel image to base of vmalloc area")
> Reported-by: tan xiaojun <tanxiaojun@huawei.com>
> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
> ---
>  arch/arm64/mm/mmu.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 17243e4..2d8b34d 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -185,7 +185,7 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
>
>               /* try section mapping first */
>               if (((addr | next | phys) & ~SECTION_MASK) == 0 &&
> -                   !page_mappings_only) {
> +                   !page_mappings_only && !is_vmalloc_addr((void *)addr)) {
>                       /*
>                        * Set the contiguous bit for the subsequent group of
>                        * PMDs if its size and alignment are appropriate.
> @@ -256,7 +256,8 @@ static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
>               /*
>                * For 4K granule only, attempt to put down a 1GB block
>                */
> -             if (use_1G_block(addr, next, phys) && !page_mappings_only) {
> +             if (use_1G_block(addr, next, phys) && !page_mappings_only &&
> +                                     !is_vmalloc_addr((void *)addr)) {
>                       pud_set_huge(pud, phys, prot);
>

This will force the kernel image mappings to use page granularity, which
will come at a significant TLB pressure cost, and would be incredibly
unfortunate.

I would rather we solved this through other means.

Thanks,
Mark.


* [PATCH] arm64: fix the overlap between the kernel image and vmalloc address
  2017-04-24 10:44 ` Mark Rutland
@ 2017-04-24 13:28   ` zhong jiang
  2017-04-24 15:51     ` Mark Rutland
  0 siblings, 1 reply; 11+ messages in thread
From: zhong jiang @ 2017-04-24 13:28 UTC (permalink / raw)
  To: linux-arm-kernel

On 2017/4/24 18:44, Mark Rutland wrote:
> Hi,
>
> Thanks for reporting the problematic usage of is_vmalloc_addr() and
> vmalloc_to_page() here. That is a real problem that we need to address.
>
> On Mon, Apr 24, 2017 at 05:22:09PM +0800, zhongjiang wrote:
>> From: zhong jiang <zhongjiang@huawei.com>
>>
>> Recently, xiaojun report the following issue.
>>
>> [ 4544.984139] Unable to handle kernel paging request at virtual address ffff804392800000
>> [ 4544.991995] pgd = ffff80096745f000
>> [ 4544.995369] [ffff804392800000] *pgd=0000000000000000
>> [ 4545.425416] fc00: ffff0000081fdfd0 0000ffffa9c87440
>> [ 4545.430248] [<ffff0000083a1000>] __memcpy+0x100/0x180
>> [ 4545.435253] [<ffff000008270f64>] read_kcore+0x21c/0x3b0
>> [ 4545.440429] [<ffff00000826340c>] proc_reg_read+0x64/0x90
>> [ 4545.445691] [<ffff0000081fb83c>] __vfs_read+0x1c/0x108
>> [ 4545.450779] [<ffff0000081fcb28>] vfs_read+0x80/0x130
>> [ 4545.455696] [<ffff0000081fe014>] SyS_read+0x44/0xa0
>> [ 4545.460528] [<ffff000008082f30>] el0_svc_naked+0x24/0x28
>> [ 4545.465790] Code: d503201f d503201f d503201f d503201f (a8c12027)
>> [ 4545.471852] ---[ end trace 4d1897f94759f461 ]---
>> [ 4545.476435] note: cat[8976] exited with preempt_count 2
>>
>> I find the issue is introduced when applying commit f9040773b7bb
>> ("arm64: move kernel image to base of vmalloc area"). This patch
>> make the kernel image overlap with vmalloc area. It will result in
>> vmalloc area have the huge page table. but the vmalloc_to_page is
>> not realize the change. and the function is public to any arch.
> So the issue is that we have the callchain below for a kernel image
> address:
>
> read_kcore()
> ->is_vmalloc_or_module_addr() // returns true
> ->vread()
> -->aligned_vread()
> --->vmalloc_to_page()
>
> In is_vmalloc{,or_module}_addr() we just check the addr against
> VMALLOC_START and VMALLOC_END, so they will return true for a kernel
> image address.
>
> Then, we call vmalloc_to_page(). While this only handles mappings made
> at page granularity, the kernel image mapping may have used sections. So
> this tries a bogus walk to the pte level.
>
> Evidently, we assume that any memory in the vmalloc area (or module
> areas) is mapped at page granularity. Is that always the case?
  I have not seen a vmalloc area mapped with huge page table entries so far.
> AFAICT, memremap'd memory isn't necessarily, but vread() should skip
> that due to the VM_IOREMAP flag on the vma. The KASAN region should be
> below MODULES_VADDR on arm64. I'm not sure if there's anything else.
> Does it make sense to teach vmalloc_to_page() about section mappings?
   I do not know whether it has any limitations. If it is OK, it is worth trying.
> Should we special-case kernel image handling, e.g. with new
> is_kernel_image_addr() / kernel_image_to_page() helpers?
  Yes, that seems to be the best way to implement it without a performance regression.
> Do we need to shuffle things around such that the kernel image is not
> between VMALLOC_START and VMALLOC_END?
>
>> I fix it by change the init mapping to make it keep the accordance
>> with vmalloc area mapping.
>>
>> Fixes: f9040773b7bb ("arm64: move kernel image to base of vmalloc area")
>> Reported-by: tan xiaojun <tanxiaojun@huawei.com>
>> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
>> ---
>>  arch/arm64/mm/mmu.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 17243e4..2d8b34d 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -185,7 +185,7 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
>>
>>               /* try section mapping first */
>>               if (((addr | next | phys) & ~SECTION_MASK) == 0 &&
>> -                   !page_mappings_only) {
>> +                   !page_mappings_only && !is_vmalloc_addr((void *)addr)) {
>>                       /*
>>                        * Set the contiguous bit for the subsequent group of
>>                        * PMDs if its size and alignment are appropriate.
>> @@ -256,7 +256,8 @@ static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
>>               /*
>>                * For 4K granule only, attempt to put down a 1GB block
>>                */
>> -             if (use_1G_block(addr, next, phys) && !page_mappings_only) {
>> +             if (use_1G_block(addr, next, phys) && !page_mappings_only &&
>> +                                     !is_vmalloc_addr((void *)addr)) {
>>                       pud_set_huge(pud, phys, prot);
>>
> This will force the kernel image mappings to use page granularity, which
> will come at a significant TLB pressure cost, and would be incredibly
> unfortunate.
  You are right, it seems so. The simplest approach would be to add an is_kernel_image_addr() helper.
> I would rather we solved this through other means.
>
> Thanks,
> Mark.
>
 The following patch is the implementation. Any thoughts?

Signed-off-by: zhong jiang <zhongjiang@huawei.com>

  diff --git a/include/linux/mm.h b/include/linux/mm.h
index b84615b..851ac35 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -475,6 +475,15 @@ static inline bool is_vmalloc_addr(const void *x)
        return false;
 #endif
 }
+
+static inline bool is_kernel_image_addr(const void *x)
+{
+       unsigned long addr = (unsigned long)x;
+
+       return addr >= (unsigned long)_stext && addr < (unsigned long)_end;
+
+}
+
 #ifdef CONFIG_MMU
 extern int is_vmalloc_or_module_addr(const void *x);
 #else
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 3ca82d4..9a9ef65 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -229,6 +229,42 @@ int is_vmalloc_or_module_addr(const void *x)
        return is_vmalloc_addr(x);
 }

+static struct page *kernel_image_to_page(const void *kernel_addr, pgd_t *pgd)
+{
+       unsigned long addr = (unsigned long)kernel_addr;
+       struct page *page = NULL;
+       pud_t *pud;
+       pmd_t *pmd;
+       pte_t *pte;
+
+       if (pgd_none(*pgd))
+               goto out;
+
+       pud = pud_offset(pgd, addr);
+       if (pud_none(*pud))
+               goto out;
+
+       if (pud_sect(*pud))
+               return pud_page(*pud);
+
+       pmd = pmd_offset(pud, addr);
+       if (pmd_none(*pmd))
+               goto out;
+
+       if (pmd_sect(*pmd))
+               return pmd_page(*pmd);
+
+       pte = pte_offset_kernel(pmd, addr);
+       if (pte_none(*pte))
+               goto out;
+
+       page = pte_page(*pte);
+
+out:
+       return page;
+
+}
+
 /*
   * Walk a vmap address to the struct page it maps.
   */
@@ -244,6 +280,9 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
         */
        VIRTUAL_BUG_ON(!is_vmalloc_or_module_addr(vmalloc_addr));

+       if (is_kernel_image_addr(vmalloc_addr))
+               return kernel_image_to_page(vmalloc_addr, pgd);
+
        if (!pgd_none(*pgd)) {
                pud_t *pud = pud_offset(pgd, addr);
                if (!pud_none(*pud)) {


 


* [PATCH] arm64: fix the overlap between the kernel image and vmalloc address
  2017-04-24 13:28   ` zhong jiang
@ 2017-04-24 15:51     ` Mark Rutland
  2017-04-24 17:52       ` Laura Abbott
  2017-04-25 14:11       ` zhong jiang
  0 siblings, 2 replies; 11+ messages in thread
From: Mark Rutland @ 2017-04-24 15:51 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Apr 24, 2017 at 09:28:48PM +0800, zhong jiang wrote:
> On 2017/4/24 18:44, Mark Rutland wrote:
> > So the issue is that we have the callchain below for a kernel image
> > address:
> >
> > read_kcore()
> > ->is_vmalloc_or_module_addr() // returns true
> > ->vread()
> > -->aligned_vread()
> > --->vmalloc_to_page()
> >
> > In is_vmalloc{,or_module}_addr() we just check the addr against
> > VMALLOC_START and VMALLOC_END, so they will return true for a kernel
> > image address.
> >
> > Then, we call vmalloc_to_page(). While this only handles mappings made
> > at page granularity, the kernel image mapping may have used sections. So
> > this tries a bogus walk to the pte level.

> > Should we special-case kernel image handling, e.g. with new
> > is_kernel_image_addr() / kernel_image_to_page() helpers?

>   yes ,  it seems to the best way to implents it  without performance back.

> The following patch is the implment. Any thought?
> 
> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
> 
>   diff --git a/include/linux/mm.h b/include/linux/mm.h
> index b84615b..851ac35 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -475,6 +475,15 @@ static inline bool is_vmalloc_addr(const void *x)
>         return false;
>  #endif
>  }
> +
> +static inline bool is_kernel_image_addr(const void *x)
> +{
> +       unsigned long addr = (unsigned long)x;
> +
> +       return addr >= (unsigned long)_stext && addr < (unsigned long)_end;
> +
> +}
> +
>  #ifdef CONFIG_MMU
>  extern int is_vmalloc_or_module_addr(const void *x);
>  #else
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 3ca82d4..9a9ef65 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -229,6 +229,42 @@ int is_vmalloc_or_module_addr(const void *x)
>         return is_vmalloc_addr(x);
>  }
> 
> +static struct page *kernel_image_to_page(const void *kernel_addr, pgd_t *pgd)
> +{
> +       unsigned long addr = (unsigned long)kernel_addr;
> +       struct page *page = NULL;
> +       pud_t *pud;
> +       pmd_t *pmd;
> +       pte_t *pte;
> +
> +       if (pgd_none(*pgd))
> +               goto out;
> +
> +       pud = pud_offset(pgd, addr);
> +       if (pud_none(*pud))
> +               goto out;
> +
> +       if (pud_sect(*pud))
> +               return pud_page(*pud);

The *_sect() helpers are arch-specific, so we cannot use them in generic
code. This would need to be architecture-specific.

Secondly, this will return the head page of the section regardless of which
page in the section the address corresponds to.

> +
> +       pmd = pmd_offset(*pmd, addr);
> +       if (pmd_none(*pmd))
> +               goto out;
> +
> +       if (pmd_sect(*pmd))
> +               return pmd_page(*pmd);

Likewise on both counts.

> +
> +       pte = pte_offset_kernel(pmd, addr);
> +       if (pte_none(*pte))
> +               goto out;
> +
> +       page = pte_page(*pte);
> +
> +out:
> +       return page;
> +
> +}

Given we know what the address should map to, I don't think we need to
walk the page tables here. I think this can be:

static struct page *kernel_image_to_page(const void *addr)
{
	return virt_to_page(lm_alias(addr));
}
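
(For reference, lm_alias(x) expands to __va(__pa_symbol(x)), i.e. it
translates a kernel image address into its linear map alias, for which
virt_to_page() is valid.)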

> +
>  /*
>    * Walk a vmap address to the struct page it maps.
>    */
> @@ -244,6 +280,9 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
>          */
>         VIRTUAL_BUG_ON(!is_vmalloc_or_module_addr(vmalloc_addr));
> 
> +       if (is_kernel_image_addr(vmalloc_addr))
> +               return kernel_image_to_page(vmalloc_addr, pgd);

It's not clear to me that this is the right place for this to live.

It might be best to code the kernel image logic directly in kcore (and
kmem), assuming everyone's OK with that approach.

Thanks,
Mark.


* [PATCH] arm64: fix the overlap between the kernel image and vmalloc address
  2017-04-24 15:51     ` Mark Rutland
@ 2017-04-24 17:52       ` Laura Abbott
  2017-04-24 17:56         ` Ard Biesheuvel
  2017-04-25 14:51         ` Mark Rutland
  2017-04-25 14:11       ` zhong jiang
  1 sibling, 2 replies; 11+ messages in thread
From: Laura Abbott @ 2017-04-24 17:52 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/24/2017 08:51 AM, Mark Rutland wrote:
> On Mon, Apr 24, 2017 at 09:28:48PM +0800, zhong jiang wrote:
>> On 2017/4/24 18:44, Mark Rutland wrote:
>>> So the issue is that we have the callchain below for a kernel image
>>> address:
>>>
>>> read_kcore()
>>> ->is_vmalloc_or_module_addr() // returns true
>>> ->vread()
>>> -->aligned_vread()
>>> --->vmalloc_to_page()
>>>
>>> In is_vmalloc{,or_module}_addr() we just check the addr against
>>> VMALLOC_START and VMALLOC_END, so they will return true for a kernel
>>> image address.
>>>
>>> Then, we call vmalloc_to_page(). While this only handles mappings made
>>> at page granularity, the kernel image mapping may have used sections. So
>>> this tries a bogus walk to the pte level.
> 
>>> Should we special-case kernel image handling, e.g. with new
>>> is_kernel_image_addr() / kernel_image_to_page() helpers?
> 
>>   yes ,  it seems to the best way to implents it  without performance back.
> 
>> The following patch is the implment. Any thought?
>>
>> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
>>
>>   diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index b84615b..851ac35 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -475,6 +475,15 @@ static inline bool is_vmalloc_addr(const void *x)
>>         return false;
>>  #endif
>>  }
>> +
>> +static inline bool is_kernel_image_addr(const void *x)
>> +{
>> +       unsigned long addr = (unsigned long)x;
>> +
>> +       return addr >= (unsigned long)_stext && addr < (unsigned long)_end;
>> +
>> +}
>> +
>>  #ifdef CONFIG_MMU
>>  extern int is_vmalloc_or_module_addr(const void *x);
>>  #else
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index 3ca82d4..9a9ef65 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -229,6 +229,42 @@ int is_vmalloc_or_module_addr(const void *x)
>>         return is_vmalloc_addr(x);
>>  }
>>
>> +static struct page *kernel_image_to_page(const void *kernel_addr, pgd_t *pgd)
>> +{
>> +       unsigned long addr = (unsigned long)kernel_addr;
>> +       struct page *page = NULL;
>> +       pud_t *pud;
>> +       pmd_t *pmd;
>> +       pte_t *pte;
>> +
>> +       if (pgd_none(*pgd))
>> +               goto out;
>> +
>> +       pud = pud_offset(pgd, addr);
>> +       if (pud_none(*pud))
>> +               goto out;
>> +
>> +       if (pud_sect(*pud))
>> +               return pud_page(*pud);
> 
> The *_sect() helpers are arch-specific, so we cannot use them in generic
> code. This would need to be architecture-specific.
> 
> Secondly, this will return head page of the section regardless of which
> page in the section the address corresponds to
> 
>> +
>> +       pmd = pmd_offset(*pmd, addr);
>> +       if (pmd_none(*pmd))
>> +               goto out;
>> +
>> +       if (pmd_sect(*pmd))
>> +               return pmd_page(*pmd);
> 
> Likewise on both counts.
> 
>> +
>> +       pte = pte_offset_kernel(pmd, addr);
>> +       if (pte_none(*pte))
>> +               goto out;
>> +
>> +       page = pte_page(*pte);
>> +
>> +out:
>> +       return page;
>> +
>> +}
> 
> Given we know what the address should map to, I don't think we need to
> walk the page tables here. I think this can be:
> 
> static struct page *kernel_image_to_page(const void *addr)
> {
> 	return virt_to_page(lm_alias(vmalloc_addr));
> }
> 
>> +
>>  /*
>>    * Walk a vmap address to the struct page it maps.
>>    */
>> @@ -244,6 +280,9 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
>>          */
>>         VIRTUAL_BUG_ON(!is_vmalloc_or_module_addr(vmalloc_addr));
>>
>> +       if (is_kernel_image_addr(vmalloc_addr))
>> +               return kernel_image_to_page(vmalloc_addr, pgd);
> 
> It's not clear to me that this is the right place for this to live.
> 
> It might be best to code the kernel image logic directly in kcore (and
> kmem), assuming everyone's OK with that approach.
> 

That will fix kcore and kmem but this will show up in other places too.
We've gone through and made sure that virt_addr_valid returns
true if and only if virt_to_page returns a valid address. I don't know
if we can make as strong a claim about is_vmalloc_addr and
vmalloc_to_page in all cases but is_vmalloc_addr should not return true
for the kernel image. That would at least let kcore fall back to
kern_addr_valid which should correctly handle the kernel image.
The suggestion to move the kernel image out of VMALLOC_START/VMALLOC_END
seems like the best approach although I haven't tried a prototype
at all.
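
An untested sketch of that direction, reusing the image range test from the
earlier patch:

static inline bool is_vmalloc_addr(const void *x)
{
	unsigned long addr = (unsigned long)x;

	/* untested sketch: report kernel image addresses as not-vmalloc, so
	 * callers such as kcore fall back to kern_addr_valid() instead */
	if (addr >= (unsigned long)_stext && addr < (unsigned long)_end)
		return false;

	return addr >= VMALLOC_START && addr < VMALLOC_END;
}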

Thanks,
Laura


* [PATCH] arm64: fix the overlap between the kernel image and vmalloc address
  2017-04-24 17:52       ` Laura Abbott
@ 2017-04-24 17:56         ` Ard Biesheuvel
  2017-04-25  8:13           ` zhong jiang
  2017-04-25 15:02           ` Mark Rutland
  2017-04-25 14:51         ` Mark Rutland
  1 sibling, 2 replies; 11+ messages in thread
From: Ard Biesheuvel @ 2017-04-24 17:56 UTC (permalink / raw)
  To: linux-arm-kernel

On 24 April 2017 at 18:52, Laura Abbott <labbott@redhat.com> wrote:
> On 04/24/2017 08:51 AM, Mark Rutland wrote:
>> On Mon, Apr 24, 2017 at 09:28:48PM +0800, zhong jiang wrote:
>>> On 2017/4/24 18:44, Mark Rutland wrote:
>>>> So the issue is that we have the callchain below for a kernel image
>>>> address:
>>>>
>>>> read_kcore()
>>>> ->is_vmalloc_or_module_addr() // returns true
>>>> ->vread()
>>>> -->aligned_vread()
>>>> --->vmalloc_to_page()
>>>>
>>>> In is_vmalloc{,or_module}_addr() we just check the addr against
>>>> VMALLOC_START and VMALLOC_END, so they will return true for a kernel
>>>> image address.
>>>>
>>>> Then, we call vmalloc_to_page(). While this only handles mappings made
>>>> at page granularity, the kernel image mapping may have used sections. So
>>>> this tries a bogus walk to the pte level.
>>
>>>> Should we special-case kernel image handling, e.g. with new
>>>> is_kernel_image_addr() / kernel_image_to_page() helpers?
>>
>>>   yes ,  it seems to the best way to implents it  without performance back.
>>
>>> The following patch is the implment. Any thought?
>>>
>>> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
>>>
>>>   diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index b84615b..851ac35 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -475,6 +475,15 @@ static inline bool is_vmalloc_addr(const void *x)
>>>         return false;
>>>  #endif
>>>  }
>>> +
>>> +static inline bool is_kernel_image_addr(const void *x)
>>> +{
>>> +       unsigned long addr = (unsigned long)x;
>>> +
>>> +       return addr >= (unsigned long)_stext && addr < (unsigned long)_end;
>>> +
>>> +}
>>> +
>>>  #ifdef CONFIG_MMU
>>>  extern int is_vmalloc_or_module_addr(const void *x);
>>>  #else
>>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>>> index 3ca82d4..9a9ef65 100644
>>> --- a/mm/vmalloc.c
>>> +++ b/mm/vmalloc.c
>>> @@ -229,6 +229,42 @@ int is_vmalloc_or_module_addr(const void *x)
>>>         return is_vmalloc_addr(x);
>>>  }
>>>
>>> +static struct page *kernel_image_to_page(const void *kernel_addr, pgd_t *pgd)
>>> +{
>>> +       unsigned long addr = (unsigned long)kernel_addr;
>>> +       struct page *page = NULL;
>>> +       pud_t *pud;
>>> +       pmd_t *pmd;
>>> +       pte_t *pte;
>>> +
>>> +       if (pgd_none(*pgd))
>>> +               goto out;
>>> +
>>> +       pud = pud_offset(pgd, addr);
>>> +       if (pud_none(*pud))
>>> +               goto out;
>>> +
>>> +       if (pud_sect(*pud))
>>> +               return pud_page(*pud);
>>
>> The *_sect() helpers are arch-specific, so we cannot use them in generic
>> code. This would need to be architecture-specific.
>>
>> Secondly, this will return head page of the section regardless of which
>> page in the section the address corresponds to
>>
>>> +
>>> +       pmd = pmd_offset(*pmd, addr);
>>> +       if (pmd_none(*pmd))
>>> +               goto out;
>>> +
>>> +       if (pmd_sect(*pmd))
>>> +               return pmd_page(*pmd);
>>
>> Likewise on both counts.
>>
>>> +
>>> +       pte = pte_offset_kernel(pmd, addr);
>>> +       if (pte_none(*pte))
>>> +               goto out;
>>> +
>>> +       page = pte_page(*pte);
>>> +
>>> +out:
>>> +       return page;
>>> +
>>> +}
>>
>> Given we know what the address should map to, I don't think we need to
>> walk the page tables here. I think this can be:
>>
>> static struct page *kernel_image_to_page(const void *addr)
>> {
>>       return virt_to_page(lm_alias(vmalloc_addr));
>> }
>>
>>> +
>>>  /*
>>>    * Walk a vmap address to the struct page it maps.
>>>    */
>>> @@ -244,6 +280,9 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
>>>          */
>>>         VIRTUAL_BUG_ON(!is_vmalloc_or_module_addr(vmalloc_addr));
>>>
>>> +       if (is_kernel_image_addr(vmalloc_addr))
>>> +               return kernel_image_to_page(vmalloc_addr, pgd);
>>
>> It's not clear to me that this is the right place for this to live.
>>
>> It might be best to code the kernel image logic directly in kcore (and
>> kmem), assuming everyone's OK with that approach.
>>
>
> That will fix kcore and kmem but this will show up in other places too.
> We've gone through and made sure that virt_addr_valid returns
> true if and only if virt_to_page returns a valid address. I don't know
> if we can make as strong a claim about is_vmalloc_addr and
> vmalloc_to_page in all cases but is_vmalloc_addr should not return true
> for the kernel image. That would at least let kcore fall back to
> kern_addr_valid which should correctly handle the kernel image.
> The suggestion to move the kernel image out of VMALLOC_START/VMALLOC_END
> seems like the best approach although I haven't tried a prototype
> at all.
>

Moving the kernel into the vmalloc region was kind of the point, for
KASLR. Undoing that means either disabling KASLR, or splitting the
vmalloc region into a KASLR region only for the kernel, and a vmalloc
region like we had when vmlinux lived in the linear region.

In general, I think we should be able to deal with different kinds of
mappings with different granularity in the vmalloc region. If
necessary, we could introduce a VM_xxx flag for the kernel to
distinguish such regions from ordinary VM_MAP regions.
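
As a strawman (the flag name and bit value below are made up purely for
illustration), vread()/vwrite() could then treat such regions the same way
they already treat VM_IOREMAP:

#define VM_KERNEL_IMAGE	0x00000200	/* illustrative only: block-mapped region */

	if (vm->flags & (VM_IOREMAP | VM_KERNEL_IMAGE))
		memset(buf, 0, n);	/* treat as a memory hole */
	else
		aligned_vread(buf, addr, n);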


* [PATCH] arm64: fix the overlap between the kernel image and vmalloc address
  2017-04-24 17:56         ` Ard Biesheuvel
@ 2017-04-25  8:13           ` zhong jiang
  2017-04-25 15:02           ` Mark Rutland
  1 sibling, 0 replies; 11+ messages in thread
From: zhong jiang @ 2017-04-25  8:13 UTC (permalink / raw)
  To: linux-arm-kernel

On 2017/4/25 1:56, Ard Biesheuvel wrote:
> On 24 April 2017 at 18:52, Laura Abbott <labbott@redhat.com> wrote:
>> On 04/24/2017 08:51 AM, Mark Rutland wrote:
>>> On Mon, Apr 24, 2017 at 09:28:48PM +0800, zhong jiang wrote:
>>>> On 2017/4/24 18:44, Mark Rutland wrote:
>>>>> So the issue is that we have the callchain below for a kernel image
>>>>> address:
>>>>>
>>>>> read_kcore()
>>>>> ->is_vmalloc_or_module_addr() // returns true
>>>>> ->vread()
>>>>> -->aligned_vread()
>>>>> --->vmalloc_to_page()
>>>>>
>>>>> In is_vmalloc{,or_module}_addr() we just check the addr against
>>>>> VMALLOC_START and VMALLOC_END, so they will return true for a kernel
>>>>> image address.
>>>>>
>>>>> Then, we call vmalloc_to_page(). While this only handles mappings made
>>>>> at page granularity, the kernel image mapping may have used sections. So
>>>>> this tries a bogus walk to the pte level.
>>>>> Should we special-case kernel image handling, e.g. with new
>>>>> is_kernel_image_addr() / kernel_image_to_page() helpers?
>>>>   yes ,  it seems to the best way to implents it  without performance back.
>>>> The following patch is the implment. Any thought?
>>>>
>>>> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
>>>>
>>>>   diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>> index b84615b..851ac35 100644
>>>> --- a/include/linux/mm.h
>>>> +++ b/include/linux/mm.h
>>>> @@ -475,6 +475,15 @@ static inline bool is_vmalloc_addr(const void *x)
>>>>         return false;
>>>>  #endif
>>>>  }
>>>> +
>>>> +static inline bool is_kernel_image_addr(const void *x)
>>>> +{
>>>> +       unsigned long addr = (unsigned long)x;
>>>> +
>>>> +       return addr >= (unsigned long)_stext && addr < (unsigned long)_end;
>>>> +
>>>> +}
>>>> +
>>>>  #ifdef CONFIG_MMU
>>>>  extern int is_vmalloc_or_module_addr(const void *x);
>>>>  #else
>>>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>>>> index 3ca82d4..9a9ef65 100644
>>>> --- a/mm/vmalloc.c
>>>> +++ b/mm/vmalloc.c
>>>> @@ -229,6 +229,42 @@ int is_vmalloc_or_module_addr(const void *x)
>>>>         return is_vmalloc_addr(x);
>>>>  }
>>>>
>>>> +static struct page *kernel_image_to_page(const void *kernel_addr, pgd_t *pgd)
>>>> +{
>>>> +       unsigned long addr = (unsigned long)kernel_addr;
>>>> +       struct page *page = NULL;
>>>> +       pud_t *pud;
>>>> +       pmd_t *pmd;
>>>> +       pte_t *pte;
>>>> +
>>>> +       if (pgd_none(*pgd))
>>>> +               goto out;
>>>> +
>>>> +       pud = pud_offset(pgd, addr);
>>>> +       if (pud_none(*pud))
>>>> +               goto out;
>>>> +
>>>> +       if (pud_sect(*pud))
>>>> +               return pud_page(*pud);
>>> The *_sect() helpers are arch-specific, so we cannot use them in generic
>>> code. This would need to be architecture-specific.
>>>
>>> Secondly, this will return head page of the section regardless of which
>>> page in the section the address corresponds to
>>>
>>>> +
>>>> +       pmd = pmd_offset(*pmd, addr);
>>>> +       if (pmd_none(*pmd))
>>>> +               goto out;
>>>> +
>>>> +       if (pmd_sect(*pmd))
>>>> +               return pmd_page(*pmd);
>>> Likewise on both counts.
>>>
>>>> +
>>>> +       pte = pte_offset_kernel(pmd, addr);
>>>> +       if (pte_none(*pte))
>>>> +               goto out;
>>>> +
>>>> +       page = pte_page(*pte);
>>>> +
>>>> +out:
>>>> +       return page;
>>>> +
>>>> +}
>>> Given we know what the address should map to, I don't think we need to
>>> walk the page tables here. I think this can be:
>>>
>>> static struct page *kernel_image_to_page(const void *addr)
>>> {
>>>       return virt_to_page(lm_alias(vmalloc_addr));
>>> }
>>>
>>>> +
>>>>  /*
>>>>    * Walk a vmap address to the struct page it maps.
>>>>    */
>>>> @@ -244,6 +280,9 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
>>>>          */
>>>>         VIRTUAL_BUG_ON(!is_vmalloc_or_module_addr(vmalloc_addr));
>>>>
>>>> +       if (is_kernel_image_addr(vmalloc_addr))
>>>> +               return kernel_image_to_page(vmalloc_addr, pgd);
>>> It's not clear to me that this is the right place for this to live.
>>>
>>> It might be best to code the kernel image logic directly in kcore (and
>>> kmem), assuming everyone's OK with that approach.
>>>
>> That will fix kcore and kmem but this will show up in other places too.
>> We've gone through and made sure that virt_addr_valid returns
>> true if and only if virt_to_page returns a valid address. I don't know
>> if we can make as strong a claim about is_vmalloc_addr and
>> vmalloc_to_page in all cases but is_vmalloc_addr should not return true
>> for the kernel image. That would at least let kcore fall back to
>> kern_addr_valid which should correctly handle the kernel image.
>> The suggestion to move the kernel image out of VMALLOC_START/VMALLOC_END
>> seems like the best approach although I haven't tried a prototype
>> at all.
>>
> Moving the kernel into the vmalloc region was kind of the point, for
> KASLR. Undoing that means either disabling KASLR, or splitting the
> vmalloc region into a KASLR region only for the kernel, and a vmalloc
> region like we had when vmlinux lived in the linear region.
>
> In general, I think we should be able to deal with different kinds of
> mappings with different granularity in the vmalloc region. If
> necessary, we could introduce a VM_xxx flag for the kernel to
> distinguish such regions from ordinary VM_MAP regions.
> .
>
 Hi Ard,

 My initial thought is the same as yours. vmalloc_to_page() is generic code
 shared by every arch, and only arm64 changes the mapping granularity. If we
 want to teach the function about that, it would either have to be implemented
 per arch or guarded with arm64-specific code to exclude the other arches,
 because the pud_* helpers are arch-specific.

 Mark proposed that adding is_kernel_image_addr()/kernel_image_to_page()
 helpers would be simpler, and I prefer Mark's approach. What do you think?
 Or am I missing something?

 I will defer to you and the other maintainers.
 
 Thanks
 zhongjiang


* [PATCH] arm64: fix the overlap between the kernel image and vmalloc address
  2017-04-24 15:51     ` Mark Rutland
  2017-04-24 17:52       ` Laura Abbott
@ 2017-04-25 14:11       ` zhong jiang
  1 sibling, 0 replies; 11+ messages in thread
From: zhong jiang @ 2017-04-25 14:11 UTC (permalink / raw)
  To: linux-arm-kernel

On 2017/4/24 23:51, Mark Rutland wrote:
> On Mon, Apr 24, 2017 at 09:28:48PM +0800, zhong jiang wrote:
>> On 2017/4/24 18:44, Mark Rutland wrote:
>>> So the issue is that we have the callchain below for a kernel image
>>> address:
>>>
>>> read_kcore()
>>> ->is_vmalloc_or_module_addr() // returns true
>>> ->vread()
>>> -->aligned_vread()
>>> --->vmalloc_to_page()
>>>
>>> In is_vmalloc{,or_module}_addr() we just check the addr against
>>> VMALLOC_START and VMALLOC_END, so they will return true for a kernel
>>> image address.
>>>
>>> Then, we call vmalloc_to_page(). While this only handles mappings made
>>> at page granularity, the kernel image mapping may have used sections. So
>>> this tries a bogus walk to the pte level.
>>> Should we special-case kernel image handling, e.g. with new
>>> is_kernel_image_addr() / kernel_image_to_page() helpers?
>>   yes ,  it seems to the best way to implents it  without performance back.
>> The following patch is the implment. Any thought?
>>
>> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
>>
>>   diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index b84615b..851ac35 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -475,6 +475,15 @@ static inline bool is_vmalloc_addr(const void *x)
>>         return false;
>>  #endif
>>  }
>> +
>> +static inline bool is_kernel_image_addr(const void *x)
>> +{
>> +       unsigned long addr = (unsigned long)x;
>> +
>> +       return addr >= (unsigned long)_stext && addr < (unsigned long)_end;
>> +
>> +}
>> +
>>  #ifdef CONFIG_MMU
>>  extern int is_vmalloc_or_module_addr(const void *x);
>>  #else
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index 3ca82d4..9a9ef65 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -229,6 +229,42 @@ int is_vmalloc_or_module_addr(const void *x)
>>         return is_vmalloc_addr(x);
>>  }
>>
>> +static struct page *kernel_image_to_page(const void *kernel_addr, pgd_t *pgd)
>> +{
>> +       unsigned long addr = (unsigned long)kernel_addr;
>> +       struct page *page = NULL;
>> +       pud_t *pud;
>> +       pmd_t *pmd;
>> +       pte_t *pte;
>> +
>> +       if (pgd_none(*pgd))
>> +               goto out;
>> +
>> +       pud = pud_offset(pgd, addr);
>> +       if (pud_none(*pud))
>> +               goto out;
>> +
>> +       if (pud_sect(*pud))
>> +               return pud_page(*pud);
> The *_sect() helpers are arch-specific, so we cannot use them in generic
> code. This would need to be architecture-specific.
>
> Secondly, this will return head page of the section regardless of which
> page in the section the address corresponds to
  Yes, the code was written in a hurry and is messy. We will send a formal
  patch later if you still accept this approach.

 Thanks
 zhongjiang
>> +
>> +       pmd = pmd_offset(*pmd, addr);
>> +       if (pmd_none(*pmd))
>> +               goto out;
>> +
>> +       if (pmd_sect(*pmd))
>> +               return pmd_page(*pmd);
> Likewise on both counts.
>
>> +
>> +       pte = pte_offset_kernel(pmd, addr);
>> +       if (pte_none(*pte))
>> +               goto out;
>> +
>> +       page = pte_page(*pte);
>> +
>> +out:
>> +       return page;
>> +
>> +}
> Given we know what the address should map to, I don't think we need to
> walk the page tables here. I think this can be:
>
> static struct page *kernel_image_to_page(const void *addr)
> {
> 	return virt_to_page(lm_alias(vmalloc_addr));
> }
>
>> +
>>  /*
>>    * Walk a vmap address to the struct page it maps.
>>    */
>> @@ -244,6 +280,9 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
>>          */
>>         VIRTUAL_BUG_ON(!is_vmalloc_or_module_addr(vmalloc_addr));
>>
>> +       if (is_kernel_image_addr(vmalloc_addr))
>> +               return kernel_image_to_page(vmalloc_addr, pgd);
> It's not clear to me that this is the right place for this to live.
>
> It might be best to code the kernel image logic directly in kcore (and
> kmem), assuming everyone's OK with that approach.
>
> Thanks,
> Mark.
> .
>


* [PATCH] arm64: fix the overlap between the kernel image and vmalloc address
  2017-04-24 17:52       ` Laura Abbott
  2017-04-24 17:56         ` Ard Biesheuvel
@ 2017-04-25 14:51         ` Mark Rutland
  1 sibling, 0 replies; 11+ messages in thread
From: Mark Rutland @ 2017-04-25 14:51 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Apr 24, 2017 at 10:52:08AM -0700, Laura Abbott wrote:
> On 04/24/2017 08:51 AM, Mark Rutland wrote:
> > On Mon, Apr 24, 2017 at 09:28:48PM +0800, zhong jiang wrote:

> >>  /*
> >>    * Walk a vmap address to the struct page it maps.
> >>    */
> >> @@ -244,6 +280,9 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
> >>          */
> >>         VIRTUAL_BUG_ON(!is_vmalloc_or_module_addr(vmalloc_addr));
> >>
> >> +       if (is_kernel_image_addr(vmalloc_addr))
> >> +               return kernel_image_to_page(vmalloc_addr, pgd);
> > 
> > It's not clear to me that this is the right place for this to live.
> > 
> > It might be best to code the kernel image logic directly in kcore (and
> > kmem), assuming everyone's OK with that approach.
> > 
> 
> That will fix kcore and kmem but this will show up in other places too.

True.

> We've gone through and made sure that virt_addr_valid returns
> true if and only if virt_to_page returns a valid address. I don't know
> if we can make as strong a claim about is_vmalloc_addr and
> vmalloc_to_page in all cases but is_vmalloc_addr should not return true
> for the kernel image. That would at least let kcore fall back to
> kern_addr_valid which should correctly handle the kernel image.

That would largely be my preference.

My fear is that other users of is_vmalloc_addr() are doing the right
thing for the kernel image today (e.g. not doing virt_to_phys()),
because they see it as a vmalloc addr.

So we might have to audit all of those.

> The suggestion to move the kernel image out of VMALLOC_START/VMALLOC_END
> seems like the best approach although I haven't tried a prototype
> at all.

Given that (AFAICT) we're the only architecture that puts the kernel in
the vmalloc area, I agree that this is likely to be the simplest correct
approach. The interaction with KASLR is somewhat unfortunate.

Thanks,
Mark.


* [PATCH] arm64: fix the overlap between the kernel image and vmalloc address
  2017-04-24 17:56         ` Ard Biesheuvel
  2017-04-25  8:13           ` zhong jiang
@ 2017-04-25 15:02           ` Mark Rutland
  2017-04-25 15:18             ` Ard Biesheuvel
  1 sibling, 1 reply; 11+ messages in thread
From: Mark Rutland @ 2017-04-25 15:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Apr 24, 2017 at 06:56:43PM +0100, Ard Biesheuvel wrote:
> On 24 April 2017 at 18:52, Laura Abbott <labbott@redhat.com> wrote:
> > On 04/24/2017 08:51 AM, Mark Rutland wrote:
> >> On Mon, Apr 24, 2017 at 09:28:48PM +0800, zhong jiang wrote:

> >> static struct page *kernel_image_to_page(const void *addr)
> >> {
> >>       return virt_to_page(lm_alias(vmalloc_addr));
> >> }
> >>
> >>> +
> >>>  /*
> >>>    * Walk a vmap address to the struct page it maps.
> >>>    */
> >>> @@ -244,6 +280,9 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
> >>>          */
> >>>         VIRTUAL_BUG_ON(!is_vmalloc_or_module_addr(vmalloc_addr));
> >>>
> >>> +       if (is_kernel_image_addr(vmalloc_addr))
> >>> +               return kernel_image_to_page(vmalloc_addr, pgd);
> >>
> >> It's not clear to me that this is the right place for this to live.
> >>
> >> It might be best to code the kernel image logic directly in kcore (and
> >> kmem), assuming everyone's OK with that approach.
> >>
> >
> > That will fix kcore and kmem but this will show up in other places too.
> > We've gone through and made sure that virt_addr_valid returns
> > true if and only if virt_to_page returns a valid address. I don't know
> > if we can make as strong a claim about is_vmalloc_addr and
> > vmalloc_to_page in all cases but is_vmalloc_addr should not return true
> > for the kernel image. That would at least let kcore fall back to
> > kern_addr_valid which should correctly handle the kernel image.
> > The suggestion to move the kernel image out of VMALLOC_START/VMALLOC_END
> > seems like the best approach although I haven't tried a prototype
> > at all.
> 
> Moving the kernel into the vmalloc region was kind of the point, for
> KASLR. Undoing that means either disabling KASLR, or splitting the
> vmalloc region into a KASLR region only for the kernel, and a vmalloc
> region like we had when vmlinux lived in the linear region.

AFAICT, x86 doesn't place the kernel in the vmalloc region for its KASLR
implementation. Is that just to avoid the issues that we're seeing, or
are there other constraints on x86?

> In general, I think we should be able to deal with different kinds of
> mappings with different granularity in the vmalloc region. If
> necessary, we could introduce a VM_xxx flag for the kernel to
> distinguish such regions from ordinary VM_MAP regions.

I don't think that vmalloc_to_page() should have to deal with anything
other than the usual page-granular memory mappings in the vmalloc area,
as it currently does. We'd have to write a page table walker per-arch
for that to work, and the only thing it would benefit is arm64's kernel
image mapping.

Adding VM_xxx (e.g. VM_KERNEL) sounds sane to me regardless of anything
else.

That still leaves the question as to what is_vmalloc_addr(addr) (should)
imply about addr, though.

Thanks,
Mark.


* [PATCH] arm64: fix the overlap between the kernel image and vmalloc address
  2017-04-25 15:02           ` Mark Rutland
@ 2017-04-25 15:18             ` Ard Biesheuvel
  0 siblings, 0 replies; 11+ messages in thread
From: Ard Biesheuvel @ 2017-04-25 15:18 UTC (permalink / raw)
  To: linux-arm-kernel

On 25 April 2017 at 16:02, Mark Rutland <mark.rutland@arm.com> wrote:
> On Mon, Apr 24, 2017 at 06:56:43PM +0100, Ard Biesheuvel wrote:
>> On 24 April 2017 at 18:52, Laura Abbott <labbott@redhat.com> wrote:
>> > On 04/24/2017 08:51 AM, Mark Rutland wrote:
>> >> On Mon, Apr 24, 2017 at 09:28:48PM +0800, zhong jiang wrote:
>
>> >> static struct page *kernel_image_to_page(const void *addr)
>> >> {
>> >>       return virt_to_page(lm_alias(vmalloc_addr));
>> >> }
>> >>
>> >>> +
>> >>>  /*
>> >>>    * Walk a vmap address to the struct page it maps.
>> >>>    */
>> >>> @@ -244,6 +280,9 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
>> >>>          */
>> >>>         VIRTUAL_BUG_ON(!is_vmalloc_or_module_addr(vmalloc_addr));
>> >>>
>> >>> +       if (is_kernel_image_addr(vmalloc_addr))
>> >>> +               return kernel_image_to_page(vmalloc_addr, pgd);
>> >>
>> >> It's not clear to me that this is the right place for this to live.
>> >>
>> >> It might be best to code the kernel image logic directly in kcore (and
>> >> kmem), assuming everyone's OK with that approach.
>> >>
>> >
>> > That will fix kcore and kmem but this will show up in other places too.
>> > We've gone through and made sure that virt_addr_valid returns
>> > true if and only if virt_to_page returns a valid address. I don't know
>> > if we can make as strong a claim about is_vmalloc_addr and
>> > vmalloc_to_page in all cases but is_vmalloc_addr should not return true
>> > for the kernel image. That would at least let kcore fall back to
>> > kern_addr_valid which should correctly handle the kernel image.
>> > The suggestion to move the kernel image out of VMALLOC_START/VMALLOC_END
>> > seems like the best approach although I haven't tried a prototype
>> > at all.
>>
>> Moving the kernel into the vmalloc region was kind of the point, for
>> KASLR. Undoing that means either disabling KASLR, or splitting the
>> vmalloc region into a KASLR region only for the kernel, and a vmalloc
>> region like we had when vmlinux lived in the linear region.
>
> AFAICT, x86 don't place the kernel in the vmalloc region for its KASLR
> implementation. Is that just to avoid the issues that we're seeing, or
> are there aother constraints on x86?
>

Given how a lot of architectures put kernel modules in the vmalloc
space, I don't think there is generally an issue with putting the
kernel there as well, other than what we are currently experiencing.

As discussed, ioremap regions are also likely to be mapped using block
mappings these days, and also reside between VMALLOC_START and
VMALLOC_END.

So in my opinion, since
a) we already put text and data in the vmalloc region, and
b) we already use block mappings in the vmalloc region

putting the kernel there is only a minor variation of what we already
have to deal with, and so I don't think there is anything wrong with
it. We simply have to tweak some tests here and there, but nothing
fundamental afaict.

>> In general, I think we should be able to deal with different kinds of
>> mappings with different granularity in the vmalloc region. If
>> necessary, we could introduce a VM_xxx flag for the kernel to
>> distinguish such regions from ordinary VM_MAP regions.
>
> I don't think that vmalloc_to_page() should have to deal with anything
> other than the usual page-granular memory mappings in the vmalloc area,
> as it currently does. We'd have to write a page table walker per-arch
> for that to work, and the only thing it would benefit is arm64's kernel
> image mapping.
>
> Adding VM_xxx (e.g. VM_KERNEL) sounds sane to me regardless of anything
> else.
>
> That still leaves the question as to what is_vmalloc_addr(addr) (should)
> imply about addr, though.
>
> Thanks,
> Mark.



Thread overview: 11+ messages
2017-04-24  9:22 [PATCH] arm64: fix the overlap between the kernel image and vmalloc address zhongjiang
2017-04-24 10:44 ` Mark Rutland
2017-04-24 13:28   ` zhong jiang
2017-04-24 15:51     ` Mark Rutland
2017-04-24 17:52       ` Laura Abbott
2017-04-24 17:56         ` Ard Biesheuvel
2017-04-25  8:13           ` zhong jiang
2017-04-25 15:02           ` Mark Rutland
2017-04-25 15:18             ` Ard Biesheuvel
2017-04-25 14:51         ` Mark Rutland
2017-04-25 14:11       ` zhong jiang
