linux-kernel.vger.kernel.org archive mirror
* [PATCH v2] x86, mm: set NX across entire PMD at boot
@ 2014-11-14 20:45 Kees Cook
  2014-11-15  1:29 ` Yinghai Lu
  0 siblings, 1 reply; 19+ messages in thread
From: Kees Cook @ 2014-11-14 20:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Andrew Morton,
	Andy Lutomirski, Yasuaki Ishimatsu, Yinghai Lu, Wang Nan,
	David Vrabel

When setting up permissions on kernel memory at boot, the end of the
PMD that was split from bss remained executable. It should be NX like
the rest. This change rounds the range up to a PMD boundary instead of a page
boundary to cover the correct span of memory, and frees the now-unused tail.
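
Concretely, with the addresses from the dumps below (PMD_SIZE is 2MB on
x86_64; roundup() here is the kernel's rounding macro, and this is just the
arithmetic, not new kernel code):

  unsigned long all_end = 0xffffffff82df5000UL;        /* PFN_ALIGN(&_end), from the dump */
  unsigned long pmd_end = roundup(all_end, PMD_SIZE);  /* 0xffffffff82e00000 */
  /* pmd_end - all_end == 0xb000 (44K): the tail shown as 'x' in the Before dump */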

Before:
---[ High Kernel Mapping ]---
...
0xffffffff8202d000-0xffffffff82200000  1868K     RW       GLB NX pte
0xffffffff82200000-0xffffffff82c00000    10M     RW   PSE GLB NX pmd
0xffffffff82c00000-0xffffffff82df5000  2004K     RW       GLB NX pte
0xffffffff82df5000-0xffffffff82e00000    44K     RW       GLB x  pte
0xffffffff82e00000-0xffffffffc0000000   978M                     pmd

After:
---[ High Kernel Mapping ]---
...
0xffffffff8202d000-0xffffffff82200000  1868K     RW       GLB NX pte
0xffffffff82200000-0xffffffff82c00000    10M     RW   PSE GLB NX pmd
0xffffffff82c00000-0xffffffff82df5000  2004K     RW       GLB NX pte
0xffffffff82df5000-0xffffffff82e00000    44K     RW           NX pte
0xffffffff82e00000-0xffffffffc0000000   978M                     pmd

Signed-off-by: Kees Cook <keescook@chromium.org>
---
v2:
 - added call to free_init_pages(), as suggested by tglx
---
 arch/x86/mm/init_64.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 4cb8763868fc..0d498c922668 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1124,6 +1124,7 @@ void mark_rodata_ro(void)
 	unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
 	unsigned long rodata_end = PFN_ALIGN(&__end_rodata);
 	unsigned long all_end = PFN_ALIGN(&_end);
+	unsigned long pmd_end = roundup(all_end, PMD_SIZE);
 
 	printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
 	       (end - start) >> 10);
@@ -1135,7 +1136,7 @@ void mark_rodata_ro(void)
 	 * The rodata/data/bss/brk section (but not the kernel text!)
 	 * should also be not-executable.
 	 */
-	set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
+	set_memory_nx(rodata_start, (pmd_end - rodata_start) >> PAGE_SHIFT);
 
 	rodata_test();
 
@@ -1147,6 +1148,7 @@ void mark_rodata_ro(void)
 	set_memory_ro(start, (end-start) >> PAGE_SHIFT);
 #endif
 
+	free_init_pages("unused kernel", all_end, pmd_end);
 	free_init_pages("unused kernel",
 			(unsigned long) __va(__pa_symbol(text_end)),
 			(unsigned long) __va(__pa_symbol(rodata_start)));
-- 
1.9.1


-- 
Kees Cook
Chrome OS Security


* Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
  2014-11-14 20:45 [PATCH v2] x86, mm: set NX across entire PMD at boot Kees Cook
@ 2014-11-15  1:29 ` Yinghai Lu
  2014-11-15  2:29   ` Yinghai Lu
  2014-11-15  3:06   ` Kees Cook
  0 siblings, 2 replies; 19+ messages in thread
From: Yinghai Lu @ 2014-11-15  1:29 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linux Kernel Mailing List, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, the arch/x86 maintainers, Andrew Morton,
	Andy Lutomirski, Yasuaki Ishimatsu, Wang Nan, David Vrabel

On Fri, Nov 14, 2014 at 12:45 PM, Kees Cook <keescook@chromium.org> wrote:
> When setting up permissions on kernel memory at boot, the end of the
> PMD that was split from bss remained executable. It should be NX like
> the rest. This change rounds the range up to a PMD boundary instead of a page
> boundary to cover the correct span of memory, and frees the now-unused tail.
>
> Before:
> ---[ High Kernel Mapping ]---
> ...
> 0xffffffff8202d000-0xffffffff82200000  1868K     RW       GLB NX pte
> 0xffffffff82200000-0xffffffff82c00000    10M     RW   PSE GLB NX pmd
> 0xffffffff82c00000-0xffffffff82df5000  2004K     RW       GLB NX pte
> 0xffffffff82df5000-0xffffffff82e00000    44K     RW       GLB x  pte
> 0xffffffff82e00000-0xffffffffc0000000   978M                     pmd
>
> After:
> ---[ High Kernel Mapping ]---
> ...
> 0xffffffff8202d000-0xffffffff82200000  1868K     RW       GLB NX pte
> 0xffffffff82200000-0xffffffff82c00000    10M     RW   PSE GLB NX pmd
> 0xffffffff82c00000-0xffffffff82df5000  2004K     RW       GLB NX pte
> 0xffffffff82df5000-0xffffffff82e00000    44K     RW           NX pte
> 0xffffffff82e00000-0xffffffffc0000000   978M                     pmd
>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
> v2:
>  - added call to free_init_pages(), as suggested by tglx
> ---
>  arch/x86/mm/init_64.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 4cb8763868fc..0d498c922668 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1124,6 +1124,7 @@ void mark_rodata_ro(void)
>         unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
>         unsigned long rodata_end = PFN_ALIGN(&__end_rodata);
>         unsigned long all_end = PFN_ALIGN(&_end);
> +       unsigned long pmd_end = roundup(all_end, PMD_SIZE);
>
>         printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
>                (end - start) >> 10);
> @@ -1135,7 +1136,7 @@ void mark_rodata_ro(void)
>          * The rodata/data/bss/brk section (but not the kernel text!)
>          * should also be not-executable.
>          */
> -       set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
> +       set_memory_nx(rodata_start, (pmd_end - rodata_start) >> PAGE_SHIFT);
>
>         rodata_test();
>
> @@ -1147,6 +1148,7 @@ void mark_rodata_ro(void)
>         set_memory_ro(start, (end-start) >> PAGE_SHIFT);
>  #endif
>
> +       free_init_pages("unused kernel", all_end, pmd_end);
>         free_init_pages("unused kernel",
>                         (unsigned long) __va(__pa_symbol(text_end)),
>                         (unsigned long) __va(__pa_symbol(rodata_start)));

something is wrong:

[    7.842479] Freeing unused kernel memory: 3844K (ffffffff82e52000 -
ffffffff83213000)
[    7.843305] Write protecting the kernel read-only data: 28672k
[    7.844433] BUG: Bad page state in process swapper/0  pfn:043c0
[    7.845093] page:ffffea000010f000 count:0 mapcount:-127 mapping:
      (null) index:0x2
[    7.846388] flags: 0x10000000000000()
[    7.846871] page dumped because: nonzero mapcount
[    7.847343] Modules linked in:
[    7.847719] CPU: 2 PID: 1 Comm: swapper/0 Not tainted
3.18.0-rc4-yh-01896-g40204c8-dirty #23
[    7.848809] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org
04/01/2014
[    7.850014]  ffffffff828300ca ffff880078babd68 ffffffff81ff47d0
0000000000000001
[    7.850857]  ffffea000010f000 ffff880078babd98 ffffffff8118c2bd
00000000001d4cc0
[    7.851791]  ffffea000010f000 ffffea000010f000 0000000000000000
ffff880078babdf8
[    7.852700] Call Trace:
[    7.852991]  [<ffffffff81ff47d0>] dump_stack+0x45/0x57
[    7.853494]  [<ffffffff8118c2bd>] bad_page+0xfd/0x130
[    7.854130]  [<ffffffff8118c42c>] free_pages_prepare+0x13c/0x1c0
[    7.854808]  [<ffffffff8118c64d>] ? adjust_managed_page_count+0x5d/0x70
[    7.855575]  [<ffffffff8118f285>] free_hot_cold_page+0x35/0x180
[    7.856326]  [<ffffffff8118f3e3>] __free_pages+0x13/0x40
[    7.856854]  [<ffffffff8118f4dd>] free_reserved_area+0xcd/0x140
[    7.857442]  [<ffffffff81091778>] free_init_pages+0x98/0xb0
[    7.858001]  [<ffffffff81092085>] mark_rodata_ro+0xb5/0x120
[    7.858622]  [<ffffffff81fe3240>] ? rest_init+0xc0/0xc0
[    7.859174]  [<ffffffff81fe325d>] kernel_init+0x1d/0x100
[    7.859724]  [<ffffffff820066ec>] ret_from_fork+0x7c/0xb0
[    7.860279]  [<ffffffff81fe3240>] ? rest_init+0xc0/0xc0
[    7.860836] Disabling lock debugging due to kernel taint
[    7.861432] Freeing unused kernel memory: 376K (ffffffff843a2000 -
ffffffff84400000)
[    7.866118] Freeing unused kernel memory: 1980K (ffff880002011000 -
ffff880002200000)
[    7.870525] Freeing unused kernel memory: 1932K (ffff880002a1d000 -
ffff880002c00000)

[    0.000000]   .text: [0x01000000-0x0200d548]
[    0.000000] .rodata: [0x02200000-0x02a1cfff]
[    0.000000]   .data: [0x02c00000-0x02e50e7f]
[    0.000000]   .init: [0x02e52000-0x03212fff]
[    0.000000]    .bss: [0x03221000-0x0437bfff]
[    0.000000]    .brk: [0x0437c000-0x043a1fff]


* Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
  2014-11-15  1:29 ` Yinghai Lu
@ 2014-11-15  2:29   ` Yinghai Lu
  2014-11-15  2:46     ` Kees Cook
  2014-11-15  3:06   ` Kees Cook
  1 sibling, 1 reply; 19+ messages in thread
From: Yinghai Lu @ 2014-11-15  2:29 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linux Kernel Mailing List, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, the arch/x86 maintainers, Andrew Morton,
	Andy Lutomirski, Yasuaki Ishimatsu, Wang Nan, David Vrabel


On Fri, Nov 14, 2014 at 5:29 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, Nov 14, 2014 at 12:45 PM, Kees Cook <keescook@chromium.org> wrote:
>> v2:
>>  - added call to free_init_pages(), as suggested by tglx

> something is wrong:
>
> [    7.842479] Freeing unused kernel memory: 3844K (ffffffff82e52000 -
> ffffffff83213000)
> [    7.843305] Write protecting the kernel read-only data: 28672k

....
should use attached one instead.

1. Should use _brk_end instead of &_end, as only part of brk is used.
2. The [_brk_end, pmd_end) page range is already converted, i.e. it is not wasted.

---
 arch/x86/mm/init_64.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -1124,7 +1124,8 @@ void mark_rodata_ro(void)
     unsigned long end = (unsigned long) &__end_rodata_hpage_align;
     unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
     unsigned long rodata_end = PFN_ALIGN(&__end_rodata);
-    unsigned long all_end = PFN_ALIGN(&_end);
+    unsigned long all_end = PFN_ALIGN(_brk_end);
+    unsigned long pmd_end = roundup(all_end, PMD_SIZE);

     printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
            (end - start) >> 10);
@@ -1136,7 +1137,7 @@ void mark_rodata_ro(void)
      * The rodata/data/bss/brk section (but not the kernel text!)
      * should also be not-executable.
      */
-    set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
+    set_memory_nx(rodata_start, (pmd_end - rodata_start) >> PAGE_SHIFT);

     rodata_test();

@@ -1148,6 +1149,7 @@ void mark_rodata_ro(void)
     set_memory_ro(start, (end-start) >> PAGE_SHIFT);
 #endif

+    /* all_end to pmd_end is handled via free_all_bootmem() */
     free_init_pages("unused kernel",
             (unsigned long) __va(__pa_symbol(text_end)),
             (unsigned long) __va(__pa_symbol(rodata_start)));


* Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
  2014-11-15  2:29   ` Yinghai Lu
@ 2014-11-15  2:46     ` Kees Cook
  2014-11-15  3:38       ` Yinghai Lu
                         ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Kees Cook @ 2014-11-15  2:46 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Linux Kernel Mailing List, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, the arch/x86 maintainers, Andrew Morton,
	Andy Lutomirski, Yasuaki Ishimatsu, Wang Nan, David Vrabel

On Fri, Nov 14, 2014 at 6:29 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, Nov 14, 2014 at 5:29 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Fri, Nov 14, 2014 at 12:45 PM, Kees Cook <keescook@chromium.org> wrote:
>>> v2:
>>>  - added call to free_init_pages(), as suggested by tglx
>
>> something is wrong:
>>
>> [    7.842479] Freeing unused kernel memory: 3844K (ffffffff82e52000 -
>> ffffffff83213000)
>> [    7.843305] Write protecting the kernel read-only data: 28672k
>
> ....
> should use attached one instead.
>
> 1. Should use _brk_end instead of &_end, as only part of brk is used.
> 2. The [_brk_end, pmd_end) page range is already converted, i.e. it is not wasted.

Are you sure? For me, _brk_end isn't far enough:

[    1.475572] all_end: 0xffffffff82df5000
[    1.476736] _brk_end: 0xffffffff82dd6000
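
(For reference, those two values are from local debug output; presumably
something along these lines was added to mark_rodata_ro(), though the exact
printk is not part of any posted patch:)

  printk(KERN_INFO "all_end: 0x%lx\n", all_end);
  printk(KERN_INFO "_brk_end: 0x%lx\n", _brk_end);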

>
> ---
>  arch/x86/mm/init_64.c |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> Index: linux-2.6/arch/x86/mm/init_64.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/mm/init_64.c
> +++ linux-2.6/arch/x86/mm/init_64.c
> @@ -1124,7 +1124,8 @@ void mark_rodata_ro(void)
>      unsigned long end = (unsigned long) &__end_rodata_hpage_align;
>      unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
>      unsigned long rodata_end = PFN_ALIGN(&__end_rodata);
> -    unsigned long all_end = PFN_ALIGN(&_end);
> +    unsigned long all_end = PFN_ALIGN(_brk_end);
> +    unsigned long pmd_end = roundup(all_end, PMD_SIZE);
>
>      printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
>             (end - start) >> 10);
> @@ -1136,7 +1137,7 @@ void mark_rodata_ro(void)
>       * The rodata/data/bss/brk section (but not the kernel text!)
>       * should also be not-executable.
>       */
> -    set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
> +    set_memory_nx(rodata_start, (pmd_end - rodata_start) >> PAGE_SHIFT);
>
>      rodata_test();
>
> @@ -1148,6 +1149,7 @@ void mark_rodata_ro(void)
>      set_memory_ro(start, (end-start) >> PAGE_SHIFT);
>  #endif
>
> +    /* all_end to pmd_end is handled via free_all_bootmem() */
>      free_init_pages("unused kernel",
>              (unsigned long) __va(__pa_symbol(text_end)),
>              (unsigned long) __va(__pa_symbol(rodata_start)));

This patch produces the same results as my v1 patch:

0xffffffff8202d000-0xffffffff82200000  1868K     RW       GLB NX pte
0xffffffff82200000-0xffffffff82e00000    12M     RW   PSE GLB NX pmd
0xffffffff82e00000-0xffffffffc0000000   978M                     pmd

Is this correct? It sounded like tglx wanted the pmd split, like this:

0xffffffff82200000-0xffffffff82c00000    10M     RW   PSE GLB NX pmd
0xffffffff82c00000-0xffffffff82df5000  2004K     RW       GLB NX pte
0xffffffff82df5000-0xffffffff82e00000    44K     RW           NX pte
0xffffffff82e00000-0xffffffffc0000000   978M                     pmd
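
(For reference, page-table dumps like the one above come from the kernel's
page-table dump support; assuming CONFIG_X86_PTDUMP=y, the same view can be
read at runtime from /sys/kernel/debug/kernel_page_tables.)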


-Kees

-- 
Kees Cook
Chrome OS Security


* Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
  2014-11-15  1:29 ` Yinghai Lu
  2014-11-15  2:29   ` Yinghai Lu
@ 2014-11-15  3:06   ` Kees Cook
  2014-11-15  3:34     ` Yinghai Lu
  1 sibling, 1 reply; 19+ messages in thread
From: Kees Cook @ 2014-11-15  3:06 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Linux Kernel Mailing List, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, the arch/x86 maintainers, Andrew Morton,
	Andy Lutomirski, Yasuaki Ishimatsu, Wang Nan, David Vrabel

On Fri, Nov 14, 2014 at 5:29 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, Nov 14, 2014 at 12:45 PM, Kees Cook <keescook@chromium.org> wrote:
>> When setting up permissions on kernel memory at boot, the end of the
>> PMD that was split from bss remained executable. It should be NX like
>> the rest. This change rounds the range up to a PMD boundary instead of a page
>> boundary to cover the correct span of memory, and frees the now-unused tail.
>>
>> Before:
>> ---[ High Kernel Mapping ]---
>> ...
>> 0xffffffff8202d000-0xffffffff82200000  1868K     RW       GLB NX pte
>> 0xffffffff82200000-0xffffffff82c00000    10M     RW   PSE GLB NX pmd
>> 0xffffffff82c00000-0xffffffff82df5000  2004K     RW       GLB NX pte
>> 0xffffffff82df5000-0xffffffff82e00000    44K     RW       GLB x  pte
>> 0xffffffff82e00000-0xffffffffc0000000   978M                     pmd
>>
>> After:
>> ---[ High Kernel Mapping ]---
>> ...
>> 0xffffffff8202d000-0xffffffff82200000  1868K     RW       GLB NX pte
>> 0xffffffff82200000-0xffffffff82c00000    10M     RW   PSE GLB NX pmd
>> 0xffffffff82c00000-0xffffffff82df5000  2004K     RW       GLB NX pte
>> 0xffffffff82df5000-0xffffffff82e00000    44K     RW           NX pte
>> 0xffffffff82e00000-0xffffffffc0000000   978M                     pmd
>>
>> Signed-off-by: Kees Cook <keescook@chromium.org>
>> ---
>> v2:
>>  - added call to free_init_pages(), as suggested by tglx
>> ---
>>  arch/x86/mm/init_64.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>> index 4cb8763868fc..0d498c922668 100644
>> --- a/arch/x86/mm/init_64.c
>> +++ b/arch/x86/mm/init_64.c
>> @@ -1124,6 +1124,7 @@ void mark_rodata_ro(void)
>>         unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
>>         unsigned long rodata_end = PFN_ALIGN(&__end_rodata);
>>         unsigned long all_end = PFN_ALIGN(&_end);
>> +       unsigned long pmd_end = roundup(all_end, PMD_SIZE);
>>
>>         printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
>>                (end - start) >> 10);
>> @@ -1135,7 +1136,7 @@ void mark_rodata_ro(void)
>>          * The rodata/data/bss/brk section (but not the kernel text!)
>>          * should also be not-executable.
>>          */
>> -       set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
>> +       set_memory_nx(rodata_start, (pmd_end - rodata_start) >> PAGE_SHIFT);
>>
>>         rodata_test();
>>
>> @@ -1147,6 +1148,7 @@ void mark_rodata_ro(void)
>>         set_memory_ro(start, (end-start) >> PAGE_SHIFT);
>>  #endif
>>
>> +       free_init_pages("unused kernel", all_end, pmd_end);
>>         free_init_pages("unused kernel",
>>                         (unsigned long) __va(__pa_symbol(text_end)),
>>                         (unsigned long) __va(__pa_symbol(rodata_start)));
>
> something is wrong:
>
> [    7.842479] Freeing unused kernel memory: 3844K (ffffffff82e52000 -
> ffffffff83213000)
> [    7.843305] Write protecting the kernel read-only data: 28672k
> [    7.844433] BUG: Bad page state in process swapper/0  pfn:043c0
> [    7.845093] page:ffffea000010f000 count:0 mapcount:-127 mapping:
>       (null) index:0x2
> [    7.846388] flags: 0x10000000000000()
> [    7.846871] page dumped because: nonzero mapcount
> [    7.847343] Modules linked in:
> [    7.847719] CPU: 2 PID: 1 Comm: swapper/0 Not tainted
> 3.18.0-rc4-yh-01896-g40204c8-dirty #23
> [    7.848809] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org
> 04/01/2014
> [    7.850014]  ffffffff828300ca ffff880078babd68 ffffffff81ff47d0
> 0000000000000001
> [    7.850857]  ffffea000010f000 ffff880078babd98 ffffffff8118c2bd
> 00000000001d4cc0
> [    7.851791]  ffffea000010f000 ffffea000010f000 0000000000000000
> ffff880078babdf8
> [    7.852700] Call Trace:
> [    7.852991]  [<ffffffff81ff47d0>] dump_stack+0x45/0x57
> [    7.853494]  [<ffffffff8118c2bd>] bad_page+0xfd/0x130
> [    7.854130]  [<ffffffff8118c42c>] free_pages_prepare+0x13c/0x1c0
> [    7.854808]  [<ffffffff8118c64d>] ? adjust_managed_page_count+0x5d/0x70
> [    7.855575]  [<ffffffff8118f285>] free_hot_cold_page+0x35/0x180
> [    7.856326]  [<ffffffff8118f3e3>] __free_pages+0x13/0x40
> [    7.856854]  [<ffffffff8118f4dd>] free_reserved_area+0xcd/0x140
> [    7.857442]  [<ffffffff81091778>] free_init_pages+0x98/0xb0
> [    7.858001]  [<ffffffff81092085>] mark_rodata_ro+0xb5/0x120
> [    7.858622]  [<ffffffff81fe3240>] ? rest_init+0xc0/0xc0
> [    7.859174]  [<ffffffff81fe325d>] kernel_init+0x1d/0x100
> [    7.859724]  [<ffffffff820066ec>] ret_from_fork+0x7c/0xb0
> [    7.860279]  [<ffffffff81fe3240>] ? rest_init+0xc0/0xc0
> [    7.860836] Disabling lock debugging due to kernel taint
> [    7.861432] Freeing unused kernel memory: 376K (ffffffff843a2000 -
> ffffffff84400000)
> [    7.866118] Freeing unused kernel memory: 1980K (ffff880002011000 -
> ffff880002200000)
> [    7.870525] Freeing unused kernel memory: 1932K (ffff880002a1d000 -
> ffff880002c00000)

Also, what tree is this? "Freeing %s" went away in
c88442ec45f30d587b38b935a14acde4e217a926 (and should probably be
re-added, which is what I assume has happened.)

>
> [    0.000000]   .text: [0x01000000-0x0200d548]
> [    0.000000] .rodata: [0x02200000-0x02a1cfff]
> [    0.000000]   .data: [0x02c00000-0x02e50e7f]
> [    0.000000]   .init: [0x02e52000-0x03212fff]
> [    0.000000]    .bss: [0x03221000-0x0437bfff]
> [    0.000000]    .brk: [0x0437c000-0x043a1fff]

And which CONFIG turns on this reporting?

-Kees

-- 
Kees Cook
Chrome OS Security


* Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
  2014-11-15  3:06   ` Kees Cook
@ 2014-11-15  3:34     ` Yinghai Lu
  0 siblings, 0 replies; 19+ messages in thread
From: Yinghai Lu @ 2014-11-15  3:34 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linux Kernel Mailing List, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, the arch/x86 maintainers, Andrew Morton,
	Andy Lutomirski, Yasuaki Ishimatsu, Wang Nan, David Vrabel

On Fri, Nov 14, 2014 at 7:06 PM, Kees Cook <keescook@chromium.org> wrote:
> On Fri, Nov 14, 2014 at 5:29 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>
>> [    0.000000]   .text: [0x01000000-0x0200d548]
>> [    0.000000] .rodata: [0x02200000-0x02a1cfff]
>> [    0.000000]   .data: [0x02c00000-0x02e50e7f]
>> [    0.000000]   .init: [0x02e52000-0x03212fff]
>> [    0.000000]    .bss: [0x03221000-0x0437bfff]
>> [    0.000000]    .brk: [0x0437c000-0x043a1fff]
>
> And which CONFIG turns on this reporting?
>

My local patch.

This .brk range has not been shrunk yet.

Yinghai


* Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
  2014-11-15  2:46     ` Kees Cook
@ 2014-11-15  3:38       ` Yinghai Lu
  2014-11-15  9:43         ` Yinghai Lu
  2014-11-16 18:52         ` Thomas Gleixner
  2014-11-16 23:44       ` Thomas Gleixner
  2014-11-18 17:11       ` Thomas Gleixner
  2 siblings, 2 replies; 19+ messages in thread
From: Yinghai Lu @ 2014-11-15  3:38 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linux Kernel Mailing List, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, the arch/x86 maintainers, Andrew Morton,
	Andy Lutomirski, Yasuaki Ishimatsu, Wang Nan, David Vrabel

On Fri, Nov 14, 2014 at 6:46 PM, Kees Cook <keescook@chromium.org> wrote:
> On Fri, Nov 14, 2014 at 6:29 PM, Yinghai Lu <yinghai@kernel.org> wrote:

>> should use attached one instead.
>>
>> 1. Should use _brk_end instead of &_end, as only part of brk is used.
>> 2. The [_brk_end, pmd_end) page range is already converted, i.e. it is not wasted.
>
> Are you sure? For me, _brk_end isn't far enough:
>
> [    1.475572] all_end: 0xffffffff82df5000
> [    1.476736] _brk_end: 0xffffffff82dd6000

Yes. _brk_end should be smaller than &_end.

>
>>
>> ---
>>  arch/x86/mm/init_64.c |    6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> Index: linux-2.6/arch/x86/mm/init_64.c
>> ===================================================================
>> --- linux-2.6.orig/arch/x86/mm/init_64.c
>> +++ linux-2.6/arch/x86/mm/init_64.c
>> @@ -1124,7 +1124,8 @@ void mark_rodata_ro(void)
>>      unsigned long end = (unsigned long) &__end_rodata_hpage_align;
>>      unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
>>      unsigned long rodata_end = PFN_ALIGN(&__end_rodata);
>> -    unsigned long all_end = PFN_ALIGN(&_end);
>> +    unsigned long all_end = PFN_ALIGN(_brk_end);
>> +    unsigned long pmd_end = roundup(all_end, PMD_SIZE);
>>
>>      printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
>>             (end - start) >> 10);
>> @@ -1136,7 +1137,7 @@ void mark_rodata_ro(void)
>>       * The rodata/data/bss/brk section (but not the kernel text!)
>>       * should also be not-executable.
>>       */
>> -    set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
>> +    set_memory_nx(rodata_start, (pmd_end - rodata_start) >> PAGE_SHIFT);
>>
>>      rodata_test();
>>
>> @@ -1148,6 +1149,7 @@ void mark_rodata_ro(void)
>>      set_memory_ro(start, (end-start) >> PAGE_SHIFT);
>>  #endif
>>
>> +    /* all_end to pmd_end is handled via free_all_bootmem() */
>>      free_init_pages("unused kernel",
>>              (unsigned long) __va(__pa_symbol(text_end)),
>>              (unsigned long) __va(__pa_symbol(rodata_start)));
>
> This patch produces the same results as my v1 patch:
>
> 0xffffffff8202d000-0xffffffff82200000  1868K     RW       GLB NX pte
> 0xffffffff82200000-0xffffffff82e00000    12M     RW   PSE GLB NX pmd
> 0xffffffff82e00000-0xffffffffc0000000   978M                     pmd
>
> Is this correct? It sounded like tglx wanted the pmd split, like this:
>
> 0xffffffff82200000-0xffffffff82c00000    10M     RW   PSE GLB NX pmd
> 0xffffffff82c00000-0xffffffff82df5000  2004K     RW       GLB NX pte
> 0xffffffff82df5000-0xffffffff82e00000    44K     RW           NX pte
> 0xffffffff82e00000-0xffffffffc0000000   978M                     pmd

Need to remove GLB ?

Yinghai


* Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
  2014-11-15  3:38       ` Yinghai Lu
@ 2014-11-15  9:43         ` Yinghai Lu
  2014-11-16 21:26           ` Thomas Gleixner
  2014-11-16 18:52         ` Thomas Gleixner
  1 sibling, 1 reply; 19+ messages in thread
From: Yinghai Lu @ 2014-11-15  9:43 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linux Kernel Mailing List, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, the arch/x86 maintainers, Andrew Morton,
	Andy Lutomirski, Yasuaki Ishimatsu, Wang Nan, David Vrabel


On Fri, Nov 14, 2014 at 7:38 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, Nov 14, 2014 at 6:46 PM, Kees Cook <keescook@chromium.org> wrote:
>> Is this correct? It sounded like tglx wanted the pmd split, like this:
>>
>> 0xffffffff82200000-0xffffffff82c00000    10M     RW   PSE GLB NX pmd
>> 0xffffffff82c00000-0xffffffff82df5000  2004K     RW       GLB NX pte
>> 0xffffffff82df5000-0xffffffff82e00000    44K     RW           NX pte
>> 0xffffffff82e00000-0xffffffffc0000000   978M                     pmd
>
> Need to remove GLB ?

Please check the attached one, which cleans up the highmap tail...

Subject: [RFC PATCH] x86, 64bit: cleanup highmap tail near partial 2M range

1. Should use _brk_end instead of &_end, as only part of brk is used.
2. The [_brk_end, pmd_end) page range is already converted memory and
   is not wasted.
3. Add cleanup_highmap_tail() for [_brk_end, pmd_end).

Kernel Layout:
[    0.000000]   .text: [0x01000000-0x0200d5c8]
[    0.000000] .rodata: [0x02200000-0x02a1cfff]
[    0.000000]   .data: [0x02c00000-0x02e50e7f]
[    0.000000]   .init: [0x02e52000-0x03212fff]
[    0.000000]    .bss: [0x03221000-0x0437bfff]
[    0.000000]    .brk: [0x0437c000-0x043a1fff]

Actually used brk:
[    0.272959] memblock_reserve: [0x0000000437c000-0x00000004382fff]
flags 0x0 BRK

Before patch:
---[ High Kernel Mapping ]---
0xffffffff80000000-0xffffffff81000000          16M                           pmd
0xffffffff81000000-0xffffffff82200000          18M     ro         PSE GLB x  pmd
0xffffffff82200000-0xffffffff82c00000          10M     ro         PSE GLB NX pmd
0xffffffff82c00000-0xffffffff82e00000           2M     RW         PSE GLB NX pmd
0xffffffff82e00000-0xffffffff83000000           2M     RW             GLB NX pte
0xffffffff83000000-0xffffffff83200000           2M     RW         PSE GLB NX pmd
0xffffffff83200000-0xffffffff83400000           2M     RW             GLB NX pte
0xffffffff83400000-0xffffffff84200000          14M     RW         PSE GLB NX pmd
0xffffffff84200000-0xffffffff843a2000        1672K     RW             GLB NX pte
0xffffffff843a2000-0xffffffff84400000         376K     RW             GLB x  pte
0xffffffff84400000-0xffffffffa0000000         444M                           pmd
After patch:
---[ High Kernel Mapping ]---
0xffffffff80000000-0xffffffff81000000          16M                           pmd
0xffffffff81000000-0xffffffff82200000          18M     ro         PSE GLB x  pmd
0xffffffff82200000-0xffffffff82c00000          10M     ro         PSE GLB NX pmd
0xffffffff82c00000-0xffffffff82e00000           2M     RW         PSE GLB NX pmd
0xffffffff82e00000-0xffffffff83000000           2M     RW             GLB NX pte
0xffffffff83000000-0xffffffff83200000           2M     RW         PSE GLB NX pmd
0xffffffff83200000-0xffffffff83400000           2M     RW             GLB NX pte
0xffffffff83400000-0xffffffff84200000          14M     RW         PSE GLB NX pmd
0xffffffff84200000-0xffffffff84383000        1548K     RW             GLB NX pte
0xffffffff84383000-0xffffffff84400000         500K                           pte
0xffffffff84400000-0xffffffffa0000000         444M                           pmd

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/mm/init_64.c  |   23 ++++++++++++++++++++++-
 arch/x86/mm/pageattr.c |    2 +-
 2 files changed, 23 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -375,6 +375,7 @@ void __init init_extra_mapping_uc(unsign
     __init_extra_mapping(phys, size, PAGE_KERNEL_LARGE_NOCACHE);
 }

+static pmd_t *last_pmd;
 /*
  * The head.S code sets up the kernel high mapping:
  *
@@ -408,9 +409,26 @@ void __init cleanup_highmap(void)
             continue;
         if (vaddr < (unsigned long) _text || vaddr > end)
             set_pmd(pmd, __pmd(0));
+        else
+            last_pmd = pmd;
     }
 }

+static void __init cleanup_highmap_tail(unsigned long addr)
+{
+    int i;
+    pte_t *pte;
+
+    if (!last_pmd || pmd_none(*last_pmd))
+        return;
+
+    pte = (pte_t *)pmd_page_vaddr(*last_pmd);
+    pte += pte_index(addr);
+
+    for (i = pte_index(addr); i < PTRS_PER_PTE; i++, pte++)
+        set_pte(pte, __pte(0));
+}
+
 static unsigned long __meminit
 phys_pte_init(pte_t *pte_page, unsigned long addr, unsigned long end,
           pgprot_t prot)
@@ -1124,7 +1142,8 @@ void mark_rodata_ro(void)
     unsigned long end = (unsigned long) &__end_rodata_hpage_align;
     unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
     unsigned long rodata_end = PFN_ALIGN(&__end_rodata);
-    unsigned long all_end = PFN_ALIGN(&_end);
+    unsigned long all_end = PFN_ALIGN(_brk_end);
+    unsigned long pmd_end = roundup(all_end, PMD_SIZE);

     printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
            (end - start) >> 10);
@@ -1137,6 +1156,8 @@ void mark_rodata_ro(void)
      * should also be not-executable.
      */
     set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
+    if (all_end < pmd_end)
+        cleanup_highmap_tail(all_end);

     rodata_test();

Index: linux-2.6/arch/x86/mm/pageattr.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/pageattr.c
+++ linux-2.6/arch/x86/mm/pageattr.c
@@ -100,7 +100,7 @@ static inline unsigned long highmap_star

 static inline unsigned long highmap_end_pfn(void)
 {
-    return __pa_symbol(roundup(_brk_end, PMD_SIZE)) >> PAGE_SHIFT;
+    return __pa_symbol(PFN_ALIGN(_brk_end)) >> PAGE_SHIFT;
 }

 #endif


* Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
  2014-11-15  3:38       ` Yinghai Lu
  2014-11-15  9:43         ` Yinghai Lu
@ 2014-11-16 18:52         ` Thomas Gleixner
  2014-11-17  3:26           ` Yinghai Lu
  1 sibling, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2014-11-16 18:52 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Kees Cook, Linux Kernel Mailing List, Ingo Molnar,
	H. Peter Anvin, the arch/x86 maintainers, Andrew Morton,
	Andy Lutomirski, Yasuaki Ishimatsu, Wang Nan, David Vrabel

On Fri, 14 Nov 2014, Yinghai Lu wrote:

> On Fri, Nov 14, 2014 at 6:46 PM, Kees Cook <keescook@chromium.org> wrote:
> > On Fri, Nov 14, 2014 at 6:29 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> 
> >> should use attached one instead.
> >>
> >> 1. Should use _brk_end instead of &_end, as only part of brk is used.
> >> 2. The [_brk_end, pmd_end) page range is already converted, i.e. it is not wasted.
> >
> > Are you sure? For me, _brk_end isn't far enough:
> >
> > [    1.475572] all_end: 0xffffffff82df5000
> > [    1.476736] _brk_end: 0xffffffff82dd6000
> 
> Yes. _brk_end should be smaller than &_end.

Wrong. _brk_end can move up to _end, i.e. to __brk_limit.
 
But it's safe to use _brk_end when mark_rodata_ro() is called because
extend_brk() is gone already at that point.

Thanks,

	tglx


* Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
  2014-11-15  9:43         ` Yinghai Lu
@ 2014-11-16 21:26           ` Thomas Gleixner
  2014-11-17  3:30             ` Yinghai Lu
  0 siblings, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2014-11-16 21:26 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Kees Cook, Linux Kernel Mailing List, Ingo Molnar,
	H. Peter Anvin, the arch/x86 maintainers, Andrew Morton,
	Andy Lutomirski, Yasuaki Ishimatsu, Wang Nan, David Vrabel

On Sat, 15 Nov 2014, Yinghai Lu wrote:
> +static pmd_t *last_pmd;
>  /*
>   * The head.S code sets up the kernel high mapping:
>   *
> @@ -408,9 +409,26 @@ void __init cleanup_highmap(void)
>              continue;
>          if (vaddr < (unsigned long) _text || vaddr > end)
>              set_pmd(pmd, __pmd(0));
> +        else
> +            last_pmd = pmd;

Why do you need to store this? You can compute this.
 
> +static void __init cleanup_highmap_tail(unsigned long addr)

Brilliant stuff. mark_rodata_ro() is called AFTER free_initmem() which
will free exactly that code.

> +{
> +    int i;
> +    pte_t *pte;
> +
> +    if (!last_pmd || pmd_none(*last_pmd))
> +        return;

Aside from the above: This is complete and utter crap. Just another
useless function which can be avoided by switching on brain before
coding.

> Index: linux-2.6/arch/x86/mm/pageattr.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/mm/pageattr.c
> +++ linux-2.6/arch/x86/mm/pageattr.c
> @@ -100,7 +100,7 @@ static inline unsigned long highmap_star
> 
>  static inline unsigned long highmap_end_pfn(void)
>  {
> -    return __pa_symbol(roundup(_brk_end, PMD_SIZE)) >> PAGE_SHIFT;
> +    return __pa_symbol(PFN_ALIGN(_brk_end)) >> PAGE_SHIFT;
>  }

Wrong as well for CONFIG_DEBUG_RODATA=n for obvious reasons.

I know why I have your patches on blacklist ....

Thanks,

	tglx


* Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
  2014-11-15  2:46     ` Kees Cook
  2014-11-15  3:38       ` Yinghai Lu
@ 2014-11-16 23:44       ` Thomas Gleixner
  2014-11-17  4:00         ` Yinghai Lu
  2014-11-17 20:27         ` Kees Cook
  2014-11-18 17:11       ` Thomas Gleixner
  2 siblings, 2 replies; 19+ messages in thread
From: Thomas Gleixner @ 2014-11-16 23:44 UTC (permalink / raw)
  To: Kees Cook
  Cc: Yinghai Lu, Linux Kernel Mailing List, Ingo Molnar,
	H. Peter Anvin, the arch/x86 maintainers, Andrew Morton,
	Andy Lutomirski, Yasuaki Ishimatsu, Wang Nan, David Vrabel

On Fri, 14 Nov 2014, Kees Cook wrote:
> On Fri, Nov 14, 2014 at 6:29 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> > should use attached one instead.
> >
> > 1. Should use _brk_end instead of &_end, as only part of brk is used.
> > 2. The [_brk_end, pmd_end) page range is already converted, i.e. it is not wasted.
> 
> Are you sure? For me, _brk_end isn't far enough:
> 
> [    1.475572] all_end: 0xffffffff82df5000
> [    1.476736] _brk_end: 0xffffffff82dd6000

_brk_end is adjusted at boot time via extend_brk() up to __brk_limit,
which is the same as _end. We usually do not use all of that space. So
it's expected that _brk_end < _end.
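
(For context, a simplified sketch of what extend_brk() in
arch/x86/kernel/setup.c does; paraphrased and trimmed, not the verbatim
kernel code:)

void * __init extend_brk(size_t size, size_t align)
{
	size_t mask = align - 1;
	void *ret;

	/* brk allocations are only legal while _brk_start is still set,
	   i.e. before reserve_brk() has run */
	BUG_ON(_brk_start == 0);
	BUG_ON(align & mask);

	/* bump _brk_end, never past __brk_limit (which sits at _end) */
	_brk_end = (_brk_end + mask) & ~mask;
	BUG_ON((char *)(_brk_end + size) > __brk_limit);

	ret = (void *)_brk_end;
	_brk_end += size;

	memset(ret, 0, size);
	return ret;
}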
 
> Is this correct? It sounded like tglx wanted the pmd split, like this:

Yes, I wanted to get rid of the high mapping for anything between
_brk_end and _end, and I brought you on the wrong track with my
suggestion to call free_init_pages(). Sorry about that.

That happened because I missed the completely non-obvious fact that
only the effective brk section is reserved for the kernel via
reserve_brk(). So the area between _brk_end and _end is already
reusable. Though that reuse works only by chance and not by design, and
is completely undocumented, like everything else in that area.
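
(The reserve_brk() in question, as visible in the hunk Yinghai posts further
down the thread, reserves only the range that extend_brk() actually handed
out:)

static void __init reserve_brk(void)
{
	/* Only [_brk_start, _brk_end) is reserved; [_brk_end, _end) is not */
	if (_brk_end > _brk_start)
		memblock_reserve(__pa_symbol(_brk_start),
				 _brk_end - _brk_start);

	/* Mark brk area as locked down and no longer taking any
	   new allocations */
	_brk_start = 0;
}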

So the initial patch to get rid of the X mapping is of course to just
extend the area to the PMD. A little bit different to your initial
patch, but essentially the same.

-	unsigned long all_end = PFN_ALIGN(&_end);
+	unsigned long all_end = roundup((unsigned long) &_end, PMD_SIZE);

I'm going to apply your V1 patch with the above roundup()
simplification. If a page of that area gets used later on then we are
going to split up the PMD anyway.
 
But still we want to get rid of that highmap between _brk_end and
_end, but there is absolutely no reason to come up with extra silly
functions for that.

So the obvious solution is to let setup_arch() reserve the memory up
to _end instead of _bss_stop, get rid of the extra reservation in
reserve_brk() and then let free_initmem() release the area between
_brk_end and _end. No extra hackery, no side effects, just works.
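
(A sketch of what the free_initmem() side of that suggestion could look like;
this is a hypothetical illustration assuming setup_arch() now reserves up to
_end as described above, not a posted patch:)

void free_initmem(void)
{
	/* Hypothetical: give back the brk tail that extend_brk() never used,
	 * now that setup_arch() reserves the kernel image up to _end. */
	free_init_pages("unused kernel",
			(unsigned long)__va(__pa_symbol(PFN_ALIGN(_brk_end))),
			(unsigned long)__va(__pa_symbol(&_end)));

	free_init_pages("initmem",
			(unsigned long)&__init_begin,
			(unsigned long)&__init_end);
}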

I spent quite some time to stare into that and I wonder about the
following related issues:

1) Why is the mark_rodata_ro() business a debug configuration, i.e
   CONFIG_DEBUG_RODATA?

   This does not make any sense at all. We really want RO and NX on by
   default and AFAICT distros are turning that on anyway for obvious
   reasons.

   The only idiocy I found so far is the kgdb Documentation, which
   recommends turning it off. Sigh.

   So that should be changed to:

      config ARCH_HAS_RONX
      	  bool   

      config DISABLE_RONX
   	  def_bool !ARCH_HAS_RONX || !KGDB_ENABLE_SECURITY_HOLES

   plus

      config ARCH_WANTS_KGDB_SECURITY_HOLES
      	  bool

      config KGDB_ENABLE_SECURITY_HOLES
      	  bool "WTF?"
   	  depends on BROKEN && ARCH_WANTS_KGDB_SECURITY_HOLES

2) What is actually the modules counterpart for mark_rodata_ro()?

   CONFIG_DEBUG_SET_MODULE_RONX

   Of course not enabled by default, but enabled by distros again.

   See #1.
   
   Now what's interesting, aside from the general fuckup, is that
   CONFIG_DEBUG_RODATA is supported by:
  
     arch/x86 and arch/parisc

   But CONFIG_DEBUG_SET_MODULE_RONX is supported by:

     arch/arm/ arch/arm64 arch/s390 and arch/x86

   This does not make any sense at all.

   Do arm/arm64/s390 have other means to make RO/NX work or are they
   just doing it for modules? And how is that supposed to work with
   KGDB if it is not aware of modules sections being RO/NX? KGDB has
   only extra magic for CONFIG_DEBUG_RODATA, but not for
   CONFIG_DEBUG_SET_MODULE_RONX.

   Now for extended fun the x86 help text for that option says:
   
    ... Such protection may interfere with run-time code
    patching and dynamic kernel tracing - and they might also protect
    against certain classes of kernel exploits.
    If in doubt, say "N".

   Patently wrong. More sigh.  

3) Why is mark_rodata_ro() called AFTER free_initmem() and therefore
   cannot be marked __init?

   Just because ...

Thanks,

	tglx


* Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
  2014-11-16 18:52         ` Thomas Gleixner
@ 2014-11-17  3:26           ` Yinghai Lu
  0 siblings, 0 replies; 19+ messages in thread
From: Yinghai Lu @ 2014-11-17  3:26 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Kees Cook, Linux Kernel Mailing List, Ingo Molnar,
	H. Peter Anvin, the arch/x86 maintainers, Andrew Morton,
	Andy Lutomirski, Yasuaki Ishimatsu, Wang Nan, David Vrabel

On Sun, Nov 16, 2014 at 10:52 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> >
>> > Are you sure? For me, _brk_end isn't far enough:
>> >
>> > [    1.475572] all_end: 0xffffffff82df5000
>> > [    1.476736] _brk_end: 0xffffffff82dd6000
>>
>> Yes. _brk_end should be smaller than &_end.
>
> Wrong. _brk_end can move up to _end, i.e. to __brk_limit.

Confused. What is wrong? _brk_end is not smaller than &_end?

Yinghai


* Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
  2014-11-16 21:26           ` Thomas Gleixner
@ 2014-11-17  3:30             ` Yinghai Lu
  0 siblings, 0 replies; 19+ messages in thread
From: Yinghai Lu @ 2014-11-17  3:30 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Kees Cook, Linux Kernel Mailing List, Ingo Molnar,
	H. Peter Anvin, the arch/x86 maintainers, Andrew Morton,
	Andy Lutomirski, Yasuaki Ishimatsu, Wang Nan, David Vrabel


On Sun, Nov 16, 2014 at 1:26 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Sat, 15 Nov 2014, Yinghai Lu wrote:
>> +static pmd_t *last_pmd;
>>  /*
>>   * The head.S code sets up the kernel high mapping:
>>   *
>> @@ -408,9 +409,26 @@ void __init cleanup_highmap(void)
>>              continue;
>>          if (vaddr < (unsigned long) _text || vaddr > end)
>>              set_pmd(pmd, __pmd(0));
>> +        else
>> +            last_pmd = pmd;
>
> Why do you need to store this? You can compute this.

I'm not quite sure about the Xen path.

>
>> +static void __init cleanup_highmap_tail(unsigned long addr)
>
> Brilliant stuff. mark_rodata_ro() is called AFTER free_initmem() which
> will free exactly that code.

I missed that.

Please check this one, which addresses the three problems you pointed out.

Subject: [PATCH v2] x86, 64bit: cleanup highmap tail near partial 2M range

1. Should use _brk_end instead of &_end in mark_rodata_ro().
   _brk_end can move up to &_end, i.e. to __brk_limit.  It's safe to
   use _brk_end when mark_rodata_ro() is called because extend_brk()
   is gone already at that point.
2. The [_brk_end, pmd_end) page range is already converted memory and
   is not wasted.
3. Add cleanup_highmap_tail() for [_brk_end, pmd_end).

Kernel Layout:
[    0.000000]    .brk: [0x0437c000-0x043a1fff]

Actually used brk:
[    0.272959] memblock_reserve: [0x0000000437c000-0x00000004382fff]
flags 0x0 BRK

Before patch:
---[ High Kernel Mapping ]---
...
0xffffffff83400000-0xffffffff84200000          14M     RW         PSE GLB NX pmd
0xffffffff84200000-0xffffffff843a2000        1672K     RW             GLB NX pte
0xffffffff843a2000-0xffffffff84400000         376K     RW             GLB x  pte
0xffffffff84400000-0xffffffffa0000000         444M                           pmd
After patch:
---[ High Kernel Mapping ]---
...
0xffffffff83400000-0xffffffff84200000          14M     RW         PSE GLB NX pmd
0xffffffff84200000-0xffffffff84383000        1548K     RW             GLB NX pte
0xffffffff84383000-0xffffffff84400000         500K                           pte
0xffffffff84400000-0xffffffffa0000000         444M                           pmd

-v2: according to tglx
 calculate the pmd position instead of passing last_pmd.
 cleanup_highmap_tail() cannot be __init, as it is called from mark_rodata_ro()
 and mark_rodata_ro() is called after free_initmem().
 highmap_end_pfn should keep PMD_SIZE alignment on !CONFIG_DEBUG_RODATA

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/mm/init_64.c  |   22 +++++++++++++++++++++-
 arch/x86/mm/pageattr.c |    4 ++++
 2 files changed, 25 insertions(+), 1 deletion(-)

Index: linux-2.6/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -411,6 +411,23 @@ void __init cleanup_highmap(void)
     }
 }

+static void cleanup_highmap_tail(unsigned long addr)
+{
+    int i;
+    pgd_t *pgd;
+    pud_t *pud;
+    pmd_t *pmd;
+    pte_t *pte;
+
+    pgd = pgd_offset_k(addr);
+    pud = (pud_t *)pgd_page_vaddr(*pgd) + pud_index(addr);
+    pmd = (pmd_t *)pud_page_vaddr(*pud) + pmd_index(addr);
+    pte = (pte_t *)pmd_page_vaddr(*pmd) + pte_index(addr);
+
+    for (i = pte_index(addr); i < PTRS_PER_PTE; i++, pte++)
+        set_pte(pte, __pte(0));
+}
+
 static unsigned long __meminit
 phys_pte_init(pte_t *pte_page, unsigned long addr, unsigned long end,
           pgprot_t prot)
@@ -1124,7 +1141,8 @@ void mark_rodata_ro(void)
     unsigned long end = (unsigned long) &__end_rodata_hpage_align;
     unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
     unsigned long rodata_end = PFN_ALIGN(&__end_rodata);
-    unsigned long all_end = PFN_ALIGN(&_end);
+    unsigned long all_end = PFN_ALIGN(_brk_end);
+    unsigned long pmd_end = roundup(all_end, PMD_SIZE);

     printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
            (end - start) >> 10);
@@ -1137,6 +1155,8 @@ void mark_rodata_ro(void)
      * should also be not-executable.
      */
     set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
+    if (all_end < pmd_end)
+        cleanup_highmap_tail(all_end);

     rodata_test();

Index: linux-2.6/arch/x86/mm/pageattr.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/pageattr.c
+++ linux-2.6/arch/x86/mm/pageattr.c
@@ -100,7 +100,11 @@ static inline unsigned long highmap_star

 static inline unsigned long highmap_end_pfn(void)
 {
+#ifdef CONFIG_DEBUG_RODATA
+    return __pa_symbol(PFN_ALIGN(_brk_end)) >> PAGE_SHIFT;
+#else
     return __pa_symbol(roundup(_brk_end, PMD_SIZE)) >> PAGE_SHIFT;
+#endif
 }

 #endif


* Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
  2014-11-16 23:44       ` Thomas Gleixner
@ 2014-11-17  4:00         ` Yinghai Lu
  2014-11-17 20:27         ` Kees Cook
  1 sibling, 0 replies; 19+ messages in thread
From: Yinghai Lu @ 2014-11-17  4:00 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Kees Cook, Linux Kernel Mailing List, Ingo Molnar,
	H. Peter Anvin, the arch/x86 maintainers, Andrew Morton,
	Andy Lutomirski, Yasuaki Ishimatsu, Wang Nan, David Vrabel

On Sun, Nov 16, 2014 at 3:44 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> _brk_end is adjusted at boot time via extend_brk() up to __brk_limit,
> which is the same as _end. We usually do not use all of that space. So
> it's expected that _brk_end < _end.
>
>> Is this correct? It sounded like tglx wanted the pmd split, like this:
>
> Yes, I wanted to get rid of the high mapping for anything between
> _brk_end and _end, and I brought you on the wrong track with my
> suggestion to call free_init_pages(). Sorry about that.

:)

>
> That happened because I missed the completely non obvious fact, that
> only the effective brk section is reserved for the kernel via
> reserve_brk(). So the area between _brk_end and _end is already
> reusable. Though that reuse works only by chance and not by design and
> is completely undocumented as everything else in that area.

Where is the documentation for everything else in that area?

>
> So the initial patch to get rid of the X mapping is of course to just
> extend the area to the PMD. A little bit different to your initial
> patch, but essentially the same.
>
> -       unsigned long all_end = PFN_ALIGN(&_end);
> +       unsigned long all_end = roundup((unsigned long) &_end, PMD_SIZE);
>
> I'm going to apply your V1 patch with the above roundup()
> simplification. If a page of that area gets used later on then we are
> going to split up the PMD anyway.

Should we use _brk_end instead of &_end for the roundup?

>
> But still we want to get rid of that highmap between _brk_end and
> _end, but there is absolutely no reason to come up with extra silly
> functions for that.
>
> So the obvious solution is to let setup_arch() reserve the memory up
> to _end instead of _bss_stop, get rid of the extra reservation in
> reserve_brk() and then let free_initmem() release the area between
> _brk_end and _end. No extra hackery, no side effects, just works.

So where does the highmap actually get removed?
Via free_init_pages() from free_initmem()?

With the code change you suggested:

---
 arch/x86/kernel/setup.c |    6 +-----
 arch/x86/mm/init.c      |    6 ++++++
 arch/x86/mm/init_64.c   |    2 +-
 3 files changed, 8 insertions(+), 6 deletions(-)

Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -286,10 +286,6 @@ static void __init cleanup_highmap(void)

 static void __init reserve_brk(void)
 {
-    if (_brk_end > _brk_start)
-        memblock_reserve(__pa_symbol(_brk_start),
-                 _brk_end - _brk_start);
-
     /* Mark brk area as locked down and no longer taking any
        new allocations */
     _brk_start = 0;
@@ -857,7 +853,7 @@ dump_kernel_offset(struct notifier_block
 void __init setup_arch(char **cmdline_p)
 {
     memblock_reserve(__pa_symbol(_text),
-             (unsigned long)__bss_stop - (unsigned long)_text);
+             (unsigned long)_end - (unsigned long)_text);

     early_reserve_initrd();

Index: linux-2.6/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -1122,7 +1122,7 @@ void mark_rodata_ro(void)
     unsigned long end = (unsigned long) &__end_rodata_hpage_align;
     unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
     unsigned long rodata_end = PFN_ALIGN(&__end_rodata);
-    unsigned long all_end = PFN_ALIGN(&_end);
+    unsigned long all_end = roundup((unsigned long)&_end, PMD_SIZE);

     printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
            (end - start) >> 10);
Index: linux-2.6/arch/x86/mm/init.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init.c
+++ linux-2.6/arch/x86/mm/init.c
@@ -637,9 +637,15 @@ void free_init_pages(char *what, unsigne

 void free_initmem(void)
 {
+    unsigned long all_end = roundup((unsigned long)&_end, PMD_SIZE);
+
     free_init_pages("unused kernel",
             (unsigned long)(&__init_begin),
             (unsigned long)(&__init_end));
+
+#ifdef CONFIG_X86_64
+    free_init_pages("unused brk", PFN_ALIGN(_brk_end), all_end);
+#endif
 }

 #ifdef CONFIG_BLK_DEV_INITRD


I got:
[    7.942636] Freeing unused brk memory: 500K (ffffffff84383000 -
ffffffff84400000)

---[ High Kernel Mapping ]---
0xffffffff80000000-0xffffffff81000000          16M                           pmd
0xffffffff81000000-0xffffffff82200000          18M     ro         PSE GLB x  pmd
0xffffffff82200000-0xffffffff82c00000          10M     ro         PSE GLB NX pmd
0xffffffff82c00000-0xffffffff82e00000           2M     RW         PSE GLB NX pmd
0xffffffff82e00000-0xffffffff83000000           2M     RW             GLB NX pte
0xffffffff83000000-0xffffffff83200000           2M     RW         PSE GLB NX pmd
0xffffffff83200000-0xffffffff83400000           2M     RW             GLB NX pte
0xffffffff83400000-0xffffffff84200000          14M     RW         PSE GLB NX pmd
0xffffffff84200000-0xffffffff84400000           2M     RW             GLB NX pte
0xffffffff84400000-0xffffffffa0000000         444M                           pmd

The highmap from _brk_end to &_end is still there; free_init_pages()
hands the pages back but does not tear down the high mapping.

Also, if you want to remove the highmap between _brk_end and _end,
do you want to keep the highmap for
                        [text_end, rodata_start)
                        [rodata_end, _sdata),
given that mark_rodata_ro is calling

        free_init_pages("unused kernel",
                        (unsigned long) __va(__pa_symbol(text_end)),
                        (unsigned long) __va(__pa_symbol(rodata_start)));
        free_init_pages("unused kernel",
                        (unsigned long) __va(__pa_symbol(rodata_end)),
                        (unsigned long) __va(__pa_symbol(_sdata)));

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
  2014-11-16 23:44       ` Thomas Gleixner
  2014-11-17  4:00         ` Yinghai Lu
@ 2014-11-17 20:27         ` Kees Cook
  2014-11-17 20:32           ` Russell King - ARM Linux
  2014-11-18  7:39           ` Yinghai Lu
  1 sibling, 2 replies; 19+ messages in thread
From: Kees Cook @ 2014-11-17 20:27 UTC (permalink / raw)
  To: Thomas Gleixner, Laura Abbott, Russell King
  Cc: Yinghai Lu, Linux Kernel Mailing List, Ingo Molnar,
	H. Peter Anvin, the arch/x86 maintainers, Andrew Morton,
	Andy Lutomirski, Yasuaki Ishimatsu, Wang Nan, David Vrabel

On Sun, Nov 16, 2014 at 3:44 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Fri, 14 Nov 2014, Kees Cook wrote:
>> On Fri, Nov 14, 2014 at 6:29 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> > should use attached one instead.
>> >
>> > 1. should use _brk_end instead of &end, as we only use partial of
>> >    brk.
>> > 2. [_brk_end, pm_end) page range is already converted. aka
>> >    is not wasted.
>>
>> Are you sure? For me, _brk_end isn't far enough:
>>
>> [    1.475572] all_end: 0xffffffff82df5000
>> [    1.476736] _brk_end: 0xffffffff82dd6000
>
> _brk_end is adjusted at boot time via extend_brk() up to __brk_limit,
> which is the same as _end. We usually do not use all of that space. So
> it's expected that _brk_end < _end.
>
>> Is this correct? It sounded like tglx wanted the pmd split, like this:
>
> Yes, I wanted to get rid of the high mapping for anything between
> _brk_end and _end, and I brought you on the wrong track with my
> suggestion to call free_init_pages(). Sorry about that.
>
> That happened because I missed the completely non obvious fact, that
> only the effective brk section is reserved for the kernel via
> reserve_brk(). So the area between _brk_end and _end is already
> reusable. Though that reuse works only by chance and not by design and
> is completely undocumented as everything else in that area.
>
> So the initial patch to get rid of the X mapping is of course to just
> extend the area to the PMD. A little bit different to your initial
> patch, but essentially the same.
>
> -       unsigned long all_end = PFN_ALIGN(&_end);
> +       unsigned long all_end = roundup((unsigned long) &_end, PMD_SIZE);
>
> I'm going to apply your V1 patch with the above roundup()
> simplification. If a page of that area gets used later on then we are
> going to split up the PMD anyway.

That's fine by me. Yinghai Lu seems to have potentially better
solutions, but my head is spinning after reading more of this thread.
:) I just want to make sure that at the end of the day, there are no
RW+x mappings.

> But still we want to get rid of that highmap between _brk_end and
> _end, but there is absolutely no reason to come up with extra silly
> functions for that.
>
> So the obvious solution is to let setup_arch() reserve the memory up
> to _end instead of _bss_stop, get rid of the extra reservation in
> reserve_brk() and then let free_initmem() release the area between
> _brk_end and _end. No extra hackery, no side effects, just works.
>
> I spent quite some time to stare into that and I wonder about the
> following related issues:
>
> 1) Why is the mark_rodata_ro() business a debug configuration, i.e
>    CONFIG_DEBUG_RODATA?
>
>    This does not make any sense at all. We really want RO and NX on by
>    default and AFAICT distros are turning that on anyway for obvious
>    reasons.

This is a historically badly named config item. Once arm and arm64
land their CONFIG_DEBUG_RODATA implementations, I was going to suggest
having this renamed.

>
>    The only idiocy I found so far is the kgdb Documentation, which
>    recommends turning it off. Sigh.
>
>    So that should be changed to:
>
>       config ARCH_HAS_RONX
>           bool
>
>       config DISABLE_RONX
>           def_bool !ARCH_HAS_RONX || !KGDB_ENABLE_SECURITY_HOLES
>
>    plus
>
>       config ARCH_WANTS_KGDB_SECURITY_HOLES
>           bool
>
>       config KGDB_ENABLE_SECURITY_HOLES
>           bool "WTF?"
>           depends on BROKEN && ARCH_WANTS_KGDB_SECURITY_HOLES
>
> 2) What is actually the modules counterpart for mark_rodata_ro()?
>
>    CONFIG_DEBUG_SET_MODULE_RONX
>
>    Of course not enabled by default, but enabled by distros again.
>
>    See #1.
>
>    Now what's interesting aside from the general fuckup is that
>    CONFIG_DEBUG_RODATA is supported by:
>
>      arch/x86 and arch/parisc
>
>    But CONFIG_DEBUG_SET_MODULE_RONX is supported by:
>
>      arch/arm/ arch/arm64 arch/s390 and arch/x86

I have a series for arm that is waiting to be picked up by rmk:
https://patchwork.ozlabs.org/patch/400383/

Laura has been working on a series for arm64:
http://comments.gmane.org/gmane.linux.ports.arm.kernel/366777

So what you're seeing is us being in the middle of providing support
for it. It just happens that CONFIG_DEBUG_SET_MODULE_RONX is a bit
easier to implement, so it was completed first.

>
>    This does not make any sense at all.
>
>    Do arm/arm64/s390 have other means to make RO/NX work or are they
>    just doing it for modules? And how is that supposed to work with
>    KGDB if it is not aware of modules sections being RO/NX? KGDB has
>    only extra magic for CONFIG_DEBUG_RODATA, but not for
>    CONFIG_DEBUG_SET_MODULE_RONX.
>
>    Now for extended fun the x86 help text for that option says:
>
>     ... Such protection may interfere with run-time code
>     patching and dynamic kernel tracing - and they might also protect
>     against certain classes of kernel exploits.
>     If in doubt, say "N".
>
>    Patently wrong. More sigh.

Like the CONFIG naming, I hope to start cleaning these defaults up
once arm and arm64 are landed.

> 3) Why is mark_rodata_ro() called AFTER free_initmem() and therefor
>    cannot be marked __init ?
>
>    Just because ...

That's worth examining. Since most of the logic that does the ro/nx
work needs to stick around for things like ftrace, kgdb, etc., it's
possible there wouldn't be much saved by marking mark_rodata_ro
__init. I'll add this to my list to check.

Thanks!

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
  2014-11-17 20:27         ` Kees Cook
@ 2014-11-17 20:32           ` Russell King - ARM Linux
  2014-11-17 20:43             ` Kees Cook
  2014-11-18  7:39           ` Yinghai Lu
  1 sibling, 1 reply; 19+ messages in thread
From: Russell King - ARM Linux @ 2014-11-17 20:32 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thomas Gleixner, Laura Abbott, Yinghai Lu,
	Linux Kernel Mailing List, Ingo Molnar, H. Peter Anvin,
	the arch/x86 maintainers, Andrew Morton, Andy Lutomirski,
	Yasuaki Ishimatsu, Wang Nan, David Vrabel

On Mon, Nov 17, 2014 at 12:27:59PM -0800, Kees Cook wrote:
> I have a series for arm that is waiting to be picked up by rmk:
> https://patchwork.ozlabs.org/patch/400383/

It should've been in linux-next via my tree for the last two weeks
or so.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
  2014-11-17 20:32           ` Russell King - ARM Linux
@ 2014-11-17 20:43             ` Kees Cook
  0 siblings, 0 replies; 19+ messages in thread
From: Kees Cook @ 2014-11-17 20:43 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Thomas Gleixner, Laura Abbott, Yinghai Lu,
	Linux Kernel Mailing List, Ingo Molnar, H. Peter Anvin,
	the arch/x86 maintainers, Andrew Morton, Andy Lutomirski,
	Yasuaki Ishimatsu, Wang Nan, David Vrabel

On Mon, Nov 17, 2014 at 12:32 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Mon, Nov 17, 2014 at 12:27:59PM -0800, Kees Cook wrote:
>> I have a series for arm that is waiting to be picked up by rmk:
>> https://patchwork.ozlabs.org/patch/400383/
>
> It should've been in linux-next via my tree for the last two weeks
> or so.

Fantastic! Thank you!

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
  2014-11-17 20:27         ` Kees Cook
  2014-11-17 20:32           ` Russell King - ARM Linux
@ 2014-11-18  7:39           ` Yinghai Lu
  1 sibling, 0 replies; 19+ messages in thread
From: Yinghai Lu @ 2014-11-18  7:39 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thomas Gleixner, Laura Abbott, Russell King,
	Linux Kernel Mailing List, Ingo Molnar, H. Peter Anvin,
	the arch/x86 maintainers, Andrew Morton, Andy Lutomirski,
	Yasuaki Ishimatsu, Wang Nan, David Vrabel

[-- Attachment #1: Type: text/plain, Size: 3300 bytes --]

On Mon, Nov 17, 2014 at 12:27 PM, Kees Cook <keescook@chromium.org> wrote:
> On Sun, Nov 16, 2014 at 3:44 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> So the initial patch to get rid of the X mapping is of course to just
>> extend the area to the PMD. A little bit different to your initial
>> patch, but essentially the same.
>>
>> -       unsigned long all_end = PFN_ALIGN(&_end);
>> +       unsigned long all_end = roundup((unsigned long) &_end, PMD_SIZE);
>>
>> I'm going to apply your V1 patch with the above roundup()
>> simplification. If a page of that area gets used later on then we are
>> going to split up the PMD anyway.
>
> That's fine by me. Yinghai Lu seems to have potentially better
> solutions, but my head is spinning after reading more of this thread.
> :) I just want to make sure that at the end of the day, there are no
> RW+x mappings.
>

Please check v3, which also cleans up the highmap holes in the middle of
the kernel mapping.


Before patch:
---[ High Kernel Mapping ]---
0xffffffff80000000-0xffffffff81000000          16M                           pmd
0xffffffff81000000-0xffffffff82200000          18M     ro         PSE GLB x  pmd
0xffffffff82200000-0xffffffff82c00000          10M     ro         PSE GLB NX pmd
0xffffffff82c00000-0xffffffff82e00000           2M     RW         PSE GLB NX pmd
0xffffffff82e00000-0xffffffff83000000           2M     RW             GLB NX pte
0xffffffff83000000-0xffffffff83200000           2M     RW         PSE GLB NX pmd
0xffffffff83200000-0xffffffff83400000           2M     RW             GLB NX pte
0xffffffff83400000-0xffffffff84200000          14M     RW         PSE GLB NX pmd
0xffffffff84200000-0xffffffff843a2000        1672K     RW             GLB NX pte
0xffffffff843a2000-0xffffffff84400000         376K     RW             GLB x  pte
0xffffffff84400000-0xffffffffa0000000         444M                           pmd

After patch:
---[ High Kernel Mapping ]---
0xffffffff80000000-0xffffffff81000000          16M                           pmd
0xffffffff81000000-0xffffffff82000000          16M     ro         PSE GLB x  pmd
0xffffffff82000000-0xffffffff82011000          68K     ro             GLB x  pte
0xffffffff82011000-0xffffffff82200000        1980K                           pte
0xffffffff82200000-0xffffffff82a00000           8M     ro         PSE GLB NX pmd
0xffffffff82a00000-0xffffffff82a1d000         116K     ro             GLB NX pte
0xffffffff82a1d000-0xffffffff82c00000        1932K                           pte
0xffffffff82c00000-0xffffffff82e00000           2M     RW         PSE GLB NX pmd
0xffffffff82e00000-0xffffffff82e52000         328K     RW             GLB NX pte
0xffffffff82e52000-0xffffffff83000000        1720K                           pte
0xffffffff83000000-0xffffffff83200000           2M                           pmd
0xffffffff83200000-0xffffffff83213000          76K                           pte
0xffffffff83213000-0xffffffff83400000        1972K     RW             GLB NX pte
0xffffffff83400000-0xffffffff84200000          14M     RW         PSE GLB NX pmd
0xffffffff84200000-0xffffffff84383000        1548K     RW             GLB NX pte
0xffffffff84383000-0xffffffff84400000         500K                           pte
0xffffffff84400000-0xffffffffa0000000         444M                           pmd

Thanks

Yinghai

[-- Attachment #2: nx_after_end_v3_1.patch --]
[-- Type: text/x-patch, Size: 3069 bytes --]

Subject: [PATCH 1/2] x86, 64bit: add pfn_range_is_highmapped()

It will be used to support holes in the highmap.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/include/asm/pgtable_64.h |    2 ++
 arch/x86/mm/init_64.c             |   22 ++++++++++++++++++++++
 arch/x86/mm/pageattr.c            |   16 +---------------
 3 files changed, 25 insertions(+), 15 deletions(-)

Index: linux-2.6/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -375,6 +375,23 @@ void __init init_extra_mapping_uc(unsign
 	__init_extra_mapping(phys, size, PAGE_KERNEL_LARGE_NOCACHE);
 }
 
+/* three holes at most*/
+#define NR_RANGE 4
+static struct range pfn_highmapped[NR_RANGE];
+static int nr_pfn_highmapped;
+
+int pfn_range_is_highmapped(unsigned long start_pfn, unsigned long end_pfn)
+{
+	int i;
+
+	for (i = 0; i < nr_pfn_highmapped; i++)
+		if ((start_pfn >= pfn_highmapped[i].start) &&
+		    (end_pfn <= pfn_highmapped[i].end))
+			return 1;
+
+	return 0;
+}
+
 /*
  * The head.S code sets up the kernel high mapping:
  *
@@ -409,6 +426,11 @@ void __init cleanup_highmap(void)
 		if (vaddr < (unsigned long) _text || vaddr > end)
 			set_pmd(pmd, __pmd(0));
 	}
+
+	nr_pfn_highmapped = add_range(pfn_highmapped, NR_RANGE,
+			nr_pfn_highmapped,
+			__pa_symbol(_text) >> PAGE_SHIFT,
+			__pa_symbol(roundup(_brk_end, PMD_SIZE)) >> PAGE_SHIFT);
 }
 
 static unsigned long __meminit
Index: linux-2.6/arch/x86/mm/pageattr.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/pageattr.c
+++ linux-2.6/arch/x86/mm/pageattr.c
@@ -91,20 +91,6 @@ void arch_report_meminfo(struct seq_file
 static inline void split_page_count(int level) { }
 #endif
 
-#ifdef CONFIG_X86_64
-
-static inline unsigned long highmap_start_pfn(void)
-{
-	return __pa_symbol(_text) >> PAGE_SHIFT;
-}
-
-static inline unsigned long highmap_end_pfn(void)
-{
-	return __pa_symbol(roundup(_brk_end, PMD_SIZE)) >> PAGE_SHIFT;
-}
-
-#endif
-
 #ifdef CONFIG_DEBUG_PAGEALLOC
 # define debug_pagealloc 1
 #else
@@ -1242,7 +1228,7 @@ static int cpa_process_alias(struct cpa_
 	 * to touch the high mapped kernel as well:
 	 */
 	if (!within(vaddr, (unsigned long)_text, _brk_end) &&
-	    within(cpa->pfn, highmap_start_pfn(), highmap_end_pfn())) {
+	    pfn_range_is_highmapped(cpa->pfn, 1)) {
 		unsigned long temp_cpa_vaddr = (cpa->pfn << PAGE_SHIFT) +
 					       __START_KERNEL_map - phys_base;
 		alias_cpa = *cpa;
Index: linux-2.6/arch/x86/include/asm/pgtable_64.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/pgtable_64.h
+++ linux-2.6/arch/x86/include/asm/pgtable_64.h
@@ -167,6 +167,8 @@ static inline int pgd_large(pgd_t pgd) {
 extern int kern_addr_valid(unsigned long addr);
 extern void cleanup_highmap(void);
 
+int pfn_range_is_highmapped(unsigned long start_pfn, unsigned long end_pfn);
+
 #define HAVE_ARCH_UNMAPPED_AREA
 #define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN
 

[-- Attachment #3: nx_after_end_v3_2.patch --]
[-- Type: text/x-patch, Size: 6281 bytes --]

Subject: [PATCH 2/2] x86, 64bit: cleanup highmap late for not needed range

1. Use _brk_end instead of &_end in mark_rodata_ro().
   _brk_end can move up to &_end, i.e. to __brk_limit.  It is safe to
   use _brk_end by the time mark_rodata_ro() is called because
   extend_brk() is gone already at that point (see the sketch after the
   brk numbers below).
2. Add cleanup_highmap_late() for the initmem range, the ranges around
   rodata, and [_brk_end, pmd_end).

Kernel Layout:

[    0.000000]   .text: [0x01000000-0x0200d608]
[    0.000000] .rodata: [0x02200000-0x02a1cfff]
[    0.000000]   .data: [0x02c00000-0x02e50e7f]
[    0.000000]   .init: [0x02e52000-0x03212fff]
[    0.000000]    .bss: [0x03221000-0x0437bfff]
[    0.000000]    .brk: [0x0437c000-0x043a1fff]

Actually used brk:
[    0.272959] memblock_reserve: [0x0000000437c000-0x00000004382fff] flags 0x0 BRK
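
The sketch referred to in item 1 above: a standalone simulation in plain
userspace C (not the kernel's extend_brk(); the single 0x7000-byte
allocation merely stands in for whatever early boot actually carved out
of brk, and the window uses the .brk numbers from the layout above):

#include <assert.h>
#include <stdio.h>

/* brk window taken from the kernel layout above (physical offsets) */
static unsigned long _brk_start = 0x0437c000UL;
static unsigned long _brk_end   = 0x0437c000UL;	/* grows during boot */
static unsigned long brk_limit  = 0x043a2000UL;	/* end of .brk */

#define ALIGN_UP(x, a)	(((x) + (a) - 1) & ~((unsigned long)(a) - 1))

/* rough stand-in for extend_brk(): bump the break, never past the limit */
static unsigned long brk_alloc(unsigned long size, unsigned long align)
{
	unsigned long ret;

	_brk_end = ALIGN_UP(_brk_end, align);
	assert(_brk_end + size <= brk_limit);
	ret = _brk_end;
	_brk_end += size;
	return ret;
}

int main(void)
{
	brk_alloc(0x7000, 0x1000);	/* matches the 0x437c000-0x4382fff reservation */

	printf("brk window:   %#lx-%#lx\n", _brk_start, brk_limit);
	printf("_brk_end now: %#lx\n", _brk_end);	/* 0x4383000 */
	return 0;
}

By the time mark_rodata_ro() runs, _brk_end already sits at its final
value (0x4383000 here), which is why PFN_ALIGN(_brk_end) is the right
upper bound and why the ...84383000 boundary shows up in the page-table
dumps below.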

Before patch:
---[ High Kernel Mapping ]---
0xffffffff80000000-0xffffffff81000000          16M                           pmd
0xffffffff81000000-0xffffffff82200000          18M     ro         PSE GLB x  pmd
0xffffffff82200000-0xffffffff82c00000          10M     ro         PSE GLB NX pmd
0xffffffff82c00000-0xffffffff82e00000           2M     RW         PSE GLB NX pmd
0xffffffff82e00000-0xffffffff83000000           2M     RW             GLB NX pte
0xffffffff83000000-0xffffffff83200000           2M     RW         PSE GLB NX pmd
0xffffffff83200000-0xffffffff83400000           2M     RW             GLB NX pte
0xffffffff83400000-0xffffffff84200000          14M     RW         PSE GLB NX pmd
0xffffffff84200000-0xffffffff843a2000        1672K     RW             GLB NX pte
0xffffffff843a2000-0xffffffff84400000         376K     RW             GLB x  pte
0xffffffff84400000-0xffffffffa0000000         444M                           pmd

After patch:
---[ High Kernel Mapping ]---
0xffffffff80000000-0xffffffff81000000          16M                           pmd
0xffffffff81000000-0xffffffff82000000          16M     ro         PSE GLB x  pmd
0xffffffff82000000-0xffffffff82011000          68K     ro             GLB x  pte
0xffffffff82011000-0xffffffff82200000        1980K                           pte
0xffffffff82200000-0xffffffff82a00000           8M     ro         PSE GLB NX pmd
0xffffffff82a00000-0xffffffff82a1d000         116K     ro             GLB NX pte
0xffffffff82a1d000-0xffffffff82c00000        1932K                           pte
0xffffffff82c00000-0xffffffff82e00000           2M     RW         PSE GLB NX pmd
0xffffffff82e00000-0xffffffff82e52000         328K     RW             GLB NX pte
0xffffffff82e52000-0xffffffff83000000        1720K                           pte
0xffffffff83000000-0xffffffff83200000           2M                           pmd
0xffffffff83200000-0xffffffff83213000          76K                           pte
0xffffffff83213000-0xffffffff83400000        1972K     RW             GLB NX pte
0xffffffff83400000-0xffffffff84200000          14M     RW         PSE GLB NX pmd
0xffffffff84200000-0xffffffff84383000        1548K     RW             GLB NX pte
0xffffffff84383000-0xffffffff84400000         500K                           pte
0xffffffff84400000-0xffffffffa0000000         444M                           pmd

-v3: remove all unused highmap ranges.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/mm/init_64.c |   70 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 69 insertions(+), 1 deletion(-)

Index: linux-2.6/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -1098,6 +1098,67 @@ void __init mem_init(void)
 }
 
 #ifdef CONFIG_DEBUG_RODATA
+static void remove_highmap_2m(unsigned long addr)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+
+	pgd = pgd_offset_k(addr);
+	pud = (pud_t *)pgd_page_vaddr(*pgd) + pud_index(addr);
+	pmd = (pmd_t *)pud_page_vaddr(*pud) + pmd_index(addr);
+
+	set_pmd(pmd, __pmd(0));
+}
+
+static void remove_highmap_2m_partial(unsigned long addr, unsigned long end)
+{
+	int i;
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+	int start_index = pte_index(addr);
+	int end_index = pte_index(end - 1) + 1;
+
+	set_memory_4k(addr, end_index - start_index);
+
+	pgd = pgd_offset_k(addr);
+	pud = (pud_t *)pgd_page_vaddr(*pgd) + pud_index(addr);
+	pmd = (pmd_t *)pud_page_vaddr(*pud) + pmd_index(addr);
+	pte = (pte_t *)pmd_page_vaddr(*pmd) + start_index;
+
+	for (i = start_index; i < end_index; i++, pte++)
+		set_pte(pte, __pte(0));
+}
+
+static void cleanup_highmap_late(unsigned long start, unsigned long end)
+{
+	unsigned long addr;
+	unsigned long start_2m_aligned = roundup(start, PMD_SIZE);
+	unsigned long end_2m_aligned = rounddown(end, PMD_SIZE);
+
+	start = PFN_ALIGN(start);
+	end &= PAGE_MASK;
+
+	if (start >= end)
+		return;
+
+	if (start < start_2m_aligned && start_2m_aligned <= end)
+		remove_highmap_2m_partial(start, start_2m_aligned);
+
+	for (addr = start_2m_aligned; addr < end_2m_aligned; addr += PMD_SIZE)
+		remove_highmap_2m(addr);
+
+	if (start <= end_2m_aligned && end_2m_aligned < end)
+		remove_highmap_2m_partial(end_2m_aligned, end);
+
+	subtract_range(pfn_highmapped, NR_RANGE,
+			__pa_symbol(start) >> PAGE_SHIFT,
+			__pa_symbol(end) >> PAGE_SHIFT);
+	nr_pfn_highmapped = clean_sort_range(pfn_highmapped, NR_RANGE);
+}
+
 const int rodata_test_data = 0xC3;
 EXPORT_SYMBOL_GPL(rodata_test_data);
 
@@ -1146,7 +1207,8 @@ void mark_rodata_ro(void)
 	unsigned long end = (unsigned long) &__end_rodata_hpage_align;
 	unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
 	unsigned long rodata_end = PFN_ALIGN(&__end_rodata);
-	unsigned long all_end = PFN_ALIGN(&_end);
+	unsigned long data_start = PFN_ALIGN(&_sdata);
+	unsigned long all_end = PFN_ALIGN(_brk_end);
 
 	printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
 	       (end - start) >> 10);
@@ -1160,6 +1222,12 @@ void mark_rodata_ro(void)
 	 */
 	set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
 
+	cleanup_highmap_late(text_end, rodata_start);
+	cleanup_highmap_late(rodata_end, data_start);
+	cleanup_highmap_late(all_end, roundup(_brk_end, PMD_SIZE));
+	cleanup_highmap_late((unsigned long)(&__init_begin),
+				(unsigned long)(&__init_end));
+
 	rodata_test();
 
 #ifdef CONFIG_CPA_DEBUG

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
  2014-11-15  2:46     ` Kees Cook
  2014-11-15  3:38       ` Yinghai Lu
  2014-11-16 23:44       ` Thomas Gleixner
@ 2014-11-18 17:11       ` Thomas Gleixner
  2 siblings, 0 replies; 19+ messages in thread
From: Thomas Gleixner @ 2014-11-18 17:11 UTC (permalink / raw)
  To: Kees Cook
  Cc: Yinghai Lu, Linux Kernel Mailing List, Ingo Molnar,
	H. Peter Anvin, the arch/x86 maintainers, Andrew Morton,
	Andy Lutomirski, Yasuaki Ishimatsu, Wang Nan, David Vrabel

On Fri, 14 Nov 2014, Kees Cook wrote:
> On Fri, Nov 14, 2014 at 6:29 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> > On Fri, Nov 14, 2014 at 5:29 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> >> On Fri, Nov 14, 2014 at 12:45 PM, Kees Cook <keescook@chromium.org> wrote:
> >>> v2:
> >>>  - added call to free_init_pages(), as suggested by tglx
> >
> >> something is wrong:
> >>
> >> [    7.842479] Freeing unused kernel memory: 3844K (ffffffff82e52000 -
> >> ffffffff83213000)
> >> [    7.843305] Write protecting the kernel read-only data: 28672k
> >
> > ....
> > should use attached one instead.
> >
> > 1. should use _brk_end instead of &_end, as we only use part of
> >    brk.
> > 2. the [_brk_end, pmd_end) page range is already converted, i.e.
> >    is not wasted.
> 
> Are you sure? For me, _brk_end isn't far enough:

_brk_end is guaranteed to be <= _end. But we really want to use
_brk_end here, because if we have the following situation:

_brk_start:     0x03ff0000
_brk_end:       0x03ffff00
_end:           0x04016000

So we have the following PMDs setup:

0x03e00000     pmd rw nx        <- covers the top of .bss
                                   and the start of .brk
0x04000000     pmd rw nx        <- covers the end of .brk
                                   and some otherwise unused space

So if _brk_end is less than 0x04000000, then cleanup_highmap() has
zapped the extra PMD already. So we don't want to call set_nx() on
that.

If _brk_end is >= 0x04000000 then we cover that last pmd with the
set_nx call.

Completely non-obvious, like everything else in that area.
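
To make that concrete, here is a tiny standalone sketch (plain userspace
C, not kernel code) using the hypothetical addresses above; PMD_SIZE is
2M on x86-64 and the power-of-two rounding is open-coded:

#include <stdio.h>

#define PMD_SIZE	(2UL << 20)	/* 2M large-page size on x86-64 */

/* power-of-two round up, standing in for the kernel's roundup() */
#define ROUND_UP(x, a)	(((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
	unsigned long brk_end = 0x03ffff00UL;	/* hypothetical _brk_end */
	unsigned long end     = 0x04016000UL;	/* hypothetical _end */

	/* 0x4000000: stops at the PMD that cleanup_highmap() zapped */
	printf("roundup(_brk_end, PMD_SIZE) = %#lx\n",
	       ROUND_UP(brk_end, PMD_SIZE));
	/* 0x4200000: would drag set_memory_nx() across that zapped PMD */
	printf("roundup(_end, PMD_SIZE)     = %#lx\n",
	       ROUND_UP(end, PMD_SIZE));
	return 0;
}

Rounding up from _brk_end ends the NX range exactly at the zapped PMD,
while rounding up from _end would extend it over that hole.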

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2014-11-18 17:12 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-14 20:45 [PATCH v2] x86, mm: set NX across entire PMD at boot Kees Cook
2014-11-15  1:29 ` Yinghai Lu
2014-11-15  2:29   ` Yinghai Lu
2014-11-15  2:46     ` Kees Cook
2014-11-15  3:38       ` Yinghai Lu
2014-11-15  9:43         ` Yinghai Lu
2014-11-16 21:26           ` Thomas Gleixner
2014-11-17  3:30             ` Yinghai Lu
2014-11-16 18:52         ` Thomas Gleixner
2014-11-17  3:26           ` Yinghai Lu
2014-11-16 23:44       ` Thomas Gleixner
2014-11-17  4:00         ` Yinghai Lu
2014-11-17 20:27         ` Kees Cook
2014-11-17 20:32           ` Russell King - ARM Linux
2014-11-17 20:43             ` Kees Cook
2014-11-18  7:39           ` Yinghai Lu
2014-11-18 17:11       ` Thomas Gleixner
2014-11-15  3:06   ` Kees Cook
2014-11-15  3:34     ` Yinghai Lu
