All of lore.kernel.org
 help / color / mirror / Atom feed
* -tip tree resume fail, bisect to 5bd5a45(x86: Add NX protection for kernel data)
@ 2010-11-22  7:23 Lin Ming
  2010-11-22  8:07 ` Ingo Molnar
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Lin Ming @ 2010-11-22  7:23 UTC (permalink / raw)
  To: Matthieu Castet, Siarhei Liakh, Xuxian Jiang, Ingo Molnar
  Cc: Arjan van de Ven, Andi Kleen, lkml

Hi, all

Current -tip tree(92c883a) fail to resume after suspend to mem.
Bisect to commit 5bd5a45(x86: Add NX protection for kernel data).

commit 5bd5a452662bc37c54fb6828db1a3faf87e6511c
Author: Matthieu Castet <castet.matthieu@free.fr>
Date:   Tue Nov 16 22:31:26 2010 +0100

    x86: Add NX protection for kernel data


I did some debug and found the regression is caused by below line.

-       set_memory_nx(rodata_start, (end - rodata_start) >> PAGE_SHIFT);
+       set_memory_nx(rodata_start, (kernel_end - rodata_start) >> PAGE_SHIFT);

In my machine,
rodata_start=0xffffffff81a00000, end=0xffffffff81e00000, kernel_end=0xffffffff82200000

Without this commit, kernel page table dumps

---[ High Kernel Mapping ]---
....
0xffffffff81a00000-0xffffffff81c00000           2M     ro         PSE GLB NX pmd
0xffffffff81c00000-0xffffffff81d91000        1604K     ro             GLB NX pte
0xffffffff81d91000-0xffffffff81e00000         444K     ro             GLB NX pte
0xffffffff81e00000-0xffffffff82000000           2M     RW         PSE GLB x  pmd
0xffffffff82000000-0xffffffff8200c000          48K     RW             GLB x  pte
0xffffffff8200c000-0xffffffff82100000         976K     RW             GLB x  pte
0xffffffff82100000-0xffffffff82200000           1M     RW             GLB x  pte
.....

With this commit, kernel page table dumps

---[ High Kernel Mapping ]---
....
0xffffffff81a00000-0xffffffff81c00000           2M     ro         PSE GLB NX pmd
0xffffffff81c00000-0xffffffff81d91000        1604K     ro             GLB NX pte
0xffffffff81d91000-0xffffffff81e00000         444K     ro             GLB NX pte
0xffffffff81e00000-0xffffffff82000000           2M     RW         PSE GLB NX pmd
0xffffffff82000000-0xffffffff8200c000          48K     RW             GLB NX pte
0xffffffff8200c000-0xffffffff82100000         976K     RW             GLB NX pte
0xffffffff82100000-0xffffffff82200000           1M     RW             GLB NX pte
.....

The only difference is 0xffffffff81e00000(end) to 0xffffffff82200000(kernel_end) is set to NX.

I guess this range code is used by resume wakeup code. So setting it to NX causes problem.

Any idea?

Thanks,
Lin Ming



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: -tip tree resume fail, bisect to 5bd5a45(x86: Add NX protection for kernel data)
  2010-11-22  7:23 -tip tree resume fail, bisect to 5bd5a45(x86: Add NX protection for kernel data) Lin Ming
@ 2010-11-22  8:07 ` Ingo Molnar
  2010-11-22  9:20   ` Andi Kleen
  2010-11-22 13:03 ` Peter Zijlstra
  2010-11-22 13:42 ` [tip:x86/security] x86: Resume trampoline must be executable tip-bot for Lin Ming
  2 siblings, 1 reply; 24+ messages in thread
From: Ingo Molnar @ 2010-11-22  8:07 UTC (permalink / raw)
  To: Lin Ming
  Cc: Matthieu Castet, Siarhei Liakh, Xuxian Jiang, Arjan van de Ven,
	Andi Kleen, lkml


* Lin Ming <ming.m.lin@intel.com> wrote:

> Hi, all
> 
> Current -tip tree(92c883a) fail to resume after suspend to mem.
> Bisect to commit 5bd5a45(x86: Add NX protection for kernel data).
> 
> commit 5bd5a452662bc37c54fb6828db1a3faf87e6511c
> Author: Matthieu Castet <castet.matthieu@free.fr>
> Date:   Tue Nov 16 22:31:26 2010 +0100
> 
>     x86: Add NX protection for kernel data
> 
> 
> I did some debug and found the regression is caused by below line.
> 
> -       set_memory_nx(rodata_start, (end - rodata_start) >> PAGE_SHIFT);
> +       set_memory_nx(rodata_start, (kernel_end - rodata_start) >> PAGE_SHIFT);
> 
> In my machine,
> rodata_start=0xffffffff81a00000, end=0xffffffff81e00000, kernel_end=0xffffffff82200000
> 
> Without this commit, kernel page table dumps
> 
> ---[ High Kernel Mapping ]---
> ....
> 0xffffffff81a00000-0xffffffff81c00000           2M     ro         PSE GLB NX pmd
> 0xffffffff81c00000-0xffffffff81d91000        1604K     ro             GLB NX pte
> 0xffffffff81d91000-0xffffffff81e00000         444K     ro             GLB NX pte
> 0xffffffff81e00000-0xffffffff82000000           2M     RW         PSE GLB x  pmd
> 0xffffffff82000000-0xffffffff8200c000          48K     RW             GLB x  pte
> 0xffffffff8200c000-0xffffffff82100000         976K     RW             GLB x  pte
> 0xffffffff82100000-0xffffffff82200000           1M     RW             GLB x  pte
> .....
> 
> With this commit, kernel page table dumps
> 
> ---[ High Kernel Mapping ]---
> ....
> 0xffffffff81a00000-0xffffffff81c00000           2M     ro         PSE GLB NX pmd
> 0xffffffff81c00000-0xffffffff81d91000        1604K     ro             GLB NX pte
> 0xffffffff81d91000-0xffffffff81e00000         444K     ro             GLB NX pte
> 0xffffffff81e00000-0xffffffff82000000           2M     RW         PSE GLB NX pmd
> 0xffffffff82000000-0xffffffff8200c000          48K     RW             GLB NX pte
> 0xffffffff8200c000-0xffffffff82100000         976K     RW             GLB NX pte
> 0xffffffff82100000-0xffffffff82200000           1M     RW             GLB NX pte
> .....
> 
> The only difference is 0xffffffff81e00000(end) to 0xffffffff82200000(kernel_end) is set to NX.
> 
> I guess this range code is used by resume wakeup code. So setting it to NX causes problem.
> 
> Any idea?

Yeah, the resume area is a trampoline, it needs to be executable.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: -tip tree resume fail, bisect to 5bd5a45(x86: Add NX protection for kernel data)
  2010-11-22  8:07 ` Ingo Molnar
@ 2010-11-22  9:20   ` Andi Kleen
  2010-11-22 10:29     ` castet.matthieu
  0 siblings, 1 reply; 24+ messages in thread
From: Andi Kleen @ 2010-11-22  9:20 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Lin Ming, Matthieu Castet, Siarhei Liakh, Xuxian Jiang,
	Arjan van de Ven, Andi Kleen, lkml

On Mon, Nov 22, 2010 at 09:07:10AM +0100, Ingo Molnar wrote:
> > ---[ High Kernel Mapping ]---
> > ....
> > 0xffffffff81a00000-0xffffffff81c00000           2M     ro         PSE GLB NX pmd
> > 0xffffffff81c00000-0xffffffff81d91000        1604K     ro             GLB NX pte
> > 0xffffffff81d91000-0xffffffff81e00000         444K     ro             GLB NX pte
> > 0xffffffff81e00000-0xffffffff82000000           2M     RW         PSE GLB NX pmd
> > 0xffffffff82000000-0xffffffff8200c000          48K     RW             GLB NX pte
> > 0xffffffff8200c000-0xffffffff82100000         976K     RW             GLB NX pte
> > 0xffffffff82100000-0xffffffff82200000           1M     RW             GLB NX pte
> > .....
> > 
> > The only difference is 0xffffffff81e00000(end) to 0xffffffff82200000(kernel_end) is set to NX.
> > 
> > I guess this range code is used by resume wakeup code. So setting it to NX causes problem.
> > 
> > Any idea?
> 
> Yeah, the resume area is a trampoline, it needs to be executable.

Also it looks like most of the kernel linker supplied data is now not 
2MB mapped anymore? That will surely cost TLB entries.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: -tip tree resume fail, bisect to 5bd5a45(x86: Add NX protection for kernel data)
  2010-11-22  9:20   ` Andi Kleen
@ 2010-11-22 10:29     ` castet.matthieu
  2010-11-22 10:47       ` Andi Kleen
  0 siblings, 1 reply; 24+ messages in thread
From: castet.matthieu @ 2010-11-22 10:29 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, Lin Ming, Matthieu Castet, Siarhei Liakh,
	Xuxian Jiang, Arjan van de Ven, Andi Kleen, lkml

Hi Andi,

Quoting Andi Kleen <andi@firstfloor.org>:

> On Mon, Nov 22, 2010 at 09:07:10AM +0100, Ingo Molnar wrote:
> > > ---[ High Kernel Mapping ]---
> > > ....
> > > 0xffffffff81a00000-0xffffffff81c00000           2M     ro         PSE GLB
> NX pmd
> > > 0xffffffff81c00000-0xffffffff81d91000        1604K     ro             GLB
> NX pte
> > > 0xffffffff81d91000-0xffffffff81e00000         444K     ro             GLB
> NX pte
> > > 0xffffffff81e00000-0xffffffff82000000           2M     RW         PSE GLB
> NX pmd
> > > 0xffffffff82000000-0xffffffff8200c000          48K     RW             GLB
> NX pte
> > > 0xffffffff8200c000-0xffffffff82100000         976K     RW             GLB
> NX pte
> > > 0xffffffff82100000-0xffffffff82200000           1M     RW             GLB
> NX pte
> > > .....
> > >
> > > The only difference is 0xffffffff81e00000(end) to
> 0xffffffff82200000(kernel_end) is set to NX.
> > >
> > > I guess this range code is used by resume wakeup code. So setting it to
> NX causes problem.
> > >
> > > Any idea?
> >
> > Yeah, the resume area is a trampoline, it needs to be executable.
>
> Also it looks like most of the kernel linker supplied data is now not
> 2MB mapped anymore? That will surely cost TLB entries.
>
Any idea where it comes from ?

The mapping is still not 2MB mapped without the commit.

The patch only add

+#if defined(CONFIG_DEBUG_RODATA)
+	/* .text should occupy whole number of pages */
+	. = ALIGN(PAGE_SIZE);
+#endif
 	X64_ALIGN_DEBUG_RODATA_BEGIN
 	RO_DATA(PAGE_SIZE)
 	X64_ALIGN_DEBUG_RODATA_END

But X64_ALIGN_DEBUG_RODATA_BEGIN is aligned on 2M, so it shouldn't change
nothing on x64 ?

Matthieu

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: -tip tree resume fail, bisect to 5bd5a45(x86: Add NX protection for kernel data)
  2010-11-22 10:29     ` castet.matthieu
@ 2010-11-22 10:47       ` Andi Kleen
  0 siblings, 0 replies; 24+ messages in thread
From: Andi Kleen @ 2010-11-22 10:47 UTC (permalink / raw)
  To: castet.matthieu
  Cc: Andi Kleen, Ingo Molnar, Lin Ming, Siarhei Liakh, Xuxian Jiang,
	Arjan van de Ven, lkml

> 
> But X64_ALIGN_DEBUG_RODATA_BEGIN is aligned on 2M, so it shouldn't change
> nothing on x64 ?

Yes sorry for the confusion. That's unrelated to your patch and already 
happened earlier.  I just noticed it while looking at the page table dumps 
in the report.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: -tip tree resume fail, bisect to 5bd5a45(x86: Add NX protection for kernel data)
  2010-11-22  7:23 -tip tree resume fail, bisect to 5bd5a45(x86: Add NX protection for kernel data) Lin Ming
  2010-11-22  8:07 ` Ingo Molnar
@ 2010-11-22 13:03 ` Peter Zijlstra
  2010-11-22 16:29   ` castet.matthieu
  2010-11-22 13:42 ` [tip:x86/security] x86: Resume trampoline must be executable tip-bot for Lin Ming
  2 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2010-11-22 13:03 UTC (permalink / raw)
  To: Lin Ming
  Cc: Matthieu Castet, Siarhei Liakh, Xuxian Jiang, Ingo Molnar,
	Arjan van de Ven, Andi Kleen, lkml, tglx

On Mon, 2010-11-22 at 15:23 +0800, Lin Ming wrote:
> Hi, all
> 
> Current -tip tree(92c883a) fail to resume after suspend to mem.
> Bisect to commit 5bd5a45(x86: Add NX protection for kernel data).

Drad, if only I'd looked at LKML before I did the bisection myself..

> commit 5bd5a452662bc37c54fb6828db1a3faf87e6511c
> Author: Matthieu Castet <castet.matthieu@free.fr>
> Date:   Tue Nov 16 22:31:26 2010 +0100
> 
>     x86: Add NX protection for kernel data
> 
> 
> I did some debug and found the regression is caused by below line.
> 
> -       set_memory_nx(rodata_start, (end - rodata_start) >> PAGE_SHIFT);
> +       set_memory_nx(rodata_start, (kernel_end - rodata_start) >> PAGE_SHIFT);


Confirmed, with that hunk reverted my machine works again..

---
 arch/x86/mm/init_64.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index ce59c05..71a5929 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -788,7 +788,6 @@ void mark_rodata_ro(void)
 	unsigned long rodata_start =
 		((unsigned long)__start_rodata + PAGE_SIZE - 1) & PAGE_MASK;
 	unsigned long end = (unsigned long) &__end_rodata_hpage_align;
-	unsigned long kernel_end = (((unsigned long)&__init_end + HPAGE_SIZE) & HPAGE_MASK);
 	unsigned long text_end = PAGE_ALIGN((unsigned long) &__stop___ex_table);
 	unsigned long rodata_end = PAGE_ALIGN((unsigned long) &__end_rodata);
 	unsigned long data_start = (unsigned long) &_sdata;
@@ -803,7 +802,7 @@ void mark_rodata_ro(void)
 	 * The rodata section (but not the kernel text!) should also be
 	 * not-executable.
 	 */
-	set_memory_nx(rodata_start, (kernel_end - rodata_start) >> PAGE_SHIFT);
+	set_memory_nx(rodata_start, (end - rodata_start) >> PAGE_SHIFT);
 
 	rodata_test();
 


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [tip:x86/security] x86: Resume trampoline must be executable
  2010-11-22  7:23 -tip tree resume fail, bisect to 5bd5a45(x86: Add NX protection for kernel data) Lin Ming
  2010-11-22  8:07 ` Ingo Molnar
  2010-11-22 13:03 ` Peter Zijlstra
@ 2010-11-22 13:42 ` tip-bot for Lin Ming
  2 siblings, 0 replies; 24+ messages in thread
From: tip-bot for Lin Ming @ 2010-11-22 13:42 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: sliakh.lkml, linux-kernel, hpa, mingo, andi, peterz, jiang,
	arjan, castet.matthieu, ming.m.lin, tglx, mingo

Commit-ID:  691513f70d3957939a318da970987b876c720861
Gitweb:     http://git.kernel.org/tip/691513f70d3957939a318da970987b876c720861
Author:     Lin Ming <ming.m.lin@intel.com>
AuthorDate: Mon, 22 Nov 2010 14:03:28 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Mon, 22 Nov 2010 14:38:52 +0100

x86: Resume trampoline must be executable

commit 5bd5a452(x86: Add NX protection for kernel data) marked the
trampoline area NX - which unsurprisingly breaks resume and cpu
hotplug.

Revert the portion of that commit, which touches the trampoline.

Originally-from: Lin Ming <ming.m.lin@intel.com>
LKML-Reference: <1290410581.2405.24.camel@minggr.sh.intel.com>
Cc: Matthieu Castet <castet.matthieu@free.fr>
Cc: Siarhei Liakh <sliakh.lkml@gmail.com>
Cc: Xuxian Jiang <jiang@cs.ncsu.edu>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Andi Kleen <andi@firstfloor.org>
Tested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/mm/init_64.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index ce59c05..71a5929 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -788,7 +788,6 @@ void mark_rodata_ro(void)
 	unsigned long rodata_start =
 		((unsigned long)__start_rodata + PAGE_SIZE - 1) & PAGE_MASK;
 	unsigned long end = (unsigned long) &__end_rodata_hpage_align;
-	unsigned long kernel_end = (((unsigned long)&__init_end + HPAGE_SIZE) & HPAGE_MASK);
 	unsigned long text_end = PAGE_ALIGN((unsigned long) &__stop___ex_table);
 	unsigned long rodata_end = PAGE_ALIGN((unsigned long) &__end_rodata);
 	unsigned long data_start = (unsigned long) &_sdata;
@@ -803,7 +802,7 @@ void mark_rodata_ro(void)
 	 * The rodata section (but not the kernel text!) should also be
 	 * not-executable.
 	 */
-	set_memory_nx(rodata_start, (kernel_end - rodata_start) >> PAGE_SHIFT);
+	set_memory_nx(rodata_start, (end - rodata_start) >> PAGE_SHIFT);
 
 	rodata_test();
 

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: -tip tree resume fail, bisect to 5bd5a45(x86: Add NX protection for kernel data)
  2010-11-22 13:03 ` Peter Zijlstra
@ 2010-11-22 16:29   ` castet.matthieu
  2010-11-22 16:35     ` Peter Zijlstra
  0 siblings, 1 reply; 24+ messages in thread
From: castet.matthieu @ 2010-11-22 16:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Lin Ming, Matthieu Castet, Siarhei Liakh, Xuxian Jiang,
	Ingo Molnar, Arjan van de Ven, Andi Kleen, lkml, tglx

Hi,

Quoting Peter Zijlstra <peterz@infradead.org>:

> On Mon, 2010-11-22 at 15:23 +0800, Lin Ming wrote:
> > Hi, all
> >
> > Current -tip tree(92c883a) fail to resume after suspend to mem.
> > Bisect to commit 5bd5a45(x86: Add NX protection for kernel data).
>
> Drad, if only I'd looked at LKML before I did the bisection myself..
>
> > commit 5bd5a452662bc37c54fb6828db1a3faf87e6511c
> > Author: Matthieu Castet <castet.matthieu@free.fr>
> > Date:   Tue Nov 16 22:31:26 2010 +0100
> >
> >     x86: Add NX protection for kernel data
> >
> >
> > I did some debug and found the regression is caused by below line.
> >
> > -       set_memory_nx(rodata_start, (end - rodata_start) >> PAGE_SHIFT);
> > +       set_memory_nx(rodata_start, (kernel_end - rodata_start) >>
> PAGE_SHIFT);
>
>
> Confirmed, with that hunk reverted my machine works again..
>
Ok,
I will submit a proper fix :
in acpi_save_state_mem/acpi_restore_state_mem make the trampoline X or NX.


Matthieu

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: -tip tree resume fail, bisect to 5bd5a45(x86: Add NX protection for kernel data)
  2010-11-22 16:29   ` castet.matthieu
@ 2010-11-22 16:35     ` Peter Zijlstra
  2010-11-22 16:42       ` Andi Kleen
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2010-11-22 16:35 UTC (permalink / raw)
  To: castet.matthieu
  Cc: Lin Ming, Siarhei Liakh, Xuxian Jiang, Ingo Molnar,
	Arjan van de Ven, Andi Kleen, lkml, tglx

On Mon, 2010-11-22 at 17:29 +0100, castet.matthieu@free.fr wrote:
> Hi,
> 
> Quoting Peter Zijlstra <peterz@infradead.org>:
> 
> > On Mon, 2010-11-22 at 15:23 +0800, Lin Ming wrote:
> > > Hi, all
> > >
> > > Current -tip tree(92c883a) fail to resume after suspend to mem.
> > > Bisect to commit 5bd5a45(x86: Add NX protection for kernel data).
> >
> > Drad, if only I'd looked at LKML before I did the bisection myself..
> >
> > > commit 5bd5a452662bc37c54fb6828db1a3faf87e6511c
> > > Author: Matthieu Castet <castet.matthieu@free.fr>
> > > Date:   Tue Nov 16 22:31:26 2010 +0100
> > >
> > >     x86: Add NX protection for kernel data
> > >
> > >
> > > I did some debug and found the regression is caused by below line.
> > >
> > > -       set_memory_nx(rodata_start, (end - rodata_start) >> PAGE_SHIFT);
> > > +       set_memory_nx(rodata_start, (kernel_end - rodata_start) >>
> > PAGE_SHIFT);
> >
> >
> > Confirmed, with that hunk reverted my machine works again..
> >
> Ok,
> I will submit a proper fix :
> in acpi_save_state_mem/acpi_restore_state_mem make the trampoline X or NX.

That seems to be a S3 specific code path, that won't fix anything.
Simply do:

echo 0 > /sys/devices/system/cpu/cpu1/online;
echo 1 > /sys/devices/system/cpu/cpu1/online;

and your machine will explode..



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: -tip tree resume fail, bisect to 5bd5a45(x86: Add NX protection for kernel data)
  2010-11-22 16:35     ` Peter Zijlstra
@ 2010-11-22 16:42       ` Andi Kleen
  2010-11-23 22:55         ` mat
  0 siblings, 1 reply; 24+ messages in thread
From: Andi Kleen @ 2010-11-22 16:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: castet.matthieu, Lin Ming, Siarhei Liakh, Xuxian Jiang,
	Ingo Molnar, Arjan van de Ven, Andi Kleen, lkml, tglx

> That seems to be a S3 specific code path, that won't fix anything.
> Simply do:
> 
> echo 0 > /sys/devices/system/cpu/cpu1/online;
> echo 1 > /sys/devices/system/cpu/cpu1/online;
> 
> and your machine will explode..

The SMP startup trampoline is copied I believe
and only executed in real mode without page tables.

So it's perhaps not the trampoline, but the early startup 
code that ends up being broken.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: -tip tree resume fail, bisect to 5bd5a45(x86: Add NX protection for kernel data)
  2010-11-22 16:42       ` Andi Kleen
@ 2010-11-23 22:55         ` mat
  2010-11-26 17:31           ` mat
  0 siblings, 1 reply; 24+ messages in thread
From: mat @ 2010-11-23 22:55 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Peter Zijlstra, Lin Ming, Siarhei Liakh, Xuxian Jiang,
	Ingo Molnar, Arjan van de Ven, lkml, tglx

Le Mon, 22 Nov 2010 17:42:47 +0100,
Andi Kleen <andi@firstfloor.org> a écrit :

> > That seems to be a S3 specific code path, that won't fix anything.
> > Simply do:
> > 
> > echo 0 > /sys/devices/system/cpu/cpu1/online;
> > echo 1 > /sys/devices/system/cpu/cpu1/online;
> > 
> > and your machine will explode..
> 
> The SMP startup trampoline is copied I believe
> and only executed in real mode without page tables.
> 
> So it's perhaps not the trampoline, but the early startup 
> code that ends up being broken.
yes :
acpi wakeup code and smp trampoline are copied in low memory (first
1MB).

So they can't end up int the kernel data mapping ?

So it should something else.

I will try to investigate on this.

Matthieu

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: -tip tree resume fail, bisect to 5bd5a45(x86: Add NX protection for kernel data)
  2010-11-23 22:55         ` mat
@ 2010-11-26 17:31           ` mat
  2010-11-26 23:39             ` Lin Ming
  2010-11-30  5:00             ` Lin Ming
  0 siblings, 2 replies; 24+ messages in thread
From: mat @ 2010-11-26 17:31 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Peter Zijlstra, Lin Ming, Siarhei Liakh, Xuxian Jiang,
	Ingo Molnar, Arjan van de Ven, lkml, tglx

Le Tue, 23 Nov 2010 23:55:27 +0100,
mat <castet.matthieu@free.fr> a écrit :

> Le Mon, 22 Nov 2010 17:42:47 +0100,
> Andi Kleen <andi@firstfloor.org> a écrit :
> 
> > > That seems to be a S3 specific code path, that won't fix anything.
> > > Simply do:
> > > 
> > > echo 0 > /sys/devices/system/cpu/cpu1/online;
> > > echo 1 > /sys/devices/system/cpu/cpu1/online;
> > > 
> > > and your machine will explode..
> > 
> > The SMP startup trampoline is copied I believe
> > and only executed in real mode without page tables.
> > 
> > So it's perhaps not the trampoline, but the early startup 
> > code that ends up being broken.
> yes :
> acpi wakeup code and smp trampoline are copied in low memory (first
> 1MB).
> 
> So they can't end up int the kernel data mapping ?
> 
> So it should something else.
> 
> I will try to investigate on this.
> 
Unfortunately on my laptop supporting NX, suspend to ram seems broken
(even without this patch) and I got only one core, so I am unable to
test it.

Does cpu suspend/resume is broken ? Or it is only S3 ?

If yes, are there any interesting trace if we suspend only one core with
sysfs.

Thanks,

Matthieu

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: -tip tree resume fail, bisect to 5bd5a45(x86: Add NX protection for kernel data)
  2010-11-26 17:31           ` mat
@ 2010-11-26 23:39             ` Lin Ming
  2010-11-30  5:00             ` Lin Ming
  1 sibling, 0 replies; 24+ messages in thread
From: Lin Ming @ 2010-11-26 23:39 UTC (permalink / raw)
  To: mat
  Cc: Andi Kleen, Peter Zijlstra, Lin Ming, Siarhei Liakh,
	Xuxian Jiang, Ingo Molnar, Arjan van de Ven, lkml, tglx

On Sat, Nov 27, 2010 at 1:31 AM, mat <castet.matthieu@free.fr> wrote:
> Le Tue, 23 Nov 2010 23:55:27 +0100,
> mat <castet.matthieu@free.fr> a écrit :
>
>> Le Mon, 22 Nov 2010 17:42:47 +0100,
>> Andi Kleen <andi@firstfloor.org> a écrit :
>>
>> > > That seems to be a S3 specific code path, that won't fix anything.
>> > > Simply do:
>> > >
>> > > echo 0 > /sys/devices/system/cpu/cpu1/online;
>> > > echo 1 > /sys/devices/system/cpu/cpu1/online;
>> > >
>> > > and your machine will explode..
>> >
>> > The SMP startup trampoline is copied I believe
>> > and only executed in real mode without page tables.
>> >
>> > So it's perhaps not the trampoline, but the early startup
>> > code that ends up being broken.
>> yes :
>> acpi wakeup code and smp trampoline are copied in low memory (first
>> 1MB).
>>
>> So they can't end up int the kernel data mapping ?
>>
>> So it should something else.
>>
>> I will try to investigate on this.
>>
> Unfortunately on my laptop supporting NX, suspend to ram seems broken
> (even without this patch) and I got only one core, so I am unable to
> test it.
>
> Does cpu suspend/resume is broken ? Or it is only S3 ?
>
> If yes, are there any interesting trace if we suspend only one core with
> sysfs.

Hi,

I am on travel now, will test it when I'm back next week.

Thanks,
Lin Ming

>
> Thanks,
>
> Matthieu

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: -tip tree resume fail, bisect to 5bd5a45(x86: Add NX protection for kernel data)
  2010-11-26 17:31           ` mat
  2010-11-26 23:39             ` Lin Ming
@ 2010-11-30  5:00             ` Lin Ming
  2010-11-30 11:27               ` Peter Zijlstra
  2010-12-24 17:26               ` matthieu castet
  1 sibling, 2 replies; 24+ messages in thread
From: Lin Ming @ 2010-11-30  5:00 UTC (permalink / raw)
  To: mat
  Cc: Andi Kleen, Peter Zijlstra, Siarhei Liakh, Xuxian Jiang,
	Ingo Molnar, Arjan van de Ven, lkml, tglx

On Sat, 2010-11-27 at 01:31 +0800, mat wrote:
> Le Tue, 23 Nov 2010 23:55:27 +0100,
> mat <castet.matthieu@free.fr> a écrit :
> 
> > Le Mon, 22 Nov 2010 17:42:47 +0100,
> > Andi Kleen <andi@firstfloor.org> a écrit :
> > 
> > > > That seems to be a S3 specific code path, that won't fix anything.
> > > > Simply do:
> > > > 
> > > > echo 0 > /sys/devices/system/cpu/cpu1/online;
> > > > echo 1 > /sys/devices/system/cpu/cpu1/online;
> > > > 
> > > > and your machine will explode..
> > > 
> > > The SMP startup trampoline is copied I believe
> > > and only executed in real mode without page tables.
> > > 
> > > So it's perhaps not the trampoline, but the early startup 
> > > code that ends up being broken.
> > yes :
> > acpi wakeup code and smp trampoline are copied in low memory (first
> > 1MB).
> > 
> > So they can't end up int the kernel data mapping ?
> > 
> > So it should something else.
> > 
> > I will try to investigate on this.
> > 
> Unfortunately on my laptop supporting NX, suspend to ram seems broken
> (even without this patch) and I got only one core, so I am unable to
> test it.
> 
> Does cpu suspend/resume is broken ? Or it is only S3 ?
> 
> If yes, are there any interesting trace if we suspend only one core with
> sysfs.

echo 0 > /sys/devices/system/cpu/cpu1/online;
echo 1 > /sys/devices/system/cpu/cpu1/online;

then machine just reboots...

> 
> Thanks,
> 
> Matthieu



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: -tip tree resume fail, bisect to 5bd5a45(x86: Add NX protection for kernel data)
  2010-11-30  5:00             ` Lin Ming
@ 2010-11-30 11:27               ` Peter Zijlstra
  2010-12-01  0:15                 ` Lin Ming
  2010-12-24 17:26               ` matthieu castet
  1 sibling, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2010-11-30 11:27 UTC (permalink / raw)
  To: Lin Ming
  Cc: mat, Andi Kleen, Siarhei Liakh, Xuxian Jiang, Ingo Molnar,
	Arjan van de Ven, lkml, tglx

On Tue, 2010-11-30 at 13:00 +0800, Lin Ming wrote:
> echo 0 > /sys/devices/system/cpu/cpu1/online;
> echo 1 > /sys/devices/system/cpu/cpu1/online;
> 
> then machine just reboots...
> 

That should work on current -tip (and my test machine confirms with a
-tip of yesterday) due to:

---
commit 691513f70d3957939a318da970987b876c720861
Author: Lin Ming <ming.m.lin@intel.com>
Date:   Mon Nov 22 14:03:28 2010 +0100

    x86: Resume trampoline must be executable
    
    commit 5bd5a452(x86: Add NX protection for kernel data) marked the
    trampoline area NX - which unsurprisingly breaks resume and cpu
    hotplug.
    
    Revert the portion of that commit, which touches the trampoline.
    
    Originally-from: Lin Ming <ming.m.lin@intel.com>
    LKML-Reference: <1290410581.2405.24.camel@minggr.sh.intel.com>
    Cc: Matthieu Castet <castet.matthieu@free.fr>
    Cc: Siarhei Liakh <sliakh.lkml@gmail.com>
    Cc: Xuxian Jiang <jiang@cs.ncsu.edu>
    Cc: Ingo Molnar <mingo@elte.hu>
    Cc: Arjan van de Ven <arjan@infradead.org>
    Cc: Andi Kleen <andi@firstfloor.org>
    Tested-by: Peter Zijlstra <peterz@infradead.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index ce59c05..71a5929 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -788,7 +788,6 @@ void mark_rodata_ro(void)
 	unsigned long rodata_start =
 		((unsigned long)__start_rodata + PAGE_SIZE - 1) & PAGE_MASK;
 	unsigned long end = (unsigned long) &__end_rodata_hpage_align;
-	unsigned long kernel_end = (((unsigned long)&__init_end + HPAGE_SIZE) & HPAGE_MASK);
 	unsigned long text_end = PAGE_ALIGN((unsigned long) &__stop___ex_table);
 	unsigned long rodata_end = PAGE_ALIGN((unsigned long) &__end_rodata);
 	unsigned long data_start = (unsigned long) &_sdata;
@@ -803,7 +802,7 @@ void mark_rodata_ro(void)
 	 * The rodata section (but not the kernel text!) should also be
 	 * not-executable.
 	 */
-	set_memory_nx(rodata_start, (kernel_end - rodata_start) >> PAGE_SHIFT);
+	set_memory_nx(rodata_start, (end - rodata_start) >> PAGE_SHIFT);
 
 	rodata_test();
 


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: -tip tree resume fail, bisect to 5bd5a45(x86: Add NX protection for kernel data)
  2010-11-30 11:27               ` Peter Zijlstra
@ 2010-12-01  0:15                 ` Lin Ming
  2011-01-23 19:06                   ` matthieu castet
  0 siblings, 1 reply; 24+ messages in thread
From: Lin Ming @ 2010-12-01  0:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mat, Andi Kleen, Siarhei Liakh, Xuxian Jiang, Ingo Molnar,
	Arjan van de Ven, lkml, tglx

On Tue, 2010-11-30 at 19:27 +0800, Peter Zijlstra wrote:
> On Tue, 2010-11-30 at 13:00 +0800, Lin Ming wrote:
> > echo 0 > /sys/devices/system/cpu/cpu1/online;
> > echo 1 > /sys/devices/system/cpu/cpu1/online;
> > 
> > then machine just reboots...
> > 
> 
> That should work on current -tip (and my test machine confirms with a
> -tip of yesterday) due to:

Yes, I know.

I tested for Mat's request to confirm that CPU suspend/resume also
broken, not only S3.

And no interesting trace I can get because machine just reboots
immediately when CPU suspend/resume.

Thanks.

> 
> ---
> commit 691513f70d3957939a318da970987b876c720861
> Author: Lin Ming <ming.m.lin@intel.com>
> Date:   Mon Nov 22 14:03:28 2010 +0100
> 
>     x86: Resume trampoline must be executable
>     
>     commit 5bd5a452(x86: Add NX protection for kernel data) marked the
>     trampoline area NX - which unsurprisingly breaks resume and cpu
>     hotplug.
>     
>     Revert the portion of that commit, which touches the trampoline.
>     
>     Originally-from: Lin Ming <ming.m.lin@intel.com>
>     LKML-Reference: <1290410581.2405.24.camel@minggr.sh.intel.com>
>     Cc: Matthieu Castet <castet.matthieu@free.fr>
>     Cc: Siarhei Liakh <sliakh.lkml@gmail.com>
>     Cc: Xuxian Jiang <jiang@cs.ncsu.edu>
>     Cc: Ingo Molnar <mingo@elte.hu>
>     Cc: Arjan van de Ven <arjan@infradead.org>
>     Cc: Andi Kleen <andi@firstfloor.org>
>     Tested-by: Peter Zijlstra <peterz@infradead.org>
>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> 
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index ce59c05..71a5929 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -788,7 +788,6 @@ void mark_rodata_ro(void)
>  	unsigned long rodata_start =
>  		((unsigned long)__start_rodata + PAGE_SIZE - 1) & PAGE_MASK;
>  	unsigned long end = (unsigned long) &__end_rodata_hpage_align;
> -	unsigned long kernel_end = (((unsigned long)&__init_end + HPAGE_SIZE) & HPAGE_MASK);
>  	unsigned long text_end = PAGE_ALIGN((unsigned long) &__stop___ex_table);
>  	unsigned long rodata_end = PAGE_ALIGN((unsigned long) &__end_rodata);
>  	unsigned long data_start = (unsigned long) &_sdata;
> @@ -803,7 +802,7 @@ void mark_rodata_ro(void)
>  	 * The rodata section (but not the kernel text!) should also be
>  	 * not-executable.
>  	 */
> -	set_memory_nx(rodata_start, (kernel_end - rodata_start) >> PAGE_SHIFT);
> +	set_memory_nx(rodata_start, (end - rodata_start) >> PAGE_SHIFT);
>  
>  	rodata_test();
>  
> 



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: -tip tree resume fail, bisect to 5bd5a45(x86: Add NX protection for kernel data)
  2010-11-30  5:00             ` Lin Ming
  2010-11-30 11:27               ` Peter Zijlstra
@ 2010-12-24 17:26               ` matthieu castet
  2010-12-27  2:10                 ` Lin Ming
  1 sibling, 1 reply; 24+ messages in thread
From: matthieu castet @ 2010-12-24 17:26 UTC (permalink / raw)
  To: Lin Ming
  Cc: Andi Kleen, Peter Zijlstra, Siarhei Liakh, Xuxian Jiang,
	Ingo Molnar, Arjan van de Ven, lkml, tglx

[-- Attachment #1: Type: text/plain, Size: 1659 bytes --]

Hi,

Le Tue, 30 Nov 2010 13:00:30 +0800,
Lin Ming <ming.m.lin@intel.com> a écrit :

> On Sat, 2010-11-27 at 01:31 +0800, mat wrote:
> > Le Tue, 23 Nov 2010 23:55:27 +0100,
> > mat <castet.matthieu@free.fr> a écrit :
> > 
> > > Le Mon, 22 Nov 2010 17:42:47 +0100,
> > > Andi Kleen <andi@firstfloor.org> a écrit :
> > > 
> > > > > That seems to be a S3 specific code path, that won't fix
> > > > > anything. Simply do:
> > > > > 
> > > > > echo 0 > /sys/devices/system/cpu/cpu1/online;
> > > > > echo 1 > /sys/devices/system/cpu/cpu1/online;
> > > > > 
> > > > > and your machine will explode..
> > > > 
> > > > The SMP startup trampoline is copied I believe
> > > > and only executed in real mode without page tables.
> > > > 
> > > > So it's perhaps not the trampoline, but the early startup 
> > > > code that ends up being broken.
> > > yes :
> > > acpi wakeup code and smp trampoline are copied in low memory
> > > (first 1MB).
> > > 
> > > So they can't end up int the kernel data mapping ?
> > > 
> > > So it should something else.
> > > 
> > > I will try to investigate on this.
> > > 
> > Unfortunately on my laptop supporting NX, suspend to ram seems
> > broken (even without this patch) and I got only one core, so I am
> > unable to test it.
> > 
> > Does cpu suspend/resume is broken ? Or it is only S3 ?
> > 
> > If yes, are there any interesting trace if we suspend only one core
> > with sysfs.
> 
> echo 0 > /sys/devices/system/cpu/cpu1/online;
> echo 1 > /sys/devices/system/cpu/cpu1/online;
> 
> then machine just reboots...
> 
Ok,

could you try the attached patch ?

Thanks

Matthieu

[-- Attachment #2: x64_nx_data.diff --]
[-- Type: text/x-patch, Size: 895 bytes --]

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 71a5929..d86552f 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -790,6 +790,7 @@ void mark_rodata_ro(void)
 	unsigned long end = (unsigned long) &__end_rodata_hpage_align;
 	unsigned long text_end = PAGE_ALIGN((unsigned long) &__stop___ex_table);
 	unsigned long rodata_end = PAGE_ALIGN((unsigned long) &__end_rodata);
+	unsigned long data_end = PAGE_ALIGN((unsigned long) &_edata);
 	unsigned long data_start = (unsigned long) &_sdata;
 
 	printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
@@ -802,7 +803,7 @@ void mark_rodata_ro(void)
 	 * The rodata section (but not the kernel text!) should also be
 	 * not-executable.
 	 */
-	set_memory_nx(rodata_start, (end - rodata_start) >> PAGE_SHIFT);
+	set_memory_nx(rodata_start, (data_end - rodata_start) >> PAGE_SHIFT);
 
 	rodata_test();
 

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: -tip tree resume fail, bisect to 5bd5a45(x86: Add NX protection for kernel data)
  2010-12-24 17:26               ` matthieu castet
@ 2010-12-27  2:10                 ` Lin Ming
  2011-01-05 18:58                   ` matthieu castet
  0 siblings, 1 reply; 24+ messages in thread
From: Lin Ming @ 2010-12-27  2:10 UTC (permalink / raw)
  To: matthieu castet
  Cc: Andi Kleen, Peter Zijlstra, Siarhei Liakh, Xuxian Jiang,
	Ingo Molnar, Arjan van de Ven, lkml, tglx

On Sat, 2010-12-25 at 01:26 +0800, matthieu castet wrote:
> Hi,
> 
> Le Tue, 30 Nov 2010 13:00:30 +0800,
> Lin Ming <ming.m.lin@intel.com> a écrit :
> 
> > On Sat, 2010-11-27 at 01:31 +0800, mat wrote:
> > > Le Tue, 23 Nov 2010 23:55:27 +0100,
> > > mat <castet.matthieu@free.fr> a écrit :
> > > 
> > > > Le Mon, 22 Nov 2010 17:42:47 +0100,
> > > > Andi Kleen <andi@firstfloor.org> a écrit :
> > > > 
> > > > > > That seems to be a S3 specific code path, that won't fix
> > > > > > anything. Simply do:
> > > > > > 
> > > > > > echo 0 > /sys/devices/system/cpu/cpu1/online;
> > > > > > echo 1 > /sys/devices/system/cpu/cpu1/online;
> > > > > > 
> > > > > > and your machine will explode..
> > > > > 
> > > > > The SMP startup trampoline is copied I believe
> > > > > and only executed in real mode without page tables.
> > > > > 
> > > > > So it's perhaps not the trampoline, but the early startup 
> > > > > code that ends up being broken.
> > > > yes :
> > > > acpi wakeup code and smp trampoline are copied in low memory
> > > > (first 1MB).
> > > > 
> > > > So they can't end up int the kernel data mapping ?
> > > > 
> > > > So it should something else.
> > > > 
> > > > I will try to investigate on this.
> > > > 
> > > Unfortunately on my laptop supporting NX, suspend to ram seems
> > > broken (even without this patch) and I got only one core, so I am
> > > unable to test it.
> > > 
> > > Does cpu suspend/resume is broken ? Or it is only S3 ?
> > > 
> > > If yes, are there any interesting trace if we suspend only one core
> > > with sysfs.
> > 
> > echo 0 > /sys/devices/system/cpu/cpu1/online;
> > echo 1 > /sys/devices/system/cpu/cpu1/online;
> > 
> > then machine just reboots...
> > 
> Ok,
> 
> could you try the attached patch ?

Unfortunately, it does not work against current tip/master(1f7107c8).

Lin Ming

> 
> Thanks
> 
> Matthieu



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: -tip tree resume fail, bisect to 5bd5a45(x86: Add NX protection for kernel data)
  2010-12-27  2:10                 ` Lin Ming
@ 2011-01-05 18:58                   ` matthieu castet
  0 siblings, 0 replies; 24+ messages in thread
From: matthieu castet @ 2011-01-05 18:58 UTC (permalink / raw)
  To: Lin Ming
  Cc: Andi Kleen, Peter Zijlstra, Siarhei Liakh, Xuxian Jiang,
	Ingo Molnar, Arjan van de Ven, lkml, tglx

Le Mon, 27 Dec 2010 10:10:50 +0800,
Lin Ming <ming.m.lin@intel.com> a écrit :

> On Sat, 2010-12-25 at 01:26 +0800, matthieu castet wrote:
> > Hi,
> > 
> > Le Tue, 30 Nov 2010 13:00:30 +0800,
> > Lin Ming <ming.m.lin@intel.com> a écrit :
> > 
> > > On Sat, 2010-11-27 at 01:31 +0800, mat wrote:
> > > > Le Tue, 23 Nov 2010 23:55:27 +0100,
> > > > mat <castet.matthieu@free.fr> a écrit :
> > > > 
> > > > > Le Mon, 22 Nov 2010 17:42:47 +0100,
> > > > > Andi Kleen <andi@firstfloor.org> a écrit :
> > > > > 
> > > > > > > That seems to be a S3 specific code path, that won't fix
> > > > > > > anything. Simply do:
> > > > > > > 
> > > > > > > echo 0 > /sys/devices/system/cpu/cpu1/online;
> > > > > > > echo 1 > /sys/devices/system/cpu/cpu1/online;
> > > > > > > 
> > > > > > > and your machine will explode..
> > > > > > 
> > > > > > The SMP startup trampoline is copied I believe
> > > > > > and only executed in real mode without page tables.
> > > > > > 
> > > > > > So it's perhaps not the trampoline, but the early startup 
> > > > > > code that ends up being broken.
> > > > > yes :
> > > > > acpi wakeup code and smp trampoline are copied in low memory
> > > > > (first 1MB).
> > > > > 
> > > > > So they can't end up int the kernel data mapping ?
> > > > > 
> > > > > So it should something else.
> > > > > 
> > > > > I will try to investigate on this.
> > > > > 
> > > > Unfortunately on my laptop supporting NX, suspend to ram seems
> > > > broken (even without this patch) and I got only one core, so I
> > > > am unable to test it.
> > > > 
> > > > Does cpu suspend/resume is broken ? Or it is only S3 ?
> > > > 
> > > > If yes, are there any interesting trace if we suspend only one
> > > > core with sysfs.
> > > 
> > > echo 0 > /sys/devices/system/cpu/cpu1/online;
> > > echo 1 > /sys/devices/system/cpu/cpu1/online;
> > > 
> > > then machine just reboots...
> > > 
> > Ok,
> > 
> > could you try the attached patch ?
> 
> Unfortunately, it does not work against current tip/master(1f7107c8).
> 
> Lin Ming
> 
Thanks for the testing.

For the record I did some tests.
- The x86_32 cpu hotplug is working fine.
- On x86_64 we crash between the cpu1 is wakeup and before trampoline
  jump to start_secondary

A difference between 32 and 64 bit is that head_32.S use
initial_page_table mmu table then switch to swapper_pg_dir
while 64 bit version use init_level4_pgt for trampoline, head and
kernel.

But I fail to see where setting NX is level3_kernel_pgt aera can make
the resume fail. When we enable NX we are in .text section (head_64).


Matthieu

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: -tip tree resume fail, bisect to 5bd5a45(x86: Add NX protection for kernel data)
  2010-12-01  0:15                 ` Lin Ming
@ 2011-01-23 19:06                   ` matthieu castet
  2011-01-24 22:22                     ` matthieu castet
  0 siblings, 1 reply; 24+ messages in thread
From: matthieu castet @ 2011-01-23 19:06 UTC (permalink / raw)
  To: Lin Ming
  Cc: Peter Zijlstra, Andi Kleen, Siarhei Liakh, Xuxian Jiang,
	Ingo Molnar, Arjan van de Ven, lkml, tglx

[-- Attachment #1: Type: text/plain, Size: 1079 bytes --]

Lin Ming a écrit :
> On Tue, 2010-11-30 at 19:27 +0800, Peter Zijlstra wrote:
>> On Tue, 2010-11-30 at 13:00 +0800, Lin Ming wrote:
>>> echo 0 > /sys/devices/system/cpu/cpu1/online;
>>> echo 1 > /sys/devices/system/cpu/cpu1/online;
>>>
>>> then machine just reboots...
>>>
I tried to do the same thing on qemu, and the same behavior happened (ie reboot when resuming cpu1).

After enabling qemu log, I found that a triple fault was happening at the beginning of secondary_startup_64
when doing "addq phys_base(%rip), %rax".

Why ?
I suppose because we access data set to NX, but we don't have enabled yet NX in the msr. So the cpu crash due 
to "reserved bit check".

If we enable NX before reading data, there is no more crash (patch attached).

Now I am not sure this is the correct fix. I think the problem is that trampoline using kernel page table
is very dangerous. The kernel can have modified them atfer booting !
May be all the paging stuff should have been done in head_64.S. A first one with identity mapping, and the second one for
the real kernel stuff.


Matthieu

[-- Attachment #2: diff --]
[-- Type: text/plain, Size: 1065 bytes --]

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 239046b..9f32295 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -165,16 +165,6 @@ ENTRY(secondary_startup_64)
 	movl	$(X86_CR4_PAE | X86_CR4_PGE), %eax
 	movq	%rax, %cr4
 
-	/* Setup early boot stage 4 level pagetables. */
-	movq	$(init_level4_pgt - __START_KERNEL_map), %rax
-	addq	phys_base(%rip), %rax
-	movq	%rax, %cr3
-
-	/* Ensure I am executing from virtual addresses */
-	movq	$1f, %rax
-	jmp	*%rax
-1:
-
 	/* Check if nx is implemented */
 	movl	$0x80000001, %eax
 	cpuid
@@ -189,6 +179,17 @@ ENTRY(secondary_startup_64)
 	btsl	$_EFER_NX, %eax
 1:	wrmsr				/* Make changes effective */
 
+	/* Setup early boot stage 4 level pagetables. */
+	movq	$(init_level4_pgt - __START_KERNEL_map), %rax
+	addq	phys_base(%rip), %rax
+	movq	%rax, %cr3
+
+	/* Ensure I am executing from virtual addresses */
+	movq	$1f, %rax
+	jmp	*%rax
+1:
+
+
 	/* Setup cr0 */
 #define CR0_STATE	(X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | \
 			 X86_CR0_NE | X86_CR0_WP | X86_CR0_AM | \

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: -tip tree resume fail, bisect to 5bd5a45(x86: Add NX protection for kernel data)
  2011-01-23 19:06                   ` matthieu castet
@ 2011-01-24 22:22                     ` matthieu castet
  2011-01-25 12:36                       ` Lin Ming
  2011-03-09 23:16                       ` matthieu castet
  0 siblings, 2 replies; 24+ messages in thread
From: matthieu castet @ 2011-01-24 22:22 UTC (permalink / raw)
  To: Lin Ming
  Cc: Peter Zijlstra, Andi Kleen, Siarhei Liakh, Xuxian Jiang,
	Ingo Molnar, Arjan van de Ven, lkml, tglx, linux-security-module

[-- Attachment #1: Type: text/plain, Size: 1222 bytes --]

matthieu castet a écrit :
> Lin Ming a écrit :
>> On Tue, 2010-11-30 at 19:27 +0800, Peter Zijlstra wrote:
>>> On Tue, 2010-11-30 at 13:00 +0800, Lin Ming wrote:
>>>> echo 0 > /sys/devices/system/cpu/cpu1/online;
>>>> echo 1 > /sys/devices/system/cpu/cpu1/online;
>>>>
>>>> then machine just reboots...
>>>>
> I tried to do the same thing on qemu, and the same behavior happened (ie 
> reboot when resuming cpu1).
> 
> After enabling qemu log, I found that a triple fault was happening at 
> the beginning of secondary_startup_64
> when doing "addq phys_base(%rip), %rax".
> 
> Why ?
> I suppose because we access data set to NX, but we don't have enabled 
> yet NX in the msr. So the cpu crash due to "reserved bit check".
> 
> If we enable NX before reading data, there is no more crash (patch 
> attached).
> 
> Now I am not sure this is the correct fix. I think the problem is that 
> trampoline using kernel page table
> is very dangerous. The kernel can have modified them atfer booting !
> May be all the paging stuff should have been done in head_64.S. A first 
> one with identity mapping, and the second one for
> the real kernel stuff.
> 
Lin, could you try this patch on your x64 machine.


Thanks

Matthieu

[-- Attachment #2: 0001-x86-Add-NX-protection-for-kernel-data-on-64-bit.patch --]
[-- Type: text/x-diff, Size: 6020 bytes --]

>From c6d0a62d47be9956abb9d3e1b7c952eaa7e0df7e Mon Sep 17 00:00:00 2001
From: Matthieu CASTET <castet.matthieu@free.fr>
Date: Mon, 24 Jan 2011 23:12:45 +0100
Subject: [PATCH] x86 : Add NX protection for kernel data on 64 bit

This fix the cpu hotplug support, by allocating dedicated page table
for ident mapping in trampoline.
This is need because kernel set NX flag in level3_ident_pgt and
level3_kernel_pgt, and made it unusable from trampoline.

We also set the Low kernel Mapping to NX.

Finaly we apply nx in free_init_pages only when we switch to NX mode in order
to preserve large page mapping.

mapping now look like :
---[ Low Kernel Mapping ]---
0xffff880000000000-0xffff880000200000           2M     RW             GLB NX pte
0xffff880000200000-0xffff880001000000          14M     RW         PSE GLB NX pmd
0xffff880001000000-0xffff880001200000           2M     ro         PSE GLB NX pmd
0xffff880001200000-0xffff8800012ae000         696K     ro             GLB NX pte
0xffff8800012ae000-0xffff880001400000        1352K     RW             GLB NX pte
0xffff880001400000-0xffff880001503000        1036K     ro             GLB NX pte
0xffff880001503000-0xffff880001600000        1012K     RW             GLB NX pte
0xffff880001600000-0xffff880007e00000         104M     RW         PSE GLB NX pmd
0xffff880007e00000-0xffff880007ffd000        2036K     RW             GLB NX pte
0xffff880007ffd000-0xffff880008000000          12K                           pte
0xffff880008000000-0xffff880040000000         896M                           pmd
0xffff880040000000-0xffff888000000000         511G                           pud
0xffff888000000000-0xffffc90000000000       66048G                           pgd
---[ vmalloc() Area ]---
[...]
---[ High Kernel Mapping ]---
0xffffffff80000000-0xffffffff81000000          16M                           pmd
0xffffffff81000000-0xffffffff81400000           4M     ro         PSE GLB x  pmd
0xffffffff81400000-0xffffffff81600000           2M     ro         PSE GLB NX pmd
0xffffffff81600000-0xffffffff81800000           2M     RW         PSE GLB NX pmd
0xffffffff81800000-0xffffffffa0000000         488M                           pmd
---[ Modules ]---

Signed-off-by: Matthieu CASTET <castet.matthieu@free.fr>
---
 arch/x86/kernel/head_64.S       |   18 ++++++++++++++++++
 arch/x86/kernel/trampoline_64.S |    4 ++--
 arch/x86/mm/init.c              |    6 ++++--
 arch/x86/mm/init_64.c           |    6 +++++-
 4 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 239046b..47f56dc 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -139,6 +139,9 @@ ident_complete:
 #ifdef CONFIG_X86_TRAMPOLINE
 	addq	%rbp, trampoline_level4_pgt + 0(%rip)
 	addq	%rbp, trampoline_level4_pgt + (511*8)(%rip)
+
+	addq	%rbp, trampoline_level3_ident_pgt + 0(%rip)
+	addq	%rbp, trampoline_level3_ident_pgt + (L3_START_KERNEL*8)(%rip)
 #endif
 
 	/* Due to ENTRY(), sometimes the empty space gets filled with
@@ -396,6 +399,21 @@ NEXT_PAGE(level2_kernel_pgt)
 NEXT_PAGE(level2_spare_pgt)
 	.fill   512, 8, 0
 
+#ifdef CONFIG_X86_TRAMPOLINE
+NEXT_PAGE(trampoline_level3_ident_pgt)
+	.quad	trampoline_level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+	.fill	L3_START_KERNEL-1,8,0
+	.quad	trampoline_level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+	.fill	511-L3_START_KERNEL,8,0
+
+
+NEXT_PAGE(trampoline_level2_ident_pgt)
+	/* Since I easily can, map the first 1G.
+	 * Don't set NX because code runs from these pages.
+	 */
+	PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
+#endif
+
 #undef PMDS
 #undef NEXT_PAGE
 
diff --git a/arch/x86/kernel/trampoline_64.S b/arch/x86/kernel/trampoline_64.S
index 075d130..a408935 100644
--- a/arch/x86/kernel/trampoline_64.S
+++ b/arch/x86/kernel/trampoline_64.S
@@ -160,8 +160,8 @@ trampoline_stack:
 	.org 0x1000
 trampoline_stack_end:
 ENTRY(trampoline_level4_pgt)
-	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+	.quad	trampoline_level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
 	.fill	510,8,0
-	.quad	level3_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE
+	.quad	trampoline_level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
 
 ENTRY(trampoline_end)
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 947f42a..58d173b 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -366,8 +366,10 @@ void free_init_pages(char *what, unsigned long begin, unsigned long end)
 	 * we are going to free part of that, we need to make that
 	 * writeable and non-executable first.
 	 */
-	set_memory_nx(begin, (end - begin) >> PAGE_SHIFT);
-	set_memory_rw(begin, (end - begin) >> PAGE_SHIFT);
+	if (kernel_set_to_readonly) {
+		set_memory_nx(begin, (end - begin) >> PAGE_SHIFT);
+		set_memory_rw(begin, (end - begin) >> PAGE_SHIFT);
+	}
 
 	printk(KERN_INFO "Freeing %s: %luk freed\n", what, (end - begin) >> 10);
 
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 71a5929..3840ecf 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -788,6 +788,7 @@ void mark_rodata_ro(void)
 	unsigned long rodata_start =
 		((unsigned long)__start_rodata + PAGE_SIZE - 1) & PAGE_MASK;
 	unsigned long end = (unsigned long) &__end_rodata_hpage_align;
+	unsigned long kernel_end = (((unsigned long)&__init_end + HPAGE_SIZE) & HPAGE_MASK);
 	unsigned long text_end = PAGE_ALIGN((unsigned long) &__stop___ex_table);
 	unsigned long rodata_end = PAGE_ALIGN((unsigned long) &__end_rodata);
 	unsigned long data_start = (unsigned long) &_sdata;
@@ -798,11 +799,14 @@ void mark_rodata_ro(void)
 
 	kernel_set_to_readonly = 1;
 
+	/* make low level mapping NX */
+	set_memory_nx(PAGE_OFFSET, (PMD_PAGE_SIZE*PTRS_PER_PMD) >> PAGE_SHIFT);
+
 	/*
 	 * The rodata section (but not the kernel text!) should also be
 	 * not-executable.
 	 */
-	set_memory_nx(rodata_start, (end - rodata_start) >> PAGE_SHIFT);
+	set_memory_nx(rodata_start, (kernel_end - rodata_start) >> PAGE_SHIFT);
 
 	rodata_test();
 
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: -tip tree resume fail, bisect to 5bd5a45(x86: Add NX protection for kernel data)
  2011-01-24 22:22                     ` matthieu castet
@ 2011-01-25 12:36                       ` Lin Ming
  2011-03-09 23:16                       ` matthieu castet
  1 sibling, 0 replies; 24+ messages in thread
From: Lin Ming @ 2011-01-25 12:36 UTC (permalink / raw)
  To: matthieu castet
  Cc: Lin Ming, Peter Zijlstra, Andi Kleen, Siarhei Liakh,
	Xuxian Jiang, Ingo Molnar, Arjan van de Ven, lkml, tglx,
	linux-security-module

On Tue, Jan 25, 2011 at 6:22 AM, matthieu castet
<castet.matthieu@free.fr> wrote:
> matthieu castet a écrit :
>>
>> Lin Ming a écrit :
>>>
>>> On Tue, 2010-11-30 at 19:27 +0800, Peter Zijlstra wrote:
>>>>
>>>> On Tue, 2010-11-30 at 13:00 +0800, Lin Ming wrote:
>>>>>
>>>>> echo 0 > /sys/devices/system/cpu/cpu1/online;
>>>>> echo 1 > /sys/devices/system/cpu/cpu1/online;
>>>>>
>>>>> then machine just reboots...
>>>>>
>> I tried to do the same thing on qemu, and the same behavior happened (ie
>> reboot when resuming cpu1).
>>
>> After enabling qemu log, I found that a triple fault was happening at the
>> beginning of secondary_startup_64
>> when doing "addq phys_base(%rip), %rax".
>>
>> Why ?
>> I suppose because we access data set to NX, but we don't have enabled yet
>> NX in the msr. So the cpu crash due to "reserved bit check".
>>
>> If we enable NX before reading data, there is no more crash (patch
>> attached).
>>
>> Now I am not sure this is the correct fix. I think the problem is that
>> trampoline using kernel page table
>> is very dangerous. The kernel can have modified them atfer booting !
>> May be all the paging stuff should have been done in head_64.S. A first
>> one with identity mapping, and the second one for
>> the real kernel stuff.
>>
> Lin, could you try this patch on your x64 machine.

Hi,

I'm on holiday now.
I'll test it when I'm back on Feb 9.

Thanks,
Lin Ming

>
>
> Thanks
>
> Matthieu
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: -tip tree resume fail, bisect to 5bd5a45(x86: Add NX protection for kernel data)
  2011-01-24 22:22                     ` matthieu castet
  2011-01-25 12:36                       ` Lin Ming
@ 2011-03-09 23:16                       ` matthieu castet
  2011-03-10  2:39                         ` Lin Ming
  1 sibling, 1 reply; 24+ messages in thread
From: matthieu castet @ 2011-03-09 23:16 UTC (permalink / raw)
  To: Lin Ming
  Cc: Peter Zijlstra, Andi Kleen, Siarhei Liakh, Xuxian Jiang,
	Ingo Molnar, Arjan van de Ven, lkml, tglx, linux-security-module

Hi,

matthieu castet a écrit :
> matthieu castet a écrit :
>> Lin Ming a écrit :
>>> On Tue, 2010-11-30 at 19:27 +0800, Peter Zijlstra wrote:
>>>> On Tue, 2010-11-30 at 13:00 +0800, Lin Ming wrote:
>>>>> echo 0 > /sys/devices/system/cpu/cpu1/online;
>>>>> echo 1 > /sys/devices/system/cpu/cpu1/online;
>>>>>
>>>>> then machine just reboots...
>>>>>
>> I tried to do the same thing on qemu, and the same behavior happened 
>> (ie reboot when resuming cpu1).
>>
>> After enabling qemu log, I found that a triple fault was happening at 
>> the beginning of secondary_startup_64
>> when doing "addq phys_base(%rip), %rax".
>>
>> Why ?
>> I suppose because we access data set to NX, but we don't have enabled 
>> yet NX in the msr. So the cpu crash due to "reserved bit check".
>>
>> If we enable NX before reading data, there is no more crash (patch 
>> attached).
>>
>> Now I am not sure this is the correct fix. I think the problem is that 
>> trampoline using kernel page table
>> is very dangerous. The kernel can have modified them atfer booting !
>> May be all the paging stuff should have been done in head_64.S. A 
>> first one with identity mapping, and the second one for
>> the real kernel stuff.
>>
> Lin, could you try this patch on your x64 machine.
> 
> 
I updated the patch to last tip. I tested on a 64 bits config and everything works.

Any comment on it ?


Thanks


Matthieu


From: Matthieu CASTET <castet.matthieu@free.fr>
Date: Thu, 10 Mar 2011 00:10:01 +0100
Subject: [PATCH] x86 : Add NX protection for kernel data on 64 bit

This fix the cpu hotplug support, by allocating dedicated page table
for ident mapping in trampoline.
This is need because kernel set NX flag in level3_ident_pgt and
level3_kernel_pgt, and made it unusable from trampoline.

We also set the Low kernel Mapping to NX.

Finaly we apply nx in free_init_pages only when we switch to NX mode in order
to preserve large page mapping.

mapping now look like :
---[ Low Kernel Mapping ]---
0xffff880000000000-0xffff880000200000           2M     RW             GLB NX pte
0xffff880000200000-0xffff880001000000          14M     RW         PSE GLB NX pmd
0xffff880001000000-0xffff880001200000           2M     ro         PSE GLB NX pmd
0xffff880001200000-0xffff8800012ae000         696K     ro             GLB NX pte
0xffff8800012ae000-0xffff880001400000        1352K     RW             GLB NX pte
0xffff880001400000-0xffff880001503000        1036K     ro             GLB NX pte
0xffff880001503000-0xffff880001600000        1012K     RW             GLB NX pte
0xffff880001600000-0xffff880007e00000         104M     RW         PSE GLB NX pmd
0xffff880007e00000-0xffff880007ffd000        2036K     RW             GLB NX pte
0xffff880007ffd000-0xffff880008000000          12K                           pte
0xffff880008000000-0xffff880040000000         896M                           pmd
0xffff880040000000-0xffff888000000000         511G                           pud
0xffff888000000000-0xffffc90000000000       66048G                           pgd
---[ vmalloc() Area ]---
[...]
---[ High Kernel Mapping ]---
0xffffffff80000000-0xffffffff81000000          16M                           pmd
0xffffffff81000000-0xffffffff81400000           4M     ro         PSE GLB x  pmd
0xffffffff81400000-0xffffffff81600000           2M     ro         PSE GLB NX pmd
0xffffffff81600000-0xffffffff81800000           2M     RW         PSE GLB NX pmd
0xffffffff81800000-0xffffffffa0000000         488M                           pmd
---[ Modules ]---

Signed-off-by: Matthieu CASTET <castet.matthieu@free.fr>

Conflicts:

	arch/x86/kernel/head_64.S
---
 arch/x86/kernel/head_64.S       |   15 +++++++++++++++
 arch/x86/kernel/trampoline_64.S |    4 ++--
 arch/x86/mm/init.c              |    6 ++++--
 arch/x86/mm/init_64.c           |    6 +++++-
 4 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index e11e394..e261354 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -140,6 +140,9 @@ ident_complete:
 	addq	%rbp, trampoline_level4_pgt + 0(%rip)
 	addq	%rbp, trampoline_level4_pgt + (511*8)(%rip)
 
+	addq	%rbp, trampoline_level3_ident_pgt + 0(%rip)
+	addq	%rbp, trampoline_level3_ident_pgt + (L3_START_KERNEL*8)(%rip)
+
 	/* Due to ENTRY(), sometimes the empty space gets filled with
 	 * zeros. Better take a jmp than relying on empty space being
 	 * filled with 0x90 (nop)
@@ -395,6 +398,18 @@ NEXT_PAGE(level2_kernel_pgt)
 NEXT_PAGE(level2_spare_pgt)
 	.fill   512, 8, 0
 
+NEXT_PAGE(trampoline_level3_ident_pgt)
+	.quad	trampoline_level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+	.fill	L3_START_KERNEL-1,8,0
+	.quad	trampoline_level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+	.fill	511-L3_START_KERNEL,8,0
+
+
+NEXT_PAGE(trampoline_level2_ident_pgt)
+	/* Since I easily can, map the first 1G.
+	 * Don't set NX because code runs from these pages.
+	 */
+	PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
 #undef PMDS
 #undef NEXT_PAGE
 
diff --git a/arch/x86/kernel/trampoline_64.S b/arch/x86/kernel/trampoline_64.S
index 09ff517..8723e47 100644
--- a/arch/x86/kernel/trampoline_64.S
+++ b/arch/x86/kernel/trampoline_64.S
@@ -164,8 +164,8 @@ trampoline_stack:
 	.org 0x1000
 trampoline_stack_end:
 ENTRY(trampoline_level4_pgt)
-	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+	.quad	trampoline_level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
 	.fill	510,8,0
-	.quad	level3_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE
+	.quad	trampoline_level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
 
 ENTRY(trampoline_end)
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 286d289..98dd5fa 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -338,8 +338,10 @@ void free_init_pages(char *what, unsigned long begin, unsigned long end)
 	 * we are going to free part of that, we need to make that
 	 * writeable and non-executable first.
 	 */
-	set_memory_nx(begin, (end - begin) >> PAGE_SHIFT);
-	set_memory_rw(begin, (end - begin) >> PAGE_SHIFT);
+	if (kernel_set_to_readonly) {
+		set_memory_nx(begin, (end - begin) >> PAGE_SHIFT);
+		set_memory_rw(begin, (end - begin) >> PAGE_SHIFT);
+	}
 
 	printk(KERN_INFO "Freeing %s: %luk freed\n", what, (end - begin) >> 10);
 
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 470cc47..e9bb29d 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -832,6 +832,7 @@ void mark_rodata_ro(void)
 	unsigned long rodata_start =
 		((unsigned long)__start_rodata + PAGE_SIZE - 1) & PAGE_MASK;
 	unsigned long end = (unsigned long) &__end_rodata_hpage_align;
+	unsigned long kernel_end = (((unsigned long)&__init_end + HPAGE_SIZE) & HPAGE_MASK);
 	unsigned long text_end = PAGE_ALIGN((unsigned long) &__stop___ex_table);
 	unsigned long rodata_end = PAGE_ALIGN((unsigned long) &__end_rodata);
 	unsigned long data_start = (unsigned long) &_sdata;
@@ -842,11 +843,14 @@ void mark_rodata_ro(void)
 
 	kernel_set_to_readonly = 1;
 
+	/* make low level mapping NX */
+	set_memory_nx(PAGE_OFFSET, (PMD_PAGE_SIZE*PTRS_PER_PMD) >> PAGE_SHIFT);
+
 	/*
 	 * The rodata section (but not the kernel text!) should also be
 	 * not-executable.
 	 */
-	set_memory_nx(rodata_start, (end - rodata_start) >> PAGE_SHIFT);
+	set_memory_nx(rodata_start, (kernel_end - rodata_start) >> PAGE_SHIFT);
 
 	rodata_test();
 
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: -tip tree resume fail, bisect to 5bd5a45(x86: Add NX protection for kernel data)
  2011-03-09 23:16                       ` matthieu castet
@ 2011-03-10  2:39                         ` Lin Ming
  0 siblings, 0 replies; 24+ messages in thread
From: Lin Ming @ 2011-03-10  2:39 UTC (permalink / raw)
  To: matthieu castet
  Cc: Peter Zijlstra, Andi Kleen, Siarhei Liakh, Xuxian Jiang,
	Ingo Molnar, Arjan van de Ven, lkml, tglx, linux-security-module

On Thu, 2011-03-10 at 07:16 +0800, matthieu castet wrote:
> Hi,
> 
> matthieu castet a écrit :
> > matthieu castet a écrit :
> >> Lin Ming a écrit :
> >>> On Tue, 2010-11-30 at 19:27 +0800, Peter Zijlstra wrote:
> >>>> On Tue, 2010-11-30 at 13:00 +0800, Lin Ming wrote:
> >>>>> echo 0 > /sys/devices/system/cpu/cpu1/online;
> >>>>> echo 1 > /sys/devices/system/cpu/cpu1/online;
> >>>>>
> >>>>> then machine just reboots...
> >>>>>
> >> I tried to do the same thing on qemu, and the same behavior happened 
> >> (ie reboot when resuming cpu1).
> >>
> >> After enabling qemu log, I found that a triple fault was happening at 
> >> the beginning of secondary_startup_64
> >> when doing "addq phys_base(%rip), %rax".
> >>
> >> Why ?
> >> I suppose because we access data set to NX, but we don't have enabled 
> >> yet NX in the msr. So the cpu crash due to "reserved bit check".
> >>
> >> If we enable NX before reading data, there is no more crash (patch 
> >> attached).
> >>
> >> Now I am not sure this is the correct fix. I think the problem is that 
> >> trampoline using kernel page table
> >> is very dangerous. The kernel can have modified them atfer booting !
> >> May be all the paging stuff should have been done in head_64.S. A 
> >> first one with identity mapping, and the second one for
> >> the real kernel stuff.
> >>
> > Lin, could you try this patch on your x64 machine.
> > 
> > 
> I updated the patch to last tip. I tested on a 64 bits config and everything works.

I tested suspend/resume/hotplug and it works on my 64 bit notebook too.

Thanks,
Lin Ming

> 
> Any comment on it ?
> 
> 
> Thanks
> 
> 
> Matthieu
> 
> 
> From: Matthieu CASTET <castet.matthieu@free.fr>
> Date: Thu, 10 Mar 2011 00:10:01 +0100
> Subject: [PATCH] x86 : Add NX protection for kernel data on 64 bit
> 
> This fix the cpu hotplug support, by allocating dedicated page table
> for ident mapping in trampoline.
> This is need because kernel set NX flag in level3_ident_pgt and
> level3_kernel_pgt, and made it unusable from trampoline.
> 
> We also set the Low kernel Mapping to NX.
> 
> Finaly we apply nx in free_init_pages only when we switch to NX mode in order
> to preserve large page mapping.
> 
> mapping now look like :
> ---[ Low Kernel Mapping ]---
> 0xffff880000000000-0xffff880000200000           2M     RW             GLB NX pte
> 0xffff880000200000-0xffff880001000000          14M     RW         PSE GLB NX pmd
> 0xffff880001000000-0xffff880001200000           2M     ro         PSE GLB NX pmd
> 0xffff880001200000-0xffff8800012ae000         696K     ro             GLB NX pte
> 0xffff8800012ae000-0xffff880001400000        1352K     RW             GLB NX pte
> 0xffff880001400000-0xffff880001503000        1036K     ro             GLB NX pte
> 0xffff880001503000-0xffff880001600000        1012K     RW             GLB NX pte
> 0xffff880001600000-0xffff880007e00000         104M     RW         PSE GLB NX pmd
> 0xffff880007e00000-0xffff880007ffd000        2036K     RW             GLB NX pte
> 0xffff880007ffd000-0xffff880008000000          12K                           pte
> 0xffff880008000000-0xffff880040000000         896M                           pmd
> 0xffff880040000000-0xffff888000000000         511G                           pud
> 0xffff888000000000-0xffffc90000000000       66048G                           pgd
> ---[ vmalloc() Area ]---
> [...]
> ---[ High Kernel Mapping ]---
> 0xffffffff80000000-0xffffffff81000000          16M                           pmd
> 0xffffffff81000000-0xffffffff81400000           4M     ro         PSE GLB x  pmd
> 0xffffffff81400000-0xffffffff81600000           2M     ro         PSE GLB NX pmd
> 0xffffffff81600000-0xffffffff81800000           2M     RW         PSE GLB NX pmd
> 0xffffffff81800000-0xffffffffa0000000         488M                           pmd
> ---[ Modules ]---
> 
> Signed-off-by: Matthieu CASTET <castet.matthieu@free.fr>
> 
> Conflicts:
> 
> 	arch/x86/kernel/head_64.S
> ---
>  arch/x86/kernel/head_64.S       |   15 +++++++++++++++
>  arch/x86/kernel/trampoline_64.S |    4 ++--
>  arch/x86/mm/init.c              |    6 ++++--
>  arch/x86/mm/init_64.c           |    6 +++++-
>  4 files changed, 26 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index e11e394..e261354 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -140,6 +140,9 @@ ident_complete:
>  	addq	%rbp, trampoline_level4_pgt + 0(%rip)
>  	addq	%rbp, trampoline_level4_pgt + (511*8)(%rip)
>  
> +	addq	%rbp, trampoline_level3_ident_pgt + 0(%rip)
> +	addq	%rbp, trampoline_level3_ident_pgt + (L3_START_KERNEL*8)(%rip)
> +
>  	/* Due to ENTRY(), sometimes the empty space gets filled with
>  	 * zeros. Better take a jmp than relying on empty space being
>  	 * filled with 0x90 (nop)
> @@ -395,6 +398,18 @@ NEXT_PAGE(level2_kernel_pgt)
>  NEXT_PAGE(level2_spare_pgt)
>  	.fill   512, 8, 0
>  
> +NEXT_PAGE(trampoline_level3_ident_pgt)
> +	.quad	trampoline_level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
> +	.fill	L3_START_KERNEL-1,8,0
> +	.quad	trampoline_level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
> +	.fill	511-L3_START_KERNEL,8,0
> +
> +
> +NEXT_PAGE(trampoline_level2_ident_pgt)
> +	/* Since I easily can, map the first 1G.
> +	 * Don't set NX because code runs from these pages.
> +	 */
> +	PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
>  #undef PMDS
>  #undef NEXT_PAGE
>  
> diff --git a/arch/x86/kernel/trampoline_64.S b/arch/x86/kernel/trampoline_64.S
> index 09ff517..8723e47 100644
> --- a/arch/x86/kernel/trampoline_64.S
> +++ b/arch/x86/kernel/trampoline_64.S
> @@ -164,8 +164,8 @@ trampoline_stack:
>  	.org 0x1000
>  trampoline_stack_end:
>  ENTRY(trampoline_level4_pgt)
> -	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
> +	.quad	trampoline_level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
>  	.fill	510,8,0
> -	.quad	level3_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE
> +	.quad	trampoline_level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
>  
>  ENTRY(trampoline_end)
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index 286d289..98dd5fa 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -338,8 +338,10 @@ void free_init_pages(char *what, unsigned long begin, unsigned long end)
>  	 * we are going to free part of that, we need to make that
>  	 * writeable and non-executable first.
>  	 */
> -	set_memory_nx(begin, (end - begin) >> PAGE_SHIFT);
> -	set_memory_rw(begin, (end - begin) >> PAGE_SHIFT);
> +	if (kernel_set_to_readonly) {
> +		set_memory_nx(begin, (end - begin) >> PAGE_SHIFT);
> +		set_memory_rw(begin, (end - begin) >> PAGE_SHIFT);
> +	}
>  
>  	printk(KERN_INFO "Freeing %s: %luk freed\n", what, (end - begin) >> 10);
>  
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 470cc47..e9bb29d 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -832,6 +832,7 @@ void mark_rodata_ro(void)
>  	unsigned long rodata_start =
>  		((unsigned long)__start_rodata + PAGE_SIZE - 1) & PAGE_MASK;
>  	unsigned long end = (unsigned long) &__end_rodata_hpage_align;
> +	unsigned long kernel_end = (((unsigned long)&__init_end + HPAGE_SIZE) & HPAGE_MASK);
>  	unsigned long text_end = PAGE_ALIGN((unsigned long) &__stop___ex_table);
>  	unsigned long rodata_end = PAGE_ALIGN((unsigned long) &__end_rodata);
>  	unsigned long data_start = (unsigned long) &_sdata;
> @@ -842,11 +843,14 @@ void mark_rodata_ro(void)
>  
>  	kernel_set_to_readonly = 1;
>  
> +	/* make low level mapping NX */
> +	set_memory_nx(PAGE_OFFSET, (PMD_PAGE_SIZE*PTRS_PER_PMD) >> PAGE_SHIFT);
> +
>  	/*
>  	 * The rodata section (but not the kernel text!) should also be
>  	 * not-executable.
>  	 */
> -	set_memory_nx(rodata_start, (end - rodata_start) >> PAGE_SHIFT);
> +	set_memory_nx(rodata_start, (kernel_end - rodata_start) >> PAGE_SHIFT);
>  
>  	rodata_test();
>  



^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2011-03-10  2:39 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-11-22  7:23 -tip tree resume fail, bisect to 5bd5a45(x86: Add NX protection for kernel data) Lin Ming
2010-11-22  8:07 ` Ingo Molnar
2010-11-22  9:20   ` Andi Kleen
2010-11-22 10:29     ` castet.matthieu
2010-11-22 10:47       ` Andi Kleen
2010-11-22 13:03 ` Peter Zijlstra
2010-11-22 16:29   ` castet.matthieu
2010-11-22 16:35     ` Peter Zijlstra
2010-11-22 16:42       ` Andi Kleen
2010-11-23 22:55         ` mat
2010-11-26 17:31           ` mat
2010-11-26 23:39             ` Lin Ming
2010-11-30  5:00             ` Lin Ming
2010-11-30 11:27               ` Peter Zijlstra
2010-12-01  0:15                 ` Lin Ming
2011-01-23 19:06                   ` matthieu castet
2011-01-24 22:22                     ` matthieu castet
2011-01-25 12:36                       ` Lin Ming
2011-03-09 23:16                       ` matthieu castet
2011-03-10  2:39                         ` Lin Ming
2010-12-24 17:26               ` matthieu castet
2010-12-27  2:10                 ` Lin Ming
2011-01-05 18:58                   ` matthieu castet
2010-11-22 13:42 ` [tip:x86/security] x86: Resume trampoline must be executable tip-bot for Lin Ming

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.