[V5] x86: NX protection for kernel data

Message ID 817ecb6f0910121803p52a4049ep4a712545d28bba76@mail.gmail.com
State New, archived

Commit Message

tip-bot for Siarhei Liakh Oct. 13, 2009, 1:03 a.m. UTC
This patch expands functionality of CONFIG_DEBUG_RODATA to set main
(static) kernel data area as NX.
The following steps are taken to achieve this:
1. Linker script is adjusted so .text always starts and ends on a page boundary
2. Linker script is adjusted so .rodata and .data always start and
end on a page boundary
3. void mark_nxdata_nx(void) added to arch/x86/mm/init.c with actual
functionality: NX is set for all pages from _etext through _end.
4. mark_nxdata_nx() called from free_initmem() (after init has been released)
5. free_init_pages() sets released memory NX in arch/x86/mm/init.c
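
Steps 3 and 4 come down to page arithmetic: round _etext and _end up to page boundaries and pass the number of pages in between to set_memory_nx(). The following is a minimal userspace sketch of that calculation only, assuming 4 KiB pages. page_align() and nx_page_count() are hypothetical stand-ins written for this illustration, not kernel functions, and nothing here touches real page tables.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12                          /* assumption: 4 KiB pages */
#define PAGE_SIZE  ((uintptr_t)1 << PAGE_SHIFT)

/* Round addr up to the next page boundary (no-op if already aligned). */
static uintptr_t page_align(uintptr_t addr)
{
	return (addr + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
}

/*
 * Number of whole pages between the page-aligned end of text and the
 * page-aligned end of the kernel image. Both arguments are plain
 * addresses; the caller supplies whatever stands in for _etext/_end.
 */
static unsigned long nx_page_count(uintptr_t etext, uintptr_t end)
{
	uintptr_t start = page_align(etext);
	uintptr_t stop = page_align(end);

	assert(stop >= start);
	return (unsigned long)((stop - start) >> PAGE_SHIFT);
}
```

Because the linker script change already ends .text on a page boundary, rounding _etext up should be a no-op in practice; the sketch keeps the rounding to mirror what the patch does.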

The patch has been developed for Linux 2.6.31-rc7 x86 by Siarhei Liakh
<sliakh.lkml@gmail.com> and Xuxian Jiang <jiang@cs.ncsu.edu>.

V1:  initial patch for 2.6.30
V2:  patch for 2.6.31-rc7
V3:  moved all code into arch/x86, adjusted credits
V4:  fixed ifdef, removed credits from CREDITS
V5:  fixed an address calculation bug in mark_nxdata_nx()
---

Signed-off-by: Siarhei Liakh <sliakh.lkml@gmail.com>
Signed-off-by: Xuxian Jiang <jiang@cs.ncsu.edu>


@@ -440,11 +441,29 @@ void free_init_pages(char *what, unsigned long
begin, unsigned long end)
 #endif
 }

+void mark_nxdata_nx(void)
+{
+#ifdef CONFIG_DEBUG_RODATA
+	/*
+	 * When this is called, init has already been executed and released,
+	 * so everything past _etext should be NX.
+	 */
+	unsigned long start = PAGE_ALIGN((unsigned long)(&_etext));
+	unsigned long size = PAGE_ALIGN((unsigned long)(&_end)) - start;
+
+	printk(KERN_INFO "NX-protecting the kernel data: %lx, %lu pages\n",
+		start, size >> PAGE_SHIFT);
+	set_memory_nx(start, size >> PAGE_SHIFT);
+#endif
+}
+
 void free_initmem(void)
 {
 	free_init_pages("unused kernel memory",
 			(unsigned long)(&__init_begin),
 			(unsigned long)(&__init_end));
+	/* Set kernel's data as NX */
+	mark_nxdata_nx();
 }

 #ifdef CONFIG_BLK_DEV_INITRD
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Comments

Arjan van de Ven Oct. 13, 2009, 4:32 a.m. UTC | #1
On Mon, 12 Oct 2009 21:03:17 -0400
Siarhei Liakh <sliakh.lkml@gmail.com> wrote:

> This patch expands functionality of CONFIG_DEBUG_RODATA to set main
> (static) kernel data area as NX.
> The following steps are taken to achieve this:
> 1. Linker script is adjusted so .text always starts and ends on a
> page boundary
> 2. Linker script is adjusted so .rodata and .data always start and
> end on a page boundary
> 3. void mark_nxdata_nx(void) added to arch/x86/mm/init.c with actual
> functionality: NX is set for all pages from _etext through _end.
> 4. mark_nxdata_nx() called from free_initmem() (after init has been
> released)
> 5. free_init_pages() sets released memory NX in arch/x86/mm/init.c
> 
> The patch have been developed for Linux 2.6.31-rc7 x86 by Siarhei
> Liakh <sliakh.lkml@gmail.com> and Xuxian Jiang <jiang@cs.ncsu.edu>.
> 

I like doing this, but... maybe it is useful to have a diff of the
pagetable dump (PT_DUMP config option) to show the effect, in the
changelog. That'd be like the proof on the pudding...
tip-bot for Ingo Molnar Oct. 13, 2009, 6:03 a.m. UTC | #2
* Arjan van de Ven <arjan@infradead.org> wrote:

> On Mon, 12 Oct 2009 21:03:17 -0400
> Siarhei Liakh <sliakh.lkml@gmail.com> wrote:
> 
> > This patch expands functionality of CONFIG_DEBUG_RODATA to set main
> > (static) kernel data area as NX.
> > The following steps are taken to achieve this:
> > 1. Linker script is adjusted so .text always starts and ends on a
> > page boundary
> > 2. Linker script is adjusted so .rodata and .data always start and
> > end on a page boundary
> > 3. void mark_nxdata_nx(void) added to arch/x86/mm/init.c with actual
> > functionality: NX is set for all pages from _etext through _end.
> > 4. mark_nxdata_nx() called from free_initmem() (after init has been
> > released)
> > 5. free_init_pages() sets released memory NX in arch/x86/mm/init.c
> > 
> > The patch have been developed for Linux 2.6.31-rc7 x86 by Siarhei
> > Liakh <sliakh.lkml@gmail.com> and Xuxian Jiang <jiang@cs.ncsu.edu>.
> > 
> 
> I like doing this, but... maybe it is useful to have a diff of the 
> pagetable dump (PT_DUMP config option) to show the effect, in the 
> changelog. That'd be like the proof on the pudding...

That's a good suggestion. Siarhei Liakh, mind doing that?

Thanks,

	Ingo
David Howells Oct. 13, 2009, 7:14 a.m. UTC | #3
Siarhei Liakh <sliakh.lkml@gmail.com> wrote:

> @@ -440,11 +441,29 @@ void free_init_pages(char *what, unsigned long
> begin, unsigned long end)

Your mail client is word wrapping your patches.

David
David Howells Oct. 13, 2009, 7:48 a.m. UTC | #4
Siarhei Liakh <sliakh.lkml@gmail.com> wrote:

> This patch expands functionality of CONFIG_DEBUG_RODATA to set main
> (static) kernel data area as NX.
> The following steps are taken to achieve this:
> 1. Linker script is adjusted so .text always starts and ends on a page boundary
> 2. Linker script is adjusted so .rodata and .data always start and
> end on a page boundary
> 3. void mark_nxdata_nx(void) added to arch/x86/mm/init.c with actual
> functionality: NX is set for all pages from _etext through _end.
> 4. mark_nxdata_nx() called from free_initmem() (after init has been released)
> 5. free_init_pages() sets released memory NX in arch/x86/mm/init.c
> 
> The patch have been developed for Linux 2.6.31-rc7 x86 by Siarhei Liakh
> <sliakh.lkml@gmail.com> and Xuxian Jiang <jiang@cs.ncsu.edu>.
> 
> V1:  initial patch for 2.6.30
> V2:  patch for 2.6.31-rc7
> V3:  moved all code into arch/x86, adjusted credits
> V4:  fixed ifdef, removed credits from CREDITS
> V5:  fixed an address calculation bug in mark_nxdata_nx()
> ---
> 
> Signed-off-by: Siarhei Liakh <sliakh.lkml@gmail.com>
> Signed-off-by: Xuxian Jiang <jiang@cs.ncsu.edu>

That seems to fix the problem, thanks.

Acked-by: David Howells <dhowells@redhat.com>
tip-bot for Siarhei Liakh Oct. 13, 2009, 11:35 a.m. UTC | #5
>> I like doing this, but... maybe it is useful to have a diff of the
>> pagetable dump (PT_DUMP config option) to show the effect, in the
>> changelog. That'd be like the proof on the pudding...
>
> That's a good suggestion. Siarhei Liakh, mind doing that?

Here you go:
===============================================
--- data_nx_pt_before.txt	2009-10-13 07:26:17.000000000 -0400
+++ data_nx_pt_after.txt	2009-10-13 07:26:46.000000000 -0400
@@ -2,12 +2,9 @@
 0x00000000-0xc0000000           3G                           pmd
 ---[ Kernel Mapping ]---
 0xc0000000-0xc0100000           1M     RW             GLB x  pte
-0xc0100000-0xc048d000        3636K     ro             GLB x  pte
-0xc048d000-0xc04d0000         268K     RW             GLB x  pte
-0xc04d0000-0xc04d2000           8K     RW             GLB NX pte
-0xc04d2000-0xc04d3000           4K     RW             GLB x  pte
-0xc04d3000-0xc0531000         376K     RW             GLB NX pte
-0xc0531000-0xc0600000         828K     RW             GLB x  pte
+0xc0100000-0xc0381000        2564K     ro             GLB x  pte
+0xc0381000-0xc048d000        1072K     ro             GLB NX pte
+0xc048d000-0xc0600000        1484K     RW             GLB NX pte
 0xc0600000-0xf7800000         882M     RW         PSE GLB NX pmd
 0xf7800000-0xf79fe000        2040K     RW             GLB NX pte
 0xf79fe000-0xf7a00000           8K                           pte
===============================================
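
As a sanity check on the size column, the three "after" rows can be recomputed from their address ranges. This is a small userspace sketch; struct pt_range, range_kib() and total_kib() are names made up for this illustration and are not part of the page-table dump code.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the dump: the half-open address range [start, end). */
struct pt_range {
	unsigned long start;
	unsigned long end;
};

/* Size of one range in KiB, matching the dump's size column. */
static unsigned long range_kib(const struct pt_range *r)
{
	assert(r->end >= r->start);
	return (r->end - r->start) >> 10;
}

/* Sum of the sizes of n ranges, in KiB. */
static unsigned long total_kib(const struct pt_range *r, size_t n)
{
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += range_kib(&r[i]);
	return sum;
}

/* The three rows added by the patch, copied from the dump above. */
static const struct pt_range after_rows[] = {
	{ 0xc0100000UL, 0xc0381000UL },	/* ro, x:  kernel text */
	{ 0xc0381000UL, 0xc048d000UL },	/* ro, NX: rodata      */
	{ 0xc048d000UL, 0xc0600000UL },	/* RW, NX: data        */
};
```

The three sizes come out as 2564K, 1072K and 1484K, and they sum to the same 5120K that the six removed rows cover, so the patch changes permissions without moving the 0xc0100000-0xc0600000 boundaries.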

Would you like me to re-post whole patch with this addition?

Thanks.
tip-bot for Ingo Molnar Oct. 13, 2009, 12:28 p.m. UTC | #6
* Siarhei Liakh <sliakh.lkml@gmail.com> wrote:

> >> I like doing this, but... maybe it is useful to have a diff of the
> >> pagetable dump (PT_DUMP config option) to show the effect, in the
> >> changelog. That'd be like the proof on the pudding...
> >
> > That's a good suggestion. Siarhei Liakh, mind doing that?
> 
> Here you go:
> ===============================================
> --- data_nx_pt_before.txt	2009-10-13 07:26:17.000000000 -0400
> +++ data_nx_pt_after.txt	2009-10-13 07:26:46.000000000 -0400
> @@ -2,12 +2,9 @@
>  0x00000000-0xc0000000           3G                           pmd
>  ---[ Kernel Mapping ]---
>  0xc0000000-0xc0100000           1M     RW             GLB x  pte
> -0xc0100000-0xc048d000        3636K     ro             GLB x  pte
> -0xc048d000-0xc04d0000         268K     RW             GLB x  pte
> -0xc04d0000-0xc04d2000           8K     RW             GLB NX pte
> -0xc04d2000-0xc04d3000           4K     RW             GLB x  pte
> -0xc04d3000-0xc0531000         376K     RW             GLB NX pte
> -0xc0531000-0xc0600000         828K     RW             GLB x  pte
> +0xc0100000-0xc0381000        2564K     ro             GLB x  pte
> +0xc0381000-0xc048d000        1072K     ro             GLB NX pte
> +0xc048d000-0xc0600000        1484K     RW             GLB NX pte
>  0xc0600000-0xf7800000         882M     RW         PSE GLB NX pmd
>  0xf7800000-0xf79fe000        2040K     RW             GLB NX pte
>  0xf79fe000-0xf7a00000           8K                           pte
> ===============================================
> 
> Would you like me to re-post whole patch with this addition?

Yep, v6 with Arjan's ack (once he sends it) would be handy.

	Ingo
Arjan van de Ven Oct. 13, 2009, 2:07 p.m. UTC | #7
On Tue, 13 Oct 2009 07:35:28 -0400
Siarhei Liakh <sliakh.lkml@gmail.com> wrote:

> ---[ Kernel Mapping ]---
>  0xc0000000-0xc0100000           1M     RW             GLB x  pte
> -0xc0100000-0xc048d000        3636K     ro             GLB x  pte
> -0xc048d000-0xc04d0000         268K     RW             GLB x  pte
> -0xc04d0000-0xc04d2000           8K     RW             GLB NX pte
> -0xc04d2000-0xc04d3000           4K     RW             GLB x  pte
> -0xc04d3000-0xc0531000         376K     RW             GLB NX pte
> -0xc0531000-0xc0600000         828K     RW             GLB x  pte
> +0xc0100000-0xc0381000        2564K     ro             GLB x  pte
> +0xc0381000-0xc048d000        1072K     ro             GLB NX pte
> +0xc048d000-0xc0600000        1484K     RW             GLB NX pte
>  0xc0600000-0xf7800000         882M     RW         PSE GLB NX pmd
>  0xf7800000-0xf79fe000        2040K     RW             GLB NX pte
>  0xf79fe000-0xf7a00000           8K                           pte
> ===============================================
> 

looks great to me; the result is 
* kernel is ro + x
* rodata is ro + NX
* data is RW + NX
(and there is no "RW + x", other than the first megabyte... hmm. maybe
we need to look at that as well at some point)
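
The "no RW + x" observation can be expressed as a simple check over the dump rows: count the ranges that are both writable and executable. This is a userspace sketch only; struct mapping and count_wx() are hypothetical names for this illustration, and the tables are transcribed by hand from the quoted dump.

```c
#include <assert.h>
#include <stddef.h>

/* One dump row with its two relevant permission bits. */
struct mapping {
	unsigned long start;
	unsigned long end;
	int writable;	/* 1 for "RW", 0 for "ro" */
	int executable;	/* 1 for "x",  0 for "NX" */
};

/* Count ranges that are writable and executable at the same time. */
static size_t count_wx(const struct mapping *m, size_t n)
{
	size_t i, hits = 0;

	assert(m != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (m[i].writable && m[i].executable)
			hits++;
	return hits;
}

/* Rows before the patch, first megabyte included. */
static const struct mapping before_patch[] = {
	{ 0xc0000000UL, 0xc0100000UL, 1, 1 },
	{ 0xc0100000UL, 0xc048d000UL, 0, 1 },
	{ 0xc048d000UL, 0xc04d0000UL, 1, 1 },
	{ 0xc04d0000UL, 0xc04d2000UL, 1, 0 },
	{ 0xc04d2000UL, 0xc04d3000UL, 1, 1 },
	{ 0xc04d3000UL, 0xc0531000UL, 1, 0 },
	{ 0xc0531000UL, 0xc0600000UL, 1, 1 },
};

/* Rows after the patch, first megabyte included. */
static const struct mapping after_patch[] = {
	{ 0xc0000000UL, 0xc0100000UL, 1, 1 },	/* first megabyte */
	{ 0xc0100000UL, 0xc0381000UL, 0, 1 },	/* kernel text    */
	{ 0xc0381000UL, 0xc048d000UL, 0, 0 },	/* rodata         */
	{ 0xc048d000UL, 0xc0600000UL, 1, 0 },	/* data           */
};
```

With these tables the count drops from four writable-and-executable ranges to one, and the one remaining is the first megabyte.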

Acked-by: Arjan van de Ven <arjan@linux.intel.com>
tip-bot for Ingo Molnar Oct. 13, 2009, 2:15 p.m. UTC | #8
* Arjan van de Ven <arjan@infradead.org> wrote:

> On Tue, 13 Oct 2009 07:35:28 -0400
> Siarhei Liakh <sliakh.lkml@gmail.com> wrote:
> 
> > ---[ Kernel Mapping ]---
> >  0xc0000000-0xc0100000           1M     RW             GLB x  pte
> > -0xc0100000-0xc048d000        3636K     ro             GLB x  pte
> > -0xc048d000-0xc04d0000         268K     RW             GLB x  pte
> > -0xc04d0000-0xc04d2000           8K     RW             GLB NX pte
> > -0xc04d2000-0xc04d3000           4K     RW             GLB x  pte
> > -0xc04d3000-0xc0531000         376K     RW             GLB NX pte
> > -0xc0531000-0xc0600000         828K     RW             GLB x  pte
> > +0xc0100000-0xc0381000        2564K     ro             GLB x  pte
> > +0xc0381000-0xc048d000        1072K     ro             GLB NX pte
> > +0xc048d000-0xc0600000        1484K     RW             GLB NX pte
> >  0xc0600000-0xf7800000         882M     RW         PSE GLB NX pmd
> >  0xf7800000-0xf79fe000        2040K     RW             GLB NX pte
> >  0xf79fe000-0xf7a00000           8K                           pte
> > ===============================================
> > 
> 
> looks great to me; the result is 
> * kernel is ro + x
> * rodata is ro + NX
> * data is RW + NX
>
> (and there is no "RW + x", other than the first megabyte... hmm. maybe 
> we need to look at that as well at some point)

Could we cover the first megabyte too please via a (default-disabled) 
option? Modern Xorg shouldnt mind about that anymore, right?

	Ingo
Arjan van de Ven Oct. 13, 2009, 2:29 p.m. UTC | #9
On Tue, 13 Oct 2009 16:15:27 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Arjan van de Ven <arjan@infradead.org> wrote:
> 
> > On Tue, 13 Oct 2009 07:35:28 -0400
> > Siarhei Liakh <sliakh.lkml@gmail.com> wrote:
> > 
> > > ---[ Kernel Mapping ]---
> > >  0xc0000000-0xc0100000           1M     RW             GLB x  pte
> > > -0xc0100000-0xc048d000        3636K     ro             GLB x  pte
> > > -0xc048d000-0xc04d0000         268K     RW             GLB x  pte
> > > -0xc04d0000-0xc04d2000           8K     RW             GLB NX pte
> > > -0xc04d2000-0xc04d3000           4K     RW             GLB x  pte
> > > -0xc04d3000-0xc0531000         376K     RW             GLB NX pte
> > > -0xc0531000-0xc0600000         828K     RW             GLB x  pte
> > > +0xc0100000-0xc0381000        2564K     ro             GLB x  pte
> > > +0xc0381000-0xc048d000        1072K     ro             GLB NX pte
> > > +0xc048d000-0xc0600000        1484K     RW             GLB NX pte
> > >  0xc0600000-0xf7800000         882M     RW         PSE GLB NX pmd
> > >  0xf7800000-0xf79fe000        2040K     RW             GLB NX pte
> > >  0xf79fe000-0xf7a00000           8K                           pte
> > > ===============================================
> > > 
> > 
> > looks great to me; the result is 
> > * kernel is ro + x
> > * rodata is ro + NX
> > * data is RW + NX
> >
> > (and there is no "RW + x", other than the first megabyte... hmm.
> > maybe we need to look at that as well at some point)
> 
> Could we cover the first megabyte too please via a (default-disabled) 
> option? Modern Xorg shouldnt mind about that anymore, right?


I'd be surprised if anything ever did; this is the *kernel* mapping of
the first megabyte, not some userspace mapping....
Arjan van de Ven Oct. 13, 2009, 2:35 p.m. UTC | #10
On Tue, 13 Oct 2009 16:15:27 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Arjan van de Ven <arjan@infradead.org> wrote:
> 
> > On Tue, 13 Oct 2009 07:35:28 -0400
> > Siarhei Liakh <sliakh.lkml@gmail.com> wrote:
> > 
> > > ---[ Kernel Mapping ]---
> > >  0xc0000000-0xc0100000           1M     RW             GLB x  pte
> > > -0xc0100000-0xc048d000        3636K     ro             GLB x  pte
> > > -0xc048d000-0xc04d0000         268K     RW             GLB x  pte
> > > -0xc04d0000-0xc04d2000           8K     RW             GLB NX pte
> > > -0xc04d2000-0xc04d3000           4K     RW             GLB x  pte
> > > -0xc04d3000-0xc0531000         376K     RW             GLB NX pte
> > > -0xc0531000-0xc0600000         828K     RW             GLB x  pte
> > > +0xc0100000-0xc0381000        2564K     ro             GLB x  pte
> > > +0xc0381000-0xc048d000        1072K     ro             GLB NX pte
> > > +0xc048d000-0xc0600000        1484K     RW             GLB NX pte
> > >  0xc0600000-0xf7800000         882M     RW         PSE GLB NX pmd
> > >  0xf7800000-0xf79fe000        2040K     RW             GLB NX pte
> > >  0xf79fe000-0xf7a00000           8K                           pte
> > > ===============================================
> > > 
> > 
> > looks great to me; the result is 
> > * kernel is ro + x
> > * rodata is ro + NX
> > * data is RW + NX
> >
> > (and there is no "RW + x", other than the first megabyte... hmm.
> > maybe we need to look at that as well at some point)
> 
> Could we cover the first megabyte too please via a (default-disabled) 
> option? Modern Xorg shouldnt mind about that anymore, right?

just to be clear, for me this 1MB is a separate issue, and for a
separate patch.... the current patch is good as is.
Alan Cox Oct. 13, 2009, 2:49 p.m. UTC | #11
> I'd be surprised if anything ever did; this is the *kernel* mapping of
> the first megabyte, not some userspace mapping....

APM, BIOS32, EDD, PnPBIOS ..

However except for APM (which isn't generally needed on NX capable
devices or found on them) none of them are usually on critical paths
because EDD is just grovelling around sort of stuff, and BIOS32 isn't
generally used by the kernel anyway so could probably cope with flipping
the permissions on the low 1 MB each call.
tip-bot for Siarhei Liakh Oct. 13, 2009, 3:34 p.m. UTC | #12
>> I'd be surprised if anything ever did; this is the *kernel* mapping of
>> the first megabyte, not some userspace mapping....
>
> APM, BIOS32, EDD, PnPBIOS ..
>
> However except for APM (which isn't generally needed on NX capable
> devices or found on them) none of them are usually on critical paths
> because EDD is just grovelling around sort of stuff, and BIOS32 isn't
> generally used by the kernel anyway so could probably cope with flipping
> the permissions on the low 1 MB each call.

Actually, I have posted a patch to fix RW+X problem with BIOS32 some
time ago. See my submission to LKML (and subsequent discussion) on Jul
19 2009 "[PATCH] x86: Reducing footprint of BIOS32 service mappings".

Nevertheless, that 1MB area is on my "to do" list, and I will be
patching it sooner or later (assuming I get my patches tested well
enough to get them accepted).

Patch

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 78d185d..83ae734 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -43,14 +43,14 @@  jiffies_64 = jiffies;

 PHDRS {
 	text PT_LOAD FLAGS(5);          /* R_E */
-	data PT_LOAD FLAGS(7);          /* RWE */
+	data PT_LOAD FLAGS(6);          /* RW_ */
 #ifdef CONFIG_X86_64
-	user PT_LOAD FLAGS(7);          /* RWE */
-	data.init PT_LOAD FLAGS(7);     /* RWE */
+	user PT_LOAD FLAGS(6);          /* RW_ */
+	data.init PT_LOAD FLAGS(6);     /* RW_ */
 #ifdef CONFIG_SMP
-	percpu PT_LOAD FLAGS(7);        /* RWE */
+	percpu PT_LOAD FLAGS(6);        /* RW_ */
 #endif
-	data.init2 PT_LOAD FLAGS(7);    /* RWE */
+	data.init2 PT_LOAD FLAGS(6);    /* RW_ */
 #endif
 	note PT_NOTE FLAGS(0);          /* ___ */
 }
@@ -89,6 +89,8 @@  SECTIONS
 		IRQENTRY_TEXT
 		*(.fixup)
 		*(.gnu.warning)
+		/* .text should occupy whole number of pages */
+		. = ALIGN(PAGE_SIZE);
 		/* End of text section */
 		_etext = .;
 	} :text = 0x9090
@@ -151,6 +153,8 @@  SECTIONS
 	.data.read_mostly : AT(ADDR(.data.read_mostly) - LOAD_OFFSET) {
 		*(.data.read_mostly)

+		/* .data should occupy whole number of pages */
+		. = ALIGN(PAGE_SIZE);
 		/* End of data section */
 		_edata = .;
 	}
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 0607119..7bfd411 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -423,9 +423,10 @@  void free_init_pages(char *what, unsigned long
begin, unsigned long end)
 	/*
 	 * We just marked the kernel text read only above, now that
 	 * we are going to free part of that, we need to make that
-	 * writeable first.
+	 * writeable and non-executable first.
 	 */
 	set_memory_rw(begin, (end - begin) >> PAGE_SHIFT);
+	set_memory_nx(begin, (end - begin) >> PAGE_SHIFT);

 	printk(KERN_INFO "Freeing %s: %luk freed\n", what, (end - begin) >> 10);
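
For readers unfamiliar with the PHDRS hunk in vmlinux.lds.S above: the FLAGS() argument is the ELF program header permission mask, where execute is 1, write is 2 and read is 4. The sketch below decodes a mask into the three-letter form used in the linker script comments. The PF_* values follow the ELF specification; phdr_flags_str() and phdr_is_wx() are hypothetical helpers written for this illustration.

```c
#include <assert.h>

/* ELF segment permission bits (values as defined by the ELF spec). */
#define PF_X 0x1u
#define PF_W 0x2u
#define PF_R 0x4u

/*
 * Render a flags mask as "RWE", with '_' for each missing permission.
 * Returns a pointer to a static buffer, so the result is overwritten
 * by the next call and the function is not reentrant.
 */
static const char *phdr_flags_str(unsigned int flags)
{
	static char buf[4];

	assert((flags & ~(PF_R | PF_W | PF_X)) == 0);
	buf[0] = (flags & PF_R) ? 'R' : '_';
	buf[1] = (flags & PF_W) ? 'W' : '_';
	buf[2] = (flags & PF_X) ? 'E' : '_';
	buf[3] = '\0';
	return buf;
}

/* True if a segment is both writable and executable. */
static int phdr_is_wx(unsigned int flags)
{
	return (flags & PF_W) && (flags & PF_X);
}
```

Under this decoding, FLAGS(7) is "RWE" and FLAGS(6) is "RW_", so the hunk removes the execute bit from every data segment while the text segment stays at FLAGS(5), "R_E".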