All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andi Kleen <ak@suse.de>
To: Stuart_Hayes@Dell.com
Cc: ak@suse.de, riel@redhat.com, andrea@suse.de,
	linux-kernel@vger.kernel.org, mingo@elte.hu
Subject: Re: page allocation/attributes question (i386/x86_64 specific)
Date: Thu, 7 Jul 2005 16:33:05 +0200	[thread overview]
Message-ID: <20050707143304.GE21330@wotan.suse.de> (raw)
In-Reply-To: <B1939BC11A23AE47A0DBE89A37CB26B407435C@ausx3mps305.aus.amer.dell.com>

On Tue, Jul 05, 2005 at 03:02:26PM -0500, Stuart_Hayes@Dell.com wrote:
[Sorry for  the late answer]

> Hayes, Stuart wrote:
> >> So, if I understand correctly what's going on in x86_64, your fix
> >> wouldn't be applicable to i386.  In x86_64, every large page has a
> >> correct "ref_prot" that is the normal setting for that page... but in
> >> i386, the kernel text area does not--it should ideally be split into
> >> small pages all the time if there are both kernel code & free pages
> >> residing in the same 2M area. 
> >> 
> >> Stuart
> > 
> > (This isn't a submission--I'm just posting this for comments.)
> > 
> > Right now, any large page that touches anywhere from PAGE_OFFSET to
> > __init_end is initially set up as a large, executable page... but
> > some of this area contains data & free pages.  The patch below adds a
> > "cleanup_nx_in_kerneltext()" function, called at the end of
> > free_initmem(), which changes these pages--except for the range from
> > "_text" to "_etext"--to PAGE_KERNEL (i.e., non-executable).     
> > 
> > This does result in two large pages being split up into small PTEs
> > permanently, but all the non-code regions will be non-executable, and
> > change_page_attr() will work correctly.  
> > 
> > What do you think of this?  I have tested this on 2.6.12.
> > 
> > (I've attached the patch as a file, too, since my mail server can't
> > be convinced to not wrap text.) 
> > 
> > Stuart
> > 
> 
> Andi--
> 
> I made another pass at this.  This does roughly the same thing, but it
> doesn't create the new "change_page_attr_perm()" functions.  With this
> patch, the change to init.c (cleanup_nx_in_kerneltext()) is optional.  I
> changed __change_page_attr() so that, if the page to be changed is part
> of a large executable page, it splits the page up *keeping the
> executability of the extra 511 pages*, and then marks the new PTE page
> as reserved so that it won't be reverted.
> 
> So, basically, without the changes to init.c, the NX bits for data in
> the first two big pages won't get fixed until someone calls
> change_page_attr() on them.  If NX is disabled, these patches have no
> functional effect at all.
> 
> How does this look?

If anything you would need to ask Ingo who did the i386 NX code (cc'ed) 

I personally wouldn't like doing this NX cleanup very late like you did 
but instead directly after the early NX setup.



> Thanks!
> Stuart
> 
> -----
> 
> diff -purN linux-2.6.12grep/arch/i386/mm/init.c
> linux-2.6.12/arch/i386/mm/init.c
> --- linux-2.6.12grep/arch/i386/mm/init.c	2005-07-01
> 15:09:27.000000000 -0500
> +++ linux-2.6.12/arch/i386/mm/init.c	2005-07-05 14:32:57.000000000
> -0500
> @@ -666,6 +666,28 @@ static int noinline do_test_wp_bit(void)
>  	return flag;
>  }
>  
> +/*
> + * In kernel_physical_mapping_init(), any big pages that contained
> kernel text area were
> + * set up as big executable pages.  This function should be called when
> the initmem
> + * is freed, to correctly set up the executable & non-executable pages
> in this area.
> + */
> +static void cleanup_nx_in_kerneltext(void)
> +{
> +	unsigned long from, to;
> +
> +	if (!nx_enabled) return;
> +
> +	from = PAGE_OFFSET;
> +	to = (unsigned long)_text & PAGE_MASK;
> +	for (; from<to; from += PAGE_SIZE)
> +		change_page_attr(virt_to_page(from), 1, PAGE_KERNEL); 
> +	
> +	from = ((unsigned long)_etext + PAGE_SIZE - 1) & PAGE_MASK;
> +	to = ((unsigned long)__init_end + LARGE_PAGE_SIZE) &
> LARGE_PAGE_MASK;
> +	for (; from<to; from += PAGE_SIZE)
> +		change_page_attr(virt_to_page(from), 1, PAGE_KERNEL); 
> +}
> +
>  void free_initmem(void)
>  {
>  	unsigned long addr;
> @@ -679,6 +701,8 @@ void free_initmem(void)
>  		totalram_pages++;
>  	}
>  	printk (KERN_INFO "Freeing unused kernel memory: %dk freed\n",
> (__init_end - __init_begin) >> 10);
> +
> +	cleanup_nx_in_kerneltext();
>  }
>  
>  #ifdef CONFIG_BLK_DEV_INITRD
> diff -purN linux-2.6.12grep/arch/i386/mm/pageattr.c
> linux-2.6.12/arch/i386/mm/pageattr.c
> --- linux-2.6.12grep/arch/i386/mm/pageattr.c	2005-07-01
> 15:09:08.000000000 -0500
> +++ linux-2.6.12/arch/i386/mm/pageattr.c	2005-07-05
> 14:44:53.000000000 -0500
> @@ -35,7 +35,8 @@ pte_t *lookup_address(unsigned long addr
>          return pte_offset_kernel(pmd, address);
>  } 
>  
> -static struct page *split_large_page(unsigned long address, pgprot_t
> prot)
> +static struct page *split_large_page(unsigned long address, pgprot_t
> prot, 
> +					pgprot_t ref_prot)
>  { 
>  	int i; 
>  	unsigned long addr;
> @@ -53,7 +54,7 @@ static struct page *split_large_page(uns
>  	pbase = (pte_t *)page_address(base);
>  	for (i = 0; i < PTRS_PER_PTE; i++, addr += PAGE_SIZE) {
>  		pbase[i] = pfn_pte(addr >> PAGE_SHIFT, 
> -				   addr == address ? prot :
> PAGE_KERNEL);
> +				   addr == address ? prot : ref_prot);
>  	}
>  	return base;
>  } 
> @@ -118,11 +119,30 @@ __change_page_attr(struct page *page, pg
>  	if (!kpte)
>  		return -EINVAL;
>  	kpte_page = virt_to_page(kpte);
> +
> +	/*
> +	 * If this page is part of a large page that's executable (and
> NX is
> +	 * enabled), then split page up and set new PTE page as reserved
> so
> +	 * we won't revert this back into a large page.  This should
> only
> +	 * happen in large pages that also contain kernel executable
> code,
> +	 * and shouldn't happen at all if init.c correctly sets up NX
> regions.
> +	 */
> +	if (nx_enabled && 
> +	    !(pte_val(*kpte) & _PAGE_NX) &&
> +	    (pte_val(*kpte) & _PAGE_PSE)) {
> +		struct page *split = split_large_page(address, prot,
> PAGE_KERNEL_EXEC); 
> +		if (!split)
> +			return -ENOMEM;
> +		set_pmd_pte(kpte,address,mk_pte(split,
> PAGE_KERNEL_EXEC));
> +		SetPageReserved(split);

Why setting reserved? I don't think cpa checks that anywhere.
In fact Nick has been working on getting rid of Reserved completely.


Anyways, isn't the code from x86-64 for this enough? It should 
work here too. I don't think such a ugly special case is needed.


-Andi

> +		return 0;
> +	}
> +
>  	if (pgprot_val(prot) != pgprot_val(PAGE_KERNEL)) { 
>  		if ((pte_val(*kpte) & _PAGE_PSE) == 0) { 
>  			set_pte_atomic(kpte, mk_pte(page, prot)); 
>  		} else {
> -			struct page *split = split_large_page(address,
> prot); 
> +			struct page *split = split_large_page(address,
> prot, PAGE_KERNEL); 
>  			if (!split)
>  				return -ENOMEM;
>  			set_pmd_pte(kpte,address,mk_pte(split,
> PAGE_KERNEL));



  parent reply	other threads:[~2005-07-07 14:46 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-07-05 20:02 page allocation/attributes question (i386/x86_64 specific) Stuart_Hayes
2005-07-05 21:27 ` randy_dunlap
2005-07-07 14:33 ` Andi Kleen [this message]
  -- strict thread matches above, loose matches on Subject: below --
2005-07-20 19:24 Stuart_Hayes
2005-07-20 15:06 Stuart_Hayes
2005-07-20 20:45 ` Ingo Molnar
2005-07-20 14:51 Stuart_Hayes
2005-07-20 15:00 ` Ingo Molnar
2005-07-19 21:20 Stuart Hayes
2005-07-19 22:04 ` Ingo Molnar
2005-07-07 17:21 Stuart_Hayes
2005-07-05 21:35 Stuart_Hayes
2005-07-05 22:00 ` randy_dunlap
2005-07-01 20:33 Stuart_Hayes
2005-06-30 19:11 Stuart_Hayes
2005-06-30 17:14 Stuart_Hayes
2005-06-30 16:56 Stuart_Hayes
2005-06-30 17:34 ` Arjan van de Ven
2005-06-29 17:20 Stuart_Hayes
2005-06-28 20:16 Stuart_Hayes
2005-06-30 16:15 ` Andi Kleen
2005-06-24 18:20 Stuart_Hayes
2005-06-25  2:28 ` Andi Kleen
2005-06-22 17:11 Stuart_Hayes
2005-06-22 23:54 ` Rik Van Riel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20050707143304.GE21330@wotan.suse.de \
    --to=ak@suse.de \
    --cc=Stuart_Hayes@Dell.com \
    --cc=andrea@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.