From: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
To: Yinghai Lu <yinghai@kernel.org>
Cc: Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Jeremy Fitzhardinge <Jeremy.Fitzhardinge@citrix.com>
Subject: Re: "x86-64, mm: Put early page table high" causes crash on Xen
Date: Tue, 8 Mar 2011 12:14:33 +0000
Message-ID: <alpine.DEB.2.00.1103081203260.2968@kaball-desktop>
In-Reply-To: <AANLkTi=aUGdjvtvy81FimMZB5PZNzASXHN0BZHWHUuLs@mail.gmail.com>


On Mon, 7 Mar 2011, Yinghai Lu wrote:
> On Mon, Mar 7, 2011 at 7:47 AM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
> > On Tue, 1 Mar 2011, Yinghai Lu wrote:
> >> On 03/01/2011 09:21 AM, Stefano Stabellini wrote:
> >> > Yinghai Lu,
> >> > while testing tip/x86/mm on Xen I found out that the commit "x86-64, mm:
> >> > Put early page table high" reliably crashes Linux at boot.
> >> > The reason is well explained by the commit message of
> >> > fef5ba797991f9335bcfc295942b684f9bf613a1:
> >> >
> >> > "Xen requires that all pages containing pagetable entries to be mapped
> >> > read-only.  If pages used for the initial pagetable are already mapped
> >> > then we can change the mapping to RO.  However, if they are initially
> >> > unmapped, we need to make sure that when they are later mapped, they
> >> > are also mapped RO.
> >> >
> >> > We do this by knowing that the kernel pagetable memory is pre-allocated
> >> > in the range e820_table_start - e820_table_end, so any pfn within this
> >> > range should be mapped read-only.  However, the pagetable setup code
> >> > early_ioremaps the pages to write their entries, so we must make sure
> >> > that mappings created in the early_ioremap fixmap area are mapped RW.
> >> > (Those mappings are removed before the pages are presented to Xen
> >> > as pagetable pages.)"
> >> >
> >> > In other words, mask_rw_pte (called by xen_set_pte_init) should mark RO
> >> > the already existing pagetable pages (like the ones belonging to the
> >> > initial mappings), while it should mark RW the new pages not yet hooked
> >> > into the pagetable.  This is what the following lines used to achieve,
> >> > but don't anymore:
> >> >
> >> >     /*
> >> >      * If the new pfn is within the range of the newly allocated
> >> >      * kernel pagetable, and it isn't being mapped into an
> >> >      * early_ioremap fixmap slot, make sure it is RO.
> >> >      */
> >> >         if (!is_early_ioremap_ptep(ptep) &&
> >> >             pfn >= pgt_buf_start && pfn < pgt_buf_end)
> >> >             pte = pte_wrprotect(pte);
> >> >
> >> > Unfortunately we now map both the already existing initial pagetable
> >> > pages (a second time) and the new zeroed pages using map_low_page, so
> >> > we are unable to distinguish between the two.
> >> >
> >> > Can we go back to the previous way of accessing pagetable pages from
> >> > kernel_physical_mapping_init, while keeping the new pagetable allocation
> >> > strategy? It seems to me that the introduction of map_low_page is not
> >> > actually required, is it? In that case we could just revert that bit...
> >> > (appended partial revert example).
> >>
> >> We do need map_low_page (BTW, that name is totally misleading...).
> >>
> >> The reason is that we now put the page tables high, and at that point
> >> they are not under max_pfn_mapped (i.e. not mapped yet).
> >>
> >> So we have to use
> >>       adr = early_memremap(phys & PAGE_MASK, PAGE_SIZE);
> >> to map them early and read/write to them.
> >
> > As you might already know from my other email a few days ago, I have
> > found a solution to this problem by modifying only Xen-specific code.
> > However I went back to this because I want to be sure that the patch is
> > the correct solution rather than a workaround.
> >
> > It seems to me that only pagetable pages freshly allocated with
> > alloc_low_page need to be mapped, because the initial pagetable pages are
> > already under max_pfn_mapped.
> > Considering that kernel_physical_mapping_init doesn't go through the
> > same pagetable page twice, it is never the case that we map_low_page
> > anything but initial pagetable pages that are under max_pfn_mapped
> > anyway.
> > For example this is what happens on my machine:
> >
> >
> > [    0.000000] initial memory mapped : 0 - 024be000
> > [    0.000000] init_memory_mapping: 0000000000000000-00000000cffc2000
> > [    0.000000]  0000000000 - 00cffc2000 page 4k
> > [    0.000000] kernel direct mapping tables up to cffc2000 @ cf93d000-cffc2000
> > [    0.000000] DEBUG map low page(2004000, 00001000)
> > [    0.000000] DEBUG map low page(2008000, 00001000)
> > [    0.000000] DEBUG map low page(2c62000, 00001000)
> > [    0.000000] DEBUG map low page(2c65000, 00001000)
> > [    0.000000] DEBUG map low page(2c66000, 00001000)
> > [    0.000000] DEBUG map low page(2c67000, 00001000)
> > [    0.000000] DEBUG map low page(2c68000, 00001000)
> > [    0.000000] DEBUG map low page(2c69000, 00001000)
> > [    0.000000] DEBUG map low page(2c6a000, 00001000)
> > [    0.000000] DEBUG map low page(2c6b000, 00001000)
> > [    0.000000] DEBUG map low page(2c6c000, 00001000)
> > [    0.000000] DEBUG map low page(2c6d000, 00001000)
> > [    0.000000] DEBUG map low page(2c6e000, 00001000)
> > [    0.000000] DEBUG map low page(2c6f000, 00001000)
> > [    0.000000] DEBUG map low page(2c70000, 00001000)
> > [    0.000000] DEBUG map low page(2c71000, 00001000)
> > [    0.000000] DEBUG map low page(2c72000, 00001000)
> > [    0.000000] DEBUG map low page(2c73000, 00001000)
> > [    0.000000] DEBUG map low page(2c74000, 00001000)
> > [    0.000000] DEBUG map low page(2c75000, 00001000)
> > [    0.000000] DEBUG map low page(2c76000, 00001000)
> > [    0.000000] DEBUG map low page(2c77000, 00001000)
> > [    0.000000] DEBUG map low page(2c78000, 00001000)
> > [    0.000000] DEBUG map low page(2c79000, 00001000)
> > [    0.000000] DEBUG map low page(2c7a000, 00001000)
> > [    0.000000] DEBUG map low page(2c7b000, 00001000)
> > [    0.000000] DEBUG map low page(239a000, 00001000)
> > [    0.000000] DEBUG map low page(239b000, 00001000)
> > [    0.000000] DEBUG map low page(239c000, 00001000)
> > [    0.000000] DEBUG map low page(239d000, 00001000)
> 
> These are already mapped.

Right. So why are we mapping them again using map_low_page?
Maybe we can just access them directly like we did before?
In my tests, going back to using phys_pud_update does not cause any
regressions.
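
For reference, this is roughly what map_low_page does today in
arch/x86/mm/init_64.c (sketched from memory, so the exact details may
differ): it unconditionally early_memremaps the page containing the
passed-in virtual address, even when that page is already covered by the
direct mapping.

    static __ref void *map_low_page(void *virt)
    {
            void *adr;
            unsigned long phys, left;

            /* after bootmem everything is mapped, just use the address */
            if (after_bootmem)
                    return virt;

            /* temporarily map the page through the early fixmap area */
            phys = __pa(virt);
            left = phys & (PAGE_SIZE - 1);
            adr = early_memremap(phys & PAGE_MASK, PAGE_SIZE);
            adr = (void *)(((unsigned long)adr) | left);

            return adr;
    }

So for initial pagetable pages that are already under max_pfn_mapped this
just creates a second, temporary mapping of a page we could have accessed
directly.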


> > [    0.000000] DEBUG alloc low page(cf93d000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf93e000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf93f000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf940000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf941000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf942000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf943000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf944000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf945000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf946000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf947000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf948000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf949000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf94a000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf94b000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf94c000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf94d000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf94e000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf94f000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf950000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf951000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf952000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf953000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf954000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf955000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf956000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf957000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf958000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf959000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf95a000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf95b000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf95c000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf95d000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf95e000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf95f000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf960000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf961000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf962000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf963000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf964000, 00001000)
> > [    0.000000] DEBUG alloc low page(cf965000, 00001000)
> 
> These are not mapped yet, and we need to use early_ioremap.

In fact we are already mapping them inside alloc_low_page. No need for
map_low_page, right?
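
To illustrate what I mean, alloc_low_page already early_memremaps the
freshly allocated page before handing it back to the caller, so the new
pages are mapped at allocation time (again sketched from memory, the
exact code may differ slightly):

    static __ref void *alloc_low_page(unsigned long *phys)
    {
            unsigned long pfn = pgt_buf_end++;
            void *adr;

            if (after_bootmem) {
                    adr = (void *)get_zeroed_page(GFP_ATOMIC | __GFP_NOTRACK);
                    *phys = __pa(adr);
                    return adr;
            }

            if (pfn >= pgt_buf_top)
                    panic("alloc_low_page: ran out of memory");

            /* the new pagetable page gets a temporary RW mapping here */
            adr = early_memremap(pfn * PAGE_SIZE, PAGE_SIZE);
            clear_page(adr);
            *phys = pfn * PAGE_SIZE;
            return adr;
    }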


> > If this is the case, the introduction of map_low_page is unnecessary and
> > actually makes the kernel map pages that are already mapped anyway.
> > Am I correct?
> 
> The initial mapping only covers up to 512M, and new pages for the page tables could be above that.

Sure, but the new pages are mapped from within alloc_low_page, so it
looks like the call to map_low_page is not useful and we could just go
back to accessing the already existing pagetable pages directly.
For example:


@@ -554,6 +554,17 @@ phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end,
 	return last_map_addr;
 }
 
+static unsigned long __meminit
+phys_pud_update(pgd_t *pgd, unsigned long addr, unsigned long end,
+		 unsigned long page_size_mask)
+{
+	pud_t *pud;
+
+	pud = (pud_t *)pgd_page_vaddr(*pgd);
+
+	return phys_pud_init(pud, addr, end, page_size_mask);
+}
+
 unsigned long __meminit
 kernel_physical_mapping_init(unsigned long start,
 			     unsigned long end,
@@ -577,10 +588,8 @@ kernel_physical_mapping_init(unsigned long start,
 			next = end;
 
 		if (pgd_val(*pgd)) {
-			pud = map_low_page((pud_t *)pgd_page_vaddr(*pgd));
-			last_map_addr = phys_pud_init(pud, __pa(start),
+			last_map_addr = phys_pud_update(pgd, __pa(start),
 						 __pa(end), page_size_mask);
-			unmap_low_page(pud);
 			continue;
 		}
