linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Borislav Petkov <bp@alien8.de>,
	Attilio Rao <attilio.rao@citrix.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"x86@kernel.org" <x86@kernel.org>
Subject: Re: [PATCH v4 1/2] XEN/X86: Improve semantic support for x86_init.mapping.pagetable_reserve
Date: Fri, 24 Aug 2012 18:27:41 +0100	[thread overview]
Message-ID: <alpine.DEB.2.02.1208241653550.15568@kaball.uk.xensource.com> (raw)
In-Reply-To: <alpine.LFD.2.02.1208241719030.2856@ionos>

On Fri, 24 Aug 2012, Thomas Gleixner wrote:
> On Fri, 24 Aug 2012, Stefano Stabellini wrote:
> > On Fri, 24 Aug 2012, Thomas Gleixner wrote:
> > > I looked at the whole pgt_buf_* mess and it's amazingly stupid. We
> > > could avoid all that dance and make all of that pgt_buf_* stuff static
> > > and provide proper accessor functions and hand start, end, top to the
> > > reserve function instead of fiddling with global variables all over
> > > the place. That'd be a real cleanup and progress.
> > > 
> > > But we can't do that easily. And why? Because XEN is making magic
> > > decisions based on those globals in mask_rw_pte().
> > > 
> > > 	/*
> > > 	 * If the new pfn is within the range of the newly allocated
> > > 	 * kernel pagetable, and it isn't being mapped into an
> > > 	 * early_ioremap fixmap slot as a freshly allocated page, make sure
> > > 	 * it is RO.
> > > 	 */
> > > 	if (((!is_early_ioremap_ptep(ptep) &&
> > > 			pfn >= pgt_buf_start && pfn < pgt_buf_top)) ||
> > > 			(is_early_ioremap_ptep(ptep) && pfn != (pgt_buf_end - 1)))
> > > 
> > > This comment along with the implementation is really a master piece of
> > > obfuscation. Let's see what this is doing. RO is enforced when:
> > > 
> > > This is not an early ioreamp AND
> > > 
> > >       pfn >= pgt_buf_start && pfn < pgt_buf_top
> > > 
> > > So why is this checking pgt_buf_top? The early stuff is installed
> > > within pgt_buf_start and pgt_buf_end. Anything which is >=
> > > pgt_buf_end at this point is completely wrong.
> > 
> > Unfortunately pgt_buf_end only marks the current end of the pagetable
> > pages (pgt_buf_end keeps increasing during
> > kernel_physical_mapping_init).  However at some point
> > kernel_physical_mapping_init is going to start mapping the pagetable
> > pages themselves, when that happens some of them are not pagetable pages
> > yet (pgt_buf_end <= page < pgt_buf_top) but they are going to be in the
> > near future.
> 
> And how exactly are they allocated between from pgt_buf w/o increasing
> pgt_buf_end ?

They do increase pgt_buf_end when they are allocated, but the entire
[pgt_buf_start - pgt_buf_top) range might already be mapped at that point
as normal memory that falls in the range of addresses passed to
init_memory_mapping as argument.

This is an extract from the commit message of
279b706bf800b5967037f492dbe4fc5081ad5d0f:

--- 

As a consequence of the commit:

commit 4b239f458c229de044d6905c2b0f9fe16ed9e01e
Author: Yinghai Lu <yinghai@kernel.org>
Date:   Fri Dec 17 16:58:28 2010 -0800

    x86-64, mm: Put early page table high

at some point init_memory_mapping is going to reach the pagetable pages
area and map those pages too (mapping them as normal memory that falls
in the range of addresses passed to init_memory_mapping as argument).
Some of those pages are already pagetable pages (they are in the range
pgt_buf_start-pgt_buf_end) therefore they are going to be mapped RO and
everything is fine.
Some of these pages are not pagetable pages yet (they fall in the range
pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end) so they
are going to be mapped RW.  When these pages become pagetable pages and
are hooked into the pagetable, xen will find that the guest has already
a RW mapping of them somewhere and fail the operation.
The reason Xen requires pagetables to be RO is that the hypervisor needs
to verify that the pagetables are valid before using them. The validation
operations are called "pinning" (more details in arch/x86/xen/mmu.c).

---

So let's suppose that we change the check in mask_rw_pte to be:

pfn >= pgt_buf_start && pfn < pgt_buf_end

as it was originally. This is what could happen:

1) pgt_buf_start - pgt_buf_end gets mapped RO;
2) pgt_buf_end - pgt_buf_top gets mapped RW;
3) a new pagetable page is allocated, pgt_buf_end is increased;
4) this new pagetable page is hooked into the pagetable;
5) since a mapping of this page already exists (it was done in
   point 2), and this mapping is RW, Linux crashes.


Thanks for taking the time to look into this issue. I know it is
difficult and not very pleasant.

  reply	other threads:[~2012-08-24 17:28 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-08-22 15:08 [PATCH v4 0/2] XEN/X86: Document x86_init.mapping.pagetable_reserve and enforce a better semantic Attilio Rao
2012-08-22 15:08 ` [PATCH v4 1/2] XEN/X86: Improve semantic support for x86_init.mapping.pagetable_reserve Attilio Rao
2012-08-23 15:46   ` Thomas Gleixner
2012-08-23 15:44     ` Attilio Rao
2012-08-23 17:14       ` Thomas Gleixner
2012-08-23 17:13         ` Attilio Rao
2012-08-24 10:03           ` Borislav Petkov
2012-08-24 10:10             ` Attilio Rao
2012-08-24 11:36             ` Konrad Rzeszutek Wilk
2012-08-24 11:57               ` Stefano Stabellini
2012-08-24 13:00               ` Thomas Gleixner
2012-08-24 13:24                 ` Attilio Rao
2012-08-24 13:45                   ` Borislav Petkov
2012-08-24 15:18                   ` Thomas Gleixner
2012-08-24 13:32                 ` Stefano Stabellini
2012-08-24 15:20                   ` Thomas Gleixner
2012-08-24 17:27                     ` Stefano Stabellini [this message]
2012-08-24 20:10                       ` Thomas Gleixner
2012-08-22 15:08 ` [PATCH v4 2/2] XEN: Document the semantic of x86_init.mapping.pagetable_reserve Attilio Rao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.02.1208241653550.15568@kaball.uk.xensource.com \
    --to=stefano.stabellini@eu.citrix.com \
    --cc=attilio.rao@citrix.com \
    --cc=bp@alien8.de \
    --cc=hpa@zytor.com \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).