All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] x86: use pgd accessors when cloning a pgd range.
@ 2010-10-27  8:50 Ian Campbell
  2010-10-27 10:40 ` Borislav Petkov
  0 siblings, 1 reply; 22+ messages in thread
From: Ian Campbell @ 2010-10-27  8:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ian Campbell, Borislav Petkov, H. Peter Anvin, Jeremy Fitzhardinge

Page tables should always be updated using the proper accessor
methods. Not doing so bypasses the paravirt infrastructure.

In this case the failure to do so was exposed under Xen by
b40827fa7268 "x86-32, mm: Add an initial page table for core
bootstrapping".

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
---
 arch/x86/include/asm/pgtable.h |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index ada823a..0b4c514 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -619,7 +619,10 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm,
  */
 static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
 {
-       memcpy(dst, src, count * sizeof(pgd_t));
+	int i;
+
+	for (i=0; i<count; i++)
+		set_pgd(&dst[i], src[i]);
 }
 
 
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
  2010-10-27  8:50 [PATCH] x86: use pgd accessors when cloning a pgd range Ian Campbell
@ 2010-10-27 10:40 ` Borislav Petkov
  2010-10-27 16:50   ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 22+ messages in thread
From: Borislav Petkov @ 2010-10-27 10:40 UTC (permalink / raw)
  To: Ian Campbell; +Cc: linux-kernel, x86, H. Peter Anvin, Jeremy Fitzhardinge

On Wed, Oct 27, 2010 at 09:50:13AM +0100, Ian Campbell wrote:
> Page tables should always be updated using the proper accessor
> methods. Not doing so bypasses the paravirt infrastructure.
> 
> In this case the failure to do so was exposed under Xen by
> b40827fa7268 "x86-32, mm: Add an initial page table for core
> bootstrapping".
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: H. Peter Anvin <hpa@linux.intel.com>
> Cc: Jeremy Fitzhardinge <jeremy@goop.org>
> ---
>  arch/x86/include/asm/pgtable.h |    5 ++++-
>  1 files changed, 4 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index ada823a..0b4c514 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -619,7 +619,10 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm,
>   */
>  static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
>  {
> -       memcpy(dst, src, count * sizeof(pgd_t));
> +	int i;
> +
> +	for (i=0; i<count; i++)
> +		set_pgd(&dst[i], src[i]);

Hmm, this slows down clone_pgd_range(). It is called at 3 sites total,
two of which happen only on boot in setup_arch() so they can be ignored
but the callchain

copy_process()
...
mm_init()
|->mm_alloc_pgd()
  |->pgd_alloc()
    |->pgd_ctor()
      |->clone_pgd_range()

could become noticeable. To be on the safe side, I'd make
clone_pgd_range() a macro calling either the native or the xen version..

hpa?

-- 
Regards/Gruss,
Boris.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
  2010-10-27 10:40 ` Borislav Petkov
@ 2010-10-27 16:50   ` Jeremy Fitzhardinge
  2010-10-27 17:18     ` H. Peter Anvin
  0 siblings, 1 reply; 22+ messages in thread
From: Jeremy Fitzhardinge @ 2010-10-27 16:50 UTC (permalink / raw)
  To: Borislav Petkov, Ian Campbell, linux-kernel, x86, H. Peter Anvin

 On 10/27/2010 03:40 AM, Borislav Petkov wrote:
> On Wed, Oct 27, 2010 at 09:50:13AM +0100, Ian Campbell wrote:
>> Page tables should always be updated using the proper accessor
>> methods. Not doing so bypasses the paravirt infrastructure.
>>
>> In this case the failure to do so was exposed under Xen by
>> b40827fa7268 "x86-32, mm: Add an initial page table for core
>> bootstrapping".
>>
>> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
>> Cc: Borislav Petkov <bp@alien8.de>
>> Cc: H. Peter Anvin <hpa@linux.intel.com>
>> Cc: Jeremy Fitzhardinge <jeremy@goop.org>
>> ---
>>  arch/x86/include/asm/pgtable.h |    5 ++++-
>>  1 files changed, 4 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
>> index ada823a..0b4c514 100644
>> --- a/arch/x86/include/asm/pgtable.h
>> +++ b/arch/x86/include/asm/pgtable.h
>> @@ -619,7 +619,10 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm,
>>   */
>>  static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
>>  {
>> -       memcpy(dst, src, count * sizeof(pgd_t));
>> +	int i;
>> +
>> +	for (i=0; i<count; i++)
>> +		set_pgd(&dst[i], src[i]);
> Hmm, this slows down clone_pgd_range(). It is called at 3 sites total,
> two of which happen only on boot in setup_arch() so they can be ignored
> but the callchain
>
> copy_process()
> ...
> mm_init()
> |->mm_alloc_pgd()
>   |->pgd_alloc()
>     |->pgd_ctor()
>       |->clone_pgd_range()
>
> could become noticeable. To be on the safe side, I'd make
> clone_pgd_range() a macro calling either the native or the xen version..

Frankly I'd want to see some numbers before getting too worried about
it; if it were a problem we could make the native set_pgd inlined into
the callside (it is just a memory write after all), which will be very
similar to memcpy in performance.

I am, however, more concerned about the effect on performance under
Xen.  xen_set_pgd will avoid doing a hypercall in this case (the
pagetable isn't yet pinned), but it has to do a moderate amount of work
to avoid doing the hypercall, and could really add some measurable
latency to process creation (which is not something we need right now). 
For that a clone_pgd_range() hypercall is the most straightforward
answer, but I'm loathe to propose that right now.

This never used to be a problem.  Perhaps we can change how
clone_pgd_range is used at boot time to avoid it in the Xen case (since
we don't care about the secondary pagetable)?

    J

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
  2010-10-27 16:50   ` Jeremy Fitzhardinge
@ 2010-10-27 17:18     ` H. Peter Anvin
  2010-10-27 17:31       ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 22+ messages in thread
From: H. Peter Anvin @ 2010-10-27 17:18 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Borislav Petkov, Ian Campbell, linux-kernel, x86

On 10/27/2010 9:50 AM, Jeremy Fitzhardinge wrote:
>
> This never used to be a problem.  Perhaps we can change how
> clone_pgd_range is used at boot time to avoid it in the Xen case (since
> we don't care about the secondary pagetable)?
>

Xen shouldn't have any users of this, since it's used for low-level 
operations like SMP bootstrap, suspend to RAM, reboot and low-level BIOS 
functionality.

	-hpa


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
  2010-10-27 17:18     ` H. Peter Anvin
@ 2010-10-27 17:31       ` Jeremy Fitzhardinge
  2010-10-27 17:42         ` H. Peter Anvin
  2010-10-27 17:44         ` Borislav Petkov
  0 siblings, 2 replies; 22+ messages in thread
From: Jeremy Fitzhardinge @ 2010-10-27 17:31 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Borislav Petkov, Ian Campbell, linux-kernel, x86

 On 10/27/2010 10:18 AM, H. Peter Anvin wrote:
> On 10/27/2010 9:50 AM, Jeremy Fitzhardinge wrote:
>>
>> This never used to be a problem.  Perhaps we can change how
>> clone_pgd_range is used at boot time to avoid it in the Xen case (since
>> we don't care about the secondary pagetable)?
>>
>
> Xen shouldn't have any users of this, since it's used for low-level
> operations like SMP bootstrap, suspend to RAM, reboot and low-level
> BIOS functionality.
>

Right, but it is being called smack in the middle of setup_arch().  It
looks like they could be hidden away in
native_pagetable_setup_start/done though.

    J

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
  2010-10-27 17:31       ` Jeremy Fitzhardinge
@ 2010-10-27 17:42         ` H. Peter Anvin
  2010-10-27 17:51           ` Ian Campbell
  2010-10-27 17:51           ` Jeremy Fitzhardinge
  2010-10-27 17:44         ` Borislav Petkov
  1 sibling, 2 replies; 22+ messages in thread
From: H. Peter Anvin @ 2010-10-27 17:42 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Borislav Petkov, Ian Campbell, linux-kernel, x86

On 10/27/2010 10:31 AM, Jeremy Fitzhardinge wrote:
>   On 10/27/2010 10:18 AM, H. Peter Anvin wrote:
>> On 10/27/2010 9:50 AM, Jeremy Fitzhardinge wrote:
>>>
>>> This never used to be a problem.  Perhaps we can change how
>>> clone_pgd_range is used at boot time to avoid it in the Xen case (since
>>> we don't care about the secondary pagetable)?
>>>
>>
>> Xen shouldn't have any users of this, since it's used for low-level
>> operations like SMP bootstrap, suspend to RAM, reboot and low-level
>> BIOS functionality.
>>
>
> Right, but it is being called smack in the middle of setup_arch().  It
> looks like they could be hidden away in
> native_pagetable_setup_start/done though.
>

This is what makes me absolutely hate paravirt with a passion... "let's 
hid things away in <obscure place> and make it absolutely impossible to 
either follow the code flow or figure out what the intended semantics 
are supposed to be."  (Let not even get me started on how ill-defined 
the semantics of some of the paravirt operations are.)  In this case, at 
the most you need a single flag of state... or you could even just 
ignore this low-level data structure that you will never use in the 
first place.  Ian's message just mentioned "a failure" and never 
described in any way what kind of "failure" it was.

	-hpa



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
  2010-10-27 17:31       ` Jeremy Fitzhardinge
  2010-10-27 17:42         ` H. Peter Anvin
@ 2010-10-27 17:44         ` Borislav Petkov
  2010-10-27 17:54           ` Jeremy Fitzhardinge
  2010-10-27 17:58           ` Ian Campbell
  1 sibling, 2 replies; 22+ messages in thread
From: Borislav Petkov @ 2010-10-27 17:44 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: H. Peter Anvin, Ian Campbell, linux-kernel, x86

On Wed, Oct 27, 2010 at 10:31:37AM -0700, Jeremy Fitzhardinge wrote:
> > Xen shouldn't have any users of this, since it's used for low-level
> > operations like SMP bootstrap, suspend to RAM, reboot and low-level
> > BIOS functionality.
> >
> Right, but it is being called smack in the middle of setup_arch().  It
> looks like they could be hidden away in
> native_pagetable_setup_start/done though.

I think I can put the second sync-back clone_pgd_range in
native_pagetable_setup_done but the first one needs to take place right
at the beginning of setup_arch() because a lot of code in-between relies
on swapper_pg_dir containing proper entries.

Just to make sure I understand you correctly: best it would be for xen
to not have the copy_pgd_range() calls in setup_arch, correct?

-- 
Regards/Gruss,
Boris.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
  2010-10-27 17:42         ` H. Peter Anvin
@ 2010-10-27 17:51           ` Ian Campbell
  2010-10-27 17:51           ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 22+ messages in thread
From: Ian Campbell @ 2010-10-27 17:51 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Jeremy Fitzhardinge, Borislav Petkov, linux-kernel, x86

On Wed, 2010-10-27 at 18:42 +0100, H. Peter Anvin wrote:
> On 10/27/2010 10:31 AM, Jeremy Fitzhardinge wrote:
> >   On 10/27/2010 10:18 AM, H. Peter Anvin wrote:
> >> On 10/27/2010 9:50 AM, Jeremy Fitzhardinge wrote:
> >>>
> >>> This never used to be a problem.  Perhaps we can change how
> >>> clone_pgd_range is used at boot time to avoid it in the Xen case (since
> >>> we don't care about the secondary pagetable)?
> >>>
> >>
> >> Xen shouldn't have any users of this, since it's used for low-level
> >> operations like SMP bootstrap, suspend to RAM, reboot and low-level
> >> BIOS functionality.
> >>
> >
> > Right, but it is being called smack in the middle of setup_arch().  It
> > looks like they could be hidden away in
> > native_pagetable_setup_start/done though.
> >
> 
> This is what makes me absolutely hate paravirt with a passion... "let's 
> hid things away in <obscure place> and make it absolutely impossible to 
> either follow the code flow or figure out what the intended semantics 
> are supposed to be."  (Let not even get me started on how ill-defined 
> the semantics of some of the paravirt operations are.)  In this case, at 
> the most you need a single flag of state... or you could even just 
> ignore this low-level data structure that you will never use in the 
> first place.  Ian's message just mentioned "a failure" and never 
> described in any way what kind of "failure" it was.

That was vague of me, sorry about that. The failure was Xen shooting the
guest due to an attempt to write to swapper_pg_dir directly, which isn't
allowed for active pagetables under Xen.

Ian.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
  2010-10-27 17:42         ` H. Peter Anvin
  2010-10-27 17:51           ` Ian Campbell
@ 2010-10-27 17:51           ` Jeremy Fitzhardinge
  2010-10-27 18:02             ` Ian Campbell
  2010-10-27 18:11             ` H. Peter Anvin
  1 sibling, 2 replies; 22+ messages in thread
From: Jeremy Fitzhardinge @ 2010-10-27 17:51 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Borislav Petkov, Ian Campbell, linux-kernel, x86

 On 10/27/2010 10:42 AM, H. Peter Anvin wrote:
> On 10/27/2010 10:31 AM, Jeremy Fitzhardinge wrote:
>>   On 10/27/2010 10:18 AM, H. Peter Anvin wrote:
>>> On 10/27/2010 9:50 AM, Jeremy Fitzhardinge wrote:
>>>>
>>>> This never used to be a problem.  Perhaps we can change how
>>>> clone_pgd_range is used at boot time to avoid it in the Xen case
>>>> (since
>>>> we don't care about the secondary pagetable)?
>>>>
>>>
>>> Xen shouldn't have any users of this, since it's used for low-level
>>> operations like SMP bootstrap, suspend to RAM, reboot and low-level
>>> BIOS functionality.
>>>
>>
>> Right, but it is being called smack in the middle of setup_arch().  It
>> looks like they could be hidden away in
>> native_pagetable_setup_start/done though.
>>
>
> This is what makes me absolutely hate paravirt with a passion...
> "let's hid things away in <obscure place> and make it absolutely
> impossible to either follow the code flow or figure out what the
> intended semantics are supposed to be."

Its not really an obscure place; it's where x86-32 does the rest of its
boot-time pagetable adjustments (like cleaning out the low identity
maps, etc).  Having those clone_pgd_ranges() floating around in
setup_arch() is out of place.

> (Let not even get me started on how ill-defined the semantics of some
> of the paravirt operations are.)  In this case, at the most you need a
> single flag of state... or you could even just ignore this low-level
> data structure that you will never use in the first place.  Ian's
> message just mentioned "a failure" and never described in any way what
> kind of "failure" it was.

It would be a pagefault from Xen preventing a direct write to the pgd
level of an active pagetable.  At the point in setup_arch() where it
does the first clone_pgd_range() we're already running on swapper_pg_dir
and the copy from initial_page_table is outright wrong.

As Ian suggests, we could switch Xen to use initial_page_table at boot
then move to swapper_pg_dir in the same way native does.

    J

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
  2010-10-27 17:44         ` Borislav Petkov
@ 2010-10-27 17:54           ` Jeremy Fitzhardinge
  2010-10-27 17:58           ` Ian Campbell
  1 sibling, 0 replies; 22+ messages in thread
From: Jeremy Fitzhardinge @ 2010-10-27 17:54 UTC (permalink / raw)
  To: Borislav Petkov, H. Peter Anvin, Ian Campbell, linux-kernel, x86

 On 10/27/2010 10:44 AM, Borislav Petkov wrote:
> On Wed, Oct 27, 2010 at 10:31:37AM -0700, Jeremy Fitzhardinge wrote:
>>> Xen shouldn't have any users of this, since it's used for low-level
>>> operations like SMP bootstrap, suspend to RAM, reboot and low-level
>>> BIOS functionality.
>>>
>> Right, but it is being called smack in the middle of setup_arch().  It
>> looks like they could be hidden away in
>> native_pagetable_setup_start/done though.
> I think I can put the second sync-back clone_pgd_range in
> native_pagetable_setup_done but the first one needs to take place right
> at the beginning of setup_arch() because a lot of code in-between relies
> on swapper_pg_dir containing proper entries.

OK.

> Just to make sure I understand you correctly: best it would be for xen
> to not have the copy_pgd_range() calls in setup_arch, correct?

Right.  Could you put the first copy into swapper_pg_dir earlier, before
setup_arch() somewhere that's specific to booting via head_32.S?

    J

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
  2010-10-27 17:44         ` Borislav Petkov
  2010-10-27 17:54           ` Jeremy Fitzhardinge
@ 2010-10-27 17:58           ` Ian Campbell
  2010-10-28  9:23             ` Ian Campbell
  1 sibling, 1 reply; 22+ messages in thread
From: Ian Campbell @ 2010-10-27 17:58 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Jeremy Fitzhardinge, H. Peter Anvin, linux-kernel, x86

On Wed, 2010-10-27 at 18:44 +0100, Borislav Petkov wrote:
> On Wed, Oct 27, 2010 at 10:31:37AM -0700, Jeremy Fitzhardinge wrote:
> > > Xen shouldn't have any users of this, since it's used for low-level
> > > operations like SMP bootstrap, suspend to RAM, reboot and low-level
> > > BIOS functionality.
> > >
> > Right, but it is being called smack in the middle of setup_arch().  It
> > looks like they could be hidden away in
> > native_pagetable_setup_start/done though.
> 
> I think I can put the second sync-back clone_pgd_range in
> native_pagetable_setup_done but the first one needs to take place right
> at the beginning of setup_arch() because a lot of code in-between relies
> on swapper_pg_dir containing proper entries.
> 
> Just to make sure I understand you correctly: best it would be for xen
> to not have the copy_pgd_range() calls in setup_arch, correct?

Yes, or to ensure that swapper_pg_dir has not yet been marked as a
pagetable at this point. Under Xen we are running on swapper_pg_dir when
setup_arch is called.

I think we could achieve this within the Xen pvops backend but e.g. the
following would likely work also.

(Note, untested. I also omitted the reindent for clarity at this point)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 95a3274..2d89d5a 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -705,12 +705,14 @@ void __init setup_arch(char **cmdline_p)
 	 * copy kernel address range established so far and switch
 	 * to the proper swapper page table
 	 */
+	if (read_cr3() != __pa(swapper_pg_dir)) {
 	clone_pgd_range(swapper_pg_dir     + KERNEL_PGD_BOUNDARY,
 			initial_page_table + KERNEL_PGD_BOUNDARY,
 			KERNEL_PGD_PTRS);
 
 	load_cr3(swapper_pg_dir);
 	__flush_tlb_all();
+	}
 #else
 	printk(KERN_INFO "Command line: %s\n", boot_command_line);
 #endif



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
  2010-10-27 17:51           ` Jeremy Fitzhardinge
@ 2010-10-27 18:02             ` Ian Campbell
  2010-10-27 18:11             ` H. Peter Anvin
  1 sibling, 0 replies; 22+ messages in thread
From: Ian Campbell @ 2010-10-27 18:02 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: H. Peter Anvin, Borislav Petkov, linux-kernel, x86

On Wed, 2010-10-27 at 18:51 +0100, Jeremy Fitzhardinge wrote:
> On 10/27/2010 10:42 AM, H. Peter Anvin wrote:

> > Ian's message just mentioned "a failure" and never described in any way what
> > kind of "failure" it was.
> 
> It would be a pagefault from Xen preventing a direct write to the pgd
> level of an active pagetable.  At the point in setup_arch() where it
> does the first clone_pgd_range() we're already running on swapper_pg_dir
> and the copy from initial_page_table is outright wrong.

I'd missed that aspect, yes the contents of initial_page_table are
wrong. I'm not sure now how my patch to clone_pgd_range even made a
difference...

> As Ian suggests, we could switch Xen to use initial_page_table at boot
> then move to swapper_pg_dir in the same way native does.

Accidentally did that in private mail, for everyone else the gory
details are:

> xen_setup_kernel_pagetable operates on initial_page_table instead of
> swapper_pg_dir. We do not pin initial_page_table apart from the
> implicit
> one from writing it to cr3 (it is necessarily r/o though).
> 
> So we enter setup_arch running on initial_page_table and with
> swapper_pg_dir mapped r/w, which is how it looks on native too. So at
> this point the:
>         clone_pgd_range(swapper_pg_dir     + KERNEL_PGD_BOUNDARY,
>                         initial_page_table + KERNEL_PGD_BOUNDARY,
>                         KERNEL_PGD_PTRS);
> works just fine and we reach:
>       load_cr3(swapper_pg_dir);
> 
> Then in __xen_write_cr3 we notice the first attempt to switch to
> swapper_pg_dir and only at that point do we make it r/o and pin it.
> When
> we then actually do the switch to swapper_pg_dir that releases the
> implicit pin on initial_page_table so we can make it r/o again.
> 
> The later on we reach the:
>        clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY,
>                         swapper_pg_dir     + KERNEL_PGD_BOUNDARY,
>                         KERNEL_PGD_PTRS);
> which will succeed because initial_page_table is r/w again.
> 
> We don't care about the pin on initial_page_table because we should
> never need it again.

Ian.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
  2010-10-27 17:51           ` Jeremy Fitzhardinge
  2010-10-27 18:02             ` Ian Campbell
@ 2010-10-27 18:11             ` H. Peter Anvin
  2010-10-27 18:51               ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 22+ messages in thread
From: H. Peter Anvin @ 2010-10-27 18:11 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Borislav Petkov, Ian Campbell, linux-kernel, x86

On 10/27/2010 10:51 AM, Jeremy Fitzhardinge wrote:
>>
>> This is what makes me absolutely hate paravirt with a passion...
>> "let's hid things away in<obscure place>  and make it absolutely
>> impossible to either follow the code flow or figure out what the
>> intended semantics are supposed to be."
>
> Its not really an obscure place; it's where x86-32 does the rest of its
> boot-time pagetable adjustments (like cleaning out the low identity
> maps, etc).  Having those clone_pgd_ranges() floating around in
> setup_arch() is out of place.
>

"Cleaning out the low identity maps" is part of what this patchset 
eliminates.  This is exactly a good reason why paravirt_ops damages the 
kernel -- it makes it impossible to make forward process.

>> (Let not even get me started on how ill-defined the semantics of some
>> of the paravirt operations are.)  In this case, at the most you need a
>> single flag of state... or you could even just ignore this low-level
>> data structure that you will never use in the first place.  Ian's
>> message just mentioned "a failure" and never described in any way what
>> kind of "failure" it was.
>
> It would be a pagefault from Xen preventing a direct write to the pgd
> level of an active pagetable.  At the point in setup_arch() where it
> does the first clone_pgd_range() we're already running on swapper_pg_dir
> and the copy from initial_page_table is outright wrong.
>
> As Ian suggests, we could switch Xen to use initial_page_table at boot
> then move to swapper_pg_dir in the same way native does.

Once the failure was explained, it makes more sense.  Either that or 
just skip this setting if we're already running on swapper_pg_dir.

Let me state this clearly: if Xen is going to continue to live as a 
merged platform, it has to have an obligation to follow changes on the 
native platform.  This is not unique to Xen, but rather a universal rule 
for integrated platforms.  Xen is more widely used than a lot of the 
other minority platforms, which means it legitimately gets allowed more 
slack, but that is moderated by its tremendous invasiveness.

Quite frankly, the single biggest thing you could improve is to improve 
documentation about what you expect in terms of semantics of various 
entry points.  There are a number of cleanups which we currently cannot 
do because they are directly mapped to paravirt_ops which unclear or 
nonsensical semantics.  Having a more explicit description of the design 
space would help there.

paravirt_ops is fundamentally misdesigned as a large monolithic 
driverization layer which combines a lot of unrelated things.  In a 
whole lot of cases it directly duplicates driverization layers already 
in the kernel, meaning we take the cost both in cost clarity and 
performance multiple times.  The patching technology is nice, and it 
would be good to have that available to other platform layers as well, 
but paravirt_ops as it currently sits is going to have to go at some point.

	-hpa

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
  2010-10-27 18:11             ` H. Peter Anvin
@ 2010-10-27 18:51               ` Jeremy Fitzhardinge
  2010-10-27 19:00                 ` H. Peter Anvin
  2010-10-27 19:02                 ` H. Peter Anvin
  0 siblings, 2 replies; 22+ messages in thread
From: Jeremy Fitzhardinge @ 2010-10-27 18:51 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Borislav Petkov, Ian Campbell, linux-kernel, x86

 On 10/27/2010 11:11 AM, H. Peter Anvin wrote:
> On 10/27/2010 10:51 AM, Jeremy Fitzhardinge wrote:
>>>
>>> This is what makes me absolutely hate paravirt with a passion...
>>> "let's hid things away in<obscure place>  and make it absolutely
>>> impossible to either follow the code flow or figure out what the
>>> intended semantics are supposed to be."
>>
>> Its not really an obscure place; it's where x86-32 does the rest of its
>> boot-time pagetable adjustments (like cleaning out the low identity
>> maps, etc).  Having those clone_pgd_ranges() floating around in
>> setup_arch() is out of place.
>>
>
> "Cleaning out the low identity maps" is part of what this patchset
> eliminates.

Sorry, I didn't look closely enough; its actually removing mappings
beyond the end of physical memory (though I'm not sure why it is 32-bit
only?).

>   This is exactly a good reason why paravirt_ops damages the kernel --
> it makes it impossible to make forward process.

I don't follow.  Why is it impossible to make forward progress?  How
specifically does pvops make it impossible?

>> It would be a pagefault from Xen preventing a direct write to the pgd
>> level of an active pagetable.  At the point in setup_arch() where it
>> does the first clone_pgd_range() we're already running on swapper_pg_dir
>> and the copy from initial_page_table is outright wrong.
>>
>> As Ian suggests, we could switch Xen to use initial_page_table at boot
>> then move to swapper_pg_dir in the same way native does.
>
> Once the failure was explained, it makes more sense.  Either that or
> just skip this setting if we're already running on swapper_pg_dir.

Yes, that's probably the simplest answer (Ian just proposed it
independently).

> Let me state this clearly: if Xen is going to continue to live as a
> merged platform, it has to have an obligation to follow changes on the
> native platform.  This is not unique to Xen, but rather a universal
> rule for integrated platforms.  Xen is more widely used than a lot of
> the other minority platforms, which means it legitimately gets allowed
> more slack, but that is moderated by its tremendous invasiveness.
>
> Quite frankly, the single biggest thing you could improve is to
> improve documentation about what you expect in terms of semantics of
> various entry points.  There are a number of cleanups which we
> currently cannot do because they are directly mapped to paravirt_ops
> which unclear or nonsensical semantics.  

What do you have in mind?  I'm always pro-cleanup.

> Having a more explicit description of the design space would help there.

I agree.  The hot-path pvops (interrupt control, context switch, mmu
updates, etc) are fairly easily defined, but the init time ops are
pretty ad-hoc and often defy simple definition. 

> paravirt_ops is fundamentally misdesigned as a large monolithic
> driverization layer which combines a lot of unrelated things.  In a
> whole lot of cases it directly duplicates driverization layers already
> in the kernel, meaning we take the cost both in cost clarity and
> performance multiple times.

Again, do you have something specific in mind?  We always adopted the
view that we should use an existing abstraction if one is available,
rather than always extending pvops. If a new common layer comes into
existence that subsumes or obsoletes pvops (or can be easily adoped to
do so), then I'm always eager to do that.

>   The patching technology is nice, and it would be good to have that
> available to other platform layers as well, but paravirt_ops as it
> currently sits is going to have to go at some point.

"pvops" as a single thing is a bit of a misnomer these days, in that it
has been devolving into a number of different functional pieces specific
to different problem domains, with the only unifying thing is that they
share the patching machinery.  They're also all controlled by a single
fat CONFIG_PARAVIRT, but someone posted a patch to separate them out
into distinct config options so they could be enabled/disabled
independently as needed, but it seems it was never merged.  I even
remember acking it.

Aside from that, the notion of pvops has been extending into this
broader notion of supporting non-traditional x86 platforms, and indeed
the hooks I'm referring to here are now part of that (or at least tglx
factored them out of the pvops infrastructure at the same time as the
things like timers and the like).  So really what you're complaining
about is that we have lots of indirection and late binding - and yes,
well, there is rather a lot of that in the kernel overall.

    J

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
  2010-10-27 18:51               ` Jeremy Fitzhardinge
@ 2010-10-27 19:00                 ` H. Peter Anvin
  2010-10-27 19:02                 ` H. Peter Anvin
  1 sibling, 0 replies; 22+ messages in thread
From: H. Peter Anvin @ 2010-10-27 19:00 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Borislav Petkov, Ian Campbell, linux-kernel, x86

On 10/27/2010 11:51 AM, Jeremy Fitzhardinge wrote:
>
> "pvops" as a single thing is a bit of a misnomer these days, in that it
> has been devolving into a number of different functional pieces specific
> to different problem domains, with the only unifying thing is that they
> share the patching machinery.  They're also all controlled by a single
> fat CONFIG_PARAVIRT, but someone posted a patch to separate them out
> into distinct config options so they could be enabled/disabled
> independently as needed, but it seems it was never merged.  I even
> remember acking it.
>
> Aside from that, the notion of pvops has been extending into this
> broader notion of supporting non-traditional x86 platforms, and indeed
> the hooks I'm referring to here are now part of that (or at least tglx
> factored them out of the pvops infrastructure at the same time as the
> things like timers and the like).  So really what you're complaining
> about is that we have lots of indirection and late binding - and yes,
> well, there is rather a lot of that in the kernel overall.
>

tglx's changes are part of the work to clean up and eliminate pvops 
where it is redundant.  However, the pvops machinery has been too rigid 
to be reused, resulting in that we now have something like four 
different patching machineries in the kernel.

But of course, the worst part of pvops is that it is used at a low level 
in a lot of hot paths, and the resulting header files looking like 
someone barfed on a piece of paper, full little inlines pointing "here, 
no over here, no over here."

The indirection and late binding is a problem, especially when the 
indirection layer is badly designed, and encodes design bugs.  Last I 
checked, as an example, there are three paravirt_ops to deal with two 
levels of page tables on non-PAE i386.  One of them (not sure which one) 
doesn't do anything on native, but hell if anyone knows if they are 
actually dead.

	-hpa

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
  2010-10-27 18:51               ` Jeremy Fitzhardinge
  2010-10-27 19:00                 ` H. Peter Anvin
@ 2010-10-27 19:02                 ` H. Peter Anvin
  1 sibling, 0 replies; 22+ messages in thread
From: H. Peter Anvin @ 2010-10-27 19:02 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Borislav Petkov, Ian Campbell, linux-kernel, x86

One more thing: one thing in particular that is very obnoxious are hooks 
that are null on native hardware, but aren't documented.  That means 
there is no actual reference for their expected behavior.

	-hpa


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
  2010-10-27 17:58           ` Ian Campbell
@ 2010-10-28  9:23             ` Ian Campbell
  2010-10-28 11:20               ` Borislav Petkov
                                 ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Ian Campbell @ 2010-10-28  9:23 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Jeremy Fitzhardinge, H. Peter Anvin, linux-kernel, x86

On Wed, 2010-10-27 at 18:58 +0100, Ian Campbell wrote:
> I think we could achieve this within the Xen pvops backend but e.g.
> the following would likely work also.
> 
> (Note, untested. I also omitted the reindent for clarity at this
> point)

Now tested and it works. Patch below.

I'm also going to look into switching Xen to startup on
initial_page_table, since I think the consistency with native would be
useful from a principle of least surprise PoV. However the following is
a useful stopgap if nothing else.

BTW, are there plans to unify 32 and 64 bit WRT the use of
swapper_pg_dir vs initial_page_table?

Ian.

8<-------------------------

>From 9bbce2e6b198cc7efeeebb87131e5b4e9ac923bd Mon Sep 17 00:00:00 2001
From: Ian Campbell <ian.campbell@citrix.com>
Date: Thu, 28 Oct 2010 10:16:52 +0100
Subject: [PATCH] x86/32: skip early initialisation of swapper_pg_dir if it is already current

In particular in the Xen PV case the Xen early startup code has
already initialised swapper_pg_dir and switched to it. In this case
initial_page_table is uninitialised since Xen has no use for it so
copying it over swapper_pg_dir is not correct.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
---
 arch/x86/kernel/setup.c |   12 +++++++-----
 1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 95a3274..2d89d5a 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -705,12 +705,14 @@ void __init setup_arch(char **cmdline_p)
 	 * copy kernel address range established so far and switch
 	 * to the proper swapper page table
 	 */
-	clone_pgd_range(swapper_pg_dir     + KERNEL_PGD_BOUNDARY,
-			initial_page_table + KERNEL_PGD_BOUNDARY,
-			KERNEL_PGD_PTRS);
+	if (read_cr3() != __pa(swapper_pg_dir)) {
+		clone_pgd_range(swapper_pg_dir     + KERNEL_PGD_BOUNDARY,
+				initial_page_table + KERNEL_PGD_BOUNDARY,
+				KERNEL_PGD_PTRS);
 
-	load_cr3(swapper_pg_dir);
-	__flush_tlb_all();
+		load_cr3(swapper_pg_dir);
+		__flush_tlb_all();
+	}
 #else
 	printk(KERN_INFO "Command line: %s\n", boot_command_line);
 #endif
-- 
1.5.6.5




^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
  2010-10-28  9:23             ` Ian Campbell
@ 2010-10-28 11:20               ` Borislav Petkov
  2010-10-28 11:53                 ` Ian Campbell
  2010-10-28 15:49                 ` H. Peter Anvin
  2010-10-28 15:48               ` H. Peter Anvin
  2010-11-03 15:35               ` Ian Campbell
  2 siblings, 2 replies; 22+ messages in thread
From: Borislav Petkov @ 2010-10-28 11:20 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Jeremy Fitzhardinge, H. Peter Anvin, linux-kernel, x86

On Thu, Oct 28, 2010 at 10:23:17AM +0100, Ian Campbell wrote:
> Now tested and it works. Patch below.

I'm assuming you tested bare metal too?

> I'm also going to look into switching Xen to startup on
> initial_page_table, since I think the consistency with native would be
> useful from a principle of least surprise PoV. However the following
> is a useful stopgap if nothing else.

Sounds good. Then we can drop this temporary workaround.

> BTW, are there plans to unify 32 and 64 bit WRT the use of
> swapper_pg_dir vs initial_page_table?

To be honest, I thought about it but 64bit does the bootstrapping
differently. I'll look into it one fine day when there's time. (like
there ever is :))

> From: Ian Campbell <ian.campbell@citrix.com>
> Date: Thu, 28 Oct 2010 10:16:52 +0100
> Subject: [PATCH] x86/32: skip early initialisation of swapper_pg_dir if it is already current
> 
> In particular in the Xen PV case the Xen early startup code has
> already initialised swapper_pg_dir and switched to it. In this case
> initial_page_table is uninitialised since Xen has no use for it so
> copying it over swapper_pg_dir is not correct.
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: H. Peter Anvin <hpa@linux.intel.com>
> Cc: Jeremy Fitzhardinge <jeremy@goop.org>

Acked-by: Borislav Petkov <bp@alien8.de>

-- 
Regards/Gruss,
Boris.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
  2010-10-28 11:20               ` Borislav Petkov
@ 2010-10-28 11:53                 ` Ian Campbell
  2010-10-28 15:49                 ` H. Peter Anvin
  1 sibling, 0 replies; 22+ messages in thread
From: Ian Campbell @ 2010-10-28 11:53 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Jeremy Fitzhardinge, H. Peter Anvin, linux-kernel, x86

On Thu, 2010-10-28 at 12:20 +0100, Borislav Petkov wrote:
> On Thu, Oct 28, 2010 at 10:23:17AM +0100, Ian Campbell wrote:
> > Now tested and it works. Patch below.
> 
> I'm assuming you tested bare metal too?

Yes, I tried PAE with and without CONFIG_PARAVIRT as well as non-PAE
without CONFIG_PARAVIRT.

Ian.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
  2010-10-28  9:23             ` Ian Campbell
  2010-10-28 11:20               ` Borislav Petkov
@ 2010-10-28 15:48               ` H. Peter Anvin
  2010-11-03 15:35               ` Ian Campbell
  2 siblings, 0 replies; 22+ messages in thread
From: H. Peter Anvin @ 2010-10-28 15:48 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Borislav Petkov, Jeremy Fitzhardinge, linux-kernel, x86

On 10/28/2010 2:23 AM, Ian Campbell wrote:
>
> BTW, are there plans to unify 32 and 64 bit WRT the use of
> swapper_pg_dir vs initial_page_table?
>

Yes, we're not there yet, though.

	-hpa

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
  2010-10-28 11:20               ` Borislav Petkov
  2010-10-28 11:53                 ` Ian Campbell
@ 2010-10-28 15:49                 ` H. Peter Anvin
  1 sibling, 0 replies; 22+ messages in thread
From: H. Peter Anvin @ 2010-10-28 15:49 UTC (permalink / raw)
  To: Borislav Petkov, Ian Campbell, Jeremy Fitzhardinge, linux-kernel, x86

On 10/28/2010 4:20 AM, Borislav Petkov wrote:
>
>> BTW, are there plans to unify 32 and 64 bit WRT the use of
>> swapper_pg_dir vs initial_page_table?
>
> To be honest, I thought about it but 64bit does the bootstrapping
> differently. I'll look into it one fine day when there's time. (like
> there ever is :))
>

The delta between the early 32- and 64-bit code is one of our big 
problems at the moment.

	-hpa

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
  2010-10-28  9:23             ` Ian Campbell
  2010-10-28 11:20               ` Borislav Petkov
  2010-10-28 15:48               ` H. Peter Anvin
@ 2010-11-03 15:35               ` Ian Campbell
  2 siblings, 0 replies; 22+ messages in thread
From: Ian Campbell @ 2010-11-03 15:35 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Jeremy Fitzhardinge, H. Peter Anvin, linux-kernel, x86

On Thu, 2010-10-28 at 10:23 +0100, Ian Campbell wrote:
> I'm also going to look into switching Xen to startup on
> initial_page_table, since I think the consistency with native would be
> useful from a principle of least surprise PoV.

Here it is.

8<---------------

>From 1aa3d8dc160bcee9bd92940853fad509df38d1a2 Mon Sep 17 00:00:00 2001
From: Ian Campbell <ian.campbell@citrix.com>
Date: Wed, 3 Nov 2010 15:32:21 +0000
Subject: [PATCH] xen: x86/32: perform initial startup on initial_page_table

Only make swapper_pg_dir readonly and pinned when generic x86 architecture code
(which also starts on initial_page_table) switches to it.  This helps ensure
that the generic setup paths work on Xen unmodified. In particular
clone_pgd_range writes directly to the destination pgd entries and is used to
initialise swapper_pg_dir so we need to ensure that it remains writeable until
the last possible moment during bring up.

This is complicated slightly by the need to avoid sharing kernel PMD entries
when running under Xen, therefore the Xen implementation must make a copy of
the kernel PMD (which is otherwise referred to by both intial_page_table and
swapper_pg_dir) before switching to swapper_pg_dir.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
---
 arch/x86/xen/enlighten.c |    2 -
 arch/x86/xen/mmu.c       |   69 +++++++++++++++++++++++++++++++++++++--------
 2 files changed, 56 insertions(+), 15 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index bd35549..e977d37 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1198,8 +1198,6 @@ asmlinkage void __init xen_start_kernel(void)
 	/* Allocate and initialize top and mid mfn levels for p2m structure */
 	xen_build_mfn_list_list();
 
-	init_mm.pgd = pgd;
-
 	/* keep using Xen gdt for now; no urgent need to change it */
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index bd2713a..cc507cb 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2133,44 +2133,83 @@ __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
 	return pgd;
 }
 #else	/* !CONFIG_X86_64 */
-static RESERVE_BRK_ARRAY(pmd_t, level2_kernel_pgt, PTRS_PER_PMD);
+static RESERVE_BRK_ARRAY(pmd_t, initial_kernel_pmd, PTRS_PER_PMD);
+static RESERVE_BRK_ARRAY(pmd_t, swapper_kernel_pmd, PTRS_PER_PMD);
+
+static __init void xen_write_cr3_init(unsigned long cr3)
+{
+	unsigned long pfn = PFN_DOWN(__pa(swapper_pg_dir));
+
+	BUG_ON(read_cr3() != __pa(initial_page_table));
+	BUG_ON(cr3 != __pa(swapper_pg_dir));
+
+	/*
+	 * We are switching to swapper_pg_dir for the first time (from
+	 * initial_page_table) and therefore need to mark that page
+	 * read-only and then pin it.
+	 *
+	 * Xen disallows sharing of kernel PMDs for PAE
+	 * guests. Therefore we must copy the kernel PMD from
+	 * initial_page_table into a new kernel PMD to be used in
+	 * swapper_pg_dir.
+	 */
+	swapper_kernel_pmd =
+		extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE);
+	memcpy(swapper_kernel_pmd, initial_kernel_pmd,
+	       sizeof(pmd_t) * PTRS_PER_PMD);
+	swapper_pg_dir[KERNEL_PGD_BOUNDARY] =
+		__pgd(__pa(swapper_kernel_pmd) | _PAGE_PRESENT);
+	set_page_prot(swapper_kernel_pmd, PAGE_KERNEL_RO);
+
+	set_page_prot(swapper_pg_dir, PAGE_KERNEL_RO);
+	xen_write_cr3(cr3);
+	pin_pagetable_pfn(MMUEXT_PIN_L3_TABLE, pfn);
+
+	pin_pagetable_pfn(MMUEXT_UNPIN_TABLE,
+			  PFN_DOWN(__pa(initial_page_table)));
+	set_page_prot(initial_page_table, PAGE_KERNEL);
+	set_page_prot(initial_kernel_pmd, PAGE_KERNEL);
+
+	pv_mmu_ops.write_cr3 = &xen_write_cr3;
+}
 
 __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
 					 unsigned long max_pfn)
 {
 	pmd_t *kernel_pmd;
 
-	level2_kernel_pgt = extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE);
+	initial_kernel_pmd =
+		extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE);
 
 	max_pfn_mapped = PFN_DOWN(__pa(xen_start_info->pt_base) +
 				  xen_start_info->nr_pt_frames * PAGE_SIZE +
 				  512*1024);
 
 	kernel_pmd = m2v(pgd[KERNEL_PGD_BOUNDARY].pgd);
-	memcpy(level2_kernel_pgt, kernel_pmd, sizeof(pmd_t) * PTRS_PER_PMD);
+	memcpy(initial_kernel_pmd, kernel_pmd, sizeof(pmd_t) * PTRS_PER_PMD);
 
-	xen_map_identity_early(level2_kernel_pgt, max_pfn);
+	xen_map_identity_early(initial_kernel_pmd, max_pfn);
 
-	memcpy(swapper_pg_dir, pgd, sizeof(pgd_t) * PTRS_PER_PGD);
-	set_pgd(&swapper_pg_dir[KERNEL_PGD_BOUNDARY],
-			__pgd(__pa(level2_kernel_pgt) | _PAGE_PRESENT));
+	memcpy(initial_page_table, pgd, sizeof(pgd_t) * PTRS_PER_PGD);
+	initial_page_table[KERNEL_PGD_BOUNDARY] =
+		__pgd(__pa(initial_kernel_pmd) | _PAGE_PRESENT);
 
-	set_page_prot(level2_kernel_pgt, PAGE_KERNEL_RO);
-	set_page_prot(swapper_pg_dir, PAGE_KERNEL_RO);
+	set_page_prot(initial_kernel_pmd, PAGE_KERNEL_RO);
+	set_page_prot(initial_page_table, PAGE_KERNEL_RO);
 	set_page_prot(empty_zero_page, PAGE_KERNEL_RO);
 
 	pin_pagetable_pfn(MMUEXT_UNPIN_TABLE, PFN_DOWN(__pa(pgd)));
 
-	xen_write_cr3(__pa(swapper_pg_dir));
-
-	pin_pagetable_pfn(MMUEXT_PIN_L3_TABLE, PFN_DOWN(__pa(swapper_pg_dir)));
+	pin_pagetable_pfn(MMUEXT_PIN_L3_TABLE,
+			  PFN_DOWN(__pa(initial_page_table)));
+	xen_write_cr3(__pa(initial_page_table));
 
 	memblock_x86_reserve_range(__pa(xen_start_info->pt_base),
 		      __pa(xen_start_info->pt_base +
 			   xen_start_info->nr_pt_frames * PAGE_SIZE),
 		      "XEN PAGETABLES");
 
-	return swapper_pg_dir;
+	return initial_page_table;
 }
 #endif	/* CONFIG_X86_64 */
 
@@ -2304,7 +2343,11 @@ static const struct pv_mmu_ops xen_mmu_ops __initdata = {
 	.write_cr2 = xen_write_cr2,
 
 	.read_cr3 = xen_read_cr3,
+#ifdef CONFIG_X86_32
+	.write_cr3 = xen_write_cr3_init,
+#else
 	.write_cr3 = xen_write_cr3,
+#endif
 
 	.flush_tlb_user = xen_flush_tlb,
 	.flush_tlb_kernel = xen_flush_tlb,
-- 
1.5.6.5




-- 
Ian Campbell

Booze is the answer.  I don't remember the question.


^ permalink raw reply related	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2010-11-03 15:35 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-27  8:50 [PATCH] x86: use pgd accessors when cloning a pgd range Ian Campbell
2010-10-27 10:40 ` Borislav Petkov
2010-10-27 16:50   ` Jeremy Fitzhardinge
2010-10-27 17:18     ` H. Peter Anvin
2010-10-27 17:31       ` Jeremy Fitzhardinge
2010-10-27 17:42         ` H. Peter Anvin
2010-10-27 17:51           ` Ian Campbell
2010-10-27 17:51           ` Jeremy Fitzhardinge
2010-10-27 18:02             ` Ian Campbell
2010-10-27 18:11             ` H. Peter Anvin
2010-10-27 18:51               ` Jeremy Fitzhardinge
2010-10-27 19:00                 ` H. Peter Anvin
2010-10-27 19:02                 ` H. Peter Anvin
2010-10-27 17:44         ` Borislav Petkov
2010-10-27 17:54           ` Jeremy Fitzhardinge
2010-10-27 17:58           ` Ian Campbell
2010-10-28  9:23             ` Ian Campbell
2010-10-28 11:20               ` Borislav Petkov
2010-10-28 11:53                 ` Ian Campbell
2010-10-28 15:49                 ` H. Peter Anvin
2010-10-28 15:48               ` H. Peter Anvin
2010-11-03 15:35               ` Ian Campbell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.