* [PATCH] x86: use pgd accessors when cloning a pgd range.
@ 2010-10-27 8:50 Ian Campbell
2010-10-27 10:40 ` Borislav Petkov
0 siblings, 1 reply; 22+ messages in thread
From: Ian Campbell @ 2010-10-27 8:50 UTC (permalink / raw)
To: linux-kernel
Cc: x86, Ian Campbell, Borislav Petkov, H. Peter Anvin, Jeremy Fitzhardinge
Page tables should always be updated using the proper accessor
methods. Not doing so bypasses the paravirt infrastructure.
In this case the failure to do so was exposed under Xen by
b40827fa7268 "x86-32, mm: Add an initial page table for core
bootstrapping".
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
---
arch/x86/include/asm/pgtable.h | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index ada823a..0b4c514 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -619,7 +619,10 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm,
*/
static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
{
- memcpy(dst, src, count * sizeof(pgd_t));
+ int i;
+
+ for (i=0; i<count; i++)
+ set_pgd(&dst[i], src[i]);
}
--
1.5.6.5
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
2010-10-27 8:50 [PATCH] x86: use pgd accessors when cloning a pgd range Ian Campbell
@ 2010-10-27 10:40 ` Borislav Petkov
2010-10-27 16:50 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 22+ messages in thread
From: Borislav Petkov @ 2010-10-27 10:40 UTC (permalink / raw)
To: Ian Campbell; +Cc: linux-kernel, x86, H. Peter Anvin, Jeremy Fitzhardinge
On Wed, Oct 27, 2010 at 09:50:13AM +0100, Ian Campbell wrote:
> Page tables should always be updated using the proper accessor
> methods. Not doing so bypasses the paravirt infrastructure.
>
> In this case the failure to do so was exposed under Xen by
> b40827fa7268 "x86-32, mm: Add an initial page table for core
> bootstrapping".
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: H. Peter Anvin <hpa@linux.intel.com>
> Cc: Jeremy Fitzhardinge <jeremy@goop.org>
> ---
> arch/x86/include/asm/pgtable.h | 5 ++++-
> 1 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index ada823a..0b4c514 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -619,7 +619,10 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm,
> */
> static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
> {
> - memcpy(dst, src, count * sizeof(pgd_t));
> + int i;
> +
> + for (i=0; i<count; i++)
> + set_pgd(&dst[i], src[i]);
Hmm, this slows down clone_pgd_range(). It is called at 3 sites total,
two of which happen only on boot in setup_arch() so they can be ignored
but the callchain
copy_process()
...
mm_init()
|->mm_alloc_pgd()
|->pgd_alloc()
|->pgd_ctor()
|->clone_pgd_range()
could become noticeable. To be on the safe side, I'd make
clone_pgd_range() a macro calling either the native or the xen version..
hpa?
--
Regards/Gruss,
Boris.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
2010-10-27 10:40 ` Borislav Petkov
@ 2010-10-27 16:50 ` Jeremy Fitzhardinge
2010-10-27 17:18 ` H. Peter Anvin
0 siblings, 1 reply; 22+ messages in thread
From: Jeremy Fitzhardinge @ 2010-10-27 16:50 UTC (permalink / raw)
To: Borislav Petkov, Ian Campbell, linux-kernel, x86, H. Peter Anvin
On 10/27/2010 03:40 AM, Borislav Petkov wrote:
> On Wed, Oct 27, 2010 at 09:50:13AM +0100, Ian Campbell wrote:
>> Page tables should always be updated using the proper accessor
>> methods. Not doing so bypasses the paravirt infrastructure.
>>
>> In this case the failure to do so was exposed under Xen by
>> b40827fa7268 "x86-32, mm: Add an initial page table for core
>> bootstrapping".
>>
>> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
>> Cc: Borislav Petkov <bp@alien8.de>
>> Cc: H. Peter Anvin <hpa@linux.intel.com>
>> Cc: Jeremy Fitzhardinge <jeremy@goop.org>
>> ---
>> arch/x86/include/asm/pgtable.h | 5 ++++-
>> 1 files changed, 4 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
>> index ada823a..0b4c514 100644
>> --- a/arch/x86/include/asm/pgtable.h
>> +++ b/arch/x86/include/asm/pgtable.h
>> @@ -619,7 +619,10 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm,
>> */
>> static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
>> {
>> - memcpy(dst, src, count * sizeof(pgd_t));
>> + int i;
>> +
>> + for (i=0; i<count; i++)
>> + set_pgd(&dst[i], src[i]);
> Hmm, this slows down clone_pgd_range(). It is called at 3 sites total,
> two of which happen only on boot in setup_arch() so they can be ignored
> but the callchain
>
> copy_process()
> ...
> mm_init()
> |->mm_alloc_pgd()
> |->pgd_alloc()
> |->pgd_ctor()
> |->clone_pgd_range()
>
> could become noticeable. To be on the safe side, I'd make
> clone_pgd_range() a macro calling either the native or the xen version..
Frankly I'd want to see some numbers before getting too worried about
it; if it were a problem we could make the native set_pgd inlined into
the callside (it is just a memory write after all), which will be very
similar to memcpy in performance.
I am, however, more concerned about the effect on performance under
Xen. xen_set_pgd will avoid doing a hypercall in this case (the
pagetable isn't yet pinned), but it has to do a moderate amount of work
to avoid doing the hypercall, and could really add some measurable
latency to process creation (which is not something we need right now).
For that a clone_pgd_range() hypercall is the most straightforward
answer, but I'm loathe to propose that right now.
This never used to be a problem. Perhaps we can change how
clone_pgd_range is used at boot time to avoid it in the Xen case (since
we don't care about the secondary pagetable)?
J
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
2010-10-27 16:50 ` Jeremy Fitzhardinge
@ 2010-10-27 17:18 ` H. Peter Anvin
2010-10-27 17:31 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 22+ messages in thread
From: H. Peter Anvin @ 2010-10-27 17:18 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: Borislav Petkov, Ian Campbell, linux-kernel, x86
On 10/27/2010 9:50 AM, Jeremy Fitzhardinge wrote:
>
> This never used to be a problem. Perhaps we can change how
> clone_pgd_range is used at boot time to avoid it in the Xen case (since
> we don't care about the secondary pagetable)?
>
Xen shouldn't have any users of this, since it's used for low-level
operations like SMP bootstrap, suspend to RAM, reboot and low-level BIOS
functionality.
-hpa
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
2010-10-27 17:18 ` H. Peter Anvin
@ 2010-10-27 17:31 ` Jeremy Fitzhardinge
2010-10-27 17:42 ` H. Peter Anvin
2010-10-27 17:44 ` Borislav Petkov
0 siblings, 2 replies; 22+ messages in thread
From: Jeremy Fitzhardinge @ 2010-10-27 17:31 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Borislav Petkov, Ian Campbell, linux-kernel, x86
On 10/27/2010 10:18 AM, H. Peter Anvin wrote:
> On 10/27/2010 9:50 AM, Jeremy Fitzhardinge wrote:
>>
>> This never used to be a problem. Perhaps we can change how
>> clone_pgd_range is used at boot time to avoid it in the Xen case (since
>> we don't care about the secondary pagetable)?
>>
>
> Xen shouldn't have any users of this, since it's used for low-level
> operations like SMP bootstrap, suspend to RAM, reboot and low-level
> BIOS functionality.
>
Right, but it is being called smack in the middle of setup_arch(). It
looks like they could be hidden away in
native_pagetable_setup_start/done though.
J
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
2010-10-27 17:31 ` Jeremy Fitzhardinge
@ 2010-10-27 17:42 ` H. Peter Anvin
2010-10-27 17:51 ` Ian Campbell
2010-10-27 17:51 ` Jeremy Fitzhardinge
2010-10-27 17:44 ` Borislav Petkov
1 sibling, 2 replies; 22+ messages in thread
From: H. Peter Anvin @ 2010-10-27 17:42 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: Borislav Petkov, Ian Campbell, linux-kernel, x86
On 10/27/2010 10:31 AM, Jeremy Fitzhardinge wrote:
> On 10/27/2010 10:18 AM, H. Peter Anvin wrote:
>> On 10/27/2010 9:50 AM, Jeremy Fitzhardinge wrote:
>>>
>>> This never used to be a problem. Perhaps we can change how
>>> clone_pgd_range is used at boot time to avoid it in the Xen case (since
>>> we don't care about the secondary pagetable)?
>>>
>>
>> Xen shouldn't have any users of this, since it's used for low-level
>> operations like SMP bootstrap, suspend to RAM, reboot and low-level
>> BIOS functionality.
>>
>
> Right, but it is being called smack in the middle of setup_arch(). It
> looks like they could be hidden away in
> native_pagetable_setup_start/done though.
>
This is what makes me absolutely hate paravirt with a passion... "let's
hid things away in <obscure place> and make it absolutely impossible to
either follow the code flow or figure out what the intended semantics
are supposed to be." (Let not even get me started on how ill-defined
the semantics of some of the paravirt operations are.) In this case, at
the most you need a single flag of state... or you could even just
ignore this low-level data structure that you will never use in the
first place. Ian's message just mentioned "a failure" and never
described in any way what kind of "failure" it was.
-hpa
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
2010-10-27 17:31 ` Jeremy Fitzhardinge
2010-10-27 17:42 ` H. Peter Anvin
@ 2010-10-27 17:44 ` Borislav Petkov
2010-10-27 17:54 ` Jeremy Fitzhardinge
2010-10-27 17:58 ` Ian Campbell
1 sibling, 2 replies; 22+ messages in thread
From: Borislav Petkov @ 2010-10-27 17:44 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: H. Peter Anvin, Ian Campbell, linux-kernel, x86
On Wed, Oct 27, 2010 at 10:31:37AM -0700, Jeremy Fitzhardinge wrote:
> > Xen shouldn't have any users of this, since it's used for low-level
> > operations like SMP bootstrap, suspend to RAM, reboot and low-level
> > BIOS functionality.
> >
> Right, but it is being called smack in the middle of setup_arch(). It
> looks like they could be hidden away in
> native_pagetable_setup_start/done though.
I think I can put the second sync-back clone_pgd_range in
native_pagetable_setup_done but the first one needs to take place right
at the beginning of setup_arch() because a lot of code in-between relies
on swapper_pg_dir containing proper entries.
Just to make sure I understand you correctly: best it would be for xen
to not have the copy_pgd_range() calls in setup_arch, correct?
--
Regards/Gruss,
Boris.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
2010-10-27 17:42 ` H. Peter Anvin
@ 2010-10-27 17:51 ` Ian Campbell
2010-10-27 17:51 ` Jeremy Fitzhardinge
1 sibling, 0 replies; 22+ messages in thread
From: Ian Campbell @ 2010-10-27 17:51 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Jeremy Fitzhardinge, Borislav Petkov, linux-kernel, x86
On Wed, 2010-10-27 at 18:42 +0100, H. Peter Anvin wrote:
> On 10/27/2010 10:31 AM, Jeremy Fitzhardinge wrote:
> > On 10/27/2010 10:18 AM, H. Peter Anvin wrote:
> >> On 10/27/2010 9:50 AM, Jeremy Fitzhardinge wrote:
> >>>
> >>> This never used to be a problem. Perhaps we can change how
> >>> clone_pgd_range is used at boot time to avoid it in the Xen case (since
> >>> we don't care about the secondary pagetable)?
> >>>
> >>
> >> Xen shouldn't have any users of this, since it's used for low-level
> >> operations like SMP bootstrap, suspend to RAM, reboot and low-level
> >> BIOS functionality.
> >>
> >
> > Right, but it is being called smack in the middle of setup_arch(). It
> > looks like they could be hidden away in
> > native_pagetable_setup_start/done though.
> >
>
> This is what makes me absolutely hate paravirt with a passion... "let's
> hid things away in <obscure place> and make it absolutely impossible to
> either follow the code flow or figure out what the intended semantics
> are supposed to be." (Let not even get me started on how ill-defined
> the semantics of some of the paravirt operations are.) In this case, at
> the most you need a single flag of state... or you could even just
> ignore this low-level data structure that you will never use in the
> first place. Ian's message just mentioned "a failure" and never
> described in any way what kind of "failure" it was.
That was vague of me, sorry about that. The failure was Xen shooting the
guest due to an attempt to write to swapper_pg_dir directly, which isn't
allowed for active pagetables under Xen.
Ian.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
2010-10-27 17:42 ` H. Peter Anvin
2010-10-27 17:51 ` Ian Campbell
@ 2010-10-27 17:51 ` Jeremy Fitzhardinge
2010-10-27 18:02 ` Ian Campbell
2010-10-27 18:11 ` H. Peter Anvin
1 sibling, 2 replies; 22+ messages in thread
From: Jeremy Fitzhardinge @ 2010-10-27 17:51 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Borislav Petkov, Ian Campbell, linux-kernel, x86
On 10/27/2010 10:42 AM, H. Peter Anvin wrote:
> On 10/27/2010 10:31 AM, Jeremy Fitzhardinge wrote:
>> On 10/27/2010 10:18 AM, H. Peter Anvin wrote:
>>> On 10/27/2010 9:50 AM, Jeremy Fitzhardinge wrote:
>>>>
>>>> This never used to be a problem. Perhaps we can change how
>>>> clone_pgd_range is used at boot time to avoid it in the Xen case
>>>> (since
>>>> we don't care about the secondary pagetable)?
>>>>
>>>
>>> Xen shouldn't have any users of this, since it's used for low-level
>>> operations like SMP bootstrap, suspend to RAM, reboot and low-level
>>> BIOS functionality.
>>>
>>
>> Right, but it is being called smack in the middle of setup_arch(). It
>> looks like they could be hidden away in
>> native_pagetable_setup_start/done though.
>>
>
> This is what makes me absolutely hate paravirt with a passion...
> "let's hid things away in <obscure place> and make it absolutely
> impossible to either follow the code flow or figure out what the
> intended semantics are supposed to be."
Its not really an obscure place; it's where x86-32 does the rest of its
boot-time pagetable adjustments (like cleaning out the low identity
maps, etc). Having those clone_pgd_ranges() floating around in
setup_arch() is out of place.
> (Let not even get me started on how ill-defined the semantics of some
> of the paravirt operations are.) In this case, at the most you need a
> single flag of state... or you could even just ignore this low-level
> data structure that you will never use in the first place. Ian's
> message just mentioned "a failure" and never described in any way what
> kind of "failure" it was.
It would be a pagefault from Xen preventing a direct write to the pgd
level of an active pagetable. At the point in setup_arch() where it
does the first clone_pgd_range() we're already running on swapper_pg_dir
and the copy from initial_page_table is outright wrong.
As Ian suggests, we could switch Xen to use initial_page_table at boot
then move to swapper_pg_dir in the same way native does.
J
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
2010-10-27 17:44 ` Borislav Petkov
@ 2010-10-27 17:54 ` Jeremy Fitzhardinge
2010-10-27 17:58 ` Ian Campbell
1 sibling, 0 replies; 22+ messages in thread
From: Jeremy Fitzhardinge @ 2010-10-27 17:54 UTC (permalink / raw)
To: Borislav Petkov, H. Peter Anvin, Ian Campbell, linux-kernel, x86
On 10/27/2010 10:44 AM, Borislav Petkov wrote:
> On Wed, Oct 27, 2010 at 10:31:37AM -0700, Jeremy Fitzhardinge wrote:
>>> Xen shouldn't have any users of this, since it's used for low-level
>>> operations like SMP bootstrap, suspend to RAM, reboot and low-level
>>> BIOS functionality.
>>>
>> Right, but it is being called smack in the middle of setup_arch(). It
>> looks like they could be hidden away in
>> native_pagetable_setup_start/done though.
> I think I can put the second sync-back clone_pgd_range in
> native_pagetable_setup_done but the first one needs to take place right
> at the beginning of setup_arch() because a lot of code in-between relies
> on swapper_pg_dir containing proper entries.
OK.
> Just to make sure I understand you correctly: best it would be for xen
> to not have the copy_pgd_range() calls in setup_arch, correct?
Right. Could you put the first copy into swapper_pg_dir earlier, before
setup_arch() somewhere that's specific to booting via head_32.S?
J
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
2010-10-27 17:44 ` Borislav Petkov
2010-10-27 17:54 ` Jeremy Fitzhardinge
@ 2010-10-27 17:58 ` Ian Campbell
2010-10-28 9:23 ` Ian Campbell
1 sibling, 1 reply; 22+ messages in thread
From: Ian Campbell @ 2010-10-27 17:58 UTC (permalink / raw)
To: Borislav Petkov; +Cc: Jeremy Fitzhardinge, H. Peter Anvin, linux-kernel, x86
On Wed, 2010-10-27 at 18:44 +0100, Borislav Petkov wrote:
> On Wed, Oct 27, 2010 at 10:31:37AM -0700, Jeremy Fitzhardinge wrote:
> > > Xen shouldn't have any users of this, since it's used for low-level
> > > operations like SMP bootstrap, suspend to RAM, reboot and low-level
> > > BIOS functionality.
> > >
> > Right, but it is being called smack in the middle of setup_arch(). It
> > looks like they could be hidden away in
> > native_pagetable_setup_start/done though.
>
> I think I can put the second sync-back clone_pgd_range in
> native_pagetable_setup_done but the first one needs to take place right
> at the beginning of setup_arch() because a lot of code in-between relies
> on swapper_pg_dir containing proper entries.
>
> Just to make sure I understand you correctly: best it would be for xen
> to not have the copy_pgd_range() calls in setup_arch, correct?
Yes, or to ensure that swapper_pg_dir has not yet been marked as a
pagetable at this point. Under Xen we are running on swapper_pg_dir when
setup_arch is called.
I think we could achieve this within the Xen pvops backend but e.g. the
following would likely work also.
(Note, untested. I also omitted the reindent for clarity at this point)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 95a3274..2d89d5a 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -705,12 +705,14 @@ void __init setup_arch(char **cmdline_p)
* copy kernel address range established so far and switch
* to the proper swapper page table
*/
+ if (read_cr3() != __pa(swapper_pg_dir)) {
clone_pgd_range(swapper_pg_dir + KERNEL_PGD_BOUNDARY,
initial_page_table + KERNEL_PGD_BOUNDARY,
KERNEL_PGD_PTRS);
load_cr3(swapper_pg_dir);
__flush_tlb_all();
+ }
#else
printk(KERN_INFO "Command line: %s\n", boot_command_line);
#endif
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
2010-10-27 17:51 ` Jeremy Fitzhardinge
@ 2010-10-27 18:02 ` Ian Campbell
2010-10-27 18:11 ` H. Peter Anvin
1 sibling, 0 replies; 22+ messages in thread
From: Ian Campbell @ 2010-10-27 18:02 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: H. Peter Anvin, Borislav Petkov, linux-kernel, x86
On Wed, 2010-10-27 at 18:51 +0100, Jeremy Fitzhardinge wrote:
> On 10/27/2010 10:42 AM, H. Peter Anvin wrote:
> > Ian's message just mentioned "a failure" and never described in any way what
> > kind of "failure" it was.
>
> It would be a pagefault from Xen preventing a direct write to the pgd
> level of an active pagetable. At the point in setup_arch() where it
> does the first clone_pgd_range() we're already running on swapper_pg_dir
> and the copy from initial_page_table is outright wrong.
I'd missed that aspect, yes the contents of initial_page_table are
wrong. I'm not sure now how my patch to clone_pgd_range even made a
difference...
> As Ian suggests, we could switch Xen to use initial_page_table at boot
> then move to swapper_pg_dir in the same way native does.
Accidentally did that in private mail, for everyone else the gory
details are:
> xen_setup_kernel_pagetable operates on initial_page_table instead of
> swapper_pg_dir. We do not pin initial_page_table apart from the
> implicit
> one from writing it to cr3 (it is necessarily r/o though).
>
> So we enter setup_arch running on initial_page_table and with
> swapper_pg_dir mapped r/w, which is how it looks on native too. So at
> this point the:
> clone_pgd_range(swapper_pg_dir + KERNEL_PGD_BOUNDARY,
> initial_page_table + KERNEL_PGD_BOUNDARY,
> KERNEL_PGD_PTRS);
> works just fine and we reach:
> load_cr3(swapper_pg_dir);
>
> Then in __xen_write_cr3 we notice the first attempt to switch to
> swapper_pg_dir and only at that point do we make it r/o and pin it.
> When
> we then actually do the switch to swapper_pg_dir that releases the
> implicit pin on initial_page_table so we can make it r/o again.
>
> The later on we reach the:
> clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY,
> swapper_pg_dir + KERNEL_PGD_BOUNDARY,
> KERNEL_PGD_PTRS);
> which will succeed because initial_page_table is r/w again.
>
> We don't care about the pin on initial_page_table because we should
> never need it again.
Ian.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
2010-10-27 17:51 ` Jeremy Fitzhardinge
2010-10-27 18:02 ` Ian Campbell
@ 2010-10-27 18:11 ` H. Peter Anvin
2010-10-27 18:51 ` Jeremy Fitzhardinge
1 sibling, 1 reply; 22+ messages in thread
From: H. Peter Anvin @ 2010-10-27 18:11 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: Borislav Petkov, Ian Campbell, linux-kernel, x86
On 10/27/2010 10:51 AM, Jeremy Fitzhardinge wrote:
>>
>> This is what makes me absolutely hate paravirt with a passion...
>> "let's hid things away in<obscure place> and make it absolutely
>> impossible to either follow the code flow or figure out what the
>> intended semantics are supposed to be."
>
> Its not really an obscure place; it's where x86-32 does the rest of its
> boot-time pagetable adjustments (like cleaning out the low identity
> maps, etc). Having those clone_pgd_ranges() floating around in
> setup_arch() is out of place.
>
"Cleaning out the low identity maps" is part of what this patchset
eliminates. This is exactly a good reason why paravirt_ops damages the
kernel -- it makes it impossible to make forward process.
>> (Let not even get me started on how ill-defined the semantics of some
>> of the paravirt operations are.) In this case, at the most you need a
>> single flag of state... or you could even just ignore this low-level
>> data structure that you will never use in the first place. Ian's
>> message just mentioned "a failure" and never described in any way what
>> kind of "failure" it was.
>
> It would be a pagefault from Xen preventing a direct write to the pgd
> level of an active pagetable. At the point in setup_arch() where it
> does the first clone_pgd_range() we're already running on swapper_pg_dir
> and the copy from initial_page_table is outright wrong.
>
> As Ian suggests, we could switch Xen to use initial_page_table at boot
> then move to swapper_pg_dir in the same way native does.
Once the failure was explained, it makes more sense. Either that or
just skip this setting if we're already running on swapper_pg_dir.
Let me state this clearly: if Xen is going to continue to live as a
merged platform, it has to have an obligation to follow changes on the
native platform. This is not unique to Xen, but rather a universal rule
for integrated platforms. Xen is more widely used than a lot of the
other minority platforms, which means it legitimately gets allowed more
slack, but that is moderated by its tremendous invasiveness.
Quite frankly, the single biggest thing you could improve is to improve
documentation about what you expect in terms of semantics of various
entry points. There are a number of cleanups which we currently cannot
do because they are directly mapped to paravirt_ops which unclear or
nonsensical semantics. Having a more explicit description of the design
space would help there.
paravirt_ops is fundamentally misdesigned as a large monolithic
driverization layer which combines a lot of unrelated things. In a
whole lot of cases it directly duplicates driverization layers already
in the kernel, meaning we take the cost both in cost clarity and
performance multiple times. The patching technology is nice, and it
would be good to have that available to other platform layers as well,
but paravirt_ops as it currently sits is going to have to go at some point.
-hpa
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
2010-10-27 18:11 ` H. Peter Anvin
@ 2010-10-27 18:51 ` Jeremy Fitzhardinge
2010-10-27 19:00 ` H. Peter Anvin
2010-10-27 19:02 ` H. Peter Anvin
0 siblings, 2 replies; 22+ messages in thread
From: Jeremy Fitzhardinge @ 2010-10-27 18:51 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Borislav Petkov, Ian Campbell, linux-kernel, x86
On 10/27/2010 11:11 AM, H. Peter Anvin wrote:
> On 10/27/2010 10:51 AM, Jeremy Fitzhardinge wrote:
>>>
>>> This is what makes me absolutely hate paravirt with a passion...
>>> "let's hid things away in<obscure place> and make it absolutely
>>> impossible to either follow the code flow or figure out what the
>>> intended semantics are supposed to be."
>>
>> Its not really an obscure place; it's where x86-32 does the rest of its
>> boot-time pagetable adjustments (like cleaning out the low identity
>> maps, etc). Having those clone_pgd_ranges() floating around in
>> setup_arch() is out of place.
>>
>
> "Cleaning out the low identity maps" is part of what this patchset
> eliminates.
Sorry, I didn't look closely enough; its actually removing mappings
beyond the end of physical memory (though I'm not sure why it is 32-bit
only?).
> This is exactly a good reason why paravirt_ops damages the kernel --
> it makes it impossible to make forward process.
I don't follow. Why is it impossible to make forward progress? How
specifically does pvops make it impossible?
>> It would be a pagefault from Xen preventing a direct write to the pgd
>> level of an active pagetable. At the point in setup_arch() where it
>> does the first clone_pgd_range() we're already running on swapper_pg_dir
>> and the copy from initial_page_table is outright wrong.
>>
>> As Ian suggests, we could switch Xen to use initial_page_table at boot
>> then move to swapper_pg_dir in the same way native does.
>
> Once the failure was explained, it makes more sense. Either that or
> just skip this setting if we're already running on swapper_pg_dir.
Yes, that's probably the simplest answer (Ian just proposed it
independently).
> Let me state this clearly: if Xen is going to continue to live as a
> merged platform, it has to have an obligation to follow changes on the
> native platform. This is not unique to Xen, but rather a universal
> rule for integrated platforms. Xen is more widely used than a lot of
> the other minority platforms, which means it legitimately gets allowed
> more slack, but that is moderated by its tremendous invasiveness.
>
> Quite frankly, the single biggest thing you could improve is to
> improve documentation about what you expect in terms of semantics of
> various entry points. There are a number of cleanups which we
> currently cannot do because they are directly mapped to paravirt_ops
> which unclear or nonsensical semantics.
What do you have in mind? I'm always pro-cleanup.
> Having a more explicit description of the design space would help there.
I agree. The hot-path pvops (interrupt control, context switch, mmu
updates, etc) are fairly easily defined, but the init time ops are
pretty ad-hoc and often defy simple definition.
> paravirt_ops is fundamentally misdesigned as a large monolithic
> driverization layer which combines a lot of unrelated things. In a
> whole lot of cases it directly duplicates driverization layers already
> in the kernel, meaning we take the cost both in cost clarity and
> performance multiple times.
Again, do you have something specific in mind? We always adopted the
view that we should use an existing abstraction if one is available,
rather than always extending pvops. If a new common layer comes into
existence that subsumes or obsoletes pvops (or can be easily adoped to
do so), then I'm always eager to do that.
> The patching technology is nice, and it would be good to have that
> available to other platform layers as well, but paravirt_ops as it
> currently sits is going to have to go at some point.
"pvops" as a single thing is a bit of a misnomer these days, in that it
has been devolving into a number of different functional pieces specific
to different problem domains, with the only unifying thing is that they
share the patching machinery. They're also all controlled by a single
fat CONFIG_PARAVIRT, but someone posted a patch to separate them out
into distinct config options so they could be enabled/disabled
independently as needed, but it seems it was never merged. I even
remember acking it.
Aside from that, the notion of pvops has been extending into this
broader notion of supporting non-traditional x86 platforms, and indeed
the hooks I'm referring to here are now part of that (or at least tglx
factored them out of the pvops infrastructure at the same time as the
things like timers and the like). So really what you're complaining
about is that we have lots of indirection and late binding - and yes,
well, there is rather a lot of that in the kernel overall.
J
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
2010-10-27 18:51 ` Jeremy Fitzhardinge
@ 2010-10-27 19:00 ` H. Peter Anvin
2010-10-27 19:02 ` H. Peter Anvin
1 sibling, 0 replies; 22+ messages in thread
From: H. Peter Anvin @ 2010-10-27 19:00 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: Borislav Petkov, Ian Campbell, linux-kernel, x86
On 10/27/2010 11:51 AM, Jeremy Fitzhardinge wrote:
>
> "pvops" as a single thing is a bit of a misnomer these days, in that it
> has been devolving into a number of different functional pieces specific
> to different problem domains, with the only unifying thing is that they
> share the patching machinery. They're also all controlled by a single
> fat CONFIG_PARAVIRT, but someone posted a patch to separate them out
> into distinct config options so they could be enabled/disabled
> independently as needed, but it seems it was never merged. I even
> remember acking it.
>
> Aside from that, the notion of pvops has been extending into this
> broader notion of supporting non-traditional x86 platforms, and indeed
> the hooks I'm referring to here are now part of that (or at least tglx
> factored them out of the pvops infrastructure at the same time as the
> things like timers and the like). So really what you're complaining
> about is that we have lots of indirection and late binding - and yes,
> well, there is rather a lot of that in the kernel overall.
>
tglx's changes are part of the work to clean up and eliminate pvops
where it is redundant. However, the pvops machinery has been too rigid
to be reused, resulting in that we now have something like four
different patching machineries in the kernel.
But of course, the worst part of pvops is that it is used at a low level
in a lot of hot paths, and the resulting header files looking like
someone barfed on a piece of paper, full little inlines pointing "here,
no over here, no over here."
The indirection and late binding is a problem, especially when the
indirection layer is badly designed, and encodes design bugs. Last I
checked, as an example, there are three paravirt_ops to deal with two
levels of page tables on non-PAE i386. One of them (not sure which one)
doesn't do anything on native, but hell if anyone knows if they are
actually dead.
-hpa
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
2010-10-27 18:51 ` Jeremy Fitzhardinge
2010-10-27 19:00 ` H. Peter Anvin
@ 2010-10-27 19:02 ` H. Peter Anvin
1 sibling, 0 replies; 22+ messages in thread
From: H. Peter Anvin @ 2010-10-27 19:02 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: Borislav Petkov, Ian Campbell, linux-kernel, x86
One more thing: one thing in particular that is very obnoxious are hooks
that are null on native hardware, but aren't documented. That means
there is no actual reference for their expected behavior.
-hpa
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
2010-10-27 17:58 ` Ian Campbell
@ 2010-10-28 9:23 ` Ian Campbell
2010-10-28 11:20 ` Borislav Petkov
` (2 more replies)
0 siblings, 3 replies; 22+ messages in thread
From: Ian Campbell @ 2010-10-28 9:23 UTC (permalink / raw)
To: Borislav Petkov; +Cc: Jeremy Fitzhardinge, H. Peter Anvin, linux-kernel, x86
On Wed, 2010-10-27 at 18:58 +0100, Ian Campbell wrote:
> I think we could achieve this within the Xen pvops backend but e.g.
> the following would likely work also.
>
> (Note, untested. I also omitted the reindent for clarity at this
> point)
Now tested and it works. Patch below.
I'm also going to look into switching Xen to startup on
initial_page_table, since I think the consistency with native would be
useful from a principle of least surprise PoV. However the following is
a useful stopgap if nothing else.
BTW, are there plans to unify 32 and 64 bit WRT the use of
swapper_pg_dir vs initial_page_table?
Ian.
8<-------------------------
>From 9bbce2e6b198cc7efeeebb87131e5b4e9ac923bd Mon Sep 17 00:00:00 2001
From: Ian Campbell <ian.campbell@citrix.com>
Date: Thu, 28 Oct 2010 10:16:52 +0100
Subject: [PATCH] x86/32: skip early initialisation of swapper_pg_dir if it is already current
In particular in the Xen PV case the Xen early startup code has
already initialised swapper_pg_dir and switched to it. In this case
initial_page_table is uninitialised since Xen has no use for it so
copying it over swapper_pg_dir is not correct.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
---
arch/x86/kernel/setup.c | 12 +++++++-----
1 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 95a3274..2d89d5a 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -705,12 +705,14 @@ void __init setup_arch(char **cmdline_p)
* copy kernel address range established so far and switch
* to the proper swapper page table
*/
- clone_pgd_range(swapper_pg_dir + KERNEL_PGD_BOUNDARY,
- initial_page_table + KERNEL_PGD_BOUNDARY,
- KERNEL_PGD_PTRS);
+ if (read_cr3() != __pa(swapper_pg_dir)) {
+ clone_pgd_range(swapper_pg_dir + KERNEL_PGD_BOUNDARY,
+ initial_page_table + KERNEL_PGD_BOUNDARY,
+ KERNEL_PGD_PTRS);
- load_cr3(swapper_pg_dir);
- __flush_tlb_all();
+ load_cr3(swapper_pg_dir);
+ __flush_tlb_all();
+ }
#else
printk(KERN_INFO "Command line: %s\n", boot_command_line);
#endif
--
1.5.6.5
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
2010-10-28 9:23 ` Ian Campbell
@ 2010-10-28 11:20 ` Borislav Petkov
2010-10-28 11:53 ` Ian Campbell
2010-10-28 15:49 ` H. Peter Anvin
2010-10-28 15:48 ` H. Peter Anvin
2010-11-03 15:35 ` Ian Campbell
2 siblings, 2 replies; 22+ messages in thread
From: Borislav Petkov @ 2010-10-28 11:20 UTC (permalink / raw)
To: Ian Campbell; +Cc: Jeremy Fitzhardinge, H. Peter Anvin, linux-kernel, x86
On Thu, Oct 28, 2010 at 10:23:17AM +0100, Ian Campbell wrote:
> Now tested and it works. Patch below.
I'm assuming you tested bare metal too?
> I'm also going to look into switching Xen to startup on
> initial_page_table, since I think the consistency with native would be
> useful from a principle of least surprise PoV. However the following
> is a useful stopgap if nothing else.
Sounds good. Then we can drop this temporary workaround.
> BTW, are there plans to unify 32 and 64 bit WRT the use of
> swapper_pg_dir vs initial_page_table?
To be honest, I thought about it but 64bit does the bootstrapping
differently. I'll look into it one fine day when there's time. (like
there ever is :))
> From: Ian Campbell <ian.campbell@citrix.com>
> Date: Thu, 28 Oct 2010 10:16:52 +0100
> Subject: [PATCH] x86/32: skip early initialisation of swapper_pg_dir if it is already current
>
> In particular in the Xen PV case the Xen early startup code has
> already initialised swapper_pg_dir and switched to it. In this case
> initial_page_table is uninitialised since Xen has no use for it so
> copying it over swapper_pg_dir is not correct.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: H. Peter Anvin <hpa@linux.intel.com>
> Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Acked-by: Borislav Petkov <bp@alien8.de>
--
Regards/Gruss,
Boris.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
2010-10-28 11:20 ` Borislav Petkov
@ 2010-10-28 11:53 ` Ian Campbell
2010-10-28 15:49 ` H. Peter Anvin
1 sibling, 0 replies; 22+ messages in thread
From: Ian Campbell @ 2010-10-28 11:53 UTC (permalink / raw)
To: Borislav Petkov; +Cc: Jeremy Fitzhardinge, H. Peter Anvin, linux-kernel, x86
On Thu, 2010-10-28 at 12:20 +0100, Borislav Petkov wrote:
> On Thu, Oct 28, 2010 at 10:23:17AM +0100, Ian Campbell wrote:
> > Now tested and it works. Patch below.
>
> I'm assuming you tested bare metal too?
Yes, I tried PAE with and without CONFIG_PARAVIRT as well as non-PAE
without CONFIG_PARAVIRT.
Ian.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
2010-10-28 9:23 ` Ian Campbell
2010-10-28 11:20 ` Borislav Petkov
@ 2010-10-28 15:48 ` H. Peter Anvin
2010-11-03 15:35 ` Ian Campbell
2 siblings, 0 replies; 22+ messages in thread
From: H. Peter Anvin @ 2010-10-28 15:48 UTC (permalink / raw)
To: Ian Campbell; +Cc: Borislav Petkov, Jeremy Fitzhardinge, linux-kernel, x86
On 10/28/2010 2:23 AM, Ian Campbell wrote:
>
> BTW, are there plans to unify 32 and 64 bit WRT the use of
> swapper_pg_dir vs initial_page_table?
>
Yes, we're not there yet, though.
-hpa
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
2010-10-28 11:20 ` Borislav Petkov
2010-10-28 11:53 ` Ian Campbell
@ 2010-10-28 15:49 ` H. Peter Anvin
1 sibling, 0 replies; 22+ messages in thread
From: H. Peter Anvin @ 2010-10-28 15:49 UTC (permalink / raw)
To: Borislav Petkov, Ian Campbell, Jeremy Fitzhardinge, linux-kernel, x86
On 10/28/2010 4:20 AM, Borislav Petkov wrote:
>
>> BTW, are there plans to unify 32 and 64 bit WRT the use of
>> swapper_pg_dir vs initial_page_table?
>
> To be honest, I thought about it but 64bit does the bootstrapping
> differently. I'll look into it one fine day when there's time. (like
> there ever is :))
>
The delta between the early 32- and 64-bit code is one of our big
problems at the moment.
-hpa
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] x86: use pgd accessors when cloning a pgd range.
2010-10-28 9:23 ` Ian Campbell
2010-10-28 11:20 ` Borislav Petkov
2010-10-28 15:48 ` H. Peter Anvin
@ 2010-11-03 15:35 ` Ian Campbell
2 siblings, 0 replies; 22+ messages in thread
From: Ian Campbell @ 2010-11-03 15:35 UTC (permalink / raw)
To: Borislav Petkov; +Cc: Jeremy Fitzhardinge, H. Peter Anvin, linux-kernel, x86
On Thu, 2010-10-28 at 10:23 +0100, Ian Campbell wrote:
> I'm also going to look into switching Xen to startup on
> initial_page_table, since I think the consistency with native would be
> useful from a principle of least surprise PoV.
Here it is.
8<---------------
>From 1aa3d8dc160bcee9bd92940853fad509df38d1a2 Mon Sep 17 00:00:00 2001
From: Ian Campbell <ian.campbell@citrix.com>
Date: Wed, 3 Nov 2010 15:32:21 +0000
Subject: [PATCH] xen: x86/32: perform initial startup on initial_page_table
Only make swapper_pg_dir readonly and pinned when generic x86 architecture code
(which also starts on initial_page_table) switches to it. This helps ensure
that the generic setup paths work on Xen unmodified. In particular
clone_pgd_range writes directly to the destination pgd entries and is used to
initialise swapper_pg_dir so we need to ensure that it remains writeable until
the last possible moment during bring up.
This is complicated slightly by the need to avoid sharing kernel PMD entries
when running under Xen, therefore the Xen implementation must make a copy of
the kernel PMD (which is otherwise referred to by both intial_page_table and
swapper_pg_dir) before switching to swapper_pg_dir.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
---
arch/x86/xen/enlighten.c | 2 -
arch/x86/xen/mmu.c | 69 +++++++++++++++++++++++++++++++++++++--------
2 files changed, 56 insertions(+), 15 deletions(-)
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index bd35549..e977d37 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1198,8 +1198,6 @@ asmlinkage void __init xen_start_kernel(void)
/* Allocate and initialize top and mid mfn levels for p2m structure */
xen_build_mfn_list_list();
- init_mm.pgd = pgd;
-
/* keep using Xen gdt for now; no urgent need to change it */
#ifdef CONFIG_X86_32
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index bd2713a..cc507cb 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2133,44 +2133,83 @@ __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
return pgd;
}
#else /* !CONFIG_X86_64 */
-static RESERVE_BRK_ARRAY(pmd_t, level2_kernel_pgt, PTRS_PER_PMD);
+static RESERVE_BRK_ARRAY(pmd_t, initial_kernel_pmd, PTRS_PER_PMD);
+static RESERVE_BRK_ARRAY(pmd_t, swapper_kernel_pmd, PTRS_PER_PMD);
+
+static __init void xen_write_cr3_init(unsigned long cr3)
+{
+ unsigned long pfn = PFN_DOWN(__pa(swapper_pg_dir));
+
+ BUG_ON(read_cr3() != __pa(initial_page_table));
+ BUG_ON(cr3 != __pa(swapper_pg_dir));
+
+ /*
+ * We are switching to swapper_pg_dir for the first time (from
+ * initial_page_table) and therefore need to mark that page
+ * read-only and then pin it.
+ *
+ * Xen disallows sharing of kernel PMDs for PAE
+ * guests. Therefore we must copy the kernel PMD from
+ * initial_page_table into a new kernel PMD to be used in
+ * swapper_pg_dir.
+ */
+ swapper_kernel_pmd =
+ extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE);
+ memcpy(swapper_kernel_pmd, initial_kernel_pmd,
+ sizeof(pmd_t) * PTRS_PER_PMD);
+ swapper_pg_dir[KERNEL_PGD_BOUNDARY] =
+ __pgd(__pa(swapper_kernel_pmd) | _PAGE_PRESENT);
+ set_page_prot(swapper_kernel_pmd, PAGE_KERNEL_RO);
+
+ set_page_prot(swapper_pg_dir, PAGE_KERNEL_RO);
+ xen_write_cr3(cr3);
+ pin_pagetable_pfn(MMUEXT_PIN_L3_TABLE, pfn);
+
+ pin_pagetable_pfn(MMUEXT_UNPIN_TABLE,
+ PFN_DOWN(__pa(initial_page_table)));
+ set_page_prot(initial_page_table, PAGE_KERNEL);
+ set_page_prot(initial_kernel_pmd, PAGE_KERNEL);
+
+ pv_mmu_ops.write_cr3 = &xen_write_cr3;
+}
__init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
unsigned long max_pfn)
{
pmd_t *kernel_pmd;
- level2_kernel_pgt = extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE);
+ initial_kernel_pmd =
+ extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE);
max_pfn_mapped = PFN_DOWN(__pa(xen_start_info->pt_base) +
xen_start_info->nr_pt_frames * PAGE_SIZE +
512*1024);
kernel_pmd = m2v(pgd[KERNEL_PGD_BOUNDARY].pgd);
- memcpy(level2_kernel_pgt, kernel_pmd, sizeof(pmd_t) * PTRS_PER_PMD);
+ memcpy(initial_kernel_pmd, kernel_pmd, sizeof(pmd_t) * PTRS_PER_PMD);
- xen_map_identity_early(level2_kernel_pgt, max_pfn);
+ xen_map_identity_early(initial_kernel_pmd, max_pfn);
- memcpy(swapper_pg_dir, pgd, sizeof(pgd_t) * PTRS_PER_PGD);
- set_pgd(&swapper_pg_dir[KERNEL_PGD_BOUNDARY],
- __pgd(__pa(level2_kernel_pgt) | _PAGE_PRESENT));
+ memcpy(initial_page_table, pgd, sizeof(pgd_t) * PTRS_PER_PGD);
+ initial_page_table[KERNEL_PGD_BOUNDARY] =
+ __pgd(__pa(initial_kernel_pmd) | _PAGE_PRESENT);
- set_page_prot(level2_kernel_pgt, PAGE_KERNEL_RO);
- set_page_prot(swapper_pg_dir, PAGE_KERNEL_RO);
+ set_page_prot(initial_kernel_pmd, PAGE_KERNEL_RO);
+ set_page_prot(initial_page_table, PAGE_KERNEL_RO);
set_page_prot(empty_zero_page, PAGE_KERNEL_RO);
pin_pagetable_pfn(MMUEXT_UNPIN_TABLE, PFN_DOWN(__pa(pgd)));
- xen_write_cr3(__pa(swapper_pg_dir));
-
- pin_pagetable_pfn(MMUEXT_PIN_L3_TABLE, PFN_DOWN(__pa(swapper_pg_dir)));
+ pin_pagetable_pfn(MMUEXT_PIN_L3_TABLE,
+ PFN_DOWN(__pa(initial_page_table)));
+ xen_write_cr3(__pa(initial_page_table));
memblock_x86_reserve_range(__pa(xen_start_info->pt_base),
__pa(xen_start_info->pt_base +
xen_start_info->nr_pt_frames * PAGE_SIZE),
"XEN PAGETABLES");
- return swapper_pg_dir;
+ return initial_page_table;
}
#endif /* CONFIG_X86_64 */
@@ -2304,7 +2343,11 @@ static const struct pv_mmu_ops xen_mmu_ops __initdata = {
.write_cr2 = xen_write_cr2,
.read_cr3 = xen_read_cr3,
+#ifdef CONFIG_X86_32
+ .write_cr3 = xen_write_cr3_init,
+#else
.write_cr3 = xen_write_cr3,
+#endif
.flush_tlb_user = xen_flush_tlb,
.flush_tlb_kernel = xen_flush_tlb,
--
1.5.6.5
--
Ian Campbell
Booze is the answer. I don't remember the question.
^ permalink raw reply related [flat|nested] 22+ messages in thread
end of thread, other threads:[~2010-11-03 15:35 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-27 8:50 [PATCH] x86: use pgd accessors when cloning a pgd range Ian Campbell
2010-10-27 10:40 ` Borislav Petkov
2010-10-27 16:50 ` Jeremy Fitzhardinge
2010-10-27 17:18 ` H. Peter Anvin
2010-10-27 17:31 ` Jeremy Fitzhardinge
2010-10-27 17:42 ` H. Peter Anvin
2010-10-27 17:51 ` Ian Campbell
2010-10-27 17:51 ` Jeremy Fitzhardinge
2010-10-27 18:02 ` Ian Campbell
2010-10-27 18:11 ` H. Peter Anvin
2010-10-27 18:51 ` Jeremy Fitzhardinge
2010-10-27 19:00 ` H. Peter Anvin
2010-10-27 19:02 ` H. Peter Anvin
2010-10-27 17:44 ` Borislav Petkov
2010-10-27 17:54 ` Jeremy Fitzhardinge
2010-10-27 17:58 ` Ian Campbell
2010-10-28 9:23 ` Ian Campbell
2010-10-28 11:20 ` Borislav Petkov
2010-10-28 11:53 ` Ian Campbell
2010-10-28 15:49 ` H. Peter Anvin
2010-10-28 15:48 ` H. Peter Anvin
2010-11-03 15:35 ` Ian Campbell
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.