* Xen's use of PAT and PV guests
@ 2010-03-30  0:35 Jeremy Fitzhardinge
  2010-03-30  7:44 ` Jan Beulich
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2010-03-30  0:35 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Xen-devel, Jan Beulich, Konrad Rzeszutek Wilk

I'm looking again at what it will take to reconcile Xen's PAT setup with 
the standard Linux one so that we can enable PAT use in pvops kernels.

Just for reference, this is the Linux vs Xen vs default PAT setups:

Index  PTE flags    Linux  Xen  Default
0                   WB     WB   WB
1              PWT  WC     WT   WT
2          PCD      UC-    UC-  UC-
3          PCD PWT  UC     UC   UC
4      PAT          WB     WC   WB
5      PAT     PWT  WC     WP   WT
6      PAT PCD      UC-    UC   UC-
7      PAT PCD PWT  UC     UC   UC


Originally I was thinking of a moderately complex scheme in which an ELF 
note on the dom0 kernel could determine the system-wide Xen PAT MSR, and 
then the kernel ELF notes on subsequent domains would determine whether 
the PAT CPU feature flag is enabled or not.

However this has several problems:

   1. it is fairly complex
   2. if dom0 sets the PAT configuration to something strange, it may
      completely break other PV guests (since it might effectively
      change the meaning of PCD+PWT globally)
   3. disabling the PAT CPU feature flag is meaningless, as its only
      effect is to say "there's no PAT, so PCD/PWT have their default
      behaviours", which is definitely not true in general

Linux only uses the first 4 PAT entries, and repeats them, effectively 
making the PAT pte flag a don't-care.  In those 4 entries, the Linux, 
Xen and Default configurations are identical aside from Linux using WC 
rather than WT.
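
The comparison above can be checked mechanically. What follows is a minimal sketch, not code from Linux or Xen: it encodes the three layouts from the table using the memory-type encodings of the IA32_PAT MSR (UC=0, WC=1, WT=4, WP=5, WB=6, UC-=7) and reports the indexes at which two layouts disagree. The names pat_msr and pat_diff_mask are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Memory-type encodings as stored in each byte of IA32_PAT. */
enum { PAT_UC = 0, PAT_WC = 1, PAT_WT = 4, PAT_WP = 5, PAT_WB = 6, PAT_UC_MINUS = 7 };

/* The three layouts from the table, indexed by PAT entry 0..7. */
const uint8_t linux_pat[8] = {
    PAT_WB, PAT_WC, PAT_UC_MINUS, PAT_UC, PAT_WB, PAT_WC, PAT_UC_MINUS, PAT_UC };
const uint8_t xen_pat[8] = {
    PAT_WB, PAT_WT, PAT_UC_MINUS, PAT_UC, PAT_WC, PAT_WP, PAT_UC, PAT_UC };
const uint8_t default_pat[8] = {
    PAT_WB, PAT_WT, PAT_UC_MINUS, PAT_UC, PAT_WB, PAT_WT, PAT_UC_MINUS, PAT_UC };

/* Pack eight entries into a 64-bit MSR image, entry i in byte i. */
uint64_t pat_msr(const uint8_t e[8])
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)e[i] << (8 * i);
    return v;
}

/* Bitmask of the indexes below n at which two layouts disagree. */
unsigned pat_diff_mask(const uint8_t a[8], const uint8_t b[8], int n)
{
    unsigned mask = 0;
    assert(n >= 0 && n <= 8);
    for (int i = 0; i < n; i++)
        if (a[i] != b[i])
            mask |= 1u << i;
    return mask;
}
```

With these tables, pat_diff_mask(linux_pat, xen_pat, 4) is 0x2 (only index 1 differs within the first four entries), and pat_msr(xen_pat) is 0x0000050100070406.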

It therefore seems to me that if I make Linux:

   1. never set the PAT flag (which it won't anyway),
   2. check that the value written to IA32_PAT is as expected, but
      otherwise ignore it, and
   3. use WT rather than WC

then it all should just work.  I'm not completely confident in the third 
point though, since I'm not quite sure about the full set of differences 
between WT and WC, and their respective interactions with the MTRR, and 
whether that would break anything.  At first glance it seems pretty safe 
though...

Thoughts?

     J

* Re: Xen's use of PAT and PV guests
  2010-03-30  0:35 Xen's use of PAT and PV guests Jeremy Fitzhardinge
@ 2010-03-30  7:44 ` Jan Beulich
  2010-03-30 17:39   ` Jeremy Fitzhardinge
  2010-03-30 16:57 ` Konrad Rzeszutek Wilk
  2010-03-30 17:56 ` Ian Campbell
  2 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2010-03-30  7:44 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Xen-devel, Keir Fraser, Konrad Rzeszutek Wilk

>>> Jeremy Fitzhardinge <jeremy@goop.org> 30.03.10 02:35 >>>
>It therefore seems to me that if I make Linux:
>
>   1. never set the PAT flag (which it won't anyway),
>   2. check that the value written to IA32_PAT is as expected, but
>      otherwise ignore it, and
>   3. use WT rather than WC
>
>then it all should just work.  I'm not completely confident in the third 
>point though, since I'm not quite sure about the full set of differences 
>between WT and WC, and their respective interactions with the MTRR, and 
>whether that would break anything.  At first glance it seems pretty safe 
>though...

No. For one, while WT is cachable (for reads), WC isn't.

Second, when the MTRRs indicate WC, using WT from PAT is not
recommended (and was earlier documented as undefined behavior).

Third, performance would likely suffer (MTRR-{WC,UC} + PAT-WT -> UC
whereas MTRR-{WC,UC} + PAT-WC -> WC).
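
The third point can be written down as a tiny lookup. This sketch covers only the combinations named above (MTRR type UC or WC, PAT type WT or WC); it is not a transcription of the full MTRR/PAT combination table in the Intel SDM, and effective_type and the MT_* names are illustrative.

```c
#include <assert.h>

enum mem_type { MT_UC, MT_WC, MT_WT, MT_OTHER };

/*
 * Effective memory type when the MTRR says UC or WC, for the two PAT
 * types compared in this thread. Every other combination is reported
 * as MT_OTHER because this sketch deliberately does not model it.
 */
enum mem_type effective_type(enum mem_type mtrr, enum mem_type pat)
{
    if (mtrr != MT_UC && mtrr != MT_WC)
        return MT_OTHER;
    switch (pat) {
    case MT_WT:
        return MT_UC;   /* MTRR-{WC,UC} + PAT-WT -> UC */
    case MT_WC:
        return MT_WC;   /* MTRR-{WC,UC} + PAT-WC -> WC */
    default:
        return MT_OTHER;
    }
}

/* The claim from the message, expressed as assertions. */
void check_thread_cases(void)
{
    assert(effective_type(MT_WC, MT_WT) == MT_UC);
    assert(effective_type(MT_WC, MT_WC) == MT_WC);
}
```

The practical consequence is that a frame buffer covered by a WC MTRR keeps write combining only if the PTE also selects WC; selecting WT would demote it to UC.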

Plus all of this would need revisiting once Linux decides to use WT
or WP.

Jan

* Re: Xen's use of PAT and PV guests
  2010-03-30  0:35 Xen's use of PAT and PV guests Jeremy Fitzhardinge
  2010-03-30  7:44 ` Jan Beulich
@ 2010-03-30 16:57 ` Konrad Rzeszutek Wilk
  2010-03-30 18:43   ` Jeremy Fitzhardinge
  2010-03-30 17:56 ` Ian Campbell
  2 siblings, 1 reply; 12+ messages in thread
From: Konrad Rzeszutek Wilk @ 2010-03-30 16:57 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Xen-devel, Keir Fraser, Jan Beulich

On Mon, Mar 29, 2010 at 05:35:57PM -0700, Jeremy Fitzhardinge wrote:
> I'm looking again at what it will take to reconcile Xen's PAT setup with  
> the standard Linux one so that we can enable PAT use in pvops kernels.
>
> Just for reference, this is the Linux vs Xen vs default PAT setups:

And this OLS 2008 paper is a good primer:

http://www.linuxsymposium.org/archives/OLS/Reprints-2008/pallipadi-reprint.pdf

>
> Index	PTE flags	Linux	Xen	Default
> 0			WB	WB	WB
> 1	        PWT	WC	WT	WT
> 2	    PCD	   	UC-	UC-	UC-
> 3	    PCD PWT	UC	UC	UC
> 4	PAT        	WB	WC	WB
> 5	PAT     PWT	WC	WP	WT
> 6	PAT PCD	   	UC-	UC	UC-
> 7	PAT PCD PWT	UC	UC	UC
>
>
> Originally I was thinking of a moderately complex scheme in which an ELF
> note on the dom0 kernel could determine the system-wide Xen PAT MSR, and
> then the kernel ELF notes on subsequent domains would determine whether  
> the PAT CPU feature flag is enabled or not.
>
> However this has several problems:
>
>   1. it is fairly complex
>   2. if dom0 sets the PAT configuration to something strange, it may
>      completely break other PV guests entirely (since it might
>      effectively change the meaning of PCD+PWT globally)

How does this work on pages shared across domains? Say Guest A makes the
page WC, Dom0 makes it WB and Xen puts it in WC, and Dom0 does a
Write/Read/Write, but in actuality it is a Read/Write/Write. Or is
there no danger there since the grant table pages have UC set on them?

>   3. disabling the PAT CPU feature flag is meaningless, as its only
>      effect is to say "there's no PAT, so PCD/PWT have their default
>      behaviours", which is definitely not true in general
>
> Linux only uses the first 4 PAT entries, and repeats them, effectively
> making the PAT pte flag a don't-care.  In those 4 entries, the Linux,  
> Xen and Default configurations are identical aside from Linux using WC  
> rather than WT.
>
> It therefore seems to me that if I make Linux:
>
>   1. never set the PAT flag (which it won't anyway),
>   2. check that the value written to IA32_PAT is as expected, but
>      otherwise ignore it, and
>   3. use WT rather than WC

That would make all writes synchronous. Why not write back?
>
> then it all should just work.  I'm not completely confident in the third  
> point though, since I'm not quite sure about the full set of differences  
> between WT and WC, and their respective interactions with the MTRR, and  
> whether that would break anything.  At first glance it seems pretty safe  
> though...

The graphics cards (and the XServer) are the ones that come to mind
as heavy users of having this "just right".  But in most (all?) cases
they want it to be UC or better UC- so that shouldn't affect this.

http://lkml.indiana.edu/hypermail/linux/kernel/9904.1/0306.html

Ah, but of course, there is an exception:

(i915_dma.c): gtt = ioremap_wc(pci_resource_start(dev->pdev, gtt_bar) + gtt_offset, gtt_size);

and then 'ttm_bo_ioremap' which does the ioremap_wc if  TTM_PL_FLAG_WC is set.

And it looks to be set by the Radeon driver (if the card is AGP) and by nouveau
on their first memory BAR.  The vmwgfx (VMware) driver sets it too, but we don't
have to worry about that.


So I think changing WC to WT would mean that graphics performance
would go down the drain. But then, there are some lingering issues with the
TTM/DRM infrastructure that need to be tracked down, and I believe
Arvind and Michael are actively looking at that.

* Re: Xen's use of PAT and PV guests
  2010-03-30  7:44 ` Jan Beulich
@ 2010-03-30 17:39   ` Jeremy Fitzhardinge
  2010-03-30 17:59     ` Keir Fraser
  0 siblings, 1 reply; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2010-03-30 17:39 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Keir Fraser, Konrad Rzeszutek Wilk

On 03/30/2010 12:44 AM, Jan Beulich wrote:
>>>> Jeremy Fitzhardinge<jeremy@goop.org>  30.03.10 02:35>>>
>>>>          
>> It therefore seems to me that if I make Linux:
>>
>>    1. never set the PAT flag (which it won't anyway),
>>    2. check that the value written to IA32_PAT is as expected, but
>>       otherwise ignore it, and
>>    3. use WT rather than WC
>>
>> then it all should just work.  I'm not completely confident in the third
>> point though, since I'm not quite sure about the full set of differences
>> between WT and WC, and their respective interactions with the MTRR, and
>> whether that would break anything.  At first glance it seems pretty safe
>> though...
>>      
> No. For one, while WT is cachable (for reads), WC isn't.
>
> Second, when the MTRRs indicate WC, using WT from PAT is not
> recommended (and was earlier documented as undefined behavior).
>    

Yes, I noticed that, and I wondered if that was why Linux is using WC, 
for max compatibility.  But presumably since it is now defined 
unconditionally, it means that all older (Intel, at least) 
implementations have that defined behaviour.

> Third, performance would likely suffer (MTRR-{WC,UC} + PAT-WT ->  UC
> whereas MTRR-{WC,UC} + PAT-WC ->  WC).
>    

Yeah.  If !pat_enabled, Linux will map a WC pte into UC-.

> Plus all of this would need revisiting once Linux decides to use WT
> or WP.
>    

Yes.

Ah, I think I know how to do it now: when constructing a PTE, remap 
Linux's PWT to Xen's PAT to end up with a WC PTE.
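
A minimal sketch of that remapping, assuming the 4K PTE bit positions PWT = bit 3, PCD = bit 4 and PAT = bit 7. The PG_* names and linux_to_xen_pte are hypothetical stand-ins for the kernel's _PAGE_* flags and the Xen make_pte hook, not the actual implementation.

```c
#include <assert.h>
#include <stdint.h>

/* 4K PTE cache-control bits (illustrative names mirroring _PAGE_PWT etc.). */
#define PG_PWT (UINT64_C(1) << 3)
#define PG_PCD (UINT64_C(1) << 4)
#define PG_PAT (UINT64_C(1) << 7)
#define PG_CACHE_BITS (PG_PAT | PG_PCD | PG_PWT)

/* Xen's PAT layout from the table at the start of the thread. */
const char *const xen_pat_name[8] = {
    "WB", "WT", "UC-", "UC", "WC", "WP", "UC", "UC" };

/* PAT index selected by a 4K PTE: PAT:PCD:PWT read as a 3-bit number. */
unsigned pat_index(uint64_t pte)
{
    return ((pte & PG_PAT) ? 4u : 0u) |
           ((pte & PG_PCD) ? 2u : 0u) |
           ((pte & PG_PWT) ? 1u : 0u);
}

/* Linux WC (PWT alone, index 1) becomes Xen WC (PAT alone, index 4). */
uint64_t linux_to_xen_pte(uint64_t pte)
{
    if ((pte & PG_CACHE_BITS) == PG_PWT)
        pte = (pte & ~PG_PWT) | PG_PAT;
    return pte;
}

/* A Linux WC PTE must land on a Xen entry that is also WC. */
void remap_self_check(void)
{
    assert(pat_index(linux_to_xen_pte(PG_PWT)) == 4);
}
```

The other three Linux encodings (0, PCD, PCD|PWT) pass through untouched, since Xen's entries 0, 2 and 3 already match Linux's.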

Does Xen guarantee that PAT is always available to vcpus as part of its 
ABI (i.e., do we support any pre-PAT CPUs)?

Also, I'm assuming Xen's PAT entries 6 and 7 are reserved, in case Intel 
defines 2 and 3?

     J

* Re: Xen's use of PAT and PV guests
  2010-03-30  0:35 Xen's use of PAT and PV guests Jeremy Fitzhardinge
  2010-03-30  7:44 ` Jan Beulich
  2010-03-30 16:57 ` Konrad Rzeszutek Wilk
@ 2010-03-30 17:56 ` Ian Campbell
  2010-03-30 21:47   ` Jeremy Fitzhardinge
  2 siblings, 1 reply; 12+ messages in thread
From: Ian Campbell @ 2010-03-30 17:56 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Konrad Rzeszutek Wilk, Xen-devel, Keir Fraser, Jan Beulich

On Tue, 2010-03-30 at 01:35 +0100, Jeremy Fitzhardinge wrote:

> It therefore seems to me that if I make Linux:
> 
>    1. never set the PAT flag (which it won't anyway),
>    2. check that the value written to IA32_PAT is as expected, but
>       otherwise ignore it, and
>    3. use WT rather than WC
> 
> then it all should just work.

I had a patch ages ago (which I have now lost) that caused the kernel to
read back the PAT MSR after writing it and try and locate a suitable
entry for each cache setting it was interested in (with fallbacks as
appropriate) to use dynamically thereafter.

This has the nice property that Linux could write what it really wanted
to the PAT register but it would then read and use whatever it actually
ended up with.
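
A rough sketch of such a read-back scheme, under stated assumptions: the MSR value is passed in rather than read with rdmsr, the PG_* names stand in for the kernel's _PAGE_* flags, and the fallback order for write combining (WC, then UC-, then UC) is an illustrative choice. None of this is taken from the lost patch.

```c
#include <assert.h>
#include <stdint.h>

/* Memory-type encodings as stored in each byte of IA32_PAT. */
enum { PAT_UC = 0, PAT_WC = 1, PAT_WT = 4, PAT_WP = 5, PAT_WB = 6, PAT_UC_MINUS = 7 };

/* 4K PTE cache-control bits (illustrative names). */
#define PG_PWT (1u << 3)
#define PG_PCD (1u << 4)
#define PG_PAT (1u << 7)

/* Index of the first PAT entry holding `type`, or -1 if there is none. */
int find_pat_index(uint64_t pat, unsigned type)
{
    for (int i = 0; i < 8; i++)
        if (((pat >> (8 * i)) & 0x7) == type)
            return i;
    return -1;
}

/* Convert a PAT index (0..7) into the PTE bits that select it. */
unsigned pat_index_to_pte_bits(int idx)
{
    assert(idx >= 0 && idx < 8);
    return ((idx & 1) ? PG_PWT : 0) |
           ((idx & 2) ? PG_PCD : 0) |
           ((idx & 4) ? PG_PAT : 0);
}

/* PTE bits to use for write combining, given the PAT value read back. */
unsigned wc_pte_bits(uint64_t pat)
{
    static const unsigned prefs[] = { PAT_WC, PAT_UC_MINUS, PAT_UC };

    for (unsigned i = 0; i < sizeof(prefs) / sizeof(prefs[0]); i++) {
        int idx = find_pat_index(pat, prefs[i]);
        if (idx >= 0)
            return pat_index_to_pte_bits(idx);
    }
    return PG_PCD | PG_PWT;   /* last resort: index 3, UC in every layout in this thread */
}
```

Fed Xen's layout (0x0000050100070406), wc_pte_bits picks index 4 and returns PG_PAT; fed the power-on default layout, which has no WC entry, it falls back to UC- at index 2 and returns PG_PCD.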

I'm not sure that this scheme is at all upstreamable though.

Ian.

* Re: Xen's use of PAT and PV guests
  2010-03-30 17:39   ` Jeremy Fitzhardinge
@ 2010-03-30 17:59     ` Keir Fraser
  2010-03-30 18:25       ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 12+ messages in thread
From: Keir Fraser @ 2010-03-30 17:59 UTC (permalink / raw)
  To: Jeremy Fitzhardinge, Jan Beulich; +Cc: Xen-devel, Konrad Rzeszutek Wilk

On 30/03/2010 18:39, "Jeremy Fitzhardinge" <jeremy@goop.org> wrote:

> Does Xen guarantee that PAT is always available to vcpus as part of its
> ABI  (ie, do we support any pre-PAT cpus?).

As it happens I think we do only support CPUs that have PAT. But you can
always check CPUID, just like running natively.
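
For reference, the PAT feature flag is bit 16 of EDX in CPUID leaf 1. The sketch below is a user-space illustration using the compiler-provided <cpuid.h> helper on GCC or Clang, not kernel code; edx_has_pat and cpu_has_pat_feature are made-up names, and on non-x86 builds the query simply reports that it cannot tell.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* CPUID.01H:EDX bit 16 advertises the Page Attribute Table. */
#define CPUID_EDX_PAT (UINT32_C(1) << 16)

/* Pure helper: does this EDX value from leaf 1 advertise PAT? */
int edx_has_pat(uint32_t edx)
{
    return (edx & CPUID_EDX_PAT) != 0;
}

/* Returns 1 or 0, or -1 when CPUID leaf 1 cannot be queried here. */
int cpu_has_pat_feature(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return -1;
    return edx_has_pat(edx);
#else
    return -1;
#endif
}

/* Sanity check of the bit test itself. */
void pat_bit_self_check(void)
{
    assert(edx_has_pat(CPUID_EDX_PAT));
    assert(!edx_has_pat(0));
}
```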

 -- Keir

* Re: Re: Xen's use of PAT and PV guests
  2010-03-30 17:59     ` Keir Fraser
@ 2010-03-30 18:25       ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2010-03-30 18:25 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Xen-devel, Dave McCracken, Jan Beulich, Konrad Rzeszutek Wilk

On 03/30/2010 10:59 AM, Keir Fraser wrote:
> As it happens I think we do only support CPUs that have PAT. But you can
> always check CPUID, just like running natively.
>    

Yeah, I wasn't going to remove any of the tests, but I was wondering if 
the guest can always assume that it can set the pat flags in the pte.  I 
guess that since Xen's first four entries match the default PAT (= no PAT 
at all), so long as the guest doesn't try to set _PAGE_PAT, it 
doesn't matter.

Unfortunately hugetlbfs adds a wart, since it appears to end up going 
down to the make_pte/pte_val path, but we can't tell whether it's a page 
with _PAGE_PSE or _PAGE_PAT set...

     J

* Re: Re: Xen's use of PAT and PV guests
  2010-03-30 16:57 ` Konrad Rzeszutek Wilk
@ 2010-03-30 18:43   ` Jeremy Fitzhardinge
  2010-03-31  8:26     ` Jan Beulich
  0 siblings, 1 reply; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2010-03-30 18:43 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Xen-devel, Keir Fraser, Jan Beulich

On 03/30/2010 09:57 AM, Konrad Rzeszutek Wilk wrote:
> On Mon, Mar 29, 2010 at 05:35:57PM -0700, Jeremy Fitzhardinge wrote:
>    
>> I'm looking again at what it will take to reconcile Xen's PAT setup with
>> the standard Linux one so that we can enable PAT use in pvops kernels.
>>
>> Just for reference, this is the Linux vs Xen vs default PAT setups:
>>      
> And this OLS 2008 paper is a good primer:
>
> http://www.linuxsymposium.org/archives/OLS/Reprints-2008/pallipadi-reprint.pdf
>    

Thanks, just what I was looking for.

>> Index	PTE flags	Linux	Xen	Default
>> 0			WB	WB	WB
>> 1	        PWT	WC	WT	WT
>> 2	    PCD	   	UC-	UC-	UC-
>> 3	    PCD PWT	UC	UC	UC
>> 4	PAT        	WB	WC	WB
>> 5	PAT     PWT	WC	WP	WT
>> 6	PAT PCD	   	UC-	UC	UC-
>> 7	PAT PCD PWT	UC	UC	UC
>>
>>
>> Originally I was thinking of a moderately complex scheme in which an ELF
>> note on the dom0 kernel could determine the system-wide Xen PAT MSR, and
>> then the kernel ELF notes on subsequent domains would determine whether
>> the PAT CPU feature flag is enabled or not.
>>
>> However this has several problems:
>>
>>    1. it is fairly complex
>>    2. if dom0 sets the PAT configuration to something strange, it may
>>       completely break other PV guests entirely (since it might
>>       effectively change the meaning of PCD+PWT globally)
>>      
> How does this work on pages shared across domains? Say Guest A makes the
> page WC, Dom0 makes it WB and Xen puts it in WC, and Dom0 does a
> Write/Read/Write, but in actuality it is a Read/Write/Write. Or is
> there no danger there since the grant table pages have UC set on them?
>    

Not sure.  That would invoke undefined behaviour, I'd assume.  Does Xen 
keep track of memory type aliases?  Grant pages don't have to be UC do 
they?  Pages between front and backends don't need to be (and shouldn't 
be) UC.

> The graphics cards (and the XServer) are the ones that come to mind
> as heavy users of having this "just right".  But in most (all?) cases
> they want it to be UC or better UC- so that shouldn't affect this.
>    

Hm, I don't want to try out-guessing Linux's use of all these modes; we 
need to either get it right or not try.

     J

* Re: Xen's use of PAT and PV guests
  2010-03-30 17:56 ` Ian Campbell
@ 2010-03-30 21:47   ` Jeremy Fitzhardinge
  2010-03-31  8:31     ` Ian Campbell
  0 siblings, 1 reply; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2010-03-30 21:47 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Xen-devel, Keir Fraser, Jan Beulich, Konrad Rzeszutek Wilk

On 03/30/2010 10:56 AM, Ian Campbell wrote:
> I had a patch ages ago (which I have now lost) that caused the kernel to
> read back the PAT MSR after writing it and try and locate a suitable
> entry for each cache setting it was interested in (with fallbacks as
> appropriate) to use dynamically thereafter.
>
> This has the nice property that Linux could write what it really wanted
> to the PAT register but it would then read and use whatever it actually
> ended up with.
>

I started to implement something like that, but stopped and decided to 
do a much simpler hack.  I wanted to make sure that make_pte and pte_val 
are proper inverses of each other, so pte_val needs to do the reverse 
mapping from xen pte pat flags -> linux pte pat flags.  Given that the 
only difference between Xen and Linux is whether _PAGE_PWT means WT or 
WC, it is easy to do the mapping forward and back.
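
The inverse pair might look roughly like the following. This is a sketch under the assumptions stated in the message (Linux never sets the PAT bit itself, and the only difference is the meaning of PWT alone); to_xen and to_linux are hypothetical names for the make_pte and pte_val directions, and the PG_* constants stand in for _PAGE_PWT, _PAGE_PCD and _PAGE_PAT.

```c
#include <assert.h>
#include <stdint.h>

/* 4K PTE cache-control bits (illustrative names). */
#define PG_PWT (UINT64_C(1) << 3)
#define PG_PCD (UINT64_C(1) << 4)
#define PG_PAT (UINT64_C(1) << 7)
#define PG_CACHE_BITS (PG_PAT | PG_PCD | PG_PWT)

/* make_pte direction: Linux WC (PWT alone) -> Xen WC (PAT alone). */
uint64_t to_xen(uint64_t pte)
{
    if ((pte & PG_CACHE_BITS) == PG_PWT)
        pte = (pte & ~PG_PWT) | PG_PAT;
    return pte;
}

/* pte_val direction: Xen WC (PAT alone) -> Linux WC (PWT alone). */
uint64_t to_linux(uint64_t pte)
{
    if ((pte & PG_CACHE_BITS) == PG_PAT)
        pte = (pte & ~PG_PAT) | PG_PWT;
    return pte;
}

/*
 * Returns 1 if to_linux(to_xen(x)) == x for all four cache encodings
 * Linux uses, combined with the given non-cache bits.
 */
int round_trips(uint64_t other_bits)
{
    const uint64_t linux_enc[4] = { 0, PG_PWT, PG_PCD, PG_PCD | PG_PWT };

    assert((other_bits & PG_CACHE_BITS) == 0);
    for (int i = 0; i < 4; i++) {
        uint64_t pte = other_bits | linux_enc[i];
        if (to_linux(to_xen(pte)) != pte)
            return 0;
    }
    return 1;
}
```

The round trip only holds for the four encodings Linux actually generates: a PTE that already has PAT alone set going in would come back as PWT alone, which is why the scheme depends on Linux leaving the PAT bit clear.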

But hugetlbfs adds the complication that it ends up constructing ptes 
via the pte operations (which makes logical sense), but on x86 that's a 
mess because the meaning of _PAGE_PAT changes to _PAGE_PSE on level 
2/3/4 entries, and there's no way of knowing what level we're looking at.
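
To make the ambiguity concrete: bit 7 is the PAT bit in a 4K PTE but the PSE (large page) bit one level up, where the PAT bit moves to bit 12. The sketch below uses made-up names (PG_BIT7, PG_PAT_LARGE, entry_pat_bit) and ignores the fact that bit 7 is reserved in top-level entries; it only illustrates why the level has to be known.

```c
#include <assert.h>
#include <stdint.h>

#define PG_BIT7      (UINT64_C(1) << 7)    /* PAT in a 4K PTE, PSE in higher levels */
#define PG_PAT_LARGE (UINT64_C(1) << 12)   /* PAT bit of a large-page entry */

/*
 * PAT index bit 2 of an entry at the given paging level (1 = 4K PTE).
 * Returns -1 for entries that point to a lower-level table, since those
 * carry no PAT bit.
 */
int entry_pat_bit(uint64_t entry, int level)
{
    assert(level >= 1 && level <= 4);
    if (level == 1)
        return (entry & PG_BIT7) != 0;
    if (entry & PG_BIT7)                   /* PSE set: large-page mapping */
        return (entry & PG_PAT_LARGE) != 0;
    return -1;
}
```

The same raw value with only bit 7 set decodes as "PAT bit set" at level 1 and as "large page, PAT bit clear" at level 2, so a hook that sees just the value cannot apply the PWT/PAT translation safely.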

At the moment I'm winging it by ignoring _PAGE_PAT in make_pte, and hoping 
that nobody wants to do pte_val on a hugetlbfs pte...

Looks like the proper fix is to stop hugetlbfs from using mk_pte, and 
add a mk_huge_tlb instead, and have the x86 version use the pmd operations.

> I'm not sure that this scheme is at all upstreamable though.
>

I don't see why not; it would all be hidden away in the Xen code, and 
maintains the normal x86 illusion.   It's just a matter of hooking 
wrmsr, make_pte and pte_val, which we do already.

     J

* Re: Re: Xen's use of PAT and PV guests
  2010-03-30 18:43   ` Jeremy Fitzhardinge
@ 2010-03-31  8:26     ` Jan Beulich
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2010-03-31  8:26 UTC (permalink / raw)
  To: Jeremy Fitzhardinge, Konrad Rzeszutek Wilk; +Cc: Xen-devel, Keir Fraser

>>> Jeremy Fitzhardinge <jeremy@goop.org> 30.03.10 20:43 >>>
>On 03/30/2010 09:57 AM, Konrad Rzeszutek Wilk wrote:
>> How does this work on pages shared across domains? Say Guest A makes the
>> page WC, Dom0 makes it WB and Xen puts it in WC, and Dom0 does a
>> Write/Read/Write, but in actuality it is a Read/Write/Write. Or is
>> there no danger there since the grant table pages have UC set on them?
>>    
>
>Not sure.  That would invoke undefined behaviour, I'd assume.  Does Xen 
>keep track of memory type aliases?  Grant pages don't have to be UC do 
>they?  Pages between front and backends don't need to be (and shouldn't 
>be) UC.

Granted pages are RAM pages, and hence should always be WB
everywhere.

As to Xen's memory type handling - iirc the most recent memory type
used in a page table entry determines what Xen uses in its 1:1 mapping,
but I don't think global consistency is being enforced.

Jan

* Re: Xen's use of PAT and PV guests
  2010-03-30 21:47   ` Jeremy Fitzhardinge
@ 2010-03-31  8:31     ` Ian Campbell
  2010-03-31 16:55       ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 12+ messages in thread
From: Ian Campbell @ 2010-03-31  8:31 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Jan Beulich, Xen-devel, Keir Fraser, Konrad Rzeszutek Wilk

[-- Attachment #1: Type: text/plain, Size: 585 bytes --]

On Tue, 2010-03-30 at 22:47 +0100, Jeremy Fitzhardinge wrote:
> 
> > I'm not sure that this scheme is at all upstreamable though.
> >
> 
> I don't see why not; it would all be hidden away in the Xen code, and 
> maintains the normal x86 illusion.   It's just a matter of hooking 
> wrmsr, make_pte and pte_val, which we do already. 

I was referring to my patch which wasn't at all hidden away in the Xen
code. Anyway I've found and attached it for your amusement, it's not
nearly as fully baked as I remembered ;-) In my defence the date stamp
on the patch file is May 2009...

Ian.


[-- Attachment #2: pat --]
[-- Type: text/x-patch, Size: 9467 bytes --]

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 54cb697..ef947d8 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -68,14 +68,22 @@
 			 _PAGE_DIRTY)
 
 /* Set of bits not changed in pte_modify */
-#define _PAGE_CHG_MASK	(PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT |		\
+#define _PAGE_CHG_MASK	(PTE_PFN_MASK | _PAGE_PAT | _PAGE_PCD | _PAGE_PWT | \
 			 _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY)
 
-#define _PAGE_CACHE_MASK	(_PAGE_PCD | _PAGE_PWT)
-#define _PAGE_CACHE_WB		(0)
-#define _PAGE_CACHE_WC		(_PAGE_PWT)
-#define _PAGE_CACHE_UC_MINUS	(_PAGE_PCD)
-#define _PAGE_CACHE_UC		(_PAGE_PCD | _PAGE_PWT)
+#define _PAGE_CACHE_MASK	(_PAGE_PAT | _PAGE_PCD | _PAGE_PWT)
+
+#ifndef __ASSEMBLY__
+extern unsigned __page_cache_wb;
+extern unsigned __page_cache_wc;
+extern unsigned __page_cache_uc_minus;
+extern unsigned __page_cache_uc;
+#endif
+
+#define _PAGE_CACHE_WB		__page_cache_wb
+#define _PAGE_CACHE_WC		__page_cache_wc
+#define _PAGE_CACHE_UC_MINUS	__page_cache_uc_minus
+#define _PAGE_CACHE_UC		__page_cache_uc
 
 #define PAGE_NONE	__pgprot(_PAGE_PROTNONE | _PAGE_ACCESSED)
 #define PAGE_SHARED	__pgprot(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index c66dda1..0137c05 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -304,6 +304,8 @@ void __init get_mtrr_state(void)
 	unsigned lo, dummy;
 	unsigned long flags;
 
+	printk(KERN_CRIT "pat: made it to get_mtrr_state\n");
+
 	vrs = mtrr_state.var_ranges;
 
 	rdmsr(MTRRcap_MSR, lo, dummy);
@@ -336,6 +338,7 @@ void __init get_mtrr_state(void)
 	local_irq_save(flags);
 	prepare_set();
 
+	printk(KERN_CRIT "pat: get_mtrr_state calling pat_init()\n");
 	pat_init();
 
 	post_set();
@@ -614,6 +617,7 @@ static void generic_set_all(void)
 	mask = set_mtrr_state();
 
 	/* also set PAT */
+	printk(KERN_CRIT "pat: generic_set_all() calling pat_init()\n");
 	pat_init();
 
 	post_set();
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index fd5ac04..6eee754 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -623,6 +623,8 @@ void __init mtrr_bp_init(void)
 {
 	u32 phys_addr;
 
+	printk(KERN_CRIT "pat: made it to mtrr_bp_init\n");
+
 	init_ifs();
 
 	phys_addr = 32;
@@ -690,16 +692,22 @@ void __init mtrr_bp_init(void)
 	}
 
 	if (mtrr_if) {
+		printk(KERN_CRIT "pat: in mtrr_bp_init and mtrr_if is true\n");
+
 		num_var_ranges = mtrr_if->num_var_ranges();
 		init_table();
 		if (use_intel()) {
+			printk(KERN_CRIT "pat: mtrr_bp_init calling get_mtrr_state\n");
 			get_mtrr_state();
 
 			if (mtrr_cleanup(phys_addr)) {
 				changed_by_mtrr_cleanup = 1;
+				printk(KERN_CRIT "pat: mtrr_bp_init calling %pF via set_all hook\n", &mtrr_if->set_all);
 				mtrr_if->set_all();
 			}
 
+		} else {
+			printk(KERN_CRIT "pat: in mtrr_bp_init but not use_intel()\n");
 		}
 	}
 }
@@ -720,6 +728,7 @@ void mtrr_ap_init(void)
 	 */
 	local_irq_save(flags);
 
+	printk(KERN_CRIT "pat: mtrr_ap_init calling %pF via set_all hook\n", &mtrr_if->set_all);
 	mtrr_if->set_all();
 
 	local_irq_restore(flags);
diff --git a/arch/x86/kernel/cpu/mtrr/xen.c b/arch/x86/kernel/cpu/mtrr/xen.c
index 50a45db..63f86aa 100644
--- a/arch/x86/kernel/cpu/mtrr/xen.c
+++ b/arch/x86/kernel/cpu/mtrr/xen.c
@@ -15,6 +15,8 @@ static void xen_set_mtrr(unsigned int reg, unsigned long base,
 	struct xen_platform_op op;
 	int error;
 
+	printk(KERN_CRIT "pat: xen_set_mtrr\n");
+
 	/* mtrr_ops->set() is called once per CPU,
 	 * but Xen's ops apply to all CPUs.
 	 */
@@ -87,7 +89,7 @@ static struct mtrr_ops xen_mtrr_ops = {
 	.get_free_region   = xen_get_free_region,
 	.validate_add_page = generic_validate_add_page,
 	.have_wrcomb       = positive_have_wrcomb,
-	.use_intel_if	   = 0,
+	.use_intel_if	   = 1/*0*/,
 	.num_var_ranges	   = xen_num_var_ranges,
 };
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 1b1c851..f704078 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -834,6 +834,7 @@ void __init setup_arch(char **cmdline_p)
 	/* preallocate 4k for mptable mpc */
 	early_reserve_e820_mpc_new();
 	/* update e820 for memory not covered by WB MTRRs */
+	printk(KERN_CRIT "pat: calling mtrr_bp_init from setup_arch()\n");
 	mtrr_bp_init();
 	if (mtrr_trim_uncached_memory(max_pfn))
 		max_pfn = e820_end_of_ram_pfn();
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 8a45093..3c13d64 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -141,18 +141,12 @@ int ioremap_change_attr(unsigned long vaddr, unsigned long size,
 	unsigned long nrpages = size >> PAGE_SHIFT;
 	int err;
 
-	switch (prot_val) {
-	case _PAGE_CACHE_UC:
-	default:
-		err = _set_memory_uc(vaddr, nrpages);
-		break;
-	case _PAGE_CACHE_WC:
+	if (prot_val == _PAGE_CACHE_WC)
 		err = _set_memory_wc(vaddr, nrpages);
-		break;
-	case _PAGE_CACHE_WB:
+	else if (prot_val == _PAGE_CACHE_WB)
 		err = _set_memory_wb(vaddr, nrpages);
-		break;
-	}
+	else /* _PAGE_CACHE_UC or "other" */
+		err = _set_memory_uc(vaddr, nrpages);
 
 	return err;
 }
@@ -256,21 +250,14 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
 		prot_val = new_prot_val;
 	}
 
-	switch (prot_val) {
-	case _PAGE_CACHE_UC:
-	default:
-		prot = PAGE_KERNEL_IO_NOCACHE;
-		break;
-	case _PAGE_CACHE_UC_MINUS:
-		prot = PAGE_KERNEL_IO_UC_MINUS;
-		break;
-	case _PAGE_CACHE_WC:
+	if (prot_val == _PAGE_CACHE_WC)
 		prot = PAGE_KERNEL_IO_WC;
-		break;
-	case _PAGE_CACHE_WB:
+	else if (prot_val == _PAGE_CACHE_WB)
 		prot = PAGE_KERNEL_IO;
-		break;
-	}
+	else if (prot_val == _PAGE_CACHE_UC_MINUS)
+		prot = PAGE_KERNEL_IO_UC_MINUS;
+	else /* _PAGE_CACHE_UC or "other" */
+		prot = PAGE_KERNEL_IO_NOCACHE;
 
 	/*
 	 * Ok, go for it..
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 3f7886f..f85a273 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -61,6 +61,11 @@ __setup("debugpat", pat_debug_setup);
 
 static u64 __read_mostly boot_pat_state;
 
+unsigned __page_cache_wb;
+unsigned __page_cache_wc;
+unsigned __page_cache_uc_minus;
+unsigned __page_cache_uc;
+
 enum {
 	PAT_UC = 0,		/* uncached */
 	PAT_WC = 1,		/* Write combining */
@@ -70,15 +75,43 @@ enum {
 	PAT_UC_MINUS = 7,	/* UC, but can be overriden by MTRR */
 };
 
+const char *pat_labels[8] = {
+	"broken",
+	"broken",
+	"broken",
+	"broken",
+	"broken",
+	"broken",
+	"broken",
+	"broken"
+};
+
 #define PAT(x, y)	((u64)PAT_ ## y << ((x)*8))
 
+static unsigned find_pat_index(const u64 pat, int mode)
+{
+	u64 mask;
+	int i;
+
+	for (mask = 0x7, i = 0; i < 8; mask <<= 8, i++) {
+		if (((pat & mask)>>(i*8)) == mode)
+			return i;
+	}
+	return -1; /* OR??? */
+}
+
 void pat_init(void)
 {
 	u64 pat;
 
-	if (!pat_enabled)
+	printk(KERN_CRIT "%s\n", __func__);
+	dump_stack();
+	if (!pat_enabled) {
+		printk(KERN_CRIT "%s pat not enabled\n", __func__);
 		return;
+	}
 
+	printk(KERN_CRIT "%s:%d\n", __func__, __LINE__);
 	if (!cpu_has_pat) {
 		if (!boot_pat_state) {
 			pat_disable("PAT not supported by CPU.");
@@ -95,6 +128,8 @@ void pat_init(void)
 		}
 	}
 
+	printk(KERN_CRIT "%s:%d\n", __func__, __LINE__);
+
 	/* Set PWT to Write-Combining. All other bits stay the same */
 	/*
 	 * PTE encoding used in Linux:
@@ -105,32 +140,41 @@ void pat_init(void)
 	 *      000 WB		_PAGE_CACHE_WB
 	 *      001 WC		_PAGE_CACHE_WC
 	 *      010 UC-		_PAGE_CACHE_UC_MINUS
-	 *      011 UC		_PAGE_CACHE_UC
-	 * PAT bit unused
+	 *      011 UC		_PAGE_CACHE_UC PAT bit unused
 	 */
 	pat = PAT(0, WB) | PAT(1, WC) | PAT(2, UC_MINUS) | PAT(3, UC) |
-	      PAT(4, WB) | PAT(5, WC) | PAT(6, UC_MINUS) | PAT(7, UC);
+		PAT(4, WB) | PAT(5, WC) | PAT(6, UC_MINUS) | PAT(7, UC);
 
 	/* Boot CPU check */
 	if (!boot_pat_state)
 		rdmsrl(MSR_IA32_CR_PAT, boot_pat_state);
 
 	wrmsrl(MSR_IA32_CR_PAT, pat);
+
+
+	/* Now see what we actually got */
+	rdmsrl(MSR_IA32_CR_PAT, pat);
+	__page_cache_wb = find_pat_index(pat, PAT_WB);
+	__page_cache_wc = find_pat_index(pat, PAT_WC);
+	__page_cache_uc_minus = find_pat_index(pat, PAT_UC_MINUS);
+	__page_cache_uc = find_pat_index(pat, PAT_UC);
+
+	pat_labels[__page_cache_wb] = "write-back";
+	pat_labels[__page_cache_wc] = "write-combining";
+	pat_labels[__page_cache_uc_minus] = "uncached-minus";
+	pat_labels[__page_cache_uc] = "uncached";
+
 	printk(KERN_INFO "x86 PAT enabled: cpu %d, old 0x%Lx, new 0x%Lx\n",
 	       smp_processor_id(), boot_pat_state, pat);
+	printk(KERN_INFO "Indexes: WB:%d; WC:%d; UC-:%d; UC:%d\n",
+	       __page_cache_wb, __page_cache_wc, __page_cache_uc_minus, __page_cache_uc);
 }
 
 #undef PAT
 
 static char *cattr_name(unsigned long flags)
 {
-	switch (flags & _PAGE_CACHE_MASK) {
-	case _PAGE_CACHE_UC:		return "uncached";
-	case _PAGE_CACHE_UC_MINUS:	return "uncached-minus";
-	case _PAGE_CACHE_WB:		return "write-back";
-	case _PAGE_CACHE_WC:		return "write-combining";
-	default:			return "broken";
-	}
+	return pat_labels[flags & _PAGE_CACHE_MASK];
 }
 
 /*
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index e099e44..232ed3e 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1121,7 +1121,7 @@ asmlinkage void __init xen_start_kernel(void)
 		xen_start_info->console.domU.evtchn = 0;
 	}
 
-	pat_disable("PAT disabled on Xen");
+	//pat_disable("PAT disabled on Xen");
 
 	xen_raw_console_write("about to get started...\n");
 

* Re: Xen's use of PAT and PV guests
  2010-03-31  8:31     ` Ian Campbell
@ 2010-03-31 16:55       ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2010-03-31 16:55 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Xen-devel, Keir Fraser, Jan Beulich, Konrad Rzeszutek Wilk

On 03/31/2010 01:31 AM, Ian Campbell wrote:
> I was referring to my patch which wasn't at all hidden away in the Xen
> code. Anyway I've found and attached it for your amusement, it's not
> nearly as fully baked as I remembered ;-) In my defence the date stamp
> on the patch file is May 2009...
>    

Urk, yeah, that's going to be much harder to make fly.  Conversion at 
the make_pte/pte_val level gets us most of the way there with less intrusion.

     J
