Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
@ 2009-05-25  8:42 Tomasz Chmielewski
  2009-05-25  9:15 ` Andi Kleen
  0 siblings, 1 reply; 20+ messages in thread
From: Tomasz Chmielewski @ 2009-05-25  8:42 UTC (permalink / raw)
  To: LKML
  Cc: mingo, Alan, Gerd Hoffmann, jeremy, Andi Kleen, H. Peter Anvin,
	JBeulich, jbarnes

Ingo Molnar wrote:

>>>> Oops, the third "proper technical solutions" is missing.
>>>
>>> Yeah, the third one is to not touch MTRRs after bootup and use PAT.
>>
>> Works only in case the CPU has PAT support.
> 
> Which specific CPU without PAT support do you worry about?

Are these ones (below) good CPU to worry about? That's slightly 
off-topic as they are KVM guests, but still, the guest kernel seems to 
disable PAT (unless I'm mistaken).

KVM-86 + 2.6.27.19 guest:

# dmesg
Initializing cgroup subsys cpuset
Linux version 2.6.27.19-server-1mnb (qateam@titan.mandriva.com) (gcc 
version 4.3.2 (GCC) ) #1 SMP Thu Mar 5 00:09:04 EST 2009
PAT WC disabled due to known CPU erratum.
(...)

# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 2
model name      : QEMU Virtual CPU version 0.10.50
stepping        : 3
cpu MHz         : 2133.285
cache size      : 2048 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 syscall lm up pni
bogomips        : 4266.57
clflush size    : 64
power management:

KVM-85 + 2.6.29.1 guest:

# dmesg
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.29.1-server-4mnb (herton@n2.mandriva.com) (gcc version 
4.3.2 (GCC) ) #1 SMP Mon Apr 20 20:10:24 EDT 2009
KERNEL supported cpus:
   Intel GenuineIntel
   AMD AuthenticAMD
   NSC Geode by NSC
   Cyrix CyrixInstead
   Centaur CentaurHauls
   Transmeta GenuineTMx86
   Transmeta TransmetaCPU
   UMC UMC UMC UMC
PAT WC disabled due to known CPU erratum.
(...)

# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 2
model name      : QEMU Virtual CPU version 0.10.0
stepping        : 3
cpu MHz         : 2128.004
cache size      : 2048 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 syscall nx lm up pni hypervisor
bogomips        : 4256.00
clflush size    : 64
power management:

-- 
Tomasz Chmielewski
http://wpkg.org

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-25  8:42 [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation Tomasz Chmielewski
@ 2009-05-25  9:15 ` Andi Kleen
  2009-05-25  9:31   ` Jan Beulich
  2009-05-25 16:05   ` H. Peter Anvin
  0 siblings, 2 replies; 20+ messages in thread
From: Andi Kleen @ 2009-05-25  9:15 UTC (permalink / raw)
  To: Tomasz Chmielewski
  Cc: LKML, mingo, Alan, Gerd Hoffmann, jeremy, Andi Kleen,
	H. Peter Anvin, JBeulich, jbarnes

> Are these ones (below) good CPU to worry about? That's slightly 
> off-topic as they are KVM guests, but still, the guest kernel seems to 
> disable PAT (unless I'm mistaken).

To my knowledge the only CPUs with broken PAT are some very old Pentium
Pros dating from back the time when PAT was new and noone used it.
On those it should be disabled or the high bits not used.

The kernel is overall too conservative with its white list.

Today a lot of other common operating systems use PAT extensively,
so in general it works on the hardware level.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-25  9:15 ` Andi Kleen
@ 2009-05-25  9:31   ` Jan Beulich
  2009-05-25  9:47     ` Andi Kleen
  2009-05-25 16:05   ` H. Peter Anvin
  1 sibling, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2009-05-25  9:31 UTC (permalink / raw)
  To: Andi Kleen, Tomasz Chmielewski
  Cc: mingo, jeremy, Alan, Gerd Hoffmann, LKML, jbarnes, H. Peter Anvin

>>> Andi Kleen <andi@firstfloor.org> 25.05.09 11:15 >>>
>To my knowledge the only CPUs with broken PAT are some very old Pentium
>Pros dating from back the time when PAT was new and noone used it.
>On those it should be disabled or the high bits not used.

??? See PentiumIII erratum E27.

Jan


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-25  9:31   ` Jan Beulich
@ 2009-05-25  9:47     ` Andi Kleen
  0 siblings, 0 replies; 20+ messages in thread
From: Andi Kleen @ 2009-05-25  9:47 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andi Kleen, Tomasz Chmielewski, mingo, jeremy, Alan,
	Gerd Hoffmann, LKML, jbarnes, H. Peter Anvin

On Mon, May 25, 2009 at 10:31:03AM +0100, Jan Beulich wrote:
> >>> Andi Kleen <andi@firstfloor.org> 25.05.09 11:15 >>>
> >To my knowledge the only CPUs with broken PAT are some very old Pentium
> >Pros dating from back the time when PAT was new and noone used it.
> >On those it should be disabled or the high bits not used.
> 
> ??? See PentiumIII erratum E27.

Thanks for the correction.  

It's still all pretty old CPUs.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-25  9:15 ` Andi Kleen
  2009-05-25  9:31   ` Jan Beulich
@ 2009-05-25 16:05   ` H. Peter Anvin
  1 sibling, 0 replies; 20+ messages in thread
From: H. Peter Anvin @ 2009-05-25 16:05 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Tomasz Chmielewski, LKML, mingo, Alan, Gerd Hoffmann, jeremy,
	JBeulich, jbarnes

Andi Kleen wrote:
> 
> To my knowledge the only CPUs with broken PAT are some very old Pentium
> Pros dating from back the time when PAT was new and noone used it.
> On those it should be disabled or the high bits not used.
> 
> The kernel is overall too conservative with its white list.
> 
> Today a lot of other common operating systems use PAT extensively,
> so in general it works on the hardware level.
> 

Well, we don't use the high bits exactly for this reason.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-19 11:08                           ` Ingo Molnar
  2009-05-19 12:04                             ` Gerd Hoffmann
@ 2009-05-20 16:35                             ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 20+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-20 16:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jan Beulich, the arch/x86 maintainers, Thomas Gleixner,
	Linus Torvalds, Xen-devel, Linux Kernel Mailing List,
	Jesse Barnes, Eric W. Biederman, H. Peter Anvin

Ingo Molnar wrote:
> * Jan Beulich <JBeulich@novell.com> wrote:
>
>   
>>>>> Ingo Molnar <mingo@elte.hu> 19.05.09 11:59 >>>
>>>>>           
>>> Exactly what is 'bizarre' about using the API defined by the 
>>> _CPU_ already, without adding any ad-hoc hypecall? Catch the 
>>> dom0 WRMSRs, filter out the MTRR indices - that's it.
>>>       
>> But that is *not* the same as using the hypercalls: The hypercall 
>> tells Xen "Change all CPUs' MTRRs with the indicated index to the 
>> indicated value", while the MSR write says "Change the MTRR with 
>> the given index on the physical CPU the current virtual CPU 
>> happens to run on to the given value". [...]
>>     
>
> The change of MTRR's on _any_ of the guest CPUs in a dom0 context 
> should immediately be refected on all CPUs. Assymetric MTRR settings 
> are madness.
>
> ( And the thing is, changing MTRRs is fragile and racy on native 
>   Linux no matter what - even without any hypervisors - due to SMM 
>   contexts possibly relying on them etc. )
>
>   
>> [...] A write-base/write-mask pair may happen to get interrupted 
>> (preempted) by the hypervisor, and hence the two writes may happen 
>> on different pCPU-s. Teaching the hypervisor to (correctly!) guess 
>> what the guest meant in that situation isn't trivial, as then it 
>> needs to handle all possible situations (and it can never know 
>> whether Dom0 really intended to do something that may look 
>> bogus/inconsistent at the first glance). [...]
>>     
>
> None of this is a problem really if a sane approach is used: a 
> change to the MTRR state on dom0 is applied symmetrically on all 
> CPUs.
>
> Or, alternatively, the hypervisor can expose its own administrative 
> interface to manage MTRRs.
>
> There's no need to fuglify the Linux kernel for that.

I'm not sure what you mean by that, other than as a description of the 
current case.  The Xen MTRR hypercall:

   1. treats MTRR ranges as allocatable resources, and keep track of how
      many uses there are of each
   2. updates all physical cpus synchronously (ie, the MTRR is not
      presented as a property of dom0's virtual CPU, but as a
      system-wide resource)
   3. prevents guests from setting inconsistent or conflicting MTRRs

Mapping from MSR writes to this interface is moderately complex, because 
it requires a mapping from a low-semantic-content interface to a 
high-semantic-content interface.  It essentially requires parsing the 
MSR writes to map them back to the relatively high-level operations at 
the mtrr_ops interface and then present that to Xen.

There are at least a couple of secondary issues which arise from that 
approach:

    * mtrr/generic.c also has to do a number of other things like
      disabling caching, tlb flushes, etc.  That adds complexity because
      Xen guests are never allowed to globally disable caching, so we'd
      have to add additional filtering to remove those cr0 writes
    * As we've discussed, we'd need to make the mtrr writes implicitly
      change all cpus atomically, as the dom0 kernel can't see physical cpus


The net effect would be that we would be making a pile of apparently 
generic CPU operations (MSR writes, control register writes) actually 
feed a fairly complex parser, increasing the difference between the Xen 
and native cases even more.

mtrr/generic.c about 730 lines of fairly intricate arch-specific code.  
mtrr/xen.c is 120 lines of straightforward hypercalls.  The mtrr_ops 
interface and the Xen hypercall interface are a close semantic match, so 
there's very little glue code in there.


But that said, this a huge distraction, an unbelievable amount of noise 
for a fairly minor point.  We can live without these changes, and 
they're certainly easy enough to carry out of tree in the meantime.  If 
you can't live with these changes, then drop them and we'll work out 
something else.

    J

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-19 15:24                                           ` Ingo Molnar
@ 2009-05-20  8:01                                             ` Gerd Hoffmann
  0 siblings, 0 replies; 20+ messages in thread
From: Gerd Hoffmann @ 2009-05-20  8:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jan Beulich, Jeremy Fitzhardinge, Xen-devel,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Jesse Barnes, Eric W. Biederman, H. Peter Anvin, Thomas Gleixner,
	Linus Torvalds

On 05/19/09 17:24, Ingo Molnar wrote:
> the Xen hypervisor can simply repeat all requests (i.e. not care at
> all about the fact that a guest does these modifications on all CPUs
> it sees), or realize that the modification has already been done and
> skip it.

Could be done, yes.  It still feels wrong that wrmsr(mtrr) works 
slightly different on xen and on native.  And it wouldn't work on 
existing Xen deployments as the Xen hypervisor doesn't support that today.

>>>>> Yeah, the third one is to not touch MTRRs after bootup and use PAT.

> That's a really old CPU, but even Coppermine has PAT support in the
> CPU. You need to go back to things like P5 200 MHz CPUs to find
> PAT-less CPUs.

Linux shouln't say "PAT not supported by CPU." then.

Also it doesn't make sense to me to handle things differently on native 
and xen.  While it might make sense to deprecate mtrrs in favor of PAT 
(don't know enougth about all the different cpus in the wild to justify 
that) I don't think it makes sense to do that for xen only.  Native 
should declare mtrrs obsolete as well.

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-19 14:55                                         ` Gerd Hoffmann
@ 2009-05-19 15:24                                           ` Ingo Molnar
  2009-05-20  8:01                                             ` Gerd Hoffmann
  0 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2009-05-19 15:24 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Jan Beulich, Jeremy Fitzhardinge, Xen-devel,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Jesse Barnes, Eric W. Biederman, H. Peter Anvin, Thomas Gleixner,
	Linus Torvalds

* Gerd Hoffmann <kraxel@redhat.com> wrote:

>>>>> No, the linux kernel probably should do the wrmsr on one cpu only then.
>>>> Why?
>
>> | The change of MTRR's on _any_ of the guest CPUs in a dom0 context
>> | should immediately be refected on all CPUs. Assymetric MTRR
>> | settings are madness.
>
> Exactly.  And thats why it is pointless to let the dom0 kernel 
> write the mtrr msrs on more than one cpu.

How does this negate my claim that the Linux kernel needs no 
modifications? (which i think your point is - let me know if you 
have some other point here.)

the Xen hypervisor can simply repeat all requests (i.e. not care at 
all about the fact that a guest does these modifications on all CPUs 
it sees), or realize that the modification has already been done and 
skip it. This is in no way a performance sensitive codepath - mtrrs 
are only modified in init sequences - and setting mtrrs is slow and 
globally serialized to begin with.

>>>>> Oops, the third "proper technical solutions" is missing.
>>>> Yeah, the third one is to not touch MTRRs after bootup and use PAT.
>>> Works only in case the CPU has PAT support.
>>
>> Which specific CPU without PAT support do you worry about?
>
> For example: I have a Notebook here, with MTRR and without PAT 
> according to the boot log.  "Pentium III (Coppermine)" says 
> /proc/cpuinfo.

That's a really old CPU, but even Coppermine has PAT support in the 
CPU. You need to go back to things like P5 200 MHz CPUs to find 
PAT-less CPUs.

On the Linux side it's easy to enable it (and _such_ a patch would 
make sense indeed) - it's just that nobody has yet gone through the 
effort of testing and validating the PAT code on that CPU. If you 
care enough, you can do it, send a patch and ping the Intel folks 
about it.

Once the issues are framed correctly, Linux is helped for real.

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-19 14:17                                       ` Ingo Molnar
@ 2009-05-19 14:55                                         ` Gerd Hoffmann
  2009-05-19 15:24                                           ` Ingo Molnar
  0 siblings, 1 reply; 20+ messages in thread
From: Gerd Hoffmann @ 2009-05-19 14:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jan Beulich, Jeremy Fitzhardinge, Xen-devel,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Jesse Barnes, Eric W. Biederman, H. Peter Anvin, Thomas Gleixner,
	Linus Torvalds

>>>> No, the linux kernel probably should do the wrmsr on one cpu only then.
>>> Why?

> | The change of MTRR's on _any_ of the guest CPUs in a dom0 context
> | should immediately be refected on all CPUs. Assymetric MTRR
> | settings are madness.

Exactly.  And thats why it is pointless to let the dom0 kernel write the 
mtrr msrs on more than one cpu.

>>>> Oops, the third "proper technical solutions" is missing.
>>> Yeah, the third one is to not touch MTRRs after bootup and use PAT.
>> Works only in case the CPU has PAT support.
>
> Which specific CPU without PAT support do you worry about?

For example:  I have a Notebook here, with MTRR and without PAT 
according to the boot log.  "Pentium III (Coppermine)" says /proc/cpuinfo.

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-19 13:51                                     ` Gerd Hoffmann
@ 2009-05-19 14:17                                       ` Ingo Molnar
  2009-05-19 14:55                                         ` Gerd Hoffmann
  0 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2009-05-19 14:17 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Jan Beulich, Jeremy Fitzhardinge, Xen-devel,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Jesse Barnes, Eric W. Biederman, H. Peter Anvin, Thomas Gleixner,
	Linus Torvalds


* Gerd Hoffmann <kraxel@redhat.com> wrote:

> On 05/19/09 15:31, Ingo Molnar wrote:
>> * Gerd Hoffmann<kraxel@redhat.com>  wrote:
>>
>>> On 05/19/09 14:26, Ingo Molnar wrote:
>>>> * Gerd Hoffmann<kraxel@redhat.com>   wrote:
>>>>
>>>>> On 05/19/09 13:08, Ingo Molnar wrote:
>>>>>> Or, alternatively, the hypervisor can expose its own administrative
>>>>>> interface to manage MTRRs.
>>>>> Guess what?  Xen does exactly that.  And the xen mtrr_ops
>>>>> implementation uses that interface ...
>>>> No, that is not an 'administrative interface' - that is a guest
>>>> kernel level hack that complicates Linux, extends its effective ABI
>>>> dependencies and which has to be maintained there from that point
>>>> on.
>>>>
>>>> There's really just three proper technical solutions here:
>>>>
>>>> - either catch the lowlevel CPU hw ops (the MSR modifications, which
>>>>     isnt really all that different from the mtrr_ops approach so it
>>>>     shouldnt pose undue difficulties to the Xen hypervisor).
>>> Devil is in the details.
>>>
>>> The dom0 kernel might not see all physical cpus on the system.  So
>>> Xen can't leave the job of looping over all cpus to the dom0
>>> kernel, Xen has to apply the changes made by the (priviledged)
>>> guest kernel on any (virtual) cpu to all (physical) cpus in the
>>> machine.
>>
>> Applying MTRR changes to only part of the CPUs is utter madness.
>
> Sure.  Do you read what I'm writing?
>
>>> Which in turn means the "lowlevel cpu hw op" would work in a
>>> slightly different way on Xen and native.  Nasty.
>>>
>>>>     That will
>>>>     be maximally transparent and compatible, with zero changes needed
>>>>     to the Linux kernel.
>>> No, the linux kernel probably should do the wrmsr on one cpu only then.
>>
>> Why?
>
> See above.  Xen has to apply the changes to all cpus anyway.

do _you_ read what i wrote, in the thread you are replying to:

|
| The change of MTRR's on _any_ of the guest CPUs in a dom0 context 
| should immediately be refected on all CPUs. Assymetric MTRR 
| settings are madness.
|


>>> Oops, the third "proper technical solutions" is missing.
>>
>> Yeah, the third one is to not touch MTRRs after bootup and use PAT.
>
> Works only in case the CPU has PAT support.

Which specific CPU without PAT support do you worry about?

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-19 13:31                                   ` Ingo Molnar
@ 2009-05-19 13:51                                     ` Gerd Hoffmann
  2009-05-19 14:17                                       ` Ingo Molnar
  0 siblings, 1 reply; 20+ messages in thread
From: Gerd Hoffmann @ 2009-05-19 13:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jan Beulich, Jeremy Fitzhardinge, Xen-devel,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Jesse Barnes, Eric W. Biederman, H. Peter Anvin, Thomas Gleixner,
	Linus Torvalds

On 05/19/09 15:31, Ingo Molnar wrote:
> * Gerd Hoffmann<kraxel@redhat.com>  wrote:
>
>> On 05/19/09 14:26, Ingo Molnar wrote:
>>> * Gerd Hoffmann<kraxel@redhat.com>   wrote:
>>>
>>>> On 05/19/09 13:08, Ingo Molnar wrote:
>>>>> Or, alternatively, the hypervisor can expose its own administrative
>>>>> interface to manage MTRRs.
>>>> Guess what?  Xen does exactly that.  And the xen mtrr_ops
>>>> implementation uses that interface ...
>>> No, that is not an 'administrative interface' - that is a guest
>>> kernel level hack that complicates Linux, extends its effective ABI
>>> dependencies and which has to be maintained there from that point
>>> on.
>>>
>>> There's really just three proper technical solutions here:
>>>
>>> - either catch the lowlevel CPU hw ops (the MSR modifications, which
>>>     isnt really all that different from the mtrr_ops approach so it
>>>     shouldnt pose undue difficulties to the Xen hypervisor).
>> Devil is in the details.
>>
>> The dom0 kernel might not see all physical cpus on the system.  So
>> Xen can't leave the job of looping over all cpus to the dom0
>> kernel, Xen has to apply the changes made by the (priviledged)
>> guest kernel on any (virtual) cpu to all (physical) cpus in the
>> machine.
>
> Applying MTRR changes to only part of the CPUs is utter madness.

Sure.  Do you read what I'm writing?

>> Which in turn means the "lowlevel cpu hw op" would work in a
>> slightly different way on Xen and native.  Nasty.
>>
>>>     That will
>>>     be maximally transparent and compatible, with zero changes needed
>>>     to the Linux kernel.
>> No, the linux kernel probably should do the wrmsr on one cpu only then.
>
> Why?

See above.  Xen has to apply the changes to all cpus anyway.

>> Oops, the third "proper technical solutions" is missing.
>
> Yeah, the third one is to not touch MTRRs after bootup and use PAT.

Works only in case the CPU has PAT support.

cheers,
   Gerd


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-19 13:21                                 ` Gerd Hoffmann
@ 2009-05-19 13:31                                   ` Ingo Molnar
  2009-05-19 13:51                                     ` Gerd Hoffmann
  0 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2009-05-19 13:31 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Jan Beulich, Jeremy Fitzhardinge, Xen-devel,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Jesse Barnes, Eric W. Biederman, H. Peter Anvin, Thomas Gleixner,
	Linus Torvalds


* Gerd Hoffmann <kraxel@redhat.com> wrote:

> On 05/19/09 14:26, Ingo Molnar wrote:
>> * Gerd Hoffmann<kraxel@redhat.com>  wrote:
>>
>>> On 05/19/09 13:08, Ingo Molnar wrote:
>>>> Or, alternatively, the hypervisor can expose its own administrative
>>>> interface to manage MTRRs.
>>> Guess what?  Xen does exactly that.  And the xen mtrr_ops
>>> implementation uses that interface ...
>>
>> No, that is not an 'administrative interface' - that is a guest
>> kernel level hack that complicates Linux, extends its effective ABI
>> dependencies and which has to be maintained there from that point
>> on.
>>
>> There's really just three proper technical solutions here:
>>
>> - either catch the lowlevel CPU hw ops (the MSR modifications, which
>>    isnt really all that different from the mtrr_ops approach so it
>>    shouldnt pose undue difficulties to the Xen hypervisor).
>
> Devil is in the details.
>
> The dom0 kernel might not see all physical cpus on the system.  So 
> Xen can't leave the job of looping over all cpus to the dom0 
> kernel, Xen has to apply the changes made by the (priviledged) 
> guest kernel on any (virtual) cpu to all (physical) cpus in the 
> machine.

Applying MTRR changes to only part of the CPUs is utter madness.

> Which in turn means the "lowlevel cpu hw op" would work in a 
> slightly different way on Xen and native.  Nasty.
>
>>    That will
>>    be maximally transparent and compatible, with zero changes needed
>>    to the Linux kernel.
>
> No, the linux kernel probably should do the wrmsr on one cpu only then.

Why?

>> - or introduce its own hypercall API based administration API,
>>    without bothering the guest kernel with crap. Trivially patch Xorg
>>    to make use of it and that's it.
>
> I have serious doubts that this is going to fly with KMS.
>
> Oops, the third "proper technical solutions" is missing.

Yeah, the third one is to not touch MTRRs after bootup and use PAT.

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-19 12:26                               ` Ingo Molnar
  2009-05-19 12:32                                 ` Alan Cox
@ 2009-05-19 13:21                                 ` Gerd Hoffmann
  2009-05-19 13:31                                   ` Ingo Molnar
  1 sibling, 1 reply; 20+ messages in thread
From: Gerd Hoffmann @ 2009-05-19 13:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jan Beulich, Jeremy Fitzhardinge, Xen-devel,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Jesse Barnes, Eric W. Biederman, H. Peter Anvin, Thomas Gleixner,
	Linus Torvalds

On 05/19/09 14:26, Ingo Molnar wrote:
> * Gerd Hoffmann<kraxel@redhat.com>  wrote:
>
>> On 05/19/09 13:08, Ingo Molnar wrote:
>>> Or, alternatively, the hypervisor can expose its own administrative
>>> interface to manage MTRRs.
>> Guess what?  Xen does exactly that.  And the xen mtrr_ops
>> implementation uses that interface ...
>
> No, that is not an 'administrative interface' - that is a guest
> kernel level hack that complicates Linux, extends its effective ABI
> dependencies and which has to be maintained there from that point
> on.
>
> There's really just three proper technical solutions here:
>
> - either catch the lowlevel CPU hw ops (the MSR modifications, which
>    isnt really all that different from the mtrr_ops approach so it
>    shouldnt pose undue difficulties to the Xen hypervisor).

Devil is in the details.

The dom0 kernel might not see all physical cpus on the system.  So Xen 
can't leave the job of looping over all cpus to the dom0 kernel, Xen has 
to apply the changes made by the (priviledged) guest kernel on any 
(virtual) cpu to all (physical) cpus in the machine.

Which in turn means the "lowlevel cpu hw op" would work in a slightly 
different way on Xen and native.  Nasty.

>    That will
>    be maximally transparent and compatible, with zero changes needed
>    to the Linux kernel.

No, the linux kernel probably should do the wrmsr on one cpu only then.

> - or introduce its own hypercall API based administration API,
>    without bothering the guest kernel with crap. Trivially patch Xorg
>    to make use of it and that's it.

I have serious doubts that this is going to fly with KMS.

Oops, the third "proper technical solutions" is missing.

cheers,
   Gerd


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-19 12:32                                 ` Alan Cox
@ 2009-05-19 12:37                                   ` Ingo Molnar
  0 siblings, 0 replies; 20+ messages in thread
From: Ingo Molnar @ 2009-05-19 12:37 UTC (permalink / raw)
  To: Alan Cox
  Cc: Gerd Hoffmann, Jeremy Fitzhardinge, Xen-devel,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Jesse Barnes, Jan Beulich, H. Peter Anvin, Thomas Gleixner,
	Linus Torvalds, Eric W. Biederman


* Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:

> > - or introduce its own hypercall API based administration API,
> >   without bothering the guest kernel with crap. Trivially patch Xorg
> >   to make use of it and that's it.
> 
> PCI pass through mixed with that isn't going to be fun and PCI 
> pass through is one area where a lot of the MTRR manipulation is 
> meaningful and valid (and can be handled for the guest).

virtualizing that aspect of the CPU properly is certainly the 
highest grade approach.

> I really don't see why rd/wrmsr processing isn't sufficient for 
> this

/me neither.

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-19 12:26                               ` Ingo Molnar
@ 2009-05-19 12:32                                 ` Alan Cox
  2009-05-19 12:37                                   ` Ingo Molnar
  2009-05-19 13:21                                 ` Gerd Hoffmann
  1 sibling, 1 reply; 20+ messages in thread
From: Alan Cox @ 2009-05-19 12:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Gerd Hoffmann, Jeremy Fitzhardinge, Xen-devel,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Jesse Barnes, Jan Beulich, H. Peter Anvin, Thomas Gleixner,
	Linus Torvalds, Eric W. Biederman

> - or introduce its own hypercall API based administration API,
>   without bothering the guest kernel with crap. Trivially patch Xorg
>   to make use of it and that's it.

PCI pass through mixed with that isn't going to be fun and PCI pass
through is one area where a lot of the MTRR manipulation is meaningful
and valid (and can be handled for the guest).

I really don't see why rd/wrmsr processing isn't sufficient for this

Alan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-19 12:04                             ` Gerd Hoffmann
@ 2009-05-19 12:26                               ` Ingo Molnar
  2009-05-19 12:32                                 ` Alan Cox
  2009-05-19 13:21                                 ` Gerd Hoffmann
  0 siblings, 2 replies; 20+ messages in thread
From: Ingo Molnar @ 2009-05-19 12:26 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Jan Beulich, Jeremy Fitzhardinge, Xen-devel,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Jesse Barnes, Eric W. Biederman, H. Peter Anvin, Thomas Gleixner,
	Linus Torvalds


* Gerd Hoffmann <kraxel@redhat.com> wrote:

> On 05/19/09 13:08, Ingo Molnar wrote:
>> Or, alternatively, the hypervisor can expose its own administrative
>> interface to manage MTRRs.
>
> Guess what?  Xen does exactly that.  And the xen mtrr_ops 
> implementation uses that interface ...

No, that is not an 'administrative interface' - that is a guest 
kernel level hack that complicates Linux, extends its effective ABI 
dependencies and which has to be maintained there from that point 
on.

There's really just three proper technical solutions here:

- either catch the lowlevel CPU hw ops (the MSR modifications, which
  isnt really all that different from the mtrr_ops approach so it
  shouldnt pose undue difficulties to the Xen hypervisor). That will
  be maximally transparent and compatible, with zero changes needed 
  to the Linux kernel.

- or introduce its own hypercall API based administration API,
  without bothering the guest kernel with crap. Trivially patch Xorg
  to make use of it and that's it.

There is absolutely no reason to introduce some intermediate crap 
into Linux.

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-19 11:08                           ` Ingo Molnar
@ 2009-05-19 12:04                             ` Gerd Hoffmann
  2009-05-19 12:26                               ` Ingo Molnar
  2009-05-20 16:35                             ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 20+ messages in thread
From: Gerd Hoffmann @ 2009-05-19 12:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jan Beulich, Jeremy Fitzhardinge, Xen-devel,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Jesse Barnes, Eric W. Biederman, H. Peter Anvin, Thomas Gleixner,
	Linus Torvalds

On 05/19/09 13:08, Ingo Molnar wrote:
> Or, alternatively, the hypervisor can expose its own administrative
> interface to manage MTRRs.

Guess what?  Xen does exactly that.  And the xen mtrr_ops implementation 
uses that interface ...

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-19 10:22                         ` [Xen-devel] " Jan Beulich
@ 2009-05-19 11:08                           ` Ingo Molnar
  2009-05-19 12:04                             ` Gerd Hoffmann
  2009-05-20 16:35                             ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 20+ messages in thread
From: Ingo Molnar @ 2009-05-19 11:08 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Jeremy Fitzhardinge, the arch/x86 maintainers, Thomas Gleixner,
	Linus Torvalds, Xen-devel, Linux Kernel Mailing List,
	Jesse Barnes, Eric W. Biederman, H. Peter Anvin


* Jan Beulich <JBeulich@novell.com> wrote:

> >>> Ingo Molnar <mingo@elte.hu> 19.05.09 11:59 >>>
>
> > Exactly what is 'bizarre' about using the API defined by the 
> > _CPU_ already, without adding any ad-hoc hypecall? Catch the 
> > dom0 WRMSRs, filter out the MTRR indices - that's it.
> 
> But that is *not* the same as using the hypercalls: The hypercall 
> tells Xen "Change all CPUs' MTRRs with the indicated index to the 
> indicated value", while the MSR write says "Change the MTRR with 
> the given index on the physical CPU the current virtual CPU 
> happens to run on to the given value". [...]

The change of MTRR's on _any_ of the guest CPUs in a dom0 context 
should immediately be refected on all CPUs. Assymetric MTRR settings 
are madness.

( And the thing is, changing MTRRs is fragile and racy on native 
  Linux no matter what - even without any hypervisors - due to SMM 
  contexts possibly relying on them etc. )

> [...] A write-base/write-mask pair may happen to get interrupted 
> (preempted) by the hypervisor, and hence the two writes may happen 
> on different pCPU-s. Teaching the hypervisor to (correctly!) guess 
> what the guest meant in that situation isn't trivial, as then it 
> needs to handle all possible situations (and it can never know 
> whether Dom0 really intended to do something that may look 
> bogus/inconsistent at the first glance). [...]

None of this is a problem really if a sane approach is used: a 
change to the MTRR state on dom0 is applied symmetrically on all 
CPUs.

Or, alternatively, the hypervisor can expose its own administrative 
interface to manage MTRRs.

There's no need to fuglify the Linux kernel for that.

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-19  9:59                       ` Ingo Molnar
@ 2009-05-19 10:22                         ` Jan Beulich
  2009-05-19 11:08                           ` Ingo Molnar
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2009-05-19 10:22 UTC (permalink / raw)
  To: Ingo Molnar, Jeremy Fitzhardinge
  Cc: the arch/x86 maintainers, Thomas Gleixner, Linus Torvalds,
	Xen-devel, Linux Kernel Mailing List, Jesse Barnes,
	Eric W. Biederman, H. Peter Anvin

>>> Ingo Molnar <mingo@elte.hu> 19.05.09 11:59 >>>
>Exactly what is 'bizarre' about using the API defined by the _CPU_ 
>already, without adding any ad-hoc hypecall? Catch the dom0 WRMSRs, 
>filter out the MTRR indices - that's it.

But that is *not* the same as using the hypercalls: The hypercall tells Xen
"Change all CPUs' MTRRs with the indicated index to the indicated value",
while the MSR write says "Change the MTRR with the given index on the
physical CPU the current virtual CPU happens to run on to the given value".
A write-base/write-mask pair may happen to get interrupted (preempted)
by the hypervisor, and hence the two writes may happen on different
pCPU-s. Teaching the hypervisor to (correctly!) guess what the guest
meant in that situation isn't trivial, as then it needs to handle all possible
situations (and it can never know whether Dom0 really intended to do
something that may look bogus/inconsistent at the first glance). This is
even more so considering that Linux is not the only OS capable of
running as Dom0.

An apparently relatively simple solution - latch the writes and commit them
only when the global view became consistent again - isn't possible,
since Dom0 may not have a vCPU running on each individual pCPU.

Jan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-18  8:59                   ` Ingo Molnar
@ 2009-05-18 13:17                     ` Jan Beulich
  2009-05-18 18:07                     ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 20+ messages in thread
From: Jan Beulich @ 2009-05-18 13:17 UTC (permalink / raw)
  To: Ingo Molnar, Jeremy Fitzhardinge
  Cc: the arch/x86 maintainers, Thomas Gleixner, Linus Torvalds,
	Xen-devel, Linux Kernel Mailing List, Jesse Barnes,
	Eric W. Biederman, H. Peter Anvin

>>> Ingo Molnar <mingo@elte.hu> 18.05.09 10:59 >>>
>Here Xen invades an already fragile piece of upstream code 
>(/proc/mtrr) that is obsolete and on the way out. If you want a 
>solution you should add PAT support to Xen and you should use recent 

As Jeremy pointed out a number of times, Xen *does* have PAT support,
perhaps (that's my personal opinion) even superior to the Linux one, as
it doesn't redefine the 486-inherited caching mode attributes but rather
uses the full 3 bits that the hardware provides (and, as an
acknowledgement to the various hardware bugs, makes sure not to use
any large page mappings when using non-WB mappings).

>upstream kernels. Or you should emulate /proc/mtrr in _Xen the 
>hypervisor_, if you really care that much - without increasing the 
>amount of crap in Linux.

As Jeremy also pointed out previously, emulating the MTRRs in the
hypervisor is very undesirable (and technically at least very close to
impossible), as we're talking about the *real* MTTRs that need managing
here (whereas dealing with virtualized MTRRs in a fully virtualized guest
is a completely different - and very reasonable - thing).

I can only support Jeremy in asking that you please reconsider your NAK.

Jan

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2009-05-25 16:06 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-05-25  8:42 [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation Tomasz Chmielewski
2009-05-25  9:15 ` Andi Kleen
2009-05-25  9:31   ` Jan Beulich
2009-05-25  9:47     ` Andi Kleen
2009-05-25 16:05   ` H. Peter Anvin
  -- strict thread matches above, loose matches on Subject: below --
2009-05-12 23:27 Jeremy Fitzhardinge
2009-05-13 13:30 ` Ingo Molnar
2009-05-13 14:39   ` Jeremy Fitzhardinge
2009-05-15 18:27     ` Ingo Molnar
2009-05-15 20:09       ` Jeremy Fitzhardinge
2009-05-15 23:26         ` Eric W. Biederman
2009-05-15 23:49           ` Jeremy Fitzhardinge
2009-05-16  3:22             ` Jesse Barnes
2009-05-16  4:26               ` Eric W. Biederman
2009-05-18  4:57                 ` Jeremy Fitzhardinge
2009-05-18  8:59                   ` Ingo Molnar
2009-05-18 13:17                     ` [Xen-devel] " Jan Beulich
2009-05-18 18:07                     ` Jeremy Fitzhardinge
2009-05-19  9:59                       ` Ingo Molnar
2009-05-19 10:22                         ` [Xen-devel] " Jan Beulich
2009-05-19 11:08                           ` Ingo Molnar
2009-05-19 12:04                             ` Gerd Hoffmann
2009-05-19 12:26                               ` Ingo Molnar
2009-05-19 12:32                                 ` Alan Cox
2009-05-19 12:37                                   ` Ingo Molnar
2009-05-19 13:21                                 ` Gerd Hoffmann
2009-05-19 13:31                                   ` Ingo Molnar
2009-05-19 13:51                                     ` Gerd Hoffmann
2009-05-19 14:17                                       ` Ingo Molnar
2009-05-19 14:55                                         ` Gerd Hoffmann
2009-05-19 15:24                                           ` Ingo Molnar
2009-05-20  8:01                                             ` Gerd Hoffmann
2009-05-20 16:35                             ` Jeremy Fitzhardinge

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).