* PV guests and APIC interaction
@ 2018-10-03 11:56 Andrew Cooper
  2018-10-04 10:45 ` Jan Beulich
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Cooper @ 2018-10-03 11:56 UTC (permalink / raw)
  To: Xen-devel List
  Cc: Juergen Gross, Sergey Dyasli, Boris Ostrovsky, Jan Beulich,
	Igor Druzhinin

Hello,

A bug has recently been discovered internally, where a 4.14 dom0 was
observed to be doing this:

(XEN) [   16.035377] emul-priv-op.c:1166:d0v0 Domain attempted WRMSR 0000001b from 0x00000000fee00d00 to 0x00000000fee00100
(XEN) [   16.035392] emul-priv-op.c:1166:d0v0 Domain attempted WRMSR 0000001b from 0x00000000fee00d00 to 0x00000000fee00900
...
(XEN) [   18.798336] emul-priv-op.c:1166:d0v1 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
(XEN) [   18.798350] emul-priv-op.c:1166:d0v1 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800

This is dom0 finding x2apic enabled in the APIC, and trying to cycle it
around to xapic mode, and raises multiple issues.
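
For anyone not fluent in reading APIC_BASE values, the writes in the log can be decoded with a short sketch like the one below. It is illustrative only and not Xen or Linux code: the function name is invented, and the bit positions (BSP = bit 8, EXTD = bit 10, EN = bit 11) are taken from the architectural layout of MSR 0x1b.

```python
APIC_BASE_BSP = 1 << 8    # bootstrap processor flag
APIC_BASE_EXTD = 1 << 10  # x2APIC mode enable
APIC_BASE_EN = 1 << 11    # APIC global enable


def decode_apic_base(value):
    """Split an APIC_BASE (MSR 0x1b) value into base, BSP flag and mode."""
    enabled = bool(value & APIC_BASE_EN)
    extended = bool(value & APIC_BASE_EXTD)
    if enabled and extended:
        mode = "x2apic"
    elif enabled:
        mode = "xapic"
    elif extended:
        mode = "invalid"  # EXTD without EN is not a legal combination
    else:
        mode = "disabled"
    return {
        "base": value & ~0xfff,
        "bsp": bool(value & APIC_BASE_BSP),
        "mode": mode,
    }


# The values seen in the log above, in order of appearance.
for raw in (0xfee00d00, 0xfee00100, 0xfee00900,
            0xfee00c00, 0xfee00000, 0xfee00800):
    info = decode_apic_base(raw)
    print(f"{raw:#010x}: mode={info['mode']:<8} bsp={info['bsp']}")
```

Read this way, each vcpu starts from an x2apic value, writes a disabled value, and then writes an xapic value. The "from" value is the same in both lines for a given vcpu, which is consistent with the first write having had no effect.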

First and foremost, PV guests don't have an APIC and shouldn't be
playing with it at all.

It turns out that Xen advertises the hardware APIC bit to PV guests,
which isn't necessarily always set.  On top of that, the default
read/write-ignore behaviour of MSR lets Linux get into a position where
it thinks it is actually making real changes to the APIC mode.

Architecturally speaking, if we offer the APIC bit, we should honour
read/write requests correctly.  Obviously, this isn't a viable option -
hiding the APIC bit and raising #GP's is the only
architecturally-correct way to do this.
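
To make the difference between the two behaviours concrete, here is a toy model. It is a sketch under stated assumptions, not Xen's implementation: the class names and the guest routine are hypothetical, and the "guest" simply performs the disable-then-enable sequence seen in the log.

```python
APIC_BASE_EXTD = 1 << 10  # x2APIC mode enable
APIC_BASE_EN = 1 << 11    # APIC global enable


class GeneralProtectionFault(Exception):
    """Stands in for a #GP delivered to the guest."""


class WriteIgnoredMsr:
    """Reads return the stored value; writes are silently dropped."""

    def __init__(self, value):
        self.value = value

    def read(self):
        return self.value

    def write(self, new_value):
        pass  # dropped, with no indication to the guest


class FaultingMsr:
    """Every access faults, as for an MSR the guest does not have."""

    def read(self):
        raise GeneralProtectionFault("RDMSR 0x1b")

    def write(self, new_value):
        raise GeneralProtectionFault("WRMSR 0x1b")


def guest_switch_to_xapic(msr):
    """Hypothetical guest routine: leave x2APIC via the disabled state."""
    value = msr.read()
    msr.write(value & ~(APIC_BASE_EN | APIC_BASE_EXTD))  # disable
    msr.write((value & ~APIC_BASE_EXTD) | APIC_BASE_EN)  # enable as xAPIC
    return msr.read()


# With write-ignore, the routine runs to completion but nothing changed.
print(hex(guest_switch_to_xapic(WriteIgnoredMsr(0xfee00d00))))

# With #GP, the guest finds out at the first access.
try:
    guest_switch_to_xapic(FaultingMsr())
except GeneralProtectionFault as fault:
    print("guest took #GP on", fault)
```

In the write-ignore case the routine completes without any error even though the mode never changed, which is the position Linux ends up in above.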

Given that we've already played "how much does Linux explode if it
thinks there is no APIC", does anyone have any suggestions for how to
resolve this without breaking Linux?

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* Re: PV guests and APIC interaction
  2018-10-03 11:56 PV guests and APIC interaction Andrew Cooper
@ 2018-10-04 10:45 ` Jan Beulich
  2018-10-04 13:20   ` Andrew Cooper
  0 siblings, 1 reply; 4+ messages in thread
From: Jan Beulich @ 2018-10-04 10:45 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Juergen Gross, Igor Druzhinin, Boris Ostrovsky, Xen-devel List,
	Sergey Dyasli

>>> On 03.10.18 at 13:56, <andrew.cooper3@citrix.com> wrote:
> A bug has recently been discovered internally, where a 4.14 dom0 was
> observed to be doing this:
> 
> (XEN) [   16.035377] emul-priv-op.c:1166:d0v0 Domain attempted WRMSR 
> 0000001b from 0x00000000fee00d00 to 0x00000000fee00100
> (XEN) [   16.035392] emul-priv-op.c:1166:d0v0 Domain attempted WRMSR 
> 0000001b from 0x00000000fee00d00 to 0x00000000fee00900
> ...
> (XEN) [   18.798336] emul-priv-op.c:1166:d0v1 Domain attempted WRMSR 
> 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
> (XEN) [   18.798350] emul-priv-op.c:1166:d0v1 Domain attempted WRMSR 
> 0000001b from 0x00000000fee00c00 to 0x00000000fee00800
> 
> This is dom0 finding x2apic enabled in the APIC, and trying to cycle it
> around to xapic mode, and raises multiple issues.
> 
> First and foremost, PV guests don't have an APIC and shouldn't be
> playing with it at all.

This is the crucial point, imo. It is one of the downsides of the pv-ops
approach (allowing a single kernel binary to be used both without and
with Xen) that code like that dealing with the LAPIC can't simply be
compiled out to make sure it can't possibly be reached.

> It turns out that Xen advertises the hardware APIC bit to PV guests,
> which isn't necessarily always set.  On top of that, the default
> read/write-ignore behaviour of MSR lets Linux get into a position where
> it thinks it is actually making real changes to the APIC mode.
> 
> Architecturally speaking, if we offer the APIC bit, we should honour
> read/write requests correctly.  Obviously, this isn't a viable option -
> hiding the APIC bit and raising #GP's is the only
> architecturally-correct way to do this.
> 
> Given that we've already played "how much does Linux explode if it
> thinks there is no APIC", does anyone have any suggestions for how to
> resolve this without breaking Linux?

Hiding the APIC bits is not an option, afaict, as that would also
imply absence of any IO-APICs. What I don't understand is why
we surface X2APIC to PV guests. Wouldn't hiding that bit alone
address the specific issue above, even if the more general (xAPIC
related) one can't reasonably be addressed?
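
As a sketch of what hiding that bit alone would amount to, assuming the architectural CPUID leaf 1 bit positions (x2APIC = ECX bit 21, APIC = EDX bit 9, HTT = EDX bit 28). The function name is hypothetical and this is not how Xen's CPUID policy code is structured:

```python
CPUID_1_ECX_X2APIC = 1 << 21
CPUID_1_EDX_APIC = 1 << 9
CPUID_1_EDX_HTT = 1 << 28


def pv_view_of_leaf1(ecx, edx, hide_x2apic=True):
    """Return the (ecx, edx) pair a PV guest would see for CPUID leaf 1.

    Only the x2APIC bit is masked; the APIC and HTT bits pass through
    unchanged, mirroring the "hide that bit alone" idea.
    """
    if hide_x2apic:
        ecx &= ~CPUID_1_ECX_X2APIC
    return ecx, edx


# Made-up host values with x2APIC, APIC and HTT all set.
host_ecx = CPUID_1_ECX_X2APIC | 0x1
host_edx = CPUID_1_EDX_APIC | CPUID_1_EDX_HTT
guest_ecx, guest_edx = pv_view_of_leaf1(host_ecx, host_edx)
print(hex(guest_ecx), hex(guest_edx))
```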

Jan




* Re: PV guests and APIC interaction
  2018-10-04 10:45 ` Jan Beulich
@ 2018-10-04 13:20   ` Andrew Cooper
  2018-10-04 13:45     ` Jan Beulich
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Cooper @ 2018-10-04 13:20 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Juergen Gross, Igor Druzhinin, Boris Ostrovsky, Xen-devel List,
	Sergey Dyasli

On 04/10/18 11:45, Jan Beulich wrote:
>>>> On 03.10.18 at 13:56, <andrew.cooper3@citrix.com> wrote:
>> A bug has recently been discovered internally, where a 4.14 dom0 was
>> observed to be doing this:
>>
>> (XEN) [   16.035377] emul-priv-op.c:1166:d0v0 Domain attempted WRMSR 
>> 0000001b from 0x00000000fee00d00 to 0x00000000fee00100
>> (XEN) [   16.035392] emul-priv-op.c:1166:d0v0 Domain attempted WRMSR 
>> 0000001b from 0x00000000fee00d00 to 0x00000000fee00900
>> ...
>> (XEN) [   18.798336] emul-priv-op.c:1166:d0v1 Domain attempted WRMSR 
>> 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
>> (XEN) [   18.798350] emul-priv-op.c:1166:d0v1 Domain attempted WRMSR 
>> 0000001b from 0x00000000fee00c00 to 0x00000000fee00800
>>
>> This is dom0 finding x2apic enabled in the APIC, and trying to cycle it
>> around to xapic mode, and raises multiple issues.
>>
>> First and foremost, PV guests don't have an APIC and shouldn't be
>> playing with it at all.
> This is the crucial point, imo. It is one of the downsides of the pv-ops
> approach (allowing a single kernel binary to be used both without and
> with Xen) that code like that dealing with the LAPIC can't simply be
> compiled out to make sure it can't possibly be reached.

It doesn't need to be compiled out, but it does need to be suitably
untouched when started via the PV path.

At least part of this problem is a Linux PVOps bug.

>
>> It turns out that Xen advertises the hardware APIC bit to PV guests,
>> which isn't necessarily always set.  On top of that, the default
>> read/write-ignore behaviour of MSR lets Linux get into a position where
>> it thinks it is actually making real changes to the APIC mode.
>>
>> Architecturally speaking, if we offer the APIC bit, we should honour
>> read/write requests correctly.  Obviously, this isn't a viable option -
>> hiding the APIC bit and raising #GP's is the only
>> architecturally-correct way to do this.
>>
>> Given that we've already played "how much does Linux explode if it
>> thinks there is no APIC", does anyone have any suggestions for how to
>> resolve this without breaking Linux?
> Hiding the APIC bits is not an option, afaict, as that would also
> imply absence of any IO-APICs.

I don't think you should draw any implication between the two.

The APIC bit is a hardware fast-forward, so can already be cleared on
hardware with IO-APICs.  The ACPI tables describe the IO-APICs, and that
is the only way any software has of finding them.

Furthermore, for a system which sets all the relevant "no legacy
hardware" bits in ACPI, there is no need to have an IO-APIC at all. 
There is provision in the latest PCI spec to have devices which are not
capable of generating legacy interrupts.

> What I don't understand is why
> we surface X2APIC to PV guests. Wouldn't hiding that bit alone
> address the specific issue above, even if the more general (xAPIC
> related) one can't reasonably be addressed?

From the cpumask work:

"Must expose hosts HTT and X2APIC value so a guest using native
CPUID can correctly interpret other leaves which cannot be
masked."

although to be perfectly honest, I don't remember exactly why.  It might
be to do with the visibility of leaf 0xb.

Furthermore, hiding the x2APIC feature but allowing APICBASE to be read
will cause extra confusion to the guest if it finds EXTD set.

~Andrew


* Re: PV guests and APIC interaction
  2018-10-04 13:20   ` Andrew Cooper
@ 2018-10-04 13:45     ` Jan Beulich
  0 siblings, 0 replies; 4+ messages in thread
From: Jan Beulich @ 2018-10-04 13:45 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Juergen Gross, Igor Druzhinin, Boris Ostrovsky, Xen-devel List,
	Sergey Dyasli

>>> On 04.10.18 at 15:20, <andrew.cooper3@citrix.com> wrote:
> On 04/10/18 11:45, Jan Beulich wrote:
>>>>> On 03.10.18 at 13:56, <andrew.cooper3@citrix.com> wrote:
>>> It turns out that Xen advertises the hardware APIC bit to PV guests,
>>> which isn't necessarily always set.  On top of that, the default
>>> read/write-ignore behaviour of MSR lets Linux get into a position where
>>> it thinks it is actually making real changes to the APIC mode.
>>>
>>> Architecturally speaking, if we offer the APIC bit, we should honour
>>> read/write requests correctly.  Obviously, this isn't a viable option -
>>> hiding the APIC bit and raising #GP's is the only
>>> architecturally-correct way to do this.
>>>
>>> Given that we've already played "how much does Linux explode if it
>>> thinks there is no APIC", does anyone have any suggestions for how to
>>> resolve this without breaking Linux?
>> Hiding the APIC bits is not an option, afaict, as that would also
>> imply absence of any IO-APICs.
> 
> I don't think you should draw any implication between the two.
> 
> The APIC bit is a hardware fast-forward, so can already be cleared on
> hardware with IO-APICs.  The ACPI tables describe the IO-APICs, and that
> is the only way any software has of finding them.

But without knowing there is an LAPIC on each CPU there's no way
to use any IO-APIC. IOW with LAPIC present but disabled, IO-APICs
wouldn't be usable either.

> Furthermore, for a system which sets all the relevant "no legacy
> hardware" bits in ACPI, there is no need to have an IO-APIC at all. 
> There is provision in the latest PCI spec to have devices which are not
> capable of generating legacy interrupts.

That's fine, as being the opposite case.

>> What I don't understand is why
>> we surface X2APIC to PV guests. Wouldn't hiding that bit alone
>> address the specific issue above, even if the more general (xAPIC
>> related) one can't reasonably be addressed?
> 
> From the cpumask work:
> 
> "Must expose hosts HTT and X2APIC value so a guest using native
> CPUID can correctly interpret other leaves which cannot be
> masked."
> 
> although to be perfectly honest, I don't remember exactly why.  It might
> be to do with the visibility of leaf 0xb.

;-)

I'm not aware of leaf 0xb being tied to any particular other CPUID
output bit. And any (hardware) APIC ID in whatever leaf is
meaningless to PV anyway.

> Furthermore, hiding the x2APIC feature but allowing APICBASE to be read
> will cause extra confusion to the guest if it finds EXTD set.

Well, it would be easy enough to provide the default value here
for PV guests, instead of letting the physical register value
shine through.
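
A minimal sketch of that idea, assuming "default value" means the architectural reset state (base 0xfee00000, EN set, EXTD clear) and assuming vcpu 0 is the one reported as BSP. Both are assumptions made for illustration, and the function name is hypothetical rather than an existing Xen interface:

```python
APIC_DEFAULT_PHYS_BASE = 0xfee00000
APIC_BASE_BSP = 1 << 8    # bootstrap processor flag
APIC_BASE_EXTD = 1 << 10  # x2APIC mode enable
APIC_BASE_EN = 1 << 11    # APIC global enable


def pv_apic_base_read(vcpu_id):
    """Value returned to a PV guest reading MSR 0x1b.

    The physical register is never consulted, so the host's EXTD bit
    cannot leak through to the guest.
    """
    value = APIC_DEFAULT_PHYS_BASE | APIC_BASE_EN
    if vcpu_id == 0:  # assumption: vcpu 0 plays the role of the BSP
        value |= APIC_BASE_BSP
    return value


for vcpu in (0, 1):
    value = pv_apic_base_read(vcpu)
    print(f"d0v{vcpu}: {value:#010x} extd={bool(value & APIC_BASE_EXTD)}")
```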

Jan



