* [PATCH 0/3] x86/hvm: add support for extended destination ID @ 2022-01-20 15:23 Roger Pau Monne 2022-01-20 15:23 ` [PATCH 1/3] xen/vioapic: add support for the extended destination ID field Roger Pau Monne ` (2 more replies) 0 siblings, 3 replies; 18+ messages in thread From: Roger Pau Monne @ 2022-01-20 15:23 UTC (permalink / raw) To: xen-devel Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini, Paul Durrant Hello, The follow patch series is a very initial implementation of the extended destination ID for vIO-APIC and vMSI for HVM/PVH guests. This is already supported by QEMU/KVM and HyperV in order to target APIC IDs for device interrupts up to 32768 without the need of interrupt remapping. Note the feature is only exposed to guests with vPCI at the moment (see patch 2 for the details). In practice this means it's only exposed to a PVH dom0, and the maximum vCPU count for HVM/PVH is still not increased, as that would also require some changes to QEMU and the ACPI tables for domUs (both HVM and PVH). Thanks, Roger. Roger Pau Monne (3): xen/vioapic: add support for the extended destination ID field x86/vmsi: add support for extended destination ID in address field HACK: allow adding an offset to the x2APIC ID xen/arch/x86/cpuid.c | 12 +++++++++++- xen/arch/x86/hvm/dom0_build.c | 3 ++- xen/arch/x86/hvm/irq.c | 3 ++- xen/arch/x86/hvm/vioapic.c | 3 ++- xen/arch/x86/hvm/vlapic.c | 14 ++++++++++++-- xen/arch/x86/hvm/vmsi.c | 13 ++++++++++--- xen/arch/x86/include/asm/hvm/hvm.h | 5 +++-- xen/arch/x86/include/asm/hvm/vlapic.h | 2 ++ xen/arch/x86/include/asm/msi.h | 1 + xen/arch/x86/traps.c | 3 +++ xen/drivers/passthrough/x86/hvm.c | 10 +++++++--- xen/include/public/arch-x86/cpuid.h | 6 ++++++ xen/include/public/arch-x86/hvm/save.h | 4 +++- xen/include/public/domctl.h | 1 + 14 files changed, 65 insertions(+), 15 deletions(-) -- 2.34.1 ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 1/3] xen/vioapic: add support for the extended destination ID field 2022-01-20 15:23 [PATCH 0/3] x86/hvm: add support for extended destination ID Roger Pau Monne @ 2022-01-20 15:23 ` Roger Pau Monne 2022-01-24 13:20 ` Jan Beulich 2022-01-20 15:23 ` [PATCH 2/3] x86/vmsi: add support for extended destination ID in address field Roger Pau Monne 2022-01-20 15:23 ` [PATCH 3/3] HACK: allow adding an offset to the x2APIC ID Roger Pau Monne 2 siblings, 1 reply; 18+ messages in thread From: Roger Pau Monne @ 2022-01-20 15:23 UTC (permalink / raw) To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, Wei Liu Such field uses bits 55:48, but for the purposes the register will be used use bits 55:49 instead. Bit 48 is used to signal an RTE entry is in remappable format which is not supported by the vIO-APIC. Use the extended destination ID to store the high bits from the destination ID, thus expanding the size of the destination ID field to 15 bits, allowing an IO-APIC to target APIC IDs up to 32768. Note this is already supported by QEMU/KVM and HyperV. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> --- xen/arch/x86/hvm/vioapic.c | 3 ++- xen/include/public/arch-x86/hvm/save.h | 4 +++- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c index 553c0f76ef..1f2305c232 100644 --- a/xen/arch/x86/hvm/vioapic.c +++ b/xen/arch/x86/hvm/vioapic.c @@ -412,7 +412,8 @@ static void ioapic_inj_irq( static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin) { - uint16_t dest = vioapic->redirtbl[pin].fields.dest_id; + uint16_t dest = vioapic->redirtbl[pin].fields.dest_id | + (vioapic->redirtbl[pin].fields.ext_dest_id << 8); uint8_t dest_mode = vioapic->redirtbl[pin].fields.dest_mode; uint8_t delivery_mode = vioapic->redirtbl[pin].fields.delivery_mode; uint8_t vector = vioapic->redirtbl[pin].fields.vector; diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h index 773a380bc2..14a5d94588 100644 --- a/xen/include/public/arch-x86/hvm/save.h +++ b/xen/include/public/arch-x86/hvm/save.h @@ -376,7 +376,9 @@ union vioapic_redir_entry uint8_t trig_mode:1; uint8_t mask:1; uint8_t reserve:7; - uint8_t reserved[4]; + uint8_t reserved[3]; + uint8_t :1; + uint8_t ext_dest_id:7; uint8_t dest_id; } fields; }; -- 2.34.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] xen/vioapic: add support for the extended destination ID field 2022-01-20 15:23 ` [PATCH 1/3] xen/vioapic: add support for the extended destination ID field Roger Pau Monne @ 2022-01-24 13:20 ` Jan Beulich 2022-01-25 15:13 ` Roger Pau Monné 0 siblings, 1 reply; 18+ messages in thread From: Jan Beulich @ 2022-01-24 13:20 UTC (permalink / raw) To: Roger Pau Monne; +Cc: Andrew Cooper, Wei Liu, xen-devel On 20.01.2022 16:23, Roger Pau Monne wrote: > Such field uses bits 55:48, but for the purposes the register will be > used use bits 55:49 instead. Bit 48 is used to signal an RTE entry is > in remappable format which is not supported by the vIO-APIC. Neither here nor in the cover letter you point at a formal specification of this mode of operation. What I'm aware of are vague indications of this mode's existence in some of Intel's chipset data sheets. Yet that leaves open, for example, whether indeed bit 48 cannot be used here. > --- a/xen/arch/x86/hvm/vioapic.c > +++ b/xen/arch/x86/hvm/vioapic.c > @@ -412,7 +412,8 @@ static void ioapic_inj_irq( > > static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin) > { > - uint16_t dest = vioapic->redirtbl[pin].fields.dest_id; > + uint16_t dest = vioapic->redirtbl[pin].fields.dest_id | > + (vioapic->redirtbl[pin].fields.ext_dest_id << 8); What if an existing guest has been writing non-zero in these bits? Can you really use them here without any further indication by the guest? Jan ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] xen/vioapic: add support for the extended destination ID field 2022-01-24 13:20 ` Jan Beulich @ 2022-01-25 15:13 ` Roger Pau Monné 2022-01-26 12:47 ` Jan Beulich 2022-01-26 13:52 ` David Woodhouse 0 siblings, 2 replies; 18+ messages in thread From: Roger Pau Monné @ 2022-01-25 15:13 UTC (permalink / raw) To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel, David Woodhouse On Mon, Jan 24, 2022 at 02:20:47PM +0100, Jan Beulich wrote: > On 20.01.2022 16:23, Roger Pau Monne wrote: > > Such field uses bits 55:48, but for the purposes the register will be > > used use bits 55:49 instead. Bit 48 is used to signal an RTE entry is > > in remappable format which is not supported by the vIO-APIC. > > Neither here nor in the cover letter you point at a formal specification > of this mode of operation. I'm not aware of any formal specification of this mode, apart from the work done to introduce support in Linux and QEMU: https://lore.kernel.org/all/20201009104616.1314746-1-dwmw2@infradead.org/ https://git.qemu.org/?p=qemu.git;a=commitdiff;h=c1bb5418e Adding David in case there's some kind of specification somewhere I'm not aware of. > What I'm aware of are vague indications of > this mode's existence in some of Intel's chipset data sheets. Yet that > leaves open, for example, whether indeed bit 48 cannot be used here. Bit 48 cannot be used because it's already used to signal an RTE is in remappable format. We still want to differentiate an RTE entry in remappable format, as it should be possible to expose both the extended ID support and an emulated IOMMU. > > --- a/xen/arch/x86/hvm/vioapic.c > > +++ b/xen/arch/x86/hvm/vioapic.c > > @@ -412,7 +412,8 @@ static void ioapic_inj_irq( > > > > static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin) > > { > > - uint16_t dest = vioapic->redirtbl[pin].fields.dest_id; > > + uint16_t dest = vioapic->redirtbl[pin].fields.dest_id | > > + (vioapic->redirtbl[pin].fields.ext_dest_id << 8); > > What if an existing guest has been writing non-zero in these bits? Can > you really use them here without any further indication by the guest? Those bits where reserved previously, so no OS should have used them. There are hypervisors already in the field (QEMU/KVM and HyperV) using this mode. We could add a per-domain option to disable extended ID mode if we are really worried about OSes having used those bits for some reason. Thanks, Roger. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] xen/vioapic: add support for the extended destination ID field 2022-01-25 15:13 ` Roger Pau Monné @ 2022-01-26 12:47 ` Jan Beulich 2022-01-26 19:21 ` David Woodhouse 2022-01-26 13:52 ` David Woodhouse 1 sibling, 1 reply; 18+ messages in thread From: Jan Beulich @ 2022-01-26 12:47 UTC (permalink / raw) To: Roger Pau Monné; +Cc: Andrew Cooper, Wei Liu, xen-devel, David Woodhouse On 25.01.2022 16:13, Roger Pau Monné wrote: > On Mon, Jan 24, 2022 at 02:20:47PM +0100, Jan Beulich wrote: >> On 20.01.2022 16:23, Roger Pau Monne wrote: >>> Such field uses bits 55:48, but for the purposes the register will be >>> used use bits 55:49 instead. Bit 48 is used to signal an RTE entry is >>> in remappable format which is not supported by the vIO-APIC. >> >> Neither here nor in the cover letter you point at a formal specification >> of this mode of operation. > > I'm not aware of any formal specification of this mode, apart from the > work done to introduce support in Linux and QEMU: > > https://lore.kernel.org/all/20201009104616.1314746-1-dwmw2@infradead.org/ > https://git.qemu.org/?p=qemu.git;a=commitdiff;h=c1bb5418e > > Adding David in case there's some kind of specification somewhere I'm > not aware of. > >> What I'm aware of are vague indications of >> this mode's existence in some of Intel's chipset data sheets. Yet that >> leaves open, for example, whether indeed bit 48 cannot be used here. > > Bit 48 cannot be used because it's already used to signal an RTE is in > remappable format. We still want to differentiate an RTE entry in > remappable format, as it should be possible to expose both the > extended ID support and an emulated IOMMU. I think I did say so on irc already: There's not really a problem like this. For one I wouldn't expect an OS to use this extended ID at the same time as having an IOMMU to deal with the width restriction. And then, even if they wanted to use both at the same time, they'd simply need to care about the specific meaning of this bit themselves: When the bit is set, it would be unavoidable to have it (perhaps identity-) remapped by the IOMMU. >>> --- a/xen/arch/x86/hvm/vioapic.c >>> +++ b/xen/arch/x86/hvm/vioapic.c >>> @@ -412,7 +412,8 @@ static void ioapic_inj_irq( >>> >>> static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin) >>> { >>> - uint16_t dest = vioapic->redirtbl[pin].fields.dest_id; >>> + uint16_t dest = vioapic->redirtbl[pin].fields.dest_id | >>> + (vioapic->redirtbl[pin].fields.ext_dest_id << 8); >> >> What if an existing guest has been writing non-zero in these bits? Can >> you really use them here without any further indication by the guest? > > Those bits where reserved previously, so no OS should have used them. > There are hypervisors already in the field (QEMU/KVM and HyperV) using > this mode. > > We could add a per-domain option to disable extended ID mode if we are > really worried about OSes having used those bits for some reason. Generally I think previously ignored bits need to be handled with care. If there was a specification, what is being said there might serve as a guideline for us. Even if there was just a proper description of the EDID field found in recent Intel chipset spec, this might already help determining whether we want/need an enable (or disable). But there's not even a bit announcing the functionality in, say, the version register. Jan ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] xen/vioapic: add support for the extended destination ID field 2022-01-26 12:47 ` Jan Beulich @ 2022-01-26 19:21 ` David Woodhouse 0 siblings, 0 replies; 18+ messages in thread From: David Woodhouse @ 2022-01-26 19:21 UTC (permalink / raw) To: Jan Beulich, Roger Pau Monné; +Cc: Andrew Cooper, Wei Liu, xen-devel [-- Attachment #1: Type: text/plain, Size: 5034 bytes --] On Wed, 2022-01-26 at 13:47 +0100, Jan Beulich wrote: > On 25.01.2022 16:13, Roger Pau Monné wrote: > > On Mon, Jan 24, 2022 at 02:20:47PM +0100, Jan Beulich wrote: > > > On 20.01.2022 16:23, Roger Pau Monne wrote: > > > > Such field uses bits 55:48, but for the purposes the register > > > > will be > > > > used use bits 55:49 instead. Bit 48 is used to signal an RTE > > > > entry is > > > > in remappable format which is not supported by the vIO-APIC. > > > > > > Neither here nor in the cover letter you point at a formal > > > specification > > > of this mode of operation. > > > > I'm not aware of any formal specification of this mode, apart from > > the > > work done to introduce support in Linux and QEMU: > > > > https://lore.kernel.org/all/20201009104616.1314746-1-dwmw2@infradead.org/ > > > > https://git.qemu.org/?p=qemu.git;a=commitdiff;h=c1bb5418e > > > > > > Adding David in case there's some kind of specification somewhere > > I'm > > not aware of. > > > > > What I'm aware of are vague indications of > > > this mode's existence in some of Intel's chipset data sheets. Yet that > > > leaves open, for example, whether indeed bit 48 cannot be used here. > > > > Bit 48 cannot be used because it's already used to signal an RTE is in > > remappable format. We still want to differentiate an RTE entry in > > remappable format, as it should be possible to expose both the > > extended ID support and an emulated IOMMU. > > I think I did say so on irc already: There's not really a problem like > this. For one I wouldn't expect an OS to use this extended ID at the > same time as having an IOMMU to deal with the width restriction. And > then, even if they wanted to use both at the same time, they'd simply > need to care about the specific meaning of this bit themselves: When > the bit is set, it would be unavoidable to have it (perhaps identity-) > remapped by the IOMMU. As you later said, it's too late for bikeshedding that decision. But I stand by it regardless of the time. Even by the time *I* made that choice, it was long since established by Intel. You could make the same argument about their original hardware design, that the format bit is pointless and that if an OS enables interrupt remapping, it knows full well when it's going to use it. It can even be configured in the IOMMU per PCI function. There is benefit to having a very clear and unambiguous difference between the MSI formats that isn't entirely dependent on the IOMMU being configured correctly. And in my case there is *definitely* benefit to following the precedent already set by Intel in the real hardware. For me, those outweighed the marginal additional benefit of going from 15 to 16 bits of APIC ID in the MSI. > > > > --- a/xen/arch/x86/hvm/vioapic.c > > > > +++ b/xen/arch/x86/hvm/vioapic.c > > > > @@ -412,7 +412,8 @@ static void ioapic_inj_irq( > > > > > > > > static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin) > > > > { > > > > - uint16_t dest = vioapic->redirtbl[pin].fields.dest_id; > > > > + uint16_t dest = vioapic->redirtbl[pin].fields.dest_id | > > > > + (vioapic->redirtbl[pin].fields.ext_dest_id << 8); > > > > > > What if an existing guest has been writing non-zero in these bits? Can > > > you really use them here without any further indication by the guest? > > > > Those bits where reserved previously, so no OS should have used them. > > There are hypervisors already in the field (QEMU/KVM and HyperV) using > > this mode. > > > > We could add a per-domain option to disable extended ID mode if we are > > really worried about OSes having used those bits for some reason. > > Generally I think previously ignored bits need to be handled with care. > If there was a specification, what is being said there might serve as > a guideline for us. Even if there was just a proper description of the > EDID field found in recent Intel chipset spec, this might already help > determining whether we want/need an enable (or disable). But there's > not even a bit announcing the functionality in, say, the version > register. It's not very verbose, but the Extended Destination ID in the I/OAPIC is at least mentioned in the RTE documentation in the 82806AA datasheet https://datasheet.octopart.com/FW82806AA-SL3VZ-Intel-datasheet-13695406.pdf See page 47, §2.4.10 "Redirection Table High DWord". The rest you have to kind of piece together from the later documentation once they actually started *using* it for IRQ remapping. I think it may also have been used on IA64? The realisation that we didn't need to have special different code to compose RTE entries for Compatibility Format vs. Remappable Format, and that we could just allow the 'upstream' APIC code to compose the MSI message and then swizzle the bits into the RTE... was rather slow to come. https://lore.kernel.org/all/20201024213535.443185-22-dwmw2@infradead.org/ [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 5965 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] xen/vioapic: add support for the extended destination ID field 2022-01-25 15:13 ` Roger Pau Monné 2022-01-26 12:47 ` Jan Beulich @ 2022-01-26 13:52 ` David Woodhouse 2022-01-26 14:23 ` Jan Beulich 1 sibling, 1 reply; 18+ messages in thread From: David Woodhouse @ 2022-01-26 13:52 UTC (permalink / raw) To: Roger Pau Monné, Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel [-- Attachment #1: Type: text/plain, Size: 3225 bytes --] On Tue, 2022-01-25 at 16:13 +0100, Roger Pau Monné wrote: > On Mon, Jan 24, 2022 at 02:20:47PM +0100, Jan Beulich wrote: > > On 20.01.2022 16:23, Roger Pau Monne wrote: > > > Such field uses bits 55:48, but for the purposes the register will be > > > used use bits 55:49 instead. Bit 48 is used to signal an RTE entry is > > > in remappable format which is not supported by the vIO-APIC. > > > > Neither here nor in the cover letter you point at a formal specification > > of this mode of operation. > > I'm not aware of any formal specification of this mode, apart from the > work done to introduce support in Linux and QEMU: > > https://lore.kernel.org/all/20201009104616.1314746-1-dwmw2@infradead.org/ > > https://git.qemu.org/?p=qemu.git;a=commitdiff;h=c1bb5418e > > > Adding David in case there's some kind of specification somewhere I'm > not aware of. Indeed there is no formal specification that I am aware of, although it's vaguely possible that Microsoft wrote something up when they added it to Hyper-V. https://lore.kernel.org/all/20201103011136.59108-1-decui@microsoft.com/ I had an internal doc which.... looks like I can clean it up a tiny bit and then share at http://david.woodhou.se/15-bit-msi.pdf if that helps? > > What I'm aware of are vague indications of > > this mode's existence in some of Intel's chipset data sheets. Yet that > > leaves open, for example, whether indeed bit 48 cannot be used here. > > Bit 48 cannot be used because it's already used to signal an RTE is in > remappable format. We still want to differentiate an RTE entry in > remappable format, as it should be possible to expose both the > extended ID support and an emulated IOMMU. Right. I chose not to use the low bit of the existing Extended Destination ID because that's the one Intel used to indicate Remappable Format. This means we can still expose an IOMMU to guests and easily distinguish between Compatibility Format and Remappable Format MSIs just as real hardware does. > > > --- a/xen/arch/x86/hvm/vioapic.c > > > +++ b/xen/arch/x86/hvm/vioapic.c > > > @@ -412,7 +412,8 @@ static void ioapic_inj_irq( > > > > > > static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin) > > > { > > > - uint16_t dest = vioapic->redirtbl[pin].fields.dest_id; > > > + uint16_t dest = vioapic->redirtbl[pin].fields.dest_id | > > > + (vioapic->redirtbl[pin].fields.ext_dest_id << 8); > > > > What if an existing guest has been writing non-zero in these bits? Can > > you really use them here without any further indication by the guest? > > Those bits where reserved previously, so no OS should have used them. > There are hypervisors already in the field (QEMU/KVM and HyperV) using > this mode. > > We could add a per-domain option to disable extended ID mode if we are > really worried about OSes having used those bits for some reason. Note that I didn't even have to touch this part in qemu; the swizzling of those 8 'Extended Destination ID' bits from the RTE into the corresponding bits of the MSI message was already being done — without which, IRQ remapping wouldn't have worked anyway. [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 5965 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] xen/vioapic: add support for the extended destination ID field 2022-01-26 13:52 ` David Woodhouse @ 2022-01-26 14:23 ` Jan Beulich 0 siblings, 0 replies; 18+ messages in thread From: Jan Beulich @ 2022-01-26 14:23 UTC (permalink / raw) To: David Woodhouse; +Cc: Andrew Cooper, Wei Liu, xen-devel, Roger Pau Monné On 26.01.2022 14:52, David Woodhouse wrote: > On Tue, 2022-01-25 at 16:13 +0100, Roger Pau Monné wrote: >> On Mon, Jan 24, 2022 at 02:20:47PM +0100, Jan Beulich wrote: >>> On 20.01.2022 16:23, Roger Pau Monne wrote: >>>> Such field uses bits 55:48, but for the purposes the register will be >>>> used use bits 55:49 instead. Bit 48 is used to signal an RTE entry is >>>> in remappable format which is not supported by the vIO-APIC. >>> >>> Neither here nor in the cover letter you point at a formal specification >>> of this mode of operation. >> >> I'm not aware of any formal specification of this mode, apart from the >> work done to introduce support in Linux and QEMU: >> >> https://lore.kernel.org/all/20201009104616.1314746-1-dwmw2@infradead.org/ >> >> https://git.qemu.org/?p=qemu.git;a=commitdiff;h=c1bb5418e >> >> >> Adding David in case there's some kind of specification somewhere I'm >> not aware of. > > Indeed there is no formal specification that I am aware of, although > it's vaguely possible that Microsoft wrote something up when they added > it to Hyper-V. > > https://lore.kernel.org/all/20201103011136.59108-1-decui@microsoft.com/ > > I had an internal doc which.... looks like I can clean it up a tiny bit > and then share at http://david.woodhou.se/15-bit-msi.pdf if that helps? Thanks, this at least puts us on common grounds. >>> What I'm aware of are vague indications of >>> this mode's existence in some of Intel's chipset data sheets. Yet that >>> leaves open, for example, whether indeed bit 48 cannot be used here. >> >> Bit 48 cannot be used because it's already used to signal an RTE is in >> remappable format. We still want to differentiate an RTE entry in >> remappable format, as it should be possible to expose both the >> extended ID support and an emulated IOMMU. > > Right. I chose not to use the low bit of the existing Extended > Destination ID because that's the one Intel used to indicate Remappable > Format. This means we can still expose an IOMMU to guests and easily > distinguish between Compatibility Format and Remappable Format MSIs > just as real hardware does. Well, with the defacto standard of using only 7 of the bits we will have to follow suit of course. Jan ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 2/3] x86/vmsi: add support for extended destination ID in address field 2022-01-20 15:23 [PATCH 0/3] x86/hvm: add support for extended destination ID Roger Pau Monne 2022-01-20 15:23 ` [PATCH 1/3] xen/vioapic: add support for the extended destination ID field Roger Pau Monne @ 2022-01-20 15:23 ` Roger Pau Monne 2022-01-24 13:47 ` Jan Beulich 2022-01-20 15:23 ` [PATCH 3/3] HACK: allow adding an offset to the x2APIC ID Roger Pau Monne 2 siblings, 1 reply; 18+ messages in thread From: Roger Pau Monne @ 2022-01-20 15:23 UTC (permalink / raw) To: xen-devel Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini, Paul Durrant Both QEMU/KVM and HyperV support using bits 11:5 from the MSI address field in order to store the high part of the target APIC ID. This allows expanding the maximum APID ID usable without interrupt remapping support from 255 to 32768. Note the interface used by QEMU for emulated devices (via the XEN_DMOP_inject_msi hypercall) already passes both the address and data fields into Xen for processing, so there's no need for any change to QEMU there. However for PCI passthrough devices QEMU uses the XEN_DOMCTL_bind_pt_irq hypercall which does need an addition to the gflags field in order to pass the high bits of the APIC destination ID. Introduce a new CPUID flag to signal the support for the feature. The introduced flag covers both the support for extended ID for the IO-APIC RTE and the MSI address registers. Such flag is currently only exposed when the domain is using vPCI (ie: a PVH dom0). Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> --- xen/arch/x86/hvm/irq.c | 3 ++- xen/arch/x86/hvm/vmsi.c | 13 ++++++++++--- xen/arch/x86/include/asm/hvm/hvm.h | 5 +++-- xen/arch/x86/include/asm/msi.h | 1 + xen/arch/x86/traps.c | 3 +++ xen/drivers/passthrough/x86/hvm.c | 10 +++++++--- xen/include/public/arch-x86/cpuid.h | 6 ++++++ xen/include/public/domctl.h | 1 + 8 files changed, 33 insertions(+), 9 deletions(-) diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c index 52aae4565f..b9b5182369 100644 --- a/xen/arch/x86/hvm/irq.c +++ b/xen/arch/x86/hvm/irq.c @@ -383,7 +383,8 @@ int hvm_set_pci_link_route(struct domain *d, u8 link, u8 isa_irq) int hvm_inject_msi(struct domain *d, uint64_t addr, uint32_t data) { uint32_t tmp = (uint32_t) addr; - uint8_t dest = (tmp & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT; + unsigned int dest = (MASK_EXTR(tmp, MSI_ADDR_EXT_DEST_ID_MASK) << 8) | + MASK_EXTR(tmp, MSI_ADDR_DEST_ID_MASK); uint8_t dest_mode = !!(tmp & MSI_ADDR_DESTMODE_MASK); uint8_t delivery_mode = (data & MSI_DATA_DELIVERY_MODE_MASK) >> MSI_DATA_DELIVERY_MODE_SHIFT; diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c index 13e2a190b4..ec0f3bc13f 100644 --- a/xen/arch/x86/hvm/vmsi.c +++ b/xen/arch/x86/hvm/vmsi.c @@ -66,7 +66,7 @@ static void vmsi_inj_irq( int vmsi_deliver( struct domain *d, int vector, - uint8_t dest, uint8_t dest_mode, + unsigned int dest, unsigned int dest_mode, uint8_t delivery_mode, uint8_t trig_mode) { struct vlapic *target; @@ -107,7 +107,9 @@ void vmsi_deliver_pirq(struct domain *d, const struct hvm_pirq_dpci *pirq_dpci) { uint32_t flags = pirq_dpci->gmsi.gflags; int vector = pirq_dpci->gmsi.gvec; - uint8_t dest = (uint8_t)flags; + unsigned int dest = MASK_EXTR(flags, XEN_DOMCTL_VMSI_X86_DEST_ID_MASK) | + (MASK_EXTR(flags, + XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK) << 8); bool dest_mode = flags & XEN_DOMCTL_VMSI_X86_DM_MASK; uint8_t delivery_mode = MASK_EXTR(flags, XEN_DOMCTL_VMSI_X86_DELIV_MASK); bool trig_mode = flags & XEN_DOMCTL_VMSI_X86_TRIG_MASK; @@ -123,7 +125,8 @@ void vmsi_deliver_pirq(struct domain *d, const struct hvm_pirq_dpci *pirq_dpci) } /* Return value, -1 : multi-dests, non-negative value: dest_vcpu_id */ -int hvm_girq_dest_2_vcpu_id(struct domain *d, uint8_t dest, uint8_t dest_mode) +int hvm_girq_dest_2_vcpu_id(struct domain *d, unsigned int dest, + unsigned int dest_mode) { int dest_vcpu_id = -1, w = 0; struct vcpu *v; @@ -645,6 +648,8 @@ static unsigned int msi_gflags(uint16_t data, uint64_t addr, bool masked) */ return MASK_INSR(MASK_EXTR(addr, MSI_ADDR_DEST_ID_MASK), XEN_DOMCTL_VMSI_X86_DEST_ID_MASK) | + MASK_INSR(MASK_EXTR(addr, MSI_ADDR_EXT_DEST_ID_MASK), + XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK) | MASK_INSR(MASK_EXTR(addr, MSI_ADDR_REDIRECTION_MASK), XEN_DOMCTL_VMSI_X86_RH_MASK) | MASK_INSR(MASK_EXTR(addr, MSI_ADDR_DESTMODE_MASK), @@ -835,6 +840,7 @@ void vpci_msi_arch_print(const struct vpci_msi *msi) msi->data & MSI_DATA_LEVEL_ASSERT ? "" : "de", msi->address & MSI_ADDR_DESTMODE_LOGIC ? "log" : "phys", msi->address & MSI_ADDR_REDIRECTION_LOWPRI ? "lowest" : "fixed", + (MASK_EXTR(msi->address, MSI_ADDR_EXT_DEST_ID_MASK) << 8) | MASK_EXTR(msi->address, MSI_ADDR_DEST_ID_MASK), msi->arch.pirq); } @@ -904,6 +910,7 @@ int vpci_msix_arch_print(const struct vpci_msix *msix) entry->data & MSI_DATA_LEVEL_ASSERT ? "" : "de", entry->addr & MSI_ADDR_DESTMODE_LOGIC ? "log" : "phys", entry->addr & MSI_ADDR_REDIRECTION_LOWPRI ? "lowest" : "fixed", + (MASK_EXTR(entry->addr, MSI_ADDR_EXT_DEST_ID_MASK) << 8) | MASK_EXTR(entry->addr, MSI_ADDR_DEST_ID_MASK), entry->masked, entry->arch.pirq); if ( i && !(i % 64) ) diff --git a/xen/arch/x86/include/asm/hvm/hvm.h b/xen/arch/x86/include/asm/hvm/hvm.h index b26302d9e7..f001b43a21 100644 --- a/xen/arch/x86/include/asm/hvm/hvm.h +++ b/xen/arch/x86/include/asm/hvm/hvm.h @@ -270,11 +270,12 @@ uint64_t hvm_get_guest_time_fixed(const struct vcpu *v, uint64_t at_tsc); int vmsi_deliver( struct domain *d, int vector, - uint8_t dest, uint8_t dest_mode, + unsigned int dest, unsigned int dest_mode, uint8_t delivery_mode, uint8_t trig_mode); struct hvm_pirq_dpci; void vmsi_deliver_pirq(struct domain *d, const struct hvm_pirq_dpci *); -int hvm_girq_dest_2_vcpu_id(struct domain *d, uint8_t dest, uint8_t dest_mode); +int hvm_girq_dest_2_vcpu_id(struct domain *d, unsigned int dest, + unsigned int dest_mode); enum hvm_intblk hvm_interrupt_blocked(struct vcpu *v, struct hvm_intack intack); diff --git a/xen/arch/x86/include/asm/msi.h b/xen/arch/x86/include/asm/msi.h index e228b0f3f3..531b860e42 100644 --- a/xen/arch/x86/include/asm/msi.h +++ b/xen/arch/x86/include/asm/msi.h @@ -54,6 +54,7 @@ #define MSI_ADDR_DEST_ID_SHIFT 12 #define MSI_ADDR_DEST_ID_MASK 0x00ff000 #define MSI_ADDR_DEST_ID(dest) (((dest) << MSI_ADDR_DEST_ID_SHIFT) & MSI_ADDR_DEST_ID_MASK) +#define MSI_ADDR_EXT_DEST_ID_MASK 0x0000fe0 /* MAX fixed pages reserved for mapping MSIX tables. */ #define FIX_MSIX_MAX_PAGES 512 diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index 485bd66971..3d2d75978c 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -1150,6 +1150,9 @@ void cpuid_hypervisor_leaves(const struct vcpu *v, uint32_t leaf, res->a |= XEN_HVM_CPUID_DOMID_PRESENT; res->c = d->domain_id; + if ( has_vpci(d) ) + res->a |= XEN_HVM_CPUID_EXT_DEST_ID; + break; case 5: /* PV-specific parameters */ diff --git a/xen/drivers/passthrough/x86/hvm.c b/xen/drivers/passthrough/x86/hvm.c index 351daafdc9..666c4b7757 100644 --- a/xen/drivers/passthrough/x86/hvm.c +++ b/xen/drivers/passthrough/x86/hvm.c @@ -269,7 +269,7 @@ int pt_irq_create_bind( { case PT_IRQ_TYPE_MSI: { - uint8_t dest, delivery_mode; + unsigned int dest, delivery_mode; bool dest_mode; int dest_vcpu_id; const struct vcpu *vcpu; @@ -345,7 +345,9 @@ int pt_irq_create_bind( } /* Calculate dest_vcpu_id for MSI-type pirq migration. */ dest = MASK_EXTR(pirq_dpci->gmsi.gflags, - XEN_DOMCTL_VMSI_X86_DEST_ID_MASK); + XEN_DOMCTL_VMSI_X86_DEST_ID_MASK) | + (MASK_EXTR(pirq_dpci->gmsi.gflags, + XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK) << 8); dest_mode = pirq_dpci->gmsi.gflags & XEN_DOMCTL_VMSI_X86_DM_MASK; delivery_mode = MASK_EXTR(pirq_dpci->gmsi.gflags, XEN_DOMCTL_VMSI_X86_DELIV_MASK); @@ -782,7 +784,9 @@ static int _hvm_dpci_msi_eoi(struct domain *d, (pirq_dpci->gmsi.gvec == vector) ) { unsigned int dest = MASK_EXTR(pirq_dpci->gmsi.gflags, - XEN_DOMCTL_VMSI_X86_DEST_ID_MASK); + XEN_DOMCTL_VMSI_X86_DEST_ID_MASK) | + (MASK_EXTR(pirq_dpci->gmsi.gflags, + XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK) << 8); bool dest_mode = pirq_dpci->gmsi.gflags & XEN_DOMCTL_VMSI_X86_DM_MASK; if ( vlapic_match_dest(vcpu_vlapic(current), NULL, 0, dest, diff --git a/xen/include/public/arch-x86/cpuid.h b/xen/include/public/arch-x86/cpuid.h index ce46305bee..49bcc93b6b 100644 --- a/xen/include/public/arch-x86/cpuid.h +++ b/xen/include/public/arch-x86/cpuid.h @@ -102,6 +102,12 @@ #define XEN_HVM_CPUID_IOMMU_MAPPINGS (1u << 2) #define XEN_HVM_CPUID_VCPU_ID_PRESENT (1u << 3) /* vcpu id is present in EBX */ #define XEN_HVM_CPUID_DOMID_PRESENT (1u << 4) /* domid is present in ECX */ +/* + * Bits 55:49 from the IO-APIC RTE and bits 11:5 from the MSI address can be + * used to store high bits for the Destination ID. This expands the Destination + * ID field from 8 to 15 bits, allowing to target APIC IDs up 32768. + */ +#define XEN_HVM_CPUID_EXT_DEST_ID (1u << 5) /* * Leaf 6 (0x40000x05) diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h index b85e6170b0..17ac7ef82b 100644 --- a/xen/include/public/domctl.h +++ b/xen/include/public/domctl.h @@ -588,6 +588,7 @@ struct xen_domctl_bind_pt_irq { #define XEN_DOMCTL_VMSI_X86_DELIV_MASK 0x007000 #define XEN_DOMCTL_VMSI_X86_TRIG_MASK 0x008000 #define XEN_DOMCTL_VMSI_X86_UNMASKED 0x010000 +#define XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK 0xfe0000 uint64_aligned_t gtable; } msi; -- 2.34.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 2/3] x86/vmsi: add support for extended destination ID in address field 2022-01-20 15:23 ` [PATCH 2/3] x86/vmsi: add support for extended destination ID in address field Roger Pau Monne @ 2022-01-24 13:47 ` Jan Beulich 2022-01-26 13:54 ` David Woodhouse 2022-02-04 9:23 ` Roger Pau Monné 0 siblings, 2 replies; 18+ messages in thread From: Jan Beulich @ 2022-01-24 13:47 UTC (permalink / raw) To: Roger Pau Monne Cc: Andrew Cooper, Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini, Paul Durrant, xen-devel On 20.01.2022 16:23, Roger Pau Monne wrote: > Both QEMU/KVM and HyperV support using bits 11:5 from the MSI address > field in order to store the high part of the target APIC ID. This > allows expanding the maximum APID ID usable without interrupt > remapping support from 255 to 32768. > > Note the interface used by QEMU for emulated devices (via the > XEN_DMOP_inject_msi hypercall) already passes both the address and > data fields into Xen for processing, so there's no need for any change > to QEMU there. > > However for PCI passthrough devices QEMU uses the > XEN_DOMCTL_bind_pt_irq hypercall which does need an addition to the > gflags field in order to pass the high bits of the APIC destination > ID. > > Introduce a new CPUID flag to signal the support for the feature. The > introduced flag covers both the support for extended ID for the > IO-APIC RTE and the MSI address registers. Such flag is currently only > exposed when the domain is using vPCI (ie: a PVH dom0). Because of also covering the IO-APIC side, I think the CPUID aspect of this really wants splitting into a 3rd patch. That way the MSI and IO-APIC parts could in principle go in independently, and only the CPUID one needs to remain at the tail. > --- a/xen/arch/x86/hvm/vmsi.c > +++ b/xen/arch/x86/hvm/vmsi.c > @@ -66,7 +66,7 @@ static void vmsi_inj_irq( > > int vmsi_deliver( > struct domain *d, int vector, > - uint8_t dest, uint8_t dest_mode, > + unsigned int dest, unsigned int dest_mode, If you change the type of dest_mode, then to "bool" please - see its only call site. > @@ -123,7 +125,8 @@ void vmsi_deliver_pirq(struct domain *d, const struct hvm_pirq_dpci *pirq_dpci) > } > > /* Return value, -1 : multi-dests, non-negative value: dest_vcpu_id */ > -int hvm_girq_dest_2_vcpu_id(struct domain *d, uint8_t dest, uint8_t dest_mode) > +int hvm_girq_dest_2_vcpu_id(struct domain *d, unsigned int dest, > + unsigned int dest_mode) Same here then. > --- a/xen/arch/x86/include/asm/msi.h > +++ b/xen/arch/x86/include/asm/msi.h > @@ -54,6 +54,7 @@ > #define MSI_ADDR_DEST_ID_SHIFT 12 > #define MSI_ADDR_DEST_ID_MASK 0x00ff000 > #define MSI_ADDR_DEST_ID(dest) (((dest) << MSI_ADDR_DEST_ID_SHIFT) & MSI_ADDR_DEST_ID_MASK) > +#define MSI_ADDR_EXT_DEST_ID_MASK 0x0000fe0 Especially the immediately preceding macro now becomes kind of stale. > --- a/xen/drivers/passthrough/x86/hvm.c > +++ b/xen/drivers/passthrough/x86/hvm.c > @@ -269,7 +269,7 @@ int pt_irq_create_bind( > { > case PT_IRQ_TYPE_MSI: > { > - uint8_t dest, delivery_mode; > + unsigned int dest, delivery_mode; > bool dest_mode; If you touch delivery_mode's type, wouldn't that better become bool? > --- a/xen/include/public/domctl.h > +++ b/xen/include/public/domctl.h > @@ -588,6 +588,7 @@ struct xen_domctl_bind_pt_irq { > #define XEN_DOMCTL_VMSI_X86_DELIV_MASK 0x007000 > #define XEN_DOMCTL_VMSI_X86_TRIG_MASK 0x008000 > #define XEN_DOMCTL_VMSI_X86_UNMASKED 0x010000 > +#define XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK 0xfe0000 I'm not convinced it is a good idea to limit the overall destination ID width to 15 bits here - at the interface level we could as well permit more bits right away; the implementation would reject too high a value, of course. Not only with this I further wonder whether the field shouldn't be unsplit while extending it. You won't get away without bumping XEN_DOMCTL_INTERFACE_VERSION anyway (unless it was bumped already for 4.17) since afaics the unused bits of this field previously weren't checked for being zero. We could easily have 8 bits vector, 16 bits flags, and 32 bits destination ID in the struct. And there would then still be 8 unused bits (which from now on we ought to check for being zero). Jan ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/3] x86/vmsi: add support for extended destination ID in address field 2022-01-24 13:47 ` Jan Beulich @ 2022-01-26 13:54 ` David Woodhouse 2022-01-26 14:22 ` Roger Pau Monné 2022-02-04 9:23 ` Roger Pau Monné 1 sibling, 1 reply; 18+ messages in thread From: David Woodhouse @ 2022-01-26 13:54 UTC (permalink / raw) To: Jan Beulich, Roger Pau Monne Cc: Andrew Cooper, Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini, Paul Durrant, xen-devel [-- Attachment #1: Type: text/plain, Size: 353 bytes --] On Mon, 2022-01-24 at 14:47 +0100, Jan Beulich wrote: > Because of also covering the IO-APIC side, I think the CPUID aspect of > this really wants splitting into a 3rd patch. That way the MSI and > IO-APIC parts could in principle go in independently, and only the > CPUID one needs to remain at the tail. HPET can generate MSIs directly too. [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 5965 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/3] x86/vmsi: add support for extended destination ID in address field 2022-01-26 13:54 ` David Woodhouse @ 2022-01-26 14:22 ` Roger Pau Monné 0 siblings, 0 replies; 18+ messages in thread From: Roger Pau Monné @ 2022-01-26 14:22 UTC (permalink / raw) To: David Woodhouse Cc: Jan Beulich, Andrew Cooper, Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini, Paul Durrant, xen-devel On Wed, Jan 26, 2022 at 01:54:26PM +0000, David Woodhouse wrote: > On Mon, 2022-01-24 at 14:47 +0100, Jan Beulich wrote: > > Because of also covering the IO-APIC side, I think the CPUID aspect of > > this really wants splitting into a 3rd patch. That way the MSI and > > IO-APIC parts could in principle go in independently, and only the > > CPUID one needs to remain at the tail. > > HPET can generate MSIs directly too. Indeed, but the emulated one we expose to HVM guests doesn't support FSB. Thanks, Roger. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/3] x86/vmsi: add support for extended destination ID in address field 2022-01-24 13:47 ` Jan Beulich 2022-01-26 13:54 ` David Woodhouse @ 2022-02-04 9:23 ` Roger Pau Monné 2022-02-04 9:30 ` Jan Beulich 1 sibling, 1 reply; 18+ messages in thread From: Roger Pau Monné @ 2022-02-04 9:23 UTC (permalink / raw) To: Jan Beulich Cc: Andrew Cooper, Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini, Paul Durrant, xen-devel On Mon, Jan 24, 2022 at 02:47:58PM +0100, Jan Beulich wrote: > On 20.01.2022 16:23, Roger Pau Monne wrote: > > --- a/xen/arch/x86/include/asm/msi.h > > +++ b/xen/arch/x86/include/asm/msi.h > > @@ -54,6 +54,7 @@ > > #define MSI_ADDR_DEST_ID_SHIFT 12 > > #define MSI_ADDR_DEST_ID_MASK 0x00ff000 > > #define MSI_ADDR_DEST_ID(dest) (((dest) << MSI_ADDR_DEST_ID_SHIFT) & MSI_ADDR_DEST_ID_MASK) > > +#define MSI_ADDR_EXT_DEST_ID_MASK 0x0000fe0 > > Especially the immediately preceding macro now becomes kind of stale. Hm, I'm not so sure about that. We could expand the macro to place the high bits in dest at bits 11:4 of the resulting address. However that macro (MSI_ADDR_DEST_ID) is only used by Xen to compose its own messages, so until we add support for the hypervisor itself to use the extended destination ID mode there's not much point in modifying the macro IMO. > > > --- a/xen/drivers/passthrough/x86/hvm.c > > +++ b/xen/drivers/passthrough/x86/hvm.c > > @@ -269,7 +269,7 @@ int pt_irq_create_bind( > > { > > case PT_IRQ_TYPE_MSI: > > { > > - uint8_t dest, delivery_mode; > > + unsigned int dest, delivery_mode; > > bool dest_mode; > > If you touch delivery_mode's type, wouldn't that better become bool? > > > --- a/xen/include/public/domctl.h > > +++ b/xen/include/public/domctl.h > > @@ -588,6 +588,7 @@ struct xen_domctl_bind_pt_irq { > > #define XEN_DOMCTL_VMSI_X86_DELIV_MASK 0x007000 > > #define XEN_DOMCTL_VMSI_X86_TRIG_MASK 0x008000 > > #define XEN_DOMCTL_VMSI_X86_UNMASKED 0x010000 > > +#define XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK 0xfe0000 > > I'm not convinced it is a good idea to limit the overall destination > ID width to 15 bits here - at the interface level we could as well > permit more bits right away; the implementation would reject too high > a value, of course. Not only with this I further wonder whether the > field shouldn't be unsplit while extending it. You won't get away > without bumping XEN_DOMCTL_INTERFACE_VERSION anyway (unless it was > bumped already for 4.17) since afaics the unused bits of this field > previously weren't checked for being zero. We could easily have 8 > bits vector, 16 bits flags, and 32 bits destination ID in the struct. > And there would then still be 8 unused bits (which from now on we > ought to check for being zero). So I've made gflags a 64bit field, used the high 32bits for the destination ID, and the low ones for flags. I've left gvec as a separate field in the struct, as I don't want to require a modification to QEMU, just a rebuild against the updated headers will be enough. I've been wondering about this interface though (xen_domctl_bind_pt_irq), and it would seem better to just pass the raw MSI address and data fields from the guest and let Xen do the decoding. This however is not trivial to do as we would need to keep the previous interface anyway as it's used by QEMU. Maybe we could have some kind of union between a pair of address and data fields and a gflags one that would match the native layout, but as said not trivial (and would require using anonymous unions which I'm not sure are accepted even for domctls in the public headers). Thanks, Roger. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/3] x86/vmsi: add support for extended destination ID in address field 2022-02-04 9:23 ` Roger Pau Monné @ 2022-02-04 9:30 ` Jan Beulich 2022-02-04 9:54 ` Roger Pau Monné 0 siblings, 1 reply; 18+ messages in thread From: Jan Beulich @ 2022-02-04 9:30 UTC (permalink / raw) To: Roger Pau Monné Cc: Andrew Cooper, Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini, Paul Durrant, xen-devel On 04.02.2022 10:23, Roger Pau Monné wrote: > On Mon, Jan 24, 2022 at 02:47:58PM +0100, Jan Beulich wrote: >> On 20.01.2022 16:23, Roger Pau Monne wrote: >>> --- a/xen/arch/x86/include/asm/msi.h >>> +++ b/xen/arch/x86/include/asm/msi.h >>> @@ -54,6 +54,7 @@ >>> #define MSI_ADDR_DEST_ID_SHIFT 12 >>> #define MSI_ADDR_DEST_ID_MASK 0x00ff000 >>> #define MSI_ADDR_DEST_ID(dest) (((dest) << MSI_ADDR_DEST_ID_SHIFT) & MSI_ADDR_DEST_ID_MASK) >>> +#define MSI_ADDR_EXT_DEST_ID_MASK 0x0000fe0 >> >> Especially the immediately preceding macro now becomes kind of stale. > > Hm, I'm not so sure about that. We could expand the macro to place the > high bits in dest at bits 11:4 of the resulting address. However that > macro (MSI_ADDR_DEST_ID) is only used by Xen to compose its own > messages, so until we add support for the hypervisor itself to use the > extended destination ID mode there's not much point in modifying the > macro IMO. Well, this is all unhelpful considering the different form of extended ID in Intel's doc. At least by way of a comment things need clarifying and potential pitfalls need pointing out imo. >>> --- a/xen/include/public/domctl.h >>> +++ b/xen/include/public/domctl.h >>> @@ -588,6 +588,7 @@ struct xen_domctl_bind_pt_irq { >>> #define XEN_DOMCTL_VMSI_X86_DELIV_MASK 0x007000 >>> #define XEN_DOMCTL_VMSI_X86_TRIG_MASK 0x008000 >>> #define XEN_DOMCTL_VMSI_X86_UNMASKED 0x010000 >>> +#define XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK 0xfe0000 >> >> I'm not convinced it is a good idea to limit the overall destination >> ID width to 15 bits here - at the interface level we could as well >> permit more bits right away; the implementation would reject too high >> a value, of course. Not only with this I further wonder whether the >> field shouldn't be unsplit while extending it. You won't get away >> without bumping XEN_DOMCTL_INTERFACE_VERSION anyway (unless it was >> bumped already for 4.17) since afaics the unused bits of this field >> previously weren't checked for being zero. We could easily have 8 >> bits vector, 16 bits flags, and 32 bits destination ID in the struct. >> And there would then still be 8 unused bits (which from now on we >> ought to check for being zero). > > So I've made gflags a 64bit field, used the high 32bits for the > destination ID, and the low ones for flags. I've left gvec as a > separate field in the struct, as I don't want to require a > modification to QEMU, just a rebuild against the updated headers will > be enough. Hmm, wait - if qemu uses this without going through a suitable abstraction in at least libxc, then we cannot _ever_ change this interface: If a rebuild was required, old qemu binaries would stop working with newer Xen. If such a dependency indeed exists, we'll need a prominent warning comment in the public header. Jan > I've been wondering about this interface though > (xen_domctl_bind_pt_irq), and it would seem better to just pass the > raw MSI address and data fields from the guest and let Xen do the > decoding. This however is not trivial to do as we would need to keep > the previous interface anyway as it's used by QEMU. Maybe we could > have some kind of union between a pair of address and data fields and > a gflags one that would match the native layout, but as said not > trivial (and would require using anonymous unions which I'm not sure > are accepted even for domctls in the public headers). > > Thanks, Roger. > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/3] x86/vmsi: add support for extended destination ID in address field 2022-02-04 9:30 ` Jan Beulich @ 2022-02-04 9:54 ` Roger Pau Monné 2022-02-04 10:20 ` Jan Beulich 0 siblings, 1 reply; 18+ messages in thread From: Roger Pau Monné @ 2022-02-04 9:54 UTC (permalink / raw) To: Jan Beulich Cc: Andrew Cooper, Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini, Paul Durrant, xen-devel On Fri, Feb 04, 2022 at 10:30:54AM +0100, Jan Beulich wrote: > On 04.02.2022 10:23, Roger Pau Monné wrote: > > On Mon, Jan 24, 2022 at 02:47:58PM +0100, Jan Beulich wrote: > >> On 20.01.2022 16:23, Roger Pau Monne wrote: > >>> --- a/xen/arch/x86/include/asm/msi.h > >>> +++ b/xen/arch/x86/include/asm/msi.h > >>> @@ -54,6 +54,7 @@ > >>> #define MSI_ADDR_DEST_ID_SHIFT 12 > >>> #define MSI_ADDR_DEST_ID_MASK 0x00ff000 > >>> #define MSI_ADDR_DEST_ID(dest) (((dest) << MSI_ADDR_DEST_ID_SHIFT) & MSI_ADDR_DEST_ID_MASK) > >>> +#define MSI_ADDR_EXT_DEST_ID_MASK 0x0000fe0 > >> > >> Especially the immediately preceding macro now becomes kind of stale. > > > > Hm, I'm not so sure about that. We could expand the macro to place the > > high bits in dest at bits 11:4 of the resulting address. However that > > macro (MSI_ADDR_DEST_ID) is only used by Xen to compose its own > > messages, so until we add support for the hypervisor itself to use the > > extended destination ID mode there's not much point in modifying the > > macro IMO. > > Well, this is all unhelpful considering the different form of extended > ID in Intel's doc. At least by way of a comment things need clarifying > and potential pitfalls need pointing out imo. Sure, will add some comments there. > >>> --- a/xen/include/public/domctl.h > >>> +++ b/xen/include/public/domctl.h > >>> @@ -588,6 +588,7 @@ struct xen_domctl_bind_pt_irq { > >>> #define XEN_DOMCTL_VMSI_X86_DELIV_MASK 0x007000 > >>> #define XEN_DOMCTL_VMSI_X86_TRIG_MASK 0x008000 > >>> #define XEN_DOMCTL_VMSI_X86_UNMASKED 0x010000 > >>> +#define XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK 0xfe0000 > >> > >> I'm not convinced it is a good idea to limit the overall destination > >> ID width to 15 bits here - at the interface level we could as well > >> permit more bits right away; the implementation would reject too high > >> a value, of course. Not only with this I further wonder whether the > >> field shouldn't be unsplit while extending it. You won't get away > >> without bumping XEN_DOMCTL_INTERFACE_VERSION anyway (unless it was > >> bumped already for 4.17) since afaics the unused bits of this field > >> previously weren't checked for being zero. We could easily have 8 > >> bits vector, 16 bits flags, and 32 bits destination ID in the struct. > >> And there would then still be 8 unused bits (which from now on we > >> ought to check for being zero). > > > > So I've made gflags a 64bit field, used the high 32bits for the > > destination ID, and the low ones for flags. I've left gvec as a > > separate field in the struct, as I don't want to require a > > modification to QEMU, just a rebuild against the updated headers will > > be enough. > > Hmm, wait - if qemu uses this without going through a suitable > abstraction in at least libxc, then we cannot _ever_ change this > interface: If a rebuild was required, old qemu binaries would > stop working with newer Xen. If such a dependency indeed exists, > we'll need a prominent warning comment in the public header. Hm, it's bad. The xc_domain_update_msi_irq interface uses a gflags parameter that's the gflags parameter of xen_domctl_bind_pt_irq. Which is even worse because it's not using the mask definitions from domctl.h, but rather a copy of them named XEN_PT_GFLAGS_* that are hardcoded in xen_pt_msi.c in QEMU code. So we can likely expand the layout of gflags, but moving fields is not an option. I think my original proposal of adding a XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK mask is the less bad option until we add a new stable interface for device passthrough for QEMU. Thanks, Roger. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/3] x86/vmsi: add support for extended destination ID in address field 2022-02-04 9:54 ` Roger Pau Monné @ 2022-02-04 10:20 ` Jan Beulich 0 siblings, 0 replies; 18+ messages in thread From: Jan Beulich @ 2022-02-04 10:20 UTC (permalink / raw) To: Roger Pau Monné Cc: Andrew Cooper, Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini, Paul Durrant, xen-devel On 04.02.2022 10:54, Roger Pau Monné wrote: > On Fri, Feb 04, 2022 at 10:30:54AM +0100, Jan Beulich wrote: >> On 04.02.2022 10:23, Roger Pau Monné wrote: >>> On Mon, Jan 24, 2022 at 02:47:58PM +0100, Jan Beulich wrote: >>>> On 20.01.2022 16:23, Roger Pau Monne wrote: >>>>> --- a/xen/arch/x86/include/asm/msi.h >>>>> +++ b/xen/arch/x86/include/asm/msi.h >>>>> @@ -54,6 +54,7 @@ >>>>> #define MSI_ADDR_DEST_ID_SHIFT 12 >>>>> #define MSI_ADDR_DEST_ID_MASK 0x00ff000 >>>>> #define MSI_ADDR_DEST_ID(dest) (((dest) << MSI_ADDR_DEST_ID_SHIFT) & MSI_ADDR_DEST_ID_MASK) >>>>> +#define MSI_ADDR_EXT_DEST_ID_MASK 0x0000fe0 >>>> >>>> Especially the immediately preceding macro now becomes kind of stale. >>> >>> Hm, I'm not so sure about that. We could expand the macro to place the >>> high bits in dest at bits 11:4 of the resulting address. However that >>> macro (MSI_ADDR_DEST_ID) is only used by Xen to compose its own >>> messages, so until we add support for the hypervisor itself to use the >>> extended destination ID mode there's not much point in modifying the >>> macro IMO. >> >> Well, this is all unhelpful considering the different form of extended >> ID in Intel's doc. At least by way of a comment things need clarifying >> and potential pitfalls need pointing out imo. > > Sure, will add some comments there. > >>>>> --- a/xen/include/public/domctl.h >>>>> +++ b/xen/include/public/domctl.h >>>>> @@ -588,6 +588,7 @@ struct xen_domctl_bind_pt_irq { >>>>> #define XEN_DOMCTL_VMSI_X86_DELIV_MASK 0x007000 >>>>> #define XEN_DOMCTL_VMSI_X86_TRIG_MASK 0x008000 >>>>> #define XEN_DOMCTL_VMSI_X86_UNMASKED 0x010000 >>>>> +#define XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK 0xfe0000 >>>> >>>> I'm not convinced it is a good idea to limit the overall destination >>>> ID width to 15 bits here - at the interface level we could as well >>>> permit more bits right away; the implementation would reject too high >>>> a value, of course. Not only with this I further wonder whether the >>>> field shouldn't be unsplit while extending it. You won't get away >>>> without bumping XEN_DOMCTL_INTERFACE_VERSION anyway (unless it was >>>> bumped already for 4.17) since afaics the unused bits of this field >>>> previously weren't checked for being zero. We could easily have 8 >>>> bits vector, 16 bits flags, and 32 bits destination ID in the struct. >>>> And there would then still be 8 unused bits (which from now on we >>>> ought to check for being zero). >>> >>> So I've made gflags a 64bit field, used the high 32bits for the >>> destination ID, and the low ones for flags. I've left gvec as a >>> separate field in the struct, as I don't want to require a >>> modification to QEMU, just a rebuild against the updated headers will >>> be enough. >> >> Hmm, wait - if qemu uses this without going through a suitable >> abstraction in at least libxc, then we cannot _ever_ change this >> interface: If a rebuild was required, old qemu binaries would >> stop working with newer Xen. If such a dependency indeed exists, >> we'll need a prominent warning comment in the public header. > > Hm, it's bad. The xc_domain_update_msi_irq interface uses a gflags > parameter that's the gflags parameter of xen_domctl_bind_pt_irq. Which > is even worse because it's not using the mask definitions from > domctl.h, but rather a copy of them named XEN_PT_GFLAGS_* that are > hardcoded in xen_pt_msi.c in QEMU code. > > So we can likely expand the layout of gflags, but moving fields is not > an option. I think my original proposal of adding a > XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK mask is the less bad option until > we add a new stable interface for device passthrough for QEMU. Given the observations - yeah, not much of a choice left. Jan ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 3/3] HACK: allow adding an offset to the x2APIC ID 2022-01-20 15:23 [PATCH 0/3] x86/hvm: add support for extended destination ID Roger Pau Monne 2022-01-20 15:23 ` [PATCH 1/3] xen/vioapic: add support for the extended destination ID field Roger Pau Monne 2022-01-20 15:23 ` [PATCH 2/3] x86/vmsi: add support for extended destination ID in address field Roger Pau Monne @ 2022-01-20 15:23 ` Roger Pau Monne 2022-01-26 14:03 ` David Woodhouse 2 siblings, 1 reply; 18+ messages in thread From: Roger Pau Monne @ 2022-01-20 15:23 UTC (permalink / raw) To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, Wei Liu In order to test external interrupts using a destination ID > 255. Also start vCPUs with the APIC in x2APIC mode. --- xen/arch/x86/cpuid.c | 12 +++++++++++- xen/arch/x86/hvm/dom0_build.c | 3 ++- xen/arch/x86/hvm/vlapic.c | 14 ++++++++++++-- xen/arch/x86/include/asm/hvm/vlapic.h | 2 ++ 4 files changed, 27 insertions(+), 4 deletions(-) diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c index 0407a54626..01dcd474e8 100644 --- a/xen/arch/x86/cpuid.c +++ b/xen/arch/x86/cpuid.c @@ -8,6 +8,7 @@ #include <asm/hvm/nestedhvm.h> #include <asm/hvm/svm/svm.h> #include <asm/hvm/viridian.h> +#include <asm/hvm/vlapic.h> #include <asm/hvm/vmx/vmcs.h> #include <asm/paging.h> #include <asm/processor.h> @@ -876,7 +877,14 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf, /* TODO: Rework topology logic. */ res->b &= 0x00ffffffu; if ( is_hvm_domain(d) ) - res->b |= (v->vcpu_id * 2) << 24; + { + unsigned int id = v->vcpu_id * 2; + + if ( id ) + id += opt_x2apic_id_offset; + + res->b |= id << 24; + } /* TODO: Rework vPMU control in terms of toolstack choices. */ if ( vpmu_available(v) && @@ -1058,6 +1066,8 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf, /* Fix the x2APIC identifier. */ res->d = v->vcpu_id * 2; + if ( res->d ) + res->d += opt_x2apic_id_offset; } break; diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c index 43e1bf1248..b00e45885c 100644 --- a/xen/arch/x86/hvm/dom0_build.c +++ b/xen/arch/x86/hvm/dom0_build.c @@ -30,6 +30,7 @@ #include <asm/bzimage.h> #include <asm/dom0_build.h> #include <asm/hvm/support.h> +#include <asm/hvm/vlapic.h> #include <asm/io_apic.h> #include <asm/p2m.h> #include <asm/paging.h> @@ -845,7 +846,7 @@ static int __init pvh_setup_acpi_madt(struct domain *d, paddr_t *addr) x2apic->header.type = ACPI_MADT_TYPE_LOCAL_X2APIC; x2apic->header.length = sizeof(*x2apic); x2apic->uid = i; - x2apic->local_apic_id = i * 2; + x2apic->local_apic_id = i * 2 + (i ? opt_x2apic_id_offset : 0); x2apic->lapic_flags = ACPI_MADT_ENABLED; x2apic++; } diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c index b8c84458ff..34209d5378 100644 --- a/xen/arch/x86/hvm/vlapic.c +++ b/xen/arch/x86/hvm/vlapic.c @@ -24,6 +24,7 @@ #include <xen/domain_page.h> #include <xen/event.h> #include <xen/nospec.h> +#include <xen/param.h> #include <xen/trace.h> #include <xen/lib.h> #include <xen/sched.h> @@ -53,6 +54,9 @@ (LVT_MASK | APIC_MODE_MASK | APIC_INPUT_POLARITY |\ APIC_LVT_REMOTE_IRR | APIC_LVT_LEVEL_TRIGGER) +unsigned int opt_x2apic_id_offset; +integer_param("x2apic_id_offset", opt_x2apic_id_offset); + static const unsigned int vlapic_lvt_mask[VLAPIC_LVT_NUM] = { /* LVTT */ @@ -1073,7 +1077,7 @@ static void set_x2apic_id(struct vlapic *vlapic) u32 id = vlapic_vcpu(vlapic)->vcpu_id; u32 ldr = ((id & ~0xf) << 12) | (1 << (id & 0xf)); - vlapic_set_reg(vlapic, APIC_ID, id * 2); + vlapic_set_reg(vlapic, APIC_ID, id * 2 + (id ? opt_x2apic_id_offset : 0)); vlapic_set_reg(vlapic, APIC_LDR, ldr); } @@ -1443,7 +1447,13 @@ void vlapic_reset(struct vlapic *vlapic) if ( v->vcpu_id == 0 ) vlapic->hw.apic_base_msr |= APIC_BASE_BSP; - vlapic_set_reg(vlapic, APIC_ID, (v->vcpu_id * 2) << 24); + /* start in x2APIC mode. */ + vlapic->hw.apic_base_msr |= APIC_BASE_EXTD; + set_x2apic_id(vlapic); +#if 0 + vlapic_set_reg(vlapic, APIC_ID, id << 24); +#endif + vlapic_do_init(vlapic); } diff --git a/xen/arch/x86/include/asm/hvm/vlapic.h b/xen/arch/x86/include/asm/hvm/vlapic.h index 8f908928c3..6e837cb5bf 100644 --- a/xen/arch/x86/include/asm/hvm/vlapic.h +++ b/xen/arch/x86/include/asm/hvm/vlapic.h @@ -91,6 +91,8 @@ struct vlapic { } init_sipi; }; +extern unsigned int opt_x2apic_id_offset; + /* vlapic's frequence is 100 MHz */ #define APIC_BUS_CYCLE_NS 10 -- 2.34.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] HACK: allow adding an offset to the x2APIC ID 2022-01-20 15:23 ` [PATCH 3/3] HACK: allow adding an offset to the x2APIC ID Roger Pau Monne @ 2022-01-26 14:03 ` David Woodhouse 0 siblings, 0 replies; 18+ messages in thread From: David Woodhouse @ 2022-01-26 14:03 UTC (permalink / raw) To: Roger Pau Monne, xen-devel; +Cc: Jan Beulich, Andrew Cooper, Wei Liu [-- Attachment #1: Type: text/plain, Size: 368 bytes --] On Thu, 2022-01-20 at 16:23 +0100, Roger Pau Monne wrote: > In order to test external interrupts using a destination ID > 255. > Also start vCPUs with the APIC in x2APIC mode. Do also test APIC ID == 255 too, not just > 255. That one is particularly interesting since it was *broadcast* in XAPIC mode. It broke at least once in the qemu/KVM setup with X2APIC. [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 5965 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2022-02-04 10:20 UTC | newest] Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-01-20 15:23 [PATCH 0/3] x86/hvm: add support for extended destination ID Roger Pau Monne 2022-01-20 15:23 ` [PATCH 1/3] xen/vioapic: add support for the extended destination ID field Roger Pau Monne 2022-01-24 13:20 ` Jan Beulich 2022-01-25 15:13 ` Roger Pau Monné 2022-01-26 12:47 ` Jan Beulich 2022-01-26 19:21 ` David Woodhouse 2022-01-26 13:52 ` David Woodhouse 2022-01-26 14:23 ` Jan Beulich 2022-01-20 15:23 ` [PATCH 2/3] x86/vmsi: add support for extended destination ID in address field Roger Pau Monne 2022-01-24 13:47 ` Jan Beulich 2022-01-26 13:54 ` David Woodhouse 2022-01-26 14:22 ` Roger Pau Monné 2022-02-04 9:23 ` Roger Pau Monné 2022-02-04 9:30 ` Jan Beulich 2022-02-04 9:54 ` Roger Pau Monné 2022-02-04 10:20 ` Jan Beulich 2022-01-20 15:23 ` [PATCH 3/3] HACK: allow adding an offset to the x2APIC ID Roger Pau Monne 2022-01-26 14:03 ` David Woodhouse
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.