* [PATCH 0/2] VT-d: address fallout from XSA-400 @ 2022-04-06 12:22 Jan Beulich 2022-04-06 12:24 ` [PATCH 1/2] VT-d: avoid NULL deref on domain_context_mapping_one() error paths Jan Beulich 2022-04-06 12:25 ` [PATCH 2/2] VT-d: avoid infinite recursion on domain_context_mapping_one() error path Jan Beulich 0 siblings, 2 replies; 8+ messages in thread From: Jan Beulich @ 2022-04-06 12:22 UTC (permalink / raw) To: xen-devel; +Cc: Roger Pau Monné, Kevin Tian, Andrew Cooper, Paul Durrant 1: avoid NULL deref on domain_context_mapping_one() error paths 2: avoid infinite recursion on domain_context_mapping_one() error path Jan ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/2] VT-d: avoid NULL deref on domain_context_mapping_one() error paths 2022-04-06 12:22 [PATCH 0/2] VT-d: address fallout from XSA-400 Jan Beulich @ 2022-04-06 12:24 ` Jan Beulich 2022-04-06 13:36 ` Roger Pau Monné 2022-04-06 12:25 ` [PATCH 2/2] VT-d: avoid infinite recursion on domain_context_mapping_one() error path Jan Beulich 1 sibling, 1 reply; 8+ messages in thread From: Jan Beulich @ 2022-04-06 12:24 UTC (permalink / raw) To: xen-devel Cc: Roger Pau Monné, Kevin Tian, Paul Durrant, Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini, Wei Liu First there's a printk() which actually wrongly uses pdev in the first place: We want to log the coordinates of the (perhaps fake) device acted upon, which may not be pdev. Then it was quite pointless for eb19326a328d ("VT-d: prepare for per- device quarantine page tables (part I)") to add a domid_t parameter to domain_context_unmap_one(): It's only used to pass back here via me_wifi_quirk() -> map_me_phantom_function(). Drop the parameter again. Finally there's the invocation of domain_context_mapping_one(), which needs to be passed the correct domain ID. Avoid taking that path when pdev is NULL and the quarantine state is what would need restoring to. This means we can't security-support PCI devices with RMRRs (if such exist in practice) any longer. Fixes: 8f41e481b485 ("VT-d: re-assign devices directly") Fixes: 14dd241aad8a ("IOMMU/x86: use per-device page tables for quarantining") Coverity ID: 1503784 Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> --- a/SUPPORT.md +++ b/SUPPORT.md @@ -750,6 +750,10 @@ However, this feature can still confer s when used to remove drivers and backends from domain 0 (i.e., Driver Domains). +On VT-d (Intel hardware) passing through plain PCI (or PCI-X) devices +when they have associated Reserved Memory Regions (RMRRs) +is not security supported, if such a combination exists in the first place. + ### x86/Multiple IOREQ servers An IOREQ server provides emulated devices to HVM and PVH guests. --- a/xen/drivers/passthrough/vtd/extern.h +++ b/xen/drivers/passthrough/vtd/extern.h @@ -85,7 +85,7 @@ int domain_context_mapping_one(struct do const struct pci_dev *pdev, domid_t domid, paddr_t pgd_maddr, unsigned int mode); int domain_context_unmap_one(struct domain *domain, struct vtd_iommu *iommu, - uint8_t bus, uint8_t devfn, domid_t domid); + uint8_t bus, uint8_t devfn); int cf_check intel_iommu_get_reserved_device_memory( iommu_grdm_t *func, void *ctxt); --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -1533,7 +1533,7 @@ int domain_context_mapping_one( check_cleanup_domid_map(domain, pdev, iommu); printk(XENLOG_ERR "%pp: unexpected context entry %016lx_%016lx (expected %016lx_%016lx)\n", - &PCI_SBDF3(pdev->seg, pdev->bus, devfn), + &PCI_SBDF3(seg, bus, devfn), (uint64_t)(res >> 64), (uint64_t)res, (uint64_t)(old >> 64), (uint64_t)old); rc = -EILSEQ; @@ -1601,9 +1601,13 @@ int domain_context_mapping_one( if ( rc ) { - if ( !prev_dom ) - ret = domain_context_unmap_one(domain, iommu, bus, devfn, - DEVICE_DOMID(domain, pdev)); + if ( !prev_dom || + /* + * Unmapping here means PCI devices with RMRRs (if such exist) + * will cause problems if such a region was actually accessed. + */ + (prev_dom == dom_io && !pdev) ) + ret = domain_context_unmap_one(domain, iommu, bus, devfn); else if ( prev_dom != domain ) /* Avoid infinite recursion. */ ret = domain_context_mapping_one(prev_dom, iommu, bus, devfn, pdev, DEVICE_DOMID(prev_dom, pdev), @@ -1809,7 +1813,7 @@ static int domain_context_mapping(struct int domain_context_unmap_one( struct domain *domain, struct vtd_iommu *iommu, - uint8_t bus, uint8_t devfn, domid_t domid) + uint8_t bus, uint8_t devfn) { struct context_entry *context, *context_entries; u64 maddr; @@ -1867,7 +1871,8 @@ int domain_context_unmap_one( unmap_vtd_domain_page(context_entries); if ( !iommu->drhd->segment && !rc ) - rc = me_wifi_quirk(domain, bus, devfn, domid, 0, UNMAP_ME_PHANTOM_FUNC); + rc = me_wifi_quirk(domain, bus, devfn, DOMID_INVALID, 0, + UNMAP_ME_PHANTOM_FUNC); if ( rc && !is_hardware_domain(domain) && domain != dom_io ) { @@ -1916,8 +1921,7 @@ static const struct acpi_drhd_unit *doma if ( iommu_debug ) printk(VTDPREFIX "%pd:PCIe: unmap %pp\n", domain, &PCI_SBDF3(seg, bus, devfn)); - ret = domain_context_unmap_one(domain, iommu, bus, devfn, - DEVICE_DOMID(domain, pdev)); + ret = domain_context_unmap_one(domain, iommu, bus, devfn); if ( !ret && devfn == pdev->devfn && ats_device(pdev, drhd) > 0 ) disable_ats_device(pdev); @@ -1930,8 +1934,7 @@ static const struct acpi_drhd_unit *doma if ( iommu_debug ) printk(VTDPREFIX "%pd:PCI: unmap %pp\n", domain, &PCI_SBDF3(seg, bus, devfn)); - ret = domain_context_unmap_one(domain, iommu, bus, devfn, - DEVICE_DOMID(domain, pdev)); + ret = domain_context_unmap_one(domain, iommu, bus, devfn); if ( ret ) break; @@ -1954,12 +1957,10 @@ static const struct acpi_drhd_unit *doma break; } - ret = domain_context_unmap_one(domain, iommu, tmp_bus, tmp_devfn, - DEVICE_DOMID(domain, pdev)); + ret = domain_context_unmap_one(domain, iommu, tmp_bus, tmp_devfn); /* PCIe to PCI/PCIx bridge */ if ( !ret && pdev_type(seg, tmp_bus, tmp_devfn) == DEV_TYPE_PCIe2PCI_BRIDGE ) - ret = domain_context_unmap_one(domain, iommu, secbus, 0, - DEVICE_DOMID(domain, pdev)); + ret = domain_context_unmap_one(domain, iommu, secbus, 0); break; --- a/xen/drivers/passthrough/vtd/quirks.c +++ b/xen/drivers/passthrough/vtd/quirks.c @@ -427,7 +427,7 @@ static int __must_check map_me_phantom_f domid, pgd_maddr, mode); else rc = domain_context_unmap_one(domain, drhd->iommu, 0, - PCI_DEVFN(dev, 7), domid); + PCI_DEVFN(dev, 7)); return rc; } ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] VT-d: avoid NULL deref on domain_context_mapping_one() error paths 2022-04-06 12:24 ` [PATCH 1/2] VT-d: avoid NULL deref on domain_context_mapping_one() error paths Jan Beulich @ 2022-04-06 13:36 ` Roger Pau Monné 2022-04-06 14:07 ` Jan Beulich 0 siblings, 1 reply; 8+ messages in thread From: Roger Pau Monné @ 2022-04-06 13:36 UTC (permalink / raw) To: Jan Beulich Cc: xen-devel, Kevin Tian, Paul Durrant, Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini, Wei Liu On Wed, Apr 06, 2022 at 02:24:32PM +0200, Jan Beulich wrote: > First there's a printk() which actually wrongly uses pdev in the first > place: We want to log the coordinates of the (perhaps fake) device > acted upon, which may not be pdev. > > Then it was quite pointless for eb19326a328d ("VT-d: prepare for per- > device quarantine page tables (part I)") to add a domid_t parameter to > domain_context_unmap_one(): It's only used to pass back here via > me_wifi_quirk() -> map_me_phantom_function(). Drop the parameter again. > > Finally there's the invocation of domain_context_mapping_one(), which > needs to be passed the correct domain ID. Avoid taking that path when > pdev is NULL and the quarantine state is what would need restoring to. > This means we can't security-support PCI devices with RMRRs (if such > exist in practice) any longer. > > Fixes: 8f41e481b485 ("VT-d: re-assign devices directly") > Fixes: 14dd241aad8a ("IOMMU/x86: use per-device page tables for quarantining") > Coverity ID: 1503784 > Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> > Signed-off-by: Jan Beulich <jbeulich@suse.com> > > --- a/SUPPORT.md > +++ b/SUPPORT.md > @@ -750,6 +750,10 @@ However, this feature can still confer s > when used to remove drivers and backends from domain 0 > (i.e., Driver Domains). > > +On VT-d (Intel hardware) passing through plain PCI (or PCI-X) devices > +when they have associated Reserved Memory Regions (RMRRs) > +is not security supported, if such a combination exists in the first place. Hm, I think this could be confusing from a user PoV. It's my understanding you want to differentiate between DEV_TYPE_PCIe_ENDPOINT and DEV_TYPE_PCI device types, so maybe we could use: "On VT-d (Intel hardware) passing through non PCI Express devices with associated Reserved Memory Regions (RMRRs) is not supported." AFAICT domain_context_mapping will already prevent this from happening by returning -EOPNOTSUPP (see the DEV_TYPE_PCI case handling). > ### x86/Multiple IOREQ servers > > An IOREQ server provides emulated devices to HVM and PVH guests. > --- a/xen/drivers/passthrough/vtd/extern.h > +++ b/xen/drivers/passthrough/vtd/extern.h > @@ -85,7 +85,7 @@ int domain_context_mapping_one(struct do > const struct pci_dev *pdev, domid_t domid, > paddr_t pgd_maddr, unsigned int mode); > int domain_context_unmap_one(struct domain *domain, struct vtd_iommu *iommu, > - uint8_t bus, uint8_t devfn, domid_t domid); > + uint8_t bus, uint8_t devfn); > int cf_check intel_iommu_get_reserved_device_memory( > iommu_grdm_t *func, void *ctxt); > > --- a/xen/drivers/passthrough/vtd/iommu.c > +++ b/xen/drivers/passthrough/vtd/iommu.c > @@ -1533,7 +1533,7 @@ int domain_context_mapping_one( > check_cleanup_domid_map(domain, pdev, iommu); > printk(XENLOG_ERR > "%pp: unexpected context entry %016lx_%016lx (expected %016lx_%016lx)\n", > - &PCI_SBDF3(pdev->seg, pdev->bus, devfn), > + &PCI_SBDF3(seg, bus, devfn), > (uint64_t)(res >> 64), (uint64_t)res, > (uint64_t)(old >> 64), (uint64_t)old); > rc = -EILSEQ; > @@ -1601,9 +1601,13 @@ int domain_context_mapping_one( > > if ( rc ) > { > - if ( !prev_dom ) > - ret = domain_context_unmap_one(domain, iommu, bus, devfn, > - DEVICE_DOMID(domain, pdev)); > + if ( !prev_dom || > + /* > + * Unmapping here means PCI devices with RMRRs (if such exist) > + * will cause problems if such a region was actually accessed. > + */ > + (prev_dom == dom_io && !pdev) ) Maybe I'm reading this wrong, but plain PCI devices with RMRRs are only allowed to be assigned to the hardware domain, and won't be able to be reassigned afterwards. It would be fine to unmap unconditionally if !prev_dom or !pdev? As calls with !pdev only happening for phantom functions or bridge devices. Thanks, Roger. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] VT-d: avoid NULL deref on domain_context_mapping_one() error paths 2022-04-06 13:36 ` Roger Pau Monné @ 2022-04-06 14:07 ` Jan Beulich 2022-04-06 15:01 ` Roger Pau Monné 0 siblings, 1 reply; 8+ messages in thread From: Jan Beulich @ 2022-04-06 14:07 UTC (permalink / raw) To: Roger Pau Monné Cc: xen-devel, Kevin Tian, Paul Durrant, Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini, Wei Liu On 06.04.2022 15:36, Roger Pau Monné wrote: > On Wed, Apr 06, 2022 at 02:24:32PM +0200, Jan Beulich wrote: >> First there's a printk() which actually wrongly uses pdev in the first >> place: We want to log the coordinates of the (perhaps fake) device >> acted upon, which may not be pdev. >> >> Then it was quite pointless for eb19326a328d ("VT-d: prepare for per- >> device quarantine page tables (part I)") to add a domid_t parameter to >> domain_context_unmap_one(): It's only used to pass back here via >> me_wifi_quirk() -> map_me_phantom_function(). Drop the parameter again. >> >> Finally there's the invocation of domain_context_mapping_one(), which >> needs to be passed the correct domain ID. Avoid taking that path when >> pdev is NULL and the quarantine state is what would need restoring to. >> This means we can't security-support PCI devices with RMRRs (if such >> exist in practice) any longer. >> >> Fixes: 8f41e481b485 ("VT-d: re-assign devices directly") >> Fixes: 14dd241aad8a ("IOMMU/x86: use per-device page tables for quarantining") >> Coverity ID: 1503784 >> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> >> Signed-off-by: Jan Beulich <jbeulich@suse.com> >> >> --- a/SUPPORT.md >> +++ b/SUPPORT.md >> @@ -750,6 +750,10 @@ However, this feature can still confer s >> when used to remove drivers and backends from domain 0 >> (i.e., Driver Domains). >> >> +On VT-d (Intel hardware) passing through plain PCI (or PCI-X) devices >> +when they have associated Reserved Memory Regions (RMRRs) >> +is not security supported, if such a combination exists in the first place. > > Hm, I think this could be confusing from a user PoV. It's my > understanding you want to differentiate between DEV_TYPE_PCIe_ENDPOINT > and DEV_TYPE_PCI device types, so maybe we could use: > > "On VT-d (Intel hardware) passing through non PCI Express devices with > associated Reserved Memory Regions (RMRRs) is not supported." > > AFAICT domain_context_mapping will already prevent this from happening > by returning -EOPNOTSUPP (see the DEV_TYPE_PCI case handling). Hmm. I did look at that code while writing the patch, but I didn't draw the same conclusion. I'd like to tie the security support statement to what could technically be made work. IOW I don't like to say "not supported"; I'd like to stick to "not security supported", which won't change even if that -EOPNOTSUPP path would be replaced by a proper implementation. Even adding a sentence to say this doesn't even work at present would seem to me to go too far: Such a sentence would easily be forgotten if the situation changed. But I'd be willing to add such an auxiliary statement as a compromise. As to "plain PCI (or PCI-X)" vs "non PCI Express" - while I prefer to avoid a negation there, I'd be okay to switch if that's deemed better for potential readers. >> @@ -1601,9 +1601,13 @@ int domain_context_mapping_one( >> >> if ( rc ) >> { >> - if ( !prev_dom ) >> - ret = domain_context_unmap_one(domain, iommu, bus, devfn, >> - DEVICE_DOMID(domain, pdev)); >> + if ( !prev_dom || >> + /* >> + * Unmapping here means PCI devices with RMRRs (if such exist) >> + * will cause problems if such a region was actually accessed. >> + */ >> + (prev_dom == dom_io && !pdev) ) > > Maybe I'm reading this wrong, but plain PCI devices with RMRRs are > only allowed to be assigned to the hardware domain, and won't be able > to be reassigned afterwards. It would be fine to unmap > unconditionally if !prev_dom or !pdev? As calls with !pdev only > happening for phantom functions or bridge devices. Like with the support statement, I'd prefer this code to be independent of the (perhaps temporary) decision to not allow certain assignments. Since you mention phantom functions: Aiui their mapping operations will be done with a non-NULL pdev, unless of course they're phantom functions associated with a non-PCIe device (in which case the same secondary mappings with a NULL pdev would occur - imo pointlessly, as it would be the same bridge and the same secondary bus as for the actual device; I'm under the impression that error handling may not work properly when such redundant mappings occur). Jan ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] VT-d: avoid NULL deref on domain_context_mapping_one() error paths 2022-04-06 14:07 ` Jan Beulich @ 2022-04-06 15:01 ` Roger Pau Monné 2022-04-06 15:35 ` Jan Beulich 0 siblings, 1 reply; 8+ messages in thread From: Roger Pau Monné @ 2022-04-06 15:01 UTC (permalink / raw) To: Jan Beulich Cc: xen-devel, Kevin Tian, Paul Durrant, Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini, Wei Liu On Wed, Apr 06, 2022 at 04:07:24PM +0200, Jan Beulich wrote: > On 06.04.2022 15:36, Roger Pau Monné wrote: > > On Wed, Apr 06, 2022 at 02:24:32PM +0200, Jan Beulich wrote: > >> First there's a printk() which actually wrongly uses pdev in the first > >> place: We want to log the coordinates of the (perhaps fake) device > >> acted upon, which may not be pdev. > >> > >> Then it was quite pointless for eb19326a328d ("VT-d: prepare for per- > >> device quarantine page tables (part I)") to add a domid_t parameter to > >> domain_context_unmap_one(): It's only used to pass back here via > >> me_wifi_quirk() -> map_me_phantom_function(). Drop the parameter again. > >> > >> Finally there's the invocation of domain_context_mapping_one(), which > >> needs to be passed the correct domain ID. Avoid taking that path when > >> pdev is NULL and the quarantine state is what would need restoring to. > >> This means we can't security-support PCI devices with RMRRs (if such > >> exist in practice) any longer. > >> > >> Fixes: 8f41e481b485 ("VT-d: re-assign devices directly") > >> Fixes: 14dd241aad8a ("IOMMU/x86: use per-device page tables for quarantining") > >> Coverity ID: 1503784 > >> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> > >> Signed-off-by: Jan Beulich <jbeulich@suse.com> > >> > >> --- a/SUPPORT.md > >> +++ b/SUPPORT.md > >> @@ -750,6 +750,10 @@ However, this feature can still confer s > >> when used to remove drivers and backends from domain 0 > >> (i.e., Driver Domains). > >> > >> +On VT-d (Intel hardware) passing through plain PCI (or PCI-X) devices > >> +when they have associated Reserved Memory Regions (RMRRs) > >> +is not security supported, if such a combination exists in the first place. > > > > Hm, I think this could be confusing from a user PoV. It's my > > understanding you want to differentiate between DEV_TYPE_PCIe_ENDPOINT > > and DEV_TYPE_PCI device types, so maybe we could use: > > > > "On VT-d (Intel hardware) passing through non PCI Express devices with > > associated Reserved Memory Regions (RMRRs) is not supported." > > > > AFAICT domain_context_mapping will already prevent this from happening > > by returning -EOPNOTSUPP (see the DEV_TYPE_PCI case handling). > > Hmm. I did look at that code while writing the patch, but I didn't > draw the same conclusion. I'd like to tie the security support > statement to what could technically be made work. IOW I don't like > to say "not supported"; I'd like to stick to "not security > supported", which won't change even if that -EOPNOTSUPP path would > be replaced by a proper implementation. My preference for using 'not supported' was simply so that users don't need to worry whether their use-case fails in this category: Xen will simply reject the operation in the first place. Otherwise users might wonder whether some of the devices they are passing through are security supported or not (lacking proper information about how to check whether a device is PCI, PCI-X or PCIe and whether it has associated RMRR regions). I understand your worry here, but I do think we should aim to make this document as easy to parse as possible for users, and hence I wonder whether your proposed addition will cause unneeded confusion because that use-case is not allowed by the hypervisor in the first place. > Even adding a sentence to > say this doesn't even work at present would seem to me to go too > far: Such a sentence would easily be forgotten if the situation > changed. But I'd be willing to add such an auxiliary statement as > a compromise. > > As to "plain PCI (or PCI-X)" vs "non PCI Express" - while I prefer > to avoid a negation there, I'd be okay to switch if that's deemed > better for potential readers. Maybe it would be best to simply expand the comment before the RMRR check in domain_context_mapping() to note that removing the check will have security implications? > >> @@ -1601,9 +1601,13 @@ int domain_context_mapping_one( > >> > >> if ( rc ) > >> { > >> - if ( !prev_dom ) > >> - ret = domain_context_unmap_one(domain, iommu, bus, devfn, > >> - DEVICE_DOMID(domain, pdev)); > >> + if ( !prev_dom || > >> + /* > >> + * Unmapping here means PCI devices with RMRRs (if such exist) > >> + * will cause problems if such a region was actually accessed. > >> + */ > >> + (prev_dom == dom_io && !pdev) ) > > > > Maybe I'm reading this wrong, but plain PCI devices with RMRRs are > > only allowed to be assigned to the hardware domain, and won't be able > > to be reassigned afterwards. It would be fine to unmap > > unconditionally if !prev_dom or !pdev? As calls with !pdev only > > happening for phantom functions or bridge devices. > > Like with the support statement, I'd prefer this code to be independent > of the (perhaps temporary) decision to not allow certain assignments. I was just saying because it would make the code easier IMO, but maybe it doesn't matter that much. The comment however should also be adjusted to mention that refers to legacy DEV_TYPE_PCI type devices ('PCI devices with RMRRs' is too unspecific IMO). > Since you mention phantom functions: Aiui their mapping operations will > be done with a non-NULL pdev, unless of course they're phantom functions > associated with a non-PCIe device (in which case the same secondary > mappings with a NULL pdev would occur - imo pointlessly, as it would > be the same bridge and the same secondary bus as for the actual device; > I'm under the impression that error handling may not work properly when > such redundant mappings occur). The redundant mapping of the bridges would be fine as prev_dom == domain in that case, and cannot fail? Thanks, Roger. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] VT-d: avoid NULL deref on domain_context_mapping_one() error paths 2022-04-06 15:01 ` Roger Pau Monné @ 2022-04-06 15:35 ` Jan Beulich 0 siblings, 0 replies; 8+ messages in thread From: Jan Beulich @ 2022-04-06 15:35 UTC (permalink / raw) To: Roger Pau Monné Cc: xen-devel, Kevin Tian, Paul Durrant, Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini, Wei Liu On 06.04.2022 17:01, Roger Pau Monné wrote: > On Wed, Apr 06, 2022 at 04:07:24PM +0200, Jan Beulich wrote: >> On 06.04.2022 15:36, Roger Pau Monné wrote: >>> On Wed, Apr 06, 2022 at 02:24:32PM +0200, Jan Beulich wrote: >>>> First there's a printk() which actually wrongly uses pdev in the first >>>> place: We want to log the coordinates of the (perhaps fake) device >>>> acted upon, which may not be pdev. >>>> >>>> Then it was quite pointless for eb19326a328d ("VT-d: prepare for per- >>>> device quarantine page tables (part I)") to add a domid_t parameter to >>>> domain_context_unmap_one(): It's only used to pass back here via >>>> me_wifi_quirk() -> map_me_phantom_function(). Drop the parameter again. >>>> >>>> Finally there's the invocation of domain_context_mapping_one(), which >>>> needs to be passed the correct domain ID. Avoid taking that path when >>>> pdev is NULL and the quarantine state is what would need restoring to. >>>> This means we can't security-support PCI devices with RMRRs (if such >>>> exist in practice) any longer. >>>> >>>> Fixes: 8f41e481b485 ("VT-d: re-assign devices directly") >>>> Fixes: 14dd241aad8a ("IOMMU/x86: use per-device page tables for quarantining") >>>> Coverity ID: 1503784 >>>> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> >>>> Signed-off-by: Jan Beulich <jbeulich@suse.com> >>>> >>>> --- a/SUPPORT.md >>>> +++ b/SUPPORT.md >>>> @@ -750,6 +750,10 @@ However, this feature can still confer s >>>> when used to remove drivers and backends from domain 0 >>>> (i.e., Driver Domains). >>>> >>>> +On VT-d (Intel hardware) passing through plain PCI (or PCI-X) devices >>>> +when they have associated Reserved Memory Regions (RMRRs) >>>> +is not security supported, if such a combination exists in the first place. >>> >>> Hm, I think this could be confusing from a user PoV. It's my >>> understanding you want to differentiate between DEV_TYPE_PCIe_ENDPOINT >>> and DEV_TYPE_PCI device types, so maybe we could use: >>> >>> "On VT-d (Intel hardware) passing through non PCI Express devices with >>> associated Reserved Memory Regions (RMRRs) is not supported." >>> >>> AFAICT domain_context_mapping will already prevent this from happening >>> by returning -EOPNOTSUPP (see the DEV_TYPE_PCI case handling). >> >> Hmm. I did look at that code while writing the patch, but I didn't >> draw the same conclusion. I'd like to tie the security support >> statement to what could technically be made work. IOW I don't like >> to say "not supported"; I'd like to stick to "not security >> supported", which won't change even if that -EOPNOTSUPP path would >> be replaced by a proper implementation. > > My preference for using 'not supported' was simply so that users don't > need to worry whether their use-case fails in this category: Xen will > simply reject the operation in the first place. > > Otherwise users might wonder whether some of the devices they are > passing through are security supported or not (lacking proper > information about how to check whether a device is PCI, PCI-X or PCIe > and whether it has associated RMRR regions). > > I understand your worry here, but I do think we should aim to make > this document as easy to parse as possible for users, and hence I > wonder whether your proposed addition will cause unneeded confusion > because that use-case is not allowed by the hypervisor in the first > place. I guess I'll simply drop the SUPPORT.md addition then. It would probably have been a better fit right in one of the XSA-400 patches anyway. >> Even adding a sentence to >> say this doesn't even work at present would seem to me to go too >> far: Such a sentence would easily be forgotten if the situation >> changed. But I'd be willing to add such an auxiliary statement as >> a compromise. >> >> As to "plain PCI (or PCI-X)" vs "non PCI Express" - while I prefer >> to avoid a negation there, I'd be okay to switch if that's deemed >> better for potential readers. > > Maybe it would be best to simply expand the comment before the RMRR > check in domain_context_mapping() to note that removing the check will > have security implications? Hmm, with the changes I'm doing I don't think I make matters worse, so this wouldn't seem to me to belong here. >>>> @@ -1601,9 +1601,13 @@ int domain_context_mapping_one( >>>> >>>> if ( rc ) >>>> { >>>> - if ( !prev_dom ) >>>> - ret = domain_context_unmap_one(domain, iommu, bus, devfn, >>>> - DEVICE_DOMID(domain, pdev)); >>>> + if ( !prev_dom || >>>> + /* >>>> + * Unmapping here means PCI devices with RMRRs (if such exist) >>>> + * will cause problems if such a region was actually accessed. >>>> + */ >>>> + (prev_dom == dom_io && !pdev) ) >>> >>> Maybe I'm reading this wrong, but plain PCI devices with RMRRs are >>> only allowed to be assigned to the hardware domain, and won't be able >>> to be reassigned afterwards. It would be fine to unmap >>> unconditionally if !prev_dom or !pdev? As calls with !pdev only >>> happening for phantom functions or bridge devices. >> >> Like with the support statement, I'd prefer this code to be independent >> of the (perhaps temporary) decision to not allow certain assignments. > > I was just saying because it would make the code easier IMO, but maybe > it doesn't matter that much. > > The comment however should also be adjusted to mention that refers to > legacy DEV_TYPE_PCI type devices ('PCI devices with RMRRs' is too > unspecific IMO). I'm happy to use DEV_TYPE_PCI in the comment. >> Since you mention phantom functions: Aiui their mapping operations will >> be done with a non-NULL pdev, unless of course they're phantom functions >> associated with a non-PCIe device (in which case the same secondary >> mappings with a NULL pdev would occur - imo pointlessly, as it would >> be the same bridge and the same secondary bus as for the actual device; >> I'm under the impression that error handling may not work properly when >> such redundant mappings occur). > > The redundant mapping of the bridges would be fine as prev_dom == > domain in that case, and cannot fail? Hmm, yes, good point. Jan ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 2/2] VT-d: avoid infinite recursion on domain_context_mapping_one() error path 2022-04-06 12:22 [PATCH 0/2] VT-d: address fallout from XSA-400 Jan Beulich 2022-04-06 12:24 ` [PATCH 1/2] VT-d: avoid NULL deref on domain_context_mapping_one() error paths Jan Beulich @ 2022-04-06 12:25 ` Jan Beulich 2022-04-06 14:03 ` Roger Pau Monné 1 sibling, 1 reply; 8+ messages in thread From: Jan Beulich @ 2022-04-06 12:25 UTC (permalink / raw) To: xen-devel; +Cc: Roger Pau Monné, Kevin Tian, Andrew Cooper, Paul Durrant Despite the comment there infinite recursion was still possible, by flip-flopping between two domains. This is because prev_dom is derived from the DID found in the context entry, which was already updated by the time error recovery is invoked. Simply introduce yet another mode flag to detect the situation and cancel further recursion attempts. Fixes: 8f41e481b485 ("VT-d: re-assign devices directly") Signed-off-by: Jan Beulich <jbeulich@suse.com> --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -1608,11 +1608,13 @@ int domain_context_mapping_one( */ (prev_dom == dom_io && !pdev) ) ret = domain_context_unmap_one(domain, iommu, bus, devfn); - else if ( prev_dom != domain ) /* Avoid infinite recursion. */ + /* Avoid infinite recursion. */ + else if ( prev_dom != domain && !(mode & MAP_ERROR_RECOVERY) ) ret = domain_context_mapping_one(prev_dom, iommu, bus, devfn, pdev, DEVICE_DOMID(prev_dom, pdev), DEVICE_PGTABLE(prev_dom, pdev), - mode & MAP_WITH_RMRR) < 0; + (mode & MAP_WITH_RMRR) | + MAP_ERROR_RECOVERY) < 0; else ret = 1; --- a/xen/drivers/passthrough/vtd/vtd.h +++ b/xen/drivers/passthrough/vtd/vtd.h @@ -29,7 +29,8 @@ #define MAP_WITH_RMRR (1u << 0) #define MAP_OWNER_DYING (1u << 1) #define MAP_SINGLE_DEVICE (1u << 2) -#define UNMAP_ME_PHANTOM_FUNC (1u << 3) +#define MAP_ERROR_RECOVERY (1u << 3) +#define UNMAP_ME_PHANTOM_FUNC (1u << 4) /* Allow for both IOAPIC and IOSAPIC. */ #define IO_xAPIC_route_entry IO_APIC_route_entry ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] VT-d: avoid infinite recursion on domain_context_mapping_one() error path 2022-04-06 12:25 ` [PATCH 2/2] VT-d: avoid infinite recursion on domain_context_mapping_one() error path Jan Beulich @ 2022-04-06 14:03 ` Roger Pau Monné 0 siblings, 0 replies; 8+ messages in thread From: Roger Pau Monné @ 2022-04-06 14:03 UTC (permalink / raw) To: Jan Beulich; +Cc: xen-devel, Kevin Tian, Andrew Cooper, Paul Durrant On Wed, Apr 06, 2022 at 02:25:13PM +0200, Jan Beulich wrote: > Despite the comment there infinite recursion was still possible, by > flip-flopping between two domains. This is because prev_dom is derived > from the DID found in the context entry, which was already updated by > the time error recovery is invoked. Simply introduce yet another mode > flag to detect the situation and cancel further recursion attempts. > > Fixes: 8f41e481b485 ("VT-d: re-assign devices directly") > Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Thanks, Roger. ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2022-04-06 15:36 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-04-06 12:22 [PATCH 0/2] VT-d: address fallout from XSA-400 Jan Beulich 2022-04-06 12:24 ` [PATCH 1/2] VT-d: avoid NULL deref on domain_context_mapping_one() error paths Jan Beulich 2022-04-06 13:36 ` Roger Pau Monné 2022-04-06 14:07 ` Jan Beulich 2022-04-06 15:01 ` Roger Pau Monné 2022-04-06 15:35 ` Jan Beulich 2022-04-06 12:25 ` [PATCH 2/2] VT-d: avoid infinite recursion on domain_context_mapping_one() error path Jan Beulich 2022-04-06 14:03 ` Roger Pau Monné
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.