All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] VT-d: address fallout from XSA-400
@ 2022-04-06 12:22 Jan Beulich
  2022-04-06 12:24 ` [PATCH 1/2] VT-d: avoid NULL deref on domain_context_mapping_one() error paths Jan Beulich
  2022-04-06 12:25 ` [PATCH 2/2] VT-d: avoid infinite recursion on domain_context_mapping_one() error path Jan Beulich
  0 siblings, 2 replies; 8+ messages in thread
From: Jan Beulich @ 2022-04-06 12:22 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monné, Kevin Tian, Andrew Cooper, Paul Durrant

1: avoid NULL deref on domain_context_mapping_one() error paths
2: avoid infinite recursion on domain_context_mapping_one() error path

Jan



^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/2] VT-d: avoid NULL deref on domain_context_mapping_one() error paths
  2022-04-06 12:22 [PATCH 0/2] VT-d: address fallout from XSA-400 Jan Beulich
@ 2022-04-06 12:24 ` Jan Beulich
  2022-04-06 13:36   ` Roger Pau Monné
  2022-04-06 12:25 ` [PATCH 2/2] VT-d: avoid infinite recursion on domain_context_mapping_one() error path Jan Beulich
  1 sibling, 1 reply; 8+ messages in thread
From: Jan Beulich @ 2022-04-06 12:24 UTC (permalink / raw)
  To: xen-devel
  Cc: Roger Pau Monné,
	Kevin Tian, Paul Durrant, Andrew Cooper, George Dunlap,
	Julien Grall, Stefano Stabellini, Wei Liu

First there's a printk() which actually wrongly uses pdev in the first
place: We want to log the coordinates of the (perhaps fake) device
acted upon, which may not be pdev.

Then it was quite pointless for eb19326a328d ("VT-d: prepare for per-
device quarantine page tables (part I)") to add a domid_t parameter to
domain_context_unmap_one(): It's only used to pass back here via
me_wifi_quirk() -> map_me_phantom_function(). Drop the parameter again.

Finally there's the invocation of domain_context_mapping_one(), which
needs to be passed the correct domain ID. Avoid taking that path when
pdev is NULL and the quarantine state is what would need restoring to.
This means we can't security-support PCI devices with RMRRs (if such
exist in practice) any longer.

Fixes: 8f41e481b485 ("VT-d: re-assign devices directly")
Fixes: 14dd241aad8a ("IOMMU/x86: use per-device page tables for quarantining")
Coverity ID: 1503784
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/SUPPORT.md
+++ b/SUPPORT.md
@@ -750,6 +750,10 @@ However, this feature can still confer s
 when used to remove drivers and backends from domain 0
 (i.e., Driver Domains).
 
+On VT-d (Intel hardware) passing through plain PCI (or PCI-X) devices
+when they have associated Reserved Memory Regions (RMRRs)
+is not security supported, if such a combination exists in the first place.
+
 ### x86/Multiple IOREQ servers
 
 An IOREQ server provides emulated devices to HVM and PVH guests.
--- a/xen/drivers/passthrough/vtd/extern.h
+++ b/xen/drivers/passthrough/vtd/extern.h
@@ -85,7 +85,7 @@ int domain_context_mapping_one(struct do
                                const struct pci_dev *pdev, domid_t domid,
                                paddr_t pgd_maddr, unsigned int mode);
 int domain_context_unmap_one(struct domain *domain, struct vtd_iommu *iommu,
-                             uint8_t bus, uint8_t devfn, domid_t domid);
+                             uint8_t bus, uint8_t devfn);
 int cf_check intel_iommu_get_reserved_device_memory(
     iommu_grdm_t *func, void *ctxt);
 
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1533,7 +1533,7 @@ int domain_context_mapping_one(
                 check_cleanup_domid_map(domain, pdev, iommu);
             printk(XENLOG_ERR
                    "%pp: unexpected context entry %016lx_%016lx (expected %016lx_%016lx)\n",
-                   &PCI_SBDF3(pdev->seg, pdev->bus, devfn),
+                   &PCI_SBDF3(seg, bus, devfn),
                    (uint64_t)(res >> 64), (uint64_t)res,
                    (uint64_t)(old >> 64), (uint64_t)old);
             rc = -EILSEQ;
@@ -1601,9 +1601,13 @@ int domain_context_mapping_one(
 
     if ( rc )
     {
-        if ( !prev_dom )
-            ret = domain_context_unmap_one(domain, iommu, bus, devfn,
-                                           DEVICE_DOMID(domain, pdev));
+        if ( !prev_dom ||
+             /*
+              * Unmapping here means PCI devices with RMRRs (if such exist)
+              * will cause problems if such a region was actually accessed.
+              */
+             (prev_dom == dom_io && !pdev) )
+            ret = domain_context_unmap_one(domain, iommu, bus, devfn);
         else if ( prev_dom != domain ) /* Avoid infinite recursion. */
             ret = domain_context_mapping_one(prev_dom, iommu, bus, devfn, pdev,
                                              DEVICE_DOMID(prev_dom, pdev),
@@ -1809,7 +1813,7 @@ static int domain_context_mapping(struct
 int domain_context_unmap_one(
     struct domain *domain,
     struct vtd_iommu *iommu,
-    uint8_t bus, uint8_t devfn, domid_t domid)
+    uint8_t bus, uint8_t devfn)
 {
     struct context_entry *context, *context_entries;
     u64 maddr;
@@ -1867,7 +1871,8 @@ int domain_context_unmap_one(
     unmap_vtd_domain_page(context_entries);
 
     if ( !iommu->drhd->segment && !rc )
-        rc = me_wifi_quirk(domain, bus, devfn, domid, 0, UNMAP_ME_PHANTOM_FUNC);
+        rc = me_wifi_quirk(domain, bus, devfn, DOMID_INVALID, 0,
+                           UNMAP_ME_PHANTOM_FUNC);
 
     if ( rc && !is_hardware_domain(domain) && domain != dom_io )
     {
@@ -1916,8 +1921,7 @@ static const struct acpi_drhd_unit *doma
         if ( iommu_debug )
             printk(VTDPREFIX "%pd:PCIe: unmap %pp\n",
                    domain, &PCI_SBDF3(seg, bus, devfn));
-        ret = domain_context_unmap_one(domain, iommu, bus, devfn,
-                                       DEVICE_DOMID(domain, pdev));
+        ret = domain_context_unmap_one(domain, iommu, bus, devfn);
         if ( !ret && devfn == pdev->devfn && ats_device(pdev, drhd) > 0 )
             disable_ats_device(pdev);
 
@@ -1930,8 +1934,7 @@ static const struct acpi_drhd_unit *doma
         if ( iommu_debug )
             printk(VTDPREFIX "%pd:PCI: unmap %pp\n",
                    domain, &PCI_SBDF3(seg, bus, devfn));
-        ret = domain_context_unmap_one(domain, iommu, bus, devfn,
-                                       DEVICE_DOMID(domain, pdev));
+        ret = domain_context_unmap_one(domain, iommu, bus, devfn);
         if ( ret )
             break;
 
@@ -1954,12 +1957,10 @@ static const struct acpi_drhd_unit *doma
             break;
         }
 
-        ret = domain_context_unmap_one(domain, iommu, tmp_bus, tmp_devfn,
-                                       DEVICE_DOMID(domain, pdev));
+        ret = domain_context_unmap_one(domain, iommu, tmp_bus, tmp_devfn);
         /* PCIe to PCI/PCIx bridge */
         if ( !ret && pdev_type(seg, tmp_bus, tmp_devfn) == DEV_TYPE_PCIe2PCI_BRIDGE )
-            ret = domain_context_unmap_one(domain, iommu, secbus, 0,
-                                           DEVICE_DOMID(domain, pdev));
+            ret = domain_context_unmap_one(domain, iommu, secbus, 0);
 
         break;
 
--- a/xen/drivers/passthrough/vtd/quirks.c
+++ b/xen/drivers/passthrough/vtd/quirks.c
@@ -427,7 +427,7 @@ static int __must_check map_me_phantom_f
                                         domid, pgd_maddr, mode);
     else
         rc = domain_context_unmap_one(domain, drhd->iommu, 0,
-                                      PCI_DEVFN(dev, 7), domid);
+                                      PCI_DEVFN(dev, 7));
 
     return rc;
 }



^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 2/2] VT-d: avoid infinite recursion on domain_context_mapping_one() error path
  2022-04-06 12:22 [PATCH 0/2] VT-d: address fallout from XSA-400 Jan Beulich
  2022-04-06 12:24 ` [PATCH 1/2] VT-d: avoid NULL deref on domain_context_mapping_one() error paths Jan Beulich
@ 2022-04-06 12:25 ` Jan Beulich
  2022-04-06 14:03   ` Roger Pau Monné
  1 sibling, 1 reply; 8+ messages in thread
From: Jan Beulich @ 2022-04-06 12:25 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monné, Kevin Tian, Andrew Cooper, Paul Durrant

Despite the comment there infinite recursion was still possible, by
flip-flopping between two domains. This is because prev_dom is derived
from the DID found in the context entry, which was already updated by
the time error recovery is invoked. Simply introduce yet another mode
flag to detect the situation and cancel further recursion attempts.

Fixes: 8f41e481b485 ("VT-d: re-assign devices directly")
Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1608,11 +1608,13 @@ int domain_context_mapping_one(
               */
              (prev_dom == dom_io && !pdev) )
             ret = domain_context_unmap_one(domain, iommu, bus, devfn);
-        else if ( prev_dom != domain ) /* Avoid infinite recursion. */
+        /* Avoid infinite recursion. */
+        else if ( prev_dom != domain && !(mode & MAP_ERROR_RECOVERY) )
             ret = domain_context_mapping_one(prev_dom, iommu, bus, devfn, pdev,
                                              DEVICE_DOMID(prev_dom, pdev),
                                              DEVICE_PGTABLE(prev_dom, pdev),
-                                             mode & MAP_WITH_RMRR) < 0;
+                                             (mode & MAP_WITH_RMRR) |
+                                             MAP_ERROR_RECOVERY) < 0;
         else
             ret = 1;
 
--- a/xen/drivers/passthrough/vtd/vtd.h
+++ b/xen/drivers/passthrough/vtd/vtd.h
@@ -29,7 +29,8 @@
 #define MAP_WITH_RMRR         (1u << 0)
 #define MAP_OWNER_DYING       (1u << 1)
 #define MAP_SINGLE_DEVICE     (1u << 2)
-#define UNMAP_ME_PHANTOM_FUNC (1u << 3)
+#define MAP_ERROR_RECOVERY    (1u << 3)
+#define UNMAP_ME_PHANTOM_FUNC (1u << 4)
 
 /* Allow for both IOAPIC and IOSAPIC. */
 #define IO_xAPIC_route_entry IO_APIC_route_entry



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] VT-d: avoid NULL deref on domain_context_mapping_one() error paths
  2022-04-06 12:24 ` [PATCH 1/2] VT-d: avoid NULL deref on domain_context_mapping_one() error paths Jan Beulich
@ 2022-04-06 13:36   ` Roger Pau Monné
  2022-04-06 14:07     ` Jan Beulich
  0 siblings, 1 reply; 8+ messages in thread
From: Roger Pau Monné @ 2022-04-06 13:36 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Kevin Tian, Paul Durrant, Andrew Cooper,
	George Dunlap, Julien Grall, Stefano Stabellini, Wei Liu

On Wed, Apr 06, 2022 at 02:24:32PM +0200, Jan Beulich wrote:
> First there's a printk() which actually wrongly uses pdev in the first
> place: We want to log the coordinates of the (perhaps fake) device
> acted upon, which may not be pdev.
> 
> Then it was quite pointless for eb19326a328d ("VT-d: prepare for per-
> device quarantine page tables (part I)") to add a domid_t parameter to
> domain_context_unmap_one(): It's only used to pass back here via
> me_wifi_quirk() -> map_me_phantom_function(). Drop the parameter again.
> 
> Finally there's the invocation of domain_context_mapping_one(), which
> needs to be passed the correct domain ID. Avoid taking that path when
> pdev is NULL and the quarantine state is what would need restoring to.
> This means we can't security-support PCI devices with RMRRs (if such
> exist in practice) any longer.
> 
> Fixes: 8f41e481b485 ("VT-d: re-assign devices directly")
> Fixes: 14dd241aad8a ("IOMMU/x86: use per-device page tables for quarantining")
> Coverity ID: 1503784
> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> --- a/SUPPORT.md
> +++ b/SUPPORT.md
> @@ -750,6 +750,10 @@ However, this feature can still confer s
>  when used to remove drivers and backends from domain 0
>  (i.e., Driver Domains).
>  
> +On VT-d (Intel hardware) passing through plain PCI (or PCI-X) devices
> +when they have associated Reserved Memory Regions (RMRRs)
> +is not security supported, if such a combination exists in the first place.

Hm, I think this could be confusing from a user PoV.  It's my
understanding you want to differentiate between DEV_TYPE_PCIe_ENDPOINT
and DEV_TYPE_PCI device types, so maybe we could use:

"On VT-d (Intel hardware) passing through non PCI Express devices with
associated Reserved Memory Regions (RMRRs) is not supported."

AFAICT domain_context_mapping will already prevent this from happening
by returning -EOPNOTSUPP (see the DEV_TYPE_PCI case handling).

>  ### x86/Multiple IOREQ servers
>  
>  An IOREQ server provides emulated devices to HVM and PVH guests.
> --- a/xen/drivers/passthrough/vtd/extern.h
> +++ b/xen/drivers/passthrough/vtd/extern.h
> @@ -85,7 +85,7 @@ int domain_context_mapping_one(struct do
>                                 const struct pci_dev *pdev, domid_t domid,
>                                 paddr_t pgd_maddr, unsigned int mode);
>  int domain_context_unmap_one(struct domain *domain, struct vtd_iommu *iommu,
> -                             uint8_t bus, uint8_t devfn, domid_t domid);
> +                             uint8_t bus, uint8_t devfn);
>  int cf_check intel_iommu_get_reserved_device_memory(
>      iommu_grdm_t *func, void *ctxt);
>  
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1533,7 +1533,7 @@ int domain_context_mapping_one(
>                  check_cleanup_domid_map(domain, pdev, iommu);
>              printk(XENLOG_ERR
>                     "%pp: unexpected context entry %016lx_%016lx (expected %016lx_%016lx)\n",
> -                   &PCI_SBDF3(pdev->seg, pdev->bus, devfn),
> +                   &PCI_SBDF3(seg, bus, devfn),
>                     (uint64_t)(res >> 64), (uint64_t)res,
>                     (uint64_t)(old >> 64), (uint64_t)old);
>              rc = -EILSEQ;
> @@ -1601,9 +1601,13 @@ int domain_context_mapping_one(
>  
>      if ( rc )
>      {
> -        if ( !prev_dom )
> -            ret = domain_context_unmap_one(domain, iommu, bus, devfn,
> -                                           DEVICE_DOMID(domain, pdev));
> +        if ( !prev_dom ||
> +             /*
> +              * Unmapping here means PCI devices with RMRRs (if such exist)
> +              * will cause problems if such a region was actually accessed.
> +              */
> +             (prev_dom == dom_io && !pdev) )

Maybe I'm reading this wrong, but plain PCI devices with RMRRs are
only allowed to be assigned to the hardware domain, and won't be able
to be reassigned afterwards.  It would be fine to unmap
unconditionally if !prev_dom or !pdev?  As calls with !pdev only
happening for phantom functions or bridge devices.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] VT-d: avoid infinite recursion on domain_context_mapping_one() error path
  2022-04-06 12:25 ` [PATCH 2/2] VT-d: avoid infinite recursion on domain_context_mapping_one() error path Jan Beulich
@ 2022-04-06 14:03   ` Roger Pau Monné
  0 siblings, 0 replies; 8+ messages in thread
From: Roger Pau Monné @ 2022-04-06 14:03 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Kevin Tian, Andrew Cooper, Paul Durrant

On Wed, Apr 06, 2022 at 02:25:13PM +0200, Jan Beulich wrote:
> Despite the comment there infinite recursion was still possible, by
> flip-flopping between two domains. This is because prev_dom is derived
> from the DID found in the context entry, which was already updated by
> the time error recovery is invoked. Simply introduce yet another mode
> flag to detect the situation and cancel further recursion attempts.
> 
> Fixes: 8f41e481b485 ("VT-d: re-assign devices directly")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] VT-d: avoid NULL deref on domain_context_mapping_one() error paths
  2022-04-06 13:36   ` Roger Pau Monné
@ 2022-04-06 14:07     ` Jan Beulich
  2022-04-06 15:01       ` Roger Pau Monné
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Beulich @ 2022-04-06 14:07 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Kevin Tian, Paul Durrant, Andrew Cooper,
	George Dunlap, Julien Grall, Stefano Stabellini, Wei Liu

On 06.04.2022 15:36, Roger Pau Monné wrote:
> On Wed, Apr 06, 2022 at 02:24:32PM +0200, Jan Beulich wrote:
>> First there's a printk() which actually wrongly uses pdev in the first
>> place: We want to log the coordinates of the (perhaps fake) device
>> acted upon, which may not be pdev.
>>
>> Then it was quite pointless for eb19326a328d ("VT-d: prepare for per-
>> device quarantine page tables (part I)") to add a domid_t parameter to
>> domain_context_unmap_one(): It's only used to pass back here via
>> me_wifi_quirk() -> map_me_phantom_function(). Drop the parameter again.
>>
>> Finally there's the invocation of domain_context_mapping_one(), which
>> needs to be passed the correct domain ID. Avoid taking that path when
>> pdev is NULL and the quarantine state is what would need restoring to.
>> This means we can't security-support PCI devices with RMRRs (if such
>> exist in practice) any longer.
>>
>> Fixes: 8f41e481b485 ("VT-d: re-assign devices directly")
>> Fixes: 14dd241aad8a ("IOMMU/x86: use per-device page tables for quarantining")
>> Coverity ID: 1503784
>> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>
>> --- a/SUPPORT.md
>> +++ b/SUPPORT.md
>> @@ -750,6 +750,10 @@ However, this feature can still confer s
>>  when used to remove drivers and backends from domain 0
>>  (i.e., Driver Domains).
>>  
>> +On VT-d (Intel hardware) passing through plain PCI (or PCI-X) devices
>> +when they have associated Reserved Memory Regions (RMRRs)
>> +is not security supported, if such a combination exists in the first place.
> 
> Hm, I think this could be confusing from a user PoV.  It's my
> understanding you want to differentiate between DEV_TYPE_PCIe_ENDPOINT
> and DEV_TYPE_PCI device types, so maybe we could use:
> 
> "On VT-d (Intel hardware) passing through non PCI Express devices with
> associated Reserved Memory Regions (RMRRs) is not supported."
> 
> AFAICT domain_context_mapping will already prevent this from happening
> by returning -EOPNOTSUPP (see the DEV_TYPE_PCI case handling).

Hmm. I did look at that code while writing the patch, but I didn't
draw the same conclusion. I'd like to tie the security support
statement to what could technically be made work. IOW I don't like
to say "not supported"; I'd like to stick to "not security
supported", which won't change even if that -EOPNOTSUPP path would
be replaced by a proper implementation. Even adding a sentence to
say this doesn't even work at present would seem to me to go too
far: Such a sentence would easily be forgotten if the situation
changed. But I'd be willing to add such an auxiliary statement as
a compromise.

As to "plain PCI (or PCI-X)" vs "non PCI Express" - while I prefer
to avoid a negation there, I'd be okay to switch if that's deemed
better for potential readers.

>> @@ -1601,9 +1601,13 @@ int domain_context_mapping_one(
>>  
>>      if ( rc )
>>      {
>> -        if ( !prev_dom )
>> -            ret = domain_context_unmap_one(domain, iommu, bus, devfn,
>> -                                           DEVICE_DOMID(domain, pdev));
>> +        if ( !prev_dom ||
>> +             /*
>> +              * Unmapping here means PCI devices with RMRRs (if such exist)
>> +              * will cause problems if such a region was actually accessed.
>> +              */
>> +             (prev_dom == dom_io && !pdev) )
> 
> Maybe I'm reading this wrong, but plain PCI devices with RMRRs are
> only allowed to be assigned to the hardware domain, and won't be able
> to be reassigned afterwards.  It would be fine to unmap
> unconditionally if !prev_dom or !pdev?  As calls with !pdev only
> happening for phantom functions or bridge devices.

Like with the support statement, I'd prefer this code to be independent
of the (perhaps temporary) decision to not allow certain assignments.

Since you mention phantom functions: Aiui their mapping operations will
be done with a non-NULL pdev, unless of course they're phantom functions
associated with a non-PCIe device (in which case the same secondary
mappings with a NULL pdev would occur - imo pointlessly, as it would
be the same bridge and the same secondary bus as for the actual device;
I'm under the impression that error handling may not work properly when
such redundant mappings occur).

Jan



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] VT-d: avoid NULL deref on domain_context_mapping_one() error paths
  2022-04-06 14:07     ` Jan Beulich
@ 2022-04-06 15:01       ` Roger Pau Monné
  2022-04-06 15:35         ` Jan Beulich
  0 siblings, 1 reply; 8+ messages in thread
From: Roger Pau Monné @ 2022-04-06 15:01 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Kevin Tian, Paul Durrant, Andrew Cooper,
	George Dunlap, Julien Grall, Stefano Stabellini, Wei Liu

On Wed, Apr 06, 2022 at 04:07:24PM +0200, Jan Beulich wrote:
> On 06.04.2022 15:36, Roger Pau Monné wrote:
> > On Wed, Apr 06, 2022 at 02:24:32PM +0200, Jan Beulich wrote:
> >> First there's a printk() which actually wrongly uses pdev in the first
> >> place: We want to log the coordinates of the (perhaps fake) device
> >> acted upon, which may not be pdev.
> >>
> >> Then it was quite pointless for eb19326a328d ("VT-d: prepare for per-
> >> device quarantine page tables (part I)") to add a domid_t parameter to
> >> domain_context_unmap_one(): It's only used to pass back here via
> >> me_wifi_quirk() -> map_me_phantom_function(). Drop the parameter again.
> >>
> >> Finally there's the invocation of domain_context_mapping_one(), which
> >> needs to be passed the correct domain ID. Avoid taking that path when
> >> pdev is NULL and the quarantine state is what would need restoring to.
> >> This means we can't security-support PCI devices with RMRRs (if such
> >> exist in practice) any longer.
> >>
> >> Fixes: 8f41e481b485 ("VT-d: re-assign devices directly")
> >> Fixes: 14dd241aad8a ("IOMMU/x86: use per-device page tables for quarantining")
> >> Coverity ID: 1503784
> >> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >>
> >> --- a/SUPPORT.md
> >> +++ b/SUPPORT.md
> >> @@ -750,6 +750,10 @@ However, this feature can still confer s
> >>  when used to remove drivers and backends from domain 0
> >>  (i.e., Driver Domains).
> >>  
> >> +On VT-d (Intel hardware) passing through plain PCI (or PCI-X) devices
> >> +when they have associated Reserved Memory Regions (RMRRs)
> >> +is not security supported, if such a combination exists in the first place.
> > 
> > Hm, I think this could be confusing from a user PoV.  It's my
> > understanding you want to differentiate between DEV_TYPE_PCIe_ENDPOINT
> > and DEV_TYPE_PCI device types, so maybe we could use:
> > 
> > "On VT-d (Intel hardware) passing through non PCI Express devices with
> > associated Reserved Memory Regions (RMRRs) is not supported."
> > 
> > AFAICT domain_context_mapping will already prevent this from happening
> > by returning -EOPNOTSUPP (see the DEV_TYPE_PCI case handling).
> 
> Hmm. I did look at that code while writing the patch, but I didn't
> draw the same conclusion. I'd like to tie the security support
> statement to what could technically be made work. IOW I don't like
> to say "not supported"; I'd like to stick to "not security
> supported", which won't change even if that -EOPNOTSUPP path would
> be replaced by a proper implementation.

My preference for using 'not supported' was simply so that users don't
need to worry whether their use-case fails in this category: Xen will
simply reject the operation in the first place.

Otherwise users might wonder whether some of the devices they are
passing through are security supported or not (lacking proper
information about how to check whether a device is PCI, PCI-X or PCIe
and whether it has associated RMRR regions).

I understand your worry here, but I do think we should aim to make
this document as easy to parse as possible for users, and hence I
wonder whether your proposed addition will cause unneeded confusion
because that use-case is not allowed by the hypervisor in the first
place.

> Even adding a sentence to
> say this doesn't even work at present would seem to me to go too
> far: Such a sentence would easily be forgotten if the situation
> changed. But I'd be willing to add such an auxiliary statement as
> a compromise.
> 
> As to "plain PCI (or PCI-X)" vs "non PCI Express" - while I prefer
> to avoid a negation there, I'd be okay to switch if that's deemed
> better for potential readers.

Maybe it would be best to simply expand the comment before the RMRR
check in domain_context_mapping() to note that removing the check will
have security implications?

> >> @@ -1601,9 +1601,13 @@ int domain_context_mapping_one(
> >>  
> >>      if ( rc )
> >>      {
> >> -        if ( !prev_dom )
> >> -            ret = domain_context_unmap_one(domain, iommu, bus, devfn,
> >> -                                           DEVICE_DOMID(domain, pdev));
> >> +        if ( !prev_dom ||
> >> +             /*
> >> +              * Unmapping here means PCI devices with RMRRs (if such exist)
> >> +              * will cause problems if such a region was actually accessed.
> >> +              */
> >> +             (prev_dom == dom_io && !pdev) )
> > 
> > Maybe I'm reading this wrong, but plain PCI devices with RMRRs are
> > only allowed to be assigned to the hardware domain, and won't be able
> > to be reassigned afterwards.  It would be fine to unmap
> > unconditionally if !prev_dom or !pdev?  As calls with !pdev only
> > happening for phantom functions or bridge devices.
> 
> Like with the support statement, I'd prefer this code to be independent
> of the (perhaps temporary) decision to not allow certain assignments.

I was just saying because it would make the code easier IMO, but maybe
it doesn't matter that much.

The comment however should also be adjusted to mention that refers to
legacy DEV_TYPE_PCI type devices ('PCI devices with RMRRs' is too
unspecific IMO).

> Since you mention phantom functions: Aiui their mapping operations will
> be done with a non-NULL pdev, unless of course they're phantom functions
> associated with a non-PCIe device (in which case the same secondary
> mappings with a NULL pdev would occur - imo pointlessly, as it would
> be the same bridge and the same secondary bus as for the actual device;
> I'm under the impression that error handling may not work properly when
> such redundant mappings occur).

The redundant mapping of the bridges would be fine as prev_dom ==
domain in that case, and cannot fail?

Thanks, Roger.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] VT-d: avoid NULL deref on domain_context_mapping_one() error paths
  2022-04-06 15:01       ` Roger Pau Monné
@ 2022-04-06 15:35         ` Jan Beulich
  0 siblings, 0 replies; 8+ messages in thread
From: Jan Beulich @ 2022-04-06 15:35 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Kevin Tian, Paul Durrant, Andrew Cooper,
	George Dunlap, Julien Grall, Stefano Stabellini, Wei Liu

On 06.04.2022 17:01, Roger Pau Monné wrote:
> On Wed, Apr 06, 2022 at 04:07:24PM +0200, Jan Beulich wrote:
>> On 06.04.2022 15:36, Roger Pau Monné wrote:
>>> On Wed, Apr 06, 2022 at 02:24:32PM +0200, Jan Beulich wrote:
>>>> First there's a printk() which actually wrongly uses pdev in the first
>>>> place: We want to log the coordinates of the (perhaps fake) device
>>>> acted upon, which may not be pdev.
>>>>
>>>> Then it was quite pointless for eb19326a328d ("VT-d: prepare for per-
>>>> device quarantine page tables (part I)") to add a domid_t parameter to
>>>> domain_context_unmap_one(): It's only used to pass back here via
>>>> me_wifi_quirk() -> map_me_phantom_function(). Drop the parameter again.
>>>>
>>>> Finally there's the invocation of domain_context_mapping_one(), which
>>>> needs to be passed the correct domain ID. Avoid taking that path when
>>>> pdev is NULL and the quarantine state is what would need restoring to.
>>>> This means we can't security-support PCI devices with RMRRs (if such
>>>> exist in practice) any longer.
>>>>
>>>> Fixes: 8f41e481b485 ("VT-d: re-assign devices directly")
>>>> Fixes: 14dd241aad8a ("IOMMU/x86: use per-device page tables for quarantining")
>>>> Coverity ID: 1503784
>>>> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>
>>>> --- a/SUPPORT.md
>>>> +++ b/SUPPORT.md
>>>> @@ -750,6 +750,10 @@ However, this feature can still confer s
>>>>  when used to remove drivers and backends from domain 0
>>>>  (i.e., Driver Domains).
>>>>  
>>>> +On VT-d (Intel hardware) passing through plain PCI (or PCI-X) devices
>>>> +when they have associated Reserved Memory Regions (RMRRs)
>>>> +is not security supported, if such a combination exists in the first place.
>>>
>>> Hm, I think this could be confusing from a user PoV.  It's my
>>> understanding you want to differentiate between DEV_TYPE_PCIe_ENDPOINT
>>> and DEV_TYPE_PCI device types, so maybe we could use:
>>>
>>> "On VT-d (Intel hardware) passing through non PCI Express devices with
>>> associated Reserved Memory Regions (RMRRs) is not supported."
>>>
>>> AFAICT domain_context_mapping will already prevent this from happening
>>> by returning -EOPNOTSUPP (see the DEV_TYPE_PCI case handling).
>>
>> Hmm. I did look at that code while writing the patch, but I didn't
>> draw the same conclusion. I'd like to tie the security support
>> statement to what could technically be made work. IOW I don't like
>> to say "not supported"; I'd like to stick to "not security
>> supported", which won't change even if that -EOPNOTSUPP path would
>> be replaced by a proper implementation.
> 
> My preference for using 'not supported' was simply so that users don't
> need to worry whether their use-case fails in this category: Xen will
> simply reject the operation in the first place.
> 
> Otherwise users might wonder whether some of the devices they are
> passing through are security supported or not (lacking proper
> information about how to check whether a device is PCI, PCI-X or PCIe
> and whether it has associated RMRR regions).
> 
> I understand your worry here, but I do think we should aim to make
> this document as easy to parse as possible for users, and hence I
> wonder whether your proposed addition will cause unneeded confusion
> because that use-case is not allowed by the hypervisor in the first
> place.

I guess I'll simply drop the SUPPORT.md addition then. It would
probably have been a better fit right in one of the XSA-400 patches
anyway.

>> Even adding a sentence to
>> say this doesn't even work at present would seem to me to go too
>> far: Such a sentence would easily be forgotten if the situation
>> changed. But I'd be willing to add such an auxiliary statement as
>> a compromise.
>>
>> As to "plain PCI (or PCI-X)" vs "non PCI Express" - while I prefer
>> to avoid a negation there, I'd be okay to switch if that's deemed
>> better for potential readers.
> 
> Maybe it would be best to simply expand the comment before the RMRR
> check in domain_context_mapping() to note that removing the check will
> have security implications?

Hmm, with the changes I'm doing I don't think I make matters worse,
so this wouldn't seem to me to belong here.

>>>> @@ -1601,9 +1601,13 @@ int domain_context_mapping_one(
>>>>  
>>>>      if ( rc )
>>>>      {
>>>> -        if ( !prev_dom )
>>>> -            ret = domain_context_unmap_one(domain, iommu, bus, devfn,
>>>> -                                           DEVICE_DOMID(domain, pdev));
>>>> +        if ( !prev_dom ||
>>>> +             /*
>>>> +              * Unmapping here means PCI devices with RMRRs (if such exist)
>>>> +              * will cause problems if such a region was actually accessed.
>>>> +              */
>>>> +             (prev_dom == dom_io && !pdev) )
>>>
>>> Maybe I'm reading this wrong, but plain PCI devices with RMRRs are
>>> only allowed to be assigned to the hardware domain, and won't be able
>>> to be reassigned afterwards.  It would be fine to unmap
>>> unconditionally if !prev_dom or !pdev?  As calls with !pdev only
>>> happening for phantom functions or bridge devices.
>>
>> Like with the support statement, I'd prefer this code to be independent
>> of the (perhaps temporary) decision to not allow certain assignments.
> 
> I was just saying because it would make the code easier IMO, but maybe
> it doesn't matter that much.
> 
> The comment however should also be adjusted to mention that refers to
> legacy DEV_TYPE_PCI type devices ('PCI devices with RMRRs' is too
> unspecific IMO).

I'm happy to use DEV_TYPE_PCI in the comment.

>> Since you mention phantom functions: Aiui their mapping operations will
>> be done with a non-NULL pdev, unless of course they're phantom functions
>> associated with a non-PCIe device (in which case the same secondary
>> mappings with a NULL pdev would occur - imo pointlessly, as it would
>> be the same bridge and the same secondary bus as for the actual device;
>> I'm under the impression that error handling may not work properly when
>> such redundant mappings occur).
> 
> The redundant mapping of the bridges would be fine as prev_dom ==
> domain in that case, and cannot fail?

Hmm, yes, good point.

Jan



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-04-06 15:36 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-06 12:22 [PATCH 0/2] VT-d: address fallout from XSA-400 Jan Beulich
2022-04-06 12:24 ` [PATCH 1/2] VT-d: avoid NULL deref on domain_context_mapping_one() error paths Jan Beulich
2022-04-06 13:36   ` Roger Pau Monné
2022-04-06 14:07     ` Jan Beulich
2022-04-06 15:01       ` Roger Pau Monné
2022-04-06 15:35         ` Jan Beulich
2022-04-06 12:25 ` [PATCH 2/2] VT-d: avoid infinite recursion on domain_context_mapping_one() error path Jan Beulich
2022-04-06 14:03   ` Roger Pau Monné

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.