All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] VT-d: address fallout from XSA-400
@ 2022-04-07  6:09 Jan Beulich
  2022-04-07  6:11 ` [PATCH v2 1/2] VT-d: avoid NULL deref on domain_context_mapping_one() error paths Jan Beulich
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Jan Beulich @ 2022-04-07  6:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monné, Kevin Tian, Andrew Cooper, Paul Durrant

1: avoid NULL deref on domain_context_mapping_one() error paths
2: avoid infinite recursion on domain_context_mapping_one() error path

Jan



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/2] VT-d: avoid NULL deref on domain_context_mapping_one() error paths
  2022-04-07  6:09 [PATCH v2 0/2] VT-d: address fallout from XSA-400 Jan Beulich
@ 2022-04-07  6:11 ` Jan Beulich
  2022-04-07  7:41   ` Roger Pau Monné
  2022-04-07  6:11 ` [PATCH v2 2/2] VT-d: avoid infinite recursion on domain_context_mapping_one() error path Jan Beulich
  2022-04-07  9:27 ` [PATCH v2 0.9/2] VT-d: don't needlessly look up DID Jan Beulich
  2 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2022-04-07  6:11 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monné, Kevin Tian, Andrew Cooper, Paul Durrant

First there's a printk() which actually wrongly uses pdev in the first
place: We want to log the coordinates of the (perhaps fake) device
acted upon, which may not be pdev.

Then it was quite pointless for eb19326a328d ("VT-d: prepare for per-
device quarantine page tables (part I)") to add a domid_t parameter to
domain_context_unmap_one(): It's only used to pass back here via
me_wifi_quirk() -> map_me_phantom_function(). Drop the parameter again.

Finally there's the invocation of domain_context_mapping_one(), which
needs to be passed the correct domain ID. Avoid taking that path when
pdev is NULL and the quarantine state is what would need restoring to.
This means we can't security-support PCI devices with RMRRs (if such
exist in practice) any longer.

Fixes: 8f41e481b485 ("VT-d: re-assign devices directly")
Fixes: 14dd241aad8a ("IOMMU/x86: use per-device page tables for quarantining")
Coverity ID: 1503784
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Drop SUPPORT.md addition. Adjust comment. Extend another comment.

--- a/xen/drivers/passthrough/vtd/extern.h
+++ b/xen/drivers/passthrough/vtd/extern.h
@@ -85,7 +85,7 @@ int domain_context_mapping_one(struct do
                                const struct pci_dev *pdev, domid_t domid,
                                paddr_t pgd_maddr, unsigned int mode);
 int domain_context_unmap_one(struct domain *domain, struct vtd_iommu *iommu,
-                             uint8_t bus, uint8_t devfn, domid_t domid);
+                             uint8_t bus, uint8_t devfn);
 int cf_check intel_iommu_get_reserved_device_memory(
     iommu_grdm_t *func, void *ctxt);
 
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1533,7 +1533,7 @@ int domain_context_mapping_one(
                 check_cleanup_domid_map(domain, pdev, iommu);
             printk(XENLOG_ERR
                    "%pp: unexpected context entry %016lx_%016lx (expected %016lx_%016lx)\n",
-                   &PCI_SBDF3(pdev->seg, pdev->bus, devfn),
+                   &PCI_SBDF3(seg, bus, devfn),
                    (uint64_t)(res >> 64), (uint64_t)res,
                    (uint64_t)(old >> 64), (uint64_t)old);
             rc = -EILSEQ;
@@ -1601,9 +1601,14 @@ int domain_context_mapping_one(
 
     if ( rc )
     {
-        if ( !prev_dom )
-            ret = domain_context_unmap_one(domain, iommu, bus, devfn,
-                                           DEVICE_DOMID(domain, pdev));
+        if ( !prev_dom ||
+             /*
+              * Unmapping here means DEV_TYPE_PCI devices with RMRRs (if such
+              * exist) would cause problems if such a region was actually
+              * accessed.
+              */
+             (prev_dom == dom_io && !pdev) )
+            ret = domain_context_unmap_one(domain, iommu, bus, devfn);
         else if ( prev_dom != domain ) /* Avoid infinite recursion. */
             ret = domain_context_mapping_one(prev_dom, iommu, bus, devfn, pdev,
                                              DEVICE_DOMID(prev_dom, pdev),
@@ -1744,7 +1749,9 @@ static int domain_context_mapping(struct
          * Strictly speaking if the device is the only one behind this bridge
          * and the only one with this (secbus,0,0) tuple, it could be allowed
          * to be re-assigned regardless of RMRR presence.  But let's deal with
-         * that case only if it is actually found in the wild.
+         * that case only if it is actually found in the wild.  Note that
+         * dealing with this just here would still not render the operation
+         * secure.
          */
         else if ( prev_present && (mode & MAP_WITH_RMRR) &&
                   domain != pdev->domain )
@@ -1809,7 +1816,7 @@ static int domain_context_mapping(struct
 int domain_context_unmap_one(
     struct domain *domain,
     struct vtd_iommu *iommu,
-    uint8_t bus, uint8_t devfn, domid_t domid)
+    uint8_t bus, uint8_t devfn)
 {
     struct context_entry *context, *context_entries;
     u64 maddr;
@@ -1867,7 +1874,8 @@ int domain_context_unmap_one(
     unmap_vtd_domain_page(context_entries);
 
     if ( !iommu->drhd->segment && !rc )
-        rc = me_wifi_quirk(domain, bus, devfn, domid, 0, UNMAP_ME_PHANTOM_FUNC);
+        rc = me_wifi_quirk(domain, bus, devfn, DOMID_INVALID, 0,
+                           UNMAP_ME_PHANTOM_FUNC);
 
     if ( rc && !is_hardware_domain(domain) && domain != dom_io )
     {
@@ -1916,8 +1924,7 @@ static const struct acpi_drhd_unit *doma
         if ( iommu_debug )
             printk(VTDPREFIX "%pd:PCIe: unmap %pp\n",
                    domain, &PCI_SBDF3(seg, bus, devfn));
-        ret = domain_context_unmap_one(domain, iommu, bus, devfn,
-                                       DEVICE_DOMID(domain, pdev));
+        ret = domain_context_unmap_one(domain, iommu, bus, devfn);
         if ( !ret && devfn == pdev->devfn && ats_device(pdev, drhd) > 0 )
             disable_ats_device(pdev);
 
@@ -1930,8 +1937,7 @@ static const struct acpi_drhd_unit *doma
         if ( iommu_debug )
             printk(VTDPREFIX "%pd:PCI: unmap %pp\n",
                    domain, &PCI_SBDF3(seg, bus, devfn));
-        ret = domain_context_unmap_one(domain, iommu, bus, devfn,
-                                       DEVICE_DOMID(domain, pdev));
+        ret = domain_context_unmap_one(domain, iommu, bus, devfn);
         if ( ret )
             break;
 
@@ -1954,12 +1960,10 @@ static const struct acpi_drhd_unit *doma
             break;
         }
 
-        ret = domain_context_unmap_one(domain, iommu, tmp_bus, tmp_devfn,
-                                       DEVICE_DOMID(domain, pdev));
+        ret = domain_context_unmap_one(domain, iommu, tmp_bus, tmp_devfn);
         /* PCIe to PCI/PCIx bridge */
         if ( !ret && pdev_type(seg, tmp_bus, tmp_devfn) == DEV_TYPE_PCIe2PCI_BRIDGE )
-            ret = domain_context_unmap_one(domain, iommu, secbus, 0,
-                                           DEVICE_DOMID(domain, pdev));
+            ret = domain_context_unmap_one(domain, iommu, secbus, 0);
 
         break;
 
--- a/xen/drivers/passthrough/vtd/quirks.c
+++ b/xen/drivers/passthrough/vtd/quirks.c
@@ -427,7 +427,7 @@ static int __must_check map_me_phantom_f
                                         domid, pgd_maddr, mode);
     else
         rc = domain_context_unmap_one(domain, drhd->iommu, 0,
-                                      PCI_DEVFN(dev, 7), domid);
+                                      PCI_DEVFN(dev, 7));
 
     return rc;
 }



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 2/2] VT-d: avoid infinite recursion on domain_context_mapping_one() error path
  2022-04-07  6:09 [PATCH v2 0/2] VT-d: address fallout from XSA-400 Jan Beulich
  2022-04-07  6:11 ` [PATCH v2 1/2] VT-d: avoid NULL deref on domain_context_mapping_one() error paths Jan Beulich
@ 2022-04-07  6:11 ` Jan Beulich
  2022-04-07  7:51   ` Roger Pau Monné
  2022-04-08  4:06   ` Tian, Kevin
  2022-04-07  9:27 ` [PATCH v2 0.9/2] VT-d: don't needlessly look up DID Jan Beulich
  2 siblings, 2 replies; 15+ messages in thread
From: Jan Beulich @ 2022-04-07  6:11 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monné, Kevin Tian, Andrew Cooper, Paul Durrant

Despite the comment there infinite recursion was still possible, by
flip-flopping between two domains. This is because prev_dom is derived
from the DID found in the context entry, which was already updated by
the time error recovery is invoked. Simply introduce yet another mode
flag to prevent rolling back an in-progress roll-back of a prior
mapping attempt.

Also drop the existing recursion prevention for having been dead anyway:
Earlier in the function we already bail when prev_dom == domain.

Fixes: 8f41e481b485 ("VT-d: re-assign devices directly")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Extend scope of the approach taken. Leverage for some cleanup.

--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1599,7 +1599,7 @@ int domain_context_mapping_one(
     if ( !seg && !rc )
         rc = me_wifi_quirk(domain, bus, devfn, domid, pgd_maddr, mode);
 
-    if ( rc )
+    if ( rc && !(mode & MAP_ERROR_RECOVERY) )
     {
         if ( !prev_dom ||
              /*
@@ -1609,13 +1609,12 @@ int domain_context_mapping_one(
               */
              (prev_dom == dom_io && !pdev) )
             ret = domain_context_unmap_one(domain, iommu, bus, devfn);
-        else if ( prev_dom != domain ) /* Avoid infinite recursion. */
+        else
             ret = domain_context_mapping_one(prev_dom, iommu, bus, devfn, pdev,
                                              DEVICE_DOMID(prev_dom, pdev),
                                              DEVICE_PGTABLE(prev_dom, pdev),
-                                             mode & MAP_WITH_RMRR) < 0;
-        else
-            ret = 1;
+                                             (mode & MAP_WITH_RMRR) |
+                                             MAP_ERROR_RECOVERY) < 0;
 
         if ( !ret && pdev && pdev->devfn == devfn )
             check_cleanup_domid_map(domain, pdev, iommu);
--- a/xen/drivers/passthrough/vtd/vtd.h
+++ b/xen/drivers/passthrough/vtd/vtd.h
@@ -29,7 +29,8 @@
 #define MAP_WITH_RMRR         (1u << 0)
 #define MAP_OWNER_DYING       (1u << 1)
 #define MAP_SINGLE_DEVICE     (1u << 2)
-#define UNMAP_ME_PHANTOM_FUNC (1u << 3)
+#define MAP_ERROR_RECOVERY    (1u << 3)
+#define UNMAP_ME_PHANTOM_FUNC (1u << 4)
 
 /* Allow for both IOAPIC and IOSAPIC. */
 #define IO_xAPIC_route_entry IO_APIC_route_entry



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/2] VT-d: avoid NULL deref on domain_context_mapping_one() error paths
  2022-04-07  6:11 ` [PATCH v2 1/2] VT-d: avoid NULL deref on domain_context_mapping_one() error paths Jan Beulich
@ 2022-04-07  7:41   ` Roger Pau Monné
  2022-04-07  7:50     ` Jan Beulich
  0 siblings, 1 reply; 15+ messages in thread
From: Roger Pau Monné @ 2022-04-07  7:41 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Kevin Tian, Andrew Cooper, Paul Durrant

On Thu, Apr 07, 2022 at 08:11:06AM +0200, Jan Beulich wrote:
> First there's a printk() which actually wrongly uses pdev in the first
> place: We want to log the coordinates of the (perhaps fake) device
> acted upon, which may not be pdev.
> 
> Then it was quite pointless for eb19326a328d ("VT-d: prepare for per-
> device quarantine page tables (part I)") to add a domid_t parameter to
> domain_context_unmap_one(): It's only used to pass back here via
> me_wifi_quirk() -> map_me_phantom_function(). Drop the parameter again.
> 
> Finally there's the invocation of domain_context_mapping_one(), which
> needs to be passed the correct domain ID. Avoid taking that path when
> pdev is NULL and the quarantine state is what would need restoring to.
> This means we can't security-support PCI devices with RMRRs (if such
> exist in practice) any longer.

The sentence:

"This means we can't security-support PCI devices with RMRRs"

Seems too broad and could lead to confusion. So I would maybe use:
"legacy PCI devices" or "non PCI Express devices".

> 
> Fixes: 8f41e481b485 ("VT-d: re-assign devices directly")
> Fixes: 14dd241aad8a ("IOMMU/x86: use per-device page tables for quarantining")
> Coverity ID: 1503784
> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/2] VT-d: avoid NULL deref on domain_context_mapping_one() error paths
  2022-04-07  7:41   ` Roger Pau Monné
@ 2022-04-07  7:50     ` Jan Beulich
  2022-04-07 16:31       ` Jason Andryuk
  2022-04-08  4:02       ` Tian, Kevin
  0 siblings, 2 replies; 15+ messages in thread
From: Jan Beulich @ 2022-04-07  7:50 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Kevin Tian, Andrew Cooper, Paul Durrant

On 07.04.2022 09:41, Roger Pau Monné wrote:
> On Thu, Apr 07, 2022 at 08:11:06AM +0200, Jan Beulich wrote:
>> First there's a printk() which actually wrongly uses pdev in the first
>> place: We want to log the coordinates of the (perhaps fake) device
>> acted upon, which may not be pdev.
>>
>> Then it was quite pointless for eb19326a328d ("VT-d: prepare for per-
>> device quarantine page tables (part I)") to add a domid_t parameter to
>> domain_context_unmap_one(): It's only used to pass back here via
>> me_wifi_quirk() -> map_me_phantom_function(). Drop the parameter again.
>>
>> Finally there's the invocation of domain_context_mapping_one(), which
>> needs to be passed the correct domain ID. Avoid taking that path when
>> pdev is NULL and the quarantine state is what would need restoring to.
>> This means we can't security-support PCI devices with RMRRs (if such
>> exist in practice) any longer.
> 
> The sentence:
> 
> "This means we can't security-support PCI devices with RMRRs"
> 
> Seems too broad and could lead to confusion. So I would maybe use:
> "legacy PCI devices" or "non PCI Express devices".

Right. I did actually forget to either drop or edit that sentence. I've
now extended this to

"This means we can't security-support non-PCI-Express devices with RMRRs
 (if such exist in practice) any longer; note that as of trhe 1st of the
 two commits referenced below assigning them to DomU-s is unsupported
 anyway."

>> Fixes: 8f41e481b485 ("VT-d: re-assign devices directly")
>> Fixes: 14dd241aad8a ("IOMMU/x86: use per-device page tables for quarantining")
>> Coverity ID: 1503784
>> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

Jan



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/2] VT-d: avoid infinite recursion on domain_context_mapping_one() error path
  2022-04-07  6:11 ` [PATCH v2 2/2] VT-d: avoid infinite recursion on domain_context_mapping_one() error path Jan Beulich
@ 2022-04-07  7:51   ` Roger Pau Monné
  2022-04-07  8:54     ` Jan Beulich
  2022-04-08  4:06   ` Tian, Kevin
  1 sibling, 1 reply; 15+ messages in thread
From: Roger Pau Monné @ 2022-04-07  7:51 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Kevin Tian, Andrew Cooper, Paul Durrant

On Thu, Apr 07, 2022 at 08:11:45AM +0200, Jan Beulich wrote:
> Despite the comment there infinite recursion was still possible, by
> flip-flopping between two domains. This is because prev_dom is derived
> from the DID found in the context entry, which was already updated by
> the time error recovery is invoked. Simply introduce yet another mode
> flag to prevent rolling back an in-progress roll-back of a prior
> mapping attempt.
> 
> Also drop the existing recursion prevention for having been dead anyway:
> Earlier in the function we already bail when prev_dom == domain.

I wonder whether it would be cleaner to stash the previous context
entry if present and try to (re)set that one instead of recurring into
ourselves.

> Fixes: 8f41e481b485 ("VT-d: re-assign devices directly")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/2] VT-d: avoid infinite recursion on domain_context_mapping_one() error path
  2022-04-07  7:51   ` Roger Pau Monné
@ 2022-04-07  8:54     ` Jan Beulich
  0 siblings, 0 replies; 15+ messages in thread
From: Jan Beulich @ 2022-04-07  8:54 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Kevin Tian, Andrew Cooper, Paul Durrant

On 07.04.2022 09:51, Roger Pau Monné wrote:
> On Thu, Apr 07, 2022 at 08:11:45AM +0200, Jan Beulich wrote:
>> Despite the comment there infinite recursion was still possible, by
>> flip-flopping between two domains. This is because prev_dom is derived
>> from the DID found in the context entry, which was already updated by
>> the time error recovery is invoked. Simply introduce yet another mode
>> flag to prevent rolling back an in-progress roll-back of a prior
>> mapping attempt.
>>
>> Also drop the existing recursion prevention for having been dead anyway:
>> Earlier in the function we already bail when prev_dom == domain.
> 
> I wonder whether it would be cleaner to stash the previous context
> entry if present and try to (re)set that one instead of recurring into
> ourselves.

I'm not sure this would be cleaner (it might be easier): The entry
may have had modifications which we want to undo by a clean
establishing of the "new" (really original) mapping. Otoh roll-back
can certainly mean simply going back to what was there. But that
would likely want to be a separate change, for likely coming with
a lot of code churn: I'd see the function gaining a two-iteration
loop then, which would mean re-indenting fair parts of it. But
maybe it could also be dealt with by other means, while still not
introducing a fake loop via adding a "goto" back to the top of the
function ...

>> Fixes: 8f41e481b485 ("VT-d: re-assign devices directly")
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

Jan



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 0.9/2] VT-d: don't needlessly look up DID
  2022-04-07  6:09 [PATCH v2 0/2] VT-d: address fallout from XSA-400 Jan Beulich
  2022-04-07  6:11 ` [PATCH v2 1/2] VT-d: avoid NULL deref on domain_context_mapping_one() error paths Jan Beulich
  2022-04-07  6:11 ` [PATCH v2 2/2] VT-d: avoid infinite recursion on domain_context_mapping_one() error path Jan Beulich
@ 2022-04-07  9:27 ` Jan Beulich
  2022-04-07  9:40   ` Roger Pau Monné
  2022-04-08  4:08   ` Tian, Kevin
  2 siblings, 2 replies; 15+ messages in thread
From: Jan Beulich @ 2022-04-07  9:27 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monné, Kevin Tian, Andrew Cooper, Paul Durrant

If get_iommu_domid() in domain_context_unmap_one() fails, we better
wouldn't clear the context entry in the first place, as we're then unable
to issue the corresponding flush. However, we have no need to look up the
DID in the first place: What needs flushing is very specifically the DID
that was in the context entry before our clearing of it.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
This (an intended follow-up to XSA-399) is actually a prereq to what was
called patch 1 so far in this series.

--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1830,18 +1830,12 @@ int domain_context_unmap_one(
         return 0;
     }
 
+    iommu_domid = context_domain_id(*context);
+
     context_clear_present(*context);
     context_clear_entry(*context);
     iommu_sync_cache(context, sizeof(struct context_entry));
 
-    iommu_domid = get_iommu_did(domid, iommu, !domain->is_dying);
-    if ( iommu_domid == -1 )
-    {
-        spin_unlock(&iommu->lock);
-        unmap_vtd_domain_page(context_entries);
-        return -EINVAL;
-    }
-
     rc = iommu_flush_context_device(iommu, iommu_domid,
                                     PCI_BDF2(bus, devfn),
                                     DMA_CCMD_MASK_NOBIT, 0);



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0.9/2] VT-d: don't needlessly look up DID
  2022-04-07  9:27 ` [PATCH v2 0.9/2] VT-d: don't needlessly look up DID Jan Beulich
@ 2022-04-07  9:40   ` Roger Pau Monné
  2022-04-08  4:08   ` Tian, Kevin
  1 sibling, 0 replies; 15+ messages in thread
From: Roger Pau Monné @ 2022-04-07  9:40 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Kevin Tian, Andrew Cooper, Paul Durrant

On Thu, Apr 07, 2022 at 11:27:53AM +0200, Jan Beulich wrote:
> If get_iommu_domid() in domain_context_unmap_one() fails, we better
> wouldn't clear the context entry in the first place, as we're then unable
> to issue the corresponding flush. However, we have no need to look up the
> DID in the first place: What needs flushing is very specifically the DID
> that was in the context entry before our clearing of it.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/2] VT-d: avoid NULL deref on domain_context_mapping_one() error paths
  2022-04-07  7:50     ` Jan Beulich
@ 2022-04-07 16:31       ` Jason Andryuk
  2022-04-08  6:03         ` Jan Beulich
  2022-04-08  4:02       ` Tian, Kevin
  1 sibling, 1 reply; 15+ messages in thread
From: Jason Andryuk @ 2022-04-07 16:31 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné, xen-devel, Kevin Tian, Andrew Cooper, Paul Durrant

Hi,

On Thu, Apr 7, 2022 at 3:50 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 07.04.2022 09:41, Roger Pau Monné wrote:
> > On Thu, Apr 07, 2022 at 08:11:06AM +0200, Jan Beulich wrote:
> >> First there's a printk() which actually wrongly uses pdev in the first
> >> place: We want to log the coordinates of the (perhaps fake) device
> >> acted upon, which may not be pdev.
> >>
> >> Then it was quite pointless for eb19326a328d ("VT-d: prepare for per-
> >> device quarantine page tables (part I)") to add a domid_t parameter to
> >> domain_context_unmap_one(): It's only used to pass back here via
> >> me_wifi_quirk() -> map_me_phantom_function(). Drop the parameter again.
> >>
> >> Finally there's the invocation of domain_context_mapping_one(), which
> >> needs to be passed the correct domain ID. Avoid taking that path when
> >> pdev is NULL and the quarantine state is what would need restoring to.
> >> This means we can't security-support PCI devices with RMRRs (if such
> >> exist in practice) any longer.
> >
> > The sentence:
> >
> > "This means we can't security-support PCI devices with RMRRs"
> >
> > Seems too broad and could lead to confusion. So I would maybe use:
> > "legacy PCI devices" or "non PCI Express devices".
>
> Right. I did actually forget to either drop or edit that sentence. I've
> now extended this to
>
> "This means we can't security-support non-PCI-Express devices with RMRRs
>  (if such exist in practice) any longer; note that as of trhe 1st of the
>  two commits referenced below assigning them to DomU-s is unsupported
>  anyway."

Mentioning "Express" makes the support statement clearer.  However,
I'm not clear on what unsupported means in "assigning them to DomU-s
is unsupported anyway".  They can't be assigned?  I'm testing with
staging-4.16, so with XSA-400, but not this patch.  I seemingly have a
legacy PCI device still being assigned to a domU.

It is an 8th Gen Intel laptop with:
00:14.0 USB controller: Intel Corporation Cannon Point-LP USB 3.1 xHCI
Controller (rev 30) (prog-if 30 [XHCI]).

lspci output for 00:14.0 does *not* have capability "Express (v2)",
but it does have an RMRR:
(XEN) [VT-D]found ACPI_DMAR_RMRR:
(XEN) [VT-D] endpoint: 0000:00:14.0

It looks like it is PCI compared to 39:00.0 which does have Express (v2):
(XEN) [VT-D]d1:PCI: map 0000:00:14.0
(XEN) [VT-D]d1:PCIe: map 0000:39:00.0

As I understand it, an RMRR is common with USB controllers for
implementing legacy mouse & keyboard support.  The Cannon Point PCH is
fairly modern, so I'd expect it to use PCI Express.  Xen seems to
identify it as DEV_TYPE_PCI given "PCI" above.  It is an integrated
PCI device, so it has no upstream bridge.  Maybe that is why it can
still be assigned?  The "unsupported assignment" is a legacy PCI
device, behind a bridge, with an RMRR?

Thanks,
Jason


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH v2 1/2] VT-d: avoid NULL deref on domain_context_mapping_one() error paths
  2022-04-07  7:50     ` Jan Beulich
  2022-04-07 16:31       ` Jason Andryuk
@ 2022-04-08  4:02       ` Tian, Kevin
  1 sibling, 0 replies; 15+ messages in thread
From: Tian, Kevin @ 2022-04-08  4:02 UTC (permalink / raw)
  To: Beulich, Jan, Pau Monné, Roger
  Cc: xen-devel, Cooper, Andrew, Paul Durrant

> From: Jan Beulich <jbeulich@suse.com>
> Sent: Thursday, April 7, 2022 3:50 PM
> 
> On 07.04.2022 09:41, Roger Pau Monné wrote:
> > On Thu, Apr 07, 2022 at 08:11:06AM +0200, Jan Beulich wrote:
> >> First there's a printk() which actually wrongly uses pdev in the first
> >> place: We want to log the coordinates of the (perhaps fake) device
> >> acted upon, which may not be pdev.
> >>
> >> Then it was quite pointless for eb19326a328d ("VT-d: prepare for per-
> >> device quarantine page tables (part I)") to add a domid_t parameter to
> >> domain_context_unmap_one(): It's only used to pass back here via
> >> me_wifi_quirk() -> map_me_phantom_function(). Drop the parameter
> again.
> >>
> >> Finally there's the invocation of domain_context_mapping_one(), which
> >> needs to be passed the correct domain ID. Avoid taking that path when
> >> pdev is NULL and the quarantine state is what would need restoring to.
> >> This means we can't security-support PCI devices with RMRRs (if such
> >> exist in practice) any longer.
> >
> > The sentence:
> >
> > "This means we can't security-support PCI devices with RMRRs"
> >
> > Seems too broad and could lead to confusion. So I would maybe use:
> > "legacy PCI devices" or "non PCI Express devices".
> 
> Right. I did actually forget to either drop or edit that sentence. I've
> now extended this to
> 
> "This means we can't security-support non-PCI-Express devices with RMRRs
>  (if such exist in practice) any longer; note that as of trhe 1st of the
>  two commits referenced below assigning them to DomU-s is unsupported
>  anyway."
> 

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH v2 2/2] VT-d: avoid infinite recursion on domain_context_mapping_one() error path
  2022-04-07  6:11 ` [PATCH v2 2/2] VT-d: avoid infinite recursion on domain_context_mapping_one() error path Jan Beulich
  2022-04-07  7:51   ` Roger Pau Monné
@ 2022-04-08  4:06   ` Tian, Kevin
  1 sibling, 0 replies; 15+ messages in thread
From: Tian, Kevin @ 2022-04-08  4:06 UTC (permalink / raw)
  To: Beulich, Jan, xen-devel
  Cc: Pau Monné, Roger, Cooper, Andrew, Paul Durrant

> From: Jan Beulich <jbeulich@suse.com>
> Sent: Thursday, April 7, 2022 2:12 PM
> 
> Despite the comment there infinite recursion was still possible, by
> flip-flopping between two domains. This is because prev_dom is derived
> from the DID found in the context entry, which was already updated by
> the time error recovery is invoked. Simply introduce yet another mode
> flag to prevent rolling back an in-progress roll-back of a prior
> mapping attempt.
> 
> Also drop the existing recursion prevention for having been dead anyway:
> Earlier in the function we already bail when prev_dom == domain.
> 
> Fixes: 8f41e481b485 ("VT-d: re-assign devices directly")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

> ---
> v2: Extend scope of the approach taken. Leverage for some cleanup.
> 
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1599,7 +1599,7 @@ int domain_context_mapping_one(
>      if ( !seg && !rc )
>          rc = me_wifi_quirk(domain, bus, devfn, domid, pgd_maddr, mode);
> 
> -    if ( rc )
> +    if ( rc && !(mode & MAP_ERROR_RECOVERY) )
>      {
>          if ( !prev_dom ||
>               /*
> @@ -1609,13 +1609,12 @@ int domain_context_mapping_one(
>                */
>               (prev_dom == dom_io && !pdev) )
>              ret = domain_context_unmap_one(domain, iommu, bus, devfn);
> -        else if ( prev_dom != domain ) /* Avoid infinite recursion. */
> +        else
>              ret = domain_context_mapping_one(prev_dom, iommu, bus, devfn,
> pdev,
>                                               DEVICE_DOMID(prev_dom, pdev),
>                                               DEVICE_PGTABLE(prev_dom, pdev),
> -                                             mode & MAP_WITH_RMRR) < 0;
> -        else
> -            ret = 1;
> +                                             (mode & MAP_WITH_RMRR) |
> +                                             MAP_ERROR_RECOVERY) < 0;
> 
>          if ( !ret && pdev && pdev->devfn == devfn )
>              check_cleanup_domid_map(domain, pdev, iommu);
> --- a/xen/drivers/passthrough/vtd/vtd.h
> +++ b/xen/drivers/passthrough/vtd/vtd.h
> @@ -29,7 +29,8 @@
>  #define MAP_WITH_RMRR         (1u << 0)
>  #define MAP_OWNER_DYING       (1u << 1)
>  #define MAP_SINGLE_DEVICE     (1u << 2)
> -#define UNMAP_ME_PHANTOM_FUNC (1u << 3)
> +#define MAP_ERROR_RECOVERY    (1u << 3)
> +#define UNMAP_ME_PHANTOM_FUNC (1u << 4)
> 
>  /* Allow for both IOAPIC and IOSAPIC. */
>  #define IO_xAPIC_route_entry IO_APIC_route_entry


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH v2 0.9/2] VT-d: don't needlessly look up DID
  2022-04-07  9:27 ` [PATCH v2 0.9/2] VT-d: don't needlessly look up DID Jan Beulich
  2022-04-07  9:40   ` Roger Pau Monné
@ 2022-04-08  4:08   ` Tian, Kevin
  1 sibling, 0 replies; 15+ messages in thread
From: Tian, Kevin @ 2022-04-08  4:08 UTC (permalink / raw)
  To: Beulich, Jan, xen-devel
  Cc: Pau Monné, Roger, Cooper, Andrew, Paul Durrant

> From: Jan Beulich <jbeulich@suse.com>
> Sent: Thursday, April 7, 2022 5:28 PM
> 
> If get_iommu_domid() in domain_context_unmap_one() fails, we better
> wouldn't clear the context entry in the first place, as we're then unable
> to issue the corresponding flush. However, we have no need to look up the
> DID in the first place: What needs flushing is very specifically the DID
> that was in the context entry before our clearing of it.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

> ---
> This (an intended follow-up to XSA-399) is actually a prereq to what was
> called patch 1 so far in this series.
> 
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1830,18 +1830,12 @@ int domain_context_unmap_one(
>          return 0;
>      }
> 
> +    iommu_domid = context_domain_id(*context);
> +
>      context_clear_present(*context);
>      context_clear_entry(*context);
>      iommu_sync_cache(context, sizeof(struct context_entry));
> 
> -    iommu_domid = get_iommu_did(domid, iommu, !domain->is_dying);
> -    if ( iommu_domid == -1 )
> -    {
> -        spin_unlock(&iommu->lock);
> -        unmap_vtd_domain_page(context_entries);
> -        return -EINVAL;
> -    }
> -
>      rc = iommu_flush_context_device(iommu, iommu_domid,
>                                      PCI_BDF2(bus, devfn),
>                                      DMA_CCMD_MASK_NOBIT, 0);


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/2] VT-d: avoid NULL deref on domain_context_mapping_one() error paths
  2022-04-07 16:31       ` Jason Andryuk
@ 2022-04-08  6:03         ` Jan Beulich
  2022-04-08 12:12           ` Jason Andryuk
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2022-04-08  6:03 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Roger Pau Monné, xen-devel, Kevin Tian, Andrew Cooper, Paul Durrant

On 07.04.2022 18:31, Jason Andryuk wrote:
> Hi,
> 
> On Thu, Apr 7, 2022 at 3:50 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 07.04.2022 09:41, Roger Pau Monné wrote:
>>> On Thu, Apr 07, 2022 at 08:11:06AM +0200, Jan Beulich wrote:
>>>> First there's a printk() which actually wrongly uses pdev in the first
>>>> place: We want to log the coordinates of the (perhaps fake) device
>>>> acted upon, which may not be pdev.
>>>>
>>>> Then it was quite pointless for eb19326a328d ("VT-d: prepare for per-
>>>> device quarantine page tables (part I)") to add a domid_t parameter to
>>>> domain_context_unmap_one(): It's only used to pass back here via
>>>> me_wifi_quirk() -> map_me_phantom_function(). Drop the parameter again.
>>>>
>>>> Finally there's the invocation of domain_context_mapping_one(), which
>>>> needs to be passed the correct domain ID. Avoid taking that path when
>>>> pdev is NULL and the quarantine state is what would need restoring to.
>>>> This means we can't security-support PCI devices with RMRRs (if such
>>>> exist in practice) any longer.
>>>
>>> The sentence:
>>>
>>> "This means we can't security-support PCI devices with RMRRs"
>>>
>>> Seems too broad and could lead to confusion. So I would maybe use:
>>> "legacy PCI devices" or "non PCI Express devices".
>>
>> Right. I did actually forget to either drop or edit that sentence. I've
>> now extended this to
>>
>> "This means we can't security-support non-PCI-Express devices with RMRRs
>>  (if such exist in practice) any longer; note that as of trhe 1st of the
>>  two commits referenced below assigning them to DomU-s is unsupported
>>  anyway."
> 
> Mentioning "Express" makes the support statement clearer.  However,
> I'm not clear on what unsupported means in "assigning them to DomU-s
> is unsupported anyway".  They can't be assigned?  I'm testing with
> staging-4.16, so with XSA-400, but not this patch.  I seemingly have a
> legacy PCI device still being assigned to a domU.
> 
> It is an 8th Gen Intel laptop with:
> 00:14.0 USB controller: Intel Corporation Cannon Point-LP USB 3.1 xHCI
> Controller (rev 30) (prog-if 30 [XHCI]).
> 
> lspci output for 00:14.0 does *not* have capability "Express (v2)",
> but it does have an RMRR:
> (XEN) [VT-D]found ACPI_DMAR_RMRR:
> (XEN) [VT-D] endpoint: 0000:00:14.0
> 
> It looks like it is PCI compared to 39:00.0 which does have Express (v2):
> (XEN) [VT-D]d1:PCI: map 0000:00:14.0
> (XEN) [VT-D]d1:PCIe: map 0000:39:00.0
> 
> As I understand it, an RMRR is common with USB controllers for
> implementing legacy mouse & keyboard support.  The Cannon Point PCH is
> fairly modern, so I'd expect it to use PCI Express.  Xen seems to
> identify it as DEV_TYPE_PCI given "PCI" above.  It is an integrated
> PCI device, so it has no upstream bridge.  Maybe that is why it can
> still be assigned?  The "unsupported assignment" is a legacy PCI
> device, behind a bridge, with an RMRR?

Ah yes - the "behind a bridge" aspect does matter (but I can't
adjust the description of an already committed patch). That's both
for the respective part of the XSA-400 series as well as for the
change here.

Jan



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/2] VT-d: avoid NULL deref on domain_context_mapping_one() error paths
  2022-04-08  6:03         ` Jan Beulich
@ 2022-04-08 12:12           ` Jason Andryuk
  0 siblings, 0 replies; 15+ messages in thread
From: Jason Andryuk @ 2022-04-08 12:12 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné, xen-devel, Kevin Tian, Andrew Cooper, Paul Durrant

On Fri, Apr 8, 2022 at 2:03 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 07.04.2022 18:31, Jason Andryuk wrote:
> > As I understand it, an RMRR is common with USB controllers for
> > implementing legacy mouse & keyboard support.  The Cannon Point PCH is
> > fairly modern, so I'd expect it to use PCI Express.  Xen seems to
> > identify it as DEV_TYPE_PCI given "PCI" above.  It is an integrated
> > PCI device, so it has no upstream bridge.  Maybe that is why it can
> > still be assigned?  The "unsupported assignment" is a legacy PCI
> > device, behind a bridge, with an RMRR?
>
> Ah yes - the "behind a bridge" aspect does matter (but I can't
> adjust the description of an already committed patch). That's both
> for the respective part of the XSA-400 series as well as for the
> change here.

Ok.  Thank you for confirming.  It is captured in the mailing list
archive at least.

Regards,
Jason


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-04-08 12:13 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-07  6:09 [PATCH v2 0/2] VT-d: address fallout from XSA-400 Jan Beulich
2022-04-07  6:11 ` [PATCH v2 1/2] VT-d: avoid NULL deref on domain_context_mapping_one() error paths Jan Beulich
2022-04-07  7:41   ` Roger Pau Monné
2022-04-07  7:50     ` Jan Beulich
2022-04-07 16:31       ` Jason Andryuk
2022-04-08  6:03         ` Jan Beulich
2022-04-08 12:12           ` Jason Andryuk
2022-04-08  4:02       ` Tian, Kevin
2022-04-07  6:11 ` [PATCH v2 2/2] VT-d: avoid infinite recursion on domain_context_mapping_one() error path Jan Beulich
2022-04-07  7:51   ` Roger Pau Monné
2022-04-07  8:54     ` Jan Beulich
2022-04-08  4:06   ` Tian, Kevin
2022-04-07  9:27 ` [PATCH v2 0.9/2] VT-d: don't needlessly look up DID Jan Beulich
2022-04-07  9:40   ` Roger Pau Monné
2022-04-08  4:08   ` Tian, Kevin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.