All of lore.kernel.org
 help / color / mirror / Atom feed
* Xen Security Advisory 337 v3 (CVE-2020-25595) - PCI passthrough code reading back hardware registers
@ 2020-09-22 13:37 Xen.org security team
  0 siblings, 0 replies; only message in thread
From: Xen.org security team @ 2020-09-22 13:37 UTC (permalink / raw)
  To: xen-announce, xen-devel, xen-users, oss-security; +Cc: Xen.org security team

[-- Attachment #1: Type: text/plain, Size: 5698 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

            Xen Security Advisory CVE-2020-25595 / XSA-337
                               version 3

         PCI passthrough code reading back hardware registers

UPDATES IN VERSION 3
====================

Public release.

ISSUE DESCRIPTION
=================

Code paths in Xen's MSI handling have been identified which act on
unsanitized values read back from device hardware registers.  While
devices strictly compliant with PCI specifications shouldn't be able to
affect these registers, experience shows that it's very common for
devices to have out-of-spec "backdoor" operations which can affect the
result of these reads.

IMPACT
======

A not fully trusted guest may be able to crash Xen, leading to a Denial
of Service (DoS) for the entire system.  Privilege escalation and
information leaks cannot be excluded.

VULNERABLE SYSTEMS
==================

All versions of Xen supporting PCI passthrough are affected.

Only x86 systems are vulnerable.  Arm systems are not vulnerable.

Only guests with passed through PCI devices may be able to leverage
the vulnerability.

Only systems passing through devices with out-of-spec ("backdoor")
functionality can cause issues.  Experience shows that such out-of-spec
functionality is common; unless you have reason to believe that your
device does not have such functionality, it's better to assume that it
does.

REMINDER OF PCI PASSTHROUGH SUPPORT STATEMENT
=============================================

The security team wishes to reiterate our support statement for PCI
Device Passthrough (found in xen.git/SUPPORT.md):

"Because of hardware limitations (affecting any operating system or
hypervisor), it is generally not safe to use this feature to expose a
physical device to completely untrusted guests.  However, this feature
can still confer significant security benefit when used to remove
drivers and backends from domain 0 (i.e., Driver Domains)."

The possibility of "backdoor" device functionality mentioned above is
one of the major reasons for this stance.  We issue this XSA to help
maintain Driver Domains as a "defense-in-depth", and also on behalf of
those who may have done full security audits of their particular
hardware platform.  It does not change our stance that passing through
PCI devices to untrusted guests is in general not safe.

MITIGATION
==========

Not passing through physical devices to untrusted guests will avoid
the vulnerability.

CREDITS
=======

This issue was discovered by Andrew Cooper of Citrix.

RESOLUTION
==========

Applying the appropriate pair of attached patches resolves this issue.

Note that patches for released versions are generally prepared to
apply to the stable branches, and may not apply cleanly to the most
recent release tarball.  Downstreams are encouraged to update to the
tip of the stable branch before applying these patches.

xsa337/xsa337-?.patch           Xen 4.14 - xen-unstable
xsa337/xsa337-4.13-?.patch      Xen 4.13
xsa337/xsa337-4.12-?.patch      Xen 4.10 - 4.12

$ sha256sum xsa337* xsa337*/*
f027d07fb307f5441ee9d19b6385e421bba745059667d181031b0bfd7047a15b  xsa337.meta
98c48781dd46bf6ff6cc46246c6c9f2e2be6ec696c0e7918d4b82845588ce04e  xsa337/xsa337-1.patch
9e8ae24222371379f2ea62e14fcc7f7282e01c356dff230c22c9ab1d2fb941e2  xsa337/xsa337-2.patch
a6744fdab01877e098f88dcd3bee10c3146aef66170a1422b3811cd09fc9faef  xsa337/xsa337-4.12-1.patch
a091652f1a3c0bf851e35b61d338d53b4690fab828b3c30f354c28c377af2aee  xsa337/xsa337-4.12-2.patch
fb27fd2508e017bf05131eb3d31bb8cc56c79690cbb7f1af76cb92fd568040a1  xsa337/xsa337-4.13-1.patch
a25bc70ad55716ce3a0d9435fa2b0a492420a0eabfb0e3f94cd27de10242d98b  xsa337/xsa337-4.13-2.patch
$

DEPLOYMENT DURING EMBARGO
=========================

Deployment of the patches described above (or others which are
substantially similar) is permitted during the embargo, even on
public-facing systems with untrusted guest users and administrators.

HOWEVER, deployment of the mitigation is NOT permitted (except where
all the affected systems and VMs are administered and used only by
organisations which are members of the Xen Project Security Issues
Predisclosure List).  Specifically, deployment on public cloud systems
is NOT permitted.

This is because removing of pass-through devices or their replacement by
emulated devices is a guest visible configuration change, which may lead
to re-discovery of the issue.

Deployment of this mitigation is permitted only AFTER the embargo ends.

AND: Distribution of updated software is prohibited (except to other
members of the predisclosure list).

Predisclosure list members who wish to deploy significantly different
patches and/or mitigations, please contact the Xen Project Security
Team.

(Note: this during-embargo deployment notice is retained in
post-embargo publicly released Xen Project advisories, even though it
is then no longer applicable.  This is to enable the community to have
oversight of the Xen Project Security Team's decisionmaking.)

For more information about permissible uses of embargoed information,
consult the Xen Project community's agreed Security Policy:
  http://www.xenproject.org/security-policy.html
-----BEGIN PGP SIGNATURE-----

iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAl9p/ecMHHBncEB4ZW4u
b3JnAAoJEIP+FMlX6CvZlcIIAKtn9RdA/CjIRcodtfxnnFPQu9SRtmHfLNQ2Vjmu
F1nIjjEklUNJSpGlEGjG1cq3oA/SZTm2jYXu2k4rcAyrl0bhaflSoL/N+Fmwo2Ym
898KA8gLdIckagxz5WKVv/vqc3x/h2IZgN4AUgt73buUOxEBFudqJKvnwtep5Z5R
60MDs+lp/5Mp6cXUukAWzPnmtJDWZ4s4QHXHNkKXTUpByZfmGJqqflL6yJDFHSxt
vvGpvElApkMP4Ks+rPoCrdG/ObbQvgwMgSJ//tnnWayfs1asOxrRbFlLAt4yVvdt
Y6Hi69hHB+ZWO36qy5dvjjKk0ftbrPAPDbDk27y/zuKXhko=
=TzZR
-----END PGP SIGNATURE-----

[-- Attachment #2: xsa337.meta --]
[-- Type: application/octet-stream, Size: 1931 bytes --]

{
  "XSA": 337,
  "SupportedVersions": [
    "master",
    "4.14",
    "4.13",
    "4.12",
    "4.11",
    "4.10"
  ],
  "Trees": [
    "xen"
  ],
  "Recipes": {
    "4.10": {
      "Recipes": {
        "xen": {
          "StableRef": "93be943e7d759015bd5db41a48f6dce58e580d5a",
          "Prereqs": [
            336
          ],
          "Patches": [
            "xsa337/xsa337-4.12-?.patch"
          ]
        }
      }
    },
    "4.11": {
      "Recipes": {
        "xen": {
          "StableRef": "ddaaccbbab6b19bf21ed2c097f3055a3c2544c8d",
          "Prereqs": [
            333,
            336
          ],
          "Patches": [
            "xsa337/xsa337-4.12-?.patch"
          ]
        }
      }
    },
    "4.12": {
      "Recipes": {
        "xen": {
          "StableRef": "1336ca17742471fc4a59879ae2f637a59530a933",
          "Prereqs": [
            333,
            334,
            336
          ],
          "Patches": [
            "xsa337/xsa337-4.12-?.patch"
          ]
        }
      }
    },
    "4.13": {
      "Recipes": {
        "xen": {
          "StableRef": "9b367b2b0b714f3ffb69ed6be0a118e8d3eac07f",
          "Prereqs": [
            333,
            334,
            336
          ],
          "Patches": [
            "xsa337/xsa337-4.13-?.patch"
          ]
        }
      }
    },
    "4.14": {
      "Recipes": {
        "xen": {
          "StableRef": "c3a0fc22af90ef28e68b116c6a49d9cec57f71cf",
          "Prereqs": [
            333,
            334,
            336
          ],
          "Patches": [
            "xsa337/xsa337-?.patch"
          ]
        }
      }
    },
    "master": {
      "Recipes": {
        "xen": {
          "StableRef": "b11910082d90bb1597f6679524eb726a33306672",
          "Prereqs": [
            333,
            334,
            336
          ],
          "Patches": [
            "xsa337/xsa337-?.patch"
          ]
        }
      }
    }
  }
}

[-- Attachment #3: xsa337/xsa337-1.patch --]
[-- Type: application/octet-stream, Size: 2726 bytes --]

From: Roger Pau Monné <roger.pau@citrix.com>
Subject: x86/msi: get rid of read_msi_msg

It's safer and faster to just use the cached last written
(untranslated) MSI message stored in msi_desc for the single user that
calls read_msi_msg.

This also prevents relying on the data read from the device MSI
registers in order to figure out the index into the IOMMU interrupt
remapping table, which is not safe.

This is part of XSA-337.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -185,54 +185,6 @@ void msi_compose_msg(unsigned vector, co
                 MSI_DATA_VECTOR(vector);
 }

-static bool read_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
-{
-    switch ( entry->msi_attrib.type )
-    {
-    case PCI_CAP_ID_MSI:
-    {
-        struct pci_dev *dev = entry->dev;
-        int pos = entry->msi_attrib.pos;
-        uint16_t data;
-
-        msg->address_lo = pci_conf_read32(dev->sbdf,
-                                          msi_lower_address_reg(pos));
-        if ( entry->msi_attrib.is_64 )
-        {
-            msg->address_hi = pci_conf_read32(dev->sbdf,
-                                              msi_upper_address_reg(pos));
-            data = pci_conf_read16(dev->sbdf, msi_data_reg(pos, 1));
-        }
-        else
-        {
-            msg->address_hi = 0;
-            data = pci_conf_read16(dev->sbdf, msi_data_reg(pos, 0));
-        }
-        msg->data = data;
-        break;
-    }
-    case PCI_CAP_ID_MSIX:
-    {
-        void __iomem *base = entry->mask_base;
-
-        if ( unlikely(!msix_memory_decoded(entry->dev,
-                                           entry->msi_attrib.pos)) )
-            return false;
-        msg->address_lo = readl(base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET);
-        msg->address_hi = readl(base + PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET);
-        msg->data = readl(base + PCI_MSIX_ENTRY_DATA_OFFSET);
-        break;
-    }
-    default:
-        BUG();
-    }
-
-    if ( iommu_intremap )
-        iommu_read_msi_from_ire(entry, msg);
-
-    return true;
-}
-
 static int write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
 {
     entry->msg = *msg;
@@ -304,10 +256,7 @@ void set_msi_affinity(struct irq_desc *d

     ASSERT(spin_is_locked(&desc->lock));

-    memset(&msg, 0, sizeof(msg));
-    if ( !read_msi_msg(msi_desc, &msg) )
-        return;
-
+    msg = msi_desc->msg;
     msg.data &= ~MSI_DATA_VECTOR_MASK;
     msg.data |= MSI_DATA_VECTOR(desc->arch.vector);
     msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;

[-- Attachment #4: xsa337/xsa337-2.patch --]
[-- Type: application/octet-stream, Size: 6424 bytes --]

From: Jan Beulich <jbeulich@suse.com>
Subject: x86/MSI-X: restrict reading of table/PBA bases from BARs

When assigned to less trusted or un-trusted guests, devices may change
state behind our backs (they may e.g. get reset by means we may not know
about). Therefore we should avoid reading BARs from hardware once a
device is no longer owned by Dom0. Furthermore when we can't read a BAR,
or when we read zero, we shouldn't instead use the caller provided
address unless that caller can be trusted.

Re-arrange the logic in msix_capability_init() such that only Dom0 (and
only if the device isn't DomU-owned yet) or calls through
PHYSDEVOP_prepare_msix will actually result in the reading of the
respective BAR register(s). Additionally do so only as long as in-use
table entries are known (note that invocation of PHYSDEVOP_prepare_msix
counts as a "pseudo" entry). In all other uses the value already
recorded will get used instead.

Clear the recorded values in _pci_cleanup_msix() as well as on the one
affected error path. (Adjust this error path to also avoid blindly
disabling MSI-X when it was enabled on entry to the function.)

While moving around variable declarations (in many cases to reduce their
scopes), also adjust some of their types.

This is part of XSA-337.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
---
v2: Use "unsigned int" for moved bir, pbus, etc. Further restrict under
    what conditions to read the BAR(s).

--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -771,16 +771,14 @@ static int msix_capability_init(struct p
 {
     struct arch_msix *msix = dev->msix;
     struct msi_desc *entry = NULL;
-    int vf;
     u16 control;
     u64 table_paddr;
     u32 table_offset;
-    u8 bir, pbus, pslot, pfunc;
     u16 seg = dev->seg;
     u8 bus = dev->bus;
     u8 slot = PCI_SLOT(dev->devfn);
     u8 func = PCI_FUNC(dev->devfn);
-    bool maskall = msix->host_maskall;
+    bool maskall = msix->host_maskall, zap_on_error = false;
     unsigned int pos = pci_find_cap_offset(seg, bus, slot, func,
                                            PCI_CAP_ID_MSIX);
 
@@ -822,43 +820,45 @@ static int msix_capability_init(struct p
 
     /* Locate MSI-X table region */
     table_offset = pci_conf_read32(dev->sbdf, msix_table_offset_reg(pos));
-    bir = (u8)(table_offset & PCI_MSIX_BIRMASK);
-    table_offset &= ~PCI_MSIX_BIRMASK;
+    if ( !msix->used_entries &&
+         (!msi ||
+          (is_hardware_domain(current->domain) &&
+           (dev->domain == current->domain || dev->domain == dom_io))) )
+    {
+        unsigned int bir = table_offset & PCI_MSIX_BIRMASK, pbus, pslot, pfunc;
+        int vf;
+        paddr_t pba_paddr;
+        unsigned int pba_offset;
 
-    if ( !dev->info.is_virtfn )
-    {
-        pbus = bus;
-        pslot = slot;
-        pfunc = func;
-        vf = -1;
-    }
-    else
-    {
-        pbus = dev->info.physfn.bus;
-        pslot = PCI_SLOT(dev->info.physfn.devfn);
-        pfunc = PCI_FUNC(dev->info.physfn.devfn);
-        vf = PCI_BDF2(dev->bus, dev->devfn);
-    }
-
-    table_paddr = read_pci_mem_bar(seg, pbus, pslot, pfunc, bir, vf);
-    WARN_ON(msi && msi->table_base != table_paddr);
-    if ( !table_paddr )
-    {
-        if ( !msi || !msi->table_base )
+        if ( !dev->info.is_virtfn )
         {
-            pci_conf_write16(dev->sbdf, msix_control_reg(pos),
-                             control & ~PCI_MSIX_FLAGS_ENABLE);
-            xfree(entry);
-            return -ENXIO;
+            pbus = bus;
+            pslot = slot;
+            pfunc = func;
+            vf = -1;
+        }
+        else
+        {
+            pbus = dev->info.physfn.bus;
+            pslot = PCI_SLOT(dev->info.physfn.devfn);
+            pfunc = PCI_FUNC(dev->info.physfn.devfn);
+            vf = PCI_BDF2(dev->bus, dev->devfn);
         }
-        table_paddr = msi->table_base;
-    }
-    table_paddr += table_offset;
 
-    if ( !msix->used_entries )
-    {
-        u64 pba_paddr;
-        u32 pba_offset;
+        table_paddr = read_pci_mem_bar(seg, pbus, pslot, pfunc, bir, vf);
+        WARN_ON(msi && msi->table_base != table_paddr);
+        if ( !table_paddr )
+        {
+            if ( !msi || !msi->table_base )
+            {
+                pci_conf_write16(dev->sbdf, msix_control_reg(pos),
+                                 control & ~PCI_MSIX_FLAGS_ENABLE);
+                xfree(entry);
+                return -ENXIO;
+            }
+            table_paddr = msi->table_base;
+        }
+        table_paddr += table_offset & ~PCI_MSIX_BIRMASK;
 
         msix->table.first = PFN_DOWN(table_paddr);
         msix->table.last = PFN_DOWN(table_paddr +
@@ -877,7 +877,18 @@ static int msix_capability_init(struct p
                                   BITS_TO_LONGS(msix->nr_entries) - 1);
         WARN_ON(rangeset_overlaps_range(mmio_ro_ranges, msix->pba.first,
                                         msix->pba.last));
+
+        zap_on_error = true;
+    }
+    else if ( !msix->table.first )
+    {
+        pci_conf_write16(dev->sbdf, msix_control_reg(pos), control);
+        xfree(entry);
+        return -ENODATA;
     }
+    else
+        table_paddr = (msix->table.first << PAGE_SHIFT) +
+                      PAGE_OFFSET(table_offset & ~PCI_MSIX_BIRMASK);
 
     if ( entry )
     {
@@ -888,8 +899,15 @@ static int msix_capability_init(struct p
 
         if ( idx < 0 )
         {
-            pci_conf_write16(dev->sbdf, msix_control_reg(pos),
-                             control & ~PCI_MSIX_FLAGS_ENABLE);
+            if ( zap_on_error )
+            {
+                msix->table.first = 0;
+                msix->pba.first = 0;
+
+                control &= ~PCI_MSIX_FLAGS_ENABLE;
+            }
+
+            pci_conf_write16(dev->sbdf, msix_control_reg(pos), control);
             xfree(entry);
             return idx;
         }
@@ -1072,9 +1090,14 @@ static void _pci_cleanup_msix(struct arc
         if ( rangeset_remove_range(mmio_ro_ranges, msix->table.first,
                                    msix->table.last) )
             WARN();
+        msix->table.first = 0;
+        msix->table.last = 0;
+
         if ( rangeset_remove_range(mmio_ro_ranges, msix->pba.first,
                                    msix->pba.last) )
             WARN();
+        msix->pba.first = 0;
+        msix->pba.last = 0;
     }
 }
 

[-- Attachment #5: xsa337/xsa337-4.12-1.patch --]
[-- Type: application/octet-stream, Size: 2963 bytes --]

From: Roger Pau Monné <roger.pau@citrix.com>
Subject: x86/msi: get rid of read_msi_msg

It's safer and faster to just use the cached last written
(untranslated) MSI message stored in msi_desc for the single user that
calls read_msi_msg.

This also prevents relying on the data read from the device MSI
registers in order to figure out the index into the IOMMU interrupt
remapping table, which is not safe.

This is part of XSA-337.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -192,59 +192,6 @@ void msi_compose_msg(unsigned vector, co
                 MSI_DATA_VECTOR(vector);
 }

-static bool read_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
-{
-    switch ( entry->msi_attrib.type )
-    {
-    case PCI_CAP_ID_MSI:
-    {
-        struct pci_dev *dev = entry->dev;
-        int pos = entry->msi_attrib.pos;
-        u16 data, seg = dev->seg;
-        u8 bus = dev->bus;
-        u8 slot = PCI_SLOT(dev->devfn);
-        u8 func = PCI_FUNC(dev->devfn);
-
-        msg->address_lo = pci_conf_read32(seg, bus, slot, func,
-                                          msi_lower_address_reg(pos));
-        if ( entry->msi_attrib.is_64 )
-        {
-            msg->address_hi = pci_conf_read32(seg, bus, slot, func,
-                                              msi_upper_address_reg(pos));
-            data = pci_conf_read16(seg, bus, slot, func,
-                                   msi_data_reg(pos, 1));
-        }
-        else
-        {
-            msg->address_hi = 0;
-            data = pci_conf_read16(seg, bus, slot, func,
-                                   msi_data_reg(pos, 0));
-        }
-        msg->data = data;
-        break;
-    }
-    case PCI_CAP_ID_MSIX:
-    {
-        void __iomem *base = entry->mask_base;
-
-        if ( unlikely(!msix_memory_decoded(entry->dev,
-                                           entry->msi_attrib.pos)) )
-            return false;
-        msg->address_lo = readl(base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET);
-        msg->address_hi = readl(base + PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET);
-        msg->data = readl(base + PCI_MSIX_ENTRY_DATA_OFFSET);
-        break;
-    }
-    default:
-        BUG();
-    }
-
-    if ( iommu_intremap )
-        iommu_read_msi_from_ire(entry, msg);
-
-    return true;
-}
-
 static int write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
 {
     entry->msg = *msg;
@@ -322,10 +269,7 @@ void set_msi_affinity(struct irq_desc *d

     ASSERT(spin_is_locked(&desc->lock));

-    memset(&msg, 0, sizeof(msg));
-    if ( !read_msi_msg(msi_desc, &msg) )
-        return;
-
+    msg = msi_desc->msg;
     msg.data &= ~MSI_DATA_VECTOR_MASK;
     msg.data |= MSI_DATA_VECTOR(desc->arch.vector);
     msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;

[-- Attachment #6: xsa337/xsa337-4.12-2.patch --]
[-- Type: application/octet-stream, Size: 6279 bytes --]

From: Jan Beulich <jbeulich@suse.com>
Subject: x86/MSI-X: restrict reading of table/PBA bases from BARs

When assigned to less trusted or un-trusted guests, devices may change
state behind our backs (they may e.g. get reset by means we may not know
about). Therefore we should avoid reading BARs from hardware once a
device is no longer owned by Dom0. Furthermore when we can't read a BAR,
or when we read zero, we shouldn't instead use the caller provided
address unless that caller can be trusted.

Re-arrange the logic in msix_capability_init() such that only Dom0 (and
only if the device isn't DomU-owned yet) or calls through
PHYSDEVOP_prepare_msix will actually result in the reading of the
respective BAR register(s). Additionally do so only as long as in-use
table entries are known (note that invocation of PHYSDEVOP_prepare_msix
counts as a "pseudo" entry). In all other uses the value already
recorded will get used instead.

Clear the recorded values in _pci_cleanup_msix() as well as on the one
affected error path. (Adjust this error path to also avoid blindly
disabling MSI-X when it was enabled on entry to the function.)

While moving around variable declarations (in many cases to reduce their
scopes), also adjust some of their types.

This is part of XSA-337.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -790,16 +790,14 @@ static int msix_capability_init(struct p
 {
     struct arch_msix *msix = dev->msix;
     struct msi_desc *entry = NULL;
-    int vf;
     u16 control;
     u64 table_paddr;
     u32 table_offset;
-    u8 bir, pbus, pslot, pfunc;
     u16 seg = dev->seg;
     u8 bus = dev->bus;
     u8 slot = PCI_SLOT(dev->devfn);
     u8 func = PCI_FUNC(dev->devfn);
-    bool maskall = msix->host_maskall;
+    bool maskall = msix->host_maskall, zap_on_error = false;
 
     ASSERT(pcidevs_locked());
 
@@ -837,43 +835,45 @@ static int msix_capability_init(struct p
     /* Locate MSI-X table region */
     table_offset = pci_conf_read32(seg, bus, slot, func,
                                    msix_table_offset_reg(pos));
-    bir = (u8)(table_offset & PCI_MSIX_BIRMASK);
-    table_offset &= ~PCI_MSIX_BIRMASK;
+    if ( !msix->used_entries &&
+         (!msi ||
+          (is_hardware_domain(current->domain) &&
+           (dev->domain == current->domain || dev->domain == dom_io))) )
+    {
+        unsigned int bir = table_offset & PCI_MSIX_BIRMASK, pbus, pslot, pfunc;
+        int vf;
+        paddr_t pba_paddr;
+        unsigned int pba_offset;
 
-    if ( !dev->info.is_virtfn )
-    {
-        pbus = bus;
-        pslot = slot;
-        pfunc = func;
-        vf = -1;
-    }
-    else
-    {
-        pbus = dev->info.physfn.bus;
-        pslot = PCI_SLOT(dev->info.physfn.devfn);
-        pfunc = PCI_FUNC(dev->info.physfn.devfn);
-        vf = PCI_BDF2(dev->bus, dev->devfn);
-    }
-
-    table_paddr = read_pci_mem_bar(seg, pbus, pslot, pfunc, bir, vf);
-    WARN_ON(msi && msi->table_base != table_paddr);
-    if ( !table_paddr )
-    {
-        if ( !msi || !msi->table_base )
+        if ( !dev->info.is_virtfn )
         {
-            pci_conf_write16(seg, bus, slot, func, msix_control_reg(pos),
-                             control & ~PCI_MSIX_FLAGS_ENABLE);
-            xfree(entry);
-            return -ENXIO;
+            pbus = bus;
+            pslot = slot;
+            pfunc = func;
+            vf = -1;
+        }
+        else
+        {
+            pbus = dev->info.physfn.bus;
+            pslot = PCI_SLOT(dev->info.physfn.devfn);
+            pfunc = PCI_FUNC(dev->info.physfn.devfn);
+            vf = PCI_BDF2(dev->bus, dev->devfn);
         }
-        table_paddr = msi->table_base;
-    }
-    table_paddr += table_offset;
 
-    if ( !msix->used_entries )
-    {
-        u64 pba_paddr;
-        u32 pba_offset;
+        table_paddr = read_pci_mem_bar(seg, pbus, pslot, pfunc, bir, vf);
+        WARN_ON(msi && msi->table_base != table_paddr);
+        if ( !table_paddr )
+        {
+            if ( !msi || !msi->table_base )
+            {
+                pci_conf_write16(seg, bus, slot, func, msix_control_reg(pos),
+                                 control & ~PCI_MSIX_FLAGS_ENABLE);
+                xfree(entry);
+                return -ENXIO;
+            }
+            table_paddr = msi->table_base;
+        }
+        table_paddr += table_offset & ~PCI_MSIX_BIRMASK;
 
         msix->nr_entries = nr_entries;
         msix->table.first = PFN_DOWN(table_paddr);
@@ -894,7 +894,19 @@ static int msix_capability_init(struct p
                                   BITS_TO_LONGS(nr_entries) - 1);
         WARN_ON(rangeset_overlaps_range(mmio_ro_ranges, msix->pba.first,
                                         msix->pba.last));
+
+        zap_on_error = true;
     }
+    else if ( !msix->table.first )
+    {
+        pci_conf_write16(seg, bus, slot, func, msix_control_reg(pos),
+                         control);
+        xfree(entry);
+        return -ENODATA;
+    }
+    else
+        table_paddr = (msix->table.first << PAGE_SHIFT) +
+                      (table_offset & ~PCI_MSIX_BIRMASK & ~PAGE_MASK);
 
     if ( entry )
     {
@@ -905,8 +917,16 @@ static int msix_capability_init(struct p
 
         if ( idx < 0 )
         {
+            if ( zap_on_error )
+            {
+                msix->table.first = 0;
+                msix->pba.first = 0;
+
+                control &= ~PCI_MSIX_FLAGS_ENABLE;
+            }
+
             pci_conf_write16(seg, bus, slot, func, msix_control_reg(pos),
-                             control & ~PCI_MSIX_FLAGS_ENABLE);
+                             control);
             xfree(entry);
             return idx;
         }
@@ -1102,9 +1122,14 @@ static void _pci_cleanup_msix(struct arc
         if ( rangeset_remove_range(mmio_ro_ranges, msix->table.first,
                                    msix->table.last) )
             WARN();
+        msix->table.first = 0;
+        msix->table.last = 0;
+
         if ( rangeset_remove_range(mmio_ro_ranges, msix->pba.first,
                                    msix->pba.last) )
             WARN();
+        msix->pba.first = 0;
+        msix->pba.last = 0;
     }
 }
 

[-- Attachment #7: xsa337/xsa337-4.13-1.patch --]
[-- Type: application/octet-stream, Size: 2726 bytes --]

From: Roger Pau Monné <roger.pau@citrix.com>
Subject: x86/msi: get rid of read_msi_msg

It's safer and faster to just use the cached last written
(untranslated) MSI message stored in msi_desc for the single user that
calls read_msi_msg.

This also prevents relying on the data read from the device MSI
registers in order to figure out the index into the IOMMU interrupt
remapping table, which is not safe.

This is part of XSA-337.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -183,54 +183,6 @@ void msi_compose_msg(unsigned vector, co
                 MSI_DATA_VECTOR(vector);
 }

-static bool read_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
-{
-    switch ( entry->msi_attrib.type )
-    {
-    case PCI_CAP_ID_MSI:
-    {
-        struct pci_dev *dev = entry->dev;
-        int pos = entry->msi_attrib.pos;
-        uint16_t data;
-
-        msg->address_lo = pci_conf_read32(dev->sbdf,
-                                          msi_lower_address_reg(pos));
-        if ( entry->msi_attrib.is_64 )
-        {
-            msg->address_hi = pci_conf_read32(dev->sbdf,
-                                              msi_upper_address_reg(pos));
-            data = pci_conf_read16(dev->sbdf, msi_data_reg(pos, 1));
-        }
-        else
-        {
-            msg->address_hi = 0;
-            data = pci_conf_read16(dev->sbdf, msi_data_reg(pos, 0));
-        }
-        msg->data = data;
-        break;
-    }
-    case PCI_CAP_ID_MSIX:
-    {
-        void __iomem *base = entry->mask_base;
-
-        if ( unlikely(!msix_memory_decoded(entry->dev,
-                                           entry->msi_attrib.pos)) )
-            return false;
-        msg->address_lo = readl(base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET);
-        msg->address_hi = readl(base + PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET);
-        msg->data = readl(base + PCI_MSIX_ENTRY_DATA_OFFSET);
-        break;
-    }
-    default:
-        BUG();
-    }
-
-    if ( iommu_intremap )
-        iommu_read_msi_from_ire(entry, msg);
-
-    return true;
-}
-
 static int write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
 {
     entry->msg = *msg;
@@ -302,10 +254,7 @@ void set_msi_affinity(struct irq_desc *d

     ASSERT(spin_is_locked(&desc->lock));

-    memset(&msg, 0, sizeof(msg));
-    if ( !read_msi_msg(msi_desc, &msg) )
-        return;
-
+    msg = msi_desc->msg;
     msg.data &= ~MSI_DATA_VECTOR_MASK;
     msg.data |= MSI_DATA_VECTOR(desc->arch.vector);
     msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;

[-- Attachment #8: xsa337/xsa337-4.13-2.patch --]
[-- Type: application/octet-stream, Size: 6310 bytes --]

From: Jan Beulich <jbeulich@suse.com>
Subject: x86/MSI-X: restrict reading of table/PBA bases from BARs

When assigned to less trusted or un-trusted guests, devices may change
state behind our backs (they may e.g. get reset by means we may not know
about). Therefore we should avoid reading BARs from hardware once a
device is no longer owned by Dom0. Furthermore when we can't read a BAR,
or when we read zero, we shouldn't instead use the caller provided
address unless that caller can be trusted.

Re-arrange the logic in msix_capability_init() such that only Dom0 (and
only if the device isn't DomU-owned yet) or calls through
PHYSDEVOP_prepare_msix will actually result in the reading of the
respective BAR register(s). Additionally do so only as long as in-use
table entries are known (note that invocation of PHYSDEVOP_prepare_msix
counts as a "pseudo" entry). In all other uses the value already
recorded will get used instead.

Clear the recorded values in _pci_cleanup_msix() as well as on the one
affected error path. (Adjust this error path to also avoid blindly
disabling MSI-X when it was enabled on entry to the function.)

While moving around variable declarations (in many cases to reduce their
scopes), also adjust some of their types.

This is part of XSA-337.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -769,16 +769,14 @@ static int msix_capability_init(struct p
 {
     struct arch_msix *msix = dev->msix;
     struct msi_desc *entry = NULL;
-    int vf;
     u16 control;
     u64 table_paddr;
     u32 table_offset;
-    u8 bir, pbus, pslot, pfunc;
     u16 seg = dev->seg;
     u8 bus = dev->bus;
     u8 slot = PCI_SLOT(dev->devfn);
     u8 func = PCI_FUNC(dev->devfn);
-    bool maskall = msix->host_maskall;
+    bool maskall = msix->host_maskall, zap_on_error = false;
     unsigned int pos = pci_find_cap_offset(seg, bus, slot, func,
                                            PCI_CAP_ID_MSIX);
 
@@ -820,43 +818,45 @@ static int msix_capability_init(struct p
 
     /* Locate MSI-X table region */
     table_offset = pci_conf_read32(dev->sbdf, msix_table_offset_reg(pos));
-    bir = (u8)(table_offset & PCI_MSIX_BIRMASK);
-    table_offset &= ~PCI_MSIX_BIRMASK;
+    if ( !msix->used_entries &&
+         (!msi ||
+          (is_hardware_domain(current->domain) &&
+           (dev->domain == current->domain || dev->domain == dom_io))) )
+    {
+        unsigned int bir = table_offset & PCI_MSIX_BIRMASK, pbus, pslot, pfunc;
+        int vf;
+        paddr_t pba_paddr;
+        unsigned int pba_offset;
 
-    if ( !dev->info.is_virtfn )
-    {
-        pbus = bus;
-        pslot = slot;
-        pfunc = func;
-        vf = -1;
-    }
-    else
-    {
-        pbus = dev->info.physfn.bus;
-        pslot = PCI_SLOT(dev->info.physfn.devfn);
-        pfunc = PCI_FUNC(dev->info.physfn.devfn);
-        vf = PCI_BDF2(dev->bus, dev->devfn);
-    }
-
-    table_paddr = read_pci_mem_bar(seg, pbus, pslot, pfunc, bir, vf);
-    WARN_ON(msi && msi->table_base != table_paddr);
-    if ( !table_paddr )
-    {
-        if ( !msi || !msi->table_base )
+        if ( !dev->info.is_virtfn )
         {
-            pci_conf_write16(dev->sbdf, msix_control_reg(pos),
-                             control & ~PCI_MSIX_FLAGS_ENABLE);
-            xfree(entry);
-            return -ENXIO;
+            pbus = bus;
+            pslot = slot;
+            pfunc = func;
+            vf = -1;
+        }
+        else
+        {
+            pbus = dev->info.physfn.bus;
+            pslot = PCI_SLOT(dev->info.physfn.devfn);
+            pfunc = PCI_FUNC(dev->info.physfn.devfn);
+            vf = PCI_BDF2(dev->bus, dev->devfn);
         }
-        table_paddr = msi->table_base;
-    }
-    table_paddr += table_offset;
 
-    if ( !msix->used_entries )
-    {
-        u64 pba_paddr;
-        u32 pba_offset;
+        table_paddr = read_pci_mem_bar(seg, pbus, pslot, pfunc, bir, vf);
+        WARN_ON(msi && msi->table_base != table_paddr);
+        if ( !table_paddr )
+        {
+            if ( !msi || !msi->table_base )
+            {
+                pci_conf_write16(dev->sbdf, msix_control_reg(pos),
+                                 control & ~PCI_MSIX_FLAGS_ENABLE);
+                xfree(entry);
+                return -ENXIO;
+            }
+            table_paddr = msi->table_base;
+        }
+        table_paddr += table_offset & ~PCI_MSIX_BIRMASK;
 
         msix->table.first = PFN_DOWN(table_paddr);
         msix->table.last = PFN_DOWN(table_paddr +
@@ -875,7 +875,18 @@ static int msix_capability_init(struct p
                                   BITS_TO_LONGS(msix->nr_entries) - 1);
         WARN_ON(rangeset_overlaps_range(mmio_ro_ranges, msix->pba.first,
                                         msix->pba.last));
+
+        zap_on_error = true;
+    }
+    else if ( !msix->table.first )
+    {
+        pci_conf_write16(dev->sbdf, msix_control_reg(pos), control);
+        xfree(entry);
+        return -ENODATA;
     }
+    else
+        table_paddr = (msix->table.first << PAGE_SHIFT) +
+                      (table_offset & ~PCI_MSIX_BIRMASK & ~PAGE_MASK);
 
     if ( entry )
     {
@@ -886,8 +897,15 @@ static int msix_capability_init(struct p
 
         if ( idx < 0 )
         {
-            pci_conf_write16(dev->sbdf, msix_control_reg(pos),
-                             control & ~PCI_MSIX_FLAGS_ENABLE);
+            if ( zap_on_error )
+            {
+                msix->table.first = 0;
+                msix->pba.first = 0;
+
+                control &= ~PCI_MSIX_FLAGS_ENABLE;
+            }
+
+            pci_conf_write16(dev->sbdf, msix_control_reg(pos), control);
             xfree(entry);
             return idx;
         }
@@ -1076,9 +1094,14 @@ static void _pci_cleanup_msix(struct arc
         if ( rangeset_remove_range(mmio_ro_ranges, msix->table.first,
                                    msix->table.last) )
             WARN();
+        msix->table.first = 0;
+        msix->table.last = 0;
+
         if ( rangeset_remove_range(mmio_ro_ranges, msix->pba.first,
                                    msix->pba.last) )
             WARN();
+        msix->pba.first = 0;
+        msix->pba.last = 0;
     }
 }
 

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2020-09-22 13:41 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-22 13:37 Xen Security Advisory 337 v3 (CVE-2020-25595) - PCI passthrough code reading back hardware registers Xen.org security team

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.