From: "Tian, Kevin" <kevin.tian@intel.com>
To: Jan Beulich <jbeulich@suse.com>, Paul Durrant <paul@xen.org>,
"Cooper, Andrew" <andrew.cooper3@citrix.com>
Cc: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Subject: RE: Ping: [PATCH v5] IOMMU: make DMA containment of quarantined devices optional
Date: Wed, 7 Jul 2021 01:34:14 +0000 [thread overview]
Message-ID: <BN9PR11MB54337D419466D0DE95E83F8D8C1A9@BN9PR11MB5433.namprd11.prod.outlook.com> (raw)
In-Reply-To: <54ef2e6e-b9d8-8fc8-897c-ca7c3fb8bc1d@suse.com>
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Tuesday, July 6, 2021 3:43 PM
>
> On 26.05.2021 10:19, Jan Beulich wrote:
> > IOMMU: make DMA containment of quarantined devices optional
> >
> > Containing still in flight DMA was introduced to work around certain
> > devices / systems hanging hard upon hitting a "not-present" IOMMU fault.
> > Passing through (such) devices (on such systems) is inherently insecure
> > (as guests could easily arrange for IOMMU faults of any kind to occur).
> > Defaulting to a mode where admins may not even become aware of issues
> > with devices can be considered undesirable. Therefore convert this mode
> > of operation to an optional one, not one enabled by default.
> >
> > This involves resurrecting code commit ea38867831da ("x86 / iommu: set
> > up a scratch page in the quarantine domain") did remove, in a slightly
> > extended and abstracted fashion. Here, instead of reintroducing a pretty
> > pointless use of "goto" in domain_context_unmap(), and instead of making
> > the function (at least temporarily) inconsistent, take the opportunity
> > and replace the other similarly pointless "goto" as well.
> >
> > In order to key the re-instated bypasses off of there (not) being a root
> > page table this further requires moving the allocate_domain_resources()
> > invocation from reassign_device() to amd_iommu_setup_domain_device()
> (or
> > else reassign_device() would allocate a root page table anyway); this is
> > benign to the second caller of the latter function.
> >
> > In VT-d's domain_context_unmap(), instead of adding yet another
> > "goto out" when all that's wanted is a "return", eliminate the "out"
> > label at the same time.
> >
> > Take the opportunity and also limit the control to builds supporting
> > PCI.
> >
> > Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> May I please ask for feedback here? While I consider it too late to
> get back fundamental objections (such should have been voiced
> earlier), I'm still willing to accept such if they come with an
> understandable reason and are backed by a majority, in which case
> I'd (not very happily) drop the patch despite my concerns with the
> original default chosen when the scratch-page variant of quarantining
> was introduced. But I'm not going to give up on this merely because
> of not getting any feedback at all; instead I'd then also have this
> fall under "lazy consensus", if need be.
>
> Jan
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
>
> > ---
> > v5: IOMMU_quarantine_fault -> IOMMU_quarantine_basic,
> > IOMMU_quarantine_write_fault -> IOMMU_quarantine_scratch_page.
> > Amend command line description to clarify tool stack based
> > quarantining mode when "iommu=no-quarantine". Fully
> > s/dummy/scratch/. Re-base.
> > v4: "full" -> "scratch_page". Duplicate Kconfig help text into command
> > line doc. Re-base.
> > v3: IOMMU_quarantine_basic -> IOMMU_quarantine_fault,
> > IOMMU_quarantine_full -> IOMMU_quarantine_write_fault. Kconfig
> > option (choice) to select default. Limit to HAS_PCI.
> > v2: Don't use true/false. Introduce QUARANTINE_SKIP() (albeit I'm not
> > really convinced this is an improvement). Add comment.
> >
> > --- a/docs/misc/xen-command-line.pandoc
> > +++ b/docs/misc/xen-command-line.pandoc
> > @@ -1364,7 +1364,7 @@ detection of systems known to misbehave
> > > Default: `new` unless directed-EOI is supported
> >
> > ### iommu
> > - = List of [ <bool>, verbose, debug, force, required, quarantine,
> > + = List of [ <bool>, verbose, debug, force, required, quarantine[=scratch-
> page],
> > sharept, intremap, intpost, crash-disable,
> > snoop, qinval, igfx, amd-iommu-perdev-intremap,
> > dom0-{passthrough,strict} ]
> > @@ -1402,11 +1402,32 @@ boolean (e.g. `iommu=no`) can override t
> > will prevent Xen from booting if IOMMUs aren't discovered and enabled
> > successfully.
> >
> > -* The `quarantine` boolean can be used to control Xen's behavior when
> > - de-assigning devices from guests. If enabled (the default), Xen always
> > +* The `quarantine` option can be used to control Xen's behavior when
> > + de-assigning devices from guests.
> > +
> > + When a PCI device is assigned to an untrusted domain, it is possible
> > + for that domain to program the device to DMA to an arbitrary address.
> > + The IOMMU is used to protect the host from malicious DMA by making
> > + sure that the device addresses can only target memory assigned to the
> > + guest. However, when the guest domain is torn down, assigning the
> > + device back to the hardware domain would allow any in-flight DMA to
> > + potentially target critical host data. To avoid this, quarantining
> > + should be enabled. Quarantining can be done in two ways: In its basic
> > + form, all in-flight DMA will simply be forced to encounter IOMMU
> > + faults. Since there are systems where doing so can cause host lockup,
> > + an alternative form is available where writes to memory will be made
> > + fault, but reads will be directed to a scratch page. The implication
> > + here is that such reads will go unnoticed, i.e. an admin may not
> > + become aware of the underlying problem.
> > +
> > + Therefore, if this option is set to true (the default), Xen always
> > quarantines such devices; they must be explicitly assigned back to Dom0
> > - before they can be used there again. If disabled, Xen will only
> > - quarantine devices the toolstack hass arranged for getting quarantined.
> > + before they can be used there again. If set to "scratch-page", still
> > + active DMA reads will additionally be directed to a "scratch" page. If
> > + set to false, Xen will only quarantine devices the toolstack has arranged
> > + for getting quarantined, and only in the "basic" form.
> > +
> > + This option is only valid on builds supporting PCI.
> >
> > * The `sharept` boolean controls whether the IOMMU pagetables are
> shared
> > with the CPU-side HAP pagetables, or allocated separately. Sharing
> > --- a/xen/drivers/passthrough/Kconfig
> > +++ b/xen/drivers/passthrough/Kconfig
> > @@ -39,3 +39,31 @@ endif
> >
> > config IOMMU_FORCE_PT_SHARE
> > bool
> > +
> > +choice
> > + prompt "IOMMU device quarantining default behavior"
> > + depends on HAS_PCI
> > + default IOMMU_QUARANTINE_BASIC
> > + ---help---
> > + When a PCI device is assigned to an untrusted domain, it is possible
> > + for that domain to program the device to DMA to an arbitrary
> address.
> > + The IOMMU is used to protect the host from malicious DMA by
> making
> > + sure that the device addresses can only target memory assigned to
> the
> > + guest. However, when the guest domain is torn down, assigning the
> > + device back to the hardware domain would allow any in-flight DMA
> to
> > + potentially target critical host data. To avoid this, quarantining
> > + should be enabled. Quarantining can be done in two ways: In its
> basic
> > + form, all in-flight DMA will simply be forced to encounter IOMMU
> > + faults. Since there are systems where doing so can cause host
> lockup,
> > + an alternative form is available where writes to memory will be
> made
> > + fault, but reads will be directed to a scratch page. The implication
> > + here is that such reads will go unnoticed, i.e. an admin may not
> > + become aware of the underlying problem.
> > +
> > + config IOMMU_QUARANTINE_NONE
> > + bool "none"
> > + config IOMMU_QUARANTINE_BASIC
> > + bool "basic"
> > + config IOMMU_QUARANTINE_SCRATCH_PAGE
> > + bool "scratch page"
> > +endchoice
> > --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> > +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> > @@ -25,6 +25,9 @@
> > #include "iommu.h"
> > #include "../ats.h"
> >
> > +/* dom_io is used as a sentinel for quarantined devices */
> > +#define QUARANTINE_SKIP(d) ((d) == dom_io && !dom_iommu(d)-
> >arch.amd.root_table)
> > +
> > static bool_t __read_mostly init_done;
> >
> > static const struct iommu_init_ops _iommu_init_ops;
> > @@ -81,19 +84,36 @@ int get_dma_requestor_id(uint16_t seg, u
> > return req_id;
> > }
> >
> > -static void amd_iommu_setup_domain_device(
> > +static int __must_check allocate_domain_resources(struct domain *d)
> > +{
> > + struct domain_iommu *hd = dom_iommu(d);
> > + int rc;
> > +
> > + spin_lock(&hd->arch.mapping_lock);
> > + rc = amd_iommu_alloc_root(d);
> > + spin_unlock(&hd->arch.mapping_lock);
> > +
> > + return rc;
> > +}
> > +
> > +static int __must_check amd_iommu_setup_domain_device(
> > struct domain *domain, struct amd_iommu *iommu,
> > uint8_t devfn, struct pci_dev *pdev)
> > {
> > struct amd_iommu_dte *table, *dte;
> > unsigned long flags;
> > - int req_id, valid = 1;
> > + int req_id, valid = 1, rc;
> > u8 bus = pdev->bus;
> > - const struct domain_iommu *hd = dom_iommu(domain);
> > + struct domain_iommu *hd = dom_iommu(domain);
> >
> > - BUG_ON( !hd->arch.amd.root_table ||
> > - !hd->arch.amd.paging_mode ||
> > - !iommu->dev_table.buffer );
> > + if ( QUARANTINE_SKIP(domain) )
> > + return 0;
> > +
> > + BUG_ON(!hd->arch.amd.paging_mode || !iommu->dev_table.buffer);
> > +
> > + rc = allocate_domain_resources(domain);
> > + if ( rc )
> > + return rc;
> >
> > if ( iommu_hwdom_passthrough && is_hardware_domain(domain) )
> > valid = 0;
> > @@ -151,6 +171,8 @@ static void amd_iommu_setup_domain_devic
> >
> > amd_iommu_flush_iotlb(devfn, pdev,
> INV_IOMMU_ALL_PAGES_ADDRESS, 0);
> > }
> > +
> > + return 0;
> > }
> >
> > int __init acpi_ivrs_init(void)
> > @@ -222,18 +244,6 @@ int amd_iommu_alloc_root(struct domain *
> > return 0;
> > }
> >
> > -static int __must_check allocate_domain_resources(struct domain *d)
> > -{
> > - struct domain_iommu *hd = dom_iommu(d);
> > - int rc;
> > -
> > - spin_lock(&hd->arch.mapping_lock);
> > - rc = amd_iommu_alloc_root(d);
> > - spin_unlock(&hd->arch.mapping_lock);
> > -
> > - return rc;
> > -}
> > -
> > static int amd_iommu_domain_init(struct domain *d)
> > {
> > struct domain_iommu *hd = dom_iommu(d);
> > @@ -283,6 +293,9 @@ static void amd_iommu_disable_domain_dev
> > int req_id;
> > u8 bus = pdev->bus;
> >
> > + if ( QUARANTINE_SKIP(domain) )
> > + return;
> > +
> > BUG_ON ( iommu->dev_table.buffer == NULL );
> > req_id = get_dma_requestor_id(iommu->seg, PCI_BDF2(bus, devfn));
> > table = iommu->dev_table.buffer;
> > @@ -349,11 +362,10 @@ static int reassign_device(struct domain
> > pdev->domain = target;
> > }
> >
> > - rc = allocate_domain_resources(target);
> > + rc = amd_iommu_setup_domain_device(target, iommu, devfn, pdev);
> > if ( rc )
> > return rc;
> >
> > - amd_iommu_setup_domain_device(target, iommu, devfn, pdev);
> > AMD_IOMMU_DEBUG("Re-assign %pp from dom%d to dom%d\n",
> > &pdev->sbdf, source->domain_id, target->domain_id);
> >
> > @@ -460,8 +472,7 @@ static int amd_iommu_add_device(u8 devfn
> > spin_unlock_irqrestore(&iommu->lock, flags);
> > }
> >
> > - amd_iommu_setup_domain_device(pdev->domain, iommu, devfn,
> pdev);
> > - return 0;
> > + return amd_iommu_setup_domain_device(pdev->domain, iommu,
> devfn, pdev);
> > }
> >
> > static int amd_iommu_remove_device(u8 devfn, struct pci_dev *pdev)
> > --- a/xen/drivers/passthrough/iommu.c
> > +++ b/xen/drivers/passthrough/iommu.c
> > @@ -31,9 +31,24 @@ bool_t __initdata iommu_enable = 1;
> > bool_t __read_mostly iommu_enabled;
> > bool_t __read_mostly force_iommu;
> > bool_t __read_mostly iommu_verbose;
> > -bool __read_mostly iommu_quarantine = true;
> > bool_t __read_mostly iommu_crash_disable;
> >
> > +#define IOMMU_quarantine_none 0 /* aka false */
> > +#define IOMMU_quarantine_basic 1 /* aka true */
> > +#define IOMMU_quarantine_scratch_page 2
> > +#ifdef CONFIG_HAS_PCI
> > +uint8_t __read_mostly iommu_quarantine =
> > +# if defined(CONFIG_IOMMU_QUARANTINE_NONE)
> > + IOMMU_quarantine_none;
> > +# elif defined(CONFIG_IOMMU_QUARANTINE_BASIC)
> > + IOMMU_quarantine_basic;
> > +# elif defined(CONFIG_IOMMU_QUARANTINE_SCRATCH_PAGE)
> > + IOMMU_quarantine_scratch_page;
> > +# endif
> > +#else
> > +# define iommu_quarantine IOMMU_quarantine_none
> > +#endif /* CONFIG_HAS_PCI */
> > +
> > static bool __hwdom_initdata iommu_hwdom_none;
> > bool __hwdom_initdata iommu_hwdom_strict;
> > bool __read_mostly iommu_hwdom_passthrough;
> > @@ -64,8 +79,12 @@ static int __init parse_iommu_param(cons
> > else if ( (val = parse_boolean("force", s, ss)) >= 0 ||
> > (val = parse_boolean("required", s, ss)) >= 0 )
> > force_iommu = val;
> > +#ifdef CONFIG_HAS_PCI
> > else if ( (val = parse_boolean("quarantine", s, ss)) >= 0 )
> > iommu_quarantine = val;
> > + else if ( ss == s + 15 && !strncmp(s, "quarantine=scratch-page", 23) )
> > + iommu_quarantine = IOMMU_quarantine_scratch_page;
> > +#endif
> > #ifdef CONFIG_X86
> > else if ( (val = parse_boolean("igfx", s, ss)) >= 0 )
> > iommu_igfx = val;
> > @@ -432,7 +451,7 @@ static int __init iommu_quarantine_init(
> > dom_io->options |= XEN_DOMCTL_CDF_iommu;
> >
> > rc = iommu_domain_init(dom_io, 0);
> > - if ( rc )
> > + if ( rc || iommu_quarantine < IOMMU_quarantine_scratch_page )
> > return rc;
> >
> > if ( !hd->platform_ops->quarantine_init )
> > --- a/xen/drivers/passthrough/vtd/iommu.c
> > +++ b/xen/drivers/passthrough/vtd/iommu.c
> > @@ -42,6 +42,9 @@
> > #include "vtd.h"
> > #include "../ats.h"
> >
> > +/* dom_io is used as a sentinel for quarantined devices */
> > +#define QUARANTINE_SKIP(d) ((d) == dom_io && !dom_iommu(d)-
> >arch.vtd.pgd_maddr)
> > +
> > struct mapped_rmrr {
> > struct list_head list;
> > u64 base, end;
> > @@ -1328,6 +1331,9 @@ int domain_context_mapping_one(
> > int rc, ret;
> > bool_t flush_dev_iotlb;
> >
> > + if ( QUARANTINE_SKIP(domain) )
> > + return 0;
> > +
> > ASSERT(pcidevs_locked());
> > spin_lock(&iommu->lock);
> > maddr = bus_to_context_maddr(iommu, bus);
> > @@ -1556,6 +1562,9 @@ int domain_context_unmap_one(
> > int iommu_domid, rc, ret;
> > bool_t flush_dev_iotlb;
> >
> > + if ( QUARANTINE_SKIP(domain) )
> > + return 0;
> > +
> > ASSERT(pcidevs_locked());
> > spin_lock(&iommu->lock);
> >
> > @@ -1617,7 +1626,7 @@ static int domain_context_unmap(struct d
> > {
> > struct acpi_drhd_unit *drhd;
> > struct vtd_iommu *iommu;
> > - int ret = 0;
> > + int ret;
> > u8 seg = pdev->seg, bus = pdev->bus, tmp_bus, tmp_devfn, secbus;
> > int found = 0;
> >
> > @@ -1632,14 +1641,12 @@ static int domain_context_unmap(struct d
> > if ( iommu_debug )
> > printk(VTDPREFIX "%pd:Hostbridge: skip %pp unmap\n",
> > domain, &PCI_SBDF3(seg, bus, devfn));
> > - if ( !is_hardware_domain(domain) )
> > - return -EPERM;
> > - goto out;
> > + return is_hardware_domain(domain) ? 0 : -EPERM;
> >
> > case DEV_TYPE_PCIe_BRIDGE:
> > case DEV_TYPE_PCIe2PCI_BRIDGE:
> > case DEV_TYPE_LEGACY_PCI_BRIDGE:
> > - goto out;
> > + return 0;
> >
> > case DEV_TYPE_PCIe_ENDPOINT:
> > if ( iommu_debug )
> > @@ -1681,10 +1688,12 @@ static int domain_context_unmap(struct d
> > default:
> > dprintk(XENLOG_ERR VTDPREFIX, "%pd:unknown(%u): %pp\n",
> > domain, pdev->type, &PCI_SBDF3(seg, bus, devfn));
> > - ret = -EINVAL;
> > - goto out;
> > + return -EINVAL;
> > }
> >
> > + if ( QUARANTINE_SKIP(domain) )
> > + return ret;
> > +
> > /*
> > * if no other devices under the same iommu owned by this domain,
> > * clear iommu in iommu_bitmap and clear domain_id in domid_bitmp
> > @@ -1719,7 +1728,6 @@ static int domain_context_unmap(struct d
> > iommu->domid_map[iommu_domid] = 0;
> > }
> >
> > -out:
> > return ret;
> > }
> >
> > --- a/xen/include/xen/iommu.h
> > +++ b/xen/include/xen/iommu.h
> > @@ -52,7 +52,9 @@ static inline bool_t dfn_eq(dfn_t x, dfn
> > }
> >
> > extern bool_t iommu_enable, iommu_enabled;
> > -extern bool force_iommu, iommu_quarantine, iommu_verbose;
> > +extern bool force_iommu, iommu_verbose;
> > +/* Boolean except for the specific purposes of
> drivers/passthrough/iommu.c. */
> > +extern uint8_t iommu_quarantine;
> >
> > #ifdef CONFIG_X86
> > extern enum __packed iommu_intremap {
> >
next prev parent reply other threads:[~2021-07-07 1:34 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-05-26 8:19 [PATCH v5] IOMMU: make DMA containment of quarantined devices optional Jan Beulich
2021-07-06 7:43 ` Ping: " Jan Beulich
2021-07-07 1:34 ` Tian, Kevin [this message]
2021-07-06 11:43 ` Paul Durrant
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=BN9PR11MB54337D419466D0DE95E83F8D8C1A9@BN9PR11MB5433.namprd11.prod.outlook.com \
--to=kevin.tian@intel.com \
--cc=andrew.cooper3@citrix.com \
--cc=jbeulich@suse.com \
--cc=paul@xen.org \
--cc=xen-devel@lists.xenproject.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).