xen-devel.lists.xenproject.org archive mirror
From: elena.ufimtseva@oracle.com
To: xen-devel@lists.xen.org
Cc: Elena Ufimtseva <elena.ufimtseva@oracle.com>,
	kevin.tian@intel.com, tim@xen.org, jbeulich@suse.com,
	yang.z.zhang@intel.com, boris.ostrovsky@oracle.com
Subject: [PATCH v10 5/5] iommu: add rmrr Xen command line option for extra rmrrs
Date: Mon, 13 Jul 2015 14:18:02 -0400
Message-ID: <1436811482-16113-6-git-send-email-elena.ufimtseva@oracle.com>
In-Reply-To: <1436811482-16113-1-git-send-email-elena.ufimtseva@oracle.com>

From: Elena Ufimtseva <elena.ufimtseva@oracle.com>

On some platforms RMRR regions may not be specified
in ACPI and thus will not be mapped 1:1 in dom0. This
causes IO Page Faults and prevents dom0 from booting
in PVH mode.
The new Xen command line option rmrr allows specifying
such devices and memory regions. These regions are added
to the list of RMRRs defined in ACPI if the device
is present in the system. As a result, the additional RMRRs
will be mapped 1:1 in dom0 with correct permissions.

The problems mentioned above were discovered during PVH work
with ThinkCentre M and Dell 5600T. No official documentation
has been found so far on which devices cause this and why.
Experiments show that ThinkCentre M USB devices with the debug
port enabled generate DMA read transactions to regions of
memory marked reserved in the host e820 map.
For Dell 5600T the device and faulting addresses have not been
identified yet.

For a detailed history of the discussion please see the following threads:
http://lists.Xen.org/archives/html/xen-devel/2015-02/msg01724.html
http://lists.Xen.org/archives/html/xen-devel/2015-01/msg02513.html

Format for rmrr Xen command line option:
rmrr=start<-end>=[s1]bdf1[,[s1]bdf2[,...]];start<-end>=[s2]bdf1[,[s2]bdf2[,...]]
If grub2 is used and multiple ranges are specified, ';' must be
quoted/escaped; refer to the grub2 manual for more information.
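
As a rough illustration of the start<-end> part of the syntax (this is
not the parser added by the patch), the standalone sketch below reads
one range and defaults the end to the start when it is omitted. The
names struct pfn_range and parse_range() are hypothetical, and libc
strtoul() stands in for Xen's simple_strtoul().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical holder for one inclusive page-frame range. */
struct pfn_range {
    unsigned long start, end;
};

/*
 * Parse "start" or "start-end" at s. On success fill *r and return a
 * pointer to the first unconsumed character (normally '='); return
 * NULL if a number is missing.
 */
static const char *parse_range(const char *s, struct pfn_range *r)
{
    char *next;

    r->start = strtoul(s, &next, 0);
    if ( next == s )
        return NULL;
    r->end = r->start;          /* a lone start describes a single page */

    if ( *next == '-' )
    {
        const char *cur = next + 1;

        r->end = strtoul(cur, &next, 0);
        if ( next == cur )
            return NULL;
    }

    return next;
}
```

With a made-up input such as "0xab000-0xab00f=00:1d.0" the function
yields start 0xab000 and end 0xab00f and returns a pointer to the '='
that introduces the device list. The patch stores these values as page
frame numbers (base_pfn/end_pfn).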

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 docs/misc/xen-command-line.markdown |  13 +++
 xen/drivers/passthrough/vtd/dmar.c  | 209 +++++++++++++++++++++++++++++++++++-
 2 files changed, 221 insertions(+), 1 deletion(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index aa684c0..f307f3d 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1197,6 +1197,19 @@ Specify the host reboot method.
 'efi' instructs Xen to reboot using the EFI reboot call (in EFI mode by
  default it will use that method first).
 
+### rmrr
+> `= start<-end>=[s1]bdf1[,[s1]bdf2[,...]];start<-end>=[s2]bdf1[,[s2]bdf2[,...]]`
+
+Define RMRR units that are missing from the ACPI table, along with the devices
+they belong to, and use them for 1:1 mapping. The end address can be omitted, in
+which case one page is mapped. Ranges are inclusive when both start and end are
+specified. If the segment of the first device is not specified, segment zero is
+used. If other segments are not specified, the first device's segment is used.
+If a segment is specified for a device other than the first and it does not
+match the one specified for the first device, an error is reported.
+Note: grub2 requires special characters, namely ';', to be escaped or quoted;
+refer to the grub2 documentation if multiple ranges are specified.
+
 ### ro-hpet
 > `= <boolean>`
 
diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
index 93f10fd..61e8f28 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -867,6 +867,145 @@ out:
     return ret;
 }
 
+#define MAX_EXTRA_RMRR_PAGES 16
+#define MAX_EXTRA_RMRR 10
+
+/* RMRR units derived from command line rmrr option. */
+#define MAX_EXTRA_RMRR_DEV 20
+struct extra_rmrr_unit {
+    struct list_head list;
+    unsigned long base_pfn, end_pfn;
+    unsigned int dev_count;
+    u32    sbdf[MAX_EXTRA_RMRR_DEV];
+};
+static __initdata unsigned int nr_rmrr;
+static struct __initdata extra_rmrr_unit extra_rmrr_units[MAX_EXTRA_RMRR];
+
+/* Macro for RMRR inclusive range formatting. */
+#define PRI_RMRR(s,e) "[%lx-%lx]"
+
+static void __init add_extra_rmrr(void)
+{
+    struct acpi_rmrr_unit *acpi_rmrr;
+    struct acpi_rmrr_unit *rmrru;
+    unsigned int dev, seg, i, j;
+    unsigned long pfn;
+    bool_t overlap;
+
+    for ( i = 0; i < nr_rmrr; i++ )
+    {
+        if ( extra_rmrr_units[i].base_pfn > extra_rmrr_units[i].end_pfn )
+        {
+            printk(XENLOG_ERR VTDPREFIX
+                   "Invalid RMRR Range "PRI_RMRR(s,e)"\n",
+                   extra_rmrr_units[i].base_pfn, extra_rmrr_units[i].end_pfn);
+            continue;
+        }
+
+        if ( extra_rmrr_units[i].end_pfn - extra_rmrr_units[i].base_pfn >=
+             MAX_EXTRA_RMRR_PAGES )
+        {
+            printk(XENLOG_ERR VTDPREFIX
+                   "RMRR range "PRI_RMRR(s,e)" exceeds "__stringify(MAX_EXTRA_RMRR_PAGES)" pages\n",
+                   extra_rmrr_units[i].base_pfn, extra_rmrr_units[i].end_pfn);
+            continue;
+        }
+
+        for ( j = 0; j < nr_rmrr; j++ )
+        {
+            if ( i != j &&
+                 extra_rmrr_units[i].base_pfn <= extra_rmrr_units[j].end_pfn &&
+                 extra_rmrr_units[j].base_pfn <= extra_rmrr_units[i].end_pfn )
+            {
+                printk(XENLOG_ERR VTDPREFIX
+                      "Overlapping RMRRs "PRI_RMRR(s,e)" and "PRI_RMRR(s,e)"\n",
+                      extra_rmrr_units[i].base_pfn, extra_rmrr_units[i].end_pfn,
+                      extra_rmrr_units[j].base_pfn, extra_rmrr_units[j].end_pfn);
+                break;
+            }
+        }
+        /* Broke out of the overlap loop check, continue with next rmrr. */
+        if ( j < nr_rmrr )
+            continue;
+        overlap = 0;
+        list_for_each_entry(rmrru, &acpi_rmrr_units, list)
+        {
+            if ( pfn_to_paddr(extra_rmrr_units[i].base_pfn) <= rmrru->end_address &&
+                 rmrru->base_address <= pfn_to_paddr(extra_rmrr_units[i].end_pfn) )
+            {
+                printk(XENLOG_ERR VTDPREFIX
+                       "Overlapping extra RMRRs "PRI_RMRR(s,e)" and ACPI RMRRs "PRI_RMRR(s,e)"\n",
+                       extra_rmrr_units[i].base_pfn,
+                       extra_rmrr_units[i].end_pfn,
+                       paddr_to_pfn(rmrru->base_address),
+                       paddr_to_pfn(rmrru->end_address));
+                overlap = 1;
+                break;
+            }
+        }
+        /* Continue to next RMRR if this one overlaps with one from ACPI. */
+        if ( overlap )
+            continue;
+
+        pfn = extra_rmrr_units[i].base_pfn;
+        do
+        {
+            if ( !mfn_valid(pfn) || (pfn >> (paddr_bits - PAGE_SHIFT)) )
+            {
+                printk(XENLOG_ERR VTDPREFIX
+                       "Invalid pfn in RMRR range "PRI_RMRR(s,e)"\n",
+                       extra_rmrr_units[i].base_pfn, extra_rmrr_units[i].end_pfn);
+                break;
+            }
+        } while ( pfn++ < extra_rmrr_units[i].end_pfn );
+        /* The range had invalid pfn as the loop was broken out before reaching end_pfn. */
+        if ( pfn <= extra_rmrr_units[i].end_pfn )
+            continue;
+
+        acpi_rmrr = xzalloc(struct acpi_rmrr_unit);
+        if ( !acpi_rmrr )
+            return;
+
+        acpi_rmrr->scope.devices = xmalloc_array(u16,
+                                                 extra_rmrr_units[i].dev_count);
+        if ( !acpi_rmrr->scope.devices )
+        {
+            xfree(acpi_rmrr);
+            return;
+        }
+
+        seg = 0;
+        for ( dev = 0; dev < extra_rmrr_units[i].dev_count; dev++ )
+        {
+            acpi_rmrr->scope.devices[dev] = extra_rmrr_units[i].sbdf[dev];
+            seg = seg | PCI_SEG(extra_rmrr_units[i].sbdf[dev]);
+        }
+        if ( seg != PCI_SEG(extra_rmrr_units[i].sbdf[0]) )
+        {
+            printk(XENLOG_ERR VTDPREFIX
+                   "Segments are not equal for RMRR range "PRI_RMRR(s,e)"\n",
+                   extra_rmrr_units[i].base_pfn, extra_rmrr_units[i].end_pfn);
+            scope_devices_free(&acpi_rmrr->scope);
+            xfree(acpi_rmrr);
+            continue;
+        }
+
+        acpi_rmrr->segment = seg;
+        acpi_rmrr->base_address = pfn_to_paddr(extra_rmrr_units[i].base_pfn);
+        acpi_rmrr->end_address = pfn_to_paddr(extra_rmrr_units[i].end_pfn + 1);
+        acpi_rmrr->scope.devices_cnt = extra_rmrr_units[i].dev_count;
+
+        if ( register_one_rmrr(acpi_rmrr) )
+        {
+            printk(XENLOG_ERR VTDPREFIX
+                   "Could not register RMRR range "PRI_RMRR(s,e)"\n",
+                   extra_rmrr_units[i].base_pfn, extra_rmrr_units[i].end_pfn);
+            scope_devices_free(&acpi_rmrr->scope);
+            xfree(acpi_rmrr);
+        }
+    }
+}
+
 #include <asm/tboot.h>
 /* ACPI tables may not be DMA protected by tboot, so use DMAR copy */
 /* SINIT saved in SinitMleData in TXT heap (which is DMA protected) */
@@ -876,6 +1015,7 @@ int __init acpi_dmar_init(void)
 {
     acpi_physical_address dmar_addr;
     acpi_native_uint dmar_len;
+    int ret;
 
     if ( ACPI_SUCCESS(acpi_get_table_phys(ACPI_SIG_DMAR, 0,
                                           &dmar_addr, &dmar_len)) )
@@ -886,7 +1026,10 @@ int __init acpi_dmar_init(void)
         dmar_table = __va(dmar_addr);
     }
 
-    return parse_dmar_table(acpi_parse_dmar);
+    ret = parse_dmar_table(acpi_parse_dmar);
+    add_extra_rmrr();
+
+    return ret;
 }
 
 void acpi_dmar_reinstate(void)
@@ -917,3 +1060,67 @@ int platform_supports_x2apic(void)
     unsigned int mask = ACPI_DMAR_INTR_REMAP | ACPI_DMAR_X2APIC_OPT_OUT;
     return cpu_has_x2apic && ((dmar_flags & mask) == ACPI_DMAR_INTR_REMAP);
 }
+
+/*
+ * Parse rmrr Xen command line options and add parsed devices and regions to the
+ * acpi_rmrr_unit list, to be mapped like RMRRs parsed from ACPI.
+ * Format:
+ * rmrr=start<-end>=[s1]bdf1[,[s1]bdf2[,...]];start<-end>=[s2]bdf1[,[s2]bdf2[,...]]
+ * If the segment of the first device is not specified, segment zero will be used.
+ * If other segments are not specified, first device segment will be used.
+ * If a segment is specified for other than the first device and it does not match
+ * the one specified for the first one, an error will be reported.
+ */
+static void __init parse_rmrr_param(const char *str)
+{
+    const char *s = str, *cur, *stmp;
+    unsigned int seg, bus, dev, func;
+    unsigned long start, end;
+
+    do {
+        start = simple_strtoul(cur = s, &s, 0);
+        if ( cur == s )
+            break;
+
+        if ( *s == '-' )
+        {
+            end = simple_strtoul(cur = s + 1, &s, 0);
+            if ( cur == s )
+                break;
+        }
+        else
+            end = start;
+
+        extra_rmrr_units[nr_rmrr].base_pfn = start;
+        extra_rmrr_units[nr_rmrr].end_pfn = end;
+        extra_rmrr_units[nr_rmrr].dev_count = 0;
+
+        if ( *s != '=' )
+            continue;
+
+        do {
+            bool_t default_segment = 0;
+
+            if ( *s == ';' )
+                break;
+            stmp = __parse_pci(s + 1, &seg, &bus, &dev, &func, &default_segment);
+            if ( !stmp )
+                break;
+
+            /* Unspecified segments default to the first device's segment. */
+            if ( extra_rmrr_units[nr_rmrr].dev_count && default_segment )
+                seg = PCI_SEG(extra_rmrr_units[nr_rmrr].sbdf[0]);
+
+            /* Keep sbdf's even if they differ and later report an error. */
+            extra_rmrr_units[nr_rmrr].sbdf[extra_rmrr_units[nr_rmrr].dev_count] = PCI_SBDF(seg, bus, dev, func);
+            extra_rmrr_units[nr_rmrr].dev_count++;
+            s = stmp;
+        } while ( (*s == ',' || *s ) &&
+                  extra_rmrr_units[nr_rmrr].dev_count < MAX_EXTRA_RMRR_DEV );
+
+        if ( extra_rmrr_units[nr_rmrr].dev_count )
+            nr_rmrr++;
+
+    } while ( *s++ == ';' && nr_rmrr < MAX_EXTRA_RMRR );
+}
+custom_param("rmrr", parse_rmrr_param);
-- 
2.1.3
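
For reference, the range checks in add_extra_rmrr() come down to the
usual closed-interval tests. The standalone sketch below restates them
outside of Xen; ranges_overlap() and range_too_large() are hypothetical
helper names, and only the MAX_EXTRA_RMRR_PAGES value is taken from the
patch.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_EXTRA_RMRR_PAGES 16 /* same value as in the patch */

/*
 * Two inclusive ranges [s1,e1] and [s2,e2] overlap if and only if each
 * one starts no later than the other one ends.
 */
static bool ranges_overlap(unsigned long s1, unsigned long e1,
                           unsigned long s2, unsigned long e2)
{
    return s1 <= e2 && s2 <= e1;
}

/*
 * An inclusive range covers e - s + 1 pages, so it is over the limit
 * once e - s reaches MAX_EXTRA_RMRR_PAGES. Assumes s <= e, which the
 * patch checks first.
 */
static bool range_too_large(unsigned long s, unsigned long e)
{
    return e - s >= MAX_EXTRA_RMRR_PAGES;
}
```

Because both ends are inclusive, two ranges that merely share a
boundary page count as overlapping, and a range of exactly 16 pages is
still accepted.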

Thread overview: 14+ messages
2015-07-13 18:17 [PATCH v10 0/5] iommu: add rmrr Xen command line option elena.ufimtseva
2015-07-13 18:17 ` [PATCH v10 1/5] dmar: device scope mem leak fix elena.ufimtseva
2015-07-13 18:17 ` [PATCH v10 2/5] iommu VT-d: separate rmrr addition function elena.ufimtseva
2015-07-13 18:18 ` [PATCH v10 3/5] pci: add wrapper for parse_pci elena.ufimtseva
2015-07-13 18:18 ` [PATCH v10 4/5] pci: add PCI_SBDF and PCI_SEG macros elena.ufimtseva
2015-07-13 18:18 ` elena.ufimtseva [this message]
2015-07-14 10:43   ` [PATCH v10 5/5] iommu: add rmrr Xen command line option for extra rmrrs Jan Beulich
2015-07-15  7:25     ` Jan Beulich
2015-07-15 15:27       ` Elena Ufimtseva
2015-07-15 16:08         ` Jan Beulich
2015-07-14 10:18 ` [PATCH v10 0/5] iommu: add rmrr Xen command line option Jan Beulich
2015-07-15 16:15 [PATCH v10 5/5] iommu: add rmrr Xen command line option for extra rmrrs Elena Ufimtseva
2015-07-16  8:02 ` Jan Beulich
2015-07-16 16:00 Elena Ufimtseva
