* [PATCH v2 0/9] vpci: PCI config space emulation
@ 2017-04-20 15:17 Roger Pau Monne
  2017-04-20 15:17 ` [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap accesses to the PCI config space Roger Pau Monne
                   ` (8 more replies)
  0 siblings, 9 replies; 40+ messages in thread
From: Roger Pau Monne @ 2017-04-20 15:17 UTC (permalink / raw)
  To: xen-devel; +Cc: boris.ostrovsky

Hello,

The following series contains an implementation of handlers for the PCI
configuration space inside of Xen. This allows Xen to detect accesses to the
PCI configuration space and react accordingly.

Patch 1 implements the generic handlers for accesses to the PCI configuration
space together with a minimal user-space test harness that I've used during
development. Currently a per-device red-black tree is used to store the
handlers, indexed by their offset inside of the configuration space. Patch 1
also adds the x86 port IO traps and wires them into the newly introduced vPCI
dispatchers. Patch 2 adds handlers for the ECAM areas (as found on the MMCFG
ACPI table). Patches 3 and 4 are mostly code movement/refactoring in order to
implement support for BAR mapping in patch 5. Patch 6 allows Xen to mask
certain PCI capabilities on demand, which is used to mask MSI and MSI-X.
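
To illustrate the offset-indexed lookup, here is a minimal stand-alone sketch of
a comparator that treats any two overlapping byte ranges as equal, so a tree
search for a single byte finds the register covering it. The names (struct
range, ranges_overlap, range_cmp) are hypothetical and are not the ones used in
the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical byte range inside the config space: [offset, offset + size). */
struct range {
    unsigned int offset;
    unsigned int size;
};

/* Two half-open ranges overlap if each one starts before the other ends. */
static bool ranges_overlap(struct range a, struct range b)
{
    return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}

/*
 * Ordering usable by a binary search tree: overlapping ranges compare
 * equal (0), otherwise the range with the lower offset sorts first.
 */
static int range_cmp(struct range a, struct range b)
{
    assert(a.size && b.size);

    if ( ranges_overlap(a, b) )
        return 0;

    return a.offset < b.offset ? -1 : 1;
}
```

With such a comparator, inserting a range that compares equal to an existing
node can be rejected as an overlap, and the same function serves for lookups.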

Finally, patches 8 and 9 implement emulation of the MSI/MSI-X capabilities
inside of Xen, so that the interrupts are transparently routed to the guest.

This series is based on top of my previous "x86/dpci: bind legacy PCI
interrupts to PVHv2 Dom0". The branch containing the patches can be found at:

git://xenbits.xen.org/people/royger/xen.git vpci_v2

Note that this is only safe to use for the hardware domain (which is trusted);
any non-trusted domain will need many more traps before it can freely access
the PCI configuration space.

Thanks, Roger.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap accesses to the PCI config space
  2017-04-20 15:17 [PATCH v2 0/9] vpci: PCI config space emulation Roger Pau Monne
@ 2017-04-20 15:17 ` Roger Pau Monne
  2017-04-21 16:07   ` Paul Durrant
  2017-04-21 16:23   ` Paul Durrant
  2017-04-20 15:17 ` [PATCH v2 2/9] x86/ecam: add handlers for the PVH Dom0 MMCFG areas Roger Pau Monne
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 40+ messages in thread
From: Roger Pau Monne @ 2017-04-20 15:17 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, Paul Durrant, Jan Beulich,
	boris.ostrovsky, Roger Pau Monne

This functionality is going to reside in vpci.c (and the corresponding vpci.h
header), and should be arch-agnostic. The handlers introduced in this patch
set up the basic functionality required to trap accesses to the PCI config
space, and allow decoding the address and finding the corresponding handler
that should handle the access (although no handlers are implemented).

Note that the traps to the PCI IO port registers (0xcf8/0xcfc) are set up
inside of an x86 HVM file, since that code is not shared with other arches.
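
As an illustration of the decoding involved, the following stand-alone sketch
splits a value latched in 0xcf8 into bus, device, function and register,
following the standard layout of the legacy configuration mechanism (bus in
bits 23:16, device in bits 15:11, function in bits 10:8, dword-aligned register
in bits 7:2). The low two bits of the register come from the offset into the
0xcfc data port. The struct and function names are hypothetical and do not
appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for a decoded config space address. */
struct cf8_addr {
    unsigned int bus;
    unsigned int dev;
    unsigned int func;
    unsigned int reg;
};

/*
 * Decode the address latched in 0xcf8 for an access to data port 'port',
 * which must be one of 0xcfc..0xcff.
 */
static struct cf8_addr decode_cf8(uint32_t cf8, uint16_t port)
{
    struct cf8_addr a;

    assert((port & 0xfffc) == 0xcfc);

    a.bus  = (cf8 >> 16) & 0xff;
    a.dev  = (cf8 >> 11) & 0x1f;
    a.func = (cf8 >> 8) & 0x7;
    /* Bits 7:2 select the dword, the port offset selects the byte in it. */
    a.reg  = (cf8 & 0xfc) | (port & 3);

    return a;
}
```

For example, a 2-byte access to port 0xcfe with 0x80011344 latched in 0xcf8
targets bus 1, device 2, function 3, register 0x46.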

A new XEN_X86_EMU_VPCI x86 domain flag is added in order to signal to Xen
whether a domain should use the newly introduced vPCI handlers; this is only
enabled for PVH Dom0 at the moment.

A very simple user-space test is also provided, so that the basic functionality
of the vPCI traps can be asserted. This has proven quite helpful during
development, since the logic to handle partial accesses or accesses that span
multiple registers is not trivial.

The handlers for the registers are added to a red-black tree, which indexes
them based on their offset. Since Xen needs to handle partial accesses to the
registers and accesses that span multiple registers, the logic in
xen_vpci_{read/write} is kind of convoluted. I've tried to properly comment it
in order to make it easier to understand.
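
To make the expected result of such accesses concrete, here is a simplified
stand-alone model of a read. It is not the algorithm used in
xen_vpci_read: it uses a flat array instead of a red-black tree and composes
the result one byte at a time, taking each byte from an emulated register if
one covers it and from the hardware otherwise. All names (struct emu_reg,
hw_byte, read_byte, config_read) are hypothetical, and the hardware stand-in
returns all ones like the dummy helpers in the test harness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical emulated register: 'size' bytes starting at 'offset'. */
struct emu_reg {
    unsigned int offset;
    unsigned int size;
    uint32_t value;
};

/* Stand-in for a pass-through hardware read: every byte reads as 0xff. */
static uint32_t hw_byte(unsigned int offset)
{
    (void)offset;
    return 0xff;
}

/* Fetch one byte, from the emulated register covering it or from hardware. */
static uint32_t read_byte(const struct emu_reg *regs, size_t nr,
                          unsigned int offset)
{
    for ( size_t i = 0; i < nr; i++ )
        if ( offset >= regs[i].offset &&
             offset < regs[i].offset + regs[i].size )
            return (regs[i].value >> (8 * (offset - regs[i].offset))) & 0xff;

    return hw_byte(offset);
}

/* Compose a 1, 2 or 4 byte read that stays within one double-word. */
static uint32_t config_read(const struct emu_reg *regs, size_t nr,
                            unsigned int offset, unsigned int size)
{
    uint32_t data = 0;

    assert(size == 1 || size == 2 || size == 4);
    assert((offset & 3) + size <= 4);

    for ( unsigned int i = 0; i < size; i++ )
        data |= read_byte(regs, nr, offset + i) << (8 * i);

    return data;
}
```

With 1-byte registers at offsets 5, 6 and 7 holding 0xba, 0xba and 0xbd, and
nothing registered at offset 4, a 4-byte read at offset 4 yields 0xbdbabaff,
which is the value the test harness checks for.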

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Paul Durrant <paul.durrant@citrix.com>
---
Changes since v1:
 - Allow access to cross a word-boundary.
 - Add locking.
 - Add cleanup to xen_vpci_add_handlers in case of failure.
---
 .gitignore                        |   4 +
 tools/libxl/libxl_x86.c           |   2 +-
 tools/tests/Makefile              |   1 +
 tools/tests/vpci/Makefile         |  45 ++++
 tools/tests/vpci/emul.h           | 107 +++++++++
 tools/tests/vpci/main.c           | 206 +++++++++++++++++
 xen/arch/arm/xen.lds.S            |   3 +
 xen/arch/x86/domain.c             |  18 +-
 xen/arch/x86/hvm/hvm.c            |   2 +
 xen/arch/x86/hvm/io.c             | 135 +++++++++++
 xen/arch/x86/setup.c              |   3 +-
 xen/arch/x86/xen.lds.S            |   3 +
 xen/drivers/Makefile              |   2 +-
 xen/drivers/passthrough/pci.c     |   3 +
 xen/drivers/vpci/Makefile         |   1 +
 xen/drivers/vpci/vpci.c           | 474 ++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/domain.h      |   1 +
 xen/include/asm-x86/hvm/domain.h  |   3 +
 xen/include/asm-x86/hvm/io.h      |   3 +
 xen/include/public/arch-x86/xen.h |   5 +-
 xen/include/xen/pci.h             |   4 +
 xen/include/xen/vpci.h            |  66 ++++++
 22 files changed, 1083 insertions(+), 8 deletions(-)
 create mode 100644 tools/tests/vpci/Makefile
 create mode 100644 tools/tests/vpci/emul.h
 create mode 100644 tools/tests/vpci/main.c
 create mode 100644 xen/drivers/vpci/Makefile
 create mode 100644 xen/drivers/vpci/vpci.c
 create mode 100644 xen/include/xen/vpci.h

diff --git a/.gitignore b/.gitignore
index 74747cb7e7..ebafba25b5 100644
--- a/.gitignore
+++ b/.gitignore
@@ -236,6 +236,10 @@ tools/tests/regression/build/*
 tools/tests/regression/downloads/*
 tools/tests/mem-sharing/memshrtool
 tools/tests/mce-test/tools/xen-mceinj
+tools/tests/vpci/rbtree.[hc]
+tools/tests/vpci/vpci.[hc]
+tools/tests/vpci/test_vpci.out
+tools/tests/vpci/test_vpci
 tools/xcutils/lsevtchn
 tools/xcutils/readnotes
 tools/xenbackendd/_paths.h
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index 455f6f0bed..dd7fc78a99 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -11,7 +11,7 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
     if (d_config->c_info.type == LIBXL_DOMAIN_TYPE_HVM) {
         if (d_config->b_info.device_model_version !=
             LIBXL_DEVICE_MODEL_VERSION_NONE) {
-            xc_config->emulation_flags = XEN_X86_EMU_ALL;
+            xc_config->emulation_flags = (XEN_X86_EMU_ALL & ~XEN_X86_EMU_VPCI);
         } else if (libxl_defbool_val(d_config->b_info.u.hvm.apic)) {
             /*
              * HVM guests without device model may want
diff --git a/tools/tests/Makefile b/tools/tests/Makefile
index 639776130b..5cfe781e62 100644
--- a/tools/tests/Makefile
+++ b/tools/tests/Makefile
@@ -13,6 +13,7 @@ endif
 SUBDIRS-$(CONFIG_X86) += x86_emulator
 SUBDIRS-y += xen-access
 SUBDIRS-y += xenstore
+SUBDIRS-$(CONFIG_HAS_PCI) += vpci
 
 .PHONY: all clean install distclean
 all clean distclean: %: subdirs-%
diff --git a/tools/tests/vpci/Makefile b/tools/tests/vpci/Makefile
new file mode 100644
index 0000000000..7969fcbd82
--- /dev/null
+++ b/tools/tests/vpci/Makefile
@@ -0,0 +1,45 @@
+
+XEN_ROOT=$(CURDIR)/../../..
+include $(XEN_ROOT)/tools/Rules.mk
+
+TARGET := test_vpci
+
+.PHONY: all
+all: $(TARGET)
+
+.PHONY: run
+run: $(TARGET)
+	./$(TARGET) > $(TARGET).out
+
+$(TARGET): vpci.c vpci.h rbtree.c rbtree.h
+	$(HOSTCC) -g -o $@ vpci.c main.c rbtree.c
+
+.PHONY: clean
+clean:
+	rm -rf $(TARGET) $(TARGET).out *.o *~ vpci.h vpci.c rbtree.c rbtree.h
+
+.PHONY: distclean
+distclean: clean
+
+.PHONY: install
+install:
+
+vpci.h: $(XEN_ROOT)/xen/include/xen/vpci.h
+	sed -e '/#include/d' <$< >$@
+
+vpci.c: $(XEN_ROOT)/xen/drivers/vpci/vpci.c
+	# Trick the compiler so it doesn't complain about missing symbols
+	sed -e '/#include/d' \
+	    -e '1s;^;#include "emul.h"\
+	             const vpci_register_init_t __start_vpci_array[1]\;\
+	             const vpci_register_init_t __end_vpci_array[1]\;\
+	             ;' <$< >$@
+
+rbtree.h: $(XEN_ROOT)/xen/include/xen/rbtree.h
+	sed -e '/#include/d' <$< >$@
+
+rbtree.c: $(XEN_ROOT)/xen/common/rbtree.c
+	sed -e "/#include/d" \
+	    -e '1s;^;#include "emul.h"\
+	             ;' <$< >$@
+
diff --git a/tools/tests/vpci/emul.h b/tools/tests/vpci/emul.h
new file mode 100644
index 0000000000..85897ed43b
--- /dev/null
+++ b/tools/tests/vpci/emul.h
@@ -0,0 +1,107 @@
+/*
+ * Unit tests for the generic vPCI handler code.
+ *
+ * Copyright (C) 2017 Citrix Systems R&D
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _TEST_VPCI_
+#define _TEST_VPCI_
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <errno.h>
+#include <assert.h>
+
+#define container_of(ptr, type, member) ({                      \
+        typeof( ((type *)0)->member ) *__mptr = (ptr);          \
+        (type *)( (char *)__mptr - offsetof(type,member) );})
+
+#include "rbtree.h"
+
+struct pci_dev {
+    struct domain *domain;
+    struct vpci *vpci;
+};
+
+struct domain {
+    struct pci_dev pdev;
+};
+
+struct vcpu
+{
+    struct domain *domain;
+};
+
+extern struct vcpu v;
+
+#define spin_lock(x)
+#define spin_unlock(x)
+#define spin_is_locked(x) true
+
+#define current (&v)
+
+#define has_vpci(d) true
+
+#include "vpci.h"
+
+#define xzalloc(type) (type *)calloc(1, sizeof(type))
+#define xfree(p) free(p)
+
+#define EXPORT_SYMBOL(x)
+
+#define pci_get_pdev_by_domain(d, ...) &(d)->pdev
+
+#define atomic_read(x) 1
+
+/* Dummy native helpers. Writes are ignored, reads return 1's. */
+#define pci_conf_read8(...) (0xff)
+#define pci_conf_read16(...) (0xffff)
+#define pci_conf_read32(...) (0xffffffff)
+#define pci_conf_write8(...)
+#define pci_conf_write16(...)
+#define pci_conf_write32(...)
+
+#define BUG() assert(0)
+#define ASSERT_UNREACHABLE() assert(0)
+#define ASSERT(x) assert(x)
+
+#ifdef _LP64
+#define BITS_PER_LONG 64
+#else
+#define BITS_PER_LONG 32
+#endif
+#define GENMASK(h, l) \
+    (((~0UL) << (l)) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
+
+#define min(x,y) ({ \
+        const typeof(x) _x = (x);       \
+        const typeof(y) _y = (y);       \
+        (void) (&_x == &_y);            \
+        _x < _y ? _x : _y; })
+
+#endif
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
+
diff --git a/tools/tests/vpci/main.c b/tools/tests/vpci/main.c
new file mode 100644
index 0000000000..0fc63de038
--- /dev/null
+++ b/tools/tests/vpci/main.c
@@ -0,0 +1,206 @@
+/*
+ * Unit tests for the generic vPCI handler code.
+ *
+ * Copyright (C) 2017 Citrix Systems R&D
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "emul.h"
+
+/* Single vcpu (current), and single domain with a single PCI device. */
+static struct vpci vpci = {
+    .handlers = RB_ROOT,
+};
+
+static struct domain d = {
+    .pdev.domain = &d,
+    .pdev.vpci = &vpci,
+};
+
+struct vcpu v = { .domain = &d };
+
+/* Dummy hooks, write stores data, read fetches it. */
+static int vpci_read8(struct pci_dev *pdev, unsigned int reg,
+                      union vpci_val *val, void *data)
+{
+    uint8_t *priv = data;
+
+    val->half_word = *priv;
+    return 0;
+}
+
+static int vpci_write8(struct pci_dev *pdev, unsigned int reg,
+                       union vpci_val val, void *data)
+{
+    uint8_t *priv = data;
+
+    *priv = val.half_word;
+    return 0;
+}
+
+static int vpci_read16(struct pci_dev *pdev, unsigned int reg,
+                       union vpci_val *val, void *data)
+{
+    uint16_t *priv = data;
+
+    val->word = *priv;
+    return 0;
+}
+
+static int vpci_write16(struct pci_dev *pdev, unsigned int reg,
+                        union vpci_val val, void *data)
+{
+    uint16_t *priv = data;
+
+    *priv = val.word;
+    return 0;
+}
+
+static int vpci_read32(struct pci_dev *pdev, unsigned int reg,
+                       union vpci_val *val, void *data)
+{
+    uint32_t *priv = data;
+
+    val->double_word = *priv;
+    return 0;
+}
+
+static int vpci_write32(struct pci_dev *pdev, unsigned int reg,
+                        union vpci_val val, void *data)
+{
+    uint32_t *priv = data;
+
+    *priv = val.double_word;
+    return 0;
+}
+
+#define VPCI_READ(reg, size, data) \
+    assert(!xen_vpci_read(0, 0, 0, reg, size, data))
+
+#define VPCI_READ_CHECK(reg, size, expected) ({ \
+    uint32_t val;                               \
+    VPCI_READ(reg, size, &val);                 \
+    assert(val == expected);                    \
+    })
+
+#define VPCI_WRITE(reg, size, data) \
+    assert(!xen_vpci_write(0, 0, 0, reg, size, data))
+
+#define VPCI_CHECK_REG(reg, size, data) ({      \
+    VPCI_WRITE(reg, size, data);                \
+    VPCI_READ_CHECK(reg, size, data);           \
+    })
+
+#define VPCI_ADD_REG(fread, fwrite, off, size, store)                         \
+    assert(!xen_vpci_add_register(&d.pdev, fread, fwrite, off, size, &store)) \
+
+#define VPCI_ADD_INVALID_REG(fread, fwrite, off, size)                      \
+    assert(xen_vpci_add_register(&d.pdev, fread, fwrite, off, size, NULL))  \
+
+int
+main(int argc, char **argv)
+{
+    /* Index storage by offset. */
+    uint32_t r0 = 0xdeadbeef;
+    uint8_t r5 = 0xef;
+    uint8_t r6 = 0xbe;
+    uint8_t r7 = 0xef;
+    uint16_t r12 = 0x8696;
+    int rc;
+
+    VPCI_ADD_REG(vpci_read32, vpci_write32, 0, 4, r0);
+    VPCI_READ_CHECK(0, 4, 0xdeadbeef);
+    VPCI_CHECK_REG(0, 4, 0xbcbcbcbc);
+
+    VPCI_ADD_REG(vpci_read8, vpci_write8, 5, 1, r5);
+    VPCI_READ_CHECK(5, 1, 0xef);
+    VPCI_CHECK_REG(5, 1, 0xba);
+
+    VPCI_ADD_REG(vpci_read8, vpci_write8, 6, 1, r6);
+    VPCI_READ_CHECK(6, 1, 0xbe);
+    VPCI_CHECK_REG(6, 1, 0xba);
+
+    VPCI_ADD_REG(vpci_read8, vpci_write8, 7, 1, r7);
+    VPCI_READ_CHECK(7, 1, 0xef);
+    VPCI_CHECK_REG(7, 1, 0xbd);
+
+    VPCI_ADD_REG(vpci_read16, vpci_write16, 12, 2, r12);
+    VPCI_READ_CHECK(12, 2, 0x8696);
+    VPCI_READ_CHECK(12, 4, 0xffff8696);
+
+    /*
+     * At this point we have the following layout:
+     *
+     * 32    24    16     8     0
+     *  +-----+-----+-----+-----+
+     *  |          r0           | 0
+     *  +-----+-----+-----+-----+
+     *  | r7  |  r6 |  r5 |/////| 32
+     *  +-----+-----+-----+-----|
+     *  |///////////////////////| 64
+     *  +-----------+-----------+
+     *  |///////////|    r12    | 96
+     *  +-----------+-----------+
+     *             ...
+     *  / = empty.
+     */
+
+    /* Try to add an overlapping register handler. */
+    VPCI_ADD_INVALID_REG(vpci_read32, vpci_write32, 4, 4);
+
+    /* Try to add a non-aligned register. */
+    VPCI_ADD_INVALID_REG(vpci_read16, vpci_write16, 15, 2);
+
+    /* Try to add a register with wrong size. */
+    VPCI_ADD_INVALID_REG(vpci_read16, vpci_write16, 8, 3);
+
+    /* Try to add a register with missing handlers. */
+    VPCI_ADD_INVALID_REG(vpci_read16, NULL, 8, 2);
+    VPCI_ADD_INVALID_REG(NULL, vpci_write16, 8, 2);
+
+    /* Read/write of unset register. */
+    VPCI_READ_CHECK(8, 4, 0xffffffff);
+    VPCI_READ_CHECK(8, 2, 0xffff);
+    VPCI_READ_CHECK(8, 1, 0xff);
+    VPCI_WRITE(10, 2, 0xbeef);
+    VPCI_READ_CHECK(10, 2, 0xffff);
+
+    /* Read of multiple registers */
+    VPCI_CHECK_REG(7, 1, 0xbd);
+    VPCI_READ_CHECK(4, 4, 0xbdbabaff);
+
+    /* Partial read of a register. */
+    VPCI_CHECK_REG(0, 4, 0x1a1b1c1d);
+    VPCI_READ_CHECK(2, 1, 0x1b);
+    VPCI_READ_CHECK(6, 2, 0xbdba);
+
+    /* Write of multiple registers. */
+    VPCI_CHECK_REG(4, 4, 0xaabbccff);
+
+    /* Partial write of a register. */
+    VPCI_CHECK_REG(2, 1, 0xfe);
+    VPCI_CHECK_REG(6, 2, 0xfebc);
+
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
+
diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
index 44bd3bf0ce..41bf9dfaf3 100644
--- a/xen/arch/arm/xen.lds.S
+++ b/xen/arch/arm/xen.lds.S
@@ -79,6 +79,9 @@ SECTIONS
        __start_schedulers_array = .;
        *(.data.schedulers)
        __end_schedulers_array = .;
+       __start_vpci_array = .;
+       *(.data.vpci)
+       __end_vpci_array = .;
        *(.data.rel)
        *(.data.rel.*)
        CONSTRUCTORS
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 90e2b1f82a..f74020facc 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -500,11 +500,21 @@ static bool emulation_flags_ok(const struct domain *d, uint32_t emflags)
     if ( is_hvm_domain(d) )
     {
         if ( is_hardware_domain(d) &&
-             emflags != (XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC) )
-            return false;
-        if ( !is_hardware_domain(d) && emflags &&
-             emflags != XEN_X86_EMU_ALL && emflags != XEN_X86_EMU_LAPIC )
+             emflags != (XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC|
+                         XEN_X86_EMU_VPCI) )
             return false;
+        if ( !is_hardware_domain(d) )
+        {
+            switch ( emflags )
+            {
+            case XEN_X86_EMU_ALL & ~XEN_X86_EMU_VPCI:
+            case XEN_X86_EMU_LAPIC:
+            case 0:
+                break;
+            default:
+                return false;
+            }
+        }
     }
     else if ( emflags != 0 && emflags != XEN_X86_EMU_PIT )
     {
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index a441955322..7f3322ede6 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -37,6 +37,7 @@
 #include <xen/vm_event.h>
 #include <xen/monitor.h>
 #include <xen/warning.h>
+#include <xen/vpci.h>
 #include <asm/shadow.h>
 #include <asm/hap.h>
 #include <asm/current.h>
@@ -655,6 +656,7 @@ int hvm_domain_initialise(struct domain *d)
         d->arch.hvm_domain.io_bitmap = hvm_io_bitmap;
 
     register_g2m_portio_handler(d);
+    register_vpci_portio_handler(d);
 
     hvm_ioreq_init(d);
 
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 214ab307c4..15048da556 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -25,6 +25,7 @@
 #include <xen/trace.h>
 #include <xen/event.h>
 #include <xen/hypercall.h>
+#include <xen/vpci.h>
 #include <asm/current.h>
 #include <asm/cpufeature.h>
 #include <asm/processor.h>
@@ -256,6 +257,140 @@ void register_g2m_portio_handler(struct domain *d)
     handler->ops = &g2m_portio_ops;
 }
 
+/* Do some sanity checks. */
+static int vpci_access_check(unsigned int reg, unsigned int len)
+{
+    /* Check access size. */
+    if ( len != 1 && len != 2 && len != 4 )
+    {
+        gdprintk(XENLOG_WARNING, "invalid length (reg: %#x, len: %u)\n",
+                 reg, len);
+        return -EINVAL;
+    }
+
+    /* Check if access crosses a double-word boundary. */
+    if ( (reg & 3) + len > 4 )
+    {
+        gdprintk(XENLOG_WARNING,
+                 "invalid access across double-word boundary (reg: %#x, len: %u)\n",
+                 reg, len);
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+/* Helper to decode a PCI address. */
+static void vpci_decode_addr(unsigned int cf8, unsigned int addr,
+                             unsigned int *bus, unsigned int *devfn,
+                             unsigned int *reg)
+{
+    unsigned long bdf;
+
+    bdf = CF8_BDF(cf8);
+    *bus = PCI_BUS(bdf);
+    *devfn = PCI_DEVFN(PCI_SLOT(bdf), PCI_FUNC(bdf));
+    /*
+     * NB: the lower 2 bits of the register address are fetched from the
+     * offset into the 0xcfc register when reading/writing to it.
+     */
+    *reg = (cf8 & 0xfc) | (addr & 3);
+}
+
+/* vPCI config space IO ports handlers (0xcf8/0xcfc). */
+static bool_t vpci_portio_accept(const struct hvm_io_handler *handler,
+                                 const ioreq_t *p)
+{
+    return (p->addr == 0xcf8 && p->size == 4) || (p->addr & 0xfffc) == 0xcfc;
+}
+
+static int vpci_portio_read(const struct hvm_io_handler *handler,
+                            uint64_t addr, uint32_t size, uint64_t *data)
+{
+    struct domain *d = current->domain;
+    unsigned int bus, devfn, reg;
+    uint32_t data32;
+    int rc;
+
+    vpci_lock(d);
+    if ( addr == 0xcf8 )
+    {
+        ASSERT(size == 4);
+        *data = d->arch.hvm_domain.pci_cf8;
+        vpci_unlock(d);
+        return X86EMUL_OKAY;
+    }
+
+    /* Decode the PCI address. */
+    vpci_decode_addr(d->arch.hvm_domain.pci_cf8, addr, &bus, &devfn, &reg);
+
+    if ( vpci_access_check(reg, size) || reg >= 0xff )
+    {
+        vpci_unlock(d);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    rc = xen_vpci_read(0, bus, devfn, reg, size, &data32);
+    if ( !rc )
+        *data = data32;
+    vpci_unlock(d);
+
+     return rc ? X86EMUL_UNHANDLEABLE : X86EMUL_OKAY;
+}
+
+static int vpci_portio_write(const struct hvm_io_handler *handler,
+                             uint64_t addr, uint32_t size, uint64_t data)
+{
+    struct domain *d = current->domain;
+    unsigned int bus, devfn, reg;
+    int rc;
+
+    vpci_lock(d);
+    if ( addr == 0xcf8 )
+    {
+        ASSERT(size == 4);
+        d->arch.hvm_domain.pci_cf8 = data;
+        vpci_unlock(d);
+        return X86EMUL_OKAY;
+    }
+
+    /* Decode the PCI address. */
+    vpci_decode_addr(d->arch.hvm_domain.pci_cf8, addr, &bus, &devfn, &reg);
+
+    if ( vpci_access_check(reg, size) || reg >= 0xff )
+    {
+        vpci_unlock(d);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    rc = xen_vpci_write(0, bus, devfn, reg, size, data);
+    vpci_unlock(d);
+
+    return rc ? X86EMUL_UNHANDLEABLE : X86EMUL_OKAY;
+}
+
+static const struct hvm_io_ops vpci_portio_ops = {
+    .accept = vpci_portio_accept,
+    .read = vpci_portio_read,
+    .write = vpci_portio_write,
+};
+
+void register_vpci_portio_handler(struct domain *d)
+{
+    struct hvm_io_handler *handler;
+
+    if ( !has_vpci(d) )
+        return;
+
+    handler = hvm_next_io_handler(d);
+    if ( !handler )
+        return;
+
+    spin_lock_init(&d->arch.hvm_domain.vpci_lock);
+    handler->type = IOREQ_TYPE_PIO;
+    handler->ops = &vpci_portio_ops;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index f7b927858c..4cf919f206 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1566,7 +1566,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
         domcr_flags |= DOMCRF_hvm |
                        ((hvm_funcs.hap_supported && !opt_dom0_shadow) ?
                          DOMCRF_hap : 0);
-        config.emulation_flags = XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC;
+        config.emulation_flags = XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC|
+                                 XEN_X86_EMU_VPCI;
     }
 
     /* Create initial domain 0. */
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index 8289a1bf09..f5cc8e2b8d 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -224,6 +224,9 @@ SECTIONS
        __start_schedulers_array = .;
        *(.data.schedulers)
        __end_schedulers_array = .;
+       __start_vpci_array = .;
+       *(.data.vpci)
+       __end_vpci_array = .;
        *(.data.rel.ro)
        *(.data.rel.ro.*)
   } :text
diff --git a/xen/drivers/Makefile b/xen/drivers/Makefile
index 19391802a8..d51c766453 100644
--- a/xen/drivers/Makefile
+++ b/xen/drivers/Makefile
@@ -1,6 +1,6 @@
 subdir-y += char
 subdir-$(CONFIG_HAS_CPUFREQ) += cpufreq
-subdir-$(CONFIG_HAS_PCI) += pci
+subdir-$(CONFIG_HAS_PCI) += pci vpci
 subdir-$(CONFIG_HAS_PASSTHROUGH) += passthrough
 subdir-$(CONFIG_ACPI) += acpi
 subdir-$(CONFIG_VIDEO) += video
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index c8e2d2d9a9..2288cf8814 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -30,6 +30,7 @@
 #include <xen/radix-tree.h>
 #include <xen/softirq.h>
 #include <xen/tasklet.h>
+#include <xen/vpci.h>
 #include <xsm/xsm.h>
 #include <asm/msi.h>
 #include "ats.h"
@@ -1041,6 +1042,8 @@ static void setup_one_hwdom_device(const struct setup_hwdom *ctxt,
         devfn += pdev->phantom_stride;
     } while ( devfn != pdev->devfn &&
               PCI_SLOT(devfn) == PCI_SLOT(pdev->devfn) );
+
+    xen_vpci_add_handlers(pdev);
 }
 
 static int __hwdom_init _setup_hwdom_pci_devices(struct pci_seg *pseg, void *arg)
diff --git a/xen/drivers/vpci/Makefile b/xen/drivers/vpci/Makefile
new file mode 100644
index 0000000000..840a906470
--- /dev/null
+++ b/xen/drivers/vpci/Makefile
@@ -0,0 +1 @@
+obj-y += vpci.o
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
new file mode 100644
index 0000000000..f4cd04f11d
--- /dev/null
+++ b/xen/drivers/vpci/vpci.c
@@ -0,0 +1,474 @@
+/*
+ * Generic functionality for handling accesses to the PCI configuration space
+ * from guests.
+ *
+ * Copyright (C) 2017 Citrix Systems R&D
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/sched.h>
+#include <xen/vpci.h>
+
+extern const vpci_register_init_t __start_vpci_array[], __end_vpci_array[];
+#define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
+#define vpci_init __start_vpci_array
+
+/* Helpers for locking/unlocking. */
+#define vpci_lock(d) spin_lock(&(d)->arch.hvm_domain.vpci_lock)
+#define vpci_unlock(d) spin_unlock(&(d)->arch.hvm_domain.vpci_lock)
+#define vpci_locked(d) spin_is_locked(&(d)->arch.hvm_domain.vpci_lock)
+
+/* Internal struct to store the emulated PCI registers. */
+struct vpci_register {
+    vpci_read_t read;
+    vpci_write_t write;
+    unsigned int size;
+    unsigned int offset;
+    void *priv_data;
+    struct rb_node node;
+};
+
+int xen_vpci_add_handlers(struct pci_dev *pdev)
+{
+    int i, rc = 0;
+
+    if ( !has_vpci(pdev->domain) )
+        return 0;
+
+    pdev->vpci = xzalloc(struct vpci);
+    if ( !pdev->vpci )
+        return -ENOMEM;
+
+    pdev->vpci->handlers = RB_ROOT;
+
+    for ( i = 0; i < NUM_VPCI_INIT; i++ )
+    {
+        rc = vpci_init[i](pdev);
+        if ( rc )
+            break;
+    }
+
+    if ( rc )
+    {
+        struct rb_node *node = rb_first(&pdev->vpci->handlers);
+        struct vpci_register *r;
+
+        /* Iterate over the tree and cleanup. */
+        while ( node != NULL )
+        {
+            r = container_of(node, struct vpci_register, node);
+            node = rb_next(node);
+            rb_erase(&r->node, &pdev->vpci->handlers);
+            xfree(r);
+        }
+        xfree(pdev->vpci);
+    }
+
+    return rc;
+}
+
+static bool vpci_register_overlap(const struct vpci_register *r,
+                                  unsigned int offset)
+{
+    if ( offset >= r->offset && offset < r->offset + r->size )
+        return true;
+
+    return false;
+}
+
+
+static int vpci_register_cmp(const struct vpci_register *r1,
+                             const struct vpci_register *r2)
+{
+    /* Make sure there's no overlap between registers. */
+    if ( vpci_register_overlap(r1, r2->offset) ||
+         vpci_register_overlap(r1, r2->offset + r2->size - 1) ||
+         vpci_register_overlap(r2, r1->offset) ||
+         vpci_register_overlap(r2, r1->offset + r1->size - 1) )
+        return 0;
+
+    if (r1->offset < r2->offset)
+        return -1;
+    else if (r1->offset > r2->offset)
+        return 1;
+
+    ASSERT_UNREACHABLE();
+    return 0;
+}
+
+static struct vpci_register *vpci_find_register(const struct pci_dev *pdev,
+                                                const unsigned int reg,
+                                                const unsigned int size)
+{
+    struct rb_node *node;
+    struct vpci_register r = {
+        .offset = reg,
+        .size = size,
+    };
+
+    ASSERT(vpci_locked(pdev->domain));
+
+    node = pdev->vpci->handlers.rb_node;
+    while ( node )
+    {
+        struct vpci_register *t =
+            container_of(node, struct vpci_register, node);
+
+        switch ( vpci_register_cmp(&r, t) )
+        {
+        case -1:
+            node = node->rb_left;
+            break;
+        case 1:
+            node = node->rb_right;
+            break;
+        default:
+            return t;
+        }
+    }
+
+    return NULL;
+}
+
+int xen_vpci_add_register(struct pci_dev *pdev, vpci_read_t read_handler,
+                          vpci_write_t write_handler, unsigned int offset,
+                          unsigned int size, void *data)
+{
+    struct rb_node **new, *parent;
+    struct vpci_register *r;
+
+    /* Some sanity checks. */
+    if ( (size != 1 && size != 2 && size != 4) || offset >= 0xFFF ||
+         offset & (size - 1) || read_handler == NULL || write_handler == NULL )
+        return -EINVAL;
+
+    r = xzalloc(struct vpci_register);
+    if ( !r )
+        return -ENOMEM;
+
+    r->read = read_handler;
+    r->write = write_handler;
+    r->size = size;
+    r->offset = offset;
+    r->priv_data = data;
+
+    vpci_lock(pdev->domain);
+    new = &pdev->vpci->handlers.rb_node;
+    parent = NULL;
+
+    while (*new) {
+        struct vpci_register *this =
+            container_of(*new, struct vpci_register, node);
+
+        parent = *new;
+        switch ( vpci_register_cmp(r, this) )
+        {
+        case -1:
+            new = &((*new)->rb_left);
+            break;
+        case 1:
+            new = &((*new)->rb_right);
+            break;
+        default:
+            xfree(r);
+            vpci_unlock(pdev->domain);
+            return -EEXIST;
+        }
+    }
+
+    rb_link_node(&r->node, parent, new);
+    rb_insert_color(&r->node, &pdev->vpci->handlers);
+    vpci_unlock(pdev->domain);
+
+    return 0;
+}
+
+int xen_vpci_remove_register(struct pci_dev *pdev, unsigned int offset)
+{
+    struct vpci_register *r;
+
+    vpci_lock(pdev->domain);
+    r = vpci_find_register(pdev, offset, 1 /* size doesn't matter here. */);
+    if ( !r )
+    {
+        vpci_unlock(pdev->domain);
+        return -ENOENT;
+    }
+
+    rb_erase(&r->node, &pdev->vpci->handlers);
+    xfree(r);
+    vpci_unlock(pdev->domain);
+
+    return 0;
+}
+
+/* Wrappers for performing reads/writes to the underlying hardware. */
+static void vpci_read_hw(unsigned int seg, unsigned int bus,
+                         unsigned int devfn, unsigned int reg, uint32_t size,
+                         uint32_t *data)
+{
+    switch ( size )
+    {
+    case 4:
+        *data = pci_conf_read32(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
+                                reg);
+        break;
+    case 3:
+        /*
+         * This is possible because a 4byte read can have 1byte trapped and
+         * the rest passed-through.
+         */
+        *data = pci_conf_read16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
+                                reg + 1) << 8;
+        *data |= pci_conf_read8(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
+                               reg);
+        break;
+    case 2:
+        *data = pci_conf_read16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
+                                reg);
+        break;
+    case 1:
+        *data = pci_conf_read8(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
+                               reg);
+        break;
+    default:
+        BUG();
+    }
+}
+
+static void vpci_write_hw(unsigned int seg, unsigned int bus,
+                          unsigned int devfn, unsigned int reg, uint32_t size,
+                          uint32_t data)
+{
+    switch ( size )
+    {
+    case 4:
+        pci_conf_write32(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), reg,
+                         data);
+        break;
+    case 3:
+        /*
+         * This is possible because a 4byte write can have 1byte trapped and
+         * the rest passed-through.
+         */
+        pci_conf_write8(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), reg, data);
+        pci_conf_write16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), reg + 1,
+                         data >> 8);
+        break;
+    case 2:
+        pci_conf_write16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), reg,
+                         data);
+        break;
+    case 1:
+        pci_conf_write8(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), reg, data);
+        break;
+    default:
+        BUG();
+    }
+}
+
+/* Helper macros for the read/write handlers. */
+#define GENMASK_BYTES(e, s) GENMASK((e) * 8 - 1, (s) * 8)
+#define SHIFT_RIGHT_BYTES(d, o) d >>= (o) * 8
+#define ADD_RESULT(r, d, s, o) r |= ((d) & GENMASK_BYTES(s, 0)) << ((o) * 8)
+
+int xen_vpci_read(unsigned int seg, unsigned int bus, unsigned int devfn,
+                  unsigned int reg, uint32_t size, uint32_t *data)
+{
+    struct domain *d = current->domain;
+    struct pci_dev *pdev;
+    const struct vpci_register *r;
+    union vpci_val val = { .double_word = 0 };
+    unsigned int data_rshift = 0, data_lshift = 0, data_size;
+    uint32_t tmp_data;
+    int rc;
+
+    ASSERT(vpci_locked(d));
+
+    *data = 0;
+
+    /* Find the PCI dev matching the address. */
+    pdev = pci_get_pdev_by_domain(d, seg, bus, devfn);
+    if ( !pdev )
+        goto passthrough;
+
+    /* Find the vPCI register handler. */
+    r = vpci_find_register(pdev, reg, size);
+    if ( !r )
+        goto passthrough;
+
+    if ( r->offset > reg )
+    {
+        /*
+         * There's a leading gap before the emulated register.
+         * NB: it's possible for this recursive call to have a size of 3.
+         */
+        rc = xen_vpci_read(seg, bus, devfn, reg, r->offset - reg, &tmp_data);
+        if ( rc )
+            return rc;
+
+        /* Add the head read to the partial result. */
+        ADD_RESULT(*data, tmp_data, r->offset - reg, 0);
+        data_lshift = r->offset - reg;
+
+        /* Account for the read. */
+        size -= data_lshift;
+        reg += data_lshift;
+    }
+    else if ( r->offset < reg )
+        /* There's an offset into the emulated register */
+        data_rshift = reg - r->offset;
+
+    ASSERT(data_lshift == 0 || data_rshift == 0);
+    data_size = min(size, r->size - data_rshift);
+    ASSERT(data_size != 0);
+
+    /* Perform the read of the register. */
+    rc = r->read(pdev, r->offset, &val, r->priv_data);
+    if ( rc )
+        return rc;
+
+    val.double_word >>= data_rshift * 8;
+    ADD_RESULT(*data, val.double_word, data_size, data_lshift);
+
+    /* Account for the read */
+    size -= data_size;
+    reg += data_size;
+
+    /* Read the remaining, if any. */
+    if ( size > 0 )
+    {
+        /*
+         * Read trailing data.
+         * NB: it's possible for this recursive call to have a size of 3.
+         */
+        rc = xen_vpci_read(seg, bus, devfn, reg, size, &tmp_data);
+        if ( rc )
+            return rc;
+
+        /* Add the tail read to the partial result. */
+        ADD_RESULT(*data, tmp_data, size, data_size + data_lshift);
+    }
+
+    return 0;
+
+ passthrough:
+    vpci_read_hw(seg, bus, devfn, reg, size, data);
+    return 0;
+}
+
+/* Perform a maybe partial write to a register. */
+static int vpci_write_helper(struct pci_dev *pdev,
+                             const struct vpci_register *r, unsigned int size,
+                             unsigned int offset, uint32_t data)
+{
+    union vpci_val val = { .double_word = data };
+    int rc;
+
+    ASSERT(size <= r->size);
+    if ( size != r->size )
+    {
+        rc = r->read(pdev, r->offset, &val, r->priv_data);
+        if ( rc )
+            return rc;
+        val.double_word &= ~GENMASK_BYTES(size + offset, offset);
+        data &= GENMASK_BYTES(size, 0);
+        val.double_word |= data << (offset * 8);
+    }
+
+    return r->write(pdev, r->offset, val, r->priv_data);
+}
+
+int xen_vpci_write(unsigned int seg, unsigned int bus, unsigned int devfn,
+                   unsigned int reg, uint32_t size, uint32_t data)
+{
+    struct domain *d = current->domain;
+    struct pci_dev *pdev;
+    const struct vpci_register *r;
+    unsigned int data_size, data_offset = 0;
+    int rc;
+
+    ASSERT(vpci_locked(d));
+
+    /* Find the PCI dev matching the address. */
+    pdev = pci_get_pdev_by_domain(d, seg, bus, devfn);
+    if ( !pdev )
+        goto passthrough;
+
+    /* Find the vPCI register handler. */
+    r = vpci_find_register(pdev, reg, size);
+    if ( !r )
+        goto passthrough;
+
+    if ( r->offset > reg )
+    {
+        /*
+         * There's a leading gap before the emulated register found.
+         * NB: it's possible for this recursive call to have a size of 3.
+         */
+        rc = xen_vpci_write(seg, bus, devfn, reg, r->offset - reg, data);
+        if ( rc )
+            return rc;
+
+        /* Advance the data by the written size. */
+        SHIFT_RIGHT_BYTES(data, r->offset - reg);
+        size -= r->offset - reg;
+        reg += r->offset - reg;
+    }
+    else if ( r->offset < reg )
+        /* There's an offset into the emulated register. */
+        data_offset = reg - r->offset;
+
+    data_size = min(size, r->size - data_offset);
+
+    /* Perform the write of the register. */
+    ASSERT(data_size != 0);
+    rc = vpci_write_helper(pdev, r, data_size, data_offset, data);
+    if ( rc )
+        return rc;
+
+    /* Account for the write. */
+    size -= data_size;
+    reg += data_size;
+    SHIFT_RIGHT_BYTES(data, data_size);
+
+    /* Write the remaining, if any. */
+    if ( size > 0 )
+    {
+        /*
+         * Write trailing data.
+         * NB: it's possible for this recursive call to have a size of 3.
+         */
+        rc = xen_vpci_write(seg, bus, devfn, reg, size, data);
+        if ( rc )
+            return rc;
+    }
+
+    return 0;
+
+ passthrough:
+    vpci_write_hw(seg, bus, devfn, reg, size, data);
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
+
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 6ab987f231..f0741917ed 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -426,6 +426,7 @@ struct arch_domain
 #define has_vpit(d)        (!!((d)->arch.emulation_flags & XEN_X86_EMU_PIT))
 #define has_pirq(d)        (!!((d)->arch.emulation_flags & \
                             XEN_X86_EMU_USE_PIRQ))
+#define has_vpci(d)        (!!((d)->arch.emulation_flags & XEN_X86_EMU_VPCI))
 
 #define has_arch_pdevs(d)    (!list_empty(&(d)->arch.pdev_list))
 
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index d2899c9bb2..cbf4170789 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -184,6 +184,9 @@ struct hvm_domain {
     /* List of guest to machine IO ports mapping. */
     struct list_head g2m_ioport_list;
 
+    /* Lock for the PCI emulation layer (vPCI). */
+    spinlock_t vpci_lock;
+
     /* List of permanently write-mapped pages. */
     struct {
         spinlock_t lock;
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 2484eb1c75..2dbf92f13e 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -155,6 +155,9 @@ extern void hvm_dpci_msi_eoi(struct domain *d, int vector);
  */
 void register_g2m_portio_handler(struct domain *d);
 
+/* HVM port IO handler for PCI accesses. */
+void register_vpci_portio_handler(struct domain *d);
+
 #endif /* __ASM_X86_HVM_IO_H__ */
 
 
diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch-x86/xen.h
index 8a9ba7982b..c00f8cda93 100644
--- a/xen/include/public/arch-x86/xen.h
+++ b/xen/include/public/arch-x86/xen.h
@@ -295,12 +295,15 @@ struct xen_arch_domainconfig {
 #define XEN_X86_EMU_PIT             (1U<<_XEN_X86_EMU_PIT)
 #define _XEN_X86_EMU_USE_PIRQ       9
 #define XEN_X86_EMU_USE_PIRQ        (1U<<_XEN_X86_EMU_USE_PIRQ)
+#define _XEN_X86_EMU_VPCI           10
+#define XEN_X86_EMU_VPCI            (1U<<_XEN_X86_EMU_VPCI)
 
 #define XEN_X86_EMU_ALL             (XEN_X86_EMU_LAPIC | XEN_X86_EMU_HPET |  \
                                      XEN_X86_EMU_PM | XEN_X86_EMU_RTC |      \
                                      XEN_X86_EMU_IOAPIC | XEN_X86_EMU_PIC |  \
                                      XEN_X86_EMU_VGA | XEN_X86_EMU_IOMMU |   \
-                                     XEN_X86_EMU_PIT | XEN_X86_EMU_USE_PIRQ)
+                                     XEN_X86_EMU_PIT | XEN_X86_EMU_USE_PIRQ |\
+                                     XEN_X86_EMU_VPCI)
     uint32_t emulation_flags;
 };
 
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 59b6e8a81c..a83c4a1276 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -13,6 +13,7 @@
 #include <xen/irq.h>
 #include <xen/pci_regs.h>
 #include <xen/pfn.h>
+#include <xen/rbtree.h>
 #include <asm/device.h>
 #include <asm/numa.h>
 #include <asm/pci.h>
@@ -88,6 +89,9 @@ struct pci_dev {
 #define PT_FAULT_THRESHOLD 10
     } fault;
     u64 vf_rlen[6];
+
+    /* Data for vPCI. */
+    struct vpci *vpci;
 };
 
 #define for_each_pdev(domain, pdev) \
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
new file mode 100644
index 0000000000..56e8d1c35e
--- /dev/null
+++ b/xen/include/xen/vpci.h
@@ -0,0 +1,66 @@
+#ifndef _VPCI_
+#define _VPCI_
+
+#include <xen/pci.h>
+#include <xen/types.h>
+
+/* Helpers for locking/unlocking. */
+#define vpci_lock(d) spin_lock(&(d)->arch.hvm_domain.vpci_lock)
+#define vpci_unlock(d) spin_unlock(&(d)->arch.hvm_domain.vpci_lock)
+#define vpci_locked(d) spin_is_locked(&(d)->arch.hvm_domain.vpci_lock)
+
+/* Value read or written by the handlers. */
+union vpci_val {
+    uint8_t half_word;
+    uint16_t word;
+    uint32_t double_word;
+};
+
+/*
+ * The vPCI handlers will never be called concurrently for the same domain: it
+ * is guaranteed that the vpci domain lock will always be held when calling
+ * any handler.
+ */
+typedef int (*vpci_read_t)(struct pci_dev *pdev, unsigned int reg,
+                           union vpci_val *val, void *data);
+
+typedef int (*vpci_write_t)(struct pci_dev *pdev, unsigned int reg,
+                            union vpci_val val, void *data);
+
+typedef int (*vpci_register_init_t)(struct pci_dev *dev);
+
+#define REGISTER_VPCI_INIT(x) \
+  static const vpci_register_init_t x##_entry __used_section(".data.vpci") = x
+
+/* Add vPCI handlers to device. */
+int xen_vpci_add_handlers(struct pci_dev *dev);
+
+/* Add/remove a register handler. */
+int xen_vpci_add_register(struct pci_dev *pdev, vpci_read_t read_handler,
+                          vpci_write_t write_handler, unsigned int offset,
+                          unsigned int size, void *data);
+int xen_vpci_remove_register(struct pci_dev *pdev, unsigned int offset);
+
+/* Generic read/write handlers for the PCI config space. */
+int xen_vpci_read(unsigned int seg, unsigned int bus, unsigned int devfn,
+                  unsigned int reg, uint32_t size, uint32_t *data);
+int xen_vpci_write(unsigned int seg, unsigned int bus, unsigned int devfn,
+                   unsigned int reg, uint32_t size, uint32_t data);
+
+struct vpci {
+    /* Root pointer for the tree of vPCI handlers. */
+    struct rb_root handlers;
+};
+
+#endif
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
+
-- 
2.11.0 (Apple Git-81)



* [PATCH v2 2/9] x86/ecam: add handlers for the PVH Dom0 MMCFG areas
  2017-04-20 15:17 [PATCH v2 0/9] vpci: PCI config space emulation Roger Pau Monne
  2017-04-20 15:17 ` [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap accesses to the PCI config space Roger Pau Monne
@ 2017-04-20 15:17 ` Roger Pau Monne
  2017-04-20 15:17 ` [PATCH v2 3/9] xen/mm: move modify_identity_mmio to global file and drop __init Roger Pau Monne
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 40+ messages in thread
From: Roger Pau Monne @ 2017-04-20 15:17 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Paul Durrant, Jan Beulich, boris.ostrovsky,
	Roger Pau Monne

Introduce a set of handlers for accesses to the ECAM areas. Those areas are
set up based on the contents of the hardware MMCFG tables, and the list of
handled ECAM areas is stored inside the hvm_domain struct.

Reads and writes are forwarded to the generic vPCI handlers once the address
has been decoded to obtain the device and register the guest is trying to
access.
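
As an illustration of the address decoding, here is a minimal standalone
sketch. It mirrors the shifts and masks used by vpci_ecam_decode_addr() in this
patch; the struct ecam_bdf type and the ecam_encode() helper are hypothetical
names introduced only for this example and are not part of the series.

```c
#include <stdint.h>

/* Hypothetical container for a decoded ECAM offset (illustration only). */
struct ecam_bdf {
    unsigned int bus;
    unsigned int devfn;
    unsigned int reg;
};

/*
 * Split an offset into an ECAM window into bus, devfn and register:
 * bits 27-20 select the bus, bits 19-12 the device/function and
 * bits 11-0 the register inside the 4K config space.
 */
static struct ecam_bdf ecam_decode(uint64_t offset)
{
    struct ecam_bdf d = {
        .bus   = (offset >> 20) & 0xff,
        .devfn = (offset >> 12) & 0xff,
        .reg   = offset & 0xfff,
    };

    return d;
}

/* Hypothetical inverse helper: build the offset from its components. */
static uint64_t ecam_encode(unsigned int bus, unsigned int slot,
                            unsigned int func, unsigned int reg)
{
    unsigned int devfn = ((slot & 0x1f) << 3) | (func & 0x7);

    return ((uint64_t)(bus & 0xff) << 20) | ((uint64_t)devfn << 12) |
           (reg & 0xfff);
}
```

Note that the handlers subtract the base address of the matching ECAM region
before decoding, so the offset passed in is relative to the start of the
region.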

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Paul Durrant <paul.durrant@citrix.com>
---
Changes since v1:
 - Added locking.
---
 xen/arch/x86/hvm/dom0_build.c    |  27 ++++++++
 xen/arch/x86/hvm/hvm.c           |  10 +++
 xen/arch/x86/hvm/io.c            | 135 +++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/domain.h |  10 +++
 xen/include/asm-x86/hvm/io.h     |   4 ++
 5 files changed, 186 insertions(+)

diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index 020c355faf..ca88c5835e 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -38,6 +38,8 @@
 #include <public/hvm/hvm_info_table.h>
 #include <public/hvm/hvm_vcpu.h>
 
+#include "../x86_64/mmconfig.h"
+
 /*
  * Have the TSS cover the ISA port range, which makes it
  * - 104 bytes base structure
@@ -1048,6 +1050,24 @@ static int __init pvh_setup_acpi(struct domain *d, paddr_t start_info)
     return 0;
 }
 
+int __init pvh_setup_ecam(struct domain *d)
+{
+    unsigned int i;
+    int rc;
+
+    for ( i = 0; i < pci_mmcfg_config_num; i++ )
+    {
+        size_t size = (pci_mmcfg_config[i].end_bus_number + 1) << 20;
+
+        rc = register_vpci_ecam_handler(d, pci_mmcfg_config[i].address, size,
+                                        pci_mmcfg_config[i].pci_segment);
+        if ( rc )
+            return rc;
+    }
+
+    return 0;
+}
+
 int __init dom0_construct_pvh(struct domain *d, const module_t *image,
                               unsigned long image_headroom,
                               module_t *initrd,
@@ -1090,6 +1110,13 @@ int __init dom0_construct_pvh(struct domain *d, const module_t *image,
         return rc;
     }
 
+    rc = pvh_setup_ecam(d);
+    if ( rc )
+    {
+        printk("Failed to setup Dom0 PCI ECAM areas: %d\n", rc);
+        return rc;
+    }
+
     panic("Building a PVHv2 Dom0 is not yet supported.");
     return 0;
 }
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 7f3322ede6..ef3ad2a615 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -613,6 +613,7 @@ int hvm_domain_initialise(struct domain *d)
     spin_lock_init(&d->arch.hvm_domain.write_map.lock);
     INIT_LIST_HEAD(&d->arch.hvm_domain.write_map.list);
     INIT_LIST_HEAD(&d->arch.hvm_domain.g2m_ioport_list);
+    INIT_LIST_HEAD(&d->arch.hvm_domain.ecam_regions);
 
     hvm_init_cacheattr_region_list(d);
 
@@ -725,6 +726,7 @@ void hvm_domain_destroy(struct domain *d)
 {
     struct list_head *ioport_list, *tmp;
     struct g2m_ioport *ioport;
+    struct hvm_ecam *ecam, *etmp;
 
     xfree(d->arch.hvm_domain.io_handler);
     d->arch.hvm_domain.io_handler = NULL;
@@ -752,6 +754,14 @@ void hvm_domain_destroy(struct domain *d)
         list_del(&ioport->list);
         xfree(ioport);
     }
+
+    list_for_each_entry_safe ( ecam, etmp, &d->arch.hvm_domain.ecam_regions,
+                               next )
+    {
+        list_del(&ecam->next);
+        xfree(ecam);
+    }
+
 }
 
 static int hvm_save_tsc_adjust(struct domain *d, hvm_domain_context_t *h)
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 15048da556..319cf9287b 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -391,6 +391,141 @@ void register_vpci_portio_handler(struct domain *d)
     handler->ops = &vpci_portio_ops;
 }
 
+/* Handlers to trap PCI ECAM config accesses. */
+static struct hvm_ecam *vpci_ecam_find(struct domain *d, unsigned long addr)
+{
+    struct hvm_ecam *ecam = NULL;
+
+    ASSERT(vpci_locked(d));
+    list_for_each_entry ( ecam, &d->arch.hvm_domain.ecam_regions, next )
+        if ( addr >= ecam->addr && addr < ecam->addr + ecam->size )
+            return ecam;
+
+    return NULL;
+}
+
+static void vpci_ecam_decode_addr(unsigned long addr, unsigned int *bus,
+                                  unsigned int *devfn, unsigned int *reg)
+{
+    *bus = (addr >> 20) & 0xff;
+    *devfn = (addr >> 12) & 0xff;
+    *reg = addr & 0xfff;
+}
+
+static int vpci_ecam_accept(struct vcpu *v, unsigned long addr)
+{
+    struct domain *d = v->domain;
+    int found;
+
+    vpci_lock(d);
+    found = !!vpci_ecam_find(v->domain, addr);
+    vpci_unlock(d);
+
+    return found;
+}
+
+static int vpci_ecam_read(struct vcpu *v, unsigned long addr,
+                          unsigned int len, unsigned long *data)
+{
+    struct domain *d = v->domain;
+    struct hvm_ecam *ecam;
+    unsigned int bus, devfn, reg;
+    uint32_t data32;
+    int rc;
+
+    vpci_lock(d);
+    ecam = vpci_ecam_find(d, addr);
+    if ( !ecam )
+    {
+        vpci_unlock(d);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    vpci_ecam_decode_addr(addr - ecam->addr, &bus, &devfn, &reg);
+
+    if ( vpci_access_check(reg, len) || reg >= 0xfff )
+    {
+        vpci_unlock(d);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    rc = xen_vpci_read(ecam->segment, bus, devfn, reg, len, &data32);
+    if ( !rc )
+        *data = data32;
+    vpci_unlock(d);
+
+    return rc ? X86EMUL_UNHANDLEABLE : X86EMUL_OKAY;
+}
+
+static int vpci_ecam_write(struct vcpu *v, unsigned long addr,
+                           unsigned int len, unsigned long data)
+{
+    struct domain *d = v->domain;
+    struct hvm_ecam *ecam;
+    unsigned int bus, devfn, reg;
+    int rc;
+
+    vpci_lock(d);
+    ecam = vpci_ecam_find(d, addr);
+    if ( !ecam )
+    {
+        vpci_unlock(d);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    vpci_ecam_decode_addr(addr - ecam->addr, &bus, &devfn, &reg);
+
+    if ( vpci_access_check(reg, len) || reg >= 0xfff )
+    {
+        vpci_unlock(d);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    rc = xen_vpci_write(ecam->segment, bus, devfn, reg, len, data);
+    vpci_unlock(d);
+
+    return rc ? X86EMUL_UNHANDLEABLE : X86EMUL_OKAY;
+}
+
+static const struct hvm_mmio_ops vpci_ecam_ops = {
+    .check = vpci_ecam_accept,
+    .read = vpci_ecam_read,
+    .write = vpci_ecam_write,
+};
+
+int register_vpci_ecam_handler(struct domain *d, paddr_t addr, size_t size,
+                               unsigned int seg)
+{
+    struct hvm_ecam *ecam;
+
+    ASSERT(is_hardware_domain(d));
+
+    vpci_lock(d);
+    if ( vpci_ecam_find(d, addr) )
+    {
+        vpci_unlock(d);
+        return -EEXIST;
+    }
+
+    ecam = xzalloc(struct hvm_ecam);
+    if ( !ecam )
+    {
+        vpci_unlock(d);
+        return -ENOMEM;
+    }
+
+    if ( list_empty(&d->arch.hvm_domain.ecam_regions) )
+        register_mmio_handler(d, &vpci_ecam_ops);
+
+    ecam->addr = addr;
+    ecam->segment = seg;
+    ecam->size = size;
+    list_add(&ecam->next, &d->arch.hvm_domain.ecam_regions);
+    vpci_unlock(d);
+
+    return 0;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index cbf4170789..ce710496c7 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -100,6 +100,13 @@ struct hvm_pi_ops {
     void (*do_resume)(struct vcpu *v);
 };
 
+struct hvm_ecam {
+    paddr_t addr;
+    size_t size;
+    unsigned int segment;
+    struct list_head next;
+};
+
 struct hvm_domain {
     /* Guest page range used for non-default ioreq servers */
     struct {
@@ -187,6 +194,9 @@ struct hvm_domain {
     /* Lock for the PCI emulation layer (vPCI). */
     spinlock_t vpci_lock;
 
+    /* List of ECAM (MMCFG) regions trapped by Xen. */
+    struct list_head ecam_regions;
+
     /* List of permanently write-mapped pages. */
     struct {
         spinlock_t lock;
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 2dbf92f13e..0434aca706 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -158,6 +158,10 @@ void register_g2m_portio_handler(struct domain *d);
 /* HVM port IO handler for PCI accesses. */
 void register_vpci_portio_handler(struct domain *d);
 
+/* HVM MMIO handler for PCI ECAM accesses. */
+int register_vpci_ecam_handler(struct domain *d, paddr_t addr, size_t size,
+                               unsigned int seg);
+
 #endif /* __ASM_X86_HVM_IO_H__ */
 
 
-- 
2.11.0 (Apple Git-81)



* [PATCH v2 3/9] xen/mm: move modify_identity_mmio to global file and drop __init
  2017-04-20 15:17 [PATCH v2 0/9] vpci: PCI config space emulation Roger Pau Monne
  2017-04-20 15:17 ` [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap accesses to the PCI config space Roger Pau Monne
  2017-04-20 15:17 ` [PATCH v2 2/9] x86/ecam: add handlers for the PVH Dom0 MMCFG areas Roger Pau Monne
@ 2017-04-20 15:17 ` Roger Pau Monne
  2017-04-24 14:42   ` Julien Grall
  2017-04-20 15:17 ` [PATCH v2 4/9] xen/pci: split code to size BARs from pci_add_device Roger Pau Monne
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 40+ messages in thread
From: Roger Pau Monne @ 2017-04-20 15:17 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, boris.ostrovsky, Roger Pau Monne, Jan Beulich

Also allow it to create non-identity mappings by adding a new parameter. The
function will be needed in places other than the PVH Dom0 builder.
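
The retry loop that gets moved can be sketched in isolation as follows. This is
only a model under stated assumptions: fake_map() is a hypothetical stand-in
for map_mmio_regions() that handles at most 64 pages per call, returning 0 once
the remaining range is done and the number of pages processed otherwise, which
is the return convention the loop in this patch relies on.

```c
/* Pages processed so far by the hypothetical mapper (illustration only). */
static unsigned long mapped_pages;

/*
 * Hypothetical stand-in for map_mmio_regions(): handles at most 64 pages
 * per call. Returns 0 when the whole remaining range has been processed,
 * or the number of pages processed if it stopped early.
 */
static int fake_map(unsigned long gfn, unsigned long nr, unsigned long mfn)
{
    (void)gfn;
    (void)mfn;

    if ( nr <= 64 )
    {
        mapped_pages += nr;
        return 0;
    }

    mapped_pages += 64;
    return 64;
}

/* Same control flow as modify_mmio(), minus logging and softirq handling. */
static int modify_mmio_sketch(unsigned long gfn, unsigned long mfn,
                              unsigned long nr_pages)
{
    int rc;

    for ( ; ; )
    {
        rc = fake_map(gfn, nr_pages, mfn);
        if ( rc <= 0 )
            break; /* Either done (0) or failed (< 0). */

        /* Partial progress: advance both addresses and retry. */
        nr_pages -= rc;
        gfn += rc;
        mfn += rc;
        /* The real function calls process_pending_softirqs() here. */
    }

    return rc;
}
```

With non-identity mappings the guest and machine frame numbers differ, so both
have to be advanced after a partial mapping, which is why the new function
takes them as separate parameters.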

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/dom0_build.c | 22 +---------------------
 xen/common/memory.c           | 34 ++++++++++++++++++++++++++++++++++
 xen/include/xen/p2m-common.h  |  4 ++++
 3 files changed, 39 insertions(+), 21 deletions(-)

diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index ca88c5835e..65f606d33a 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -64,27 +64,7 @@ static struct acpi_madt_nmi_source __initdata *nmisrc;
 static int __init modify_identity_mmio(struct domain *d, unsigned long pfn,
                                        unsigned long nr_pages, const bool map)
 {
-    int rc;
-
-    for ( ; ; )
-    {
-        rc = (map ? map_mmio_regions : unmap_mmio_regions)
-             (d, _gfn(pfn), nr_pages, _mfn(pfn));
-        if ( rc == 0 )
-            break;
-        if ( rc < 0 )
-        {
-            printk(XENLOG_WARNING
-                   "Failed to identity %smap [%#lx,%#lx) for d%d: %d\n",
-                   map ? "" : "un", pfn, pfn + nr_pages, d->domain_id, rc);
-            break;
-        }
-        nr_pages -= rc;
-        pfn += rc;
-        process_pending_softirqs();
-    }
-
-    return rc;
+    return modify_mmio(d, pfn, pfn, nr_pages, map);
 }
 
 /* Populate a HVM memory range using the biggest possible order. */
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 52879e7438..0d970482cb 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1438,6 +1438,40 @@ int prepare_ring_for_helper(
     return 0;
 }
 
+int modify_mmio(struct domain *d, unsigned long gfn, unsigned long pfn,
+                unsigned long nr_pages, const bool map)
+{
+    int rc;
+
+    /*
+     * Make sure this function is only used by the hardware domain, because it
+     * can take an arbitrarily long time, and could DoS the whole system.
+     */
+    ASSERT(is_hardware_domain(d));
+
+    for ( ; ; )
+    {
+        rc = (map ? map_mmio_regions : unmap_mmio_regions)
+             (d, _gfn(gfn), nr_pages, _mfn(pfn));
+        if ( rc == 0 )
+            break;
+        if ( rc < 0 )
+        {
+            printk(XENLOG_WARNING
+                   "Failed to %smap [%#lx, %#lx) -> [%#lx,%#lx) for d%d: %d\n",
+                   map ? "" : "un", gfn, gfn + nr_pages, pfn, pfn + nr_pages,
+                   d->domain_id, rc);
+            break;
+        }
+        nr_pages -= rc;
+        pfn += rc;
+        gfn += rc;
+        process_pending_softirqs();
+    }
+
+    return rc;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/xen/p2m-common.h b/xen/include/xen/p2m-common.h
index 8cd5a6b503..1308da44e7 100644
--- a/xen/include/xen/p2m-common.h
+++ b/xen/include/xen/p2m-common.h
@@ -13,4 +13,8 @@ int unmap_mmio_regions(struct domain *d,
                        unsigned long nr,
                        mfn_t mfn);
 
+
+int modify_mmio(struct domain *d, unsigned long gfn, unsigned long pfn,
+                unsigned long nr_pages, const bool map);
+
 #endif /* _XEN_P2M_COMMON_H */
-- 
2.11.0 (Apple Git-81)



* [PATCH v2 4/9] xen/pci: split code to size BARs from pci_add_device
  2017-04-20 15:17 [PATCH v2 0/9] vpci: PCI config space emulation Roger Pau Monne
                   ` (2 preceding siblings ...)
  2017-04-20 15:17 ` [PATCH v2 3/9] xen/mm: move modify_identity_mmio to global file and drop __init Roger Pau Monne
@ 2017-04-20 15:17 ` Roger Pau Monne
  2017-04-20 15:17 ` [PATCH v2 5/9] xen/vpci: add handlers to map the BARs Roger Pau Monne
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 40+ messages in thread
From: Roger Pau Monne @ 2017-04-20 15:17 UTC (permalink / raw)
  To: xen-devel; +Cc: boris.ostrovsky, Roger Pau Monne, Jan Beulich

Split the code into a separate function so that it can be called from other
places in order to get the size of regular PCI BARs. This will be required in
order to map the BARs of PCI devices into the PVH Dom0 p2m.
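
For reference, the size arithmetic performed after writing all ones to a memory
BAR can be sketched on its own. The bar_size() helper below is a hypothetical
name used only for this illustration; it takes the values read back from the
low and high dwords and follows the same steps as the code being split out
(mask the flag bits, sign-extend 32-bit BARs, negate).

```c
#include <stdbool.h>
#include <stdint.h>

/* Low 4 bits of a memory BAR hold type/prefetch flags, not address bits. */
#define BAR_MEM_MASK 0xfffffff0u

/*
 * Hypothetical helper: compute the size of a memory BAR from the values
 * read back after writing ~0 to it. 'hi' is only meaningful for 64-bit
 * BARs.
 */
static uint64_t bar_size(uint32_t lo, uint32_t hi, bool is_64bit)
{
    uint64_t size = lo & BAR_MEM_MASK;

    if ( is_64bit )
        size |= (uint64_t)hi << 32;
    else if ( size )
        /* Treat the upper 32 bits of a 32-bit BAR as all ones. */
        size |= ~0ULL << 32;

    /* Two's complement of the writable address bits gives the size. */
    return (uint64_t)0 - size;
}
```

An unimplemented BAR reads back as zero and therefore yields a size of zero.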

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
---
 xen/drivers/passthrough/pci.c | 86 ++++++++++++++++++++++++++-----------------
 xen/include/xen/pci.h         |  3 ++
 2 files changed, 56 insertions(+), 33 deletions(-)

diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 2288cf8814..7710c41533 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -588,6 +588,51 @@ static void pci_enable_acs(struct pci_dev *pdev)
     pci_conf_write16(seg, bus, dev, func, pos + PCI_ACS_CTRL, ctrl);
 }
 
+int pci_size_bar(unsigned int seg, unsigned int bus, unsigned int slot,
+                 unsigned int func, unsigned int base, unsigned int max_bars,
+                 unsigned int *index, uint64_t *addr, uint64_t *size)
+{
+    unsigned int idx = base + *index * 4;
+    u32 bar = pci_conf_read32(seg, bus, slot, func, idx);
+    u32 hi = 0;
+
+    *addr = *size = 0;
+
+    ASSERT((bar & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_MEMORY);
+    pci_conf_write32(seg, bus, slot, func, idx, ~0);
+    if ( (bar & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
+         PCI_BASE_ADDRESS_MEM_TYPE_64 )
+    {
+        if ( *index >= max_bars )
+        {
+            dprintk(XENLOG_WARNING,
+                    "device %04x:%02x:%02x.%u with 64-bit BAR in last slot\n",
+                    seg, bus, slot, func);
+            return -EINVAL;
+        }
+        hi = pci_conf_read32(seg, bus, slot, func, idx + 4);
+        pci_conf_write32(seg, bus, slot, func, idx + 4, ~0);
+    }
+    *size = pci_conf_read32(seg, bus, slot, func, idx) &
+            PCI_BASE_ADDRESS_MEM_MASK;
+    if ( (bar & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
+         PCI_BASE_ADDRESS_MEM_TYPE_64 )
+    {
+        *size |= (u64)pci_conf_read32(seg, bus, slot, func, idx + 4) << 32;
+        pci_conf_write32(seg, bus, slot, func, idx + 4, hi);
+    }
+    else if ( *size )
+        *size |= (u64)~0 << 32;
+    pci_conf_write32(seg, bus, slot, func, idx, bar);
+    *size = -(*size);
+    *addr = (bar & PCI_BASE_ADDRESS_MEM_MASK) | ((u64)hi << 32);
+    if ( (bar & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
+         PCI_BASE_ADDRESS_MEM_TYPE_64 )
+        ++*index;
+
+    return 0;
+}
+
 int pci_add_device(u16 seg, u8 bus, u8 devfn,
                    const struct pci_dev_info *info, nodeid_t node)
 {
@@ -652,7 +697,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
             {
                 unsigned int idx = pos + PCI_SRIOV_BAR + i * 4;
                 u32 bar = pci_conf_read32(seg, bus, slot, func, idx);
-                u32 hi = 0;
+                uint64_t addr;
 
                 if ( (bar & PCI_BASE_ADDRESS_SPACE) ==
                      PCI_BASE_ADDRESS_SPACE_IO )
@@ -663,38 +708,13 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
                            seg, bus, slot, func, i);
                     continue;
                 }
-                pci_conf_write32(seg, bus, slot, func, idx, ~0);
-                if ( (bar & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
-                     PCI_BASE_ADDRESS_MEM_TYPE_64 )
-                {
-                    if ( i >= PCI_SRIOV_NUM_BARS )
-                    {
-                        printk(XENLOG_WARNING
-                               "SR-IOV device %04x:%02x:%02x.%u with 64-bit"
-                               " vf BAR in last slot\n",
-                               seg, bus, slot, func);
-                        break;
-                    }
-                    hi = pci_conf_read32(seg, bus, slot, func, idx + 4);
-                    pci_conf_write32(seg, bus, slot, func, idx + 4, ~0);
-                }
-                pdev->vf_rlen[i] = pci_conf_read32(seg, bus, slot, func, idx) &
-                                   PCI_BASE_ADDRESS_MEM_MASK;
-                if ( (bar & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
-                     PCI_BASE_ADDRESS_MEM_TYPE_64 )
-                {
-                    pdev->vf_rlen[i] |= (u64)pci_conf_read32(seg, bus,
-                                                             slot, func,
-                                                             idx + 4) << 32;
-                    pci_conf_write32(seg, bus, slot, func, idx + 4, hi);
-                }
-                else if ( pdev->vf_rlen[i] )
-                    pdev->vf_rlen[i] |= (u64)~0 << 32;
-                pci_conf_write32(seg, bus, slot, func, idx, bar);
-                pdev->vf_rlen[i] = -pdev->vf_rlen[i];
-                if ( (bar & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
-                     PCI_BASE_ADDRESS_MEM_TYPE_64 )
-                    ++i;
+                ret = pci_size_bar(seg, bus, slot, func, pos + PCI_SRIOV_BAR,
+                                   PCI_SRIOV_NUM_BARS, &i, &addr,
+                                   &pdev->vf_rlen[i]);
+                if ( ret )
+                    dprintk(XENLOG_WARNING,
+                            "%04x:%02x:%02x.%u: failed to size SR-IOV BAR%u\n",
+                            seg, bus, slot, func, i);
             }
         }
         else
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index a83c4a1276..3d3853fd6f 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -165,6 +165,9 @@ const char *parse_pci(const char *, unsigned int *seg, unsigned int *bus,
                       unsigned int *dev, unsigned int *func);
 const char *parse_pci_seg(const char *, unsigned int *seg, unsigned int *bus,
                           unsigned int *dev, unsigned int *func, bool *def_seg);
+int pci_size_bar(unsigned int seg, unsigned int bus, unsigned int slot,
+                 unsigned int func, unsigned int base, unsigned int max_bars,
+                 unsigned int *index, uint64_t *addr, uint64_t *size);
 
 
 bool_t pcie_aer_get_firmware_first(const struct pci_dev *);
-- 
2.11.0 (Apple Git-81)



* [PATCH v2 5/9] xen/vpci: add handlers to map the BARs
  2017-04-20 15:17 [PATCH v2 0/9] vpci: PCI config space emulation Roger Pau Monne
                   ` (3 preceding siblings ...)
  2017-04-20 15:17 ` [PATCH v2 4/9] xen/pci: split code to size BARs from pci_add_device Roger Pau Monne
@ 2017-04-20 15:17 ` Roger Pau Monne
  2017-04-20 15:17 ` [PATCH v2 6/9] xen/vpci: trap access to the list of PCI capabilities Roger Pau Monne
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 40+ messages in thread
From: Roger Pau Monne @ 2017-04-20 15:17 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Jan Beulich, boris.ostrovsky,
	Roger Pau Monne

Introduce a set of handlers that trap accesses to the PCI BARs and the command
register, in order to emulate BAR sizing and BAR relocation.

The command handler is used to detect changes to the memory space enable bit
(bit 1, PCI_COMMAND_MEMORY), and maps/unmaps the BARs of the device into the
guest p2m accordingly.
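
As an illustration only (not part of the patch), the toggle detection in the
command handler can be sketched in standalone C. The struct and function names
below are hypothetical; only the XOR test against the memory space enable bit
(value 0x2, as in PCI_COMMAND_MEMORY) mirrors what the handler does:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Memory Space Enable, same value as PCI_COMMAND_MEMORY. */
#define CMD_MEMORY 0x2u

/* Hypothetical stand-in for the cached state kept in struct vpci_header. */
struct cmd_state {
    uint16_t command;    /* Last value written to the command register. */
    bool bars_mapped;    /* Whether the BARs are currently mapped. */
    unsigned int remaps; /* Number of map/unmap operations performed. */
};

/* Emulate a guest write: only touch the mappings if the memory bit flipped. */
static void cmd_write(struct cmd_state *s, uint16_t new_cmd)
{
    assert(s);

    if ( (new_cmd ^ s->command) & CMD_MEMORY )
    {
        /* In Xen this is where the BARs would be (un)mapped in the p2m. */
        s->bars_mapped = new_cmd & CMD_MEMORY;
        s->remaps++;
    }

    s->command = new_cmd;
}
```

Writes that leave the memory bit unchanged (for example toggling bus
mastering) fall straight through to the register update without any p2m work.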

The BAR register handlers are used to detect attempts by the guest to size or
relocate the BARs.
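
A minimal sketch of the sizing/relocation emulation, assuming a simplified
32-bit memory BAR with a power-of-two size. The names (struct bar32,
bar_read, bar_write) are made up for this example and the 64-bit high half
handled by the patch is left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Address bits of a memory BAR (low 4 bits are attributes). */
#define BAR_MEM_MASK (~(uint32_t)0xf)

/* Hypothetical, simplified 32-bit memory BAR (no 64-bit high half). */
struct bar32 {
    uint32_t gaddr;      /* Guest-programmed base address. */
    uint32_t size;       /* Size in bytes, must be a power of two. */
    uint32_t attributes; /* Low 4 read-only bits (type, prefetchable). */
    bool sizing;         /* Set after the guest writes all ones. */
};

static void bar_write(struct bar32 *bar, uint32_t val)
{
    assert(bar->size && !(bar->size & (bar->size - 1)));

    if ( val == UINT32_MAX )
    {
        /* Start of a sizing cycle: the next read returns the size mask. */
        bar->sizing = true;
        return;
    }

    /* Any other write ends the sizing cycle and relocates the BAR. */
    bar->sizing = false;
    bar->gaddr = val & BAR_MEM_MASK;
}

static uint32_t bar_read(const struct bar32 *bar)
{
    uint32_t val = bar->sizing ? ~(bar->size - 1) : bar->gaddr;

    /* The attribute bits are reported regardless of the sizing state. */
    return val | bar->attributes;
}
```

For a 64KiB prefetchable BAR, writing all ones and reading back yields
0xffff0008, from which the guest derives the size in the usual way.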

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 xen/drivers/vpci/Makefile |   2 +-
 xen/drivers/vpci/header.c | 270 ++++++++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/vpci.h    |  27 +++++
 3 files changed, 298 insertions(+), 1 deletion(-)
 create mode 100644 xen/drivers/vpci/header.c

diff --git a/xen/drivers/vpci/Makefile b/xen/drivers/vpci/Makefile
index 840a906470..241467212f 100644
--- a/xen/drivers/vpci/Makefile
+++ b/xen/drivers/vpci/Makefile
@@ -1 +1 @@
-obj-y += vpci.o
+obj-y += vpci.o header.o
diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
new file mode 100644
index 0000000000..808888c329
--- /dev/null
+++ b/xen/drivers/vpci/header.c
@@ -0,0 +1,270 @@
+/*
+ * Generic functionality for handling accesses to the PCI header from the
+ * configuration space.
+ *
+ * Copyright (C) 2017 Citrix Systems R&D
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/sched.h>
+#include <xen/vpci.h>
+#include <xen/p2m-common.h>
+
+static int vpci_modify_bars(struct pci_dev *pdev, const bool map)
+{
+    struct vpci_header *header = &pdev->vpci->header;
+    unsigned int i;
+    int rc = 0;
+
+    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
+    {
+        paddr_t gaddr = map ? header->bars[i].gaddr
+                            : header->bars[i].mapped_addr;
+        paddr_t paddr = header->bars[i].paddr;
+
+        if ( header->bars[i].type != VPCI_BAR_MEM &&
+             header->bars[i].type != VPCI_BAR_MEM64_LO )
+            continue;
+
+        rc = modify_mmio(pdev->domain, PFN_DOWN(gaddr), PFN_DOWN(paddr),
+                         PFN_UP(header->bars[i].size), map);
+        if ( rc )
+            break;
+
+        header->bars[i].mapped_addr = map ? gaddr : 0;
+    }
+
+    return rc;
+}
+
+static int vpci_cmd_read(struct pci_dev *pdev, unsigned int reg,
+                         union vpci_val *val, void *data)
+{
+    struct vpci_header *header = data;
+
+    val->word = header->command;
+
+    return 0;
+}
+
+static int vpci_cmd_write(struct pci_dev *pdev, unsigned int reg,
+                          union vpci_val val, void *data)
+{
+    struct vpci_header *header = data;
+    uint16_t new_cmd, saved_cmd;
+    uint8_t seg = pdev->seg, bus = pdev->bus;
+    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
+    int rc;
+
+    new_cmd = val.word;
+    saved_cmd = header->command;
+
+    if ( !((new_cmd ^ saved_cmd) & PCI_COMMAND_MEMORY) )
+        goto out;
+
+    /* Memory space access change. */
+    rc = vpci_modify_bars(pdev, new_cmd & PCI_COMMAND_MEMORY);
+    if ( rc )
+    {
+        dprintk(XENLOG_ERR,
+                "%04x:%02x:%02x.%u:unable to %smap BARs: %d\n",
+                seg, bus, slot, func,
+                new_cmd & PCI_COMMAND_MEMORY ? "" : "un", rc);
+        return rc;
+    }
+
+ out:
+    pci_conf_write16(seg, bus, slot, func, reg, new_cmd);
+    header->command = pci_conf_read16(seg, bus, slot, func, reg);
+    return 0;
+}
+
+static int vpci_bar_read(struct pci_dev *pdev, unsigned int reg,
+                         union vpci_val *val, void *data)
+{
+    struct vpci_bar *bar = data;
+    bool hi = false;
+
+    ASSERT(bar->type == VPCI_BAR_MEM || bar->type == VPCI_BAR_MEM64_LO ||
+           bar->type == VPCI_BAR_MEM64_HI);
+
+    if ( bar->type == VPCI_BAR_MEM64_HI )
+    {
+        ASSERT(reg - PCI_BASE_ADDRESS_0 > 0);
+        bar--;
+        hi = true;
+    }
+
+    if ( bar->sizing )
+        val->double_word = ~(bar->size - 1) >> (hi ? 32 : 0);
+    else
+        val->double_word = bar->gaddr >> (hi ? 32 : 0);
+
+    val->double_word |= hi ? 0 : bar->attributes;
+
+    return 0;
+}
+
+static int vpci_bar_write(struct pci_dev *pdev, unsigned int reg,
+                          union vpci_val val, void *data)
+{
+    struct vpci_bar *bar = data;
+    uint32_t wdata = val.double_word;
+    bool hi = false;
+
+    ASSERT(bar->type == VPCI_BAR_MEM || bar->type == VPCI_BAR_MEM64_LO ||
+           bar->type == VPCI_BAR_MEM64_HI);
+
+    if ( wdata == GENMASK(31, 0) )
+    {
+        /* Next reads from this register are going to return the BAR size. */
+        bar->sizing = true;
+        return 0;
+    }
+
+    /* End previous sizing cycle if any. */
+    bar->sizing = false;
+
+    if ( bar->type == VPCI_BAR_MEM64_HI )
+    {
+        ASSERT(reg - PCI_BASE_ADDRESS_0 > 0);
+        bar--;
+        hi = true;
+    }
+
+    /* Update the relevant part of the BAR address. */
+    bar->gaddr &= hi ? ~GENMASK(63, 32) : ~GENMASK(31, 0);
+    wdata &= hi ? GENMASK(31, 0) : PCI_BASE_ADDRESS_MEM_MASK;
+    bar->gaddr |= (uint64_t)wdata << (hi ? 32 : 0);
+
+    ASSERT(IS_ALIGNED(bar->gaddr, PAGE_SIZE));
+
+    return 0;
+}
+
+static int vpci_init_bars(struct pci_dev *pdev)
+{
+    uint8_t seg = pdev->seg, bus = pdev->bus;
+    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
+    uint8_t header_type;
+    unsigned int i, num_bars;
+    struct vpci_header *header = &pdev->vpci->header;
+    struct vpci_bar *bars = header->bars;
+    int rc;
+
+    header_type = pci_conf_read8(seg, bus, slot, func, PCI_HEADER_TYPE) & 0x7f;
+    if ( header_type == PCI_HEADER_TYPE_NORMAL )
+        num_bars = 6;
+    else if ( header_type == PCI_HEADER_TYPE_BRIDGE )
+        num_bars = 2;
+    else
+        return -ENOSYS;
+
+    /* Set up a handler for the command register. */
+    header->command = pci_conf_read16(seg, bus, slot, func, PCI_COMMAND);
+    rc = xen_vpci_add_register(pdev, vpci_cmd_read, vpci_cmd_write,
+                               PCI_COMMAND, 2, header);
+    if ( rc )
+    {
+        dprintk(XENLOG_ERR,
+                "%04x:%02x:%02x.%u: failed to add handler register %#x: %d\n",
+                seg, bus, slot, func, PCI_COMMAND, rc);
+        return rc;
+    }
+
+    for ( i = 0; i < num_bars; i++ )
+    {
+        uint8_t reg = PCI_BASE_ADDRESS_0 + i * 4;
+        uint32_t val = pci_conf_read32(seg, bus, slot, func, reg);
+        uint64_t addr, size;
+        unsigned int index;
+
+        if ( i && bars[i - 1].type == VPCI_BAR_MEM64_LO )
+        {
+            bars[i].type = VPCI_BAR_MEM64_HI;
+            continue;
+        }
+        else if ( (val & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO )
+        {
+            bars[i].type = VPCI_BAR_IO;
+            continue;
+        }
+        else if ( (val & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
+                  PCI_BASE_ADDRESS_MEM_TYPE_64 )
+            bars[i].type = VPCI_BAR_MEM64_LO;
+        else
+            bars[i].type = VPCI_BAR_MEM;
+
+        /* Size the BAR and map it. */
+        index = i;
+        rc = pci_size_bar(seg, bus, slot, func, PCI_BASE_ADDRESS_0, num_bars,
+                          &index, &addr, &size);
+        if ( rc )
+        {
+            dprintk(XENLOG_ERR,
+                    "%04x:%02x:%02x.%u: unable to size BAR#%u: %d\n",
+                    seg, bus, slot, func, i, rc);
+            return rc;
+        }
+
+        if ( size == 0 )
+        {
+            bars[i].type = VPCI_BAR_EMPTY;
+            continue;
+        }
+
+        ASSERT(IS_ALIGNED(addr, PAGE_SIZE));
+
+        /* Initial guest address is the hardware one. */
+        bars[i].gaddr = bars[i].paddr = addr;
+        bars[i].size = size;
+        bars[i].attributes = val & ~PCI_BASE_ADDRESS_MEM_MASK;
+
+        rc = xen_vpci_add_register(pdev, vpci_bar_read, vpci_bar_write, reg,
+                                   4, &bars[i]);
+        if ( rc )
+        {
+            dprintk(XENLOG_ERR,
+                    "%04x:%02x:%02x.%u: failed to add handler for BAR#%u: %d\n",
+                    seg, bus, slot, func, i, rc);
+            return rc;
+        }
+    }
+
+    if ( header->command & PCI_COMMAND_MEMORY )
+    {
+        rc = vpci_modify_bars(pdev, true);
+        if ( rc )
+        {
+            dprintk(XENLOG_ERR, "%04x:%02x:%02x.%u: unable to map BARs: %d\n",
+                    seg, bus, slot, func, rc);
+            return rc;
+        }
+    }
+
+    return 0;
+}
+
+REGISTER_VPCI_INIT(vpci_init_bars);
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
+
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 56e8d1c35e..68a2ab9cd5 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -50,6 +50,33 @@ int xen_vpci_write(unsigned int seg, unsigned int bus, unsigned int devfn,
 struct vpci {
     /* Root pointer for the tree of vPCI handlers. */
     struct rb_root handlers;
+
+#ifdef __XEN__
+    /* Hide the rest of the vpci struct from the user-space test harness. */
+    struct vpci_header {
+        /* Cached value of the command register. */
+        uint16_t command;
+        /* Information about the PCI BARs of this device. */
+        struct vpci_bar {
+            enum {
+                VPCI_BAR_EMPTY,
+                VPCI_BAR_IO,
+                VPCI_BAR_MEM,
+                VPCI_BAR_MEM64_LO,
+                VPCI_BAR_MEM64_HI,
+            } type;
+            /* Hardware address. */
+            paddr_t paddr;
+            /* Guest address where the BAR should be mapped. */
+            paddr_t gaddr;
+            /* Current guest address where the BAR is mapped. */
+            paddr_t mapped_addr;
+            size_t size;
+            unsigned int attributes:4;
+            bool sizing;
+        } bars[6];
+    } header;
+#endif
 };
 
 #endif
-- 
2.11.0 (Apple Git-81)



* [PATCH v2 6/9] xen/vpci: trap access to the list of PCI capabilities
  2017-04-20 15:17 [PATCH v2 0/9] vpci: PCI config space emulation Roger Pau Monne
                   ` (4 preceding siblings ...)
  2017-04-20 15:17 ` [PATCH v2 5/9] xen/vpci: add handlers to map the BARs Roger Pau Monne
@ 2017-04-20 15:17 ` Roger Pau Monne
  2017-04-20 15:17 ` [PATCH v2 7/9] vpci: add a priority field to the vPCI register initializer Roger Pau Monne
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 40+ messages in thread
From: Roger Pau Monne @ 2017-04-20 15:17 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, boris.ostrovsky, Roger Pau Monne, Jan Beulich

Add traps to each capability PCI_CAP_LIST_NEXT field in order to mask them on
request.

All capabilities from the device are fetched and stored in an internal list,
that's later used in order to return the next capability to the guest. Note
that this only removes the capability from the linked list as seen by the
guest, but the actual capability structure could still be accessed by the
guest, provided that its position can be found using another mechanism.
Finally the MSI and MSI-X capabilities are masked until Xen knows how to
properly handle accesses to them.
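
To illustrate the idea (this is a sketch, not the code in the patch), the
lookup done on a read of a "next capability" pointer can be modelled with a
flat array instead of a Xen list. struct cap and cap_next are hypothetical
names; entry 0 stands for the root pointer at PCI_CAPABILITY_LIST (0x34):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened version of the per-device capability list. */
struct cap {
    uint8_t offset; /* Position of the capability in config space. */
    bool masked;    /* Hidden from the guest when true. */
};

/*
 * Return the value the guest sees when reading the "next" pointer of
 * caps[cur]: the offset of the following non-masked entry, or 0 at the end.
 */
static uint8_t cap_next(const struct cap *caps, size_t n, size_t cur)
{
    size_t i;

    assert(cur < n);

    for ( i = cur + 1; i < n; i++ )
        if ( !caps[i].masked )
            return caps[i].offset;

    return 0;
}
```

A masked entry is simply skipped, so the guest-visible chain jumps from the
previous capability straight to the one after the masked entry.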

This should allow a PVH Dom0 to boot on some hardware, provided that the
hardware doesn't require MSI/MSI-X and that there are no SR-IOV devices in the
system, so the panic at the end of the PVH Dom0 build is replaced by a
warning.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v1:
 - Add missing newline between cmd handlers.
 - Switch the handler to use list_for_each_entry_continue instead of a wrong
   open-coded version of it.
---
 xen/arch/x86/hvm/dom0_build.c   |   2 +-
 xen/drivers/vpci/Makefile       |   2 +-
 xen/drivers/vpci/capabilities.c | 159 ++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/vpci.h          |   3 +
 4 files changed, 164 insertions(+), 2 deletions(-)
 create mode 100644 xen/drivers/vpci/capabilities.c

diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index 65f606d33a..bcd10bd69c 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -1097,7 +1097,7 @@ int __init dom0_construct_pvh(struct domain *d, const module_t *image,
         return rc;
     }
 
-    panic("Building a PVHv2 Dom0 is not yet supported.");
+    printk("WARNING: PVH is an experimental mode with limited functionality\n");
     return 0;
 }
 
diff --git a/xen/drivers/vpci/Makefile b/xen/drivers/vpci/Makefile
index 241467212f..c3f3085c93 100644
--- a/xen/drivers/vpci/Makefile
+++ b/xen/drivers/vpci/Makefile
@@ -1 +1 @@
-obj-y += vpci.o header.o
+obj-y += vpci.o header.o capabilities.o
diff --git a/xen/drivers/vpci/capabilities.c b/xen/drivers/vpci/capabilities.c
new file mode 100644
index 0000000000..b2a3326aa7
--- /dev/null
+++ b/xen/drivers/vpci/capabilities.c
@@ -0,0 +1,159 @@
+/*
+ * Generic functionality for handling accesses to the PCI capabilities from
+ * the configuration space.
+ *
+ * Copyright (C) 2017 Citrix Systems R&D
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/sched.h>
+#include <xen/vpci.h>
+
+struct vpci_capability {
+    struct list_head next;
+    uint8_t offset;
+    bool masked;
+};
+
+static int vpci_cap_read(struct pci_dev *pdev, unsigned int reg,
+                         union vpci_val *val, void *data)
+{
+    struct vpci_capability *cap = data;
+
+    val->half_word = 0;
+
+    /* Return the position of the next non-masked capability. */
+    list_for_each_entry_continue ( cap, &pdev->vpci->cap_list, next )
+    {
+        if ( !cap->masked )
+        {
+            val->half_word = cap->offset;
+            break;
+        }
+    }
+
+    return 0;
+}
+
+static int vpci_cap_write(struct pci_dev *pdev, unsigned int reg,
+                          union vpci_val val, void *data)
+{
+    /* Ignored. */
+    return 0;
+}
+
+static int vpci_index_capabilities(struct pci_dev *pdev)
+{
+    uint8_t seg = pdev->seg, bus = pdev->bus;
+    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
+    uint8_t pos = PCI_CAPABILITY_LIST;
+    uint16_t status;
+    unsigned int max_cap = 48;
+    struct vpci_capability *cap;
+    int rc;
+
+    INIT_LIST_HEAD(&pdev->vpci->cap_list);
+
+    /* Check if device has capabilities. */
+    status = pci_conf_read16(seg, bus, slot, func, PCI_STATUS);
+    if ( !(status & PCI_STATUS_CAP_LIST) )
+        return 0;
+
+    /* Add the root capability pointer. */
+    cap = xzalloc(struct vpci_capability);
+    if ( !cap )
+        return -ENOMEM;
+
+    cap->offset = pos;
+    list_add_tail(&cap->next, &pdev->vpci->cap_list);
+    rc = xen_vpci_add_register(pdev, vpci_cap_read, vpci_cap_write, pos,
+                               1, cap);
+    if ( rc )
+        return rc;
+
+    /*
+     * Iterate over the list of capabilities present in the device, and
+     * add a handler for each register pointer to the next item
+     * (PCI_CAP_LIST_NEXT).
+     */
+    while ( max_cap-- )
+    {
+        pos = pci_conf_read8(seg, bus, slot, func, pos);
+        if ( pos < 0x40 )
+            break;
+
+        cap = xzalloc(struct vpci_capability);
+        if ( !cap )
+            return -ENOMEM;
+
+        cap->offset = pos;
+        list_add_tail(&cap->next, &pdev->vpci->cap_list);
+        pos += PCI_CAP_LIST_NEXT;
+        rc = xen_vpci_add_register(pdev, vpci_cap_read, vpci_cap_write, pos,
+                                   1, cap);
+        if ( rc )
+            return rc;
+    }
+
+    return 0;
+}
+
+static void vpci_mask_capability(struct pci_dev *pdev, uint8_t cap_id)
+{
+    struct vpci_capability *cap;
+    uint8_t cap_offset;
+
+    cap_offset = pci_find_cap_offset(pdev->seg, pdev->bus,
+                                     PCI_SLOT(pdev->devfn),
+                                     PCI_FUNC(pdev->devfn), cap_id);
+    if ( !cap_offset )
+        return;
+
+    list_for_each_entry ( cap, &pdev->vpci->cap_list, next )
+    {
+        if ( cap->offset == cap_offset )
+        {
+            cap->masked = true;
+            break;
+        }
+    }
+}
+
+static int vpci_capabilities_init(struct pci_dev *pdev)
+{
+    int rc;
+
+    rc = vpci_index_capabilities(pdev);
+    if ( rc )
+        return rc;
+
+    /* Mask MSI and MSI-X capabilities until Xen handles them. */
+    vpci_mask_capability(pdev, PCI_CAP_ID_MSI);
+    vpci_mask_capability(pdev, PCI_CAP_ID_MSIX);
+
+    return 0;
+}
+
+REGISTER_VPCI_INIT(vpci_capabilities_init);
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
+
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 68a2ab9cd5..53443f5164 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -76,6 +76,9 @@ struct vpci {
             bool sizing;
         } bars[6];
     } header;
+
+    /* List of capabilities supported by the device. */
+    struct list_head cap_list;
 #endif
 };
 
-- 
2.11.0 (Apple Git-81)



* [PATCH v2 7/9] vpci: add a priority field to the vPCI register initializer
  2017-04-20 15:17 [PATCH v2 0/9] vpci: PCI config space emulation Roger Pau Monne
                   ` (5 preceding siblings ...)
  2017-04-20 15:17 ` [PATCH v2 6/9] xen/vpci: trap access to the list of PCI capabilities Roger Pau Monne
@ 2017-04-20 15:17 ` Roger Pau Monne
  2017-04-20 15:17 ` [PATCH v2 8/9] vpci/msi: add MSI handlers Roger Pau Monne
  2017-04-20 15:17 ` [PATCH v2 9/9] vpci/msix: add MSI-X handlers Roger Pau Monne
  8 siblings, 0 replies; 40+ messages in thread
From: Roger Pau Monne @ 2017-04-20 15:17 UTC (permalink / raw)
  To: xen-devel; +Cc: boris.ostrovsky, Roger Pau Monne

And mark the capability and header vPCI register initializers as high priority,
so that they are initialized first.

This is needed for MSI-X, since MSI-X needs to know the position of the BARs in
order to perform its initialization, and in order to mask or enable the
MSI/MSI-X functionality on demand.
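
The resulting ordering can be sketched as a two-pass walk over the table of
initializers. This is a standalone approximation: struct init_entry,
run_inits and the logging helpers are invented for the example, and the
callbacks take a generic context instead of a struct pci_dev:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of struct vpci_register_init, taking a context. */
struct init_entry {
    int (*init)(void *ctx);
    bool priority;
};

/* Run all high priority initializers first, then the remaining ones. */
static int run_inits(const struct init_entry *tbl, size_t n, void *ctx)
{
    int pass;
    size_t i;

    assert(tbl || !n);

    for ( pass = 0; pass < 2; pass++ )
    {
        bool priority = (pass == 0);

        for ( i = 0; i < n; i++ )
        {
            int rc;

            if ( tbl[i].priority != priority )
                continue;

            rc = tbl[i].init(ctx);
            if ( rc )
                return rc;
        }
    }

    return 0;
}

/* Example initializers that record the order in which they ran. */
struct order_log {
    char buf[8];
    size_t len;
};

static int log_char(struct order_log *log, char c)
{
    if ( log->len + 1 >= sizeof(log->buf) )
        return -1;

    log->buf[log->len++] = c;
    log->buf[log->len] = '\0';

    return 0;
}

static int init_msix(void *ctx)   { return log_char(ctx, 'm'); }
static int init_header(void *ctx) { return log_char(ctx, 'h'); }
static int init_caps(void *ctx)   { return log_char(ctx, 'c'); }
```

Even if the MSI-X entry is placed first in the table, the header and
capability initializers run before it because they are flagged as priority.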

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 tools/tests/vpci/Makefile       |  4 ++--
 xen/drivers/vpci/capabilities.c |  2 +-
 xen/drivers/vpci/header.c       |  2 +-
 xen/drivers/vpci/vpci.c         | 14 ++++++++++++--
 xen/include/xen/vpci.h          | 13 +++++++++++--
 5 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/tools/tests/vpci/Makefile b/tools/tests/vpci/Makefile
index 7969fcbd82..e5edc4f512 100644
--- a/tools/tests/vpci/Makefile
+++ b/tools/tests/vpci/Makefile
@@ -31,8 +31,8 @@ vpci.c: $(XEN_ROOT)/xen/drivers/vpci/vpci.c
 	# Trick the compiler so it doesn't complain about missing symbols
 	sed -e '/#include/d' \
 	    -e '1s;^;#include "emul.h"\
-	             const vpci_register_init_t __start_vpci_array[1]\;\
-	             const vpci_register_init_t __end_vpci_array[1]\;\
+	             const struct vpci_register_init __start_vpci_array[1]\;\
+	             const struct vpci_register_init __end_vpci_array[1]\;\
 	             ;' <$< >$@
 
 rbtree.h: $(XEN_ROOT)/xen/include/xen/rbtree.h
diff --git a/xen/drivers/vpci/capabilities.c b/xen/drivers/vpci/capabilities.c
index b2a3326aa7..204355e673 100644
--- a/xen/drivers/vpci/capabilities.c
+++ b/xen/drivers/vpci/capabilities.c
@@ -145,7 +145,7 @@ static int vpci_capabilities_init(struct pci_dev *pdev)
     return 0;
 }
 
-REGISTER_VPCI_INIT(vpci_capabilities_init);
+REGISTER_VPCI_INIT(vpci_capabilities_init, true);
 
 /*
  * Local variables:
diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 808888c329..d77d82455f 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -256,7 +256,7 @@ static int vpci_init_bars(struct pci_dev *pdev)
     return 0;
 }
 
-REGISTER_VPCI_INIT(vpci_init_bars);
+REGISTER_VPCI_INIT(vpci_init_bars, true);
 
 /*
  * Local variables:
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index f4cd04f11d..9f9abadbdb 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -20,7 +20,7 @@
 #include <xen/sched.h>
 #include <xen/vpci.h>
 
-extern const vpci_register_init_t __start_vpci_array[], __end_vpci_array[];
+extern const struct vpci_register_init __start_vpci_array[], __end_vpci_array[];
 #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
 #define vpci_init __start_vpci_array
 
@@ -42,6 +42,7 @@ struct vpci_register {
 int xen_vpci_add_handlers(struct pci_dev *pdev)
 {
     int i, rc = 0;
+    bool priority = true;
 
     if ( !has_vpci(pdev->domain) )
         return 0;
@@ -52,9 +53,13 @@ int xen_vpci_add_handlers(struct pci_dev *pdev)
 
     pdev->vpci->handlers = RB_ROOT;
 
+ again:
     for ( i = 0; i < NUM_VPCI_INIT; i++ )
     {
-        rc = vpci_init[i](pdev);
+        if ( priority != vpci_init[i].priority )
+            continue;
+
+        rc = vpci_init[i].init(pdev);
         if ( rc )
             break;
     }
@@ -74,6 +79,11 @@ int xen_vpci_add_handlers(struct pci_dev *pdev)
         }
         xfree(pdev->vpci);
     }
+    else if ( priority )
+    {
+        priority = false;
+        goto again;
+    }
 
     return rc;
 }
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 53443f5164..75564b9d93 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -29,8 +29,17 @@ typedef int (*vpci_write_t)(struct pci_dev *pdev, unsigned int reg,
 
 typedef int (*vpci_register_init_t)(struct pci_dev *dev);
 
-#define REGISTER_VPCI_INIT(x) \
-  static const vpci_register_init_t x##_entry __used_section(".data.vpci") = x
+struct vpci_register_init {
+    vpci_register_init_t init;
+    bool priority;
+};
+
+#define REGISTER_VPCI_INIT(f, p)                                        \
+  static const struct vpci_register_init                                \
+                      f##_entry __used_section(".data.vpci") = {        \
+    .init = f,                                                          \
+    .priority = p,                                                      \
+}
 
 /* Add vPCI handlers to device. */
 int xen_vpci_add_handlers(struct pci_dev *dev);
-- 
2.11.0 (Apple Git-81)



* [PATCH v2 8/9] vpci/msi: add MSI handlers
  2017-04-20 15:17 [PATCH v2 0/9] vpci: PCI config space emulation Roger Pau Monne
                   ` (6 preceding siblings ...)
  2017-04-20 15:17 ` [PATCH v2 7/9] vpci: add a priority field to the vPCI register initializer Roger Pau Monne
@ 2017-04-20 15:17 ` Roger Pau Monne
  2017-04-21  8:38   ` Roger Pau Monne
  2017-04-24 15:31   ` Julien Grall
  2017-04-20 15:17 ` [PATCH v2 9/9] vpci/msix: add MSI-X handlers Roger Pau Monne
  8 siblings, 2 replies; 40+ messages in thread
From: Roger Pau Monne @ 2017-04-20 15:17 UTC (permalink / raw)
  To: xen-devel; +Cc: boris.ostrovsky, Roger Pau Monne

Add handlers for the MSI control, address, data and mask fields in order to
detect accesses to them and set up the interrupts as requested by the guest.
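
For reference, a rough sketch of how the trapped address and data values
decompose on x86, assuming the non-remapped (compatibility) MSI message
format. struct msi_fields and msi_decode are hypothetical and do not
correspond to helpers added by this patch:

```c
#include <assert.h>
#include <stdint.h>

/* Decoded fields of an x86 MSI message (hypothetical helper type). */
struct msi_fields {
    uint8_t vector;     /* Data bits 7:0. */
    uint8_t deliv_mode; /* Data bits 10:8. */
    uint8_t trig_mode;  /* Data bit 15. */
    uint8_t dest_id;    /* Address bits 19:12. */
    uint8_t dest_mode;  /* Address bit 2. */
    uint8_t redir_hint; /* Address bit 3. */
};

static struct msi_fields msi_decode(uint64_t addr, uint16_t data)
{
    struct msi_fields f;

    /* Assumes the low 32 bits fall in the 0xFEExxxxx MSI window. */
    assert((addr & 0xfff00000u) == 0xfee00000u);

    f.vector = data & 0xff;
    f.deliv_mode = (data >> 8) & 0x7;
    f.trig_mode = (data >> 15) & 0x1;
    f.dest_id = (addr >> 12) & 0xff;
    f.dest_mode = (addr >> 2) & 0x1;
    f.redir_hint = (addr >> 3) & 0x1;

    return f;
}
```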

Note that the pending register is not trapped, and the guest can freely
read/write to it.

Whether Xen is going to provide this functionality to Dom0 (MSI emulation) is
controlled by the "msi" option in the dom0 field. When disabling this option
Xen will hide the MSI capability structure from Dom0.
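
The "no-" prefix handling can be sketched as follows. This is a user-space
approximation of the parsing logic with a hypothetical struct dom0_opts in
place of the Xen globals, and it only covers the "pvh" and "msi" sub-options:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical container for the flags; Xen uses separate globals. */
struct dom0_opts {
    bool pvh;
    bool msi;
};

/* Parse a comma separated list such as "pvh,no-msi" in place. */
static void parse_dom0(char *s, struct dom0_opts *o)
{
    assert(s && o);

    while ( s )
    {
        char *next;
        /* A leading "no-" negates the option that follows it. */
        bool enabled = strncmp(s, "no-", 3) != 0;

        if ( !enabled )
            s += 3;

        /* Terminate the current token and remember where the next starts. */
        next = strchr(s, ',');
        if ( next )
            *next++ = '\0';

        if ( !strcmp(s, "pvh") )
            o->pvh = enabled;
        else if ( !strcmp(s, "msi") )
            o->msi = enabled;

        s = next;
    }
}
```

With the default of msi being true, passing "pvh,no-msi" yields a PVH Dom0
with the MSI capability hidden.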

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Paul Durrant <paul.durrant@citrix.com>
---
NB: I've only been able to test this with devices using a single MSI interrupt
and no mask register. I will try to find hardware that supports the mask
register and more than one vector, but I cannot make any promises.

If there are doubts about the untested parts we could always force Xen to
report no per-vector masking support and only 1 available vector, but I would
rather avoid doing it.
---
 docs/misc/xen-command-line.markdown |   9 +-
 xen/arch/x86/dom0_build.c           |  12 +-
 xen/arch/x86/hvm/vmsi.c             |  21 ++
 xen/drivers/vpci/Makefile           |   2 +-
 xen/drivers/vpci/capabilities.c     |   7 +-
 xen/drivers/vpci/msi.c              | 469 ++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/io.h        |   4 +
 xen/include/asm-x86/msi.h           |   2 +
 xen/include/xen/hvm/irq.h           |   1 +
 xen/include/xen/vpci.h              |  26 ++
 10 files changed, 545 insertions(+), 8 deletions(-)
 create mode 100644 xen/drivers/vpci/msi.c

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 450b222734..38a8d05e63 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -660,7 +660,7 @@ affinities to prefer but be not limited to the specified node(s).
 Pin dom0 vcpus to their respective pcpus
 
 ### dom0
-> `= List of [ pvh | shadow ]`
+> `= List of [ pvh | shadow | msi ]`
 
 > Sub-options:
 
@@ -677,6 +677,13 @@ Flag that makes a dom0 boot in PVHv2 mode.
 Flag that makes a dom0 use shadow paging. Only works when "pvh" is
 enabled.
 
+> `msi`
+
+> Default: `true`
+
+Enable or disable (using the `no-` prefix) the MSI emulation inside of
+Xen for a PVH Dom0. Note that this option has no effect on a PV Dom0.
+
 ### dtuart (ARM)
 > `= path [:options]`
 
diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index cc8acad688..01afcf6215 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -176,29 +176,37 @@ struct vcpu *__init alloc_dom0_vcpu0(struct domain *dom0)
 bool __initdata opt_dom0_shadow;
 #endif
 bool __initdata dom0_pvh;
+bool __initdata dom0_msi = true;
 
 /*
  * List of parameters that affect Dom0 creation:
  *
  *  - pvh               Create a PVHv2 Dom0.
  *  - shadow            Use shadow paging for Dom0.
+ *  - msi               MSI functionality.
  */
 static void __init parse_dom0_param(char *s)
 {
     char *ss;
+    bool enabled;
 
     do {
+        enabled = !!strncmp(s, "no-", 3);
+        if ( !enabled )
+            s += 3;
 
         ss = strchr(s, ',');
         if ( ss )
             *ss = '\0';
 
         if ( !strcmp(s, "pvh") )
-            dom0_pvh = true;
+            dom0_pvh = enabled;
 #ifdef CONFIG_SHADOW_PAGING
         else if ( !strcmp(s, "shadow") )
-            opt_dom0_shadow = true;
+            opt_dom0_shadow = enabled;
 #endif
+        else if ( !strcmp(s, "msi") )
+            dom0_msi = enabled;
 
         s = ss + 1;
     } while ( ss );
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index a36692c313..614d975efe 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -622,3 +622,24 @@ void msix_write_completion(struct vcpu *v)
     if ( msixtbl_write(v, ctrl_address, 4, 0) != X86EMUL_OKAY )
         gdprintk(XENLOG_WARNING, "MSI-X write completion failure\n");
 }
+
+unsigned int msi_vector(uint16_t data)
+{
+    return (data & MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
+}
+
+unsigned int msi_flags(uint16_t data, uint64_t addr)
+{
+    unsigned int rh, dm, dest_id, deliv_mode, trig_mode;
+
+    rh = (addr >> MSI_ADDR_REDIRECTION_SHIFT) & 0x1;
+    dm = (addr >> MSI_ADDR_DESTMODE_SHIFT) & 0x1;
+    dest_id = (addr & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
+    deliv_mode = (data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x7;
+    trig_mode = (data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
+
+    return dest_id | (rh << GFLAGS_SHIFT_RH) | (dm << GFLAGS_SHIFT_DM) |
+           (deliv_mode << GFLAGS_SHIFT_DELIV_MODE) |
+           (trig_mode << GFLAGS_SHIFT_TRG_MODE);
+}
+
diff --git a/xen/drivers/vpci/Makefile b/xen/drivers/vpci/Makefile
index c3f3085c93..ef4fc6caf3 100644
--- a/xen/drivers/vpci/Makefile
+++ b/xen/drivers/vpci/Makefile
@@ -1 +1 @@
-obj-y += vpci.o header.o capabilities.o
+obj-y += vpci.o header.o capabilities.o msi.o
diff --git a/xen/drivers/vpci/capabilities.c b/xen/drivers/vpci/capabilities.c
index 204355e673..ad9f45c2e1 100644
--- a/xen/drivers/vpci/capabilities.c
+++ b/xen/drivers/vpci/capabilities.c
@@ -109,7 +109,7 @@ static int vpci_index_capabilities(struct pci_dev *pdev)
     return 0;
 }
 
-static void vpci_mask_capability(struct pci_dev *pdev, uint8_t cap_id)
+void xen_vpci_mask_capability(struct pci_dev *pdev, uint8_t cap_id)
 {
     struct vpci_capability *cap;
     uint8_t cap_offset;
@@ -138,9 +138,8 @@ static int vpci_capabilities_init(struct pci_dev *pdev)
     if ( rc )
         return rc;
 
-    /* Mask MSI and MSI-X capabilities until Xen handles them. */
-    vpci_mask_capability(pdev, PCI_CAP_ID_MSI);
-    vpci_mask_capability(pdev, PCI_CAP_ID_MSIX);
+    /* Mask MSI-X capability until Xen handles it. */
+    xen_vpci_mask_capability(pdev, PCI_CAP_ID_MSIX);
 
     return 0;
 }
diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
new file mode 100644
index 0000000000..aea6c68907
--- /dev/null
+++ b/xen/drivers/vpci/msi.c
@@ -0,0 +1,469 @@
+/*
+ * Handlers for accesses to the MSI capability structure.
+ *
+ * Copyright (C) 2017 Citrix Systems R&D
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/sched.h>
+#include <xen/vpci.h>
+#include <asm/msi.h>
+#include <xen/keyhandler.h>
+
+static void vpci_msi_mask_pirq(int pirq, bool mask)
+{
+    struct pirq *pinfo = pirq_info(current->domain, pirq);
+    struct irq_desc *desc;
+    unsigned long flags;
+    int irq;
+
+    ASSERT(pinfo);
+    irq = pinfo->arch.irq;
+    ASSERT(irq < nr_irqs);
+
+    desc = irq_to_desc(irq);
+    ASSERT(desc);
+
+    spin_lock_irqsave(&desc->lock, flags);
+    guest_mask_msi_irq(desc, mask);
+    spin_unlock_irqrestore(&desc->lock, flags);
+}
+
+/* Handlers for the MSI control field (PCI_MSI_FLAGS). */
+static int vpci_msi_control_read(struct pci_dev *pdev, unsigned int reg,
+                                 union vpci_val *val, void *data)
+{
+    struct vpci_msi *msi = data;
+
+    if ( msi->enabled )
+        val->word |= PCI_MSI_FLAGS_ENABLE;
+    if ( msi->masking )
+        val->word |= PCI_MSI_FLAGS_MASKBIT;
+    if ( msi->address64 )
+        val->word |= PCI_MSI_FLAGS_64BIT;
+
+    /* Set multiple message capable. */
+    val->word |= ((fls(msi->max_vectors) - 1) << 1) & PCI_MSI_FLAGS_QMASK;
+
+    /* Set current number of configured vectors. */
+    val->word |= ((fls(msi->guest_vectors) - 1) << 4) & PCI_MSI_FLAGS_QSIZE;
+
+    return 0;
+}
+
+static int vpci_msi_control_write(struct pci_dev *pdev, unsigned int reg,
+                                  union vpci_val val, void *data)
+{
+    struct vpci_msi *msi = data;
+    unsigned int i, vectors = 1 << ((val.word & PCI_MSI_FLAGS_QSIZE) >> 4);
+    int rc;
+
+    if ( vectors > msi->max_vectors )
+        return -EINVAL;
+
+    msi->guest_vectors = vectors;
+
+    if ( !((val.word ^ msi->enabled) & PCI_MSI_FLAGS_ENABLE) )
+        return 0;
+
+    if ( val.word & PCI_MSI_FLAGS_ENABLE )
+    {
+        int index = -1;
+        struct msi_info msi_info = {
+            .seg = pdev->seg,
+            .bus = pdev->bus,
+            .devfn = pdev->devfn,
+            .entry_nr = vectors,
+        };
+
+        ASSERT(!msi->enabled);
+
+        /* Get a PIRQ. */
+        rc = allocate_and_map_msi_pirq(pdev->domain, &index, &msi->pirq,
+                                       &msi_info);
+        if ( rc )
+        {
+            dprintk(XENLOG_ERR, "%04x:%02x:%02x.%u: failed to map PIRQ: %d\n",
+                    pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                    PCI_FUNC(pdev->devfn), rc);
+            return rc;
+        }
+
+        ASSERT(msi->pirq != -1);
+        ASSERT(msi->vectors == 0);
+        msi->vectors = vectors;
+
+        for ( i = 0; i < vectors; i++ )
+        {
+            xen_domctl_bind_pt_irq_t bind = {
+                .hvm_domid = DOMID_SELF,
+                .machine_irq = msi->pirq + i,
+                .irq_type = PT_IRQ_TYPE_MSI,
+                .u.msi.gvec = msi_vector(msi->data) + i,
+                .u.msi.gflags = msi_flags(msi->data, msi->address),
+            };
+
+            pcidevs_lock();
+            rc = pt_irq_create_bind(pdev->domain, &bind);
+            if ( rc )
+            {
+                dprintk(XENLOG_ERR,
+                        "%04x:%02x:%02x.%u: failed to bind PIRQ %u: %d\n",
+                        pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                        PCI_FUNC(pdev->devfn), msi->pirq + i, rc);
+                spin_lock(&pdev->domain->event_lock);
+                unmap_domain_pirq(pdev->domain, msi->pirq);
+                spin_unlock(&pdev->domain->event_lock);
+                pcidevs_unlock();
+                msi->pirq = -1;
+                msi->vectors = 0;
+                return rc;
+            }
+            pcidevs_unlock();
+        }
+
+        /* Apply the mask bits. */
+        if ( msi->masking )
+        {
+            uint32_t mask = msi->mask;
+
+            while ( mask )
+            {
+                unsigned int i = ffs(mask) - 1; /* ffs() is 1-based. */
+
+                vpci_msi_mask_pirq(msi->pirq + i, true);
+                __clear_bit(i, &mask);
+            }
+        }
+
+        __msi_set_enable(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                         PCI_FUNC(pdev->devfn), reg - PCI_MSI_FLAGS, 1);
+        msi->enabled = true;
+    }
+    else
+    {
+        ASSERT(msi->enabled);
+        __msi_set_enable(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                         PCI_FUNC(pdev->devfn), reg - PCI_MSI_FLAGS, 0);
+
+        for ( i = 0; i < msi->vectors; i++ )
+        {
+            xen_domctl_bind_pt_irq_t bind = {
+                .hvm_domid = DOMID_SELF,
+                .machine_irq = msi->pirq + i,
+                .irq_type = PT_IRQ_TYPE_MSI,
+            };
+
+            pcidevs_lock();
+            pt_irq_destroy_bind(pdev->domain, &bind);
+            pcidevs_unlock();
+        }
+
+        pcidevs_lock();
+        spin_lock(&pdev->domain->event_lock);
+        unmap_domain_pirq(pdev->domain, msi->pirq);
+        spin_unlock(&pdev->domain->event_lock);
+        pcidevs_unlock();
+
+        msi->pirq = -1;
+        msi->vectors = 0;
+        msi->enabled = false;
+    }
+
+    return 0;
+}
+
+/* Handlers for the address field (32bit or low part of a 64bit address). */
+static int vpci_msi_address_read(struct pci_dev *pdev, unsigned int reg,
+                                 union vpci_val *val, void *data)
+{
+    struct vpci_msi *msi = data;
+
+    val->double_word = msi->address;
+
+    return 0;
+}
+
+static int vpci_msi_address_write(struct pci_dev *pdev, unsigned int reg,
+                                  union vpci_val val, void *data)
+{
+    struct vpci_msi *msi = data;
+
+    /* Clear low part. */
+    msi->address &= ~GENMASK(31, 0);
+    msi->address |= val.double_word;
+
+    return 0;
+}
+
+/* Handlers for the high part of a 64bit address field. */
+static int vpci_msi_address_upper_read(struct pci_dev *pdev, unsigned int reg,
+                                       union vpci_val *val, void *data)
+{
+    struct vpci_msi *msi = data;
+
+    val->double_word = msi->address >> 32;
+
+    return 0;
+}
+
+static int vpci_msi_address_upper_write(struct pci_dev *pdev, unsigned int reg,
+                                        union vpci_val val, void *data)
+{
+    struct vpci_msi *msi = data;
+
+    /* Clear high part. */
+    msi->address &= ~GENMASK(63, 32);
+    msi->address |= (uint64_t)val.double_word << 32;
+
+    return 0;
+}
+
+/* Handlers for the data field. */
+static int vpci_msi_data_read(struct pci_dev *pdev, unsigned int reg,
+                              union vpci_val *val, void *data)
+{
+    struct vpci_msi *msi = data;
+
+    val->word = msi->data;
+
+    return 0;
+}
+
+static int vpci_msi_data_write(struct pci_dev *pdev, unsigned int reg,
+                               union vpci_val val, void *data)
+{
+    struct vpci_msi *msi = data;
+
+    msi->data = val.word;
+
+    return 0;
+}
+
+static int vpci_msi_mask_read(struct pci_dev *pdev, unsigned int reg,
+                              union vpci_val *val, void *data)
+{
+    struct vpci_msi *msi = data;
+
+    val->double_word = msi->mask;
+
+    return 0;
+}
+
+static int vpci_msi_mask_write(struct pci_dev *pdev, unsigned int reg,
+                               union vpci_val val, void *data)
+{
+    struct vpci_msi *msi = data;
+    uint32_t dmask;
+
+    dmask = msi->mask ^ val.double_word;
+
+    if ( !dmask )
+        return 0;
+
+    while ( dmask && msi->pirq != -1 )
+    {
+        unsigned int i = ffs(dmask) - 1; /* ffs() is 1-based. */
+
+        vpci_msi_mask_pirq(msi->pirq + i, !test_bit(i, &msi->mask));
+        __clear_bit(i, &dmask);
+    }
+
+    msi->mask = val.double_word;
+    return 0;
+}
+
+static int vpci_init_msi(struct pci_dev *pdev)
+{
+    uint8_t seg = pdev->seg, bus = pdev->bus;
+    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
+    struct vpci_msi *msi = NULL;
+    unsigned int msi_offset;
+    uint16_t control;
+    int rc;
+
+    msi_offset = pci_find_cap_offset(seg, bus, slot, func, PCI_CAP_ID_MSI);
+    if ( !msi_offset )
+        return 0;
+
+    if ( !dom0_msi )
+    {
+        xen_vpci_mask_capability(pdev, PCI_CAP_ID_MSI);
+        return 0;
+    }
+
+    msi = xzalloc(struct vpci_msi);
+    if ( !msi )
+        return -ENOMEM;
+
+    control = pci_conf_read16(seg, bus, slot, func,
+                              msi_control_reg(msi_offset));
+
+    rc = xen_vpci_add_register(pdev, vpci_msi_control_read,
+                               vpci_msi_control_write,
+                               msi_control_reg(msi_offset), 2, msi);
+    if ( rc )
+    {
+        dprintk(XENLOG_ERR,
+                "%04x:%02x:%02x.%u: failed to add handler for MSI control: %d\n",
+                seg, bus, slot, func, rc);
+        goto error;
+    }
+
+    /* Get the maximum number of vectors the device supports. */
+    msi->max_vectors = multi_msi_capable(control);
+    ASSERT(msi->max_vectors <= 32);
+
+    /* Initial value after reset. */
+    msi->guest_vectors = 1;
+
+    /* No PIRQ bind yet. */
+    msi->pirq = -1;
+
+    if ( is_64bit_address(control) )
+        msi->address64 = true;
+    if ( is_mask_bit_support(control) )
+        msi->masking = true;
+
+    rc = xen_vpci_add_register(pdev, vpci_msi_address_read,
+                               vpci_msi_address_write,
+                               msi_lower_address_reg(msi_offset), 4, msi);
+    if ( rc )
+    {
+        dprintk(XENLOG_ERR,
+                "%04x:%02x:%02x.%u: failed to add handler for MSI address: %d\n",
+                seg, bus, slot, func, rc);
+        goto error;
+    }
+
+    rc = xen_vpci_add_register(pdev, vpci_msi_data_read, vpci_msi_data_write,
+                               msi_data_reg(msi_offset, msi->address64), 2,
+                               msi);
+    if ( rc )
+    {
+        dprintk(XENLOG_ERR,
+                "%04x:%02x:%02x.%u: failed to add handler for MSI data: %d\n",
+                seg, bus, slot, func, rc);
+        goto error;
+    }
+
+    if ( msi->address64 )
+    {
+        rc = xen_vpci_add_register(pdev, vpci_msi_address_upper_read,
+                                   vpci_msi_address_upper_write,
+                                   msi_upper_address_reg(msi_offset), 4, msi);
+        if ( rc )
+        {
+            dprintk(XENLOG_ERR,
+                    "%04x:%02x:%02x.%u: failed to add handler for MSI upper address: %d\n",
+                    seg, bus, slot, func, rc);
+            goto error;
+        }
+    }
+
+    if ( msi->masking )
+    {
+        rc = xen_vpci_add_register(pdev, vpci_msi_mask_read,
+                                   vpci_msi_mask_write,
+                                   msi_mask_bits_reg(msi_offset,
+                                                     msi->address64), 4, msi);
+        if ( rc )
+        {
+            dprintk(XENLOG_ERR,
+                    "%04x:%02x:%02x.%u: failed to add handler for MSI mask: %d\n",
+                    seg, bus, slot, func, rc);
+            goto error;
+        }
+    }
+
+    pdev->vpci->msi = msi;
+
+    return 0;
+
+ error:
+    ASSERT(rc);
+    xfree(msi);
+    return rc;
+}
+
+REGISTER_VPCI_INIT(vpci_init_msi, false);
+
+static void vpci_dump_msi(unsigned char key)
+{
+    struct domain *d;
+    struct pci_dev *pdev;
+
+    printk("Guest MSI information:\n");
+
+    for_each_domain ( d )
+    {
+        if ( !has_vpci(d) )
+            continue;
+
+        vpci_lock(d);
+        list_for_each_entry ( pdev, &d->arch.pdev_list, domain_list)
+        {
+            uint8_t seg = pdev->seg, bus = pdev->bus;
+            uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
+            struct vpci_msi *msi = pdev->vpci->msi;
+            uint16_t data;
+            uint64_t addr;
+
+            if ( !msi )
+                continue;
+
+            printk("Device %04x:%02x:%02x.%u\n", seg, bus, slot, func);
+
+            printk("Enabled: %u Supports masking: %u 64-bit addresses: %u\n",
+                   msi->enabled, msi->masking, msi->address64);
+            printk("Max vectors: %u guest vectors: %u enabled vectors: %u\n",
+                   msi->max_vectors, msi->guest_vectors, msi->vectors);
+
+            data = msi->data;
+            addr = msi->address;
+            printk("vec=%#02x%7s%6s%3sassert%5s%7s dest_id=%lu pirq=%d\n",
+                   (data & MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT,
+                   data & MSI_DATA_DELIVERY_LOWPRI ? "lowest" : "fixed",
+                   data & MSI_DATA_TRIGGER_LEVEL ? "level" : "edge",
+                   data & MSI_DATA_LEVEL_ASSERT ? "" : "de",
+                   addr & MSI_ADDR_DESTMODE_LOGIC ? "log" : "phys",
+                   addr & MSI_ADDR_REDIRECTION_LOWPRI ? "lowest" : "cpu",
+                   (addr & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT,
+                   msi->pirq);
+
+            if ( msi->masking )
+                printk("mask=%#032x\n", msi->mask);
+            printk("\n");
+        }
+        vpci_unlock(d);
+    }
+}
+
+static int __init vpci_msi_setup_keyhandler(void)
+{
+    register_keyhandler('Z', vpci_dump_msi, "dump guest MSI state", 1);
+    return 0;
+}
+__initcall(vpci_msi_setup_keyhandler);
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
+
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 0434aca706..899e37ae0f 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -126,6 +126,10 @@ void hvm_dpci_eoi(struct domain *d, unsigned int guest_irq,
 void msix_write_completion(struct vcpu *);
 void msixtbl_init(struct domain *d);
 
+/* Get the vector/flags from the MSI address/data fields. */
+unsigned int msi_vector(uint16_t data);
+unsigned int msi_flags(uint16_t data, uint64_t addr);
+
 enum stdvga_cache_state {
     STDVGA_CACHE_UNINITIALIZED,
     STDVGA_CACHE_ENABLED,
diff --git a/xen/include/asm-x86/msi.h b/xen/include/asm-x86/msi.h
index a5de6a1328..dcbec8cf04 100644
--- a/xen/include/asm-x86/msi.h
+++ b/xen/include/asm-x86/msi.h
@@ -251,4 +251,6 @@ void ack_nonmaskable_msi_irq(struct irq_desc *);
 void end_nonmaskable_msi_irq(struct irq_desc *, u8 vector);
 void set_msi_affinity(struct irq_desc *, const cpumask_t *);
 
+extern bool dom0_msi;
+
 #endif /* __ASM_MSI_H */
diff --git a/xen/include/xen/hvm/irq.h b/xen/include/xen/hvm/irq.h
index 0d2c72c109..37dfb3b6c5 100644
--- a/xen/include/xen/hvm/irq.h
+++ b/xen/include/xen/hvm/irq.h
@@ -58,6 +58,7 @@ struct dev_intx_gsi_link {
 #define VMSI_TRIG_MODE    0x8000
 
 #define GFLAGS_SHIFT_RH             8
+#define GFLAGS_SHIFT_DM             9
 #define GFLAGS_SHIFT_DELIV_MODE     12
 #define GFLAGS_SHIFT_TRG_MODE       15
 
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 75564b9d93..277e860d25 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -88,9 +88,35 @@ struct vpci {
 
     /* List of capabilities supported by the device. */
     struct list_head cap_list;
+
+    /* MSI data. */
+    struct vpci_msi {
+        /* Maximum number of vectors supported by the device. */
+        unsigned int max_vectors;
+        /* Current guest-written number of vectors. */
+        unsigned int guest_vectors;
+        /* Number of vectors configured. */
+        unsigned int vectors;
+        /* Address and data fields. */
+        uint64_t address;
+        uint16_t data;
+        /* PIRQ */
+        int pirq;
+        /* Mask bitfield. */
+        uint32_t mask;
+        /* MSI enabled? */
+        bool enabled;
+        /* Supports per-vector masking? */
+        bool masking;
+        /* 64-bit address capable? */
+        bool address64;
+    } *msi;
 #endif
 };
 
+/* Mask a PCI capability. */
+void xen_vpci_mask_capability(struct pci_dev *pdev, uint8_t cap_id);
+
 #endif
 
 /*
-- 
2.11.0 (Apple Git-81)



* [PATCH v2 9/9] vpci/msix: add MSI-X handlers
  2017-04-20 15:17 [PATCH v2 0/9] vpci: PCI config space emulation Roger Pau Monne
                   ` (7 preceding siblings ...)
  2017-04-20 15:17 ` [PATCH v2 8/9] vpci/msi: add MSI handlers Roger Pau Monne
@ 2017-04-20 15:17 ` Roger Pau Monne
  8 siblings, 0 replies; 40+ messages in thread
From: Roger Pau Monne @ 2017-04-20 15:17 UTC (permalink / raw)
  To: xen-devel; +Cc: boris.ostrovsky, Roger Pau Monne

Add handlers for accesses to the MSI-X message control field in the PCI
configuration space, and traps for accesses to the memory region that contains
the MSI-X table. These traps detect attempts by the guest to configure MSI-X
interrupts and set them up accordingly.
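
As an illustration of the kind of validation such a table trap has to do, here
is a minimal standalone sketch of the first two checks in
vpci_msix_access_check(): the access must be 4 or 8 bytes wide and must not
cross an entry boundary. The helper name msix_access_ok() and the
MSIX_ENTRY_SIZE macro are made up for this example (the patch uses
PCI_MSIX_ENTRY_SIZE), and the additional restriction on 64-bit accesses is
deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Size in bytes of one MSI-X table entry, as fixed by the PCI spec. */
#define MSIX_ENTRY_SIZE 16u

/*
 * Hypothetical helper, not part of the patch: return true when an access
 * of len bytes at addr is 4 or 8 bytes wide and stays within a single
 * MSI-X table entry.
 */
static bool msix_access_ok(uint64_t addr, unsigned int len)
{
    /* Only 32-bit and 64-bit accesses are accepted. */
    if ( len != 4 && len != 8 )
        return false;

    /* The offset inside the entry plus the length must fit in the entry. */
    return (addr & (MSIX_ENTRY_SIZE - 1)) + len <= MSIX_ENTRY_SIZE;
}
```

With this sketch an 8-byte access at offset 0x8 of an entry is accepted, while
an 8-byte access at offset 0xc is rejected because it would spill into the
next entry.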

Note that accesses to the Table Offset, Table BIR, PBA Offset, PBA BIR and the
PBA memory region itself are not trapped by Xen at the moment.
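
For reference, the Table Offset/Table BIR register that is left untrapped
packs two fields into one 32-bit value: the low three bits select the BAR and
the remaining bits give the 8-byte aligned offset of the table inside that
BAR. The sketch below only illustrates that layout; struct msix_location,
msix_decode_location() and MSIX_BIR_MASK are names invented for the example
and do not appear in the patch.

```c
#include <stdint.h>

/* The low three bits of the register hold the BAR indicator (BIR). */
#define MSIX_BIR_MASK 0x7u

/* Hypothetical container for the two decoded fields. */
struct msix_location {
    unsigned int bir;   /* Index of the BAR that contains the table. */
    uint32_t offset;    /* Byte offset of the table inside that BAR. */
};

/* Split a raw Table Offset/Table BIR register value into its two fields. */
static struct msix_location msix_decode_location(uint32_t reg)
{
    struct msix_location loc = {
        .bir = reg & MSIX_BIR_MASK,
        .offset = reg & ~MSIX_BIR_MASK,
    };

    return loc;
}
```

The PBA Offset/PBA BIR register uses the same layout, so the same decoding
would apply to it.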

Whether Xen provides this functionality (MSI-X emulation) to Dom0 is
controlled by the "msix" option in the dom0 field. When this option is
disabled, Xen hides the MSI-X capability structure from Dom0.
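
To show how the "no-" prefix interacts with the comma-separated list, here is
a small user-space sketch modelled on parse_dom0_param(). It is not the
hypervisor code: struct dom0_opts and parse_dom0_opts() are hypothetical
names, the defaults are supplied by the caller, and unknown options are
silently ignored.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical container for the options handled in this sketch. */
struct dom0_opts {
    bool pvh, msi, msix;
};

/*
 * Parse a comma-separated option list in place. A "no-" prefix turns the
 * named option off; a bare name turns it on. The string is modified.
 */
static void parse_dom0_opts(char *s, struct dom0_opts *o)
{
    while ( s )
    {
        char *next = strchr(s, ',');
        bool enabled = strncmp(s, "no-", 3) != 0;

        /* Terminate the current token and remember where the next starts. */
        if ( next )
            *next++ = '\0';

        /* Skip the negation prefix before matching the option name. */
        if ( !enabled )
            s += 3;

        if ( !strcmp(s, "pvh") )
            o->pvh = enabled;
        else if ( !strcmp(s, "msi") )
            o->msi = enabled;
        else if ( !strcmp(s, "msix") )
            o->msix = enabled;

        s = next;
    }
}
```

Starting from defaults of msi and msix enabled, the string "pvh,no-msix"
selects a PVH Dom0 with MSI emulation left on and MSI-X emulation turned off,
which is the case in which the capability is hidden from Dom0.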

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Jan Beulich <jbeulich@suse.com>
Andrew Cooper <andrew.cooper3@citrix.com>
---
This patch has been tested with devices using both a single MSI-X entry and
multiple ones.
---
 docs/misc/xen-command-line.markdown |   7 +
 xen/arch/x86/dom0_build.c           |   4 +
 xen/arch/x86/hvm/hvm.c              |   1 +
 xen/drivers/vpci/Makefile           |   2 +-
 xen/drivers/vpci/capabilities.c     |  16 +-
 xen/drivers/vpci/header.c           |  38 ++-
 xen/drivers/vpci/msix.c             | 590 ++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/domain.h    |   3 +
 xen/include/asm-x86/msi.h           |   1 +
 xen/include/xen/vpci.h              |  27 ++
 10 files changed, 669 insertions(+), 20 deletions(-)
 create mode 100644 xen/drivers/vpci/msix.c

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 38a8d05e63..2db2b49cb6 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -684,6 +684,13 @@ enabled.
 Enable or disable (using the `no-` prefix) the MSI emulation inside of
 Xen for a PVH Dom0. Note that this option has no effect on a PV Dom0.
 
+> `msix`
+
+> Default: `true`
+
+Enable or disable (using the `no-` prefix) the MSI-X emulation inside of
+Xen for a PVH Dom0. Note that this option has no effect on a PV Dom0.
+
 ### dtuart (ARM)
 > `= path [:options]`
 
diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index 01afcf6215..3996d9dd12 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -177,6 +177,7 @@ bool __initdata opt_dom0_shadow;
 #endif
 bool __initdata dom0_pvh;
 bool __initdata dom0_msi = true;
+bool __initdata dom0_msix = true;
 
 /*
  * List of parameters that affect Dom0 creation:
@@ -184,6 +185,7 @@ bool __initdata dom0_msi = true;
  *  - pvh               Create a PVHv2 Dom0.
  *  - shadow            Use shadow paging for Dom0.
  *  - msi               MSI functionality.
+ *  - msix              MSI-X functionality.
  */
 static void __init parse_dom0_param(char *s)
 {
@@ -207,6 +209,8 @@ static void __init parse_dom0_param(char *s)
 #endif
         else if ( !strcmp(s, "msi") )
             dom0_msi = enabled;
+        else if ( !strcmp(s, "msix") )
+            dom0_msix = enabled;
 
         s = ss + 1;
     } while ( ss );
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index ef3ad2a615..3a3296ffe7 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -614,6 +614,7 @@ int hvm_domain_initialise(struct domain *d)
     INIT_LIST_HEAD(&d->arch.hvm_domain.write_map.list);
     INIT_LIST_HEAD(&d->arch.hvm_domain.g2m_ioport_list);
     INIT_LIST_HEAD(&d->arch.hvm_domain.ecam_regions);
+    INIT_LIST_HEAD(&d->arch.hvm_domain.msix_tables);
 
     hvm_init_cacheattr_region_list(d);
 
diff --git a/xen/drivers/vpci/Makefile b/xen/drivers/vpci/Makefile
index ef4fc6caf3..55398d4428 100644
--- a/xen/drivers/vpci/Makefile
+++ b/xen/drivers/vpci/Makefile
@@ -1 +1 @@
-obj-y += vpci.o header.o capabilities.o msi.o
+obj-y += vpci.o header.o capabilities.o msi.o msix.o
diff --git a/xen/drivers/vpci/capabilities.c b/xen/drivers/vpci/capabilities.c
index ad9f45c2e1..7166ccb502 100644
--- a/xen/drivers/vpci/capabilities.c
+++ b/xen/drivers/vpci/capabilities.c
@@ -130,21 +130,7 @@ void xen_vpci_mask_capability(struct pci_dev *pdev, uint8_t cap_id)
     }
 }
 
-static int vpci_capabilities_init(struct pci_dev *pdev)
-{
-    int rc;
-
-    rc = vpci_index_capabilities(pdev);
-    if ( rc )
-        return rc;
-
-    /* Mask MSI-X capability until Xen handles it. */
-    xen_vpci_mask_capability(pdev, PCI_CAP_ID_MSIX);
-
-    return 0;
-}
-
-REGISTER_VPCI_INIT(vpci_capabilities_init, true);
+REGISTER_VPCI_INIT(vpci_index_capabilities, true);
 
 /*
  * Local variables:
diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index d77d82455f..d1e3dfb9f0 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -32,15 +32,45 @@ static int vpci_modify_bars(struct pci_dev *pdev, const bool map)
         paddr_t gaddr = map ? header->bars[i].gaddr
                             : header->bars[i].mapped_addr;
         paddr_t paddr = header->bars[i].paddr;
+        size_t size = header->bars[i].size;
 
         if ( header->bars[i].type != VPCI_BAR_MEM &&
              header->bars[i].type != VPCI_BAR_MEM64_LO )
             continue;
 
-        rc = modify_mmio(pdev->domain, PFN_DOWN(gaddr), PFN_DOWN(paddr),
-                         PFN_UP(header->bars[i].size), map);
-        if ( rc )
-            break;
+        if ( pdev->vpci->msix != NULL && pdev->vpci->msix->bir == i )
+        {
+            /* There's an MSI-X table inside of this BAR. */
+            paddr_t msix_gaddr = gaddr + pdev->vpci->msix->offset;
+            paddr_t msix_paddr = paddr + pdev->vpci->msix->offset;
+            size_t msix_size = pdev->vpci->msix->max_entries *
+                               PCI_MSIX_ENTRY_SIZE;
+
+            ASSERT(IS_ALIGNED(msix_gaddr, PAGE_SIZE) &&
+                   IS_ALIGNED(msix_paddr, PAGE_SIZE));
+
+            rc = modify_mmio(pdev->domain, PFN_DOWN(gaddr), PFN_DOWN(paddr),
+                             PFN_DOWN(msix_paddr - paddr), map);
+            if ( rc )
+                break;
+
+            rc = modify_mmio(pdev->domain, PFN_UP(msix_gaddr + msix_size),
+                             PFN_UP(msix_paddr + msix_size),
+                             PFN_UP(paddr + size -
+                                    round_pgup(msix_paddr + msix_size)), map);
+            if ( rc )
+                break;
+
+            if ( map )
+                pdev->vpci->msix->addr = msix_gaddr;
+        }
+        else
+        {
+            rc = modify_mmio(pdev->domain, PFN_DOWN(gaddr), PFN_DOWN(paddr),
+                             PFN_UP(size), map);
+            if ( rc )
+                break;
+        }
 
         header->bars[i].mapped_addr = map ? gaddr : 0;
     }
diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
new file mode 100644
index 0000000000..339df244cf
--- /dev/null
+++ b/xen/drivers/vpci/msix.c
@@ -0,0 +1,590 @@
+/*
+ * Handlers for accesses to the MSI-X capability structure and the memory
+ * region.
+ *
+ * Copyright (C) 2017 Citrix Systems R&D
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/sched.h>
+#include <xen/vpci.h>
+#include <asm/msi.h>
+#include <xen/p2m-common.h>
+#include <xen/keyhandler.h>
+
+#define MSIX_SIZE(num) (offsetof(struct vpci_msix, entries[num]))
+
+static int vpci_msix_control_read(struct pci_dev *pdev, unsigned int reg,
+                                  union vpci_val *val, void *data)
+{
+    struct vpci_msix *msix = data;
+
+    val->word = (msix->max_entries - 1) & PCI_MSIX_FLAGS_QSIZE;
+    val->word |= msix->enabled ? PCI_MSIX_FLAGS_ENABLE : 0;
+    val->word |= msix->masked ? PCI_MSIX_FLAGS_MASKALL : 0;
+
+    return 0;
+}
+
+static int vpci_msix_update_entry(struct pci_dev *pdev,
+                                  struct vpci_msix_entry *entry)
+{
+    struct domain *d = current->domain;
+    xen_domctl_bind_pt_irq_t bind = {
+        .hvm_domid = DOMID_SELF,
+        .irq_type = PT_IRQ_TYPE_MSI,
+        .u.msi.gvec = msi_vector(entry->data),
+        .u.msi.gflags = msi_flags(entry->data, entry->addr),
+    };
+    int rc;
+
+    if ( entry->pirq == -1 )
+    {
+        unsigned int bir = pdev->vpci->msix->bir;
+        struct msi_info msi_info = {
+            .seg = pdev->seg,
+            .bus = pdev->bus,
+            .devfn = pdev->devfn,
+            .table_base = pdev->vpci->header.bars[bir].paddr,
+            .entry_nr = entry->nr,
+        };
+        int index = -1;
+
+        /* Map PIRQ. */
+        rc = allocate_and_map_msi_pirq(pdev->domain, &index, &entry->pirq,
+                                       &msi_info);
+        if ( rc )
+        {
+            gdprintk(XENLOG_ERR,
+                     "%04x:%02x:%02x.%u: unable to map MSI-X PIRQ entry %u: %d\n",
+                     pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                     PCI_FUNC(pdev->devfn), entry->nr, rc);
+            return rc;
+        }
+    }
+
+    bind.machine_irq = entry->pirq;
+    pcidevs_lock();
+    rc = pt_irq_create_bind(d, &bind);
+    if ( rc )
+    {
+        gdprintk(XENLOG_ERR,
+                 "%04x:%02x:%02x.%u: unable to create MSI-X bind %u: %d\n",
+                 pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                 PCI_FUNC(pdev->devfn), entry->nr, rc);
+        spin_lock(&pdev->domain->event_lock);
+        unmap_domain_pirq(pdev->domain, entry->pirq);
+        spin_unlock(&pdev->domain->event_lock);
+        entry->pirq = -1;
+    }
+    pcidevs_unlock();
+
+    return rc;
+}
+
+static int vpci_msix_disable_entry(struct vpci_msix_entry *entry)
+{
+    xen_domctl_bind_pt_irq_t bind = {
+        .hvm_domid = DOMID_SELF,
+        .irq_type = PT_IRQ_TYPE_MSI,
+        .machine_irq = entry->pirq,
+    };
+    int rc;
+
+    ASSERT(entry->pirq != -1);
+
+    pcidevs_lock();
+    rc = pt_irq_destroy_bind(current->domain, &bind);
+    if ( rc )
+    {
+        pcidevs_unlock();
+        return rc;
+    }
+
+    spin_lock(&current->domain->event_lock);
+    unmap_domain_pirq(current->domain, entry->pirq);
+    spin_unlock(&current->domain->event_lock);
+    pcidevs_unlock();
+
+    entry->pirq = -1;
+
+    return 0;
+}
+
+static void vpci_msix_mask_entry(struct vpci_msix_entry *entry, bool mask)
+{
+    unsigned int irq;
+    struct pirq *pirq;
+    struct irq_desc *desc;
+    unsigned long flags;
+
+    ASSERT(entry->pirq != -1);
+    pirq = pirq_info(current->domain, entry->pirq);
+    ASSERT(pirq);
+
+    irq = pirq->arch.irq;
+    ASSERT(irq < nr_irqs);
+
+    desc = irq_to_desc(irq);
+    ASSERT(desc);
+
+    spin_lock_irqsave(&desc->lock, flags);
+    guest_mask_msi_irq(desc, mask);
+    spin_unlock_irqrestore(&desc->lock, flags);
+}
+
+static int vpci_msix_control_write(struct pci_dev *pdev, unsigned int reg,
+                                   union vpci_val val, void *data)
+{
+    uint8_t seg = pdev->seg, bus = pdev->bus;
+    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
+    struct vpci_msix *msix = data;
+    bool new_masked, new_enabled;
+    unsigned int i;
+    uint32_t data32;
+    int rc;
+
+    new_masked = val.word & PCI_MSIX_FLAGS_MASKALL;
+    new_enabled = val.word & PCI_MSIX_FLAGS_ENABLE;
+
+    if ( new_enabled != msix->enabled && new_enabled )
+    {
+        /* MSI-X enabled. */
+        for ( i = 0; i < msix->max_entries; i++ )
+        {
+            if ( msix->entries[i].masked )
+                continue;
+
+            rc = vpci_msix_update_entry(pdev, &msix->entries[i]);
+            if ( rc )
+            {
+                gdprintk(XENLOG_ERR,
+                         "%04x:%02x:%02x.%u: unable to update entry %u: %d\n",
+                         seg, bus, slot, func, i, rc);
+                return rc;
+            }
+
+            vpci_msix_mask_entry(&msix->entries[i], false);
+        }
+    }
+    else if ( new_enabled != msix->enabled && !new_enabled )
+    {
+        /* MSI-X disabled. */
+        for ( i = 0; i < msix->max_entries; i++ )
+        {
+            if ( msix->entries[i].pirq == -1 )
+                continue;
+
+            rc = vpci_msix_disable_entry(&msix->entries[i]);
+            if ( rc )
+            {
+                gdprintk(XENLOG_ERR,
+                         "%04x:%02x:%02x.%u: unable to disable entry %u: %d\n",
+                         seg, bus, slot, func, i, rc);
+                return rc;
+            }
+        }
+    }
+
+    data32 = val.word;
+    if ( (new_enabled != msix->enabled || new_masked != msix->masked) &&
+         pci_msi_conf_write_intercept(pdev, reg, 2, &data32) >= 0 )
+        pci_conf_write16(seg, bus, slot, func, reg, data32);
+
+    msix->masked = new_masked;
+    msix->enabled = new_enabled;
+
+    return 0;
+}
+
+static struct vpci_msix *vpci_msix_find(struct domain *d, unsigned long addr)
+{
+    struct vpci_msix *msix;
+
+    ASSERT(vpci_locked(d));
+    list_for_each_entry ( msix, &d->arch.hvm_domain.msix_tables, next )
+        if ( msix->pdev->vpci->header.command & PCI_COMMAND_MEMORY &&
+             addr >= msix->addr &&
+             addr < msix->addr + msix->max_entries * PCI_MSIX_ENTRY_SIZE )
+            return msix;
+
+    return NULL;
+}
+
+static int vpci_msix_table_accept(struct vcpu *v, unsigned long addr)
+{
+    int found;
+
+    vpci_lock(v->domain);
+    found = !!vpci_msix_find(v->domain, addr);
+    vpci_unlock(v->domain);
+
+    return found;
+}
+
+static int vpci_msix_access_check(struct pci_dev *pdev, unsigned long addr,
+                                  unsigned int len)
+{
+    uint8_t seg = pdev->seg, bus = pdev->bus;
+    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
+
+
+    /* Only allow 32/64b accesses. */
+    if ( len != 4 && len != 8 )
+    {
+        gdprintk(XENLOG_ERR,
+                 "%04x:%02x:%02x.%u: invalid MSI-X table access size: %u\n",
+                 seg, bus, slot, func, len);
+        return -EINVAL;
+    }
+
+    /* Do not allow accesses that span across multiple entries. */
+    if ( (addr & (PCI_MSIX_ENTRY_SIZE - 1)) + len > PCI_MSIX_ENTRY_SIZE )
+    {
+        gdprintk(XENLOG_ERR,
+                 "%04x:%02x:%02x.%u: MSI-X access crosses entry boundary\n",
+                 seg, bus, slot, func);
+        return -EINVAL;
+    }
+
+    /*
+     * Only allow 64b accesses to the low message address field.
+     *
+     * NB: this is more restrictive than the specification, which allows 64b
+     * accesses to other fields under certain circumstances, so this check and
+     * the code will have to be fixed in order to fully comply with the
+     * specification.
+     */
+    if ( (addr & (PCI_MSIX_ENTRY_SIZE - 1)) != 0 && len != 4 )
+    {
+        gdprintk(XENLOG_ERR,
+                 "%04x:%02x:%02x.%u: 64bit MSI-X table access to 32bit field"
+                 " (offset: %#lx len: %u)\n", seg, bus, slot, func,
+                 addr & (PCI_MSIX_ENTRY_SIZE - 1), len);
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+static struct vpci_msix_entry *vpci_msix_get_entry(struct vpci_msix *msix,
+                                                   unsigned long addr)
+{
+    return &msix->entries[(addr - msix->addr) / PCI_MSIX_ENTRY_SIZE];
+}
+
+static int vpci_msix_table_read(struct vcpu *v, unsigned long addr,
+                                unsigned int len, unsigned long *data)
+{
+    struct vpci_msix *msix;
+    struct vpci_msix_entry *entry;
+    unsigned int offset;
+
+    vpci_lock(v->domain);
+    msix = vpci_msix_find(v->domain, addr);
+    if ( !msix )
+    {
+        vpci_unlock(v->domain);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    if ( vpci_msix_access_check(msix->pdev, addr, len) )
+    {
+        vpci_unlock(v->domain);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    /* Get the table entry and offset. */
+    entry = vpci_msix_get_entry(msix, addr);
+    offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
+
+    switch ( offset )
+    {
+    case PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET:
+        *data = entry->addr;
+        break;
+    case PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET:
+        *data = entry->addr >> 32;
+        break;
+    case PCI_MSIX_ENTRY_DATA_OFFSET:
+        *data = entry->data;
+        break;
+    case PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET:
+        *data = entry->masked ? PCI_MSIX_VECTOR_BITMASK : 0;
+        break;
+    default:
+        BUG();
+    }
+    vpci_unlock(v->domain);
+
+    return X86EMUL_OKAY;
+}
+
+static int vpci_msix_table_write(struct vcpu *v, unsigned long addr,
+                                 unsigned int len, unsigned long data)
+{
+    struct vpci_msix *msix;
+    struct vpci_msix_entry *entry;
+    unsigned int offset;
+
+    vpci_lock(v->domain);
+    msix = vpci_msix_find(v->domain, addr);
+    if ( !msix )
+    {
+        vpci_unlock(v->domain);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    if ( vpci_msix_access_check(msix->pdev, addr, len) )
+    {
+        vpci_unlock(v->domain);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    /* Get the table entry and offset. */
+    entry = vpci_msix_get_entry(msix, addr);
+    offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
+
+    switch ( offset )
+    {
+    case PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET:
+        if ( len == 8 )
+        {
+            entry->addr = data;
+            break;
+        }
+        entry->addr &= ~GENMASK(31, 0);
+        entry->addr |= data;
+        break;
+    case PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET:
+        entry->addr &= ~GENMASK(63, 32);
+        entry->addr |= data << 32;
+        break;
+    case PCI_MSIX_ENTRY_DATA_OFFSET:
+        entry->data = data;
+        break;
+    case PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET:
+    {
+        bool new_masked = data & PCI_MSIX_VECTOR_BITMASK;
+        struct pci_dev *pdev = msix->pdev;
+        int rc;
+
+        if ( !msix->enabled )
+        {
+            ASSERT(entry->pirq == -1);
+            entry->masked = new_masked;
+            break;
+        }
+
+        if ( new_masked != entry->masked && !new_masked )
+        {
+            /* Unmasking an entry, update it. */
+            rc = vpci_msix_update_entry(msix->pdev, entry);
+            if ( rc )
+            {
+                vpci_unlock(v->domain);
+                gdprintk(XENLOG_ERR,
+                         "%04x:%02x:%02x.%u: unable to update entry %u: %d\n",
+                         pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                         PCI_FUNC(pdev->devfn), entry->nr, rc);
+                return X86EMUL_UNHANDLEABLE;
+            }
+        }
+
+        vpci_msix_mask_entry(entry, new_masked);
+        entry->masked = new_masked;
+
+        break;
+    }
+    default:
+        BUG();
+    }
+    vpci_unlock(v->domain);
+
+    return X86EMUL_OKAY;
+}
+
+static const struct hvm_mmio_ops vpci_msix_table_ops = {
+    .check = vpci_msix_table_accept,
+    .read = vpci_msix_table_read,
+    .write = vpci_msix_table_write,
+};
+
+static int vpci_init_msix(struct pci_dev *pdev)
+{
+    struct domain *d = pdev->domain;
+    uint8_t seg = pdev->seg, bus = pdev->bus;
+    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
+    struct vpci_msix *msix;
+    unsigned int msix_offset, i, max_entries;
+    paddr_t msix_paddr;
+    uint16_t control;
+    int rc;
+
+    msix_offset = pci_find_cap_offset(seg, bus, slot, func, PCI_CAP_ID_MSIX);
+    if ( !msix_offset )
+        return 0;
+
+    if ( !dom0_msix )
+    {
+        xen_vpci_mask_capability(pdev, PCI_CAP_ID_MSIX);
+        return 0;
+    }
+
+    control = pci_conf_read16(seg, bus, slot, func,
+                              msix_control_reg(msix_offset));
+
+    /* Get the maximum number of vectors the device supports. */
+    max_entries = msix_table_size(control);
+    if ( !max_entries )
+        return 0;
+
+    msix = xzalloc_bytes(MSIX_SIZE(max_entries));
+    if ( !msix )
+        return -ENOMEM;
+
+    msix->max_entries = max_entries;
+    msix->pdev = pdev;
+
+    /* Find the MSI-X table address. */
+    msix->offset = pci_conf_read32(seg, bus, slot, func,
+                                   msix_table_offset_reg(msix_offset));
+    msix->bir = msix->offset & PCI_MSIX_BIRMASK;
+    msix->offset &= ~PCI_MSIX_BIRMASK;
+
+    ASSERT(pdev->vpci->header.bars[msix->bir].type == VPCI_BAR_MEM ||
+           pdev->vpci->header.bars[msix->bir].type == VPCI_BAR_MEM64_LO);
+    msix->addr = pdev->vpci->header.bars[msix->bir].mapped_addr + msix->offset;
+    msix_paddr = pdev->vpci->header.bars[msix->bir].paddr + msix->offset;
+
+    for ( i = 0; i < msix->max_entries; i++ )
+    {
+        msix->entries[i].masked = true;
+        msix->entries[i].nr = i;
+        msix->entries[i].pirq = -1;
+    }
+
+    if ( list_empty(&d->arch.hvm_domain.msix_tables) )
+        register_mmio_handler(d, &vpci_msix_table_ops);
+
+    list_add(&msix->next, &d->arch.hvm_domain.msix_tables);
+
+    rc = xen_vpci_add_register(pdev, vpci_msix_control_read,
+                               vpci_msix_control_write,
+                               msix_control_reg(msix_offset), 2, msix);
+    if ( rc )
+    {
+        dprintk(XENLOG_ERR,
+                "%04x:%02x:%02x.%u: failed to add handler for MSI-X control: %d\n",
+                seg, bus, slot, func, rc);
+        goto error;
+    }
+
+    if ( pdev->vpci->header.command & PCI_COMMAND_MEMORY )
+    {
+        /* Unmap this memory from the guest. */
+        rc = modify_mmio(pdev->domain, PFN_DOWN(msix->addr),
+                         PFN_DOWN(msix_paddr),
+                         PFN_UP(msix->max_entries * PCI_MSIX_ENTRY_SIZE),
+                         false);
+        if ( rc )
+        {
+            dprintk(XENLOG_ERR,
+                    "%04x:%02x:%02x.%u: unable to unmap MSI-X BAR region: %d\n",
+                    seg, bus, slot, func, rc);
+            goto error;
+        }
+    }
+
+    pdev->vpci->msix = msix;
+
+    return 0;
+
+ error:
+    ASSERT(rc);
+    xfree(msix);
+    return rc;
+}
+
+REGISTER_VPCI_INIT(vpci_init_msix, false);
+
+static void vpci_dump_msix(unsigned char key)
+{
+    struct domain *d;
+    struct pci_dev *pdev;
+
+    printk("Guest MSI-X information:\n");
+
+    for_each_domain ( d )
+    {
+        if ( !has_vpci(d) )
+            continue;
+
+        vpci_lock(d);
+        list_for_each_entry ( pdev, &d->arch.pdev_list, domain_list )
+        {
+            uint8_t seg = pdev->seg, bus = pdev->bus;
+            uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
+            struct vpci_msix *msix = pdev->vpci->msix;
+            unsigned int i;
+
+            if ( !msix )
+                continue;
+
+            printk("Device %04x:%02x:%02x.%u\n", seg, bus, slot, func);
+
+            printk("Max entries: %u maskall: %u enabled: %u\n",
+                   msix->max_entries, msix->masked, msix->enabled);
+
+            printk("Guest entries:\n");
+            for ( i = 0; i < msix->max_entries; i++ )
+            {
+                struct vpci_msix_entry *entry = &msix->entries[i];
+                uint32_t data = entry->data;
+                uint64_t addr = entry->addr;
+
+                printk("%4u vec=%#02x%7s%6s%3sassert%5s%7s dest_id=%lu mask=%u pirq=%d\n",
+                       i,
+                       (data & MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT,
+                       data & MSI_DATA_DELIVERY_LOWPRI ? "lowest" : "fixed",
+                       data & MSI_DATA_TRIGGER_LEVEL ? "level" : "edge",
+                       data & MSI_DATA_LEVEL_ASSERT ? "" : "de",
+                       addr & MSI_ADDR_DESTMODE_LOGIC ? "log" : "phys",
+                       addr & MSI_ADDR_REDIRECTION_LOWPRI ? "lowest" : "cpu",
+                       (addr & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT,
+                       entry->masked, entry->pirq);
+            }
+            printk("\n");
+        }
+        vpci_unlock(d);
+    }
+}
+
+static int __init vpci_msix_setup_keyhandler(void)
+{
+    register_keyhandler('X', vpci_dump_msix, "dump guest MSI-X state", 1);
+    return 0;
+}
+__initcall(vpci_msix_setup_keyhandler);
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
+
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index ce710496c7..8f5043e8fb 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -197,6 +197,9 @@ struct hvm_domain {
     /* List of ECAM (MMCFG) regions trapped by Xen. */
     struct list_head ecam_regions;
 
+    /* List of MSI-X tables. */
+    struct list_head msix_tables;
+
     /* List of permanently write-mapped pages. */
     struct {
         spinlock_t lock;
diff --git a/xen/include/asm-x86/msi.h b/xen/include/asm-x86/msi.h
index dcbec8cf04..bd40efa5a6 100644
--- a/xen/include/asm-x86/msi.h
+++ b/xen/include/asm-x86/msi.h
@@ -252,5 +252,6 @@ void end_nonmaskable_msi_irq(struct irq_desc *, u8 vector);
 void set_msi_affinity(struct irq_desc *, const cpumask_t *);
 
 extern bool dom0_msi;
+extern bool dom0_msix;
 
 #endif /* __ASM_MSI_H */
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 277e860d25..339cb347ee 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -111,6 +111,33 @@ struct vpci {
         /* 64-bit address capable? */
         bool address64;
     } *msi;
+
+    /* MSI-X data. */
+    struct vpci_msix {
+        struct pci_dev *pdev;
+        /* Maximum number of vectors supported by the device. */
+        unsigned int max_entries;
+        /* MSI-X table offset. */
+        unsigned int offset;
+        /* MSI-X table BIR. */
+        unsigned int bir;
+        /* Table addr. */
+        paddr_t addr;
+        /* MSI-X enabled? */
+        bool enabled;
+        /* Masked? */
+        bool masked;
+        /* List link. */
+        struct list_head next;
+        /* Entries. */
+        struct vpci_msix_entry {
+            unsigned int nr;
+            uint64_t addr;
+            uint32_t data;
+            bool masked;
+            int pirq;
+        } entries[];
+    } *msix;
 #endif
 };
 
-- 
2.11.0 (Apple Git-81)
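
The access rules enforced by vpci_msix_access_check() in the patch above can
be restated as a small standalone sketch. This is not part of the patch: the
helper names and the MSIX_ENTRY_SIZE macro are hypothetical, and the sketch
assumes (as the patch's masking with PCI_MSIX_ENTRY_SIZE - 1 implicitly does)
that the table base is 16-byte aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 16 bytes per MSI-X table entry, table base 16-byte aligned. */
#define MSIX_ENTRY_SIZE 16u

/* Offset of an access inside the table entry that contains it. */
static unsigned int msix_entry_offset(uint64_t addr)
{
    return addr & (MSIX_ENTRY_SIZE - 1);
}

/* Index of the entry that contains addr, given the table base address. */
static unsigned int msix_entry_index(uint64_t table_base, uint64_t addr)
{
    assert(addr >= table_base);
    return (addr - table_base) / MSIX_ENTRY_SIZE;
}

/* Same three checks as the patch, returning a bool instead of -EINVAL. */
static bool msix_access_ok(uint64_t addr, unsigned int len)
{
    unsigned int offset = msix_entry_offset(addr);

    if ( len != 4 && len != 8 )
        return false;    /* only 32-bit and 64-bit accesses */
    if ( offset + len > MSIX_ENTRY_SIZE )
        return false;    /* access would cross into the next entry */
    if ( offset != 0 && len != 4 )
        return false;    /* 64-bit access only at the low address field */
    return true;
}
```

With these rules an 8-byte write at entry offset 0 updates the whole message
address in one go, while an 8-byte access at offset 8 (data plus vector
control) is rejected even though it stays inside a single entry.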



* Re: [PATCH v2 8/9] vpci/msi: add MSI handlers
  2017-04-20 15:17 ` [PATCH v2 8/9] vpci/msi: add MSI handlers Roger Pau Monne
@ 2017-04-21  8:38   ` Roger Pau Monne
  2017-04-24 15:31   ` Julien Grall
  1 sibling, 0 replies; 40+ messages in thread
From: Roger Pau Monne @ 2017-04-21  8:38 UTC (permalink / raw)
  To: xen-devel, konrad.wilk, boris.ostrovsky
  Cc: Andrew Cooper, Paul Durrant, Jan Beulich

(Adding maintainers to the Cc...)

On Thu, Apr 20, 2017 at 04:17:42PM +0100, Roger Pau Monne wrote:
[...]
> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> index 75564b9d93..277e860d25 100644
> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -88,9 +88,35 @@ struct vpci {
>  
>      /* List of capabilities supported by the device. */
>      struct list_head cap_list;
> +
> +    /* MSI data. */
> +    struct vpci_msi {
> +        /* Maximum number of vectors supported by the device. */
> +        unsigned int max_vectors;
> +        /* Current guest-written number of vectors. */
> +        unsigned int guest_vectors;
> +        /* Number of vectors configured. */
> +        unsigned int vectors;
> +        /* Address and data fields. */
> +        uint64_t address;
> +        uint16_t data;
> +        /* PIRQ */
> +        int pirq;
> +        /* Mask bitfield. */
> +        uint32_t mask;
> +        /* MSI enabled? */
> +        bool enabled;

I've realized that the enabled field is not needed: checking whether
pirq != -1 is enough to know if MSIs are enabled. I've folded the following
diff into this patch to remove the field; no functional change.

---8<---
diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
index aea6c68907..329945b30f 100644
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -46,7 +46,7 @@ static int vpci_msi_control_read(struct pci_dev *pdev, unsigned int reg,
 {
     struct vpci_msi *msi = data;
 
-    if ( msi->enabled )
+    if ( msi->pirq != -1 )
         val->word |= PCI_MSI_FLAGS_ENABLE;
     if ( msi->masking )
         val->word |= PCI_MSI_FLAGS_MASKBIT;
@@ -74,7 +74,7 @@ static int vpci_msi_control_write(struct pci_dev *pdev, unsigned int reg,
 
     msi->guest_vectors = vectors;
 
-    if ( !((val.word ^ msi->enabled) & PCI_MSI_FLAGS_ENABLE) )
+    if ( !!(val.word & PCI_MSI_FLAGS_ENABLE) == (msi->pirq != -1) )
         return 0;
 
     if ( val.word & PCI_MSI_FLAGS_ENABLE )
@@ -87,7 +87,7 @@ static int vpci_msi_control_write(struct pci_dev *pdev, unsigned int reg,
             .entry_nr = vectors,
         };
 
-        ASSERT(!msi->enabled);
+        ASSERT(msi->pirq == -1);
 
         /* Get a PIRQ. */
         rc = allocate_and_map_msi_pirq(pdev->domain, &index, &msi->pirq,
@@ -149,11 +149,10 @@ static int vpci_msi_control_write(struct pci_dev *pdev, unsigned int reg,
 
         __msi_set_enable(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
                          PCI_FUNC(pdev->devfn), reg - PCI_MSI_FLAGS, 1);
-        msi->enabled = true;
     }
     else
     {
-        ASSERT(msi->enabled);
+        ASSERT(msi->pirq != -1);
         __msi_set_enable(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
                          PCI_FUNC(pdev->devfn), reg - PCI_MSI_FLAGS, 0);
 
@@ -178,7 +177,6 @@ static int vpci_msi_control_write(struct pci_dev *pdev, unsigned int reg,
 
         msi->pirq = -1;
         msi->vectors = 0;
-        msi->enabled = false;
     }
 
     return 0;
@@ -426,7 +424,7 @@ static void vpci_dump_msi(unsigned char key)
             printk("Device %04x:%02x:%02x.%u\n", seg, bus, slot, func);
 
             printk("Enabled: %u Supports masking: %u 64-bit addresses: %u\n",
-                   msi->enabled, msi->masking, msi->address64);
+                   msi->pirq != -1, msi->masking, msi->address64);
             printk("Max vectors: %u guest vectors: %u enabled vectors: %u\n",
                    msi->max_vectors, msi->guest_vectors, msi->vectors);
 
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 277e860d25..ad5347b118 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -100,12 +100,10 @@ struct vpci {
         /* Address and data fields. */
         uint64_t address;
         uint16_t data;
-        /* PIRQ */
+        /* PIRQ (if this field is different than -1, MSIs are enabled) */
         int pirq;
         /* Mask bitfield. */
         uint32_t mask;
-        /* MSI enabled? */
-        bool enabled;
         /* Supports per-vector masking? */
         bool masking;
         /* 64-bit address capable? */



* Re: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap accesses to the PCI config space
  2017-04-20 15:17 ` [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap accesses to the PCI config space Roger Pau Monne
@ 2017-04-21 16:07   ` Paul Durrant
  2017-04-24  9:09     ` Roger Pau Monne
  2017-04-21 16:23   ` Paul Durrant
  1 sibling, 1 reply; 40+ messages in thread
From: Paul Durrant @ 2017-04-21 16:07 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Andrew Cooper, Jan Beulich, Ian Jackson,
	boris.ostrovsky, Roger Pau Monne

> -----Original Message-----
> From: Roger Pau Monne [mailto:roger.pau@citrix.com]
> Sent: 20 April 2017 16:18
> To: xen-devel@lists.xenproject.org
> Cc: konrad.wilk@oracle.com; boris.ostrovsky@oracle.com; Roger Pau Monne
> <roger.pau@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; Wei Liu
> <wei.liu2@citrix.com>; Jan Beulich <jbeulich@suse.com>; Andrew Cooper
> <Andrew.Cooper3@citrix.com>; Paul Durrant <Paul.Durrant@citrix.com>
> Subject: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap accesses
> to the PCI config space
> 
> This functionality is going to reside in vpci.c (and the corresponding vpci.h
> header), and should be arch-agnostic. The handlers introduced in this patch
> set up the basic functionality required in order to trap accesses to the PCI
> config space, and allow decoding the address and finding the corresponding
> handler that should handle the access (although no handlers are
> implemented).
> 
> Note that the traps to the PCI IO port registers (0xcf8/0xcfc) are set up
> inside of an x86 HVM file, since that's not shared with other arches.
> 
> A new XEN_X86_EMU_VPCI x86 domain flag is added in order to signal Xen
> whether a domain should use the newly introduced vPCI handlers; this is
> only enabled for PVH Dom0 at the moment.
> 
> A very simple user-space test is also provided, so that the basic
> functionality of the vPCI traps can be asserted. This has proven quite
> helpful during development, since the logic to handle partial accesses or
> accesses that span multiple registers is not trivial.
> 
> The handlers for the registers are added to a red-black tree that indexes
> them based on their offset. Since Xen needs to handle partial accesses to
> the registers and accesses that span multiple registers, the logic in
> xen_vpci_{read/write} is kind of convoluted; I've tried to properly comment
> it in order to make it easier to understand.
> 

Since config space is not exactly huge, I'm wondering why you used an r-b tree rather than a direct map from register to handler?
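
For illustration only, a direct map along those lines might look like the
sketch below. None of these names come from the patch; the structure layout,
the 256-byte size and the -1 error return are assumptions made for the
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: conventional 256-byte space; PCIe extended space is 4096. */
#define CFG_SPACE_SIZE 256u

struct reg_handler {
    unsigned int offset;    /* first config-space byte covered */
    unsigned int size;      /* number of bytes covered */
    void *priv;             /* opaque per-register data */
};

struct cfg_map {
    /* One slot per config-space byte; NULL means no handler. */
    const struct reg_handler *slot[CFG_SPACE_SIZE];
};

/* Register a handler; fails on out-of-range or overlapping registers. */
static int cfg_map_add(struct cfg_map *map, const struct reg_handler *h)
{
    unsigned int i;

    assert(map != NULL && h != NULL);
    if ( h->size == 0 || h->offset >= CFG_SPACE_SIZE ||
         h->size > CFG_SPACE_SIZE - h->offset )
        return -1;
    for ( i = 0; i < h->size; i++ )
        if ( map->slot[h->offset + i] != NULL )
            return -1;
    for ( i = 0; i < h->size; i++ )
        map->slot[h->offset + i] = h;
    return 0;
}

/* Constant-time lookup of the handler covering a given byte, or NULL. */
static const struct reg_handler *cfg_map_find(const struct cfg_map *map,
                                              unsigned int reg)
{
    return reg < CFG_SPACE_SIZE ? map->slot[reg] : NULL;
}
```

The cost is one pointer per config-space byte for every device: 2 KiB with
8-byte pointers for the 256-byte space, 32 KiB if the 4096-byte extended space
is covered.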

  Paul

* Re: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap accesses to the PCI config space
  2017-04-20 15:17 ` [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap accesses to the PCI config space Roger Pau Monne
  2017-04-21 16:07   ` Paul Durrant
@ 2017-04-21 16:23   ` Paul Durrant
  2017-04-24  9:42     ` Roger Pau Monne
  1 sibling, 1 reply; 40+ messages in thread
From: Paul Durrant @ 2017-04-21 16:23 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Andrew Cooper, Jan Beulich, Ian Jackson,
	boris.ostrovsky, Roger Pau Monne

> -----Original Message-----
> From: Roger Pau Monne [mailto:roger.pau@citrix.com]
> Sent: 20 April 2017 16:18
> To: xen-devel@lists.xenproject.org
> Cc: konrad.wilk@oracle.com; boris.ostrovsky@oracle.com; Roger Pau Monne
> <roger.pau@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; Wei Liu
> <wei.liu2@citrix.com>; Jan Beulich <jbeulich@suse.com>; Andrew Cooper
> <Andrew.Cooper3@citrix.com>; Paul Durrant <Paul.Durrant@citrix.com>
> Subject: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap accesses
> to the PCI config space
> 
> This functionality is going to reside in vpci.c (and the corresponding vpci.h
> header), and should be arch-agnostic. The handlers introduced in this patch
> set up the basic functionality required in order to trap accesses to the PCI
> config space, and allow decoding the address and finding the corresponding
> handler that should handle the access (although no handlers are
> implemented).
> 
> Note that the traps to the PCI IO port registers (0xcf8/0xcfc) are set up
> inside of an x86 HVM file, since that's not shared with other arches.
> 
> A new XEN_X86_EMU_VPCI x86 domain flag is added in order to signal Xen
> whether a domain should use the newly introduced vPCI handlers; this is
> only enabled for PVH Dom0 at the moment.
> 
> A very simple user-space test is also provided, so that the basic
> functionality of the vPCI traps can be asserted. This has proven quite
> helpful during development, since the logic to handle partial accesses or
> accesses that span multiple registers is not trivial.
> 
> The handlers for the registers are added to a red-black tree that indexes
> them based on their offset. Since Xen needs to handle partial accesses to
> the registers and accesses that span multiple registers, the logic in
> xen_vpci_{read/write} is kind of convoluted; I've tried to properly comment
> it in order to make it easier to understand.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Paul Durrant <paul.durrant@citrix.com>
> ---
> Changes since v1:
>  - Allow access to cross a word-boundary.
>  - Add locking.
>  - Add cleanup to xen_vpci_add_handlers in case of failure.
> ---
>  .gitignore                        |   4 +
>  tools/libxl/libxl_x86.c           |   2 +-
>  tools/tests/Makefile              |   1 +
>  tools/tests/vpci/Makefile         |  45 ++++
>  tools/tests/vpci/emul.h           | 107 +++++++++
>  tools/tests/vpci/main.c           | 206 +++++++++++++++++
>  xen/arch/arm/xen.lds.S            |   3 +
>  xen/arch/x86/domain.c             |  18 +-
>  xen/arch/x86/hvm/hvm.c            |   2 +
>  xen/arch/x86/hvm/io.c             | 135 +++++++++++
>  xen/arch/x86/setup.c              |   3 +-
>  xen/arch/x86/xen.lds.S            |   3 +
>  xen/drivers/Makefile              |   2 +-
>  xen/drivers/passthrough/pci.c     |   3 +
>  xen/drivers/vpci/Makefile         |   1 +
>  xen/drivers/vpci/vpci.c           | 474 ++++++++++++++++++++++++++++++++++++++
>  xen/include/asm-x86/domain.h      |   1 +
>  xen/include/asm-x86/hvm/domain.h  |   3 +
>  xen/include/asm-x86/hvm/io.h      |   3 +
>  xen/include/public/arch-x86/xen.h |   5 +-
>  xen/include/xen/pci.h             |   4 +
>  xen/include/xen/vpci.h            |  66 ++++++
>  22 files changed, 1083 insertions(+), 8 deletions(-)
>  create mode 100644 tools/tests/vpci/Makefile
>  create mode 100644 tools/tests/vpci/emul.h
>  create mode 100644 tools/tests/vpci/main.c
>  create mode 100644 xen/drivers/vpci/Makefile
>  create mode 100644 xen/drivers/vpci/vpci.c
>  create mode 100644 xen/include/xen/vpci.h
> 
> diff --git a/.gitignore b/.gitignore
> index 74747cb7e7..ebafba25b5 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -236,6 +236,10 @@ tools/tests/regression/build/*
>  tools/tests/regression/downloads/*
>  tools/tests/mem-sharing/memshrtool
>  tools/tests/mce-test/tools/xen-mceinj
> +tools/tests/vpci/rbtree.[hc]
> +tools/tests/vpci/vpci.[hc]
> +tools/tests/vpci/test_vpci.out
> +tools/tests/vpci/test_vpci
>  tools/xcutils/lsevtchn
>  tools/xcutils/readnotes
>  tools/xenbackendd/_paths.h
> diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
> index 455f6f0bed..dd7fc78a99 100644
> --- a/tools/libxl/libxl_x86.c
> +++ b/tools/libxl/libxl_x86.c
> @@ -11,7 +11,7 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
>      if (d_config->c_info.type == LIBXL_DOMAIN_TYPE_HVM) {
>          if (d_config->b_info.device_model_version !=
>              LIBXL_DEVICE_MODEL_VERSION_NONE) {
> -            xc_config->emulation_flags = XEN_X86_EMU_ALL;
> +            xc_config->emulation_flags = (XEN_X86_EMU_ALL & ~XEN_X86_EMU_VPCI);
>          } else if (libxl_defbool_val(d_config->b_info.u.hvm.apic)) {
>              /*
>               * HVM guests without device model may want
> diff --git a/tools/tests/Makefile b/tools/tests/Makefile
> index 639776130b..5cfe781e62 100644
> --- a/tools/tests/Makefile
> +++ b/tools/tests/Makefile
> @@ -13,6 +13,7 @@ endif
>  SUBDIRS-$(CONFIG_X86) += x86_emulator
>  SUBDIRS-y += xen-access
>  SUBDIRS-y += xenstore
> +SUBDIRS-$(CONFIG_HAS_PCI) += vpci
> 
>  .PHONY: all clean install distclean
>  all clean distclean: %: subdirs-%
> diff --git a/tools/tests/vpci/Makefile b/tools/tests/vpci/Makefile
> new file mode 100644
> index 0000000000..7969fcbd82
> --- /dev/null
> +++ b/tools/tests/vpci/Makefile
> @@ -0,0 +1,45 @@
> +
> +XEN_ROOT=$(CURDIR)/../../..
> +include $(XEN_ROOT)/tools/Rules.mk
> +
> +TARGET := test_vpci
> +
> +.PHONY: all
> +all: $(TARGET)
> +
> +.PHONY: run
> +run: $(TARGET)
> +	./$(TARGET) > $(TARGET).out
> +
> +$(TARGET): vpci.c vpci.h rbtree.c rbtree.h
> +	$(HOSTCC) -g -o $@ vpci.c main.c rbtree.c
> +
> +.PHONY: clean
> +clean:
> +	rm -rf $(TARGET) $(TARGET).out *.o *~ vpci.h vpci.c rbtree.c rbtree.h
> +
> +.PHONY: distclean
> +distclean: clean
> +
> +.PHONY: install
> +install:
> +
> +vpci.h: $(XEN_ROOT)/xen/include/xen/vpci.h
> +	sed -e '/#include/d' <$< >$@
> +
> +vpci.c: $(XEN_ROOT)/xen/drivers/vpci/vpci.c
> +	# Trick the compiler so it doesn't complain about missing symbols
> +	sed -e '/#include/d' \
> +	    -e '1s;^;#include "emul.h"\
> +	             const vpci_register_init_t __start_vpci_array[1]\;\
> +	             const vpci_register_init_t __end_vpci_array[1]\;\
> +	             ;' <$< >$@
> +
> +rbtree.h: $(XEN_ROOT)/xen/include/xen/rbtree.h
> +	sed -e '/#include/d' <$< >$@
> +
> +rbtree.c: $(XEN_ROOT)/xen/common/rbtree.c
> +	sed -e "/#include/d" \
> +	    -e '1s;^;#include "emul.h"\
> +	             ;' <$< >$@
> +
> diff --git a/tools/tests/vpci/emul.h b/tools/tests/vpci/emul.h
> new file mode 100644
> index 0000000000..85897ed43b
> --- /dev/null
> +++ b/tools/tests/vpci/emul.h
> @@ -0,0 +1,107 @@
> +/*
> + * Unit tests for the generic vPCI handler code.
> + *
> + * Copyright (C) 2017 Citrix Systems R&D
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms and conditions of the GNU General Public
> + * License, version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef _TEST_VPCI_
> +#define _TEST_VPCI_
> +
> +#include <stdlib.h>
> +#include <stdio.h>
> +#include <stddef.h>
> +#include <stdint.h>
> +#include <stdbool.h>
> +#include <errno.h>
> +#include <assert.h>
> +
> +#define container_of(ptr, type, member) ({                      \
> +        typeof( ((type *)0)->member ) *__mptr = (ptr);          \
> +        (type *)( (char *)__mptr - offsetof(type,member) );})
> +
> +#include "rbtree.h"
> +
> +struct pci_dev {
> +    struct domain *domain;
> +    struct vpci *vpci;
> +};
> +
> +struct domain {
> +    struct pci_dev pdev;
> +};
> +
> +struct vcpu
> +{
> +    struct domain *domain;
> +};
> +
> +extern struct vcpu v;
> +
> +#define spin_lock(x)
> +#define spin_unlock(x)
> +#define spin_is_locked(x) true
> +
> +#define current (&v)
> +
> +#define has_vpci(d) true
> +
> +#include "vpci.h"
> +
> +#define xzalloc(type) (type *)calloc(1, sizeof(type))
> +#define xfree(p) free(p)
> +
> +#define EXPORT_SYMBOL(x)
> +
> +#define pci_get_pdev_by_domain(d, ...) &(d)->pdev
> +
> +#define atomic_read(x) 1
> +
> +/* Dummy native helpers. Writes are ignored, reads return 1's. */
> +#define pci_conf_read8(...) (0xff)
> +#define pci_conf_read16(...) (0xffff)
> +#define pci_conf_read32(...) (0xffffffff)
> +#define pci_conf_write8(...)
> +#define pci_conf_write16(...)
> +#define pci_conf_write32(...)
> +
> +#define BUG() assert(0)
> +#define ASSERT_UNREACHABLE() assert(0)
> +#define ASSERT(x) assert(x)
> +
> +#ifdef _LP64
> +#define BITS_PER_LONG 64
> +#else
> +#define BITS_PER_LONG 32
> +#endif
> +#define GENMASK(h, l) \
> +    (((~0UL) << (l)) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
> +
> +#define min(x,y) ({ \
> +        const typeof(x) _x = (x);       \
> +        const typeof(y) _y = (y);       \
> +        (void) (&_x == &_y);            \
> +        _x < _y ? _x : _y; })
> +
> +#endif
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> +
> diff --git a/tools/tests/vpci/main.c b/tools/tests/vpci/main.c
> new file mode 100644
> index 0000000000..0fc63de038
> --- /dev/null
> +++ b/tools/tests/vpci/main.c
> @@ -0,0 +1,206 @@
> +/*
> + * Unit tests for the generic vPCI handler code.
> + *
> + * Copyright (C) 2017 Citrix Systems R&D
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms and conditions of the GNU General Public
> + * License, version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "emul.h"
> +
> +/* Single vcpu (current), and single domain with a single PCI device. */
> +static struct vpci vpci = {
> +    .handlers = RB_ROOT,
> +};
> +
> +static struct domain d = {
> +    .pdev.domain = &d,
> +    .pdev.vpci = &vpci,
> +};
> +
> +struct vcpu v = { .domain = &d };
> +
> +/* Dummy hooks, write stores data, read fetches it. */
> +static int vpci_read8(struct pci_dev *pdev, unsigned int reg,
> +                      union vpci_val *val, void *data)
> +{
> +    uint8_t *priv = data;
> +
> +    val->half_word = *priv;
> +    return 0;
> +}
> +
> +static int vpci_write8(struct pci_dev *pdev, unsigned int reg,
> +                       union vpci_val val, void *data)
> +{
> +    uint8_t *priv = data;
> +
> +    *priv = val.half_word;
> +    return 0;
> +}
> +
> +static int vpci_read16(struct pci_dev *pdev, unsigned int reg,
> +                       union vpci_val *val, void *data)
> +{
> +    uint16_t *priv = data;
> +
> +    val->word = *priv;
> +    return 0;
> +}
> +
> +static int vpci_write16(struct pci_dev *pdev, unsigned int reg,
> +                        union vpci_val val, void *data)
> +{
> +    uint16_t *priv = data;
> +
> +    *priv = val.word;
> +    return 0;
> +}
> +
> +static int vpci_read32(struct pci_dev *pdev, unsigned int reg,
> +                       union vpci_val *val, void *data)
> +{
> +    uint32_t *priv = data;
> +
> +    val->double_word = *priv;
> +    return 0;
> +}
> +
> +static int vpci_write32(struct pci_dev *pdev, unsigned int reg,
> +                        union vpci_val val, void *data)
> +{
> +    uint32_t *priv = data;
> +
> +    *priv = val.double_word;
> +    return 0;
> +}
> +
> +#define VPCI_READ(reg, size, data) \
> +    assert(!xen_vpci_read(0, 0, 0, reg, size, data))
> +
> +#define VPCI_READ_CHECK(reg, size, expected) ({ \
> +    uint32_t val;                               \
> +    VPCI_READ(reg, size, &val);                 \
> +    assert(val == expected);                    \
> +    })
> +
> +#define VPCI_WRITE(reg, size, data) \
> +    assert(!xen_vpci_write(0, 0, 0, reg, size, data))
> +
> +#define VPCI_CHECK_REG(reg, size, data) ({      \
> +    VPCI_WRITE(reg, size, data);                \
> +    VPCI_READ_CHECK(reg, size, data);           \
> +    })
> +
> +#define VPCI_ADD_REG(fread, fwrite, off, size, store)                         \
> +    assert(!xen_vpci_add_register(&d.pdev, fread, fwrite, off, size, &store)) \
> +
> +#define VPCI_ADD_INVALID_REG(fread, fwrite, off, size)                      \
> +    assert(xen_vpci_add_register(&d.pdev, fread, fwrite, off, size, NULL))  \
> +
> +int
> +main(int argc, char **argv)
> +{
> +    /* Index storage by offset. */
> +    uint32_t r0 = 0xdeadbeef;
> +    uint8_t r5 = 0xef;
> +    uint8_t r6 = 0xbe;
> +    uint8_t r7 = 0xef;
> +    uint16_t r12 = 0x8696;
> +    int rc;
> +
> +    VPCI_ADD_REG(vpci_read32, vpci_write32, 0, 4, r0);
> +    VPCI_READ_CHECK(0, 4, 0xdeadbeef);
> +    VPCI_CHECK_REG(0, 4, 0xbcbcbcbc);
> +
> +    VPCI_ADD_REG(vpci_read8, vpci_write8, 5, 1, r5);
> +    VPCI_READ_CHECK(5, 1, 0xef);
> +    VPCI_CHECK_REG(5, 1, 0xba);
> +
> +    VPCI_ADD_REG(vpci_read8, vpci_write8, 6, 1, r6);
> +    VPCI_READ_CHECK(6, 1, 0xbe);
> +    VPCI_CHECK_REG(6, 1, 0xba);
> +
> +    VPCI_ADD_REG(vpci_read8, vpci_write8, 7, 1, r7);
> +    VPCI_READ_CHECK(7, 1, 0xef);
> +    VPCI_CHECK_REG(7, 1, 0xbd);
> +
> +    VPCI_ADD_REG(vpci_read16, vpci_write16, 12, 2, r12);
> +    VPCI_READ_CHECK(12, 2, 0x8696);
> +    VPCI_READ_CHECK(12, 4, 0xffff8696);
> +
> +    /*
> +     * At this point we have the following layout:
> +     *
> +     * 32    24    16     8     0
> +     *  +-----+-----+-----+-----+
> +     *  |          r0           | 0
> +     *  +-----+-----+-----+-----+
> +     *  | r7  |  r6 |  r5 |/////| 32
> +     *  +-----+-----+-----+-----|
> +     *  |///////////////////////| 64
> +     *  +-----------+-----------+
> +     *  |///////////|    r12    | 96
> +     *  +-----------+-----------+
> +     *             ...
> +     *  / = empty.
> +     */
> +
> +    /* Try to add an overlapping register handler. */
> +    VPCI_ADD_INVALID_REG(vpci_read32, vpci_write32, 4, 4);
> +
> +    /* Try to add a non-aligned register. */
> +    VPCI_ADD_INVALID_REG(vpci_read16, vpci_write16, 15, 2);
> +
> +    /* Try to add a register with wrong size. */
> +    VPCI_ADD_INVALID_REG(vpci_read16, vpci_write16, 8, 3);
> +
> +    /* Try to add a register with missing handlers. */
> +    VPCI_ADD_INVALID_REG(vpci_read16, NULL, 8, 2);
> +    VPCI_ADD_INVALID_REG(NULL, vpci_write16, 8, 2);
> +
> +    /* Read/write of unset register. */
> +    VPCI_READ_CHECK(8, 4, 0xffffffff);
> +    VPCI_READ_CHECK(8, 2, 0xffff);
> +    VPCI_READ_CHECK(8, 1, 0xff);
> +    VPCI_WRITE(10, 2, 0xbeef);
> +    VPCI_READ_CHECK(10, 2, 0xffff);
> +
> +    /* Read of multiple registers */
> +    VPCI_CHECK_REG(7, 1, 0xbd);
> +    VPCI_READ_CHECK(4, 4, 0xbdbabaff);
> +
> +    /* Partial read of a register. */
> +    VPCI_CHECK_REG(0, 4, 0x1a1b1c1d);
> +    VPCI_READ_CHECK(2, 1, 0x1b);
> +    VPCI_READ_CHECK(6, 2, 0xbdba);
> +
> +    /* Write of multiple registers. */
> +    VPCI_CHECK_REG(4, 4, 0xaabbccff);
> +
> +    /* Partial write of a register. */
> +    VPCI_CHECK_REG(2, 1, 0xfe);
> +    VPCI_CHECK_REG(6, 2, 0xfebc);
> +
> +    return 0;
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> +
> diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
> index 44bd3bf0ce..41bf9dfaf3 100644
> --- a/xen/arch/arm/xen.lds.S
> +++ b/xen/arch/arm/xen.lds.S
> @@ -79,6 +79,9 @@ SECTIONS
>         __start_schedulers_array = .;
>         *(.data.schedulers)
>         __end_schedulers_array = .;
> +       __start_vpci_array = .;
> +       *(.data.vpci)
> +       __end_vpci_array = .;
>         *(.data.rel)
>         *(.data.rel.*)
>         CONSTRUCTORS
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 90e2b1f82a..f74020facc 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -500,11 +500,21 @@ static bool emulation_flags_ok(const struct domain *d, uint32_t emflags)
>      if ( is_hvm_domain(d) )
>      {
>          if ( is_hardware_domain(d) &&
> -             emflags != (XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC) )
> -            return false;
> -        if ( !is_hardware_domain(d) && emflags &&
> -             emflags != XEN_X86_EMU_ALL && emflags != XEN_X86_EMU_LAPIC )
> +             emflags != (XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC|
> +                         XEN_X86_EMU_VPCI) )
>              return false;
> +        if ( !is_hardware_domain(d) )
> +        {
> +            switch ( emflags )
> +            {
> +            case XEN_X86_EMU_ALL & ~XEN_X86_EMU_VPCI:
> +            case XEN_X86_EMU_LAPIC:
> +            case 0:
> +                break;
> +            default:
> +                return false;
> +            }
> +        }
>      }
>      else if ( emflags != 0 && emflags != XEN_X86_EMU_PIT )
>      {
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index a441955322..7f3322ede6 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -37,6 +37,7 @@
>  #include <xen/vm_event.h>
>  #include <xen/monitor.h>
>  #include <xen/warning.h>
> +#include <xen/vpci.h>
>  #include <asm/shadow.h>
>  #include <asm/hap.h>
>  #include <asm/current.h>
> @@ -655,6 +656,7 @@ int hvm_domain_initialise(struct domain *d)
>          d->arch.hvm_domain.io_bitmap = hvm_io_bitmap;
> 
>      register_g2m_portio_handler(d);
> +    register_vpci_portio_handler(d);
> 
>      hvm_ioreq_init(d);
> 
> diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
> index 214ab307c4..15048da556 100644
> --- a/xen/arch/x86/hvm/io.c
> +++ b/xen/arch/x86/hvm/io.c
> @@ -25,6 +25,7 @@
>  #include <xen/trace.h>
>  #include <xen/event.h>
>  #include <xen/hypercall.h>
> +#include <xen/vpci.h>
>  #include <asm/current.h>
>  #include <asm/cpufeature.h>
>  #include <asm/processor.h>
> @@ -256,6 +257,140 @@ void register_g2m_portio_handler(struct domain *d)
>      handler->ops = &g2m_portio_ops;
>  }
> 
> +/* Do some sanity checks. */
> +static int vpci_access_check(unsigned int reg, unsigned int len)
> +{
> +    /* Check access size. */
> +    if ( len != 1 && len != 2 && len != 4 )
> +    {
> +        gdprintk(XENLOG_WARNING, "invalid length (reg: %#x, len: %u)\n",
> +                 reg, len);
> +        return -EINVAL;
> +    }
> +
> +    /* Check if access crosses a double-word boundary. */
> +    if ( (reg & 3) + len > 4 )
> +    {
> +        gdprintk(XENLOG_WARNING,
> +                 "invalid access across double-word boundary (reg: %#x, len: %u)\n",
> +                 reg, len);
> +        return -EINVAL;
> +    }
> +
> +    return 0;
> +}
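
As an aside for anyone reading along, the two checks above can be exercised in isolation. Below is a minimal standalone sketch that assumes only what the quoted hunk shows; the name access_ok is hypothetical and it returns a bool instead of -EINVAL.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical standalone version of the checks in vpci_access_check():
 * the access must be 1, 2 or 4 bytes long and must not cross a 4-byte
 * (double-word) boundary.
 */
static bool access_ok(unsigned int reg, unsigned int len)
{
    if ( len != 1 && len != 2 && len != 4 )
        return false;

    /* Offset within the double word plus the length must fit in 4 bytes. */
    return (reg & 3) + len <= 4;
}
```

With this, a 2-byte access at offset 2 is accepted, while a 2-byte access at offset 3 or a 4-byte access at offset 1 is rejected.
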
> +
> +/* Helper to decode a PCI address. */
> +static void vpci_decode_addr(unsigned int cf8, unsigned int addr,
> +                             unsigned int *bus, unsigned int *devfn,
> +                             unsigned int *reg)
> +{
> +    unsigned long bdf;
> +
> +    bdf = CF8_BDF(cf8);
> +    *bus = PCI_BUS(bdf);
> +    *devfn = PCI_DEVFN(PCI_SLOT(bdf), PCI_FUNC(bdf));
> +    /*
> +     * NB: the lower 2 bits of the register address are fetched from the
> +     * offset into the 0xcfc register when reading/writing to it.
> +     */
> +    *reg = (cf8 & 0xfc) | (addr & 3);
> +}
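
To make the decode above concrete, here is a small standalone sketch. It does not use the Xen macros (CF8_BDF, PCI_BUS and friends); the struct and the function name decode_cf8 are made up for illustration, and the bit positions are the usual configuration mechanism #1 layout of the 0xcf8 address register, which is what I assume those macros implement.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for a decoded address, not part of the patch. */
struct cf8_addr {
    unsigned int bus, dev, fn, reg;
};

/*
 * Decode the value latched in 0xcf8 together with the data port that was
 * accessed (0xcfc..0xcff).
 */
static struct cf8_addr decode_cf8(uint32_t cf8, unsigned int port)
{
    struct cf8_addr a;

    a.bus = (cf8 >> 16) & 0xff;   /* bits 23:16 */
    a.dev = (cf8 >> 11) & 0x1f;   /* bits 15:11 */
    a.fn  = (cf8 >> 8) & 0x7;     /* bits 10:8  */
    /* Bits 7:2 select the dword; the low 2 bits come from the data port. */
    a.reg = (cf8 & 0xfc) | (port & 3);

    return a;
}
```

For instance, with bus 1, device 2, function 3 and register 0x10 latched in 0xcf8, an access to port 0xcfe decodes to register 0x12.
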
> +
> +/* vPCI config space IO ports handlers (0xcf8/0xcfc). */
> +static bool_t vpci_portio_accept(const struct hvm_io_handler *handler,
> +                                 const ioreq_t *p)
> +{
> +    return (p->addr == 0xcf8 && p->size == 4) || (p->addr & 0xfffc) == 0xcfc;
> +}
> +
> +static int vpci_portio_read(const struct hvm_io_handler *handler,
> +                            uint64_t addr, uint32_t size, uint64_t *data)
> +{
> +    struct domain *d = current->domain;
> +    unsigned int bus, devfn, reg;
> +    uint32_t data32;
> +    int rc;
> +
> +    vpci_lock(d);
> +    if ( addr == 0xcf8 )
> +    {
> +        ASSERT(size == 4);
> +        *data = d->arch.hvm_domain.pci_cf8;
> +        vpci_unlock(d);
> +        return X86EMUL_OKAY;
> +    }
> +
> +    /* Decode the PCI address. */
> +    vpci_decode_addr(d->arch.hvm_domain.pci_cf8, addr, &bus, &devfn, &reg);
> +
> +    if ( vpci_access_check(reg, size) || reg >= 0xff )
> +    {
> +        vpci_unlock(d);
> +        return X86EMUL_UNHANDLEABLE;
> +    }
> +
> +    rc = xen_vpci_read(0, bus, devfn, reg, size, &data32);
> +    if ( !rc )
> +        *data = data32;
> +    vpci_unlock(d);
> +
> +    return rc ? X86EMUL_UNHANDLEABLE : X86EMUL_OKAY;
> +}
> +
> +static int vpci_portio_write(const struct hvm_io_handler *handler,
> +                             uint64_t addr, uint32_t size, uint64_t data)
> +{
> +    struct domain *d = current->domain;
> +    unsigned int bus, devfn, reg;
> +    int rc;
> +
> +    vpci_lock(d);
> +    if ( addr == 0xcf8 )
> +    {
> +        ASSERT(size == 4);
> +        d->arch.hvm_domain.pci_cf8 = data;
> +        vpci_unlock(d);
> +        return X86EMUL_OKAY;
> +    }
> +
> +    /* Decode the PCI address. */
> +    vpci_decode_addr(d->arch.hvm_domain.pci_cf8, addr, &bus, &devfn, &reg);
> +
> +    if ( vpci_access_check(reg, size) || reg >= 0xff )
> +    {
> +        vpci_unlock(d);
> +        return X86EMUL_UNHANDLEABLE;
> +    }
> +
> +    rc = xen_vpci_write(0, bus, devfn, reg, size, data);
> +    vpci_unlock(d);
> +
> +    return rc ? X86EMUL_UNHANDLEABLE : X86EMUL_OKAY;
> +}
> +
> +static const struct hvm_io_ops vpci_portio_ops = {
> +    .accept = vpci_portio_accept,
> +    .read = vpci_portio_read,
> +    .write = vpci_portio_write,
> +};
> +
> +void register_vpci_portio_handler(struct domain *d)
> +{
> +    struct hvm_io_handler *handler;
> +
> +    if ( !has_vpci(d) )
> +        return;
> +
> +    handler = hvm_next_io_handler(d);
> +    if ( !handler )
> +        return;
> +
> +    spin_lock_init(&d->arch.hvm_domain.vpci_lock);
> +    handler->type = IOREQ_TYPE_PIO;
> +    handler->ops = &vpci_portio_ops;
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> index f7b927858c..4cf919f206 100644
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -1566,7 +1566,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>          domcr_flags |= DOMCRF_hvm |
>                         ((hvm_funcs.hap_supported && !opt_dom0_shadow) ?
>                           DOMCRF_hap : 0);
> -        config.emulation_flags = XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC;
> +        config.emulation_flags = XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC|
> +                                 XEN_X86_EMU_VPCI;
>      }
> 
>      /* Create initial domain 0. */
> diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
> index 8289a1bf09..f5cc8e2b8d 100644
> --- a/xen/arch/x86/xen.lds.S
> +++ b/xen/arch/x86/xen.lds.S
> @@ -224,6 +224,9 @@ SECTIONS
>         __start_schedulers_array = .;
>         *(.data.schedulers)
>         __end_schedulers_array = .;
> +       __start_vpci_array = .;
> +       *(.data.vpci)
> +       __end_vpci_array = .;
>         *(.data.rel.ro)
>         *(.data.rel.ro.*)
>    } :text
> diff --git a/xen/drivers/Makefile b/xen/drivers/Makefile
> index 19391802a8..d51c766453 100644
> --- a/xen/drivers/Makefile
> +++ b/xen/drivers/Makefile
> @@ -1,6 +1,6 @@
>  subdir-y += char
>  subdir-$(CONFIG_HAS_CPUFREQ) += cpufreq
> -subdir-$(CONFIG_HAS_PCI) += pci
> +subdir-$(CONFIG_HAS_PCI) += pci vpci
>  subdir-$(CONFIG_HAS_PASSTHROUGH) += passthrough
>  subdir-$(CONFIG_ACPI) += acpi
>  subdir-$(CONFIG_VIDEO) += video
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index c8e2d2d9a9..2288cf8814 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -30,6 +30,7 @@
>  #include <xen/radix-tree.h>
>  #include <xen/softirq.h>
>  #include <xen/tasklet.h>
> +#include <xen/vpci.h>
>  #include <xsm/xsm.h>
>  #include <asm/msi.h>
>  #include "ats.h"
> @@ -1041,6 +1042,8 @@ static void setup_one_hwdom_device(const struct setup_hwdom *ctxt,
>          devfn += pdev->phantom_stride;
>      } while ( devfn != pdev->devfn &&
>                PCI_SLOT(devfn) == PCI_SLOT(pdev->devfn) );
> +
> +    xen_vpci_add_handlers(pdev);
>  }
> 
>  static int __hwdom_init _setup_hwdom_pci_devices(struct pci_seg *pseg, void *arg)
> diff --git a/xen/drivers/vpci/Makefile b/xen/drivers/vpci/Makefile
> new file mode 100644
> index 0000000000..840a906470
> --- /dev/null
> +++ b/xen/drivers/vpci/Makefile
> @@ -0,0 +1 @@
> +obj-y += vpci.o
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> new file mode 100644
> index 0000000000..f4cd04f11d
> --- /dev/null
> +++ b/xen/drivers/vpci/vpci.c
> @@ -0,0 +1,474 @@
> +/*
> + * Generic functionality for handling accesses to the PCI configuration space
> + * from guests.
> + *
> + * Copyright (C) 2017 Citrix Systems R&D
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms and conditions of the GNU General Public
> + * License, version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <xen/sched.h>
> +#include <xen/vpci.h>
> +
> +extern const vpci_register_init_t __start_vpci_array[], __end_vpci_array[];
> +#define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
> +#define vpci_init __start_vpci_array
> +
> +/* Helpers for locking/unlocking. */
> +#define vpci_lock(d) spin_lock(&(d)->arch.hvm_domain.vpci_lock)
> +#define vpci_unlock(d) spin_unlock(&(d)->arch.hvm_domain.vpci_lock)
> +#define vpci_locked(d) spin_is_locked(&(d)->arch.hvm_domain.vpci_lock)
> +
> +/* Internal struct to store the emulated PCI registers. */
> +struct vpci_register {
> +    vpci_read_t read;
> +    vpci_write_t write;
> +    unsigned int size;
> +    unsigned int offset;
> +    void *priv_data;
> +    struct rb_node node;
> +};
> +
> +int xen_vpci_add_handlers(struct pci_dev *pdev)
> +{
> +    int i, rc = 0;
> +
> +    if ( !has_vpci(pdev->domain) )
> +        return 0;
> +
> +    pdev->vpci = xzalloc(struct vpci);
> +    if ( !pdev->vpci )
> +        return -ENOMEM;
> +
> +    pdev->vpci->handlers = RB_ROOT;
> +
> +    for ( i = 0; i < NUM_VPCI_INIT; i++ )
> +    {
> +        rc = vpci_init[i](pdev);
> +        if ( rc )
> +            break;
> +    }
> +
> +    if ( rc )
> +    {
> +        struct rb_node *node = rb_first(&pdev->vpci->handlers);
> +        struct vpci_register *r;
> +
> +        /* Iterate over the tree and cleanup. */
> +        while ( node != NULL )
> +        {
> +            r = container_of(node, struct vpci_register, node);
> +            node = rb_next(node);
> +            rb_erase(&r->node, &pdev->vpci->handlers);
> +            xfree(r);
> +        }
> +        xfree(pdev->vpci);
> +    }
> +
> +    return rc;
> +}
> +
> +static bool vpci_register_overlap(const struct vpci_register *r,
> +                                  unsigned int offset)
> +{
> +    if ( offset >= r->offset && offset < r->offset + r->size )
> +        return true;
> +
> +    return false;
> +}
> +
> +
> +static int vpci_register_cmp(const struct vpci_register *r1,
> +                             const struct vpci_register *r2)
> +{
> +    /* Make sure there's no overlap between registers. */
> +    if ( vpci_register_overlap(r1, r2->offset) ||
> +         vpci_register_overlap(r1, r2->offset + r2->size - 1) ||
> +         vpci_register_overlap(r2, r1->offset) ||
> +         vpci_register_overlap(r2, r1->offset + r1->size - 1) )
> +        return 0;
> +
> +    if (r1->offset < r2->offset)
> +        return -1;
> +    else if (r1->offset > r2->offset)
> +        return 1;
> +
> +    ASSERT_UNREACHABLE();
> +    return 0;
> +}
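
For reference, the overlap condition that the comparator relies on can also be written as a single interval test. This is only a sketch with a hypothetical name (ranges_overlap); as far as I can tell it should give the same answer as the four endpoint checks above whenever both sizes are non-zero.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: do the half-open byte ranges
 * [off1, off1 + size1) and [off2, off2 + size2) share at least one byte?
 */
static bool ranges_overlap(unsigned int off1, unsigned int size1,
                           unsigned int off2, unsigned int size2)
{
    /* Each range has to start before the other one ends. */
    return off1 < off2 + size2 && off2 < off1 + size1;
}
```

Taking the layout from the test harness, a 4-byte register at offset 4 collides with the 1-byte register at offset 5, while the 4-byte register at offset 0 does not touch anything at offset 4.
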
> +
> +static struct vpci_register *vpci_find_register(const struct pci_dev *pdev,
> +                                                const unsigned int reg,
> +                                                const unsigned int size)
> +{
> +    struct rb_node *node;
> +    struct vpci_register r = {
> +        .offset = reg,
> +        .size = size,
> +    };
> +
> +    ASSERT(vpci_locked(pdev->domain));
> +
> +    node = pdev->vpci->handlers.rb_node;
> +    while ( node )
> +    {
> +        struct vpci_register *t =
> +            container_of(node, struct vpci_register, node);
> +
> +        switch ( vpci_register_cmp(&r, t) )
> +        {
> +        case -1:
> +            node = node->rb_left;
> +            break;
> +        case 1:
> +            node = node->rb_right;
> +            break;
> +        default:
> +            return t;
> +        }
> +    }
> +
> +    return NULL;
> +}
> +
> +int xen_vpci_add_register(struct pci_dev *pdev, vpci_read_t read_handler,
> +                          vpci_write_t write_handler, unsigned int offset,
> +                          unsigned int size, void *data)
> +{
> +    struct rb_node **new, *parent;
> +    struct vpci_register *r;
> +
> +    /* Some sanity checks. */
> +    if ( (size != 1 && size != 2 && size != 4) || offset >= 0xFFF ||
> +         offset & (size - 1) || read_handler == NULL || write_handler == NULL )
> +        return -EINVAL;
> +
> +    r = xzalloc(struct vpci_register);
> +    if ( !r )
> +        return -ENOMEM;
> +
> +    r->read = read_handler;
> +    r->write = write_handler;
> +    r->size = size;
> +    r->offset = offset;
> +    r->priv_data = data;
> +
> +    vpci_lock(pdev->domain);
> +    new = &pdev->vpci->handlers.rb_node;
> +    parent = NULL;
> +
> +    while (*new) {
> +        struct vpci_register *this =
> +            container_of(*new, struct vpci_register, node);
> +
> +        parent = *new;
> +        switch ( vpci_register_cmp(r, this) )
> +        {
> +        case -1:
> +            new = &((*new)->rb_left);
> +            break;
> +        case 1:
> +            new = &((*new)->rb_right);
> +            break;
> +        default:
> +            xfree(r);
> +            vpci_unlock(pdev->domain);
> +            return -EEXIST;
> +        }
> +    }
> +
> +    rb_link_node(&r->node, parent, new);
> +    rb_insert_color(&r->node, &pdev->vpci->handlers);
> +    vpci_unlock(pdev->domain);
> +
> +    return 0;
> +}
> +
> +int xen_vpci_remove_register(struct pci_dev *pdev, unsigned int offset)
> +{
> +    struct vpci_register *r;
> +
> +    vpci_lock(pdev->domain);
> +    r = vpci_find_register(pdev, offset, 1 /* size doesn't matter here. */);
> +    if ( !r )
> +    {
> +        vpci_unlock(pdev->domain);
> +        return -ENOENT;
> +    }
> +
> +    rb_erase(&r->node, &pdev->vpci->handlers);
> +    xfree(r);
> +    vpci_unlock(pdev->domain);
> +
> +    return 0;
> +}
> +
> +/* Wrappers for performing reads/writes to the underlying hardware. */
> +static void vpci_read_hw(unsigned int seg, unsigned int bus,
> +                         unsigned int devfn, unsigned int reg, uint32_t size,
> +                         uint32_t *data)
> +{
> +    switch ( size )
> +    {
> +    case 4:
> +        *data = pci_conf_read32(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
> +                                reg);
> +        break;
> +    case 3:
> +        /*
> +         * This is possible because a 4byte read can have 1byte trapped and
> +         * the rest passed-through.
> +         */
> +        *data = pci_conf_read16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
> +                                reg + 1) << 8;
> +        *data |= pci_conf_read8(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
> +                               reg);
> +        break;
> +    case 2:
> +        *data = pci_conf_read16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
> +                                reg);
> +        break;
> +    case 1:
> +        *data = pci_conf_read8(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
> +                               reg);
> +        break;
> +    default:
> +        BUG();
> +    }
> +}
> +
> +static void vpci_write_hw(unsigned int seg, unsigned int bus,
> +                          unsigned int devfn, unsigned int reg, uint32_t size,
> +                          uint32_t data)
> +{
> +    switch ( size )
> +    {
> +    case 4:
> +        pci_conf_write32(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), reg,
> +                         data);
> +        break;
> +    case 3:
> +        /*
> +         * This is possible because a 4byte write can have 1byte trapped and
> +         * the rest passed-through.
> +         */
> +        pci_conf_write8(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), reg, data);
> +        pci_conf_write16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), reg + 1,
> +                         data >> 8);
> +        break;
> +    case 2:
> +        pci_conf_write16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), reg,
> +                         data);
> +        break;
> +    case 1:
> +        pci_conf_write8(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), reg, data);
> +        break;
> +    default:
> +        BUG();
> +    }
> +}
> +
> +/* Helper macros for the read/write handlers. */
> +#define GENMASK_BYTES(e, s) GENMASK((e) * 8, (s) * 8)
> +#define SHIFT_RIGHT_BYTES(d, o) d >>= (o) * 8
> +#define ADD_RESULT(r, d, s, o) r |= ((d) & GENMASK_BYTES(s, 0)) << ((o) * 8)
> +
> +int xen_vpci_read(unsigned int seg, unsigned int bus, unsigned int devfn,
> +                  unsigned int reg, uint32_t size, uint32_t *data)
> +{
> +    struct domain *d = current->domain;
> +    struct pci_dev *pdev;
> +    const struct vpci_register *r;
> +    union vpci_val val = { .double_word = 0 };
> +    unsigned int data_rshift = 0, data_lshift = 0, data_size;
> +    uint32_t tmp_data;
> +    int rc;
> +
> +    ASSERT(vpci_locked(d));
> +
> +    *data = 0;
> +
> +    /* Find the PCI dev matching the address. */
> +    pdev = pci_get_pdev_by_domain(d, seg, bus, devfn);
> +    if ( !pdev )
> +        goto passthrough;

I hope this can eventually be generalised, so I wonder what your intention is regarding co-existence between the Xen-emulated PCI config space, pass-through, and PCI devices emulated externally. We already have a framework for registering PCI devices by SBDF, but this code seems to make no use of it, which I suspect is likely to cause conflicts in the future.

  Paul

> +
> +    /* Find the vPCI register handler. */
> +    r = vpci_find_register(pdev, reg, size);
> +    if ( !r )
> +        goto passthrough;
> +
> +    if ( r->offset > reg )
> +    {
> +        /*
> +         * There's a leading gap before the emulated register.
> +         * NB: it's possible for this recursive call to have a size of 3.
> +         */
> +        rc = xen_vpci_read(seg, bus, devfn, reg, r->offset - reg, &tmp_data);
> +        if ( rc )
> +            return rc;
> +
> +        /* Add the head read to the partial result. */
> +        ADD_RESULT(*data, tmp_data, r->offset - reg, 0);
> +        data_lshift = r->offset - reg;
> +
> +        /* Account for the read. */
> +        size -= data_lshift;
> +        reg += data_lshift;
> +    }
> +    else if ( r->offset < reg )
> +        /* There's an offset into the emulated register */
> +        data_rshift = reg - r->offset;
> +
> +    ASSERT(data_lshift == 0 || data_rshift == 0);
> +    data_size = min(size, r->size - data_rshift);
> +    ASSERT(data_size != 0);
> +
> +    /* Perform the read of the register. */
> +    rc = r->read(pdev, r->offset, &val, r->priv_data);
> +    if ( rc )
> +        return rc;
> +
> +    val.double_word >>= data_rshift * 8;
> +    ADD_RESULT(*data, val.double_word, data_size, data_lshift);
> +
> +    /* Account for the read */
> +    size -= data_size;
> +    reg += data_size;
> +
> +    /* Read the remaining, if any. */
> +    if ( size > 0 )
> +    {
> +        /*
> +         * Read trailing data.
> +         * NB: it's possible for this recursive call to have a size of 3.
> +         */
> +        rc = xen_vpci_read(seg, bus, devfn, reg, size, &tmp_data);
> +        if ( rc )
> +            return rc;
> +
> +        /* Add the tail read to the partial result. */
> +        ADD_RESULT(*data, tmp_data, size, data_size + data_lshift);
> +    }
> +
> +    return 0;
> +
> + passthrough:
> +    vpci_read_hw(seg, bus, devfn, reg, size, data);
> +    return 0;
> +}
> +
> +/* Perform a maybe partial write to a register. */
> +static int vpci_write_helper(struct pci_dev *pdev,
> +                             const struct vpci_register *r, unsigned int size,
> +                             unsigned int offset, uint32_t data)
> +{
> +    union vpci_val val = { .double_word = data };
> +    int rc;
> +
> +    ASSERT(size <= r->size);
> +    if ( size != r->size )
> +    {
> +        rc = r->read(pdev, r->offset, &val, r->priv_data);
> +        if ( rc )
> +            return rc;
> +        val.double_word &= ~GENMASK_BYTES(size + offset, offset);
> +        data &= GENMASK_BYTES(size, 0);
> +        val.double_word |= data << (offset * 8);
> +    }
> +
> +    return r->write(pdev, r->offset, val, r->priv_data);
> +}
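
The read-modify-write step in this helper is easier to follow with concrete numbers, so here is a standalone sketch of merging a partial write into an existing 32-bit register value. It deliberately does not use GENMASK_BYTES; the function name merge_partial_write and the way the mask is built are my own assumptions, chosen so that the mask covers exactly `size` bytes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: replace `size` bytes of `old`, starting at byte
 * `offset`, with the low `size` bytes of `data`.
 */
static uint32_t merge_partial_write(uint32_t old, uint32_t data,
                                    unsigned int size, unsigned int offset)
{
    uint32_t mask;

    assert(size >= 1 && size <= 4 && offset + size <= 4);

    /* Mask covering exactly `size` bytes; special-case 4 to avoid a 32-bit shift. */
    mask = (size == 4) ? UINT32_MAX : ((UINT32_C(1) << (size * 8)) - 1);

    /* Clear the target bytes in the old value, then drop in the new ones. */
    return (old & ~(mask << (offset * 8))) | ((data & mask) << (offset * 8));
}
```

This mirrors the "partial write of a register" case in the test harness: writing the single byte 0xfe at offset 2 of 0x1a1b1c1d yields 0x1afe1c1d.
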
> +
> +int xen_vpci_write(unsigned int seg, unsigned int bus, unsigned int devfn,
> +                   unsigned int reg, uint32_t size, uint32_t data)
> +{
> +    struct domain *d = current->domain;
> +    struct pci_dev *pdev;
> +    const struct vpci_register *r;
> +    unsigned int data_size, data_offset = 0;
> +    int rc;
> +
> +    ASSERT(vpci_locked(d));
> +
> +    /* Find the PCI dev matching the address. */
> +    pdev = pci_get_pdev_by_domain(d, seg, bus, devfn);
> +    if ( !pdev )
> +        goto passthrough;
> +
> +    /* Find the vPCI register handler. */
> +    r = vpci_find_register(pdev, reg, size);
> +    if ( !r )
> +        goto passthrough;
> +
> +    else if ( r->offset > reg )
> +    {
> +        /*
> +         * There's a leading gap before the emulated register found.
> +         * NB: it's possible for this recursive call to have a size of 3.
> +         */
> +        rc = xen_vpci_write(seg, bus, devfn, reg, r->offset - reg, data);
> +        if ( rc )
> +            return rc;
> +
> +        /* Advance the data by the written size. */
> +        SHIFT_RIGHT_BYTES(data, r->offset - reg);
> +        size -= r->offset - reg;
> +        reg += r->offset - reg;
> +    }
> +    else if ( r->offset < reg )
> +        /* There's an offset into the emulated register. */
> +        data_offset = reg - r->offset;
> +
> +    data_size = min(size, r->size - data_offset);
> +
> +    /* Perform the write of the register. */
> +    ASSERT(data_size != 0);
> +    rc = vpci_write_helper(pdev, r, data_size, data_offset, data);
> +    if ( rc )
> +        return rc;
> +
> +    /* Account for the write. */
> +    size -= data_size;
> +    reg += data_size;
> +    SHIFT_RIGHT_BYTES(data, data_size);
> +
> +    /* Write the remaining, if any. */
> +    if ( size > 0 )
> +    {
> +        /*
> +         * Write trailing data.
> +         * NB: it's possible for this recursive call to have a size of 3.
> +         */
> +        rc = xen_vpci_write(seg, bus, devfn, reg, size, data);
> +        if ( rc )
> +            return rc;
> +    }
> +
> +    return 0;
> +
> + passthrough:
> +    vpci_write_hw(seg, bus, devfn, reg, size, data);
> +    return 0;
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> +
> diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
> index 6ab987f231..f0741917ed 100644
> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -426,6 +426,7 @@ struct arch_domain
>  #define has_vpit(d)        (!!((d)->arch.emulation_flags & XEN_X86_EMU_PIT))
>  #define has_pirq(d)        (!!((d)->arch.emulation_flags & \
>                              XEN_X86_EMU_USE_PIRQ))
> +#define has_vpci(d)        (!!((d)->arch.emulation_flags & XEN_X86_EMU_VPCI))
> 
>  #define has_arch_pdevs(d)    (!list_empty(&(d)->arch.pdev_list))
> 
> diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
> index d2899c9bb2..cbf4170789 100644
> --- a/xen/include/asm-x86/hvm/domain.h
> +++ b/xen/include/asm-x86/hvm/domain.h
> @@ -184,6 +184,9 @@ struct hvm_domain {
>      /* List of guest to machine IO ports mapping. */
>      struct list_head g2m_ioport_list;
> 
> +    /* Lock for the PCI emulation layer (vPCI). */
> +    spinlock_t vpci_lock;
> +
>      /* List of permanently write-mapped pages. */
>      struct {
>          spinlock_t lock;
> diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
> index 2484eb1c75..2dbf92f13e 100644
> --- a/xen/include/asm-x86/hvm/io.h
> +++ b/xen/include/asm-x86/hvm/io.h
> @@ -155,6 +155,9 @@ extern void hvm_dpci_msi_eoi(struct domain *d, int vector);
>   */
>  void register_g2m_portio_handler(struct domain *d);
> 
> +/* HVM port IO handler for PCI accesses. */
> +void register_vpci_portio_handler(struct domain *d);
> +
>  #endif /* __ASM_X86_HVM_IO_H__ */
> 
> 
> diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch-x86/xen.h
> index 8a9ba7982b..c00f8cda93 100644
> --- a/xen/include/public/arch-x86/xen.h
> +++ b/xen/include/public/arch-x86/xen.h
> @@ -295,12 +295,15 @@ struct xen_arch_domainconfig {
>  #define XEN_X86_EMU_PIT             (1U<<_XEN_X86_EMU_PIT)
>  #define _XEN_X86_EMU_USE_PIRQ       9
>  #define XEN_X86_EMU_USE_PIRQ        (1U<<_XEN_X86_EMU_USE_PIRQ)
> +#define _XEN_X86_EMU_VPCI           10
> +#define XEN_X86_EMU_VPCI            (1U<<_XEN_X86_EMU_VPCI)
> 
>  #define XEN_X86_EMU_ALL             (XEN_X86_EMU_LAPIC | XEN_X86_EMU_HPET |  \
>                                       XEN_X86_EMU_PM | XEN_X86_EMU_RTC |      \
>                                       XEN_X86_EMU_IOAPIC | XEN_X86_EMU_PIC |  \
>                                       XEN_X86_EMU_VGA | XEN_X86_EMU_IOMMU |   \
> -                                     XEN_X86_EMU_PIT | XEN_X86_EMU_USE_PIRQ)
> +                                     XEN_X86_EMU_PIT | XEN_X86_EMU_USE_PIRQ |\
> +                                     XEN_X86_EMU_VPCI)
>      uint32_t emulation_flags;
>  };
> 
> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
> index 59b6e8a81c..a83c4a1276 100644
> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -13,6 +13,7 @@
>  #include <xen/irq.h>
>  #include <xen/pci_regs.h>
>  #include <xen/pfn.h>
> +#include <xen/rbtree.h>
>  #include <asm/device.h>
>  #include <asm/numa.h>
>  #include <asm/pci.h>
> @@ -88,6 +89,9 @@ struct pci_dev {
>  #define PT_FAULT_THRESHOLD 10
>      } fault;
>      u64 vf_rlen[6];
> +
> +    /* Data for vPCI. */
> +    struct vpci *vpci;
>  };
> 
>  #define for_each_pdev(domain, pdev) \
> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> new file mode 100644
> index 0000000000..56e8d1c35e
> --- /dev/null
> +++ b/xen/include/xen/vpci.h
> @@ -0,0 +1,66 @@
> +#ifndef _VPCI_
> +#define _VPCI_
> +
> +#include <xen/pci.h>
> +#include <xen/types.h>
> +
> +/* Helpers for locking/unlocking. */
> +#define vpci_lock(d) spin_lock(&(d)->arch.hvm_domain.vpci_lock)
> +#define vpci_unlock(d) spin_unlock(&(d)->arch.hvm_domain.vpci_lock)
> +#define vpci_locked(d) spin_is_locked(&(d)->arch.hvm_domain.vpci_lock)
> +
> +/* Value read or written by the handlers. */
> +union vpci_val {
> +    uint8_t half_word;
> +    uint16_t word;
> +    uint32_t double_word;
> +};
> +
> +/*
> + * The vPCI handlers will never be called concurrently for the same domain, it
> + * is guaranteed that the vpci domain lock will always be locked when calling
> + * any handler.
> + */
> +typedef int (*vpci_read_t)(struct pci_dev *pdev, unsigned int reg,
> +                           union vpci_val *val, void *data);
> +
> +typedef int (*vpci_write_t)(struct pci_dev *pdev, unsigned int reg,
> +                            union vpci_val val, void *data);
> +
> +typedef int (*vpci_register_init_t)(struct pci_dev *dev);
> +
> +#define REGISTER_VPCI_INIT(x) \
> +  static const vpci_register_init_t x##_entry __used_section(".data.vpci") = x
> +
> +/* Add vPCI handlers to device. */
> +int xen_vpci_add_handlers(struct pci_dev *dev);
> +
> +/* Add/remove a register handler. */
> +int xen_vpci_add_register(struct pci_dev *pdev, vpci_read_t read_handler,
> +                          vpci_write_t write_handler, unsigned int offset,
> +                          unsigned int size, void *data);
> +int xen_vpci_remove_register(struct pci_dev *pdev, unsigned int offset);
> +
> +/* Generic read/write handlers for the PCI config space. */
> +int xen_vpci_read(unsigned int seg, unsigned int bus, unsigned int devfn,
> +                  unsigned int reg, uint32_t size, uint32_t *data);
> +int xen_vpci_write(unsigned int seg, unsigned int bus, unsigned int devfn,
> +                   unsigned int reg, uint32_t size, uint32_t data);
> +
> +struct vpci {
> +    /* Root pointer for the tree of vPCI handlers. */
> +    struct rb_root handlers;
> +};
> +
> +#endif
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> +
> --
> 2.11.0 (Apple Git-81)


* Re: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap accesses to the PCI config space
  2017-04-21 16:07   ` Paul Durrant
@ 2017-04-24  9:09     ` Roger Pau Monne
  2017-04-24  9:34       ` Paul Durrant
  0 siblings, 1 reply; 40+ messages in thread
From: Roger Pau Monne @ 2017-04-24  9:09 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Wei Liu, Andrew Cooper, Jan Beulich, Ian Jackson, xen-devel,
	boris.ostrovsky

On Fri, Apr 21, 2017 at 05:07:43PM +0100, Paul Durrant wrote:
> > -----Original Message-----
> > From: Roger Pau Monne [mailto:roger.pau@citrix.com]
> > Sent: 20 April 2017 16:18
> > To: xen-devel@lists.xenproject.org
> > Cc: konrad.wilk@oracle.com; boris.ostrovsky@oracle.com; Roger Pau Monne
> > <roger.pau@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; Wei Liu
> > <wei.liu2@citrix.com>; Jan Beulich <jbeulich@suse.com>; Andrew Cooper
> > <Andrew.Cooper3@citrix.com>; Paul Durrant <Paul.Durrant@citrix.com>
> > Subject: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap accesses
> > to the PCI config space
> > 
> > This functionality is going to reside in vpci.c (and the corresponding vpci.h
> > header), and should be arch-agnostic. The handlers introduced in this patch
> > setup the basic functionality required in order to trap accesses to the PCI
> > config space, and allow decoding the address and finding the corresponding
> > handler that should handle the access (although no handlers are
> > implemented).
> > 
> > Note that the traps to the PCI IO ports registers (0xcf8/0xcfc) are setup
> > inside of a x86 HVM file, since that's not shared with other arches.
> > 
> > A new XEN_X86_EMU_VPCI x86 domain flag is added in order to signal Xen
> > whether
> > a domain should use the newly introduced vPCI handlers, this is only enabled
> > for PVH Dom0 at the moment.
> > 
> > A very simple user-space test is also provided, so that the basic functionality
> > of the vPCI traps can be asserted. This has been proven quite helpful during
> > development, since the logic to handle partial accesses or accesses that
> > expand
> > across multiple registers is not trivial.
> > 
> > The handlers for the registers are added to a red-black tree, that indexes
> > them
> > based on their offset. Since Xen needs to handle partial accesses to the
> > registers and access that expand across multiple registers the logic in
> > xen_vpci_{read/write} is kind of convoluted, I've tried to properly comment
> > it
> > in order to make it easier to understand.
> > 
> 
> Since config space is not exactly huge, I'm wondering why you used an r-b tree rather than a direct map from register to handler?

Hello,

For local PCI the configuration space is only 256 bytes, which means using 1/2
a page (256 * 8) so that Xen can store a pointer for each possible register.
The extended configuration space (ECAM) extends the space to 4K, which means we
would use 8 pages per device (4096 * 8); I think that's too much.
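
To spell out the arithmetic, here is a rough sketch. The helper names are made
up for illustration, and it assumes 8-byte pointers and 4KiB pages (as on
x86-64); it is not code from the patch:

```c
#include <stddef.h>

#define PAGE_SIZE        4096u  /* assumption: 4KiB pages */
#define HANDLER_PTR_SIZE 8u     /* assumption: 64-bit pointers */

/* Hypothetical helper: bytes for one handler pointer per config space byte. */
static size_t direct_map_bytes(size_t cfg_space_size)
{
    return cfg_space_size * HANDLER_PTR_SIZE;
}

/* Hypothetical helper: the same cost in whole pages, rounded up. */
static size_t direct_map_pages(size_t cfg_space_size)
{
    return (direct_map_bytes(cfg_space_size) + PAGE_SIZE - 1) / PAGE_SIZE;
}
```

Under these assumptions direct_map_bytes(256) is 2048 (half a page) and
direct_map_pages(4096) is 8, and that cost is paid by every device that gets
handlers.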

Roger.


* Re: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap accesses to the PCI config space
  2017-04-24  9:09     ` Roger Pau Monne
@ 2017-04-24  9:34       ` Paul Durrant
  2017-04-24 10:08         ` Roger Pau Monne
  0 siblings, 1 reply; 40+ messages in thread
From: Paul Durrant @ 2017-04-24  9:34 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Wei Liu, Andrew Cooper, Jan Beulich, Ian Jackson, xen-devel,
	boris.ostrovsky

> -----Original Message-----
> From: Roger Pau Monne
> Sent: 24 April 2017 10:09
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: xen-devel@lists.xenproject.org; konrad.wilk@oracle.com;
> boris.ostrovsky@oracle.com; Ian Jackson <Ian.Jackson@citrix.com>; Wei Liu
> <wei.liu2@citrix.com>; Jan Beulich <jbeulich@suse.com>; Andrew Cooper
> <Andrew.Cooper3@citrix.com>
> Subject: Re: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap
> accesses to the PCI config space
> 
> On Fri, Apr 21, 2017 at 05:07:43PM +0100, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Roger Pau Monne [mailto:roger.pau@citrix.com]
> > > Sent: 20 April 2017 16:18
> > > To: xen-devel@lists.xenproject.org
> > > Cc: konrad.wilk@oracle.com; boris.ostrovsky@oracle.com; Roger Pau
> Monne
> > > <roger.pau@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; Wei Liu
> > > <wei.liu2@citrix.com>; Jan Beulich <jbeulich@suse.com>; Andrew
> Cooper
> > > <Andrew.Cooper3@citrix.com>; Paul Durrant <Paul.Durrant@citrix.com>
> > > Subject: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap
> accesses
> > > to the PCI config space
> > >
> > > This functionality is going to reside in vpci.c (and the corresponding vpci.h
> > > header), and should be arch-agnostic. The handlers introduced in this
> patch
> > > setup the basic functionality required in order to trap accesses to the PCI
> > > config space, and allow decoding the address and finding the
> corresponding
> > > handler that should handle the access (although no handlers are
> > > implemented).
> > >
> > > Note that the traps to the PCI IO ports registers (0xcf8/0xcfc) are setup
> > > inside of a x86 HVM file, since that's not shared with other arches.
> > >
> > > A new XEN_X86_EMU_VPCI x86 domain flag is added in order to signal
> Xen
> > > whether
> > > a domain should use the newly introduced vPCI handlers, this is only
> enabled
> > > for PVH Dom0 at the moment.
> > >
> > > A very simple user-space test is also provided, so that the basic
> functionality
> > > of the vPCI traps can be asserted. This has been proven quite helpful
> during
> > > development, since the logic to handle partial accesses or accesses that
> > > expand
> > > across multiple registers is not trivial.
> > >
> > > The handlers for the registers are added to a red-black tree, that indexes
> > > them
> > > based on their offset. Since Xen needs to handle partial accesses to the
> > > registers and access that expand across multiple registers the logic in
> > > xen_vpci_{read/write} is kind of convoluted, I've tried to properly
> comment
> > > it
> > > in order to make it easier to understand.
> > >
> >
> > Since config space is not exactly huge, I'm wondering why you used an r-b
> tree rather than a direct map from register to handler?
> 
> Hello,
> 
> For local PCI the configuration space it's 256byte only, which means using 1/2
> a page (256 * 8) so that Xen can store a pointer for each possible register.
> The extended configuration space (ECAM) extends the space to 4K, which
> means we
> would use 8 pages per device (4096*8), I think that's too much.

Ok, but I still think that adding an r-b tree implementation is just more complexity in the way that io handlers are registered in Xen.

TBH, the whole thing needs a clean-up. We don't have proper range-based handler registration for port IO or MMIO at all (instead we potentially call the 'accept' function for every handler for every I/O). We then have (IIRC) an ordered list for MSI-X BAR registrations and now you're proposing an r-b system for PCI config space. On top of that, there is then the rangeset based ioreq server selection that occurs if the I/O falls through all of this and needs sending outside Xen. There really has to be at least some scope for unification here; it's getting way too convoluted.
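
To illustrate what range-based registration could look like, here is a sketch only: struct io_range and io_range_find() are hypothetical names, not existing Xen interfaces, and the table is assumed to be sorted by start address with no overlapping ranges. A lookup then costs a binary search instead of one 'accept' call per registered handler:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range-registered handler; not an existing Xen structure. */
struct io_range {
    uint64_t start;   /* first address covered */
    uint64_t end;     /* last address covered (inclusive) */
    int id;           /* stand-in for a handler pointer */
};

/*
 * Binary search over a table sorted by start address, with ranges assumed
 * not to overlap. Returns the entry covering addr, or NULL if there is none.
 */
static const struct io_range *io_range_find(const struct io_range *r,
                                            size_t nr, uint64_t addr)
{
    size_t lo = 0, hi = nr;

    while ( lo < hi )
    {
        size_t mid = lo + (hi - lo) / 2;

        if ( addr < r[mid].start )
            hi = mid;
        else if ( addr > r[mid].end )
            lo = mid + 1;
        else
            return &r[mid];
    }

    return NULL;
}
```

A NULL result would correspond to the I/O falling through to whatever comes next, for example ioreq server selection.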

  Paul

> 
> Roger.


* Re: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap accesses to the PCI config space
  2017-04-21 16:23   ` Paul Durrant
@ 2017-04-24  9:42     ` Roger Pau Monne
  2017-04-24  9:55       ` Paul Durrant
  2017-04-24  9:58       ` Paul Durrant
  0 siblings, 2 replies; 40+ messages in thread
From: Roger Pau Monne @ 2017-04-24  9:42 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Wei Liu, Andrew Cooper, Jan Beulich, Ian Jackson, xen-devel,
	boris.ostrovsky

On Fri, Apr 21, 2017 at 05:23:34PM +0100, Paul Durrant wrote:
> > -----Original Message-----
> > From: Roger Pau Monne [mailto:roger.pau@citrix.com]
[...]
> > +int xen_vpci_read(unsigned int seg, unsigned int bus, unsigned int devfn,
> > +                  unsigned int reg, uint32_t size, uint32_t *data)
> > +{
> > +    struct domain *d = current->domain;
> > +    struct pci_dev *pdev;
> > +    const struct vpci_register *r;
> > +    union vpci_val val = { .double_word = 0 };
> > +    unsigned int data_rshift = 0, data_lshift = 0, data_size;
> > +    uint32_t tmp_data;
> > +    int rc;
> > +
> > +    ASSERT(vpci_locked(d));
> > +
> > +    *data = 0;
> > +
> > +    /* Find the PCI dev matching the address. */
> > +    pdev = pci_get_pdev_by_domain(d, seg, bus, devfn);
> > +    if ( !pdev )
> > +        goto passthrough;
> 
> I hope this can eventually be generalised so I wonder what your intention is regarding co-existence between Xen emulated PCI config space, pass-through and PCI devices emulated externally. We already have a framework for registering PCI devices by SBDF but this code seems to make no use of it, which I suspect is likely to cause future conflict.

Yes, the long term aim is to use this code in order to implement
PCI-passthrough for PVH and HVM DomUs also.

TBH, I didn't know we already had such code (I assume you mean the IOREQ
related PCI code). As it is, I see a couple of issues with that. The first one
is that this code expects an ioreq client on the other end, whereas the code I'm
adding here is all inside of the hypervisor. The second issue is that the IOREQ
code ATM only allows for local PCI accesses, which means I would have to extend
it to also deal with ECAM/MMCFG areas.
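
For reference, the two access methods decode roughly as sketched below. The
struct and function names are invented for illustration and are not taken from
the patch or from the IOREQ code. The segment is left out: the 0xcf8 mechanism
has no segment field, and for ECAM it would come from the matching MMCFG
region rather than from the address itself.

```c
#include <stdint.h>

/* Hypothetical decoded config space address (segment omitted). */
struct cfg_addr {
    unsigned int bus;
    unsigned int devfn;   /* device in bits 7:3, function in bits 2:0 */
    unsigned int reg;     /* byte offset into the config space */
};

/*
 * Decode an access through the IO ports: cf8 is the value last written to
 * 0xcf8, port_off is the offset of the data access from 0xcfc (0 to 3).
 * Returns 0 if the enable bit is clear, 1 otherwise. Only the first 256
 * bytes of config space are reachable this way.
 */
static int decode_cf8(uint32_t cf8, unsigned int port_off, struct cfg_addr *a)
{
    if ( !(cf8 & 0x80000000u) )
        return 0;

    a->bus = (cf8 >> 16) & 0xff;
    a->devfn = (cf8 >> 8) & 0xff;
    a->reg = (cf8 & 0xfc) + (port_off & 3);

    return 1;
}

/*
 * Decode an offset into an ECAM region: 1MiB per bus, 4KiB per function,
 * so the full 4K extended config space is reachable.
 */
static void decode_ecam(uint64_t offset, struct cfg_addr *a)
{
    a->bus = (offset >> 20) & 0xff;
    a->devfn = (offset >> 12) & 0xff;
    a->reg = offset & 0xfff;
}
```

Both paths end up with the same (bus, devfn, reg) triple, which is why a
shared decoder in front of the handlers looks feasible.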

I completely agree that at some point this should be made to work together, but
I'm not sure if it would be better to do that once we want to also use vPCI for
DomUs, so that the Dom0 side is not delayed further.

Roger.



* Re: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap accesses to the PCI config space
  2017-04-24  9:42     ` Roger Pau Monne
@ 2017-04-24  9:55       ` Paul Durrant
  2017-04-24  9:58       ` Paul Durrant
  1 sibling, 0 replies; 40+ messages in thread
From: Paul Durrant @ 2017-04-24  9:55 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Wei Liu, Andrew Cooper, Jan Beulich, Ian Jackson, xen-devel,
	boris.ostrovsky

> -----Original Message-----
> From: Roger Pau Monne
> Sent: 24 April 2017 10:42
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: xen-devel@lists.xenproject.org; konrad.wilk@oracle.com;
> boris.ostrovsky@oracle.com; Ian Jackson <Ian.Jackson@citrix.com>; Wei Liu
> <wei.liu2@citrix.com>; Jan Beulich <jbeulich@suse.com>; Andrew Cooper
> <Andrew.Cooper3@citrix.com>
> Subject: Re: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap
> accesses to the PCI config space
> 
> On Fri, Apr 21, 2017 at 05:23:34PM +0100, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Roger Pau Monne [mailto:roger.pau@citrix.com]
> [...]
> > > +int xen_vpci_read(unsigned int seg, unsigned int bus, unsigned int
> devfn,
> > > +                  unsigned int reg, uint32_t size, uint32_t *data)
> > > +{
> > > +    struct domain *d = current->domain;
> > > +    struct pci_dev *pdev;
> > > +    const struct vpci_register *r;
> > > +    union vpci_val val = { .double_word = 0 };
> > > +    unsigned int data_rshift = 0, data_lshift = 0, data_size;
> > > +    uint32_t tmp_data;
> > > +    int rc;
> > > +
> > > +    ASSERT(vpci_locked(d));
> > > +
> > > +    *data = 0;
> > > +
> > > +    /* Find the PCI dev matching the address. */
> > > +    pdev = pci_get_pdev_by_domain(d, seg, bus, devfn);
> > > +    if ( !pdev )
> > > +        goto passthrough;
> >
> > I hope this can eventually be generalised so I wonder what your intention is
> regarding co-existence between Xen emulated PCI config space, pass-
> through and PCI devices emulated externally. We already have a framework
> for registering PCI devices by SBDF but this code seems to make no use of it,
> which I suspect is likely to cause future conflict.
> 
> Yes, the long term aim is to use this code in order to implement
> PCI-passthrough for PVH and HVM DomUs also.
> 
> TBH, I didn't know we already had such code (I assume you mean the IOREQ
> related PCI code). As it is, I see a couple of issues with that, the first one
> is that this code expects a ioreq client on the other end, and the code I'm
> adding here is all inside of the hypervisor. The second issue is that the IOREQ
> code ATM only allows for local PCI accesses, which means I should extend it
> to
> also deal with ECAM/MMCFG areas.
> 
> I completely agree that at some point this should be made to work together,
> but
> I'm not sure if it would be better to do that once we want to also use vPCI for
> DomUs, so that the Dom0 side is not delayed further.
> 

If the follow up work will definitely be done, then I can live with that. Is there an actual plan to deal with domU pass-through on some backlog somewhere?

  Paul

> Roger.



* Re: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap accesses to the PCI config space
  2017-04-24  9:42     ` Roger Pau Monne
  2017-04-24  9:55       ` Paul Durrant
@ 2017-04-24  9:58       ` Paul Durrant
  2017-04-24 10:11         ` Roger Pau Monne
  1 sibling, 1 reply; 40+ messages in thread
From: Paul Durrant @ 2017-04-24  9:58 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Wei Liu, Andrew Cooper, Jan Beulich, Ian Jackson, xen-devel,
	boris.ostrovsky

> -----Original Message-----
> From: Roger Pau Monne
> Sent: 24 April 2017 10:42
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: xen-devel@lists.xenproject.org; konrad.wilk@oracle.com;
> boris.ostrovsky@oracle.com; Ian Jackson <Ian.Jackson@citrix.com>; Wei Liu
> <wei.liu2@citrix.com>; Jan Beulich <jbeulich@suse.com>; Andrew Cooper
> <Andrew.Cooper3@citrix.com>
> Subject: Re: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap
> accesses to the PCI config space
> 
> On Fri, Apr 21, 2017 at 05:23:34PM +0100, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Roger Pau Monne [mailto:roger.pau@citrix.com]
> [...]
> > > +int xen_vpci_read(unsigned int seg, unsigned int bus, unsigned int
> devfn,
> > > +                  unsigned int reg, uint32_t size, uint32_t *data)
> > > +{
> > > +    struct domain *d = current->domain;
> > > +    struct pci_dev *pdev;
> > > +    const struct vpci_register *r;
> > > +    union vpci_val val = { .double_word = 0 };
> > > +    unsigned int data_rshift = 0, data_lshift = 0, data_size;
> > > +    uint32_t tmp_data;
> > > +    int rc;
> > > +
> > > +    ASSERT(vpci_locked(d));
> > > +
> > > +    *data = 0;
> > > +
> > > +    /* Find the PCI dev matching the address. */
> > > +    pdev = pci_get_pdev_by_domain(d, seg, bus, devfn);
> > > +    if ( !pdev )
> > > +        goto passthrough;
> >
> > I hope this can eventually be generalised so I wonder what your intention is
> regarding co-existence between Xen emulated PCI config space, pass-
> through and PCI devices emulated externally. We already have a framework
> for registering PCI devices by SBDF but this code seems to make no use of it,
> which I suspect is likely to cause future conflict.
> 
> Yes, the long term aim is to use this code in order to implement
> PCI-passthrough for PVH and HVM DomUs also.
> 
> TBH, I didn't know we already had such code (I assume you mean the IOREQ
> related PCI code). As it is, I see a couple of issues with that, the first one
> is that this code expects a ioreq client on the other end, and the code I'm
> adding here is all inside of the hypervisor. The second issue is that the IOREQ
> code ATM only allows for local PCI accesses, which means I should extend it
> to
> also deal with ECAM/MMCFG areas.
> 
> I completely agree that at some point this should be made to work together,
> but
> I'm not sure if it would be better to do that once we want to also use vPCI for
> DomUs, so that the Dom0 side is not delayed further.

BTW, that's also an argument for forgetting about the r-b scheme for handler registration since, if this really is for dom0 only, 8 pages worth of direct map is not a lot.

  Paul

> 
> Roger.



* Re: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap accesses to the PCI config space
  2017-04-24  9:34       ` Paul Durrant
@ 2017-04-24 10:08         ` Roger Pau Monne
  2017-04-24 10:19           ` Paul Durrant
  0 siblings, 1 reply; 40+ messages in thread
From: Roger Pau Monne @ 2017-04-24 10:08 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Wei Liu, Andrew Cooper, Jan Beulich, Ian Jackson, xen-devel,
	boris.ostrovsky

On Mon, Apr 24, 2017 at 10:34:15AM +0100, Paul Durrant wrote:
> > -----Original Message-----
> > From: Roger Pau Monne
> > Sent: 24 April 2017 10:09
> > To: Paul Durrant <Paul.Durrant@citrix.com>
> > Cc: xen-devel@lists.xenproject.org; konrad.wilk@oracle.com;
> > boris.ostrovsky@oracle.com; Ian Jackson <Ian.Jackson@citrix.com>; Wei Liu
> > <wei.liu2@citrix.com>; Jan Beulich <jbeulich@suse.com>; Andrew Cooper
> > <Andrew.Cooper3@citrix.com>
> > Subject: Re: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap
> > accesses to the PCI config space
> > 
> > On Fri, Apr 21, 2017 at 05:07:43PM +0100, Paul Durrant wrote:
> > > > -----Original Message-----
> > > > From: Roger Pau Monne [mailto:roger.pau@citrix.com]
> > > > Sent: 20 April 2017 16:18
> > > > To: xen-devel@lists.xenproject.org
> > > > Cc: konrad.wilk@oracle.com; boris.ostrovsky@oracle.com; Roger Pau
> > Monne
> > > > <roger.pau@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; Wei Liu
> > > > <wei.liu2@citrix.com>; Jan Beulich <jbeulich@suse.com>; Andrew
> > Cooper
> > > > <Andrew.Cooper3@citrix.com>; Paul Durrant <Paul.Durrant@citrix.com>
> > > > Subject: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap
> > accesses
> > > > to the PCI config space
> > > >
> > > > This functionality is going to reside in vpci.c (and the corresponding vpci.h
> > > > header), and should be arch-agnostic. The handlers introduced in this
> > patch
> > > > setup the basic functionality required in order to trap accesses to the PCI
> > > > config space, and allow decoding the address and finding the
> > corresponding
> > > > handler that should handle the access (although no handlers are
> > > > implemented).
> > > >
> > > > Note that the traps to the PCI IO ports registers (0xcf8/0xcfc) are setup
> > > > inside of a x86 HVM file, since that's not shared with other arches.
> > > >
> > > > A new XEN_X86_EMU_VPCI x86 domain flag is added in order to signal
> > Xen
> > > > whether
> > > > a domain should use the newly introduced vPCI handlers, this is only
> > enabled
> > > > for PVH Dom0 at the moment.
> > > >
> > > > A very simple user-space test is also provided, so that the basic
> > functionality
> > > > of the vPCI traps can be asserted. This has been proven quite helpful
> > during
> > > > development, since the logic to handle partial accesses or accesses that
> > > > expand
> > > > across multiple registers is not trivial.
> > > >
> > > > The handlers for the registers are added to a red-black tree, that indexes
> > > > them
> > > > based on their offset. Since Xen needs to handle partial accesses to the
> > > > registers and access that expand across multiple registers the logic in
> > > > xen_vpci_{read/write} is kind of convoluted, I've tried to properly
> > comment
> > > > it
> > > > in order to make it easier to understand.
> > > >
> > >
> > > Since config space is not exactly huge, I'm wondering why you used an r-b
> > tree rather than a direct map from register to handler?
> > 
> > Hello,
> > 
> > For local PCI the configuration space it's 256byte only, which means using 1/2
> > a page (256 * 8) so that Xen can store a pointer for each possible register.
> > The extended configuration space (ECAM) extends the space to 4K, which
> > means we
> > would use 8 pages per device (4096*8), I think that's too much.
> 
> Ok, but I still think that adding an r-b tree implementation is just more complexity in the way that io handlers are registered in Xen.

But this complexity is completely hidden inside of the io handler itself that
traps the access to 0xcf8/cfc (or ECAM areas).
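
As a simplified illustration of what stays hidden behind the trap, the sketch
below merges one emulated register into a read that only partially overlaps
it. This is not the logic from xen_vpci_read; struct emu_reg and merge_read()
are hypothetical names, and registers are assumed to be at most 4 bytes wide:

```c
#include <stdint.h>

/* Hypothetical emulated register covering [offset, offset + size). */
struct emu_reg {
    unsigned int offset;
    unsigned int size;     /* 1, 2 or 4 bytes */
    uint32_t value;        /* emulated contents */
};

/*
 * Merge the bytes of r that overlap the access [reg, reg + size) into data,
 * which holds the value read so far (for example from the hardware).
 */
static uint32_t merge_read(const struct emu_reg *r, unsigned int reg,
                           unsigned int size, uint32_t data)
{
    unsigned int start = reg > r->offset ? reg : r->offset;
    unsigned int end_access = reg + size, end_reg = r->offset + r->size;
    unsigned int end = end_access < end_reg ? end_access : end_reg;
    unsigned int len, shift;
    uint32_t mask, bits;

    if ( start >= end )
        return data;                    /* no overlap, nothing to merge */

    len = end - start;
    mask = len == 4 ? 0xffffffffu : (1u << (len * 8)) - 1;

    /* Pick the overlapping bytes out of the emulated register... */
    bits = (r->value >> ((start - r->offset) * 8)) & mask;

    /* ...and place them at the right position inside the access. */
    shift = (start - reg) * 8;
    data &= ~(mask << shift);
    data |= bits << shift;

    return data;
}
```

The caller only ever sees the merged value, regardless of how the access lines
up with the registers that have handlers.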

Do you mean that you would like this functionality to be made available to IOREQ
clients also, so that they could register handlers for specific PCI registers
without owning the full configuration space of such a device?

> TBH, the whole thing needs a clean-up. We don't have proper range-based handler registration for port IO or MMIO at all (instead we potentially call the 'accept' function for every handler for every I/O). We then have (IIRC) an ordered list for MSI-X BAR registrations and now you're proposing an r-b system for PCI config space.

One way or another Xen needs to track handlers for the PCI config space, and
currently this is not implemented inside of Xen.

The MSI-X BAR tracking will go away once this code is also used for
PCI-passthrough to DomUs. The msixtbl code is just extremely messy, because
MSI-X interrupt handling for passthrough devices is partially handled in QEMU
and partially inside of Xen.

> On top of that, there is then the rangeset based ioreq server selection that occurs if the I/O falls through all of this and needs sending outside Xen. There really has to be at least some scope for unification here; it's getting way too convoluted.

Yes, I agree that there's some room for sharing here. The address decoding done
in hvm_select_ioreq_server for PCI could also be reused for vPCI; it's just
that all this code expects an IOREQ server, and vPCI is not going to be an IOREQ
server.

Roger.


* Re: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap accesses to the PCI config space
  2017-04-24  9:58       ` Paul Durrant
@ 2017-04-24 10:11         ` Roger Pau Monne
  2017-04-24 10:12           ` Paul Durrant
  0 siblings, 1 reply; 40+ messages in thread
From: Roger Pau Monne @ 2017-04-24 10:11 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Wei Liu, Andrew Cooper, Jan Beulich, Ian Jackson, xen-devel,
	boris.ostrovsky

On Mon, Apr 24, 2017 at 10:58:04AM +0100, Paul Durrant wrote:
> > -----Original Message-----
> > From: Roger Pau Monne
> > Sent: 24 April 2017 10:42
> > To: Paul Durrant <Paul.Durrant@citrix.com>
> > Cc: xen-devel@lists.xenproject.org; konrad.wilk@oracle.com;
> > boris.ostrovsky@oracle.com; Ian Jackson <Ian.Jackson@citrix.com>; Wei Liu
> > <wei.liu2@citrix.com>; Jan Beulich <jbeulich@suse.com>; Andrew Cooper
> > <Andrew.Cooper3@citrix.com>
> > Subject: Re: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap
> > accesses to the PCI config space
> > 
> > On Fri, Apr 21, 2017 at 05:23:34PM +0100, Paul Durrant wrote:
> > > > -----Original Message-----
> > > > From: Roger Pau Monne [mailto:roger.pau@citrix.com]
> > [...]
> > > > +int xen_vpci_read(unsigned int seg, unsigned int bus, unsigned int
> > devfn,
> > > > +                  unsigned int reg, uint32_t size, uint32_t *data)
> > > > +{
> > > > +    struct domain *d = current->domain;
> > > > +    struct pci_dev *pdev;
> > > > +    const struct vpci_register *r;
> > > > +    union vpci_val val = { .double_word = 0 };
> > > > +    unsigned int data_rshift = 0, data_lshift = 0, data_size;
> > > > +    uint32_t tmp_data;
> > > > +    int rc;
> > > > +
> > > > +    ASSERT(vpci_locked(d));
> > > > +
> > > > +    *data = 0;
> > > > +
> > > > +    /* Find the PCI dev matching the address. */
> > > > +    pdev = pci_get_pdev_by_domain(d, seg, bus, devfn);
> > > > +    if ( !pdev )
> > > > +        goto passthrough;
> > >
> > > I hope this can eventually be generalised so I wonder what your intention is
> > regarding co-existence between Xen emulated PCI config space, pass-
> > through and PCI devices emulated externally. We already have a framework
> > for registering PCI devices by SBDF but this code seems to make no use of it,
> > which I suspect is likely to cause future conflict.
> > 
> > Yes, the long term aim is to use this code in order to implement
> > PCI-passthrough for PVH and HVM DomUs also.
> > 
> > TBH, I didn't know we already had such code (I assume you mean the IOREQ
> > related PCI code). As it is, I see a couple of issues with that, the first one
> > is that this code expects a ioreq client on the other end, and the code I'm
> > adding here is all inside of the hypervisor. The second issue is that the IOREQ
> > code ATM only allows for local PCI accesses, which means I should extend it
> > to
> > also deal with ECAM/MMCFG areas.
> > 
> > I completely agree that at some point this should be made to work together,
> > but
> > I'm not sure if it would be better to do that once we want to also use vPCI for
> > DomUs, so that the Dom0 side is not delayed further.
> 
> BTW, that's also an argument for forgetting about the r-b scheme for handler registration since, if this really is for dom0 only, 8 pages worth of direct map is not a lot.

It's 8 pages for each device, not 8 pages for each domain, so it doesn't matter
if it's Dom0 or DomU, each PCIe device would use 8 pages.

Roger.


* Re: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap accesses to the PCI config space
  2017-04-24 10:11         ` Roger Pau Monne
@ 2017-04-24 10:12           ` Paul Durrant
  0 siblings, 0 replies; 40+ messages in thread
From: Paul Durrant @ 2017-04-24 10:12 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Wei Liu, Andrew Cooper, Jan Beulich, Ian Jackson, xen-devel,
	boris.ostrovsky

> -----Original Message-----
> From: Roger Pau Monne
> Sent: 24 April 2017 11:12
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: xen-devel@lists.xenproject.org; konrad.wilk@oracle.com;
> boris.ostrovsky@oracle.com; Ian Jackson <Ian.Jackson@citrix.com>; Wei Liu
> <wei.liu2@citrix.com>; Jan Beulich <jbeulich@suse.com>; Andrew Cooper
> <Andrew.Cooper3@citrix.com>
> Subject: Re: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap
> accesses to the PCI config space
> 
> On Mon, Apr 24, 2017 at 10:58:04AM +0100, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Roger Pau Monne
> > > Sent: 24 April 2017 10:42
> > > To: Paul Durrant <Paul.Durrant@citrix.com>
> > > Cc: xen-devel@lists.xenproject.org; konrad.wilk@oracle.com;
> > > boris.ostrovsky@oracle.com; Ian Jackson <Ian.Jackson@citrix.com>; Wei
> Liu
> > > <wei.liu2@citrix.com>; Jan Beulich <jbeulich@suse.com>; Andrew
> Cooper
> > > <Andrew.Cooper3@citrix.com>
> > > Subject: Re: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap
> > > accesses to the PCI config space
> > >
> > > On Fri, Apr 21, 2017 at 05:23:34PM +0100, Paul Durrant wrote:
> > > > > -----Original Message-----
> > > > > From: Roger Pau Monne [mailto:roger.pau@citrix.com]
> > > [...]
> > > > > +int xen_vpci_read(unsigned int seg, unsigned int bus, unsigned int
> > > devfn,
> > > > > +                  unsigned int reg, uint32_t size, uint32_t *data)
> > > > > +{
> > > > > +    struct domain *d = current->domain;
> > > > > +    struct pci_dev *pdev;
> > > > > +    const struct vpci_register *r;
> > > > > +    union vpci_val val = { .double_word = 0 };
> > > > > +    unsigned int data_rshift = 0, data_lshift = 0, data_size;
> > > > > +    uint32_t tmp_data;
> > > > > +    int rc;
> > > > > +
> > > > > +    ASSERT(vpci_locked(d));
> > > > > +
> > > > > +    *data = 0;
> > > > > +
> > > > > +    /* Find the PCI dev matching the address. */
> > > > > +    pdev = pci_get_pdev_by_domain(d, seg, bus, devfn);
> > > > > +    if ( !pdev )
> > > > > +        goto passthrough;
> > > >
> > > > I hope this can eventually be generalised so I wonder what your
> intention is
> > > regarding co-existence between Xen emulated PCI config space, pass-
> > > through and PCI devices emulated externally. We already have a
> framework
> > > for registering PCI devices by SBDF but this code seems to make no use of
> it,
> > > which I suspect is likely to cause future conflict.
> > >
> > > Yes, the long term aim is to use this code in order to implement
> > > PCI-passthrough for PVH and HVM DomUs also.
> > >
> > > TBH, I didn't know we already had such code (I assume you mean the
> IOREQ
> > > related PCI code). As it is, I see a couple of issues with that, the first one
> > > is that this code expects a ioreq client on the other end, and the code I'm
> > > adding here is all inside of the hypervisor. The second issue is that the
> IOREQ
> > > code ATM only allows for local PCI accesses, which means I should extend
> it
> > > to
> > > also deal with ECAM/MMCFG areas.
> > >
> > > I completely agree that at some point this should be made to work
> together,
> > > but
> > > I'm not sure if it would be better to do that once we want to also use vPCI
> for
> > > DomUs, so that the Dom0 side is not delayed further.
> >
> > BTW, that's also an argument for forgetting about the r-b scheme for
> handler registration since, if this really is for dom0 only, 8 pages worth of
> direct map is not a lot.
> 
> It's 8 pages for each device, not 8 pages for each domain, so it doesn't matter
> if it's Dom0 or DomU, each PCIe device would use 8 pages.

Sorry, yes of course it is.

  Paul

> 
> Roger.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap accesses to the PCI config space
  2017-04-24 10:08         ` Roger Pau Monne
@ 2017-04-24 10:19           ` Paul Durrant
  2017-04-24 11:02             ` Roger Pau Monne
  0 siblings, 1 reply; 40+ messages in thread
From: Paul Durrant @ 2017-04-24 10:19 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Wei Liu, Andrew Cooper, Jan Beulich, Ian Jackson, xen-devel,
	boris.ostrovsky

> -----Original Message-----
> From: Roger Pau Monne
> Sent: 24 April 2017 11:09
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: xen-devel@lists.xenproject.org; konrad.wilk@oracle.com;
> boris.ostrovsky@oracle.com; Ian Jackson <Ian.Jackson@citrix.com>; Wei Liu
> <wei.liu2@citrix.com>; Jan Beulich <jbeulich@suse.com>; Andrew Cooper
> <Andrew.Cooper3@citrix.com>
> Subject: Re: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap
> accesses to the PCI config space
> 
> On Mon, Apr 24, 2017 at 10:34:15AM +0100, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Roger Pau Monne
> > > Sent: 24 April 2017 10:09
> > > To: Paul Durrant <Paul.Durrant@citrix.com>
> > > Cc: xen-devel@lists.xenproject.org; konrad.wilk@oracle.com;
> > > boris.ostrovsky@oracle.com; Ian Jackson <Ian.Jackson@citrix.com>; Wei
> Liu
> > > <wei.liu2@citrix.com>; Jan Beulich <jbeulich@suse.com>; Andrew
> Cooper
> > > <Andrew.Cooper3@citrix.com>
> > > Subject: Re: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap
> > > accesses to the PCI config space
> > >
> > > On Fri, Apr 21, 2017 at 05:07:43PM +0100, Paul Durrant wrote:
> > > > > -----Original Message-----
> > > > > From: Roger Pau Monne [mailto:roger.pau@citrix.com]
> > > > > Sent: 20 April 2017 16:18
> > > > > To: xen-devel@lists.xenproject.org
> > > > > Cc: konrad.wilk@oracle.com; boris.ostrovsky@oracle.com; Roger Pau
> > > Monne
> > > > > <roger.pau@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; Wei
> Liu
> > > > > <wei.liu2@citrix.com>; Jan Beulich <jbeulich@suse.com>; Andrew
> > > Cooper
> > > > > <Andrew.Cooper3@citrix.com>; Paul Durrant
> <Paul.Durrant@citrix.com>
> > > > > Subject: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap
> > > accesses
> > > > > to the PCI config space
> > > > >
> > > > > This functionality is going to reside in vpci.c (and the corresponding
> vpci.h
> > > > > header), and should be arch-agnostic. The handlers introduced in this
> > > patch
> > > > > setup the basic functionality required in order to trap accesses to the
> PCI
> > > > > config space, and allow decoding the address and finding the
> > > corresponding
> > > > > handler that should handle the access (although no handlers are
> > > > > implemented).
> > > > >
> > > > > Note that the traps to the PCI IO ports registers (0xcf8/0xcfc) are
> setup
> > > > > inside of a x86 HVM file, since that's not shared with other arches.
> > > > >
> > > > > A new XEN_X86_EMU_VPCI x86 domain flag is added in order to
> signal
> > > Xen
> > > > > whether
> > > > > a domain should use the newly introduced vPCI handlers, this is only
> > > enabled
> > > > > for PVH Dom0 at the moment.
> > > > >
> > > > > A very simple user-space test is also provided, so that the basic
> > > functionality
> > > > > of the vPCI traps can be asserted. This has been proven quite helpful
> > > during
> > > > > development, since the logic to handle partial accesses or accesses
> that
> > > > > expand
> > > > > across multiple registers is not trivial.
> > > > >
> > > > > The handlers for the registers are added to a red-black tree, that
> indexes
> > > > > them
> > > > > based on their offset. Since Xen needs to handle partial accesses to
> the
> > > > > registers and access that expand across multiple registers the logic in
> > > > > xen_vpci_{read/write} is kind of convoluted, I've tried to properly
> > > comment
> > > > > it
> > > > > in order to make it easier to understand.
> > > > >
> > > >
> > > > Since config space is not exactly huge, I'm wondering why you used an
> r-b
> > > tree rather than a direct map from register to handler?
> > >
> > > Hello,
> > >
> > > For local PCI the configuration space it's 256byte only, which means using
> 1/2
> > > a page (256 * 8) so that Xen can store a pointer for each possible register.
> > > The extended configuration space (ECAM) extends the space to 4K,
> which
> > > means we
> > > would use 8 pages per device (4096*8), I think that's too much.
> >
> > Ok, but I still think that adding an r-b tree implementation is just more
> complexity in the way that io handlers are registered in Xen.
> 
> But this complexity is completely hidden inside of the io handler itself that
> traps the access to 0xcf8/cfc (or ECAM areas).
> 
> Do you mean that you would like this functionality to made available to
> IOREQ
> clients also, so that they could register handlers for specific PCI registers
> without owning the full configuration space of such device?
> 
> > TBH, the whole thing needs a clean-up. We don't have proper range-based
> handler registration for port IO or MMIO at all (instead we potentially call the
> 'accept' function for every handler for every I/O). We then have (IIRC) an
> ordered list for MSI-X BAR registrations and now you're proposing an r-b
> system for PCI config space.
> 
> One way or another Xen needs to track handlers for the PCI config space,
> and
> currently this is not implemented inside of Xen.

What I mean is that we should have some form of range-based IO handler registration framework and then that can be used for port IO, MMIO and PCI config space. For external config space emulation then yes of course the external emulator needs to claim the whole space for that SBDF, but that's just a degenerate case of claiming a specific range within the SBDF.
Thus, if Xen can steer port IO, MMIO or PCI config accesses by range then we can potentially use that framework to register internal emulation handlers or a special emulation handler that sends the requests out to an ioreq server.
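
As a rough illustration of the idea (not existing Xen code; every type and function name below is hypothetical), a range-based registry could look like the following sketch. Claiming a whole config space is then just a registration that covers the full range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Every name below is hypothetical; none of this is existing Xen code. */
typedef int (*io_handler_fn)(uint64_t addr, unsigned int size, uint32_t *data);

struct io_range {
    uint64_t start;            /* first address covered (inclusive) */
    uint64_t end;              /* last address covered (inclusive) */
    io_handler_fn handler;
};

#define MAX_RANGES 16

struct io_space {
    struct io_range ranges[MAX_RANGES];
    size_t nr;
};

/* Placeholder for emulation done inside the hypervisor. */
static int internal_emulate(uint64_t addr, unsigned int size, uint32_t *data)
{
    (void)addr; (void)size;
    *data = 0;
    return 0;
}

/* Placeholder for a handler that would forward the access to an ioreq server. */
static int forward_to_ioreq(uint64_t addr, unsigned int size, uint32_t *data)
{
    (void)addr; (void)size;
    *data = ~0u;
    return 0;
}

/* Register a handler for [start, end]; overlapping registrations are refused. */
static int io_register_range(struct io_space *s, uint64_t start, uint64_t end,
                             io_handler_fn handler)
{
    size_t i;

    assert(handler != NULL);
    if ( start > end || s->nr == MAX_RANGES )
        return -1;

    for ( i = 0; i < s->nr; i++ )
        if ( start <= s->ranges[i].end && s->ranges[i].start <= end )
            return -1;

    s->ranges[s->nr].start = start;
    s->ranges[s->nr].end = end;
    s->ranges[s->nr].handler = handler;
    s->nr++;

    return 0;
}

/* Return the handler whose range fully contains the access, or NULL. */
static io_handler_fn io_find_handler(const struct io_space *s, uint64_t addr,
                                     unsigned int size)
{
    size_t i;

    assert(size > 0);
    for ( i = 0; i < s->nr; i++ )
        if ( addr >= s->ranges[i].start &&
             addr + size - 1 <= s->ranges[i].end )
            return s->ranges[i].handler;

    return NULL;
}
```

The linear scan only keeps the sketch short; an ordered structure would be the obvious choice once the number of ranges grows. Accesses that straddle two ranges return NULL here and would need an explicit policy.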

> 
> The MSI-X BAR tracking will go away once this code is also used for
> PCI-passthrough to DomUs. The msixtbl code is just extremely messy,
> because
> MSI-X interrupt handling for passthrough devices is partially handled in
> QEMU
> and partially inside of Xen.
> 
> > On top of that, there is then the rangeset based ioreq server selection that
> occurs if the I/O falls through all of this and needs sending outside Xen. There
> really has to be at least some scope for unificiation here; it's getting way too
> convoluted.
> 
> Yes, I agree that there's some room for sharing here, the address decoding
> done
> in hvm_select_ioreq_server for PCI could be reused for vPCI also, it's just
> that all this code expects a IOREQ server, and vPCI is not going to be an
> IOREQ
> server.

Indeed. Hopefully I've explained what I was thinking above.

  Paul

> 
> Roger.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap accesses to the PCI config space
  2017-04-24 10:19           ` Paul Durrant
@ 2017-04-24 11:02             ` Roger Pau Monne
  2017-04-24 11:50               ` Paul Durrant
  0 siblings, 1 reply; 40+ messages in thread
From: Roger Pau Monne @ 2017-04-24 11:02 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Wei Liu, Andrew Cooper, Jan Beulich, Ian Jackson, xen-devel,
	boris.ostrovsky

On Mon, Apr 24, 2017 at 11:19:10AM +0100, Paul Durrant wrote:
> > -----Original Message-----
> > From: Roger Pau Monne
> > Sent: 24 April 2017 11:09
> > To: Paul Durrant <Paul.Durrant@citrix.com>
> > Cc: xen-devel@lists.xenproject.org; konrad.wilk@oracle.com;
> > boris.ostrovsky@oracle.com; Ian Jackson <Ian.Jackson@citrix.com>; Wei Liu
> > <wei.liu2@citrix.com>; Jan Beulich <jbeulich@suse.com>; Andrew Cooper
> > <Andrew.Cooper3@citrix.com>
> > Subject: Re: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap
> > accesses to the PCI config space
> > 
> > On Mon, Apr 24, 2017 at 10:34:15AM +0100, Paul Durrant wrote:
> > > > -----Original Message-----
> > > > From: Roger Pau Monne
> > > > Sent: 24 April 2017 10:09
> > > > To: Paul Durrant <Paul.Durrant@citrix.com>
> > > > Cc: xen-devel@lists.xenproject.org; konrad.wilk@oracle.com;
> > > > boris.ostrovsky@oracle.com; Ian Jackson <Ian.Jackson@citrix.com>; Wei
> > Liu
> > > > <wei.liu2@citrix.com>; Jan Beulich <jbeulich@suse.com>; Andrew
> > Cooper
> > > > <Andrew.Cooper3@citrix.com>
> > > > Subject: Re: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap
> > > > accesses to the PCI config space
> > > >
> > > > On Fri, Apr 21, 2017 at 05:07:43PM +0100, Paul Durrant wrote:
> > > > > > -----Original Message-----
> > > > > > From: Roger Pau Monne [mailto:roger.pau@citrix.com]
> > > > > > Sent: 20 April 2017 16:18
> > > > > > To: xen-devel@lists.xenproject.org
> > > > > > Cc: konrad.wilk@oracle.com; boris.ostrovsky@oracle.com; Roger Pau
> > > > Monne
> > > > > > <roger.pau@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; Wei
> > Liu
> > > > > > <wei.liu2@citrix.com>; Jan Beulich <jbeulich@suse.com>; Andrew
> > > > Cooper
> > > > > > <Andrew.Cooper3@citrix.com>; Paul Durrant
> > <Paul.Durrant@citrix.com>
> > > > > > Subject: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap
> > > > accesses
> > > > > > to the PCI config space
> > > > > >
> > > > > > This functionality is going to reside in vpci.c (and the corresponding
> > vpci.h
> > > > > > header), and should be arch-agnostic. The handlers introduced in this
> > > > patch
> > > > > > setup the basic functionality required in order to trap accesses to the
> > PCI
> > > > > > config space, and allow decoding the address and finding the
> > > > corresponding
> > > > > > handler that should handle the access (although no handlers are
> > > > > > implemented).
> > > > > >
> > > > > > Note that the traps to the PCI IO ports registers (0xcf8/0xcfc) are
> > setup
> > > > > > inside of a x86 HVM file, since that's not shared with other arches.
> > > > > >
> > > > > > A new XEN_X86_EMU_VPCI x86 domain flag is added in order to
> > signal
> > > > Xen
> > > > > > whether
> > > > > > a domain should use the newly introduced vPCI handlers, this is only
> > > > enabled
> > > > > > for PVH Dom0 at the moment.
> > > > > >
> > > > > > A very simple user-space test is also provided, so that the basic
> > > > functionality
> > > > > > of the vPCI traps can be asserted. This has been proven quite helpful
> > > > during
> > > > > > development, since the logic to handle partial accesses or accesses
> > that
> > > > > > expand
> > > > > > across multiple registers is not trivial.
> > > > > >
> > > > > > The handlers for the registers are added to a red-black tree, that
> > indexes
> > > > > > them
> > > > > > based on their offset. Since Xen needs to handle partial accesses to
> > the
> > > > > > registers and access that expand across multiple registers the logic in
> > > > > > xen_vpci_{read/write} is kind of convoluted, I've tried to properly
> > > > comment
> > > > > > it
> > > > > > in order to make it easier to understand.
> > > > > >
> > > > >
> > > > > Since config space is not exactly huge, I'm wondering why you used an
> > r-b
> > > > tree rather than a direct map from register to handler?
> > > >
> > > > Hello,
> > > >
> > > > For local PCI the configuration space it's 256byte only, which means using
> > 1/2
> > > > a page (256 * 8) so that Xen can store a pointer for each possible register.
> > > > The extended configuration space (ECAM) extends the space to 4K,
> > which
> > > > means we
> > > > would use 8 pages per device (4096*8), I think that's too much.
> > >
> > > Ok, but I still think that adding an r-b tree implementation is just more
> > complexity in the way that io handlers are registered in Xen.
> > 
> > But this complexity is completely hidden inside of the io handler itself that
> > traps the access to 0xcf8/cfc (or ECAM areas).
> > 
> > Do you mean that you would like this functionality to made available to
> > IOREQ
> > clients also, so that they could register handlers for specific PCI registers
> > without owning the full configuration space of such device?
> > 
> > > TBH, the whole thing needs a clean-up. We don't have proper range-based
> > handler registration for port IO or MMIO at all (instead we potentially call the
> > 'accept' function for every handler for every I/O). We then have (IIRC) an
> > ordered list for MSI-X BAR registrations and now you're proposing an r-b
> > system for PCI config space.
> > 
> > One way or another Xen needs to track handlers for the PCI config space,
> > and
> > currently this is not implemented inside of Xen.
> 
> What I mean is that we should have some form of range-based IO handler registration framework and then that can be used for port IO, MMIO and PCI config space. For external config space emulation then yes of course the external emulated needs to claim the whole space for that SBDF, but that's just a degenerate case of claiming a specific range within the SBDF.
> Thus, if Xen can steer port IO, MMIO or PCI config accesses by range then we can potentially use that framework to register internal emulation handlers or a special emulation handler that sends the requests out to an ioreq server.

I'm not sure Xen needs PCI register based trapping granularity. I would
argue that whoever (IOREQ server or Xen internal function) wants to trap
accesses to a specific PCI config register needs to take care of all the
registers for that device.

I will look into hooking this code (vPCI) into the existing hvm_*_ioreq
functionality, so that vPCI claims the full PCI config space for each device it
manages.
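
Claiming config space per device means the trap only has to recover the bus/device/function from the access before dispatching. For the legacy port IO mechanism that is the standard 0xcf8 address decode. A minimal sketch follows; the struct and function names are illustrative and not taken from the series, and the segment is implicitly 0 for this access method.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded form of a 0xcf8 address plus the data port used (illustrative). */
struct cf8_addr {
    bool enabled;      /* bit 31: config access enabled */
    uint8_t bus;       /* bits 23:16 */
    uint8_t dev;       /* bits 15:11 */
    uint8_t func;      /* bits 10:8 */
    uint16_t reg;      /* byte offset into the 256-byte config space */
};

/*
 * Decode the value latched in 0xcf8 together with the data port that was
 * accessed (0xcfc-0xcff). The low two bits of the port select the byte
 * within the dword addressed by bits 7:2 of the latched value.
 */
static struct cf8_addr cf8_decode(uint32_t cf8, uint16_t port)
{
    struct cf8_addr a;

    assert(port >= 0xcfc && port <= 0xcff);

    a.enabled = (cf8 >> 31) & 1;
    a.bus = (cf8 >> 16) & 0xff;
    a.dev = (cf8 >> 11) & 0x1f;
    a.func = (cf8 >> 8) & 0x7;
    a.reg = (cf8 & 0xfc) + (port & 3);

    return a;
}
```

Once the device is identified this way, the whole access can be handed to whichever emulator owns that device, without any per-register lookup at this level.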

Thanks, Roger.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap accesses to the PCI config space
  2017-04-24 11:02             ` Roger Pau Monne
@ 2017-04-24 11:50               ` Paul Durrant
  2017-04-25  8:27                 ` Roger Pau Monne
  0 siblings, 1 reply; 40+ messages in thread
From: Paul Durrant @ 2017-04-24 11:50 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Wei Liu, Andrew Cooper, Jan Beulich, Ian Jackson, xen-devel,
	boris.ostrovsky

> -----Original Message-----
> From: Roger Pau Monne
> Sent: 24 April 2017 12:03
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: xen-devel@lists.xenproject.org; konrad.wilk@oracle.com;
> boris.ostrovsky@oracle.com; Ian Jackson <Ian.Jackson@citrix.com>; Wei Liu
> <wei.liu2@citrix.com>; Jan Beulich <jbeulich@suse.com>; Andrew Cooper
> <Andrew.Cooper3@citrix.com>
> Subject: Re: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap
> accesses to the PCI config space
> 
> On Mon, Apr 24, 2017 at 11:19:10AM +0100, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Roger Pau Monne
> > > Sent: 24 April 2017 11:09
> > > To: Paul Durrant <Paul.Durrant@citrix.com>
> > > Cc: xen-devel@lists.xenproject.org; konrad.wilk@oracle.com;
> > > boris.ostrovsky@oracle.com; Ian Jackson <Ian.Jackson@citrix.com>; Wei
> Liu
> > > <wei.liu2@citrix.com>; Jan Beulich <jbeulich@suse.com>; Andrew
> Cooper
> > > <Andrew.Cooper3@citrix.com>
> > > Subject: Re: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap
> > > accesses to the PCI config space
> > >
> > > On Mon, Apr 24, 2017 at 10:34:15AM +0100, Paul Durrant wrote:
> > > > > -----Original Message-----
> > > > > From: Roger Pau Monne
> > > > > Sent: 24 April 2017 10:09
> > > > > To: Paul Durrant <Paul.Durrant@citrix.com>
> > > > > Cc: xen-devel@lists.xenproject.org; konrad.wilk@oracle.com;
> > > > > boris.ostrovsky@oracle.com; Ian Jackson <Ian.Jackson@citrix.com>;
> Wei
> > > Liu
> > > > > <wei.liu2@citrix.com>; Jan Beulich <jbeulich@suse.com>; Andrew
> > > Cooper
> > > > > <Andrew.Cooper3@citrix.com>
> > > > > Subject: Re: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap
> > > > > accesses to the PCI config space
> > > > >
> > > > > On Fri, Apr 21, 2017 at 05:07:43PM +0100, Paul Durrant wrote:
> > > > > > > -----Original Message-----
> > > > > > > From: Roger Pau Monne [mailto:roger.pau@citrix.com]
> > > > > > > Sent: 20 April 2017 16:18
> > > > > > > To: xen-devel@lists.xenproject.org
> > > > > > > Cc: konrad.wilk@oracle.com; boris.ostrovsky@oracle.com; Roger
> Pau
> > > > > Monne
> > > > > > > <roger.pau@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>;
> Wei
> > > Liu
> > > > > > > <wei.liu2@citrix.com>; Jan Beulich <jbeulich@suse.com>; Andrew
> > > > > Cooper
> > > > > > > <Andrew.Cooper3@citrix.com>; Paul Durrant
> > > <Paul.Durrant@citrix.com>
> > > > > > > Subject: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap
> > > > > accesses
> > > > > > > to the PCI config space
> > > > > > >
> > > > > > > This functionality is going to reside in vpci.c (and the
> corresponding
> > > vpci.h
> > > > > > > header), and should be arch-agnostic. The handlers introduced in
> this
> > > > > patch
> > > > > > > setup the basic functionality required in order to trap accesses to
> the
> > > PCI
> > > > > > > config space, and allow decoding the address and finding the
> > > > > corresponding
> > > > > > > handler that should handle the access (although no handlers are
> > > > > > > implemented).
> > > > > > >
> > > > > > > Note that the traps to the PCI IO ports registers (0xcf8/0xcfc) are
> > > setup
> > > > > > > inside of a x86 HVM file, since that's not shared with other arches.
> > > > > > >
> > > > > > > A new XEN_X86_EMU_VPCI x86 domain flag is added in order to
> > > signal
> > > > > Xen
> > > > > > > whether
> > > > > > > a domain should use the newly introduced vPCI handlers, this is
> only
> > > > > enabled
> > > > > > > for PVH Dom0 at the moment.
> > > > > > >
> > > > > > > A very simple user-space test is also provided, so that the basic
> > > > > functionality
> > > > > > > of the vPCI traps can be asserted. This has been proven quite
> helpful
> > > > > during
> > > > > > > development, since the logic to handle partial accesses or
> accesses
> > > that
> > > > > > > expand
> > > > > > > across multiple registers is not trivial.
> > > > > > >
> > > > > > > The handlers for the registers are added to a red-black tree, that
> > > indexes
> > > > > > > them
> > > > > > > based on their offset. Since Xen needs to handle partial accesses
> to
> > > the
> > > > > > > registers and access that expand across multiple registers the logic
> in
> > > > > > > xen_vpci_{read/write} is kind of convoluted, I've tried to properly
> > > > > comment
> > > > > > > it
> > > > > > > in order to make it easier to understand.
> > > > > > >
> > > > > >
> > > > > > Since config space is not exactly huge, I'm wondering why you used
> an
> > > r-b
> > > > > tree rather than a direct map from register to handler?
> > > > >
> > > > > Hello,
> > > > >
> > > > > For local PCI the configuration space it's 256byte only, which means
> using
> > > 1/2
> > > > > a page (256 * 8) so that Xen can store a pointer for each possible
> register.
> > > > > The extended configuration space (ECAM) extends the space to 4K,
> > > which
> > > > > means we
> > > > > would use 8 pages per device (4096*8), I think that's too much.
> > > >
> > > > Ok, but I still think that adding an r-b tree implementation is just more
> > > complexity in the way that io handlers are registered in Xen.
> > >
> > > But this complexity is completely hidden inside of the io handler itself
> that
> > > traps the access to 0xcf8/cfc (or ECAM areas).
> > >
> > > Do you mean that you would like this functionality to made available to
> > > IOREQ
> > > clients also, so that they could register handlers for specific PCI registers
> > > without owning the full configuration space of such device?
> > >
> > > > TBH, the whole thing needs a clean-up. We don't have proper range-
> based
> > > handler registration for port IO or MMIO at all (instead we potentially call
> the
> > > 'accept' function for every handler for every I/O). We then have (IIRC) an
> > > ordered list for MSI-X BAR registrations and now you're proposing an r-b
> > > system for PCI config space.
> > >
> > > One way or another Xen needs to track handlers for the PCI config space,
> > > and
> > > currently this is not implemented inside of Xen.
> >
> > What I mean is that we should have some form of range-based IO handler
> registration framework and then that can be used for port IO, MMIO and PCI
> config space. For external config space emulation then yes of course the
> external emulated needs to claim the whole space for that SBDF, but that's
> just a degenerate case of claiming a specific range within the SBDF.
> > Thus, if Xen can steer port IO, MMIO or PCI config accesses by range then
> we can potentially use that framework to register internal emulation
> handlers or a special emulation handler that sends the requests out to an
> ioreq server.
> 
> IMHO I'm not sure Xen needs PCI register based trapping granularity. I would
> argue that whatever (IOREQ or Xen internal function) that wants to trap
> access
> to a specific PCI config device register needs to take care of all the
> registers for that device.
> 

Having distinct handlers for distinct groups of registers makes sense though... e.g. being able to register a BAR handler for each BAR and then maybe an MSI-X capability handler for wherever that appears in the capability chain, etc. If you don't allow such registration at the top level then it ends up getting done at the next level. That said, it may make more sense to have a top level of emulation that just handles all register reads and writes to config space and then a second level that has callbacks for BAR enumeration, bus master enable, MSI-X mask/unmask, etc.
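
A rough sketch of the two-level split, with a generic config space write at the top and a device callback underneath, might look like this. All struct and function names are hypothetical; only the command register offset (0x04) and the bus master enable bit (bit 2) are standard PCI values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_COMMAND         0x04  /* command register offset */
#define PCI_COMMAND_MASTER  0x4   /* bus master enable bit */

/* Second level: callbacks a device model may provide (hypothetical). */
struct vdev_ops {
    void (*bus_master)(void *priv, bool enable);
};

/* Top level: owns the raw config space contents (hypothetical). */
struct vdev {
    uint8_t cfg[256];
    const struct vdev_ops *ops;
    void *priv;
};

/* Example callback: records the new bus master state into an int. */
static void record_bus_master(void *priv, bool enable)
{
    *(int *)priv = enable;
}

/*
 * Generic 16-bit config write: stores the value, then notifies the second
 * level only if the bus master enable bit actually changed.
 */
static void vdev_cfg_write16(struct vdev *d, unsigned int reg, uint16_t val)
{
    uint16_t old;

    assert(reg + 2 <= sizeof(d->cfg));

    old = (uint16_t)(d->cfg[reg] | (d->cfg[reg + 1] << 8));
    d->cfg[reg] = val & 0xff;
    d->cfg[reg + 1] = val >> 8;

    if ( reg == PCI_COMMAND && ((old ^ val) & PCI_COMMAND_MASTER) &&
         d->ops && d->ops->bus_master )
        d->ops->bus_master(d->priv, val & PCI_COMMAND_MASTER);
}
```

BAR enumeration and MSI-X mask/unmask callbacks would follow the same pattern: the top level decodes the write and the second level only sees the semantic event.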

> I will look into hooking this code (vPCI) into the existing hvm_*_ioreq
> functionality, so that vPCI claims the full PCI config space for each device it
> manages.

Cool.

  Paul

> 
> Thanks, Roger.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 3/9] xen/mm: move modify_identity_mmio to global file and drop __init
  2017-04-20 15:17 ` [PATCH v2 3/9] xen/mm: move modify_identity_mmio to global file and drop __init Roger Pau Monne
@ 2017-04-24 14:42   ` Julien Grall
  2017-04-25  8:01     ` Roger Pau Monne
  0 siblings, 1 reply; 40+ messages in thread
From: Julien Grall @ 2017-04-24 14:42 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel
  Cc: Andrew Cooper, boris.ostrovsky, Stefano Stabellini, Jan Beulich

Hi Roger,

On 20/04/17 16:17, Roger Pau Monne wrote:
> And also allow it to do non-identity mappings by adding a new parameter. This
> function will be needed in other parts apart from PVH Dom0 build.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>  xen/arch/x86/hvm/dom0_build.c | 22 +---------------------
>  xen/common/memory.c           | 34 ++++++++++++++++++++++++++++++++++
>  xen/include/xen/p2m-common.h  |  4 ++++
>  3 files changed, 39 insertions(+), 21 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
> index ca88c5835e..65f606d33a 100644
> --- a/xen/arch/x86/hvm/dom0_build.c
> +++ b/xen/arch/x86/hvm/dom0_build.c
> @@ -64,27 +64,7 @@ static struct acpi_madt_nmi_source __initdata *nmisrc;
>  static int __init modify_identity_mmio(struct domain *d, unsigned long pfn,
>                                         unsigned long nr_pages, const bool map)
>  {
> -    int rc;
> -
> -    for ( ; ; )
> -    {
> -        rc = (map ? map_mmio_regions : unmap_mmio_regions)
> -             (d, _gfn(pfn), nr_pages, _mfn(pfn));
> -        if ( rc == 0 )
> -            break;
> -        if ( rc < 0 )
> -        {
> -            printk(XENLOG_WARNING
> -                   "Failed to identity %smap [%#lx,%#lx) for d%d: %d\n",
> -                   map ? "" : "un", pfn, pfn + nr_pages, d->domain_id, rc);
> -            break;
> -        }
> -        nr_pages -= rc;
> -        pfn += rc;
> -        process_pending_softirqs();
> -    }
> -
> -    return rc;
> +    return modify_mmio(d, pfn, pfn, nr_pages, map);
>  }
>
>  /* Populate a HVM memory range using the biggest possible order. */
> diff --git a/xen/common/memory.c b/xen/common/memory.c
> index 52879e7438..0d970482cb 100644
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -1438,6 +1438,40 @@ int prepare_ring_for_helper(
>      return 0;
>  }
>
> +int modify_mmio(struct domain *d, unsigned long gfn, unsigned long pfn,

Whilst you introduce this new function, please use mfn_t and gfn_t.

Also s/pfn/mfn/

> +                unsigned long nr_pages, const bool map)
> +{
> +    int rc;
> +
> +    /*
> +     * Make sure this function is only used by the hardware domain, because it
> +     * can take an arbitrary long time, and could DoS the whole system.
> +     */
> +    ASSERT(is_hardware_domain(d));

What would be the plan for guests if we decide to use vPCI?

> +
> +    for ( ; ; )
> +    {
> +        rc = (map ? map_mmio_regions : unmap_mmio_regions)

On ARM, map_mmio_regions and unmap_mmio_regions will map the MMIO with 
very strict attributes. I think we would need an extra argument to specify 
the wanted memory attribute (maybe p2m_type_t?).

> +             (d, _gfn(gfn), nr_pages, _mfn(pfn));
> +        if ( rc == 0 )
> +            break;
> +        if ( rc < 0 )
> +        {
> +            printk(XENLOG_WARNING

I would probably use XENLOG_G_WARNING.

> +                   "Failed to %smap [%#lx, %#lx) -> [%#lx,%#lx) for d%d: %d\n",
> +                   map ? "" : "un", gfn, gfn + nr_pages, pfn, pfn + nr_pages,
> +                   d->domain_id, rc);
> +            break;
> +        }
> +        nr_pages -= rc;
> +        pfn += rc;
> +        gfn += rc;
> +        process_pending_softirqs();
> +    }
> +
> +    return rc;
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/include/xen/p2m-common.h b/xen/include/xen/p2m-common.h
> index 8cd5a6b503..1308da44e7 100644
> --- a/xen/include/xen/p2m-common.h
> +++ b/xen/include/xen/p2m-common.h
> @@ -13,4 +13,8 @@ int unmap_mmio_regions(struct domain *d,
>                         unsigned long nr,
>                         mfn_t mfn);
>
> +

Spurious newline.

> +int modify_mmio(struct domain *d, unsigned long gfn, unsigned long pfn,
> +                unsigned long nr_pages, const bool map);
> +
>  #endif /* _XEN_P2M_COMMON_H */
>
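
To illustrate the mfn_t/gfn_t suggestion, here is a self-contained sketch of the same retry loop using typed frame numbers. The wrapper types and the stub mapper are simplified stand-ins written for this example, not Xen's real definitions; the only behaviour taken from the quoted patch is that a positive return value means "this many pages were handled, call again".

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for typesafe frame-number wrappers (illustrative). */
typedef struct { unsigned long g; } gfn_t;
typedef struct { unsigned long m; } mfn_t;

static gfn_t gfn_add(gfn_t gfn, unsigned long i)
{
    gfn.g += i;
    return gfn;
}

static mfn_t mfn_add(mfn_t mfn, unsigned long i)
{
    mfn.m += i;
    return mfn;
}

/* Pages handled so far by the stub, kept only so the sketch can be checked. */
static unsigned long mapped_pages;

/*
 * Stub mapper: handles at most 4 pages per call. Returns 0 when the request
 * is complete, or the number of pages handled when more work remains.
 */
static int stub_map(gfn_t gfn, unsigned long nr, mfn_t mfn)
{
    (void)gfn; (void)mfn;

    if ( nr <= 4 )
    {
        mapped_pages += nr;
        return 0;
    }

    mapped_pages += 4;
    return 4;
}

/* Retry loop from the patch, with gfn and mfn advanced through typed helpers. */
static int modify_mmio_sketch(gfn_t gfn, mfn_t mfn, unsigned long nr_pages)
{
    int rc;

    assert(nr_pages > 0);

    for ( ; ; )
    {
        rc = stub_map(gfn, nr_pages, mfn);
        if ( rc <= 0 )
            break;              /* 0: done, negative: error */
        nr_pages -= rc;
        gfn = gfn_add(gfn, rc);
        mfn = mfn_add(mfn, rc);
    }

    return rc;
}
```

With distinct types, passing a gfn where an mfn is expected becomes a compile error, which is the main point of the request.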

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 8/9] vpci/msi: add MSI handlers
  2017-04-20 15:17 ` [PATCH v2 8/9] vpci/msi: add MSI handlers Roger Pau Monne
  2017-04-21  8:38   ` Roger Pau Monne
@ 2017-04-24 15:31   ` Julien Grall
  2017-04-25 11:49     ` Roger Pau Monne
  1 sibling, 1 reply; 40+ messages in thread
From: Julien Grall @ 2017-04-24 15:31 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel
  Cc: boris.ostrovsky, Stefano Stabellini, Punit Agrawal

Hi Roger,

On 20/04/17 16:17, Roger Pau Monne wrote:
> diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
> new file mode 100644
> index 0000000000..aea6c68907
> --- /dev/null
> +++ b/xen/drivers/vpci/msi.c
> @@ -0,0 +1,469 @@
> +/*
> + * Handlers for accesses to the MSI capability structure.
> + *
> + * Copyright (C) 2017 Citrix Systems R&D
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms and conditions of the GNU General Public
> + * License, version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <xen/sched.h>
> +#include <xen/vpci.h>
> +#include <asm/msi.h>
> +#include <xen/keyhandler.h>
> +
> +static void vpci_msi_mask_pirq(int pirq, bool mask)
> +{
> +        struct pirq *pinfo = pirq_info(current->domain, pirq);

We don't have pirq on ARM and don't plan to introduce it for MSI, as 
interrupts will be handled directly by a virtual interrupt controller 
(see the vITS series [1]).

It would be nice if you could make vPCI architecture agnostic. We would 
be happy to help here.

> +        struct irq_desc *desc;
> +        unsigned long flags;
> +        int irq;
> +
> +        ASSERT(pinfo);
> +        irq = pinfo->arch.irq;
> +        ASSERT(irq < nr_irqs);
> +
> +        desc = irq_to_desc(irq);

Similarly we don't have irq_desc for MSI.

> +        ASSERT(desc);
> +
> +        spin_lock_irqsave(&desc->lock, flags);
> +        guest_mask_msi_irq(desc, mask);
> +        spin_unlock_irqrestore(&desc->lock, flags);
> +}
> +

[...]

> +static int vpci_init_msi(struct pci_dev *pdev)
> +{
> +    uint8_t seg = pdev->seg, bus = pdev->bus;
> +    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> +    struct vpci_msi *msi = NULL;
> +    unsigned int msi_offset;
> +    uint16_t control;
> +    int rc;
> +
> +    msi_offset = pci_find_cap_offset(seg, bus, slot, func, PCI_CAP_ID_MSI);
> +    if ( !msi_offset )
> +        return 0;
> +
> +    if ( !dom0_msi )

I would introduce a helper to allow a per-architecture decision. Likely
on ARM MSI will be enabled by default.

[...]

> diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
> index 0434aca706..899e37ae0f 100644
> --- a/xen/include/asm-x86/hvm/io.h
> +++ b/xen/include/asm-x86/hvm/io.h
> @@ -126,6 +126,10 @@ void hvm_dpci_eoi(struct domain *d, unsigned int guest_irq,
>  void msix_write_completion(struct vcpu *);
>  void msixtbl_init(struct domain *d);
>
> +/* Get the vector/flags from a MSI address/data fields. */
> +unsigned int msi_vector(uint16_t data);
> +unsigned int msi_flags(uint16_t data, uint64_t addr);

Should not those 2 helpers go in msi.h?

> +
>  enum stdvga_cache_state {
>      STDVGA_CACHE_UNINITIALIZED,
>      STDVGA_CACHE_ENABLED,
> diff --git a/xen/include/asm-x86/msi.h b/xen/include/asm-x86/msi.h
> index a5de6a1328..dcbec8cf04 100644
> --- a/xen/include/asm-x86/msi.h
> +++ b/xen/include/asm-x86/msi.h
> @@ -251,4 +251,6 @@ void ack_nonmaskable_msi_irq(struct irq_desc *);
>  void end_nonmaskable_msi_irq(struct irq_desc *, u8 vector);
>  void set_msi_affinity(struct irq_desc *, const cpumask_t *);
>
> +extern bool dom0_msi;
> +
>  #endif /* __ASM_MSI_H */

Cheers,

[1] 
https://lists.xenproject.org/archives/html/xen-devel/2017-04/msg01672.html

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 3/9] xen/mm: move modify_identity_mmio to global file and drop __init
  2017-04-24 14:42   ` Julien Grall
@ 2017-04-25  8:01     ` Roger Pau Monne
  2017-04-25  9:09       ` Julien Grall
  0 siblings, 1 reply; 40+ messages in thread
From: Roger Pau Monne @ 2017-04-25  8:01 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, boris.ostrovsky, Stefano Stabellini, Jan Beulich,
	Andrew Cooper

On Mon, Apr 24, 2017 at 03:42:08PM +0100, Julien Grall wrote:
> On 20/04/17 16:17, Roger Pau Monne wrote:
> >  /* Populate a HVM memory range using the biggest possible order. */
> > diff --git a/xen/common/memory.c b/xen/common/memory.c
> > index 52879e7438..0d970482cb 100644
> > --- a/xen/common/memory.c
> > +++ b/xen/common/memory.c
> > @@ -1438,6 +1438,40 @@ int prepare_ring_for_helper(
> >      return 0;
> >  }
> > 
> > +int modify_mmio(struct domain *d, unsigned long gfn, unsigned long pfn,
> 
> Whilst you introduce this new function, please use mfn_t and gfn_t.
> 
> Also s/pfn/mfn/

Done.

> > +                unsigned long nr_pages, const bool map)
> > +{
> > +    int rc;
> > +
> > +    /*
> > +     * Make sure this function is only used by the hardware domain, because it
> > +     * can take an arbitrary long time, and could DoS the whole system.
> > +     */
> > +    ASSERT(is_hardware_domain(d));
> 
> What would be the plan for guest if we decide to use vpci?

One option would be to not allow the DomU to relocate its BARs and ignore
writes to the 2nd bit of the command register (PCI_COMMAND_MEMORY), thus always
having the BARs mapped. The other is to somehow allow VMExit (and the ARM
equivalent) continuation (something similar to what we do with hypercalls).
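
As a rough illustration of the first option (not code from this series): a
DomU write to the command register could be filtered so that memory decoding
can never be switched off. The function name domu_command_write is
hypothetical; only the value of PCI_COMMAND_MEMORY (bit 1, i.e. 0x2) comes
from the PCI specification.

```c
#include <assert.h>
#include <stdint.h>

/* Memory Space Enable: bit 1 (the 2nd bit) of the PCI command register. */
#define PCI_COMMAND_MEMORY 0x2

/*
 * Hypothetical filter for DomU writes to the command register.
 * "cur" is the current register value, "guest_val" is what the guest wrote.
 * The guest value is accepted, except that the memory decoding bit is
 * forced to stay set, so the BARs always remain mapped.
 */
static uint16_t domu_command_write(uint16_t cur, uint16_t guest_val)
{
    /* Assumption: decoding was enabled when the device was assigned. */
    assert(cur & PCI_COMMAND_MEMORY);

    return guest_val | PCI_COMMAND_MEMORY;
}
```

With this, a guest that clears the bit simply has the write to that bit
ignored, and no map/unmap operation is ever triggered from a DomU.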

> > +
> > +    for ( ; ; )
> > +    {
> > +        rc = (map ? map_mmio_regions : unmap_mmio_regions)
> 
> On ARM, map_mmio_regions and unmap_mmio_regions will map the MMIO with very
> strict attribute. I think we would need an extra argument to know the wanted
> memory attribute (maybe p2m_type_t?).

I'm not sure I can do anything regarding this ATM. Sorry for my ignorance, but
map_mmio_regions on ARM maps the region as p2m_mmio_direct_dev, and according
to the comment that's "Read/write mapping of genuine Device MMIO area", which
fits exactly into my usage (I'm using it to map BARs).

> > +             (d, _gfn(gfn), nr_pages, _mfn(pfn));
> > +        if ( rc == 0 )
> > +            break;
> > +        if ( rc < 0 )
> > +        {
> > +            printk(XENLOG_WARNING
> 
> I would probably use XENLOG_G_WARNING.

Done.

> > +                   "Failed to %smap [%#lx, %#lx) -> [%#lx,%#lx) for d%d: %d\n",
> > +                   map ? "" : "un", gfn, gfn + nr_pages, pfn, pfn + nr_pages,
> > +                   d->domain_id, rc);
> > +            break;
> > +        }
> > +        nr_pages -= rc;
> > +        pfn += rc;
> > +        gfn += rc;
> > +        process_pending_softirqs();
> > +    }
> > +
> > +    return rc;
> > +}
> > +
> >  /*
> >   * Local variables:
> >   * mode: C
> > diff --git a/xen/include/xen/p2m-common.h b/xen/include/xen/p2m-common.h
> > index 8cd5a6b503..1308da44e7 100644
> > --- a/xen/include/xen/p2m-common.h
> > +++ b/xen/include/xen/p2m-common.h
> > @@ -13,4 +13,8 @@ int unmap_mmio_regions(struct domain *d,
> >                         unsigned long nr,
> >                         mfn_t mfn);
> > 
> > +
> 
> Spurious newline.

Done.

Thanks for the comments.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap accesses to the PCI config space
  2017-04-24 11:50               ` Paul Durrant
@ 2017-04-25  8:27                 ` Roger Pau Monne
  2017-04-25  8:35                   ` Paul Durrant
  0 siblings, 1 reply; 40+ messages in thread
From: Roger Pau Monne @ 2017-04-25  8:27 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Wei Liu, Andrew Cooper, Jan Beulich, Ian Jackson, xen-devel,
	boris.ostrovsky

On Mon, Apr 24, 2017 at 12:50:58PM +0100, Paul Durrant wrote:
> > -----Original Message-----
> > From: Roger Pau Monne
> > Sent: 24 April 2017 12:03
> > To: Paul Durrant <Paul.Durrant@citrix.com>
> > Cc: xen-devel@lists.xenproject.org; konrad.wilk@oracle.com;
> > boris.ostrovsky@oracle.com; Ian Jackson <Ian.Jackson@citrix.com>; Wei Liu
> > <wei.liu2@citrix.com>; Jan Beulich <jbeulich@suse.com>; Andrew Cooper
> > <Andrew.Cooper3@citrix.com>
> > Subject: Re: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap
> > accesses to the PCI config space
> > IMHO I'm not sure Xen needs PCI register based trapping granularity. I would
> > argue that whatever (IOREQ or Xen internal function) that wants to trap
> > access
> > to a specific PCI config device register needs to take care of all the
> > registers for that device.
> > 
> 
> Having distinct handers for distinct groups of makes sense though... e.g. being able to register a BAR handler for each BAR and then maybe an MSI-X capability handler for wherever that appears in the capability chain, etc. If you don't allow such registration at the top level then it ends up getting done at the next level.

Yes, that's what's done here. Handlers for specific registers are added at the
next level (vPCI). See patches 5, 6, 8 or 9 for examples.

> That said, it may make more sense to have a top level of emulation that just handles all register reads and writes to config space and then a second level that has callbacks for BAR enumeration, bus master enable, MSI-X mask/unmask, etc.
> 
> > I will look into hooking this code (vPCI) into the existing hvm_*_ioreq
> > functionality, so that vPCI claims the full PCI config space for each device it
> > manages.
> 
> Cool.

I've been looking into this, and I have to say this whole emulation handling is
a mess. The fact that Xen differentiates between internal and external (IOREQ)
handlers so early in the code (hvmemul_do_io) makes it far from trivial to
unify internal and external handlers, all the more so because external
handlers have grown a complex set of infrastructure that internal handlers
don't have at all.

Ideally I think the IOREQ filtering code should be generalized to apply to both
internal and external handlers, and the difference between external and
internal handlers should just be the set of functions that they use. External
ones would always use generic IOREQ functions for pushing requests to the
external emulators, while internal ones would just implement their own
functions.
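
To make the idea concrete, here is a minimal sketch of what such a unified
interface could look like. None of these names exist in Xen; io_handler,
io_handler_ops, dispatch_read and the two example handlers are all
hypothetical, and the "external" handler only records the request instead of
really forwarding it to an emulator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical set of functions every handler provides. */
struct io_handler_ops {
    int (*read)(void *priv, uint64_t addr, unsigned int size, uint64_t *val);
};

/* A handler claims [start, end) and supplies its own ops. */
struct io_handler {
    uint64_t start, end;
    const struct io_handler_ops *ops;
    void *priv;
};

/* Stand-in for a request queued to an external emulator. */
struct pending_req {
    uint64_t addr;
    unsigned int size;
    int queued;
};

/* Internal handler: emulated in place, here returning all ones. */
static int internal_read(void *priv, uint64_t addr, unsigned int size,
                         uint64_t *val)
{
    (void)priv;
    (void)addr;
    assert(size >= 1 && size <= 8);
    *val = size == 8 ? ~UINT64_C(0) : (UINT64_C(1) << (size * 8)) - 1;
    return 0;                       /* completed */
}

/* External handler: generic function that only queues the request. */
static int external_read(void *priv, uint64_t addr, unsigned int size,
                         uint64_t *val)
{
    struct pending_req *req = priv;

    (void)val;                      /* filled in later by the emulator */
    req->addr = addr;
    req->size = size;
    req->queued = 1;
    return 1;                       /* pending */
}

/* Common filtering code, shared by both kinds of handler. */
static int dispatch_read(const struct io_handler *h, size_t nr,
                         uint64_t addr, unsigned int size, uint64_t *val)
{
    for ( size_t i = 0; i < nr; i++ )
        if ( addr >= h[i].start && addr < h[i].end )
            return h[i].ops->read(h[i].priv, addr, size, val);

    return -1;                      /* nobody claims this address */
}
```

The only difference between the two kinds of handler is then the ops table
they register; the range lookup is the same for both.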

That said, I think this is a non-trivial amount of work that will further
delay this series. I don't see an easy way to integrate this code with the
current IOREQ code at all. I'm willing to do this, but I would rather have this
series merged first, so that other people can start working on PVH Dom0.

ATM, the only thing I can see that could be easily shared between the IOREQ
code and vPCI is the PCI address decoding code.
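
For reference, the kind of decoding that could be shared looks roughly like
this. It is a sketch of the standard x86 port IO mechanism (address latched
at 0xCF8, data accessed at 0xCFC-0xCFF), not the helper used by either the
IOREQ code or this series; struct pci_addr and decode_cf8 are hypothetical
names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded location of a config space access. */
struct pci_addr {
    uint8_t bus, dev, func;
    uint16_t reg;
};

/*
 * Decode the value latched at port 0xCF8 plus the byte offset of the
 * access inside the 0xCFC-0xCFF data window.
 *
 * Layout of the 0xCF8 value:
 *   bit  31     enable
 *   bits 23:16  bus
 *   bits 15:11  device
 *   bits 10:8   function
 *   bits  7:2   register (dword aligned)
 */
static bool decode_cf8(uint32_t cf8, unsigned int port_off,
                       struct pci_addr *a)
{
    assert(port_off < 4);

    if ( !(cf8 & UINT32_C(0x80000000)) )
        return false;               /* config cycle not enabled */

    a->bus  = (cf8 >> 16) & 0xff;
    a->dev  = (cf8 >> 11) & 0x1f;
    a->func = (cf8 >> 8) & 0x7;
    a->reg  = (cf8 & 0xfc) + port_off;

    return true;
}
```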

Thanks, Roger.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap accesses to the PCI config space
  2017-04-25  8:27                 ` Roger Pau Monne
@ 2017-04-25  8:35                   ` Paul Durrant
  0 siblings, 0 replies; 40+ messages in thread
From: Paul Durrant @ 2017-04-25  8:35 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Wei Liu, Andrew Cooper, Jan Beulich, Ian Jackson, xen-devel,
	boris.ostrovsky

> -----Original Message-----
> From: Roger Pau Monne
> Sent: 25 April 2017 09:27
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: xen-devel@lists.xenproject.org; konrad.wilk@oracle.com;
> boris.ostrovsky@oracle.com; Ian Jackson <Ian.Jackson@citrix.com>; Wei Liu
> <wei.liu2@citrix.com>; Jan Beulich <jbeulich@suse.com>; Andrew Cooper
> <Andrew.Cooper3@citrix.com>
> Subject: Re: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap
> accesses to the PCI config space
> 
> On Mon, Apr 24, 2017 at 12:50:58PM +0100, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Roger Pau Monne
> > > Sent: 24 April 2017 12:03
> > > To: Paul Durrant <Paul.Durrant@citrix.com>
> > > Cc: xen-devel@lists.xenproject.org; konrad.wilk@oracle.com;
> > > boris.ostrovsky@oracle.com; Ian Jackson <Ian.Jackson@citrix.com>; Wei
> Liu
> > > <wei.liu2@citrix.com>; Jan Beulich <jbeulich@suse.com>; Andrew
> Cooper
> > > <Andrew.Cooper3@citrix.com>
> > > Subject: Re: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap
> > > accesses to the PCI config space
> > > IMHO I'm not sure Xen needs PCI register based trapping granularity. I
> would
> > > argue that whatever (IOREQ or Xen internal function) that wants to trap
> > > access
> > > to a specific PCI config device register needs to take care of all the
> > > registers for that device.
> > >
> >
> > Having distinct handers for distinct groups of makes sense though... e.g.
> being able to register a BAR handler for each BAR and then maybe an MSI-X
> capability handler for wherever that appears in the capability chain, etc. If
> you don't allow such registration at the top level then it ends up getting done
> at the next level.
> 
> Yes, that's what's done here. Handlers for specific registers are added at the
> next level (vPCI). See patches 5, 6, 8 or 9 for examples.
> 
> > That said, it may make more sense to have a top level of emulation that
> just handles all register reads and writes to config space and then a second
> level that has callbacks for BAR enumeration, bus master enable, MSI-X
> mask/unmask, etc.
> >
> > > I will look into hooking this code (vPCI) into the existing hvm_*_ioreq
> > > functionality, so that vPCI claims the full PCI config space for each device
> it
> > > manages.
> >
> > Cool.
> 
> I've been looking into this, and I have to say this whole emulation handling is
> a mess.

Too right. It's pretty horrible.

>The fact that Xen differentiates between internal and external
> (IOREQ)
> handlers so early in the code (hvmemul_do_io) makes it far from trivial to
> unify internal and external handlers, the more that external handlers have
> grown a complex set of infrastructure that internal handlers don't have at
> all.
> 

Indeed. Arguably that's because the external emulation is asynchronous and therefore requires more infrastructure but I think a lot of the abstraction is the wrong way round.

> Ideally I think the IOREQ filtering code should be generalized to apply to both
> internal and external handlers, and the difference between external and
> internal handlers should just be the set of functions that they use.

Exactly.

> External
> ones would always use generic IOREQ functions for pushing requests to the
> external emulators, while internal ones would just implement their own
> functions.
> 

Yep. We are definitely on the same wavelength :-)

> That said, I think this is a non-trivial amount of work, that will further
> delay this series. I don't see an easy way to integrate this code with the
> current IOREQ code at all. I'm willing to do this, but I would rather have this
> series merged first, so that other people can start working on PVH Dom0.
> 

Fair enough. If you've looked and come to that conclusion then I trust your judgement.

> ATM, the only think I can see that could be easily shared between the IOREQ
> code and vPCI is the PCI address decoding code.
> 

Yes, maybe some utility functions/macros can be generalized. It's not much, but it's a start. Once 4.9 is out of the door I think there should be an I/O emulation cleanup/rationalization item for 4.10 which of course I'm happy to help with.

Cheers,

  Paul

> Thanks, Roger.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 3/9] xen/mm: move modify_identity_mmio to global file and drop __init
  2017-04-25  8:01     ` Roger Pau Monne
@ 2017-04-25  9:09       ` Julien Grall
  2017-04-25  9:25         ` Roger Pau Monne
  0 siblings, 1 reply; 40+ messages in thread
From: Julien Grall @ 2017-04-25  9:09 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Andrew Cooper, Jan Beulich, xen-devel,
	boris.ostrovsky, nd

Hi Roger,

On 25/04/2017 09:01, Roger Pau Monne wrote:
> On Mon, Apr 24, 2017 at 03:42:08PM +0100, Julien Grall wrote:
>> On 20/04/17 16:17, Roger Pau Monne wrote:
>>>  /* Populate a HVM memory range using the biggest possible order. */
>>> diff --git a/xen/common/memory.c b/xen/common/memory.c
>>> index 52879e7438..0d970482cb 100644
>>> --- a/xen/common/memory.c
>>> +++ b/xen/common/memory.c
>>> @@ -1438,6 +1438,40 @@ int prepare_ring_for_helper(
>>>      return 0;
>>>  }
>>>
>>> +int modify_mmio(struct domain *d, unsigned long gfn, unsigned long pfn,
>>
>> Whilst you introduce this new function, please use mfn_t and gfn_t.
>>
>> Also s/pfn/mfn/
>
> Done.
>
>>> +                unsigned long nr_pages, const bool map)
>>> +{
>>> +    int rc;
>>> +
>>> +    /*
>>> +     * Make sure this function is only used by the hardware domain, because it
>>> +     * can take an arbitrary long time, and could DoS the whole system.
>>> +     */
>>> +    ASSERT(is_hardware_domain(d));
>>
>> What would be the plan for guest if we decide to use vpci?
>
> One option would be to not allow the DomU to relocate it's BARs and ignore
> writes to the 2nd bit of the command register (PCI_COMMAND_MEMORY), thus always
> having the BARs mapped. The other is to somehow allow VMExit (and the ARM
> equivalent) continuation (something similar to what we do with hypercalls).

My understanding is BARs may be allocated by the kernel because the 
firmware didn't do it. This is the current case on ARM (and I guess x86) 
where Linux will always go through the BARs.

So if you do the first option, who would decide the position of the BARs?

For the second option, we can take advantage of superpage (4K, 2M, 1G) 
mappings on ARM, so the number of actual mappings would be really limited.
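
To give a feeling for the numbers, the sketch below counts how many
individual mappings a range needs when the largest suitable mapping size is
picked at each step. It assumes a 4K granule (so 2M and 1G correspond to
orders 9 and 18) and is not the p2m code; count_mappings is a hypothetical
helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mapping orders for a 4K granule: 1G, 2M and 4K, in units of 4K pages. */
static const unsigned int orders[] = { 18, 9, 0 };

/* Count the mappings needed for nr pages starting at gfn -> mfn. */
static unsigned long count_mappings(uint64_t gfn, uint64_t mfn, uint64_t nr)
{
    unsigned long count = 0;

    while ( nr )
    {
        size_t i;

        /* Largest order that both frames are aligned to and that fits. */
        for ( i = 0; i < sizeof(orders) / sizeof(orders[0]); i++ )
        {
            uint64_t pages = UINT64_C(1) << orders[i];

            if ( !((gfn | mfn) & (pages - 1)) && nr >= pages )
                break;
        }
        /* Order 0 always matches, so the search cannot fall through. */
        assert(i < sizeof(orders) / sizeof(orders[0]));

        gfn += UINT64_C(1) << orders[i];
        mfn += UINT64_C(1) << orders[i];
        nr  -= UINT64_C(1) << orders[i];
        count++;
    }

    return count;
}
```

For example, a 256MB BAR (0x10000 pages) that is 2M aligned on both sides
needs 128 mappings with this scheme, against 65536 if only 4K pages were
used.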

Also, we are looking at MMIO continuation for ARM for other parts of the
hypervisor. We might be able to leverage that for this function.

>
>>> +
>>> +    for ( ; ; )
>>> +    {
>>> +        rc = (map ? map_mmio_regions : unmap_mmio_regions)
>>
>> On ARM, map_mmio_regions and unmap_mmio_regions will map the MMIO with very
>> strict attribute. I think we would need an extra argument to know the wanted
>> memory attribute (maybe p2m_type_t?).
>
> I'm not sure I can do anything regarding this ATM. Sorry for my ignorance, but
> map_mmio_regions on ARM maps the region as p2m_mmio_direct_dev, and according
> to the comment that's "Read/write mapping of genuine Device MMIO area", which
> fits exactly into my usage (I'm using it to map BARs).

We have a few p2m_mmio_direct_* p2m types because the architecture allows
us to have fine-grained memory attributes.

The p2m type p2m_mmio_direct_dev is very restrictive (unaligned accesses
forbidden, non-cacheable, non-gatherable). It should be used for MMIO
regions that have side effects, and it will affect performance.

We use this one by default as it will restrict the memory attributes used
by the guest. However, this will be an issue for at least cacheable
BARs. We had a similar issue recently on ARM with an SRAM device, as the
driver may do unaligned and cacheable accesses.

For DOM0 we are using p2m_mmio_direct_c and rely on the OS to restrict
the memory attributes when necessary. We cannot do that for guests as this
may have some security implications.

So for guests we will decide on a case-by-case basis. For instance, when
you map a BAR, you know the kind and can decide on a proper memory attribute.

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 3/9] xen/mm: move modify_identity_mmio to global file and drop __init
  2017-04-25  9:09       ` Julien Grall
@ 2017-04-25  9:25         ` Roger Pau Monne
  2017-04-25  9:32           ` Jan Beulich
  0 siblings, 1 reply; 40+ messages in thread
From: Roger Pau Monne @ 2017-04-25  9:25 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Andrew Cooper, Jan Beulich, xen-devel,
	boris.ostrovsky, nd

On Tue, Apr 25, 2017 at 10:09:34AM +0100, Julien Grall wrote:
> On 25/04/2017 09:01, Roger Pau Monne wrote:
> > On Mon, Apr 24, 2017 at 03:42:08PM +0100, Julien Grall wrote:
> > > On 20/04/17 16:17, Roger Pau Monne wrote:
> > > > +                unsigned long nr_pages, const bool map)
> > > > +{
> > > > +    int rc;
> > > > +
> > > > +    /*
> > > > +     * Make sure this function is only used by the hardware domain, because it
> > > > +     * can take an arbitrary long time, and could DoS the whole system.
> > > > +     */
> > > > +    ASSERT(is_hardware_domain(d));
> > > 
> > > What would be the plan for guest if we decide to use vpci?
> > 
> > One option would be to not allow the DomU to relocate it's BARs and ignore
> > writes to the 2nd bit of the command register (PCI_COMMAND_MEMORY), thus always
> > having the BARs mapped. The other is to somehow allow VMExit (and the ARM
> > equivalent) continuation (something similar to what we do with hypercalls).
> 
> My understanding is BARs may be allocated by the kernel because the firmware
> didn't do it. This is the current case on ARM (and I guess x86) where Linux
> will always go through the BARs.

No, on x86 BARs are allocated by the firmware. Linux or whatever OS will scan
the BARs in order to get their position/size, but will not try to move them
AFAIK.

> So if you do the first option, who would decide the position of the BARs?

On x86 that would be what the firmware has set.

> For the second option, we can take advantage of superpage (4K, 2M, 1G)
> mapping on ARM, so the number of actual mapping would be really limited.

IIRC x86 should also do MMIO mappings with superpages. Maybe we should time how
long this takes and then make a decision.

> Also, we are looking at MMIO continuation for ARM for other part of the
> hypervisor. We might be able to leverage that for this function.

Indeed

> > 
> > > > +
> > > > +    for ( ; ; )
> > > > +    {
> > > > +        rc = (map ? map_mmio_regions : unmap_mmio_regions)
> > > 
> > > On ARM, map_mmio_regions and unmap_mmio_regions will map the MMIO with very
> > > strict attribute. I think we would need an extra argument to know the wanted
> > > memory attribute (maybe p2m_type_t?).
> > 
> > I'm not sure I can do anything regarding this ATM. Sorry for my ignorance, but
> > map_mmio_regions on ARM maps the region as p2m_mmio_direct_dev, and according
> > to the comment that's "Read/write mapping of genuine Device MMIO area", which
> > fits exactly into my usage (I'm using it to map BARs).
> 
> We have few p2m_mmio_direct_* p2m_type because the architecture allows us to
> have fine grain memory attribute.
> 
> The p2m type p2m_mmio_direct_dev is very restrictive (unaligned access
> forbid, non-cacheable, non-gatherable). This should be used for MMIO region
> that have side-effects and will affect performances.
> 
> We use this one by default as it will restrict the memory attribute used by
> the guest. However, this will be an issue for at least cacheable BARs. We
> had similar issue recently on ARM with SRAM device as driver may do
> unaligned access and cacheable one.
> 
> For DOM0 we are using p2m_mmio_direct_c and rely on the OS to restrict the
> memory attribute when necessary. We cannot do that for guest as this may
> have some security implications.
> 
> So for the guest we will do on the case by case basis. For instance we you
> map BAR, you know the kind and can decide of a proper memory attribute.

Oh, OK. I guess you can add a new parameter to pass whether the BAR is
prefetchable or not.
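
Something along these lines, as a sketch only: the prefetchable bit of a
memory BAR selects between a strict and a relaxed attribute. The enum
mmio_attr and both helpers are hypothetical, and which ARM p2m type each
value would translate to is deliberately left open. The two BAR bit
definitions are the standard ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard bits in the low part of a Base Address Register. */
#define PCI_BASE_ADDRESS_SPACE_IO     0x01  /* set for IO BARs */
#define PCI_BASE_ADDRESS_MEM_PREFETCH 0x08  /* memory BAR is prefetchable */

/* Hypothetical attribute passed down to the mapping function. */
enum mmio_attr {
    MMIO_ATTR_STRICT,   /* side effects possible: strict device mapping */
    MMIO_ATTR_RELAXED,  /* prefetchable: a more relaxed mapping is OK */
};

static bool bar_is_prefetchable(uint32_t bar)
{
    /* Only memory BARs have a prefetchable bit. */
    assert(!(bar & PCI_BASE_ADDRESS_SPACE_IO));

    return (bar & PCI_BASE_ADDRESS_MEM_PREFETCH) != 0;
}

static enum mmio_attr bar_mmio_attr(uint32_t bar)
{
    return bar_is_prefetchable(bar) ? MMIO_ATTR_RELAXED : MMIO_ATTR_STRICT;
}
```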

Roger.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 3/9] xen/mm: move modify_identity_mmio to global file and drop __init
  2017-04-25  9:25         ` Roger Pau Monne
@ 2017-04-25  9:32           ` Jan Beulich
  2017-04-26  8:26             ` Roger Pau Monne
  2017-04-27  8:58             ` Roger Pau Monne
  0 siblings, 2 replies; 40+ messages in thread
From: Jan Beulich @ 2017-04-25  9:32 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Andrew Cooper, Julien Grall, xen-devel,
	boris.ostrovsky, nd

>>> On 25.04.17 at 11:25, <roger.pau@citrix.com> wrote:
> On Tue, Apr 25, 2017 at 10:09:34AM +0100, Julien Grall wrote:
>> My understanding is BARs may be allocated by the kernel because the firmware
>> didn't do it. This is the current case on ARM (and I guess x86) where Linux
>> will always go through the BARs.
> 
> No, on x86 BARs are allocated by the firmware. Linux or whatever OS will scan
> the BARs in order to get it's position/size, but will not try to move them
> AFAIK.

That depends. Firmware is not required to set up all of them (only
those on devices needed for booting obviously need to be set up).
And Linux may (voluntarily or forced via command line option) still
move BARs.

Jan


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 8/9] vpci/msi: add MSI handlers
  2017-04-24 15:31   ` Julien Grall
@ 2017-04-25 11:49     ` Roger Pau Monne
  2017-04-25 12:00       ` Julien Grall
  0 siblings, 1 reply; 40+ messages in thread
From: Roger Pau Monne @ 2017-04-25 11:49 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, boris.ostrovsky, Stefano Stabellini, Punit Agrawal

On Mon, Apr 24, 2017 at 04:31:57PM +0100, Julien Grall wrote:
> Hi Roger,
> 
> On 20/04/17 16:17, Roger Pau Monne wrote:
> > diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
> > new file mode 100644
> > index 0000000000..aea6c68907
> > --- /dev/null
> > +++ b/xen/drivers/vpci/msi.c
> > @@ -0,0 +1,469 @@
> > +/*
> > + * Handlers for accesses to the MSI capability structure.
> > + *
> > + * Copyright (C) 2017 Citrix Systems R&D
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms and conditions of the GNU General Public
> > + * License, version 2, as published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public
> > + * License along with this program; If not, see <http://www.gnu.org/licenses/>.
> > + */
> > +
> > +#include <xen/sched.h>
> > +#include <xen/vpci.h>
> > +#include <asm/msi.h>
> > +#include <xen/keyhandler.h>
> > +
> > +static void vpci_msi_mask_pirq(int pirq, bool mask)
> > +{
> > +        struct pirq *pinfo = pirq_info(current->domain, pirq);
> 
> We don't have pirq on ARM and don't plan to introduce it for MSI as
> interrupt will be handled directly by a virtual interrupt controller (see
> the vITS series [1]).
> 
> It would be nice if you can get the vPCI architecture agnostic. We would be
> to help here.
>
> > +        struct irq_desc *desc;
> > +        unsigned long flags;
> > +        int irq;
> > +
> > +        ASSERT(pinfo);
> > +        irq = pinfo->arch.irq;
> > +        ASSERT(irq < nr_irqs);
> > +
> > +        desc = irq_to_desc(irq);
> 
> Similarly we don't have irq_desc for MSI.

OK, I've moved all the arch-specific functions into vmsi.c, and introduced a
vpci_arch_msi struct in order to store the PIRQ on x86.

> > +        ASSERT(desc);
> > +
> > +        spin_lock_irqsave(&desc->lock, flags);
> > +        guest_mask_msi_irq(desc, mask);
> > +        spin_unlock_irqrestore(&desc->lock, flags);
> > +}
> > +
> 
> [...]
> 
> > +static int vpci_init_msi(struct pci_dev *pdev)
> > +{
> > +    uint8_t seg = pdev->seg, bus = pdev->bus;
> > +    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> > +    struct vpci_msi *msi = NULL;
> > +    unsigned int msi_offset;
> > +    uint16_t control;
> > +    int rc;
> > +
> > +    msi_offset = pci_find_cap_offset(seg, bus, slot, func, PCI_CAP_ID_MSI);
> > +    if ( !msi_offset )
> > +        return 0;
> > +
> > +    if ( !dom0_msi )
> 
> I would introduce an helper to allow per-architecture decision. Likely on
> ARM MSI will be enabled by default.

dom0_msi is also enabled by default on x86.

> > diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
> > index 0434aca706..899e37ae0f 100644
> > --- a/xen/include/asm-x86/hvm/io.h
> > +++ b/xen/include/asm-x86/hvm/io.h
> > @@ -126,6 +126,10 @@ void hvm_dpci_eoi(struct domain *d, unsigned int guest_irq,
> >  void msix_write_completion(struct vcpu *);
> >  void msixtbl_init(struct domain *d);
> > 
> > +/* Get the vector/flags from a MSI address/data fields. */
> > +unsigned int msi_vector(uint16_t data);
> > +unsigned int msi_flags(uint16_t data, uint64_t addr);
> 
> Should not those 2 helpers go in msi.h?

The other guest-related msi functions are in io.h, msi.h seems to only contain
functions that deal with the hardware itself (although I could be wrong).

Thanks, Roger.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 8/9] vpci/msi: add MSI handlers
  2017-04-25 11:49     ` Roger Pau Monne
@ 2017-04-25 12:00       ` Julien Grall
  2017-04-25 13:19         ` Roger Pau Monne
  0 siblings, 1 reply; 40+ messages in thread
From: Julien Grall @ 2017-04-25 12:00 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: xen-devel, boris.ostrovsky, Stefano Stabellini, Punit Agrawal

Hi Roger,

On 25/04/17 12:49, Roger Pau Monne wrote:
> On Mon, Apr 24, 2017 at 04:31:57PM +0100, Julien Grall wrote:
>>> +static int vpci_init_msi(struct pci_dev *pdev)
>>> +{
>>> +    uint8_t seg = pdev->seg, bus = pdev->bus;
>>> +    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
>>> +    struct vpci_msi *msi = NULL;
>>> +    unsigned int msi_offset;
>>> +    uint16_t control;
>>> +    int rc;
>>> +
>>> +    msi_offset = pci_find_cap_offset(seg, bus, slot, func, PCI_CAP_ID_MSI);
>>> +    if ( !msi_offset )
>>> +        return 0;
>>> +
>>> +    if ( !dom0_msi )
>>
>> I would introduce an helper to allow per-architecture decision. Likely on
>> ARM MSI will be enabled by default.
>
> dom0_msi is also enabled by default on x86.

Sorry, by default I meant that they will never be disabled on ARM. So you
could introduce a helper similar to is_domain_direct_mapped to avoid the
introduction of dom0_msi for ARM.


-- 
Julien Grall

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 8/9] vpci/msi: add MSI handlers
  2017-04-25 12:00       ` Julien Grall
@ 2017-04-25 13:19         ` Roger Pau Monne
  0 siblings, 0 replies; 40+ messages in thread
From: Roger Pau Monne @ 2017-04-25 13:19 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, boris.ostrovsky, Stefano Stabellini, Punit Agrawal

On Tue, Apr 25, 2017 at 01:00:06PM +0100, Julien Grall wrote:
> Hi Roger,
> 
> On 25/04/17 12:49, Roger Pau Monne wrote:
> > On Mon, Apr 24, 2017 at 04:31:57PM +0100, Julien Grall wrote:
> > > > +static int vpci_init_msi(struct pci_dev *pdev)
> > > > +{
> > > > +    uint8_t seg = pdev->seg, bus = pdev->bus;
> > > > +    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> > > > +    struct vpci_msi *msi = NULL;
> > > > +    unsigned int msi_offset;
> > > > +    uint16_t control;
> > > > +    int rc;
> > > > +
> > > > +    msi_offset = pci_find_cap_offset(seg, bus, slot, func, PCI_CAP_ID_MSI);
> > > > +    if ( !msi_offset )
> > > > +        return 0;
> > > > +
> > > > +    if ( !dom0_msi )
> > > 
> > > I would introduce an helper to allow per-architecture decision. Likely on
> > > ARM MSI will be enabled by default.
> > 
> > dom0_msi is also enabled by default on x86.
> 
> Sorry by default I meant that they will never be disabled on ARM. So you
> could introduce a helper similar to is_domain_direct_mapped avoid the
> introduction of dom0_msi for ARM.

OK, no problem. I've added two vpci_msi{x}_enabled macros that you can replace
with 'true' if you wish.
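
Roughly like the sketch below. This is an illustration of the idea rather
than the actual code: SKETCH_ARCH_X86, enum msi_action and msi_cap_action
are hypothetical, and since the quoted diff elides what happens when the
flag is clear, MSI_HIDE is only a placeholder for that path.

```c
#include <assert.h>
#include <stdbool.h>

#ifdef SKETCH_ARCH_X86
/* x86: follow a command line controlled flag (dom0_msi in the patch). */
static bool dom0_msi = true;
#define vpci_msi_enabled dom0_msi
#else
/* Architectures where MSI is never disabled: a constant is enough. */
#define vpci_msi_enabled true
#endif

/* Hypothetical outcome of looking at a device's MSI capability. */
enum msi_action {
    MSI_NONE,     /* device has no MSI capability */
    MSI_HIDE,     /* placeholder: capability present but MSI disabled */
    MSI_EMULATE,  /* install the MSI handlers */
};

static enum msi_action msi_cap_action(unsigned int msi_offset)
{
    /* Capabilities live in the first 256 bytes of config space. */
    assert(msi_offset < 0x100);

    if ( !msi_offset )
        return MSI_NONE;

    return vpci_msi_enabled ? MSI_EMULATE : MSI_HIDE;
}
```

Common code only ever tests vpci_msi_enabled, so an architecture without
the command line option does not need a dom0_msi variable at all.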

Roger.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 3/9] xen/mm: move modify_identity_mmio to global file and drop __init
  2017-04-25  9:32           ` Jan Beulich
@ 2017-04-26  8:26             ` Roger Pau Monne
  2017-04-26  8:51               ` Jan Beulich
  2017-04-27  8:58             ` Roger Pau Monne
  1 sibling, 1 reply; 40+ messages in thread
From: Roger Pau Monne @ 2017-04-26  8:26 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Andrew Cooper, Julien Grall, xen-devel,
	boris.ostrovsky, nd

On Tue, Apr 25, 2017 at 03:32:50AM -0600, Jan Beulich wrote:
> >>> On 25.04.17 at 11:25, <roger.pau@citrix.com> wrote:
> > On Tue, Apr 25, 2017 at 10:09:34AM +0100, Julien Grall wrote:
> >> My understanding is BARs may be allocated by the kernel because the firmware
> >> didn't do it. This is the current case on ARM (and I guess x86) where Linux
> >> will always go through the BARs.
> > 
> > No, on x86 BARs are allocated by the firmware. Linux or whatever OS will scan
> > the BARs in order to get their position/size, but will not try to move them
> > AFAIK.
> 
> That depends. Firmware is not required to set up all of them (only
> such on devices needed for booting obviously need to be set up).
> And Linux may (voluntarily or forced via command line option) still
> move BARs.

Right. In this series I allow the guest to change the position where the BARs
are mapped into the guest p2m, but I don't allow it to change the physical
address where the BAR is actually mapped. This might work well if all BARs
were positioned by the firmware, but if there are unset BARs I don't think Xen
is capable of making a good decision about their position (because it lacks
information like ACPI OperationRegions), so I guess I should allow Dom0 to
write directly to the BAR and position it.
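
As a rough sketch of the split described above (guest-visible position may
move, physical position stays fixed), the bookkeeping could look like the
following. The structure, its field names and bar_guest_write() are
hypothetical and are not taken from the series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-BAR state; not the structure used in the series. */
struct bar_sketch {
    uint64_t host_addr;   /* where the hardware decodes the BAR; left alone */
    uint64_t guest_addr;  /* where the BAR shows up in the guest p2m */
    uint64_t size;        /* BAR sizes are always a power of two */
};

/* Handle a guest write of a new memory BAR address: only the guest view moves. */
void bar_guest_write(struct bar_sketch *bar, uint64_t val)
{
    /* Reject a zero or non-power-of-two size, which would corrupt the mask. */
    assert(bar->size && !(bar->size & (bar->size - 1)));

    /* Bits below the BAR size are not writable address bits, so drop them. */
    bar->guest_addr = val & ~(bar->size - 1);

    /*
     * A real implementation would now remove the old p2m mapping and map
     * [guest_addr, guest_addr + size) onto host_addr.
     */
}
```

Letting Dom0 position an unset BAR would additionally require forwarding the
write to the device and updating host_addr, which this sketch leaves out.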

BTW, how does Xen deal with MSI-X tables inside of BARs? AFAICT a PV Dom0 is
able to move the BARs around as much as it wants.

Roger.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 3/9] xen/mm: move modify_identity_mmio to global file and drop __init
  2017-04-26  8:26             ` Roger Pau Monne
@ 2017-04-26  8:51               ` Jan Beulich
  0 siblings, 0 replies; 40+ messages in thread
From: Jan Beulich @ 2017-04-26  8:51 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Andrew Cooper, Julien Grall, xen-devel,
	boris.ostrovsky, nd

>>> On 26.04.17 at 10:26, <roger.pau@citrix.com> wrote:
> BTW, how does Xen deal with MSI-X tables inside of BARs? AFAICT a PV Dom0 is
> able to move the BARs around as much as it wants.

The current expectation is that this doesn't happen with (pre) set up
MSI-X interrupts (i.e. do BAR placement first, then set up interrupts).

Jan



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 3/9] xen/mm: move modify_identity_mmio to global file and drop __init
  2017-04-25  9:32           ` Jan Beulich
  2017-04-26  8:26             ` Roger Pau Monne
@ 2017-04-27  8:58             ` Roger Pau Monne
  2017-04-27  9:08               ` Julien Grall
  2017-04-27  9:29               ` Jan Beulich
  1 sibling, 2 replies; 40+ messages in thread
From: Roger Pau Monne @ 2017-04-27  8:58 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Andrew Cooper, Julien Grall, xen-devel,
	boris.ostrovsky, nd

On Tue, Apr 25, 2017 at 03:32:50AM -0600, Jan Beulich wrote:
> >>> On 25.04.17 at 11:25, <roger.pau@citrix.com> wrote:
> > On Tue, Apr 25, 2017 at 10:09:34AM +0100, Julien Grall wrote:
> >> My understanding is BARs may be allocated by the kernel because the firmware
> >> didn't do it. This is the current case on ARM (and I guess x86) where Linux
> >> will always go through the BARs.
> > 
> > No, on x86 BARs are allocated by the firmware. Linux or whatever OS will scan
> > the BARs in order to get their position/size, but will not try to move them
> > AFAIK.
> 
> That depends. Firmware is not required to set up all of them (only
> such on devices needed for booting obviously need to be set up).
> And Linux may (voluntarily or forced via command line option) still
> move BARs.

The spec seems more strict here:

"Power-up software needs to build a consistent address map before booting the
machine to an operating system. This means it has to determine how much memory
is in the system, and how much address space the I/O controllers in the system
require. After determining this information, power-up software can map the I/O
controllers into reasonable locations and proceed with system boot."

This is from PCI LOCAL BUS SPECIFICATION, REV. 3.0. I read that as "firmware
will position all the BARs".

Moving the BARs is not a huge problem: this series already allows the guest to
move where the BARs are mapped in its p2m. Allowing a guest to set the initial
BAR position would also be feasible, but I haven't been able to find any device
on my boxes that's not initialized by the firmware, hence it would be hard for
me to test that.

Roger.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 3/9] xen/mm: move modify_identity_mmio to global file and drop __init
  2017-04-27  8:58             ` Roger Pau Monne
@ 2017-04-27  9:08               ` Julien Grall
  2017-04-27  9:29               ` Jan Beulich
  1 sibling, 0 replies; 40+ messages in thread
From: Julien Grall @ 2017-04-27  9:08 UTC (permalink / raw)
  To: Roger Pau Monne, Jan Beulich
  Cc: Stefano Stabellini, Andrew Cooper, Julien Grall, xen-devel,
	boris.ostrovsky, nd


Hi Roger,

On Thu, 27 Apr 2017, 11:02 Roger Pau Monne, <roger.pau@citrix.com> wrote:

> On Tue, Apr 25, 2017 at 03:32:50AM -0600, Jan Beulich wrote:
> > >>> On 25.04.17 at 11:25, <roger.pau@citrix.com> wrote:
> > > On Tue, Apr 25, 2017 at 10:09:34AM +0100, Julien Grall wrote:
> > >> My understanding is BARs may be allocated by the kernel because the firmware
> > >> didn't do it. This is the current case on ARM (and I guess x86) where Linux
> > >> will always go through the BARs.
> > >
> > > No, on x86 BARs are allocated by the firmware. Linux or whatever OS will scan
> > > the BARs in order to get their position/size, but will not try to move them
> > > AFAIK.
> >
> > That depends. Firmware is not required to set up all of them (only
> > such on devices needed for booting obviously need to be set up).
> > And Linux may (voluntarily or forced via command line option) still
> > move BARs.
>
> The spec seems more strict here:
>
> "Power-up software needs to build a consistent address map before booting the
> machine to an operating system. This means it has to determine how much memory
> is in the system, and how much address space the I/O controllers in the system
> require. After determining this information, power-up software can map the I/O
> controllers into reasonable locations and proceed with system boot."
>

It does not seem that strict to me. The spec says "power-up software can
map".

It is neither "must" nor "should", so firmware may or may not do it.

As we discussed in the PCI passthrough design document, the firmware is only
required to initialize devices that are used for boot.

So it does not cover hotplug devices nor devices not used for boot.


> This is from PCI LOCAL BUS SPECIFICATION, REV. 3.0. I read that as "firmware
> will position all the BARs".
>
> Moving the BARs is not a huge problem, this series already allows the guest to
> move where the BARs are mapped in its p2m, allowing a guest to set the initial
> BAR position would also be feasible, but I haven't been able to find any device
> on my boxes that's not initialized by the firmware, hence it would be hard for
> me to test that.
>

I will have a look on my ARM box when I am back to test it.

Cheers,


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 3/9] xen/mm: move modify_identity_mmio to global file and drop __init
  2017-04-27  8:58             ` Roger Pau Monne
  2017-04-27  9:08               ` Julien Grall
@ 2017-04-27  9:29               ` Jan Beulich
  1 sibling, 0 replies; 40+ messages in thread
From: Jan Beulich @ 2017-04-27  9:29 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Andrew Cooper, Julien Grall, xen-devel,
	boris.ostrovsky, nd

>>> On 27.04.17 at 10:58, <roger.pau@citrix.com> wrote:
> On Tue, Apr 25, 2017 at 03:32:50AM -0600, Jan Beulich wrote:
>> >>> On 25.04.17 at 11:25, <roger.pau@citrix.com> wrote:
>> > On Tue, Apr 25, 2017 at 10:09:34AM +0100, Julien Grall wrote:
>> >> My understanding is BARs may be allocated by the kernel because the firmware
>> >> didn't do it. This is the current case on ARM (and I guess x86) where Linux
>> >> will always go through the BARs.
>> >
>> > No, on x86 BARs are allocated by the firmware. Linux or whatever OS will scan
>> > the BARs in order to get their position/size, but will not try to move them
>> > AFAIK.
>> 
>> That depends. Firmware is not required to set up all of them (only
>> such on devices needed for booting obviously need to be set up).
>> And Linux may (voluntarily or forced via command line option) still
>> move BARs.
> 
> The spec seems more strict here:
> 
> "Power-up software needs to build a consistent address map before booting the
> machine to an operating system. This means it has to determine how much memory
> is in the system, and how much address space the I/O controllers in the system
> require. After determining this information, power-up software can map the I/O
> controllers into reasonable locations and proceed with system boot."

I don't view this as more strict: Note how the last sentence says
"can map", not "has to" or "will". There are actually downsides to
firmware doing it for all devices: Firmware can't know whether the
OS is capable of dealing with 64-bit BARs, yet it is undesirable
(and perhaps impossible) to place all of them below 4GB.
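
To make the 64-bit point concrete, here is a minimal sketch of decoding a
memory BAR from its raw register values. The bit layout (bit 0 selects I/O
space, bits 2:1 give the memory type, with 10b meaning 64-bit) is the standard
PCI encoding. The helper names are made up for this example, and treating an
all-zero address as "not set up by firmware" is only a heuristic assumed here,
not something the spec guarantees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BAR_SPACE_IO       0x1u  /* bit 0: 1 = I/O space, 0 = memory space */
#define BAR_MEM_TYPE_MASK  0x6u  /* bits 2:1: memory decoder type */
#define BAR_MEM_TYPE_64    0x4u  /* type 10b: 64-bit address, uses two dwords */
#define BAR_MEM_FLAGS_MASK 0xfu  /* low four bits of a memory BAR are flags */

/* True if the BAR is a memory BAR that may be placed above 4GB. */
bool bar_is_mem64(uint32_t lo)
{
    return !(lo & BAR_SPACE_IO) &&
           (lo & BAR_MEM_TYPE_MASK) == BAR_MEM_TYPE_64;
}

/* Address of a memory BAR; 'hi' is the next dword, ignored for 32-bit BARs. */
uint64_t bar_mem_addr(uint32_t lo, uint32_t hi)
{
    uint64_t addr = lo & ~BAR_MEM_FLAGS_MASK;

    assert(!(lo & BAR_SPACE_IO));   /* only meaningful for memory BARs */

    if ( bar_is_mem64(lo) )
        addr |= (uint64_t)hi << 32;

    return addr;
}

/* Heuristic (assumption): an address of zero means nobody positioned the BAR. */
bool bar_looks_unset(uint32_t lo, uint32_t hi)
{
    return bar_mem_addr(lo, hi) == 0;
}
```

A 32-bit memory BAR has to be placed below 4GB no matter what the OS supports,
which is why only the 64-bit ones leave firmware with a real choice.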

> Moving the BARs is not a huge problem, this series already allows the guest to
> move where the BARs are mapped in its p2m, allowing a guest to set the initial
> BAR position would also be feasible, but I haven't been able to find any device
> on my boxes that's not initialized by the firmware, hence it would be hard for
> me to test that.

Well, you could zap some instead of mapping them into Dom0's p2m,
you'd just need to be careful not to zap any which Dom0 needs for
booting (in Linux this may be no more than the graphics card and, if
used, a plug-in serial card, as everything else ought to come from
the initrd).

Jan



^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2017-04-27  9:29 UTC | newest]

Thread overview: 40+ messages
2017-04-20 15:17 [PATCH v2 0/9] vpci: PCI config space emulation Roger Pau Monne
2017-04-20 15:17 ` [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap accesses to the PCI config space Roger Pau Monne
2017-04-21 16:07   ` Paul Durrant
2017-04-24  9:09     ` Roger Pau Monne
2017-04-24  9:34       ` Paul Durrant
2017-04-24 10:08         ` Roger Pau Monne
2017-04-24 10:19           ` Paul Durrant
2017-04-24 11:02             ` Roger Pau Monne
2017-04-24 11:50               ` Paul Durrant
2017-04-25  8:27                 ` Roger Pau Monne
2017-04-25  8:35                   ` Paul Durrant
2017-04-21 16:23   ` Paul Durrant
2017-04-24  9:42     ` Roger Pau Monne
2017-04-24  9:55       ` Paul Durrant
2017-04-24  9:58       ` Paul Durrant
2017-04-24 10:11         ` Roger Pau Monne
2017-04-24 10:12           ` Paul Durrant
2017-04-20 15:17 ` [PATCH v2 2/9] x86/ecam: add handlers for the PVH Dom0 MMCFG areas Roger Pau Monne
2017-04-20 15:17 ` [PATCH v2 3/9] xen/mm: move modify_identity_mmio to global file and drop __init Roger Pau Monne
2017-04-24 14:42   ` Julien Grall
2017-04-25  8:01     ` Roger Pau Monne
2017-04-25  9:09       ` Julien Grall
2017-04-25  9:25         ` Roger Pau Monne
2017-04-25  9:32           ` Jan Beulich
2017-04-26  8:26             ` Roger Pau Monne
2017-04-26  8:51               ` Jan Beulich
2017-04-27  8:58             ` Roger Pau Monne
2017-04-27  9:08               ` Julien Grall
2017-04-27  9:29               ` Jan Beulich
2017-04-20 15:17 ` [PATCH v2 4/9] xen/pci: split code to size BARs from pci_add_device Roger Pau Monne
2017-04-20 15:17 ` [PATCH v2 5/9] xen/vpci: add handlers to map the BARs Roger Pau Monne
2017-04-20 15:17 ` [PATCH v2 6/9] xen/vpci: trap access to the list of PCI capabilities Roger Pau Monne
2017-04-20 15:17 ` [PATCH v2 7/9] vpci: add a priority field to the vPCI register initializer Roger Pau Monne
2017-04-20 15:17 ` [PATCH v2 8/9] vpci/msi: add MSI handlers Roger Pau Monne
2017-04-21  8:38   ` Roger Pau Monne
2017-04-24 15:31   ` Julien Grall
2017-04-25 11:49     ` Roger Pau Monne
2017-04-25 12:00       ` Julien Grall
2017-04-25 13:19         ` Roger Pau Monne
2017-04-20 15:17 ` [PATCH v2 9/9] vpci/msix: add MSI-X handlers Roger Pau Monne
