* [v8][PATCH 00/17] xen: RMRR fix
@ 2014-12-01  9:24 Tiejun Chen
  2014-12-01  9:24 ` [v8][PATCH 01/17] tools/hvmloader: link errno.h from xen internal Tiejun Chen
                   ` (17 more replies)
  0 siblings, 18 replies; 106+ messages in thread
From: Tiejun Chen @ 2014-12-01  9:24 UTC (permalink / raw)
  To: jbeulich, ian.jackson, stefano.stabellini, ian.campbell,
	wei.liu2, kevin.tian, tim, yang.z.zhang
  Cc: xen-devel

v8:

A brief summary:

* Rebased on the latest tree.
* We skip creating p2m table entries for the RMRR ranges, as Jan and Tim
  suggested. In effect this just leaves those entries INVALID, since all p2m
  entries are already initialized as INVALID. I also tried explicitly resetting
  them to INVALID again; everything still works, but warning logs show we're
  re-invalidating already-invalid p2m entries, so we simply keep skipping them.
* Extend mem_access to skip these RMRR ranges as well, so that path stays
  consistent with the populate path.
* Provide a new way to control whether
    1. we check/reserve all RMRR ranges, or
    2. only the RMRR ranges specific to the devices we want to pass through.
  This adds a new parameter to enable/disable the check and a new domctl to
  hand Xen the SBDFs of those passthrough devices (see the config sketch after
  this list).
* Add a new check in mem_hole_alloc(), since it may be used to allocate ranges
  at runtime, like igd_opregion_pgbase.
* Miscellaneous code improvements and corrections.
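
For reference, the intended xl syntax for this control (mirroring the vtd.txt
update in patch #2; the BDFs are just examples) looks like:

    # Globally check/reserve all RMRR ranges (default is 0):
    pci_rdmforce = 1

    # Or enable it only for a specific assigned device:
    pci = [ '01:00.0,rdmforce=1', '03:00.0' ]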

The details:

0001-tools-hvmloader-link-errno.h-from-xen-internal.patch

This is a new patch, but it has already been Acked by Jan.

0002-introduce-XEN_DOMCTL_set_rdm.patch

This is a new patch introducing the parameter and domctl mentioned above.
Please refer to the patch's own description for details.

0003-introduce-XENMEM_reserved_device_memory_map.patch

This is slightly rebased so that all the code is only built in the
passthrough (HAS_PASSTHROUGH) case. The patch is from Jan and Acked by
Kevin; on my side I just rebased it on the latest tree.

0004-update-the-existing-hypercall-to-support-XEN_DOMCTL_.patch

This is a new patch: after introducing that parameter and domctl, we need
to rework the existing hypercall introduced in the previous patch.

Most of this was already discussed between Jan and me, but I changed a few
things again because I found we were missing some scenarios. For example,
the hypercall caller doesn't know up front how large a buffer it needs to
hold all the necessary entries, and additionally we need to expose a return
value from the iommu ops, as some callers expect (see the caller sketch
below).

So I have to ask Jan to take a look at this again, and maybe we can squash
this into the previous patch eventually.
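
To make the buffer-sizing point concrete, the intended caller pattern
(roughly what the libxc helper in patch #6 does; error handling trimmed,
and xch/domid assumed to be set up already) is a two-pass query:

    uint32_t nr_entries = 0;
    struct xen_reserved_device_memory *map = NULL;

    /* First call with no buffer: Xen reports how many entries exist. */
    if ( xc_reserved_device_memory_map(xch, domid, NULL, &nr_entries) < 0 &&
         errno == ENOBUFS )
    {
        map = malloc(nr_entries * sizeof(*map));
        /* Second call with a large enough buffer fetches the ranges. */
        if ( map &&
             xc_reserved_device_memory_map(xch, domid, map, &nr_entries) < 0 )
        {
            free(map);
            map = NULL;
        }
    }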

0005-tools-libxc-introduce-hypercall-for-xc_reserved_devi.patch
0006-tools-libxc-check-if-modules-space-is-overlapping-wi.patch
0007-hvmloader-util-get-reserved-device-memory-maps.patch

Just rebase and cleanup.

0008-hvmloader-mmio-reconcile-guest-mmio-with-reserved-de.patch

Just rebase and cleanup, plus some code corrections as Jan commented.

0009-hvmloader-ram-check-if-guest-memory-is-out-of-reserv.patch

Just rebase and cleanup, and some code improvements.

0010-hvmloader-mem_hole_alloc-skip-any-overlap-with-reser.patch

This is a new patch used to address igd_opregion_pgbase, as Yang mentioned
to me.

0011-xen-x86-p2m-reject-populating-for-reserved-device-me.patch

Refactor some code to handle this case. Please refer to the description
above.

0012-xen-x86-ept-handle-reserved-device-memory-in-ept_han.patch

Just improve comments.

0013-xen-mem_access-don-t-allow-accessing-reserved-device.patch

This is a new patch to keep mem_access in sync with what we do during memory populating.

0014-xen-x86-p2m-introduce-set_identity_p2m_entry.patch
0015-xen-vtd-create-RMRR-mapping.patch
0016-xen-vtd-group-assigned-device-with-RMRR.patch

There's nothing to change.

0017-xen-vtd-re-enable-USB-device-assignment-if-enable-pc.patch

Just rebase.


How to reproduce this issue:

* Use the shared-EPT case with Xen.
* The target machine owns an RMRR.
* Do IGD passthrough with a Windows guest OS: gfx_passthru=1 pci=["00:02.0"]
* Please use qemu-xen-traditional.

My test machine is BDW with Windows 7.

v7:

This series of patches tries to reconcile the remaining problems; it was
posted as an RFC to ask for comments on how to refine everything.

The current whole scheme is as follows:

1. Reconcile guest mmio with RMRR in pci_setup
2. Reconcile guest RAM with RMRR in e820 table

Then, in theory, the guest wouldn't access any RMRR range.

3. Just initialize all RMRR ranges as p2m_access_n in p2m table:
    gfn:mfn:p2m_access_n

Here I think we shouldn't install a 1:1 mapping that exposes the RMRR to a
guest which may never have a device assigned; this prevents leaking the RMRR.

4. We reset those mappings as 1:1:p2m_mmio_direct:p2m_ram_rw once we
have a device assignment.

5. Before a real device assignment takes place, any access to an RMRR range
may trigger ept_handle_violation() because of p2m_access_n. We then just call
update_guest_eip() and return (see the sketch below).
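
A rough sketch of that handling (illustrative only, not the literal patch #12
code; is_reserved_device_memory_gfn() is a made-up helper):

    /* Pre-assignment: a fault on an RMRR gfn left as p2m_access_n is
     * simply skipped so the guest continues past the access. */
    if ( p2ma == p2m_access_n && is_reserved_device_memory_gfn(gfn) )
    {
        update_guest_eip();  /* advance past the faulting instruction */
        return 1;            /* violation handled, nothing else to do */
    }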

6. After a device assignment, the guest may still maliciously access RMRR
ranges although we already reserve them in the e820 table. In the worst case
only that device stops working properly. But this behavior is the same as on
native, so I think we shouldn't do anything more here.

7. It's not necessary to introduce any flag in ept_set_entry().

First of all, the hypervisor/dom0 should be trusted. Any user should make
sure they never override valid RMRR tables without a check. So our original
set_identity_p2m_entry() tries to set entries as follows:

 - gfn space unoccupied -> insert mapping; success.
 - gfn space already occupied by 1:1 RMRR mapping -> do nothing; success.
 - gfn space already occupied by other mapping -> fail.

Now in our case we add one more rule (see the sketch below):
 - if p2m_access_n is set we also install the mapping.
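
An illustrative sketch of the combined rules (helper names here are made up;
the real logic lands in set_identity_p2m_entry() in patches #14/#15):

    old = p2m_lookup_entry(d, gfn);            /* hypothetical lookup */
    if ( entry_is_unoccupied(old) || entry_access_is_n(old) )
        return p2m_install_identity(d, gfn);   /* map gfn -> mfn == gfn */
    if ( entry_is_identity_rmrr(old, gfn) )
        return 0;                              /* already 1:1, nothing to do */
    return -EBUSY;                             /* occupied by another mapping */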

Another reason is that ept_set_entry() is called in many scenarios for its
own management; I think we shouldn't disturb that mechanism, and it's also
difficult to cover all its call sites.

8. We need to consider grouping all devices that share the same RMRR range,
to make sure they are only ever assigned to one and the same VM (see the
sketch below).
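
A sketch of that grouping rule (illustrative only; rmrr_includes() and
rmrr_owner() are made-up helpers, the real check is in patch #16): when
assigning device 'bdf' to domain 'd', refuse if any device sharing one of
its RMRRs already belongs to a different domain.

    for_each_rmrr_device ( rmrr, sharer_bdf, i )
        if ( rmrr_includes(rmrr, bdf) && sharer_bdf != bdf &&
             rmrr_owner(sharer_bdf) != NULL && rmrr_owner(sharer_bdf) != d )
            return -EBUSY;   /* keep all sharers in one and the same VM */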

----------------------------------------------------------------
Jan Beulich (1):
      introduce XENMEM_reserved_device_memory_map

Tiejun Chen (16):
      tools/hvmloader: link errno.h from xen internal
      introduce XEN_DOMCTL_set_rdm
      update the existing hypercall to support XEN_DOMCTL_set_rdm
      tools/libxc: introduce hypercall for xc_reserved_device_memory_map
      tools/libxc: check if modules space is overlapping with reserved device memory
      hvmloader/util: get reserved device memory maps
      hvmloader/mmio: reconcile guest mmio with reserved device memory
      hvmloader/ram: check if guest memory is out of reserved device memory maps
      hvmloader/mem_hole_alloc: skip any overlap with reserved device memory
      xen/x86/p2m: reject populating for reserved device memory mapping
      xen/x86/ept: handle reserved device memory in ept_handle_violation
      xen/mem_access: don't allow accessing reserved device memory
      xen/x86/p2m: introduce set_identity_p2m_entry
      xen:vtd: create RMRR mapping
      xen/vtd: group assigned device with RMRR
      xen/vtd: re-enable USB device assignment if enable pci_force

 .gitignore                           |   1 +
 docs/man/xl.cfg.pod.5                |   6 ++++++
 docs/misc/vtd.txt                    |  15 ++++++++++++++
 tools/firmware/hvmloader/Makefile    |   7 ++++++-
 tools/firmware/hvmloader/e820.c      | 168 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tools/firmware/hvmloader/pci.c       |  54 +++++++++++++++++++++++++++++++++++++++++++++++++-
 tools/firmware/hvmloader/util.c      |  90 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 tools/firmware/hvmloader/util.h      |   4 ++++
 tools/libxc/include/xenctrl.h        |  11 +++++++++++
 tools/libxc/xc_domain.c              |  58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tools/libxc/xc_hvm_build_x86.c       |  94 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------
 tools/libxl/libxl_create.c           |   3 +++
 tools/libxl/libxl_dm.c               |  47 ++++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_internal.h         |   4 ++++
 tools/libxl/libxl_types.idl          |   2 ++
 tools/libxl/libxlu_pci.c             |   2 ++
 tools/libxl/xl_cmdimpl.c             |  10 ++++++++++
 xen/arch/x86/hvm/vmx/vmx.c           |  18 +++++++++++++++++
 xen/arch/x86/mm/p2m.c                |  87 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 xen/common/compat/memory.c           |  97 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 xen/common/mem_access.c              |  41 ++++++++++++++++++++++++++++++++++++++
 xen/common/memory.c                  |  94 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 xen/drivers/passthrough/iommu.c      |  10 ++++++++++
 xen/drivers/passthrough/pci.c        |  39 ++++++++++++++++++++++++++++++++++++
 xen/drivers/passthrough/vtd/dmar.c   |  69 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 xen/drivers/passthrough/vtd/dmar.h   |   3 +++
 xen/drivers/passthrough/vtd/extern.h |   1 +
 xen/drivers/passthrough/vtd/iommu.c  |  94 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------
 xen/drivers/passthrough/vtd/utils.c  |  18 +++++++++++++++++
 xen/include/asm-x86/hvm/domain.h     |   4 ++++
 xen/include/asm-x86/p2m.h            |  13 ++++++++++++
 xen/include/public/domctl.h          |  21 ++++++++++++++++++++
 xen/include/public/memory.h          |  29 ++++++++++++++++++++++++++-
 xen/include/xen/iommu.h              |   4 ++++
 xen/include/xen/pci.h                |   2 ++
 xen/include/xlat.lst                 |   3 ++-
 xen/xsm/flask/hooks.c                |   1 +
 37 files changed, 1187 insertions(+), 37 deletions(-)

Thanks
Tiejun


* [v8][PATCH 01/17] tools/hvmloader: link errno.h from xen internal
  2014-12-01  9:24 [v8][PATCH 00/17] xen: RMRR fix Tiejun Chen
@ 2014-12-01  9:24 ` Tiejun Chen
  2014-12-01  9:24 ` [v8][PATCH 02/17] introduce XEN_DOMCTL_set_rdm Tiejun Chen
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 106+ messages in thread
From: Tiejun Chen @ 2014-12-01  9:24 UTC (permalink / raw)
  To: jbeulich, ian.jackson, stefano.stabellini, ian.campbell,
	wei.liu2, kevin.tian, tim, yang.z.zhang
  Cc: xen-devel

We need to act on some specific hypercall error numbers, so we
require the hypervisor's view of the errno.h values rather than
just the build environment's numbers. So link this header file
in from the Xen tree.

Acked-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 .gitignore                        | 1 +
 tools/firmware/hvmloader/Makefile | 7 ++++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/.gitignore b/.gitignore
index b24e905..52c3038 100644
--- a/.gitignore
+++ b/.gitignore
@@ -127,6 +127,7 @@ tools/firmware/hvmloader/acpi/ssdt_*.h
 tools/firmware/hvmloader/hvmloader
 tools/firmware/hvmloader/roms.h
 tools/firmware/hvmloader/roms.inc
+tools/firmware/hvmloader/errno.h
 tools/firmware/rombios/BIOS-bochs-[^/]*
 tools/firmware/rombios/_rombios[^/]*_.c
 tools/firmware/rombios/rombios[^/]*.s
diff --git a/tools/firmware/hvmloader/Makefile b/tools/firmware/hvmloader/Makefile
index 46a79c5..ef2337b 100644
--- a/tools/firmware/hvmloader/Makefile
+++ b/tools/firmware/hvmloader/Makefile
@@ -87,6 +87,11 @@ endif
 all: subdirs-all
 	$(MAKE) hvmloader
 
+subdirs-all: errno.h
+
+errno.h:
+	ln -sf $(XEN_ROOT)/xen/include/xen/errno.h .
+
 ovmf.o rombios.o seabios.o hvmloader.o: roms.inc
 smbios.o: CFLAGS += -D__SMBIOS_DATE__="\"$(shell date +%m/%d/%Y)\""
 
@@ -136,7 +141,7 @@ endif
 
 .PHONY: clean
 clean: subdirs-clean
-	rm -f roms.inc roms.inc.new acpi.h
+	rm -f roms.inc roms.inc.new acpi.h errno.h
 	rm -f hvmloader hvmloader.tmp *.o $(DEPS)
 
 -include $(DEPS)
-- 
1.9.1


* [v8][PATCH 02/17] introduce XEN_DOMCTL_set_rdm
  2014-12-01  9:24 [v8][PATCH 00/17] xen: RMRR fix Tiejun Chen
  2014-12-01  9:24 ` [v8][PATCH 01/17] tools/hvmloader: link errno.h from xen internal Tiejun Chen
@ 2014-12-01  9:24 ` Tiejun Chen
  2014-12-02  8:33   ` Tian, Kevin
                     ` (2 more replies)
  2014-12-01  9:24 ` [v8][PATCH 03/17] introduce XENMEM_reserved_device_memory_map Tiejun Chen
                   ` (15 subsequent siblings)
  17 siblings, 3 replies; 106+ messages in thread
From: Tiejun Chen @ 2014-12-01  9:24 UTC (permalink / raw)
  To: jbeulich, ian.jackson, stefano.stabellini, ian.campbell,
	wei.liu2, kevin.tian, tim, yang.z.zhang
  Cc: xen-devel

This is based on a new global parameter, 'pci_rdmforce'.

pci_rdmforce = 1         (0 by default)

'1' means we force the check and reserve all ranges; if that fails,
the VM isn't created. This also gives the user a chance to make later
hotplug work, even if no device is assigned while creating the VM.

But we can override that for one specific pci device:

pci = [ 'AA:BB.CC,rdmforce=0/1' ]

This per-device 'rdmforce' should be 1 by default, since obviously any
passthrough device always needs this. Actually nobody really wants to
set it to '0', so it may be unnecessary, but I'd like to leave it as a
potential knob.

So this domctl provides a way for the tools to control how reserved
device memory is populated.

Note we always post a message to the user about this once the platform
owns an RMRR.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 docs/man/xl.cfg.pod.5              |  6 +++++
 docs/misc/vtd.txt                  | 15 ++++++++++++
 tools/libxc/include/xenctrl.h      |  6 +++++
 tools/libxc/xc_domain.c            | 28 +++++++++++++++++++++++
 tools/libxl/libxl_create.c         |  3 +++
 tools/libxl/libxl_dm.c             | 47 ++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_internal.h       |  4 ++++
 tools/libxl/libxl_types.idl        |  2 ++
 tools/libxl/libxlu_pci.c           |  2 ++
 tools/libxl/xl_cmdimpl.c           | 10 ++++++++
 xen/drivers/passthrough/pci.c      | 39 +++++++++++++++++++++++++++++++
 xen/drivers/passthrough/vtd/dmar.c |  8 +++++++
 xen/include/asm-x86/hvm/domain.h   |  4 ++++
 xen/include/public/domctl.h        | 21 +++++++++++++++++
 xen/xsm/flask/hooks.c              |  1 +
 15 files changed, 196 insertions(+)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 622ea53..9adc41e 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -645,6 +645,12 @@ dom0 without confirmation.  Please use with care.
 D0-D3hot power management states for the PCI device. False (0) by
 default.
 
+=item B<rdmforce=BOOLEAN>
+
+(HVM/x86 only) Specifies that the VM should force checking and trying to
+reserve all reserved device memory, like RMRR, associated with the PCI
+device. False (0) by default.
+
 =back
 
 =back
diff --git a/docs/misc/vtd.txt b/docs/misc/vtd.txt
index 9af0e99..23544d5 100644
--- a/docs/misc/vtd.txt
+++ b/docs/misc/vtd.txt
@@ -111,6 +111,21 @@ in the config file:
 To override for a specific device:
 	pci = [ '01:00.0,msitranslate=0', '03:00.0' ]
 
+RDM, 'reserved device memory', for PCI Device Passthrough
+---------------------------------------------------------
+
+The BIOS reserves some regions of memory for the devices it controls. This
+kind of region should be reserved before creating a VM, to make sure it is
+not occupied by (and conflicting with) RAM/MMIO, and also so that we can
+create the necessary IOMMU tables successfully.
+
+To enable this globally, add "pci_rdmforce" in the config file:
+
+	pci_rdmforce = 1         (default is 0)
+
+Or just enable for a specific device:
+	pci = [ '01:00.0,rdmforce=1', '03:00.0' ]
+
 
 Caveat on Conventional PCI Device Passthrough
 ---------------------------------------------
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 0ad8b8d..84012fe 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2038,6 +2038,12 @@ int xc_assign_device(xc_interface *xch,
                      uint32_t domid,
                      uint32_t machine_bdf);
 
+int xc_domain_device_setrdm(xc_interface *xch,
+                            uint32_t domid,
+                            uint32_t num_pcidevs,
+                            uint32_t pci_rdmforce,
+                            struct xen_guest_pcidev_info *pcidevs);
+
 int xc_get_device_group(xc_interface *xch,
                      uint32_t domid,
                      uint32_t machine_bdf,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index b864872..7fd43e9 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1633,6 +1633,34 @@ int xc_assign_device(
     return do_domctl(xch, &domctl);
 }
 
+int xc_domain_device_setrdm(xc_interface *xch,
+                            uint32_t domid,
+                            uint32_t num_pcidevs,
+                            uint32_t pci_rdmforce,
+                            struct xen_guest_pcidev_info *pcidevs)
+{
+    int ret;
+    DECLARE_DOMCTL;
+    DECLARE_HYPERCALL_BOUNCE(pcidevs,
+                             num_pcidevs*sizeof(xen_guest_pcidev_info_t),
+                             XC_HYPERCALL_BUFFER_BOUNCE_IN);
+
+    if ( xc_hypercall_bounce_pre(xch, pcidevs) )
+        return -1;
+
+    domctl.cmd = XEN_DOMCTL_set_rdm;
+    domctl.domain = (domid_t)domid;
+    domctl.u.set_rdm.flags = pci_rdmforce;
+    domctl.u.set_rdm.num_pcidevs = num_pcidevs;
+    set_xen_guest_handle(domctl.u.set_rdm.pcidevs, pcidevs);
+
+    ret = do_domctl(xch, &domctl);
+
+    xc_hypercall_bounce_post(xch, pcidevs);
+
+    return ret;
+}
+
 int xc_get_device_group(
     xc_interface *xch,
     uint32_t domid,
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 1198225..c615686 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -862,6 +862,9 @@ static void initiate_domain_create(libxl__egc *egc,
     ret = libxl__domain_build_info_setdefault(gc, &d_config->b_info);
     if (ret) goto error_out;
 
+    ret = libxl__domain_device_setrdm(gc, d_config, domid);
+    if (ret) goto error_out;
+
     if (!sched_params_valid(gc, domid, &d_config->b_info.sched_params)) {
         LOG(ERROR, "Invalid scheduling parameters\n");
         ret = ERROR_INVAL;
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 3e191c3..e50587d 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -90,6 +90,53 @@ const char *libxl__domain_device_model(libxl__gc *gc,
     return dm;
 }
 
+int libxl__domain_device_setrdm(libxl__gc *gc,
+                                libxl_domain_config *d_config,
+                                uint32_t dm_domid)
+{
+    int i, ret;
+    libxl_ctx *ctx = libxl__gc_owner(gc);
+    struct xen_guest_pcidev_info *pcidevs = NULL;
+    uint32_t rdmforce = 0;
+
+    if ( d_config->num_pcidevs )
+    {
+        pcidevs = malloc(d_config->num_pcidevs*sizeof(xen_guest_pcidev_info_t));
+        if ( pcidevs )
+        {
+            for (i = 0; i < d_config->num_pcidevs; i++)
+            {
+                pcidevs[i].devfn = PCI_DEVFN(d_config->pcidevs[i].dev,
+                                             d_config->pcidevs[i].func);
+                pcidevs[i].bus = d_config->pcidevs[i].bus;
+                pcidevs[i].seg = d_config->pcidevs[i].domain;
+                pcidevs[i].flags = d_config->pcidevs[i].rdmforce &
+                                   PCI_DEV_RDM_CHECK;
+            }
+        }
+        else
+        {
+            LIBXL__LOG(CTX, LIBXL__LOG_ERROR,
+                               "Can't allocate for pcidevs.");
+            return -1;
+        }
+    }
+    rdmforce = libxl_defbool_val(d_config->b_info.rdmforce) ? 1 : 0;
+
+    /* Nothing to do. */
+    if ( !rdmforce && !d_config->num_pcidevs )
+        return 0;
+
+    ret = xc_domain_device_setrdm(ctx->xch, dm_domid,
+                                  (uint32_t)d_config->num_pcidevs,
+                                  rdmforce,
+                                  pcidevs);
+    if ( d_config->num_pcidevs )
+        free(pcidevs);
+
+    return ret;
+}
+
 const libxl_vnc_info *libxl__dm_vnc(const libxl_domain_config *guest_config)
 {
     const libxl_vnc_info *vnc = NULL;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index a38f695..be397a6 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1477,6 +1477,10 @@ _hidden int libxl__need_xenpv_qemu(libxl__gc *gc,
         int nr_disks, libxl_device_disk *disks,
         int nr_channels, libxl_device_channel *channels);
 
+_hidden int libxl__domain_device_setrdm(libxl__gc *gc,
+                                        libxl_domain_config *info,
+                                        uint32_t domid);
+
 /*
  * This function will cause the whole libxl process to hang
  * if the device model does not respond.  It is deprecated.
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index f7fc695..0076a32 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -398,6 +398,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
     ("kernel",           string),
     ("cmdline",          string),
     ("ramdisk",          string),
+    ("rdmforce",         libxl_defbool),
     ("u", KeyedUnion(None, libxl_domain_type, "type",
                 [("hvm", Struct(None, [("firmware",         string),
                                        ("bios",             libxl_bios_type),
@@ -518,6 +519,7 @@ libxl_device_pci = Struct("device_pci", [
     ("power_mgmt", bool),
     ("permissive", bool),
     ("seize", bool),
+    ("rdmforce", bool),
     ])
 
 libxl_device_vtpm = Struct("device_vtpm", [
diff --git a/tools/libxl/libxlu_pci.c b/tools/libxl/libxlu_pci.c
index 26fb143..989eac8 100644
--- a/tools/libxl/libxlu_pci.c
+++ b/tools/libxl/libxlu_pci.c
@@ -143,6 +143,8 @@ int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char *str
                     pcidev->permissive = atoi(tok);
                 }else if ( !strcmp(optkey, "seize") ) {
                     pcidev->seize = atoi(tok);
+                }else if ( !strcmp(optkey, "rdmforce") ) {
+                    pcidev->rdmforce = atoi(tok);
                 }else{
                     XLU__PCI_ERR(cfg, "Unknown PCI BDF option: %s", optkey);
                 }
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 0e754e7..9c23733 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -919,6 +919,7 @@ static void parse_config_data(const char *config_source,
     int pci_msitranslate = 0;
     int pci_permissive = 0;
     int pci_seize = 0;
+    int pci_rdmforce = 0;
     int i, e;
 
     libxl_domain_create_info *c_info = &d_config->c_info;
@@ -1699,6 +1700,9 @@ skip_vfb:
     if (!xlu_cfg_get_long (config, "pci_seize", &l, 0))
         pci_seize = l;
 
+    if (!xlu_cfg_get_long (config, "pci_rdmforce", &l, 0))
+        pci_rdmforce = l;
+
     /* To be reworked (automatically enabled) once the auto ballooning
      * after guest starts is done (with PCI devices passed in). */
     if (c_info->type == LIBXL_DOMAIN_TYPE_PV) {
@@ -1719,6 +1723,7 @@ skip_vfb:
             pcidev->power_mgmt = pci_power_mgmt;
             pcidev->permissive = pci_permissive;
             pcidev->seize = pci_seize;
+            pcidev->rdmforce = pci_rdmforce;
             if (!xlu_pci_parse_bdf(config, pcidev, buf))
                 d_config->num_pcidevs++;
         }
@@ -1726,6 +1731,11 @@ skip_vfb:
             libxl_defbool_set(&b_info->u.pv.e820_host, true);
     }
 
+    if ((c_info->type == LIBXL_DOMAIN_TYPE_HVM) && pci_rdmforce)
+        libxl_defbool_set(&b_info->rdmforce, true);
+    else
+        libxl_defbool_set(&b_info->rdmforce, false);
+
     switch (xlu_cfg_get_list(config, "cpuid", &cpuids, 0, 1)) {
     case 0:
         {
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 78c6977..ae924ad 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -34,6 +34,7 @@
 #include <xen/tasklet.h>
 #include <xsm/xsm.h>
 #include <asm/msi.h>
+#include <xen/stdbool.h>
 
 struct pci_seg {
     struct list_head alldevs_list;
@@ -1553,6 +1554,44 @@ int iommu_do_pci_domctl(
         }
         break;
 
+    case XEN_DOMCTL_set_rdm:
+    {
+        struct xen_domctl_set_rdm *xdsr = &domctl->u.set_rdm;
+        struct xen_guest_pcidev_info *pcidevs = NULL;
+        struct domain *d = rcu_lock_domain_by_any_id(domctl->domain);
+
+        if ( d == NULL )
+            return -ESRCH;
+
+        d->arch.hvm_domain.pci_force =
+                            xdsr->flags & PCI_DEV_RDM_CHECK ? true : false;
+        d->arch.hvm_domain.num_pcidevs = xdsr->num_pcidevs;
+        d->arch.hvm_domain.pcidevs = NULL;
+
+        if ( xdsr->num_pcidevs )
+        {
+            pcidevs = xmalloc_array(xen_guest_pcidev_info_t,
+                                    xdsr->num_pcidevs);
+            if ( pcidevs == NULL )
+            {
+                rcu_unlock_domain(d);
+                return -ENOMEM;
+            }
+
+            if ( copy_from_guest(pcidevs, xdsr->pcidevs,
+                                 xdsr->num_pcidevs*sizeof(*pcidevs)) )
+            {
+                xfree(pcidevs);
+                rcu_unlock_domain(d);
+                return -EFAULT;
+            }
+        }
+
+        d->arch.hvm_domain.pcidevs = pcidevs;
+        rcu_unlock_domain(d);
+    }
+        break;
+
     case XEN_DOMCTL_assign_device:
         if ( unlikely(d->is_dying) )
         {
diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
index 1152c3a..5e41e7a 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -674,6 +674,14 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
                         "  RMRR region: base_addr %"PRIx64
                         " end_address %"PRIx64"\n",
                         rmrru->base_address, rmrru->end_address);
+            /*
+             * TODO: we may provide a precise parameter just to reserve
+             * the RMRR range specific to one device.
+             */
+            dprintk(XENLOG_WARNING VTDPREFIX,
+                    "So please set pci_rdmforce to reserve these ranges"
+                    " if you need such a device in hotplug case.\n");
+
             acpi_register_rmrr_unit(rmrru);
         }
     }
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 2757c7f..38530e5 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -90,6 +90,10 @@ struct hvm_domain {
     /* Cached CF8 for guest PCI config cycles */
     uint32_t                pci_cf8;
 
+    bool_t                  pci_force;
+    uint32_t                num_pcidevs;
+    struct xen_guest_pcidev_info      *pcidevs;
+
     struct pl_time         pl_time;
 
     struct hvm_io_handler *io_handler;
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 57e2ed7..ba8970d 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -508,6 +508,25 @@ struct xen_domctl_get_device_group {
 typedef struct xen_domctl_get_device_group xen_domctl_get_device_group_t;
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_get_device_group_t);
 
+/* Currently just one bit to indicate force to check Reserved Device Memory. */
+#define PCI_DEV_RDM_CHECK   0x1
+struct xen_guest_pcidev_info {
+    uint16_t    seg;
+    uint8_t     bus;
+    uint8_t     devfn;
+    uint32_t    flags;
+};
+typedef struct xen_guest_pcidev_info xen_guest_pcidev_info_t;
+DEFINE_XEN_GUEST_HANDLE(xen_guest_pcidev_info_t);
+/* Control whether/how we check and reserve device memory. */
+struct xen_domctl_set_rdm {
+    uint32_t    flags;
+    uint32_t    num_pcidevs;
+    XEN_GUEST_HANDLE_64(xen_guest_pcidev_info_t) pcidevs;
+};
+typedef struct xen_domctl_set_rdm xen_domctl_set_rdm_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_rdm_t);
+
 /* Pass-through interrupts: bind real irq -> hvm devfn. */
 /* XEN_DOMCTL_bind_pt_irq */
 /* XEN_DOMCTL_unbind_pt_irq */
@@ -1070,6 +1089,7 @@ struct xen_domctl {
 #define XEN_DOMCTL_setvnumainfo                  74
 #define XEN_DOMCTL_psr_cmt_op                    75
 #define XEN_DOMCTL_arm_configure_domain          76
+#define XEN_DOMCTL_set_rdm                       77
 #define XEN_DOMCTL_gdbsx_guestmemio            1000
 #define XEN_DOMCTL_gdbsx_pausevcpu             1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
@@ -1135,6 +1155,7 @@ struct xen_domctl {
         struct xen_domctl_gdbsx_domstatus   gdbsx_domstatus;
         struct xen_domctl_vnuma             vnuma;
         struct xen_domctl_psr_cmt_op        psr_cmt_op;
+        struct xen_domctl_set_rdm           set_rdm;
         uint8_t                             pad[128];
     } u;
 };
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index d48463f..5a760e2 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -592,6 +592,7 @@ static int flask_domctl(struct domain *d, int cmd)
     case XEN_DOMCTL_test_assign_device:
     case XEN_DOMCTL_assign_device:
     case XEN_DOMCTL_deassign_device:
+    case XEN_DOMCTL_set_rdm:
 #endif
         return 0;
 
-- 
1.9.1


* [v8][PATCH 03/17] introduce XENMEM_reserved_device_memory_map
  2014-12-01  9:24 [v8][PATCH 00/17] xen: RMRR fix Tiejun Chen
  2014-12-01  9:24 ` [v8][PATCH 01/17] tools/hvmloader: link errno.h from xen internal Tiejun Chen
  2014-12-01  9:24 ` [v8][PATCH 02/17] introduce XEN_DOMCTL_set_rdm Tiejun Chen
@ 2014-12-01  9:24 ` Tiejun Chen
  2014-12-02 19:47   ` Konrad Rzeszutek Wilk
  2014-12-01  9:24 ` [v8][PATCH 04/17] update the existing hypercall to support XEN_DOMCTL_set_rdm Tiejun Chen
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 106+ messages in thread
From: Tiejun Chen @ 2014-12-01  9:24 UTC (permalink / raw)
  To: jbeulich, ian.jackson, stefano.stabellini, ian.campbell,
	wei.liu2, kevin.tian, tim, yang.z.zhang
  Cc: xen-devel

From: Jan Beulich <jbeulich@suse.com>

This is a prerequisite for punching holes into HVM and PVH guests' P2M
to allow passing through devices that are associated with (on VT-d)
RMRRs.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
 xen/common/compat/memory.c           | 54 ++++++++++++++++++++++++++++++++++++
 xen/common/memory.c                  | 51 ++++++++++++++++++++++++++++++++++
 xen/drivers/passthrough/iommu.c      | 10 +++++++
 xen/drivers/passthrough/vtd/dmar.c   | 17 ++++++++++++
 xen/drivers/passthrough/vtd/extern.h |  1 +
 xen/drivers/passthrough/vtd/iommu.c  |  1 +
 xen/include/public/memory.h          | 24 +++++++++++++++-
 xen/include/xen/iommu.h              |  4 +++
 xen/include/xlat.lst                 |  3 +-
 9 files changed, 163 insertions(+), 2 deletions(-)

diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c
index 06c90be..60512fa 100644
--- a/xen/common/compat/memory.c
+++ b/xen/common/compat/memory.c
@@ -16,6 +16,37 @@ CHECK_TYPE(domid);
 
 CHECK_mem_access_op;
 
+#ifdef HAS_PASSTHROUGH
+struct get_reserved_device_memory {
+    struct compat_reserved_device_memory_map map;
+    unsigned int used_entries;
+};
+
+static int get_reserved_device_memory(xen_pfn_t start,
+                                      xen_ulong_t nr, void *ctxt)
+{
+    struct get_reserved_device_memory *grdm = ctxt;
+
+    if ( grdm->used_entries < grdm->map.nr_entries )
+    {
+        struct compat_reserved_device_memory rdm = {
+            .start_pfn = start, .nr_pages = nr
+        };
+
+        if ( rdm.start_pfn != start || rdm.nr_pages != nr )
+            return -ERANGE;
+
+        if ( __copy_to_compat_offset(grdm->map.buffer, grdm->used_entries,
+                                     &rdm, 1) )
+            return -EFAULT;
+    }
+
+    ++grdm->used_entries;
+
+    return 0;
+}
+#endif
+
 int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
 {
     int split, op = cmd & MEMOP_CMD_MASK;
@@ -273,6 +304,29 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
             break;
         }
 
+#ifdef HAS_PASSTHROUGH
+        case XENMEM_reserved_device_memory_map:
+        {
+            struct get_reserved_device_memory grdm;
+
+            if ( copy_from_guest(&grdm.map, compat, 1) ||
+                 !compat_handle_okay(grdm.map.buffer, grdm.map.nr_entries) )
+                return -EFAULT;
+
+            grdm.used_entries = 0;
+            rc = iommu_get_reserved_device_memory(get_reserved_device_memory,
+                                                  &grdm);
+
+            if ( !rc && grdm.map.nr_entries < grdm.used_entries )
+                rc = -ENOBUFS;
+            grdm.map.nr_entries = grdm.used_entries;
+            if ( __copy_to_guest(compat, &grdm.map, 1) )
+                rc = -EFAULT;
+
+            return rc;
+        }
+#endif
+
         default:
             return compat_arch_memory_op(cmd, compat);
         }
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 9f21bd3..4788acc 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -692,6 +692,34 @@ out:
     return rc;
 }
 
+#ifdef HAS_PASSTHROUGH
+struct get_reserved_device_memory {
+    struct xen_reserved_device_memory_map map;
+    unsigned int used_entries;
+};
+
+static int get_reserved_device_memory(xen_pfn_t start,
+                                      xen_ulong_t nr, void *ctxt)
+{
+    struct get_reserved_device_memory *grdm = ctxt;
+
+    if ( grdm->used_entries < grdm->map.nr_entries )
+    {
+        struct xen_reserved_device_memory rdm = {
+            .start_pfn = start, .nr_pages = nr
+        };
+
+        if ( __copy_to_guest_offset(grdm->map.buffer, grdm->used_entries,
+                                    &rdm, 1) )
+            return -EFAULT;
+    }
+
+    ++grdm->used_entries;
+
+    return 0;
+}
+#endif
+
 long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     struct domain *d;
@@ -1101,6 +1129,29 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+#ifdef HAS_PASSTHROUGH
+    case XENMEM_reserved_device_memory_map:
+    {
+        struct get_reserved_device_memory grdm;
+
+        if ( copy_from_guest(&grdm.map, arg, 1) ||
+             !guest_handle_okay(grdm.map.buffer, grdm.map.nr_entries) )
+            return -EFAULT;
+
+        grdm.used_entries = 0;
+        rc = iommu_get_reserved_device_memory(get_reserved_device_memory,
+                                              &grdm);
+
+        if ( !rc && grdm.map.nr_entries < grdm.used_entries )
+            rc = -ENOBUFS;
+        grdm.map.nr_entries = grdm.used_entries;
+        if ( __copy_to_guest(arg, &grdm.map, 1) )
+            rc = -EFAULT;
+
+        break;
+    }
+#endif
+
     default:
         rc = arch_memory_op(cmd, arg);
         break;
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index cc12735..7c17e8d 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -344,6 +344,16 @@ void iommu_crash_shutdown(void)
     iommu_enabled = iommu_intremap = 0;
 }
 
+int iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
+{
+    const struct iommu_ops *ops = iommu_get_ops();
+
+    if ( !iommu_enabled || !ops->get_reserved_device_memory )
+        return 0;
+
+    return ops->get_reserved_device_memory(func, ctxt);
+}
+
 bool_t iommu_has_feature(struct domain *d, enum iommu_feature feature)
 {
     const struct hvm_iommu *hd = domain_hvm_iommu(d);
diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
index 5e41e7a..86cfad3 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -901,3 +901,20 @@ int platform_supports_x2apic(void)
     unsigned int mask = ACPI_DMAR_INTR_REMAP | ACPI_DMAR_X2APIC_OPT_OUT;
     return cpu_has_x2apic && ((dmar_flags & mask) == ACPI_DMAR_INTR_REMAP);
 }
+
+int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
+{
+    struct acpi_rmrr_unit *rmrr;
+    int rc = 0;
+
+    list_for_each_entry(rmrr, &acpi_rmrr_units, list)
+    {
+        rc = func(PFN_DOWN(rmrr->base_address),
+                  PFN_UP(rmrr->end_address) - PFN_DOWN(rmrr->base_address),
+                  ctxt);
+        if ( rc )
+            break;
+    }
+
+    return rc;
+}
diff --git a/xen/drivers/passthrough/vtd/extern.h b/xen/drivers/passthrough/vtd/extern.h
index 5524dba..f9ee9b0 100644
--- a/xen/drivers/passthrough/vtd/extern.h
+++ b/xen/drivers/passthrough/vtd/extern.h
@@ -75,6 +75,7 @@ int domain_context_mapping_one(struct domain *domain, struct iommu *iommu,
                                u8 bus, u8 devfn, const struct pci_dev *);
 int domain_context_unmap_one(struct domain *domain, struct iommu *iommu,
                              u8 bus, u8 devfn);
+int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt);
 
 unsigned int io_apic_read_remap_rte(unsigned int apic, unsigned int reg);
 void io_apic_write_remap_rte(unsigned int apic,
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 19d8165..a38f201 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2491,6 +2491,7 @@ const struct iommu_ops intel_iommu_ops = {
     .crash_shutdown = vtd_crash_shutdown,
     .iotlb_flush = intel_iommu_iotlb_flush,
     .iotlb_flush_all = intel_iommu_iotlb_flush_all,
+    .get_reserved_device_memory = intel_iommu_get_reserved_device_memory,
     .dump_p2m_table = vtd_dump_p2m_table,
 };
 
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 595f953..cee4535 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -572,7 +572,29 @@ struct xen_vnuma_topology_info {
 typedef struct xen_vnuma_topology_info xen_vnuma_topology_info_t;
 DEFINE_XEN_GUEST_HANDLE(xen_vnuma_topology_info_t);
 
-/* Next available subop number is 27 */
+/*
+ * For legacy reasons, some devices must be configured with special memory
+ * regions to function correctly.  The guest must avoid using any of these
+ * regions.
+ */
+#define XENMEM_reserved_device_memory_map   27
+struct xen_reserved_device_memory {
+    xen_pfn_t start_pfn;
+    xen_ulong_t nr_pages;
+};
+typedef struct xen_reserved_device_memory xen_reserved_device_memory_t;
+DEFINE_XEN_GUEST_HANDLE(xen_reserved_device_memory_t);
+
+struct xen_reserved_device_memory_map {
+    /* IN/OUT */
+    unsigned int nr_entries;
+    /* OUT */
+    XEN_GUEST_HANDLE(xen_reserved_device_memory_t) buffer;
+};
+typedef struct xen_reserved_device_memory_map xen_reserved_device_memory_map_t;
+DEFINE_XEN_GUEST_HANDLE(xen_reserved_device_memory_map_t);
+
+/* Next available subop number is 28 */
 
 #endif /* __XEN_PUBLIC_MEMORY_H__ */
 
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 8eb764a..409f6f8 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -120,6 +120,8 @@ void iommu_dt_domain_destroy(struct domain *d);
 
 struct page_info;
 
+typedef int iommu_grdm_t(xen_pfn_t start, xen_ulong_t nr, void *ctxt);
+
 struct iommu_ops {
     int (*init)(struct domain *d);
     void (*hwdom_init)(struct domain *d);
@@ -156,12 +158,14 @@ struct iommu_ops {
     void (*crash_shutdown)(void);
     void (*iotlb_flush)(struct domain *d, unsigned long gfn, unsigned int page_count);
     void (*iotlb_flush_all)(struct domain *d);
+    int (*get_reserved_device_memory)(iommu_grdm_t *, void *);
     void (*dump_p2m_table)(struct domain *d);
 };
 
 void iommu_suspend(void);
 void iommu_resume(void);
 void iommu_crash_shutdown(void);
+int iommu_get_reserved_device_memory(iommu_grdm_t *, void *);
 
 void iommu_share_p2m_table(struct domain *d);
 
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index 41b3e35..42229fd 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -61,9 +61,10 @@
 !	memory_exchange			memory.h
 !	memory_map			memory.h
 !	memory_reservation		memory.h
-?	mem_access_op		memory.h
+?	mem_access_op			memory.h
 !	pod_target			memory.h
 !	remove_from_physmap		memory.h
+!	reserved_device_memory_map	memory.h
 ?	physdev_eoi			physdev.h
 ?	physdev_get_free_pirq		physdev.h
 ?	physdev_irq			physdev.h
-- 
1.9.1


* [v8][PATCH 04/17] update the existing hypercall to support XEN_DOMCTL_set_rdm
  2014-12-01  9:24 [v8][PATCH 00/17] xen: RMRR fix Tiejun Chen
                   ` (2 preceding siblings ...)
  2014-12-01  9:24 ` [v8][PATCH 03/17] introduce XENMEM_reserved_device_memory_map Tiejun Chen
@ 2014-12-01  9:24 ` Tiejun Chen
  2014-12-02  8:46   ` Tian, Kevin
  2014-12-04 15:50   ` Jan Beulich
  2014-12-01  9:24 ` [v8][PATCH 05/17] tools/libxc: introduce hypercall for xc_reserved_device_memory_map Tiejun Chen
                   ` (13 subsequent siblings)
  17 siblings, 2 replies; 106+ messages in thread
From: Tiejun Chen @ 2014-12-01  9:24 UTC (permalink / raw)
  To: jbeulich, ian.jackson, stefano.stabellini, ian.campbell,
	wei.liu2, kevin.tian, tim, yang.z.zhang
  Cc: xen-devel

Since we now intend to expose that hypercall explicitly based on
XEN_DOMCTL_set_rdm, we need this rework on top of it. I hope we can
squash this into the previous patch once Jan Acks it.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 xen/common/compat/memory.c         | 75 ++++++++++++++++++++++++++++++--------
 xen/common/memory.c                | 71 +++++++++++++++++++++++++++++-------
 xen/drivers/passthrough/vtd/dmar.c | 32 ++++++++++++----
 xen/include/public/memory.h        |  5 +++
 xen/include/xen/iommu.h            |  2 +-
 xen/include/xen/pci.h              |  2 +
 6 files changed, 148 insertions(+), 39 deletions(-)

diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c
index 60512fa..e6a256e 100644
--- a/xen/common/compat/memory.c
+++ b/xen/common/compat/memory.c
@@ -22,27 +22,66 @@ struct get_reserved_device_memory {
     unsigned int used_entries;
 };
 
-static int get_reserved_device_memory(xen_pfn_t start,
-                                      xen_ulong_t nr, void *ctxt)
+static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+                                      u32 id, void *ctxt)
 {
     struct get_reserved_device_memory *grdm = ctxt;
+    struct domain *d;
+    unsigned int i;
+    u32 sbdf;
+    struct compat_reserved_device_memory rdm = {
+        .start_pfn = start, .nr_pages = nr
+    };
 
-    if ( grdm->used_entries < grdm->map.nr_entries )
-    {
-        struct compat_reserved_device_memory rdm = {
-            .start_pfn = start, .nr_pages = nr
-        };
+    if ( rdm.start_pfn != start || rdm.nr_pages != nr )
+        return -ERANGE;
 
-        if ( rdm.start_pfn != start || rdm.nr_pages != nr )
-            return -ERANGE;
+    d = rcu_lock_domain_by_any_id(grdm->map.domid);
+    if ( d == NULL )
+        return -ESRCH;
 
-        if ( __copy_to_compat_offset(grdm->map.buffer, grdm->used_entries,
-                                     &rdm, 1) )
-            return -EFAULT;
+    if ( d )
+    {
+        if ( d->arch.hvm_domain.pci_force )
+        {
+            if ( grdm->used_entries < grdm->map.nr_entries )
+            {
+                if ( __copy_to_compat_offset(grdm->map.buffer,
+                                             grdm->used_entries,
+                                             &rdm, 1) )
+                {
+                    rcu_unlock_domain(d);
+                    return -EFAULT;
+                }
+            }
+            ++grdm->used_entries;
+        }
+        else
+        {
+            for ( i = 0; i < d->arch.hvm_domain.num_pcidevs; i++ )
+            {
+                sbdf = PCI_SBDF2(d->arch.hvm_domain.pcidevs[i].seg,
+                                 d->arch.hvm_domain.pcidevs[i].bus,
+                                 d->arch.hvm_domain.pcidevs[i].devfn);
+                if ( sbdf == id )
+                {
+                    if ( grdm->used_entries < grdm->map.nr_entries )
+                    {
+                        if ( __copy_to_compat_offset(grdm->map.buffer,
+                                                     grdm->used_entries,
+                                                     &rdm, 1) )
+                        {
+                            rcu_unlock_domain(d);
+                            return -EFAULT;
+                        }
+                    }
+                    ++grdm->used_entries;
+                }
+            }
+        }
     }
 
-    ++grdm->used_entries;
-
+    rcu_unlock_domain(d);
     return 0;
 }
 #endif
@@ -319,9 +358,13 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
 
             if ( !rc && grdm.map.nr_entries < grdm.used_entries )
                 rc = -ENOBUFS;
+
             grdm.map.nr_entries = grdm.used_entries;
-            if ( __copy_to_guest(compat, &grdm.map, 1) )
-                rc = -EFAULT;
+            if ( grdm.map.nr_entries )
+            {
+                if ( __copy_to_guest(compat, &grdm.map, 1) )
+                    rc = -EFAULT;
+            }
 
             return rc;
         }
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 4788acc..9ce82b1 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -698,24 +698,63 @@ struct get_reserved_device_memory {
     unsigned int used_entries;
 };
 
-static int get_reserved_device_memory(xen_pfn_t start,
-                                      xen_ulong_t nr, void *ctxt)
+static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+                                      u32 id, void *ctxt)
 {
     struct get_reserved_device_memory *grdm = ctxt;
+    struct domain *d;
+    unsigned int i;
+    u32 sbdf;
+    struct xen_reserved_device_memory rdm = {
+        .start_pfn = start, .nr_pages = nr
+    };
 
-    if ( grdm->used_entries < grdm->map.nr_entries )
-    {
-        struct xen_reserved_device_memory rdm = {
-            .start_pfn = start, .nr_pages = nr
-        };
+    d = rcu_lock_domain_by_any_id(grdm->map.domid);
+    if ( d == NULL )
+        return -ESRCH;
 
-        if ( __copy_to_guest_offset(grdm->map.buffer, grdm->used_entries,
-                                    &rdm, 1) )
-            return -EFAULT;
+    if ( d )
+    {
+        if ( d->arch.hvm_domain.pci_force )
+        {
+            if ( grdm->used_entries < grdm->map.nr_entries )
+            {
+                if ( __copy_to_guest_offset(grdm->map.buffer,
+                                            grdm->used_entries,
+                                            &rdm, 1) )
+                {
+                    rcu_unlock_domain(d);
+                    return -EFAULT;
+                }
+            }
+            ++grdm->used_entries;
+        }
+        else
+        {
+            for ( i = 0; i < d->arch.hvm_domain.num_pcidevs; i++ )
+            {
+                sbdf = PCI_SBDF2(d->arch.hvm_domain.pcidevs[i].seg,
+                                 d->arch.hvm_domain.pcidevs[i].bus,
+                                 d->arch.hvm_domain.pcidevs[i].devfn);
+                if ( sbdf == id )
+                {
+                    if ( grdm->used_entries < grdm->map.nr_entries )
+                    {
+                        if ( __copy_to_guest_offset(grdm->map.buffer,
+                                                    grdm->used_entries,
+                                                    &rdm, 1) )
+                        {
+                            rcu_unlock_domain(d);
+                            return -EFAULT;
+                        }
+                    }
+                    ++grdm->used_entries;
+                }
+            }
+        }
     }
 
-    ++grdm->used_entries;
-
+    rcu_unlock_domain(d);
     return 0;
 }
 #endif
@@ -1144,9 +1183,13 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
         if ( !rc && grdm.map.nr_entries < grdm.used_entries )
             rc = -ENOBUFS;
+
         grdm.map.nr_entries = grdm.used_entries;
-        if ( __copy_to_guest(arg, &grdm.map, 1) )
-            rc = -EFAULT;
+        if ( grdm.map.nr_entries )
+        {
+            if ( __copy_to_guest(arg, &grdm.map, 1) )
+                rc = -EFAULT;
+        }
 
         break;
     }
diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
index 86cfad3..c5bc8d6 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -904,17 +904,33 @@ int platform_supports_x2apic(void)
 
 int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
 {
-    struct acpi_rmrr_unit *rmrr;
+    struct acpi_rmrr_unit *rmrr, *rmrr_cur = NULL;
     int rc = 0;
+    unsigned int i;
+    u16 bdf;
 
-    list_for_each_entry(rmrr, &acpi_rmrr_units, list)
+    for_each_rmrr_device ( rmrr, bdf, i )
     {
-        rc = func(PFN_DOWN(rmrr->base_address),
-                  PFN_UP(rmrr->end_address) - PFN_DOWN(rmrr->base_address),
-                  ctxt);
-        if ( rc )
-            break;
+        if ( rmrr != rmrr_cur )
+        {
+            rc = func(PFN_DOWN(rmrr->base_address),
+                      PFN_UP(rmrr->end_address) -
+                        PFN_DOWN(rmrr->base_address),
+                      PCI_SBDF(rmrr->segment, bdf),
+                      ctxt);
+
+            if ( unlikely(rc < 0) )
+                return rc;
+
+            /* Just go next. */
+            if ( !rc )
+                rmrr_cur = rmrr;
+
+            /* Now just return specific to user requirement. */
+            if ( rc > 0 )
+                return rc;
+        }
     }
 
-    return rc;
+    return 0;
 }
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index cee4535..0d0544e 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -586,6 +586,11 @@ typedef struct xen_reserved_device_memory xen_reserved_device_memory_t;
 DEFINE_XEN_GUEST_HANDLE(xen_reserved_device_memory_t);
 
 struct xen_reserved_device_memory_map {
+    /*
+     * Domain whose reservation is being changed.
+     * Unprivileged domains can specify only DOMID_SELF.
+     */
+    domid_t        domid;
     /* IN/OUT */
     unsigned int nr_entries;
     /* OUT */
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 409f6f8..8fc6d6d 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -120,7 +120,7 @@ void iommu_dt_domain_destroy(struct domain *d);
 
 struct page_info;
 
-typedef int iommu_grdm_t(xen_pfn_t start, xen_ulong_t nr, void *ctxt);
+typedef int iommu_grdm_t(xen_pfn_t start, xen_ulong_t nr, u32 id, void *ctxt);
 
 struct iommu_ops {
     int (*init)(struct domain *d);
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 5f295f3..d34205f 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -31,6 +31,8 @@
 #define PCI_DEVFN2(bdf) ((bdf) & 0xff)
 #define PCI_BDF(b,d,f)  ((((b) & 0xff) << 8) | PCI_DEVFN(d,f))
 #define PCI_BDF2(b,df)  ((((b) & 0xff) << 8) | ((df) & 0xff))
+#define PCI_SBDF(s,bdf) (((s & 0xffff) << 16) | (bdf & 0xffff))
+#define PCI_SBDF2(s,b,df) (((s & 0xffff) << 16) | PCI_BDF2(b,df))
 
 struct pci_dev_info {
     bool_t is_extfn;
-- 
1.9.1


* [v8][PATCH 05/17] tools/libxc: introduce hypercall for xc_reserved_device_memory_map
  2014-12-01  9:24 [v8][PATCH 00/17] xen: RMRR fix Tiejun Chen
                   ` (3 preceding siblings ...)
  2014-12-01  9:24 ` [v8][PATCH 04/17] update the existing hypercall to support XEN_DOMCTL_set_rdm Tiejun Chen
@ 2014-12-01  9:24 ` Tiejun Chen
  2014-12-02  8:46   ` Tian, Kevin
  2014-12-02 19:50   ` Konrad Rzeszutek Wilk
  2014-12-01  9:24 ` [v8][PATCH 06/17] tools/libxc: check if modules space is overlapping with reserved device memory Tiejun Chen
                   ` (12 subsequent siblings)
  17 siblings, 2 replies; 106+ messages in thread
From: Tiejun Chen @ 2014-12-01  9:24 UTC (permalink / raw)
  To: jbeulich, ian.jackson, stefano.stabellini, ian.campbell,
	wei.liu2, kevin.tian, tim, yang.z.zhang
  Cc: xen-devel

We introduce the xc_reserved_device_memory_map() wrapper for that
hypercall into libxc.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 tools/libxc/include/xenctrl.h |  5 +++++
 tools/libxc/xc_domain.c       | 30 ++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 84012fe..a3aeac3 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1294,6 +1294,11 @@ int xc_domain_set_memory_map(xc_interface *xch,
 int xc_get_machine_memory_map(xc_interface *xch,
                               struct e820entry entries[],
                               uint32_t max_entries);
+
+int xc_reserved_device_memory_map(xc_interface *xch,
+                                  uint32_t dom,
+                                  struct xen_reserved_device_memory entries[],
+                                  uint32_t *max_entries);
 #endif
 int xc_domain_set_time_offset(xc_interface *xch,
                               uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 7fd43e9..09fd988 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -679,6 +679,36 @@ int xc_domain_set_memory_map(xc_interface *xch,
 
     return rc;
 }
+
+int xc_reserved_device_memory_map(xc_interface *xch,
+                                  uint32_t domid,
+                                  struct xen_reserved_device_memory entries[],
+                                  uint32_t *max_entries)
+{
+    int rc;
+    struct xen_reserved_device_memory_map xrdmmap = {
+        .domid = domid,
+        .nr_entries = *max_entries
+    };
+    DECLARE_HYPERCALL_BOUNCE(entries,
+                             sizeof(struct xen_reserved_device_memory) *
+                             *max_entries, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+
+    if ( xc_hypercall_bounce_pre(xch, entries) )
+        return -1;
+
+    set_xen_guest_handle(xrdmmap.buffer, entries);
+
+    rc = do_memory_op(xch, XENMEM_reserved_device_memory_map,
+                      &xrdmmap, sizeof(xrdmmap));
+
+    xc_hypercall_bounce_post(xch, entries);
+
+    *max_entries = xrdmmap.nr_entries;
+
+    return rc ? rc : xrdmmap.nr_entries;
+}
+
 int xc_get_machine_memory_map(xc_interface *xch,
                               struct e820entry entries[],
                               uint32_t max_entries)
-- 
1.9.1


* [v8][PATCH 06/17] tools/libxc: check if modules space is overlapping with reserved device memory
  2014-12-01  9:24 [v8][PATCH 00/17] xen: RMRR fix Tiejun Chen
                   ` (4 preceding siblings ...)
  2014-12-01  9:24 ` [v8][PATCH 05/17] tools/libxc: introduce hypercall for xc_reserved_device_memory_map Tiejun Chen
@ 2014-12-01  9:24 ` Tiejun Chen
  2014-12-02  8:54   ` Tian, Kevin
  2014-12-02 19:55   ` Konrad Rzeszutek Wilk
  2014-12-01  9:24 ` [v8][PATCH 07/17] hvmloader/util: get reserved device memory maps Tiejun Chen
                   ` (11 subsequent siblings)
  17 siblings, 2 replies; 106+ messages in thread
From: Tiejun Chen @ 2014-12-01  9:24 UTC (permalink / raw)
  To: jbeulich, ian.jackson, stefano.stabellini, ian.campbell,
	wei.liu2, kevin.tian, tim, yang.z.zhang
  Cc: xen-devel

If reserved device memory overlaps with RAM, it probably also overlaps with
the modules space, so we need to check against these reserved device memory
ranges as well.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 tools/libxc/xc_hvm_build_x86.c | 94 +++++++++++++++++++++++++++++++++++-------
 1 file changed, 79 insertions(+), 15 deletions(-)

diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index c81a25b..ddcf06d 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -54,9 +54,82 @@
 
 #define VGA_HOLE_SIZE (0x20)
 
+/*
+ * Check whether there exists mmio hole in the specified memory range.
+ * Returns 1 if exists, else returns 0.
+ */
+static int check_mmio_hole(uint64_t start, uint64_t memsize,
+                           uint64_t mmio_start, uint64_t mmio_size)
+{
+    if ( start + memsize <= mmio_start || start >= mmio_start + mmio_size )
+        return 0;
+    else
+        return 1;
+}
+
+/* Getting all reserved device memory map info. */
+static struct xen_reserved_device_memory
+*xc_get_reserved_device_memory_map(xc_interface *xch, unsigned int nr_entries,
+                                   uint32_t dom)
+{
+    struct xen_reserved_device_memory *xrdm = NULL;
+    int rc = xc_reserved_device_memory_map(xch, dom, xrdm, &nr_entries);
+
+    if ( rc < 0 )
+    {
+        if ( errno == ENOBUFS )
+        {
+            if ( (xrdm = malloc(nr_entries *
+                                sizeof(xen_reserved_device_memory_t))) == NULL )
+            {
+                PERROR("Could not allocate memory.");
+                return 0;
+            }
+            rc = xc_reserved_device_memory_map(xch, dom, xrdm, &nr_entries);
+            if ( rc )
+            {
+                PERROR("Could not get reserved device memory maps.");
+                free(xrdm);
+                return 0;
+            }
+        }
+        else
+            PERROR("Could not get reserved device memory maps.");
+    }
+
+    return xrdm;
+}
+
+static int xc_check_modules_space(xc_interface *xch, uint64_t *mstart_out,
+                                  uint64_t *mend_out, uint32_t dom)
+{
+    unsigned int i = 0, nr_entries = 0;
+    uint64_t rdm_start = 0, rdm_end = 0;
+    struct xen_reserved_device_memory *rdm_map =
+                        xc_get_reserved_device_memory_map(xch, nr_entries, dom);
+
+    for ( i = 0; i < nr_entries; i++ )
+    {
+        rdm_start = (uint64_t)rdm_map[i].start_pfn << XC_PAGE_SHIFT;
+        rdm_end = rdm_start + ((uint64_t)rdm_map[i].nr_pages << XC_PAGE_SHIFT);
+
+        /* Just use check_mmio_hole() to check modules ranges. */
+        if ( check_mmio_hole(rdm_start,
+                             rdm_end - rdm_start,
+                             *mstart_out, *mend_out) )
+            return -1;
+    }
+    
+    free(rdm_map);
+
+    return 0;
+}
+
 static int modules_init(struct xc_hvm_build_args *args,
                         uint64_t vend, struct elf_binary *elf,
-                        uint64_t *mstart_out, uint64_t *mend_out)
+                        uint64_t *mstart_out, uint64_t *mend_out,
+                        xc_interface *xch,
+                        uint32_t dom)
 {
 #define MODULE_ALIGN 1UL << 7
 #define MB_ALIGN     1UL << 20
@@ -80,6 +153,10 @@ static int modules_init(struct xc_hvm_build_args *args,
     if ( *mend_out > vend )    
         return -1;
 
+    /* Is it overlapping with reserved device memory? */
+    if ( xc_check_modules_space(xch, mstart_out, mend_out, dom) )
+        return -1;
+
     if ( args->acpi_module.length != 0 )
         args->acpi_module.guest_addr_out = *mstart_out;
     if ( args->smbios_module.length != 0 )
@@ -226,19 +303,6 @@ static int loadmodules(xc_interface *xch,
     return rc;
 }
 
-/*
- * Check whether there exists mmio hole in the specified memory range.
- * Returns 1 if exists, else returns 0.
- */
-static int check_mmio_hole(uint64_t start, uint64_t memsize,
-                           uint64_t mmio_start, uint64_t mmio_size)
-{
-    if ( start + memsize <= mmio_start || start >= mmio_start + mmio_size )
-        return 0;
-    else
-        return 1;
-}
-
 static int setup_guest(xc_interface *xch,
                        uint32_t dom, struct xc_hvm_build_args *args,
                        char *image, unsigned long image_size)
@@ -282,7 +346,7 @@ static int setup_guest(xc_interface *xch,
         goto error_out;
     }
 
-    if ( modules_init(args, v_end, &elf, &m_start, &m_end) != 0 )
+    if ( modules_init(args, v_end, &elf, &m_start, &m_end, xch, dom) != 0 )
     {
         ERROR("Insufficient space to load modules.");
         goto error_out;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [v8][PATCH 07/17] hvmloader/util: get reserved device memory maps
  2014-12-01  9:24 [v8][PATCH 00/17] xen: RMRR fix Tiejun Chen
                   ` (5 preceding siblings ...)
  2014-12-01  9:24 ` [v8][PATCH 06/17] tools/libxc: check if modules space is overlapping with reserved device memory Tiejun Chen
@ 2014-12-01  9:24 ` Tiejun Chen
  2014-12-02  8:59   ` Tian, Kevin
                     ` (2 more replies)
  2014-12-01  9:24 ` [v8][PATCH 08/17] hvmloader/mmio: reconcile guest mmio with reserved device memory Tiejun Chen
                   ` (10 subsequent siblings)
  17 siblings, 3 replies; 106+ messages in thread
From: Tiejun Chen @ 2014-12-01  9:24 UTC (permalink / raw)
  To: jbeulich, ian.jackson, stefano.stabellini, ian.campbell,
	wei.liu2, kevin.tian, tim, yang.z.zhang
  Cc: xen-devel

We need to use the reserved device memory maps multiple times, so it is
better to provide one common function to retrieve them.
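
The call follows the usual two-call pattern: ask once to learn the entry
count, allocate a buffer, then ask again. For illustration only, a
self-contained sketch of that pattern with a hypothetical query_rdm()
stub standing in for the memory op:

    #include <errno.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct rdm_entry { uint64_t start_pfn, nr_pages; };

    /* Stand-in for the hypercall: always reports the real count via *nr
     * and returns -ENOBUFS when the caller's buffer is too small. */
    static int query_rdm(struct rdm_entry *entries, unsigned int *nr)
    {
        static const struct rdm_entry table[] = { { 0xad000, 0x10 } };
        unsigned int have = *nr;

        *nr = 1;
        if ( have < 1 || !entries )
            return -ENOBUFS;
        memcpy(entries, table, sizeof(table));
        return 0;
    }

    int main(void)
    {
        unsigned int nr = 0;
        struct rdm_entry *map = NULL;

        if ( query_rdm(NULL, &nr) == -ENOBUFS && nr )  /* learn the count */
        {
            map = calloc(nr, sizeof(*map));
            if ( map && query_rdm(map, &nr) == 0 )     /* fetch the entries */
                printf("%u RDM entries, first at pfn 0x%llx\n",
                       nr, (unsigned long long)map[0].start_pfn);
        }
        free(map);
        return 0;
    }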

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 tools/firmware/hvmloader/util.c | 59 +++++++++++++++++++++++++++++++++++++++++
 tools/firmware/hvmloader/util.h |  2 ++
 2 files changed, 61 insertions(+)

diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 80d822f..dd81fb6 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -22,11 +22,14 @@
 #include "config.h"
 #include "hypercall.h"
 #include "ctype.h"
+#include "errno.h"
 #include <stdint.h>
 #include <xen/xen.h>
 #include <xen/memory.h>
 #include <xen/sched.h>
 
+struct xen_reserved_device_memory *rdm_map;
+
 void wrmsr(uint32_t idx, uint64_t v)
 {
     asm volatile (
@@ -828,6 +831,62 @@ int hpet_exists(unsigned long hpet_base)
     return ((hpet_id >> 16) == 0x8086);
 }
 
+static int
+get_reserved_device_memory_map(struct xen_reserved_device_memory entries[],
+                               uint32_t *max_entries)
+{
+    int rc;
+    struct xen_reserved_device_memory_map xrdmmap = {
+        .domid = DOMID_SELF,
+        .nr_entries = *max_entries
+    };
+
+    set_xen_guest_handle(xrdmmap.buffer, entries);
+
+    rc = hypercall_memory_op(XENMEM_reserved_device_memory_map, &xrdmmap);
+    *max_entries = xrdmmap.nr_entries;
+
+    return rc;
+}
+
+/*
+ * Getting all reserved device memory map info in case of hvmloader.
+ * We just return zero for any failed cases, and this means we
+ * can't further handle any reserved device memory.
+ */
+unsigned int hvm_get_reserved_device_memory_map(void)
+{
+    static unsigned int nr_entries = 0;
+    int rc = get_reserved_device_memory_map(rdm_map, &nr_entries);
+
+    if ( rc == -ENOBUFS )
+    {
+        rdm_map = mem_alloc(nr_entries*sizeof(struct xen_reserved_device_memory),
+                            0);
+        if ( rdm_map )
+        {
+            rc = get_reserved_device_memory_map(rdm_map, &nr_entries);
+            if ( rc )
+            {
+                printf("Could not get reserved dev memory info on domain");
+                return 0;
+            }
+        }
+        else
+        {
+            printf("No space to get reserved dev memory maps!\n");
+            return 0;
+        }
+    }
+    else if ( rc )
+    {
+        printf("Could not get reserved dev memory info on domain");
+        return 0;
+    }
+
+    return nr_entries;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index a70e4aa..e4f1851 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -241,6 +241,8 @@ int build_e820_table(struct e820entry *e820,
                      unsigned int bios_image_base);
 void dump_e820_table(struct e820entry *e820, unsigned int nr);
 
+unsigned int hvm_get_reserved_device_memory_map(void);
+
 #ifndef NDEBUG
 void perform_tests(void);
 #else
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [v8][PATCH 08/17] hvmloader/mmio: reconcile guest mmio with reserved device memory
  2014-12-01  9:24 [v8][PATCH 00/17] xen: RMRR fix Tiejun Chen
                   ` (6 preceding siblings ...)
  2014-12-01  9:24 ` [v8][PATCH 07/17] hvmloader/util: get reserved device memory maps Tiejun Chen
@ 2014-12-01  9:24 ` Tiejun Chen
  2014-12-02  9:11   ` Tian, Kevin
  2014-12-04 16:04   ` Jan Beulich
  2014-12-01  9:24 ` [v8][PATCH 09/17] hvmloader/ram: check if guest memory is out of reserved device memory maps Tiejun Chen
                   ` (9 subsequent siblings)
  17 siblings, 2 replies; 106+ messages in thread
From: Tiejun Chen @ 2014-12-01  9:24 UTC (permalink / raw)
  To: jbeulich, ian.jackson, stefano.stabellini, ian.campbell,
	wei.liu2, kevin.tian, tim, yang.z.zhang
  Cc: xen-devel

We need to make sure MMIO allocations don't overlap any RDM (reserved
device memory). Here we simply skip all reserved device memory ranges
when allocating MMIO space.
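
The skip rule is simply: if the candidate BAR placement hits a reserved
range, bump the base past the end of that range, keeping the BAR's
natural alignment, and retry. For illustration only (hypothetical values,
not the hvmloader code):

    #include <stdint.h>
    #include <stdio.h>

    static int overlaps(uint64_t start, uint64_t size,
                        uint64_t r_start, uint64_t r_size)
    {
        return !(start + size <= r_start || start >= r_start + r_size);
    }

    /* Place a naturally aligned BAR at or above 'base', skipping one
     * reserved range; bar_sz must be a power of two. */
    static uint64_t place_bar(uint64_t base, uint64_t bar_sz,
                              uint64_t rdm_start, uint64_t rdm_end)
    {
        base = (base + bar_sz - 1) & ~(bar_sz - 1);
        if ( overlaps(base, bar_sz, rdm_start, rdm_end - rdm_start) )
            base = (rdm_end + bar_sz - 1) & ~(bar_sz - 1);
        return base;
    }

    int main(void)
    {
        /* Hypothetical: a 16MB BAR colliding with an RMRR at 0xad000000. */
        uint64_t base = place_bar(0xacf00000ULL, 0x1000000ULL,
                                  0xad000000ULL, 0xad800000ULL);

        printf("BAR placed at 0x%llx\n", (unsigned long long)base);
        return 0;
    }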

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 tools/firmware/hvmloader/pci.c  | 54 ++++++++++++++++++++++++++++++++++++++++-
 tools/firmware/hvmloader/util.c |  9 +++++++
 tools/firmware/hvmloader/util.h |  2 ++
 3 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 4e8d803..fc22ab3 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -38,6 +38,30 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
 enum virtual_vga virtual_vga = VGA_none;
 unsigned long igd_opregion_pgbase = 0;
 
+static unsigned int need_skip_rmrr;
+extern struct xen_reserved_device_memory *rdm_map;
+
+static unsigned int
+check_reserved_device_memory_map(uint64_t mmio_base, uint64_t mmio_max)
+{
+    uint32_t i;
+    uint64_t rdm_start, rdm_end;
+    unsigned int nr_rdm_entries = hvm_get_reserved_device_memory_map();
+
+    for ( i = 0; i < nr_rdm_entries; i++ )
+    {
+        rdm_start = (uint64_t)rdm_map[i].start_pfn << PAGE_SHIFT;
+        rdm_end = rdm_start + ((uint64_t)rdm_map[i].nr_pages << PAGE_SHIFT);
+        if ( check_rdm_hole_conflict(mmio_base, mmio_max - mmio_base,
+                                     rdm_start, rdm_end - rdm_start) )
+        {
+            need_skip_rmrr++;
+        }
+    }
+
+    return nr_rdm_entries;
+}
+
 void pci_setup(void)
 {
     uint8_t is_64bar, using_64bar, bar64_relocate = 0;
@@ -59,8 +83,10 @@ void pci_setup(void)
         uint32_t bar_reg;
         uint64_t bar_sz;
     } *bars = (struct bars *)scratch_start;
-    unsigned int i, nr_bars = 0;
+    unsigned int i, j, nr_bars = 0;
     uint64_t mmio_hole_size = 0;
+    unsigned int nr_rdm_entries;
+    uint64_t rdm_start, rdm_end;
 
     const char *s;
     /*
@@ -338,6 +364,14 @@ void pci_setup(void)
     io_resource.base = 0xc000;
     io_resource.max = 0x10000;
 
+    /* Check low mmio range. */
+    nr_rdm_entries = check_reserved_device_memory_map(mem_resource.base,
+                                                      mem_resource.max);
+    /* Check high mmio range. */
+    if ( nr_rdm_entries )
+        nr_rdm_entries = check_reserved_device_memory_map(high_mem_resource.base,
+                                                          high_mem_resource.max);
+
     /* Assign iomem and ioport resources in descending order of size. */
     for ( i = 0; i < nr_bars; i++ )
     {
@@ -393,8 +427,26 @@ void pci_setup(void)
         }
 
         base = (resource->base  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
+ reallocate_mmio:
         bar_data |= (uint32_t)base;
         bar_data_upper = (uint32_t)(base >> 32);
+
+        if ( need_skip_rmrr )
+        {
+            for ( j = 0; j < nr_rdm_entries; j++ )
+            {
+                rdm_start = (uint64_t)rdm_map[j].start_pfn << PAGE_SHIFT;
+                rdm_end = rdm_start + ((uint64_t)rdm_map[j].nr_pages << PAGE_SHIFT);
+                if ( check_rdm_hole_conflict(base, bar_sz,
+                                             rdm_start, rdm_end - rdm_start) )
+                {
+                    base = (rdm_end  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
+                    need_skip_rmrr--;
+                    goto reallocate_mmio;
+                }
+            }
+        }
+
         base += bar_sz;
 
         if ( (base < resource->base) || (base > resource->max) )
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index dd81fb6..8767897 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -887,6 +887,15 @@ unsigned int hvm_get_reserved_device_memory_map(void)
     return nr_entries;
 }
 
+int check_rdm_hole_conflict(uint64_t start, uint64_t size,
+                            uint64_t rdm_start, uint64_t rdm_size)
+{
+    if ( start + size <= rdm_start || start >= rdm_start + rdm_size )
+        return 0;
+    else
+        return 1;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index e4f1851..9b02f95 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -242,6 +242,8 @@ int build_e820_table(struct e820entry *e820,
 void dump_e820_table(struct e820entry *e820, unsigned int nr);
 
 unsigned int hvm_get_reserved_device_memory_map(void);
+int check_rdm_hole_conflict(uint64_t start, uint64_t size,
+                            uint64_t rdm_start, uint64_t rdm_size);
 
 #ifndef NDEBUG
 void perform_tests(void);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [v8][PATCH 09/17] hvmloader/ram: check if guest memory is out of reserved device memory maps
  2014-12-01  9:24 [v8][PATCH 00/17] xen: RMRR fix Tiejun Chen
                   ` (7 preceding siblings ...)
  2014-12-01  9:24 ` [v8][PATCH 08/17] hvmloader/mmio: reconcile guest mmio with reserved device memory Tiejun Chen
@ 2014-12-01  9:24 ` Tiejun Chen
  2014-12-02  9:42   ` Tian, Kevin
                     ` (2 more replies)
  2014-12-01  9:24 ` [v8][PATCH 10/17] hvmloader/mem_hole_alloc: skip any overlap with reserved device memory Tiejun Chen
                   ` (8 subsequent siblings)
  17 siblings, 3 replies; 106+ messages in thread
From: Tiejun Chen @ 2014-12-01  9:24 UTC (permalink / raw)
  To: jbeulich, ian.jackson, stefano.stabellini, ian.campbell,
	wei.liu2, kevin.tian, tim, yang.z.zhang
  Cc: xen-devel

We need to reserve all reserved device memory maps in the e820 table
to avoid any potential guest memory conflict.

Currently, if we can't insert RDM entries directly, we may need to
handle several kinds of ranges as follows:
a. Fixed ranges --> BUG()
 lowmem_reserved_base-0xA0000: reserved by the BIOS implementation,
 the BIOS region,
 RESERVED_MEMBASE ~ 0x100000000
b. RAM or RAM:Hole --> try to reserve
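
For illustration only, one sub-case of b above ("RDM strictly inside a
RAM entry") just splits the entry into RAM / RESERVED / RAM; a
simplified, standalone sketch of that single case:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define E820_RAM      1
    #define E820_RESERVED 2

    struct entry { uint64_t addr, size; uint32_t type; };

    /* Split e820[idx] (RAM, fully containing the RDM) into
     * RAM / RESERVED / RAM and return the new entry count. */
    static unsigned int split_ram(struct entry *e820, unsigned int nr,
                                  unsigned int idx,
                                  uint64_t rdm_start, uint64_t rdm_end)
    {
        memmove(&e820[idx + 2], &e820[idx], (nr - idx) * sizeof(*e820));

        e820[idx + 2].addr = rdm_end;
        e820[idx + 2].size = e820[idx].addr + e820[idx].size - rdm_end;
        e820[idx + 2].type = E820_RAM;

        e820[idx + 1].addr = rdm_start;
        e820[idx + 1].size = rdm_end - rdm_start;
        e820[idx + 1].type = E820_RESERVED;

        e820[idx].size = rdm_start - e820[idx].addr;

        return nr + 2;
    }

    int main(void)
    {
        struct entry e820[8] = { { 0x100000, 0xbff00000, E820_RAM } };
        unsigned int i, nr = split_ram(e820, 1, 0, 0xad000000, 0xad010000);

        for ( i = 0; i < nr; i++ )
            printf("%#llx-%#llx type %u\n",
                   (unsigned long long)e820[i].addr,
                   (unsigned long long)(e820[i].addr + e820[i].size),
                   e820[i].type);
        return 0;
    }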

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 tools/firmware/hvmloader/e820.c | 168 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 168 insertions(+)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 2e05e93..ef87e41 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -22,6 +22,7 @@
 
 #include "config.h"
 #include "util.h"
+#include <xen/memory.h>
 
 void dump_e820_table(struct e820entry *e820, unsigned int nr)
 {
@@ -68,12 +69,173 @@ void dump_e820_table(struct e820entry *e820, unsigned int nr)
     }
 }
 
+extern struct xen_reserved_device_memory *rdm_map;
+static unsigned int construct_rdm_e820_maps(unsigned int next_e820_entry_index,
+                                            uint32_t nr_map,
+                                            struct xen_reserved_device_memory *map,
+                                            struct e820entry *e820,
+                                            unsigned int lowmem_reserved_base,
+                                            unsigned int bios_image_base)
+{
+    unsigned int i, j, sum_nr;
+    uint64_t start, end, next_start, rdm_start, rdm_end;
+    uint32_t type;
+    int err = 0;
+
+    for ( i = 0; i < nr_map; i++ )
+    {
+        rdm_start = (uint64_t)map[i].start_pfn << PAGE_SHIFT;
+        rdm_end = rdm_start + ((uint64_t)map[i].nr_pages << PAGE_SHIFT);
+
+        for ( j = 0; j < next_e820_entry_index - 1; j++ )
+        {
+            sum_nr = next_e820_entry_index + nr_map;
+            start = e820[j].addr;
+            end = e820[j].addr + e820[j].size;
+            type = e820[j].type;
+            next_start = e820[j+1].addr;
+
+            if ( rdm_start >= start && rdm_start <= end )
+            {
+                /*
+                 * lowmem_reserved_base-0xA0000: reserved by BIOS
+                 * implementation.
+                 * Or BIOS region.
+                 */
+                if ( (lowmem_reserved_base < 0xA0000 &&
+                        start == lowmem_reserved_base) ||
+                     start == bios_image_base )
+                {
+                    err = -1;
+                    break;
+                }
+            }
+
+            /* Just amid those remaining e820 entries. */
+            if ( (rdm_start > end) && (rdm_end < next_start) )
+            {
+                memmove(&e820[j+2], &e820[j+1],
+                        (sum_nr - j - 1) * sizeof(struct e820entry));
+
+                /* Then fill RMRR into that entry. */
+                e820[j+1].addr = rdm_start;
+                e820[j+1].size = rdm_end - rdm_start;
+                e820[j+1].type = E820_RESERVED;
+                next_e820_entry_index++;
+                continue;
+            }
+
+            /* Already at the end. */
+            if ( (rdm_start > end) && !next_start )
+            {
+                e820[next_e820_entry_index].addr = rdm_start;
+                e820[next_e820_entry_index].size = rdm_end - rdm_start;
+                e820[next_e820_entry_index].type = E820_RESERVED;
+                next_e820_entry_index++;
+                continue;
+            }
+
+            if ( type == E820_RAM )
+            {
+                /* If coincide with one RAM range. */
+                if ( rdm_start == start && rdm_end == end)
+                {
+                    e820[j].type = E820_RESERVED;
+                    continue;
+                }
+
+                /* If we're just aligned with start of one RAM range. */
+                if ( rdm_start == start && rdm_end < end )
+                {
+                    memmove(&e820[j+1], &e820[j],
+                            (sum_nr - j) * sizeof(struct e820entry));
+
+                    e820[j+1].addr = rdm_end;
+                    e820[j+1].size = e820[j].addr + e820[j].size - rdm_end;
+                    e820[j+1].type = E820_RAM;
+                    next_e820_entry_index++;
+
+                    e820[j].addr = rdm_start;
+                    e820[j].size = rdm_end - rdm_start;
+                    e820[j].type = E820_RESERVED;
+                    continue;
+                }
+
+                /* If we're just aligned with end of one RAM range. */
+                if ( rdm_start > start && rdm_end == end )
+                {
+                    memmove(&e820[j+1], &e820[j],
+                            (sum_nr - j) * sizeof(struct e820entry));
+
+                    e820[j].size = rdm_start - e820[j].addr;
+                    e820[j].type = E820_RAM;
+
+                    e820[j+1].addr = rdm_start;
+                    e820[j+1].size = rdm_end - rdm_start;
+                    e820[j+1].type = E820_RESERVED;
+                    next_e820_entry_index++;
+                    continue;
+                }
+
+                /* If we're just in of one RAM range */
+                if ( rdm_start > start && rdm_end < end )
+                {
+                    memmove(&e820[j+2], &e820[j],
+                            (sum_nr - j) * sizeof(struct e820entry));
+
+                    e820[j+2].addr = rdm_end;
+                    e820[j+2].size = e820[j].addr + e820[j].size - rdm_end;
+                    e820[j+2].type = E820_RAM;
+                    next_e820_entry_index++;
+
+                    e820[j+1].addr = rdm_start;
+                    e820[j+1].size = rdm_end - rdm_start;
+                    e820[j+1].type = E820_RESERVED;
+                    next_e820_entry_index++;
+
+                    e820[j].size = rdm_start - e820[j].addr;
+                    e820[j].type = E820_RAM;
+                    continue;
+                }
+
+                /* If we're going last RAM:Hole range */
+                if ( end < next_start && rdm_start > start &&
+                     rdm_end < next_start )
+                {
+                    memmove(&e820[j+1], &e820[j],
+                            (sum_nr - j) * sizeof(struct e820entry));
+
+                    e820[j].size = rdm_start - e820[j].addr;
+                    e820[j].type = E820_RAM;
+
+                    e820[j+1].addr = rdm_start;
+                    e820[j+1].size = rdm_end - rdm_start;
+                    e820[j+1].type = E820_RESERVED;
+                    next_e820_entry_index++;
+                    continue;
+                }
+            }
+        }
+    }
+
+    /* These overlap may issue guest can't work well. */
+    if ( err )
+    {
+        printf("Guest can't work with some reserved device memory overlap!\n");
+        BUG();
+    }
+
+    /* Fine to construct RDM mappings into e820. */
+    return next_e820_entry_index;
+}
+
 /* Create an E820 table based on memory parameters provided in hvm_info. */
 int build_e820_table(struct e820entry *e820,
                      unsigned int lowmem_reserved_base,
                      unsigned int bios_image_base)
 {
     unsigned int nr = 0;
+    unsigned int nr_entries = 0;
 
     if ( !lowmem_reserved_base )
             lowmem_reserved_base = 0xA0000;
@@ -169,6 +331,12 @@ int build_e820_table(struct e820entry *e820,
         nr++;
     }
 
+    nr_entries = hvm_get_reserved_device_memory_map();
+    if ( nr_entries )
+        nr = construct_rdm_e820_maps(nr, nr_entries, rdm_map, e820,
+                                     lowmem_reserved_base,
+                                     bios_image_base);
+
     return nr;
 }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [v8][PATCH 10/17] hvmloader/mem_hole_alloc: skip any overlap with reserved device memory
  2014-12-01  9:24 [v8][PATCH 00/17] xen: RMRR fix Tiejun Chen
                   ` (8 preceding siblings ...)
  2014-12-01  9:24 ` [v8][PATCH 09/17] hvmloader/ram: check if guest memory is out of reserved device memory maps Tiejun Chen
@ 2014-12-01  9:24 ` Tiejun Chen
  2014-12-02  9:48   ` Tian, Kevin
                     ` (2 more replies)
  2014-12-01  9:24 ` [v8][PATCH 11/17] xen/x86/p2m: reject populating for reserved device memory mapping Tiejun Chen
                   ` (7 subsequent siblings)
  17 siblings, 3 replies; 106+ messages in thread
From: Tiejun Chen @ 2014-12-01  9:24 UTC (permalink / raw)
  To: jbeulich, ian.jackson, stefano.stabellini, ian.campbell,
	wei.liu2, kevin.tian, tim, yang.z.zhang
  Cc: xen-devel

In some cases, like igd_opregion_pgbase, the guest uses mem_hole_alloc
to allocate memory that is used at runtime, so we also need to make
sure reserved device memory doesn't overlap such a region.
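
Since the allocator grows downwards, a conflict is resolved by dropping
the allocation to just below the reserved range. For illustration only
(hypothetical numbers, not the hvmloader allocator, and handling only a
single reserved range):

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT 12

    static int overlaps(uint64_t s, uint64_t e, uint64_t rs, uint64_t re)
    {
        return !(e <= rs || s >= re);      /* half-open intervals */
    }

    /* Allocate nr_pages just below *alloc_down, skipping one reserved range. */
    static uint64_t alloc_hole(uint64_t *alloc_down, uint32_t nr_pages,
                               uint64_t rdm_start, uint64_t rdm_end)
    {
        uint64_t size = (uint64_t)nr_pages << PAGE_SHIFT;
        uint64_t start = *alloc_down - size;

        if ( overlaps(start, start + size, rdm_start, rdm_end) )
            start = rdm_start - size;      /* drop below the reserved range */

        *alloc_down = start;
        return start >> PAGE_SHIFT;        /* return a pfn, like the original */
    }

    int main(void)
    {
        uint64_t alloc_down = 0xfe000000ULL;   /* hypothetical dynamic end */
        uint64_t pfn = alloc_hole(&alloc_down, 2,
                                  0xfdffe000ULL, 0xfe000000ULL);

        printf("allocated at pfn 0x%llx\n", (unsigned long long)pfn);
        return 0;
    }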

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 tools/firmware/hvmloader/util.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 8767897..f3723c7 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -416,9 +416,29 @@ static uint32_t alloc_down = RESERVED_MEMORY_DYNAMIC_END;
 
 xen_pfn_t mem_hole_alloc(uint32_t nr_mfns)
 {
+    unsigned int i, num = hvm_get_reserved_device_memory_map();
+    uint64_t rdm_start, rdm_end;
+    uint32_t alloc_start, alloc_end;
+
     alloc_down -= nr_mfns << PAGE_SHIFT;
+    alloc_start = alloc_down;
+    alloc_end = alloc_start + (nr_mfns << PAGE_SHIFT);
+    for ( i = 0; i < num; i++ )
+    {
+        rdm_start = (uint64_t)rdm_map[i].start_pfn << PAGE_SHIFT;
+        rdm_end = rdm_start + ((uint64_t)rdm_map[i].nr_pages << PAGE_SHIFT);
+        if ( check_rdm_hole_conflict((uint64_t)alloc_start,
+                                     (uint64_t)alloc_end,
+                                     rdm_start, rdm_end - rdm_start) )
+        {
+            alloc_end = rdm_start;
+            alloc_start = alloc_end - (nr_mfns << PAGE_SHIFT);
+            BUG_ON(alloc_up >= alloc_start);
+        }
+    }
+
     BUG_ON(alloc_up >= alloc_down);
-    return alloc_down >> PAGE_SHIFT;
+    return alloc_start >> PAGE_SHIFT;
 }
 
 void *mem_alloc(uint32_t size, uint32_t align)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [v8][PATCH 11/17] xen/x86/p2m: reject populating for reserved device memory mapping
  2014-12-01  9:24 [v8][PATCH 00/17] xen: RMRR fix Tiejun Chen
                   ` (9 preceding siblings ...)
  2014-12-01  9:24 ` [v8][PATCH 10/17] hvmloader/mem_hole_alloc: skip any overlap with reserved device memory Tiejun Chen
@ 2014-12-01  9:24 ` Tiejun Chen
  2014-12-02  9:57   ` Tian, Kevin
  2014-12-04 16:42   ` Jan Beulich
  2014-12-01  9:24 ` [v8][PATCH 12/17] xen/x86/ept: handle reserved device memory in ept_handle_violation Tiejun Chen
                   ` (6 subsequent siblings)
  17 siblings, 2 replies; 106+ messages in thread
From: Tiejun Chen @ 2014-12-01  9:24 UTC (permalink / raw)
  To: jbeulich, ian.jackson, stefano.stabellini, ian.campbell,
	wei.liu2, kevin.tian, tim, yang.z.zhang
  Cc: xen-devel

We need to reject populating reserved device memory mappings, and then
make sure reserved device memory can't be accessed through any
non-IOMMU path.
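
For illustration only, the per-gfn test amounts to: with pci_force set,
refuse any RDM range covering the gfn; otherwise refuse only ranges tied
(by SBDF) to one of the domain's assigned devices. A simplified sketch
with hypothetical types:

    #include <stdint.h>
    #include <stdio.h>

    struct rdm { uint64_t start, nr; uint32_t sbdf; };

    /* 1 if 'gfn' falls in an RDM range this domain must not populate. */
    static int gfn_hits_rdm(uint64_t gfn, int pci_force,
                            const uint32_t *assigned_sbdf, unsigned int nr_dev,
                            const struct rdm *map, unsigned int nr_rdm)
    {
        unsigned int i, j;

        for ( i = 0; i < nr_rdm; i++ )
        {
            if ( gfn < map[i].start || gfn >= map[i].start + map[i].nr )
                continue;
            if ( pci_force )
                return 1;                  /* every RDM range matters */
            for ( j = 0; j < nr_dev; j++ ) /* only the assigned devices' */
                if ( assigned_sbdf[j] == map[i].sbdf )
                    return 1;
        }
        return 0;
    }

    int main(void)
    {
        const struct rdm map[] = { { 0xad000, 0x10, 0x0010 } };
        const uint32_t devs[] = { 0x0010 };

        printf("hit: %d\n", gfn_hits_rdm(0xad008, 0, devs, 1, map, 1));
        return 0;
    }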

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 xen/arch/x86/mm/p2m.c     | 59 +++++++++++++++++++++++++++++++++++++++++++++--
 xen/include/asm-x86/p2m.h |  9 ++++++++
 2 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index efa49dd..607ecd0 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -556,6 +556,40 @@ guest_physmap_remove_page(struct domain *d, unsigned long gfn,
     gfn_unlock(p2m, gfn, page_order);
 }
 
+/* Check if we are accessing rdm. */
+int p2m_check_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+                                     u32 id, void *ctxt)
+{
+    xen_pfn_t end = start + nr;
+    unsigned int i;
+    u32 sbdf;
+    struct p2m_get_reserved_device_memory *pgrdm = ctxt;
+    struct domain *d = pgrdm->domain;
+
+    if ( d->arch.hvm_domain.pci_force )
+    {
+        if ( pgrdm->gfn >= start && pgrdm->gfn < end )
+            return 1;
+    }
+    else
+    {
+        for ( i = 0; i < d->arch.hvm_domain.num_pcidevs; i++ )
+        {
+            sbdf = PCI_SBDF2(d->arch.hvm_domain.pcidevs[i].seg,
+                             d->arch.hvm_domain.pcidevs[i].bus,
+                             d->arch.hvm_domain.pcidevs[i].devfn);
+
+            if ( sbdf == id )
+            {
+                if ( pgrdm->gfn >= start && pgrdm->gfn < end )
+                    return 1;
+            }
+        }
+    }
+
+    return 0;
+}
+
 int
 guest_physmap_add_entry(struct domain *d, unsigned long gfn,
                         unsigned long mfn, unsigned int page_order, 
@@ -568,6 +602,7 @@ guest_physmap_add_entry(struct domain *d, unsigned long gfn,
     mfn_t omfn;
     int pod_count = 0;
     int rc = 0;
+    struct p2m_get_reserved_device_memory pgrdm;
 
     if ( !paging_mode_translate(d) )
     {
@@ -686,8 +721,28 @@ guest_physmap_add_entry(struct domain *d, unsigned long gfn,
     /* Now, actually do the two-way mapping */
     if ( mfn_valid(_mfn(mfn)) ) 
     {
-        rc = p2m_set_entry(p2m, gfn, _mfn(mfn), page_order, t,
-                           p2m->default_access);
+        pgrdm.gfn = gfn;
+        pgrdm.domain = d;
+        if ( !is_hardware_domain(d) && iommu_use_hap_pt(d) )
+        {
+            rc = iommu_get_reserved_device_memory(p2m_check_reserved_device_memory,
+                                                  &pgrdm);
+            /* We always avoid populating reserved device memory. */
+            if ( rc == 1 )
+            {
+                rc = -EBUSY;
+                goto out;
+            }
+            else if ( rc < 0 )
+            {
+                printk(XENLOG_G_WARNING
+                       "Can't check reserved device memory for Dom%d.\n",
+                       d->domain_id);
+                goto out;
+            }
+        }
+
+        rc = p2m_set_entry(p2m, gfn, _mfn(mfn), page_order, t, p2m->default_access);
         if ( rc )
             goto out; /* Failed to update p2m, bail without updating m2p. */
 
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 5f7fe71..99f7fb7 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -709,6 +709,15 @@ static inline unsigned int p2m_get_iommu_flags(p2m_type_t p2mt)
     return flags;
 }
 
+struct p2m_get_reserved_device_memory {
+    unsigned long gfn;
+    struct domain *domain;
+};
+
+/* Check if we are accessing rdm. */
+extern int p2m_check_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+                                            u32 id, void *ctxt);
+
 #endif /* _XEN_P2M_H */
 
 /*
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [v8][PATCH 12/17] xen/x86/ept: handle reserved device memory in ept_handle_violation
  2014-12-01  9:24 [v8][PATCH 00/17] xen: RMRR fix Tiejun Chen
                   ` (10 preceding siblings ...)
  2014-12-01  9:24 ` [v8][PATCH 11/17] xen/x86/p2m: reject populating for reserved device memory mapping Tiejun Chen
@ 2014-12-01  9:24 ` Tiejun Chen
  2014-12-02  9:59   ` Tian, Kevin
                     ` (2 more replies)
  2014-12-01  9:24 ` [v8][PATCH 13/17] xen/mem_access: don't allow accessing reserved device memory Tiejun Chen
                   ` (5 subsequent siblings)
  17 siblings, 3 replies; 106+ messages in thread
From: Tiejun Chen @ 2014-12-01  9:24 UTC (permalink / raw)
  To: jbeulich, ian.jackson, stefano.stabellini, ian.campbell,
	wei.liu2, kevin.tian, tim, yang.z.zhang
  Cc: xen-devel

We always reserve these ranges since we never allow anything to poke
them.

But in theory an untrusted VM could maliciously access them, so we need
to intercept such accesses. We just don't want to leak anything or
introduce any side effects, since other OSes may touch these ranges out
of carelessness, so a lightweight response is enough, and it shouldn't
be treated like a broken page, which would crash the domain.

So we just return with the next EIP and let the VM/OS itself handle
such a scenario with its own logic.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 2907afa..3ee884a 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2403,6 +2403,7 @@ static void ept_handle_violation(unsigned long qualification, paddr_t gpa)
     p2m_type_t p2mt;
     int ret;
     struct domain *d = current->domain;
+    struct p2m_get_reserved_device_memory pgrdm;
 
     /*
      * We treat all write violations also as read violations.
@@ -2438,6 +2439,23 @@ static void ept_handle_violation(unsigned long qualification, paddr_t gpa)
         __trace_var(TRC_HVM_NPF, 0, sizeof(_d), &_d);
     }
 
+    /* This means some untrusted VM can maliciously access reserved
+     * device memory. But we just don't want to leak anything or
+     * introduce any side affect since other OSs may touch them by
+     * careless behavior, so its enough to have a lightweight way.
+     * Here we just need to return with next eip then let VM/OS itself
+     * handle such a scenario as its own logic.
+     */
+    pgrdm.gfn = gfn;
+    pgrdm.domain = d;
+    ret = iommu_get_reserved_device_memory(p2m_check_reserved_device_memory,
+                                           &pgrdm);
+    if ( ret )
+    {
+        update_guest_eip();
+        return;
+    }
+
     if ( qualification & EPT_GLA_VALID )
     {
         __vmread(GUEST_LINEAR_ADDRESS, &gla);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [v8][PATCH 13/17] xen/mem_access: don't allow accessing reserved device memory
  2014-12-01  9:24 [v8][PATCH 00/17] xen: RMRR fix Tiejun Chen
                   ` (11 preceding siblings ...)
  2014-12-01  9:24 ` [v8][PATCH 12/17] xen/x86/ept: handle reserved device memory in ept_handle_violation Tiejun Chen
@ 2014-12-01  9:24 ` Tiejun Chen
  2014-12-02 14:54   ` Julien Grall
                     ` (2 more replies)
  2014-12-01  9:24 ` [v8][PATCH 14/17] xen/x86/p2m: introduce set_identity_p2m_entry Tiejun Chen
                   ` (4 subsequent siblings)
  17 siblings, 3 replies; 106+ messages in thread
From: Tiejun Chen @ 2014-12-01  9:24 UTC (permalink / raw)
  To: jbeulich, ian.jackson, stefano.stabellini, ian.campbell,
	wei.liu2, kevin.tian, tim, yang.z.zhang
  Cc: xen-devel

We can't expose reserved device memory through mem_access since any
access may corrupt device usage.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 xen/common/mem_access.c | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/xen/common/mem_access.c b/xen/common/mem_access.c
index 6c2724b..72a807a 100644
--- a/xen/common/mem_access.c
+++ b/xen/common/mem_access.c
@@ -55,6 +55,43 @@ void mem_access_resume(struct domain *d)
     }
 }
 
+/* We can't expose reserved device memory. */
+static int mem_access_check_rdm(struct domain *d, uint64_aligned_t start,
+                                uint32_t nr)
+{
+    uint32_t i;
+    struct p2m_get_reserved_device_memory pgrdm;
+    int rc = 0;
+
+    if ( !is_hardware_domain(d) && iommu_use_hap_pt(d) )
+    {
+        for ( i = 0; i < nr; i++ )
+        {
+            pgrdm.gfn = start + i;
+            pgrdm.domain = d;
+            rc = iommu_get_reserved_device_memory(p2m_check_reserved_device_memory,
+                                                  &pgrdm);
+            if ( rc < 0 )
+            {
+                printk(XENLOG_WARNING
+                       "Domain %d can't check reserved device memory.\n",
+                       d->domain_id);
+                return rc;
+            }
+
+            if ( rc == 1 )
+            {
+                printk(XENLOG_WARNING
+                       "Domain %d: we shouldn't mem_access reserved device memory.\n",
+                       d->domain_id);
+                return rc;
+            }
+        }
+    }
+
+    return rc;
+}
+
 int mem_access_memop(unsigned long cmd,
                      XEN_GUEST_HANDLE_PARAM(xen_mem_access_op_t) arg)
 {
@@ -99,6 +136,10 @@ int mem_access_memop(unsigned long cmd,
               ((mao.pfn + mao.nr - 1) > domain_get_maximum_gpfn(d))) )
             break;
 
+        rc =  mem_access_check_rdm(d, mao.pfn, mao.nr);
+        if ( rc == 1 )
+            break;
+
         rc = p2m_set_mem_access(d, mao.pfn, mao.nr, start_iter,
                                 MEMOP_CMD_MASK, mao.access);
         if ( rc > 0 )
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [v8][PATCH 14/17] xen/x86/p2m: introduce set_identity_p2m_entry
  2014-12-01  9:24 [v8][PATCH 00/17] xen: RMRR fix Tiejun Chen
                   ` (12 preceding siblings ...)
  2014-12-01  9:24 ` [v8][PATCH 13/17] xen/mem_access: don't allow accessing reserved device memory Tiejun Chen
@ 2014-12-01  9:24 ` Tiejun Chen
  2014-12-02 10:00   ` Tian, Kevin
  2014-12-02 20:29   ` Konrad Rzeszutek Wilk
  2014-12-01  9:24 ` [v8][PATCH 15/17] xen:vtd: create RMRR mapping Tiejun Chen
                   ` (3 subsequent siblings)
  17 siblings, 2 replies; 106+ messages in thread
From: Tiejun Chen @ 2014-12-01  9:24 UTC (permalink / raw)
  To: jbeulich, ian.jackson, stefano.stabellini, ian.campbell,
	wei.liu2, kevin.tian, tim, yang.z.zhang
  Cc: xen-devel

We will create the RMRR mapping as follows:

If the gfn space is unoccupied, we just set the 1:1 mapping. If the
space is already occupied by an identical 1:1 RMRR mapping, do nothing.
Anything else fails.
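
For illustration only, the decision on the existing p2m entry is a simple
three-way rule (simplified types, not the p2m code):

    #include <stdio.h>

    enum p2m_t { P2M_INVALID, P2M_MMIO_DIRECT, P2M_RAM };

    /*  0: set a new 1:1 mapping,
     *  1: already identity-mapped, nothing to do,
     * -1: gfn occupied by something else, fail. */
    static int identity_map_decision(int entry_valid, enum p2m_t type,
                                     unsigned long gfn, unsigned long mfn)
    {
        if ( !entry_valid )
            return 0;
        if ( mfn == gfn && type == P2M_MMIO_DIRECT )
            return 1;
        return -1;
    }

    int main(void)
    {
        printf("%d %d %d\n",
               identity_map_decision(0, P2M_INVALID, 0xad000, 0),
               identity_map_decision(1, P2M_MMIO_DIRECT, 0xad000, 0xad000),
               identity_map_decision(1, P2M_RAM, 0xad000, 0x1234));
        return 0;
    }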

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 xen/arch/x86/mm/p2m.c     | 28 ++++++++++++++++++++++++++++
 xen/include/asm-x86/p2m.h |  4 ++++
 2 files changed, 32 insertions(+)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 607ecd0..c415521 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -913,6 +913,34 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
     return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct);
 }
 
+int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
+                           p2m_access_t p2ma)
+{
+    p2m_type_t p2mt;
+    p2m_access_t a;
+    mfn_t mfn;
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    int ret = -EBUSY;
+
+    gfn_lock(p2m, gfn, 0);
+
+    mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL);
+
+    if ( !mfn_valid(mfn) )
+        ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K, p2m_mmio_direct,
+                            p2ma);
+    else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a == p2ma )
+        ret = 0;
+    else
+        printk(XENLOG_G_WARNING
+               "Cannot identity map d%d:%lx, already mapped to %lx.\n",
+               d->domain_id, gfn, mfn_x(mfn));
+
+    gfn_unlock(p2m, gfn, 0);
+
+    return ret;
+}
+
 /* Returns: 0 for success, -errno for failure */
 int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
 {
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 99f7fb7..26cf0cc 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -509,6 +509,10 @@ int p2m_is_logdirty_range(struct p2m_domain *, unsigned long start,
 int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
 int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
 
+/* Set identity addresses in the p2m table (for pass-through) */
+int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
+                           p2m_access_t p2ma);
+
 /* Add foreign mapping to the guest's p2m table. */
 int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
                     unsigned long gpfn, domid_t foreign_domid);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [v8][PATCH 15/17] xen:vtd: create RMRR mapping
  2014-12-01  9:24 [v8][PATCH 00/17] xen: RMRR fix Tiejun Chen
                   ` (13 preceding siblings ...)
  2014-12-01  9:24 ` [v8][PATCH 14/17] xen/x86/p2m: introduce set_identity_p2m_entry Tiejun Chen
@ 2014-12-01  9:24 ` Tiejun Chen
  2014-12-02 10:02   ` Tian, Kevin
  2014-12-02 20:30   ` Konrad Rzeszutek Wilk
  2014-12-01  9:24 ` [v8][PATCH 16/17] xen/vtd: group assigned device with RMRR Tiejun Chen
                   ` (2 subsequent siblings)
  17 siblings, 2 replies; 106+ messages in thread
From: Tiejun Chen @ 2014-12-01  9:24 UTC (permalink / raw)
  To: jbeulich, ian.jackson, stefano.stabellini, ian.campbell,
	wei.liu2, kevin.tian, tim, yang.z.zhang
  Cc: xen-devel

intel_iommu_map_page() does nothing if VT-d shares the EPT page table,
so rmrr_identity_mapping() never creates the RMRR mapping; but in some
cases, such as some GFX drivers, the device still needs to access the
RMRR.

Here we create those RMRR mappings even in the shared EPT case.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 xen/drivers/passthrough/vtd/iommu.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index a38f201..a54c6eb 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1856,10 +1856,15 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
 
     while ( base_pfn < end_pfn )
     {
-        int err = intel_iommu_map_page(d, base_pfn, base_pfn,
-                                       IOMMUF_readable|IOMMUF_writable);
-
-        if ( err )
+        int err = 0;
+        if ( iommu_use_hap_pt(d) )
+        {
+            ASSERT(!iommu_passthrough || !is_hardware_domain(d));
+            if ( (err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw)) )
+                return err;
+        }
+        else if ( (err = intel_iommu_map_page(d, base_pfn, base_pfn,
+					      IOMMUF_readable|IOMMUF_writable)) )
             return err;
         base_pfn++;
     }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [v8][PATCH 16/17] xen/vtd: group assigned device with RMRR
  2014-12-01  9:24 [v8][PATCH 00/17] xen: RMRR fix Tiejun Chen
                   ` (14 preceding siblings ...)
  2014-12-01  9:24 ` [v8][PATCH 15/17] xen:vtd: create RMRR mapping Tiejun Chen
@ 2014-12-01  9:24 ` Tiejun Chen
  2014-12-02 10:11   ` Tian, Kevin
                     ` (2 more replies)
  2014-12-01  9:24 ` [v8][PATCH 17/17] xen/vtd: re-enable USB device assignment if enable pci_force Tiejun Chen
  2014-12-02 19:17 ` [v8][PATCH 00/17] xen: RMRR fix Konrad Rzeszutek Wilk
  17 siblings, 3 replies; 106+ messages in thread
From: Tiejun Chen @ 2014-12-01  9:24 UTC (permalink / raw)
  To: jbeulich, ian.jackson, stefano.stabellini, ian.campbell,
	wei.liu2, kevin.tian, tim, yang.z.zhang
  Cc: xen-devel

Sometimes different devices may share an RMRR range, and in this case
we shouldn't assign these devices to different VMs since that could
lead to leakage or even damage between VMs.

So we need to group all devices by RMRR range to make sure they are
only ever assigned to the same VM.

Here we introduce two fields, gid and domid, in struct acpi_rmrr_unit:
 gid: indicates which group this device belongs to. "0" is invalid so
      group ids start from "1".
 domid: indicates which domain currently owns this device. Initially
        the hardware domain owns it.
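
For illustration only, the grouping idea: ranges that are the same or
overlap inherit an existing gid, otherwise a new gid is allocated (the
patch below only checks full containment; this sketch uses a generic
overlap test):

    #include <stdint.h>
    #include <stdio.h>

    struct rmrr { uint64_t base, end; unsigned int gid; };

    /* gids count from 1; 0 means "not assigned yet". */
    static void assign_groups(struct rmrr *units, unsigned int nr)
    {
        unsigned int i, j, next_gid = 1;

        for ( i = 0; i < nr; i++ )
        {
            units[i].gid = 0;
            for ( j = 0; j < i; j++ )
                if ( units[i].base < units[j].end &&
                     units[j].base < units[i].end )
                {
                    units[i].gid = units[j].gid;
                    break;
                }
            if ( !units[i].gid )
                units[i].gid = next_gid++;
        }
    }

    int main(void)
    {
        struct rmrr u[] = {
            { 0xad000000, 0xad010000 },  /* device A */
            { 0xad008000, 0xad020000 },  /* device B, overlaps A: same group */
            { 0xbf000000, 0xbf001000 },  /* device C: its own group */
        };
        unsigned int i;

        assign_groups(u, 3);
        for ( i = 0; i < 3; i++ )
            printf("unit %u -> group %u\n", i, u[i].gid);
        return 0;
    }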

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 xen/drivers/passthrough/vtd/dmar.c  | 28 ++++++++++++++-
 xen/drivers/passthrough/vtd/dmar.h  |  2 ++
 xen/drivers/passthrough/vtd/iommu.c | 68 +++++++++++++++++++++++++++++++++----
 3 files changed, 91 insertions(+), 7 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
index c5bc8d6..8d3406f 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -572,10 +572,11 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
 {
     struct acpi_dmar_reserved_memory *rmrr =
         container_of(header, struct acpi_dmar_reserved_memory, header);
-    struct acpi_rmrr_unit *rmrru;
+    struct acpi_rmrr_unit *rmrru, *cur_rmrr;
     void *dev_scope_start, *dev_scope_end;
     u64 base_addr = rmrr->base_address, end_addr = rmrr->end_address;
     int ret;
+    static unsigned int group_id = 0;
 
     if ( (ret = acpi_dmar_check_length(header, sizeof(*rmrr))) != 0 )
         return ret;
@@ -611,6 +612,8 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
     rmrru->base_address = base_addr;
     rmrru->end_address = end_addr;
     rmrru->segment = rmrr->segment;
+    /* "0" is an invalid group id. */
+    rmrru->gid = 0;
 
     dev_scope_start = (void *)(rmrr + 1);
     dev_scope_end   = ((void *)rmrr) + header->length;
@@ -682,7 +685,30 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
                     "So please set pci_rdmforce to reserve these ranges"
                     " if you need such a device in hotplug case.\n");
 
+            list_for_each_entry(cur_rmrr, &acpi_rmrr_units, list)
+            {
+                /*
+                 * Any same or overlap range mean they should be
+                 * at same group.
+                 */
+                if ( ((base_addr >= cur_rmrr->base_address) &&
+                     (end_addr <= cur_rmrr->end_address)) ||
+                     ((base_addr <= cur_rmrr->base_address) &&
+                     (end_addr >= cur_rmrr->end_address)) )
+                {
+                    rmrru->gid = cur_rmrr->gid;
+                    continue;
+                }
+            }
+
             acpi_register_rmrr_unit(rmrru);
+
+            /* Allocate group id from gid:1. */
+            if ( !rmrru->gid )
+            {
+                group_id++;
+                rmrru->gid = group_id;
+            }
         }
     }
 
diff --git a/xen/drivers/passthrough/vtd/dmar.h b/xen/drivers/passthrough/vtd/dmar.h
index af1feef..a57c0d4 100644
--- a/xen/drivers/passthrough/vtd/dmar.h
+++ b/xen/drivers/passthrough/vtd/dmar.h
@@ -76,6 +76,8 @@ struct acpi_rmrr_unit {
     u64    end_address;
     u16    segment;
     u8     allow_all:1;
+    int    gid;
+    domid_t    domid;
 };
 
 struct acpi_atsr_unit {
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index a54c6eb..ba40209 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1882,9 +1882,9 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
 
 static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev)
 {
-    struct acpi_rmrr_unit *rmrr;
-    u16 bdf;
-    int ret, i;
+    struct acpi_rmrr_unit *rmrr, *g_rmrr;
+    u16 bdf, g_bdf;
+    int ret, i, j;
 
     ASSERT(spin_is_locked(&pcidevs_lock));
 
@@ -1905,6 +1905,32 @@ static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev)
              PCI_BUS(bdf) == pdev->bus &&
              PCI_DEVFN2(bdf) == devfn )
         {
+            if ( rmrr->domid == hardware_domain->domain_id )
+            {
+                for_each_rmrr_device ( g_rmrr, g_bdf, j )
+                {
+                    if ( g_rmrr->gid == rmrr->gid )
+                    {
+                        if ( g_rmrr->domid == hardware_domain->domain_id )
+                            g_rmrr->domid = pdev->domain->domain_id;
+                        else if ( g_rmrr->domid != pdev->domain->domain_id )
+                        {
+                            rmrr->domid = g_rmrr->domid;
+                            continue;
+                        }
+                    }
+                }
+            }
+
+            if ( rmrr->domid != pdev->domain->domain_id )
+            {
+                domain_context_unmap(pdev->domain, devfn, pdev);
+                dprintk(XENLOG_ERR VTDPREFIX, "d%d: this is a group device owned by d%d\n",
+                        pdev->domain->domain_id, rmrr->domid);
+                rmrr->domid = 0;
+                return -EINVAL;
+            }
+
             ret = rmrr_identity_mapping(pdev->domain, 1, rmrr);
             if ( ret )
                 dprintk(XENLOG_ERR VTDPREFIX, "d%d: RMRR mapping failed\n",
@@ -1946,6 +1972,8 @@ static int intel_iommu_remove_device(u8 devfn, struct pci_dev *pdev)
              PCI_DEVFN2(bdf) != devfn )
             continue;
 
+        /* Just release to hardware domain. */
+        rmrr->domid = hardware_domain->domain_id;
         rmrr_identity_mapping(pdev->domain, 0, rmrr);
     }
 
@@ -2104,6 +2132,8 @@ static void __hwdom_init setup_hwdom_rmrr(struct domain *d)
     spin_lock(&pcidevs_lock);
     for_each_rmrr_device ( rmrr, bdf, i )
     {
+        /* hwdom should own all devices at first. */
+        rmrr->domid = d->domain_id;
         ret = rmrr_identity_mapping(d, 1, rmrr);
         if ( ret )
             dprintk(XENLOG_ERR VTDPREFIX,
@@ -2273,9 +2303,9 @@ static int reassign_device_ownership(
 static int intel_iommu_assign_device(
     struct domain *d, u8 devfn, struct pci_dev *pdev)
 {
-    struct acpi_rmrr_unit *rmrr;
-    int ret = 0, i;
-    u16 bdf, seg;
+    struct acpi_rmrr_unit *rmrr, *g_rmrr;
+    int ret = 0, i, j;
+    u16 bdf, seg, g_bdf;
     u8 bus;
 
     if ( list_empty(&acpi_drhd_units) )
@@ -2300,6 +2330,32 @@ static int intel_iommu_assign_device(
              PCI_BUS(bdf) == bus &&
              PCI_DEVFN2(bdf) == devfn )
         {
+            if ( rmrr->domid == hardware_domain->domain_id )
+            {
+                for_each_rmrr_device ( g_rmrr, g_bdf, j )
+                {
+                    if ( g_rmrr->gid == rmrr->gid )
+                    {
+                        if ( g_rmrr->domid == hardware_domain->domain_id )
+                            g_rmrr->domid = pdev->domain->domain_id;
+                        else if ( g_rmrr->domid != pdev->domain->domain_id )
+                        {
+                            rmrr->domid = g_rmrr->domid;
+                            continue;
+                        }
+                    }
+                }
+            }
+
+            if ( rmrr->domid != pdev->domain->domain_id )
+            {
+                domain_context_unmap(pdev->domain, devfn, pdev);
+                dprintk(XENLOG_ERR VTDPREFIX, "d%d: this is a group device owned by d%d\n",
+                        pdev->domain->domain_id, rmrr->domid);
+                rmrr->domid = 0;
+                return -EINVAL;
+            }
+
             ret = rmrr_identity_mapping(d, 1, rmrr);
             if ( ret )
             {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [v8][PATCH 17/17] xen/vtd: re-enable USB device assignment if enable pci_force
  2014-12-01  9:24 [v8][PATCH 00/17] xen: RMRR fix Tiejun Chen
                   ` (15 preceding siblings ...)
  2014-12-01  9:24 ` [v8][PATCH 16/17] xen/vtd: group assigned device with RMRR Tiejun Chen
@ 2014-12-01  9:24 ` Tiejun Chen
  2014-12-05 16:12   ` Konrad Rzeszutek Wilk
  2014-12-02 19:17 ` [v8][PATCH 00/17] xen: RMRR fix Konrad Rzeszutek Wilk
  17 siblings, 1 reply; 106+ messages in thread
From: Tiejun Chen @ 2014-12-01  9:24 UTC (permalink / raw)
  To: jbeulich, ian.jackson, stefano.stabellini, ian.campbell,
	wei.liu2, kevin.tian, tim, yang.z.zhang
  Cc: xen-devel

Before this RMRR rework, a USB RMRR could conflict with the guest BIOS
region, so we always ignored USB RMRRs. This workaround can be dropped
now when pci_force is enabled to check/reserve RMRRs.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 xen/drivers/passthrough/vtd/dmar.h  |  1 +
 xen/drivers/passthrough/vtd/iommu.c | 12 ++++++++----
 xen/drivers/passthrough/vtd/utils.c | 18 ++++++++++++++++++
 3 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.h b/xen/drivers/passthrough/vtd/dmar.h
index a57c0d4..832dc32 100644
--- a/xen/drivers/passthrough/vtd/dmar.h
+++ b/xen/drivers/passthrough/vtd/dmar.h
@@ -132,6 +132,7 @@ do {                                                \
 int vtd_hw_check(void);
 void disable_pmr(struct iommu *iommu);
 int is_usb_device(u16 seg, u8 bus, u8 devfn);
+int is_reserve_device_memory(struct domain *d, u8 bus, u8 devfn);
 int is_igd_drhd(struct acpi_drhd_unit *drhd);
 
 #endif /* _DMAR_H_ */
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index ba40209..1f1ceb7 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2264,9 +2264,11 @@ static int reassign_device_ownership(
      * remove it from the hardware domain, because BIOS may use RMRR at
      * booting time. Also account for the special casing of USB below (in
      * intel_iommu_assign_device()).
+     * But if we already check to reserve RMRR, this should be fine.
      */
     if ( !is_hardware_domain(source) &&
-         !is_usb_device(pdev->seg, pdev->bus, pdev->devfn) )
+         !is_usb_device(pdev->seg, pdev->bus, pdev->devfn) &&
+         !is_reserve_device_memory(source, pdev->bus, pdev->devfn) )
     {
         const struct acpi_rmrr_unit *rmrr;
         u16 bdf;
@@ -2315,12 +2317,14 @@ static int intel_iommu_assign_device(
     if ( ret )
         return ret;
 
-    /* FIXME: Because USB RMRR conflicts with guest bios region,
-     * ignore USB RMRR temporarily.
+    /*
+     * Because USB RMRR conflicts with guest bios region,
+     * ignore USB RMRR temporarily in case of non-reserving-RMRR.
      */
     seg = pdev->seg;
     bus = pdev->bus;
-    if ( is_usb_device(seg, bus, pdev->devfn) )
+    if ( is_usb_device(seg, bus, pdev->devfn) &&
+         !is_reserve_device_memory(d, bus, pdev->devfn) )
         return 0;
 
     /* Setup rmrr identity mapping */
diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c
index a33564b..1045ac1 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -36,6 +36,24 @@ int is_usb_device(u16 seg, u8 bus, u8 devfn)
     return (class == 0xc03);
 }
 
+int is_reserve_device_memory(struct domain *d, u8 bus, u8 devfn)
+{
+    int i = 0;
+
+    if ( d->arch.hvm_domain.pci_force == PCI_DEV_RDM_CHECK )
+        return 1;
+
+    for ( i = 0; i < d->arch.hvm_domain.num_pcidevs; i++ )
+    {
+        if ( d->arch.hvm_domain.pcidevs[i].bus == bus &&
+             d->arch.hvm_domain.pcidevs[i].devfn == devfn &&
+             d->arch.hvm_domain.pcidevs[i].flags == PCI_DEV_RDM_CHECK )
+        return 1;
+    }
+
+    return 0;
+}
+
 /* Disable vt-d protected memory registers. */
 void disable_pmr(struct iommu *iommu)
 {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 02/17] introduce XEN_DOMCTL_set_rdm
  2014-12-01  9:24 ` [v8][PATCH 02/17] introduce XEN_DOMCTL_set_rdm Tiejun Chen
@ 2014-12-02  8:33   ` Tian, Kevin
  2014-12-08  1:30     ` Chen, Tiejun
  2014-12-02 19:39   ` Konrad Rzeszutek Wilk
  2014-12-04 15:33   ` Jan Beulich
  2 siblings, 1 reply; 106+ messages in thread
From: Tian, Kevin @ 2014-12-02  8:33 UTC (permalink / raw)
  To: Chen, Tiejun, jbeulich, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2, tim, Zhang, Yang Z
  Cc: xen-devel

> From: Chen, Tiejun
> Sent: Monday, December 01, 2014 5:24 PM
> 
> This should be based on a new parameter globally, 'pci_rdmforce'.
> 
> pci_rdmforce = 1 => Of course this should be 0 by default.
> 
> '1' means we should force check to reserve all ranges. If failed
> VM wouldn't be created successfully. This also can give user a
> chance to work well with later hotplug, even if not a device
> assignment while creating VM.
> 
> But we can override that by one specific pci device:
> 
> pci = ['AA:BB.CC,rdmforce=0/1]
> 
> But this 'rdmforce' should be 1 by default since obviously any
> passthrough device always need to do this. Actually no one really
> want to set as '0' so it may be unnecessary but I'd like to leave
> this as a potential approach.

since no one requires it, why bother adding it? better to just
keep global option.

> 
> So this domctl provides an approach to control how to populate
> reserved device memory by tools.
> 
> Note we always post a message to user about this once we owns
> RMRR.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  docs/man/xl.cfg.pod.5              |  6 +++++
>  docs/misc/vtd.txt                  | 15 ++++++++++++
>  tools/libxc/include/xenctrl.h      |  6 +++++
>  tools/libxc/xc_domain.c            | 28 +++++++++++++++++++++++
>  tools/libxl/libxl_create.c         |  3 +++
>  tools/libxl/libxl_dm.c             | 47
> ++++++++++++++++++++++++++++++++++++++
>  tools/libxl/libxl_internal.h       |  4 ++++
>  tools/libxl/libxl_types.idl        |  2 ++
>  tools/libxl/libxlu_pci.c           |  2 ++
>  tools/libxl/xl_cmdimpl.c           | 10 ++++++++
>  xen/drivers/passthrough/pci.c      | 39
> +++++++++++++++++++++++++++++++
>  xen/drivers/passthrough/vtd/dmar.c |  8 +++++++
>  xen/include/asm-x86/hvm/domain.h   |  4 ++++
>  xen/include/public/domctl.h        | 21 +++++++++++++++++
>  xen/xsm/flask/hooks.c              |  1 +
>  15 files changed, 196 insertions(+)
> 
> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> index 622ea53..9adc41e 100644
> --- a/docs/man/xl.cfg.pod.5
> +++ b/docs/man/xl.cfg.pod.5
> @@ -645,6 +645,12 @@ dom0 without confirmation.  Please use with care.
>  D0-D3hot power management states for the PCI device. False (0) by
>  default.
> 
> +=item B<rdmforce=BOOLEAN>
> +
> +(HVM/x86 only) Specifies that the VM would force to check and try to
> +reserve all reserved device memory, like RMRR, associated to the PCI
> +device. False (0) by default.
> +
>  =back
> 
>  =back
> diff --git a/docs/misc/vtd.txt b/docs/misc/vtd.txt
> index 9af0e99..23544d5 100644
> --- a/docs/misc/vtd.txt
> +++ b/docs/misc/vtd.txt
> @@ -111,6 +111,21 @@ in the config file:
>  To override for a specific device:
>  	pci = [ '01:00.0,msitranslate=0', '03:00.0' ]
> 
> +RDM, 'reserved device memory', for PCI Device Passthrough
> +---------------------------------------------------------
> +
> +The BIOS controls some devices in terms of some reginos of memory used for
> +these devices. This kind of region should be reserved before creating a VM
> +to make sure they are not occupied by RAM/MMIO to conflict, and also we
> can
> +create necessary IOMMU table successfully.
> +
> +To enable this globally, add "pci_rdmforce" in the config file:
> +
> +	pci_rdmforce = 1         (default is 0)
> +
> +Or just enable for a specific device:
> +	pci = [ '01:00.0,rdmforce=1', '03:00.0' ]
> +
> 
>  Caveat on Conventional PCI Device Passthrough
>  ---------------------------------------------
> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> index 0ad8b8d..84012fe 100644
> --- a/tools/libxc/include/xenctrl.h
> +++ b/tools/libxc/include/xenctrl.h
> @@ -2038,6 +2038,12 @@ int xc_assign_device(xc_interface *xch,
>                       uint32_t domid,
>                       uint32_t machine_bdf);
> 
> +int xc_domain_device_setrdm(xc_interface *xch,
> +                            uint32_t domid,
> +                            uint32_t num_pcidevs,
> +                            uint32_t pci_rdmforce,
> +                            struct xen_guest_pcidev_info *pcidevs);
> +
>  int xc_get_device_group(xc_interface *xch,
>                       uint32_t domid,
>                       uint32_t machine_bdf,
> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> index b864872..7fd43e9 100644
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> @@ -1633,6 +1633,34 @@ int xc_assign_device(
>      return do_domctl(xch, &domctl);
>  }
> 
> +int xc_domain_device_setrdm(xc_interface *xch,
> +                            uint32_t domid,
> +                            uint32_t num_pcidevs,
> +                            uint32_t pci_rdmforce,
> +                            struct xen_guest_pcidev_info *pcidevs)
> +{
> +    int ret;
> +    DECLARE_DOMCTL;
> +    DECLARE_HYPERCALL_BOUNCE(pcidevs,
> +
> num_pcidevs*sizeof(xen_guest_pcidev_info_t),
> +                             XC_HYPERCALL_BUFFER_BOUNCE_IN);
> +
> +    if ( xc_hypercall_bounce_pre(xch, pcidevs) )
> +        return -1;
> +
> +    domctl.cmd = XEN_DOMCTL_set_rdm;
> +    domctl.domain = (domid_t)domid;
> +    domctl.u.set_rdm.flags = pci_rdmforce;
> +    domctl.u.set_rdm.num_pcidevs = num_pcidevs;
> +    set_xen_guest_handle(domctl.u.set_rdm.pcidevs, pcidevs);
> +
> +    ret = do_domctl(xch, &domctl);
> +
> +    xc_hypercall_bounce_post(xch, pcidevs);
> +
> +    return ret;
> +}
> +
>  int xc_get_device_group(
>      xc_interface *xch,
>      uint32_t domid,
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index 1198225..c615686 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -862,6 +862,9 @@ static void initiate_domain_create(libxl__egc *egc,
>      ret = libxl__domain_build_info_setdefault(gc, &d_config->b_info);
>      if (ret) goto error_out;
> 
> +    ret = libxl__domain_device_setrdm(gc, d_config, domid);
> +    if (ret) goto error_out;
> +
>      if (!sched_params_valid(gc, domid, &d_config->b_info.sched_params)) {
>          LOG(ERROR, "Invalid scheduling parameters\n");
>          ret = ERROR_INVAL;
> diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
> index 3e191c3..e50587d 100644
> --- a/tools/libxl/libxl_dm.c
> +++ b/tools/libxl/libxl_dm.c
> @@ -90,6 +90,53 @@ const char *libxl__domain_device_model(libxl__gc *gc,
>      return dm;
>  }
> 
> +int libxl__domain_device_setrdm(libxl__gc *gc,
> +                                libxl_domain_config *d_config,
> +                                uint32_t dm_domid)
> +{
> +    int i, ret;
> +    libxl_ctx *ctx = libxl__gc_owner(gc);
> +    struct xen_guest_pcidev_info *pcidevs = NULL;
> +    uint32_t rdmforce = 0;
> +
> +    if ( d_config->num_pcidevs )
> +    {
> +        pcidevs =
> malloc(d_config->num_pcidevs*sizeof(xen_guest_pcidev_info_t));
> +        if ( pcidevs )
> +        {
> +            for (i = 0; i < d_config->num_pcidevs; i++)
> +            {
> +                pcidevs[i].devfn = PCI_DEVFN(d_config->pcidevs[i].dev,
> +
> d_config->pcidevs[i].func);
> +                pcidevs[i].bus = d_config->pcidevs[i].bus;
> +                pcidevs[i].seg = d_config->pcidevs[i].domain;
> +                pcidevs[i].flags = d_config->pcidevs[i].rdmforce &
> +                                   PCI_DEV_RDM_CHECK;
> +            }
> +        }
> +        else
> +        {
> +            LIBXL__LOG(CTX, LIBXL__LOG_ERROR,
> +                               "Can't allocate for pcidevs.");
> +            return -1;
> +        }
> +    }
> +    rdmforce = libxl_defbool_val(d_config->b_info.rdmforce) ? 1 : 0;
> +
> +    /* Nothing to do. */
> +    if ( !rdmforce && !d_config->num_pcidevs )
> +        return 0;

move check before creating pcidevs.
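
A minimal sketch of the suggested reordering, built directly from the hunk
above (all identifiers are the series' own; only the ordering is changed) --
the "nothing to do" test now precedes any allocation:

    int libxl__domain_device_setrdm(libxl__gc *gc,
                                    libxl_domain_config *d_config,
                                    uint32_t dm_domid)
    {
        int i, ret;
        libxl_ctx *ctx = libxl__gc_owner(gc);
        struct xen_guest_pcidev_info *pcidevs = NULL;
        uint32_t rdmforce = libxl_defbool_val(d_config->b_info.rdmforce) ? 1 : 0;

        /* Nothing to do: no global force and no devices to describe. */
        if ( !rdmforce && !d_config->num_pcidevs )
            return 0;

        if ( d_config->num_pcidevs )
        {
            pcidevs = malloc(d_config->num_pcidevs * sizeof(*pcidevs));
            if ( !pcidevs )
            {
                LIBXL__LOG(CTX, LIBXL__LOG_ERROR, "Can't allocate for pcidevs.");
                return -1;
            }
            for (i = 0; i < d_config->num_pcidevs; i++)
            {
                pcidevs[i].devfn = PCI_DEVFN(d_config->pcidevs[i].dev,
                                             d_config->pcidevs[i].func);
                pcidevs[i].bus = d_config->pcidevs[i].bus;
                pcidevs[i].seg = d_config->pcidevs[i].domain;
                pcidevs[i].flags = d_config->pcidevs[i].rdmforce &
                                   PCI_DEV_RDM_CHECK;
            }
        }

        ret = xc_domain_device_setrdm(ctx->xch, dm_domid,
                                      (uint32_t)d_config->num_pcidevs,
                                      rdmforce, pcidevs);
        free(pcidevs);

        return ret;
    }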

> +
> +    ret = xc_domain_device_setrdm(ctx->xch, dm_domid,
> +                                  (uint32_t)d_config->num_pcidevs,
> +                                  rdmforce,
> +                                  pcidevs);
> +    if ( d_config->num_pcidevs )
> +        free(pcidevs);
> +
> +    return ret;
> +}
> +
>  const libxl_vnc_info *libxl__dm_vnc(const libxl_domain_config
> *guest_config)
>  {
>      const libxl_vnc_info *vnc = NULL;
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index a38f695..be397a6 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -1477,6 +1477,10 @@ _hidden int libxl__need_xenpv_qemu(libxl__gc
> *gc,
>          int nr_disks, libxl_device_disk *disks,
>          int nr_channels, libxl_device_channel *channels);
> 
> +_hidden int libxl__domain_device_setrdm(libxl__gc *gc,
> +                                        libxl_domain_config *info,
> +                                        uint32_t domid);
> +
>  /*
>   * This function will cause the whole libxl process to hang
>   * if the device model does not respond.  It is deprecated.
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index f7fc695..0076a32 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -398,6 +398,7 @@ libxl_domain_build_info =
> Struct("domain_build_info",[
>      ("kernel",           string),
>      ("cmdline",          string),
>      ("ramdisk",          string),
> +    ("rdmforce",         libxl_defbool),
>      ("u", KeyedUnion(None, libxl_domain_type, "type",
>                  [("hvm", Struct(None, [("firmware",         string),
>                                         ("bios",
> libxl_bios_type),
> @@ -518,6 +519,7 @@ libxl_device_pci = Struct("device_pci", [
>      ("power_mgmt", bool),
>      ("permissive", bool),
>      ("seize", bool),
> +    ("rdmforce", bool),
>      ])
> 
>  libxl_device_vtpm = Struct("device_vtpm", [
> diff --git a/tools/libxl/libxlu_pci.c b/tools/libxl/libxlu_pci.c
> index 26fb143..989eac8 100644
> --- a/tools/libxl/libxlu_pci.c
> +++ b/tools/libxl/libxlu_pci.c
> @@ -143,6 +143,8 @@ int xlu_pci_parse_bdf(XLU_Config *cfg,
> libxl_device_pci *pcidev, const char *str
>                      pcidev->permissive = atoi(tok);
>                  }else if ( !strcmp(optkey, "seize") ) {
>                      pcidev->seize = atoi(tok);
> +                }else if ( !strcmp(optkey, "rdmforce") ) {
> +                    pcidev->rdmforce = atoi(tok);
>                  }else{
>                      XLU__PCI_ERR(cfg, "Unknown PCI BDF option: %s",
> optkey);
>                  }
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index 0e754e7..9c23733 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -919,6 +919,7 @@ static void parse_config_data(const char
> *config_source,
>      int pci_msitranslate = 0;
>      int pci_permissive = 0;
>      int pci_seize = 0;
> +    int pci_rdmforce = 0;
>      int i, e;
> 
>      libxl_domain_create_info *c_info = &d_config->c_info;
> @@ -1699,6 +1700,9 @@ skip_vfb:
>      if (!xlu_cfg_get_long (config, "pci_seize", &l, 0))
>          pci_seize = l;
> 
> +    if (!xlu_cfg_get_long (config, "pci_rdmforce", &l, 0))
> +        pci_rdmforce = l;
> +
>      /* To be reworked (automatically enabled) once the auto ballooning
>       * after guest starts is done (with PCI devices passed in). */
>      if (c_info->type == LIBXL_DOMAIN_TYPE_PV) {
> @@ -1719,6 +1723,7 @@ skip_vfb:
>              pcidev->power_mgmt = pci_power_mgmt;
>              pcidev->permissive = pci_permissive;
>              pcidev->seize = pci_seize;
> +            pcidev->rdmforce = pci_rdmforce;
>              if (!xlu_pci_parse_bdf(config, pcidev, buf))
>                  d_config->num_pcidevs++;
>          }
> @@ -1726,6 +1731,11 @@ skip_vfb:
>              libxl_defbool_set(&b_info->u.pv.e820_host, true);
>      }
> 
> +    if ((c_info->type == LIBXL_DOMAIN_TYPE_HVM) && pci_rdmforce)
> +        libxl_defbool_set(&b_info->rdmforce, true);
> +    else
> +        libxl_defbool_set(&b_info->rdmforce, false);
> +
>      switch (xlu_cfg_get_list(config, "cpuid", &cpuids, 0, 1)) {
>      case 0:
>          {
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index 78c6977..ae924ad 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -34,6 +34,7 @@
>  #include <xen/tasklet.h>
>  #include <xsm/xsm.h>
>  #include <asm/msi.h>
> +#include <xen/stdbool.h>
> 
>  struct pci_seg {
>      struct list_head alldevs_list;
> @@ -1553,6 +1554,44 @@ int iommu_do_pci_domctl(
>          }
>          break;
> 
> +    case XEN_DOMCTL_set_rdm:
> +    {
> +        struct xen_domctl_set_rdm *xdsr = &domctl->u.set_rdm;
> +        struct xen_guest_pcidev_info *pcidevs = NULL;
> +        struct domain *d = rcu_lock_domain_by_any_id(domctl->domain);
> +
> +        if ( d == NULL )
> +            return -ESRCH;
> +
> +        d->arch.hvm_domain.pci_force =
> +                            xdsr->flags & PCI_DEV_RDM_CHECK ?
> true : false;
> +        d->arch.hvm_domain.num_pcidevs = xdsr->num_pcidevs;
> +        d->arch.hvm_domain.pcidevs = NULL;
> +
> +        if ( xdsr->num_pcidevs )
> +        {
> +            pcidevs = xmalloc_array(xen_guest_pcidev_info_t,
> +                                    xdsr->num_pcidevs);
> +            if ( pcidevs == NULL )
> +            {
> +                rcu_unlock_domain(d);
> +                return -ENOMEM;
> +            }
> +
> +            if ( copy_from_guest(pcidevs, xdsr->pcidevs,
> +
> xdsr->num_pcidevs*sizeof(*pcidevs)) )
> +            {
> +                xfree(pcidevs);
> +                rcu_unlock_domain(d);
> +                return -EFAULT;
> +            }
> +        }
> +
> +        d->arch.hvm_domain.pcidevs = pcidevs;
> +        rcu_unlock_domain(d);
> +    }
> +        break;
> +
>      case XEN_DOMCTL_assign_device:
>          if ( unlikely(d->is_dying) )
>          {
> diff --git a/xen/drivers/passthrough/vtd/dmar.c
> b/xen/drivers/passthrough/vtd/dmar.c
> index 1152c3a..5e41e7a 100644
> --- a/xen/drivers/passthrough/vtd/dmar.c
> +++ b/xen/drivers/passthrough/vtd/dmar.c
> @@ -674,6 +674,14 @@ acpi_parse_one_rmrr(struct acpi_dmar_header
> *header)
>                          "  RMRR region: base_addr %"PRIx64
>                          " end_address %"PRIx64"\n",
>                          rmrru->base_address, rmrru->end_address);
> +            /*
> +             * TODO: we may provide a precise parameter just to reserve
> +             * RMRR range specific to one device.
> +             */
> +            dprintk(XENLOG_WARNING VTDPREFIX,
> +                    "So please set pci_rdmforce to reserve these ranges"
> +                    " if you need such a device in hotplug case.\n");
> +
>              acpi_register_rmrr_unit(rmrru);
>          }
>      }
> diff --git a/xen/include/asm-x86/hvm/domain.h
> b/xen/include/asm-x86/hvm/domain.h
> index 2757c7f..38530e5 100644
> --- a/xen/include/asm-x86/hvm/domain.h
> +++ b/xen/include/asm-x86/hvm/domain.h
> @@ -90,6 +90,10 @@ struct hvm_domain {
>      /* Cached CF8 for guest PCI config cycles */
>      uint32_t                pci_cf8;
> 
> +    bool_t                  pci_force;
> +    uint32_t                num_pcidevs;
> +    struct xen_guest_pcidev_info      *pcidevs;
> +
>      struct pl_time         pl_time;
> 
>      struct hvm_io_handler *io_handler;
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index 57e2ed7..ba8970d 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -508,6 +508,25 @@ struct xen_domctl_get_device_group {
>  typedef struct xen_domctl_get_device_group
> xen_domctl_get_device_group_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_domctl_get_device_group_t);
> 
> +/* Currently just one bit to indicate force to check Reserved Device Memory.
> */
> +#define PCI_DEV_RDM_CHECK   0x1
> +struct xen_guest_pcidev_info {
> +    uint16_t    seg;
> +    uint8_t     bus;
> +    uint8_t     devfn;
> +    uint32_t    flags;
> +};
> +typedef struct xen_guest_pcidev_info xen_guest_pcidev_info_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_guest_pcidev_info_t);
> +/* Control whether/how we check and reserve device memory. */
> +struct xen_domctl_set_rdm {
> +    uint32_t    flags;
> +    uint32_t    num_pcidevs;
> +    XEN_GUEST_HANDLE_64(xen_guest_pcidev_info_t) pcidevs;
> +};
> +typedef struct xen_domctl_set_rdm xen_domctl_set_rdm_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_rdm_t);
> +
>  /* Pass-through interrupts: bind real irq -> hvm devfn. */
>  /* XEN_DOMCTL_bind_pt_irq */
>  /* XEN_DOMCTL_unbind_pt_irq */
> @@ -1070,6 +1089,7 @@ struct xen_domctl {
>  #define XEN_DOMCTL_setvnumainfo                  74
>  #define XEN_DOMCTL_psr_cmt_op                    75
>  #define XEN_DOMCTL_arm_configure_domain          76
> +#define XEN_DOMCTL_set_rdm                       77
>  #define XEN_DOMCTL_gdbsx_guestmemio            1000
>  #define XEN_DOMCTL_gdbsx_pausevcpu             1001
>  #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
> @@ -1135,6 +1155,7 @@ struct xen_domctl {
>          struct xen_domctl_gdbsx_domstatus   gdbsx_domstatus;
>          struct xen_domctl_vnuma             vnuma;
>          struct xen_domctl_psr_cmt_op        psr_cmt_op;
> +        struct xen_domctl_set_rdm           set_rdm;
>          uint8_t                             pad[128];
>      } u;
>  };
> diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
> index d48463f..5a760e2 100644
> --- a/xen/xsm/flask/hooks.c
> +++ b/xen/xsm/flask/hooks.c
> @@ -592,6 +592,7 @@ static int flask_domctl(struct domain *d, int cmd)
>      case XEN_DOMCTL_test_assign_device:
>      case XEN_DOMCTL_assign_device:
>      case XEN_DOMCTL_deassign_device:
> +    case XEN_DOMCTL_set_rdm:
>  #endif
>          return 0;
> 
> --
> 1.9.1

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 04/17] update the existing hypercall to support XEN_DOMCTL_set_rdm
  2014-12-01  9:24 ` [v8][PATCH 04/17] update the existing hypercall to support XEN_DOMCTL_set_rdm Tiejun Chen
@ 2014-12-02  8:46   ` Tian, Kevin
  2014-12-08  6:22     ` Chen, Tiejun
  2014-12-04 15:50   ` Jan Beulich
  1 sibling, 1 reply; 106+ messages in thread
From: Tian, Kevin @ 2014-12-02  8:46 UTC (permalink / raw)
  To: Chen, Tiejun, jbeulich, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2, tim, Zhang, Yang Z
  Cc: xen-devel

> From: Chen, Tiejun
> Sent: Monday, December 01, 2014 5:24 PM
> 
> After we intend to expose that hypercall explicitly based on
> XEN_DOMCTL_set_rdm, we need this rebase. I hope we can squash
> this into that previous patch once Jan acks it.

better to merge together, since it's the right thing to do based on previous
discussion.

one question about 'd->arch.hvm_domain.pci_force'. My impression is
that this flag enables the forced check, and while enabled, you'll always
do selected BDF filtering by default. However, from the code below it seems
pci_force is used to decide whether to report all regions or only selected
ones. Am I reading it wrong?
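
For clarity, a condensed sketch of the selection logic the hunk below appears
to implement (the condensation and the predicate name are mine, not the
series'): pci_force chooses between "report every RMRR" and "report only the
RMRRs whose SBDF matches a tracked passthrough device".

    /* Should the RMRR identified by 'id' be reported to this domain? */
    static bool_t rdm_is_reported(const struct domain *d, u32 id)
    {
        unsigned int i;

        /* Forced mode: every RMRR on the platform is reported. */
        if ( d->arch.hvm_domain.pci_force )
            return 1;

        /* Otherwise only RMRRs tied to a tracked passthrough device. */
        for ( i = 0; i < d->arch.hvm_domain.num_pcidevs; i++ )
            if ( PCI_SBDF2(d->arch.hvm_domain.pcidevs[i].seg,
                           d->arch.hvm_domain.pcidevs[i].bus,
                           d->arch.hvm_domain.pcidevs[i].devfn) == id )
                return 1;

        return 0;
    }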

> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  xen/common/compat/memory.c         | 75
> ++++++++++++++++++++++++++++++--------
>  xen/common/memory.c                | 71
> +++++++++++++++++++++++++++++-------
>  xen/drivers/passthrough/vtd/dmar.c | 32 ++++++++++++----
>  xen/include/public/memory.h        |  5 +++
>  xen/include/xen/iommu.h            |  2 +-
>  xen/include/xen/pci.h              |  2 +
>  6 files changed, 148 insertions(+), 39 deletions(-)
> 
> diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c
> index 60512fa..e6a256e 100644
> --- a/xen/common/compat/memory.c
> +++ b/xen/common/compat/memory.c
> @@ -22,27 +22,66 @@ struct get_reserved_device_memory {
>      unsigned int used_entries;
>  };
> 
> -static int get_reserved_device_memory(xen_pfn_t start,
> -                                      xen_ulong_t nr, void *ctxt)
> +static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
> +                                      u32 id, void *ctxt)
>  {
>      struct get_reserved_device_memory *grdm = ctxt;
> +    struct domain *d;
> +    unsigned int i;
> +    u32 sbdf;
> +    struct compat_reserved_device_memory rdm = {
> +        .start_pfn = start, .nr_pages = nr
> +    };
> 
> -    if ( grdm->used_entries < grdm->map.nr_entries )
> -    {
> -        struct compat_reserved_device_memory rdm = {
> -            .start_pfn = start, .nr_pages = nr
> -        };
> +    if ( rdm.start_pfn != start || rdm.nr_pages != nr )
> +        return -ERANGE;
> 
> -        if ( rdm.start_pfn != start || rdm.nr_pages != nr )
> -            return -ERANGE;
> +    d = rcu_lock_domain_by_any_id(grdm->map.domid);
> +    if ( d == NULL )
> +        return -ESRCH;
> 
> -        if ( __copy_to_compat_offset(grdm->map.buffer,
> grdm->used_entries,
> -                                     &rdm, 1) )
> -            return -EFAULT;
> +    if ( d )
> +    {
> +        if ( d->arch.hvm_domain.pci_force )
> +        {
> +            if ( grdm->used_entries < grdm->map.nr_entries )
> +            {
> +                if ( __copy_to_compat_offset(grdm->map.buffer,
> +                                             grdm->used_entries,
> +                                             &rdm, 1) )
> +                {
> +                    rcu_unlock_domain(d);
> +                    return -EFAULT;
> +                }
> +            }
> +            ++grdm->used_entries;
> +        }
> +        else
> +        {
> +            for ( i = 0; i < d->arch.hvm_domain.num_pcidevs; i++ )
> +            {
> +                sbdf = PCI_SBDF2(d->arch.hvm_domain.pcidevs[i].seg,
> +                                 d->arch.hvm_domain.pcidevs[i].bus,
> +
> d->arch.hvm_domain.pcidevs[i].devfn);
> +                if ( sbdf == id )
> +                {
> +                    if ( grdm->used_entries < grdm->map.nr_entries )
> +                    {
> +                        if
> ( __copy_to_compat_offset(grdm->map.buffer,
> +
> grdm->used_entries,
> +                                                     &rdm, 1) )
> +                        {
> +                            rcu_unlock_domain(d);
> +                            return -EFAULT;
> +                        }
> +                    }
> +                    ++grdm->used_entries;
> +                }
> +            }
> +        }
>      }
> 
> -    ++grdm->used_entries;
> -
> +    rcu_unlock_domain(d);
>      return 0;
>  }
>  #endif
> @@ -319,9 +358,13 @@ int compat_memory_op(unsigned int cmd,
> XEN_GUEST_HANDLE_PARAM(void) compat)
> 
>              if ( !rc && grdm.map.nr_entries < grdm.used_entries )
>                  rc = -ENOBUFS;
> +
>              grdm.map.nr_entries = grdm.used_entries;
> -            if ( __copy_to_guest(compat, &grdm.map, 1) )
> -                rc = -EFAULT;
> +            if ( grdm.map.nr_entries )
> +            {
> +                if ( __copy_to_guest(compat, &grdm.map, 1) )
> +                    rc = -EFAULT;
> +            }
> 
>              return rc;
>          }
> diff --git a/xen/common/memory.c b/xen/common/memory.c
> index 4788acc..9ce82b1 100644
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -698,24 +698,63 @@ struct get_reserved_device_memory {
>      unsigned int used_entries;
>  };
> 
> -static int get_reserved_device_memory(xen_pfn_t start,
> -                                      xen_ulong_t nr, void *ctxt)
> +static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
> +                                      u32 id, void *ctxt)
>  {
>      struct get_reserved_device_memory *grdm = ctxt;
> +    struct domain *d;
> +    unsigned int i;
> +    u32 sbdf;
> +    struct xen_reserved_device_memory rdm = {
> +        .start_pfn = start, .nr_pages = nr
> +    };
> 
> -    if ( grdm->used_entries < grdm->map.nr_entries )
> -    {
> -        struct xen_reserved_device_memory rdm = {
> -            .start_pfn = start, .nr_pages = nr
> -        };
> +    d = rcu_lock_domain_by_any_id(grdm->map.domid);
> +    if ( d == NULL )
> +        return -ESRCH;
> 
> -        if ( __copy_to_guest_offset(grdm->map.buffer,
> grdm->used_entries,
> -                                    &rdm, 1) )
> -            return -EFAULT;
> +    if ( d )
> +    {
> +        if ( d->arch.hvm_domain.pci_force )
> +        {
> +            if ( grdm->used_entries < grdm->map.nr_entries )
> +            {
> +                if ( __copy_to_guest_offset(grdm->map.buffer,
> +                                            grdm->used_entries,
> +                                            &rdm, 1) )
> +                {
> +                    rcu_unlock_domain(d);
> +                    return -EFAULT;
> +                }
> +            }
> +            ++grdm->used_entries;
> +        }
> +        else
> +        {
> +            for ( i = 0; i < d->arch.hvm_domain.num_pcidevs; i++ )
> +            {
> +                sbdf = PCI_SBDF2(d->arch.hvm_domain.pcidevs[i].seg,
> +                                 d->arch.hvm_domain.pcidevs[i].bus,
> +
> d->arch.hvm_domain.pcidevs[i].devfn);
> +                if ( sbdf == id )
> +                {
> +                    if ( grdm->used_entries < grdm->map.nr_entries )
> +                    {
> +                        if ( __copy_to_guest_offset(grdm->map.buffer,
> +
> grdm->used_entries,
> +                                                    &rdm, 1) )
> +                        {
> +                            rcu_unlock_domain(d);
> +                            return -EFAULT;
> +                        }
> +                    }
> +                    ++grdm->used_entries;
> +                }
> +            }
> +        }
>      }
> 
> -    ++grdm->used_entries;
> -
> +    rcu_unlock_domain(d);
>      return 0;
>  }
>  #endif
> @@ -1144,9 +1183,13 @@ long do_memory_op(unsigned long cmd,
> XEN_GUEST_HANDLE_PARAM(void) arg)
> 
>          if ( !rc && grdm.map.nr_entries < grdm.used_entries )
>              rc = -ENOBUFS;
> +
>          grdm.map.nr_entries = grdm.used_entries;
> -        if ( __copy_to_guest(arg, &grdm.map, 1) )
> -            rc = -EFAULT;
> +        if ( grdm.map.nr_entries )
> +        {
> +            if ( __copy_to_guest(arg, &grdm.map, 1) )
> +                rc = -EFAULT;
> +        }
> 
>          break;
>      }
> diff --git a/xen/drivers/passthrough/vtd/dmar.c
> b/xen/drivers/passthrough/vtd/dmar.c
> index 86cfad3..c5bc8d6 100644
> --- a/xen/drivers/passthrough/vtd/dmar.c
> +++ b/xen/drivers/passthrough/vtd/dmar.c
> @@ -904,17 +904,33 @@ int platform_supports_x2apic(void)
> 
>  int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void
> *ctxt)
>  {
> -    struct acpi_rmrr_unit *rmrr;
> +    struct acpi_rmrr_unit *rmrr, *rmrr_cur = NULL;
>      int rc = 0;
> +    unsigned int i;
> +    u16 bdf;
> 
> -    list_for_each_entry(rmrr, &acpi_rmrr_units, list)
> +    for_each_rmrr_device ( rmrr, bdf, i )
>      {
> -        rc = func(PFN_DOWN(rmrr->base_address),
> -                  PFN_UP(rmrr->end_address) -
> PFN_DOWN(rmrr->base_address),
> -                  ctxt);
> -        if ( rc )
> -            break;
> +        if ( rmrr != rmrr_cur )
> +        {
> +            rc = func(PFN_DOWN(rmrr->base_address),
> +                      PFN_UP(rmrr->end_address) -
> +                        PFN_DOWN(rmrr->base_address),
> +                      PCI_SBDF(rmrr->segment, bdf),
> +                      ctxt);
> +
> +            if ( unlikely(rc < 0) )
> +                return rc;
> +
> +            /* Just go next. */
> +            if ( !rc )
> +                rmrr_cur = rmrr;
> +
> +            /* Now just return specific to user requirement. */
> +            if ( rc > 0 )
> +                return rc;
> +        }
>      }
> 
> -    return rc;
> +    return 0;
>  }
> diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
> index cee4535..0d0544e 100644
> --- a/xen/include/public/memory.h
> +++ b/xen/include/public/memory.h
> @@ -586,6 +586,11 @@ typedef struct xen_reserved_device_memory
> xen_reserved_device_memory_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_reserved_device_memory_t);
> 
>  struct xen_reserved_device_memory_map {
> +    /*
> +     * Domain whose reservation is being changed.
> +     * Unprivileged domains can specify only DOMID_SELF.
> +     */
> +    domid_t        domid;
>      /* IN/OUT */
>      unsigned int nr_entries;
>      /* OUT */
> diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
> index 409f6f8..8fc6d6d 100644
> --- a/xen/include/xen/iommu.h
> +++ b/xen/include/xen/iommu.h
> @@ -120,7 +120,7 @@ void iommu_dt_domain_destroy(struct domain *d);
> 
>  struct page_info;
> 
> -typedef int iommu_grdm_t(xen_pfn_t start, xen_ulong_t nr, void *ctxt);
> +typedef int iommu_grdm_t(xen_pfn_t start, xen_ulong_t nr, u32 id, void
> *ctxt);
> 
>  struct iommu_ops {
>      int (*init)(struct domain *d);
> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
> index 5f295f3..d34205f 100644
> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -31,6 +31,8 @@
>  #define PCI_DEVFN2(bdf) ((bdf) & 0xff)
>  #define PCI_BDF(b,d,f)  ((((b) & 0xff) << 8) | PCI_DEVFN(d,f))
>  #define PCI_BDF2(b,df)  ((((b) & 0xff) << 8) | ((df) & 0xff))
> +#define PCI_SBDF(s,bdf) (((s & 0xffff) << 16) | (bdf & 0xffff))
> +#define PCI_SBDF2(s,b,df) (((s & 0xffff) << 16) | PCI_BDF2(b,df))
> 
>  struct pci_dev_info {
>      bool_t is_extfn;
> --
> 1.9.1

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 05/17] tools/libxc: introduce hypercall for xc_reserved_device_memory_map
  2014-12-01  9:24 ` [v8][PATCH 05/17] tools/libxc: introduce hypercall for xc_reserved_device_memory_map Tiejun Chen
@ 2014-12-02  8:46   ` Tian, Kevin
  2014-12-02 19:50   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 106+ messages in thread
From: Tian, Kevin @ 2014-12-02  8:46 UTC (permalink / raw)
  To: Chen, Tiejun, jbeulich, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2, tim, Zhang, Yang Z
  Cc: xen-devel

> From: Chen, Tiejun
> Sent: Monday, December 01, 2014 5:24 PM
> 
> We will introduce the xc_reserved_device_memory_map hypercall
> wrapper to libxc.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

> ---
>  tools/libxc/include/xenctrl.h |  5 +++++
>  tools/libxc/xc_domain.c       | 30 ++++++++++++++++++++++++++++++
>  2 files changed, 35 insertions(+)
> 
> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> index 84012fe..a3aeac3 100644
> --- a/tools/libxc/include/xenctrl.h
> +++ b/tools/libxc/include/xenctrl.h
> @@ -1294,6 +1294,11 @@ int xc_domain_set_memory_map(xc_interface
> *xch,
>  int xc_get_machine_memory_map(xc_interface *xch,
>                                struct e820entry entries[],
>                                uint32_t max_entries);
> +
> +int xc_reserved_device_memory_map(xc_interface *xch,
> +                                  uint32_t dom,
> +                                  struct
> xen_reserved_device_memory entries[],
> +                                  uint32_t *max_entries);
>  #endif
>  int xc_domain_set_time_offset(xc_interface *xch,
>                                uint32_t domid,
> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> index 7fd43e9..09fd988 100644
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> @@ -679,6 +679,36 @@ int xc_domain_set_memory_map(xc_interface *xch,
> 
>      return rc;
>  }
> +
> +int xc_reserved_device_memory_map(xc_interface *xch,
> +                                  uint32_t domid,
> +                                  struct
> xen_reserved_device_memory entries[],
> +                                  uint32_t *max_entries)
> +{
> +    int rc;
> +    struct xen_reserved_device_memory_map xrdmmap = {
> +        .domid = domid,
> +        .nr_entries = *max_entries
> +    };
> +    DECLARE_HYPERCALL_BOUNCE(entries,
> +                             sizeof(struct
> xen_reserved_device_memory) *
> +                             *max_entries,
> XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> +
> +    if ( xc_hypercall_bounce_pre(xch, entries) )
> +        return -1;
> +
> +    set_xen_guest_handle(xrdmmap.buffer, entries);
> +
> +    rc = do_memory_op(xch, XENMEM_reserved_device_memory_map,
> +                      &xrdmmap, sizeof(xrdmmap));
> +
> +    xc_hypercall_bounce_post(xch, entries);
> +
> +    *max_entries = xrdmmap.nr_entries;
> +
> +    return rc ? rc : xrdmmap.nr_entries;
> +}
> +
>  int xc_get_machine_memory_map(xc_interface *xch,
>                                struct e820entry entries[],
>                                uint32_t max_entries)
> --
> 1.9.1

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 06/17] tools/libxc: check if modules space is overlapping with reserved device memory
  2014-12-01  9:24 ` [v8][PATCH 06/17] tools/libxc: check if modules space is overlapping with reserved device memory Tiejun Chen
@ 2014-12-02  8:54   ` Tian, Kevin
  2014-12-02 19:55   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 106+ messages in thread
From: Tian, Kevin @ 2014-12-02  8:54 UTC (permalink / raw)
  To: Chen, Tiejun, jbeulich, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2, tim, Zhang, Yang Z
  Cc: xen-devel

> From: Chen, Tiejun
> Sent: Monday, December 01, 2014 5:24 PM
> 
> In case reserved device memory overlaps with RAM, it probably also
> overlaps with the modules space, so we need to check against these
> reserved device memory ranges as well.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>, with one small comment

> ---
>  tools/libxc/xc_hvm_build_x86.c | 94
> +++++++++++++++++++++++++++++++++++-------
>  1 file changed, 79 insertions(+), 15 deletions(-)
> 
> diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> index c81a25b..ddcf06d 100644
> --- a/tools/libxc/xc_hvm_build_x86.c
> +++ b/tools/libxc/xc_hvm_build_x86.c
> @@ -54,9 +54,82 @@
> 
>  #define VGA_HOLE_SIZE (0x20)
> 
> +/*
> + * Check whether there exists mmio hole in the specified memory range.
> + * Returns 1 if exists, else returns 0.
> + */
> +static int check_mmio_hole(uint64_t start, uint64_t memsize,
> +                           uint64_t mmio_start, uint64_t mmio_size)
> +{
> +    if ( start + memsize <= mmio_start || start >= mmio_start + mmio_size )
> +        return 0;
> +    else
> +        return 1;
> +}
> +
> +/* Getting all reserved device memory map info. */
> +static struct xen_reserved_device_memory
> +*xc_get_reserved_device_memory_map(xc_interface *xch, unsigned int
> nr_entries,
> +                                   uint32_t dom)
> +{
> +    struct xen_reserved_device_memory *xrdm = NULL;
> +    int rc = xc_reserved_device_memory_map(xch, dom, xrdm,
> &nr_entries);
> +
> +    if ( rc < 0 )
> +    {
> +        if ( errno == ENOBUFS )
> +        {
> +            if ( (xrdm = malloc(nr_entries *
> +
> sizeof(xen_reserved_device_memory_t))) == NULL )
> +            {
> +                PERROR("Could not allocate memory.");
> +                return 0;
> +            }
> +            rc = xc_reserved_device_memory_map(xch, dom, xrdm,
> &nr_entries);
> +            if ( rc )
> +            {
> +                PERROR("Could not get reserved device memory maps.");
> +                free(xrdm);
> +                return 0;
> +            }
> +        }
> +        else
> +            PERROR("Could not get reserved device memory maps.");
> +    }
> +
> +    return xrdm;
> +}
> +
> +static int xc_check_modules_space(xc_interface *xch, uint64_t *mstart_out,
> +                                  uint64_t *mend_out, uint32_t dom)
> +{
> +    unsigned int i = 0, nr_entries = 0;
> +    uint64_t rdm_start = 0, rdm_end = 0;
> +    struct xen_reserved_device_memory *rdm_map =
> +                        xc_get_reserved_device_memory_map(xch,
> nr_entries, dom);
> +
> +    for ( i = 0; i < nr_entries; i++ )
> +    {
> +        rdm_start = (uint64_t)rdm_map[i].start_pfn << XC_PAGE_SHIFT;
> +        rdm_end = rdm_start + ((uint64_t)rdm_map[i].nr_pages <<
> XC_PAGE_SHIFT);
> +
> +        /* Just use check_mmio_hole() to check modules ranges. */
> +        if ( check_mmio_hole(rdm_start,
> +                             rdm_end - rdm_start,

then you don't need an rdm_end variable here, since only the size is wanted.
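
i.e. something along these lines (a sketch only, reusing the patch's names):

        /* Size can be derived directly from nr_pages; no rdm_end needed. */
        if ( check_mmio_hole(rdm_start,
                             (uint64_t)rdm_map[i].nr_pages << XC_PAGE_SHIFT,
                             *mstart_out, *mend_out) )
            return -1;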

> +                             *mstart_out, *mend_out) )
> +            return -1;
> +    }
> +
> +    free(rdm_map);
> +
> +    return 0;
> +}
> +
>  static int modules_init(struct xc_hvm_build_args *args,
>                          uint64_t vend, struct elf_binary *elf,
> -                        uint64_t *mstart_out, uint64_t *mend_out)
> +                        uint64_t *mstart_out, uint64_t *mend_out,
> +                        xc_interface *xch,
> +                        uint32_t dom)
>  {
>  #define MODULE_ALIGN 1UL << 7
>  #define MB_ALIGN     1UL << 20
> @@ -80,6 +153,10 @@ static int modules_init(struct xc_hvm_build_args
> *args,
>      if ( *mend_out > vend )
>          return -1;
> 
> +    /* Is it overlapping with reserved device memory? */
> +    if ( xc_check_modules_space(xch, mstart_out, mend_out, dom) )
> +        return -1;
> +
>      if ( args->acpi_module.length != 0 )
>          args->acpi_module.guest_addr_out = *mstart_out;
>      if ( args->smbios_module.length != 0 )
> @@ -226,19 +303,6 @@ static int loadmodules(xc_interface *xch,
>      return rc;
>  }
> 
> -/*
> - * Check whether there exists mmio hole in the specified memory range.
> - * Returns 1 if exists, else returns 0.
> - */
> -static int check_mmio_hole(uint64_t start, uint64_t memsize,
> -                           uint64_t mmio_start, uint64_t mmio_size)
> -{
> -    if ( start + memsize <= mmio_start || start >= mmio_start + mmio_size )
> -        return 0;
> -    else
> -        return 1;
> -}
> -
>  static int setup_guest(xc_interface *xch,
>                         uint32_t dom, struct xc_hvm_build_args *args,
>                         char *image, unsigned long image_size)
> @@ -282,7 +346,7 @@ static int setup_guest(xc_interface *xch,
>          goto error_out;
>      }
> 
> -    if ( modules_init(args, v_end, &elf, &m_start, &m_end) != 0 )
> +    if ( modules_init(args, v_end, &elf, &m_start, &m_end, xch, dom) != 0 )
>      {
>          ERROR("Insufficient space to load modules.");
>          goto error_out;
> --
> 1.9.1

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 07/17] hvmloader/util: get reserved device memory maps
  2014-12-01  9:24 ` [v8][PATCH 07/17] hvmloader/util: get reserved device memory maps Tiejun Chen
@ 2014-12-02  8:59   ` Tian, Kevin
  2014-12-08  7:55     ` Chen, Tiejun
  2014-12-02 20:01   ` Konrad Rzeszutek Wilk
  2014-12-04 15:52   ` Jan Beulich
  2 siblings, 1 reply; 106+ messages in thread
From: Tian, Kevin @ 2014-12-02  8:59 UTC (permalink / raw)
  To: Chen, Tiejun, jbeulich, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2, tim, Zhang, Yang Z
  Cc: xen-devel

> From: Chen, Tiejun
> Sent: Monday, December 01, 2014 5:24 PM
> 
> We need to use the reserved device memory maps multiple times, so
> providing just one common function is friendlier.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  tools/firmware/hvmloader/util.c | 59
> +++++++++++++++++++++++++++++++++++++++++
>  tools/firmware/hvmloader/util.h |  2 ++
>  2 files changed, 61 insertions(+)
> 
> diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
> index 80d822f..dd81fb6 100644
> --- a/tools/firmware/hvmloader/util.c
> +++ b/tools/firmware/hvmloader/util.c
> @@ -22,11 +22,14 @@
>  #include "config.h"
>  #include "hypercall.h"
>  #include "ctype.h"
> +#include "errno.h"
>  #include <stdint.h>
>  #include <xen/xen.h>
>  #include <xen/memory.h>
>  #include <xen/sched.h>
> 
> +struct xen_reserved_device_memory *rdm_map;
> +
>  void wrmsr(uint32_t idx, uint64_t v)
>  {
>      asm volatile (
> @@ -828,6 +831,62 @@ int hpet_exists(unsigned long hpet_base)
>      return ((hpet_id >> 16) == 0x8086);
>  }
> 
> +static int
> +get_reserved_device_memory_map(struct xen_reserved_device_memory
> entries[],
> +                               uint32_t *max_entries)
> +{
> +    int rc;
> +    struct xen_reserved_device_memory_map xrdmmap = {
> +        .domid = DOMID_SELF,
> +        .nr_entries = *max_entries
> +    };
> +
> +    set_xen_guest_handle(xrdmmap.buffer, entries);
> +
> +    rc = hypercall_memory_op(XENMEM_reserved_device_memory_map,
> &xrdmmap);
> +    *max_entries = xrdmmap.nr_entries;
> +
> +    return rc;
> +}
> +
> +/*
> + * Getting all reserved device memory map info in case of hvmloader.
> + * We just return zero for any failed cases, and this means we
> + * can't further handle any reserved device memory.
> + */
> +unsigned int hvm_get_reserved_device_memory_map(void)
> +{
> +    static unsigned int nr_entries = 0;
> +    int rc = get_reserved_device_memory_map(rdm_map, &nr_entries);
> +

if this function is meant to be invoked only once, just check whether
rdm_map is already valid instead of always issuing a new call.
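
A minimal sketch of that caching, reusing the names from the hunk (the early
return on a valid rdm_map is my own arrangement, not code from the series):

    unsigned int hvm_get_reserved_device_memory_map(void)
    {
        static unsigned int nr_entries = 0;
        int rc;

        /* Already fetched once: reuse the cached map. */
        if ( rdm_map )
            return nr_entries;

        rc = get_reserved_device_memory_map(rdm_map, &nr_entries);
        if ( rc == -ENOBUFS )
        {
            rdm_map = mem_alloc(nr_entries * sizeof(*rdm_map), 0);
            if ( !rdm_map ||
                 get_reserved_device_memory_map(rdm_map, &nr_entries) )
            {
                printf("Could not get reserved dev memory info on domain");
                nr_entries = 0;
            }
        }
        else if ( rc )
            nr_entries = 0;

        return nr_entries;
    }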

> +    if ( rc == -ENOBUFS )
> +    {
> +        rdm_map = mem_alloc(nr_entries*sizeof(struct
> xen_reserved_device_memory),
> +                            0);
> +        if ( rdm_map )
> +        {
> +            rc = get_reserved_device_memory_map(rdm_map,
> &nr_entries);
> +            if ( rc )
> +            {
> +                printf("Could not get reserved dev memory info on
> domain");
> +                return 0;

why return '0' at failure?

> +            }
> +        }
> +        else
> +        {
> +            printf("No space to get reserved dev memory maps!\n");
> +            return 0;
> +        }
> +    }
> +    else if ( rc )
> +    {
> +        printf("Could not get reserved dev memory info on domain");
> +        return 0;
> +    }
> +
> +    return nr_entries;
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/tools/firmware/hvmloader/util.h
> b/tools/firmware/hvmloader/util.h
> index a70e4aa..e4f1851 100644
> --- a/tools/firmware/hvmloader/util.h
> +++ b/tools/firmware/hvmloader/util.h
> @@ -241,6 +241,8 @@ int build_e820_table(struct e820entry *e820,
>                       unsigned int bios_image_base);
>  void dump_e820_table(struct e820entry *e820, unsigned int nr);
> 
> +unsigned int hvm_get_reserved_device_memory_map(void);
> +
>  #ifndef NDEBUG
>  void perform_tests(void);
>  #else
> --
> 1.9.1

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 08/17] hvmloader/mmio: reconcile guest mmio with reserved device memory
  2014-12-01  9:24 ` [v8][PATCH 08/17] hvmloader/mmio: reconcile guest mmio with reserved device memory Tiejun Chen
@ 2014-12-02  9:11   ` Tian, Kevin
  2014-12-08  9:04     ` Chen, Tiejun
  2014-12-04 16:04   ` Jan Beulich
  1 sibling, 1 reply; 106+ messages in thread
From: Tian, Kevin @ 2014-12-02  9:11 UTC (permalink / raw)
  To: Chen, Tiejun, jbeulich, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2, tim, Zhang, Yang Z
  Cc: xen-devel

> From: Chen, Tiejun
> Sent: Monday, December 01, 2014 5:24 PM
> 
> We need to make sure no MMIO allocation overlaps
> any RDM (reserved device memory). Here we just skip
> all reserved device memory ranges in the MMIO space.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  tools/firmware/hvmloader/pci.c  | 54
> ++++++++++++++++++++++++++++++++++++++++-
>  tools/firmware/hvmloader/util.c |  9 +++++++
>  tools/firmware/hvmloader/util.h |  2 ++
>  3 files changed, 64 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
> index 4e8d803..fc22ab3 100644
> --- a/tools/firmware/hvmloader/pci.c
> +++ b/tools/firmware/hvmloader/pci.c
> @@ -38,6 +38,30 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
>  enum virtual_vga virtual_vga = VGA_none;
>  unsigned long igd_opregion_pgbase = 0;
> 
> +static unsigned int need_skip_rmrr;
> +extern struct xen_reserved_device_memory *rdm_map;
> +
> +static unsigned int
> +check_reserved_device_memory_map(uint64_t mmio_base, uint64_t
> mmio_max)
> +{
> +    uint32_t i;
> +    uint64_t rdm_start, rdm_end;
> +    unsigned int nr_rdm_entries =
> hvm_get_reserved_device_memory_map();
> +
> +    for ( i = 0; i < nr_rdm_entries; i++ )
> +    {
> +        rdm_start = (uint64_t)rdm_map[i].start_pfn << PAGE_SHIFT;
> +        rdm_end = rdm_start + ((uint64_t)rdm_map[i].nr_pages <<
> PAGE_SHIFT);
> +        if ( check_rdm_hole_conflict(mmio_base, mmio_max -
> mmio_base,
> +                                     rdm_start, rdm_end -
> rdm_start) )
> +        {
> +            need_skip_rmrr++;
> +        }
> +    }
> +
> +    return nr_rdm_entries;
> +}
> +

I don't understand the use of need_skip_rmrr here. What does the counter actually
mean? Also the function is not well organized. Usually the returned value is
the major purpose of a function, but it looks like this function actually exists
for the sake of need_skip_rmrr. If that's its real purpose, better to rename the
function and obtain nr_rdm_entries directly in the outer function.
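
Something like the following reorganization may be what is being asked for (a
sketch only; the helper name and leaving nr_rdm_entries to the caller are my
suggestion, not the series'): the helper answers exactly one question, and the
caller fetches nr_rdm_entries via hvm_get_reserved_device_memory_map() itself.

    /* Does [mmio_base, mmio_max) overlap any reserved device memory range? */
    static int mmio_conflicts_with_rdm(uint64_t mmio_base, uint64_t mmio_max,
                                       unsigned int nr_rdm_entries)
    {
        unsigned int i;

        for ( i = 0; i < nr_rdm_entries; i++ )
        {
            uint64_t rdm_start = (uint64_t)rdm_map[i].start_pfn << PAGE_SHIFT;
            uint64_t rdm_size = (uint64_t)rdm_map[i].nr_pages << PAGE_SHIFT;

            if ( check_rdm_hole_conflict(mmio_base, mmio_max - mmio_base,
                                         rdm_start, rdm_size) )
                return 1;
        }

        return 0;
    }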

>  void pci_setup(void)
>  {
>      uint8_t is_64bar, using_64bar, bar64_relocate = 0;
> @@ -59,8 +83,10 @@ void pci_setup(void)
>          uint32_t bar_reg;
>          uint64_t bar_sz;
>      } *bars = (struct bars *)scratch_start;
> -    unsigned int i, nr_bars = 0;
> +    unsigned int i, j, nr_bars = 0;
>      uint64_t mmio_hole_size = 0;
> +    unsigned int nr_rdm_entries;
> +    uint64_t rdm_start, rdm_end;
> 
>      const char *s;
>      /*
> @@ -338,6 +364,14 @@ void pci_setup(void)
>      io_resource.base = 0xc000;
>      io_resource.max = 0x10000;
> 
> +    /* Check low mmio range. */
> +    nr_rdm_entries =
> check_reserved_device_memory_map(mem_resource.base,
> +
> mem_resource.max);
> +    /* Check high mmio range. */
> +    if ( nr_rdm_entries )
> +        nr_rdm_entries =
> check_reserved_device_memory_map(high_mem_resource.base,
> +
> high_mem_resource.max);
> +
>      /* Assign iomem and ioport resources in descending order of size. */
>      for ( i = 0; i < nr_bars; i++ )
>      {
> @@ -393,8 +427,26 @@ void pci_setup(void)
>          }
> 
>          base = (resource->base  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
> + reallocate_mmio:
>          bar_data |= (uint32_t)base;
>          bar_data_upper = (uint32_t)(base >> 32);
> +
> +        if ( need_skip_rmrr )
> +        {
> +            for ( j = 0; j < nr_rdm_entries; j++ )
> +            {
> +                rdm_start = (uint64_t)rdm_map[j].start_pfn <<
> PAGE_SHIFT;
> +                rdm_end = rdm_start + ((uint64_t)rdm_map[j].nr_pages
> << PAGE_SHIFT);
> +                if ( check_rdm_hole_conflict(base, bar_sz,
> +                                             rdm_start, rdm_end -
> rdm_start) )
> +                {
> +                    base = (rdm_end  + bar_sz - 1) & ~(uint64_t)(bar_sz
> - 1);
> +                    need_skip_rmrr--;
> +                    goto reallocate_mmio;
> +                }
> +            }
> +        }
> +

here is the point I don't understand. What's required here is just to
walk the RMRR entries for a given allocation, and if there is a conflict,
move the base. So how does need_skip_rmrr help here? And why do you
need the pre-check on the low/high regions earlier?
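
In other words, a per-BAR walk along these lines would seem to suffice, with
no counter and no pre-pass (a sketch under that assumption; the helper name is
mine, the rest reuses the patch's names):

    /*
     * Find the lowest suitably aligned 'base' at or above the given one
     * that does not overlap any reserved device memory range.
     */
    static uint64_t skip_rdm(uint64_t base, uint64_t bar_sz,
                             unsigned int nr_rdm_entries)
    {
        unsigned int j;
        uint64_t rdm_start, rdm_size;

     again:
        for ( j = 0; j < nr_rdm_entries; j++ )
        {
            rdm_start = (uint64_t)rdm_map[j].start_pfn << PAGE_SHIFT;
            rdm_size = (uint64_t)rdm_map[j].nr_pages << PAGE_SHIFT;
            if ( check_rdm_hole_conflict(base, bar_sz, rdm_start, rdm_size) )
            {
                base = (rdm_start + rdm_size + bar_sz - 1) &
                       ~(uint64_t)(bar_sz - 1);
                goto again;
            }
        }

        return base;
    }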

>          base += bar_sz;
> 
>          if ( (base < resource->base) || (base > resource->max) )
> diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
> index dd81fb6..8767897 100644
> --- a/tools/firmware/hvmloader/util.c
> +++ b/tools/firmware/hvmloader/util.c
> @@ -887,6 +887,15 @@ unsigned int
> hvm_get_reserved_device_memory_map(void)
>      return nr_entries;
>  }
> 
> +int check_rdm_hole_conflict(uint64_t start, uint64_t size,
> +                            uint64_t rdm_start, uint64_t rdm_size)
> +{
> +    if ( start + size <= rdm_start || start >= rdm_start + rdm_size )
> +        return 0;
> +    else
> +        return 1;
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/tools/firmware/hvmloader/util.h
> b/tools/firmware/hvmloader/util.h
> index e4f1851..9b02f95 100644
> --- a/tools/firmware/hvmloader/util.h
> +++ b/tools/firmware/hvmloader/util.h
> @@ -242,6 +242,8 @@ int build_e820_table(struct e820entry *e820,
>  void dump_e820_table(struct e820entry *e820, unsigned int nr);
> 
>  unsigned int hvm_get_reserved_device_memory_map(void);
> +int check_rdm_hole_conflict(uint64_t start, uint64_t size,
> +                            uint64_t rdm_start, uint64_t rdm_size);
> 
>  #ifndef NDEBUG
>  void perform_tests(void);
> --
> 1.9.1

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 09/17] hvmloader/ram: check if guest memory is out of reserved device memory maps
  2014-12-01  9:24 ` [v8][PATCH 09/17] hvmloader/ram: check if guest memory is out of reserved device memory maps Tiejun Chen
@ 2014-12-02  9:42   ` Tian, Kevin
  2014-12-02 20:17   ` Konrad Rzeszutek Wilk
  2014-12-04 16:20   ` Jan Beulich
  2 siblings, 0 replies; 106+ messages in thread
From: Tian, Kevin @ 2014-12-02  9:42 UTC (permalink / raw)
  To: Chen, Tiejun, jbeulich, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2, tim, Zhang, Yang Z
  Cc: xen-devel

> From: Chen, Tiejun
> Sent: Monday, December 01, 2014 5:24 PM
> 
> We need to check and reserve all reserved device memory ranges in the
> e820 to avoid any potential guest memory conflict.
> 
> Currently, if we can't insert RDM entries directly, we may need to handle
> several ranges as follows:
> a. Fixed Ranges --> BUG()
>  lowmem_reserved_base-0xA0000: reserved by BIOS implementation,
>  BIOS region,
>  RESERVED_MEMBASE ~ 0x100000000,
> b. RAM or RAM:Hole -> Try to reserve
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  tools/firmware/hvmloader/e820.c | 168
> ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 168 insertions(+)
> 
> diff --git a/tools/firmware/hvmloader/e820.c
> b/tools/firmware/hvmloader/e820.c
> index 2e05e93..ef87e41 100644
> --- a/tools/firmware/hvmloader/e820.c
> +++ b/tools/firmware/hvmloader/e820.c
> @@ -22,6 +22,7 @@
> 
>  #include "config.h"
>  #include "util.h"
> +#include <xen/memory.h>
> 
>  void dump_e820_table(struct e820entry *e820, unsigned int nr)
>  {
> @@ -68,12 +69,173 @@ void dump_e820_table(struct e820entry *e820,
> unsigned int nr)
>      }
>  }
> 
> +extern struct xen_reserved_device_memory *rdm_map;
> +static unsigned int construct_rdm_e820_maps(unsigned int
> next_e820_entry_index,
> +                                            uint32_t nr_map,
> +                                            struct
> xen_reserved_device_memory *map,
> +                                            struct e820entry
> *e820,
> +                                            unsigned int
> lowmem_reserved_base,
> +                                            unsigned int
> bios_image_base)
> +{
> +    unsigned int i, j, sum_nr;
> +    uint64_t start, end, next_start, rdm_start, rdm_end;
> +    uint32_t type;
> +    int err = 0;
> +
> +    for ( i = 0; i < nr_map; i++ )
> +    {
> +        rdm_start = (uint64_t)map[i].start_pfn << PAGE_SHIFT;
> +        rdm_end = rdm_start + ((uint64_t)map[i].nr_pages <<
> PAGE_SHIFT);
> +
> +        for ( j = 0; j < next_e820_entry_index - 1; j++ )
> +        {
> +            sum_nr = next_e820_entry_index + nr_map;

need a check whether sum_nr exceeds max #entries...
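
For instance (a sketch only; the limit macro below is hypothetical -- the real
bound depends on where the e820 table lives), at the top of
construct_rdm_e820_maps():

    /* Bail out early if the combined table could not fit anyway. */
    if ( next_e820_entry_index + nr_map > E820_MAX_ENTRIES ) /* hypothetical */
    {
        printf("Too many e820 entries once RDM ranges are added!\n");
        BUG();
    }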

> +            start = e820[j].addr;
> +            end = e820[j].addr + e820[j].size;
> +            type = e820[j].type;
> +            next_start = e820[j+1].addr;
> +
> +            if ( rdm_start >= start && rdm_start <= end )
> +            {

what about other conflict patterns here, e.g. rdm_end falling inside [start, end]?

> +                /*
> +                 * lowmem_reserved_base-0xA0000: reserved by BIOS
> +                 * implementation.
> +                 * Or BIOS region.
> +                 */
> +                if ( (lowmem_reserved_base < 0xA0000 &&
> +                        start == lowmem_reserved_base) ||
> +                     start == bios_image_base )
> +                {
> +                    err = -1;
> +                    break;
> +                }
> +            }
> +
> +            /* Just amid those remaining e820 entries. */
> +            if ( (rdm_start > end) && (rdm_end < next_start) )

>= and <=?

> +            {
> +                memmove(&e820[j+2], &e820[j+1],
> +                        (sum_nr - j - 1) * sizeof(struct e820entry));

seems you just need (next_e820_entry_index - j - 1)

> +
> +                /* Then fill RMRR into that entry. */
> +                e820[j+1].addr = rdm_start;
> +                e820[j+1].size = rdm_end - rdm_start;
> +                e820[j+1].type = E820_RESERVED;
> +                next_e820_entry_index++;
> +                continue;

continue -> break? Otherwise the newly added RMRR entry will be checked next
against the same RMRR range, and an unnecessary conflict will be caught.

> +            }
> +
> +            /* Already at the end. */
> +            if ( (rdm_start > end) && !next_start )
> +            {
> +                e820[next_e820_entry_index].addr = rdm_start;
> +                e820[next_e820_entry_index].size = rdm_end -
> rdm_start;
> +                e820[next_e820_entry_index].type = E820_RESERVED;
> +                next_e820_entry_index++;
> +                continue;

break, even though it's the last one.

> +            }
> +
> +            if ( type == E820_RAM )
> +            {
> +                /* If coincide with one RAM range. */
> +                if ( rdm_start == start && rdm_end == end)
> +                {
> +                    e820[j].type = E820_RESERVED;
> +                    continue;
> +                }
> +
> +                /* If we're just aligned with start of one RAM range. */
> +                if ( rdm_start == start && rdm_end < end )
> +                {
> +                    memmove(&e820[j+1], &e820[j],
> +                            (sum_nr - j) * sizeof(struct e820entry));
> +
> +                    e820[j+1].addr = rdm_end;
> +                    e820[j+1].size = e820[j].addr + e820[j].size -
> rdm_end;
> +                    e820[j+1].type = E820_RAM;
> +                    next_e820_entry_index++;
> +
> +                    e820[j].addr = rdm_start;
> +                    e820[j].size = rdm_end - rdm_start;
> +                    e820[j].type = E820_RESERVED;
> +                    continue;
> +                }
> +
> +                /* If we're just aligned with end of one RAM range. */
> +                if ( rdm_start > start && rdm_end == end )
> +                {
> +                    memmove(&e820[j+1], &e820[j],
> +                            (sum_nr - j) * sizeof(struct e820entry));
> +
> +                    e820[j].size = rdm_start - e820[j].addr;
> +                    e820[j].type = E820_RAM;
> +
> +                    e820[j+1].addr = rdm_start;
> +                    e820[j+1].size = rdm_end - rdm_start;
> +                    e820[j+1].type = E820_RESERVED;
> +                    next_e820_entry_index++;
> +                    continue;
> +                }
> +
> +                /* If we're just in of one RAM range */
> +                if ( rdm_start > start && rdm_end < end )
> +                {
> +                    memmove(&e820[j+2], &e820[j],
> +                            (sum_nr - j) * sizeof(struct e820entry));
> +
> +                    e820[j+2].addr = rdm_end;
> +                    e820[j+2].size = e820[j].addr + e820[j].size -
> rdm_end;
> +                    e820[j+2].type = E820_RAM;
> +                    next_e820_entry_index++;
> +
> +                    e820[j+1].addr = rdm_start;
> +                    e820[j+1].size = rdm_end - rdm_start;
> +                    e820[j+1].type = E820_RESERVED;
> +                    next_e820_entry_index++;
> +
> +                    e820[j].size = rdm_start - e820[j].addr;
> +                    e820[j].type = E820_RAM;
> +                    continue;
> +                }
> +
> +                /* If we're going last RAM:Hole range */
> +                if ( end < next_start && rdm_start > start &&
> +                     rdm_end < next_start )
> +                {
> +                    memmove(&e820[j+1], &e820[j],
> +                            (sum_nr - j) * sizeof(struct e820entry));
> +
> +                    e820[j].size = rdm_start - e820[j].addr;
> +                    e820[j].type = E820_RAM;
> +
> +                    e820[j+1].addr = rdm_start;
> +                    e820[j+1].size = rdm_end - rdm_start;
> +                    e820[j+1].type = E820_RESERVED;
> +                    next_e820_entry_index++;
> +                    continue;

the above logic looks incomplete, e.g. what about an RMRR region that conflicts
with multiple e820 entries, etc.

You also need to detect a conflict with igd_opregion: although it's marked
as reserved, it holds useful content, so a conflict with an RMRR is still a
problem. Ideally, as Yang suggested before, the opregion base could be adjusted
dynamically, but here at least an error should be thrown out as a first step.
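
One way to make the splitting generic, so that an RMRR spanning several entries
is simply clipped against each entry it touches, is sketched below. The helper
is hypothetical (not part of the series) and assumes the caller has already
verified the array has room for two extra entries; like the patch, it only
clips RAM entries.

    /* Clip one RAM entry against one reserved range; returns the new count. */
    static unsigned int clip_ram_entry(struct e820entry *e820, unsigned int nr,
                                       unsigned int j,
                                       uint64_t rdm_start, uint64_t rdm_end)
    {
        uint64_t start = e820[j].addr, end = start + e820[j].size;
        uint64_t lo = rdm_start > start ? rdm_start : start;
        uint64_t hi = rdm_end < end ? rdm_end : end;

        if ( lo >= hi || e820[j].type != E820_RAM )
            return nr;                      /* no overlap with this entry */

        /* RAM left above the reserved range? Split the tail off first. */
        if ( hi < end )
        {
            memmove(&e820[j+2], &e820[j+1], (nr - j - 1) * sizeof(e820[j]));
            e820[j+1].addr = hi;
            e820[j+1].size = end - hi;
            e820[j+1].type = E820_RAM;
            nr++;
        }

        if ( lo > start )
        {
            /* Keep the RAM head, insert the reserved chunk after it. */
            e820[j].size = lo - start;
            memmove(&e820[j+2], &e820[j+1], (nr - j - 1) * sizeof(e820[j]));
            e820[j+1].addr = lo;
            e820[j+1].size = hi - lo;
            e820[j+1].type = E820_RESERVED;
            return nr + 1;
        }

        /* Reserved range starts at the entry: the entry itself shrinks. */
        e820[j].addr = lo;
        e820[j].size = hi - lo;
        e820[j].type = E820_RESERVED;
        return nr;
    }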

> +                }
> +            }
> +        }

a check on 'err' is also needed here. You don't want to continue the outer loop
when you have already seen an error condition.

> +    }
> +
> +    /* These overlaps may mean the guest can't work well. */
> +    if ( err )
> +    {
> +        printf("Guest can't work with some reserved device memory
> overlap!\n");
> +        BUG();
> +    }
> +
> +    /* Fine to construct RDM mappings into e820. */
> +    return next_e820_entry_index;
> +}
> +
>  /* Create an E820 table based on memory parameters provided in hvm_info.
> */
>  int build_e820_table(struct e820entry *e820,
>                       unsigned int lowmem_reserved_base,
>                       unsigned int bios_image_base)
>  {
>      unsigned int nr = 0;
> +    unsigned int nr_entries = 0;
> 
>      if ( !lowmem_reserved_base )
>              lowmem_reserved_base = 0xA0000;
> @@ -169,6 +331,12 @@ int build_e820_table(struct e820entry *e820,
>          nr++;
>      }
> 
> +    nr_entries = hvm_get_reserved_device_memory_map();
> +    if ( nr_entries )
> +        nr = construct_rdm_e820_maps(nr, nr_entries, rdm_map, e820,
> +                                     lowmem_reserved_base,
> +                                     bios_image_base);
> +
>      return nr;
>  }
> 
> --
> 1.9.1

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 10/17] hvmloader/mem_hole_alloc: skip any overlap with reserved device memory
  2014-12-01  9:24 ` [v8][PATCH 10/17] hvmloader/mem_hole_alloc: skip any overlap with reserved device memory Tiejun Chen
@ 2014-12-02  9:48   ` Tian, Kevin
  2014-12-02 20:23   ` Konrad Rzeszutek Wilk
  2014-12-04 16:28   ` Jan Beulich
  2 siblings, 0 replies; 106+ messages in thread
From: Tian, Kevin @ 2014-12-02  9:48 UTC (permalink / raw)
  To: Chen, Tiejun, jbeulich, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2, tim, Zhang, Yang Z
  Cc: xen-devel

> From: Chen, Tiejun
> Sent: Monday, December 01, 2014 5:24 PM
> 
> In some cases, like igd_opregion_pgbase, the guest will use mem_hole_alloc
> to allocate some memory for use at runtime, so we also need to
> make sure no reserved device memory overlaps such a region.

OK, it seems you meant to use this patch to address the opregion conflict.
While it works, I think it's better to still add opregion detection in e820,
as a modularity suggestion: it's not good for one module to make assumptions
about how another module works. Right now the opregion is allocated dynamically,
but it may become fixed somewhere in the future. So you always need to
detect the conflict purely in the e820 world, regardless of how a range is
actually allocated.
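
For example, the e820 construction could flag the conflict explicitly. A rough
sketch follows; the visibility of igd_opregion_pgbase here and the two-page
opregion size are assumptions of mine, not something the series states.

    extern unsigned long igd_opregion_pgbase;   /* assumed visible here */
    #define IGD_OPREGION_PAGES 2                /* assumption */

    /* Refuse any RDM range overlapping the opregion, however it was placed. */
    static int rdm_conflicts_with_opregion(uint64_t rdm_start, uint64_t rdm_end)
    {
        uint64_t op_start = (uint64_t)igd_opregion_pgbase << PAGE_SHIFT;
        uint64_t op_end = op_start +
                          ((uint64_t)IGD_OPREGION_PAGES << PAGE_SHIFT);

        if ( !igd_opregion_pgbase )
            return 0;

        return !(rdm_end <= op_start || rdm_start >= op_end);
    }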

> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  tools/firmware/hvmloader/util.c | 22 +++++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
> index 8767897..f3723c7 100644
> --- a/tools/firmware/hvmloader/util.c
> +++ b/tools/firmware/hvmloader/util.c
> @@ -416,9 +416,29 @@ static uint32_t alloc_down =
> RESERVED_MEMORY_DYNAMIC_END;
> 
>  xen_pfn_t mem_hole_alloc(uint32_t nr_mfns)
>  {
> +    unsigned int i, num = hvm_get_reserved_device_memory_map();
> +    uint64_t rdm_start, rdm_end;
> +    uint32_t alloc_start, alloc_end;
> +
>      alloc_down -= nr_mfns << PAGE_SHIFT;
> +    alloc_start = alloc_down;
> +    alloc_end = alloc_start + (nr_mfns << PAGE_SHIFT);
> +    for ( i = 0; i < num; i++ )
> +    {
> +        rdm_start = (uint64_t)rdm_map[i].start_pfn << PAGE_SHIFT;
> +        rdm_end = rdm_start + ((uint64_t)rdm_map[i].nr_pages <<
> PAGE_SHIFT);
> +        if ( check_rdm_hole_conflict((uint64_t)alloc_start,
> +                                     (uint64_t)alloc_end,
> +                                     rdm_start, rdm_end -
> rdm_start) )
> +        {
> +            alloc_end = rdm_start;
> +            alloc_start = alloc_end - (nr_mfns << PAGE_SHIFT);
> +            BUG_ON(alloc_up >= alloc_start);
> +        }
> +    }
> +
>      BUG_ON(alloc_up >= alloc_down);
> -    return alloc_down >> PAGE_SHIFT;
> +    return alloc_start >> PAGE_SHIFT;
>  }
> 

This patch is required, but I'd prefer an initialization-phase check that leaves
alloc_up/alloc_down sane, so you don't have to repeat this detection on every
run-time call.
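
Something like the following at start of day would do (sketch only; the
init_alloc_window() name is made up, and it deliberately just clips the window
below any conflicting RDM range rather than trying to be clever):

    static void init_alloc_window(void)
    {
        unsigned int i, num = hvm_get_reserved_device_memory_map();

        for ( i = 0; i < num; i++ )
        {
            uint64_t s = (uint64_t)rdm_map[i].start_pfn << PAGE_SHIFT;
            uint64_t e = s + ((uint64_t)rdm_map[i].nr_pages << PAGE_SHIFT);

            /* If an RDM range cuts into [alloc_up, alloc_down), shrink the
             * window so run-time mem_hole_alloc() can never overlap it. */
            if ( s < alloc_down && e > alloc_up )
                alloc_down = (uint32_t)s;
        }

        BUG_ON(alloc_up >= alloc_down);
    }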

>  void *mem_alloc(uint32_t size, uint32_t align)
> --
> 1.9.1


* Re: [v8][PATCH 11/17] xen/x86/p2m: reject populating for reserved device memory mapping
  2014-12-01  9:24 ` [v8][PATCH 11/17] xen/x86/p2m: reject populating for reserved device memory mapping Tiejun Chen
@ 2014-12-02  9:57   ` Tian, Kevin
  2014-12-04 16:42   ` Jan Beulich
  1 sibling, 0 replies; 106+ messages in thread
From: Tian, Kevin @ 2014-12-02  9:57 UTC (permalink / raw)
  To: Chen, Tiejun, jbeulich, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2, tim, Zhang, Yang Z
  Cc: xen-devel

> From: Chen, Tiejun
> Sent: Monday, December 01, 2014 5:24 PM
> 
> We need to reject to populate reserved device memory mapping, and
> then make sure all reserved device memory can't be accessed by any
> !iommu approach.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  xen/arch/x86/mm/p2m.c     | 59
> +++++++++++++++++++++++++++++++++++++++++++++--
>  xen/include/asm-x86/p2m.h |  9 ++++++++
>  2 files changed, 66 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index efa49dd..607ecd0 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -556,6 +556,40 @@ guest_physmap_remove_page(struct domain *d,
> unsigned long gfn,
>      gfn_unlock(p2m, gfn, page_order);
>  }
> 
> +/* Check if we are accessing rdm. */
> +int p2m_check_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
> +                                     u32 id, void *ctxt)
> +{
> +    xen_pfn_t end = start + nr;
> +    unsigned int i;
> +    u32 sbdf;
> +    struct p2m_get_reserved_device_memory *pgrdm = ctxt;
> +    struct domain *d = pgrdm->domain;
> +
> +    if ( d->arch.hvm_domain.pci_force )
> +    {
> +        if ( pgrdm->gfn >= start && pgrdm->gfn < end )
> +            return 1;
> +    }
> +    else
> +    {
> +        for ( i = 0; i < d->arch.hvm_domain.num_pcidevs; i++ )
> +        {
> +            sbdf = PCI_SBDF2(d->arch.hvm_domain.pcidevs[i].seg,
> +                             d->arch.hvm_domain.pcidevs[i].bus,
> +                             d->arch.hvm_domain.pcidevs[i].devfn);
> +
> +            if ( sbdf == id )
> +            {
> +                if ( pgrdm->gfn >= start && pgrdm->gfn < end )
> +                    return 1;
> +            }
> +        }
> +    }
> +
> +    return 0;
> +}
> +
>  int
>  guest_physmap_add_entry(struct domain *d, unsigned long gfn,
>                          unsigned long mfn, unsigned int page_order,
> @@ -568,6 +602,7 @@ guest_physmap_add_entry(struct domain *d, unsigned
> long gfn,
>      mfn_t omfn;
>      int pod_count = 0;
>      int rc = 0;
> +    struct p2m_get_reserved_device_memory pgrdm;
> 
>      if ( !paging_mode_translate(d) )
>      {
> @@ -686,8 +721,28 @@ guest_physmap_add_entry(struct domain *d,
> unsigned long gfn,
>      /* Now, actually do the two-way mapping */
>      if ( mfn_valid(_mfn(mfn)) )
>      {
> -        rc = p2m_set_entry(p2m, gfn, _mfn(mfn), page_order, t,
> -                           p2m->default_access);
> +        pgrdm.gfn = gfn;
> +        pgrdm.domain = d;
> +        if ( !is_hardware_domain(d) && iommu_use_hap_pt(d) )

Why is this check only done for the shared-EPT case?

> +        {
> +            rc =
> iommu_get_reserved_device_memory(p2m_check_reserved_device_memory,
> +                                                  &pgrdm);
> +            /* We always avoid populating reserved device memory. */
> +            if ( rc == 1 )
> +            {
> +                rc = -EBUSY;
> +                goto out;
> +            }
> +            else if ( rc < 0 )
> +            {
> +                printk(XENLOG_G_WARNING
> +                       "Can't check reserved device memory for
> Dom%d.\n",
> +                       d->domain_id);
> +                goto out;
> +            }
> +        }
> +
> +        rc = p2m_set_entry(p2m, gfn, _mfn(mfn), page_order, t,
> p2m->default_access);
>          if ( rc )
>              goto out; /* Failed to update p2m, bail without updating m2p.
> */
> 
> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> index 5f7fe71..99f7fb7 100644
> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -709,6 +709,15 @@ static inline unsigned int
> p2m_get_iommu_flags(p2m_type_t p2mt)
>      return flags;
>  }
> 
> +struct p2m_get_reserved_device_memory {
> +    unsigned long gfn;
> +    struct domain *domain;
> +};
> +
> +/* Check if we are accessing rdm. */
> +extern int p2m_check_reserved_device_memory(xen_pfn_t start,
> xen_ulong_t nr,
> +                                            u32 id, void *ctxt);
> +
>  #endif /* _XEN_P2M_H */
> 
>  /*
> --
> 1.9.1


* Re: [v8][PATCH 12/17] xen/x86/ept: handle reserved device memory in ept_handle_violation
  2014-12-01  9:24 ` [v8][PATCH 12/17] xen/x86/ept: handle reserved device memory in ept_handle_violation Tiejun Chen
@ 2014-12-02  9:59   ` Tian, Kevin
  2014-12-02 20:26   ` Konrad Rzeszutek Wilk
  2014-12-04 16:46   ` Jan Beulich
  2 siblings, 0 replies; 106+ messages in thread
From: Tian, Kevin @ 2014-12-02  9:59 UTC (permalink / raw)
  To: Chen, Tiejun, jbeulich, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2, tim, Zhang, Yang Z
  Cc: xen-devel

> From: Chen, Tiejun
> Sent: Monday, December 01, 2014 5:25 PM
> 
> We always reserve these ranges since we never allow any stuff to
> poke them.
> 
> But in theory some untrusted VM can maliciously access them. So we
> need to intercept this approach. But we just don't want to leak
> anything or introduce any side affect since other OSs may touch them
> by careless behavior, so its enough to have a lightweight way, and
> it shouldn't be same as those broken pages which cause domain crush.
> 
> So we just need to return with next eip then let VM/OS itself handle
> such a scenario as its own logic.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  xen/arch/x86/hvm/vmx/vmx.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 2907afa..3ee884a 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -2403,6 +2403,7 @@ static void ept_handle_violation(unsigned long
> qualification, paddr_t gpa)
>      p2m_type_t p2mt;
>      int ret;
>      struct domain *d = current->domain;
> +    struct p2m_get_reserved_device_memory pgrdm;
> 
>      /*
>       * We treat all write violations also as read violations.
> @@ -2438,6 +2439,23 @@ static void ept_handle_violation(unsigned long
> qualification, paddr_t gpa)
>          __trace_var(TRC_HVM_NPF, 0, sizeof(_d), &_d);
>      }
> 
> +    /* This means some untrusted VM can maliciously access reserved
> +     * device memory. But we just don't want to leak anything or
> +     * introduce any side affect since other OSs may touch them by
> +     * careless behavior, so its enough to have a lightweight way.
> +     * Here we just need to return with next eip then let VM/OS itself
> +     * handle such a scenario as its own logic.
> +     */
> +    pgrdm.gfn = gfn;
> +    pgrdm.domain = d;
> +    ret =
> iommu_get_reserved_device_memory(p2m_check_reserved_device_memory,
> +                                           &pgrdm);

Can this be optimized to not walk the RMRR map if no device is assigned to the
domain?
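
E.g. (sketch from my side -- whether has_arch_pdevs() / pci_force is the right
predicate here is just my assumption):

    /* Sketch: only bother walking the RMRR list for domains that actually
     * have devices passed through, or that asked for the global check. */
    if ( !is_hardware_domain(d) &&
         (d->arch.hvm_domain.pci_force || has_arch_pdevs(d)) )
    {
        pgrdm.gfn = gfn;
        pgrdm.domain = d;
        ret = iommu_get_reserved_device_memory(p2m_check_reserved_device_memory,
                                               &pgrdm);
        if ( ret )
        {
            update_guest_eip();
            return;
        }
    }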

> +    if ( ret )
> +    {
> +        update_guest_eip();
> +        return;
> +    }
> +
>      if ( qualification & EPT_GLA_VALID )
>      {
>          __vmread(GUEST_LINEAR_ADDRESS, &gla);
> --
> 1.9.1


* Re: [v8][PATCH 14/17] xen/x86/p2m: introduce set_identity_p2m_entry
  2014-12-01  9:24 ` [v8][PATCH 14/17] xen/x86/p2m: introduce set_identity_p2m_entry Tiejun Chen
@ 2014-12-02 10:00   ` Tian, Kevin
  2014-12-02 20:29   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 106+ messages in thread
From: Tian, Kevin @ 2014-12-02 10:00 UTC (permalink / raw)
  To: Chen, Tiejun, jbeulich, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2, tim, Zhang, Yang Z
  Cc: xen-devel

> From: Chen, Tiejun
> Sent: Monday, December 01, 2014 5:25 PM
> 
> We will create RMRR mapping as follows:
> 
> If gfn space unoccupied, we just set that. If
> space already occupy by 1:1 RMRR mapping do thing. Others
> should be failed.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

> ---
>  xen/arch/x86/mm/p2m.c     | 28 ++++++++++++++++++++++++++++
>  xen/include/asm-x86/p2m.h |  4 ++++
>  2 files changed, 32 insertions(+)
> 
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index 607ecd0..c415521 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -913,6 +913,34 @@ int set_mmio_p2m_entry(struct domain *d, unsigned
> long gfn, mfn_t mfn)
>      return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct);
>  }
> 
> +int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
> +                           p2m_access_t p2ma)
> +{
> +    p2m_type_t p2mt;
> +    p2m_access_t a;
> +    mfn_t mfn;
> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +    int ret = -EBUSY;
> +
> +    gfn_lock(p2m, gfn, 0);
> +
> +    mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL);
> +
> +    if ( !mfn_valid(mfn) )
> +        ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
> p2m_mmio_direct,
> +                            p2ma);
> +    else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a ==
> p2ma )
> +        ret = 0;
> +    else
> +        printk(XENLOG_G_WARNING
> +               "Cannot identity map d%d:%lx, already mapped to %lx.\n",
> +               d->domain_id, gfn, mfn_x(mfn));
> +
> +    gfn_unlock(p2m, gfn, 0);
> +
> +    return ret;
> +}
> +
>  /* Returns: 0 for success, -errno for failure */
>  int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
>  {
> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> index 99f7fb7..26cf0cc 100644
> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -509,6 +509,10 @@ int p2m_is_logdirty_range(struct p2m_domain *,
> unsigned long start,
>  int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
>  int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
> 
> +/* Set identity addresses in the p2m table (for pass-through) */
> +int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
> +                           p2m_access_t p2ma);
> +
>  /* Add foreign mapping to the guest's p2m table. */
>  int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
>                      unsigned long gpfn, domid_t foreign_domid);
> --
> 1.9.1


* Re: [v8][PATCH 15/17] xen:vtd: create RMRR mapping
  2014-12-01  9:24 ` [v8][PATCH 15/17] xen:vtd: create RMRR mapping Tiejun Chen
@ 2014-12-02 10:02   ` Tian, Kevin
  2014-12-02 20:30   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 106+ messages in thread
From: Tian, Kevin @ 2014-12-02 10:02 UTC (permalink / raw)
  To: Chen, Tiejun, jbeulich, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2, tim, Zhang, Yang Z
  Cc: xen-devel

> From: Chen, Tiejun
> Sent: Monday, December 01, 2014 5:25 PM
> 
> intel_iommu_map_page() does nothing if VT-d shares EPT page table.
> So rmrr_identity_mapping() never create RMRR mapping but in some
> cases like some GFX drivers it still need to access RMRR.
> 
> Here we will create those RMRR mappings even in shared EPT case.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>

Acked-by: Kevin Tian <kevin.tian@intel.com>

> ---
>  xen/drivers/passthrough/vtd/iommu.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> b/xen/drivers/passthrough/vtd/iommu.c
> index a38f201..a54c6eb 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1856,10 +1856,15 @@ static int rmrr_identity_mapping(struct domain
> *d, bool_t map,
> 
>      while ( base_pfn < end_pfn )
>      {
> -        int err = intel_iommu_map_page(d, base_pfn, base_pfn,
> -
> IOMMUF_readable|IOMMUF_writable);
> -
> -        if ( err )
> +        int err = 0;
> +        if ( iommu_use_hap_pt(d) )
> +        {
> +            ASSERT(!iommu_passthrough || !is_hardware_domain(d));
> +            if ( (err = set_identity_p2m_entry(d, base_pfn,
> p2m_access_rw)) )
> +                return err;
> +        }
> +        else if ( (err = intel_iommu_map_page(d, base_pfn, base_pfn,
> +					      IOMMUF_readable|IOMMUF_writable)) )
>              return err;
>          base_pfn++;
>      }
> --
> 1.9.1


* Re: [v8][PATCH 16/17] xen/vtd: group assigned device with RMRR
  2014-12-01  9:24 ` [v8][PATCH 16/17] xen/vtd: group assigned device with RMRR Tiejun Chen
@ 2014-12-02 10:11   ` Tian, Kevin
  2014-12-02 20:40   ` Konrad Rzeszutek Wilk
  2014-12-04 17:05   ` Jan Beulich
  2 siblings, 0 replies; 106+ messages in thread
From: Tian, Kevin @ 2014-12-02 10:11 UTC (permalink / raw)
  To: Chen, Tiejun, jbeulich, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2, tim, Zhang, Yang Z
  Cc: xen-devel

> From: Chen, Tiejun
> Sent: Monday, December 01, 2014 5:25 PM
> 
> Sometimes different devices may share RMRR range so in this
> case we shouldn't assign these devices into different VMs
> since they may have potential leakage even damage between VMs.
> 
> So we need to group all devices as RMRR range to make sure they
> are just assigned into the same VM.
> 
> Here we introduce two field, gid and domid, in struct,
> acpi_rmrr_unit:
>  gid: indicate which group this device owns. "0" is invalid so
>       just start from "1".
>  domid: indicate which domain this device owns currently. Firstly
>         the hardware domain should own it.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  xen/drivers/passthrough/vtd/dmar.c  | 28 ++++++++++++++-
>  xen/drivers/passthrough/vtd/dmar.h  |  2 ++
>  xen/drivers/passthrough/vtd/iommu.c | 68
> +++++++++++++++++++++++++++++++++----
>  3 files changed, 91 insertions(+), 7 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/dmar.c
> b/xen/drivers/passthrough/vtd/dmar.c
> index c5bc8d6..8d3406f 100644
> --- a/xen/drivers/passthrough/vtd/dmar.c
> +++ b/xen/drivers/passthrough/vtd/dmar.c
> @@ -572,10 +572,11 @@ acpi_parse_one_rmrr(struct acpi_dmar_header
> *header)
>  {
>      struct acpi_dmar_reserved_memory *rmrr =
>          container_of(header, struct acpi_dmar_reserved_memory,
> header);
> -    struct acpi_rmrr_unit *rmrru;
> +    struct acpi_rmrr_unit *rmrru, *cur_rmrr;
>      void *dev_scope_start, *dev_scope_end;
>      u64 base_addr = rmrr->base_address, end_addr = rmrr->end_address;
>      int ret;
> +    static unsigned int group_id = 0;
> 
>      if ( (ret = acpi_dmar_check_length(header, sizeof(*rmrr))) != 0 )
>          return ret;
> @@ -611,6 +612,8 @@ acpi_parse_one_rmrr(struct acpi_dmar_header
> *header)
>      rmrru->base_address = base_addr;
>      rmrru->end_address = end_addr;
>      rmrru->segment = rmrr->segment;
> +    /* "0" is an invalid group id. */
> +    rmrru->gid = 0;
> 
>      dev_scope_start = (void *)(rmrr + 1);
>      dev_scope_end   = ((void *)rmrr) + header->length;
> @@ -682,7 +685,30 @@ acpi_parse_one_rmrr(struct acpi_dmar_header
> *header)
>                      "So please set pci_rdmforce to reserve these ranges"
>                      " if you need such a device in hotplug case.\n");
> 
> +            list_for_each_entry(cur_rmrr, &acpi_rmrr_units, list)
> +            {
> +                /*
> +                 * Any same or overlap range mean they should be
> +                 * at same group.
> +                 */
> +                if ( ((base_addr >= cur_rmrr->base_address) &&
> +                     (end_addr <= cur_rmrr->end_address)) ||
> +                     ((base_addr <= cur_rmrr->base_address) &&
> +                     (end_addr >= cur_rmrr->end_address)) )

Again, this overlap detection pattern is incomplete: it only catches the case
where one range fully contains the other, so partial overlaps are missed.
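
The complete test for "the two (inclusive) ranges intersect at all" is simply
(sketch):

                if ( base_addr <= cur_rmrr->end_address &&
                     end_addr >= cur_rmrr->base_address )
                {
                    rmrru->gid = cur_rmrr->gid;
                    break;      /* once grouped, no need to keep scanning */
                }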

> +                {
> +                    rmrru->gid = cur_rmrr->gid;
> +                    continue;
> +                }
> +            }
> +
>              acpi_register_rmrr_unit(rmrru);
> +
> +            /* Allocate group id from gid:1. */
> +            if ( !rmrru->gid )
> +            {
> +                group_id++;
> +                rmrru->gid = group_id;
> +            }
>          }
>      }
> 
> diff --git a/xen/drivers/passthrough/vtd/dmar.h
> b/xen/drivers/passthrough/vtd/dmar.h
> index af1feef..a57c0d4 100644
> --- a/xen/drivers/passthrough/vtd/dmar.h
> +++ b/xen/drivers/passthrough/vtd/dmar.h
> @@ -76,6 +76,8 @@ struct acpi_rmrr_unit {
>      u64    end_address;
>      u16    segment;
>      u8     allow_all:1;
> +    int    gid;
> +    domid_t    domid;
>  };
> 
>  struct acpi_atsr_unit {
> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> b/xen/drivers/passthrough/vtd/iommu.c
> index a54c6eb..ba40209 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1882,9 +1882,9 @@ static int rmrr_identity_mapping(struct domain *d,
> bool_t map,
> 
>  static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev)
>  {
> -    struct acpi_rmrr_unit *rmrr;
> -    u16 bdf;
> -    int ret, i;
> +    struct acpi_rmrr_unit *rmrr, *g_rmrr;
> +    u16 bdf, g_bdf;
> +    int ret, i, j;
> 
>      ASSERT(spin_is_locked(&pcidevs_lock));
> 
> @@ -1905,6 +1905,32 @@ static int intel_iommu_add_device(u8 devfn, struct
> pci_dev *pdev)
>               PCI_BUS(bdf) == pdev->bus &&
>               PCI_DEVFN2(bdf) == devfn )
>          {
> +            if ( rmrr->domid == hardware_domain->domain_id )
> +            {
> +                for_each_rmrr_device ( g_rmrr, g_bdf, j )
> +                {
> +                    if ( g_rmrr->gid == rmrr->gid )
> +                    {
> +                        if ( g_rmrr->domid ==
> hardware_domain->domain_id )
> +                            g_rmrr->domid =
> pdev->domain->domain_id;
> +                        else if ( g_rmrr->domid !=
> pdev->domain->domain_id )
> +                        {
> +                            rmrr->domid = g_rmrr->domid;
> +                            continue;
> +                        }
> +                    }
> +                }
> +            }
> +
> +            if ( rmrr->domid != pdev->domain->domain_id )
> +            {
> +                domain_context_unmap(pdev->domain, devfn, pdev);
> +                dprintk(XENLOG_ERR VTDPREFIX, "d%d: this is a group
> device owned by d%d\n",
> +                        pdev->domain->domain_id, rmrr->domid);
> +                rmrr->domid = 0;

Should this be set to 0, or released back to the hardware domain?
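
i.e. (sketch):

                /* Sketch: hand the range back to the hardware domain rather
                 * than leaving an invalid owner behind. */
                rmrr->domid = hardware_domain->domain_id;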

> +                return -EINVAL;
> +            }
> +
>              ret = rmrr_identity_mapping(pdev->domain, 1, rmrr);
>              if ( ret )
>                  dprintk(XENLOG_ERR VTDPREFIX, "d%d: RMRR mapping
> failed\n",
> @@ -1946,6 +1972,8 @@ static int intel_iommu_remove_device(u8 devfn,
> struct pci_dev *pdev)
>               PCI_DEVFN2(bdf) != devfn )
>              continue;
> 
> +        /* Just release to hardware domain. */
> +        rmrr->domid = hardware_domain->domain_id;
>          rmrr_identity_mapping(pdev->domain, 0, rmrr);
>      }
> 
> @@ -2104,6 +2132,8 @@ static void __hwdom_init setup_hwdom_rmrr(struct
> domain *d)
>      spin_lock(&pcidevs_lock);
>      for_each_rmrr_device ( rmrr, bdf, i )
>      {
> +        /* hwdom should own all devices at first. */
> +        rmrr->domid = d->domain_id;
>          ret = rmrr_identity_mapping(d, 1, rmrr);
>          if ( ret )
>              dprintk(XENLOG_ERR VTDPREFIX,
> @@ -2273,9 +2303,9 @@ static int reassign_device_ownership(
>  static int intel_iommu_assign_device(
>      struct domain *d, u8 devfn, struct pci_dev *pdev)
>  {
> -    struct acpi_rmrr_unit *rmrr;
> -    int ret = 0, i;
> -    u16 bdf, seg;
> +    struct acpi_rmrr_unit *rmrr, *g_rmrr;
> +    int ret = 0, i, j;
> +    u16 bdf, seg, g_bdf;
>      u8 bus;
> 
>      if ( list_empty(&acpi_drhd_units) )
> @@ -2300,6 +2330,32 @@ static int intel_iommu_assign_device(
>               PCI_BUS(bdf) == bus &&
>               PCI_DEVFN2(bdf) == devfn )
>          {
> +            if ( rmrr->domid == hardware_domain->domain_id )
> +            {
> +                for_each_rmrr_device ( g_rmrr, g_bdf, j )
> +                {
> +                    if ( g_rmrr->gid == rmrr->gid )
> +                    {
> +                        if ( g_rmrr->domid ==
> hardware_domain->domain_id )
> +                            g_rmrr->domid =
> pdev->domain->domain_id;
> +                        else if ( g_rmrr->domid !=
> pdev->domain->domain_id )
> +                        {
> +                            rmrr->domid = g_rmrr->domid;
> +                            continue;
> +                        }
> +                    }
> +                }
> +            }
> +
> +            if ( rmrr->domid != pdev->domain->domain_id )
> +            {
> +                domain_context_unmap(pdev->domain, devfn, pdev);
> +                dprintk(XENLOG_ERR VTDPREFIX, "d%d: this is a group
> device owned by d%d\n",
> +                        pdev->domain->domain_id, rmrr->domid);
> +                rmrr->domid = 0;
> +                return -EINVAL;
> +            }
> +
>              ret = rmrr_identity_mapping(d, 1, rmrr);
>              if ( ret )
>              {
> --
> 1.9.1


* Re: [v8][PATCH 13/17] xen/mem_access: don't allow accessing reserved device memory
  2014-12-01  9:24 ` [v8][PATCH 13/17] xen/mem_access: don't allow accessing reserved device memory Tiejun Chen
@ 2014-12-02 14:54   ` Julien Grall
  2014-12-18 22:56     ` Tamas K Lengyel
  2014-12-02 20:27   ` Konrad Rzeszutek Wilk
  2014-12-04 16:51   ` Jan Beulich
  2 siblings, 1 reply; 106+ messages in thread
From: Julien Grall @ 2014-12-02 14:54 UTC (permalink / raw)
  To: Tiejun Chen, jbeulich, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2, kevin.tian, tim, yang.z.zhang
  Cc: Tamas K Lengyel, xen-devel

Hi,

CC Tamas as he did some work on memaccess for ARM.

On 01/12/14 09:24, Tiejun Chen wrote:
> We can't expost those reserved device memory in case of mem_access

s/expost/expose/

> since any access may corrupt device usage.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  xen/common/mem_access.c | 41 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 41 insertions(+)
> 
> diff --git a/xen/common/mem_access.c b/xen/common/mem_access.c
> index 6c2724b..72a807a 100644
> --- a/xen/common/mem_access.c
> +++ b/xen/common/mem_access.c
> @@ -55,6 +55,43 @@ void mem_access_resume(struct domain *d)
>      }
>  }
>  
> +/* We can't expose reserved device memory. */
> +static int mem_access_check_rdm(struct domain *d, uint64_aligned_t start,
> +                                uint32_t nr)
> +{
> +    uint32_t i;
> +    struct p2m_get_reserved_device_memory pgrdm;

p2m_get_reserved_device_memory is only defined on x86. This will fail to
compile on ARM when memaccess is enabled.

> +    int rc = 0;
> +
> +    if ( !is_hardware_domain(d) && iommu_use_hap_pt(d) )
> +    {
> +        for ( i = 0; i < nr; i++ )
> +        {
> +            pgrdm.gfn = start + i;
> +            pgrdm.domain = d;
> +            rc = iommu_get_reserved_device_memory(p2m_check_reserved_device_memory,
> +                                                  &pgrdm);

Same here.

Overall, I'm not sure it's worth introducing this code in the common part, as it
doesn't seem useful for ARM.

In any case, you have to at least stub those bits out for ARM.
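
A minimal sketch of such a stub (exactly where it should live -- asm-arm/p2m.h or
an #ifdef in mem_access.c -- is an open question, and the shape below is only my
guess):

    /* Sketch: ARM-side stub so common/mem_access.c keeps building. */
    struct p2m_get_reserved_device_memory {
        unsigned long gfn;
        struct domain *domain;
    };

    static inline int p2m_check_reserved_device_memory(xen_pfn_t start,
                                                       xen_ulong_t nr,
                                                       u32 id, void *ctxt)
    {
        return 0;   /* no RMRR-like reserved device memory on ARM today */
    }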

Regards,

-- 
Julien Grall


* Re: [v8][PATCH 00/17] xen: RMRR fix
  2014-12-01  9:24 [v8][PATCH 00/17] xen: RMRR fix Tiejun Chen
                   ` (16 preceding siblings ...)
  2014-12-01  9:24 ` [v8][PATCH 17/17] xen/vtd: re-enable USB device assignment if enable pci_force Tiejun Chen
@ 2014-12-02 19:17 ` Konrad Rzeszutek Wilk
  17 siblings, 0 replies; 106+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-12-02 19:17 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, jbeulich, yang.z.zhang

> How to reproduce this issu:
> 
> * In shared-ept case with Xen.
> * Target owns RMRR.

How do you verify/check for that?


> * Do IGD passthrough with Windows guest OS: gfx_passthru=1 pci=["00:02.0"]
> * Please use qemu-xen-traditional.
> 
> My test machine is BDW with Windows 7.


* Re: [v8][PATCH 02/17] introduce XEN_DOMCTL_set_rdm
  2014-12-01  9:24 ` [v8][PATCH 02/17] introduce XEN_DOMCTL_set_rdm Tiejun Chen
  2014-12-02  8:33   ` Tian, Kevin
@ 2014-12-02 19:39   ` Konrad Rzeszutek Wilk
  2014-12-08  3:16     ` Chen, Tiejun
  2014-12-04 15:33   ` Jan Beulich
  2 siblings, 1 reply; 106+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-12-02 19:39 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, jbeulich, yang.z.zhang

On Mon, Dec 01, 2014 at 05:24:20PM +0800, Tiejun Chen wrote:
> This should be based on a new parameter globally, 'pci_rdmforce'.
> 
> pci_rdmforce = 1 => Of course this should be 0 by default.
> 
> '1' means we should force check to reserve all ranges. If failed
> VM wouldn't be created successfully. This also can give user a
> chance to work well with later hotplug, even if not a device
> assignment while creating VM.
> 
> But we can override that by one specific pci device:
> 
> pci = ['AA:BB.CC,rdmforce=0/1]
> 
> But this 'rdmforce' should be 1 by default since obviously any
> passthrough device always need to do this. Actually no one really
> want to set as '0' so it may be unnecessary but I'd like to leave
> this as a potential approach.
> 
> So this domctl provides an approach to control how to populate
> reserved device memory by tools.
> 
> Note we always post a message to user about this once we owns
> RMRR.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  docs/man/xl.cfg.pod.5              |  6 +++++
>  docs/misc/vtd.txt                  | 15 ++++++++++++
>  tools/libxc/include/xenctrl.h      |  6 +++++
>  tools/libxc/xc_domain.c            | 28 +++++++++++++++++++++++
>  tools/libxl/libxl_create.c         |  3 +++
>  tools/libxl/libxl_dm.c             | 47 ++++++++++++++++++++++++++++++++++++++
>  tools/libxl/libxl_internal.h       |  4 ++++
>  tools/libxl/libxl_types.idl        |  2 ++
>  tools/libxl/libxlu_pci.c           |  2 ++
>  tools/libxl/xl_cmdimpl.c           | 10 ++++++++

In the past we had split the hypervisor and the
toolstack patches in two. So that one could focus
on the hypervisor ones first, and then in another
patch on the toolstack.

But perhaps this was intended to be in one patch?

>  xen/drivers/passthrough/pci.c      | 39 +++++++++++++++++++++++++++++++
>  xen/drivers/passthrough/vtd/dmar.c |  8 +++++++
>  xen/include/asm-x86/hvm/domain.h   |  4 ++++

I don't see ARM here. Should there be an ARM variant of this? If not, should the
toolstack changes only apply to x86?

>  xen/include/public/domctl.h        | 21 +++++++++++++++++
>  xen/xsm/flask/hooks.c              |  1 +
>  15 files changed, 196 insertions(+)
> 
> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> index 622ea53..9adc41e 100644
> --- a/docs/man/xl.cfg.pod.5
> +++ b/docs/man/xl.cfg.pod.5
> @@ -645,6 +645,12 @@ dom0 without confirmation.  Please use with care.
>  D0-D3hot power management states for the PCI device. False (0) by
>  default.
>  
> +=item B<rdmforce=BOOLEAN>
> +
> +(HVM/x86 only) Specifies that the VM would force to check and try to

s/force/forced/
> +reserve all reserved device memory, like RMRR, associated to the PCI
> +device. False (0) by default.

Not sure I understand. How would the VM be forced to do this? Or is it hvmloader
that would be forced to do it? And if it fails (as you say 'try'), what then?

> +
>  =back
>  
>  =back
> diff --git a/docs/misc/vtd.txt b/docs/misc/vtd.txt
> index 9af0e99..23544d5 100644
> --- a/docs/misc/vtd.txt
> +++ b/docs/misc/vtd.txt
> @@ -111,6 +111,21 @@ in the config file:
>  To override for a specific device:
>  	pci = [ '01:00.0,msitranslate=0', '03:00.0' ]
>  
> +RDM, 'reserved device memory', for PCI Device Passthrough
> +---------------------------------------------------------
> +
> +The BIOS controls some devices in terms of some reginos of memory used for

Could you elaborate on what 'some devices' are? Network cards? GPUs? What are the
most common ones?

s/reginos/regions/

And by regions do you mean BAR regions?

> +these devices. This kind of region should be reserved before creating a VM
> +to make sure they are not occupied by RAM/MMIO to conflict, and also we can

You wrote 'This' but then use the plural 'are'. If you want it plural it needs to
be 'These regions'.
> +create necessary IOMMU table successfully.
> +
> +To enable this globally, add "pci_rdmforce" in the config file:
> +
> +	pci_rdmforce = 1         (default is 0)

The guest config file? Or /etc/xen/xl.conf ?

> +
> +Or just enable for a specific device:
> +	pci = [ '01:00.0,rdmforce=1', '03:00.0' ]
> +
>  
>  Caveat on Conventional PCI Device Passthrough
>  ---------------------------------------------
> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> index 0ad8b8d..84012fe 100644
> --- a/tools/libxc/include/xenctrl.h
> +++ b/tools/libxc/include/xenctrl.h
> @@ -2038,6 +2038,12 @@ int xc_assign_device(xc_interface *xch,
>                       uint32_t domid,
>                       uint32_t machine_bdf);
>  
> +int xc_domain_device_setrdm(xc_interface *xch,
> +                            uint32_t domid,
> +                            uint32_t num_pcidevs,
> +                            uint32_t pci_rdmforce,
> +                            struct xen_guest_pcidev_info *pcidevs);
> +
>  int xc_get_device_group(xc_interface *xch,
>                       uint32_t domid,
>                       uint32_t machine_bdf,
> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> index b864872..7fd43e9 100644
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> @@ -1633,6 +1633,34 @@ int xc_assign_device(
>      return do_domctl(xch, &domctl);
>  }
>  
> +int xc_domain_device_setrdm(xc_interface *xch,
> +                            uint32_t domid,
> +                            uint32_t num_pcidevs,
> +                            uint32_t pci_rdmforce,
> +                            struct xen_guest_pcidev_info *pcidevs)
> +{
> +    int ret;
> +    DECLARE_DOMCTL;
> +    DECLARE_HYPERCALL_BOUNCE(pcidevs,
> +                             num_pcidevs*sizeof(xen_guest_pcidev_info_t),
> +                             XC_HYPERCALL_BUFFER_BOUNCE_IN);
> +
> +    if ( xc_hypercall_bounce_pre(xch, pcidevs) )
> +        return -1;
> +
> +    domctl.cmd = XEN_DOMCTL_set_rdm;
> +    domctl.domain = (domid_t)domid;
> +    domctl.u.set_rdm.flags = pci_rdmforce;
> +    domctl.u.set_rdm.num_pcidevs = num_pcidevs;
> +    set_xen_guest_handle(domctl.u.set_rdm.pcidevs, pcidevs);
> +
> +    ret = do_domctl(xch, &domctl);
> +
> +    xc_hypercall_bounce_post(xch, pcidevs);
> +
> +    return ret;
> +}
> +
>  int xc_get_device_group(
>      xc_interface *xch,
>      uint32_t domid,
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index 1198225..c615686 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -862,6 +862,9 @@ static void initiate_domain_create(libxl__egc *egc,
>      ret = libxl__domain_build_info_setdefault(gc, &d_config->b_info);
>      if (ret) goto error_out;
>  
> +    ret = libxl__domain_device_setrdm(gc, d_config, domid);
> +    if (ret) goto error_out;
> +
>      if (!sched_params_valid(gc, domid, &d_config->b_info.sched_params)) {
>          LOG(ERROR, "Invalid scheduling parameters\n");
>          ret = ERROR_INVAL;
> diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
> index 3e191c3..e50587d 100644
> --- a/tools/libxl/libxl_dm.c
> +++ b/tools/libxl/libxl_dm.c
> @@ -90,6 +90,53 @@ const char *libxl__domain_device_model(libxl__gc *gc,
>      return dm;
>  }
>  
> +int libxl__domain_device_setrdm(libxl__gc *gc,
> +                                libxl_domain_config *d_config,
> +                                uint32_t dm_domid)
> +{
> +    int i, ret;
> +    libxl_ctx *ctx = libxl__gc_owner(gc);
> +    struct xen_guest_pcidev_info *pcidevs = NULL;
> +    uint32_t rdmforce = 0;
> +
> +    if ( d_config->num_pcidevs )
> +    {
> +        pcidevs = malloc(d_config->num_pcidevs*sizeof(xen_guest_pcidev_info_t));
> +        if ( pcidevs )
> +        {
> +            for (i = 0; i < d_config->num_pcidevs; i++)
> +            {
> +                pcidevs[i].devfn = PCI_DEVFN(d_config->pcidevs[i].dev,
> +                                             d_config->pcidevs[i].func);
> +                pcidevs[i].bus = d_config->pcidevs[i].bus;
> +                pcidevs[i].seg = d_config->pcidevs[i].domain;
> +                pcidevs[i].flags = d_config->pcidevs[i].rdmforce &
> +                                   PCI_DEV_RDM_CHECK;
> +            }
> +        }
> +        else
> +        {
> +            LIBXL__LOG(CTX, LIBXL__LOG_ERROR,
> +                               "Can't allocate for pcidevs.");
> +            return -1;
> +        }
> +    }
> +    rdmforce = libxl_defbool_val(d_config->b_info.rdmforce) ? 1 : 0;
> +
> +    /* Nothing to do. */
> +    if ( !rdmforce && !d_config->num_pcidevs )
> +        return 0;
> +
> +    ret = xc_domain_device_setrdm(ctx->xch, dm_domid,
> +                                  (uint32_t)d_config->num_pcidevs,
> +                                  rdmforce,
> +                                  pcidevs);
> +    if ( d_config->num_pcidevs )
> +        free(pcidevs);
> +
> +    return ret;
> +}
> +
>  const libxl_vnc_info *libxl__dm_vnc(const libxl_domain_config *guest_config)
>  {
>      const libxl_vnc_info *vnc = NULL;
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index a38f695..be397a6 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -1477,6 +1477,10 @@ _hidden int libxl__need_xenpv_qemu(libxl__gc *gc,
>          int nr_disks, libxl_device_disk *disks,
>          int nr_channels, libxl_device_channel *channels);
>  
> +_hidden int libxl__domain_device_setrdm(libxl__gc *gc,
> +                                        libxl_domain_config *info,
> +                                        uint32_t domid);
> +
>  /*
>   * This function will cause the whole libxl process to hang
>   * if the device model does not respond.  It is deprecated.
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index f7fc695..0076a32 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -398,6 +398,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
>      ("kernel",           string),
>      ("cmdline",          string),
>      ("ramdisk",          string),
> +    ("rdmforce",         libxl_defbool),
>      ("u", KeyedUnion(None, libxl_domain_type, "type",
>                  [("hvm", Struct(None, [("firmware",         string),
>                                         ("bios",             libxl_bios_type),
> @@ -518,6 +519,7 @@ libxl_device_pci = Struct("device_pci", [
>      ("power_mgmt", bool),
>      ("permissive", bool),
>      ("seize", bool),
> +    ("rdmforce", bool),
>      ])
>  
>  libxl_device_vtpm = Struct("device_vtpm", [
> diff --git a/tools/libxl/libxlu_pci.c b/tools/libxl/libxlu_pci.c
> index 26fb143..989eac8 100644
> --- a/tools/libxl/libxlu_pci.c
> +++ b/tools/libxl/libxlu_pci.c
> @@ -143,6 +143,8 @@ int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char *str
>                      pcidev->permissive = atoi(tok);
>                  }else if ( !strcmp(optkey, "seize") ) {
>                      pcidev->seize = atoi(tok);
> +                }else if ( !strcmp(optkey, "rdmforce") ) {
> +                    pcidev->rdmforce = atoi(tok);
>                  }else{
>                      XLU__PCI_ERR(cfg, "Unknown PCI BDF option: %s", optkey);
>                  }
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index 0e754e7..9c23733 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -919,6 +919,7 @@ static void parse_config_data(const char *config_source,
>      int pci_msitranslate = 0;
>      int pci_permissive = 0;
>      int pci_seize = 0;
> +    int pci_rdmforce = 0;
>      int i, e;
>  
>      libxl_domain_create_info *c_info = &d_config->c_info;
> @@ -1699,6 +1700,9 @@ skip_vfb:
>      if (!xlu_cfg_get_long (config, "pci_seize", &l, 0))
>          pci_seize = l;
>  
> +    if (!xlu_cfg_get_long (config, "pci_rdmforce", &l, 0))
> +        pci_rdmforce = l;
> +
>      /* To be reworked (automatically enabled) once the auto ballooning
>       * after guest starts is done (with PCI devices passed in). */
>      if (c_info->type == LIBXL_DOMAIN_TYPE_PV) {
> @@ -1719,6 +1723,7 @@ skip_vfb:
>              pcidev->power_mgmt = pci_power_mgmt;
>              pcidev->permissive = pci_permissive;
>              pcidev->seize = pci_seize;
> +            pcidev->rdmforce = pci_rdmforce;
>              if (!xlu_pci_parse_bdf(config, pcidev, buf))
>                  d_config->num_pcidevs++;
>          }
> @@ -1726,6 +1731,11 @@ skip_vfb:
>              libxl_defbool_set(&b_info->u.pv.e820_host, true);
>      }
>  
> +    if ((c_info->type == LIBXL_DOMAIN_TYPE_HVM) && pci_rdmforce)
> +        libxl_defbool_set(&b_info->rdmforce, true);
> +    else
> +        libxl_defbool_set(&b_info->rdmforce, false);
> +
>      switch (xlu_cfg_get_list(config, "cpuid", &cpuids, 0, 1)) {
>      case 0:
>          {
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index 78c6977..ae924ad 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -34,6 +34,7 @@
>  #include <xen/tasklet.h>
>  #include <xsm/xsm.h>
>  #include <asm/msi.h>
> +#include <xen/stdbool.h>
>  
>  struct pci_seg {
>      struct list_head alldevs_list;
> @@ -1553,6 +1554,44 @@ int iommu_do_pci_domctl(
>          }
>          break;
>  
> +    case XEN_DOMCTL_set_rdm:
> +    {
> +        struct xen_domctl_set_rdm *xdsr = &domctl->u.set_rdm;
> +        struct xen_guest_pcidev_info *pcidevs = NULL;
> +        struct domain *d = rcu_lock_domain_by_any_id(domctl->domain);
> +
> +        if ( d == NULL )
> +            return -ESRCH;
> +

What if this is called on a PV domain?

You are also missing the XSM checks.

What if this is called multiple times? Is it OK to override 'pci_force', or
should it stick once set?


> +        d->arch.hvm_domain.pci_force =
> +                            xdsr->flags & PCI_DEV_RDM_CHECK ? true : false;

Won't we crash here if this is called for PV guests?

> +        d->arch.hvm_domain.num_pcidevs = xdsr->num_pcidevs;

What if 'num_pcidevs' has some bogus value? You need to check for that.


> +        d->arch.hvm_domain.pcidevs = NULL;

Please first free it. It might be that the toolstack
is doing this a couple of times. You don't want to leak memory.


> +
> +        if ( xdsr->num_pcidevs )
> +        {
> +            pcidevs = xmalloc_array(xen_guest_pcidev_info_t,
> +                                    xdsr->num_pcidevs);
> +            if ( pcidevs == NULL )
> +            {
> +                rcu_unlock_domain(d);
> +                return -ENOMEM;

But you have already set 'num_pcidevs' to some value. This copying/check
should be done before you modify 'd->arch.hvm_domain'...
> +            }
> +
> +            if ( copy_from_guest(pcidevs, xdsr->pcidevs,
> +                                 xdsr->num_pcidevs*sizeof(*pcidevs)) )
> +            {
> +                xfree(pcidevs);
> +                rcu_unlock_domain(d);

Ditto. You need to do these checks before you modify 'd->arch.hvm_domain'.
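
Putting those points together, the ordering I'd expect is roughly as below (a
sketch from my side, not the patch, and it only addresses the copy/ordering/leak
points above -- the PV-domain and XSM questions still stand; the MAX_RDM_PCIDEVS
bound is made up just to show that some sanity limit is needed):

    case XEN_DOMCTL_set_rdm:
    {
        struct xen_domctl_set_rdm *xdsr = &domctl->u.set_rdm;
        struct xen_guest_pcidev_info *pcidevs = NULL;
        struct domain *d = rcu_lock_domain_by_any_id(domctl->domain);

        if ( d == NULL )
            return -ESRCH;

        /* Sketch: validate and copy everything in first... */
        if ( xdsr->num_pcidevs > MAX_RDM_PCIDEVS )      /* made-up bound */
        {
            rcu_unlock_domain(d);
            return -EINVAL;
        }
        if ( xdsr->num_pcidevs )
        {
            pcidevs = xmalloc_array(xen_guest_pcidev_info_t, xdsr->num_pcidevs);
            if ( !pcidevs ||
                 copy_from_guest(pcidevs, xdsr->pcidevs, xdsr->num_pcidevs) )
            {
                xfree(pcidevs);
                rcu_unlock_domain(d);
                return pcidevs ? -EFAULT : -ENOMEM;
            }
        }

        /* ... and only then commit, freeing any previously installed array. */
        d->arch.hvm_domain.pci_force = !!(xdsr->flags & PCI_DEV_RDM_CHECK);
        d->arch.hvm_domain.num_pcidevs = xdsr->num_pcidevs;
        xfree(d->arch.hvm_domain.pcidevs);
        d->arch.hvm_domain.pcidevs = pcidevs;

        rcu_unlock_domain(d);
        break;
    }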

> +                return -EFAULT;
> +            }
> +        }
> +
> +        d->arch.hvm_domain.pcidevs = pcidevs;
> +        rcu_unlock_domain(d);
> +    }
> +        break;
> +
>      case XEN_DOMCTL_assign_device:
>          if ( unlikely(d->is_dying) )
>          {
> diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
> index 1152c3a..5e41e7a 100644
> --- a/xen/drivers/passthrough/vtd/dmar.c
> +++ b/xen/drivers/passthrough/vtd/dmar.c
> @@ -674,6 +674,14 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
>                          "  RMRR region: base_addr %"PRIx64
>                          " end_address %"PRIx64"\n",
>                          rmrru->base_address, rmrru->end_address);
> +            /*
> +             * TODO: we may provide a precise paramter just to reserve

s/paramter/parameter/
> +             * RMRR range specific to one device.
> +             */
> +            dprintk(XENLOG_WARNING VTDPREFIX,
> +                    "So please set pci_rdmforce to reserve these ranges"
> +                    " if you need such a device in hotplug case.\n");

'Please set rdmforce to reserve ranges %lx->%lx if you plan to hotplug this device.'

But then this is going to be a bit verbose, so perhaps:

'Ranges %lx-%lx need rdmforce to properly work.' ?

> +
>              acpi_register_rmrr_unit(rmrru);
>          }
>      }
> diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
> index 2757c7f..38530e5 100644
> --- a/xen/include/asm-x86/hvm/domain.h
> +++ b/xen/include/asm-x86/hvm/domain.h
> @@ -90,6 +90,10 @@ struct hvm_domain {
>      /* Cached CF8 for guest PCI config cycles */
>      uint32_t                pci_cf8;
>  

Maybe a comment explaining its purpose?

> +    bool_t                  pci_force;
> +    uint32_t                num_pcidevs;
> +    struct xen_guest_pcidev_info      *pcidevs;
> +

You are also not freeing this in the hypervisor when the guest is destroyed.
Please fix that.

>      struct pl_time         pl_time;
>  
>      struct hvm_io_handler *io_handler;
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index 57e2ed7..ba8970d 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -508,6 +508,25 @@ struct xen_domctl_get_device_group {
>  typedef struct xen_domctl_get_device_group xen_domctl_get_device_group_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_domctl_get_device_group_t);
>  
> +/* Currently just one bit to indicate force to check Reserved Device Memory. */

Not sure I understand. Did you mean:

'Check Reserved Device Memory'?

What happens if you do not have this flag? What are the semantics of this
hypercall - as in, what will it mean?

> +#define PCI_DEV_RDM_CHECK   0x1
> +struct xen_guest_pcidev_info {
> +    uint16_t    seg;
> +    uint8_t     bus;
> +    uint8_t     devfn;
> +    uint32_t    flags;
> +};
> +typedef struct xen_guest_pcidev_info xen_guest_pcidev_info_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_guest_pcidev_info_t);
> +/* Control whether/how we check and reserve device memory. */
> +struct xen_domctl_set_rdm {
> +    uint32_t    flags;

What is the purpose of this 'flags' field compared to 'pcidevs.flags'? Please
explain.

> +    uint32_t    num_pcidevs;
> +    XEN_GUEST_HANDLE_64(xen_guest_pcidev_info_t) pcidevs;
> +};
> +typedef struct xen_domctl_set_rdm xen_domctl_set_rdm_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_rdm_t);
> +
>  /* Pass-through interrupts: bind real irq -> hvm devfn. */
>  /* XEN_DOMCTL_bind_pt_irq */
>  /* XEN_DOMCTL_unbind_pt_irq */
> @@ -1070,6 +1089,7 @@ struct xen_domctl {
>  #define XEN_DOMCTL_setvnumainfo                  74
>  #define XEN_DOMCTL_psr_cmt_op                    75
>  #define XEN_DOMCTL_arm_configure_domain          76
> +#define XEN_DOMCTL_set_rdm                       77
>  #define XEN_DOMCTL_gdbsx_guestmemio            1000
>  #define XEN_DOMCTL_gdbsx_pausevcpu             1001
>  #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
> @@ -1135,6 +1155,7 @@ struct xen_domctl {
>          struct xen_domctl_gdbsx_domstatus   gdbsx_domstatus;
>          struct xen_domctl_vnuma             vnuma;
>          struct xen_domctl_psr_cmt_op        psr_cmt_op;
> +        struct xen_domctl_set_rdm           set_rdm;
>          uint8_t                             pad[128];
>      } u;
>  };
> diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
> index d48463f..5a760e2 100644
> --- a/xen/xsm/flask/hooks.c
> +++ b/xen/xsm/flask/hooks.c
> @@ -592,6 +592,7 @@ static int flask_domctl(struct domain *d, int cmd)
>      case XEN_DOMCTL_test_assign_device:
>      case XEN_DOMCTL_assign_device:
>      case XEN_DOMCTL_deassign_device:
> +    case XEN_DOMCTL_set_rdm:

There is more to XSM than just this file.

Please compile with XSM enabled.
>  #endif
>          return 0;


Also how does this work with 32-bit dom0s? Is there a need to use the
compat layer?

>  
> -- 
> 1.9.1
> 


* Re: [v8][PATCH 03/17] introduce XENMEM_reserved_device_memory_map
  2014-12-01  9:24 ` [v8][PATCH 03/17] introduce XENMEM_reserved_device_memory_map Tiejun Chen
@ 2014-12-02 19:47   ` Konrad Rzeszutek Wilk
  2014-12-08  6:17     ` Chen, Tiejun
  0 siblings, 1 reply; 106+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-12-02 19:47 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, jbeulich, yang.z.zhang

On Mon, Dec 01, 2014 at 05:24:21PM +0800, Tiejun Chen wrote:
> From: Jan Beulich <jbeulich@suse.com>
> 
> This is a prerequisite for punching holes into HVM and PVH guests' P2M
> to allow passing through devices that are associated with (on VT-d)
> RMRRs.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Acked-by: Kevin Tian <kevin.tian@intel.com>
> ---
>  xen/common/compat/memory.c           | 54 ++++++++++++++++++++++++++++++++++++
>  xen/common/memory.c                  | 51 ++++++++++++++++++++++++++++++++++
>  xen/drivers/passthrough/iommu.c      | 10 +++++++
>  xen/drivers/passthrough/vtd/dmar.c   | 17 ++++++++++++
>  xen/drivers/passthrough/vtd/extern.h |  1 +
>  xen/drivers/passthrough/vtd/iommu.c  |  1 +
>  xen/include/public/memory.h          | 24 +++++++++++++++-
>  xen/include/xen/iommu.h              |  4 +++
>  xen/include/xlat.lst                 |  3 +-
>  9 files changed, 163 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c
> index 06c90be..60512fa 100644
> --- a/xen/common/compat/memory.c
> +++ b/xen/common/compat/memory.c
> @@ -16,6 +16,37 @@ CHECK_TYPE(domid);
>  
>  CHECK_mem_access_op;
>  
> +#ifdef HAS_PASSTHROUGH
> +struct get_reserved_device_memory {
> +    struct compat_reserved_device_memory_map map;
> +    unsigned int used_entries;
> +};
> +
> +static int get_reserved_device_memory(xen_pfn_t start,
> +                                      xen_ulong_t nr, void *ctxt)
> +{
> +    struct get_reserved_device_memory *grdm = ctxt;
> +
> +    if ( grdm->used_entries < grdm->map.nr_entries )
> +    {
> +        struct compat_reserved_device_memory rdm = {
> +            .start_pfn = start, .nr_pages = nr
> +        };
> +
> +        if ( rdm.start_pfn != start || rdm.nr_pages != nr )
> +            return -ERANGE;
> +
> +        if ( __copy_to_compat_offset(grdm->map.buffer, grdm->used_entries,
> +                                     &rdm, 1) )
> +            return -EFAULT;
> +    }
> +
> +    ++grdm->used_entries;
> +
> +    return 0;
> +}
> +#endif
> +
>  int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
>  {
>      int split, op = cmd & MEMOP_CMD_MASK;
> @@ -273,6 +304,29 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
>              break;
>          }
>  
> +#ifdef HAS_PASSTHROUGH
> +        case XENMEM_reserved_device_memory_map:
> +        {
> +            struct get_reserved_device_memory grdm;
> +
> +            if ( copy_from_guest(&grdm.map, compat, 1) ||
> +                 !compat_handle_okay(grdm.map.buffer, grdm.map.nr_entries) )
> +                return -EFAULT;
> +
> +            grdm.used_entries = 0;
> +            rc = iommu_get_reserved_device_memory(get_reserved_device_memory,
> +                                                  &grdm);
> +
> +            if ( !rc && grdm.map.nr_entries < grdm.used_entries )
> +                rc = -ENOBUFS;
> +            grdm.map.nr_entries = grdm.used_entries;
> +            if ( __copy_to_guest(compat, &grdm.map, 1) )
> +                rc = -EFAULT;
> +
> +            return rc;
> +        }
> +#endif
> +
>          default:
>              return compat_arch_memory_op(cmd, compat);
>          }
> diff --git a/xen/common/memory.c b/xen/common/memory.c
> index 9f21bd3..4788acc 100644
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -692,6 +692,34 @@ out:
>      return rc;
>  }
>  
> +#ifdef HAS_PASSTHROUGH
> +struct get_reserved_device_memory {
> +    struct xen_reserved_device_memory_map map;
> +    unsigned int used_entries;
> +};
> +
> +static int get_reserved_device_memory(xen_pfn_t start,
> +                                      xen_ulong_t nr, void *ctxt)
> +{
> +    struct get_reserved_device_memory *grdm = ctxt;
> +
> +    if ( grdm->used_entries < grdm->map.nr_entries )
> +    {
> +        struct xen_reserved_device_memory rdm = {
> +            .start_pfn = start, .nr_pages = nr
> +        };
> +
> +        if ( __copy_to_guest_offset(grdm->map.buffer, grdm->used_entries,
> +                                    &rdm, 1) )
> +            return -EFAULT;
> +    }
> +
> +    ++grdm->used_entries;
> +
> +    return 0;
> +}
> +#endif
> +
>  long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>  {
>      struct domain *d;
> @@ -1101,6 +1129,29 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          break;
>      }
>  
> +#ifdef HAS_PASSTHROUGH
> +    case XENMEM_reserved_device_memory_map:
> +    {
> +        struct get_reserved_device_memory grdm;
> +
> +        if ( copy_from_guest(&grdm.map, arg, 1) ||
> +             !guest_handle_okay(grdm.map.buffer, grdm.map.nr_entries) )
> +            return -EFAULT;
> +

Shouldn't there be an XSM check here?

> +        grdm.used_entries = 0;
> +        rc = iommu_get_reserved_device_memory(get_reserved_device_memory,
> +                                              &grdm);
> +

Also, since we are iterating over a possibly large nr_entries, should we think
about returning -EAGAIN to user-space so that it can retry? (As in, have
preemption baked into this hypercall.)
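
Not a concrete proposal, but the shape I have in mind is roughly (sketch; the
exact error value and retry protocol would still need to be pinned down):

    static int get_reserved_device_memory(xen_pfn_t start,
                                          xen_ulong_t nr, void *ctxt)
    {
        struct get_reserved_device_memory *grdm = ctxt;

        /* Sketch: bail out when a softirq is pending and let the caller
         * retry the subop, instead of walking an arbitrarily long list
         * without any preemption point. */
        if ( hypercall_preempt_check() )
            return -EAGAIN;

        /* ... existing copy-out logic unchanged ... */
        ++grdm->used_entries;
        return 0;
    }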

> +        if ( !rc && grdm.map.nr_entries < grdm.used_entries )
> +            rc = -ENOBUFS;
> +        grdm.map.nr_entries = grdm.used_entries;
> +        if ( __copy_to_guest(arg, &grdm.map, 1) )
> +            rc = -EFAULT;
> +
> +        break;
> +    }
> +#endif
> +
>      default:
>          rc = arch_memory_op(cmd, arg);
>          break;
> diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
> index cc12735..7c17e8d 100644
> --- a/xen/drivers/passthrough/iommu.c
> +++ b/xen/drivers/passthrough/iommu.c
> @@ -344,6 +344,16 @@ void iommu_crash_shutdown(void)
>      iommu_enabled = iommu_intremap = 0;
>  }
>  
> +int iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
> +{
> +    const struct iommu_ops *ops = iommu_get_ops();
> +
> +    if ( !iommu_enabled || !ops->get_reserved_device_memory )
> +        return 0;
> +
> +    return ops->get_reserved_device_memory(func, ctxt);
> +}
> +
>  bool_t iommu_has_feature(struct domain *d, enum iommu_feature feature)
>  {
>      const struct hvm_iommu *hd = domain_hvm_iommu(d);
> diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
> index 5e41e7a..86cfad3 100644
> --- a/xen/drivers/passthrough/vtd/dmar.c
> +++ b/xen/drivers/passthrough/vtd/dmar.c
> @@ -901,3 +901,20 @@ int platform_supports_x2apic(void)
>      unsigned int mask = ACPI_DMAR_INTR_REMAP | ACPI_DMAR_X2APIC_OPT_OUT;
>      return cpu_has_x2apic && ((dmar_flags & mask) == ACPI_DMAR_INTR_REMAP);
>  }
> +
> +int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
> +{
> +    struct acpi_rmrr_unit *rmrr;
> +    int rc = 0;
> +
> +    list_for_each_entry(rmrr, &acpi_rmrr_units, list)
> +    {
> +        rc = func(PFN_DOWN(rmrr->base_address),
> +                  PFN_UP(rmrr->end_address) - PFN_DOWN(rmrr->base_address),
> +                  ctxt);
> +        if ( rc )
> +            break;
> +    }
> +
> +    return rc;
> +}
> diff --git a/xen/drivers/passthrough/vtd/extern.h b/xen/drivers/passthrough/vtd/extern.h
> index 5524dba..f9ee9b0 100644
> --- a/xen/drivers/passthrough/vtd/extern.h
> +++ b/xen/drivers/passthrough/vtd/extern.h
> @@ -75,6 +75,7 @@ int domain_context_mapping_one(struct domain *domain, struct iommu *iommu,
>                                 u8 bus, u8 devfn, const struct pci_dev *);
>  int domain_context_unmap_one(struct domain *domain, struct iommu *iommu,
>                               u8 bus, u8 devfn);
> +int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt);
>  
>  unsigned int io_apic_read_remap_rte(unsigned int apic, unsigned int reg);
>  void io_apic_write_remap_rte(unsigned int apic,
> diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
> index 19d8165..a38f201 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -2491,6 +2491,7 @@ const struct iommu_ops intel_iommu_ops = {
>      .crash_shutdown = vtd_crash_shutdown,
>      .iotlb_flush = intel_iommu_iotlb_flush,
>      .iotlb_flush_all = intel_iommu_iotlb_flush_all,
> +    .get_reserved_device_memory = intel_iommu_get_reserved_device_memory,
>      .dump_p2m_table = vtd_dump_p2m_table,
>  };
>  
> diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
> index 595f953..cee4535 100644
> --- a/xen/include/public/memory.h
> +++ b/xen/include/public/memory.h
> @@ -572,7 +572,29 @@ struct xen_vnuma_topology_info {
>  typedef struct xen_vnuma_topology_info xen_vnuma_topology_info_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_vnuma_topology_info_t);
>  
> -/* Next available subop number is 27 */
> +/*
> + * For legacy reasons, some devices must be configured with special memory
> + * regions to function correctly.  The guest must avoid using any of these
> + * regions.
> + */
> +#define XENMEM_reserved_device_memory_map   27
> +struct xen_reserved_device_memory {
> +    xen_pfn_t start_pfn;
> +    xen_ulong_t nr_pages;
> +};
> +typedef struct xen_reserved_device_memory xen_reserved_device_memory_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_reserved_device_memory_t);
> +
> +struct xen_reserved_device_memory_map {
> +    /* IN/OUT */
> +    unsigned int nr_entries;
> +    /* OUT */
> +    XEN_GUEST_HANDLE(xen_reserved_device_memory_t) buffer;
> +};
> +typedef struct xen_reserved_device_memory_map xen_reserved_device_memory_map_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_reserved_device_memory_map_t);
> +
> +/* Next available subop number is 28 */
>  
>  #endif /* __XEN_PUBLIC_MEMORY_H__ */
>  
> diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
> index 8eb764a..409f6f8 100644
> --- a/xen/include/xen/iommu.h
> +++ b/xen/include/xen/iommu.h
> @@ -120,6 +120,8 @@ void iommu_dt_domain_destroy(struct domain *d);
>  
>  struct page_info;
>  
> +typedef int iommu_grdm_t(xen_pfn_t start, xen_ulong_t nr, void *ctxt);
> +
>  struct iommu_ops {
>      int (*init)(struct domain *d);
>      void (*hwdom_init)(struct domain *d);
> @@ -156,12 +158,14 @@ struct iommu_ops {
>      void (*crash_shutdown)(void);
>      void (*iotlb_flush)(struct domain *d, unsigned long gfn, unsigned int page_count);
>      void (*iotlb_flush_all)(struct domain *d);
> +    int (*get_reserved_device_memory)(iommu_grdm_t *, void *);
>      void (*dump_p2m_table)(struct domain *d);
>  };
>  
>  void iommu_suspend(void);
>  void iommu_resume(void);
>  void iommu_crash_shutdown(void);
> +int iommu_get_reserved_device_memory(iommu_grdm_t *, void *);
>  
>  void iommu_share_p2m_table(struct domain *d);
>  
> diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
> index 41b3e35..42229fd 100644
> --- a/xen/include/xlat.lst
> +++ b/xen/include/xlat.lst
> @@ -61,9 +61,10 @@
>  !	memory_exchange			memory.h
>  !	memory_map			memory.h
>  !	memory_reservation		memory.h
> -?	mem_access_op		memory.h
> +?	mem_access_op			memory.h
>  !	pod_target			memory.h
>  !	remove_from_physmap		memory.h
> +!	reserved_device_memory_map	memory.h
>  ?	physdev_eoi			physdev.h
>  ?	physdev_get_free_pirq		physdev.h
>  ?	physdev_irq			physdev.h
> -- 
> 1.9.1
> 


* Re: [v8][PATCH 05/17] tools/libxc: introduce hypercall for xc_reserved_device_memory_map
  2014-12-01  9:24 ` [v8][PATCH 05/17] tools/libxc: introduce hypercall for xc_reserved_device_memory_map Tiejun Chen
  2014-12-02  8:46   ` Tian, Kevin
@ 2014-12-02 19:50   ` Konrad Rzeszutek Wilk
  2014-12-08  7:25     ` Chen, Tiejun
  1 sibling, 1 reply; 106+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-12-02 19:50 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, jbeulich, yang.z.zhang

On Mon, Dec 01, 2014 at 05:24:23PM +0800, Tiejun Chen wrote:
> We will introduce that hypercall xc_reserved_device_memory_map
> approach to libxc.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  tools/libxc/include/xenctrl.h |  5 +++++
>  tools/libxc/xc_domain.c       | 30 ++++++++++++++++++++++++++++++
>  2 files changed, 35 insertions(+)
> 
> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> index 84012fe..a3aeac3 100644
> --- a/tools/libxc/include/xenctrl.h
> +++ b/tools/libxc/include/xenctrl.h
> @@ -1294,6 +1294,11 @@ int xc_domain_set_memory_map(xc_interface *xch,
>  int xc_get_machine_memory_map(xc_interface *xch,
>                                struct e820entry entries[],
>                                uint32_t max_entries);
> +
> +int xc_reserved_device_memory_map(xc_interface *xch,
> +                                  uint32_t dom,
> +                                  struct xen_reserved_device_memory entries[],
> +                                  uint32_t *max_entries);
>  #endif
>  int xc_domain_set_time_offset(xc_interface *xch,
>                                uint32_t domid,
> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> index 7fd43e9..09fd988 100644
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> @@ -679,6 +679,36 @@ int xc_domain_set_memory_map(xc_interface *xch,
>  
>      return rc;
>  }
> +
> +int xc_reserved_device_memory_map(xc_interface *xch,
> +                                  uint32_t domid,
> +                                  struct xen_reserved_device_memory entries[],
> +                                  uint32_t *max_entries)
> +{
> +    int rc;
> +    struct xen_reserved_device_memory_map xrdmmap = {
> +        .domid = domid,
> +        .nr_entries = *max_entries
> +    };
> +    DECLARE_HYPERCALL_BOUNCE(entries,
> +                             sizeof(struct xen_reserved_device_memory) *
> +                             *max_entries, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> +
> +    if ( xc_hypercall_bounce_pre(xch, entries) )
> +        return -1;
> +
> +    set_xen_guest_handle(xrdmmap.buffer, entries);
> +
> +    rc = do_memory_op(xch, XENMEM_reserved_device_memory_map,
> +                      &xrdmmap, sizeof(xrdmmap));
> +
> +    xc_hypercall_bounce_post(xch, entries);
> +
> +    *max_entries = xrdmmap.nr_entries;
> +

I would bake the -EAGAIN handling in here, i.e. retry in a loop.

See how xc_domain_destroy does it.
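Something along these lines (untested sketch; assuming the memory op
surfaces EAGAIN via errno the same way do_domctl does):

    /* Retry while the hypervisor asks us to, cf. xc_domain_destroy(). */
    do {
        rc = do_memory_op(xch, XENMEM_reserved_device_memory_map,
                          &xrdmmap, sizeof(xrdmmap));
    } while ( rc < 0 && errno == EAGAIN );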
> +    return rc ? rc : xrdmmap.nr_entries;
> +}
> +
>  int xc_get_machine_memory_map(xc_interface *xch,
>                                struct e820entry entries[],
>                                uint32_t max_entries)
> -- 
> 1.9.1
> 


* Re: [v8][PATCH 06/17] tools/libxc: check if modules space is overlapping with reserved device memory
  2014-12-01  9:24 ` [v8][PATCH 06/17] tools/libxc: check if modules space is overlapping with reserved device memory Tiejun Chen
  2014-12-02  8:54   ` Tian, Kevin
@ 2014-12-02 19:55   ` Konrad Rzeszutek Wilk
  2014-12-08  7:49     ` Chen, Tiejun
  1 sibling, 1 reply; 106+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-12-02 19:55 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, jbeulich, yang.z.zhang

On Mon, Dec 01, 2014 at 05:24:24PM +0800, Tiejun Chen wrote:
> In case of reserved device memory overlapping with ram, it also probably

s/also//
> overlap with modules space so we need to check these reserved device
s/overlap/overlaps/

What is 'modules space'?

> memory as well.

s/reserved device memory/E820_RSV/ ?

> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  tools/libxc/xc_hvm_build_x86.c | 94 +++++++++++++++++++++++++++++++++++-------
>  1 file changed, 79 insertions(+), 15 deletions(-)
> 
> diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> index c81a25b..ddcf06d 100644
> --- a/tools/libxc/xc_hvm_build_x86.c
> +++ b/tools/libxc/xc_hvm_build_x86.c
> @@ -54,9 +54,82 @@
>  
>  #define VGA_HOLE_SIZE (0x20)
>  
> +/*
> + * Check whether there exists mmio hole in the specified memory range.
> + * Returns 1 if exists, else returns 0.
> + */
> +static int check_mmio_hole(uint64_t start, uint64_t memsize,
> +                           uint64_t mmio_start, uint64_t mmio_size)
> +{
> +    if ( start + memsize <= mmio_start || start >= mmio_start + mmio_size )
> +        return 0;
> +    else
> +        return 1;
> +}
> +
> +/* Getting all reserved device memory map info. */
> +static struct xen_reserved_device_memory
> +*xc_get_reserved_device_memory_map(xc_interface *xch, unsigned int nr_entries,
> +                                   uint32_t dom)
> +{
> +    struct xen_reserved_device_memory *xrdm = NULL;
> +    int rc = xc_reserved_device_memory_map(xch, dom, xrdm, &nr_entries);
> +
> +    if ( rc < 0 )
> +    {
> +        if ( errno == ENOBUFS )
> +        {
> +            if ( (xrdm = malloc(nr_entries *
> +                                sizeof(xen_reserved_device_memory_t))) == NULL )
> +            {
> +                PERROR("Could not allocate memory.");
> +                return 0;
> +            }
> +            rc = xc_reserved_device_memory_map(xch, dom, xrdm, &nr_entries);
> +            if ( rc )
> +            {
> +                PERROR("Could not get reserved device memory maps.");
> +                free(xrdm);
> +                return 0;

Uhhh, is that the right error to return?

Don't you mean ERR_PTR logic? Or 'return NULL' ?


> +            }
> +        }
> +        else
> +            PERROR("Could not get reserved device memory maps.");
> +    }
> +
> +    return xrdm;
> +}
> +
> +static int xc_check_modules_space(xc_interface *xch, uint64_t *mstart_out,
> +                                  uint64_t *mend_out, uint32_t dom)
> +{
> +    unsigned int i = 0, nr_entries = 0;
> +    uint64_t rdm_start = 0, rdm_end = 0;
> +    struct xen_reserved_device_memory *rdm_map =
> +                        xc_get_reserved_device_memory_map(xch, nr_entries, dom);
> +

You need to check whether 'rdm_map' is NULL.
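E.g. bail out right after the call (sketch; glossing over how to tell
"no entries" from "failed"):

    if ( rdm_map == NULL )
        return -1;    /* can't check for overlaps without the map */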

> +    for ( i = 0; i < nr_entries; i++ )
> +    {
> +        rdm_start = (uint64_t)rdm_map[i].start_pfn << XC_PAGE_SHIFT;
> +        rdm_end = rdm_start + ((uint64_t)rdm_map[i].nr_pages << XC_PAGE_SHIFT);
> +
> +        /* Just use check_mmio_hole() to check modules ranges. */
> +        if ( check_mmio_hole(rdm_start,
> +                             rdm_end - rdm_start,
> +                             *mstart_out, *mend_out) )
> +            return -1;
> +    }
> +    
> +    free(rdm_map);
> +
> +    return 0;
> +}
> +
>  static int modules_init(struct xc_hvm_build_args *args,
>                          uint64_t vend, struct elf_binary *elf,
> -                        uint64_t *mstart_out, uint64_t *mend_out)
> +                        uint64_t *mstart_out, uint64_t *mend_out,
> +                        xc_interface *xch,
> +                        uint32_t dom)
>  {
>  #define MODULE_ALIGN 1UL << 7
>  #define MB_ALIGN     1UL << 20
> @@ -80,6 +153,10 @@ static int modules_init(struct xc_hvm_build_args *args,
>      if ( *mend_out > vend )    
>          return -1;
>  
> +    /* Is it overlapping with reserved device memory? */
> +    if ( xc_check_modules_space(xch, mstart_out, mend_out, dom) )
> +        return -1;
> +
>      if ( args->acpi_module.length != 0 )
>          args->acpi_module.guest_addr_out = *mstart_out;
>      if ( args->smbios_module.length != 0 )
> @@ -226,19 +303,6 @@ static int loadmodules(xc_interface *xch,
>      return rc;
>  }
>  
> -/*
> - * Check whether there exists mmio hole in the specified memory range.
> - * Returns 1 if exists, else returns 0.
> - */
> -static int check_mmio_hole(uint64_t start, uint64_t memsize,
> -                           uint64_t mmio_start, uint64_t mmio_size)
> -{
> -    if ( start + memsize <= mmio_start || start >= mmio_start + mmio_size )
> -        return 0;
> -    else
> -        return 1;
> -}
> -

This movement of 'check_mmio_hole' needs to be a seperate patch.

>  static int setup_guest(xc_interface *xch,
>                         uint32_t dom, struct xc_hvm_build_args *args,
>                         char *image, unsigned long image_size)
> @@ -282,7 +346,7 @@ static int setup_guest(xc_interface *xch,
>          goto error_out;
>      }
>  
> -    if ( modules_init(args, v_end, &elf, &m_start, &m_end) != 0 )
> +    if ( modules_init(args, v_end, &elf, &m_start, &m_end, xch, dom) != 0 )
>      {
>          ERROR("Insufficient space to load modules.");
>          goto error_out;
> -- 
> 1.9.1
> 


* Re: [v8][PATCH 07/17] hvmloader/util: get reserved device memory maps
  2014-12-01  9:24 ` [v8][PATCH 07/17] hvmloader/util: get reserved device memory maps Tiejun Chen
  2014-12-02  8:59   ` Tian, Kevin
@ 2014-12-02 20:01   ` Konrad Rzeszutek Wilk
  2014-12-08  8:09     ` Chen, Tiejun
  2014-12-04 15:52   ` Jan Beulich
  2 siblings, 1 reply; 106+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-12-02 20:01 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, jbeulich, yang.z.zhang

On Mon, Dec 01, 2014 at 05:24:25PM +0800, Tiejun Chen wrote:
> We need to use reserved device memory maps with multiple times, so
> provide just one common function should be friend.

We need to call the reserved device memory map hypercall
(XENMEM_reserved_device_memory_map) multiple times, hence provide one common function.

> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  tools/firmware/hvmloader/util.c | 59 +++++++++++++++++++++++++++++++++++++++++
>  tools/firmware/hvmloader/util.h |  2 ++
>  2 files changed, 61 insertions(+)
> 
> diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
> index 80d822f..dd81fb6 100644
> --- a/tools/firmware/hvmloader/util.c
> +++ b/tools/firmware/hvmloader/util.c
> @@ -22,11 +22,14 @@
>  #include "config.h"
>  #include "hypercall.h"
>  #include "ctype.h"
> +#include "errno.h"
>  #include <stdint.h>
>  #include <xen/xen.h>
>  #include <xen/memory.h>
>  #include <xen/sched.h>
>  
> +struct xen_reserved_device_memory *rdm_map;
> +
>  void wrmsr(uint32_t idx, uint64_t v)
>  {
>      asm volatile (
> @@ -828,6 +831,62 @@ int hpet_exists(unsigned long hpet_base)
>      return ((hpet_id >> 16) == 0x8086);
>  }
>  
> +static int
> +get_reserved_device_memory_map(struct xen_reserved_device_memory entries[],
> +                               uint32_t *max_entries)
> +{
> +    int rc;
> +    struct xen_reserved_device_memory_map xrdmmap = {
> +        .domid = DOMID_SELF,
> +        .nr_entries = *max_entries
> +    };
> +
> +    set_xen_guest_handle(xrdmmap.buffer, entries);
> +
> +    rc = hypercall_memory_op(XENMEM_reserved_device_memory_map, &xrdmmap);
> +    *max_entries = xrdmmap.nr_entries;

Don't you want to check rc before altering 'max_entries'?
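I.e. something like (sketch; keeping the -ENOBUFS case, since that is how
the caller learns the required buffer size):

    rc = hypercall_memory_op(XENMEM_reserved_device_memory_map, &xrdmmap);
    /* Only pass a size back when the hypervisor actually reported one. */
    if ( rc == 0 || rc == -ENOBUFS )
        *max_entries = xrdmmap.nr_entries;

    return rc;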

> +
> +    return rc;
> +}
> +
> +/*
> + * Getting all reserved device memory map info in case of hvmloader.
> + * We just return zero for any failed cases, and this means we
> + * can't further handle any reserved device memory.

That does not sound like the right error value. Why not a proper
return value? At worst you can put 'nr_entries' as a parameter
and return the error value.
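I.e. a shape along these lines (sketch only):

    /* Returns 0 on success (*nr_entries filled in), -errno on failure. */
    int hvm_get_reserved_device_memory_map(unsigned int *nr_entries);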

> + */
> +unsigned int hvm_get_reserved_device_memory_map(void)
> +{
> +    static unsigned int nr_entries = 0;
> +    int rc = get_reserved_device_memory_map(rdm_map, &nr_entries);
> +
> +    if ( rc == -ENOBUFS )
> +    {
> +        rdm_map = mem_alloc(nr_entries*sizeof(struct xen_reserved_device_memory),

That '*' being squashed looks wrong. Just make it bigger and don't worry about
the 80-column limit.

> +                            0);
> +        if ( rdm_map )
> +        {
> +            rc = get_reserved_device_memory_map(rdm_map, &nr_entries);
> +            if ( rc )
> +            {
> +                printf("Could not get reserved dev memory info on domain");
> +                return 0;
> +            }
> +        }
> +        else
> +        {
> +            printf("No space to get reserved dev memory maps!\n");
> +            return 0;
> +        }
> +    }
> +    else if ( rc )
> +    {
> +        printf("Could not get reserved dev memory info on domain");
> +        return 0;
> +    }
> +
> +    return nr_entries;
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
> index a70e4aa..e4f1851 100644
> --- a/tools/firmware/hvmloader/util.h
> +++ b/tools/firmware/hvmloader/util.h
> @@ -241,6 +241,8 @@ int build_e820_table(struct e820entry *e820,
>                       unsigned int bios_image_base);
>  void dump_e820_table(struct e820entry *e820, unsigned int nr);
>  
> +unsigned int hvm_get_reserved_device_memory_map(void);
> +
>  #ifndef NDEBUG
>  void perform_tests(void);
>  #else
> -- 
> 1.9.1
> 


* Re: [v8][PATCH 09/17] hvmloader/ram: check if guest memory is out of reserved device memory maps
  2014-12-01  9:24 ` [v8][PATCH 09/17] hvmloader/ram: check if guest memory is out of reserved device memory maps Tiejun Chen
  2014-12-02  9:42   ` Tian, Kevin
@ 2014-12-02 20:17   ` Konrad Rzeszutek Wilk
  2014-12-04 16:20   ` Jan Beulich
  2 siblings, 0 replies; 106+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-12-02 20:17 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, jbeulich, yang.z.zhang

On Mon, Dec 01, 2014 at 05:24:27PM +0800, Tiejun Chen wrote:
> We need to check to reserve all reserved device memory maps in e820
> to avoid any potential guest memory conflict.
> 
> Currently, if we can't insert RDM entries directly, we may need to handle
> several ranges as follows:

s/several/two/

s/follows/follow/

> a. Fixed Ranges --> BUG()
>  lowmem_reserved_base-0xA0000: reserved by BIOS implementation,
>  BIOS region,
>  RESERVED_MEMBASE ~ 0x100000000,

I am not sure what you are trying to say here. Could you explain it 
a bit more please?

> b. RAM or RAM:Hole -> Try to reserve

Reading the beginning of the 'Currently' paragraph, this says:

we may need to handle RAM or RAM:Hole -> Try to reserve.

I don't know what 'RAM:Hole' means. And instead of using '->' you can
say: we will try to reserve.

But what are we reserving? Are we reserving it as an E820_RSV or just
as a hole? What about the RAM behind it? Are we gulping up the RAM regions
(as in losing them) or are we moving the RAM regions (GPFNs) somewhere else?

> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  tools/firmware/hvmloader/e820.c | 168 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 168 insertions(+)
> 
> diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
> index 2e05e93..ef87e41 100644
> --- a/tools/firmware/hvmloader/e820.c
> +++ b/tools/firmware/hvmloader/e820.c
> @@ -22,6 +22,7 @@
>  
>  #include "config.h"
>  #include "util.h"
> +#include <xen/memory.h>
>  
>  void dump_e820_table(struct e820entry *e820, unsigned int nr)
>  {
> @@ -68,12 +69,173 @@ void dump_e820_table(struct e820entry *e820, unsigned int nr)
>      }
>  }
>  
> +extern struct xen_reserved_device_memory *rdm_map;
> +static unsigned int construct_rdm_e820_maps(unsigned int next_e820_entry_index,

s/next_e820_entry_index/next_entry/ ?

> +                                            uint32_t nr_map,
> +                                            struct xen_reserved_device_memory *map,
> +                                            struct e820entry *e820,
> +                                            unsigned int lowmem_reserved_base,
> +                                            unsigned int bios_image_base)
> +{
> +    unsigned int i, j, sum_nr;
> +    uint64_t start, end, next_start, rdm_start, rdm_end;
> +    uint32_t type;
> +    int err = 0;
> +
> +    for ( i = 0; i < nr_map; i++ )
> +    {
> +        rdm_start = (uint64_t)map[i].start_pfn << PAGE_SHIFT;
> +        rdm_end = rdm_start + ((uint64_t)map[i].nr_pages << PAGE_SHIFT);
> +
> +        for ( j = 0; j < next_e820_entry_index - 1; j++ )
> +        {
> +            sum_nr = next_e820_entry_index + nr_map;
> +            start = e820[j].addr;
> +            end = e820[j].addr + e820[j].size;
> +            type = e820[j].type;
> +            next_start = e820[j+1].addr;
> +
> +            if ( rdm_start >= start && rdm_start <= end )
> +            {
> +                /*
> +                 * lowmem_reserved_base-0xA0000: reserved by BIOS
> +                 * implementation.
> +                 * Or BIOS region.
> +                 */
> +                if ( (lowmem_reserved_base < 0xA0000 &&
> +                        start == lowmem_reserved_base) ||
> +                     start == bios_image_base )

Something is off with your spacing here.
> +                {
> +                    err = -1;
> +                    break;

Keep in mind we will just break out of this loop. Do you want to add
this at the end of the loop:


if (err)
	break;

> +                }
> +            }
> +
> +            /* Just amid those remaining e820 entries. */
> +            if ( (rdm_start > end) && (rdm_end < next_start) )
> +            {
> +                memmove(&e820[j+2], &e820[j+1],
> +                        (sum_nr - j - 1) * sizeof(struct e820entry));

What if there is something at j+2? Should we have a
j+2 < E820_MAX check somewhere?

This whole 'memmove' logic is making me a bit worried.

Would it be easier to have this logic inside build_e820_table so
that it could construct the e820 with this information right away?

Or if that was deemed incorrect could you explain that in the
commit description?

> +
> +                /* Then fill RMRR into that entry. */
> +                e820[j+1].addr = rdm_start;
> +                e820[j+1].size = rdm_end - rdm_start;
> +                e820[j+1].type = E820_RESERVED;
> +                next_e820_entry_index++;
> +                continue;
> +            }
> +
> +            /* Already at the end. */
> +            if ( (rdm_start > end) && !next_start )
> +            {
> +                e820[next_e820_entry_index].addr = rdm_start;
> +                e820[next_e820_entry_index].size = rdm_end - rdm_start;
> +                e820[next_e820_entry_index].type = E820_RESERVED;
> +                next_e820_entry_index++;
> +                continue;
> +            }
> +
> +            if ( type == E820_RAM )
> +            {
> +                /* If coincide with one RAM range. */
> +                if ( rdm_start == start && rdm_end == end)
> +                {
> +                    e820[j].type = E820_RESERVED;
> +                    continue;
> +                }
> +
> +                /* If we're just aligned with start of one RAM range. */
> +                if ( rdm_start == start && rdm_end < end )
> +                {
> +                    memmove(&e820[j+1], &e820[j],
> +                            (sum_nr - j) * sizeof(struct e820entry));
> +
> +                    e820[j+1].addr = rdm_end;
> +                    e820[j+1].size = e820[j].addr + e820[j].size - rdm_end;
> +                    e820[j+1].type = E820_RAM;
> +                    next_e820_entry_index++;
> +
> +                    e820[j].addr = rdm_start;
> +                    e820[j].size = rdm_end - rdm_start;
> +                    e820[j].type = E820_RESERVED;
> +                    continue;
> +                }
> +
> +                /* If we're just aligned with end of one RAM range. */
> +                if ( rdm_start > start && rdm_end == end )
> +                {
> +                    memmove(&e820[j+1], &e820[j],
> +                            (sum_nr - j) * sizeof(struct e820entry));
> +
> +                    e820[j].size = rdm_start - e820[j].addr;
> +                    e820[j].type = E820_RAM;
> +
> +                    e820[j+1].addr = rdm_start;
> +                    e820[j+1].size = rdm_end - rdm_start;
> +                    e820[j+1].type = E820_RESERVED;
> +                    next_e820_entry_index++;
> +                    continue;
> +                }
> +
> +                /* If we're just in of one RAM range */
> +                if ( rdm_start > start && rdm_end < end )
> +                {
> +                    memmove(&e820[j+2], &e820[j],
> +                            (sum_nr - j) * sizeof(struct e820entry));
> +
> +                    e820[j+2].addr = rdm_end;
> +                    e820[j+2].size = e820[j].addr + e820[j].size - rdm_end;
> +                    e820[j+2].type = E820_RAM;
> +                    next_e820_entry_index++;
> +
> +                    e820[j+1].addr = rdm_start;
> +                    e820[j+1].size = rdm_end - rdm_start;
> +                    e820[j+1].type = E820_RESERVED;
> +                    next_e820_entry_index++;
> +
> +                    e820[j].size = rdm_start - e820[j].addr;
> +                    e820[j].type = E820_RAM;
> +                    continue;
> +                }
> +
> +                /* If we're going last RAM:Hole range */
> +                if ( end < next_start && rdm_start > start &&
> +                     rdm_end < next_start )
> +                {
> +                    memmove(&e820[j+1], &e820[j],
> +                            (sum_nr - j) * sizeof(struct e820entry));
> +
> +                    e820[j].size = rdm_start - e820[j].addr;
> +                    e820[j].type = E820_RAM;
> +
> +                    e820[j+1].addr = rdm_start;
> +                    e820[j+1].size = rdm_end - rdm_start;
> +                    e820[j+1].type = E820_RESERVED;
> +                    next_e820_entry_index++;
> +                    continue;
> +                }
> +            }
> +        }
> +    }
> +
> +    /* These overlap may issue guest can't work well. */
> +    if ( err )
> +    {
> +        printf("Guest can't work with some reserved device memory overlap!\n");
> +        BUG();
> +    }
> +
> +    /* Fine to construct RDM mappings into e820. */
> +    return next_e820_entry_index;
> +}
> +
>  /* Create an E820 table based on memory parameters provided in hvm_info. */
>  int build_e820_table(struct e820entry *e820,
>                       unsigned int lowmem_reserved_base,
>                       unsigned int bios_image_base)
>  {
>      unsigned int nr = 0;
> +    unsigned int nr_entries = 0;
>  
>      if ( !lowmem_reserved_base )
>              lowmem_reserved_base = 0xA0000;
> @@ -169,6 +331,12 @@ int build_e820_table(struct e820entry *e820,
>          nr++;
>      }
>  
> +    nr_entries = hvm_get_reserved_device_memory_map();
> +    if ( nr_entries )
> +        nr = construct_rdm_e820_maps(nr, nr_entries, rdm_map, e820,
> +                                     lowmem_reserved_base,
> +                                     bios_image_base);
> +
>      return nr;
>  }
>  
> -- 
> 1.9.1
> 


* Re: [v8][PATCH 10/17] hvmloader/mem_hole_alloc: skip any overlap with reserved device memory
  2014-12-01  9:24 ` [v8][PATCH 10/17] hvmloader/mem_hole_alloc: skip any overlap with reserved device memory Tiejun Chen
  2014-12-02  9:48   ` Tian, Kevin
@ 2014-12-02 20:23   ` Konrad Rzeszutek Wilk
  2014-12-04 16:28   ` Jan Beulich
  2 siblings, 0 replies; 106+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-12-02 20:23 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, jbeulich, yang.z.zhang

On Mon, Dec 01, 2014 at 05:24:28PM +0800, Tiejun Chen wrote:
> In some cases like igd_opregion_pgbase, guest will use mem_hole_alloc
> to allocate some memory to use in runtime cycle, so we alsoe need to

s/cycle//

s/alsoe/also/

> make sure all reserved device memory don't overlap such a region.

s/such a/with such/
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  tools/firmware/hvmloader/util.c | 22 +++++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
> index 8767897..f3723c7 100644
> --- a/tools/firmware/hvmloader/util.c
> +++ b/tools/firmware/hvmloader/util.c
> @@ -416,9 +416,29 @@ static uint32_t alloc_down = RESERVED_MEMORY_DYNAMIC_END;
>  
>  xen_pfn_t mem_hole_alloc(uint32_t nr_mfns)
>  {
> +    unsigned int i, num = hvm_get_reserved_device_memory_map();
> +    uint64_t rdm_start, rdm_end;
> +    uint32_t alloc_start, alloc_end;
> +
>      alloc_down -= nr_mfns << PAGE_SHIFT;
> +    alloc_start = alloc_down;
> +    alloc_end = alloc_start + (nr_mfns << PAGE_SHIFT);
> +    for ( i = 0; i < num; i++ )
> +    {
> +        rdm_start = (uint64_t)rdm_map[i].start_pfn << PAGE_SHIFT;
> +        rdm_end = rdm_start + ((uint64_t)rdm_map[i].nr_pages << PAGE_SHIFT);
> +        if ( check_rdm_hole_conflict((uint64_t)alloc_start,
> +                                     (uint64_t)alloc_end,
> +                                     rdm_start, rdm_end - rdm_start) )
> +        {
> +            alloc_end = rdm_start;
> +            alloc_start = alloc_end - (nr_mfns << PAGE_SHIFT);
> +            BUG_ON(alloc_up >= alloc_start);
> +        }
> +    }
> +
>      BUG_ON(alloc_up >= alloc_down);
> -    return alloc_down >> PAGE_SHIFT;
> +    return alloc_start >> PAGE_SHIFT;
>  }
>  
>  void *mem_alloc(uint32_t size, uint32_t align)
> -- 
> 1.9.1
> 


* Re: [v8][PATCH 12/17] xen/x86/ept: handle reserved device memory in ept_handle_violation
  2014-12-01  9:24 ` [v8][PATCH 12/17] xen/x86/ept: handle reserved device memory in ept_handle_violation Tiejun Chen
  2014-12-02  9:59   ` Tian, Kevin
@ 2014-12-02 20:26   ` Konrad Rzeszutek Wilk
  2014-12-04 16:46   ` Jan Beulich
  2 siblings, 0 replies; 106+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-12-02 20:26 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, jbeulich, yang.z.zhang

On Mon, Dec 01, 2014 at 05:24:30PM +0800, Tiejun Chen wrote:
> We always reserve these ranges since we never allow any stuff to
> poke them.

s/any stuff to poke them/guest to access them./
> 
> But in theory some untrusted VM can maliciously access them. So we
> need to intercept this approach. But we just don't want to leak
> anything or introduce any side affect since other OSs may touch them
> by careless behavior, so its enough to have a lightweight way, and
> it shouldn't be same as those broken pages which cause domain crush.

s/crush/crash/
> 
> So we just need to return with next eip then let VM/OS itself handle

s/So//

s/itself//
> such a scenario as its own logic.

s/as its own/using its own/
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  xen/arch/x86/hvm/vmx/vmx.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 2907afa..3ee884a 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -2403,6 +2403,7 @@ static void ept_handle_violation(unsigned long qualification, paddr_t gpa)
>      p2m_type_t p2mt;
>      int ret;
>      struct domain *d = current->domain;
> +    struct p2m_get_reserved_device_memory pgrdm;
>  
>      /*
>       * We treat all write violations also as read violations.
> @@ -2438,6 +2439,23 @@ static void ept_handle_violation(unsigned long qualification, paddr_t gpa)
>          __trace_var(TRC_HVM_NPF, 0, sizeof(_d), &_d);
>      }
>  
> +    /* This means some untrusted VM can maliciously access reserved
> +     * device memory. But we just don't want to leak anything or
> +     * introduce any side affect since other OSs may touch them by
> +     * careless behavior, so its enough to have a lightweight way.
> +     * Here we just need to return with next eip then let VM/OS itself
> +     * handle such a scenario as its own logic.
> +     */
> +    pgrdm.gfn = gfn;
> +    pgrdm.domain = d;
> +    ret = iommu_get_reserved_device_memory(p2m_check_reserved_device_memory,
> +                                           &pgrdm);
> +    if ( ret )
> +    {
> +        update_guest_eip();
> +        return;
> +    }
> +
>      if ( qualification & EPT_GLA_VALID )
>      {
>          __vmread(GUEST_LINEAR_ADDRESS, &gla);
> -- 
> 1.9.1
> 


* Re: [v8][PATCH 13/17] xen/mem_access: don't allow accessing reserved device memory
  2014-12-01  9:24 ` [v8][PATCH 13/17] xen/mem_access: don't allow accessing reserved device memory Tiejun Chen
  2014-12-02 14:54   ` Julien Grall
@ 2014-12-02 20:27   ` Konrad Rzeszutek Wilk
  2014-12-04 16:51   ` Jan Beulich
  2 siblings, 0 replies; 106+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-12-02 20:27 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, jbeulich, yang.z.zhang

On Mon, Dec 01, 2014 at 05:24:31PM +0800, Tiejun Chen wrote:
> We can't expost those reserved device memory in case of mem_access

s/expost/expose/

> since any access may corrupt device usage.

Could you explain this in more detail, please?

> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  xen/common/mem_access.c | 41 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 41 insertions(+)
> 
> diff --git a/xen/common/mem_access.c b/xen/common/mem_access.c
> index 6c2724b..72a807a 100644
> --- a/xen/common/mem_access.c
> +++ b/xen/common/mem_access.c
> @@ -55,6 +55,43 @@ void mem_access_resume(struct domain *d)
>      }
>  }
>  
> +/* We can't expose reserved device memory. */
> +static int mem_access_check_rdm(struct domain *d, uint64_aligned_t start,
> +                                uint32_t nr)
> +{
> +    uint32_t i;
> +    struct p2m_get_reserved_device_memory pgrdm;
> +    int rc = 0;
> +
> +    if ( !is_hardware_domain(d) && iommu_use_hap_pt(d) )
> +    {
> +        for ( i = 0; i < nr; i++ )
> +        {
> +            pgrdm.gfn = start + i;
> +            pgrdm.domain = d;
> +            rc = iommu_get_reserved_device_memory(p2m_check_reserved_device_memory,
> +                                                  &pgrdm);
> +            if ( rc < 0 )
> +            {
> +                printk(XENLOG_WARNING
> +                       "Domain %d can't check reserved device memory.\n",
> +                       d->domain_id);
> +                return rc;
> +            }
> +
> +            if ( rc == 1 )
> +            {
> +                printk(XENLOG_WARNING
> +                       "Domain %d: we shouldn't mem_access reserved device memory.\n",
> +                       d->domain_id);
> +                return rc;
> +            }
> +        }
> +    }
> +
> +    return rc;
> +}
> +
>  int mem_access_memop(unsigned long cmd,
>                       XEN_GUEST_HANDLE_PARAM(xen_mem_access_op_t) arg)
>  {
> @@ -99,6 +136,10 @@ int mem_access_memop(unsigned long cmd,
>                ((mao.pfn + mao.nr - 1) > domain_get_maximum_gpfn(d))) )
>              break;
>  
> +        rc =  mem_access_check_rdm(d, mao.pfn, mao.nr);
> +        if ( rc == 1 )
> +            break;
> +
>          rc = p2m_set_mem_access(d, mao.pfn, mao.nr, start_iter,
>                                  MEMOP_CMD_MASK, mao.access);
>          if ( rc > 0 )
> -- 
> 1.9.1
> 


* Re: [v8][PATCH 14/17] xen/x86/p2m: introduce set_identity_p2m_entry
  2014-12-01  9:24 ` [v8][PATCH 14/17] xen/x86/p2m: introduce set_identity_p2m_entry Tiejun Chen
  2014-12-02 10:00   ` Tian, Kevin
@ 2014-12-02 20:29   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 106+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-12-02 20:29 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, jbeulich, yang.z.zhang

On Mon, Dec 01, 2014 at 05:24:32PM +0800, Tiejun Chen wrote:
> We will create RMRR mapping as follows:
> 
> If gfn space unoccupied, we just set that. If
> space already occupy by 1:1 RMRR mapping do thing. Others

What is 'do thing'?

It looks as if we do nothing. Is that what you meant?

> should be failed.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  xen/arch/x86/mm/p2m.c     | 28 ++++++++++++++++++++++++++++
>  xen/include/asm-x86/p2m.h |  4 ++++
>  2 files changed, 32 insertions(+)
> 
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index 607ecd0..c415521 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -913,6 +913,34 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
>      return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct);
>  }
>  
> +int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
> +                           p2m_access_t p2ma)
> +{
> +    p2m_type_t p2mt;
> +    p2m_access_t a;
> +    mfn_t mfn;
> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +    int ret = -EBUSY;
> +
> +    gfn_lock(p2m, gfn, 0);
> +
> +    mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL);
> +
> +    if ( !mfn_valid(mfn) )
> +        ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K, p2m_mmio_direct,
> +                            p2ma);
> +    else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a == p2ma )
> +        ret = 0;
> +    else
> +        printk(XENLOG_G_WARNING
> +               "Cannot identity map d%d:%lx, already mapped to %lx.\n",
> +               d->domain_id, gfn, mfn_x(mfn));
> +
> +    gfn_unlock(p2m, gfn, 0);
> +
> +    return ret;
> +}
> +
>  /* Returns: 0 for success, -errno for failure */
>  int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
>  {
> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> index 99f7fb7..26cf0cc 100644
> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -509,6 +509,10 @@ int p2m_is_logdirty_range(struct p2m_domain *, unsigned long start,
>  int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
>  int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
>  
> +/* Set identity addresses in the p2m table (for pass-through) */
> +int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
> +                           p2m_access_t p2ma);
> +
>  /* Add foreign mapping to the guest's p2m table. */
>  int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
>                      unsigned long gpfn, domid_t foreign_domid);
> -- 
> 1.9.1
> 


* Re: [v8][PATCH 15/17] xen:vtd: create RMRR mapping
  2014-12-01  9:24 ` [v8][PATCH 15/17] xen:vtd: create RMRR mapping Tiejun Chen
  2014-12-02 10:02   ` Tian, Kevin
@ 2014-12-02 20:30   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 106+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-12-02 20:30 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, jbeulich, yang.z.zhang

On Mon, Dec 01, 2014 at 05:24:33PM +0800, Tiejun Chen wrote:
> intel_iommu_map_page() does nothing if VT-d shares EPT page table.
> So rmrr_identity_mapping() never create RMRR mapping but in some

s/So//

s/create/creates/
> cases like some GFX drivers it still need to access RMRR.

s/drivers .../drivers which may need to access RMRR/
> 
> Here we will create those RMRR mappings even in shared EPT case.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  xen/drivers/passthrough/vtd/iommu.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
> index a38f201..a54c6eb 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1856,10 +1856,15 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
>  
>      while ( base_pfn < end_pfn )
>      {
> -        int err = intel_iommu_map_page(d, base_pfn, base_pfn,
> -                                       IOMMUF_readable|IOMMUF_writable);
> -
> -        if ( err )
> +        int err = 0;
> +        if ( iommu_use_hap_pt(d) )
> +        {
> +            ASSERT(!iommu_passthrough || !is_hardware_domain(d));
> +            if ( (err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw)) )
> +                return err;
> +        }
> +        else if ( (err = intel_iommu_map_page(d, base_pfn, base_pfn,
> +					      IOMMUF_readable|IOMMUF_writable)) )
>              return err;
>          base_pfn++;
>      }
> -- 
> 1.9.1
> 


* Re: [v8][PATCH 16/17] xen/vtd: group assigned device with RMRR
  2014-12-01  9:24 ` [v8][PATCH 16/17] xen/vtd: group assigned device with RMRR Tiejun Chen
  2014-12-02 10:11   ` Tian, Kevin
@ 2014-12-02 20:40   ` Konrad Rzeszutek Wilk
  2014-12-04 17:05   ` Jan Beulich
  2 siblings, 0 replies; 106+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-12-02 20:40 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, jbeulich, yang.z.zhang

On Mon, Dec 01, 2014 at 05:24:34PM +0800, Tiejun Chen wrote:
> Sometimes different devices may share RMRR range so in this

s/Sometimes//

s/range/ranges/
> case we shouldn't assign these devices into different VMs
> since they may have potential leakage even damage between VMs.

s/potential leak../corrupt each other/?

I am actually not sure what they would leak. Security data?

> 
> So we need to group all devices as RMRR range to make sure they

s/So//

s/range/ranges/
> are just assigned into the same VM.
> 
> Here we introduce two field, gid and domid, in struct,
> acpi_rmrr_unit:
>  gid: indicate which group this device owns. "0" is invalid so
>       just start from "1".
>  domid: indicate which domain this device owns currently. Firstly
>         the hardware domain should own it.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  xen/drivers/passthrough/vtd/dmar.c  | 28 ++++++++++++++-
>  xen/drivers/passthrough/vtd/dmar.h  |  2 ++
>  xen/drivers/passthrough/vtd/iommu.c | 68 +++++++++++++++++++++++++++++++++----
>  3 files changed, 91 insertions(+), 7 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
> index c5bc8d6..8d3406f 100644
> --- a/xen/drivers/passthrough/vtd/dmar.c
> +++ b/xen/drivers/passthrough/vtd/dmar.c
> @@ -572,10 +572,11 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
>  {
>      struct acpi_dmar_reserved_memory *rmrr =
>          container_of(header, struct acpi_dmar_reserved_memory, header);
> -    struct acpi_rmrr_unit *rmrru;
> +    struct acpi_rmrr_unit *rmrru, *cur_rmrr;
>      void *dev_scope_start, *dev_scope_end;
>      u64 base_addr = rmrr->base_address, end_addr = rmrr->end_address;
>      int ret;
> +    static unsigned int group_id = 0;
>  
>      if ( (ret = acpi_dmar_check_length(header, sizeof(*rmrr))) != 0 )
>          return ret;
> @@ -611,6 +612,8 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
>      rmrru->base_address = base_addr;
>      rmrru->end_address = end_addr;
>      rmrru->segment = rmrr->segment;
> +    /* "0" is an invalid group id. */
> +    rmrru->gid = 0;
>  
>      dev_scope_start = (void *)(rmrr + 1);
>      dev_scope_end   = ((void *)rmrr) + header->length;
> @@ -682,7 +685,30 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
>                      "So please set pci_rdmforce to reserve these ranges"
>                      " if you need such a device in hotplug case.\n");
>  
> +            list_for_each_entry(cur_rmrr, &acpi_rmrr_units, list)
> +            {
> +                /*
> +                 * Any same or overlap range mean they should be
> +                 * at same group.

Identical or overlapping ranges must be in the same group.

> +                 */
> +                if ( ((base_addr >= cur_rmrr->base_address) &&
> +                     (end_addr <= cur_rmrr->end_address)) ||
> +                     ((base_addr <= cur_rmrr->base_address) &&
> +                     (end_addr >= cur_rmrr->end_address)) )
> +                {
> +                    rmrru->gid = cur_rmrr->gid;
> +                    continue;
> +                }
> +            }
> +
>              acpi_register_rmrr_unit(rmrru);
> +
> +            /* Allocate group id from gid:1. */
> +            if ( !rmrru->gid )
> +            {
> +                group_id++;
> +                rmrru->gid = group_id;
> +            }
>          }
>      }
>  
> diff --git a/xen/drivers/passthrough/vtd/dmar.h b/xen/drivers/passthrough/vtd/dmar.h
> index af1feef..a57c0d4 100644
> --- a/xen/drivers/passthrough/vtd/dmar.h
> +++ b/xen/drivers/passthrough/vtd/dmar.h
> @@ -76,6 +76,8 @@ struct acpi_rmrr_unit {
>      u64    end_address;
>      u16    segment;
>      u8     allow_all:1;
> +    int    gid;

unsigned int?

> +    domid_t    domid;
>  };
>  
>  struct acpi_atsr_unit {
> diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
> index a54c6eb..ba40209 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1882,9 +1882,9 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
>  
>  static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev)
>  {
> -    struct acpi_rmrr_unit *rmrr;
> -    u16 bdf;
> -    int ret, i;
> +    struct acpi_rmrr_unit *rmrr, *g_rmrr;
> +    u16 bdf, g_bdf;
> +    int ret, i, j;
>  
>      ASSERT(spin_is_locked(&pcidevs_lock));
>  
> @@ -1905,6 +1905,32 @@ static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev)
>               PCI_BUS(bdf) == pdev->bus &&
>               PCI_DEVFN2(bdf) == devfn )
>          {
> +            if ( rmrr->domid == hardware_domain->domain_id )
> +            {
> +                for_each_rmrr_device ( g_rmrr, g_bdf, j )
> +                {
> +                    if ( g_rmrr->gid == rmrr->gid )
> +                    {
> +                        if ( g_rmrr->domid == hardware_domain->domain_id )
> +                            g_rmrr->domid = pdev->domain->domain_id;
> +                        else if ( g_rmrr->domid != pdev->domain->domain_id )
> +                        {
> +                            rmrr->domid = g_rmrr->domid;
> +                            continue;
> +                        }
> +                    }
> +                }
> +            }
> +
> +            if ( rmrr->domid != pdev->domain->domain_id )
> +            {
> +                domain_context_unmap(pdev->domain, devfn, pdev);
> +                dprintk(XENLOG_ERR VTDPREFIX, "d%d: this is a group device owned by d%d\n",
> +                        pdev->domain->domain_id, rmrr->domid);
> +                rmrr->domid = 0;
> +                return -EINVAL;
> +            }
> +
>              ret = rmrr_identity_mapping(pdev->domain, 1, rmrr);
>              if ( ret )
>                  dprintk(XENLOG_ERR VTDPREFIX, "d%d: RMRR mapping failed\n",
> @@ -1946,6 +1972,8 @@ static int intel_iommu_remove_device(u8 devfn, struct pci_dev *pdev)
>               PCI_DEVFN2(bdf) != devfn )
>              continue;
>  
> +        /* Just release to hardware domain. */
> +        rmrr->domid = hardware_domain->domain_id;
>          rmrr_identity_mapping(pdev->domain, 0, rmrr);
>      }
>  
> @@ -2104,6 +2132,8 @@ static void __hwdom_init setup_hwdom_rmrr(struct domain *d)
>      spin_lock(&pcidevs_lock);
>      for_each_rmrr_device ( rmrr, bdf, i )
>      {
> +        /* hwdom should own all devices at first. */
> +        rmrr->domid = d->domain_id;
>          ret = rmrr_identity_mapping(d, 1, rmrr);
>          if ( ret )
>              dprintk(XENLOG_ERR VTDPREFIX,
> @@ -2273,9 +2303,9 @@ static int reassign_device_ownership(
>  static int intel_iommu_assign_device(
>      struct domain *d, u8 devfn, struct pci_dev *pdev)
>  {
> -    struct acpi_rmrr_unit *rmrr;
> -    int ret = 0, i;
> -    u16 bdf, seg;
> +    struct acpi_rmrr_unit *rmrr, *g_rmrr;
> +    int ret = 0, i, j;
> +    u16 bdf, seg, g_bdf;
>      u8 bus;
>  
>      if ( list_empty(&acpi_drhd_units) )
> @@ -2300,6 +2330,32 @@ static int intel_iommu_assign_device(
>               PCI_BUS(bdf) == bus &&
>               PCI_DEVFN2(bdf) == devfn )
>          {
> +            if ( rmrr->domid == hardware_domain->domain_id )
> +            {
> +                for_each_rmrr_device ( g_rmrr, g_bdf, j )
> +                {
> +                    if ( g_rmrr->gid == rmrr->gid )
> +                    {
> +                        if ( g_rmrr->domid == hardware_domain->domain_id )
> +                            g_rmrr->domid = pdev->domain->domain_id;
> +                        else if ( g_rmrr->domid != pdev->domain->domain_id )
> +                        {
> +                            rmrr->domid = g_rmrr->domid;
> +                            continue;
> +                        }
> +                    }
> +                }
> +            }
> +
> +            if ( rmrr->domid != pdev->domain->domain_id )
> +            {
> +                domain_context_unmap(pdev->domain, devfn, pdev);
> +                dprintk(XENLOG_ERR VTDPREFIX, "d%d: this is a group device owned by d%d\n",
> +                        pdev->domain->domain_id, rmrr->domid);
> +                rmrr->domid = 0;
> +                return -EINVAL;
> +            }
> +

Please make this a function.
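E.g. something like this (untested sketch, name made up), which both
intel_iommu_add_device() and intel_iommu_assign_device() could then call:

static int rmrr_group_assign(struct acpi_rmrr_unit *rmrr, u8 devfn,
                             struct pci_dev *pdev)
{
    struct acpi_rmrr_unit *g_rmrr;
    u16 g_bdf;
    int j;

    if ( rmrr->domid == hardware_domain->domain_id )
    {
        for_each_rmrr_device ( g_rmrr, g_bdf, j )
        {
            if ( g_rmrr->gid != rmrr->gid )
                continue;
            if ( g_rmrr->domid == hardware_domain->domain_id )
                g_rmrr->domid = pdev->domain->domain_id;
            else if ( g_rmrr->domid != pdev->domain->domain_id )
                rmrr->domid = g_rmrr->domid;
        }
    }

    /* Some other domain already owns this RMRR group. */
    if ( rmrr->domid != pdev->domain->domain_id )
    {
        domain_context_unmap(pdev->domain, devfn, pdev);
        dprintk(XENLOG_ERR VTDPREFIX,
                "d%d: this is a group device owned by d%d\n",
                pdev->domain->domain_id, rmrr->domid);
        rmrr->domid = 0;
        return -EINVAL;
    }

    return 0;
}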
>              ret = rmrr_identity_mapping(d, 1, rmrr);
>              if ( ret )
>              {
> -- 
> 1.9.1
> 


* Re: [v8][PATCH 02/17] introduce XEN_DOMCTL_set_rdm
  2014-12-01  9:24 ` [v8][PATCH 02/17] introduce XEN_DOMCTL_set_rdm Tiejun Chen
  2014-12-02  8:33   ` Tian, Kevin
  2014-12-02 19:39   ` Konrad Rzeszutek Wilk
@ 2014-12-04 15:33   ` Jan Beulich
  2014-12-05  6:13     ` Tian, Kevin
  2014-12-08  6:06     ` Chen, Tiejun
  2 siblings, 2 replies; 106+ messages in thread
From: Jan Beulich @ 2014-12-04 15:33 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, yang.z.zhang

>>> On 01.12.14 at 10:24, <tiejun.chen@intel.com> wrote:
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -34,6 +34,7 @@
>  #include <xen/tasklet.h>
>  #include <xsm/xsm.h>
>  #include <asm/msi.h>
> +#include <xen/stdbool.h>

Please don't - we use bool_t in the hypervisor, not bool. The header
only exists for source code shared with the tools.

> @@ -1553,6 +1554,44 @@ int iommu_do_pci_domctl(
>          }
>          break;
>  
> +    case XEN_DOMCTL_set_rdm:
> +    {
> +        struct xen_domctl_set_rdm *xdsr = &domctl->u.set_rdm;
> +        struct xen_guest_pcidev_info *pcidevs = NULL;
> +        struct domain *d = rcu_lock_domain_by_any_id(domctl->domain);

"d" gets passed into this function - no need to shadow the variable
and (wrongly) re-obtain the pointer.

> +
> +        if ( d == NULL )
> +            return -ESRCH;
> +
> +        d->arch.hvm_domain.pci_force =
> +                            xdsr->flags & PCI_DEV_RDM_CHECK ? true : false;
> +        d->arch.hvm_domain.num_pcidevs = xdsr->num_pcidevs;

You shouldn't set the count before setting the pointer.

> +        d->arch.hvm_domain.pcidevs = NULL;
> +
> +        if ( xdsr->num_pcidevs )
> +        {
> +            pcidevs = xmalloc_array(xen_guest_pcidev_info_t,
> +                                    xdsr->num_pcidevs);

New domctl-s must not represent security risks: xdsr->num_pcidevs
can be (almost) arbitrarily large - do you really want to allow such
huge allocations? A reasonable upper bound could for example be
the total number of PCI devices the hypervisor knows about.

> +            if ( pcidevs == NULL )
> +            {
> +                rcu_unlock_domain(d);
> +                return -ENOMEM;
> +            }
> +
> +            if ( copy_from_guest(pcidevs, xdsr->pcidevs,
> +                                 xdsr->num_pcidevs*sizeof(*pcidevs)) )
> +            {
> +                xfree(pcidevs);
> +                rcu_unlock_domain(d);
> +                return -EFAULT;
> +            }
> +        }
> +
> +        d->arch.hvm_domain.pcidevs = pcidevs;

If the operation gets issued more than once for a given domain,
you're leaking the old pointer here. Overall you should think a bit
more about this multiple-use case (or outright disallow it).
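At a minimum something like this before installing the new array (sketch):

        xfree(d->arch.hvm_domain.pcidevs);
        d->arch.hvm_domain.pcidevs = NULL;
        d->arch.hvm_domain.num_pcidevs = 0;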

> --- a/xen/drivers/passthrough/vtd/dmar.c
> +++ b/xen/drivers/passthrough/vtd/dmar.c
> @@ -674,6 +674,14 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
>                          "  RMRR region: base_addr %"PRIx64
>                          " end_address %"PRIx64"\n",
>                          rmrru->base_address, rmrru->end_address);
> +            /*
> +             * TODO: we may provide a precise paramter just to reserve
> +             * RMRR range specific to one device.
> +             */
> +            dprintk(XENLOG_WARNING VTDPREFIX,
> +                    "So please set pci_rdmforce to reserve these ranges"
> +                    " if you need such a device in hotplug case.\n");

It makes no sense to use dprintk() here. I also don't see how this
message relates to whatever may have been logged immediately
before, so the wording ("So please set ...") is questionable. Nor is the
reference to "hotplug case" meaningful here - in this context, only
physical (host) device hotplug can be meant without further
qualification. In the end I think trying to log something here is just
wrong - simply drop the message and make sure whatever you want
to say can be found easily by looking elsewhere.

> --- a/xen/include/asm-x86/hvm/domain.h
> +++ b/xen/include/asm-x86/hvm/domain.h
> @@ -90,6 +90,10 @@ struct hvm_domain {
>      /* Cached CF8 for guest PCI config cycles */
>      uint32_t                pci_cf8;
>  
> +    bool_t                  pci_force;
> +    uint32_t                num_pcidevs;
> +    struct xen_guest_pcidev_info      *pcidevs;

Without a comment all these field names are pretty questionable.

Jan


* Re: [v8][PATCH 04/17] update the existing hypercall to support XEN_DOMCTL_set_rdm
  2014-12-01  9:24 ` [v8][PATCH 04/17] update the existing hypercall to support XEN_DOMCTL_set_rdm Tiejun Chen
  2014-12-02  8:46   ` Tian, Kevin
@ 2014-12-04 15:50   ` Jan Beulich
  2014-12-08  7:11     ` Chen, Tiejun
  1 sibling, 1 reply; 106+ messages in thread
From: Jan Beulich @ 2014-12-04 15:50 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, yang.z.zhang

>>> On 01.12.14 at 10:24, <tiejun.chen@intel.com> wrote:
> --- a/xen/common/compat/memory.c
> +++ b/xen/common/compat/memory.c
> @@ -22,27 +22,66 @@ struct get_reserved_device_memory {
>      unsigned int used_entries;
>  };
>  
> -static int get_reserved_device_memory(xen_pfn_t start,
> -                                      xen_ulong_t nr, void *ctxt)
> +static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
> +                                      u32 id, void *ctxt)
>  {
>      struct get_reserved_device_memory *grdm = ctxt;
> +    struct domain *d;
> +    unsigned int i;
> +    u32 sbdf;
> +    struct compat_reserved_device_memory rdm = {
> +        .start_pfn = start, .nr_pages = nr
> +    };
>  
> -    if ( grdm->used_entries < grdm->map.nr_entries )
> -    {
> -        struct compat_reserved_device_memory rdm = {
> -            .start_pfn = start, .nr_pages = nr
> -        };
> +    if ( rdm.start_pfn != start || rdm.nr_pages != nr )
> +        return -ERANGE;
>  
> -        if ( rdm.start_pfn != start || rdm.nr_pages != nr )
> -            return -ERANGE;
> +    d = rcu_lock_domain_by_any_id(grdm->map.domid);
> +    if ( d == NULL )
> +        return -ESRCH;

So why are you doing this in the callback (potentially many times)
instead of just once in compat_memory_op(), storing the pointer in
the context structure?

>  
> -        if ( __copy_to_compat_offset(grdm->map.buffer, grdm->used_entries,
> -                                     &rdm, 1) )
> -            return -EFAULT;
> +    if ( d )
> +    {
> +        if ( d->arch.hvm_domain.pci_force )

You didn't verify that the domain is a HVM/PVH one.

> +        {
> +            if ( grdm->used_entries < grdm->map.nr_entries )
> +            {
> +                if ( __copy_to_compat_offset(grdm->map.buffer,
> +                                             grdm->used_entries,
> +                                             &rdm, 1) )
> +                {
> +                    rcu_unlock_domain(d);
> +                    return -EFAULT;
> +                }
> +            }
> +            ++grdm->used_entries;
> +        }
> +        else
> +        {
> +            for ( i = 0; i < d->arch.hvm_domain.num_pcidevs; i++ )
> +            {
> +                sbdf = PCI_SBDF2(d->arch.hvm_domain.pcidevs[i].seg,
> +                                 d->arch.hvm_domain.pcidevs[i].bus,
> +                                 d->arch.hvm_domain.pcidevs[i].devfn);
> +                if ( sbdf == id )
> +                {
> +                    if ( grdm->used_entries < grdm->map.nr_entries )
> +                    {
> +                        if ( __copy_to_compat_offset(grdm->map.buffer,
> +                                                     grdm->used_entries,
> +                                                     &rdm, 1) )
> +                        {
> +                            rcu_unlock_domain(d);
> +                            return -EFAULT;
> +                        }
> +                    }
> +                    ++grdm->used_entries;

break;

Also, folding the code that is identical on the if and else branches would
seem pretty desirable.
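Roughly along these lines (sketch; 'sbdf_permitted()' standing in for the
loop over d->arch.hvm_domain.pcidevs):

    if ( d->arch.hvm_domain.pci_force || sbdf_permitted(d, id) )
    {
        /* Count every matching range; only copy while the buffer has room. */
        if ( grdm->used_entries < grdm->map.nr_entries &&
             __copy_to_compat_offset(grdm->map.buffer, grdm->used_entries,
                                     &rdm, 1) )
        {
            rcu_unlock_domain(d);
            return -EFAULT;
        }
        ++grdm->used_entries;
    }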

> @@ -319,9 +358,13 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
>  
>              if ( !rc && grdm.map.nr_entries < grdm.used_entries )
>                  rc = -ENOBUFS;
> +
>              grdm.map.nr_entries = grdm.used_entries;
> -            if ( __copy_to_guest(compat, &grdm.map, 1) )
> -                rc = -EFAULT;
> +            if ( grdm.map.nr_entries )
> +            {
> +                if ( __copy_to_guest(compat, &grdm.map, 1) )
> +                    rc = -EFAULT;
> +            }

Why do you need this change?

> --- a/xen/drivers/passthrough/vtd/dmar.c
> +++ b/xen/drivers/passthrough/vtd/dmar.c
> @@ -904,17 +904,33 @@ int platform_supports_x2apic(void)
>  
>  int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
>  {
> -    struct acpi_rmrr_unit *rmrr;
> +    struct acpi_rmrr_unit *rmrr, *rmrr_cur = NULL;
>      int rc = 0;
> +    unsigned int i;
> +    u16 bdf;
>  
> -    list_for_each_entry(rmrr, &acpi_rmrr_units, list)
> +    for_each_rmrr_device ( rmrr, bdf, i )
>      {
> -        rc = func(PFN_DOWN(rmrr->base_address),
> -                  PFN_UP(rmrr->end_address) - PFN_DOWN(rmrr->base_address),
> -                  ctxt);
> -        if ( rc )
> -            break;
> +        if ( rmrr != rmrr_cur )
> +        {
> +            rc = func(PFN_DOWN(rmrr->base_address),
> +                      PFN_UP(rmrr->end_address) -
> +                        PFN_DOWN(rmrr->base_address),
> +                      PCI_SBDF(rmrr->segment, bdf),
> +                      ctxt);
> +
> +            if ( unlikely(rc < 0) )
> +                return rc;
> +
> +            /* Just go next. */
> +            if ( !rc )
> +                rmrr_cur = rmrr;
> +
> +            /* Now just return specific to user requirement. */
> +            if ( rc > 0 )
> +                return rc;

Nice that you check for that, but I can't see this case occurring
anymore. Did you lose some code? Also please don't write code
more complicated than necessary. The above two if()s could be


+            if ( rc > 0 )
+                return rc;
+
+            rmrr_cur = rmrr;

> --- a/xen/include/public/memory.h
> +++ b/xen/include/public/memory.h
> @@ -586,6 +586,11 @@ typedef struct xen_reserved_device_memory 
> xen_reserved_device_memory_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_reserved_device_memory_t);
>  
>  struct xen_reserved_device_memory_map {
> +    /*
> +     * Domain whose reservation is being changed.
> +     * Unprivileged domains can specify only DOMID_SELF.
> +     */
> +    domid_t        domid;
>      /* IN/OUT */
>      unsigned int nr_entries;
>      /* OUT */

Your addition lacks an IN annotation.
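
I.e. presumably something like:

    /*
     * [IN] Domain whose reservation is being changed.
     * Unprivileged domains can specify only DOMID_SELF.
     */
    domid_t        domid;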

> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -31,6 +31,8 @@
>  #define PCI_DEVFN2(bdf) ((bdf) & 0xff)
>  #define PCI_BDF(b,d,f)  ((((b) & 0xff) << 8) | PCI_DEVFN(d,f))
>  #define PCI_BDF2(b,df)  ((((b) & 0xff) << 8) | ((df) & 0xff))
> +#define PCI_SBDF(s,bdf) (((s & 0xffff) << 16) | (bdf & 0xffff))
> +#define PCI_SBDF2(s,b,df) (((s & 0xffff) << 16) | PCI_BDF2(b,df))

Missing several parentheses.
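
I.e. presumably:

#define PCI_SBDF(s,bdf)   ((((s) & 0xffff) << 16) | ((bdf) & 0xffff))
#define PCI_SBDF2(s,b,df) ((((s) & 0xffff) << 16) | PCI_BDF2(b,df))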

Jan

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 07/17] hvmloader/util: get reserved device memory maps
  2014-12-01  9:24 ` [v8][PATCH 07/17] hvmloader/util: get reserved device memory maps Tiejun Chen
  2014-12-02  8:59   ` Tian, Kevin
  2014-12-02 20:01   ` Konrad Rzeszutek Wilk
@ 2014-12-04 15:52   ` Jan Beulich
  2014-12-08  8:52     ` Chen, Tiejun
  2 siblings, 1 reply; 106+ messages in thread
From: Jan Beulich @ 2014-12-04 15:52 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, yang.z.zhang

>>> On 01.12.14 at 10:24, <tiejun.chen@intel.com> wrote:
> We need to use the reserved device memory maps multiple times, so
> providing just one common function would be helpful.

I'm not going to repeat earlier comments; the way this is done right
now, it's neither a proper runtime function nor a proper init time one.

Jan

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 08/17] hvmloader/mmio: reconcile guest mmio with reserved device memory
  2014-12-01  9:24 ` [v8][PATCH 08/17] hvmloader/mmio: reconcile guest mmio with reserved device memory Tiejun Chen
  2014-12-02  9:11   ` Tian, Kevin
@ 2014-12-04 16:04   ` Jan Beulich
  2014-12-08  9:10     ` Chen, Tiejun
  1 sibling, 1 reply; 106+ messages in thread
From: Jan Beulich @ 2014-12-04 16:04 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, yang.z.zhang

>>> On 01.12.14 at 10:24, <tiejun.chen@intel.com> wrote:
> We need to make sure all MMIO allocations don't overlap
> any RDM (reserved device memory). Here we just skip
> all reserved device memory ranges in MMIO space.

I think someone else already suggested that this and patch 9 should
be swapped, and the BAR allocation be changed to use the E820
map as input. That may end up being a bigger change, but will yield
ultimately better (and namely better maintainable) code.

Jan

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 09/17] hvmloader/ram: check if guest memory is out of reserved device memory maps
  2014-12-01  9:24 ` [v8][PATCH 09/17] hvmloader/ram: check if guest memory is out of reserved device memory maps Tiejun Chen
  2014-12-02  9:42   ` Tian, Kevin
  2014-12-02 20:17   ` Konrad Rzeszutek Wilk
@ 2014-12-04 16:20   ` Jan Beulich
  2014-12-05  6:23     ` Tian, Kevin
  2 siblings, 1 reply; 106+ messages in thread
From: Jan Beulich @ 2014-12-04 16:20 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, yang.z.zhang

>>> On 01.12.14 at 10:24, <tiejun.chen@intel.com> wrote:
> We need to check to reserve all reserved device memory maps in e820
> to avoid any potential guest memory conflict.
> 
> Currently, if we can't insert RDM entries directly, we may need to handle
> several ranges as follows:
> a. Fixed Ranges --> BUG()
>  lowmem_reserved_base-0xA0000: reserved by BIOS implementation,
>  BIOS region,
>  RESERVED_MEMBASE ~ 0x100000000,
> b. RAM or RAM:Hole -> Try to reserve

I continue to be unconvinced of the overall approach: The domain
builder continues to populate these regions when it shouldn't. Yet
once it doesn't, it would be most natural to simply communicate the
RAM regions to hvmloader, and hvmloader would use just that to
build the E820 table (and subsequently assign BARs).

Jan

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 10/17] hvmloader/mem_hole_alloc: skip any overlap with reserved device memory
  2014-12-01  9:24 ` [v8][PATCH 10/17] hvmloader/mem_hole_alloc: skip any overlap with reserved device memory Tiejun Chen
  2014-12-02  9:48   ` Tian, Kevin
  2014-12-02 20:23   ` Konrad Rzeszutek Wilk
@ 2014-12-04 16:28   ` Jan Beulich
  2014-12-05  6:24     ` Tian, Kevin
  2 siblings, 1 reply; 106+ messages in thread
From: Jan Beulich @ 2014-12-04 16:28 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, yang.z.zhang

>>> On 01.12.14 at 10:24, <tiejun.chen@intel.com> wrote:
> In some cases, like igd_opregion_pgbase, the guest will use mem_hole_alloc
> to allocate some memory for use at runtime, so we also need to
> make sure reserved device memory doesn't overlap such a region.

While ideally this would get switched to the model outlined for
the previous two patches too, it's at least reasonable to keep
this simple allocator simple for the time being.

> --- a/tools/firmware/hvmloader/util.c
> +++ b/tools/firmware/hvmloader/util.c
> @@ -416,9 +416,29 @@ static uint32_t alloc_down = RESERVED_MEMORY_DYNAMIC_END;
>  
>  xen_pfn_t mem_hole_alloc(uint32_t nr_mfns)
>  {
> +    unsigned int i, num = hvm_get_reserved_device_memory_map();
> +    uint64_t rdm_start, rdm_end;
> +    uint32_t alloc_start, alloc_end;
> +
>      alloc_down -= nr_mfns << PAGE_SHIFT;
> +    alloc_start = alloc_down;
> +    alloc_end = alloc_start + (nr_mfns << PAGE_SHIFT);
> +    for ( i = 0; i < num; i++ )
> +    {
> +        rdm_start = (uint64_t)rdm_map[i].start_pfn << PAGE_SHIFT;
> +        rdm_end = rdm_start + ((uint64_t)rdm_map[i].nr_pages << PAGE_SHIFT);
> +        if ( check_rdm_hole_conflict((uint64_t)alloc_start,
> +                                     (uint64_t)alloc_end,

Pointless casts.

> +                                     rdm_start, rdm_end - rdm_start) )
> +        {
> +            alloc_end = rdm_start;
> +            alloc_start = alloc_end - (nr_mfns << PAGE_SHIFT);
> +            BUG_ON(alloc_up >= alloc_start);

This is redundant with the BUG_ON() below afaict. Or at least it
would be, if you would properly update allow_down (if you don't
you may end up returning the same PFN for two allocations).
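
I.e. (untested, reusing the names from the hunk above) roughly:

            alloc_end = rdm_start;
            alloc_start = alloc_end - (nr_mfns << PAGE_SHIFT);
            alloc_down = alloc_start; /* don't hand out this range again */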

> +        }
> +    }
> +
>      BUG_ON(alloc_up >= alloc_down);

Jan

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 11/17] xen/x86/p2m: reject populating for reserved device memory mapping
  2014-12-01  9:24 ` [v8][PATCH 11/17] xen/x86/p2m: reject populating for reserved device memory mapping Tiejun Chen
  2014-12-02  9:57   ` Tian, Kevin
@ 2014-12-04 16:42   ` Jan Beulich
  1 sibling, 0 replies; 106+ messages in thread
From: Jan Beulich @ 2014-12-04 16:42 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, yang.z.zhang

>>> On 01.12.14 at 10:24, <tiejun.chen@intel.com> wrote:
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -556,6 +556,40 @@ guest_physmap_remove_page(struct domain *d, unsigned long gfn,
>      gfn_unlock(p2m, gfn, page_order);
>  }
>  
> +/* Check if we are accessing rdm. */

If a comment doesn't do anything but re-state a function name,
it's imo superfluous.

> +int p2m_check_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
> +                                     u32 id, void *ctxt)
> +{
> +    xen_pfn_t end = start + nr;
> +    unsigned int i;
> +    u32 sbdf;
> +    struct p2m_get_reserved_device_memory *pgrdm = ctxt;
> +    struct domain *d = pgrdm->domain;
> +
> +    if ( d->arch.hvm_domain.pci_force )
> +    {
> +        if ( pgrdm->gfn >= start && pgrdm->gfn < end )
> +            return 1;
> +    }
> +    else
> +    {
> +        for ( i = 0; i < d->arch.hvm_domain.num_pcidevs; i++ )
> +        {
> +            sbdf = PCI_SBDF2(d->arch.hvm_domain.pcidevs[i].seg,
> +                             d->arch.hvm_domain.pcidevs[i].bus,
> +                             d->arch.hvm_domain.pcidevs[i].devfn);
> +
> +            if ( sbdf == id )
> +            {
> +                if ( pgrdm->gfn >= start && pgrdm->gfn < end )
> +                    return 1;
> +            }

Please join together if()s like these.
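
I.e.:

            if ( sbdf == id && pgrdm->gfn >= start && pgrdm->gfn < end )
                return 1;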

> @@ -686,8 +721,28 @@ guest_physmap_add_entry(struct domain *d, unsigned long gfn,
>      /* Now, actually do the two-way mapping */
>      if ( mfn_valid(_mfn(mfn)) ) 
>      {
> -        rc = p2m_set_entry(p2m, gfn, _mfn(mfn), page_order, t,
> -                           p2m->default_access);
> +        pgrdm.gfn = gfn;
> +        pgrdm.domain = d;
> +        if ( !is_hardware_domain(d) && iommu_use_hap_pt(d) )
> +        {
> +            rc = iommu_get_reserved_device_memory(p2m_check_reserved_device_memory,
> +                                                  &pgrdm);
> +            /* We always avoid populating reserved device memory. */
> +            if ( rc == 1 )
> +            {
> +                rc = -EBUSY;
> +                goto out;

Did I overlook something in the earlier tool stack patches? How does
the tool stack avoid populating these areas?

> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -709,6 +709,15 @@ static inline unsigned int p2m_get_iommu_flags(p2m_type_t p2mt)
>      return flags;
>  }
>  
> +struct p2m_get_reserved_device_memory {
> +    unsigned long gfn;
> +    struct domain *domain;
> +};
> +
> +/* Check if we are accessing rdm. */
> +extern int p2m_check_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
> +                                            u32 id, void *ctxt);

Are subsequent patches going to make use of this outside of p2m.c?
If not, these declarations don't belong here. And even if the
function was going to be used elsewhere, the structure wouldn't
need defining here.

Jan

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 12/17] xen/x86/ept: handle reserved device memory in ept_handle_violation
  2014-12-01  9:24 ` [v8][PATCH 12/17] xen/x86/ept: handle reserved device memory in ept_handle_violation Tiejun Chen
  2014-12-02  9:59   ` Tian, Kevin
  2014-12-02 20:26   ` Konrad Rzeszutek Wilk
@ 2014-12-04 16:46   ` Jan Beulich
  2 siblings, 0 replies; 106+ messages in thread
From: Jan Beulich @ 2014-12-04 16:46 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, yang.z.zhang

>>> On 01.12.14 at 10:24, <tiejun.chen@intel.com> wrote:
> We always reserve these ranges since we never allow anything to
> poke them.
> 
> But in theory some untrusted VM can maliciously access them, so we
> need to intercept such accesses. But we just don't want to leak
> anything or introduce any side effect, since other OSes may touch them
> through careless behavior, so it's enough to have a lightweight way, and
> it shouldn't be the same as those broken pages which cause a domain crash.

This needs a better explanation: If the devices associated with the
reserved region being touched are assigned to the guest, it is
permitted to touch them. If it touches regions of devices not yet or
not anymore assigned to it, the behavior should match real
hardware: Writes ignored and reads return all ones. I.e. such
accesses should get handed to the DM in that latter case.

Jan

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 13/17] xen/mem_access: don't allow accessing reserved device memory
  2014-12-01  9:24 ` [v8][PATCH 13/17] xen/mem_access: don't allow accessing reserved device memory Tiejun Chen
  2014-12-02 14:54   ` Julien Grall
  2014-12-02 20:27   ` Konrad Rzeszutek Wilk
@ 2014-12-04 16:51   ` Jan Beulich
  2 siblings, 0 replies; 106+ messages in thread
From: Jan Beulich @ 2014-12-04 16:51 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, yang.z.zhang

>>> On 01.12.14 at 10:24, <tiejun.chen@intel.com> wrote:
> --- a/xen/common/mem_access.c
> +++ b/xen/common/mem_access.c
> @@ -55,6 +55,43 @@ void mem_access_resume(struct domain *d)
>      }
>  }
>  
> +/* We can't expose reserved device memory. */
> +static int mem_access_check_rdm(struct domain *d, uint64_aligned_t start,
> +                                uint32_t nr)
> +{
> +    uint32_t i;
> +    struct p2m_get_reserved_device_memory pgrdm;
> +    int rc = 0;
> +
> +    if ( !is_hardware_domain(d) && iommu_use_hap_pt(d) )

Why?

> +    {
> +        for ( i = 0; i < nr; i++ )
> +        {
> +            pgrdm.gfn = start + i;
> +            pgrdm.domain = d;
> +            rc = iommu_get_reserved_device_memory(p2m_check_reserved_device_memory,
> +                                                  &pgrdm);
> +            if ( rc < 0 )
> +            {
> +                printk(XENLOG_WARNING
> +                       "Domain %d can't check reserved device memory.\n",

If I saw this text in a log file, it wouldn't mean anything to me.
Additionally this is only partly useful without also listing the
offending domain (which isn't d afaict) and the GFN.
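
E.g. (untested sketch; exact wording and fields are of course up for
discussion):

                printk(XENLOG_G_WARNING
                       "d%d: mem_access request by d%d touches reserved device memory at GFN %#lx\n",
                       d->domain_id, current->domain->domain_id, pgrdm.gfn);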

> +                       d->domain_id);
> +                return rc;
> +            }
> +
> +            if ( rc == 1 )
> +            {
> +                printk(XENLOG_WARNING
> +                       "Domain %d: we shouldn't mem_access reserved device memory.\n",

This one's only marginally better than the one above.

> +                       d->domain_id);
> +                return rc;
> +            }
> +        }
> +    }
> +
> +    return rc;
> +}
> +
>  int mem_access_memop(unsigned long cmd,
>                       XEN_GUEST_HANDLE_PARAM(xen_mem_access_op_t) arg)
>  {
> @@ -99,6 +136,10 @@ int mem_access_memop(unsigned long cmd,
>                ((mao.pfn + mao.nr - 1) > domain_get_maximum_gpfn(d))) )
>              break;
>  
> +        rc =  mem_access_check_rdm(d, mao.pfn, mao.nr);
> +        if ( rc == 1 )
> +            break;

So you decided to return 1 from the hypercall - what is that
supposed to mean to an unaware caller?
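
E.g. (untested; the error value is just a placeholder) mapping it to a
real error would at least be meaningful to the caller:

        rc = mem_access_check_rdm(d, mao.pfn, mao.nr);
        if ( rc > 0 )
            rc = -EPERM;
        if ( rc )
            break;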

Jan

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 16/17] xen/vtd: group assigned device with RMRR
  2014-12-01  9:24 ` [v8][PATCH 16/17] xen/vtd: group assigned device with RMRR Tiejun Chen
  2014-12-02 10:11   ` Tian, Kevin
  2014-12-02 20:40   ` Konrad Rzeszutek Wilk
@ 2014-12-04 17:05   ` Jan Beulich
  2 siblings, 0 replies; 106+ messages in thread
From: Jan Beulich @ 2014-12-04 17:05 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, yang.z.zhang

>>> On 01.12.14 at 10:24, <tiejun.chen@intel.com> wrote:
> --- a/xen/drivers/passthrough/vtd/dmar.c
> +++ b/xen/drivers/passthrough/vtd/dmar.c
> @@ -572,10 +572,11 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
>  {
>      struct acpi_dmar_reserved_memory *rmrr =
>          container_of(header, struct acpi_dmar_reserved_memory, header);
> -    struct acpi_rmrr_unit *rmrru;
> +    struct acpi_rmrr_unit *rmrru, *cur_rmrr;
>      void *dev_scope_start, *dev_scope_end;
>      u64 base_addr = rmrr->base_address, end_addr = rmrr->end_address;
>      int ret;
> +    static unsigned int group_id = 0;

__initdata. Pointless initializer.

> @@ -682,7 +685,30 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
>                      "So please set pci_rdmforce to reserve these ranges"
>                      " if you need such a device in hotplug case.\n");
>  
> +            list_for_each_entry(cur_rmrr, &acpi_rmrr_units, list)
> +            {
> +                /*
> +                 * Any same or overlap range mean they should be
> +                 * at same group.
> +                 */
> +                if ( ((base_addr >= cur_rmrr->base_address) &&
> +                     (end_addr <= cur_rmrr->end_address)) ||
> +                     ((base_addr <= cur_rmrr->base_address) &&
> +                     (end_addr >= cur_rmrr->end_address)) )

This is both more complicated than needed and wrong. You want
an overlap (partial or complete doesn't matter) check, i.e.
start1 <= end2 && start2 <= end1.
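
I.e. (sketch, reusing the names from the hunk above):

                if ( base_addr <= cur_rmrr->end_address &&
                     cur_rmrr->base_address <= end_addr )
                {
                    rmrru->gid = cur_rmrr->gid;
                    break;
                }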

> +                {
> +                    rmrru->gid = cur_rmrr->gid;
> +                    continue;

break

Also this doesn't seem to handle cases where you see in this order

[2,3]
[4,6]
[3,5]

But the more fundamental question is: Are overlaps of RMRRs
actually allowed, or would it not be better to bail in that case and
leave the IOMMU disabled?

But the code further down looks so broken that I'll leave it to you
to discuss this with your colleagues acting as VT-d maintainers.

Jan

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 02/17] introduce XEN_DOMCTL_set_rdm
  2014-12-04 15:33   ` Jan Beulich
@ 2014-12-05  6:13     ` Tian, Kevin
  2014-12-08  6:06     ` Chen, Tiejun
  1 sibling, 0 replies; 106+ messages in thread
From: Tian, Kevin @ 2014-12-05  6:13 UTC (permalink / raw)
  To: Jan Beulich, Chen, Tiejun
  Cc: wei.liu2, ian.campbell, stefano.stabellini, tim, ian.jackson,
	xen-devel, Zhang, Yang Z

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Thursday, December 04, 2014 11:33 PM
> > +            if ( pcidevs == NULL )
> > +            {
> > +                rcu_unlock_domain(d);
> > +                return -ENOMEM;
> > +            }
> > +
> > +            if ( copy_from_guest(pcidevs, xdsr->pcidevs,
> > +
> xdsr->num_pcidevs*sizeof(*pcidevs)) )
> > +            {
> > +                xfree(pcidevs);
> > +                rcu_unlock_domain(d);
> > +                return -EFAULT;
> > +            }
> > +        }
> > +
> > +        d->arch.hvm_domain.pcidevs = pcidevs;
> 
> If the operation gets issued more than once for a given domain,
> you're leaking the old pointer here. Overall should think a bit
> more about this multiple use case (or outright disallow it).

from current discussion let's outright disallow it. the information
should be ready early enough before populating p2m.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 09/17] hvmloader/ram: check if guest memory is out of reserved device memory maps
  2014-12-04 16:20   ` Jan Beulich
@ 2014-12-05  6:23     ` Tian, Kevin
  2014-12-05  7:43       ` Jan Beulich
  0 siblings, 1 reply; 106+ messages in thread
From: Tian, Kevin @ 2014-12-05  6:23 UTC (permalink / raw)
  To: Jan Beulich, Chen, Tiejun
  Cc: wei.liu2, ian.campbell, stefano.stabellini, tim, ian.jackson,
	xen-devel, Zhang, Yang Z

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Friday, December 05, 2014 12:20 AM
> 
> >>> On 01.12.14 at 10:24, <tiejun.chen@intel.com> wrote:
> > We need to check to reserve all reserved device memory maps in e820
> > to avoid any potential guest memory conflict.
> >
> > Currently, if we can't insert RDM entries directly, we may need to handle
> > several ranges as follows:
> > a. Fixed Ranges --> BUG()
> >  lowmem_reserved_base-0xA0000: reserved by BIOS implementation,
> >  BIOS region,
> >  RESERVED_MEMBASE ~ 0x100000000,
> > b. RAM or RAM:Hole -> Try to reserve
> 
> I continue to be unconvinced of the overall approach: The domain
> builder continues to populate these regions when it shouldn't. Yet
> once it doesn't, it would be most natural to simply communicate the

doesn't -> does?

> RAM regions to hvmloader, and hvmloader would use just that to
> build the E820 table (and subsequently assign BARs).
> 

My impression is that you didn't like extending hvm_info to carry
sparse RAM regions. That's why the current tradeoff is taken, i.e.
leaving the domain builder unchanged for RAM, then preventing EPT
setup for reserved regions in the hypervisor (which means wasting
memory), and then having hvmloader actually figure out the final
e820. That's also why the per-BDF design is introduced, to minimize
wasted memory. We discussed changing the domain builder to avoid
populating reserved regions as the next step after 4.5, but w/o
extending hvm_info we always need the logic in hvmloader to
construct the e820 from scratch.

I did not catch all the discussion history between you and Tiejun,
so I may miss something here. (BTW, Tiejun is on urgent leave, so
his responses will be slow for a few days.)

Thanks
Kevin

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 10/17] hvmloader/mem_hole_alloc: skip any overlap with reserved device memory
  2014-12-04 16:28   ` Jan Beulich
@ 2014-12-05  6:24     ` Tian, Kevin
  2014-12-05  7:46       ` Jan Beulich
  0 siblings, 1 reply; 106+ messages in thread
From: Tian, Kevin @ 2014-12-05  6:24 UTC (permalink / raw)
  To: Jan Beulich, Chen, Tiejun
  Cc: wei.liu2, ian.campbell, stefano.stabellini, tim, ian.jackson,
	xen-devel, Zhang, Yang Z

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Friday, December 05, 2014 12:29 AM
> 
> >>> On 01.12.14 at 10:24, <tiejun.chen@intel.com> wrote:
> > In some cases, like igd_opregion_pgbase, the guest will use mem_hole_alloc
> > to allocate some memory for use at runtime, so we also need to
> > make sure reserved device memory doesn't overlap such a region.
> 
> While ideally this would get switched to the model outlined for
> the previous two patches too, it's at least reasonable to keep
> this simple allocator simple for the time being.
> 
> > --- a/tools/firmware/hvmloader/util.c
> > +++ b/tools/firmware/hvmloader/util.c
> > @@ -416,9 +416,29 @@ static uint32_t alloc_down =
> RESERVED_MEMORY_DYNAMIC_END;
> >
> >  xen_pfn_t mem_hole_alloc(uint32_t nr_mfns)
> >  {
> > +    unsigned int i, num = hvm_get_reserved_device_memory_map();
> > +    uint64_t rdm_start, rdm_end;
> > +    uint32_t alloc_start, alloc_end;
> > +
> >      alloc_down -= nr_mfns << PAGE_SHIFT;
> > +    alloc_start = alloc_down;
> > +    alloc_end = alloc_start + (nr_mfns << PAGE_SHIFT);
> > +    for ( i = 0; i < num; i++ )
> > +    {
> > +        rdm_start = (uint64_t)rdm_map[i].start_pfn << PAGE_SHIFT;
> > +        rdm_end = rdm_start + ((uint64_t)rdm_map[i].nr_pages <<
> PAGE_SHIFT);
> > +        if ( check_rdm_hole_conflict((uint64_t)alloc_start,
> > +                                     (uint64_t)alloc_end,
> 
> Pointless casts.
> 
> > +                                     rdm_start, rdm_end -
> rdm_start) )
> > +        {
> > +            alloc_end = rdm_start;
> > +            alloc_start = alloc_end - (nr_mfns << PAGE_SHIFT);
> > +            BUG_ON(alloc_up >= alloc_start);
> 
> This is redundant with the BUG_ON() below afaict. Or at least it
> would be, if you would properly update allow_down (if you don't
> you may end up returning the same PFN for two allocations).
> 

I'd like this to be done once at init time. Once alloc_up/down is
verified/adjusted, there's no need to add run-time overhead here.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 09/17] hvmloader/ram: check if guest memory is out of reserved device memory maps
  2014-12-05  6:23     ` Tian, Kevin
@ 2014-12-05  7:43       ` Jan Beulich
  0 siblings, 0 replies; 106+ messages in thread
From: Jan Beulich @ 2014-12-05  7:43 UTC (permalink / raw)
  To: Kevin Tian, Tiejun Chen
  Cc: wei.liu2, ian.campbell, stefano.stabellini, tim, ian.jackson,
	xen-devel, Yang Z Zhang

>>> On 05.12.14 at 07:23, <kevin.tian@intel.com> wrote:
>>  From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Friday, December 05, 2014 12:20 AM
>> 
>> >>> On 01.12.14 at 10:24, <tiejun.chen@intel.com> wrote:
>> > We need to check to reserve all reserved device memory maps in e820
>> > to avoid any potential guest memory conflict.
>> >
>> > Currently, if we can't insert RDM entries directly, we may need to handle
>> > several ranges as follows:
>> > a. Fixed Ranges --> BUG()
>> >  lowmem_reserved_base-0xA0000: reserved by BIOS implementation,
>> >  BIOS region,
>> >  RESERVED_MEMBASE ~ 0x100000000,
>> > b. RAM or RAM:Hole -> Try to reserve
>> 
>> I continue to be unconvinced of the overall approach: The domain
>> builder continues to populate these regions when it shouldn't. Yet
>> once it doesn't, it would be most natural to simply communicate the
> 
> doesn't -> does?

No. The domain builder currently populates these regions (at least
I didn't spot a change to make it not do so).

>> RAM regions to hvmloader, and hvmloader would use just that to
>> build the E820 table (and subsequently assign BARs).
>> 
> 
> My impression is that you didn't like extending hvm_info to carry
> sparse RAM regions. that's why the current tradeoff is taken, i.e.
> leaving domain builder unchanged for RAM, then preventing EPT 
> setup for reserved regions in hypervisor (means wasting memory), 
> and then having hvmloader to actually figure out the final e820. 
> and that's also why per-BDF design is introduced to minimize wasted 
> memory. We discussed to change domain builder to avoid populating 
> reserved regions as the next step after 4.5, but w/o extending 
> hvm_info we always need the logic in hvmloader to construct e820 
> from scratch.

Communicating this via hvm_info is not the only way. For example,
the XENMEM_{set_,}memory_map pair of hypercalls could be used
(and is readily available to be extended that way, since for HVM
domains XENMEM_set_memory_map returns -EPERM at present). The
only potentially problematic aspect I can see with using it might be
its limiting of the entry count to E820MAX.

Jan

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 10/17] hvmloader/mem_hole_alloc: skip any overlap with reserved device memory
  2014-12-05  6:24     ` Tian, Kevin
@ 2014-12-05  7:46       ` Jan Beulich
  0 siblings, 0 replies; 106+ messages in thread
From: Jan Beulich @ 2014-12-05  7:46 UTC (permalink / raw)
  To: Kevin Tian, Tiejun Chen
  Cc: wei.liu2, ian.campbell, stefano.stabellini, tim, ian.jackson,
	xen-devel, Yang Z Zhang

>>> On 05.12.14 at 07:24, <kevin.tian@intel.com> wrote:
>>  From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Friday, December 05, 2014 12:29 AM
>> 
>> >>> On 01.12.14 at 10:24, <tiejun.chen@intel.com> wrote:
>> > In some cases, like igd_opregion_pgbase, the guest will use mem_hole_alloc
>> > to allocate some memory for use at runtime, so we also need to
>> > make sure reserved device memory doesn't overlap such a region.
>> 
>> While ideally this would get switched to the model outlined for
>> the previous two patches too, it's at least reasonable to keep
>> this simple allocator simple for the time being.
>> 
>> > --- a/tools/firmware/hvmloader/util.c
>> > +++ b/tools/firmware/hvmloader/util.c
>> > @@ -416,9 +416,29 @@ static uint32_t alloc_down =
>> RESERVED_MEMORY_DYNAMIC_END;
>> >
>> >  xen_pfn_t mem_hole_alloc(uint32_t nr_mfns)
>> >  {
>> > +    unsigned int i, num = hvm_get_reserved_device_memory_map();
>> > +    uint64_t rdm_start, rdm_end;
>> > +    uint32_t alloc_start, alloc_end;
>> > +
>> >      alloc_down -= nr_mfns << PAGE_SHIFT;
>> > +    alloc_start = alloc_down;
>> > +    alloc_end = alloc_start + (nr_mfns << PAGE_SHIFT);
>> > +    for ( i = 0; i < num; i++ )
>> > +    {
>> > +        rdm_start = (uint64_t)rdm_map[i].start_pfn << PAGE_SHIFT;
>> > +        rdm_end = rdm_start + ((uint64_t)rdm_map[i].nr_pages <<
>> PAGE_SHIFT);
>> > +        if ( check_rdm_hole_conflict((uint64_t)alloc_start,
>> > +                                     (uint64_t)alloc_end,
>> 
>> Pointless casts.
>> 
>> > +                                     rdm_start, rdm_end -
>> rdm_start) )
>> > +        {
>> > +            alloc_end = rdm_start;
>> > +            alloc_start = alloc_end - (nr_mfns << PAGE_SHIFT);
>> > +            BUG_ON(alloc_up >= alloc_start);
>> 
>> This is redundant with the BUG_ON() below afaict. Or at least it
>> would be, if you would properly update allow_down (if you don't
>> you may end up returning the same PFN for two allocations).
>> 
> 
> I'd like this being done once at init time. Once alloc_up/down is
> verified/adjusted, no need to add run-time overhead here.

I don't think that would work, as you can't predict where the holes
are, and limiting allocations to e.g. just the largest region between
any two holes may end up being too restrictive (without having
checked how much memory may get allocated this way in the
worst case). Of course there's also the problem to address that
the whole region used so far overlaps with an enforced hole.

Jan

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 17/17] xen/vtd: re-enable USB device assignment if enable pci_force
  2014-12-01  9:24 ` [v8][PATCH 17/17] xen/vtd: re-enable USB device assignment if enable pci_force Tiejun Chen
@ 2014-12-05 16:12   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 106+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-12-05 16:12 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, jbeulich, yang.z.zhang

On Mon, Dec 01, 2014 at 05:24:35PM +0800, Tiejun Chen wrote:
> Before we refined the RMRR mechanism, a USB RMRR might conflict with the
> guest BIOS region, so we always ignored USB RMRRs. Now this workaround can
> go away when we enable pci_force to check/reserve RMRRs.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  xen/drivers/passthrough/vtd/dmar.h  |  1 +
>  xen/drivers/passthrough/vtd/iommu.c | 12 ++++++++----
>  xen/drivers/passthrough/vtd/utils.c | 18 ++++++++++++++++++
>  3 files changed, 27 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/dmar.h b/xen/drivers/passthrough/vtd/dmar.h
> index a57c0d4..832dc32 100644
> --- a/xen/drivers/passthrough/vtd/dmar.h
> +++ b/xen/drivers/passthrough/vtd/dmar.h
> @@ -132,6 +132,7 @@ do {                                                \
>  int vtd_hw_check(void);
>  void disable_pmr(struct iommu *iommu);
>  int is_usb_device(u16 seg, u8 bus, u8 devfn);
> +int is_reserve_device_memory(struct domain *d, u8 bus, u8 devfn);
>  int is_igd_drhd(struct acpi_drhd_unit *drhd);
>  
>  #endif /* _DMAR_H_ */
> diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
> index ba40209..1f1ceb7 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -2264,9 +2264,11 @@ static int reassign_device_ownership(
>       * remove it from the hardware domain, because BIOS may use RMRR at
>       * booting time. Also account for the special casing of USB below (in
>       * intel_iommu_assign_device()).
> +     * But if we already check to reserve RMRR, this should be fine.
>       */
>      if ( !is_hardware_domain(source) &&
> -         !is_usb_device(pdev->seg, pdev->bus, pdev->devfn) )
> +         !is_usb_device(pdev->seg, pdev->bus, pdev->devfn) &&
> +         !is_reserve_device_memory(source, pdev->bus, pdev->devfn) )
>      {
>          const struct acpi_rmrr_unit *rmrr;
>          u16 bdf;
> @@ -2315,12 +2317,14 @@ static int intel_iommu_assign_device(
>      if ( ret )
>          return ret;
>  
> -    /* FIXME: Because USB RMRR conflicts with guest bios region,
> -     * ignore USB RMRR temporarily.
> +    /*
> +     * Because USB RMRR conflicts with guest bios region,
> +     * ignore USB RMRR temporarily in case of non-reserving-RMRR.
>       */
>      seg = pdev->seg;
>      bus = pdev->bus;
> -    if ( is_usb_device(seg, bus, pdev->devfn) )
> +    if ( is_usb_device(seg, bus, pdev->devfn) &&
> +         !is_reserve_device_memory(d, bus, pdev->devfn) )
>          return 0;
>  
>      /* Setup rmrr identity mapping */
> diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c
> index a33564b..1045ac1 100644
> --- a/xen/drivers/passthrough/vtd/utils.c
> +++ b/xen/drivers/passthrough/vtd/utils.c
> @@ -36,6 +36,24 @@ int is_usb_device(u16 seg, u8 bus, u8 devfn)
>      return (class == 0xc03);
>  }
>  
> +int is_reserve_device_memory(struct domain *d, u8 bus, u8 devfn)
> +{
> +    int i = 0;
> +
> +    if ( d->arch.hvm_domain.pci_force == PCI_DEV_RDM_CHECK )
> +        return 1;

Ouch. What if the 'hvm_domain' is not there? Please check
first for that.
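
E.g. (untested) by bailing early:

    if ( !is_hvm_domain(d) )
        return 0;
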
> +
> +    for ( i = 0; i < d->arch.hvm_domain.num_pcidevs; i++ )
> +    {
> +        if ( d->arch.hvm_domain.pcidevs[i].bus == bus &&
> +             d->arch.hvm_domain.pcidevs[i].devfn == devfn &&
> +             d->arch.hvm_domain.pcidevs[i].flags == PCI_DEV_RDM_CHECK )
> +        return 1;
> +    }
> +
> +    return 0;
> +}
> +
>  /* Disable vt-d protected memory registers. */
>  void disable_pmr(struct iommu *iommu)
>  {
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 02/17] introduce XEN_DOMCTL_set_rdm
  2014-12-02  8:33   ` Tian, Kevin
@ 2014-12-08  1:30     ` Chen, Tiejun
  0 siblings, 0 replies; 106+ messages in thread
From: Chen, Tiejun @ 2014-12-08  1:30 UTC (permalink / raw)
  To: Tian, Kevin, jbeulich, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2, tim, Zhang, Yang Z
  Cc: xen-devel

On 2014/12/2 16:33, Tian, Kevin wrote:
>> From: Chen, Tiejun
>> Sent: Monday, December 01, 2014 5:24 PM
>>
>> This should be based on a new parameter globally, 'pci_rdmforce'.
>>
>> pci_rdmforce = 1 => Of course this should be 0 by default.
>>
>> '1' means we should force check to reserve all ranges. If failed
>> VM wouldn't be created successfully. This also can give user a
>> chance to work well with later hotplug, even if not a device
>> assignment while creating VM.
>>
>> But we can override that by one specific pci device:
>>
>> pci = ['AA:BB.CC,rdmforce=0/1]
>>
>> But this 'rdmforce' should be 1 by default since obviously any
>> passthrough device always need to do this. Actually no one really
>> want to set as '0' so it may be unnecessary but I'd like to leave
>> this as a potential approach.
>
> since no one requires it, why bother adding it? better to just
> keep global option.

This originates from my preliminary thinking. Here I hope we can extend
this approach to enable the feature just for a specific device in the
hotplug case.

But yes, this definitely isn't mandatory now since we can take it as a
next step, so I can remove it right now, and of course I assume there's
no objection from anyone else.

>
>>
>> So this domctl provides an approach to control how to populate
>> reserved device memory by tools.
>>
>> Note we always post a message to user about this once we owns
>> RMRR.
>>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>> ---

[snip]

>> --- a/tools/libxl/libxl_dm.c
>> +++ b/tools/libxl/libxl_dm.c
>> @@ -90,6 +90,53 @@ const char *libxl__domain_device_model(libxl__gc *gc,
>>       return dm;
>>   }
>>
>> +int libxl__domain_device_setrdm(libxl__gc *gc,
>> +                                libxl_domain_config *d_config,
>> +                                uint32_t dm_domid)
>> +{
>> +    int i, ret;
>> +    libxl_ctx *ctx = libxl__gc_owner(gc);
>> +    struct xen_guest_pcidev_info *pcidevs = NULL;
>> +    uint32_t rdmforce = 0;
>> +
>> +    if ( d_config->num_pcidevs )
>> +    {
>> +        pcidevs =
>> malloc(d_config->num_pcidevs*sizeof(xen_guest_pcidev_info_t));
>> +        if ( pcidevs )
>> +        {
>> +            for (i = 0; i < d_config->num_pcidevs; i++)
>> +            {
>> +                pcidevs[i].devfn = PCI_DEVFN(d_config->pcidevs[i].dev,
>> +
>> d_config->pcidevs[i].func);
>> +                pcidevs[i].bus = d_config->pcidevs[i].bus;
>> +                pcidevs[i].seg = d_config->pcidevs[i].domain;
>> +                pcidevs[i].flags = d_config->pcidevs[i].rdmforce &
>> +                                   PCI_DEV_RDM_CHECK;
>> +            }
>> +        }
>> +        else
>> +        {
>> +            LIBXL__LOG(CTX, LIBXL__LOG_ERROR,
>> +                               "Can't allocate for pcidevs.");
>> +            return -1;
>> +        }
>> +    }
>> +    rdmforce = libxl_defbool_val(d_config->b_info.rdmforce) ? 1 : 0;
>> +
>> +    /* Nothing to do. */
>> +    if ( !rdmforce && !d_config->num_pcidevs )
>> +        return 0;
>
> move check before creating pcidevs.
>

Okay,

@@ -99,40 +99,33 @@ int libxl__domain_device_setrdm(libxl__gc *gc,
      struct xen_guest_pcidev_info *pcidevs = NULL;
      uint32_t rdmforce = 0;

-    if ( d_config->num_pcidevs )
-    {
-        pcidevs = malloc(d_config->num_pcidevs*sizeof(xen_guest_pcidev_info_t));
-        if ( pcidevs )
-        {
-            for (i = 0; i < d_config->num_pcidevs; i++)
-            {
-                pcidevs[i].devfn = PCI_DEVFN(d_config->pcidevs[i].dev,
-                                             d_config->pcidevs[i].func);
-                pcidevs[i].bus = d_config->pcidevs[i].bus;
-                pcidevs[i].seg = d_config->pcidevs[i].domain;
-                pcidevs[i].flags = d_config->pcidevs[i].rdmforce &
-                                   PCI_DEV_RDM_CHECK;
-            }
-        }
-        else
-        {
-            LIBXL__LOG(CTX, LIBXL__LOG_ERROR,
-                               "Can't allocate for pcidevs.");
-            return -1;
-        }
-    }
      rdmforce = libxl_defbool_val(d_config->b_info.rdmforce) ? 1 : 0;
-
      /* Nothing to do. */
      if ( !rdmforce && !d_config->num_pcidevs )
          return 0;

+    pcidevs = malloc(d_config->num_pcidevs*sizeof(xen_guest_pcidev_info_t));
+    if ( pcidevs )
+    {
+        for (i = 0; i < d_config->num_pcidevs; i++)
+        {
+            pcidevs[i].devfn = PCI_DEVFN(d_config->pcidevs[i].dev,
+                                         d_config->pcidevs[i].func);
+            pcidevs[i].bus = d_config->pcidevs[i].bus;
+            pcidevs[i].seg = d_config->pcidevs[i].domain;
+        }
+    }
+    else
+    {
+        LIBXL__LOG(CTX, LIBXL__LOG_ERROR, "Can't allocate for pcidevs.");
+        return -1;
+    }
+
      ret = xc_domain_device_setrdm(ctx->xch, dm_domid,
                                    (uint32_t)d_config->num_pcidevs,
-                                  rdmforce,
+                                  rdmforce & PCI_DEV_RDM_CHECK,
                                    pcidevs);
-    if ( d_config->num_pcidevs )
-        free(pcidevs);
+    free(pcidevs);

      return ret;
  }


Thanks
Tiejun

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 02/17] introduce XEN_DOMCTL_set_rdm
  2014-12-02 19:39   ` Konrad Rzeszutek Wilk
@ 2014-12-08  3:16     ` Chen, Tiejun
  2014-12-08 15:57       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 106+ messages in thread
From: Chen, Tiejun @ 2014-12-08  3:16 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, jbeulich, yang.z.zhang


On 2014/12/3 3:39, Konrad Rzeszutek Wilk wrote:
> On Mon, Dec 01, 2014 at 05:24:20PM +0800, Tiejun Chen wrote:
>> This should be based on a new parameter globally, 'pci_rdmforce'.
>>
>> pci_rdmforce = 1 => Of course this should be 0 by default.
>>
>> '1' means we should force check to reserve all ranges. If failed
>> VM wouldn't be created successfully. This also can give user a
>> chance to work well with later hotplug, even if not a device
>> assignment while creating VM.
>>
>> But we can override that by one specific pci device:
>>
>> pci = ['AA:BB.CC,rdmforce=0/1]
>>
>> But this 'rdmforce' should be 1 by default since obviously any
>> passthrough device always need to do this. Actually no one really
>> want to set as '0' so it may be unnecessary but I'd like to leave
>> this as a potential approach.
>>
>> So this domctl provides an approach to control how to populate
>> reserved device memory by tools.
>>
>> Note we always post a message to user about this once we owns
>> RMRR.
>>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>> ---
>>   docs/man/xl.cfg.pod.5              |  6 +++++
>>   docs/misc/vtd.txt                  | 15 ++++++++++++
>>   tools/libxc/include/xenctrl.h      |  6 +++++
>>   tools/libxc/xc_domain.c            | 28 +++++++++++++++++++++++
>>   tools/libxl/libxl_create.c         |  3 +++
>>   tools/libxl/libxl_dm.c             | 47 ++++++++++++++++++++++++++++++++++++++
>>   tools/libxl/libxl_internal.h       |  4 ++++
>>   tools/libxl/libxl_types.idl        |  2 ++
>>   tools/libxl/libxlu_pci.c           |  2 ++
>>   tools/libxl/xl_cmdimpl.c           | 10 ++++++++
>
> In the past we had split the hypervisor and the
> toolstack patches in two. So that one could focus
> on the hypervisor ones first, and then in another
> patch on the toolstack.
>

Yes.

> But perhaps this was intended to be in one patch?

This change also involves docs, so it's a little bit harder to understand
the whole picture if we split it.

>
>>   xen/drivers/passthrough/pci.c      | 39 +++++++++++++++++++++++++++++++
>>   xen/drivers/passthrough/vtd/dmar.c |  8 +++++++
>>   xen/include/asm-x86/hvm/domain.h   |  4 ++++
>
> I don't see ARM here? Should there be an ARM variant of this? If not

ARM doesn't need this feature.

> should the toolstack ones only run under x86?

And I think this shouldn't break the current ARM path either. I mean this
would simply return, since ARM has no such hypercall handler.

>
>>   xen/include/public/domctl.h        | 21 +++++++++++++++++
>>   xen/xsm/flask/hooks.c              |  1 +
>>   15 files changed, 196 insertions(+)
>>
>> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
>> index 622ea53..9adc41e 100644
>> --- a/docs/man/xl.cfg.pod.5
>> +++ b/docs/man/xl.cfg.pod.5
>> @@ -645,6 +645,12 @@ dom0 without confirmation.  Please use with care.
>>   D0-D3hot power management states for the PCI device. False (0) by
>>   default.
>>
>> +=item B<rdmforce=BOOLEAN>
>> +
>> +(HVM/x86 only) Specifies that the VM would force to check and try to
>
> s/force/forced/

I guess you're saying 'be forced'.

>> +reserve all reserved device memory, like RMRR, associated to the PCI
>> +device. False (0) by default.
>
> Not sure I understand. How would the VM be forced to do this? Or is
> it that the hvmloader would force to do this? And if it fails (as you

Yes.

> say 'try') ? What then?

In most cases we can reserve these regions, but if an RMRR region
overlaps some fixed range, like the guest BIOS, we can't succeed in
that case.

>
>> +
>>   =back
>>
>>   =back
>> diff --git a/docs/misc/vtd.txt b/docs/misc/vtd.txt
>> index 9af0e99..23544d5 100644
>> --- a/docs/misc/vtd.txt
>> +++ b/docs/misc/vtd.txt
>> @@ -111,6 +111,21 @@ in the config file:
>>   To override for a specific device:
>>   	pci = [ '01:00.0,msitranslate=0', '03:00.0' ]
>>
>> +RDM, 'reserved device memory', for PCI Device Passthrough
>> +---------------------------------------------------------
>> +
>> +The BIOS controls some devices in terms of some reginos of memory used for
>
> Could you elaborate what 'some devices' are? Network cards? GPUs? What
> are the most commons ones.

Some legacy USB devices performing PS2 emulation, and the GPU has a
stolen memory region, as I remember.

>
> s/reginos/regions/

Fixed.

>
> And by regions you mean BAR regions?

No. I guess you want to know some background about RMRR :)

There's a good brief description in Linux:

What is RMRR?
-------------

There are some devices the BIOS controls, for e.g USB devices to perform
PS2 emulation. The regions of memory used for these devices are marked
reserved in the e820 map. When we turn on DMA translation, DMA to those
regions will fail. Hence BIOS uses RMRR to specify these regions along with
devices that need to access these regions. OS is expected to setup
unity mappings for these regions for these devices to access these regions.

>
>> +these devices. This kind of region should be reserved before creating a VM
>> +to make sure they are not occupied by RAM/MMIO to conflict, and also we can
>
> You said 'This' but here you are using the plural ' are'. IF you want it plural
> it needs to be 'These regions'

Thanks for your correction.

>> +create necessary IOMMU table successfully.
>> +
>> +To enable this globally, add "pci_rdmforce" in the config file:
>> +
>> +	pci_rdmforce = 1         (default is 0)
>
> The guest config file? Or /etc/xen/xl.conf ?

The guest config file. Here I just follow what 'pci_msitranslate' does,
since they share the same usage.

>
>> +
>> +Or just enable for a specific device:
>> +	pci = [ '01:00.0,rdmforce=1', '03:00.0' ]
>> +
>>
>>   Caveat on Conventional PCI Device Passthrough
>>   ---------------------------------------------
>> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h

[snip]

>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -34,6 +34,7 @@
>>   #include <xen/tasklet.h>
>>   #include <xsm/xsm.h>
>>   #include <asm/msi.h>
>> +#include <xen/stdbool.h>
>>
>>   struct pci_seg {
>>       struct list_head alldevs_list;
>> @@ -1553,6 +1554,44 @@ int iommu_do_pci_domctl(
>>           }
>>           break;
>>
>> +    case XEN_DOMCTL_set_rdm:
>> +    {
>> +        struct xen_domctl_set_rdm *xdsr = &domctl->u.set_rdm;
>> +        struct xen_guest_pcidev_info *pcidevs = NULL;
>> +        struct domain *d = rcu_lock_domain_by_any_id(domctl->domain);
>> +
>> +        if ( d == NULL )
>> +            return -ESRCH;
>> +
>
> What if this is called on an PV domain?

Currently we only support this for HVM, so I'd like to add this:

          if ( d == NULL )
              return -ESRCH;

+        ASSERT( is_hvm_domain(d) );
+

>
> You are also missing the XSM checks.

Just see this below.

>
> What if this is called multiple times. Is it OK to over-ride
> the 'pci_force' or should it stick once?

It should be fine since only xc/hvmloader need this information while
creating a VM.

And especially, currently we just call this once to set it. So why would
we need to call this again and again? I think anyone who wants to extend
it to the case you're worried about should know the effects before taking
action, right?

>
>
>> +        d->arch.hvm_domain.pci_force =
>> +                            xdsr->flags & PCI_DEV_RDM_CHECK ? true : false;
>
> Won't we crash here if this is called for PV guests?

After the line, 'ASSERT( is_hvm_domain(d) );', is added, this problem 
should be gone.

>
>> +        d->arch.hvm_domain.num_pcidevs = xdsr->num_pcidevs;
>
> What if the 'num_pcidevs' has some bogus value. You need to check for that.

This value is grabbed from the existing interface, assign_device, so I
mean it is already checked.

>
>
>> +        d->arch.hvm_domain.pcidevs = NULL;
>
> Please first free it. It might be that the toolstack
> is doing this a couple of times. You don't want to leak memory.
>

Okay,

+        if ( d->arch.hvm_domain.pcidevs )
+            xfree(d->arch.hvm_domain.pcidevs);

>
>> +
>> +        if ( xdsr->num_pcidevs )
>> +        {
>> +            pcidevs = xmalloc_array(xen_guest_pcidev_info_t,
>> +                                    xdsr->num_pcidevs);
>> +            if ( pcidevs == NULL )
>> +            {
>> +                rcu_unlock_domain(d);
>> +                return -ENOMEM;
>
> But you already have set 'num_pcidevs' to some value. This copying/check
> should be done before you modify 'd->arch.hvm_domain'...

This makes sense so I'll move down this fragment.

>> +            }
>> +
>> +            if ( copy_from_guest(pcidevs, xdsr->pcidevs,
>> +                                 xdsr->num_pcidevs*sizeof(*pcidevs)) )
>> +            {
>> +                xfree(pcidevs);
>> +                rcu_unlock_domain(d);
>
> Ditto. You need to do these checks before you modify 'd->arch.hvm_domain'.
>
>> +                return -EFAULT;
>> +            }
>> +        }
>> +
>> +        d->arch.hvm_domain.pcidevs = pcidevs;
>> +        rcu_unlock_domain(d);
>> +    }
>> +        break;
>> +
>>       case XEN_DOMCTL_assign_device:
>>           if ( unlikely(d->is_dying) )
>>           {
>> diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
>> index 1152c3a..5e41e7a 100644
>> --- a/xen/drivers/passthrough/vtd/dmar.c
>> +++ b/xen/drivers/passthrough/vtd/dmar.c
>> @@ -674,6 +674,14 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
>>                           "  RMRR region: base_addr %"PRIx64
>>                           " end_address %"PRIx64"\n",
>>                           rmrru->base_address, rmrru->end_address);
>> +            /*
>> +             * TODO: we may provide a precise paramter just to reserve
>
> s/paramter/parameter/

Fixed.

>> +             * RMRR range specific to one device.
>> +             */
>> +            dprintk(XENLOG_WARNING VTDPREFIX,
>> +                    "So please set pci_rdmforce to reserve these ranges"
>> +                    " if you need such a device in hotplug case.\n");

s/hotplug/passthrough

>
> 'Please set rdmforce to reserve ranges %lx->%lx if you plan to hotplug this device.'
>
> But then this is going to be a bit verbose, so perhaps:
>
> 'Ranges %lx-%lx need rdmforce to properly work.' ?

It's unnecessary to output the range again since we already have such a
print message here.

>
>> +
>>               acpi_register_rmrr_unit(rmrru);
>>           }
>>       }
>> diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
>> index 2757c7f..38530e5 100644
>> --- a/xen/include/asm-x86/hvm/domain.h
>> +++ b/xen/include/asm-x86/hvm/domain.h
>> @@ -90,6 +90,10 @@ struct hvm_domain {
>>       /* Cached CF8 for guest PCI config cycles */
>>       uint32_t                pci_cf8;
>>
>
> Maybe a comment explaining its purpose?

Okay.

    /* Force to check/reserve Reserved Device Memory. */
    bool_t                  pci_force;

>
>> +    bool_t                  pci_force;
>> +    uint32_t                num_pcidevs;
>> +    struct xen_guest_pcidev_info      *pcidevs;
>> +
>
> You are also missing freeing of this in the hypervisor when the guest
> is destroyed. Please fix that.

You're right. I will address that in the next revision.

>
>>       struct pl_time         pl_time;
>>
>>       struct hvm_io_handler *io_handler;
>> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
>> index 57e2ed7..ba8970d 100644
>> --- a/xen/include/public/domctl.h
>> +++ b/xen/include/public/domctl.h
>> @@ -508,6 +508,25 @@ struct xen_domctl_get_device_group {
>>   typedef struct xen_domctl_get_device_group xen_domctl_get_device_group_t;
>>   DEFINE_XEN_GUEST_HANDLE(xen_domctl_get_device_group_t);
>>
>> +/* Currently just one bit to indicate force to check Reserved Device Memory. */
>
> Not sure I understand. Did you mean:
>
> 'Check Reserved Device Memory'.

I can change this to '...force checking Reserved Device Memory.'

>
> What happens if you do not have this flag? What are the semantics
> of this hypercall - as in what will it mean.

Without this flag, devices that own an RMRR can't work in the
passthrough case.

>
>> +#define PCI_DEV_RDM_CHECK   0x1
>> +struct xen_guest_pcidev_info {
>> +    uint16_t    seg;
>> +    uint8_t     bus;
>> +    uint8_t     devfn;
>> +    uint32_t    flags;
>> +};
>> +typedef struct xen_guest_pcidev_info xen_guest_pcidev_info_t;
>> +DEFINE_XEN_GUEST_HANDLE(xen_guest_pcidev_info_t);
>> +/* Control whether/how we check and reserve device memory. */
>> +struct xen_domctl_set_rdm {
>> +    uint32_t    flags;
>
> What is this 'flags' purpose compared to the 'pcidevs.flags'? Please
> explain.

I replied to Kevin about this: we just need a global flag, so we can
remove pcidevs.flags.

>
>> +    uint32_t    num_pcidevs;
>> +    XEN_GUEST_HANDLE_64(xen_guest_pcidev_info_t) pcidevs;
>> +};
>> +typedef struct xen_domctl_set_rdm xen_domctl_set_rdm_t;
>> +DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_rdm_t);
>> +
>>   /* Pass-through interrupts: bind real irq -> hvm devfn. */
>>   /* XEN_DOMCTL_bind_pt_irq */
>>   /* XEN_DOMCTL_unbind_pt_irq */
>> @@ -1070,6 +1089,7 @@ struct xen_domctl {
>>   #define XEN_DOMCTL_setvnumainfo                  74
>>   #define XEN_DOMCTL_psr_cmt_op                    75
>>   #define XEN_DOMCTL_arm_configure_domain          76
>> +#define XEN_DOMCTL_set_rdm                       77
>>   #define XEN_DOMCTL_gdbsx_guestmemio            1000
>>   #define XEN_DOMCTL_gdbsx_pausevcpu             1001
>>   #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
>> @@ -1135,6 +1155,7 @@ struct xen_domctl {
>>           struct xen_domctl_gdbsx_domstatus   gdbsx_domstatus;
>>           struct xen_domctl_vnuma             vnuma;
>>           struct xen_domctl_psr_cmt_op        psr_cmt_op;
>> +        struct xen_domctl_set_rdm           set_rdm;
>>           uint8_t                             pad[128];
>>       } u;
>>   };
>> diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
>> index d48463f..5a760e2 100644
>> --- a/xen/xsm/flask/hooks.c
>> +++ b/xen/xsm/flask/hooks.c
>> @@ -592,6 +592,7 @@ static int flask_domctl(struct domain *d, int cmd)
>>       case XEN_DOMCTL_test_assign_device:
>>       case XEN_DOMCTL_assign_device:
>>       case XEN_DOMCTL_deassign_device:
>> +    case XEN_DOMCTL_set_rdm:
>
> There is more to XSM than just this file..

But I don't see anything else needed, just like for XEN_DOMCTL_assign_device.

>
> Please compile with XSM enabled.

Anyway, I added XSM_ENABLE = y and FLASK_ENABLE = y in Config.mk and then
recompiled, and it looks good.

Anything I'm missing?

>>   #endif
>>           return 0;
>
>
> Also how does this work with 32-bit dom0s? Is there a need to use the
> compat layer?

Are you referring to the XSM case? Something else?

Actually this new DOMCTL is similar to XEN_DOMCTL_assign_device in
some senses, but I don't see the issue you're pointing at.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 02/17] introduce XEN_DOMCTL_set_rdm
  2014-12-04 15:33   ` Jan Beulich
  2014-12-05  6:13     ` Tian, Kevin
@ 2014-12-08  6:06     ` Chen, Tiejun
  2014-12-08  8:43       ` Jan Beulich
  1 sibling, 1 reply; 106+ messages in thread
From: Chen, Tiejun @ 2014-12-08  6:06 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, yang.z.zhang

On 2014/12/4 23:33, Jan Beulich wrote:
>>>> On 01.12.14 at 10:24, <tiejun.chen@intel.com> wrote:
>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -34,6 +34,7 @@
>>   #include <xen/tasklet.h>
>>   #include <xsm/xsm.h>
>>   #include <asm/msi.h>
>> +#include <xen/stdbool.h>
>
> Please don't - we use bool_t in the hypervisor, not bool. The header

Yes.

> only exists for source code shared with the tools.

Looks like this would be fine:

d->arch.hvm_domain.pci_force = xdsr->flags & PCI_DEV_RDM_CHECK;

>
>> @@ -1553,6 +1554,44 @@ int iommu_do_pci_domctl(
>>           }
>>           break;
>>
>> +    case XEN_DOMCTL_set_rdm:
>> +    {
>> +        struct xen_domctl_set_rdm *xdsr = &domctl->u.set_rdm;
>> +        struct xen_guest_pcidev_info *pcidevs = NULL;
>> +        struct domain *d = rcu_lock_domain_by_any_id(domctl->domain);
>
> "d" gets passed into this function - no need to shadow the variable

You're right.

> and (wrongly) re-obtain the pointer.
>
>> +
>> +        if ( d == NULL )
>> +            return -ESRCH;
>> +
>> +        d->arch.hvm_domain.pci_force =
>> +                            xdsr->flags & PCI_DEV_RDM_CHECK ? true : false;
>> +        d->arch.hvm_domain.num_pcidevs = xdsr->num_pcidevs;
>
> You shouldn't set the count before setting the pointer.

Will reorder them.
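
Roughly like this (just a sketch of the ordering I have in mind; the 
surrounding allocation/copy_from_guest() error handling stays as is):

     /* Publish the count only after the pointer is valid, so that
      * num_pcidevs never describes a stale/NULL pcidevs. */
     d->arch.hvm_domain.pcidevs = pcidevs;
     d->arch.hvm_domain.num_pcidevs = xdsr->num_pcidevs;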

>
>> +        d->arch.hvm_domain.pcidevs = NULL;
>> +
>> +        if ( xdsr->num_pcidevs )
>> +        {
>> +            pcidevs = xmalloc_array(xen_guest_pcidev_info_t,
>> +                                    xdsr->num_pcidevs);
>
> New domctl-s must not represent security risks: xdsr->num_pcidevs
> can be (almost) arbitrarily large - do you really want to allow such
> huge allocations? A reasonable upper bound could for example be

Sorry, as you know num_pcidevs comes from the tools, and it actually 
shares the result of the existing assign_device hypercall while parsing 
'pci=[]', so I don't understand why it can be (almost) arbitrarily 
large.

> the total number of PCI devices the hypervisor knows about.

I took a quick look at this, but it seems we don't have such an exact 
value available to read directly.

>
>> +            if ( pcidevs == NULL )
>> +            {
>> +                rcu_unlock_domain(d);
>> +                return -ENOMEM;
>> +            }
>> +
>> +            if ( copy_from_guest(pcidevs, xdsr->pcidevs,
>> +                                 xdsr->num_pcidevs*sizeof(*pcidevs)) )
>> +            {
>> +                xfree(pcidevs);
>> +                rcu_unlock_domain(d);
>> +                return -EFAULT;
>> +            }
>> +        }
>> +
>> +        d->arch.hvm_domain.pcidevs = pcidevs;
>
> If the operation gets issued more than once for a given domain,
> you're leaking the old pointer here. Overall should think a bit
> more about this multiple use case (or outright disallow it).

Currently this should be disallowed, so I will do this:

     case XEN_DOMCTL_set_rdm:
     {
         struct xen_domctl_set_rdm *xdsr = &domctl->u.set_rdm;
         struct xen_guest_pcidev_info *pcidevs = NULL;

         if ( d->arch.hvm_domain.pcidevs )
             break;
	...

>
>> --- a/xen/drivers/passthrough/vtd/dmar.c
>> +++ b/xen/drivers/passthrough/vtd/dmar.c
>> @@ -674,6 +674,14 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
>>                           "  RMRR region: base_addr %"PRIx64
>>                           " end_address %"PRIx64"\n",
>>                           rmrru->base_address, rmrru->end_address);
>> +            /*
>> +             * TODO: we may provide a precise paramter just to reserve
>> +             * RMRR range specific to one device.
>> +             */
>> +            dprintk(XENLOG_WARNING VTDPREFIX,
>> +                    "So please set pci_rdmforce to reserve these ranges"
>> +                    " if you need such a device in hotplug case.\n");
>
> It makes no sense to use dprintk() here. I also don't see how this
> message relates to whatever may have been logged immediately
> before, so the wording ("So please set ...") is questionable. Nor is the
> reference to "hotplug case" meaningful here - in this context, only
> physical (host) device hotplug can be meant without further
> qualification. In the end I think trying to log something here is just
> wrong - simply drop the message and make sure whatever you want

Okay.

> to say can be found easily by looking elsewhere.

Maybe we can print something when we fail to set up those identity 
mappings, but by then it's too late...

>
>> --- a/xen/include/asm-x86/hvm/domain.h
>> +++ b/xen/include/asm-x86/hvm/domain.h
>> @@ -90,6 +90,10 @@ struct hvm_domain {
>>       /* Cached CF8 for guest PCI config cycles */
>>       uint32_t                pci_cf8;
>>
>> +    bool_t                  pci_force;
>> +    uint32_t                num_pcidevs;
>> +    struct xen_guest_pcidev_info      *pcidevs;
>
> Without a comment all these field names are pretty questionable.

Yeah, I'll try to add some comments:

     /* A global flag: we need to check/reserve all Reserved Device Memory. */
     bool_t                  pci_force;
     /*
      * If pci_force is 0, this represents how many pci devices we need
      * to check/reserve Reserved Device Memory for.
      * If pci_force is 1, this is always 0.
      */
     uint32_t                num_pcidevs;
     /* This represents those pci device instances when num_pcidevs != 0. */
     struct xen_guest_pcidev_info      *pcidevs;

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 03/17] introduce XENMEM_reserved_device_memory_map
  2014-12-02 19:47   ` Konrad Rzeszutek Wilk
@ 2014-12-08  6:17     ` Chen, Tiejun
  2014-12-08 10:00       ` Jan Beulich
  0 siblings, 1 reply; 106+ messages in thread
From: Chen, Tiejun @ 2014-12-08  6:17 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, jbeulich
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	ian.jackson, tim, xen-devel, yang.z.zhang

On 2014/12/3 3:47, Konrad Rzeszutek Wilk wrote:
> On Mon, Dec 01, 2014 at 05:24:21PM +0800, Tiejun Chen wrote:
>> From: Jan Beulich <jbeulich@suse.com>
>>
>> This is a prerequisite for punching holes into HVM and PVH guests' P2M
>> to allow passing through devices that are associated with (on VT-d)
>> RMRRs.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> Acked-by: Kevin Tian <kevin.tian@intel.com>
>> ---

[snip]

>> @@ -1101,6 +1129,29 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>           break;
>>       }
>>
>> +#ifdef HAS_PASSTHROUGH
>> +    case XENMEM_reserved_device_memory_map:
>> +    {
>> +        struct get_reserved_device_memory grdm;
>> +
>> +        if ( copy_from_guest(&grdm.map, arg, 1) ||
>> +             !guest_handle_okay(grdm.map.buffer, grdm.map.nr_entries) )
>> +            return -EFAULT;
>> +
>
> Shouldn't there be an XSM check here?

I'm not sure; Jan is the author of this patch, so he can give you a 
definitive answer.

>
>> +        grdm.used_entries = 0;
>> +        rc = iommu_get_reserved_device_memory(get_reserved_device_memory,
>> +                                              &grdm);
>> +
>
> Also since we doing an iteration over possible many nr_entries should
> we think about returning -EAGAIN to user-space so that it can retry?

Yes,

> (As in, have preemption baked in this hypercall)
>
>> +        if ( !rc && grdm.map.nr_entries < grdm.used_entries )
>> +            rc = -ENOBUFS;

we already have this return value here.

Thanks
Tiejun

>> +        grdm.map.nr_entries = grdm.used_entries;
>> +        if ( __copy_to_guest(arg, &grdm.map, 1) )
>> +            rc = -EFAULT;
>> +
>> +        break;
>> +    }
>> +#endif
>> +
>>       default:
>>           rc = arch_memory_op(cmd, arg);
>>           break;
>> diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
>> index cc12735..7c17e8d 100644
>> --- a/xen/drivers/passthrough/iommu.c
>> +++ b/xen/drivers/passthrough/iommu.c
>> @@ -344,6 +344,16 @@ void iommu_crash_shutdown(void)
>>       iommu_enabled = iommu_intremap = 0;
>>   }
>>
>> +int iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
>> +{
>> +    const struct iommu_ops *ops = iommu_get_ops();
>> +
>> +    if ( !iommu_enabled || !ops->get_reserved_device_memory )
>> +        return 0;
>> +
>> +    return ops->get_reserved_device_memory(func, ctxt);
>> +}
>> +
>>   bool_t iommu_has_feature(struct domain *d, enum iommu_feature feature)
>>   {
>>       const struct hvm_iommu *hd = domain_hvm_iommu(d);
>> diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
>> index 5e41e7a..86cfad3 100644
>> --- a/xen/drivers/passthrough/vtd/dmar.c
>> +++ b/xen/drivers/passthrough/vtd/dmar.c
>> @@ -901,3 +901,20 @@ int platform_supports_x2apic(void)
>>       unsigned int mask = ACPI_DMAR_INTR_REMAP | ACPI_DMAR_X2APIC_OPT_OUT;
>>       return cpu_has_x2apic && ((dmar_flags & mask) == ACPI_DMAR_INTR_REMAP);
>>   }
>> +
>> +int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
>> +{
>> +    struct acpi_rmrr_unit *rmrr;
>> +    int rc = 0;
>> +
>> +    list_for_each_entry(rmrr, &acpi_rmrr_units, list)
>> +    {
>> +        rc = func(PFN_DOWN(rmrr->base_address),
>> +                  PFN_UP(rmrr->end_address) - PFN_DOWN(rmrr->base_address),
>> +                  ctxt);
>> +        if ( rc )
>> +            break;
>> +    }
>> +
>> +    return rc;
>> +}
>> diff --git a/xen/drivers/passthrough/vtd/extern.h b/xen/drivers/passthrough/vtd/extern.h
>> index 5524dba..f9ee9b0 100644
>> --- a/xen/drivers/passthrough/vtd/extern.h
>> +++ b/xen/drivers/passthrough/vtd/extern.h
>> @@ -75,6 +75,7 @@ int domain_context_mapping_one(struct domain *domain, struct iommu *iommu,
>>                                  u8 bus, u8 devfn, const struct pci_dev *);
>>   int domain_context_unmap_one(struct domain *domain, struct iommu *iommu,
>>                                u8 bus, u8 devfn);
>> +int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt);
>>
>>   unsigned int io_apic_read_remap_rte(unsigned int apic, unsigned int reg);
>>   void io_apic_write_remap_rte(unsigned int apic,
>> diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
>> index 19d8165..a38f201 100644
>> --- a/xen/drivers/passthrough/vtd/iommu.c
>> +++ b/xen/drivers/passthrough/vtd/iommu.c
>> @@ -2491,6 +2491,7 @@ const struct iommu_ops intel_iommu_ops = {
>>       .crash_shutdown = vtd_crash_shutdown,
>>       .iotlb_flush = intel_iommu_iotlb_flush,
>>       .iotlb_flush_all = intel_iommu_iotlb_flush_all,
>> +    .get_reserved_device_memory = intel_iommu_get_reserved_device_memory,
>>       .dump_p2m_table = vtd_dump_p2m_table,
>>   };
>>
>> diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
>> index 595f953..cee4535 100644
>> --- a/xen/include/public/memory.h
>> +++ b/xen/include/public/memory.h
>> @@ -572,7 +572,29 @@ struct xen_vnuma_topology_info {
>>   typedef struct xen_vnuma_topology_info xen_vnuma_topology_info_t;
>>   DEFINE_XEN_GUEST_HANDLE(xen_vnuma_topology_info_t);
>>
>> -/* Next available subop number is 27 */
>> +/*
>> + * For legacy reasons, some devices must be configured with special memory
>> + * regions to function correctly.  The guest must avoid using any of these
>> + * regions.
>> + */
>> +#define XENMEM_reserved_device_memory_map   27
>> +struct xen_reserved_device_memory {
>> +    xen_pfn_t start_pfn;
>> +    xen_ulong_t nr_pages;
>> +};
>> +typedef struct xen_reserved_device_memory xen_reserved_device_memory_t;
>> +DEFINE_XEN_GUEST_HANDLE(xen_reserved_device_memory_t);
>> +
>> +struct xen_reserved_device_memory_map {
>> +    /* IN/OUT */
>> +    unsigned int nr_entries;
>> +    /* OUT */
>> +    XEN_GUEST_HANDLE(xen_reserved_device_memory_t) buffer;
>> +};
>> +typedef struct xen_reserved_device_memory_map xen_reserved_device_memory_map_t;
>> +DEFINE_XEN_GUEST_HANDLE(xen_reserved_device_memory_map_t);
>> +
>> +/* Next available subop number is 28 */
>>
>>   #endif /* __XEN_PUBLIC_MEMORY_H__ */
>>
>> diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
>> index 8eb764a..409f6f8 100644
>> --- a/xen/include/xen/iommu.h
>> +++ b/xen/include/xen/iommu.h
>> @@ -120,6 +120,8 @@ void iommu_dt_domain_destroy(struct domain *d);
>>
>>   struct page_info;
>>
>> +typedef int iommu_grdm_t(xen_pfn_t start, xen_ulong_t nr, void *ctxt);
>> +
>>   struct iommu_ops {
>>       int (*init)(struct domain *d);
>>       void (*hwdom_init)(struct domain *d);
>> @@ -156,12 +158,14 @@ struct iommu_ops {
>>       void (*crash_shutdown)(void);
>>       void (*iotlb_flush)(struct domain *d, unsigned long gfn, unsigned int page_count);
>>       void (*iotlb_flush_all)(struct domain *d);
>> +    int (*get_reserved_device_memory)(iommu_grdm_t *, void *);
>>       void (*dump_p2m_table)(struct domain *d);
>>   };
>>
>>   void iommu_suspend(void);
>>   void iommu_resume(void);
>>   void iommu_crash_shutdown(void);
>> +int iommu_get_reserved_device_memory(iommu_grdm_t *, void *);
>>
>>   void iommu_share_p2m_table(struct domain *d);
>>
>> diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
>> index 41b3e35..42229fd 100644
>> --- a/xen/include/xlat.lst
>> +++ b/xen/include/xlat.lst
>> @@ -61,9 +61,10 @@
>>   !	memory_exchange			memory.h
>>   !	memory_map			memory.h
>>   !	memory_reservation		memory.h
>> -?	mem_access_op		memory.h
>> +?	mem_access_op			memory.h
>>   !	pod_target			memory.h
>>   !	remove_from_physmap		memory.h
>> +!	reserved_device_memory_map	memory.h
>>   ?	physdev_eoi			physdev.h
>>   ?	physdev_get_free_pirq		physdev.h
>>   ?	physdev_irq			physdev.h
>> --
>> 1.9.1
>>
>

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 04/17] update the existing hypercall to support XEN_DOMCTL_set_rdm
  2014-12-02  8:46   ` Tian, Kevin
@ 2014-12-08  6:22     ` Chen, Tiejun
  0 siblings, 0 replies; 106+ messages in thread
From: Chen, Tiejun @ 2014-12-08  6:22 UTC (permalink / raw)
  To: Tian, Kevin, jbeulich, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2, tim, Zhang, Yang Z
  Cc: xen-devel

On 2014/12/2 16:46, Tian, Kevin wrote:
>> From: Chen, Tiejun
>> Sent: Monday, December 01, 2014 5:24 PM
>>
>> After we intend to expost that hypercall explicitly based on
>> XEN_DOMCTL_set_rdm, we need this rebase. I hope we can squash
>> this into that previous patch once Jan Ack this.
>
> better to merge together, since it's the right thing to do based on previous
> discussion.

As I said I will do this after this patch is acked.

>
> one question about 'd->arch.hvm_domain.pci_force'. My impression is
> that this flag enables force check, and while enabled, you'll always

Yes.

> do selected BDF filtering by default. However from below code, seems

No.

> pci_force is used to whether report all or selected regions. Am I reading
> it wrong?

	if ( d->arch.hvm_domain.pci_force )
	{
		...
	}
	else
	{
		for ( i = 0; i < d->arch.hvm_domain.num_pcidevs; i++ )
		...
	}

So we check d->arch.hvm_domain.pci_force first, and only filter the 
selected BDFs when it isn't set :)

Thanks
Tiejun

>
>>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>> ---
>>   xen/common/compat/memory.c         | 75
>> ++++++++++++++++++++++++++++++--------
>>   xen/common/memory.c                | 71
>> +++++++++++++++++++++++++++++-------
>>   xen/drivers/passthrough/vtd/dmar.c | 32 ++++++++++++----
>>   xen/include/public/memory.h        |  5 +++
>>   xen/include/xen/iommu.h            |  2 +-
>>   xen/include/xen/pci.h              |  2 +
>>   6 files changed, 148 insertions(+), 39 deletions(-)
>>
>> diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c
>> index 60512fa..e6a256e 100644
>> --- a/xen/common/compat/memory.c
>> +++ b/xen/common/compat/memory.c
>> @@ -22,27 +22,66 @@ struct get_reserved_device_memory {
>>       unsigned int used_entries;
>>   };
>>
>> -static int get_reserved_device_memory(xen_pfn_t start,
>> -                                      xen_ulong_t nr, void *ctxt)
>> +static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
>> +                                      u32 id, void *ctxt)
>>   {
>>       struct get_reserved_device_memory *grdm = ctxt;
>> +    struct domain *d;
>> +    unsigned int i;
>> +    u32 sbdf;
>> +    struct compat_reserved_device_memory rdm = {
>> +        .start_pfn = start, .nr_pages = nr
>> +    };
>>
>> -    if ( grdm->used_entries < grdm->map.nr_entries )
>> -    {
>> -        struct compat_reserved_device_memory rdm = {
>> -            .start_pfn = start, .nr_pages = nr
>> -        };
>> +    if ( rdm.start_pfn != start || rdm.nr_pages != nr )
>> +        return -ERANGE;
>>
>> -        if ( rdm.start_pfn != start || rdm.nr_pages != nr )
>> -            return -ERANGE;
>> +    d = rcu_lock_domain_by_any_id(grdm->map.domid);
>> +    if ( d == NULL )
>> +        return -ESRCH;
>>
>> -        if ( __copy_to_compat_offset(grdm->map.buffer,
>> grdm->used_entries,
>> -                                     &rdm, 1) )
>> -            return -EFAULT;
>> +    if ( d )
>> +    {
>> +        if ( d->arch.hvm_domain.pci_force )
>> +        {
>> +            if ( grdm->used_entries < grdm->map.nr_entries )
>> +            {
>> +                if ( __copy_to_compat_offset(grdm->map.buffer,
>> +                                             grdm->used_entries,
>> +                                             &rdm, 1) )
>> +                {
>> +                    rcu_unlock_domain(d);
>> +                    return -EFAULT;
>> +                }
>> +            }
>> +            ++grdm->used_entries;
>> +        }
>> +        else
>> +        {
>> +            for ( i = 0; i < d->arch.hvm_domain.num_pcidevs; i++ )
>> +            {
>> +                sbdf = PCI_SBDF2(d->arch.hvm_domain.pcidevs[i].seg,
>> +                                 d->arch.hvm_domain.pcidevs[i].bus,
>> +
>> d->arch.hvm_domain.pcidevs[i].devfn);
>> +                if ( sbdf == id )
>> +                {
>> +                    if ( grdm->used_entries < grdm->map.nr_entries )
>> +                    {
>> +                        if
>> ( __copy_to_compat_offset(grdm->map.buffer,
>> +
>> grdm->used_entries,
>> +                                                     &rdm, 1) )
>> +                        {
>> +                            rcu_unlock_domain(d);
>> +                            return -EFAULT;
>> +                        }
>> +                    }
>> +                    ++grdm->used_entries;
>> +                }
>> +            }
>> +        }
>>       }
>>
>> -    ++grdm->used_entries;
>> -
>> +    rcu_unlock_domain(d);
>>       return 0;
>>   }
>>   #endif
>> @@ -319,9 +358,13 @@ int compat_memory_op(unsigned int cmd,
>> XEN_GUEST_HANDLE_PARAM(void) compat)
>>
>>               if ( !rc && grdm.map.nr_entries < grdm.used_entries )
>>                   rc = -ENOBUFS;
>> +
>>               grdm.map.nr_entries = grdm.used_entries;
>> -            if ( __copy_to_guest(compat, &grdm.map, 1) )
>> -                rc = -EFAULT;
>> +            if ( grdm.map.nr_entries )
>> +            {
>> +                if ( __copy_to_guest(compat, &grdm.map, 1) )
>> +                    rc = -EFAULT;
>> +            }
>>
>>               return rc;
>>           }
>> diff --git a/xen/common/memory.c b/xen/common/memory.c
>> index 4788acc..9ce82b1 100644
>> --- a/xen/common/memory.c
>> +++ b/xen/common/memory.c
>> @@ -698,24 +698,63 @@ struct get_reserved_device_memory {
>>       unsigned int used_entries;
>>   };
>>
>> -static int get_reserved_device_memory(xen_pfn_t start,
>> -                                      xen_ulong_t nr, void *ctxt)
>> +static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
>> +                                      u32 id, void *ctxt)
>>   {
>>       struct get_reserved_device_memory *grdm = ctxt;
>> +    struct domain *d;
>> +    unsigned int i;
>> +    u32 sbdf;
>> +    struct xen_reserved_device_memory rdm = {
>> +        .start_pfn = start, .nr_pages = nr
>> +    };
>>
>> -    if ( grdm->used_entries < grdm->map.nr_entries )
>> -    {
>> -        struct xen_reserved_device_memory rdm = {
>> -            .start_pfn = start, .nr_pages = nr
>> -        };
>> +    d = rcu_lock_domain_by_any_id(grdm->map.domid);
>> +    if ( d == NULL )
>> +        return -ESRCH;
>>
>> -        if ( __copy_to_guest_offset(grdm->map.buffer,
>> grdm->used_entries,
>> -                                    &rdm, 1) )
>> -            return -EFAULT;
>> +    if ( d )
>> +    {
>> +        if ( d->arch.hvm_domain.pci_force )
>> +        {
>> +            if ( grdm->used_entries < grdm->map.nr_entries )
>> +            {
>> +                if ( __copy_to_guest_offset(grdm->map.buffer,
>> +                                            grdm->used_entries,
>> +                                            &rdm, 1) )
>> +                {
>> +                    rcu_unlock_domain(d);
>> +                    return -EFAULT;
>> +                }
>> +            }
>> +            ++grdm->used_entries;
>> +        }
>> +        else
>> +        {
>> +            for ( i = 0; i < d->arch.hvm_domain.num_pcidevs; i++ )
>> +            {
>> +                sbdf = PCI_SBDF2(d->arch.hvm_domain.pcidevs[i].seg,
>> +                                 d->arch.hvm_domain.pcidevs[i].bus,
>> +
>> d->arch.hvm_domain.pcidevs[i].devfn);
>> +                if ( sbdf == id )
>> +                {
>> +                    if ( grdm->used_entries < grdm->map.nr_entries )
>> +                    {
>> +                        if ( __copy_to_guest_offset(grdm->map.buffer,
>> +
>> grdm->used_entries,
>> +                                                    &rdm, 1) )
>> +                        {
>> +                            rcu_unlock_domain(d);
>> +                            return -EFAULT;
>> +                        }
>> +                    }
>> +                    ++grdm->used_entries;
>> +                }
>> +            }
>> +        }
>>       }
>>
>> -    ++grdm->used_entries;
>> -
>> +    rcu_unlock_domain(d);
>>       return 0;
>>   }
>>   #endif
>> @@ -1144,9 +1183,13 @@ long do_memory_op(unsigned long cmd,
>> XEN_GUEST_HANDLE_PARAM(void) arg)
>>
>>           if ( !rc && grdm.map.nr_entries < grdm.used_entries )
>>               rc = -ENOBUFS;
>> +
>>           grdm.map.nr_entries = grdm.used_entries;
>> -        if ( __copy_to_guest(arg, &grdm.map, 1) )
>> -            rc = -EFAULT;
>> +        if ( grdm.map.nr_entries )
>> +        {
>> +            if ( __copy_to_guest(arg, &grdm.map, 1) )
>> +                rc = -EFAULT;
>> +        }
>>
>>           break;
>>       }
>> diff --git a/xen/drivers/passthrough/vtd/dmar.c
>> b/xen/drivers/passthrough/vtd/dmar.c
>> index 86cfad3..c5bc8d6 100644
>> --- a/xen/drivers/passthrough/vtd/dmar.c
>> +++ b/xen/drivers/passthrough/vtd/dmar.c
>> @@ -904,17 +904,33 @@ int platform_supports_x2apic(void)
>>
>>   int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void
>> *ctxt)
>>   {
>> -    struct acpi_rmrr_unit *rmrr;
>> +    struct acpi_rmrr_unit *rmrr, *rmrr_cur = NULL;
>>       int rc = 0;
>> +    unsigned int i;
>> +    u16 bdf;
>>
>> -    list_for_each_entry(rmrr, &acpi_rmrr_units, list)
>> +    for_each_rmrr_device ( rmrr, bdf, i )
>>       {
>> -        rc = func(PFN_DOWN(rmrr->base_address),
>> -                  PFN_UP(rmrr->end_address) -
>> PFN_DOWN(rmrr->base_address),
>> -                  ctxt);
>> -        if ( rc )
>> -            break;
>> +        if ( rmrr != rmrr_cur )
>> +        {
>> +            rc = func(PFN_DOWN(rmrr->base_address),
>> +                      PFN_UP(rmrr->end_address) -
>> +                        PFN_DOWN(rmrr->base_address),
>> +                      PCI_SBDF(rmrr->segment, bdf),
>> +                      ctxt);
>> +
>> +            if ( unlikely(rc < 0) )
>> +                return rc;
>> +
>> +            /* Just go next. */
>> +            if ( !rc )
>> +                rmrr_cur = rmrr;
>> +
>> +            /* Now just return specific to user requirement. */
>> +            if ( rc > 0 )
>> +                return rc;
>> +        }
>>       }
>>
>> -    return rc;
>> +    return 0;
>>   }
>> diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
>> index cee4535..0d0544e 100644
>> --- a/xen/include/public/memory.h
>> +++ b/xen/include/public/memory.h
>> @@ -586,6 +586,11 @@ typedef struct xen_reserved_device_memory
>> xen_reserved_device_memory_t;
>>   DEFINE_XEN_GUEST_HANDLE(xen_reserved_device_memory_t);
>>
>>   struct xen_reserved_device_memory_map {
>> +    /*
>> +     * Domain whose reservation is being changed.
>> +     * Unprivileged domains can specify only DOMID_SELF.
>> +     */
>> +    domid_t        domid;
>>       /* IN/OUT */
>>       unsigned int nr_entries;
>>       /* OUT */
>> diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
>> index 409f6f8..8fc6d6d 100644
>> --- a/xen/include/xen/iommu.h
>> +++ b/xen/include/xen/iommu.h
>> @@ -120,7 +120,7 @@ void iommu_dt_domain_destroy(struct domain *d);
>>
>>   struct page_info;
>>
>> -typedef int iommu_grdm_t(xen_pfn_t start, xen_ulong_t nr, void *ctxt);
>> +typedef int iommu_grdm_t(xen_pfn_t start, xen_ulong_t nr, u32 id, void
>> *ctxt);
>>
>>   struct iommu_ops {
>>       int (*init)(struct domain *d);
>> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
>> index 5f295f3..d34205f 100644
>> --- a/xen/include/xen/pci.h
>> +++ b/xen/include/xen/pci.h
>> @@ -31,6 +31,8 @@
>>   #define PCI_DEVFN2(bdf) ((bdf) & 0xff)
>>   #define PCI_BDF(b,d,f)  ((((b) & 0xff) << 8) | PCI_DEVFN(d,f))
>>   #define PCI_BDF2(b,df)  ((((b) & 0xff) << 8) | ((df) & 0xff))
>> +#define PCI_SBDF(s,bdf) (((s & 0xffff) << 16) | (bdf & 0xffff))
>> +#define PCI_SBDF2(s,b,df) (((s & 0xffff) << 16) | PCI_BDF2(b,df))
>>
>>   struct pci_dev_info {
>>       bool_t is_extfn;
>> --
>> 1.9.1
>
>

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 04/17] update the existing hypercall to support XEN_DOMCTL_set_rdm
  2014-12-04 15:50   ` Jan Beulich
@ 2014-12-08  7:11     ` Chen, Tiejun
  2014-12-08  8:51       ` Jan Beulich
  0 siblings, 1 reply; 106+ messages in thread
From: Chen, Tiejun @ 2014-12-08  7:11 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, yang.z.zhang

On 2014/12/4 23:50, Jan Beulich wrote:
>>>> On 01.12.14 at 10:24, <tiejun.chen@intel.com> wrote:
>> --- a/xen/common/compat/memory.c
>> +++ b/xen/common/compat/memory.c
>> @@ -22,27 +22,66 @@ struct get_reserved_device_memory {
>>       unsigned int used_entries;
>>   };
>>
>> -static int get_reserved_device_memory(xen_pfn_t start,
>> -                                      xen_ulong_t nr, void *ctxt)
>> +static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
>> +                                      u32 id, void *ctxt)
>>   {
>>       struct get_reserved_device_memory *grdm = ctxt;
>> +    struct domain *d;
>> +    unsigned int i;
>> +    u32 sbdf;
>> +    struct compat_reserved_device_memory rdm = {
>> +        .start_pfn = start, .nr_pages = nr
>> +    };
>>
>> -    if ( grdm->used_entries < grdm->map.nr_entries )
>> -    {
>> -        struct compat_reserved_device_memory rdm = {
>> -            .start_pfn = start, .nr_pages = nr
>> -        };
>> +    if ( rdm.start_pfn != start || rdm.nr_pages != nr )
>> +        return -ERANGE;
>>
>> -        if ( rdm.start_pfn != start || rdm.nr_pages != nr )
>> -            return -ERANGE;
>> +    d = rcu_lock_domain_by_any_id(grdm->map.domid);
>> +    if ( d == NULL )
>> +        return -ESRCH;
>
> So why are you doing this in the call back (potentially many times)
> instead of just once in compat_memory_op(), storing the pointer in
> the context structure?

Okay.

>
>>
>> -        if ( __copy_to_compat_offset(grdm->map.buffer, grdm->used_entries,
>> -                                     &rdm, 1) )
>> -            return -EFAULT;
>> +    if ( d )
>> +    {
>> +        if ( d->arch.hvm_domain.pci_force )
>
> You didn't verify that the domain is a HVM/PVH one.

Is this, ASSERT(is_hvm_domain(grdm.domain)), correct?

>
>> +        {
>> +            if ( grdm->used_entries < grdm->map.nr_entries )
>> +            {
>> +                if ( __copy_to_compat_offset(grdm->map.buffer,
>> +                                             grdm->used_entries,
>> +                                             &rdm, 1) )
>> +                {
>> +                    rcu_unlock_domain(d);
>> +                    return -EFAULT;
>> +                }
>> +            }
>> +            ++grdm->used_entries;
>> +        }
>> +        else
>> +        {
>> +            for ( i = 0; i < d->arch.hvm_domain.num_pcidevs; i++ )
>> +            {
>> +                sbdf = PCI_SBDF2(d->arch.hvm_domain.pcidevs[i].seg,
>> +                                 d->arch.hvm_domain.pcidevs[i].bus,
>> +                                 d->arch.hvm_domain.pcidevs[i].devfn);
>> +                if ( sbdf == id )
>> +                {
>> +                    if ( grdm->used_entries < grdm->map.nr_entries )
>> +                    {
>> +                        if ( __copy_to_compat_offset(grdm->map.buffer,
>> +                                                     grdm->used_entries,
>> +                                                     &rdm, 1) )
>> +                        {
>> +                            rcu_unlock_domain(d);
>> +                            return -EFAULT;
>> +                        }
>> +                    }
>> +                    ++grdm->used_entries;
>
> break;

Added.

>
> Also trying to fold code identical on the if and else branches would
> seem pretty desirable.

Sorry, I can't see what I'm missing.

>
>> @@ -319,9 +358,13 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
>>
>>               if ( !rc && grdm.map.nr_entries < grdm.used_entries )
>>                   rc = -ENOBUFS;
>> +
>>               grdm.map.nr_entries = grdm.used_entries;
>> -            if ( __copy_to_guest(compat, &grdm.map, 1) )
>> -                rc = -EFAULT;
>> +            if ( grdm.map.nr_entries )
>> +            {
>> +                if ( __copy_to_guest(compat, &grdm.map, 1) )
>> +                    rc = -EFAULT;
>> +            }
>
> Why do you need this change?

If we have no entries at all, why should we still copy it back?

>
>> --- a/xen/drivers/passthrough/vtd/dmar.c
>> +++ b/xen/drivers/passthrough/vtd/dmar.c
>> @@ -904,17 +904,33 @@ int platform_supports_x2apic(void)
>>
>>   int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
>>   {
>> -    struct acpi_rmrr_unit *rmrr;
>> +    struct acpi_rmrr_unit *rmrr, *rmrr_cur = NULL;
>>       int rc = 0;
>> +    unsigned int i;
>> +    u16 bdf;
>>
>> -    list_for_each_entry(rmrr, &acpi_rmrr_units, list)
>> +    for_each_rmrr_device ( rmrr, bdf, i )
>>       {
>> -        rc = func(PFN_DOWN(rmrr->base_address),
>> -                  PFN_UP(rmrr->end_address) - PFN_DOWN(rmrr->base_address),
>> -                  ctxt);
>> -        if ( rc )
>> -            break;
>> +        if ( rmrr != rmrr_cur )
>> +        {
>> +            rc = func(PFN_DOWN(rmrr->base_address),
>> +                      PFN_UP(rmrr->end_address) -
>> +                        PFN_DOWN(rmrr->base_address),
>> +                      PCI_SBDF(rmrr->segment, bdf),
>> +                      ctxt);
>> +
>> +            if ( unlikely(rc < 0) )
>> +                return rc;
>> +
>> +            /* Just go next. */
>> +            if ( !rc )
>> +                rmrr_cur = rmrr;
>> +
>> +            /* Now just return specific to user requirement. */
>> +            if ( rc > 0 )
>> +                return rc;
>
> Nice that you check for that, but I can't see this case occurring
> anymore. Did you lose some code? Also please don't write code

We have three scenarios here:

#1 rc < 0 means an error occurred;
#2 rc == 0 means the tools caller doesn't know how many buffers it should 
construct, so we need to count all entries and return that as 'nr_entries'.
#3 rc > 0 means in some cases we need to return a specific value, like 1, 
to indicate we're hitting some RMRR range. Currently we check the gfn 
against these ranges in memory populating, the EPT violation handler and 
mem_access.
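
To illustrate #3 a bit more: a callback that only wants a yes/no answer 
for a single gfn could look roughly like this (purely a sketch, the name 
is made up; the real checks live in the later populate/EPT/mem_access 
patches):

     static int check_rdm_hit(xen_pfn_t start, xen_ulong_t nr,
                              u32 id, void *ctxt)
     {
         xen_pfn_t gfn = *(xen_pfn_t *)ctxt;

         /* Returning > 0 stops the RMRR walk as soon as gfn falls
          * inside one of the reserved units. */
         return (gfn >= start && gfn < start + nr) ? 1 : 0;
     }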

> more complicated than necessary. The above two if()s could be
>
>
> +            if ( rc > 0 )
> +                return rc;
> +
> +            rmrr_cur = rmrr;
>
>> --- a/xen/include/public/memory.h
>> +++ b/xen/include/public/memory.h
>> @@ -586,6 +586,11 @@ typedef struct xen_reserved_device_memory
>> xen_reserved_device_memory_t;
>>   DEFINE_XEN_GUEST_HANDLE(xen_reserved_device_memory_t);
>>
>>   struct xen_reserved_device_memory_map {
>> +    /*
>> +     * Domain whose reservation is being changed.
>> +     * Unprivileged domains can specify only DOMID_SELF.
>> +     */
>> +    domid_t        domid;
>>       /* IN/OUT */
>>       unsigned int nr_entries;
>>       /* OUT */
>
> Your addition lacks an IN annotation.

Are you referring to 'nr_entries'? I didn't introduce anything that 
changes its original usage. Anyway, I'll try to improve the comment:

     /*
      * IN: on call, the number of entries which can be stored in buffer.
      * OUT: on return, the number of entries which have been stored in
      * buffer. If on call the number is less than the number of all
      * necessary entries, on return the number of entries which is needed.
      */


>
>> --- a/xen/include/xen/pci.h
>> +++ b/xen/include/xen/pci.h
>> @@ -31,6 +31,8 @@
>>   #define PCI_DEVFN2(bdf) ((bdf) & 0xff)
>>   #define PCI_BDF(b,d,f)  ((((b) & 0xff) << 8) | PCI_DEVFN(d,f))
>>   #define PCI_BDF2(b,df)  ((((b) & 0xff) << 8) | ((df) & 0xff))
>> +#define PCI_SBDF(s,bdf) (((s & 0xffff) << 16) | (bdf & 0xffff))
>> +#define PCI_SBDF2(s,b,df) (((s & 0xffff) << 16) | PCI_BDF2(b,df))
>
> Missing several parentheses.
>

Okay,

#define PCI_SBDF(s,bdf) ((((s) & 0xffff) << 16) | ((bdf) & 0xffff))
#define PCI_SBDF2(s,b,df) ((((s) & 0xffff) << 16) | PCI_BDF2(b,df))


Thanks
Tiejun

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 05/17] tools/libxc: introduce hypercall for xc_reserved_device_memory_map
  2014-12-02 19:50   ` Konrad Rzeszutek Wilk
@ 2014-12-08  7:25     ` Chen, Tiejun
  2014-12-08 15:52       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 106+ messages in thread
From: Chen, Tiejun @ 2014-12-08  7:25 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, jbeulich, yang.z.zhang

On 2014/12/3 3:50, Konrad Rzeszutek Wilk wrote:
> On Mon, Dec 01, 2014 at 05:24:23PM +0800, Tiejun Chen wrote:
>> We will introduce that hypercall xc_reserved_device_memory_map
>> approach to libxc.
>>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>> ---
>>   tools/libxc/include/xenctrl.h |  5 +++++
>>   tools/libxc/xc_domain.c       | 30 ++++++++++++++++++++++++++++++
>>   2 files changed, 35 insertions(+)
>>
>> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
>> index 84012fe..a3aeac3 100644
>> --- a/tools/libxc/include/xenctrl.h
>> +++ b/tools/libxc/include/xenctrl.h
>> @@ -1294,6 +1294,11 @@ int xc_domain_set_memory_map(xc_interface *xch,
>>   int xc_get_machine_memory_map(xc_interface *xch,
>>                                 struct e820entry entries[],
>>                                 uint32_t max_entries);
>> +
>> +int xc_reserved_device_memory_map(xc_interface *xch,
>> +                                  uint32_t dom,
>> +                                  struct xen_reserved_device_memory entries[],
>> +                                  uint32_t *max_entries);
>>   #endif
>>   int xc_domain_set_time_offset(xc_interface *xch,
>>                                 uint32_t domid,
>> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
>> index 7fd43e9..09fd988 100644
>> --- a/tools/libxc/xc_domain.c
>> +++ b/tools/libxc/xc_domain.c
>> @@ -679,6 +679,36 @@ int xc_domain_set_memory_map(xc_interface *xch,
>>
>>       return rc;
>>   }
>> +
>> +int xc_reserved_device_memory_map(xc_interface *xch,
>> +                                  uint32_t domid,
>> +                                  struct xen_reserved_device_memory entries[],
>> +                                  uint32_t *max_entries)
>> +{
>> +    int rc;
>> +    struct xen_reserved_device_memory_map xrdmmap = {
>> +        .domid = domid,
>> +        .nr_entries = *max_entries
>> +    };
>> +    DECLARE_HYPERCALL_BOUNCE(entries,
>> +                             sizeof(struct xen_reserved_device_memory) *
>> +                             *max_entries, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
>> +
>> +    if ( xc_hypercall_bounce_pre(xch, entries) )
>> +        return -1;
>> +
>> +    set_xen_guest_handle(xrdmmap.buffer, entries);
>> +
>> +    rc = do_memory_op(xch, XENMEM_reserved_device_memory_map,
>> +                      &xrdmmap, sizeof(xrdmmap));
>> +
>> +    xc_hypercall_bounce_post(xch, entries);
>> +
>> +    *max_entries = xrdmmap.nr_entries;
>> +
>
> I would bake the -EAGAIN support in here to loop here.
>
> See how the xc_domain_destroy does it.

Do you mean this change?

@@ -699,8 +699,10 @@ int xc_reserved_device_memory_map(xc_interface *xch,

      set_xen_guest_handle(xrdmmap.buffer, entries);

-    rc = do_memory_op(xch, XENMEM_reserved_device_memory_map,
-                      &xrdmmap, sizeof(xrdmmap));
+    do {
+        rc = do_memory_op(xch, XENMEM_reserved_device_memory_map,
+                          &xrdmmap, sizeof(xrdmmap));
+    } while ( rc && (errno == EAGAIN) );

      xc_hypercall_bounce_post(xch, entries);

Thanks
Tiejun

>> +    return rc ? rc : xrdmmap.nr_entries;
>> +}
>> +
>>   int xc_get_machine_memory_map(xc_interface *xch,
>>                                 struct e820entry entries[],
>>                                 uint32_t max_entries)
>> --
>> 1.9.1
>>
>

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 06/17] tools/libxc: check if modules space is overlapping with reserved device memory
  2014-12-02 19:55   ` Konrad Rzeszutek Wilk
@ 2014-12-08  7:49     ` Chen, Tiejun
  0 siblings, 0 replies; 106+ messages in thread
From: Chen, Tiejun @ 2014-12-08  7:49 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, jbeulich, yang.z.zhang

On 2014/12/3 3:55, Konrad Rzeszutek Wilk wrote:
> On Mon, Dec 01, 2014 at 05:24:24PM +0800, Tiejun Chen wrote:
>> In case of reserved device memory overlapping with ram, it also probably
>
> s/also//

Fixed.

>> overlap with modules space so we need to check these reserved device
> s/overlap/overlaps/

Fixed.

>
> What is 'modules space'?

Please see modules_init(); it looks like it currently includes the ACPI 
and SMBIOS modules.

>
>> memory as well.
>
> s/reserved device memory/E820_RSV/ ?

I don't see that we have such a definition.

>
>>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>> ---
>>   tools/libxc/xc_hvm_build_x86.c | 94 +++++++++++++++++++++++++++++++++++-------
>>   1 file changed, 79 insertions(+), 15 deletions(-)
>>
>> diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
>> index c81a25b..ddcf06d 100644
>> --- a/tools/libxc/xc_hvm_build_x86.c
>> +++ b/tools/libxc/xc_hvm_build_x86.c
>> @@ -54,9 +54,82 @@
>>
>>   #define VGA_HOLE_SIZE (0x20)
>>
>> +/*
>> + * Check whether there exists mmio hole in the specified memory range.
>> + * Returns 1 if exists, else returns 0.
>> + */
>> +static int check_mmio_hole(uint64_t start, uint64_t memsize,
>> +                           uint64_t mmio_start, uint64_t mmio_size)
>> +{
>> +    if ( start + memsize <= mmio_start || start >= mmio_start + mmio_size )
>> +        return 0;
>> +    else
>> +        return 1;
>> +}
>> +
>> +/* Getting all reserved device memory map info. */
>> +static struct xen_reserved_device_memory
>> +*xc_get_reserved_device_memory_map(xc_interface *xch, unsigned int nr_entries,
>> +                                   uint32_t dom)
>> +{
>> +    struct xen_reserved_device_memory *xrdm = NULL;
>> +    int rc = xc_reserved_device_memory_map(xch, dom, xrdm, &nr_entries);
>> +
>> +    if ( rc < 0 )
>> +    {
>> +        if ( errno == ENOBUFS )
>> +        {
>> +            if ( (xrdm = malloc(nr_entries *
>> +                                sizeof(xen_reserved_device_memory_t))) == NULL )
>> +            {
>> +                PERROR("Could not allocate memory.");
>> +                return 0;
>> +            }
>> +            rc = xc_reserved_device_memory_map(xch, dom, xrdm, &nr_entries);
>> +            if ( rc )
>> +            {
>> +                PERROR("Could not get reserved device memory maps.");
>> +                free(xrdm);
>> +                return 0;
>
> Uhhh, is that the right error to return?
>
> Don't you mean ERR_PTR logic? Or 'return NULL' ?

Oops, that should return NULL.

>
>
>> +            }
>> +        }
>> +        else
>> +            PERROR("Could not get reserved device memory maps.");
>> +    }
>> +
>> +    return xrdm;
>> +}
>> +
>> +static int xc_check_modules_space(xc_interface *xch, uint64_t *mstart_out,
>> +                                  uint64_t *mend_out, uint32_t dom)
>> +{
>> +    unsigned int i = 0, nr_entries = 0;
>> +    uint64_t rdm_start = 0, rdm_end = 0;
>> +    struct xen_reserved_device_memory *rdm_map =
>> +                        xc_get_reserved_device_memory_map(xch, nr_entries, dom);
>> +
>
> You need to check whether 'rdm_map' is NULL.

You're right.

Actually, in my original design nr_entries is always 0 if rdm_map is 
NULL. But this should indeed be checked as you mentioned, so:

+    if ( !rdm_map )
+        return 0;
+

>
>> +    for ( i = 0; i < nr_entries; i++ )
>> +    {
>> +        rdm_start = (uint64_t)rdm_map[i].start_pfn << XC_PAGE_SHIFT;
>> +        rdm_end = rdm_start + ((uint64_t)rdm_map[i].nr_pages << XC_PAGE_SHIFT);
>> +
>> +        /* Just use check_mmio_hole() to check modules ranges. */
>> +        if ( check_mmio_hole(rdm_start,
>> +                             rdm_end - rdm_start,
>> +                             *mstart_out, *mend_out) )
>> +            return -1;
>> +    }
>> +
>> +    free(rdm_map);
>> +
>> +    return 0;
>> +}
>> +
>>   static int modules_init(struct xc_hvm_build_args *args,
>>                           uint64_t vend, struct elf_binary *elf,
>> -                        uint64_t *mstart_out, uint64_t *mend_out)
>> +                        uint64_t *mstart_out, uint64_t *mend_out,
>> +                        xc_interface *xch,
>> +                        uint32_t dom)
>>   {
>>   #define MODULE_ALIGN 1UL << 7
>>   #define MB_ALIGN     1UL << 20
>> @@ -80,6 +153,10 @@ static int modules_init(struct xc_hvm_build_args *args,
>>       if ( *mend_out > vend )
>>           return -1;
>>
>> +    /* Is it overlapping with reserved device memory? */
>> +    if ( xc_check_modules_space(xch, mstart_out, mend_out, dom) )
>> +        return -1;
>> +
>>       if ( args->acpi_module.length != 0 )
>>           args->acpi_module.guest_addr_out = *mstart_out;
>>       if ( args->smbios_module.length != 0 )
>> @@ -226,19 +303,6 @@ static int loadmodules(xc_interface *xch,
>>       return rc;
>>   }
>>
>> -/*
>> - * Check whether there exists mmio hole in the specified memory range.
>> - * Returns 1 if exists, else returns 0.
>> - */
>> -static int check_mmio_hole(uint64_t start, uint64_t memsize,
>> -                           uint64_t mmio_start, uint64_t mmio_size)
>> -{
>> -    if ( start + memsize <= mmio_start || start >= mmio_start + mmio_size )
>> -        return 0;
>> -    else
>> -        return 1;
>> -}
>> -
>
> This movement of 'check_mmio_hole' needs to be a seperate patch.

Okay.

Thanks
Tiejun

>
>>   static int setup_guest(xc_interface *xch,
>>                          uint32_t dom, struct xc_hvm_build_args *args,
>>                          char *image, unsigned long image_size)
>> @@ -282,7 +346,7 @@ static int setup_guest(xc_interface *xch,
>>           goto error_out;
>>       }
>>
>> -    if ( modules_init(args, v_end, &elf, &m_start, &m_end) != 0 )
>> +    if ( modules_init(args, v_end, &elf, &m_start, &m_end, xch, dom) != 0 )
>>       {
>>           ERROR("Insufficient space to load modules.");
>>           goto error_out;
>> --
>> 1.9.1
>>
>

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 07/17] hvmloader/util: get reserved device memory maps
  2014-12-02  8:59   ` Tian, Kevin
@ 2014-12-08  7:55     ` Chen, Tiejun
  0 siblings, 0 replies; 106+ messages in thread
From: Chen, Tiejun @ 2014-12-08  7:55 UTC (permalink / raw)
  To: Tian, Kevin, jbeulich, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2, tim, Zhang, Yang Z
  Cc: xen-devel

On 2014/12/2 16:59, Tian, Kevin wrote:
>> From: Chen, Tiejun
>> Sent: Monday, December 01, 2014 5:24 PM
>>
>> We need to use reserved device memory maps with multiple times, so
>> provide just one common function should be friend.
>>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>> ---
>>   tools/firmware/hvmloader/util.c | 59
>> +++++++++++++++++++++++++++++++++++++++++
>>   tools/firmware/hvmloader/util.h |  2 ++
>>   2 files changed, 61 insertions(+)
>>
>> diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
>> index 80d822f..dd81fb6 100644
>> --- a/tools/firmware/hvmloader/util.c
>> +++ b/tools/firmware/hvmloader/util.c
>> @@ -22,11 +22,14 @@
>>   #include "config.h"
>>   #include "hypercall.h"
>>   #include "ctype.h"
>> +#include "errno.h"
>>   #include <stdint.h>
>>   #include <xen/xen.h>
>>   #include <xen/memory.h>
>>   #include <xen/sched.h>
>>
>> +struct xen_reserved_device_memory *rdm_map;
>> +
>>   void wrmsr(uint32_t idx, uint64_t v)
>>   {
>>       asm volatile (
>> @@ -828,6 +831,62 @@ int hpet_exists(unsigned long hpet_base)
>>       return ((hpet_id >> 16) == 0x8086);
>>   }
>>
>> +static int
>> +get_reserved_device_memory_map(struct xen_reserved_device_memory
>> entries[],
>> +                               uint32_t *max_entries)
>> +{
>> +    int rc;
>> +    struct xen_reserved_device_memory_map xrdmmap = {
>> +        .domid = DOMID_SELF,
>> +        .nr_entries = *max_entries
>> +    };
>> +
>> +    set_xen_guest_handle(xrdmmap.buffer, entries);
>> +
>> +    rc = hypercall_memory_op(XENMEM_reserved_device_memory_map,
>> &xrdmmap);
>> +    *max_entries = xrdmmap.nr_entries;
>> +
>> +    return rc;
>> +}
>> +
>> +/*
>> + * Getting all reserved device memory map info in case of hvmloader.
>> + * We just return zero for any failed cases, and this means we
>> + * can't further handle any reserved device memory.
>> + */
>> +unsigned int hvm_get_reserved_device_memory_map(void)
>> +{
>> +    static unsigned int nr_entries = 0;
>> +    int rc = get_reserved_device_memory_map(rdm_map, &nr_entries);
>> +
>
> if this function is aimed to be invoked once, just check wheher rdm_map
> is valid instead of always issuing a new call.

As I remember, Jan thought this doesn't cost much even when called 
multiple times.

>
>> +    if ( rc == -ENOBUFS )
>> +    {
>> +        rdm_map = mem_alloc(nr_entries*sizeof(struct
>> xen_reserved_device_memory),
>> +                            0);
>> +        if ( rdm_map )
>> +        {
>> +            rc = get_reserved_device_memory_map(rdm_map,
>> &nr_entries);
>> +            if ( rc )
>> +            {
>> +                printf("Could not get reserved dev memory info on
>> domain");
>> +                return 0;
>
> why return '0' at failure?

In our real case we don't want to handle anything further; one message 
is already enough here.

Thanks
Tiejun

>
>> +            }
>> +        }
>> +        else
>> +        {
>> +            printf("No space to get reserved dev memory maps!\n");
>> +            return 0;
>> +        }
>> +    }
>> +    else if ( rc )
>> +    {
>> +        printf("Could not get reserved dev memory info on domain");
>> +        return 0;
>> +    }
>> +
>> +    return nr_entries;
>> +}
>> +
>>   /*
>>    * Local variables:
>>    * mode: C
>> diff --git a/tools/firmware/hvmloader/util.h
>> b/tools/firmware/hvmloader/util.h
>> index a70e4aa..e4f1851 100644
>> --- a/tools/firmware/hvmloader/util.h
>> +++ b/tools/firmware/hvmloader/util.h
>> @@ -241,6 +241,8 @@ int build_e820_table(struct e820entry *e820,
>>                        unsigned int bios_image_base);
>>   void dump_e820_table(struct e820entry *e820, unsigned int nr);
>>
>> +unsigned int hvm_get_reserved_device_memory_map(void);
>> +
>>   #ifndef NDEBUG
>>   void perform_tests(void);
>>   #else
>> --
>> 1.9.1
>
>

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 07/17] hvmloader/util: get reserved device memory maps
  2014-12-02 20:01   ` Konrad Rzeszutek Wilk
@ 2014-12-08  8:09     ` Chen, Tiejun
  2014-12-08  8:45       ` Chen, Tiejun
  0 siblings, 1 reply; 106+ messages in thread
From: Chen, Tiejun @ 2014-12-08  8:09 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, jbeulich, yang.z.zhang

On 2014/12/3 4:01, Konrad Rzeszutek Wilk wrote:
> On Mon, Dec 01, 2014 at 05:24:25PM +0800, Tiejun Chen wrote:
>> We need to use reserved device memory maps with multiple times, so
>> provide just one common function should be friend.
>
> We need to call reserved device memory maps hypercall
> (XENMEM_reserved_device_memory_map) many times, hence provide one common function.

Rephrased and thanks.

>
>>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>> ---
>>   tools/firmware/hvmloader/util.c | 59 +++++++++++++++++++++++++++++++++++++++++
>>   tools/firmware/hvmloader/util.h |  2 ++
>>   2 files changed, 61 insertions(+)
>>
>> diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
>> index 80d822f..dd81fb6 100644
>> --- a/tools/firmware/hvmloader/util.c
>> +++ b/tools/firmware/hvmloader/util.c
>> @@ -22,11 +22,14 @@
>>   #include "config.h"
>>   #include "hypercall.h"
>>   #include "ctype.h"
>> +#include "errno.h"
>>   #include <stdint.h>
>>   #include <xen/xen.h>
>>   #include <xen/memory.h>
>>   #include <xen/sched.h>
>>
>> +struct xen_reserved_device_memory *rdm_map;
>> +
>>   void wrmsr(uint32_t idx, uint64_t v)
>>   {
>>       asm volatile (
>> @@ -828,6 +831,62 @@ int hpet_exists(unsigned long hpet_base)
>>       return ((hpet_id >> 16) == 0x8086);
>>   }
>>
>> +static int
>> +get_reserved_device_memory_map(struct xen_reserved_device_memory entries[],
>> +                               uint32_t *max_entries)
>> +{
>> +    int rc;
>> +    struct xen_reserved_device_memory_map xrdmmap = {
>> +        .domid = DOMID_SELF,
>> +        .nr_entries = *max_entries
>> +    };
>> +
>> +    set_xen_guest_handle(xrdmmap.buffer, entries);
>> +
>> +    rc = hypercall_memory_op(XENMEM_reserved_device_memory_map, &xrdmmap);
>> +    *max_entries = xrdmmap.nr_entries;
>
> Don't you want to check rc first before altering 'max_entries' ?

-    *max_entries = xrdmmap.nr_entries;
+    if ( rc == -ENOBUFS )
+        *max_entries = xrdmmap.nr_entries;

>
>> +
>> +    return rc;
>> +}
>> +
>> +/*
>> + * Getting all reserved device memory map info in case of hvmloader.
>> + * We just return zero for any failed cases, and this means we
>> + * can't further handle any reserved device memory.
>
> That does not sound like the right error value. Why not a proper
> return value? At worst you can put 'nr_entries' as an parameter
> and return the error value.

No caller of hvm_get_reserved_device_memory_map() cares about the real 
return value; the callers can work just with an unsigned int return 
value: '0' means we have nothing to do, '>0' means there are entries to 
handle.
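
So the usage is just something like this (illustrative only; 
check_overlap() stands in for whatever each caller really does with the 
returned ranges):

     unsigned int i, nr = hvm_get_reserved_device_memory_map();

     for ( i = 0; i < nr; i++ )      /* nr == 0 simply skips the loop */
         check_overlap(rdm_map[i].start_pfn, rdm_map[i].nr_pages);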

>
>> + */
>> +unsigned int hvm_get_reserved_device_memory_map(void)
>> +{
>> +    static unsigned int nr_entries = 0;
>> +    int rc = get_reserved_device_memory_map(rdm_map, &nr_entries);
>> +
>> +    if ( rc == -ENOBUFS )
>> +    {
>> +        rdm_map = mem_alloc(nr_entries*sizeof(struct xen_reserved_device_memory),
>
> That '*' being squashed looks wrong. Just make it bigger and don't worry about
> the 80 line.

-        rdm_map = mem_alloc(nr_entries*sizeof(struct xen_reserved_device_memory),
+        rdm_map = mem_alloc(nr_entries * sizeof(struct xen_reserved_device_memory),


Thanks
Tiejun

>
>> +                            0);
>> +        if ( rdm_map )
>> +        {
>> +            rc = get_reserved_device_memory_map(rdm_map, &nr_entries);
>> +            if ( rc )
>> +            {
>> +                printf("Could not get reserved dev memory info on domain");
>> +                return 0;
>> +            }
>> +        }
>> +        else
>> +        {
>> +            printf("No space to get reserved dev memory maps!\n");
>> +            return 0;
>> +        }
>> +    }
>> +    else if ( rc )
>> +    {
>> +        printf("Could not get reserved dev memory info on domain");
>> +        return 0;
>> +    }
>> +
>> +    return nr_entries;
>> +}
>> +
>>   /*
>>    * Local variables:
>>    * mode: C
>> diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
>> index a70e4aa..e4f1851 100644
>> --- a/tools/firmware/hvmloader/util.h
>> +++ b/tools/firmware/hvmloader/util.h
>> @@ -241,6 +241,8 @@ int build_e820_table(struct e820entry *e820,
>>                        unsigned int bios_image_base);
>>   void dump_e820_table(struct e820entry *e820, unsigned int nr);
>>
>> +unsigned int hvm_get_reserved_device_memory_map(void);
>> +
>>   #ifndef NDEBUG
>>   void perform_tests(void);
>>   #else
>> --
>> 1.9.1
>>
>

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 02/17] introduce XEN_DOMCTL_set_rdm
  2014-12-08  6:06     ` Chen, Tiejun
@ 2014-12-08  8:43       ` Jan Beulich
  2014-12-09  2:38         ` Chen, Tiejun
  0 siblings, 1 reply; 106+ messages in thread
From: Jan Beulich @ 2014-12-08  8:43 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, yang.z.zhang

>>> On 08.12.14 at 07:06, <tiejun.chen@intel.com> wrote:
> On 2014/12/4 23:33, Jan Beulich wrote:
>>>>> On 01.12.14 at 10:24, <tiejun.chen@intel.com> wrote:
> Looks like this would be fine:
> 
> d->arch.hvm_domain.pci_force = xdsr->flags & PCI_DEV_RDM_CHECK;

Which is correct only because PCI_DEV_RDM_CHECK happens to be
1. Such hidden dependencies shouldn't be introduced though, in
particular to avoid others then cloning the code for a flag that's not
1. The canonical form (found in many places throughout the tree) is

    d->arch.hvm_domain.pci_force = !!(xdsr->flags & PCI_DEV_RDM_CHECK);

>>> +        d->arch.hvm_domain.pcidevs = NULL;
>>> +
>>> +        if ( xdsr->num_pcidevs )
>>> +        {
>>> +            pcidevs = xmalloc_array(xen_guest_pcidev_info_t,
>>> +                                    xdsr->num_pcidevs);
>>
>> New domctl-s must not represent security risks: xdsr->num_pcidevs
>> can be (almost) arbitrarily large - do you really want to allow such
>> huge allocations? A reasonable upper bound could for example be
> 
> Sorry, as you know num_pcidevs comes from the tools, and it actually 
> shares the result of the existing assign_device hypercall while parsing 
> 'pci=[]', so I don't understand why it can be (almost) arbitrarily 
> large.

You imply well behaved tools, which you shouldn't when viewing
things from a security perspective.

>> the total number of PCI devices the hypervisor knows about.
> 
> I took a quick look at this, but it seems we don't have such an exact 
> value available to read directly.

You need some upper bound. Whether you introduce a properly
maintained count or a suitable estimate thereof doesn't matter.
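
E.g. a count maintained alongside alloc_pdev()/free_pdev() would allow 
something as simple as (the variable name is purely exemplary):

    if ( xdsr->num_pcidevs > nr_phys_pcidevs )
        return -E2BIG;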

>>> --- a/xen/include/asm-x86/hvm/domain.h
>>> +++ b/xen/include/asm-x86/hvm/domain.h
>>> @@ -90,6 +90,10 @@ struct hvm_domain {
>>>       /* Cached CF8 for guest PCI config cycles */
>>>       uint32_t                pci_cf8;
>>>
>>> +    bool_t                  pci_force;
>>> +    uint32_t                num_pcidevs;
>>> +    struct xen_guest_pcidev_info      *pcidevs;
>>
>> Without a comment all these field names are pretty questionable.
> 
> Yeah, I'll try to add some comments:
> 
>      /* A global flag: we need to check/reserve all Reserved Device Memory. */
>      bool_t                  pci_force;
>      /*
>       * If pci_force is 0, this represents how many pci devices we need
>       * to check/reserve Reserved Device Memory for.
>       * If pci_force is 1, this is always 0.
>       */
>      uint32_t                num_pcidevs;
>      /* This represents those pci device instances when num_pcidevs != 0. */
>      struct xen_guest_pcidev_info      *pcidevs;

I really didn't necessarily mean individual comments - one for the whole
group would suffice.

Also I don't think pci_force is really the right name - all_pcidevs or
some such would seem more suitable following the entire series.
And finally, I'm generally advocating for avoiding redundant data
items - I'm sure this "all" notion can be encoded reasonably with
just the other two fields (and a suitable checking macro).
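
E.g. a sentinel value could replace the boolean altogether (purely 
illustrative - the names are made up):

#define RDM_CHECK_ALL_PCIDEVS   (~0U)
#define rdm_check_all(d) \
    ((d)->arch.hvm_domain.num_pcidevs == RDM_CHECK_ALL_PCIDEVS)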

Jan

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 07/17] hvmloader/util: get reserved device memory maps
  2014-12-08  8:09     ` Chen, Tiejun
@ 2014-12-08  8:45       ` Chen, Tiejun
  0 siblings, 0 replies; 106+ messages in thread
From: Chen, Tiejun @ 2014-12-08  8:45 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, jbeulich, yang.z.zhang

On 2014/12/8 16:09, Chen, Tiejun wrote:
> On 2014/12/3 4:01, Konrad Rzeszutek Wilk wrote:
>> On Mon, Dec 01, 2014 at 05:24:25PM +0800, Tiejun Chen wrote:
>>> We need to use reserved device memory maps with multiple times, so
>>> provide just one common function should be friend.
>>
>> We need to call reserved device memory maps hypercall
>> (XENMEM_reserved_device_memory_map) many times, hence provide one
>> common function.
>
> Rephrased and thanks.
>
>>
>>>
>>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>>> ---
>>>   tools/firmware/hvmloader/util.c | 59
>>> +++++++++++++++++++++++++++++++++++++++++
>>>   tools/firmware/hvmloader/util.h |  2 ++
>>>   2 files changed, 61 insertions(+)
>>>
>>> diff --git a/tools/firmware/hvmloader/util.c
>>> b/tools/firmware/hvmloader/util.c
>>> index 80d822f..dd81fb6 100644
>>> --- a/tools/firmware/hvmloader/util.c
>>> +++ b/tools/firmware/hvmloader/util.c
>>> @@ -22,11 +22,14 @@
>>>   #include "config.h"
>>>   #include "hypercall.h"
>>>   #include "ctype.h"
>>> +#include "errno.h"
>>>   #include <stdint.h>
>>>   #include <xen/xen.h>
>>>   #include <xen/memory.h>
>>>   #include <xen/sched.h>
>>>
>>> +struct xen_reserved_device_memory *rdm_map;
>>> +
>>>   void wrmsr(uint32_t idx, uint64_t v)
>>>   {
>>>       asm volatile (
>>> @@ -828,6 +831,62 @@ int hpet_exists(unsigned long hpet_base)
>>>       return ((hpet_id >> 16) == 0x8086);
>>>   }
>>>
>>> +static int
>>> +get_reserved_device_memory_map(struct xen_reserved_device_memory
>>> entries[],
>>> +                               uint32_t *max_entries)
>>> +{
>>> +    int rc;
>>> +    struct xen_reserved_device_memory_map xrdmmap = {
>>> +        .domid = DOMID_SELF,
>>> +        .nr_entries = *max_entries
>>> +    };
>>> +
>>> +    set_xen_guest_handle(xrdmmap.buffer, entries);
>>> +
>>> +    rc = hypercall_memory_op(XENMEM_reserved_device_memory_map,
>>> &xrdmmap);
>>> +    *max_entries = xrdmmap.nr_entries;
>>
>> Don't you want to check rc first before altering 'max_entries' ?
>
> -    *max_entries = xrdmmap.nr_entries;
> +    if ( rc == -ENOBUFS )
> +        *max_entries = xrdmmap.nr_entries;
>

Something reminds me: in all cases we should set max_entries, since now 
we don't always count all RMRR ranges any more. Instead, we may count only 
those RMRR ranges associated with an assigned device, and obviously a device 
may have no RMRR at all. So we should always update max_entries, roughly as 
sketched below.
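
A minimal sketch of the helper with that in mind (based on the hunk quoted
above, so only an illustration of the intent):

static int
get_reserved_device_memory_map(struct xen_reserved_device_memory entries[],
                               uint32_t *max_entries)
{
    int rc;
    struct xen_reserved_device_memory_map xrdmmap = {
        .domid = DOMID_SELF,
        .nr_entries = *max_entries
    };

    set_xen_guest_handle(xrdmmap.buffer, entries);

    rc = hypercall_memory_op(XENMEM_reserved_device_memory_map, &xrdmmap);
    /* Report back how many entries the hypervisor has in all cases,
     * whether the buffer was large enough (rc == 0) or not (-ENOBUFS). */
    *max_entries = xrdmmap.nr_entries;

    return rc;
}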

Thanks
Tiejun


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 04/17] update the existing hypercall to support XEN_DOMCTL_set_rdm
  2014-12-08  7:11     ` Chen, Tiejun
@ 2014-12-08  8:51       ` Jan Beulich
  2014-12-09  7:47         ` Chen, Tiejun
  0 siblings, 1 reply; 106+ messages in thread
From: Jan Beulich @ 2014-12-08  8:51 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, yang.z.zhang

>>> On 08.12.14 at 08:11, <tiejun.chen@intel.com> wrote:
> On 2014/12/4 23:50, Jan Beulich wrote:
>>>>> On 01.12.14 at 10:24, <tiejun.chen@intel.com> wrote:
>>>
>>> -        if ( __copy_to_compat_offset(grdm->map.buffer, grdm->used_entries,
>>> -                                     &rdm, 1) )
>>> -            return -EFAULT;
>>> +    if ( d )
>>> +    {
>>> +        if ( d->arch.hvm_domain.pci_force )
>>
>> You didn't verify that the domain is a HVM/PVH one.
> 
> Is this, ASSERT(is_hvm_domain(grdm.domain)), correct?

Certainly not, or do you want to crash the hypervisor because of bad
tools input?

>>> +        {
>>> +            if ( grdm->used_entries < grdm->map.nr_entries )
>>> +            {
>>> +                if ( __copy_to_compat_offset(grdm->map.buffer,
>>> +                                             grdm->used_entries,
>>> +                                             &rdm, 1) )
>>> +                {
>>> +                    rcu_unlock_domain(d);
>>> +                    return -EFAULT;
>>> +                }
>>> +            }
>>> +            ++grdm->used_entries;
>>> +        }
>>> +        else
>>> +        {
>>> +            for ( i = 0; i < d->arch.hvm_domain.num_pcidevs; i++ )
>>> +            {
>>> +                sbdf = PCI_SBDF2(d->arch.hvm_domain.pcidevs[i].seg,
>>> +                                 d->arch.hvm_domain.pcidevs[i].bus,
>>> +                                 d->arch.hvm_domain.pcidevs[i].devfn);
>>> +                if ( sbdf == id )
>>> +                {
>>> +                    if ( grdm->used_entries < grdm->map.nr_entries )
>>> +                    {
>>> +                        if ( __copy_to_compat_offset(grdm->map.buffer,
>>> +                                                     grdm->used_entries,
>>> +                                                     &rdm, 1) )
>>> +                        {
>>> +                            rcu_unlock_domain(d);
>>> +                            return -EFAULT;
>>> +                        }
>>> +                    }
>>> +                    ++grdm->used_entries;
>>
>> break;
> 
> Added.
> 
>>
>> Also trying to fold code identical on the if and else branches would
>> seem pretty desirable.
> 
> Sorry, I can't see what I'm missing.

The whole "if-copy-unlock-and-return-EFAULT-otherwise-increment"
is identical and can be factored out pretty easily afaict.

>>> @@ -319,9 +358,13 @@ int compat_memory_op(unsigned int cmd, 
> XEN_GUEST_HANDLE_PARAM(void) compat)
>>>
>>>               if ( !rc && grdm.map.nr_entries < grdm.used_entries )
>>>                   rc = -ENOBUFS;
>>> +
>>>               grdm.map.nr_entries = grdm.used_entries;
>>> -            if ( __copy_to_guest(compat, &grdm.map, 1) )
>>> -                rc = -EFAULT;
>>> +            if ( grdm.map.nr_entries )
>>> +            {
>>> +                if ( __copy_to_guest(compat, &grdm.map, 1) )
>>> +                    rc = -EFAULT;
>>> +            }
>>
>> Why do you need this change?
> 
> If we don't have any entries, why do we still copy that?

That's not only a pointless optimization (the counter question being
"Why add an extra conditional when the copying does no harm?"), but
also not subject of this patch. Additionally iirc the field is an IN/OUT,
i.e. when no entries were found you want to tell the caller so.

>>> --- a/xen/drivers/passthrough/vtd/dmar.c
>>> +++ b/xen/drivers/passthrough/vtd/dmar.c
>>> @@ -904,17 +904,33 @@ int platform_supports_x2apic(void)
>>>
>>>   int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
>>>   {
>>> -    struct acpi_rmrr_unit *rmrr;
>>> +    struct acpi_rmrr_unit *rmrr, *rmrr_cur = NULL;
>>>       int rc = 0;
>>> +    unsigned int i;
>>> +    u16 bdf;
>>>
>>> -    list_for_each_entry(rmrr, &acpi_rmrr_units, list)
>>> +    for_each_rmrr_device ( rmrr, bdf, i )
>>>       {
>>> -        rc = func(PFN_DOWN(rmrr->base_address),
>>> -                  PFN_UP(rmrr->end_address) - PFN_DOWN(rmrr->base_address),
>>> -                  ctxt);
>>> -        if ( rc )
>>> -            break;
>>> +        if ( rmrr != rmrr_cur )
>>> +        {
>>> +            rc = func(PFN_DOWN(rmrr->base_address),
>>> +                      PFN_UP(rmrr->end_address) -
>>> +                        PFN_DOWN(rmrr->base_address),
>>> +                      PCI_SBDF(rmrr->segment, bdf),
>>> +                      ctxt);
>>> +
>>> +            if ( unlikely(rc < 0) )
>>> +                return rc;
>>> +
>>> +            /* Just go next. */
>>> +            if ( !rc )
>>> +                rmrr_cur = rmrr;
>>> +
>>> +            /* Now just return specific to user requirement. */
>>> +            if ( rc > 0 )
>>> +                return rc;
>>
>> Nice that you check for that, but I can't see this case occurring
>> anymore. Did you lose some code? Also please don't write code
> 
> We have three scenarios here:
> 
> #1 rc < 0 means this is an error;
> #2 rc = 0 means the tools caller doesn't know how many buffers it should 
> construct, so we need to count all entries and return that as 'nr_entries'.
> #3 rc > 0 means in some cases we need to return a specific value, 
> like 1 to indicate we're hitting some RMRR range. Currently, we use the gfn 
> to check this in the memory populating path, the EPT violation handler and 
> mem_access.

Yes, I saw that you make use of this in later patches. It just seemed
suspicious that you don't in this one.
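
For reference, a consumer of the rc > 0 convention could be as simple as
the following sketch (illustrative only - check_rdm_overlap and its use
are made up here; the real checks live in the later patches of the series):

/* Callback for intel_iommu_get_reserved_device_memory(): return 1 when
 * the gfn being populated falls inside a reserved range, 0 to keep
 * iterating, <0 on error. */
static int check_rdm_overlap(xen_pfn_t start, xen_ulong_t nr,
                             u32 id, void *ctxt)
{
    unsigned long gfn = *(unsigned long *)ctxt;

    return (gfn >= start && gfn < start + nr) ? 1 : 0;
}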

>>> --- a/xen/include/public/memory.h
>>> +++ b/xen/include/public/memory.h
>>> @@ -586,6 +586,11 @@ typedef struct xen_reserved_device_memory
>>> xen_reserved_device_memory_t;
>>>   DEFINE_XEN_GUEST_HANDLE(xen_reserved_device_memory_t);
>>>
>>>   struct xen_reserved_device_memory_map {
>>> +    /*
>>> +     * Domain whose reservation is being changed.
>>> +     * Unprivileged domains can specify only DOMID_SELF.
>>> +     */
>>> +    domid_t        domid;
>>>       /* IN/OUT */
>>>       unsigned int nr_entries;
>>>       /* OUT */
>>
>> Your addition lacks an IN annotation.
> 
> Are you referring to 'nr_entries'? But I didn't introduce anything to 
> change the original usage. Anyway, I'll try to improve this,
> 
>      /*
>       * IN: on call the number of entries which can be stored in buffer.
>       * OUT: on return the number of entries which have been stored in
>       * buffer. If on call the number is less than the number of all necessary
>       * entries, on return it is the number of entries which is needed.
>       */
> 

No, I said "your addition lacks ...". And your addition is the "domid"
field.

Jan

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 07/17] hvmloader/util: get reserved device memory maps
  2014-12-04 15:52   ` Jan Beulich
@ 2014-12-08  8:52     ` Chen, Tiejun
  2014-12-08  9:18       ` Jan Beulich
  0 siblings, 1 reply; 106+ messages in thread
From: Chen, Tiejun @ 2014-12-08  8:52 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, yang.z.zhang

On 2014/12/4 23:52, Jan Beulich wrote:
>>>> On 01.12.14 at 10:24, <tiejun.chen@intel.com> wrote:
>> We need to use reserved device memory maps with multiple times, so
>> provide just one common function should be friend.
>
> I'm not going to repeat earlier comments; the way this is done right
> now it's neither a proper runtime function nor a proper init time one.
>

Actually I tried to do this at runtime, as I commented in the patch head. 
But maybe you'd prefer that I return rdm_map directly, with nr_entries as 
an in/out parameter, right?

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 08/17] hvmloader/mmio: reconcile guest mmio with reserved device memory
  2014-12-02  9:11   ` Tian, Kevin
@ 2014-12-08  9:04     ` Chen, Tiejun
  0 siblings, 0 replies; 106+ messages in thread
From: Chen, Tiejun @ 2014-12-08  9:04 UTC (permalink / raw)
  To: Tian, Kevin, jbeulich, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2, tim, Zhang, Yang Z
  Cc: xen-devel

On 2014/12/2 17:11, Tian, Kevin wrote:
>> From: Chen, Tiejun
>> Sent: Monday, December 01, 2014 5:24 PM
>>
>> We need to make sure all mmio allocation don't overlap
>> any rdm, reserved device memory. Here we just skip
>> all reserved device memory range in mmio space.
>>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>> ---
>>   tools/firmware/hvmloader/pci.c  | 54
>> ++++++++++++++++++++++++++++++++++++++++-
>>   tools/firmware/hvmloader/util.c |  9 +++++++
>>   tools/firmware/hvmloader/util.h |  2 ++
>>   3 files changed, 64 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
>> index 4e8d803..fc22ab3 100644
>> --- a/tools/firmware/hvmloader/pci.c
>> +++ b/tools/firmware/hvmloader/pci.c
>> @@ -38,6 +38,30 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
>>   enum virtual_vga virtual_vga = VGA_none;
>>   unsigned long igd_opregion_pgbase = 0;
>>
>> +static unsigned int need_skip_rmrr;
>> +extern struct xen_reserved_device_memory *rdm_map;
>> +
>> +static unsigned int
>> +check_reserved_device_memory_map(uint64_t mmio_base, uint64_t
>> mmio_max)
>> +{
>> +    uint32_t i;
>> +    uint64_t rdm_start, rdm_end;
>> +    unsigned int nr_rdm_entries =
>> hvm_get_reserved_device_memory_map();
>> +
>> +    for ( i = 0; i < nr_rdm_entries; i++ )
>> +    {
>> +        rdm_start = (uint64_t)rdm_map[i].start_pfn << PAGE_SHIFT;
>> +        rdm_end = rdm_start + ((uint64_t)rdm_map[i].nr_pages <<
>> PAGE_SHIFT);
>> +        if ( check_rdm_hole_conflict(mmio_base, mmio_max -
>> mmio_base,
>> +                                     rdm_start, rdm_end -
>> rdm_start) )
>> +        {
>> +            need_skip_rmrr++;
>> +        }
>> +    }
>> +
>> +    return nr_rdm_entries;
>> +}
>> +
>
> I don't understand the use of need_skip_rmrr here. What does the counter actually
> mean here? Also the function is not well organized. Usually the value returned is
> the major purpose of the function, but it looks like the function is actually for need_skip_rmrr.
> If that's its actual purpose, better to rename the function and move nr_rdm_entries
> directly into the outer function.

See online below.

>
>>   void pci_setup(void)
>>   {
>>       uint8_t is_64bar, using_64bar, bar64_relocate = 0;
>> @@ -59,8 +83,10 @@ void pci_setup(void)
>>           uint32_t bar_reg;
>>           uint64_t bar_sz;
>>       } *bars = (struct bars *)scratch_start;
>> -    unsigned int i, nr_bars = 0;
>> +    unsigned int i, j, nr_bars = 0;
>>       uint64_t mmio_hole_size = 0;
>> +    unsigned int nr_rdm_entries;
>> +    uint64_t rdm_start, rdm_end;
>>
>>       const char *s;
>>       /*
>> @@ -338,6 +364,14 @@ void pci_setup(void)
>>       io_resource.base = 0xc000;
>>       io_resource.max = 0x10000;
>>
>> +    /* Check low mmio range. */
>> +    nr_rdm_entries =
>> check_reserved_device_memory_map(mem_resource.base,
>> +
>> mem_resource.max);
>> +    /* Check high mmio range. */
>> +    if ( nr_rdm_entries )
>> +        nr_rdm_entries =
>> check_reserved_device_memory_map(high_mem_resource.base,
>> +
>> high_mem_resource.max);
>> +
>>       /* Assign iomem and ioport resources in descending order of size. */
>>       for ( i = 0; i < nr_bars; i++ )
>>       {
>> @@ -393,8 +427,26 @@ void pci_setup(void)
>>           }
>>
>>           base = (resource->base  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
>> + reallocate_mmio:
>>           bar_data |= (uint32_t)base;
>>           bar_data_upper = (uint32_t)(base >> 32);
>> +
>> +        if ( need_skip_rmrr )
>> +        {
>> +            for ( j = 0; j < nr_rdm_entries; j++ )
>> +            {
>> +                rdm_start = (uint64_t)rdm_map[j].start_pfn <<
>> PAGE_SHIFT;
>> +                rdm_end = rdm_start + ((uint64_t)rdm_map[j].nr_pages
>> << PAGE_SHIFT);
>> +                if ( check_rdm_hole_conflict(base, bar_sz,
>> +                                             rdm_start, rdm_end -
>> rdm_start) )
>> +                {
>> +                    base = (rdm_end  + bar_sz - 1) & ~(uint64_t)(bar_sz
>> - 1);
>> +                    need_skip_rmrr--;
>> +                    goto reallocate_mmio;
>> +                }
>> +            }
>> +        }
>> +
>
> here is the point which I don't understand. what's required here is just to
> walk the rmrr entries for a given allocation, and if conflicting then move
> the base. Then how does need_skip_rmrr helps here? and why do you
> need pre-check on low/high region earlier?

We may have multiple RMRR entries, but only some of them may overlap the 
mmio space. Here I use need_skip_rmrr to count those.

When we skip one RMRR entry while allocating a range for a pci device, we 
shouldn't check this entry again since we have already crossed that range, 
so we decrease need_skip_rmrr. Once need_skip_rmrr reaches zero (or if it 
was zero from the start), we do nothing.
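
Roughly, the intent is the following (a sketch of the logic only, using
the names from the hunk above):

    /* Pre-pass: count how many RMRR entries can conflict with the MMIO
     * windows at all; if none, the per-BAR rescan below never runs. */
    for ( i = 0; i < nr_rdm_entries; i++ )
    {
        rdm_start = (uint64_t)rdm_map[i].start_pfn << PAGE_SHIFT;
        rdm_end = rdm_start + ((uint64_t)rdm_map[i].nr_pages << PAGE_SHIFT);
        if ( check_rdm_hole_conflict(mmio_base, mmio_max - mmio_base,
                                     rdm_start, rdm_end - rdm_start) )
            need_skip_rmrr++;
    }

    /* Per-BAR loop: each time a BAR is pushed past a conflicting entry,
     * that entry cannot conflict again, so need_skip_rmrr is decremented;
     * once it reaches zero the rescan is skipped entirely. */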

Thanks
Tiejun

>
>>           base += bar_sz;
>>
>>           if ( (base < resource->base) || (base > resource->max) )
>> diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
>> index dd81fb6..8767897 100644
>> --- a/tools/firmware/hvmloader/util.c
>> +++ b/tools/firmware/hvmloader/util.c
>> @@ -887,6 +887,15 @@ unsigned int
>> hvm_get_reserved_device_memory_map(void)
>>       return nr_entries;
>>   }
>>
>> +int check_rdm_hole_conflict(uint64_t start, uint64_t size,
>> +                            uint64_t rdm_start, uint64_t rdm_size)
>> +{
>> +    if ( start + size <= rdm_start || start >= rdm_start + rdm_size )
>> +        return 0;
>> +    else
>> +        return 1;
>> +}
>> +
>>   /*
>>    * Local variables:
>>    * mode: C
>> diff --git a/tools/firmware/hvmloader/util.h
>> b/tools/firmware/hvmloader/util.h
>> index e4f1851..9b02f95 100644
>> --- a/tools/firmware/hvmloader/util.h
>> +++ b/tools/firmware/hvmloader/util.h
>> @@ -242,6 +242,8 @@ int build_e820_table(struct e820entry *e820,
>>   void dump_e820_table(struct e820entry *e820, unsigned int nr);
>>
>>   unsigned int hvm_get_reserved_device_memory_map(void);
>> +int check_rdm_hole_conflict(uint64_t start, uint64_t size,
>> +                            uint64_t rdm_start, uint64_t rdm_size);
>>
>>   #ifndef NDEBUG
>>   void perform_tests(void);
>> --
>> 1.9.1
>
>

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 08/17] hvmloader/mmio: reconcile guest mmio with reserved device memory
  2014-12-04 16:04   ` Jan Beulich
@ 2014-12-08  9:10     ` Chen, Tiejun
  0 siblings, 0 replies; 106+ messages in thread
From: Chen, Tiejun @ 2014-12-08  9:10 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, yang.z.zhang

On 2014/12/5 0:04, Jan Beulich wrote:
>>>> On 01.12.14 at 10:24, <tiejun.chen@intel.com> wrote:
>> We need to make sure all mmio allocation don't overlap
>> any rdm, reserved device memory. Here we just skip
>> all reserved device memory range in mmio space.
>
> I think someone else already suggested that this and patch 9 should

Who?

I only see Kevin's comments on this patch.

> be swapped, and the BAR allocation be changed to use the E820
> map as input. That may end up being a bigger change, but will yield
> ultimately better (and namely better maintainable) code.

Maybe you mean some comments on patch 9? I need to take a look.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 07/17] hvmloader/util: get reserved device memory maps
  2014-12-08  8:52     ` Chen, Tiejun
@ 2014-12-08  9:18       ` Jan Beulich
  0 siblings, 0 replies; 106+ messages in thread
From: Jan Beulich @ 2014-12-08  9:18 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, yang.z.zhang

>>> On 08.12.14 at 09:52, <tiejun.chen@intel.com> wrote:
> On 2014/12/4 23:52, Jan Beulich wrote:
>>>>> On 01.12.14 at 10:24, <tiejun.chen@intel.com> wrote:
>>> We need to use reserved device memory maps with multiple times, so
>>> provide just one common function should be friend.
>>
>> I'm not going to repeat earlier comments; the way this is done right
>> now it's neither a proper runtime function nor a proper init time one.
>>
> 
> Actually I tried to do this in runtime as I comments in patch head. But 
> maybe you hope I should return rdm_map directly, and nr_entries as a 
> in/out parameter, right?

As said numerous times before - either you invoke the function
once (early on) and subsequently access the retrieved data via
global variables, or each invocation returns both the count and
the array (which, if any, of them via return value and which via
indirection is secondary).
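
For illustration, the first option could look roughly like this (a sketch
only; rdm_map and hvm_get_reserved_device_memory_map() are names already
used in this series, while init_reserved_device_memory_map() is made up):

static unsigned int nr_rdm_entries;

/* Called once, early in hvmloader setup; afterwards rdm_map and
 * nr_rdm_entries are consumed via globals by the MMIO and RAM checks,
 * with no further hypercalls needed. */
void init_reserved_device_memory_map(void)
{
    nr_rdm_entries = hvm_get_reserved_device_memory_map();
}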

Jan

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 03/17] introduce XENMEM_reserved_device_memory_map
  2014-12-08  6:17     ` Chen, Tiejun
@ 2014-12-08 10:00       ` Jan Beulich
  2014-12-08 16:45         ` Daniel De Graaf
  0 siblings, 1 reply; 106+ messages in thread
From: Jan Beulich @ 2014-12-08 10:00 UTC (permalink / raw)
  To: Tiejun Chen, dgdegra
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, yang.z.zhang

>>> On 08.12.14 at 07:17, <tiejun.chen@intel.com> wrote:
> On 2014/12/3 3:47, Konrad Rzeszutek Wilk wrote:
>> On Mon, Dec 01, 2014 at 05:24:21PM +0800, Tiejun Chen wrote:
>>> @@ -1101,6 +1129,29 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>           break;
>>>       }
>>>
>>> +#ifdef HAS_PASSTHROUGH
>>> +    case XENMEM_reserved_device_memory_map:
>>> +    {
>>> +        struct get_reserved_device_memory grdm;
>>> +
>>> +        if ( copy_from_guest(&grdm.map, arg, 1) ||
>>> +             !guest_handle_okay(grdm.map.buffer, grdm.map.nr_entries) )
>>> +            return -EFAULT;
>>> +
>>
>> Shouldn't there be an XSM check here?
> 
> I'm not sure, and Jan should be the author for this patch, so Jan can 
> give you a correct reply.

Hmm, not sure: Daniel, does an operation like this need an XSM
check? It's not clear whether the absence of such a check in e.g.
the handling of XENMEM_memory_map, XENMEM_machphys_mapping,
or XENMEM_maximum_ram_page is intentional (and can be used as
justification for it to be absent here too - after all the operation is for
a domain to find out information about only itself).

Jan

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 05/17] tools/libxc: introduce hypercall for xc_reserved_device_memory_map
  2014-12-08  7:25     ` Chen, Tiejun
@ 2014-12-08 15:52       ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 106+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-12-08 15:52 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, jbeulich, yang.z.zhang

On Mon, Dec 08, 2014 at 03:25:12PM +0800, Chen, Tiejun wrote:
> On 2014/12/3 3:50, Konrad Rzeszutek Wilk wrote:
> >On Mon, Dec 01, 2014 at 05:24:23PM +0800, Tiejun Chen wrote:
> >>We will introduce that hypercall xc_reserved_device_memory_map
> >>approach to libxc.
> >>
> >>Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> >>---
> >>  tools/libxc/include/xenctrl.h |  5 +++++
> >>  tools/libxc/xc_domain.c       | 30 ++++++++++++++++++++++++++++++
> >>  2 files changed, 35 insertions(+)
> >>
> >>diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> >>index 84012fe..a3aeac3 100644
> >>--- a/tools/libxc/include/xenctrl.h
> >>+++ b/tools/libxc/include/xenctrl.h
> >>@@ -1294,6 +1294,11 @@ int xc_domain_set_memory_map(xc_interface *xch,
> >>  int xc_get_machine_memory_map(xc_interface *xch,
> >>                                struct e820entry entries[],
> >>                                uint32_t max_entries);
> >>+
> >>+int xc_reserved_device_memory_map(xc_interface *xch,
> >>+                                  uint32_t dom,
> >>+                                  struct xen_reserved_device_memory entries[],
> >>+                                  uint32_t *max_entries);
> >>  #endif
> >>  int xc_domain_set_time_offset(xc_interface *xch,
> >>                                uint32_t domid,
> >>diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> >>index 7fd43e9..09fd988 100644
> >>--- a/tools/libxc/xc_domain.c
> >>+++ b/tools/libxc/xc_domain.c
> >>@@ -679,6 +679,36 @@ int xc_domain_set_memory_map(xc_interface *xch,
> >>
> >>      return rc;
> >>  }
> >>+
> >>+int xc_reserved_device_memory_map(xc_interface *xch,
> >>+                                  uint32_t domid,
> >>+                                  struct xen_reserved_device_memory entries[],
> >>+                                  uint32_t *max_entries)
> >>+{
> >>+    int rc;
> >>+    struct xen_reserved_device_memory_map xrdmmap = {
> >>+        .domid = domid,
> >>+        .nr_entries = *max_entries
> >>+    };
> >>+    DECLARE_HYPERCALL_BOUNCE(entries,
> >>+                             sizeof(struct xen_reserved_device_memory) *
> >>+                             *max_entries, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> >>+
> >>+    if ( xc_hypercall_bounce_pre(xch, entries) )
> >>+        return -1;
> >>+
> >>+    set_xen_guest_handle(xrdmmap.buffer, entries);
> >>+
> >>+    rc = do_memory_op(xch, XENMEM_reserved_device_memory_map,
> >>+                      &xrdmmap, sizeof(xrdmmap));
> >>+
> >>+    xc_hypercall_bounce_post(xch, entries);
> >>+
> >>+    *max_entries = xrdmmap.nr_entries;
> >>+
> >
> >I would bake the -EAGAIN support in here to loop here.
> >
> >See how the xc_domain_destroy does it.
> 
> Do you mean this change?
> 
> @@ -699,8 +699,10 @@ int xc_reserved_device_memory_map(xc_interface *xch,
> 
>      set_xen_guest_handle(xrdmmap.buffer, entries);
> 
> -    rc = do_memory_op(xch, XENMEM_reserved_device_memory_map,
> -                      &xrdmmap, sizeof(xrdmmap));
> +    do {
> +        rc = do_memory_op(xch, XENMEM_reserved_device_memory_map,
> +                          &xrdmmap, sizeof(xrdmmap));
> +    } while ( rc && (errno == EAGAIN) );

Yes.
> 
>      xc_hypercall_bounce_post(xch, entries);
> 
> Thanks
> Tiejun
> 
> >>+    return rc ? rc : xrdmmap.nr_entries;
> >>+}
> >>+
> >>  int xc_get_machine_memory_map(xc_interface *xch,
> >>                                struct e820entry entries[],
> >>                                uint32_t max_entries)
> >>--
> >>1.9.1
> >>
> >

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 02/17] introduce XEN_DOMCTL_set_rdm
  2014-12-08  3:16     ` Chen, Tiejun
@ 2014-12-08 15:57       ` Konrad Rzeszutek Wilk
  2014-12-09  1:06         ` Chen, Tiejun
  0 siblings, 1 reply; 106+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-12-08 15:57 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, jbeulich, yang.z.zhang

On Mon, Dec 08, 2014 at 11:16:07AM +0800, Chen, Tiejun wrote:
> 
> On 2014/12/3 3:39, Konrad Rzeszutek Wilk wrote:
> >On Mon, Dec 01, 2014 at 05:24:20PM +0800, Tiejun Chen wrote:
> >>This should be based on a new parameter globally, 'pci_rdmforce'.
> >>
> >>pci_rdmforce = 1 => Of course this should be 0 by default.
> >>
> >>'1' means we should force check to reserve all ranges. If failed
> >>VM wouldn't be created successfully. This also can give user a
> >>chance to work well with later hotplug, even if not a device
> >>assignment while creating VM.
> >>
> >>But we can override that by one specific pci device:
> >>
> >>pci = ['AA:BB.CC,rdmforce=0/1]
> >>
> >>But this 'rdmforce' should be 1 by default since obviously any
> >>passthrough device always need to do this. Actually no one really
> >>want to set as '0' so it may be unnecessary but I'd like to leave
> >>this as a potential approach.
> >>
> >>So this domctl provides an approach to control how to populate
> >>reserved device memory by tools.
> >>
> >>Note we always post a message to user about this once we owns
> >>RMRR.
> >>
> >>Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> >>---
> >>  docs/man/xl.cfg.pod.5              |  6 +++++
> >>  docs/misc/vtd.txt                  | 15 ++++++++++++
> >>  tools/libxc/include/xenctrl.h      |  6 +++++
> >>  tools/libxc/xc_domain.c            | 28 +++++++++++++++++++++++
> >>  tools/libxl/libxl_create.c         |  3 +++
> >>  tools/libxl/libxl_dm.c             | 47 ++++++++++++++++++++++++++++++++++++++
> >>  tools/libxl/libxl_internal.h       |  4 ++++
> >>  tools/libxl/libxl_types.idl        |  2 ++
> >>  tools/libxl/libxlu_pci.c           |  2 ++
> >>  tools/libxl/xl_cmdimpl.c           | 10 ++++++++
> >
> >In the past we had split the hypervisor and the
> >toolstack patches in two. So that one could focus
> >on the hypervisor ones first, and then in another
> >patch on the toolstack.
> >
> 
> Yes.
> 
> >But perhaps this was intended to be in one patch?
> 
> This change also involves docs, so it's a little bit harder to understand the
> whole picture if we split this.
> 
> >
> >>  xen/drivers/passthrough/pci.c      | 39 +++++++++++++++++++++++++++++++
> >>  xen/drivers/passthrough/vtd/dmar.c |  8 +++++++
> >>  xen/include/asm-x86/hvm/domain.h   |  4 ++++
> >
> >I don't see ARM here? Should there be an ARM variant of this? If not
> 
> ARM doesn't need this feature.
> 
> >should the toolstack ones only run under x86?
> 
> And I think this shouldn't break the current ARM path either. I mean this
> would simply return since ARM doesn't have such a hypercall handler.
> 
> >
> >>  xen/include/public/domctl.h        | 21 +++++++++++++++++
> >>  xen/xsm/flask/hooks.c              |  1 +
> >>  15 files changed, 196 insertions(+)
> >>
> >>diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> >>index 622ea53..9adc41e 100644
> >>--- a/docs/man/xl.cfg.pod.5
> >>+++ b/docs/man/xl.cfg.pod.5
> >>@@ -645,6 +645,12 @@ dom0 without confirmation.  Please use with care.
> >>  D0-D3hot power management states for the PCI device. False (0) by
> >>  default.
> >>
> >>+=item B<rdmforce=BOOLEAN>
> >>+
> >>+(HVM/x86 only) Specifies that the VM would force to check and try to
> >
> >s/force/forced/
> 
> I guess you're saying 'be forced'.
> 
> >>+reserve all reserved device memory, like RMRR, associated to the PCI
> >>+device. False (0) by default.
> >
> >Not sure I understand. How would the VM be forced to do this? Or is
> >it that the hvmloader would force to do this? And if it fails (as you
> 
> Yes.
> 
> >say 'try') ? What then?
> 
> In most cases we can reserve these regions, but if these RMRR regions overlap
> some fixed range, like the guest BIOS, we can't succeed.
> 
> >
> >>+
> >>  =back
> >>
> >>  =back
> >>diff --git a/docs/misc/vtd.txt b/docs/misc/vtd.txt
> >>index 9af0e99..23544d5 100644
> >>--- a/docs/misc/vtd.txt
> >>+++ b/docs/misc/vtd.txt
> >>@@ -111,6 +111,21 @@ in the config file:
> >>  To override for a specific device:
> >>  	pci = [ '01:00.0,msitranslate=0', '03:00.0' ]
> >>
> >>+RDM, 'reserved device memory', for PCI Device Passthrough
> >>+---------------------------------------------------------
> >>+
> >>+The BIOS controls some devices in terms of some reginos of memory used for
> >
> >Could you elaborate what 'some devices' are? Network cards? GPUs? What
> >are the most commons ones.
> 
> Some legacy USB devices that perform PS2 emulation, and the GPU has stolen
> memory, as I remember.
> 
> >
> >s/reginos/regions/
> 
> Fixed.
> 
> >
> >And by regions you mean BAR regions?
> 
> No. I guess you want to know some background about RMRR :)
> 
> There's a good brief description in Linux:
> 
> What is RMRR?
> -------------
> 
> There are some devices the BIOS controls, for e.g USB devices to perform
> PS2 emulation. The regions of memory used for these devices are marked
> reserved in the e820 map. When we turn on DMA translation, DMA to those
> regions will fail. Hence BIOS uses RMRR to specify these regions along with
> devices that need to access these regions. OS is expected to setup
> unity mappings for these regions for these devices to access these regions.
> 
> >
> >>+these devices. This kind of region should be reserved before creating a VM
> >>+to make sure they are not occupied by RAM/MMIO to conflict, and also we can
> >
> >You said 'This' but here you are using the plural ' are'. IF you want it plural
> >it needs to be 'These regions'
> 
> Thanks for your correction.
> 
> >>+create necessary IOMMU table successfully.
> >>+
> >>+To enable this globally, add "pci_rdmforce" in the config file:
> >>+
> >>+	pci_rdmforce = 1         (default is 0)
> >
> >The guest config file? Or /etc/xen/xl.conf ?
> 
> The guest config file. Here I just follow the 'pci_msitranslate' convention
> since they share that usage.
> 
> >
> >>+
> >>+Or just enable for a specific device:
> >>+	pci = [ '01:00.0,rdmforce=1', '03:00.0' ]
> >>+
> >>
> >>  Caveat on Conventional PCI Device Passthrough
> >>  ---------------------------------------------
> >>diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> 
> [snip]
> 
> >>--- a/xen/drivers/passthrough/pci.c
> >>+++ b/xen/drivers/passthrough/pci.c
> >>@@ -34,6 +34,7 @@
> >>  #include <xen/tasklet.h>
> >>  #include <xsm/xsm.h>
> >>  #include <asm/msi.h>
> >>+#include <xen/stdbool.h>
> >>
> >>  struct pci_seg {
> >>      struct list_head alldevs_list;
> >>@@ -1553,6 +1554,44 @@ int iommu_do_pci_domctl(
> >>          }
> >>          break;
> >>
> >>+    case XEN_DOMCTL_set_rdm:
> >>+    {
> >>+        struct xen_domctl_set_rdm *xdsr = &domctl->u.set_rdm;
> >>+        struct xen_guest_pcidev_info *pcidevs = NULL;
> >>+        struct domain *d = rcu_lock_domain_by_any_id(domctl->domain);
> >>+
> >>+        if ( d == NULL )
> >>+            return -ESRCH;
> >>+
> >
> >What if this is called on an PV domain?
> 
> Currently we just support this in HVM, so I'd like to add this,
> 
>          if ( d == NULL )
>              return -ESRCH;
> 
> +        ASSERT( is_hvm_domain(d) );
> +

No. Please don't crash the hypervisor.

Just return -ENOSYS or such when done for PV guests.

> 
> >
> >You are also missing the XSM checks.
> 
> Just see this below.
> 
> >
> >What if this is called multiple times. Is it OK to over-ride
> >the 'pci_force' or should it stick once?
> 
> It should be fine since only xc/hvmloader need such information while
> creating a VM.
> 
> And especially, currently we just call this once to set it. So why would we
> need to call this again and again? I think if anyone wants to extend to the
> case you're worrying about, he should know the effects before he takes that
> action, right?

Program defensively and also think about preemption. If this call ends up
being preempted you might need to call it again. Or what if a third-party
toolstack uses this operation and calls it with wacky values?
> 
> >
> >
> >>+        d->arch.hvm_domain.pci_force =
> >>+                            xdsr->flags & PCI_DEV_RDM_CHECK ? true : false;
> >
> >Won't we crash here if this is called for PV guests?
> 
> After the line, 'ASSERT( is_hvm_domain(d) );', is added, this problem should
> be gone.

No it won't be. You will just crash the hypervisor.

Please please put yourself in the mind that the toolstack can (and will)
have bugs.
> 
> >
> >>+        d->arch.hvm_domain.num_pcidevs = xdsr->num_pcidevs;
> >
> >What if the 'num_pcidevs' has some bogus value. You need to check for that.
> 
> This value is grabbed from that existing interface, assign_device, so I mean
> this is already checked.
> 
> >
> >
> >>+        d->arch.hvm_domain.pcidevs = NULL;
> >
> >Please first free it. It might be that the toolstack
> >is doing this a couple of times. You don't want to leak memory.
> >
> 
> Okay,
> 
> +        if ( d->arch.hvm_domain.pcidevs )
> +            xfree(d->arch.hvm_domain.pcidevs);
> 
> >
> >>+
> >>+        if ( xdsr->num_pcidevs )
> >>+        {
> >>+            pcidevs = xmalloc_array(xen_guest_pcidev_info_t,
> >>+                                    xdsr->num_pcidevs);
> >>+            if ( pcidevs == NULL )
> >>+            {
> >>+                rcu_unlock_domain(d);
> >>+                return -ENOMEM;
> >
> >But you already have set 'num_pcidevs' to some value. This copying/check
> >should be done before you modify 'd->arch.hvm_domain'...
> 
> This makes sense so I'll move down this fragment.
> 
> >>+            }
> >>+
> >>+            if ( copy_from_guest(pcidevs, xdsr->pcidevs,
> >>+                                 xdsr->num_pcidevs*sizeof(*pcidevs)) )
> >>+            {
> >>+                xfree(pcidevs);
> >>+                rcu_unlock_domain(d);
> >
> >Ditto. You need to do these checks before you modify 'd->arch.hvm_domain'.
> >
> >>+                return -EFAULT;
> >>+            }
> >>+        }
> >>+
> >>+        d->arch.hvm_domain.pcidevs = pcidevs;
> >>+        rcu_unlock_domain(d);
> >>+    }
> >>+        break;
> >>+
> >>      case XEN_DOMCTL_assign_device:
> >>          if ( unlikely(d->is_dying) )
> >>          {
> >>diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
> >>index 1152c3a..5e41e7a 100644
> >>--- a/xen/drivers/passthrough/vtd/dmar.c
> >>+++ b/xen/drivers/passthrough/vtd/dmar.c
> >>@@ -674,6 +674,14 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
> >>                          "  RMRR region: base_addr %"PRIx64
> >>                          " end_address %"PRIx64"\n",
> >>                          rmrru->base_address, rmrru->end_address);
> >>+            /*
> >>+             * TODO: we may provide a precise paramter just to reserve
> >
> >s/paramter/parameter/
> 
> Fixed.
> 
> >>+             * RMRR range specific to one device.
> >>+             */
> >>+            dprintk(XENLOG_WARNING VTDPREFIX,
> >>+                    "So please set pci_rdmforce to reserve these ranges"
> >>+                    " if you need such a device in hotplug case.\n");
> 
> s/hotplug/passthrough
> 
> >
> >'Please set rdmforce to reserve ranges %lx->%lx if you plan to hotplug this device.'
> >
> >But then this is going to be a bit verbose, so perhaps:
> >
> >'Ranges %lx-%lx need rdmforce to properly work.' ?
> 
> It's unnecessary to output the range again since we already have such a print
> message here.
> 
> >
> >>+
> >>              acpi_register_rmrr_unit(rmrru);
> >>          }
> >>      }
> >>diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
> >>index 2757c7f..38530e5 100644
> >>--- a/xen/include/asm-x86/hvm/domain.h
> >>+++ b/xen/include/asm-x86/hvm/domain.h
> >>@@ -90,6 +90,10 @@ struct hvm_domain {
> >>      /* Cached CF8 for guest PCI config cycles */
> >>      uint32_t                pci_cf8;
> >>
> >
> >Maybe a comment explaining its purpose?
> 
> Okay.
> 
> /* Force to check/reserve Reserved Device Memory. */
>     bool_t                  pci_force;
> 
> >
> >>+    bool_t                  pci_force;
> >>+    uint32_t                num_pcidevs;
> >>+    struct xen_guest_pcidev_info      *pcidevs;
> >>+
> >
> >You are also missing freeing of this in the hypervisor when the guest
> >is destroyed. Please fix that.
> 
> You're right. I will go there next revision.
> 
> >
> >>      struct pl_time         pl_time;
> >>
> >>      struct hvm_io_handler *io_handler;
> >>diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> >>index 57e2ed7..ba8970d 100644
> >>--- a/xen/include/public/domctl.h
> >>+++ b/xen/include/public/domctl.h
> >>@@ -508,6 +508,25 @@ struct xen_domctl_get_device_group {
> >>  typedef struct xen_domctl_get_device_group xen_domctl_get_device_group_t;
> >>  DEFINE_XEN_GUEST_HANDLE(xen_domctl_get_device_group_t);
> >>
> >>+/* Currently just one bit to indicate force to check Reserved Device Memory. */
> >
> >Not sure I understand. Did you mean:
> >
> >'Check Reserved Device Memory'.
> 
> I can change this as '...force checking Reserved Device Memory.'
> 
> >
> >What happens if you do not have this flag? What are the semantics
> >of this hypercall - as in what will it mean.
> 
> Without this flag, devices that own an RMRR can't work in the passthrough
> case.
> 
> >
> >>+#define PCI_DEV_RDM_CHECK   0x1
> >>+struct xen_guest_pcidev_info {
> >>+    uint16_t    seg;
> >>+    uint8_t     bus;
> >>+    uint8_t     devfn;
> >>+    uint32_t    flags;
> >>+};
> >>+typedef struct xen_guest_pcidev_info xen_guest_pcidev_info_t;
> >>+DEFINE_XEN_GUEST_HANDLE(xen_guest_pcidev_info_t);
> >>+/* Control whether/how we check and reserve device memory. */
> >>+struct xen_domctl_set_rdm {
> >>+    uint32_t    flags;
> >
> >What is this 'flags' purpose compared to the 'pcidevs.flags'? Please
> >explain.
> 
> I replied something to Kevin, and we just need a global flag so we can
> remove pcidevs.flags.
> 
> >
> >>+    uint32_t    num_pcidevs;
> >>+    XEN_GUEST_HANDLE_64(xen_guest_pcidev_info_t) pcidevs;
> >>+};
> >>+typedef struct xen_domctl_set_rdm xen_domctl_set_rdm_t;
> >>+DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_rdm_t);
> >>+
> >>  /* Pass-through interrupts: bind real irq -> hvm devfn. */
> >>  /* XEN_DOMCTL_bind_pt_irq */
> >>  /* XEN_DOMCTL_unbind_pt_irq */
> >>@@ -1070,6 +1089,7 @@ struct xen_domctl {
> >>  #define XEN_DOMCTL_setvnumainfo                  74
> >>  #define XEN_DOMCTL_psr_cmt_op                    75
> >>  #define XEN_DOMCTL_arm_configure_domain          76
> >>+#define XEN_DOMCTL_set_rdm                       77
> >>  #define XEN_DOMCTL_gdbsx_guestmemio            1000
> >>  #define XEN_DOMCTL_gdbsx_pausevcpu             1001
> >>  #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
> >>@@ -1135,6 +1155,7 @@ struct xen_domctl {
> >>          struct xen_domctl_gdbsx_domstatus   gdbsx_domstatus;
> >>          struct xen_domctl_vnuma             vnuma;
> >>          struct xen_domctl_psr_cmt_op        psr_cmt_op;
> >>+        struct xen_domctl_set_rdm           set_rdm;
> >>          uint8_t                             pad[128];
> >>      } u;
> >>  };
> >>diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
> >>index d48463f..5a760e2 100644
> >>--- a/xen/xsm/flask/hooks.c
> >>+++ b/xen/xsm/flask/hooks.c
> >>@@ -592,6 +592,7 @@ static int flask_domctl(struct domain *d, int cmd)
> >>      case XEN_DOMCTL_test_assign_device:
> >>      case XEN_DOMCTL_assign_device:
> >>      case XEN_DOMCTL_deassign_device:
> >>+    case XEN_DOMCTL_set_rdm:
> >
> >There is more to XSM than just this file..
> 
> But I don't see any other place to touch, similar to XEN_DOMCTL_assign_device.
> 
> >
> >Please compile with XSM enabled.
> 
> Anyway, I added XSM_ENABLE = y and FLASK_ENABLE = y in Config.mk and then
> recompiled, and it looks good.
> 
> Anything I'm missing?

Ah, then it is fine!

> 
> >>  #endif
> >>          return 0;
> >
> >
> >Also how does this work with 32-bit dom0s? Is there a need to use the
> >compat layer?
> 
> Are you referring to the XSM case? Something else?
> 
> Actually this new DOMCTL is similar to XEN_DOMCTL_assign_device in some
> senses but I don't see the issue you're pointing to.

I was thinking about the compat layer and making sure it works properly.
> 
> Thanks
> Tiejun

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 03/17] introduce XENMEM_reserved_device_memory_map
  2014-12-08 10:00       ` Jan Beulich
@ 2014-12-08 16:45         ` Daniel De Graaf
  2014-12-08 16:54           ` Jan Beulich
  0 siblings, 1 reply; 106+ messages in thread
From: Daniel De Graaf @ 2014-12-08 16:45 UTC (permalink / raw)
  To: Jan Beulich, Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, yang.z.zhang

On 12/08/2014 05:00 AM, Jan Beulich wrote:
>>>> On 08.12.14 at 07:17, <tiejun.chen@intel.com> wrote:
>> On 2014/12/3 3:47, Konrad Rzeszutek Wilk wrote:
>>> On Mon, Dec 01, 2014 at 05:24:21PM +0800, Tiejun Chen wrote:
>>>> @@ -1101,6 +1129,29 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>            break;
>>>>        }
>>>>
>>>> +#ifdef HAS_PASSTHROUGH
>>>> +    case XENMEM_reserved_device_memory_map:
>>>> +    {
>>>> +        struct get_reserved_device_memory grdm;
>>>> +
>>>> +        if ( copy_from_guest(&grdm.map, arg, 1) ||
>>>> +             !guest_handle_okay(grdm.map.buffer, grdm.map.nr_entries) )
>>>> +            return -EFAULT;
>>>> +
>>>
>>> Shouldn't there be an XSM check here?
>>
>> I'm not sure, and Jan should be the author for this patch, so Jan can
>> give you a correct reply.
>
> Hmm, not sure: Daniel, does an operation like this need an XSM
> check? It's not clear whether the absence of such a check in e.g.
> the handling of XENMEM_memory_map, XENMEM_machphys_mapping,
> or XENMEM_maximum_ram_page is intentional (and can be used as
> justification for it to be absent here too - after all the operation is for
> a domain to find out information about only itself).
>
> Jan

I can see a possible reason why an XSM check might be needed here, but
I'm not sufficiently clear on what reserved device memory is to tell
for sure.  My best guess is that it is not needed.

From my reading of this patchset, this hypercall just identifies regions
of memory that are reserved, similar to exposing the host's e820 map to a
guest.  That seems similar enough to the other XENMEM_* leaks that it is
acceptable to also allow it.  If there is a reason that it would be useful
to hide this, adding hooks to all these locations so that only domains
that use passthrough devices (and therefore need to know about the host
system's memory) will have access is probably the best option.

If a guest who has control of a passthrough device can cause these
reserved ranges to change, then there may be reason to prevent others
from querying them, but that doesn't appear to be the case here.

-- 
Daniel De Graaf
National Security Agency

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 03/17] introduce XENMEM_reserved_device_memory_map
  2014-12-08 16:45         ` Daniel De Graaf
@ 2014-12-08 16:54           ` Jan Beulich
  0 siblings, 0 replies; 106+ messages in thread
From: Jan Beulich @ 2014-12-08 16:54 UTC (permalink / raw)
  To: Tiejun Chen, Daniel De Graaf
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, yang.z.zhang

>>> On 08.12.14 at 17:45, <dgdegra@tycho.nsa.gov> wrote:
> If a guest who has control of a passthrough device can cause these
> reserved ranges to change, then there may be reason to prevent others
> from querying them, but that doesn't appear to be the case here.

Right, in that case we definitely would need a check.

Jan

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 02/17] introduce XEN_DOMCTL_set_rdm
  2014-12-08 15:57       ` Konrad Rzeszutek Wilk
@ 2014-12-09  1:06         ` Chen, Tiejun
  2014-12-09  8:33           ` Jan Beulich
  0 siblings, 1 reply; 106+ messages in thread
From: Chen, Tiejun @ 2014-12-09  1:06 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, jbeulich, yang.z.zhang

>>>> --- a/xen/drivers/passthrough/pci.c
>>>> +++ b/xen/drivers/passthrough/pci.c
>>>> @@ -34,6 +34,7 @@
>>>>   #include <xen/tasklet.h>
>>>>   #include <xsm/xsm.h>
>>>>   #include <asm/msi.h>
>>>> +#include <xen/stdbool.h>
>>>>
>>>>   struct pci_seg {
>>>>       struct list_head alldevs_list;
>>>> @@ -1553,6 +1554,44 @@ int iommu_do_pci_domctl(
>>>>           }
>>>>           break;
>>>>
>>>> +    case XEN_DOMCTL_set_rdm:
>>>> +    {
>>>> +        struct xen_domctl_set_rdm *xdsr = &domctl->u.set_rdm;
>>>> +        struct xen_guest_pcidev_info *pcidevs = NULL;
>>>> +        struct domain *d = rcu_lock_domain_by_any_id(domctl->domain);
>>>> +
>>>> +        if ( d == NULL )
>>>> +            return -ESRCH;
>>>> +
>>>
>>> What if this is called on an PV domain?
>>
>> Currently we just support this in HVM, so I'd like to add this,
>>
>>           if ( d == NULL )
>>               return -ESRCH;
>>
>> +        ASSERT( is_hvm_domain(d) );
>> +
>
> No. Please don't crash the hypervisor.

Okay.

>
> Just return -ENOSYS or such when done for PV guests.

So,

+        if ( !is_hvm_domain(d) )
+            return -ENOSYS;

>
>>
>>>
>>> You are also missing the XSM checks.
>>
>> Just see this below.
>>
>>>
>>> What if this is called multiple times. Is it OK to over-ride
>>> the 'pci_force' or should it stick once?
>>
>> It should be fine since just xc/hvmloader need such an information while
>> creating a VM.
>>
>> And especially, currently we just call this one time to set. So why we need
>> to call this again and again? I think if anyone want to extend such a case
>> you're worrying, he should know any effect before he take a action, right?
>
> Program defensively and also think about preemption. If this call end up

Do you think we need a finer-grained approach, like a lock here?

> being preempted you might need to call it again. Or if the third-party
> toolstack use this operation and call this with wacky values?

Maybe the following addresses this well enough:

     case XEN_DOMCTL_set_rdm:
     {
         struct xen_domctl_set_rdm *xdsr = &domctl->u.set_rdm;
         struct xen_guest_pcidev_info *pcidevs = NULL;

         if ( d->arch.hvm_domain.pcidevs )
             break;

         if ( !is_hvm_domain(d) )
             return -ENOSYS;
	...
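
Putting the points from this sub-thread together, the handler could be
ordered roughly like this (a sketch only, using the field names from the
patch, with validation done before touching d->arch.hvm_domain and any
earlier allocation freed instead of leaked):

    case XEN_DOMCTL_set_rdm:
    {
        struct xen_domctl_set_rdm *xdsr = &domctl->u.set_rdm;
        struct xen_guest_pcidev_info *pcidevs = NULL;
        struct domain *d = rcu_lock_domain_by_any_id(domctl->domain);

        if ( d == NULL )
            return -ESRCH;

        if ( !is_hvm_domain(d) )
        {
            rcu_unlock_domain(d);
            return -ENOSYS;
        }

        if ( xdsr->num_pcidevs )
        {
            pcidevs = xmalloc_array(xen_guest_pcidev_info_t,
                                    xdsr->num_pcidevs);
            if ( !pcidevs ||
                 copy_from_guest(pcidevs, xdsr->pcidevs, xdsr->num_pcidevs) )
            {
                xfree(pcidevs);
                rcu_unlock_domain(d);
                return pcidevs ? -EFAULT : -ENOMEM;
            }
        }

        /* Replace any earlier setting instead of leaking it. */
        xfree(d->arch.hvm_domain.pcidevs);
        d->arch.hvm_domain.pcidevs = pcidevs;
        d->arch.hvm_domain.num_pcidevs = xdsr->num_pcidevs;
        d->arch.hvm_domain.pci_force = !!(xdsr->flags & PCI_DEV_RDM_CHECK);

        rcu_unlock_domain(d);
    }
        break;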

>>
>>>
>>>
>>>> +        d->arch.hvm_domain.pci_force =
>>>> +                            xdsr->flags & PCI_DEV_RDM_CHECK ? true : false;
>>>
>>> Won't we crash here if this is called for PV guests?
>>
>> After the line, 'ASSERT( is_hvm_domain(d) );', is added, this problem should
>> be gone.
>
> No it won't be. You will just crash the hypervisor.
>
> Please please put yourself in the mind that the toolstack can (and will)
> have bugs.

Thanks for your reminder.

>>
>>>
>>>> +        d->arch.hvm_domain.num_pcidevs = xdsr->num_pcidevs;
>>>
>>> What if the 'num_pcidevs' has some bogus value. You need to check for that.
>>

[snip]

>>>>           return 0;
>>>
>>>
>>> Also how does this work with 32-bit dom0s? Is there a need to use the
>>> compat layer?
>>
>> Are you saying in xsm case? Others?
>>
>> Actually this new DOMCTL is similar with XEN_DOMCTL_assign_device in some
>> senses but I don't see such an issue you're pointing.
>
> I was thinking about the compat layer and making sure it works properly.

Do we really need this consideration? I mean, I referred to the existing 
XEN_DOMCTL_assign_device to implement this new DOMCTL, but it looks like 
there's nothing related to this point.

Or could you make your thought clearer to me with an existing example? Then 
I can take a look at what exactly should be done in my new DOMCTL, since 
I'm new to working this out properly inside xen.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 02/17] introduce XEN_DOMCTL_set_rdm
  2014-12-08  8:43       ` Jan Beulich
@ 2014-12-09  2:38         ` Chen, Tiejun
  2014-12-09  7:29           ` Jan Beulich
  0 siblings, 1 reply; 106+ messages in thread
From: Chen, Tiejun @ 2014-12-09  2:38 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, yang.z.zhang

On 2014/12/8 16:43, Jan Beulich wrote:
>>>> On 08.12.14 at 07:06, <tiejun.chen@intel.com> wrote:
>> On 2014/12/4 23:33, Jan Beulich wrote:
>>>>>> On 01.12.14 at 10:24, <tiejun.chen@intel.com> wrote:
>> Looks this could be fine,
>>
>> d->arch.hvm_domain.pci_force = xdsr->flags & PCI_DEV_RDM_CHECK;
>
> Which is correct only because PCI_DEV_RDM_CHECK happens to be
> 1. Such hidden dependencies shouldn't be introduced though, in
> particular to avoid others then cloning the code for a flag that's not
> 1. The canonical form (found in many places throughout the tree) is

Right.

>
>      d->arch.hvm_domain.pci_force = !!(xdsr->flags & PCI_DEV_RDM_CHECK);

Fixed.

>
>>>> +        d->arch.hvm_domain.pcidevs = NULL;
>>>> +
>>>> +        if ( xdsr->num_pcidevs )
>>>> +        {
>>>> +            pcidevs = xmalloc_array(xen_guest_pcidev_info_t,
>>>> +                                    xdsr->num_pcidevs);
>>>
>>> New domctl-s must not represent security risks: xdsr->num_pcidevs
>>> can be (almost) arbitrarily large - do you really want to allow such
>>> huge allocations? A reasonable upper bound could for example be
>>
>> Sorry, as you know this num_pcidevs comes from the tools, and it actually
>> shares the result of that existing hypercall, assign_device, while parsing
>> 'pci=[]', so I don't understand why this can be (almost) arbitrarily
>> large.
>
> You imply well behaved tools, which you shouldn't when viewing
> things from a security perspective.
>
>>> the total number of PCI devices the hypervisor knows about.
>>
>> I take a quick look at this but looks we have no this exact value that
>> we can get directly.
>
> You need some upper bound. Whether you introduce a properly

In theory, we may have at most the number, domain(16bit) x bus(8bit) x 
device(5bit) x function(3bit), 2^16 x 2^8 x 2^5 x 2^3 = 0x1000000, so 
could we define a macro like this,

#define PCI_DEVICES_NUM_UP 0x1000000

> maintained count or a suitable estimate thereof doesn't matter.
>
>>>> --- a/xen/include/asm-x86/hvm/domain.h
>>>> +++ b/xen/include/asm-x86/hvm/domain.h
>>>> @@ -90,6 +90,10 @@ struct hvm_domain {
>>>>        /* Cached CF8 for guest PCI config cycles */

[snip]

>
> I really didn't necessarily mean individual comments - one for the whole
> group would suffice.
>
> Also I don't think pci_force is really the right name - all_pcidevs or
> some such would seem more suitable following the entire series.
> And finally, I'm generally advocating for avoiding redundant data
> items - I'm sure this "all" notion can be encoded reasonably with
> just the other two fields (and a suitable checking macro).

Yeah.

>

Do you mean something like this?

#define PCI_DEVS_NUM_UP 0x1000000
#define ALL_PCI_DEVS    (0x1 << 31)

#define is_all_pcidevs(d)  ((d)->arch.hvm_domain.num_pcidevs & ALL_PCI_DEVS)
#define is_valid_pcidevs_num(d)  \
     ((d)->arch.hvm_domain.num_pcidevs <= PCI_DEVS_NUM_UP)

     /*
      * num_pcidevs:
      * bit31 indicates if all devices need to be checked/reserved
      * Reserved Device Memory.
      * bit30 ~ bit25 are reserved now.
      * bit24 ~ bit0 represent actually how many pci devices we
      * need to check/reserve Reserved Device Memory. They are
      * valid just when bit31 is zero.
      *
      * pcidevs represents the pci device instances associated with
      * bit24 ~ bit0.
      */
     uint32_t                num_pcidevs;
     struct xen_guest_pcidev_info      *pcidevs;

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 02/17] introduce XEN_DOMCTL_set_rdm
  2014-12-09  2:38         ` Chen, Tiejun
@ 2014-12-09  7:29           ` Jan Beulich
  0 siblings, 0 replies; 106+ messages in thread
From: Jan Beulich @ 2014-12-09  7:29 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, yang.z.zhang

>>> On 09.12.14 at 03:38, <tiejun.chen@intel.com> wrote:
> On 2014/12/8 16:43, Jan Beulich wrote:
>>>>> On 08.12.14 at 07:06, <tiejun.chen@intel.com> wrote:
>>> I took a quick look at this but it looks like we have no exact value that
>>> we can get directly.
>>
>> You need some upper bound. Whether you introduce a properly
> 
> In theory, we may have at most the number, domain(16bit) x bus(8bit) x 
> device(5bit) x function(3bit), 2^16 x 2^8 x 2^5 x 2^3 = 0x1000000, so 
> could we define a macro like this,
> 
> #define PCI_DEVICES_NUM_UP 0x1000000

16+8+5+3 = 32 for me. And an upper bound of 4G is surely not
useful here in any way.

Jan

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 04/17] update the existing hypercall to support XEN_DOMCTL_set_rdm
  2014-12-08  8:51       ` Jan Beulich
@ 2014-12-09  7:47         ` Chen, Tiejun
  2014-12-09  8:19           ` Jan Beulich
  0 siblings, 1 reply; 106+ messages in thread
From: Chen, Tiejun @ 2014-12-09  7:47 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, yang.z.zhang

On 2014/12/8 16:51, Jan Beulich wrote:
>>>> On 08.12.14 at 08:11, <tiejun.chen@intel.com> wrote:
>> On 2014/12/4 23:50, Jan Beulich wrote:
>>>>>> On 01.12.14 at 10:24, <tiejun.chen@intel.com> wrote:
>>>>
>>>> -        if ( __copy_to_compat_offset(grdm->map.buffer, grdm->used_entries,
>>>> -                                     &rdm, 1) )
>>>> -            return -EFAULT;
>>>> +    if ( d )
>>>> +    {
>>>> +        if ( d->arch.hvm_domain.pci_force )
>>>
>>> You didn't verify that the domain is a HVM/PVH one.
>>
>> Is this, ASSERT(is_hvm_domain(grdm.domain)), correct?
>
> Certainly not, or do you want to crash the hypervisor because of bad
> tools input?

Based on Konrad's hint, I hope this works for you:

+        if ( !is_hvm_domain(d) )
+            return -ENOSYS;

>
>>>> +        {
>>>> +            if ( grdm->used_entries < grdm->map.nr_entries )
>>>> +            {
>>>> +                if ( __copy_to_compat_offset(grdm->map.buffer,
>>>> +                                             grdm->used_entries,
>>>> +                                             &rdm, 1) )
>>>> +                {
>>>> +                    rcu_unlock_domain(d);
>>>> +                    return -EFAULT;
>>>> +                }
>>>> +            }
>>>> +            ++grdm->used_entries;
>>>> +        }
>>>> +        else
>>>> +        {
>>>> +            for ( i = 0; i < d->arch.hvm_domain.num_pcidevs; i++ )
>>>> +            {
>>>> +                sbdf = PCI_SBDF2(d->arch.hvm_domain.pcidevs[i].seg,
>>>> +                                 d->arch.hvm_domain.pcidevs[i].bus,
>>>> +                                 d->arch.hvm_domain.pcidevs[i].devfn);
>>>> +                if ( sbdf == id )
>>>> +                {
>>>> +                    if ( grdm->used_entries < grdm->map.nr_entries )
>>>> +                    {
>>>> +                        if ( __copy_to_compat_offset(grdm->map.buffer,
>>>> +                                                     grdm->used_entries,
>>>> +                                                     &rdm, 1) )
>>>> +                        {
>>>> +                            rcu_unlock_domain(d);
>>>> +                            return -EFAULT;
>>>> +                        }
>>>> +                    }
>>>> +                    ++grdm->used_entries;
>>>
>>> break;
>>
>> Added.
>>
>>>
>>> Also trying to fold code identical on the if and else branches would
>>> seem pretty desirable.
>>
>> Sorry, I can't see what I'm missing.
>
> The whole "if-copy-unlock-and-return-EFAULT-otherwise-increment"
> is identical and can be factored out pretty easily afaict.

What about this?

struct get_reserved_device_memory {
     struct xen_reserved_device_memory_map map;
     unsigned int used_entries;
     struct domain *domain;
};

static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
                                       u32 id, void *ctxt)
{
     struct get_reserved_device_memory *grdm = ctxt;
     struct domain *d = grdm->domain;
     unsigned int i, hit_one = 0;
     u32 sbdf;
     struct xen_reserved_device_memory rdm = {
         .start_pfn = start, .nr_pages = nr
     };

     if ( !d->arch.hvm_domain.pci_force )
     {
         for ( i = 0; i < d->arch.hvm_domain.num_pcidevs; i++ )
         {
             sbdf = PCI_SBDF2(d->arch.hvm_domain.pcidevs[i].seg,
                              d->arch.hvm_domain.pcidevs[i].bus,
                              d->arch.hvm_domain.pcidevs[i].devfn);
             if ( sbdf == id )
             {
                 hit_one = 1;
                 break;
             }
         }

         if ( !hit_one )
             return 0;
     }

     if ( grdm->used_entries < grdm->map.nr_entries )
     {
         if ( __copy_to_guest_offset(grdm->map.buffer,
                                     grdm->used_entries,
                                     &rdm, 1) )
             return -EFAULT;
     }

     ++grdm->used_entries;

     return 0;
}

>
>>>> @@ -319,9 +358,13 @@ int compat_memory_op(unsigned int cmd,
>> XEN_GUEST_HANDLE_PARAM(void) compat)
>>>>
>>>>                if ( !rc && grdm.map.nr_entries < grdm.used_entries )
>>>>                    rc = -ENOBUFS;
>>>> +
>>>>                grdm.map.nr_entries = grdm.used_entries;
>>>> -            if ( __copy_to_guest(compat, &grdm.map, 1) )
>>>> -                rc = -EFAULT;
>>>> +            if ( grdm.map.nr_entries )
>>>> +            {
>>>> +                if ( __copy_to_guest(compat, &grdm.map, 1) )
>>>> +                    rc = -EFAULT;
>>>> +            }
>>>
>>> Why do you need this change?
>>
>> If we have no entries, why do we still copy that?
>
> That's not only a pointless optimization (the counter question being
> "Why add an extra conditional when the copying does no harm?"), but
> also not subject of this patch. Additionally iirc the field is an IN/OUT,
> i.e. when no entries were found you want to tell the caller so.

Right, so I will restore that.

>
>>>> --- a/xen/drivers/passthrough/vtd/dmar.c
>>>> +++ b/xen/drivers/passthrough/vtd/dmar.c
>>>> @@ -904,17 +904,33 @@ int platform_supports_x2apic(void)
>>>>
>>>>    int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
>>>>    {
>>>> -    struct acpi_rmrr_unit *rmrr;
>>>> +    struct acpi_rmrr_unit *rmrr, *rmrr_cur = NULL;
>>>>        int rc = 0;
>>>> +    unsigned int i;
>>>> +    u16 bdf;
>>>>
>>>> -    list_for_each_entry(rmrr, &acpi_rmrr_units, list)
>>>> +    for_each_rmrr_device ( rmrr, bdf, i )
>>>>        {
>>>> -        rc = func(PFN_DOWN(rmrr->base_address),
>>>> -                  PFN_UP(rmrr->end_address) - PFN_DOWN(rmrr->base_address),
>>>> -                  ctxt);
>>>> -        if ( rc )
>>>> -            break;
>>>> +        if ( rmrr != rmrr_cur )
>>>> +        {
>>>> +            rc = func(PFN_DOWN(rmrr->base_address),
>>>> +                      PFN_UP(rmrr->end_address) -
>>>> +                        PFN_DOWN(rmrr->base_address),
>>>> +                      PCI_SBDF(rmrr->segment, bdf),
>>>> +                      ctxt);
>>>> +
>>>> +            if ( unlikely(rc < 0) )
>>>> +                return rc;
>>>> +
>>>> +            /* Just go next. */
>>>> +            if ( !rc )
>>>> +                rmrr_cur = rmrr;
>>>> +
>>>> +            /* Now just return specific to user requirement. */
>>>> +            if ( rc > 0 )
>>>> +                return rc;
>>>
>>> Nice that you check for that, but I can't see this case occurring
>>> anymore. Did you lose some code? Also please don't write code
>>
>> We have three scenarios here:
>>
>> #1 rc < 0 means this is an error;
>> #2 rc = 0 means the tools caller doesn't know how many buffers it should 
>> construct, so we need to count all entries as 'nr_entries' to return.
>> #3 rc > 0 means in some cases, we need to return some specific values,
>> like 1 to indicate we're hitting some RMRR range. Currently, we use gfn
>> to check this in case of memory populating, ept violation handler and
>> mem_access.
>
> Yes, I saw that you make use of this in later patches. It just seemed
> suspicious that you don't in this one.
>
>>>> --- a/xen/include/public/memory.h
>>>> +++ b/xen/include/public/memory.h
>>>> @@ -586,6 +586,11 @@ typedef struct xen_reserved_device_memory
>>>> xen_reserved_device_memory_t;
>>>>    DEFINE_XEN_GUEST_HANDLE(xen_reserved_device_memory_t);
>>>>
>>>>    struct xen_reserved_device_memory_map {
>>>> +    /*
>>>> +     * Domain whose reservation is being changed.
>>>> +     * Unprivileged domains can specify only DOMID_SELF.
>>>> +     */
>>>> +    domid_t        domid;
>>>>        /* IN/OUT */
>>>>        unsigned int nr_entries;
>>>>        /* OUT */
>>>
>>> Your addition lacks an IN annotation.
>>
>> Are you saying something for 'nr_entries'? But I didn't introduce
>> anything to change the original usage. Anyway, I try to improve this,
>>
>>       /*
>>        * IN: on call the number of entries which can be stored in buffer.
>>        * OUT: on return the number of entries which have been stored in
>>        * buffer. If on call the number is less than the number of all necessary
>>        * entries, on return the number of entries which is needed.
>>        */
>>
>
> No, I said "your addition lacks ...". And you addition is the "domid"
> field.
>

Sorry, I misunderstood this.

struct xen_reserved_device_memory_map {
     /*
      * IN: Domain whose reservation is being changed.
      * Unprivileged domains can specify only DOMID_SELF.
      */
     domid_t        domid;
     /* IN/OUT */
     unsigned int nr_entries;
     /* OUT */
     XEN_GUEST_HANDLE(xen_reserved_device_memory_t) buffer;
};

Thanks
Tiejun
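
[ For illustration, the IN/OUT nr_entries field supports the usual two-pass
  calling pattern on the tools side.  The wrapper name below is hypothetical
  and the buffer handling is simplified (real libxc code would bounce the
  buffer); it only sketches the nr_entries semantics, assuming the wrapper
  passes the hypercall's -ENOBUFS back unchanged:

    struct xen_reserved_device_memory_map map = {
        .domid = domid,
        .nr_entries = 0,              /* IN: probe how many entries exist */
    };
    xen_reserved_device_memory_t *entries = NULL;
    int rc;

    rc = xc_reserved_device_memory_map(xch, &map);   /* hypothetical wrapper */
    if ( rc == -ENOBUFS )
    {
        /* OUT: nr_entries now reports how many entries are needed. */
        entries = calloc(map.nr_entries, sizeof(*entries));
        set_xen_guest_handle(map.buffer, entries);
        rc = xc_reserved_device_memory_map(xch, &map); /* second pass fills */
    }
]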

* Re: [v8][PATCH 04/17] update the existing hypercall to support XEN_DOMCTL_set_rdm
  2014-12-09  7:47         ` Chen, Tiejun
@ 2014-12-09  8:19           ` Jan Beulich
  2014-12-09  9:12             ` Chen, Tiejun
  2014-12-09 10:11             ` Tim Deegan
  0 siblings, 2 replies; 106+ messages in thread
From: Jan Beulich @ 2014-12-09  8:19 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, yang.z.zhang

>>> On 09.12.14 at 08:47, <tiejun.chen@intel.com> wrote:
> On 2014/12/8 16:51, Jan Beulich wrote:
>> The whole "if-copy-unlock-and-return-EFAULT-otherwise-increment"
>> is identical and can be factored out pretty easily afaict.
> 
> What about this?
> 
> struct get_reserved_device_memory {
>      struct xen_reserved_device_memory_map map;
>      unsigned int used_entries;
>      struct domain *domain;
> };
> 
> static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
>                                        u32 id, void *ctxt)
> {
>      struct get_reserved_device_memory *grdm = ctxt;
>      struct domain *d = grdm->domain;
>      unsigned int i, hit_one = 0;
>      u32 sbdf;
>      struct xen_reserved_device_memory rdm = {
>          .start_pfn = start, .nr_pages = nr
>      };
> 
>      if ( !d->arch.hvm_domain.pci_force )
>      {
>          for ( i = 0; i < d->arch.hvm_domain.num_pcidevs; i++ )
>          {
>              sbdf = PCI_SBDF2(d->arch.hvm_domain.pcidevs[i].seg,
>                               d->arch.hvm_domain.pcidevs[i].bus,
>                               d->arch.hvm_domain.pcidevs[i].devfn);
>              if ( sbdf == id )
>              {
>                  hit_one = 1;
>                  break;
>              }
>          }
> 
>          if ( !hit_one )
>              return 0;
>      }

Why do you always pick other than the simplest possible solution?
You don't need a separate variable here, you can simply check
whether i reached d->arch.hvm_domain.num_pcidevs after the
loop. And even if you added a variable, it would want to be a
bool_t one with the way you use it.

Jan

* Re: [v8][PATCH 02/17] introduce XEN_DOMCTL_set_rdm
  2014-12-09  1:06         ` Chen, Tiejun
@ 2014-12-09  8:33           ` Jan Beulich
  2014-12-09 16:36             ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 106+ messages in thread
From: Jan Beulich @ 2014-12-09  8:33 UTC (permalink / raw)
  To: Tiejun Chen, Konrad Rzeszutek Wilk
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, yang.z.zhang

>>> On 09.12.14 at 02:06, <tiejun.chen@intel.com> wrote:
>>>> Also how does this work with 32-bit dom0s? Is there a need to use the
>>>> compat layer?
>>>
>>> Are you saying in xsm case? Others?
>>>
>>> Actually this new DOMCTL is similar with XEN_DOMCTL_assign_device in some
>>> senses but I don't see such an issue you're pointing.
>>
>> I was thinking about the compat layer and making sure it works properly.
> 
> Do we really need this consideration? I mean I referred to that existing 
> XEN_DOMCTL_assign_device to implement this new DOMCTL, but looks there's 
> nothing related to this point.
> 
> Or could you make your thought clear to me with an existing example? Then 
> I can take a look at what exactly should be taken in my new DOMCTL since 
> I'm a fresh man to work out this properly inside xen.

I think Konrad got a little confused here - domctl-s intentionally are
structured so that they don't need a compat translation layer.

Jan
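
[ For context, a sketch of the layout conventions that let domctl-s avoid a
  compat translation layer: only fixed-width fields and 64-bit-aligned guest
  handles, so the structure layout is identical for 32-bit and 64-bit
  toolstacks.  The struct and field names below are purely illustrative:

    struct xen_domctl_example {
        uint32_t flags;                    /* no 'long', no bare pointers   */
        uint32_t num_pcidevs;
        uint64_aligned_t reserved;         /* explicit alignment, no layout
                                              differences between ABIs      */
        XEN_GUEST_HANDLE_64(uint8) buffer; /* handle with a fixed 64-bit slot */
    };
]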

* Re: [v8][PATCH 04/17] update the existing hypercall to support XEN_DOMCTL_set_rdm
  2014-12-09  8:19           ` Jan Beulich
@ 2014-12-09  9:12             ` Chen, Tiejun
  2014-12-09  9:21               ` Jan Beulich
  2014-12-09 10:11             ` Tim Deegan
  1 sibling, 1 reply; 106+ messages in thread
From: Chen, Tiejun @ 2014-12-09  9:12 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, yang.z.zhang

On 2014/12/9 16:19, Jan Beulich wrote:
>>>> On 09.12.14 at 08:47, <tiejun.chen@intel.com> wrote:
>> On 2014/12/8 16:51, Jan Beulich wrote:
>>> The whole "if-copy-unlock-and-return-EFAULT-otherwise-increment"
>>> is identical and can be factored out pretty easily afaict.
>>
>> What about this?
>>
>> struct get_reserved_device_memory {
>>       struct xen_reserved_device_memory_map map;
>>       unsigned int used_entries;
>>       struct domain *domain;
>> };
>>
>> static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
>>                                         u32 id, void *ctxt)
>> {
>>       struct get_reserved_device_memory *grdm = ctxt;
>>       struct domain *d = grdm->domain;
>>       unsigned int i, hit_one = 0;
>>       u32 sbdf;
>>       struct xen_reserved_device_memory rdm = {
>>           .start_pfn = start, .nr_pages = nr
>>       };
>>
>>       if ( !d->arch.hvm_domain.pci_force )
>>       {
>>           for ( i = 0; i < d->arch.hvm_domain.num_pcidevs; i++ )
>>           {
>>               sbdf = PCI_SBDF2(d->arch.hvm_domain.pcidevs[i].seg,
>>                                d->arch.hvm_domain.pcidevs[i].bus,
>>                                d->arch.hvm_domain.pcidevs[i].devfn);
>>               if ( sbdf == id )
>>               {
>>                   hit_one = 1;
>>                   break;
>>               }
>>           }
>>
>>           if ( !hit_one )
>>               return 0;
>>       }
>
> Why do you always pick other than the simplest possible solution?

I don't intend it to be, but I may go a complicated way, even a wrong 
way, based on my understanding. But as one main maintainer, if you 
always say to me in such a reproachful word more than once, I have to 
consider you may hint constantly I'm not a suitable candidate to finish 
this. Its fair to me, I'd really like to quit this to ask my manager if 
it can deliver to other guy to make sure this can move forward.

> You don't need a separate variable here, you can simply check
> whether i reached d->arch.hvm_domain.num_pcidevs after the
> loop. And even if you added a variable, it would want to be a

Are you saying this?

	if ( i == d->arch.hvm_domain.num_pcidevs )
		return 0;

But if the last one happens to be a hit, 'i' is equal to 
d->arch.hvm_domain.num_pcidevs.

Thanks
Tiejun

> bool_t one with the way you use it.
>
> Jan
>
>
>

* Re: [v8][PATCH 04/17] update the existing hypercall to support XEN_DOMCTL_set_rdm
  2014-12-09  9:12             ` Chen, Tiejun
@ 2014-12-09  9:21               ` Jan Beulich
  2014-12-09  9:35                 ` Chen, Tiejun
  0 siblings, 1 reply; 106+ messages in thread
From: Jan Beulich @ 2014-12-09  9:21 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, yang.z.zhang

>>> On 09.12.14 at 10:12, <tiejun.chen@intel.com> wrote:
> On 2014/12/9 16:19, Jan Beulich wrote:
>>>>> On 09.12.14 at 08:47, <tiejun.chen@intel.com> wrote:
>>> On 2014/12/8 16:51, Jan Beulich wrote:
>>>> The whole "if-copy-unlock-and-return-EFAULT-otherwise-increment"
>>>> is identical and can be factored out pretty easily afaict.
>>>
>>> What about this?
>>>
>>> struct get_reserved_device_memory {
>>>       struct xen_reserved_device_memory_map map;
>>>       unsigned int used_entries;
>>>       struct domain *domain;
>>> };
>>>
>>> static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
>>>                                         u32 id, void *ctxt)
>>> {
>>>       struct get_reserved_device_memory *grdm = ctxt;
>>>       struct domain *d = grdm->domain;
>>>       unsigned int i, hit_one = 0;
>>>       u32 sbdf;
>>>       struct xen_reserved_device_memory rdm = {
>>>           .start_pfn = start, .nr_pages = nr
>>>       };
>>>
>>>       if ( !d->arch.hvm_domain.pci_force )
>>>       {
>>>           for ( i = 0; i < d->arch.hvm_domain.num_pcidevs; i++ )
>>>           {
>>>               sbdf = PCI_SBDF2(d->arch.hvm_domain.pcidevs[i].seg,
>>>                                d->arch.hvm_domain.pcidevs[i].bus,
>>>                                d->arch.hvm_domain.pcidevs[i].devfn);
>>>               if ( sbdf == id )
>>>               {
>>>                   hit_one = 1;
>>>                   break;
>>>               }
>>>           }
>>>
>>>           if ( !hit_one )
>>>               return 0;
>>>       }
>>
>> Why do you always pick other than the simplest possible solution?
> 
> I don't intend it to be, but I may go a complicated way, even a wrong 
> way, based on my understanding. But as one main maintainer, if you 
> always say to me in such a reproachful word more than once, I have to 
> consider you may hint constantly I'm not a suitable candidate to finish 
> this. Its fair to me, I'd really like to quit this to ask my manager if 
> it can deliver to other guy to make sure this can move forward.
> 
>> You don't need a separate variable here, you can simply check
>> whether i reached d->arch.hvm_domain.num_pcidevs after the
>> loop. And even if you added a variable, it would want to be a
> 
> Are you saying this?
> 
> 	if ( i == d->arch.hvm_domain.num_pcidevs )
> 		return 0;

Yes. Or use >=.

> But if the last one happens to be a hit, 'i' is equal to
> d->arch.hvm_domain.num_pcidevs.

No, when the last one hits, i == d->arch.hvm_domain.num_pcidevs - 1.

Jan
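
[ A compact sketch of the loop-exit idiom Jan describes, reusing the field
  names from the code quoted above:

    unsigned int i;

    for ( i = 0; i < d->arch.hvm_domain.num_pcidevs; i++ )
        if ( PCI_SBDF2(d->arch.hvm_domain.pcidevs[i].seg,
                       d->arch.hvm_domain.pcidevs[i].bus,
                       d->arch.hvm_domain.pcidevs[i].devfn) == id )
            break;                   /* hit: i < num_pcidevs            */

    if ( i == d->arch.hvm_domain.num_pcidevs )
        return 0;                    /* no hit: the loop ran to the end */

  No extra flag is needed: 'i' can only equal num_pcidevs when nothing
  matched. ]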

* Re: [v8][PATCH 04/17] update the existing hypercall to support XEN_DOMCTL_set_rdm
  2014-12-09  9:21               ` Jan Beulich
@ 2014-12-09  9:35                 ` Chen, Tiejun
  0 siblings, 0 replies; 106+ messages in thread
From: Chen, Tiejun @ 2014-12-09  9:35 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, yang.z.zhang

On 2014/12/9 17:21, Jan Beulich wrote:
>>>> On 09.12.14 at 10:12, <tiejun.chen@intel.com> wrote:
>> On 2014/12/9 16:19, Jan Beulich wrote:
>>>>>> On 09.12.14 at 08:47, <tiejun.chen@intel.com> wrote:
>>>> On 2014/12/8 16:51, Jan Beulich wrote:
>>>>> The whole "if-copy-unlock-and-return-EFAULT-otherwise-increment"
>>>>> is identical and can be factored out pretty easily afaict.
>>>>
>>>> What about this?
>>>>
>>>> struct get_reserved_device_memory {
>>>>        struct xen_reserved_device_memory_map map;
>>>>        unsigned int used_entries;
>>>>        struct domain *domain;
>>>> };
>>>>
>>>> static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
>>>>                                          u32 id, void *ctxt)
>>>> {
>>>>        struct get_reserved_device_memory *grdm = ctxt;
>>>>        struct domain *d = grdm->domain;
>>>>        unsigned int i, hit_one = 0;
>>>>        u32 sbdf;
>>>>        struct xen_reserved_device_memory rdm = {
>>>>            .start_pfn = start, .nr_pages = nr
>>>>        };
>>>>
>>>>        if ( !d->arch.hvm_domain.pci_force )
>>>>        {
>>>>            for ( i = 0; i < d->arch.hvm_domain.num_pcidevs; i++ )
>>>>            {
>>>>                sbdf = PCI_SBDF2(d->arch.hvm_domain.pcidevs[i].seg,
>>>>                                 d->arch.hvm_domain.pcidevs[i].bus,
>>>>                                 d->arch.hvm_domain.pcidevs[i].devfn);
>>>>                if ( sbdf == id )
>>>>                {
>>>>                    hit_one = 1;
>>>>                    break;
>>>>                }
>>>>            }
>>>>
>>>>            if ( !hit_one )
>>>>                return 0;
>>>>        }
>>>
>>> Why do you always pick other than the simplest possible solution?
>>
>> I don't intend it to be, but I may go a complicated way, even a wrong
>> way, based on my understanding. But as one main maintainer, if you
>> always say to me in such a reproachful word more than once, I have to
>> consider you may hint constantly I'm not a suitable candidate to finish
>> this. Its fair to me, I'd really like to quit this to ask my manager if
>> it can deliver to other guy to make sure this can move forward.
>>
>>> You don't need a separate variable here, you can simply check
>>> whether i reached d->arch.hvm_domain.num_pcidevs after the
>>> loop. And even if you added a variable, it would want to be a
>>
>> Are you saying this?
>>
>> 	if ( i == d->arch.hvm_domain.num_pcidevs )
>> 		return 0;
>
> Yes. Or use >=.

Okay.

>
>> But if the last one happens to one hit, 'i' is equal to
>> d->arch.hvm_domain.num_pcidevs.
>
> No, when the last one hits, i == d->arch.hvm_domain.num_pcidevs - 1.
>

You're right.

Thanks
Tiejun

* Re: [v8][PATCH 04/17] update the existing hypercall to support XEN_DOMCTL_set_rdm
  2014-12-09  8:19           ` Jan Beulich
  2014-12-09  9:12             ` Chen, Tiejun
@ 2014-12-09 10:11             ` Tim Deegan
  2014-12-09 10:22               ` Jan Beulich
  2014-12-10  3:39               ` Tian, Kevin
  1 sibling, 2 replies; 106+ messages in thread
From: Tim Deegan @ 2014-12-09 10:11 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	ian.jackson, xen-devel, yang.z.zhang, Tiejun Chen

At 08:19 +0000 on 09 Dec (1418109561), Jan Beulich wrote:
> Why do you always pick other than the simplest possible solution?

Jan, please don't make personal comments like this in code review.  It
can only offend and demoralize contributors, and deter other people
from joining in.

I understand that it can be frustrating, and I'm sure I have lashed
out at people on-list in the past.  But remember that people who are
new to the project need time to learn, and keep the comments to the
code itself.

I can see that this series has been going for a long time, and is
still getting hung up on coding style issues.  Might it be useful to
have a round of internal review from some of the more xen-experienced
engineers at Intel before Jan looks at it again?

Cheers,

Tim.

* Re: [v8][PATCH 04/17] update the existing hypercall to support XEN_DOMCTL_set_rdm
  2014-12-09 10:11             ` Tim Deegan
@ 2014-12-09 10:22               ` Jan Beulich
  2014-12-10  1:59                 ` Chen, Tiejun
  2014-12-10  3:39               ` Tian, Kevin
  1 sibling, 1 reply; 106+ messages in thread
From: Jan Beulich @ 2014-12-09 10:22 UTC (permalink / raw)
  To: Tiejun Chen, Tim Deegan
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	ian.jackson, xen-devel, yang.z.zhang

>>> On 09.12.14 at 11:11, <tim@xen.org> wrote:
> At 08:19 +0000 on 09 Dec (1418109561), Jan Beulich wrote:
>> Why do you always pick other than the simplest possible solution?
> 
> Jan, please don't make personal comments like this in code review.  It
> can only offend and demoralize contributors, and deter other people
> from joining in.

I apologize - I shouldn't have permitted myself to do so.

> I understand that it can be frustrating, and I'm sure I have lashed
> out at people on-list in the past.  But remember that people who are
> new to the project need time to learn, and keep the comments to the
> code itself.
> 
> I can see that this series has been going for a long time, and is
> still getting hung up on coding style issues.  Might it be useful to
> have a round of internal review from some of the more xen-experienced
> engineers at Intel before Jan looks at it again?

I've been suggesting something along those lines a number of times,
with (apparently) no success at all.

Jan

* Re: [v8][PATCH 02/17] introduce XEN_DOMCTL_set_rdm
  2014-12-09  8:33           ` Jan Beulich
@ 2014-12-09 16:36             ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 106+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-12-09 16:36 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, yang.z.zhang, Tiejun Chen

On Tue, Dec 09, 2014 at 08:33:56AM +0000, Jan Beulich wrote:
> >>> On 09.12.14 at 02:06, <tiejun.chen@intel.com> wrote:
> >>>> Also how does this work with 32-bit dom0s? Is there a need to use the
> >>>> compat layer?
> >>>
> >>> Are you saying in xsm case? Others?
> >>>
> >>> Actually this new DOMCTL is similar with XEN_DOMCTL_assign_device in some
> >>> senses but I don't see such an issue you're pointing.
> >>
> >> I was thinking about the compat layer and making sure it works properly.
> > 
> > Do we really need this consideration? I mean I referred to that existing 
> > XEN_DOMCTL_assign_device to implement this new DOMCTL, but looks there's 
> > nothing related to this point.
> > 
> > Or could you make your thought clear to me with an existing example? Then 
> > I can take a look at what exactly should be taken in my new DOMCTL since 
> > I'm a fresh man to work out this properly inside xen.
> 
> I think Konrad got a little confused here - domctl-s intentionally are
> structured so that they don't need a compat translation layer.

Ah! Thank you for that reminder!
> 
> Jan
> 

* Re: [v8][PATCH 04/17] update the existing hypercall to support XEN_DOMCTL_set_rdm
  2014-12-09 10:22               ` Jan Beulich
@ 2014-12-10  1:59                 ` Chen, Tiejun
  2014-12-10 20:21                   ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 106+ messages in thread
From: Chen, Tiejun @ 2014-12-10  1:59 UTC (permalink / raw)
  To: Jan Beulich, Tim Deegan
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	ian.jackson, xen-devel, yang.z.zhang

On 2014/12/9 18:22, Jan Beulich wrote:
>>>> On 09.12.14 at 11:11, <tim@xen.org> wrote:
>> At 08:19 +0000 on 09 Dec (1418109561), Jan Beulich wrote:
>>> Why do you always pick other than the simplest possible solution?
>>
>> Jan, please don't make personal comments like this in code review.  It
>> can only offend and demoralize contributors, and deter other people
>> from joining in.
>
> I apologize - I shouldn't have permitted myself to do so.
>
>> I understand that it can be frustrating, and I'm sure I have lashed

Actually I have the same feeling myself, but this really isn't helpful for 
moving forward.

>> out at people on-list in the past.  But remember that people who are
>> new to the project need time to learn, and keep the comments to the
>> code itself.
>>
>> I can see that this series has been going for a long time, and is
>> still getting hung up on coding style issues.  Might it be useful to

Some of this is my fault. Although I constantly learn more about Xen from 
this series, it looks hard to cover such a big thread with reasonable code 
in the Xen context. This is also why I claimed from the start that I 
couldn't do this.

Additionally, some of the initial design points are not very clear, or are 
still argued between us, so each time we discuss something during a 
revision it brings up a new line of thought again...

>> have a round of internal review from some of the more xen-experienced
>> engineers at Intel before Jan looks at it again?
>
> I've been suggesting something along those lines a number of times,
> with (apparently) no success at all.
>

Actually we did have a bit of internal discussion, and some people gave me 
comments previously, but I think they're too busy to review each patch 
carefully line by line, especially when I bring in something new to address 
your comments during the public review.

Anyway, let me now ask whether this can be handed over to another suitable 
person. As I said previously, I'd really like to step back if that lets this 
move to the next stage properly.

* Re: [v8][PATCH 04/17] update the existing hypercall to support XEN_DOMCTL_set_rdm
  2014-12-09 10:11             ` Tim Deegan
  2014-12-09 10:22               ` Jan Beulich
@ 2014-12-10  3:39               ` Tian, Kevin
  2014-12-10  9:01                 ` Jan Beulich
  2014-12-10 11:12                 ` Tim Deegan
  1 sibling, 2 replies; 106+ messages in thread
From: Tian, Kevin @ 2014-12-10  3:39 UTC (permalink / raw)
  To: Tim Deegan, Jan Beulich
  Cc: wei.liu2, ian.campbell, stefano.stabellini, ian.jackson,
	xen-devel, Zhang, Yang Z, Chen, Tiejun

> From: Tim Deegan [mailto:tim@xen.org]
> Sent: Tuesday, December 09, 2014 6:11 PM
> 
> At 08:19 +0000 on 09 Dec (1418109561), Jan Beulich wrote:
> > Why do you always pick other than the simplest possible solution?
> 
> Jan, please don't make personal comments like this in code review.  It
> can only offend and demoralize contributors, and deter other people
> from joining in.
> 
> I understand that it can be frustrating, and I'm sure I have lashed
> out at people on-list in the past.  But remember that people who are
> new to the project need time to learn, and keep the comments to the
> code itself.
> 
> I can see that this series has been going for a long time, and is
> still getting hung up on coding style issues.  Might it be useful to
> have a round of internal review from some of the more xen-experienced
> engineers at Intel before Jan looks at it again?
> 

Thanks Tim. Some of my thoughts:

1. It's more efficient for new people to start from a small, well-defined task
in one area, and then span to adjacent areas gradually. Patience must 
be given by the community to help them grow;

2. Unfortunately this RMRR effort has grown from the original two patches (very
VT-d focused) to now, at v8, 17 patches spanning many areas (including hypercall,
mmu, domain builder, hvmloader, etc.), and thus imposes both a long learning curve
and a lot of frustration with no achievement returned. Though having a complete
solution is desired, we need to help split the big task into a step-by-step approach
as long as:
	- the overall design is agreed
	- each step is self-contained
	- each step won't be wasted moving forward.
That way new people can see incremental achievements, instead of being knocked
down before final success.

Take this RMRR for example. Maybe we can split into steps like:

	step-1: set up RMRR identity mapping in the hypervisor; fail on conflict; no
user space changes
	step-2: expose RMRR knowledge to userspace and detect conflicts in the
domain builder and hvmloader
	step-3: reconcile e820/mmio to avoid conflicts on a best-effort basis
	step-4: miscellaneous enhancements, each of which can be ACK-ed individually:
		- handle guest CPU access to RMRR regions
		- handle conflicting RMRR regions
		- handle group RMRRs
		- re-enable USB device assignment

That way Tiejun can focus on one self-contained task at a time, and then see
continuous acknowledgement of his work. We don't need to claim RMRR is fully
ready in Xen until the last patch is accepted, but that at least provides a way
to acknowledge new people working on complex features and also allows partial
use of their work.

3. Maintainers sometimes don't give definitive guidance to new people,
and the high-level design is not closed in the first place. E.g. when I thought
this v8 version had everyone agreed on the design, there was another comment from
Jan that using XENMEM_set_memory_map might be better, while back in August,
when Tiejun raised that option, the answer was "I'm not sure". Note I understand
that as a maintainer he might not have definite answers for all open questions.
But without definitive guidance new people may waste a lot of effort chasing a
wrong option, which is definitely frustrating.

So I'd suggest that for such a non-trivial task given to a new person, all
maintainers in the involved areas (xen, mmu, tools, vtd, etc.) should first spend
time agreeing on the high-level design. At that stage, let's skip the coding
problems to save time on both sides. Once the design is agreed, we can then help
the new person improve the coding to reach check-in criteria, which becomes a
converging process.

For this RMRR issue, let's close the design first, and then use a staged approach
to get this patch series in.

4. Regarding coding style, Intel will definitely help our new people internally,
and we also like all the discussion happening publicly, to benefit from the
community as well, especially now that the scope has extended beyond the VT-d
area. In the meantime, with the above suggestions, our new people can focus on
incremental deliverables and pay more attention to coding style.

Thanks
Kevin

* Re: [v8][PATCH 04/17] update the existing hypercall to support XEN_DOMCTL_set_rdm
  2014-12-10  3:39               ` Tian, Kevin
@ 2014-12-10  9:01                 ` Jan Beulich
  2014-12-10  9:57                   ` Tian, Kevin
  2014-12-10 11:12                 ` Tim Deegan
  1 sibling, 1 reply; 106+ messages in thread
From: Jan Beulich @ 2014-12-10  9:01 UTC (permalink / raw)
  To: Kevin Tian
  Cc: wei.liu2, ian.campbell, stefano.stabellini, Tim Deegan,
	ian.jackson, xen-devel, Yang Z Zhang, Tiejun Chen

>>> On 10.12.14 at 04:39, <kevin.tian@intel.com> wrote:
> 1. It's more efficient for new people to start from a small, well-defined 
> task
> in one area, and then spanning to adjacent areas gradually. Patience must 
> be given by the community to help them grow;

Yes. But if a large item like the RMRR one is being picked, this is a
recipe for failure.

> 2. Unfortunately this RMRR effort grows from original two patches (very VT-d
> focused) to now v8 17 patches spanning many areas (including hypercall, mmu, 
> domain builder, hvmloader, etc.), and thus imposing both long learning curve
> and lots of frustrations being no achievement returned. Though having a 
> complete
> solution is desired, we need to help split the big task into step-by-step 
> approach 
> as long as:
> 	- the overall design is agreed
> 	- each step is self-contained 
> 	- each step won't be wasted moving forward. 
> That way new people can see incremental achievements, instead of being hit 
> down before final success. 
> 
> Take this RMRR for example. Maybe we can split into steps like:
> 
> 	step-1: setup RMRR identity mapping in hypervisor. fail if confliction. no
> user space changes
> 	step-2: expose RMRR knowledge to userspace and detect confliction in
> domain builder and hvmloader
> 	step-3: reconcile e820/mmio to avoid confliction in a best effort
> 	step-4: miscellaneous enhancements, each can be ACK-ed individually:
> 		- handle guest CPU access to RMRR regions
> 		- handle conflicting RMRR regions
> 		- handle group RMRRs
> 		- re-enable USB device assignment
> 
> That way Tiejun can focus on a self-contained task at one time, and then 
> observe
> continuous acknowledgements for his work. We don't need to claim RMRR fully
> ready in Xen until last patch is accepted, but that at least provides a way 
> to ack
> new people when working on complex features and also allow for partial usage 
> on his work.

If only this wouldn't result in regressions when done in steps (like you
outlined above, or likely also if split up in any other ways). Having to
do this in one go is the price you/we have to pay for this not having
got done properly from the beginning.

> 3. Maintainers sometimes didn't give definitive guidance to the new people, 
> and the high-level design is not closed in the 1st place. e.g. when I thought 
> this v8
> version has everyone agreed on the design, there's another comment from Jan
> about using XENMEM_set_memory_map might be better, while back to Aug.
> when Tiejun raised that option the answer is "I'm not sure". Note I 
> understand
> as a maintainer he might not have definite answers for all opens. But w/o a
> definitive guide new people may waste lots of effort on chasing a wrong 
> option,
> which is definitely frustrating. 

Main problem being that the maintainers to help here are primarily you,
and only then me or others - after all this is a VT-d only problem, not a
general IOMMU one. The fact that non-VT-d code gets touched doesn't
matter when considering just the design. And that's why I had asked
Tiejun to work with the two of you on getting the basis right. I don't
know how much of that possibly happened without the public seeing it,
but the results seem to suggest not all that much.

> So I'd suggest for such non-trivial task for a new people, all maintainers in 
> involved areas (xen, mmu, tools, vtd, etc) should first spend time to agree 
> on the high level design. At that stage, let's skip the coding problems to 
> save
> both time. After agreed design, then we can help the new people to improve 
> the coding to reach check-in criteria, which then becomes a converging 
> process.
> 
> for this RMRR issue, let's close the design first, and then use staged 
> approach
> to get this patch series in.

Yes please. Till now (as said many times before) the only route I see
without ground-up design considerations is to disable pass-through for
devices associated with RMRRs. The longer the current situation lasts,
the more I'm tempted to put together a patch to do just that.

Jan

* Re: [v8][PATCH 04/17] update the existing hypercall to support XEN_DOMCTL_set_rdm
  2014-12-10  9:01                 ` Jan Beulich
@ 2014-12-10  9:57                   ` Tian, Kevin
  0 siblings, 0 replies; 106+ messages in thread
From: Tian, Kevin @ 2014-12-10  9:57 UTC (permalink / raw)
  To: Jan Beulich
  Cc: wei.liu2, ian.campbell, stefano.stabellini, Tim Deegan,
	ian.jackson, xen-devel, Zhang, Yang Z, Chen, Tiejun

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Wednesday, December 10, 2014 5:01 PM
> 
> >>> On 10.12.14 at 04:39, <kevin.tian@intel.com> wrote:
> > 1. It's more efficient for new people to start from a small, well-defined
> > task
> > in one area, and then spanning to adjacent areas gradually. Patience must
> > be given by the community to help them grow;
> 
> Yes. But if a large item like the RMRR one is being picked, this is a
> recipe for failure.

RMRR is already there, just broken. What we are doing here is fixing it. Given
that, as long as each small item doesn't cause new failures or regressions, it
should be fine.

> 
> > 2. Unfortunately this RMRR effort grows from original two patches (very
> VT-d
> > focused) to now v8 17 patches spanning many areas (including hypercall,
> mmu,
> > domain builder, hvmloader, etc.), and thus imposing both long learning curve
> > and lots of frustrations being no achievement returned. Though having a
> > complete
> > solution is desired, we need to help split the big task into step-by-step
> > approach
> > as long as:
> > 	- the overall design is agreed
> > 	- each step is self-contained
> > 	- each step won't be wasted moving forward.
> > That way new people can see incremental achievements, instead of being hit
> > down before final success.
> >
> > Take this RMRR for example. Maybe we can split into steps like:
> >
> > 	step-1: setup RMRR identity mapping in hypervisor. fail if confliction. no
> > user space changes
> > 	step-2: expose RMRR knowledge to userspace and detect confliction in
> > domain builder and hvmloader
> > 	step-3: reconcile e820/mmio to avoid confliction in a best effort
> > 	step-4: miscellaneous enhancements, each can be ACK-ed individually:
> > 		- handle guest CPU access to RMRR regions
> > 		- handle conflicting RMRR regions
> > 		- handle group RMRRs
> > 		- re-enable USB device assignment
> >
> > That way Tiejun can focus on a self-contained task at one time, and then
> > observe
> > continuous acknowledgements for his work. We don't need to claim RMRR
> fully
> > ready in Xen until last patch is accepted, but that at least provides a way
> > to ack
> > new people when working on complex features and also allow for partial
> usage
> > on his work.
> 
> If only this wouldn't result in regressions when done in steps (like you
> outlined above, or likely also if split up in any other ways). Having to
> do this in one go is the price you/we have to pay for this not having
> got done properly from the beginning.

Then let's sit down and settle the design first before going on to review the details.

> 
> > 3. Maintainers sometimes didn't give definitive guidance to the new people,
> > and the high-level design is not closed in the 1st place. e.g. when I thought
> > this v8
> > version has everyone agreed on the design, there's another comment from
> Jan
> > about using XENMEM_set_memory_map might be better, while back to Aug.
> > when Tiejun raised that option the answer is "I'm not sure". Note I
> > understand
> > as a maintainer he might not have definite answers for all opens. But w/o a
> > definitive guide new people may waste lots of effort on chasing a wrong
> > option,
> > which is definitely frustrating.
> 
> Main problem being that the maintainers to help here are primarily you,
> and only then me or others - after all this is a VT-d only problem, not a
> general IOMMU one. The fact that non-VT-d code gets touched doesn't
> matter when considering just the design. And that's why I had asked
> Tiejun to work with the two of you on getting the basis right. I don't
> know how much of that possibly happened without the public seeing it,
> but the results seem to suggest not all that much.

We worked with Tiejun on the design, and did some level of code review (not as
much as you did, since it touches lots of different areas), and the version he
sent out is what we discussed as being the right way to go. But since you tend
to bring up different opinions spontaneously in each series, let's close that
first. We'll work with Tiejun to send a design review request separately, and
let's see how it works and whether it may be split into small steps.

And, as you already noted, while 'RMRR' is a VT-d feature, the majority of the
code touched in the patch series is not in VT-d space. I view this as general
feature development, i.e. how to reserve a resource in guest physical space.
RMRR is just one client of this feature, and there could be others. In that
case, we need all the maintainers in the corresponding areas to help, meaning
you as the general maintainer, Tim for the mmu part, Ian for the domain builder
part, etc. Regarding the design, we need everyone at the table to come to
agreement.

Thanks
Kevin

* Re: [v8][PATCH 04/17] update the existing hypercall to support XEN_DOMCTL_set_rdm
  2014-12-10  3:39               ` Tian, Kevin
  2014-12-10  9:01                 ` Jan Beulich
@ 2014-12-10 11:12                 ` Tim Deegan
  2014-12-11  2:03                   ` Tian, Kevin
  1 sibling, 1 reply; 106+ messages in thread
From: Tim Deegan @ 2014-12-10 11:12 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: wei.liu2, ian.campbell, stefano.stabellini, ian.jackson,
	xen-devel, Jan Beulich, Zhang, Yang Z, Chen, Tiejun

Hi Kevin,

Thanks for taking the time to work through this.

At 03:39 +0000 on 10 Dec (1418179184), Tian, Kevin wrote:
> 1. It's more efficient for new people to start from a small, well-defined task
> in one area, and then spanning to adjacent areas gradually. Patience must 
> be given by the community to help them grow;
> 
> 2. Unfortunately this RMRR effort grows from original two patches (very VT-d
> focused) to now v8 17 patches spanning many areas (including hypercall, mmu, 
> domain builder, hvmloader, etc.), and thus imposing both long learning curve
> and lots of frustrations being no achievement returned. Though having a complete
> solution is desired, we need to help split the big task into step-by-step approach 
> as long as:
> 	- the overall design is agreed
> 	- each step is self-contained 
> 	- each step won't be wasted moving forward. 
> That way new people can see incremental achievements, instead of being hit 
> down before final success. 
> 
> Take this RMRR for example. Maybe we can split into steps like:
> 
> 	step-1: setup RMRR identity mapping in hypervisor. fail if confliction. no
> user space changes
> 	step-2: expose RMRR knowledge to userspace and detect confliction in
> domain builder and hvmloader
> 	step-3: reconcile e820/mmio to avoid confliction in a best effort
> 	step-4: miscellaneous enhancements, each can be ACK-ed individually:
> 		- handle guest CPU access to RMRR regions
> 		- handle conflicting RMRR regions
> 		- handle group RMRRs
> 		- re-enable USB device assignment
> 
> That way Tiejun can focus on a self-contained task at one time, and then observe
> continuous acknowledgements for his work. We don't need to claim RMRR fully
> ready in Xen until last patch is accepted, but that at least provides a way to ack
> new people when working on complex features and also allow for partial usage 
> on his work.

We had this discussion before and I think it was clear that the
maintainers in general are unhappy with a partial solution.  OTOH, if
we can agree on the roadmap, and Intel will commit to seeing the work
through, it might be possible.  I think Jan is the man to convince
here. :)

Now since the code's not going to be in 4.5 anyway, it should be
possible to _develop_ it in this manner, possibly in a private branch,
even if the early stages aren't getting applied immediately.  We
should be able to set up an rmrr branch on the public servers if that
helps.  But again, that relies on having a design worked out in
advance that both developers and maintainers are (within reason)
committed to.

> 3. Maintainers sometimes didn't give definitive guidance to the new people, 
> and the high-level design is not closed in the 1st place. e.g. when I thought this v8
> version has everyone agreed on the design, there's another comment from Jan
> about using XENMEM_set_memory_map might be better, while back to Aug.
> when Tiejun raised that option the answer is "I'm not sure". Note I understand
> as a maintainer he might not have definite answers for all opens. But w/o a
> definitive guide new people may waste lots of effort on chasing a wrong option,
> which is definitely frustrating. 

This is definitely a problem, and indeed frustrating for the
developers.  We had similar difficulties with PVH development, where
even though we planned the architecture up-front, once the code was
written it became clear that a different approach would have been
better.  I'm not sure what we can do here to make it more likely that
we get the design right first time.

I do think that figuring out the design in advance makes these
projects much smoother, and it's something we're seeing more of.  For
example David Vrabel's designs for the new event channel interface, or
indeed the design doc he wrote about grant table locking, where review
at the design stage may have avoided a bunch of implementation effort
altogether.

> So I'd suggest for such non-trivial task for a new people, all maintainers in 
> involved areas (xen, mmu, tools, vtd, etc) should first spend time to agree 
> on the high level design. At that stage, let's skip the coding problems to save
> both time.

That sounds like a very good idea to me.

Cheers,

Tim.

* Re: [v8][PATCH 04/17] update the existing hypercall to support XEN_DOMCTL_set_rdm
  2014-12-10  1:59                 ` Chen, Tiejun
@ 2014-12-10 20:21                   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 106+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-12-10 20:21 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	ian.jackson, Tim Deegan, xen-devel, Jan Beulich, yang.z.zhang

On Wed, Dec 10, 2014 at 09:59:12AM +0800, Chen, Tiejun wrote:
> On 2014/12/9 18:22, Jan Beulich wrote:
> >>>>On 09.12.14 at 11:11, <tim@xen.org> wrote:
> >>At 08:19 +0000 on 09 Dec (1418109561), Jan Beulich wrote:
> >>>Why do you always pick other than the simplest possible solution?
> >>
> >>Jan, please don't make personal comments like this in code review.  It
> >>can only offend and demoralize contributors, and deter other people
> >>from joining in.
> >
> >I apologize - I shouldn't have permitted myself to do so.
> >
> >>I understand that it can be frustrating, and I'm sure I have lashed
> 
> Actually myself also have this same feeling but this really isn't helpful to
> step forward.
> 
> >>out at people on-list in the past.  But remember that people who are
> >>new to the project need time to learn, and keep the comments to the
> >>code itself.
> >>
> >>I can see that this series has been going for a long time, and is
> >>still getting hung up on coding style issues.  Might it be useful to
> 
> Something is my fault here. Although I learn more about Xen from this series
> constantly, looks its hard to cover this big thread with that reasonable
> code under Xen circumstance. This is also why I claimed I can't do this
> right from the start.
> 
> And additionally, even some initial designs are not very clear or argued
> between us, so when we discuss something during each revision, it bring a
> new thought again...

<nods>
> 
> >>have a round of internal review from some of the more xen-experienced
> >>engineers at Intel before Jan looks at it again?
> >
> >I've been suggesting something along those lines a number of times,
> >with (apparently) no success at all.
> >
> 
> Actually we had a little bit discussion internal and some guys really
> brought me some comments previously, but I think they're too busy to review
> each patch carefully one line by one line, especially if I bring something
> new to address yours comments just in the process of public review.
> 
> Anyway, now let me ask if this can deliver to other suitable guy. As I said
> previously I really would like to quit if this can step next properly.

Bummer. I was really looking forward to more patches from you. I understand
the frustration of working through this (my first patches in Linux took 9
months!) and I was hoping that, with you having to go through some pretty
complex code and get it done (with twists and turns at every corner), the
end result would be:
 - You would understand the Xen codebase at an expert level!
 - And would be able to contribute to Xen on other features knowing the
   pitfalls, etc.
 - You would be able to show other engineers who start working on Xen
   how to do it.
 - Comfortably review other folks' patches.

Is there something that can be done for you to not step away?

* Re: [v8][PATCH 04/17] update the existing hypercall to support XEN_DOMCTL_set_rdm
  2014-12-10 11:12                 ` Tim Deegan
@ 2014-12-11  2:03                   ` Tian, Kevin
  2014-12-11 13:09                     ` Tim Deegan
  0 siblings, 1 reply; 106+ messages in thread
From: Tian, Kevin @ 2014-12-11  2:03 UTC (permalink / raw)
  To: Tim Deegan
  Cc: wei.liu2, ian.campbell, stefano.stabellini, ian.jackson,
	xen-devel, Jan Beulich, Zhang, Yang Z, Chen, Tiejun

> From: Tim Deegan [mailto:tim@xen.org]
> Sent: Wednesday, December 10, 2014 7:12 PM
> 
> Hi Kevin,
> 
> Thanks for taking the time to work through this.
> 
> At 03:39 +0000 on 10 Dec (1418179184), Tian, Kevin wrote:
> > 1. It's more efficient for new people to start from a small, well-defined task
> > in one area, and then spanning to adjacent areas gradually. Patience must
> > be given by the community to help them grow;
> >
> > 2. Unfortunately this RMRR effort grows from original two patches (very
> VT-d
> > focused) to now v8 17 patches spanning many areas (including hypercall,
> mmu,
> > domain builder, hvmloader, etc.), and thus imposing both long learning curve
> > and lots of frustrations being no achievement returned. Though having a
> complete
> > solution is desired, we need to help split the big task into step-by-step
> approach
> > as long as:
> > 	- the overall design is agreed
> > 	- each step is self-contained
> > 	- each step won't be wasted moving forward.
> > That way new people can see incremental achievements, instead of being hit
> > down before final success.
> >
> > Take this RMRR for example. Maybe we can split into steps like:
> >
> > 	step-1: setup RMRR identity mapping in hypervisor. fail if confliction. no
> > user space changes
> > 	step-2: expose RMRR knowledge to userspace and detect confliction in
> > domain builder and hvmloader
> > 	step-3: reconcile e820/mmio to avoid confliction in a best effort
> > 	step-4: miscellaneous enhancements, each can be ACK-ed individually:
> > 		- handle guest CPU access to RMRR regions
> > 		- handle conflicting RMRR regions
> > 		- handle group RMRRs
> > 		- re-enable USB device assignment
> >
> > That way Tiejun can focus on a self-contained task at one time, and then
> observe
> > continuous acknowledgements for his work. We don't need to claim RMRR
> fully
> > ready in Xen until last patch is accepted, but that at least provides a way to
> ack
> > new people when working on complex features and also allow for partial
> usage
> > on his work.
> 
> We had this discussion before and I think it was clear that the
> maintainers in general are unhappy with a partial solution.  OTOH, if
> we can agree on the roadmap, and Intel will commit to seeing the work
> through, it might be possible.  I think Jan is the man to convince
> here. :)

I think with the 8 series Tiejun has sent out so far, there's no doubt Intel is
committed to delivering a complete piece of work. :-)

> 
> Now since the code's not going to be in 4.5 anyway, it should be
> possible to _develop_ it in this manner, possibly in a private branch,
> even if the early stages aren't getting applied immediately.  We
> should be able to set up an rmrr branch on the public servers if that
> helps.  But again, that relies on having a design worked out in
> advance that both developers and maintainers are (within reason)
> committed to.

That's a good suggestion. We'll send out a design review today, and then,
based on the discussion, we can see whether it makes sense to proceed in such
an incremental way.

> 
> > 3. Maintainers sometimes didn't give definitive guidance to the new people,
> > and the high-level design is not closed in the 1st place. e.g. when I thought
> this v8
> > version has everyone agreed on the design, there's another comment from
> Jan
> > about using XENMEM_set_memory_map might be better, while back to Aug.
> > when Tiejun raised that option the answer is "I'm not sure". Note I
> understand
> > as a maintainer he might not have definite answers for all opens. But w/o a
> > definitive guide new people may waste lots of effort on chasing a wrong
> option,
> > which is definitely frustrating.
> 
> This is definitely a problem, and indeed frustrating for the
> developers.  We had similar difficulties with PVH development, where
> even though we planned the architecture up-front, once the code was
> written it became clear that a different approach would have been
> better.  I'm not sure what we can do here to make it more likely that
> we get the design right first time.

Understood, and reasonable.

> 
> I do think that figuring out the design in advance makes these
> projects much smoother, and it's something we're seeing more of.  For
> example David Vrabel's designs for the new event channel interface, or
> indeed the design doc he wrote about grant table locking, where review
> at the design stage may have avoided a bunch of implementation effort
> altogether.

Yes, a formal review in advance would be a lot better than design comments 
scattered among deep code review comments. For this RMRR example, Tiejun did 
send out a summary for his patch set, but it was not abstracted enough to draw 
people's attention to the key design questions. And having the 4.5 window as a 
goal really gave him a hard time balancing code refactoring against learning 
new areas when each series was questioned with new design input.

> 
> > So I'd suggest for such non-trivial task for a new people, all maintainers in
> > involved areas (xen, mmu, tools, vtd, etc) should first spend time to agree
> > on the high level design. At that stage, let's skip the coding problems to save
> > both time.
> 
> That sounds like a very good idea to me.
> 

Thanks a lot,
Kevin

* Re: [v8][PATCH 04/17] update the existing hypercall to support XEN_DOMCTL_set_rdm
  2014-12-11  2:03                   ` Tian, Kevin
@ 2014-12-11 13:09                     ` Tim Deegan
  2014-12-18 16:13                       ` Tim Deegan
  0 siblings, 1 reply; 106+ messages in thread
From: Tim Deegan @ 2014-12-11 13:09 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: wei.liu2, ian.campbell, stefano.stabellini, ian.jackson,
	xen-devel, Jan Beulich, Zhang, Yang Z, Chen, Tiejun

At 02:03 +0000 on 11 Dec (1418259797), Tian, Kevin wrote:
> > From: Tim Deegan [mailto:tim@xen.org]
> > Now since the code's not going to be in 4.5 anyway, it should be
> > possible to _develop_ it in this manner, possibly in a private branch,
> > even if the early stages aren't getting applied immediately.  We
> > should be able to set up an rmrr branch on the public servers if that
> > helps.  But again, that relies on having a design worked out in
> > advance that both developers and maintainers are (within reason)
> > committed to.
> 
> that's a good suggestion. We'll send out a design review today, and then
> based on discussion we can see whether making sense to do such 
> incremental way.

Sounds good!

Cheers,

Tim.

* Re: [v8][PATCH 04/17] update the existing hypercall to support XEN_DOMCTL_set_rdm
  2014-12-11 13:09                     ` Tim Deegan
@ 2014-12-18 16:13                       ` Tim Deegan
  2014-12-19  1:03                         ` Chen, Tiejun
  0 siblings, 1 reply; 106+ messages in thread
From: Tim Deegan @ 2014-12-18 16:13 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: wei.liu2, ian.campbell, stefano.stabellini, ian.jackson,
	xen-devel, Jan Beulich, Zhang, Yang Z, Chen, Tiejun

Hi Kevin,

At 14:09 +0100 on 11 Dec (1418303386), Tim Deegan wrote:
> At 02:03 +0000 on 11 Dec (1418259797), Tian, Kevin wrote:
> > > From: Tim Deegan [mailto:tim@xen.org]
> > > Now since the code's not going to be in 4.5 anyway, it should be
> > > possible to _develop_ it in this manner, possibly in a private branch,
> > > even if the early stages aren't getting applied immediately.  We
> > > should be able to set up an rmrr branch on the public servers if that
> > > helps.  But again, that relies on having a design worked out in
> > > advance that both developers and maintainers are (within reason)
> > > committed to.
> > 
> > that's a good suggestion. We'll send out a design review today, and then
> > based on discussion we can see whether making sense to do such 
> > incremental way.
> 
> Sounds good!

I haven't seen this design doc yet -- if I missed it can someone point
me to it?

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 13/17] xen/mem_access: don't allow accessing reserved device memory
  2014-12-02 14:54   ` Julien Grall
@ 2014-12-18 22:56     ` Tamas K Lengyel
  0 siblings, 0 replies; 106+ messages in thread
From: Tamas K Lengyel @ 2014-12-18 22:56 UTC (permalink / raw)
  To: Julien Grall
  Cc: Tian, Kevin, wei.liu2, Ian Campbell, Stefano Stabellini,
	Tim Deegan, Ian Jackson, xen-devel, Jan Beulich, yang.z.zhang,
	Tiejun Chen, Tamas K Lengyel



I agree with Julien below, this should probably be put into the p2m layer.
The ARM definition of the function could then simply be an inline
definition that just returns 0.
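
For illustration only, a minimal sketch of what that split might look like
(the helper name and file placement are assumptions on my part, not taken
from the posted series):

    /* Hypothetical common code, xen/common/mem_access.c: call an
     * arch-provided helper instead of touching IOMMU internals here. */
    rc = p2m_mem_access_check_rdm(d, start, nr);

    /* Hypothetical x86 declaration, asm-x86/p2m.h: the real check lives
     * in x86 p2m code, where reserved device memory is meaningful. */
    int p2m_mem_access_check_rdm(struct domain *d, uint64_t start,
                                 uint32_t nr);

    /* Hypothetical ARM stub, asm-arm/p2m.h: no RMRRs on ARM, so the
     * inline definition just returns 0 (nothing to reject). */
    static inline int p2m_mem_access_check_rdm(struct domain *d,
                                               uint64_t start, uint32_t nr)
    {
        return 0;
    }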

Tamas

On Tue, Dec 2, 2014 at 9:54 AM, Julien Grall <julien.grall@linaro.org>
wrote:
>
> Hi,
>
> CC Tamas as he did some work on memaccess for ARM.
>
> On 01/12/14 09:24, Tiejun Chen wrote:
> > We can't expost those reserved device memory in case of mem_access
>
> s/expost/expose/
>
> > since any access may corrupt device usage.
> >
> > Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> > ---
> >  xen/common/mem_access.c | 41 +++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 41 insertions(+)
> >
> > diff --git a/xen/common/mem_access.c b/xen/common/mem_access.c
> > index 6c2724b..72a807a 100644
> > --- a/xen/common/mem_access.c
> > +++ b/xen/common/mem_access.c
> > @@ -55,6 +55,43 @@ void mem_access_resume(struct domain *d)
> >      }
> >  }
> >
> > +/* We can't expose reserved device memory. */
> > +static int mem_access_check_rdm(struct domain *d, uint64_aligned_t start,
> > +                                uint32_t nr)
> > +{
> > +    uint32_t i;
> > +    struct p2m_get_reserved_device_memory pgrdm;
>
> p2m_get_reserved_device_memory is only defined on x86. This will fail to
> compile on ARM when memaccess is enabled.
>
> > +    int rc = 0;
> > +
> > +    if ( !is_hardware_domain(d) && iommu_use_hap_pt(d) )
> > +    {
> > +        for ( i = 0; i < nr; i++ )
> > +        {
> > +            pgrdm.gfn = start + i;
> > +            pgrdm.domain = d;
> > +            rc = iommu_get_reserved_device_memory(p2m_check_reserved_device_memory,
> > +                                                  &pgrdm);
>
> Same here.
>
> Overall, I'm not sure it's worth introducing this code in the common
> part, as it doesn't seem useful for ARM.
>
> In any case, you have to at least stub those bits for ARM.
>
> Regards,
>
> --
> Julien Grall


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [v8][PATCH 04/17] update the existing hypercall to support XEN_DOMCTL_set_rdm
  2014-12-18 16:13                       ` Tim Deegan
@ 2014-12-19  1:03                         ` Chen, Tiejun
  0 siblings, 0 replies; 106+ messages in thread
From: Chen, Tiejun @ 2014-12-19  1:03 UTC (permalink / raw)
  To: Tim Deegan, Tian, Kevin
  Cc: wei.liu2, ian.campbell, stefano.stabellini, ian.jackson,
	xen-devel, Jan Beulich, Zhang, Yang Z

On 2014/12/19 0:13, Tim Deegan wrote:
> Hi Kevin,
>
> At 14:09 +0100 on 11 Dec (1418303386), Tim Deegan wrote:
>> At 02:03 +0000 on 11 Dec (1418259797), Tian, Kevin wrote:
>>>> From: Tim Deegan [mailto:tim@xen.org]
>>>> Now since the code's not going to be in 4.5 anyway, it should be
>>>> possible to _develop_ it in this manner, possibly in a private branch,
>>>> even if the early stages aren't getting applied immediately.  We
>>>> should be able to set up an rmrr branch on the public servers if that
>>>> helps.  But again, that relies on having a design worked out in
>>>> advance that both developers and maintainers are (within reason)
>>>> committed to.
>>>
>>> that's a good suggestion. We'll send out a design review today, and then
>>> based on discussion we can see whether making sense to do such
>>> incremental way.
>>
>> Sounds good!
>
> I haven't seen this design doc yet -- if I missed it can someone point
> me to it?
>

No.

It's still being reviewed/discussed internally, but it will come soon.

Tiejun

^ permalink raw reply	[flat|nested] 106+ messages in thread

end of thread, other threads:[~2014-12-19  1:03 UTC | newest]

Thread overview: 106+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-01  9:24 [v8][PATCH 00/17] xen: RMRR fix Tiejun Chen
2014-12-01  9:24 ` [v8][PATCH 01/17] tools/hvmloader: link errno.h from xen internal Tiejun Chen
2014-12-01  9:24 ` [v8][PATCH 02/17] introduce XEN_DOMCTL_set_rdm Tiejun Chen
2014-12-02  8:33   ` Tian, Kevin
2014-12-08  1:30     ` Chen, Tiejun
2014-12-02 19:39   ` Konrad Rzeszutek Wilk
2014-12-08  3:16     ` Chen, Tiejun
2014-12-08 15:57       ` Konrad Rzeszutek Wilk
2014-12-09  1:06         ` Chen, Tiejun
2014-12-09  8:33           ` Jan Beulich
2014-12-09 16:36             ` Konrad Rzeszutek Wilk
2014-12-04 15:33   ` Jan Beulich
2014-12-05  6:13     ` Tian, Kevin
2014-12-08  6:06     ` Chen, Tiejun
2014-12-08  8:43       ` Jan Beulich
2014-12-09  2:38         ` Chen, Tiejun
2014-12-09  7:29           ` Jan Beulich
2014-12-01  9:24 ` [v8][PATCH 03/17] introduce XENMEM_reserved_device_memory_map Tiejun Chen
2014-12-02 19:47   ` Konrad Rzeszutek Wilk
2014-12-08  6:17     ` Chen, Tiejun
2014-12-08 10:00       ` Jan Beulich
2014-12-08 16:45         ` Daniel De Graaf
2014-12-08 16:54           ` Jan Beulich
2014-12-01  9:24 ` [v8][PATCH 04/17] update the existing hypercall to support XEN_DOMCTL_set_rdm Tiejun Chen
2014-12-02  8:46   ` Tian, Kevin
2014-12-08  6:22     ` Chen, Tiejun
2014-12-04 15:50   ` Jan Beulich
2014-12-08  7:11     ` Chen, Tiejun
2014-12-08  8:51       ` Jan Beulich
2014-12-09  7:47         ` Chen, Tiejun
2014-12-09  8:19           ` Jan Beulich
2014-12-09  9:12             ` Chen, Tiejun
2014-12-09  9:21               ` Jan Beulich
2014-12-09  9:35                 ` Chen, Tiejun
2014-12-09 10:11             ` Tim Deegan
2014-12-09 10:22               ` Jan Beulich
2014-12-10  1:59                 ` Chen, Tiejun
2014-12-10 20:21                   ` Konrad Rzeszutek Wilk
2014-12-10  3:39               ` Tian, Kevin
2014-12-10  9:01                 ` Jan Beulich
2014-12-10  9:57                   ` Tian, Kevin
2014-12-10 11:12                 ` Tim Deegan
2014-12-11  2:03                   ` Tian, Kevin
2014-12-11 13:09                     ` Tim Deegan
2014-12-18 16:13                       ` Tim Deegan
2014-12-19  1:03                         ` Chen, Tiejun
2014-12-01  9:24 ` [v8][PATCH 05/17] tools/libxc: introduce hypercall for xc_reserved_device_memory_map Tiejun Chen
2014-12-02  8:46   ` Tian, Kevin
2014-12-02 19:50   ` Konrad Rzeszutek Wilk
2014-12-08  7:25     ` Chen, Tiejun
2014-12-08 15:52       ` Konrad Rzeszutek Wilk
2014-12-01  9:24 ` [v8][PATCH 06/17] tools/libxc: check if modules space is overlapping with reserved device memory Tiejun Chen
2014-12-02  8:54   ` Tian, Kevin
2014-12-02 19:55   ` Konrad Rzeszutek Wilk
2014-12-08  7:49     ` Chen, Tiejun
2014-12-01  9:24 ` [v8][PATCH 07/17] hvmloader/util: get reserved device memory maps Tiejun Chen
2014-12-02  8:59   ` Tian, Kevin
2014-12-08  7:55     ` Chen, Tiejun
2014-12-02 20:01   ` Konrad Rzeszutek Wilk
2014-12-08  8:09     ` Chen, Tiejun
2014-12-08  8:45       ` Chen, Tiejun
2014-12-04 15:52   ` Jan Beulich
2014-12-08  8:52     ` Chen, Tiejun
2014-12-08  9:18       ` Jan Beulich
2014-12-01  9:24 ` [v8][PATCH 08/17] hvmloader/mmio: reconcile guest mmio with reserved device memory Tiejun Chen
2014-12-02  9:11   ` Tian, Kevin
2014-12-08  9:04     ` Chen, Tiejun
2014-12-04 16:04   ` Jan Beulich
2014-12-08  9:10     ` Chen, Tiejun
2014-12-01  9:24 ` [v8][PATCH 09/17] hvmloader/ram: check if guest memory is out of reserved device memory maps Tiejun Chen
2014-12-02  9:42   ` Tian, Kevin
2014-12-02 20:17   ` Konrad Rzeszutek Wilk
2014-12-04 16:20   ` Jan Beulich
2014-12-05  6:23     ` Tian, Kevin
2014-12-05  7:43       ` Jan Beulich
2014-12-01  9:24 ` [v8][PATCH 10/17] hvmloader/mem_hole_alloc: skip any overlap with reserved device memory Tiejun Chen
2014-12-02  9:48   ` Tian, Kevin
2014-12-02 20:23   ` Konrad Rzeszutek Wilk
2014-12-04 16:28   ` Jan Beulich
2014-12-05  6:24     ` Tian, Kevin
2014-12-05  7:46       ` Jan Beulich
2014-12-01  9:24 ` [v8][PATCH 11/17] xen/x86/p2m: reject populating for reserved device memory mapping Tiejun Chen
2014-12-02  9:57   ` Tian, Kevin
2014-12-04 16:42   ` Jan Beulich
2014-12-01  9:24 ` [v8][PATCH 12/17] xen/x86/ept: handle reserved device memory in ept_handle_violation Tiejun Chen
2014-12-02  9:59   ` Tian, Kevin
2014-12-02 20:26   ` Konrad Rzeszutek Wilk
2014-12-04 16:46   ` Jan Beulich
2014-12-01  9:24 ` [v8][PATCH 13/17] xen/mem_access: don't allow accessing reserved device memory Tiejun Chen
2014-12-02 14:54   ` Julien Grall
2014-12-18 22:56     ` Tamas K Lengyel
2014-12-02 20:27   ` Konrad Rzeszutek Wilk
2014-12-04 16:51   ` Jan Beulich
2014-12-01  9:24 ` [v8][PATCH 14/17] xen/x86/p2m: introduce set_identity_p2m_entry Tiejun Chen
2014-12-02 10:00   ` Tian, Kevin
2014-12-02 20:29   ` Konrad Rzeszutek Wilk
2014-12-01  9:24 ` [v8][PATCH 15/17] xen:vtd: create RMRR mapping Tiejun Chen
2014-12-02 10:02   ` Tian, Kevin
2014-12-02 20:30   ` Konrad Rzeszutek Wilk
2014-12-01  9:24 ` [v8][PATCH 16/17] xen/vtd: group assigned device with RMRR Tiejun Chen
2014-12-02 10:11   ` Tian, Kevin
2014-12-02 20:40   ` Konrad Rzeszutek Wilk
2014-12-04 17:05   ` Jan Beulich
2014-12-01  9:24 ` [v8][PATCH 17/17] xen/vtd: re-enable USB device assignment if enable pci_force Tiejun Chen
2014-12-05 16:12   ` Konrad Rzeszutek Wilk
2014-12-02 19:17 ` [v8][PATCH 00/17] xen: RMRR fix Konrad Rzeszutek Wilk
