All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC] xen: arm: use uncached foreign mappings when building guests
@ 2013-12-18 17:28 Ian Campbell
  2013-12-18 17:30 ` [PATCH LINUX RFC] xen: privcmd: implement IOCTL_PRIVCMD_MMAPBATCH_V2_UNCACHED Ian Campbell
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Ian Campbell @ 2013-12-18 17:28 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Jackson, Tim Deegan, Julien Grall, Stefano.Stabellini,
	David Vrabel, Boris Ostrovsky

When building an ARM guest we need to take care of cache maintenance
because the guest starts with MMU and cache disabled, which means we
need to make sure that the initial images (kernel, initrd, dtb) which we
write to guest memory are not in the cache.

We thought we had solved this with "tools: libxc: flush data cache after
loading images into guest memory" (a0035ecc0d82) however it turns out
that there are a couple of issues with this approach:

Firstly we need to do a cache flush from userspace, on arm64 this is
possible by directly using the instructions from userspace, but on arm32
this is not possible and so we need to use a system call. Unfortunately
the system call provided by Linux for this purpose does not flush far
enough down the cache hierarchy. Extending the system call would not be
an insurmountable barrier, were it not for the second issue:

Secondly, and more importantly, Catalin Marinas points out (via Marc
Zyngier) that there is a race between the cache flush and the point
where we tear down the mappings, where the processor might speculatively
pull some data into the cache (cache flushes are by Virtual Address, so
this race is unavoidable).

If this happens then guest kernels which modify some code/data before
enabling MMUs + caches may see stale data in the cache.

The solution to this is to use a non-cacheable mapping to populate the
guest RAM, which prevents the processor from pulling anything into the
cache lines. Since we need to write all the way through to RAM anyway
there is not any downside to this.

In order to do this we need a new privcmd ioctl which creates uncached
mappings and a libxc patch to use it. Both of these will follow.

I'm sending this out early to validate that this approach does actually
work and makes sense.

I've tried this on arm64 and it seems ok, Julien, please can you give
this a go in your midway test case which exposes this issue.

The libxc patch here is especially RFC and not at all ready to go, the
kernel patch is pretty simple and probably OK, but still flagged as RFC
since it is part of the whole pair.

Ian.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH LINUX RFC] xen: privcmd: implement IOCTL_PRIVCMD_MMAPBATCH_V2_UNCACHED
  2013-12-18 17:28 [PATCH RFC] xen: arm: use uncached foreign mappings when building guests Ian Campbell
@ 2013-12-18 17:30 ` Ian Campbell
  2013-12-18 19:24   ` Stefano Stabellini
  2013-12-18 21:16   ` Konrad Rzeszutek Wilk
  2013-12-18 17:30 ` [PATCH XEN RFC] libxc: use an uncached mapping of guest ram in domain builder Ian Campbell
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 22+ messages in thread
From: Ian Campbell @ 2013-12-18 17:30 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Campbell, julien.grall, tim, David Vrabel, xen-devel,
	Boris Ostrovsky, stefano.stabellini

On ARM we want to use uncached foreign mappings while building the domain
because the guests start with MMU and caches disabled.

Flushing the caches before launching the guest is problematic because there is
a window between flush and unmap where the processor might speculatively fill
a cache line. Using a non-cacheable mapping completely avoids this.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Stefano.Stabellini@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: xen-devel@lists.xenproject.org
---
 drivers/xen/privcmd.c      |   15 ++++++++++-----
 include/uapi/xen/privcmd.h |    2 ++
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 569a13b..b5561d1 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -253,6 +253,7 @@ struct mmap_batch_state {
 	domid_t domain;
 	unsigned long va;
 	struct vm_area_struct *vma;
+	pgprot_t prot;
 	int index;
 	/* A tristate:
 	 *      0 for no errors
@@ -285,8 +286,7 @@ static int mmap_batch_fn(void *data, void *state)
 		cur_page = pages[st->index++];
 
 	ret = xen_remap_domain_mfn_range(st->vma, st->va & PAGE_MASK, *mfnp, 1,
-					 st->vma->vm_page_prot, st->domain,
-					 &cur_page);
+					 st->prot, st->domain, &cur_page);
 
 	/* Store error code for second pass. */
 	if (st->version == 1) {
@@ -367,7 +367,7 @@ static int alloc_empty_pages(struct vm_area_struct *vma, int numpgs)
 
 static struct vm_operations_struct privcmd_vm_ops;
 
-static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
+static long privcmd_ioctl_mmap_batch(void __user *udata, int version, int cached)
 {
 	int ret;
 	struct privcmd_mmapbatch_v2 m;
@@ -464,6 +464,8 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
 
 	state.domain        = m.dom;
 	state.vma           = vma;
+	state.prot          = cached ? vma->vm_page_prot
+				     : pgprot_noncached(vma->vm_page_prot);
 	state.va            = m.addr;
 	state.index         = 0;
 	state.global_error  = 0;
@@ -514,13 +516,16 @@ static long privcmd_ioctl(struct file *file,
 		break;
 
 	case IOCTL_PRIVCMD_MMAPBATCH:
-		ret = privcmd_ioctl_mmap_batch(udata, 1);
+		ret = privcmd_ioctl_mmap_batch(udata, 1, 1);
 		break;
 
 	case IOCTL_PRIVCMD_MMAPBATCH_V2:
-		ret = privcmd_ioctl_mmap_batch(udata, 2);
+		ret = privcmd_ioctl_mmap_batch(udata, 2, 1);
 		break;
 
+	case IOCTL_PRIVCMD_MMAPBATCH_V2_UNCACHED:
+		ret = privcmd_ioctl_mmap_batch(udata, 2, 0);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/include/uapi/xen/privcmd.h b/include/uapi/xen/privcmd.h
index a853168..be7e72b 100644
--- a/include/uapi/xen/privcmd.h
+++ b/include/uapi/xen/privcmd.h
@@ -94,5 +94,7 @@ struct privcmd_mmapbatch_v2 {
 	_IOC(_IOC_NONE, 'P', 3, sizeof(struct privcmd_mmapbatch))
 #define IOCTL_PRIVCMD_MMAPBATCH_V2				\
 	_IOC(_IOC_NONE, 'P', 4, sizeof(struct privcmd_mmapbatch_v2))
+#define IOCTL_PRIVCMD_MMAPBATCH_V2_UNCACHED			\
+	_IOC(_IOC_NONE, 'P', 5, sizeof(struct privcmd_mmapbatch_v2))
 
 #endif /* __LINUX_PUBLIC_PRIVCMD_H__ */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH XEN RFC] libxc: use an uncached mapping of guest ram in domain builder
  2013-12-18 17:28 [PATCH RFC] xen: arm: use uncached foreign mappings when building guests Ian Campbell
  2013-12-18 17:30 ` [PATCH LINUX RFC] xen: privcmd: implement IOCTL_PRIVCMD_MMAPBATCH_V2_UNCACHED Ian Campbell
@ 2013-12-18 17:30 ` Ian Campbell
  2013-12-18 21:14   ` Konrad Rzeszutek Wilk
  2013-12-18 18:41 ` [PATCH RFC] xen: arm: use uncached foreign mappings when building guests David Vrabel
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Ian Campbell @ 2013-12-18 17:30 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Jackson, julien.grall, tim, Ian Campbell, stefano.stabellini

VERY MUCH A WIP.

On ARM guest OSes are started with MMU and Caches disables (as they are on
native) however caching is enabled in the domain running the builder and
therefore we must use an explcitly uncached mapping, otherwise when the guest
starts running it may not see them.

Cache flushes are not sufficient because there is a race between the flush and
the unmap, where the processor may speculatively fill a cache line. Thanks to
Catalin Marinas and Marc Zyngier for pointing this out.

Therefore use the newly introduced IOCTL_PRIVCMD_MMAPBATCH_V2_UNCACHED on ARM
when mapping guest memory from xc_dom_* (which means xc_dom_boot_domU_map in
practice).  This avoids issues with the processor dirtying cache lines while
the guest will then fail to see because it starts with MMU and caches
disabled.

The xc_map_foreign_* functions here are a bit of a twisty maze, so far I have
just updated exactly the call path which is used by the function which I
needed to update. I have not yet considered non-xc_linux*.c nor tried to
disentangle this into a saner patch, nor any level of consistency with the
other foreign mapping functions. I've not even tried to compile on x86.

This applies on top of a revert of "tools: libxc: flush data cache after
loading images into guest memory" (a0035ecc0d82) which I will include in the
eventual proper posting.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
---
 tools/include/xen-sys/Linux/privcmd.h |    2 ++
 tools/libxc/xc_core_x86.c             |    4 ++--
 tools/libxc/xc_dom_boot.c             |    2 +-
 tools/libxc/xc_domain_restore.c       |    4 ++--
 tools/libxc/xc_domain_save.c          |    4 ++--
 tools/libxc/xc_foreign_memory.c       |   19 ++++++++++++++-----
 tools/libxc/xc_gnttab.c               |    2 +-
 tools/libxc/xc_linux_osdep.c          |   18 +++++++++++++-----
 tools/libxc/xc_private.h              |    3 +++
 tools/libxc/xenctrl.h                 |    5 +++--
 tools/libxc/xenctrl_osdep_ENOSYS.c    |    4 ++--
 tools/libxc/xenctrlosdep.h            |    4 ++--
 12 files changed, 47 insertions(+), 24 deletions(-)

diff --git a/tools/include/xen-sys/Linux/privcmd.h b/tools/include/xen-sys/Linux/privcmd.h
index 5be860a..e3a92a0 100644
--- a/tools/include/xen-sys/Linux/privcmd.h
+++ b/tools/include/xen-sys/Linux/privcmd.h
@@ -88,5 +88,7 @@ typedef struct privcmd_mmapbatch_v2 {
 	_IOC(_IOC_NONE, 'P', 3, sizeof(privcmd_mmapbatch_t))
 #define IOCTL_PRIVCMD_MMAPBATCH_V2				\
 	_IOC(_IOC_NONE, 'P', 4, sizeof(privcmd_mmapbatch_v2_t))
+#define IOCTL_PRIVCMD_MMAPBATCH_V2_UNCACHED			\
+	_IOC(_IOC_NONE, 'P', 5, sizeof(privcmd_mmapbatch_v2_t))
 
 #endif /* __LINUX_PUBLIC_PRIVCMD_H__ */
diff --git a/tools/libxc/xc_core_x86.c b/tools/libxc/xc_core_x86.c
index e328dcf..8e73bc4 100644
--- a/tools/libxc/xc_core_x86.c
+++ b/tools/libxc/xc_core_x86.c
@@ -131,7 +131,7 @@ xc_core_arch_map_p2m_rw(xc_interface *xch, struct domain_info_context *dinfo, xc
     live_p2m_frame_list =
         xc_map_foreign_pages(xch, dom, PROT_READ,
                              p2m_frame_list_list,
-                             P2M_FLL_ENTRIES);
+                             P2M_FLL_ENTRIES, 1);
 
     if ( !live_p2m_frame_list )
     {
@@ -159,7 +159,7 @@ xc_core_arch_map_p2m_rw(xc_interface *xch, struct domain_info_context *dinfo, xc
     *live_p2m = xc_map_foreign_pages(xch, dom,
                                     rw ? (PROT_READ | PROT_WRITE) : PROT_READ,
                                     p2m_frame_list,
-                                    P2M_FL_ENTRIES);
+                                    P2M_FL_ENTRIES, 1);
 
     if ( !*live_p2m )
     {
diff --git a/tools/libxc/xc_dom_boot.c b/tools/libxc/xc_dom_boot.c
index fdfeaf8..691fc2b 100644
--- a/tools/libxc/xc_dom_boot.c
+++ b/tools/libxc/xc_dom_boot.c
@@ -179,7 +179,7 @@ void *xc_dom_boot_domU_map(struct xc_dom_image *dom, xen_pfn_t pfn,
     for ( i = 0; i < count; i++ )
         entries[i].mfn = xc_dom_p2m_host(dom, pfn + i);
 
-    ptr = xc_map_foreign_ranges(dom->xch, dom->guest_domid,
+    ptr = xc_map_foreign_ranges_uncached(dom->xch, dom->guest_domid,
                 count << page_shift, PROT_READ | PROT_WRITE, 1 << page_shift,
                 entries, count);
     if ( ptr == NULL )
diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
index 80769a7..a4c621b 100644
--- a/tools/libxc/xc_domain_restore.c
+++ b/tools/libxc/xc_domain_restore.c
@@ -1885,7 +1885,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
             if ( (i == (dinfo->p2m_size-1)) || (j == MAX_BATCH_SIZE) )
             {
                 region_base = xc_map_foreign_pages(
-                    xch, dom, PROT_READ | PROT_WRITE, region_mfn, j);
+                    xch, dom, PROT_READ | PROT_WRITE, region_mfn, j, 1);
                 if ( region_base == NULL )
                 {
                     PERROR("map batch failed");
@@ -2222,7 +2222,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
 
     /* Copy the P2M we've constructed to the 'live' P2M */
     if ( !(ctx->live_p2m = xc_map_foreign_pages(xch, dom, PROT_WRITE,
-                                           p2m_frame_list, P2M_FL_ENTRIES)) )
+                                                p2m_frame_list, P2M_FL_ENTRIES, 1)) )
     {
         PERROR("Couldn't map p2m table");
         goto out;
diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
index 42c4752..00c97f7 100644
--- a/tools/libxc/xc_domain_save.c
+++ b/tools/libxc/xc_domain_save.c
@@ -631,7 +631,7 @@ static xen_pfn_t *map_and_save_p2m_table(xc_interface *xch,
     live_p2m_frame_list =
         xc_map_foreign_pages(xch, dom, PROT_READ,
                              p2m_frame_list_list,
-                             P2M_FLL_ENTRIES);
+                             P2M_FLL_ENTRIES, 1);
     if ( !live_p2m_frame_list )
     {
         PERROR("Couldn't map p2m_frame_list");
@@ -666,7 +666,7 @@ static xen_pfn_t *map_and_save_p2m_table(xc_interface *xch,
 
     p2m = xc_map_foreign_pages(xch, dom, PROT_READ,
                                p2m_frame_list,
-                               P2M_FL_ENTRIES);
+                               P2M_FL_ENTRIES, 1);
     if ( !p2m )
     {
         PERROR("Couldn't map p2m table");
diff --git a/tools/libxc/xc_foreign_memory.c b/tools/libxc/xc_foreign_memory.c
index 7dfc817..ac4793d 100644
--- a/tools/libxc/xc_foreign_memory.c
+++ b/tools/libxc/xc_foreign_memory.c
@@ -21,7 +21,7 @@
 #include "xc_private.h"
 
 void *xc_map_foreign_pages(xc_interface *xch, uint32_t dom, int prot,
-                           const xen_pfn_t *arr, int num)
+                           const xen_pfn_t *arr, int num, int cached)
 {
     void *res;
     int i, *err;
@@ -35,7 +35,7 @@ void *xc_map_foreign_pages(xc_interface *xch, uint32_t dom, int prot,
     if (!err)
         return NULL;
 
-    res = xc_map_foreign_bulk(xch, dom, prot, arr, err, num);
+    res = xc_map_foreign_bulk(xch, dom, prot, arr, err, num, cached);
     if (res) {
         for (i = 0; i < num; i++) {
             if (err[i]) {
@@ -63,7 +63,15 @@ void *xc_map_foreign_ranges(xc_interface *xch, uint32_t dom,
                             privcmd_mmap_entry_t entries[], int nentries)
 {
     return xch->ops->u.privcmd.map_foreign_ranges(xch, xch->ops_handle,
-                                                  dom, size, prot, chunksize, entries, nentries);
+                                                  dom, size, prot, chunksize, entries, nentries, 1);
+}
+
+void *xc_map_foreign_ranges_uncached(xc_interface *xch, uint32_t dom,
+                            size_t size, int prot, size_t chunksize,
+                            privcmd_mmap_entry_t entries[], int nentries)
+{
+    return xch->ops->u.privcmd.map_foreign_ranges(xch, xch->ops_handle,
+                                                  dom, size, prot, chunksize, entries, nentries, 0);
 }
 
 void *xc_map_foreign_batch(xc_interface *xch, uint32_t dom, int prot,
@@ -74,10 +82,11 @@ void *xc_map_foreign_batch(xc_interface *xch, uint32_t dom, int prot,
 }
 
 void *xc_map_foreign_bulk(xc_interface *xch, uint32_t dom, int prot,
-                          const xen_pfn_t *arr, int *err, unsigned int num)
+                          const xen_pfn_t *arr, int *err, unsigned int num,
+                          int cached)
 {
     return xch->ops->u.privcmd.map_foreign_bulk(xch, xch->ops_handle,
-                                                dom, prot, arr, err, num);
+                                                dom, prot, arr, err, num, cached);
 }
 
 /* stub for all not yet converted OSes */
diff --git a/tools/libxc/xc_gnttab.c b/tools/libxc/xc_gnttab.c
index 79dab40..4774c69 100644
--- a/tools/libxc/xc_gnttab.c
+++ b/tools/libxc/xc_gnttab.c
@@ -114,7 +114,7 @@ static void *_gnttab_map_table(xc_interface *xch, int domid, int *gnt_num)
         pfn_list[i] = frame_list[i];
 
     gnt = xc_map_foreign_pages(xch, domid, PROT_READ, pfn_list,
-                               setup.nr_frames);
+                               setup.nr_frames, 1);
     if ( !gnt )
     {
         ERROR("Could not map grant table\n");
diff --git a/tools/libxc/xc_linux_osdep.c b/tools/libxc/xc_linux_osdep.c
index 73860a2..09aae37 100644
--- a/tools/libxc/xc_linux_osdep.c
+++ b/tools/libxc/xc_linux_osdep.c
@@ -246,7 +246,8 @@ out:
 
 static void *linux_privcmd_map_foreign_bulk(xc_interface *xch, xc_osdep_handle h,
                                             uint32_t dom, int prot,
-                                            const xen_pfn_t *arr, int *err, unsigned int num)
+                                            const xen_pfn_t *arr, int *err,
+                                            unsigned int num, int cached)
 {
     int fd = (int)h;
     privcmd_mmapbatch_v2_t ioctlx;
@@ -254,6 +255,11 @@ static void *linux_privcmd_map_foreign_bulk(xc_interface *xch, xc_osdep_handle h
     unsigned int i;
     int rc;
 
+#if defined(__i386__) || defined(__x86_64__)
+    /* No need for cache maintenance on x86 */
+    cached = 1;
+#endif
+
     addr = mmap(NULL, (unsigned long)num << XC_PAGE_SHIFT, prot, MAP_SHARED,
                 fd, 0);
     if ( addr == MAP_FAILED )
@@ -268,7 +274,9 @@ static void *linux_privcmd_map_foreign_bulk(xc_interface *xch, xc_osdep_handle h
     ioctlx.arr = arr;
     ioctlx.err = err;
 
-    rc = ioctl(fd, IOCTL_PRIVCMD_MMAPBATCH_V2, &ioctlx);
+    rc = ioctl(fd, cached ? IOCTL_PRIVCMD_MMAPBATCH_V2
+                          : IOCTL_PRIVCMD_MMAPBATCH_V2_UNCACHED,
+               &ioctlx);
 
     /* Command was recognized, some gfn in arr are in paging state */
     if ( rc < 0 && errno == ENOENT )
@@ -384,7 +392,7 @@ static void *linux_privcmd_map_foreign_range(xc_interface *xch, xc_osdep_handle
     for ( i = 0; i < num; i++ )
         arr[i] = mfn + i;
 
-    ret = xc_map_foreign_pages(xch, dom, prot, arr, num);
+    ret = xc_map_foreign_pages(xch, dom, prot, arr, num, 1);
     free(arr);
     return ret;
 }
@@ -392,7 +400,7 @@ static void *linux_privcmd_map_foreign_range(xc_interface *xch, xc_osdep_handle
 static void *linux_privcmd_map_foreign_ranges(xc_interface *xch, xc_osdep_handle h,
                                               uint32_t dom, size_t size, int prot,
                                               size_t chunksize, privcmd_mmap_entry_t entries[],
-                                              int nentries)
+                                              int nentries, int cached)
 {
     xen_pfn_t *arr;
     int num_per_entry;
@@ -411,7 +419,7 @@ static void *linux_privcmd_map_foreign_ranges(xc_interface *xch, xc_osdep_handle
         for ( j = 0; j < num_per_entry; j++ )
             arr[i * num_per_entry + j] = entries[i].mfn + j;
 
-    ret = xc_map_foreign_pages(xch, dom, prot, arr, num);
+    ret = xc_map_foreign_pages(xch, dom, prot, arr, num, 1);
     free(arr);
     return ret;
 }
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index 92271c9..e040179 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -294,6 +294,9 @@ int do_memory_op(xc_interface *xch, int cmd, void *arg, size_t len);
 void *xc_map_foreign_ranges(xc_interface *xch, uint32_t dom,
                             size_t size, int prot, size_t chunksize,
                             privcmd_mmap_entry_t entries[], int nentries);
+void *xc_map_foreign_ranges_uncached(xc_interface *xch, uint32_t dom,
+                            size_t size, int prot, size_t chunksize,
+                            privcmd_mmap_entry_t entries[], int nentries);
 
 int xc_get_pfn_type_batch(xc_interface *xch, uint32_t dom,
                           unsigned int num, xen_pfn_t *);
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 6e58ebe..b20731c 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -1324,7 +1324,7 @@ void *xc_map_foreign_range(xc_interface *xch, uint32_t dom,
                             unsigned long mfn );
 
 void *xc_map_foreign_pages(xc_interface *xch, uint32_t dom, int prot,
-                           const xen_pfn_t *arr, int num );
+                           const xen_pfn_t *arr, int num, int cached);
 
 /**
  * DEPRECATED - use xc_map_foreign_bulk() instead.
@@ -1342,7 +1342,8 @@ void *xc_map_foreign_batch(xc_interface *xch, uint32_t dom, int prot,
  * set to the corresponding errno value.
  */
 void *xc_map_foreign_bulk(xc_interface *xch, uint32_t dom, int prot,
-                          const xen_pfn_t *arr, int *err, unsigned int num);
+                          const xen_pfn_t *arr, int *err, unsigned int num,
+			  int cached);
 
 /**
  * Translates a virtual address in the context of a given domain and
diff --git a/tools/libxc/xenctrl_osdep_ENOSYS.c b/tools/libxc/xenctrl_osdep_ENOSYS.c
index 4821342..d26c696 100644
--- a/tools/libxc/xenctrl_osdep_ENOSYS.c
+++ b/tools/libxc/xenctrl_osdep_ENOSYS.c
@@ -42,7 +42,7 @@ static void *ENOSYS_privcmd_map_foreign_batch(xc_interface *xch, xc_osdep_handle
 }
 
 static void *ENOSYS_privcmd_map_foreign_bulk(xc_interface *xch, xc_osdep_handle h, uint32_t dom, int prot,
-                                     const xen_pfn_t *arr, int *err, unsigned int num)
+                                     const xen_pfn_t *arr, int *err, unsigned int num, int cached)
 {
     IPRINTF(xch, "ENOSYS_privcmd %p: map_foreign_buld: dom%d prot %#x arr %p err %p num %d\n", h, dom, prot, arr, err, num);
     return MAP_FAILED;
@@ -57,7 +57,7 @@ static void *ENOSYS_privcmd_map_foreign_range(xc_interface *xch, xc_osdep_handle
 
 static void *ENOSYS_privcmd_map_foreign_ranges(xc_interface *xch, xc_osdep_handle h, uint32_t dom, size_t size, int prot,
                                        size_t chunksize, privcmd_mmap_entry_t entries[],
-                                       int nentries)
+                                       int nentries, int cached)
 {
     IPRINTF(xch, "ENOSYS_privcmd %p: map_foreign_ranges: dom%d size %zd prot %#x chunksize %zd entries %p num %d\n", h, dom, size, prot, chunksize, entries, nentries);
     return MAP_FAILED;
diff --git a/tools/libxc/xenctrlosdep.h b/tools/libxc/xenctrlosdep.h
index e610a24..3daf757 100644
--- a/tools/libxc/xenctrlosdep.h
+++ b/tools/libxc/xenctrlosdep.h
@@ -83,12 +83,12 @@ struct xc_osdep_ops
             void *(*map_foreign_batch)(xc_interface *xch, xc_osdep_handle h, uint32_t dom, int prot,
                                        xen_pfn_t *arr, int num);
             void *(*map_foreign_bulk)(xc_interface *xch, xc_osdep_handle h, uint32_t dom, int prot,
-                                      const xen_pfn_t *arr, int *err, unsigned int num);
+                                      const xen_pfn_t *arr, int *err, unsigned int num, int cached);
             void *(*map_foreign_range)(xc_interface *xch, xc_osdep_handle h, uint32_t dom, int size, int prot,
                                        unsigned long mfn);
             void *(*map_foreign_ranges)(xc_interface *xch, xc_osdep_handle h, uint32_t dom, size_t size, int prot,
                                         size_t chunksize, privcmd_mmap_entry_t entries[],
-                                        int nentries);
+                                        int nentries, int cached);
         } privcmd;
         struct {
             int (*fd)(xc_evtchn *xce, xc_osdep_handle h);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH RFC] xen: arm: use uncached foreign mappings when building guests
  2013-12-18 17:28 [PATCH RFC] xen: arm: use uncached foreign mappings when building guests Ian Campbell
  2013-12-18 17:30 ` [PATCH LINUX RFC] xen: privcmd: implement IOCTL_PRIVCMD_MMAPBATCH_V2_UNCACHED Ian Campbell
  2013-12-18 17:30 ` [PATCH XEN RFC] libxc: use an uncached mapping of guest ram in domain builder Ian Campbell
@ 2013-12-18 18:41 ` David Vrabel
  2013-12-19 10:10   ` Ian Campbell
  2013-12-19  4:16 ` Julien Grall
  2013-12-19  4:26 ` Julien Grall
  4 siblings, 1 reply; 22+ messages in thread
From: David Vrabel @ 2013-12-18 18:41 UTC (permalink / raw)
  To: Ian Campbell, xen-devel
  Cc: Ian Jackson, Tim Deegan, Julien Grall, Stefano.Stabellini,
	David Vrabel, Boris Ostrovsky

On 18/12/2013 17:28, Ian Campbell wrote:
> When building an ARM guest we need to take care of cache maintenance
> because the guest starts with MMU and cache disabled, which means we
> need to make sure that the initial images (kernel, initrd, dtb) which we
> write to guest memory are not in the cache.
> 
> We thought we had solved this with "tools: libxc: flush data cache after
> loading images into guest memory" (a0035ecc0d82) however it turns out
> that there are a couple of issues with this approach:
> 
> Firstly we need to do a cache flush from userspace, on arm64 this is
> possible by directly using the instructions from userspace, but on arm32
> this is not possible and so we need to use a system call. Unfortunately
> the system call provided by Linux for this purpose does not flush far
> enough down the cache hierarchy. Extending the system call would not be
> an insurmountable barrier, were it not for the second issue:
> 
> Secondly, and more importantly, Catalin Marinas points out (via Marc
> Zyngier) that there is a race between the cache flush and the point
> where we tear down the mappings, where the processor might speculatively
> pull some data into the cache (cache flushes are by Virtual Address, so
> this race is unavoidable).
> 
> If this happens then guest kernels which modify some code/data before
> enabling MMUs + caches may see stale data in the cache.

Would this same problem with occur with save/restore of a guest that has
caching disabled?  If so, this would require using uncached foreign
mappings for save/restore which I would think would make performance suck.

Does the same problem occur if the guest has MMU and caching enabled but
has uncached mappings?

Is there not some sort of cache flush you can do /after/ tearing down
the foreign mappings?  Flush all by ASID or similar?

Would this also require that granted pages must only have cachable
mappings in the granting domain?  If so, this should be documented as
part of the ABI.

David

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH LINUX RFC] xen: privcmd: implement IOCTL_PRIVCMD_MMAPBATCH_V2_UNCACHED
  2013-12-18 17:30 ` [PATCH LINUX RFC] xen: privcmd: implement IOCTL_PRIVCMD_MMAPBATCH_V2_UNCACHED Ian Campbell
@ 2013-12-18 19:24   ` Stefano Stabellini
  2013-12-18 21:16   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 22+ messages in thread
From: Stefano Stabellini @ 2013-12-18 19:24 UTC (permalink / raw)
  To: Ian Campbell
  Cc: julien.grall, stefano.stabellini, tim, xen-devel, David Vrabel,
	xen-devel, Boris Ostrovsky

On Wed, 18 Dec 2013, Ian Campbell wrote:
> On ARM we want to use uncached foreign mappings while building the domain
> because the guests start with MMU and caches disabled.
> 
> Flushing the caches before launching the guest is problematic because there is
> a window between flush and unmap where the processor might speculatively fill
> a cache line. Using a non-cacheable mapping completely avoids this.
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Cc: Stefano.Stabellini@citrix.com>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: David Vrabel <david.vrabel@citrix.com>
> Cc: xen-devel@lists.xenproject.org

Very nice and simple


>  drivers/xen/privcmd.c      |   15 ++++++++++-----
>  include/uapi/xen/privcmd.h |    2 ++
>  2 files changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
> index 569a13b..b5561d1 100644
> --- a/drivers/xen/privcmd.c
> +++ b/drivers/xen/privcmd.c
> @@ -253,6 +253,7 @@ struct mmap_batch_state {
>  	domid_t domain;
>  	unsigned long va;
>  	struct vm_area_struct *vma;
> +	pgprot_t prot;
>  	int index;
>  	/* A tristate:
>  	 *      0 for no errors
> @@ -285,8 +286,7 @@ static int mmap_batch_fn(void *data, void *state)
>  		cur_page = pages[st->index++];
>  
>  	ret = xen_remap_domain_mfn_range(st->vma, st->va & PAGE_MASK, *mfnp, 1,
> -					 st->vma->vm_page_prot, st->domain,
> -					 &cur_page);
> +					 st->prot, st->domain, &cur_page);
>  
>  	/* Store error code for second pass. */
>  	if (st->version == 1) {
> @@ -367,7 +367,7 @@ static int alloc_empty_pages(struct vm_area_struct *vma, int numpgs)
>  
>  static struct vm_operations_struct privcmd_vm_ops;
>  
> -static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
> +static long privcmd_ioctl_mmap_batch(void __user *udata, int version, int cached)
>  {
>  	int ret;
>  	struct privcmd_mmapbatch_v2 m;
> @@ -464,6 +464,8 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
>  
>  	state.domain        = m.dom;
>  	state.vma           = vma;
> +	state.prot          = cached ? vma->vm_page_prot
> +				     : pgprot_noncached(vma->vm_page_prot);
>  	state.va            = m.addr;
>  	state.index         = 0;
>  	state.global_error  = 0;
> @@ -514,13 +516,16 @@ static long privcmd_ioctl(struct file *file,
>  		break;
>  
>  	case IOCTL_PRIVCMD_MMAPBATCH:
> -		ret = privcmd_ioctl_mmap_batch(udata, 1);
> +		ret = privcmd_ioctl_mmap_batch(udata, 1, 1);
>  		break;
>  
>  	case IOCTL_PRIVCMD_MMAPBATCH_V2:
> -		ret = privcmd_ioctl_mmap_batch(udata, 2);
> +		ret = privcmd_ioctl_mmap_batch(udata, 2, 1);
>  		break;
>  
> +	case IOCTL_PRIVCMD_MMAPBATCH_V2_UNCACHED:
> +		ret = privcmd_ioctl_mmap_batch(udata, 2, 0);
> +		break;
>  	default:
>  		ret = -EINVAL;
>  		break;
> diff --git a/include/uapi/xen/privcmd.h b/include/uapi/xen/privcmd.h
> index a853168..be7e72b 100644
> --- a/include/uapi/xen/privcmd.h
> +++ b/include/uapi/xen/privcmd.h
> @@ -94,5 +94,7 @@ struct privcmd_mmapbatch_v2 {
>  	_IOC(_IOC_NONE, 'P', 3, sizeof(struct privcmd_mmapbatch))
>  #define IOCTL_PRIVCMD_MMAPBATCH_V2				\
>  	_IOC(_IOC_NONE, 'P', 4, sizeof(struct privcmd_mmapbatch_v2))
> +#define IOCTL_PRIVCMD_MMAPBATCH_V2_UNCACHED			\
> +	_IOC(_IOC_NONE, 'P', 5, sizeof(struct privcmd_mmapbatch_v2))
>  
>  #endif /* __LINUX_PUBLIC_PRIVCMD_H__ */
> -- 
> 1.7.10.4
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH XEN RFC] libxc: use an uncached mapping of guest ram in domain builder
  2013-12-18 17:30 ` [PATCH XEN RFC] libxc: use an uncached mapping of guest ram in domain builder Ian Campbell
@ 2013-12-18 21:14   ` Konrad Rzeszutek Wilk
  2013-12-20  9:19     ` Ian Campbell
  0 siblings, 1 reply; 22+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-12-18 21:14 UTC (permalink / raw)
  To: Ian Campbell
  Cc: tim, julien.grall, Ian Jackson, stefano.stabellini, xen-devel

On Wed, Dec 18, 2013 at 05:30:47PM +0000, Ian Campbell wrote:
> VERY MUCH A WIP.
> 
> On ARM guest OSes are started with MMU and Caches disables (as they are on
> native) however caching is enabled in the domain running the builder and
> therefore we must use an explcitly uncached mapping, otherwise when the guest
> starts running it may not see them.
> 
> Cache flushes are not sufficient because there is a race between the flush and
> the unmap, where the processor may speculatively fill a cache line. Thanks to
> Catalin Marinas and Marc Zyngier for pointing this out.
> 
> Therefore use the newly introduced IOCTL_PRIVCMD_MMAPBATCH_V2_UNCACHED on ARM
> when mapping guest memory from xc_dom_* (which means xc_dom_boot_domU_map in
> practice).  This avoids issues with the processor dirtying cache lines while
> the guest will then fail to see because it starts with MMU and caches
> disabled.

Why not make the default IOCTL_PRIVCMD_MMAPBATCH_V2 when running under ARM
do this?

> 
> The xc_map_foreign_* functions here are a bit of a twisty maze, so far I have
> just updated exactly the call path which is used by the function which I
> needed to update. I have not yet considered non-xc_linux*.c nor tried to
> disentangle this into a saner patch, nor any level of consistency with the
> other foreign mapping functions. I've not even tried to compile on x86.
> 
> This applies on top of a revert of "tools: libxc: flush data cache after
> loading images into guest memory" (a0035ecc0d82) which I will include in the
> eventual proper posting.
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> ---
>  tools/include/xen-sys/Linux/privcmd.h |    2 ++
>  tools/libxc/xc_core_x86.c             |    4 ++--
>  tools/libxc/xc_dom_boot.c             |    2 +-
>  tools/libxc/xc_domain_restore.c       |    4 ++--
>  tools/libxc/xc_domain_save.c          |    4 ++--
>  tools/libxc/xc_foreign_memory.c       |   19 ++++++++++++++-----
>  tools/libxc/xc_gnttab.c               |    2 +-
>  tools/libxc/xc_linux_osdep.c          |   18 +++++++++++++-----
>  tools/libxc/xc_private.h              |    3 +++
>  tools/libxc/xenctrl.h                 |    5 +++--
>  tools/libxc/xenctrl_osdep_ENOSYS.c    |    4 ++--
>  tools/libxc/xenctrlosdep.h            |    4 ++--
>  12 files changed, 47 insertions(+), 24 deletions(-)
> 
> diff --git a/tools/include/xen-sys/Linux/privcmd.h b/tools/include/xen-sys/Linux/privcmd.h
> index 5be860a..e3a92a0 100644
> --- a/tools/include/xen-sys/Linux/privcmd.h
> +++ b/tools/include/xen-sys/Linux/privcmd.h
> @@ -88,5 +88,7 @@ typedef struct privcmd_mmapbatch_v2 {
>  	_IOC(_IOC_NONE, 'P', 3, sizeof(privcmd_mmapbatch_t))
>  #define IOCTL_PRIVCMD_MMAPBATCH_V2				\
>  	_IOC(_IOC_NONE, 'P', 4, sizeof(privcmd_mmapbatch_v2_t))
> +#define IOCTL_PRIVCMD_MMAPBATCH_V2_UNCACHED			\
> +	_IOC(_IOC_NONE, 'P', 5, sizeof(privcmd_mmapbatch_v2_t))
>  
>  #endif /* __LINUX_PUBLIC_PRIVCMD_H__ */
> diff --git a/tools/libxc/xc_core_x86.c b/tools/libxc/xc_core_x86.c
> index e328dcf..8e73bc4 100644
> --- a/tools/libxc/xc_core_x86.c
> +++ b/tools/libxc/xc_core_x86.c
> @@ -131,7 +131,7 @@ xc_core_arch_map_p2m_rw(xc_interface *xch, struct domain_info_context *dinfo, xc
>      live_p2m_frame_list =
>          xc_map_foreign_pages(xch, dom, PROT_READ,
>                               p2m_frame_list_list,
> -                             P2M_FLL_ENTRIES);
> +                             P2M_FLL_ENTRIES, 1);
>  
>      if ( !live_p2m_frame_list )
>      {
> @@ -159,7 +159,7 @@ xc_core_arch_map_p2m_rw(xc_interface *xch, struct domain_info_context *dinfo, xc
>      *live_p2m = xc_map_foreign_pages(xch, dom,
>                                      rw ? (PROT_READ | PROT_WRITE) : PROT_READ,
>                                      p2m_frame_list,
> -                                    P2M_FL_ENTRIES);
> +                                    P2M_FL_ENTRIES, 1);
>  
>      if ( !*live_p2m )
>      {
> diff --git a/tools/libxc/xc_dom_boot.c b/tools/libxc/xc_dom_boot.c
> index fdfeaf8..691fc2b 100644
> --- a/tools/libxc/xc_dom_boot.c
> +++ b/tools/libxc/xc_dom_boot.c
> @@ -179,7 +179,7 @@ void *xc_dom_boot_domU_map(struct xc_dom_image *dom, xen_pfn_t pfn,
>      for ( i = 0; i < count; i++ )
>          entries[i].mfn = xc_dom_p2m_host(dom, pfn + i);
>  
> -    ptr = xc_map_foreign_ranges(dom->xch, dom->guest_domid,
> +    ptr = xc_map_foreign_ranges_uncached(dom->xch, dom->guest_domid,
>                  count << page_shift, PROT_READ | PROT_WRITE, 1 << page_shift,
>                  entries, count);
>      if ( ptr == NULL )
> diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
> index 80769a7..a4c621b 100644
> --- a/tools/libxc/xc_domain_restore.c
> +++ b/tools/libxc/xc_domain_restore.c
> @@ -1885,7 +1885,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
>              if ( (i == (dinfo->p2m_size-1)) || (j == MAX_BATCH_SIZE) )
>              {
>                  region_base = xc_map_foreign_pages(
> -                    xch, dom, PROT_READ | PROT_WRITE, region_mfn, j);
> +                    xch, dom, PROT_READ | PROT_WRITE, region_mfn, j, 1);
>                  if ( region_base == NULL )
>                  {
>                      PERROR("map batch failed");
> @@ -2222,7 +2222,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
>  
>      /* Copy the P2M we've constructed to the 'live' P2M */
>      if ( !(ctx->live_p2m = xc_map_foreign_pages(xch, dom, PROT_WRITE,
> -                                           p2m_frame_list, P2M_FL_ENTRIES)) )
> +                                                p2m_frame_list, P2M_FL_ENTRIES, 1)) )
>      {
>          PERROR("Couldn't map p2m table");
>          goto out;
> diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
> index 42c4752..00c97f7 100644
> --- a/tools/libxc/xc_domain_save.c
> +++ b/tools/libxc/xc_domain_save.c
> @@ -631,7 +631,7 @@ static xen_pfn_t *map_and_save_p2m_table(xc_interface *xch,
>      live_p2m_frame_list =
>          xc_map_foreign_pages(xch, dom, PROT_READ,
>                               p2m_frame_list_list,
> -                             P2M_FLL_ENTRIES);
> +                             P2M_FLL_ENTRIES, 1);
>      if ( !live_p2m_frame_list )
>      {
>          PERROR("Couldn't map p2m_frame_list");
> @@ -666,7 +666,7 @@ static xen_pfn_t *map_and_save_p2m_table(xc_interface *xch,
>  
>      p2m = xc_map_foreign_pages(xch, dom, PROT_READ,
>                                 p2m_frame_list,
> -                               P2M_FL_ENTRIES);
> +                               P2M_FL_ENTRIES, 1);
>      if ( !p2m )
>      {
>          PERROR("Couldn't map p2m table");
> diff --git a/tools/libxc/xc_foreign_memory.c b/tools/libxc/xc_foreign_memory.c
> index 7dfc817..ac4793d 100644
> --- a/tools/libxc/xc_foreign_memory.c
> +++ b/tools/libxc/xc_foreign_memory.c
> @@ -21,7 +21,7 @@
>  #include "xc_private.h"
>  
>  void *xc_map_foreign_pages(xc_interface *xch, uint32_t dom, int prot,
> -                           const xen_pfn_t *arr, int num)
> +                           const xen_pfn_t *arr, int num, int cached)
>  {
>      void *res;
>      int i, *err;
> @@ -35,7 +35,7 @@ void *xc_map_foreign_pages(xc_interface *xch, uint32_t dom, int prot,
>      if (!err)
>          return NULL;
>  
> -    res = xc_map_foreign_bulk(xch, dom, prot, arr, err, num);
> +    res = xc_map_foreign_bulk(xch, dom, prot, arr, err, num, cached);
>      if (res) {
>          for (i = 0; i < num; i++) {
>              if (err[i]) {
> @@ -63,7 +63,15 @@ void *xc_map_foreign_ranges(xc_interface *xch, uint32_t dom,
>                              privcmd_mmap_entry_t entries[], int nentries)
>  {
>      return xch->ops->u.privcmd.map_foreign_ranges(xch, xch->ops_handle,
> -                                                  dom, size, prot, chunksize, entries, nentries);
> +                                                  dom, size, prot, chunksize, entries, nentries, 1);
> +}
> +
> +void *xc_map_foreign_ranges_uncached(xc_interface *xch, uint32_t dom,
> +                            size_t size, int prot, size_t chunksize,
> +                            privcmd_mmap_entry_t entries[], int nentries)
> +{
> +    return xch->ops->u.privcmd.map_foreign_ranges(xch, xch->ops_handle,
> +                                                  dom, size, prot, chunksize, entries, nentries, 0);
>  }
>  
>  void *xc_map_foreign_batch(xc_interface *xch, uint32_t dom, int prot,
> @@ -74,10 +82,11 @@ void *xc_map_foreign_batch(xc_interface *xch, uint32_t dom, int prot,
>  }
>  
>  void *xc_map_foreign_bulk(xc_interface *xch, uint32_t dom, int prot,
> -                          const xen_pfn_t *arr, int *err, unsigned int num)
> +                          const xen_pfn_t *arr, int *err, unsigned int num,
> +                          int cached)
>  {
>      return xch->ops->u.privcmd.map_foreign_bulk(xch, xch->ops_handle,
> -                                                dom, prot, arr, err, num);
> +                                                dom, prot, arr, err, num, cached);
>  }
>  
>  /* stub for all not yet converted OSes */
> diff --git a/tools/libxc/xc_gnttab.c b/tools/libxc/xc_gnttab.c
> index 79dab40..4774c69 100644
> --- a/tools/libxc/xc_gnttab.c
> +++ b/tools/libxc/xc_gnttab.c
> @@ -114,7 +114,7 @@ static void *_gnttab_map_table(xc_interface *xch, int domid, int *gnt_num)
>          pfn_list[i] = frame_list[i];
>  
>      gnt = xc_map_foreign_pages(xch, domid, PROT_READ, pfn_list,
> -                               setup.nr_frames);
> +                               setup.nr_frames, 1);
>      if ( !gnt )
>      {
>          ERROR("Could not map grant table\n");
> diff --git a/tools/libxc/xc_linux_osdep.c b/tools/libxc/xc_linux_osdep.c
> index 73860a2..09aae37 100644
> --- a/tools/libxc/xc_linux_osdep.c
> +++ b/tools/libxc/xc_linux_osdep.c
> @@ -246,7 +246,8 @@ out:
>  
>  static void *linux_privcmd_map_foreign_bulk(xc_interface *xch, xc_osdep_handle h,
>                                              uint32_t dom, int prot,
> -                                            const xen_pfn_t *arr, int *err, unsigned int num)
> +                                            const xen_pfn_t *arr, int *err,
> +                                            unsigned int num, int cached)
>  {
>      int fd = (int)h;
>      privcmd_mmapbatch_v2_t ioctlx;
> @@ -254,6 +255,11 @@ static void *linux_privcmd_map_foreign_bulk(xc_interface *xch, xc_osdep_handle h
>      unsigned int i;
>      int rc;
>  
> +#if defined(__i386__) || defined(__x86_64__)
> +    /* No need for cache maintenance on x86 */
> +    cached = 1;
> +#endif
> +
>      addr = mmap(NULL, (unsigned long)num << XC_PAGE_SHIFT, prot, MAP_SHARED,
>                  fd, 0);
>      if ( addr == MAP_FAILED )
> @@ -268,7 +274,9 @@ static void *linux_privcmd_map_foreign_bulk(xc_interface *xch, xc_osdep_handle h
>      ioctlx.arr = arr;
>      ioctlx.err = err;
>  
> -    rc = ioctl(fd, IOCTL_PRIVCMD_MMAPBATCH_V2, &ioctlx);
> +    rc = ioctl(fd, cached ? IOCTL_PRIVCMD_MMAPBATCH_V2
> +                          : IOCTL_PRIVCMD_MMAPBATCH_V2_UNCACHED,
> +               &ioctlx);
>  
>      /* Command was recognized, some gfn in arr are in paging state */
>      if ( rc < 0 && errno == ENOENT )
> @@ -384,7 +392,7 @@ static void *linux_privcmd_map_foreign_range(xc_interface *xch, xc_osdep_handle
>      for ( i = 0; i < num; i++ )
>          arr[i] = mfn + i;
>  
> -    ret = xc_map_foreign_pages(xch, dom, prot, arr, num);
> +    ret = xc_map_foreign_pages(xch, dom, prot, arr, num, 1);
>      free(arr);
>      return ret;
>  }
> @@ -392,7 +400,7 @@ static void *linux_privcmd_map_foreign_range(xc_interface *xch, xc_osdep_handle
>  static void *linux_privcmd_map_foreign_ranges(xc_interface *xch, xc_osdep_handle h,
>                                                uint32_t dom, size_t size, int prot,
>                                                size_t chunksize, privcmd_mmap_entry_t entries[],
> -                                              int nentries)
> +                                              int nentries, int cached)
>  {
>      xen_pfn_t *arr;
>      int num_per_entry;
> @@ -411,7 +419,7 @@ static void *linux_privcmd_map_foreign_ranges(xc_interface *xch, xc_osdep_handle
>          for ( j = 0; j < num_per_entry; j++ )
>              arr[i * num_per_entry + j] = entries[i].mfn + j;
>  
> -    ret = xc_map_foreign_pages(xch, dom, prot, arr, num);
> +    ret = xc_map_foreign_pages(xch, dom, prot, arr, num, 1);
>      free(arr);
>      return ret;
>  }
> diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
> index 92271c9..e040179 100644
> --- a/tools/libxc/xc_private.h
> +++ b/tools/libxc/xc_private.h
> @@ -294,6 +294,9 @@ int do_memory_op(xc_interface *xch, int cmd, void *arg, size_t len);
>  void *xc_map_foreign_ranges(xc_interface *xch, uint32_t dom,
>                              size_t size, int prot, size_t chunksize,
>                              privcmd_mmap_entry_t entries[], int nentries);
> +void *xc_map_foreign_ranges_uncached(xc_interface *xch, uint32_t dom,
> +                            size_t size, int prot, size_t chunksize,
> +                            privcmd_mmap_entry_t entries[], int nentries);
>  
>  int xc_get_pfn_type_batch(xc_interface *xch, uint32_t dom,
>                            unsigned int num, xen_pfn_t *);
> diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
> index 6e58ebe..b20731c 100644
> --- a/tools/libxc/xenctrl.h
> +++ b/tools/libxc/xenctrl.h
> @@ -1324,7 +1324,7 @@ void *xc_map_foreign_range(xc_interface *xch, uint32_t dom,
>                              unsigned long mfn );
>  
>  void *xc_map_foreign_pages(xc_interface *xch, uint32_t dom, int prot,
> -                           const xen_pfn_t *arr, int num );
> +                           const xen_pfn_t *arr, int num, int cached);
>  
>  /**
>   * DEPRECATED - use xc_map_foreign_bulk() instead.
> @@ -1342,7 +1342,8 @@ void *xc_map_foreign_batch(xc_interface *xch, uint32_t dom, int prot,
>   * set to the corresponding errno value.
>   */
>  void *xc_map_foreign_bulk(xc_interface *xch, uint32_t dom, int prot,
> -                          const xen_pfn_t *arr, int *err, unsigned int num);
> +                          const xen_pfn_t *arr, int *err, unsigned int num,
> +			  int cached);
>  
>  /**
>   * Translates a virtual address in the context of a given domain and
> diff --git a/tools/libxc/xenctrl_osdep_ENOSYS.c b/tools/libxc/xenctrl_osdep_ENOSYS.c
> index 4821342..d26c696 100644
> --- a/tools/libxc/xenctrl_osdep_ENOSYS.c
> +++ b/tools/libxc/xenctrl_osdep_ENOSYS.c
> @@ -42,7 +42,7 @@ static void *ENOSYS_privcmd_map_foreign_batch(xc_interface *xch, xc_osdep_handle
>  }
>  
>  static void *ENOSYS_privcmd_map_foreign_bulk(xc_interface *xch, xc_osdep_handle h, uint32_t dom, int prot,
> -                                     const xen_pfn_t *arr, int *err, unsigned int num)
> +                                     const xen_pfn_t *arr, int *err, unsigned int num, int cached)
>  {
>      IPRINTF(xch, "ENOSYS_privcmd %p: map_foreign_buld: dom%d prot %#x arr %p err %p num %d\n", h, dom, prot, arr, err, num);
>      return MAP_FAILED;
> @@ -57,7 +57,7 @@ static void *ENOSYS_privcmd_map_foreign_range(xc_interface *xch, xc_osdep_handle
>  
>  static void *ENOSYS_privcmd_map_foreign_ranges(xc_interface *xch, xc_osdep_handle h, uint32_t dom, size_t size, int prot,
>                                         size_t chunksize, privcmd_mmap_entry_t entries[],
> -                                       int nentries)
> +                                       int nentries, int cached)
>  {
>      IPRINTF(xch, "ENOSYS_privcmd %p: map_foreign_ranges: dom%d size %zd prot %#x chunksize %zd entries %p num %d\n", h, dom, size, prot, chunksize, entries, nentries);
>      return MAP_FAILED;
> diff --git a/tools/libxc/xenctrlosdep.h b/tools/libxc/xenctrlosdep.h
> index e610a24..3daf757 100644
> --- a/tools/libxc/xenctrlosdep.h
> +++ b/tools/libxc/xenctrlosdep.h
> @@ -83,12 +83,12 @@ struct xc_osdep_ops
>              void *(*map_foreign_batch)(xc_interface *xch, xc_osdep_handle h, uint32_t dom, int prot,
>                                         xen_pfn_t *arr, int num);
>              void *(*map_foreign_bulk)(xc_interface *xch, xc_osdep_handle h, uint32_t dom, int prot,
> -                                      const xen_pfn_t *arr, int *err, unsigned int num);
> +                                      const xen_pfn_t *arr, int *err, unsigned int num, int cached);
>              void *(*map_foreign_range)(xc_interface *xch, xc_osdep_handle h, uint32_t dom, int size, int prot,
>                                         unsigned long mfn);
>              void *(*map_foreign_ranges)(xc_interface *xch, xc_osdep_handle h, uint32_t dom, size_t size, int prot,
>                                          size_t chunksize, privcmd_mmap_entry_t entries[],
> -                                        int nentries);
> +                                        int nentries, int cached);
>          } privcmd;
>          struct {
>              int (*fd)(xc_evtchn *xce, xc_osdep_handle h);
> -- 
> 1.7.10.4
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH LINUX RFC] xen: privcmd: implement IOCTL_PRIVCMD_MMAPBATCH_V2_UNCACHED
  2013-12-18 17:30 ` [PATCH LINUX RFC] xen: privcmd: implement IOCTL_PRIVCMD_MMAPBATCH_V2_UNCACHED Ian Campbell
  2013-12-18 19:24   ` Stefano Stabellini
@ 2013-12-18 21:16   ` Konrad Rzeszutek Wilk
  2013-12-20  9:19     ` Ian Campbell
  1 sibling, 1 reply; 22+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-12-18 21:16 UTC (permalink / raw)
  To: Ian Campbell
  Cc: stefano.stabellini, julien.grall, tim, xen-devel, David Vrabel,
	xen-devel, Boris Ostrovsky

On Wed, Dec 18, 2013 at 05:30:37PM +0000, Ian Campbell wrote:
> On ARM we want to use uncached foreign mappings while building the domain
> because the guests start with MMU and caches disabled.
> 

Why introduce a new ioctl? Could we piggyback on the old one and on ARM
do the uncached.

> Flushing the caches before launching the guest is problematic because there is
> a window between flush and unmap where the processor might speculatively fill
> a cache line. Using a non-cacheable mapping completely avoids this.
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Cc: Stefano.Stabellini@citrix.com>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: David Vrabel <david.vrabel@citrix.com>
> Cc: xen-devel@lists.xenproject.org
> ---
>  drivers/xen/privcmd.c      |   15 ++++++++++-----
>  include/uapi/xen/privcmd.h |    2 ++
>  2 files changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
> index 569a13b..b5561d1 100644
> --- a/drivers/xen/privcmd.c
> +++ b/drivers/xen/privcmd.c
> @@ -253,6 +253,7 @@ struct mmap_batch_state {
>  	domid_t domain;
>  	unsigned long va;
>  	struct vm_area_struct *vma;
> +	pgprot_t prot;
>  	int index;
>  	/* A tristate:
>  	 *      0 for no errors
> @@ -285,8 +286,7 @@ static int mmap_batch_fn(void *data, void *state)
>  		cur_page = pages[st->index++];
>  
>  	ret = xen_remap_domain_mfn_range(st->vma, st->va & PAGE_MASK, *mfnp, 1,
> -					 st->vma->vm_page_prot, st->domain,
> -					 &cur_page);
> +					 st->prot, st->domain, &cur_page);
>  
>  	/* Store error code for second pass. */
>  	if (st->version == 1) {
> @@ -367,7 +367,7 @@ static int alloc_empty_pages(struct vm_area_struct *vma, int numpgs)
>  
>  static struct vm_operations_struct privcmd_vm_ops;
>  
> -static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
> +static long privcmd_ioctl_mmap_batch(void __user *udata, int version, int cached)
>  {
>  	int ret;
>  	struct privcmd_mmapbatch_v2 m;
> @@ -464,6 +464,8 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
>  
>  	state.domain        = m.dom;
>  	state.vma           = vma;
> +	state.prot          = cached ? vma->vm_page_prot
> +				     : pgprot_noncached(vma->vm_page_prot);
>  	state.va            = m.addr;
>  	state.index         = 0;
>  	state.global_error  = 0;
> @@ -514,13 +516,16 @@ static long privcmd_ioctl(struct file *file,
>  		break;
>  
>  	case IOCTL_PRIVCMD_MMAPBATCH:
> -		ret = privcmd_ioctl_mmap_batch(udata, 1);
> +		ret = privcmd_ioctl_mmap_batch(udata, 1, 1);
>  		break;
>  
>  	case IOCTL_PRIVCMD_MMAPBATCH_V2:
> -		ret = privcmd_ioctl_mmap_batch(udata, 2);
> +		ret = privcmd_ioctl_mmap_batch(udata, 2, 1);
>  		break;
>  
> +	case IOCTL_PRIVCMD_MMAPBATCH_V2_UNCACHED:
> +		ret = privcmd_ioctl_mmap_batch(udata, 2, 0);
> +		break;
>  	default:
>  		ret = -EINVAL;
>  		break;
> diff --git a/include/uapi/xen/privcmd.h b/include/uapi/xen/privcmd.h
> index a853168..be7e72b 100644
> --- a/include/uapi/xen/privcmd.h
> +++ b/include/uapi/xen/privcmd.h
> @@ -94,5 +94,7 @@ struct privcmd_mmapbatch_v2 {
>  	_IOC(_IOC_NONE, 'P', 3, sizeof(struct privcmd_mmapbatch))
>  #define IOCTL_PRIVCMD_MMAPBATCH_V2				\
>  	_IOC(_IOC_NONE, 'P', 4, sizeof(struct privcmd_mmapbatch_v2))
> +#define IOCTL_PRIVCMD_MMAPBATCH_V2_UNCACHED			\
> +	_IOC(_IOC_NONE, 'P', 5, sizeof(struct privcmd_mmapbatch_v2))
>  
>  #endif /* __LINUX_PUBLIC_PRIVCMD_H__ */
> -- 
> 1.7.10.4
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFC] xen: arm: use uncached foreign mappings when building guests
  2013-12-18 17:28 [PATCH RFC] xen: arm: use uncached foreign mappings when building guests Ian Campbell
                   ` (2 preceding siblings ...)
  2013-12-18 18:41 ` [PATCH RFC] xen: arm: use uncached foreign mappings when building guests David Vrabel
@ 2013-12-19  4:16 ` Julien Grall
  2013-12-19  4:26 ` Julien Grall
  4 siblings, 0 replies; 22+ messages in thread
From: Julien Grall @ 2013-12-19  4:16 UTC (permalink / raw)
  To: xen-devel



On 12/18/2013 05:28 PM, Ian Campbell wrote:
> I've tried this on arm64 and it seems ok, Julien, please can you give
> this a go in your midway test case which exposes this issue.

It's even worst :/. Crash every 2 times instead of 5 times.

Did you try to create/destroy numerous time? I have a script shell for 
this purpose:
#! /bin/bash

for i in `seq 1 20000`; do
     echo "Boot dom $i";
     xl create ~/dom1.xl
     sleep 5;
     xl destroy test;
     if [ $? -ne 0 ]; then
         exit
     fi
done


-- 
Julien Grall

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFC] xen: arm: use uncached foreign mappings when building guests
  2013-12-18 17:28 [PATCH RFC] xen: arm: use uncached foreign mappings when building guests Ian Campbell
                   ` (3 preceding siblings ...)
  2013-12-19  4:16 ` Julien Grall
@ 2013-12-19  4:26 ` Julien Grall
  2013-12-19 14:30   ` Ian Campbell
  4 siblings, 1 reply; 22+ messages in thread
From: Julien Grall @ 2013-12-19  4:26 UTC (permalink / raw)
  To: Ian Campbell, xen-devel
  Cc: Ian Jackson, Tim Deegan, Julien Grall, Stefano.Stabellini,
	David Vrabel, Boris Ostrovsky



On 12/18/2013 05:28 PM, Ian Campbell wrote:
>
> I've tried this on arm64 and it seems ok, Julien, please can you give
> this a go in your midway test case which exposes this issue.
>

It's even worst :/. Crash every 2 times instead of 5 times.

Did you try to create/destroy numerous time? I have a script shell for 
this purpose:
#! /bin/bash

for i in `seq 1 20000`; do
     echo "Boot dom $i";
     xl create ~/dom1.xl
     sleep 5;
     xl destroy test;
     if [ $? -ne 0 ]; then
         exit
     fi
done

To be sure that the guest crash "correctly", you will need to revert the 
commit 91082d7b877cbb40de73199ba3f769d98be94215 "xen: arm: inject 
unhandled instruction and data aborts to the guest.".
Otherwise the data/prefetch fault will be redirected to the guest and 
hide for the script shell.

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFC] xen: arm: use uncached foreign mappings when building guests
  2013-12-18 18:41 ` [PATCH RFC] xen: arm: use uncached foreign mappings when building guests David Vrabel
@ 2013-12-19 10:10   ` Ian Campbell
  2013-12-19 11:23     ` Stefano Stabellini
  2014-01-02 15:10     ` David Vrabel
  0 siblings, 2 replies; 22+ messages in thread
From: Ian Campbell @ 2013-12-19 10:10 UTC (permalink / raw)
  To: David Vrabel
  Cc: Ian Jackson, Tim Deegan, xen-devel, Julien Grall,
	Stefano.Stabellini, David Vrabel, Boris Ostrovsky

On Wed, 2013-12-18 at 18:41 +0000, David Vrabel wrote:
> On 18/12/2013 17:28, Ian Campbell wrote:
> > When building an ARM guest we need to take care of cache maintenance
> > because the guest starts with MMU and cache disabled, which means we
> > need to make sure that the initial images (kernel, initrd, dtb) which we
> > write to guest memory are not in the cache.
> > 
> > We thought we had solved this with "tools: libxc: flush data cache after
> > loading images into guest memory" (a0035ecc0d82) however it turns out
> > that there are a couple of issues with this approach:
> > 
> > Firstly we need to do a cache flush from userspace, on arm64 this is
> > possible by directly using the instructions from userspace, but on arm32
> > this is not possible and so we need to use a system call. Unfortunately
> > the system call provided by Linux for this purpose does not flush far
> > enough down the cache hierarchy. Extending the system call would not be
> > an insurmountable barrier, were it not for the second issue:
> > 
> > Secondly, and more importantly, Catalin Marinas points out (via Marc
> > Zyngier) that there is a race between the cache flush and the point
> > where we tear down the mappings, where the processor might speculatively
> > pull some data into the cache (cache flushes are by Virtual Address, so
> > this race is unavoidable).
> > 
> > If this happens then guest kernels which modify some code/data before
> > enabling MMUs + caches may see stale data in the cache.
> 
> Would this same problem with occur with save/restore of a guest that has
> caching disabled?

We basically only support guests running without caches at start of day.
We expect and require them to come up and enable caches and then keep
them enabled. Fortunately this is normal practice...

> If so, this would require using uncached foreign
> mappings for save/restore which I would think would make performance suck.

Indeed.

> Does the same problem occur if the guest has MMU and caching enabled but
> has uncached mappings?

Basically the only reason I can think of to have such a mapping would be
related to DMA for passthrough devices, and we wouldn't support
migration of a guest with a passthrough device.

If this does turn out to be a bridge we need to cross then I think a
strategy of ignoring uncached mappings during the live phase and
sweeping them up during the final stop and copy would work.

> Is there not some sort of cache flush you can do /after/ tearing down
> the foreign mappings?  Flush all by ASID or similar?

Sadly not. dcache flush options are by MVA (which has the described race
against unlocking) and flush by set/way which is not broadcast to all
processors and is therefore pretty useless on an SMP system (you would
need stop_machine or something similar to use it effectively). It's also
not terribly virt friendly, since it blows the cache for everyone...

> Would this also require that granted pages must only have cachable
> mappings in the granting domain?  If so, this should be documented as
> part of the ABI.

As of yesterday in arch-arm.h we have:
 * All hypercall arguments passed via a pointer to guest memory must
 * reside in memory which is mapped as Normal Inner-cacheable. Any
 * Inner cache allocation strategy (Write-Back, Write-Through etc) is
 * acceptable. There is no restriction on the Outer-cacheability.

I could have sworn I mentioned I/O in there when I wrote it. I must have
accidentally dropped it during my initial (unposted) drafts. Oops

8<------------------------

>From 8637d439e529b90b2e50ea148b935ba81a0d412d Mon Sep 17 00:00:00 2001
From: Ian Campbell <ian.campbell@citrix.com>
Date: Thu, 19 Dec 2013 10:08:39 +0000
Subject: [PATCH] xen: arm: further clarify the requirement for cached
 mappings

We need to include all shared memory, including grant table mappings etc
in this statement.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 xen/include/public/arch-arm.h |   15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
index ef6217d..2e7fb3e 100644
--- a/xen/include/public/arch-arm.h
+++ b/xen/include/public/arch-arm.h
@@ -59,10 +59,17 @@
  * used regardless of guest type. Structures which are passed as
  * hypercall arguments are always little endian.
  *
- * All hypercall arguments passed via a pointer to guest memory must
- * reside in memory which is mapped as Normal Inner-cacheable. Any
- * Inner cache allocation strategy (Write-Back, Write-Through etc) is
- * acceptable. There is no restriction on the Outer-cacheability.
+ * All memory which is shared with other entities in the system
+ * (including the hypervisor and other guests) must reside in memory
+ * which is mapped as Normal Inner-cacheable. This applies to:
+ *  - hypercall arguments passed via a pointer to guest memory.
+ *  - memory shared via the grant table mechanism (including PV I/O
+ *    rings etc).
+ *  - memory shared with the hypervisor (struct shared_info, struct
+ *    vcpu_info, the grant table, etc).
+ *
+ * Any Inner cache allocation strategy (Write-Back, Write-Through etc)
+ * is acceptable. There is no restriction on the Outer-cacheability.
  */
 
 /*
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH RFC] xen: arm: use uncached foreign mappings when building guests
  2013-12-19 10:10   ` Ian Campbell
@ 2013-12-19 11:23     ` Stefano Stabellini
  2013-12-19 11:29       ` Ian Campbell
  2014-01-02 15:10     ` David Vrabel
  1 sibling, 1 reply; 22+ messages in thread
From: Stefano Stabellini @ 2013-12-19 11:23 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Julien Grall, Ian Jackson, Tim Deegan, xen-devel, David Vrabel,
	Stefano.Stabellini, David Vrabel, Boris Ostrovsky

On Thu, 19 Dec 2013, Ian Campbell wrote:
> On Wed, 2013-12-18 at 18:41 +0000, David Vrabel wrote:
> > On 18/12/2013 17:28, Ian Campbell wrote:
> > > When building an ARM guest we need to take care of cache maintenance
> > > because the guest starts with MMU and cache disabled, which means we
> > > need to make sure that the initial images (kernel, initrd, dtb) which we
> > > write to guest memory are not in the cache.
> > > 
> > > We thought we had solved this with "tools: libxc: flush data cache after
> > > loading images into guest memory" (a0035ecc0d82) however it turns out
> > > that there are a couple of issues with this approach:
> > > 
> > > Firstly we need to do a cache flush from userspace, on arm64 this is
> > > possible by directly using the instructions from userspace, but on arm32
> > > this is not possible and so we need to use a system call. Unfortunately
> > > the system call provided by Linux for this purpose does not flush far
> > > enough down the cache hierarchy. Extending the system call would not be
> > > an insurmountable barrier, were it not for the second issue:
> > > 
> > > Secondly, and more importantly, Catalin Marinas points out (via Marc
> > > Zyngier) that there is a race between the cache flush and the point
> > > where we tear down the mappings, where the processor might speculatively
> > > pull some data into the cache (cache flushes are by Virtual Address, so
> > > this race is unavoidable).
> > > 
> > > If this happens then guest kernels which modify some code/data before
> > > enabling MMUs + caches may see stale data in the cache.
> > 
> > Would this same problem with occur with save/restore of a guest that has
> > caching disabled?
> 
> We basically only support guests running without caches at start of day.
> We expect and require them to come up and enable caches and then keep
> them enabled. Fortunately this is normal practice...
> 
> > If so, this would require using uncached foreign
> > mappings for save/restore which I would think would make performance suck.
> 
> Indeed.
> 
> > Does the same problem occur if the guest has MMU and caching enabled but
> > has uncached mappings?
> 
> Basically the only reason I can think of to have such a mapping would be
> related to DMA for passthrough devices, and we wouldn't support
> migration of a guest with a passthrough device.
> 
> If this does turn out to be a bridge we need to cross then I think a
> strategy of ignoring uncached mappings during the live phase and
> sweeping them up during the final stop and copy would work.
> 
> > Is there not some sort of cache flush you can do /after/ tearing down
> > the foreign mappings?  Flush all by ASID or similar?
> 
> Sadly not. dcache flush options are by MVA (which has the described race
> against unlocking) and flush by set/way which is not broadcast to all
> processors and is therefore pretty useless on an SMP system (you would
> need stop_machine or something similar to use it effectively). It's also
> not terribly virt friendly, since it blows the cache for everyone...
> 
> > Would this also require that granted pages must only have cachable
> > mappings in the granting domain?  If so, this should be documented as
> > part of the ABI.
> 
> As of yesterday in arch-arm.h we have:
>  * All hypercall arguments passed via a pointer to guest memory must
>  * reside in memory which is mapped as Normal Inner-cacheable. Any
>  * Inner cache allocation strategy (Write-Back, Write-Through etc) is
>  * acceptable. There is no restriction on the Outer-cacheability.
> 
> I could have sworn I mentioned I/O in there when I wrote it. I must have
> accidentally dropped it during my initial (unposted) drafts. Oops
> 
> 8<------------------------
> 
> >From 8637d439e529b90b2e50ea148b935ba81a0d412d Mon Sep 17 00:00:00 2001
> From: Ian Campbell <ian.campbell@citrix.com>
> Date: Thu, 19 Dec 2013 10:08:39 +0000
> Subject: [PATCH] xen: arm: further clarify the requirement for cached
>  mappings
> 
> We need to include all shared memory, including grant table mappings etc
> in this statement.
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>


>  xen/include/public/arch-arm.h |   15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
> index ef6217d..2e7fb3e 100644
> --- a/xen/include/public/arch-arm.h
> +++ b/xen/include/public/arch-arm.h
> @@ -59,10 +59,17 @@
>   * used regardless of guest type. Structures which are passed as
>   * hypercall arguments are always little endian.
>   *
> - * All hypercall arguments passed via a pointer to guest memory must
> - * reside in memory which is mapped as Normal Inner-cacheable. Any
> - * Inner cache allocation strategy (Write-Back, Write-Through etc) is
> - * acceptable. There is no restriction on the Outer-cacheability.
> + * All memory which is shared with other entities in the system
> + * (including the hypervisor and other guests) must reside in memory
> + * which is mapped as Normal Inner-cacheable. This applies to:
> + *  - hypercall arguments passed via a pointer to guest memory.
> + *  - memory shared via the grant table mechanism (including PV I/O
> + *    rings etc).
> + *  - memory shared with the hypervisor (struct shared_info, struct
> + *    vcpu_info, the grant table, etc).
> + *
> + * Any Inner cache allocation strategy (Write-Back, Write-Through etc)
> + * is acceptable. There is no restriction on the Outer-cacheability.
>   */
>  
>  /*
> -- 
> 1.7.10.4
> 
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFC] xen: arm: use uncached foreign mappings when building guests
  2013-12-19 11:23     ` Stefano Stabellini
@ 2013-12-19 11:29       ` Ian Campbell
  0 siblings, 0 replies; 22+ messages in thread
From: Ian Campbell @ 2013-12-19 11:29 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Julien Grall, Ian Jackson, Tim Deegan, xen-devel, David Vrabel,
	Stefano.Stabellini, David Vrabel, Boris Ostrovsky

On Thu, 2013-12-19 at 11:23 +0000, Stefano Stabellini wrote:
> On Thu, 19 Dec 2013, Ian Campbell wrote:
> > >From 8637d439e529b90b2e50ea148b935ba81a0d412d Mon Sep 17 00:00:00 2001
> > From: Ian Campbell <ian.campbell@citrix.com>
> > Date: Thu, 19 Dec 2013 10:08:39 +0000
> > Subject: [PATCH] xen: arm: further clarify the requirement for cached
> >  mappings
> > 
> > We need to include all shared memory, including grant table mappings etc
> > in this statement.
> > 
> > Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> 
> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

Ta, applied.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFC] xen: arm: use uncached foreign mappings when building guests
  2013-12-19  4:26 ` Julien Grall
@ 2013-12-19 14:30   ` Ian Campbell
  2014-01-06 12:10     ` Ian Campbell
  0 siblings, 1 reply; 22+ messages in thread
From: Ian Campbell @ 2013-12-19 14:30 UTC (permalink / raw)
  To: Julien Grall
  Cc: Ian Jackson, Tim Deegan, xen-devel, Julien Grall,
	Stefano.Stabellini, David Vrabel, Boris Ostrovsky

On Thu, 2013-12-19 at 04:26 +0000, Julien Grall wrote:
> 
> On 12/18/2013 05:28 PM, Ian Campbell wrote:
> >
> > I've tried this on arm64 and it seems ok, Julien, please can you give
> > this a go in your midway test case which exposes this issue.
> >
> 
> It's even worst :/. Crash every 2 times instead of 5 times.
> 
> Did you try to create/destroy numerous time?

I'd been testing on v8 where the failure case was 100%.

[..]
> To be sure that the guest crash "correctly", you will need to revert the 
> commit 91082d7b877cbb40de73199ba3f769d98be94215 "xen: arm: inject 
> unhandled instruction and data aborts to the guest.".
> Otherwise the data/prefetch fault will be redirected to the guest and 
> hide for the script shell.

Ah, that's why my first few attempts at a loop went so well!

I'm debugging the loop now -- looks like  I might have missed a
callpath.

Ian.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH LINUX RFC] xen: privcmd: implement IOCTL_PRIVCMD_MMAPBATCH_V2_UNCACHED
  2013-12-18 21:16   ` Konrad Rzeszutek Wilk
@ 2013-12-20  9:19     ` Ian Campbell
  2013-12-20 14:13       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 22+ messages in thread
From: Ian Campbell @ 2013-12-20  9:19 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: stefano.stabellini, julien.grall, tim, xen-devel, David Vrabel,
	xen-devel, Boris Ostrovsky

On Wed, 2013-12-18 at 16:16 -0500, Konrad Rzeszutek Wilk wrote:
> On Wed, Dec 18, 2013 at 05:30:37PM +0000, Ian Campbell wrote:
> > On ARM we want to use uncached foreign mappings while building the domain
> > because the guests start with MMU and caches disabled.
> > 
> 
> Why introduce a new ioctl? Could we piggyback on the old one and on ARM
> do the uncached.

Because there are (going to be) other circumstances where we do want a
cached foreign mapping, specifically migration.

Ian.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH XEN RFC] libxc: use an uncached mapping of guest ram in domain builder
  2013-12-18 21:14   ` Konrad Rzeszutek Wilk
@ 2013-12-20  9:19     ` Ian Campbell
  0 siblings, 0 replies; 22+ messages in thread
From: Ian Campbell @ 2013-12-20  9:19 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: tim, julien.grall, Ian Jackson, stefano.stabellini, xen-devel

On Wed, 2013-12-18 at 16:14 -0500, Konrad Rzeszutek Wilk wrote:
> On Wed, Dec 18, 2013 at 05:30:47PM +0000, Ian Campbell wrote:
> > VERY MUCH A WIP.
> > 
> > On ARM guest OSes are started with MMU and Caches disables (as they are on
> > native) however caching is enabled in the domain running the builder and
> > therefore we must use an explcitly uncached mapping, otherwise when the guest
> > starts running it may not see them.
> > 
> > Cache flushes are not sufficient because there is a race between the flush and
> > the unmap, where the processor may speculatively fill a cache line. Thanks to
> > Catalin Marinas and Marc Zyngier for pointing this out.
> > 
> > Therefore use the newly introduced IOCTL_PRIVCMD_MMAPBATCH_V2_UNCACHED on ARM
> > when mapping guest memory from xc_dom_* (which means xc_dom_boot_domU_map in
> > practice).  This avoids issues with the processor dirtying cache lines while
> > the guest will then fail to see because it starts with MMU and caches
> > disabled.
> 
> Why not make the default IOCTL_PRIVCMD_MMAPBATCH_V2 when running under ARM
> do this?

For the same reason as I explained in your equivalent reply to the Linux
patch...

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH LINUX RFC] xen: privcmd: implement IOCTL_PRIVCMD_MMAPBATCH_V2_UNCACHED
  2013-12-20  9:19     ` Ian Campbell
@ 2013-12-20 14:13       ` Konrad Rzeszutek Wilk
  2013-12-20 14:18         ` Ian Campbell
  0 siblings, 1 reply; 22+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-12-20 14:13 UTC (permalink / raw)
  To: Ian Campbell
  Cc: stefano.stabellini, julien.grall, tim, xen-devel, David Vrabel,
	xen-devel, Boris Ostrovsky

On Fri, Dec 20, 2013 at 09:19:13AM +0000, Ian Campbell wrote:
> On Wed, 2013-12-18 at 16:16 -0500, Konrad Rzeszutek Wilk wrote:
> > On Wed, Dec 18, 2013 at 05:30:37PM +0000, Ian Campbell wrote:
> > > On ARM we want to use uncached foreign mappings while building the domain
> > > because the guests start with MMU and caches disabled.
> > > 
> > 
> > Why introduce a new ioctl? Could we piggyback on the old one and on ARM
> > do the uncached.
> 
> Because there are (going to be) other circumstances where we do want a
> cached foreign mapping, specifically migration.

Why not then just expand the existing ioctl with a flag? ARM after all
is still experimental so you could do that.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH LINUX RFC] xen: privcmd: implement IOCTL_PRIVCMD_MMAPBATCH_V2_UNCACHED
  2013-12-20 14:13       ` Konrad Rzeszutek Wilk
@ 2013-12-20 14:18         ` Ian Campbell
  2013-12-20 14:38           ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 22+ messages in thread
From: Ian Campbell @ 2013-12-20 14:18 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: stefano.stabellini, julien.grall, tim, xen-devel, David Vrabel,
	xen-devel, Boris Ostrovsky

On Fri, 2013-12-20 at 09:13 -0500, Konrad Rzeszutek Wilk wrote:
> On Fri, Dec 20, 2013 at 09:19:13AM +0000, Ian Campbell wrote:
> > On Wed, 2013-12-18 at 16:16 -0500, Konrad Rzeszutek Wilk wrote:
> > > On Wed, Dec 18, 2013 at 05:30:37PM +0000, Ian Campbell wrote:
> > > > On ARM we want to use uncached foreign mappings while building the domain
> > > > because the guests start with MMU and caches disabled.
> > > > 
> > > 
> > > Why introduce a new ioctl? Could we piggyback on the old one and on ARM
> > > do the uncached.
> > 
> > Because there are (going to be) other circumstances where we do want a
> > cached foreign mapping, specifically migration.
> 
> Why not then just expand the existing ioctl with a flag? ARM after all
> is still experimental so you could do that.

Well, ARM just uses the same ioctl as on x86, so we wouldn't be
expanding, but duplicating, which didn't seem worth it.

And secondly, unlike hypercalls, the ioctls are part of Linux's
interface, so I'm not sure how changing them would go down.

Ian.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH LINUX RFC] xen: privcmd: implement IOCTL_PRIVCMD_MMAPBATCH_V2_UNCACHED
  2013-12-20 14:18         ` Ian Campbell
@ 2013-12-20 14:38           ` Konrad Rzeszutek Wilk
  2013-12-20 14:44             ` Ian Campbell
  0 siblings, 1 reply; 22+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-12-20 14:38 UTC (permalink / raw)
  To: Ian Campbell
  Cc: stefano.stabellini, julien.grall, tim, xen-devel, David Vrabel,
	xen-devel, Boris Ostrovsky

On Fri, Dec 20, 2013 at 02:18:48PM +0000, Ian Campbell wrote:
> On Fri, 2013-12-20 at 09:13 -0500, Konrad Rzeszutek Wilk wrote:
> > On Fri, Dec 20, 2013 at 09:19:13AM +0000, Ian Campbell wrote:
> > > On Wed, 2013-12-18 at 16:16 -0500, Konrad Rzeszutek Wilk wrote:
> > > > On Wed, Dec 18, 2013 at 05:30:37PM +0000, Ian Campbell wrote:
> > > > > On ARM we want to use uncached foreign mappings while building the domain
> > > > > because the guests start with MMU and caches disabled.
> > > > > 
> > > > 
> > > > Why introduce a new ioctl? Could we piggyback on the old one and on ARM
> > > > do the uncached.
> > > 
> > > Because there are (going to be) other circumstances where we do want a
> > > cached foreign mapping, specifically migration.
> > 
> > Why not then just expand the existing ioctl with a flag? ARM after all
> > is still experimental so you could do that.
> 
> Well, ARM just uses the same ioctl as on x86, so we wouldn't be
> expanding, but duplicating, which didn't seem worth it.
> 
> And secondly, unlike hypercalls, the ioctls are part of Linux's
> interface, so I'm not sure how changing them would go down.

You wouldn't break - expanding the structure still preserves the ABI.
Only more recent toolstacks would know how to use it.


But I get your point - you are hesistant to modify it the existing one.
How about making this new more flexible - perhaps have a 'flag' and
'version' or even 'size' to account for changes? And call it IOCTL_.._V3
which would allows us to add more things in the future (depending on the
version size)? That way if something else likes this pops up we have
a nice escape vehicle ready.

> 
> Ian.
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH LINUX RFC] xen: privcmd: implement IOCTL_PRIVCMD_MMAPBATCH_V2_UNCACHED
  2013-12-20 14:38           ` Konrad Rzeszutek Wilk
@ 2013-12-20 14:44             ` Ian Campbell
  0 siblings, 0 replies; 22+ messages in thread
From: Ian Campbell @ 2013-12-20 14:44 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: stefano.stabellini, julien.grall, tim, xen-devel, David Vrabel,
	xen-devel, Boris Ostrovsky

On Fri, 2013-12-20 at 09:38 -0500, Konrad Rzeszutek Wilk wrote:
> On Fri, Dec 20, 2013 at 02:18:48PM +0000, Ian Campbell wrote:
> > On Fri, 2013-12-20 at 09:13 -0500, Konrad Rzeszutek Wilk wrote:
> > > On Fri, Dec 20, 2013 at 09:19:13AM +0000, Ian Campbell wrote:
> > > > On Wed, 2013-12-18 at 16:16 -0500, Konrad Rzeszutek Wilk wrote:
> > > > > On Wed, Dec 18, 2013 at 05:30:37PM +0000, Ian Campbell wrote:
> > > > > > On ARM we want to use uncached foreign mappings while building the domain
> > > > > > because the guests start with MMU and caches disabled.
> > > > > > 
> > > > > 
> > > > > Why introduce a new ioctl? Could we piggyback on the old one and on ARM
> > > > > do the uncached.
> > > > 
> > > > Because there are (going to be) other circumstances where we do want a
> > > > cached foreign mapping, specifically migration.
> > > 
> > > Why not then just expand the existing ioctl with a flag? ARM after all
> > > is still experimental so you could do that.
> > 
> > Well, ARM just uses the same ioctl as on x86, so we wouldn't be
> > expanding, but duplicating, which didn't seem worth it.
> > 
> > And secondly, unlike hypercalls, the ioctls are part of Linux's
> > interface, so I'm not sure how changing them would go down.
> 
> You wouldn't break - expanding the structure still preserves the ABI.

The ioctl number includes sizeof(the struct) though, so changing the
struct would mean adding compatiblity handling for the old struct,
binding the old and new numbers, and all that jazz.

> Only more recent toolstacks would know how to use it.
> 
> 
> But I get your point - you are hesistant to modify it the existing one.
> How about making this new more flexible - perhaps have a 'flag' and
> 'version' or even 'size' to account for changes? And call it IOCTL_.._V3
> which would allows us to add more things in the future (depending on the
> version size)? That way if something else likes this pops up we have
> a nice escape vehicle ready.

I think I remember Andy cooper saying he wanted to do a more thorough
revamp of the API at some point (I can't recall why), so I wanted to
keep this extension as small as possible rather than create a new thing
in the mean time.

Ian.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFC] xen: arm: use uncached foreign mappings when building guests
  2013-12-19 10:10   ` Ian Campbell
  2013-12-19 11:23     ` Stefano Stabellini
@ 2014-01-02 15:10     ` David Vrabel
  2014-01-06 10:05       ` Ian Campbell
  1 sibling, 1 reply; 22+ messages in thread
From: David Vrabel @ 2014-01-02 15:10 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Julien Grall, Ian Jackson, Tim Deegan, xen-devel, David Vrabel,
	Stefano.Stabellini, Boris Ostrovsky

On 19/12/13 10:10, Ian Campbell wrote:
> On Wed, 2013-12-18 at 18:41 +0000, David Vrabel wrote:
>> On 18/12/2013 17:28, Ian Campbell wrote:
>>> When building an ARM guest we need to take care of cache maintenance
>>> because the guest starts with MMU and cache disabled, which means we
>>> need to make sure that the initial images (kernel, initrd, dtb) which we
>>> write to guest memory are not in the cache.
>>>
>>> We thought we had solved this with "tools: libxc: flush data cache after
>>> loading images into guest memory" (a0035ecc0d82) however it turns out
>>> that there are a couple of issues with this approach:
>>>
>>> Firstly we need to do a cache flush from userspace, on arm64 this is
>>> possible by directly using the instructions from userspace, but on arm32
>>> this is not possible and so we need to use a system call. Unfortunately
>>> the system call provided by Linux for this purpose does not flush far
>>> enough down the cache hierarchy. Extending the system call would not be
>>> an insurmountable barrier, were it not for the second issue:
>>>
>>> Secondly, and more importantly, Catalin Marinas points out (via Marc
>>> Zyngier) that there is a race between the cache flush and the point
>>> where we tear down the mappings, where the processor might speculatively
>>> pull some data into the cache (cache flushes are by Virtual Address, so
>>> this race is unavoidable).
>>>
>>> If this happens then guest kernels which modify some code/data before
>>> enabling MMUs + caches may see stale data in the cache.
>>
>> Would this same problem with occur with save/restore of a guest that has
>> caching disabled?
> 
> We basically only support guests running without caches at start of day.
> We expect and require them to come up and enable caches and then keep
> them enabled. Fortunately this is normal practice...

There is still a (small) window where a guest is running with caches
disabled and it may be saved/migrated at this point.  Is this a concern?

David

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFC] xen: arm: use uncached foreign mappings when building guests
  2014-01-02 15:10     ` David Vrabel
@ 2014-01-06 10:05       ` Ian Campbell
  0 siblings, 0 replies; 22+ messages in thread
From: Ian Campbell @ 2014-01-06 10:05 UTC (permalink / raw)
  To: David Vrabel
  Cc: Julien Grall, Ian Jackson, Tim Deegan, xen-devel, David Vrabel,
	Stefano.Stabellini, Boris Ostrovsky

On Thu, 2014-01-02 at 15:10 +0000, David Vrabel wrote:
> > We basically only support guests running without caches at start of day.
> > We expect and require them to come up and enable caches and then keep
> > them enabled. Fortunately this is normal practice...
> 
> There is still a (small) window where a guest is running with caches
> disabled and it may be saved/migrated at this point.  Is this a concern?

I don't think so, if nothing else while the guest has caches disabled it
is so early that it is mostly certainly not watching the xenbus control
node.

I suppose we might theoretically start a live migration right after boot
and iterate until caches were enabled, but with stale pages in the save
image (those which didn't change since boot).

There's a large dollop of "don't do that then" but probably the
migration code should check that caches are enabled as one of the first
things it does.

Ian.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFC] xen: arm: use uncached foreign mappings when building guests
  2013-12-19 14:30   ` Ian Campbell
@ 2014-01-06 12:10     ` Ian Campbell
  0 siblings, 0 replies; 22+ messages in thread
From: Ian Campbell @ 2014-01-06 12:10 UTC (permalink / raw)
  To: Julien Grall
  Cc: Ian Jackson, Tim Deegan, xen-devel, Julien Grall,
	Stefano.Stabellini, David Vrabel, Boris Ostrovsky

On Thu, 2013-12-19 at 14:30 +0000, Ian Campbell wrote:
> Looks like I might have missed a callpath.

Not just that, but a cache flush too so even with the uncached mapping
the cache is *still* potentially dirty.

One place this could be fixed is in dom0 in remap_pte_fn, and I've
confirmed that this works:
        @@ -101,6 +102,8 @@ static int remap_pte_fn(pte_t *ptep, pgtable_t token, unsig
                if (map_foreign_page(pfn, info->fgmfn, info->domid))
                        return -EFAULT;
                set_pte_at(info->vma->vm_mm, addr, ptep, pte);
        +       if (pgprot_val(info->prot) == pgprot_val(pgprot_noncached(info->prot)))
        +               __cpuc_flush_dcache_area((void *)addr, PAGE_SIZE);
         
                return 0;
         }
        
I'm wondering though if it might be more correct to do this on the
hypervisor side. In particular I'm thinking that caches should be
invalidated when a page is freed. scrub_one_page would have been
convenient but is not called for a ballooned down page, so we'd have to
add something which maps the page and flushes it on free.

An alternative would be to invalidate the cache when we allocate the
page to a guest, e.g. in populate physmap.

Sadly -- neither of the last two ideas actually work in practice for
some reason...

I'll poke at it a bit more.

Ian.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2014-01-06 12:10 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-12-18 17:28 [PATCH RFC] xen: arm: use uncached foreign mappings when building guests Ian Campbell
2013-12-18 17:30 ` [PATCH LINUX RFC] xen: privcmd: implement IOCTL_PRIVCMD_MMAPBATCH_V2_UNCACHED Ian Campbell
2013-12-18 19:24   ` Stefano Stabellini
2013-12-18 21:16   ` Konrad Rzeszutek Wilk
2013-12-20  9:19     ` Ian Campbell
2013-12-20 14:13       ` Konrad Rzeszutek Wilk
2013-12-20 14:18         ` Ian Campbell
2013-12-20 14:38           ` Konrad Rzeszutek Wilk
2013-12-20 14:44             ` Ian Campbell
2013-12-18 17:30 ` [PATCH XEN RFC] libxc: use an uncached mapping of guest ram in domain builder Ian Campbell
2013-12-18 21:14   ` Konrad Rzeszutek Wilk
2013-12-20  9:19     ` Ian Campbell
2013-12-18 18:41 ` [PATCH RFC] xen: arm: use uncached foreign mappings when building guests David Vrabel
2013-12-19 10:10   ` Ian Campbell
2013-12-19 11:23     ` Stefano Stabellini
2013-12-19 11:29       ` Ian Campbell
2014-01-02 15:10     ` David Vrabel
2014-01-06 10:05       ` Ian Campbell
2013-12-19  4:16 ` Julien Grall
2013-12-19  4:26 ` Julien Grall
2013-12-19 14:30   ` Ian Campbell
2014-01-06 12:10     ` Ian Campbell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.