All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC v3 0/6] xen/arm: ARM save/restore/migration support
@ 2014-05-08 21:18 Wei Huang
  2014-05-08 21:18 ` [RFC v3 1/6] xen/arm: Add basic save/restore support for ARM Wei Huang
                   ` (7 more replies)
  0 siblings, 8 replies; 67+ messages in thread
From: Wei Huang @ 2014-05-08 21:18 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, andrew.cooper3,
	julien.grall, tim, jaeyong.yoo, jbeulich, ian.jackson,
	yjhyun.yoo

The following patches enable save/restore/migration support for ARM
guest VMs. Note that the original series were sent from Jaeyong Yoo.

Working:
   * 32-bit (including SMP) guest VM save/restore/migration
   * 64-bit guest VM save
WIP:
   * 64-bit guest restore/migration 

-Wei


Rev 3:
   * Merge bitmap and VLPT design to become a common log_dirty
   * Seperate save/restore for VGICD_* and GICH_*
   * Merged with x86 code path for related hypercalls
   * Numerous minor fixes and extensive code comments
     
Rev 2:
   * save and restore of guest states is split into specific areas (and files)
   * get XENMEM_maximum_gpfn is now supported via P2M max_mapped_gfn.
   * name and layout of some functions
   * small areas commented by Julien Grall and Andrew Cooper 

Wei Huang (6):
  xen/arm: Add basic save/restore support for ARM
  xen/arm: Add save/restore support for ARM GIC V2
  xen/arm: Add save/restore support for ARM arch timer
  xen/arm: Add save/restore support for guest core registers
  xen/arm: Add log_dirty support for ARM
  xen/arm: Implement toolstack for xl restore/save/migration

 config/arm32.mk                        |    1 +
 config/arm64.mk                        |    1 +
 tools/libxc/Makefile                   |    5 +
 tools/libxc/xc_arm_migrate.c           |  653 ++++++++++++++++++++++++++++++++
 tools/libxc/xc_dom_arm.c               |    4 +-
 tools/libxc/xc_resume.c                |   20 +-
 tools/libxl/libxl.h                    |    3 -
 tools/misc/Makefile                    |    4 +-
 xen/arch/arm/Makefile                  |    1 +
 xen/arch/arm/domain.c                  |    6 +
 xen/arch/arm/domctl.c                  |   31 +-
 xen/arch/arm/hvm.c                     |  263 ++++++++++++-
 xen/arch/arm/mm.c                      |  298 ++++++++++++++-
 xen/arch/arm/p2m.c                     |  204 ++++++++++
 xen/arch/arm/save.c                    |   68 ++++
 xen/arch/arm/traps.c                   |    9 +
 xen/arch/arm/vgic.c                    |  171 +++++++++
 xen/arch/arm/vtimer.c                  |   90 +++++
 xen/arch/x86/domctl.c                  |   85 -----
 xen/arch/x86/hvm/save.c                |   12 +
 xen/common/Makefile                    |    2 +-
 xen/common/domctl.c                    |   86 +++++
 xen/common/hvm/save.c                  |   11 -
 xen/include/asm-arm/config.h           |   12 +-
 xen/include/asm-arm/domain.h           |   19 +
 xen/include/asm-arm/hvm/support.h      |   29 ++
 xen/include/asm-arm/mm.h               |   23 ++
 xen/include/asm-arm/p2m.h              |    8 +-
 xen/include/asm-arm/processor.h        |    2 +
 xen/include/public/arch-arm/hvm/save.h |  184 +++++++++
 30 files changed, 2180 insertions(+), 125 deletions(-)
 create mode 100644 tools/libxc/xc_arm_migrate.c
 create mode 100644 xen/arch/arm/save.c
 create mode 100644 xen/include/asm-arm/hvm/support.h

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [RFC v3 1/6] xen/arm: Add basic save/restore support for ARM
  2014-05-08 21:18 [RFC v3 0/6] xen/arm: ARM save/restore/migration support Wei Huang
@ 2014-05-08 21:18 ` Wei Huang
  2014-05-08 22:11   ` Andrew Cooper
                     ` (3 more replies)
  2014-05-08 21:18 ` [RFC v3 2/6] xen/arm: Add save/restore support for ARM GIC V2 Wei Huang
                   ` (6 subsequent siblings)
  7 siblings, 4 replies; 67+ messages in thread
From: Wei Huang @ 2014-05-08 21:18 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, andrew.cooper3,
	julien.grall, tim, jaeyong.yoo, jbeulich, ian.jackson,
	yjhyun.yoo

This patch implements a basic framework for ARM guest
save/restore. It defines a HVM save header for ARM guests
and correponding arch_ save/load functions. These functions
are hooked up with domain control hypercalls (gethvmcontext
and sethvmcontext). The hypercalls become a common code path to
both x86 and ARM. As a result of merging, the x86 specific
header saving code is moved to x86 sub-directory.

Signed-off-by: Evgeny Fedotov <e.fedotov@samsung.com>
Signed-off-by: Wei Huang <w1.huang@samsung.com>
---
 xen/arch/arm/Makefile                  |    1 +
 xen/arch/arm/save.c                    |   68 +++++++++++++++++++++++++
 xen/arch/x86/domctl.c                  |   85 -------------------------------
 xen/arch/x86/hvm/save.c                |   12 +++++
 xen/common/Makefile                    |    2 +-
 xen/common/domctl.c                    |   86 ++++++++++++++++++++++++++++++++
 xen/common/hvm/save.c                  |   11 ----
 xen/include/asm-arm/hvm/support.h      |   29 +++++++++++
 xen/include/public/arch-arm/hvm/save.h |   19 +++++++
 9 files changed, 216 insertions(+), 97 deletions(-)
 create mode 100644 xen/arch/arm/save.c
 create mode 100644 xen/include/asm-arm/hvm/support.h

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 63e0460..d9a328c 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -33,6 +33,7 @@ obj-y += hvm.o
 obj-y += device.o
 obj-y += decode.o
 obj-y += processor.o
+obj-y += save.o
 
 #obj-bin-y += ....o
 
diff --git a/xen/arch/arm/save.c b/xen/arch/arm/save.c
new file mode 100644
index 0000000..c4a6215
--- /dev/null
+++ b/xen/arch/arm/save.c
@@ -0,0 +1,68 @@
+/*
+ * hvm/save.c: Save and restore HVM guest's emulated hardware state for ARM.
+ *
+ * Copyright (c) 2014 Samsung Electronics.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ */
+#include <xen/config.h>
+#include <asm/hvm/support.h>
+#include <public/hvm/save.h>
+
+void arch_hvm_save(struct domain *d, struct hvm_save_header *hdr)
+{
+    hdr->magic = HVM_ARM_FILE_MAGIC;
+    hdr->version = HVM_ARM_FILE_VERSION;
+    hdr->cpuinfo = READ_SYSREG32(MIDR_EL1);
+}
+
+int arch_hvm_load(struct domain *d, struct hvm_save_header *hdr)
+{
+    uint32_t cpuinfo;
+
+    if ( hdr->magic != HVM_ARM_FILE_MAGIC )
+    {
+        printk(XENLOG_G_ERR "HVM%d restore: bad magic number %#"PRIx32"\n",
+               d->domain_id, hdr->magic);
+        return -EINVAL;
+    }
+
+    if ( hdr->version != HVM_ARM_FILE_VERSION )
+    {
+        printk(XENLOG_G_ERR "HVM%d restore: unsupported version %u\n",
+               d->domain_id, hdr->version);
+        return -EINVAL;
+    }
+
+    cpuinfo = READ_SYSREG32(MIDR_EL1);
+    if ( hdr->cpuinfo != cpuinfo )
+    {
+        printk(XENLOG_G_ERR "HVM%d restore: VM saved on one CPU "
+               "(%#"PRIx32") and restored on another (%#"PRIx32").\n",
+               d->domain_id, hdr->cpuinfo, cpuinfo);
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index d792e87..06a19b0 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -401,91 +401,6 @@ long arch_do_domctl(
     }
     break;
 
-    case XEN_DOMCTL_sethvmcontext:
-    { 
-        struct hvm_domain_context c = { .size = domctl->u.hvmcontext.size };
-
-        ret = -EINVAL;
-        if ( !is_hvm_domain(d) ) 
-            goto sethvmcontext_out;
-
-        ret = -ENOMEM;
-        if ( (c.data = xmalloc_bytes(c.size)) == NULL )
-            goto sethvmcontext_out;
-
-        ret = -EFAULT;
-        if ( copy_from_guest(c.data, domctl->u.hvmcontext.buffer, c.size) != 0)
-            goto sethvmcontext_out;
-
-        domain_pause(d);
-        ret = hvm_load(d, &c);
-        domain_unpause(d);
-
-    sethvmcontext_out:
-        if ( c.data != NULL )
-            xfree(c.data);
-    }
-    break;
-
-    case XEN_DOMCTL_gethvmcontext:
-    { 
-        struct hvm_domain_context c = { 0 };
-
-        ret = -EINVAL;
-        if ( !is_hvm_domain(d) ) 
-            goto gethvmcontext_out;
-
-        c.size = hvm_save_size(d);
-
-        if ( guest_handle_is_null(domctl->u.hvmcontext.buffer) )
-        {
-            /* Client is querying for the correct buffer size */
-            domctl->u.hvmcontext.size = c.size;
-            ret = 0;
-            goto gethvmcontext_out;            
-        }
-
-        /* Check that the client has a big enough buffer */
-        ret = -ENOSPC;
-        if ( domctl->u.hvmcontext.size < c.size ) 
-            goto gethvmcontext_out;
-
-        /* Allocate our own marshalling buffer */
-        ret = -ENOMEM;
-        if ( (c.data = xmalloc_bytes(c.size)) == NULL )
-            goto gethvmcontext_out;
-
-        domain_pause(d);
-        ret = hvm_save(d, &c);
-        domain_unpause(d);
-
-        domctl->u.hvmcontext.size = c.cur;
-        if ( copy_to_guest(domctl->u.hvmcontext.buffer, c.data, c.size) != 0 )
-            ret = -EFAULT;
-
-    gethvmcontext_out:
-        copyback = 1;
-
-        if ( c.data != NULL )
-            xfree(c.data);
-    }
-    break;
-
-    case XEN_DOMCTL_gethvmcontext_partial:
-    { 
-        ret = -EINVAL;
-        if ( !is_hvm_domain(d) ) 
-            break;
-
-        domain_pause(d);
-        ret = hvm_save_one(d, domctl->u.hvmcontext_partial.type,
-                           domctl->u.hvmcontext_partial.instance,
-                           domctl->u.hvmcontext_partial.buffer);
-        domain_unpause(d);
-    }
-    break;
-
-
     case XEN_DOMCTL_set_address_size:
     {
         switch ( domctl->u.address_size.size )
diff --git a/xen/arch/x86/hvm/save.c b/xen/arch/x86/hvm/save.c
index 6af19be..0d6da5c 100644
--- a/xen/arch/x86/hvm/save.c
+++ b/xen/arch/x86/hvm/save.c
@@ -27,6 +27,18 @@
 void arch_hvm_save(struct domain *d, struct hvm_save_header *hdr)
 {
     uint32_t eax, ebx, ecx, edx;
+    char *c;
+
+    /* Save magic and version info */
+    hdr->magic = HVM_FILE_MAGIC;
+    hdr->version = HVM_FILE_VERSION;
+
+    /* Save xen changeset */
+    c = strrchr(xen_changeset(), ':');
+    if ( c )
+        hdr->changeset = simple_strtoll(c, NULL, 16);
+    else
+        hdr->changeset = -1ULL; /* Unknown */
 
     /* Save some CPUID bits */
     cpuid(1, &eax, &ebx, &ecx, &edx);
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 3683ae3..13b781f 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -62,7 +62,7 @@ obj-$(CONFIG_XENCOMM) += xencomm.o
 
 subdir-$(CONFIG_COMPAT) += compat
 
-subdir-$(x86_64) += hvm
+subdir-y += hvm
 
 subdir-$(coverage) += gcov
 
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index af3614b..66358e4 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -24,6 +24,8 @@
 #include <xen/bitmap.h>
 #include <xen/paging.h>
 #include <xen/hypercall.h>
+#include <xen/hvm/save.h>
+#include <xen/guest_access.h>
 #include <asm/current.h>
 #include <asm/irq.h>
 #include <asm/page.h>
@@ -885,6 +887,90 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
     }
     break;
 
+    case XEN_DOMCTL_sethvmcontext:
+    {
+        struct hvm_domain_context c = { .size = op->u.hvmcontext.size };
+
+        ret = -EINVAL;
+        if ( (d == current->domain)  /* no domain_pause() */
+             || !is_hvm_domain(d) )
+            goto sethvmcontext_out;
+
+        ret = -ENOMEM;
+        if ( (c.data = xmalloc_bytes(c.size)) == NULL )
+            goto sethvmcontext_out;
+
+        ret = -EFAULT;
+        if ( copy_from_guest(c.data, op->u.hvmcontext.buffer, c.size) )
+            goto sethvmcontext_out;
+
+        domain_pause(d);
+        ret = hvm_load(d, &c);
+        domain_unpause(d);
+
+    sethvmcontext_out:
+        xfree(c.data);
+    }
+    break;
+
+    case XEN_DOMCTL_gethvmcontext:
+    {
+        struct hvm_domain_context c = { 0 };
+
+        ret = -EINVAL;
+        if ( (d == current->domain)  /* no domain_pause() */
+             || !is_hvm_domain(d) )
+            goto gethvmcontext_out;
+
+        c.size = hvm_save_size(d);
+
+        if ( guest_handle_is_null(op->u.hvmcontext.buffer) )
+        {
+            /* Client is querying for the correct buffer size */
+            op->u.hvmcontext.size = c.size;
+            ret = 0;
+            goto gethvmcontext_out;
+        }
+
+        /* Check that the client has a big enough buffer */
+        ret = -ENOSPC;
+        if ( op->u.hvmcontext.size < c.size )
+            goto gethvmcontext_out;
+
+        /* Allocate our own marshalling buffer */
+        ret = -ENOMEM;
+        if ( (c.data = xmalloc_bytes(c.size)) == NULL )
+            goto gethvmcontext_out;
+
+        domain_pause(d);
+        ret = hvm_save(d, &c);
+        domain_unpause(d);
+
+        op->u.hvmcontext.size = c.cur;
+        if ( copy_to_guest(op->u.hvmcontext.buffer, c.data, c.size) )
+            ret = -EFAULT;
+
+    gethvmcontext_out:
+        copyback = 1;
+        xfree(c.data);
+    }
+    break;
+
+    case XEN_DOMCTL_gethvmcontext_partial:
+    {
+        ret = -EINVAL;
+        if ( (d == current->domain) /* no domain_pause() */
+             || !is_hvm_domain(d) )
+            break;
+
+        domain_pause(d);
+        ret = hvm_save_one(d, op->u.hvmcontext_partial.type,
+                           op->u.hvmcontext_partial.instance,
+                           op->u.hvmcontext_partial.buffer);
+        domain_unpause(d);
+    }
+    break;
+
     default:
         ret = arch_do_domctl(op, d, u_domctl);
         break;
diff --git a/xen/common/hvm/save.c b/xen/common/hvm/save.c
index 6c16399..0b303ff 100644
--- a/xen/common/hvm/save.c
+++ b/xen/common/hvm/save.c
@@ -140,7 +140,6 @@ int hvm_save_one(struct domain *d, uint16_t typecode, uint16_t instance,
 
 int hvm_save(struct domain *d, hvm_domain_context_t *h)
 {
-    char *c;
     struct hvm_save_header hdr;
     struct hvm_save_end end;
     hvm_save_handler handler;
@@ -149,16 +148,6 @@ int hvm_save(struct domain *d, hvm_domain_context_t *h)
     if ( d->is_dying )
         return -EINVAL;
 
-    hdr.magic = HVM_FILE_MAGIC;
-    hdr.version = HVM_FILE_VERSION;
-
-    /* Save xen changeset */
-    c = strrchr(xen_changeset(), ':');
-    if ( c )
-        hdr.changeset = simple_strtoll(c, NULL, 16);
-    else 
-        hdr.changeset = -1ULL; /* Unknown */
-
     arch_hvm_save(d, &hdr);
 
     if ( hvm_save_entry(HEADER, 0, h, &hdr) != 0 )
diff --git a/xen/include/asm-arm/hvm/support.h b/xen/include/asm-arm/hvm/support.h
new file mode 100644
index 0000000..fa5fe75
--- /dev/null
+++ b/xen/include/asm-arm/hvm/support.h
@@ -0,0 +1,29 @@
+/*
+ * HVM support routines
+ *
+ * Copyright (c) 2014, Samsung Electronics.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ */
+
+#ifndef __ASM_ARM_HVM_SUPPORT_H__
+#define __ASM_ARM_HVM_SUPPORT_H__
+
+#include <xen/types.h>
+#include <public/hvm/ioreq.h>
+#include <xen/sched.h>
+#include <xen/hvm/save.h>
+#include <asm/processor.h>
+
+#endif /* __ASM_ARM_HVM_SUPPORT_H__ */
diff --git a/xen/include/public/arch-arm/hvm/save.h b/xen/include/public/arch-arm/hvm/save.h
index 75b8e65..8312e7b 100644
--- a/xen/include/public/arch-arm/hvm/save.h
+++ b/xen/include/public/arch-arm/hvm/save.h
@@ -3,6 +3,7 @@
  * be saved along with the domain's memory and device-model state.
  *
  * Copyright (c) 2012 Citrix Systems Ltd.
+ * Copyright (c) 2014 Samsung Electronics.
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to
@@ -26,6 +27,24 @@
 #ifndef __XEN_PUBLIC_HVM_SAVE_ARM_H__
 #define __XEN_PUBLIC_HVM_SAVE_ARM_H__
 
+#define HVM_ARM_FILE_MAGIC   0x92385520
+#define HVM_ARM_FILE_VERSION 0x00000001
+
+/* Note: For compilation purpose hvm_save_header name is the same as x86,
+ * but layout is different. */
+struct hvm_save_header
+{
+    uint32_t magic;             /* Must be HVM_ARM_FILE_MAGIC */
+    uint32_t version;           /* File format version */
+    uint32_t cpuinfo;           /* Record MIDR_EL1 info of saving machine */
+};
+DECLARE_HVM_SAVE_TYPE(HEADER, 1, struct hvm_save_header);
+
+/*
+ * Largest type-code in use
+ */
+#define HVM_SAVE_CODE_MAX 1
+
 #endif
 
 /*
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [RFC v3 2/6] xen/arm: Add save/restore support for ARM GIC V2
  2014-05-08 21:18 [RFC v3 0/6] xen/arm: ARM save/restore/migration support Wei Huang
  2014-05-08 21:18 ` [RFC v3 1/6] xen/arm: Add basic save/restore support for ARM Wei Huang
@ 2014-05-08 21:18 ` Wei Huang
  2014-05-08 22:47   ` Andrew Cooper
                     ` (3 more replies)
  2014-05-08 21:18 ` [RFC v3 3/6] xen/arm: Add save/restore support for ARM arch timer Wei Huang
                   ` (5 subsequent siblings)
  7 siblings, 4 replies; 67+ messages in thread
From: Wei Huang @ 2014-05-08 21:18 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, andrew.cooper3,
	julien.grall, tim, jaeyong.yoo, jbeulich, ian.jackson,
	yjhyun.yoo

This patch implements a save/restore support for
ARM guest GIC. Two types of GIC V2 states are saved seperately:
1) VGICD_* contains the GIC distributor state from
guest VM's view; 2) GICH_* is the GIC virtual control
state from hypervisor's persepctive.

Signed-off-by: Evgeny Fedotov <e.fedotov@samsung.com>
Signed-off-by: Wei Huang <w1.huang@samsung.com>
---
 xen/arch/arm/vgic.c                    |  171 ++++++++++++++++++++++++++++++++
 xen/include/public/arch-arm/hvm/save.h |   34 ++++++-
 2 files changed, 204 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 4cf6470..505e944 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -24,6 +24,7 @@
 #include <xen/softirq.h>
 #include <xen/irq.h>
 #include <xen/sched.h>
+#include <xen/hvm/save.h>
 
 #include <asm/current.h>
 
@@ -73,6 +74,110 @@ static struct vgic_irq_rank *vgic_irq_rank(struct vcpu *v, int b, int n)
         return NULL;
 }
 
+/* Save guest VM's distributor info into a context to support domains
+ * save/restore. Such info represents guest VM's view of its GIC
+ * distributor (GICD_*).
+ */
+static int hvm_vgicd_save(struct domain *d, hvm_domain_context_t *h)
+{
+    struct hvm_arm_vgicd_v2 ctxt;
+    struct vcpu *v;
+    struct vgic_irq_rank *rank;
+    int rc = 0;
+
+    /* Save the state for each VCPU */
+    for_each_vcpu( d, v )
+    {
+        rank = &v->arch.vgic.private_irqs;
+
+        /* IENABLE, IACTIVE, IPEND,  PENDSGI */
+        ctxt.ienable = rank->ienable;
+        ctxt.iactive = rank->iactive;
+        ctxt.ipend = rank->ipend;
+        ctxt.pendsgi = rank->pendsgi;
+
+        /* ICFG */
+        ctxt.icfg[0] = rank->icfg[0];
+        ctxt.icfg[1] = rank->icfg[1];
+
+        /* IPRIORITY */
+        BUILD_BUG_ON(sizeof(rank->ipriority) != sizeof (ctxt.ipriority));
+        memcpy(ctxt.ipriority, rank->ipriority, sizeof(rank->ipriority));
+
+        /* ITARGETS */
+        BUILD_BUG_ON(sizeof(rank->itargets) != sizeof (ctxt.itargets));
+        memcpy(ctxt.itargets, rank->itargets, sizeof(rank->itargets));
+
+        if ( (rc = hvm_save_entry(VGICD_V2, v->vcpu_id, h, &ctxt)) != 0 )
+            return rc;
+    }
+
+    return rc;
+}
+
+/* Load guest VM's distributor info from a context to support domain
+ * save/restore. The info is loaded into vgic_irq_rank.
+ */
+static int hvm_vgicd_load(struct domain *d, hvm_domain_context_t *h)
+{
+    struct hvm_arm_vgicd_v2 ctxt;
+    struct vgic_irq_rank *rank;
+    struct vcpu *v;
+    int vcpuid;
+    unsigned long enable_bits;
+    struct pending_irq *p;
+    unsigned int irq = 0;
+    int rc = 0;
+
+    /* Which vcpu is this? */
+    vcpuid = hvm_load_instance(h);
+    if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
+    {
+        dprintk(XENLOG_ERR, "HVM restore: dom%u has no vcpu%u\n",
+                d->domain_id, vcpuid);
+        return -EINVAL;
+    }
+
+    if ( (rc = hvm_load_entry(VGICD_V2, h, &ctxt)) != 0 )
+        return rc;
+
+    /* Restore PPI states */
+    rank = &v->arch.vgic.private_irqs;
+
+    /* IENABLE, IACTIVE, IPEND, PENDSGI */
+    rank->ienable = ctxt.ienable;
+    rank->iactive = ctxt.iactive;
+    rank->ipend = ctxt.ipend;
+    rank->pendsgi = ctxt.pendsgi;
+
+    /* ICFG */
+    rank->icfg[0] = ctxt.icfg[0];
+    rank->icfg[1] = ctxt.icfg[1];
+
+    /* IPRIORITY */
+    BUILD_BUG_ON(sizeof(rank->ipriority) != sizeof (ctxt.ipriority));
+    memcpy(rank->ipriority, ctxt.ipriority, sizeof(rank->ipriority));
+
+    /* ITARGETS */
+    BUILD_BUG_ON(sizeof(rank->itargets) != sizeof (ctxt.itargets));
+    memcpy(rank->itargets, ctxt.itargets, sizeof(rank->itargets));
+
+    /* Set IRQ status as enabled by iterating through rank->ienable register.
+     * This step is required otherwise events won't be received by the VM
+     * after restore. */
+    enable_bits = ctxt.ienable;
+    while ( (irq = find_next_bit(&enable_bits, 32, irq)) < 32 )
+    {
+        p = irq_to_pending(v, irq);
+        set_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
+        irq++;
+    }
+
+    return 0;
+}
+HVM_REGISTER_SAVE_RESTORE(VGICD_V2, hvm_vgicd_save, hvm_vgicd_load,
+                          1, HVMSR_PER_VCPU);
+
 int domain_vgic_init(struct domain *d)
 {
     int i;
@@ -759,6 +864,72 @@ out:
         smp_send_event_check_mask(cpumask_of(v->processor));
 }
 
+/* Save GIC virtual control state into a context to support save/restore. 
+ * The info reprsents most of GICH_* registers. */
+static int hvm_gich_save(struct domain *d, hvm_domain_context_t *h)
+{
+    struct hvm_arm_gich_v2 ctxt;
+    struct vcpu *v;
+    int rc = 0;
+
+    /* Save the state of GICs */
+    for_each_vcpu( d, v )
+    {
+        ctxt.gic_hcr = v->arch.gic_hcr;
+        ctxt.gic_vmcr = v->arch.gic_vmcr;
+        ctxt.gic_apr = v->arch.gic_apr;
+
+        /* Save list registers and masks */
+        BUILD_BUG_ON(sizeof(v->arch.gic_lr) > sizeof (ctxt.gic_lr));
+        memcpy(ctxt.gic_lr, v->arch.gic_lr, sizeof(v->arch.gic_lr));
+
+        ctxt.lr_mask = v->arch.lr_mask;
+        ctxt.event_mask = v->arch.event_mask;
+
+        if ( (rc = hvm_save_entry(GICH_V2, v->vcpu_id, h, &ctxt)) != 0 )
+            return rc;
+    }
+
+    return rc;
+}
+
+/* Restore GIC virtual control state from a context to support save/restore */
+static int hvm_gich_load(struct domain *d, hvm_domain_context_t *h)
+{
+    int vcpuid;
+    struct hvm_arm_gich_v2 ctxt;
+    struct vcpu *v;
+    int rc = 0;
+
+    /* Which vcpu is this? */
+    vcpuid = hvm_load_instance(h);
+    if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
+    {
+        dprintk(XENLOG_ERR, "HVM restore: dom%u has no vcpu%u\n", d->domain_id,
+                vcpuid);
+        return -EINVAL;
+    }
+
+    if ( (rc = hvm_load_entry(GICH_V2, h, &ctxt)) != 0 )
+        return rc;
+
+    v->arch.gic_hcr = ctxt.gic_hcr;
+    v->arch.gic_vmcr = ctxt.gic_vmcr;
+    v->arch.gic_apr = ctxt.gic_apr;
+
+    /* Restore list registers and masks */
+    BUILD_BUG_ON(sizeof(v->arch.gic_lr) > sizeof (ctxt.gic_lr));
+    memcpy(v->arch.gic_lr, ctxt.gic_lr, sizeof(v->arch.gic_lr));
+
+    v->arch.lr_mask = ctxt.lr_mask;
+    v->arch.event_mask = ctxt.event_mask;
+
+    return rc;
+}
+
+HVM_REGISTER_SAVE_RESTORE(GICH_V2, hvm_gich_save, hvm_gich_load, 1,
+                          HVMSR_PER_VCPU);
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/public/arch-arm/hvm/save.h b/xen/include/public/arch-arm/hvm/save.h
index 8312e7b..421a6f6 100644
--- a/xen/include/public/arch-arm/hvm/save.h
+++ b/xen/include/public/arch-arm/hvm/save.h
@@ -40,10 +40,42 @@ struct hvm_save_header
 };
 DECLARE_HVM_SAVE_TYPE(HEADER, 1, struct hvm_save_header);
 
+/* Guest's view of GIC distributor (per-vcpu)
+ *   - Based on GICv2 (see "struct vgic_irq_rank")
+ *   - Store guest's view of GIC distributor
+ *   - Only support SGI and PPI for DomU (DomU doesn't handle SPI)
+ */
+struct hvm_arm_vgicd_v2
+{
+    uint32_t ienable;
+    uint32_t iactive;
+    uint32_t ipend;
+    uint32_t pendsgi;
+    uint32_t icfg[2];
+    uint32_t ipriority[8];
+    uint32_t itargets[8];
+};
+DECLARE_HVM_SAVE_TYPE(VGICD_V2, 2, struct hvm_arm_vgicd_v2);
+
+/* Info for hypervisor to manage guests (per-vcpu)
+ *   - Based on GICv2
+ *   - Mainly store registers of GICH_*
+ */
+struct hvm_arm_gich_v2
+{
+    uint32_t gic_hcr;
+    uint32_t gic_vmcr;
+    uint32_t gic_apr;
+    uint32_t gic_lr[64];
+    uint64_t event_mask;
+    uint64_t lr_mask;
+};
+DECLARE_HVM_SAVE_TYPE(GICH_V2, 3, struct hvm_arm_gich_v2);
+
 /*
  * Largest type-code in use
  */
-#define HVM_SAVE_CODE_MAX 1
+#define HVM_SAVE_CODE_MAX 3
 
 #endif
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [RFC v3 3/6] xen/arm: Add save/restore support for ARM arch timer
  2014-05-08 21:18 [RFC v3 0/6] xen/arm: ARM save/restore/migration support Wei Huang
  2014-05-08 21:18 ` [RFC v3 1/6] xen/arm: Add basic save/restore support for ARM Wei Huang
  2014-05-08 21:18 ` [RFC v3 2/6] xen/arm: Add save/restore support for ARM GIC V2 Wei Huang
@ 2014-05-08 21:18 ` Wei Huang
  2014-05-08 23:02   ` Andrew Cooper
                     ` (2 more replies)
  2014-05-08 21:18 ` [RFC v3 4/6] xen/arm: Add save/restore support for guest core registers Wei Huang
                   ` (4 subsequent siblings)
  7 siblings, 3 replies; 67+ messages in thread
From: Wei Huang @ 2014-05-08 21:18 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, andrew.cooper3,
	julien.grall, tim, jaeyong.yoo, jbeulich, ian.jackson,
	yjhyun.yoo

This patch implements a save/resore support for ARM architecture
timer.

Signed-off-by: Evgeny Fedotov <e.fedotov@samsung.com>
Signed-off-by: Wei Huang <w1.huang@samsung.com>
---
 xen/arch/arm/vtimer.c                  |   90 ++++++++++++++++++++++++++++++++
 xen/include/public/arch-arm/hvm/save.h |   16 +++++-
 2 files changed, 105 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/vtimer.c b/xen/arch/arm/vtimer.c
index b93153e..6576408 100644
--- a/xen/arch/arm/vtimer.c
+++ b/xen/arch/arm/vtimer.c
@@ -21,6 +21,7 @@
 #include <xen/lib.h>
 #include <xen/timer.h>
 #include <xen/sched.h>
+#include <xen/hvm/save.h>
 #include <asm/irq.h>
 #include <asm/time.h>
 #include <asm/gic.h>
@@ -285,6 +286,95 @@ int vtimer_emulate(struct cpu_user_regs *regs, union hsr hsr)
     }
 }
 
+/* Save timer info to support save/restore */
+static int hvm_timer_save(struct domain *d, hvm_domain_context_t *h)
+{
+    struct hvm_arm_timer ctxt;
+    struct vcpu *v;
+    struct vtimer *t;
+    int i;
+    int rc = 0;
+
+    /* Save the state of vtimer and ptimer */
+    for_each_vcpu( d, v )
+    {
+        t = &v->arch.virt_timer;
+
+        for ( i = 0; i < ARM_TIMER_TYPE_COUNT; i++ )
+        {
+            ctxt.cval = t->cval;
+            ctxt.ctl = t->ctl;
+
+            switch ( i )
+            {
+            case ARM_TIMER_TYPE_PHYS:
+                ctxt.vtb_offset = d->arch.phys_timer_base.offset;
+                ctxt.type = ARM_TIMER_TYPE_PHYS;
+                break;
+            case ARM_TIMER_TYPE_VIRT:
+                ctxt.vtb_offset = d->arch.virt_timer_base.offset;
+                ctxt.type = ARM_TIMER_TYPE_VIRT;
+            default:
+                rc = -EINVAL;
+                break;
+            }
+
+            if ( (rc = hvm_save_entry(TIMER, v->vcpu_id, h, &ctxt)) != 0 )
+                return rc;
+
+            t = &v->arch.phys_timer;
+        }
+    }
+
+    return rc;
+}
+
+/* Restore timer info from context to support save/restore */
+static int hvm_timer_load(struct domain *d, hvm_domain_context_t *h)
+{
+    int vcpuid;
+    struct hvm_arm_timer ctxt;
+    struct vcpu *v;
+    struct vtimer *t = NULL;
+    int rc = 0;
+
+    /* Which vcpu is this? */
+    vcpuid = hvm_load_instance(h);
+
+    if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
+    {
+        dprintk(XENLOG_ERR, "HVM restore: dom%u has no vcpu%u\n",
+                d->domain_id, vcpuid);
+        return -EINVAL;
+    }
+
+    if ( hvm_load_entry(TIMER, h, &ctxt) != 0 )
+        return -EINVAL;
+
+    switch ( ctxt.type )
+    {
+    case ARM_TIMER_TYPE_PHYS:
+        t = &v->arch.phys_timer;
+        d->arch.phys_timer_base.offset = ctxt.vtb_offset;
+        break;
+    case ARM_TIMER_TYPE_VIRT:
+        t = &v->arch.virt_timer;
+        d->arch.virt_timer_base.offset = ctxt.vtb_offset;
+        break;
+    default:
+        rc = -EINVAL;
+        break;
+    }
+
+    t->cval = ctxt.cval;
+    t->ctl = ctxt.ctl;
+    t->v = v;
+
+    return rc;
+}
+
+HVM_REGISTER_SAVE_RESTORE(TIMER, hvm_timer_save, hvm_timer_load, 2,
+                          HVMSR_PER_VCPU);
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/public/arch-arm/hvm/save.h b/xen/include/public/arch-arm/hvm/save.h
index 421a6f6..8679bfd 100644
--- a/xen/include/public/arch-arm/hvm/save.h
+++ b/xen/include/public/arch-arm/hvm/save.h
@@ -72,10 +72,24 @@ struct hvm_arm_gich_v2
 };
 DECLARE_HVM_SAVE_TYPE(GICH_V2, 3, struct hvm_arm_gich_v2);
 
+/* Two ARM timers (physical and virtual) are saved */
+#define ARM_TIMER_TYPE_VIRT  0
+#define ARM_TIMER_TYPE_PHYS  1
+#define ARM_TIMER_TYPE_COUNT 2       /* total count */
+
+struct hvm_arm_timer
+{
+    uint64_t vtb_offset;
+    uint32_t ctl;
+    uint64_t cval;
+    uint32_t type;
+};
+DECLARE_HVM_SAVE_TYPE(TIMER, 4, struct hvm_arm_timer);
+
 /*
  * Largest type-code in use
  */
-#define HVM_SAVE_CODE_MAX 3
+#define HVM_SAVE_CODE_MAX 4
 
 #endif
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [RFC v3 4/6] xen/arm: Add save/restore support for guest core registers
  2014-05-08 21:18 [RFC v3 0/6] xen/arm: ARM save/restore/migration support Wei Huang
                   ` (2 preceding siblings ...)
  2014-05-08 21:18 ` [RFC v3 3/6] xen/arm: Add save/restore support for ARM arch timer Wei Huang
@ 2014-05-08 21:18 ` Wei Huang
  2014-05-08 23:10   ` Andrew Cooper
                     ` (2 more replies)
  2014-05-08 21:18 ` [RFC v3 5/6] xen/arm: Add log_dirty support for ARM Wei Huang
                   ` (3 subsequent siblings)
  7 siblings, 3 replies; 67+ messages in thread
From: Wei Huang @ 2014-05-08 21:18 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, andrew.cooper3,
	julien.grall, tim, jaeyong.yoo, jbeulich, ian.jackson,
	yjhyun.yoo

This patch implements a save/resore support for ARM guest core
registers.

Signed-off-by: Evgeny Fedotov <e.fedotov@samsung.com>
Signed-off-by: Wei Huang <w1.huang@samsung.com>
---
 xen/arch/arm/hvm.c                     |  263 +++++++++++++++++++++++++++++++-
 xen/include/public/arch-arm/hvm/save.h |  121 ++++++++++++++-
 2 files changed, 382 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/hvm.c b/xen/arch/arm/hvm.c
index 471c4cd..7bfa547 100644
--- a/xen/arch/arm/hvm.c
+++ b/xen/arch/arm/hvm.c
@@ -7,14 +7,15 @@
 
 #include <xsm/xsm.h>
 
+#include <xen/hvm/save.h>
 #include <public/xen.h>
 #include <public/hvm/params.h>
 #include <public/hvm/hvm_op.h>
 
 #include <asm/hypercall.h>
+#include <asm/gic.h>
 
 long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
-
 {
     long rc = 0;
 
@@ -65,3 +66,263 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     return rc;
 }
+
+static int hvm_save_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
+{
+    struct hvm_arm_cpu ctxt;
+    struct vcpu_guest_core_regs c;
+    struct vcpu *v;
+
+    /* Save the state of CPU */
+    for_each_vcpu( d, v )
+    {
+        memset(&ctxt, 0, sizeof(ctxt));
+
+        ctxt.sctlr = v->arch.sctlr;
+        ctxt.ttbr0 = v->arch.ttbr0;
+        ctxt.ttbr1 = v->arch.ttbr1;
+        ctxt.ttbcr = v->arch.ttbcr;
+
+        ctxt.dacr = v->arch.dacr;
+        ctxt.ifsr = v->arch.ifsr;
+#ifdef CONFIG_ARM_32
+        ctxt.ifar = v->arch.ifar;
+        ctxt.dfar = v->arch.dfar;
+        ctxt.dfsr = v->arch.dfsr;
+#else
+        ctxt.far = v->arch.far;
+        ctxt.esr = v->arch.esr;
+#endif
+
+#ifdef CONFIG_ARM_32
+        ctxt.mair0 = v->arch.mair0;
+        ctxt.mair1 = v->arch.mair1;
+#else
+        ctxt.mair0 = v->arch.mair;
+#endif
+        /* Control Registers */
+        ctxt.actlr = v->arch.actlr;
+        ctxt.sctlr = v->arch.sctlr;
+        ctxt.cpacr = v->arch.cpacr;
+
+        ctxt.contextidr = v->arch.contextidr;
+        ctxt.tpidr_el0 = v->arch.tpidr_el0;
+        ctxt.tpidr_el1 = v->arch.tpidr_el1;
+        ctxt.tpidrro_el0 = v->arch.tpidrro_el0;
+
+        /* CP 15 */
+        ctxt.csselr = v->arch.csselr;
+
+        ctxt.afsr0 = v->arch.afsr0;
+        ctxt.afsr1 = v->arch.afsr1;
+        ctxt.vbar = v->arch.vbar;
+        ctxt.par = v->arch.par;
+        ctxt.teecr = v->arch.teecr;
+        ctxt.teehbr = v->arch.teehbr;
+
+#ifdef CONFIG_ARM_32
+        ctxt.joscr = v->arch.joscr;
+        ctxt.jmcr = v->arch.jmcr;
+#endif
+
+        memset(&c, 0, sizeof(c));
+
+        /* get guest core registers */
+        vcpu_regs_hyp_to_user(v, &c);
+
+        ctxt.x0 = c.x0;
+        ctxt.x1 = c.x1;
+        ctxt.x2 = c.x2;
+        ctxt.x3 = c.x3;
+        ctxt.x4 = c.x4;
+        ctxt.x5 = c.x5;
+        ctxt.x6 = c.x6;
+        ctxt.x7 = c.x7;
+        ctxt.x8 = c.x8;
+        ctxt.x9 = c.x9;
+        ctxt.x10 = c.x10;
+        ctxt.x11 = c.x11;
+        ctxt.x12 = c.x12;
+        ctxt.x13 = c.x13;
+        ctxt.x14 = c.x14;
+        ctxt.x15 = c.x15;
+        ctxt.x16 = c.x16;
+        ctxt.x17 = c.x17;
+        ctxt.x18 = c.x18;
+        ctxt.x19 = c.x19;
+        ctxt.x20 = c.x20;
+        ctxt.x21 = c.x21;
+        ctxt.x22 = c.x22;
+        ctxt.x23 = c.x23;
+        ctxt.x24 = c.x24;
+        ctxt.x25 = c.x25;
+        ctxt.x26 = c.x26;
+        ctxt.x27 = c.x27;
+        ctxt.x28 = c.x28;
+        ctxt.x29 = c.x29;
+        ctxt.x30 = c.x30;
+        ctxt.pc64 = c.pc64;
+        ctxt.cpsr = c.cpsr;
+        ctxt.spsr_el1 = c.spsr_el1; /* spsr_svc */
+
+#ifdef CONFIG_ARM_32
+        ctxt.spsr_fiq = c.spsr_fiq;
+        ctxt.spsr_irq = c.spsr_irq;
+        ctxt.spsr_und = c.spsr_und;
+        ctxt.spsr_abt = c.spsr_abt;
+#endif
+#ifdef CONFIG_ARM_64
+        ctxt.sp_el0 = c.sp_el0;
+        ctxt.sp_el1 = c.sp_el1;
+        ctxt.elr_el1 = c.elr_el1;
+#endif
+
+        /* check VFP state size before dumping */
+        BUILD_BUG_ON(sizeof(v->arch.vfp) > sizeof (ctxt.vfp_state));
+        memcpy((void*) &ctxt.vfp_state, (void*) &v->arch.vfp, 
+               sizeof(v->arch.vfp));
+
+        if ( hvm_save_entry(VCPU, v->vcpu_id, h, &ctxt) != 0 )
+            return 1;
+    }
+
+    return 0;
+}
+
+static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
+{
+    int vcpuid;
+    struct hvm_arm_cpu ctxt;
+    struct vcpu *v;
+    struct vcpu_guest_core_regs c;
+
+    /* Which vcpu is this? */
+    vcpuid = hvm_load_instance(h);
+    if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
+    {
+        dprintk(XENLOG_G_ERR, "HVM restore: dom%u has no vcpu%u\n",
+                d->domain_id, vcpuid);
+        return -EINVAL;
+    }
+
+    if ( hvm_load_entry(VCPU, h, &ctxt) != 0 )
+        return -EINVAL;
+
+    v->arch.sctlr = ctxt.sctlr;
+    v->arch.ttbr0 = ctxt.ttbr0;
+    v->arch.ttbr1 = ctxt.ttbr1;
+    v->arch.ttbcr = ctxt.ttbcr;
+
+    v->arch.dacr = ctxt.dacr;
+    v->arch.ifsr = ctxt.ifsr;
+#ifdef CONFIG_ARM_32
+    v->arch.ifar = ctxt.ifar;
+    v->arch.dfar = ctxt.dfar;
+    v->arch.dfsr = ctxt.dfsr;
+#else
+    v->arch.far = ctxt.far;
+    v->arch.esr = ctxt.esr;
+#endif
+
+#ifdef CONFIG_ARM_32
+    v->arch.mair0 = ctxt.mair0;
+    v->arch.mair1 = ctxt.mair1;
+#else
+    v->arch.mair = ctxt.mair0;
+#endif
+
+    /* Control Registers */
+    v->arch.actlr = ctxt.actlr;
+    v->arch.cpacr = ctxt.cpacr;
+    v->arch.contextidr = ctxt.contextidr;
+    v->arch.tpidr_el0 = ctxt.tpidr_el0;
+    v->arch.tpidr_el1 = ctxt.tpidr_el1;
+    v->arch.tpidrro_el0 = ctxt.tpidrro_el0;
+
+    /* CP 15 */
+    v->arch.csselr = ctxt.csselr;
+
+    v->arch.afsr0 = ctxt.afsr0;
+    v->arch.afsr1 = ctxt.afsr1;
+    v->arch.vbar = ctxt.vbar;
+    v->arch.par = ctxt.par;
+    v->arch.teecr = ctxt.teecr;
+    v->arch.teehbr = ctxt.teehbr;
+#ifdef CONFIG_ARM_32
+    v->arch.joscr = ctxt.joscr;
+    v->arch.jmcr = ctxt.jmcr;
+#endif
+
+    /* fill guest core registers */
+    memset(&c, 0, sizeof(c));
+    c.x0 = ctxt.x0;
+    c.x1 = ctxt.x1;
+    c.x2 = ctxt.x2;
+    c.x3 = ctxt.x3;
+    c.x4 = ctxt.x4;
+    c.x5 = ctxt.x5;
+    c.x6 = ctxt.x6;
+    c.x7 = ctxt.x7;
+    c.x8 = ctxt.x8;
+    c.x9 = ctxt.x9;
+    c.x10 = ctxt.x10;
+    c.x11 = ctxt.x11;
+    c.x12 = ctxt.x12;
+    c.x13 = ctxt.x13;
+    c.x14 = ctxt.x14;
+    c.x15 = ctxt.x15;
+    c.x16 = ctxt.x16;
+    c.x17 = ctxt.x17;
+    c.x18 = ctxt.x18;
+    c.x19 = ctxt.x19;
+    c.x20 = ctxt.x20;
+    c.x21 = ctxt.x21;
+    c.x22 = ctxt.x22;
+    c.x23 = ctxt.x23;
+    c.x24 = ctxt.x24;
+    c.x25 = ctxt.x25;
+    c.x26 = ctxt.x26;
+    c.x27 = ctxt.x27;
+    c.x28 = ctxt.x28;
+    c.x29 = ctxt.x29;
+    c.x30 = ctxt.x30;
+    c.pc64 = ctxt.pc64;
+    c.cpsr = ctxt.cpsr;
+    c.spsr_el1 = ctxt.spsr_el1; /* spsr_svc */
+
+#ifdef CONFIG_ARM_32
+    c.spsr_fiq = ctxt.spsr_fiq;
+    c.spsr_irq = ctxt.spsr_irq;
+    c.spsr_und = ctxt.spsr_und;
+    c.spsr_abt = ctxt.spsr_abt;
+#endif
+#ifdef CONFIG_ARM_64
+    c.sp_el0 = ctxt.sp_el0;
+    c.sp_el1 = ctxt.sp_el1;
+    c.elr_el1 = ctxt.elr_el1;
+#endif
+
+    /* set guest core registers */
+    vcpu_regs_user_to_hyp(v, &c);
+
+    BUILD_BUG_ON(sizeof(v->arch.vfp) > sizeof (ctxt.vfp_state));
+    memcpy(&v->arch.vfp, &ctxt,  sizeof(v->arch.vfp));
+
+    v->is_initialised = 1;
+    clear_bit(_VPF_down, &v->pause_flags);
+
+    return 0;
+}
+
+HVM_REGISTER_SAVE_RESTORE(VCPU, hvm_save_cpu_ctxt, hvm_load_cpu_ctxt, 1, 
+                          HVMSR_PER_VCPU);
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/public/arch-arm/hvm/save.h b/xen/include/public/arch-arm/hvm/save.h
index 8679bfd..18e5899 100644
--- a/xen/include/public/arch-arm/hvm/save.h
+++ b/xen/include/public/arch-arm/hvm/save.h
@@ -86,10 +86,129 @@ struct hvm_arm_timer
 };
 DECLARE_HVM_SAVE_TYPE(TIMER, 4, struct hvm_arm_timer);
 
+/* ARM core hardware info */
+struct hvm_arm_cpu
+{
+    /* ======= Guest VFP State =======
+     *   - 34 8-bytes required for AArch32 guests
+     *   - 66 8-bytes required for AArch64 guests
+     */
+    uint64_t vfp_state[66];
+
+    /* ======= Guest Core Registers =======
+     *   - Each reg is multiplexed for AArch64 and AArch32 guests, if possible
+     *   - Each comments, /AArch64_reg, AArch32_reg/, describes its
+     *     corresponding 64- and 32-bit register name. "NA" means
+     *     "Not Applicable".
+     *   - Check "struct vcpu_guest_core_regs" for details.
+     */
+    uint64_t x0;     /* x0, r0_usr */
+    uint64_t x1;     /* x1, r1_usr */
+    uint64_t x2;     /* x2, r2_usr */
+    uint64_t x3;     /* x3, r3_usr */
+    uint64_t x4;     /* x4, r4_usr */
+    uint64_t x5;     /* x5, r5_usr */
+    uint64_t x6;     /* x6, r6_usr */
+    uint64_t x7;     /* x7, r7_usr */
+    uint64_t x8;     /* x8, r8_usr */
+    uint64_t x9;     /* x9, r9_usr */
+    uint64_t x10;    /* x10, r10_usr */
+    uint64_t x11;    /* x11, r11_usr */
+    uint64_t x12;    /* x12, r12_usr */
+    uint64_t x13;    /* x13, sp_usr */
+    uint64_t x14;    /* x14, lr_usr; */
+    uint64_t x15;    /* x15, __unused_sp_hyp */
+    uint64_t x16;    /* x16, lr_irq */
+    uint64_t x17;    /* x17, sp_irq */
+    uint64_t x18;    /* x18, lr_svc */
+    uint64_t x19;    /* x19, sp_svc */
+    uint64_t x20;    /* x20, lr_abt */
+    uint64_t x21;    /* x21, sp_abt */
+    uint64_t x22;    /* x22, lr_und */
+    uint64_t x23;    /* x23, sp_und */
+    uint64_t x24;    /* x24, r8_fiq */
+    uint64_t x25;    /* x25, r9_fiq */
+    uint64_t x26;    /* x26, r10_fiq */
+    uint64_t x27;    /* x27, r11_fiq */
+    uint64_t x28;    /* x28, r12_fiq */
+    uint64_t x29;    /* fp, sp_fiq */
+    uint64_t x30;    /* lr, lr_fiq */
+
+    /* return address (EL1 ==> EL0) */
+    uint64_t elr_el1;    /* elr_el1, NA */
+    /* return address (EL2 ==> EL1) */
+    uint64_t pc64;       /* elr_el2, elr_el2 */
+
+    /* spsr registers */
+    uint32_t spsr_el1;   /* spsr_el1, spsr_svc */
+    uint32_t spsr_fiq;   /* NA, spsr_fiq */
+    uint32_t spsr_irq;   /* NA, spsr_irq */
+    uint32_t spsr_und;   /* NA, spsr_und */
+    uint32_t spsr_abt;   /* NA, spsr_abt */
+
+    /* stack pointers */
+    uint64_t sp_el0;     /* sp_el0, NA */
+    uint64_t sp_el1;     /* sp_el1, NA */
+
+    /* guest mode */
+    uint32_t cpsr;   /* spsr_el2, spsr_el2 */
+
+    /* ======= Guest System Registers =======
+     *   - multiplexed for AArch32 and AArch64 guests
+     *   - 64-bit preferred if needed (for 64-bit guests)
+     *   - architecture specific registers are noted specifically
+     */
+    /* exception */
+    uint64_t vbar;      /* vbar, vbar */
+
+    /* mmu related */
+    uint64_t ttbcr;     /* ttbcr, ttbcr */
+    uint64_t ttbr0;     /* ttbr0, ttbr0 */
+    uint64_t ttbr1;     /* ttbr1, ttbr1 */
+    uint32_t dacr;      /* NA, dacr32 */
+
+    uint64_t par;       /* par, par */
+    uint64_t mair0;     /* mair, mair0 */
+    uint64_t mair1;     /* NA, mair1 */
+
+    /* fault status */
+    uint32_t ifar;      /* ifar, ifar */
+    uint32_t ifsr;      /* ifsr, ifsr */
+    uint32_t dfar;      /* dfar, dfar */
+    uint32_t dfsr;      /* dfsr, dfsr */
+
+    uint64_t far;       /* far, far */
+    uint64_t esr;       /* esr, esr */
+
+    uint32_t afsr0;     /* afsr0, afsr0 */
+    uint32_t afsr1;     /* afsr1, afsr1 */
+
+    /* thumbee and jazelle */
+    uint32_t teecr;     /* NA, teecr */
+    uint32_t teehbr;    /* NA, teehbr */
+
+    uint32_t joscr;     /* NA, joscr */
+    uint32_t jmcr;      /* NA, jmcr */
+
+    /* control registers */
+    uint32_t sctlr;     /* sctlr, sctlr */
+    uint32_t actlr;     /* actlr, actlr */
+    uint32_t cpacr;     /* cpacr, cpacr */
+
+    uint32_t csselr;    /* csselr, csselr */
+
+    /* software management related */
+    uint32_t contextidr;  /* contextidr, contextidr */
+    uint64_t tpidr_el0;   /* tpidr_el0, tpidr_el0 */
+    uint64_t tpidr_el1;   /* tpidr_el1, tpidr_el1 */
+    uint64_t tpidrro_el0; /* tpidrro_el0, tdidrro_el0 */
+};
+DECLARE_HVM_SAVE_TYPE(VCPU, 5, struct hvm_arm_cpu);
+
 /*
  * Largest type-code in use
  */
-#define HVM_SAVE_CODE_MAX 4
+#define HVM_SAVE_CODE_MAX 5
 
 #endif
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [RFC v3 5/6] xen/arm: Add log_dirty support for ARM
  2014-05-08 21:18 [RFC v3 0/6] xen/arm: ARM save/restore/migration support Wei Huang
                   ` (3 preceding siblings ...)
  2014-05-08 21:18 ` [RFC v3 4/6] xen/arm: Add save/restore support for guest core registers Wei Huang
@ 2014-05-08 21:18 ` Wei Huang
  2014-05-08 23:46   ` Andrew Cooper
                     ` (3 more replies)
  2014-05-08 21:18 ` [RFC v3 6/6] xen/arm: Implement toolstack for xl restore/save/migration Wei Huang
                   ` (2 subsequent siblings)
  7 siblings, 4 replies; 67+ messages in thread
From: Wei Huang @ 2014-05-08 21:18 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, andrew.cooper3,
	julien.grall, tim, jaeyong.yoo, jbeulich, ian.jackson,
	yjhyun.yoo

This patch implements log_dirty for ARM guest VMs. This feature
is provided via two basic blocks: dirty_bit_map and VLPT
(virtual-linear page table)

1. VLPT provides fast accessing of 3rd PTE of guest P2M.
When creating a mapping for VLPT, the page table mapping
becomes the following:
   xen's 1st PTE --> xen's 2nd PTE --> guest p2m's 2nd PTE -->
   guest p2m's 3rd PTE

With VLPT, xen can immediately locate the 3rd PTE of guest P2M
and modify PTE attirbute during dirty page tracking. The following
link shows the performance comparison for handling a dirty-page
between VLPT and typical page table walking.
http://lists.xen.org/archives/html/xen-devel/2013-08/msg01503.html

For more info about VLPT, please see
http://www.technovelty.org/linux/virtual-linear-page-table.html.

2. Dirty bitmap
The dirty bitmap is used to mark the pages which are dirty during
migration. The info is used by Xen tools, via DOMCTL_SHADOW_OP_*,
to figure out which guest pages need to be resent.

Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com>
Signed-off-by: Evgeny Fedotov <e.fedotov@samsung.com>
Signed-off-by: Wei Huang <w1.huang@samsung.com>
---
 xen/arch/arm/domain.c           |    6 +
 xen/arch/arm/domctl.c           |   31 +++-
 xen/arch/arm/mm.c               |  298 ++++++++++++++++++++++++++++++++++++++-
 xen/arch/arm/p2m.c              |  204 +++++++++++++++++++++++++++
 xen/arch/arm/traps.c            |    9 ++
 xen/include/asm-arm/config.h    |   12 +-
 xen/include/asm-arm/domain.h    |   19 +++
 xen/include/asm-arm/mm.h        |   23 +++
 xen/include/asm-arm/p2m.h       |    8 +-
 xen/include/asm-arm/processor.h |    2 +
 10 files changed, 599 insertions(+), 13 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 40f1c3a..2eb5ce0 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -208,6 +208,9 @@ static void ctxt_switch_to(struct vcpu *n)
 
     isb();
 
+    /* Dirty-page tracing */
+    log_dirty_restore(n->domain);
+
     /* This is could trigger an hardware interrupt from the virtual
      * timer. The interrupt needs to be injected into the guest. */
     WRITE_SYSREG32(n->arch.cntkctl, CNTKCTL_EL1);
@@ -504,6 +507,9 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
     /* Default the virtual ID to match the physical */
     d->arch.vpidr = boot_cpu_data.midr.bits;
 
+    /* Init log dirty support */
+    log_dirty_init(d);
+
     clear_page(d->shared_info);
     share_xen_page_with_guest(
         virt_to_page(d->shared_info), d, XENSHARE_writable);
diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c
index 45974e7..f1c34da 100644
--- a/xen/arch/arm/domctl.c
+++ b/xen/arch/arm/domctl.c
@@ -10,30 +10,53 @@
 #include <xen/errno.h>
 #include <xen/sched.h>
 #include <xen/hypercall.h>
+#include <xen/guest_access.h>
 #include <public/domctl.h>
 
 long arch_do_domctl(struct xen_domctl *domctl, struct domain *d,
                     XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
 {
+    long ret = 0;
+    bool_t copyback = 0;
+
     switch ( domctl->cmd )
     {
+    case XEN_DOMCTL_shadow_op:
+    {
+        ret = -EINVAL;
+        copyback = 1;
+
+        if ( (d == current->domain) )   /* no domain_pause() */
+            break;
+
+        domain_pause(d);
+        ret = dirty_mode_op(d, &domctl->u.shadow_op);
+        domain_unpause(d);
+    }
+    break;
+
     case XEN_DOMCTL_cacheflush:
     {
         unsigned long s = domctl->u.cacheflush.start_pfn;
         unsigned long e = s + domctl->u.cacheflush.nr_pfns;
 
         if ( domctl->u.cacheflush.nr_pfns > (1U<<MAX_ORDER) )
-            return -EINVAL;
+            ret = -EINVAL;
 
         if ( e < s )
-            return -EINVAL;
+            ret = -EINVAL;
 
-        return p2m_cache_flush(d, s, e);
+        ret = p2m_cache_flush(d, s, e);
     }
 
     default:
-        return subarch_do_domctl(domctl, d, u_domctl);
+        ret = subarch_do_domctl(domctl, d, u_domctl);
     }
+
+    if ( copyback && __copy_to_guest(u_domctl, domctl, 1) )
+        ret = -EFAULT;
+
+    return ret;
 }
 
 void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c)
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index eac228c..81c0691 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -865,7 +865,6 @@ void destroy_xen_mappings(unsigned long v, unsigned long e)
     create_xen_entries(REMOVE, v, 0, (e - v) >> PAGE_SHIFT, 0);
 }
 
-enum mg { mg_clear, mg_ro, mg_rw, mg_rx };
 static void set_pte_flags_on_range(const char *p, unsigned long l, enum mg mg)
 {
     lpae_t pte;
@@ -945,11 +944,6 @@ int page_is_ram_type(unsigned long mfn, unsigned long mem_type)
     return 0;
 }
 
-unsigned long domain_get_maximum_gpfn(struct domain *d)
-{
-    return -ENOSYS;
-}
-
 void share_xen_page_with_guest(struct page_info *page,
                           struct domain *d, int readonly)
 {
@@ -1235,6 +1229,298 @@ int is_iomem_page(unsigned long mfn)
         return 1;
     return 0;
 }
+
+
+/* Return start and end addr of guest RAM. Note this function only reports 
+ * regular RAM. It does not cover other areas such as foreign mapped
+ * pages or MMIO space. */
+void domain_get_ram_range(struct domain *d, paddr_t *start, paddr_t *end)
+{
+    if ( start )
+        *start = GUEST_RAM_BASE;
+
+    if ( end )
+        *end = GUEST_RAM_BASE + ((paddr_t) d->max_pages << PAGE_SHIFT);
+}
+
+/* Return the maximum GPFN of guest VM. It covers all guest memory types. */
+unsigned long domain_get_maximum_gpfn(struct domain *d)
+{
+    struct p2m_domain *p2m = &d->arch.p2m;
+
+    return p2m->max_mapped_gfn;
+}
+
+/************************************/
+/*    Dirty Page Tracking Support   */
+/************************************/
+/* Mark the bitmap for a corresponding page as dirty */
+static inline void bitmap_mark_dirty(struct domain *d, paddr_t addr)
+{
+    paddr_t ram_base = (paddr_t) GUEST_RAM_BASE;
+    int bit_index = PFN_DOWN(addr - ram_base);
+    int page_index = bit_index >> (PAGE_SHIFT + 3);
+    int bit_index_residual = bit_index & ((1ul << (PAGE_SHIFT + 3)) - 1);
+
+    set_bit(bit_index_residual, d->arch.dirty.bitmap[page_index]);
+}
+
+/* Allocate dirty bitmap resource */
+static int bitmap_init(struct domain *d)
+{
+    paddr_t gma_start = 0;
+    paddr_t gma_end = 0;
+    int nr_bytes;
+    int nr_pages;
+    int i;
+
+    domain_get_ram_range(d, &gma_start, &gma_end);
+
+    nr_bytes = (PFN_DOWN(gma_end - gma_start) + 7) / 8;
+    nr_pages = (nr_bytes + PAGE_SIZE - 1) / PAGE_SIZE;
+
+    BUG_ON(nr_pages > MAX_DIRTY_BITMAP_PAGES);
+
+    for ( i = 0; i < nr_pages; ++i )
+    {
+        struct page_info *page;
+        page = alloc_domheap_page(NULL, 0);
+        if ( page == NULL )
+            goto cleanup_on_failure;
+
+        d->arch.dirty.bitmap[i] = map_domain_page_global(__page_to_mfn(page));
+        clear_page(d->arch.dirty.bitmap[i]);
+    }
+
+    d->arch.dirty.bitmap_pages = nr_pages;
+    return 0;
+
+cleanup_on_failure:
+    nr_pages = i;
+    for ( i = 0; i < nr_pages; ++i )
+    {
+        unmap_domain_page_global(d->arch.dirty.bitmap[i]);
+    }
+
+    return -ENOMEM;
+}
+
+/* Cleanup dirty bitmap resource */
+static void bitmap_cleanup(struct domain *d)
+{
+    int i;
+
+    for ( i = 0; i < d->arch.dirty.bitmap_pages; ++i )
+    {
+        unmap_domain_page_global(d->arch.dirty.bitmap[i]);
+    }
+}
+
+/* Flush VLPT area */
+static void vlpt_flush(struct domain *d)
+{
+    int flush_size;
+    flush_size = (d->arch.dirty.second_lvl_end - 
+                  d->arch.dirty.second_lvl_start) << SECOND_SHIFT;
+
+    /* flushing the 3rd level mapping */
+    flush_xen_data_tlb_range_va(d->arch.dirty.second_lvl_start << SECOND_SHIFT,
+                                flush_size);
+}
+
+/* Set up a page table for VLPT mapping */
+static int vlpt_init(struct domain *d)
+{
+    uint64_t required, avail = VIRT_LIN_P2M_END - VIRT_LIN_P2M_START;
+    int xen_second_linear_base;
+    int gp2m_start_index, gp2m_end_index;
+    struct p2m_domain *p2m = &d->arch.p2m;
+    struct page_info *second_lvl_page;
+    paddr_t gma_start = 0;
+    paddr_t gma_end = 0;
+    lpae_t *first[2];
+    int i;
+
+    /* Check if reserved space is enough to cover guest physical address space.
+     * Note that each LPAE page table entry is 64-bit (8 bytes). So we only
+     * shift left with LPAE_SHIFT instead of PAGE_SHIFT. */
+    domain_get_ram_range(d, &gma_start, &gma_end);
+    required = (gma_end - gma_start) >> LPAE_SHIFT;
+    if ( required > avail )
+    {
+        dprintk(XENLOG_ERR, "Available VLPT is small for domU guest (avail: "
+                "%#llx, required: %#llx)\n", (unsigned long long)avail,
+                (unsigned long long)required);
+        return -ENOMEM;
+    }
+
+    /* Caulculate the base of 2nd linear table base for VIRT_LIN_P2M_START */
+    xen_second_linear_base = second_linear_offset(VIRT_LIN_P2M_START);
+
+    gp2m_start_index = gma_start >> FIRST_SHIFT;
+    gp2m_end_index = (gma_end >> FIRST_SHIFT) + 1;
+
+    if ( xen_second_linear_base + gp2m_end_index >= LPAE_ENTRIES * 2 )
+    {
+        dprintk(XENLOG_ERR, "xen second page is small for VLPT for domU");
+        return -ENOMEM;
+    }
+
+    /* Two pages are allocated to backup the related PTE content of guest 
+     * VM's 1st-level table. */
+    second_lvl_page = alloc_domheap_pages(NULL, 1, 0);
+    if ( second_lvl_page == NULL )
+        return -ENOMEM;
+    d->arch.dirty.second_lvl[0] = map_domain_page_global(
+        page_to_mfn(second_lvl_page) );
+    d->arch.dirty.second_lvl[1] = map_domain_page_global(
+        page_to_mfn(second_lvl_page+1) );
+
+    /* 1st level P2M of guest VM is 2 consecutive pages */
+    first[0] = __map_domain_page(p2m->first_level);
+    first[1] = __map_domain_page(p2m->first_level+1);
+
+    for ( i = gp2m_start_index; i < gp2m_end_index; ++i )
+    {
+        int k = i % LPAE_ENTRIES;
+        int l = i / LPAE_ENTRIES;
+        int k2 = (xen_second_linear_base + i) % LPAE_ENTRIES;
+        int l2 = (xen_second_linear_base + i) / LPAE_ENTRIES;
+
+        /* Update 2nd-level PTE of Xen linear table. With this, Xen linear 
+         * page table layout becomes: 1st Xen linear ==> 2nd Xen linear ==> 
+         * 2nd guest P2M (i.e. 3rd Xen linear) ==> 3rd guest P2M (i.e. Xen 
+         * linear content) for VIRT_LIN_P2M_START address space. */
+        write_pte(&xen_second[xen_second_linear_base+i], first[l][k]);
+
+        /* We copy the mapping into domain's structure as a reference
+         * in case of the context switch (used in vlpt_restore function ) */
+        d->arch.dirty.second_lvl[l2][k2] = first[l][k];
+    }
+    unmap_domain_page(first[0]);
+    unmap_domain_page(first[1]);
+
+    /* storing the start and end index */
+    d->arch.dirty.second_lvl_start = xen_second_linear_base + gp2m_start_index;
+    d->arch.dirty.second_lvl_end = xen_second_linear_base + gp2m_end_index;
+
+    vlpt_flush(d);
+
+    return 0;
+}
+
+static void vlpt_cleanup(struct domain *d)
+{
+    /* First level p2m is 2 consecutive pages */
+    unmap_domain_page_global(d->arch.dirty.second_lvl[0]);
+    unmap_domain_page_global(d->arch.dirty.second_lvl[1]);
+}
+
+/* Returns zero if addr is not valid or dirty mode is not set */
+int handle_page_fault(struct domain *d, paddr_t addr)
+{
+    lpae_t *vlp2m_pte = 0;
+    paddr_t gma_start = 0;
+    paddr_t gma_end = 0;
+
+    if ( !d->arch.dirty.mode )
+        return 0;
+
+    domain_get_ram_range(d, &gma_start, &gma_end);
+
+    /* Ensure that addr is inside guest's RAM */
+    if ( addr < gma_start || addr > gma_end )
+        return 0;
+
+    vlp2m_pte = vlpt_get_3lvl_pte(addr);
+    if ( vlp2m_pte->p2m.valid && vlp2m_pte->p2m.write == 0 &&
+         vlp2m_pte->p2m.type == p2m_ram_logdirty )
+    {
+        lpae_t pte = *vlp2m_pte;
+        pte.p2m.write = 1;
+        write_pte(vlp2m_pte, pte);
+        flush_tlb_local();
+
+        /* only necessary to lock between get-dirty bitmap and mark dirty
+         * bitmap. If get-dirty bitmap happens immediately before this
+         * lock, the corresponding dirty-page would be marked at the next
+         * round of get-dirty bitmap */
+        spin_lock(&d->arch.dirty.lock);
+        bitmap_mark_dirty(d, addr);
+        spin_unlock(&d->arch.dirty.lock);
+    }
+
+    return 1;
+}
+
+/* Restore the xen page table for vlpt mapping for domain */
+void log_dirty_restore(struct domain *d)
+{
+    int i;
+
+    /* Nothing to do as log dirty mode is off */
+    if ( !(d->arch.dirty.mode) )
+        return;
+
+    dsb(sy);
+
+    for ( i = d->arch.dirty.second_lvl_start; i < d->arch.dirty.second_lvl_end;
+          ++i )
+    {
+        int k = i % LPAE_ENTRIES;
+        int l = i / LPAE_ENTRIES;
+
+        if ( xen_second[i].bits != d->arch.dirty.second_lvl[l][k].bits )
+        {
+            write_pte(&xen_second[i], d->arch.dirty.second_lvl[l][k]);
+            flush_xen_data_tlb_range_va(i << SECOND_SHIFT, 1 << SECOND_SHIFT);
+        }
+    }
+
+    dsb(sy);
+    isb();
+}
+
+/* Turn on log dirty */
+int log_dirty_on(struct domain *d)
+{
+    if ( vlpt_init(d) || bitmap_init(d) )
+        return -EINVAL;
+
+    return 0;
+}
+
+/* Turn off log dirty */
+void log_dirty_off(struct domain *d)
+{
+    bitmap_cleanup(d);
+    vlpt_cleanup(d);
+}
+
+/* Initialize log dirty fields */
+int log_dirty_init(struct domain *d)
+{
+    d->arch.dirty.count = 0;
+    d->arch.dirty.mode = 0;
+    spin_lock_init(&d->arch.dirty.lock);
+
+    d->arch.dirty.second_lvl_start = 0;
+    d->arch.dirty.second_lvl_end = 0;
+    d->arch.dirty.second_lvl[0] = NULL;
+    d->arch.dirty.second_lvl[1] = NULL;
+
+    memset(d->arch.dirty.bitmap, 0, sizeof(d->arch.dirty.bitmap));
+    d->arch.dirty.bitmap_pages = 0;
+
+    return 0;
+}
+
+/* Log dirty tear down */
+void log_dirty_teardown(struct domain *d)
+{
+    return;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 603c097..0808cc9 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -6,6 +6,8 @@
 #include <xen/bitops.h>
 #include <asm/flushtlb.h>
 #include <asm/gic.h>
+#include <xen/guest_access.h>
+#include <xen/pfn.h>
 #include <asm/event.h>
 #include <asm/hardirq.h>
 #include <asm/page.h>
@@ -208,6 +210,7 @@ static lpae_t mfn_to_p2m_entry(unsigned long mfn, unsigned int mattr,
         break;
 
     case p2m_ram_ro:
+    case p2m_ram_logdirty:
         e.p2m.xn = 0;
         e.p2m.write = 0;
         break;
@@ -261,6 +264,10 @@ static int p2m_create_table(struct domain *d,
 
     pte = mfn_to_p2m_entry(page_to_mfn(page), MATTR_MEM, p2m_invalid);
 
+    /* mark the write bit (page table's case, ro bit) as 0
+     * so, it is writable in case of vlpt access */
+    pte.pt.ro = 0;
+
     write_pte(entry, pte);
 
     return 0;
@@ -696,6 +703,203 @@ unsigned long gmfn_to_mfn(struct domain *d, unsigned long gpfn)
     return p >> PAGE_SHIFT;
 }
 
+/* Change types across all p2m entries in a domain */
+void p2m_change_entry_type_global(struct domain *d, enum mg nt)
+{
+    struct p2m_domain *p2m = &d->arch.p2m;
+    paddr_t ram_base;
+    int i1, i2, i3;
+    int first_index, second_index, third_index;
+    lpae_t *first = __map_domain_page(p2m->first_level);
+    lpae_t pte, *second = NULL, *third = NULL;
+
+    domain_get_ram_range(d, &ram_base, NULL);
+
+    first_index = first_table_offset((uint64_t)ram_base);
+    second_index = second_table_offset((uint64_t)ram_base);
+    third_index = third_table_offset((uint64_t)ram_base);
+
+    BUG_ON(!first);
+
+    spin_lock(&p2m->lock);
+
+    for ( i1 = first_index; i1 < LPAE_ENTRIES*2; ++i1 )
+    {
+        lpae_walk_t first_pte = first[i1].walk;
+        if ( !first_pte.valid || !first_pte.table )
+            goto out;
+
+        second = map_domain_page(first_pte.base);
+        BUG_ON(!second);
+
+        for ( i2 = second_index; i2 < LPAE_ENTRIES; ++i2 )
+        {
+            lpae_walk_t second_pte = second[i2].walk;
+
+            if ( !second_pte.valid || !second_pte.table )
+                goto out;
+
+            third = map_domain_page(second_pte.base);
+            BUG_ON(!third);
+
+            for ( i3 = third_index; i3 < LPAE_ENTRIES; ++i3 )
+            {
+                lpae_walk_t third_pte = third[i3].walk;
+
+                if ( !third_pte.valid )
+                    goto out;
+
+                pte = third[i3];
+
+                if ( nt == mg_ro )
+                {
+                    if ( pte.p2m.write == 1 )
+                    {
+                        pte.p2m.write = 0;
+                        pte.p2m.type = p2m_ram_logdirty;
+                    }
+                    else
+                    {
+                        /* reuse avail bit as an indicator of 'actual' 
+                         * read-only */
+                        pte.p2m.type = p2m_ram_rw;
+                    }
+                }
+                else if ( nt == mg_rw )
+                {
+                    if ( pte.p2m.write == 0 && 
+                         pte.p2m.type == p2m_ram_logdirty )
+                    {
+                        pte.p2m.write = p2m_ram_rw;
+                    }
+                }
+                write_pte(&third[i3], pte);
+            }
+            unmap_domain_page(third);
+
+            third = NULL;
+            third_index = 0;
+        }
+        unmap_domain_page(second);
+
+        second = NULL;
+        second_index = 0;
+        third_index = 0;
+    }
+
+out:
+    flush_tlb_all_local();
+    if ( third ) unmap_domain_page(third);
+    if ( second ) unmap_domain_page(second);
+    if ( first ) unmap_domain_page(first);
+
+    spin_unlock(&p2m->lock);
+}
+
+/* Read a domain's log-dirty bitmap and stats. If the operation is a CLEAN, 
+ * clear the bitmap and stats. */
+int log_dirty_op(struct domain *d, xen_domctl_shadow_op_t *sc)
+{
+    int peek = 1;
+    int i;
+    int bitmap_size;
+    paddr_t gma_start, gma_end;
+
+    /* this hypercall is called from domain 0, and we don't know which guest's
+     * vlpt is mapped in xen_second, so, to be sure, we restore vlpt here */
+    log_dirty_restore(d);
+
+    domain_get_ram_range(d, &gma_start, &gma_end);
+    bitmap_size = (gma_end - gma_start) / 8;
+
+    if ( guest_handle_is_null(sc->dirty_bitmap) )
+    {
+        peek = 0;
+    }
+    else
+    {
+        spin_lock(&d->arch.dirty.lock);
+
+        for ( i = 0; i < d->arch.dirty.bitmap_pages; ++i )
+        {
+            int j = 0;
+            uint8_t *bitmap;
+
+            copy_to_guest_offset(sc->dirty_bitmap, i * PAGE_SIZE,
+                                 d->arch.dirty.bitmap[i],
+                                 bitmap_size < PAGE_SIZE ? bitmap_size :
+                                                           PAGE_SIZE);
+            bitmap_size -= PAGE_SIZE;
+
+            /* set p2m page table read-only */
+            bitmap = d->arch.dirty.bitmap[i];
+            while ((j = find_next_bit((const long unsigned int *)bitmap,
+                                      PAGE_SIZE*8, j)) < PAGE_SIZE*8)
+            {
+                lpae_t *vlpt;
+                paddr_t addr = gma_start + (i << (2*PAGE_SHIFT+3)) +
+                    (j << PAGE_SHIFT);
+                vlpt = vlpt_get_3lvl_pte(addr);
+                vlpt->p2m.write = 0;
+                j++;
+            }
+        }
+
+        if ( sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN )
+        {
+            for ( i = 0; i < d->arch.dirty.bitmap_pages; ++i )
+            {
+                clear_page(d->arch.dirty.bitmap[i]);
+            }
+        }
+
+        spin_unlock(&d->arch.dirty.lock);
+        flush_tlb_local();
+    }
+
+    sc->stats.dirty_count = d->arch.dirty.count;
+
+    return 0;
+}
+
+long dirty_mode_op(struct domain *d, xen_domctl_shadow_op_t *sc)
+{
+    long ret = 0;
+    switch (sc->op)
+    {
+        case XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY:
+        case XEN_DOMCTL_SHADOW_OP_OFF:
+        {
+            enum mg nt = sc->op == XEN_DOMCTL_SHADOW_OP_OFF ? mg_rw : mg_ro;
+
+            d->arch.dirty.mode = sc->op == XEN_DOMCTL_SHADOW_OP_OFF ? 0 : 1;
+            p2m_change_entry_type_global(d, nt);
+
+            if ( sc->op == XEN_DOMCTL_SHADOW_OP_OFF )
+            {
+                log_dirty_off(d);
+            }
+            else
+            {
+                if ( (ret = log_dirty_on(d)) )
+                    return ret;
+            }
+        }
+        break;
+
+        case XEN_DOMCTL_SHADOW_OP_CLEAN:
+        case XEN_DOMCTL_SHADOW_OP_PEEK:
+        {
+            ret = log_dirty_op(d, sc);
+        }
+        break;
+
+        default:
+            return -ENOSYS;
+    }
+    return ret;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index df4d375..b652565 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1556,6 +1556,8 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs,
     struct hsr_dabt dabt = hsr.dabt;
     int rc;
     mmio_info_t info;
+    int page_fault = ( dabt.write && ((dabt.dfsc & FSC_MASK) == 
+                                      (FSC_FLT_PERM|FSC_3RD_LEVEL)) );
 
     if ( !check_conditional_instr(regs, hsr) )
     {
@@ -1577,6 +1579,13 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs,
     if ( rc == -EFAULT )
         goto bad_data_abort;
 
+    /* domU page fault handling for guest live migration. Note that 
+     * dabt.valid can be 0 here */
+    if ( page_fault && handle_page_fault(current->domain, info.gpa) )
+    {
+        /* Do not modify PC as guest needs to repeat memory operation */
+        return;
+    }
     /* XXX: Decode the instruction if ISS is not valid */
     if ( !dabt.valid )
         goto bad_data_abort;
diff --git a/xen/include/asm-arm/config.h b/xen/include/asm-arm/config.h
index ef291ff..f18fae4 100644
--- a/xen/include/asm-arm/config.h
+++ b/xen/include/asm-arm/config.h
@@ -87,6 +87,7 @@
  *   0  -   8M   <COMMON>
  *
  *  32M - 128M   Frametable: 24 bytes per page for 16GB of RAM
+ * 128M - 256M   Virtual-linear mapping to P2M table
  * 256M -   1G   VMAP: ioremap and early_ioremap use this virtual address
  *                    space
  *
@@ -124,13 +125,15 @@
 #define CONFIG_SEPARATE_XENHEAP 1
 
 #define FRAMETABLE_VIRT_START  _AT(vaddr_t,0x02000000)
-#define VMAP_VIRT_START  _AT(vaddr_t,0x10000000)
+#define VIRT_LIN_P2M_START     _AT(vaddr_t,0x08000000)
+#define VMAP_VIRT_START        _AT(vaddr_t,0x10000000)
+#define VIRT_LIN_P2M_END       VMAP_VIRT_START
 #define XENHEAP_VIRT_START     _AT(vaddr_t,0x40000000)
 #define XENHEAP_VIRT_END       _AT(vaddr_t,0x7fffffff)
 #define DOMHEAP_VIRT_START     _AT(vaddr_t,0x80000000)
 #define DOMHEAP_VIRT_END       _AT(vaddr_t,0xffffffff)
 
-#define VMAP_VIRT_END    XENHEAP_VIRT_START
+#define VMAP_VIRT_END          XENHEAP_VIRT_START
 
 #define DOMHEAP_ENTRIES        1024  /* 1024 2MB mapping slots */
 
@@ -157,6 +160,11 @@
 
 #define HYPERVISOR_VIRT_END    DIRECTMAP_VIRT_END
 
+/* Definition for VIRT_LIN_P2M_START and VIRT_LIN_P2M_END (64-bit)
+ * TODO: Needs evaluation. */
+#define VIRT_LIN_P2M_START     _AT(vaddr_t, 0x08000000)
+#define VIRT_LIN_P2M_END       VMAP_VIRT_START
+
 #endif
 
 /* Fixmap slots */
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index aabeb51..ac82643 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -162,6 +162,25 @@ struct arch_domain
     } vuart;
 
     unsigned int evtchn_irq;
+
+    /* dirty page tracing */
+    struct {
+        spinlock_t lock;
+        volatile int mode;               /* 1 if dirty pages tracing enabled */
+        volatile unsigned int count;     /* dirty pages counter */
+
+        /* vlpt context switch */
+        volatile int second_lvl_start; /* start idx of virt linear space 2nd */
+        volatile int second_lvl_end;   /* end idx of virt linear space 2nd */
+        lpae_t *second_lvl[2];         /* copy of guest P2M 1st-lvl content */
+
+        /* bitmap to track dirty pages */
+#define MAX_DIRTY_BITMAP_PAGES 64
+        /* Because each bit represents a dirty page, the total supported guest 
+         * memory is (64 entries x 4KB/entry x 8bits/byte x 4KB) = 8GB. */
+        uint8_t *bitmap[MAX_DIRTY_BITMAP_PAGES]; /* dirty bitmap */
+        int bitmap_pages;                        /* # of dirty bitmap pages */
+    } dirty;
 }  __cacheline_aligned;
 
 struct arch_vcpu
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index b8d4e7d..ab19025 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -4,6 +4,7 @@
 #include <xen/config.h>
 #include <xen/kernel.h>
 #include <asm/page.h>
+#include <asm/config.h>
 #include <public/xen.h>
 
 /* Align Xen to a 2 MiB boundary. */
@@ -320,6 +321,7 @@ int donate_page(
 #define domain_clamp_alloc_bitsize(d, b) (b)
 
 unsigned long domain_get_maximum_gpfn(struct domain *d);
+void domain_get_ram_range(struct domain *d, paddr_t *start, paddr_t *end);
 
 extern struct domain *dom_xen, *dom_io, *dom_cow;
 
@@ -341,6 +343,27 @@ static inline void put_page_and_type(struct page_info *page)
     put_page(page);
 }
 
+enum mg { mg_clear, mg_ro, mg_rw, mg_rx };
+
+/************************************/
+/*    Log-dirty support functions   */
+/************************************/
+int log_dirty_on(struct domain *d);
+void log_dirty_off(struct domain *d);
+int log_dirty_init(struct domain *d);
+void log_dirty_teardown(struct domain *d);
+void log_dirty_restore(struct domain *d);
+int handle_page_fault(struct domain *d, paddr_t addr);
+/* access leaf PTE of a given guest address (GPA) */
+static inline lpae_t * vlpt_get_3lvl_pte(paddr_t addr)
+{
+    lpae_t *table = (lpae_t *)VIRT_LIN_P2M_START;
+
+    /* Since we slotted the guest's first p2m page table to xen's
+     * second page table, one shift is enough for calculating the
+     * index of guest p2m table entry */
+    return &table[addr >> PAGE_SHIFT];
+}
 #endif /*  __ARCH_ARM_MM__ */
 /*
  * Local variables:
diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
index bd71abe..0cecbe7 100644
--- a/xen/include/asm-arm/p2m.h
+++ b/xen/include/asm-arm/p2m.h
@@ -2,6 +2,7 @@
 #define _XEN_P2M_H
 
 #include <xen/mm.h>
+#include <public/domctl.h>
 
 struct domain;
 
@@ -41,6 +42,7 @@ typedef enum {
     p2m_invalid = 0,    /* Nothing mapped here */
     p2m_ram_rw,         /* Normal read/write guest RAM */
     p2m_ram_ro,         /* Read-only; writes are silently dropped */
+    p2m_ram_logdirty,   /* Read-only: special mode for log dirty */
     p2m_mmio_direct,    /* Read/write mapping of genuine MMIO area */
     p2m_map_foreign,    /* Ram pages from foreign domain */
     p2m_grant_map_rw,   /* Read/write grant mapping */
@@ -49,7 +51,8 @@ typedef enum {
 } p2m_type_t;
 
 #define p2m_is_foreign(_t)  ((_t) == p2m_map_foreign)
-#define p2m_is_ram(_t)      ((_t) == p2m_ram_rw || (_t) == p2m_ram_ro)
+#define p2m_is_ram(_t)      ((_t) == p2m_ram_rw || (_t) == p2m_ram_ro ||  \
+                             (_t) == p2m_ram_logdirty)
 
 /* Initialise vmid allocator */
 void p2m_vmid_allocator_init(void);
@@ -178,6 +181,9 @@ static inline int get_page_and_type(struct page_info *page,
     return rc;
 }
 
+void p2m_change_entry_type_global(struct domain *d, enum mg nt);
+long dirty_mode_op(struct domain *d, xen_domctl_shadow_op_t *sc);
+
 #endif /* _XEN_P2M_H */
 
 /*
diff --git a/xen/include/asm-arm/processor.h b/xen/include/asm-arm/processor.h
index 750864a..0bf3d67 100644
--- a/xen/include/asm-arm/processor.h
+++ b/xen/include/asm-arm/processor.h
@@ -407,6 +407,8 @@ union hsr {
 #define FSC_CPR        (0x3a) /* Coprocossor Abort */
 
 #define FSC_LL_MASK    (_AC(0x03,U)<<0)
+#define FSC_MASK       (0x3f) /* Fault status mask */
+#define FSC_3RD_LEVEL  (0x03) /* Third level fault */
 
 /* Time counter hypervisor control register */
 #define CNTHCTL_PA      (1u<<0)  /* Kernel/user access to physical counter */
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [RFC v3 6/6] xen/arm: Implement toolstack for xl restore/save/migration
  2014-05-08 21:18 [RFC v3 0/6] xen/arm: ARM save/restore/migration support Wei Huang
                   ` (4 preceding siblings ...)
  2014-05-08 21:18 ` [RFC v3 5/6] xen/arm: Add log_dirty support for ARM Wei Huang
@ 2014-05-08 21:18 ` Wei Huang
  2014-05-14 13:20   ` Ian Campbell
  2014-05-11  9:23 ` [RFC v3 0/6] xen/arm: ARM save/restore/migration support Julien Grall
  2014-05-12 14:17 ` Julien Grall
  7 siblings, 1 reply; 67+ messages in thread
From: Wei Huang @ 2014-05-08 21:18 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, andrew.cooper3,
	julien.grall, tim, jaeyong.yoo, jbeulich, ian.jackson,
	yjhyun.yoo

This patch implements xl save/restore operation in xc_arm_migrate.c
and make it compilable with existing design. The operation is also
used by migration.

The overall process of save/restore is the following: 1) save guest
parameters; 2) save memory; 3) save HVM states.

Signed-off-by: Alexey Sokolov <sokolov.a@samsung.com>
Signed-off-by: Wei Huang <w1.huang@samsung.com>
---
 config/arm32.mk              |    1 +
 config/arm64.mk              |    1 +
 tools/libxc/Makefile         |    5 +
 tools/libxc/xc_arm_migrate.c |  653 ++++++++++++++++++++++++++++++++++++++++++
 tools/libxc/xc_dom_arm.c     |    4 +-
 tools/libxc/xc_resume.c      |   20 +-
 tools/libxl/libxl.h          |    3 -
 tools/misc/Makefile          |    4 +-
 8 files changed, 677 insertions(+), 14 deletions(-)
 create mode 100644 tools/libxc/xc_arm_migrate.c

diff --git a/config/arm32.mk b/config/arm32.mk
index aa79d22..01374c9 100644
--- a/config/arm32.mk
+++ b/config/arm32.mk
@@ -1,6 +1,7 @@
 CONFIG_ARM := y
 CONFIG_ARM_32 := y
 CONFIG_ARM_$(XEN_OS) := y
+CONFIG_MIGRATE := y
 
 CONFIG_XEN_INSTALL_SUFFIX :=
 
diff --git a/config/arm64.mk b/config/arm64.mk
index 15b57a4..7ac3b65 100644
--- a/config/arm64.mk
+++ b/config/arm64.mk
@@ -1,6 +1,7 @@
 CONFIG_ARM := y
 CONFIG_ARM_64 := y
 CONFIG_ARM_$(XEN_OS) := y
+CONFIG_MIGRATE := y
 
 CONFIG_XEN_INSTALL_SUFFIX :=
 
diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index a74b19e..9aaa6ff 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -43,8 +43,13 @@ CTRL_SRCS-$(CONFIG_MiniOS) += xc_minios.c
 GUEST_SRCS-y :=
 GUEST_SRCS-y += xg_private.c xc_suspend.c
 ifeq ($(CONFIG_MIGRATE),y)
+ifeq ($(CONFIG_X86),y)
 GUEST_SRCS-y += xc_domain_restore.c xc_domain_save.c
 GUEST_SRCS-y += xc_offline_page.c xc_compression.c
+endif
+ifeq ($(CONFIG_ARM),y)
+GUEST_SRCS-y += xc_arm_migrate.c
+endif
 else
 GUEST_SRCS-y += xc_nomigrate.c
 endif
diff --git a/tools/libxc/xc_arm_migrate.c b/tools/libxc/xc_arm_migrate.c
new file mode 100644
index 0000000..776e373
--- /dev/null
+++ b/tools/libxc/xc_arm_migrate.c
@@ -0,0 +1,653 @@
+/*
+ * Copyright (c) 2014 Samsung Electronics.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ */
+
+#include <inttypes.h>
+#include <errno.h>
+#include <xenctrl.h>
+#include <xenguest.h>
+
+#include <unistd.h>
+#include <xc_private.h>
+#include <xc_dom.h>
+#include "xc_bitops.h"
+#include "xg_private.h"
+
+#define DEF_MAX_ITERS          29 /* limit us to 30 times round loop   */
+#define DEF_MAX_FACTOR         3  /* never send more than 3x p2m_size  */
+#define DEF_MIN_DIRTY_PER_ITER 50 /* dirty page count to define last iter */
+#define DEF_PROGRESS_RATE      50 /* progress bar update rate */
+
+/* Guest params to save */
+typedef struct guest_params
+{
+    unsigned long console_pfn;
+    unsigned long store_pfn;
+    uint32_t flags;
+    xen_pfn_t start_gpfn;
+    xen_pfn_t max_gpfn;
+    uint32_t max_vcpu_id;
+} guest_params_t;
+
+/***********************************/
+/*           Misc Support          */
+/***********************************/
+static int suspend_and_state(int (*suspend)(void*), void *data,
+                             xc_interface *xch, int dom)
+{
+    xc_dominfo_t info;
+
+    if ( !(*suspend)(data) )
+    {
+        PERROR("Suspend request failed");
+        return -1;
+    }
+
+    if ( (xc_domain_getinfo(xch, dom, 1, &info) != 1) || info.domid != dom ||
+         !info.shutdown || (info.shutdown_reason != SHUTDOWN_suspend) )
+    {
+        PERROR("Domain is not in suspended state after suspend attempt");
+        return -1;
+    }
+
+    return 0;
+}
+
+/***********************************/
+/*           Guest Memory          */
+/***********************************/
+static int save_guest_memory(xc_interface *xch, int io_fd, uint32_t dom,
+                             struct save_callbacks *callbacks,
+                             uint32_t max_iters, uint32_t max_factor,
+                             guest_params_t *params)
+{
+    int live =  !!(params->flags & XCFLAGS_LIVE);
+    int debug =  !!(params->flags & XCFLAGS_DEBUG);
+    xen_pfn_t i;
+    char reportbuf[80];
+    int iter = 0;
+    int last_iter = !live;
+    int total_dirty = 0;    /* total number of dirty pages */
+    int prev_iter_dirty = 0;
+    int count = 0;
+    char *page = NULL;
+    xen_pfn_t *busy_pages = 0;
+    int busy_pages_count = 0;
+    int busy_pages_max = 256;
+    DECLARE_HYPERCALL_BUFFER(unsigned long, to_send);
+    xen_pfn_t start = params->start_gpfn;
+    const xen_pfn_t end = params->max_gpfn;
+    const xen_pfn_t mem_size = end - start;    
+    int rc = -1, frc;
+
+    if ( debug )
+    {
+        IPRINTF("Saving memory: start=0x%llx end=0x%llx!\n",
+                (unsigned long long)start, (unsigned long long)end);
+    }
+
+    /* Note: to_send will be deallocated at the end of this function. */
+    to_send = xc_hypercall_buffer_alloc_pages(xch, to_send,
+                                              NRPAGES(bitmap_size(mem_size)));
+    if ( !to_send )
+    {
+        PERROR("Can't allocate to_send array!\n");
+        goto out;
+    }
+    /* send all pages on first iter */
+    memset(to_send, 0xff, bitmap_size(mem_size));
+
+    if ( live )
+    {
+        if ( xc_shadow_control(xch, dom, XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY,
+                               NULL, 0, NULL, 0, NULL) < 0 )
+        {
+            PERROR("Can't enable log-dirty mode!\n");
+            goto out;
+        }
+
+        max_iters  = max_iters  ? : DEF_MAX_ITERS;
+        max_factor = max_factor ? : DEF_MAX_FACTOR;
+    }
+
+    for ( ; ; )
+    {
+        int current_iter_dirty = 0; /* num of dirty pages in current iter */
+
+        iter++;
+        snprintf(reportbuf, sizeof(reportbuf),
+                 "Saving memory: iter %d, last sent %u", iter, 
+                 prev_iter_dirty);
+
+        xc_report_progress_start(xch, reportbuf, mem_size);
+
+        if ( (iter > 1 && prev_iter_dirty < DEF_MIN_DIRTY_PER_ITER) ||
+             (iter == max_iters) ||
+             (total_dirty >= mem_size*max_factor) )
+        {
+            last_iter = 1;
+        }
+
+        if ( last_iter )
+        {
+            if ( suspend_and_state(callbacks->suspend, callbacks->data, xch, 
+                                   dom) )
+            {
+                PERROR("Domain appears not to have suspended");
+                goto out;
+            }
+        }
+
+        if ( live && iter > 1 )
+        {
+            frc = xc_shadow_control(xch, dom, XEN_DOMCTL_SHADOW_OP_CLEAN,
+                                    HYPERCALL_BUFFER(to_send), mem_size,
+                                    NULL, 0, NULL);
+            if ( frc != mem_size )
+            {
+                PERROR("Can't peek bitmap of guest VM's dirty pages");
+                goto out;
+            }
+        }
+
+        /* busy_pages allocation */
+        busy_pages = malloc(sizeof(xen_pfn_t) * busy_pages_max);
+
+        for ( i = start; i < end; ++i )
+        {
+            if ( test_bit(i - start, to_send) )
+            {
+                page = xc_map_foreign_range(xch, dom, PAGE_SIZE, PROT_READ, i);
+                if ( !page )
+                {
+                    /* This page is mapped elsewhere, should be resent later */
+                    busy_pages[busy_pages_count] = i;
+                    busy_pages_count++;
+                    if ( busy_pages_count >= busy_pages_max )
+                    {
+                        busy_pages_max += 256;
+                        busy_pages = realloc(busy_pages, sizeof(xen_pfn_t) *
+                                             busy_pages_max);
+                    }
+                    continue;
+                }
+
+                if ( write_exact(io_fd, &i, sizeof(i)) ||
+                     write_exact(io_fd, page, PAGE_SIZE) )
+                {
+                    PERROR("Write Error: guest memory gpfn and content");
+                    free(busy_pages); /* must do here */
+                    goto out;
+                }
+
+                if ( (i % DEF_PROGRESS_RATE) == 0 )
+                    xc_report_progress_step(xch, i - start, mem_size);
+
+                count++;
+                munmap(page, PAGE_SIZE);
+                current_iter_dirty++;
+            }
+        }
+
+        while ( busy_pages_count )
+        {
+            /* Send busy pages */
+            busy_pages_count--;
+            i = busy_pages[busy_pages_count];
+            if ( test_bit(i - start, to_send) )
+            {
+                page = xc_map_foreign_range(xch, dom, PAGE_SIZE, PROT_READ, i);
+                if ( !page )
+                {
+                    IPRINTF("WARNING: 2nd attempt to save page failed at "
+                            "pfn=0x%llx", (unsigned long long)i);
+                    continue;
+                }
+
+                if ( write_exact(io_fd, &i, sizeof(i)) ||
+                     write_exact(io_fd, page, PAGE_SIZE) )
+                {
+                    PERROR("Write Error: guest memory gpfn and content");
+                    free(busy_pages); /* must do here */
+                    goto out;
+                }
+
+                count++;
+                munmap(page, PAGE_SIZE);
+                current_iter_dirty++;
+            }
+        }
+        free(busy_pages);
+
+        xc_report_progress_step(xch, mem_size, mem_size);
+
+        prev_iter_dirty = current_iter_dirty;
+        total_dirty += current_iter_dirty;
+
+        if ( last_iter )
+        {
+            if ( live )
+            {
+                if ( xc_shadow_control(xch, dom, XEN_DOMCTL_SHADOW_OP_OFF,
+                                       NULL, 0, NULL, 0, NULL) < 0 )
+                    PERROR("Can't disable log-dirty mode");
+            }
+            break;
+        }
+    }
+
+    i = (xen_pfn_t) -1; /* end page marker */
+    if ( write_exact(io_fd, &i, sizeof(i)) )
+    {
+        PERROR("Write Error: end page mark");
+        goto out;
+    }
+    /* flush last write and check for errno (must do, see fsync()). */
+    if ( fsync(io_fd) && errno != EINVAL )
+    {
+        PERROR("Write Error: flushing stream");
+        goto out;
+    }
+
+    rc = 0;
+ out:
+    /* clean up */
+    if ( page )
+        munmap(page, PAGE_SIZE);
+    if ( to_send )
+        xc_hypercall_buffer_free_pages(xch, to_send, 
+                                       NRPAGES(bitmap_size(mem_size)));
+    return rc;
+}
+
+static int restore_guest_memory(xc_interface *xch, int io_fd, uint32_t dom,
+                                guest_params_t *params)
+{
+    xen_pfn_t gpfn;
+    xen_pfn_t start = params->start_gpfn;
+    xen_pfn_t end = params->max_gpfn;
+    int count = 0;
+    char *page = NULL;
+    int rc = -1;
+
+    /* TODO: batch operation */
+    for ( gpfn = start; gpfn < end; ++gpfn )
+    {
+        if ( xc_domain_populate_physmap_exact(xch, dom, 1, 0, 0, &gpfn) )
+        {
+            PERROR("Can't populate guest physical memory");
+            goto out;
+        }
+    }
+
+    while ( 1 )
+    {
+        if ( read_exact(io_fd, &gpfn, sizeof(gpfn)) )
+        {
+            PERROR("Read Error: guest memory gpfn (count=%d)", count);
+            goto out;
+        }
+
+        /* end of guest pages */
+        if ( gpfn == (xen_pfn_t) -1 ) 
+            break;
+
+        if ( gpfn < start || gpfn >= end )
+        {
+            PERROR("gpfn 0x%llx doesn't belong to guest RAM addr space "
+                   "[0x%llx, 0x%llx]", gpfn, start, end);
+            goto out;
+        }
+
+        if ( !(page = xc_map_foreign_range(xch, dom, PAGE_SIZE, 
+                                           PROT_READ | PROT_WRITE, gpfn)) )
+        {
+            PERROR("Can't map guest page to dom0 (gpfn=0x%llx)", gpfn);
+            goto out;
+        }
+
+        if ( read_exact(io_fd, page, PAGE_SIZE) )
+        {
+            PERROR("Read Error: guest memory content (gpfn=0x%llx)", gpfn);
+            goto out;
+        }
+
+        munmap(page, PAGE_SIZE);
+        count++;
+    }
+
+    rc = 0;
+ out:
+    if ( page )
+        munmap(page, PAGE_SIZE);
+
+    return rc;
+}
+
+/***********************************/
+/*            HVM Context          */
+/***********************************/
+static int save_guest_hvm_ctxt(xc_interface *xch, int io_fd, uint32_t dom)
+{
+    uint32_t ctxt_size = 0;
+    uint8_t *hvm_ctxt = NULL;
+    uint32_t rec_size;
+    int rc = -1;
+
+    /* Figure out HVM context size and allocate a local buffer */
+    if ( (ctxt_size = xc_domain_hvm_getcontext(xch, dom, 0, 0)) == -1 )
+    {
+        PERROR("Can't get HVM context size");
+        goto out;
+    }
+
+    if ( (hvm_ctxt = malloc(ctxt_size)) == NULL )
+    {
+        PERROR("Can't allocate memory for HVM context");
+        goto out;
+    }
+
+    /* Retrieve HVM context from Xen and save it */
+    if ( (rec_size = xc_domain_hvm_getcontext(xch, dom, hvm_ctxt, 
+                                              ctxt_size)) == -1 )
+    {
+        PERROR("Can't receive HVM context");
+        goto out;
+    }
+
+    if ( write_exact(io_fd, &rec_size, sizeof(uint32_t)) || 
+         write_exact(io_fd, hvm_ctxt, rec_size) )
+    {
+        PERROR("Write Error: HVM context");
+        goto out;
+    }
+
+    rc = 0;
+out:
+    free (hvm_ctxt);
+    return rc;
+}
+
+static int restore_guest_hvm_ctxt(xc_interface *xch, int io_fd, uint32_t dom)
+{
+    uint32_t rec_size;
+    uint32_t ctxt_size = 0;
+    uint8_t *hvm_ctxt = NULL;
+    int frc = 0;
+    int rc = -1;
+
+    if ( read_exact(io_fd, &rec_size, sizeof(uint32_t)) )
+    {
+        PERROR("Read Error: HVM context size");
+        goto out;
+    }
+
+    if ( !rec_size )
+    {
+        PERROR("No HVM context");
+        goto out;
+    }
+
+    if ( (ctxt_size = xc_domain_hvm_getcontext(xch, dom, 0, 0)) != rec_size )
+    {
+        PERROR("Stored HVM context size isn't matched with current system");
+        goto out;
+    }
+
+    if ( !(hvm_ctxt = malloc(ctxt_size)) )
+    {
+        PERROR("Can't allocate memory");
+        goto out;
+    }
+
+    if ( read_exact(io_fd, hvm_ctxt, ctxt_size) )
+    {
+        PERROR("Read Error: HVM context");
+        goto out;
+    }
+
+    if ( (frc = xc_domain_hvm_setcontext(xch, dom, hvm_ctxt, ctxt_size)) )
+    {
+        PERROR("Can't set HVM context");
+        goto out;
+    }
+
+    rc = 0;
+out:
+    free (hvm_ctxt);
+    return rc;
+}
+
+/***********************************/
+/*         Guest Parameters        */
+/***********************************/
+static int save_guest_params(xc_interface *xch, int io_fd, uint32_t dom, 
+                             uint32_t flags, guest_params_t *params)
+{
+    xc_dominfo_t dom_info;
+
+    /* retrieve domain info */
+    if ( (xc_domain_getinfo(xch, dom, 1, &dom_info ) != 1) || 
+         dom_info.domid != dom)
+    {
+        PERROR("Can't get domain info for dom %d", dom);
+        return -1;
+    }
+
+    /* start and max of gpfn */
+    params->start_gpfn = (GUEST_RAM_BASE >> PAGE_SHIFT);
+    params->max_gpfn = (GUEST_RAM_BASE >> PAGE_SHIFT) + dom_info.nr_pages;
+
+    /* console pfn */
+    if ( xc_get_hvm_param(xch, dom, HVM_PARAM_CONSOLE_PFN,
+                          &params->console_pfn) )
+    {
+        PERROR("Can't get console gpfn");
+        return -1;
+    }
+
+    if ( xc_get_hvm_param(xch, dom, HVM_PARAM_STORE_PFN, &params->store_pfn) )
+    {
+        PERROR("Can't get store gpfn");
+        return -1;
+    }
+
+    params->max_vcpu_id = dom_info.max_vcpu_id;
+    params->flags = flags;
+
+    if ( write_exact(io_fd, params, sizeof(*params)) )
+    {
+        PERROR("Write Error: guest params");
+        return -1;
+    }
+
+    return 0;
+}
+
+static int restore_guest_params(xc_interface *xch, int io_fd, uint32_t dom,
+                                guest_params_t *params)
+{
+    xen_pfn_t nr_pfns;
+    unsigned int maxmemkb;
+
+    if ( read_exact(io_fd, params, sizeof(*params)) )
+    {
+        PERROR("Read Error: guest params");
+        return -1;
+    }
+
+    nr_pfns = params->max_gpfn - params->start_gpfn;
+    maxmemkb = (unsigned int) nr_pfns << (PAGE_SHIFT - 10);
+
+    if ( xc_domain_setmaxmem(xch, dom, maxmemkb) )
+    {
+        PERROR("Can't set memory map");
+        return -1;
+    }
+
+    if ( xc_domain_max_vcpus(xch, dom, params->max_vcpu_id + 1) )
+    {
+        PERROR("Can't set max vcpu number for domain");
+        return -1;
+    }
+
+    return 0;
+}
+
+/* Configure guest parameters after restore */
+static int set_guest_params(xc_interface *xch, int io_fd, uint32_t dom,
+                            guest_params_t *params, 
+                            unsigned int console_evtchn, domid_t console_domid,
+                            unsigned int store_evtchn, domid_t store_domid)
+{
+    int rc = 0;
+
+    if ( (rc = xc_clear_domain_page(xch, dom, params->console_pfn)) )
+    {
+        PERROR("Can't clear console page");
+        return rc;
+    }
+
+    if ( (rc = xc_clear_domain_page(xch, dom, params->store_pfn)) )
+    {
+        PERROR("Can't clear xenstore page");
+        return rc;
+    }
+
+    if ( (rc = xc_dom_gnttab_hvm_seed(xch, dom, params->console_pfn,
+                                      params->store_pfn, console_domid,
+                                      store_domid)) )
+    {
+        PERROR("Can't grant console and xenstore pages");
+        return rc;
+    }
+
+    if ( (rc = xc_set_hvm_param(xch, dom, HVM_PARAM_CONSOLE_PFN,
+                                params->console_pfn)) )
+    {
+        PERROR("Can't set console gpfn");
+        return rc;
+    }
+
+    if ( (rc = xc_set_hvm_param(xch, dom, HVM_PARAM_STORE_PFN,
+                                params->store_pfn)) )
+    {
+        PERROR("Can't set xenstore gpfn");
+        return rc;
+    }
+
+    if ( (rc = xc_set_hvm_param(xch, dom, HVM_PARAM_CONSOLE_EVTCHN,
+                                console_evtchn)) )
+    {
+        PERROR("Can't set console event channel");
+        return rc;
+    }
+
+    if ( (rc = xc_set_hvm_param(xch, dom, HVM_PARAM_STORE_EVTCHN,
+                                store_evtchn)) )
+    {
+        PERROR("Can't set xenstore event channel");
+        return rc;
+    }
+
+    return 0;
+}
+
+/***********************************/
+/*          Main Entries           */
+/***********************************/
+int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, 
+                   uint32_t max_iters, uint32_t max_factor, uint32_t flags,
+                   struct save_callbacks *callbacks, int hvm,
+                   unsigned long vm_generationid_addr)
+{
+    guest_params_t params;
+
+    if ( save_guest_params(xch, io_fd, dom, flags, &params) )
+    {
+        PERROR("Can't save guest params");
+        return -1;
+    }
+
+    if ( save_guest_memory(xch, io_fd, dom, callbacks, max_iters, max_factor, 
+                           &params) )
+    {
+        PERROR("Can't save guest memory");
+        return -1;
+    }
+
+    if ( save_guest_hvm_ctxt(xch, io_fd, dom) )
+    {
+        PERROR("Cant' save guest HVM context");
+        return -1;
+    }
+
+    return 0;
+}
+
+int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
+                      unsigned int store_evtchn, unsigned long *store_gpfn,
+                      domid_t store_domid, unsigned int console_evtchn,
+                      unsigned long *console_gpfn, domid_t console_domid,
+                      unsigned int hvm, unsigned int pae, int superpages,
+                      int no_incr_generationid, int checkpointed_stream,
+                      unsigned long *vm_generationid_addr,
+                      struct restore_callbacks *callbacks)
+{
+    guest_params_t params;
+
+    if ( restore_guest_params(xch, io_fd, dom, &params) )
+    {
+        PERROR("Can't restore guest params");
+        return -1;
+    }
+
+    if ( restore_guest_memory(xch, io_fd, dom, &params) )
+    {
+        PERROR("Can't restore memory");
+        return -1;
+    }
+
+    if ( set_guest_params(xch, io_fd, dom, &params, console_evtchn, 
+                          console_domid, store_evtchn, store_domid) )
+    {
+        PERROR("Can't setup guest params");
+        return -1;
+    }
+
+    /* setup console and store PFNs to caller */
+    *console_gpfn = params.console_pfn;
+    *store_gpfn = params.store_pfn;
+
+    /* restore HVM context of guest VM */
+    if ( restore_guest_hvm_ctxt(xch, io_fd, dom) )
+    {
+        PERROR("Can't restore HVM context");
+        return -1;
+    }
+
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c
index 60ac51a..d9e50d2 100644
--- a/tools/libxc/xc_dom_arm.c
+++ b/tools/libxc/xc_dom_arm.c
@@ -348,7 +348,9 @@ int arch_setup_meminit(struct xc_dom_image *dom)
         modbase += dtb_size;
     }
 
-    return 0;
+    return xc_domain_setmaxmem(dom->xch, dom->guest_domid,
+                               (dom->total_pages + NR_MAGIC_PAGES)
+                                << (PAGE_SHIFT - 10));
 }
 
 int arch_setup_bootearly(struct xc_dom_image *dom)
diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
index 18b4818..6d05d4e 100644
--- a/tools/libxc/xc_resume.c
+++ b/tools/libxc/xc_resume.c
@@ -18,11 +18,15 @@
 #include "xg_private.h"
 #include "xg_save_restore.h"
 
-#if defined(__i386__) || defined(__x86_64__)
+#if defined(__i386__) || defined(__x86_64__) || defined(__arm__) || \
+    defined(__aarch64__)
 
+/* Only required for x86 */
+#if defined(__i386__) || defined(__x86_64__)
 #include <xen/foreign/x86_32.h>
 #include <xen/foreign/x86_64.h>
 #include <xen/hvm/params.h>
+#endif
 
 static int modify_returncode(xc_interface *xch, uint32_t domid)
 {
@@ -33,7 +37,7 @@ static int modify_returncode(xc_interface *xch, uint32_t domid)
     struct domain_info_context *dinfo = &_dinfo;
     int rc;
 
-    if ( xc_domain_getinfo(xch, domid, 1, &info) != 1 )
+    if ( xc_domain_getinfo(xch, domid, 1, &info) != 1 || info.domid != domid )
     {
         PERROR("Could not get domain info");
         return -1;
@@ -65,23 +69,23 @@ static int modify_returncode(xc_interface *xch, uint32_t domid)
     if ( (rc = xc_vcpu_getcontext(xch, domid, 0, &ctxt)) != 0 )
         return rc;
 
+#if defined(__i386__) || defined(__x86_64__)
     SET_FIELD(&ctxt, user_regs.eax, 1);
+#elif defined(__arm__) || defined(__aarch64__)
+    ctxt.c.user_regs.r0_usr = 1;
+#endif
 
     if ( (rc = xc_vcpu_setcontext(xch, domid, 0, &ctxt)) != 0 )
         return rc;
 
     return 0;
 }
-
-#else
-
+#else /* any architecture other than x86 or ARM */
 static int modify_returncode(xc_interface *xch, uint32_t domid)
 {
     return 0;
-
 }
-
-#endif
+#endif 
 
 static int xc_domain_resume_cooperative(xc_interface *xch, uint32_t domid)
 {
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 84f9c0e..ce22aa0 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -448,9 +448,6 @@
  *  - libxl_domain_resume
  *  - libxl_domain_remus_start
  */
-#if defined(__arm__) || defined(__aarch64__)
-#define LIBXL_HAVE_NO_SUSPEND_RESUME 1
-#endif
 
 /*
  * LIBXL_HAVE_DEVICE_PCI_SEIZE
diff --git a/tools/misc/Makefile b/tools/misc/Makefile
index 69b1817..f4ea7ab 100644
--- a/tools/misc/Makefile
+++ b/tools/misc/Makefile
@@ -11,7 +11,7 @@ HDRS     = $(wildcard *.h)
 
 TARGETS-y := xenperf xenpm xen-tmem-list-parse gtraceview gtracestat xenlockprof xenwatchdogd xencov
 TARGETS-$(CONFIG_X86) += xen-detect xen-hvmctx xen-hvmcrash xen-lowmemd xen-mfndump
-TARGETS-$(CONFIG_MIGRATE) += xen-hptool
+TARGETS-$(CONFIG_X86) += xen-hptool
 TARGETS := $(TARGETS-y)
 
 SUBDIRS := $(SUBDIRS-y)
@@ -23,7 +23,7 @@ INSTALL_BIN := $(INSTALL_BIN-y)
 INSTALL_SBIN-y := xen-bugtool xen-python-path xenperf xenpm xen-tmem-list-parse gtraceview \
 	gtracestat xenlockprof xenwatchdogd xen-ringwatch xencov
 INSTALL_SBIN-$(CONFIG_X86) += xen-hvmctx xen-hvmcrash xen-lowmemd xen-mfndump
-INSTALL_SBIN-$(CONFIG_MIGRATE) += xen-hptool
+INSTALL_SBIN-$(CONFIG_X86) += xen-hptool
 INSTALL_SBIN := $(INSTALL_SBIN-y)
 
 INSTALL_PRIVBIN-y := xenpvnetboot
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [RFC v3 1/6] xen/arm: Add basic save/restore support for ARM
  2014-05-08 21:18 ` [RFC v3 1/6] xen/arm: Add basic save/restore support for ARM Wei Huang
@ 2014-05-08 22:11   ` Andrew Cooper
  2014-05-08 22:20     ` Wei Huang
  2014-05-14 10:25     ` Ian Campbell
  2014-05-09  9:06   ` Julien Grall
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 67+ messages in thread
From: Andrew Cooper @ 2014-05-08 22:11 UTC (permalink / raw)
  To: Wei Huang, xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, ian.jackson,
	julien.grall, tim, jaeyong.yoo, jbeulich, yjhyun.yoo

On 08/05/2014 22:18, Wei Huang wrote:


> diff --git a/xen/include/asm-arm/hvm/support.h b/xen/include/asm-arm/hvm/support.h
> new file mode 100644
> index 0000000..fa5fe75
> --- /dev/null
> +++ b/xen/include/asm-arm/hvm/support.h
> @@ -0,0 +1,29 @@
> +/*
> + * HVM support routines
> + *
> + * Copyright (c) 2014, Samsung Electronics.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
> + * Place - Suite 330, Boston, MA 02111-1307 USA.
> + */
> +
> +#ifndef __ASM_ARM_HVM_SUPPORT_H__
> +#define __ASM_ARM_HVM_SUPPORT_H__
> +
> +#include <xen/types.h>
> +#include <public/hvm/ioreq.h>
> +#include <xen/sched.h>
> +#include <xen/hvm/save.h>
> +#include <asm/processor.h>
> +
> +#endif /* __ASM_ARM_HVM_SUPPORT_H__ */

This header file isn't touched by any subsequent patches, and just
having it as a list of includes is overkill.  Can it be dropped?  5
extra includes in a few .c files is hardly breaking the bank.

> diff --git a/xen/include/public/arch-arm/hvm/save.h b/xen/include/public/arch-arm/hvm/save.h
> index 75b8e65..8312e7b 100644
> --- a/xen/include/public/arch-arm/hvm/save.h
> +++ b/xen/include/public/arch-arm/hvm/save.h
> @@ -3,6 +3,7 @@
>   * be saved along with the domain's memory and device-model state.
>   *
>   * Copyright (c) 2012 Citrix Systems Ltd.
> + * Copyright (c) 2014 Samsung Electronics.
>   *
>   * Permission is hereby granted, free of charge, to any person obtaining a copy
>   * of this software and associated documentation files (the "Software"), to
> @@ -26,6 +27,24 @@
>  #ifndef __XEN_PUBLIC_HVM_SAVE_ARM_H__
>  #define __XEN_PUBLIC_HVM_SAVE_ARM_H__
>  
> +#define HVM_ARM_FILE_MAGIC   0x92385520
> +#define HVM_ARM_FILE_VERSION 0x00000001
> +
> +/* Note: For compilation purpose hvm_save_header name is the same as x86,
> + * but layout is different. */
> +struct hvm_save_header
> +{
> +    uint32_t magic;             /* Must be HVM_ARM_FILE_MAGIC */
> +    uint32_t version;           /* File format version */
> +    uint32_t cpuinfo;           /* Record MIDR_EL1 info of saving machine */
> +};
> +DECLARE_HVM_SAVE_TYPE(HEADER, 1, struct hvm_save_header);
> +
> +/*
> + * Largest type-code in use
> + */
> +#define HVM_SAVE_CODE_MAX 1
> +
>  #endif
>  
>  /*

Hmm - it is quite poor to have this magically named "hvm_save_header".

If you were to redefine arch_hvm{save,load} as:

int arch_hvm_save(struct domain *d, struct hvm_domain_context_t *h);
int arch_hvm_load(struct domain *d, struct hvm_domain_context_t *h);

and pushed the hvm_save_entry(HEADER, 0, h, &hdr) into arch_hvm_save(),
you can remove all trace of "struct hvm_save_header" from common code. 
This then removes any requirements to have an identically named struct.

It is probably worth having this all as a separate patch which just
cleans up the x86 and common code before introducing the ARM side of things.

The rest of the patch is however looking fine.

~Andrew

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 1/6] xen/arm: Add basic save/restore support for ARM
  2014-05-08 22:11   ` Andrew Cooper
@ 2014-05-08 22:20     ` Wei Huang
  2014-05-09  8:56       ` Julien Grall
  2014-05-14 10:25     ` Ian Campbell
  1 sibling, 1 reply; 67+ messages in thread
From: Wei Huang @ 2014-05-08 22:20 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, tim, julien.grall,
	ian.jackson, jaeyong.yoo, jbeulich, yjhyun.yoo

On 05/08/2014 05:11 PM, Andrew Cooper wrote:
> On 08/05/2014 22:18, Wei Huang wrote:
>
>
>> diff --git a/xen/include/asm-arm/hvm/support.h b/xen/include/asm-arm/hvm/support.h
>> new file mode 100644
>> index 0000000..fa5fe75
>> --- /dev/null
>> +++ b/xen/include/asm-arm/hvm/support.h
>> @@ -0,0 +1,29 @@
>> +/*
>> + * HVM support routines
>> + *
>> + * Copyright (c) 2014, Samsung Electronics.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along with
>> + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
>> + * Place - Suite 330, Boston, MA 02111-1307 USA.
>> + */
>> +
>> +#ifndef __ASM_ARM_HVM_SUPPORT_H__
>> +#define __ASM_ARM_HVM_SUPPORT_H__
>> +
>> +#include <xen/types.h>
>> +#include <public/hvm/ioreq.h>
>> +#include <xen/sched.h>
>> +#include <xen/hvm/save.h>
>> +#include <asm/processor.h>
>> +
>> +#endif /* __ASM_ARM_HVM_SUPPORT_H__ */
>
> This header file isn't touched by any subsequent patches, and just
> having it as a list of includes is overkill.  Can it be dropped?  5
> extra includes in a few .c files is hardly breaking the bank.
Last time I tried it quickly, the compilation broke. Will try again.
>
>> diff --git a/xen/include/public/arch-arm/hvm/save.h b/xen/include/public/arch-arm/hvm/save.h
>> index 75b8e65..8312e7b 100644
>> --- a/xen/include/public/arch-arm/hvm/save.h
>> +++ b/xen/include/public/arch-arm/hvm/save.h
>> @@ -3,6 +3,7 @@
>>    * be saved along with the domain's memory and device-model state.
>>    *
>>    * Copyright (c) 2012 Citrix Systems Ltd.
>> + * Copyright (c) 2014 Samsung Electronics.
>>    *
>>    * Permission is hereby granted, free of charge, to any person obtaining a copy
>>    * of this software and associated documentation files (the "Software"), to
>> @@ -26,6 +27,24 @@
>>   #ifndef __XEN_PUBLIC_HVM_SAVE_ARM_H__
>>   #define __XEN_PUBLIC_HVM_SAVE_ARM_H__
>>
>> +#define HVM_ARM_FILE_MAGIC   0x92385520
>> +#define HVM_ARM_FILE_VERSION 0x00000001
>> +
>> +/* Note: For compilation purpose hvm_save_header name is the same as x86,
>> + * but layout is different. */
>> +struct hvm_save_header
>> +{
>> +    uint32_t magic;             /* Must be HVM_ARM_FILE_MAGIC */
>> +    uint32_t version;           /* File format version */
>> +    uint32_t cpuinfo;           /* Record MIDR_EL1 info of saving machine */
>> +};
>> +DECLARE_HVM_SAVE_TYPE(HEADER, 1, struct hvm_save_header);
>> +
>> +/*
>> + * Largest type-code in use
>> + */
>> +#define HVM_SAVE_CODE_MAX 1
>> +
>>   #endif
>>
>>   /*
>
> Hmm - it is quite poor to have this magically named "hvm_save_header".
>
> If you were to redefine arch_hvm{save,load} as:
>
> int arch_hvm_save(struct domain *d, struct hvm_domain_context_t *h);
> int arch_hvm_load(struct domain *d, struct hvm_domain_context_t *h);
>
> and pushed the hvm_save_entry(HEADER, 0, h, &hdr) into arch_hvm_save(),
> you can remove all trace of "struct hvm_save_header" from common code.
> This then removes any requirements to have an identically named struct.
>
> It is probably worth having this all as a separate patch which just
> cleans up the x86 and common code before introducing the ARM side of things.
Reasonable. I will fix next revision.
>
> The rest of the patch is however looking fine.
>
> ~Andrew
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 2/6] xen/arm: Add save/restore support for ARM GIC V2
  2014-05-08 21:18 ` [RFC v3 2/6] xen/arm: Add save/restore support for ARM GIC V2 Wei Huang
@ 2014-05-08 22:47   ` Andrew Cooper
  2014-05-09 14:12     ` Wei Huang
  2014-05-13 14:53     ` Wei Huang
  2014-05-09  9:17   ` Julien Grall
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 67+ messages in thread
From: Andrew Cooper @ 2014-05-08 22:47 UTC (permalink / raw)
  To: Wei Huang, xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, ian.jackson,
	julien.grall, tim, jaeyong.yoo, jbeulich, yjhyun.yoo

On 08/05/2014 22:18, Wei Huang wrote:
> This patch implements a save/restore support for
> ARM guest GIC. Two types of GIC V2 states are saved seperately:
> 1) VGICD_* contains the GIC distributor state from
> guest VM's view; 2) GICH_* is the GIC virtual control
> state from hypervisor's persepctive.
>
> Signed-off-by: Evgeny Fedotov <e.fedotov@samsung.com>
> Signed-off-by: Wei Huang <w1.huang@samsung.com>
> ---
>  xen/arch/arm/vgic.c                    |  171 ++++++++++++++++++++++++++++++++
>  xen/include/public/arch-arm/hvm/save.h |   34 ++++++-
>  2 files changed, 204 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> index 4cf6470..505e944 100644
> --- a/xen/arch/arm/vgic.c
> +++ b/xen/arch/arm/vgic.c
> @@ -24,6 +24,7 @@
>  #include <xen/softirq.h>
>  #include <xen/irq.h>
>  #include <xen/sched.h>
> +#include <xen/hvm/save.h>
>  
>  #include <asm/current.h>
>  
> @@ -73,6 +74,110 @@ static struct vgic_irq_rank *vgic_irq_rank(struct vcpu *v, int b, int n)
>          return NULL;
>  }
>  
> +/* Save guest VM's distributor info into a context to support domains

Small nit, but Xen style would be:

/*
 * start of comment

for multiline comments.

> + * save/restore. Such info represents guest VM's view of its GIC
> + * distributor (GICD_*).
> + */
> +static int hvm_vgicd_save(struct domain *d, hvm_domain_context_t *h)
> +{
> +    struct hvm_arm_vgicd_v2 ctxt;
> +    struct vcpu *v;
> +    struct vgic_irq_rank *rank;
> +    int rc = 0;
> +
> +    /* Save the state for each VCPU */
> +    for_each_vcpu( d, v )
> +    {
> +        rank = &v->arch.vgic.private_irqs;
> +
> +        /* IENABLE, IACTIVE, IPEND,  PENDSGI */
> +        ctxt.ienable = rank->ienable;
> +        ctxt.iactive = rank->iactive;
> +        ctxt.ipend = rank->ipend;
> +        ctxt.pendsgi = rank->pendsgi;
> +
> +        /* ICFG */
> +        ctxt.icfg[0] = rank->icfg[0];
> +        ctxt.icfg[1] = rank->icfg[1];
> +
> +        /* IPRIORITY */
> +        BUILD_BUG_ON(sizeof(rank->ipriority) != sizeof (ctxt.ipriority));
> +        memcpy(ctxt.ipriority, rank->ipriority, sizeof(rank->ipriority));

Can you be consistent with a space (or lack of) with the sizeof
operator.   Eyeballing a grep of the codebase, Xen's prevaling style
would appear to be without the space.

> +
> +        /* ITARGETS */
> +        BUILD_BUG_ON(sizeof(rank->itargets) != sizeof (ctxt.itargets));
> +        memcpy(ctxt.itargets, rank->itargets, sizeof(rank->itargets));
> +
> +        if ( (rc = hvm_save_entry(VGICD_V2, v->vcpu_id, h, &ctxt)) != 0 )
> +            return rc;
> +    }
> +
> +    return rc;
> +}
> +
> +/* Load guest VM's distributor info from a context to support domain
> + * save/restore. The info is loaded into vgic_irq_rank.
> + */
> +static int hvm_vgicd_load(struct domain *d, hvm_domain_context_t *h)
> +{
> +    struct hvm_arm_vgicd_v2 ctxt;
> +    struct vgic_irq_rank *rank;
> +    struct vcpu *v;
> +    int vcpuid;

unsigned int. (and later on as well)

Can be combined with the 'irq' declaration.

> +    unsigned long enable_bits;
> +    struct pending_irq *p;
> +    unsigned int irq = 0;
> +    int rc = 0;
> +
> +    /* Which vcpu is this? */
> +    vcpuid = hvm_load_instance(h);
> +    if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
> +    {
> +        dprintk(XENLOG_ERR, "HVM restore: dom%u has no vcpu%u\n",
> +                d->domain_id, vcpuid);
> +        return -EINVAL;
> +    }
> +
> +    if ( (rc = hvm_load_entry(VGICD_V2, h, &ctxt)) != 0 )
> +        return rc;
> +
> +    /* Restore PPI states */
> +    rank = &v->arch.vgic.private_irqs;
> +
> +    /* IENABLE, IACTIVE, IPEND, PENDSGI */
> +    rank->ienable = ctxt.ienable;
> +    rank->iactive = ctxt.iactive;
> +    rank->ipend = ctxt.ipend;
> +    rank->pendsgi = ctxt.pendsgi;
> +
> +    /* ICFG */
> +    rank->icfg[0] = ctxt.icfg[0];
> +    rank->icfg[1] = ctxt.icfg[1];
> +
> +    /* IPRIORITY */
> +    BUILD_BUG_ON(sizeof(rank->ipriority) != sizeof (ctxt.ipriority));
> +    memcpy(rank->ipriority, ctxt.ipriority, sizeof(rank->ipriority));
> +
> +    /* ITARGETS */
> +    BUILD_BUG_ON(sizeof(rank->itargets) != sizeof (ctxt.itargets));
> +    memcpy(rank->itargets, ctxt.itargets, sizeof(rank->itargets));
> +
> +    /* Set IRQ status as enabled by iterating through rank->ienable register.
> +     * This step is required otherwise events won't be received by the VM
> +     * after restore. */
> +    enable_bits = ctxt.ienable;
> +    while ( (irq = find_next_bit(&enable_bits, 32, irq)) < 32 )
> +    {
> +        p = irq_to_pending(v, irq);
> +        set_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
> +        irq++;
> +    }
> +
> +    return 0;
> +}
> +HVM_REGISTER_SAVE_RESTORE(VGICD_V2, hvm_vgicd_save, hvm_vgicd_load,
> +                          1, HVMSR_PER_VCPU);
> +
>  int domain_vgic_init(struct domain *d)
>  {
>      int i;
> @@ -759,6 +864,72 @@ out:
>          smp_send_event_check_mask(cpumask_of(v->processor));
>  }
>  
> +/* Save GIC virtual control state into a context to support save/restore. 
> + * The info reprsents most of GICH_* registers. */
> +static int hvm_gich_save(struct domain *d, hvm_domain_context_t *h)
> +{
> +    struct hvm_arm_gich_v2 ctxt;
> +    struct vcpu *v;
> +    int rc = 0;
> +
> +    /* Save the state of GICs */
> +    for_each_vcpu( d, v )
> +    {
> +        ctxt.gic_hcr = v->arch.gic_hcr;
> +        ctxt.gic_vmcr = v->arch.gic_vmcr;
> +        ctxt.gic_apr = v->arch.gic_apr;
> +
> +        /* Save list registers and masks */
> +        BUILD_BUG_ON(sizeof(v->arch.gic_lr) > sizeof (ctxt.gic_lr));
> +        memcpy(ctxt.gic_lr, v->arch.gic_lr, sizeof(v->arch.gic_lr));
> +
> +        ctxt.lr_mask = v->arch.lr_mask;
> +        ctxt.event_mask = v->arch.event_mask;
> +
> +        if ( (rc = hvm_save_entry(GICH_V2, v->vcpu_id, h, &ctxt)) != 0 )
> +            return rc;
> +    }
> +
> +    return rc;
> +}
> +
> +/* Restore GIC virtual control state from a context to support save/restore */
> +static int hvm_gich_load(struct domain *d, hvm_domain_context_t *h)
> +{
> +    int vcpuid;
> +    struct hvm_arm_gich_v2 ctxt;
> +    struct vcpu *v;
> +    int rc = 0;
> +
> +    /* Which vcpu is this? */
> +    vcpuid = hvm_load_instance(h);
> +    if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
> +    {
> +        dprintk(XENLOG_ERR, "HVM restore: dom%u has no vcpu%u\n", d->domain_id,
> +                vcpuid);
> +        return -EINVAL;
> +    }
> +
> +    if ( (rc = hvm_load_entry(GICH_V2, h, &ctxt)) != 0 )
> +        return rc;
> +
> +    v->arch.gic_hcr = ctxt.gic_hcr;
> +    v->arch.gic_vmcr = ctxt.gic_vmcr;
> +    v->arch.gic_apr = ctxt.gic_apr;
> +
> +    /* Restore list registers and masks */
> +    BUILD_BUG_ON(sizeof(v->arch.gic_lr) > sizeof (ctxt.gic_lr));
> +    memcpy(v->arch.gic_lr, ctxt.gic_lr, sizeof(v->arch.gic_lr));
> +
> +    v->arch.lr_mask = ctxt.lr_mask;
> +    v->arch.event_mask = ctxt.event_mask;
> +
> +    return rc;
> +}
> +
> +HVM_REGISTER_SAVE_RESTORE(GICH_V2, hvm_gich_save, hvm_gich_load, 1,
> +                          HVMSR_PER_VCPU);
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/include/public/arch-arm/hvm/save.h b/xen/include/public/arch-arm/hvm/save.h
> index 8312e7b..421a6f6 100644
> --- a/xen/include/public/arch-arm/hvm/save.h
> +++ b/xen/include/public/arch-arm/hvm/save.h
> @@ -40,10 +40,42 @@ struct hvm_save_header
>  };
>  DECLARE_HVM_SAVE_TYPE(HEADER, 1, struct hvm_save_header);
>  
> +/* Guest's view of GIC distributor (per-vcpu)
> + *   - Based on GICv2 (see "struct vgic_irq_rank")
> + *   - Store guest's view of GIC distributor
> + *   - Only support SGI and PPI for DomU (DomU doesn't handle SPI)
> + */
> +struct hvm_arm_vgicd_v2
> +{
> +    uint32_t ienable;
> +    uint32_t iactive;
> +    uint32_t ipend;
> +    uint32_t pendsgi;
> +    uint32_t icfg[2];
> +    uint32_t ipriority[8];
> +    uint32_t itargets[8];
> +};
> +DECLARE_HVM_SAVE_TYPE(VGICD_V2, 2, struct hvm_arm_vgicd_v2);
> +
> +/* Info for hypervisor to manage guests (per-vcpu)
> + *   - Based on GICv2
> + *   - Mainly store registers of GICH_*
> + */
> +struct hvm_arm_gich_v2
> +{
> +    uint32_t gic_hcr;
> +    uint32_t gic_vmcr;
> +    uint32_t gic_apr;
> +    uint32_t gic_lr[64];
> +    uint64_t event_mask;
> +    uint64_t lr_mask;

This has an odd number of uint32_t.  I suspect it will end up with a
different structure size between a 32 and 64 bit build of Xen.

> +};
> +DECLARE_HVM_SAVE_TYPE(GICH_V2, 3, struct hvm_arm_gich_v2);
> +
>  /*
>   * Largest type-code in use
>   */
> -#define HVM_SAVE_CODE_MAX 1
> +#define HVM_SAVE_CODE_MAX 3
>  
>  #endif
>  

On x86, we require that HVM save records only contain architectural
state.  Not knowing arm myself, it is not clear from your comments
whether this is the case or not.  Can you confirm whether it is or not?

~Andrew

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 3/6] xen/arm: Add save/restore support for ARM arch timer
  2014-05-08 21:18 ` [RFC v3 3/6] xen/arm: Add save/restore support for ARM arch timer Wei Huang
@ 2014-05-08 23:02   ` Andrew Cooper
  2014-05-11  9:01     ` Julien Grall
  2014-05-11  8:58   ` Julien Grall
  2014-05-14 11:14   ` Ian Campbell
  2 siblings, 1 reply; 67+ messages in thread
From: Andrew Cooper @ 2014-05-08 23:02 UTC (permalink / raw)
  To: Wei Huang, xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, ian.jackson,
	julien.grall, tim, jaeyong.yoo, jbeulich, yjhyun.yoo

On 08/05/2014 22:18, Wei Huang wrote:
> This patch implements a save/resore support for ARM architecture
> timer.
>
> Signed-off-by: Evgeny Fedotov <e.fedotov@samsung.com>
> Signed-off-by: Wei Huang <w1.huang@samsung.com>
> ---
>  xen/arch/arm/vtimer.c                  |   90 ++++++++++++++++++++++++++++++++
>  xen/include/public/arch-arm/hvm/save.h |   16 +++++-
>  2 files changed, 105 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/arm/vtimer.c b/xen/arch/arm/vtimer.c
> index b93153e..6576408 100644
> --- a/xen/arch/arm/vtimer.c
> +++ b/xen/arch/arm/vtimer.c
> @@ -21,6 +21,7 @@
>  #include <xen/lib.h>
>  #include <xen/timer.h>
>  #include <xen/sched.h>
> +#include <xen/hvm/save.h>
>  #include <asm/irq.h>
>  #include <asm/time.h>
>  #include <asm/gic.h>
> @@ -285,6 +286,95 @@ int vtimer_emulate(struct cpu_user_regs *regs, union hsr hsr)
>      }
>  }
>  
> +/* Save timer info to support save/restore */
> +static int hvm_timer_save(struct domain *d, hvm_domain_context_t *h)
> +{
> +    struct hvm_arm_timer ctxt;
> +    struct vcpu *v;
> +    struct vtimer *t;
> +    int i;

unsigned int

> +    int rc = 0;
> +
> +    /* Save the state of vtimer and ptimer */
> +    for_each_vcpu( d, v )
> +    {
> +        t = &v->arch.virt_timer;
> +
> +        for ( i = 0; i < ARM_TIMER_TYPE_COUNT; i++ )
> +        {
> +            ctxt.cval = t->cval;
> +            ctxt.ctl = t->ctl;
> +
> +            switch ( i )
> +            {
> +            case ARM_TIMER_TYPE_PHYS:
> +                ctxt.vtb_offset = d->arch.phys_timer_base.offset;
> +                ctxt.type = ARM_TIMER_TYPE_PHYS;
> +                break;
> +            case ARM_TIMER_TYPE_VIRT:
> +                ctxt.vtb_offset = d->arch.virt_timer_base.offset;
> +                ctxt.type = ARM_TIMER_TYPE_VIRT;
> +            default:
> +                rc = -EINVAL;
> +                break;

This break is out of the switch, not out of the for loop, so you will
still try to save the bogus entry.

As you control i and want to save all timers, I suggest a BUG() instead;

> +            }
> +
> +            if ( (rc = hvm_save_entry(TIMER, v->vcpu_id, h, &ctxt)) != 0 )
> +                return rc;
> +
> +            t = &v->arch.phys_timer;

This updating of t looks suspect and fragile.

This is a good approximation of the "for case" programming paradigm
(http://thedailywtf.com/Comments/The_FOR-CASE_paradigm.aspx).

There are only two timers and they refer to different named items inside
struct domain.  It would be clearer to remove the loop.

> +        }
> +    }
> +
> +    return rc;
> +}
> +
> +/* Restore timer info from context to support save/restore */
> +static int hvm_timer_load(struct domain *d, hvm_domain_context_t *h)
> +{
> +    int vcpuid;

unsigned

> +    struct hvm_arm_timer ctxt;
> +    struct vcpu *v;
> +    struct vtimer *t = NULL;

With this initialised here...

> +    int rc = 0;
> +
> +    /* Which vcpu is this? */
> +    vcpuid = hvm_load_instance(h);
> +
> +    if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
> +    {
> +        dprintk(XENLOG_ERR, "HVM restore: dom%u has no vcpu%u\n",
> +                d->domain_id, vcpuid);
> +        return -EINVAL;
> +    }
> +
> +    if ( hvm_load_entry(TIMER, h, &ctxt) != 0 )
> +        return -EINVAL;
> +
> +    switch ( ctxt.type )
> +    {
> +    case ARM_TIMER_TYPE_PHYS:
> +        t = &v->arch.phys_timer;
> +        d->arch.phys_timer_base.offset = ctxt.vtb_offset;
> +        break;
> +    case ARM_TIMER_TYPE_VIRT:
> +        t = &v->arch.virt_timer;
> +        d->arch.virt_timer_base.offset = ctxt.vtb_offset;
> +        break;
> +    default:
> +        rc = -EINVAL;
> +        break;

... and this error handling,

> +    }
> +
> +    t->cval = ctxt.cval;
> +    t->ctl = ctxt.ctl;
> +    t->v = v;

this is going to end in tears.  return -EINVAL from the default.

> +
> +    return rc;
> +}
> +
> +HVM_REGISTER_SAVE_RESTORE(TIMER, hvm_timer_save, hvm_timer_load, 2,
> +                          HVMSR_PER_VCPU);
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/include/public/arch-arm/hvm/save.h b/xen/include/public/arch-arm/hvm/save.h
> index 421a6f6..8679bfd 100644
> --- a/xen/include/public/arch-arm/hvm/save.h
> +++ b/xen/include/public/arch-arm/hvm/save.h
> @@ -72,10 +72,24 @@ struct hvm_arm_gich_v2
>  };
>  DECLARE_HVM_SAVE_TYPE(GICH_V2, 3, struct hvm_arm_gich_v2);
>  
> +/* Two ARM timers (physical and virtual) are saved */
> +#define ARM_TIMER_TYPE_VIRT  0
> +#define ARM_TIMER_TYPE_PHYS  1
> +#define ARM_TIMER_TYPE_COUNT 2       /* total count */
> +
> +struct hvm_arm_timer
> +{
> +    uint64_t vtb_offset;
> +    uint32_t ctl;
> +    uint64_t cval;
> +    uint32_t type;
> +};

This is also going to have 32/64 alignment issues.

~Andrew

> +DECLARE_HVM_SAVE_TYPE(TIMER, 4, struct hvm_arm_timer);
> +
>  /*
>   * Largest type-code in use
>   */
> -#define HVM_SAVE_CODE_MAX 3
> +#define HVM_SAVE_CODE_MAX 4
>  
>  #endif
>  

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 4/6] xen/arm: Add save/restore support for guest core registers
  2014-05-08 21:18 ` [RFC v3 4/6] xen/arm: Add save/restore support for guest core registers Wei Huang
@ 2014-05-08 23:10   ` Andrew Cooper
  2014-05-09 16:35     ` Wei Huang
  2014-05-11  9:06   ` Julien Grall
  2014-05-14 11:37   ` Ian Campbell
  2 siblings, 1 reply; 67+ messages in thread
From: Andrew Cooper @ 2014-05-08 23:10 UTC (permalink / raw)
  To: Wei Huang, xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, ian.jackson,
	julien.grall, tim, jaeyong.yoo, jbeulich, yjhyun.yoo

On 08/05/2014 22:18, Wei Huang wrote:
> This patch implements a save/resore support for ARM guest core
> registers.
>
> Signed-off-by: Evgeny Fedotov <e.fedotov@samsung.com>
> Signed-off-by: Wei Huang <w1.huang@samsung.com>
> ---
>  xen/arch/arm/hvm.c                     |  263 +++++++++++++++++++++++++++++++-
>  xen/include/public/arch-arm/hvm/save.h |  121 ++++++++++++++-
>  2 files changed, 382 insertions(+), 2 deletions(-)
>
> diff --git a/xen/arch/arm/hvm.c b/xen/arch/arm/hvm.c
> index 471c4cd..7bfa547 100644
> --- a/xen/arch/arm/hvm.c
> +++ b/xen/arch/arm/hvm.c
> @@ -7,14 +7,15 @@
>  
>  #include <xsm/xsm.h>
>  
> +#include <xen/hvm/save.h>
>  #include <public/xen.h>
>  #include <public/hvm/params.h>
>  #include <public/hvm/hvm_op.h>
>  
>  #include <asm/hypercall.h>
> +#include <asm/gic.h>
>  
>  long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
> -

Spurious whitespace change.

>  {
>      long rc = 0;
>  
> @@ -65,3 +66,263 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>  
>      return rc;
>  }
> +
> +static int hvm_save_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
> +{
> +    struct hvm_arm_cpu ctxt;
> +    struct vcpu_guest_core_regs c;
> +    struct vcpu *v;
> +
> +    /* Save the state of CPU */
> +    for_each_vcpu( d, v )
> +    {
> +        memset(&ctxt, 0, sizeof(ctxt));
> +
> +        ctxt.sctlr = v->arch.sctlr;
> +        ctxt.ttbr0 = v->arch.ttbr0;
> +        ctxt.ttbr1 = v->arch.ttbr1;
> +        ctxt.ttbcr = v->arch.ttbcr;
> +
> +        ctxt.dacr = v->arch.dacr;
> +        ctxt.ifsr = v->arch.ifsr;
> +#ifdef CONFIG_ARM_32
> +        ctxt.ifar = v->arch.ifar;
> +        ctxt.dfar = v->arch.dfar;
> +        ctxt.dfsr = v->arch.dfsr;
> +#else
> +        ctxt.far = v->arch.far;
> +        ctxt.esr = v->arch.esr;
> +#endif
> +
> +#ifdef CONFIG_ARM_32
> +        ctxt.mair0 = v->arch.mair0;
> +        ctxt.mair1 = v->arch.mair1;
> +#else
> +        ctxt.mair0 = v->arch.mair;
> +#endif
> +        /* Control Registers */
> +        ctxt.actlr = v->arch.actlr;
> +        ctxt.sctlr = v->arch.sctlr;
> +        ctxt.cpacr = v->arch.cpacr;
> +
> +        ctxt.contextidr = v->arch.contextidr;
> +        ctxt.tpidr_el0 = v->arch.tpidr_el0;
> +        ctxt.tpidr_el1 = v->arch.tpidr_el1;
> +        ctxt.tpidrro_el0 = v->arch.tpidrro_el0;
> +
> +        /* CP 15 */
> +        ctxt.csselr = v->arch.csselr;
> +
> +        ctxt.afsr0 = v->arch.afsr0;
> +        ctxt.afsr1 = v->arch.afsr1;
> +        ctxt.vbar = v->arch.vbar;
> +        ctxt.par = v->arch.par;
> +        ctxt.teecr = v->arch.teecr;
> +        ctxt.teehbr = v->arch.teehbr;
> +
> +#ifdef CONFIG_ARM_32
> +        ctxt.joscr = v->arch.joscr;
> +        ctxt.jmcr = v->arch.jmcr;
> +#endif
> +
> +        memset(&c, 0, sizeof(c));
> +
> +        /* get guest core registers */
> +        vcpu_regs_hyp_to_user(v, &c);
> +
> +        ctxt.x0 = c.x0;
> +        ctxt.x1 = c.x1;
> +        ctxt.x2 = c.x2;
> +        ctxt.x3 = c.x3;
> +        ctxt.x4 = c.x4;
> +        ctxt.x5 = c.x5;
> +        ctxt.x6 = c.x6;
> +        ctxt.x7 = c.x7;
> +        ctxt.x8 = c.x8;
> +        ctxt.x9 = c.x9;
> +        ctxt.x10 = c.x10;
> +        ctxt.x11 = c.x11;
> +        ctxt.x12 = c.x12;
> +        ctxt.x13 = c.x13;
> +        ctxt.x14 = c.x14;
> +        ctxt.x15 = c.x15;
> +        ctxt.x16 = c.x16;
> +        ctxt.x17 = c.x17;
> +        ctxt.x18 = c.x18;
> +        ctxt.x19 = c.x19;
> +        ctxt.x20 = c.x20;
> +        ctxt.x21 = c.x21;
> +        ctxt.x22 = c.x22;
> +        ctxt.x23 = c.x23;
> +        ctxt.x24 = c.x24;
> +        ctxt.x25 = c.x25;
> +        ctxt.x26 = c.x26;
> +        ctxt.x27 = c.x27;
> +        ctxt.x28 = c.x28;
> +        ctxt.x29 = c.x29;
> +        ctxt.x30 = c.x30;
> +        ctxt.pc64 = c.pc64;
> +        ctxt.cpsr = c.cpsr;
> +        ctxt.spsr_el1 = c.spsr_el1; /* spsr_svc */
> +
> +#ifdef CONFIG_ARM_32
> +        ctxt.spsr_fiq = c.spsr_fiq;
> +        ctxt.spsr_irq = c.spsr_irq;
> +        ctxt.spsr_und = c.spsr_und;
> +        ctxt.spsr_abt = c.spsr_abt;
> +#endif
> +#ifdef CONFIG_ARM_64
> +        ctxt.sp_el0 = c.sp_el0;
> +        ctxt.sp_el1 = c.sp_el1;
> +        ctxt.elr_el1 = c.elr_el1;
> +#endif
> +
> +        /* check VFP state size before dumping */
> +        BUILD_BUG_ON(sizeof(v->arch.vfp) > sizeof (ctxt.vfp_state));
> +        memcpy((void*) &ctxt.vfp_state, (void*) &v->arch.vfp, 
> +               sizeof(v->arch.vfp));
> +
> +        if ( hvm_save_entry(VCPU, v->vcpu_id, h, &ctxt) != 0 )
> +            return 1;
> +    }
> +
> +    return 0;
> +}
> +
> +static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
> +{
> +    int vcpuid;
> +    struct hvm_arm_cpu ctxt;
> +    struct vcpu *v;
> +    struct vcpu_guest_core_regs c;
> +
> +    /* Which vcpu is this? */
> +    vcpuid = hvm_load_instance(h);
> +    if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
> +    {
> +        dprintk(XENLOG_G_ERR, "HVM restore: dom%u has no vcpu%u\n",
> +                d->domain_id, vcpuid);
> +        return -EINVAL;
> +    }
> +
> +    if ( hvm_load_entry(VCPU, h, &ctxt) != 0 )
> +        return -EINVAL;
> +
> +    v->arch.sctlr = ctxt.sctlr;
> +    v->arch.ttbr0 = ctxt.ttbr0;
> +    v->arch.ttbr1 = ctxt.ttbr1;
> +    v->arch.ttbcr = ctxt.ttbcr;
> +
> +    v->arch.dacr = ctxt.dacr;
> +    v->arch.ifsr = ctxt.ifsr;
> +#ifdef CONFIG_ARM_32
> +    v->arch.ifar = ctxt.ifar;
> +    v->arch.dfar = ctxt.dfar;
> +    v->arch.dfsr = ctxt.dfsr;
> +#else
> +    v->arch.far = ctxt.far;
> +    v->arch.esr = ctxt.esr;
> +#endif

Where you have code like this, please use a union in the structure to
reduce its size.

> +
> +#ifdef CONFIG_ARM_32
> +    v->arch.mair0 = ctxt.mair0;
> +    v->arch.mair1 = ctxt.mair1;
> +#else
> +    v->arch.mair = ctxt.mair0;
> +#endif
> +
> +    /* Control Registers */
> +    v->arch.actlr = ctxt.actlr;
> +    v->arch.cpacr = ctxt.cpacr;
> +    v->arch.contextidr = ctxt.contextidr;
> +    v->arch.tpidr_el0 = ctxt.tpidr_el0;
> +    v->arch.tpidr_el1 = ctxt.tpidr_el1;
> +    v->arch.tpidrro_el0 = ctxt.tpidrro_el0;
> +
> +    /* CP 15 */
> +    v->arch.csselr = ctxt.csselr;
> +
> +    v->arch.afsr0 = ctxt.afsr0;
> +    v->arch.afsr1 = ctxt.afsr1;
> +    v->arch.vbar = ctxt.vbar;
> +    v->arch.par = ctxt.par;
> +    v->arch.teecr = ctxt.teecr;
> +    v->arch.teehbr = ctxt.teehbr;
> +#ifdef CONFIG_ARM_32
> +    v->arch.joscr = ctxt.joscr;
> +    v->arch.jmcr = ctxt.jmcr;
> +#endif
> +
> +    /* fill guest core registers */
> +    memset(&c, 0, sizeof(c));
> +    c.x0 = ctxt.x0;
> +    c.x1 = ctxt.x1;
> +    c.x2 = ctxt.x2;
> +    c.x3 = ctxt.x3;
> +    c.x4 = ctxt.x4;
> +    c.x5 = ctxt.x5;
> +    c.x6 = ctxt.x6;
> +    c.x7 = ctxt.x7;
> +    c.x8 = ctxt.x8;
> +    c.x9 = ctxt.x9;
> +    c.x10 = ctxt.x10;
> +    c.x11 = ctxt.x11;
> +    c.x12 = ctxt.x12;
> +    c.x13 = ctxt.x13;
> +    c.x14 = ctxt.x14;
> +    c.x15 = ctxt.x15;
> +    c.x16 = ctxt.x16;
> +    c.x17 = ctxt.x17;
> +    c.x18 = ctxt.x18;
> +    c.x19 = ctxt.x19;
> +    c.x20 = ctxt.x20;
> +    c.x21 = ctxt.x21;
> +    c.x22 = ctxt.x22;
> +    c.x23 = ctxt.x23;
> +    c.x24 = ctxt.x24;
> +    c.x25 = ctxt.x25;
> +    c.x26 = ctxt.x26;
> +    c.x27 = ctxt.x27;
> +    c.x28 = ctxt.x28;
> +    c.x29 = ctxt.x29;
> +    c.x30 = ctxt.x30;
> +    c.pc64 = ctxt.pc64;
> +    c.cpsr = ctxt.cpsr;
> +    c.spsr_el1 = ctxt.spsr_el1; /* spsr_svc */
> +
> +#ifdef CONFIG_ARM_32
> +    c.spsr_fiq = ctxt.spsr_fiq;
> +    c.spsr_irq = ctxt.spsr_irq;
> +    c.spsr_und = ctxt.spsr_und;
> +    c.spsr_abt = ctxt.spsr_abt;
> +#endif
> +#ifdef CONFIG_ARM_64
> +    c.sp_el0 = ctxt.sp_el0;
> +    c.sp_el1 = ctxt.sp_el1;
> +    c.elr_el1 = ctxt.elr_el1;
> +#endif
> +
> +    /* set guest core registers */
> +    vcpu_regs_user_to_hyp(v, &c);
> +
> +    BUILD_BUG_ON(sizeof(v->arch.vfp) > sizeof (ctxt.vfp_state));
> +    memcpy(&v->arch.vfp, &ctxt,  sizeof(v->arch.vfp));
> +
> +    v->is_initialised = 1;
> +    clear_bit(_VPF_down, &v->pause_flags);
> +
> +    return 0;
> +}
> +
> +HVM_REGISTER_SAVE_RESTORE(VCPU, hvm_save_cpu_ctxt, hvm_load_cpu_ctxt, 1, 
> +                          HVMSR_PER_VCPU);
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/include/public/arch-arm/hvm/save.h b/xen/include/public/arch-arm/hvm/save.h
> index 8679bfd..18e5899 100644
> --- a/xen/include/public/arch-arm/hvm/save.h
> +++ b/xen/include/public/arch-arm/hvm/save.h
> @@ -86,10 +86,129 @@ struct hvm_arm_timer
>  };
>  DECLARE_HVM_SAVE_TYPE(TIMER, 4, struct hvm_arm_timer);
>  
> +/* ARM core hardware info */
> +struct hvm_arm_cpu
> +{
> +    /* ======= Guest VFP State =======
> +     *   - 34 8-bytes required for AArch32 guests
> +     *   - 66 8-bytes required for AArch64 guests
> +     */
> +    uint64_t vfp_state[66];
> +
> +    /* ======= Guest Core Registers =======
> +     *   - Each reg is multiplexed for AArch64 and AArch32 guests, if possible
> +     *   - Each comments, /AArch64_reg, AArch32_reg/, describes its
> +     *     corresponding 64- and 32-bit register name. "NA" means
> +     *     "Not Applicable".
> +     *   - Check "struct vcpu_guest_core_regs" for details.
> +     */
> +    uint64_t x0;     /* x0, r0_usr */
> +    uint64_t x1;     /* x1, r1_usr */
> +    uint64_t x2;     /* x2, r2_usr */
> +    uint64_t x3;     /* x3, r3_usr */
> +    uint64_t x4;     /* x4, r4_usr */
> +    uint64_t x5;     /* x5, r5_usr */
> +    uint64_t x6;     /* x6, r6_usr */
> +    uint64_t x7;     /* x7, r7_usr */
> +    uint64_t x8;     /* x8, r8_usr */
> +    uint64_t x9;     /* x9, r9_usr */
> +    uint64_t x10;    /* x10, r10_usr */
> +    uint64_t x11;    /* x11, r11_usr */
> +    uint64_t x12;    /* x12, r12_usr */
> +    uint64_t x13;    /* x13, sp_usr */
> +    uint64_t x14;    /* x14, lr_usr; */
> +    uint64_t x15;    /* x15, __unused_sp_hyp */
> +    uint64_t x16;    /* x16, lr_irq */
> +    uint64_t x17;    /* x17, sp_irq */
> +    uint64_t x18;    /* x18, lr_svc */
> +    uint64_t x19;    /* x19, sp_svc */
> +    uint64_t x20;    /* x20, lr_abt */
> +    uint64_t x21;    /* x21, sp_abt */
> +    uint64_t x22;    /* x22, lr_und */
> +    uint64_t x23;    /* x23, sp_und */
> +    uint64_t x24;    /* x24, r8_fiq */
> +    uint64_t x25;    /* x25, r9_fiq */
> +    uint64_t x26;    /* x26, r10_fiq */
> +    uint64_t x27;    /* x27, r11_fiq */
> +    uint64_t x28;    /* x28, r12_fiq */
> +    uint64_t x29;    /* fp, sp_fiq */
> +    uint64_t x30;    /* lr, lr_fiq */

Please use "uint64_t x[31];" and some loops.

> +
> +    /* return address (EL1 ==> EL0) */
> +    uint64_t elr_el1;    /* elr_el1, NA */
> +    /* return address (EL2 ==> EL1) */
> +    uint64_t pc64;       /* elr_el2, elr_el2 */
> +
> +    /* spsr registers */
> +    uint32_t spsr_el1;   /* spsr_el1, spsr_svc */
> +    uint32_t spsr_fiq;   /* NA, spsr_fiq */
> +    uint32_t spsr_irq;   /* NA, spsr_irq */
> +    uint32_t spsr_und;   /* NA, spsr_und */
> +    uint32_t spsr_abt;   /* NA, spsr_abt */
> +
> +    /* stack pointers */
> +    uint64_t sp_el0;     /* sp_el0, NA */
> +    uint64_t sp_el1;     /* sp_el1, NA */
> +
> +    /* guest mode */
> +    uint32_t cpsr;   /* spsr_el2, spsr_el2 */
> +
> +    /* ======= Guest System Registers =======
> +     *   - multiplexed for AArch32 and AArch64 guests
> +     *   - 64-bit preferred if needed (for 64-bit guests)
> +     *   - architecture specific registers are noted specifically
> +     */
> +    /* exception */
> +    uint64_t vbar;      /* vbar, vbar */
> +
> +    /* mmu related */
> +    uint64_t ttbcr;     /* ttbcr, ttbcr */
> +    uint64_t ttbr0;     /* ttbr0, ttbr0 */
> +    uint64_t ttbr1;     /* ttbr1, ttbr1 */
> +    uint32_t dacr;      /* NA, dacr32 */
> +
> +    uint64_t par;       /* par, par */
> +    uint64_t mair0;     /* mair, mair0 */
> +    uint64_t mair1;     /* NA, mair1 */
> +
> +    /* fault status */
> +    uint32_t ifar;      /* ifar, ifar */
> +    uint32_t ifsr;      /* ifsr, ifsr */
> +    uint32_t dfar;      /* dfar, dfar */
> +    uint32_t dfsr;      /* dfsr, dfsr */
> +
> +    uint64_t far;       /* far, far */
> +    uint64_t esr;       /* esr, esr */
> +
> +    uint32_t afsr0;     /* afsr0, afsr0 */
> +    uint32_t afsr1;     /* afsr1, afsr1 */
> +
> +    /* thumbee and jazelle */
> +    uint32_t teecr;     /* NA, teecr */
> +    uint32_t teehbr;    /* NA, teehbr */
> +
> +    uint32_t joscr;     /* NA, joscr */
> +    uint32_t jmcr;      /* NA, jmcr */
> +
> +    /* control registers */
> +    uint32_t sctlr;     /* sctlr, sctlr */
> +    uint32_t actlr;     /* actlr, actlr */
> +    uint32_t cpacr;     /* cpacr, cpacr */
> +
> +    uint32_t csselr;    /* csselr, csselr */
> +
> +    /* software management related */
> +    uint32_t contextidr;  /* contextidr, contextidr */
> +    uint64_t tpidr_el0;   /* tpidr_el0, tpidr_el0 */
> +    uint64_t tpidr_el1;   /* tpidr_el1, tpidr_el1 */
> +    uint64_t tpidrro_el0; /* tpidrro_el0, tdidrro_el0 */
> +};

Again - 32/64bit alignment issues.

~Andrew

> +DECLARE_HVM_SAVE_TYPE(VCPU, 5, struct hvm_arm_cpu);
> +
>  /*
>   * Largest type-code in use
>   */
> -#define HVM_SAVE_CODE_MAX 4
> +#define HVM_SAVE_CODE_MAX 5
>  
>  #endif
>  

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 5/6] xen/arm: Add log_dirty support for ARM
  2014-05-08 21:18 ` [RFC v3 5/6] xen/arm: Add log_dirty support for ARM Wei Huang
@ 2014-05-08 23:46   ` Andrew Cooper
  2014-05-14 11:51     ` Ian Campbell
  2014-05-11 15:28   ` Julien Grall
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 67+ messages in thread
From: Andrew Cooper @ 2014-05-08 23:46 UTC (permalink / raw)
  To: Wei Huang, xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, ian.jackson,
	julien.grall, tim, jaeyong.yoo, jbeulich, yjhyun.yoo

On 08/05/2014 22:18, Wei Huang wrote:
> This patch implements log_dirty for ARM guest VMs. This feature
> is provided via two basic blocks: dirty_bit_map and VLPT
> (virtual-linear page table)
>
> 1. VLPT provides fast accessing of 3rd PTE of guest P2M.
> When creating a mapping for VLPT, the page table mapping
> becomes the following:
>    xen's 1st PTE --> xen's 2nd PTE --> guest p2m's 2nd PTE -->
>    guest p2m's 3rd PTE
>
> With VLPT, xen can immediately locate the 3rd PTE of guest P2M
> and modify PTE attirbute during dirty page tracking. The following
> link shows the performance comparison for handling a dirty-page
> between VLPT and typical page table walking.
> http://lists.xen.org/archives/html/xen-devel/2013-08/msg01503.html
>
> For more info about VLPT, please see
> http://www.technovelty.org/linux/virtual-linear-page-table.html.
>
> 2. Dirty bitmap
> The dirty bitmap is used to mark the pages which are dirty during
> migration. The info is used by Xen tools, via DOMCTL_SHADOW_OP_*,
> to figure out which guest pages need to be resent.
>
> Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com>
> Signed-off-by: Evgeny Fedotov <e.fedotov@samsung.com>
> Signed-off-by: Wei Huang <w1.huang@samsung.com>
> ---
>  xen/arch/arm/domain.c           |    6 +
>  xen/arch/arm/domctl.c           |   31 +++-
>  xen/arch/arm/mm.c               |  298 ++++++++++++++++++++++++++++++++++++++-
>  xen/arch/arm/p2m.c              |  204 +++++++++++++++++++++++++++
>  xen/arch/arm/traps.c            |    9 ++
>  xen/include/asm-arm/config.h    |   12 +-
>  xen/include/asm-arm/domain.h    |   19 +++
>  xen/include/asm-arm/mm.h        |   23 +++
>  xen/include/asm-arm/p2m.h       |    8 +-
>  xen/include/asm-arm/processor.h |    2 +
>  10 files changed, 599 insertions(+), 13 deletions(-)
>
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index 40f1c3a..2eb5ce0 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -208,6 +208,9 @@ static void ctxt_switch_to(struct vcpu *n)
>  
>      isb();
>  
> +    /* Dirty-page tracing */
> +    log_dirty_restore(n->domain);
> +
>      /* This is could trigger an hardware interrupt from the virtual
>       * timer. The interrupt needs to be injected into the guest. */
>      WRITE_SYSREG32(n->arch.cntkctl, CNTKCTL_EL1);
> @@ -504,6 +507,9 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
>      /* Default the virtual ID to match the physical */
>      d->arch.vpidr = boot_cpu_data.midr.bits;
>  
> +    /* Init log dirty support */
> +    log_dirty_init(d);
> +
>      clear_page(d->shared_info);
>      share_xen_page_with_guest(
>          virt_to_page(d->shared_info), d, XENSHARE_writable);
> diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c
> index 45974e7..f1c34da 100644
> --- a/xen/arch/arm/domctl.c
> +++ b/xen/arch/arm/domctl.c
> @@ -10,30 +10,53 @@
>  #include <xen/errno.h>
>  #include <xen/sched.h>
>  #include <xen/hypercall.h>
> +#include <xen/guest_access.h>
>  #include <public/domctl.h>
>  
>  long arch_do_domctl(struct xen_domctl *domctl, struct domain *d,
>                      XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>  {
> +    long ret = 0;

'rc' is the more common name.

> +    bool_t copyback = 0;
> +
>      switch ( domctl->cmd )
>      {
> +    case XEN_DOMCTL_shadow_op:
> +    {
> +        ret = -EINVAL;
> +        copyback = 1;
> +
> +        if ( (d == current->domain) )   /* no domain_pause() */
> +            break;
> +
> +        domain_pause(d);
> +        ret = dirty_mode_op(d, &domctl->u.shadow_op);
> +        domain_unpause(d);
> +    }
> +    break;
> +
>      case XEN_DOMCTL_cacheflush:
>      {
>          unsigned long s = domctl->u.cacheflush.start_pfn;
>          unsigned long e = s + domctl->u.cacheflush.nr_pfns;
>  
>          if ( domctl->u.cacheflush.nr_pfns > (1U<<MAX_ORDER) )
> -            return -EINVAL;
> +            ret = -EINVAL;

This breaks the error handling.  The prevailing style would be:

rc = -EINVAL;
if ( something bad )
    goto out;

>  
>          if ( e < s )
> -            return -EINVAL;
> +            ret = -EINVAL;
>  
> -        return p2m_cache_flush(d, s, e);
> +        ret = p2m_cache_flush(d, s, e);
>      }
>  
>      default:
> -        return subarch_do_domctl(domctl, d, u_domctl);
> +        ret = subarch_do_domctl(domctl, d, u_domctl);
>      }
> +
> +    if ( copyback && __copy_to_guest(u_domctl, domctl, 1) )
> +        ret = -EFAULT;
> +
> +    return ret;
>  }
>  
>  void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c)
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index eac228c..81c0691 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -865,7 +865,6 @@ void destroy_xen_mappings(unsigned long v, unsigned long e)
>      create_xen_entries(REMOVE, v, 0, (e - v) >> PAGE_SHIFT, 0);
>  }
>  
> -enum mg { mg_clear, mg_ro, mg_rw, mg_rx };
>  static void set_pte_flags_on_range(const char *p, unsigned long l, enum mg mg)
>  {
>      lpae_t pte;
> @@ -945,11 +944,6 @@ int page_is_ram_type(unsigned long mfn, unsigned long mem_type)
>      return 0;
>  }
>  
> -unsigned long domain_get_maximum_gpfn(struct domain *d)
> -{
> -    return -ENOSYS;
> -}
> -
>  void share_xen_page_with_guest(struct page_info *page,
>                            struct domain *d, int readonly)
>  {
> @@ -1235,6 +1229,298 @@ int is_iomem_page(unsigned long mfn)
>          return 1;
>      return 0;
>  }
> +
> +
> +/* Return start and end addr of guest RAM. Note this function only reports 
> + * regular RAM. It does not cover other areas such as foreign mapped
> + * pages or MMIO space. */
> +void domain_get_ram_range(struct domain *d, paddr_t *start, paddr_t *end)

const struct domain *d;

> +{
> +    if ( start )
> +        *start = GUEST_RAM_BASE;
> +
> +    if ( end )
> +        *end = GUEST_RAM_BASE + ((paddr_t) d->max_pages << PAGE_SHIFT);
> +}
> +
> +/* Return the maximum GPFN of guest VM. It covers all guest memory types. */
> +unsigned long domain_get_maximum_gpfn(struct domain *d)
> +{
> +    struct p2m_domain *p2m = &d->arch.p2m;
> +
> +    return p2m->max_mapped_gfn;

This can be reduced to a single statement.

> +}
> +
> +/************************************/
> +/*    Dirty Page Tracking Support   */
> +/************************************/
> +/* Mark the bitmap for a corresponding page as dirty */
> +static inline void bitmap_mark_dirty(struct domain *d, paddr_t addr)
> +{
> +    paddr_t ram_base = (paddr_t) GUEST_RAM_BASE;

Useless cast

> +    int bit_index = PFN_DOWN(addr - ram_base);

This is liable to truncation, and should absolutely be unsigned.

> +    int page_index = bit_index >> (PAGE_SHIFT + 3);
> +    int bit_index_residual = bit_index & ((1ul << (PAGE_SHIFT + 3)) - 1);

As should all of these.

> +
> +    set_bit(bit_index_residual, d->arch.dirty.bitmap[page_index]);
> +}
> +
> +/* Allocate dirty bitmap resource */
> +static int bitmap_init(struct domain *d)

This function name is far too generic.

> +{
> +    paddr_t gma_start = 0;
> +    paddr_t gma_end = 0;
> +    int nr_bytes;
> +    int nr_pages;
> +    int i;

Truncation and unsigned issues.  I will stop commenting on them now, but
most of this patch needs fixing.

> +
> +    domain_get_ram_range(d, &gma_start, &gma_end);
> +
> +    nr_bytes = (PFN_DOWN(gma_end - gma_start) + 7) / 8;
> +    nr_pages = (nr_bytes + PAGE_SIZE - 1) / PAGE_SIZE;
> +
> +    BUG_ON(nr_pages > MAX_DIRTY_BITMAP_PAGES);

This looks like it should be an init() failure, or BUILD_BUG_ON().

> +
> +    for ( i = 0; i < nr_pages; ++i )
> +    {
> +        struct page_info *page;
> +        page = alloc_domheap_page(NULL, 0);

Those two lines can be combined, and needs a blank line following it.

> +        if ( page == NULL )
> +            goto cleanup_on_failure;
> +
> +        d->arch.dirty.bitmap[i] = map_domain_page_global(__page_to_mfn(page));

__map_domain_page_global(page) is your friend, and it can fail so needs
checking.

How many pages is this?  global dompage mapping are scarce.

This function would become substantially more trivial if Xen had a
zalloc_domheap_pages() helper.  There is quite a bit of other code which
could take advantage.

> +        clear_page(d->arch.dirty.bitmap[i]);
> +    }
> +
> +    d->arch.dirty.bitmap_pages = nr_pages;
> +    return 0;
> +
> +cleanup_on_failure:
> +    nr_pages = i;
> +    for ( i = 0; i < nr_pages; ++i )
> +    {
> +        unmap_domain_page_global(d->arch.dirty.bitmap[i]);
> +    }
> +
> +    return -ENOMEM;
> +}
> +
> +/* Cleanup dirty bitmap resource */
> +static void bitmap_cleanup(struct domain *d)
> +{
> +    int i;
> +
> +    for ( i = 0; i < d->arch.dirty.bitmap_pages; ++i )
> +    {
> +        unmap_domain_page_global(d->arch.dirty.bitmap[i]);
> +    }
> +}
> +
> +/* Flush VLPT area */
> +static void vlpt_flush(struct domain *d)
> +{
> +    int flush_size;
> +    flush_size = (d->arch.dirty.second_lvl_end - 
> +                  d->arch.dirty.second_lvl_start) << SECOND_SHIFT;
> +
> +    /* flushing the 3rd level mapping */
> +    flush_xen_data_tlb_range_va(d->arch.dirty.second_lvl_start << SECOND_SHIFT,
> +                                flush_size);
> +}
> +
> +/* Set up a page table for VLPT mapping */
> +static int vlpt_init(struct domain *d)
> +{
> +    uint64_t required, avail = VIRT_LIN_P2M_END - VIRT_LIN_P2M_START;
> +    int xen_second_linear_base;
> +    int gp2m_start_index, gp2m_end_index;
> +    struct p2m_domain *p2m = &d->arch.p2m;
> +    struct page_info *second_lvl_page;
> +    paddr_t gma_start = 0;
> +    paddr_t gma_end = 0;
> +    lpae_t *first[2];
> +    int i;
> +
> +    /* Check if reserved space is enough to cover guest physical address space.
> +     * Note that each LPAE page table entry is 64-bit (8 bytes). So we only
> +     * shift left with LPAE_SHIFT instead of PAGE_SHIFT. */
> +    domain_get_ram_range(d, &gma_start, &gma_end);
> +    required = (gma_end - gma_start) >> LPAE_SHIFT;
> +    if ( required > avail )
> +    {
> +        dprintk(XENLOG_ERR, "Available VLPT is small for domU guest (avail: "
> +                "%#llx, required: %#llx)\n", (unsigned long long)avail,

PRIx64 please, and loose the casts.

> +                (unsigned long long)required);
> +        return -ENOMEM;
> +    }
> +
> +    /* Caulculate the base of 2nd linear table base for VIRT_LIN_P2M_START */
> +    xen_second_linear_base = second_linear_offset(VIRT_LIN_P2M_START);
> +
> +    gp2m_start_index = gma_start >> FIRST_SHIFT;
> +    gp2m_end_index = (gma_end >> FIRST_SHIFT) + 1;
> +
> +    if ( xen_second_linear_base + gp2m_end_index >= LPAE_ENTRIES * 2 )
> +    {
> +        dprintk(XENLOG_ERR, "xen second page is small for VLPT for domU");
> +        return -ENOMEM;
> +    }
> +
> +    /* Two pages are allocated to backup the related PTE content of guest 
> +     * VM's 1st-level table. */
> +    second_lvl_page = alloc_domheap_pages(NULL, 1, 0);
> +    if ( second_lvl_page == NULL )
> +        return -ENOMEM;
> +    d->arch.dirty.second_lvl[0] = map_domain_page_global(
> +        page_to_mfn(second_lvl_page) );
> +    d->arch.dirty.second_lvl[1] = map_domain_page_global(
> +        page_to_mfn(second_lvl_page+1) );
> +
> +    /* 1st level P2M of guest VM is 2 consecutive pages */
> +    first[0] = __map_domain_page(p2m->first_level);
> +    first[1] = __map_domain_page(p2m->first_level+1);

spaces around binary operators.

> +
> +    for ( i = gp2m_start_index; i < gp2m_end_index; ++i )
> +    {
> +        int k = i % LPAE_ENTRIES;
> +        int l = i / LPAE_ENTRIES;
> +        int k2 = (xen_second_linear_base + i) % LPAE_ENTRIES;
> +        int l2 = (xen_second_linear_base + i) / LPAE_ENTRIES;
> +
> +        /* Update 2nd-level PTE of Xen linear table. With this, Xen linear 
> +         * page table layout becomes: 1st Xen linear ==> 2nd Xen linear ==> 
> +         * 2nd guest P2M (i.e. 3rd Xen linear) ==> 3rd guest P2M (i.e. Xen 
> +         * linear content) for VIRT_LIN_P2M_START address space. */
> +        write_pte(&xen_second[xen_second_linear_base+i], first[l][k]);
> +
> +        /* We copy the mapping into domain's structure as a reference
> +         * in case of the context switch (used in vlpt_restore function ) */
> +        d->arch.dirty.second_lvl[l2][k2] = first[l][k];
> +    }
> +    unmap_domain_page(first[0]);
> +    unmap_domain_page(first[1]);
> +
> +    /* storing the start and end index */
> +    d->arch.dirty.second_lvl_start = xen_second_linear_base + gp2m_start_index;
> +    d->arch.dirty.second_lvl_end = xen_second_linear_base + gp2m_end_index;
> +
> +    vlpt_flush(d);
> +
> +    return 0;
> +}
> +
> +static void vlpt_cleanup(struct domain *d)
> +{
> +    /* First level p2m is 2 consecutive pages */
> +    unmap_domain_page_global(d->arch.dirty.second_lvl[0]);
> +    unmap_domain_page_global(d->arch.dirty.second_lvl[1]);
> +}
> +
> +/* Returns zero if addr is not valid or dirty mode is not set */
> +int handle_page_fault(struct domain *d, paddr_t addr)
> +{
> +    lpae_t *vlp2m_pte = 0;
> +    paddr_t gma_start = 0;
> +    paddr_t gma_end = 0;
> +
> +    if ( !d->arch.dirty.mode )
> +        return 0;
> +
> +    domain_get_ram_range(d, &gma_start, &gma_end);
> +
> +    /* Ensure that addr is inside guest's RAM */
> +    if ( addr < gma_start || addr > gma_end )
> +        return 0;
> +
> +    vlp2m_pte = vlpt_get_3lvl_pte(addr);
> +    if ( vlp2m_pte->p2m.valid && vlp2m_pte->p2m.write == 0 &&
> +         vlp2m_pte->p2m.type == p2m_ram_logdirty )
> +    {
> +        lpae_t pte = *vlp2m_pte;
> +        pte.p2m.write = 1;
> +        write_pte(vlp2m_pte, pte);
> +        flush_tlb_local();
> +
> +        /* only necessary to lock between get-dirty bitmap and mark dirty
> +         * bitmap. If get-dirty bitmap happens immediately before this
> +         * lock, the corresponding dirty-page would be marked at the next
> +         * round of get-dirty bitmap */
> +        spin_lock(&d->arch.dirty.lock);
> +        bitmap_mark_dirty(d, addr);
> +        spin_unlock(&d->arch.dirty.lock);
> +    }
> +
> +    return 1;
> +}
> +
> +/* Restore the xen page table for vlpt mapping for domain */
> +void log_dirty_restore(struct domain *d)
> +{
> +    int i;
> +
> +    /* Nothing to do as log dirty mode is off */
> +    if ( !(d->arch.dirty.mode) )

superfluous brackets.

> +        return;
> +
> +    dsb(sy);
> +
> +    for ( i = d->arch.dirty.second_lvl_start; i < d->arch.dirty.second_lvl_end;
> +          ++i )
> +    {
> +        int k = i % LPAE_ENTRIES;
> +        int l = i / LPAE_ENTRIES;
> +
> +        if ( xen_second[i].bits != d->arch.dirty.second_lvl[l][k].bits )
> +        {
> +            write_pte(&xen_second[i], d->arch.dirty.second_lvl[l][k]);
> +            flush_xen_data_tlb_range_va(i << SECOND_SHIFT, 1 << SECOND_SHIFT);
> +        }
> +    }
> +
> +    dsb(sy);
> +    isb();
> +}
> +
> +/* Turn on log dirty */
> +int log_dirty_on(struct domain *d)
> +{
> +    if ( vlpt_init(d) || bitmap_init(d) )
> +        return -EINVAL;

This hides -ENOMEM from each of the init functions.

I am a fan of

return vlpt_init(d) ?: bitmap_init(d);

As an easy way of chaining a set of functions together if they succeed. 
Ian on the other hand isn't so I doubt you could get away with it.

> +
> +    return 0;
> +}
> +
> +/* Turn off log dirty */
> +void log_dirty_off(struct domain *d)
> +{
> +    bitmap_cleanup(d);
> +    vlpt_cleanup(d);
> +}
> +
> +/* Initialize log dirty fields */
> +int log_dirty_init(struct domain *d)
> +{
> +    d->arch.dirty.count = 0;
> +    d->arch.dirty.mode = 0;
> +    spin_lock_init(&d->arch.dirty.lock);
> +
> +    d->arch.dirty.second_lvl_start = 0;
> +    d->arch.dirty.second_lvl_end = 0;
> +    d->arch.dirty.second_lvl[0] = NULL;
> +    d->arch.dirty.second_lvl[1] = NULL;
> +
> +    memset(d->arch.dirty.bitmap, 0, sizeof(d->arch.dirty.bitmap));
> +    d->arch.dirty.bitmap_pages = 0;
> +
> +    return 0;
> +}
> +
> +/* Log dirty tear down */
> +void log_dirty_teardown(struct domain *d)
> +{
> +    return;
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> index 603c097..0808cc9 100644
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -6,6 +6,8 @@
>  #include <xen/bitops.h>
>  #include <asm/flushtlb.h>
>  #include <asm/gic.h>
> +#include <xen/guest_access.h>
> +#include <xen/pfn.h>
>  #include <asm/event.h>
>  #include <asm/hardirq.h>
>  #include <asm/page.h>
> @@ -208,6 +210,7 @@ static lpae_t mfn_to_p2m_entry(unsigned long mfn, unsigned int mattr,
>          break;
>  
>      case p2m_ram_ro:
> +    case p2m_ram_logdirty:
>          e.p2m.xn = 0;
>          e.p2m.write = 0;
>          break;
> @@ -261,6 +264,10 @@ static int p2m_create_table(struct domain *d,
>  
>      pte = mfn_to_p2m_entry(page_to_mfn(page), MATTR_MEM, p2m_invalid);
>  
> +    /* mark the write bit (page table's case, ro bit) as 0
> +     * so, it is writable in case of vlpt access */
> +    pte.pt.ro = 0;
> +
>      write_pte(entry, pte);
>  
>      return 0;
> @@ -696,6 +703,203 @@ unsigned long gmfn_to_mfn(struct domain *d, unsigned long gpfn)
>      return p >> PAGE_SHIFT;
>  }
>  
> +/* Change types across all p2m entries in a domain */
> +void p2m_change_entry_type_global(struct domain *d, enum mg nt)
> +{
> +    struct p2m_domain *p2m = &d->arch.p2m;
> +    paddr_t ram_base;
> +    int i1, i2, i3;
> +    int first_index, second_index, third_index;
> +    lpae_t *first = __map_domain_page(p2m->first_level);
> +    lpae_t pte, *second = NULL, *third = NULL;
> +
> +    domain_get_ram_range(d, &ram_base, NULL);
> +
> +    first_index = first_table_offset((uint64_t)ram_base);
> +    second_index = second_table_offset((uint64_t)ram_base);
> +    third_index = third_table_offset((uint64_t)ram_base);
> +
> +    BUG_ON(!first);
> +
> +    spin_lock(&p2m->lock);
> +
> +    for ( i1 = first_index; i1 < LPAE_ENTRIES*2; ++i1 )
> +    {
> +        lpae_walk_t first_pte = first[i1].walk;
> +        if ( !first_pte.valid || !first_pte.table )
> +            goto out;
> +
> +        second = map_domain_page(first_pte.base);
> +        BUG_ON(!second);

map_domain_page() cant fail.

> +
> +        for ( i2 = second_index; i2 < LPAE_ENTRIES; ++i2 )
> +        {
> +            lpae_walk_t second_pte = second[i2].walk;
> +
> +            if ( !second_pte.valid || !second_pte.table )
> +                goto out;
> +
> +            third = map_domain_page(second_pte.base);
> +            BUG_ON(!third);
> +
> +            for ( i3 = third_index; i3 < LPAE_ENTRIES; ++i3 )
> +            {
> +                lpae_walk_t third_pte = third[i3].walk;
> +
> +                if ( !third_pte.valid )
> +                    goto out;
> +
> +                pte = third[i3];
> +
> +                if ( nt == mg_ro )
> +                {
> +                    if ( pte.p2m.write == 1 )
> +                    {
> +                        pte.p2m.write = 0;
> +                        pte.p2m.type = p2m_ram_logdirty;
> +                    }
> +                    else
> +                    {
> +                        /* reuse avail bit as an indicator of 'actual' 
> +                         * read-only */
> +                        pte.p2m.type = p2m_ram_rw;
> +                    }
> +                }
> +                else if ( nt == mg_rw )
> +                {
> +                    if ( pte.p2m.write == 0 && 
> +                         pte.p2m.type == p2m_ram_logdirty )
> +                    {
> +                        pte.p2m.write = p2m_ram_rw;
> +                    }
> +                }
> +                write_pte(&third[i3], pte);
> +            }
> +            unmap_domain_page(third);
> +
> +            third = NULL;
> +            third_index = 0;
> +        }
> +        unmap_domain_page(second);
> +
> +        second = NULL;
> +        second_index = 0;
> +        third_index = 0;
> +    }
> +
> +out:
> +    flush_tlb_all_local();
> +    if ( third ) unmap_domain_page(third);
> +    if ( second ) unmap_domain_page(second);
> +    if ( first ) unmap_domain_page(first);
> +
> +    spin_unlock(&p2m->lock);
> +}
> +
> +/* Read a domain's log-dirty bitmap and stats. If the operation is a CLEAN, 
> + * clear the bitmap and stats. */
> +int log_dirty_op(struct domain *d, xen_domctl_shadow_op_t *sc)
> +{
> +    int peek = 1;
> +    int i;
> +    int bitmap_size;
> +    paddr_t gma_start, gma_end;
> +
> +    /* this hypercall is called from domain 0, and we don't know which guest's
> +     * vlpt is mapped in xen_second, so, to be sure, we restore vlpt here */
> +    log_dirty_restore(d);
> +
> +    domain_get_ram_range(d, &gma_start, &gma_end);
> +    bitmap_size = (gma_end - gma_start) / 8;
> +
> +    if ( guest_handle_is_null(sc->dirty_bitmap) )
> +    {
> +        peek = 0;
> +    }
> +    else
> +    {
> +        spin_lock(&d->arch.dirty.lock);
> +
> +        for ( i = 0; i < d->arch.dirty.bitmap_pages; ++i )
> +        {
> +            int j = 0;
> +            uint8_t *bitmap;
> +
> +            copy_to_guest_offset(sc->dirty_bitmap, i * PAGE_SIZE,
> +                                 d->arch.dirty.bitmap[i],
> +                                 bitmap_size < PAGE_SIZE ? bitmap_size :
> +                                                           PAGE_SIZE);
> +            bitmap_size -= PAGE_SIZE;
> +
> +            /* set p2m page table read-only */
> +            bitmap = d->arch.dirty.bitmap[i];
> +            while ((j = find_next_bit((const long unsigned int *)bitmap,
> +                                      PAGE_SIZE*8, j)) < PAGE_SIZE*8)
> +            {
> +                lpae_t *vlpt;
> +                paddr_t addr = gma_start + (i << (2*PAGE_SHIFT+3)) +
> +                    (j << PAGE_SHIFT);
> +                vlpt = vlpt_get_3lvl_pte(addr);
> +                vlpt->p2m.write = 0;
> +                j++;
> +            }
> +        }
> +
> +        if ( sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN )
> +        {
> +            for ( i = 0; i < d->arch.dirty.bitmap_pages; ++i )
> +            {
> +                clear_page(d->arch.dirty.bitmap[i]);
> +            }
> +        }
> +
> +        spin_unlock(&d->arch.dirty.lock);
> +        flush_tlb_local();
> +    }
> +
> +    sc->stats.dirty_count = d->arch.dirty.count;
> +
> +    return 0;
> +}
> +
> +long dirty_mode_op(struct domain *d, xen_domctl_shadow_op_t *sc)
> +{
> +    long ret = 0;
> +    switch (sc->op)
> +    {
> +        case XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY:
> +        case XEN_DOMCTL_SHADOW_OP_OFF:
> +        {
> +            enum mg nt = sc->op == XEN_DOMCTL_SHADOW_OP_OFF ? mg_rw : mg_ro;
> +
> +            d->arch.dirty.mode = sc->op == XEN_DOMCTL_SHADOW_OP_OFF ? 0 : 1;
> +            p2m_change_entry_type_global(d, nt);
> +
> +            if ( sc->op == XEN_DOMCTL_SHADOW_OP_OFF )
> +            {
> +                log_dirty_off(d);
> +            }
> +            else
> +            {
> +                if ( (ret = log_dirty_on(d)) )
> +                    return ret;
> +            }
> +        }
> +        break;
> +
> +        case XEN_DOMCTL_SHADOW_OP_CLEAN:
> +        case XEN_DOMCTL_SHADOW_OP_PEEK:
> +        {
> +            ret = log_dirty_op(d, sc);
> +        }
> +        break;
> +
> +        default:
> +            return -ENOSYS;
> +    }
> +    return ret;
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index df4d375..b652565 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -1556,6 +1556,8 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs,
>      struct hsr_dabt dabt = hsr.dabt;
>      int rc;
>      mmio_info_t info;
> +    int page_fault = ( dabt.write && ((dabt.dfsc & FSC_MASK) == 
> +                                      (FSC_FLT_PERM|FSC_3RD_LEVEL)) );

This looks like a bool_t to me.

~Andrew

>  
>      if ( !check_conditional_instr(regs, hsr) )
>      {
> @@ -1577,6 +1579,13 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs,
>      if ( rc == -EFAULT )
>          goto bad_data_abort;
>  
> +    /* domU page fault handling for guest live migration. Note that 
> +     * dabt.valid can be 0 here */
> +    if ( page_fault && handle_page_fault(current->domain, info.gpa) )
> +    {
> +        /* Do not modify PC as guest needs to repeat memory operation */
> +        return;
> +    }
>      /* XXX: Decode the instruction if ISS is not valid */
>      if ( !dabt.valid )
>          goto bad_data_abort;
> diff --git a/xen/include/asm-arm/config.h b/xen/include/asm-arm/config.h
> index ef291ff..f18fae4 100644
> --- a/xen/include/asm-arm/config.h
> +++ b/xen/include/asm-arm/config.h
> @@ -87,6 +87,7 @@
>   *   0  -   8M   <COMMON>
>   *
>   *  32M - 128M   Frametable: 24 bytes per page for 16GB of RAM
> + * 128M - 256M   Virtual-linear mapping to P2M table
>   * 256M -   1G   VMAP: ioremap and early_ioremap use this virtual address
>   *                    space
>   *
> @@ -124,13 +125,15 @@
>  #define CONFIG_SEPARATE_XENHEAP 1
>  
>  #define FRAMETABLE_VIRT_START  _AT(vaddr_t,0x02000000)
> -#define VMAP_VIRT_START  _AT(vaddr_t,0x10000000)
> +#define VIRT_LIN_P2M_START     _AT(vaddr_t,0x08000000)
> +#define VMAP_VIRT_START        _AT(vaddr_t,0x10000000)
> +#define VIRT_LIN_P2M_END       VMAP_VIRT_START
>  #define XENHEAP_VIRT_START     _AT(vaddr_t,0x40000000)
>  #define XENHEAP_VIRT_END       _AT(vaddr_t,0x7fffffff)
>  #define DOMHEAP_VIRT_START     _AT(vaddr_t,0x80000000)
>  #define DOMHEAP_VIRT_END       _AT(vaddr_t,0xffffffff)
>  
> -#define VMAP_VIRT_END    XENHEAP_VIRT_START
> +#define VMAP_VIRT_END          XENHEAP_VIRT_START
>  
>  #define DOMHEAP_ENTRIES        1024  /* 1024 2MB mapping slots */
>  
> @@ -157,6 +160,11 @@
>  
>  #define HYPERVISOR_VIRT_END    DIRECTMAP_VIRT_END
>  
> +/* Definition for VIRT_LIN_P2M_START and VIRT_LIN_P2M_END (64-bit)
> + * TODO: Needs evaluation. */
> +#define VIRT_LIN_P2M_START     _AT(vaddr_t, 0x08000000)
> +#define VIRT_LIN_P2M_END       VMAP_VIRT_START
> +
>  #endif
>  
>  /* Fixmap slots */
> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
> index aabeb51..ac82643 100644
> --- a/xen/include/asm-arm/domain.h
> +++ b/xen/include/asm-arm/domain.h
> @@ -162,6 +162,25 @@ struct arch_domain
>      } vuart;
>  
>      unsigned int evtchn_irq;
> +
> +    /* dirty page tracing */
> +    struct {
> +        spinlock_t lock;
> +        volatile int mode;               /* 1 if dirty pages tracing enabled */
> +        volatile unsigned int count;     /* dirty pages counter */
> +
> +        /* vlpt context switch */
> +        volatile int second_lvl_start; /* start idx of virt linear space 2nd */
> +        volatile int second_lvl_end;   /* end idx of virt linear space 2nd */
> +        lpae_t *second_lvl[2];         /* copy of guest P2M 1st-lvl content */
> +
> +        /* bitmap to track dirty pages */
> +#define MAX_DIRTY_BITMAP_PAGES 64
> +        /* Because each bit represents a dirty page, the total supported guest 
> +         * memory is (64 entries x 4KB/entry x 8bits/byte x 4KB) = 8GB. */
> +        uint8_t *bitmap[MAX_DIRTY_BITMAP_PAGES]; /* dirty bitmap */
> +        int bitmap_pages;                        /* # of dirty bitmap pages */
> +    } dirty;
>  }  __cacheline_aligned;
>  
>  struct arch_vcpu
> diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
> index b8d4e7d..ab19025 100644
> --- a/xen/include/asm-arm/mm.h
> +++ b/xen/include/asm-arm/mm.h
> @@ -4,6 +4,7 @@
>  #include <xen/config.h>
>  #include <xen/kernel.h>
>  #include <asm/page.h>
> +#include <asm/config.h>
>  #include <public/xen.h>
>  
>  /* Align Xen to a 2 MiB boundary. */
> @@ -320,6 +321,7 @@ int donate_page(
>  #define domain_clamp_alloc_bitsize(d, b) (b)
>  
>  unsigned long domain_get_maximum_gpfn(struct domain *d);
> +void domain_get_ram_range(struct domain *d, paddr_t *start, paddr_t *end);
>  
>  extern struct domain *dom_xen, *dom_io, *dom_cow;
>  
> @@ -341,6 +343,27 @@ static inline void put_page_and_type(struct page_info *page)
>      put_page(page);
>  }
>  
> +enum mg { mg_clear, mg_ro, mg_rw, mg_rx };
> +
> +/************************************/
> +/*    Log-dirty support functions   */
> +/************************************/
> +int log_dirty_on(struct domain *d);
> +void log_dirty_off(struct domain *d);
> +int log_dirty_init(struct domain *d);
> +void log_dirty_teardown(struct domain *d);
> +void log_dirty_restore(struct domain *d);
> +int handle_page_fault(struct domain *d, paddr_t addr);
> +/* access leaf PTE of a given guest address (GPA) */
> +static inline lpae_t * vlpt_get_3lvl_pte(paddr_t addr)
> +{
> +    lpae_t *table = (lpae_t *)VIRT_LIN_P2M_START;
> +
> +    /* Since we slotted the guest's first p2m page table to xen's
> +     * second page table, one shift is enough for calculating the
> +     * index of guest p2m table entry */
> +    return &table[addr >> PAGE_SHIFT];
> +}
>  #endif /*  __ARCH_ARM_MM__ */
>  /*
>   * Local variables:
> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
> index bd71abe..0cecbe7 100644
> --- a/xen/include/asm-arm/p2m.h
> +++ b/xen/include/asm-arm/p2m.h
> @@ -2,6 +2,7 @@
>  #define _XEN_P2M_H
>  
>  #include <xen/mm.h>
> +#include <public/domctl.h>
>  
>  struct domain;
>  
> @@ -41,6 +42,7 @@ typedef enum {
>      p2m_invalid = 0,    /* Nothing mapped here */
>      p2m_ram_rw,         /* Normal read/write guest RAM */
>      p2m_ram_ro,         /* Read-only; writes are silently dropped */
> +    p2m_ram_logdirty,   /* Read-only: special mode for log dirty */
>      p2m_mmio_direct,    /* Read/write mapping of genuine MMIO area */
>      p2m_map_foreign,    /* Ram pages from foreign domain */
>      p2m_grant_map_rw,   /* Read/write grant mapping */
> @@ -49,7 +51,8 @@ typedef enum {
>  } p2m_type_t;
>  
>  #define p2m_is_foreign(_t)  ((_t) == p2m_map_foreign)
> -#define p2m_is_ram(_t)      ((_t) == p2m_ram_rw || (_t) == p2m_ram_ro)
> +#define p2m_is_ram(_t)      ((_t) == p2m_ram_rw || (_t) == p2m_ram_ro ||  \
> +                             (_t) == p2m_ram_logdirty)
>  
>  /* Initialise vmid allocator */
>  void p2m_vmid_allocator_init(void);
> @@ -178,6 +181,9 @@ static inline int get_page_and_type(struct page_info *page,
>      return rc;
>  }
>  
> +void p2m_change_entry_type_global(struct domain *d, enum mg nt);
> +long dirty_mode_op(struct domain *d, xen_domctl_shadow_op_t *sc);
> +
>  #endif /* _XEN_P2M_H */
>  
>  /*
> diff --git a/xen/include/asm-arm/processor.h b/xen/include/asm-arm/processor.h
> index 750864a..0bf3d67 100644
> --- a/xen/include/asm-arm/processor.h
> +++ b/xen/include/asm-arm/processor.h
> @@ -407,6 +407,8 @@ union hsr {
>  #define FSC_CPR        (0x3a) /* Coprocossor Abort */
>  
>  #define FSC_LL_MASK    (_AC(0x03,U)<<0)
> +#define FSC_MASK       (0x3f) /* Fault status mask */
> +#define FSC_3RD_LEVEL  (0x03) /* Third level fault */
>  
>  /* Time counter hypervisor control register */
>  #define CNTHCTL_PA      (1u<<0)  /* Kernel/user access to physical counter */

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 1/6] xen/arm: Add basic save/restore support for ARM
  2014-05-08 22:20     ` Wei Huang
@ 2014-05-09  8:56       ` Julien Grall
  2014-05-14 10:27         ` Ian Campbell
  0 siblings, 1 reply; 67+ messages in thread
From: Julien Grall @ 2014-05-09  8:56 UTC (permalink / raw)
  To: Wei Huang, Andrew Cooper, xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, tim, ian.jackson,
	jaeyong.yoo, jbeulich, yjhyun.yoo

Hi Andrew and Wei,

On 08/05/14 23:20, Wei Huang wrote:
> On 05/08/2014 05:11 PM, Andrew Cooper wrote:
>> On 08/05/2014 22:18, Wei Huang wrote:
>>
>>
>>> diff --git a/xen/include/asm-arm/hvm/support.h
>>> b/xen/include/asm-arm/hvm/support.h
>>> new file mode 100644
>>> index 0000000..fa5fe75
>>> --- /dev/null
>>> +++ b/xen/include/asm-arm/hvm/support.h
>>> @@ -0,0 +1,29 @@
>>> +/*
>>> + * HVM support routines
>>> + *
>>> + * Copyright (c) 2014, Samsung Electronics.
>>> + *
>>> + * This program is free software; you can redistribute it and/or
>>> modify it
>>> + * under the terms and conditions of the GNU General Public License,
>>> + * version 2, as published by the Free Software Foundation.
>>> + *
>>> + * This program is distributed in the hope it will be useful, but
>>> WITHOUT
>>> + * ANY WARRANTY; without even the implied warranty of
>>> MERCHANTABILITY or
>>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
>>> License for
>>> + * more details.
>>> + *
>>> + * You should have received a copy of the GNU General Public License
>>> along with
>>> + * this program; if not, write to the Free Software Foundation,
>>> Inc., 59 Temple
>>> + * Place - Suite 330, Boston, MA 02111-1307 USA.
>>> + */
>>> +
>>> +#ifndef __ASM_ARM_HVM_SUPPORT_H__
>>> +#define __ASM_ARM_HVM_SUPPORT_H__
>>> +
>>> +#include <xen/types.h>
>>> +#include <public/hvm/ioreq.h>
>>> +#include <xen/sched.h>
>>> +#include <xen/hvm/save.h>
>>> +#include <asm/processor.h>
>>> +
>>> +#endif /* __ASM_ARM_HVM_SUPPORT_H__ */
>>
>> This header file isn't touched by any subsequent patches, and just
>> having it as a list of includes is overkill.  Can it be dropped?  5
>> extra includes in a few .c files is hardly breaking the bank.
> Last time I tried it quickly, the compilation broke. Will try again.

This header is used in common code (xen/common/hvm/save.c). It might be 
better on have a dummy header.

Some of the includes you've added shouldn't not be used by ARM (i.e 
public/hvm/ioreq.h).

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 1/6] xen/arm: Add basic save/restore support for ARM
  2014-05-08 21:18 ` [RFC v3 1/6] xen/arm: Add basic save/restore support for ARM Wei Huang
  2014-05-08 22:11   ` Andrew Cooper
@ 2014-05-09  9:06   ` Julien Grall
  2014-05-09  9:42   ` Jan Beulich
  2014-05-14 10:37   ` Ian Campbell
  3 siblings, 0 replies; 67+ messages in thread
From: Julien Grall @ 2014-05-09  9:06 UTC (permalink / raw)
  To: Wei Huang, xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, andrew.cooper3, tim,
	jaeyong.yoo, jbeulich, ian.jackson, yjhyun.yoo

Hi Wei,

On 08/05/14 22:18, Wei Huang wrote:
> +void arch_hvm_save(struct domain *d, struct hvm_save_header *hdr)
> +{
> +    hdr->magic = HVM_ARM_FILE_MAGIC;
> +    hdr->version = HVM_ARM_FILE_VERSION;
> +    hdr->cpuinfo = READ_SYSREG32(MIDR_EL1);

You can directly use boot_cpu_data.midr.bits.

> +}
> +
> +int arch_hvm_load(struct domain *d, struct hvm_save_header *hdr)
> +{
> +    uint32_t cpuinfo;
> +
> +    if ( hdr->magic != HVM_ARM_FILE_MAGIC )
> +    {
> +        printk(XENLOG_G_ERR "HVM%d restore: bad magic number %#"PRIx32"\n",
> +               d->domain_id, hdr->magic);
> +        return -EINVAL;
> +    }
> +
> +    if ( hdr->version != HVM_ARM_FILE_VERSION )
> +    {
> +        printk(XENLOG_G_ERR "HVM%d restore: unsupported version %u\n",
> +               d->domain_id, hdr->version);
> +        return -EINVAL;
> +    }
> +
> +    cpuinfo = READ_SYSREG32(MIDR_EL1);

Same here.

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 2/6] xen/arm: Add save/restore support for ARM GIC V2
  2014-05-08 21:18 ` [RFC v3 2/6] xen/arm: Add save/restore support for ARM GIC V2 Wei Huang
  2014-05-08 22:47   ` Andrew Cooper
@ 2014-05-09  9:17   ` Julien Grall
  2014-05-14 11:07   ` Ian Campbell
  2014-05-15 17:15   ` Julien Grall
  3 siblings, 0 replies; 67+ messages in thread
From: Julien Grall @ 2014-05-09  9:17 UTC (permalink / raw)
  To: Wei Huang, xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, andrew.cooper3, tim,
	jaeyong.yoo, jbeulich, ian.jackson, yjhyun.yoo

Hi Wei,

(Adding Vijay in CC).

Vijay is working on GICv3 support in Xen. His patch series and this 
patch will clash sooner.

I think you should work together to avoid big reworking later.

On 08/05/14 22:18, Wei Huang wrote:
> This patch implements a save/restore support for
> ARM guest GIC. Two types of GIC V2 states are saved seperately:

separately

> 1) VGICD_* contains the GIC distributor state from
> guest VM's view; 2) GICH_* is the GIC virtual control

I would add a newline before "2)", we don't care about long commit 
message :).

> state from hypervisor's persepctive.

perspective

>
> Signed-off-by: Evgeny Fedotov <e.fedotov@samsung.com>
> Signed-off-by: Wei Huang <w1.huang@samsung.com>
> ---
>   xen/arch/arm/vgic.c                    |  171 ++++++++++++++++++++++++++++++++
>   xen/include/public/arch-arm/hvm/save.h |   34 ++++++-
>   2 files changed, 204 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> index 4cf6470..505e944 100644
> --- a/xen/arch/arm/vgic.c
> +++ b/xen/arch/arm/vgic.c
> @@ -24,6 +24,7 @@
>   #include <xen/softirq.h>
>   #include <xen/irq.h>
>   #include <xen/sched.h>
> +#include <xen/hvm/save.h>
>
>   #include <asm/current.h>
>
> @@ -73,6 +74,110 @@ static struct vgic_irq_rank *vgic_irq_rank(struct vcpu *v, int b, int n)
>           return NULL;
>   }
>
> +/* Save guest VM's distributor info into a context to support domains
> + * save/restore. Such info represents guest VM's view of its GIC
> + * distributor (GICD_*).
> + */
> +static int hvm_vgicd_save(struct domain *d, hvm_domain_context_t *h)
> +{
> +    struct hvm_arm_vgicd_v2 ctxt;
> +    struct vcpu *v;
> +    struct vgic_irq_rank *rank;
> +    int rc = 0;
> +
> +    /* Save the state for each VCPU */
> +    for_each_vcpu( d, v )
> +    {
> +        rank = &v->arch.vgic.private_irqs;
> +
> +        /* IENABLE, IACTIVE, IPEND,  PENDSGI */
> +        ctxt.ienable = rank->ienable;
> +        ctxt.iactive = rank->iactive;
> +        ctxt.ipend = rank->ipend;
> +        ctxt.pendsgi = rank->pendsgi;
> +
> +        /* ICFG */
> +        ctxt.icfg[0] = rank->icfg[0];
> +        ctxt.icfg[1] = rank->icfg[1];

I would use the same pattern as IPRIOTITY and ITARGETS.

[..]

> +/* Load guest VM's distributor info from a context to support domain
> + * save/restore. The info is loaded into vgic_irq_rank.
> + */
> +static int hvm_vgicd_load(struct domain *d, hvm_domain_context_t *h)
> +{

[..]

> +    /* ICFG */
> +    rank->icfg[0] = ctxt.icfg[0];
> +    rank->icfg[1] = ctxt.icfg[1];

Same remark here.

[..]

> +/* Save GIC virtual control state into a context to support save/restore.
> + * The info reprsents most of GICH_* registers. */

represents

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 1/6] xen/arm: Add basic save/restore support for ARM
  2014-05-08 21:18 ` [RFC v3 1/6] xen/arm: Add basic save/restore support for ARM Wei Huang
  2014-05-08 22:11   ` Andrew Cooper
  2014-05-09  9:06   ` Julien Grall
@ 2014-05-09  9:42   ` Jan Beulich
  2014-05-14 10:37   ` Ian Campbell
  3 siblings, 0 replies; 67+ messages in thread
From: Jan Beulich @ 2014-05-09  9:42 UTC (permalink / raw)
  To: Wei Huang
  Cc: tim, keir, ian.campbell, stefano.stabellini, andrew.cooper3,
	julien.grall, ian.jackson, jaeyong.yoo, xen-devel, yjhyun.yoo

>>> On 08.05.14 at 23:18, <w1.huang@samsung.com> wrote:
> This patch implements a basic framework for ARM guest
> save/restore. It defines a HVM save header for ARM guests
> and correponding arch_ save/load functions. These functions
> are hooked up with domain control hypercalls (gethvmcontext
> and sethvmcontext). The hypercalls become a common code path to
> both x86 and ARM. As a result of merging, the x86 specific
> header saving code is moved to x86 sub-directory.
> 
> Signed-off-by: Evgeny Fedotov <e.fedotov@samsung.com>
> Signed-off-by: Wei Huang <w1.huang@samsung.com>
> ---
>  xen/arch/arm/Makefile                  |    1 +
>  xen/arch/arm/save.c                    |   68 +++++++++++++++++++++++++
>  xen/arch/x86/domctl.c                  |   85 -------------------------------
>  xen/arch/x86/hvm/save.c                |   12 +++++
>  xen/common/Makefile                    |    2 +-
>  xen/common/domctl.c                    |   86 ++++++++++++++++++++++++++++++++
>  xen/common/hvm/save.c                  |   11 ----
>  xen/include/asm-arm/hvm/support.h      |   29 +++++++++++
>  xen/include/public/arch-arm/hvm/save.h |   19 +++++++
>  9 files changed, 216 insertions(+), 97 deletions(-)

Provided it is like it looks like - just code movement - for the non-ARM
bits:
Acked-by: Jan Beulich <jbeulich@suse.com>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 2/6] xen/arm: Add save/restore support for ARM GIC V2
  2014-05-08 22:47   ` Andrew Cooper
@ 2014-05-09 14:12     ` Wei Huang
  2014-05-09 14:24       ` Ian Campbell
  2014-05-13 14:53     ` Wei Huang
  1 sibling, 1 reply; 67+ messages in thread
From: Wei Huang @ 2014-05-09 14:12 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, ian.jackson,
	julien.grall, tim, jaeyong.yoo, jbeulich, yjhyun.yoo

On 05/08/2014 05:47 PM, Andrew Cooper wrote:
>> +DECLARE_HVM_SAVE_TYPE(GICH_V2, 3, struct hvm_arm_gich_v2);
>> +
>>   /*
>>    * Largest type-code in use
>>    */
>> -#define HVM_SAVE_CODE_MAX 1
>> +#define HVM_SAVE_CODE_MAX 3
>>
>>   #endif
>>
>
> On x86, we require that HVM save records only contain architectural
> state.  Not knowing arm myself, it is not clear from your comments
> whether this is the case or not.  Can you confirm whether it is or not?
Most states are guest architecture states which include core registers, 
arch timer, memory. GIC states are arguable, given that Xen uses data 
structures (e.g. struct vgic_irq_rank) to represent GIC states internally.
>
> ~Andrew
>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 2/6] xen/arm: Add save/restore support for ARM GIC V2
  2014-05-09 14:12     ` Wei Huang
@ 2014-05-09 14:24       ` Ian Campbell
  2014-05-11 16:15         ` Julien Grall
  0 siblings, 1 reply; 67+ messages in thread
From: Ian Campbell @ 2014-05-09 14:24 UTC (permalink / raw)
  To: Wei Huang
  Cc: keir, stefano.stabellini, Andrew Cooper, julien.grall, tim,
	jaeyong.yoo, xen-devel, jbeulich, ian.jackson, yjhyun.yoo

On Fri, 2014-05-09 at 09:12 -0500, Wei Huang wrote:
> On 05/08/2014 05:47 PM, Andrew Cooper wrote:
> >> +DECLARE_HVM_SAVE_TYPE(GICH_V2, 3, struct hvm_arm_gich_v2);
> >> +
> >>   /*
> >>    * Largest type-code in use
> >>    */
> >> -#define HVM_SAVE_CODE_MAX 1
> >> +#define HVM_SAVE_CODE_MAX 3
> >>
> >>   #endif
> >>
> >
> > On x86, we require that HVM save records only contain architectural
> > state.  Not knowing arm myself, it is not clear from your comments
> > whether this is the case or not.  Can you confirm whether it is or not?
> Most states are guest architecture states which include core registers, 
> arch timer, memory. GIC states are arguable, given that Xen uses data 
> structures (e.g. struct vgic_irq_rank) to represent GIC states internally.

(note: I've not looked at this series for ages, I plan to look at this
new version next week)

The contents of vgic_irq_rank is still a set of architectural register,
I think (the rank thing is just to account for the fact that some
registers use 1 bit to describe 32-registers, some use 2 bits to
describe 16, etc).

If there is something in a vgic rank which needs saving which isn't in
an architectural form then it needs synthesizing to/from the
architectural state on save/restore.

IOW ARM has the same requirements as x86 here.

Ian.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 4/6] xen/arm: Add save/restore support for guest core registers
  2014-05-08 23:10   ` Andrew Cooper
@ 2014-05-09 16:35     ` Wei Huang
  2014-05-09 16:52       ` Ian Campbell
  0 siblings, 1 reply; 67+ messages in thread
From: Wei Huang @ 2014-05-09 16:35 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, ian.jackson,
	julien.grall, tim, jaeyong.yoo, jbeulich, yjhyun.yoo

On 05/08/2014 06:10 PM, Andrew Cooper wrote:
> On 08/05/2014 22:18, Wei Huang wrote:
>> +#ifdef CONFIG_ARM_32
>> +    v->arch.ifar = ctxt.ifar;
>> +    v->arch.dfar = ctxt.dfar;
>> +    v->arch.dfsr = ctxt.dfsr;
>> +#else
>> +    v->arch.far = ctxt.far;
>> +    v->arch.esr = ctxt.esr;
>> +#endif
>
> Where you have code like this, please use a union in the structure to
> reduce its size.
I thought about it from your comments. We can only combine ifar and dfar 
into the far register. But it became a bit confusing. Will try again.
>
>> +    /* ======= Guest Core Registers =======
>> +     *   - Each reg is multiplexed for AArch64 and AArch32 guests, if possible
>> +     *   - Each comments, /AArch64_reg, AArch32_reg/, describes its
>> +     *     corresponding 64- and 32-bit register name. "NA" means
>> +     *     "Not Applicable".
>> +     *   - Check "struct vcpu_guest_core_regs" for details.
>> +     */
>> +    uint64_t x0;     /* x0, r0_usr */
>> +    uint64_t x1;     /* x1, r1_usr */
>> +    uint64_t x2;     /* x2, r2_usr */
>> +    uint64_t x3;     /* x3, r3_usr */
>> +    uint64_t x4;     /* x4, r4_usr */
>> +    uint64_t x5;     /* x5, r5_usr */
>> +    uint64_t x6;     /* x6, r6_usr */
>> +    uint64_t x7;     /* x7, r7_usr */
>> +    uint64_t x8;     /* x8, r8_usr */
>> +    uint64_t x9;     /* x9, r9_usr */
>> +    uint64_t x10;    /* x10, r10_usr */
>> +    uint64_t x11;    /* x11, r11_usr */
>> +    uint64_t x12;    /* x12, r12_usr */
>> +    uint64_t x13;    /* x13, sp_usr */
>> +    uint64_t x14;    /* x14, lr_usr; */
>> +    uint64_t x15;    /* x15, __unused_sp_hyp */
>> +    uint64_t x16;    /* x16, lr_irq */
>> +    uint64_t x17;    /* x17, sp_irq */
>> +    uint64_t x18;    /* x18, lr_svc */
>> +    uint64_t x19;    /* x19, sp_svc */
>> +    uint64_t x20;    /* x20, lr_abt */
>> +    uint64_t x21;    /* x21, sp_abt */
>> +    uint64_t x22;    /* x22, lr_und */
>> +    uint64_t x23;    /* x23, sp_und */
>> +    uint64_t x24;    /* x24, r8_fiq */
>> +    uint64_t x25;    /* x25, r9_fiq */
>> +    uint64_t x26;    /* x26, r10_fiq */
>> +    uint64_t x27;    /* x27, r11_fiq */
>> +    uint64_t x28;    /* x28, r12_fiq */
>> +    uint64_t x29;    /* fp, sp_fiq */
>> +    uint64_t x30;    /* lr, lr_fiq */
>
> Please use "uint64_t x[31];" and some loops.
>
Register multiplexing is very confusing when 32-bit VM is running on a 
64-bit Xen. Even though your request is reasonable, I still consider 
expanding the register names here helpful. At least it gives me space to 
add comments to show the register mapping layout. I believe others will 
find it useful as well.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 4/6] xen/arm: Add save/restore support for guest core registers
  2014-05-09 16:35     ` Wei Huang
@ 2014-05-09 16:52       ` Ian Campbell
  0 siblings, 0 replies; 67+ messages in thread
From: Ian Campbell @ 2014-05-09 16:52 UTC (permalink / raw)
  To: Wei Huang
  Cc: keir, stefano.stabellini, Andrew Cooper, julien.grall, tim,
	jaeyong.yoo, xen-devel, jbeulich, ian.jackson, yjhyun.yoo

On Fri, 2014-05-09 at 11:35 -0500, Wei Huang wrote:
> On 05/08/2014 06:10 PM, Andrew Cooper wrote:
> > On 08/05/2014 22:18, Wei Huang wrote:
> >> +#ifdef CONFIG_ARM_32
> >> +    v->arch.ifar = ctxt.ifar;
> >> +    v->arch.dfar = ctxt.dfar;
> >> +    v->arch.dfsr = ctxt.dfsr;
> >> +#else
> >> +    v->arch.far = ctxt.far;
> >> +    v->arch.esr = ctxt.esr;
> >> +#endif
> >
> > Where you have code like this, please use a union in the structure to
> > reduce its size.
> I thought about it from your comments. We can only combine ifar and dfar 
> into the far register. But it became a bit confusing. Will try again.

We use an ifdef for these particular register everywhere internally
already, because they are quite annoyingly subtley different between
arm32 and arm64. There is no requirement for the save/restore code to do
any different/better.

For the public API half we can't have this sort of ifdef though, so a
union or just having all of the field is probably required. Don't forget
about the 32-bit guest on 64-bit hypervisor case (caveat: I've only
looked at the above quoted snippet, not the whole context)

> >> +    /* ======= Guest Core Registers =======
> >> +     *   - Each reg is multiplexed for AArch64 and AArch32 guests, if possible
> >> +     *   - Each comments, /AArch64_reg, AArch32_reg/, describes its
> >> +     *     corresponding 64- and 32-bit register name. "NA" means
> >> +     *     "Not Applicable".
> >> +     *   - Check "struct vcpu_guest_core_regs" for details.
> >> +     */
> >> +    uint64_t x0;     /* x0, r0_usr */
> >> +    uint64_t x1;     /* x1, r1_usr */
> >> +    uint64_t x2;     /* x2, r2_usr */
> >> +    uint64_t x3;     /* x3, r3_usr */
> >> +    uint64_t x4;     /* x4, r4_usr */
> >> +    uint64_t x5;     /* x5, r5_usr */
> >> +    uint64_t x6;     /* x6, r6_usr */
> >> +    uint64_t x7;     /* x7, r7_usr */
> >> +    uint64_t x8;     /* x8, r8_usr */
> >> +    uint64_t x9;     /* x9, r9_usr */
> >> +    uint64_t x10;    /* x10, r10_usr */
> >> +    uint64_t x11;    /* x11, r11_usr */
> >> +    uint64_t x12;    /* x12, r12_usr */
> >> +    uint64_t x13;    /* x13, sp_usr */
> >> +    uint64_t x14;    /* x14, lr_usr; */
> >> +    uint64_t x15;    /* x15, __unused_sp_hyp */
> >> +    uint64_t x16;    /* x16, lr_irq */
> >> +    uint64_t x17;    /* x17, sp_irq */
> >> +    uint64_t x18;    /* x18, lr_svc */
> >> +    uint64_t x19;    /* x19, sp_svc */
> >> +    uint64_t x20;    /* x20, lr_abt */
> >> +    uint64_t x21;    /* x21, sp_abt */
> >> +    uint64_t x22;    /* x22, lr_und */
> >> +    uint64_t x23;    /* x23, sp_und */
> >> +    uint64_t x24;    /* x24, r8_fiq */
> >> +    uint64_t x25;    /* x25, r9_fiq */
> >> +    uint64_t x26;    /* x26, r10_fiq */
> >> +    uint64_t x27;    /* x27, r11_fiq */
> >> +    uint64_t x28;    /* x28, r12_fiq */
> >> +    uint64_t x29;    /* fp, sp_fiq */
> >> +    uint64_t x30;    /* lr, lr_fiq */
> >
> > Please use "uint64_t x[31];" and some loops.
> >
> Register multiplexing is very confusing when 32-bit VM is running on a 
> 64-bit Xen. Even though your request is reasonable, I still consider 
> expanding the register names here helpful. At least it gives me space to 
> add comments to show the register mapping layout. I believe others will 
> find it useful as well.

I am fine with what you've got, although matching the names used in the
internal struct cpu_user_regs for x29 and x30 would be nice perhaps.

Ian.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 3/6] xen/arm: Add save/restore support for ARM arch timer
  2014-05-08 21:18 ` [RFC v3 3/6] xen/arm: Add save/restore support for ARM arch timer Wei Huang
  2014-05-08 23:02   ` Andrew Cooper
@ 2014-05-11  8:58   ` Julien Grall
  2014-05-12  8:35     ` Ian Campbell
  2014-05-14 11:14   ` Ian Campbell
  2 siblings, 1 reply; 67+ messages in thread
From: Julien Grall @ 2014-05-11  8:58 UTC (permalink / raw)
  To: Wei Huang, xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, andrew.cooper3, tim,
	jaeyong.yoo, jbeulich, ian.jackson, yjhyun.yoo

Hi Wei,

Thank you for the patch.

On 08/05/14 22:18, Wei Huang wrote:
> +    switch ( ctxt.type )
> +    {
> +    case ARM_TIMER_TYPE_PHYS:
> +        t = &v->arch.phys_timer;
> +        d->arch.phys_timer_base.offset = ctxt.vtb_offset;
> +        break;
> +    case ARM_TIMER_TYPE_VIRT:
> +        t = &v->arch.virt_timer;
> +        d->arch.virt_timer_base.offset = ctxt.vtb_offset;

It seems you forgot to address some of my comments on the timer 
save/restore.

Saving {virt,phys}_timer_base.offset per VCPU rather than per-domain 
seems a waste of space and confusing.

Furthermore, this offset is used to get a relative offset from the Xen 
timer counter. Migrating the guess to another server will invalidate the 
value.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 3/6] xen/arm: Add save/restore support for ARM arch timer
  2014-05-08 23:02   ` Andrew Cooper
@ 2014-05-11  9:01     ` Julien Grall
  0 siblings, 0 replies; 67+ messages in thread
From: Julien Grall @ 2014-05-11  9:01 UTC (permalink / raw)
  To: Andrew Cooper, Wei Huang, xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, ian.jackson, tim,
	jaeyong.yoo, jbeulich, yjhyun.yoo

On 09/05/14 00:02, Andrew Cooper wrote:
>> +            }
>> +
>> +            if ( (rc = hvm_save_entry(TIMER, v->vcpu_id, h, &ctxt)) != 0 )
>> +                return rc;
>> +
>> +            t = &v->arch.phys_timer;
>
> This updating of t looks suspect and fragile.
>
> This is a good approximation of the "for case" programming paradigm
> (http://thedailywtf.com/Comments/The_FOR-CASE_paradigm.aspx).
>
> There are only two timers and they refer to different named items inside
> struct domain.  It would be clearer to remove the loop.

I agree with Andrew. I've already made a similar comment on v2...

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 4/6] xen/arm: Add save/restore support for guest core registers
  2014-05-08 21:18 ` [RFC v3 4/6] xen/arm: Add save/restore support for guest core registers Wei Huang
  2014-05-08 23:10   ` Andrew Cooper
@ 2014-05-11  9:06   ` Julien Grall
  2014-05-14 11:16     ` Ian Campbell
  2014-05-14 11:37   ` Ian Campbell
  2 siblings, 1 reply; 67+ messages in thread
From: Julien Grall @ 2014-05-11  9:06 UTC (permalink / raw)
  To: Wei Huang, xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, andrew.cooper3, tim,
	jaeyong.yoo, jbeulich, ian.jackson, yjhyun.yoo

Hi Wei,

Thank you for the patch.

On 08/05/14 22:18, Wei Huang wrote:
> This patch implements a save/resore support for ARM guest core
> registers.


The commit 893256f "xen/arm: Correctly save/restore CNTKCTL_EL1" 
save/restore a new register during the context switch.

I think you forgot to add it in this patch.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 0/6] xen/arm: ARM save/restore/migration support
  2014-05-08 21:18 [RFC v3 0/6] xen/arm: ARM save/restore/migration support Wei Huang
                   ` (5 preceding siblings ...)
  2014-05-08 21:18 ` [RFC v3 6/6] xen/arm: Implement toolstack for xl restore/save/migration Wei Huang
@ 2014-05-11  9:23 ` Julien Grall
  2014-05-12 14:37   ` Wei Huang
  2014-05-12 14:17 ` Julien Grall
  7 siblings, 1 reply; 67+ messages in thread
From: Julien Grall @ 2014-05-11  9:23 UTC (permalink / raw)
  To: Wei Huang, xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, andrew.cooper3, tim,
	jaeyong.yoo, jbeulich, ian.jackson, yjhyun.yoo

Hi Wei,

On 08/05/14 22:18, Wei Huang wrote:
> The following patches enable save/restore/migration support for ARM
> guest VMs. Note that the original series were sent from Jaeyong Yoo.
>
> Working:
>     * 32-bit (including SMP) guest VM save/restore/migration
>     * 64-bit guest VM save
> WIP:
>     * 64-bit guest restore/migration

I saw a couple of comments about the memory restriction (e.g 8Go from 
the bitmap...). What are those restriction? Can you make clear on error 
message for the user?

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 5/6] xen/arm: Add log_dirty support for ARM
  2014-05-08 21:18 ` [RFC v3 5/6] xen/arm: Add log_dirty support for ARM Wei Huang
  2014-05-08 23:46   ` Andrew Cooper
@ 2014-05-11 15:28   ` Julien Grall
  2014-05-12 14:00     ` Wei Huang
  2014-05-14 11:57     ` Ian Campbell
  2014-05-14 13:18   ` Ian Campbell
  2014-05-16 10:59   ` Julien Grall
  3 siblings, 2 replies; 67+ messages in thread
From: Julien Grall @ 2014-05-11 15:28 UTC (permalink / raw)
  To: Wei Huang, xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, andrew.cooper3, tim,
	jaeyong.yoo, jbeulich, ian.jackson, yjhyun.yoo

Hi Wei,

Thank you for the patch. It seems you didn't address most of my comment
I made on V2.

I will try to repeat all of them here. Next time, please read email
on the previous version before sending a new one.

On 08/05/14 22:18, Wei Huang wrote:
> This patch implements log_dirty for ARM guest VMs. This feature
> is provided via two basic blocks: dirty_bit_map and VLPT
> (virtual-linear page table)
>
> 1. VLPT provides fast accessing of 3rd PTE of guest P2M.
> When creating a mapping for VLPT, the page table mapping
> becomes the following:
>     xen's 1st PTE --> xen's 2nd PTE --> guest p2m's 2nd PTE -->
>     guest p2m's 3rd PTE
>
> With VLPT, xen can immediately locate the 3rd PTE of guest P2M
> and modify PTE attirbute during dirty page tracking. The following

attribute

> link shows the performance comparison for handling a dirty-page
> between VLPT and typical page table walking.
> http://lists.xen.org/archives/html/xen-devel/2013-08/msg01503.html
>
> For more info about VLPT, please see
> http://www.technovelty.org/linux/virtual-linear-page-table.html.
>
> 2. Dirty bitmap
> The dirty bitmap is used to mark the pages which are dirty during
> migration. The info is used by Xen tools, via DOMCTL_SHADOW_OP_*,
> to figure out which guest pages need to be resent.
>
> Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com>
> Signed-off-by: Evgeny Fedotov <e.fedotov@samsung.com>
> Signed-off-by: Wei Huang <w1.huang@samsung.com>
> ---
>   xen/arch/arm/domain.c           |    6 +
>   xen/arch/arm/domctl.c           |   31 +++-
>   xen/arch/arm/mm.c               |  298 ++++++++++++++++++++++++++++++++++++++-

As said on v2, the functions you have added is P2M related not Xen memory
related. I think they should  be moved in p2m.c

[..]

> +/* Return start and end addr of guest RAM. Note this function only reports
> + * regular RAM. It does not cover other areas such as foreign mapped
> + * pages or MMIO space. */
> +void domain_get_ram_range(struct domain *d, paddr_t *start, paddr_t *end)
> +{
> +    if ( start )
> +        *start = GUEST_RAM_BASE;
> +
> +    if ( end )
> +        *end = GUEST_RAM_BASE + ((paddr_t) d->max_pages << PAGE_SHIFT);
> +}

As said on V1 this solution won't work.

Ian plans to add multiple banks support for the guest very soon. With this
solution there is a 1GB hole between the 2 banks. Your function will therefore
stop to work.

Furthermore, Xen should not assume that the layout of the guest will always start
at GUEST_RAM_BASE.

I think you can use max_mapped_pfn and lowest_mapped_pfn here. You may need to
modify a bit the signification of it in the p2m code or introduce a new field.

[..]

> +/* Allocate dirty bitmap resource */
> +static int bitmap_init(struct domain *d)
> +{
> +    paddr_t gma_start = 0;
> +    paddr_t gma_end = 0;
> +    int nr_bytes;
> +    int nr_pages;
> +    int i;
> +
> +    domain_get_ram_range(d, &gma_start, &gma_end);
> +
> +    nr_bytes = (PFN_DOWN(gma_end - gma_start) + 7) / 8;
> +    nr_pages = (nr_bytes + PAGE_SIZE - 1) / PAGE_SIZE;
> +
> +    BUG_ON(nr_pages > MAX_DIRTY_BITMAP_PAGES);

AFAIU, you will crash Xen is the user is trying to migrate a guest with more than 8GB of RAM, right?

If so, you should instead return an error.

[..]

> +    /* Check if reserved space is enough to cover guest physical address space.
> +     * Note that each LPAE page table entry is 64-bit (8 bytes). So we only
> +     * shift left with LPAE_SHIFT instead of PAGE_SHIFT. */
> +    domain_get_ram_range(d, &gma_start, &gma_end);
> +    required = (gma_end - gma_start) >> LPAE_SHIFT;
> +    if ( required > avail )

What is the maximum amount of RAM a guest can use if we want to migrate it?

> +    {
> +        dprintk(XENLOG_ERR, "Available VLPT is small for domU guest (avail: "
> +                "%#llx, required: %#llx)\n", (unsigned long long)avail,
> +                (unsigned long long)required);

You don't need cast here.

> +        return -ENOMEM;
> +    }
> +
> +    /* Caulculate the base of 2nd linear table base for VIRT_LIN_P2M_START */

Calculate

> +    xen_second_linear_base = second_linear_offset(VIRT_LIN_P2M_START);
> +
> +    gp2m_start_index = gma_start >> FIRST_SHIFT;
> +    gp2m_end_index = (gma_end >> FIRST_SHIFT) + 1;
> +
> +    if ( xen_second_linear_base + gp2m_end_index >= LPAE_ENTRIES * 2 )
> +    {

In which case this thing happen?

> +        dprintk(XENLOG_ERR, "xen second page is small for VLPT for domU");
> +        return -ENOMEM;
> +    }
> +
> +    /* Two pages are allocated to backup the related PTE content of guest
> +     * VM's 1st-level table. */
> +    second_lvl_page = alloc_domheap_pages(NULL, 1, 0);
> +    if ( second_lvl_page == NULL )
> +        return -ENOMEM;
> +    d->arch.dirty.second_lvl[0] = map_domain_page_global(
> +        page_to_mfn(second_lvl_page) );
> +    d->arch.dirty.second_lvl[1] = map_domain_page_global(
> +        page_to_mfn(second_lvl_page+1) );

map_domain_page_global can fail.

> +
> +    /* 1st level P2M of guest VM is 2 consecutive pages */
> +    first[0] = __map_domain_page(p2m->first_level);
> +    first[1] = __map_domain_page(p2m->first_level+1);
> +
> +    for ( i = gp2m_start_index; i < gp2m_end_index; ++i )
> +    {
> +        int k = i % LPAE_ENTRIES;
> +        int l = i / LPAE_ENTRIES;
> +        int k2 = (xen_second_linear_base + i) % LPAE_ENTRIES;
> +        int l2 = (xen_second_linear_base + i) / LPAE_ENTRIES;
> +
> +        /* Update 2nd-level PTE of Xen linear table. With this, Xen linear
> +         * page table layout becomes: 1st Xen linear ==> 2nd Xen linear ==>
> +         * 2nd guest P2M (i.e. 3rd Xen linear) ==> 3rd guest P2M (i.e. Xen
> +         * linear content) for VIRT_LIN_P2M_START address space. */
> +        write_pte(&xen_second[xen_second_linear_base+i], first[l][k]);

Space around binary operator.

[..]

> +/* Restore the xen page table for vlpt mapping for domain */
> +void log_dirty_restore(struct domain *d)
> +{
> +    int i;
> +
> +    /* Nothing to do as log dirty mode is off */
> +    if ( !(d->arch.dirty.mode) )

Your VLPT implementation uses xen_second, which is shared between every pCPU.
This will restrict to migrate only one guest at the time. Therefore restoring
the VLPT seems pointless.

In another hand, I didn't see anything in your patch series which prevent this
case. You will likely corrupt one (if not both) guest.

You have to create per-VCPU mapping for your VLPT solution.

> +        return;
> +
> +    dsb(sy);

I think inner-shareable (ish) is enough here.

> +
> +    for ( i = d->arch.dirty.second_lvl_start; i < d->arch.dirty.second_lvl_end;
> +          ++i )
> +    {
> +        int k = i % LPAE_ENTRIES;
> +        int l = i / LPAE_ENTRIES;
> +
> +        if ( xen_second[i].bits != d->arch.dirty.second_lvl[l][k].bits )
> +        {
> +            write_pte(&xen_second[i], d->arch.dirty.second_lvl[l][k]);
> +            flush_xen_data_tlb_range_va(i << SECOND_SHIFT, 1 << SECOND_SHIFT);
> +        }
> +    }
> +
> +    dsb(sy);

Same here.

> +    isb();
> +}
> +
> +/* Turn on log dirty */
> +int log_dirty_on(struct domain *d)
> +{
> +    if ( vlpt_init(d) || bitmap_init(d) )

You have to call vlpt_cleanup if bitmap_init fail. Otherwise,
will let some page mapped via domain global.

> +        return -EINVAL;
> +
> +    return 0;
> +}
> +
> +/* Turn off log dirty */
> +void log_dirty_off(struct domain *d)
> +{
> +    bitmap_cleanup(d);
> +    vlpt_cleanup(d);
> +}
> +
> +/* Initialize log dirty fields */
> +int log_dirty_init(struct domain *d)

You don't check the return in arch_domain_create. Therefore,
your return value should be void.

> +{
> +    d->arch.dirty.count = 0;
> +    d->arch.dirty.mode = 0;
> +    spin_lock_init(&d->arch.dirty.lock);
> +
> +    d->arch.dirty.second_lvl_start = 0;
> +    d->arch.dirty.second_lvl_end = 0;
> +    d->arch.dirty.second_lvl[0] = NULL;
> +    d->arch.dirty.second_lvl[1] = NULL;
> +
> +    memset(d->arch.dirty.bitmap, 0, sizeof(d->arch.dirty.bitmap));
> +    d->arch.dirty.bitmap_pages = 0;
> +
> +    return 0;
> +}
> +
> +/* Log dirty tear down */
> +void log_dirty_teardown(struct domain *d)
> +{

I think nothing prevents to destroy a domain with log dirty on.
You should release all resources you've allocated for this domain.
Otherwise, Xen will leak memory.

> +    return;
> +}
> +
>   /*
>    * Local variables:
>    * mode: C
> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> index 603c097..0808cc9 100644
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -6,6 +6,8 @@
>   #include <xen/bitops.h>
>   #include <asm/flushtlb.h>
>   #include <asm/gic.h>
> +#include <xen/guest_access.h>
> +#include <xen/pfn.h>
>   #include <asm/event.h>
>   #include <asm/hardirq.h>
>   #include <asm/page.h>
> @@ -208,6 +210,7 @@ static lpae_t mfn_to_p2m_entry(unsigned long mfn, unsigned int mattr,
>           break;
>
>       case p2m_ram_ro:
> +    case p2m_ram_logdirty:
>           e.p2m.xn = 0;
>           e.p2m.write = 0;
>           break;
> @@ -261,6 +264,10 @@ static int p2m_create_table(struct domain *d,
>
>       pte = mfn_to_p2m_entry(page_to_mfn(page), MATTR_MEM, p2m_invalid);
>
> +    /* mark the write bit (page table's case, ro bit) as 0
> +     * so, it is writable in case of vlpt access */

mark the entry as write-only?

> +    pte.pt.ro = 0;
> +
>       write_pte(entry, pte);
>
>       return 0;
> @@ -696,6 +703,203 @@ unsigned long gmfn_to_mfn(struct domain *d, unsigned long gpfn)
>       return p >> PAGE_SHIFT;
>   }
>
> +/* Change types across all p2m entries in a domain */
> +void p2m_change_entry_type_global(struct domain *d, enum mg nt)
> +{

Can't you reuse apply_p2m_changes? I'm also concerned about preemption.
This function might be very long to run (depending on the size of the memory).

> +    struct p2m_domain *p2m = &d->arch.p2m;
> +    paddr_t ram_base;
> +    int i1, i2, i3;
> +    int first_index, second_index, third_index;
> +    lpae_t *first = __map_domain_page(p2m->first_level);
> +    lpae_t pte, *second = NULL, *third = NULL;
> +
> +    domain_get_ram_range(d, &ram_base, NULL);
> +
> +    first_index = first_table_offset((uint64_t)ram_base);
> +    second_index = second_table_offset((uint64_t)ram_base);
> +    third_index = third_table_offset((uint64_t)ram_base);
> +
> +    BUG_ON(!first);

__map_domain_page always return a valid pointer. This BUG_ON is pointless.

> +    spin_lock(&p2m->lock);
> +
> +    for ( i1 = first_index; i1 < LPAE_ENTRIES*2; ++i1 )
> +    {
> +        lpae_walk_t first_pte = first[i1].walk;
> +        if ( !first_pte.valid || !first_pte.table )
> +            goto out;
> +
> +        second = map_domain_page(first_pte.base);
> +        BUG_ON(!second);

Same remark as BUG_ON(!first).

> +        for ( i2 = second_index; i2 < LPAE_ENTRIES; ++i2 )
> +        {
> +            lpae_walk_t second_pte = second[i2].walk;
> +
> +            if ( !second_pte.valid || !second_pte.table )
> +                goto out;

With Ian's multiple bank support, the RAM region (as returned by domain_get_range)
can contain a hole. Rather than leaving the loop, you should continue.

> +
> +            third = map_domain_page(second_pte.base);
> +            BUG_ON(!third);

Same remark as BUG_ON(!third).

> +
> +            for ( i3 = third_index; i3 < LPAE_ENTRIES; ++i3 )
> +            {
> +                lpae_walk_t third_pte = third[i3].walk;
> +
> +                if ( !third_pte.valid )
> +                    goto out;
> +
> +                pte = third[i3];
> +
> +                if ( nt == mg_ro )

I would use a switch case for nt. It will be clearer and easier to extend.

> +                {
> +                    if ( pte.p2m.write == 1 )

We only want to trap read-write RAM region. Your solution may also trap MMIO, which is completely wrong.

I would replace by if ( pte.p2m.type == p2m_ram_rw ).

> +                    {
> +                        pte.p2m.write = 0;
> +                        pte.p2m.type = p2m_ram_logdirty;
> +                    }
> +                    else
> +                    {
> +                        /* reuse avail bit as an indicator of 'actual'
> +                         * read-only */

Spurious comment?

> +                        pte.p2m.type = p2m_ram_rw;

Why do you unconditionally change the type?

> +                    }
> +                }
> +                else if ( nt == mg_rw )
> +                {
> +                    if ( pte.p2m.write == 0 &&
> +                         pte.p2m.type == p2m_ram_logdirty )

Can you add a comment to say what does it mean?

> +                    {
> +                        pte.p2m.write = p2m_ram_rw;

Wrong field?

> +                    }
> +                }
> +                write_pte(&third[i3], pte);
> +            }
> +            unmap_domain_page(third);
> +
> +            third = NULL;
> +            third_index = 0;
> +        }
> +        unmap_domain_page(second);
> +
> +        second = NULL;
> +        second_index = 0;
> +        third_index = 0;
> +    }
> +
> +out:
> +    flush_tlb_all_local();

You want to flush the P2M on every CPU and only for the current VMID.

I've introduced a function flush_tlb_domain to do the job for you.
I haven't yet send the patch (see [1] at the end of the mail).

> +    if ( third ) unmap_domain_page(third);
> +    if ( second ) unmap_domain_page(second);
> +    if ( first ) unmap_domain_page(first);
> +
> +    spin_unlock(&p2m->lock);
> +}
> +
> +/* Read a domain's log-dirty bitmap and stats. If the operation is a CLEAN,
> + * clear the bitmap and stats. */
> +int log_dirty_op(struct domain *d, xen_domctl_shadow_op_t *sc)
> +{
> +    int peek = 1;
> +    int i;
> +    int bitmap_size;
> +    paddr_t gma_start, gma_end;
> +
> +    /* this hypercall is called from domain 0, and we don't know which guest's

What does prevent to call this hypercall from another domain than 0?

> +     * vlpt is mapped in xen_second, so, to be sure, we restore vlpt here */
> +    log_dirty_restore(d);
> +
> +    domain_get_ram_range(d, &gma_start, &gma_end);
> +    bitmap_size = (gma_end - gma_start) / 8;
> +
> +    if ( guest_handle_is_null(sc->dirty_bitmap) )
> +    {
> +        peek = 0;

Hrrrm... you set peek to but never use it.

> +    }

Bracket are not necessary here.

> +    else
> +    {
> +        spin_lock(&d->arch.dirty.lock);
> +
> +        for ( i = 0; i < d->arch.dirty.bitmap_pages; ++i )

Why not i++?

> +        {
> +            int j = 0;
> +            uint8_t *bitmap;
> +
> +            copy_to_guest_offset(sc->dirty_bitmap, i * PAGE_SIZE,
> +                                 d->arch.dirty.bitmap[i],
> +                                 bitmap_size < PAGE_SIZE ? bitmap_size :
> +                                                           PAGE_SIZE);

Where do you check sc->dirty_bitmap as enough space to store the guest dirty bitmap?

You also forget to update sc->pages.

> +            bitmap_size -= PAGE_SIZE;
> +
> +            /* set p2m page table read-only */
> +            bitmap = d->arch.dirty.bitmap[i];
> +            while ((j = find_next_bit((const long unsigned int *)bitmap,
> +                                      PAGE_SIZE*8, j)) < PAGE_SIZE*8)
> +            {
> +                lpae_t *vlpt;
> +                paddr_t addr = gma_start + (i << (2*PAGE_SHIFT+3)) +
> +                    (j << PAGE_SHIFT);
> +                vlpt = vlpt_get_3lvl_pte(addr);
> +                vlpt->p2m.write = 0;
> +                j++;
> +            }
> +        }
> +
> +        if ( sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN )
> +        {

I suspect this if is in the wrong place. I think, when sc->op is equal to XEN_DOMCTL_SHADOW_OP_CLEAN, the buffer is NULL.

Clean should also clean d->arch.dirty.count ...

> +            for ( i = 0; i < d->arch.dirty.bitmap_pages; ++i )
> +            {
> +                clear_page(d->arch.dirty.bitmap[i]);
> +            }

Bracket are not necessary here.

[..]

> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index df4d375..b652565 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -1556,6 +1556,8 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs,
>       struct hsr_dabt dabt = hsr.dabt;
>       int rc;
>       mmio_info_t info;
> +    int page_fault = ( dabt.write && ((dabt.dfsc & FSC_MASK) ==
> +                                      (FSC_FLT_PERM|FSC_3RD_LEVEL)) );
>
>       if ( !check_conditional_instr(regs, hsr) )
>       {
> @@ -1577,6 +1579,13 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs,
>       if ( rc == -EFAULT )
>           goto bad_data_abort;
>
> +    /* domU page fault handling for guest live migration. Note that

I would remove domU in the comment.

> +     * dabt.valid can be 0 here */
> +    if ( page_fault && handle_page_fault(current->domain, info.gpa) )
> +    {
> +        /* Do not modify PC as guest needs to repeat memory operation */
> +        return;
> +    }
>       /* XXX: Decode the instruction if ISS is not valid */
>       if ( !dabt.valid )
>           goto bad_data_abort;
> diff --git a/xen/include/asm-arm/config.h b/xen/include/asm-arm/config.h
> index ef291ff..f18fae4 100644
> --- a/xen/include/asm-arm/config.h
> +++ b/xen/include/asm-arm/config.h
> @@ -87,6 +87,7 @@
>    *   0  -   8M   <COMMON>
>    *
>    *  32M - 128M   Frametable: 24 bytes per page for 16GB of RAM
> + * 128M - 256M   Virtual-linear mapping to P2M table
>    * 256M -   1G   VMAP: ioremap and early_ioremap use this virtual address
>    *                    space
>    *
> @@ -124,13 +125,15 @@
>   #define CONFIG_SEPARATE_XENHEAP 1
>
>   #define FRAMETABLE_VIRT_START  _AT(vaddr_t,0x02000000)
> -#define VMAP_VIRT_START  _AT(vaddr_t,0x10000000)
> +#define VIRT_LIN_P2M_START     _AT(vaddr_t,0x08000000)
> +#define VMAP_VIRT_START        _AT(vaddr_t,0x10000000)
> +#define VIRT_LIN_P2M_END       VMAP_VIRT_START

Should not it be VMAP_VIRT_START - 1?

I would also directly use _AT(vaddr_t,0x0FFFFFFF) to stay consistent with the other *_END define.

>   #define XENHEAP_VIRT_START     _AT(vaddr_t,0x40000000)
>   #define XENHEAP_VIRT_END       _AT(vaddr_t,0x7fffffff)
>   #define DOMHEAP_VIRT_START     _AT(vaddr_t,0x80000000)
>   #define DOMHEAP_VIRT_END       _AT(vaddr_t,0xffffffff)
>
> -#define VMAP_VIRT_END    XENHEAP_VIRT_START
> +#define VMAP_VIRT_END          XENHEAP_VIRT_START

Spurious changes?

>   #define DOMHEAP_ENTRIES        1024  /* 1024 2MB mapping slots */
>
> @@ -157,6 +160,11 @@
>
>   #define HYPERVISOR_VIRT_END    DIRECTMAP_VIRT_END
>
> +/* Definition for VIRT_LIN_P2M_START and VIRT_LIN_P2M_END (64-bit)
> + * TODO: Needs evaluation. */

Can you update the layout description for ARM64 above?

> +#define VIRT_LIN_P2M_START     _AT(vaddr_t, 0x08000000)
> +#define VIRT_LIN_P2M_END       VMAP_VIRT_START
> +
>   #endif
>
>   /* Fixmap slots */
> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
> index aabeb51..ac82643 100644
> --- a/xen/include/asm-arm/domain.h
> +++ b/xen/include/asm-arm/domain.h
> @@ -162,6 +162,25 @@ struct arch_domain
>       } vuart;
>
>       unsigned int evtchn_irq;
> +
> +    /* dirty page tracing */
> +    struct {
> +        spinlock_t lock;
> +        volatile int mode;               /* 1 if dirty pages tracing enabled */
> +        volatile unsigned int count;     /* dirty pages counter */
> +
> +        /* vlpt context switch */
> +        volatile int second_lvl_start; /* start idx of virt linear space 2nd */
> +        volatile int second_lvl_end;   /* end idx of virt linear space 2nd */

Why this 4 fields must be volatile?

> +        lpae_t *second_lvl[2];         /* copy of guest P2M 1st-lvl content */
> +
> +        /* bitmap to track dirty pages */
> +#define MAX_DIRTY_BITMAP_PAGES 64
> +        /* Because each bit represents a dirty page, the total supported guest
> +         * memory is (64 entries x 4KB/entry x 8bits/byte x 4KB) = 8GB. */
> +        uint8_t *bitmap[MAX_DIRTY_BITMAP_PAGES]; /* dirty bitmap */
> +        int bitmap_pages;                        /* # of dirty bitmap pages */
> +    } dirty;
>   }  __cacheline_aligned;
>
>   struct arch_vcpu
> diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
> index b8d4e7d..ab19025 100644
> --- a/xen/include/asm-arm/mm.h
> +++ b/xen/include/asm-arm/mm.h
> @@ -4,6 +4,7 @@
>   #include <xen/config.h>
>   #include <xen/kernel.h>
>   #include <asm/page.h>
> +#include <asm/config.h>
>   #include <public/xen.h>
>
>   /* Align Xen to a 2 MiB boundary. */
> @@ -320,6 +321,7 @@ int donate_page(
>   #define domain_clamp_alloc_bitsize(d, b) (b)
>
>   unsigned long domain_get_maximum_gpfn(struct domain *d);
> +void domain_get_ram_range(struct domain *d, paddr_t *start, paddr_t *end);
>   extern struct domain *dom_xen, *dom_io, *dom_cow;
>
> @@ -341,6 +343,27 @@ static inline void put_page_and_type(struct page_info *page)
>       put_page(page);
>   }
>
> +enum mg { mg_clear, mg_ro, mg_rw, mg_rx };

Please describe this enum. Also mg is too generic.

> +/************************************/
> +/*    Log-dirty support functions   */
> +/************************************/
> +int log_dirty_on(struct domain *d);
> +void log_dirty_off(struct domain *d);
> +int log_dirty_init(struct domain *d);
> +void log_dirty_teardown(struct domain *d);
> +void log_dirty_restore(struct domain *d);
> +int handle_page_fault(struct domain *d, paddr_t addr);
> +/* access leaf PTE of a given guest address (GPA) */
> +static inline lpae_t * vlpt_get_3lvl_pte(paddr_t addr)
> +{
> +    lpae_t *table = (lpae_t *)VIRT_LIN_P2M_START;
> +
> +    /* Since we slotted the guest's first p2m page table to xen's
> +     * second page table, one shift is enough for calculating the
> +     * index of guest p2m table entry */
> +    return &table[addr >> PAGE_SHIFT];
> +}

These functions should be part of p2m.h, not mm.h

> @@ -41,6 +42,7 @@ typedef enum {
>       p2m_invalid = 0,    /* Nothing mapped here */
>       p2m_ram_rw,         /* Normal read/write guest RAM */
>       p2m_ram_ro,         /* Read-only; writes are silently dropped */
> +    p2m_ram_logdirty,   /* Read-only: special mode for log dirty */

You should at the new type at the end of the enum.

>       p2m_mmio_direct,    /* Read/write mapping of genuine MMIO area */
>       p2m_map_foreign,    /* Ram pages from foreign domain */
>       p2m_grant_map_rw,   /* Read/write grant mapping */
> @@ -49,7 +51,8 @@ typedef enum {
>   } p2m_type_t;

Regards,


[1]

commit cebfd170dc13791df95fbb120c5894f0960e2804
Author: Julien Grall <julien.grall@linaro.org>
Date:   Mon Apr 28 16:34:21 2014 +0100

    xen/arm: Introduce flush_tlb_domain
    
    The pattern p2m_load_VTTBR(d) -> flush_tlb -> p2m_load_VTTBR(current->domain)
    is used in few places.
    
    Replace this usage by flush_tlb_domain which will take care of this pattern.
    This will help to the lisibility of apply_p2m_changes which begin to be big.
    
    Signed-off-by: Julien Grall <julien.grall@linaro.org>

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 603c097..61450cf 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -72,6 +72,21 @@ void p2m_restore_state(struct vcpu *n)
     isb();
 }
 
+void flush_tlb_domain(struct domain *d)
+{
+    /* Update the VTTBR if necessary with the domain d. In this case,
+     * it's only necessary to flush TLBs on every CPUs with the current VMID
+     * (our domain).
+     */
+    if ( d != current->domain )
+        p2m_load_VTTBR(d);
+
+    flush_tlb();
+
+    if ( d != current->domain )
+        p2m_load_VTTBR(current->domain);
+}
+
 static int p2m_first_level_index(paddr_t addr)
 {
     /*
@@ -450,19 +465,7 @@ static int apply_p2m_changes(struct domain *d,
     }
 
     if ( flush )
-    {
-        /* Update the VTTBR if necessary with the domain where mappings
-         * are created. In this case it's only necessary to flush TLBs
-         * on every CPUs with the current VMID (our domain).
-         */
-        if ( d != current->domain )
-            p2m_load_VTTBR(d);
-
-        flush_tlb();
-
-        if ( d != current->domain )
-            p2m_load_VTTBR(current->domain);
-    }
+        flush_tlb_domain(d);
 
     if ( op == ALLOCATE || op == INSERT )
     {
@@ -550,14 +553,10 @@ int p2m_alloc_table(struct domain *d)
     d->arch.vttbr = page_to_maddr(p2m->first_level)
         | ((uint64_t)p2m->vmid&0xff)<<48;
 
-    p2m_load_VTTBR(d);
-
     /* Make sure that all TLBs corresponding to the new VMID are flushed
      * before using it
      */
-    flush_tlb();
-
-    p2m_load_VTTBR(current->domain);
+    flush_tlb_domain(d);
 
     spin_unlock(&p2m->lock);
 
diff --git a/xen/include/asm-arm/flushtlb.h b/xen/include/asm-arm/flushtlb.h
index 329fbb4..5722c67 100644
--- a/xen/include/asm-arm/flushtlb.h
+++ b/xen/include/asm-arm/flushtlb.h
@@ -25,6 +25,9 @@ do {                                                                    \
 /* Flush specified CPUs' TLBs */
 void flush_tlb_mask(const cpumask_t *mask);
 
+/* Flush CPU's TLBs for the speficied domain */
+void flush_tlb_domain(struct domain *d);
+
 #endif /* __ASM_ARM_FLUSHTLB_H__ */
 /*
  * Local variables:


-- 
Julien Grall

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [RFC v3 2/6] xen/arm: Add save/restore support for ARM GIC V2
  2014-05-09 14:24       ` Ian Campbell
@ 2014-05-11 16:15         ` Julien Grall
  0 siblings, 0 replies; 67+ messages in thread
From: Julien Grall @ 2014-05-11 16:15 UTC (permalink / raw)
  To: Ian Campbell, Wei Huang
  Cc: keir, stefano.stabellini, Andrew Cooper, tim, jaeyong.yoo,
	xen-devel, jbeulich, ian.jackson, yjhyun.yoo

Hi,

On 09/05/14 15:24, Ian Campbell wrote:
> On Fri, 2014-05-09 at 09:12 -0500, Wei Huang wrote:
>> On 05/08/2014 05:47 PM, Andrew Cooper wrote:
>>>> +DECLARE_HVM_SAVE_TYPE(GICH_V2, 3, struct hvm_arm_gich_v2);
>>>> +
>>>>    /*
>>>>     * Largest type-code in use
>>>>     */
>>>> -#define HVM_SAVE_CODE_MAX 1
>>>> +#define HVM_SAVE_CODE_MAX 3
>>>>
>>>>    #endif
>>>>
>>>
>>> On x86, we require that HVM save records only contain architectural
>>> state.  Not knowing arm myself, it is not clear from your comments
>>> whether this is the case or not.  Can you confirm whether it is or not?
>> Most states are guest architecture states which include core registers,
>> arch timer, memory. GIC states are arguable, given that Xen uses data
>> structures (e.g. struct vgic_irq_rank) to represent GIC states internally.
>
> (note: I've not looked at this series for ages, I plan to look at this
> new version next week)
>
> The contents of vgic_irq_rank is still a set of architectural register,
> I think (the rank thing is just to account for the fact that some
> registers use 1 bit to describe 32-registers, some use 2 bits to
> describe 16, etc).

Correct, the vgic_irq_rank should be saved entirely. It represents the 
guest view of the GIC state (such as the priorities, the routing... of 
an IRQ).

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 3/6] xen/arm: Add save/restore support for ARM arch timer
  2014-05-11  8:58   ` Julien Grall
@ 2014-05-12  8:35     ` Ian Campbell
  2014-05-12 11:42       ` Julien Grall
  0 siblings, 1 reply; 67+ messages in thread
From: Ian Campbell @ 2014-05-12  8:35 UTC (permalink / raw)
  To: Julien Grall
  Cc: Wei Huang, keir, stefano.stabellini, andrew.cooper3, tim,
	jaeyong.yoo, xen-devel, jbeulich, ian.jackson, yjhyun.yoo

On Sun, 2014-05-11 at 09:58 +0100, Julien Grall wrote:
> Hi Wei,
> 
> Thank you for the patch.
> 
> On 08/05/14 22:18, Wei Huang wrote:
> > +    switch ( ctxt.type )
> > +    {
> > +    case ARM_TIMER_TYPE_PHYS:
> > +        t = &v->arch.phys_timer;
> > +        d->arch.phys_timer_base.offset = ctxt.vtb_offset;
> > +        break;
> > +    case ARM_TIMER_TYPE_VIRT:
> > +        t = &v->arch.virt_timer;
> > +        d->arch.virt_timer_base.offset = ctxt.vtb_offset;
> 
> It seems you forgot to address some of my comments on the timer 
> save/restore.
> 
> Saving {virt,phys}_timer_base.offset per VCPU rather than per-domain 
> seems a waste of space and confusing.

Is it 100% inconceivable that two vcpus might some day have different
timers?

> Furthermore, this offset is used to get a relative offset from the Xen 
> timer counter. Migrating the guess to another server will invalidate the 
> value.

The correct architectural state to migrate is the time stamp as the
guest sees it, i.e. with the offset already applied. The receiving end
would then need to recalculate the correct offset based on its local
timer count.

Ian.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 3/6] xen/arm: Add save/restore support for ARM arch timer
  2014-05-12  8:35     ` Ian Campbell
@ 2014-05-12 11:42       ` Julien Grall
  0 siblings, 0 replies; 67+ messages in thread
From: Julien Grall @ 2014-05-12 11:42 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Wei Huang, keir, stefano.stabellini, andrew.cooper3, tim,
	jaeyong.yoo, xen-devel, jbeulich, ian.jackson, yjhyun.yoo

On 05/12/2014 09:35 AM, Ian Campbell wrote:
> On Sun, 2014-05-11 at 09:58 +0100, Julien Grall wrote:
>> Hi Wei,
>>
>> Thank you for the patch.
>>
>> On 08/05/14 22:18, Wei Huang wrote:
>>> +    switch ( ctxt.type )
>>> +    {
>>> +    case ARM_TIMER_TYPE_PHYS:
>>> +        t = &v->arch.phys_timer;
>>> +        d->arch.phys_timer_base.offset = ctxt.vtb_offset;
>>> +        break;
>>> +    case ARM_TIMER_TYPE_VIRT:
>>> +        t = &v->arch.virt_timer;
>>> +        d->arch.virt_timer_base.offset = ctxt.vtb_offset;
>>
>> It seems you forgot to address some of my comments on the timer 
>> save/restore.
>>
>> Saving {virt,phys}_timer_base.offset per VCPU rather than per-domain 
>> seems a waste of space and confusing.
> 
> Is it 100% inconceivable that two vcpus might some day have different
> timers?

By timers, did you mean did different boot offset?

AFAIU, the timer counter (CNTP) is common to every {p,v{CPUs and
contains the number of ticks from the start time.

If some days we want to have two vcpus with different offfset, then we
can extend the format.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 5/6] xen/arm: Add log_dirty support for ARM
  2014-05-11 15:28   ` Julien Grall
@ 2014-05-12 14:00     ` Wei Huang
  2014-05-12 14:11       ` Julien Grall
  2014-05-14 11:57     ` Ian Campbell
  1 sibling, 1 reply; 67+ messages in thread
From: Wei Huang @ 2014-05-12 14:00 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, andrew.cooper3, tim,
	jaeyong.yoo, jbeulich, ian.jackson, yjhyun.yoo

On 05/11/2014 10:28 AM, Julien Grall wrote:
>   the guest very soon. With this
> solution there is a 1GB hole between the 2 banks. Your function will therefore
> stop to work.
>
> Furthermore, Xen should not assume that the layout of the guest will always start
> at GUEST_RAM_BASE.
>
> I think you can use max_mapped_pfn and lowest_mapped_pfn here. You may need to
> modify a bit the signification of it in the p2m code or introduce a new field.
These two values don't work for the purpose of dirty tracking. What we 
need are two fields to track _physical_ memory. lowest_mapped_pfn tracks 
all memory types, which doesn't apply here.

Will new fields to track physical ram space acceptable for you?

-Wei

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 5/6] xen/arm: Add log_dirty support for ARM
  2014-05-12 14:00     ` Wei Huang
@ 2014-05-12 14:11       ` Julien Grall
  2014-05-14 12:04         ` Ian Campbell
  0 siblings, 1 reply; 67+ messages in thread
From: Julien Grall @ 2014-05-12 14:11 UTC (permalink / raw)
  To: Wei Huang, xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, andrew.cooper3, tim,
	jaeyong.yoo, jbeulich, ian.jackson, yjhyun.yoo

On 05/12/2014 03:00 PM, Wei Huang wrote:
> On 05/11/2014 10:28 AM, Julien Grall wrote:
>>   the guest very soon. With this
>> solution there is a 1GB hole between the 2 banks. Your function will
>> therefore
>> stop to work.
>>
>> Furthermore, Xen should not assume that the layout of the guest will
>> always start
>> at GUEST_RAM_BASE.
>>
>> I think you can use max_mapped_pfn and lowest_mapped_pfn here. You may
>> need to
>> modify a bit the signification of it in the p2m code or introduce a
>> new field.
> These two values don't work for the purpose of dirty tracking. What we
> need are two fields to track _physical_ memory. lowest_mapped_pfn tracks
> all memory types, which doesn't apply here.
> 
> Will new fields to track physical ram space acceptable for you?

I'm fine with new fields to track physical RAM space. Don't forget it
may have non-RAM hole in this range.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 0/6] xen/arm: ARM save/restore/migration support
  2014-05-08 21:18 [RFC v3 0/6] xen/arm: ARM save/restore/migration support Wei Huang
                   ` (6 preceding siblings ...)
  2014-05-11  9:23 ` [RFC v3 0/6] xen/arm: ARM save/restore/migration support Julien Grall
@ 2014-05-12 14:17 ` Julien Grall
  2014-05-12 14:52   ` Wei Huang
  7 siblings, 1 reply; 67+ messages in thread
From: Julien Grall @ 2014-05-12 14:17 UTC (permalink / raw)
  To: Wei Huang, xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, andrew.cooper3, tim,
	jaeyong.yoo, jbeulich, ian.jackson, yjhyun.yoo

Hi Wei,

On 05/08/2014 10:18 PM, Wei Huang wrote:
> The following patches enable save/restore/migration support for ARM
> guest VMs. Note that the original series were sent from Jaeyong Yoo.
> 
> Working:
>    * 32-bit (including SMP) guest VM save/restore/migration
>    * 64-bit guest VM save
> WIP:
>    * 64-bit guest restore/migration 

I though a bit more about this series. I think we miss to save the
layout of the guest (i.e GIC base address, gnttab base address,...) and
some interrupts (i.e timer, evtchn,...)

It's necessary because if we migrate from a older Xen to a new Xen, the
default layout may have change.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 0/6] xen/arm: ARM save/restore/migration support
  2014-05-11  9:23 ` [RFC v3 0/6] xen/arm: ARM save/restore/migration support Julien Grall
@ 2014-05-12 14:37   ` Wei Huang
  2014-05-13 14:41     ` Julien Grall
  0 siblings, 1 reply; 67+ messages in thread
From: Wei Huang @ 2014-05-12 14:37 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, andrew.cooper3, tim,
	jaeyong.yoo, jbeulich, ian.jackson, yjhyun.yoo

On 05/11/2014 04:23 AM, Julien Grall wrote:
> Hi Wei,
>
> On 08/05/14 22:18, Wei Huang wrote:
>> The following patches enable save/restore/migration support for ARM
>> guest VMs. Note that the original series were sent from Jaeyong Yoo.
>>
>> Working:
>>     * 32-bit (including SMP) guest VM save/restore/migration
>>     * 64-bit guest VM save
>> WIP:
>>     * 64-bit guest restore/migration
>
> I saw a couple of comments about the memory restriction (e.g 8Go from
> the bitmap...). What are those restriction? Can you make clear on error
> message for the user?
>
OK, I will add more informative error message to the failure. Do you 
want me to add the details in cover page too?
> Regards,
>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 0/6] xen/arm: ARM save/restore/migration support
  2014-05-12 14:17 ` Julien Grall
@ 2014-05-12 14:52   ` Wei Huang
  2014-05-12 15:01     ` Ian Campbell
  0 siblings, 1 reply; 67+ messages in thread
From: Wei Huang @ 2014-05-12 14:52 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, andrew.cooper3, tim,
	jaeyong.yoo, jbeulich, ian.jackson, yjhyun.yoo

On 05/12/2014 09:17 AM, Julien Grall wrote:
> Hi Wei,
>
> On 05/08/2014 10:18 PM, Wei Huang wrote:
>> The following patches enable save/restore/migration support for ARM
>> guest VMs. Note that the original series were sent from Jaeyong Yoo.
>>
>> Working:
>>     * 32-bit (including SMP) guest VM save/restore/migration
>>     * 64-bit guest VM save
>> WIP:
>>     * 64-bit guest restore/migration
>
> I though a bit more about this series. I think we miss to save the
> layout of the guest (i.e GIC base address, gnttab base address,...) and
> some interrupts (i.e timer, evtchn,...)
>
> It's necessary because if we migrate from a older Xen to a new Xen, the
> default layout may have change.
>
Should we make these requirements as phase 2 for save/restore/migration? 
I think we have enough items to fix in current form. Adding more 
requirements will make the whole patch set bloated and hard to revise.

-Wei

> Regards,
>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 0/6] xen/arm: ARM save/restore/migration support
  2014-05-12 14:52   ` Wei Huang
@ 2014-05-12 15:01     ` Ian Campbell
  0 siblings, 0 replies; 67+ messages in thread
From: Ian Campbell @ 2014-05-12 15:01 UTC (permalink / raw)
  To: Wei Huang
  Cc: keir, stefano.stabellini, andrew.cooper3, Julien Grall, tim,
	jaeyong.yoo, xen-devel, jbeulich, ian.jackson, yjhyun.yoo

On Mon, 2014-05-12 at 09:52 -0500, Wei Huang wrote:
> On 05/12/2014 09:17 AM, Julien Grall wrote:
> > Hi Wei,
> >
> > On 05/08/2014 10:18 PM, Wei Huang wrote:
> >> The following patches enable save/restore/migration support for ARM
> >> guest VMs. Note that the original series were sent from Jaeyong Yoo.
> >>
> >> Working:
> >>     * 32-bit (including SMP) guest VM save/restore/migration
> >>     * 64-bit guest VM save
> >> WIP:
> >>     * 64-bit guest restore/migration
> >
> > I though a bit more about this series. I think we miss to save the
> > layout of the guest (i.e GIC base address, gnttab base address,...) and
> > some interrupts (i.e timer, evtchn,...)
> >
> > It's necessary because if we migrate from a older Xen to a new Xen, the
> > default layout may have change.
> >
> Should we make these requirements as phase 2 for save/restore/migration? 
> I think we have enough items to fix in current form. Adding more 
> requirements will make the whole patch set bloated and hard to revise.

It would probably be reasonably easy to add the save bit and to just
assert on restore that the address is the one we support.

But equally it would be possible to default to the current value on
restore if no contrary save record is found, meaning we can fix it then.

Ian.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 0/6] xen/arm: ARM save/restore/migration support
  2014-05-12 14:37   ` Wei Huang
@ 2014-05-13 14:41     ` Julien Grall
  0 siblings, 0 replies; 67+ messages in thread
From: Julien Grall @ 2014-05-13 14:41 UTC (permalink / raw)
  To: Wei Huang, xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, andrew.cooper3, tim,
	jaeyong.yoo, jbeulich, ian.jackson, yjhyun.yoo

On 05/12/2014 03:37 PM, Wei Huang wrote:
> OK, I will add more informative error message to the failure. Do you
> want me to add the details in cover page too?

Yes please.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 2/6] xen/arm: Add save/restore support for ARM GIC V2
  2014-05-08 22:47   ` Andrew Cooper
  2014-05-09 14:12     ` Wei Huang
@ 2014-05-13 14:53     ` Wei Huang
  1 sibling, 0 replies; 67+ messages in thread
From: Wei Huang @ 2014-05-13 14:53 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, ian.jackson,
	julien.grall, tim, jaeyong.yoo, jbeulich, yjhyun.yoo

>> +
>> +/* Info for hypervisor to manage guests (per-vcpu)
>> + *   - Based on GICv2
>> + *   - Mainly store registers of GICH_*
>> + */
>> +struct hvm_arm_gich_v2
>> +{
>> +    uint32_t gic_hcr;
>> +    uint32_t gic_vmcr;
>> +    uint32_t gic_apr;
>> +    uint32_t gic_lr[64];
>> +    uint64_t event_mask;
>> +    uint64_t lr_mask;
>
> This has an odd number of uint32_t.  I suspect it will end up with a
> different structure size between a 32 and 64 bit build of Xen.
>
I will add a padding field to make all structures 64-bit aligned. Let me 
know if this isn't what you want.

-Wei

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 1/6] xen/arm: Add basic save/restore support for ARM
  2014-05-08 22:11   ` Andrew Cooper
  2014-05-08 22:20     ` Wei Huang
@ 2014-05-14 10:25     ` Ian Campbell
  2014-05-14 10:46       ` Andrew Cooper
  1 sibling, 1 reply; 67+ messages in thread
From: Ian Campbell @ 2014-05-14 10:25 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Huang, keir, stefano.stabellini, ian.jackson, julien.grall,
	tim, jaeyong.yoo, xen-devel, jbeulich, yjhyun.yoo

On Thu, 2014-05-08 at 23:11 +0100, Andrew Cooper wrote:
> > diff --git a/xen/include/public/arch-arm/hvm/save.h b/xen/include/public/arch-arm/hvm/save.h
> > index 75b8e65..8312e7b 100644
> > --- a/xen/include/public/arch-arm/hvm/save.h
> > +++ b/xen/include/public/arch-arm/hvm/save.h
> > @@ -3,6 +3,7 @@
> >   * be saved along with the domain's memory and device-model state.
> >   *
> >   * Copyright (c) 2012 Citrix Systems Ltd.
> > + * Copyright (c) 2014 Samsung Electronics.
> >   *
> >   * Permission is hereby granted, free of charge, to any person obtaining a copy
> >   * of this software and associated documentation files (the "Software"), to
> > @@ -26,6 +27,24 @@
> >  #ifndef __XEN_PUBLIC_HVM_SAVE_ARM_H__
> >  #define __XEN_PUBLIC_HVM_SAVE_ARM_H__
> >  
> > +#define HVM_ARM_FILE_MAGIC   0x92385520
> > +#define HVM_ARM_FILE_VERSION 0x00000001
> > +
> > +/* Note: For compilation purpose hvm_save_header name is the same as x86,
> > + * but layout is different. */
> > +struct hvm_save_header
> > +{
> > +    uint32_t magic;             /* Must be HVM_ARM_FILE_MAGIC */
> > +    uint32_t version;           /* File format version */
> > +    uint32_t cpuinfo;           /* Record MIDR_EL1 info of saving machine */
> > +};
> > +DECLARE_HVM_SAVE_TYPE(HEADER, 1, struct hvm_save_header);
> > +
> > +/*
> > + * Largest type-code in use
> > + */
> > +#define HVM_SAVE_CODE_MAX 1
> > +
> >  #endif
> >  
> >  /*
> 
> Hmm - it is quite poor to have this magically named "hvm_save_header".

We frequently have arch interfaces where generic code requires arch code
to provide particular structs or functions etc. What is poor about this
particular instance of that pattern?

Ian.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 1/6] xen/arm: Add basic save/restore support for ARM
  2014-05-09  8:56       ` Julien Grall
@ 2014-05-14 10:27         ` Ian Campbell
  0 siblings, 0 replies; 67+ messages in thread
From: Ian Campbell @ 2014-05-14 10:27 UTC (permalink / raw)
  To: Julien Grall
  Cc: Wei Huang, tim, keir, stefano.stabellini, Andrew Cooper,
	ian.jackson, jaeyong.yoo, xen-devel, jbeulich, yjhyun.yoo

On Fri, 2014-05-09 at 09:56 +0100, Julien Grall wrote:
> >> This header file isn't touched by any subsequent patches, and just
> >> having it as a list of includes is overkill.  Can it be dropped?  5
> >> extra includes in a few .c files is hardly breaking the bank.
> > Last time I tried it quickly, the compilation broke. Will try again.
> 
> This header is used in common code (xen/common/hvm/save.c). It might be 
> better on have a dummy header.

It's used exactly twice in common code, xen/common/hvm/save.c and
xen/drivers/passthrough/io.c. I'm not sure which functionality from this
header these guys are using but it can't be much (or the arm version
would be bigger).

I think it's probably better to just eliminate this asm include from
common code.

Ian.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 1/6] xen/arm: Add basic save/restore support for ARM
  2014-05-08 21:18 ` [RFC v3 1/6] xen/arm: Add basic save/restore support for ARM Wei Huang
                     ` (2 preceding siblings ...)
  2014-05-09  9:42   ` Jan Beulich
@ 2014-05-14 10:37   ` Ian Campbell
  2014-05-14 18:54     ` Wei Huang
  3 siblings, 1 reply; 67+ messages in thread
From: Ian Campbell @ 2014-05-14 10:37 UTC (permalink / raw)
  To: Wei Huang
  Cc: keir, stefano.stabellini, andrew.cooper3, julien.grall, tim,
	jaeyong.yoo, xen-devel, jbeulich, ian.jackson, yjhyun.yoo

On Thu, 2014-05-08 at 16:18 -0500, Wei Huang wrote:
> This patch implements a basic framework for ARM guest
> save/restore. It defines a HVM save header for ARM guests
> and correponding arch_ save/load functions. These functions

"corresponding"

> are hooked up with domain control hypercalls (gethvmcontext
> and sethvmcontext). The hypercalls become a common code path to
> both x86 and ARM. As a result of merging, the x86 specific
> header saving code is moved to x86 sub-directory.
> 
> Signed-off-by: Evgeny Fedotov <e.fedotov@samsung.com>
> Signed-off-by: Wei Huang <w1.huang@samsung.com>

Other than the comments already made by others this is looking good to
me.

> [...] 
> +#define HVM_ARM_FILE_MAGIC   0x92385520

OOI where did that number come from? (often these are a few ASCII
characters etc, I'm just curious)

> +#define HVM_ARM_FILE_VERSION 0x00000001
> +
> +/* Note: For compilation purpose hvm_save_header name is the same as x86,
> + * but layout is different. */
> +struct hvm_save_header
> +{
> +    uint32_t magic;             /* Must be HVM_ARM_FILE_MAGIC */
> +    uint32_t version;           /* File format version */
> +    uint32_t cpuinfo;           /* Record MIDR_EL1 info of saving machine */

Is the size of this struct the same for arm32 and arm64?

Ian.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 1/6] xen/arm: Add basic save/restore support for ARM
  2014-05-14 10:25     ` Ian Campbell
@ 2014-05-14 10:46       ` Andrew Cooper
  2014-05-14 13:22         ` Ian Campbell
  0 siblings, 1 reply; 67+ messages in thread
From: Andrew Cooper @ 2014-05-14 10:46 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Wei Huang, keir, stefano.stabellini, ian.jackson, julien.grall,
	tim, jaeyong.yoo, xen-devel, jbeulich, yjhyun.yoo

On 14/05/14 11:25, Ian Campbell wrote:
> On Thu, 2014-05-08 at 23:11 +0100, Andrew Cooper wrote:
>>> diff --git a/xen/include/public/arch-arm/hvm/save.h b/xen/include/public/arch-arm/hvm/save.h
>>> index 75b8e65..8312e7b 100644
>>> --- a/xen/include/public/arch-arm/hvm/save.h
>>> +++ b/xen/include/public/arch-arm/hvm/save.h
>>> @@ -3,6 +3,7 @@
>>>   * be saved along with the domain's memory and device-model state.
>>>   *
>>>   * Copyright (c) 2012 Citrix Systems Ltd.
>>> + * Copyright (c) 2014 Samsung Electronics.
>>>   *
>>>   * Permission is hereby granted, free of charge, to any person obtaining a copy
>>>   * of this software and associated documentation files (the "Software"), to
>>> @@ -26,6 +27,24 @@
>>>  #ifndef __XEN_PUBLIC_HVM_SAVE_ARM_H__
>>>  #define __XEN_PUBLIC_HVM_SAVE_ARM_H__
>>>  
>>> +#define HVM_ARM_FILE_MAGIC   0x92385520
>>> +#define HVM_ARM_FILE_VERSION 0x00000001
>>> +
>>> +/* Note: For compilation purpose hvm_save_header name is the same as x86,
>>> + * but layout is different. */
>>> +struct hvm_save_header
>>> +{
>>> +    uint32_t magic;             /* Must be HVM_ARM_FILE_MAGIC */
>>> +    uint32_t version;           /* File format version */
>>> +    uint32_t cpuinfo;           /* Record MIDR_EL1 info of saving machine */
>>> +};
>>> +DECLARE_HVM_SAVE_TYPE(HEADER, 1, struct hvm_save_header);
>>> +
>>> +/*
>>> + * Largest type-code in use
>>> + */
>>> +#define HVM_SAVE_CODE_MAX 1
>>> +
>>>  #endif
>>>  
>>>  /*
>> Hmm - it is quite poor to have this magically named "hvm_save_header".
> We frequently have arch interfaces where generic code requires arch code
> to provide particular structs or functions etc. What is poor about this
> particular instance of that pattern?
>
> Ian.
>

Save/restore is currently asymmetric in this regard.  The save side
treats this x86 structure as common, whereas load is entirely arch specific.

Fixing the asymmetry sensibly involves pushing the save side into arch code.

~Andrew

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 2/6] xen/arm: Add save/restore support for ARM GIC V2
  2014-05-08 21:18 ` [RFC v3 2/6] xen/arm: Add save/restore support for ARM GIC V2 Wei Huang
  2014-05-08 22:47   ` Andrew Cooper
  2014-05-09  9:17   ` Julien Grall
@ 2014-05-14 11:07   ` Ian Campbell
  2014-05-14 12:05     ` Julien Grall
  2014-05-15 17:15   ` Julien Grall
  3 siblings, 1 reply; 67+ messages in thread
From: Ian Campbell @ 2014-05-14 11:07 UTC (permalink / raw)
  To: Wei Huang
  Cc: keir, stefano.stabellini, andrew.cooper3, julien.grall, tim,
	jaeyong.yoo, xen-devel, jbeulich, ian.jackson, yjhyun.yoo

On Thu, 2014-05-08 at 16:18 -0500, Wei Huang wrote:
> diff --git a/xen/include/public/arch-arm/hvm/save.h b/xen/include/public/arch-arm/hvm/save.h
> index 8312e7b..421a6f6 100644
> --- a/xen/include/public/arch-arm/hvm/save.h
> +++ b/xen/include/public/arch-arm/hvm/save.h
> @@ -40,10 +40,42 @@ struct hvm_save_header
>  };
>  DECLARE_HVM_SAVE_TYPE(HEADER, 1, struct hvm_save_header);
>  
> +/* Guest's view of GIC distributor (per-vcpu)
> + *   - Based on GICv2 (see "struct vgic_irq_rank")
> + *   - Store guest's view of GIC distributor
> + *   - Only support SGI and PPI for DomU (DomU doesn't handle SPI)
> + */
> +struct hvm_arm_vgicd_v2
> +{
> +    uint32_t ienable;
> +    uint32_t iactive;
> +    uint32_t ipend;
> +    uint32_t pendsgi;
> +    uint32_t icfg[2];
> +    uint32_t ipriority[8];
> +    uint32_t itargets[8];
> +};
> +DECLARE_HVM_SAVE_TYPE(VGICD_V2, 2, struct hvm_arm_vgicd_v2);

This is the state of 32 interrupts. How do you propose to handle more
interrupts than that?

I think it would be sensible to split the domain global state, the
distributor and cpu interface base addresses and sizes and the states of
any SPIs in here and have a separate per-vpcu set of state for the
per-cpu GICD state (SPIs and PPIs mainly).

For the SPI I think you either want to put the above set of state into
an array of size NR_GUEST_INTERRUPTS/32 or better make each of the above
an array based on NR_GUEST_INTERRUPTS.

> +
> +/* Info for hypervisor to manage guests (per-vcpu)
> + *   - Based on GICv2
> + *   - Mainly store registers of GICH_*
> + */
> +struct hvm_arm_gich_v2
> +{
> +    uint32_t gic_hcr;
> +    uint32_t gic_vmcr;
> +    uint32_t gic_apr;
> +    uint32_t gic_lr[64];
> +    uint64_t event_mask;
> +    uint64_t lr_mask;

I don't think you should be saving any GICH state at all. What should be
saved is the corresponding GICC state, i.e. "architectural state" that
is observed by the guest. This might mean pickling stuff from the GICH
state into a GICC form. (I said this wrt the LRs in a previous round of
review)

Ian.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 3/6] xen/arm: Add save/restore support for ARM arch timer
  2014-05-08 21:18 ` [RFC v3 3/6] xen/arm: Add save/restore support for ARM arch timer Wei Huang
  2014-05-08 23:02   ` Andrew Cooper
  2014-05-11  8:58   ` Julien Grall
@ 2014-05-14 11:14   ` Ian Campbell
  2014-05-14 12:13     ` Julien Grall
  2014-05-14 19:04     ` Wei Huang
  2 siblings, 2 replies; 67+ messages in thread
From: Ian Campbell @ 2014-05-14 11:14 UTC (permalink / raw)
  To: Wei Huang
  Cc: keir, stefano.stabellini, andrew.cooper3, julien.grall, tim,
	jaeyong.yoo, xen-devel, jbeulich, ian.jackson, yjhyun.yoo

On Thu, 2014-05-08 at 16:18 -0500, Wei Huang wrote:
> This patch implements a save/resore support for ARM architecture

"restore"

> diff --git a/xen/include/public/arch-arm/hvm/save.h b/xen/include/public/arch-arm/hvm/save.h
> index 421a6f6..8679bfd 100644
> --- a/xen/include/public/arch-arm/hvm/save.h
> +++ b/xen/include/public/arch-arm/hvm/save.h
> @@ -72,10 +72,24 @@ struct hvm_arm_gich_v2
>  };
>  DECLARE_HVM_SAVE_TYPE(GICH_V2, 3, struct hvm_arm_gich_v2);
>  
> +/* Two ARM timers (physical and virtual) are saved */

Do you not need to save CNTFRQ and CNTKCTL?

> +#define ARM_TIMER_TYPE_VIRT  0
> +#define ARM_TIMER_TYPE_PHYS  1
> +#define ARM_TIMER_TYPE_COUNT 2       /* total count */
> +
> +struct hvm_arm_timer
> +{
> +    uint64_t vtb_offset;

As discussed elsewhere I don't think the offset is architectural state.
This should be incorporated into the cval. Otherwise how does the
receiver know what this is an offset from?

> +    uint32_t ctl;
> +    uint64_t cval;
> +    uint32_t type;
> +};
> +DECLARE_HVM_SAVE_TYPE(TIMER, 4, struct hvm_arm_timer);

Why not do
DECLARE_HVM_SAVE_TYPE(VTIMER, 4, struct hvm_arm_timer)
DECLARE_HVM_SAVE_TYPE(PTIMER, 5, struct hvm_arm_timer)
and drop the type field?

Or else define hvm_arm_timers with cntfeq, cntkctl and two sets of the
ctl,cval in a single struct.

Ian.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 4/6] xen/arm: Add save/restore support for guest core registers
  2014-05-11  9:06   ` Julien Grall
@ 2014-05-14 11:16     ` Ian Campbell
  2014-05-14 12:23       ` Julien Grall
  0 siblings, 1 reply; 67+ messages in thread
From: Ian Campbell @ 2014-05-14 11:16 UTC (permalink / raw)
  To: Julien Grall
  Cc: Wei Huang, keir, stefano.stabellini, andrew.cooper3, tim,
	jaeyong.yoo, xen-devel, jbeulich, ian.jackson, yjhyun.yoo

On Sun, 2014-05-11 at 10:06 +0100, Julien Grall wrote:
> Hi Wei,
> 
> Thank you for the patch.
> 
> On 08/05/14 22:18, Wei Huang wrote:
> > This patch implements a save/resore support for ARM guest core
> > registers.
> 
> 
> The commit 893256f "xen/arm: Correctly save/restore CNTKCTL_EL1" 
> save/restore a new register during the context switch.
> 
> I think you forgot to add it in this patch.

I think it would belong in the previous patch with the timer state.

Ian.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 4/6] xen/arm: Add save/restore support for guest core registers
  2014-05-08 21:18 ` [RFC v3 4/6] xen/arm: Add save/restore support for guest core registers Wei Huang
  2014-05-08 23:10   ` Andrew Cooper
  2014-05-11  9:06   ` Julien Grall
@ 2014-05-14 11:37   ` Ian Campbell
  2 siblings, 0 replies; 67+ messages in thread
From: Ian Campbell @ 2014-05-14 11:37 UTC (permalink / raw)
  To: Wei Huang
  Cc: keir, stefano.stabellini, andrew.cooper3, julien.grall, tim,
	jaeyong.yoo, xen-devel, jbeulich, ian.jackson, yjhyun.yoo

On Thu, 2014-05-08 at 16:18 -0500, Wei Huang wrote:
> This patch implements a save/resore support for ARM guest core
> registers.
> 
> Signed-off-by: Evgeny Fedotov <e.fedotov@samsung.com>
> Signed-off-by: Wei Huang <w1.huang@samsung.com>
> ---
>  xen/arch/arm/hvm.c                     |  263 +++++++++++++++++++++++++++++++-
>  xen/include/public/arch-arm/hvm/save.h |  121 ++++++++++++++-
>  2 files changed, 382 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/include/public/arch-arm/hvm/save.h b/xen/include/public/arch-arm/hvm/save.h
> index 8679bfd..18e5899 100644
> --- a/xen/include/public/arch-arm/hvm/save.h
> +++ b/xen/include/public/arch-arm/hvm/save.h
[...]

> +    /* return address (EL1 ==> EL0) */
> +    uint64_t elr_el1;    /* elr_el1, NA */
> +    /* return address (EL2 ==> EL1) */
> +    uint64_t pc64;       /* elr_el2, elr_el2 */

The guest name of this register would be pc or r15, not elr_el2 which is
the hyp mode exception return address. (I think the _el2 suffix should
never appear in these structs)

> +    /* spsr registers */
> +    uint32_t spsr_el1;   /* spsr_el1, spsr_svc */
> +    uint32_t spsr_fiq;   /* NA, spsr_fiq */
> +    uint32_t spsr_irq;   /* NA, spsr_irq */
> +    uint32_t spsr_und;   /* NA, spsr_und */
> +    uint32_t spsr_abt;   /* NA, spsr_abt */

I think in general you can replace /* NA, .* */ with "32-bit only"
and /* .*, NA */ with "64-bit only". and omit the comment in cases where
everything matches. e.g.
	uint64_t vbar /* vbar, vbar */

> +
> +    /* stack pointers */
> +    uint64_t sp_el0;     /* sp_el0, NA */
> +    uint64_t sp_el1;     /* sp_el1, NA */
> +
> +    /* guest mode */
> +    uint32_t cpsr;   /* spsr_el2, spsr_el2 */

The guest register is just cpsr I think, spsr_el2 is a hypervisor mode
ism.

> +    /* ======= Guest System Registers =======
> +     *   - multiplexed for AArch32 and AArch64 guests
> +     *   - 64-bit preferred if needed (for 64-bit guests)
> +     *   - architecture specific registers are noted specifically
> +     */
> +    /* exception */
> +    uint64_t vbar;      /* vbar, vbar */

There's a bit of a mixture of v7 and v8 naming here. Can we try to
consistently use the v8 naming (foo_el1/el0) for anything which is v8
only or common and only use v7 naming for 32-bit only registers.

Or maybe we should be trying to use the "General names" as defined in
section J or the ARMv8 ARM? Whatever we do it should be consistent.

(I know Xen internally is a bit confused about this, but please lets try
not to leak that into the public API).

> +    /* mmu related */
> +    uint64_t ttbcr;     /* ttbcr, ttbcr */
> +    uint64_t ttbr0;     /* ttbr0, ttbr0 */
> +    uint64_t ttbr1;     /* ttbr1, ttbr1 */
> +    uint32_t dacr;      /* NA, dacr32 */
> +
> +    uint64_t par;       /* par, par */
> +    uint64_t mair0;     /* mair, mair0 */
> +    uint64_t mair1;     /* NA, mair1 */

mair0 and mair1 are only 32-bit. But actually I think the mapping is
MAIR0==MAIR_EL1[31:0] and MAIR1==MAIR_EL1[63:32], so you should use
that.

> +    /* fault status */
> +    uint32_t ifar;      /* ifar, ifar */
> +    uint32_t ifsr;      /* ifsr, ifsr */
> +    uint32_t dfar;      /* dfar, dfar */
> +    uint32_t dfsr;      /* dfsr, dfsr */
> +
> +    uint64_t far;       /* far, far */
> +    uint64_t esr;       /* esr, esr */

I don't think there is a 32-bit ESR, since 32-bit has all the banked
register CPSR.MODE stuff instead.

I think the 32-bit ifar/dfar ifsr/dfsr have a mapping onto the 64-bit
far etc. Table D1-81 in the ARMv8 ARM spells all that out.

> +
> +    uint32_t afsr0;     /* afsr0, afsr0 */
> +    uint32_t afsr1;     /* afsr1, afsr1 */
> +
> +    /* thumbee and jazelle */
> +    uint32_t teecr;     /* NA, teecr */
> +    uint32_t teehbr;    /* NA, teehbr */
> +
> +    uint32_t joscr;     /* NA, joscr */
> +    uint32_t jmcr;      /* NA, jmcr */
> +
> +    /* control registers */
> +    uint32_t sctlr;     /* sctlr, sctlr */
> +    uint32_t actlr;     /* actlr, actlr */
> +    uint32_t cpacr;     /* cpacr, cpacr */
> +
> +    uint32_t csselr;    /* csselr, csselr */
> +
> +    /* software management related */
> +    uint32_t contextidr;  /* contextidr, contextidr */
> +    uint64_t tpidr_el0;   /* tpidr_el0, tpidr_el0 */
> +    uint64_t tpidr_el1;   /* tpidr_el1, tpidr_el1 */
> +    uint64_t tpidrro_el0; /* tpidrro_el0, tdidrro_el0 */
> +};
> +DECLARE_HVM_SAVE_TYPE(VCPU, 5, struct hvm_arm_cpu);
> +
>  /*
>   * Largest type-code in use
>   */
> -#define HVM_SAVE_CODE_MAX 4
> +#define HVM_SAVE_CODE_MAX 5
>  
>  #endif
>  

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 5/6] xen/arm: Add log_dirty support for ARM
  2014-05-08 23:46   ` Andrew Cooper
@ 2014-05-14 11:51     ` Ian Campbell
  0 siblings, 0 replies; 67+ messages in thread
From: Ian Campbell @ 2014-05-14 11:51 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Huang, keir, stefano.stabellini, ian.jackson, julien.grall,
	tim, jaeyong.yoo, xen-devel, jbeulich, yjhyun.yoo

On Fri, 2014-05-09 at 00:46 +0100, Andrew Cooper wrote:
> > +/* Allocate dirty bitmap resource */
> > +static int bitmap_init(struct domain *d)
> 
> This function name is far too generic.

It might be OK if it were in p2m.c rather than mm.c, or even better
would be to create logdirty.c.

> > +        if ( page == NULL )
> > +            goto cleanup_on_failure;
> > +
> > +        d->arch.dirty.bitmap[i] = map_domain_page_global(__page_to_mfn(page));
> 
> __map_domain_page_global(page) is your friend, and it can fail so needs
> checking.
> 
> How many pages is this?  global dompage mapping are scarce.

It's not so scarce on ARM32 as on x86, IIRC we have 2GB of domheap
space. Not to say we should be profligate with it, but it is less of a
concern than on x86 I think.

> > +int log_dirty_on(struct domain *d)
> > +{
> > +    if ( vlpt_init(d) || bitmap_init(d) )
> > +        return -EINVAL;
> 
> This hides -ENOMEM from each of the init functions.
> 
> I am a fan of
> 
> return vlpt_init(d) ?: bitmap_init(d);
> 
> As an easy way of chaining a set of functions together if they succeed. 
> Ian on the other hand isn't so I doubt you could get away with it.

:-)

I'd prefer:
	rc = foo_init(d)
	if (rc)
		return rc

I'd also be happy enough with if ((rc = foo_init(d))

Ian.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 5/6] xen/arm: Add log_dirty support for ARM
  2014-05-11 15:28   ` Julien Grall
  2014-05-12 14:00     ` Wei Huang
@ 2014-05-14 11:57     ` Ian Campbell
  2014-05-14 12:20       ` Julien Grall
  1 sibling, 1 reply; 67+ messages in thread
From: Ian Campbell @ 2014-05-14 11:57 UTC (permalink / raw)
  To: Julien Grall
  Cc: Wei Huang, keir, stefano.stabellini, andrew.cooper3, tim,
	jaeyong.yoo, xen-devel, jbeulich, ian.jackson, yjhyun.yoo

On Sun, 2014-05-11 at 16:28 +0100, Julien Grall wrote:

> [..]
> 
> > +/* Return start and end addr of guest RAM. Note this function only reports
> > + * regular RAM. It does not cover other areas such as foreign mapped
> > + * pages or MMIO space. */
> > +void domain_get_ram_range(struct domain *d, paddr_t *start, paddr_t *end)
> > +{
> > +    if ( start )
> > +        *start = GUEST_RAM_BASE;
> > +
> > +    if ( end )
> > +        *end = GUEST_RAM_BASE + ((paddr_t) d->max_pages << PAGE_SHIFT);
> > +}
> 
> As said on V1 this solution won't work.
> 
> Ian plans to add multiple banks support for the guest very soon. With this
> solution there is a 1GB hole between the 2 banks. Your function will therefore
> stop to work.
> 
> Furthermore, Xen should not assume that the layout of the guest will always start
> at GUEST_RAM_BASE.

Actually, for the time being that is fine if it is internal to the
hypervisor, although it might be storing up pain for later.

> > +        for ( i2 = second_index; i2 < LPAE_ENTRIES; ++i2 )
> > +        {
> > +            lpae_walk_t second_pte = second[i2].walk;
> > +
> > +            if ( !second_pte.valid || !second_pte.table )
> > +                goto out;
> 
> With Ian's multiple bank support, the RAM region (as returned by domain_get_range)
> can contain a hole. Rather than leaving the loop, you should continue.

Even without that patch you'd need to be careful of ballooned out
regions.
> > @@ -341,6 +343,27 @@ static inline void put_page_and_type(struct page_info *page)
> >       put_page(page);
> >   }
> >
> > +enum mg { mg_clear, mg_ro, mg_rw, mg_rx };
> 
> Please describe this enum. Also mg is too generic.

This is just code motion.

> > @@ -41,6 +42,7 @@ typedef enum {
> >       p2m_invalid = 0,    /* Nothing mapped here */
> >       p2m_ram_rw,         /* Normal read/write guest RAM */
> >       p2m_ram_ro,         /* Read-only; writes are silently dropped */
> > +    p2m_ram_logdirty,   /* Read-only: special mode for log dirty */
> 
> You should at the new type at the end of the enum.

Why? Keeping p2m_ram_* together doesn't seem wrong to me.

Ian.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 5/6] xen/arm: Add log_dirty support for ARM
  2014-05-12 14:11       ` Julien Grall
@ 2014-05-14 12:04         ` Ian Campbell
  0 siblings, 0 replies; 67+ messages in thread
From: Ian Campbell @ 2014-05-14 12:04 UTC (permalink / raw)
  To: Julien Grall
  Cc: Wei Huang, keir, stefano.stabellini, andrew.cooper3, tim,
	jaeyong.yoo, xen-devel, jbeulich, ian.jackson, yjhyun.yoo

On Mon, 2014-05-12 at 15:11 +0100, Julien Grall wrote:
> On 05/12/2014 03:00 PM, Wei Huang wrote:
> > On 05/11/2014 10:28 AM, Julien Grall wrote:
> >>   the guest very soon. With this
> >> solution there is a 1GB hole between the 2 banks. Your function will
> >> therefore
> >> stop to work.
> >>
> >> Furthermore, Xen should not assume that the layout of the guest will
> >> always start
> >> at GUEST_RAM_BASE.
> >>
> >> I think you can use max_mapped_pfn and lowest_mapped_pfn here. You may
> >> need to
> >> modify a bit the signification of it in the p2m code or introduce a
> >> new field.
> > These two values don't work for the purpose of dirty tracking. What we
> > need are two fields to track _physical_ memory. lowest_mapped_pfn tracks
> > all memory types, which doesn't apply here.
> > 
> > Will new fields to track physical ram space acceptable for you?
> 
> I'm fine with new fields to track physical RAM space. Don't forget it
> may have non-RAM hole in this range.

FWIW x86 appears to use a rangeset for this purpose, and keeps it up to
date when modifying the p2m. Doesn't seem like a bad plan to me...

Ian.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 2/6] xen/arm: Add save/restore support for ARM GIC V2
  2014-05-14 11:07   ` Ian Campbell
@ 2014-05-14 12:05     ` Julien Grall
  2014-05-14 12:23       ` Tim Deegan
  0 siblings, 1 reply; 67+ messages in thread
From: Julien Grall @ 2014-05-14 12:05 UTC (permalink / raw)
  To: Ian Campbell, Wei Huang
  Cc: keir, stefano.stabellini, andrew.cooper3, tim, jaeyong.yoo,
	xen-devel, jbeulich, ian.jackson, yjhyun.yoo

On 05/14/2014 12:07 PM, Ian Campbell wrote:
> On Thu, 2014-05-08 at 16:18 -0500, Wei Huang wrote:
>> diff --git a/xen/include/public/arch-arm/hvm/save.h b/xen/include/public/arch-arm/hvm/save.h
>> index 8312e7b..421a6f6 100644
>> --- a/xen/include/public/arch-arm/hvm/save.h
>> +++ b/xen/include/public/arch-arm/hvm/save.h
>> @@ -40,10 +40,42 @@ struct hvm_save_header
>>  };
>>  DECLARE_HVM_SAVE_TYPE(HEADER, 1, struct hvm_save_header);
>>  
>> +/* Guest's view of GIC distributor (per-vcpu)
>> + *   - Based on GICv2 (see "struct vgic_irq_rank")
>> + *   - Store guest's view of GIC distributor
>> + *   - Only support SGI and PPI for DomU (DomU doesn't handle SPI)
>> + */
>> +struct hvm_arm_vgicd_v2
>> +{
>> +    uint32_t ienable;
>> +    uint32_t iactive;
>> +    uint32_t ipend;
>> +    uint32_t pendsgi;
>> +    uint32_t icfg[2];
>> +    uint32_t ipriority[8];
>> +    uint32_t itargets[8];
>> +};
>> +DECLARE_HVM_SAVE_TYPE(VGICD_V2, 2, struct hvm_arm_vgicd_v2);
> 
> This is the state of 32 interrupts. How do you propose to handle more
> interrupts than that?
> 
> I think it would be sensible to split the domain global state, the
> distributor and cpu interface base addresses and sizes and the states of
> any SPIs in here and have a separate per-vpcu set of state for the
> per-cpu GICD state (SPIs and PPIs mainly).
> 
> For the SPI I think you either want to put the above set of state into
> an array of size NR_GUEST_INTERRUPTS/32 or better make each of the above
> an array based on NR_GUEST_INTERRUPTS.
> 
>> +
>> +/* Info for hypervisor to manage guests (per-vcpu)
>> + *   - Based on GICv2
>> + *   - Mainly store registers of GICH_*
>> + */
>> +struct hvm_arm_gich_v2
>> +{
>> +    uint32_t gic_hcr;
>> +    uint32_t gic_vmcr;
>> +    uint32_t gic_apr;
>> +    uint32_t gic_lr[64];
>> +    uint64_t event_mask;
>> +    uint64_t lr_mask;
> 
> I don't think you should be saving any GICH state at all. What should be
> saved is the corresponding GICC state, i.e. "architectural state" that
> is observed by the guest. This might mean pickling stuff from the GICH
> state into a GICC form. (I said this wrt the LRs in a previous round of
> review)

What are the advantage to save the GICC state rather than GICH?

IIRC, the GICH state gives you a representation of the important bits of
the GICC. Most of GICC can't be restore without any translation and
writing in GICH (see gic_vmcr that is a collection of multiple GICC
registers). It seems easier to use GICH state during migration.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 3/6] xen/arm: Add save/restore support for ARM arch timer
  2014-05-14 11:14   ` Ian Campbell
@ 2014-05-14 12:13     ` Julien Grall
  2014-05-14 13:23       ` Ian Campbell
  2014-05-14 19:04     ` Wei Huang
  1 sibling, 1 reply; 67+ messages in thread
From: Julien Grall @ 2014-05-14 12:13 UTC (permalink / raw)
  To: Ian Campbell, Wei Huang
  Cc: keir, stefano.stabellini, andrew.cooper3, tim, jaeyong.yoo,
	xen-devel, jbeulich, ian.jackson, yjhyun.yoo

On 05/14/2014 12:14 PM, Ian Campbell wrote:
> On Thu, 2014-05-08 at 16:18 -0500, Wei Huang wrote:
>> This patch implements a save/resore support for ARM architecture
> 
> "restore"
> 
>> diff --git a/xen/include/public/arch-arm/hvm/save.h b/xen/include/public/arch-arm/hvm/save.h
>> index 421a6f6..8679bfd 100644
>> --- a/xen/include/public/arch-arm/hvm/save.h
>> +++ b/xen/include/public/arch-arm/hvm/save.h
>> @@ -72,10 +72,24 @@ struct hvm_arm_gich_v2
>>  };
>>  DECLARE_HVM_SAVE_TYPE(GICH_V2, 3, struct hvm_arm_gich_v2);
>>  
>> +/* Two ARM timers (physical and virtual) are saved */
> 
> Do you not need to save CNTFRQ and CNTKCTL?

CNTFRQ is set by the platform and can't change for any guest. If we
migrate to a platform with a different frequency, then the guest should
cope with it.

IHMO, CNTKCTL should be saved/restored in guest core registers.

>> +#define ARM_TIMER_TYPE_VIRT  0
>> +#define ARM_TIMER_TYPE_PHYS  1
>> +#define ARM_TIMER_TYPE_COUNT 2       /* total count */
>> +
>> +struct hvm_arm_timer
>> +{
>> +    uint64_t vtb_offset;
> 
> As discussed elsewhere I don't think the offset is architectural state.
> This should be incorporated into the cval. Otherwise how does the
> receiver know what this is an offset from?

Careful, phystimer.vtb_offset is in nanosecond and virttimer.vtb_offset
is in ticks.

Regards,
-- 
Julien Grall

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 5/6] xen/arm: Add log_dirty support for ARM
  2014-05-14 11:57     ` Ian Campbell
@ 2014-05-14 12:20       ` Julien Grall
  2014-05-14 13:24         ` Ian Campbell
  0 siblings, 1 reply; 67+ messages in thread
From: Julien Grall @ 2014-05-14 12:20 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Wei Huang, keir, stefano.stabellini, andrew.cooper3, tim,
	jaeyong.yoo, xen-devel, jbeulich, ian.jackson, yjhyun.yoo

On 05/14/2014 12:57 PM, Ian Campbell wrote:
>>> @@ -341,6 +343,27 @@ static inline void put_page_and_type(struct page_info *page)
>>>       put_page(page);
>>>   }
>>>
>>> +enum mg { mg_clear, mg_ro, mg_rw, mg_rx };
>>
>> Please describe this enum. Also mg is too generic.
> 
> This is just code motion.

This enum has been moved in a header which is included everywhere.

Keeping the name "mg" without any description is confusion. Developper
can misuse this enum.

>>> @@ -41,6 +42,7 @@ typedef enum {
>>>       p2m_invalid = 0,    /* Nothing mapped here */
>>>       p2m_ram_rw,         /* Normal read/write guest RAM */
>>>       p2m_ram_ro,         /* Read-only; writes are silently dropped */
>>> +    p2m_ram_logdirty,   /* Read-only: special mode for log dirty */
>>
>> You should at the new type at the end of the enum.
> 
> Why? Keeping p2m_ram_* together doesn't seem wrong to me.

My mistake, I though we store the P2M during the migration.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 2/6] xen/arm: Add save/restore support for ARM GIC V2
  2014-05-14 12:05     ` Julien Grall
@ 2014-05-14 12:23       ` Tim Deegan
  2014-05-14 13:24         ` Ian Campbell
  0 siblings, 1 reply; 67+ messages in thread
From: Tim Deegan @ 2014-05-14 12:23 UTC (permalink / raw)
  To: Julien Grall
  Cc: Wei Huang, keir, Ian Campbell, stefano.stabellini,
	andrew.cooper3, ian.jackson, jaeyong.yoo, xen-devel, jbeulich,
	yjhyun.yoo

At 13:05 +0100 on 14 May (1400069153), Julien Grall wrote:
> On 05/14/2014 12:07 PM, Ian Campbell wrote:
> > On Thu, 2014-05-08 at 16:18 -0500, Wei Huang wrote:
> >> diff --git a/xen/include/public/arch-arm/hvm/save.h b/xen/include/public/arch-arm/hvm/save.h
> >> index 8312e7b..421a6f6 100644
> >> --- a/xen/include/public/arch-arm/hvm/save.h
> >> +++ b/xen/include/public/arch-arm/hvm/save.h
> >> @@ -40,10 +40,42 @@ struct hvm_save_header
> >>  };
> >>  DECLARE_HVM_SAVE_TYPE(HEADER, 1, struct hvm_save_header);
> >>  
> >> +/* Guest's view of GIC distributor (per-vcpu)
> >> + *   - Based on GICv2 (see "struct vgic_irq_rank")
> >> + *   - Store guest's view of GIC distributor
> >> + *   - Only support SGI and PPI for DomU (DomU doesn't handle SPI)
> >> + */
> >> +struct hvm_arm_vgicd_v2
> >> +{
> >> +    uint32_t ienable;
> >> +    uint32_t iactive;
> >> +    uint32_t ipend;
> >> +    uint32_t pendsgi;
> >> +    uint32_t icfg[2];
> >> +    uint32_t ipriority[8];
> >> +    uint32_t itargets[8];
> >> +};
> >> +DECLARE_HVM_SAVE_TYPE(VGICD_V2, 2, struct hvm_arm_vgicd_v2);
> > 
> > This is the state of 32 interrupts. How do you propose to handle more
> > interrupts than that?
> > 
> > I think it would be sensible to split the domain global state, the
> > distributor and cpu interface base addresses and sizes and the states of
> > any SPIs in here and have a separate per-vpcu set of state for the
> > per-cpu GICD state (SPIs and PPIs mainly).
> > 
> > For the SPI I think you either want to put the above set of state into
> > an array of size NR_GUEST_INTERRUPTS/32 or better make each of the above
> > an array based on NR_GUEST_INTERRUPTS.
> > 
> >> +
> >> +/* Info for hypervisor to manage guests (per-vcpu)
> >> + *   - Based on GICv2
> >> + *   - Mainly store registers of GICH_*
> >> + */
> >> +struct hvm_arm_gich_v2
> >> +{
> >> +    uint32_t gic_hcr;
> >> +    uint32_t gic_vmcr;
> >> +    uint32_t gic_apr;
> >> +    uint32_t gic_lr[64];
> >> +    uint64_t event_mask;
> >> +    uint64_t lr_mask;
> > 
> > I don't think you should be saving any GICH state at all. What should be
> > saved is the corresponding GICC state, i.e. "architectural state" that
> > is observed by the guest. This might mean pickling stuff from the GICH
> > state into a GICC form. (I said this wrt the LRs in a previous round of
> > review)
> 
> What are the advantage to save the GICC state rather than GICH?
> 
> IIRC, the GICH state gives you a representation of the important bits of
> the GICC. Most of GICC can't be restore without any translation and
> writing in GICH (see gic_vmcr that is a collection of multiple GICC
> registers). It seems easier to use GICH state during migration.

The GICC state is the architectural state of the virtual machine;
the GICH state is an implementation detail of how that's achieved by Xen.
We prefer always to put clean architectural state into the save record.
That way if we for any reason change how the VMM is implemented, the
save record format won't be affected by that.

Tim.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 4/6] xen/arm: Add save/restore support for guest core registers
  2014-05-14 11:16     ` Ian Campbell
@ 2014-05-14 12:23       ` Julien Grall
  2014-05-14 13:25         ` Ian Campbell
  0 siblings, 1 reply; 67+ messages in thread
From: Julien Grall @ 2014-05-14 12:23 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Wei Huang, keir, stefano.stabellini, andrew.cooper3, tim,
	jaeyong.yoo, xen-devel, jbeulich, ian.jackson, yjhyun.yoo

On 05/14/2014 12:16 PM, Ian Campbell wrote:
> On Sun, 2014-05-11 at 10:06 +0100, Julien Grall wrote:
>> Hi Wei,
>>
>> Thank you for the patch.
>>
>> On 08/05/14 22:18, Wei Huang wrote:
>>> This patch implements a save/resore support for ARM guest core
>>> registers.
>>
>>
>> The commit 893256f "xen/arm: Correctly save/restore CNTKCTL_EL1" 
>> save/restore a new register during the context switch.
>>
>> I think you forgot to add it in this patch.
> 
> I think it would belong in the previous patch with the timer state.

If Wei plans to use his new structure (see [1]), I'm fine with
save/restore on the previous patch.

Otherwise it seems stupid to save twice the same value.

Regards,

[1] http://lists.xen.org/archives/html/xen-devel/2014-05/msg01642.html

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 5/6] xen/arm: Add log_dirty support for ARM
  2014-05-08 21:18 ` [RFC v3 5/6] xen/arm: Add log_dirty support for ARM Wei Huang
  2014-05-08 23:46   ` Andrew Cooper
  2014-05-11 15:28   ` Julien Grall
@ 2014-05-14 13:18   ` Ian Campbell
  2014-05-16 10:59   ` Julien Grall
  3 siblings, 0 replies; 67+ messages in thread
From: Ian Campbell @ 2014-05-14 13:18 UTC (permalink / raw)
  To: Wei Huang
  Cc: keir, stefano.stabellini, andrew.cooper3, julien.grall, tim,
	jaeyong.yoo, xen-devel, jbeulich, ian.jackson, yjhyun.yoo

On Thu, 2014-05-08 at 16:18 -0500, Wei Huang wrote:
> This patch implements log_dirty for ARM guest VMs. This feature
> is provided via two basic blocks: dirty_bit_map and VLPT
> (virtual-linear page table)

> 1. VLPT provides fast accessing of 3rd PTE of guest P2M.
> When creating a mapping for VLPT, the page table mapping
> becomes the following:
>    xen's 1st PTE --> xen's 2nd PTE --> guest p2m's 2nd PTE -->
>    guest p2m's 3rd PTE

I think "xen's 2nd PTE" here literally means the xen_second[] page
table? As discussed with Jaeyong this is shared between all PCPUs, which
means that only a single domain can be being migrated at one time.

> 
> With VLPT, xen can immediately locate the 3rd PTE of guest P2M
> and modify PTE attirbute during dirty page tracking. The following

"attribute"

> link shows the performance comparison for handling a dirty-page
> between VLPT and typical page table walking.
> http://lists.xen.org/archives/html/xen-devel/2013-08/msg01503.html

Can you inline the results here please.


> +/* Mark the bitmap for a corresponding page as dirty */
> +static inline void bitmap_mark_dirty(struct domain *d, paddr_t addr)
> +{
> +    paddr_t ram_base = (paddr_t) GUEST_RAM_BASE;
> +    int bit_index = PFN_DOWN(addr - ram_base);
> +    int page_index = bit_index >> (PAGE_SHIFT + 3);
> +    int bit_index_residual = bit_index & ((1ul << (PAGE_SHIFT + 3)) - 1);

I queried this magic +3 on Jaeyong's v5 posting of this functionality
too.

> +
> +    set_bit(bit_index_residual, d->arch.dirty.bitmap[page_index]);
> +}
> +
> +/* Allocate dirty bitmap resource */
> +static int bitmap_init(struct domain *d)
> +{
> +    paddr_t gma_start = 0;
> +    paddr_t gma_end = 0;
> +    int nr_bytes;
> +    int nr_pages;
> +    int i;
> +
> +    domain_get_ram_range(d, &gma_start, &gma_end);
> +
> +    nr_bytes = (PFN_DOWN(gma_end - gma_start) + 7) / 8;
> +    nr_pages = (nr_bytes + PAGE_SIZE - 1) / PAGE_SIZE;

DIV_ROUNDUP?

> +
> +    BUG_ON(nr_pages > MAX_DIRTY_BITMAP_PAGES);
> +
> +    for ( i = 0; i < nr_pages; ++i )
> +    {
> +        struct page_info *page;
> +        page = alloc_domheap_page(NULL, 0);
> +        if ( page == NULL )
> +            goto cleanup_on_failure;
> +
> +        d->arch.dirty.bitmap[i] = map_domain_page_global(__page_to_mfn(page));
> +        clear_page(d->arch.dirty.bitmap[i]);
> +    }
> +
> +    d->arch.dirty.bitmap_pages = nr_pages;
> +    return 0;
> +
> +cleanup_on_failure:

This would more normally be called out or err.

> +    nr_pages = i;
> +    for ( i = 0; i < nr_pages; ++i )

I think people normally do this with a while counting backwards.

> +    {
> +        unmap_domain_page_global(d->arch.dirty.bitmap[i]);
> +    }

Coding Style doesn't need {}'s around single line statement like this.

> +
> +    return -ENOMEM;
> +}
> +
> +/* Cleanup dirty bitmap resource */
> +static void bitmap_cleanup(struct domain *d)
> +{
> +    int i;
> +
> +    for ( i = 0; i < d->arch.dirty.bitmap_pages; ++i )
> +    {
> +        unmap_domain_page_global(d->arch.dirty.bitmap[i]);
> +    }
> +}
> +
> +/* Flush VLPT area */
> +static void vlpt_flush(struct domain *d)
> +{
> +    int flush_size;
> +    flush_size = (d->arch.dirty.second_lvl_end - 
> +                  d->arch.dirty.second_lvl_start) << SECOND_SHIFT;
> +
> +    /* flushing the 3rd level mapping */
> +    flush_xen_data_tlb_range_va(d->arch.dirty.second_lvl_start << SECOND_SHIFT,
> +                                flush_size);
> +}
> +
> +/* Set up a page table for VLPT mapping */
> +static int vlpt_init(struct domain *d)
> +{
> +    uint64_t required, avail = VIRT_LIN_P2M_END - VIRT_LIN_P2M_START;
> +    int xen_second_linear_base;
> +    int gp2m_start_index, gp2m_end_index;
> +    struct p2m_domain *p2m = &d->arch.p2m;
> +    struct page_info *second_lvl_page;
> +    paddr_t gma_start = 0;
> +    paddr_t gma_end = 0;
> +    lpae_t *first[2];
> +    int i;
> +
> +    /* Check if reserved space is enough to cover guest physical address space.
> +     * Note that each LPAE page table entry is 64-bit (8 bytes). So we only
> +     * shift left with LPAE_SHIFT instead of PAGE_SHIFT. */
> +    domain_get_ram_range(d, &gma_start, &gma_end);
> +    required = (gma_end - gma_start) >> LPAE_SHIFT;
> +    if ( required > avail )
> +    {
> +        dprintk(XENLOG_ERR, "Available VLPT is small for domU guest (avail: "

"...is too small...".

What is the size limit here? We can probably accept a reasonable limit
for 32-bit guests, but for 64-bit guests we might need to consider other
options. I have patches which enable up to 1TB guests for 64-bit, and I
would expect that to grow again sooner rather than later.

This will need doing differently for arm64 anyway since there is no
per-pcpu page tables there. But there is plenty of address space so
there can probably be a linear area per pcpu, which depending on the
size of the VLPT might do, otherwise we might need to switch to per-pcpu
page tables.

> +                "%#llx, required: %#llx)\n", (unsigned long long)avail,
> +                (unsigned long long)required);
> +        return -ENOMEM;
> +    }
> +
> +    /* Caulculate the base of 2nd linear table base for VIRT_LIN_P2M_START */
> +    xen_second_linear_base = second_linear_offset(VIRT_LIN_P2M_START);
> +
> +    gp2m_start_index = gma_start >> FIRST_SHIFT;
> +    gp2m_end_index = (gma_end >> FIRST_SHIFT) + 1;
> +
> +    if ( xen_second_linear_base + gp2m_end_index >= LPAE_ENTRIES * 2 )
> +    {
> +        dprintk(XENLOG_ERR, "xen second page is small for VLPT for domU");
> +        return -ENOMEM;
> +    }
> +
> +    /* Two pages are allocated to backup the related PTE content of guest 
> +     * VM's 1st-level table. */

By "backup" do you mean context switch?

> +    second_lvl_page = alloc_domheap_pages(NULL, 1, 0);

> +    if ( second_lvl_page == NULL )
> +        return -ENOMEM;
> +    d->arch.dirty.second_lvl[0] = map_domain_page_global(
> +        page_to_mfn(second_lvl_page) );
> +    d->arch.dirty.second_lvl[1] = map_domain_page_global(
> +        page_to_mfn(second_lvl_page+1) );
> +
> +    /* 1st level P2M of guest VM is 2 consecutive pages */

Note that 4 level P2M with no concatenated level zero is on the cards
for arm64 soon.

> +    first[0] = __map_domain_page(p2m->first_level);
> +    first[1] = __map_domain_page(p2m->first_level+1);
> +
> +    for ( i = gp2m_start_index; i < gp2m_end_index; ++i )
> +    {
> +        int k = i % LPAE_ENTRIES;
> +        int l = i / LPAE_ENTRIES;
> +        int k2 = (xen_second_linear_base + i) % LPAE_ENTRIES;
> +        int l2 = (xen_second_linear_base + i) / LPAE_ENTRIES;
> +
> +        /* Update 2nd-level PTE of Xen linear table. With this, Xen linear 
> +         * page table layout becomes: 1st Xen linear ==> 2nd Xen linear ==> 
> +         * 2nd guest P2M (i.e. 3rd Xen linear) ==> 3rd guest P2M (i.e. Xen 
> +         * linear content) for VIRT_LIN_P2M_START address space. */
> +        write_pte(&xen_second[xen_second_linear_base+i], first[l][k]);

This has two barriers for each write, but you only actually need one
before the loop and one after, i.e. dsb() + copy_page() + dsb()
> +
> +        /* We copy the mapping into domain's structure as a reference
> +         * in case of the context switch (used in vlpt_restore function ) */

This is to avoid having to map the p2m pages on context switch I think.
But in order to do that you have to create a permanent mapping of two
other fresh pages to create a "cache". Why not just map the 2 actual p2m
pages instead?

Having done that I wonder how much of this loop can then be shared with
the context switcher?

> +        d->arch.dirty.second_lvl[l2][k2] = first[l][k];
> +    }
> +    unmap_domain_page(first[0]);
> +    unmap_domain_page(first[1]);
> +
> +    /* storing the start and end index */
> +    d->arch.dirty.second_lvl_start = xen_second_linear_base + gp2m_start_index;
> +    d->arch.dirty.second_lvl_end = xen_second_linear_base + gp2m_end_index;
> +
> +    vlpt_flush(d);
> +
> +    return 0;
> +}
> +
> +static void vlpt_cleanup(struct domain *d)
> +{
> +    /* First level p2m is 2 consecutive pages */
> +    unmap_domain_page_global(d->arch.dirty.second_lvl[0]);
> +    unmap_domain_page_global(d->arch.dirty.second_lvl[1]);
> +}
> +
> +/* Returns zero if addr is not valid or dirty mode is not set */
> +int handle_page_fault(struct domain *d, paddr_t addr)
> +{
> +    lpae_t *vlp2m_pte = 0;
> +    paddr_t gma_start = 0;
> +    paddr_t gma_end = 0;
> +
> +    if ( !d->arch.dirty.mode )
> +        return 0;
> +
> +    domain_get_ram_range(d, &gma_start, &gma_end);

Couldn't this just be d->arch.foo_start/end?

> +
> +    /* Ensure that addr is inside guest's RAM */
> +    if ( addr < gma_start || addr > gma_end )
> +        return 0;
> +
> +    vlp2m_pte = vlpt_get_3lvl_pte(addr);
> +    if ( vlp2m_pte->p2m.valid && vlp2m_pte->p2m.write == 0 &&
> +         vlp2m_pte->p2m.type == p2m_ram_logdirty )
> +    {
> +        lpae_t pte = *vlp2m_pte;
> +        pte.p2m.write = 1;
> +        write_pte(vlp2m_pte, pte);

Should you not be changing the type back to p2m_ram_rw?

> +        flush_tlb_local();

What about other CPUs?

> +
> +        /* only necessary to lock between get-dirty bitmap and mark dirty
> +         * bitmap. If get-dirty bitmap happens immediately before this
> +         * lock, the corresponding dirty-page would be marked at the next
> +         * round of get-dirty bitmap */
> +        spin_lock(&d->arch.dirty.lock);

At some point I think I suggested that you might be able to use an
atomic bitop rather than a lock, did that turn out to be impossible?

> +        bitmap_mark_dirty(d, addr);
> +        spin_unlock(&d->arch.dirty.lock);
> +    }
> +
> +    return 1;
> +}
> +
> +/* Restore the xen page table for vlpt mapping for domain */
> +void log_dirty_restore(struct domain *d)
> +{
> +    int i;
> +
> +    /* Nothing to do as log dirty mode is off */
> +    if ( !(d->arch.dirty.mode) )
> +        return;
> +
> +    dsb(sy);
> +
> +    for ( i = d->arch.dirty.second_lvl_start; i < d->arch.dirty.second_lvl_end;
> +          ++i )
> +    {
> +        int k = i % LPAE_ENTRIES;
> +        int l = i / LPAE_ENTRIES;
> +
> +        if ( xen_second[i].bits != d->arch.dirty.second_lvl[l][k].bits )
> +        {
> +            write_pte(&xen_second[i], d->arch.dirty.second_lvl[l][k]);
> +            flush_xen_data_tlb_range_va(i << SECOND_SHIFT, 1 << SECOND_SHIFT);
> +        }
> +    }
> +
> +    dsb(sy);
> +    isb();

You have barriers here, and in write_pte and in
flush_xen_data_tlb_range, which is somewhat overkill. I think you can
just have a single set before and after.

> +/* Initialize log dirty fields */
> +int log_dirty_init(struct domain *d)
> +{
> +    d->arch.dirty.count = 0;

I have a feeling that all of these are zeroed in struct domain when it
is allocated.

> +    d->arch.dirty.mode = 0;
> +    spin_lock_init(&d->arch.dirty.lock);
> +
> +    d->arch.dirty.second_lvl_start = 0;
> +    d->arch.dirty.second_lvl_end = 0;
> +    d->arch.dirty.second_lvl[0] = NULL;
> +    d->arch.dirty.second_lvl[1] = NULL;
> +
> +    memset(d->arch.dirty.bitmap, 0, sizeof(d->arch.dirty.bitmap));
> +    d->arch.dirty.bitmap_pages = 0;
> +
> +    return 0;
> +}
> +
> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> index 603c097..0808cc9 100644
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -6,6 +6,8 @@
>  #include <xen/bitops.h>
>  #include <asm/flushtlb.h>
>  #include <asm/gic.h>
> +#include <xen/guest_access.h>
> +#include <xen/pfn.h>
>  #include <asm/event.h>
>  #include <asm/hardirq.h>
>  #include <asm/page.h>
> @@ -208,6 +210,7 @@ static lpae_t mfn_to_p2m_entry(unsigned long mfn, unsigned int mattr,
>          break;
>  
>      case p2m_ram_ro:
> +    case p2m_ram_logdirty:
>          e.p2m.xn = 0;
>          e.p2m.write = 0;
>          break;
> @@ -261,6 +264,10 @@ static int p2m_create_table(struct domain *d,
>  
>      pte = mfn_to_p2m_entry(page_to_mfn(page), MATTR_MEM, p2m_invalid);
>  
> +    /* mark the write bit (page table's case, ro bit) as 0
> +     * so, it is writable in case of vlpt access */

"writeable"

> +    pte.pt.ro = 0;
> +
>      write_pte(entry, pte);
>  
>      return 0;
> @@ -696,6 +703,203 @@ unsigned long gmfn_to_mfn(struct domain *d, unsigned long gpfn)
>      return p >> PAGE_SHIFT;
>  }
>  
> +/* Change types across all p2m entries in a domain */
> +void p2m_change_entry_type_global(struct domain *d, enum mg nt)

Please try and use apply_p2m_changes for this.

mg_* are host (i.e. Xen) mapping types. You should be using p2m_type_t
here.

This should all just fall out nicely if you use apply_p2m_changes I
think.

> +                if ( nt == mg_ro )
> +                {
> +                    if ( pte.p2m.write == 1 )
> +                    {
> +                        pte.p2m.write = 0;
> +                        pte.p2m.type = p2m_ram_logdirty;

If the new type is mg_ro then shouldn't this be p2m_ram_ro? There's no
need to do logdirty tracking on ro pages.

> +                    }
> +                    else
> +                    {
> +                        /* reuse avail bit as an indicator of 'actual' 

The avail bit?

> +                         * read-only */
> +                        pte.p2m.type = p2m_ram_rw;

The new type is mg_ro -- so surely here we *do* need logdirty, iff
logdirty is currently enabled.

> +                    }
> +                }
> +                else if ( nt == mg_rw )
> +                {
> +                    if ( pte.p2m.write == 0 && 
> +                         pte.p2m.type == p2m_ram_logdirty )
> +                    {
> +                        pte.p2m.write = p2m_ram_rw;

This also seems wrong to me.

Surely the logic here ought to be something like:

    switch (nt)
    {
    case p2m_ram_ro:
        pte.p2m.write = 0;
        pte.p2m.type = nt;
        break;
    case p2m_ram_rw:
        if ( logdirty_is_enable(d) )
        {
            pte.p2m.write = 0;
            pte.p2m.type = p2m_ram_logdirty;
        }
        else
        {
            pte.p2m.write = 1;
            pte.p2m.type = p2m_ram_rw;
        }
        break;
    /* Other types, perhaps via default... */
    }
?

> +                    }
> +                }
> +                write_pte(&third[i3], pte);
> +            }
> +            unmap_domain_page(third);
> +
> +            third = NULL;
> +            third_index = 0;
> +        }
> +        unmap_domain_page(second);
> +
> +        second = NULL;
> +        second_index = 0;
> +        third_index = 0;
> +    }
> +
> +out:
> +    flush_tlb_all_local();
> +    if ( third ) unmap_domain_page(third);
> +    if ( second ) unmap_domain_page(second);
> +    if ( first ) unmap_domain_page(first);
> +
> +    spin_unlock(&p2m->lock);
> +}
> +
> +/* Read a domain's log-dirty bitmap and stats. If the operation is a CLEAN, 
> + * clear the bitmap and stats. */
> +int log_dirty_op(struct domain *d, xen_domctl_shadow_op_t *sc)
> +{
> +    int peek = 1;
> +    int i;
> +    int bitmap_size;
> +    paddr_t gma_start, gma_end;
> +
> +    /* this hypercall is called from domain 0, and we don't know which guest's
> +     * vlpt is mapped in xen_second, so, to be sure, we restore vlpt here */
> +    log_dirty_restore(d);

You don't seem to clear it again at the end?

> +    domain_get_ram_range(d, &gma_start, &gma_end);
> +    bitmap_size = (gma_end - gma_start) / 8;
> +
> +    if ( guest_handle_is_null(sc->dirty_bitmap) )
> +    {
> +        peek = 0;
> +    }
> +    else
> +    {
> +        spin_lock(&d->arch.dirty.lock);
> +
> +        for ( i = 0; i < d->arch.dirty.bitmap_pages; ++i )
> +        {
> +            int j = 0;
> +            uint8_t *bitmap;
> +
> +            copy_to_guest_offset(sc->dirty_bitmap, i * PAGE_SIZE,
> +                                 d->arch.dirty.bitmap[i],
> +                                 bitmap_size < PAGE_SIZE ? bitmap_size :
> +                                                           PAGE_SIZE);
> +            bitmap_size -= PAGE_SIZE;
> +
> +            /* set p2m page table read-only */
> +            bitmap = d->arch.dirty.bitmap[i];
> +            while ((j = find_next_bit((const long unsigned int *)bitmap,
> +                                      PAGE_SIZE*8, j)) < PAGE_SIZE*8)

What are these magic 8s?

> +            {
> +                lpae_t *vlpt;
> +                paddr_t addr = gma_start + (i << (2*PAGE_SHIFT+3)) +

Magic 2* and+3, a helper might be nice.

Isn't j effectively a pfn here (or a pfn offswet from RAM base). In
which case all the normal helpers for manipulating pfns and addresses
are available to you.

> +                    (j << PAGE_SHIFT);
> +                vlpt = vlpt_get_3lvl_pte(addr);
> +                vlpt->p2m.write = 0;
> +                j++;
> +            }

No barrier here?

> +        }
> +
> +        if ( sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN )
> +        {
> +            for ( i = 0; i < d->arch.dirty.bitmap_pages; ++i )
> +            {
> +                clear_page(d->arch.dirty.bitmap[i]);
> +            }
> +        }
> +
> +        spin_unlock(&d->arch.dirty.lock);
> +        flush_tlb_local();
> +    }
> +
> +    sc->stats.dirty_count = d->arch.dirty.count;
> +
> +    return 0;
> +}
> +
> @@ -1577,6 +1579,13 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs,
>      if ( rc == -EFAULT )
>          goto bad_data_abort;
>  
> +    /* domU page fault handling for guest live migration. Note that 
> +     * dabt.valid can be 0 here */
> +    if ( page_fault && handle_page_fault(current->domain, info.gpa) )

handle_page_fault only deals with logdirty faults -- please name it
appropriately.

I'm not sure "page_fault" is a very descriptive name -- it's more
specific than that I think?

> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
> index aabeb51..ac82643 100644
> --- a/xen/include/asm-arm/domain.h
> +++ b/xen/include/asm-arm/domain.h
> @@ -162,6 +162,25 @@ struct arch_domain
>      } vuart;
>  
>      unsigned int evtchn_irq;
> +
> +    /* dirty page tracing */
> +    struct {
> +        spinlock_t lock;
> +        volatile int mode;               /* 1 if dirty pages tracing enabled */
> +        volatile unsigned int count;     /* dirty pages counter */
> +
> +        /* vlpt context switch */
> +        volatile int second_lvl_start; /* start idx of virt linear space 2nd */
> +        volatile int second_lvl_end;   /* end idx of virt linear space 2nd */
> +        lpae_t *second_lvl[2];         /* copy of guest P2M 1st-lvl content */
> +
> +        /* bitmap to track dirty pages */
> +#define MAX_DIRTY_BITMAP_PAGES 64
> +        /* Because each bit represents a dirty page, the total supported guest 
> +         * memory is (64 entries x 4KB/entry x 8bits/byte x 4KB) = 8GB. */

8GB isn't very much, especially not for a 64-bit guest.

I've previously discussed with Jaeyong the possibility of using some
state in each pte to track dirtiness and walking the vlpt to copy them
out into the toolstacks bitmap. I think pte.p2m.type could be used --
any page which is p2m_ram_rw has been dirtied (otherwise it would still
be p2m_ram_logdirty). Only thing I'm not sure about is other types --
e.g. ballooning a page out in the middle of a migrate -- I suppose there
is some explicit "dirtying" of such a page somewhere in decrease
reservation.

BTW x86 seems to use a 4-level bitmap trie here.

> +        uint8_t *bitmap[MAX_DIRTY_BITMAP_PAGES]; /* dirty bitmap */
> +        int bitmap_pages;                        /* # of dirty bitmap pages */
> +    } dirty;
>  }  __cacheline_aligned;
>  
>  struct arch_vcpu

Ian.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 6/6] xen/arm: Implement toolstack for xl restore/save/migration
  2014-05-08 21:18 ` [RFC v3 6/6] xen/arm: Implement toolstack for xl restore/save/migration Wei Huang
@ 2014-05-14 13:20   ` Ian Campbell
  2014-05-14 13:24     ` Andrew Cooper
  0 siblings, 1 reply; 67+ messages in thread
From: Ian Campbell @ 2014-05-14 13:20 UTC (permalink / raw)
  To: Wei Huang
  Cc: keir, stefano.stabellini, andrew.cooper3, julien.grall, tim,
	jaeyong.yoo, xen-devel, jbeulich, ian.jackson, yjhyun.yoo

On Thu, 2014-05-08 at 16:18 -0500, Wei Huang wrote:
> This patch implements xl save/restore operation in xc_arm_migrate.c
> and make it compilable with existing design. The operation is also
> used by migration.
> 
> The overall process of save/restore is the following: 1) save guest
> parameters; 2) save memory; 3) save HVM states.
> 
> Signed-off-by: Alexey Sokolov <sokolov.a@samsung.com>
> Signed-off-by: Wei Huang <w1.huang@samsung.com>

I'm going to defer reviewing this for now, it'll likely need to be
reworked to fit into Andy's new migration framework. As I mentioned
before (I think/hope!) I'd like ARM to use that from day 1.

> diff --git a/tools/misc/Makefile b/tools/misc/Makefile
> index 69b1817..f4ea7ab 100644
> --- a/tools/misc/Makefile
> +++ b/tools/misc/Makefile

This change could probably be refactored into a separate patch and
applied already.

> @@ -11,7 +11,7 @@ HDRS     = $(wildcard *.h)
>  
>  TARGETS-y := xenperf xenpm xen-tmem-list-parse gtraceview gtracestat xenlockprof xenwatchdogd xencov
>  TARGETS-$(CONFIG_X86) += xen-detect xen-hvmctx xen-hvmcrash xen-lowmemd xen-mfndump
> -TARGETS-$(CONFIG_MIGRATE) += xen-hptool
> +TARGETS-$(CONFIG_X86) += xen-hptool
>  TARGETS := $(TARGETS-y)
>  
>  SUBDIRS := $(SUBDIRS-y)
> @@ -23,7 +23,7 @@ INSTALL_BIN := $(INSTALL_BIN-y)
>  INSTALL_SBIN-y := xen-bugtool xen-python-path xenperf xenpm xen-tmem-list-parse gtraceview \
>  	gtracestat xenlockprof xenwatchdogd xen-ringwatch xencov
>  INSTALL_SBIN-$(CONFIG_X86) += xen-hvmctx xen-hvmcrash xen-lowmemd xen-mfndump
> -INSTALL_SBIN-$(CONFIG_MIGRATE) += xen-hptool
> +INSTALL_SBIN-$(CONFIG_X86) += xen-hptool
>  INSTALL_SBIN := $(INSTALL_SBIN-y)
>  
>  INSTALL_PRIVBIN-y := xenpvnetboot

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 1/6] xen/arm: Add basic save/restore support for ARM
  2014-05-14 10:46       ` Andrew Cooper
@ 2014-05-14 13:22         ` Ian Campbell
  0 siblings, 0 replies; 67+ messages in thread
From: Ian Campbell @ 2014-05-14 13:22 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Huang, keir, stefano.stabellini, ian.jackson, julien.grall,
	tim, jaeyong.yoo, xen-devel, jbeulich, yjhyun.yoo

On Wed, 2014-05-14 at 11:46 +0100, Andrew Cooper wrote:
> On 14/05/14 11:25, Ian Campbell wrote:
> > On Thu, 2014-05-08 at 23:11 +0100, Andrew Cooper wrote:
> >>> diff --git a/xen/include/public/arch-arm/hvm/save.h b/xen/include/public/arch-arm/hvm/save.h
> >>> index 75b8e65..8312e7b 100644
> >>> --- a/xen/include/public/arch-arm/hvm/save.h
> >>> +++ b/xen/include/public/arch-arm/hvm/save.h
> >>> @@ -3,6 +3,7 @@
> >>>   * be saved along with the domain's memory and device-model state.
> >>>   *
> >>>   * Copyright (c) 2012 Citrix Systems Ltd.
> >>> + * Copyright (c) 2014 Samsung Electronics.
> >>>   *
> >>>   * Permission is hereby granted, free of charge, to any person obtaining a copy
> >>>   * of this software and associated documentation files (the "Software"), to
> >>> @@ -26,6 +27,24 @@
> >>>  #ifndef __XEN_PUBLIC_HVM_SAVE_ARM_H__
> >>>  #define __XEN_PUBLIC_HVM_SAVE_ARM_H__
> >>>  
> >>> +#define HVM_ARM_FILE_MAGIC   0x92385520
> >>> +#define HVM_ARM_FILE_VERSION 0x00000001
> >>> +
> >>> +/* Note: For compilation purpose hvm_save_header name is the same as x86,
> >>> + * but layout is different. */
> >>> +struct hvm_save_header
> >>> +{
> >>> +    uint32_t magic;             /* Must be HVM_ARM_FILE_MAGIC */
> >>> +    uint32_t version;           /* File format version */
> >>> +    uint32_t cpuinfo;           /* Record MIDR_EL1 info of saving machine */
> >>> +};
> >>> +DECLARE_HVM_SAVE_TYPE(HEADER, 1, struct hvm_save_header);
> >>> +
> >>> +/*
> >>> + * Largest type-code in use
> >>> + */
> >>> +#define HVM_SAVE_CODE_MAX 1
> >>> +
> >>>  #endif
> >>>  
> >>>  /*
> >> Hmm - it is quite poor to have this magically named "hvm_save_header".
> > We frequently have arch interfaces where generic code requires arch code
> > to provide particular structs or functions etc. What is poor about this
> > particular instance of that pattern?

> Save/restore is currently asymmetric in this regard.  The save side
> treats this x86 structure as common, whereas load is entirely arch specific.
> 
> Fixing the asymmetry sensibly involves pushing the save side into arch code.

OK, that's an actual reason, thanks.

Ian.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 3/6] xen/arm: Add save/restore support for ARM arch timer
  2014-05-14 12:13     ` Julien Grall
@ 2014-05-14 13:23       ` Ian Campbell
  0 siblings, 0 replies; 67+ messages in thread
From: Ian Campbell @ 2014-05-14 13:23 UTC (permalink / raw)
  To: Julien Grall
  Cc: Wei Huang, keir, stefano.stabellini, andrew.cooper3, tim,
	jaeyong.yoo, xen-devel, jbeulich, ian.jackson, yjhyun.yoo

On Wed, 2014-05-14 at 13:13 +0100, Julien Grall wrote:
> On 05/14/2014 12:14 PM, Ian Campbell wrote:
> > On Thu, 2014-05-08 at 16:18 -0500, Wei Huang wrote:
> >> This patch implements a save/resore support for ARM architecture
> > 
> > "restore"
> > 
> >> diff --git a/xen/include/public/arch-arm/hvm/save.h b/xen/include/public/arch-arm/hvm/save.h
> >> index 421a6f6..8679bfd 100644
> >> --- a/xen/include/public/arch-arm/hvm/save.h
> >> +++ b/xen/include/public/arch-arm/hvm/save.h
> >> @@ -72,10 +72,24 @@ struct hvm_arm_gich_v2
> >>  };
> >>  DECLARE_HVM_SAVE_TYPE(GICH_V2, 3, struct hvm_arm_gich_v2);
> >>  
> >> +/* Two ARM timers (physical and virtual) are saved */
> > 
> > Do you not need to save CNTFRQ and CNTKCTL?
> 
> CNTFRQ is set by the platform and can't change for any guest. If we
> migrate to a platform with a different frequency, then the guest should
> cope with it.

I doubt that (guest coping) will be the case in reality.

CNTFRQ should be saved so that the target platform can either reject the
restore or take steps to emulate the original state. In the short term
this probably means reject.

> IHMO, CNTKCTL should be saved/restored in guest core registers.

I don't see why when there is a timer specific struct.

Ian.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 6/6] xen/arm: Implement toolstack for xl restore/save/migration
  2014-05-14 13:20   ` Ian Campbell
@ 2014-05-14 13:24     ` Andrew Cooper
  0 siblings, 0 replies; 67+ messages in thread
From: Andrew Cooper @ 2014-05-14 13:24 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Wei Huang, keir, stefano.stabellini, ian.jackson, julien.grall,
	tim, jaeyong.yoo, xen-devel, jbeulich, yjhyun.yoo

On 14/05/14 14:20, Ian Campbell wrote:
> On Thu, 2014-05-08 at 16:18 -0500, Wei Huang wrote:
>> This patch implements xl save/restore operation in xc_arm_migrate.c
>> and make it compilable with existing design. The operation is also
>> used by migration.
>>
>> The overall process of save/restore is the following: 1) save guest
>> parameters; 2) save memory; 3) save HVM states.
>>
>> Signed-off-by: Alexey Sokolov <sokolov.a@samsung.com>
>> Signed-off-by: Wei Huang <w1.huang@samsung.com>
> I'm going to defer reviewing this for now, it'll likely need to be
> reworked to fit into Andy's new migration framework. As I mentioned
> before (I think/hope!) I'd like ARM to use that from day 1.

With the Xen side hypercalls implemented in this series, supporting ARM
in the new framework should be as complicated as copying the x86
mostly-noop function-pointer-structure.  Should be all of 100 lines in
total or so.

~Andrew

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 5/6] xen/arm: Add log_dirty support for ARM
  2014-05-14 12:20       ` Julien Grall
@ 2014-05-14 13:24         ` Ian Campbell
  0 siblings, 0 replies; 67+ messages in thread
From: Ian Campbell @ 2014-05-14 13:24 UTC (permalink / raw)
  To: Julien Grall
  Cc: Wei Huang, keir, stefano.stabellini, andrew.cooper3, tim,
	jaeyong.yoo, xen-devel, jbeulich, ian.jackson, yjhyun.yoo

On Wed, 2014-05-14 at 13:20 +0100, Julien Grall wrote:
> On 05/14/2014 12:57 PM, Ian Campbell wrote:
> >>> @@ -341,6 +343,27 @@ static inline void put_page_and_type(struct page_info *page)
> >>>       put_page(page);
> >>>   }
> >>>
> >>> +enum mg { mg_clear, mg_ro, mg_rw, mg_rx };
> >>
> >> Please describe this enum. Also mg is too generic.
> > 
> > This is just code motion.
> 
> This enum has been moved in a header which is included everywhere.
> 
> Keeping the name "mg" without any description is confusion. Developper
> can misuse this enum.

Actually, I think the use of enum mg where it was used here was wrong
and should have been p2m_type_t, so no need to move this either.

Ian.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 2/6] xen/arm: Add save/restore support for ARM GIC V2
  2014-05-14 12:23       ` Tim Deegan
@ 2014-05-14 13:24         ` Ian Campbell
  0 siblings, 0 replies; 67+ messages in thread
From: Ian Campbell @ 2014-05-14 13:24 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Wei Huang, keir, stefano.stabellini, andrew.cooper3,
	Julien Grall, ian.jackson, jaeyong.yoo, xen-devel, jbeulich,
	yjhyun.yoo

On Wed, 2014-05-14 at 14:23 +0200, Tim Deegan wrote:
> > >> +
> > >> +/* Info for hypervisor to manage guests (per-vcpu)
> > >> + *   - Based on GICv2
> > >> + *   - Mainly store registers of GICH_*
> > >> + */
> > >> +struct hvm_arm_gich_v2
> > >> +{
> > >> +    uint32_t gic_hcr;
> > >> +    uint32_t gic_vmcr;
> > >> +    uint32_t gic_apr;
> > >> +    uint32_t gic_lr[64];
> > >> +    uint64_t event_mask;
> > >> +    uint64_t lr_mask;
> > > 
> > > I don't think you should be saving any GICH state at all. What should be
> > > saved is the corresponding GICC state, i.e. "architectural state" that
> > > is observed by the guest. This might mean pickling stuff from the GICH
> > > state into a GICC form. (I said this wrt the LRs in a previous round of
> > > review)
> > 
> > What are the advantage to save the GICC state rather than GICH?
> > 
> > IIRC, the GICH state gives you a representation of the important bits of
> > the GICC. Most of GICC can't be restore without any translation and
> > writing in GICH (see gic_vmcr that is a collection of multiple GICC
> > registers). It seems easier to use GICH state during migration.
> 
> The GICC state is the architectural state of the virtual machine;
> the GICH state is an implementation detail of how that's achieved by Xen.
> We prefer always to put clean architectural state into the save record.
> That way if we for any reason change how the VMM is implemented, the
> save record format won't be affected by that.

Ack.

> 
> Tim.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 4/6] xen/arm: Add save/restore support for guest core registers
  2014-05-14 12:23       ` Julien Grall
@ 2014-05-14 13:25         ` Ian Campbell
  2014-05-14 13:31           ` Julien Grall
  0 siblings, 1 reply; 67+ messages in thread
From: Ian Campbell @ 2014-05-14 13:25 UTC (permalink / raw)
  To: Julien Grall
  Cc: Wei Huang, keir, stefano.stabellini, andrew.cooper3, tim,
	jaeyong.yoo, xen-devel, jbeulich, ian.jackson, yjhyun.yoo

On Wed, 2014-05-14 at 13:23 +0100, Julien Grall wrote:
> On 05/14/2014 12:16 PM, Ian Campbell wrote:
> > On Sun, 2014-05-11 at 10:06 +0100, Julien Grall wrote:
> >> Hi Wei,
> >>
> >> Thank you for the patch.
> >>
> >> On 08/05/14 22:18, Wei Huang wrote:
> >>> This patch implements a save/resore support for ARM guest core
> >>> registers.
> >>
> >>
> >> The commit 893256f "xen/arm: Correctly save/restore CNTKCTL_EL1" 
> >> save/restore a new register during the context switch.
> >>
> >> I think you forgot to add it in this patch.
> > 
> > I think it would belong in the previous patch with the timer state.
> 
> If Wei plans to use his new structure (see [1]), I'm fine with
> save/restore on the previous patch.
> 
> Otherwise it seems stupid to save twice the same value.

Nobody has suggested saving it twice.

> 
> Regards,
> 
> [1] http://lists.xen.org/archives/html/xen-devel/2014-05/msg01642.html
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 4/6] xen/arm: Add save/restore support for guest core registers
  2014-05-14 13:25         ` Ian Campbell
@ 2014-05-14 13:31           ` Julien Grall
  0 siblings, 0 replies; 67+ messages in thread
From: Julien Grall @ 2014-05-14 13:31 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Wei Huang, keir, stefano.stabellini, andrew.cooper3, tim,
	jaeyong.yoo, xen-devel, jbeulich, ian.jackson, yjhyun.yoo

On 05/14/2014 02:25 PM, Ian Campbell wrote:
> On Wed, 2014-05-14 at 13:23 +0100, Julien Grall wrote:
>> On 05/14/2014 12:16 PM, Ian Campbell wrote:
>>> On Sun, 2014-05-11 at 10:06 +0100, Julien Grall wrote:
>>>> Hi Wei,
>>>>
>>>> Thank you for the patch.
>>>>
>>>> On 08/05/14 22:18, Wei Huang wrote:
>>>>> This patch implements a save/resore support for ARM guest core
>>>>> registers.
>>>>
>>>>
>>>> The commit 893256f "xen/arm: Correctly save/restore CNTKCTL_EL1" 
>>>> save/restore a new register during the context switch.
>>>>
>>>> I think you forgot to add it in this patch.
>>>
>>> I think it would belong in the previous patch with the timer state.
>>
>> If Wei plans to use his new structure (see [1]), I'm fine with
>> save/restore on the previous patch.
>>
>> Otherwise it seems stupid to save twice the same value.
> 
> Nobody has suggested saving it twice.

When I commented the code, there was 2 timer structure (one for the phys
timer and the other for the virt timer).

Anyway, it seems the new approach will be with a single structure.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 1/6] xen/arm: Add basic save/restore support for ARM
  2014-05-14 10:37   ` Ian Campbell
@ 2014-05-14 18:54     ` Wei Huang
  0 siblings, 0 replies; 67+ messages in thread
From: Wei Huang @ 2014-05-14 18:54 UTC (permalink / raw)
  To: Ian Campbell
  Cc: keir, stefano.stabellini, andrew.cooper3, julien.grall, tim,
	jaeyong.yoo, xen-devel, jbeulich, ian.jackson, yjhyun.yoo

On 05/14/2014 05:37 AM, Ian Campbell wrote:
> On Thu, 2014-05-08 at 16:18 -0500, Wei Huang wrote:
>> This patch implements a basic framework for ARM guest
>> save/restore. It defines a HVM save header for ARM guests
>> and correponding arch_ save/load functions. These functions
>
> "corresponding"
>
>> are hooked up with domain control hypercalls (gethvmcontext
>> and sethvmcontext). The hypercalls become a common code path to
>> both x86 and ARM. As a result of merging, the x86 specific
>> header saving code is moved to x86 sub-directory.
>>
>> Signed-off-by: Evgeny Fedotov <e.fedotov@samsung.com>
>> Signed-off-by: Wei Huang <w1.huang@samsung.com>
>
> Other than the comments already made by others this is looking good to
> me.
>
>> [...]
>> +#define HVM_ARM_FILE_MAGIC   0x92385520
>
> OOI where did that number come from? (often these are a few ASCII
> characters etc, I'm just curious)
No idea. I inherited from old patch. It doesn't seem to be meaningful 
ASCII string to me.
>
>> +#define HVM_ARM_FILE_VERSION 0x00000001
>> +
>> +/* Note: For compilation purpose hvm_save_header name is the same as x86,
>> + * but layout is different. */
>> +struct hvm_save_header
>> +{
>> +    uint32_t magic;             /* Must be HVM_ARM_FILE_MAGIC */
>> +    uint32_t version;           /* File format version */
>> +    uint32_t cpuinfo;           /* Record MIDR_EL1 info of saving machine */
>
> Is the size of this struct the same for arm32 and arm64?
The size is the same on arm32 and arm64.
>
> Ian.
>
>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 3/6] xen/arm: Add save/restore support for ARM arch timer
  2014-05-14 11:14   ` Ian Campbell
  2014-05-14 12:13     ` Julien Grall
@ 2014-05-14 19:04     ` Wei Huang
  1 sibling, 0 replies; 67+ messages in thread
From: Wei Huang @ 2014-05-14 19:04 UTC (permalink / raw)
  To: Ian Campbell
  Cc: keir, stefano.stabellini, andrew.cooper3, julien.grall, tim,
	jaeyong.yoo, xen-devel, jbeulich, ian.jackson, yjhyun.yoo

On 05/14/2014 06:14 AM, Ian Campbell wrote:
> On Thu, 2014-05-08 at 16:18 -0500, Wei Huang wrote:
>> This patch implements a save/resore support for ARM architecture
>
> "restore"
>
>> diff --git a/xen/include/public/arch-arm/hvm/save.h b/xen/include/public/arch-arm/hvm/save.h
>> index 421a6f6..8679bfd 100644
>> --- a/xen/include/public/arch-arm/hvm/save.h
>> +++ b/xen/include/public/arch-arm/hvm/save.h
>> @@ -72,10 +72,24 @@ struct hvm_arm_gich_v2
>>   };
>>   DECLARE_HVM_SAVE_TYPE(GICH_V2, 3, struct hvm_arm_gich_v2);
>>
>> +/* Two ARM timers (physical and virtual) are saved */
>
> Do you not need to save CNTFRQ and CNTKCTL?
>
>> +#define ARM_TIMER_TYPE_VIRT  0
>> +#define ARM_TIMER_TYPE_PHYS  1
>> +#define ARM_TIMER_TYPE_COUNT 2       /* total count */
>> +
>> +struct hvm_arm_timer
>> +{
>> +    uint64_t vtb_offset;
>
> As discussed elsewhere I don't think the offset is architectural state.
> This should be incorporated into the cval. Otherwise how does the
> receiver know what this is an offset from?
>
>> +    uint32_t ctl;
>> +    uint64_t cval;
>> +    uint32_t type;
>> +};
>> +DECLARE_HVM_SAVE_TYPE(TIMER, 4, struct hvm_arm_timer);
>
> Why not do
> DECLARE_HVM_SAVE_TYPE(VTIMER, 4, struct hvm_arm_timer)
> DECLARE_HVM_SAVE_TYPE(PTIMER, 5, struct hvm_arm_timer)
> and drop the type field?
>
> Or else define hvm_arm_timers with cntfeq, cntkctl and two sets of the
> ctl,cval in a single struct.
>
This is the preferred approach after discussing with Julien.
> Ian.
>
>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 2/6] xen/arm: Add save/restore support for ARM GIC V2
  2014-05-08 21:18 ` [RFC v3 2/6] xen/arm: Add save/restore support for ARM GIC V2 Wei Huang
                     ` (2 preceding siblings ...)
  2014-05-14 11:07   ` Ian Campbell
@ 2014-05-15 17:15   ` Julien Grall
  2014-05-16  7:36     ` Ian Campbell
  3 siblings, 1 reply; 67+ messages in thread
From: Julien Grall @ 2014-05-15 17:15 UTC (permalink / raw)
  To: Wei Huang, xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, andrew.cooper3, tim,
	jaeyong.yoo, jbeulich, ian.jackson, yjhyun.yoo

Hi Wei,

On 05/08/2014 10:18 PM, Wei Huang wrote:
> +struct hvm_arm_gich_v2
> +{
> +    uint32_t gic_hcr;
> +    uint32_t gic_vmcr;
> +    uint32_t gic_apr;
> +    uint32_t gic_lr[64];
> +    uint64_t event_mask;

FYI, the field event_mask as been dropped in xen upstream [1]

So you don't need to save it.

[1] http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=6fedf29bf3ff8a2391eef7c45244406ec4900f88

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 2/6] xen/arm: Add save/restore support for ARM GIC V2
  2014-05-15 17:15   ` Julien Grall
@ 2014-05-16  7:36     ` Ian Campbell
  0 siblings, 0 replies; 67+ messages in thread
From: Ian Campbell @ 2014-05-16  7:36 UTC (permalink / raw)
  To: Julien Grall
  Cc: Wei Huang, keir, stefano.stabellini, andrew.cooper3, tim,
	jaeyong.yoo, xen-devel, jbeulich, ian.jackson, yjhyun.yoo

On Thu, 2014-05-15 at 18:15 +0100, Julien Grall wrote:
> Hi Wei,
> 
> On 05/08/2014 10:18 PM, Wei Huang wrote:
> > +struct hvm_arm_gich_v2
> > +{
> > +    uint32_t gic_hcr;
> > +    uint32_t gic_vmcr;
> > +    uint32_t gic_apr;
> > +    uint32_t gic_lr[64];
> > +    uint64_t event_mask;
> 
> FYI, the field event_mask as been dropped in xen upstream [1]
> 
> So you don't need to save it.

Quite apart from it's existence in upstream Xen it is not and never was
an architectural field and so shouldn't ever be saved. It's presence or
absence in some struct inside Xen has no impact on that.

Ian.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC v3 5/6] xen/arm: Add log_dirty support for ARM
  2014-05-08 21:18 ` [RFC v3 5/6] xen/arm: Add log_dirty support for ARM Wei Huang
                     ` (2 preceding siblings ...)
  2014-05-14 13:18   ` Ian Campbell
@ 2014-05-16 10:59   ` Julien Grall
  3 siblings, 0 replies; 67+ messages in thread
From: Julien Grall @ 2014-05-16 10:59 UTC (permalink / raw)
  To: Wei Huang, xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, andrew.cooper3, tim,
	jaeyong.yoo, jbeulich, ian.jackson, yjhyun.yoo

Hi Wei,

On 05/08/2014 10:18 PM, Wei Huang wrote:
> This patch implements log_dirty for ARM guest VMs. This feature
> is provided via two basic blocks: dirty_bit_map and VLPT
> (virtual-linear page table)
> 
> 1. VLPT provides fast accessing of 3rd PTE of guest P2M.
> When creating a mapping for VLPT, the page table mapping
> becomes the following:
>    xen's 1st PTE --> xen's 2nd PTE --> guest p2m's 2nd PTE -->
>    guest p2m's 3rd PTE
> 
> With VLPT, xen can immediately locate the 3rd PTE of guest P2M
> and modify PTE attirbute during dirty page tracking. The following
> link shows the performance comparison for handling a dirty-page
> between VLPT and typical page table walking.
> http://lists.xen.org/archives/html/xen-devel/2013-08/msg01503.html
> 
> For more info about VLPT, please see
> http://www.technovelty.org/linux/virtual-linear-page-table.html.
> 
> 2. Dirty bitmap
> The dirty bitmap is used to mark the pages which are dirty during
> migration. The info is used by Xen tools, via DOMCTL_SHADOW_OP_*,
> to figure out which guest pages need to be resent.

I think you forgot a case in the log dirty support. You don't handle
dirty page when Xen is writing data in the guest memory (with
raw_copy_to_guest).

I think we need to handle this case during live migration otherwise data
may be corrupted.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 67+ messages in thread

end of thread, other threads:[~2014-05-16 10:59 UTC | newest]

Thread overview: 67+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-08 21:18 [RFC v3 0/6] xen/arm: ARM save/restore/migration support Wei Huang
2014-05-08 21:18 ` [RFC v3 1/6] xen/arm: Add basic save/restore support for ARM Wei Huang
2014-05-08 22:11   ` Andrew Cooper
2014-05-08 22:20     ` Wei Huang
2014-05-09  8:56       ` Julien Grall
2014-05-14 10:27         ` Ian Campbell
2014-05-14 10:25     ` Ian Campbell
2014-05-14 10:46       ` Andrew Cooper
2014-05-14 13:22         ` Ian Campbell
2014-05-09  9:06   ` Julien Grall
2014-05-09  9:42   ` Jan Beulich
2014-05-14 10:37   ` Ian Campbell
2014-05-14 18:54     ` Wei Huang
2014-05-08 21:18 ` [RFC v3 2/6] xen/arm: Add save/restore support for ARM GIC V2 Wei Huang
2014-05-08 22:47   ` Andrew Cooper
2014-05-09 14:12     ` Wei Huang
2014-05-09 14:24       ` Ian Campbell
2014-05-11 16:15         ` Julien Grall
2014-05-13 14:53     ` Wei Huang
2014-05-09  9:17   ` Julien Grall
2014-05-14 11:07   ` Ian Campbell
2014-05-14 12:05     ` Julien Grall
2014-05-14 12:23       ` Tim Deegan
2014-05-14 13:24         ` Ian Campbell
2014-05-15 17:15   ` Julien Grall
2014-05-16  7:36     ` Ian Campbell
2014-05-08 21:18 ` [RFC v3 3/6] xen/arm: Add save/restore support for ARM arch timer Wei Huang
2014-05-08 23:02   ` Andrew Cooper
2014-05-11  9:01     ` Julien Grall
2014-05-11  8:58   ` Julien Grall
2014-05-12  8:35     ` Ian Campbell
2014-05-12 11:42       ` Julien Grall
2014-05-14 11:14   ` Ian Campbell
2014-05-14 12:13     ` Julien Grall
2014-05-14 13:23       ` Ian Campbell
2014-05-14 19:04     ` Wei Huang
2014-05-08 21:18 ` [RFC v3 4/6] xen/arm: Add save/restore support for guest core registers Wei Huang
2014-05-08 23:10   ` Andrew Cooper
2014-05-09 16:35     ` Wei Huang
2014-05-09 16:52       ` Ian Campbell
2014-05-11  9:06   ` Julien Grall
2014-05-14 11:16     ` Ian Campbell
2014-05-14 12:23       ` Julien Grall
2014-05-14 13:25         ` Ian Campbell
2014-05-14 13:31           ` Julien Grall
2014-05-14 11:37   ` Ian Campbell
2014-05-08 21:18 ` [RFC v3 5/6] xen/arm: Add log_dirty support for ARM Wei Huang
2014-05-08 23:46   ` Andrew Cooper
2014-05-14 11:51     ` Ian Campbell
2014-05-11 15:28   ` Julien Grall
2014-05-12 14:00     ` Wei Huang
2014-05-12 14:11       ` Julien Grall
2014-05-14 12:04         ` Ian Campbell
2014-05-14 11:57     ` Ian Campbell
2014-05-14 12:20       ` Julien Grall
2014-05-14 13:24         ` Ian Campbell
2014-05-14 13:18   ` Ian Campbell
2014-05-16 10:59   ` Julien Grall
2014-05-08 21:18 ` [RFC v3 6/6] xen/arm: Implement toolstack for xl restore/save/migration Wei Huang
2014-05-14 13:20   ` Ian Campbell
2014-05-14 13:24     ` Andrew Cooper
2014-05-11  9:23 ` [RFC v3 0/6] xen/arm: ARM save/restore/migration support Julien Grall
2014-05-12 14:37   ` Wei Huang
2014-05-13 14:41     ` Julien Grall
2014-05-12 14:17 ` Julien Grall
2014-05-12 14:52   ` Wei Huang
2014-05-12 15:01     ` Ian Campbell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.