* [RFC v2 0/6] xen/arm: Support guest VM save/restore/migration
@ 2014-04-15 21:05 Wei Huang
From: Wei Huang @ 2014-04-15 21:05 UTC
  To: xen-devel
  Cc: w1.huang, ian.campbell, stefano.stabellini, andrew.cooper3,
	julien.grall, jaeyong.yoo, yjhyun.yoo

This series is RFC v2 for save/restore/migration. The following 
areas have been addressed:
  * save and restore of guest state is split into specific areas (and files)
  * XENMEM_maximum_gpfn is now supported via the P2M's max_mapped_gfn field
  * names and layouts of some functions have been reworked
  * small areas commented on by Julien Grall and Andrew Cooper have been addressed

Note that:
  * previous comments by Ian are being examined.
  * patches 3-6 need more review attention.
  * RFC v3 will be sent out soon.

Let me know if there are issues with the design. 

Thanks,
-Wei

  xen/arm: Save and restore support with hvm context hypercalls
  xen/arm: implement support for XENMEM_maximum_gpfn hypercall
  xen/arm: support guest do_suspend function
  xen/arm: Implement VLPT for guest p2m mapping in live migration
  xen/arm: Implement hypercall for dirty page tracing
  xen/arm: Implement toolstack for xl restore/save and migrate

 config/arm32.mk                        |    1 +
 config/arm64.mk                        |    1 +
 tools/libxc/Makefile                   |    6 +-
 tools/libxc/xc_arm_migrate.c           |  702 ++++++++++++++++++++++++++++++++
 tools/libxc/xc_dom_arm.c               |    4 +-
 tools/libxc/xc_resume.c                |   25 ++
 tools/libxl/libxl.h                    |    3 -
 tools/misc/Makefile                    |    4 +-
 xen/arch/arm/Makefile                  |    1 +
 xen/arch/arm/domain.c                  |   19 +
 xen/arch/arm/domctl.c                  |   21 +
 xen/arch/arm/hvm.c                     |  268 +++++++++++-
 xen/arch/arm/mm.c                      |  242 ++++++++++-
 xen/arch/arm/p2m.c                     |  211 ++++++++++
 xen/arch/arm/save.c                    |   65 +++
 xen/arch/arm/traps.c                   |   11 +
 xen/arch/arm/vgic.c                    |  146 +++++++
 xen/arch/arm/vtimer.c                  |   71 ++++
 xen/arch/x86/domctl.c                  |   70 ----
 xen/common/Makefile                    |    2 +-
 xen/common/domctl.c                    |   74 ++++
 xen/include/asm-arm/config.h           |    7 +
 xen/include/asm-arm/domain.h           |   14 +
 xen/include/asm-arm/hvm/support.h      |   29 ++
 xen/include/asm-arm/mm.h               |   28 ++
 xen/include/asm-arm/p2m.h              |    8 +-
 xen/include/asm-arm/processor.h        |    2 +
 xen/include/public/arch-arm/hvm/save.h |  130 ++++++
 28 files changed, 2083 insertions(+), 82 deletions(-)
 create mode 100644 tools/libxc/xc_arm_migrate.c
 create mode 100644 xen/arch/arm/save.c
 create mode 100644 xen/include/asm-arm/hvm/support.h

-- 
1.7.9.5


* [RFC v2 1/6] xen/arm: Save and restore support with hvm context hypercalls
From: Wei Huang @ 2014-04-15 21:05 UTC
  To: xen-devel
  Cc: w1.huang, ian.campbell, stefano.stabellini, andrew.cooper3,
	julien.grall, jaeyong.yoo, yjhyun.yoo

From: Jaeyong Yoo <jaeyong.yoo@samsung.com>

This patch implements HVM context hypercalls to support ARM
guest save and restore. It saves the states of guest GIC,
arch timer, and CPU registers.
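
For illustration, here is a minimal sketch of how a toolstack could drive
these hypercalls via libxc's existing xc_domain_hvm_getcontext() wrapper
(the common-code domctl below makes it reachable on ARM). This is not part
of the patch; it assumes the usual two-call query/fetch convention and
elides error reporting:

  #include <xenctrl.h>
  #include <stdlib.h>

  /* Fetch a domain's HVM context blob: first call with a NULL buffer
   * to learn the required size, then again to copy the data out. */
  static int fetch_hvm_context(xc_interface *xch, uint32_t domid,
                               uint8_t **buf_out, uint32_t *len_out)
  {
      int len = xc_domain_hvm_getcontext(xch, domid, NULL, 0);
      uint8_t *buf;

      if ( len <= 0 )
          return -1;

      if ( (buf = malloc(len)) == NULL )
          return -1;

      if ( xc_domain_hvm_getcontext(xch, domid, buf, len) < 0 )
      {
          free(buf);
          return -1;
      }

      *buf_out = buf;
      *len_out = len;
      return 0;
  }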

Signed-off-by: Evgeny Fedotov <e.fedotov@samsung.com>
Signed-off-by: Wei Huang <w1.huang@samsung.com>
---
 xen/arch/arm/Makefile                  |    1 +
 xen/arch/arm/hvm.c                     |  268 +++++++++++++++++++++++++++++++-
 xen/arch/arm/save.c                    |   65 ++++++++
 xen/arch/arm/vgic.c                    |  146 +++++++++++++++++
 xen/arch/arm/vtimer.c                  |   71 +++++++++
 xen/arch/x86/domctl.c                  |   70 ---------
 xen/common/Makefile                    |    2 +-
 xen/common/domctl.c                    |   74 +++++++++
 xen/include/asm-arm/hvm/support.h      |   29 ++++
 xen/include/public/arch-arm/hvm/save.h |  130 ++++++++++++++++
 10 files changed, 784 insertions(+), 72 deletions(-)
 create mode 100644 xen/arch/arm/save.c
 create mode 100644 xen/include/asm-arm/hvm/support.h

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 63e0460..d9a328c 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -33,6 +33,7 @@ obj-y += hvm.o
 obj-y += device.o
 obj-y += decode.o
 obj-y += processor.o
+obj-y += save.o
 
 #obj-bin-y += ....o
 
diff --git a/xen/arch/arm/hvm.c b/xen/arch/arm/hvm.c
index 471c4cd..18eb2bd 100644
--- a/xen/arch/arm/hvm.c
+++ b/xen/arch/arm/hvm.c
@@ -4,6 +4,7 @@
 #include <xen/errno.h>
 #include <xen/guest_access.h>
 #include <xen/sched.h>
+#include <xen/hvm/save.h>
 
 #include <xsm/xsm.h>
 
@@ -12,9 +13,9 @@
 #include <public/hvm/hvm_op.h>
 
 #include <asm/hypercall.h>
+#include <asm/gic.h>
 
 long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
-
 {
     long rc = 0;
 
@@ -65,3 +66,268 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     return rc;
 }
+
+/* Save CPU-related state into the save/restore context */
+static int hvm_save_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
+{
+    struct hvm_hw_cpu ctxt;
+    struct vcpu_guest_core_regs c;
+    struct vcpu *v;
+    int ret = 0;
+
+    /* Save the state of CPU */
+    for_each_vcpu( d, v )
+    {
+        memset(&ctxt, 0, sizeof(ctxt));
+
+        ctxt.sctlr = v->arch.sctlr;
+        ctxt.ttbr0 = v->arch.ttbr0;
+        ctxt.ttbr1 = v->arch.ttbr1;
+        ctxt.ttbcr = v->arch.ttbcr;
+
+        ctxt.dacr = v->arch.dacr;
+        ctxt.ifsr = v->arch.ifsr;
+#ifdef CONFIG_ARM_32
+        ctxt.ifar = v->arch.ifar;
+        ctxt.dfar = v->arch.dfar;
+        ctxt.dfsr = v->arch.dfsr;
+#else
+        ctxt.far = v->arch.far;
+        ctxt.esr = v->arch.esr;
+#endif
+
+#ifdef CONFIG_ARM_32
+        ctxt.mair0 = v->arch.mair0;
+        ctxt.mair1 = v->arch.mair1;
+#else
+        ctxt.mair0 = v->arch.mair;
+#endif
+
+        /* Control Registers */
+        ctxt.actlr = v->arch.actlr;
+        ctxt.sctlr = v->arch.sctlr;
+        ctxt.cpacr = v->arch.cpacr;
+
+        ctxt.contextidr = v->arch.contextidr;
+        ctxt.tpidr_el0 = v->arch.tpidr_el0;
+        ctxt.tpidr_el1 = v->arch.tpidr_el1;
+        ctxt.tpidrro_el0 = v->arch.tpidrro_el0;
+
+        /* CP 15 */
+        ctxt.csselr = v->arch.csselr;
+
+        ctxt.afsr0 = v->arch.afsr0;
+        ctxt.afsr1 = v->arch.afsr1;
+        ctxt.vbar = v->arch.vbar;
+        ctxt.par = v->arch.par;
+        ctxt.teecr = v->arch.teecr;
+        ctxt.teehbr = v->arch.teehbr;
+
+#ifdef CONFIG_ARM_32
+        ctxt.joscr = v->arch.joscr;
+        ctxt.jmcr = v->arch.jmcr;
+#endif
+
+        memset(&c, 0, sizeof(c));
+
+        /* get guest core registers */
+        vcpu_regs_hyp_to_user(v, &c);
+
+        ctxt.x0 = c.x0;
+        ctxt.x1 = c.x1;
+        ctxt.x2 = c.x2;
+        ctxt.x3 = c.x3;
+        ctxt.x4 = c.x4;
+        ctxt.x5 = c.x5;
+        ctxt.x6 = c.x6;
+        ctxt.x7 = c.x7;
+        ctxt.x8 = c.x8;
+        ctxt.x9 = c.x9;
+        ctxt.x10 = c.x10;
+        ctxt.x11 = c.x11;
+        ctxt.x12 = c.x12;
+        ctxt.x13 = c.x13;
+        ctxt.x14 = c.x14;
+        ctxt.x15 = c.x15;
+        ctxt.x16 = c.x16;
+        ctxt.x17 = c.x17;
+        ctxt.x18 = c.x18;
+        ctxt.x19 = c.x19;
+        ctxt.x20 = c.x20;
+        ctxt.x21 = c.x21;
+        ctxt.x22 = c.x22;
+        ctxt.x23 = c.x23;
+        ctxt.x24 = c.x24;
+        ctxt.x25 = c.x25;
+        ctxt.x26 = c.x26;
+        ctxt.x27 = c.x27;
+        ctxt.x28 = c.x28;
+        ctxt.x29 = c.x29;
+        ctxt.x30 = c.x30;
+        ctxt.pc64 = c.pc64;
+        ctxt.cpsr = c.cpsr;
+        ctxt.spsr_el1 = c.spsr_el1; /* spsr_svc */
+
+#ifdef CONFIG_ARM_32
+        ctxt.spsr_fiq = c.spsr_fiq;
+        ctxt.spsr_irq = c.spsr_irq;
+        ctxt.spsr_und = c.spsr_und;
+        ctxt.spsr_abt = c.spsr_abt;
+#endif
+#ifdef CONFIG_ARM_64
+        ctxt.sp_el0 = c.sp_el0;
+        ctxt.sp_el1 = c.sp_el1;
+        ctxt.elr_el1 = c.elr_el1;
+#endif
+
+        /* check VFP state size */
+        BUILD_BUG_ON(sizeof(v->arch.vfp) > sizeof (ctxt.vfp));
+        memcpy((void*) &ctxt.vfp, (void*) &v->arch.vfp, sizeof(v->arch.vfp));
+
+        ctxt.pause_flags = v->pause_flags;
+
+        if ( (ret = hvm_save_entry(VCPU, v->vcpu_id, h, &ctxt)) != 0 )
+            return ret;
+    }
+
+    return ret;
+}
+
+/* Load CPU-related state from an existing save/restore context */
+static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
+{
+    int vcpuid;
+    struct hvm_hw_cpu ctxt;
+    struct vcpu *v;
+    struct vcpu_guest_core_regs c;
+
+    vcpuid = hvm_load_instance(h);
+    if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
+    {
+        dprintk(XENLOG_G_ERR, "HVM restore: dom%u has no vcpu%u\n",
+                d->domain_id, vcpuid);
+        return -EINVAL;
+    }
+
+    if ( hvm_load_entry(VCPU, h, &ctxt) != 0 )
+        return -EINVAL;
+
+    v->arch.sctlr = ctxt.sctlr;
+    v->arch.ttbr0 = ctxt.ttbr0;
+    v->arch.ttbr1 = ctxt.ttbr1;
+    v->arch.ttbcr = ctxt.ttbcr;
+
+    v->arch.dacr = ctxt.dacr;
+    v->arch.ifsr = ctxt.ifsr;
+#ifdef CONFIG_ARM_32
+    v->arch.ifar = ctxt.ifar;
+    v->arch.dfar = ctxt.dfar;
+    v->arch.dfsr = ctxt.dfsr;
+#else
+    v->arch.far = ctxt.far;
+    v->arch.esr = ctxt.esr;
+#endif
+
+#ifdef CONFIG_ARM_32
+    v->arch.mair0 = ctxt.mair0;
+    v->arch.mair1 = ctxt.mair1;
+#else
+    v->arch.mair = ctxt.mair0;
+#endif
+
+    /* Control Registers */
+    v->arch.actlr = ctxt.actlr;
+    v->arch.cpacr = ctxt.cpacr;
+    v->arch.contextidr = ctxt.contextidr;
+    v->arch.tpidr_el0 = ctxt.tpidr_el0;
+    v->arch.tpidr_el1 = ctxt.tpidr_el1;
+    v->arch.tpidrro_el0 = ctxt.tpidrro_el0;
+
+    /* CP 15 */
+    v->arch.csselr = ctxt.csselr;
+
+    v->arch.afsr0 = ctxt.afsr0;
+    v->arch.afsr1 = ctxt.afsr1;
+    v->arch.vbar = ctxt.vbar;
+    v->arch.par = ctxt.par;
+    v->arch.teecr = ctxt.teecr;
+    v->arch.teehbr = ctxt.teehbr;
+#ifdef CONFIG_ARM_32
+    v->arch.joscr = ctxt.joscr;
+    v->arch.jmcr = ctxt.jmcr;
+#endif
+
+    /* fill guest core registers */
+    memset(&c, 0, sizeof(c));
+    c.x0 = ctxt.x0;
+    c.x1 = ctxt.x1;
+    c.x2 = ctxt.x2;
+    c.x3 = ctxt.x3;
+    c.x4 = ctxt.x4;
+    c.x5 = ctxt.x5;
+    c.x6 = ctxt.x6;
+    c.x7 = ctxt.x7;
+    c.x8 = ctxt.x8;
+    c.x9 = ctxt.x9;
+    c.x10 = ctxt.x10;
+    c.x11 = ctxt.x11;
+    c.x12 = ctxt.x12;
+    c.x13 = ctxt.x13;
+    c.x14 = ctxt.x14;
+    c.x15 = ctxt.x15;
+    c.x16 = ctxt.x16;
+    c.x17 = ctxt.x17;
+    c.x18 = ctxt.x18;
+    c.x19 = ctxt.x19;
+    c.x20 = ctxt.x20;
+    c.x21 = ctxt.x21;
+    c.x22 = ctxt.x22;
+    c.x23 = ctxt.x23;
+    c.x24 = ctxt.x24;
+    c.x25 = ctxt.x25;
+    c.x26 = ctxt.x26;
+    c.x27 = ctxt.x27;
+    c.x28 = ctxt.x28;
+    c.x29 = ctxt.x29;
+    c.x30 = ctxt.x30;
+    c.pc64 = ctxt.pc64;
+    c.cpsr = ctxt.cpsr;
+    c.spsr_el1 = ctxt.spsr_el1; /* spsr_svc */
+
+#ifdef CONFIG_ARM_32
+    c.spsr_fiq = ctxt.spsr_fiq;
+    c.spsr_irq = ctxt.spsr_irq;
+    c.spsr_und = ctxt.spsr_und;
+    c.spsr_abt = ctxt.spsr_abt;
+#endif
+#ifdef CONFIG_ARM_64
+    c.sp_el0 = ctxt.sp_el0;
+    c.sp_el1 = ctxt.sp_el1;
+    c.elr_el1 = ctxt.elr_el1;
+#endif
+
+    /* set guest core registers */
+    vcpu_regs_user_to_hyp(v, &c);
+
+    /* check VFP state size */
+    BUILD_BUG_ON(sizeof(v->arch.vfp) > sizeof (ctxt.vfp));
+    memcpy(&v->arch.vfp, &ctxt.vfp, sizeof(v->arch.vfp));
+
+    v->is_initialised = 1;
+    v->pause_flags = ctxt.pause_flags;
+
+    return 0;
+}
+
+HVM_REGISTER_SAVE_RESTORE(VCPU, hvm_save_cpu_ctxt, hvm_load_cpu_ctxt, 1,
+                          HVMSR_PER_VCPU);
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/save.c b/xen/arch/arm/save.c
new file mode 100644
index 0000000..eef14a8
--- /dev/null
+++ b/xen/arch/arm/save.c
@@ -0,0 +1,65 @@
+/*
+ * save.c: Save and restore an HVM guest's emulated hardware state for ARM.
+ *
+ * Copyright (c) 2014, Samsung Electronics.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ */
+#include <asm/hvm/support.h>
+#include <public/hvm/save.h>
+
+void arch_hvm_save(struct domain *d, struct hvm_save_header *hdr)
+{
+    hdr->cpuid = current_cpu_data.midr.bits;
+}
+
+int arch_hvm_load(struct domain *d, struct hvm_save_header *hdr)
+{
+    uint32_t cpuid;
+
+    if ( hdr->magic != HVM_FILE_MAGIC )
+    {
+        printk(XENLOG_G_ERR "HVM%d restore: bad magic number %#"PRIx32"\n",
+               d->domain_id, hdr->magic);
+        return -EINVAL;
+    }
+
+    if ( hdr->version != HVM_FILE_VERSION )
+    {
+        printk(XENLOG_G_ERR "HVM%d restore: unsupported version %u\n",
+               d->domain_id, hdr->version);
+        return -EINVAL;
+    }
+
+    cpuid = current_cpu_data.midr.bits;
+    if ( hdr->cpuid != cpuid )
+    {
+        printk(XENLOG_G_INFO "HVM%d restore: VM saved on one CPU "
+               "(%#"PRIx32") and restored on another (%#"PRIx32").\n",
+               d->domain_id, hdr->cpuid, cpuid);
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 9fc9586..af244a7 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -24,6 +24,7 @@
 #include <xen/softirq.h>
 #include <xen/irq.h>
 #include <xen/sched.h>
+#include <xen/hvm/save.h>
 
 #include <asm/current.h>
 
@@ -73,6 +74,75 @@ static struct vgic_irq_rank *vgic_irq_rank(struct vcpu *v, int b, int n)
         return NULL;
 }
 
+/* Save rank info into a context to support domain save/restore */
+static int vgic_save_irq_rank(struct vcpu *v, struct vgic_rank *ext,
+                              struct vgic_irq_rank *rank)
+{
+    spin_lock(&rank->lock);
+
+    /* IENABLE, IACTIVE, IPEND, PENDSGI registers */
+    ext->ienable = rank->ienable;
+    ext->iactive = rank->iactive;
+    ext->ipend = rank->ipend;
+    ext->pendsgi = rank->pendsgi;
+
+    /* ICFG */
+    ext->icfg[0] = rank->icfg[0];
+    ext->icfg[1] = rank->icfg[1];
+
+    /* IPRIORITY */
+    BUILD_BUG_ON(sizeof(rank->ipriority) != sizeof (ext->ipriority));
+    memcpy(ext->ipriority, rank->ipriority, sizeof(rank->ipriority));
+
+    /* ITARGETS */
+    BUILD_BUG_ON(sizeof(rank->itargets) != sizeof (ext->itargets));
+    memcpy(ext->itargets, rank->itargets, sizeof(rank->itargets));
+
+    spin_unlock(&rank->lock);
+
+    return 0;
+}
+
+/* Load rank info from a context to support domain save/restore */
+static int vgic_load_irq_rank(struct vcpu *v, struct vgic_irq_rank *rank,
+                              struct vgic_rank *ext)
+{
+    struct pending_irq *p;
+    unsigned int irq = 0;
+    const unsigned long enable_bits = ext->ienable;
+
+    spin_lock(&rank->lock);
+
+    /* IENABLE, IACTIVE, IPEND, PENDSGI registers */
+    rank->ienable = ext->ienable;
+    rank->iactive = ext->iactive;
+    rank->ipend = ext->ipend;
+    rank->pendsgi = ext->pendsgi;
+
+    /* ICFG */
+    rank->icfg[0] = ext->icfg[0];
+    rank->icfg[1] = ext->icfg[1];
+
+    /* IPRIORITY */
+    BUILD_BUG_ON(sizeof(rank->ipriority) != sizeof (ext->ipriority));
+    memcpy(rank->ipriority, ext->ipriority, sizeof(rank->ipriority));
+
+    /* ITARGETS */
+    BUILD_BUG_ON(sizeof(rank->itargets) != sizeof (ext->itargets));
+    memcpy(rank->itargets, ext->itargets, sizeof(rank->itargets));
+
+    /* Set IRQ status as enabled by iterating through rank->ienable */
+    while ( (irq = find_next_bit(&enable_bits, 32, irq)) < 32 ) {
+        p = irq_to_pending(v, irq);
+        set_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
+        irq++;
+    }
+
+    spin_unlock(&rank->lock);
+
+    return 0;
+}
+
 int domain_vgic_init(struct domain *d)
 {
     int i;
@@ -749,6 +819,82 @@ out:
         smp_send_event_check_mask(cpumask_of(v->processor));
 }
 
+
+/* Save GIC state into a context to support save/restore */
+static int hvm_gic_save_ctxt(struct domain *d, hvm_domain_context_t *h)
+{
+    struct hvm_hw_gic ctxt;
+    struct vcpu *v;
+    int ret = 0;
+
+    /* Save the state of GICs */
+    for_each_vcpu( d, v )
+    {
+        ctxt.gic_hcr = v->arch.gic_hcr;
+        ctxt.gic_vmcr = v->arch.gic_vmcr;
+        ctxt.gic_apr = v->arch.gic_apr;
+
+        /* Save list registers and masks */
+        BUILD_BUG_ON(sizeof(v->arch.gic_lr) > sizeof (ctxt.gic_lr));
+        memcpy(ctxt.gic_lr, v->arch.gic_lr, sizeof(v->arch.gic_lr));
+
+        ctxt.lr_mask = v->arch.lr_mask;
+        ctxt.event_mask = v->arch.event_mask;
+
+        /* Save PPI states (per-CPU), necessary for SMP-enabled guests */
+        if ( (ret = vgic_save_irq_rank(v, &ctxt.ppi_state,
+                                       &v->arch.vgic.private_irqs)) != 0 )
+            return ret;
+
+        if ( (ret = hvm_save_entry(GIC, v->vcpu_id, h, &ctxt)) != 0 )
+            return ret;
+    }
+
+    return ret;
+}
+
+/* Restore GIC state from a context to support save/restore */
+static int hvm_gic_load_ctxt(struct domain *d, hvm_domain_context_t *h)
+{
+    int vcpuid;
+    struct hvm_hw_gic ctxt;
+    struct vcpu *v;
+    int ret = 0;
+
+    /* Which vcpu is this? */
+    vcpuid = hvm_load_instance(h);
+    if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
+    {
+        dprintk(XENLOG_G_ERR, "HVM restore: dom%u has no vcpu%u\n",
+                d->domain_id, vcpuid);
+        return -EINVAL;
+    }
+
+    if ( hvm_load_entry(GIC, h, &ctxt) != 0 )
+        return -EINVAL;
+
+    v->arch.gic_hcr = ctxt.gic_hcr;
+    v->arch.gic_vmcr = ctxt.gic_vmcr;
+    v->arch.gic_apr = ctxt.gic_apr;
+
+    /* Restore list registers and masks */
+    BUILD_BUG_ON(sizeof(v->arch.gic_lr) > sizeof (ctxt.gic_lr));
+    memcpy(v->arch.gic_lr, ctxt.gic_lr, sizeof(v->arch.gic_lr));
+
+    v->arch.lr_mask = ctxt.lr_mask;
+    v->arch.event_mask = ctxt.event_mask;
+
+    /* Restore PPI states */
+    if ( (ret = vgic_load_irq_rank(v, &v->arch.vgic.private_irqs,
+                                   &ctxt.ppi_state)) != 0 )
+        return ret;
+
+    return ret;
+}
+
+HVM_REGISTER_SAVE_RESTORE(GIC, hvm_gic_save_ctxt, hvm_gic_load_ctxt, 1,
+                          HVMSR_PER_VCPU);
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/arm/vtimer.c b/xen/arch/arm/vtimer.c
index 3d6a721..7c47eac 100644
--- a/xen/arch/arm/vtimer.c
+++ b/xen/arch/arm/vtimer.c
@@ -21,6 +21,7 @@
 #include <xen/lib.h>
 #include <xen/timer.h>
 #include <xen/sched.h>
+#include <xen/hvm/save.h>
 #include <asm/irq.h>
 #include <asm/time.h>
 #include <asm/gic.h>
@@ -284,6 +285,76 @@ int vtimer_emulate(struct cpu_user_regs *regs, union hsr hsr)
     }
 }
 
+static int hvm_vtimer_save_ctxt(struct domain *d, hvm_domain_context_t *h)
+{
+    struct hvm_hw_timer ctxt;
+    struct vcpu *v;
+    struct vtimer *t;
+    int i, ret = 0;
+
+    /* Save the state of vtimer and ptimer */
+    for_each_vcpu( d, v )
+    {
+        t = &v->arch.virt_timer;
+        for ( i = 0; i < 2; i++ )
+        {
+            ctxt.cval = t->cval;
+            ctxt.ctl = t->ctl;
+            ctxt.vtb_offset = i ? d->arch.phys_timer_base.offset :
+                d->arch.virt_timer_base.offset;
+            ctxt.type = i ? TIMER_TYPE_PHYS : TIMER_TYPE_VIRT;
+
+            if ( (ret = hvm_save_entry(TIMER, v->vcpu_id, h, &ctxt)) != 0 )
+                return ret;
+
+            t = &v->arch.phys_timer;
+        }
+    }
+
+    return ret;
+}
+
+static int hvm_vtimer_load_ctxt(struct domain *d, hvm_domain_context_t *h)
+{
+    int vcpuid;
+    struct hvm_hw_timer ctxt;
+    struct vcpu *v;
+    struct vtimer *t = NULL;
+
+    /* Which vcpu is this? */
+    vcpuid = hvm_load_instance(h);
+
+    if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
+    {
+        dprintk(XENLOG_G_ERR, "HVM restore: dom%u has no vcpu%u\n",
+                d->domain_id, vcpuid);
+        return -EINVAL;
+    }
+
+    if ( hvm_load_entry(TIMER, h, &ctxt) != 0 )
+        return -EINVAL;
+
+    if ( ctxt.type == TIMER_TYPE_VIRT )
+    {
+        t = &v->arch.virt_timer;
+        d->arch.virt_timer_base.offset = ctxt.vtb_offset;
+    }
+    else
+    {
+        t = &v->arch.phys_timer;
+        d->arch.phys_timer_base.offset = ctxt.vtb_offset;
+    }
+
+    t->cval = ctxt.cval;
+    t->ctl = ctxt.ctl;
+    t->v = v;
+
+    return 0;
+}
+
+HVM_REGISTER_SAVE_RESTORE(TIMER, hvm_vtimer_save_ctxt, hvm_vtimer_load_ctxt,
+                          2, HVMSR_PER_VCPU);
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 26635ff..30fbd30 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -399,76 +399,6 @@ long arch_do_domctl(
     }
     break;
 
-    case XEN_DOMCTL_sethvmcontext:
-    { 
-        struct hvm_domain_context c = { .size = domctl->u.hvmcontext.size };
-
-        ret = -EINVAL;
-        if ( !is_hvm_domain(d) ) 
-            goto sethvmcontext_out;
-
-        ret = -ENOMEM;
-        if ( (c.data = xmalloc_bytes(c.size)) == NULL )
-            goto sethvmcontext_out;
-
-        ret = -EFAULT;
-        if ( copy_from_guest(c.data, domctl->u.hvmcontext.buffer, c.size) != 0)
-            goto sethvmcontext_out;
-
-        domain_pause(d);
-        ret = hvm_load(d, &c);
-        domain_unpause(d);
-
-    sethvmcontext_out:
-        if ( c.data != NULL )
-            xfree(c.data);
-    }
-    break;
-
-    case XEN_DOMCTL_gethvmcontext:
-    { 
-        struct hvm_domain_context c = { 0 };
-
-        ret = -EINVAL;
-        if ( !is_hvm_domain(d) ) 
-            goto gethvmcontext_out;
-
-        c.size = hvm_save_size(d);
-
-        if ( guest_handle_is_null(domctl->u.hvmcontext.buffer) )
-        {
-            /* Client is querying for the correct buffer size */
-            domctl->u.hvmcontext.size = c.size;
-            ret = 0;
-            goto gethvmcontext_out;            
-        }
-
-        /* Check that the client has a big enough buffer */
-        ret = -ENOSPC;
-        if ( domctl->u.hvmcontext.size < c.size ) 
-            goto gethvmcontext_out;
-
-        /* Allocate our own marshalling buffer */
-        ret = -ENOMEM;
-        if ( (c.data = xmalloc_bytes(c.size)) == NULL )
-            goto gethvmcontext_out;
-
-        domain_pause(d);
-        ret = hvm_save(d, &c);
-        domain_unpause(d);
-
-        domctl->u.hvmcontext.size = c.cur;
-        if ( copy_to_guest(domctl->u.hvmcontext.buffer, c.data, c.size) != 0 )
-            ret = -EFAULT;
-
-    gethvmcontext_out:
-        copyback = 1;
-
-        if ( c.data != NULL )
-            xfree(c.data);
-    }
-    break;
-
     case XEN_DOMCTL_gethvmcontext_partial:
     { 
         ret = -EINVAL;
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 3683ae3..13b781f 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -62,7 +62,7 @@ obj-$(CONFIG_XENCOMM) += xencomm.o
 
 subdir-$(CONFIG_COMPAT) += compat
 
-subdir-$(x86_64) += hvm
+subdir-y += hvm
 
 subdir-$(coverage) += gcov
 
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 5342e5d..2ea4af5 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -24,6 +24,8 @@
 #include <xen/bitmap.h>
 #include <xen/paging.h>
 #include <xen/hypercall.h>
+#include <xen/hvm/save.h>
+#include <xen/guest_access.h>
 #include <asm/current.h>
 #include <asm/irq.h>
 #include <asm/page.h>
@@ -881,6 +883,78 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
     }
     break;
 
+    case XEN_DOMCTL_sethvmcontext:
+    {
+        struct hvm_domain_context c = { .size = op->u.hvmcontext.size };
+
+        ret = -EINVAL;
+        if ( (d == current->domain) || /* no domain_pause() */
+             !is_hvm_domain(d) )
+            goto sethvmcontext_out;
+
+        ret = -ENOMEM;
+        if ( (c.data = xmalloc_bytes(c.size)) == NULL )
+            goto sethvmcontext_out;
+
+        ret = -EFAULT;
+        if ( copy_from_guest(c.data, op->u.hvmcontext.buffer, c.size) != 0)
+            goto sethvmcontext_out;
+
+        domain_pause(d);
+        ret = hvm_load(d, &c);
+        domain_unpause(d);
+
+    sethvmcontext_out:
+        if ( c.data != NULL )
+            xfree(c.data);
+    }
+    break;
+
+    case XEN_DOMCTL_gethvmcontext:
+    {
+        struct hvm_domain_context c = { 0 };
+
+        ret = -EINVAL;
+        if ( (d == current->domain) || /* no domain_pause() */
+             !is_hvm_domain(d) )
+            goto gethvmcontext_out;
+
+        c.size = hvm_save_size(d);
+
+        if ( guest_handle_is_null(op->u.hvmcontext.buffer) )
+        {
+            /* Client is querying for the correct buffer size */
+            op->u.hvmcontext.size = c.size;
+            ret = 0;
+            goto gethvmcontext_out;
+        }
+
+        /* Check that the client has a big enough buffer */
+        ret = -ENOSPC;
+        if ( op->u.hvmcontext.size < c.size )
+            goto gethvmcontext_out;
+
+        /* Allocate our own marshalling buffer */
+        ret = -ENOMEM;
+        if ( (c.data = xmalloc_bytes(c.size)) == NULL )
+            goto gethvmcontext_out;
+
+        domain_pause(d);
+        ret = hvm_save(d, &c);
+        domain_unpause(d);
+
+        op->u.hvmcontext.size = c.cur;
+        if ( copy_to_guest(op->u.hvmcontext.buffer, c.data, c.size) != 0 )
+            ret = -EFAULT;
+
+    gethvmcontext_out:
+        copyback = 1;
+
+        if ( c.data != NULL )
+            xfree(c.data);
+    }
+    break;
+
     default:
         ret = arch_do_domctl(op, d, u_domctl);
         break;
diff --git a/xen/include/asm-arm/hvm/support.h b/xen/include/asm-arm/hvm/support.h
new file mode 100644
index 0000000..09f7cb8
--- /dev/null
+++ b/xen/include/asm-arm/hvm/support.h
@@ -0,0 +1,29 @@
+/*
+ * asm-arm/hvm/support.h: HVM support routines used by ARM.
+ *
+ * Copyright (c) 2014, Samsung Electronics.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ */
+
+#ifndef __ASM_ARM_HVM_SUPPORT_H__
+#define __ASM_ARM_HVM_SUPPORT_H__
+
+#include <xen/types.h>
+#include <public/hvm/ioreq.h>
+#include <xen/sched.h>
+#include <xen/hvm/save.h>
+#include <asm/processor.h>
+
+#endif /* __ASM_ARM_HVM_SUPPORT_H__ */
diff --git a/xen/include/public/arch-arm/hvm/save.h b/xen/include/public/arch-arm/hvm/save.h
index 75b8e65..f6ad258 100644
--- a/xen/include/public/arch-arm/hvm/save.h
+++ b/xen/include/public/arch-arm/hvm/save.h
@@ -26,6 +26,136 @@
 #ifndef __XEN_PUBLIC_HVM_SAVE_ARM_H__
 #define __XEN_PUBLIC_HVM_SAVE_ARM_H__
 
+#define HVM_FILE_MAGIC   0x92385520
+#define HVM_FILE_VERSION 0x00000001
+
+struct hvm_save_header
+{
+    uint32_t magic;             /* Must be HVM_FILE_MAGIC */
+    uint32_t version;           /* File format version */
+    uint64_t changeset;         /* Version of Xen that saved this file */
+    uint32_t cpuid;             /* MIDR_EL1 on the saving machine */
+};
+DECLARE_HVM_SAVE_TYPE(HEADER, 1, struct hvm_save_header);
+
+struct vgic_rank
+{
+    uint32_t ienable, iactive, ipend, pendsgi;
+    uint32_t icfg[2];
+    uint32_t ipriority[8];
+    uint32_t itargets[8];
+};
+
+struct hvm_hw_gic
+{
+    uint32_t gic_hcr;
+    uint32_t gic_vmcr;
+    uint32_t gic_apr;
+    uint32_t gic_lr[64];
+    uint64_t event_mask;
+    uint64_t lr_mask;
+    struct vgic_rank ppi_state;
+};
+DECLARE_HVM_SAVE_TYPE(GIC, 2, struct hvm_hw_gic);
+
+#define TIMER_TYPE_VIRT 0
+#define TIMER_TYPE_PHYS 1
+
+struct hvm_hw_timer
+{
+    uint64_t vtb_offset;
+    uint32_t ctl;
+    uint64_t cval;
+    uint32_t type;
+};
+DECLARE_HVM_SAVE_TYPE(TIMER, 3, struct hvm_hw_timer);
+
+struct hvm_hw_cpu
+{
+#ifdef CONFIG_ARM_32
+    uint64_t vfp[34];  /* 32-bit VFP registers */
+#else
+    uint64_t vfp[66];  /* 64-bit VFP registers */
+#endif
+
+    /* Guest core registers */
+    uint64_t x0;     /* r0_usr */
+    uint64_t x1;     /* r1_usr */
+    uint64_t x2;     /* r2_usr */
+    uint64_t x3;     /* r3_usr */
+    uint64_t x4;     /* r4_usr */
+    uint64_t x5;     /* r5_usr */
+    uint64_t x6;     /* r6_usr */
+    uint64_t x7;     /* r7_usr */
+    uint64_t x8;     /* r8_usr */
+    uint64_t x9;     /* r9_usr */
+    uint64_t x10;    /* r10_usr */
+    uint64_t x11;    /* r11_usr */
+    uint64_t x12;    /* r12_usr */
+    uint64_t x13;    /* sp_usr */
+    uint64_t x14;    /* lr_usr */
+    uint64_t x15;    /* __unused_sp_hyp */
+    uint64_t x16;    /* lr_irq */
+    uint64_t x17;    /* sp_irq */
+    uint64_t x18;    /* lr_svc */
+    uint64_t x19;    /* sp_svc */
+    uint64_t x20;    /* lr_abt */
+    uint64_t x21;    /* sp_abt */
+    uint64_t x22;    /* lr_und */
+    uint64_t x23;    /* sp_und */
+    uint64_t x24;    /* r8_fiq */
+    uint64_t x25;    /* r9_fiq */
+    uint64_t x26;    /* r10_fiq */
+    uint64_t x27;    /* r11_fiq */
+    uint64_t x28;    /* r12_fiq */
+    uint64_t x29;    /* fp,sp_fiq */
+    uint64_t x30;    /* lr_fiq */
+    uint64_t pc64;   /* ELR_EL2 */
+    uint32_t cpsr;   /* SPSR_EL2 */
+    uint32_t spsr_el1;  /*spsr_svc */
+    /* AArch32 guests only */
+    uint32_t spsr_fiq, spsr_irq, spsr_und, spsr_abt;
+    /* AArch64 guests only */
+    uint64_t sp_el0;
+    uint64_t sp_el1, elr_el1;
+
+    uint32_t sctlr, ttbcr;
+    uint64_t ttbr0, ttbr1;
+
+    uint32_t ifar, dfar;
+    uint32_t ifsr, dfsr;
+    uint32_t dacr;
+    uint64_t par;
+
+    uint64_t far;
+    uint64_t esr;
+
+    uint64_t mair0, mair1;
+    uint64_t tpidr_el0;
+    uint64_t tpidr_el1;
+    uint64_t tpidrro_el0;
+    uint64_t vbar;
+
+    /* Control Registers */
+    uint32_t actlr;
+    uint32_t cpacr;
+    uint32_t afsr0, afsr1;
+    uint32_t contextidr;
+    uint32_t teecr, teehbr; /* ThumbEE, 32-bit guests only */
+    uint32_t joscr, jmcr;
+    /* CP 15 */
+    uint32_t csselr;
+
+    unsigned long pause_flags;
+
+};
+DECLARE_HVM_SAVE_TYPE(VCPU, 4, struct hvm_hw_cpu);
+
+/*
+ * Largest type-code in use
+ */
+#define HVM_SAVE_CODE_MAX 4
+
 #endif
 
 /*
-- 
1.7.9.5


* [RFC v2 2/6] xen/arm: implement support for XENMEM_maximum_gpfn hypercall
From: Wei Huang @ 2014-04-15 21:05 UTC
  To: xen-devel
  Cc: w1.huang, ian.campbell, stefano.stabellini, andrew.cooper3,
	julien.grall, jaeyong.yoo, yjhyun.yoo

From: Jaeyong Yoo <jaeyong.yoo@samsung.com>

This patch implements domain_get_maximum_gpfn by using the max_mapped_gfn
field of the P2M struct. A support function to retrieve the guest VM's PFN
range is also added.
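
As a hedged illustration (not part of this patch), the toolstack can reach
this through libxc's wrapper for XENMEM_maximum_gpfn as it existed at the
time:

  #include <xenctrl.h>
  #include <stdio.h>

  /* Print the highest guest PFN; with this patch the ARM hypervisor
   * derives it from p2m->max_mapped_gfn instead of returning -ENOSYS. */
  static void print_max_gpfn(xc_interface *xch, uint32_t domid)
  {
      int max_gpfn = xc_domain_maximum_gpfn(xch, domid);

      if ( max_gpfn < 0 )
          fprintf(stderr, "XENMEM_maximum_gpfn failed\n");
      else
          printf("dom%u max gpfn: %#x\n", domid, max_gpfn);
  }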

Signed-off-by: Evgeny Fedotov <e.fedotov@samsung.com>
Signed-off-by: Wei Huang <w1.huang@samsung.com>
---
 xen/arch/arm/mm.c        |   21 ++++++++++++++++++++-
 xen/include/asm-arm/mm.h |    1 +
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 362bc8d..473ad04 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -947,7 +947,11 @@ int page_is_ram_type(unsigned long mfn, unsigned long mem_type)
 
 unsigned long domain_get_maximum_gpfn(struct domain *d)
 {
-    return -ENOSYS;
+    paddr_t end;
+
+    domain_get_gpfn_range(d, NULL, &end);
+
+    return (unsigned long)(end >> PAGE_SHIFT);
 }
 
 void share_xen_page_with_guest(struct page_info *page,
@@ -1235,6 +1239,21 @@ int is_iomem_page(unsigned long mfn)
         return 1;
     return 0;
 }
+
+/*
+ * Return the start and end addresses of the guest VM's RAM
+ */
+void domain_get_gpfn_range(struct domain *d, paddr_t *start, paddr_t *end)
+{
+    struct p2m_domain *p2m = &d->arch.p2m;
+
+    if ( start )
+        *start = GUEST_RAM_BASE;
+
+    if ( end )
+        *end = GUEST_RAM_BASE + ((paddr_t)p2m->max_mapped_gfn << PAGE_SHIFT);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index b8d4e7d..8347524 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -319,6 +319,7 @@ int donate_page(
 #define domain_set_alloc_bitsize(d) ((void)0)
 #define domain_clamp_alloc_bitsize(d, b) (b)
 
+void domain_get_gpfn_range(struct domain *d, paddr_t *start, paddr_t *end);
 unsigned long domain_get_maximum_gpfn(struct domain *d);
 
 extern struct domain *dom_xen, *dom_io, *dom_cow;
-- 
1.7.9.5


* [RFC v2 3/6] xen/arm: support guest do_suspend function
From: Wei Huang @ 2014-04-15 21:05 UTC
  To: xen-devel
  Cc: w1.huang, ian.campbell, stefano.stabellini, andrew.cooper3,
	julien.grall, jaeyong.yoo, yjhyun.yoo

From: Jaeyong Yoo <jaeyong.yoo@samsung.com>

Make sched_op in do_suspend (drivers/xen/manage.c) return 0 on
successful suspend.
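
For context, a simplified sketch of the guest-side logic this enables,
modelled loosely on Linux's do_suspend(); the post-restore helper name is
hypothetical and the hypercall-wrapper convention is the generic one:

  /* A zero return from the suspend hypercall conventionally means the
   * domain was restored from a save image; non-zero means the suspend
   * was cancelled.  The libxc change below patches r0 of vcpu0 so an
   * ARM guest observes the intended value. */
  static int xen_arm_do_suspend(void)
  {
      int cancelled = HYPERVISOR_suspend(0);   /* ARM guests pass 0 */

      if ( !cancelled )
          xen_post_restore_fixup();  /* hypothetical: reconnect frontends */

      return cancelled;
  }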

Signed-off-by: Alexey Sokolov <sokolov.a@samsung.com>
Signed-off-by: Wei Huang <w1.huang@samsung.com>
---
 tools/libxc/xc_resume.c |   25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
index 18b4818..2b09990 100644
--- a/tools/libxc/xc_resume.c
+++ b/tools/libxc/xc_resume.c
@@ -73,6 +73,31 @@ static int modify_returncode(xc_interface *xch, uint32_t domid)
     return 0;
 }
 
+#elif defined(__arm__) || defined(__aarch64__)
+
+static int modify_returncode(xc_interface *xch, uint32_t domid)
+{
+    vcpu_guest_context_any_t ctxt;
+    xc_dominfo_t info;
+    int rc;
+
+    if ( xc_domain_getinfo(xch, domid, 1, &info) != 1 )
+    {
+        PERROR("Could not get domain info");
+        return -EINVAL;
+    }
+
+    if ( (rc = xc_vcpu_getcontext(xch, domid, 0, &ctxt)) != 0 )
+        return rc;
+
+    ctxt.c.user_regs.r0_usr = 1;
+
+    if ( (rc = xc_vcpu_setcontext(xch, domid, 0, &ctxt)) != 0 )
+        return rc;
+
+    return 0;
+}
+
 #else
 
 static int modify_returncode(xc_interface *xch, uint32_t domid)
-- 
1.7.9.5


* [RFC v2 4/6] xen/arm: Implement VLPT for guest p2m mapping in live migration
From: Wei Huang @ 2014-04-15 21:05 UTC
  To: xen-devel
  Cc: w1.huang, ian.campbell, stefano.stabellini, andrew.cooper3,
	julien.grall, jaeyong.yoo, yjhyun.yoo

From: Jaeyong Yoo <jaeyong.yoo@samsung.com>

This patch implements VLPT (virtual-linear page table) for fast access
to the 3rd-level PTEs of the guest P2M. For more info about VLPT, please
see http://www.technovelty.org/linux/virtual-linear-page-table.html.

When creating a mapping for VLPT, just copy the 1st level PTE of guest p2m
to xen's 2nd level PTE. Then the mapping becomes the following:
      xen's 1st PTE -->
      xen's 2nd PTE (which is the same as 1st PTE of guest p2m) -->
      guest p2m's 2nd PTE -->
      guest p2m's 3rd PTE (the memory contents where the vlpt points)

This mapping is used in dirty-page tracing: when a domU write fault is
trapped by Xen, Xen can immediately locate the 3rd-level PTE of the guest
p2m.

The following link shows the performance comparison for handling a
dirty-page between vlpt and typical page table walking.
http://lists.xen.org/archives/html/xen-devel/2013-08/msg01503.html
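
To make the address arithmetic concrete, a worked example (assuming 4K
pages and 8-byte LPAE entries, matching get_vlpt_3lvl_pte() in the diff
below):

  /*
   * Each 4K third-level p2m page holds 512 lpae_t entries covering 2MB
   * of guest physical address space.  Because the guest's first-level
   * p2m entries are copied into the xen second-level slots that back
   * VIRT_LIN_P2M_START, those third-level pages appear as one linear
   * array of lpae_t, so:
   *
   *   index of leaf PTE for a GPA = GPA >> PAGE_SHIFT
   *   address of leaf PTE         = VIRT_LIN_P2M_START + index * 8
   *
   * e.g. for GPA 0x80203000 (with guest RAM at 0x80000000), the index
   * is 0x80203 and the PTE sits at VIRT_LIN_P2M_START + 0x401018 --
   * one shift and one add instead of a three-level software walk.
   */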

Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com>
---
 xen/arch/arm/domain.c        |    5 ++
 xen/arch/arm/mm.c            |  116 ++++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-arm/config.h |    7 +++
 xen/include/asm-arm/domain.h |    7 +++
 xen/include/asm-arm/mm.h     |   17 +++++++
 5 files changed, 152 insertions(+)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index b125857..3f04a77 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -502,6 +502,11 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
     /* Default the virtual ID to match the physical */
     d->arch.vpidr = boot_cpu_data.midr.bits;
 
+    d->arch.dirty.second_lvl_start = 0;
+    d->arch.dirty.second_lvl_end = 0;
+    d->arch.dirty.second_lvl[0] = NULL;
+    d->arch.dirty.second_lvl[1] = NULL;
+
     clear_page(d->shared_info);
     share_xen_page_with_guest(
         virt_to_page(d->shared_info), d, XENSHARE_writable);
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 473ad04..a315752 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -750,6 +750,122 @@ void *__init arch_vmap_virt_end(void)
     return (void *)VMAP_VIRT_END;
 }
 
+/* Flush the vlpt area */
+void flush_vlpt(struct domain *d)
+{
+    int flush_size;
+
+    flush_size = (d->arch.dirty.second_lvl_end -
+                  d->arch.dirty.second_lvl_start) << SECOND_SHIFT;
+
+    /* flushing the 3rd level mapping */
+    flush_xen_data_tlb_range_va(d->arch.dirty.second_lvl_start << SECOND_SHIFT,
+                                flush_size);
+}
+
+/* Restore the xen page table for vlpt mapping for domain */
+void restore_vlpt(struct domain *d)
+{
+    int i;
+
+    dsb(sy);
+
+    for ( i = d->arch.dirty.second_lvl_start; i < d->arch.dirty.second_lvl_end;
+          ++i )
+    {
+        int k = i % LPAE_ENTRIES;
+        int l = i / LPAE_ENTRIES;
+
+        if ( xen_second[i].bits != d->arch.dirty.second_lvl[l][k].bits )
+        {
+            write_pte(&xen_second[i], d->arch.dirty.second_lvl[l][k]);
+            flush_xen_data_tlb_range_va(i << SECOND_SHIFT, 1 << SECOND_SHIFT);
+        }
+    }
+
+    dsb(sy);
+    isb();
+}
+
+/* Set up the xen page table for vlpt mapping for domain */
+int prepare_vlpt(struct domain *d)
+{
+    int xen_second_linear_base;
+    int gp2m_start_index, gp2m_end_index;
+    struct p2m_domain *p2m = &d->arch.p2m;
+    struct page_info *second_lvl_page;
+    paddr_t gma_start = 0;
+    paddr_t gma_end = 0;
+    lpae_t *first[2];
+    int i;
+    uint64_t required, avail = VIRT_LIN_P2M_END - VIRT_LIN_P2M_START;
+
+    domain_get_gpfn_range(d, &gma_start, &gma_end);
+    required = (gma_end - gma_start) >> LPAE_SHIFT;
+
+    if ( required > avail )
+    {
+        dprintk(XENLOG_ERR, "Available VLPT is small for domU guest"
+                "(avail: %llx, required: %llx)\n", (unsigned long long)avail,
+                (unsigned long long)required);
+        return -ENOMEM;
+    }
+
+    xen_second_linear_base = second_linear_offset(VIRT_LIN_P2M_START);
+
+    gp2m_start_index = gma_start >> FIRST_SHIFT;
+    gp2m_end_index = (gma_end >> FIRST_SHIFT) + 1;
+
+    if ( xen_second_linear_base + gp2m_end_index >= LPAE_ENTRIES * 2 )
+    {
+        dprintk(XENLOG_ERR, "xen second page is small for VLPT for domU");
+        return -ENOMEM;
+    }
+
+    second_lvl_page = alloc_domheap_pages(NULL, 1, 0);
+    if ( second_lvl_page == NULL )
+        return -ENOMEM;
+
+    /* First level p2m is 2 consecutive pages */
+    d->arch.dirty.second_lvl[0] = map_domain_page_global(
+        page_to_mfn(second_lvl_page) );
+    d->arch.dirty.second_lvl[1] = map_domain_page_global(
+        page_to_mfn(second_lvl_page+1) );
+
+    first[0] = __map_domain_page(p2m->first_level);
+    first[1] = __map_domain_page(p2m->first_level+1);
+
+    for ( i = gp2m_start_index; i < gp2m_end_index; ++i )
+    {
+        int k = i % LPAE_ENTRIES;
+        int l = i / LPAE_ENTRIES;
+        int k2 = (xen_second_linear_base + i) % LPAE_ENTRIES;
+        int l2 = (xen_second_linear_base + i) / LPAE_ENTRIES;
+
+        write_pte(&xen_second[xen_second_linear_base+i], first[l][k]);
+
+        /* We copy the mapping into the domain's structure as a reference,
+         * for use at context switch time (see restore_vlpt) */
+        d->arch.dirty.second_lvl[l2][k2] = first[l][k];
+    }
+    unmap_domain_page(first[0]);
+    unmap_domain_page(first[1]);
+
+    /* storing the start and end index */
+    d->arch.dirty.second_lvl_start = xen_second_linear_base + gp2m_start_index;
+    d->arch.dirty.second_lvl_end = xen_second_linear_base + gp2m_end_index;
+
+    flush_vlpt(d);
+
+    return 0;
+}
+
+void cleanup_vlpt(struct domain *d)
+{
+    /* First level p2m is 2 consecutive pages */
+    unmap_domain_page_global(d->arch.dirty.second_lvl[0]);
+    unmap_domain_page_global(d->arch.dirty.second_lvl[1]);
+}
 /*
  * This function should only be used to remap device address ranges
  * TODO: add a check to verify this assumption
diff --git a/xen/include/asm-arm/config.h b/xen/include/asm-arm/config.h
index ef291ff..47d1bce 100644
--- a/xen/include/asm-arm/config.h
+++ b/xen/include/asm-arm/config.h
@@ -87,6 +87,7 @@
  *   0  -   8M   <COMMON>
  *
  *  32M - 128M   Frametable: 24 bytes per page for 16GB of RAM
+ * 128M - 256M   Virtual-linear mapping to P2M table
  * 256M -   1G   VMAP: ioremap and early_ioremap use this virtual address
  *                    space
  *
@@ -124,7 +125,9 @@
 #define CONFIG_SEPARATE_XENHEAP 1
 
 #define FRAMETABLE_VIRT_START  _AT(vaddr_t,0x02000000)
+#define VIRT_LIN_P2M_START     _AT(vaddr_t,0x08000000)
 #define VMAP_VIRT_START  _AT(vaddr_t,0x10000000)
+#define VIRT_LIN_P2M_END       VMAP_VIRT_START
 #define XENHEAP_VIRT_START     _AT(vaddr_t,0x40000000)
 #define XENHEAP_VIRT_END       _AT(vaddr_t,0x7fffffff)
 #define DOMHEAP_VIRT_START     _AT(vaddr_t,0x80000000)
@@ -157,6 +160,10 @@
 
 #define HYPERVISOR_VIRT_END    DIRECTMAP_VIRT_END
 
+/* VIRT_LIN_P2M_START and VIRT_LIN_P2M_END for vlpt */
+#define VIRT_LIN_P2M_START     _AT(vaddr_t, 0x08000000)
+#define VIRT_LIN_P2M_END       VMAP_VIRT_START
+
 #endif
 
 /* Fixmap slots */
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 28c359a..5321bd6 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -161,6 +161,13 @@ struct arch_domain
         spinlock_t                  lock;
     } vuart;
 
+    /* dirty-page tracing */
+    struct {
+        volatile int second_lvl_start;   /* for context switch */
+        volatile int second_lvl_end;
+        lpae_t *second_lvl[2];           /* copy of guest p2m's first */
+    } dirty;
+
     unsigned int evtchn_irq;
 }  __cacheline_aligned;
 
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index 8347524..5fd684f 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -4,6 +4,7 @@
 #include <xen/config.h>
 #include <xen/kernel.h>
 #include <asm/page.h>
+#include <asm/config.h>
 #include <public/xen.h>
 
 /* Align Xen to a 2 MiB boundary. */
@@ -342,6 +343,22 @@ static inline void put_page_and_type(struct page_info *page)
     put_page(page);
 }
 
+int prepare_vlpt(struct domain *d);
+void cleanup_vlpt(struct domain *d);
+void restore_vlpt(struct domain *d);
+
+/* calculate the xen's virtual address for accessing the leaf PTE of
+ * a given address (GPA) */
+static inline lpae_t * get_vlpt_3lvl_pte(paddr_t addr)
+{
+    lpae_t *table = (lpae_t *)VIRT_LIN_P2M_START;
+
+    /* Since we slotted the guest's first p2m page table to xen's
+     * second page table, one shift is enough for calculating the
+     * index of guest p2m table entry */
+    return &table[addr >> PAGE_SHIFT];
+}
+
 #endif /*  __ARCH_ARM_MM__ */
 /*
  * Local variables:
-- 
1.7.9.5


* [RFC v2 5/6] xen/arm: Implement hypercall for dirty page tracing
From: Wei Huang @ 2014-04-15 21:05 UTC
  To: xen-devel
  Cc: w1.huang, ian.campbell, stefano.stabellini, andrew.cooper3,
	julien.grall, jaeyong.yoo, yjhyun.yoo

From: Jaeyong Yoo <jaeyong.yoo@samsung.com>

This patch adds a hypercall for shadow operations, including
enabling/disabling dirty tracing and cleaning/peeking the dirty page
bitmap.

The design consists of two parts: dirty page detection and saving. For
detection, we set the guest p2m's leaf PTEs read-only; whenever the guest
tries to write, a permission fault traps into Xen. The faulting GPA must
be recorded for the toolstack, which checks which pages are dirty; for
this purpose Xen temporarily saves the GPAs into a bitmap.
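
For illustration, a hedged sketch of how a migration toolstack would be
expected to drive this through libxc's classic xc_shadow_control()
wrapper; buffer allocation is elided and error handling is omitted, so
this is not part of the patch:

  #include <xenctrl.h>

  /* One log-dirty cycle, sketched.  'dirty_bitmap' must be a
   * hypercall-safe buffer (DECLARE_HYPERCALL_BUFFER + alloc, elided)
   * holding at least one bit per guest page. */
  static void logdirty_cycle(xc_interface *xch, uint32_t domid,
                             xc_hypercall_buffer_t *dirty_bitmap,
                             unsigned long nr_pages)
  {
      xc_shadow_op_stats_t stats;

      /* 1. Flip the guest's p2m leaf PTEs into read-only log-dirty mode. */
      xc_shadow_control(xch, domid, XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY,
                        NULL, 0, NULL, 0, NULL);

      /* 2. Each pre-copy round: fetch-and-clear the dirty bitmap, then
       *    resend the pages whose bits are set. */
      xc_shadow_control(xch, domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
                        dirty_bitmap, nr_pages, NULL, 0, &stats);

      /* 3. After the final round, switch dirty tracing off again. */
      xc_shadow_control(xch, domid, XEN_DOMCTL_SHADOW_OP_OFF,
                        NULL, 0, NULL, 0, NULL);
  }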

Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com>
Signed-off-by: Wei Huang <w1.huang@samsung.com>
---
 xen/arch/arm/domain.c           |   14 +++
 xen/arch/arm/domctl.c           |   21 ++++
 xen/arch/arm/mm.c               |  105 ++++++++++++++++++-
 xen/arch/arm/p2m.c              |  211 +++++++++++++++++++++++++++++++++++++++
 xen/arch/arm/traps.c            |   11 ++
 xen/include/asm-arm/domain.h    |    7 ++
 xen/include/asm-arm/mm.h        |   10 ++
 xen/include/asm-arm/p2m.h       |    8 +-
 xen/include/asm-arm/processor.h |    2 +
 9 files changed, 387 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 3f04a77..d2531ed 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -207,6 +207,12 @@ static void ctxt_switch_to(struct vcpu *n)
 
     isb();
 
+    /* Dirty-page tracing
+     * NB: How do we consider SMP case?
+     */
+    if ( n->domain->arch.dirty.mode )
+        restore_vlpt(n->domain);
+
     /* This is could trigger an hardware interrupt from the virtual
      * timer. The interrupt needs to be injected into the guest. */
     virt_timer_restore(n);
@@ -502,11 +508,19 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
     /* Default the virtual ID to match the physical */
     d->arch.vpidr = boot_cpu_data.midr.bits;
 
+    /* init for dirty-page tracing */
+    d->arch.dirty.count = 0;
+    d->arch.dirty.mode = 0;
+    spin_lock_init(&d->arch.dirty.lock);
+
     d->arch.dirty.second_lvl_start = 0;
     d->arch.dirty.second_lvl_end = 0;
     d->arch.dirty.second_lvl[0] = NULL;
     d->arch.dirty.second_lvl[1] = NULL;
 
+    memset(d->arch.dirty.bitmap, 0, sizeof(d->arch.dirty.bitmap));
+    d->arch.dirty.bitmap_pages = 0;
+
     clear_page(d->shared_info);
     share_xen_page_with_guest(
         virt_to_page(d->shared_info), d, XENSHARE_writable);
diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c
index 45974e7..e84651f 100644
--- a/xen/arch/arm/domctl.c
+++ b/xen/arch/arm/domctl.c
@@ -11,12 +11,33 @@
 #include <xen/sched.h>
 #include <xen/hypercall.h>
 #include <public/domctl.h>
+#include <xen/hvm/save.h>
+#include <xen/guest_access.h>
+
 
 long arch_do_domctl(struct xen_domctl *domctl, struct domain *d,
                     XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
 {
+    long ret = 0;
+
     switch ( domctl->cmd )
     {
+    case XEN_DOMCTL_shadow_op:
+    {
+        if ( d == current->domain ) /* no domain_pause() */
+            return -EINVAL;
+
+        domain_pause(d);
+        ret = dirty_mode_op(d, &domctl->u.shadow_op);
+        domain_unpause(d);
+
+        if ( __copy_to_guest(u_domctl, domctl, 1) )
+            ret = -EFAULT;
+
+        return ret;
+    }
+    break;
+
     case XEN_DOMCTL_cacheflush:
     {
         unsigned long s = domctl->u.cacheflush.start_pfn;
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index a315752..ae852eb 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -981,7 +981,6 @@ void destroy_xen_mappings(unsigned long v, unsigned long e)
     create_xen_entries(REMOVE, v, 0, (e - v) >> PAGE_SHIFT, 0);
 }
 
-enum mg { mg_clear, mg_ro, mg_rw, mg_rx };
 static void set_pte_flags_on_range(const char *p, unsigned long l, enum mg mg)
 {
     lpae_t pte;
@@ -1370,6 +1369,110 @@ void domain_get_gpfn_range(struct domain *d, paddr_t *start, paddr_t *end)
         *end = GUEST_RAM_BASE + ((paddr_t) p2m->max_mapped_gfn);
 }
 
+static inline void mark_dirty_bitmap(struct domain *d, paddr_t addr)
+{
+    paddr_t ram_base = (paddr_t) GUEST_RAM_BASE;
+    int bit_index = PFN_DOWN(addr - ram_base);
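+    /* Each bitmap page holds PAGE_SIZE * 8 bits, one per guest page,
+     * hence the PAGE_SHIFT + 3 split below. */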
+    int page_index = bit_index >> (PAGE_SHIFT + 3);
+    int bit_index_residual = bit_index & ((1ul << (PAGE_SHIFT + 3)) - 1);
+
+    set_bit(bit_index_residual, d->arch.dirty.bitmap[page_index]);
+}
+
+/* Routine for dirty-page tracing
+ *
+ * On first write, it page faults, its entry is changed to read-write,
+ * and on retry the write succeeds. For locating p2m of the faulting entry,
+ * we use virtual-linear page table.
+ *
+ * Returns zero if addr is not valid or dirty mode is not set
+ */
+int handle_page_fault(struct domain *d, paddr_t addr)
+{
+
+    lpae_t *vlp2m_pte = 0;
+    paddr_t gma_start = 0;
+    paddr_t gma_end = 0;
+
+    if ( !d->arch.dirty.mode )
+        return 0;
+
+    domain_get_gpfn_range(d, &gma_start, &gma_end);
+    /* Ensure that addr is inside guest's RAM */
+    if ( addr < gma_start || addr > gma_end )
+        return 0;
+
+    vlp2m_pte = get_vlpt_3lvl_pte(addr);
+    if ( vlp2m_pte->p2m.valid && vlp2m_pte->p2m.write == 0 &&
+         vlp2m_pte->p2m.type == p2m_ram_logdirty )
+    {
+        lpae_t pte = *vlp2m_pte;
+        pte.p2m.write = 1;
+        write_pte(vlp2m_pte, pte);
+        flush_tlb_local();
+
+        /* only necessary to lock between get-dirty bitmap and mark dirty
+         * bitmap. If get-dirty bitmap happens immediately before this
+         * lock, the corresponding dirty-page would be marked at the next
+         * round of get-dirty bitmap */
+        spin_lock(&d->arch.dirty.lock);
+        mark_dirty_bitmap(d, addr);
+        spin_unlock(&d->arch.dirty.lock);
+    }
+
+    return 1;
+}
+
+int prepare_bitmap(struct domain *d)
+{
+    paddr_t gma_start = 0;
+    paddr_t gma_end = 0;
+    int nr_bytes;
+    int nr_pages;
+    int i;
+
+    domain_get_gpfn_range(d, &gma_start, &gma_end);
+
+    nr_bytes = (PFN_DOWN(gma_end - gma_start) + 7) / 8;
+    nr_pages = (nr_bytes + PAGE_SIZE - 1) / PAGE_SIZE;
+
+    BUG_ON( nr_pages > MAX_DIRTY_BITMAP_PAGES );
+
+    for ( i = 0; i < nr_pages; ++i )
+    {
+        struct page_info *page;
+
+        page = alloc_domheap_page(NULL, 0);
+        if ( page == NULL )
+            goto cleanup_on_failure;
+
+        d->arch.dirty.bitmap[i] = map_domain_page_global(__page_to_mfn(page));
+        clear_page(d->arch.dirty.bitmap[i]);
+    }
+
+    d->arch.dirty.bitmap_pages = nr_pages;
+    return 0;
+
+cleanup_on_failure:
+    nr_pages = i;
+    for ( i = 0; i < nr_pages; ++i )
+    {
+        unmap_domain_page_global(d->arch.dirty.bitmap[i]);
+    }
+
+    return -ENOMEM;
+}
+
+void cleanup_bitmap(struct domain *d)
+{
+    int i;
+
+    for ( i = 0; i < d->arch.dirty.bitmap_pages; ++i )
+    {
+        unmap_domain_page_global(d->arch.dirty.bitmap[i]);
+    }
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 403fd89..d57a44a 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -6,6 +6,8 @@
 #include <xen/bitops.h>
 #include <asm/flushtlb.h>
 #include <asm/gic.h>
+#include <xen/guest_access.h>
+#include <xen/pfn.h>
 #include <asm/event.h>
 #include <asm/hardirq.h>
 #include <asm/page.h>
@@ -208,6 +210,7 @@ static lpae_t mfn_to_p2m_entry(unsigned long mfn, unsigned int mattr,
         break;
 
     case p2m_ram_ro:
+    case p2m_ram_logdirty:
         e.p2m.xn = 0;
         e.p2m.write = 0;
         break;
@@ -261,6 +264,10 @@ static int p2m_create_table(struct domain *d,
 
     pte = mfn_to_p2m_entry(page_to_mfn(page), MATTR_MEM, p2m_invalid);
 
+    /* Clear the ro bit (the page-table analogue of the write bit) so the
+     * entry stays writable when accessed via the VLPT */
+    pte.pt.ro = 0;
+
     write_pte(entry, pte);
 
     return 0;
@@ -697,6 +704,210 @@ unsigned long gmfn_to_mfn(struct domain *d, unsigned long gpfn)
     return p >> PAGE_SHIFT;
 }
 
+/* Change types across all p2m entries in a domain */
+void p2m_change_entry_type_global(struct domain *d, enum mg nt)
+{
+    struct p2m_domain *p2m = &d->arch.p2m;
+    paddr_t ram_base;
+    int i1, i2, i3;
+    int first_index, second_index, third_index;
+    lpae_t *first = __map_domain_page(p2m->first_level);
+    lpae_t pte, *second = NULL, *third = NULL;
+
+    domain_get_gpfn_range(d, &ram_base, NULL);
+
+    first_index = first_table_offset((uint64_t)ram_base);
+    second_index = second_table_offset((uint64_t)ram_base);
+    third_index = third_table_offset((uint64_t)ram_base);
+
+    BUG_ON( !first && "Can't map first level p2m." );
+
+    spin_lock(&p2m->lock);
+
+    for ( i1 = first_index; i1 < LPAE_ENTRIES*2; ++i1 )
+    {
+        lpae_walk_t first_pte = first[i1].walk;
+
+        if ( !first_pte.valid || !first_pte.table )
+            goto out;
+
+        second = map_domain_page(first_pte.base);
+        BUG_ON( !second && "Can't map second level p2m.");
+
+        for ( i2 = second_index; i2 < LPAE_ENTRIES; ++i2 )
+        {
+            lpae_walk_t second_pte = second[i2].walk;
+
+            if ( !second_pte.valid || !second_pte.table )
+                goto out;
+
+            third = map_domain_page(second_pte.base);
+            BUG_ON( !third && "Can't map third level p2m.");
+
+            for ( i3 = third_index; i3 < LPAE_ENTRIES; ++i3 )
+            {
+
+                lpae_walk_t third_pte = third[i3].walk;
+                if ( !third_pte.valid )
+                    goto out;
+
+                pte = third[i3];
+                if ( nt == mg_ro )
+                {
+                    if ( pte.p2m.write == 1 )
+                    {
+                        pte.p2m.write = 0;
+                        pte.p2m.type = p2m_ram_logdirty;
+                    }
+                    else
+                    {
+                        /* reuse avail bit as an indicator of 'actual'
+                         * read-only */
+                        pte.p2m.type = p2m_ram_rw;
+                    }
+                }
+                else if ( nt == mg_rw )
+                {
+                    if ( pte.p2m.write == 0 &&
+                         pte.p2m.type == p2m_ram_logdirty )
+                    {
+                        pte.p2m.write = 1;
+                        pte.p2m.type = p2m_ram_rw;
+                    }
+                }
+                write_pte(&third[i3], pte);
+            }
+            unmap_domain_page(third);
+
+            third = NULL;
+            third_index = 0;
+        }
+        unmap_domain_page(second);
+
+        second = NULL;
+        second_index = 0;
+        third_index = 0;
+    }
+
+out:
+    flush_tlb_all_local();
+    if ( third ) unmap_domain_page(third);
+    if ( second ) unmap_domain_page(second);
+    if ( first ) unmap_domain_page(first);
+
+    spin_unlock(&p2m->lock);
+}
+
+/* Read a domain's log-dirty bitmap and stats.
+ * If the operation is a CLEAN, clear the bitmap and stats. */
+int log_dirty_op(struct domain *d, xen_domctl_shadow_op_t *sc)
+{
+    int peek = 1;
+    int i;
+    int bitmap_size;
+    paddr_t gma_start, gma_end;
+
+    /* This hypercall is issued from domain 0 and we don't know which
+     * guest's VLPT is currently mapped in xen_second, so restore this
+     * domain's VLPT here to be safe. */
+    restore_vlpt(d);
+
+    domain_get_gpfn_range(d, &gma_start, &gma_end);
+    bitmap_size = PFN_DOWN(gma_end - gma_start) / 8;
+
+    if ( guest_handle_is_null(sc->dirty_bitmap) )
+    {
+        peek = 0;
+    }
+    else
+    {
+        spin_lock(&d->arch.dirty.lock);
+        for ( i = 0; i < d->arch.dirty.bitmap_pages; ++i )
+        {
+            int j = 0;
+            uint8_t *bitmap;
+            copy_to_guest_offset(sc->dirty_bitmap, i * PAGE_SIZE,
+                                 d->arch.dirty.bitmap[i],
+                                 bitmap_size < PAGE_SIZE ? bitmap_size :
+                                                           PAGE_SIZE);
+            bitmap_size -= PAGE_SIZE;
+
+            /* set p2m page table read-only */
+            bitmap = d->arch.dirty.bitmap[i];
+            while ((j = find_next_bit((const long unsigned int *)bitmap,
+                                      PAGE_SIZE*8, j)) < PAGE_SIZE*8)
+            {
+                lpae_t *vlpt;
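+                /* Bit (i,j) covers one guest page: with 4KB pages, each
+                 * bitmap page holds PAGE_SIZE * 8 = 32768 bits, i.e. 128MB
+                 * of guest RAM; hence the i << (2*PAGE_SHIFT+3) term. */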
+                paddr_t addr = gma_start +
+                    ((paddr_t)i << (2*PAGE_SHIFT+3)) +
+                    ((paddr_t)j << PAGE_SHIFT);
+                vlpt = get_vlpt_3lvl_pte(addr);
+                vlpt->p2m.write = 0;
+                j++;
+            }
+        }
+
+        if ( sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN )
+        {
+            for ( i = 0; i < d->arch.dirty.bitmap_pages; ++i )
+            {
+                clear_page(d->arch.dirty.bitmap[i]);
+            }
+        }
+
+        spin_unlock(&d->arch.dirty.lock);
+        flush_tlb_local();
+    }
+
+    sc->stats.dirty_count = d->arch.dirty.count;
+
+    return 0;
+}
+
+long dirty_mode_op(struct domain *d, xen_domctl_shadow_op_t *sc)
+{
+    long ret = 0;
+    switch (sc->op)
+    {
+        case XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY:
+        case XEN_DOMCTL_SHADOW_OP_OFF:
+        {
+            enum mg nt = sc->op == XEN_DOMCTL_SHADOW_OP_OFF ? mg_rw : mg_ro;
+
+            d->arch.dirty.mode = sc->op == XEN_DOMCTL_SHADOW_OP_OFF ? 0 : 1;
+            p2m_change_entry_type_global(d, nt);
+
+            if ( sc->op == XEN_DOMCTL_SHADOW_OP_OFF )
+            {
+                cleanup_vlpt(d);
+                cleanup_bitmap(d);
+            }
+            else
+            {
+                if ( (ret = prepare_vlpt(d)) )
+                   return ret;
+
+                if ( (ret = prepare_bitmap(d)) )
+                {
+                   /* in case of failure, we have to cleanup vlpt */
+                   cleanup_vlpt(d);
+                   return ret;
+                }
+            }
+        }
+        break;
+
+        case XEN_DOMCTL_SHADOW_OP_CLEAN:
+        case XEN_DOMCTL_SHADOW_OP_PEEK:
+        {
+            ret = log_dirty_op(d, sc);
+        }
+        break;
+
+        default:
+            return -ENOSYS;
+    }
+
+    return ret;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index a7edc4e..cca34e9 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1491,6 +1491,8 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs,
     struct hsr_dabt dabt = hsr.dabt;
     int rc;
     mmio_info_t info;
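+    /* A write taking a permission fault at the 3rd (leaf) level is a
+     * candidate log-dirty trap; see handle_page_fault() below. */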
+    int page_fault = ( (dabt.dfsc & FSC_MASK) ==
+                       (FSC_FLT_PERM | FSC_3D_LEVEL) && dabt.write );
 
     if ( !check_conditional_instr(regs, hsr) )
     {
@@ -1512,6 +1514,15 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs,
     if ( rc == -EFAULT )
         goto bad_data_abort;
 
+    /* DomU page-fault handling for guest live migration. Note that
+     * dabt.valid may be 0 here.
+     */
+    if ( page_fault && handle_page_fault(current->domain, info.gpa) )
+    {
+        /* Do not modify pc after page fault to repeat memory operation */
+        return;
+    }
+
     /* XXX: Decode the instruction if ISS is not valid */
     if ( !dabt.valid )
         goto bad_data_abort;
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 5321bd6..99f9f51 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -163,9 +163,16 @@ struct arch_domain
 
     /* dirty-page tracing */
     struct {
+#define MAX_DIRTY_BITMAP_PAGES 64        /* 64 x 128MB = up to 8GB of guest memory */
+        spinlock_t lock;                 /* protect the dirty bitmap */
+        volatile int mode;               /* 1 if dirty-page tracing is enabled */
+        volatile unsigned int count;     /* dirty pages counter */
         volatile int second_lvl_start;   /* for context switch */
         volatile int second_lvl_end;
         lpae_t *second_lvl[2];           /* copy of guest p2m's first */
+        /* dirty bitmap */
+        uint8_t *bitmap[MAX_DIRTY_BITMAP_PAGES];
+        int bitmap_pages;                /* number of dirty bitmap pages */
     } dirty;
 
     unsigned int evtchn_irq;
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index 5fd684f..5f9478b 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -343,10 +343,18 @@ static inline void put_page_and_type(struct page_info *page)
     put_page(page);
 }
 
+enum mg { mg_clear, mg_ro, mg_rw, mg_rx };
+
+/* routine for dirty-page tracing */
+int handle_page_fault(struct domain *d, paddr_t addr);
+
 int prepare_vlpt(struct domain *d);
 void cleanup_vlpt(struct domain *d);
 void restore_vlpt(struct domain *d);
 
+int prepare_bitmap(struct domain *d);
+void cleanup_bitmap(struct domain *d);
+
 /* calculate the xen's virtual address for accessing the leaf PTE of
  * a given address (GPA) */
 static inline lpae_t * get_vlpt_3lvl_pte(paddr_t addr)
@@ -359,6 +367,8 @@ static inline lpae_t * get_vlpt_3lvl_pte(paddr_t addr)
     return &table[addr >> PAGE_SHIFT];
 }
 
+void domain_get_gpfn_range(struct domain *d, paddr_t *start, paddr_t *end);
+
 #endif /*  __ARCH_ARM_MM__ */
 /*
  * Local variables:
diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
index bd71abe..0cecbe7 100644
--- a/xen/include/asm-arm/p2m.h
+++ b/xen/include/asm-arm/p2m.h
@@ -2,6 +2,7 @@
 #define _XEN_P2M_H
 
 #include <xen/mm.h>
+#include <public/domctl.h>
 
 struct domain;
 
@@ -41,6 +42,7 @@ typedef enum {
     p2m_invalid = 0,    /* Nothing mapped here */
     p2m_ram_rw,         /* Normal read/write guest RAM */
     p2m_ram_ro,         /* Read-only; writes are silently dropped */
+    p2m_ram_logdirty,   /* Read-only; write-protected for log-dirty tracking */
     p2m_mmio_direct,    /* Read/write mapping of genuine MMIO area */
     p2m_map_foreign,    /* Ram pages from foreign domain */
     p2m_grant_map_rw,   /* Read/write grant mapping */
@@ -49,7 +51,8 @@ typedef enum {
 } p2m_type_t;
 
 #define p2m_is_foreign(_t)  ((_t) == p2m_map_foreign)
-#define p2m_is_ram(_t)      ((_t) == p2m_ram_rw || (_t) == p2m_ram_ro)
+#define p2m_is_ram(_t)      ((_t) == p2m_ram_rw || (_t) == p2m_ram_ro ||  \
+                             (_t) == p2m_ram_logdirty)
 
 /* Initialise vmid allocator */
 void p2m_vmid_allocator_init(void);
@@ -178,6 +181,9 @@ static inline int get_page_and_type(struct page_info *page,
     return rc;
 }
 
+void p2m_change_entry_type_global(struct domain *d, enum mg nt);
+long dirty_mode_op(struct domain *d, xen_domctl_shadow_op_t *sc);
+
 #endif /* _XEN_P2M_H */
 
 /*
diff --git a/xen/include/asm-arm/processor.h b/xen/include/asm-arm/processor.h
index 06e638f..9dc49c3 100644
--- a/xen/include/asm-arm/processor.h
+++ b/xen/include/asm-arm/processor.h
@@ -399,6 +399,8 @@ union hsr {
 #define FSC_CPR        (0x3a) /* Coprocossor Abort */
 
 #define FSC_LL_MASK    (_AC(0x03,U)<<0)
+#define FSC_MASK       (0x3f) /* Fault status mask */
+#define FSC_3D_LEVEL   (0x03) /* Third level fault */
 
 /* Time counter hypervisor control register */
 #define CNTHCTL_PA      (1u<<0)  /* Kernel/user access to physical counter */
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC v2 6/6] xen/arm: Implement toolstack for xl restore/save and migrate
  2014-04-15 21:05 [RFC v2 0/6] xen/arm: Support guest VM save/restore/migration Wei Huang
                   ` (4 preceding siblings ...)
  2014-04-15 21:05 ` [RFC v2 5/6] xen/arm: Implement hypercall for dirty page tracing Wei Huang
@ 2014-04-15 21:05 ` Wei Huang
  2014-04-15 23:40   ` Andrew Cooper
  2014-04-15 21:05 ` [RFC v2 0/6] xen/arm: Support guest VM save/restore/migration Wei Huang
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 47+ messages in thread
From: Wei Huang @ 2014-04-15 21:05 UTC (permalink / raw)
  To: xen-devel
  Cc: w1.huang, ian.campbell, stefano.stabellini, andrew.cooper3,
	julien.grall, jaeyong.yoo, yjhyun.yoo

From: Jaeyong Yoo <jaeyong.yoo@samsung.com>

This patch implements the xl save/restore operation in xc_arm_migrate.c
and makes it compile within the existing design. The same code path is
used for migration.

The overall process of save is the following (the resulting stream
layout is sketched after this list):
1) save guest parameters (e.g., memory map, console and store PFNs)
2) save memory (for live migration, with dirty-page tracing)
3) save hvm states (GIC, timer, vCPU registers, etc.)
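
A sketch of the stream layout these steps produce (derived from the
code in this patch; not a formally versioned record format):

  guest_params_t                     raw struct, fixed size
  { xen_pfn_t gpfn; page data } ...  repeated, across live iterations
  (xen_pfn_t)-1                      end-of-memory marker
  uint32_t rec_size + HVM context    buffer from xc_domain_hvm_getcontext()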

Signed-off-by: Alexey Sokolov <sokolov.a@samsung.com>
Signed-off-by: Wei Huang <w1.huang@samsung.com>
---
 config/arm32.mk              |    1 +
 config/arm64.mk              |    1 +
 tools/libxc/Makefile         |    6 +-
 tools/libxc/xc_arm_migrate.c |  702 ++++++++++++++++++++++++++++++++++++++++++
 tools/libxc/xc_dom_arm.c     |    4 +-
 tools/libxl/libxl.h          |    3 -
 tools/misc/Makefile          |    4 +-
 7 files changed, 714 insertions(+), 7 deletions(-)
 create mode 100644 tools/libxc/xc_arm_migrate.c

diff --git a/config/arm32.mk b/config/arm32.mk
index aa79d22..01374c9 100644
--- a/config/arm32.mk
+++ b/config/arm32.mk
@@ -1,6 +1,7 @@
 CONFIG_ARM := y
 CONFIG_ARM_32 := y
 CONFIG_ARM_$(XEN_OS) := y
+CONFIG_MIGRATE := y
 
 CONFIG_XEN_INSTALL_SUFFIX :=
 
diff --git a/config/arm64.mk b/config/arm64.mk
index 15b57a4..7ac3b65 100644
--- a/config/arm64.mk
+++ b/config/arm64.mk
@@ -1,6 +1,7 @@
 CONFIG_ARM := y
 CONFIG_ARM_64 := y
 CONFIG_ARM_$(XEN_OS) := y
+CONFIG_MIGRATE := y
 
 CONFIG_XEN_INSTALL_SUFFIX :=
 
diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index 2cca2b2..6b90b1c 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -43,8 +43,13 @@ CTRL_SRCS-$(CONFIG_MiniOS) += xc_minios.c
 GUEST_SRCS-y :=
 GUEST_SRCS-y += xg_private.c xc_suspend.c
 ifeq ($(CONFIG_MIGRATE),y)
+ifeq ($(CONFIG_X86),y)
 GUEST_SRCS-y += xc_domain_restore.c xc_domain_save.c
 GUEST_SRCS-y += xc_offline_page.c xc_compression.c
+endif
+ifeq ($(CONFIG_ARM),y)
+GUEST_SRCS-y += xc_arm_migrate.c
+endif
 else
 GUEST_SRCS-y += xc_nomigrate.c
 endif
@@ -64,7 +69,6 @@ $(patsubst %.c,%.opic,$(ELF_SRCS-y)): CFLAGS += -Wno-pointer-sign
 GUEST_SRCS-y                 += xc_dom_core.c xc_dom_boot.c
 GUEST_SRCS-y                 += xc_dom_elfloader.c
 GUEST_SRCS-$(CONFIG_X86)     += xc_dom_bzimageloader.c
-GUEST_SRCS-$(CONFIG_X86)     += xc_dom_decompress_lz4.c
 GUEST_SRCS-$(CONFIG_ARM)     += xc_dom_armzimageloader.c
 GUEST_SRCS-y                 += xc_dom_binloader.c
 GUEST_SRCS-y                 += xc_dom_compat_linux.c
diff --git a/tools/libxc/xc_arm_migrate.c b/tools/libxc/xc_arm_migrate.c
new file mode 100644
index 0000000..ab2b94c
--- /dev/null
+++ b/tools/libxc/xc_arm_migrate.c
@@ -0,0 +1,702 @@
+/*
+ * Copyright (c) 2013, Samsung Electronics
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ */
+
+#include <inttypes.h>
+#include <errno.h>
+#include <xenctrl.h>
+#include <xenguest.h>
+
+#include <unistd.h>
+#include <xc_private.h>
+#include <xc_dom.h>
+#include "xc_bitops.h"
+#include "xg_private.h"
+
+#define DEF_MAX_ITERS          29 /* limit us to 30 rounds of the loop */
+#define DEF_MAX_FACTOR         3  /* never send more than 3x p2m_size  */
+#define DEF_MIN_DIRTY_PER_ITER 50 /* dirty-page count that triggers the last iter */
+#define DEF_PROGRESS_RATE      50 /* progress bar update rate */
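+
+/*
+ * Example of how the stop conditions combine: for a 512MB guest
+ * (131072 4KB pages), the live loop ends once the previous iteration
+ * dirtied fewer than DEF_MIN_DIRTY_PER_ITER pages, after DEF_MAX_ITERS
+ * rounds, or when more than DEF_MAX_FACTOR * 131072 pages have been
+ * sent in total.
+ */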
+
+//#define DISABLE_LIVE_MIGRATION
+
+//#define ARM_MIGRATE_VERBOSE
+
+/*
+ * Guest params to save: used HVM params, save flags, memory map
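+ *
+ * Note: the struct is written to and read back from the stream verbatim
+ * (see save_guest_params/restore_guest_params), so both ends must be
+ * built with an identical layout.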
+ */
+typedef struct guest_params
+{
+    unsigned long console_pfn;
+    unsigned long store_pfn;
+    uint32_t flags;
+    xen_pfn_t start_gpfn;
+    xen_pfn_t max_gpfn;
+    uint32_t max_vcpu_id;
+} guest_params_t;
+
+static int suspend_and_state(int (*suspend)(void*), void *data,
+                             xc_interface *xch, int dom)
+{
+    xc_dominfo_t info;
+    if ( !(*suspend)(data) )
+    {
+        ERROR("Suspend request failed");
+        return -1;
+    }
+
+    if ( (xc_domain_getinfo(xch, dom, 1, &info) != 1) ||
+         !info.shutdown || (info.shutdown_reason != SHUTDOWN_suspend) )
+    {
+        ERROR("Domain is not in suspended state after suspend attempt");
+        return -1;
+    }
+
+    return 0;
+}
+
+static int write_exact_handled(xc_interface *xch, int fd, const void *data,
+                               size_t size)
+{
+    if ( write_exact(fd, data, size) )
+    {
+        ERROR("Write failed, check space");
+        return -1;
+    }
+    return 0;
+}
+
+/* ============ Memory ============= */
+static int save_memory(xc_interface *xch, int io_fd, uint32_t dom,
+                       struct save_callbacks *callbacks,
+                       uint32_t max_iters, uint32_t max_factor,
+                       guest_params_t *params)
+{
+    int live =  !!(params->flags & XCFLAGS_LIVE);
+    int debug =  !!(params->flags & XCFLAGS_DEBUG);
+    xen_pfn_t i;
+    char reportbuf[80];
+    int iter = 0;
+    int last_iter = !live;
+    int total_dirty_pages_num = 0;
+    int dirty_pages_on_prev_iter_num = 0;
+    int count = 0;
+    char *page = 0;
+    xen_pfn_t *busy_pages = 0;
+    int busy_pages_count = 0;
+    int busy_pages_max = 256;
+
+    DECLARE_HYPERCALL_BUFFER(unsigned long, to_send);
+
+    xen_pfn_t start = params->start_gpfn;
+    const xen_pfn_t end = params->max_gpfn;
+    const xen_pfn_t mem_size = end - start;
+
+    if ( debug )
+    {
+        IPRINTF("(save mem) start=%llx end=%llx!\n", (unsigned long long)start,
+                (unsigned long long)end);
+    }
+
+    if ( live )
+    {
+        if ( xc_shadow_control(xch, dom, XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY,
+                    NULL, 0, NULL, 0, NULL) < 0 )
+        {
+            ERROR("Couldn't enable log-dirty mode !\n");
+            return -1;
+        }
+
+        max_iters  = max_iters  ? : DEF_MAX_ITERS;
+        max_factor = max_factor ? : DEF_MAX_FACTOR;
+
+        if ( debug )
+            IPRINTF("Log-dirty mode enabled, max_iters=%d, max_factor=%d!\n",
+                    max_iters, max_factor);
+    }
+
+    to_send = xc_hypercall_buffer_alloc_pages(xch, to_send,
+                                              NRPAGES(bitmap_size(mem_size)));
+    if ( !to_send )
+    {
+        ERROR("Couldn't allocate to_send array!\n");
+        return -1;
+    }
+
+    /* send all pages on first iter */
+    memset(to_send, 0xff, bitmap_size(mem_size));
+
+    for ( ; ; )
+    {
+        int dirty_pages_on_current_iter_num = 0;
+        int frc;
+        iter++;
+
+        snprintf(reportbuf, sizeof(reportbuf),
+                 "Saving memory: iter %d (last sent %u)",
+                 iter, dirty_pages_on_prev_iter_num);
+
+        xc_report_progress_start(xch, reportbuf, mem_size);
+
+        if ( (iter > 1 &&
+              dirty_pages_on_prev_iter_num < DEF_MIN_DIRTY_PER_ITER) ||
+             (iter == max_iters) ||
+             (total_dirty_pages_num >= mem_size*max_factor) )
+        {
+            if ( debug )
+                IPRINTF("Last iteration");
+            last_iter = 1;
+        }
+
+        if ( last_iter )
+        {
+            if ( suspend_and_state(callbacks->suspend, callbacks->data,
+                                   xch, dom) )
+            {
+                ERROR("Domain appears not to have suspended");
+                return -1;
+            }
+        }
+        if ( live && iter > 1 )
+        {
+            frc = xc_shadow_control(xch, dom, XEN_DOMCTL_SHADOW_OP_CLEAN,
+                                    HYPERCALL_BUFFER(to_send), mem_size,
+                                                     NULL, 0, NULL);
+            if ( frc != mem_size )
+            {
+                ERROR("Error peeking shadow bitmap");
+                xc_hypercall_buffer_free_pages(xch, to_send,
+                                               NRPAGES(bitmap_size(mem_size)));
+                return -1;
+            }
+        }
+
+        busy_pages = malloc(sizeof(xen_pfn_t) * busy_pages_max);
+
+        for ( i = start; i < end; ++i )
+        {
+            if ( test_bit(i - start, to_send) )
+            {
+                page = xc_map_foreign_range(xch, dom, PAGE_SIZE, PROT_READ, i);
+                if ( !page )
+                {
+                    /* This page is mapped elsewhere, should be resent later */
+                    busy_pages[busy_pages_count] = i;
+                    busy_pages_count++;
+                    if ( busy_pages_count >= busy_pages_max )
+                    {
+                        busy_pages_max += 256;
+                        busy_pages = realloc(busy_pages, sizeof(xen_pfn_t) *
+                                                         busy_pages_max);
+                    }
+                    continue;
+                }
+
+                if ( write_exact_handled(xch, io_fd, &i, sizeof(i)) ||
+                     write_exact_handled(xch, io_fd, page, PAGE_SIZE) )
+                {
+                    munmap(page, PAGE_SIZE);
+                    free(busy_pages);
+                    return -1;
+                }
+                count++;
+                munmap(page, PAGE_SIZE);
+
+                if ( (i % DEF_PROGRESS_RATE) == 0 )
+                    xc_report_progress_step(xch, i - start, mem_size);
+                dirty_pages_on_current_iter_num++;
+            }
+        }
+
+        while ( busy_pages_count )
+        {
+            /* Send busy pages */
+            busy_pages_count--;
+            i = busy_pages[busy_pages_count];
+            if ( test_bit(i - start, to_send) )
+            {
+                page = xc_map_foreign_range(xch, dom, PAGE_SIZE,PROT_READ, i);
+                if ( !page )
+                {
+                    IPRINTF("WARNING: 2nd attempt to save page "
+                            "busy failed pfn=%llx", (unsigned long long)i);
+                    continue;
+                }
+
+                if ( debug )
+                {
+                    IPRINTF("save mem: resend busy page %llx\n",
+                            (unsigned long long)i);
+                }
+
+                if ( write_exact_handled(xch, io_fd, &i, sizeof(i)) ||
+                     write_exact_handled(xch, io_fd, page, PAGE_SIZE) )
+                {
+                    munmap(page, PAGE_SIZE);
+                    free(busy_pages);
+                    return -1;
+                }
+                count++;
+                munmap(page, PAGE_SIZE);
+                dirty_pages_on_current_iter_num++;
+            }
+        }
+        free(busy_pages);
+
+        if ( debug )
+            IPRINTF("Dirty pages=%d", dirty_pages_on_current_iter_num);
+
+        xc_report_progress_step(xch, mem_size, mem_size);
+
+        dirty_pages_on_prev_iter_num = dirty_pages_on_current_iter_num;
+        total_dirty_pages_num += dirty_pages_on_current_iter_num;
+
+        if ( last_iter )
+        {
+            xc_hypercall_buffer_free_pages(xch, to_send,
+                                           NRPAGES(bitmap_size(mem_size)));
+            if ( live )
+            {
+                if ( xc_shadow_control(xch, dom, XEN_DOMCTL_SHADOW_OP_OFF,
+                                       NULL, 0, NULL, 0, NULL) < 0 )
+                    ERROR("Couldn't disable log-dirty mode");
+            }
+            break;
+        }
+    }
+    if ( debug )
+    {
+        IPRINTF("save mem: pages count = %d\n", count);
+    }
+
+    i = (xen_pfn_t) -1; /* end page marker */
+    return write_exact_handled(xch, io_fd, &i, sizeof(i));
+}
+
+static int restore_memory(xc_interface *xch, int io_fd, uint32_t dom,
+                          guest_params_t *params)
+{
+    xen_pfn_t end = params->max_gpfn;
+    xen_pfn_t gpfn;
+    int debug =  !!(params->flags & XCFLAGS_DEBUG);
+    int count = 0;
+    char *page;
+    xen_pfn_t start = params->start_gpfn;
+
+    /* TODO allocate several pages per call */
+    for ( gpfn = start; gpfn < end; ++gpfn )
+    {
+        if ( xc_domain_populate_physmap_exact(xch, dom, 1, 0, 0, &gpfn) )
+        {
+            PERROR("Memory allocation for a new domain failed");
+            return -1;
+        }
+    }
+
+    while ( 1 )
+    {
+
+        if ( read_exact(io_fd, &gpfn, sizeof(gpfn)) )
+        {
+            PERROR("GPFN read failed during memory transfer, count=%d", count);
+            return -1;
+        }
+        if ( gpfn == (xen_pfn_t) -1 ) break; /* end page marker */
+
+        if ( gpfn < start || gpfn >= end )
+        {
+            ERROR("GPFN %llx doesn't belong to RAM address space, count=%d",
+                  (unsigned long long)gpfn, count);
+            return -1;
+        }
+        page = xc_map_foreign_range(xch, dom, PAGE_SIZE,
+                                    PROT_READ | PROT_WRITE, gpfn);
+        if ( !page )
+        {
+            PERROR("xc_map_foreign_range failed, pfn=%llx", gpfn);
+            return -1;
+        }
+        if ( read_exact(io_fd, page, PAGE_SIZE) )
+        {
+            PERROR("Page data read failed during memory transfer, pfn=%llx",
+                    gpfn);
+            return -1;
+        }
+        munmap(page, PAGE_SIZE);
+        count++;
+    }
+
+    if ( debug )
+    {
+        IPRINTF("Memory restored, pages count=%d", count);
+    }
+    return 0;
+}
+
+/* ============ HVM context =========== */
+static int save_armhvm(xc_interface *xch, int io_fd, uint32_t dom, int debug)
+{
+    /* HVM: a buffer for holding HVM context */
+    uint32_t hvm_buf_size = 0;
+    uint8_t *hvm_buf = NULL;
+    uint32_t rec_size;
+    int retval = -1;
+
+    /* Need another buffer for HVM context */
+    hvm_buf_size = xc_domain_hvm_getcontext(xch, dom, 0, 0);
+    if ( hvm_buf_size == -1 )
+    {
+        ERROR("Couldn't get HVM context size from Xen");
+        goto out;
+    }
+    hvm_buf = malloc(hvm_buf_size);
+
+    if ( !hvm_buf )
+    {
+        ERROR("Couldn't allocate memory for hvm buffer");
+        goto out;
+    }
+
+    /* Get HVM context from Xen and save it too */
+    if ( (rec_size = xc_domain_hvm_getcontext(xch, dom, hvm_buf,
+                    hvm_buf_size)) == -1 )
+    {
+        ERROR("HVM:Could not get hvm buffer");
+        goto out;
+    }
+
+    if ( debug )
+        IPRINTF("HVM save size %d %d", hvm_buf_size, rec_size);
+
+    if ( write_exact_handled(xch, io_fd, &rec_size, sizeof(uint32_t)) )
+        goto out;
+
+    if ( write_exact_handled(xch, io_fd, hvm_buf, rec_size) )
+    {
+        goto out;
+    }
+
+    retval = 0;
+
+out:
+    free(hvm_buf);
+
+    return retval;
+}
+
+static int restore_armhvm(xc_interface *xch, int io_fd,
+                          uint32_t dom, int debug)
+{
+    uint32_t rec_size;
+    uint32_t hvm_buf_size = 0;
+    uint8_t *hvm_buf = NULL;
+    int frc = 0;
+    int retval = -1;
+
+    if ( read_exact(io_fd, &rec_size, sizeof(uint32_t)) )
+    {
+        PERROR("Could not read HVM size");
+        goto out;
+    }
+
+    if ( !rec_size )
+    {
+        ERROR("Zero HVM size");
+        goto out;
+    }
+
+    hvm_buf_size = xc_domain_hvm_getcontext(xch, dom, 0, 0);
+    if ( hvm_buf_size != rec_size )
+    {
+        ERROR("HVM size for this domain is not the same as stored");
+    }
+
+    hvm_buf = malloc(hvm_buf_size);
+    if ( !hvm_buf )
+    {
+        ERROR("Couldn't allocate memory");
+        goto out;
+    }
+
+    if ( read_exact(io_fd, hvm_buf, hvm_buf_size) )
+    {
+        PERROR("Could not read HVM context");
+        goto out;
+    }
+
+    frc = xc_domain_hvm_setcontext(xch, dom, hvm_buf, hvm_buf_size);
+    if ( frc )
+    {
+        ERROR("error setting the HVM context");
+        goto out;
+    }
+    retval = 0;
+
+    if ( debug )
+    {
+            IPRINTF("HVM restore size %d %d", hvm_buf_size, rec_size);
+    }
+out:
+    free(hvm_buf);
+    return retval;
+}
+
+/* ================= Console & Xenstore & Memory map =========== */
+static int save_guest_params(xc_interface *xch, int io_fd,
+                             uint32_t dom, uint32_t flags,
+                             guest_params_t *params)
+{
+    size_t sz = sizeof(guest_params_t);
+    xc_dominfo_t dom_info;
+
+    params->max_gpfn = xc_domain_maximum_gpfn(xch, dom);
+    params->start_gpfn = (GUEST_RAM_BASE >> PAGE_SHIFT);
+
+    if ( flags & XCFLAGS_DEBUG )
+    {
+        IPRINTF("Guest param save size: %d ", (int)sz);
+    }
+
+    if ( xc_get_hvm_param(xch, dom, HVM_PARAM_CONSOLE_PFN,
+            &params->console_pfn) )
+    {
+        ERROR("Can't get console gpfn");
+        return -1;
+    }
+
+    if ( xc_get_hvm_param(xch, dom, HVM_PARAM_STORE_PFN, &params->store_pfn) )
+    {
+        ERROR("Can't get store gpfn");
+        return -1;
+    }
+
+    if ( xc_domain_getinfo(xch, dom, 1, &dom_info) != 1 )
+    {
+        ERROR("Can't get domain info for dom %d", dom);
+        return -1;
+    }
+    params->max_vcpu_id = dom_info.max_vcpu_id;
+
+    params->flags = flags;
+
+    if ( write_exact_handled(xch, io_fd, params, sz) )
+    {
+        return -1;
+    }
+
+    return 0;
+}
+
+static int restore_guest_params(xc_interface *xch, int io_fd,
+                                uint32_t dom, guest_params_t *params)
+{
+    size_t sz = sizeof(guest_params_t);
+    xen_pfn_t nr_pfns;
+    unsigned int maxmemkb;
+
+    if ( read_exact(io_fd, params, sizeof(guest_params_t)) )
+    {
+        PERROR("Can't read guest params");
+        return -1;
+    }
+
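+    /* Pages -> KiB: a page is 2^PAGE_SHIFT bytes, i.e. 2^(PAGE_SHIFT - 10)
+     * KiB. */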
+    nr_pfns = params->max_gpfn - params->start_gpfn;
+    maxmemkb = (unsigned int) nr_pfns << (PAGE_SHIFT - 10);
+
+    if ( params->flags & XCFLAGS_DEBUG )
+    {
+        IPRINTF("Guest param restore size: %d ", (int)sz);
+        IPRINTF("Guest memory size: %d MB", maxmemkb >> 10);
+    }
+
+    if ( xc_domain_setmaxmem(xch, dom, maxmemkb) )
+    {
+        ERROR("Can't set memory map");
+        return -1;
+    }
+
+    /* Set max. number of vcpus as max_vcpu_id + 1 */
+    if ( xc_domain_max_vcpus(xch, dom, params->max_vcpu_id + 1) )
+    {
+        ERROR("Can't set max vcpu number for domain");
+        return -1;
+    }
+
+    return 0;
+}
+
+static int set_guest_params(xc_interface *xch, int io_fd, uint32_t dom,
+                            guest_params_t *params, unsigned int console_evtchn,
+                            domid_t console_domid, unsigned int store_evtchn,
+                            domid_t store_domid)
+{
+    int rc = 0;
+
+    if ( (rc = xc_clear_domain_page(xch, dom, params->console_pfn)) )
+    {
+        ERROR("Can't clear console page");
+        return rc;
+    }
+
+    if ( (rc = xc_clear_domain_page(xch, dom, params->store_pfn)) )
+    {
+        ERROR("Can't clear xenstore page");
+        return rc;
+    }
+
+    if ( (rc = xc_dom_gnttab_hvm_seed(xch, dom, params->console_pfn,
+                                      params->store_pfn, console_domid,
+                                      store_domid)) )
+    {
+        ERROR("Can't grant console and xenstore pages");
+        return rc;
+    }
+
+    if ( (rc = xc_set_hvm_param(xch, dom, HVM_PARAM_CONSOLE_PFN,
+                                params->console_pfn)) )
+    {
+        ERROR("Can't set console gpfn");
+        return rc;
+    }
+
+    if ( (rc = xc_set_hvm_param(xch, dom, HVM_PARAM_STORE_PFN,
+                                params->store_pfn)) )
+    {
+        ERROR("Can't set xenstore gpfn");
+        return rc;
+    }
+
+    if ( (rc = xc_set_hvm_param(xch, dom, HVM_PARAM_CONSOLE_EVTCHN,
+                                console_evtchn)) )
+    {
+        ERROR("Can't set console event channel");
+        return rc;
+    }
+
+    if ( (rc = xc_set_hvm_param(xch, dom, HVM_PARAM_STORE_EVTCHN,
+                                store_evtchn)) )
+    {
+        ERROR("Can't set xenstore event channel");
+        return rc;
+    }
+    return 0;
+}
+
+/* ================== Main ============== */
+int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
+                   uint32_t max_iters, uint32_t max_factor, uint32_t flags,
+                   struct save_callbacks *callbacks, int hvm,
+                   unsigned long vm_generationid_addr)
+{
+    int debug;
+    guest_params_t params;
+
+#ifdef ARM_MIGRATE_VERBOSE
+    flags |= XCFLAGS_DEBUG;
+#endif
+
+#ifdef DISABLE_LIVE_MIGRATION
+    flags &= ~(XCFLAGS_LIVE);
+#endif
+
+    debug = !!(flags & XCFLAGS_DEBUG);
+    if ( save_guest_params(xch, io_fd, dom, flags, &params) )
+    {
+       ERROR("Can't save guest params");
+       return -1;
+    }
+
+    if ( save_memory(xch, io_fd, dom, callbacks, max_iters,
+            max_factor, &params) )
+    {
+        ERROR("Memory not saved");
+        return -1;
+    }
+
+    if ( save_armhvm(xch, io_fd, dom, debug) )
+    {
+        ERROR("HVM not saved");
+        return -1;
+    }
+
+    if ( debug )
+    {
+        IPRINTF("Domain %d saved", dom);
+    }
+    return 0;
+}
+
+int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
+                      unsigned int store_evtchn, unsigned long *store_gpfn,
+                      domid_t store_domid, unsigned int console_evtchn,
+                      unsigned long *console_gpfn, domid_t console_domid,
+                      unsigned int hvm, unsigned int pae, int superpages,
+                      int no_incr_generationid, int checkpointed_stream,
+                      unsigned long *vm_generationid_addr,
+                      struct restore_callbacks *callbacks)
+{
+    guest_params_t params;
+    int debug;
+
+    if ( restore_guest_params(xch, io_fd, dom, &params) )
+    {
+        ERROR("Can't restore guest params");
+        return -1;
+    }
+    debug = !!(params.flags & XCFLAGS_DEBUG);
+
+    if ( restore_memory(xch, io_fd, dom, &params) )
+    {
+        ERROR("Can't restore memory");
+        return -1;
+    }
+    if ( set_guest_params(xch, io_fd, dom, &params,
+                console_evtchn, console_domid,
+                store_evtchn, store_domid) )
+    {
+        ERROR("Can't setup guest params");
+        return -1;
+    }
+
+    /* Report the console and store PFNs back to the caller */
+    *console_gpfn = params.console_pfn;
+    *store_gpfn = params.store_pfn;
+
+    if ( restore_armhvm(xch, io_fd, dom, debug) )
+    {
+        ERROR("HVM not restored");
+        return -1;
+    }
+
+    if ( debug )
+    {
+         IPRINTF("Domain %d restored", dom);
+    }
+
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c
index f051515..044a8de 100644
--- a/tools/libxc/xc_dom_arm.c
+++ b/tools/libxc/xc_dom_arm.c
@@ -335,7 +335,9 @@ int arch_setup_meminit(struct xc_dom_image *dom)
         modbase += dtb_size;
     }
 
-    return 0;
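+    /* Raise maxmem to cover guest RAM plus the magic pages; the
+     * xc_domain_setmaxmem() argument is in KiB. */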
+    return xc_domain_setmaxmem(dom->xch, dom->guest_domid,
+                               (dom->total_pages + NR_MAGIC_PAGES)
+                                << (PAGE_SHIFT - 10));
 }
 
 int arch_setup_bootearly(struct xc_dom_image *dom)
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index b2c3015..e10f4fb 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -441,9 +441,6 @@
  *  - libxl_domain_resume
  *  - libxl_domain_remus_start
  */
-#if defined(__arm__) || defined(__aarch64__)
-#define LIBXL_HAVE_NO_SUSPEND_RESUME 1
-#endif
 
 /*
  * LIBXL_HAVE_DEVICE_PCI_SEIZE
diff --git a/tools/misc/Makefile b/tools/misc/Makefile
index 17aeda5..0824100 100644
--- a/tools/misc/Makefile
+++ b/tools/misc/Makefile
@@ -11,7 +11,7 @@ HDRS     = $(wildcard *.h)
 
 TARGETS-y := xenperf xenpm xen-tmem-list-parse gtraceview gtracestat xenlockprof xenwatchdogd xencov
 TARGETS-$(CONFIG_X86) += xen-detect xen-hvmctx xen-hvmcrash xen-lowmemd xen-mfndump
-TARGETS-$(CONFIG_MIGRATE) += xen-hptool
+TARGETS-$(CONFIG_X86) += xen-hptool
 TARGETS := $(TARGETS-y)
 
 SUBDIRS := $(SUBDIRS-y)
@@ -23,7 +23,7 @@ INSTALL_BIN := $(INSTALL_BIN-y)
 INSTALL_SBIN-y := xen-bugtool xen-python-path xenperf xenpm xen-tmem-list-parse gtraceview \
 	gtracestat xenlockprof xenwatchdogd xen-ringwatch xencov
 INSTALL_SBIN-$(CONFIG_X86) += xen-hvmctx xen-hvmcrash xen-lowmemd xen-mfndump
-INSTALL_SBIN-$(CONFIG_MIGRATE) += xen-hptool
+INSTALL_SBIN-$(CONFIG_X86) += xen-hptool
 INSTALL_SBIN := $(INSTALL_SBIN-y)
 
 INSTALL_PRIVBIN-y := xenpvnetboot
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC v2 1/6] xen/arm: Save and restore support with hvm context hypercalls
  2014-04-15 21:05 [RFC v2 0/6] xen/arm: Support guest VM save/restore/migration Wei Huang
                   ` (6 preceding siblings ...)
  2014-04-15 21:05 ` [RFC v2 0/6] xen/arm: Support guest VM save/restore/migration Wei Huang
@ 2014-04-15 21:05 ` Wei Huang
  2014-04-15 21:05 ` [RFC v2 2/6] xen/arm: implement support for XENMEM_maximum_gpfn hypercall Wei Huang
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 47+ messages in thread
From: Wei Huang @ 2014-04-15 21:05 UTC (permalink / raw)
  To: xen-devel
  Cc: w1.huang, ian.campbell, stefano.stabellini, andrew.cooper3,
	julien.grall, jaeyong.yoo, yjhyun.yoo

From: Jaeyong Yoo <jaeyong.yoo@samsung.com>

This patch implements HVM context hypercalls to support ARM
guest save and restore. It saves the state of the guest GIC,
arch timer, and CPU registers.
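
As a reference, a minimal sketch of how a toolstack consumer might drive
these hypercalls through libxc (illustrative only, error handling
elided; the real consumer is the save/restore code in patch 6/6):

  static int hvm_ctxt_roundtrip(xc_interface *xch, uint32_t dom)
  {
      uint32_t sz = xc_domain_hvm_getcontext(xch, dom, 0, 0); /* query size */
      uint8_t *buf = malloc(sz);
      uint32_t rec = xc_domain_hvm_getcontext(xch, dom, buf, sz);
      /* a real toolstack would stream 'rec' bytes to the target here */
      int ret = xc_domain_hvm_setcontext(xch, dom, buf, rec);
      free(buf);
      return ret;
  }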

Signed-off-by: Evgeny Fedotov <e.fedotov@samsung.com>
Signed-off-by: Wei Huang <w1.huang@samsung.com>
---
 xen/arch/arm/Makefile                  |    1 +
 xen/arch/arm/hvm.c                     |  268 +++++++++++++++++++++++++++++++-
 xen/arch/arm/save.c                    |   65 ++++++++
 xen/arch/arm/vgic.c                    |  146 +++++++++++++++++
 xen/arch/arm/vtimer.c                  |   71 +++++++++
 xen/arch/x86/domctl.c                  |   70 ---------
 xen/common/Makefile                    |    2 +-
 xen/common/domctl.c                    |   74 +++++++++
 xen/include/asm-arm/hvm/support.h      |   29 ++++
 xen/include/public/arch-arm/hvm/save.h |  130 ++++++++++++++++
 10 files changed, 784 insertions(+), 72 deletions(-)
 create mode 100644 xen/arch/arm/save.c
 create mode 100644 xen/include/asm-arm/hvm/support.h

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 63e0460..d9a328c 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -33,6 +33,7 @@ obj-y += hvm.o
 obj-y += device.o
 obj-y += decode.o
 obj-y += processor.o
+obj-y += save.o
 
 #obj-bin-y += ....o
 
diff --git a/xen/arch/arm/hvm.c b/xen/arch/arm/hvm.c
index 471c4cd..18eb2bd 100644
--- a/xen/arch/arm/hvm.c
+++ b/xen/arch/arm/hvm.c
@@ -4,6 +4,7 @@
 #include <xen/errno.h>
 #include <xen/guest_access.h>
 #include <xen/sched.h>
+#include <xen/hvm/save.h>
 
 #include <xsm/xsm.h>
 
@@ -12,9 +13,9 @@
 #include <public/hvm/hvm_op.h>
 
 #include <asm/hypercall.h>
+#include <asm/gic.h>
 
 long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
-
 {
     long rc = 0;
 
@@ -65,3 +66,268 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     return rc;
 }
+
+/* Save CPU-related state into the save/restore context */
+static int hvm_save_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
+{
+    struct hvm_hw_cpu ctxt;
+    struct vcpu_guest_core_regs c;
+    struct vcpu *v;
+    int ret = 0;
+
+    /* Save the state of CPU */
+    for_each_vcpu( d, v )
+    {
+        memset(&ctxt, 0, sizeof(ctxt));
+
+        ctxt.sctlr = v->arch.sctlr;
+        ctxt.ttbr0 = v->arch.ttbr0;
+        ctxt.ttbr1 = v->arch.ttbr1;
+        ctxt.ttbcr = v->arch.ttbcr;
+
+        ctxt.dacr = v->arch.dacr;
+        ctxt.ifsr = v->arch.ifsr;
+#ifdef CONFIG_ARM_32
+        ctxt.ifar = v->arch.ifar;
+        ctxt.dfar = v->arch.dfar;
+        ctxt.dfsr = v->arch.dfsr;
+#else
+        ctxt.far = v->arch.far;
+        ctxt.esr = v->arch.esr;
+#endif
+
+#ifdef CONFIG_ARM_32
+        ctxt.mair0 = v->arch.mair0;
+        ctxt.mair1 = v->arch.mair1;
+#else
+        ctxt.mair0 = v->arch.mair;
+#endif
+
+        /* Control Registers */
+        ctxt.actlr = v->arch.actlr;
+        ctxt.sctlr = v->arch.sctlr;
+        ctxt.cpacr = v->arch.cpacr;
+
+        ctxt.contextidr = v->arch.contextidr;
+        ctxt.tpidr_el0 = v->arch.tpidr_el0;
+        ctxt.tpidr_el1 = v->arch.tpidr_el1;
+        ctxt.tpidrro_el0 = v->arch.tpidrro_el0;
+
+        /* CP 15 */
+        ctxt.csselr = v->arch.csselr;
+
+        ctxt.afsr0 = v->arch.afsr0;
+        ctxt.afsr1 = v->arch.afsr1;
+        ctxt.vbar = v->arch.vbar;
+        ctxt.par = v->arch.par;
+        ctxt.teecr = v->arch.teecr;
+        ctxt.teehbr = v->arch.teehbr;
+
+#ifdef CONFIG_ARM_32
+        ctxt.joscr = v->arch.joscr;
+        ctxt.jmcr = v->arch.jmcr;
+#endif
+
+        memset(&c, 0, sizeof(c));
+
+        /* get guest core registers */
+        vcpu_regs_hyp_to_user(v, &c);
+
+        ctxt.x0 = c.x0;
+        ctxt.x1 = c.x1;
+        ctxt.x2 = c.x2;
+        ctxt.x3 = c.x3;
+        ctxt.x4 = c.x4;
+        ctxt.x5 = c.x5;
+        ctxt.x6 = c.x6;
+        ctxt.x7 = c.x7;
+        ctxt.x8 = c.x8;
+        ctxt.x9 = c.x9;
+        ctxt.x10 = c.x10;
+        ctxt.x11 = c.x11;
+        ctxt.x12 = c.x12;
+        ctxt.x13 = c.x13;
+        ctxt.x14 = c.x14;
+        ctxt.x15 = c.x15;
+        ctxt.x16 = c.x16;
+        ctxt.x17 = c.x17;
+        ctxt.x18 = c.x18;
+        ctxt.x19 = c.x19;
+        ctxt.x20 = c.x20;
+        ctxt.x21 = c.x21;
+        ctxt.x22 = c.x22;
+        ctxt.x23 = c.x23;
+        ctxt.x24 = c.x24;
+        ctxt.x25 = c.x25;
+        ctxt.x26 = c.x26;
+        ctxt.x27 = c.x27;
+        ctxt.x28 = c.x28;
+        ctxt.x29 = c.x29;
+        ctxt.x30 = c.x30;
+        ctxt.pc64 = c.pc64;
+        ctxt.cpsr = c.cpsr;
+        ctxt.spsr_el1 = c.spsr_el1; /* spsr_svc */
+
+#ifdef CONFIG_ARM_32
+        ctxt.spsr_fiq = c.spsr_fiq;
+        ctxt.spsr_irq = c.spsr_irq;
+        ctxt.spsr_und = c.spsr_und;
+        ctxt.spsr_abt = c.spsr_abt;
+#endif
+#ifdef CONFIG_ARM_64
+        ctxt.sp_el0 = c.sp_el0;
+        ctxt.sp_el1 = c.sp_el1;
+        ctxt.elr_el1 = c.elr_el1;
+#endif
+
+        /* check VFP state size */
+        BUILD_BUG_ON(sizeof(v->arch.vfp) > sizeof(ctxt.vfp));
+        memcpy(&ctxt.vfp, &v->arch.vfp, sizeof(v->arch.vfp));
+
+        ctxt.pause_flags = v->pause_flags;
+
+        if ( (ret = hvm_save_entry(VCPU, v->vcpu_id, h, &ctxt)) != 0 )
+            return ret;
+    }
+
+    return ret;
+}
+
+/* Load CPU-related state from an existing save/restore context */
+static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
+{
+    int vcpuid;
+    struct hvm_hw_cpu ctxt;
+    struct vcpu *v;
+    struct vcpu_guest_core_regs c;
+
+    vcpuid = hvm_load_instance(h);
+    if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
+    {
+        dprintk(XENLOG_G_ERR, "HVM restore: dom%u has no vcpu%u\n",
+                d->domain_id, vcpuid);
+        return -EINVAL;
+    }
+
+    if ( hvm_load_entry(VCPU, h, &ctxt) != 0 )
+        return -EINVAL;
+
+    v->arch.sctlr = ctxt.sctlr;
+    v->arch.ttbr0 = ctxt.ttbr0;
+    v->arch.ttbr1 = ctxt.ttbr1;
+    v->arch.ttbcr = ctxt.ttbcr;
+
+    v->arch.dacr = ctxt.dacr;
+    v->arch.ifsr = ctxt.ifsr;
+#ifdef CONFIG_ARM_32
+    v->arch.ifar = ctxt.ifar;
+    v->arch.dfar = ctxt.dfar;
+    v->arch.dfsr = ctxt.dfsr;
+#else
+    v->arch.far = ctxt.far;
+    v->arch.esr = ctxt.esr;
+#endif
+
+#ifdef CONFIG_ARM_32
+    v->arch.mair0 = ctxt.mair0;
+    v->arch.mair1 = ctxt.mair1;
+#else
+    v->arch.mair = ctxt.mair0;
+#endif
+
+    /* Control Registers */
+    v->arch.actlr = ctxt.actlr;
+    v->arch.cpacr = ctxt.cpacr;
+    v->arch.contextidr = ctxt.contextidr;
+    v->arch.tpidr_el0 = ctxt.tpidr_el0;
+    v->arch.tpidr_el1 = ctxt.tpidr_el1;
+    v->arch.tpidrro_el0 = ctxt.tpidrro_el0;
+
+    /* CP 15 */
+    v->arch.csselr = ctxt.csselr;
+
+    v->arch.afsr0 = ctxt.afsr0;
+    v->arch.afsr1 = ctxt.afsr1;
+    v->arch.vbar = ctxt.vbar;
+    v->arch.par = ctxt.par;
+    v->arch.teecr = ctxt.teecr;
+    v->arch.teehbr = ctxt.teehbr;
+#ifdef CONFIG_ARM_32
+    v->arch.joscr = ctxt.joscr;
+    v->arch.jmcr = ctxt.jmcr;
+#endif
+
+    /* fill guest core registers */
+    memset(&c, 0, sizeof(c));
+    c.x0 = ctxt.x0;
+    c.x1 = ctxt.x1;
+    c.x2 = ctxt.x2;
+    c.x3 = ctxt.x3;
+    c.x4 = ctxt.x4;
+    c.x5 = ctxt.x5;
+    c.x6 = ctxt.x6;
+    c.x7 = ctxt.x7;
+    c.x8 = ctxt.x8;
+    c.x9 = ctxt.x9;
+    c.x10 = ctxt.x10;
+    c.x11 = ctxt.x11;
+    c.x12 = ctxt.x12;
+    c.x13 = ctxt.x13;
+    c.x14 = ctxt.x14;
+    c.x15 = ctxt.x15;
+    c.x16 = ctxt.x16;
+    c.x17 = ctxt.x17;
+    c.x18 = ctxt.x18;
+    c.x19 = ctxt.x19;
+    c.x20 = ctxt.x20;
+    c.x21 = ctxt.x21;
+    c.x22 = ctxt.x22;
+    c.x23 = ctxt.x23;
+    c.x24 = ctxt.x24;
+    c.x25 = ctxt.x25;
+    c.x26 = ctxt.x26;
+    c.x27 = ctxt.x27;
+    c.x28 = ctxt.x28;
+    c.x29 = ctxt.x29;
+    c.x30 = ctxt.x30;
+    c.pc64 = ctxt.pc64;
+    c.cpsr = ctxt.cpsr;
+    c.spsr_el1 = ctxt.spsr_el1; /* spsr_svc */
+
+#ifdef CONFIG_ARM_32
+    c.spsr_fiq = ctxt.spsr_fiq;
+    c.spsr_irq = ctxt.spsr_irq;
+    c.spsr_und = ctxt.spsr_und;
+    c.spsr_abt = ctxt.spsr_abt;
+#endif
+#ifdef CONFIG_ARM_64
+    c.sp_el0 = ctxt.sp_el0;
+    c.sp_el1 = ctxt.sp_el1;
+    c.elr_el1 = ctxt.elr_el1;
+#endif
+
+    /* set guest core registers */
+    vcpu_regs_user_to_hyp(v, &c);
+
+    /* check VFP state size */
+    BUILD_BUG_ON(sizeof(v->arch.vfp) > sizeof(ctxt.vfp));
+    memcpy(&v->arch.vfp, &ctxt.vfp, sizeof(v->arch.vfp));
+
+    v->is_initialised = 1;
+    v->pause_flags = ctxt.pause_flags;
+
+    return 0;
+}
+
+HVM_REGISTER_SAVE_RESTORE(VCPU, hvm_save_cpu_ctxt, hvm_load_cpu_ctxt, 1,
+                          HVMSR_PER_VCPU);
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/save.c b/xen/arch/arm/save.c
new file mode 100644
index 0000000..eef14a8
--- /dev/null
+++ b/xen/arch/arm/save.c
@@ -0,0 +1,65 @@
+/*
+ * save.c: Save and restore an HVM guest's emulated hardware state on ARM.
+ *
+ * Copyright (c) 2014, Samsung Electronics.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ */
+#include <asm/hvm/support.h>
+#include <public/hvm/save.h>
+
+void arch_hvm_save(struct domain *d, struct hvm_save_header *hdr)
+{
+    hdr->cpuid = current_cpu_data.midr.bits;
+}
+
+int arch_hvm_load(struct domain *d, struct hvm_save_header *hdr)
+{
+    uint32_t cpuid;
+
+    if ( hdr->magic != HVM_FILE_MAGIC )
+    {
+        printk(XENLOG_G_ERR "HVM%d restore: bad magic number %#"PRIx32"\n",
+               d->domain_id, hdr->magic);
+        return -EINVAL;
+    }
+
+    if ( hdr->version != HVM_FILE_VERSION )
+    {
+        printk(XENLOG_G_ERR "HVM%d restore: unsupported version %u\n",
+               d->domain_id, hdr->version);
+        return -EINVAL;
+    }
+
+    cpuid = current_cpu_data.midr.bits;
+    if ( hdr->cpuid != cpuid )
+    {
+        printk(XENLOG_G_INFO "HVM%d restore: VM saved on one CPU "
+               "(%#"PRIx32") and restored on another (%#"PRIx32").\n",
+               d->domain_id, hdr->cpuid, cpuid);
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 9fc9586..af244a7 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -24,6 +24,7 @@
 #include <xen/softirq.h>
 #include <xen/irq.h>
 #include <xen/sched.h>
+#include <xen/hvm/save.h>
 
 #include <asm/current.h>
 
@@ -73,6 +74,75 @@ static struct vgic_irq_rank *vgic_irq_rank(struct vcpu *v, int b, int n)
         return NULL;
 }
 
+/* Save rank info into a context to support domain save/restore */
+static int vgic_save_irq_rank(struct vcpu *v, struct vgic_rank *ext,
+                              struct vgic_irq_rank *rank)
+{
+    spin_lock(&rank->lock);
+
+    /* IENABLE, IACTIVE, IPEND, PENDSGI registers */
+    ext->ienable = rank->ienable;
+    ext->iactive = rank->iactive;
+    ext->ipend = rank->ipend;
+    ext->pendsgi = rank->pendsgi;
+
+    /* ICFG */
+    ext->icfg[0] = rank->icfg[0];
+    ext->icfg[1] = rank->icfg[1];
+
+    /* IPRIORITY */
+    BUILD_BUG_ON(sizeof(rank->ipriority) != sizeof (ext->ipriority));
+    memcpy(ext->ipriority, rank->ipriority, sizeof(rank->ipriority));
+
+    /* ITARGETS */
+    BUILD_BUG_ON(sizeof(rank->itargets) != sizeof (ext->itargets));
+    memcpy(ext->itargets, rank->itargets, sizeof(rank->itargets));
+
+    spin_unlock(&rank->lock);
+
+    return 0;
+}
+
+/* Load rank info from a context to support domain save/restore */
+static int vgic_load_irq_rank(struct vcpu *v, struct vgic_irq_rank *rank,
+                              struct vgic_rank *ext)
+{
+    struct pending_irq *p;
+    unsigned int irq = 0;
+    const unsigned long enable_bits = ext->ienable;
+
+    spin_lock(&rank->lock);
+
+    /* IENABLE, IACTIVE, IPEND, PENDSGI registers */
+    rank->ienable = ext->ienable;
+    rank->iactive = ext->iactive;
+    rank->ipend = ext->ipend;
+    rank->pendsgi = ext->pendsgi;
+
+    /* ICFG */
+    rank->icfg[0] = ext->icfg[0];
+    rank->icfg[1] = ext->icfg[1];
+
+    /* IPRIORITY */
+    BUILD_BUG_ON(sizeof(rank->ipriority) != sizeof (ext->ipriority));
+    memcpy(rank->ipriority, ext->ipriority, sizeof(rank->ipriority));
+
+    /* ITARGETS */
+    BUILD_BUG_ON(sizeof(rank->itargets) != sizeof (ext->itargets));
+    memcpy(rank->itargets, ext->itargets, sizeof(rank->itargets));
+
+    /* Set IRQ status as enabled by iterating through rank->ienable */
+    while ( (irq = find_next_bit(&enable_bits, 32, irq)) < 32 ) {
+        p = irq_to_pending(v, irq);
+        set_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
+        irq++;
+    }
+
+    spin_unlock(&rank->lock);
+
+    return 0;
+}
+
 int domain_vgic_init(struct domain *d)
 {
     int i;
@@ -749,6 +819,82 @@ out:
         smp_send_event_check_mask(cpumask_of(v->processor));
 }
 
+
+/* Save GIC state into a context to support save/restore */
+static int hvm_gic_save_ctxt(struct domain *d, hvm_domain_context_t *h)
+{
+    struct hvm_hw_gic ctxt;
+    struct vcpu *v;
+    int ret = 0;
+
+    /* Save the state of GICs */
+    for_each_vcpu( d, v )
+    {
+        ctxt.gic_hcr = v->arch.gic_hcr;
+        ctxt.gic_vmcr = v->arch.gic_vmcr;
+        ctxt.gic_apr = v->arch.gic_apr;
+
+        /* Save list registers and masks */
+        BUILD_BUG_ON(sizeof(v->arch.gic_lr) > sizeof (ctxt.gic_lr));
+        memcpy(ctxt.gic_lr, v->arch.gic_lr, sizeof(v->arch.gic_lr));
+
+        ctxt.lr_mask = v->arch.lr_mask;
+        ctxt.event_mask = v->arch.event_mask;
+
+        /* Save PPI states (per-CPU), necessary for SMP-enabled guests */
+        if ( (ret = vgic_save_irq_rank(v, &ctxt.ppi_state,
+                                       &v->arch.vgic.private_irqs)) != 0 )
+            return ret;
+
+        if ( (ret = hvm_save_entry(GIC, v->vcpu_id, h, &ctxt)) != 0 )
+            return ret;
+    }
+
+    return ret;
+}
+
+/* Restore GIC state from a context to support save/restore */
+static int hvm_gic_load_ctxt(struct domain *d, hvm_domain_context_t *h)
+{
+    int vcpuid;
+    struct hvm_hw_gic ctxt;
+    struct vcpu *v;
+    int ret = 0;
+
+    /* Which vcpu is this? */
+    vcpuid = hvm_load_instance(h);
+    if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
+    {
+        dprintk(XENLOG_G_ERR, "HVM restore: dom%u has no vcpu%u\n",
+                d->domain_id, vcpuid);
+        return -EINVAL;
+    }
+
+    if ( hvm_load_entry(GIC, h, &ctxt) != 0 )
+        return -EINVAL;
+
+    v->arch.gic_hcr = ctxt.gic_hcr;
+    v->arch.gic_vmcr = ctxt.gic_vmcr;
+    v->arch.gic_apr = ctxt.gic_apr;
+
+    /* Restore list registers and masks */
+    BUILD_BUG_ON(sizeof(v->arch.gic_lr) > sizeof (ctxt.gic_lr));
+    memcpy(v->arch.gic_lr, ctxt.gic_lr, sizeof(v->arch.gic_lr));
+
+    v->arch.lr_mask = ctxt.lr_mask;
+    v->arch.event_mask = ctxt.event_mask;
+
+    /* Restore PPI states */
+    if ( (ret = vgic_load_irq_rank(v, &v->arch.vgic.private_irqs,
+                                   &ctxt.ppi_state)) != 0 )
+        return ret;
+
+    return ret;
+}
+
+HVM_REGISTER_SAVE_RESTORE(GIC, hvm_gic_save_ctxt, hvm_gic_load_ctxt, 1,
+                          HVMSR_PER_VCPU);
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/arm/vtimer.c b/xen/arch/arm/vtimer.c
index 3d6a721..7c47eac 100644
--- a/xen/arch/arm/vtimer.c
+++ b/xen/arch/arm/vtimer.c
@@ -21,6 +21,7 @@
 #include <xen/lib.h>
 #include <xen/timer.h>
 #include <xen/sched.h>
+#include <xen/hvm/save.h>
 #include <asm/irq.h>
 #include <asm/time.h>
 #include <asm/gic.h>
@@ -284,6 +285,76 @@ int vtimer_emulate(struct cpu_user_regs *regs, union hsr hsr)
     }
 }
 
+static int hvm_vtimer_save_ctxt(struct domain *d, hvm_domain_context_t *h)
+{
+    struct hvm_hw_timer ctxt;
+    struct vcpu *v;
+    struct vtimer *t;
+    int i, ret = 0;
+
+    /* Save the state of vtimer and ptimer */
+    for_each_vcpu( d, v )
+    {
+        t = &v->arch.virt_timer;
+        for ( i = 0; i < 2; i++ )
+        {
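+            /* Pass 0 saves the virtual timer; t is re-pointed at the
+             * physical timer at the bottom of the loop for pass 1. */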
+            ctxt.cval = t->cval;
+            ctxt.ctl = t->ctl;
+            ctxt.vtb_offset = i ? d->arch.phys_timer_base.offset :
+                d->arch.virt_timer_base.offset;
+            ctxt.type = i ? TIMER_TYPE_PHYS : TIMER_TYPE_VIRT;
+
+            if ( (ret = hvm_save_entry(TIMER, v->vcpu_id, h, &ctxt)) != 0 )
+                return ret;
+
+            t = &v->arch.phys_timer;
+        }
+    }
+
+    return ret;
+}
+
+static int hvm_vtimer_load_ctxt(struct domain *d, hvm_domain_context_t *h)
+{
+    int vcpuid;
+    struct hvm_hw_timer ctxt;
+    struct vcpu *v;
+    struct vtimer *t = NULL;
+
+    /* Which vcpu is this? */
+    vcpuid = hvm_load_instance(h);
+
+    if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
+    {
+        dprintk(XENLOG_G_ERR, "HVM restore: dom%u has no vcpu%u\n",
+                d->domain_id, vcpuid);
+        return -EINVAL;
+    }
+
+    if ( hvm_load_entry(TIMER, h, &ctxt) != 0 )
+        return -EINVAL;
+
+    if ( ctxt.type == TIMER_TYPE_VIRT )
+    {
+        t = &v->arch.virt_timer;
+        d->arch.virt_timer_base.offset = ctxt.vtb_offset;
+    }
+    else
+    {
+        t = &v->arch.phys_timer;
+        d->arch.phys_timer_base.offset = ctxt.vtb_offset;
+    }
+
+    t->cval = ctxt.cval;
+    t->ctl = ctxt.ctl;
+    t->v = v;
+
+    return 0;
+}
+
+HVM_REGISTER_SAVE_RESTORE(TIMER, hvm_vtimer_save_ctxt, hvm_vtimer_load_ctxt,
+                          2, HVMSR_PER_VCPU);
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 26635ff..30fbd30 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -399,76 +399,6 @@ long arch_do_domctl(
     }
     break;
 
-    case XEN_DOMCTL_sethvmcontext:
-    { 
-        struct hvm_domain_context c = { .size = domctl->u.hvmcontext.size };
-
-        ret = -EINVAL;
-        if ( !is_hvm_domain(d) ) 
-            goto sethvmcontext_out;
-
-        ret = -ENOMEM;
-        if ( (c.data = xmalloc_bytes(c.size)) == NULL )
-            goto sethvmcontext_out;
-
-        ret = -EFAULT;
-        if ( copy_from_guest(c.data, domctl->u.hvmcontext.buffer, c.size) != 0)
-            goto sethvmcontext_out;
-
-        domain_pause(d);
-        ret = hvm_load(d, &c);
-        domain_unpause(d);
-
-    sethvmcontext_out:
-        if ( c.data != NULL )
-            xfree(c.data);
-    }
-    break;
-
-    case XEN_DOMCTL_gethvmcontext:
-    { 
-        struct hvm_domain_context c = { 0 };
-
-        ret = -EINVAL;
-        if ( !is_hvm_domain(d) ) 
-            goto gethvmcontext_out;
-
-        c.size = hvm_save_size(d);
-
-        if ( guest_handle_is_null(domctl->u.hvmcontext.buffer) )
-        {
-            /* Client is querying for the correct buffer size */
-            domctl->u.hvmcontext.size = c.size;
-            ret = 0;
-            goto gethvmcontext_out;            
-        }
-
-        /* Check that the client has a big enough buffer */
-        ret = -ENOSPC;
-        if ( domctl->u.hvmcontext.size < c.size ) 
-            goto gethvmcontext_out;
-
-        /* Allocate our own marshalling buffer */
-        ret = -ENOMEM;
-        if ( (c.data = xmalloc_bytes(c.size)) == NULL )
-            goto gethvmcontext_out;
-
-        domain_pause(d);
-        ret = hvm_save(d, &c);
-        domain_unpause(d);
-
-        domctl->u.hvmcontext.size = c.cur;
-        if ( copy_to_guest(domctl->u.hvmcontext.buffer, c.data, c.size) != 0 )
-            ret = -EFAULT;
-
-    gethvmcontext_out:
-        copyback = 1;
-
-        if ( c.data != NULL )
-            xfree(c.data);
-    }
-    break;
-
     case XEN_DOMCTL_gethvmcontext_partial:
     { 
         ret = -EINVAL;
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 3683ae3..13b781f 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -62,7 +62,7 @@ obj-$(CONFIG_XENCOMM) += xencomm.o
 
 subdir-$(CONFIG_COMPAT) += compat
 
-subdir-$(x86_64) += hvm
+subdir-y += hvm
 
 subdir-$(coverage) += gcov
 
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 5342e5d..2ea4af5 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -24,6 +24,8 @@
 #include <xen/bitmap.h>
 #include <xen/paging.h>
 #include <xen/hypercall.h>
+#include <xen/hvm/save.h>
+#include <xen/guest_access.h>
 #include <asm/current.h>
 #include <asm/irq.h>
 #include <asm/page.h>
@@ -881,6 +883,78 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
     }
     break;
 
+    case XEN_DOMCTL_sethvmcontext:
+    {
+        struct hvm_domain_context c = { .size = op->u.hvmcontext.size };
+
+        ret = -EINVAL;
+        if ( (d == current->domain) || /* no domain_pause() */
+             !is_hvm_domain(d) )
+            goto sethvmcontext_out;
+
+        ret = -ENOMEM;
+        if ( (c.data = xmalloc_bytes(c.size)) == NULL )
+            goto sethvmcontext_out;
+
+        ret = -EFAULT;
+        if ( copy_from_guest(c.data, op->u.hvmcontext.buffer, c.size) != 0)
+            goto sethvmcontext_out;
+
+        domain_pause(d);
+        ret = hvm_load(d, &c);
+        domain_unpause(d);
+
+    sethvmcontext_out:
+        if ( c.data != NULL )
+            xfree(c.data);
+    }
+    break;
+
+    case XEN_DOMCTL_gethvmcontext:
+    {
+        struct hvm_domain_context c = { 0 };
+
+        ret = -EINVAL;
+        if ( (d == current->domain) || /* no domain_pause() */
+             !is_hvm_domain(d) )
+            goto gethvmcontext_out;
+
+        c.size = hvm_save_size(d);
+
+        if ( guest_handle_is_null(op->u.hvmcontext.buffer) )
+        {
+            /* Client is querying for the correct buffer size */
+            op->u.hvmcontext.size = c.size;
+            ret = 0;
+            goto gethvmcontext_out;
+        }
+
+        /* Check that the client has a big enough buffer */
+        ret = -ENOSPC;
+        if ( op->u.hvmcontext.size < c.size )
+            goto gethvmcontext_out;
+
+        /* Allocate our own marshalling buffer */
+        ret = -ENOMEM;
+        if ( (c.data = xmalloc_bytes(c.size)) == NULL )
+            goto gethvmcontext_out;
+
+        domain_pause(d);
+        ret = hvm_save(d, &c);
+        domain_unpause(d);
+
+        op->u.hvmcontext.size = c.cur;
+        if ( copy_to_guest(op->u.hvmcontext.buffer, c.data, c.size) != 0 )
+            ret = -EFAULT;
+
+    gethvmcontext_out:
+        copyback = 1;
+
+        if ( c.data != NULL )
+            xfree(c.data);
+    }
+    break;
+
     default:
         ret = arch_do_domctl(op, d, u_domctl);
         break;
diff --git a/xen/include/asm-arm/hvm/support.h b/xen/include/asm-arm/hvm/support.h
new file mode 100644
index 0000000..09f7cb8
--- /dev/null
+++ b/xen/include/asm-arm/hvm/support.h
@@ -0,0 +1,29 @@
+/*
+ * asm-arm/hvm/support.h: HVM support routines used by ARM.
+ *
+ * Copyright (c) 2014, Samsung Electronics.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ */
+
+#ifndef __ASM_ARM_HVM_SUPPORT_H__
+#define __ASM_ARM_HVM_SUPPORT_H__
+
+#include <xen/types.h>
+#include <public/hvm/ioreq.h>
+#include <xen/sched.h>
+#include <xen/hvm/save.h>
+#include <asm/processor.h>
+
+#endif /* __ASM_ARM_HVM_SUPPORT_H__ */
diff --git a/xen/include/public/arch-arm/hvm/save.h b/xen/include/public/arch-arm/hvm/save.h
index 75b8e65..f6ad258 100644
--- a/xen/include/public/arch-arm/hvm/save.h
+++ b/xen/include/public/arch-arm/hvm/save.h
@@ -26,6 +26,136 @@
 #ifndef __XEN_PUBLIC_HVM_SAVE_ARM_H__
 #define __XEN_PUBLIC_HVM_SAVE_ARM_H__
 
+#define HVM_FILE_MAGIC   0x92385520
+#define HVM_FILE_VERSION 0x00000001
+
+struct hvm_save_header
+{
+    uint32_t magic;             /* Must be HVM_FILE_MAGIC */
+    uint32_t version;           /* File format version */
+    uint64_t changeset;         /* Version of Xen that saved this file */
+    uint32_t cpuid;             /* MIDR_EL1 on the saving machine */
+};
+DECLARE_HVM_SAVE_TYPE(HEADER, 1, struct hvm_save_header);
+
+struct vgic_rank
+{
+    uint32_t ienable, iactive, ipend, pendsgi;
+    uint32_t icfg[2];
+    uint32_t ipriority[8];
+    uint32_t itargets[8];
+};
+
+struct hvm_hw_gic
+{
+    uint32_t gic_hcr;
+    uint32_t gic_vmcr;
+    uint32_t gic_apr;
+    uint32_t gic_lr[64];
+    uint64_t event_mask;
+    uint64_t lr_mask;
+    struct vgic_rank ppi_state;
+};
+DECLARE_HVM_SAVE_TYPE(GIC, 2, struct hvm_hw_gic);
+
+#define TIMER_TYPE_VIRT 0
+#define TIMER_TYPE_PHYS 1
+
+struct hvm_hw_timer
+{
+    uint64_t vtb_offset;
+    uint32_t ctl;
+    uint64_t cval;
+    uint32_t type;
+};
+DECLARE_HVM_SAVE_TYPE(TIMER, 3, struct hvm_hw_timer);
+
+struct hvm_hw_cpu
+{
+#ifdef CONFIG_ARM_32
+    uint64_t vfp[34];  /* 32-bit VFP registers */
+#else
+    uint64_t vfp[66];  /* 64-bit VFP registers */
+#endif
+
+    /* Guest core registers */
+    uint64_t x0;     /* r0_usr */
+    uint64_t x1;     /* r1_usr */
+    uint64_t x2;     /* r2_usr */
+    uint64_t x3;     /* r3_usr */
+    uint64_t x4;     /* r4_usr */
+    uint64_t x5;     /* r5_usr */
+    uint64_t x6;     /* r6_usr */
+    uint64_t x7;     /* r7_usr */
+    uint64_t x8;     /* r8_usr */
+    uint64_t x9;     /* r9_usr */
+    uint64_t x10;    /* r10_usr */
+    uint64_t x11;    /* r11_usr */
+    uint64_t x12;    /* r12_usr */
+    uint64_t x13;    /* sp_usr */
+    uint64_t x14;    /* lr_usr; */
+    uint64_t x15;    /* __unused_sp_hyp */
+    uint64_t x16;    /* lr_irq */
+    uint64_t x17;    /* sp_irq */
+    uint64_t x18;    /* lr_svc */
+    uint64_t x19;    /* sp_svc */
+    uint64_t x20;    /* lr_abt */
+    uint64_t x21;    /* sp_abt */
+    uint64_t x22;    /* lr_und */
+    uint64_t x23;    /* sp_und */
+    uint64_t x24;    /* r8_fiq */
+    uint64_t x25;    /* r9_fiq */
+    uint64_t x26;    /* r10_fiq */
+    uint64_t x27;    /* r11_fiq */
+    uint64_t x28;    /* r12_fiq */
+    uint64_t x29;    /* fp,sp_fiq */
+    uint64_t x30;    /* lr_fiq */
+    uint64_t pc64;   /* ELR_EL2 */
+    uint32_t cpsr;   /* SPSR_EL2 */
+    uint32_t spsr_el1;  /*spsr_svc */
+    /* AArch32 guests only */
+    uint32_t spsr_fiq, spsr_irq, spsr_und, spsr_abt;
+    /* AArch64 guests only */
+    uint64_t sp_el0;
+    uint64_t sp_el1, elr_el1;
+
+    uint32_t sctlr, ttbcr;
+    uint64_t ttbr0, ttbr1;
+
+    uint32_t ifar, dfar;
+    uint32_t ifsr, dfsr;
+    uint32_t dacr;
+    uint64_t par;
+
+    uint64_t far;
+    uint64_t esr;
+
+    uint64_t mair0, mair1;
+    uint64_t tpidr_el0;
+    uint64_t tpidr_el1;
+    uint64_t tpidrro_el0;
+    uint64_t vbar;
+
+    /* Control Registers */
+    uint32_t actlr;
+    uint32_t cpacr;
+    uint32_t afsr0, afsr1;
+    uint32_t contextidr;
+    uint32_t teecr, teehbr; /* ThumbEE, 32-bit guests only */
+    uint32_t joscr, jmcr;
+    /* CP 15 */
+    uint32_t csselr;
+
+    unsigned long pause_flags;
+
+};
+DECLARE_HVM_SAVE_TYPE(VCPU, 4, struct hvm_hw_cpu);
+
+/*
+ * Largest type-code in use
+ */
+#define HVM_SAVE_CODE_MAX 4
+
 #endif
 
 /*
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC v2 2/6] xen/arm: implement support for XENMEM_maximum_gpfn hypercall
  2014-04-15 21:05 [RFC v2 0/6] xen/arm: Support guest VM save/restore/migration Wei Huang
                   ` (7 preceding siblings ...)
  2014-04-15 21:05 ` [RFC v2 1/6] xen/arm: Save and restore support with hvm context hypercalls Wei Huang
@ 2014-04-15 21:05 ` Wei Huang
  2014-04-15 21:05 ` [RFC v2 3/6] xen/arm: support guest do_suspend function Wei Huang
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 47+ messages in thread
From: Wei Huang @ 2014-04-15 21:05 UTC (permalink / raw)
  To: xen-devel
  Cc: w1.huang, ian.campbell, stefano.stabellini, andrew.cooper3,
	julien.grall, jaeyong.yoo, yjhyun.yoo

From: Jaeyong Yoo <jaeyong.yoo@samsung.com>

This patch implements domain_get_maximum_gpfn by using the max_mapped_gfn
field of the P2M struct. A support function to retrieve the guest VM's pfn
range is also added.
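
As a rough usage illustration (not part of this patch), a toolstack caller
can now reach this through libxc's existing wrapper; the sketch below
assumes an already-open xc_interface handle:

#include <stdio.h>
#include <xenctrl.h>

static int print_max_gpfn(xc_interface *xch, domid_t domid)
{
    /* issues XENMEM_maximum_gpfn, now backed by max_mapped_gfn on ARM */
    int max_gpfn = xc_domain_maximum_gpfn(xch, domid);

    if ( max_gpfn < 0 )
        return max_gpfn;    /* hypercall failed */

    printf("dom%d: maximum gpfn = %#x\n", domid, max_gpfn);
    return 0;
}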

Signed-off-by: Evgeny Fedotov <e.fedotov@samsung.com>
Signed-off-by: Wei Huang <w1.huang@samsung.com>
---
 xen/arch/arm/mm.c        |   21 ++++++++++++++++++++-
 xen/include/asm-arm/mm.h |    1 +
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 362bc8d..473ad04 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -947,7 +947,11 @@ int page_is_ram_type(unsigned long mfn, unsigned long mem_type)
 
 unsigned long domain_get_maximum_gpfn(struct domain *d)
 {
-    return -ENOSYS;
+    paddr_t end;
+
+    domain_get_gpfn_range(d, NULL, &end);
+
+    return (unsigned long)(end >> PAGE_SHIFT);
 }
 
 void share_xen_page_with_guest(struct page_info *page,
@@ -1235,6 +1239,21 @@ int is_iomem_page(unsigned long mfn)
         return 1;
     return 0;
 }
+
+/*
+ * Return start and end addresses of guest VM
+ */
+void domain_get_gpfn_range(struct domain *d, paddr_t *start, paddr_t *end)
+{
+    struct p2m_domain *p2m = &d->arch.p2m;
+
+    if ( start )
+        *start = GUEST_RAM_BASE;
+
+    if ( end )
+        *end = GUEST_RAM_BASE + ((paddr_t)p2m->max_mapped_gfn << PAGE_SHIFT);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index b8d4e7d..8347524 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -319,6 +319,7 @@ int donate_page(
 #define domain_set_alloc_bitsize(d) ((void)0)
 #define domain_clamp_alloc_bitsize(d, b) (b)
 
+void domain_get_gpfn_range(struct domain *d, paddr_t *start, paddr_t *end);
 unsigned long domain_get_maximum_gpfn(struct domain *d);
 
 extern struct domain *dom_xen, *dom_io, *dom_cow;
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC v2 3/6] xen/arm: support guest do_suspend function
  2014-04-15 21:05 [RFC v2 0/6] xen/arm: Support guest VM save/restore/migration Wei Huang
                   ` (8 preceding siblings ...)
  2014-04-15 21:05 ` [RFC v2 2/6] xen/arm: implement support for XENMEM_maximum_gpfn hypercall Wei Huang
@ 2014-04-15 21:05 ` Wei Huang
  2014-04-15 21:05 ` [RFC v2 4/6] xen/arm: Implement VLPT for guest p2m mapping in live migration Wei Huang
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 47+ messages in thread
From: Wei Huang @ 2014-04-15 21:05 UTC (permalink / raw)
  To: xen-devel
  Cc: w1.huang, ian.campbell, stefano.stabellini, andrew.cooper3,
	julien.grall, jaeyong.yoo, yjhyun.yoo

From: Jaeyong Yoo <jaeyong.yoo@samsung.com>

Make the sched_op call in do_suspend (drivers/xen/manage.c) return 0 on
successful suspend.
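
For context, a hedged sketch of the toolstack path that relies on this:
xc_domain_resume() with fast=0 goes through the path that invokes the new
ARM modify_returncode() below, so the guest's suspend hypercall returns a
distinguishable value in r0:

#include <xenctrl.h>

/* Cooperative resume of a suspended domain (sketch, not in this patch) */
static int resume_domain(xc_interface *xch, uint32_t domid)
{
    return xc_domain_resume(xch, domid, /* fast= */ 0);
}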

Signed-off-by: Alexey Sokolov <sokolov.a@samsung.com>
Signed-off-by: Wei Huang <w1.huang@samsung.com>
---
 tools/libxc/xc_resume.c |   25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
index 18b4818..2b09990 100644
--- a/tools/libxc/xc_resume.c
+++ b/tools/libxc/xc_resume.c
@@ -73,6 +73,31 @@ static int modify_returncode(xc_interface *xch, uint32_t domid)
     return 0;
 }
 
+#elif defined(__arm__) || defined(__aarch64__)
+
+static int modify_returncode(xc_interface *xch, uint32_t domid)
+{
+    vcpu_guest_context_any_t ctxt;
+    xc_dominfo_t info;
+    int rc;
+
+    if ( xc_domain_getinfo(xch, domid, 1, &info) != 1 )
+    {
+        PERROR("Could not get domain info");
+        return -EINVAL;
+    }
+
+    if ( (rc = xc_vcpu_getcontext(xch, domid, 0, &ctxt)) != 0 )
+        return rc;
+
+    ctxt.c.user_regs.r0_usr = 1;
+
+    if ( (rc = xc_vcpu_setcontext(xch, domid, 0, &ctxt)) != 0 )
+        return rc;
+
+    return 0;
+}
+
 #else
 
 static int modify_returncode(xc_interface *xch, uint32_t domid)
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC v2 4/6] xen/arm: Implement VLPT for guest p2m mapping in live migration
  2014-04-15 21:05 [RFC v2 0/6] xen/arm: Support guest VM save/restore/migration Wei Huang
                   ` (9 preceding siblings ...)
  2014-04-15 21:05 ` [RFC v2 3/6] xen/arm: support guest do_suspend function Wei Huang
@ 2014-04-15 21:05 ` Wei Huang
  2014-04-15 21:05 ` [RFC v2 5/6] xen/arm: Implement hypercall for dirty page tracing Wei Huang
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 47+ messages in thread
From: Wei Huang @ 2014-04-15 21:05 UTC (permalink / raw)
  To: xen-devel
  Cc: w1.huang, ian.campbell, stefano.stabellini, andrew.cooper3,
	julien.grall, jaeyong.yoo, yjhyun.yoo

From: Jaeyong Yoo <jaeyong.yoo@samsung.com>

This patch implements VLPT (virtual-linear page table) for fast access to
the 3rd-level PTEs of the guest P2M. For more info about VLPT, please see
http://www.technovelty.org/linux/virtual-linear-page-table.html.

When creating a mapping for the VLPT, we simply copy the guest p2m's
1st-level PTE into xen's 2nd-level PTE. The mapping then becomes:
      xen's 1st PTE -->
      xen's 2nd PTE (which is the same as 1st PTE of guest p2m) -->
      guest p2m's 2nd PTE -->
      guest p2m's 3rd PTE (the memory contents where the vlpt points)

This mapping is used in dirty-page tracing: when a domU write fault is
trapped by xen, xen can immediately locate the leaf (3rd-level) PTE of
the guest p2m.

The following link shows a performance comparison between VLPT and a
typical page-table walk when handling a dirty page:
http://lists.xen.org/archives/html/xen-devel/2013-08/msg01503.html
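
As an illustration only (constants copied from the config.h hunk below),
the address arithmetic the VLPT buys us is a single shift per GPA:

#include <stdint.h>

#define VIRT_LIN_P2M_START 0x08000000UL  /* from asm-arm/config.h below */
#define PAGE_SHIFT         12
#define PTE_SIZE           8             /* sizeof(lpae_t) */

/* Sketch of get_vlpt_3lvl_pte(): one shift turns a GPA into the virtual
 * address of its leaf (3rd-level) p2m entry in the linear window. */
static inline uintptr_t vlpt_pte_va(uint64_t gpa)
{
    /* e.g. gpa 0x40100000 -> index 0x40100 -> 0x08000000 + 0x40100 * 8 */
    return VIRT_LIN_P2M_START + (gpa >> PAGE_SHIFT) * PTE_SIZE;
}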

Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com>
---
 xen/arch/arm/domain.c        |    5 ++
 xen/arch/arm/mm.c            |  116 ++++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-arm/config.h |    7 +++
 xen/include/asm-arm/domain.h |    7 +++
 xen/include/asm-arm/mm.h     |   17 +++++++
 5 files changed, 152 insertions(+)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index b125857..3f04a77 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -502,6 +502,11 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
     /* Default the virtual ID to match the physical */
     d->arch.vpidr = boot_cpu_data.midr.bits;
 
+    d->arch.dirty.second_lvl_start = 0;
+    d->arch.dirty.second_lvl_end = 0;
+    d->arch.dirty.second_lvl[0] = NULL;
+    d->arch.dirty.second_lvl[1] = NULL;
+
     clear_page(d->shared_info);
     share_xen_page_with_guest(
         virt_to_page(d->shared_info), d, XENSHARE_writable);
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 473ad04..a315752 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -750,6 +750,122 @@ void *__init arch_vmap_virt_end(void)
     return (void *)VMAP_VIRT_END;
 }
 
+/* Flush the vlpt area */
+void flush_vlpt(struct domain *d)
+{
+    int flush_size;
+
+    flush_size = (d->arch.dirty.second_lvl_end -
+                  d->arch.dirty.second_lvl_start) << SECOND_SHIFT;
+
+    /* flushing the 3rd level mapping */
+    flush_xen_data_tlb_range_va(d->arch.dirty.second_lvl_start << SECOND_SHIFT,
+                                flush_size);
+}
+
+/* Restore the xen page table for vlpt mapping for domain */
+void restore_vlpt(struct domain *d)
+{
+    int i;
+
+    dsb(sy);
+
+    for ( i = d->arch.dirty.second_lvl_start; i < d->arch.dirty.second_lvl_end;
+          ++i )
+    {
+        int k = i % LPAE_ENTRIES;
+        int l = i / LPAE_ENTRIES;
+
+        if ( xen_second[i].bits != d->arch.dirty.second_lvl[l][k].bits )
+        {
+            write_pte(&xen_second[i], d->arch.dirty.second_lvl[l][k]);
+            flush_xen_data_tlb_range_va(i << SECOND_SHIFT, 1 << SECOND_SHIFT);
+        }
+    }
+
+    dsb(sy);
+    isb();
+}
+
+/* Set up the xen page table for vlpt mapping for domain */
+int prepare_vlpt(struct domain *d)
+{
+    int xen_second_linear_base;
+    int gp2m_start_index, gp2m_end_index;
+    struct p2m_domain *p2m = &d->arch.p2m;
+    struct page_info *second_lvl_page;
+    paddr_t gma_start = 0;
+    paddr_t gma_end = 0;
+    lpae_t *first[2];
+    int i;
+    uint64_t required, avail = VIRT_LIN_P2M_END - VIRT_LIN_P2M_START;
+
+    domain_get_gpfn_range(d, &gma_start, &gma_end);
+    required = (gma_end - gma_start) >> LPAE_SHIFT;
+
+    if ( required > avail )
+    {
+        dprintk(XENLOG_ERR, "Available VLPT is small for domU guest"
+                "(avail: %llx, required: %llx)\n", (unsigned long long)avail,
+                (unsigned long long)required);
+        return -ENOMEM;
+    }
+
+    xen_second_linear_base = second_linear_offset(VIRT_LIN_P2M_START);
+
+    gp2m_start_index = gma_start >> FIRST_SHIFT;
+    gp2m_end_index = (gma_end >> FIRST_SHIFT) + 1;
+
+    if ( xen_second_linear_base + gp2m_end_index >= LPAE_ENTRIES * 2 )
+    {
+        dprintk(XENLOG_ERR, "xen second page is small for VLPT for domU");
+        return -ENOMEM;
+    }
+
+    second_lvl_page = alloc_domheap_pages(NULL, 1, 0);
+    if ( second_lvl_page == NULL )
+        return -ENOMEM;
+
+    /* First level p2m is 2 consecutive pages */
+    d->arch.dirty.second_lvl[0] = map_domain_page_global(
+        page_to_mfn(second_lvl_page) );
+    d->arch.dirty.second_lvl[1] = map_domain_page_global(
+        page_to_mfn(second_lvl_page+1) );
+
+    first[0] = __map_domain_page(p2m->first_level);
+    first[1] = __map_domain_page(p2m->first_level+1);
+
+    for ( i = gp2m_start_index; i < gp2m_end_index; ++i )
+    {
+        int k = i % LPAE_ENTRIES;
+        int l = i / LPAE_ENTRIES;
+        int k2 = (xen_second_linear_base + i) % LPAE_ENTRIES;
+        int l2 = (xen_second_linear_base + i) / LPAE_ENTRIES;
+
+        write_pte(&xen_second[xen_second_linear_base+i], first[l][k]);
+
+        /* Copy the mapping into the domain's structure as a reference,
+         * for restoring it at context-switch time (see restore_vlpt) */
+        d->arch.dirty.second_lvl[l2][k2] = first[l][k];
+    }
+    unmap_domain_page(first[0]);
+    unmap_domain_page(first[1]);
+
+    /* storing the start and end index */
+    d->arch.dirty.second_lvl_start = xen_second_linear_base + gp2m_start_index;
+    d->arch.dirty.second_lvl_end = xen_second_linear_base + gp2m_end_index;
+
+    flush_vlpt(d);
+
+    return 0;
+}
+
+void cleanup_vlpt(struct domain *d)
+{
+    /* First level p2m is 2 consecutive pages */
+    unmap_domain_page_global(d->arch.dirty.second_lvl[0]);
+    unmap_domain_page_global(d->arch.dirty.second_lvl[1]);
+}
 /*
  * This function should only be used to remap device address ranges
  * TODO: add a check to verify this assumption
diff --git a/xen/include/asm-arm/config.h b/xen/include/asm-arm/config.h
index ef291ff..47d1bce 100644
--- a/xen/include/asm-arm/config.h
+++ b/xen/include/asm-arm/config.h
@@ -87,6 +87,7 @@
  *   0  -   8M   <COMMON>
  *
  *  32M - 128M   Frametable: 24 bytes per page for 16GB of RAM
+ * 128M - 256M   Virtual-linear mapping to P2M table
  * 256M -   1G   VMAP: ioremap and early_ioremap use this virtual address
  *                    space
  *
@@ -124,7 +125,9 @@
 #define CONFIG_SEPARATE_XENHEAP 1
 
 #define FRAMETABLE_VIRT_START  _AT(vaddr_t,0x02000000)
+#define VIRT_LIN_P2M_START     _AT(vaddr_t,0x08000000)
 #define VMAP_VIRT_START  _AT(vaddr_t,0x10000000)
+#define VIRT_LIN_P2M_END       VMAP_VIRT_START
 #define XENHEAP_VIRT_START     _AT(vaddr_t,0x40000000)
 #define XENHEAP_VIRT_END       _AT(vaddr_t,0x7fffffff)
 #define DOMHEAP_VIRT_START     _AT(vaddr_t,0x80000000)
@@ -157,6 +160,10 @@
 
 #define HYPERVISOR_VIRT_END    DIRECTMAP_VIRT_END
 
+/* VIRT_LIN_P2M_START and VIRT_LIN_P2M_END for vlpt */
+#define VIRT_LIN_P2M_START     _AT(vaddr_t, 0x08000000)
+#define VIRT_LIN_P2M_END       VMAP_VIRT_START
+
 #endif
 
 /* Fixmap slots */
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 28c359a..5321bd6 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -161,6 +161,13 @@ struct arch_domain
         spinlock_t                  lock;
     } vuart;
 
+    /* dirty-page tracing */
+    struct {
+        volatile int second_lvl_start;   /* for context switch */
+        volatile int second_lvl_end;
+        lpae_t *second_lvl[2];           /* copy of guest p2m's first */
+    } dirty;
+
     unsigned int evtchn_irq;
 }  __cacheline_aligned;
 
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index 8347524..5fd684f 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -4,6 +4,7 @@
 #include <xen/config.h>
 #include <xen/kernel.h>
 #include <asm/page.h>
+#include <asm/config.h>
 #include <public/xen.h>
 
 /* Align Xen to a 2 MiB boundary. */
@@ -342,6 +343,22 @@ static inline void put_page_and_type(struct page_info *page)
     put_page(page);
 }
 
+int prepare_vlpt(struct domain *d);
+void cleanup_vlpt(struct domain *d);
+void restore_vlpt(struct domain *d);
+
+/* calculate the xen's virtual address for accessing the leaf PTE of
+ * a given address (GPA) */
+static inline lpae_t * get_vlpt_3lvl_pte(paddr_t addr)
+{
+    lpae_t *table = (lpae_t *)VIRT_LIN_P2M_START;
+
+    /* Since we slotted the guest's first p2m page table to xen's
+     * second page table, one shift is enough for calculating the
+     * index of guest p2m table entry */
+    return &table[addr >> PAGE_SHIFT];
+}
+
 #endif /*  __ARCH_ARM_MM__ */
 /*
  * Local variables:
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC v2 5/6] xen/arm: Implement hypercall for dirty page tracing
  2014-04-15 21:05 [RFC v2 0/6] xen/arm: Support guest VM save/restore/migration Wei Huang
                   ` (10 preceding siblings ...)
  2014-04-15 21:05 ` [RFC v2 4/6] xen/arm: Implement VLPT for guest p2m mapping in live migration Wei Huang
@ 2014-04-15 21:05 ` Wei Huang
  2014-04-15 23:35   ` Julien Grall
  2014-04-23 11:59   ` Julien Grall
  2014-04-15 21:05 ` [RFC v2 6/6] xen/arm: Implement toolstack for xl restore/save and migrate Wei Huang
                   ` (2 subsequent siblings)
  14 siblings, 2 replies; 47+ messages in thread
From: Wei Huang @ 2014-04-15 21:05 UTC (permalink / raw)
  To: xen-devel
  Cc: w1.huang, ian.campbell, stefano.stabellini, andrew.cooper3,
	julien.grall, jaeyong.yoo, yjhyun.yoo

From: Jaeyong Yoo <jaeyong.yoo@samsung.com>

This patch adds hypercall for shadow operations, including enable/disable
and clean/peek dirty page bitmap.

The design consists of two parts: dirty-page detection and recording. For
detection, we set the guest p2m's leaf PTEs read-only; whenever the guest
tries to write to them, a permission fault is taken into xen. The faulting
GPA must be recorded for the toolstack, which checks which pages are dirty;
for this purpose xen temporarily records the GPAs in a bitmap.
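
From the toolstack side, the interface is driven through xc_shadow_control()
(these are the calls patch 6/6 makes; buffer setup elided):

/* 1) put all leaf PTEs into p2m_ram_logdirty (read-only) */
xc_shadow_control(xch, dom, XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY,
                  NULL, 0, NULL, 0, NULL);

/* 2) per iteration: fetch this round's dirty bitmap and clear it */
xc_shadow_control(xch, dom, XEN_DOMCTL_SHADOW_OP_CLEAN,
                  HYPERCALL_BUFFER(to_send), mem_size, NULL, 0, NULL);

/* 3) once converged: restore normal read-write mappings */
xc_shadow_control(xch, dom, XEN_DOMCTL_SHADOW_OP_OFF,
                  NULL, 0, NULL, 0, NULL);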

Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com>
Signed-off-by: Wei Huang <w1.huang@samsung.com>
---
 xen/arch/arm/domain.c           |   14 +++
 xen/arch/arm/domctl.c           |   21 ++++
 xen/arch/arm/mm.c               |  105 ++++++++++++++++++-
 xen/arch/arm/p2m.c              |  211 +++++++++++++++++++++++++++++++++++++++
 xen/arch/arm/traps.c            |   11 ++
 xen/include/asm-arm/domain.h    |    7 ++
 xen/include/asm-arm/mm.h        |   10 ++
 xen/include/asm-arm/p2m.h       |    8 +-
 xen/include/asm-arm/processor.h |    2 +
 9 files changed, 387 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 3f04a77..d2531ed 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -207,6 +207,12 @@ static void ctxt_switch_to(struct vcpu *n)
 
     isb();
 
+    /* Dirty-page tracing
+     * NB: how should the SMP case be handled?
+     */
+    if ( n->domain->arch.dirty.mode )
+        restore_vlpt(n->domain);
+
     /* This is could trigger an hardware interrupt from the virtual
      * timer. The interrupt needs to be injected into the guest. */
     virt_timer_restore(n);
@@ -502,11 +508,19 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
     /* Default the virtual ID to match the physical */
     d->arch.vpidr = boot_cpu_data.midr.bits;
 
+    /* init for dirty-page tracing */
+    d->arch.dirty.count = 0;
+    d->arch.dirty.mode = 0;
+    spin_lock_init(&d->arch.dirty.lock);
+
     d->arch.dirty.second_lvl_start = 0;
     d->arch.dirty.second_lvl_end = 0;
     d->arch.dirty.second_lvl[0] = NULL;
     d->arch.dirty.second_lvl[1] = NULL;
 
+    memset(d->arch.dirty.bitmap, 0, sizeof(d->arch.dirty.bitmap));
+    d->arch.dirty.bitmap_pages = 0;
+
     clear_page(d->shared_info);
     share_xen_page_with_guest(
         virt_to_page(d->shared_info), d, XENSHARE_writable);
diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c
index 45974e7..e84651f 100644
--- a/xen/arch/arm/domctl.c
+++ b/xen/arch/arm/domctl.c
@@ -11,12 +11,33 @@
 #include <xen/sched.h>
 #include <xen/hypercall.h>
 #include <public/domctl.h>
+#include <xen/hvm/save.h>
+#include <xen/guest_access.h>
+
 
 long arch_do_domctl(struct xen_domctl *domctl, struct domain *d,
                     XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
 {
+    long ret = 0;
+
     switch ( domctl->cmd )
     {
+    case XEN_DOMCTL_shadow_op:
+    {
+        if ( d == current->domain ) /* no domain_pause() */
+            return -EINVAL;
+
+        domain_pause(d);
+        ret = dirty_mode_op(d, &domctl->u.shadow_op);
+        domain_unpause(d);
+
+        if ( __copy_to_guest(u_domctl, domctl, 1) )
+            ret = -EFAULT;
+
+        return ret;
+    }
+    break;
+
     case XEN_DOMCTL_cacheflush:
     {
         unsigned long s = domctl->u.cacheflush.start_pfn;
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index a315752..ae852eb 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -981,7 +981,6 @@ void destroy_xen_mappings(unsigned long v, unsigned long e)
     create_xen_entries(REMOVE, v, 0, (e - v) >> PAGE_SHIFT, 0);
 }
 
-enum mg { mg_clear, mg_ro, mg_rw, mg_rx };
 static void set_pte_flags_on_range(const char *p, unsigned long l, enum mg mg)
 {
     lpae_t pte;
@@ -1370,6 +1369,110 @@ void domain_get_gpfn_range(struct domain *d, paddr_t *start, paddr_t *end)
         *end = GUEST_RAM_BASE + ((paddr_t) p2m->max_mapped_gfn);
 }
 
+static inline void mark_dirty_bitmap(struct domain *d, paddr_t addr)
+{
+    paddr_t ram_base = (paddr_t) GUEST_RAM_BASE;
+    int bit_index = PFN_DOWN(addr - ram_base);
+    int page_index = bit_index >> (PAGE_SHIFT + 3);
+    int bit_index_residual = bit_index & ((1ul << (PAGE_SHIFT + 3)) - 1);
+
+    set_bit(bit_index_residual, d->arch.dirty.bitmap[page_index]);
+}
+
+/* Routine for dirty-page tracing
+ *
+ * On first write, it page faults, its entry is changed to read-write,
+ * and on retry the write succeeds. For locating p2m of the faulting entry,
+ * we use virtual-linear page table.
+ *
+ * Returns zero if addr is not valid or dirty mode is not set
+ */
+int handle_page_fault(struct domain *d, paddr_t addr)
+{
+
+    lpae_t *vlp2m_pte = 0;
+    paddr_t gma_start = 0;
+    paddr_t gma_end = 0;
+
+    if ( !d->arch.dirty.mode )
+        return 0;
+
+    domain_get_gpfn_range(d, &gma_start, &gma_end);
+    /* Ensure that addr is inside guest's RAM */
+    if ( addr < gma_start || addr > gma_end )
+        return 0;
+
+    vlp2m_pte = get_vlpt_3lvl_pte(addr);
+    if ( vlp2m_pte->p2m.valid && vlp2m_pte->p2m.write == 0 &&
+         vlp2m_pte->p2m.type == p2m_ram_logdirty )
+    {
+        lpae_t pte = *vlp2m_pte;
+        pte.p2m.write = 1;
+        write_pte(vlp2m_pte, pte);
+        flush_tlb_local();
+
+        /* only necessary to lock between get-dirty bitmap and mark dirty
+         * bitmap. If get-dirty bitmap happens immediately before this
+         * lock, the corresponding dirty-page would be marked at the next
+         * round of get-dirty bitmap */
+        spin_lock(&d->arch.dirty.lock);
+        mark_dirty_bitmap(d, addr);
+        spin_unlock(&d->arch.dirty.lock);
+    }
+
+    return 1;
+}
+
+int prepare_bitmap(struct domain *d)
+{
+    paddr_t gma_start = 0;
+    paddr_t gma_end = 0;
+    int nr_bytes;
+    int nr_pages;
+    int i;
+
+    domain_get_gpfn_range(d, &gma_start, &gma_end);
+
+    nr_bytes = (PFN_DOWN(gma_end - gma_start) + 7) / 8;
+    nr_pages = (nr_bytes + PAGE_SIZE - 1) / PAGE_SIZE;
+
+    BUG_ON( nr_pages > MAX_DIRTY_BITMAP_PAGES );
+
+    for ( i = 0; i < nr_pages; ++i )
+    {
+        struct page_info *page;
+
+        page = alloc_domheap_page(NULL, 0);
+        if ( page == NULL )
+            goto cleanup_on_failure;
+
+        d->arch.dirty.bitmap[i] = map_domain_page_global(__page_to_mfn(page));
+        clear_page(d->arch.dirty.bitmap[i]);
+    }
+
+    d->arch.dirty.bitmap_pages = nr_pages;
+    return 0;
+
+cleanup_on_failure:
+    nr_pages = i;
+    for ( i = 0; i < nr_pages; ++i )
+    {
+        unmap_domain_page_global(d->arch.dirty.bitmap[i]);
+    }
+
+    return -ENOMEM;
+}
+
+void cleanup_bitmap(struct domain *d)
+{
+    int i;
+
+    for ( i = 0; i < d->arch.dirty.bitmap_pages; ++i )
+    {
+        unmap_domain_page_global(d->arch.dirty.bitmap[i]);
+    }
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 403fd89..d57a44a 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -6,6 +6,8 @@
 #include <xen/bitops.h>
 #include <asm/flushtlb.h>
 #include <asm/gic.h>
+#include <xen/guest_access.h>
+#include <xen/pfn.h>
 #include <asm/event.h>
 #include <asm/hardirq.h>
 #include <asm/page.h>
@@ -208,6 +210,7 @@ static lpae_t mfn_to_p2m_entry(unsigned long mfn, unsigned int mattr,
         break;
 
     case p2m_ram_ro:
+    case p2m_ram_logdirty:
         e.p2m.xn = 0;
         e.p2m.write = 0;
         break;
@@ -261,6 +264,10 @@ static int p2m_create_table(struct domain *d,
 
     pte = mfn_to_p2m_entry(page_to_mfn(page), MATTR_MEM, p2m_invalid);
 
+    /* Clear the write bit (the ro bit, since this is a page-table entry)
+     * so the entry stays writable when accessed through the VLPT */
+    pte.pt.ro = 0;
+
     write_pte(entry, pte);
 
     return 0;
@@ -697,6 +704,210 @@ unsigned long gmfn_to_mfn(struct domain *d, unsigned long gpfn)
     return p >> PAGE_SHIFT;
 }
 
+/* Change types across all p2m entries in a domain */
+void p2m_change_entry_type_global(struct domain *d, enum mg nt)
+{
+    struct p2m_domain *p2m = &d->arch.p2m;
+    paddr_t ram_base;
+    int i1, i2, i3;
+    int first_index, second_index, third_index;
+    lpae_t *first = __map_domain_page(p2m->first_level);
+    lpae_t pte, *second = NULL, *third = NULL;
+
+    domain_get_gpfn_range(d, &ram_base, NULL);
+
+    first_index = first_table_offset((uint64_t)ram_base);
+    second_index = second_table_offset((uint64_t)ram_base);
+    third_index = third_table_offset((uint64_t)ram_base);
+
+    BUG_ON( !first && "Can't map first level p2m." );
+
+    spin_lock(&p2m->lock);
+
+    for ( i1 = first_index; i1 < LPAE_ENTRIES*2; ++i1 )
+    {
+        lpae_walk_t first_pte = first[i1].walk;
+
+        if ( !first_pte.valid || !first_pte.table )
+            goto out;
+
+        second = map_domain_page(first_pte.base);
+        BUG_ON( !second && "Can't map second level p2m.");
+
+        for ( i2 = second_index; i2 < LPAE_ENTRIES; ++i2 )
+        {
+            lpae_walk_t second_pte = second[i2].walk;
+
+            if ( !second_pte.valid || !second_pte.table )
+                goto out;
+
+            third = map_domain_page(second_pte.base);
+            BUG_ON( !third && "Can't map third level p2m.");
+
+            for ( i3 = third_index; i3 < LPAE_ENTRIES; ++i3 )
+            {
+
+                lpae_walk_t third_pte = third[i3].walk;
+                if ( !third_pte.valid )
+                    goto out;
+
+                pte = third[i3];
+                if ( nt == mg_ro )
+                {
+                    if ( pte.p2m.write == 1 )
+                    {
+                        pte.p2m.write = 0;
+                        pte.p2m.type = p2m_ram_logdirty;
+                    }
+                    else
+                    {
+                        /* reuse avail bit as an indicator of 'actual'
+                         * read-only */
+                        pte.p2m.type = p2m_ram_rw;
+                    }
+                }
+                else if ( nt == mg_rw )
+                {
+                    if ( pte.p2m.write == 0 &&
+                         pte.p2m.type == p2m_ram_logdirty )
+                    {
+                        /* restore write access and the normal RAM type */
+                        pte.p2m.write = 1;
+                        pte.p2m.type = p2m_ram_rw;
+                    }
+                }
+                write_pte(&third[i3], pte);
+            }
+            unmap_domain_page(third);
+
+            third = NULL;
+            third_index = 0;
+        }
+        unmap_domain_page(second);
+
+        second = NULL;
+        second_index = 0;
+        third_index = 0;
+    }
+
+out:
+    flush_tlb_all_local();
+    if ( third ) unmap_domain_page(third);
+    if ( second ) unmap_domain_page(second);
+    if ( first ) unmap_domain_page(first);
+
+    spin_unlock(&p2m->lock);
+}
+
+/* Read a domain's log-dirty bitmap and stats.
+ * If the operation is a CLEAN, clear the bitmap and stats. */
+int log_dirty_op(struct domain *d, xen_domctl_shadow_op_t *sc)
+{
+    int peek = 1;
+    int i;
+    int bitmap_size;
+    paddr_t gma_start, gma_end;
+
+    /* this hypercall is called from domain 0, and we don't know which guest's
+     * vlpt is mapped in xen_second, so, to be sure, we restore vlpt here */
+    restore_vlpt(d);
+
+    domain_get_gpfn_range(d, &gma_start, &gma_end);
+    bitmap_size = (PFN_DOWN(gma_end - gma_start) + 7) / 8;
+
+    if ( guest_handle_is_null(sc->dirty_bitmap) )
+    {
+        peek = 0;
+    }
+    else
+    {
+        spin_lock(&d->arch.dirty.lock);
+        for ( i = 0; i < d->arch.dirty.bitmap_pages; ++i )
+        {
+            int j = 0;
+            uint8_t *bitmap;
+            copy_to_guest_offset(sc->dirty_bitmap, i * PAGE_SIZE,
+                                 d->arch.dirty.bitmap[i],
+                                 bitmap_size < PAGE_SIZE ? bitmap_size :
+                                                           PAGE_SIZE);
+            bitmap_size -= PAGE_SIZE;
+
+            /* set p2m page table read-only */
+            bitmap = d->arch.dirty.bitmap[i];
+            while ((j = find_next_bit((const unsigned long *)bitmap,
+                                      PAGE_SIZE*8, j)) < PAGE_SIZE*8)
+            {
+                lpae_t *vlpt;
+                paddr_t addr = gma_start + (i << (2*PAGE_SHIFT+3)) +
+                    (j << PAGE_SHIFT);
+                vlpt = get_vlpt_3lvl_pte(addr);
+                vlpt->p2m.write = 0;
+                j++;
+            }
+        }
+
+        if ( sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN )
+        {
+            for ( i = 0; i < d->arch.dirty.bitmap_pages; ++i )
+            {
+                clear_page(d->arch.dirty.bitmap[i]);
+            }
+        }
+
+        spin_unlock(&d->arch.dirty.lock);
+        flush_tlb_local();
+    }
+
+    sc->stats.dirty_count = d->arch.dirty.count;
+
+    return 0;
+}
+
+long dirty_mode_op(struct domain *d, xen_domctl_shadow_op_t *sc)
+{
+    long ret = 0;
+    switch (sc->op)
+    {
+        case XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY:
+        case XEN_DOMCTL_SHADOW_OP_OFF:
+        {
+            enum mg nt = sc->op == XEN_DOMCTL_SHADOW_OP_OFF ? mg_rw : mg_ro;
+
+            d->arch.dirty.mode = sc->op == XEN_DOMCTL_SHADOW_OP_OFF ? 0 : 1;
+            p2m_change_entry_type_global(d, nt);
+
+            if ( sc->op == XEN_DOMCTL_SHADOW_OP_OFF )
+            {
+                cleanup_vlpt(d);
+                cleanup_bitmap(d);
+            }
+            else
+            {
+                if ( (ret = prepare_vlpt(d)) )
+                   return ret;
+
+                if ( (ret = prepare_bitmap(d)) )
+                {
+                   /* in case of failure, we have to cleanup vlpt */
+                   cleanup_vlpt(d);
+                   return ret;
+                }
+            }
+        }
+        break;
+
+        case XEN_DOMCTL_SHADOW_OP_CLEAN:
+        case XEN_DOMCTL_SHADOW_OP_PEEK:
+        {
+            ret = log_dirty_op(d, sc);
+        }
+        break;
+
+        default:
+            return -ENOSYS;
+    }
+
+    return ret;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index a7edc4e..cca34e9 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1491,6 +1491,8 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs,
     struct hsr_dabt dabt = hsr.dabt;
     int rc;
     mmio_info_t info;
+    int page_fault = ( (dabt.dfsc & FSC_MASK) ==
+                       (FSC_FLT_PERM | FSC_3D_LEVEL) && dabt.write );
 
     if ( !check_conditional_instr(regs, hsr) )
     {
@@ -1512,6 +1514,15 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs,
     if ( rc == -EFAULT )
         goto bad_data_abort;
 
+    /* domU page fault handling for guest live migration; dabt.valid can
+     * be 0 here.
+     */
+    if ( page_fault && handle_page_fault(current->domain, info.gpa) )
+    {
+        /* Do not modify pc after page fault to repeat memory operation */
+        return;
+    }
+
     /* XXX: Decode the instruction if ISS is not valid */
     if ( !dabt.valid )
         goto bad_data_abort;
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 5321bd6..99f9f51 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -163,9 +163,16 @@ struct arch_domain
 
     /* dirty-page tracing */
     struct {
+#define MAX_DIRTY_BITMAP_PAGES 64        /* support up to 8GB guest memory */
+        spinlock_t lock;                 /* protect the dirty bitmap */
+        volatile int mode;               /* 1 if dirty pages tracing enabled */
+        volatile unsigned int count;     /* dirty pages counter */
         volatile int second_lvl_start;   /* for context switch */
         volatile int second_lvl_end;
         lpae_t *second_lvl[2];           /* copy of guest p2m's first */
+        /* dirty bitmap */
+        uint8_t *bitmap[MAX_DIRTY_BITMAP_PAGES];
+        int bitmap_pages;                /* number of dirty bitmap pages */
     } dirty;
 
     unsigned int evtchn_irq;
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index 5fd684f..5f9478b 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -343,10 +343,18 @@ static inline void put_page_and_type(struct page_info *page)
     put_page(page);
 }
 
+enum mg { mg_clear, mg_ro, mg_rw, mg_rx };
+
+/* routine for dirty-page tracing */
+int handle_page_fault(struct domain *d, paddr_t addr);
+
 int prepare_vlpt(struct domain *d);
 void cleanup_vlpt(struct domain *d);
 void restore_vlpt(struct domain *d);
 
+int prepare_bitmap(struct domain *d);
+void cleanup_bitmap(struct domain *d);
+
 /* calculate the xen's virtual address for accessing the leaf PTE of
  * a given address (GPA) */
 static inline lpae_t * get_vlpt_3lvl_pte(paddr_t addr)
@@ -359,6 +367,8 @@ static inline lpae_t * get_vlpt_3lvl_pte(paddr_t addr)
     return &table[addr >> PAGE_SHIFT];
 }
 
 #endif /*  __ARCH_ARM_MM__ */
 /*
  * Local variables:
diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
index bd71abe..0cecbe7 100644
--- a/xen/include/asm-arm/p2m.h
+++ b/xen/include/asm-arm/p2m.h
@@ -2,6 +2,7 @@
 #define _XEN_P2M_H
 
 #include <xen/mm.h>
+#include <public/domctl.h>
 
 struct domain;
 
@@ -41,6 +42,7 @@ typedef enum {
     p2m_invalid = 0,    /* Nothing mapped here */
     p2m_ram_rw,         /* Normal read/write guest RAM */
     p2m_ram_ro,         /* Read-only; writes are silently dropped */
+    p2m_ram_logdirty,   /* Read-only: special mode for log dirty */
     p2m_mmio_direct,    /* Read/write mapping of genuine MMIO area */
     p2m_map_foreign,    /* Ram pages from foreign domain */
     p2m_grant_map_rw,   /* Read/write grant mapping */
@@ -49,7 +51,8 @@ typedef enum {
 } p2m_type_t;
 
 #define p2m_is_foreign(_t)  ((_t) == p2m_map_foreign)
-#define p2m_is_ram(_t)      ((_t) == p2m_ram_rw || (_t) == p2m_ram_ro)
+#define p2m_is_ram(_t)      ((_t) == p2m_ram_rw || (_t) == p2m_ram_ro ||  \
+                             (_t) == p2m_ram_logdirty)
 
 /* Initialise vmid allocator */
 void p2m_vmid_allocator_init(void);
@@ -178,6 +181,9 @@ static inline int get_page_and_type(struct page_info *page,
     return rc;
 }
 
+void p2m_change_entry_type_global(struct domain *d, enum mg nt);
+long dirty_mode_op(struct domain *d, xen_domctl_shadow_op_t *sc);
+
 #endif /* _XEN_P2M_H */
 
 /*
diff --git a/xen/include/asm-arm/processor.h b/xen/include/asm-arm/processor.h
index 06e638f..9dc49c3 100644
--- a/xen/include/asm-arm/processor.h
+++ b/xen/include/asm-arm/processor.h
@@ -399,6 +399,8 @@ union hsr {
 #define FSC_CPR        (0x3a) /* Coprocossor Abort */
 
 #define FSC_LL_MASK    (_AC(0x03,U)<<0)
+#define FSC_MASK       (0x3f) /* Fault status mask */
+#define FSC_3D_LEVEL   (0x03) /* Third level fault*/
 
 /* Time counter hypervisor control register */
 #define CNTHCTL_PA      (1u<<0)  /* Kernel/user access to physical counter */
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC v2 6/6] xen/arm: Implement toolstack for xl restore/save and migrate
  2014-04-15 21:05 [RFC v2 0/6] xen/arm: Support guest VM save/restore/migration Wei Huang
                   ` (11 preceding siblings ...)
  2014-04-15 21:05 ` [RFC v2 5/6] xen/arm: Implement hypercall for dirty page tracing Wei Huang
@ 2014-04-15 21:05 ` Wei Huang
  2014-04-16 16:29 ` [RFC v2 0/6] xen/arm: Support guest VM save/restore/migration Julien Grall
  2014-04-23 11:49 ` Ian Campbell
  14 siblings, 0 replies; 47+ messages in thread
From: Wei Huang @ 2014-04-15 21:05 UTC (permalink / raw)
  To: xen-devel
  Cc: w1.huang, ian.campbell, stefano.stabellini, andrew.cooper3,
	julien.grall, jaeyong.yoo, yjhyun.yoo

From: Jaeyong Yoo <jaeyong.yoo@samsung.com>

This patch implements the xl save/restore operations in xc_arm_migrate.c
and makes them compilable with the existing design. The same path is also
used by migration.

The overall process of save is the following (sketched in code after the
list):
1) save guest parameters (i.e., memory map, console and store pfn, etc)
2) save memory (if it is live migration, perform dirty-page tracing)
3) save hvm states (i.e., gic, timer, vcpu etc)
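
A hedged sketch of that ordering (save_guest_params() is an assumed name;
save_memory() and save_armhvm() are the functions added below):

static int arm_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
                           struct save_callbacks *cb, uint32_t max_iters,
                           uint32_t max_factor, guest_params_t *params)
{
    if ( save_guest_params(xch, io_fd, dom, params) )    /* step 1 */
        return -1;
    if ( save_memory(xch, io_fd, dom, cb, max_iters,     /* step 2 */
                     max_factor, params) )
        return -1;
    if ( save_armhvm(xch, io_fd, dom,                    /* step 3 */
                     !!(params->flags & XCFLAGS_DEBUG)) )
        return -1;
    return 0;
}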

Signed-off-by: Alexey Sokolov <sokolov.a@samsung.com>
Signed-off-by: Wei Huang <w1.huang@samsung.com>
---
 config/arm32.mk              |    1 +
 config/arm64.mk              |    1 +
 tools/libxc/Makefile         |    6 +-
 tools/libxc/xc_arm_migrate.c |  702 ++++++++++++++++++++++++++++++++++++++++++
 tools/libxc/xc_dom_arm.c     |    4 +-
 tools/libxl/libxl.h          |    3 -
 tools/misc/Makefile          |    4 +-
 7 files changed, 714 insertions(+), 7 deletions(-)
 create mode 100644 tools/libxc/xc_arm_migrate.c

diff --git a/config/arm32.mk b/config/arm32.mk
index aa79d22..01374c9 100644
--- a/config/arm32.mk
+++ b/config/arm32.mk
@@ -1,6 +1,7 @@
 CONFIG_ARM := y
 CONFIG_ARM_32 := y
 CONFIG_ARM_$(XEN_OS) := y
+CONFIG_MIGRATE := y
 
 CONFIG_XEN_INSTALL_SUFFIX :=
 
diff --git a/config/arm64.mk b/config/arm64.mk
index 15b57a4..7ac3b65 100644
--- a/config/arm64.mk
+++ b/config/arm64.mk
@@ -1,6 +1,7 @@
 CONFIG_ARM := y
 CONFIG_ARM_64 := y
 CONFIG_ARM_$(XEN_OS) := y
+CONFIG_MIGRATE := y
 
 CONFIG_XEN_INSTALL_SUFFIX :=
 
diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index 2cca2b2..6b90b1c 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -43,8 +43,13 @@ CTRL_SRCS-$(CONFIG_MiniOS) += xc_minios.c
 GUEST_SRCS-y :=
 GUEST_SRCS-y += xg_private.c xc_suspend.c
 ifeq ($(CONFIG_MIGRATE),y)
+ifeq ($(CONFIG_X86),y)
 GUEST_SRCS-y += xc_domain_restore.c xc_domain_save.c
 GUEST_SRCS-y += xc_offline_page.c xc_compression.c
+endif
+ifeq ($(CONFIG_ARM),y)
+GUEST_SRCS-y += xc_arm_migrate.c
+endif
 else
 GUEST_SRCS-y += xc_nomigrate.c
 endif
@@ -64,7 +69,6 @@ $(patsubst %.c,%.opic,$(ELF_SRCS-y)): CFLAGS += -Wno-pointer-sign
 GUEST_SRCS-y                 += xc_dom_core.c xc_dom_boot.c
 GUEST_SRCS-y                 += xc_dom_elfloader.c
 GUEST_SRCS-$(CONFIG_X86)     += xc_dom_bzimageloader.c
-GUEST_SRCS-$(CONFIG_X86)     += xc_dom_decompress_lz4.c
 GUEST_SRCS-$(CONFIG_ARM)     += xc_dom_armzimageloader.c
 GUEST_SRCS-y                 += xc_dom_binloader.c
 GUEST_SRCS-y                 += xc_dom_compat_linux.c
diff --git a/tools/libxc/xc_arm_migrate.c b/tools/libxc/xc_arm_migrate.c
new file mode 100644
index 0000000..ab2b94c
--- /dev/null
+++ b/tools/libxc/xc_arm_migrate.c
@@ -0,0 +1,702 @@
+/*
+ * Copyright (c) 2013, Samsung Electronics
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ */
+
+#include <inttypes.h>
+#include <errno.h>
+#include <xenctrl.h>
+#include <xenguest.h>
+
+#include <unistd.h>
+#include <xc_private.h>
+#include <xc_dom.h>
+#include "xc_bitops.h"
+#include "xg_private.h"
+
+#define DEF_MAX_ITERS          29 /* limit us to 30 times round loop   */
+#define DEF_MAX_FACTOR         3  /* never send more than 3x p2m_size  */
+#define DEF_MIN_DIRTY_PER_ITER 50 /* dirty page count to define last iter */
+#define DEF_PROGRESS_RATE      50 /* progress bar update rate */
+
+//#define DISABLE_LIVE_MIGRATION
+
+//#define ARM_MIGRATE_VERBOSE
+
+/*
+ * Guest params to save: used HVM params, save flags, memory map
+ */
+typedef struct guest_params
+{
+    unsigned long console_pfn;
+    unsigned long store_pfn;
+    uint32_t flags;
+    xen_pfn_t start_gpfn;
+    xen_pfn_t max_gpfn;
+    uint32_t max_vcpu_id;
+} guest_params_t;
+
+static int suspend_and_state(int (*suspend)(void*), void *data,
+                             xc_interface *xch, int dom)
+{
+    xc_dominfo_t info;
+    if ( !(*suspend)(data) )
+    {
+        ERROR("Suspend request failed");
+        return -1;
+    }
+
+    if ( (xc_domain_getinfo(xch, dom, 1, &info) != 1) ||
+         !info.shutdown || (info.shutdown_reason != SHUTDOWN_suspend) )
+    {
+        ERROR("Domain is not in suspended state after suspend attempt");
+        return -1;
+    }
+
+    return 0;
+}
+
+static int write_exact_handled(xc_interface *xch, int fd, const void *data,
+                               size_t size)
+{
+    if ( write_exact(fd, data, size) )
+    {
+        ERROR("Write failed, check space");
+        return -1;
+    }
+    return 0;
+}
+
+/* ============ Memory ============= */
+static int save_memory(xc_interface *xch, int io_fd, uint32_t dom,
+                       struct save_callbacks *callbacks,
+                       uint32_t max_iters, uint32_t max_factor,
+                       guest_params_t *params)
+{
+    int live =  !!(params->flags & XCFLAGS_LIVE);
+    int debug =  !!(params->flags & XCFLAGS_DEBUG);
+    xen_pfn_t i;
+    char reportbuf[80];
+    int iter = 0;
+    int last_iter = !live;
+    int total_dirty_pages_num = 0;
+    int dirty_pages_on_prev_iter_num = 0;
+    int count = 0;
+    char *page = 0;
+    xen_pfn_t *busy_pages = 0;
+    int busy_pages_count = 0;
+    int busy_pages_max = 256;
+
+    DECLARE_HYPERCALL_BUFFER(unsigned long, to_send);
+
+    xen_pfn_t start = params->start_gpfn;
+    const xen_pfn_t end = params->max_gpfn;
+    const xen_pfn_t mem_size = end - start;
+
+    if ( debug )
+    {
+        IPRINTF("(save mem) start=%llx end=%llx!\n", (unsigned long long)start,
+                (unsigned long long)end);
+    }
+
+    if ( live )
+    {
+        if ( xc_shadow_control(xch, dom, XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY,
+                    NULL, 0, NULL, 0, NULL) < 0 )
+        {
+            ERROR("Couldn't enable log-dirty mode !\n");
+            return -1;
+        }
+
+        max_iters  = max_iters  ? : DEF_MAX_ITERS;
+        max_factor = max_factor ? : DEF_MAX_FACTOR;
+
+        if ( debug )
+            IPRINTF("Log-dirty mode enabled, max_iters=%d, max_factor=%d!\n",
+                    max_iters, max_factor);
+    }
+
+    to_send = xc_hypercall_buffer_alloc_pages(xch, to_send,
+                                              NRPAGES(bitmap_size(mem_size)));
+    if ( !to_send )
+    {
+        ERROR("Couldn't allocate to_send array!\n");
+        return -1;
+    }
+
+    /* send all pages on first iter */
+    memset(to_send, 0xff, bitmap_size(mem_size));
+
+    for ( ; ; )
+    {
+        int dirty_pages_on_current_iter_num = 0;
+        int frc;
+        iter++;
+
+        snprintf(reportbuf, sizeof(reportbuf),
+                 "Saving memory: iter %d (last sent %u)",
+                 iter, dirty_pages_on_prev_iter_num);
+
+        xc_report_progress_start(xch, reportbuf, mem_size);
+
+        if ( (iter > 1 &&
+              dirty_pages_on_prev_iter_num < DEF_MIN_DIRTY_PER_ITER) ||
+             (iter == max_iters) ||
+             (total_dirty_pages_num >= mem_size*max_factor) )
+        {
+            if ( debug )
+                IPRINTF("Last iteration");
+            last_iter = 1;
+        }
+
+        if ( last_iter )
+        {
+            if ( suspend_and_state(callbacks->suspend, callbacks->data,
+                                   xch, dom) )
+            {
+                ERROR("Domain appears not to have suspended");
+                return -1;
+            }
+        }
+        if ( live && iter > 1 )
+        {
+            frc = xc_shadow_control(xch, dom, XEN_DOMCTL_SHADOW_OP_CLEAN,
+                                    HYPERCALL_BUFFER(to_send), mem_size,
+                                                     NULL, 0, NULL);
+            if ( frc != mem_size )
+            {
+                ERROR("Error peeking shadow bitmap");
+                xc_hypercall_buffer_free_pages(xch, to_send,
+                                               NRPAGES(bitmap_size(mem_size)));
+                return -1;
+            }
+        }
+
+        busy_pages = malloc(sizeof(xen_pfn_t) * busy_pages_max);
+
+        for ( i = start; i < end; ++i )
+        {
+            if ( test_bit(i - start, to_send) )
+            {
+                page = xc_map_foreign_range(xch, dom, PAGE_SIZE, PROT_READ, i);
+                if ( !page )
+                {
+                    /* This page is mapped elsewhere, should be resent later */
+                    busy_pages[busy_pages_count] = i;
+                    busy_pages_count++;
+                    if ( busy_pages_count >= busy_pages_max )
+                    {
+                        busy_pages_max += 256;
+                        busy_pages = realloc(busy_pages, sizeof(xen_pfn_t) *
+                                                         busy_pages_max);
+                    }
+                    continue;
+                }
+
+                if ( write_exact_handled(xch, io_fd, &i, sizeof(i)) ||
+                     write_exact_handled(xch, io_fd, page, PAGE_SIZE) )
+                {
+                    munmap(page, PAGE_SIZE);
+                    free(busy_pages);
+                    return -1;
+                }
+                count++;
+                munmap(page, PAGE_SIZE);
+
+                if ( (i % DEF_PROGRESS_RATE) == 0 )
+                    xc_report_progress_step(xch, i - start, mem_size);
+                dirty_pages_on_current_iter_num++;
+            }
+        }
+
+        while ( busy_pages_count )
+        {
+            /* Send busy pages */
+            busy_pages_count--;
+            i = busy_pages[busy_pages_count];
+            if ( test_bit(i - start, to_send) )
+            {
+                page = xc_map_foreign_range(xch, dom, PAGE_SIZE,PROT_READ, i);
+                if ( !page )
+                {
+                    IPRINTF("WARNING: 2nd attempt to save page "
+                            "busy failed pfn=%llx", (unsigned long long)i);
+                    continue;
+                }
+
+                if ( debug )
+                {
+                    IPRINTF("save mem: resend busy page %llx\n",
+                            (unsigned long long)i);
+                }
+
+                if ( write_exact_handled(xch, io_fd, &i, sizeof(i)) ||
+                     write_exact_handled(xch, io_fd, page, PAGE_SIZE) )
+                {
+                    munmap(page, PAGE_SIZE);
+                    free(busy_pages);
+                    return -1;
+                }
+                count++;
+                munmap(page, PAGE_SIZE);
+                dirty_pages_on_current_iter_num++;
+            }
+        }
+        free(busy_pages);
+
+        if ( debug )
+            IPRINTF("Dirty pages=%d", dirty_pages_on_current_iter_num);
+
+        xc_report_progress_step(xch, mem_size, mem_size);
+
+        dirty_pages_on_prev_iter_num = dirty_pages_on_current_iter_num;
+        total_dirty_pages_num += dirty_pages_on_current_iter_num;
+
+        if ( last_iter )
+        {
+            xc_hypercall_buffer_free_pages(xch, to_send,
+                                           NRPAGES(bitmap_size(mem_size)));
+            if ( live )
+            {
+                if ( xc_shadow_control(xch, dom, XEN_DOMCTL_SHADOW_OP_OFF,
+                                       NULL, 0, NULL, 0, NULL) < 0 )
+                    ERROR("Couldn't disable log-dirty mode");
+            }
+            break;
+        }
+    }
+    if ( debug )
+    {
+        IPRINTF("save mem: pages count = %d\n", count);
+    }
+
+    i = (xen_pfn_t) -1; /* end page marker */
+    return write_exact_handled(xch, io_fd, &i, sizeof(i));
+}
+
+static int restore_memory(xc_interface *xch, int io_fd, uint32_t dom,
+                          guest_params_t *params)
+{
+    xen_pfn_t end = params->max_gpfn;
+    xen_pfn_t gpfn;
+    int debug = !!(params->flags & XCFLAGS_DEBUG);
+    int count = 0;
+    char *page;
+    xen_pfn_t start = params->start_gpfn;
+
+    /* TODO allocate several pages per call */
+    for ( gpfn = start; gpfn < end; ++gpfn )
+    {
+        if ( xc_domain_populate_physmap_exact(xch, dom, 1, 0, 0, &gpfn) )
+        {
+            PERROR("Memory allocation for a new domain failed");
+            return -1;
+        }
+    }
+
+    while ( 1 )
+    {
+
+        if ( read_exact(io_fd, &gpfn, sizeof(gpfn)) )
+        {
+            PERROR("GPFN read failed during memory transfer, count=%d", count);
+            return -1;
+        }
+        if ( gpfn == (xen_pfn_t) -1 ) break; /* end page marker */
+
+        if ( gpfn < start || gpfn >= end )
+        {
+            ERROR("GPFN %llx doesn't belong to RAM address space, count=%d",
+                  (unsigned long long)gpfn, count);
+            return -1;
+        }
+        page = xc_map_foreign_range(xch, dom, PAGE_SIZE,
+                                    PROT_READ | PROT_WRITE, gpfn);
+        if ( !page )
+        {
+            PERROR("xc_map_foreign_range failed, pfn=%llx", gpfn);
+            return -1;
+        }
+        if ( read_exact(io_fd, page, PAGE_SIZE) )
+        {
+            PERROR("Page data read failed during memory transfer, pfn=%llx",
+                    gpfn);
+            return -1;
+        }
+        munmap(page, PAGE_SIZE);
+        count++;
+    }
+
+    if ( debug )
+    {
+        IPRINTF("Memory restored, pages count=%d", count);
+    }
+    return 0;
+}
+
+/* ============ HVM context =========== */
+static int save_armhvm(xc_interface *xch, int io_fd, uint32_t dom, int debug)
+{
+    /* HVM: a buffer for holding HVM context */
+    uint32_t hvm_buf_size = 0;
+    uint8_t *hvm_buf = NULL;
+    uint32_t rec_size;
+    int retval = -1;
+
+    /* Need another buffer for HVM context */
+    hvm_buf_size = xc_domain_hvm_getcontext(xch, dom, 0, 0);
+    if ( hvm_buf_size == -1 )
+    {
+        ERROR("Couldn't get HVM context size from Xen");
+        goto out;
+    }
+    hvm_buf = malloc(hvm_buf_size);
+
+    if ( !hvm_buf )
+    {
+        ERROR("Couldn't allocate memory for hvm buffer");
+        goto out;
+    }
+
+    /* Get HVM context from Xen and save it too */
+    if ( (rec_size = xc_domain_hvm_getcontext(xch, dom, hvm_buf,
+                    hvm_buf_size)) == -1 )
+    {
+        ERROR("HVM:Could not get hvm buffer");
+        goto out;
+    }
+
+    if ( debug )
+        IPRINTF("HVM save size %d %d", hvm_buf_size, rec_size);
+
+    if ( write_exact_handled(xch, io_fd, &rec_size, sizeof(uint32_t)) )
+        goto out;
+
+    if ( write_exact_handled(xch, io_fd, hvm_buf, rec_size) )
+    {
+        goto out;
+    }
+
+    retval = 0;
+
+out:
+    free(hvm_buf);
+
+    return retval;
+}
+
+static int restore_armhvm(xc_interface *xch, int io_fd,
+                          uint32_t dom, int debug)
+{
+    uint32_t rec_size;
+    uint32_t hvm_buf_size = 0;
+    uint8_t *hvm_buf = NULL;
+    int frc = 0;
+    int retval = -1;
+
+    if ( read_exact(io_fd, &rec_size, sizeof(uint32_t)) )
+    {
+        PERROR("Could not read HVM size");
+        goto out;
+    }
+
+    if ( !rec_size )
+    {
+        ERROR("Zero HVM size");
+        goto out;
+    }
+
+    hvm_buf_size = xc_domain_hvm_getcontext(xch, dom, 0, 0);
+    if ( hvm_buf_size != rec_size )
+    {
+        ERROR("HVM size for this domain is not the same as stored");
+        goto out;
+    }
+
+    hvm_buf = malloc(hvm_buf_size);
+    if ( !hvm_buf )
+    {
+        ERROR("Couldn't allocate memory");
+        goto out;
+    }
+
+    if ( read_exact(io_fd, hvm_buf, hvm_buf_size) )
+    {
+        PERROR("Could not read HVM context");
+        goto out;
+    }
+
+    frc = xc_domain_hvm_setcontext(xch, dom, hvm_buf, hvm_buf_size);
+    if ( frc )
+    {
+        ERROR("error setting the HVM context");
+        goto out;
+    }
+    retval = 0;
+
+    if ( debug )
+    {
+            IPRINTF("HVM restore size %d %d", hvm_buf_size, rec_size);
+    }
+out:
+    free(hvm_buf);
+    return retval;
+}
+
+/* ================= Console & Xenstore & Memory map =========== */
+static int save_guest_params(xc_interface *xch, int io_fd,
+                             uint32_t dom, uint32_t flags,
+                             guest_params_t *params)
+{
+    size_t sz = sizeof(guest_params_t);
+    xc_dominfo_t dom_info;
+
+    params->max_gpfn = xc_domain_maximum_gpfn(xch, dom);
+    params->start_gpfn = (GUEST_RAM_BASE >> PAGE_SHIFT);
+
+    if ( flags & XCFLAGS_DEBUG )
+    {
+        IPRINTF("Guest param save size: %d ", (int)sz);
+    }
+
+    if ( xc_get_hvm_param(xch, dom, HVM_PARAM_CONSOLE_PFN,
+            &params->console_pfn) )
+    {
+        ERROR("Can't get console gpfn");
+        return -1;
+    }
+
+    if ( xc_get_hvm_param(xch, dom, HVM_PARAM_STORE_PFN, &params->store_pfn) )
+    {
+        ERROR("Can't get store gpfn");
+        return -1;
+    }
+
+    if ( xc_domain_getinfo(xch, dom, 1, &dom_info) < 0 )
+    {
+        ERROR("Can't get domain info for dom %d", dom);
+        return -1;
+    }
+    params->max_vcpu_id = dom_info.max_vcpu_id;
+
+    params->flags = flags;
+
+    if ( write_exact_handled(xch, io_fd, params, sz) )
+    {
+        return -1;
+    }
+
+    return 0;
+}
+
+static int restore_guest_params(xc_interface *xch, int io_fd,
+                                uint32_t dom, guest_params_t *params)
+{
+    size_t sz = sizeof(guest_params_t);
+    xen_pfn_t nr_pfns;
+    unsigned int maxmemkb;
+
+    if ( read_exact(io_fd, params, sizeof(guest_params_t)) )
+    {
+        PERROR("Can't read guest params");
+        return -1;
+    }
+
+    nr_pfns = params->max_gpfn - params->start_gpfn;
+    maxmemkb = (unsigned int) nr_pfns << (PAGE_SHIFT - 10);
+
+    if ( params->flags & XCFLAGS_DEBUG )
+    {
+        IPRINTF("Guest param restore size: %d ", (int)sz);
+        IPRINTF("Guest memory size: %d MB", maxmemkb >> 10);
+    }
+
+    if ( xc_domain_setmaxmem(xch, dom, maxmemkb) )
+    {
+        ERROR("Can't set memory map");
+        return -1;
+    }
+
+    /* Set max. number of vcpus as max_vcpu_id + 1 */
+    if ( xc_domain_max_vcpus(xch, dom, params->max_vcpu_id + 1) )
+    {
+        ERROR("Can't set max vcpu number for domain");
+        return -1;
+    }
+
+    return 0;
+}
+
+static int set_guest_params(xc_interface *xch, int io_fd, uint32_t dom,
+                            guest_params_t *params, unsigned int console_evtchn,
+                            domid_t console_domid, unsigned int store_evtchn,
+                            domid_t store_domid)
+{
+    int rc = 0;
+
+    if ( (rc = xc_clear_domain_page(xch, dom, params->console_pfn)) )
+    {
+        ERROR("Can't clear console page");
+        return rc;
+    }
+
+    if ( (rc = xc_clear_domain_page(xch, dom, params->store_pfn)) )
+    {
+        ERROR("Can't clear xenstore page");
+        return rc;
+    }
+
+    if ( (rc = xc_dom_gnttab_hvm_seed(xch, dom, params->console_pfn,
+                                      params->store_pfn, console_domid,
+                                      store_domid)) )
+    {
+        ERROR("Can't grant console and xenstore pages");
+        return rc;
+    }
+
+    if ( (rc = xc_set_hvm_param(xch, dom, HVM_PARAM_CONSOLE_PFN,
+                                params->console_pfn)) )
+    {
+        ERROR("Can't set console gpfn");
+        return rc;
+    }
+
+    if ( (rc = xc_set_hvm_param(xch, dom, HVM_PARAM_STORE_PFN,
+                                params->store_pfn)) )
+    {
+        ERROR("Can't set xenstore gpfn");
+        return rc;
+    }
+
+    if ( (rc = xc_set_hvm_param(xch, dom, HVM_PARAM_CONSOLE_EVTCHN,
+                                console_evtchn)) )
+    {
+        ERROR("Can't set console event channel");
+        return rc;
+    }
+
+    if ( (rc = xc_set_hvm_param(xch, dom, HVM_PARAM_STORE_EVTCHN,
+                                store_evtchn)) )
+    {
+        ERROR("Can't set xenstore event channel");
+        return rc;
+    }
+    return 0;
+}
+
+/* ================== Main ============== */
+int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
+                   uint32_t max_iters, uint32_t max_factor, uint32_t flags,
+                   struct save_callbacks *callbacks, int hvm,
+                   unsigned long vm_generationid_addr)
+{
+    int debug;
+    guest_params_t params;
+
+#ifdef ARM_MIGRATE_VERBOSE
+    flags |= XCFLAGS_DEBUG;
+#endif
+
+#ifdef DISABLE_LIVE_MIGRATION
+    flags &= ~(XCFLAGS_LIVE);
+#endif
+
+    debug = !!(flags & XCFLAGS_DEBUG);
+    if ( save_guest_params(xch, io_fd, dom, flags, &params) )
+    {
+       ERROR("Can't save guest params");
+       return -1;
+    }
+
+    if ( save_memory(xch, io_fd, dom, callbacks, max_iters,
+            max_factor, &params) )
+    {
+        ERROR("Memory not saved");
+        return -1;
+    }
+
+    if ( save_armhvm(xch, io_fd, dom, debug) )
+    {
+        ERROR("HVM not saved");
+        return -1;
+    }
+
+    if ( debug )
+    {
+        IPRINTF("Domain %d saved", dom);
+    }
+    return 0;
+}
+
+int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
+                      unsigned int store_evtchn, unsigned long *store_gpfn,
+                      domid_t store_domid, unsigned int console_evtchn,
+                      unsigned long *console_gpfn, domid_t console_domid,
+                      unsigned int hvm, unsigned int pae, int superpages,
+                      int no_incr_generationid, int checkpointed_stream,
+                      unsigned long *vm_generationid_addr,
+                      struct restore_callbacks *callbacks)
+{
+    guest_params_t params;
+    int debug = 1;
+
+    if ( restore_guest_params(xch, io_fd, dom, &params) )
+    {
+        ERROR("Can't restore guest params");
+        return -1;
+    }
+    debug = !!(params.flags & XCFLAGS_DEBUG);
+
+    if ( restore_memory(xch, io_fd, dom, &params) )
+    {
+        ERROR("Can't restore memory");
+        return -1;
+    }
+    if ( set_guest_params(xch, io_fd, dom, &params,
+                console_evtchn, console_domid,
+                store_evtchn, store_domid) )
+    {
+        ERROR("Can't setup guest params");
+        return -1;
+    }
+
+    /* Report console and store PFNs back to the caller */
+    *console_gpfn = params.console_pfn;
+    *store_gpfn = params.store_pfn;
+
+    if ( restore_armhvm(xch, io_fd, dom, debug) )
+    {
+        ERROR("HVM not restored");
+        return -1;
+    }
+
+    if ( debug )
+    {
+         IPRINTF("Domain %d restored", dom);
+    }
+
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c
index f051515..044a8de 100644
--- a/tools/libxc/xc_dom_arm.c
+++ b/tools/libxc/xc_dom_arm.c
@@ -335,7 +335,9 @@ int arch_setup_meminit(struct xc_dom_image *dom)
         modbase += dtb_size;
     }
 
-    return 0;
+    return xc_domain_setmaxmem(dom->xch, dom->guest_domid,
+                               (dom->total_pages + NR_MAGIC_PAGES)
+                                << (PAGE_SHIFT - 10));
 }
 
 int arch_setup_bootearly(struct xc_dom_image *dom)
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index b2c3015..e10f4fb 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -441,9 +441,6 @@
  *  - libxl_domain_resume
  *  - libxl_domain_remus_start
  */
-#if defined(__arm__) || defined(__aarch64__)
-#define LIBXL_HAVE_NO_SUSPEND_RESUME 1
-#endif
 
 /*
  * LIBXL_HAVE_DEVICE_PCI_SEIZE
diff --git a/tools/misc/Makefile b/tools/misc/Makefile
index 17aeda5..0824100 100644
--- a/tools/misc/Makefile
+++ b/tools/misc/Makefile
@@ -11,7 +11,7 @@ HDRS     = $(wildcard *.h)
 
 TARGETS-y := xenperf xenpm xen-tmem-list-parse gtraceview gtracestat xenlockprof xenwatchdogd xencov
 TARGETS-$(CONFIG_X86) += xen-detect xen-hvmctx xen-hvmcrash xen-lowmemd xen-mfndump
-TARGETS-$(CONFIG_MIGRATE) += xen-hptool
+TARGETS-$(CONFIG_X86) += xen-hptool
 TARGETS := $(TARGETS-y)
 
 SUBDIRS := $(SUBDIRS-y)
@@ -23,7 +23,7 @@ INSTALL_BIN := $(INSTALL_BIN-y)
 INSTALL_SBIN-y := xen-bugtool xen-python-path xenperf xenpm xen-tmem-list-parse gtraceview \
 	gtracestat xenlockprof xenwatchdogd xen-ringwatch xencov
 INSTALL_SBIN-$(CONFIG_X86) += xen-hvmctx xen-hvmcrash xen-lowmemd xen-mfndump
-INSTALL_SBIN-$(CONFIG_MIGRATE) += xen-hptool
+INSTALL_SBIN-$(CONFIG_X86) += xen-hptool
 INSTALL_SBIN := $(INSTALL_SBIN-y)
 
 INSTALL_PRIVBIN-y := xenpvnetboot
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [RFC v2 0/6] xen/arm: Support guest VM save/restore/migration
  2014-04-15 21:05 ` [RFC v2 0/6] xen/arm: Support guest VM save/restore/migration Wei Huang
@ 2014-04-15 22:23   ` Julien Grall
  0 siblings, 0 replies; 47+ messages in thread
From: Julien Grall @ 2014-04-15 22:23 UTC (permalink / raw)
  To: Wei Huang, xen-devel
  Cc: andrew.cooper3, stefano.stabellini, ian.campbell, jaeyong.yoo,
	yjhyun.yoo

Hello Wei,

I guess you've sent the patch series twice by mistake?

Regards,

On 15/04/14 22:05, Wei Huang wrote:
> This series is RFC v2 for save/restore/migration. The following
> areas have been addressed:
>    * save and restore of guest states is split into specific areas (and files)
>    * get XENMEM_maximum_gpfn is now supported via P2M max_mapped_gfn.
>    * name and layout of some functions
>    * small areas commented by Julien Grall and Andrew Cooper
>
> Note that:
>    * previous comments by Ian are being examined.
>    * patch 3-6 need more review attenion.
>    * Rev v3 will be sent out soon.
>
> Let me know if there are issues with the design.
>
> Thanks,
> -Wei
>
>    xen/arm: Save and restore support with hvm context hypercalls
>    xen/arm: implement support for XENMEM_maximum_gpfn hypercall
>    xen/arm: support guest do_suspend function
>    xen/arm: Implement VLPT for guest p2m mapping in live migration
>    xen/arm: Implement hypercall for dirty page tracing
>    xen/arm: Implement toolstack for xl restore/save and migrate
>
>   config/arm32.mk                        |    1 +
>   config/arm64.mk                        |    1 +
>   tools/libxc/Makefile                   |    6 +-
>   tools/libxc/xc_arm_migrate.c           |  702 ++++++++++++++++++++++++++++++++
>   tools/libxc/xc_dom_arm.c               |    4 +-
>   tools/libxc/xc_resume.c                |   25 ++
>   tools/libxl/libxl.h                    |    3 -
>   tools/misc/Makefile                    |    4 +-
>   xen/arch/arm/Makefile                  |    1 +
>   xen/arch/arm/domain.c                  |   19 +
>   xen/arch/arm/domctl.c                  |   21 +
>   xen/arch/arm/hvm.c                     |  268 +++++++++++-
>   xen/arch/arm/mm.c                      |  242 ++++++++++-
>   xen/arch/arm/p2m.c                     |  211 ++++++++++
>   xen/arch/arm/save.c                    |   65 +++
>   xen/arch/arm/traps.c                   |   11 +
>   xen/arch/arm/vgic.c                    |  146 +++++++
>   xen/arch/arm/vtimer.c                  |   71 ++++
>   xen/arch/x86/domctl.c                  |   70 ----
>   xen/common/Makefile                    |    2 +-
>   xen/common/domctl.c                    |   74 ++++
>   xen/include/asm-arm/config.h           |    7 +
>   xen/include/asm-arm/domain.h           |   14 +
>   xen/include/asm-arm/hvm/support.h      |   29 ++
>   xen/include/asm-arm/mm.h               |   28 ++
>   xen/include/asm-arm/p2m.h              |    8 +-
>   xen/include/asm-arm/processor.h        |    2 +
>   xen/include/public/arch-arm/hvm/save.h |  130 ++++++
>   28 files changed, 2083 insertions(+), 82 deletions(-)
>   create mode 100644 tools/libxc/xc_arm_migrate.c
>   create mode 100644 xen/arch/arm/save.c
>   create mode 100644 xen/include/asm-arm/hvm/support.h
>

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC v2 4/6] xen/arm: Implement VLPT for guest p2m mapping in live migration
  2014-04-15 21:05 ` [RFC v2 4/6] xen/arm: Implement VLPT for guest p2m mapping in live migration Wei Huang
@ 2014-04-15 22:29   ` Julien Grall
  2014-04-15 23:40   ` Andrew Cooper
  2014-04-22 17:54   ` Julien Grall
  2 siblings, 0 replies; 47+ messages in thread
From: Julien Grall @ 2014-04-15 22:29 UTC (permalink / raw)
  To: Wei Huang, xen-devel
  Cc: andrew.cooper3, stefano.stabellini, ian.campbell, jaeyong.yoo,
	yjhyun.yoo

Hello Wei,

Thank you for the patch.

On 15/04/14 22:05, Wei Huang wrote:
> diff --git a/xen/include/asm-arm/config.h b/xen/include/asm-arm/config.h
> index ef291ff..47d1bce 100644
> --- a/xen/include/asm-arm/config.h
> +++ b/xen/include/asm-arm/config.h
> @@ -87,6 +87,7 @@
>    *   0  -   8M   <COMMON>
>    *
>    *  32M - 128M   Frametable: 24 bytes per page for 16GB of RAM
> + * 128M - 256M   Virtual-linear mapping to P2M table
>    * 256M -   1G   VMAP: ioremap and early_ioremap use this virtual address
>    *                    space
>    *
> @@ -124,7 +125,9 @@
>   #define CONFIG_SEPARATE_XENHEAP 1
>
>   #define FRAMETABLE_VIRT_START  _AT(vaddr_t,0x02000000)
> +#define VIRT_LIN_P2M_START     _AT(vaddr_t,0x08000000)
>   #define VMAP_VIRT_START  _AT(vaddr_t,0x10000000)
> +#define VIRT_LIN_P2M_END       VMAP_VIRT_START

Shouldn't it be VMAP_VIRT_START - 1?

I would also directly use _AT(vaddr_t, 0x0fffffff) to stay consistent 
with the other *_END define.
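
Something like this (untested):

#define VIRT_LIN_P2M_START     _AT(vaddr_t,0x08000000)
#define VIRT_LIN_P2M_END       _AT(vaddr_t,0x0fffffff)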

I will review the rest of this patch soon.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC v2 2/6] xen/arm: implement support for XENMEM_maximum_gpfn hypercall
  2014-04-15 21:05 ` [RFC v2 2/6] xen/arm: implement support for XENMEM_maximum_gpfn hypercall Wei Huang
@ 2014-04-15 22:46   ` Julien Grall
  2014-04-16 15:33     ` Wei Huang
  0 siblings, 1 reply; 47+ messages in thread
From: Julien Grall @ 2014-04-15 22:46 UTC (permalink / raw)
  To: Wei Huang, xen-devel
  Cc: andrew.cooper3, stefano.stabellini, ian.campbell, jaeyong.yoo,
	yjhyun.yoo

Hello Wei,

Thank you for the patch.

On 15/04/14 22:05, Wei Huang wrote:
> From: Jaeyong Yoo <jaeyong.yoo@samsung.com>
>
> This patch implements domain_get_maximum_gpfn by using max_mapped_gfn
> field of P2M struct. A support function to retrieve guest VM pfn range
> is also added.
>
> Signed-off-by: Evgeny Fedotov <e.fedotov@samsung.com>
> Signed-off-by: Wei Huang <w1.huang@samsung.com>
> ---
>   xen/arch/arm/mm.c        |   21 ++++++++++++++++++++-
>   xen/include/asm-arm/mm.h |    1 +
>   2 files changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index 362bc8d..473ad04 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -947,7 +947,11 @@ int page_is_ram_type(unsigned long mfn, unsigned long mem_type)
>
>   unsigned long domain_get_maximum_gpfn(struct domain *d)
>   {
> -    return -ENOSYS;
> +    paddr_t end;
> +
> +    domain_get_gpfn_range(d, NULL, &end);
> +
> +    return (unsigned long)end;
>   }
>
>   void share_xen_page_with_guest(struct page_info *page,
> @@ -1235,6 +1239,21 @@ int is_iomem_page(unsigned long mfn)
>           return 1;
>       return 0;
>   }
> +
> +/*
> + * Return start and end addresses of guest VM
> + */
> +void domain_get_gpfn_range(struct domain *d, paddr_t *start, paddr_t *end)

The content of the function doesn't match the name.

This function should return a PFN, not an address.
Actually, libxc (i.e. the consumer of domain_get_maximum_gpfn's return
value) expects a PFN.

> +{
> +    struct p2m_domain *p2m = &d->arch.p2m;
> +
> +    if ( start )
> +        *start = GUEST_RAM_BASE;

You can use  p2m->lowest_mapped_gfn here.

> +    if ( end )
> +        *end = GUEST_RAM_BASE + ((paddr_t) p2m->max_mapped_gfn);

This is wrong: max_mapped_gfn contains a guest frame number, not a
number of frames.

The code should be something like:

*end = pfn_to_paddr(p2m->max_mapped_gfn);
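
Putting the points above together, and keeping the paddr interface for
the moment, the helper could look something like this (untested):

void domain_get_gpfn_range(struct domain *d, paddr_t *start, paddr_t *end)
{
    struct p2m_domain *p2m = &d->arch.p2m;

    if ( start )
        *start = pfn_to_paddr(p2m->lowest_mapped_gfn);
    if ( end )
        *end = pfn_to_paddr(p2m->max_mapped_gfn);
}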

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC v2 5/6] xen/arm: Implement hypercall for dirty page tracing
  2014-04-15 21:05 ` [RFC v2 5/6] xen/arm: Implement hypercall for dirty page tracing Wei Huang
@ 2014-04-15 23:35   ` Julien Grall
  2014-04-23 11:59   ` Julien Grall
  1 sibling, 0 replies; 47+ messages in thread
From: Julien Grall @ 2014-04-15 23:35 UTC (permalink / raw)
  To: Wei Huang, xen-devel
  Cc: andrew.cooper3, stefano.stabellini, ian.campbell, jaeyong.yoo,
	yjhyun.yoo

Hi Wei,

Thank you for the patch.

On 15/04/14 22:05, Wei Huang wrote:
>   long arch_do_domctl(struct xen_domctl *domctl, struct domain *d,
>                       XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>   {
> +    long ret = 0;
> +
>       switch ( domctl->cmd )
>       {
> +    case XEN_DOMCTL_shadow_op:
> +    {
> +        if ( d == current->domain ) /* no domain_pause() */

Error message here?

> +            return -EINVAL;
> +
> +        domain_pause(d);
> +        ret = dirty_mode_op(d, &domctl->u.shadow_op);
> +        domain_unpause(d);

Why do you pause/unpause the domain? Also you have to check that:
     - the caller has the right to do this (via XSM);
     - the domain is not dying...

See paging_domctl in xen/arch/x86/mm/paging.c
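
I.e. something along these lines at the top of the handler (sketch,
modelled on the x86 side and assuming the same xsm_shadow_control hook
is usable here):

    ret = xsm_shadow_control(XSM_HOOK, d, domctl->u.shadow_op.op);
    if ( ret )
        return ret;

    if ( unlikely(d->is_dying) )
        return 0;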

[..]

> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> index 403fd89..d57a44a 100644
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -6,6 +6,8 @@
>   #include <xen/bitops.h>
>   #include <asm/flushtlb.h>
>   #include <asm/gic.h>
> +#include <xen/guest_access.h>
> +#include <xen/pfn.h>
>   #include <asm/event.h>
>   #include <asm/hardirq.h>
>   #include <asm/page.h>
> @@ -208,6 +210,7 @@ static lpae_t mfn_to_p2m_entry(unsigned long mfn, unsigned int mattr,
>           break;
>
>       case p2m_ram_ro:
> +    case p2m_ram_logdirty:
>           e.p2m.xn = 0;
>           e.p2m.write = 0;
>           break;
> @@ -261,6 +264,10 @@ static int p2m_create_table(struct domain *d,
>
>       pte = mfn_to_p2m_entry(page_to_mfn(page), MATTR_MEM, p2m_invalid);
>
> +    /* mark the write bit (page table's case, ro bit) as 0. So it is writable

mark the entry as writable?

> +     * in case of vlpt access */
> +    pte.pt.ro = 0;
> +
>       write_pte(entry, pte);
>
>       return 0;
> @@ -697,6 +704,210 @@ unsigned long gmfn_to_mfn(struct domain *d, unsigned long gpfn)
>       return p >> PAGE_SHIFT;
>   }

> +/* Change types across all p2m entries in a domain */
> +void p2m_change_entry_type_global(struct domain *d, enum mg nt)
> +{

Can't you reuse apply_p2m_changes? I'm also concerned about preemption. 
This function might take a very long time to run (depending on the size
of the memory).

> +    struct p2m_domain *p2m = &d->arch.p2m;
> +    paddr_t ram_base;
> +    int i1, i2, i3;
> +    int first_index, second_index, third_index;
> +    lpae_t *first = __map_domain_page(p2m->first_level);
> +    lpae_t pte, *second = NULL, *third = NULL;
> +
> +    domain_get_gpfn_range(d, &ram_base, NULL);

Just careful, begin and end correspond to the bound for all addresses 
mapped in the guest (i.e RAM, MMIO, foreign page, grant table...).

We don't want to log other type than RAM.

Modifying the behavior of {max,lowest}_mapped_gfn won't work because the 
guest might use different banks with MMIO region.

> +    first_index = first_table_offset((uint64_t)ram_base);
> +    second_index = second_table_offset((uint64_t)ram_base);
> +    third_index = third_table_offset((uint64_t)ram_base);
> +
> +    BUG_ON( !first && "Can't map first level p2m." );
> +

This BUG_ON is not necessary. __map_domain_page always returns a valid 
pointer.

> +    spin_lock(&p2m->lock);
> +
> +    for ( i1 = first_index; i1 < LPAE_ENTRIES*2; ++i1 )
> +    {
> +        lpae_walk_t first_pte = first[i1].walk;
> +
> +        if ( !first_pte.valid || !first_pte.table )
> +            goto out;
> +
> +        second = map_domain_page(first_pte.base);
> +        BUG_ON( !second && "Can't map second level p2m.");
> +
> +        for ( i2 = second_index; i2 < LPAE_ENTRIES; ++i2 )
> +        {
> +            lpae_walk_t second_pte = second[i2].walk;
> +
> +            if ( !second_pte.valid || !second_pte.table )
> +                goto out;
> +
> +            third = map_domain_page(second_pte.base);
> +            BUG_ON( !third && "Can't map third level p2m.");
> +
> +            for ( i3 = third_index; i3 < LPAE_ENTRIES; ++i3 )
> +            {
> +
> +                lpae_walk_t third_pte = third[i3].walk;
> +                if ( !third_pte.valid )
> +                    goto out;
> +
> +                pte = third[i3];
> +                if ( nt == mg_ro )

I would use a switch on nt. It will be clearer and easier to extend.

> +                {
> +                    if ( pte.p2m.write == 1 )

As said earlier, this will trap everything that is writeable. We only 
want to trap RAM.

I would replace by if ( pte.p2m.type == p2m_ram ).

> +                    {
> +                        pte.p2m.write = 0;
> +                        pte.p2m.type = p2m_ram_logdirty;
> +                    }
> +                    else
> +                    {
> +                        /* reuse avail bit as an indicator of 'actual'
> +                         * read-only */
> +                        pte.p2m.type = p2m_ram_rw;

Why do you unconditionally change the type?

> +                    }
> +                }
> +                else if ( nt == mg_rw )
> +                {
> +                    if ( pte.p2m.write == 0 &&
> +                         pte.p2m.type == p2m_ram_logdirty )

Can you add a comment to explain what this if means?

> +                    {
> +                        pte.p2m.write = p2m_ram_rw;

Wrong field?
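
Putting this and the switch suggestion together, the loop body could be
restructured along these lines (rough sketch, using the existing type
names):

    switch ( nt )
    {
    case mg_ro:
        if ( pte.p2m.type == p2m_ram_rw )
        {
            pte.p2m.write = 0;
            pte.p2m.type = p2m_ram_logdirty;
        }
        break;
    case mg_rw:
        if ( pte.p2m.type == p2m_ram_logdirty )
        {
            pte.p2m.write = 1;
            pte.p2m.type = p2m_ram_rw;
        }
        break;
    default:
        break;
    }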

> +                    }
> +                }
> +                write_pte(&third[i3], pte);
> +            }
> +            unmap_domain_page(third);
> +
> +            third = NULL;
> +            third_index = 0;
> +        }
> +        unmap_domain_page(second);
> +
> +        second = NULL;
> +        second_index = 0;
> +        third_index = 0;
> +    }
> +
> +out:
> +    flush_tlb_all_local();

You want to flush the P2M on every CPU, and only for the current VMID.


You should use flush_tlb(). You might need to switch to the domain's
VTTBR first.
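
E.g. something like (untested, assuming p2m_load_VTTBR() is usable from
here):

    p2m_load_VTTBR(d);
    flush_tlb();
    p2m_load_VTTBR(current->domain);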

[..]

> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index a7edc4e..cca34e9 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -1491,6 +1491,8 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs,
>       struct hsr_dabt dabt = hsr.dabt;
>       int rc;
>       mmio_info_t info;
> +    int page_fault = ( (dabt.dfsc & FSC_MASK) ==
> +                       (FSC_FLT_PERM | FSC_3D_LEVEL) && dabt.write );
>
>       if ( !check_conditional_instr(regs, hsr) )
>       {
> @@ -1512,6 +1514,15 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs,
>       if ( rc == -EFAULT )
>           goto bad_data_abort;
>
> +    /* domU page fault handling for guest live migration. dabt.valid can be

I would remove domU in this comment.

> +     * 0 here.
> +     */
> +    if ( page_fault && handle_page_fault(current->domain, info.gpa) )
> +    {
> +        /* Do not modify pc after page fault to repeat memory operation */
> +        return;
> +    }
> +
>       /* XXX: Decode the instruction if ISS is not valid */
>       if ( !dabt.valid )
>           goto bad_data_abort;

[..]

> diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
> index 5fd684f..5f9478b 100644
> --- a/xen/include/asm-arm/mm.h
> +++ b/xen/include/asm-arm/mm.h
> @@ -343,10 +343,18 @@ static inline void put_page_and_type(struct page_info *page)
>       put_page(page);
>   }
>
> +enum mg { mg_clear, mg_ro, mg_rw, mg_rx };

Please describe this enum. Also mg is too generic.

> +
> +/* routine for dirty-page tracing */
> +int handle_page_fault(struct domain *d, paddr_t addr);
> +
>   int prepare_vlpt(struct domain *d);
>   void cleanup_vlpt(struct domain *d);
>   void restore_vlpt(struct domain *d);
>
> +int prepare_bitmap(struct domain *d);
> +void cleanup_bitmap(struct domain *d);

{prepare,cleanup} bitmap of what? The name is too generic here.
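
E.g. something like (just a naming suggestion):

    int log_dirty_bitmap_init(struct domain *d);
    void log_dirty_bitmap_cleanup(struct domain *d);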

Regards.

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC v2 1/6] xen/arm: Save and restore support with hvm context hypercalls
  2014-04-15 21:05 ` [RFC v2 1/6] xen/arm: Save and restore support with hvm context hypercalls Wei Huang
@ 2014-04-15 23:37   ` Andrew Cooper
  2014-04-16 21:50     ` Wei Huang
  2014-04-16  9:48   ` Julien Grall
  2014-04-17 15:06   ` Julien Grall
  2 siblings, 1 reply; 47+ messages in thread
From: Andrew Cooper @ 2014-04-15 23:37 UTC (permalink / raw)
  To: Wei Huang, xen-devel
  Cc: yjhyun.yoo, julien.grall, ian.campbell, jaeyong.yoo, stefano.stabellini

On 15/04/2014 22:05, Wei Huang wrote:
> From: Jaeyong Yoo <jaeyong.yoo@samsung.com>
>
> This patch implements HVM context hypercalls to support ARM
> guest save and restore. It saves the states of guest GIC,
> arch timer, and CPU registers.
>
> Signed-off-by: Evgeny Fedotov <e.fedotov@samsung.com>
> Signed-off-by: Wei Huang <w1.huang@samsung.com>
> ---
>  xen/arch/arm/Makefile                  |    1 +
>  xen/arch/arm/hvm.c                     |  268 +++++++++++++++++++++++++++++++-
>  xen/arch/arm/save.c                    |   65 ++++++++
>  xen/arch/arm/vgic.c                    |  146 +++++++++++++++++
>  xen/arch/arm/vtimer.c                  |   71 +++++++++
>  xen/arch/x86/domctl.c                  |   70 ---------
>  xen/common/Makefile                    |    2 +-
>  xen/common/domctl.c                    |   74 +++++++++
>  xen/include/asm-arm/hvm/support.h      |   29 ++++
>  xen/include/public/arch-arm/hvm/save.h |  130 ++++++++++++++++
>  10 files changed, 784 insertions(+), 72 deletions(-)
>  create mode 100644 xen/arch/arm/save.c
>  create mode 100644 xen/include/asm-arm/hvm/support.h
>
> diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
> index 63e0460..d9a328c 100644
> --- a/xen/arch/arm/Makefile
> +++ b/xen/arch/arm/Makefile
> @@ -33,6 +33,7 @@ obj-y += hvm.o
>  obj-y += device.o
>  obj-y += decode.o
>  obj-y += processor.o
> +obj-y += save.o
>  
>  #obj-bin-y += ....o
>  
> diff --git a/xen/arch/arm/hvm.c b/xen/arch/arm/hvm.c
> index 471c4cd..18eb2bd 100644
> --- a/xen/arch/arm/hvm.c
> +++ b/xen/arch/arm/hvm.c
> @@ -4,6 +4,7 @@
>  #include <xen/errno.h>
>  #include <xen/guest_access.h>
>  #include <xen/sched.h>
> +#include <xen/hvm/save.h>
>  
>  #include <xsm/xsm.h>
>  
> @@ -12,9 +13,9 @@
>  #include <public/hvm/hvm_op.h>
>  
>  #include <asm/hypercall.h>
> +#include <asm/gic.h>
>  
>  long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
> -
>  {
>      long rc = 0;
>  
> @@ -65,3 +66,268 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>  
>      return rc;
>  }
> +
> +/* Save CPU related states into save/restore context */
> +static int hvm_save_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
> +{
> +    struct hvm_hw_cpu ctxt;
> +    struct vcpu_guest_core_regs c;
> +    struct vcpu *v;
> +    int ret = 0;
> +
> +    /* Save the state of CPU */
> +    for_each_vcpu( d, v )
> +    {
> +        memset(&ctxt, 0, sizeof(ctxt));
> +
> +        ctxt.sctlr = v->arch.sctlr;
> +        ctxt.ttbr0 = v->arch.ttbr0;
> +        ctxt.ttbr1 = v->arch.ttbr1;
> +        ctxt.ttbcr = v->arch.ttbcr;
> +
> +        ctxt.dacr = v->arch.dacr;
> +        ctxt.ifsr = v->arch.ifsr;
> +#ifdef CONFIG_ARM_32
> +        ctxt.ifar = v->arch.ifar;
> +        ctxt.dfar = v->arch.dfar;
> +        ctxt.dfsr = v->arch.dfsr;
> +#else
> +        ctxt.far = v->arch.far;
> +        ctxt.esr = v->arch.esr;
> +#endif
> +
> +#ifdef CONFIG_ARM_32
> +        ctxt.mair0 = v->arch.mair0;
> +        ctxt.mair1 = v->arch.mair1;
> +#else
> +        ctxt.mair0 = v->arch.mair;
> +#endif
> +
> +        /* Control Registers */
> +        ctxt.actlr = v->arch.actlr;
> +        ctxt.sctlr = v->arch.sctlr;
> +        ctxt.cpacr = v->arch.cpacr;
> +
> +        ctxt.contextidr = v->arch.contextidr;
> +        ctxt.tpidr_el0 = v->arch.tpidr_el0;
> +        ctxt.tpidr_el1 = v->arch.tpidr_el1;
> +        ctxt.tpidrro_el0 = v->arch.tpidrro_el0;
> +
> +        /* CP 15 */
> +        ctxt.csselr = v->arch.csselr;
> +
> +        ctxt.afsr0 = v->arch.afsr0;
> +        ctxt.afsr1 = v->arch.afsr1;
> +        ctxt.vbar = v->arch.vbar;
> +        ctxt.par = v->arch.par;
> +        ctxt.teecr = v->arch.teecr;
> +        ctxt.teehbr = v->arch.teehbr;
> +
> +#ifdef CONFIG_ARM_32
> +        ctxt.joscr = v->arch.joscr;
> +        ctxt.jmcr = v->arch.jmcr;
> +#endif
> +
> +        memset(&c, 0, sizeof(c));
> +
> +        /* get guest core registers */
> +        vcpu_regs_hyp_to_user(v, &c);
> +
> +        ctxt.x0 = c.x0;
> +        ctxt.x1 = c.x1;
> +        ctxt.x2 = c.x2;
> +        ctxt.x3 = c.x3;
> +        ctxt.x4 = c.x4;
> +        ctxt.x5 = c.x5;
> +        ctxt.x6 = c.x6;
> +        ctxt.x7 = c.x7;
> +        ctxt.x8 = c.x8;
> +        ctxt.x9 = c.x9;
> +        ctxt.x10 = c.x10;
> +        ctxt.x11 = c.x11;
> +        ctxt.x12 = c.x12;
> +        ctxt.x13 = c.x13;
> +        ctxt.x14 = c.x14;
> +        ctxt.x15 = c.x15;
> +        ctxt.x16 = c.x16;
> +        ctxt.x17 = c.x17;
> +        ctxt.x18 = c.x18;
> +        ctxt.x19 = c.x19;
> +        ctxt.x20 = c.x20;
> +        ctxt.x21 = c.x21;
> +        ctxt.x22 = c.x22;
> +        ctxt.x23 = c.x23;
> +        ctxt.x24 = c.x24;
> +        ctxt.x25 = c.x25;
> +        ctxt.x26 = c.x26;
> +        ctxt.x27 = c.x27;
> +        ctxt.x28 = c.x28;
> +        ctxt.x29 = c.x29;
> +        ctxt.x30 = c.x30;
> +        ctxt.pc64 = c.pc64;
> +        ctxt.cpsr = c.cpsr;
> +        ctxt.spsr_el1 = c.spsr_el1; /* spsr_svc */
> +
> +#ifdef CONFIG_ARM_32
> +        ctxt.spsr_fiq = c.spsr_fiq;
> +        ctxt.spsr_irq = c.spsr_irq;
> +        ctxt.spsr_und = c.spsr_und;
> +        ctxt.spsr_abt = c.spsr_abt;
> +#endif
> +#ifdef CONFIG_ARM_64
> +        ctxt.sp_el0 = c.sp_el0;
> +        ctxt.sp_el1 = c.sp_el1;
> +        ctxt.elr_el1 = c.elr_el1;
> +#endif
> +
> +        /* check VFP state size */
> +        BUILD_BUG_ON(sizeof(v->arch.vfp) > sizeof (ctxt.vfp));
> +        memcpy((void*) &ctxt.vfp, (void*) &v->arch.vfp, sizeof(v->arch.vfp));
> +
> +        ctxt.pause_flags = v->pause_flags;
> +
> +        if ( (ret = hvm_save_entry(VCPU, v->vcpu_id, h, &ctxt)) != 0 )
> +            return ret;
> +    }
> +
> +    return ret;
> +}
> +
> +/* Load CPU related states from existing save/restore context */
> +static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
> +{
> +    int vcpuid;
> +    struct hvm_hw_cpu ctxt;
> +    struct vcpu *v;
> +    struct vcpu_guest_core_regs c;
> +
> +    vcpuid = hvm_load_instance(h);
> +    if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
> +    {
> +        dprintk(XENLOG_G_ERR, "HVM restore: dom%u has no vcpu%u\n",
> +                d->domain_id, vcpuid);
> +        return -EINVAL;
> +    }
> +
> +    if ( hvm_load_entry(VCPU, h, &ctxt) != 0 )
> +        return -EINVAL;
> +
> +    v->arch.sctlr = ctxt.sctlr;
> +    v->arch.ttbr0 = ctxt.ttbr0;
> +    v->arch.ttbr1 = ctxt.ttbr1;
> +    v->arch.ttbcr = ctxt.ttbcr;
> +
> +    v->arch.dacr = ctxt.dacr;
> +    v->arch.ifsr = ctxt.ifsr;
> +#ifdef CONFIG_ARM_32
> +    v->arch.ifar = ctxt.ifar;
> +    v->arch.dfar = ctxt.dfar;
> +    v->arch.dfsr = ctxt.dfsr;
> +#else
> +    v->arch.far = ctxt.far;
> +    v->arch.esr = ctxt.esr;
> +#endif
> +
> +#ifdef CONFIG_ARM_32
> +    v->arch.mair0 = ctxt.mair0;
> +    v->arch.mair1 = ctxt.mair1;
> +#else
> +    v->arch.mair = ctxt.mair0;
> +#endif
> +
> +    /* Control Registers */
> +    v->arch.actlr = ctxt.actlr;
> +    v->arch.cpacr = ctxt.cpacr;
> +    v->arch.contextidr = ctxt.contextidr;
> +    v->arch.tpidr_el0 = ctxt.tpidr_el0;
> +    v->arch.tpidr_el1 = ctxt.tpidr_el1;
> +    v->arch.tpidrro_el0 = ctxt.tpidrro_el0;
> +
> +    /* CP 15 */
> +    v->arch.csselr = ctxt.csselr;
> +
> +    v->arch.afsr0 = ctxt.afsr0;
> +    v->arch.afsr1 = ctxt.afsr1;
> +    v->arch.vbar = ctxt.vbar;
> +    v->arch.par = ctxt.par;
> +    v->arch.teecr = ctxt.teecr;
> +    v->arch.teehbr = ctxt.teehbr;
> +#ifdef CONFIG_ARM_32
> +    v->arch.joscr = ctxt.joscr;
> +    v->arch.jmcr = ctxt.jmcr;
> +#endif
> +
> +    /* fill guest core registers */
> +    memset(&c, 0, sizeof(c));
> +    c.x0 = ctxt.x0;
> +    c.x1 = ctxt.x1;
> +    c.x2 = ctxt.x2;
> +    c.x3 = ctxt.x3;
> +    c.x4 = ctxt.x4;
> +    c.x5 = ctxt.x5;
> +    c.x6 = ctxt.x6;
> +    c.x7 = ctxt.x7;
> +    c.x8 = ctxt.x8;
> +    c.x9 = ctxt.x9;
> +    c.x10 = ctxt.x10;
> +    c.x11 = ctxt.x11;
> +    c.x12 = ctxt.x12;
> +    c.x13 = ctxt.x13;
> +    c.x14 = ctxt.x14;
> +    c.x15 = ctxt.x15;
> +    c.x16 = ctxt.x16;
> +    c.x17 = ctxt.x17;
> +    c.x18 = ctxt.x18;
> +    c.x19 = ctxt.x19;
> +    c.x20 = ctxt.x20;
> +    c.x21 = ctxt.x21;
> +    c.x22 = ctxt.x22;
> +    c.x23 = ctxt.x23;
> +    c.x24 = ctxt.x24;
> +    c.x25 = ctxt.x25;
> +    c.x26 = ctxt.x26;
> +    c.x27 = ctxt.x27;
> +    c.x28 = ctxt.x28;
> +    c.x29 = ctxt.x29;
> +    c.x30 = ctxt.x30;
> +    c.pc64 = ctxt.pc64;
> +    c.cpsr = ctxt.cpsr;
> +    c.spsr_el1 = ctxt.spsr_el1; /* spsr_svc */
> +
> +#ifdef CONFIG_ARM_32
> +    c.spsr_fiq = ctxt.spsr_fiq;
> +    c.spsr_irq = ctxt.spsr_irq;
> +    c.spsr_und = ctxt.spsr_und;
> +    c.spsr_abt = ctxt.spsr_abt;
> +#endif
> +#ifdef CONFIG_ARM_64
> +    c.sp_el0 = ctxt.sp_el0;
> +    c.sp_el1 = ctxt.sp_el1;
> +    c.elr_el1 = ctxt.elr_el1;
> +#endif
> +
> +    /* set guest core registers */
> +    vcpu_regs_user_to_hyp(v, &c);
> +
> +    /* check VFP state size */
> +    BUILD_BUG_ON(sizeof(v->arch.vfp) > sizeof (ctxt.vfp));
> +    memcpy(&v->arch.vfp, &ctxt,  sizeof(v->arch.vfp));
> +
> +    v->is_initialised = 1;
> +    v->pause_flags = ctxt.pause_flags;
> +
> +    return 0;
> +}
> +
> +HVM_REGISTER_SAVE_RESTORE(VCPU, hvm_save_cpu_ctxt, hvm_load_cpu_ctxt, 1,
> +                          HVMSR_PER_VCPU);
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/arch/arm/save.c b/xen/arch/arm/save.c
> new file mode 100644
> index 0000000..eef14a8
> --- /dev/null
> +++ b/xen/arch/arm/save.c
> @@ -0,0 +1,65 @@
> +/*
> + * Save.c: Save and restore HVM guest's emulated hardware state for ARM.
> + *
> + * Copyright (c) 2014, Samsung Electronics.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
> + * Place - Suite 330, Boston, MA 02111-1307 USA.
> + */
> +#include <asm/hvm/support.h>
> +#include <public/hvm/save.h>
> +
> +void arch_hvm_save(struct domain *d, struct hvm_save_header *hdr)
> +{
> +    hdr->cpuid = current_cpu_data.midr.bits;
> +}
> +
> +int arch_hvm_load(struct domain *d, struct hvm_save_header *hdr)
> +{
> +    uint32_t cpuid;
> +
> +    if ( hdr->magic != HVM_FILE_MAGIC )
> +    {
> +        printk(XENLOG_G_ERR "HVM%d restore: bad magic number %#"PRIx32"\n",
> +               d->domain_id, hdr->magic);
> +        return -EINVAL;
> +    }
> +
> +    if ( hdr->version != HVM_FILE_VERSION )
> +    {
> +        printk(XENLOG_G_ERR "HVM%d restore: unsupported version %u\n",
> +               d->domain_id, hdr->version);
> +        return -EINVAL;
> +    }
> +
> +    cpuid = current_cpu_data.midr.bits;
> +    if ( hdr->cpuid != cpuid )
> +    {
> +        printk(XENLOG_G_INFO "HVM%d restore: VM saved on one CPU "
> +               "(%#"PRIx32") and restored on another (%#"PRIx32").\n",
> +               d->domain_id, hdr->cpuid, cpuid);
> +        return -EINVAL;
> +    }
> +
> +    return 0;
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> index 9fc9586..af244a7 100644
> --- a/xen/arch/arm/vgic.c
> +++ b/xen/arch/arm/vgic.c
> @@ -24,6 +24,7 @@
>  #include <xen/softirq.h>
>  #include <xen/irq.h>
>  #include <xen/sched.h>
> +#include <xen/hvm/save.h>
>  
>  #include <asm/current.h>
>  
> @@ -73,6 +74,75 @@ static struct vgic_irq_rank *vgic_irq_rank(struct vcpu *v, int b, int n)
>          return NULL;
>  }
>  
> +/* Save rank info into a context to support domain save/restore */
> +static int vgic_save_irq_rank(struct vcpu *v, struct vgic_rank *ext,
> +                              struct vgic_irq_rank *rank)
> +{
> +    spin_lock(&rank->lock);
> +
> +    /* IENABLE, IACTIVE, IPEND, PENDSGI registers */
> +    ext->ienable = rank->ienable;
> +    ext->iactive = rank->iactive;
> +    ext->ipend = rank->ipend;
> +    ext->pendsgi = rank->pendsgi;
> +
> +    /* ICFG */
> +    ext->icfg[0] = rank->icfg[0];
> +    ext->icfg[1] = rank->icfg[1];
> +
> +    /* IPRIORITY */
> +    BUILD_BUG_ON(sizeof(rank->ipriority) != sizeof (ext->ipriority));
> +    memcpy(ext->ipriority, rank->ipriority, sizeof(rank->ipriority));
> +
> +    /* ITARGETS */
> +    BUILD_BUG_ON(sizeof(rank->itargets) != sizeof (ext->itargets));
> +    memcpy(ext->itargets, rank->itargets, sizeof(rank->itargets));
> +
> +    spin_unlock(&rank->lock);
> +
> +    return 0;
> +}
> +
> +/* Load rank info from a context to support for domain save/restore */
> +static int vgic_load_irq_rank(struct vcpu *v, struct vgic_irq_rank *rank,
> +                              struct vgic_rank *ext)
> +{
> +    struct pending_irq *p;
> +    unsigned int irq = 0;
> +    const unsigned long enable_bits = ext->ienable;
> +
> +    spin_lock(&rank->lock);
> +
> +    /* IENABLE, IACTIVE, IPEND, PENDSGI registers */
> +    rank->ienable = ext->ienable;
> +    rank->iactive = ext->iactive;
> +    rank->ipend = ext->ipend;
> +    rank->pendsgi = ext->pendsgi;
> +
> +    /* ICFG */
> +    rank->icfg[0] = ext->icfg[0];
> +    rank->icfg[1] = ext->icfg[1];
> +
> +    /* IPRIORITY */
> +    BUILD_BUG_ON(sizeof(rank->ipriority) != sizeof (ext->ipriority));
> +    memcpy(rank->ipriority, ext->ipriority, sizeof(rank->ipriority));
> +
> +    /* ITARGETS */
> +    BUILD_BUG_ON(sizeof(rank->itargets) != sizeof (ext->itargets));
> +    memcpy(rank->itargets, ext->itargets, sizeof(rank->itargets));
> +
> +    /* Set IRQ status as enabled by iterating through rank->ienable */
> +    while ( (irq = find_next_bit(&enable_bits, 32, irq)) < 32 ) {
> +        p = irq_to_pending(v, irq);
> +        set_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
> +        irq++;
> +    }
> +
> +    spin_unlock(&rank->lock);
> +
> +    return 0;
> +}
> +
>  int domain_vgic_init(struct domain *d)
>  {
>      int i;
> @@ -749,6 +819,82 @@ out:
>          smp_send_event_check_mask(cpumask_of(v->processor));
>  }
>  
> +
> +/* Save GIC state into a context to support save/restore */
> +static int hvm_gic_save_ctxt(struct domain *d, hvm_domain_context_t *h)
> +{
> +    struct hvm_hw_gic ctxt;
> +    struct vcpu *v;
> +    int ret = 0;
> +
> +    /* Save the state of GICs */
> +    for_each_vcpu( d, v )
> +    {
> +        ctxt.gic_hcr = v->arch.gic_hcr;
> +        ctxt.gic_vmcr = v->arch.gic_vmcr;
> +        ctxt.gic_apr = v->arch.gic_apr;
> +
> +        /* Save list registers and masks */
> +        BUILD_BUG_ON(sizeof(v->arch.gic_lr) > sizeof (ctxt.gic_lr));
> +        memcpy(ctxt.gic_lr, v->arch.gic_lr, sizeof(v->arch.gic_lr));
> +
> +        ctxt.lr_mask = v->arch.lr_mask;
> +        ctxt.event_mask = v->arch.event_mask;
> +
> +        /* Save PPI states (per-CPU), necessary for SMP-enabled guests */
> +        if ( (ret = vgic_save_irq_rank(v, &ctxt.ppi_state,
> +                                       &v->arch.vgic.private_irqs)) != 0 )
> +            return ret;
> +
> +        if ( (ret = hvm_save_entry(GIC, v->vcpu_id, h, &ctxt)) != 0 )
> +            return ret;
> +    }
> +
> +    return ret;
> +}
> +
> +/* Restore GIC state from a context to support save/restore */
> +static int hvm_gic_load_ctxt(struct domain *d, hvm_domain_context_t *h)
> +{
> +    int vcpuid;
> +    struct hvm_hw_gic ctxt;
> +    struct vcpu *v;
> +    int ret = 0;
> +
> +    /* Which vcpu is this? */
> +    vcpuid = hvm_load_instance(h);
> +    if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
> +    {
> +        dprintk(XENLOG_G_ERR, "HVM restore: dom%u has no vcpu%u\n",
> +                d->domain_id, vcpuid);
> +        return -EINVAL;
> +    }
> +
> +    if ( hvm_load_entry(GIC, h, &ctxt) != 0 )
> +        return -EINVAL;
> +
> +    v->arch.gic_hcr = ctxt.gic_hcr;
> +    v->arch.gic_vmcr = ctxt.gic_vmcr;
> +    v->arch.gic_apr = ctxt.gic_apr;
> +
> +    /* Restore list registers and masks */
> +    BUILD_BUG_ON(sizeof(v->arch.gic_lr) > sizeof (ctxt.gic_lr));
> +    memcpy(v->arch.gic_lr, ctxt.gic_lr, sizeof(v->arch.gic_lr));
> +
> +    v->arch.lr_mask = ctxt.lr_mask;
> +    v->arch.event_mask = ctxt.event_mask;
> +
> +    /* Restore PPI states */
> +    if ( (ret = vgic_load_irq_rank(v, &v->arch.vgic.private_irqs,
> +                                   &ctxt.ppi_state)) != 0 )
> +        return ret;
> +
> +    return ret;
> +}
> +
> +HVM_REGISTER_SAVE_RESTORE(GIC, hvm_gic_save_ctxt, hvm_gic_load_ctxt, 1,
> +                          HVMSR_PER_VCPU);
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/arch/arm/vtimer.c b/xen/arch/arm/vtimer.c
> index 3d6a721..7c47eac 100644
> --- a/xen/arch/arm/vtimer.c
> +++ b/xen/arch/arm/vtimer.c
> @@ -21,6 +21,7 @@
>  #include <xen/lib.h>
>  #include <xen/timer.h>
>  #include <xen/sched.h>
> +#include <xen/hvm/save.h>
>  #include <asm/irq.h>
>  #include <asm/time.h>
>  #include <asm/gic.h>
> @@ -284,6 +285,76 @@ int vtimer_emulate(struct cpu_user_regs *regs, union hsr hsr)
>      }
>  }
>  
> +static int hvm_vtimer_save_ctxt(struct domain *d, hvm_domain_context_t *h)
> +{
> +    struct hvm_hw_timer ctxt;
> +    struct vcpu *v;
> +    struct vtimer *t;
> +    int i, ret = 0;
> +
> +    /* Save the state of vtimer and ptimer */
> +    for_each_vcpu( d, v )
> +    {
> +        t = &v->arch.virt_timer;
> +        for ( i = 0; i < 2; i++ )
> +        {
> +            ctxt.cval = t->cval;
> +            ctxt.ctl = t->ctl;
> +            ctxt.vtb_offset = i ? d->arch.phys_timer_base.offset :
> +                d->arch.virt_timer_base.offset;
> +            ctxt.type = i ? TIMER_TYPE_PHYS : TIMER_TYPE_VIRT;
> +
> +            if ( (ret = hvm_save_entry(TIMER, v->vcpu_id, h, &ctxt)) != 0 )
> +                return ret;
> +
> +            t = &v->arch.phys_timer;
> +        }
> +    }
> +
> +    return ret;
> +}
> +
> +static int hvm_vtimer_load_ctxt(struct domain *d, hvm_domain_context_t *h)
> +{
> +    int vcpuid;
> +    struct hvm_hw_timer ctxt;
> +    struct vcpu *v;
> +    struct vtimer *t = NULL;
> +
> +    /* Which vcpu is this? */
> +    vcpuid = hvm_load_instance(h);
> +
> +    if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
> +    {
> +        dprintk(XENLOG_G_ERR, "HVM restore: dom%u has no vcpu%u\n",
> +                d->domain_id, vcpuid);
> +        return -EINVAL;
> +    }
> +
> +    if ( hvm_load_entry(TIMER, h, &ctxt) != 0 )
> +        return -EINVAL;
> +
> +    if ( ctxt.type == TIMER_TYPE_VIRT )
> +    {
> +        t = &v->arch.virt_timer;
> +        d->arch.virt_timer_base.offset = ctxt.vtb_offset;
> +    }
> +    else
> +    {
> +        t = &v->arch.phys_timer;
> +        d->arch.phys_timer_base.offset = ctxt.vtb_offset;
> +    }
> +
> +    t->cval = ctxt.cval;
> +    t->ctl = ctxt.ctl;
> +    t->v = v;
> +
> +    return 0;
> +}
> +
> +HVM_REGISTER_SAVE_RESTORE(TIMER, hvm_vtimer_save_ctxt, hvm_vtimer_load_ctxt,
> +                          2, HVMSR_PER_VCPU);
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
> index 26635ff..30fbd30 100644
> --- a/xen/arch/x86/domctl.c
> +++ b/xen/arch/x86/domctl.c
> @@ -399,76 +399,6 @@ long arch_do_domctl(
>      }
>      break;
>  
> -    case XEN_DOMCTL_sethvmcontext:
> -    { 
> -        struct hvm_domain_context c = { .size = domctl->u.hvmcontext.size };
> -
> -        ret = -EINVAL;
> -        if ( !is_hvm_domain(d) ) 
> -            goto sethvmcontext_out;
> -
> -        ret = -ENOMEM;
> -        if ( (c.data = xmalloc_bytes(c.size)) == NULL )
> -            goto sethvmcontext_out;
> -
> -        ret = -EFAULT;
> -        if ( copy_from_guest(c.data, domctl->u.hvmcontext.buffer, c.size) != 0)
> -            goto sethvmcontext_out;
> -
> -        domain_pause(d);
> -        ret = hvm_load(d, &c);
> -        domain_unpause(d);
> -
> -    sethvmcontext_out:
> -        if ( c.data != NULL )
> -            xfree(c.data);
> -    }
> -    break;
> -
> -    case XEN_DOMCTL_gethvmcontext:
> -    { 
> -        struct hvm_domain_context c = { 0 };
> -
> -        ret = -EINVAL;
> -        if ( !is_hvm_domain(d) ) 
> -            goto gethvmcontext_out;
> -
> -        c.size = hvm_save_size(d);
> -
> -        if ( guest_handle_is_null(domctl->u.hvmcontext.buffer) )
> -        {
> -            /* Client is querying for the correct buffer size */
> -            domctl->u.hvmcontext.size = c.size;
> -            ret = 0;
> -            goto gethvmcontext_out;            
> -        }
> -
> -        /* Check that the client has a big enough buffer */
> -        ret = -ENOSPC;
> -        if ( domctl->u.hvmcontext.size < c.size ) 
> -            goto gethvmcontext_out;
> -
> -        /* Allocate our own marshalling buffer */
> -        ret = -ENOMEM;
> -        if ( (c.data = xmalloc_bytes(c.size)) == NULL )
> -            goto gethvmcontext_out;
> -
> -        domain_pause(d);
> -        ret = hvm_save(d, &c);
> -        domain_unpause(d);
> -
> -        domctl->u.hvmcontext.size = c.cur;
> -        if ( copy_to_guest(domctl->u.hvmcontext.buffer, c.data, c.size) != 0 )
> -            ret = -EFAULT;
> -
> -    gethvmcontext_out:
> -        copyback = 1;
> -
> -        if ( c.data != NULL )
> -            xfree(c.data);
> -    }
> -    break;
> -
>      case XEN_DOMCTL_gethvmcontext_partial:
>      { 
>          ret = -EINVAL;
> diff --git a/xen/common/Makefile b/xen/common/Makefile
> index 3683ae3..13b781f 100644
> --- a/xen/common/Makefile
> +++ b/xen/common/Makefile
> @@ -62,7 +62,7 @@ obj-$(CONFIG_XENCOMM) += xencomm.o
>  
>  subdir-$(CONFIG_COMPAT) += compat
>  
> -subdir-$(x86_64) += hvm
> +subdir-y += hvm
>  
>  subdir-$(coverage) += gcov
>  
> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
> index 5342e5d..2ea4af5 100644
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -24,6 +24,8 @@
>  #include <xen/bitmap.h>
>  #include <xen/paging.h>
>  #include <xen/hypercall.h>
> +#include <xen/hvm/save.h>
> +#include <xen/guest_access.h>
>  #include <asm/current.h>
>  #include <asm/irq.h>
>  #include <asm/page.h>
> @@ -881,6 +883,78 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>      }
>      break;
>  
> +    case XEN_DOMCTL_sethvmcontext:
> +    {
> +        struct hvm_domain_context c = { .size = op->u.hvmcontext.size };
> +
> +        ret = -EINVAL;
> +        if ( (d == current->domain) || /* no domain_pause() */
> +             !is_hvm_domain(d) )
> +            goto sethvmcontext_out;
> +
> +        ret = -ENOMEM;
> +        if ( (c.data = xmalloc_bytes(c.size)) == NULL )
> +            goto sethvmcontext_out;
> +
> +        ret = -EFAULT;
> +        if ( copy_from_guest(c.data, op->u.hvmcontext.buffer, c.size) != 0)

The != 0 is not needed.

> +            goto sethvmcontext_out;
> +
> +        domain_pause(d);
> +        ret = hvm_load(d, &c);
> +        domain_unpause(d);
> +
> +    sethvmcontext_out:
> +        if ( c.data != NULL )
> +            xfree(c.data);

I know you have copied this code and it was poor to start with, but
xfree() has the same semantics as free(), so is happy with a NULL
pointer.  You can drop the if condition here and below in getcontext.
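
I.e. the tail of the handler can simply be:

    sethvmcontext_out:
        xfree(c.data);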

> +    }
> +    break;
> +
> +    case XEN_DOMCTL_gethvmcontext:
> +    {
> +        struct hvm_domain_context c = { 0 };
> +
> +        ret = -EINVAL;
> +        if ( (d == current->domain) || /* no domain_pause() */
> +             !is_hvm_domain(d) )
> +            goto gethvmcontext_out;
> +
> +        c.size = hvm_save_size(d);
> +
> +        if ( guest_handle_is_null(op->u.hvmcontext.buffer) )
> +        {
> +            /* Client is querying for the correct buffer size */
> +            op->u.hvmcontext.size = c.size;
> +            ret = 0;
> +            goto gethvmcontext_out;
> +        }
> +
> +        /* Check that the client has a big enough buffer */
> +        ret = -ENOSPC;
> +        if ( op->u.hvmcontext.size < c.size )
> +            goto gethvmcontext_out;
> +
> +        /* Allocate our own marshalling buffer */
> +        ret = -ENOMEM;
> +        if ( (c.data = xmalloc_bytes(c.size)) == NULL )
> +            goto gethvmcontext_out;
> +
> +        domain_pause(d);
> +        ret = hvm_save(d, &c);
> +        domain_unpause(d);
> +
> +        op->u.hvmcontext.size = c.cur;
> +        if ( copy_to_guest(op->u.hvmcontext.buffer, c.data, c.size) != 0 )
> +            ret = -EFAULT;
> +
> +    gethvmcontext_out:
> +        copyback = 1;
> +
> +        if ( c.data != NULL )
> +            xfree(c.data);
> +    }
> +    break;
> +
>      default:
>          ret = arch_do_domctl(op, d, u_domctl);
>          break;
> diff --git a/xen/include/asm-arm/hvm/support.h b/xen/include/asm-arm/hvm/support.h
> new file mode 100644
> index 0000000..09f7cb8
> --- /dev/null
> +++ b/xen/include/asm-arm/hvm/support.h
> @@ -0,0 +1,29 @@
> +/*
> + * asm-arm/hvm/support.h: HVM support routines used by ARM.
> + *
> + * Copyright (c) 2014, Samsung Electronics.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
> + * Place - Suite 330, Boston, MA 02111-1307 USA.
> + */
> +
> +#ifndef __ASM_ARM_HVM_SUPPORT_H__
> +#define __ASM_ARM_HVM_SUPPORT_H__
> +
> +#include <xen/types.h>
> +#include <public/hvm/ioreq.h>
> +#include <xen/sched.h>
> +#include <xen/hvm/save.h>
> +#include <asm/processor.h>
> +
> +#endif /* __ASM_ARM_HVM_SUPPORT_H__ */
> diff --git a/xen/include/public/arch-arm/hvm/save.h b/xen/include/public/arch-arm/hvm/save.h
> index 75b8e65..f6ad258 100644
> --- a/xen/include/public/arch-arm/hvm/save.h
> +++ b/xen/include/public/arch-arm/hvm/save.h
> @@ -26,6 +26,136 @@
>  #ifndef __XEN_PUBLIC_HVM_SAVE_ARM_H__
>  #define __XEN_PUBLIC_HVM_SAVE_ARM_H__
>  
> +#define HVM_FILE_MAGIC   0x92385520
> +#define HVM_FILE_VERSION 0x00000001
> +
> +struct hvm_save_header
> +{
> +    uint32_t magic;             /* Must be HVM_FILE_MAGIC */
> +    uint32_t version;           /* File format version */
> +    uint64_t changeset;         /* Version of Xen that saved this file */
> +    uint32_t cpuid;             /* MIDR_EL1 on the saving machine */

This looks needlessly copied from x86, which is far from ideal.

On x86, Xen tries to parse the Mercurial revision number from its compile
time information and fails now that the underlying codebase has moved
from hg to git.  As a result, the value is now generally -1.

cpuid is also an x86ism as far as I am aware.

I also wonder about the wisdom of having identically named structures
like this in arch code without an arch_ prefix?  We should make it as
hard as possible for things like this to accidentally get referenced in
common code.
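
Something along these lines (purely illustrative, not a concrete
proposal; the field choices are just guesses) would avoid both problems:

    struct arch_hvm_save_header
    {
        uint32_t magic;          /* Must be HVM_FILE_MAGIC */
        uint32_t version;        /* File format version */
        uint64_t xen_version;    /* e.g. (XEN_VERSION << 16) | XEN_SUBVERSION */
        uint64_t midr_el1;       /* in place of the x86ism 'cpuid' */
    };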

> +};
> +DECLARE_HVM_SAVE_TYPE(HEADER, 1, struct hvm_save_header);
> +
> +struct vgic_rank
> +{
> +    uint32_t ienable, iactive, ipend, pendsgi;
> +    uint32_t icfg[2];
> +    uint32_t ipriority[8];
> +    uint32_t itargets[8];
> +};
> +
> +struct hvm_hw_gic
> +{
> +    uint32_t gic_hcr;
> +    uint32_t gic_vmcr;
> +    uint32_t gic_apr;
> +    uint32_t gic_lr[64];
> +    uint64_t event_mask;

Does this uint64_t have alignment issues between 32-bit and 64-bit
builds?  It certainly would on x86, but I don't know for sure on arm.
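
If in doubt, a compile-time layout check on both builds would catch it,
e.g. (an untested sketch):

    BUILD_BUG_ON(offsetof(struct hvm_hw_gic, event_mask) & 7);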

> +    uint64_t lr_mask;
> +    struct vgic_rank ppi_state;
> +};
> +DECLARE_HVM_SAVE_TYPE(GIC, 2, struct hvm_hw_gic);
> +
> +#define TIMER_TYPE_VIRT 0
> +#define TIMER_TYPE_PHYS 1
> +
> +struct hvm_hw_timer
> +{
> +    uint64_t vtb_offset;
> +    uint32_t ctl;
> +    uint64_t cval;

Another alignment query.

> +    uint32_t type;
> +};
> +DECLARE_HVM_SAVE_TYPE(TIMER, 3, struct hvm_hw_timer);
> +
> +struct hvm_hw_cpu
> +{
> +#ifdef CONFIG_ARM_32
> +    uint64_t vfp[34];  /* 32-bit VFP registers */
> +#else
> +    uint64_t vfp[66];  /* 64-bit VFP registers */
> +#endif
> +
> +    /* Guest core registers */
> +    uint64_t x0;     /* r0_usr */
> +    uint64_t x1;     /* r1_usr */
> +    uint64_t x2;     /* r2_usr */
> +    uint64_t x3;     /* r3_usr */
> +    uint64_t x4;     /* r4_usr */
> +    uint64_t x5;     /* r5_usr */
> +    uint64_t x6;     /* r6_usr */
> +    uint64_t x7;     /* r7_usr */
> +    uint64_t x8;     /* r8_usr */
> +    uint64_t x9;     /* r9_usr */
> +    uint64_t x10;    /* r10_usr */
> +    uint64_t x11;    /* r11_usr */
> +    uint64_t x12;    /* r12_usr */
> +    uint64_t x13;    /* sp_usr */
> +    uint64_t x14;    /* lr_usr; */
> +    uint64_t x15;    /* __unused_sp_hyp */
> +    uint64_t x16;    /* lr_irq */
> +    uint64_t x17;    /* sp_irq */
> +    uint64_t x18;    /* lr_svc */
> +    uint64_t x19;    /* sp_svc */
> +    uint64_t x20;    /* lr_abt */
> +    uint64_t x21;    /* sp_abt */
> +    uint64_t x22;    /* lr_und */
> +    uint64_t x23;    /* sp_und */
> +    uint64_t x24;    /* r8_fiq */
> +    uint64_t x25;    /* r9_fiq */
> +    uint64_t x26;    /* r10_fiq */
> +    uint64_t x27;    /* r11_fiq */
> +    uint64_t x28;    /* r12_fiq */
> +    uint64_t x29;    /* fp,sp_fiq */
> +    uint64_t x30;    /* lr_fiq */
> +    uint64_t pc64;   /* ELR_EL2 */
> +    uint32_t cpsr;   /* SPSR_EL2 */
> +    uint32_t spsr_el1;  /*spsr_svc */
> +    /* AArch32 guests only */
> +    uint32_t spsr_fiq, spsr_irq, spsr_und, spsr_abt;
> +    /* AArch64 guests only */
> +    uint64_t sp_el0;
> +    uint64_t sp_el1, elr_el1;

If these are arch exclusive, should they be in a union?
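
e.g. (a sketch only; note this would change the record layout):

    union
    {
        struct {
            uint32_t spsr_fiq, spsr_irq, spsr_und, spsr_abt;
        } arm32;
        struct {
            uint64_t sp_el0, sp_el1, elr_el1;
        } arm64;
    };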

> +
> +    uint32_t sctlr, ttbcr;
> +    uint64_t ttbr0, ttbr1;
> +
> +    uint32_t ifar, dfar;
> +    uint32_t ifsr, dfsr;
> +    uint32_t dacr;
> +    uint64_t par;
> +
> +    uint64_t far;
> +    uint64_t esr;
> +
> +    uint64_t mair0, mair1;
> +    uint64_t tpidr_el0;
> +    uint64_t tpidr_el1;
> +    uint64_t tpidrro_el0;
> +    uint64_t vbar;
> +
> +    /* Control Registers */
> +    uint32_t actlr;
> +    uint32_t cpacr;
> +    uint32_t afsr0, afsr1;
> +    uint32_t contextidr;
> +    uint32_t teecr, teehbr; /* ThumbEE, 32-bit guests only */
> +    uint32_t joscr, jmcr;
> +    /* CP 15 */
> +    uint32_t csselr;
> +
> +    unsigned long pause_flags;

What is this doing here?  This is not architectural state of the cpu.

> +
> +};
> +DECLARE_HVM_SAVE_TYPE(VCPU, 4, struct hvm_hw_cpu);
> +
> +/*
> + * Largest type-code in use
> + */
> +#define HVM_SAVE_CODE_MAX 4
> +
>  #endif
>  
>  /*

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC v2 3/6] xen/arm: support guest do_suspend function
  2014-04-15 21:05 ` [RFC v2 3/6] xen/arm: support guest do_suspend function Wei Huang
@ 2014-04-15 23:38   ` Andrew Cooper
  2014-04-15 23:39   ` Andrew Cooper
  2014-04-16  9:10   ` Julien Grall
  2 siblings, 0 replies; 47+ messages in thread
From: Andrew Cooper @ 2014-04-15 23:38 UTC (permalink / raw)
  To: Wei Huang, xen-devel
  Cc: yjhyun.yoo, julien.grall, ian.campbell, jaeyong.yoo, stefano.stabellini

On 15/04/2014 22:05, Wei Huang wrote:
> From: Jaeyong Yoo <jaeyong.yoo@samsung.com>
>
> Make sched_op in do_suspend (drivers/xen/manage.c) return 0 on
> successful suspend.
>
> Signed-off-by: Alexey Sokolov <sokolov.a@samsung.com>
> Signed-off-by: Wei Huang <w1.huang@samsung.com>
> ---
>  tools/libxc/xc_resume.c |   25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
>
> diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
> index 18b4818..2b09990 100644
> --- a/tools/libxc/xc_resume.c
> +++ b/tools/libxc/xc_resume.c
> @@ -73,6 +73,31 @@ static int modify_returncode(xc_interface *xch, uint32_t domid)
>      return 0;
>  }
>  
> +#elif defined(__arm__) || defined(__aarch64__)
> +
> +static int modify_returncode(xc_interface *xch, uint32_t domid)
> +{
> +    vcpu_guest_context_any_t ctxt;
> +    xc_dominfo_t info;
> +    int rc;
> +
> +    if ( xc_domain_getinfo(xch, domid, 1, &info) != 1 )
> +    {
> +        PERROR("Could not get domain info");
> +        return -EINVAL;
> +    }

The semantics for xc_domain_getinfo() are crazy, and it sadly gets used
incorrectly far more often than correctly.

As the call stands, it asks for the first '1' domain which can be found
by starting at 'domid'.  If the provided domid is wrong, you will get
valid domain information for a different domain back, so in the end you
must also confirm that info.domid == domid.
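
i.e. something like (untested):

    if ( xc_domain_getinfo(xch, domid, 1, &info) != 1 ||
         info.domid != domid )
    {
        PERROR("Could not get domain info");
        return -EINVAL;
    }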

> +
> +    if ( (rc = xc_vcpu_getcontext(xch, domid, 0, &ctxt)) != 0 )
> +        return rc;
> +
> +    ctxt.c.user_regs.r0_usr = 1;

This is the only architecture-specific bit of code.  Can't you make the
code common, with a small #if defined($ARCH) section in the middle of
the function?
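
Roughly (an untested sketch; the real x86 path has extra handling that
would need folding in):

    static int modify_returncode(xc_interface *xch, uint32_t domid)
    {
        vcpu_guest_context_any_t ctxt;
        int rc;

        /* ... common domain info sanity checks ... */

        if ( (rc = xc_vcpu_getcontext(xch, domid, 0, &ctxt)) != 0 )
            return rc;

    #if defined(__i386__) || defined(__x86_64__)
        /* existing x86 return-code handling */
    #elif defined(__arm__) || defined(__aarch64__)
        ctxt.c.user_regs.r0_usr = 1;
    #endif

        return xc_vcpu_setcontext(xch, domid, 0, &ctxt);
    }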

~Andrew

> +
> +    if ( (rc = xc_vcpu_setcontext(xch, domid, 0, &ctxt)) != 0 )
> +        return rc;
> +
> +    return 0;
> +}
> +
>  #else
>  
>  static int modify_returncode(xc_interface *xch, uint32_t domid)

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC v2 5/6] xen/arm: Implement hypercall for dirty page tracing
  2014-04-15 21:05 ` [RFC v2 5/6] xen/arm: Implement hypercall for dirty page tracing Wei Huang
@ 2014-04-15 23:38   ` Andrew Cooper
  0 siblings, 0 replies; 47+ messages in thread
From: Andrew Cooper @ 2014-04-15 23:38 UTC (permalink / raw)
  To: Wei Huang, xen-devel
  Cc: yjhyun.yoo, julien.grall, ian.campbell, jaeyong.yoo, stefano.stabellini

On 15/04/2014 22:05, Wei Huang wrote:
> From: Jaeyong Yoo <jaeyong.yoo@samsung.com>
>
> This patch adds hypercall for shadow operations, including enable/disable
> and clean/peek dirty page bitmap.
>
> The design consists of two parts: dirty page detection and saving. For
> detection, we set up the guest p2m's leaf PTEs read-only, so that whenever
> the guest tries to write something, a permission fault happens and traps
> into xen. The permission-faulted GPA should be saved for the toolstack,
> which checks which pages are dirty. For this purpose, it temporarily saves
> the GPAs into a bitmap.
>
> Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com>
> Signed-off-by: Wei Huang <w1.huang@samsung.com>
> ---
>  xen/arch/arm/domain.c           |   14 +++
>  xen/arch/arm/domctl.c           |   21 ++++
>  xen/arch/arm/mm.c               |  105 ++++++++++++++++++-
>  xen/arch/arm/p2m.c              |  211 +++++++++++++++++++++++++++++++++++++++
>  xen/arch/arm/traps.c            |   11 ++
>  xen/include/asm-arm/domain.h    |    7 ++
>  xen/include/asm-arm/mm.h        |   10 ++
>  xen/include/asm-arm/p2m.h       |    8 +-
>  xen/include/asm-arm/processor.h |    2 +
>  9 files changed, 387 insertions(+), 2 deletions(-)
>
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index 3f04a77..d2531ed 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -207,6 +207,12 @@ static void ctxt_switch_to(struct vcpu *n)
>  
>      isb();
>  
> +    /* Dirty-page tracing
> +     * NB: How do we consider SMP case?
> +     */
> +    if ( n->domain->arch.dirty.mode )
> +        restore_vlpt(n->domain);
> +
>      /* This is could trigger an hardware interrupt from the virtual
>       * timer. The interrupt needs to be injected into the guest. */
>      virt_timer_restore(n);
> @@ -502,11 +508,19 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
>      /* Default the virtual ID to match the physical */
>      d->arch.vpidr = boot_cpu_data.midr.bits;
>  
> +    /* init for dirty-page tracing */
> +    d->arch.dirty.count = 0;
> +    d->arch.dirty.mode = 0;

Redundant initialization to 0.

> +    spin_lock_init(&d->arch.dirty.lock);
> +
>      d->arch.dirty.second_lvl_start = 0;
>      d->arch.dirty.second_lvl_end = 0;
>      d->arch.dirty.second_lvl[0] = NULL;
>      d->arch.dirty.second_lvl[1] = NULL;
>  
> +    memset(d->arch.dirty.bitmap, 0, sizeof(d->arch.dirty.bitmap));
> +    d->arch.dirty.bitmap_pages = 0;
> +
>      clear_page(d->shared_info);
>      share_xen_page_with_guest(
>          virt_to_page(d->shared_info), d, XENSHARE_writable);
> diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c
> index 45974e7..e84651f 100644
> --- a/xen/arch/arm/domctl.c
> +++ b/xen/arch/arm/domctl.c
> @@ -11,12 +11,33 @@
>  #include <xen/sched.h>
>  #include <xen/hypercall.h>
>  #include <public/domctl.h>
> +#include <xen/hvm/save.h>
> +#include <xen/guest_access.h>
> +
>  

Spurious whitespace change

>  long arch_do_domctl(struct xen_domctl *domctl, struct domain *d,
>                      XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>  {
> +    long ret = 0;
> +
>      switch ( domctl->cmd )
>      {
> +    case XEN_DOMCTL_shadow_op:
> +    {
> +        if ( d == current->domain ) /* no domain_pause() */
> +            return -EINVAL;
> +                                          
> +        domain_pause(d);
> +        ret = dirty_mode_op(d, &domctl->u.shadow_op);
> +        domain_unpause(d);
> +
> +        if ( __copy_to_guest(u_domctl, domctl, 1) )
> +            ret = -EFAULT;
> +
> +        return ret;
> +    }
> +    break;
> +
>      case XEN_DOMCTL_cacheflush:
>      {
>          unsigned long s = domctl->u.cacheflush.start_pfn;
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index a315752..ae852eb 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -981,7 +981,6 @@ void destroy_xen_mappings(unsigned long v, unsigned long e)
>      create_xen_entries(REMOVE, v, 0, (e - v) >> PAGE_SHIFT, 0);
>  }
>  
> -enum mg { mg_clear, mg_ro, mg_rw, mg_rx };
>  static void set_pte_flags_on_range(const char *p, unsigned long l, enum mg mg)
>  {
>      lpae_t pte;
> @@ -1370,6 +1369,110 @@ void domain_get_gpfn_range(struct domain *d, paddr_t *start, paddr_t *end)
>          *end = GUEST_RAM_BASE + ((paddr_t) p2m->max_mapped_gfn);
>  }
>  
> +static inline void mark_dirty_bitmap(struct domain *d, paddr_t addr)
> +{
> +    paddr_t ram_base = (paddr_t) GUEST_RAM_BASE;
> +    int bit_index = PFN_DOWN(addr - ram_base);
> +    int page_index = bit_index >> (PAGE_SHIFT + 3);
> +    int bit_index_residual = bit_index & ((1ul << (PAGE_SHIFT + 3)) - 1);

These must be unsigned quantities, and larger than an int for 64-bit
builds.  The same applies throughout this patch.
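
e.g.:

    unsigned long bit_index = PFN_DOWN(addr - ram_base);
    unsigned long page_index = bit_index >> (PAGE_SHIFT + 3);
    unsigned long bit_index_residual =
        bit_index & ((1UL << (PAGE_SHIFT + 3)) - 1);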

> +
> +    set_bit(bit_index_residual, d->arch.dirty.bitmap[page_index]);
> +}
> +
> +/* Routine for dirty-page tracing
> + *
> + * On first write, it page faults, its entry is changed to read-write,
> + * and on retry the write succeeds. For locating p2m of the faulting entry,
> + * we use virtual-linear page table.
> + *
> + * Returns zero if addr is not valid or dirty mode is not set
> + */
> +int handle_page_fault(struct domain *d, paddr_t addr)
> +{
> +
> +    lpae_t *vlp2m_pte = 0;

Pointers should be initialised to NULL.

> +    paddr_t gma_start = 0;
> +    paddr_t gma_end = 0;
> +
> +    if ( !d->arch.dirty.mode )
> +        return 0;
> +
> +    domain_get_gpfn_range(d, &gma_start, &gma_end);
> +    /* Ensure that addr is inside guest's RAM */
> +    if ( addr < gma_start || addr > gma_end )
> +        return 0;
> +
> +    vlp2m_pte = get_vlpt_3lvl_pte(addr);
> +    if ( vlp2m_pte->p2m.valid && vlp2m_pte->p2m.write == 0 &&
> +         vlp2m_pte->p2m.type == p2m_ram_logdirty )
> +    {
> +        lpae_t pte = *vlp2m_pte;
> +        pte.p2m.write = 1;
> +        write_pte(vlp2m_pte, pte);
> +        flush_tlb_local();
> +
> +        /* only necessary to lock between get-dirty bitmap and mark dirty
> +         * bitmap. If get-dirty bitmap happens immediately before this
> +         * lock, the corresponding dirty-page would be marked at the next
> +         * round of get-dirty bitmap */
> +        spin_lock(&d->arch.dirty.lock);
> +        mark_dirty_bitmap(d, addr);
> +        spin_unlock(&d->arch.dirty.lock);
> +    }
> +
> +    return 1;
> +}
> +
> +int prepare_bitmap(struct domain *d)
> +{
> +    paddr_t gma_start = 0;
> +    paddr_t gma_end = 0;
> +    int nr_bytes;
> +    int nr_pages;
> +    int i;
> +
> +    domain_get_gpfn_range(d, &gma_start, &gma_end);
> +
> +    nr_bytes = (PFN_DOWN(gma_end - gma_start) + 7) / 8;
> +    nr_pages = (nr_bytes + PAGE_SIZE - 1) / PAGE_SIZE;
> +
> +    BUG_ON( nr_pages > MAX_DIRTY_BITMAP_PAGES );
> +
> +    for ( i = 0; i < nr_pages; ++i )
> +    {
> +        struct page_info *page;
> +
> +        page = alloc_domheap_page(NULL, 0);
> +        if ( page == NULL )
> +            goto cleanup_on_failure;
> +
> +        d->arch.dirty.bitmap[i] = map_domain_page_global(__page_to_mfn(page));
> +        clear_page(d->arch.dirty.bitmap[i]);
> +    }
> +
> +    d->arch.dirty.bitmap_pages = nr_pages;
> +    return 0;
> +
> +cleanup_on_failure:
> +    nr_pages = i;
> +    for ( i = 0; i < nr_pages; ++i )
> +    {

Extraneous braces.  (and elsewhere)

> +        unmap_domain_page_global(d->arch.dirty.bitmap[i]);
> +    }
> +
> +    return -ENOMEM;
> +}
> +
> +void cleanup_bitmap(struct domain *d)
> +{
> +    int i;
> +
> +    for ( i = 0; i < d->arch.dirty.bitmap_pages; ++i )
> +    {
> +        unmap_domain_page_global(d->arch.dirty.bitmap[i]);
> +    }
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> index 403fd89..d57a44a 100644
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -6,6 +6,8 @@
>  #include <xen/bitops.h>
>  #include <asm/flushtlb.h>
>  #include <asm/gic.h>
> +#include <xen/guest_access.h>
> +#include <xen/pfn.h>
>  #include <asm/event.h>
>  #include <asm/hardirq.h>
>  #include <asm/page.h>
> @@ -208,6 +210,7 @@ static lpae_t mfn_to_p2m_entry(unsigned long mfn, unsigned int mattr,
>          break;
>  
>      case p2m_ram_ro:
> +    case p2m_ram_logdirty:
>          e.p2m.xn = 0;
>          e.p2m.write = 0;
>          break;
> @@ -261,6 +264,10 @@ static int p2m_create_table(struct domain *d,
>  
>      pte = mfn_to_p2m_entry(page_to_mfn(page), MATTR_MEM, p2m_invalid);
>  
> +    /* mark the write bit (page table's case, ro bit) as 0. So it is writable 
> +     * in case of vlpt access */
> +    pte.pt.ro = 0;
> +
>      write_pte(entry, pte);
>  
>      return 0;
> @@ -697,6 +704,210 @@ unsigned long gmfn_to_mfn(struct domain *d, unsigned long gpfn)
>      return p >> PAGE_SHIFT;
>  }
>  
> +/* Change types across all p2m entries in a domain */
> +void p2m_change_entry_type_global(struct domain *d, enum mg nt)
> +{
> +    struct p2m_domain *p2m = &d->arch.p2m;
> +    paddr_t ram_base;
> +    int i1, i2, i3;
> +    int first_index, second_index, third_index;
> +    lpae_t *first = __map_domain_page(p2m->first_level);
> +    lpae_t pte, *second = NULL, *third = NULL;
> +
> +    domain_get_gpfn_range(d, &ram_base, NULL);
> +
> +    first_index = first_table_offset((uint64_t)ram_base);

You should not need to cast a paddr_t to uint64_t at all.

> +    second_index = second_table_offset((uint64_t)ram_base);
> +    third_index = third_table_offset((uint64_t)ram_base);
> +
> +    BUG_ON( !first && "Can't map first level p2m." );

map_domain_page() doesn't fail.  It will bug itself if it can't succeed.

> +
> +    spin_lock(&p2m->lock);
> +
> +    for ( i1 = first_index; i1 < LPAE_ENTRIES*2; ++i1 )
> +    {
> +        lpae_walk_t first_pte = first[i1].walk;
> +
> +        if ( !first_pte.valid || !first_pte.table )
> +            goto out;
> +
> +        second = map_domain_page(first_pte.base);
> +        BUG_ON( !second && "Can't map second level p2m.");
> +
> +        for ( i2 = second_index; i2 < LPAE_ENTRIES; ++i2 )
> +        {
> +            lpae_walk_t second_pte = second[i2].walk;
> +
> +            if ( !second_pte.valid || !second_pte.table )
> +                goto out;
> +
> +            third = map_domain_page(second_pte.base);
> +            BUG_ON( !third && "Can't map third level p2m.");
> +
> +            for ( i3 = third_index; i3 < LPAE_ENTRIES; ++i3 )
> +            {
> +
> +                lpae_walk_t third_pte = third[i3].walk;
> +                if ( !third_pte.valid )
> +                    goto out;
> +
> +                pte = third[i3];
> +                if ( nt == mg_ro )
> +                {
> +                    if ( pte.p2m.write == 1 )
> +                    {
> +                        pte.p2m.write = 0;
> +                        pte.p2m.type = p2m_ram_logdirty;
> +                    }
> +                    else
> +                    {
> +                        /* reuse avail bit as an indicator of 'actual'
> +                         * read-only */
> +                        pte.p2m.type = p2m_ram_rw;
> +                    }
> +                }
> +                else if ( nt == mg_rw )
> +                {
> +                    if ( pte.p2m.write == 0 &&
> +                         pte.p2m.type == p2m_ram_logdirty )
> +                    {
> +                        pte.p2m.write = p2m_ram_rw;
> +                    }
> +                }
> +                write_pte(&third[i3], pte);
> +            }
> +            unmap_domain_page(third);
> +
> +            third = NULL;
> +            third_index = 0;
> +        }
> +        unmap_domain_page(second);
> +
> +        second = NULL;
> +        second_index = 0;
> +        third_index = 0;
> +    }
> +
> +out:
> +    flush_tlb_all_local();
> +    if ( third ) unmap_domain_page(third);

Newlines please.

> +    if ( second ) unmap_domain_page(second);
> +    if ( first ) unmap_domain_page(first);
> +
> +    spin_unlock(&p2m->lock);
> +}
> +
> +/* Read a domain's log-dirty bitmap and stats.
> + * If the operation is a CLEAN, clear the bitmap and stats. */
> +int log_dirty_op(struct domain *d, xen_domctl_shadow_op_t *sc)
> +{
> +    int peek = 1;
> +    int i;
> +    int bitmap_size;
> +    paddr_t gma_start, gma_end;
> +
> +    /* this hypercall is called from domain 0, and we don't know which guest's
> +     * vlpt is mapped in xen_second, so, to be sure, we restore vlpt here */
> +    restore_vlpt(d);
> +
> +    domain_get_gpfn_range(d, &gma_start, &gma_end);
> +    bitmap_size = (gma_end - gma_start) / 8;
> +
> +    if ( guest_handle_is_null(sc->dirty_bitmap) )
> +    {
> +        peek = 0;
> +    }
> +    else
> +    {
> +        spin_lock(&d->arch.dirty.lock);
> +        for ( i = 0; i < d->arch.dirty.bitmap_pages; ++i )
> +        {
> +            int j = 0;
> +            uint8_t *bitmap;
> +            copy_to_guest_offset(sc->dirty_bitmap, i * PAGE_SIZE,
> +                                 d->arch.dirty.bitmap[i],
> +                                 bitmap_size < PAGE_SIZE ? bitmap_size :
> +                                                           PAGE_SIZE);
> +            bitmap_size -= PAGE_SIZE;
> +
> +            /* set p2m page table read-only */
> +            bitmap = d->arch.dirty.bitmap[i];
> +            while ((j = find_next_bit((const long unsigned int *)bitmap,
> +                                      PAGE_SIZE*8, j)) < PAGE_SIZE*8)
> +            {
> +                lpae_t *vlpt;
> +                paddr_t addr = gma_start + (i << (2*PAGE_SHIFT+3)) +
> +                    (j << PAGE_SHIFT);
> +                vlpt = get_vlpt_3lvl_pte(addr);
> +                vlpt->p2m.write = 0;
> +                j++;
> +            }
> +        }
> +
> +        if ( sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN )
> +        {
> +            for ( i = 0; i < d->arch.dirty.bitmap_pages; ++i )
> +            {
> +                clear_page(d->arch.dirty.bitmap[i]);
> +            }
> +        }
> +
> +        spin_unlock(&d->arch.dirty.lock);
> +        flush_tlb_local();
> +    }
> +
> +    sc->stats.dirty_count = d->arch.dirty.count;
> +
> +    return 0;
> +}
> +
> +long dirty_mode_op(struct domain *d, xen_domctl_shadow_op_t *sc)
> +{
> +    long ret = 0;
> +    switch (sc->op)
> +    {
> +        case XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY:
> +        case XEN_DOMCTL_SHADOW_OP_OFF:
> +        {
> +            enum mg nt = sc->op == XEN_DOMCTL_SHADOW_OP_OFF ? mg_rw : mg_ro;
> +
> +            d->arch.dirty.mode = sc->op == XEN_DOMCTL_SHADOW_OP_OFF ? 0 : 1;
> +            p2m_change_entry_type_global(d, nt);
> +
> +            if ( sc->op == XEN_DOMCTL_SHADOW_OP_OFF )
> +            {
> +                cleanup_vlpt(d);
> +                cleanup_bitmap(d);
> +            }
> +            else
> +            {
> +                if ( (ret = prepare_vlpt(d)) )
> +                   return ret;
> +
> +                if ( (ret = prepare_bitmap(d)) )
> +                {
> +                   /* in case of failure, we have to cleanup vlpt */
> +                   cleanup_vlpt(d);
> +                   return ret;
> +                }
> +            }
> +        }
> +        break;
> +
> +        case XEN_DOMCTL_SHADOW_OP_CLEAN:
> +        case XEN_DOMCTL_SHADOW_OP_PEEK:
> +        {
> +            ret = log_dirty_op(d, sc);
> +        }
> +        break;
> +
> +        default:
> +            return -ENOSYS;
> +    }
> +
> +    return ret;
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index a7edc4e..cca34e9 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -1491,6 +1491,8 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs,
>      struct hsr_dabt dabt = hsr.dabt;
>      int rc;
>      mmio_info_t info;
> +    int page_fault = ( (dabt.dfsc & FSC_MASK) ==
> +                       (FSC_FLT_PERM | FSC_3D_LEVEL) && dabt.write );
>  
>      if ( !check_conditional_instr(regs, hsr) )
>      {
> @@ -1512,6 +1514,15 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs,
>      if ( rc == -EFAULT )
>          goto bad_data_abort;
>  
> +    /* domU page fault handling for guest live migration. dabt.valid can be 
> +     * 0 here.
> +     */
> +    if ( page_fault && handle_page_fault(current->domain, info.gpa) )
> +    {
> +        /* Do not modify pc after page fault to repeat memory operation */
> +        return;
> +    }
> +
>      /* XXX: Decode the instruction if ISS is not valid */
>      if ( !dabt.valid )
>          goto bad_data_abort;
> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
> index 5321bd6..99f9f51 100644
> --- a/xen/include/asm-arm/domain.h
> +++ b/xen/include/asm-arm/domain.h
> @@ -163,9 +163,16 @@ struct arch_domain
>  
>      /* dirty-page tracing */
>      struct {
> +#define MAX_DIRTY_BITMAP_PAGES 64        /* support upto 8GB guest memory */
> +        spinlock_t lock;                 /* protect list: head, mvn_head */
> +        volatile int mode;               /* 1 if dirty pages tracing enabled */
> +        volatile unsigned int count;     /* dirty pages counter */
>          volatile int second_lvl_start;   /* for context switch */
>          volatile int second_lvl_end;
>          lpae_t *second_lvl[2];           /* copy of guest p2m's first */
> +        /* dirty bitmap */
> +        uint8_t *bitmap[MAX_DIRTY_BITMAP_PAGES];
> +        int bitmap_pages;                /* number of dirty bitmap pages */
>      } dirty;
>  
>      unsigned int evtchn_irq;
> diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
> index 5fd684f..5f9478b 100644
> --- a/xen/include/asm-arm/mm.h
> +++ b/xen/include/asm-arm/mm.h
> @@ -343,10 +343,18 @@ static inline void put_page_and_type(struct page_info *page)
>      put_page(page);
>  }
>  
> +enum mg { mg_clear, mg_ro, mg_rw, mg_rx };
> +
> +/* routine for dirty-page tracing */
> +int handle_page_fault(struct domain *d, paddr_t addr);
> +
>  int prepare_vlpt(struct domain *d);
>  void cleanup_vlpt(struct domain *d);
>  void restore_vlpt(struct domain *d);
>  
> +int prepare_bitmap(struct domain *d);
> +void cleanup_bitmap(struct domain *d);

Too generically named.  Perhaps {prepare,cleanup}_logdirty_bitmap() ?
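
i.e.:

    int prepare_logdirty_bitmap(struct domain *d);
    void cleanup_logdirty_bitmap(struct domain *d);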

~Andrew

> +
>  /* calculate the xen's virtual address for accessing the leaf PTE of
>   * a given address (GPA) */
>  static inline lpae_t * get_vlpt_3lvl_pte(paddr_t addr)
> @@ -359,6 +367,8 @@ static inline lpae_t * get_vlpt_3lvl_pte(paddr_t addr)
>      return &table[addr >> PAGE_SHIFT];
>  }
>  
> +void get_gma_start_end(struct domain *d, paddr_t *start, paddr_t *end);
> +
>  #endif /*  __ARCH_ARM_MM__ */
>  /*
>   * Local variables:
> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
> index bd71abe..0cecbe7 100644
> --- a/xen/include/asm-arm/p2m.h
> +++ b/xen/include/asm-arm/p2m.h
> @@ -2,6 +2,7 @@
>  #define _XEN_P2M_H
>  
>  #include <xen/mm.h>
> +#include <public/domctl.h>
>  
>  struct domain;
>  
> @@ -41,6 +42,7 @@ typedef enum {
>      p2m_invalid = 0,    /* Nothing mapped here */
>      p2m_ram_rw,         /* Normal read/write guest RAM */
>      p2m_ram_ro,         /* Read-only; writes are silently dropped */
> +    p2m_ram_logdirty,   /* Read-only: special mode for log dirty */
>      p2m_mmio_direct,    /* Read/write mapping of genuine MMIO area */
>      p2m_map_foreign,    /* Ram pages from foreign domain */
>      p2m_grant_map_rw,   /* Read/write grant mapping */
> @@ -49,7 +51,8 @@ typedef enum {
>  } p2m_type_t;
>  
>  #define p2m_is_foreign(_t)  ((_t) == p2m_map_foreign)
> -#define p2m_is_ram(_t)      ((_t) == p2m_ram_rw || (_t) == p2m_ram_ro)
> +#define p2m_is_ram(_t)      ((_t) == p2m_ram_rw || (_t) == p2m_ram_ro ||  \
> +                             (_t) == p2m_ram_logdirty)
>  
>  /* Initialise vmid allocator */
>  void p2m_vmid_allocator_init(void);
> @@ -178,6 +181,9 @@ static inline int get_page_and_type(struct page_info *page,
>      return rc;
>  }
>  
> +void p2m_change_entry_type_global(struct domain *d, enum mg nt);
> +long dirty_mode_op(struct domain *d, xen_domctl_shadow_op_t *sc);
> +
>  #endif /* _XEN_P2M_H */
>  
>  /*
> diff --git a/xen/include/asm-arm/processor.h b/xen/include/asm-arm/processor.h
> index 06e638f..9dc49c3 100644
> --- a/xen/include/asm-arm/processor.h
> +++ b/xen/include/asm-arm/processor.h
> @@ -399,6 +399,8 @@ union hsr {
>  #define FSC_CPR        (0x3a) /* Coprocossor Abort */
>  
>  #define FSC_LL_MASK    (_AC(0x03,U)<<0)
> +#define FSC_MASK       (0x3f) /* Fault status mask */
> +#define FSC_3D_LEVEL   (0x03) /* Third level fault*/
>  
>  /* Time counter hypervisor control register */
>  #define CNTHCTL_PA      (1u<<0)  /* Kernel/user access to physical counter */

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC v2 4/6] xen/arm: Implement VLPT for guest p2m mapping in live migration
  2014-04-15 21:05 ` [RFC v2 4/6] xen/arm: Implement VLPT for guest p2m mapping in live migration Wei Huang
  2014-04-15 22:29   ` Julien Grall
@ 2014-04-15 23:40   ` Andrew Cooper
  2014-04-22 17:54   ` Julien Grall
  2 siblings, 0 replies; 47+ messages in thread
From: Andrew Cooper @ 2014-04-15 23:40 UTC (permalink / raw)
  To: Wei Huang, xen-devel
  Cc: yjhyun.yoo, julien.grall, ian.campbell, jaeyong.yoo, stefano.stabellini

On 15/04/2014 22:05, Wei Huang wrote:
> From: Jaeyong Yoo <jaeyong.yoo@samsung.com>
>
> This patch implements VLPT (virtual-linear page table) for fast access
> to the 3rd PTE of guest P2M. For more info about VLPT, please see
> http://www.technovelty.org/linux/virtual-linear-page-table.html.
>
> When creating a mapping for VLPT, just copy the 1st level PTE of guest p2m
> to xen's 2nd level PTE. Then the mapping becomes the following:
>       xen's 1st PTE -->
>       xen's 2nd PTE (which is the same as 1st PTE of guest p2m) -->
>       guest p2m's 2nd PTE -->
>       guest p2m's 3rd PTE (the memory contents where the vlpt points)
>
> This function is used in dirty-page tracing. When domU write-fault is
> trapped by xen, xen can immediately locate the 3rd PTE of guest p2m.
>
> The following link shows the performance comparison for handling a
> dirty-page between vlpt and typical page table walking.
> http://lists.xen.org/archives/html/xen-devel/2013-08/msg01503.html
>
> Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com>
> ---
>  xen/arch/arm/domain.c        |    5 ++
>  xen/arch/arm/mm.c            |  116 ++++++++++++++++++++++++++++++++++++++++++
>  xen/include/asm-arm/config.h |    7 +++
>  xen/include/asm-arm/domain.h |    7 +++
>  xen/include/asm-arm/mm.h     |   17 +++++++
>  5 files changed, 152 insertions(+)
>
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index b125857..3f04a77 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -502,6 +502,11 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
>      /* Default the virtual ID to match the physical */
>      d->arch.vpidr = boot_cpu_data.midr.bits;
>  
> +    d->arch.dirty.second_lvl_start = 0;
> +    d->arch.dirty.second_lvl_end = 0;
> +    d->arch.dirty.second_lvl[0] = NULL;
> +    d->arch.dirty.second_lvl[1] = NULL;
> +

alloc_domain_struct() clears the domain page before handing it back.
Initialising things like this to 0 is pointless.

>      clear_page(d->shared_info);
>      share_xen_page_with_guest(
>          virt_to_page(d->shared_info), d, XENSHARE_writable);
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index 473ad04..a315752 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -750,6 +750,122 @@ void *__init arch_vmap_virt_end(void)
>      return (void *)VMAP_VIRT_END;
>  }
>  
> +/* Flush the vlpt area */
> +void flush_vlpt(struct domain *d)
> +{
> +    int flush_size;
> +
> +    flush_size = (d->arch.dirty.second_lvl_end -
> +                  d->arch.dirty.second_lvl_start) << SECOND_SHIFT;
> +
> +    /* flushing the 3rd level mapping */
> +    flush_xen_data_tlb_range_va(d->arch.dirty.second_lvl_start << SECOND_SHIFT,

This is taking a volatile int, shifting left by 21 and then using it as
a virtual address.  It is going to do the wrong thing more often than
the right thing.

second_lvl_{start,end} should be unsigned long, and I can't see a
justification for why they should be volatile.
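
i.e. (sketch):

    struct {
        unsigned long second_lvl_start;  /* for context switch */
        unsigned long second_lvl_end;
        lpae_t *second_lvl[2];           /* copy of guest p2m's first */
    } dirty;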

> +                                flush_size);
> +}
> +
> +/* Restore the xen page table for vlpt mapping for domain */
> +void restore_vlpt(struct domain *d)
> +{
> +    int i;
> +
> +    dsb(sy);
> +
> +    for ( i = d->arch.dirty.second_lvl_start; i < d->arch.dirty.second_lvl_end;
> +          ++i )
> +    {
> +        int k = i % LPAE_ENTRIES;
> +        int l = i / LPAE_ENTRIES;

These are all array indices - they should all be unsigned, as should the
loop variable.

> +
> +        if ( xen_second[i].bits != d->arch.dirty.second_lvl[l][k].bits )
> +        {
> +            write_pte(&xen_second[i], d->arch.dirty.second_lvl[l][k]);
> +            flush_xen_data_tlb_range_va(i << SECOND_SHIFT, 1 << SECOND_SHIFT);

Shifted int used as a virtual address.  Also, SECOND_SIZE exists.
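
i.e. something like:

    flush_xen_data_tlb_range_va((vaddr_t)i << SECOND_SHIFT, SECOND_SIZE);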

> +        }
> +    }
> +    
> +    dsb(sy);
> +    isb();
> +}
> +
> +/* Set up the xen page table for vlpt mapping for domain */
> +int prepare_vlpt(struct domain *d)
> +{
> +    int xen_second_linear_base;
> +    int gp2m_start_index, gp2m_end_index;
> +    struct p2m_domain *p2m = &d->arch.p2m;
> +    struct page_info *second_lvl_page;
> +    paddr_t gma_start = 0;
> +    paddr_t gma_end = 0;
> +    lpae_t *first[2];
> +    int i;
> +    uint64_t required, avail = VIRT_LIN_P2M_END - VIRT_LIN_P2M_START;
> +
> +    domain_get_gpfn_range(d, &gma_start, &gma_end);
> +    required = (gma_end - gma_start) >> LPAE_SHIFT;
> +
> +    if ( required > avail )
> +    {
> +        dprintk(XENLOG_ERR, "Available VLPT is small for domU guest"
> +                "(avail: %llx, required: %llx)\n", (unsigned long long)avail,
> +                (unsigned long long)required);
> +        return -ENOMEM;
> +    }
> +
> +    xen_second_linear_base = second_linear_offset(VIRT_LIN_P2M_START);

An unsigned value held in a signed integer.

> +
> +    gp2m_start_index = gma_start >> FIRST_SHIFT;
> +    gp2m_end_index = (gma_end >> FIRST_SHIFT) + 1;
> +
> +    if ( xen_second_linear_base + gp2m_end_index >= LPAE_ENTRIES * 2 )
> +    {
> +        dprintk(XENLOG_ERR, "xen second page is small for VLPT for domU");
> +        return -ENOMEM;
> +    }
> +
> +    second_lvl_page = alloc_domheap_pages(NULL, 1, 0);
> +    if ( second_lvl_page == NULL )
> +        return -ENOMEM;
> +
> +    /* First level p2m is 2 consecutive pages */
> +    d->arch.dirty.second_lvl[0] = map_domain_page_global(
> +        page_to_mfn(second_lvl_page) );
> +    d->arch.dirty.second_lvl[1] = map_domain_page_global(
> +        page_to_mfn(second_lvl_page+1) );
> +
> +    first[0] = __map_domain_page(p2m->first_level);
> +    first[1] = __map_domain_page(p2m->first_level+1);

Xen style - spaces immediately surrounding binary operators.
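
i.e.:

    first[0] = __map_domain_page(p2m->first_level);
    first[1] = __map_domain_page(p2m->first_level + 1);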

~Andrew

> +
> +    for ( i = gp2m_start_index; i < gp2m_end_index; ++i )
> +    {
> +        int k = i % LPAE_ENTRIES;
> +        int l = i / LPAE_ENTRIES;
> +        int k2 = (xen_second_linear_base + i) % LPAE_ENTRIES;
> +        int l2 = (xen_second_linear_base + i) / LPAE_ENTRIES;
> +
> +        write_pte(&xen_second[xen_second_linear_base+i], first[l][k]);
> +
> +        /* we copy the mapping into domain's structure as a reference
> +         * in case of the context switch (used in restore_vlpt) */
> +        d->arch.dirty.second_lvl[l2][k2] = first[l][k];
> +    }
> +    unmap_domain_page(first[0]);
> +    unmap_domain_page(first[1]);
> +
> +    /* storing the start and end index */
> +    d->arch.dirty.second_lvl_start = xen_second_linear_base + gp2m_start_index;
> +    d->arch.dirty.second_lvl_end = xen_second_linear_base + gp2m_end_index;
> +
> +    flush_vlpt(d);
> +
> +    return 0;
> +}
> +
> +void cleanup_vlpt(struct domain *d)
> +{
> +    /* First level p2m is 2 consecutive pages */
> +    unmap_domain_page_global(d->arch.dirty.second_lvl[0]);
> +    unmap_domain_page_global(d->arch.dirty.second_lvl[1]);
> +}
>  /*
>   * This function should only be used to remap device address ranges
>   * TODO: add a check to verify this assumption
> diff --git a/xen/include/asm-arm/config.h b/xen/include/asm-arm/config.h
> index ef291ff..47d1bce 100644
> --- a/xen/include/asm-arm/config.h
> +++ b/xen/include/asm-arm/config.h
> @@ -87,6 +87,7 @@
>   *   0  -   8M   <COMMON>
>   *
>   *  32M - 128M   Frametable: 24 bytes per page for 16GB of RAM
> + * 128M - 256M   Virtual-linear mapping to P2M table
>   * 256M -   1G   VMAP: ioremap and early_ioremap use this virtual address
>   *                    space
>   *
> @@ -124,7 +125,9 @@
>  #define CONFIG_SEPARATE_XENHEAP 1
>  
>  #define FRAMETABLE_VIRT_START  _AT(vaddr_t,0x02000000)
> +#define VIRT_LIN_P2M_START     _AT(vaddr_t,0x08000000)
>  #define VMAP_VIRT_START  _AT(vaddr_t,0x10000000)
> +#define VIRT_LIN_P2M_END       VMAP_VIRT_START
>  #define XENHEAP_VIRT_START     _AT(vaddr_t,0x40000000)
>  #define XENHEAP_VIRT_END       _AT(vaddr_t,0x7fffffff)
>  #define DOMHEAP_VIRT_START     _AT(vaddr_t,0x80000000)
> @@ -157,6 +160,10 @@
>  
>  #define HYPERVISOR_VIRT_END    DIRECTMAP_VIRT_END
>  
> +/* VIRT_LIN_P2M_START and VIRT_LIN_P2M_END for vlpt */
> +#define VIRT_LIN_P2M_START     _AT(vaddr_t, 0x08000000)
> +#define VIRT_LIN_P2M_END       VMAP_VIRT_START
> +
>  #endif
>  
>  /* Fixmap slots */
> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
> index 28c359a..5321bd6 100644
> --- a/xen/include/asm-arm/domain.h
> +++ b/xen/include/asm-arm/domain.h
> @@ -161,6 +161,13 @@ struct arch_domain
>          spinlock_t                  lock;
>      } vuart;
>  
> +    /* dirty-page tracing */
> +    struct {
> +        volatile int second_lvl_start;   /* for context switch */
> +        volatile int second_lvl_end;
> +        lpae_t *second_lvl[2];           /* copy of guest p2m's first */
> +    } dirty;
> +
>      unsigned int evtchn_irq;
>  }  __cacheline_aligned;
>  
> diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
> index 8347524..5fd684f 100644
> --- a/xen/include/asm-arm/mm.h
> +++ b/xen/include/asm-arm/mm.h
> @@ -4,6 +4,7 @@
>  #include <xen/config.h>
>  #include <xen/kernel.h>
>  #include <asm/page.h>
> +#include <asm/config.h>
>  #include <public/xen.h>
>  
>  /* Align Xen to a 2 MiB boundary. */
> @@ -342,6 +343,22 @@ static inline void put_page_and_type(struct page_info *page)
>      put_page(page);
>  }
>  
> +int prepare_vlpt(struct domain *d);
> +void cleanup_vlpt(struct domain *d);
> +void restore_vlpt(struct domain *d);
> +
> +/* calculate the xen's virtual address for accessing the leaf PTE of
> + * a given address (GPA) */
> +static inline lpae_t * get_vlpt_3lvl_pte(paddr_t addr)
> +{
> +    lpae_t *table = (lpae_t *)VIRT_LIN_P2M_START;
> +
> +    /* Since we slotted the guest's first p2m page table to xen's
> +     * second page table, one shift is enough for calculating the
> +     * index of guest p2m table entry */
> +    return &table[addr >> PAGE_SHIFT];
> +}
> +
>  #endif /*  __ARCH_ARM_MM__ */
>  /*
>   * Local variables:

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC v2 6/6] xen/arm: Implement toolstack for xl restore/save and migrate
  2014-04-15 21:05 ` [RFC v2 6/6] xen/arm: Implement toolstack for xl restore/save and migrate Wei Huang
@ 2014-04-15 23:40   ` Andrew Cooper
  0 siblings, 0 replies; 47+ messages in thread
From: Andrew Cooper @ 2014-04-15 23:40 UTC (permalink / raw)
  To: Wei Huang, xen-devel
  Cc: stefano.stabellini, julien.grall, ian.campbell, jaeyong.yoo, yjhyun.yoo

On 15/04/2014 22:05, Wei Huang wrote:
> From: Jaeyong Yoo <jaeyong.yoo@samsung.com>
>
> This patch implements the xl save/restore operation in xc_arm_migrate.c and
> makes it compilable with the existing design. The operation is also used by
> migration.
>
> The overall process of save is the following:
> 1) save guest parameters (i.e., memory map, console and store pfn, etc)
> 2) save memory (if it is live migration, perform dirty-page tracing)
> 3) save hvm states (i.e., gic, timer, vcpu etc)
>
> Signed-off-by: Alexey Sokolov <sokolov.a@samsung.com>
> Signed-off-by: Wei Huang <w1.huang@samsung.com>
> ---
>  config/arm32.mk              |    1 +
>  config/arm64.mk              |    1 +
>  tools/libxc/Makefile         |    6 +-
>  tools/libxc/xc_arm_migrate.c |  702 ++++++++++++++++++++++++++++++++++++++++++
>  tools/libxc/xc_dom_arm.c     |    4 +-
>  tools/libxl/libxl.h          |    3 -
>  tools/misc/Makefile          |    4 +-
>  7 files changed, 714 insertions(+), 7 deletions(-)
>  create mode 100644 tools/libxc/xc_arm_migrate.c
>
> diff --git a/config/arm32.mk b/config/arm32.mk
> index aa79d22..01374c9 100644
> --- a/config/arm32.mk
> +++ b/config/arm32.mk
> @@ -1,6 +1,7 @@
>  CONFIG_ARM := y
>  CONFIG_ARM_32 := y
>  CONFIG_ARM_$(XEN_OS) := y
> +CONFIG_MIGRATE := y
>  
>  CONFIG_XEN_INSTALL_SUFFIX :=
>  
> diff --git a/config/arm64.mk b/config/arm64.mk
> index 15b57a4..7ac3b65 100644
> --- a/config/arm64.mk
> +++ b/config/arm64.mk
> @@ -1,6 +1,7 @@
>  CONFIG_ARM := y
>  CONFIG_ARM_64 := y
>  CONFIG_ARM_$(XEN_OS) := y
> +CONFIG_MIGRATE := y
>  
>  CONFIG_XEN_INSTALL_SUFFIX :=
>  
> diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
> index 2cca2b2..6b90b1c 100644
> --- a/tools/libxc/Makefile
> +++ b/tools/libxc/Makefile
> @@ -43,8 +43,13 @@ CTRL_SRCS-$(CONFIG_MiniOS) += xc_minios.c
>  GUEST_SRCS-y :=
>  GUEST_SRCS-y += xg_private.c xc_suspend.c
>  ifeq ($(CONFIG_MIGRATE),y)
> +ifeq ($(CONFIG_X86),y)
>  GUEST_SRCS-y += xc_domain_restore.c xc_domain_save.c
>  GUEST_SRCS-y += xc_offline_page.c xc_compression.c
> +endif
> +ifeq ($(CONFIG_ARM),y)
> +GUEST_SRCS-y += xc_arm_migrate.c
> +endif
>  else
>  GUEST_SRCS-y += xc_nomigrate.c
>  endif
> @@ -64,7 +69,6 @@ $(patsubst %.c,%.opic,$(ELF_SRCS-y)): CFLAGS += -Wno-pointer-sign
>  GUEST_SRCS-y                 += xc_dom_core.c xc_dom_boot.c
>  GUEST_SRCS-y                 += xc_dom_elfloader.c
>  GUEST_SRCS-$(CONFIG_X86)     += xc_dom_bzimageloader.c
> -GUEST_SRCS-$(CONFIG_X86)     += xc_dom_decompress_lz4.c
>  GUEST_SRCS-$(CONFIG_ARM)     += xc_dom_armzimageloader.c
>  GUEST_SRCS-y                 += xc_dom_binloader.c
>  GUEST_SRCS-y                 += xc_dom_compat_linux.c
> diff --git a/tools/libxc/xc_arm_migrate.c b/tools/libxc/xc_arm_migrate.c
> new file mode 100644
> index 0000000..ab2b94c
> --- /dev/null
> +++ b/tools/libxc/xc_arm_migrate.c
> @@ -0,0 +1,702 @@
> +/*
> + * Copyright (c) 2013, Samsung Electronics
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
> + * Place - Suite 330, Boston, MA 02111-1307 USA.
> + */
> +
> +#include <inttypes.h>
> +#include <errno.h>
> +#include <xenctrl.h>
> +#include <xenguest.h>
> +
> +#include <unistd.h>
> +#include <xc_private.h>
> +#include <xc_dom.h>

These should be "" rather than <>
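
i.e.:

    #include "xc_private.h"
    #include "xc_dom.h"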

> +#include "xc_bitops.h"
> +#include "xg_private.h"
> +
> +#define DEF_MAX_ITERS          29 /* limit us to 30 times round loop   */
> +#define DEF_MAX_FACTOR         3  /* never send more than 3x p2m_size  */
> +#define DEF_MIN_DIRTY_PER_ITER 50 /* dirty page count to define last iter */
> +#define DEF_PROGRESS_RATE      50 /* progress bar update rate */
> +
> +//#define DISABLE_LIVE_MIGRATION
> +
> +//#define ARM_MIGRATE_VERBOSE

Debugging? These certainly shouldn't stay in the code.

> +
> +/*
> + * Guest params to save: used HVM params, save flags, memory map
> + */
> +typedef struct guest_params
> +{
> +    unsigned long console_pfn;
> +    unsigned long store_pfn;
> +    uint32_t flags;
> +    xen_pfn_t start_gpfn;
> +    xen_pfn_t max_gpfn;
> +    uint32_t max_vcpu_id;
> +} guest_params_t;
> +
> +static int suspend_and_state(int (*suspend)(void*), void *data,
> +                             xc_interface *xch, int dom)
> +{
> +    xc_dominfo_t info;
> +    if ( !(*suspend)(data) )
> +    {
> +        ERROR("Suspend request failed");
> +        return -1;
> +    }
> +
> +    if ( (xc_domain_getinfo(xch, dom, 1, &info) != 1) ||

(mis)use of xc_domain_getinfo().

> +         !info.shutdown || (info.shutdown_reason != SHUTDOWN_suspend) )
> +    {
> +        ERROR("Domain is not in suspended state after suspend attempt");
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +static int write_exact_handled(xc_interface *xch, int fd, const void *data,
> +                               size_t size)
> +{
> +    if ( write_exact(fd, data, size) )
> +    {
> +        ERROR("Write failed, check space");
> +        return -1;
> +    }
> +    return 0;
> +}
> +
> +/* ============ Memory ============= */
> +static int save_memory(xc_interface *xch, int io_fd, uint32_t dom,
> +                       struct save_callbacks *callbacks,
> +                       uint32_t max_iters, uint32_t max_factor,
> +                       guest_params_t *params)
> +{
> +    int live =  !!(params->flags & XCFLAGS_LIVE);
> +    int debug =  !!(params->flags & XCFLAGS_DEBUG);
> +    xen_pfn_t i;
> +    char reportbuf[80];
> +    int iter = 0;
> +    int last_iter = !live;
> +    int total_dirty_pages_num = 0;
> +    int dirty_pages_on_prev_iter_num = 0;
> +    int count = 0;
> +    char *page = 0;
> +    xen_pfn_t *busy_pages = 0;
> +    int busy_pages_count = 0;
> +    int busy_pages_max = 256;
> +
> +    DECLARE_HYPERCALL_BUFFER(unsigned long, to_send);
> +
> +    xen_pfn_t start = params->start_gpfn;
> +    const xen_pfn_t end = params->max_gpfn;
> +    const xen_pfn_t mem_size = end - start;
> +
> +    if ( debug )
> +    {
> +        IPRINTF("(save mem) start=%llx end=%llx!\n", (unsigned long long)start,
> +                (unsigned long long)end);
> +    }
> +
> +    if ( live )
> +    {
> +        if ( xc_shadow_control(xch, dom, XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY,
> +                    NULL, 0, NULL, 0, NULL) < 0 )
> +        {
> +            ERROR("Couldn't enable log-dirty mode !\n");
> +            return -1;
> +        }
> +
> +        max_iters  = max_iters  ? : DEF_MAX_ITERS;
> +        max_factor = max_factor ? : DEF_MAX_FACTOR;
> +
> +        if ( debug )
> +            IPRINTF("Log-dirty mode enabled, max_iters=%d, max_factor=%d!\n",
> +                    max_iters, max_factor);
> +    }
> +
> +    to_send = xc_hypercall_buffer_alloc_pages(xch, to_send,
> +                                              NRPAGES(bitmap_size(mem_size)));
> +    if ( !to_send )
> +    {
> +        ERROR("Couldn't allocate to_send array!\n");
> +        return -1;
> +    }
> +
> +    /* send all pages on first iter */
> +    memset(to_send, 0xff, bitmap_size(mem_size));
> +
> +    for ( ; ; )
> +    {
> +        int dirty_pages_on_current_iter_num = 0;
> +        int frc;
> +        iter++;
> +
> +        snprintf(reportbuf, sizeof(reportbuf),
> +                 "Saving memory: iter %d (last sent %u)",
> +                 iter, dirty_pages_on_prev_iter_num);
> +
> +        xc_report_progress_start(xch, reportbuf, mem_size);
> +
> +        if ( (iter > 1 &&
> +              dirty_pages_on_prev_iter_num < DEF_MIN_DIRTY_PER_ITER) ||
> +             (iter == max_iters) ||
> +             (total_dirty_pages_num >= mem_size*max_factor) )
> +        {
> +            if ( debug )
> +                IPRINTF("Last iteration");
> +            last_iter = 1;
> +        }
> +
> +        if ( last_iter )
> +        {
> +            if ( suspend_and_state(callbacks->suspend, callbacks->data,
> +                                   xch, dom) )
> +            {
> +                ERROR("Domain appears not to have suspended");
> +                return -1;
> +            }
> +        }
> +        if ( live && iter > 1 )
> +        {
> +            frc = xc_shadow_control(xch, dom, XEN_DOMCTL_SHADOW_OP_CLEAN,
> +                                    HYPERCALL_BUFFER(to_send), mem_size,
> +                                                     NULL, 0, NULL);
> +            if ( frc != mem_size )
> +            {
> +                ERROR("Error peeking shadow bitmap");
> +                xc_hypercall_buffer_free_pages(xch, to_send,
> +                                               NRPAGES(bitmap_size(mem_size)));
> +                return -1;
> +            }
> +        }
> +
> +        busy_pages = malloc(sizeof(xen_pfn_t) * busy_pages_max);
> +
> +        for ( i = start; i < end; ++i )
> +        {
> +            if ( test_bit(i - start, to_send) )
> +            {
> +                page = xc_map_foreign_range(xch, dom, PAGE_SIZE, PROT_READ, i);
> +                if ( !page )
> +                {
> +                    /* This page is mapped elsewhere, should be resent later */
> +                    busy_pages[busy_pages_count] = i;
> +                    busy_pages_count++;
> +                    if ( busy_pages_count >= busy_pages_max )
> +                    {
> +                        busy_pages_max += 256;
> +                        busy_pages = realloc(busy_pages, sizeof(xen_pfn_t) *
> +                                                         busy_pages_max);
> +                    }
> +                    continue;
> +                }
> +
> +                if ( write_exact_handled(xch, io_fd, &i, sizeof(i)) ||
> +                     write_exact_handled(xch, io_fd, page, PAGE_SIZE) )
> +                {
> +                    munmap(page, PAGE_SIZE);
> +                    free(busy_pages);
> +                    return -1;
> +                }
> +                count++;
> +                munmap(page, PAGE_SIZE);
> +
> +                if ( (i % DEF_PROGRESS_RATE) == 0 )
> +                    xc_report_progress_step(xch, i - start, mem_size);
> +                dirty_pages_on_current_iter_num++;
> +            }
> +        }
> +
> +        while ( busy_pages_count )
> +        {
> +            /* Send busy pages */
> +            busy_pages_count--;
> +            i = busy_pages[busy_pages_count];
> +            if ( test_bit(i - start, to_send) )
> +            {
> +                page = xc_map_foreign_range(xch, dom, PAGE_SIZE,PROT_READ, i);
> +                if ( !page )
> +                {
> +                    IPRINTF("WARNING: 2nd attempt to save page "
> +                            "busy failed pfn=%llx", (unsigned long long)i);
> +                    continue;
> +                }
> +
> +                if ( debug )
> +                {
> +                    IPRINTF("save mem: resend busy page %llx\n",
> +                            (unsigned long long)i);
> +                }
> +
> +                if ( write_exact_handled(xch, io_fd, &i, sizeof(i)) ||
> +                     write_exact_handled(xch, io_fd, page, PAGE_SIZE) )
> +                {
> +                    munmap(page, PAGE_SIZE);
> +                    free(busy_pages);
> +                    return -1;
> +                }
> +                count++;
> +                munmap(page, PAGE_SIZE);
> +                dirty_pages_on_current_iter_num++;
> +            }
> +        }
> +        free(busy_pages);
> +
> +        if ( debug )
> +            IPRINTF("Dirty pages=%d", dirty_pages_on_current_iter_num);
> +
> +        xc_report_progress_step(xch, mem_size, mem_size);
> +
> +        dirty_pages_on_prev_iter_num = dirty_pages_on_current_iter_num;
> +        total_dirty_pages_num += dirty_pages_on_current_iter_num;
> +
> +        if ( last_iter )
> +        {
> +            xc_hypercall_buffer_free_pages(xch, to_send,
> +                                           NRPAGES(bitmap_size(mem_size)));
> +            if ( live )
> +            {
> +                if ( xc_shadow_control(xch, dom, XEN_DOMCTL_SHADOW_OP_OFF,
> +                                       NULL, 0, NULL, 0, NULL) < 0 )
> +                    ERROR("Couldn't disable log-dirty mode");
> +            }
> +            break;
> +        }
> +    }
> +    if ( debug )
> +    {
> +        IPRINTF("save mem: pages count = %d\n", count);
> +    }
> +
> +    i = (xen_pfn_t) -1; /* end page marker */
> +    return write_exact_handled(xch, io_fd, &i, sizeof(i));
> +}
> +
> +static int restore_memory(xc_interface *xch, int io_fd, uint32_t dom,
> +                          guest_params_t *params)
> +{
> +    xen_pfn_t end = params->max_gpfn;
> +    xen_pfn_t gpfn;
> +    int debug =  !!(params->flags & XCFLAGS_DEBUG);
> +    int count = 0;
> +    char *page;
> +    xen_pfn_t start = params->start_gpfn;
> +
> +    /* TODO allocate several pages per call */
> +    for ( gpfn = start; gpfn < end; ++gpfn )
> +    {
> +        if ( xc_domain_populate_physmap_exact(xch, dom, 1, 0, 0, &gpfn) )
> +        {
> +            PERROR("Memory allocation for a new domain failed");
> +            return -1;
> +        }
> +    }
> +
> +    while ( 1 )
> +    {
> +
> +        if ( read_exact(io_fd, &gpfn, sizeof(gpfn)) )
> +        {
> +            PERROR("GPFN read failed during memory transfer, count=%d", count);
> +            return -1;
> +        }
> +        if ( gpfn == (xen_pfn_t) -1 ) break; /* end page marker */
> +
> +        if ( gpfn < start || gpfn >= end )
> +        {
> +            ERROR("GPFN %llx doesn't belong to RAM address space, count=%d",
> +                  (unsigned long long)gpfn, count);
> +            return -1;
> +        }
> +        page = xc_map_foreign_range(xch, dom, PAGE_SIZE,
> +                                    PROT_READ | PROT_WRITE, gpfn);
> +        if ( !page )
> +        {
> +            PERROR("xc_map_foreign_range failed, pfn=%llx", gpfn);
> +            return -1;
> +        }
> +        if ( read_exact(io_fd, page, PAGE_SIZE) )
> +        {
> +            PERROR("Page data read failed during memory transfer, pfn=%llx",
> +                    gpfn);
> +            return -1;
> +        }
> +        munmap(page, PAGE_SIZE);
> +        count++;
> +    }
> +
> +    if ( debug )
> +    {
> +        IPRINTF("Memory restored, pages count=%d", count);
> +    }
> +    return 0;
> +}
> +
> +/* ============ HVM context =========== */
> +static int save_armhvm(xc_interface *xch, int io_fd, uint32_t dom, int debug)
> +{
> +    /* HVM: a buffer for holding HVM context */
> +    uint32_t hvm_buf_size = 0;
> +    uint8_t *hvm_buf = NULL;
> +    uint32_t rec_size;
> +    int retval = -1;
> +
> +    /* Need another buffer for HVM context */
> +    hvm_buf_size = xc_domain_hvm_getcontext(xch, dom, 0, 0);
> +    if ( hvm_buf_size == -1 )
> +    {
> +        ERROR("Couldn't get HVM context size from Xen");
> +        goto out;
> +    }
> +    hvm_buf = malloc(hvm_buf_size);
> +
> +    if ( !hvm_buf )
> +    {
> +        ERROR("Couldn't allocate memory for hvm buffer");
> +        goto out;
> +    }
> +
> +    /* Get HVM context from Xen and save it too */
> +    if ( (rec_size = xc_domain_hvm_getcontext(xch, dom, hvm_buf,
> +                    hvm_buf_size)) == -1 )
> +    {
> +        ERROR("HVM:Could not get hvm buffer");
> +        goto out;
> +    }
> +
> +    if ( debug )
> +        IPRINTF("HVM save size %d %d", hvm_buf_size, rec_size);
> +
> +    if ( write_exact_handled(xch, io_fd, &rec_size, sizeof(uint32_t)) )
> +        goto out;
> +
> +    if ( write_exact_handled(xch, io_fd, hvm_buf, rec_size) )
> +    {
> +        goto out;
> +    }
> +
> +    retval = 0;
> +
> +out:
> +    if ( hvm_buf )
> +        free (hvm_buf);

Pointless if: free(NULL) is a no-op.

> +
> +    return retval;
> +}
> +
> +static int restore_armhvm(xc_interface *xch, int io_fd,
> +                          uint32_t dom, int debug)
> +{
> +    uint32_t rec_size;
> +    uint32_t hvm_buf_size = 0;
> +    uint8_t *hvm_buf = NULL;
> +    int frc = 0;
> +    int retval = -1;
> +
> +    if ( read_exact(io_fd, &rec_size, sizeof(uint32_t)) )
> +    {
> +        PERROR("Could not read HVM size");
> +        goto out;
> +    }
> +
> +    if ( !rec_size )
> +    {
> +        ERROR("Zero HVM size");
> +        goto out;
> +    }
> +
> +    hvm_buf_size = xc_domain_hvm_getcontext(xch, dom, 0, 0);
> +    if ( hvm_buf_size != rec_size )
> +    {
> +        ERROR("HVM size for this domain is not the same as stored");
> +    }

This is Xen's problem to deal with, not the toolstack's.  Forward
compatibility in Xen means that in the future it might have to deal with
restore records shorter than save records.
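
I.e. size the buffer from the record and let Xen accept or reject it,
something like (sketch only):

    hvm_buf = malloc(rec_size);
    if ( !hvm_buf )
    {
        ERROR("Couldn't allocate memory");
        goto out;
    }

    if ( read_exact(io_fd, hvm_buf, rec_size) )
    {
        PERROR("Could not read HVM context");
        goto out;
    }

    /* Xen itself validates the individual records. */
    frc = xc_domain_hvm_setcontext(xch, dom, hvm_buf, rec_size);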

> +
> +    hvm_buf = malloc(hvm_buf_size);
> +    if ( !hvm_buf )
> +    {
> +        ERROR("Couldn't allocate memory");
> +        goto out;
> +    }
> +
> +    if ( read_exact(io_fd, hvm_buf, hvm_buf_size) )
> +    {
> +        PERROR("Could not read HVM context");
> +        goto out;
> +    }
> +
> +    frc = xc_domain_hvm_setcontext(xch, dom, hvm_buf, hvm_buf_size);
> +    if ( frc )
> +    {
> +        ERROR("error setting the HVM context");
> +        goto out;
> +    }
> +    retval = 0;
> +
> +    if ( debug )
> +    {
> +            IPRINTF("HVM restore size %d %d", hvm_buf_size, rec_size);

Use PRIu32 here, and fix the indentation and braces.

> +    }
> +out:
> +    if ( hvm_buf )
> +        free (hvm_buf);
> +    return retval;
> +}
> +
> +/* ================= Console & Xenstore & Memory map =========== */
> +static int save_guest_params(xc_interface *xch, int io_fd,
> +                             uint32_t dom, uint32_t flags,
> +                             guest_params_t *params)
> +{
> +    size_t sz = sizeof(guest_params_t);
> +    xc_dominfo_t dom_info;
> +
> +    params->max_gpfn = xc_domain_maximum_gpfn(xch, dom);
> +    params->start_gpfn = (GUEST_RAM_BASE >> PAGE_SHIFT);
> +
> +    if ( flags & XCFLAGS_DEBUG )
> +    {
> +        IPRINTF("Guest param save size: %d ", (int)sz);
> +    }
> +
> +    if ( xc_get_hvm_param(xch, dom, HVM_PARAM_CONSOLE_PFN,
> +            &params->console_pfn) )
> +    {
> +        ERROR("Can't get console gpfn");
> +        return -1;
> +    }
> +
> +    if ( xc_get_hvm_param(xch, dom, HVM_PARAM_STORE_PFN, &params->store_pfn) )
> +    {
> +        ERROR("Can't get store gpfn");
> +        return -1;
> +    }
> +
> +    if ( xc_domain_getinfo(xch, dom, 1, &dom_info ) < 0)
> +    {
> +        ERROR("Can't get domain info for dom %d", dom);
> +        return -1;
> +    }
> +    params->max_vcpu_id = dom_info.max_vcpu_id;
> +
> +    params->flags = flags;
> +
> +    if ( write_exact_handled(xch, io_fd, params, sz) )
> +    {
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +static int restore_guest_params(xc_interface *xch, int io_fd,
> +                                uint32_t dom, guest_params_t *params)
> +{
> +    size_t sz = sizeof(guest_params_t);
> +    xen_pfn_t nr_pfns;
> +    unsigned int maxmemkb;
> +
> +    if ( read_exact(io_fd, params, sizeof(guest_params_t)) )
> +    {
> +        PERROR("Can't read guest params");
> +        return -1;
> +    }
> +
> +    nr_pfns = params->max_gpfn - params->start_gpfn;
> +    maxmemkb = (unsigned int) nr_pfns << (PAGE_SHIFT - 10);
> +
> +    if ( params->flags & XCFLAGS_DEBUG )
> +    {
> +        IPRINTF("Guest param restore size: %d ", (int)sz);
> +        IPRINTF("Guest memory size: %d MB", maxmemkb >> 10);
> +    }
> +
> +    if ( xc_domain_setmaxmem(xch, dom, maxmemkb) )
> +    {
> +        ERROR("Can't set memory map");
> +        return -1;
> +    }
> +
> +    /* Set max. number of vcpus as max_vcpu_id + 1 */
> +    if ( xc_domain_max_vcpus(xch, dom, params->max_vcpu_id + 1) )
> +    {
> +        ERROR("Can't set max vcpu number for domain");
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +static int set_guest_params(xc_interface *xch, int io_fd, uint32_t dom,
> +                            guest_params_t *params, unsigned int console_evtchn,
> +                            domid_t console_domid, unsigned int store_evtchn,
> +                            domid_t store_domid)
> +{
> +    int rc = 0;
> +
> +    if ( (rc = xc_clear_domain_page(xch, dom, params->console_pfn)) )
> +    {
> +        ERROR("Can't clear console page");
> +        return rc;
> +    }
> +
> +    if ( (rc = xc_clear_domain_page(xch, dom, params->store_pfn)) )
> +    {
> +        ERROR("Can't clear xenstore page");
> +        return rc;
> +    }
> +
> +    if ( (rc = xc_dom_gnttab_hvm_seed(xch, dom, params->console_pfn,
> +                                      params->store_pfn, console_domid,
> +                                      store_domid)) )
> +    {
> +        ERROR("Can't grant console and xenstore pages");
> +        return rc;
> +    }
> +
> +    if ( (rc = xc_set_hvm_param(xch, dom, HVM_PARAM_CONSOLE_PFN,
> +                                params->console_pfn)) )
> +    {
> +        ERROR("Can't set console gpfn");
> +        return rc;
> +    }
> +
> +    if ( (rc = xc_set_hvm_param(xch, dom, HVM_PARAM_STORE_PFN,
> +                                params->store_pfn)) )
> +    {
> +        ERROR("Can't set xenstore gpfn");
> +        return rc;
> +    }
> +
> +    if ( (rc = xc_set_hvm_param(xch, dom, HVM_PARAM_CONSOLE_EVTCHN,
> +                                console_evtchn)) )
> +    {
> +        ERROR("Can't set console event channel");
> +        return rc;
> +    }
> +
> +    if ( (rc = xc_set_hvm_param(xch, dom, HVM_PARAM_STORE_EVTCHN,
> +                                store_evtchn)) )
> +    {
> +        ERROR("Can't set xenstore event channel");
> +        return rc;
> +    }
> +    return 0;
> +}
> +
> +/* ================== Main ============== */
> +int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
> +                   uint32_t max_iters, uint32_t max_factor, uint32_t flags,
> +                   struct save_callbacks *callbacks, int hvm,
> +                   unsigned long vm_generationid_addr)
> +{
> +    int debug;

This is a boolean, not an integer.
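
I.e. something like (assuming C99 bool is acceptable here):

    bool debug = !!(flags & XCFLAGS_DEBUG);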

> +    guest_params_t params;
> +
> +#ifdef ARM_MIGRATE_VERBOSE
> +    flags |= XCFLAGS_DEBUG;
> +#endif
> +
> +#ifdef DISABLE_LIVE_MIGRATION
> +    flags &= ~(XCFLAGS_LIVE);
> +#endif
> +
> +    debug = !!(flags & XCFLAGS_DEBUG);
> +    if ( save_guest_params(xch, io_fd, dom, flags, &params) )
> +    {
> +       ERROR("Can't save guest params");
> +       return -1;
> +    }
> +
> +    if ( save_memory(xch, io_fd, dom, callbacks, max_iters,
> +            max_factor, &params) )
> +    {
> +        ERROR("Memory not saved");
> +        return -1;
> +    }
> +
> +    if ( save_armhvm(xch, io_fd, dom, debug) )
> +    {
> +        ERROR("HVM not saved");
> +        return -1;
> +    }
> +
> +    if ( debug )
> +    {
> +        IPRINTF("Domain %d saved", dom);
> +    }
> +    return 0;
> +}
> +
> +int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
> +                      unsigned int store_evtchn, unsigned long *store_gpfn,
> +                      domid_t store_domid, unsigned int console_evtchn,
> +                      unsigned long *console_gpfn, domid_t console_domid,
> +                      unsigned int hvm, unsigned int pae, int superpages,
> +                      int no_incr_generationid, int checkpointed_stream,
> +                      unsigned long *vm_generationid_addr,
> +                      struct restore_callbacks *callbacks)
> +{
> +    guest_params_t params;
> +    int debug = 1;
> +
> +    if ( restore_guest_params(xch, io_fd, dom, &params) )
> +    {
> +        ERROR("Can't restore guest params");
> +        return -1;
> +    }
> +    debug = !!(params.flags & XCFLAGS_DEBUG);
> +
> +    if ( restore_memory(xch, io_fd, dom, &params) )
> +    {
> +        ERROR("Can't restore memory");
> +        return -1;
> +    }
> +    if ( set_guest_params(xch, io_fd, dom, &params,
> +                console_evtchn, console_domid,
> +                store_evtchn, store_domid) )
> +    {
> +        ERROR("Can't setup guest params");
> +        return -1;
> +    }
> +
> +    /* Setup console and store PFNs to caller */
> +    *console_gpfn = params.console_pfn;
> +    *store_gpfn = params.store_pfn;
> +
> +    if ( restore_armhvm(xch, io_fd, dom, debug) )
> +    {
> +        ERROR("HVM not restored");
> +        return -1;
> +    }
> +
> +    if ( debug )
> +    {
> +         IPRINTF("Domain %d restored", dom);
> +    }
> +
> +    return 0;
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-set-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c
> index f051515..044a8de 100644
> --- a/tools/libxc/xc_dom_arm.c
> +++ b/tools/libxc/xc_dom_arm.c
> @@ -335,7 +335,9 @@ int arch_setup_meminit(struct xc_dom_image *dom)
>          modbase += dtb_size;
>      }
>  
> -    return 0;
> +    return xc_domain_setmaxmem(dom->xch, dom->guest_domid,
> +                               (dom->total_pages + NR_MAGIC_PAGES)
> +                                << (PAGE_SHIFT - 10));
>  }
>  
>  int arch_setup_bootearly(struct xc_dom_image *dom)
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index b2c3015..e10f4fb 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -441,9 +441,6 @@
>   *  - libxl_domain_resume
>   *  - libxl_domain_remus_start
>   */
> -#if defined(__arm__) || defined(__aarch64__)
> -#define LIBXL_HAVE_NO_SUSPEND_RESUME 1
> -#endif
>  
>  /*
>   * LIBXL_HAVE_DEVICE_PCI_SEIZE
> diff --git a/tools/misc/Makefile b/tools/misc/Makefile
> index 17aeda5..0824100 100644
> --- a/tools/misc/Makefile
> +++ b/tools/misc/Makefile
> @@ -11,7 +11,7 @@ HDRS     = $(wildcard *.h)
>  
>  TARGETS-y := xenperf xenpm xen-tmem-list-parse gtraceview gtracestat xenlockprof xenwatchdogd xencov
>  TARGETS-$(CONFIG_X86) += xen-detect xen-hvmctx xen-hvmcrash xen-lowmemd xen-mfndump
> -TARGETS-$(CONFIG_MIGRATE) += xen-hptool
> +TARGETS-$(CONFIG_X86) += xen-hptool
>  TARGETS := $(TARGETS-y)
>  
>  SUBDIRS := $(SUBDIRS-y)
> @@ -23,7 +23,7 @@ INSTALL_BIN := $(INSTALL_BIN-y)
>  INSTALL_SBIN-y := xen-bugtool xen-python-path xenperf xenpm xen-tmem-list-parse gtraceview \
>  	gtracestat xenlockprof xenwatchdogd xen-ringwatch xencov
>  INSTALL_SBIN-$(CONFIG_X86) += xen-hvmctx xen-hvmcrash xen-lowmemd xen-mfndump
> -INSTALL_SBIN-$(CONFIG_MIGRATE) += xen-hptool
> +INSTALL_SBIN-$(CONFIG_X86) += xen-hptool
>  INSTALL_SBIN := $(INSTALL_SBIN-y)
>  
>  INSTALL_PRIVBIN-y := xenpvnetboot

This change is logically distinct and should be in a separate patch.

~Andrew


* Re: [RFC v2 3/6] xen/arm: support guest do_suspend function
  2014-04-15 21:05 ` [RFC v2 3/6] xen/arm: support guest do_suspend function Wei Huang
  2014-04-15 23:38   ` Andrew Cooper
  2014-04-15 23:39   ` Andrew Cooper
@ 2014-04-16  9:10   ` Julien Grall
  2 siblings, 0 replies; 47+ messages in thread
From: Julien Grall @ 2014-04-16  9:10 UTC (permalink / raw)
  To: Wei Huang, xen-devel
  Cc: andrew.cooper3, stefano.stabellini, ian.campbell, jaeyong.yoo,
	yjhyun.yoo

Hello Wei,

Thank you for the patch.

On 15/04/14 22:05, Wei Huang wrote:
> +#elif defined(__arm__) || defined(__aarch64__)
> +
> +static int modify_returncode(xc_interface *xch, uint32_t domid)
> +{
> +    vcpu_guest_context_any_t ctxt;
> +    xc_dominfo_t info;
> +    int rc;
> +
> +    if ( xc_domain_getinfo(xch, domid, 1, &info) != 1 )
> +    {
> +        PERROR("Could not get domain info");
> +        return -EINVAL;
> +    }
> +
> +    if ( (rc = xc_vcpu_getcontext(xch, domid, 0, &ctxt)) != 0 )
> +        return rc;
> +
> +    ctxt.c.user_regs.r0_usr = 1;

r0_usr is only for 32-bit. I think you need to use x0 if you are
restoring a 64-bit guest.
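
Something like this, perhaps (guest_is_64bit() is made up; use whatever
the toolstack already has to distinguish the guest type):

    if ( guest_is_64bit(xch, domid) )
        ctxt.c.user_regs.x0 = 1;      /* AArch64 guest */
    else
        ctxt.c.user_regs.r0_usr = 1;  /* AArch32 guest */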

Regards,

-- 
Julien Grall


* Re: [RFC v2 1/6] xen/arm: Save and restore support with hvm context hypercalls
  2014-04-15 21:05 ` [RFC v2 1/6] xen/arm: Save and restore support with hvm context hypercalls Wei Huang
  2014-04-15 23:37   ` Andrew Cooper
@ 2014-04-16  9:48   ` Julien Grall
  2014-04-16 10:30     ` Jan Beulich
  2014-04-17 15:06   ` Julien Grall
  2 siblings, 1 reply; 47+ messages in thread
From: Julien Grall @ 2014-04-16  9:48 UTC (permalink / raw)
  To: Wei Huang, xen-devel
  Cc: Keir Fraser, ian.campbell, stefano.stabellini, andrew.cooper3,
	jaeyong.yoo, Jan Beulich, yjhyun.yoo

Hello Wei,


Thank you for the patch.

You are modifying common code in this patch. I've added Jan and Keir.

Also, I would create a separate patch for this code movement.

On 15/04/14 22:05, Wei Huang wrote:
> +
> +HVM_REGISTER_SAVE_RESTORE(GIC, hvm_gic_save_ctxt, hvm_gic_load_ctxt, 1,
> +                          HVMSR_PER_VCPU);

With the new support for different GICs, I would differentiate VGIC and
GIC save/restore.

Also, can you append V2 to the name? GICv3 support will be added soon.

[..]

> +
>   /*
>    * Local variables:
>    * mode: C
> diff --git a/xen/arch/arm/vtimer.c b/xen/arch/arm/vtimer.c
> index 3d6a721..7c47eac 100644
> --- a/xen/arch/arm/vtimer.c
> +++ b/xen/arch/arm/vtimer.c
> @@ -21,6 +21,7 @@
>   #include <xen/lib.h>
>   #include <xen/timer.h>
>   #include <xen/sched.h>
> +#include <xen/hvm/save.h>
>   #include <asm/irq.h>
>   #include <asm/time.h>
>   #include <asm/gic.h>
> @@ -284,6 +285,76 @@ int vtimer_emulate(struct cpu_user_regs *regs, union hsr hsr)
>       }
>   }
>
> +static int hvm_vtimer_save_ctxt(struct domain *d, hvm_domain_context_t *h)
> +{
> +    struct hvm_hw_timer ctxt;
> +    struct vcpu *v;
> +    struct vtimer *t;
> +    int i, ret = 0;
> +
> +    /* Save the state of vtimer and ptimer */
> +    for_each_vcpu( d, v )
> +    {
> +        t = &v->arch.virt_timer;
> +        for ( i = 0; i < 2; i++ )
> +        {

Looping here is very confusing: what do 0 and 1 mean?

I would create a helper with the content of this loop and call it twice
with the correct value as a parameter.

Something like:
     hvm_save_vtimer(TIMER_TYPE_PHYS, &v->arch.phys_timer);
     hvm_save_vtimer(TIMER_TYPE_VIRT, &v->arch.virt_timer);
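
A rough sketch of the helper itself (untested; it also needs the
context handle):

    static int hvm_save_vtimer(int type, struct vtimer *t,
                               hvm_domain_context_t *h)
    {
        struct domain *d = t->v->domain;
        struct hvm_hw_timer ctxt = {
            .cval = t->cval,
            .ctl = t->ctl,
            .vtb_offset = (type == TIMER_TYPE_PHYS)
                          ? d->arch.phys_timer_base.offset
                          : d->arch.virt_timer_base.offset,
            .type = type,
        };

        return hvm_save_entry(TIMER, t->v->vcpu_id, h, &ctxt);
    }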

> +            ctxt.cval = t->cval;
> +            ctxt.ctl = t->ctl;
> +            ctxt.vtb_offset = i ? d->arch.phys_timer_base.offset :
> +                d->arch.virt_timer_base.offset;
> +            ctxt.type = i ? TIMER_TYPE_PHYS : TIMER_TYPE_VIRT;
> +
> +            if ( (ret = hvm_save_entry(TIMER, v->vcpu_id, h, &ctxt)) != 0 )
> +                return ret;
> +
> +            t = &v->arch.phys_timer;

It will avoid this hackish line.

> +        }
> +    }
> +
> +    return ret;
> +}
> +
> +static int hvm_vtimer_load_ctxt(struct domain *d, hvm_domain_context_t *h)
> +{
> +    int vcpuid;
> +    struct hvm_hw_timer ctxt;
> +    struct vcpu *v;
> +    struct vtimer *t = NULL;
> +
> +    /* Which vcpu is this? */
> +    vcpuid = hvm_load_instance(h);
> +
> +    if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
> +    {
> +        dprintk(XENLOG_G_ERR, "HVM restore: dom%u has no vcpu%u\n",
> +                d->domain_id, vcpuid);
> +        return -EINVAL;
> +    }
> +
> +    if ( hvm_load_entry(TIMER, h, &ctxt) != 0 )
> +        return -EINVAL;
> +
> +    if ( ctxt.type == TIMER_TYPE_VIRT )
> +    {
> +        t = &v->arch.virt_timer;
> +        d->arch.virt_timer_base.offset = ctxt.vtb_offset;
> +    }
> +    else

else if ( ctxt.type == TIMER_TYPE_PHYS )? Then fail if ctxt.type is
neither.

Even better, I would use a switch statement.
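
I.e. something like:

    switch ( ctxt.type )
    {
    case TIMER_TYPE_VIRT:
        t = &v->arch.virt_timer;
        d->arch.virt_timer_base.offset = ctxt.vtb_offset;
        break;
    case TIMER_TYPE_PHYS:
        t = &v->arch.phys_timer;
        d->arch.phys_timer_base.offset = ctxt.vtb_offset;
        break;
    default:
        return -EINVAL;
    }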

> +    {
> +        t = &v->arch.phys_timer;
> +        d->arch.phys_timer_base.offset = ctxt.vtb_offset;

Saving {virt,phys}_timer_base.offset, which are per-domain, in each
per-VCPU record seems a waste of space and is confusing.

> +    }
> +
> +    t->cval = ctxt.cval;
> +    t->ctl = ctxt.ctl;
> +    t->v = v;
> +
> +    return 0;
> +}
> +

[..]

> diff --git a/xen/include/asm-arm/hvm/support.h b/xen/include/asm-arm/hvm/support.h
> new file mode 100644
> index 0000000..09f7cb8
> --- /dev/null
> +++ b/xen/include/asm-arm/hvm/support.h
> @@ -0,0 +1,29 @@
> +/*
> + * asm-arm/hvm/support.h: HVM support routines used by ARM.
> + *
> + * Copyright (c) 2014, Samsung Electronics.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
> + * Place - Suite 330, Boston, MA 02111-1307 USA.
> + */
> +
> +#ifndef __ASM_ARM_HVM_SUPPORT_H__
> +#define __ASM_ARM_HVM_SUPPORT_H__
> +
> +#include <xen/types.h>
> +#include <public/hvm/ioreq.h>
> +#include <xen/sched.h>
> +#include <xen/hvm/save.h>
> +#include <asm/processor.h>
> +
> +#endif /* __ASM_ARM_HVM_SUPPORT_H__ */
> diff --git a/xen/include/public/arch-arm/hvm/save.h b/xen/include/public/arch-arm/hvm/save.h
> index 75b8e65..f6ad258 100644
> --- a/xen/include/public/arch-arm/hvm/save.h
> +++ b/xen/include/public/arch-arm/hvm/save.h
> @@ -26,6 +26,136 @@
>   #ifndef __XEN_PUBLIC_HVM_SAVE_ARM_H__
>   #define __XEN_PUBLIC_HVM_SAVE_ARM_H__
>
> +#define HVM_FILE_MAGIC   0x92385520
> +#define HVM_FILE_VERSION 0x00000001
> +
> +struct hvm_save_header
> +{
> +    uint32_t magic;             /* Must be HVM_FILE_MAGIC */
> +    uint32_t version;           /* File format version */
> +    uint64_t changeset;         /* Version of Xen that saved this file */
> +    uint32_t cpuid;             /* MIDR_EL1 on the saving machine */
> +};
> +DECLARE_HVM_SAVE_TYPE(HEADER, 1, struct hvm_save_header);
> +
> +struct vgic_rank

I would rename it to vgic_v2_rank.

> +{
> +    uint32_t ienable, iactive, ipend, pendsgi;
> +    uint32_t icfg[2];
> +    uint32_t ipriority[8];
> +    uint32_t itargets[8];
> +};
> +
> +struct hvm_hw_gic

I would rename it to hvm_hw_vgic.

> +{
> +    uint32_t gic_hcr;
> +    uint32_t gic_vmcr;
> +    uint32_t gic_apr;
> +    uint32_t gic_lr[64];
> +    uint64_t event_mask;
> +    uint64_t lr_mask;
> +    struct vgic_rank ppi_state;

As said previously, I would separate GIC from VGIC save/restore.

-- 
Julien Grall


* Re: [RFC v2 1/6] xen/arm: Save and restore support with hvm context hypercalls
  2014-04-16  9:48   ` Julien Grall
@ 2014-04-16 10:30     ` Jan Beulich
  2014-04-16 15:54       ` Wei Huang
  0 siblings, 1 reply; 47+ messages in thread
From: Jan Beulich @ 2014-04-16 10:30 UTC (permalink / raw)
  To: Julien Grall, xen-devel, Wei Huang
  Cc: Keir Fraser, ian.campbell, stefano.stabellini, andrew.cooper3,
	jaeyong.yoo, yjhyun.yoo

>>> On 16.04.14 at 11:48, <julien.grall@linaro.org> wrote:
> You are modifying common code in this patch. I've added Jan and Keir.

That's for x86 side changes. IanC, IanJ, and Tim should also be copied
for common code changes now (unless it is clear that IanC, who was
already copied, would deal with it).

> Also, I would create a separate patch for this code movement.

Indeed. Plus it needs an explanation of why
XEN_DOMCTL_gethvmcontext_partial doesn't also get moved.

Jan


* Re: [RFC v2 2/6] xen/arm: implement support for XENMEM_maximum_gpfn hypercall
  2014-04-15 22:46   ` Julien Grall
@ 2014-04-16 15:33     ` Wei Huang
  0 siblings, 0 replies; 47+ messages in thread
From: Wei Huang @ 2014-04-16 15:33 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: andrew.cooper3, stefano.stabellini, ian.campbell, jaeyong.yoo,
	yjhyun.yoo

On 04/15/2014 05:46 PM, Julien Grall wrote:
> Hello Wei,
>
> Thank you for the patch.
>
> On 15/04/14 22:05, Wei Huang wrote:
>> From: Jaeyong Yoo <jaeyong.yoo@samsung.com>
>>
>> This patch implements domain_get_maximum_gpfn by using max_mapped_gfn
>> field of P2M struct. A support function to retrieve guest VM pfn range
>> is also added.
>>
>> Signed-off-by: Evgeny Fedotov <e.fedotov@samsung.com>
>> Signed-off-by: Wei Huang <w1.huang@samsung.com>
>> ---
>>   xen/arch/arm/mm.c        |   21 ++++++++++++++++++++-
>>   xen/include/asm-arm/mm.h |    1 +
>>   2 files changed, 21 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
>> index 362bc8d..473ad04 100644
>> --- a/xen/arch/arm/mm.c
>> +++ b/xen/arch/arm/mm.c
>> @@ -947,7 +947,11 @@ int page_is_ram_type(unsigned long mfn, unsigned
>> long mem_type)
>>
>>   unsigned long domain_get_maximum_gpfn(struct domain *d)
>>   {
>> -    return -ENOSYS;
>> +    paddr_t end;
>> +
>> +    domain_get_gpfn_range(d, NULL, &end);
>> +
>> +    return (unsigned long)end;
>>   }
>>
>>   void share_xen_page_with_guest(struct page_info *page,
>> @@ -1235,6 +1239,21 @@ int is_iomem_page(unsigned long mfn)
>>           return 1;
>>       return 0;
>>   }
>> +
>> +/*
>> + * Return start and end addresses of guest VM
>> + */
>> +void domain_get_gpfn_range(struct domain *d, paddr_t *start, paddr_t
>> *end)
>
> The content of the function doesn't match the name.
>
> This function should return a PFN, not an address.
> Actually, libxc (i.e. the return of domain_get_maximum_gpfn) expects a PFN.
>
>> +{
>> +    struct p2m_domain *p2m = &d->arch.p2m;
>> +
>> +    if ( start )
>> +        *start = GUEST_RAM_BASE;
>
> You can use  p2m->lowest_mapped_gfn here.
>
>> +    if ( end )
>> +        *end = GUEST_RAM_BASE + ((paddr_t) p2m->max_mapped_gfn);
>
> This is wrong: max_mapped_gfn contains a guest frame number, not a
> number of frames.
>
> The code should be something like:
>
> *end = pfn_to_paddr(p2m->max_mapped_gfn);
I will correct them. This patch set touches too many parts; I will break
it into smaller patches to address the comments.
>
> Regards,
>


* Re: [RFC v2 1/6] xen/arm: Save and restore support with hvm context hypercalls
  2014-04-16 10:30     ` Jan Beulich
@ 2014-04-16 15:54       ` Wei Huang
  0 siblings, 0 replies; 47+ messages in thread
From: Wei Huang @ 2014-04-16 15:54 UTC (permalink / raw)
  To: Jan Beulich, Julien Grall, xen-devel
  Cc: Keir Fraser, ian.campbell, stefano.stabellini, andrew.cooper3,
	jaeyong.yoo, yjhyun.yoo

On 04/16/2014 05:30 AM, Jan Beulich wrote:
>>>> On 16.04.14 at 11:48, <julien.grall@linaro.org> wrote:
>> You are modifying common code in this patch. I've added Jan and Keir.
>
> That's for x86 side changes. IanC, IanJ, and Tim should also be copied
> for common code changes now (unless it is clear that IanC, who was
> already copied, would deal with it).
Sorry, will do next time. I will break the patches into smaller,
independent components for specific reviewers to comment on.
>
>> Also, I would create a separate patch for this code movement.
>
> Indeed. Plus it needs explanation why XEN_DOMCTL_gethvmcontext_partial
> doesn't also get moved.
Will fix.
>
> Jan
>
>


* Re: [RFC v2 0/6] xen/arm: Support guest VM save/restore/migration
  2014-04-15 21:05 [RFC v2 0/6] xen/arm: Support guest VM save/restore/migration Wei Huang
                   ` (12 preceding siblings ...)
  2014-04-15 21:05 ` [RFC v2 6/6] xen/arm: Implement toolstack for xl restore/save and migrate Wei Huang
@ 2014-04-16 16:29 ` Julien Grall
  2014-04-16 16:41   ` Wei Huang
  2014-04-23 11:49 ` Ian Campbell
  14 siblings, 1 reply; 47+ messages in thread
From: Julien Grall @ 2014-04-16 16:29 UTC (permalink / raw)
  To: Wei Huang
  Cc: ian.campbell, stefano.stabellini, andrew.cooper3, jaeyong.yoo,
	xen-devel, yjhyun.yoo

Hi Wei,

On 04/15/2014 10:05 PM, Wei Huang wrote:
> This series is RFC v2 for save/restore/migration. The following 
> areas have been addressed:
>   * save and restore of guest states is split into specific areas (and files)
>   * get XENMEM_maximum_gpfn is now supported via P2M max_mapped_gfn.
>   * name and layout of some functions
>   * small areas commented by Julien Grall and Andrew Cooper


While I was reading the series I found some comments about the
restrictions of the solution (amount of RAM, SMP...). Can you write down
in one place what needs to be done and what is currently working?

Thanks,

-- 
Julien Grall


* Re: [RFC v2 0/6] xen/arm: Support guest VM save/restore/migration
  2014-04-16 16:29 ` [RFC v2 0/6] xen/arm: Support guest VM save/restore/migration Julien Grall
@ 2014-04-16 16:41   ` Wei Huang
  2014-04-16 16:50     ` Julien Grall
  0 siblings, 1 reply; 47+ messages in thread
From: Wei Huang @ 2014-04-16 16:41 UTC (permalink / raw)
  To: Julien Grall
  Cc: ian.campbell, stefano.stabellini, andrew.cooper3, jaeyong.yoo,
	xen-devel, yjhyun.yoo

On 04/16/2014 11:29 AM, Julien Grall wrote:
> Hi Wei,
>
> On 04/15/2014 10:05 PM, Wei Huang wrote:
>> This series is RFC v2 for save/restore/migration. The following
>> areas have been addressed:
>>    * save and restore of guest states is split into specific areas (and files)
>>    * get XENMEM_maximum_gpfn is now supported via P2M max_mapped_gfn.
>>    * name and layout of some functions
>>    * small areas commented by Julien Grall and Andrew Cooper
>
>
> While I was reading the series I found some comments about the
> restrictions of the solution (amount of RAM, SMP...). Can you write down
> in one place what needs to be done and what is currently working?
>
OK. The best candidate for a common place would be the cover letter. Or
do you have other preferences?
> Thanks,
>


* Re: [RFC v2 0/6] xen/arm: Support guest VM save/restore/migration
  2014-04-16 16:41   ` Wei Huang
@ 2014-04-16 16:50     ` Julien Grall
  0 siblings, 0 replies; 47+ messages in thread
From: Julien Grall @ 2014-04-16 16:50 UTC (permalink / raw)
  To: Wei Huang
  Cc: ian.campbell, stefano.stabellini, andrew.cooper3, jaeyong.yoo,
	xen-devel, yjhyun.yoo

On 04/16/2014 05:41 PM, Wei Huang wrote:
> On 04/16/2014 11:29 AM, Julien Grall wrote:
>> Hi Wei,
>>
>> On 04/15/2014 10:05 PM, Wei Huang wrote:
>>> This series is RFC v2 for save/restore/migration. The following
>>> areas have been addressed:
>>>    * save and restore of guest states is split into specific areas
>>> (and files)
>>>    * get XENMEM_maximum_gpfn is now supported via P2M max_mapped_gfn.
>>>    * name and layout of some functions
>>>    * small areas commented by Julien Grall and Andrew Cooper
>>
>>
>> While I was reading the series I found some comments about the
>> restrictions of the solution (amount of RAM, SMP...). Can you write down
>> in one place what needs to be done and what is currently working?
>>
> OK. The best candidate for a common place would be the cover letter. Or
> do you have other preferences?

Yes, I usually use the cover letter to explain what is missing in the
current version, how to test it, and so on.

If we upstream with these restrictions, we will have to document them
either in the repo or on the wiki page. Also, we want to let the user
know (for instance via the Xen console) what the problem is. It will
make user issues easier to debug later.

-- 
Julien Grall


* Re: [RFC v2 1/6] xen/arm: Save and restore support with hvm context hypercalls
  2014-04-15 23:37   ` Andrew Cooper
@ 2014-04-16 21:50     ` Wei Huang
  2014-04-17 12:55       ` Andrew Cooper
  0 siblings, 1 reply; 47+ messages in thread
From: Wei Huang @ 2014-04-16 21:50 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: yjhyun.yoo, julien.grall, ian.campbell, jaeyong.yoo, stefano.stabellini

On 04/15/2014 06:37 PM, Andrew Cooper wrote:
>> --- a/xen/include/public/arch-arm/hvm/save.h
>> +++ b/xen/include/public/arch-arm/hvm/save.h
>> @@ -26,6 +26,136 @@
>>   #ifndef __XEN_PUBLIC_HVM_SAVE_ARM_H__
>>   #define __XEN_PUBLIC_HVM_SAVE_ARM_H__
>>
>> +#define HVM_FILE_MAGIC   0x92385520
>> +#define HVM_FILE_VERSION 0x00000001
>> +
>> +struct hvm_save_header
>> +{
>> +    uint32_t magic;             /* Must be HVM_FILE_MAGIC */
>> +    uint32_t version;           /* File format version */
>> +    uint64_t changeset;         /* Version of Xen that saved this file */
>> +    uint32_t cpuid;             /* MIDR_EL1 on the saving machine */
>
> This looks needlessly copied from x86, which is far from ideal.
>
> On x86, Xen tries to parse the mercurial revision number from its compile
> time information and fails now that the underlying codebase has moved
> from hg to git.  As a result, the value is now generally -1.
>
> cpuid is also an x86ism as far as I am aware.
>
> I also wonder about the wisdom of having identically named structures
> like this in arch code without an arch_ prefix?  We should make it as
> hard as possible for things like this to accidentally get referenced in
> common code.
>
It is tricky for hvm_save_header. This is a struct used in common code
(xen/common/hvm/save.c). Instead of making it arch-specific, I would
move it to common code (include/public/hvm/save.h), with the following
modifications:
1. Redefine "cpuid" of hvm_save_header as cpu_id. hvm_save_header will
be shared by both x86 and ARM.
2. Rename HVM_FILE_MAGIC to HVM_ARM_FILE_MAGIC, still kept in the
arch-arm/hvm/save.h file. This is only used in arch-specific code, so we
won't get confused. The same applies to HVM_FILE_MAGIC on x86.
3. The other structs in arch-arm/hvm/save.h will remain in the same
file. Those structs are arch-specific anyway.
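
A rough sketch of the shared struct I have in mind (field names are
just a suggestion):

    /* include/public/hvm/save.h */
    struct hvm_save_header
    {
        uint32_t magic;     /* HVM_X86_FILE_MAGIC or HVM_ARM_FILE_MAGIC */
        uint32_t version;   /* File format version */
        uint64_t changeset; /* Version of Xen that saved this file */
        uint32_t cpu_id;    /* CPUID on x86, MIDR_EL1 on ARM */
    };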

-Wei


* Re: [RFC v2 1/6] xen/arm: Save and restore support with hvm context hypercalls
  2014-04-16 21:50     ` Wei Huang
@ 2014-04-17 12:55       ` Andrew Cooper
  0 siblings, 0 replies; 47+ messages in thread
From: Andrew Cooper @ 2014-04-17 12:55 UTC (permalink / raw)
  To: Wei Huang, xen-devel
  Cc: yjhyun.yoo, julien.grall, ian.campbell, jaeyong.yoo, stefano.stabellini

On 16/04/2014 22:50, Wei Huang wrote:
> On 04/15/2014 06:37 PM, Andrew Cooper wrote:
>>> --- a/xen/include/public/arch-arm/hvm/save.h
>>> +++ b/xen/include/public/arch-arm/hvm/save.h
>>> @@ -26,6 +26,136 @@
>>>   #ifndef __XEN_PUBLIC_HVM_SAVE_ARM_H__
>>>   #define __XEN_PUBLIC_HVM_SAVE_ARM_H__
>>>
>>> +#define HVM_FILE_MAGIC   0x92385520
>>> +#define HVM_FILE_VERSION 0x00000001
>>> +
>>> +struct hvm_save_header
>>> +{
>>> +    uint32_t magic;             /* Must be HVM_FILE_MAGIC */
>>> +    uint32_t version;           /* File format version */
>>> +    uint64_t changeset;         /* Version of Xen that saved this
>>> file */
>>> +    uint32_t cpuid;             /* MIDR_EL1 on the saving machine */
>>
>> This looks needlessly copied from x86, which is far from ideal.
>>
>> On x86, Xen tries to parse the mercurial revision number from its compile
>> time information and fails now that the underlying codebase has moved
>> from hg to git.  As a result, the value is now generally -1.
>>
>> cpuid is also an x86ism as far as I am aware.
>>
>> I also wonder about the wisdom of having identically named structures
>> like this in arch code without an arch_ prefix?  We should make it as
>> hard as possible for things like this to accidentally get referenced in
>> common code.
>>
> It is tricky for hvm_save_header. This is a struct used in common code
> (xen/common/hvm/save.c). Instead of making it arch-specific, I would
> move it to common code (include/public/hvm/save.h), with the following
> modifications:
> 1. Redefine "cpuid" of hvm_save_header as cpu_id. hvm_save_header
> will be shared by both x86 and ARM.
> 2. Rename HVM_FILE_MAGIC to HVM_ARM_FILE_MAGIC, still kept in the
> arch-arm/hvm/save.h file. This is only used in arch-specific code, so we
> won't get confused. The same applies to HVM_FILE_MAGIC on x86.
> 3. The other structs in arch-arm/hvm/save.h will remain in the same
> file. Those structs are arch-specific anyway.
>
> -Wei
>

Eugh.  Having a more thorough look through all of this code, it is in
need of improvement.

For the sake of hdr.{magic,version,changeset}, it is not worth keeping
some common save logic for the header.  arch_hvm_save() should be
updated to be given the hvm context and should construct & write the
entire structure.  This also matches the current semantics of
arch_hvm_load() where the arch handler deals with the entire structure.
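
I.e. something like this on the ARM side (sketch only, untested):

    int arch_hvm_save(struct domain *d, hvm_domain_context_t *h)
    {
        struct hvm_save_header hdr = {
            .magic = HVM_ARM_FILE_MAGIC,
            .version = HVM_FILE_VERSION,
            .cpuid = READ_SYSREG32(MIDR_EL1),
        };

        return hvm_save_entry(HEADER, 0, h, &hdr);
    }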

The currently existing hvm_save_header is quite silly.  'changeset'
conveys no useful information since the switch from hg to git.  'cpuid'
is used for the sole purpose of a printk(), and 'gtsc_khz' is
unconditionally repeated later in the migration record with the rest of
the tsc information.

Everything currently in arch-x86/hvm/save.h should be renamed to
identify it as hvm_x86.  This can be done in a backwards-compatible
manner by using some __XEN_INTERFACE_VERSION__ ifdeffery.

Everything new in arch-arm/hvm/save.h should be identified as hvm_arm
right from the outset.

Beyond that, the only key point is that HVM_$arch_FILE_MAGIC needs to be
different for each $arch, but that appears already in hand.

~Andrew


* Re: [RFC v2 1/6] xen/arm: Save and restore support with hvm context hypercalls
  2014-04-15 21:05 ` [RFC v2 1/6] xen/arm: Save and restore support with hvm context hypercalls Wei Huang
  2014-04-15 23:37   ` Andrew Cooper
  2014-04-16  9:48   ` Julien Grall
@ 2014-04-17 15:06   ` Julien Grall
  2014-04-17 16:55     ` Wei Huang
  2014-05-12  9:16     ` Ian Campbell
  2 siblings, 2 replies; 47+ messages in thread
From: Julien Grall @ 2014-04-17 15:06 UTC (permalink / raw)
  To: Wei Huang, ian.campbell, stefano.stabellini
  Cc: andrew.cooper3, yjhyun.yoo, jaeyong.yoo, xen-devel

Hello Wei,

On 04/15/2014 10:05 PM, Wei Huang wrote:
> +static int hvm_vtimer_load_ctxt(struct domain *d, hvm_domain_context_t *h)
> +{
> +    int vcpuid;
> +    struct hvm_hw_timer ctxt;
> +    struct vcpu *v;
> +    struct vtimer *t = NULL;
> +
> +    /* Which vcpu is this? */
> +    vcpuid = hvm_load_instance(h);
> +
> +    if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
> +    {
> +        dprintk(XENLOG_G_ERR, "HVM restore: dom%u has no vcpu%u\n",
> +                d->domain_id, vcpuid);
> +        return -EINVAL;
> +    }
> +
> +    if ( hvm_load_entry(TIMER, h, &ctxt) != 0 )
> +        return -EINVAL;
> +
> +    if ( ctxt.type == TIMER_TYPE_VIRT )
> +    {
> +        t = &v->arch.virt_timer;
> +        d->arch.virt_timer_base.offset = ctxt.vtb_offset;
> +    }
> +    else
> +    {
> +        t = &v->arch.phys_timer;
> +        d->arch.phys_timer_base.offset = ctxt.vtb_offset;

I thought a bit more about the {phys,virt}_timer_base.offset.

When you are migrating a guest, this offset will be invalidated. This
offset is used to get a relative offset from the Xen timer counter.

That also made me think the context switch in Xen for the timer looks
wrong to me.

When a guest VCPU is context switched, the Xen timer counter continues
to run, but CVAL does not, so the timer_base.offset will drift a bit. It
will result in setting a wrong timer via set_timer in Xen.

Did I miss something?

-- 
Julien Grall


* Re: [RFC v2 1/6] xen/arm: Save and restore support with hvm context hypercalls
  2014-04-17 15:06   ` Julien Grall
@ 2014-04-17 16:55     ` Wei Huang
  2014-05-12  9:16     ` Ian Campbell
  1 sibling, 0 replies; 47+ messages in thread
From: Wei Huang @ 2014-04-17 16:55 UTC (permalink / raw)
  To: Julien Grall, ian.campbell, stefano.stabellini
  Cc: andrew.cooper3, yjhyun.yoo, jaeyong.yoo, xen-devel

On 04/17/2014 10:06 AM, Julien Grall wrote:
> Hello Wei,
>
> On 04/15/2014 10:05 PM, Wei Huang wrote:
>> +static int hvm_vtimer_load_ctxt(struct domain *d, hvm_domain_context_t *h)
>> +{
>> +    int vcpuid;
>> +    struct hvm_hw_timer ctxt;
>> +    struct vcpu *v;
>> +    struct vtimer *t = NULL;
>> +
>> +    /* Which vcpu is this? */
>> +    vcpuid = hvm_load_instance(h);
>> +
>> +    if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
>> +    {
>> +        dprintk(XENLOG_G_ERR, "HVM restore: dom%u has no vcpu%u\n",
>> +                d->domain_id, vcpuid);
>> +        return -EINVAL;
>> +    }
>> +
>> +    if ( hvm_load_entry(TIMER, h, &ctxt) != 0 )
>> +        return -EINVAL;
>> +
>> +    if ( ctxt.type == TIMER_TYPE_VIRT )
>> +    {
>> +        t = &v->arch.virt_timer;
>> +        d->arch.virt_timer_base.offset = ctxt.vtb_offset;
>> +    }
>> +    else
>> +    {
>> +        t = &v->arch.phys_timer;
>> +        d->arch.phys_timer_base.offset = ctxt.vtb_offset;
>
> I thought a bit more about the {phys,virt}_timer_base.offset.
>
> When you are migrating a guest, this offset will be invalidated. This
> offset is used to get a relative offset from the Xen timer counter.
>
Agreed.
> That also made me think the context switch in Xen for the timer looks
> wrong to me.
>
> When a guest VCPU is context switched, the Xen timer counter continues
> to run, but CVAL does not, so the timer_base.offset will drift a bit.
> It will result in setting a wrong timer via set_timer in Xen.
>
I need to examine the code more carefully. But by skimming through
vtimer.c, it looks like this is the case: the missing ticks are not
compensated for.
> Did I miss something?
>


* Re: [RFC v2 4/6] xen/arm: Implement VLPT for guest p2m mapping in live migration
  2014-04-15 21:05 ` [RFC v2 4/6] xen/arm: Implement VLPT for guest p2m mapping in live migration Wei Huang
  2014-04-15 22:29   ` Julien Grall
  2014-04-15 23:40   ` Andrew Cooper
@ 2014-04-22 17:54   ` Julien Grall
  2 siblings, 0 replies; 47+ messages in thread
From: Julien Grall @ 2014-04-22 17:54 UTC (permalink / raw)
  To: Wei Huang
  Cc: ian.campbell, stefano.stabellini, andrew.cooper3, jaeyong.yoo,
	xen-devel, yjhyun.yoo

Hi Wei,

I will try to not repeat Andrew's comments.

On 04/15/2014 10:05 PM, Wei Huang wrote:
>  xen/arch/arm/domain.c        |    5 ++
>  xen/arch/arm/mm.c            |  116 ++++++++++++++++++++++++++++++++++++++++++

I think the functions you've added in mm.c should be part of p2m.c.

[..]

> +/* Restore the xen page table for vlpt mapping for domain */
> +void restore_vlpt(struct domain *d)
> +{
> +    int i;
> +
> +    dsb(sy);

I think inner-shareable (ish) is enough here.

> +
> +    for ( i = d->arch.dirty.second_lvl_start; i < d->arch.dirty.second_lvl_end;
> +          ++i )
> +    {
> +        int k = i % LPAE_ENTRIES;
> +        int l = i / LPAE_ENTRIES;
> +
> +        if ( xen_second[i].bits != d->arch.dirty.second_lvl[l][k].bits )
> +        {
> +            write_pte(&xen_second[i], d->arch.dirty.second_lvl[l][k]);
> +            flush_xen_data_tlb_range_va(i << SECOND_SHIFT, 1 << SECOND_SHIFT);
> +        }
> +    }
> +    
> +    dsb(sy);

Same here.

> +    isb();
> +}
> +
> +/* Set up the xen page table for vlpt mapping for domain */
> +int prepare_vlpt(struct domain *d)
> +{
> +    int xen_second_linear_base;
> +    int gp2m_start_index, gp2m_end_index;
> +    struct p2m_domain *p2m = &d->arch.p2m;
> +    struct page_info *second_lvl_page;
> +    paddr_t gma_start = 0;
> +    paddr_t gma_end = 0;
> +    lpae_t *first[2];
> +    int i;
> +    uint64_t required, avail = VIRT_LIN_P2M_END - VIRT_LIN_P2M_START;
> +
> +    domain_get_gpfn_range(d, &gma_start, &gma_end);
> +    required = (gma_end - gma_start) >> LPAE_SHIFT;
> +
> +    if ( required > avail )

What is the limit of RAM?

> +    {
> +        dprintk(XENLOG_ERR, "Available VLPT is small for domU guest"
> +                "(avail: %llx, required: %llx)\n", (unsigned long long)avail,
> +                (unsigned long long)required);

Why do you cast here?

> +        return -ENOMEM;
> +    }
> +
> +    xen_second_linear_base = second_linear_offset(VIRT_LIN_P2M_START);
> +
> +    gp2m_start_index = gma_start >> FIRST_SHIFT;
> +    gp2m_end_index = (gma_end >> FIRST_SHIFT) + 1;
> +
> +    if ( xen_second_linear_base + gp2m_end_index >= LPAE_ENTRIES * 2 )
> +    {

In which case does this happen?

> +        dprintk(XENLOG_ERR, "xen second page is small for VLPT for domU");
> +        return -ENOMEM;
> +    }
> +
> +    second_lvl_page = alloc_domheap_pages(NULL, 1, 0);
> +    if ( second_lvl_page == NULL )
> +        return -ENOMEM;
> +
> +    /* First level p2m is 2 consecutive pages */
> +    d->arch.dirty.second_lvl[0] = map_domain_page_global(
> +        page_to_mfn(second_lvl_page) );
> +    d->arch.dirty.second_lvl[1] = map_domain_page_global(
> +        page_to_mfn(second_lvl_page+1) );

map_domain_page_global can fail.
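
I.e. something like (sketch; the second mapping needs the same
treatment, plus unmapping the first one on failure):

    d->arch.dirty.second_lvl[0] = map_domain_page_global(
        page_to_mfn(second_lvl_page));
    if ( d->arch.dirty.second_lvl[0] == NULL )
    {
        free_domheap_pages(second_lvl_page, 1);
        return -ENOMEM;
    }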

> +
> +    first[0] = __map_domain_page(p2m->first_level);
> +    first[1] = __map_domain_page(p2m->first_level+1);
> +
> +    for ( i = gp2m_start_index; i < gp2m_end_index; ++i )
> +    {
> +        int k = i % LPAE_ENTRIES;
> +        int l = i / LPAE_ENTRIES;
> +        int k2 = (xen_second_linear_base + i) % LPAE_ENTRIES;
> +        int l2 = (xen_second_linear_base + i) / LPAE_ENTRIES;
> +
> +        write_pte(&xen_second[xen_second_linear_base+i], first[l][k]);

Spaces should surround binary operators.

I think you can create a temporary variable to store first[l][k]. It
will avoid two loads.

> +
> +        /* we copy the mapping into domain's structure as a reference
> +         * in case of the context switch (used in restore_vlpt) */
> +        d->arch.dirty.second_lvl[l2][k2] = first[l][k];
> +    }
> +    unmap_domain_page(first[0]);
> +    unmap_domain_page(first[1]);
> +
> +    /* storing the start and end index */
> +    d->arch.dirty.second_lvl_start = xen_second_linear_base + gp2m_start_index;
> +    d->arch.dirty.second_lvl_end = xen_second_linear_base + gp2m_end_index;
> +
> +    flush_vlpt(d);
> +
> +    return 0;
> +}
> +
> +void cleanup_vlpt(struct domain *d)
> +{
> +    /* First level p2m is 2 consecutive pages */
> +    unmap_domain_page_global(d->arch.dirty.second_lvl[0]);
> +    unmap_domain_page_global(d->arch.dirty.second_lvl[1]);
> +}

Newline here please.

>  /* Fixmap slots */
> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
> index 28c359a..5321bd6 100644
> --- a/xen/include/asm-arm/domain.h
> +++ b/xen/include/asm-arm/domain.h
> @@ -161,6 +161,13 @@ struct arch_domain
>          spinlock_t                  lock;
>      } vuart;
>  
> +    /* dirty-page tracing */
> +    struct {
> +        volatile int second_lvl_start;   /* for context switch */
> +        volatile int second_lvl_end;

Can you please comment these 2 fields? What do they mean?


> +        lpae_t *second_lvl[2];           /* copy of guest p2m's first */
> +    } dirty;
> +
>      unsigned int evtchn_irq;
>  }  __cacheline_aligned;
>  
> diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
> index 8347524..5fd684f 100644
> --- a/xen/include/asm-arm/mm.h
> +++ b/xen/include/asm-arm/mm.h
> @@ -4,6 +4,7 @@
>  #include <xen/config.h>
>  #include <xen/kernel.h>
>  #include <asm/page.h>
> +#include <asm/config.h>
>  #include <public/xen.h>
>  
>  /* Align Xen to a 2 MiB boundary. */
> @@ -342,6 +343,22 @@ static inline void put_page_and_type(struct page_info *page)
>      put_page(page);
>  }
>  
> +int prepare_vlpt(struct domain *d);
> +void cleanup_vlpt(struct domain *d);
> +void restore_vlpt(struct domain *d);
> +
> +/* calculate the xen's virtual address for accessing the leaf PTE of
> + * a given address (GPA) */
> +static inline lpae_t * get_vlpt_3lvl_pte(paddr_t addr)

No space between * and the function name.

> +{
> +    lpae_t *table = (lpae_t *)VIRT_LIN_P2M_START;
> +
> +    /* Since we slotted the guest's first p2m page table to xen's
> +     * second page table, one shift is enough for calculating the
> +     * index of guest p2m table entry */
> +    return &table[addr >> PAGE_SHIFT];
> +}
> +

Regards,


-- 
Julien Grall


* Re: [RFC v2 0/6] xen/arm: Support guest VM save/restore/migration
  2014-04-15 21:05 [RFC v2 0/6] xen/arm: Support guest VM save/restore/migration Wei Huang
                   ` (13 preceding siblings ...)
  2014-04-16 16:29 ` [RFC v2 0/6] xen/arm: Support guest VM save/restore/migration Julien Grall
@ 2014-04-23 11:49 ` Ian Campbell
  2014-04-23 18:41   ` Wei Huang
  14 siblings, 1 reply; 47+ messages in thread
From: Ian Campbell @ 2014-04-23 11:49 UTC (permalink / raw)
  To: Wei Huang
  Cc: stefano.stabellini, andrew.cooper3, julien.grall, jaeyong.yoo,
	xen-devel, yjhyun.yoo

On Tue, 2014-04-15 at 16:05 -0500, Wei Huang wrote:
>   * Rev v3 will be sent out soon.

How soon? Should I bother reviewing this iteration?

I suppose you are coordinating with Junghyun Yoo and there isn't going
to be a separate set of these patches at some point?

Ian.


* Re: [RFC v2 5/6] xen/arm: Implement hypercall for dirty page tracing
  2014-04-15 21:05 ` [RFC v2 5/6] xen/arm: Implement hypercall for dirty page tracing Wei Huang
  2014-04-15 23:35   ` Julien Grall
@ 2014-04-23 11:59   ` Julien Grall
  1 sibling, 0 replies; 47+ messages in thread
From: Julien Grall @ 2014-04-23 11:59 UTC (permalink / raw)
  To: Wei Huang
  Cc: ian.campbell, stefano.stabellini, andrew.cooper3, jaeyong.yoo,
	xen-devel, yjhyun.yoo

Hi Wei,

On 04/15/2014 10:05 PM, Wei Huang wrote:
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index 3f04a77..d2531ed 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -207,6 +207,12 @@ static void ctxt_switch_to(struct vcpu *n)
>  
>      isb();
>  
> +    /* Dirty-page tracing
> +     * NB: How do we consider SMP case?
> +     */
> +    if ( n->domain->arch.dirty.mode )
> +        restore_vlpt(n->domain);
> +

I thought a bit more about this piece of code.

Your VLPT implementation uses xen_second, which is shared between every
vCPU. Therefore restoring the VLPT is pointless here.

On the other hand, I didn't see anything which prevents you from
migrating 2 domains at the same time. You will likely crash one (if not
both) of the guests.

Regards,

-- 
Julien Grall


* Re: [RFC v2 0/6] xen/arm: Support guest VM save/restore/migration
  2014-04-23 11:49 ` Ian Campbell
@ 2014-04-23 18:41   ` Wei Huang
  0 siblings, 0 replies; 47+ messages in thread
From: Wei Huang @ 2014-04-23 18:41 UTC (permalink / raw)
  To: Ian Campbell
  Cc: stefano.stabellini, andrew.cooper3, julien.grall, jaeyong.yoo,
	xen-devel, yjhyun.yoo

On 04/23/2014 06:49 AM, Ian Campbell wrote:
> On Tue, 2014-04-15 at 16:05 -0500, Wei Huang wrote:
>>    * Rev v3 will be sent out soon.
>
> How soon? Should I bother reviewing this iteration?
>
In the next couple of days. You don't have to review this iteration. There have
been enough reviews from Julien and Andrew.
> I suppose you are coordinating with Junghyun Yoo and there isn't going
> to be a separate set of these patches at some point?
No, I haven't received any feedback from Junghyun yet. I will ping him 
again.
>
> Ian.
>
>


* Re: [RFC v2 1/6] xen/arm: Save and restore support with hvm context hypercalls
  2014-04-17 15:06   ` Julien Grall
  2014-04-17 16:55     ` Wei Huang
@ 2014-05-12  9:16     ` Ian Campbell
  2014-05-12 12:04       ` Julien Grall
  1 sibling, 1 reply; 47+ messages in thread
From: Ian Campbell @ 2014-05-12  9:16 UTC (permalink / raw)
  To: Julien Grall
  Cc: Wei Huang, stefano.stabellini, andrew.cooper3, jaeyong.yoo,
	xen-devel, yjhyun.yoo

On Thu, 2014-04-17 at 16:06 +0100, Julien Grall wrote:
> I thought a bit more about the {phys,virt}_timer_base.offset.
> 
> When you are migrating a guest, this offset will be invalidated. This
> offset is used to get a relative offset from the Xen timer counter.
> 
> That also made me think the context switch in Xen for the timer looks
> wrong to me.
> 
> When a guest VCPU is context switched, the Xen timer counter continues
> to run, but CVAL does not, so the timer_base.offset will drift a bit.
> It will result in setting a wrong timer via set_timer in Xen.
> 
> Did I miss something?

The timer offset is mainly accounting for the fact that the domain is
not booted when the hardware is started.

However, time does continue while a VCPU is not scheduled; this is
exposed via the PV "stolen time" mechanism.

Now it is in theory possible to virtualise time differently so that
stolen time is not possible, but unless you want to cope with different
VCPUs seeing different times (because they have been descheduled for
different lengths of time), then you either need to do gang scheduling
or play other (likely complicated) tricks. With the model we have on
ARM, paravirtualising this is the right thing to do.

Not sure what you mean about CVAL (the timer compare val) not running:
when we deschedule a VCPU, we figure out when CVAL would have caused the
timer interrupt to fire and set up a Xen timer to make sure we unblock
the VCPU at that point. When we switch back to the VCPU we of course
restore the compare value to what the guest wrote; nothing else would
make sense.
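
For reference, the deschedule path does roughly this today (paraphrased
from vtimer.c, not verbatim):

    /* virt_timer_save(): turn the guest's CVAL into an absolute system
     * time and arm a Xen timer to unblock the VCPU at that point. */
    if ( (v->arch.virt_timer.ctl & CNTx_CTL_ENABLE) &&
         !(v->arch.virt_timer.ctl & CNTx_CTL_MASK) )
        set_timer(&v->arch.virt_timer.timer,
                  ticks_to_ns(v->arch.virt_timer.cval +
                              v->domain->arch.virt_timer_base.offset -
                              boot_count));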

Ian


* Re: [RFC v2 1/6] xen/arm: Save and restore support with hvm context hypercalls
  2014-05-12  9:16     ` Ian Campbell
@ 2014-05-12 12:04       ` Julien Grall
       [not found]         ` <53723ACC.8040402@samsung.com>
  0 siblings, 1 reply; 47+ messages in thread
From: Julien Grall @ 2014-05-12 12:04 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Wei Huang, stefano.stabellini, andrew.cooper3, jaeyong.yoo,
	xen-devel, yjhyun.yoo

On 05/12/2014 10:16 AM, Ian Campbell wrote:
> On Thu, 2014-04-17 at 16:06 +0100, Julien Grall wrote:
>> I thought a bit more about the {phys,virt}_timer_base.offset.
>>
>> When you are migrating a guest, this offset will be invalidated. This
>> offset is used to get a relative offset from the Xen timer counter.
>>
>> That also made me think the context switch in Xen for the timer looks
>> wrong to me.
>>
>> When a guest VCPU is context switched, the Xen timer counter continues
>> to run, but CVAL does not, so the timer_base.offset will drift a bit.
>> It will result in setting a wrong timer via set_timer in Xen.
>>
>> Did I miss something?
> 
> The timer offset is mainly accounting for the fact that the domain is
> not booted when the hardware is started.
> 
> However, time does continue while a VCPU is not scheduled; this is
> exposed via the PV "stolen time" mechanism.
> 
> Now it is in theory possible to virtualise time differently so that
> stolen time is not possible, but unless you want to cope with different
> VCPUs seeing different times (because they have been descheduled for
> different lengths of time), then you either need to do gang scheduling
> or play other (likely complicated) tricks. With the model we have on
> ARM, paravirtualising this is the right thing to do.
> 
> Not sure what you mean about CVAL (the timer compare val) not running:
> when we deschedule a VCPU, we figure out when CVAL would have caused the
> timer interrupt to fire and set up a Xen timer to make sure we unblock
> the VCPU at that point. When we switch back to the VCPU we of course
> restore the compare value to what the guest wrote; nothing else would
> make sense.

After reading your explanation and the ARM ARM again, I think I mixed
up CNT (the counter) and CVAL (the compare val).

Thank you for the explanation.

-- 
Julien Grall


* Re: [RFC v2 1/6] xen/arm: Save and restore support with hvm context hypercalls
       [not found]         ` <53723ACC.8040402@samsung.com>
@ 2014-05-13 15:42           ` Julien Grall
  2014-05-13 16:18             ` Wei Huang
  0 siblings, 1 reply; 47+ messages in thread
From: Julien Grall @ 2014-05-13 15:42 UTC (permalink / raw)
  To: Wei Huang; +Cc: Stefano Stabellini, Ian Campbell, xen-devel

(Adding back Xen devel)

On 05/13/2014 04:31 PM, Wei Huang wrote:
> On 05/12/2014 07:04 AM, Julien Grall wrote:
>> On 05/12/2014 10:16 AM, Ian Campbell wrote:
>>> On Thu, 2014-04-17 at 16:06 +0100, Julien Grall wrote:
>>>> I thought a bit more about the {phys,virt}_timer_base.offset.
>>>>
>>>> When you are migrating a guest, this offset will be invalidated. This
>>>> offset is used to get a relative offset from the Xen timer counter.
>>>>
>>>> That also made me think the context switch in Xen for the timer looks
>>>> wrong to me.
>>>>
>>>> When a guest VCPU is context switched, the Xen timer counter continues
>>>> to run, but CVAL does not, so the timer_base.offset will drift a bit.
>>>> It will result in setting a wrong timer via set_timer in Xen.
>>>>
>>>> Did I miss something?
>>>
>>> The timer offset is mainly accounting for the fact that the domain is
>>> not booted when the hardware is started.
>>>
>>> However, time does continue while a VCPU is not scheduled; this is
>>> exposed via the PV "stolen time" mechanism.
>>>
>>> Now it is in theory possible to virtualise time differently so that
>>> stolen time is not possible, but unless you want to cope with different
>>> VCPUs seeing different times (because they have been descheduled for
>>> different lengths of time), then you either need to do gang scheduling
>>> or play other (likely complicated) tricks. With the model we have on
>>> ARM, paravirtualising this is the right thing to do.
>>>
>>> Not sure what you mean about CVAL (the timer compare val) not running:
>>> when we deschedule a VCPU, we figure out when CVAL would have caused the
>>> timer interrupt to fire and set up a Xen timer to make sure we unblock
>>> the VCPU at that point. When we switch back to the VCPU we of course
>>> restore the compare value to what the guest wrote; nothing else would
>>> make sense.
>>
>> After reading your explanation and the ARM ARM again, I think I mixed
>> up CNT (the counter) and CVAL (the compare val).
>>
>> Thank you for the explanation.
>>
> Other than the code comments (case/switch), are you OK with the design
> of the latest ARCH_TIMER patch?

I made some comments on v3. Once you address the comments from Andrew
and me, the patch will be in good shape.

Regards,

-- 
Julien Grall


* Re: [RFC v2 1/6] xen/arm: Save and restore support with hvm context hypercalls
  2014-05-13 15:42           ` Julien Grall
@ 2014-05-13 16:18             ` Wei Huang
  2014-05-13 16:37               ` Julien Grall
  0 siblings, 1 reply; 47+ messages in thread
From: Wei Huang @ 2014-05-13 16:18 UTC (permalink / raw)
  To: Julien Grall; +Cc: Stefano Stabellini, Ian Campbell, xen-devel

On 05/13/2014 10:42 AM, Julien Grall wrote:
> (Adding back Xen devel)
>
> On 05/13/2014 04:31 PM, Wei Huang wrote:
>> On 05/12/2014 07:04 AM, Julien Grall wrote:
>>> On 05/12/2014 10:16 AM, Ian Campbell wrote:
>>>> On Thu, 2014-04-17 at 16:06 +0100, Julien Grall wrote:
>>>>> I thought a bit more about the {phys,virt}_timer_base.offset.
>>>>>
>>>>> When you are migrating a guest, this offset will be invalidated. This
>>>>> offset is used to get a relative offset from the Xen timer counter.
>>>>>
>>>>> That also made me think the context switch in Xen for the timer looks
>>>>> wrong to me.
>>>>>
>>>>> When a guest VCPU is context switched, the Xen timer counter continues
>>>>> to run, but CVAL does not, so the timer_base.offset will drift a bit.
>>>>> It will result in setting a wrong timer via set_timer in Xen.
>>>>>
>>>>> Did I miss something?
>>>>
>>>> The timer offset is mainly accounting for the fact that the domain is
>>>> not booted when the hardware is started.
>>>>
>>>> However, time does continue while a VCPU is not scheduled; this is
>>>> exposed via the PV "stolen time" mechanism.
>>>>
>>>> Now it is in theory possible to virtualise time differently so that
>>>> stolen time is not possible, but unless you want to cope with different
>>>> VCPUs seeing different times (because they have been descheduled for
>>>> different lengths of time), then you either need to do gang scheduling
>>>> or play other (likely complicated) tricks. With the model we have on
>>>> ARM, paravirtualising this is the right thing to do.
>>>>
>>>> Not sure what you mean about CVAL (the timer compare val) not running:
>>>> when we deschedule a VCPU, we figure out when CVAL would have caused the
>>>> timer interrupt to fire and set up a Xen timer to make sure we unblock
>>>> the VCPU at that point. When we switch back to the VCPU we of course
>>>> restore the compare value to what the guest wrote; nothing else would
>>>> make sense.
>>>
>>> After reading your explanation and the ARM ARM again, I think I mixed
>>> up CNT (the counter) and CVAL (the compare val).
>>>
>>> Thank you for the explanation.
>>>
>> Other than the code comments (case/switch), are you OK with the design
>> of the latest ARCH_TIMER patch?
>
> I made some comments on v3. Once you address the comments from Andrew
> and me, the patch will be in good shape.
>
Given the comments from you and Andrew, I will revise the context struct 
to the following format. With this, we can get rid of most problems 
(switch/case/...).

struct hvm_arm_timer
{
     /* phys_timer */
     uint64_t phys_vtb_offset;
     uint64_t phys_cval;
     uint32_t phys_ctl;

     /* virt_timer */
     uint64_t virt_vtb_offset;
     uint64_t virt_cval;
     uint32_t virt_ctl;
};
DECLARE_HVM_SAVE_TYPE(TIMER, 4, struct hvm_arm_timer);

Any comments, please let me know.

> Regards,
>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC v2 1/6] xen/arm: Save and restore support with hvm context hypercalls
  2014-05-13 16:18             ` Wei Huang
@ 2014-05-13 16:37               ` Julien Grall
  2014-05-13 16:44                 ` Wei Huang
  0 siblings, 1 reply; 47+ messages in thread
From: Julien Grall @ 2014-05-13 16:37 UTC (permalink / raw)
  To: Wei Huang; +Cc: Andrew Cooper, Stefano Stabellini, Ian Campbell, xen-devel

On 05/13/2014 05:18 PM, Wei Huang wrote:
> Given the comments from you and Andrew, I will revise the context struct
> to the following format. With this, we can get rid of most problems
> (switch/case/...).

With this solution, you will duplicate code to save/restore the timer.

> struct hvm_arm_timer
> {
>     /* phys_timer */
>     uint64_t phys_vtb_offset;
>     uint64_t phys_cval;
>     uint32_t phys_ctl;

If I'm not mistaken, you need 32 bits of padding here ...

> 
>     /* virt_timer */
>     uint64_t virt_vtb_offset;
>     uint64_t virt_cval;
>     uint32_t virt_ctl;

... and here
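
For illustration, the layout with that padding might look like this (a
sketch; only the pad fields are new, keeping the 64-bit fields 8-byte
aligned and the record size identical on arm32 and arm64):

struct hvm_arm_timer
{
    /* phys_timer */
    uint64_t phys_vtb_offset;
    uint64_t phys_cval;
    uint32_t phys_ctl;
    uint32_t phys_pad;   /* keep virt_vtb_offset 8-byte aligned */

    /* virt_timer */
    uint64_t virt_vtb_offset;
    uint64_t virt_cval;
    uint32_t virt_ctl;
    uint32_t virt_pad;   /* keep the overall size a multiple of 8 */
};
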

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC v2 1/6] xen/arm: Save and restore support with hvm context hypercalls
  2014-05-13 16:37               ` Julien Grall
@ 2014-05-13 16:44                 ` Wei Huang
  2014-05-13 17:33                   ` Julien Grall
  0 siblings, 1 reply; 47+ messages in thread
From: Wei Huang @ 2014-05-13 16:44 UTC (permalink / raw)
  To: Julien Grall; +Cc: Andrew Cooper, Stefano Stabellini, Ian Campbell, xen-devel

On 05/13/2014 11:37 AM, Julien Grall wrote:
> On 05/13/2014 05:18 PM, Wei Huang wrote:
>> Given the comments from you and Andrew, I will revise the context struct
>> to the following format. With this, we can get rid of most problems
>> (switch/case/...).
>
> With this solution, you will duplicate code to save/restore the timer.
Won't the code size be reduced, and the result look cleaner? Here is an example:

static int hvm_timer_save(struct domain *d, hvm_domain_context_t *h)
{
     struct hvm_arm_timer ctxt;
     struct vcpu *v;
     int rc = 0;

     /* Save the state of vtimer and ptimer */
     for_each_vcpu( d, v )
     {
         /* save phys_timer */
         ctxt.phys_cval = v->arch.phys_timer.cval;
         ctxt.phys_ctl = v->arch.phys_timer.ctl;
         ctxt.phys_vtb_offset = d->arch.phys_timer_base.offset;

         /* save virt_timer */
         ctxt.virt_cval = v->arch.virt_timer.cval;
         ctxt.virt_ctl = v->arch.virt_timer.ctl;
         ctxt.virt_vtb_offset = d->arch.virt_timer_base.offset;

         if ( (rc = hvm_save_entry(TIMER, v->vcpu_id, h, &ctxt)) != 0 )
             return rc;
     }

     return rc;
}
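
For completeness, the restore side would be roughly symmetric. A sketch,
assuming the ARM framework mirrors the x86 helpers hvm_load_instance()
and hvm_load_entry() (those names are an assumption here):

static int hvm_timer_load(struct domain *d, hvm_domain_context_t *h)
{
    struct hvm_arm_timer ctxt;
    struct vcpu *v;
    int vcpuid = hvm_load_instance(h);

    /* Locate the vcpu this record belongs to */
    if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
        return -EINVAL;

    if ( hvm_load_entry(TIMER, h, &ctxt) != 0 )
        return -EINVAL;

    /* restore phys_timer */
    v->arch.phys_timer.cval = ctxt.phys_cval;
    v->arch.phys_timer.ctl = ctxt.phys_ctl;
    d->arch.phys_timer_base.offset = ctxt.phys_vtb_offset;

    /* restore virt_timer */
    v->arch.virt_timer.cval = ctxt.virt_cval;
    v->arch.virt_timer.ctl = ctxt.virt_ctl;
    d->arch.virt_timer_base.offset = ctxt.virt_vtb_offset;

    return 0;
}
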

>
>> struct hvm_arm_timer
>> {
>>      /* phys_timer */
>>      uint64_t phys_vtb_offset;
>>      uint64_t phys_cval;
>>      uint32_t phys_ctl;
>
> If I'm not mistaken, you need a 32 bit padding here ...
>
>>
>>      /* virt_timer */
>>      uint64_t virt_vtb_offset;
>>      uint64_t virt_cval;
>>      uint32_t virt_ctl;
>
> ... and here
>
> Regards,
>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC v2 1/6] xen/arm: Save and restore support with hvm context hypercalls
  2014-05-13 16:44                 ` Wei Huang
@ 2014-05-13 17:33                   ` Julien Grall
  0 siblings, 0 replies; 47+ messages in thread
From: Julien Grall @ 2014-05-13 17:33 UTC (permalink / raw)
  To: Wei Huang; +Cc: Andrew Cooper, Stefano Stabellini, Ian Campbell, xen-devel

On 05/13/2014 05:44 PM, Wei Huang wrote:
> On 05/13/2014 11:37 AM, Julien Grall wrote:
>> On 05/13/2014 05:18 PM, Wei Huang wrote:
>>> Given the comments from you and Andrew, I will revise the context struct
>>> to the following format. With this, we can get rid of most problems
>>> (switch/case/...).
>>
>> With this solution, you will duplicate code to save/restore the timer.
> Won't the code size be reduced, and the result look cleaner? Here is an example:

LGTM except ...

> static int hvm_timer_save(struct domain *d, hvm_domain_context_t *h)
> {
>     struct hvm_arm_timer ctxt;
>     struct vcpu *v;
>     int rc = 0;
> 
>     /* Save the state of vtimer and ptimer */
>     for_each_vcpu( d, v )
>     {
>         /* save phys_timer */
>         ctxt.phys_cval = v->arch.phys_timer.cval;
>         ctxt.phys_ctl = v->arch.phys_timer.ctl;
>         ctxt.phys_vtb_offset = d->arch.phys_timer_base.offset;
> 
>         /* save virt_timer */
>         ctxt.virt_cval = v->arch.virt_timer.cval;
>         ctxt.virt_ctl = v->arch.virt_timer.ctl;
>         ctxt.virt_vtb_offset = d->arch.virt_timer_base.offset;

I think you need to store d->arch.virt_timer_base.offset in ns rather
than ticks.
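
For illustration (a sketch, assuming ns<->tick helpers along the lines of
ticks_to_ns()/ns_to_ticks(); treat the exact names as an assumption):

    /* save: record the offset in ns, independent of the counter frequency */
    ctxt.virt_vtb_offset = ticks_to_ns(d->arch.virt_timer_base.offset);

    /* restore: convert back to ticks at the destination host */
    d->arch.virt_timer_base.offset = ns_to_ticks(ctxt.virt_vtb_offset);
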

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2014-05-13 17:33 UTC | newest]

Thread overview: 47+ messages
2014-04-15 21:05 [RFC v2 0/6] xen/arm: Support guest VM save/restore/migration Wei Huang
2014-04-15 21:05 ` [RFC v2 1/6] xen/arm: Save and restore support with hvm context hypercalls Wei Huang
2014-04-15 23:37   ` Andrew Cooper
2014-04-16 21:50     ` Wei Huang
2014-04-17 12:55       ` Andrew Cooper
2014-04-16  9:48   ` Julien Grall
2014-04-16 10:30     ` Jan Beulich
2014-04-16 15:54       ` Wei Huang
2014-04-17 15:06   ` Julien Grall
2014-04-17 16:55     ` Wei Huang
2014-05-12  9:16     ` Ian Campbell
2014-05-12 12:04       ` Julien Grall
     [not found]         ` <53723ACC.8040402@samsung.com>
2014-05-13 15:42           ` Julien Grall
2014-05-13 16:18             ` Wei Huang
2014-05-13 16:37               ` Julien Grall
2014-05-13 16:44                 ` Wei Huang
2014-05-13 17:33                   ` Julien Grall
2014-04-15 21:05 ` [RFC v2 2/6] xen/arm: implement support for XENMEM_maximum_gpfn hypercall Wei Huang
2014-04-15 22:46   ` Julien Grall
2014-04-16 15:33     ` Wei Huang
2014-04-15 21:05 ` [RFC v2 3/6] xen/arm: support guest do_suspend function Wei Huang
2014-04-15 23:38   ` Andrew Cooper
2014-04-15 23:39   ` Andrew Cooper
2014-04-16  9:10   ` Julien Grall
2014-04-15 21:05 ` [RFC v2 4/6] xen/arm: Implement VLPT for guest p2m mapping in live migration Wei Huang
2014-04-15 22:29   ` Julien Grall
2014-04-15 23:40   ` Andrew Cooper
2014-04-22 17:54   ` Julien Grall
2014-04-15 21:05 ` [RFC v2 5/6] xen/arm: Implement hypercall for dirty page tracing Wei Huang
2014-04-15 23:38   ` Andrew Cooper
2014-04-15 21:05 ` [RFC v2 6/6] xen/arm: Implement toolstack for xl restore/save and migrate Wei Huang
2014-04-15 23:40   ` Andrew Cooper
2014-04-15 21:05 ` [RFC v2 0/6] xen/arm: Support guest VM save/restore/migration Wei Huang
2014-04-15 22:23   ` Julien Grall
2014-04-15 21:05 ` [RFC v2 1/6] xen/arm: Save and restore support with hvm context hypercalls Wei Huang
2014-04-15 21:05 ` [RFC v2 2/6] xen/arm: implement support for XENMEM_maximum_gpfn hypercall Wei Huang
2014-04-15 21:05 ` [RFC v2 3/6] xen/arm: support guest do_suspend function Wei Huang
2014-04-15 21:05 ` [RFC v2 4/6] xen/arm: Implement VLPT for guest p2m mapping in live migration Wei Huang
2014-04-15 21:05 ` [RFC v2 5/6] xen/arm: Implement hypercall for dirty page tracing Wei Huang
2014-04-15 23:35   ` Julien Grall
2014-04-23 11:59   ` Julien Grall
2014-04-15 21:05 ` [RFC v2 6/6] xen/arm: Implement toolstack for xl restore/save and migrate Wei Huang
2014-04-16 16:29 ` [RFC v2 0/6] xen/arm: Support guest VM save/restore/migration Julien Grall
2014-04-16 16:41   ` Wei Huang
2014-04-16 16:50     ` Julien Grall
2014-04-23 11:49 ` Ian Campbell
2014-04-23 18:41   ` Wei Huang
