All of lore.kernel.org
* [PATCH v3 0/1] Introduce VCPUOP_reset_vcpu_info
@ 2014-08-19 10:04 Vitaly Kuznetsov
  2014-08-19 10:04 ` [PATCH v3 1/1] " Vitaly Kuznetsov
  2014-08-19 18:59 ` [PATCH v3 0/1] " David Vrabel
  0 siblings, 2 replies; 14+ messages in thread
From: Vitaly Kuznetsov @ 2014-08-19 10:04 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Jones, David Vrabel, Jan Beulich

The patch and guest code are based on the prototype by Konrad Rzeszutek Wilk.

VCPUOP_reset_vcpu_info is required to support kexec performed by an SMP PVHVM
guest. It was tested with the guest code listed below.

Note: the current Linux PVHVM code does not perform VCPUOP_register_vcpu_info
for VCPU0, so there is no need to perform VCPUOP_reset_vcpu_info for it. However,
registering vcpu_info for VCPU0 is supported on the hypervisor side, and it makes
sense to have matching support for the reset operation. The guest code listed
below contains a proof of concept which does VCPUOP_register_vcpu_info for VCPU0
on startup and VCPUOP_reset_vcpu_info on kexec. I don't think we want that part
in the upstream Linux kernel; however, other/future guests can use it.

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 4fd979e..ec5b152 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -171,7 +171,7 @@ static void clamp_max_cpus(void)
 #endif
 }
 
-static void xen_vcpu_setup(int cpu)
+void xen_vcpu_setup(int cpu)
 {
 	struct vcpu_register_vcpu_info info;
 	int err;
@@ -183,8 +183,6 @@ static void xen_vcpu_setup(int cpu)
 	 * This path is called twice on PVHVM - first during bootup via
 	 * smp_init -> xen_hvm_cpu_notify, and then if the VCPU is being
 	 * hotplugged: cpu_up -> xen_hvm_cpu_notify.
-	 * As we can only do the VCPUOP_register_vcpu_info once lets
-	 * not over-write its result.
 	 *
 	 * For PV it is called during restore (xen_vcpu_restore) and bootup
 	 * (xen_setup_vcpu_info_placement). The hotplug mechanism does not
@@ -210,10 +208,6 @@ static void xen_vcpu_setup(int cpu)
 	/* Check to see if the hypervisor will put the vcpu_info
 	   structure where we want it, which allows direct access via
 	   a percpu-variable.
-	   N.B. This hypercall can _only_ be called once per CPU. Subsequent
-	   calls will error out with -EINVAL. This is due to the fact that
-	   hypervisor has no unregister variant and this hypercall does not
-	   allow to over-write info.mfn and info.offset.
 	 */
 	err = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, cpu, &info);
 
@@ -228,6 +222,22 @@ static void xen_vcpu_setup(int cpu)
 	}
 }
 
+void xen_teardown_vcpu_setup(int cpu)
+{
+	int err;
+
+	if (!have_vcpu_info_placement)
+		return;
+
+	err = HYPERVISOR_vcpu_op(VCPUOP_reset_vcpu_info, cpu, NULL);
+	if (err) {
+		xen_raw_printk("%s: VCPUOP_reset_vcpu_info rc: %d\n", __func__, err);
+		return;
+	}
+	if (cpu < MAX_VIRT_CPUS)
+		per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu];
+}
+
 /*
  * On restore, set the vcpu placement up again.
  * If it fails, then we're in a bad state, since
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index bc5e897..b05f91f 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -769,10 +769,33 @@ static void xen_hvm_cpu_die(unsigned int cpu)
 #ifdef CONFIG_KEXEC
 void xen_kexec_shutdown(void)
 {
+	int cpu;
+	cpumask_var_t cpu_offline_mask;
+
 	if (!kexec_in_progress)
 		return;
 
+	/* With CONFIG_CPUMASK_OFFSTACK the mask must be allocated before use */
+	if (!alloc_cpumask_var(&cpu_offline_mask, GFP_KERNEL))
+		return;
+
+	gnttab_suspend();
+
+	/* Stop all CPUs except for the first one */
+	disable_nonboot_cpus();
+
 	xen_hvm_reset_eventchannels();
+
+	cpumask_andnot(cpu_offline_mask, cpu_present_mask,
+		       cpu_online_mask);
+
+	for_each_cpu(cpu, cpu_offline_mask)
+		xen_teardown_vcpu_setup(cpu);
+
+	for_each_cpu(cpu, cpu_online_mask)
+		xen_teardown_vcpu_setup(cpu);
+
+	free_cpumask_var(cpu_offline_mask);
 }
 #endif
 
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index d083e82..36dd380 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -53,6 +53,7 @@ void xen_init_irq_ops(void);
 void xen_setup_timer(int cpu);
 void xen_setup_runstate_info(int cpu);
 void xen_teardown_timer(int cpu);
+void xen_teardown_vcpu_setup(int cpu);
 cycle_t xen_clocksource_read(void);
 void xen_setup_cpu_clockevents(void);
 void __init xen_init_time_ops(void);
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index f12fbca..3165ab3 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -1750,6 +1750,8 @@ void xen_hvm_reset_eventchannels(void)
 	}
 }
 
+void xen_vcpu_setup(int cpu);
+
 void __init xen_init_IRQ(void)
 {
 	int ret = -EINVAL;
@@ -1759,6 +1761,9 @@ void __init xen_init_IRQ(void)
 	if (ret < 0)
 		xen_evtchn_2l_init();
 
+	if (!xen_pv_domain())
+		xen_vcpu_setup(0);
+
 	evtchn_to_irq = kcalloc(EVTCHN_ROW(xen_evtchn_max_channels()),
 				sizeof(*evtchn_to_irq), GFP_KERNEL);
 	BUG_ON(!evtchn_to_irq);
diff --git a/include/xen/interface/vcpu.h b/include/xen/interface/vcpu.h
index b05288c..b65ca42 100644
--- a/include/xen/interface/vcpu.h
+++ b/include/xen/interface/vcpu.h
@@ -172,4 +172,10 @@ DEFINE_GUEST_HANDLE_STRUCT(vcpu_register_vcpu_info);
 
 /* Send an NMI to the specified VCPU. @extra_arg == NULL. */
 #define VCPUOP_send_nmi             11
+
+/*
+ * Reset all of the vcpu_info information from their previous location to the
+ * default one used at bootup.
+ */
+#define VCPUOP_reset_vcpu_info      14
 #endif /* __XEN_PUBLIC_VCPU_H__ */

Vitaly Kuznetsov (1):
  Introduce VCPUOP_reset_vcpu_info

 xen/arch/x86/hvm/hvm.c    |  1 +
 xen/common/domain.c       | 73 +++++++++++++++++++++++++++++++++++++++++------
 xen/include/public/vcpu.h | 19 ++++++++++++
 3 files changed, 85 insertions(+), 8 deletions(-)

-- 
1.9.3

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v3 1/1] Introduce VCPUOP_reset_vcpu_info
  2014-08-19 10:04 [PATCH v3 0/1] Introduce VCPUOP_reset_vcpu_info Vitaly Kuznetsov
@ 2014-08-19 10:04 ` Vitaly Kuznetsov
  2014-08-19 10:21   ` Andrew Cooper
  2014-08-19 23:22   ` Jan Beulich
  2014-08-19 18:59 ` [PATCH v3 0/1] " David Vrabel
  1 sibling, 2 replies; 14+ messages in thread
From: Vitaly Kuznetsov @ 2014-08-19 10:04 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Jones, David Vrabel, Jan Beulich

When an SMP guest performs kexec/kdump, it tries issuing VCPUOP_register_vcpu_info.
This fails because vcpu_info is already registered. Introduce a new vcpu operation
to reset vcpu_info to its default state.

Based on the original patch by Konrad Rzeszutek Wilk.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
Changes from RFCv2:
- modify unmap_vcpu_info() to match both needs [Jan Beulich]
- support VCPUOP_reset_vcpu_info for current VCPU [Jan Beulich]
- improve description in public/vcpu.h [Jan Beulich]

Changes from RFCv1:
- Don't use unsuitable unmap_vcpu_info(), rewrite [Jan Beulich]
- Require FIFO ABI being in 2-level mode
- Describe limitations in include/public/vcpu.h [Jan Beulich]
---
 xen/arch/x86/hvm/hvm.c    |  1 +
 xen/common/domain.c       | 73 +++++++++++++++++++++++++++++++++++++++++------
 xen/include/public/vcpu.h | 19 ++++++++++++
 3 files changed, 85 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 216c3f2..7917272 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3332,6 +3332,7 @@ static long hvm_vcpu_op(
     case VCPUOP_set_singleshot_timer:
     case VCPUOP_stop_singleshot_timer:
     case VCPUOP_register_vcpu_info:
+    case VCPUOP_reset_vcpu_info:
     case VCPUOP_register_vcpu_time_memory_area:
         rc = do_vcpu_op(cmd, vcpuid, arg);
         break;
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 44e5cbe..9e10e1e 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -956,25 +956,52 @@ int map_vcpu_info(struct vcpu *v, unsigned long gfn, unsigned offset)
 }
 
 /*
- * Unmap the vcpu info page if the guest decided to place it somewhere
- * else.  This is only used from arch_domain_destroy, so there's no
- * need to do anything clever.
+ * Unmap the vcpu info page if the guest decided to place it somewhere else.
+ * There's no need to do anything clever when the domain is dying (when called
+ * from arch_domain_destroy) and we need to prepare for rebinding when it is not
+ * (in case VCPUOP_reset_vcpu_info was called).
  */
 void unmap_vcpu_info(struct vcpu *v)
 {
+    struct domain *d = v->domain;
     unsigned long mfn;
+    vcpu_info_t *old_info;
+    unsigned int i;
+
+    mfn = v->vcpu_info_mfn;
+    old_info = v->vcpu_info;
 
-    if ( v->vcpu_info_mfn == INVALID_MFN )
+    if ( mfn == INVALID_MFN )
         return;
 
-    mfn = v->vcpu_info_mfn;
+    if ( (!d->is_dying) && (v->vcpu_id < XEN_LEGACY_MAX_VCPUS) )
+    {
+        memcpy(&shared_info(d, vcpu_info[v->vcpu_id]), v->vcpu_info,
+               sizeof(vcpu_info_t));
+        v->vcpu_info = (vcpu_info_t *)&shared_info(d, vcpu_info[v->vcpu_id]);
+    }
+    else
+        v->vcpu_info = &dummy_vcpu_info;
     unmap_domain_page_global((void *)
-                             ((unsigned long)v->vcpu_info & PAGE_MASK));
+                             ((unsigned long)old_info & PAGE_MASK));
 
-    v->vcpu_info = &dummy_vcpu_info;
+    put_page_and_type(mfn_to_page(mfn));
     v->vcpu_info_mfn = INVALID_MFN;
 
-    put_page_and_type(mfn_to_page(mfn));
+    if ( (!d->is_dying) && (v->vcpu_id < XEN_LEGACY_MAX_VCPUS) &&
+         (test_bit(_VPF_down, &v->pause_flags)) )
+    {
+        /* Make sure vcpu_info was set */
+        smp_wmb();
+
+        /*
+         * Mark everything as being pending for all offline VCPUs to make sure
+         * nothing gets lost when this VCPU goes online again.
+         */
+        vcpu_info(v, evtchn_upcall_pending) = 1;
+        for ( i = 0; i < BITS_PER_EVTCHN_WORD(d); i++ )
+            set_bit(i, &vcpu_info(v, evtchn_pending_sel));
+    }
 }
 
 long do_vcpu_op(int cmd, int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
@@ -1116,6 +1143,36 @@ long do_vcpu_op(int cmd, int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+    case VCPUOP_reset_vcpu_info:
+    {
+
+        if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )
+        {
+            printk(XENLOG_G_WARNING "%pv: VCPU is up or not current, "
+                   "refusing to reset vcpu_info!\n", v);
+            return -EBUSY;
+        }
+
+        if ( v->evtchn_fifo )
+        {
+            printk(XENLOG_G_WARNING "%pv: FIFO evtchn ABI is being used, "
+                   "refusing to reset vcpu_info!\n", v);
+            return -EBUSY;
+        }
+
+        if ( v->vcpu_info_mfn == INVALID_MFN )
+        {
+            rc = 0;
+            break;
+        }
+
+        domain_lock(d);
+        unmap_vcpu_info(v);
+        domain_unlock(d);
+        rc = 0;
+        break;
+    }
+
     case VCPUOP_register_runstate_memory_area:
     {
         struct vcpu_register_runstate_memory_area area;
diff --git a/xen/include/public/vcpu.h b/xen/include/public/vcpu.h
index e888daf..c4d166c 100644
--- a/xen/include/public/vcpu.h
+++ b/xen/include/public/vcpu.h
@@ -227,6 +227,25 @@ struct vcpu_register_time_memory_area {
 typedef struct vcpu_register_time_memory_area vcpu_register_time_memory_area_t;
 DEFINE_XEN_GUEST_HANDLE(vcpu_register_time_memory_area_t);
 
+/*
+ * Reset all of the vcpu_info information from their previous location
+ * to the default one used at bootup. The following prerequisites should be met:
+ *  1. Domain should be using 2-level event channel ABI. In case another
+ *     non-default ABI is in use the domain is supposed to switch back to
+ *     2-level ABI with EVTCHNOP_reset.
+ *  2. The operation is unsupported for non-current online VCPUs. If performed
+ *     during shutdown/kexec/... a guest domain is supposed to behave in the
+ *     following sequence:
+ *     - Choose special 'shutdown' VCPU, bring all other VCPUs down;
+ *     - Issue VCPUOP_reset_vcpu_info for all offline VCPUs;
+ *     - Issue VCPUOP_reset_vcpu_info for the 'shutdown' VCPU.
+ *
+ * After the call vcpu_info is reset to its default state: first
+ * XEN_LEGACY_MAX_VCPUS VCPUs will get it switched to shared_info and all other
+ * VCPUs will get dummy_vcpu_info.
+ */
+#define VCPUOP_reset_vcpu_info   14
+
 #endif /* __XEN_PUBLIC_VCPU_H__ */
 
 /*
-- 
1.9.3


* Re: [PATCH v3 1/1] Introduce VCPUOP_reset_vcpu_info
  2014-08-19 10:04 ` [PATCH v3 1/1] " Vitaly Kuznetsov
@ 2014-08-19 10:21   ` Andrew Cooper
  2014-08-19 23:22   ` Jan Beulich
  1 sibling, 0 replies; 14+ messages in thread
From: Andrew Cooper @ 2014-08-19 10:21 UTC (permalink / raw)
  To: Vitaly Kuznetsov, xen-devel; +Cc: Andrew Jones, David Vrabel, Jan Beulich

On 19/08/14 11:04, Vitaly Kuznetsov wrote:
> When an SMP guest performs kexec/kdump it tries issuing VCPUOP_register_vcpu_info.
> This fails due to vcpu_info already being registered. Introduce new vcpu operation
> to reset vcpu_info to its default state.
>
> Based on the original patch by Konrad Rzeszutek Wilk.
>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
> Changes from RFCv2:
> - modify unmap_vcpu_info() to match both needs [Jan Beulich]
> - support VCPUOP_reset_vcpu_info for current VCPU [Jan Beulich]
> - improve description in public/vcpu.h [Jan Beulich]
>
> Changes from RFCv1:
> - Don't use unsuitable unmap_vcpu_info(), rewrite [Jan Beulich]
> - Require FIFO ABI being in 2-level mode
> - Describe limitations in include/public/vcpu.h [Jan Beulich]
> ---
>  xen/arch/x86/hvm/hvm.c    |  1 +
>  xen/common/domain.c       | 73 +++++++++++++++++++++++++++++++++++++++++------
>  xen/include/public/vcpu.h | 19 ++++++++++++
>  3 files changed, 85 insertions(+), 8 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 216c3f2..7917272 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -3332,6 +3332,7 @@ static long hvm_vcpu_op(
>      case VCPUOP_set_singleshot_timer:
>      case VCPUOP_stop_singleshot_timer:
>      case VCPUOP_register_vcpu_info:
> +    case VCPUOP_reset_vcpu_info:
>      case VCPUOP_register_vcpu_time_memory_area:
>          rc = do_vcpu_op(cmd, vcpuid, arg);
>          break;
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 44e5cbe..9e10e1e 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -956,25 +956,52 @@ int map_vcpu_info(struct vcpu *v, unsigned long gfn, unsigned offset)
>  }
>  
>  /*
> - * Unmap the vcpu info page if the guest decided to place it somewhere
> - * else.  This is only used from arch_domain_destroy, so there's no
> - * need to do anything clever.
> + * Unmap the vcpu info page if the guest decided to place it somewhere else.
> + * There's no need to do anything clever when the domain is dying (when called
> + * from arch_domain_destroy) and we need to prepare for rebinding when it is not
> + * (in case VCPUOP_reset_vcpu_info was called).
>   */
>  void unmap_vcpu_info(struct vcpu *v)
>  {
> +    struct domain *d = v->domain;
>      unsigned long mfn;
> +    vcpu_info_t *old_info;
> +    unsigned int i;
> +
> +    mfn = v->vcpu_info_mfn;
> +    old_info = v->vcpu_info;
>  
> -    if ( v->vcpu_info_mfn == INVALID_MFN )
> +    if ( mfn == INVALID_MFN )
>          return;
>  
> -    mfn = v->vcpu_info_mfn;
> +    if ( (!d->is_dying) && (v->vcpu_id < XEN_LEGACY_MAX_VCPUS) )
> +    {
> +        memcpy(&shared_info(d, vcpu_info[v->vcpu_id]), v->vcpu_info,
> +               sizeof(vcpu_info_t));
> +        v->vcpu_info = (vcpu_info_t *)&shared_info(d, vcpu_info[v->vcpu_id]);
> +    }
> +    else
> +        v->vcpu_info = &dummy_vcpu_info;
>      unmap_domain_page_global((void *)
> -                             ((unsigned long)v->vcpu_info & PAGE_MASK));
> +                             ((unsigned long)old_info & PAGE_MASK));
>  
> -    v->vcpu_info = &dummy_vcpu_info;
> +    put_page_and_type(mfn_to_page(mfn));
>      v->vcpu_info_mfn = INVALID_MFN;
>  
> -    put_page_and_type(mfn_to_page(mfn));
> +    if ( (!d->is_dying) && (v->vcpu_id < XEN_LEGACY_MAX_VCPUS) &&
> +         (test_bit(_VPF_down, &v->pause_flags)) )
> +    {
> +        /* Make sure vcpu_info was set */
> +        smp_wmb();
> +
> +        /*
> +         * Mark everything as being pending for all offline VCPUs to make sure
> +         * nothing gets lost when this VCPU goes online again.
> +         */
> +        vcpu_info(v, evtchn_upcall_pending) = 1;
> +        for ( i = 0; i < BITS_PER_EVTCHN_WORD(d); i++ )
> +            set_bit(i, &vcpu_info(v, evtchn_pending_sel));
> +    }
>  }
>  
>  long do_vcpu_op(int cmd, int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
> @@ -1116,6 +1143,36 @@ long do_vcpu_op(int cmd, int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
>          break;
>      }
>  
> +    case VCPUOP_reset_vcpu_info:
> +    {
> +
> +        if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )
> +        {
> +            printk(XENLOG_G_WARNING "%pv: VCPU is up or not current, "
> +                   "refusing to reset vcpu_info!\n", v);
> +            return -EBUSY;
> +        }
> +
> +        if ( v->evtchn_fifo )
> +        {
> +            printk(XENLOG_G_WARNING "%pv: FIFO evtchn ABI is being used, "
> +                   "refusing to reset vcpu_info!\n", v);
> +            return -EBUSY;
> +        }

These two checks should be in unmap_vcpu_info() similar to
map_vcpu_info(), and unmap_vcpu_info() should return an int.  I am not
sure whether the printks are really needed; a buggy/malicious guest is
very capable of causing logspam with them.

> +
> +        if ( v->vcpu_info_mfn == INVALID_MFN )
> +        {
> +            rc = 0;
> +            break;
> +        }

This check is not safe without the domain lock held.  Drop it entirely,
as the head of unmap_vcpu_info() already does it.

~Andrew

> +
> +        domain_lock(d);
> +        unmap_vcpu_info(v);
> +        domain_unlock(d);
> +        rc = 0;
> +        break;
> +    }
> +
>      case VCPUOP_register_runstate_memory_area:
>      {
>          struct vcpu_register_runstate_memory_area area;
> diff --git a/xen/include/public/vcpu.h b/xen/include/public/vcpu.h
> index e888daf..c4d166c 100644
> --- a/xen/include/public/vcpu.h
> +++ b/xen/include/public/vcpu.h
> @@ -227,6 +227,25 @@ struct vcpu_register_time_memory_area {
>  typedef struct vcpu_register_time_memory_area vcpu_register_time_memory_area_t;
>  DEFINE_XEN_GUEST_HANDLE(vcpu_register_time_memory_area_t);
>  
> +/*
> + * Reset all of the vcpu_info information from their previous location
> + * to the default one used at bootup. The following prerequisites should be met:
> + *  1. Domain should be using 2-level event channel ABI. In case another
> + *     non-default ABI is in use the domain is supposed to switch back to
> + *     2-level ABI with EVTCHNOP_reset.
> + *  2. The operation is unsupported for non-current online VCPUs. If performed
> + *     during shutdown/kexec/... a guest domain is supposed to behave in the
> + *     following sequence:
> + *     - Choose special 'shutdown' VCPU, bring all other VCPUs down;
> + *     - Issue VCPUOP_reset_vcpu_info for all offline VCPUs;
> + *     - Issue VCPUOP_reset_vcpu_info for the 'shutdown' VCPU.
> + *
> + * After the call vcpu_info is reset to its default state: first
> + * XEN_LEGACY_MAX_VCPUS VCPUs will get it switched to shared_info and all other
> + * VCPUs will get dummy_vcpu_info.
> + */
> +#define VCPUOP_reset_vcpu_info   14
> +
>  #endif /* __XEN_PUBLIC_VCPU_H__ */
>  
>  /*


* Re: [PATCH v3 0/1] Introduce VCPUOP_reset_vcpu_info
  2014-08-19 10:04 [PATCH v3 0/1] Introduce VCPUOP_reset_vcpu_info Vitaly Kuznetsov
  2014-08-19 10:04 ` [PATCH v3 1/1] " Vitaly Kuznetsov
@ 2014-08-19 18:59 ` David Vrabel
  2014-08-20  8:43   ` Vitaly Kuznetsov
  2014-08-20 13:37   ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 14+ messages in thread
From: David Vrabel @ 2014-08-19 18:59 UTC (permalink / raw)
  To: Vitaly Kuznetsov; +Cc: xen-devel, Andrew Jones, David Vrabel, Jan Beulich

On 19/08/14 11:04, Vitaly Kuznetsov wrote:
> The patch and guest code are based on the prototype by Konrad Rzeszutek Wilk.
>
> VCPUOP_reset_vcpu_info is required to support kexec performed by smp pvhvm
> guest. It was tested with the guest code listed below.

Instead of having the guest tear down all these bits of setup, I think
it would be preferable to have the toolstack build a new domain with the
same memory contents as the original VM. The toolstack would then
start this new domain at the kexec entry point.

The advantage of this is that you don't need to add new hypercall sub-ops
to tear down all the bits and pieces, both for existing stuff and for
anything new that might be added.

David


* Re: [PATCH v3 1/1] Introduce VCPUOP_reset_vcpu_info
  2014-08-19 10:04 ` [PATCH v3 1/1] " Vitaly Kuznetsov
  2014-08-19 10:21   ` Andrew Cooper
@ 2014-08-19 23:22   ` Jan Beulich
  1 sibling, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2014-08-19 23:22 UTC (permalink / raw)
  To: xen-devel, vkuznets; +Cc: drjones, david.vrabel

>>> Vitaly Kuznetsov <vkuznets@redhat.com> 08/19/14 2:20 PM >>>
>Changes from RFCv2:
>- modify unmap_vcpu_info() to match both needs [Jan Beulich]
>- support VCPUOP_reset_vcpu_info for current VCPU [Jan Beulich]

I don't think I had asked for this, and I'm not convinced this is correct to be
done. But anyway - see David's response for a better (more generic)
approach to this.

Jan


* Re: [PATCH v3 0/1] Introduce VCPUOP_reset_vcpu_info
  2014-08-19 18:59 ` [PATCH v3 0/1] " David Vrabel
@ 2014-08-20  8:43   ` Vitaly Kuznetsov
  2014-08-20 13:37   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 14+ messages in thread
From: Vitaly Kuznetsov @ 2014-08-20  8:43 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, Andrew Jones, David Vrabel, Jan Beulich

David Vrabel <dvrabel@cantab.net> writes:

> On 19/08/14 11:04, Vitaly Kuznetsov wrote:
>> The patch and guest code are based on the prototype by Konrad Rzeszutek Wilk.
>>
>> VCPUOP_reset_vcpu_info is required to support kexec performed by smp pvhvm
>> guest. It was tested with the guest code listed below.
>
> Instead of having the guest teardown all these bits of  setup.  I
> think it would be preferable to have the toolstack build a new domain
> with the same memory contents from the original VM.  The toolstack
> would then start this new domain at the kexec entry point.
>
> The advantage of this is you don't need to add new hypercall sub-ops
> to teardown all bits and pieces, both for existing stuff and for
> anything new that might be added.

I agree this might be the more general approach to kexec. However, I
also think that having 'paired' operations in the hypervisor is a nice
thing to have: e.g. if we have EVTCHNOP_init_control then we need an op
to switch back; if there is VCPUOP_register_vcpu_info there should
be a sort of VCPUOP_reset_vcpu_info...

The other question is how the toolstack can determine that an HVM guest
is performing kexec/kdump. I can imagine some sort of a special
toolstack-handled call with the new entry point as a parameter...

>
> David

-- 
  Vitaly


* Re: [PATCH v3 0/1] Introduce VCPUOP_reset_vcpu_info
  2014-08-19 18:59 ` [PATCH v3 0/1] " David Vrabel
  2014-08-20  8:43   ` Vitaly Kuznetsov
@ 2014-08-20 13:37   ` Konrad Rzeszutek Wilk
  2014-08-20 21:57     ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-08-20 13:37 UTC (permalink / raw)
  To: David Vrabel
  Cc: xen-devel, Vitaly Kuznetsov, Andrew Jones, Jan Beulich, David Vrabel

On Tue, Aug 19, 2014 at 07:59:52PM +0100, David Vrabel wrote:
> On 19/08/14 11:04, Vitaly Kuznetsov wrote:
> >The patch and guest code are based on the prototype by Konrad Rzeszutek Wilk.
> >
> >VCPUOP_reset_vcpu_info is required to support kexec performed by smp pvhvm
> >guest. It was tested with the guest code listed below.
> 
> Instead of having the guest teardown all these bits of  setup.  I think it
> would be preferable to have the toolstack build a new domain with the same
> memory contents from the original VM.  The toolstack would then start this
> new domain at the kexec entry point.

What about the kdump/crash case? We might crash at any time and would
want to start the kdump kernel, which hopefully can reset all of the VCPU
information such that it can boot with more than one VCPU.

> 
> The advantage of this is you don't need to add new hypercall sub-ops to
> teardown all bits and pieces, both for existing stuff and for anything new
> that might be added.

Sure, except that having setup and teardown paths provides nice
symmetrical states. Doing a 'kexec_guest' hypercall seems to be just
a workaround for that, giving up on the symmetry.

My feeling is that we really ought to have 'init' and 'teardown'
for every hypercall. That would also be good for testing the locking,
memory leaks, etc.

> 
> David
> 


* Re: [PATCH v3 0/1] Introduce VCPUOP_reset_vcpu_info
  2014-08-20 13:37   ` Konrad Rzeszutek Wilk
@ 2014-08-20 21:57     ` Konrad Rzeszutek Wilk
  2014-08-21 10:35       ` Vitaly Kuznetsov
  0 siblings, 1 reply; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-08-20 21:57 UTC (permalink / raw)
  To: David Vrabel
  Cc: xen-devel, Vitaly Kuznetsov, Andrew Jones, Jan Beulich, David Vrabel

On Wed, Aug 20, 2014 at 09:37:42AM -0400, Konrad Rzeszutek Wilk wrote:
> On Tue, Aug 19, 2014 at 07:59:52PM +0100, David Vrabel wrote:
> > On 19/08/14 11:04, Vitaly Kuznetsov wrote:
> > >The patch and guest code are based on the prototype by Konrad Rzeszutek Wilk.
> > >
> > >VCPUOP_reset_vcpu_info is required to support kexec performed by smp pvhvm
> > >guest. It was tested with the guest code listed below.
> > 
> > Instead of having the guest teardown all these bits of  setup.  I think it
> > would be preferable to have the toolstack build a new domain with the same
> > memory contents from the original VM.  The toolstack would then start this
> > new domain at the kexec entry point.
> 
> What about kdump case /crash case? We might crash at anytime and would
> want to start the kdump kernel which hopefully can reset all of the VCPU
> information such that it can boot with more than one VCPU.
> 
> > 
> > The advantage of this is you don't need to add new hypercall sub-ops to
> > teardown all bits and pieces, both for existing stuff and for anything new
> > that might be added.
> 
> Sure, except that having an setup and teardown paths provide a nice
> symetrical states. Doing an 'kexec_guest' hypercall seems to be just
> a workaround that and giving up on the symmetry.
> 
> My feeling is that we really ought to have 'init' and 'teardown'
> for every hypercall. That would also be good to test the locking, memory
> leaks, etc.

We had a big discussion today at the Xen Developer Summit about this.
The one-hypercall option has the appeal that it resets the
guest to its initial boot state (minus whatever memory got ballooned out).
Its semantics are similar to a SCHEDOP_shutdown hypercall, but it would
be a warm-reset type.

I think that by going that route, instead of chasing down the different
states (event channels, grants, vcpus, pagetables, etc.), we would
wipe everything to a nice clean slate. Maybe the hypercall argument
should be called tabula_rasa :-)

The reasons I like this instead of separate de-alloc hypercalls are:
 1) Cool name (tabula_rasa!)
 2) It would not require complicated code paths to iterate over and tear
    down grants, events, etc.
 3) It is one simple hypercall that could be used by kdump/kexec with an
    understanding of its semantics: the guest would continue executing after
    the hypercall, it might change to a different VCPU (so executing on
    VCPU5 but now we are at VCPU0), IDTs and GDTs are reset to their initial
    states, ditto for callbacks, etc. And of course it would work on both
    PVHVM and PV (and PVH).

Thoughts?


* Re: [PATCH v3 0/1] Introduce VCPUOP_reset_vcpu_info
  2014-08-20 21:57     ` Konrad Rzeszutek Wilk
@ 2014-08-21 10:35       ` Vitaly Kuznetsov
  2014-08-22  2:27         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 14+ messages in thread
From: Vitaly Kuznetsov @ 2014-08-21 10:35 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: David Vrabel, xen-devel, Andrew Jones, David Vrabel, Jan Beulich

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:

> On Wed, Aug 20, 2014 at 09:37:42AM -0400, Konrad Rzeszutek Wilk wrote:
>> On Tue, Aug 19, 2014 at 07:59:52PM +0100, David Vrabel wrote:
>> > On 19/08/14 11:04, Vitaly Kuznetsov wrote:
>> > >The patch and guest code are based on the prototype by Konrad Rzeszutek Wilk.
>> > >
>> > >VCPUOP_reset_vcpu_info is required to support kexec performed by smp pvhvm
>> > >guest. It was tested with the guest code listed below.
>> > 
>> > Instead of having the guest teardown all these bits of  setup.  I think it
>> > would be preferable to have the toolstack build a new domain with the same
>> > memory contents from the original VM.  The toolstack would then start this
>> > new domain at the kexec entry point.
>> 
>> What about kdump case /crash case? We might crash at anytime and would
>> want to start the kdump kernel which hopefully can reset all of the VCPU
>> information such that it can boot with more than one VCPU.
>> 
>> > 
>> > The advantage of this is you don't need to add new hypercall sub-ops to
>> > teardown all bits and pieces, both for existing stuff and for anything new
>> > that might be added.
>> 
>> Sure, except that having an setup and teardown paths provide a nice
>> symetrical states. Doing an 'kexec_guest' hypercall seems to be just
>> a workaround that and giving up on the symmetry.
>> 
>> My feeling is that we really ought to have 'init' and 'teardown'
>> for every hypercall. That would also be good to test the locking, memory
>> leaks, etc.
>
> We had a big discussion today at the Xen Developer to talk about it.

I regret I missed this one :-)

> The one hypercall option has the appeal that it will reset the 
> guest to the initial boot state (minues whatever memory got ballooned out).
> The semantics of it are similar to a SCHEDOP_shutdown hypercall, but it would
> be a warm_reset type.
>
> I think that going that route and instead of chasing down different
> states (event channels, grants, vcpu's, pagetables, etc) we would
> wipe everything to a nice clean slate. Maybe the hypercall argument
> should be called tabula_rasa :-)
>
> The reason I like this instead of doing a seperate de-alloc hypercalls are:
>  1) Cool name (tabule_rasa!)
>  2) It would not require complicated code paths to iterate for tearing
>     down grants, events, etc.
>  3). It is one simple hypercall that could be used by kdump/kexec with an
>     understanding of its semantics: it would continue executing after this
>     hypercall, it might change the VCPU to a different one (so executing on
>     vCPU5 but now we are at VCPU0), IDT and GDTs are reset to their initial
>     states, ditto on callbacks, etc. And of course work on both PVHVM and PV(and PVH).
>
> Thoughts?

I think we don't necessary need new hypercall, new reason code for
SCHEDOP_shutdown should work (cool name can go there :-). The op we want
to implement is very similar to rename-restart, we need to copy ram and
vcpu contexts before destroying old domain.

Recreating the domain and copying all its memory should work, but it requires
the host to have free memory, which can be an issue for large guests. If we
try implementing 'reassigning' of memory without making a copy, that can
lead to the same issues we have now: mapped grants, shared info, ...

I can try this approach, but it's really hard for me to predict how long
it's going to take me. On the other hand, resetting vcpu_info *should*
be enough for the majority of use cases, so we'll have 'full kexec
support' relatively fast.

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 0/1] Introduce VCPUOP_reset_vcpu_info
  2014-08-21 10:35       ` Vitaly Kuznetsov
@ 2014-08-22  2:27         ` Konrad Rzeszutek Wilk
  2014-08-22  9:08           ` Jan Beulich
  2014-08-25 13:50           ` Vitaly Kuznetsov
  0 siblings, 2 replies; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-08-22  2:27 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: David Vrabel, xen-devel, Andrew Jones, David Vrabel, Jan Beulich

On Thu, Aug 21, 2014 at 12:35:36PM +0200, Vitaly Kuznetsov wrote:
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
> 
> > On Wed, Aug 20, 2014 at 09:37:42AM -0400, Konrad Rzeszutek Wilk wrote:
> >> On Tue, Aug 19, 2014 at 07:59:52PM +0100, David Vrabel wrote:
> >> > On 19/08/14 11:04, Vitaly Kuznetsov wrote:
> >> > >The patch and guest code are based on the prototype by Konrad Rzeszutek Wilk.
> >> > >
> >> > >VCPUOP_reset_vcpu_info is required to support kexec performed by smp pvhvm
> >> > >guest. It was tested with the guest code listed below.
> >> > 
> >> > Instead of having the guest tear down all these bits of setup, I think it
> >> > would be preferable to have the toolstack build a new domain with the same
> >> > memory contents as the original VM.  The toolstack would then start this
> >> > new domain at the kexec entry point.
> >> 
> >> What about the kdump/crash case? We might crash at any time and would
> >> want to start the kdump kernel, which hopefully can reset all of the VCPU
> >> information such that it can boot with more than one VCPU.
> >> 
> >> > 
> >> > The advantage of this is you don't need to add new hypercall sub-ops to
> >> > tear down all the bits and pieces, both for existing stuff and for anything new
> >> > that might be added.
> >> 
> >> Sure, except that having setup and teardown paths provides nice
> >> symmetrical states. Doing a 'kexec_guest' hypercall seems to be just
> >> a workaround for that, giving up on the symmetry.
> >> 
> >> My feeling is that we really ought to have 'init' and 'teardown'
> >> for every hypercall. That would also be good to test the locking, memory
> >> leaks, etc.
> >
> > We had a big discussion today at the Xen Developer Summit about it.
> 
> I regret I missed this one :-)

<nods> It would have been good to have had you here. Would it be possible
for you to be present at the future Xen Hackathons?

> 
> > The one hypercall option has the appeal that it will reset the 
> > guest to the initial boot state (minus whatever memory got ballooned out).
> > The semantics of it are similar to a SCHEDOP_shutdown hypercall, but it would
> > be a warm_reset type.
> >
> > I think that going that route, instead of chasing down different
> > states (event channels, grants, vCPUs, pagetables, etc.), we would
> > wipe everything to a nice clean slate. Maybe the hypercall argument
> > should be called tabula_rasa :-)
> >
> > The reasons I like this instead of doing separate de-alloc hypercalls are:
> >  1) Cool name (tabula_rasa!)
> >  2) It would not require complicated code paths to iterate for tearing
> >     down grants, events, etc.
> >  3) It is one simple hypercall that could be used by kdump/kexec with an
> >     understanding of its semantics: it would continue executing after this
> >     hypercall, it might change the VCPU to a different one (so executing on
> >     VCPU5 but now we are at VCPU0), IDTs and GDTs are reset to their initial
> >     states, ditto on callbacks, etc. And of course it would work on both PVHVM and PV (and PVH).
> >
> > Thoughts?
> 
> I don't think we necessarily need a new hypercall; a new reason code for
> SCHEDOP_shutdown should work (the cool name can go there :-). The op we want
> to implement is very similar to rename-restart: we need to copy RAM and
> vCPU contexts before destroying the old domain.

<nods>
> 
> Recreating the domain and copying all its memory should work, but it requires
> the host to have free memory, which can be an issue for large guests. If we
> try implementing 'reassigning' of memory without making a copy, that can
> lead to the same issues we have now: mapped grants, shared info, ...

That is a good point. Especially with 512GB guests. David, Jan, thoughts?

> 
> I can try this approach, but it's really hard for me to predict how long
> it's going to take me. On the other hand, resetting vcpu_info *should*
> be enough for the majority of use cases, so we'll have 'full kexec
> support' relatively fast.

I think you are going to hit the grant usage as well, but perhaps not?

> 
> -- 
>   Vitaly


* Re: [PATCH v3 0/1] Introduce VCPUOP_reset_vcpu_info
  2014-08-22  2:27         ` Konrad Rzeszutek Wilk
@ 2014-08-22  9:08           ` Jan Beulich
  2014-08-22 12:41             ` David Vrabel
  2014-08-25 13:50           ` Vitaly Kuznetsov
  1 sibling, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2014-08-22  9:08 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Vitaly Kuznetsov
  Cc: David Vrabel, xen-devel, Andrew Jones, David Vrabel

>>> On 22.08.14 at 04:27, <konrad.wilk@oracle.com> wrote:
> On Thu, Aug 21, 2014 at 12:35:36PM +0200, Vitaly Kuznetsov wrote:
>> Recreating the domain and copying all its memory should work, but it requires
>> the host to have free memory, which can be an issue for large guests. If we
>> try implementing 'reassigning' of memory without making a copy, that can
>> lead to the same issues we have now: mapped grants, shared info, ...
> 
> That is a good point. Especially with 512GB guests. David, Jan, thoughts?

No, the idea was really to re-use the memory rather than copy it.
Why would active grants or the use of shared info be a problem
(and particularly one worse than with the vCPU-info-reset
approach)?

Jan


* Re: [PATCH v3 0/1] Introduce VCPUOP_reset_vcpu_info
  2014-08-22  9:08           ` Jan Beulich
@ 2014-08-22 12:41             ` David Vrabel
  2014-08-22 13:23               ` Jan Beulich
  0 siblings, 1 reply; 14+ messages in thread
From: David Vrabel @ 2014-08-22 12:41 UTC (permalink / raw)
  To: Jan Beulich, Konrad Rzeszutek Wilk, Vitaly Kuznetsov
  Cc: David Vrabel, xen-devel, Andrew Jones

On 22/08/14 10:08, Jan Beulich wrote:
>>>> On 22.08.14 at 04:27, <konrad.wilk@oracle.com> wrote:
>> On Thu, Aug 21, 2014 at 12:35:36PM +0200, Vitaly Kuznetsov wrote:
>>> Recreating the domain and copying all its memory should work, but it requires
>>> the host to have free memory, which can be an issue for large guests. If we
>>> try implementing 'reassigning' of memory without making a copy, that can
>>> lead to the same issues we have now: mapped grants, shared info, ...
>>
>> That is a good point. Especially with 512GB guests. David, Jan, thoughts?
> 
> No, the idea was really to re-use the memory rather than copy it.
> Why would active grants or the use of shared info be a problem
> (and particularly one worse than with the vCPU-info-reset
> approach)?

An initial prototype that copies the memory may be a useful first step
as this will be straightforward (most of the bits can be borrowed from
save/restore).

If the domain has mapped granted pages then the new domain should not
retain the mappings (otherwise you will end up with a domain having
mappings of a grant that does not agree with the domain in the granter's
grant table).

If the domain has granted pages, it should probably copy those pages and
not reuse them (because updating the map tracking info is probably
non-trivial).

David


* Re: [PATCH v3 0/1] Introduce VCPUOP_reset_vcpu_info
  2014-08-22 12:41             ` David Vrabel
@ 2014-08-22 13:23               ` Jan Beulich
  0 siblings, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2014-08-22 13:23 UTC (permalink / raw)
  To: David Vrabel, Konrad Rzeszutek Wilk, Vitaly Kuznetsov
  Cc: David Vrabel, xen-devel, Andrew Jones

>>> On 22.08.14 at 14:41, <david.vrabel@citrix.com> wrote:
> If the domain has mapped granted pages then the new domain should not
> retain the mappings (otherwise you will end up with a domain having
> mappings of a grant that does not agree with the domain in the granter's
> grant table).

Of course.

> If the domain has granted pages, it should probably copy those pages and
> not reuse them (because updating the map tracking info is probably
> non-trivial).

I think grants should be dropped in any event (and if the toolstack
establishes any, it should re-establish them). Those that have a
mapping elsewhere might indeed need to have their pages copied
if updating the maptrack is too cumbersome.

Jan


* Re: [PATCH v3 0/1] Introduce VCPUOP_reset_vcpu_info
  2014-08-22  2:27         ` Konrad Rzeszutek Wilk
  2014-08-22  9:08           ` Jan Beulich
@ 2014-08-25 13:50           ` Vitaly Kuznetsov
  1 sibling, 0 replies; 14+ messages in thread
From: Vitaly Kuznetsov @ 2014-08-25 13:50 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: David Vrabel, xen-devel, Andrew Jones, David Vrabel, Jan Beulich

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:

> On Thu, Aug 21, 2014 at 12:35:36PM +0200, Vitaly Kuznetsov wrote:
>> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
>> 
>> > On Wed, Aug 20, 2014 at 09:37:42AM -0400, Konrad Rzeszutek Wilk wrote:
>> >> On Tue, Aug 19, 2014 at 07:59:52PM +0100, David Vrabel wrote:
>> >> > On 19/08/14 11:04, Vitaly Kuznetsov wrote:
>> >> > >The patch and guest code are based on the prototype by Konrad Rzeszutek Wilk.
>> >> > >
>> >> > >VCPUOP_reset_vcpu_info is required to support kexec performed by smp pvhvm
>> >> > >guest. It was tested with the guest code listed below.
>> >> > 
>> >> > Instead of having the guest tear down all these bits of setup, I think it
>> >> > would be preferable to have the toolstack build a new domain with the same
>> >> > memory contents as the original VM.  The toolstack would then start this
>> >> > new domain at the kexec entry point.
>> >> 
>> >> What about the kdump/crash case? We might crash at any time and would
>> >> want to start the kdump kernel, which hopefully can reset all of the VCPU
>> >> information such that it can boot with more than one VCPU.
>> >> 
>> >> > 
>> >> > The advantage of this is you don't need to add new hypercall sub-ops to
>> >> > tear down all the bits and pieces, both for existing stuff and for anything new
>> >> > that might be added.
>> >> 
>> >> Sure, except that having setup and teardown paths provides nice
>> >> symmetrical states. Doing a 'kexec_guest' hypercall seems to be just
>> >> a workaround for that, giving up on the symmetry.
>> >> 
>> >> My feeling is that we really ought to have 'init' and 'teardown'
>> >> for every hypercall. That would also be good to test the locking, memory
>> >> leaks, etc.
>> >
>> > We had a big discussion today at the Xen Developer Summit about it.
>> 
>> I regret I missed this one :-)
>
> <nods> It would have been good to have had you here. Would it be possible
> for you to be present at the future Xen Hackathons?
>

Sure, why not, especially if these hackathons are held somewhere in Europe :)

>> 
>> > The one hypercall option has the appeal that it will reset the 
>> > guest to the initial boot state (minus whatever memory got ballooned out).
>> > The semantics of it are similar to a SCHEDOP_shutdown hypercall, but it would
>> > be a warm_reset type.
>> >
>> > I think that going that route, instead of chasing down different
>> > states (event channels, grants, vCPUs, pagetables, etc.), we would
>> > wipe everything to a nice clean slate. Maybe the hypercall argument
>> > should be called tabula_rasa :-)
>> >
>> > The reasons I like this instead of doing separate de-alloc hypercalls are:
>> >  1) Cool name (tabula_rasa!)
>> >  2) It would not require complicated code paths to iterate for tearing
>> >     down grants, events, etc.
>> >  3) It is one simple hypercall that could be used by kdump/kexec with an
>> >     understanding of its semantics: it would continue executing after this
>> >     hypercall, it might change the VCPU to a different one (so executing on
>> >     VCPU5 but now we are at VCPU0), IDTs and GDTs are reset to their initial
>> >     states, ditto on callbacks, etc. And of course it would work on both PVHVM and PV (and PVH).
>> >
>> > Thoughts?
>> 
>> I don't think we necessarily need a new hypercall; a new reason code for
>> SCHEDOP_shutdown should work (the cool name can go there :-). The op we want
>> to implement is very similar to rename-restart: we need to copy RAM and
>> vCPU contexts before destroying the old domain.
>
> <nods>
>> 
>> Recreating the domain and copying all its memory should work, but it requires
>> the host to have free memory, which can be an issue for large guests. If we
>> try implementing 'reassigning' of memory without making a copy, that can
>> lead to the same issues we have now: mapped grants, shared info, ...

I just sent my 'copy based approach' series as a WIP/RFC, and I'm going to
take a look at how we can implement memory re-assigning.

-- 
  Vitaly


end of thread, other threads:[~2014-08-25 13:50 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-19 10:04 [PATCH v3 0/1] Introduce VCPUOP_reset_vcpu_info Vitaly Kuznetsov
2014-08-19 10:04 ` [PATCH v3 1/1] " Vitaly Kuznetsov
2014-08-19 10:21   ` Andrew Cooper
2014-08-19 23:22   ` Jan Beulich
2014-08-19 18:59 ` [PATCH v3 0/1] " David Vrabel
2014-08-20  8:43   ` Vitaly Kuznetsov
2014-08-20 13:37   ` Konrad Rzeszutek Wilk
2014-08-20 21:57     ` Konrad Rzeszutek Wilk
2014-08-21 10:35       ` Vitaly Kuznetsov
2014-08-22  2:27         ` Konrad Rzeszutek Wilk
2014-08-22  9:08           ` Jan Beulich
2014-08-22 12:41             ` David Vrabel
2014-08-22 13:23               ` Jan Beulich
2014-08-25 13:50           ` Vitaly Kuznetsov
