All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFCv2 0/1] Introduce VCPUOP_reset_vcpu_info
@ 2014-08-14  8:28 Vitaly Kuznetsov
  2014-08-14  8:28 ` [PATCH RFCv2 1/1] " Vitaly Kuznetsov
  0 siblings, 1 reply; 5+ messages in thread
From: Vitaly Kuznetsov @ 2014-08-14  8:28 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Jones, Stefano Stabellini, David Vrabel, Jan Beulich

The patch and guest code are based on the prototype by Konrad Rzeszutek Wilk.

VCPUOP_reset_vcpu_info is required to support kexec performed by smp pvhvm
guest. It was tested with the following guest code:

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 4fd979e..6e8021c 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -183,8 +183,6 @@ static void xen_vcpu_setup(int cpu)
 	 * This path is called twice on PVHVM - first during bootup via
 	 * smp_init -> xen_hvm_cpu_notify, and then if the VCPU is being
 	 * hotplugged: cpu_up -> xen_hvm_cpu_notify.
-	 * As we can only do the VCPUOP_register_vcpu_info once lets
-	 * not over-write its result.
 	 *
 	 * For PV it is called during restore (xen_vcpu_restore) and bootup
 	 * (xen_setup_vcpu_info_placement). The hotplug mechanism does not
@@ -210,10 +208,6 @@ static void xen_vcpu_setup(int cpu)
 	/* Check to see if the hypervisor will put the vcpu_info
 	   structure where we want it, which allows direct access via
 	   a percpu-variable.
-	   N.B. This hypercall can _only_ be called once per CPU. Subsequent
-	   calls will error out with -EINVAL. This is due to the fact that
-	   hypervisor has no unregister variant and this hypercall does not
-	   allow to over-write info.mfn and info.offset.
 	 */
 	err = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, cpu, &info);
 
@@ -228,6 +222,22 @@ static void xen_vcpu_setup(int cpu)
 	}
 }
 
+void xen_teardown_vcpu_setup(int cpu)
+{
+	int err;
+
+	if (!have_vcpu_info_placement)
+		return;
+
+	err = HYPERVISOR_vcpu_op(VCPUOP_reset_vcpu_info, cpu, NULL);
+	if (err) {
+		xen_raw_printk("%s: VCPUOP_reset_vcpu_info rc: %d\n", __func__, err);
+		return;
+	}
+	if (cpu < MAX_VIRT_CPUS)
+		per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu];
+}
+
 /*
  * On restore, set the vcpu placement up again.
  * If it fails, then we're in a bad state, since
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index bc5e897..7c39a38 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -769,10 +769,24 @@ static void xen_hvm_cpu_die(unsigned int cpu)
 #ifdef CONFIG_KEXEC
 void xen_kexec_shutdown(void)
 {
+	int cpu;
+	cpumask_var_t cpu_offline_mask;
+
 	if (!kexec_in_progress)
 		return;
 
+	gnttab_suspend();
+
+	/* Stop all CPUs except for the first one */
+	disable_nonboot_cpus();
+
 	xen_hvm_reset_eventchannels();
+
+	cpumask_andnot(cpu_offline_mask, cpu_present_mask,
+		       cpu_online_mask);
+
+	for_each_cpu(cpu, cpu_offline_mask)
+		xen_teardown_vcpu_setup(cpu);
 }
 #endif
 
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index d083e82..36dd380 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -53,6 +53,7 @@ void xen_init_irq_ops(void);
 void xen_setup_timer(int cpu);
 void xen_setup_runstate_info(int cpu);
 void xen_teardown_timer(int cpu);
+void xen_teardown_vcpu_setup(int cpu);
 cycle_t xen_clocksource_read(void);
 void xen_setup_cpu_clockevents(void);
 void __init xen_init_time_ops(void);
diff --git a/include/xen/interface/vcpu.h b/include/xen/interface/vcpu.h
index b05288c..a0a374c 100644
--- a/include/xen/interface/vcpu.h
+++ b/include/xen/interface/vcpu.h
@@ -172,4 +172,6 @@ DEFINE_GUEST_HANDLE_STRUCT(vcpu_register_vcpu_info);
 
 /* Send an NMI to the specified VCPU. @extra_arg == NULL. */
 #define VCPUOP_send_nmi             11
+
+#define VCPUOP_reset_vcpu_info      14
 #endif /* __XEN_PUBLIC_VCPU_H__ */

Vitaly Kuznetsov (1):
  Introduce VCPUOP_reset_vcpu_info

 xen/arch/x86/hvm/hvm.c    |  1 +
 xen/common/domain.c       | 61 +++++++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/vcpu.h | 14 +++++++++++
 3 files changed, 76 insertions(+)

-- 
1.9.3

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH RFCv2 1/1] Introduce VCPUOP_reset_vcpu_info
  2014-08-14  8:28 [PATCH RFCv2 0/1] Introduce VCPUOP_reset_vcpu_info Vitaly Kuznetsov
@ 2014-08-14  8:28 ` Vitaly Kuznetsov
  2014-08-14 23:25   ` Jan Beulich
  0 siblings, 1 reply; 5+ messages in thread
From: Vitaly Kuznetsov @ 2014-08-14  8:28 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Jones, Stefano Stabellini, David Vrabel, Jan Beulich

When an SMP guest performs kexec/kdump it tries issuing VCPUOP_register_vcpu_info.
This fails due to vcpu_info already being registered. Introduce new vcpu operation
to reset vcpu_info to its default state.

Based on the original patch by Konrad Rzeszutek Wilk.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
Changes from RFCv1:
- Don't use unsuitable unmap_vcpu_info(), rewrite [Jan Beulich]
- Require FIFO ABI being in 2-level mode
- Describe limitations in include/public/vcpu.h [Jan Beulich]
---
 xen/arch/x86/hvm/hvm.c    |  1 +
 xen/common/domain.c       | 61 +++++++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/vcpu.h | 14 +++++++++++
 3 files changed, 76 insertions(+)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 216c3f2..7917272 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3332,6 +3332,7 @@ static long hvm_vcpu_op(
     case VCPUOP_set_singleshot_timer:
     case VCPUOP_stop_singleshot_timer:
     case VCPUOP_register_vcpu_info:
+    case VCPUOP_reset_vcpu_info:
     case VCPUOP_register_vcpu_time_memory_area:
         rc = do_vcpu_op(cmd, vcpuid, arg);
         break;
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 44e5cbe..889c313 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1116,6 +1116,67 @@ long do_vcpu_op(int cmd, int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+    case VCPUOP_reset_vcpu_info:
+    {
+        vcpu_info_t *old_info;
+        unsigned int i;
+
+        if ( !test_bit(_VPF_down, &v->pause_flags) )
+        {
+            printk(XENLOG_G_WARNING
+                   "%pv: VCPU is up, refusing to reset vcpu_info!\n", v);
+            return -EBUSY;
+        }
+
+        if ( v->evtchn_fifo )
+        {
+            printk(XENLOG_G_WARNING "%pv: FIFO evtchn ABI is being used,"
+                   "refusing to reset vcpu_info!\n", v);
+            return -EBUSY;
+        }
+
+        if ( v->vcpu_info_mfn == INVALID_MFN )
+        {
+            rc = 0;
+            break;
+        }
+
+        domain_lock(d);
+        old_info = v->vcpu_info;
+
+        if ( vcpuid < XEN_LEGACY_MAX_VCPUS )
+        {
+            memcpy(&shared_info(d, vcpu_info[vcpuid]), v->vcpu_info,
+                   sizeof(vcpu_info_t));
+            v->vcpu_info = (vcpu_info_t *)&shared_info(d, vcpu_info[vcpuid]);
+        }
+        else
+        {
+            v->vcpu_info = &dummy_vcpu_info;
+        }
+        unmap_domain_page_global((void *)
+                                 ((unsigned long)old_info & PAGE_MASK));
+
+        put_page_and_type(mfn_to_page(v->vcpu_info_mfn));
+        v->vcpu_info_mfn = INVALID_MFN;
+
+        /* Make sure vcpu_info was set */
+        smp_wmb();
+
+        if ( vcpuid < XEN_LEGACY_MAX_VCPUS )
+        {
+            /*
+             * Mark everything as being pending to make sure nothing gets lost.
+             */
+            vcpu_info(v, evtchn_upcall_pending) = 1;
+            for ( i = 0; i < BITS_PER_EVTCHN_WORD(d); i++ )
+                set_bit(i, &vcpu_info(v, evtchn_pending_sel));
+        }
+        domain_unlock(d);
+        rc = 0;
+        break;
+    }
+
     case VCPUOP_register_runstate_memory_area:
     {
         struct vcpu_register_runstate_memory_area area;
diff --git a/xen/include/public/vcpu.h b/xen/include/public/vcpu.h
index e888daf..81bc1b9 100644
--- a/xen/include/public/vcpu.h
+++ b/xen/include/public/vcpu.h
@@ -227,6 +227,20 @@ struct vcpu_register_time_memory_area {
 typedef struct vcpu_register_time_memory_area vcpu_register_time_memory_area_t;
 DEFINE_XEN_GUEST_HANDLE(vcpu_register_time_memory_area_t);
 
+/*
+ * Reset all of the vcpu_info information from their previous location
+ * to the default one used at bootup. The following prerequisites should be met:
+ *  1. VCPU should be switched off. This means the operation is unsupported for
+ *     boot VCPU.
+ *  2. Domain should not be using FIFO-based event channel ABI. In case it is in
+ *     use domain is supposed to switch back to 2-level ABI with EVTCHNOP_reset.
+ *
+ * After the call vcpu_info is reset to its default state: first
+ * XEN_LEGACY_MAX_VCPUS VCPUs will get it switched to shared_info and all other
+ * VCPUs will get dummy_vcpu_info.
+ */
+#define VCPUOP_reset_vcpu_info   14
+
 #endif /* __XEN_PUBLIC_VCPU_H__ */
 
 /*
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH RFCv2 1/1] Introduce VCPUOP_reset_vcpu_info
  2014-08-14  8:28 ` [PATCH RFCv2 1/1] " Vitaly Kuznetsov
@ 2014-08-14 23:25   ` Jan Beulich
  2014-08-15  9:55     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Beulich @ 2014-08-14 23:25 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: xen-devel, Andrew Jones, David Vrabel, Stefano Stabellini

>>> On 14.08.14 at 10:28, <vkuznets@redhat.com> wrote:
> Changes from RFCv1:
> - Don't use unsuitable unmap_vcpu_info(), rewrite [Jan Beulich]

I don't think I said that - you now basically open code most what
is in that function. Instead I was rather hinting towards adding a
flag to the function to adjust the function's behavior to the then
two different purposes.

> --- a/xen/include/public/vcpu.h
> +++ b/xen/include/public/vcpu.h
> @@ -227,6 +227,20 @@ struct vcpu_register_time_memory_area {
>  typedef struct vcpu_register_time_memory_area vcpu_register_time_memory_area_t;
>  DEFINE_XEN_GUEST_HANDLE(vcpu_register_time_memory_area_t);
>  
> +/*
> + * Reset all of the vcpu_info information from their previous location
> + * to the default one used at bootup. The following prerequisites should be met:
> + *  1. VCPU should be switched off. This means the operation is unsupported for
> + *     boot VCPU.

Boot vCPU? I don't think this should be written more restrictively
than it is. Just the one CPU doing the resets can't be resetting
itself. With some effort I think one could even get a reset for all
of them working.

> + *  2. Domain should not be using FIFO-based event channel ABI. In case it is in
> + *     use domain is supposed to switch back to 2-level ABI with EVTCHNOP_reset.

I'd prefer this to be worded more generically: Avoid mentioning
the FIFO ABI, and just require resetting _any_ (current or future)
non-default event channel ABI to be switched back to 2-level.

Jan

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH RFCv2 1/1] Introduce VCPUOP_reset_vcpu_info
  2014-08-14 23:25   ` Jan Beulich
@ 2014-08-15  9:55     ` Vitaly Kuznetsov
  2014-08-15 15:07       ` Jan Beulich
  0 siblings, 1 reply; 5+ messages in thread
From: Vitaly Kuznetsov @ 2014-08-15  9:55 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Jones, David Vrabel, Stefano Stabellini

"Jan Beulich" <JBeulich@suse.com> writes:

>>>> On 14.08.14 at 10:28, <vkuznets@redhat.com> wrote:
>> Changes from RFCv1:
>> - Don't use unsuitable unmap_vcpu_info(), rewrite [Jan Beulich]
>
> I don't think I said that - you now basically open code most what
> is in that function. Instead I was rather hinting towards adding a
> flag to the function to adjust the function's behavior to the then
> two different purposes.
>

Sorry I misunderstood.. I'll rewrite and send as first non-RFC.

>> --- a/xen/include/public/vcpu.h
>> +++ b/xen/include/public/vcpu.h
>> @@ -227,6 +227,20 @@ struct vcpu_register_time_memory_area {
>>  typedef struct vcpu_register_time_memory_area vcpu_register_time_memory_area_t;
>>  DEFINE_XEN_GUEST_HANDLE(vcpu_register_time_memory_area_t);
>>  
>> +/*
>> + * Reset all of the vcpu_info information from their previous location
>> + * to the default one used at bootup. The following prerequisites should be met:
>> + *  1. VCPU should be switched off. This means the operation is unsupported for
>> + *     boot VCPU.
>
> Boot vCPU? I don't think this should be written more restrictively
> than it is. Just the one CPU doing the resets can't be resetting
> itself. With some effort I think one could even get a reset for all
> of them working.

I thought about something simillar to what we have in map_vcpu_info:
allow the operation for all offline VCPUs and the current one. But then
I realized that boot VCPU doesn't perform VCPUOP_register_vcpu_info in
current linux kernel so there is no need to unregister atm..

I'll play with it a bit more.

>
>> + *  2. Domain should not be using FIFO-based event channel ABI. In case it is in
>> + *     use domain is supposed to switch back to 2-level ABI with EVTCHNOP_reset.
>
> I'd prefer this to be worded more generically: Avoid mentioning
> the FIFO ABI, and just require resetting _any_ (current or future)
> non-default event channel ABI to be switched back to 2-level.

Sure, thanks!

>
> Jan

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH RFCv2 1/1] Introduce VCPUOP_reset_vcpu_info
  2014-08-15  9:55     ` Vitaly Kuznetsov
@ 2014-08-15 15:07       ` Jan Beulich
  0 siblings, 0 replies; 5+ messages in thread
From: Jan Beulich @ 2014-08-15 15:07 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: xen-devel, Andrew Jones, David Vrabel, Stefano Stabellini

>>> On 15.08.14 at 11:55, <vkuznets@redhat.com> wrote:
> "Jan Beulich" <JBeulich@suse.com> writes:
>>>>> On 14.08.14 at 10:28, <vkuznets@redhat.com> wrote:
>>> --- a/xen/include/public/vcpu.h
>>> +++ b/xen/include/public/vcpu.h
>>> @@ -227,6 +227,20 @@ struct vcpu_register_time_memory_area {
>>>  typedef struct vcpu_register_time_memory_area vcpu_register_time_memory_area_t;
>>>  DEFINE_XEN_GUEST_HANDLE(vcpu_register_time_memory_area_t);
>>>  
>>> +/*
>>> + * Reset all of the vcpu_info information from their previous location
>>> + * to the default one used at bootup. The following prerequisites should be met:
>>> + *  1. VCPU should be switched off. This means the operation is unsupported for
>>> + *     boot VCPU.
>>
>> Boot vCPU? I don't think this should be written more restrictively
>> than it is. Just the one CPU doing the resets can't be resetting
>> itself. With some effort I think one could even get a reset for all
>> of them working.
> 
> I thought about something simillar to what we have in map_vcpu_info:
> allow the operation for all offline VCPUs and the current one. But then
> I realized that boot VCPU doesn't perform VCPUOP_register_vcpu_info in
> current linux kernel so there is no need to unregister atm..

But you always ought to remember that when your changing
hypervisor code or the public interface (including comments) that
nothing here should b Linux-centric, i.e. you want to describe
abstract behavior here, not how Linux might be making use of it.

Jan

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2014-08-15 15:07 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-14  8:28 [PATCH RFCv2 0/1] Introduce VCPUOP_reset_vcpu_info Vitaly Kuznetsov
2014-08-14  8:28 ` [PATCH RFCv2 1/1] " Vitaly Kuznetsov
2014-08-14 23:25   ` Jan Beulich
2014-08-15  9:55     ` Vitaly Kuznetsov
2014-08-15 15:07       ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.