linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] xen: vcpu_info reinit error after 'xl save -c' & 'xl restore' on PVOPS VM which has multi-cpu
@ 2015-04-24  9:30 Ouyang Zhaowei (Charles)
  2015-04-25 23:31 ` Boris Ostrovsky
  0 siblings, 1 reply; 7+ messages in thread
From: Ouyang Zhaowei (Charles) @ 2015-04-24  9:30 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Boris Ostrovsky, David Vrabel
  Cc: linux-kernel, Dingweiping, Yanqiangjun, jinjian, Ouyangzhaowei (Charles)

If a PVOPS VM has multi-cpu the vcpu_info of cpu0 is the member of the structure HYPERVISOR_shared_info,
and the others is not, but after 'xl save -c/restore' the vcpu_info will be reinitialized,
the vcpu_info of all the vcpus will be considered as the member of HYPERVISOR_shared_info.
This will cause the cpu1 and other cpu keep receiving interrupts, and the cpu0 is waiting them to
finish the job.
So we do not reinit the vcpu_info when PVOPS vm is doing 'xl save -c/restore'.

Signed-off-by: Charles Ouyang <ouyangzhaowei@huawei.com>
---
 arch/x86/xen/suspend.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
index d949769..b2bed45 100644
--- a/arch/x86/xen/suspend.c
+++ b/arch/x86/xen/suspend.c
@@ -32,7 +32,8 @@ static void xen_hvm_post_suspend(int suspend_cancelled)
 {
 #ifdef CONFIG_XEN_PVHVM
        int cpu;
-       xen_hvm_init_shared_info();
+       if (!suspend_cancelled)
+               xen_hvm_init_shared_info();
        xen_callback_vector();
        xen_unplug_emulated_devices();
        if (xen_feature(XENFEAT_hvm_safe_pvclock)) {


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] xen: vcpu_info reinit error after 'xl save -c' & 'xl restore' on PVOPS VM which has multi-cpu
  2015-04-24  9:30 [PATCH] xen: vcpu_info reinit error after 'xl save -c' & 'xl restore' on PVOPS VM which has multi-cpu Ouyang Zhaowei (Charles)
@ 2015-04-25 23:31 ` Boris Ostrovsky
  2015-04-28 12:30   ` Ouyang Zhaowei (Charles)
  0 siblings, 1 reply; 7+ messages in thread
From: Boris Ostrovsky @ 2015-04-25 23:31 UTC (permalink / raw)
  To: Ouyang Zhaowei (Charles), Konrad Rzeszutek Wilk, David Vrabel
  Cc: linux-kernel, Dingweiping, Yanqiangjun, jinjian


On 04/24/2015 05:30 AM, Ouyang Zhaowei (Charles) wrote:
> If a PVOPS VM has multi-cpu the vcpu_info of cpu0 is the member of the structure HYPERVISOR_shared_info,
> and the others is not, but after 'xl save -c/restore' the vcpu_info will be reinitialized,
> the vcpu_info of all the vcpus will be considered as the member of HYPERVISOR_shared_info.
> This will cause the cpu1 and other cpu keep receiving interrupts, and the cpu0 is waiting them to
> finish the job.
> So we do not reinit the vcpu_info when PVOPS vm is doing 'xl save -c/restore'.
>
> Signed-off-by: Charles Ouyang <ouyangzhaowei@huawei.com>
> ---
>   arch/x86/xen/suspend.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
> index d949769..b2bed45 100644
> --- a/arch/x86/xen/suspend.c
> +++ b/arch/x86/xen/suspend.c
> @@ -32,7 +32,8 @@ static void xen_hvm_post_suspend(int suspend_cancelled)
>   {
>   #ifdef CONFIG_XEN_PVHVM
>          int cpu;
> -       xen_hvm_init_shared_info();
> +       if (!suspend_cancelled)
> +               xen_hvm_init_shared_info();
>          xen_callback_vector();
>          xen_unplug_emulated_devices();
>          if (xen_feature(XENFEAT_hvm_safe_pvclock)) {

Do we need to call other routines if suspend is canceled?

Also, if suspend is canceled then we don't do xen_irq_resume() if that's 
what you meant by "vcpu_info will be reinitialized". Were you referring 
some other re-initialization?

(The patch itself looks like the right thing to do though).

-boris

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xen: vcpu_info reinit error after 'xl save -c' & 'xl restore' on PVOPS VM which has multi-cpu
  2015-04-25 23:31 ` Boris Ostrovsky
@ 2015-04-28 12:30   ` Ouyang Zhaowei (Charles)
  2015-04-28 21:31     ` Boris Ostrovsky
  0 siblings, 1 reply; 7+ messages in thread
From: Ouyang Zhaowei (Charles) @ 2015-04-28 12:30 UTC (permalink / raw)
  To: Boris Ostrovsky, Konrad Rzeszutek Wilk, David Vrabel
  Cc: linux-kernel, Dingweiping, Yanqiangjun, jinjian



On 2015.4.26 7:31, Boris Ostrovsky wrote:
> 
> On 04/24/2015 05:30 AM, Ouyang Zhaowei (Charles) wrote:
>> If a PVOPS VM has multi-cpu the vcpu_info of cpu0 is the member of the structure HYPERVISOR_shared_info,
>> and the others is not, but after 'xl save -c/restore' the vcpu_info will be reinitialized,
>> the vcpu_info of all the vcpus will be considered as the member of HYPERVISOR_shared_info.
>> This will cause the cpu1 and other cpu keep receiving interrupts, and the cpu0 is waiting them to
>> finish the job.
>> So we do not reinit the vcpu_info when PVOPS vm is doing 'xl save -c/restore'.
>>
>> Signed-off-by: Charles Ouyang <ouyangzhaowei@huawei.com>
>> ---
>>   arch/x86/xen/suspend.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
>> index d949769..b2bed45 100644
>> --- a/arch/x86/xen/suspend.c
>> +++ b/arch/x86/xen/suspend.c
>> @@ -32,7 +32,8 @@ static void xen_hvm_post_suspend(int suspend_cancelled)
>>   {
>>   #ifdef CONFIG_XEN_PVHVM
>>          int cpu;
>> -       xen_hvm_init_shared_info();
>> +       if (!suspend_cancelled)
>> +               xen_hvm_init_shared_info();
>>          xen_callback_vector();
>>          xen_unplug_emulated_devices();
>>          if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
> 
> Do we need to call other routines if suspend is canceled?
> 
> Also, if suspend is canceled then we don't do xen_irq_resume() if that's what you meant by "vcpu_info will be reinitialized". Were you referring some other re-initialization?
> 
Hi Boris,

Sorry I didn't make myself clear.

About the "vcpu_info reinitialize", I mean in the function "xen_hvm_init_shared_info()" the pointer "xen_vcpu" will be reset and all
point to HYPERVISOR_shared_info->vcpu_info[cpu].

void __ref xen_hvm_init_shared_info(void)
----
1702          * When xen_hvm_init_shared_info is run at boot time only vcpu 0 is
1703          * online but xen_hvm_init_shared_info is run at resume time too and
1704          * in that case multiple vcpus might be online. */
1705         for_each_online_cpu(cpu) {
1706                 /* Leave it to be NULL. */
1707                 if (cpu >= MAX_VIRT_CPUS)
1708                         continue;
1709                 per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu];
1710         }
1711 }


But on Xen boot the init function "xen_start_kernel" only set the cpu0 to point to  HYPERVISOR_shared_info->vcpu_info[0]

asmlinkage __visible void __init xen_start_kernel(void)
----
1563         /* Don't do the full vcpu_info placement stuff until we have a
1564            possible map and a non-dummy shared_info. */
1565         per_cpu(xen_vcpu, 0) = &HYPERVISOR_shared_info->vcpu_info[0];
1566
1567         local_irq_disable();

Other cpus are set to point to "xen_vcpu_info" in function xen_vcpu_setup().

So after xl save -c/restore, the pointer xen_vcpu will be reset in function "xen_hvm_init_shared_info" and point to a wrong place.
This may cause all the cpus cannot handle irqs except cpu0, so IMHO it's not necessary to call xen_hvm_init_shared_info again if
suspend is canceled.

> (The patch itself looks like the right thing to do though).
> 
> -boris
> 
> .
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xen: vcpu_info reinit error after 'xl save -c' & 'xl restore' on PVOPS VM which has multi-cpu
  2015-04-28 12:30   ` Ouyang Zhaowei (Charles)
@ 2015-04-28 21:31     ` Boris Ostrovsky
  2015-04-30  7:27       ` Ouyang Zhaowei (Charles)
  0 siblings, 1 reply; 7+ messages in thread
From: Boris Ostrovsky @ 2015-04-28 21:31 UTC (permalink / raw)
  To: Ouyang Zhaowei (Charles), Konrad Rzeszutek Wilk, David Vrabel
  Cc: linux-kernel, Dingweiping, Yanqiangjun, jinjian

On 04/28/2015 08:30 AM, Ouyang Zhaowei (Charles) wrote:
>
> On 2015.4.26 7:31, Boris Ostrovsky wrote:
>> On 04/24/2015 05:30 AM, Ouyang Zhaowei (Charles) wrote:
>>> If a PVOPS VM has multi-cpu the vcpu_info of cpu0 is the member of the structure HYPERVISOR_shared_info,
>>> and the others is not, but after 'xl save -c/restore' the vcpu_info will be reinitialized,
>>> the vcpu_info of all the vcpus will be considered as the member of HYPERVISOR_shared_info.
>>> This will cause the cpu1 and other cpu keep receiving interrupts, and the cpu0 is waiting them to
>>> finish the job.
>>> So we do not reinit the vcpu_info when PVOPS vm is doing 'xl save -c/restore'.
>>>
>>> Signed-off-by: Charles Ouyang <ouyangzhaowei@huawei.com>
>>> ---
>>>    arch/x86/xen/suspend.c | 3 ++-
>>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
>>> index d949769..b2bed45 100644
>>> --- a/arch/x86/xen/suspend.c
>>> +++ b/arch/x86/xen/suspend.c
>>> @@ -32,7 +32,8 @@ static void xen_hvm_post_suspend(int suspend_cancelled)
>>>    {
>>>    #ifdef CONFIG_XEN_PVHVM
>>>           int cpu;
>>> -       xen_hvm_init_shared_info();
>>> +       if (!suspend_cancelled)
>>> +               xen_hvm_init_shared_info();
>>>           xen_callback_vector();
>>>           xen_unplug_emulated_devices();
>>>           if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
>> Do we need to call other routines if suspend is canceled?
>>
>> Also, if suspend is canceled then we don't do xen_irq_resume() if that's what you meant by "vcpu_info will be reinitialized". Were you referring some other re-initialization?
>>
> Hi Boris,
>
> Sorry I didn't make myself clear.
>
> About the "vcpu_info reinitialize", I mean in the function "xen_hvm_init_shared_info()" the pointer "xen_vcpu" will be reset and all
> point to HYPERVISOR_shared_info->vcpu_info[cpu].
>
> void __ref xen_hvm_init_shared_info(void)
> ----
> 1702          * When xen_hvm_init_shared_info is run at boot time only vcpu 0 is
> 1703          * online but xen_hvm_init_shared_info is run at resume time too and
> 1704          * in that case multiple vcpus might be online. */
> 1705         for_each_online_cpu(cpu) {
> 1706                 /* Leave it to be NULL. */
> 1707                 if (cpu >= MAX_VIRT_CPUS)
> 1708                         continue;
> 1709                 per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu];
> 1710         }
> 1711 }
>
>
> But on Xen boot the init function "xen_start_kernel" only set the cpu0 to point to  HYPERVISOR_shared_info->vcpu_info[0]
>
> asmlinkage __visible void __init xen_start_kernel(void)


We are talking about HVM guests here and xen_start_kernel is only called 
for PV.  But even if it was, xen_vcpu pointers for other VCPUs are set 
in xen_vcpu_setup(), which is called when non-boot VCPUs are coming up.

And I wonder whether the actual problem is that we don't call 
xen_vcpu_setup() on canceled suspend (as we don't need to, really) and 
therefore if we call xen_hvm_init_shared_info() then 
per_cpu(xen_vcpu,cpu) for *non-boot* cpus is will become wrong.

-boris

> ----
> 1563         /* Don't do the full vcpu_info placement stuff until we have a
> 1564            possible map and a non-dummy shared_info. */
> 1565         per_cpu(xen_vcpu, 0) = &HYPERVISOR_shared_info->vcpu_info[0];
> 1566
> 1567         local_irq_disable();
>
> Other cpus are set to point to "xen_vcpu_info" in function xen_vcpu_setup().
>
> So after xl save -c/restore, the pointer xen_vcpu will be reset in function "xen_hvm_init_shared_info" and point to a wrong place.
> This may cause all the cpus cannot handle irqs except cpu0, so IMHO it's not necessary to call xen_hvm_init_shared_info again if
> suspend is canceled.
>
>> (The patch itself looks like the right thing to do though).
>>
>> -boris
>>
>> .
>>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xen: vcpu_info reinit error after 'xl save -c' & 'xl restore' on PVOPS VM which has multi-cpu
  2015-04-28 21:31     ` Boris Ostrovsky
@ 2015-04-30  7:27       ` Ouyang Zhaowei (Charles)
  2015-05-01 18:55         ` Boris Ostrovsky
  0 siblings, 1 reply; 7+ messages in thread
From: Ouyang Zhaowei (Charles) @ 2015-04-30  7:27 UTC (permalink / raw)
  To: Boris Ostrovsky, Konrad Rzeszutek Wilk, David Vrabel
  Cc: linux-kernel, Dingweiping, Yanqiangjun, jinjian, herongguang.he



On 2015.4.29 5:31, Boris Ostrovsky wrote:
> On 04/28/2015 08:30 AM, Ouyang Zhaowei (Charles) wrote:
>>
>> On 2015.4.26 7:31, Boris Ostrovsky wrote:
>>> On 04/24/2015 05:30 AM, Ouyang Zhaowei (Charles) wrote:
>>>> If a PVOPS VM has multi-cpu the vcpu_info of cpu0 is the member of the structure HYPERVISOR_shared_info,
>>>> and the others is not, but after 'xl save -c/restore' the vcpu_info will be reinitialized,
>>>> the vcpu_info of all the vcpus will be considered as the member of HYPERVISOR_shared_info.
>>>> This will cause the cpu1 and other cpu keep receiving interrupts, and the cpu0 is waiting them to
>>>> finish the job.
>>>> So we do not reinit the vcpu_info when PVOPS vm is doing 'xl save -c/restore'.
>>>>
>>>> Signed-off-by: Charles Ouyang <ouyangzhaowei@huawei.com>
>>>> ---
>>>>    arch/x86/xen/suspend.c | 3 ++-
>>>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
>>>> index d949769..b2bed45 100644
>>>> --- a/arch/x86/xen/suspend.c
>>>> +++ b/arch/x86/xen/suspend.c
>>>> @@ -32,7 +32,8 @@ static void xen_hvm_post_suspend(int suspend_cancelled)
>>>>    {
>>>>    #ifdef CONFIG_XEN_PVHVM
>>>>           int cpu;
>>>> -       xen_hvm_init_shared_info();
>>>> +       if (!suspend_cancelled)
>>>> +               xen_hvm_init_shared_info();
>>>>           xen_callback_vector();
>>>>           xen_unplug_emulated_devices();
>>>>           if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
>>> Do we need to call other routines if suspend is canceled?
>>>
>>> Also, if suspend is canceled then we don't do xen_irq_resume() if that's what you meant by "vcpu_info will be reinitialized". Were you referring some other re-initialization?
>>>
>> Hi Boris,
>>
>> Sorry I didn't make myself clear.
>>
>> About the "vcpu_info reinitialize", I mean in the function "xen_hvm_init_shared_info()" the pointer "xen_vcpu" will be reset and all
>> point to HYPERVISOR_shared_info->vcpu_info[cpu].
>>
>> void __ref xen_hvm_init_shared_info(void)
>> ----
>> 1702          * When xen_hvm_init_shared_info is run at boot time only vcpu 0 is
>> 1703          * online but xen_hvm_init_shared_info is run at resume time too and
>> 1704          * in that case multiple vcpus might be online. */
>> 1705         for_each_online_cpu(cpu) {
>> 1706                 /* Leave it to be NULL. */
>> 1707                 if (cpu >= MAX_VIRT_CPUS)
>> 1708                         continue;
>> 1709                 per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu];
>> 1710         }
>> 1711 }
>>
>>
>> But on Xen boot the init function "xen_start_kernel" only set the cpu0 to point to  HYPERVISOR_shared_info->vcpu_info[0]
>>
>> asmlinkage __visible void __init xen_start_kernel(void)
> 
> 
> We are talking about HVM guests here and xen_start_kernel is only called for PV.  But even if it was, xen_vcpu pointers for other VCPUs are set in xen_vcpu_setup(), which is called when non-boot VCPUs are coming up.
> 
> And I wonder whether the actual problem is that we don't call xen_vcpu_setup() on canceled suspend (as we don't need to, really) and therefore if we call xen_hvm_init_shared_info() then per_cpu(xen_vcpu,cpu) for *non-boot* cpus is will become wrong.
> 

Yes, you are right, in xen_vcpu_setup() non-boot VCPUs is set to point to xen_vcpu_info

static void xen_vcpu_setup(int cpu)
----
 208         vcpup = &per_cpu(xen_vcpu_info, cpu);
...
 227                 /* This cpu is using the registered vcpu info, even if
 228                    later ones fail to. */
 229                 per_cpu(xen_vcpu, cpu) = vcpup;

But on canceled suspend if we call xen_hvm_init_shared_info(), the non-boot VCPUS will be reset to point to HYPERVISOR_shared_info->vcpu_info[cpu] which is a wrong address.
So I suggest we don't call xen_hvm_init_shared_info() when suspend is canceled.

> -boris
> 
>> ----
>> 1563         /* Don't do the full vcpu_info placement stuff until we have a
>> 1564            possible map and a non-dummy shared_info. */
>> 1565         per_cpu(xen_vcpu, 0) = &HYPERVISOR_shared_info->vcpu_info[0];
>> 1566
>> 1567         local_irq_disable();
>>
>> Other cpus are set to point to "xen_vcpu_info" in function xen_vcpu_setup().
>>
>> So after xl save -c/restore, the pointer xen_vcpu will be reset in function "xen_hvm_init_shared_info" and point to a wrong place.
>> This may cause all the cpus cannot handle irqs except cpu0, so IMHO it's not necessary to call xen_hvm_init_shared_info again if
>> suspend is canceled.
>>
>>> (The patch itself looks like the right thing to do though).
>>>
>>> -boris
>>>
>>> .
>>>
> 
> 
> .
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xen: vcpu_info reinit error after 'xl save -c' & 'xl restore' on PVOPS VM which has multi-cpu
  2015-04-30  7:27       ` Ouyang Zhaowei (Charles)
@ 2015-05-01 18:55         ` Boris Ostrovsky
  2015-05-04  2:25           ` Ouyang Zhaowei (Charles)
  0 siblings, 1 reply; 7+ messages in thread
From: Boris Ostrovsky @ 2015-05-01 18:55 UTC (permalink / raw)
  To: Ouyang Zhaowei (Charles), Konrad Rzeszutek Wilk, David Vrabel
  Cc: linux-kernel, Dingweiping, Yanqiangjun, jinjian, herongguang.he

On 04/30/2015 03:27 AM, Ouyang Zhaowei (Charles) wrote:
>
> On 2015.4.29 5:31, Boris Ostrovsky wrote:
>> On 04/28/2015 08:30 AM, Ouyang Zhaowei (Charles) wrote:
>>> On 2015.4.26 7:31, Boris Ostrovsky wrote:
>>>> On 04/24/2015 05:30 AM, Ouyang Zhaowei (Charles) wrote:
>>>>> If a PVOPS VM has multi-cpu the vcpu_info of cpu0 is the member of the structure HYPERVISOR_shared_info,
>>>>> and the others is not, but after 'xl save -c/restore' the vcpu_info will be reinitialized,
>>>>> the vcpu_info of all the vcpus will be considered as the member of HYPERVISOR_shared_info.
>>>>> This will cause the cpu1 and other cpu keep receiving interrupts, and the cpu0 is waiting them to
>>>>> finish the job.
>>>>> So we do not reinit the vcpu_info when PVOPS vm is doing 'xl save -c/restore'.
>>>>>
>>>>> Signed-off-by: Charles Ouyang <ouyangzhaowei@huawei.com>
>>>>> ---
>>>>>     arch/x86/xen/suspend.c | 3 ++-
>>>>>     1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
>>>>> index d949769..b2bed45 100644
>>>>> --- a/arch/x86/xen/suspend.c
>>>>> +++ b/arch/x86/xen/suspend.c
>>>>> @@ -32,7 +32,8 @@ static void xen_hvm_post_suspend(int suspend_cancelled)
>>>>>     {
>>>>>     #ifdef CONFIG_XEN_PVHVM
>>>>>            int cpu;
>>>>> -       xen_hvm_init_shared_info();
>>>>> +       if (!suspend_cancelled)
>>>>> +               xen_hvm_init_shared_info();
>>>>>            xen_callback_vector();
>>>>>            xen_unplug_emulated_devices();
>>>>>            if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
>>>> Do we need to call other routines if suspend is canceled?
>>>>
>>>> Also, if suspend is canceled then we don't do xen_irq_resume() if that's what you meant by "vcpu_info will be reinitialized". Were you referring some other re-initialization?
>>>>
>>> Hi Boris,
>>>
>>> Sorry I didn't make myself clear.
>>>
>>> About the "vcpu_info reinitialize", I mean in the function "xen_hvm_init_shared_info()" the pointer "xen_vcpu" will be reset and all
>>> point to HYPERVISOR_shared_info->vcpu_info[cpu].
>>>
>>> void __ref xen_hvm_init_shared_info(void)
>>> ----
>>> 1702          * When xen_hvm_init_shared_info is run at boot time only vcpu 0 is
>>> 1703          * online but xen_hvm_init_shared_info is run at resume time too and
>>> 1704          * in that case multiple vcpus might be online. */
>>> 1705         for_each_online_cpu(cpu) {
>>> 1706                 /* Leave it to be NULL. */
>>> 1707                 if (cpu >= MAX_VIRT_CPUS)
>>> 1708                         continue;
>>> 1709                 per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu];
>>> 1710         }
>>> 1711 }
>>>
>>>
>>> But on Xen boot the init function "xen_start_kernel" only set the cpu0 to point to  HYPERVISOR_shared_info->vcpu_info[0]
>>>
>>> asmlinkage __visible void __init xen_start_kernel(void)
>>
>> We are talking about HVM guests here and xen_start_kernel is only called for PV.  But even if it was, xen_vcpu pointers for other VCPUs are set in xen_vcpu_setup(), which is called when non-boot VCPUs are coming up.
>>
>> And I wonder whether the actual problem is that we don't call xen_vcpu_setup() on canceled suspend (as we don't need to, really) and therefore if we call xen_hvm_init_shared_info() then per_cpu(xen_vcpu,cpu) for *non-boot* cpus is will become wrong.
>>
> Yes, you are right, in xen_vcpu_setup() non-boot VCPUs is set to point to xen_vcpu_info
>
> static void xen_vcpu_setup(int cpu)
> ----
>   208         vcpup = &per_cpu(xen_vcpu_info, cpu);
> ...
>   227                 /* This cpu is using the registered vcpu info, even if
>   228                    later ones fail to. */
>   229                 per_cpu(xen_vcpu, cpu) = vcpup;
>
> But on canceled suspend if we call xen_hvm_init_shared_info(), the non-boot VCPUS will be reset to point to HYPERVISOR_shared_info->vcpu_info[cpu] which is a wrong address.
> So I suggest we don't call xen_hvm_init_shared_info() when suspend is canceled.


Right, so can you resubmit the patch with updated commit message? (Just 
note there that the hypervisor continues assuming that vcpu_info is 
stored in per-cpu data which was set up by xen_vcpu_setup(), while the 
call to xen_hvm_init_shared_info() will now make the guest think that 
vcpu_info is in the shared page).

Thanks.
-boris

>
>> -boris
>>
>>> ----
>>> 1563         /* Don't do the full vcpu_info placement stuff until we have a
>>> 1564            possible map and a non-dummy shared_info. */
>>> 1565         per_cpu(xen_vcpu, 0) = &HYPERVISOR_shared_info->vcpu_info[0];
>>> 1566
>>> 1567         local_irq_disable();
>>>
>>> Other cpus are set to point to "xen_vcpu_info" in function xen_vcpu_setup().
>>>
>>> So after xl save -c/restore, the pointer xen_vcpu will be reset in function "xen_hvm_init_shared_info" and point to a wrong place.
>>> This may cause all the cpus cannot handle irqs except cpu0, so IMHO it's not necessary to call xen_hvm_init_shared_info again if
>>> suspend is canceled.
>>>
>>>> (The patch itself looks like the right thing to do though).
>>>>
>>>> -boris
>>>>
>>>> .
>>>>
>>
>> .
>>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xen: vcpu_info reinit error after 'xl save -c' & 'xl restore' on PVOPS VM which has multi-cpu
  2015-05-01 18:55         ` Boris Ostrovsky
@ 2015-05-04  2:25           ` Ouyang Zhaowei (Charles)
  0 siblings, 0 replies; 7+ messages in thread
From: Ouyang Zhaowei (Charles) @ 2015-05-04  2:25 UTC (permalink / raw)
  To: Boris Ostrovsky, Konrad Rzeszutek Wilk, David Vrabel
  Cc: linux-kernel, Dingweiping, Yanqiangjun, jinjian, herongguang.he



On 2015.5.2 2:55, Boris Ostrovsky wrote:
> On 04/30/2015 03:27 AM, Ouyang Zhaowei (Charles) wrote:
>>
>> On 2015.4.29 5:31, Boris Ostrovsky wrote:
>>> On 04/28/2015 08:30 AM, Ouyang Zhaowei (Charles) wrote:
>>>> On 2015.4.26 7:31, Boris Ostrovsky wrote:
>>>>> On 04/24/2015 05:30 AM, Ouyang Zhaowei (Charles) wrote:
>>>>>> If a PVOPS VM has multi-cpu the vcpu_info of cpu0 is the member of the structure HYPERVISOR_shared_info,
>>>>>> and the others is not, but after 'xl save -c/restore' the vcpu_info will be reinitialized,
>>>>>> the vcpu_info of all the vcpus will be considered as the member of HYPERVISOR_shared_info.
>>>>>> This will cause the cpu1 and other cpu keep receiving interrupts, and the cpu0 is waiting them to
>>>>>> finish the job.
>>>>>> So we do not reinit the vcpu_info when PVOPS vm is doing 'xl save -c/restore'.
>>>>>>
>>>>>> Signed-off-by: Charles Ouyang <ouyangzhaowei@huawei.com>
>>>>>> ---
>>>>>>     arch/x86/xen/suspend.c | 3 ++-
>>>>>>     1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
>>>>>> index d949769..b2bed45 100644
>>>>>> --- a/arch/x86/xen/suspend.c
>>>>>> +++ b/arch/x86/xen/suspend.c
>>>>>> @@ -32,7 +32,8 @@ static void xen_hvm_post_suspend(int suspend_cancelled)
>>>>>>     {
>>>>>>     #ifdef CONFIG_XEN_PVHVM
>>>>>>            int cpu;
>>>>>> -       xen_hvm_init_shared_info();
>>>>>> +       if (!suspend_cancelled)
>>>>>> +               xen_hvm_init_shared_info();
>>>>>>            xen_callback_vector();
>>>>>>            xen_unplug_emulated_devices();
>>>>>>            if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
>>>>> Do we need to call other routines if suspend is canceled?
>>>>>
>>>>> Also, if suspend is canceled then we don't do xen_irq_resume() if that's what you meant by "vcpu_info will be reinitialized". Were you referring some other re-initialization?
>>>>>
>>>> Hi Boris,
>>>>
>>>> Sorry I didn't make myself clear.
>>>>
>>>> About the "vcpu_info reinitialize", I mean in the function "xen_hvm_init_shared_info()" the pointer "xen_vcpu" will be reset and all
>>>> point to HYPERVISOR_shared_info->vcpu_info[cpu].
>>>>
>>>> void __ref xen_hvm_init_shared_info(void)
>>>> ----
>>>> 1702          * When xen_hvm_init_shared_info is run at boot time only vcpu 0 is
>>>> 1703          * online but xen_hvm_init_shared_info is run at resume time too and
>>>> 1704          * in that case multiple vcpus might be online. */
>>>> 1705         for_each_online_cpu(cpu) {
>>>> 1706                 /* Leave it to be NULL. */
>>>> 1707                 if (cpu >= MAX_VIRT_CPUS)
>>>> 1708                         continue;
>>>> 1709                 per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu];
>>>> 1710         }
>>>> 1711 }
>>>>
>>>>
>>>> But on Xen boot the init function "xen_start_kernel" only set the cpu0 to point to  HYPERVISOR_shared_info->vcpu_info[0]
>>>>
>>>> asmlinkage __visible void __init xen_start_kernel(void)
>>>
>>> We are talking about HVM guests here and xen_start_kernel is only called for PV.  But even if it was, xen_vcpu pointers for other VCPUs are set in xen_vcpu_setup(), which is called when non-boot VCPUs are coming up.
>>>
>>> And I wonder whether the actual problem is that we don't call xen_vcpu_setup() on canceled suspend (as we don't need to, really) and therefore if we call xen_hvm_init_shared_info() then per_cpu(xen_vcpu,cpu) for *non-boot* cpus is will become wrong.
>>>
>> Yes, you are right, in xen_vcpu_setup() non-boot VCPUs is set to point to xen_vcpu_info
>>
>> static void xen_vcpu_setup(int cpu)
>> ----
>>   208         vcpup = &per_cpu(xen_vcpu_info, cpu);
>> ...
>>   227                 /* This cpu is using the registered vcpu info, even if
>>   228                    later ones fail to. */
>>   229                 per_cpu(xen_vcpu, cpu) = vcpup;
>>
>> But on canceled suspend if we call xen_hvm_init_shared_info(), the non-boot VCPUS will be reset to point to HYPERVISOR_shared_info->vcpu_info[cpu] which is a wrong address.
>> So I suggest we don't call xen_hvm_init_shared_info() when suspend is canceled.
> 
> 
> Right, so can you resubmit the patch with updated commit message? (Just note there that the hypervisor continues assuming that vcpu_info is stored in per-cpu data which was set up by xen_vcpu_setup(), while the call to xen_hvm_init_shared_info() will now make the guest think that vcpu_info is in the shared page).
> 
> Thanks.
> -boris

OK, Thank for the advise, I'll resend the patch now

> 
>>
>>> -boris
>>>
>>>> ----
>>>> 1563         /* Don't do the full vcpu_info placement stuff until we have a
>>>> 1564            possible map and a non-dummy shared_info. */
>>>> 1565         per_cpu(xen_vcpu, 0) = &HYPERVISOR_shared_info->vcpu_info[0];
>>>> 1566
>>>> 1567         local_irq_disable();
>>>>
>>>> Other cpus are set to point to "xen_vcpu_info" in function xen_vcpu_setup().
>>>>
>>>> So after xl save -c/restore, the pointer xen_vcpu will be reset in function "xen_hvm_init_shared_info" and point to a wrong place.
>>>> This may cause all the cpus cannot handle irqs except cpu0, so IMHO it's not necessary to call xen_hvm_init_shared_info again if
>>>> suspend is canceled.
>>>>
>>>>> (The patch itself looks like the right thing to do though).
>>>>>
>>>>> -boris
>>>>>
>>>>> .
>>>>>
>>>
>>> .
>>>
> 
> 
> .
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2015-05-04  2:26 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-24  9:30 [PATCH] xen: vcpu_info reinit error after 'xl save -c' & 'xl restore' on PVOPS VM which has multi-cpu Ouyang Zhaowei (Charles)
2015-04-25 23:31 ` Boris Ostrovsky
2015-04-28 12:30   ` Ouyang Zhaowei (Charles)
2015-04-28 21:31     ` Boris Ostrovsky
2015-04-30  7:27       ` Ouyang Zhaowei (Charles)
2015-05-01 18:55         ` Boris Ostrovsky
2015-05-04  2:25           ` Ouyang Zhaowei (Charles)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).