xen-devel.lists.xenproject.org archive mirror
* Linux PV/PVH domU crash on (guest) resume from suspend
@ 2021-02-17  5:12 Marek Marczykowski-Górecki
  2021-02-17  6:51 ` Jürgen Groß
  0 siblings, 1 reply; 9+ messages in thread
From: Marek Marczykowski-Górecki @ 2021-02-17  5:12 UTC
  To: xen-devel

Hi,

I'm observing a Linux PV/PVH guest crash when I resume it from sleep. I do
this with:

    virsh -c xen dompmsuspend <vmname> mem
    virsh -c xen dompmwakeup <vmname> 

But it's possible to trigger it with plain xl too:

    xl save -c <vmname> <some-file>

The same on HVM works fine.

This is on Xen 4.14.1, and with guest kernel 5.4.90, the same happens
with 5.4.98. Dom0 kernel is the same, but I'm not sure if that's
relevant here. I can reliably reproduce it.

The crash message:

[  219.844995] Freezing user space processes ... (elapsed 0.011 seconds) done.
[  219.856564] OOM killer disabled.
[  219.856566] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[  277.562118] register_vcpu_info failed: cpu=0 err=-22
[  219.858384] xen:grant_table: Grant tables using version 1 layout
[  219.858442] ------------[ cut here ]------------
[  219.858446] kernel BUG at drivers/xen/events/events_fifo.c:369!
[  219.858503] invalid opcode: 0000 [#1] SMP NOPTI
[  219.858511] CPU: 0 PID: 11 Comm: migration/0 Not tainted 5.4.90-1.qubes.x86_64 #1
[  219.858527] RIP: e030:evtchn_fifo_resume+0x58/0x90
[  219.858532] Code: eb 48 8b 04 dd 80 29 3e 82 4e 8b 04 20 4d 85 c0 74 d5 48 0f a3 1d b8 40 20 01 73 10 4c 89 c6 89 ef e8 5c fb ff ff 85 c0 79 bd <0f> 0b 31 f6 4c 89 c7 e8 7c 8a c8 ff 48 8b 04 dd 80 29 3e 82 4a c7
[  219.858538] RSP: e02b:ffffc9000025be10 EFLAGS: 00010082
[  219.858542] RAX: ffffffffffffffea RBX: 0000000000000000 RCX: 0000000000000000
[  219.858545] RDX: ffff888018400000 RSI: ffffc9000025bde0 RDI: 000000000000000b
[  219.858548] RBP: 0000000000000000 R08: ffff888018143000 R09: 00000000000001e0
[  219.858552] R10: ffff88800e50f440 R11: ffffc9000025bcbd R12: 0000000000026ea0
[  219.858555] R13: 0000000000000000 R14: ffffc9000029bdf8 R15: 0000000000000003
[  219.858567] FS:  0000000000000000(0000) GS:ffff888018400000(0000) knlGS:0000000000000000
[  219.858571] CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[  219.858574] CR2: 0000581c2753e030 CR3: 000000000260a000 CR4: 0000000000000660
[  219.858578] Call Trace:
[  219.858615]  xen_irq_resume+0x1b/0xe0
[  219.858620]  xen_suspend+0x13e/0x190
[  219.858626]  multi_cpu_stop+0x6c/0x100
[  219.858630]  ? stop_machine_yield+0x10/0x10
[  219.858633]  cpu_stopper_thread+0xb0/0x110
[  219.858638]  smpboot_thread_fn+0xc5/0x160
[  219.858641]  ? smpboot_register_percpu_thread+0xf0/0xf0
[  219.858645]  kthread+0x115/0x140
[  219.858648]  ? __kthread_bind_mask+0x60/0x60
[  219.858653]  ret_from_fork+0x22/0x40
[  219.858657] Modules linked in: nf_conntrack_netlink nft_reject_ipv4 nft_reject xt_nat nf_tables_set nft_ct nf_tables nfnetlink e1000e rfkill xt_REDIRECT ip6table_filter ip6table_mangle ip6table_raw ip6_tables edac_mce_amd pcspkr ipt_REJECT nf_reject_ipv4 xen_pcifront xt_state xt_conntrack iptable_filter iptable_mangle iptable_raw xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c xen_scsiback target_core_mod xen_netback xen_privcmd xen_gntdev xen_gntalloc xen_blkback xen_evtchn drm fuse ip_tables overlay xen_blkfront
[  219.858754] ---[ end trace 54d868ea756768db ]---
[  219.858758] RIP: e030:evtchn_fifo_resume+0x58/0x90
[  219.858762] Code: eb 48 8b 04 dd 80 29 3e 82 4e 8b 04 20 4d 85 c0 74 d5 48 0f a3 1d b8 40 20 01 73 10 4c 89 c6 89 ef e8 5c fb ff ff 85 c0 79 bd <0f> 0b 31 f6 4c 89 c7 e8 7c 8a c8 ff 48 8b 04 dd 80 29 3e 82 4a c7
[  219.858768] RSP: e02b:ffffc9000025be10 EFLAGS: 00010082
[  219.858770] RAX: ffffffffffffffea RBX: 0000000000000000 RCX: 0000000000000000
[  219.858774] RDX: ffff888018400000 RSI: ffffc9000025bde0 RDI: 000000000000000b
[  219.858777] RBP: 0000000000000000 R08: ffff888018143000 R09: 00000000000001e0
[  219.858780] R10: ffff88800e50f440 R11: ffffc9000025bcbd R12: 0000000000026ea0
[  219.858783] R13: 0000000000000000 R14: ffffc9000029bdf8 R15: 0000000000000003
[  219.858790] FS:  0000000000000000(0000) GS:ffff888018400000(0000) knlGS:0000000000000000
[  219.858793] CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[  219.858796] CR2: 0000581c2753e030 CR3: 000000000260a000 CR4: 0000000000000660
[  219.858801] Kernel panic - not syncing: Fatal exception
[  219.858819] Kernel Offset: disabled

Note the timestamp next to "register_vcpu_info failed": it is in the future.
I think the offset depends on the uptime; with a shorter uptime I get a much
smaller difference (like 49 vs 51).

Any ideas?

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


* Re: Linux PV/PVH domU crash on (guest) resume from suspend
  2021-02-17  5:12 Linux PV/PVH domU crash on (guest) resume from suspend Marek Marczykowski-Górecki
@ 2021-02-17  6:51 ` Jürgen Groß
  2021-02-17 13:48   ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 9+ messages in thread
From: Jürgen Groß @ 2021-02-17  6:51 UTC
  To: Marek Marczykowski-Górecki, xen-devel


On 17.02.21 06:12, Marek Marczykowski-Górecki wrote:
> Hi,
> 
> I'm observing Linux PV/PVH guest crash when I resume it from sleep. I do
> this with:
> 
>      virsh -c xen dompmsuspend <vmname> mem
>      virsh -c xen dompmwakeup <vmname>
> 
> But it's possible to trigger it with plain xl too:
> 
>      xl save -c <vmname> <some-file>
> 
> The same on HVM works fine.
> 
> This is on Xen 4.14.1, and with guest kernel 5.4.90, the same happens
> with 5.4.98. Dom0 kernel is the same, but I'm not sure if that's
> relevant here. I can reliably reproduce it.

This is already on my list of issues to look at.

The problem seems to be related to the XSA-332 patches. You could try
the patches I've sent out recently addressing other fallout from XSA-332
which _might_ fix this issue, too:

https://patchew.org/Xen/20210211101616.13788-1-jgross@suse.com/


Juergen


* Re: Linux PV/PVH domU crash on (guest) resume from suspend
  2021-02-17  6:51 ` Jürgen Groß
@ 2021-02-17 13:48   ` Marek Marczykowski-Górecki
  2021-02-17 15:37     ` Jürgen Groß
  2021-02-19 12:48     ` Jürgen Groß
  0 siblings, 2 replies; 9+ messages in thread
From: Marek Marczykowski-Górecki @ 2021-02-17 13:48 UTC
  To: Jürgen Groß; +Cc: xen-devel

On Wed, Feb 17, 2021 at 07:51:42AM +0100, Jürgen Groß wrote:
> On 17.02.21 06:12, Marek Marczykowski-Górecki wrote:
> > Hi,
> > 
> > I'm observing Linux PV/PVH guest crash when I resume it from sleep. I do
> > this with:
> > 
> >      virsh -c xen dompmsuspend <vmname> mem
> >      virsh -c xen dompmwakeup <vmname>
> > 
> > But it's possible to trigger it with plain xl too:
> > 
> >      xl save -c <vmname> <some-file>
> > 
> > The same on HVM works fine.
> > 
> > This is on Xen 4.14.1, and with guest kernel 5.4.90, the same happens
> > with 5.4.98. Dom0 kernel is the same, but I'm not sure if that's
> > relevant here. I can reliably reproduce it.
> 
> This is already on my list of issues to look at.
> 
> The problem seems to be related to the XSA-332 patches. You could try
> the patches I've sent out recently addressing other fallout from XSA-332
> which _might_ fix this issue, too:
> 
> https://patchew.org/Xen/20210211101616.13788-1-jgross@suse.com/

Thanks for the patches. Sadly they don't change anything - I get exactly
the same crash. I applied them on top of 5.11-rc7 (that's what I had
handy). If you think there may be a difference with the final 5.11 or
another branch, please let me know.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


* Re: Linux PV/PVH domU crash on (guest) resume from suspend
  2021-02-17 13:48   ` Marek Marczykowski-Górecki
@ 2021-02-17 15:37     ` Jürgen Groß
  2021-02-19 12:48     ` Jürgen Groß
  1 sibling, 0 replies; 9+ messages in thread
From: Jürgen Groß @ 2021-02-17 15:37 UTC
  To: Marek Marczykowski-Górecki; +Cc: xen-devel


On 17.02.21 14:48, Marek Marczykowski-Górecki wrote:
> On Wed, Feb 17, 2021 at 07:51:42AM +0100, Jürgen Groß wrote:
>> On 17.02.21 06:12, Marek Marczykowski-Górecki wrote:
>>> Hi,
>>>
>>> I'm observing Linux PV/PVH guest crash when I resume it from sleep. I do
>>> this with:
>>>
>>>       virsh -c xen dompmsuspend <vmname> mem
>>>       virsh -c xen dompmwakeup <vmname>
>>>
>>> But it's possible to trigger it with plain xl too:
>>>
>>>       xl save -c <vmname> <some-file>
>>>
>>> The same on HVM works fine.
>>>
>>> This is on Xen 4.14.1, and with guest kernel 5.4.90, the same happens
>>> with 5.4.98. Dom0 kernel is the same, but I'm not sure if that's
>>> relevant here. I can reliably reproduce it.
>>
>> This is already on my list of issues to look at.
>>
>> The problem seems to be related to the XSA-332 patches. You could try
>> the patches I've sent out recently addressing other fallout from XSA-332
>> which _might_ fix this issue, too:
>>
>> https://patchew.org/Xen/20210211101616.13788-1-jgross@suse.com/
> 
> Thanks for the patches. Sadly it doesn't change anything - I get exactly
> the same crash. I applied that on top of 5.11-rc7 (that's what I had
> handy). If you think there may be a difference with the final 5.11 or
> another branch, please let me know.
> 

Okay, thanks for testing.

I hope to find some time to look into the issue soon.


Juergen


* Re: Linux PV/PVH domU crash on (guest) resume from suspend
  2021-02-17 13:48   ` Marek Marczykowski-Górecki
  2021-02-17 15:37     ` Jürgen Groß
@ 2021-02-19 12:48     ` Jürgen Groß
  2021-02-19 13:10       ` Jan Beulich
  1 sibling, 1 reply; 9+ messages in thread
From: Jürgen Groß @ 2021-02-19 12:48 UTC
  To: Marek Marczykowski-Górecki; +Cc: xen-devel, Jan Beulich


On 17.02.21 14:48, Marek Marczykowski-Górecki wrote:
> On Wed, Feb 17, 2021 at 07:51:42AM +0100, Jürgen Groß wrote:
>> On 17.02.21 06:12, Marek Marczykowski-Górecki wrote:
>>> Hi,
>>>
>>> I'm observing Linux PV/PVH guest crash when I resume it from sleep. I do
>>> this with:
>>>
>>>       virsh -c xen dompmsuspend <vmname> mem
>>>       virsh -c xen dompmwakeup <vmname>
>>>
>>> But it's possible to trigger it with plain xl too:
>>>
>>>       xl save -c <vmname> <some-file>
>>>
>>> The same on HVM works fine.
>>>
>>> This is on Xen 4.14.1, and with guest kernel 5.4.90, the same happens
>>> with 5.4.98. Dom0 kernel is the same, but I'm not sure if that's
>>> relevant here. I can reliably reproduce it.
>>
>> This is already on my list of issues to look at.
>>
>> The problem seems to be related to the XSA-332 patches. You could try
>> the patches I've sent out recently addressing other fallout from XSA-332
>> which _might_ fix this issue, too:
>>
>> https://patchew.org/Xen/20210211101616.13788-1-jgross@suse.com/
> 
> Thanks for the patches. Sadly it doesn't change anything - I get exactly
> the same crash. I applied that on top of 5.11-rc7 (that's what I had
> handy). If you think there may be a difference with the final 5.11 or
> another branch, please let me know.
> 

Some more tests reveal that this seems to be a hypervisor regression.
I can reproduce the very same problem with a 4.12 kernel from 2019.

It seems as if the EVTCHNOP_init_control hypercall is returning
-EINVAL when the domain is continuing to run after the suspend
hypercall (in contrast to the case where a new domain has been created
when doing a "xl restore").
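
The -EINVAL reading is consistent with the register dump in the original report: RAX holds ffffffffffffffea, which is -22 when read as a signed 64-bit value, the same number as in the "err=-22" line. A minimal sketch of that decoding follows. The helper names are hypothetical, and MAX_ERRNO is assumed to mirror the 4095 limit Linux uses for error-valued returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed to match the Linux convention: the top 4095 values of the
 * 64-bit range are treated as negative errno codes. */
#define MAX_ERRNO 4095

/* Returns nonzero if a raw register value from an oops dump looks like
 * a negative errno when reinterpreted as a signed 64-bit integer. */
static int looks_like_errno(uint64_t raw)
{
    int64_t v = (int64_t)raw;
    return v < 0 && v >= -MAX_ERRNO;
}

/* Converts such a raw value into the positive errno number. */
static int errno_from_reg(uint64_t raw)
{
    assert(looks_like_errno(raw));
    return (int)-(int64_t)raw;
}
```

For the RAX value in the trace, errno_from_reg(0xffffffffffffffeaULL) yields 22, which is EINVAL on Linux.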


Juergen


* Re: Linux PV/PVH domU crash on (guest) resume from suspend
  2021-02-19 12:48     ` Jürgen Groß
@ 2021-02-19 13:10       ` Jan Beulich
  2021-02-19 13:18         ` Jürgen Groß
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2021-02-19 13:10 UTC
  To: Jürgen Groß; +Cc: xen-devel, Marek Marczykowski-Górecki

On 19.02.2021 13:48, Jürgen Groß wrote:
> On 17.02.21 14:48, Marek Marczykowski-Górecki wrote:
>> On Wed, Feb 17, 2021 at 07:51:42AM +0100, Jürgen Groß wrote:
>>> On 17.02.21 06:12, Marek Marczykowski-Górecki wrote:
>>>> Hi,
>>>>
>>>> I'm observing Linux PV/PVH guest crash when I resume it from sleep. I do
>>>> this with:
>>>>
>>>>       virsh -c xen dompmsuspend <vmname> mem
>>>>       virsh -c xen dompmwakeup <vmname>
>>>>
>>>> But it's possible to trigger it with plain xl too:
>>>>
>>>>       xl save -c <vmname> <some-file>
>>>>
>>>> The same on HVM works fine.
>>>>
>>>> This is on Xen 4.14.1, and with guest kernel 5.4.90, the same happens
>>>> with 5.4.98. Dom0 kernel is the same, but I'm not sure if that's
>>>> relevant here. I can reliably reproduce it.
>>>
>>> This is already on my list of issues to look at.
>>>
>>> The problem seems to be related to the XSA-332 patches. You could try
>>> the patches I've sent out recently addressing other fallout from XSA-332
>>> which _might_ fix this issue, too:
>>>
>>> https://patchew.org/Xen/20210211101616.13788-1-jgross@suse.com/
>>
>> Thanks for the patches. Sadly it doesn't change anything - I get exactly
>> the same crash. I applied that on top of 5.11-rc7 (that's what I had
>> handy). If you think there may be a difference with the final 5.11 or
>> another branch, please let me know.
>>
> 
> Some more tests reveal that this seems to be a hypervisor regression.
> I can reproduce the very same problem with a 4.12 kernel from 2019.
> 
> It seems as if the EVTCHNOP_init_control hypercall is returning
> -EINVAL when the domain is continuing to run after the suspend
> hypercall (in contrast to the case where a new domain has been created
> when doing a "xl restore").

But when you resume the same domain, the kernel isn't supposed to
call EVTCHNOP_init_control, as that's a one time operation (per
vCPU, and unless EVTCHNOP_reset was called of course). In the
hypervisor map_control_block() has (always had) as its first step

    if ( v->evtchn_fifo->control_block )
        return -EINVAL;

Re-setup is needed only when resuming in a new domain.
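
The one-time nature of this setup can be illustrated with a toy model. This is only a sketch, not Xen's code: struct toy_vcpu, toy_map_control_block() and toy_reset() are made-up names, and the real function does considerably more than this single check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for the per-vCPU FIFO event channel state. */
struct toy_vcpu {
    void *control_block;    /* NULL until the control block is set up */
};

/* Models the check quoted above: a second setup attempt on the same
 * vCPU is rejected with -EINVAL. */
static int toy_map_control_block(struct toy_vcpu *v, void *page)
{
    assert(v != NULL && page != NULL);
    if (v->control_block)
        return -EINVAL;
    v->control_block = page;
    return 0;
}

/* Models the state after a reset or in a freshly created domain:
 * no control block is mapped, so setup may run again. */
static void toy_reset(struct toy_vcpu *v)
{
    assert(v != NULL);
    v->control_block = NULL;
}
```

In this model a first call succeeds, a repeated call on the same vCPU returns -EINVAL, and only after toy_reset() does setup succeed again.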

Jan



* Re: Linux PV/PVH domU crash on (guest) resume from suspend
  2021-02-19 13:10       ` Jan Beulich
@ 2021-02-19 13:18         ` Jürgen Groß
  2021-02-19 13:37           ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Jürgen Groß @ 2021-02-19 13:18 UTC
  To: Jan Beulich; +Cc: xen-devel, Marek Marczykowski-Górecki


On 19.02.21 14:10, Jan Beulich wrote:
> On 19.02.2021 13:48, Jürgen Groß wrote:
>> On 17.02.21 14:48, Marek Marczykowski-Górecki wrote:
>>> On Wed, Feb 17, 2021 at 07:51:42AM +0100, Jürgen Groß wrote:
>>>> On 17.02.21 06:12, Marek Marczykowski-Górecki wrote:
>>>>> Hi,
>>>>>
>>>>> I'm observing Linux PV/PVH guest crash when I resume it from sleep. I do
>>>>> this with:
>>>>>
>>>>>        virsh -c xen dompmsuspend <vmname> mem
>>>>>        virsh -c xen dompmwakeup <vmname>
>>>>>
>>>>> But it's possible to trigger it with plain xl too:
>>>>>
>>>>>        xl save -c <vmname> <some-file>
>>>>>
>>>>> The same on HVM works fine.
>>>>>
>>>>> This is on Xen 4.14.1, and with guest kernel 5.4.90, the same happens
>>>>> with 5.4.98. Dom0 kernel is the same, but I'm not sure if that's
>>>>> relevant here. I can reliably reproduce it.
>>>>
>>>> This is already on my list of issues to look at.
>>>>
>>>> The problem seems to be related to the XSA-332 patches. You could try
>>>> the patches I've sent out recently addressing other fallout from XSA-332
>>>> which _might_ fix this issue, too:
>>>>
>>>> https://patchew.org/Xen/20210211101616.13788-1-jgross@suse.com/
>>>
>>> Thanks for the patches. Sadly it doesn't change anything - I get exactly
>>> the same crash. I applied that on top of 5.11-rc7 (that's what I had
>>> handy). If you think there may be a difference with the final 5.11 or
>>> another branch, please let me know.
>>>
>>
>> Some more tests reveal that this seems to be a hypervisor regression.
>> I can reproduce the very same problem with a 4.12 kernel from 2019.
>>
>> It seems as if the EVTCHNOP_init_control hypercall is returning
>> -EINVAL when the domain is continuing to run after the suspend
>> hypercall (in contrast to the case where a new domain has been created
>> when doing a "xl restore").
> 
> But when you resume the same domain, the kernel isn't supposed to
> call EVTCHNOP_init_control, as that's a one time operation (per
> vCPU, and unless EVTCHNOP_reset was called of course). In the
> hypervisor map_control_block() has (always had) as its first step
> 
>      if ( v->evtchn_fifo->control_block )
>          return -EINVAL;
> 
> Re-setup is needed only when resuming in a new domain.

But the same guest will not crash when doing the same on a 4.12
hypervisor.


Juergen



* Re: Linux PV/PVH domU crash on (guest) resume from suspend
  2021-02-19 13:18         ` Jürgen Groß
@ 2021-02-19 13:37           ` Jan Beulich
  2021-02-19 13:41             ` Jürgen Groß
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2021-02-19 13:37 UTC
  To: Jürgen Groß; +Cc: xen-devel, Marek Marczykowski-Górecki

On 19.02.2021 14:18, Jürgen Groß wrote:
> On 19.02.21 14:10, Jan Beulich wrote:
>> On 19.02.2021 13:48, Jürgen Groß wrote:
>>> On 17.02.21 14:48, Marek Marczykowski-Górecki wrote:
>>>> On Wed, Feb 17, 2021 at 07:51:42AM +0100, Jürgen Groß wrote:
>>>>> On 17.02.21 06:12, Marek Marczykowski-Górecki wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'm observing Linux PV/PVH guest crash when I resume it from sleep. I do
>>>>>> this with:
>>>>>>
>>>>>>        virsh -c xen dompmsuspend <vmname> mem
>>>>>>        virsh -c xen dompmwakeup <vmname>
>>>>>>
>>>>>> But it's possible to trigger it with plain xl too:
>>>>>>
>>>>>>        xl save -c <vmname> <some-file>
>>>>>>
>>>>>> The same on HVM works fine.
>>>>>>
>>>>>> This is on Xen 4.14.1, and with guest kernel 5.4.90, the same happens
>>>>>> with 5.4.98. Dom0 kernel is the same, but I'm not sure if that's
>>>>>> relevant here. I can reliably reproduce it.
>>>>>
>>>>> This is already on my list of issues to look at.
>>>>>
>>>>> The problem seems to be related to the XSA-332 patches. You could try
>>>>> the patches I've sent out recently addressing other fallout from XSA-332
>>>>> which _might_ fix this issue, too:
>>>>>
>>>>> https://patchew.org/Xen/20210211101616.13788-1-jgross@suse.com/
>>>>
>>>> Thanks for the patches. Sadly it doesn't change anything - I get exactly
>>>> the same crash. I applied that on top of 5.11-rc7 (that's what I had
>>>> handy). If you think there may be a difference with the final 5.11 or
>>>> another branch, please let me know.
>>>>
>>>
>>> Some more tests reveal that this seems to be a hypervisor regression.
>>> I can reproduce the very same problem with a 4.12 kernel from 2019.
>>>
>>> It seems as if the EVTCHNOP_init_control hypercall is returning
>>> -EINVAL when the domain is continuing to run after the suspend
>>> hypercall (in contrast to the case where a new domain has been created
>>> when doing a "xl restore").
>>
>> But when you resume the same domain, the kernel isn't supposed to
>> call EVTCHNOP_init_control, as that's a one time operation (per
>> vCPU, and unless EVTCHNOP_reset was called of course). In the
>> hypervisor map_control_block() has (always had) as its first step
>>
>>      if ( v->evtchn_fifo->control_block )
>>          return -EINVAL;
>>
>> Re-setup is needed only when resuming in a new domain.
> 
> But the same guest will not crash when doing the same on a 4.12
> hypervisor.

Is the kernel perhaps not given the bit of information anymore that
it needs to tell apart the two resume modes?

Jan



* Re: Linux PV/PVH domU crash on (guest) resume from suspend
  2021-02-19 13:37           ` Jan Beulich
@ 2021-02-19 13:41             ` Jürgen Groß
  0 siblings, 0 replies; 9+ messages in thread
From: Jürgen Groß @ 2021-02-19 13:41 UTC
  To: Jan Beulich; +Cc: xen-devel, Marek Marczykowski-Górecki


On 19.02.21 14:37, Jan Beulich wrote:
> On 19.02.2021 14:18, Jürgen Groß wrote:
>> On 19.02.21 14:10, Jan Beulich wrote:
>>> On 19.02.2021 13:48, Jürgen Groß wrote:
>>>> On 17.02.21 14:48, Marek Marczykowski-Górecki wrote:
>>>>> On Wed, Feb 17, 2021 at 07:51:42AM +0100, Jürgen Groß wrote:
>>>>>> On 17.02.21 06:12, Marek Marczykowski-Górecki wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm observing Linux PV/PVH guest crash when I resume it from sleep. I do
>>>>>>> this with:
>>>>>>>
>>>>>>>         virsh -c xen dompmsuspend <vmname> mem
>>>>>>>         virsh -c xen dompmwakeup <vmname>
>>>>>>>
>>>>>>> But it's possible to trigger it with plain xl too:
>>>>>>>
>>>>>>>         xl save -c <vmname> <some-file>
>>>>>>>
>>>>>>> The same on HVM works fine.
>>>>>>>
>>>>>>> This is on Xen 4.14.1, and with guest kernel 5.4.90, the same happens
>>>>>>> with 5.4.98. Dom0 kernel is the same, but I'm not sure if that's
>>>>>>> relevant here. I can reliably reproduce it.
>>>>>>
>>>>>> This is already on my list of issues to look at.
>>>>>>
>>>>>> The problem seems to be related to the XSA-332 patches. You could try
>>>>>> the patches I've sent out recently addressing other fallout from XSA-332
>>>>>> which _might_ fix this issue, too:
>>>>>>
>>>>>> https://patchew.org/Xen/20210211101616.13788-1-jgross@suse.com/
>>>>>
>>>>> Thanks for the patches. Sadly it doesn't change anything - I get exactly
>>>>> the same crash. I applied that on top of 5.11-rc7 (that's what I had
>>>>> handy). If you think there may be a difference with the final 5.11 or
>>>>> another branch, please let me know.
>>>>>
>>>>
>>>> Some more tests reveal that this seems to be a hypervisor regression.
>>>> I can reproduce the very same problem with a 4.12 kernel from 2019.
>>>>
>>>> It seems as if the EVTCHNOP_init_control hypercall is returning
>>>> -EINVAL when the domain is continuing to run after the suspend
>>>> hypercall (in contrast to the case where a new domain has been created
>>>> when doing a "xl restore").
>>>
>>> But when you resume the same domain, the kernel isn't supposed to
>>> call EVTCHNOP_init_control, as that's a one time operation (per
>>> vCPU, and unless EVTCHNOP_reset was called of course). In the
>>> hypervisor map_control_block() has (always had) as its first step
>>>
>>>       if ( v->evtchn_fifo->control_block )
>>>           return -EINVAL;
>>>
>>> Re-setup is needed only when resuming in a new domain.
>>
>> But the same guest will not crash when doing the same on a 4.12
>> hypervisor.
> 
> Is the kernel perhaps not given the bit of information anymore that
> it needs to tell apart the two resume modes?

Ah, yes, this might be the problem.

EVTCHNOP_init_control is indeed used only if the suspend hypercall did
return 0.
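
To make the two resume modes concrete, here is a small sketch of the decision described in this thread. All names are hypothetical. It assumes that a zero return from the suspend hypercall means the guest now runs in a new domain, and that a nonzero return means the same domain continues.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy hypervisor-side state for one domain; not a real Xen structure. */
struct toy_domain {
    int control_block_mapped;   /* nonzero once the one-time setup has run */
};

/* Stand-in for the one-time init operation: fails if already set up. */
static int toy_init_control(struct toy_domain *d)
{
    assert(d != NULL);
    if (d->control_block_mapped)
        return -EINVAL;
    d->control_block_mapped = 1;
    return 0;
}

/* Guest-side resume step. suspend_ret is what the suspend hypercall
 * returned: 0 is assumed to mean "new domain", nonzero "same domain". */
static int toy_resume_events(struct toy_domain *d, int suspend_ret)
{
    if (suspend_ret != 0)
        return 0;               /* same domain: existing setup is still valid */
    return toy_init_control(d); /* new domain: redo the one-time setup */
}
```

If the guest sees 0 while it is in fact still running in the original domain, the model returns -EINVAL, which is the failure mode suspected above.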


Juergen
