* Fwd: [BUG] Windows is frozen after restore from snapshot
[not found] <6237e102-f2cf-a66e-09b6-954ebfe28f8c@list.ru>
@ 2021-04-23 10:22 ` Sergey Kovalev
2021-04-23 12:30 ` Jan Beulich
2021-04-23 15:08 ` Roger Pau Monné
0 siblings, 2 replies; 10+ messages in thread
From: Sergey Kovalev @ 2021-04-23 10:22 UTC (permalink / raw)
To: xen-devel; +Cc: zaytsevgu
# Abstract
After `xl save win win.mem` followed by `xl restore win.hvm win.mem`,
the Windows 10 VM remains frozen for about a minute. After that
minute it becomes responsive.
During the freeze the OS remains semi-responsive: pressing
`Ctrl+Shift+Esc` makes the wait cursor (blue circle indicator) appear.
This is an intermittent fault that has been reproduced only twice.
# Technical notes
We noticed that no timer interrupts were delivered during
the freeze.
zaytsevgu@gmail.com has debugged the saved Xen state file and
noticed that the HPET_TN_PERIODIC flag is set after the guest unfreezes.
Based on that he provided two Python scripts: one to check the
value and one to patch it.
Both "broken" state files we have were detected and patched
successfully.
# Other information
## Target machine
```bash
$ uname -a
Linux localhost 5.4.0-66-generic #74~18.04.2-Ubuntu SMP
Fri Feb 5 11:17:31 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
```
## Xen version
Built from source at tag RELEASE-4.12.4
## OS version
* Windows 10 build 1803 x64
* Hibernation, sleep and other power-saving features are disabled with these PowerShell commands:
```
powercfg /hibernate off
powercfg /change standby-timeout-ac 0
powercfg /change standby-timeout-dc 0
powercfg /change monitor-timeout-ac 0
powercfg /change monitor-timeout-dc 0
powercfg /change disk-timeout-ac 0
powercfg /change disk-timeout-dc 0
```
## Configuration file
Built with envsubst from a template:
```
name = "$VM_NAME"
type = "hvm"
vcpus = 2
maxvcpus = 2
memory = 2048
maxmem = 2048
on_poweroff = "destroy"
on_reboot = "destroy"
on_watchdog = "destroy"
on_crash = "destroy"
on_soft_reset = "soft-reset"
nomigrate = 1
disk = [ "format=qcow2, vdev=hda, target=$VM_DISK_IMAGE_PATH" ]
vif = [ "type=ioemu, model=e1000" ]
hdtype = "ahci"
shadow_memory = 16
altp2m = "external"
viridian = [ "defaults" ]
videoram = 128
vga = "stdvga"
vnc = 1
vncunused = 1
soundhw = "hda"
usb = 1
usbdevice = [ "usb-tablet" ]
```
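For readers without envsubst at hand, the substitution step can be sketched in Python. This is only an illustration: the function name `render_config`, the shortened template and the example values are assumptions, not part of the original setup.

```python
from string import Template

# Shortened, hypothetical excerpt of the template; the real one is the
# full configuration file shown above.
TEMPLATE = '''name = "$VM_NAME"
disk = [ "format=qcow2, vdev=hda, target=$VM_DISK_IMAGE_PATH" ]
'''


def render_config(template_text: str, variables: dict) -> str:
    """Substitute $NAME placeholders; raises KeyError if one is missing."""
    return Template(template_text).substitute(variables)


# Example values are made up for illustration.
print(render_config(TEMPLATE, {
    "VM_NAME": "win",
    "VM_DISK_IMAGE_PATH": "/tmp/win.qcow2",
}))
```

`Template.substitute` raises on a missing variable, so an unset `VM_NAME` or `VM_DISK_IMAGE_PATH` is caught before the config reaches `xl`.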
## Check script
The script was provided by zaytsevgu@gmail.com
(with a little refactoring).
It checks whether the image is broken.
```python
#!/usr/bin/env python3
import logging
from pathlib import Path
import sys
import struct


def check_snapshot_hpet(snapshot: Path) -> bool:
    """Return True if the snapshot looks good, False if it looks broken."""

    def get_b32(file):
        # Big-endian 32-bit unsigned integer
        data = file.read(4)
        return struct.unpack('>L', data)[0]

    def get_l32(file):
        # Little-endian 32-bit unsigned integer
        data = file.read(4)
        return struct.unpack('<L', data)[0]

    def get_l64(file):
        # Little-endian 64-bit unsigned integer
        data = file.read(8)
        return struct.unpack('<Q', data)[0]

    def get_hpet_loc_by_tag9(file):
        # Walk the entries inside record 9 until tag 12 (the HPET entry)
        while True:
            tag = get_l32(file)
            tlen = get_l32(file)
            if tag == 12:
                break
            file.seek(tlen, 1)
        _ = get_l64(file)  # caps
        # Skip the 31 words between the capability and timer 0 config
        _ = [get_l64(file) for i in range(31)]
        timer0_conf = get_l64(file)
        # Basic check: low byte 0x34 means HPET_TN_PERIODIC (0x8) is clear
        if timer0_conf & 0xff == 0x34:
            return file.tell() - 8
        return None

    def get_hpet(file):
        _ = get_l32(file)  # x1
        _ = get_l32(file)  # x2
        hdr = file.read(4)
        if hdr != b'XENF':
            return None
        _ = get_b32(file)  # version
        get_b32(file)
        get_b32(file)
        _ = get_l32(file)  # dmt
        _ = get_l32(file)  # page_shift
        _ = get_l32(file)  # xmj
        _ = get_l32(file)  # xmn
        # Skip records until record type 9 is found
        while True:
            tag_type = get_l32(file)
            rlen = get_l32(file)
            if tag_type == 9:
                break
            else:
                file.seek(rlen, 1)
        return get_hpet_loc_by_tag9(file)

    with open(snapshot, 'rb') as original:
        header = original.read(0x1000)
        xl_offset = header.index(b'LibxlFmt')
        original.seek(xl_offset)
        magic = original.read(8)
        if magic != b'LibxlFmt':
            logging.error('Invalid snapshot format')
            raise RuntimeError('Invalid snapshot format')
        _ = get_b32(original)  # version
        _ = get_b32(original)  # options
        record_type = get_l32(original)
        _ = get_l32(original)  # blen
        if record_type != 1:
            logging.error('Invalid snapshot record type')
            raise RuntimeError('Invalid snapshot record type')
        hpet_flag_byte_offset = get_hpet(original)
    # An offset is returned only when the "broken" pattern was matched
    return hpet_flag_byte_offset is None


if __name__ == '__main__':
    if check_snapshot_hpet(Path(sys.argv[1])):
        print('The image is good! :)')
        sys.exit(0)
    else:
        print('The image is so bad... :(')
        sys.exit(1)
```
The image can be fixed by flipping the HPET_TN_PERIODIC bit (0x8)
in the byte at `hpet_flag_byte_offset`:
```python
hpet_new = hpet[0] ^ 0x8
```
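As an illustration of that one-byte change, here is a minimal sketch that flips the bit in place instead of copying the whole file. The helper name `toggle_periodic` is hypothetical, and it assumes the offset found by the check script has been made available to the caller; it is best run on a copy of the state file.

```python
HPET_TN_PERIODIC = 0x8  # bit 3 of the timer configuration register


def toggle_periodic(path, offset: int) -> int:
    """Flip HPET_TN_PERIODIC in the byte at `offset`; return the new byte."""
    with open(path, "r+b") as f:
        f.seek(offset)
        old = f.read(1)
        if len(old) != 1:
            raise ValueError("offset is past the end of the file")
        new = old[0] ^ HPET_TN_PERIODIC
        f.seek(offset)
        f.write(bytes((new,)))
    return new
```

Because XOR toggles the bit, calling the helper twice restores the original byte, so it should only be applied to files the check script reports as broken.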
## Patch script
```python
import sys
import struct


def get_b32(file):
    # Big-endian 32-bit unsigned integer
    data = file.read(4)
    return struct.unpack(">L", data)[0]


def get_l32(file):
    # Little-endian 32-bit unsigned integer
    data = file.read(4)
    return struct.unpack("<L", data)[0]


def get_l64(file):
    # Little-endian 64-bit unsigned integer
    data = file.read(8)
    return struct.unpack("<Q", data)[0]


def get_hpet_loc_by_tag9(file, rlen):
    # Walk the entries inside record 9 until tag 12 (the HPET entry)
    while True:
        tag = get_l32(file)
        tlen = get_l32(file)
        if tag == 12:
            break
        file.seek(tlen, 1)
    caps = get_l64(file)
    # Skip the 31 words between the capability and timer 0 config
    [get_l64(file) for i in range(31)]
    timer0_conf = get_l64(file)
    print(hex(timer0_conf))
    if timer0_conf & 0xff == 0x34:  # VERY DUMMY CHECK
        return file.tell() - 8
    return None


def get_hpet(file):
    x1 = get_l32(file)
    x2 = get_l32(file)
    hdr = file.read(4)
    # print(hdr)
    if hdr != b"XENF":
        return None
    version = get_b32(file)
    get_b32(file)
    get_b32(file)
    dmt = get_l32(file)
    page_shift = get_l32(file)
    xmj = get_l32(file)
    xmn = get_l32(file)
    # Skip records until record type 9 is found
    while True:
        tag_type = get_l32(file)
        # print(tag_type)
        rlen = get_l32(file)
        if tag_type == 9:
            break
        else:
            file.seek(rlen, 1)
    print("Found tag 9!")
    return get_hpet_loc_by_tag9(file, rlen)


original = open(sys.argv[1], "rb")
new = open(sys.argv[1] + ".hpet_enable_periodic", "wb")
header = original.read(0x1000)
xl_offset = header.index(b"LibxlFmt")
print("Found offset to xl data: {:x}".format(xl_offset))
original.seek(xl_offset)
magic = original.read(8)
if magic != b"LibxlFmt":
    print("ERROR INVALID FORMAT")
else:
    version = get_b32(original)
    options = get_b32(original)
    record_type = get_l32(original)
    blen = get_l32(original)
    # print(record_type, blen)
    if record_type != 1:
        raise RuntimeError("unexpected record type")
    hpet_flag_byte_offset = get_hpet(original)
    if hpet_flag_byte_offset is not None:
        print("Got hpet timer flag!")
        original.seek(0, 2)
        file_size = original.tell()
        original.seek(0, 0)
        pos = 0
        block_size = 4 * 1024 * 1024
        print(hex(hpet_flag_byte_offset))
        # Copy everything before the timer 0 configuration unchanged
        while pos != hpet_flag_byte_offset:
            if hpet_flag_byte_offset - pos < block_size:
                block_size = hpet_flag_byte_offset - pos
            data = original.read(block_size)
            new.write(data)
            pos += block_size
        hpet = original.read(8)
        # print(hpet)
        # Flip HPET_TN_PERIODIC (0x8) in the low byte of the configuration
        hpet_new = hpet[0] ^ 0x8
        # print(hpet_new)
        new.write(bytes((hpet_new,)))
        new.write(hpet[1:])
        pos = pos + 8
        block_size = 4 * 1024 * 1024
        # Copy the rest of the file unchanged
        while pos != file_size:
            if file_size - pos < block_size:
                block_size = file_size - pos
            data = original.read(block_size)
            new.write(data)
            pos += block_size
    else:
        print("can't find")
original.close()
new.close()
```
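Both scripts skip 31 64-bit words after the capability register to reach the timer 0 configuration. The sketch below makes that offset explicit by unpacking the leading part of the HPET record in one go. The field order is taken from the HPET dumps later in this thread; the function name `parse_hpet_prefix` and the choice to stop after the third timer are assumptions made for illustration.

```python
import struct

# capability, res0, config, res1, isr (5 words), res2[25], mc64, res3,
# then 3 timers of 4 words each: 32 + 12 = 44 little-endian 64-bit words.
_PREFIX = struct.Struct("<44Q")


def parse_hpet_prefix(blob: bytes) -> dict:
    """Decode the leading HPET fields from a raw record (layout assumed)."""
    words = _PREFIX.unpack_from(blob)
    timers = []
    for n in range(3):
        config, cmp_, fsb, _res4 = words[32 + 4 * n: 36 + 4 * n]
        timers.append({"config": config, "cmp": cmp_, "fsb": fsb})
    return {
        "capability": words[0],
        "config": words[2],
        "isr": words[4],
        "mc64": words[30],
        "timers": timers,
    }
```

Word 32 is the timer 0 configuration, which matches reading the capability and then skipping 31 words as the scripts do.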
--
With best regards,
Sergey Kovalev
* Re: Fwd: [BUG] Windows is frozen after restore from snapshot
2021-04-23 10:22 ` Fwd: [BUG] Windows is frozen after restore from snapshot Sergey Kovalev
@ 2021-04-23 12:30 ` Jan Beulich
2021-04-23 12:55 ` Sergey Kovalev
2021-04-23 15:08 ` Roger Pau Monné
1 sibling, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2021-04-23 12:30 UTC (permalink / raw)
To: Sergey Kovalev; +Cc: zaytsevgu, xen-devel
On 23.04.2021 12:22, Sergey Kovalev wrote:
> # Abstract
>
> After `xl save win win.mem` and then `xl restore win.hvm win.mem`
> the Windows 10 VM remain frozen for about a minute. After the
> minute it becomes responsive.
>
> During the freeze the OS remains semi-responsive: on `Ctrl+Shift+Esc`
> press the wait cursor appears (blue circle indicator).
>
> This is an intermittent fault been reproduced only twice.
>
> # Technical notes
>
> It have been noticed that there were no timer interrupts during
> the freeze.
>
> zaytsevgu@gmail.com has debugged the received Xen state file and
> noticed that the flag HPET_TN_PERIODIC been set after unfreeze.
>
> Based on that he provided two Python scripts: one to check the
> value and one to patch it.
>
> Both "broken" state files we have been detected and patched
> successfully.
"Patched successfully" meaning the guest, when resumed using that
state, did not stall initially?
In any event, if HPET_TN_PERIODIC was set after unfreeze, it was
also set upon saving state. (Or are you suggesting the flag got
"magically" set?) In which case we can't go and clear it behind
the OS's back. So I suspect if there is a (rare) problem here,
it is likely connected to other parts of the HPET state. Since
you've taken apart saved state, could you supply the full set of
values (ideally multiple ones, if you happen to have them, plus
ones where the problem didn't occur, to allow someone to perhaps
spot a pattern)?
Jan
* Re: Fwd: [BUG] Windows is frozen after restore from snapshot
2021-04-23 12:30 ` Jan Beulich
@ 2021-04-23 12:55 ` Sergey Kovalev
2021-04-23 13:10 ` Георгий Зайцев
0 siblings, 1 reply; 10+ messages in thread
From: Sergey Kovalev @ 2021-04-23 12:55 UTC (permalink / raw)
To: Jan Beulich; +Cc: zaytsevgu, xen-devel
On 23.04.2021 15:30, Jan Beulich wrote:
> "Patched successfully" meaning the guest, when resumed using that
> state, did not stall initially?
Yes.
> In any event, if HPET_TN_PERIODIC was set after unfreeze, it was
> also set upon saving state. (Or are you suggesting the flag got
> "magically" set?)
I understand that it is probably OS-related, though I don't understand
how to prevent similar issues in the future.
> Since
> you've taken apart saved state, could you supply the full set of
> values (ideally multiple ones, if you happen to have them, plus
> ones where the problem didn't occur, to allow someone perhaps
> spot a pattern)?
I could provide a Xen state file obtained with `xl save`.
Would that be helpful? Where should I upload the file?
--
With best regards,
Sergey Kovalev
* Re: Fwd: [BUG] Windows is frozen after restore from snapshot
2021-04-23 12:55 ` Sergey Kovalev
@ 2021-04-23 13:10 ` Георгий Зайцев
2021-04-23 13:21 ` Jan Beulich
0 siblings, 1 reply; 10+ messages in thread
From: Георгий Зайцев @ 2021-04-23 13:10 UTC (permalink / raw)
To: Sergey Kovalev; +Cc: Jan Beulich, xen-devel
>
> Since
> you've taken apart saved state, could you supply the full set of
> values (ideally multiple ones, if you happen to have them, plus
> ones where the problem didn't occur, to allow someone perhaps
> spot a pattern)?
>
Here is the full HPET state from the "frozen" snapshot, laid out according
to the hvm_hw_hpet structure (values are hexadecimal):
capability: f424008086a201
res0: 0
config: 3
res1: 0
isr: 0
res2: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
mc64: 97b90bc74
res3: 0
timer0:
  config: f0000000002934
  cmp: fd4aa84c
  fsb: 0
  res4: 0
timer1:
  config: f0000000000130
  cmp: ffffffff
  fsb: 0
  res4: 0
timer2:
  config: f0000000000130
  cmp: ffffffff
  fsb: 0
  res4: 0
period[0] = ee6b2
period[1] = 0
period[2] = 0
This one is taken from a snapshot of the "unfrozen" state:
capability: f424008086a201
res0: 0
config: 3
res1: 0
isr: 0
res2: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
mc64: acbd23c57
res3: 0
timer0:
  config: f000000000293c
  cmp: acbd3761b
  fsb: 0
  res4: 0
timer1:
  config: f0000000000130
  cmp: ffffffff
  fsb: 0
  res4: 0
timer2:
  config: f0000000000130
  cmp: ffffffff
  fsb: 0
  res4: 0
period[0] = ee6b2
period[1] = 0
period[2] = 0
The only difference is the HPET_TN_PERIODIC flag in the timers[0].config value.
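To make the comparison of the two timers[0].config values easier to follow, here is a small decoding sketch. The bit positions follow the timer configuration register layout of the HPET specification; the flag descriptions and the function name `decode_tn_config` are illustrative choices, not identifiers from Xen.

```python
# Low configuration bits of an HPET timer (descriptions are informal).
HPET_TN_FLAGS = {
    0x002: "level-triggered",
    0x004: "interrupt enabled",
    0x008: "periodic (HPET_TN_PERIODIC)",
    0x010: "periodic capable",
    0x020: "64-bit capable",
    0x040: "value set",
    0x100: "32-bit mode",
}


def decode_tn_config(config: int) -> dict:
    """Split a timer config value into set flags and the interrupt route."""
    flags = [name for bit, name in HPET_TN_FLAGS.items() if config & bit]
    route = (config >> 9) & 0x1f  # bits 9..13 select the interrupt route
    return {"flags": flags, "route": route}


frozen = 0xf0000000002934    # timers[0].config in the "frozen" snapshot
unfrozen = 0xf000000000293c  # timers[0].config in the "unfrozen" snapshot

print(hex(frozen ^ unfrozen))  # bits that differ between the two values
print(decode_tn_config(frozen))
print(decode_tn_config(unfrozen))
```

The XOR of the two values isolates the differing bit, and the decoded flag lists show that timer 0 is enabled and in 32-bit mode in both snapshots.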
* Re: Fwd: [BUG] Windows is frozen after restore from snapshot
2021-04-23 13:10 ` Георгий Зайцев
@ 2021-04-23 13:21 ` Jan Beulich
2021-04-23 13:30 ` Георгий Зайцев
0 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2021-04-23 13:21 UTC (permalink / raw)
To: Георгий Зайцев; +Cc: xen-devel, Sergey Kovalev
On 23.04.2021 15:10, Георгий Зайцев wrote:
>>
>> Since
>> you've taken apart saved state, could you supply the full set of
>> values (ideally multiple ones, if you happen to have them, plus
>> ones where the problem didn't occur, to allow someone perhaps
>> spot a pattern)?
>>
>
> Here is full HPET state from "frozen" snapshot according to hvm_hw_hpet
> structure:
>
> capabiliy: f424008086a201
> res0: 0
> config: 3
> res1: 0
> isr: 0
> res2: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0]
> mc64: 97b90bc74
> res3: 0
> timer0:
> config: f0000000002934
> cmp: fd4aa84c
> fsb: 0
> res4: 0
> timer1:
> config: f0000000000130
> cmp: ffffffff
> fsb: 0
> res4: 0
> timer2:
> config: f0000000000130
> cmp: ffffffff
> fsb: 0
> res4: 0
> period[0] = ee6b2
> period[1] = 0
> period[2] = 0
>
> This one taken from snapshot of "unfrozen" one:
>
> capabiliy: f424008086a201
> res0: 0
> config: 3
> res1: 0
> isr: 0
> res2: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0]
> mc64: acbd23c57
> res3: 0
> timer0:
> config: f000000000293c
> cmp: acbd3761b
> fsb: 0
> res4: 0
> timer1:
> config: f0000000000130
> cmp: ffffffff
> fsb: 0
> res4: 0
> timer2:
> config: f0000000000130
> cmp: ffffffff
> fsb: 0
> res4: 0
> period[0] = ee6b2
> period[1] = 0
> period[2] = 0
>
> The only difference is HPET_TN_PERIODIC flag for timers[0].config value
Thanks, but now I'll need to understand what your quoted "frozen" and
"unfrozen" mean. Plus obviously comparators and main counter are also
different, and it's there where I suspect the issue is.
Jan
* Re: Fwd: [BUG] Windows is frozen after restore from snapshot
2021-04-23 13:21 ` Jan Beulich
@ 2021-04-23 13:30 ` Георгий Зайцев
2021-04-23 13:40 ` Jan Beulich
0 siblings, 1 reply; 10+ messages in thread
From: Георгий Зайцев @ 2021-04-23 13:30 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel, Sergey Kovalev
> Thanks, but now I'll need to understand what your quoted "frozen" and
> "unfrozen" mean. Plus obviously comparators and main counter are also
> different, and it's there where I suspect the issue is
"frozen" - this is the initial snapshot. After it is restored, it takes
from about 30 seconds to 1 minute before timer interrupts start being
dispatched to the Windows guest.
"unfrozen" - this is the state saved after restoring the "frozen" snapshot
and waiting 90 seconds, by which time the guest is receiving interrupts
and works as expected.
We also took some further snapshots (again after restoring from the initial
"frozen" one) while the system was still in the "frozen" state (about 20-30
seconds after the start of the restore process). In these snapshots the HPET
state stays the same as in the initial "frozen" state except for the mc64 field:
capability: f424008086a201
res0: 0
config: 3
res1: 0
isr: 0
res2: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
mc64: 9bafb6e4e
res3: 0
timer0:
  config: f0000000002934
  cmp: fd4aa84c
  fsb: 0
  res4: 0
timer1:
  config: f0000000000130
  cmp: ffffffff
  fsb: 0
  res4: 0
timer2:
  config: f0000000000130
  cmp: ffffffff
  fsb: 0
  res4: 0
period[0] = ee6b2
period[1] = 0
period[2] = 0
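Since the main counter and comparator values are the part of the state under suspicion, it may help to turn them into time units. The sketch below is only arithmetic on the dumped values. It assumes that the upper 32 bits of the capability register hold the counter period in femtoseconds (as in the HPET specification) and that timer 0 compares against the low 32 bits of the main counter, because its config has the 32-bit mode bit set. The function names are hypothetical.

```python
FS_PER_SECOND = 10**15


def ticks_until_match(mc64: int, cmp32: int) -> int:
    """Ticks until the low 32 bits of the main counter reach cmp32."""
    return (cmp32 - mc64) & 0xffffffff


def ticks_to_seconds(ticks: int, capability: int) -> float:
    period_fs = capability >> 32  # counter clock period in femtoseconds
    return ticks * period_fs / FS_PER_SECOND


capability = 0xf424008086a201
mc64_frozen = 0x97b90bc74   # main counter in the initial "frozen" snapshot
timer0_cmp = 0xfd4aa84c     # timer 0 comparator in the same snapshot
period0 = 0xee6b2           # period[0] from the dump

ticks = ticks_until_match(mc64_frozen, timer0_cmp)
print("ticks until comparator match:", ticks)
print("seconds until comparator match:", ticks_to_seconds(ticks, capability))
print("period[0] in seconds:", ticks_to_seconds(period0, capability))
```

With the values from the "frozen" snapshot the comparator is roughly 35 seconds ahead of the main counter, while period[0] corresponds to about 15.6 ms. That is the same order of magnitude as the observed freeze, but it is an observation about the numbers, not a confirmed root cause, and it says nothing about how Xen rearms the timer after restore.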
* Re: Fwd: [BUG] Windows is frozen after restore from snapshot
2021-04-23 13:30 ` Георгий Зайцев
@ 2021-04-23 13:40 ` Jan Beulich
0 siblings, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2021-04-23 13:40 UTC (permalink / raw)
To: Георгий Зайцев; +Cc: xen-devel, Sergey Kovalev
On 23.04.2021 15:30, Георгий Зайцев wrote:
> Thanks, but now I'll need to understand what your quoted "frozen" and
>> "unfrozen" mean. Plus obviously comparators and main counter are also
>> different, and it's there where I suspect the issue is
>
> "frozen" - this is initial snapshot which takes about from 30 seconds to 1
> minute after restore to start dispatching timer interrupts to windows guest
> "unfrozen" - this is state which taken after restoring "frozen" one and
> waiting 90 seconds when guest start receiving interrupts and starts working
> as expected
So I misunderstood Sergey's original mail - HPET_TN_PERIODIC is clear
immediately after restore, and becomes set some time later. That's
still nothing we can do behind the OS's back. If the OS has cleared
the bit, we need to keep it clear.
Jan
* Re: Fwd: [BUG] Windows is frozen after restore from snapshot
2021-04-23 10:22 ` Fwd: [BUG] Windows is frozen after restore from snapshot Sergey Kovalev
2021-04-23 12:30 ` Jan Beulich
@ 2021-04-23 15:08 ` Roger Pau Monné
2021-04-23 16:19 ` Sergey Kovalev
1 sibling, 1 reply; 10+ messages in thread
From: Roger Pau Monné @ 2021-04-23 15:08 UTC (permalink / raw)
To: Sergey Kovalev; +Cc: xen-devel, zaytsevgu
On Fri, Apr 23, 2021 at 01:22:34PM +0300, Sergey Kovalev wrote:
> # Abstract
>
> After `xl save win win.mem` and then `xl restore win.hvm win.mem`
> the Windows 10 VM remain frozen for about a minute. After the
> minute it becomes responsive.
>
> During the freeze the OS remains semi-responsive: on `Ctrl+Shift+Esc`
> press the wait cursor appears (blue circle indicator).
>
> This is an intermittent fault been reproduced only twice.
>
> # Technical notes
>
> It have been noticed that there were no timer interrupts during
> the freeze.
>
> zaytsevgu@gmail.com has debugged the received Xen state file and
> noticed that the flag HPET_TN_PERIODIC been set after unfreeze.
>
> Based on that he provided two Python scripts: one to check the
> value and one to patch it.
>
> Both "broken" state files we have been detected and patched
> successfully.
>
> # Other information
>
> ## Target machine
>
> ```bash
> $ uname -a
> Linux localhost 5.4.0-66-generic #74~18.04.2-Ubuntu SMP
> Fri Feb 5 11:17:31 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
> ```
>
> ## Xen version
>
> Build from source on tag RELEASE-4.12.4
>
> ## OS version
>
> * Windows 10 build 1803 x64
Do you also run other versions of Windows (in which case I assume
you have never seen the issue on those), or is this specific version
the only one that you use?
Thanks, Roger.
* Re: Fwd: [BUG] Windows is frozen after restore from snapshot
2021-04-23 15:08 ` Roger Pau Monné
@ 2021-04-23 16:19 ` Sergey Kovalev
2021-04-24 0:39 ` Tamas K Lengyel
0 siblings, 1 reply; 10+ messages in thread
From: Sergey Kovalev @ 2021-04-23 16:19 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: xen-devel, zaytsevgu
On 23.04.2021 18:08, Roger Pau Monné wrote:
> On Fri, Apr 23, 2021 at 01:22:34PM +0300, Sergey Kovalev wrote:
>> # Abstract
>>
>> After `xl save win win.mem` and then `xl restore win.hvm win.mem`
>> the Windows 10 VM remain frozen for about a minute. After the
>> minute it becomes responsive.
>>
>> During the freeze the OS remains semi-responsive: on `Ctrl+Shift+Esc`
>> press the wait cursor appears (blue circle indicator).
>>
>> This is an intermittent fault been reproduced only twice.
>>
>> # Technical notes
>>
>> It have been noticed that there were no timer interrupts during
>> the freeze.
>>
>> zaytsevgu@gmail.com has debugged the received Xen state file and
>> noticed that the flag HPET_TN_PERIODIC been set after unfreeze.
>>
>> Based on that he provided two Python scripts: one to check the
>> value and one to patch it.
>>
>> Both "broken" state files we have been detected and patched
>> successfully.
>>
>> # Other information
>>
>> ## Target machine
>>
>> ```bash
>> $ uname -a
>> Linux localhost 5.4.0-66-generic #74~18.04.2-Ubuntu SMP
>> Fri Feb 5 11:17:31 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
>> ```
>>
>> ## Xen version
>>
>> Build from source on tag RELEASE-4.12.4
>>
>> ## OS version
>>
>> * Windows 10 build 1803 x64
>
> Do you also run other versions of Windows, and in which case I assume
> you have never seen the issue on those, or it's this specific version
> the only that you use?
>
> Thanks, Roger.
>
We use Windows 7 SP1 x86/x64, Windows 8.1 Update 1 and
Windows 10 1803 x64.
Windows 10 is the only one affected by the bug at
this time.
--
With best regards,
Sergey Kovalev
* Re: Fwd: [BUG] Windows is frozen after restore from snapshot
2021-04-23 16:19 ` Sergey Kovalev
@ 2021-04-24 0:39 ` Tamas K Lengyel
0 siblings, 0 replies; 10+ messages in thread
From: Tamas K Lengyel @ 2021-04-24 0:39 UTC (permalink / raw)
To: Sergey Kovalev; +Cc: Roger Pau Monné, Xen-devel, zaytsevgu
On Fri, Apr 23, 2021 at 12:19 PM Sergey Kovalev <valor@list.ru> wrote:
>
>
> 23.04.2021 18:08, Roger Pau Monné пишет:
> > On Fri, Apr 23, 2021 at 01:22:34PM +0300, Sergey Kovalev wrote:
> >> # Abstract
> >>
> >> After `xl save win win.mem` and then `xl restore win.hvm win.mem`
> >> the Windows 10 VM remain frozen for about a minute. After the
> >> minute it becomes responsive.
> >>
> >> During the freeze the OS remains semi-responsive: on `Ctrl+Shift+Esc`
> >> press the wait cursor appears (blue circle indicator).
> >>
> >> This is an intermittent fault been reproduced only twice.
> >>
> >> # Technical notes
> >>
> >> It have been noticed that there were no timer interrupts during
> >> the freeze.
> >>
> >> zaytsevgu@gmail.com has debugged the received Xen state file and
> >> noticed that the flag HPET_TN_PERIODIC been set after unfreeze.
> >>
> >> Based on that he provided two Python scripts: one to check the
> >> value and one to patch it.
> >>
> >> Both "broken" state files we have been detected and patched
> >> successfully.
> >>
> >> # Other information
> >>
> >> ## Target machine
> >>
> >> ```bash
> >> $ uname -a
> >> Linux localhost 5.4.0-66-generic #74~18.04.2-Ubuntu SMP
> >> Fri Feb 5 11:17:31 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
> >> ```
> >>
> >> ## Xen version
> >>
> >> Build from source on tag RELEASE-4.12.4
> >>
> >> ## OS version
> >>
> >> * Windows 10 build 1803 x64
> >
> > Do you also run other versions of Windows, and in which case I assume
> > you have never seen the issue on those, or it's this specific version
> > the only that you use?
> >
> > Thanks, Roger.
> >
>
> We use Windows 7 SP1 x86/x64, Windows 8.1 update1 and
> Windows 10 1803 x64.
>
> The Windows 10 is the only one affected by the bug at
> the time.
I can confirm that I have run into this issue as well in the past, but
never had time to dig deeper into the root cause. I may add that with
snapshots taken of Windows 10 on Xen 4.14 or 4.15 and using those for
restoring, I haven't seen it happen yet. The Win10 version didn't
change on my end, only the hypervisor got upgraded. So this may be a
bug that got fixed in newer Xen versions.
Tamas