* Fwd: [BUG] Windows is frozen after restore from snapshot
From: Sergey Kovalev @ 2021-04-23 10:22 UTC
To: xen-devel; +Cc: zaytsevgu

# Abstract

After `xl save win win.mem` and then `xl restore win.hvm win.mem`, the Windows 10 VM remains frozen for about a minute. After that minute it becomes responsive.

During the freeze the OS remains semi-responsive: when `Ctrl+Shift+Esc` is pressed, the wait cursor (blue circle indicator) appears.

This is an intermittent fault that has been reproduced only twice.

# Technical notes

It has been noticed that there were no timer interrupts during the freeze.

zaytsevgu@gmail.com debugged the resulting Xen state file and noticed that the HPET_TN_PERIODIC flag was set after the unfreeze.

Based on that, he provided two Python scripts: one to check the value and one to patch it.

Both "broken" state files we have were detected and patched successfully.
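HPET_TN_PERIODIC is bit 3 (0x008) of an HPET timer's configuration register: when it is clear the timer runs in one-shot mode, when it is set the timer is periodic. The sketch below decodes such a register into flag names. The bit values are those defined by the IA-PC HPET specification and used by the HPET_TN_* macros in Xen and Linux; `decode_timer_config` is only an illustrative helper, and the example inputs are the timers[0].config values posted later in this thread.

```python
from typing import List, Tuple

# Per-timer configuration bits (Tn_CONF_CAP register), named after the
# HPET_TN_* macros; values follow the IA-PC HPET specification.
HPET_TN_FLAGS = (
    (0x0002, "HPET_TN_LEVEL"),         # level-triggered (clear = edge)
    (0x0004, "HPET_TN_ENABLE"),        # interrupt enabled
    (0x0008, "HPET_TN_PERIODIC"),      # periodic mode (clear = one-shot)
    (0x0010, "HPET_TN_PERIODIC_CAP"),  # periodic mode supported
    (0x0020, "HPET_TN_64BIT_CAP"),     # 64-bit comparator supported
    (0x0040, "HPET_TN_SETVAL"),        # allow setting the periodic accumulator
    (0x0100, "HPET_TN_32BIT"),         # force 32-bit mode
    (0x4000, "HPET_TN_FSB"),           # FSB (MSI-style) delivery enabled
    (0x8000, "HPET_TN_FSB_CAP"),       # FSB delivery supported
)
HPET_TN_ROUTE = 0x3E00  # bits 13:9, I/O APIC input for the interrupt


def decode_timer_config(config: int) -> Tuple[List[str], int]:
    """Return (names of the set flag bits, I/O APIC route) for a timer config."""
    flags = [name for bit, name in HPET_TN_FLAGS if config & bit]
    route = (config & HPET_TN_ROUTE) >> 9
    return flags, route


if __name__ == "__main__":
    # timers[0].config from the "frozen" and "unfrozen" dumps in this thread
    for label, config in (("frozen", 0xF0000000002934),
                          ("unfrozen", 0xF000000000293C)):
        flags, route = decode_timer_config(config)
        print(f"{label:8} route={route} {' '.join(flags)}")
```

For those two values the decoded flags differ only in HPET_TN_PERIODIC; both also have HPET_TN_32BIT set.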
# Other information

## Target machine

```bash
$ uname -a
Linux localhost 5.4.0-66-generic #74~18.04.2-Ubuntu SMP Fri Feb 5 11:17:31 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
```

## Xen version

Built from source at tag RELEASE-4.12.4.

## OS version

* Windows 10 build 1803 x64
* Hibernation, sleep and other power timeouts disabled with the following PowerShell commands:

```
powercfg /hibernate off
powercfg /change standby-timeout-ac 0
powercfg /change standby-timeout-dc 0
powercfg /change monitor-timeout-ac 0
powercfg /change monitor-timeout-dc 0
powercfg /change disk-timeout-ac 0
powercfg /change disk-timeout-dc 0
```

## Configuration file

Built with envsubst from a template:

```
name = "$VM_NAME"
type = "hvm"
vcpus = 2
maxvcpus = 2
memory = 2048
maxmem = 2048
on_poweroff = "destroy"
on_reboot = "destroy"
on_watchdog = "destroy"
on_crash = "destroy"
on_soft_reset = "soft-reset"
nomigrate = 1
disk = [ "format=qcow2, vdev=hda, target=$VM_DISK_IMAGE_PATH" ]
vif = [ "type=ioemu, model=e1000" ]
hdtype = "ahci"
shadow_memory = 16
altp2m = "external"
viridian = [ "defaults" ]
videoram = 128
vga = "stdvga"
vnc = 1
vncunused = 1
soundhw = "hda"
usb = 1
usbdevice = [ "usb-tablet" ]
```

## Check script

The script was provided by zaytsevgu@gmail.com (with a little refactoring). It checks whether an image is broken.
```python
#!/usr/bin/env python3
"""Check whether HPET timer 0 in an `xl save` image is in one-shot mode."""
import logging
import struct
import sys
from pathlib import Path

HVM_CONTEXT_RECORD = 9  # libxc stream record type carrying the HVM context
HPET_SAVE_CODE = 12     # HVM save descriptor type code of the HPET entry


def check_snapshot_hpet(snapshot: Path) -> bool:
    def get_b32(file):
        return struct.unpack('>L', file.read(4))[0]

    def get_l32(file):
        return struct.unpack('<L', file.read(4))[0]

    def get_l64(file):
        return struct.unpack('<Q', file.read(8))[0]

    def get_hpet_loc_by_tag9(file):
        # Walk the HVM save descriptors until the HPET entry is found
        while True:
            tag = get_l32(file)
            tlen = get_l32(file)
            if tag == HPET_SAVE_CODE:
                break
            file.seek(tlen, 1)
        _ = get_l64(file)  # capability
        _ = [get_l64(file) for _ in range(31)]  # res0 .. res3
        timer0_conf = get_l64(file)  # timers[0].config
        # Basic check: low byte 0x34 means HPET_TN_PERIODIC (0x8) is clear
        if timer0_conf & 0xff == 0x34:
            return file.tell() - 8
        return None

    def get_hpet(file):
        _ = get_l32(file)  # x1 (first half of the 64-bit marker)
        _ = get_l32(file)  # x2 (second half of the 64-bit marker)
        if file.read(4) != b'XENF':
            return None
        _ = get_b32(file)  # version
        get_b32(file)      # options + reserved (8 bytes in total)
        get_b32(file)
        _ = get_l32(file)  # dmt
        _ = get_l32(file)  # page_shift
        _ = get_l32(file)  # xmj
        _ = get_l32(file)  # xmn
        while True:
            tag_type = get_l32(file)
            rlen = get_l32(file)
            if tag_type == HVM_CONTEXT_RECORD:
                break
            # Record bodies are padded to a multiple of 8 bytes
            file.seek((rlen + 7) & ~7, 1)
        return get_hpet_loc_by_tag9(file)

    with open(snapshot, 'rb') as original:
        header = original.read(0x1000)
        xl_offset = header.find(b'LibxlFmt')
        if xl_offset < 0:
            logging.error('Invalid snapshot format')
            raise RuntimeError('LibxlFmt header not found')
        original.seek(xl_offset + 8)  # skip the magic
        _ = get_b32(original)  # version
        _ = get_b32(original)  # options
        record_type = get_l32(original)
        _ = get_l32(original)  # blen
        if record_type != 1:
            logging.error('Invalid snapshot record type')
            raise RuntimeError('unexpected libxl record type')
        hpet_flag_byte_offset = get_hpet(original)
    # The image is good unless timer 0 was found in one-shot mode
    return hpet_flag_byte_offset is None


if __name__ == '__main__':
    if check_snapshot_hpet(Path(sys.argv[1])):
        print('The image is good! :)')
        sys.exit(0)
    else:
        print('The image is so bad... :(')
        sys.exit(1)
```

The image can be fixed with a small addition applied to the byte at `hpet_flag_byte_offset` (XOR with 0x8 toggles bit 3, HPET_TN_PERIODIC):

```python
hpet_new = hpet[0] ^ 0x8
```

## Patch script

```python
#!/usr/bin/env python3
"""Write a copy of an `xl save` image with HPET_TN_PERIODIC toggled for timer 0."""
import struct
import sys

BLOCK_SIZE = 4 * 1024 * 1024


def get_b32(file):
    data = file.read(4)
    return struct.unpack(">L", data)[0]


def get_l32(file):
    data = file.read(4)
    return struct.unpack("<L", data)[0]


def get_l64(file):
    data = file.read(8)
    return struct.unpack("<Q", data)[0]


def get_hpet_loc_by_tag9(file):
    # Walk the HVM save descriptors until the HPET entry (type code 12)
    while True:
        tag = get_l32(file)
        tlen = get_l32(file)
        if tag == 12:
            break
        file.seek(tlen, 1)
    _ = get_l64(file)  # capability
    [get_l64(file) for i in range(31)]  # res0 .. res3
    timer0_conf = get_l64(file)  # timers[0].config
    print(hex(timer0_conf))
    if timer0_conf & 0xff == 0x34:  # VERY DUMMY CHECK
        return file.tell() - 8
    return None


def get_hpet(file):
    x1 = get_l32(file)
    x2 = get_l32(file)
    hdr = file.read(4)
    if hdr != b"XENF":
        return None
    version = get_b32(file)
    get_b32(file)
    get_b32(file)
    dmt = get_l32(file)
    page_shift = get_l32(file)
    xmj = get_l32(file)
    xmn = get_l32(file)
    while True:
        tag_type = get_l32(file)
        rlen = get_l32(file)
        if tag_type == 9:
            break
        # Record bodies are padded to a multiple of 8 bytes
        file.seek((rlen + 7) & ~7, 1)
    print("Found tag 9!")
    return get_hpet_loc_by_tag9(file)


with open(sys.argv[1], "rb") as original, \
        open(sys.argv[1] + ".hpet_enable_periodic", "wb") as new:
    header = original.read(0x1000)
    xl_offset = header.index(b"LibxlFmt")
    print("Found offset to xl data: {:x}".format(xl_offset))
    original.seek(xl_offset)
    if original.read(8) != b"LibxlFmt":
        sys.exit("ERROR INVALID FORMAT")
    version = get_b32(original)
    options = get_b32(original)
    record_type = get_l32(original)
    blen = get_l32(original)
    if record_type != 1:
        sys.exit("ERROR: unexpected libxl record type {}".format(record_type))
    hpet_flag_byte_offset = get_hpet(original)
    if hpet_flag_byte_offset is None:
        sys.exit("can't find")
    print("Got hpet timer flag!")
    print(hex(hpet_flag_byte_offset))
    original.seek(0, 2)
    file_size = original.tell()
    original.seek(0, 0)

    # Copy everything before timers[0].config in 4 MiB blocks
    pos = 0
    while pos != hpet_flag_byte_offset:
        block_size = min(BLOCK_SIZE, hpet_flag_byte_offset - pos)
        new.write(original.read(block_size))
        pos += block_size

    # Toggle HPET_TN_PERIODIC (0x8) in the low byte of timers[0].config
    hpet = original.read(8)
    new.write(bytes((hpet[0] ^ 0x8,)))
    new.write(hpet[1:])
    pos += 8

    # Copy the rest of the file
    while pos != file_size:
        block_size = min(BLOCK_SIZE, file_size - pos)
        new.write(original.read(block_size))
        pos += block_size
```

-- 
With best regards,
Sergey Kovalev
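The chunked copy loop in the patch script only exists to rewrite a single byte. A minimal alternative sketch, assuming the offset has already been located by `get_hpet()`: copy the image with `shutil.copyfile` and patch the copy in place. `patch_periodic_flag` is a hypothetical helper name, and unlike the script it sets the bit with OR instead of toggling it with XOR, so running it twice cannot clear the flag again.

```python
import shutil

HPET_TN_PERIODIC = 0x008  # bit 3 of the timer configuration register


def patch_periodic_flag(src: str, dst: str, offset: int) -> int:
    """Copy src to dst, then set HPET_TN_PERIODIC in the byte at offset.

    offset is assumed to point at timers[0].config (as returned by
    get_hpet()); the stream is little-endian, so bit 3 is in the first
    byte. Returns the new value of that byte.
    """
    shutil.copyfile(src, dst)  # the source image is left untouched
    with open(dst, "r+b") as f:
        f.seek(offset)
        old = f.read(1)
        if len(old) != 1:
            raise ValueError(f"offset {offset:#x} is past the end of {dst}")
        new = old[0] | HPET_TN_PERIODIC  # OR rather than XOR: idempotent
        f.seek(offset)
        f.write(bytes((new,)))
    return new
```

For example, `patch_periodic_flag("win.mem", "win.mem.hpet_enable_periodic", offset)` would produce the same kind of output file as the script, for an image where the flag is clear.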
* Re: Fwd: [BUG] Windows is frozen after restore from snapshot
From: Jan Beulich @ 2021-04-23 12:30 UTC
To: Sergey Kovalev; +Cc: zaytsevgu, xen-devel

On 23.04.2021 12:22, Sergey Kovalev wrote:
> [...]
> zaytsevgu@gmail.com has debugged the received Xen state file and
> noticed that the flag HPET_TN_PERIODIC been set after unfreeze.
>
> Based on that he provided two Python scripts: one to check the
> value and one to patch it.
>
> Both "broken" state files we have been detected and patched
> successfully.

"Patched successfully" meaning the guest, when resumed using that state, did not stall initially?

In any event, if HPET_TN_PERIODIC was set after unfreeze, it was also set upon saving state. (Or are you suggesting the flag got "magically" set?) In which case we can't go and clear it behind the OS's back. So I suspect if there is a (rare) problem here, it is likely connected to other parts of the HPET state. Since you've taken apart saved state, could you supply the full set of values (ideally multiple ones, if you happen to have them, plus ones where the problem didn't occur, to allow someone perhaps spot a pattern)?

Jan
* Re: Fwd: [BUG] Windows is frozen after restore from snapshot
From: Sergey Kovalev @ 2021-04-23 12:55 UTC
To: Jan Beulich; +Cc: zaytsevgu, xen-devel

23.04.2021 15:30, Jan Beulich wrote:
> "Patched successfully" meaning the guest, when resumed using that
> state, did not stall initially?

Yes.

> In any event, if HPET_TN_PERIODIC was set after unfreeze, it was
> also set upon saving state. (Or are you suggesting the flag got
> "magically" set?)

I understand that it is probably OS-related, though I don't understand how to prevent similar issues in the future.

> Since
> you've taken apart saved state, could you supply the full set of
> values (ideally multiple ones, if you happen to have them, plus
> ones where the problem didn't occur, to allow someone perhaps
> spot a pattern)?

I could provide a Xen state file obtained with `xl save`. Would that be helpful? Where should I store the file?

-- 
With best regards,
Sergey Kovalev
* Re: Fwd: [BUG] Windows is frozen after restore from snapshot
From: Георгий Зайцев @ 2021-04-23 13:10 UTC
To: Sergey Kovalev; +Cc: Jan Beulich, xen-devel

> Since
> you've taken apart saved state, could you supply the full set of
> values (ideally multiple ones, if you happen to have them, plus
> ones where the problem didn't occur, to allow someone perhaps
> spot a pattern)?

Here is the full HPET state from the "frozen" snapshot, according to the hvm_hw_hpet structure:

capability: f424008086a201
res0: 0
config: 3
res1: 0
isr: 0
res2: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
mc64: 97b90bc74
res3: 0
timer0:
  config: f0000000002934
  cmp: fd4aa84c
  fsb: 0
  res4: 0
timer1:
  config: f0000000000130
  cmp: ffffffff
  fsb: 0
  res4: 0
timer2:
  config: f0000000000130
  cmp: ffffffff
  fsb: 0
  res4: 0
period[0] = ee6b2
period[1] = 0
period[2] = 0

And this one is taken from the snapshot of the "unfrozen" one:

capability: f424008086a201
res0: 0
config: 3
res1: 0
isr: 0
res2: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
mc64: acbd23c57
res3: 0
timer0:
  config: f000000000293c
  cmp: acbd3761b
  fsb: 0
  res4: 0
timer1:
  config: f0000000000130
  cmp: ffffffff
  fsb: 0
  res4: 0
timer2:
  config: f0000000000130
  cmp: ffffffff
  fsb: 0
  res4: 0
period[0] = ee6b2
period[1] = 0
period[2] = 0

The only difference is the HPET_TN_PERIODIC flag in the timers[0].config value.
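A field-by-field comparison of the two dumps makes it easier to see exactly what changed between the snapshots. A minimal sketch: the values are transcribed from the dumps above (the all-zero res* fields are left out), and the flat dict is only a convenient stand-in for the hvm_hw_hpet structure, not a parser for it.

```python
from typing import Dict, List, Tuple

# Non-reserved hvm_hw_hpet fields, transcribed from the "frozen" dump
FROZEN: Dict[str, int] = {
    "capability": 0xF424008086A201, "config": 0x3, "isr": 0x0,
    "mc64": 0x97B90BC74,
    "timers[0].config": 0xF0000000002934, "timers[0].cmp": 0xFD4AA84C,
    "timers[0].fsb": 0x0,
    "timers[1].config": 0xF0000000000130, "timers[1].cmp": 0xFFFFFFFF,
    "timers[1].fsb": 0x0,
    "timers[2].config": 0xF0000000000130, "timers[2].cmp": 0xFFFFFFFF,
    "timers[2].fsb": 0x0,
    "period[0]": 0xEE6B2, "period[1]": 0x0, "period[2]": 0x0,
}

# The same fields, transcribed from the "unfrozen" dump
UNFROZEN: Dict[str, int] = {
    "capability": 0xF424008086A201, "config": 0x3, "isr": 0x0,
    "mc64": 0xACBD23C57,
    "timers[0].config": 0xF000000000293C, "timers[0].cmp": 0xACBD3761B,
    "timers[0].fsb": 0x0,
    "timers[1].config": 0xF0000000000130, "timers[1].cmp": 0xFFFFFFFF,
    "timers[1].fsb": 0x0,
    "timers[2].config": 0xF0000000000130, "timers[2].cmp": 0xFFFFFFFF,
    "timers[2].fsb": 0x0,
    "period[0]": 0xEE6B2, "period[1]": 0x0, "period[2]": 0x0,
}


def diff_states(a: Dict[str, int], b: Dict[str, int]) -> List[Tuple[str, int, int]]:
    """Return (field, value in a, value in b) for every field that differs."""
    return [(name, a[name], b[name]) for name in a if a[name] != b[name]]


if __name__ == "__main__":
    for name, old, new in diff_states(FROZEN, UNFROZEN):
        print(f"{name:18} {old:#x} -> {new:#x} (xor {old ^ new:#x})")
```

With these values it reports three changed fields: mc64, timers[0].config (XOR 0x8, i.e. HPET_TN_PERIODIC) and timers[0].cmp.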
* Re: Fwd: [BUG] Windows is frozen after restore from snapshot
From: Jan Beulich @ 2021-04-23 13:21 UTC
To: Георгий Зайцев
Cc: xen-devel, Sergey Kovalev

On 23.04.2021 15:10, Георгий Зайцев wrote:
> [...]
> The only difference is HPET_TN_PERIODIC flag for timers[0].config value

Thanks, but now I'll need to understand what your quoted "frozen" and "unfrozen" mean. Plus obviously comparators and main counter are also different, and it's there where I suspect the issue is.

Jan
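To compare counter and comparator values in time rather than in raw ticks, the tick length is needed. Per the IA-PC HPET specification, bits 63:32 of the general capabilities register (`capability` in the dumps) give the main-counter tick period in femtoseconds. The sketch below assumes that, and also assumes `period[]` is stored in main-counter ticks; the helper names are illustrative only.

```python
FS_PER_SECOND = 10**15


def tick_period_fs(capability: int) -> int:
    """Main-counter tick period in fs (COUNTER_CLK_PERIOD, bits 63:32)."""
    return capability >> 32


def ticks_to_seconds(ticks: int, capability: int) -> float:
    """Convert a number of main-counter ticks into seconds."""
    return ticks * tick_period_fs(capability) / FS_PER_SECOND


if __name__ == "__main__":
    capability = 0xF424008086A201  # identical in all posted dumps
    print(f"tick: {tick_period_fs(capability)} fs")
    print(f"period[0]: {ticks_to_seconds(0xEE6B2, capability) * 1e3:.3f} ms")
    # main counter advance between the "frozen" and "unfrozen" snapshots
    delta = 0xACBD23C57 - 0x97B90BC74
    print(f"mc64 delta: {ticks_to_seconds(delta, capability):.1f} s")
```

With the posted capability value the tick is 16,000,000 fs (16 ns, i.e. 62.5 MHz). period[0] then corresponds to about 15.625 ms (a 64 Hz tick, commonly cited as the default Windows clock interval), and the main counter advanced by about 90.3 s between the two snapshots, which fits the 90-second wait described in the next reply.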
* Re: Fwd: [BUG] Windows is frozen after restore from snapshot
From: Георгий Зайцев @ 2021-04-23 13:30 UTC
To: Jan Beulich; +Cc: xen-devel, Sergey Kovalev

> Thanks, but now I'll need to understand what your quoted "frozen" and
> "unfrozen" mean. Plus obviously comparators and main counter are also
> different, and it's there where I suspect the issue is

"frozen": this is the initial snapshot; after it is restored, it takes from about 30 seconds to 1 minute before timer interrupts start being dispatched to the Windows guest.

"unfrozen": this is the state saved after restoring the "frozen" one and waiting 90 seconds, by which time the guest has started receiving interrupts and works as expected.

We also took some other snapshots (again after restoring the initial "frozen" one) while the system was still in the "frozen" state (about 20-30 seconds from the start of the restore process). In these snapshots the HPET state stays the same as in the initial "frozen" state, except for the mc64 field:

capability: f424008086a201
res0: 0
config: 3
res1: 0
isr: 0
res2: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
mc64: 9bafb6e4e
res3: 0
timer0:
  config: f0000000002934
  cmp: fd4aa84c
  fsb: 0
  res4: 0
timer1:
  config: f0000000000130
  cmp: ffffffff
  fsb: 0
  res4: 0
timer2:
  config: f0000000000130
  cmp: ffffffff
  fsb: 0
  res4: 0
period[0] = ee6b2
period[1] = 0
period[2] = 0

On Fri, 23 Apr 2021 at 16:21, Jan Beulich <jbeulich@suse.com> wrote:
> [...]
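Because HPET_TN_32BIT (0x100) is set in timers[0].config, the HPET specification has the comparator matched only against the low 32 bits of the main counter; that Xen's emulation handles this case the same way is an assumption here. Under that assumption, the time remaining until the one-shot timer fires can be estimated from the posted "frozen" and mid-freeze values, using the 16 ns tick implied by the capability register. The helper names are illustrative only.

```python
MASK32 = 0xFFFF_FFFF
TICK_FS = 16_000_000  # tick period implied by the posted capability value (16 ns)


def ticks_until_match_32bit(mc64: int, cmp: int) -> int:
    """Ticks until the low 32 bits of the main counter next equal cmp.

    Assumes 32-bit comparator mode, i.e. a match on mc64 mod 2**32.
    """
    return (cmp - (mc64 & MASK32)) & MASK32


def ticks_to_seconds(ticks: int) -> float:
    """Convert main-counter ticks to seconds using the 16 ns tick."""
    return ticks * TICK_FS / 10**15


if __name__ == "__main__":
    cmp0 = 0xFD4AA84C  # timers[0].cmp, unchanged while "frozen"
    for label, mc64 in (("frozen", 0x97B90BC74), ("mid-freeze", 0x9BAFB6E4E)):
        ticks = ticks_until_match_32bit(mc64, cmp0)
        print(f"{label:10} {ticks:#x} ticks = {ticks_to_seconds(ticks):.1f} s")
```

This gives about 34.8 s from the "frozen" snapshot and about 17.8 s from the mid-freeze one, in the same range as the reported 30 s to 1 min freeze. It is only arithmetic on the posted numbers, not a diagnosis of the bug.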
* Re: Fwd: [BUG] Windows is frozen after restore from snapshot
From: Jan Beulich @ 2021-04-23 13:40 UTC
To: Георгий Зайцев
Cc: xen-devel, Sergey Kovalev

On 23.04.2021 15:30, Георгий Зайцев wrote:
>> Thanks, but now I'll need to understand what your quoted "frozen" and
>> "unfrozen" mean. Plus obviously comparators and main counter are also
>> different, and it's there where I suspect the issue is
>
> "frozen" - this is initial snapshot which takes about from 30 seconds to 1
> minute after restore to start dispatching timer interrupts to windows guest
> "unfrozen" - this is state which taken after restoring "frozen" one and
> waiting 90 seconds when guest start receiving interrupts and starts working
> as expected

So I misunderstood Sergey's original mail: HPET_TN_PERIODIC is clear immediately after restore, and becomes set some time later. That's still nothing we can do behind the OS's back. If the OS has cleared the bit, we need to keep it clear.

Jan
* Re: Fwd: [BUG] Windows is frozen after restore from snapshot
From: Roger Pau Monné @ 2021-04-23 15:08 UTC
To: Sergey Kovalev; +Cc: xen-devel, zaytsevgu

On Fri, Apr 23, 2021 at 01:22:34PM +0300, Sergey Kovalev wrote:
> [...]
> ## OS version
>
> * Windows 10 build 1803 x64

Do you also run other versions of Windows (in which case I assume you have never seen the issue on those), or is this specific version the only one you use?

Thanks, Roger.
* Re: Fwd: [BUG] Windows is frozen after restore from snapshot
From: Sergey Kovalev @ 2021-04-23 16:19 UTC
To: Roger Pau Monné; +Cc: xen-devel, zaytsevgu

23.04.2021 18:08, Roger Pau Monné wrote:
> On Fri, Apr 23, 2021 at 01:22:34PM +0300, Sergey Kovalev wrote:
>> [...]
>> * Windows 10 build 1803 x64
>
> Do you also run other versions of Windows, and in which case I assume
> you have never seen the issue on those, or it's this specific version
> the only that you use?
>
> Thanks, Roger.

We use Windows 7 SP1 x86/x64, Windows 8.1 Update 1 and Windows 10 1803 x64.

So far, Windows 10 is the only one affected by the bug.

-- 
With best regards,
Sergey Kovalev
* Re: Fwd: [BUG] Windows is frozen after restore from snapshot
From: Tamas K Lengyel @ 2021-04-24 0:39 UTC
To: Sergey Kovalev; +Cc: Roger Pau Monné, Xen-devel, zaytsevgu

On Fri, Apr 23, 2021 at 12:19 PM Sergey Kovalev <valor@list.ru> wrote:
> [...]
> We use Windows 7 SP1 x86/x64, Windows 8.1 update1 and
> Windows 10 1803 x64.
>
> The Windows 10 is the only one affected by the bug at
> the time.

I can confirm that I have run into this issue as well in the past, but never had time to dig deeper into the root cause. I may add that I haven't seen it happen yet with snapshots of Windows 10 taken on Xen 4.14 or 4.15 and then used for restoring. The Win10 version didn't change on my end; only the hypervisor got upgraded. So this may be a bug that got fixed in newer Xen versions.

Tamas
Thread overview: 10+ messages
2021-04-23 10:22 Fwd: [BUG] Windows is frozen after restore from snapshot Sergey Kovalev
2021-04-23 12:30 ` Jan Beulich
2021-04-23 12:55   ` Sergey Kovalev
2021-04-23 13:10     ` Георгий Зайцев
2021-04-23 13:21       ` Jan Beulich
2021-04-23 13:30         ` Георгий Зайцев
2021-04-23 13:40           ` Jan Beulich
2021-04-23 15:08 ` Roger Pau Monné
2021-04-23 16:19   ` Sergey Kovalev
2021-04-24  0:39     ` Tamas K Lengyel