* Fwd: [BUG] Windows is frozen after restore from snapshot
       [not found] <6237e102-f2cf-a66e-09b6-954ebfe28f8c@list.ru>
@ 2021-04-23 10:22 ` Sergey Kovalev
  2021-04-23 12:30   ` Jan Beulich
  2021-04-23 15:08   ` Roger Pau Monné
  0 siblings, 2 replies; 10+ messages in thread
From: Sergey Kovalev @ 2021-04-23 10:22 UTC (permalink / raw)
  To: xen-devel; +Cc: zaytsevgu

# Abstract

After `xl save win win.mem` followed by `xl restore win.hvm win.mem`,
the Windows 10 VM remains frozen for about a minute. After that
minute it becomes responsive.

During the freeze the OS remains semi-responsive: pressing
`Ctrl+Shift+Esc` makes the wait cursor (blue circle indicator) appear.

This is an intermittent fault that has been reproduced only twice.

# Technical notes

We noticed that no timer interrupts were delivered during
the freeze.

zaytsevgu@gmail.com debugged the saved Xen state file and
noticed that the HPET_TN_PERIODIC flag is clear in the state that
freezes and is set after the unfreeze.

Based on that he provided two Python scripts: one to check the
value and one to patch it.

Both "broken" state files we have were detected and patched
successfully.

# Other information

## Target machine

```bash
$ uname -a
Linux localhost 5.4.0-66-generic #74~18.04.2-Ubuntu SMP
Fri Feb 5 11:17:31 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
```

## Xen version

Built from source at tag RELEASE-4.12.4

## OS version

* Windows 10 build 1803 x64
* Hibernation, sleep and other power-saving features disabled with these PowerShell commands:
```
powercfg /hibernate off
powercfg /change standby-timeout-ac 0
powercfg /change standby-timeout-dc 0
powercfg /change monitor-timeout-ac 0
powercfg /change monitor-timeout-dc 0
powercfg /change disk-timeout-ac 0
powercfg /change disk-timeout-dc 0
```

## Configuration file

Built with envsubst from this template:

```
name = "$VM_NAME"
type = "hvm"

vcpus = 2
maxvcpus = 2

memory = 2048
maxmem = 2048

on_poweroff = "destroy"
on_reboot = "destroy"
on_watchdog = "destroy"
on_crash = "destroy"
on_soft_reset = "soft-reset"

nomigrate = 1

disk = [ "format=qcow2, vdev=hda, target=$VM_DISK_IMAGE_PATH" ]

vif = [ "type=ioemu, model=e1000" ]

hdtype = "ahci"

shadow_memory = 16

altp2m = "external"

viridian = [ "defaults" ]

videoram = 128
vga = "stdvga"

vnc = 1
vncunused = 1

soundhw = "hda"

usb = 1
usbdevice = [ "usb-tablet" ]
```
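
The rendering step itself is not shown in the report. As a rough sketch only, assuming the template uses just the `$VM_NAME` and `$VM_DISK_IMAGE_PATH` placeholders seen above, the same substitution could be done with Python's `string.Template` instead of `envsubst`. The function name `render_config` and the example domain name and disk path are made up for illustration.

```python
from string import Template

# Excerpt of the template above; only the lines with placeholders are shown.
TEMPLATE = Template(
    'name = "$VM_NAME"\n'
    'disk = [ "format=qcow2, vdev=hda, target=$VM_DISK_IMAGE_PATH" ]\n'
)


def render_config(vm_name: str, disk_image_path: str) -> str:
    """Fill in both placeholders; substitute() raises KeyError if one is missing."""
    return TEMPLATE.substitute(VM_NAME=vm_name,
                               VM_DISK_IMAGE_PATH=disk_image_path)


if __name__ == "__main__":
    # Hypothetical values, not taken from the report.
    print(render_config("win", "/var/lib/xen/images/win.qcow2"))
```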

## Check script

The script was provided by zaytsevgu@gmail.com
(with minor refactoring).

It checks whether an image is broken.

```python
#!/usr/bin/env python3


import logging
from pathlib import Path
import sys
import struct


def check_snapshot_hpet(snapshot: Path) -> bool:
     def get_b32(file):
         data = file.read(4)
         return struct.unpack('>L', data)[0]

     def get_l32(file):
         data = file.read(4)
         return struct.unpack('<L', data)[0]

     def get_l64(file):
         data = file.read(8)
         return struct.unpack('<Q', data)[0]

     def get_hpet_loc_by_tag9(file):
         while True:
             tag = get_l32(file)
             tlen = get_l32(file)
             if tag == 12:
                 break
             file.seek(tlen, 1)
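         _ = get_l64(file)  # capability
         # Skip res0, config, res1, isr, res2[25], mc64 and res3 (31 qwords)
         _ = [get_l64(file) for _ in range(31)]
         timer0_conf = get_l64(file)  # timers[0].config
         # Basic check: low byte 0x34 has HPET_TN_PERIODIC (0x8) clear
         if timer0_conf & 0xff == 0x34:
             return file.tell() - 8
         return None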

     def get_hpet(file):
         _ = get_l32(file)  # x1
         _ = get_l32(file)  # x2
         hdr = file.read(4)
         if hdr != b'XENF':
             return None
         _ = get_b32(file)  # version
         get_b32(file)
         get_b32(file)
         _ = get_l32(file)  # dmt
         _ = get_l32(file)  # page_shift
         _ = get_l32(file)  # xmj
         _ = get_l32(file)  # xmn

         while True:
             tag_type = get_l32(file)
             rlen = get_l32(file)
             if tag_type == 9:
                 break
             else:
                 file.seek(rlen, 1)
         return get_hpet_loc_by_tag9(file)

     # The context manager closes the file on every exit path,
     # including the error cases below.
     with open(snapshot, 'rb') as original:
         header = original.read(0x1000)
         xl_offset = header.index(b'LibxlFmt')
         original.seek(xl_offset)
         magic = original.read(8)
         if magic != b'LibxlFmt':
             logging.error('Invalid snapshot format')
             raise RuntimeError('Invalid snapshot format')

         _ = get_b32(original)  # version
         _ = get_b32(original)  # options
         record_type = get_l32(original)
         _ = get_l32(original)  # blen
         if record_type != 1:
             logging.error('Invalid snapshot record type')
             raise RuntimeError('Invalid snapshot record type')
         # A non-None offset means timers[0].config matched the "broken"
         # pattern, so the image is good only when nothing was found.
         return get_hpet(original) is None


if check_snapshot_hpet(Path(sys.argv[1])):
     print('The image is good! :)')
     sys.exit(0)
else:
     print('The image is so bad... :(')
     sys.exit(1)
```
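
To look at more than the one byte the check script tests, the leading part of the HPET record can be decoded in one go. This is a minimal sketch, assuming the layout implied by the script (one capability qword, 31 further qwords, then the timers) and by the field order of the dumps posted later in this thread. `parse_hpet_prefix` is a hypothetical helper, and `blob` is assumed to start right after the tag/length pair the script searches for.

```python
import struct

HPET_TIMER_NUM = 3   # the dumps in this thread show three timers
HEADER_QWORDS = 32   # capability .. res3 (1 + 31, as skipped by the check script)
TIMER_QWORDS = 4     # config, cmp, fsb, res4


def parse_hpet_prefix(blob: bytes) -> dict:
    """Decode the global registers and per-timer blocks (little-endian uint64)."""
    head = struct.unpack_from('<%dQ' % HEADER_QWORDS, blob, 0)
    timers = []
    for i in range(HPET_TIMER_NUM):
        offset = 8 * (HEADER_QWORDS + TIMER_QWORDS * i)
        config, cmp_, fsb, _res4 = struct.unpack_from('<4Q', blob, offset)
        timers.append({'config': config, 'cmp': cmp_, 'fsb': fsb})
    return {
        'capability': head[0],  # res0 is head[1]
        'config': head[2],      # res1 is head[3]
        'isr': head[4],         # res2[25] occupies head[5:30]
        'mc64': head[30],       # res3 is head[31]
        'timers': timers,
    }
```

The hidden `period[]` values are deliberately left out, because their offset depends on reserved padding that the thread does not spell out.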

The image can be fixed by toggling the HPET_TN_PERIODIC bit (0x8) in the
byte at `hpet_flag_byte_offset`:
```python
hpet_new = hpet[0] ^ 0x8
```

## Patch script

```python
import sys
import struct

def get_b32(file):
     data = file.read(4)
     return struct.unpack(">L", data)[0]

def get_l32(file):
     data = file.read(4)
     return struct.unpack("<L", data)[0]

def get_l64(file):
     data = file.read(8)
     return struct.unpack("<Q", data)[0]


def get_hpet_loc_by_tag9(file, rlen):
     while True:
         tag = get_l32(file)
         tlen = get_l32(file)
         if tag == 12:
             break
         file.seek(tlen, 1)
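     caps = get_l64(file)  # capability
     # Skip res0, config, res1, isr, res2[25], mc64 and res3 (31 qwords)
     for _ in range(31):
         get_l64(file)
     timer0_conf = get_l64(file)  # timers[0].config
     print(hex(timer0_conf))
     # Very crude check: low byte 0x34 has HPET_TN_PERIODIC (0x8) clear
     if timer0_conf & 0xff == 0x34:
         return file.tell() - 8
     return None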

def get_hpet(file):
     x1 = get_l32(file)
     x2 = get_l32(file)
     hdr = file.read(4)
     # print(hdr)
     if hdr != b"XENF":
         return None
     version = get_b32(file)
     get_b32(file)
     get_b32(file)
     dmt = get_l32(file)
     page_shift = get_l32(file)
     xmj = get_l32(file)
     xmn = get_l32(file)

     while True:
         tag_type = get_l32(file)
         # print(tag_type)
         rlen = get_l32(file)
         if tag_type == 9:
             break
         else:
             file.seek(rlen, 1)
     print("Found tag 9!")
     return get_hpet_loc_by_tag9(file, rlen)


original = open(sys.argv[1], "rb")
new = open(sys.argv[1]+".hpet_enable_periodic", "wb")

header = original.read(0x1000)
xl_offset = header.index(b"LibxlFmt")
print("Found offset to xl data: {:x}".format(xl_offset))
original.seek(xl_offset)
magic = original.read(8)
if magic != b"LibxlFmt":
     print("ERROR INVALID FORMAT")
else:
     version = get_b32(original)
     options = get_b32(original)
     record_type = get_l32(original)
     blen = get_l32(original)
     # print(record_type, blen)
     if record_type != 1:
         raise RuntimeError("unexpected libxl record type: {}".format(record_type))
     hpet_flag_byte_offset = get_hpet(original)
     if hpet_flag_byte_offset is not None:
         print("Got hpet timer flag!")
         file_size = 0
         original.seek(0, 2)
         file_size = original.tell()
         original.seek(0,0)
         pos = 0
         block_size = 4*1024*1024
         print(hex(hpet_flag_byte_offset))
         while pos != hpet_flag_byte_offset:
             if hpet_flag_byte_offset - pos < block_size:
                 block_size = hpet_flag_byte_offset - pos
             data = original.read(block_size)
             new.write(data)
             pos += block_size
         hpet = original.read(8)
         # print(hpet)
         hpet_new = hpet[0] ^ 0x8
         # print(hpet_new)
         new.write(bytes((hpet_new,)))
         new.write(hpet[1:])
         pos = pos + 8
         block_size = 4*1024*1024
         while pos != file_size:
             if file_size - pos < block_size:
                 block_size = file_size - pos
             data = original.read(block_size)
             new.write(data)
             pos += block_size
     else:
         print("can't find")
original.close()
new.close()
```
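
The script above copies the whole state file in order to change one byte. A shorter variant is sketched below, assuming it is acceptable to modify a copy of the state file in place and that the offset has already been found by the check logic. `set_periodic_flag` is a hypothetical helper, not part of the original scripts. It uses OR instead of XOR, so running it twice does not clear the flag again.

```python
HPET_TN_PERIODIC = 0x8  # bit 3 of the timer config register, as in the script above


def set_periodic_flag(path: str, offset: int) -> int:
    """Set HPET_TN_PERIODIC in the byte at `offset`; return the new byte value."""
    with open(path, 'r+b') as f:
        f.seek(offset)
        old = f.read(1)
        if len(old) != 1:
            raise ValueError('offset is past the end of the file')
        new = old[0] | HPET_TN_PERIODIC
        f.seek(offset)
        f.write(bytes((new,)))
    return new
```

Work on a copy of the state file, since the change is made in place and there is no backup step.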

-- 
With best regards,
Sergey Kovalev




* Re: Fwd: [BUG] Windows is frozen after restore from snapshot
  2021-04-23 10:22 ` Fwd: [BUG] Windows is frozen after restore from snapshot Sergey Kovalev
@ 2021-04-23 12:30   ` Jan Beulich
  2021-04-23 12:55     ` Sergey Kovalev
  2021-04-23 15:08   ` Roger Pau Monné
  1 sibling, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2021-04-23 12:30 UTC (permalink / raw)
  To: Sergey Kovalev; +Cc: zaytsevgu, xen-devel

On 23.04.2021 12:22, Sergey Kovalev wrote:
> # Abstract
> 
> After `xl save win win.mem` and then `xl restore win.hvm win.mem`
> the Windows 10 VM remain frozen for about a minute. After the
> minute it becomes responsive.
> 
> During the freeze the OS remains semi-responsive: on `Ctrl+Shift+Esc`
> press the wait cursor appears (blue circle indicator).
> 
> This is an intermittent fault been reproduced only twice.
> 
> # Technical notes
> 
> It have been noticed that there were no timer interrupts during
> the freeze.
> 
> zaytsevgu@gmail.com has debugged the received Xen state file and
> noticed that the flag HPET_TN_PERIODIC been set after unfreeze.
> 
> Based on that he provided two Python scripts: one to check the
> value and one to patch it.
> 
> Both "broken" state files we have been detected and patched
> successfully.

"Patched successfully" meaning the guest, when resumed using that
state, did not stall initially?

In any event, if HPET_TN_PERIODIC was set after unfreeze, it was
also set upon saving state. (Or are you suggesting the flag got
"magically" set?) In which case we can't go and clear it behind
the OS's back. So I suspect if there is a (rare) problem here,
it is likely connected to other parts of the HPET state. Since
you've taken apart saved state, could you supply the full set of
values (ideally multiple ones, if you happen to have them, plus
ones where the problem didn't occur, to perhaps allow someone
to spot a pattern)?

Jan



* Re: Fwd: [BUG] Windows is frozen after restore from snapshot
  2021-04-23 12:30   ` Jan Beulich
@ 2021-04-23 12:55     ` Sergey Kovalev
  2021-04-23 13:10       ` Георгий Зайцев
  0 siblings, 1 reply; 10+ messages in thread
From: Sergey Kovalev @ 2021-04-23 12:55 UTC (permalink / raw)
  To: Jan Beulich; +Cc: zaytsevgu, xen-devel

On 23.04.2021 15:30, Jan Beulich wrote:

 > "Patched successfully" meaning the guest, when resumed using that
 > state, did not stall initially?

Yes.

 > In any event, if HPET_TN_PERIODIC was set after unfreeze, it was
 > also set upon saving state. (Or are you suggesting the flag got
 > "magically" set?)
I understand that it is probably OS related, though I don't understand
how to prevent similar issues in the future.

 > Since
 > you've taken apart saved state, could you supply the full set of
 > values (ideally multiple ones, if you happen to have them, plus
 > ones where the problem didn't occur, to allow someone perhaps
 > spot a pattern)?
I could provide a Xen state file produced by `xl save`.
Would that be helpful? Where should I upload the file?

-- 
With best regards,
Sergey Kovalev



* Re: Fwd: [BUG] Windows is frozen after restore from snapshot
  2021-04-23 12:55     ` Sergey Kovalev
@ 2021-04-23 13:10       ` Георгий Зайцев
  2021-04-23 13:21         ` Jan Beulich
  0 siblings, 1 reply; 10+ messages in thread
From: Георгий Зайцев @ 2021-04-23 13:10 UTC (permalink / raw)
  To: Sergey Kovalev; +Cc: Jan Beulich, xen-devel


>
> Since
> you've taken apart saved state, could you supply the full set of
> values (ideally multiple ones, if you happen to have them, plus
> ones where the problem didn't occur, to allow someone perhaps
> spot a pattern)?
>

Here is the full HPET state from the "frozen" snapshot, laid out according
to the hvm_hw_hpet structure:

capability: f424008086a201
res0: 0
config: 3
res1: 0
isr: 0
res2: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0]
mc64: 97b90bc74
res3: 0
timer0:
        config: f0000000002934
        cmp: fd4aa84c
        fsb: 0
        res4: 0
timer1:
        config: f0000000000130
        cmp: ffffffff
        fsb: 0
        res4: 0
timer2:
        config: f0000000000130
        cmp: ffffffff
        fsb: 0
        res4: 0
period[0] = ee6b2
period[1] = 0
period[2] = 0

This one is taken from a snapshot of the "unfrozen" state:

capability: f424008086a201
res0: 0
config: 3
res1: 0
isr: 0
res2: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0]
mc64: acbd23c57
res3: 0
timer0:
        config: f000000000293c
        cmp: acbd3761b
        fsb: 0
        res4: 0
timer1:
        config: f0000000000130
        cmp: ffffffff
        fsb: 0
        res4: 0
timer2:
        config: f0000000000130
        cmp: ffffffff
        fsb: 0
        res4: 0
period[0] = ee6b2
period[1] = 0
period[2] = 0

The only difference is the HPET_TN_PERIODIC flag in the timers[0].config value.
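
The two timers[0].config values can be compared bit by bit. Here is a small sketch, assuming the bit positions of the timer configuration register from the HPET specification (the names below follow the `HPET_TN_*` style used in this thread, but the table itself is written from memory and should be checked against the specification). `decode_timer_config` is a hypothetical helper.

```python
# Assumed bit layout of the low 16 bits of an HPET timer config register.
HPET_TN_FLAGS = {
    0x0002: 'INT_TYPE_LEVEL',
    0x0004: 'ENABLE',
    0x0008: 'PERIODIC',
    0x0010: 'PERIODIC_CAP',
    0x0020: 'SIZE_CAP',
    0x0040: 'SETVAL',
    0x0100: '32BIT',
    0x4000: 'FSB',
    0x8000: 'FSB_CAP',
}
HPET_TN_ROUTE_MASK = 0x3e00
HPET_TN_ROUTE_SHIFT = 9


def decode_timer_config(config: int):
    """Return (sorted flag names, interrupt route) for a timer config value."""
    low = config & 0xffff
    names = [name for bit, name in sorted(HPET_TN_FLAGS.items()) if low & bit]
    route = (low & HPET_TN_ROUTE_MASK) >> HPET_TN_ROUTE_SHIFT
    return names, route


if __name__ == "__main__":
    frozen = 0xf0000000002934    # timers[0].config from the "frozen" dump
    unfrozen = 0xf000000000293c  # timers[0].config from the "unfrozen" dump
    print(decode_timer_config(frozen))
    print(decode_timer_config(unfrozen))
    print(hex(frozen ^ unfrozen))  # the bits that differ
```

Under these assumptions both values decode to an enabled timer in 32-bit mode, and the XOR isolates the single bit that changed.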



* Re: Fwd: [BUG] Windows is frozen after restore from snapshot
  2021-04-23 13:10       ` Георгий Зайцев
@ 2021-04-23 13:21         ` Jan Beulich
  2021-04-23 13:30           ` Георгий Зайцев
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2021-04-23 13:21 UTC (permalink / raw)
  To: Георгий Зайцев; +Cc: xen-devel, Sergey Kovalev

On 23.04.2021 15:10, Георгий Зайцев wrote:
>>
>> Since
>> you've taken apart saved state, could you supply the full set of
>> values (ideally multiple ones, if you happen to have them, plus
>> ones where the problem didn't occur, to allow someone perhaps
>> spot a pattern)?
>>
> 
> Here is full HPET state from "frozen" snapshot according to hvm_hw_hpet
> structure:
> 
> capabiliy: f424008086a201
> res0: 0
> config: 3
> res1: 0
> isr: 0
> res2: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0]
> mc64: 97b90bc74
> res3: 0
> timer0:
>         config: f0000000002934
>         cmp: fd4aa84c
>         fsb: 0
>         res4: 0
> timer1:
>         config: f0000000000130
>         cmp: ffffffff
>         fsb: 0
>         res4: 0
> timer2:
>         config: f0000000000130
>         cmp: ffffffff
>         fsb: 0
>         res4: 0
> period[0] = ee6b2
> period[1] = 0
> period[2] = 0
> 
> This one taken from snapshot of "unfrozen" one:
> 
> capabiliy: f424008086a201
> res0: 0
> config: 3
> res1: 0
> isr: 0
> res2: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0]
> mc64: acbd23c57
> res3: 0
> timer0:
>         config: f000000000293c
>         cmp: acbd3761b
>         fsb: 0
>         res4: 0
> timer1:
>         config: f0000000000130
>         cmp: ffffffff
>         fsb: 0
>         res4: 0
> timer2:
>         config: f0000000000130
>         cmp: ffffffff
>         fsb: 0
>         res4: 0
> period[0] = ee6b2
> period[1] = 0
> period[2] = 0
> 
> The only difference is HPET_TN_PERIODIC flag for timers[0].config value

Thanks, but now I'll need to understand what your quoted "frozen" and
"unfrozen" mean. Plus obviously comparators and main counter are also
different, and it's there where I suspect the issue is.

Jan



* Re: Fwd: [BUG] Windows is frozen after restore from snapshot
  2021-04-23 13:21         ` Jan Beulich
@ 2021-04-23 13:30           ` Георгий Зайцев
  2021-04-23 13:40             ` Jan Beulich
  0 siblings, 1 reply; 10+ messages in thread
From: Георгий Зайцев @ 2021-04-23 13:30 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Sergey Kovalev


> Thanks, but now I'll need to understand what your quoted "frozen" and
> "unfrozen" mean. Plus obviously comparators and main counter are also
> different, and it's there where I suspect the issue is

"frozen" is the initial snapshot, which takes from about 30 seconds to 1
minute after restore before timer interrupts start being dispatched to the
Windows guest.
"unfrozen" is the state taken after restoring the "frozen" one and
waiting 90 seconds, by which time the guest is receiving interrupts and
working as expected.

We also made some further snapshots (again after restoring from the initial
"frozen" one) while the system was still in the "frozen" state (about 20-30
seconds after the start of the restore process). In these snapshots the HPET
state is the same as in the initial "frozen" state except for the mc64 field:
capability: f424008086a201
res0: 0
config: 3
res1: 0
isr: 0
res2: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0]
mc64: 9bafb6e4e
res3: 0
timer0:
        config: f0000000002934
        cmp: fd4aa84c
        fsb: 0
        res4: 0
timer1:
        config: f0000000000130
        cmp: ffffffff
        fsb: 0
        res4: 0
timer2:
        config: f0000000000130
        cmp: ffffffff
        fsb: 0
        res4: 0
period[0] = ee6b2
period[1] = 0
period[2] = 0
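
As a back-of-the-envelope check on the comparator and main counter values in these dumps, the sketch below estimates how long the main counter needs to reach timer0's comparator. It rests on assumptions that are not confirmed anywhere in this thread: that the upper 32 bits of the capability register give the tick period in femtoseconds (as in the HPET specification), that timer0 runs in 32-bit mode so only the low 32 bits of mc64 are compared, and that a one-shot timer fires when those bits equal cmp. `seconds_until_match_32bit` is a hypothetical helper.

```python
FS_PER_SECOND = 10**15


def tick_period_fs(capability: int) -> int:
    """Tick period in femtoseconds (assumed: upper 32 bits of capability)."""
    return capability >> 32


def seconds_until_match_32bit(capability: int, mc64: int, cmp_: int) -> float:
    """Time until the low 32 bits of the main counter next equal cmp_."""
    ticks = (cmp_ - mc64) & 0xffffffff  # modular distance in 32-bit space
    return ticks * tick_period_fs(capability) / FS_PER_SECOND


if __name__ == "__main__":
    capability = 0xf424008086a201
    # Initial "frozen" dump, then the later snapshot taken during the freeze.
    print(seconds_until_match_32bit(capability, 0x97b90bc74, 0xfd4aa84c))
    print(seconds_until_match_32bit(capability, 0x9bafb6e4e, 0xfd4aa84c))
```

With the values posted here this gives roughly 35 seconds for the initial "frozen" dump and roughly 18 seconds for the later one. That is the same order of magnitude as the observed freeze, but it is only an observation under the stated assumptions and not a diagnosis.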




* Re: Fwd: [BUG] Windows is frozen after restore from snapshot
  2021-04-23 13:30           ` Георгий Зайцев
@ 2021-04-23 13:40             ` Jan Beulich
  0 siblings, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2021-04-23 13:40 UTC (permalink / raw)
  To: Георгий Зайцев; +Cc: xen-devel, Sergey Kovalev

On 23.04.2021 15:30, Георгий Зайцев wrote:
> Thanks, but now I'll need to understand what your quoted "frozen" and
>> "unfrozen" mean. Plus obviously comparators and main counter are also
>> different, and it's there where I suspect the issue is
> 
> "frozen" - this is initial snapshot which takes about from 30 seconds to 1
> minute after restore to start dispatching timer interrupts to windows guest
> "unfrozen" - this is state which taken after restoring "frozen" one and
> waiting 90 seconds when guest start receiving interrupts and starts working
> as expected

So I misunderstood Sergey's original mail - HPET_TN_PERIODIC is clear
immediately after restore, and becomes set some time later. That's
still nothing we can do behind the OS's back. If the OS has cleared
the bit, we need to keep it clear.

Jan





* Re: Fwd: [BUG] Windows is frozen after restore from snapshot
  2021-04-23 10:22 ` Fwd: [BUG] Windows is frozen after restore from snapshot Sergey Kovalev
  2021-04-23 12:30   ` Jan Beulich
@ 2021-04-23 15:08   ` Roger Pau Monné
  2021-04-23 16:19     ` Sergey Kovalev
  1 sibling, 1 reply; 10+ messages in thread
From: Roger Pau Monné @ 2021-04-23 15:08 UTC (permalink / raw)
  To: Sergey Kovalev; +Cc: xen-devel, zaytsevgu

On Fri, Apr 23, 2021 at 01:22:34PM +0300, Sergey Kovalev wrote:
> # Abstract
> 
> After `xl save win win.mem` and then `xl restore win.hvm win.mem`
> the Windows 10 VM remain frozen for about a minute. After the
> minute it becomes responsive.
> 
> During the freeze the OS remains semi-responsive: on `Ctrl+Shift+Esc`
> press the wait cursor appears (blue circle indicator).
> 
> This is an intermittent fault been reproduced only twice.
> 
> # Technical notes
> 
> It have been noticed that there were no timer interrupts during
> the freeze.
> 
> zaytsevgu@gmail.com has debugged the received Xen state file and
> noticed that the flag HPET_TN_PERIODIC been set after unfreeze.
> 
> Based on that he provided two Python scripts: one to check the
> value and one to patch it.
> 
> Both "broken" state files we have been detected and patched
> successfully.
> 
> # Other information
> 
> ## Target machine
> 
> ```bash
> $ uname -a
> Linux localhost 5.4.0-66-generic #74~18.04.2-Ubuntu SMP
> Fri Feb 5 11:17:31 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
> ```
> 
> ## Xen version
> 
> Build from source on tag RELEASE-4.12.4
> 
> ## OS version
> 
> * Windows 10 build 1803 x64

Do you also run other versions of Windows (in which case I assume
you have never seen the issue on those), or is this specific version
the only one that you use?

Thanks, Roger.



* Re: Fwd: [BUG] Windows is frozen after restore from snapshot
  2021-04-23 15:08   ` Roger Pau Monné
@ 2021-04-23 16:19     ` Sergey Kovalev
  2021-04-24  0:39       ` Tamas K Lengyel
  0 siblings, 1 reply; 10+ messages in thread
From: Sergey Kovalev @ 2021-04-23 16:19 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, zaytsevgu


On 23.04.2021 18:08, Roger Pau Monné wrote:
> On Fri, Apr 23, 2021 at 01:22:34PM +0300, Sergey Kovalev wrote:
>> # Abstract
>>
>> After `xl save win win.mem` and then `xl restore win.hvm win.mem`
>> the Windows 10 VM remain frozen for about a minute. After the
>> minute it becomes responsive.
>>
>> During the freeze the OS remains semi-responsive: on `Ctrl+Shift+Esc`
>> press the wait cursor appears (blue circle indicator).
>>
>> This is an intermittent fault been reproduced only twice.
>>
>> # Technical notes
>>
>> It have been noticed that there were no timer interrupts during
>> the freeze.
>>
>> zaytsevgu@gmail.com has debugged the received Xen state file and
>> noticed that the flag HPET_TN_PERIODIC been set after unfreeze.
>>
>> Based on that he provided two Python scripts: one to check the
>> value and one to patch it.
>>
>> Both "broken" state files we have been detected and patched
>> successfully.
>>
>> # Other information
>>
>> ## Target machine
>>
>> ```bash
>> $ uname -a
>> Linux localhost 5.4.0-66-generic #74~18.04.2-Ubuntu SMP
>> Fri Feb 5 11:17:31 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
>> ```
>>
>> ## Xen version
>>
>> Build from source on tag RELEASE-4.12.4
>>
>> ## OS version
>>
>> * Windows 10 build 1803 x64
> 
> Do you also run other versions of Windows, and in which case I assume
> you have never seen the issue on those, or it's this specific version
> the only that you use?
> 
> Thanks, Roger.
> 

We use Windows 7 SP1 x86/x64, Windows 8.1 Update 1 and
Windows 10 1803 x64.

Windows 10 is the only one affected by the bug at
this time.
-- 
With best regards,
Sergey Kovalev




* Re: Fwd: [BUG] Windows is frozen after restore from snapshot
  2021-04-23 16:19     ` Sergey Kovalev
@ 2021-04-24  0:39       ` Tamas K Lengyel
  0 siblings, 0 replies; 10+ messages in thread
From: Tamas K Lengyel @ 2021-04-24  0:39 UTC (permalink / raw)
  To: Sergey Kovalev; +Cc: Roger Pau Monné, Xen-devel, zaytsevgu

On Fri, Apr 23, 2021 at 12:19 PM Sergey Kovalev <valor@list.ru> wrote:
>
>
> On 23.04.2021 18:08, Roger Pau Monné wrote:
> > On Fri, Apr 23, 2021 at 01:22:34PM +0300, Sergey Kovalev wrote:
> >> # Abstract
> >>
> >> After `xl save win win.mem` and then `xl restore win.hvm win.mem`
> >> the Windows 10 VM remain frozen for about a minute. After the
> >> minute it becomes responsive.
> >>
> >> During the freeze the OS remains semi-responsive: on `Ctrl+Shift+Esc`
> >> press the wait cursor appears (blue circle indicator).
> >>
> >> This is an intermittent fault been reproduced only twice.
> >>
> >> # Technical notes
> >>
> >> It have been noticed that there were no timer interrupts during
> >> the freeze.
> >>
> >> zaytsevgu@gmail.com has debugged the received Xen state file and
> >> noticed that the flag HPET_TN_PERIODIC been set after unfreeze.
> >>
> >> Based on that he provided two Python scripts: one to check the
> >> value and one to patch it.
> >>
> >> Both "broken" state files we have been detected and patched
> >> successfully.
> >>
> >> # Other information
> >>
> >> ## Target machine
> >>
> >> ```bash
> >> $ uname -a
> >> Linux localhost 5.4.0-66-generic #74~18.04.2-Ubuntu SMP
> >> Fri Feb 5 11:17:31 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
> >> ```
> >>
> >> ## Xen version
> >>
> >> Build from source on tag RELEASE-4.12.4
> >>
> >> ## OS version
> >>
> >> * Windows 10 build 1803 x64
> >
> > Do you also run other versions of Windows, and in which case I assume
> > you have never seen the issue on those, or it's this specific version
> > the only that you use?
> >
> > Thanks, Roger.
> >
>
> We use Windows 7 SP1 x86/x64, Windows 8.1 update1 and
> Windows 10 1803 x64.
>
> The Windows 10 is the only one affected by the bug at
> the time.

I can confirm that I have run into this issue as well in the past, but
never had time to dig deeper into the root cause. I may add that with
snapshots of Windows 10 taken on Xen 4.14 or 4.15 and used for
restoring, I haven't seen it happen yet. The Win10 version didn't
change on my end, only the hypervisor got upgraded. So this may be a
bug that got fixed in newer Xen versions.

Tamas

