* [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
@ 2015-09-21 20:03 Andreas Sundstrom
  2015-09-22  7:22 ` Andrew Cooper
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Andreas Sundstrom @ 2015-09-21 20:03 UTC
  To: xen-devel

This is using Debian Jessie and grub 2.02~beta2-22 (with Debian patches
applied) and Xen 4.4.1

I originally posted a bug report with Debian but got the suggestion to
file bugs with upstream as well.
Debian bug report:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=799480

Note that my original thought was that this bug is probably within GRUB,
but Ian asked me to file a bug with Xen as well, so please bear with the
fact that this report is centered around GRUB.

Here's the information from my original bug report:

With a 64-bit dom0 and a 32-bit domU, PV (para-virtualized) grub sometimes
fails when chainloading the domU's grub. A 64-bit domU seems to work 100%
of the time.

My understanding of the process:

 1. dom0 launches domU with grub that is loaded from dom0's disk.
 2. Grub reads its config file from memdisk, and then looks for a grub
    binary in the domU filesystem.
 3. If grub is found in domU, it chainloads (multiboot) that grub binary,
    and the domU grub reads grub.cfg and continues booting.
 4. If grub is not found in domU, the first grub reads grub.cfg itself and
    continues with boot.
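
The decision made by the first grub can be modelled in a few lines. This
is only a sketch of the logic in the list above, not GRUB code; the two
path names are assumptions for illustration and may differ on a given
install.

```python
# Toy model of the first-stage (dom0-provided) grub decision described above.
# Both path names are assumptions for illustration, not taken from this report.
DOMU_GRUB = "/boot/xen/pvboot-i386.elf"
DOMU_CFG = "/boot/grub/grub.cfg"


def first_stage_action(domu_files):
    """Return what the first-stage grub would do for a given domU filesystem."""
    files = set(domu_files)
    if DOMU_GRUB in files:
        # Step 3: hand over to the domU's own grub binary (the failing path).
        return ("chainload", DOMU_GRUB)
    if DOMU_CFG in files:
        # Step 4: no domU grub found, parse the domU's grub.cfg directly.
        return ("configfile", DOMU_CFG)
    # Neither file exists: nothing to boot automatically.
    return ("prompt", None)


print(first_stage_action([DOMU_GRUB, DOMU_CFG]))
print(first_stage_action([DOMU_CFG]))
```

Removing the domU grub binary from the file list switches the result from
the chainload path to the configfile path.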

It fails at step 3 in my list of the boot process, but sometimes it
does work so it may be something like a race condition that causes the
problem?

A workaround is to not install /boot/xen in domU, or to rename it, so
that the first grub that is loaded from dom0's disk will not find the
grub binary in the domU filesystem and hence continues to read grub.cfg
and boot. The drawback of this is of course that the two versions can't
differ too much, as grub.cfg is then created by one grub setup and
read/parsed by a different one at boot time.

I am not sure at this point whether this is a problem in XEN or a
problem in grub, but I compiled the legacy pvgrub that uses some minios
from XEN (I don't really know much more about it), and when that legacy
pvgrub chainloads the domU grub it seems to work 100% of the time. The
legacy pvgrub is not a real alternative though, as it's not packaged for
Debian.

When it fails "xl create vm -c" outputs this:
Parsing config from /etc/xen/vm
libxl: error: libxl_dom.c:35:libxl__domain_type: unable to get domain type for domid=16
Unable to attach console
libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: console child [0] exited with error status 1

And "xl dmesg" shows errors like this:
(XEN) traps.c:2514:d15 Domain attempted WRMSR 00000000c0010201 from 0x0000000000000000 to 0x000000000000ffff.
(XEN) d16:v0: unhandled page fault (ec=0010)
(XEN) Pagetable walk from 0000000000000000:
(XEN) L4[0x000] = 0000000200256027 000000000000049c
(XEN) L3[0x000] = 0000000200255027 000000000000049d
(XEN) L2[0x000] = 0000000200251023 00000000000004a1
(XEN) L1[0x000] = 0000000000000000 ffffffffffffffff
(XEN) domain_crash_sync called from entry.S: fault at ffff82d08021feb0 compat_create_bounce_frame+0xc6/0xde
(XEN) Domain 16 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.4.1 x86_64 debug=n Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e019:[<0000000000000000>]
(XEN) RFLAGS: 0000000000000246 EM: 1 CONTEXT: pv guest
(XEN) rax: 0000000000000000 rbx: 0000000000000000 rcx: 0000000000000000
(XEN) rdx: 0000000000000000 rsi: 0000000000499000 rdi: 0000000000800000
(XEN) rbp: 000000000000000a rsp: 00000000005a5ff0 r8: 0000000000000000
(XEN) r9: 0000000000000000 r10: ffff83023e9b9000 r11: ffff83023e9b9000
(XEN) r12: 0000033f3d335bfb r13: ffff82d080300800 r14: ffff82d0802ea940
(XEN) r15: ffff83005e819000 cr0: 000000008005003b cr4: 00000000000506f0
(XEN) cr3: 0000000200b7a000 cr2: 0000000000000000
(XEN) ds: e021 es: e021 fs: e021 gs: e021 ss: e021 cs: e019
(XEN) Guest stack trace from esp=005a5ff0:
(XEN) 00000010 00000000 0001e019 00010046 0016b38b 0016b38a 0016b389 0016b388
(XEN) 0016b387 0016b386 0016b385 0016b384 0016b383 0016b382 0016b381 0016b380
(XEN) 0016b37f 0016b37e 0016b37d 0016b37c 0016b37b 0016b37a 0016b379 0016b378
(XEN) 0016b377 0016b376 0016b375 0016b374 0016b373 0016b372 0016b371 0016b370
(XEN) 0016b36f 0016b36e 0016b36d 0016b36c 0016b36b 0016b36a 0016b369 0016b368
(XEN) 0016b367 0016b366 0016b365 0016b364 0016b363 0016b362 0016b361 0016b360
(XEN) 0016b35f 0016b35e 0016b35d 0016b35c 0016b35b 0016b35a 0016b359 0016b358
(XEN) 0016b357 0016b356 0016b355 0016b354 0016b353 0016b352 0016b351 0016b350
(XEN) 0016b34f 0016b34e 0016b34d 0016b34c 0016b34b 0016b34a 0016b349 0016b348
(XEN) 0016b347 0016b346 0016b345 0016b344 0016b343 0016b342 0016b341 0016b340
(XEN) 0016b33f 0016b33e 0016b33d 0016b33c 0016b33b 0016b33a 0016b339 0016b338
(XEN) 0016b337 0016b336 0016b335 0016b334 0016b333 0016b332 0016b331 0016b330
(XEN) 0016b32f 0016b32e 0016b32d 0016b32c 0016b32b 0016b32a 0016b329 0016b328
(XEN) 0016b327 0016b326 0016b325 0016b324 0016b323 0016b322 0016b321 0016b320
(XEN) 0016b31f 0016b31e 0016b31d 0016b31c 0016b31b 0016b31a 0016b319 0016b318
(XEN) 0016b317 0016b316 0016b315 0016b314 0016b313 0016b312 0016b311 0016b310
(XEN) 0016b30f 0016b30e 0016b30d 0016b30c 0016b30b 0016b30a 0016b309 0016b308
(XEN) 0016b307 0016b306 0016b305 0016b304 0016b303 0016b302 0016b301 0016b300
(XEN) 0016b2ff 0016b2fe 0016b2fd 0016b2fc 0016b2fb 0016b2fa 0016b2f9 0016b2f8
(XEN) 0016b2f7 0016b2f6 0016b2f5 0016b2f4 0016b2f3 0016b2f2 0016b2f1 0016b2f0

An easy way to find out which grub you are in, if the machine boots, is
to hit 'c' and type 'ls': only the grub from dom0 will know about
(memdisk). So when trying to replicate the issue (and the domU
actually starts) you can hit 'c', type 'ls' (check for memdisk), then
type 'halt' and relaunch the domU. Usually I can't launch more
than 4-5 times in a row before it fails; often it fails on my first
try.
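
When relaunching many times, it may help to check the hypervisor log
automatically instead of reading it by eye. A minimal sketch, assuming
the crash banner always has the form shown in the log above; the function
name is made up for illustration.

```python
import re

# Matches the crash banner seen in the log above, e.g.
# "(XEN) Domain 16 (vcpu#0) crashed on cpu#0:"
CRASH_RE = re.compile(
    r"^\(XEN\) Domain (\d+) \(vcpu#\d+\) crashed on cpu#\d+:", re.M
)


def crashed_domids(dmesg_text):
    """Return the sorted domain IDs that the given `xl dmesg` text reports as crashed."""
    return sorted({int(m.group(1)) for m in CRASH_RE.finditer(dmesg_text)})


sample = """\
(XEN) d16:v0: unhandled page fault (ec=0010)
(XEN) Domain 16 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.4.1 x86_64 debug=n Not tainted ]----
"""
print(crashed_domids(sample))  # [16]
```

On a real host the text would come from running "xl dmesg" after each
launch attempt, for example via subprocess.run(["xl", "dmesg"],
capture_output=True, text=True).stdout.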

For information, I have reproduced this on two different machines with
AMD desktop processors; I am not sure if Intel would be any different.
I'm pretty sure I did tests with grub from unstable with the same result
at some point, but I can test again if that is likely to help.

The package that is installed on the domU side is "grub-xen".

I do not know how to debug grub further on my own. I have printed out
text from grub, which is how I determined that it is the chainload that
fails. I see no output from the domU grub (except when it works as it
should, of course). I can help with further testing if needed.

/Andreas


* Re: [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
  2015-09-21 20:03 [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub Andreas Sundstrom
@ 2015-09-22  7:22 ` Andrew Cooper
  2015-09-22  8:52   ` Ian Campbell
  2015-09-22 13:26   ` Andreas Sundstrom
  2015-09-22  8:53 ` Ian Campbell
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 24+ messages in thread
From: Andrew Cooper @ 2015-09-22  7:22 UTC
  To: Andreas Sundstrom, xen-devel

On 21/09/2015 21:03, Andreas Sundstrom wrote:
> This is using Debian Jessie and grub 2.02~beta2-22 (with Debian patches
> applied) and Xen 4.4.1
>
> I originally posted a bug report with Debian but got the suggestion to
> file bugs with upstream as well.
> Debian bug report:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=799480
>
> Note that my original thought was that this bug probably is within GRUB.
> But Ian asked me to file a bug with Xen as well, you have to live with the
> fact that it is centered around GRUB though.
>
> Here's the information from my original bug report:
>
> Using 64-bit dom0 and 32-bit domU PV (para-virtualized) grub sometimes
> fail when chainloading the domU's grub. 64-bit domU seem to work 100%
> of the time.

You say sometimes.  Do you mean that repeated attempts to boot a 32bit
domU cause it to either boot correctly or die in the manner below?

>
> My understanding of the process:
>
>  * dom0 launches domU with grub that is loaded from dom0's disk.
>  * Grub reads config file from memdisk, and then looks for grub binary in
>     domU filesystem.
>  * If grub is found in domU it then chainloads (multiboot) that grub binary
>     and the domU grub reads grub.cfg and continue booting.
>  * If grub is not found in domU it reads grub.cfg and continues with boot.
>
> It fails at step 3 in my list of the boot process, but sometimes it
> does work so it may be something like a race condition that causes the
> problem?
>
> A workaround is to not install or rename /boot/xen in domU so that the
> first grub that is loaded from dom0's disk will not find the grub
> binary in the domU filesystem and hence continues to read grub.cfg and
> boot. The drawback of this is of course that the two versions can't
> differ too much as there are different setups creating grub.cfg and
> then reading/parsing it at boot time.
>
> I am not sure at this point whether this is a problem in XEN or a
> problem in grub but I compiled the legacy pvgrub that uses some minios
> from XEN (don't really know much more about it) and when that legacy
> pvgrub chainloads the domU grub it seems to work 100% of the time. Now
> the legace pvgrub is not a real alternative as it's not packaged for
> Debian though.
>
> When it fails "xl create vm -c" outputs this:
> Parsing config from /etc/xen/vm
> libxl: error: libxl_dom.c:35:libxl__domain_type: unable to get domain
> type for domid=16
> Unable to attach console
> libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: console
> child [0] exited with error status 1

These error messages are just because the domain crashes sufficiently
early that libxl can't find the console information.  Running `xl
create` without '-c' would remove the libxl errors.

>
> And "xl dmesg" shows errors like this:
> (XEN) traps.c:2514:d15 Domain attempted WRMSR 00000000c0010201 from
> 0x0000000000000000 to 0x000000000000ffff.
> (XEN) d16:v0: unhandled page fault (ec=0010)
> (XEN) Pagetable walk from 0000000000000000:
> (XEN) L4[0x000] = 0000000200256027 000000000000049c
> (XEN) L3[0x000] = 0000000200255027 000000000000049d
> (XEN) L2[0x000] = 0000000200251023 00000000000004a1
> (XEN) L1[0x000] = 0000000000000000 ffffffffffffffff
> (XEN) domain_crash_sync called from entry.S: fault at ffff82d08021feb0
> compat_create_bounce_frame+0xc6/0xde
> (XEN) Domain 16 (vcpu#0) crashed on cpu#0:
> (XEN) ----[ Xen-4.4.1 x86_64 debug=n Not tainted ]----
> (XEN) CPU: 0
> (XEN) RIP: e019:[<0000000000000000>]
> (XEN) RFLAGS: 0000000000000246 EM: 1 CONTEXT: pv guest
> (XEN) rax: 0000000000000000 rbx: 0000000000000000 rcx: 0000000000000000
> (XEN) rdx: 0000000000000000 rsi: 0000000000499000 rdi: 0000000000800000
> (XEN) rbp: 000000000000000a rsp: 00000000005a5ff0 r8: 0000000000000000
> (XEN) r9: 0000000000000000 r10: ffff83023e9b9000 r11: ffff83023e9b9000
> (XEN) r12: 0000033f3d335bfb r13: ffff82d080300800 r14: ffff82d0802ea940
> (XEN) r15: ffff83005e819000 cr0: 000000008005003b cr4: 00000000000506f0
> (XEN) cr3: 0000000200b7a000 cr2: 0000000000000000
> (XEN) ds: e021 es: e021 fs: e021 gs: e021 ss: e021 cs: e019
> (XEN) Guest stack trace from esp=005a5ff0:
> (XEN) 00000010 00000000 0001e019 00010046 0016b38b 0016b38a 0016b389
> 0016b388
> (XEN) 0016b387 0016b386 0016b385 0016b384 0016b383 0016b382 0016b381
> 0016b380
> (XEN) 0016b37f 0016b37e 0016b37d 0016b37c 0016b37b 0016b37a 0016b379
> 0016b378
> (XEN) 0016b377 0016b376 0016b375 0016b374 0016b373 0016b372 0016b371
> 0016b370
> (XEN) 0016b36f 0016b36e 0016b36d 0016b36c 0016b36b 0016b36a 0016b369
> 0016b368
> (XEN) 0016b367 0016b366 0016b365 0016b364 0016b363 0016b362 0016b361
> 0016b360
> (XEN) 0016b35f 0016b35e 0016b35d 0016b35c 0016b35b 0016b35a 0016b359
> 0016b358
> (XEN) 0016b357 0016b356 0016b355 0016b354 0016b353 0016b352 0016b351
> 0016b350
> (XEN) 0016b34f 0016b34e 0016b34d 0016b34c 0016b34b 0016b34a 0016b349
> 0016b348
> (XEN) 0016b347 0016b346 0016b345 0016b344 0016b343 0016b342 0016b341
> 0016b340
> (XEN) 0016b33f 0016b33e 0016b33d 0016b33c 0016b33b 0016b33a 0016b339
> 0016b338
> (XEN) 0016b337 0016b336 0016b335 0016b334 0016b333 0016b332 0016b331
> 0016b330
> (XEN) 0016b32f 0016b32e 0016b32d 0016b32c 0016b32b 0016b32a 0016b329
> 0016b328
> (XEN) 0016b327 0016b326 0016b325 0016b324 0016b323 0016b322 0016b321
> 0016b320
> (XEN) 0016b31f 0016b31e 0016b31d 0016b31c 0016b31b 0016b31a 0016b319
> 0016b318
> (XEN) 0016b317 0016b316 0016b315 0016b314 0016b313 0016b312 0016b311
> 0016b310
> (XEN) 0016b30f 0016b30e 0016b30d 0016b30c 0016b30b 0016b30a 0016b309
> 0016b308
> (XEN) 0016b307 0016b306 0016b305 0016b304 0016b303 0016b302 0016b301
> 0016b300
> (XEN) 0016b2ff 0016b2fe 0016b2fd 0016b2fc 0016b2fb 0016b2fa 0016b2f9
> 0016b2f8
> (XEN) 0016b2f7 0016b2f6 0016b2f5 0016b2f4 0016b2f3 0016b2f2 0016b2f1
> 0016b2f0

This is a very concerning stack trace.  You appear to have a spliced
32/64bit domain which, irrespective of your other problems, should not
be able to exist.

The segment registers indicate that the domU is executing in ring1, which
makes it a 32bit guest (also why 32bit words are used for the stack
dump), but r10 through r14 contain 64bit values.
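
For reference, the ring can be read directly off the selector values in
the dump. A minimal sketch using the standard x86 selector layout (bits
0-1 privilege level, bit 2 table indicator, remaining bits descriptor
index); the function name is made up for illustration. Both cs=e019 and
ss=e021 decode to privilege level 1.

```python
def decode_selector(sel):
    """Split an x86 segment selector into (index, table, rpl)."""
    rpl = sel & 0x3                        # bits 0-1: requested privilege level
    table = "LDT" if sel & 0x4 else "GDT"  # bit 2: table indicator
    index = sel >> 3                       # bits 3-15: descriptor index
    return index, table, rpl


# Selector values taken from the register dump quoted above.
for name, sel in (("cs", 0xE019), ("ss", 0xE021)):
    index, table, rpl = decode_selector(sel)
    print(f"{name}={sel:#06x}: index={index} table={table} ring={rpl}")
```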

>
> An easy way to find out which grub you are in if the machine boots is
> to hit 'c' and type 'ls', only the grub from dom0 will know about
> (memdisk). So when trying to replicate the issue (and the domU
> actually starts) you can hit 'c', type 'ls' (check for memdisk) and
> then type 'halt' and relaunch the domU. Usually I can't launch more
> than 4-5 times in a row before it fails, often it fails on my first
> try.
>
> For information I have reproduced on two different AMD desktop
> processor machines, not sure if Intel would be any different. I'm
> pretty sure I did tests with grub from unstable with same result at
> some point, but can test again if that is likely to work.
>
> The package that is in installed on the domU side is "grub-xen".
>
> I am unable to understand how to debug grub further on my own, I have
> printed out text from grub so that I understood that it is the
> chainload that fails. I see no output from the domU grub (except when
> it works as it should of course). I can help with further testing if
> needed.

It does appear to be an intermittent bug in 32bit grub-xen in the
eventual domU, but I have no help to offer with respect to debugging
grub-xen further.

~Andrew


* Re: [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
  2015-09-22  7:22 ` Andrew Cooper
@ 2015-09-22  8:52   ` Ian Campbell
  2015-09-22 13:26   ` Andreas Sundstrom
  1 sibling, 0 replies; 24+ messages in thread
From: Ian Campbell @ 2015-09-22  8:52 UTC
  To: Andrew Cooper, Andreas Sundstrom, xen-devel

On Tue, 2015-09-22 at 08:22 +0100, Andrew Cooper wrote:
> 
> The segment registers indicate that the domU is executing in ring1 which
> makes it a 32bit guest (also why 32bit words are used for the stack
> dump), but r10 through r14 have 64bit values in.

r10..r14 are not visible to 32-bit guests but it appears that this dumping
function in Xen doesn't check for that and omit printing them.

I suspect that if these were zeroed or poisoned upon domain creation you
would see those values unmodified here.

> It does appear to be an intermittent bug in 32bit grub-xen in the
> eventual domU, but I have no help to offer with respect to debugging
> grub-xen further.

Me neither. I did suggest to Andreas that he also take it to grub-devel.
I'll reply to the original with a full quote and copy that list.

Ian.


* Re: [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
  2015-09-21 20:03 [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub Andreas Sundstrom
  2015-09-22  7:22 ` Andrew Cooper
@ 2015-09-22  8:53 ` Ian Campbell
  2015-09-22  8:53 ` [Xen-devel] " Ian Campbell
  2015-09-22 22:37 ` Samuel Thibault
  3 siblings, 0 replies; 24+ messages in thread
From: Ian Campbell @ 2015-09-22  8:53 UTC
  To: grub-devel, Vladimir 'φ-coder/phcoder' Serbinenko
  Cc: Andreas Sundstrom, xen-devel

Hi Vladimir & grub-devel,

Do you have any thoughts on this issue with i386 pv-grub2?

Thanks, Ian.

On Mon, 2015-09-21 at 22:03 +0200, Andreas Sundstrom wrote:
> This is using Debian Jessie and grub 2.02~beta2-22 (with Debian patches
> applied) and Xen 4.4.1
> 
> I originally posted a bug report with Debian but got the suggestion to
> file bugs with upstream as well.
> Debian bug report:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=799480
> 
> Note that my original thought was that this bug probably is within GRUB.
> But Ian asked me to file a bug with Xen as well, you have to live with
> the
> fact that it is centered around GRUB though.
> 
> Here's the information from my original bug report:
> 
> Using 64-bit dom0 and 32-bit domU PV (para-virtualized) grub sometimes
> fail when chainloading the domU's grub. 64-bit domU seem to work 100%
> of the time.
> 
> My understanding of the process:
> 
>  * dom0 launches domU with grub that is loaded from dom0's disk.
>  * Grub reads config file from memdisk, and then looks for grub binary in
>     domU filesystem.
>  * If grub is found in domU it then chainloads (multiboot) that grub
> binary
>     and the domU grub reads grub.cfg and continue booting.
>  * If grub is not found in domU it reads grub.cfg and continues with
> boot.
> 
> It fails at step 3 in my list of the boot process, but sometimes it
> does work so it may be something like a race condition that causes the
> problem?
> 
> A workaround is to not install or rename /boot/xen in domU so that the
> first grub that is loaded from dom0's disk will not find the grub
> binary in the domU filesystem and hence continues to read grub.cfg and
> boot. The drawback of this is of course that the two versions can't
> differ too much as there are different setups creating grub.cfg and
> then reading/parsing it at boot time.
> 
> I am not sure at this point whether this is a problem in XEN or a
> problem in grub but I compiled the legacy pvgrub that uses some minios
> from XEN (don't really know much more about it) and when that legacy
> pvgrub chainloads the domU grub it seems to work 100% of the time. Now
> the legace pvgrub is not a real alternative as it's not packaged for
> Debian though.
> 
> When it fails "xl create vm -c" outputs this:
> Parsing config from /etc/xen/vm
> libxl: error: libxl_dom.c:35:libxl__domain_type: unable to get domain
> type for domid=16
> Unable to attach console
> libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: console
> child [0] exited with error status 1
> 
> And "xl dmesg" shows errors like this:
> (XEN) traps.c:2514:d15 Domain attempted WRMSR 00000000c0010201 from
> 0x0000000000000000 to 0x000000000000ffff.
> (XEN) d16:v0: unhandled page fault (ec=0010)
> (XEN) Pagetable walk from 0000000000000000:
> (XEN) L4[0x000] = 0000000200256027 000000000000049c
> (XEN) L3[0x000] = 0000000200255027 000000000000049d
> (XEN) L2[0x000] = 0000000200251023 00000000000004a1
> (XEN) L1[0x000] = 0000000000000000 ffffffffffffffff
> (XEN) domain_crash_sync called from entry.S: fault at ffff82d08021feb0
> compat_create_bounce_frame+0xc6/0xde
> (XEN) Domain 16 (vcpu#0) crashed on cpu#0:
> (XEN) ----[ Xen-4.4.1 x86_64 debug=n Not tainted ]----
> (XEN) CPU: 0
> (XEN) RIP: e019:[<0000000000000000>]
> (XEN) RFLAGS: 0000000000000246 EM: 1 CONTEXT: pv guest
> (XEN) rax: 0000000000000000 rbx: 0000000000000000 rcx: 0000000000000000
> (XEN) rdx: 0000000000000000 rsi: 0000000000499000 rdi: 0000000000800000
> (XEN) rbp: 000000000000000a rsp: 00000000005a5ff0 r8: 0000000000000000
> (XEN) r9: 0000000000000000 r10: ffff83023e9b9000 r11: ffff83023e9b9000
> (XEN) r12: 0000033f3d335bfb r13: ffff82d080300800 r14: ffff82d0802ea940
> (XEN) r15: ffff83005e819000 cr0: 000000008005003b cr4: 00000000000506f0
> (XEN) cr3: 0000000200b7a000 cr2: 0000000000000000
> (XEN) ds: e021 es: e021 fs: e021 gs: e021 ss: e021 cs: e019
> (XEN) Guest stack trace from esp=005a5ff0:
> (XEN) 00000010 00000000 0001e019 00010046 0016b38b 0016b38a 0016b389
> 0016b388
> (XEN) 0016b387 0016b386 0016b385 0016b384 0016b383 0016b382 0016b381
> 0016b380
> (XEN) 0016b37f 0016b37e 0016b37d 0016b37c 0016b37b 0016b37a 0016b379
> 0016b378
> (XEN) 0016b377 0016b376 0016b375 0016b374 0016b373 0016b372 0016b371
> 0016b370
> (XEN) 0016b36f 0016b36e 0016b36d 0016b36c 0016b36b 0016b36a 0016b369
> 0016b368
> (XEN) 0016b367 0016b366 0016b365 0016b364 0016b363 0016b362 0016b361
> 0016b360
> (XEN) 0016b35f 0016b35e 0016b35d 0016b35c 0016b35b 0016b35a 0016b359
> 0016b358
> (XEN) 0016b357 0016b356 0016b355 0016b354 0016b353 0016b352 0016b351
> 0016b350
> (XEN) 0016b34f 0016b34e 0016b34d 0016b34c 0016b34b 0016b34a 0016b349
> 0016b348
> (XEN) 0016b347 0016b346 0016b345 0016b344 0016b343 0016b342 0016b341
> 0016b340
> (XEN) 0016b33f 0016b33e 0016b33d 0016b33c 0016b33b 0016b33a 0016b339
> 0016b338
> (XEN) 0016b337 0016b336 0016b335 0016b334 0016b333 0016b332 0016b331
> 0016b330
> (XEN) 0016b32f 0016b32e 0016b32d 0016b32c 0016b32b 0016b32a 0016b329
> 0016b328
> (XEN) 0016b327 0016b326 0016b325 0016b324 0016b323 0016b322 0016b321
> 0016b320
> (XEN) 0016b31f 0016b31e 0016b31d 0016b31c 0016b31b 0016b31a 0016b319
> 0016b318
> (XEN) 0016b317 0016b316 0016b315 0016b314 0016b313 0016b312 0016b311
> 0016b310
> (XEN) 0016b30f 0016b30e 0016b30d 0016b30c 0016b30b 0016b30a 0016b309
> 0016b308
> (XEN) 0016b307 0016b306 0016b305 0016b304 0016b303 0016b302 0016b301
> 0016b300
> (XEN) 0016b2ff 0016b2fe 0016b2fd 0016b2fc 0016b2fb 0016b2fa 0016b2f9
> 0016b2f8
> (XEN) 0016b2f7 0016b2f6 0016b2f5 0016b2f4 0016b2f3 0016b2f2 0016b2f1
> 0016b2f0
> 
> An easy way to find out which grub you are in if the machine boots is
> to hit 'c' and type 'ls', only the grub from dom0 will know about
> (memdisk). So when trying to replicate the issue (and the domU
> actually starts) you can hit 'c', type 'ls' (check for memdisk) and
> then type 'halt' and relaunch the domU. Usually I can't launch more
> than 4-5 times in a row before it fails, often it fails on my first
> try.
> 
> For information I have reproduced on two different AMD desktop
> processor machines, not sure if Intel would be any different. I'm
> pretty sure I did tests with grub from unstable with same result at
> some point, but can test again if that is likely to work.
> 
> The package that is in installed on the domU side is "grub-xen".
> 
> I am unable to understand how to debug grub further on my own, I have
> printed out text from grub so that I understood that it is the
> chainload that fails. I see no output from the domU grub (except when
> it works as it should of course). I can help with further testing if
> needed.
> 
> /Andreas


* Re: [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
  2015-09-22  7:22 ` Andrew Cooper
  2015-09-22  8:52   ` Ian Campbell
@ 2015-09-22 13:26   ` Andreas Sundstrom
  1 sibling, 0 replies; 24+ messages in thread
From: Andreas Sundstrom @ 2015-09-22 13:26 UTC
  To: Andrew Cooper; +Cc: xen-devel


Quoting Andrew Cooper <andrew.cooper3@citrix.com>:

> On 21/09/2015 21:03, Andreas Sundstrom wrote:
>> Using 64-bit dom0 and 32-bit domU PV (para-virtualized) grub sometimes
>> fail when chainloading the domU's grub. 64-bit domU seem to work 100%
>> of the time.
>
> You say sometimes.  Do you mean that repeated attempts to boot a 32bit
> domU cause it to either boot correctly or die in the manner below?

Yes, that is correct; it may or may not be able to complete booting.

>> When it fails "xl create vm -c" outputs this:
>> Parsing config from /etc/xen/vm
>> libxl: error: libxl_dom.c:35:libxl__domain_type: unable to get domain
>> type for domid=16
>> Unable to attach console
>> libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: console
>> child [0] exited with error status 1
>
> These error messages are just because the domain crashes sufficiently
> early that libxl can't find the console information.  Running `xl
> create` without '-c' would remove the libxl errors.

Correct, they only appear because attaching the console failed, so they
are not really relevant to the actual issue.

>> I am unable to understand how to debug grub further on my own, I have
>> printed out text from grub so that I understood that it is the
>> chainload that fails. I see no output from the domU grub (except when
>> it works as it should of course). I can help with further testing if
>> needed.
>
> It does appear to be an intermittent bug in 32bit grub-xen in the
> eventual domU, but I have no help to offer with respect to debugging
> grub-xen further.
>
> ~Andrew

As Ian Campbell suggested, I have also filed a bug with upstream grub:
http://savannah.gnu.org/bugs/?46014

/Andreas

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
  2015-09-21 20:03 [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub Andreas Sundstrom
                   ` (2 preceding siblings ...)
  2015-09-22  8:53 ` [Xen-devel] " Ian Campbell
@ 2015-09-22 22:37 ` Samuel Thibault
  2015-09-23  8:34   ` Ian Campbell
  2015-09-23  8:37   ` Andreas Sundstrom
  3 siblings, 2 replies; 24+ messages in thread
From: Samuel Thibault @ 2015-09-22 22:37 UTC (permalink / raw)
  To: Andreas Sundstrom; +Cc: xen-devel

Andreas Sundstrom, on Mon 21 Sep 2015 22:03:22 +0200, wrote:
> Note that my original thought was that this bug probably is within GRUB.

It's probably in the GRUB implementation of loading the domU GRUB, since
you say that pvgrub1 loads it fine.

> (XEN) domain_crash_sync called from entry.S: fault at ffff82d08021feb0
> compat_create_bounce_frame+0xc6/0xde

So it's inside xen entry code...

> (XEN) Guest stack trace from esp=005a5ff0:

This looks like the end of the stack

> (XEN) 00000010 00000000 0001e019 00010046 0016b38b 0016b38a 0016b389
> 0016b388
> (XEN) 0016b387 0016b386 0016b385 0016b384 0016b383 0016b382 0016b381
> 0016b380
[...]

and this looks like a lot of consecutive numbers.  Perhaps we simply
somehow overflow?  Did you try giving less memory to the domU?
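
The pattern can be confirmed mechanically. The following Python sketch is
an illustration only, not something that was run in this thread, and
longest_descending_run is a made-up helper name. It takes the sixteen stack
words quoted above and measures the longest run in which each word is
exactly one less than the previous one.

```python
def longest_descending_run(words):
    """Length of the longest run where each hex word is the previous one minus 1."""
    values = [int(w, 16) for w in words]
    best = run = 1 if values else 0
    for prev, cur in zip(values, values[1:]):
        # Extend the run on an exact decrement, otherwise start a new one.
        run = run + 1 if cur == prev - 1 else 1
        best = max(best, run)
    return best

# The two dump lines quoted above, joined as they appear in "xl dmesg".
dump = """00000010 00000000 0001e019 00010046 0016b38b 0016b38a 0016b389 0016b388
0016b387 0016b386 0016b385 0016b384 0016b383 0016b382 0016b381 0016b380"""
words = dump.split()
print(longest_descending_run(words))  # 12
```

Twelve of the sixteen quoted words form a single run counting down from
0016b38b; only the first four words break the pattern.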

> I see no output from the domU grub (except when it works as it should
> of course).

Yes, as explained in another mail, the domU has to connect to the
console before any messages show up there.  Another way is to make
console_io hypercalls, whose output should end up in xl dmesg.

You may also want to enable grub debugging prints, by setting the debug
variable to "all".

Samuel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
  2015-09-22 22:37 ` Samuel Thibault
@ 2015-09-23  8:34   ` Ian Campbell
  2015-09-23 12:47     ` Andreas Sundstrom
  2015-09-23  8:37   ` Andreas Sundstrom
  1 sibling, 1 reply; 24+ messages in thread
From: Ian Campbell @ 2015-09-23  8:34 UTC (permalink / raw)
  To: Samuel Thibault, Andreas Sundstrom; +Cc: xen-devel

On Wed, 2015-09-23 at 00:37 +0200, Samuel Thibault wrote:
> Andreas Sundstrom, on Mon 21 Sep 2015 22:03:22 +0200, wrote:
> > Note that my original thought was that this bug probably is within
> > GRUB.
> 
> It's probably in the GRUB implementation of loading the domU GRUB, since
> you say that pvgrub1 loads it fine.
> 
> > (XEN) domain_crash_sync called from entry.S: fault at ffff82d08021feb0
> > compat_create_bounce_frame+0xc6/0xde
> 
> So it's inside xen entry code...
> 
> > (XEN) Guest stack trace from esp=005a5ff0:
> 
> This looks like the end of the stack
> 
> > (XEN) 00000010 00000000 0001e019 00010046 0016b38b 0016b38a 0016b389
> > 0016b388
> > (XEN) 0016b387 0016b386 0016b385 0016b384 0016b383 0016b382 0016b381
> > 0016b380
> [...]
> 
> and this looks like a lot of consecutive numbers.  Perhaps we simply
> somehow overflow?  Did you try giving less memory to the domU?

Along those lines, if the _host_ has buckets of RAM then it might be worth
restricting it, in case the issue is with getting MFNs which are not
representable by the 32-bit PV interfaces. (IIRC the limit is ~160G due to
the size of the m2p hole; a 32-bit MFN spans 16TB, so it's unlikely to be
that.)

Likewise maybe the issue is with full addresses which don't fit in a 32-bit
number (which is maybe more likely to happen if grub uses a 1:1 mapping
like I would guess it does), so limiting the host to <4GB might also be
interesting?
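
The arithmetic behind the 16TB figure and the 4GB boundary can be checked
with a short Python sketch. It is an illustration only and assumes 4 KiB
pages; PAGE_SIZE, mfn_span_bytes and fits_in_32_bits are names made up for
this example and are not Xen or GRUB identifiers. The ~160G m2p limit is
not derived here.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB x86 pages

def mfn_span_bytes(mfn_bits):
    """Bytes of machine memory addressable by frame numbers of the given width."""
    return (1 << mfn_bits) * PAGE_SIZE

def fits_in_32_bits(mfn):
    """True if the last byte of the frame has a machine address below 4 GiB."""
    return (mfn + 1) * PAGE_SIZE - 1 < (1 << 32)

TIB = 1 << 40
print(mfn_span_bytes(32) // TIB)   # 16, i.e. 16 TiB
print(fits_in_32_bits(0xFFFFF))    # True: last frame below 4 GiB
print(fits_in_32_bits(0x100000))   # False: first frame at 4 GiB
```

With the hypervisor limited to 4G or less, every frame passes the second
check, so a failure that persists under that limit would point away from
address truncation.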

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
  2015-09-22 22:37 ` Samuel Thibault
  2015-09-23  8:34   ` Ian Campbell
@ 2015-09-23  8:37   ` Andreas Sundstrom
  1 sibling, 0 replies; 24+ messages in thread
From: Andreas Sundstrom @ 2015-09-23  8:37 UTC (permalink / raw)
  To: Samuel Thibault; +Cc: xen-devel

Quoting Samuel Thibault <samuel.thibault@ens-lyon.org>:

> Andreas Sundstrom, on Mon 21 Sep 2015 22:03:22 +0200, wrote:
>> Note that my original thought was that this bug probably is within GRUB.
>
> It's probably in the GRUB implementation of loading the domU GRUB, since
> you say that pvgrub1 loads it fine.
>
>> (XEN) domain_crash_sync called from entry.S: fault at ffff82d08021feb0
>> compat_create_bounce_frame+0xc6/0xde
>
> So it's inside xen entry code...
>
>> (XEN) Guest stack trace from esp=005a5ff0:
>
> This looks like the end of the stack
>
>> (XEN) 00000010 00000000 0001e019 00010046 0016b38b 0016b38a 0016b389
>> 0016b388
>> (XEN) 0016b387 0016b386 0016b385 0016b384 0016b383 0016b382 0016b381
>> 0016b380
> [...]
>
> and this looks like a lot of consecutive numbers.  Perhaps we simply
> somehow overflow?  Did you try giving less memory to the domU?

No, I had not tried that. One of the machines that I have used to
reproduce the problem had:
maxmem = 1024
memory = 512

First I just removed the maxmem part, as that is probably quite unusual.
No difference.

Then I set memory to 128, and at first I was not able to reproduce the
problem. But I did some more tests while writing this response, and
eventually it failed with 128M as well.
Is there any reason to try lower?

>> I see no output from the domU grub (except when it works as it should
>> of course).
>
> Yes, as explained in another mail domU has to get to connect to the
> console before getting messages from there.  Another way is to make
> console_io hypercalls, which should end up into xl dmesg.
>
> You may also want to enable grub debugging prints, by setting the debug
> variable to "all".

I just ran some tests with "set debug=all" at the top of the grub.cfg file.
I could not see any difference in the output from the first grub when
comparing a working chainload to a non-working one (by diffing the output).

Adding the debug statement to the grub.cfg that is loaded by the second
grub (loaded from domU) gives no output at all when booting fails and,
of course, a lot of output when booting works.

So it seems quite clear to me that the actual chainloading/handover to
the second grub is where something goes wrong.

/Andreas




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
  2015-09-23  8:34   ` Ian Campbell
@ 2015-09-23 12:47     ` Andreas Sundstrom
  2015-09-23 14:18       ` Ian Campbell
  0 siblings, 1 reply; 24+ messages in thread
From: Andreas Sundstrom @ 2015-09-23 12:47 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Samuel Thibault, xen-devel

Quoting Ian Campbell <ian.campbell@citrix.com>:

> On Wed, 2015-09-23 at 00:37 +0200, Samuel Thibault wrote:
>> Andreas Sundstrom, on Mon 21 Sep 2015 22:03:22 +0200, wrote:
>> > Note that my original thought was that this bug probably is within
>> > GRUB.
>>
>> It's probably in the GRUB implementation of loading the domU GRUB, since
>> you say that pvgrub1 loads it fine.
>>
>> > (XEN) domain_crash_sync called from entry.S: fault at ffff82d08021feb0
>> > compat_create_bounce_frame+0xc6/0xde
>>
>> So it's inside xen entry code...
>>
>> > (XEN) Guest stack trace from esp=005a5ff0:
>>
>> This looks like the end of the stack
>>
>> > (XEN) 00000010 00000000 0001e019 00010046 0016b38b 0016b38a 0016b389
>> > 0016b388
>> > (XEN) 0016b387 0016b386 0016b385 0016b384 0016b383 0016b382 0016b381
>> > 0016b380
>> [...]
>>
>> and this looks like a lot of consecutive numbers.  Perhaps we simply
>> somehow overflow?  Did you try giving less memory to the domU?
>
> Along those lines, if the _host_ has buckets of RAM then might it be worth
> restricting it in case the issue is with getting MFNs which are not
> representably by the 32-bit PV interfaces? (IIRC the limit is ~160G due to
> the size of the m2p hole, a 32-bit MFN spans 16TB so it's unlikely to be
> that).
>
> Likewise maybe the issue is with full addresses which don't fit in a 32-bit
> number (which is maybe more likely to happen if grub uses a 1:1 mapping
> like I would guess it does), so limiting the host to <4GB might also be
> interesting?
>

If this was meant for me, I will need more information to understand
what to test.
dom0 has either 12G or 8G memory in my test machines, if that makes a
difference.

/Andreas



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
  2015-09-23 12:47     ` Andreas Sundstrom
@ 2015-09-23 14:18       ` Ian Campbell
  2015-09-24 17:28         ` Andreas Sundstrom
  0 siblings, 1 reply; 24+ messages in thread
From: Ian Campbell @ 2015-09-23 14:18 UTC (permalink / raw)
  To: Andreas Sundstrom; +Cc: Samuel Thibault, xen-devel

On Wed, 2015-09-23 at 12:47 +0000, Andreas Sundstrom wrote:
> Quoting Ian Campbell <ian.campbell@citrix.com>:
> 
> > Along those lines, if the _host_ has buckets of RAM then might it be
> > worth
> > restricting it in case the issue is with getting MFNs which are not
> > representably by the 32-bit PV interfaces? (IIRC the limit is ~160G due
> > to
> > the size of the m2p hole, a 32-bit MFN spans 16TB so it's unlikely to
> > be
> > that).
> > 
> > Likewise maybe the issue is with full addresses which don't fit in a 32
> > -bit
> > number (which is maybe more likely to happen if grub uses a 1:1 mapping
> > like I would guess it does), so limiting the host to <4GB might also be
> > interesting?
> > 
> 
> If this was meant for me I will need more information to understand  
> what to test.
> dom0 has either 12G or 8G memory in my test machines if that makes a  
> difference.

It was, sorry for not being clear.

How much memory do the test machines have?

If it is more than 160G then try booting with "mem=160G" on the hypervisor
(not Linux) command line. You can just edit that in via grub.

Then try with mem=4G (which might require shrinking dom0 too of course).

Ian.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
  2015-09-23 14:18       ` Ian Campbell
@ 2015-09-24 17:28         ` Andreas Sundstrom
  2015-09-25  8:36           ` Ian Campbell
  0 siblings, 1 reply; 24+ messages in thread
From: Andreas Sundstrom @ 2015-09-24 17:28 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Samuel Thibault, xen-devel

On 2015-09-23 16:18, Ian Campbell wrote:
> On Wed, 2015-09-23 at 12:47 +0000, Andreas Sundstrom wrote:
>> Quoting Ian Campbell <ian.campbell@citrix.com>:
>>
>>> Along those lines, if the _host_ has buckets of RAM then might it be
>>> worth
>>> restricting it in case the issue is with getting MFNs which are not
>>> representably by the 32-bit PV interfaces? (IIRC the limit is ~160G due
>>> to
>>> the size of the m2p hole, a 32-bit MFN spans 16TB so it's unlikely to
>>> be
>>> that).
>>>
>>> Likewise maybe the issue is with full addresses which don't fit in a 32
>>> -bit
>>> number (which is maybe more likely to happen if grub uses a 1:1 mapping
>>> like I would guess it does), so limiting the host to <4GB might also be
>>> interesting?
>>>
>>
>> If this was meant for me I will need more information to understand  
>> what to test.
>> dom0 has either 12G or 8G memory in my test machines if that makes a  
>> difference.
> 
> It was, sorry for not being clear.
> 
> How much memory do the test machines have?
> 
> If it is more than 160G then try booting with "mem=160G" on the hypervisor
> (not Linux) command line. You can just edit that in via grub.
> 
> Then try with mem=4G (which might require shrinking dom0 too of course).

Well, as I said, my test machines only have 12 and 8G of memory.
I did a quick test with mem=2G though, just to be sure; it failed on
the first attempt.

/Andreas

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
  2015-09-24 17:28         ` Andreas Sundstrom
@ 2015-09-25  8:36           ` Ian Campbell
  2015-09-25 13:23             ` Andreas Sundstrom
  0 siblings, 1 reply; 24+ messages in thread
From: Ian Campbell @ 2015-09-25  8:36 UTC (permalink / raw)
  To: Andreas Sundstrom; +Cc: Samuel Thibault, xen-devel

On Thu, 2015-09-24 at 19:28 +0200, Andreas Sundstrom wrote:
> On 2015-09-23 16:18, Ian Campbell wrote:
> > On Wed, 2015-09-23 at 12:47 +0000, Andreas Sundstrom wrote:
> > > Quoting Ian Campbell <ian.campbell@citrix.com>:
> > > 
> > > > Along those lines, if the _host_ has buckets of RAM then might it
> > > > be
> > > > worth
> > > > restricting it in case the issue is with getting MFNs which are not
> > > > representably by the 32-bit PV interfaces? (IIRC the limit is ~160G
> > > > due
> > > > to
> > > > the size of the m2p hole, a 32-bit MFN spans 16TB so it's unlikely
> > > > to
> > > > be
> > > > that).
> > > > 
> > > > Likewise maybe the issue is with full addresses which don't fit in
> > > > a 32
> > > > -bit
> > > > number (which is maybe more likely to happen if grub uses a 1:1
> > > > mapping
> > > > like I would guess it does), so limiting the host to <4GB might
> > > > also be
> > > > interesting?
> > > > 
> > > 
> > > If this was meant for me I will need more information to understand  
> > > what to test.
> > > dom0 has either 12G or 8G memory in my test machines if that makes a 
> > > difference.
> > 
> > It was, sorry for not being clear.
> > 
> > How much memory do the test machines have?
> > 
> > If it is more than 160G then try booting with "mem=160G" on the
> > hypervisor
> > (not Linux) command line. You can just edit that in via grub.
> > 
> > Then try with mem=4G (which might require shrinking dom0 too of
> > course).
> 
> Well as I said my test machines only have 12 and 8G of memory.

You said dom0 did, from which I wasn't able to tell how much RAM the host
had; giving 12GB to dom0 on a 256G machine would be a plausible
configuration. But this is a confusing distinction for many, and I should
have made the reasoning for including the 160G test clearer, sorry.

> I did a quick test with mem=2G though just to be sure, it failed on
> first attempt.

OK, so it is unlikely to be any of the possible integer overflow type
things I was thinking of then, thanks for testing.

Ian.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
  2015-09-25  8:36           ` Ian Campbell
@ 2015-09-25 13:23             ` Andreas Sundstrom
  0 siblings, 0 replies; 24+ messages in thread
From: Andreas Sundstrom @ 2015-09-25 13:23 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Samuel Thibault, xen-devel


Quoting Ian Campbell <ian.campbell@citrix.com>:

> On Thu, 2015-09-24 at 19:28 +0200, Andreas Sundstrom wrote:
>> On 2015-09-23 16:18, Ian Campbell wrote:
>> > On Wed, 2015-09-23 at 12:47 +0000, Andreas Sundstrom wrote:
>> > > Quoting Ian Campbell <ian.campbell@citrix.com>:
>> > >
>> > > > Along those lines, if the _host_ has buckets of RAM then might it
>> > > > be
>> > > > worth
>> > > > restricting it in case the issue is with getting MFNs which are not
>> > > > representably by the 32-bit PV interfaces? (IIRC the limit is ~160G
>> > > > due
>> > > > to
>> > > > the size of the m2p hole, a 32-bit MFN spans 16TB so it's unlikely
>> > > > to
>> > > > be
>> > > > that).
>> > > >
>> > > > Likewise maybe the issue is with full addresses which don't fit in
>> > > > a 32
>> > > > -bit
>> > > > number (which is maybe more likely to happen if grub uses a 1:1
>> > > > mapping
>> > > > like I would guess it does), so limiting the host to <4GB might
>> > > > also be
>> > > > interesting?
>> > > >
>> > >
>> > > If this was meant for me I will need more information to understand
>> > > what to test.
>> > > dom0 has either 12G or 8G memory in my test machines if that makes a
>> > > difference.
>> >
>> > It was, sorry for not being clear.
>> >
>> > How much memory do the test machines have?
>> >
>> > If it is more than 160G then try booting with "mem=160G" on the
>> > hypervisor
>> > (not Linux) command line. You can just edit that in via grub.
>> >
>> > Then try with mem=4G (which might require shrinking dom0 too of
>> > course).
>>
>> Well as I said my test machines only have 12 and 8G of memory.
>
> You said dom0 did, from which I wasn't able to tell how much RAM the host
> had, giving 12GB to dom0 on a 256G machine would be a plausible
> configuration. But this is a confusing distinction for many and I should
> have made that reasoning for including the 160G test clearer, sorry.

No worries, I was equally unclear: I should have said that the host, not
dom0, had that amount of RAM.

>
>> I did a quick test with mem=2G though just to be sure, it failed on
>> first attempt.
>
> OK, so it is unlikely to be any of the possible integer overflow type
> things I was thinking of then, thanks for testing.

No worries. I have not received anything regarding grub yet, but I
think that is where further debugging needs to happen.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
  2015-09-22  8:53 ` [Xen-devel] " Ian Campbell
@ 2016-01-22 12:56   ` Vladimir 'φ-coder/phcoder' Serbinenko
  2016-01-22 12:56   ` [Xen-devel] " Vladimir 'φ-coder/phcoder' Serbinenko
  1 sibling, 0 replies; 24+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2016-01-22 12:56 UTC (permalink / raw)
  To: Ian Campbell, grub-devel; +Cc: Andreas Sundstrom, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 7439 bytes --]

On 22.09.2015 10:53, Ian Campbell wrote:
> Hi Vladimir & grub-devel,
> 
> Do you have any thoughts on this issue with i386 pv-grub2?
> 
Is it still an issue? If so I'll try to replicate it. From the stack dump I
see that it has jumped to NULL. GRUB has no threads, so it's not a race
condition with itself, but it may be one with some Xen part. An alternative
possibility is that grub forgets to flush the cache at some point in the
boot process.
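
The "jumped to NULL" reading matches the crash log quoted below: RIP and
cr2 are both 0, and the page-fault error code is ec=0010. A minimal Python
sketch of decoding that error code follows. It assumes the value is printed
in hexadecimal and uses the standard x86 page-fault error-code bit layout;
PF_FLAGS and decode_page_fault are names invented for this example.

```python
# Standard x86 page-fault error-code bits (bit number -> meaning when set).
PF_FLAGS = {
    0: "present",            # clear means the page was not present
    1: "write",              # clear means the access was a read
    2: "user",               # clear means a supervisor-mode access
    3: "reserved-bit",
    4: "instruction-fetch",
}

def decode_page_fault(ec):
    """Return the names of the flags that are set in the error code."""
    return [name for bit, name in PF_FLAGS.items() if ec & (1 << bit)]

print(decode_page_fault(0x0010))  # ['instruction-fetch']
```

Only the instruction-fetch bit is set, so under these assumptions the
fault was an attempt to execute from a not-present page at address 0.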
> Thanks, Ian.
> 
> On Mon, 2015-09-21 at 22:03 +0200, Andreas Sundstrom wrote:
>> This is using Debian Jessie and grub 2.02~beta2-22 (with Debian patches
>> applied) and Xen 4.4.1
>>
>> I originally posted a bug report with Debian but got the suggestion to
>> file bugs with upstream as well.
>> Debian bug report:
>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=799480
>>
>> Note that my original thought was that this bug probably is within GRUB.
>> But Ian asked me to file a bug with Xen as well, you have to live with
>> the
>> fact that it is centered around GRUB though.
>>
>> Here's the information from my original bug report:
>>
>> Using 64-bit dom0 and 32-bit domU PV (para-virtualized) grub sometimes
>> fail when chainloading the domU's grub. 64-bit domU seem to work 100%
>> of the time.
>>
>> My understanding of the process:
>>
>>  * dom0 launches domU with grub that is loaded from dom0's disk.
>>  * Grub reads config file from memdisk, and then looks for grub binary in
>>     domU filesystem.
>>  * If grub is found in domU it then chainloads (multiboot) that grub
>> binary
>>     and the domU grub reads grub.cfg and continue booting.
>>  * If grub is not found in domU it reads grub.cfg and continues with
>> boot.
>>
>> It fails at step 3 in my list of the boot process, but sometimes it
>> does work so it may be something like a race condition that causes the
>> problem?
>>
>> A workaround is to not install or rename /boot/xen in domU so that the
>> first grub that is loaded from dom0's disk will not find the grub
>> binary in the domU filesystem and hence continues to read grub.cfg and
>> boot. The drawback of this is of course that the two versions can't
>> differ too much as there are different setups creating grub.cfg and
>> then reading/parsing it at boot time.
>>
>> I am not sure at this point whether this is a problem in XEN or a
>> problem in grub but I compiled the legacy pvgrub that uses some minios
>> from XEN (don't really know much more about it) and when that legacy
>> pvgrub chainloads the domU grub it seems to work 100% of the time. Now
>> the legacy pvgrub is not a real alternative as it's not packaged for
>> Debian though.
>>
>> When it fails "xl create vm -c" outputs this:
>> Parsing config from /etc/xen/vm
>> libxl: error: libxl_dom.c:35:libxl__domain_type: unable to get domain
>> type for domid=16
>> Unable to attach console
>> libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: console
>> child [0] exited with error status 1
>>
>> And "xl dmesg" shows errors like this:
>> (XEN) traps.c:2514:d15 Domain attempted WRMSR 00000000c0010201 from
>> 0x0000000000000000 to 0x000000000000ffff.
>> (XEN) d16:v0: unhandled page fault (ec=0010)
>> (XEN) Pagetable walk from 0000000000000000:
>> (XEN) L4[0x000] = 0000000200256027 000000000000049c
>> (XEN) L3[0x000] = 0000000200255027 000000000000049d
>> (XEN) L2[0x000] = 0000000200251023 00000000000004a1
>> (XEN) L1[0x000] = 0000000000000000 ffffffffffffffff
>> (XEN) domain_crash_sync called from entry.S: fault at ffff82d08021feb0
>> compat_create_bounce_frame+0xc6/0xde
>> (XEN) Domain 16 (vcpu#0) crashed on cpu#0:
>> (XEN) ----[ Xen-4.4.1 x86_64 debug=n Not tainted ]----
>> (XEN) CPU: 0
>> (XEN) RIP: e019:[<0000000000000000>]
>> (XEN) RFLAGS: 0000000000000246 EM: 1 CONTEXT: pv guest
>> (XEN) rax: 0000000000000000 rbx: 0000000000000000 rcx: 0000000000000000
>> (XEN) rdx: 0000000000000000 rsi: 0000000000499000 rdi: 0000000000800000
>> (XEN) rbp: 000000000000000a rsp: 00000000005a5ff0 r8: 0000000000000000
>> (XEN) r9: 0000000000000000 r10: ffff83023e9b9000 r11: ffff83023e9b9000
>> (XEN) r12: 0000033f3d335bfb r13: ffff82d080300800 r14: ffff82d0802ea940
>> (XEN) r15: ffff83005e819000 cr0: 000000008005003b cr4: 00000000000506f0
>> (XEN) cr3: 0000000200b7a000 cr2: 0000000000000000
>> (XEN) ds: e021 es: e021 fs: e021 gs: e021 ss: e021 cs: e019
>> (XEN) Guest stack trace from esp=005a5ff0:
>> (XEN) 00000010 00000000 0001e019 00010046 0016b38b 0016b38a 0016b389
>> 0016b388
>> (XEN) 0016b387 0016b386 0016b385 0016b384 0016b383 0016b382 0016b381
>> 0016b380
>> (XEN) 0016b37f 0016b37e 0016b37d 0016b37c 0016b37b 0016b37a 0016b379
>> 0016b378
>> (XEN) 0016b377 0016b376 0016b375 0016b374 0016b373 0016b372 0016b371
>> 0016b370
>> (XEN) 0016b36f 0016b36e 0016b36d 0016b36c 0016b36b 0016b36a 0016b369
>> 0016b368
>> (XEN) 0016b367 0016b366 0016b365 0016b364 0016b363 0016b362 0016b361
>> 0016b360
>> (XEN) 0016b35f 0016b35e 0016b35d 0016b35c 0016b35b 0016b35a 0016b359
>> 0016b358
>> (XEN) 0016b357 0016b356 0016b355 0016b354 0016b353 0016b352 0016b351
>> 0016b350
>> (XEN) 0016b34f 0016b34e 0016b34d 0016b34c 0016b34b 0016b34a 0016b349
>> 0016b348
>> (XEN) 0016b347 0016b346 0016b345 0016b344 0016b343 0016b342 0016b341
>> 0016b340
>> (XEN) 0016b33f 0016b33e 0016b33d 0016b33c 0016b33b 0016b33a 0016b339
>> 0016b338
>> (XEN) 0016b337 0016b336 0016b335 0016b334 0016b333 0016b332 0016b331
>> 0016b330
>> (XEN) 0016b32f 0016b32e 0016b32d 0016b32c 0016b32b 0016b32a 0016b329
>> 0016b328
>> (XEN) 0016b327 0016b326 0016b325 0016b324 0016b323 0016b322 0016b321
>> 0016b320
>> (XEN) 0016b31f 0016b31e 0016b31d 0016b31c 0016b31b 0016b31a 0016b319
>> 0016b318
>> (XEN) 0016b317 0016b316 0016b315 0016b314 0016b313 0016b312 0016b311
>> 0016b310
>> (XEN) 0016b30f 0016b30e 0016b30d 0016b30c 0016b30b 0016b30a 0016b309
>> 0016b308
>> (XEN) 0016b307 0016b306 0016b305 0016b304 0016b303 0016b302 0016b301
>> 0016b300
>> (XEN) 0016b2ff 0016b2fe 0016b2fd 0016b2fc 0016b2fb 0016b2fa 0016b2f9
>> 0016b2f8
>> (XEN) 0016b2f7 0016b2f6 0016b2f5 0016b2f4 0016b2f3 0016b2f2 0016b2f1
>> 0016b2f0
>>
>> An easy way to find out which grub you are in if the machine boots is
>> to hit 'c' and type 'ls', only the grub from dom0 will know about
>> (memdisk). So when trying to replicate the issue (and the domU
>> actually starts) you can hit 'c', type 'ls' (check for memdisk) and
>> then type 'halt' and relaunch the domU. Usually I can't launch more
>> than 4-5 times in a row before it fails, often it fails on my first
>> try.
>>
>> For information I have reproduced on two different AMD desktop
>> processor machines, not sure if Intel would be any different. I'm
>> pretty sure I did tests with grub from unstable with same result at
>> some point, but can test again if that is likely to work.
>>
>> The package that is in installed on the domU side is "grub-xen".
>>
>> I am unable to understand how to debug grub further on my own, I have
>> printed out text from grub so that I understood that it is the
>> chainload that fails. I see no output from the domU grub (except when
>> it works as it should of course). I can help with further testing if
>> needed.
>>
>> /Andreas
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
> 



[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 213 bytes --]


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Xen-devel] [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
  2015-09-22  8:53 ` [Xen-devel] " Ian Campbell
  2016-01-22 12:56   ` Vladimir 'φ-coder/phcoder' Serbinenko
@ 2016-01-22 12:56   ` Vladimir 'φ-coder/phcoder' Serbinenko
  2016-01-22 13:01     ` Andrew Cooper
                       ` (2 more replies)
  1 sibling, 3 replies; 24+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2016-01-22 12:56 UTC (permalink / raw)
  To: Ian Campbell, grub-devel; +Cc: Andreas Sundstrom, xen-devel

[-- Attachment #1: Type: text/plain, Size: 7439 bytes --]

On 22.09.2015 10:53, Ian Campbell wrote:
> Hi Vladimir & grub-devel,
> 
> Do you have any thoughts on this issue with i386 pv-grub2?
> 
Is it still an issue? If so I'll try to replicate it. From the stack dump I
see that it has jumped to NULL. GRUB has no threads, so it's not a race
condition with itself, but it may be one with some Xen part. An alternative
possibility is that grub forgets to flush the cache at some point in the
boot process.
> Thanks, Ian.
> 
> On Mon, 2015-09-21 at 22:03 +0200, Andreas Sundstrom wrote:
>> This is using Debian Jessie and grub 2.02~beta2-22 (with Debian patches
>> applied) and Xen 4.4.1
>>
>> I originally posted a bug report with Debian but got the suggestion to
>> file bugs with upstream as well.
>> Debian bug report:
>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=799480
>>
>> Note that my original thought was that this bug probably is within GRUB.
>> But Ian asked me to file a bug with Xen as well, you have to live with
>> the
>> fact that it is centered around GRUB though.
>>
>> Here's the information from my original bug report:
>>
>> Using 64-bit dom0 and 32-bit domU PV (para-virtualized) grub sometimes
>> fail when chainloading the domU's grub. 64-bit domU seem to work 100%
>> of the time.
>>
>> My understanding of the process:
>>
>>  * dom0 launches domU with grub that is loaded from dom0's disk.
>>  * Grub reads config file from memdisk, and then looks for grub binary in
>>     domU filesystem.
>>  * If grub is found in domU it then chainloads (multiboot) that grub
>> binary
>>     and the domU grub reads grub.cfg and continue booting.
>>  * If grub is not found in domU it reads grub.cfg and continues with
>> boot.
>>
>> It fails at step 3 in my list of the boot process, but sometimes it
>> does work so it may be something like a race condition that causes the
>> problem?
>>
>> A workaround is to not install or rename /boot/xen in domU so that the
>> first grub that is loaded from dom0's disk will not find the grub
>> binary in the domU filesystem and hence continues to read grub.cfg and
>> boot. The drawback of this is of course that the two versions can't
>> differ too much as there are different setups creating grub.cfg and
>> then reading/parsing it at boot time.
>>
>> I am not sure at this point whether this is a problem in XEN or a
>> problem in grub but I compiled the legacy pvgrub that uses some minios
>> from XEN (don't really know much more about it) and when that legacy
>> pvgrub chainloads the domU grub it seems to work 100% of the time. Now
>> the legacy pvgrub is not a real alternative as it's not packaged for
>> Debian though.
>>
>> When it fails "xl create vm -c" outputs this:
>> Parsing config from /etc/xen/vm
>> libxl: error: libxl_dom.c:35:libxl__domain_type: unable to get domain
>> type for domid=16
>> Unable to attach console
>> libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: console
>> child [0] exited with error status 1
>>
>> And "xl dmesg" shows errors like this:
>> (XEN) traps.c:2514:d15 Domain attempted WRMSR 00000000c0010201 from
>> 0x0000000000000000 to 0x000000000000ffff.
>> (XEN) d16:v0: unhandled page fault (ec=0010)
>> (XEN) Pagetable walk from 0000000000000000:
>> (XEN) L4[0x000] = 0000000200256027 000000000000049c
>> (XEN) L3[0x000] = 0000000200255027 000000000000049d
>> (XEN) L2[0x000] = 0000000200251023 00000000000004a1
>> (XEN) L1[0x000] = 0000000000000000 ffffffffffffffff
>> (XEN) domain_crash_sync called from entry.S: fault at ffff82d08021feb0
>> compat_create_bounce_frame+0xc6/0xde
>> (XEN) Domain 16 (vcpu#0) crashed on cpu#0:
>> (XEN) ----[ Xen-4.4.1 x86_64 debug=n Not tainted ]----
>> (XEN) CPU: 0
>> (XEN) RIP: e019:[<0000000000000000>]
>> (XEN) RFLAGS: 0000000000000246 EM: 1 CONTEXT: pv guest
>> (XEN) rax: 0000000000000000 rbx: 0000000000000000 rcx: 0000000000000000
>> (XEN) rdx: 0000000000000000 rsi: 0000000000499000 rdi: 0000000000800000
>> (XEN) rbp: 000000000000000a rsp: 00000000005a5ff0 r8: 0000000000000000
>> (XEN) r9: 0000000000000000 r10: ffff83023e9b9000 r11: ffff83023e9b9000
>> (XEN) r12: 0000033f3d335bfb r13: ffff82d080300800 r14: ffff82d0802ea940
>> (XEN) r15: ffff83005e819000 cr0: 000000008005003b cr4: 00000000000506f0
>> (XEN) cr3: 0000000200b7a000 cr2: 0000000000000000
>> (XEN) ds: e021 es: e021 fs: e021 gs: e021 ss: e021 cs: e019
>> (XEN) Guest stack trace from esp=005a5ff0:
>> (XEN) 00000010 00000000 0001e019 00010046 0016b38b 0016b38a 0016b389
>> 0016b388
>> (XEN) 0016b387 0016b386 0016b385 0016b384 0016b383 0016b382 0016b381
>> 0016b380
>> (XEN) 0016b37f 0016b37e 0016b37d 0016b37c 0016b37b 0016b37a 0016b379
>> 0016b378
>> (XEN) 0016b377 0016b376 0016b375 0016b374 0016b373 0016b372 0016b371
>> 0016b370
>> (XEN) 0016b36f 0016b36e 0016b36d 0016b36c 0016b36b 0016b36a 0016b369
>> 0016b368
>> (XEN) 0016b367 0016b366 0016b365 0016b364 0016b363 0016b362 0016b361
>> 0016b360
>> (XEN) 0016b35f 0016b35e 0016b35d 0016b35c 0016b35b 0016b35a 0016b359
>> 0016b358
>> (XEN) 0016b357 0016b356 0016b355 0016b354 0016b353 0016b352 0016b351
>> 0016b350
>> (XEN) 0016b34f 0016b34e 0016b34d 0016b34c 0016b34b 0016b34a 0016b349
>> 0016b348
>> (XEN) 0016b347 0016b346 0016b345 0016b344 0016b343 0016b342 0016b341
>> 0016b340
>> (XEN) 0016b33f 0016b33e 0016b33d 0016b33c 0016b33b 0016b33a 0016b339
>> 0016b338
>> (XEN) 0016b337 0016b336 0016b335 0016b334 0016b333 0016b332 0016b331
>> 0016b330
>> (XEN) 0016b32f 0016b32e 0016b32d 0016b32c 0016b32b 0016b32a 0016b329
>> 0016b328
>> (XEN) 0016b327 0016b326 0016b325 0016b324 0016b323 0016b322 0016b321
>> 0016b320
>> (XEN) 0016b31f 0016b31e 0016b31d 0016b31c 0016b31b 0016b31a 0016b319
>> 0016b318
>> (XEN) 0016b317 0016b316 0016b315 0016b314 0016b313 0016b312 0016b311
>> 0016b310
>> (XEN) 0016b30f 0016b30e 0016b30d 0016b30c 0016b30b 0016b30a 0016b309
>> 0016b308
>> (XEN) 0016b307 0016b306 0016b305 0016b304 0016b303 0016b302 0016b301
>> 0016b300
>> (XEN) 0016b2ff 0016b2fe 0016b2fd 0016b2fc 0016b2fb 0016b2fa 0016b2f9
>> 0016b2f8
>> (XEN) 0016b2f7 0016b2f6 0016b2f5 0016b2f4 0016b2f3 0016b2f2 0016b2f1
>> 0016b2f0
>>
>> An easy way to find out which grub you are in if the machine boots is
>> to hit 'c' and type 'ls', only the grub from dom0 will know about
>> (memdisk). So when trying to replicate the issue (and the domU
>> actually starts) you can hit 'c', type 'ls' (check for memdisk) and
>> then type 'halt' and relaunch the domU. Usually I can't launch more
>> than 4-5 times in a row before it fails, often it fails on my first
>> try.
>>
>> For information I have reproduced on two different AMD desktop
>> processor machines, not sure if Intel would be any different. I'm
>> pretty sure I did tests with grub from unstable with same result at
>> some point, but can test again if that is likely to work.
>>
>> The package that is installed on the domU side is "grub-xen".
>>
>> I am unable to understand how to debug grub further on my own, I have
>> printed out text from grub so that I understood that it is the
>> chainload that fails. I see no output from the domU grub (except when
>> it works as it should of course). I can help with further testing if
>> needed.
>>
>> /Andreas
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 213 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
  2016-01-22 12:56   ` [Xen-devel] " Vladimir 'φ-coder/phcoder' Serbinenko
  2016-01-22 13:01     ` Andrew Cooper
@ 2016-01-22 13:01     ` Andrew Cooper
  2016-01-22 17:44       ` [Xen-devel] " Andreas Sundstrom
  2 siblings, 0 replies; 24+ messages in thread
From: Andrew Cooper @ 2016-01-22 13:01 UTC (permalink / raw)
  To: Vladimir 'φ-coder/phcoder' Serbinenko, Ian Campbell,
	grub-devel
  Cc: Andreas Sundstrom, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 8144 bytes --]

On 22/01/16 12:56, Vladimir 'φ-coder/phcoder' Serbinenko wrote:
> On 22.09.2015 10:53, Ian Campbell wrote:
>> Hi Vladimir & grub-devel,
>>
>> Do you have any thoughts on this issue with i386 pv-grub2?
>>
> Is it still an issue? If so I'll try to replicate it. From stack dump I
> see that it has jumped to NULL. GRUB has no threads so it's not a race
> condition with itself but may be one with some Xen part. An alternative
> possibility is that grub forgets to flush cache at some point in boot
> process.

Looks like GRUB doesn't have a traptable registered with Xen (the PV
equivalent of the IDT).

First, Xen tried to inject a #GP fault and found that the entry EIP was
at 0 (which is sadly the default if nothing is specified).  It then took
a pagefault while attempting to inject the #GP, and crashed the domain.
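
For anyone following the dump, the "ec=0010" in the "unhandled page
fault" line is consistent with a jump to address 0. The sketch below
decodes it using the architectural x86 page-fault error-code bits; the
function and table names are made up for illustration and are not part
of Xen or GRUB.

```python
# Architectural x86 page-fault error-code bits: (bit, meaning if set, meaning if clear).
PF_BITS = [
    (0, "protection violation", "page not present"),
    (1, "write access", "not a write"),
    (2, "user mode", "supervisor mode"),
    (3, "reserved bit set in a paging entry", None),
    (4, "instruction fetch", "data access"),
]


def decode_pf_error_code(ec):
    """Return the facts encoded in a #PF error code as a list of strings."""
    facts = []
    for bit, if_set, if_clear in PF_BITS:
        text = if_set if ec & (1 << bit) else if_clear
        if text is not None:
            facts.append(text)
    return facts


# "ec=0010" in the xl dmesg output is hexadecimal.
print(decode_pf_error_code(0x0010))
```

Bit 4 being set marks the fault as an instruction fetch, and bit 0
being clear matches the empty L1 entry in the pagetable walk for
address 0.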

~Andrew

>> Thanks, Ian.
>>
>> On Mon, 2015-09-21 at 22:03 +0200, Andreas Sundstrom wrote:
>>> This is using Debian Jessie and grub 2.02~beta2-22 (with Debian patches
>>> applied) and Xen 4.4.1
>>>
>>> I originally posted a bug report with Debian but got the suggestion to
>>> file bugs with upstream as well.
>>> Debian bug report:
>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=799480
>>>
>>> Note that my original thought was that this bug probably is within GRUB.
>>> But Ian asked me to file a bug with Xen as well, you have to live with
>>> the
>>> fact that it is centered around GRUB though.
>>>
>>> Here's the information from my original bug report:
>>>
>>> With a 64-bit dom0 and a 32-bit domU, PV (para-virtualized) grub
>>> sometimes fails when chainloading the domU's grub. A 64-bit domU
>>> seems to work 100% of the time.
>>>
>>> My understanding of the process:
>>>
>>>  * dom0 launches domU with grub that is loaded from dom0's disk.
>>>  * Grub reads config file from memdisk, and then looks for grub binary in
>>>     domU filesystem.
>>>  * If grub is found in domU it then chainloads (multiboot) that grub
>>> binary
>>>     and the domU grub reads grub.cfg and continue booting.
>>>  * If grub is not found in domU it reads grub.cfg and continues with
>>> boot.
>>>
>>> It fails at step 3 in my list of the boot process, but sometimes it
>>> does work so it may be something like a race condition that causes the
>>> problem?
>>>
>>> A workaround is to not install or rename /boot/xen in domU so that the
>>> first grub that is loaded from dom0's disk will not find the grub
>>> binary in the domU filesystem and hence continues to read grub.cfg and
>>> boot. The drawback of this is of course that the two versions can't
>>> differ too much as there are different setups creating grub.cfg and
>>> then reading/parsing it at boot time.
>>>
>>> I am not sure at this point whether this is a problem in XEN or a
>>> problem in grub but I compiled the legacy pvgrub that uses some minios
>>> from XEN (don't really know much more about it) and when that legacy
>>> pvgrub chainloads the domU grub it seems to work 100% of the time. Now
>>> the legacy pvgrub is not a real alternative as it's not packaged for
>>> Debian though.
>>>
>>> When it fails "xl create vm -c" outputs this:
>>> Parsing config from /etc/xen/vm
>>> libxl: error: libxl_dom.c:35:libxl__domain_type: unable to get domain
>>> type for domid=16
>>> Unable to attach console
>>> libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: console
>>> child [0] exited with error status 1
>>>
>>> And "xl dmesg" shows errors like this:
>>> (XEN) traps.c:2514:d15 Domain attempted WRMSR 00000000c0010201 from
>>> 0x0000000000000000 to 0x000000000000ffff.
>>> (XEN) d16:v0: unhandled page fault (ec=0010)
>>> (XEN) Pagetable walk from 0000000000000000:
>>> (XEN) L4[0x000] = 0000000200256027 000000000000049c
>>> (XEN) L3[0x000] = 0000000200255027 000000000000049d
>>> (XEN) L2[0x000] = 0000000200251023 00000000000004a1
>>> (XEN) L1[0x000] = 0000000000000000 ffffffffffffffff
>>> (XEN) domain_crash_sync called from entry.S: fault at ffff82d08021feb0
>>> compat_create_bounce_frame+0xc6/0xde
>>> (XEN) Domain 16 (vcpu#0) crashed on cpu#0:
>>> (XEN) ----[ Xen-4.4.1 x86_64 debug=n Not tainted ]----
>>> (XEN) CPU: 0
>>> (XEN) RIP: e019:[<0000000000000000>]
>>> (XEN) RFLAGS: 0000000000000246 EM: 1 CONTEXT: pv guest
>>> (XEN) rax: 0000000000000000 rbx: 0000000000000000 rcx: 0000000000000000
>>> (XEN) rdx: 0000000000000000 rsi: 0000000000499000 rdi: 0000000000800000
>>> (XEN) rbp: 000000000000000a rsp: 00000000005a5ff0 r8: 0000000000000000
>>> (XEN) r9: 0000000000000000 r10: ffff83023e9b9000 r11: ffff83023e9b9000
>>> (XEN) r12: 0000033f3d335bfb r13: ffff82d080300800 r14: ffff82d0802ea940
>>> (XEN) r15: ffff83005e819000 cr0: 000000008005003b cr4: 00000000000506f0
>>> (XEN) cr3: 0000000200b7a000 cr2: 0000000000000000
>>> (XEN) ds: e021 es: e021 fs: e021 gs: e021 ss: e021 cs: e019
>>> (XEN) Guest stack trace from esp=005a5ff0:
>>> (XEN) 00000010 00000000 0001e019 00010046 0016b38b 0016b38a 0016b389
>>> 0016b388
>>> (XEN) 0016b387 0016b386 0016b385 0016b384 0016b383 0016b382 0016b381
>>> 0016b380
>>> (XEN) 0016b37f 0016b37e 0016b37d 0016b37c 0016b37b 0016b37a 0016b379
>>> 0016b378
>>> (XEN) 0016b377 0016b376 0016b375 0016b374 0016b373 0016b372 0016b371
>>> 0016b370
>>> (XEN) 0016b36f 0016b36e 0016b36d 0016b36c 0016b36b 0016b36a 0016b369
>>> 0016b368
>>> (XEN) 0016b367 0016b366 0016b365 0016b364 0016b363 0016b362 0016b361
>>> 0016b360
>>> (XEN) 0016b35f 0016b35e 0016b35d 0016b35c 0016b35b 0016b35a 0016b359
>>> 0016b358
>>> (XEN) 0016b357 0016b356 0016b355 0016b354 0016b353 0016b352 0016b351
>>> 0016b350
>>> (XEN) 0016b34f 0016b34e 0016b34d 0016b34c 0016b34b 0016b34a 0016b349
>>> 0016b348
>>> (XEN) 0016b347 0016b346 0016b345 0016b344 0016b343 0016b342 0016b341
>>> 0016b340
>>> (XEN) 0016b33f 0016b33e 0016b33d 0016b33c 0016b33b 0016b33a 0016b339
>>> 0016b338
>>> (XEN) 0016b337 0016b336 0016b335 0016b334 0016b333 0016b332 0016b331
>>> 0016b330
>>> (XEN) 0016b32f 0016b32e 0016b32d 0016b32c 0016b32b 0016b32a 0016b329
>>> 0016b328
>>> (XEN) 0016b327 0016b326 0016b325 0016b324 0016b323 0016b322 0016b321
>>> 0016b320
>>> (XEN) 0016b31f 0016b31e 0016b31d 0016b31c 0016b31b 0016b31a 0016b319
>>> 0016b318
>>> (XEN) 0016b317 0016b316 0016b315 0016b314 0016b313 0016b312 0016b311
>>> 0016b310
>>> (XEN) 0016b30f 0016b30e 0016b30d 0016b30c 0016b30b 0016b30a 0016b309
>>> 0016b308
>>> (XEN) 0016b307 0016b306 0016b305 0016b304 0016b303 0016b302 0016b301
>>> 0016b300
>>> (XEN) 0016b2ff 0016b2fe 0016b2fd 0016b2fc 0016b2fb 0016b2fa 0016b2f9
>>> 0016b2f8
>>> (XEN) 0016b2f7 0016b2f6 0016b2f5 0016b2f4 0016b2f3 0016b2f2 0016b2f1
>>> 0016b2f0
>>>
>>> An easy way to find out which grub you are in if the machine boots is
>>> to hit 'c' and type 'ls', only the grub from dom0 will know about
>>> (memdisk). So when trying to replicate the issue (and the domU
>>> actually starts) you can hit 'c', type 'ls' (check for memdisk) and
>>> then type 'halt' and relaunch the domU. Usually I can't launch more
>>> than 4-5 times in a row before it fails, often it fails on my first
>>> try.
>>>
>>> For information I have reproduced on two different AMD desktop
>>> processor machines, not sure if Intel would be any different. I'm
>>> pretty sure I did tests with grub from unstable with same result at
>>> some point, but can test again if that is likely to work.
>>>
>>> The package that is installed on the domU side is "grub-xen".
>>>
>>> I am unable to understand how to debug grub further on my own, I have
>>> printed out text from grub so that I understood that it is the
>>> chainload that fails. I see no output from the domU grub (except when
>>> it works as it should of course). I can help with further testing if
>>> needed.
>>>
>>> /Andreas
>>>
>>>
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@lists.xen.org
>>> http://lists.xen.org/xen-devel
>
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel


[-- Attachment #1.2: Type: text/html, Size: 8886 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Xen-devel] [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
  2016-01-22 12:56   ` [Xen-devel] " Vladimir 'φ-coder/phcoder' Serbinenko
@ 2016-01-22 13:01     ` Andrew Cooper
  2016-01-22 13:08       ` Vladimir 'φ-coder/phcoder' Serbinenko
  2016-01-22 13:08       ` [Xen-devel] " Vladimir 'φ-coder/phcoder' Serbinenko
  2016-01-22 13:01     ` Andrew Cooper
  2016-01-22 17:44       ` [Xen-devel] " Andreas Sundstrom
  2 siblings, 2 replies; 24+ messages in thread
From: Andrew Cooper @ 2016-01-22 13:01 UTC (permalink / raw)
  To: Vladimir 'φ-coder/phcoder' Serbinenko, Ian Campbell,
	grub-devel
  Cc: Andreas Sundstrom, xen-devel

[-- Attachment #1: Type: text/plain, Size: 7962 bytes --]

On 22/01/16 12:56, Vladimir 'φ-coder/phcoder' Serbinenko wrote:
> On 22.09.2015 10:53, Ian Campbell wrote:
>> Hi Vladimir & grub-devel,
>>
>> Do you have any thoughts on this issue with i386 pv-grub2?
>>
> Is it still an issue? If so I'll try to replicate it. From stack dump I
> see that it has jumped to NULL. GRUB has no threads so it's not a race
> condition with itself but may be one with some Xen part. An alternative
> possibility is that grub forgets to flush cache at some point in boot
> process.

Looks like GRUB doesn't have a traptable registered with Xen (the PV
equivalent of the IDT).

First, Xen tried to inject a #GP fault and found that the entry EIP was
at 0 (which is sadly the default if nothing is specified).  It then took
a pagefault while attempting to inject the #GP, and crashed the domain.
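
The first words of the "Guest stack trace" also fit this reading. The
sketch below assumes they form a 32-bit exception frame that carries an
error code (error code, EIP, CS, EFLAGS, in that order), and that the
upper 16 bits of the CS word hold extra state saved by Xen (the saved
upcall mask, as far as I know). Both points are assumptions, and the
names are hypothetical.

```python
from collections import namedtuple

# Assumed layout of a 32-bit exception frame with an error code.
ExceptionFrame = namedtuple("ExceptionFrame", "error_code eip cs cs_high eflags")


def decode_frame(words):
    """Split the first four 32-bit stack words into frame fields."""
    error_code, eip, cs_word, eflags = words[:4]
    return ExceptionFrame(error_code, eip, cs_word & 0xFFFF, cs_word >> 16, eflags)


# First line of the guest stack trace from the report (hexadecimal words).
line = "00000010 00000000 0001e019 00010046 0016b38b 0016b38a 0016b389"
frame = decode_frame([int(word, 16) for word in line.split()])
print(f"eip={frame.eip:#x} cs={frame.cs:#x} error_code={frame.error_code:#x}")
```

Under those assumptions the frame records EIP 0 with CS e019 and the
same error code (0x10) that Xen reports for the page fault.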

~Andrew

>> Thanks, Ian.
>>
>> On Mon, 2015-09-21 at 22:03 +0200, Andreas Sundstrom wrote:
>>> This is using Debian Jessie and grub 2.02~beta2-22 (with Debian patches
>>> applied) and Xen 4.4.1
>>>
>>> I originally posted a bug report with Debian but got the suggestion to
>>> file bugs with upstream as well.
>>> Debian bug report:
>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=799480
>>>
>>> Note that my original thought was that this bug probably is within GRUB.
>>> But Ian asked me to file a bug with Xen as well, you have to live with
>>> the
>>> fact that it is centered around GRUB though.
>>>
>>> Here's the information from my original bug report:
>>>
>>> With a 64-bit dom0 and a 32-bit domU, PV (para-virtualized) grub
>>> sometimes fails when chainloading the domU's grub. A 64-bit domU
>>> seems to work 100% of the time.
>>>
>>> My understanding of the process:
>>>
>>>  * dom0 launches domU with grub that is loaded from dom0's disk.
>>>  * Grub reads config file from memdisk, and then looks for grub binary in
>>>     domU filesystem.
>>>  * If grub is found in domU it then chainloads (multiboot) that grub
>>> binary
>>>     and the domU grub reads grub.cfg and continue booting.
>>>  * If grub is not found in domU it reads grub.cfg and continues with
>>> boot.
>>>
>>> It fails at step 3 in my list of the boot process, but sometimes it
>>> does work so it may be something like a race condition that causes the
>>> problem?
>>>
>>> A workaround is to not install or rename /boot/xen in domU so that the
>>> first grub that is loaded from dom0's disk will not find the grub
>>> binary in the domU filesystem and hence continues to read grub.cfg and
>>> boot. The drawback of this is of course that the two versions can't
>>> differ too much as there are different setups creating grub.cfg and
>>> then reading/parsing it at boot time.
>>>
>>> I am not sure at this point whether this is a problem in XEN or a
>>> problem in grub but I compiled the legacy pvgrub that uses some minios
>>> from XEN (don't really know much more about it) and when that legacy
>>> pvgrub chainloads the domU grub it seems to work 100% of the time. Now
>>> the legacy pvgrub is not a real alternative as it's not packaged for
>>> Debian though.
>>>
>>> When it fails "xl create vm -c" outputs this:
>>> Parsing config from /etc/xen/vm
>>> libxl: error: libxl_dom.c:35:libxl__domain_type: unable to get domain
>>> type for domid=16
>>> Unable to attach console
>>> libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: console
>>> child [0] exited with error status 1
>>>
>>> And "xl dmesg" shows errors like this:
>>> (XEN) traps.c:2514:d15 Domain attempted WRMSR 00000000c0010201 from
>>> 0x0000000000000000 to 0x000000000000ffff.
>>> (XEN) d16:v0: unhandled page fault (ec=0010)
>>> (XEN) Pagetable walk from 0000000000000000:
>>> (XEN) L4[0x000] = 0000000200256027 000000000000049c
>>> (XEN) L3[0x000] = 0000000200255027 000000000000049d
>>> (XEN) L2[0x000] = 0000000200251023 00000000000004a1
>>> (XEN) L1[0x000] = 0000000000000000 ffffffffffffffff
>>> (XEN) domain_crash_sync called from entry.S: fault at ffff82d08021feb0
>>> compat_create_bounce_frame+0xc6/0xde
>>> (XEN) Domain 16 (vcpu#0) crashed on cpu#0:
>>> (XEN) ----[ Xen-4.4.1 x86_64 debug=n Not tainted ]----
>>> (XEN) CPU: 0
>>> (XEN) RIP: e019:[<0000000000000000>]
>>> (XEN) RFLAGS: 0000000000000246 EM: 1 CONTEXT: pv guest
>>> (XEN) rax: 0000000000000000 rbx: 0000000000000000 rcx: 0000000000000000
>>> (XEN) rdx: 0000000000000000 rsi: 0000000000499000 rdi: 0000000000800000
>>> (XEN) rbp: 000000000000000a rsp: 00000000005a5ff0 r8: 0000000000000000
>>> (XEN) r9: 0000000000000000 r10: ffff83023e9b9000 r11: ffff83023e9b9000
>>> (XEN) r12: 0000033f3d335bfb r13: ffff82d080300800 r14: ffff82d0802ea940
>>> (XEN) r15: ffff83005e819000 cr0: 000000008005003b cr4: 00000000000506f0
>>> (XEN) cr3: 0000000200b7a000 cr2: 0000000000000000
>>> (XEN) ds: e021 es: e021 fs: e021 gs: e021 ss: e021 cs: e019
>>> (XEN) Guest stack trace from esp=005a5ff0:
>>> (XEN) 00000010 00000000 0001e019 00010046 0016b38b 0016b38a 0016b389
>>> 0016b388
>>> (XEN) 0016b387 0016b386 0016b385 0016b384 0016b383 0016b382 0016b381
>>> 0016b380
>>> (XEN) 0016b37f 0016b37e 0016b37d 0016b37c 0016b37b 0016b37a 0016b379
>>> 0016b378
>>> (XEN) 0016b377 0016b376 0016b375 0016b374 0016b373 0016b372 0016b371
>>> 0016b370
>>> (XEN) 0016b36f 0016b36e 0016b36d 0016b36c 0016b36b 0016b36a 0016b369
>>> 0016b368
>>> (XEN) 0016b367 0016b366 0016b365 0016b364 0016b363 0016b362 0016b361
>>> 0016b360
>>> (XEN) 0016b35f 0016b35e 0016b35d 0016b35c 0016b35b 0016b35a 0016b359
>>> 0016b358
>>> (XEN) 0016b357 0016b356 0016b355 0016b354 0016b353 0016b352 0016b351
>>> 0016b350
>>> (XEN) 0016b34f 0016b34e 0016b34d 0016b34c 0016b34b 0016b34a 0016b349
>>> 0016b348
>>> (XEN) 0016b347 0016b346 0016b345 0016b344 0016b343 0016b342 0016b341
>>> 0016b340
>>> (XEN) 0016b33f 0016b33e 0016b33d 0016b33c 0016b33b 0016b33a 0016b339
>>> 0016b338
>>> (XEN) 0016b337 0016b336 0016b335 0016b334 0016b333 0016b332 0016b331
>>> 0016b330
>>> (XEN) 0016b32f 0016b32e 0016b32d 0016b32c 0016b32b 0016b32a 0016b329
>>> 0016b328
>>> (XEN) 0016b327 0016b326 0016b325 0016b324 0016b323 0016b322 0016b321
>>> 0016b320
>>> (XEN) 0016b31f 0016b31e 0016b31d 0016b31c 0016b31b 0016b31a 0016b319
>>> 0016b318
>>> (XEN) 0016b317 0016b316 0016b315 0016b314 0016b313 0016b312 0016b311
>>> 0016b310
>>> (XEN) 0016b30f 0016b30e 0016b30d 0016b30c 0016b30b 0016b30a 0016b309
>>> 0016b308
>>> (XEN) 0016b307 0016b306 0016b305 0016b304 0016b303 0016b302 0016b301
>>> 0016b300
>>> (XEN) 0016b2ff 0016b2fe 0016b2fd 0016b2fc 0016b2fb 0016b2fa 0016b2f9
>>> 0016b2f8
>>> (XEN) 0016b2f7 0016b2f6 0016b2f5 0016b2f4 0016b2f3 0016b2f2 0016b2f1
>>> 0016b2f0
>>>
>>> An easy way to find out which grub you are in if the machine boots is
>>> to hit 'c' and type 'ls', only the grub from dom0 will know about
>>> (memdisk). So when trying to replicate the issue (and the domU
>>> actually starts) you can hit 'c', type 'ls' (check for memdisk) and
>>> then type 'halt' and relaunch the domU. Usually I can't launch more
>>> than 4-5 times in a row before it fails, often it fails on my first
>>> try.
>>>
>>> For information I have reproduced on two different AMD desktop
>>> processor machines, not sure if Intel would be any different. I'm
>>> pretty sure I did tests with grub from unstable with same result at
>>> some point, but can test again if that is likely to work.
>>>
>>> The package that is installed on the domU side is "grub-xen".
>>>
>>> I am unable to understand how to debug grub further on my own, I have
>>> printed out text from grub so that I understood that it is the
>>> chainload that fails. I see no output from the domU grub (except when
>>> it works as it should of course). I can help with further testing if
>>> needed.
>>>
>>> /Andreas
>>>
>>>
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@lists.xen.org
>>> http://lists.xen.org/xen-devel
>
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel


[-- Attachment #2: Type: text/html, Size: 8673 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
  2016-01-22 13:01     ` Andrew Cooper
@ 2016-01-22 13:08       ` Vladimir 'φ-coder/phcoder' Serbinenko
  2016-01-22 13:08       ` [Xen-devel] " Vladimir 'φ-coder/phcoder' Serbinenko
  1 sibling, 0 replies; 24+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2016-01-22 13:08 UTC (permalink / raw)
  To: Andrew Cooper, Ian Campbell, grub-devel; +Cc: Andreas Sundstrom, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 8465 bytes --]

On 22.01.2016 14:01, Andrew Cooper wrote:
> On 22/01/16 12:56, Vladimir 'φ-coder/phcoder' Serbinenko wrote:
>> On 22.09.2015 10:53, Ian Campbell wrote:
>>> Hi Vladimir & grub-devel,
>>>
>>> Do you have any thoughts on this issue with i386 pv-grub2?
>>>
>> Is it still an issue? If so I'll try to replicate it. From stack dump I
>> see that it has jumped to NULL. GRUB has no threads so it's not a race
>> condition with itself but may be one with some Xen part. An alternative
>> possibility is that grub forgets to flush cache at some point in boot
>> process.
> 
> Looks like GRUB doesn't have a traptable registered with Xen (the PV
> equivalent of the IDT).
> 
> First, Xen tried to inject a #GP fault and found that the entry EIP was
> at 0 (which is sadly the default if nothing is specified).  It then took
> a pagefault while attempting to inject the #GP, and crashed the domain.
> 
Do you have a link describing how to add one? We could put a
catch-stacktrace-abort handler on it.
> ~Andrew
> 
>>> Thanks, Ian.
>>>
>>> On Mon, 2015-09-21 at 22:03 +0200, Andreas Sundstrom wrote:
>>>> This is using Debian Jessie and grub 2.02~beta2-22 (with Debian patches
>>>> applied) and Xen 4.4.1
>>>>
>>>> I originally posted a bug report with Debian but got the suggestion to
>>>> file bugs with upstream as well.
>>>> Debian bug report:
>>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=799480
>>>>
>>>> Note that my original thought was that this bug probably is within GRUB.
>>>> But Ian asked me to file a bug with Xen as well, you have to live with
>>>> the
>>>> fact that it is centered around GRUB though.
>>>>
>>>> Here's the information from my original bug report:
>>>>
>>>> With a 64-bit dom0 and a 32-bit domU, PV (para-virtualized) grub
>>>> sometimes fails when chainloading the domU's grub. A 64-bit domU
>>>> seems to work 100% of the time.
>>>>
>>>> My understanding of the process:
>>>>
>>>>  * dom0 launches domU with grub that is loaded from dom0's disk.
>>>>  * Grub reads config file from memdisk, and then looks for grub binary in
>>>>     domU filesystem.
>>>>  * If grub is found in domU it then chainloads (multiboot) that grub
>>>> binary
>>>>     and the domU grub reads grub.cfg and continue booting.
>>>>  * If grub is not found in domU it reads grub.cfg and continues with
>>>> boot.
>>>>
>>>> It fails at step 3 in my list of the boot process, but sometimes it
>>>> does work so it may be something like a race condition that causes the
>>>> problem?
>>>>
>>>> A workaround is to not install or rename /boot/xen in domU so that the
>>>> first grub that is loaded from dom0's disk will not find the grub
>>>> binary in the domU filesystem and hence continues to read grub.cfg and
>>>> boot. The drawback of this is of course that the two versions can't
>>>> differ too much as there are different setups creating grub.cfg and
>>>> then reading/parsing it at boot time.
>>>>
>>>> I am not sure at this point whether this is a problem in XEN or a
>>>> problem in grub but I compiled the legacy pvgrub that uses some minios
>>>> from XEN (don't really know much more about it) and when that legacy
>>>> pvgrub chainloads the domU grub it seems to work 100% of the time. Now
>>>> the legacy pvgrub is not a real alternative as it's not packaged for
>>>> Debian though.
>>>>
>>>> When it fails "xl create vm -c" outputs this:
>>>> Parsing config from /etc/xen/vm
>>>> libxl: error: libxl_dom.c:35:libxl__domain_type: unable to get domain
>>>> type for domid=16
>>>> Unable to attach console
>>>> libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: console
>>>> child [0] exited with error status 1
>>>>
>>>> And "xl dmesg" shows errors like this:
>>>> (XEN) traps.c:2514:d15 Domain attempted WRMSR 00000000c0010201 from
>>>> 0x0000000000000000 to 0x000000000000ffff.
>>>> (XEN) d16:v0: unhandled page fault (ec=0010)
>>>> (XEN) Pagetable walk from 0000000000000000:
>>>> (XEN) L4[0x000] = 0000000200256027 000000000000049c
>>>> (XEN) L3[0x000] = 0000000200255027 000000000000049d
>>>> (XEN) L2[0x000] = 0000000200251023 00000000000004a1
>>>> (XEN) L1[0x000] = 0000000000000000 ffffffffffffffff
>>>> (XEN) domain_crash_sync called from entry.S: fault at ffff82d08021feb0
>>>> compat_create_bounce_frame+0xc6/0xde
>>>> (XEN) Domain 16 (vcpu#0) crashed on cpu#0:
>>>> (XEN) ----[ Xen-4.4.1 x86_64 debug=n Not tainted ]----
>>>> (XEN) CPU: 0
>>>> (XEN) RIP: e019:[<0000000000000000>]
>>>> (XEN) RFLAGS: 0000000000000246 EM: 1 CONTEXT: pv guest
>>>> (XEN) rax: 0000000000000000 rbx: 0000000000000000 rcx: 0000000000000000
>>>> (XEN) rdx: 0000000000000000 rsi: 0000000000499000 rdi: 0000000000800000
>>>> (XEN) rbp: 000000000000000a rsp: 00000000005a5ff0 r8: 0000000000000000
>>>> (XEN) r9: 0000000000000000 r10: ffff83023e9b9000 r11: ffff83023e9b9000
>>>> (XEN) r12: 0000033f3d335bfb r13: ffff82d080300800 r14: ffff82d0802ea940
>>>> (XEN) r15: ffff83005e819000 cr0: 000000008005003b cr4: 00000000000506f0
>>>> (XEN) cr3: 0000000200b7a000 cr2: 0000000000000000
>>>> (XEN) ds: e021 es: e021 fs: e021 gs: e021 ss: e021 cs: e019
>>>> (XEN) Guest stack trace from esp=005a5ff0:
>>>> (XEN) 00000010 00000000 0001e019 00010046 0016b38b 0016b38a 0016b389
>>>> 0016b388
>>>> (XEN) 0016b387 0016b386 0016b385 0016b384 0016b383 0016b382 0016b381
>>>> 0016b380
>>>> (XEN) 0016b37f 0016b37e 0016b37d 0016b37c 0016b37b 0016b37a 0016b379
>>>> 0016b378
>>>> (XEN) 0016b377 0016b376 0016b375 0016b374 0016b373 0016b372 0016b371
>>>> 0016b370
>>>> (XEN) 0016b36f 0016b36e 0016b36d 0016b36c 0016b36b 0016b36a 0016b369
>>>> 0016b368
>>>> (XEN) 0016b367 0016b366 0016b365 0016b364 0016b363 0016b362 0016b361
>>>> 0016b360
>>>> (XEN) 0016b35f 0016b35e 0016b35d 0016b35c 0016b35b 0016b35a 0016b359
>>>> 0016b358
>>>> (XEN) 0016b357 0016b356 0016b355 0016b354 0016b353 0016b352 0016b351
>>>> 0016b350
>>>> (XEN) 0016b34f 0016b34e 0016b34d 0016b34c 0016b34b 0016b34a 0016b349
>>>> 0016b348
>>>> (XEN) 0016b347 0016b346 0016b345 0016b344 0016b343 0016b342 0016b341
>>>> 0016b340
>>>> (XEN) 0016b33f 0016b33e 0016b33d 0016b33c 0016b33b 0016b33a 0016b339
>>>> 0016b338
>>>> (XEN) 0016b337 0016b336 0016b335 0016b334 0016b333 0016b332 0016b331
>>>> 0016b330
>>>> (XEN) 0016b32f 0016b32e 0016b32d 0016b32c 0016b32b 0016b32a 0016b329
>>>> 0016b328
>>>> (XEN) 0016b327 0016b326 0016b325 0016b324 0016b323 0016b322 0016b321
>>>> 0016b320
>>>> (XEN) 0016b31f 0016b31e 0016b31d 0016b31c 0016b31b 0016b31a 0016b319
>>>> 0016b318
>>>> (XEN) 0016b317 0016b316 0016b315 0016b314 0016b313 0016b312 0016b311
>>>> 0016b310
>>>> (XEN) 0016b30f 0016b30e 0016b30d 0016b30c 0016b30b 0016b30a 0016b309
>>>> 0016b308
>>>> (XEN) 0016b307 0016b306 0016b305 0016b304 0016b303 0016b302 0016b301
>>>> 0016b300
>>>> (XEN) 0016b2ff 0016b2fe 0016b2fd 0016b2fc 0016b2fb 0016b2fa 0016b2f9
>>>> 0016b2f8
>>>> (XEN) 0016b2f7 0016b2f6 0016b2f5 0016b2f4 0016b2f3 0016b2f2 0016b2f1
>>>> 0016b2f0
>>>>
>>>> An easy way to find out which grub you are in if the machine boots is
>>>> to hit 'c' and type 'ls', only the grub from dom0 will know about
>>>> (memdisk). So when trying to replicate the issue (and the domU
>>>> actually starts) you can hit 'c', type 'ls' (check for memdisk) and
>>>> then type 'halt' and relaunch the domU. Usually I can't launch more
>>>> than 4-5 times in a row before it fails, often it fails on my first
>>>> try.
>>>>
>>>> For information I have reproduced on two different AMD desktop
>>>> processor machines, not sure if Intel would be any different. I'm
>>>> pretty sure I did tests with grub from unstable with same result at
>>>> some point, but can test again if that is likely to work.
>>>>
>>>> The package that is installed on the domU side is "grub-xen".
>>>>
>>>> I am unable to understand how to debug grub further on my own, I have
>>>> printed out text from grub so that I understood that it is the
>>>> chainload that fails. I see no output from the domU grub (except when
>>>> it works as it should of course). I can help with further testing if
>>>> needed.
>>>>
>>>> /Andreas
>>>>
>>>>
>>>> _______________________________________________
>>>> Xen-devel mailing list
>>>> Xen-devel@lists.xen.org
>>>> http://lists.xen.org/xen-devel
>>
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
> 



[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 213 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Xen-devel] [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
  2016-01-22 13:01     ` Andrew Cooper
  2016-01-22 13:08       ` Vladimir 'φ-coder/phcoder' Serbinenko
@ 2016-01-22 13:08       ` Vladimir 'φ-coder/phcoder' Serbinenko
  2016-01-22 13:43         ` Andrew Cooper
  2016-01-22 13:43         ` Andrew Cooper
  1 sibling, 2 replies; 24+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2016-01-22 13:08 UTC (permalink / raw)
  To: Andrew Cooper, Ian Campbell, grub-devel; +Cc: Andreas Sundstrom, xen-devel

[-- Attachment #1: Type: text/plain, Size: 8465 bytes --]

On 22.01.2016 14:01, Andrew Cooper wrote:
> On 22/01/16 12:56, Vladimir 'φ-coder/phcoder' Serbinenko wrote:
>> On 22.09.2015 10:53, Ian Campbell wrote:
>>> Hi Vladimir & grub-devel,
>>>
>>> Do you have any thoughts on this issue with i386 pv-grub2?
>>>
>> Is it still an issue? If so I'll try to replicate it. From stack dump I
>> see that it has jumped to NULL. GRUB has no threads so it's not a race
>> condition with itself but may be one with some Xen part. An alternative
>> possibility is that grub forgets to flush cache at some point in boot
>> process.
> 
> Looks like GRUB doesn't have a traptable registered with Xen (the PV
> equivalent of the IDT).
> 
> First, Xen tried to inject a #GP fault and found that the entry EIP was
> at 0 (which is sadly the default if nothing is specified).  It then took
> a pagefault while attempting to inject the #GP, and crashed the domain.
> 
Do you have a link describing how to add one? We could put a
catch-stacktrace-abort handler on it.
> ~Andrew
> 
>>> Thanks, Ian.
>>>
>>> On Mon, 2015-09-21 at 22:03 +0200, Andreas Sundstrom wrote:
>>>> This is using Debian Jessie and grub 2.02~beta2-22 (with Debian patches
>>>> applied) and Xen 4.4.1
>>>>
>>>> I originally posted a bug report with Debian but got the suggestion to
>>>> file bugs with upstream as well.
>>>> Debian bug report:
>>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=799480
>>>>
>>>> Note that my original thought was that this bug probably is within GRUB.
>>>> But Ian asked me to file a bug with Xen as well, you have to live with
>>>> the
>>>> fact that it is centered around GRUB though.
>>>>
>>>> Here's the information from my original bug report:
>>>>
>>>> With a 64-bit dom0 and a 32-bit domU, PV (para-virtualized) grub
>>>> sometimes fails when chainloading the domU's grub. A 64-bit domU
>>>> seems to work 100% of the time.
>>>>
>>>> My understanding of the process:
>>>>
>>>>  * dom0 launches domU with grub that is loaded from dom0's disk.
>>>>  * Grub reads config file from memdisk, and then looks for grub binary in
>>>>     domU filesystem.
>>>>  * If grub is found in domU it then chainloads (multiboot) that grub
>>>> binary
>>>>     and the domU grub reads grub.cfg and continue booting.
>>>>  * If grub is not found in domU it reads grub.cfg and continues with
>>>> boot.
>>>>
>>>> It fails at step 3 in my list of the boot process, but sometimes it
>>>> does work so it may be something like a race condition that causes the
>>>> problem?
>>>>
>>>> A workaround is to not install or rename /boot/xen in domU so that the
>>>> first grub that is loaded from dom0's disk will not find the grub
>>>> binary in the domU filesystem and hence continues to read grub.cfg and
>>>> boot. The drawback of this is of course that the two versions can't
>>>> differ too much as there are different setups creating grub.cfg and
>>>> then reading/parsing it at boot time.
>>>>
>>>> I am not sure at this point whether this is a problem in XEN or a
>>>> problem in grub but I compiled the legacy pvgrub that uses some minios
>>>> from XEN (don't really know much more about it) and when that legacy
>>>> pvgrub chainloads the domU grub it seems to work 100% of the time. Now
>>>> the legacy pvgrub is not a real alternative as it's not packaged for
>>>> Debian though.
>>>>
>>>> When it fails "xl create vm -c" outputs this:
>>>> Parsing config from /etc/xen/vm
>>>> libxl: error: libxl_dom.c:35:libxl__domain_type: unable to get domain
>>>> type for domid=16
>>>> Unable to attach console
>>>> libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: console
>>>> child [0] exited with error status 1
>>>>
>>>> And "xl dmesg" shows errors like this:
>>>> (XEN) traps.c:2514:d15 Domain attempted WRMSR 00000000c0010201 from
>>>> 0x0000000000000000 to 0x000000000000ffff.
>>>> (XEN) d16:v0: unhandled page fault (ec=0010)
>>>> (XEN) Pagetable walk from 0000000000000000:
>>>> (XEN) L4[0x000] = 0000000200256027 000000000000049c
>>>> (XEN) L3[0x000] = 0000000200255027 000000000000049d
>>>> (XEN) L2[0x000] = 0000000200251023 00000000000004a1
>>>> (XEN) L1[0x000] = 0000000000000000 ffffffffffffffff
>>>> (XEN) domain_crash_sync called from entry.S: fault at ffff82d08021feb0
>>>> compat_create_bounce_frame+0xc6/0xde
>>>> (XEN) Domain 16 (vcpu#0) crashed on cpu#0:
>>>> (XEN) ----[ Xen-4.4.1 x86_64 debug=n Not tainted ]----
>>>> (XEN) CPU: 0
>>>> (XEN) RIP: e019:[<0000000000000000>]
>>>> (XEN) RFLAGS: 0000000000000246 EM: 1 CONTEXT: pv guest
>>>> (XEN) rax: 0000000000000000 rbx: 0000000000000000 rcx: 0000000000000000
>>>> (XEN) rdx: 0000000000000000 rsi: 0000000000499000 rdi: 0000000000800000
>>>> (XEN) rbp: 000000000000000a rsp: 00000000005a5ff0 r8: 0000000000000000
>>>> (XEN) r9: 0000000000000000 r10: ffff83023e9b9000 r11: ffff83023e9b9000
>>>> (XEN) r12: 0000033f3d335bfb r13: ffff82d080300800 r14: ffff82d0802ea940
>>>> (XEN) r15: ffff83005e819000 cr0: 000000008005003b cr4: 00000000000506f0
>>>> (XEN) cr3: 0000000200b7a000 cr2: 0000000000000000
>>>> (XEN) ds: e021 es: e021 fs: e021 gs: e021 ss: e021 cs: e019
>>>> (XEN) Guest stack trace from esp=005a5ff0:
>>>> (XEN) 00000010 00000000 0001e019 00010046 0016b38b 0016b38a 0016b389
>>>> 0016b388
>>>> (XEN) 0016b387 0016b386 0016b385 0016b384 0016b383 0016b382 0016b381
>>>> 0016b380
>>>> (XEN) 0016b37f 0016b37e 0016b37d 0016b37c 0016b37b 0016b37a 0016b379
>>>> 0016b378
>>>> (XEN) 0016b377 0016b376 0016b375 0016b374 0016b373 0016b372 0016b371
>>>> 0016b370
>>>> (XEN) 0016b36f 0016b36e 0016b36d 0016b36c 0016b36b 0016b36a 0016b369
>>>> 0016b368
>>>> (XEN) 0016b367 0016b366 0016b365 0016b364 0016b363 0016b362 0016b361
>>>> 0016b360
>>>> (XEN) 0016b35f 0016b35e 0016b35d 0016b35c 0016b35b 0016b35a 0016b359
>>>> 0016b358
>>>> (XEN) 0016b357 0016b356 0016b355 0016b354 0016b353 0016b352 0016b351
>>>> 0016b350
>>>> (XEN) 0016b34f 0016b34e 0016b34d 0016b34c 0016b34b 0016b34a 0016b349
>>>> 0016b348
>>>> (XEN) 0016b347 0016b346 0016b345 0016b344 0016b343 0016b342 0016b341
>>>> 0016b340
>>>> (XEN) 0016b33f 0016b33e 0016b33d 0016b33c 0016b33b 0016b33a 0016b339
>>>> 0016b338
>>>> (XEN) 0016b337 0016b336 0016b335 0016b334 0016b333 0016b332 0016b331
>>>> 0016b330
>>>> (XEN) 0016b32f 0016b32e 0016b32d 0016b32c 0016b32b 0016b32a 0016b329
>>>> 0016b328
>>>> (XEN) 0016b327 0016b326 0016b325 0016b324 0016b323 0016b322 0016b321
>>>> 0016b320
>>>> (XEN) 0016b31f 0016b31e 0016b31d 0016b31c 0016b31b 0016b31a 0016b319
>>>> 0016b318
>>>> (XEN) 0016b317 0016b316 0016b315 0016b314 0016b313 0016b312 0016b311
>>>> 0016b310
>>>> (XEN) 0016b30f 0016b30e 0016b30d 0016b30c 0016b30b 0016b30a 0016b309
>>>> 0016b308
>>>> (XEN) 0016b307 0016b306 0016b305 0016b304 0016b303 0016b302 0016b301
>>>> 0016b300
>>>> (XEN) 0016b2ff 0016b2fe 0016b2fd 0016b2fc 0016b2fb 0016b2fa 0016b2f9
>>>> 0016b2f8
>>>> (XEN) 0016b2f7 0016b2f6 0016b2f5 0016b2f4 0016b2f3 0016b2f2 0016b2f1
>>>> 0016b2f0
>>>>
>>>> An easy way to find out which grub you are in if the machine boots is
>>>> to hit 'c' and type 'ls', only the grub from dom0 will know about
>>>> (memdisk). So when trying to replicate the issue (and the domU
>>>> actually starts) you can hit 'c', type 'ls' (check for memdisk) and
>>>> then type 'halt' and relaunch the domU. Usually I can't launch more
>>>> than 4-5 times in a row before it fails, often it fails on my first
>>>> try.
>>>>
>>>> For information I have reproduced on two different AMD desktop
>>>> processor machines, not sure if Intel would be any different. I'm
>>>> pretty sure I did tests with grub from unstable with same result at
>>>> some point, but can test again if that is likely to work.
>>>>
>>>> The package that is installed on the domU side is "grub-xen".
>>>>
>>>> I am unable to understand how to debug grub further on my own, I have
>>>> printed out text from grub so that I understood that it is the
>>>> chainload that fails. I see no output from the domU grub (except when
>>>> it works as it should of course). I can help with further testing if
>>>> needed.
>>>>
>>>> /Andreas




* Re: [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
  2016-01-22 13:08       ` [Xen-devel] " Vladimir 'φ-coder/phcoder' Serbinenko
  2016-01-22 13:43         ` Andrew Cooper
@ 2016-01-22 13:43         ` Andrew Cooper
  1 sibling, 0 replies; 24+ messages in thread
From: Andrew Cooper @ 2016-01-22 13:43 UTC (permalink / raw)
  To: Vladimir 'φ-coder/phcoder' Serbinenko, Ian Campbell,
	grub-devel
  Cc: Andreas Sundstrom, xen-devel

On 22/01/16 13:08, Vladimir 'φ-coder/phcoder' Serbinenko wrote:
> On 22.01.2016 14:01, Andrew Cooper wrote:
>> On 22/01/16 12:56, Vladimir 'φ-coder/phcoder' Serbinenko wrote:
>>> On 22.09.2015 10:53, Ian Campbell wrote:
>>>> Hi Vladimir & grub-devel,
>>>>
>>>> Do you have any thoughts on this issue with i386 pv-grub2?
>>>>
>>> Is it still an issue? If so I'll try to replicate it. From stack dump I
>>> see that it has jumped to NULL. GRUB has no threads so it's not a race
>>> condition with itself but may be one with some Xen part. An alternative
>>> possibility is that grub forgets to flush cache at some point in the boot
>>> process.
>> Looks like GRUB doesn't have a traptable registered with Xen (the PV
>> equivalent of the IDT).
>>
>> First, Xen tried to inject a #GP fault and found that the entry EIP was
>> at 0 (which is sadly the default if nothing is specified).  It then took
>> a pagefault while attempting to inject the #GP, and crashed the domain.
>>
> Do you have a link that shows how to add one? We can put a
> catch-stacktrace-abort handler on it.

This is from my microkernel framework, and is probably the most succinct
code implementation:

http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen-test-framework.git;a=blob;f=arch/x86/pv/traps.c;h=7f9a1908d260659c10f5cbb1d2d234c9fea1edb5;hb=HEAD#l31

The hypercall ABI documentation is:

http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/include/public/arch-x86/xen.h;h=cdd93c1c6446a92e89188c6a5132538188825d27;hb=refs/heads/staging#l126
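
As a rough illustration of what registering a table involves (a minimal
sketch, not the code behind either link): struct trap_info below follows
the layout in Xen's public arch-x86 header. GUEST_KERNEL_CS,
fatal_trap_entry and build_trap_table are hypothetical names, and the
selector value is simply the cs shown in the crash dump. A real guest
would use an assembly entry stub as the handler and pass the finished
table to HYPERVISOR_set_trap_table().

```c
#include <stddef.h>
#include <stdint.h>

/* One trap-table entry, laid out as in Xen's public arch-x86/xen.h. */
struct trap_info {
    uint8_t       vector;  /* exception vector */
    uint8_t       flags;   /* bits 0-3: privilege level; bit 4: clear event enable */
    uint16_t      cs;      /* code selector */
    unsigned long address; /* handler entry point */
};

/* Assumption: kernel code selector of a 32-bit PV guest, taken from the dump. */
#define GUEST_KERNEL_CS 0xe019

/* Hypothetical placeholder for an assembly stub that prints state and aborts. */
void fatal_trap_entry(void)
{
}

/* Vectors to catch: #DE, #UD, #GP, #PF. */
static const uint8_t fatal_vectors[] = { 0, 6, 13, 14 };

/*
 * Fill `table` with one entry per vector, followed by an entry whose
 * address is 0, which Xen treats as the end of the table.
 * Returns the number of real entries, or 0 if `cap` is too small.
 */
size_t build_trap_table(struct trap_info *table, size_t cap)
{
    size_t n = sizeof(fatal_vectors) / sizeof(fatal_vectors[0]);

    if (cap < n + 1)
        return 0;

    for (size_t i = 0; i < n; i++) {
        table[i].vector  = fatal_vectors[i];
        table[i].flags   = 0;
        table[i].cs      = GUEST_KERNEL_CS;
        table[i].address = (unsigned long)(uintptr_t)fatal_trap_entry;
    }
    table[n] = (struct trap_info){ 0 };
    return n;
}
```

With a table like this registered, the #GP would be delivered to a
non-zero entry point rather than to EIP 0, so the guest could report the
fault instead of the domain being crashed.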

~Andrew


* Re: [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
  2016-01-22 12:56   ` [Xen-devel] " Vladimir 'φ-coder/phcoder' Serbinenko
@ 2016-01-22 17:44       ` Andreas Sundstrom
  2016-01-22 13:01     ` Andrew Cooper
  2016-01-22 17:44       ` [Xen-devel] " Andreas Sundstrom
  2 siblings, 0 replies; 24+ messages in thread
From: Andreas Sundstrom @ 2016-01-22 17:44 UTC (permalink / raw)
  To: Vladimir 'φ-coder/phcoder' Serbinenko, Ian Campbell,
	grub-devel
  Cc: xen-devel

On 2016-01-22 13:56, Vladimir 'φ-coder/phcoder' Serbinenko wrote:
> On 22.09.2015 10:53, Ian Campbell wrote:
>> Hi Vladimir & grub-devel,
>>
>> Do you have any thoughts on this issue with i386 pv-grub2?
>>
> Is it still an issue? If so I'll try to replicate it. From stack dump I
> see that it has jumped to NULL. GRUB has no threads so it's not a race
> condition with itself but may be one with some Xen part. An alternative
> possibility is that grub forgets to flush cache at some point in the boot
> process.

I can still reproduce the issue.
I don't think much has changed in my setup since the report.
I run the current version of Xen and GRUB from Debian stable.

/Andreas


end of thread, other threads:[~2016-01-22 21:26 UTC | newest]

Thread overview: 24+ messages
2015-09-21 20:03 [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub Andreas Sundstrom
2015-09-22  7:22 ` Andrew Cooper
2015-09-22  8:52   ` Ian Campbell
2015-09-22 13:26   ` Andreas Sundstrom
2015-09-22  8:53 ` Ian Campbell
2015-09-22  8:53 ` [Xen-devel] " Ian Campbell
2016-01-22 12:56   ` Vladimir 'φ-coder/phcoder' Serbinenko
2016-01-22 12:56   ` [Xen-devel] " Vladimir 'φ-coder/phcoder' Serbinenko
2016-01-22 13:01     ` Andrew Cooper
2016-01-22 13:08       ` Vladimir 'φ-coder/phcoder' Serbinenko
2016-01-22 13:08       ` [Xen-devel] " Vladimir 'φ-coder/phcoder' Serbinenko
2016-01-22 13:43         ` Andrew Cooper
2016-01-22 13:43         ` Andrew Cooper
2016-01-22 13:01     ` Andrew Cooper
2016-01-22 17:44     ` Andreas Sundstrom
2016-01-22 17:44       ` [Xen-devel] " Andreas Sundstrom
2015-09-22 22:37 ` Samuel Thibault
2015-09-23  8:34   ` Ian Campbell
2015-09-23 12:47     ` Andreas Sundstrom
2015-09-23 14:18       ` Ian Campbell
2015-09-24 17:28         ` Andreas Sundstrom
2015-09-25  8:36           ` Ian Campbell
2015-09-25 13:23             ` Andreas Sundstrom
2015-09-23  8:37   ` Andreas Sundstrom
