* [Xen-devel] [TESTDAY] Test report
@ 2019-11-14 18:34 Tamas K Lengyel
2019-11-14 18:39 ` Andrew Cooper
0 siblings, 1 reply; 11+ messages in thread
From: Tamas K Lengyel @ 2019-11-14 18:34 UTC
To: Xen-devel
* Hardware: i7-2700
* Software: Debian buster
* Guest operating systems: Debian stretch
* Functionality tested: compiling, installing, booting with dom0=pvh
* Comments: All works
----
* Hardware: i3-7100
* Software: Debian buster
* Guest operating systems: Debian stretch, Debian jessie, Windows 7
SP1 x86, Windows 7 SP1 x64, Windows 10 1903
* Functionality tested: compiling, installing, booting from UEFI via
grub.efi, altp2m, introspection
* Comments: All works; altp2m+introspection requires the ept=pml=0
boot flag to work around a deadlock in Xen
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
* Re: [Xen-devel] [TESTDAY] Test report
2019-11-14 18:34 [Xen-devel] [TESTDAY] Test report Tamas K Lengyel
@ 2019-11-14 18:39 ` Andrew Cooper
2019-11-14 22:36 ` Tamas K Lengyel
0 siblings, 1 reply; 11+ messages in thread
From: Andrew Cooper @ 2019-11-14 18:39 UTC
To: Tamas K Lengyel, Xen-devel
On 14/11/2019 18:34, Tamas K Lengyel wrote:
> * Comments: All works; altp2m+introspection requires the ept=pml=0
> boot flag to work around a deadlock in Xen
Is this separate from the general problem with EPT A/D and
write-protecting pagetables?
~Andrew
* Re: [Xen-devel] [TESTDAY] Test report
2019-11-14 18:39 ` Andrew Cooper
@ 2019-11-14 22:36 ` Tamas K Lengyel
2019-11-15 11:56 ` Andrew Cooper
0 siblings, 1 reply; 11+ messages in thread
From: Tamas K Lengyel @ 2019-11-14 22:36 UTC
To: Andrew Cooper; +Cc: Xen-devel
On Thu, Nov 14, 2019 at 11:39 AM Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>
> On 14/11/2019 18:34, Tamas K Lengyel wrote:
> > * Comments: All works; altp2m+introspection requires the ept=pml=0
> > boot flag to work around a deadlock in Xen
>
> Is this separate from the general problem with EPT A/D and
> write-protecting pagetables?
>
It sounds like it is; it happens without write-protecting in-guest
pagetables. I didn't have time to investigate where the deadlock
happens, and since the workaround is fine for the use case, it wasn't a
priority to figure out.
Tamas
* Re: [Xen-devel] [TESTDAY] Test report
2019-11-14 22:36 ` Tamas K Lengyel
@ 2019-11-15 11:56 ` Andrew Cooper
2019-11-15 15:19 ` Tamas K Lengyel
0 siblings, 1 reply; 11+ messages in thread
From: Andrew Cooper @ 2019-11-15 11:56 UTC
To: Tamas K Lengyel
Cc: Alexandru Isaila, Xen-devel, Petre Pircalabu, Razvan Cojocaru,
Juergen Gross
On 14/11/2019 22:36, Tamas K Lengyel wrote:
> On Thu, Nov 14, 2019 at 11:39 AM Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
>> On 14/11/2019 18:34, Tamas K Lengyel wrote:
>>> * Comments: All works; altp2m+introspection requires the ept=pml=0
>>> boot flag to work around a deadlock in Xen
>> Is this separate from the general problem with EPT A/D and
>> write-protecting pagetables?
>>
> It sounds like it is; it happens without write-protecting in-guest
> pagetables. I didn't have time to investigate where the deadlock
> happens, and since the workaround is fine for the use case, it wasn't a
> priority to figure out.
Thinking about it, PML will do the wrong thing (deadlocks aside) as soon
as any altp2m gfn translations are used.
I'd be tempted to work around the deadlock by disabling pml the moment
altp2m is touched. That would give a slightly less bad user experience,
and should be easy to sort for 4.13.
Thoughts (inc. Juergen as RM)?
~Andrew
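[Editorial sketch, not part of the original message.] The workaround
proposed above (turn PML off for a guest as soon as altp2m is used on
it, rather than disabling PML system-wide) can be illustrated with a toy
C model. Everything here is hypothetical: struct toy_domain, its fields,
and toy_altp2m_activate() are illustration names only and do not
correspond to Xen's actual structures or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-guest state; NOT Xen's real struct domain. */
struct toy_domain {
    bool pml_enabled;    /* Page Modification Logging in use for this guest */
    bool altp2m_active;  /* alternate p2m views in use for this guest */
};

/*
 * Hypothetical hook run the first time altp2m is enabled for a guest.
 * PML is switched off for that guest only; other guests are untouched.
 */
static void toy_altp2m_activate(struct toy_domain *d)
{
    assert(d != NULL);

    if (d->pml_enabled)
        d->pml_enabled = false;

    d->altp2m_active = true;
}
```

In this model a guest that never touches altp2m keeps PML enabled, which
is the property the follow-up replies call out as the advantage over the
global ept=pml=0 boot flag.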
* Re: [Xen-devel] [TESTDAY] Test report
2019-11-15 11:56 ` Andrew Cooper
@ 2019-11-15 15:19 ` Tamas K Lengyel
2019-11-15 15:32 ` Jürgen Groß
0 siblings, 1 reply; 11+ messages in thread
From: Tamas K Lengyel @ 2019-11-15 15:19 UTC
To: Andrew Cooper
Cc: Alexandru Isaila, Xen-devel, Petre Pircalabu, Razvan Cojocaru,
Juergen Gross
On Fri, Nov 15, 2019 at 4:56 AM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> On 14/11/2019 22:36, Tamas K Lengyel wrote:
> > On Thu, Nov 14, 2019 at 11:39 AM Andrew Cooper
> > <andrew.cooper3@citrix.com> wrote:
> >> On 14/11/2019 18:34, Tamas K Lengyel wrote:
> >>> * Comments: All works; altp2m+introspection requires the ept=pml=0
> >>> boot flag to work around a deadlock in Xen
> >> Is this separate from the general problem with EPT A/D and
> >> write-protecting pagetables?
> >>
> > It sounds like it is; it happens without write-protecting in-guest
> > pagetables. I didn't have time to investigate where the deadlock
> > happens, and since the workaround is fine for the use case, it wasn't a
> > priority to figure out.
>
> Thinking about it, PML will do the wrong thing (deadlocks aside) as soon
> as any altp2m gfn translations are used.
>
> I'd be tempted to work around the deadlock by disabling pml the moment
> altp2m is touched. That would give a slightly less bad user experience,
> and should be easy to sort for 4.13.
>
> Thoughts (inc. Juergen as RM)?
That sounds like a good idea to me. That way you can keep pml for
guests where it doesn't cause an issue instead of disabling it
system-wide.
Tamas
* Re: [Xen-devel] [TESTDAY] Test report
2019-11-15 15:19 ` Tamas K Lengyel
@ 2019-11-15 15:32 ` Jürgen Groß
0 siblings, 0 replies; 11+ messages in thread
From: Jürgen Groß @ 2019-11-15 15:32 UTC
To: Tamas K Lengyel, Andrew Cooper
Cc: Alexandru Isaila, Xen-devel, Petre Pircalabu, Razvan Cojocaru
On 15.11.19 16:19, Tamas K Lengyel wrote:
> On Fri, Nov 15, 2019 at 4:56 AM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>
>> On 14/11/2019 22:36, Tamas K Lengyel wrote:
>>> On Thu, Nov 14, 2019 at 11:39 AM Andrew Cooper
>>> <andrew.cooper3@citrix.com> wrote:
>>>> On 14/11/2019 18:34, Tamas K Lengyel wrote:
>>>>> * Comments: All works; altp2m+introspection requires the ept=pml=0
>>>>> boot flag to work around a deadlock in Xen
>>>> Is this separate from the general problem with EPT A/D and
>>>> write-protecting pagetables?
>>>>
>>> It sounds like it is; it happens without write-protecting in-guest
>>> pagetables. I didn't have time to investigate where the deadlock
>>> happens, and since the workaround is fine for the use case, it wasn't a
>>> priority to figure out.
>>
>> Thinking about it, PML will do the wrong thing (deadlocks aside) as soon
>> as any altp2m gfn translations are used.
>>
>> I'd be tempted to work around the deadlock by disabling pml the moment
>> altp2m is touched. That would give a slightly less bad user experience,
>> and should be easy to sort for 4.13.
>>
>> Thoughts (inc. Juergen as RM)?
>
> That sounds like a good idea to me. That way you can keep pml for
> guests where it doesn't cause an issue instead of disabling it
> system-wide.
Sounds like a decent way to handle it.
Juergen
* Re: [Xen-devel] [TESTDAY] Test report
2019-11-18 6:15 ` Jürgen Groß
@ 2019-11-19 7:22 ` Roman Shaposhnik
0 siblings, 0 replies; 11+ messages in thread
From: Roman Shaposhnik @ 2019-11-19 7:22 UTC
To: Jürgen Groß; +Cc: xen-devel
On Sun, Nov 17, 2019 at 10:15 PM Jürgen Groß <jgross@suse.com> wrote:
>
> On 16.11.19 02:12, Roman Shaposhnik wrote:
> > NOTE: this may or may not be a hair-on-fire problem; reporting it
> > anyway since I'd hate to pass on something that may be a serious issue.
> > I haven't had time to debug this just yet -- so just reporting it here
> > pretty raw.
> >
> > Software:
> > Xen 4.13 RC2
> > Linux kernel 4.19.5
> > Hardware:
> > Supermicro E300
> > https://www.supermicro.com/en/products/system/Mini-ITX/SYS-E300-8D.cfm
> > Supermicro E100
> > https://www.supermicro.com/en/products/system/Box_PC/SYS-E100-9S.cfm
> > Supermicro E50
> > https://www.supermicro.com/en/products/system/Box_PC/SYS-E50-9AP.cfm
> >
> > Functionality tested: trying to boot Dom0
> > Comments: Xen boots completely and then seems like it either dies
> > right after saying
> > Xen relinquishing a console
> > or Dom0 dies (without printing a single line of output)
> >
> > FWIW, this started happening after upgrade to RC2. IOW, if I take my
> > previous RC1 binary and stick it into the very same setup --
> > everything boots fine.
> >
> > The issue doesn't seem to be reproducible on Dell boxes (and in my
> > virtual QEMU setup) that I've got.
>
> Can you please add the following to dom0's boot parameters:
>
> console=hvc0 earlyprintk=xen
>
> and send the Xen boot log (obtained via serial line)?
Will do once I get to the lab (traveling for KubeCon for the next
couple of days).
That said, if you see the other thread -- we've figured out that the
culprit was efi=no-rs, which regressed in functionality between RC1 and
RC2. Marek has suggested a patch that I need to test.
Now, if I drop efi=no-rs -- I can boot all the hardware mentioned in
*this* report just fine.
A much bigger problem is that the following entire product line is now
busted with Xen 4.13 RC2:
https://www.dell.com/en-us/work/shop/gateways-embedded-computing/sc/gateways-embedded-pcs/edge-gateway?~ck=bt
On all these boxes:
- Without efi=no-rs option Xen panics on boot
- With efi=no-rs Xen boots fine, but Dom0 can't come up
Thanks,
Roman.
P.S. An additional complication with these Dell boxes is that it
required reasonably major brain surgery with a soldering iron to rig
console output on them. I did it for one box in my lab, but I need
physical access to it and I'm currently traveling.
* Re: [Xen-devel] [TESTDAY] Test report
2019-11-16 1:12 Roman Shaposhnik
@ 2019-11-18 6:15 ` Jürgen Groß
2019-11-19 7:22 ` Roman Shaposhnik
0 siblings, 1 reply; 11+ messages in thread
From: Jürgen Groß @ 2019-11-18 6:15 UTC
To: Roman Shaposhnik, xen-devel
On 16.11.19 02:12, Roman Shaposhnik wrote:
> NOTE: this may or may not be a hair-on-fire problem; reporting it
> anyway since I'd hate to pass on something that may be a serious issue.
> I haven't had time to debug this just yet -- so just reporting it here
> pretty raw.
>
> Software:
> Xen 4.13 RC2
> Linux kernel 4.19.5
> Hardware:
> Supermicro E300
> https://www.supermicro.com/en/products/system/Mini-ITX/SYS-E300-8D.cfm
> Supermicro E100
> https://www.supermicro.com/en/products/system/Box_PC/SYS-E100-9S.cfm
> Supermicro E50
> https://www.supermicro.com/en/products/system/Box_PC/SYS-E50-9AP.cfm
>
> Functionality tested: trying to boot Dom0
> Comments: Xen boots completely and then seems like it either dies
> right after saying
> Xen relinquishing a console
> or Dom0 dies (without printing a single line of output)
>
> FWIW, this started happening after upgrade to RC2. IOW, if I take my
> previous RC1 binary and stick it into the very same setup --
> everything boots fine.
>
> The issue doesn't seem to be reproducible on Dell boxes (and in my
> virtual QEMU setup) that I've got.
Can you please add the following to dom0's boot parameters:
console=hvc0 earlyprintk=xen
and send the Xen boot log (obtained via serial line)?
Juergen
* [Xen-devel] [TESTDAY] Test report
@ 2019-11-16 1:12 Roman Shaposhnik
2019-11-18 6:15 ` Jürgen Groß
0 siblings, 1 reply; 11+ messages in thread
From: Roman Shaposhnik @ 2019-11-16 1:12 UTC
To: xen-devel
NOTE: this may or may not be a hair-on-fire problem; reporting it
anyway since I'd hate to pass on something that may be a serious issue.
I haven't had time to debug this just yet -- so just reporting it here
pretty raw.
Software:
Xen 4.13 RC2
Linux kernel 4.19.5
Hardware:
Supermicro E300
https://www.supermicro.com/en/products/system/Mini-ITX/SYS-E300-8D.cfm
Supermicro E100
https://www.supermicro.com/en/products/system/Box_PC/SYS-E100-9S.cfm
Supermicro E50
https://www.supermicro.com/en/products/system/Box_PC/SYS-E50-9AP.cfm
Functionality tested: trying to boot Dom0
Comments: Xen boots completely and then seems like it either dies
right after saying
Xen relinquishing a console
or Dom0 dies (without printing a single line of output)
FWIW, this started happening after upgrade to RC2. IOW, if I take my
previous RC1 binary and stick it into the very same setup --
everything boots fine.
The issue doesn't seem to be reproducible on Dell boxes (and in my
virtual QEMU setup) that I've got.
Thanks,
Roman.
* Re: [Xen-devel] [TESTDAY] Test report
2019-11-15 2:39 Roman Shaposhnik
@ 2019-11-15 7:04 ` Jürgen Groß
0 siblings, 0 replies; 11+ messages in thread
From: Jürgen Groß @ 2019-11-15 7:04 UTC
To: Roman Shaposhnik, xen-devel
On 15.11.19 03:39, Roman Shaposhnik wrote:
> * Software: Xen 4.13 RC2
> * Hardware: Dell IoT Gateway 3000 series
> * Software: Project EVE
> * Guest operating systems: Alpine Linux
> * Functionality tested: compiling, installing, Booting with dom0=pv
> * Comments: All works, aside from xl create often timing out
>
> The timeout happens when either doing xl create or
> xl creating in a paused state (with -p) and later resuming.
> The error message is below:
> libxl: error: libxl_dom_suspend.c:609:dm_resume_done: Domain 3:Failed to resume device model: rc=-9
>
> We've actually tracked this issue down to this piece of code:
> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxl/libxl_dom_suspend.c;h=248dbc33e384ae008e4ab9ce8fb573be06dddd72;hb=HEAD#l515
>
> Curiously enough it seems to be the only place (aside from
> libxl__wait_for_device_model_deprecated) that uses the
> timeout value that low. Everywhere else it seems to be
> LIBXL_DEVICE_MODEL_START_TIMEOUT * 1000
Thanks for the thorough analysis.
It's clearly a regression. Patch sent.
Juergen
* [Xen-devel] [TESTDAY] Test report
@ 2019-11-15 2:39 Roman Shaposhnik
2019-11-15 7:04 ` Jürgen Groß
0 siblings, 1 reply; 11+ messages in thread
From: Roman Shaposhnik @ 2019-11-15 2:39 UTC
To: xen-devel
* Software: Xen 4.13 RC2
* Hardware: Dell IoT Gateway 3000 series
* Software: Project EVE
* Guest operating systems: Alpine Linux
* Functionality tested: compiling, installing, Booting with dom0=pv
* Comments: All works, aside from xl create often timing out
The timeout happens when either doing xl create or
xl creating in a paused state (with -p) and later resuming.
The error message is below:
libxl: error: libxl_dom_suspend.c:609:dm_resume_done: Domain 3:Failed to resume device model: rc=-9
We've actually tracked this issue down to this piece of code:
http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxl/libxl_dom_suspend.c;h=248dbc33e384ae008e4ab9ce8fb573be06dddd72;hb=HEAD#l515
Curiously enough it seems to be the only place (aside from
libxl__wait_for_device_model_deprecated) that uses the
timeout value that low. Everywhere else it seems to be
LIBXL_DEVICE_MODEL_START_TIMEOUT * 1000
./libxl/libxl_dom_suspend.c: LIBXL_DEVICE_MODEL_START_TIMEOUT);
./libxl/libxl_dm.c: spawn->timeout_ms = LIBXL_DEVICE_MODEL_START_TIMEOUT * 1000;
./libxl/libxl_dm.c: dmss->spawn.timeout_ms = LIBXL_DEVICE_MODEL_START_TIMEOUT * 1000;
./libxl/libxl_pci.c: pas->xswait.timeout_ms = LIBXL_DEVICE_MODEL_START_TIMEOUT * 1000;
./libxl/libxl_pci.c: LIBXL_DEVICE_MODEL_START_TIMEOUT * 1000);
./libxl/libxl_pci.c: prs->xswait.timeout_ms = LIBXL_DEVICE_MODEL_START_TIMEOUT * 1000;
./libxl/libxl_device.c: LIBXL_DEVICE_MODEL_START_TIMEOUT,
./libxl/libxl_internal.h:#define LIBXL_DEVICE_MODEL_START_TIMEOUT 60
./libxl/libxenlight_test.so:LIBXL_DEVICE_MODEL_START_TIMEOUT 60
./libxl/libxenlight.so.4.13:LIBXL_DEVICE_MODEL_START_TIMEOUT 60
./libxl/libxenlight.so:LIBXL_DEVICE_MODEL_START_TIMEOUT 60
./libxl/libxenlight.so.4.13.0:LIBXL_DEVICE_MODEL_START_TIMEOUT 60
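[Editorial sketch, not part of the original message.] The unit mismatch
suggested by the listing above can be shown in a few lines of C. The
macro value 60 is taken from the grep output; that it is expressed in
seconds is an inference from the "* 1000" used at the timeout_ms call
sites. seconds_to_ms(), timeout_ms_scaled() and timeout_ms_unscaled()
are hypothetical helpers written for this illustration, not libxl
functions, and the sketch assumes the low-timeout call site feeds a
millisecond-based wait.

```c
#include <assert.h>

/* Value as shown for libxl_internal.h in the listing; assumed to be seconds. */
#define LIBXL_DEVICE_MODEL_START_TIMEOUT 60

/* Hypothetical helper: convert a timeout in seconds to milliseconds. */
static int seconds_to_ms(int seconds)
{
    assert(seconds >= 0);
    return seconds * 1000;
}

/* Pattern used by most call sites in the listing: scale to milliseconds. */
static int timeout_ms_scaled(void)
{
    return seconds_to_ms(LIBXL_DEVICE_MODEL_START_TIMEOUT);
}

/* Pattern at the suspect call site: the seconds value passed through unscaled. */
static int timeout_ms_unscaled(void)
{
    return LIBXL_DEVICE_MODEL_START_TIMEOUT;
}
```

Under these assumptions timeout_ms_scaled() yields 60000 (60 s), while
timeout_ms_unscaled() yields 60, which a millisecond-based consumer
would treat as 60 ms. That would be consistent with xl create often
timing out.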
Thread overview: 11+ messages
2019-11-14 18:34 [Xen-devel] [TESTDAY] Test report Tamas K Lengyel
2019-11-14 18:39 ` Andrew Cooper
2019-11-14 22:36 ` Tamas K Lengyel
2019-11-15 11:56 ` Andrew Cooper
2019-11-15 15:19 ` Tamas K Lengyel
2019-11-15 15:32 ` Jürgen Groß
2019-11-15 2:39 Roman Shaposhnik
2019-11-15 7:04 ` Jürgen Groß
2019-11-16 1:12 Roman Shaposhnik
2019-11-18 6:15 ` Jürgen Groß
2019-11-19 7:22 ` Roman Shaposhnik