* [Xen-devel] [TESTDAY] Test report
@ 2019-11-14 18:34 Tamas K Lengyel
  2019-11-14 18:39 ` Andrew Cooper
  0 siblings, 1 reply; 11+ messages in thread
From: Tamas K Lengyel @ 2019-11-14 18:34 UTC (permalink / raw)
  To: Xen-devel

* Hardware: i7-2700

* Software: Debian buster

* Guest operating systems: Debian stretch

* Functionality tested: compiling, installing, Booting with dom0=pvh

* Comments: All works

----

* Hardware: i3-7100

* Software: Debian buster

* Guest operating systems: Debian stretch, Debian jessie, Windows 7
SP1 x86, Windows 7 SP1 x64, Windows 10 1903

* Functionality tested: compiling, installing, booting from UEFI via
grub.efi, altp2m, introspection

* Comments: All works, altp2m+introspection requires the ept=pml=0
boot flag specified to work around a deadlock in Xen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* Re: [Xen-devel] [TESTDAY] Test report
  2019-11-14 18:34 [Xen-devel] [TESTDAY] Test report Tamas K Lengyel
@ 2019-11-14 18:39 ` Andrew Cooper
  2019-11-14 22:36   ` Tamas K Lengyel
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Cooper @ 2019-11-14 18:39 UTC (permalink / raw)
  To: Tamas K Lengyel, Xen-devel

On 14/11/2019 18:34, Tamas K Lengyel wrote:
> * Comments: All works, altp2m+introspection requires the ept=pml=0
> boot flag specified to work around a deadlock in Xen

Is this separate from the general problem with EPT A/D and
write-protecting pagetables?

~Andrew


* Re: [Xen-devel] [TESTDAY] Test report
  2019-11-14 18:39 ` Andrew Cooper
@ 2019-11-14 22:36   ` Tamas K Lengyel
  2019-11-15 11:56     ` Andrew Cooper
  0 siblings, 1 reply; 11+ messages in thread
From: Tamas K Lengyel @ 2019-11-14 22:36 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

On Thu, Nov 14, 2019 at 11:39 AM Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>
> On 14/11/2019 18:34, Tamas K Lengyel wrote:
> > * Comments: All works, altp2m+introspection requires the ept=pml=0
> > boot flag specified to work around a deadlock in Xen
>
> Is this separate from the general problem with EPT A/D and
> write-protecting pagetables?
>

It sounds like it is: it happens without write-protecting in-guest
pagetables. I didn't have time to investigate where the deadlock
happens, and since the workaround is fine for the use case, figuring
it out wasn't a priority.

Tamas


* Re: [Xen-devel] [TESTDAY] Test report
  2019-11-14 22:36   ` Tamas K Lengyel
@ 2019-11-15 11:56     ` Andrew Cooper
  2019-11-15 15:19       ` Tamas K Lengyel
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Cooper @ 2019-11-15 11:56 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Alexandru Isaila, Xen-devel, Petre Pircalabu, Razvan Cojocaru,
	Juergen Gross

On 14/11/2019 22:36, Tamas K Lengyel wrote:
> On Thu, Nov 14, 2019 at 11:39 AM Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
>> On 14/11/2019 18:34, Tamas K Lengyel wrote:
>>> * Comments: All works, altp2m+introspection requires the ept=pml=0
>>> boot flag specified to work around a deadlock in Xen
>> Is this separate from the general problem with EPT A/D and
>> write-protecting pagetables?
>>
> It sounds like it is, it happens without write-protecting in-guest
> pagetables. I didn't have time to investigate where the deadlock
> happens and since the workaround is fine for the usecase it wasn't a
> priority to figure out.

Thinking about it, PML will do the wrong thing (deadlocks aside) as soon
as any altp2m gfn translations are used.

I'd be tempted to work around the deadlock by disabling pml the moment
altp2m is touched.  That would give a slightly less bad user experience,
and should be easy to sort for 4.13.

Thoughts, (inc. Juergen as RM) ?

~Andrew
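
The idea of disabling PML (Page Modification Logging) per domain as
soon as altp2m is touched can be sketched roughly as below. This is
only a toy model: `struct toy_domain` and `toy_altp2m_touch()` are
hypothetical names made up for illustration and are not Xen's actual
structures or functions.

```c
#include <stdbool.h>

/* Hypothetical per-domain state; these are NOT Xen's real structures. */
struct toy_domain {
    bool pml_enabled;    /* PML currently in use for this guest */
    bool altp2m_active;  /* guest has started using altp2m */
};

/*
 * Hypothetical hook run the first time altp2m is touched for a domain.
 * PML is switched off for this domain only; other domains keep it.
 */
static void toy_altp2m_touch(struct toy_domain *d)
{
    if (d->pml_enabled)
        d->pml_enabled = false;
    d->altp2m_active = true;
}
```

The point of the per-domain flag is that guests which never use altp2m
keep PML, in contrast to the system-wide ept=pml=0 boot flag.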


* Re: [Xen-devel] [TESTDAY] Test report
  2019-11-15 11:56     ` Andrew Cooper
@ 2019-11-15 15:19       ` Tamas K Lengyel
  2019-11-15 15:32         ` Jürgen Groß
  0 siblings, 1 reply; 11+ messages in thread
From: Tamas K Lengyel @ 2019-11-15 15:19 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Alexandru Isaila, Xen-devel, Petre Pircalabu, Razvan Cojocaru,
	Juergen Gross

On Fri, Nov 15, 2019 at 4:56 AM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> On 14/11/2019 22:36, Tamas K Lengyel wrote:
> > On Thu, Nov 14, 2019 at 11:39 AM Andrew Cooper
> > <andrew.cooper3@citrix.com> wrote:
> >> On 14/11/2019 18:34, Tamas K Lengyel wrote:
> >>> * Comments: All works, altp2m+introspection requires the ept=pml=0
> >>> boot flag specified to work around a deadlock in Xen
> >> Is this separate from the general problem with EPT A/D and
> >> write-protecting pagetables?
> >>
> > It sounds like it is, it happens without write-protecting in-guest
> > pagetables. I didn't have time to investigate where the deadlock
> > happens and since the workaround is fine for the usecase it wasn't a
> > priority to figure out.
>
> Thinking about it, PML will do the wrong thing (deadlocks aside) as soon
> as any altp2m gfn translations are used.
>
> I'd be tempted to work around the deadlock by disabling pml the moment
> altp2m is touched.  That would give a slightly less bad user experience,
> and should be easy to sort for 4.13.
>
> Thoughts, (inc. Juergen as RM) ?

That sounds like a good idea to me; that way you can keep PML for
guests where it doesn't cause an issue instead of disabling it
system-wide.

Tamas


* Re: [Xen-devel] [TESTDAY] Test report
  2019-11-15 15:19       ` Tamas K Lengyel
@ 2019-11-15 15:32         ` Jürgen Groß
  0 siblings, 0 replies; 11+ messages in thread
From: Jürgen Groß @ 2019-11-15 15:32 UTC (permalink / raw)
  To: Tamas K Lengyel, Andrew Cooper
  Cc: Alexandru Isaila, Xen-devel, Petre Pircalabu, Razvan Cojocaru

On 15.11.19 16:19, Tamas K Lengyel wrote:
> On Fri, Nov 15, 2019 at 4:56 AM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>
>> On 14/11/2019 22:36, Tamas K Lengyel wrote:
>>> On Thu, Nov 14, 2019 at 11:39 AM Andrew Cooper
>>> <andrew.cooper3@citrix.com> wrote:
>>>> On 14/11/2019 18:34, Tamas K Lengyel wrote:
>>>>> * Comments: All works, altp2m+introspection requires the ept=pml=0
>>>>> boot flag specified to work around a deadlock in Xen
>>>> Is this separate from the general problem with EPT A/D and
>>>> write-protecting pagetables?
>>>>
>>> It sounds like it is, it happens without write-protecting in-guest
>>> pagetables. I didn't have time to investigate where the deadlock
>>> happens and since the workaround is fine for the usecase it wasn't a
>>> priority to figure out.
>>
>> Thinking about it, PML will do the wrong thing (deadlocks aside) as soon
>> as any altp2m gfn translations are used.
>>
>> I'd be tempted to work around the deadlock by disabling pml the moment
>> altp2m is touched.  That would give a slightly less bad user experience,
>> and should be easy to sort for 4.13.
>>
>> Thoughts, (inc. Juergen as RM) ?
> 
> That sounds like a good idea to me, that way you can keep pml for
> guests where it doesn't cause an issue instead of disabling it system
> wide.

Sounds like a decent way to handle it.


Juergen



* Re: [Xen-devel] [TESTDAY] Test report
  2019-11-18  6:15 ` Jürgen Groß
@ 2019-11-19  7:22   ` Roman Shaposhnik
  0 siblings, 0 replies; 11+ messages in thread
From: Roman Shaposhnik @ 2019-11-19  7:22 UTC (permalink / raw)
  To: Jürgen Groß; +Cc: xen-devel

On Sun, Nov 17, 2019 at 10:15 PM Jürgen Groß <jgross@suse.com> wrote:
>
> On 16.11.19 02:12, Roman Shaposhnik wrote:
> > NOTE: this may or may not be a hair on fire problem, reporting it
> > anyway since I'd hate to pass on something that may be a serious issue.
> > I haven't had time to debug this just yet -- so just reporting it here
> > pretty raw.
> >
> > Software:
> >     Xen 4.13 RC2
> >     Linux kernel 4.19.5
> > Hardware:
> >     Supermicro E300
> >         https://www.supermicro.com/en/products/system/Mini-ITX/SYS-E300-8D.cfm
> >     Supermicro E100
> >         https://www.supermicro.com/en/products/system/Box_PC/SYS-E100-9S.cfm
> >     Supermicro E50
> >         https://www.supermicro.com/en/products/system/Box_PC/SYS-E50-9AP.cfm
> >
> > Functionality tested: trying to boot Dom0
> > Comments: Xen boots completely and then seems like it either dies
> > right after saying
> >      Xen relinquishing a console
> > or Dom0 dies (without printing a single line of output)
> >
> > FWIW, this started happening after upgrade to RC2. IOW, if I take my
> > previous RC1 binary and stick it into the very same setup --
> > everything boots fine.
> >
> > The issue doesn't seem to be reproducible on Dell boxes (and in my
> > virtual QEmu setup) that I've got.
>
> Can you please add the following to dom0's boot parameters:
>
> console=hvc0 earlyprintk=xen
>
> and send the Xen boot log (obtained via serial line)?

Will do once I get to the lab (traveling for KubeCON for the next
couple of days).

That said, if you see the other thread -- we've figured out that the
culprit was efi=no-rs, which regressed in functionality between RC1 and
RC2. Marek has suggested a patch that I need to test.

Now, if I drop efi=no-rs -- I can boot all the hardware mentioned in
*this* report just fine.

A much bigger problem is that the following entire product line is now
busted with Xen 4.13 RC2:
     https://www.dell.com/en-us/work/shop/gateways-embedded-computing/sc/gateways-embedded-pcs/edge-gateway?~ck=bt

On all these boxes:
   - Without efi=no-rs option Xen panics on boot
   - With efi=no-rs Xen boots fine, but Dom0 can't come up

Thanks,
Roman.

P.S. An additional complication with these Dell boxes is that it
required reasonably major brain surgery with a soldering iron to rig
console output on them. I did it for one box in my lab, but I need
physical access to it and I'm currently traveling.


* Re: [Xen-devel] [TESTDAY] Test report
  2019-11-16  1:12 Roman Shaposhnik
@ 2019-11-18  6:15 ` Jürgen Groß
  2019-11-19  7:22   ` Roman Shaposhnik
  0 siblings, 1 reply; 11+ messages in thread
From: Jürgen Groß @ 2019-11-18  6:15 UTC (permalink / raw)
  To: Roman Shaposhnik, xen-devel

On 16.11.19 02:12, Roman Shaposhnik wrote:
> NOTE: this may or may not be a hair on fire problem, reporting it
> anyway since I'd hate to pass on something that may be a serious issue.
> I haven't had time to debug this just yet -- so just reporting it here
> pretty raw.
> 
> Software:
>     Xen 4.13 RC2
>     Linux kernel 4.19.5
> Hardware:
>     Supermicro E300
>         https://www.supermicro.com/en/products/system/Mini-ITX/SYS-E300-8D.cfm
>     Supermicro E100
>         https://www.supermicro.com/en/products/system/Box_PC/SYS-E100-9S.cfm
>     Supermicro E50
>         https://www.supermicro.com/en/products/system/Box_PC/SYS-E50-9AP.cfm
> 
> Functionality tested: trying to boot Dom0
> Comments: Xen boots completely and then seems like it either dies
> right after saying
>      Xen relinquishing a console
> or Dom0 dies (without printing a single line of output)
> 
> FWIW, this started happening after upgrade to RC2. IOW, if I take my
> previous RC1 binary and stick it into the very same setup --
> everything boots fine.
> 
> The issue doesn't seem to be reproducible on Dell boxes (and in my
> virtual QEmu setup) that I've got.

Can you please add the following to dom0's boot parameters:

console=hvc0 earlyprintk=xen

and send the Xen boot log (obtained via serial line)?


Juergen


* [Xen-devel] [TESTDAY] Test report
@ 2019-11-16  1:12 Roman Shaposhnik
  2019-11-18  6:15 ` Jürgen Groß
  0 siblings, 1 reply; 11+ messages in thread
From: Roman Shaposhnik @ 2019-11-16  1:12 UTC (permalink / raw)
  To: xen-devel

NOTE: this may or may not be a hair on fire problem, reporting it
anyway since I'd hate to pass on something that may be a serious issue.
I haven't had time to debug this just yet -- so just reporting it here
pretty raw.

Software:
   Xen 4.13 RC2
   Linux kernel 4.19.5
Hardware:
   Supermicro E300
       https://www.supermicro.com/en/products/system/Mini-ITX/SYS-E300-8D.cfm
   Supermicro E100
       https://www.supermicro.com/en/products/system/Box_PC/SYS-E100-9S.cfm
   Supermicro E50
       https://www.supermicro.com/en/products/system/Box_PC/SYS-E50-9AP.cfm

Functionality tested: trying to boot Dom0
Comments: Xen boots completely and then seems like it either dies
right after saying
    Xen relinquishing a console
or Dom0 dies (without printing a single line of output)

FWIW, this started happening after upgrade to RC2. IOW, if I take my
previous RC1 binary and stick it into the very same setup --
everything boots fine.

The issue doesn't seem to be reproducible on Dell boxes (and in my
virtual QEmu setup) that I've got.

Thanks,
Roman.


* Re: [Xen-devel] [TESTDAY] Test report
  2019-11-15  2:39 Roman Shaposhnik
@ 2019-11-15  7:04 ` Jürgen Groß
  0 siblings, 0 replies; 11+ messages in thread
From: Jürgen Groß @ 2019-11-15  7:04 UTC (permalink / raw)
  To: Roman Shaposhnik, xen-devel

On 15.11.19 03:39, Roman Shaposhnik wrote:
> * Software: Xen 4.13 RC2
> * Hardware: Dell IoT Gateway 3000 series
> * Software: Project EVE
> * Guest operating systems: Alpine Linux
> * Functionality tested: compiling, installing, Booting with dom0=pv
> * Comments: All works, aside from xl create often timing out
> 
> The timeout happens when either doing xl create or
> xl creating in a paused state (with -p) and later resuming.
> The error message is below:
>     libxl: error: libxl_dom_suspend.c:609:dm_resume_done: Domain 3:Failed to resume device model: rc=-9
> 
> We've actually tracked this issue down to this piece of code:
>      http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxl/libxl_dom_suspend.c;h=248dbc33e384ae008e4ab9ce8fb573be06dddd72;hb=HEAD#l515
> 
> Curiously enough it seems to be the only place (aside from
> libxl__wait_for_device_model_deprecated) that uses the
> timeout value that low. Everywhere else it seems to be
>      LIBXL_DEVICE_MODEL_START_TIMEOUT * 1000

Thanks for the thorough analysis.

It's clearly a regression. Patch sent.


Juergen


* [Xen-devel] [TESTDAY] Test report
@ 2019-11-15  2:39 Roman Shaposhnik
  2019-11-15  7:04 ` Jürgen Groß
  0 siblings, 1 reply; 11+ messages in thread
From: Roman Shaposhnik @ 2019-11-15  2:39 UTC (permalink / raw)
  To: xen-devel

* Software: Xen 4.13 RC2
* Hardware: Dell IoT Gateway 3000 series
* Software: Project EVE
* Guest operating systems: Alpine Linux
* Functionality tested: compiling, installing, Booting with dom0=pv
* Comments: All works, aside from xl create often timing out

The timeout happens when either doing xl create or
xl creating in a paused state (with -p) and later resuming.
The error message is below:
   libxl: error: libxl_dom_suspend.c:609:dm_resume_done: Domain 3:Failed to resume device model: rc=-9

We've actually tracked this issue down to this piece of code:
    http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxl/libxl_dom_suspend.c;h=248dbc33e384ae008e4ab9ce8fb573be06dddd72;hb=HEAD#l515

Curiously enough, it seems to be the only place (aside from
libxl__wait_for_device_model_deprecated) that uses a
timeout value that low. Everywhere else it seems to be
    LIBXL_DEVICE_MODEL_START_TIMEOUT * 1000


./libxl/libxl_dom_suspend.c: LIBXL_DEVICE_MODEL_START_TIMEOUT);
./libxl/libxl_dm.c:    spawn->timeout_ms = LIBXL_DEVICE_MODEL_START_TIMEOUT * 1000;
./libxl/libxl_dm.c:    dmss->spawn.timeout_ms = LIBXL_DEVICE_MODEL_START_TIMEOUT * 1000;
./libxl/libxl_pci.c:                pas->xswait.timeout_ms = LIBXL_DEVICE_MODEL_START_TIMEOUT * 1000;
./libxl/libxl_pci.c:            LIBXL_DEVICE_MODEL_START_TIMEOUT * 1000);
./libxl/libxl_pci.c:            prs->xswait.timeout_ms = LIBXL_DEVICE_MODEL_START_TIMEOUT * 1000;
./libxl/libxl_device.c: LIBXL_DEVICE_MODEL_START_TIMEOUT,
./libxl/libxl_internal.h:#define LIBXL_DEVICE_MODEL_START_TIMEOUT 60
./libxl/libxenlight_test.so:LIBXL_DEVICE_MODEL_START_TIMEOUT 60
./libxl/libxenlight.so.4.13:LIBXL_DEVICE_MODEL_START_TIMEOUT 60
./libxl/libxenlight.so:LIBXL_DEVICE_MODEL_START_TIMEOUT 60
./libxl/libxenlight.so.4.13.0:LIBXL_DEVICE_MODEL_START_TIMEOUT 60
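
The grep output above hints at a unit mismatch. A minimal C sketch of
why the bare constant would be "that low", assuming the constant is in
seconds and the receiving interface expects milliseconds (an
assumption; only the `timeout_ms` field names and the `* 1000` in the
other call sites point that way). The two helper functions are
hypothetical and are not part of libxl:

```c
/* Value from the grep output (libxl_internal.h); presumably seconds. */
#define LIBXL_DEVICE_MODEL_START_TIMEOUT 60

/* Hypothetical helper: convert a timeout in seconds to milliseconds. */
static int timeout_sec_to_ms(int seconds)
{
    return seconds * 1000;
}

/*
 * Hypothetical stand-in for an interface that takes milliseconds:
 * returns how long it would actually wait, expressed in seconds.
 */
static double wait_budget_in_seconds(int timeout_ms)
{
    return timeout_ms / 1000.0;
}
```

Under that assumption, passing the bare constant gives a budget of
0.06 s, while passing `timeout_sec_to_ms(LIBXL_DEVICE_MODEL_START_TIMEOUT)`
gives the full 60 s, which would be consistent with xl create often
timing out.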
