* Test for osstest, features used in Qubes OS
@ 2018-05-16 21:54 Marek Marczykowski-Górecki
  2018-05-17 12:26 ` Ian Jackson
  0 siblings, 1 reply; 22+ messages in thread
From: Marek Marczykowski-Górecki @ 2018-05-16 21:54 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel


Hi,

As discussed some time ago, I'd like to help with adding tests for some
features we use in Qubes OS.

IMO the easiest thing to test is host suspend. You just need to execute
"rtcwake -s 30 -m mem" and see whether the host comes back to life after
~30s. Right now I know it works on Xen 4.8, but it is supposedly broken on
staging (I haven't tested the most recent version).
The next step would be the same test with some domains running.
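
A minimal sketch of such a check, assuming it runs on the host under test and that rtcwake is installed there. The helper names, the 2-second lower tolerance and the 60-second slack are illustrative assumptions, not anything osstest provides:

```python
import subprocess
import time


def rtcwake_command(seconds=30, mode="mem"):
    """Build the rtcwake invocation: suspend in `mode`, wake after `seconds`."""
    return ["rtcwake", "-s", str(seconds), "-m", mode]


def resumed_in_time(elapsed, expected=30, slack=60):
    """True if the command took roughly `expected` seconds.

    A much shorter time suggests the host never suspended; a much longer
    one suggests the resume hung or needed outside help. The 2-second
    lower tolerance allows for RTC granularity (assumed value).
    """
    return expected - 2 <= elapsed <= expected + slack


def host_suspend_test(seconds=30, slack=60, run=subprocess.run, clock=time.time):
    """Run rtcwake and report whether the host came back on schedule."""
    # Wall-clock time is used on purpose: on Linux the monotonic clock
    # does not advance while the machine is suspended.
    start = clock()
    result = run(rtcwake_command(seconds), check=False)
    elapsed = clock() - start
    return result.returncode == 0 and resumed_in_time(elapsed, seconds, slack)
```

The `run` and `clock` parameters exist only so the logic can be exercised without suspending a real machine.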

What should the test look like (where should it be added, etc.)?

The next things would be mostly related to PCI passthrough:
 - PCI passthrough with qemu in a stubdomain
 - the same as above, but with a Linux-based stubdomain (we need to clean
   up and send patches for that first, probably 4.12 material)
 - guest suspend (the recently added libxl_domain_suspend_only), for
   different guest types (PV, PVH, HVM), with and without a PCI device

For this, the machine obviously needs to have an IOMMU (I assume at least
some of the hardware used in the test lab has one), and some spare PCI
device. I use a sound card for some of these tests. But testing on USB
controllers would be more useful (in our experience, they are among the
most problematic devices for suspend, sadly also lacking FLR or such...).

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

* Re: Test for osstest, features used in Qubes OS
  2018-05-16 21:54 Test for osstest, features used in Qubes OS Marek Marczykowski-Górecki
@ 2018-05-17 12:26 ` Ian Jackson
  2018-05-17 14:59   ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 22+ messages in thread
From: Ian Jackson @ 2018-05-17 12:26 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki; +Cc: xen-devel

Marek Marczykowski-Górecki writes ("Test for osstest, features used in Qubes OS"):
> As discussed some time ago, I'd like to help with adding tests for some
> features we use in Qubes OS.
> 
> IMO the easiest thing to test is host suspend. You just need to execute
> "rtcwake -s 30 -m mem", and see if the host is back to live after ~30s.
> Right now I know it works on Xen 4.8, but supposedly is broken on
> staging (haven't tested the most recent version).
> Next step would be the same while having some domains running.
> 
> How the test should look like (where to add this? etc)?

I guess this should be a new
  ts-host-suspend-test
script.

Is it likely that this will depend on non-buggy host firmware ?  If so
then we need to make arrangements to test it and only do it on hosts
which are not buggy.  In practice this probably means wiring it up to
the automatic host examiner.

> Next things would be mostly related to PCI passthrough:
>  - PCI passthrough with qemu in stubdomain
>  - the same as above, but with Linux-based stubdomain (we need cleanup
>    and send patches for that first, probably 4.12 material)
>  - guest suspend (recently added libxl_domain_suspend_only), for
>    different guest types (PV, PVH, HVM), also with/without PCI device
> 
> For this, the machine obviously need to have IOMMU (I assume at least
> some of the hardware used in test lab have it), and some spare PCI
> device. I use sound card for some of such tests. But testing on USB
> controllers would be more useful (from out experience, one of the most
> problematic devices for suspend, sadly also lacking FLR or such...).

I doubt any of our x86 machines have sound cards. ...  Just looked at
one and it says
  00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core
  Processor HD Audio Controller (rev 06)
which is obviously mad.

I'm pretty sure they all have USB controllers.  Almost all of them
have multiple NICs, often on different PCI devices, although it is
difficult to tell if a NIC not connected to anything is working.

Eg,

  02:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network
  Connection (rev 03)

  03:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network
  Connection (rev 03)

Is there some kind of cheap USB HID, that is interactable-with, which
we could plug into each machine's USB port ?  I'm slightly concerned
that plugging in a storage device, or connecting the other NIC, might
interfere with booting.

If you want to get pci passthrough tests working I would suggest
testing it with non-stubdom first.  I assume the config etc. is the
same, so having got that working, osstest would be able to test it for
the stubdom tests too.
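
As a rough illustration of "the config is the same", the sketch below generates an xl guest config fragment with and without a stubdomain device model. It assumes the xl.cfg options `type`, `pci` and `device_model_stubdomain_override`; the function name and the guest name are hypothetical, and a real config would also need memory, disk and so on:

```python
def passthrough_cfg(name, bdf, stubdom=False):
    """Return xl config lines for an HVM guest with one passed-through PCI device."""
    lines = [
        f'name = "{name}"',
        'type = "hvm"',
        f'pci = [ "{bdf}" ]',  # device given as bus:device.function
    ]
    if stubdom:
        # Run the device model in a stub domain instead of dom0.
        lines.append("device_model_stubdomain_override = 1")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 02:00.0 is one of the NICs from the listing above.
    print(passthrough_cfg("pt-test", "02:00.0"))
    print(passthrough_cfg("pt-test", "02:00.0", stubdom=True))
```

The two variants differ by a single line, so a test written for the non-stubdom case could be reused for the stubdom case.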

Ian.

* Re: Test for osstest, features used in Qubes OS
  2018-05-17 12:26 ` Ian Jackson
@ 2018-05-17 14:59   ` Marek Marczykowski-Górecki
  2018-05-17 15:12     ` Ian Jackson
  0 siblings, 1 reply; 22+ messages in thread
From: Marek Marczykowski-Górecki @ 2018-05-17 14:59 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel


On Thu, May 17, 2018 at 01:26:30PM +0100, Ian Jackson wrote:
> Marek Marczykowski-Górecki writes ("Test for osstest, features used in Qubes OS"):
> > As discussed some time ago, I'd like to help with adding tests for some
> > features we use in Qubes OS.
> > 
> > IMO the easiest thing to test is host suspend. You just need to execute
> > "rtcwake -s 30 -m mem", and see if the host is back to live after ~30s.
> > Right now I know it works on Xen 4.8, but supposedly is broken on
> > staging (haven't tested the most recent version).
> > Next step would be the same while having some domains running.
> > 
> > How the test should look like (where to add this? etc)?
> 
> I guess this should be a new
>   ts-host-suspend-test
> script.
> 
> Is it likely that this will depend on non-buggy host firmware ?  If so
> then we need to make arrangements to test it and only do it on hosts
> which are not buggy.  In practice this probably means wiring it up to
> the automatic host examiner.

Yes, probably.

> > Next things would be mostly related to PCI passthrough:
> >  - PCI passthrough with qemu in stubdomain
> >  - the same as above, but with Linux-based stubdomain (we need cleanup
> >    and send patches for that first, probably 4.12 material)
> >  - guest suspend (recently added libxl_domain_suspend_only), for
> >    different guest types (PV, PVH, HVM), also with/without PCI device
> > 
> > For this, the machine obviously need to have IOMMU (I assume at least
> > some of the hardware used in test lab have it), and some spare PCI
> > device. I use sound card for some of such tests. But testing on USB
> > controllers would be more useful (from out experience, one of the most
> > problematic devices for suspend, sadly also lacking FLR or such...).
> 
> I doubt any of our x86 machines have sound cards. ...  Just looked at
> one and it says
>   00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core
>   Processor HD Audio Controller (rev 06)
> which is obviously mad.
> 
> I'm pretty sure they all have usb controllers.  Almost all of them
> have multiple NICs, often on different pci devices, although it is
> difficult to tell if a NIC not connected to anything is working.
> 
> Eg,
> 
>   02:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network
>   Connection (rev 03)
> 
>   03:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network
>   Connection (rev 03)
> 
> Is there some kind of cheap USB HID, that is interactable-with, which
> we could plug into each machine's USB port ?  I'm slightly concerned
> that plugging in a storage device, or connecting the other NIC, might
> interfere with booting.

I use mass storage for tests... But if you use network boot, it
shouldn't really interfere, no?

> If you want to get pci passthrough tests working I would suggest
> testing it with non-stubdom first.  I assume the config etc. is the
> same, so having got that working, osstest would be able to test it for
> the stubdom tests too.

Oh, I thought there were already tests for that...
Yes, good idea.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

* Re: Test for osstest, features used in Qubes OS
  2018-05-17 14:59   ` Marek Marczykowski-Górecki
@ 2018-05-17 15:12     ` Ian Jackson
  2018-05-17 18:00       ` Sander Eikelenboom
                         ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Ian Jackson @ 2018-05-17 15:12 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki; +Cc: xen-devel

Marek Marczykowski-Górecki writes ("Re: Test for osstest, features used in Qubes OS"):
> On Thu, May 17, 2018 at 01:26:30PM +0100, Ian Jackson wrote:
> > Is it likely that this will depend on non-buggy host firmware ?  If so
> > then we need to make arrangements to test it and only do it on hosts
> > which are not buggy.  In practice this probably means wiring it up to
> > the automatic host examiner.
> 
> Yes, probably.

That's not entirely trivial then, especially for you, unless you want
to set up your own osstest production instance.  However, I can
probably do the osstest-machinery work if you will help debug it,
review logs, tell me what to do next, etc. :-).

> > Is there some kind of cheap USB HID, that is interactable-with, which
> > we could plug into each machine's USB port ?  I'm slightly concerned
> > that plugging in a storage device, or connecting the other NIC, might
> > interfere with booting.
> 
> I use mass storage for tests... But if you use network boot, it
> shouldn't really interfere, no?

We do both network boot and disk boot.  I think the BIOS disk boot has
to continue to work and boot the HDD.

> > If you want to get pci passthrough tests working I would suggest
> > testing it with non-stubdom first.  I assume the config etc. is the
> > same, so having got that working, osstest would be able to test it for
> > the stubdom tests too.
> 
> Oh, I though there are already tests for that...

There are no PCI passthrough tests at all.  For a while we had some
SRIOV NIC tests which were requested by Intel.  But they always failed
giving kernel stack dumps.  We kept poking Intel to get them to fix
them, or tell us how the tests were wrong, but to no avail.  So we
dropped them.

So any work in this area would be greatly appreciated!

Ian.

* Re: Test for osstest, features used in Qubes OS
  2018-05-17 15:12     ` Ian Jackson
@ 2018-05-17 18:00       ` Sander Eikelenboom
  2018-05-18 15:44         ` Marek Marczykowski-Górecki
  2018-05-18 15:33       ` Marek Marczykowski-Górecki
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 22+ messages in thread
From: Sander Eikelenboom @ 2018-05-17 18:00 UTC (permalink / raw)
  To: Ian Jackson, Marek Marczykowski-Górecki; +Cc: xen-devel

Marek / Ian,

Nice to see PCI-passthrough getting some attention again.

On 17/05/18 17:12, Ian Jackson wrote:
> Marek Marczykowski-Górecki writes ("Re: Test for osstest, features used in Qubes OS"):
>> On Thu, May 17, 2018 at 01:26:30PM +0100, Ian Jackson wrote:
>>> Is it likely that this will depend on non-buggy host firmware ?  If so
>>> then we need to make arrangements to test it and only do it on hosts
>>> which are not buggy.  In practice this probably means wiring it up to
>>> the automatic host examiner.
>>
>> Yes, probably.
> 
> That's not entirely trivial then, especially for you, unless you want
> to set up your own osstest production instance.  However, I can
> probably do the osstest-machinery work if you will help debug it,
> review logs, tell me what to do next, etc. :-).
> 
>>> Is there some kind of cheap USB HID, that is interactable-with, which
>>> we could plug into each machine's USB port ?  I'm slightly concerned
>>> that plugging in a storage device, or connecting the other NIC, might
>>> interfere with booting.
>>
>> I use mass storage for tests... But if you use network boot, it
>> shouldn't really interfere, no?
> 
> We do both network boot and disk boot.  I think the BIOS disk boot has
> to continue to work and boot the HDD.

As a long-time user of PCI passthrough who has reported some passthrough bugs in the past,
I do have some comments:

- First of all it would be very nice to get some autotesting :).
- But if you want to thoroughly test pci-passthrough, 
  it will be far from easy since there is quite a multi-dimensional support matrix
  (I'm not implying that everything should be done or it won't be valuable if any is missing,
   it's only meant for reference):
  1) Guest side implementation: 
     - PV guest (pcifront)
     - HVM (qemu-traditional) 
     - HVM (qemu-xen) 
     - HVM (qemu-upstream) 
     - perhaps PVH support for pci passthrough coming around the corner.

  2) (Un)Binding method to pciback:
     - binding pci devices to pciback on host boot (command line) 
     - de/re/unbinding devices from dom0 while running.
 
  3) (Un)binding to guest:
     - On guest start (guest.cfg pci=[...])
     - After the guest has been started with 'xl pci-*' commands
  3) Device interrupts: legacy versus MSI versus MSI-X
  4) Other pci device features: roms, BAR sizes, etc.
  5) AMD versus Intel IOMMU

From the past reports, I know (1) and (3) did matter (problems being isolated to one of these variants only).
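
To give a feel for the size of that matrix, here is a small sketch that enumerates the combinations. The dictionary keys and labels are illustrative only, the "other PCI device features" axis is left out because it has no fixed set of values, and PVH is included even though its passthrough support is only expected:

```python
from itertools import product

# Axes of the support matrix listed above (labels are illustrative).
MATRIX = {
    "guest": [
        "PV (pcifront)",
        "HVM (qemu-traditional)",
        "HVM (qemu-xen)",
        "HVM (qemu-upstream)",
        "PVH",
    ],
    "pciback_binding": ["boot command line", "runtime"],
    "guest_attach": ["guest.cfg pci=[...]", "xl pci-* after start"],
    "interrupts": ["legacy", "MSI", "MSI-X"],
    "iommu": ["AMD", "Intel"],
}


def configurations(matrix=MATRIX):
    """Yield one dict per combination of axis values."""
    keys = list(matrix)
    for values in product(*(matrix[k] for k in keys)):
        yield dict(zip(keys, values))


if __name__ == "__main__":
    print(sum(1 for _ in configurations()), "combinations")
```

Even with these five axes the full product is 120 configurations, so a test plan would most likely pick a subset.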


As for restarting guests and reassigning PCI devices to other guests: the pciback reset support in current upstream
Linux kernels lacks the bus-reset patches. Without them, passthrough of AMD Radeon graphics adapters works only once
(if you stop and restart a guest it doesn't work anymore and you need to reboot the host).
The bus-reset patches have been posted to the list and seem to be in both Qubes and XenServer in some form,
but not in upstream Linux. Someone from Oracle picked them up some time ago to get them upstream,
but that effort seems to have stalled.

The code in libxl seems to be quite messy for pci-passthrough especially for handling all the guest side implementations (1)
and xenstore interactions that go with it (or don't for qemu).

--
Sander

 
>>> If you want to get pci passthrough tests working I would suggest
>>> testing it with non-stubdom first.  I assume the config etc. is the
>>> same, so having got that working, osstest would be able to test it for
>>> the stubdom tests too.
>>
>> Oh, I though there are already tests for that...
> 
> There are no PCI passthrough tests at all.  For a while we had some
> SRIOV NIC tests which were requested by Intel.  But they always failed
> giving kernel stack dumps.  We kept poking Intel to get them to fix
> them, or tell us how the tests were wrong, but to no avail.  So we
> dropped them.
> 
> So any work in this area would be greatly appreciated!
> 
> Ian.
> 
> 


* Re: Test for osstest, features used in Qubes OS
  2018-05-17 15:12     ` Ian Jackson
  2018-05-17 18:00       ` Sander Eikelenboom
@ 2018-05-18 15:33       ` Marek Marczykowski-Górecki
  2018-05-18 15:54         ` Jan Beulich
  2018-05-21 11:04       ` George Dunlap
  2018-05-21 11:49       ` Dario Faggioli
  3 siblings, 1 reply; 22+ messages in thread
From: Marek Marczykowski-Górecki @ 2018-05-18 15:33 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel


On Thu, May 17, 2018 at 04:12:09PM +0100, Ian Jackson wrote:
> Marek Marczykowski-Górecki writes ("Re: Test for osstest, features used in Qubes OS"):
> > On Thu, May 17, 2018 at 01:26:30PM +0100, Ian Jackson wrote:
> > > Is it likely that this will depend on non-buggy host firmware ?  If so
> > > then we need to make arrangements to test it and only do it on hosts
> > > which are not buggy.  In practice this probably means wiring it up to
> > > the automatic host examiner.
> > 
> > Yes, probably.
> 
> That's not entirely trivial then, especially for you, unless you want
> to set up your own osstest production instance.  However, I can
> probably do the osstest-machinery work if you will help debug it,
> review logs, tell me what to do next, etc. :-).

Yes, I'm happy to help with that. As I've said, the basic test is very
simple (the rtcwake command) and already very useful. The fact that it is(?)
broken on staging doesn't make it easier, but I think setting up the
test using the 4.8 branch first should be fine.
If you want to talk on IRC about it, just ping me by email first; I
don't have my IRC client running all the time.

In the meantime, I'll try to familiarize myself with osstest...

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

* Re: Test for osstest, features used in Qubes OS
  2018-05-17 18:00       ` Sander Eikelenboom
@ 2018-05-18 15:44         ` Marek Marczykowski-Górecki
  0 siblings, 0 replies; 22+ messages in thread
From: Marek Marczykowski-Górecki @ 2018-05-18 15:44 UTC (permalink / raw)
  To: Sander Eikelenboom; +Cc: Ian Jackson, xen-devel


On Thu, May 17, 2018 at 08:00:38PM +0200, Sander Eikelenboom wrote:
> Marek / Ian,
> 
> Nice to see PCI-passthrough getting some attention again.
> 
> On 17/05/18 17:12, Ian Jackson wrote:
> > Marek Marczykowski-Górecki writes ("Re: Test for osstest, features used in Qubes OS"):
> >> On Thu, May 17, 2018 at 01:26:30PM +0100, Ian Jackson wrote:
> >>> Is there some kind of cheap USB HID, that is interactable-with, which
> >>> we could plug into each machine's USB port ?  I'm slightly concerned
> >>> that plugging in a storage device, or connecting the other NIC, might
> >>> interfere with booting.
> >>
> >> I use mass storage for tests... But if you use network boot, it
> >> shouldn't really interfere, no?
> > 
> > We do both network boot and disk boot.  I think the BIOS disk boot has
> > to continue to work and boot the HDD.

In fact, using any device should be enough for a start. A USB mouse, for
example. Just reading the USB descriptor involves some communication with
the controller, so it should give some indication of its state.

> As a user of pci-passthrough for quite some time and reporting some pci-passthrough bugs in the past,
> I do have some comments:
> 
> - First of all it would be very nice to get some autotesting :).
> - But if you want to thoroughly test pci-passthrough, 
>   it will be far from easy since there is quite a multi-dimensional support matrix
>   (I'm not implying that everything should be done or it won't be valuable if any is missing,
>    it's only meant for reference):
>   1) Guest side implementation: 
>      - PV guest (pcifront)
>      - HVM (qemu-traditional) 
>      - HVM (qemu-xen) 
>      - HVM (qemu-upstream) 
>      - perhaps PVH support for pci passthrough coming around the corner.
> 
>   2) (Un)Binding method to pciback:
>      - binding pci devices to pciback on host boot (command line) 
>      - de/re/unbinding devices from dom0 while running.
>  
>   3) (Un)binding to guest:
>      - On guest start (guest.cfg pci=[...])
>      - After the guest has been started with 'xl pci-*' commands
>   3) Device interrupts: legacy versus MSI versus MSI-X
>   4) Other pci device features: roms, BAR sizes, etc.
>   5) AMD versus Intel IOMMU
> 
> From the past reports, I know (1) and (3) did matter (problems being isolated to one of these variants only).

Yes, that's right, my experience is similar in that matter. Point 3
(interrupts) is especially tricky, as some devices (or rather: drivers)
don't correctly fall back to legacy interrupts if MSI/MSI-X isn't
available.
So the ideal test should check those things too: whether the guest driver
really uses what it's expected to use. But let's start with something
first. I don't know how osstest handles it yet, but I'd expect that adding
more guest configurations to run the same test on should be easy.
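
One way such a check could be sketched is to look at the guest's /proc/interrupts and see which kind of interrupt the driver's lines report. This is an assumption-laden illustration: the function name is hypothetical, the column layout and chip names differ between kernels and guest types, and the simple substring match cannot tell MSI from MSI-X:

```python
def interrupt_kinds(proc_interrupts, driver):
    """Classify /proc/interrupts lines mentioning `driver` as 'msi' or 'legacy'."""
    kinds = set()
    for line in proc_interrupts.splitlines():
        if driver not in line:
            continue
        # Chip names such as "PCI-MSI" or "xen-pirq-msi" contain "msi";
        # anything else (for example "IO-APIC") is treated as legacy.
        kinds.add("msi" if "msi" in line.lower() else "legacy")
    return kinds


if __name__ == "__main__":
    with open("/proc/interrupts") as f:
        print(interrupt_kinds(f.read(), "ehci_hcd"))
```

A test could then compare the result with what the guest configuration is expected to produce and fail on a mismatch.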

> As for restarting guests and reassigning pci-devices again to other guests the current pciback reset support lacks
> the bus-reset patches at present in upstream linux kernels. Passthrough of AMD Radeon graphics adapters works only one
> time without it (if you stop and restart a guest it doesn't work anymore and you need to reboot the host). 
> With the bus-reset patches (which have been posted to the list and seem to be in both Qubes and Xenserver 
> in some form but not in upstream linux). Someone from Oracle had picked them up to get them upstream some time ago,
> but that effort seems to have stalled.

Can you point out specifically which patches you are talking about? In
Qubes, in most cases device reset is handled by libvirt...

> The code in libxl seems to be quite messy for pci-passthrough especially for handling all the guest side implementations (1)
> and xenstore interactions that go with it (or don't for qemu).
> 

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

* Re: Test for osstest, features used in Qubes OS
  2018-05-18 15:33       ` Marek Marczykowski-Górecki
@ 2018-05-18 15:54         ` Jan Beulich
  2018-05-18 16:19           ` Marek Marczykowski
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2018-05-18 15:54 UTC (permalink / raw)
  To: Marek Marczykowski; +Cc: Ian Jackson, xen-devel

>>> On 18.05.18 at 17:33, <marmarek@invisiblethingslab.com> wrote:
> Yes, I'm happy to help with that. As I've said, the basic test is very
> simple (rtcwake command) and already very useful. The fact that it is(?)
> broken on staging doesn't make it easier,

Details on the breakage would be appreciated (on a separate thread),
unless you plan to address it yourself. I recall Simon(?) mentioning this as
well, but also not providing sufficient data to consider looking into it
(perhaps simply because it wasn't easy to obtain useful data, as
frequently is the case with S3 resume). I think it would be nice if we could
release 4.11 without a regression here.

Jan



* Re: Test for osstest, features used in Qubes OS
  2018-05-18 15:54         ` Jan Beulich
@ 2018-05-18 16:19           ` Marek Marczykowski
  2018-05-21 15:48             ` George Dunlap
  0 siblings, 1 reply; 22+ messages in thread
From: Marek Marczykowski @ 2018-05-18 16:19 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Simon Gaiser, Ian Jackson, xen-devel


On Fri, May 18, 2018 at 09:54:37AM -0600, Jan Beulich wrote:
> >>> On 18.05.18 at 17:33, <marmarek@invisiblethingslab.com> wrote:
> > Yes, I'm happy to help with that. As I've said, the basic test is very
> > simple (rtcwake command) and already very useful. The fact that it is(?)
> > broken on staging doesn't make it easier,
> 
> Details on the breakage would be appreciated (on a separate thread),
> unless you plan to address it yourself. I recall Simon(?) mentioning this as
> well, but also not providing sufficient data to consider looking into it
> (perhaps simply because it wasn't easy to obtain useful data, as
> frequently is the case with S3 resume). I think it would be nice if we could
> release 4.11 without a regression here.

I only know that Simon has tested it and it fails. Cc'ing him.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

* Re: Test for osstest, features used in Qubes OS
  2018-05-17 15:12     ` Ian Jackson
  2018-05-17 18:00       ` Sander Eikelenboom
  2018-05-18 15:33       ` Marek Marczykowski-Górecki
@ 2018-05-21 11:04       ` George Dunlap
  2018-05-21 11:52         ` Dario Faggioli
  2018-05-21 11:49       ` Dario Faggioli
  3 siblings, 1 reply; 22+ messages in thread
From: George Dunlap @ 2018-05-21 11:04 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Dario Faggioli, Marek Marczykowski-Górecki, xen-devel

On Thu, May 17, 2018 at 4:12 PM, Ian Jackson <ian.jackson@citrix.com> wrote:
> Marek Marczykowski-Górecki writes ("Re: Test for osstest, features used in Qubes OS"):
>> On Thu, May 17, 2018 at 01:26:30PM +0100, Ian Jackson wrote:
>> > Is it likely that this will depend on non-buggy host firmware ?  If so
>> > then we need to make arrangements to test it and only do it on hosts
>> > which are not buggy.  In practice this probably means wiring it up to
>> > the automatic host examiner.
>>
>> Yes, probably.
>
> That's not entirely trivial then, especially for you, unless you want
> to set up your own osstest production instance.  However, I can
> probably do the osstest-machinery work if you will help debug it,
> review logs, tell me what to do next, etc. :-).

I'm pretty sure it would be possible to test the Xen "get ready for
suspend" and "resume from suspend" functionality without actually
needing to interact with ACPI -- we just get it to the point where it
would start interacting with ACPI, and then have it return instead.
From a "I'm positive this will continue to work" point of view it's
not as satisfying as actually doing the suspend; but from a practical
point of view, it will catch the vast majority of bugs in Xen (as
opposed to hardware-specific quirks); and it will run on any hardware
(which means not having to do reliability testing).

IIRC Dario actually had a patch for something like this for his own
testing at some point -- Dario, anything to add?

What if we 1) have two versions of the test -- "Fake suspend" and
"Real Suspend"; 2) only run "Real suspend" on hardware specifically
marked as having a suspend that works reliably; 3) default all
hardware to 'false' until we do some testing to find out how reliable
it is?

That way we get suspend testing 95% effective as quickly as possible,
and we can complete it as we have time.
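
The selection logic in that proposal could be sketched roughly as follows. The flag name "suspend-reliable" and the test names are hypothetical placeholders, not existing osstest host flags or job names:

```python
def suspend_tests_for(host_flags):
    """Pick which suspend tests to run on a host.

    The fake suspend runs everywhere; the real suspend runs only on
    hosts explicitly marked as reliable, and hosts default to unmarked.
    """
    tests = ["fake-suspend"]
    if host_flags.get("suspend-reliable", False):
        tests.append("real-suspend")
    return tests


if __name__ == "__main__":
    print(suspend_tests_for({}))
    print(suspend_tests_for({"suspend-reliable": True}))
```

Defaulting the flag to false means a newly added machine only ever gets the fake test until someone has checked its suspend behaviour.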

 -George

* Re: Test for osstest, features used in Qubes OS
  2018-05-17 15:12     ` Ian Jackson
                         ` (2 preceding siblings ...)
  2018-05-21 11:04       ` George Dunlap
@ 2018-05-21 11:49       ` Dario Faggioli
  3 siblings, 0 replies; 22+ messages in thread
From: Dario Faggioli @ 2018-05-21 11:49 UTC (permalink / raw)
  To: Ian Jackson, Marek Marczykowski-Górecki; +Cc: xen-devel


On Thu, 2018-05-17 at 16:12 +0100, Ian Jackson wrote:
> Marek Marczykowski-Górecki writes ("Re: Test for osstest, features
> used in Qubes OS"):
> > On Thu, May 17, 2018 at 01:26:30PM +0100, Ian Jackson wrote:
> > > Is it likely that this will depend on non-buggy host firmware
> > > ?  If so
> > > then we need to make arrangements to test it and only do it on
> > > hosts
> > > which are not buggy.  In practice this probably means wiring it
> > > up to
> > > the automatic host examiner.
> > 
> > Yes, probably.
> 
> That's not entirely trivial then, especially for you, unless you want
> to set up your own osstest production instance.  However, I can
> probably do the osstest-machinery work if you will help debug it,
> review logs, tell me what to do next, etc. :-).
> 
I'm not sure which firmware bugs we're talking about, but a problem I
had when trying to do something like testing S3 suspend/resume in
osstest was that most server-class hardware I could find did not
support it.

If that's the bug you're talking about, yes, I agree it's not trivial.
:-) (although, I did not actually check the boxes in the MA colo, they
were just servers from Citrix's lab).

There's a (non-perfect) workaround, though, as George suggests, which
would allow us to run a "quasi-suspend" test at every flight on every
hardware.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

* Re: Test for osstest, features used in Qubes OS
  2018-05-21 11:04       ` George Dunlap
@ 2018-05-21 11:52         ` Dario Faggioli
  2018-05-21 13:57           ` Ian Jackson
  0 siblings, 1 reply; 22+ messages in thread
From: Dario Faggioli @ 2018-05-21 11:52 UTC (permalink / raw)
  To: George Dunlap, Ian Jackson; +Cc: Marek Marczykowski-Górecki, xen-devel


On Mon, 2018-05-21 at 12:04 +0100, George Dunlap wrote:
> On Thu, May 17, 2018 at 4:12 PM, Ian Jackson <ian.jackson@citrix.com>
> wrote:
> > That's not entirely trivial then, especially for you, unless you
> > want
> > to set up your own osstest production instance.  However, I can
> > probably do the osstest-machinery work if you will help debug it,
> > review logs, tell me what to do next, etc. :-).
> 
> I'm pretty sure it would be possible to test the Xen "get ready for
> suspend" and "resume from suspend" functionality without actually
> needing to interact with ACPI -- we just get it to the point where it
> would start interacting with ACPI, and then have it return instead.
> From a "I'm positive this will continue to work" point of view it's
> not as satisfying as actually doing the suspend; but from a practical
> point of view, it will catch the vast majority of bugs in Xen (as
> opposed to hardware-specific quirks); and it will run on any hardware
> (which means not having to do reliability testing).
> 
> IIRC Dario actually had a patch for something like this for his own
> testing at some point -- Dario, anything to add?
> 
Indeed I had a patch (it's originally from Ben, actually). I sent it,
so it can be found in list archives. And, in any case, I still have it
around and can resend it.

I did catch quite a few bugs with it back then.

> What if we 1) have two versions of the test -- "Fake suspend" and
> "Real Suspend"; 2) only run "Real suspend" on hardware specifically
> marked as having a suspend that works reliably; 3) default all
> hardware to 'false' until we do some testing to find out how reliable
> it is?
> 
> That way we get suspend testing 95% effective as quickly as possible,
> and we can complete it as we have time.
> 
That sounds like a very good plan to me, FWIW.
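
To make the selection rule concrete, here is a minimal sketch. It assumes
a hypothetical per-host flag named suspend_reliable; osstest's actual
host-property mechanism is not shown in this thread, and the test names
are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Host:
    name: str
    # Hypothetical flag: set to True only after real suspend has been
    # shown to work reliably on this machine (point 3: default False).
    suspend_reliable: bool = False


def suspend_tests_for(host: Host) -> list[str]:
    """Return the suspend test variants to schedule on one host."""
    tests = ["fake-suspend"]        # point 1: runs on any hardware
    if host.suspend_reliable:       # point 2: real suspend is opt-in
        tests.append("real-suspend")
    return tests
```

With this rule a newly added machine only gets the fake test until
someone flips its flag after reliability testing.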

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/


* Re: Test for osstest, features used in Qubes OS
  2018-05-21 11:52         ` Dario Faggioli
@ 2018-05-21 13:57           ` Ian Jackson
  2018-05-21 14:11             ` George Dunlap
  2018-05-22  7:51             ` Dario Faggioli
  0 siblings, 2 replies; 22+ messages in thread
From: Ian Jackson @ 2018-05-21 13:57 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, Marek Marczykowski-Górecki, xen-devel

Dario Faggioli writes ("Re: [Xen-devel] Test for osstest, features used in Qubes OS"):
> On Mon, 2018-05-21 at 12:04 +0100, George Dunlap wrote:
> > What if we 1) have two versions of the test -- "Fake suspend" and
> > "Real Suspend"; 2) only run "Real suspend" on hardware specifically
> > marked as having a suspend that works reliably; 3) default all
> > hardware to 'false' until we do some testing to find out how reliable
> > it is?
> > 
> > That way we get suspend testing 95% effective as quickly as possible,
> > and we can complete it as we have time.
> 
> That sounds like a very good plan to me, FWIW.

OK, for starters, how about we add the fake suspend test to every
flight.

What is the rune for that?

Do we want or need to do that test with a guest running ?

Ian.


* Re: Test for osstest, features used in Qubes OS
  2018-05-21 13:57           ` Ian Jackson
@ 2018-05-21 14:11             ` George Dunlap
  2018-05-22  7:51             ` Dario Faggioli
  1 sibling, 0 replies; 22+ messages in thread
From: George Dunlap @ 2018-05-21 14:11 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Marek Marczykowski-Górecki, Dario Faggioli

On Mon, May 21, 2018 at 2:57 PM, Ian Jackson <ian.jackson@citrix.com> wrote:
> Dario Faggioli writes ("Re: [Xen-devel] Test for osstest, features used in Qubes OS"):
>> On Mon, 2018-05-21 at 12:04 +0100, George Dunlap wrote:
>> > What if we 1) have two versions of the test -- "Fake suspend" and
>> > "Real Suspend"; 2) only run "Real suspend" on hardware specifically
>> > marked as having a suspend that works reliably; 3) default all
>> > hardware to 'false' until we do some testing to find out how reliable
>> > it is?
>> >
>> > That way we get suspend testing 95% effective as quickly as possible,
>> > and we can complete it as we have time.
>>
>> That sounds like a very good plan to me, FWIW.
>
> OK, for starters, how about we add the fake suspend test to every
> flight.
>
> What is the rune for that.
>
> Do we want or need to do that test with a guest running ?

Unfortunately the patch was never checked in.

I'll send an updated patch.

 -George


* Re: Test for osstest, features used in Qubes OS
  2018-05-18 16:19           ` Marek Marczykowski
@ 2018-05-21 15:48             ` George Dunlap
  2018-05-21 16:17               ` Andrew Cooper
  2018-05-22 22:21               ` Simon Gaiser
  0 siblings, 2 replies; 22+ messages in thread
From: George Dunlap @ 2018-05-21 15:48 UTC (permalink / raw)
  To: Marek Marczykowski; +Cc: Simon Gaiser, Ian Jackson, Jan Beulich, xen-devel

On Fri, May 18, 2018 at 5:19 PM, Marek Marczykowski
<marmarek@invisiblethingslab.com> wrote:
> On Fri, May 18, 2018 at 09:54:37AM -0600, Jan Beulich wrote:
>> >>> On 18.05.18 at 17:33, <marmarek@invisiblethingslab.com> wrote:
>> > Yes, I'm happy to help with that. As I've said, the basic test is very
>> > simple (rtcwake command) and already very useful. The fact that it is(?)
>> > broken on staging doesn't make it easier,
>>
>> Details on the breakage would be appreciated (on a separate thread),
>> unless you plan to address it yourself. I recall Simon(?) mentioning this as
>> well, but also not providing sufficient data to consider looking into it
>> (perhaps simply because it wasn't easy to obtain useful data, as
>> frequently is the case with S3 resume). I think it would be nice if we could
>> release 4.11 without a regression here.
>
> I only know that Simon have tested it and it fails. Cc'ing him.

Well I tried it with a post-RC 4.11 and got the below.  I haven't done
any investigation.

 -George

(XEN) CPU0 CMCI LVT vector (0xf2) already installed
(XEN) CPU0: Thermal monitoring enabled (TM1)
(XEN) Finishing wakeup from ACPI S3 state.
(XEN) Preparing system for ACPI S3 state.
(XEN) Disabling non-boot CPUs ...
(XEN) Broke affinity for irq 16
(XEN) Broke affinity for irq 49
(XEN) CPU: Physical Processor ID: 0
(XEN) CPU: Processor Core ID: 0
(XEN) CPU: L1 I cache: 32K, L1 D cache: 32K
(XEN) CPU: L2 cache: 256K
(XEN) CPU: L3 cache: 12288K
(XEN) Enabling non-boot CPUs  ...
(XEN) Booting processor 1/2 eip 8e000
(XEN) Initializing CPU#1
(XEN) CPU: Physical Processor ID: 0
(XEN) CPU: Processor Core ID: 1
(XEN) CPU: L1 I cache: 32K, L1 D cache: 32K
(XEN) CPU: L2 cache: 256K
(XEN) CPU: L3 cache: 12288K
(XEN) CPU1: Intel(R) Xeon(R) CPU           E5630  @ 2.53GHz stepping 02
(XEN) Booting processor 2/18 eip 8e000
(XEN) Initializing CPU#2
(XEN) CPU: Physical Processor ID: 0
(XEN) CPU: Processor Core ID: 9
(XEN) CPU: L1 I cache: 32K, L1 D cache: 32K
(XEN) CPU: L2 cache: 256K
(XEN) CPU: L3 cache: 12288K
(XEN) CPU2: Intel(R) Xeon(R) CPU           E5630  @ 2.53GHz stepping 02
(XEN) Booting processor 3/20 eip 8e000
(XEN) Initializing CPU#3
(XEN) CPU: Physical Processor ID: 0
(XEN) CPU: Processor Core ID: 10
(XEN) CPU: L1 I cache: 32K, L1 D cache: 32K
(XEN) CPU: L2 cache: 256K
(XEN) CPU: L3 cache: 12288K
(XEN) CPU3: Intel(R) Xeon(R) CPU           E5630  @ 2.53GHz stepping 02
(XEN) *** DOUBLE FAULT ***
(XEN) ----[ Xen-4.11-rc  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82d080377944>] handle_exception+0x9c/0xf7
(XEN) RFLAGS: 0000000000010006   CONTEXT: hypervisor
(XEN) rax: ffffc900422480b8   rbx: 0000000000000000   rcx: 0000000000000005
(XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi: 0000000000000000
(XEN) rbp: 000036ffbddb7f27   rsp: ffffc90042248000   r8:  0000000000000000
(XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
(XEN) r12: 0000000000000000   r13: 0000000000000000   r14: ffffc9004224ffff
(XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000026e0
(XEN) cr3: 000000018a100000   cr2: ffffc90042247ff8
(XEN) fsb: 00007f6242d95700   gsb: ffff88003dc00000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Current stack base ffffc90042248000 differs from expected ffff8300dfa80000
(XEN) Valid stack range: ffffc9004224e000-ffffc90042250000,
sp=ffffc90042248000, tss.rsp0=ffff8300dfa87fa0
(XEN) No stack overflow detected. Skipping stack trace.
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) DOUBLE FAULT -- system shutdown
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...


* Re: Test for osstest, features used in Qubes OS
  2018-05-21 15:48             ` George Dunlap
@ 2018-05-21 16:17               ` Andrew Cooper
  2018-05-21 16:28                 ` George Dunlap
  2018-05-22 22:21               ` Simon Gaiser
  1 sibling, 1 reply; 22+ messages in thread
From: Andrew Cooper @ 2018-05-21 16:17 UTC (permalink / raw)
  To: George Dunlap, Marek Marczykowski
  Cc: Simon Gaiser, Ian Jackson, Jan Beulich, xen-devel

On 21/05/18 16:48, George Dunlap wrote:
> On Fri, May 18, 2018 at 5:19 PM, Marek Marczykowski
> <marmarek@invisiblethingslab.com> wrote:
>> On Fri, May 18, 2018 at 09:54:37AM -0600, Jan Beulich wrote:
>>>>>> On 18.05.18 at 17:33, <marmarek@invisiblethingslab.com> wrote:
>>>> Yes, I'm happy to help with that. As I've said, the basic test is very
>>>> simple (rtcwake command) and already very useful. The fact that it is(?)
>>>> broken on staging doesn't make it easier,
>>> Details on the breakage would be appreciated (on a separate thread),
>>> unless you plan to address it yourself. I recall Simon(?) mentioning this as
>>> well, but also not providing sufficient data to consider looking into it
>>> (perhaps simply because it wasn't easy to obtain useful data, as
>>> frequently is the case with S3 resume). I think it would be nice if we could
>>> release 4.11 without a regression here.
>> I only know that Simon have tested it and it fails. Cc'ing him.
> Well I tried it with a post-RC 4.11 and got the below.  I haven't done
> any investigation.
>
>  -George
>
> <snip>
> (XEN) CPU3: Intel(R) Xeon(R) CPU           E5630  @ 2.53GHz stepping 02
> (XEN) *** DOUBLE FAULT ***
> (XEN) ----[ Xen-4.11-rc  x86_64  debug=y   Not tainted ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82d080377944>] handle_exception+0x9c/0xf7

Do you have xen-syms from this build?  That looks like it's in the middle
of the Spectre alternative, but isn't the wrmsr instruction itself.

> (XEN) RFLAGS: 0000000000010006   CONTEXT: hypervisor
> (XEN) rax: ffffc900422480b8   rbx: 0000000000000000   rcx: 0000000000000005
> (XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi: 0000000000000000
> (XEN) rbp: 000036ffbddb7f27   rsp: ffffc90042248000   r8:  0000000000000000
> (XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
> (XEN) r12: 0000000000000000   r13: 0000000000000000   r14: ffffc9004224ffff
> (XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000026e0
> (XEN) cr3: 000000018a100000   cr2: ffffc90042247ff8
> (XEN) fsb: 00007f6242d95700   gsb: ffff88003dc00000   gss: 0000000000000000
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) Current stack base ffffc90042248000 differs from expected ffff8300dfa80000
> (XEN) Valid stack range: ffffc9004224e000-ffffc90042250000,
> sp=ffffc90042248000, tss.rsp0=ffff8300dfa87fa0
> (XEN) No stack overflow detected. Skipping stack trace.

I really need to wire up the code dump, irrespective of this particular
issue.

~Andrew

> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) DOUBLE FAULT -- system shutdown
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...
>



* Re: Test for osstest, features used in Qubes OS
  2018-05-21 16:17               ` Andrew Cooper
@ 2018-05-21 16:28                 ` George Dunlap
  2018-05-21 17:18                   ` George Dunlap
  0 siblings, 1 reply; 22+ messages in thread
From: George Dunlap @ 2018-05-21 16:28 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Simon Gaiser, Ian Jackson, Marek Marczykowski, Jan Beulich, xen-devel

On Mon, May 21, 2018 at 5:17 PM, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> On 21/05/18 16:48, George Dunlap wrote:
>> On Fri, May 18, 2018 at 5:19 PM, Marek Marczykowski
>> <marmarek@invisiblethingslab.com> wrote:
>>> On Fri, May 18, 2018 at 09:54:37AM -0600, Jan Beulich wrote:
>>>>>>> On 18.05.18 at 17:33, <marmarek@invisiblethingslab.com> wrote:
>>>>> Yes, I'm happy to help with that. As I've said, the basic test is very
>>>>> simple (rtcwake command) and already very useful. The fact that it is(?)
>>>>> broken on staging doesn't make it easier,
>>>> Details on the breakage would be appreciated (on a separate thread),
>>>> unless you plan to address it yourself. I recall Simon(?) mentioning this as
>>>> well, but also not providing sufficient data to consider looking into it
>>>> (perhaps simply because it wasn't easy to obtain useful data, as
>>>> frequently is the case with S3 resume). I think it would be nice if we could
>>>> release 4.11 without a regression here.
>>> I only know that Simon have tested it and it fails. Cc'ing him.
>> Well I tried it with a post-RC 4.11 and got the below.  I haven't done
>> any investigation.
>>
>>  -George
>>
>> <snip>
>> (XEN) CPU3: Intel(R) Xeon(R) CPU           E5630  @ 2.53GHz stepping 02
>> (XEN) *** DOUBLE FAULT ***
>> (XEN) ----[ Xen-4.11-rc  x86_64  debug=y   Not tainted ]----
>> (XEN) CPU:    0
>> (XEN) RIP:    e008:[<ffff82d080377944>] handle_exception+0x9c/0xf7
>
> Do you have xen-syms from this build?  That looks like it's in the middle
> of the Spectre alternative, but isn't the wrmsr instruction itself.

Hmm, sorry, I've trashed it -- I was really trying to test my
"acpi_sleep=s3_fake" test.

I've never tried suspend on this particular box, so I'm not sure it
works generally.  Let me get a reasonable baseline first.

 -George


* Re: Test for osstest, features used in Qubes OS
  2018-05-21 16:28                 ` George Dunlap
@ 2018-05-21 17:18                   ` George Dunlap
  0 siblings, 0 replies; 22+ messages in thread
From: George Dunlap @ 2018-05-21 17:18 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Simon Gaiser, Ian Jackson, Marek Marczykowski, Jan Beulich, xen-devel

On Mon, May 21, 2018 at 5:28 PM, George Dunlap <dunlapg@umich.edu> wrote:
> On Mon, May 21, 2018 at 5:17 PM, Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
>> On 21/05/18 16:48, George Dunlap wrote:
>>> On Fri, May 18, 2018 at 5:19 PM, Marek Marczykowski
>>> <marmarek@invisiblethingslab.com> wrote:
>>>> On Fri, May 18, 2018 at 09:54:37AM -0600, Jan Beulich wrote:
>>>>>>>> On 18.05.18 at 17:33, <marmarek@invisiblethingslab.com> wrote:
>>>>>> Yes, I'm happy to help with that. As I've said, the basic test is very
>>>>>> simple (rtcwake command) and already very useful. The fact that it is(?)
>>>>>> broken on staging doesn't make it easier,
>>>>> Details on the breakage would be appreciated (on a separate thread),
>>>>> unless you plan to address it yourself. I recall Simon(?) mentioning this as
>>>>> well, but also not providing sufficient data to consider looking into it
>>>>> (perhaps simply because it wasn't easy to obtain useful data, as
>>>>> frequently is the case with S3 resume). I think it would be nice if we could
>>>>> release 4.11 without a regression here.
>>>> I only know that Simon have tested it and it fails. Cc'ing him.
>>> Well I tried it with a post-RC 4.11 and got the below.  I haven't done
>>> any investigation.
>>>
>>>  -George
>>>
>>> <snip>
>>> (XEN) CPU3: Intel(R) Xeon(R) CPU           E5630  @ 2.53GHz stepping 02
>>> (XEN) *** DOUBLE FAULT ***
>>> (XEN) ----[ Xen-4.11-rc  x86_64  debug=y   Not tainted ]----
>>> (XEN) CPU:    0
>>> (XEN) RIP:    e008:[<ffff82d080377944>] handle_exception+0x9c/0xf7
>>
>> Do you have xen-syms from this build?  That looks like it's in the middle
>> of the Spectre alternative, but isn't the wrmsr instruction itself.
>
> Hmm, sorry, I've trashed it -- I was really trying to test my
> "acpi_sleep=s3_fake" test.
>
> I've never tried suspend on this particular box, so I'm not sure it
> works generally.  Let me get a reasonable baseline first.

OK, well suspend / resume works on this box in all the following configurations:

* 4.8.0 (real)
* 4.8.0 with s3_fake backported (fake)
* 4.8.3 (real)
* staging-4.8 with bti=false and xpti=false (real)

It fails in the following configuration:
* staging-4.8 with speculation mitigations at default.  (It is an
Intel box, so BTI and XPTI will both be on.)

I didn't get a stack trace unfortunately -- the box just stopped responding.

I'll do some more playing around on staging tomorrow.

 -George


* Re: Test for osstest, features used in Qubes OS
  2018-05-21 13:57           ` Ian Jackson
  2018-05-21 14:11             ` George Dunlap
@ 2018-05-22  7:51             ` Dario Faggioli
  1 sibling, 0 replies; 22+ messages in thread
From: Dario Faggioli @ 2018-05-22  7:51 UTC (permalink / raw)
  To: Ian Jackson; +Cc: George Dunlap, Marek Marczykowski-Górecki, xen-devel



On Mon, 2018-05-21 at 14:57 +0100, Ian Jackson wrote:
> > On Mon, 2018-05-21 at 12:04 +0100, George Dunlap wrote:
> > > What if we 1) have two versions of the test -- "Fake suspend" and
> > > "Real Suspend"; 2) only run "Real suspend" on hardware
> > > specifically
> > > marked as having a suspend that works reliably; 3) default all
> > > hardware to 'false' until we do some testing to find out how
> > > reliable
> > > it is?
> > > 
>
> OK, for starters, how about we add the fake suspend test to every
> flight.
> 
> Do we want or need to do that test with a guest running ?
> 
Doing it with a guest running would be more complete, I think.

I think the best would be to do both, i.e.:
- suspend without any guest
- (when resumed) start a guest
- suspend with a guest
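
The sequence above could be driven roughly as follows. This is only a
sketch: the rtcwake invocation is the one quoted at the start of the
thread, while the guest config path and the helper names are
hypothetical, and a zero exit status from rtcwake is at best a weak
sign that the host resumed correctly.

```python
import subprocess
from typing import Callable, List, Sequence

# Timed S3 suspend, as quoted earlier in the thread.
SUSPEND_CMD = ["rtcwake", "-s", "30", "-m", "mem"]
# Hypothetical guest config path; substitute a real one.
START_GUEST_CMD = ["xl", "create", "/etc/xen/test-guest.cfg"]


def suspend_plan() -> List[List[str]]:
    """Suspend with no guest, start a guest, then suspend again."""
    return [list(SUSPEND_CMD), list(START_GUEST_CMD), list(SUSPEND_CMD)]


def run_command(cmd: Sequence[str]) -> int:
    """Run one command and return its exit status."""
    return subprocess.run(list(cmd), check=False).returncode


def run_plan(plan: Sequence[Sequence[str]],
             runner: Callable[[Sequence[str]], int] = run_command) -> bool:
    """Run each step in order; stop and report failure on the first error."""
    for cmd in plan:
        if runner(cmd) != 0:
            return False
    return True
```

Calling run_plan(suspend_plan()) on a test host would execute the three
steps; passing a different runner allows a dry run without touching the
machine.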

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/


* Re: Test for osstest, features used in Qubes OS
  2018-05-21 15:48             ` George Dunlap
  2018-05-21 16:17               ` Andrew Cooper
@ 2018-05-22 22:21               ` Simon Gaiser
  2018-05-24 13:15                 ` Jan Beulich
  1 sibling, 1 reply; 22+ messages in thread
From: Simon Gaiser @ 2018-05-22 22:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Andrew Cooper, George Dunlap, Marek Marczykowski,
	Dario Faggioli, Jan Beulich, Ian Jackson



George Dunlap:
> On Fri, May 18, 2018 at 5:19 PM, Marek Marczykowski
> <marmarek@invisiblethingslab.com> wrote:
>> On Fri, May 18, 2018 at 09:54:37AM -0600, Jan Beulich wrote:
>>>>>> On 18.05.18 at 17:33, <marmarek@invisiblethingslab.com> wrote:
>>>> Yes, I'm happy to help with that. As I've said, the basic test is very
>>>> simple (rtcwake command) and already very useful. The fact that it is(?)
>>>> broken on staging doesn't make it easier,
>>>
>>> Details on the breakage would be appreciated (on a separate thread),
>>> unless you plan to address it yourself. I recall Simon(?) mentioning this as
>>> well, but also not providing sufficient data to consider looking into it
>>> (perhaps simply because it wasn't easy to obtain useful data, as
>>> frequently is the case with S3 resume). I think it would be nice if we could
>>> release 4.11 without a regression here.
>>
>> I only know that Simon have tested it and it fails. Cc'ing him.

I ran into the same problem as George below (see [1] for the initial
report).

> Well I tried it with a post-RC 4.11 and got the below.  I haven't done
> any investigation.
> 
>  -George
> 
[...]
> (XEN) *** DOUBLE FAULT ***
> (XEN) ----[ Xen-4.11-rc  x86_64  debug=y   Not tainted ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82d080377944>] handle_exception+0x9c/0xf7
> (XEN) RFLAGS: 0000000000010006   CONTEXT: hypervisor
> (XEN) rax: ffffc900422480b8   rbx: 0000000000000000   rcx: 0000000000000005
> (XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi: 0000000000000000
> (XEN) rbp: 000036ffbddb7f27   rsp: ffffc90042248000   r8:  0000000000000000
> (XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
> (XEN) r12: 0000000000000000   r13: 0000000000000000   r14: ffffc9004224ffff
> (XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000026e0
> (XEN) cr3: 000000018a100000   cr2: ffffc90042247ff8
> (XEN) fsb: 00007f6242d95700   gsb: ffff88003dc00000   gss: 0000000000000000
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) Current stack base ffffc90042248000 differs from expected ffff8300dfa80000
> (XEN) Valid stack range: ffffc9004224e000-ffffc90042250000,
> sp=ffffc90042248000, tss.rsp0=ffff8300dfa87fa0
> (XEN) No stack overflow detected. Skipping stack trace.
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) DOUBLE FAULT -- system shutdown
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...

I have done some more testing in the meantime. The issue also affects
4.10.1, but not 4.10.0. That's useful since it makes the bisect shorter.
A bisect identifies 8462c575d9 "x86/xpti: Hide almost all of .text and
all .data/.rodata/.bss mappings" as the commit which breaks suspend.

8462c575d9 is a squashed backport of:

  422588e885 x86/xpti: Hide almost all of .text and all .data/.rodata/.bss mappings
  d1d6fc97d6 x86/xpti: really hide almost all of Xen image
  044fedfaa2 x86/traps: Put idt_table[] back into .bss

And indeed, reverting those on staging fixes suspend. (This also matches
the behavior that xpti=off fixes suspend as George already reported
earlier today).
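
For readers unfamiliar with bisecting: it is a binary search over the
commits between a known-good and a known-bad build. A minimal sketch of
the idea, assuming the commits are ordered oldest to newest and that the
breakage persists once introduced; the is_bad callback is a placeholder
for "build, boot, try to suspend", and in practice git bisect does this
bookkeeping.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumes commits[0] is good, commits[-1] is bad, and every commit
    after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1    # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid                # breakage is at mid or earlier
        else:
            lo = mid                # breakage is after mid
    return commits[hi]
```

Each test halves the remaining range, which is why narrowing the window
to 4.10.0..4.10.1 shortens the search.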

[1]: https://lists.xenproject.org/archives/html/xen-devel/2018-04/msg01137.html



* Re: Test for osstest, features used in Qubes OS
  2018-05-22 22:21               ` Simon Gaiser
@ 2018-05-24 13:15                 ` Jan Beulich
  2018-05-24 13:39                   ` Simon Gaiser
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2018-05-24 13:15 UTC (permalink / raw)
  To: Simon Gaiser
  Cc: Juergen Gross, Andrew Cooper, George Dunlap, Marek Marczykowski,
	Dario Faggioli, Ian Jackson, xen-devel

>>> On 23.05.18 at 00:21, <simon@invisiblethingslab.com> wrote:
> I have done some more testing in the meantime. The issue also affects
> 4.10.1, but not 4.10.0. That's useful since it makes the bisect shorter.
> A bisect identifies 8462c575d9 "x86/xpti: Hide almost all of .text and
> all .data/.rodata/.bss mappings" as the commit which breaks suspend.
> 
> 8462c575d9 is a squashed backport of:
> 
>   422588e885 x86/xpti: Hide almost all of .text and all .data/.rodata/.bss mappings
>   d1d6fc97d6 x86/xpti: really hide almost all of Xen image
>   044fedfaa2 x86/traps: Put idt_table[] back into .bss
> 
> And indeed, reverting those on staging fixes suspend. (This also matches
> the behavior that xpti=off fixes suspend as George already reported
> earlier today).

Okay, that was quite helpful - I think I see now where I screwed up (i.e.
the issue is in the middle of the three commits). Could you confirm that a
Xen booted with "nosmp" suspends and resumes fine?

Jan




* Re: Test for osstest, features used in Qubes OS
  2018-05-24 13:15                 ` Jan Beulich
@ 2018-05-24 13:39                   ` Simon Gaiser
  0 siblings, 0 replies; 22+ messages in thread
From: Simon Gaiser @ 2018-05-24 13:39 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Juergen Gross, Andrew Cooper, George Dunlap, Marek Marczykowski,
	Dario Faggioli, Ian Jackson, xen-devel



Jan Beulich:
>>>> On 23.05.18 at 00:21, <simon@invisiblethingslab.com> wrote:
>> I have done some more testing in the meantime. The issue also affects
>> 4.10.1, but not 4.10.0. That's useful since it makes the bisect shorter.
>> A bisect identifies 8462c575d9 "x86/xpti: Hide almost all of .text and
>> all .data/.rodata/.bss mappings" as the commit which breaks suspend.
>>
>> 8462c575d9 is a squashed backport of:
>>
>>   422588e885 x86/xpti: Hide almost all of .text and all .data/.rodata/.bss mappings
>>   d1d6fc97d6 x86/xpti: really hide almost all of Xen image
>>   044fedfaa2 x86/traps: Put idt_table[] back into .bss
>>
>> And indeed, reverting those on staging fixes suspend. (This also matches
>> the behavior that xpti=off fixes suspend as George already reported
>> earlier today).
> 
> Okay, that was quite helpful - I think I see now where I screwed up (i.e.
> the issue is in the middle of the three commits). Could you confirm that a
> Xen booted with "nosmp" suspends and resumes fine?

Yes, with nosmp suspend works.




Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
2018-05-16 21:54 Test for osstest, features used in Qubes OS Marek Marczykowski-Górecki
2018-05-17 12:26 ` Ian Jackson
2018-05-17 14:59   ` Marek Marczykowski-Górecki
2018-05-17 15:12     ` Ian Jackson
2018-05-17 18:00       ` Sander Eikelenboom
2018-05-18 15:44         ` Marek Marczykowski-Górecki
2018-05-18 15:33       ` Marek Marczykowski-Górecki
2018-05-18 15:54         ` Jan Beulich
2018-05-18 16:19           ` Marek Marczykowski
2018-05-21 15:48             ` George Dunlap
2018-05-21 16:17               ` Andrew Cooper
2018-05-21 16:28                 ` George Dunlap
2018-05-21 17:18                   ` George Dunlap
2018-05-22 22:21               ` Simon Gaiser
2018-05-24 13:15                 ` Jan Beulich
2018-05-24 13:39                   ` Simon Gaiser
2018-05-21 11:04       ` George Dunlap
2018-05-21 11:52         ` Dario Faggioli
2018-05-21 13:57           ` Ian Jackson
2018-05-21 14:11             ` George Dunlap
2018-05-22  7:51             ` Dario Faggioli
2018-05-21 11:49       ` Dario Faggioli
