* [xen-unstable test] 123379: regressions - FAIL
@ 2018-05-31  6:00 osstest service owner
  2018-05-31  8:32 ` Juergen Gross
  0 siblings, 1 reply; 22+ messages in thread
From: osstest service owner @ 2018-05-31  6:00 UTC (permalink / raw)
  To: xen-devel, osstest-admin

flight 123379 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/123379/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 14 guest-saverestore.2 fail REGR. vs. 123323
 test-armhf-armhf-xl-arndale   5 host-ping-check-native   fail REGR. vs. 123323

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop            fail like 123323
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop            fail like 123323
 test-armhf-armhf-libvirt     14 saverestore-support-check    fail  like 123323
 test-armhf-armhf-libvirt-xsm 14 saverestore-support-check    fail  like 123323
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop             fail like 123323
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop             fail like 123323
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop             fail like 123323
 test-armhf-armhf-libvirt-raw 13 saverestore-support-check    fail  like 123323
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop            fail like 123323
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop            fail like 123323
 test-amd64-i386-xl-pvshim    12 guest-start                  fail   never pass
 test-amd64-amd64-libvirt     13 migrate-support-check        fail   never pass
 test-amd64-i386-libvirt      13 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-check    fail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-check        fail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-check    fail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      13 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      14 saverestore-support-check    fail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-check        fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-xsm      13 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-xsm      14 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-check        fail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-check    fail  never pass
 test-armhf-armhf-xl          13 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          14 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-rtds     13 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-rtds     14 saverestore-support-check    fail   never pass
 test-armhf-armhf-libvirt     13 migrate-support-check        fail   never pass
 test-armhf-armhf-libvirt-xsm 13 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-check        fail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-check    fail never pass
 test-arm64-arm64-xl          13 migrate-support-check        fail   never pass
 test-arm64-arm64-xl          14 saverestore-support-check    fail   never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-vhd      12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-vhd      13 saverestore-support-check    fail   never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop              fail never pass
 test-amd64-i386-xl-qemuu-win10-i386 10 windows-install         fail never pass
 test-amd64-amd64-xl-qemuu-win10-i386 10 windows-install        fail never pass
 test-amd64-amd64-xl-qemut-win10-i386 10 windows-install        fail never pass
 test-amd64-i386-xl-qemut-win10-i386 10 windows-install         fail never pass

version targeted for testing:
 xen                  06f542f8f2e446c01bd0edab51e9450af7f6e05b
baseline version:
 xen                  fc5805daef091240cd5fc06634a8bcdb2f3bb843

Last test of basis   123323  2018-05-28 23:34:10 Z    2 days
Testing same since   123379  2018-05-29 21:42:20 Z    1 days    1 attempts

------------------------------------------------------------
People who touched revisions under test:
  Andrew Cooper <andrew.cooper3@citrix.com>
  Ian Jackson <Ian.Jackson@eu.citrix.com>
  Jan Beulich <jbeulich@suse.com>
  Juergen Gross <jgross@suse.com>
  Lars Kurth <lars.kurth@citrix.com>
  Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
  Tim Deegan <tim@xen.org>
  Wei Liu <wei.liu2@citrix.com>

jobs:
 build-amd64-xsm                                              pass    
 build-arm64-xsm                                              pass    
 build-armhf-xsm                                              pass    
 build-i386-xsm                                               pass    
 build-amd64-xtf                                              pass    
 build-amd64                                                  pass    
 build-arm64                                                  pass    
 build-armhf                                                  pass    
 build-i386                                                   pass    
 build-amd64-libvirt                                          pass    
 build-arm64-libvirt                                          pass    
 build-armhf-libvirt                                          pass    
 build-i386-libvirt                                           pass    
 build-amd64-prev                                             pass    
 build-i386-prev                                              pass    
 build-amd64-pvops                                            pass    
 build-arm64-pvops                                            pass    
 build-armhf-pvops                                            pass    
 build-i386-pvops                                             pass    
 build-amd64-rumprun                                          pass    
 build-i386-rumprun                                           pass    
 test-xtf-amd64-amd64-1                                       pass    
 test-xtf-amd64-amd64-2                                       pass    
 test-xtf-amd64-amd64-3                                       pass    
 test-xtf-amd64-amd64-4                                       pass    
 test-xtf-amd64-amd64-5                                       pass    
 test-amd64-amd64-xl                                          pass    
 test-arm64-arm64-xl                                          pass    
 test-armhf-armhf-xl                                          pass    
 test-amd64-i386-xl                                           pass    
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm                pass    
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm                 pass    
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm           pass    
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm            fail    
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm                pass    
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm                 pass    
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm        pass    
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm         pass    
 test-amd64-amd64-libvirt-xsm                                 pass    
 test-arm64-arm64-libvirt-xsm                                 pass    
 test-armhf-armhf-libvirt-xsm                                 pass    
 test-amd64-i386-libvirt-xsm                                  pass    
 test-amd64-amd64-xl-xsm                                      pass    
 test-arm64-arm64-xl-xsm                                      pass    
 test-armhf-armhf-xl-xsm                                      pass    
 test-amd64-i386-xl-xsm                                       pass    
 test-amd64-amd64-qemuu-nested-amd                            fail    
 test-amd64-amd64-xl-pvhv2-amd                                pass    
 test-amd64-i386-qemut-rhel6hvm-amd                           pass    
 test-amd64-i386-qemuu-rhel6hvm-amd                           pass    
 test-amd64-amd64-xl-qemut-debianhvm-amd64                    pass    
 test-amd64-i386-xl-qemut-debianhvm-amd64                     pass    
 test-amd64-amd64-xl-qemuu-debianhvm-amd64                    pass    
 test-amd64-i386-xl-qemuu-debianhvm-amd64                     pass    
 test-amd64-i386-freebsd10-amd64                              pass    
 test-amd64-amd64-xl-qemuu-ovmf-amd64                         pass    
 test-amd64-i386-xl-qemuu-ovmf-amd64                          pass    
 test-amd64-amd64-rumprun-amd64                               pass    
 test-amd64-amd64-xl-qemut-win7-amd64                         fail    
 test-amd64-i386-xl-qemut-win7-amd64                          fail    
 test-amd64-amd64-xl-qemuu-win7-amd64                         fail    
 test-amd64-i386-xl-qemuu-win7-amd64                          fail    
 test-amd64-amd64-xl-qemut-ws16-amd64                         fail    
 test-amd64-i386-xl-qemut-ws16-amd64                          fail    
 test-amd64-amd64-xl-qemuu-ws16-amd64                         fail    
 test-amd64-i386-xl-qemuu-ws16-amd64                          fail    
 test-armhf-armhf-xl-arndale                                  fail    
 test-amd64-amd64-xl-credit2                                  pass    
 test-arm64-arm64-xl-credit2                                  pass    
 test-armhf-armhf-xl-credit2                                  pass    
 test-armhf-armhf-xl-cubietruck                               pass    
 test-amd64-amd64-examine                                     pass    
 test-arm64-arm64-examine                                     pass    
 test-armhf-armhf-examine                                     pass    
 test-amd64-i386-examine                                      pass    
 test-amd64-i386-freebsd10-i386                               pass    
 test-amd64-i386-rumprun-i386                                 pass    
 test-amd64-amd64-xl-qemut-win10-i386                         fail    
 test-amd64-i386-xl-qemut-win10-i386                          fail    
 test-amd64-amd64-xl-qemuu-win10-i386                         fail    
 test-amd64-i386-xl-qemuu-win10-i386                          fail    
 test-amd64-amd64-qemuu-nested-intel                          pass    
 test-amd64-amd64-xl-pvhv2-intel                              pass    
 test-amd64-i386-qemut-rhel6hvm-intel                         pass    
 test-amd64-i386-qemuu-rhel6hvm-intel                         pass    
 test-amd64-amd64-libvirt                                     pass    
 test-armhf-armhf-libvirt                                     pass    
 test-amd64-i386-libvirt                                      pass    
 test-amd64-amd64-livepatch                                   pass    
 test-amd64-i386-livepatch                                    pass    
 test-amd64-amd64-migrupgrade                                 pass    
 test-amd64-i386-migrupgrade                                  pass    
 test-amd64-amd64-xl-multivcpu                                pass    
 test-armhf-armhf-xl-multivcpu                                pass    
 test-amd64-amd64-pair                                        pass    
 test-amd64-i386-pair                                         pass    
 test-amd64-amd64-libvirt-pair                                pass    
 test-amd64-i386-libvirt-pair                                 pass    
 test-amd64-amd64-amd64-pvgrub                                pass    
 test-amd64-amd64-i386-pvgrub                                 pass    
 test-amd64-amd64-xl-pvshim                                   pass    
 test-amd64-i386-xl-pvshim                                    fail    
 test-amd64-amd64-pygrub                                      pass    
 test-amd64-amd64-xl-qcow2                                    pass    
 test-armhf-armhf-libvirt-raw                                 pass    
 test-amd64-i386-xl-raw                                       pass    
 test-amd64-amd64-xl-rtds                                     pass    
 test-armhf-armhf-xl-rtds                                     pass    
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow             pass    
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow              pass    
 test-amd64-amd64-xl-shadow                                   pass    
 test-amd64-i386-xl-shadow                                    pass    
 test-amd64-amd64-libvirt-vhd                                 pass    
 test-armhf-armhf-xl-vhd                                      pass    


------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
    http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
    http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

------------------------------------------------------------
commit 06f542f8f2e446c01bd0edab51e9450af7f6e05b
Author: Jan Beulich <jbeulich@suse.com>
Date:   Tue May 29 12:39:24 2018 +0200

    x86/CPUID: don't override tool stack decision to hide STIBP
    
    Other than in the feature sets, where we indeed want to offer the
    feature even if not enumerated on hardware, we shouldn't dictate the
    feature being available if tool stack or host admin have decided to not
    expose it (for whatever [questionable?] reason). That feature set side
    override is sufficient to achieve the intended guest side safety
    property (in offering - by default - STIBP independent of actual
    availability in hardware).
    
    Signed-off-by: Jan Beulich <jbeulich@suse.com>
    Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Release-acked-by: Juergen Gross <jgross@suse.com>

commit d6239f64713df819278bf048446d3187c6ac4734
Author: Jan Beulich <jbeulich@suse.com>
Date:   Tue May 29 12:38:52 2018 +0200

    x86: correct default_xen_spec_ctrl calculation
    
    Even with opt_msr_sc_{pv,hvm} both false we should set up the variable
    as usual, to ensure proper one-time setup during boot and CPU bringup.
    This then also brings the code in line with the comment immediately
    ahead of the printk() being modified saying "irrespective of guests".
    
    Signed-off-by: Jan Beulich <jbeulich@suse.com>
    Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Release-acked-by: Juergen Gross <jgross@suse.com>

commit b7eb9d8bd61ecdc399e8fc41ea4bbff35cbe0755
Author: Jan Beulich <jbeulich@suse.com>
Date:   Tue May 29 12:38:09 2018 +0200

    x86: suppress sync when XPTI is disabled for a domain
    
    Now that we have a per-domain flag we can and should control sync-ing in
    a more fine grained manner: Only domains having XPTI enabled need the
    sync to occur.
    
    Signed-off-by: Jan Beulich <jbeulich@suse.com>
    Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Release-acked-by: Juergen Gross <jgross@suse.com>

commit 849cc9ac56eff8a8d575ed9f484aad72f383862c
Author: Jan Beulich <JBeulich@suse.com>
Date:   Tue May 22 05:40:02 2018 -0600

    libxc/x86/PV: don't hand through CPUID leaf 0x80000008 as is
    
    Just like for HVM the feature set should be used for EBX output, while
    EAX should be restricted to the low 16 bits and ECX/EDX should be zero.
    
    Signed-off-by: Jan Beulich <jbeulich@suse.com>
    Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

commit 2de2b10b2252761baa5dd0077df384dbfcca8212
Author: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Date:   Tue May 22 21:47:45 2018 +0200

    tools/kdd: alternative way of muting spurious gcc warning
    
    Older gcc does not support #pragma GCC diagnostics, so use an alternative
    approach: change the variable type to uint32_t (this code handles 32-bit
    requests only anyway), which apparently also avoids gcc complaining about
    this (otherwise correct) code.
    
    Fixes 437e00fea04becc91c1b6bc1c0baa636b067a5cc "tools/kdd: mute spurious
    gcc warning"
    
    Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
    Acked-by: Wei Liu <wei.liu2@citrix.com>
    Release-acked-by: Juergen Gross <jgross@suse.com>
    Acked-by: Tim Deegan <tim@xen.org>

commit 09afb9e78e1e90ce77d5107677a8464e8410802b
Author: Ian Jackson <ian.jackson@eu.citrix.com>
Date:   Wed Dec 13 11:58:00 2017 +0000

    docs/process/xen-release-management: Lesson to learn
    
    The 4.10 release preparation was significantly more hairy than ideal.
    (We seem to have a good overall outcome despite, rather than because
    of, our approach.)
    
    This is the second time (at least) that we have come close to failure
    by committing to a release date before the exact code to be released
    is known and has been made and tested.
    
    Evidently our docs make it insufficiently clear not to do that.
    
    CC: Julien Grall <julien.grall@arm.com>
    Acked-by: Juergen Gross <jgross@suse.com>
    Acked-by: Jan Beulich <jbeulich@suse.com>
    Acked-by: Lars Kurth <lars.kurth@citrix.com>
    Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

commit 480b5ebcc98810aa8bb670a28900a62d02a48cbc
Author: Ian Jackson <ian.jackson@eu.citrix.com>
Date:   Tue May 22 17:39:52 2018 +0100

    docs/process: Add RUBRIC
    
    Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
    Acked-by: Juergen Gross <jgross@suse.com>

commit 4712c0a231f010253a5471531e335a5a13dcec76
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date:   Thu May 24 15:06:16 2018 +0100

    x86/traps: Dump the instruction stream even for double faults
    
    This helps debug #DF's which occur in alternative patches
    
    Reported-by: George Dunlap <george.dunlap@eu.citrix.com>
    Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Reviewed-by: Jan Beulich <jbeulich@suse.com>
    Release-acked-by: Juergen Gross <jgross@suse.com>

commit 6b9562dac1746014ab376bd2cf8ba400acf34c6d
Author: Jan Beulich <jbeulich@suse.com>
Date:   Mon May 28 11:20:26 2018 +0200

    x86/XPTI: fix S3 resume (and CPU offlining in general)
    
    We should index an L1 table with an L1 index.
    
    Reported-by: Simon Gaiser <simon@invisiblethingslab.com>
    Signed-off-by: Jan Beulich <jbeulich@suse.com>
    Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Release-acked-by: Juergen Gross <jgross@suse.com>
(qemu changes not included)


* Re: [xen-unstable test] 123379: regressions - FAIL
  2018-05-31  6:00 [xen-unstable test] 123379: regressions - FAIL osstest service owner
@ 2018-05-31  8:32 ` Juergen Gross
  2018-05-31  9:14   ` Juergen Gross
  0 siblings, 1 reply; 22+ messages in thread
From: Juergen Gross @ 2018-05-31  8:32 UTC (permalink / raw)
  To: xen-devel

On 31/05/18 08:00, osstest service owner wrote:
> flight 123379 xen-unstable real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/123379/
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 14 guest-saverestore.2 fail REGR. vs. 123323

AFAICS this seems to be the suspected Windows reboot again?

>  test-armhf-armhf-xl-arndale   5 host-ping-check-native   fail REGR. vs. 123323

Flaky hardware again?


Juergen


* Re: [xen-unstable test] 123379: regressions - FAIL
  2018-05-31  8:32 ` Juergen Gross
@ 2018-05-31  9:14   ` Juergen Gross
  2018-06-01  8:10     ` Jan Beulich
  0 siblings, 1 reply; 22+ messages in thread
From: Juergen Gross @ 2018-05-31  9:14 UTC (permalink / raw)
  To: xen-devel

On 31/05/18 10:32, Juergen Gross wrote:
> On 31/05/18 08:00, osstest service owner wrote:
>> flight 123379 xen-unstable real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/123379/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>>  test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 14 guest-saverestore.2 fail REGR. vs. 123323
> 
> AFAICS this seems to be the suspected Windows reboot again?

Hmm, thinking more about it: xl save is done with the domU paused,
so the guest rebooting concurrently is rather improbable.

As this is an issue occurring sporadically, and not only during the 4.11
development phase, I don't think this should be a blocker.

Thoughts?


Juergen


* Re: [xen-unstable test] 123379: regressions - FAIL
  2018-05-31  9:14   ` Juergen Gross
@ 2018-06-01  8:10     ` Jan Beulich
  2018-06-01  9:08       ` Juergen Gross
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2018-06-01  8:10 UTC (permalink / raw)
  To: Juergen Gross; +Cc: xen-devel

>>> On 31.05.18 at 11:14, <jgross@suse.com> wrote:
> On 31/05/18 10:32, Juergen Gross wrote:
>> On 31/05/18 08:00, osstest service owner wrote:
>>> flight 123379 xen-unstable real [real]
>>> http://logs.test-lab.xenproject.org/osstest/logs/123379/ 
>>>
>>> Regressions :-(
>>>
>>> Tests which did not succeed and are blocking,
>>> including tests which could not be run:
>>>  test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 14 guest-saverestore.2 
> fail REGR. vs. 123323
>> 
>> AFAICS this seems to be the suspected Windows reboot again?
> 
> Hmm, thinking more about it: xl save is done with the domU paused,
> so the guest rebooting concurrently is rather improbable.

Not sure, considering e.g.

libxl: libxl_stream_write.c:350:libxl__xc_domain_save_done: Domain 3:saving domain: domain responded to suspend request: Bad address

When looking into the Windows reboot issue (note that we're not dealing
with Windows here), I had noticed that there was a problem with trying
to save the guest at the "wrong" time. Generally, as explained back then,
I think the tool stack should honor the guest trying to reboot when it is
already in the process of being migrated/saved, and migration/save
should not even be attempted when the guest has already signaled
reboot (iirc it's only the former that is an actual issue). Otherwise the
tool stack will internally try to drive the same guest into two distinct new
states at the same time. Giving reboot (or shutdown) higher priority than
migration/save seems natural to me: A rebooting guest can be moved to
the new host with no migration cost at all, and a shut down guest doesn't
need (live) moving in the first place.

> As this is an issue occurring sporadically not only during 4.11
> development phase I don't think this should be a blocker.

Yes and no: Yes, it's not a regression. But as long as we don't make this
a blocker, I don't think the issue will be addressed, considering for how
long it has been there already.

Jan




* Re: [xen-unstable test] 123379: regressions - FAIL
  2018-06-01  8:10     ` Jan Beulich
@ 2018-06-01  9:08       ` Juergen Gross
  2018-06-05 16:16         ` Ian Jackson
                           ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Juergen Gross @ 2018-06-01  9:08 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Wei Liu, Ian Jackson

On 01/06/18 10:10, Jan Beulich wrote:
>>>> On 31.05.18 at 11:14, <jgross@suse.com> wrote:
>> On 31/05/18 10:32, Juergen Gross wrote:
>>> On 31/05/18 08:00, osstest service owner wrote:
>>>> flight 123379 xen-unstable real [real]
>>>> http://logs.test-lab.xenproject.org/osstest/logs/123379/ 
>>>>
>>>> Regressions :-(
>>>>
>>>> Tests which did not succeed and are blocking,
>>>> including tests which could not be run:
>>>>  test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 14 guest-saverestore.2 
>> fail REGR. vs. 123323
>>>
>>> AFAICS this seems to be the suspected Windows reboot again?
>>
>> Hmm, thinking more about it: xl save is done with the domU paused,
>> so the guest rebooting concurrently is rather improbable.
> 
> Not sure, considering e.g.
> 
> libxl: libxl_stream_write.c:350:libxl__xc_domain_save_done: Domain 3:saving domain: domain responded to suspend request: Bad address

That was at 2018-05-30 22:12:49.650+0000

Before that there was:

2018-05-30 22:12:49.320+0000: xc: Failed to get types for pfn batch (14
= Bad address): Internal error

But looking at the messages issued some seconds before that I see some
xenstore watch related messages in:

http://logs.test-lab.xenproject.org/osstest/logs/123379/test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm/huxelrebe1---var-log-libvirt-libxl-libxl-driver.log

which make me wonder whether the libxl watch handling is really
correct: e.g. libxl__ev_xswatch_register() first registers the watch
with xenstore and only then writes the data needed for processing the
watch in the related structure. Could it be that the real suspend watch
event was interpreted as a @releaseDomain event?


Juergen


* Re: [xen-unstable test] 123379: regressions - FAIL
  2018-06-01  9:08       ` Juergen Gross
@ 2018-06-05 16:16         ` Ian Jackson
  2018-06-06  7:39           ` Juergen Gross
  2018-06-05 16:19         ` Ian Jackson
  2018-06-08 14:25         ` Ad-hoc test instructions (was Re: [xen-unstable test] 123379: regressions - FAIL) Ian Jackson
  2 siblings, 1 reply; 22+ messages in thread
From: Ian Jackson @ 2018-06-05 16:16 UTC (permalink / raw)
  To: Juergen Gross; +Cc: xen-devel, Wei Liu, Jan Beulich

Juergen Gross writes ("Re: [Xen-devel] [xen-unstable test] 123379: regressions - FAIL"):
> Before that there was:
> 
> 2018-05-30 22:12:49.320+0000: xc: Failed to get types for pfn batch (14
> = Bad address): Internal error

This seems to be the only message about the root cause.

> But looking at the messages issued some seconds before that I see some
> xenstore watch related messages in:
> 
> http://logs.test-lab.xenproject.org/osstest/logs/123379/test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm/huxelrebe1---var-log-libvirt-libxl-libxl-driver.log

I think this is all a red herring.

What I see happening is:

2018-05-30 22:12:44.695+0000: libxl: libxl_event.c:636:libxl__ev_xswatch_register: watch w=0xb40005e8 wpath=/local/domain/3/control/shutdown token=2/b: register slotnum=2

libxl starts watching the domain's shutdown control node.  I think
this is done from near libxl_dom_suspend.c:202.

2018-05-30 22:12:44.696+0000: libxl: libxl_event.c:573:watchfd_callback: watch w=0xb40005e8 wpath=/local/domain/3/control/shutdown token=2/b: event epath=/local/domain/3/control/shutdown

The watch we just set triggers.  This happens with every xenstore
watch, after it is set up - ie, it does not mean that anything had
written the node.
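
(For illustration only -- this is not libxl code, just a minimal sketch
against the public libxenstore API of the behaviour just described; the
header, path and token below are my own choices, not anything taken from
libxl or the log.  Compile with -lxenstore:)

    #include <stdio.h>
    #include <stdlib.h>
    #include <xenstore.h>

    int main(void)
    {
        struct xs_handle *xsh = xs_open(0);    /* connect to xenstored */
        unsigned int num;
        char **ev;

        if (!xsh)
            return 1;

        /* Register a watch; xenstored queues one event straight away,
         * before anyone has written the node. */
        if (!xs_watch(xsh, "/local/domain/3/control/shutdown", "tok"))
            return 1;

        ev = xs_read_watch(xsh, &num);         /* blocks; returns at once */
        if (ev) {
            printf("path=%s token=%s\n",
                   ev[XS_WATCH_PATH], ev[XS_WATCH_TOKEN]);
            free(ev);
        }

        xs_unwatch(xsh, "/local/domain/3/control/shutdown", "tok");
        xs_close(xsh);
        return 0;
    }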

2018-05-30 22:12:44.696+0000: libxl: libxl_event.c:673:libxl__ev_xswatch_deregister: watch w=0xb40005e8 wpath=/local/domain/3/control/shutdown token=2/b: deregister slotnum=2

libxl stops watching the domain's shutdown control node.  This is
done, I think, by domain_suspend_common_pvcontrol_suspending
(libxl_dom_suspend.c:222).

We can conclude that
  if (!rc && !domain_suspend_pvcontrol_acked(state))
was not taken.  It seems unlikely that rc!=0, because the
node is read in xswait_xswatch_callback using libxl__xs_read_checked
which I think would log a message.  So probably
/local/domain/3/control/shutdown was `suspend', meaning the domain had
indeed acked the suspend request.

2018-05-30 22:12:44.696+0000: libxl: libxl_event.c:636:libxl__ev_xswatch_register: watch w=0xb40005f8 wpath=@releaseDomain token=2/c: register slotnum=2

This is the watch registration in domain_suspend_common_wait_guest.

2018-05-30 22:12:44.696+0000: libxl: libxl_event.c:548:watchfd_callback: watch w=0xb40005f8 epath=/local/domain/3/control/shutdown token=2/b: counter != c

This is a watch event for the watch we set up at 2018-05-30
22:12:44.696+0000.  You can tell because the token is the same.  But
that watch was cancelled within libxl at 2018-05-30
22:12:44.696+0000.  libxl's watch handling machinery knows this, and
discards the event.  "counter != c", libxl_event.c:547.

It does indeed use the same slot in the libxl xswatch data structure,
but libxl can distinguish it by the counter and the event path.  (In
any case xs watch handlers should tolerate spurious events and be
idempotent, although that does not matter here.)

I think this must be the watch event from the guest actually writing
its acknowledgement to the control node - we would indeed expect two
such events, one generated by the watch setup, and one from the
guest's write.  The timing meant that here we processed the guest's
written value as a result of the first watch event.  This is fine.

2018-05-30 22:12:44.696+0000: libxl: libxl_event.c:573:watchfd_callback: watch w=0xb40005f8 wpath=@releaseDomain token=2/c: event epath=@releaseDomain

This is the immediate-auto-firing of the @releaseDomain event set up
at 2018-05-30 22:12:44.696+0000.  libxl's xswatch machinery looks this
up in slot 2 and finds that the counter and paths are right, so it
will dispatch it to suspend_common_wait_guest_watch which is a
frontend for suspend_common_wait_guest_check.

In the absence of log messages from that function we can conclude that
  !(info.flags & XEN_DOMINF_shutdown)
ie the guest has not shut down yet.

2018-05-30 22:12:44.720+0000: libxl: libxl_event.c:573:watchfd_callback: watch w=0xb2a26708 wpath=@releaseDomain token=3/0: event epath=@releaseDomain

This is a watch event which was set up much earlier at 2018-05-30
21:58:02.182+0000.  The surrounding context there (references to
domain_death_xswatch_callback) makes it clear that this is pursuant to
libxl_evenable_domain_death - ie, libvirt asked libxl to monitor for
the death of the domain.

2018-05-30 22:12:44.724+0000: libxl: libxl_domain.c:816:domain_death_xswatch_callback:  shutdown reporting

The output here is a bit perplexing.  I don't understand how we can
have the message "shutdown reporting" without any previous message
"Exists shutdown_reported=%d" or "[evg=%p] nentries=%d rc=%d %ld..%ld"
both of which seem to precede the "shutdown reporting" message in
domain_death_xswatch_callback.

However, we can conclude that, at this point, libxl finds that
  got->flags & XEN_DOMINF_shutdown
and it decides to inform libvirt that the domain has shut down,
by providing a DOMAIN_SHUTDOWN libxl event.

(This event is not passed to libvirt immediately yet because it lives
on either (a) a queue on this thread's stack, which will be processed
on return to libvirt, or (b) a queue associated with the CTX, whose
lock we hold.  The callback to libvirt will be reported later.)

2018-05-30 22:12:44.724+0000: libxl: libxl_domain.c:771:domain_death_xswatch_callback: [evg=0] all reported
2018-05-30 22:12:44.724+0000: libxl: libxl_domain.c:833:domain_death_xswatch_callback: domain death search done

This is the end of the search in domain_death_xswatch_callback for
domains which need to be reported.  libvirt was listening only for one
domain.

2018-05-30 22:12:44.724+0000: libxl: libxl_event.c:573:watchfd_callback: watch w=0xb40005f8 wpath=@releaseDomain token=2/c: event epath=@releaseDomain

Another xs watch event for the same domain shutdown, because libxl had
set up two watches for it.

(These will probably have been written very quickly together into the
xs ring and/or socket, and handled within the loop in
libxl_event.c:watchfd_callback.  So we have not yet released the CTX
lock or returned to libvirt: therefore, probably, the domain shutdown
event notification to libvirt is still queued with libxl.  Indeed as
we will see, that occurs a bit later.)

This is another watch event from the registration in
domain_suspend_common_wait_guest and again it will call
suspend_common_wait_guest_check.

2018-05-30 22:12:44.724+0000: libxl: libxl_event.c:673:libxl__ev_xswatch_deregister: watch w=0xb40005f8 wpath=@releaseDomain token=2/c: deregister slotnum=2

This must logically be one of the libxl__ev_xswatch_deregister calls
in domain_suspend_common_guest_suspended / domain_suspend_common_done.

However, looking at suspend_common_wait_guest_check, reaching either
of those should have produced some kind of log message - either "guest
has suspended" (DEBUG) or an ERROR of some sort.

... oh I have just spotted this logfile ...

http://logs.test-lab.xenproject.org/osstest/logs/123379/test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm/huxelrebe1---var-log-libvirt-libxl-debianhvm.guest.osstest.log

... which contains the other half of the messages.

So we see there:

2018-05-30 22:12:44.724+0000: libxl: libxl_dom_suspend.c:350:suspend_common_wait_guest_check: Domain 3:guest has suspended

OK.

2018-05-30 22:12:44.731+0000: libxl: libxl_event.c:686:libxl__ev_xswatch_deregister: watch w=0xb40005f8: deregister unregistered

This is the second call to libxl__ev_xswatch_deregister for the same
watch event.  That tells us that the first call must have been in
domain_suspend_common_guest_suspended.  So all is going well, and we
called domain_suspend_common_done, whose idempotent cleanup
deregisters the already-deregistered watch.

2018-05-30 22:12:44.731+0000: libxl: libxl_event.c:1398:egc_run_callbacks: event 0xb40068f0 callback type=domain_shutdown

This is libxl calling libvirt to tell libvirt that the domain has shut
down.  libvirt does not seem to respond.

2018-05-30 22:12:49.320+0000: xc: Failed to get types for pfn batch (14 = Bad address): Internal error
2018-05-30 22:12:49.483+0000: xc: Save failed (14 = Bad address): Internal error
2018-05-30 22:12:49.648+0000: libxl-save-helper: complete r=-1: Bad address

And this is the first thing that goes wrong.  You can see similar
messages in the other logfile:

2018-05-30 22:12:49.650+0000: libxl: libxl_stream_write.c:350:libxl__xc_domain_save_done: Domain 3:saving domain: domain responded to suspend request: Bad address

All of these are reports of the same thing: xc_get_pfn_type_batch at
xc_sr_save.c:133 failed with EFAULT.


2018-05-30 22:12:49.650+0000: libxl: libxl.c:746:libxl__fd_flags_restore: fnctl F_SETFL of fd 33 to 0x8001
2018-05-30 22:12:49.650+0000: libxl: libxl_event.c:1869:libxl__ao_complete: ao 0xb4000478: complete, rc=-3
2018-05-30 22:12:49.650+0000: libxl: libxl_event.c:1838:libxl__ao__destroy: ao 0xb4000478: destroy

> which make me wonder whether the libxl watch handling is really
> correct: e.g. libxl__ev_xswatch_register() first registers the watch
> with xenstore and only then writes the data needed for processing the
> watch in the related structure. Could it be that the real suspend watch
> event was interpreted as a @releaseDomain event?

No.  The code in libxl__ev_xswatch_register all runs with the libxl
CTX lock held so it cannot be interrupted in this way.  As you see
above I have analysed the log and it is all operating correctly,
albeit rather noisily in the debug log.

Ian.


* Re: [xen-unstable test] 123379: regressions - FAIL
  2018-06-01  9:08       ` Juergen Gross
  2018-06-05 16:16         ` Ian Jackson
@ 2018-06-05 16:19         ` Ian Jackson
  2018-06-06  9:35           ` Jan Beulich
       [not found]           ` <5B17AAE102000078001C8972@suse.com>
  2018-06-08 14:25         ` Ad-hoc test instructions (was Re: [xen-unstable test] 123379: regressions - FAIL) Ian Jackson
  2 siblings, 2 replies; 22+ messages in thread
From: Ian Jackson @ 2018-06-05 16:19 UTC (permalink / raw)
  To: Juergen Gross; +Cc: xen-devel, Wei Liu, Ian Jackson, Jan Beulich

>>  test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 14 guest-saverestore.2 

I thought I would reply again with the key point from my earlier mail
highlighted, and go a bit further.  The first thing to go wrong in
this was:

2018-05-30 22:12:49.320+0000: xc: Failed to get types for pfn batch (14 = Bad address): Internal error
2018-05-30 22:12:49.483+0000: xc: Save failed (14 = Bad address): Internal error
2018-05-30 22:12:49.648+0000: libxl-save-helper: complete r=-1: Bad address

You can see similar messages in the other logfile:

2018-05-30 22:12:49.650+0000: libxl: libxl_stream_write.c:350:libxl__xc_domain_save_done: Domain 3:saving domain: domain responded to suspend request: Bad address

All of these are reports of the same thing: xc_get_pfn_type_batch at
xc_sr_save.c:133 failed with EFAULT.  I'm afraid I don't know why.

There is no corresponding message in the host's serial log nor the
dom0 kernel log.

Ian.


* Re: [xen-unstable test] 123379: regressions - FAIL
  2018-06-05 16:16         ` Ian Jackson
@ 2018-06-06  7:39           ` Juergen Gross
  0 siblings, 0 replies; 22+ messages in thread
From: Juergen Gross @ 2018-06-06  7:39 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Wei Liu, Jan Beulich

On 05/06/18 18:16, Ian Jackson wrote:
> 2018-05-30 22:12:49.320+0000: xc: Failed to get types for pfn batch (14 = Bad address): Internal error

This is worrying me.

The message is issued as a result of xc_get_pfn_type_batch() failing.
I see no other possibility for the failure with errno being 14 (EFAULT)
than the hypervisor failing a copy from/to guest for either struct
xen_domctl or the pfn array passed via struct xen_domctl (op
XEN_DOMCTL_getpageframeinfo3). Both should be accessible as they have
been correctly declared via DECLARE_HYPERCALL_BOUNCE() in xc_private.c.
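
From memory the pattern looks roughly like this (abridged sketch only --
it needs libxc's private headers, and the real code in xc_private.c
differs in detail):

    /* Abridged from-memory sketch of the bounce-buffer pattern around
     * XEN_DOMCTL_getpageframeinfo3 -- not the verbatim libxc code. */
    int xc_get_pfn_type_batch(xc_interface *xch, uint32_t dom,
                              unsigned int num, xen_pfn_t *arr)
    {
        int rc;
        DECLARE_DOMCTL;
        DECLARE_HYPERCALL_BOUNCE(arr, sizeof(*arr) * num,
                                 XC_HYPERCALL_BUFFER_BOUNCE_BOTH);

        if ( xc_hypercall_bounce_pre(xch, arr) )   /* alloc + copy in */
            return -1;

        domctl.cmd = XEN_DOMCTL_getpageframeinfo3;
        domctl.domain = dom;
        domctl.u.getpageframeinfo3.num = num;
        set_xen_guest_handle(domctl.u.getpageframeinfo3.array, arr);

        rc = do_domctl(xch, &domctl);              /* EFAULT shows up here */

        xc_hypercall_bounce_post(xch, arr);        /* copy out + free */
        return rc;
    }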

Any ideas how that could have happened?


Juergen


* Re: [xen-unstable test] 123379: regressions - FAIL
  2018-06-05 16:19         ` Ian Jackson
@ 2018-06-06  9:35           ` Jan Beulich
       [not found]           ` <5B17AAE102000078001C8972@suse.com>
  1 sibling, 0 replies; 22+ messages in thread
From: Jan Beulich @ 2018-06-06  9:35 UTC (permalink / raw)
  To: Ian Jackson, Juergen Gross; +Cc: Ian Jackson, Wei Liu, xen-devel

>>> On 05.06.18 at 18:19, <ian.jackson@citrix.com> wrote:
>> >  test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 14 guest-saverestore.2 
> 
> I thought I would reply again with the key point from my earlier mail
> highlighted, and go a bit further.  The first thing to go wrong in
> this was:
> 
> 2018-05-30 22:12:49.320+0000: xc: Failed to get types for pfn batch (14 = Bad address): Internal error
> 2018-05-30 22:12:49.483+0000: xc: Save failed (14 = Bad address): Internal error
> 2018-05-30 22:12:49.648+0000: libxl-save-helper: complete r=-1: Bad address
> 
> You can see similar messages in the other logfile:
> 
> 2018-05-30 22:12:49.650+0000: libxl: libxl_stream_write.c:350:libxl__xc_domain_save_done: Domain 3:saving domain: domain responded to suspend request: Bad address
> 
> All of these are reports of the same thing: xc_get_pfn_type_batch at
> xc_sr_save.c:133 failed with EFAULT.  I'm afraid I don't know why.
> 
> There is no corresponding message in the host's serial log nor the
> dom0 kernel log.

I vaguely recall from the time when I had looked at the similar Windows
migration issues that the guest is already in the process of being cleaned
up when these occur. Commit 2dbe9c3cd2 ("x86/mm: silence a pointless
warning") intentionally suppressed a log message here, and the
immediately following debugging code (933f966bcd x86/mm: add
temporary debugging code to get_page_from_gfn_p2m()) was reverted
a little over a month later. That revert wasn't a follow-up to another patch
(a fix), but followed the discussion rooted at
https://lists.xenproject.org/archives/html/xen-devel/2017-06/msg00324.html

Jan




* Re: [xen-unstable test] 123379: regressions - FAIL
       [not found]           ` <5B17AAE102000078001C8972@suse.com>
@ 2018-06-06  9:40             ` Juergen Gross
  2018-06-07 11:30               ` Juergen Gross
  0 siblings, 1 reply; 22+ messages in thread
From: Juergen Gross @ 2018-06-06  9:40 UTC (permalink / raw)
  To: Jan Beulich, Ian Jackson <ian.jackson@citrix.com>
  Cc: Ian Jackson, Wei Liu, xen-devel

On 06/06/18 11:35, Jan Beulich wrote:
>>>> On 05.06.18 at 18:19, <ian.jackson@citrix.com> wrote:
>>>>  test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 14 guest-saverestore.2 
>>
>> I thought I would reply again with the key point from my earlier mail
>> highlighted, and go a bit further.  The first thing to go wrong in
>> this was:
>>
>> 2018-05-30 22:12:49.320+0000: xc: Failed to get types for pfn batch (14 = Bad address): Internal error
>> 2018-05-30 22:12:49.483+0000: xc: Save failed (14 = Bad address): Internal error
>> 2018-05-30 22:12:49.648+0000: libxl-save-helper: complete r=-1: Bad address
>>
>> You can see similar messages in the other logfile:
>>
>> 2018-05-30 22:12:49.650+0000: libxl: libxl_stream_write.c:350:libxl__xc_domain_save_done: Domain 3:saving domain: domain responded to suspend request: Bad address
>>
>> All of these are reports of the same thing: xc_get_pfn_type_batch at
>> xc_sr_save.c:133 failed with EFAULT.  I'm afraid I don't know why.
>>
>> There is no corresponding message in the host's serial log nor the
>> dom0 kernel log.
> 
> I vaguely recall from the time when I had looked at the similar Windows
> migration issues that the guest is already in the process of being cleaned
> up when these occur. Commit 2dbe9c3cd2 ("x86/mm: silence a pointless
> warning") intentionally suppressed a log message here, and the
> immediately following debugging code (933f966bcd x86/mm: add
> temporary debugging code to get_page_from_gfn_p2m()) was reverted
> a little over a month later. This wasn't as a follow-up to another patch
> (fix), but following the discussion rooted at
> https://lists.xenproject.org/archives/html/xen-devel/2017-06/msg00324.html

That was -ESRCH, not -EFAULT.


Juergen


* Re: [xen-unstable test] 123379: regressions - FAIL
  2018-06-06  9:40             ` Juergen Gross
@ 2018-06-07 11:30               ` Juergen Gross
  2018-06-08 10:12                 ` Juergen Gross
  0 siblings, 1 reply; 22+ messages in thread
From: Juergen Gross @ 2018-06-07 11:30 UTC (permalink / raw)
  To: Jan Beulich, Ian Jackson <ian.jackson@citrix.com>
  Cc: Ian Jackson, Wei Liu, xen-devel

On 06/06/18 11:40, Juergen Gross wrote:
> On 06/06/18 11:35, Jan Beulich wrote:
>>>>> On 05.06.18 at 18:19, <ian.jackson@citrix.com> wrote:
>>>>>  test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 14 guest-saverestore.2 
>>>
>>> I thought I would reply again with the key point from my earlier mail
>>> highlighted, and go a bit further.  The first thing to go wrong in
>>> this was:
>>>
>>> 2018-05-30 22:12:49.320+0000: xc: Failed to get types for pfn batch (14 = Bad address): Internal error
>>> 2018-05-30 22:12:49.483+0000: xc: Save failed (14 = Bad address): Internal error
>>> 2018-05-30 22:12:49.648+0000: libxl-save-helper: complete r=-1: Bad address
>>>
>>> You can see similar messages in the other logfile:
>>>
>>> 2018-05-30 22:12:49.650+0000: libxl: libxl_stream_write.c:350:libxl__xc_domain_save_done: Domain 3:saving domain: domain responded to suspend request: Bad address
>>>
>>> All of these are reports of the same thing: xc_get_pfn_type_batch at
>>> xc_sr_save.c:133 failed with EFAULT.  I'm afraid I don't know why.
>>>
>>> There is no corresponding message in the host's serial log nor the
>>> dom0 kernel log.
>>
>> I vaguely recall from the time when I had looked at the similar Windows
>> migration issues that the guest is already in the process of being cleaned
>> up when these occur. Commit 2dbe9c3cd2 ("x86/mm: silence a pointless
>> warning") intentionally suppressed a log message here, and the
>> immediately following debugging code (933f966bcd x86/mm: add
>> temporary debugging code to get_page_from_gfn_p2m()) was reverted
>> a little over a month later. This wasn't as a follow-up to another patch
>> (fix), but following the discussion rooted at
>> https://lists.xenproject.org/archives/html/xen-devel/2017-06/msg00324.html
> 
> That was -ESRCH, not -EFAULT.

I've looked a little bit more into this.

As we are seeing EFAULT being returned by the hypervisor, this either
means the tools are specifying an invalid address (quite unlikely)
or the buffers are not as MAP_LOCKED as we wish them to be.

Is there a way to see whether the host was experiencing some memory
shortage, so the buffers might have been swapped out?

man mmap tells me: "This implementation will try to populate (prefault)
the whole range but the mmap call doesn't fail with ENOMEM if this
fails. Therefore major faults might happen later on."

And: "One should use mmap(2) plus mlock(2) when major faults are not
acceptable after the initialization of the mapping."
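
I.e. something like the following (just a sketch of what the man page
suggests, not the actual libxencall code):

    #include <string.h>
    #include <sys/mman.h>

    /* Sketch only: allocate memory that should not cause major faults
     * later on. */
    static void *alloc_locked(size_t len)
    {
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKED, -1, 0);
        if (p == MAP_FAILED)
            return NULL;

        /* MAP_LOCKED alone may silently fail to lock everything;
         * mlock() reports failure. */
        if (mlock(p, len)) {
            munmap(p, len);
            return NULL;
        }

        memset(p, 0, len);    /* touch every page up front */
        return p;
    }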

With osdep_alloc_pages() in tools/libs/call/linux.c touching all the
hypercall buffer pages before doing the hypercall I'm not sure this
could be an issue.

Any thoughts on that?


Juergen


* Re: [xen-unstable test] 123379: regressions - FAIL
  2018-06-07 11:30               ` Juergen Gross
@ 2018-06-08 10:12                 ` Juergen Gross
  2018-06-12 15:58                   ` Juergen Gross
  0 siblings, 1 reply; 22+ messages in thread
From: Juergen Gross @ 2018-06-08 10:12 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Wei Liu, Jan Beulich

On 07/06/18 13:30, Juergen Gross wrote:
> On 06/06/18 11:40, Juergen Gross wrote:
>> On 06/06/18 11:35, Jan Beulich wrote:
>>>>>> On 05.06.18 at 18:19, <ian.jackson@citrix.com> wrote:
>>>>>>  test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 14 guest-saverestore.2 
>>>>
>>>> I thought I would reply again with the key point from my earlier mail
>>>> highlighted, and go a bit further.  The first thing to go wrong in
>>>> this was:
>>>>
>>>> 2018-05-30 22:12:49.320+0000: xc: Failed to get types for pfn batch (14 = Bad address): Internal error
>>>> 2018-05-30 22:12:49.483+0000: xc: Save failed (14 = Bad address): Internal error
>>>> 2018-05-30 22:12:49.648+0000: libxl-save-helper: complete r=-1: Bad address
>>>>
>>>> You can see similar messages in the other logfile:
>>>>
>>>> 2018-05-30 22:12:49.650+0000: libxl: libxl_stream_write.c:350:libxl__xc_domain_save_done: Domain 3:saving domain: domain responded to suspend request: Bad address
>>>>
>>>> All of these are reports of the same thing: xc_get_pfn_type_batch at
>>>> xc_sr_save.c:133 failed with EFAULT.  I'm afraid I don't know why.
>>>>
>>>> There is no corresponding message in the host's serial log nor the
>>>> dom0 kernel log.
>>>
>>> I vaguely recall from the time when I had looked at the similar Windows
>>> migration issues that the guest is already in the process of being cleaned
>>> up when these occur. Commit 2dbe9c3cd2 ("x86/mm: silence a pointless
>>> warning") intentionally suppressed a log message here, and the
>>> immediately following debugging code (933f966bcd x86/mm: add
>>> temporary debugging code to get_page_from_gfn_p2m()) was reverted
>>> a little over a month later. This wasn't as a follow-up to another patch
>>> (fix), but following the discussion rooted at
>>> https://lists.xenproject.org/archives/html/xen-devel/2017-06/msg00324.html
>>
>> That was -ESRCH, not -EFAULT.
> 
> I've looked a little bit more into this.
> 
> As we are seeing EFAULT being returned by the hypervisor this either
> means the tools are specifying an invalid address (quite unlikely)
> or the buffers are not as MAP_LOCKED as we wish them to be.
> 
> Is there a way to see whether the host was experiencing some memory
> shortage, so the buffers might have been swapped out?
> 
> man mmap tells me: "This implementation will try to populate (prefault)
> the whole range but the mmap call doesn't fail with ENOMEM if this
> fails. Therefore major faults might happen later on."
> 
> And: "One should use mmap(2) plus mlock(2) when major faults are not
> acceptable after the initialization of the mapping."
> 
> With osdep_alloc_pages() in tools/libs/call/linux.c touching all the
> hypercall buffer pages before doing the hypercall I'm not sure this
> could be an issue.
> 
> Any thoughts on that?

Ian, is there a chance to dedicate a machine to a specific test trying
to reproduce the problem? In case we manage to get this failure in a
reasonable time frame I guess the most promising approach would be to
use a test hypervisor producing more debug data. If you think this is
worth doing I can write a patch.


Juergen


* Ad-hoc test instructions (was Re: [xen-unstable test] 123379: regressions - FAIL)
  2018-06-01  9:08       ` Juergen Gross
  2018-06-05 16:16         ` Ian Jackson
  2018-06-05 16:19         ` Ian Jackson
@ 2018-06-08 14:25         ` Ian Jackson
  2018-06-08 15:42           ` Juergen Gross
  2 siblings, 1 reply; 22+ messages in thread
From: Ian Jackson @ 2018-06-08 14:25 UTC (permalink / raw)
  To: Juergen Gross; +Cc: xen-devel

Apropos of the irc conversation below, in particular my suggestion to
use mg-repro-setup.  Probably, an appropriate rune is

  ./mg-repro-setup -f123855 -Ejgross@suse.com 123855 test-amd64-amd64-xl-qemuu-ovmf-amd64 guest-saverestore.2 host=alloc:'{equiv-albana}'

(You will have wanted to
   git clone ~osstest/testing.git
   cd testing
to make yourself a working tree.  To run pieces of osstest, your
cwd should be the root of the osstest source tree.)

What mg-repro-setup will do is:

 * Make a new flight consisting of one job
     test-amd64-amd64-xl-qemuu-ovmf-amd64
   whose recipe and parameters are copied from the corresponding job
   in 123855 - and which will reuse builds from 123855 too.

 * Try to allocate a host according to '{equiv-albana}', which is an
   expression meaning "any host with the equiv-albana flag set", ie
   one of the (two) albanas.  This will have to wait for a slot,
   but as a command line user you get a highish priority.

 * Run the repro flight, including all the steps up to and including
   the one with testid "guest-saverestore.2".  This will wipe the
   albana machine allocated above and install on it the same
   versions of everything as used for the same job in 123855.

   (You can also ask mg-repro-setup to wipe and reinstall an existing
   host you have already allocated, or to reuse existing host and put
   the relevant Xen and kernel on it but without wiping it.  See the
   usage message.)

 * Email you with a report, comparing the results with 123855

After this, you will still have the relevant albana[01] machine
allocated and you may log into it etc. etc.  If you want to connect to
its serial port,
   ssh -vt serial2.test-lab sympathy -r albanaN
(from your own workstation)

You can power cycle it with
   mg-hosts power albanaN reboot

Also you may use the flight constructed by ./mg-repro-setup for your
own ad-hoc-tests:

  export OSSTEST_FLIGHT=<whatever number you were told by mg-repro-setup>
  export OSSTEST_JOB=test-amd64-amd64-xl-qemuu-ovmf-amd64

  ./ts-guest-saverestore host=albanaN debian

When you are done, you must release the host manually

  mg-allocate ^albanaN

Please ask me on IRC if you have any questions.  There are also docs
in the osstest tree.  HTH.

Ian.


14:48 <juergen_gross> Okay. Already found some interesting samples, e.g. from 
                      4.9 tests. All found up to now on different hosts 
                      (silvana0, chardonnay0, huxelrebe1)
14:49 <juergen_gross> So the chances for some hardware specific bug are rather 
                      slim, other than x86 :-)
14:55 <juergen_gross> Diziet: are most tests now running with dom0 4.14 kernel?
14:55 <Diziet> Basically all the x86 tests that aren't tests of some Linux 
               version.
14:56 <juergen_gross> Okay, so it is no strange coincidents all failures have 
                      been with 4.14. :-) BTW: even 4.6 Xen has the same problem
14:58 <juergen_gross> Is it possible to run ts-guest-saverestore in a loop on a 
                      machine to have an idea how long it takes to reproduce it?
14:58 <Diziet> Yes.
14:58 <juergen_gross> I'll write a hypervisor patch to get more diagnostics then
14:58 <Diziet> The easiest way is probably to use mg-repro-setup to find a 
               host, set everything up, repro the test up to that point, and 
               then run ts-repeat-test by hand in a shell
14:59 <Diziet> You could indeed also install a new hypervisor after 
               mg-repro-setup has done its thing
15:00 <juergen_gross> I guess the best would be if you could send me mail with 
                      either the commands or a hint where to find the needed 
                      info. I should do those tests myself to learn using them, 
                      but I'd appreciate some help to start
15:00 <Diziet> Of course.
15:00 <juergen_gross> Thanks.
15:00 <Diziet> You mentioned having found some other occurrences.
15:00 <juergen_gross> Yes.
15:01 <Diziet> You might want to pick one that has a relatively simple guest 
               (and ideally not a Windows one)
15:01 <Diziet> Can you tell me the flight number and job name from that ?
15:01 <juergen_gross> Flight 123855 test-amd64-amd64-xl-qemuu-ovmf-amd64
15:02 <Diziet> t\z
15:02 <Diziet> ta I mean
...
15:09 <juergen_gross> Shows clearly the same problem
15:09 <Diziet> Indeed


* Re: Ad-hoc test instructions (was Re: [xen-unstable test] 123379: regressions - FAIL)
  2018-06-08 14:25         ` Ad-hoc test instructions (was Re: [xen-unstable test] 123379: regressions - FAIL) Ian Jackson
@ 2018-06-08 15:42           ` Juergen Gross
  0 siblings, 0 replies; 22+ messages in thread
From: Juergen Gross @ 2018-06-08 15:42 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel

On 08/06/18 16:25, Ian Jackson wrote:
> Apropos of the irc conversation below, in particular my suggestion to
> use mg-repro-setup.  Probably, an appropriate rune is
> 
>   ./mg-repro-setup -f123855 -Ejgross@suse.com 123855 test-amd64-amd64-xl-qemuu-ovmf-amd64 guest-saverestore.2 host=alloc:'{equiv-albana}'
> 
> (You will have wanted to
>    git clone ~osstest/testing.git
>    cd testing
> to make yourself a working tree.  To run pieces of osstest, your
> cwd should be the root of the osstest source tree.)

./mg-repro-setup should do a "mkdir -p tmp":

./mg-repro-setup -f123855 -Ejgross@suse.com 123855
test-amd64-amd64-xl-qemuu-ovmf-amd64 guest-saverestore.2
host=alloc:'{equiv-albana}'
logging to tmp/mg-repro-setup.log
touch: cannot touch 'tmp/mg-repro-setup.log': No such file or directory
savelog: could not touch tmp/mg-repro-setup.log
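
(As a workaround, creating the directory by hand before the first run
gets past this particular error, e.g.:

  mkdir -p tmp
  ./mg-repro-setup -f123855 -Ejgross@suse.com 123855 \
      test-amd64-amd64-xl-qemuu-ovmf-amd64 guest-saverestore.2 \
      host=alloc:'{equiv-albana}'

though of course mg-repro-setup should create tmp itself.)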


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [xen-unstable test] 123379: regressions - FAIL
  2018-06-08 10:12                 ` Juergen Gross
@ 2018-06-12 15:58                   ` Juergen Gross
  2018-06-13  6:11                     ` Jan Beulich
                                       ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Juergen Gross @ 2018-06-12 15:58 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Wei Liu, Jan Beulich

On 08/06/18 12:12, Juergen Gross wrote:
> On 07/06/18 13:30, Juergen Gross wrote:
>> On 06/06/18 11:40, Juergen Gross wrote:
>>> On 06/06/18 11:35, Jan Beulich wrote:
>>>>>>> On 05.06.18 at 18:19, <ian.jackson@citrix.com> wrote:
>>>>>>>  test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 14 guest-saverestore.2 
>>>>>
>>>>> I thought I would reply again with the key point from my earlier mail
>>>>> highlighted, and go a bit further.  The first thing to go wrong in
>>>>> this was:
>>>>>
>>>>> 2018-05-30 22:12:49.320+0000: xc: Failed to get types for pfn batch (14 = Bad address): Internal error
>>>>> 2018-05-30 22:12:49.483+0000: xc: Save failed (14 = Bad address): Internal error
>>>>> 2018-05-30 22:12:49.648+0000: libxl-save-helper: complete r=-1: Bad address
>>>>>
>>>>> You can see similar messages in the other logfile:
>>>>>
>>>>> 2018-05-30 22:12:49.650+0000: libxl: libxl_stream_write.c:350:libxl__xc_domain_save_done: Domain 3:saving domain: domain responded to suspend request: Bad address
>>>>>
>>>>> All of these are reports of the same thing: xc_get_pfn_type_batch at
>>>>> xc_sr_save.c:133 failed with EFAULT.  I'm afraid I don't know why.
>>>>>
>>>>> There is no corresponding message in the host's serial log nor the
>>>>> dom0 kernel log.
>>>>
>>>> I vaguely recall from the time when I had looked at the similar Windows
>>>> migration issues that the guest is already in the process of being cleaned
>>>> up when these occur. Commit 2dbe9c3cd2 ("x86/mm: silence a pointless
>>>> warning") intentionally suppressed a log message here, and the
>>>> immediately following debugging code (933f966bcd x86/mm: add
>>>> temporary debugging code to get_page_from_gfn_p2m()) was reverted
>>>> a little over a month later. This wasn't as a follow-up to another patch
>>>> (fix), but following the discussion rooted at
>>>> https://lists.xenproject.org/archives/html/xen-devel/2017-06/msg00324.html
>>>
>>> That was -ESRCH, not -EFAULT.
>>
>> I've looked a little bit more into this.
>>
>> As we are seeing EFAULT being returned by the hypervisor this either
>> means the tools are specifying an invalid address (quite unlikely)
>> or the buffers are not as MAP_LOCKED as we wish them to be.
>>
>> Is there a way to see whether the host was experiencing some memory
>> shortage, so the buffers might have been swapped out?
>>
>> man mmap tells me: "This implementation will try to populate (prefault)
>> the whole range but the mmap call doesn't fail with ENOMEM if this
>> fails. Therefore major faults might happen later on."
>>
>> And: "One should use mmap(2) plus mlock(2) when major faults are not
>> acceptable after the initialization of the mapping."
>>
>> With osdep_alloc_pages() in tools/libs/call/linux.c touching all the
>> hypercall buffer pages before doing the hypercall I'm not sure this
>> could be an issue.
>>
>> Any thoughts on that?
> 
> Ian, is there a chance to dedicate a machine to a specific test trying
> to reproduce the problem? In case we manage to get this failure in a
> reasonable time frame I guess the most promising approach would be to
> use a test hypervisor producing more debug data. If you think this is
> worth doing I can write a patch.

Trying to reproduce the problem in a limited test environment finally
worked: doing a loop of "xl save -c" produced the problem after 198
iterations.
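
For reference, a loop along these lines is all it takes; the guest name
and the checkpoint file below are just placeholders for whatever is set
up on the test host:

  n=0
  while xl save -c <guest-name> /tmp/save.chk; do
    n=$((n+1))
  done
  echo "save failed after $n successful iterations"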

I have asked a SUSE engineer working on kernel memory management
whether he could think of something. His idea is that some kthread
could be the cause of our problem, e.g. one doing page migration or
compaction (at least on the test machine I've looked at, compaction of
mlocked pages is allowed: /proc/sys/vm/compact_unevictable_allowed is 1).
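
For reference, that knob can be inspected and, as an experiment, turned
off like this (as root; whether doing so is enough to keep mlocked pages
from being moved around is exactly what still needs verifying):

  cat /proc/sys/vm/compact_unevictable_allowed
  echo 0 > /proc/sys/vm/compact_unevictable_allowed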

In order to be really sure nothing in the kernel can temporarily
switch hypercall buffer pages to read-only or invalid for the
hypervisor, we'll have to modify the privcmd driver interface: it will
have to learn which pages are handed over to the hypervisor as buffers
so that it can lock them accordingly via get_user_pages().

While this is a possible explanation of the fault we are seeing, it
might also have a different cause. So I'm going to apply some
modifications to the hypervisor to gather more diagnostics, in order
to verify that the suspected kernel behavior really is the reason for
the hypervisor to return EFAULT.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [xen-unstable test] 123379: regressions - FAIL
  2018-06-12 15:58                   ` Juergen Gross
@ 2018-06-13  6:11                     ` Jan Beulich
       [not found]                     ` <5B20B5A602000078001CAACA@suse.com>
  2018-06-13  8:52                     ` Juergen Gross
  2 siblings, 0 replies; 22+ messages in thread
From: Jan Beulich @ 2018-06-13  6:11 UTC (permalink / raw)
  To: Juergen Gross; +Cc: Ian Jackson, Wei Liu, xen-devel

>>> On 12.06.18 at 17:58, <jgross@suse.com> wrote:
> Trying to reproduce the problem in a limited test environment finally
> worked: doing a loop of "xl save -c" produced the problem after 198
> iterations.
> 
> I have asked a SUSE engineer doing kernel memory management if he
> could think of something. His idea is that maybe some kthread could be
> the reason for our problem, e.g. trying page migration or compaction
> (at least on the test machine I've looked at compaction of mlocked
> pages is allowed: /proc/sys/vm/compact_unevictable_allowed is 1).

Iirc the primary goal of compaction is to make contiguous memory
available for huge page allocations. Since PV doesn't use huge pages,
this is of no interest here. The secondary consideration, physically
contiguous I/O buffers, is only an illusion under PV anyway, so perhaps
not of much more interest (albeit I can see drivers wanting to allocate
physically contiguous buffers nevertheless now and then, but I'd
expect this to be mostly limited to driver initialization and device hot
add).

So it is perhaps at least worth considering whether to turn off
compaction/migration when running PV. But the problem would still
need addressing then mid-term, as PVH Dom0 would have the same
issue (and of course DomU, i.e. including HVM, can make hypercalls
too, and hence would be affected as well, just perhaps not as
visibly).

> In order to be really sure nothing in the kernel can temporarily
> switch hypercall buffer pages read-only or invalid for the hypervisor
> we'll have to modify the privcmd driver interface: it will have to
> gain knowledge which pages are handed over to the hypervisor as buffers
> in order to be able to lock them accordingly via get_user_pages().

So are you / is he saying that mlock() doesn't protect against such
playing with process memory? Teaching the privcmd driver about all
the indirections in hypercall request structures doesn't look very
attractive (or maintainable). Or are you thinking of the caller
providing sideband information describing the buffers involved,
perhaps along the lines of how dm_op was designed?

There's another option, but that has potentially severe drawbacks
too: Instead of returning -EFAULT on buffer access issues, we
could raise #PF on the very hypercall insn. Maybe something to
consider as an opt-in for PV/PVH, and as default for HVM.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [xen-unstable test] 123379: regressions - FAIL
       [not found]                     ` <5B20B5A602000078001CAACA@suse.com>
@ 2018-06-13  6:50                       ` Juergen Gross
  2018-06-13  7:21                         ` Jan Beulich
       [not found]                         ` <5B20C5E002000078001CAB80@suse.com>
  0 siblings, 2 replies; 22+ messages in thread
From: Juergen Gross @ 2018-06-13  6:50 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Ian Jackson, Wei Liu, xen-devel

On 13/06/18 08:11, Jan Beulich wrote:
>>>> On 12.06.18 at 17:58, <jgross@suse.com> wrote:
>> Trying to reproduce the problem in a limited test environment finally
>> worked: doing a loop of "xl save -c" produced the problem after 198
>> iterations.
>>
>> I have asked a SUSE engineer doing kernel memory management if he
>> could think of something. His idea is that maybe some kthread could be
>> the reason for our problem, e.g. trying page migration or compaction
>> (at least on the test machine I've looked at compaction of mlocked
>> pages is allowed: /proc/sys/vm/compact_unevictable_allowed is 1).
> 
> Iirc the primary goal of compaction is to make contiguous memory
> available for huge page allocations. PV not using huge pages, this is
> of no interest here. The secondary consideration of physically
> contiguous I/O buffer is an illusion only under PV, so perhaps not
> much more of an interest (albeit I can see drivers wanting to allocate
> physically contiguous buffers nevertheless now and then, but I'd
> expect this to be mostly limited to driver initialization and device hot
> add).
> 
> So it is perhaps at least worth considering whether to turn off
> compaction/migration when running PV. But the problem would still
> need addressing then mid-term, as PVH Dom0 would have the same
> issue (and of course DomU, i.e. including HVM, can make hypercalls
> too, and hence would be affected as well, just perhaps not as
> visibly).

I think we should try to solve the problem by being aware of such
possibilities. Another potential source would be NUMA memory
migration (not an issue for PV right now, of course). And who knows
what will come along in the coming years.

> 
>> In order to be really sure nothing in the kernel can temporarily
>> switch hypercall buffer pages read-only or invalid for the hypervisor
>> we'll have to modify the privcmd driver interface: it will have to
>> gain knowledge which pages are handed over to the hypervisor as buffers
>> in order to be able to lock them accordingly via get_user_pages().
> 
> So are you / is he saying that mlock() doesn't protect against such
> playing with process memory?

Right. Due to proper locking in the kernel, this is just a guarantee
that you won't ever see a fault for such a page in user mode.

> Teaching the privcmd driver of all
> the indirections in hypercall request structures doesn't look very
> attractive (or maintainable). Or are you thinking of the caller
> providing sideband information describing the buffers involved,
> perhaps along the lines of how dm_op was designed?

I thought about that, yes. libxencall already has all the needed data
for that. Another possibility would be a dedicated ioctl for registering
a hypercall buffer (or some of them).

> There's another option, but that has potentially severe drawbacks
> too: Instead of returning -EFAULT on buffer access issues, we
> could raise #PF on the very hypercall insn. Maybe something to
> consider as an opt-in for PV/PVH, and as default for HVM.

Hmm, I'm not sure this will solve any problem. I'm not aware that it
is considered good practice to just access a user buffer from kernel
without using copyin()/copyout() when you haven't locked the page(s)
via get_user_pages(), even if the buffer was mlock()ed. Returning
-EFAULT is the right thing to do, I believe.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [xen-unstable test] 123379: regressions - FAIL
  2018-06-13  6:50                       ` Juergen Gross
@ 2018-06-13  7:21                         ` Jan Beulich
       [not found]                         ` <5B20C5E002000078001CAB80@suse.com>
  1 sibling, 0 replies; 22+ messages in thread
From: Jan Beulich @ 2018-06-13  7:21 UTC (permalink / raw)
  To: Juergen Gross; +Cc: Ian Jackson, Wei Liu, xen-devel

>>> On 13.06.18 at 08:50, <jgross@suse.com> wrote:
> On 13/06/18 08:11, Jan Beulich wrote:
>> Teaching the privcmd driver of all
>> the indirections in hypercall request structures doesn't look very
>> attractive (or maintainable). Or are you thinking of the caller
>> providing sideband information describing the buffers involved,
>> perhaps along the lines of how dm_op was designed?
> 
> I thought about that, yes. libxencall already has all the needed data
> for that. Another possibility would be a dedicated ioctl for registering
> a hypercall buffer (or some of them).

I'm not sure that's an option: Is it legitimate (secure) to retain the
effects of get_user_pages() across system calls?

>> There's another option, but that has potentially severe drawbacks
>> too: Instead of returning -EFAULT on buffer access issues, we
>> could raise #PF on the very hypercall insn. Maybe something to
>> consider as an opt-in for PV/PVH, and as default for HVM.
> 
> Hmm, I'm not sure this will solve any problem. I'm not aware that it
> is considered good practice to just access a user buffer from kernel
> without using copyin()/copyout() when you haven't locked the page(s)
> via get_user_pages(), even if the buffer was mlock()ed. Returning
> -EFAULT is the right thing to do, I believe.

But we're talking about the very copyin()/copyout(), just that here
it's being amortized by doing the operation just once (in the
hypervisor). A #PF would arise from syscall buffer copyin()/copyout(),
and the suggestion was to produce the same effect for the squashed
operation. Perhaps we wouldn't want #PF to come back from ordinary
(kernel invoked) hypercalls, but ones relayed by privcmd are different
in many ways anyway (see the stac()/clac() pair around the actual
call, for example).

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [xen-unstable test] 123379: regressions - FAIL
       [not found]                         ` <5B20C5E002000078001CAB80@suse.com>
@ 2018-06-13  7:57                           ` Juergen Gross
  0 siblings, 0 replies; 22+ messages in thread
From: Juergen Gross @ 2018-06-13  7:57 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Ian Jackson, Wei Liu, xen-devel

On 13/06/18 09:21, Jan Beulich wrote:
>>>> On 13.06.18 at 08:50, <jgross@suse.com> wrote:
>> On 13/06/18 08:11, Jan Beulich wrote:
>>> Teaching the privcmd driver of all
>>> the indirections in hypercall request structures doesn't look very
>>> attractive (or maintainable). Or are you thinking of the caller
>>> providing sideband information describing the buffers involved,
>>> perhaps along the lines of how dm_op was designed?
>>
>> I thought about that, yes. libxencall already has all the needed data
>> for that. Another possibility would be a dedicated ioctl for registering
>> a hypercall buffer (or some of them).
> 
> I'm not sure that's an option: Is it legitimate (secure) to retain the
> effects of get_user_pages() across system calls?

I have to check that.

>>> There's another option, but that has potentially severe drawbacks
>>> too: Instead of returning -EFAULT on buffer access issues, we
>>> could raise #PF on the very hypercall insn. Maybe something to
>>> consider as an opt-in for PV/PVH, and as default for HVM.
>>
>> Hmm, I'm not sure this will solve any problem. I'm not aware that it
>> is considered good practice to just access a user buffer from kernel
>> without using copyin()/copyout() when you haven't locked the page(s)
>> via get_user_pages(), even if the buffer was mlock()ed. Returning
>> -EFAULT is the right thing to do, I believe.
> 
> But we're talking about the very copyin()/copyout(), just that here
> it's being amortized by doing the operation just once (in the
> hypervisor). A #PF would arise from syscall buffer copyin()/copyout(),
> and the suggestion was to produce the same effect for the squashed
> operation. Perhaps we wouldn't want #PF to come back from ordinary
> (kernel invoked) hypercalls, but ones relayed by privcmd are different
> in many ways anyway (see the stac()/clac() pair around the actual
> call, for example).

Aah, okay. This is an option, but it would require some kind of
interface to tell the hypervisor it should raise the #PF instead of
returning -EFAULT, of course, as the kernel has to be prepared for
that.

I like that idea very much!


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [xen-unstable test] 123379: regressions - FAIL
  2018-06-12 15:58                   ` Juergen Gross
  2018-06-13  6:11                     ` Jan Beulich
       [not found]                     ` <5B20B5A602000078001CAACA@suse.com>
@ 2018-06-13  8:52                     ` Juergen Gross
  2018-06-13  8:58                       ` Andrew Cooper
  2 siblings, 1 reply; 22+ messages in thread
From: Juergen Gross @ 2018-06-13  8:52 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Wei Liu, Jan Beulich

On 12/06/18 17:58, Juergen Gross wrote:
> On 08/06/18 12:12, Juergen Gross wrote:
>> On 07/06/18 13:30, Juergen Gross wrote:
>>> On 06/06/18 11:40, Juergen Gross wrote:
>>>> On 06/06/18 11:35, Jan Beulich wrote:
>>>>>>>> On 05.06.18 at 18:19, <ian.jackson@citrix.com> wrote:
>>>>>>>>  test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 14 guest-saverestore.2 
>>>>>>
>>>>>> I thought I would reply again with the key point from my earlier mail
>>>>>> highlighted, and go a bit further.  The first thing to go wrong in
>>>>>> this was:
>>>>>>
>>>>>> 2018-05-30 22:12:49.320+0000: xc: Failed to get types for pfn batch (14 = Bad address): Internal error
>>>>>> 2018-05-30 22:12:49.483+0000: xc: Save failed (14 = Bad address): Internal error
>>>>>> 2018-05-30 22:12:49.648+0000: libxl-save-helper: complete r=-1: Bad address
>>>>>>
>>>>>> You can see similar messages in the other logfile:
>>>>>>
>>>>>> 2018-05-30 22:12:49.650+0000: libxl: libxl_stream_write.c:350:libxl__xc_domain_save_done: Domain 3:saving domain: domain responded to suspend request: Bad address
>>>>>>
>>>>>> All of these are reports of the same thing: xc_get_pfn_type_batch at
>>>>>> xc_sr_save.c:133 failed with EFAULT.  I'm afraid I don't know why.
>>>>>>
>>>>>> There is no corresponding message in the host's serial log nor the
>>>>>> dom0 kernel log.
>>>>>
>>>>> I vaguely recall from the time when I had looked at the similar Windows
>>>>> migration issues that the guest is already in the process of being cleaned
>>>>> up when these occur. Commit 2dbe9c3cd2 ("x86/mm: silence a pointless
>>>>> warning") intentionally suppressed a log message here, and the
>>>>> immediately following debugging code (933f966bcd x86/mm: add
>>>>> temporary debugging code to get_page_from_gfn_p2m()) was reverted
>>>>> a little over a month later. This wasn't as a follow-up to another patch
>>>>> (fix), but following the discussion rooted at
>>>>> https://lists.xenproject.org/archives/html/xen-devel/2017-06/msg00324.html
>>>>
>>>> That was -ESRCH, not -EFAULT.
>>>
>>> I've looked a little bit more into this.
>>>
>>> As we are seeing EFAULT being returned by the hypervisor this either
>>> means the tools are specifying an invalid address (quite unlikely)
>>> or the buffers are not as MAP_LOCKED as we wish them to be.
>>>
>>> Is there a way to see whether the host was experiencing some memory
>>> shortage, so the buffers might have been swapped out?
>>>
>>> man mmap tells me: "This implementation will try to populate (prefault)
>>> the whole range but the mmap call doesn't fail with ENOMEM if this
>>> fails. Therefore major faults might happen later on."
>>>
>>> And: "One should use mmap(2) plus mlock(2) when major faults are not
>>> acceptable after the initialization of the mapping."
>>>
>>> With osdep_alloc_pages() in tools/libs/call/linux.c touching all the
>>> hypercall buffer pages before doing the hypercall I'm not sure this
>>> could be an issue.
>>>
>>> Any thoughts on that?
>>
>> Ian, is there a chance to dedicate a machine to a specific test trying
>> to reproduce the problem? In case we manage to get this failure in a
>> reasonable time frame I guess the most promising approach would be to
>> use a test hypervisor producing more debug data. If you think this is
>> worth doing I can write a patch.
> 
> Trying to reproduce the problem in a limited test environment finally
> worked: doing a loop of "xl save -c" produced the problem after 198
> iterations.
> 
> I have asked a SUSE engineer doing kernel memory management if he
> could think of something. His idea is that maybe some kthread could be
> the reason for our problem, e.g. trying page migration or compaction
> (at least on the test machine I've looked at compaction of mlocked
> pages is allowed: /proc/sys/vm/compact_unevictable_allowed is 1).
> 
> In order to be really sure nothing in the kernel can temporarily
> switch hypercall buffer pages read-only or invalid for the hypervisor
> we'll have to modify the privcmd driver interface: it will have to
> gain knowledge which pages are handed over to the hypervisor as buffers
> in order to be able to lock them accordingly via get_user_pages().
> 
> While this is a possible explanation of the fault we are seeing it might
> be related to another reason. So I'm going to apply some modifications
> to the hypervisor to get some more diagnostics in order to verify the
> suspected kernel behavior is really the reason for the hypervisor to
> return EFAULT.

I was lucky. Took only 39 iterations this time.

The debug data confirms the theory that the kernel is setting the PTE to
invalid or read only for a short amount of time:

(XEN) fixup for address 00007ffb9904fe44, error_code 0002:
(XEN) Pagetable walk from 00007ffb9904fe44:
(XEN)  L4[0x0ff] = 0000000458da6067 0000000000019190
(XEN)  L3[0x1ee] = 0000000457d26067 0000000000018210
(XEN)  L2[0x0c8] = 0000000445ab3067 0000000000006083
(XEN)  L1[0x04f] = 8000000458cdc107 000000000001925a
(XEN) Xen call trace:
(XEN)    [<ffff82d0802abe31>] __copy_to_user_ll+0x27/0x30
(XEN)    [<ffff82d080272edb>] arch_do_domctl+0x5a8/0x2648
(XEN)    [<ffff82d080206d5d>] do_domctl+0x18fb/0x1c4e
(XEN)    [<ffff82d08036d1ba>] pv_hypercall+0x1f4/0x43e
(XEN)    [<ffff82d0803734a6>] lstar_enter+0x116/0x120

The page was writable again by the time the page walk data was
collected, but the A and D bits were still 0 (which should not be the
case if the kernel hadn't touched the PTE, as the hypervisor had read
from that page a few instructions before the failed write).

Starting with the Xen patches now...


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [xen-unstable test] 123379: regressions - FAIL
  2018-06-13  8:52                     ` Juergen Gross
@ 2018-06-13  8:58                       ` Andrew Cooper
  2018-06-13  9:02                         ` Juergen Gross
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Cooper @ 2018-06-13  8:58 UTC (permalink / raw)
  To: Juergen Gross, Ian Jackson; +Cc: xen-devel, Wei Liu, Jan Beulich

On 13/06/18 09:52, Juergen Gross wrote:
> On 12/06/18 17:58, Juergen Gross wrote:
>> On 08/06/18 12:12, Juergen Gross wrote:
>>> On 07/06/18 13:30, Juergen Gross wrote:
>>>> On 06/06/18 11:40, Juergen Gross wrote:
>>>>> On 06/06/18 11:35, Jan Beulich wrote:
>>>>>>>>> On 05.06.18 at 18:19, <ian.jackson@citrix.com> wrote:
>>>>>>>>>  test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 14 guest-saverestore.2 
>>>>>>> I thought I would reply again with the key point from my earlier mail
>>>>>>> highlighted, and go a bit further.  The first thing to go wrong in
>>>>>>> this was:
>>>>>>>
>>>>>>> 2018-05-30 22:12:49.320+0000: xc: Failed to get types for pfn batch (14 = Bad address): Internal error
>>>>>>> 2018-05-30 22:12:49.483+0000: xc: Save failed (14 = Bad address): Internal error
>>>>>>> 2018-05-30 22:12:49.648+0000: libxl-save-helper: complete r=-1: Bad address
>>>>>>>
>>>>>>> You can see similar messages in the other logfile:
>>>>>>>
>>>>>>> 2018-05-30 22:12:49.650+0000: libxl: libxl_stream_write.c:350:libxl__xc_domain_save_done: Domain 3:saving domain: domain responded to suspend request: Bad address
>>>>>>>
>>>>>>> All of these are reports of the same thing: xc_get_pfn_type_batch at
>>>>>>> xc_sr_save.c:133 failed with EFAULT.  I'm afraid I don't know why.
>>>>>>>
>>>>>>> There is no corresponding message in the host's serial log nor the
>>>>>>> dom0 kernel log.
>>>>>> I vaguely recall from the time when I had looked at the similar Windows
>>>>>> migration issues that the guest is already in the process of being cleaned
>>>>>> up when these occur. Commit 2dbe9c3cd2 ("x86/mm: silence a pointless
>>>>>> warning") intentionally suppressed a log message here, and the
>>>>>> immediately following debugging code (933f966bcd x86/mm: add
>>>>>> temporary debugging code to get_page_from_gfn_p2m()) was reverted
>>>>>> a little over a month later. This wasn't as a follow-up to another patch
>>>>>> (fix), but following the discussion rooted at
>>>>>> https://lists.xenproject.org/archives/html/xen-devel/2017-06/msg00324.html
>>>>> That was -ESRCH, not -EFAULT.
>>>> I've looked a little bit more into this.
>>>>
>>>> As we are seeing EFAULT being returned by the hypervisor this either
>>>> means the tools are specifying an invalid address (quite unlikely)
>>>> or the buffers are not as MAP_LOCKED as we wish them to be.
>>>>
>>>> Is there a way to see whether the host was experiencing some memory
>>>> shortage, so the buffers might have been swapped out?
>>>>
>>>> man mmap tells me: "This implementation will try to populate (prefault)
>>>> the whole range but the mmap call doesn't fail with ENOMEM if this
>>>> fails. Therefore major faults might happen later on."
>>>>
>>>> And: "One should use mmap(2) plus mlock(2) when major faults are not
>>>> acceptable after the initialization of the mapping."
>>>>
>>>> With osdep_alloc_pages() in tools/libs/call/linux.c touching all the
>>>> hypercall buffer pages before doing the hypercall I'm not sure this
>>>> could be an issue.
>>>>
>>>> Any thoughts on that?
>>> Ian, is there a chance to dedicate a machine to a specific test trying
>>> to reproduce the problem? In case we manage to get this failure in a
>>> reasonable time frame I guess the most promising approach would be to
>>> use a test hypervisor producing more debug data. If you think this is
>>> worth doing I can write a patch.
>> Trying to reproduce the problem in a limited test environment finally
>> worked: doing a loop of "xl save -c" produced the problem after 198
>> iterations.
>>
>> I have asked a SUSE engineer doing kernel memory management if he
>> could think of something. His idea is that maybe some kthread could be
>> the reason for our problem, e.g. trying page migration or compaction
>> (at least on the test machine I've looked at compaction of mlocked
>> pages is allowed: /proc/sys/vm/compact_unevictable_allowed is 1).
>>
>> In order to be really sure nothing in the kernel can temporarily
>> switch hypercall buffer pages read-only or invalid for the hypervisor
>> we'll have to modify the privcmd driver interface: it will have to
>> gain knowledge which pages are handed over to the hypervisor as buffers
>> in order to be able to lock them accordingly via get_user_pages().
>>
>> While this is a possible explanation of the fault we are seeing it might
>> be related to another reason. So I'm going to apply some modifications
>> to the hypervisor to get some more diagnostics in order to verify the
>> suspected kernel behavior is really the reason for the hypervisor to
>> return EFAULT.
> I was lucky. Took only 39 iterations this time.
>
> The debug data confirms the theory that the kernel is setting the PTE to
> invalid or read only for a short amount of time:
>
> (XEN) fixup for address 00007ffb9904fe44, error_code 0002:
> (XEN) Pagetable walk from 00007ffb9904fe44:
> (XEN)  L4[0x0ff] = 0000000458da6067 0000000000019190
> (XEN)  L3[0x1ee] = 0000000457d26067 0000000000018210
> (XEN)  L2[0x0c8] = 0000000445ab3067 0000000000006083
> (XEN)  L1[0x04f] = 8000000458cdc107 000000000001925a
> (XEN) Xen call trace:
> (XEN)    [<ffff82d0802abe31>] __copy_to_user_ll+0x27/0x30
> (XEN)    [<ffff82d080272edb>] arch_do_domctl+0x5a8/0x2648
> (XEN)    [<ffff82d080206d5d>] do_domctl+0x18fb/0x1c4e
> (XEN)    [<ffff82d08036d1ba>] pv_hypercall+0x1f4/0x43e
> (XEN)    [<ffff82d0803734a6>] lstar_enter+0x116/0x120
>
> The page was writable again when the page walk data has been collected,
> but A and D bits still are 0 (which should not be the case in case the
> kernel didn't touch the PTE, as the hypervisor read from that page some
> instructions before the failed write).
>
> Starting with the Xen patches now...

Given that walk, I'd expect the spurious pagefault logic to have kicked
in, and retried.

Presumably the spurious walk logic saw the non-present/read-only mappings?

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [xen-unstable test] 123379: regressions - FAIL
  2018-06-13  8:58                       ` Andrew Cooper
@ 2018-06-13  9:02                         ` Juergen Gross
  0 siblings, 0 replies; 22+ messages in thread
From: Juergen Gross @ 2018-06-13  9:02 UTC (permalink / raw)
  To: Andrew Cooper, Ian Jackson; +Cc: xen-devel, Wei Liu, Jan Beulich

On 13/06/18 10:58, Andrew Cooper wrote:
> On 13/06/18 09:52, Juergen Gross wrote:
>> On 12/06/18 17:58, Juergen Gross wrote:
>>> On 08/06/18 12:12, Juergen Gross wrote:
>>>> On 07/06/18 13:30, Juergen Gross wrote:
>>>>> On 06/06/18 11:40, Juergen Gross wrote:
>>>>>> On 06/06/18 11:35, Jan Beulich wrote:
>>>>>>>>>> On 05.06.18 at 18:19, <ian.jackson@citrix.com> wrote:
>>>>>>>>>>  test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 14 guest-saverestore.2 
>>>>>>>> I thought I would reply again with the key point from my earlier mail
>>>>>>>> highlighted, and go a bit further.  The first thing to go wrong in
>>>>>>>> this was:
>>>>>>>>
>>>>>>>> 2018-05-30 22:12:49.320+0000: xc: Failed to get types for pfn batch (14 = Bad address): Internal error
>>>>>>>> 2018-05-30 22:12:49.483+0000: xc: Save failed (14 = Bad address): Internal error
>>>>>>>> 2018-05-30 22:12:49.648+0000: libxl-save-helper: complete r=-1: Bad address
>>>>>>>>
>>>>>>>> You can see similar messages in the other logfile:
>>>>>>>>
>>>>>>>> 2018-05-30 22:12:49.650+0000: libxl: libxl_stream_write.c:350:libxl__xc_domain_save_done: Domain 3:saving domain: domain responded to suspend request: Bad address
>>>>>>>>
>>>>>>>> All of these are reports of the same thing: xc_get_pfn_type_batch at
>>>>>>>> xc_sr_save.c:133 failed with EFAULT.  I'm afraid I don't know why.
>>>>>>>>
>>>>>>>> There is no corresponding message in the host's serial log nor the
>>>>>>>> dom0 kernel log.
>>>>>>> I vaguely recall from the time when I had looked at the similar Windows
>>>>>>> migration issues that the guest is already in the process of being cleaned
>>>>>>> up when these occur. Commit 2dbe9c3cd2 ("x86/mm: silence a pointless
>>>>>>> warning") intentionally suppressed a log message here, and the
>>>>>>> immediately following debugging code (933f966bcd x86/mm: add
>>>>>>> temporary debugging code to get_page_from_gfn_p2m()) was reverted
>>>>>>> a little over a month later. This wasn't as a follow-up to another patch
>>>>>>> (fix), but following the discussion rooted at
>>>>>>> https://lists.xenproject.org/archives/html/xen-devel/2017-06/msg00324.html
>>>>>> That was -ESRCH, not -EFAULT.
>>>>> I've looked a little bit more into this.
>>>>>
>>>>> As we are seeing EFAULT being returned by the hypervisor this either
>>>>> means the tools are specifying an invalid address (quite unlikely)
>>>>> or the buffers are not as MAP_LOCKED as we wish them to be.
>>>>>
>>>>> Is there a way to see whether the host was experiencing some memory
>>>>> shortage, so the buffers might have been swapped out?
>>>>>
>>>>> man mmap tells me: "This implementation will try to populate (prefault)
>>>>> the whole range but the mmap call doesn't fail with ENOMEM if this
>>>>> fails. Therefore major faults might happen later on."
>>>>>
>>>>> And: "One should use mmap(2) plus mlock(2) when major faults are not
>>>>> acceptable after the initialization of the mapping."
>>>>>
>>>>> With osdep_alloc_pages() in tools/libs/call/linux.c touching all the
>>>>> hypercall buffer pages before doing the hypercall I'm not sure this
>>>>> could be an issue.
>>>>>
>>>>> Any thoughts on that?
>>>> Ian, is there a chance to dedicate a machine to a specific test trying
>>>> to reproduce the problem? In case we manage to get this failure in a
>>>> reasonable time frame I guess the most promising approach would be to
>>>> use a test hypervisor producing more debug data. If you think this is
>>>> worth doing I can write a patch.
>>> Trying to reproduce the problem in a limited test environment finally
>>> worked: doing a loop of "xl save -c" produced the problem after 198
>>> iterations.
>>>
>>> I have asked a SUSE engineer doing kernel memory management if he
>>> could think of something. His idea is that maybe some kthread could be
>>> the reason for our problem, e.g. trying page migration or compaction
>>> (at least on the test machine I've looked at compaction of mlocked
>>> pages is allowed: /proc/sys/vm/compact_unevictable_allowed is 1).
>>>
>>> In order to be really sure nothing in the kernel can temporarily
>>> switch hypercall buffer pages read-only or invalid for the hypervisor
>>> we'll have to modify the privcmd driver interface: it will have to
>>> gain knowledge which pages are handed over to the hypervisor as buffers
>>> in order to be able to lock them accordingly via get_user_pages().
>>>
>>> While this is a possible explanation of the fault we are seeing it might
>>> be related to another reason. So I'm going to apply some modifications
>>> to the hypervisor to get some more diagnostics in order to verify the
>>> suspected kernel behavior is really the reason for the hypervisor to
>>> return EFAULT.
>> I was lucky. Took only 39 iterations this time.
>>
>> The debug data confirms the theory that the kernel is setting the PTE to
>> invalid or read only for a short amount of time:
>>
>> (XEN) fixup for address 00007ffb9904fe44, error_code 0002:
>> (XEN) Pagetable walk from 00007ffb9904fe44:
>> (XEN)  L4[0x0ff] = 0000000458da6067 0000000000019190
>> (XEN)  L3[0x1ee] = 0000000457d26067 0000000000018210
>> (XEN)  L2[0x0c8] = 0000000445ab3067 0000000000006083
>> (XEN)  L1[0x04f] = 8000000458cdc107 000000000001925a
>> (XEN) Xen call trace:
>> (XEN)    [<ffff82d0802abe31>] __copy_to_user_ll+0x27/0x30
>> (XEN)    [<ffff82d080272edb>] arch_do_domctl+0x5a8/0x2648
>> (XEN)    [<ffff82d080206d5d>] do_domctl+0x18fb/0x1c4e
>> (XEN)    [<ffff82d08036d1ba>] pv_hypercall+0x1f4/0x43e
>> (XEN)    [<ffff82d0803734a6>] lstar_enter+0x116/0x120
>>
>> The page was writable again when the page walk data has been collected,
>> but A and D bits still are 0 (which should not be the case in case the
>> kernel didn't touch the PTE, as the hypervisor read from that page some
>> instructions before the failed write).
>>
>> Starting with the Xen patches now...
> 
> Given that walk, I'd expect the spurious pagefault logic to have kicked
> in, and retried.
> 
> Presumably the spurious walk logic saw the non-present/read-only mappings?

I guess so.

Otherwise my debug code wouldn't have been called...


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2018-06-13  9:02 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-31  6:00 [xen-unstable test] 123379: regressions - FAIL osstest service owner
2018-05-31  8:32 ` Juergen Gross
2018-05-31  9:14   ` Juergen Gross
2018-06-01  8:10     ` Jan Beulich
2018-06-01  9:08       ` Juergen Gross
2018-06-05 16:16         ` Ian Jackson
2018-06-06  7:39           ` Juergen Gross
2018-06-05 16:19         ` Ian Jackson
2018-06-06  9:35           ` Jan Beulich
     [not found]           ` <5B17AAE102000078001C8972@suse.com>
2018-06-06  9:40             ` Juergen Gross
2018-06-07 11:30               ` Juergen Gross
2018-06-08 10:12                 ` Juergen Gross
2018-06-12 15:58                   ` Juergen Gross
2018-06-13  6:11                     ` Jan Beulich
     [not found]                     ` <5B20B5A602000078001CAACA@suse.com>
2018-06-13  6:50                       ` Juergen Gross
2018-06-13  7:21                         ` Jan Beulich
     [not found]                         ` <5B20C5E002000078001CAB80@suse.com>
2018-06-13  7:57                           ` Juergen Gross
2018-06-13  8:52                     ` Juergen Gross
2018-06-13  8:58                       ` Andrew Cooper
2018-06-13  9:02                         ` Juergen Gross
2018-06-08 14:25         ` Ad-hoc test instructions (was Re: [xen-unstable test] 123379: regressions - FAIL) Ian Jackson
2018-06-08 15:42           ` Juergen Gross
