xen-devel.lists.xenproject.org archive mirror
* [xen-unstable test] 110009: regressions - FAIL
@ 2017-06-05 16:55 osstest service owner
  2017-06-06 12:59 ` Jan Beulich
  0 siblings, 1 reply; 14+ messages in thread
From: osstest service owner @ 2017-06-05 16:55 UTC (permalink / raw)
  To: xen-devel, osstest-admin


flight 110009 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/110009/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemut-win7-amd64 15 guest-localmigrate/x10 fail REGR. vs. 109841

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemut-win7-amd64 15 guest-localmigrate/x10 fail like 109803
 test-armhf-armhf-libvirt     13 saverestore-support-check    fail  like 109828
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop            fail like 109841
 test-armhf-armhf-libvirt-xsm 13 saverestore-support-check    fail  like 109841
 test-armhf-armhf-xl-rtds     15 guest-start/debian.repeat    fail  like 109841
 test-amd64-amd64-xl-rtds      9 debian-install               fail  like 109841
 test-armhf-armhf-libvirt-raw 12 saverestore-support-check    fail  like 109841
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop             fail like 109841
 test-amd64-amd64-xl-qemut-ws16-amd64  9 windows-install        fail never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64  9 windows-install        fail never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-check        fail   never pass
 test-amd64-i386-libvirt      12 migrate-support-check        fail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-credit2  12 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-credit2  13 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-xsm      12 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      13 saverestore-support-check    fail   never pass
 test-arm64-arm64-libvirt-xsm 12 migrate-support-check        fail   never pass
 test-arm64-arm64-libvirt-xsm 13 saverestore-support-check    fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
 test-arm64-arm64-xl          12 migrate-support-check        fail   never pass
 test-arm64-arm64-xl          13 saverestore-support-check    fail   never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-check        fail   never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-check        fail  never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-check    fail  never pass
 test-armhf-armhf-libvirt     12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          13 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-check        fail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-check    fail never pass
 test-armhf-armhf-xl-xsm      12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-xsm      13 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-check    fail   never pass
 test-amd64-amd64-libvirt     12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-rtds     12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-rtds     13 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-vhd      11 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-vhd      12 saverestore-support-check    fail   never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-check    fail   never pass
 test-armhf-armhf-libvirt-raw 11 migrate-support-check        fail   never pass
 test-amd64-i386-xl-qemut-win10-i386  9 windows-install         fail never pass
 test-amd64-i386-xl-qemuu-win10-i386  9 windows-install         fail never pass
 test-amd64-amd64-xl-qemuu-win10-i386  9 windows-install        fail never pass
 test-amd64-i386-xl-qemuu-ws16-amd64  9 windows-install         fail never pass
 test-amd64-i386-xl-qemut-ws16-amd64  9 windows-install         fail never pass
 test-amd64-amd64-xl-qemut-win10-i386  9 windows-install        fail never pass

version targeted for testing:
 xen                  d8eed4021d50eb48ca75c8559aed95a2ad74afaa
baseline version:
 xen                  876800d5f9de8b15355172794cb82f505dd26e18

Last test of basis   109841  2017-05-30 02:02:16 Z    6 days
Failing since        109866  2017-05-30 19:48:42 Z    5 days    7 attempts
Testing same since   109957  2017-06-03 10:00:05 Z    2 days    4 attempts

------------------------------------------------------------
People who touched revisions under test:
  Andrew Cooper <andrew.cooper3@citrix.com>
  Armando Vega <armando@greenhost.nl>
  Borislav Petkov <bp@suse.de>
  George Dunlap <george.dunlap@eu.citrix.com>
  Gregory Herrero <gregory.herrero@oracle.com>
  Haozhong Zhang <haozhong.zhang@intel.com>
  Ian Jackson <Ian.Jackson@eu.citrix.com>
  Jan Beulich <jbeulich@suse.com>
  Julien Grall <julien.grall@arm.com>
  Kevin Tian <kevin.tian@intel.com>
  Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  Luwei Kang <luwei.kang@intel.com>
  Roger Pau Monné <roger.pau@citrix.com>
  Swapnil Paratey <swapnil.paratey@amd.com>
  Wei Liu <wei.liu2@citrix.com>
  Zhang Bo <oscar.zhangbo@huawei.com>

jobs:
 build-amd64-xsm                                              pass    
 build-arm64-xsm                                              pass    
 build-armhf-xsm                                              pass    
 build-i386-xsm                                               pass    
 build-amd64-xtf                                              pass    
 build-amd64                                                  pass    
 build-arm64                                                  pass    
 build-armhf                                                  pass    
 build-i386                                                   pass    
 build-amd64-libvirt                                          pass    
 build-arm64-libvirt                                          pass    
 build-armhf-libvirt                                          pass    
 build-i386-libvirt                                           pass    
 build-amd64-oldkern                                          pass    
 build-i386-oldkern                                           pass    
 build-amd64-prev                                             pass    
 build-i386-prev                                              pass    
 build-amd64-pvops                                            pass    
 build-arm64-pvops                                            pass    
 build-armhf-pvops                                            pass    
 build-i386-pvops                                             pass    
 build-amd64-rumprun                                          pass    
 build-i386-rumprun                                           pass    
 test-xtf-amd64-amd64-1                                       pass    
 test-xtf-amd64-amd64-2                                       pass    
 test-xtf-amd64-amd64-3                                       pass    
 test-xtf-amd64-amd64-4                                       pass    
 test-xtf-amd64-amd64-5                                       pass    
 test-amd64-amd64-xl                                          pass    
 test-arm64-arm64-xl                                          pass    
 test-armhf-armhf-xl                                          pass    
 test-amd64-i386-xl                                           pass    
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm                pass    
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm                 pass    
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm           pass    
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm            pass    
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm                pass    
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm                 pass    
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm        pass    
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm         pass    
 test-amd64-amd64-libvirt-xsm                                 pass    
 test-arm64-arm64-libvirt-xsm                                 pass    
 test-armhf-armhf-libvirt-xsm                                 pass    
 test-amd64-i386-libvirt-xsm                                  pass    
 test-amd64-amd64-xl-xsm                                      pass    
 test-arm64-arm64-xl-xsm                                      pass    
 test-armhf-armhf-xl-xsm                                      pass    
 test-amd64-i386-xl-xsm                                       pass    
 test-amd64-amd64-qemuu-nested-amd                            fail    
 test-amd64-amd64-xl-pvh-amd                                  pass    
 test-amd64-i386-qemut-rhel6hvm-amd                           pass    
 test-amd64-i386-qemuu-rhel6hvm-amd                           pass    
 test-amd64-amd64-xl-qemut-debianhvm-amd64                    pass    
 test-amd64-i386-xl-qemut-debianhvm-amd64                     pass    
 test-amd64-amd64-xl-qemuu-debianhvm-amd64                    pass    
 test-amd64-i386-xl-qemuu-debianhvm-amd64                     pass    
 test-amd64-i386-freebsd10-amd64                              pass    
 test-amd64-amd64-xl-qemuu-ovmf-amd64                         pass    
 test-amd64-i386-xl-qemuu-ovmf-amd64                          pass    
 test-amd64-amd64-rumprun-amd64                               pass    
 test-amd64-amd64-xl-qemut-win7-amd64                         fail    
 test-amd64-i386-xl-qemut-win7-amd64                          fail    
 test-amd64-amd64-xl-qemuu-win7-amd64                         fail    
 test-amd64-i386-xl-qemuu-win7-amd64                          fail    
 test-amd64-amd64-xl-qemut-ws16-amd64                         fail    
 test-amd64-i386-xl-qemut-ws16-amd64                          fail    
 test-amd64-amd64-xl-qemuu-ws16-amd64                         fail    
 test-amd64-i386-xl-qemuu-ws16-amd64                          fail    
 test-armhf-armhf-xl-arndale                                  pass    
 test-amd64-amd64-xl-credit2                                  pass    
 test-arm64-arm64-xl-credit2                                  pass    
 test-armhf-armhf-xl-credit2                                  pass    
 test-armhf-armhf-xl-cubietruck                               pass    
 test-amd64-amd64-examine                                     pass    
 test-arm64-arm64-examine                                     pass    
 test-armhf-armhf-examine                                     pass    
 test-amd64-i386-examine                                      pass    
 test-amd64-i386-freebsd10-i386                               pass    
 test-amd64-i386-rumprun-i386                                 pass    
 test-amd64-amd64-xl-qemut-win10-i386                         fail    
 test-amd64-i386-xl-qemut-win10-i386                          fail    
 test-amd64-amd64-xl-qemuu-win10-i386                         fail    
 test-amd64-i386-xl-qemuu-win10-i386                          fail    
 test-amd64-amd64-qemuu-nested-intel                          pass    
 test-amd64-amd64-xl-pvh-intel                                pass    
 test-amd64-i386-qemut-rhel6hvm-intel                         pass    
 test-amd64-i386-qemuu-rhel6hvm-intel                         pass    
 test-amd64-amd64-libvirt                                     pass    
 test-armhf-armhf-libvirt                                     pass    
 test-amd64-i386-libvirt                                      pass    
 test-amd64-amd64-migrupgrade                                 pass    
 test-amd64-i386-migrupgrade                                  pass    
 test-amd64-amd64-xl-multivcpu                                pass    
 test-armhf-armhf-xl-multivcpu                                pass    
 test-amd64-amd64-pair                                        pass    
 test-amd64-i386-pair                                         pass    
 test-amd64-amd64-libvirt-pair                                pass    
 test-amd64-i386-libvirt-pair                                 pass    
 test-amd64-amd64-amd64-pvgrub                                pass    
 test-amd64-amd64-i386-pvgrub                                 pass    
 test-amd64-amd64-pygrub                                      pass    
 test-amd64-amd64-xl-qcow2                                    pass    
 test-armhf-armhf-libvirt-raw                                 pass    
 test-amd64-i386-xl-raw                                       pass    
 test-amd64-amd64-xl-rtds                                     fail    
 test-armhf-armhf-xl-rtds                                     fail    
 test-amd64-amd64-libvirt-vhd                                 pass    
 test-armhf-armhf-xl-vhd                                      pass    


------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
    http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
    http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 589 lines long.)



* Re: [xen-unstable test] 110009: regressions - FAIL
  2017-06-05 16:55 [xen-unstable test] 110009: regressions - FAIL osstest service owner
@ 2017-06-06 12:59 ` Jan Beulich
  2017-06-06 13:20   ` Andrew Cooper
  2017-06-06 14:00   ` Ian Jackson
  0 siblings, 2 replies; 14+ messages in thread
From: Jan Beulich @ 2017-06-06 12:59 UTC (permalink / raw)
  To: Andrew Cooper, Wei Liu, Ian Jackson; +Cc: xen-devel, osstest-admin

>>> On 05.06.17 at 18:55, <osstest-admin@xenproject.org> wrote:
> flight 110009 xen-unstable real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/110009/ 
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-amd64-amd64-xl-qemut-win7-amd64 15 guest-localmigrate/x10 fail REGR. vs. 109841

So finally we have some output from the debugging code added by
933f966bcd ("x86/mm: add temporary debugging code to
get_page_from_gfn_p2m()"), i.e. the migration heisenbug we hope
to hunt down:

(XEN) d0v2: d7 dying (looking up 3e000)
...
(XEN) Xen call trace:
(XEN)    [<ffff82d0803150ef>] get_page_from_gfn_p2m+0x7b/0x416
(XEN)    [<ffff82d080268e88>] arch_do_domctl+0x51a/0x2535
(XEN)    [<ffff82d080206cf9>] do_domctl+0x17e4/0x1baf
(XEN)    [<ffff82d080355896>] pv_hypercall+0x1ef/0x42d
(XEN)    [<ffff82d0803594c6>] entry.o#test_all_events+0/0x30

which points at XEN_DOMCTL_getpageframeinfo3 handling code.
What business would the tool stack have invoking this domctl for
a dying domain? I'd expect all of these operations to be done
while the domain is still alive (perhaps paused), but none of them
to occur once domain death was initiated.
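
(The check producing this message is, in essence, a guard of the following
shape near the top of the gfn lookup.  This is a sketch reconstructed from
the message format above rather than the literal patch, so the exact
placement and printing helpers may differ:)

  /* Rough reconstruction of the temporary check from 933f966bcd (not verbatim). */
  if ( unlikely(d->is_dying) )
  {
      gdprintk(XENLOG_WARNING, "d%d dying (looking up %lx)\n",
               d->domain_id, gfn);
      show_execution_state(guest_cpu_user_regs()); /* emits the call trace above */
  }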

Jan



* Re: [xen-unstable test] 110009: regressions - FAIL
  2017-06-06 12:59 ` Jan Beulich
@ 2017-06-06 13:20   ` Andrew Cooper
  2017-06-06 14:00     ` Jan Beulich
  2017-06-06 14:00   ` Ian Jackson
  1 sibling, 1 reply; 14+ messages in thread
From: Andrew Cooper @ 2017-06-06 13:20 UTC (permalink / raw)
  To: Jan Beulich, Wei Liu, Ian Jackson; +Cc: xen-devel, osstest-admin

On 06/06/17 13:59, Jan Beulich wrote:
>>>> On 05.06.17 at 18:55, <osstest-admin@xenproject.org> wrote:
>> flight 110009 xen-unstable real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/110009/ 
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>>  test-amd64-amd64-xl-qemut-win7-amd64 15 guest-localmigrate/x10 fail REGR. vs. 109841
> So finally we have some output from the debugging code added by
> 933f966bcd ("x86/mm: add temporary debugging code to
> get_page_from_gfn_p2m()"), i.e. the migration heisenbug we hope
> to hunt down:
>
> (XEN) d0v2: d7 dying (looking up 3e000)
> ...
> (XEN) Xen call trace:
> (XEN)    [<ffff82d0803150ef>] get_page_from_gfn_p2m+0x7b/0x416
> (XEN)    [<ffff82d080268e88>] arch_do_domctl+0x51a/0x2535
> (XEN)    [<ffff82d080206cf9>] do_domctl+0x17e4/0x1baf
> (XEN)    [<ffff82d080355896>] pv_hypercall+0x1ef/0x42d
> (XEN)    [<ffff82d0803594c6>] entry.o#test_all_events+0/0x30
>
> which points at XEN_DOMCTL_getpageframeinfo3 handling code.
> What business would the tool stack have invoking this domctl for
> a dying domain? I'd expect all of these operations to be done
> while the domain is still alive (perhaps paused), but none of them
> to occur once domain death was initiated.

http://logs.test-lab.xenproject.org/osstest/logs/110009/test-amd64-amd64-xl-qemut-win7-amd64/15.ts-guest-localmigrate.log
is rather curious.  Unfortunately, libxl doesn't annotate the source and
destination logging lines when it merges them back together, and doesn't
include the progress markers.  I've manually rearranged them back to a
logical order.

libxl-save-helper: debug: starting save: Success
xc: detail: fd 10, dom 7, max_iters 0, max_factor 0, flags 5, hvm 1
xc: info: Saving domain 7, type x86 HVM
xc: error: Failed to get types for pfn batch (3 = No such process): Internal error
xc: error: Save failed (3 = No such process): Internal error
xc: error: Couldn't disable qemu log-dirty mode (3 = No such process): Internal error
xc: error: Failed to clean up (3 = No such process): Internal error

The first -ESRCH here is the result of XEN_DOMCTL_getpageframeinfo3
encountering a dying domain.  The qemu logdirty error is because the
libxl callback found that the qemu process it was expecting to talk to
doesn't exist.

From
http://logs.test-lab.xenproject.org/osstest/logs/110009/test-amd64-amd64-xl-qemut-win7-amd64/elbling1---var-log-xen-xl-win.guest.osstest.log

libxl: debug: libxl_domain.c:747:domain_death_xswatch_callback: Domain 7:[evg=0x11f5af0]   got=domaininfos[0] got->domain=7
libxl: debug: libxl_domain.c:773:domain_death_xswatch_callback: Domain 7:Exists shutdown_reported=1 dominf.flags=1010f
libxl: debug: libxl_domain.c:693:domain_death_occurred: Domain 7:dying
libxl: debug: libxl_domain.c:740:domain_death_xswatch_callback: [evg=0] all reported
libxl: debug: libxl_domain.c:802:domain_death_xswatch_callback: domain death search done
libxl: debug: libxl_event.c:1869:libxl__ao_complete: ao 0x11f8220: complete, rc=0
libxl: debug: libxl_event.c:1838:libxl__ao__destroy: ao 0x11f8220: destroy

So it appears that the domain died while it was being migrated.  I
expect the daemonised xl process then proceeded to clean it up under the
feet of the ongoing migration.

http://logs.test-lab.xenproject.org/osstest/logs/110009/test-amd64-amd64-xl-qemut-win7-amd64/elbling1---var-log-xen-qemu-dm-win.guest.osstest.log.1
says

Log-dirty: no command yet.
reset requested in cpu_handle_ioreq.
Issued domain 7 reboot

So actually it looks like a reboot might have been going on, which also
explains why the guest was booting as domain 9 while domain 7 was having
problems during the migration.

~Andrew


* Re: [xen-unstable test] 110009: regressions - FAIL
  2017-06-06 13:20   ` Andrew Cooper
@ 2017-06-06 14:00     ` Jan Beulich
  0 siblings, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2017-06-06 14:00 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Ian Jackson, Wei Liu, osstest-admin, xen-devel

>>> On 06.06.17 at 15:20, <andrew.cooper3@citrix.com> wrote:
> So actually it looks like reboot might have been going on, which also
> explains why the guest was booting as domain 9 while domain 7 was having
> problems during migrate.

Hmm, so far I had been assuming the guest reboot was a result of the
migration having gone wrong, but yes, it being the other way around
would explain the observed behavior. But it wouldn't get us any
closer to an understanding of what's actually going wrong.

Jan



* Re: [xen-unstable test] 110009: regressions - FAIL
  2017-06-06 12:59 ` Jan Beulich
  2017-06-06 13:20   ` Andrew Cooper
@ 2017-06-06 14:00   ` Ian Jackson
  2017-06-06 14:22     ` Jan Beulich
  1 sibling, 1 reply; 14+ messages in thread
From: Ian Jackson @ 2017-06-06 14:00 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, osstest-admin, xen-devel

Jan Beulich writes ("Re: [Xen-devel] [xen-unstable test] 110009: regressions - FAIL"):
> So finally we have some output from the debugging code added by
> 933f966bcd ("x86/mm: add temporary debugging code to
> get_page_from_gfn_p2m()"), i.e. the migration heisenbug we hope
> to hunt down:
> 
> (XEN) d0v2: d7 dying (looking up 3e000)
> ...
> (XEN) Xen call trace:
> (XEN)    [<ffff82d0803150ef>] get_page_from_gfn_p2m+0x7b/0x416
> (XEN)    [<ffff82d080268e88>] arch_do_domctl+0x51a/0x2535
> (XEN)    [<ffff82d080206cf9>] do_domctl+0x17e4/0x1baf
> (XEN)    [<ffff82d080355896>] pv_hypercall+0x1ef/0x42d
> (XEN)    [<ffff82d0803594c6>] entry.o#test_all_events+0/0x30
> 
> which points at XEN_DOMCTL_getpageframeinfo3 handling code.
> What business would the tool stack have invoking this domctl for
> a dying domain? I'd expect all of these operations to be done
> while the domain is still alive (perhaps paused), but none of them
> to occur once domain death was initiated.

The toolstack log says:

  libxl-save-helper: debug: starting restore: Success
  xc: detail: fd 8, dom 8, hvm 0, pae 0, superpages 0, stream_type 0
  xc: info: Found x86 HVM domain from Xen 4.10
  xc: info: Restoring domain
  xc: error: Failed to get types for pfn batch (3 = No such process): Internal error
  xc: error: Save failed (3 = No such process): Internal error

This is a mixture of output from the save, and output from the restore.
Domain 7 is the domain which is migrating out; domain 8 is migrating
in.

The `Failed to get types' message is the first thing that seems to go
wrong.  It's from tools/libxc/xc_sr_save.c line 136, which is part of
the machinery for constructing a memory batch.
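
(For context, the failing call at that spot is the batch type lookup,
roughly of the following shape, paraphrased rather than quoted exactly.
xc_get_pfn_type_batch() is the libxc wrapper around
XEN_DOMCTL_getpageframeinfo3, which ties this error to the hypervisor-side
trace Jan quoted:)

  /* Paraphrase of the failing spot in write_batch(), tools/libxc/xc_sr_save.c
   * (not the exact source).  xch, ctx, nr_pfns and types are locals of
   * write_batch(); the -ESRCH (3 = No such process) comes back from the
   * domctl once the domain is dying. */
  rc = xc_get_pfn_type_batch(xch, ctx->domid, nr_pfns, types);
  if ( rc )
  {
      PERROR("Failed to get types for pfn batch");
      goto err;
  }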


I tried comparing this test with a successful one.  I had to hunt a
bit to find one where the (inherently possibly-out-of-order) toolstack
messages were similar, but found 110010 (a linux-4.9 test) [1].

The first significant difference (excluding some variations of
addresses etc., and some messages about NUMA placement of the new
domain, which presumably result from a different host) occurs here:

  libxl-save-helper: debug: starting restore: Success
  xc: detail: fd 8, dom 8, hvm 0, pae 0, superpages 0, stream_type 0
  xc: info: Found x86 HVM domain from Xen 4.9
  xc: info: Restoring domain
  libxl: debug: libxl_dom_suspend.c:179:domain_suspend_callback_common: Domain 7:Calling xc_domain_shutdown on HVM domain
  libxl: debug: libxl_dom_suspend.c:294:domain_suspend_common_wait_guest: Domain 7:wait for the guest to suspend
  libxl: debug: libxl_event.c:636:libxl__ev_xswatch_register: watch w=0x2179a40 wpath=@releaseDomain token=3/1: register slotnum=3
  libxl: debug: libxl_event.c:573:watchfd_callback: watch w=0x2179a40 wpath=@releaseDomain token=3/1: event epath=@releaseDomain
  libxl: debug: libxl_dom_suspend.c:352:suspend_common_wait_guest_check: Domain 7:guest has suspended

Looking at the serial logs for that and comparing them with 110009,
it's not terribly easy to see what's going on because the kernel
versions are different and so produce different messages about xenbr0
(and I think may have a different bridge port management algorithm).

But the messages about promiscuous mode seem the same, and of course
promiscuous mode is controlled by userspace, rather than by the kernel
(so should be the same in both).

However, in the failed test we see extra messages about promiscuous mode:

  Jun  5 13:37:08.353656 [ 2191.652079] device vif7.0-emu left promiscuous mode
  ...
  Jun  5 13:37:08.377571 [ 2191.675298] device vif7.0 left promiscuous mode

Also, the qemu log for the guest in the failure case says this:

  Log-dirty command enable
  Log-dirty: no command yet.
  reset requested in cpu_handle_ioreq.
  Issued domain 7 reboot

Whereas in the working tests we see something like this:

  Log-dirty command enable
  Log-dirty: no command yet.
  dm-command: pause and save state
  device model saving state

In the xl log in the failure case I see this:

  libxl: debug: libxl_domain.c:773:domain_death_xswatch_callback: Domain 7:Exists shutdown_reported=0 dominf.flags=10106
  libxl: debug: libxl_domain.c:785:domain_death_xswatch_callback:  shutdown reporting
  libxl: debug: libxl_domain.c:740:domain_death_xswatch_callback: [evg=0] all reported
  libxl: debug: libxl_domain.c:802:domain_death_xswatch_callback: domain death search done
  Domain 7 has shut down, reason code 1 0x1
  Action for shutdown reason code 1 is restart

xl then tears down the domain's devices and destroys the domain.

All of this seems to suggest that the domain decided to reboot
mid-migration, which is pretty strange.

Ian.


[1]  http://logs.test-lab.xenproject.org/osstest/logs/110010/test-amd64-amd64-xl-qemut-win7-amd64/info.html



* Re: [xen-unstable test] 110009: regressions - FAIL
  2017-06-06 14:00   ` Ian Jackson
@ 2017-06-06 14:22     ` Jan Beulich
  2017-06-06 19:19       ` Stefano Stabellini
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2017-06-06 14:22 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Andrew Cooper, Stefano Stabellini, Wei Liu, osstest-admin, xen-devel

>>> On 06.06.17 at 16:00, <ian.jackson@eu.citrix.com> wrote:
> Looking at the serial logs for that and comparing them with 10009,
> it's not terribly easy to see what's going on because the kernel
> versions are different and so produce different messages about xenbr0
> (and I think may have a different bridge port management algorithm).
> 
> But the messages about promiscuous mode seem the same, and of course
> promiscuous mode is controlled by userspace, rather than by the kernel
> (so should be the same in both).
> 
> However, in the failed test we see extra messages about promis:
> 
>   Jun  5 13:37:08.353656 [ 2191.652079] device vif7.0-emu left promiscuous 
> mode
>   ...
>   Jun  5 13:37:08.377571 [ 2191.675298] device vif7.0 left promiscuous mode

Wouldn't those be another result of the guest shutting down /
being shut down?

> Also, the qemu log for the guest in the failure case says this:
> 
>   Log-dirty command enable
>   Log-dirty: no command yet.
>   reset requested in cpu_handle_ioreq.

So this would seem to call for instrumentation on the qemu side
then, as the only path via which this can be initiated is - afaics -
qemu_system_reset_request(), which doesn't have very many
callers that could possibly be of interest here. Adding Stefano ...
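
(A minimal form of such instrumentation, just logging a backtrace at the
top of qemu_system_reset_request(), could look like the sketch below;
this assumes glibc's backtrace() is available in the qemu-traditional
build environment:)

  /* Instrumentation sketch: call this at the top of qemu_system_reset_request()
   * to record who asked for the reset.  Assumes glibc's <execinfo.h>. */
  #include <execinfo.h>
  #include <stdio.h>

  static void log_reset_request_origin(void)
  {
      void *frames[16];
      int n = backtrace(frames, 16);

      fprintf(stderr, "qemu_system_reset_request() called from:\n");
      backtrace_symbols_fd(frames, n, fileno(stderr));
  }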

Jan



* Re: [xen-unstable test] 110009: regressions - FAIL
  2017-06-06 14:22     ` Jan Beulich
@ 2017-06-06 19:19       ` Stefano Stabellini
  2017-06-07  8:12         ` Jan Beulich
  0 siblings, 1 reply; 14+ messages in thread
From: Stefano Stabellini @ 2017-06-06 19:19 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	osstest-admin, xen-devel

On Tue, 6 Jun 2017, Jan Beulich wrote:
> >>> On 06.06.17 at 16:00, <ian.jackson@eu.citrix.com> wrote:
> > Looking at the serial logs for that and comparing them with 10009,
> > it's not terribly easy to see what's going on because the kernel
> > versions are different and so produce different messages about xenbr0
> > (and I think may have a different bridge port management algorithm).
> > 
> > But the messages about promiscuous mode seem the same, and of course
> > promiscuous mode is controlled by userspace, rather than by the kernel
> > (so should be the same in both).
> > 
> > However, in the failed test we see extra messages about promis:
> > 
> >   Jun  5 13:37:08.353656 [ 2191.652079] device vif7.0-emu left promiscuous 
> > mode
> >   ...
> >   Jun  5 13:37:08.377571 [ 2191.675298] device vif7.0 left promiscuous mode
> 
> Wouldn't those be another result of the guest shutting down /
> being shut down?
> 
> > Also, the qemu log for the guest in the failure case says this:
> > 
> >   Log-dirty command enable
> >   Log-dirty: no command yet.
> >   reset requested in cpu_handle_ioreq.
> 
> So this would seem to call for instrumentation on the qemu side
> then, as the only path via which this can be initiated is - afaics -
> qemu_system_reset_request(), which doesn't have very many
> callers that could possibly be of interest here. Adding Stefano ...

I am pretty sure that those messages come from qemu traditional: "reset
requested in cpu_handle_ioreq" is not printed by qemu-xen.

In any case, the request comes from qemu_system_reset_request, which is
called by hw/acpi.c:pm_ioport_writew. It looks like the guest OS
initiated the reset (or resume)?


* Re: [xen-unstable test] 110009: regressions - FAIL
  2017-06-06 19:19       ` Stefano Stabellini
@ 2017-06-07  8:12         ` Jan Beulich
  2017-06-09  8:19           ` Jan Beulich
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2017-06-07  8:12 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: osstest-admin, Andrew Cooper, Wei Liu, Ian Jackson, xen-devel

>>> On 06.06.17 at 21:19, <sstabellini@kernel.org> wrote:
> On Tue, 6 Jun 2017, Jan Beulich wrote:
>> >>> On 06.06.17 at 16:00, <ian.jackson@eu.citrix.com> wrote:
>> > Looking at the serial logs for that and comparing them with 10009,
>> > it's not terribly easy to see what's going on because the kernel
>> > versions are different and so produce different messages about xenbr0
>> > (and I think may have a different bridge port management algorithm).
>> > 
>> > But the messages about promiscuous mode seem the same, and of course
>> > promiscuous mode is controlled by userspace, rather than by the kernel
>> > (so should be the same in both).
>> > 
>> > However, in the failed test we see extra messages about promis:
>> > 
>> >   Jun  5 13:37:08.353656 [ 2191.652079] device vif7.0-emu left promiscuous 
>> > mode
>> >   ...
>> >   Jun  5 13:37:08.377571 [ 2191.675298] device vif7.0 left promiscuous mode
>> 
>> Wouldn't those be another result of the guest shutting down /
>> being shut down?
>> 
>> > Also, the qemu log for the guest in the failure case says this:
>> > 
>> >   Log-dirty command enable
>> >   Log-dirty: no command yet.
>> >   reset requested in cpu_handle_ioreq.
>> 
>> So this would seem to call for instrumentation on the qemu side
>> then, as the only path via which this can be initiated is - afaics -
>> qemu_system_reset_request(), which doesn't have very many
>> callers that could possibly be of interest here. Adding Stefano ...
> 
> I am pretty sure that those messages come from qemu traditional: "reset
> requested in cpu_handle_ioreq" is not printed by qemu-xen.

Oh, indeed - I didn't pay attention to this being a *-qemut-*
test. I'm sorry.

> In any case, the request comes from qemu_system_reset_request, which is
> called by hw/acpi.c:pm_ioport_writew. It looks like the guest OS
> initiated the reset (or resume)?

Right, this and hw/pckbd.c look to be the only possible
sources. Yet then it's still unclear what makes the guest go
down.

Jan



* Re: [xen-unstable test] 110009: regressions - FAIL
  2017-06-07  8:12         ` Jan Beulich
@ 2017-06-09  8:19           ` Jan Beulich
  2017-06-09 17:50             ` Stefano Stabellini
  2017-06-12 14:30             ` Julien Grall
  0 siblings, 2 replies; 14+ messages in thread
From: Jan Beulich @ 2017-06-09  8:19 UTC (permalink / raw)
  To: Julien Grall, Andrew Cooper, George Dunlap
  Cc: Ian Jackson, Stefano Stabellini, Wei Liu, osstest-admin, xen-devel

>>> On 07.06.17 at 10:12, <JBeulich@suse.com> wrote:
>>>> On 06.06.17 at 21:19, <sstabellini@kernel.org> wrote:
>> On Tue, 6 Jun 2017, Jan Beulich wrote:
>>> >>> On 06.06.17 at 16:00, <ian.jackson@eu.citrix.com> wrote:
>>> > Looking at the serial logs for that and comparing them with 10009,
>>> > it's not terribly easy to see what's going on because the kernel
>>> > versions are different and so produce different messages about xenbr0
>>> > (and I think may have a different bridge port management algorithm).
>>> > 
>>> > But the messages about promiscuous mode seem the same, and of course
>>> > promiscuous mode is controlled by userspace, rather than by the kernel
>>> > (so should be the same in both).
>>> > 
>>> > However, in the failed test we see extra messages about promis:
>>> > 
>>> >   Jun  5 13:37:08.353656 [ 2191.652079] device vif7.0-emu left promiscuous 
>>> > mode
>>> >   ...
>>> >   Jun  5 13:37:08.377571 [ 2191.675298] device vif7.0 left promiscuous mode
>>> 
>>> Wouldn't those be another result of the guest shutting down /
>>> being shut down?
>>> 
>>> > Also, the qemu log for the guest in the failure case says this:
>>> > 
>>> >   Log-dirty command enable
>>> >   Log-dirty: no command yet.
>>> >   reset requested in cpu_handle_ioreq.
>>> 
>>> So this would seem to call for instrumentation on the qemu side
>>> then, as the only path via which this can be initiated is - afaics -
>>> qemu_system_reset_request(), which doesn't have very many
>>> callers that could possibly be of interest here. Adding Stefano ...
>> 
>> I am pretty sure that those messages come from qemu traditional: "reset
>> requested in cpu_handle_ioreq" is not printed by qemu-xen.
> 
> Oh, indeed - I didn't pay attention to this being a *-qemut-*
> test. I'm sorry.
> 
>> In any case, the request comes from qemu_system_reset_request, which is
>> called by hw/acpi.c:pm_ioport_writew. It looks like the guest OS
>> initiated the reset (or resume)?
> 
> Right, this and hw/pckbd.c look to be the only possible
> sources. Yet then it's still unclear what makes the guest go
> down.

So with all of the above in mind I wonder whether we shouldn't
revert 933f966bcd then - that debugging code is unlikely to help
with any further analysis of the issue, as reaching that code
for a dying domain is only a symptom as far as we understand it
now, not anywhere near the cause.

Jan



* Re: [xen-unstable test] 110009: regressions - FAIL
  2017-06-09  8:19           ` Jan Beulich
@ 2017-06-09 17:50             ` Stefano Stabellini
  2017-06-12 14:30             ` Julien Grall
  1 sibling, 0 replies; 14+ messages in thread
From: Stefano Stabellini @ 2017-06-09 17:50 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, osstest-admin, Julien Grall, xen-devel

On Fri, 9 Jun 2017, Jan Beulich wrote:
> >>> On 07.06.17 at 10:12, <JBeulich@suse.com> wrote:
> >>>> On 06.06.17 at 21:19, <sstabellini@kernel.org> wrote:
> >> On Tue, 6 Jun 2017, Jan Beulich wrote:
> >>> >>> On 06.06.17 at 16:00, <ian.jackson@eu.citrix.com> wrote:
> >>> > Looking at the serial logs for that and comparing them with 10009,
> >>> > it's not terribly easy to see what's going on because the kernel
> >>> > versions are different and so produce different messages about xenbr0
> >>> > (and I think may have a different bridge port management algorithm).
> >>> > 
> >>> > But the messages about promiscuous mode seem the same, and of course
> >>> > promiscuous mode is controlled by userspace, rather than by the kernel
> >>> > (so should be the same in both).
> >>> > 
> >>> > However, in the failed test we see extra messages about promis:
> >>> > 
> >>> >   Jun  5 13:37:08.353656 [ 2191.652079] device vif7.0-emu left promiscuous 
> >>> > mode
> >>> >   ...
> >>> >   Jun  5 13:37:08.377571 [ 2191.675298] device vif7.0 left promiscuous mode
> >>> 
> >>> Wouldn't those be another result of the guest shutting down /
> >>> being shut down?
> >>> 
> >>> > Also, the qemu log for the guest in the failure case says this:
> >>> > 
> >>> >   Log-dirty command enable
> >>> >   Log-dirty: no command yet.
> >>> >   reset requested in cpu_handle_ioreq.
> >>> 
> >>> So this would seem to call for instrumentation on the qemu side
> >>> then, as the only path via which this can be initiated is - afaics -
> >>> qemu_system_reset_request(), which doesn't have very many
> >>> callers that could possibly be of interest here. Adding Stefano ...
> >> 
> >> I am pretty sure that those messages come from qemu traditional: "reset
> >> requested in cpu_handle_ioreq" is not printed by qemu-xen.
> > 
> > Oh, indeed - I didn't pay attention to this being a *-qemut-*
> > test. I'm sorry.
> > 
> >> In any case, the request comes from qemu_system_reset_request, which is
> >> called by hw/acpi.c:pm_ioport_writew. It looks like the guest OS
> >> initiated the reset (or resume)?
> > 
> > Right, this and hw/pckbd.c look to be the only possible
> > sources. Yet then it's still unclear what makes the guest go
> > down.
> 
> So with all of the above in mind I wonder whether we shouldn't
> revert 933f966bcd then - that debugging code is unlikely to help
> with any further analysis of the issue, as reaching that code
> for a dying domain is only a symptom as far as we understand it
> now, not anywhere near the cause.

Makes sense to me


* Re: [xen-unstable test] 110009: regressions - FAIL
  2017-06-09  8:19           ` Jan Beulich
  2017-06-09 17:50             ` Stefano Stabellini
@ 2017-06-12 14:30             ` Julien Grall
  2017-06-12 14:57               ` Jan Beulich
  1 sibling, 1 reply; 14+ messages in thread
From: Julien Grall @ 2017-06-12 14:30 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper, George Dunlap
  Cc: Ian Jackson, Stefano Stabellini, Wei Liu, osstest-admin, xen-devel

Hi Jan,

On 09/06/17 09:19, Jan Beulich wrote:
>>>> On 07.06.17 at 10:12, <JBeulich@suse.com> wrote:
>>>>> On 06.06.17 at 21:19, <sstabellini@kernel.org> wrote:
>>> On Tue, 6 Jun 2017, Jan Beulich wrote:
>>>>>>> On 06.06.17 at 16:00, <ian.jackson@eu.citrix.com> wrote:
>>>>> Looking at the serial logs for that and comparing them with 10009,
>>>>> it's not terribly easy to see what's going on because the kernel
>>>>> versions are different and so produce different messages about xenbr0
>>>>> (and I think may have a different bridge port management algorithm).
>>>>>
>>>>> But the messages about promiscuous mode seem the same, and of course
>>>>> promiscuous mode is controlled by userspace, rather than by the kernel
>>>>> (so should be the same in both).
>>>>>
>>>>> However, in the failed test we see extra messages about promis:
>>>>>
>>>>>   Jun  5 13:37:08.353656 [ 2191.652079] device vif7.0-emu left promiscuous
>>>>> mode
>>>>>   ...
>>>>>   Jun  5 13:37:08.377571 [ 2191.675298] device vif7.0 left promiscuous mode
>>>>
>>>> Wouldn't those be another result of the guest shutting down /
>>>> being shut down?
>>>>
>>>>> Also, the qemu log for the guest in the failure case says this:
>>>>>
>>>>>   Log-dirty command enable
>>>>>   Log-dirty: no command yet.
>>>>>   reset requested in cpu_handle_ioreq.
>>>>
>>>> So this would seem to call for instrumentation on the qemu side
>>>> then, as the only path via which this can be initiated is - afaics -
>>>> qemu_system_reset_request(), which doesn't have very many
>>>> callers that could possibly be of interest here. Adding Stefano ...
>>>
>>> I am pretty sure that those messages come from qemu traditional: "reset
>>> requested in cpu_handle_ioreq" is not printed by qemu-xen.
>>
>> Oh, indeed - I didn't pay attention to this being a *-qemut-*
>> test. I'm sorry.
>>
>>> In any case, the request comes from qemu_system_reset_request, which is
>>> called by hw/acpi.c:pm_ioport_writew. It looks like the guest OS
>>> initiated the reset (or resume)?
>>
>> Right, this and hw/pckbd.c look to be the only possible
>> sources. Yet then it's still unclear what makes the guest go
>> down.
>
> So with all of the above in mind I wonder whether we shouldn't
> revert 933f966bcd then - that debugging code is unlikely to help
> with any further analysis of the issue, as reaching that code
> for a dying domain is only a symptom as far as we understand it
> now, not anywhere near the cause.

Are you suggesting to revert on Xen 4.9?

Cheers,

-- 
Julien Grall


* Re: [xen-unstable test] 110009: regressions - FAIL
  2017-06-12 14:30             ` Julien Grall
@ 2017-06-12 14:57               ` Jan Beulich
  2017-06-13  9:30                 ` Julien Grall
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2017-06-12 14:57 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, osstest-admin, xen-devel

>>> On 12.06.17 at 16:30, <julien.grall@arm.com> wrote:
> On 09/06/17 09:19, Jan Beulich wrote:
>>>>> On 07.06.17 at 10:12, <JBeulich@suse.com> wrote:
>>>>>> On 06.06.17 at 21:19, <sstabellini@kernel.org> wrote:
>>>> On Tue, 6 Jun 2017, Jan Beulich wrote:
>>>>>>>> On 06.06.17 at 16:00, <ian.jackson@eu.citrix.com> wrote:
>>>>>> Looking at the serial logs for that and comparing them with 10009,
>>>>>> it's not terribly easy to see what's going on because the kernel
>>>>>> versions are different and so produce different messages about xenbr0
>>>>>> (and I think may have a different bridge port management algorithm).
>>>>>>
>>>>>> But the messages about promiscuous mode seem the same, and of course
>>>>>> promiscuous mode is controlled by userspace, rather than by the kernel
>>>>>> (so should be the same in both).
>>>>>>
>>>>>> However, in the failed test we see extra messages about promis:
>>>>>>
>>>>>>   Jun  5 13:37:08.353656 [ 2191.652079] device vif7.0-emu left promiscuous
>>>>>> mode
>>>>>>   ...
>>>>>>   Jun  5 13:37:08.377571 [ 2191.675298] device vif7.0 left promiscuous mode
>>>>>
>>>>> Wouldn't those be another result of the guest shutting down /
>>>>> being shut down?
>>>>>
>>>>>> Also, the qemu log for the guest in the failure case says this:
>>>>>>
>>>>>>   Log-dirty command enable
>>>>>>   Log-dirty: no command yet.
>>>>>>   reset requested in cpu_handle_ioreq.
>>>>>
>>>>> So this would seem to call for instrumentation on the qemu side
>>>>> then, as the only path via which this can be initiated is - afaics -
>>>>> qemu_system_reset_request(), which doesn't have very many
>>>>> callers that could possibly be of interest here. Adding Stefano ...
>>>>
>>>> I am pretty sure that those messages come from qemu traditional: "reset
>>>> requested in cpu_handle_ioreq" is not printed by qemu-xen.
>>>
>>> Oh, indeed - I didn't pay attention to this being a *-qemut-*
>>> test. I'm sorry.
>>>
>>>> In any case, the request comes from qemu_system_reset_request, which is
>>>> called by hw/acpi.c:pm_ioport_writew. It looks like the guest OS
>>>> initiated the reset (or resume)?
>>>
>>> Right, this and hw/pckbd.c look to be the only possible
>>> sources. Yet then it's still unclear what makes the guest go
>>> down.
>>
>> So with all of the above in mind I wonder whether we shouldn't
>> revert 933f966bcd then - that debugging code is unlikely to help
>> with any further analysis of the issue, as reaching that code
>> for a dying domain is only a symptom as far as we understand it
>> now, not anywhere near the cause.
> 
> Are you suggesting to revert on Xen 4.9?

Yes, if we revert now, then I'd say on both master and 4.9.

Jan



* Re: [xen-unstable test] 110009: regressions - FAIL
  2017-06-12 14:57               ` Jan Beulich
@ 2017-06-13  9:30                 ` Julien Grall
  2017-06-14  9:23                   ` George Dunlap
  0 siblings, 1 reply; 14+ messages in thread
From: Julien Grall @ 2017-06-13  9:30 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, osstest-admin, xen-devel, nd

Hi Jan,

On 12/06/2017 15:57, Jan Beulich wrote:
>>>> On 12.06.17 at 16:30, <julien.grall@arm.com> wrote:
>> On 09/06/17 09:19, Jan Beulich wrote:
>>>>>> On 07.06.17 at 10:12, <JBeulich@suse.com> wrote:
>>>>>>> On 06.06.17 at 21:19, <sstabellini@kernel.org> wrote:
>>>>> On Tue, 6 Jun 2017, Jan Beulich wrote:
>>>>>>>>> On 06.06.17 at 16:00, <ian.jackson@eu.citrix.com> wrote:
>>>>>>> Looking at the serial logs for that and comparing them with 10009,
>>>>>>> it's not terribly easy to see what's going on because the kernel
>>>>>>> versions are different and so produce different messages about xenbr0
>>>>>>> (and I think may have a different bridge port management algorithm).
>>>>>>>
>>>>>>> But the messages about promiscuous mode seem the same, and of course
>>>>>>> promiscuous mode is controlled by userspace, rather than by the kernel
>>>>>>> (so should be the same in both).
>>>>>>>
>>>>>>> However, in the failed test we see extra messages about promis:
>>>>>>>
>>>>>>>   Jun  5 13:37:08.353656 [ 2191.652079] device vif7.0-emu left promiscuous
>>>>>>> mode
>>>>>>>   ...
>>>>>>>   Jun  5 13:37:08.377571 [ 2191.675298] device vif7.0 left promiscuous mode
>>>>>>
>>>>>> Wouldn't those be another result of the guest shutting down /
>>>>>> being shut down?
>>>>>>
>>>>>>> Also, the qemu log for the guest in the failure case says this:
>>>>>>>
>>>>>>>   Log-dirty command enable
>>>>>>>   Log-dirty: no command yet.
>>>>>>>   reset requested in cpu_handle_ioreq.
>>>>>>
>>>>>> So this would seem to call for instrumentation on the qemu side
>>>>>> then, as the only path via which this can be initiated is - afaics -
>>>>>> qemu_system_reset_request(), which doesn't have very many
>>>>>> callers that could possibly be of interest here. Adding Stefano ...
>>>>>
>>>>> I am pretty sure that those messages come from qemu traditional: "reset
>>>>> requested in cpu_handle_ioreq" is not printed by qemu-xen.
>>>>
>>>> Oh, indeed - I didn't pay attention to this being a *-qemut-*
>>>> test. I'm sorry.
>>>>
>>>>> In any case, the request comes from qemu_system_reset_request, which is
>>>>> called by hw/acpi.c:pm_ioport_writew. It looks like the guest OS
>>>>> initiated the reset (or resume)?
>>>>
>>>> Right, this and hw/pckbd.c look to be the only possible
>>>> sources. Yet then it's still unclear what makes the guest go
>>>> down.
>>>
>>> So with all of the above in mind I wonder whether we shouldn't
>>> revert 933f966bcd then - that debugging code is unlikely to help
>>> with any further analysis of the issue, as reaching that code
>>> for a dying domain is only a symptom as far as we understand it
>>> now, not anywhere near the cause.
>>
>> Are you suggesting to revert on Xen 4.9?
>
> Yes, if we revert now, then I'd say on both master and 4.9.

I would be ok with that.

Cheers,

-- 
Julien Grall


* Re: [xen-unstable test] 110009: regressions - FAIL
  2017-06-13  9:30                 ` Julien Grall
@ 2017-06-14  9:23                   ` George Dunlap
  0 siblings, 0 replies; 14+ messages in thread
From: George Dunlap @ 2017-06-14  9:23 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	osstest-admin, Jan Beulich, xen-devel, nd

On Tue, Jun 13, 2017 at 10:30 AM, Julien Grall <julien.grall@arm.com> wrote:
> Hi Jan,
>
>
> On 12/06/2017 15:57, Jan Beulich wrote:
>>>>>
>>>>> On 12.06.17 at 16:30, <julien.grall@arm.com> wrote:
>>>
>>> On 09/06/17 09:19, Jan Beulich wrote:
>>>>>>>
>>>>>>> On 07.06.17 at 10:12, <JBeulich@suse.com> wrote:
>>>>>>>>
>>>>>>>> On 06.06.17 at 21:19, <sstabellini@kernel.org> wrote:
>>>>>>
>>>>>> On Tue, 6 Jun 2017, Jan Beulich wrote:
>>>>>>>>>>
>>>>>>>>>> On 06.06.17 at 16:00, <ian.jackson@eu.citrix.com> wrote:
>>>>>>>>
>>>>>>>> Looking at the serial logs for that and comparing them with 10009,
>>>>>>>> it's not terribly easy to see what's going on because the kernel
>>>>>>>> versions are different and so produce different messages about
>>>>>>>> xenbr0
>>>>>>>> (and I think may have a different bridge port management algorithm).
>>>>>>>>
>>>>>>>> But the messages about promiscuous mode seem the same, and of course
>>>>>>>> promiscuous mode is controlled by userspace, rather than by the
>>>>>>>> kernel
>>>>>>>> (so should be the same in both).
>>>>>>>>
>>>>>>>> However, in the failed test we see extra messages about promis:
>>>>>>>>
>>>>>>>>   Jun  5 13:37:08.353656 [ 2191.652079] device vif7.0-emu left
>>>>>>>> promiscuous
>>>>>>>> mode
>>>>>>>>   ...
>>>>>>>>   Jun  5 13:37:08.377571 [ 2191.675298] device vif7.0 left
>>>>>>>> promiscuous mode
>>>>>>>
>>>>>>>
>>>>>>> Wouldn't those be another result of the guest shutting down /
>>>>>>> being shut down?
>>>>>>>
>>>>>>>> Also, the qemu log for the guest in the failure case says this:
>>>>>>>>
>>>>>>>>   Log-dirty command enable
>>>>>>>>   Log-dirty: no command yet.
>>>>>>>>   reset requested in cpu_handle_ioreq.
>>>>>>>
>>>>>>>
>>>>>>> So this would seem to call for instrumentation on the qemu side
>>>>>>> then, as the only path via which this can be initiated is - afaics -
>>>>>>> qemu_system_reset_request(), which doesn't have very many
>>>>>>> callers that could possibly be of interest here. Adding Stefano ...
>>>>>>
>>>>>>
>>>>>> I am pretty sure that those messages come from qemu traditional:
>>>>>> "reset
>>>>>> requested in cpu_handle_ioreq" is not printed by qemu-xen.
>>>>>
>>>>>
>>>>> Oh, indeed - I didn't pay attention to this being a *-qemut-*
>>>>> test. I'm sorry.
>>>>>
>>>>>> In any case, the request comes from qemu_system_reset_request, which
>>>>>> is
>>>>>> called by hw/acpi.c:pm_ioport_writew. It looks like the guest OS
>>>>>> initiated the reset (or resume)?
>>>>>
>>>>>
>>>>> Right, this and hw/pckbd.c look to be the only possible
>>>>> sources. Yet then it's still unclear what makes the guest go
>>>>> down.
>>>>
>>>>
>>>> So with all of the above in mind I wonder whether we shouldn't
>>>> revert 933f966bcd then - that debugging code is unlikely to help
>>>> with any further analysis of the issue, as reaching that code
>>>> for a dying domain is only a symptom as far as we understand it
>>>> now, not anywhere near the cause.
>>>
>>>
>>> Are you suggesting to revert on Xen 4.9?
>>
>>
>> Yes, if we revert now, then I'd say on both master and 4.9.
>
>
> I would be ok with that.

Reverting 933f966bcd

Acked-by: George Dunlap <george.dunlap@citrix.com>
