* [xen-unstable test] 110009: regressions - FAIL
From: osstest service owner @ 2017-06-05 16:55 UTC (permalink / raw)
To: xen-devel, osstest-admin
flight 110009 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/110009/
Regressions :-(
Tests which did not succeed and are blocking,
including tests which could not be run:
test-amd64-amd64-xl-qemut-win7-amd64 15 guest-localmigrate/x10 fail REGR. vs. 109841
Tests which did not succeed, but are not blocking:
test-amd64-i386-xl-qemut-win7-amd64 15 guest-localmigrate/x10 fail like 109803
test-armhf-armhf-libvirt 13 saverestore-support-check fail like 109828
test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail like 109841
test-armhf-armhf-libvirt-xsm 13 saverestore-support-check fail like 109841
test-armhf-armhf-xl-rtds 15 guest-start/debian.repeat fail like 109841
test-amd64-amd64-xl-rtds 9 debian-install fail like 109841
test-armhf-armhf-libvirt-raw 12 saverestore-support-check fail like 109841
test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail like 109841
test-amd64-amd64-xl-qemut-ws16-amd64 9 windows-install fail never pass
test-amd64-amd64-xl-qemuu-ws16-amd64 9 windows-install fail never pass
test-amd64-i386-libvirt-xsm 12 migrate-support-check fail never pass
test-amd64-i386-libvirt 12 migrate-support-check fail never pass
test-amd64-amd64-libvirt-xsm 12 migrate-support-check fail never pass
test-arm64-arm64-xl-credit2 12 migrate-support-check fail never pass
test-arm64-arm64-xl-credit2 13 saverestore-support-check fail never pass
test-arm64-arm64-xl-xsm 12 migrate-support-check fail never pass
test-arm64-arm64-xl-xsm 13 saverestore-support-check fail never pass
test-arm64-arm64-libvirt-xsm 12 migrate-support-check fail never pass
test-arm64-arm64-libvirt-xsm 13 saverestore-support-check fail never pass
test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
test-arm64-arm64-xl 12 migrate-support-check fail never pass
test-arm64-arm64-xl 13 saverestore-support-check fail never pass
test-amd64-amd64-libvirt-vhd 11 migrate-support-check fail never pass
test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2 fail never pass
test-armhf-armhf-xl-multivcpu 12 migrate-support-check fail never pass
test-armhf-armhf-xl-multivcpu 13 saverestore-support-check fail never pass
test-armhf-armhf-libvirt 12 migrate-support-check fail never pass
test-armhf-armhf-xl 12 migrate-support-check fail never pass
test-armhf-armhf-xl 13 saverestore-support-check fail never pass
test-armhf-armhf-xl-cubietruck 12 migrate-support-check fail never pass
test-armhf-armhf-xl-cubietruck 13 saverestore-support-check fail never pass
test-armhf-armhf-xl-xsm 12 migrate-support-check fail never pass
test-armhf-armhf-xl-xsm 13 saverestore-support-check fail never pass
test-armhf-armhf-xl-arndale 12 migrate-support-check fail never pass
test-armhf-armhf-xl-arndale 13 saverestore-support-check fail never pass
test-amd64-amd64-libvirt 12 migrate-support-check fail never pass
test-armhf-armhf-xl-rtds 12 migrate-support-check fail never pass
test-armhf-armhf-xl-rtds 13 saverestore-support-check fail never pass
test-armhf-armhf-xl-vhd 11 migrate-support-check fail never pass
test-armhf-armhf-xl-vhd 12 saverestore-support-check fail never pass
test-armhf-armhf-libvirt-xsm 12 migrate-support-check fail never pass
test-armhf-armhf-xl-credit2 12 migrate-support-check fail never pass
test-armhf-armhf-xl-credit2 13 saverestore-support-check fail never pass
test-armhf-armhf-libvirt-raw 11 migrate-support-check fail never pass
test-amd64-i386-xl-qemut-win10-i386 9 windows-install fail never pass
test-amd64-i386-xl-qemuu-win10-i386 9 windows-install fail never pass
test-amd64-amd64-xl-qemuu-win10-i386 9 windows-install fail never pass
test-amd64-i386-xl-qemuu-ws16-amd64 9 windows-install fail never pass
test-amd64-i386-xl-qemut-ws16-amd64 9 windows-install fail never pass
test-amd64-amd64-xl-qemut-win10-i386 9 windows-install fail never pass
version targeted for testing:
xen d8eed4021d50eb48ca75c8559aed95a2ad74afaa
baseline version:
xen 876800d5f9de8b15355172794cb82f505dd26e18
Last test of basis 109841 2017-05-30 02:02:16 Z 6 days
Failing since 109866 2017-05-30 19:48:42 Z 5 days 7 attempts
Testing same since 109957 2017-06-03 10:00:05 Z 2 days 4 attempts
------------------------------------------------------------
People who touched revisions under test:
Andrew Cooper <andrew.cooper3@citrix.com>
Armando Vega <armando@greenhost.nl>
Borislav Petkov <bp@suse.de>
George Dunlap <george.dunlap@eu.citrix.com>
Gregory Herrero <gregory.herrero@oracle.com>
Haozhong Zhang <haozhong.zhang@intel.com>
Ian Jackson <Ian.Jackson@eu.citrix.com>
Jan Beulich <jbeulich@suse.com>
Julien Grall <julien.grall@arm.com>
Kevin Tian <kevin.tian@intel.com>
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Luwei Kang <luwei.kang@intel.com>
Roger Pau Monné <roger.pau@citrix.com>
Swapnil Paratey <swapnil.paratey@amd.com>
Wei Liu <wei.liu2@citrix.com>
Zhang Bo <oscar.zhangbo@huawei.com>
jobs:
build-amd64-xsm pass
build-arm64-xsm pass
build-armhf-xsm pass
build-i386-xsm pass
build-amd64-xtf pass
build-amd64 pass
build-arm64 pass
build-armhf pass
build-i386 pass
build-amd64-libvirt pass
build-arm64-libvirt pass
build-armhf-libvirt pass
build-i386-libvirt pass
build-amd64-oldkern pass
build-i386-oldkern pass
build-amd64-prev pass
build-i386-prev pass
build-amd64-pvops pass
build-arm64-pvops pass
build-armhf-pvops pass
build-i386-pvops pass
build-amd64-rumprun pass
build-i386-rumprun pass
test-xtf-amd64-amd64-1 pass
test-xtf-amd64-amd64-2 pass
test-xtf-amd64-amd64-3 pass
test-xtf-amd64-amd64-4 pass
test-xtf-amd64-amd64-5 pass
test-amd64-amd64-xl pass
test-arm64-arm64-xl pass
test-armhf-armhf-xl pass
test-amd64-i386-xl pass
test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm pass
test-amd64-i386-xl-qemut-debianhvm-amd64-xsm pass
test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm pass
test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm pass
test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm pass
test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm pass
test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm pass
test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm pass
test-amd64-amd64-libvirt-xsm pass
test-arm64-arm64-libvirt-xsm pass
test-armhf-armhf-libvirt-xsm pass
test-amd64-i386-libvirt-xsm pass
test-amd64-amd64-xl-xsm pass
test-arm64-arm64-xl-xsm pass
test-armhf-armhf-xl-xsm pass
test-amd64-i386-xl-xsm pass
test-amd64-amd64-qemuu-nested-amd fail
test-amd64-amd64-xl-pvh-amd pass
test-amd64-i386-qemut-rhel6hvm-amd pass
test-amd64-i386-qemuu-rhel6hvm-amd pass
test-amd64-amd64-xl-qemut-debianhvm-amd64 pass
test-amd64-i386-xl-qemut-debianhvm-amd64 pass
test-amd64-amd64-xl-qemuu-debianhvm-amd64 pass
test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
test-amd64-i386-freebsd10-amd64 pass
test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
test-amd64-i386-xl-qemuu-ovmf-amd64 pass
test-amd64-amd64-rumprun-amd64 pass
test-amd64-amd64-xl-qemut-win7-amd64 fail
test-amd64-i386-xl-qemut-win7-amd64 fail
test-amd64-amd64-xl-qemuu-win7-amd64 fail
test-amd64-i386-xl-qemuu-win7-amd64 fail
test-amd64-amd64-xl-qemut-ws16-amd64 fail
test-amd64-i386-xl-qemut-ws16-amd64 fail
test-amd64-amd64-xl-qemuu-ws16-amd64 fail
test-amd64-i386-xl-qemuu-ws16-amd64 fail
test-armhf-armhf-xl-arndale pass
test-amd64-amd64-xl-credit2 pass
test-arm64-arm64-xl-credit2 pass
test-armhf-armhf-xl-credit2 pass
test-armhf-armhf-xl-cubietruck pass
test-amd64-amd64-examine pass
test-arm64-arm64-examine pass
test-armhf-armhf-examine pass
test-amd64-i386-examine pass
test-amd64-i386-freebsd10-i386 pass
test-amd64-i386-rumprun-i386 pass
test-amd64-amd64-xl-qemut-win10-i386 fail
test-amd64-i386-xl-qemut-win10-i386 fail
test-amd64-amd64-xl-qemuu-win10-i386 fail
test-amd64-i386-xl-qemuu-win10-i386 fail
test-amd64-amd64-qemuu-nested-intel pass
test-amd64-amd64-xl-pvh-intel pass
test-amd64-i386-qemut-rhel6hvm-intel pass
test-amd64-i386-qemuu-rhel6hvm-intel pass
test-amd64-amd64-libvirt pass
test-armhf-armhf-libvirt pass
test-amd64-i386-libvirt pass
test-amd64-amd64-migrupgrade pass
test-amd64-i386-migrupgrade pass
test-amd64-amd64-xl-multivcpu pass
test-armhf-armhf-xl-multivcpu pass
test-amd64-amd64-pair pass
test-amd64-i386-pair pass
test-amd64-amd64-libvirt-pair pass
test-amd64-i386-libvirt-pair pass
test-amd64-amd64-amd64-pvgrub pass
test-amd64-amd64-i386-pvgrub pass
test-amd64-amd64-pygrub pass
test-amd64-amd64-xl-qcow2 pass
test-armhf-armhf-libvirt-raw pass
test-amd64-i386-xl-raw pass
test-amd64-amd64-xl-rtds fail
test-armhf-armhf-xl-rtds fail
test-amd64-amd64-libvirt-vhd pass
test-armhf-armhf-xl-vhd pass
------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images
Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs
Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master
Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary
Not pushing.
(No revision log; it would be 589 lines long.)
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
* Re: [xen-unstable test] 110009: regressions - FAIL
From: Jan Beulich @ 2017-06-06 12:59 UTC (permalink / raw)
To: Andrew Cooper, Wei Liu, Ian Jackson; +Cc: xen-devel, osstest-admin
>>> On 05.06.17 at 18:55, <osstest-admin@xenproject.org> wrote:
> flight 110009 xen-unstable real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/110009/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
> test-amd64-amd64-xl-qemut-win7-amd64 15 guest-localmigrate/x10 fail REGR. vs. 109841
So finally we have some output from the debugging code added by
933f966bcd ("x86/mm: add temporary debugging code to
get_page_from_gfn_p2m()"), i.e. the migration heisenbug we hope
to hunt down:
(XEN) d0v2: d7 dying (looking up 3e000)
...
(XEN) Xen call trace:
(XEN) [<ffff82d0803150ef>] get_page_from_gfn_p2m+0x7b/0x416
(XEN) [<ffff82d080268e88>] arch_do_domctl+0x51a/0x2535
(XEN) [<ffff82d080206cf9>] do_domctl+0x17e4/0x1baf
(XEN) [<ffff82d080355896>] pv_hypercall+0x1ef/0x42d
(XEN) [<ffff82d0803594c6>] entry.o#test_all_events+0/0x30
which points at XEN_DOMCTL_getpageframeinfo3 handling code.
What business would the tool stack have invoking this domctl for
a dying domain? I'd expect all of these operations to be done
while the domain is still alive (perhaps paused), but none of them
to occur once domain death was initiated.
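To make the quoted log line concrete, here is a minimal, self-contained sketch of the kind of temporary check 933f966bcd is described as adding. The struct and function below are simplified stand-ins for illustration, not the real Xen code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for Xen's struct domain -- not the
 * real definition from xen/sched.h. */
struct domain {
    unsigned int domain_id;
    bool is_dying;
};

/* Sketch of the temporary instrumentation: before the p2m lookup, flag
 * lookups against a domain whose death has already been initiated.  In
 * Xen this would be a gdprintk() plus a stack dump, producing the
 * "d7 dying (looking up 3e000)" line quoted above. */
static int get_page_from_gfn_p2m_sketch(struct domain *d, unsigned long gfn)
{
    if ( d->is_dying )
    {
        printf("d%u dying (looking up %lx)\n", d->domain_id, gfn);
        return -3; /* -ESRCH, surfaced to the tool stack as "No such process" */
    }
    return 0; /* the normal lookup would proceed here */
}
```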
Jan
* Re: [xen-unstable test] 110009: regressions - FAIL
From: Andrew Cooper @ 2017-06-06 13:20 UTC (permalink / raw)
To: Jan Beulich, Wei Liu, Ian Jackson; +Cc: xen-devel, osstest-admin
On 06/06/17 13:59, Jan Beulich wrote:
>>>> On 05.06.17 at 18:55, <osstest-admin@xenproject.org> wrote:
>> flight 110009 xen-unstable real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/110009/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>> test-amd64-amd64-xl-qemut-win7-amd64 15 guest-localmigrate/x10 fail REGR. vs. 109841
> So finally we have some output from the debugging code added by
> 933f966bcd ("x86/mm: add temporary debugging code to
> get_page_from_gfn_p2m()"), i.e. the migration heisenbug we hope
> to hunt down:
>
> (XEN) d0v2: d7 dying (looking up 3e000)
> ...
> (XEN) Xen call trace:
> (XEN) [<ffff82d0803150ef>] get_page_from_gfn_p2m+0x7b/0x416
> (XEN) [<ffff82d080268e88>] arch_do_domctl+0x51a/0x2535
> (XEN) [<ffff82d080206cf9>] do_domctl+0x17e4/0x1baf
> (XEN) [<ffff82d080355896>] pv_hypercall+0x1ef/0x42d
> (XEN) [<ffff82d0803594c6>] entry.o#test_all_events+0/0x30
>
> which points at XEN_DOMCTL_getpageframeinfo3 handling code.
> What business would the tool stack have invoking this domctl for
> a dying domain? I'd expect all of these operations to be done
> while the domain is still alive (perhaps paused), but none of them
> to occur once domain death was initiated.
http://logs.test-lab.xenproject.org/osstest/logs/110009/test-amd64-amd64-xl-qemut-win7-amd64/15.ts-guest-localmigrate.log
is rather curious. Unfortunately, libxl doesn't annotate the source and
destination logging lines when it merges them back together, and doesn't
include the progress markers. I've manually rearranged them back to a
logical order.
libxl-save-helper: debug: starting save: Success
xc: detail: fd 10, dom 7, max_iters 0, max_factor 0, flags 5, hvm 1
xc: info: Saving domain 7, type x86 HVM
xc: error: Failed to get types for pfn batch (3 = No such process):
Internal error
xc: error: Save failed (3 = No such process): Internal error
xc: error: Couldn't disable qemu log-dirty mode (3 = No such process):
Internal error
xc: error: Failed to clean up (3 = No such process): Internal error
The first -ESRCH here is the result of XEN_DOMCTL_getpageframeinfo3
encountering a dying domain. The qemu logdirty error is because the
libxl callback found that the qemu process it was expecting to talk to
doesn't exist.
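As an aside on the log format: the "(3 = No such process)" decoration is plain errno reporting. The privcmd/hypercall path turns Xen's -ESRCH into errno, and libxc prints both the number and its strerror() text. A schematic (not the real libxc error-reporting code) of that formatting:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Schematic of how libxc-style error lines such as
 * "Save failed (3 = No such process): Internal error" are built:
 * the errno value plus its strerror() text.  Illustrative only. */
static void format_xc_error(char *buf, size_t len, const char *what, int err)
{
    snprintf(buf, len, "%s (%d = %s): Internal error",
             what, err, strerror(err));
}
```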
From
http://logs.test-lab.xenproject.org/osstest/logs/110009/test-amd64-amd64-xl-qemut-win7-amd64/elbling1---var-log-xen-xl-win.guest.osstest.log
libxl: debug: libxl_domain.c:747:domain_death_xswatch_callback: Domain
7:[evg=0x11f5af0] got=domaininfos[0] got->domain=7
libxl: debug: libxl_domain.c:773:domain_death_xswatch_callback: Domain
7:Exists shutdown_reported=1 dominf.flags=1010f
libxl: debug: libxl_domain.c:693:domain_death_occurred: Domain 7:dying
libxl: debug: libxl_domain.c:740:domain_death_xswatch_callback: [evg=0]
all reported
libxl: debug: libxl_domain.c:802:domain_death_xswatch_callback: domain
death search done
libxl: debug: libxl_event.c:1869:libxl__ao_complete: ao 0x11f8220:
complete, rc=0
libxl: debug: libxl_event.c:1838:libxl__ao__destroy: ao 0x11f8220: destroy
So it appears that the domain died while it was being migrated. I
expect the daemonised xl process then proceeded to clean it up under the
feet of the ongoing migration.
http://logs.test-lab.xenproject.org/osstest/logs/110009/test-amd64-amd64-xl-qemut-win7-amd64/elbling1---var-log-xen-qemu-dm-win.guest.osstest.log.1
says
Log-dirty: no command yet.
reset requested in cpu_handle_ioreq.
Issued domain 7 reboot
So actually it looks like reboot might have been going on, which also
explains why the guest was booting as domain 9 while domain 7 was having
problems during migrate.
~Andrew
* Re: [xen-unstable test] 110009: regressions - FAIL
From: Jan Beulich @ 2017-06-06 14:00 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Ian Jackson, Wei Liu, osstest-admin, xen-devel
>>> On 06.06.17 at 15:20, <andrew.cooper3@citrix.com> wrote:
> So actually it looks like reboot might have been going on, which also
> explains why the guest was booting as domain 9 while domain 7 was having
> problems during migrate.
Hmm, so far I was assuming the guest reboot to have been a result
of migration having gone wrong, but yes, it being the other way
around would explain observed behavior. But it wouldn't get us any
closer to an understanding of what's going on/wrong.
Jan
* Re: [xen-unstable test] 110009: regressions - FAIL
From: Ian Jackson @ 2017-06-06 14:00 UTC (permalink / raw)
To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, osstest-admin, xen-devel
Jan Beulich writes ("Re: [Xen-devel] [xen-unstable test] 110009: regressions - FAIL"):
> So finally we have some output from the debugging code added by
> 933f966bcd ("x86/mm: add temporary debugging code to
> get_page_from_gfn_p2m()"), i.e. the migration heisenbug we hope
> to hunt down:
>
> (XEN) d0v2: d7 dying (looking up 3e000)
> ...
> (XEN) Xen call trace:
> (XEN) [<ffff82d0803150ef>] get_page_from_gfn_p2m+0x7b/0x416
> (XEN) [<ffff82d080268e88>] arch_do_domctl+0x51a/0x2535
> (XEN) [<ffff82d080206cf9>] do_domctl+0x17e4/0x1baf
> (XEN) [<ffff82d080355896>] pv_hypercall+0x1ef/0x42d
> (XEN) [<ffff82d0803594c6>] entry.o#test_all_events+0/0x30
>
> which points at XEN_DOMCTL_getpageframeinfo3 handling code.
> What business would the tool stack have invoking this domctl for
> a dying domain? I'd expect all of these operations to be done
> while the domain is still alive (perhaps paused), but none of them
> to occur once domain death was initiated.
The toolstack log says:
libxl-save-helper: debug: starting restore: Success
xc: detail: fd 8, dom 8, hvm 0, pae 0, superpages 0, stream_type 0
xc: info: Found x86 HVM domain from Xen 4.10
xc: info: Restoring domain
xc: error: Failed to get types for pfn batch (3 = No such process): Internal error
xc: error: Save failed (3 = No such process): Internal error
This is a mixture of output from the save, and output from the restore.
Domain 7 is the domain which is migrating out; domain 8 is migrating
in.
The `Failed to get types' message is the first thing that seems to go
wrong. It's from tools/libxc/xc_sr_save.c line 136, which is part of
the machinery for constructing a memory batch.
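The batching machinery can be sketched roughly as follows. This is not the actual libxc code (which goes through xc_get_pfn_type_batch() with a batch size of 1024); the names and the callback shape are illustrative stand-ins showing how the first failing type lookup aborts the whole save:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BATCH_SIZE 4 /* tiny for illustration; libxc uses 1024 */

/* Hypothetical stand-in for the hypercall that resolves page types. */
typedef int (*type_batch_fn)(const unsigned long *pfns, size_t n);

/* Schematic of the save-side batching: pfns are accumulated and their
 * types fetched one batch at a time; the first failing batch aborts the
 * save with "Failed to get types for pfn batch". */
static int save_pfns_sketch(const unsigned long *pfns, size_t count,
                            type_batch_fn get_types)
{
    size_t i;
    for ( i = 0; i < count; i += MAX_BATCH_SIZE )
    {
        size_t n = count - i < MAX_BATCH_SIZE ? count - i : MAX_BATCH_SIZE;
        int rc = get_types(&pfns[i], n);
        if ( rc )
            return rc;
    }
    return 0;
}

/* Stub callbacks for demonstration. */
static int fake_types_ok(const unsigned long *p, size_t n)
{ (void)p; (void)n; return 0; }
static int fake_types_esrch(const unsigned long *p, size_t n)
{ (void)p; (void)n; return -3; /* mimic -ESRCH from a dying domain */ }
```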
I tried comparing this test with a successful one. I had to hunt a
bit to find one where the (inherently possibly-out-of-order) toolstack
messages were similar, but found 110010 (a linux-4.9 test) [1].
The first significant difference (excluding some variations of
addresses etc., and some messages about NUMA placement of the new
domain which presumably result from a different host) occur here:
libxl-save-helper: debug: starting restore: Success
xc: detail: fd 8, dom 8, hvm 0, pae 0, superpages 0, stream_type 0
xc: info: Found x86 HVM domain from Xen 4.9
xc: info: Restoring domain
libxl: debug: libxl_dom_suspend.c:179:domain_suspend_callback_common: Domain 7:Calling xc_domain_shutdown on HVM domain
libxl: debug: libxl_dom_suspend.c:294:domain_suspend_common_wait_guest: Domain 7:wait for the guest to suspend
libxl: debug: libxl_event.c:636:libxl__ev_xswatch_register: watch w=0x2179a40 wpath=@releaseDomain token=3/1: register slotnum=3
libxl: debug: libxl_event.c:573:watchfd_callback: watch w=0x2179a40 wpath=@releaseDomain token=3/1: event epath=@releaseDomain
libxl: debug: libxl_dom_suspend.c:352:suspend_common_wait_guest_check: Domain 7:guest has suspended
Looking at the serial logs for that and comparing them with 110009,
it's not terribly easy to see what's going on because the kernel
versions are different and so produce different messages about xenbr0
(and I think may have a different bridge port management algorithm).
But the messages about promiscuous mode seem the same, and of course
promiscuous mode is controlled by userspace, rather than by the kernel
(so should be the same in both).
However, in the failed test we see extra messages about promiscuous mode:
Jun 5 13:37:08.353656 [ 2191.652079] device vif7.0-emu left promiscuous mode
...
Jun 5 13:37:08.377571 [ 2191.675298] device vif7.0 left promiscuous mode
Also, the qemu log for the guest in the failure case says this:
Log-dirty command enable
Log-dirty: no command yet.
reset requested in cpu_handle_ioreq.
Issued domain 7 reboot
Whereas in the working tests we see something like this:
Log-dirty command enable
Log-dirty: no command yet.
dm-command: pause and save state
device model saving state
In the xl log in the failure case I see this:
libxl: debug: libxl_domain.c:773:domain_death_xswatch_callback: Domain 7:Exists shutdown_reported=0 dominf.flags=10106
libxl: debug: libxl_domain.c:785:domain_death_xswatch_callback: shutdown reporting
libxl: debug: libxl_domain.c:740:domain_death_xswatch_callback: [evg=0] all reported
libxl: debug: libxl_domain.c:802:domain_death_xswatch_callback: domain death search done
Domain 7 has shut down, reason code 1 0x1
Action for shutdown reason code 1 is restart
xl then tears down the domain's devices and destroys the domain.
All of this seems to suggest that the domain decided to reboot
mid-migration, which is pretty strange.
Ian.
[1] http://logs.test-lab.xenproject.org/osstest/logs/110010/test-amd64-amd64-xl-qemut-win7-amd64/info.html
* Re: [xen-unstable test] 110009: regressions - FAIL
From: Jan Beulich @ 2017-06-06 14:22 UTC (permalink / raw)
To: Ian Jackson
Cc: Andrew Cooper, Stefano Stabellini, Wei Liu, osstest-admin, xen-devel
>>> On 06.06.17 at 16:00, <ian.jackson@eu.citrix.com> wrote:
> Looking at the serial logs for that and comparing them with 10009,
> it's not terribly easy to see what's going on because the kernel
> versions are different and so produce different messages about xenbr0
> (and I think may have a different bridge port management algorithm).
>
> But the messages about promiscuous mode seem the same, and of course
> promiscuous mode is controlled by userspace, rather than by the kernel
> (so should be the same in both).
>
> However, in the failed test we see extra messages about promis:
>
> Jun 5 13:37:08.353656 [ 2191.652079] device vif7.0-emu left promiscuous
> mode
> ...
> Jun 5 13:37:08.377571 [ 2191.675298] device vif7.0 left promiscuous mode
Wouldn't those be another result of the guest shutting down /
being shut down?
> Also, the qemu log for the guest in the failure case says this:
>
> Log-dirty command enable
> Log-dirty: no command yet.
> reset requested in cpu_handle_ioreq.
So this would seem to call for instrumentation on the qemu side
then, as the only path via which this can be initiated is - afaics -
qemu_system_reset_request(), which doesn't have very many
callers that could possibly be of interest here. Adding Stefano ...
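Such instrumentation could be as small as a logging wrapper around the request path. This is purely hypothetical (qemu-traditional has no such hook), shown only to illustrate what attributing the caller might look like:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical instrumentation: wrap the reset-request entry point so
 * each call records its call site, letting the log distinguish e.g.
 * the ACPI path from the keyboard-controller path.  Sketch only. */
static void *last_reset_requester;

static void reset_request_traced(void)
{
    last_reset_requester = __builtin_return_address(0);
    fprintf(stderr, "reset requested from %p\n", last_reset_requester);
    /* real code would now call qemu_system_reset_request() */
}
```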
Jan
* Re: [xen-unstable test] 110009: regressions - FAIL
From: Stefano Stabellini @ 2017-06-06 19:19 UTC (permalink / raw)
To: Jan Beulich
Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
osstest-admin, xen-devel
On Tue, 6 Jun 2017, Jan Beulich wrote:
> >>> On 06.06.17 at 16:00, <ian.jackson@eu.citrix.com> wrote:
> > Looking at the serial logs for that and comparing them with 10009,
> > it's not terribly easy to see what's going on because the kernel
> > versions are different and so produce different messages about xenbr0
> > (and I think may have a different bridge port management algorithm).
> >
> > But the messages about promiscuous mode seem the same, and of course
> > promiscuous mode is controlled by userspace, rather than by the kernel
> > (so should be the same in both).
> >
> > However, in the failed test we see extra messages about promis:
> >
> > Jun 5 13:37:08.353656 [ 2191.652079] device vif7.0-emu left promiscuous
> > mode
> > ...
> > Jun 5 13:37:08.377571 [ 2191.675298] device vif7.0 left promiscuous mode
>
> Wouldn't those be another result of the guest shutting down /
> being shut down?
>
> > Also, the qemu log for the guest in the failure case says this:
> >
> > Log-dirty command enable
> > Log-dirty: no command yet.
> > reset requested in cpu_handle_ioreq.
>
> So this would seem to call for instrumentation on the qemu side
> then, as the only path via which this can be initiated is - afaics -
> qemu_system_reset_request(), which doesn't have very many
> callers that could possibly be of interest here. Adding Stefano ...
I am pretty sure that those messages come from qemu traditional: "reset
requested in cpu_handle_ioreq" is not printed by qemu-xen.
In any case, the request comes from qemu_system_reset_request, which is
called by hw/acpi.c:pm_ioport_writew. It looks like the guest OS
initiated the reset (or resume)?
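The decode such a PM1 control handler performs can be sketched generically. The bit layout below follows the ACPI spec's PM1 control register (SLP_EN at bit 13, SLP_TYP in bits 12:10); which SLP_TYP values qemu-traditional maps to shutdown versus reset is firmware-specific and deliberately not modelled here:

```c
#include <assert.h>
#include <stdint.h>

/* ACPI PM1 control register fields (per the ACPI spec, not qemu code). */
#define ACPI_PM1_SLP_EN     (1u << 13)
#define ACPI_PM1_SLP_TYP(v) (((v) >> 10) & 7)

enum pm_event { PM_NONE, PM_SLEEP_REQUEST };

/* Schematic of a pm_ioport_writew-style decode: a guest OUT to the
 * PM1a control port with SLP_EN set is a software-initiated power
 * transition, i.e. the guest itself asked for it -- which is the point
 * about the reset originating inside the guest. */
static enum pm_event decode_pm1_cnt_write(uint16_t val, unsigned *slp_typ)
{
    if ( !(val & ACPI_PM1_SLP_EN) )
        return PM_NONE;          /* ordinary register write */
    *slp_typ = ACPI_PM1_SLP_TYP(val);
    return PM_SLEEP_REQUEST;     /* qemu would act on *slp_typ here */
}
```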
* Re: [xen-unstable test] 110009: regressions - FAIL
From: Jan Beulich @ 2017-06-07 8:12 UTC (permalink / raw)
To: Stefano Stabellini
Cc: osstest-admin, Andrew Cooper, Wei Liu, Ian Jackson, xen-devel
>>> On 06.06.17 at 21:19, <sstabellini@kernel.org> wrote:
> On Tue, 6 Jun 2017, Jan Beulich wrote:
>> >>> On 06.06.17 at 16:00, <ian.jackson@eu.citrix.com> wrote:
>> > Looking at the serial logs for that and comparing them with 10009,
>> > it's not terribly easy to see what's going on because the kernel
>> > versions are different and so produce different messages about xenbr0
>> > (and I think may have a different bridge port management algorithm).
>> >
>> > But the messages about promiscuous mode seem the same, and of course
>> > promiscuous mode is controlled by userspace, rather than by the kernel
>> > (so should be the same in both).
>> >
>> > However, in the failed test we see extra messages about promis:
>> >
>> > Jun 5 13:37:08.353656 [ 2191.652079] device vif7.0-emu left promiscuous
>> > mode
>> > ...
>> > Jun 5 13:37:08.377571 [ 2191.675298] device vif7.0 left promiscuous mode
>>
>> Wouldn't those be another result of the guest shutting down /
>> being shut down?
>>
>> > Also, the qemu log for the guest in the failure case says this:
>> >
>> > Log-dirty command enable
>> > Log-dirty: no command yet.
>> > reset requested in cpu_handle_ioreq.
>>
>> So this would seem to call for instrumentation on the qemu side
>> then, as the only path via which this can be initiated is - afaics -
>> qemu_system_reset_request(), which doesn't have very many
>> callers that could possibly be of interest here. Adding Stefano ...
>
> I am pretty sure that those messages come from qemu traditional: "reset
> requested in cpu_handle_ioreq" is not printed by qemu-xen.
Oh, indeed - I didn't pay attention to this being a *-qemut-*
test. I'm sorry.
> In any case, the request comes from qemu_system_reset_request, which is
> called by hw/acpi.c:pm_ioport_writew. It looks like the guest OS
> initiated the reset (or resume)?
Right, this and hw/pckbd.c look to be the only possible
sources. Yet then it's still unclear what makes the guest go
down.
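For reference, the hw/pckbd.c path corresponds to the classic i8042 reset mechanism: a write of command 0xFE to the keyboard controller command port 0x64 pulses the CPU reset line. A minimal sketch of that decode (not the qemu source):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KBD_CMD_PORT   0x64  /* i8042 command/status port */
#define KBD_CCMD_RESET 0xfe  /* pulse output port: system reset */

/* Returns true when a guest port write is the i8042 pulse-reset
 * command -- the second possible reset source mentioned above. */
static bool i8042_write_triggers_reset(uint16_t port, uint8_t val)
{
    return port == KBD_CMD_PORT && val == KBD_CCMD_RESET;
}
```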
Jan
* Re: [xen-unstable test] 110009: regressions - FAIL
From: Jan Beulich @ 2017-06-09 8:19 UTC (permalink / raw)
To: Julien Grall, Andrew Cooper, George Dunlap
Cc: Ian Jackson, Stefano Stabellini, Wei Liu, osstest-admin, xen-devel
>>> On 07.06.17 at 10:12, <JBeulich@suse.com> wrote:
>>>> On 06.06.17 at 21:19, <sstabellini@kernel.org> wrote:
>> On Tue, 6 Jun 2017, Jan Beulich wrote:
>>> >>> On 06.06.17 at 16:00, <ian.jackson@eu.citrix.com> wrote:
>>> > Looking at the serial logs for that and comparing them with 10009,
>>> > it's not terribly easy to see what's going on because the kernel
>>> > versions are different and so produce different messages about xenbr0
>>> > (and I think may have a different bridge port management algorithm).
>>> >
>>> > But the messages about promiscuous mode seem the same, and of course
>>> > promiscuous mode is controlled by userspace, rather than by the kernel
>>> > (so should be the same in both).
>>> >
>>> > However, in the failed test we see extra messages about promis:
>>> >
>>> > Jun 5 13:37:08.353656 [ 2191.652079] device vif7.0-emu left promiscuous
>>> > mode
>>> > ...
>>> > Jun 5 13:37:08.377571 [ 2191.675298] device vif7.0 left promiscuous mode
>>>
>>> Wouldn't those be another result of the guest shutting down /
>>> being shut down?
>>>
>>> > Also, the qemu log for the guest in the failure case says this:
>>> >
>>> > Log-dirty command enable
>>> > Log-dirty: no command yet.
>>> > reset requested in cpu_handle_ioreq.
>>>
>>> So this would seem to call for instrumentation on the qemu side
>>> then, as the only path via which this can be initiated is - afaics -
>>> qemu_system_reset_request(), which doesn't have very many
>>> callers that could possibly be of interest here. Adding Stefano ...
>>
>> I am pretty sure that those messages come from qemu traditional: "reset
>> requested in cpu_handle_ioreq" is not printed by qemu-xen.
>
> Oh, indeed - I didn't pay attention to this being a *-qemut-*
> test. I'm sorry.
>
>> In any case, the request comes from qemu_system_reset_request, which is
>> called by hw/acpi.c:pm_ioport_writew. It looks like the guest OS
>> initiated the reset (or resume)?
>
> Right, this and hw/pckbd.c look to be the only possible
> sources. Yet then it's still unclear what makes the guest go
> down.
So with all of the above in mind I wonder whether we shouldn't
revert 933f966bcd then - that debugging code is unlikely to help
with any further analysis of the issue, as reaching that code
for a dying domain is only a symptom as far as we understand it
now, not anywhere near the cause.
Jan
* Re: [xen-unstable test] 110009: regressions - FAIL
From: Stefano Stabellini @ 2017-06-09 17:50 UTC (permalink / raw)
To: Jan Beulich
Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
Ian Jackson, osstest-admin, Julien Grall, xen-devel
On Fri, 9 Jun 2017, Jan Beulich wrote:
> >>> On 07.06.17 at 10:12, <JBeulich@suse.com> wrote:
> >>>> On 06.06.17 at 21:19, <sstabellini@kernel.org> wrote:
> >> On Tue, 6 Jun 2017, Jan Beulich wrote:
> >>> >>> On 06.06.17 at 16:00, <ian.jackson@eu.citrix.com> wrote:
> >>> > Looking at the serial logs for that and comparing them with 10009,
> >>> > it's not terribly easy to see what's going on because the kernel
> >>> > versions are different and so produce different messages about xenbr0
> >>> > (and I think may have a different bridge port management algorithm).
> >>> >
> >>> > But the messages about promiscuous mode seem the same, and of course
> >>> > promiscuous mode is controlled by userspace, rather than by the kernel
> >>> > (so should be the same in both).
> >>> >
> >>> > However, in the failed test we see extra messages about promis:
> >>> >
> >>> > Jun 5 13:37:08.353656 [ 2191.652079] device vif7.0-emu left promiscuous
> >>> > mode
> >>> > ...
> >>> > Jun 5 13:37:08.377571 [ 2191.675298] device vif7.0 left promiscuous mode
> >>>
> >>> Wouldn't those be another result of the guest shutting down /
> >>> being shut down?
> >>>
> >>> > Also, the qemu log for the guest in the failure case says this:
> >>> >
> >>> > Log-dirty command enable
> >>> > Log-dirty: no command yet.
> >>> > reset requested in cpu_handle_ioreq.
> >>>
> >>> So this would seem to call for instrumentation on the qemu side
> >>> then, as the only path via which this can be initiated is - afaics -
> >>> qemu_system_reset_request(), which doesn't have very many
> >>> callers that could possibly be of interest here. Adding Stefano ...
> >>
> >> I am pretty sure that those messages come from qemu traditional: "reset
> >> requested in cpu_handle_ioreq" is not printed by qemu-xen.
> >
> > Oh, indeed - I didn't pay attention to this being a *-qemut-*
> > test. I'm sorry.
> >
> >> In any case, the request comes from qemu_system_reset_request, which is
> >> called by hw/acpi.c:pm_ioport_writew. It looks like the guest OS
> >> initiated the reset (or resume)?
> >
> > Right, this and hw/pckbd.c look to be the only possible
> > sources. Yet then it's still unclear what makes the guest go
> > down.
>
> So with all of the above in mind I wonder whether we shouldn't
> revert 933f966bcd then - that debugging code is unlikely to help
> with any further analysis of the issue, as reaching that code
> for a dying domain is only a symptom as far as we understand it
> now, not anywhere near the cause.
Makes sense to me
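[Editor's note: the qemu-side instrumentation Jan proposed earlier in the thread amounts to recording who called `qemu_system_reset_request()`. A minimal sketch of the idea follows — the real change would be a small C patch to qemu-traditional (e.g. fprintf plus a backtrace at the call site); the Python below, including the `pm_ioport_writew` caller, is an illustrative stand-in.]

```python
import traceback

# Sketch of the suggested instrumentation: capture the call stack each
# time a reset is requested, so a spurious request can be traced back
# to its caller after the fact.

reset_requested = False
last_stack = ""

def qemu_system_reset_request():
    global reset_requested, last_stack
    reset_requested = True
    # A C patch would log a backtrace here; we capture the Python
    # stack instead, purely for illustration.
    last_stack = "".join(traceback.format_stack())
    print("reset requested from:")
    print(last_stack)

def pm_ioport_writew(addr, val):
    # Hypothetical caller, mirroring the hw/acpi.c path named above.
    qemu_system_reset_request()

pm_ioport_writew(0x04, 1 << 13)
```

Running this prints a stack trace ending in `pm_ioport_writew`, which is exactly the kind of evidence the thread was missing about what drives the guest down.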
* Re: [xen-unstable test] 110009: regressions - FAIL
2017-06-09 8:19 ` Jan Beulich
2017-06-09 17:50 ` Stefano Stabellini
@ 2017-06-12 14:30 ` Julien Grall
2017-06-12 14:57 ` Jan Beulich
1 sibling, 1 reply; 14+ messages in thread
From: Julien Grall @ 2017-06-12 14:30 UTC (permalink / raw)
To: Jan Beulich, Andrew Cooper, George Dunlap
Cc: Ian Jackson, Stefano Stabellini, Wei Liu, osstest-admin, xen-devel
Hi Jan,
On 09/06/17 09:19, Jan Beulich wrote:
>>>> On 07.06.17 at 10:12, <JBeulich@suse.com> wrote:
>>>>> On 06.06.17 at 21:19, <sstabellini@kernel.org> wrote:
>>> On Tue, 6 Jun 2017, Jan Beulich wrote:
>>>>>>> On 06.06.17 at 16:00, <ian.jackson@eu.citrix.com> wrote:
>>>>> Looking at the serial logs for that and comparing them with 10009,
>>>>> it's not terribly easy to see what's going on because the kernel
>>>>> versions are different and so produce different messages about xenbr0
>>>>> (and I think may have a different bridge port management algorithm).
>>>>>
>>>>> But the messages about promiscuous mode seem the same, and of course
>>>>> promiscuous mode is controlled by userspace, rather than by the kernel
>>>>> (so should be the same in both).
>>>>>
>>>>> However, in the failed test we see extra messages about promis:
>>>>>
>>>>> Jun 5 13:37:08.353656 [ 2191.652079] device vif7.0-emu left promiscuous
>>>>> mode
>>>>> ...
>>>>> Jun 5 13:37:08.377571 [ 2191.675298] device vif7.0 left promiscuous mode
>>>>
>>>> Wouldn't those be another result of the guest shutting down /
>>>> being shut down?
>>>>
>>>>> Also, the qemu log for the guest in the failure case says this:
>>>>>
>>>>> Log-dirty command enable
>>>>> Log-dirty: no command yet.
>>>>> reset requested in cpu_handle_ioreq.
>>>>
>>>> So this would seem to call for instrumentation on the qemu side
>>>> then, as the only path via which this can be initiated is - afaics -
>>>> qemu_system_reset_request(), which doesn't have very many
>>>> callers that could possibly be of interest here. Adding Stefano ...
>>>
>>> I am pretty sure that those messages come from qemu traditional: "reset
>>> requested in cpu_handle_ioreq" is not printed by qemu-xen.
>>
>> Oh, indeed - I didn't pay attention to this being a *-qemut-*
>> test. I'm sorry.
>>
>>> In any case, the request comes from qemu_system_reset_request, which is
>>> called by hw/acpi.c:pm_ioport_writew. It looks like the guest OS
>>> initiated the reset (or resume)?
>>
>> Right, this and hw/pckbd.c look to be the only possible
>> sources. Yet then it's still unclear what makes the guest go
>> down.
>
> So with all of the above in mind I wonder whether we shouldn't
> revert 933f966bcd then - that debugging code is unlikely to help
> with any further analysis of the issue, as reaching that code
> for a dying domain is only a symptom as far as we understand it
> now, not anywhere near the cause.
Are you suggesting to revert on Xen 4.9?
Cheers,
--
Julien Grall
* Re: [xen-unstable test] 110009: regressions - FAIL
2017-06-12 14:30 ` Julien Grall
@ 2017-06-12 14:57 ` Jan Beulich
2017-06-13 9:30 ` Julien Grall
0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2017-06-12 14:57 UTC (permalink / raw)
To: Julien Grall
Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
Ian Jackson, osstest-admin, xen-devel
>>> On 12.06.17 at 16:30, <julien.grall@arm.com> wrote:
> On 09/06/17 09:19, Jan Beulich wrote:
>>>>> On 07.06.17 at 10:12, <JBeulich@suse.com> wrote:
>>>>>> On 06.06.17 at 21:19, <sstabellini@kernel.org> wrote:
>>>> On Tue, 6 Jun 2017, Jan Beulich wrote:
>>>>>>>> On 06.06.17 at 16:00, <ian.jackson@eu.citrix.com> wrote:
>>>>>> Looking at the serial logs for that and comparing them with 10009,
>>>>>> it's not terribly easy to see what's going on because the kernel
>>>>>> versions are different and so produce different messages about xenbr0
>>>>>> (and I think may have a different bridge port management algorithm).
>>>>>>
>>>>>> But the messages about promiscuous mode seem the same, and of course
>>>>>> promiscuous mode is controlled by userspace, rather than by the kernel
>>>>>> (so should be the same in both).
>>>>>>
>>>>>> However, in the failed test we see extra messages about promis:
>>>>>>
>>>>>> Jun 5 13:37:08.353656 [ 2191.652079] device vif7.0-emu left promiscuous
>>>>>> mode
>>>>>> ...
>>>>>> Jun 5 13:37:08.377571 [ 2191.675298] device vif7.0 left promiscuous mode
>>>>>
>>>>> Wouldn't those be another result of the guest shutting down /
>>>>> being shut down?
>>>>>
>>>>>> Also, the qemu log for the guest in the failure case says this:
>>>>>>
>>>>>> Log-dirty command enable
>>>>>> Log-dirty: no command yet.
>>>>>> reset requested in cpu_handle_ioreq.
>>>>>
>>>>> So this would seem to call for instrumentation on the qemu side
>>>>> then, as the only path via which this can be initiated is - afaics -
>>>>> qemu_system_reset_request(), which doesn't have very many
>>>>> callers that could possibly be of interest here. Adding Stefano ...
>>>>
>>>> I am pretty sure that those messages come from qemu traditional: "reset
>>>> requested in cpu_handle_ioreq" is not printed by qemu-xen.
>>>
>>> Oh, indeed - I didn't pay attention to this being a *-qemut-*
>>> test. I'm sorry.
>>>
>>>> In any case, the request comes from qemu_system_reset_request, which is
>>>> called by hw/acpi.c:pm_ioport_writew. It looks like the guest OS
>>>> initiated the reset (or resume)?
>>>
>>> Right, this and hw/pckbd.c look to be the only possible
>>> sources. Yet then it's still unclear what makes the guest go
>>> down.
>>
>> So with all of the above in mind I wonder whether we shouldn't
>> revert 933f966bcd then - that debugging code is unlikely to help
>> with any further analysis of the issue, as reaching that code
>> for a dying domain is only a symptom as far as we understand it
>> now, not anywhere near the cause.
>
> Are you suggesting to revert on Xen 4.9?
Yes, if we revert now, then I'd say on both master and 4.9.
Jan
* Re: [xen-unstable test] 110009: regressions - FAIL
2017-06-12 14:57 ` Jan Beulich
@ 2017-06-13 9:30 ` Julien Grall
2017-06-14 9:23 ` George Dunlap
0 siblings, 1 reply; 14+ messages in thread
From: Julien Grall @ 2017-06-13 9:30 UTC (permalink / raw)
To: Jan Beulich
Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
Ian Jackson, osstest-admin, xen-devel, nd
Hi Jan,
On 12/06/2017 15:57, Jan Beulich wrote:
>>>> On 12.06.17 at 16:30, <julien.grall@arm.com> wrote:
>> On 09/06/17 09:19, Jan Beulich wrote:
>>>>>> On 07.06.17 at 10:12, <JBeulich@suse.com> wrote:
>>>>>>> On 06.06.17 at 21:19, <sstabellini@kernel.org> wrote:
>>>>> On Tue, 6 Jun 2017, Jan Beulich wrote:
>>>>>>>>> On 06.06.17 at 16:00, <ian.jackson@eu.citrix.com> wrote:
>>>>>>> Looking at the serial logs for that and comparing them with 10009,
>>>>>>> it's not terribly easy to see what's going on because the kernel
>>>>>>> versions are different and so produce different messages about xenbr0
>>>>>>> (and I think may have a different bridge port management algorithm).
>>>>>>>
>>>>>>> But the messages about promiscuous mode seem the same, and of course
>>>>>>> promiscuous mode is controlled by userspace, rather than by the kernel
>>>>>>> (so should be the same in both).
>>>>>>>
>>>>>>> However, in the failed test we see extra messages about promis:
>>>>>>>
>>>>>>> Jun 5 13:37:08.353656 [ 2191.652079] device vif7.0-emu left promiscuous
>>>>>>> mode
>>>>>>> ...
>>>>>>> Jun 5 13:37:08.377571 [ 2191.675298] device vif7.0 left promiscuous mode
>>>>>>
>>>>>> Wouldn't those be another result of the guest shutting down /
>>>>>> being shut down?
>>>>>>
>>>>>>> Also, the qemu log for the guest in the failure case says this:
>>>>>>>
>>>>>>> Log-dirty command enable
>>>>>>> Log-dirty: no command yet.
>>>>>>> reset requested in cpu_handle_ioreq.
>>>>>>
>>>>>> So this would seem to call for instrumentation on the qemu side
>>>>>> then, as the only path via which this can be initiated is - afaics -
>>>>>> qemu_system_reset_request(), which doesn't have very many
>>>>>> callers that could possibly be of interest here. Adding Stefano ...
>>>>>
>>>>> I am pretty sure that those messages come from qemu traditional: "reset
>>>>> requested in cpu_handle_ioreq" is not printed by qemu-xen.
>>>>
>>>> Oh, indeed - I didn't pay attention to this being a *-qemut-*
>>>> test. I'm sorry.
>>>>
>>>>> In any case, the request comes from qemu_system_reset_request, which is
>>>>> called by hw/acpi.c:pm_ioport_writew. It looks like the guest OS
>>>>> initiated the reset (or resume)?
>>>>
>>>> Right, this and hw/pckbd.c look to be the only possible
>>>> sources. Yet then it's still unclear what makes the guest go
>>>> down.
>>>
>>> So with all of the above in mind I wonder whether we shouldn't
>>> revert 933f966bcd then - that debugging code is unlikely to help
>>> with any further analysis of the issue, as reaching that code
>>> for a dying domain is only a symptom as far as we understand it
>>> now, not anywhere near the cause.
>>
>> Are you suggesting to revert on Xen 4.9?
>
> Yes, if we revert now, then I'd say on both master and 4.9.
I would be ok with that.
Cheers,
--
Julien Grall
* Re: [xen-unstable test] 110009: regressions - FAIL
2017-06-13 9:30 ` Julien Grall
@ 2017-06-14 9:23 ` George Dunlap
0 siblings, 0 replies; 14+ messages in thread
From: George Dunlap @ 2017-06-14 9:23 UTC (permalink / raw)
To: Julien Grall
Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
osstest-admin, Jan Beulich, xen-devel, nd
On Tue, Jun 13, 2017 at 10:30 AM, Julien Grall <julien.grall@arm.com> wrote:
> Hi Jan,
>
>
> On 12/06/2017 15:57, Jan Beulich wrote:
>>>>>
>>>>> On 12.06.17 at 16:30, <julien.grall@arm.com> wrote:
>>>
>>> On 09/06/17 09:19, Jan Beulich wrote:
>>>>>>>
>>>>>>> On 07.06.17 at 10:12, <JBeulich@suse.com> wrote:
>>>>>>>>
>>>>>>>> On 06.06.17 at 21:19, <sstabellini@kernel.org> wrote:
>>>>>>
>>>>>> On Tue, 6 Jun 2017, Jan Beulich wrote:
>>>>>>>>>>
>>>>>>>>>> On 06.06.17 at 16:00, <ian.jackson@eu.citrix.com> wrote:
>>>>>>>>
>>>>>>>> Looking at the serial logs for that and comparing them with 10009,
>>>>>>>> it's not terribly easy to see what's going on because the kernel
>>>>>>>> versions are different and so produce different messages about
>>>>>>>> xenbr0
>>>>>>>> (and I think may have a different bridge port management algorithm).
>>>>>>>>
>>>>>>>> But the messages about promiscuous mode seem the same, and of course
>>>>>>>> promiscuous mode is controlled by userspace, rather than by the
>>>>>>>> kernel
>>>>>>>> (so should be the same in both).
>>>>>>>>
>>>>>>>> However, in the failed test we see extra messages about promis:
>>>>>>>>
>>>>>>>> Jun 5 13:37:08.353656 [ 2191.652079] device vif7.0-emu left
>>>>>>>> promiscuous
>>>>>>>> mode
>>>>>>>> ...
>>>>>>>> Jun 5 13:37:08.377571 [ 2191.675298] device vif7.0 left
>>>>>>>> promiscuous mode
>>>>>>>
>>>>>>>
>>>>>>> Wouldn't those be another result of the guest shutting down /
>>>>>>> being shut down?
>>>>>>>
>>>>>>>> Also, the qemu log for the guest in the failure case says this:
>>>>>>>>
>>>>>>>> Log-dirty command enable
>>>>>>>> Log-dirty: no command yet.
>>>>>>>> reset requested in cpu_handle_ioreq.
>>>>>>>
>>>>>>>
>>>>>>> So this would seem to call for instrumentation on the qemu side
>>>>>>> then, as the only path via which this can be initiated is - afaics -
>>>>>>> qemu_system_reset_request(), which doesn't have very many
>>>>>>> callers that could possibly be of interest here. Adding Stefano ...
>>>>>>
>>>>>>
>>>>>> I am pretty sure that those messages come from qemu traditional:
>>>>>> "reset
>>>>>> requested in cpu_handle_ioreq" is not printed by qemu-xen.
>>>>>
>>>>>
>>>>> Oh, indeed - I didn't pay attention to this being a *-qemut-*
>>>>> test. I'm sorry.
>>>>>
>>>>>> In any case, the request comes from qemu_system_reset_request, which
>>>>>> is
>>>>>> called by hw/acpi.c:pm_ioport_writew. It looks like the guest OS
>>>>>> initiated the reset (or resume)?
>>>>>
>>>>>
>>>>> Right, this and hw/pckbd.c look to be the only possible
>>>>> sources. Yet then it's still unclear what makes the guest go
>>>>> down.
>>>>
>>>>
>>>> So with all of the above in mind I wonder whether we shouldn't
>>>> revert 933f966bcd then - that debugging code is unlikely to help
>>>> with any further analysis of the issue, as reaching that code
>>>> for a dying domain is only a symptom as far as we understand it
>>>> now, not anywhere near the cause.
>>>
>>>
>>> Are you suggesting to revert on Xen 4.9?
>>
>>
>> Yes, if we revert now, then I'd say on both master and 4.9.
>
>
> I would be ok with that.
Reverting 933f966bcd
Acked-by: George Dunlap <george.dunlap@citrix.com>
Thread overview: 14+ messages
2017-06-05 16:55 [xen-unstable test] 110009: regressions - FAIL osstest service owner
2017-06-06 12:59 ` Jan Beulich
2017-06-06 13:20 ` Andrew Cooper
2017-06-06 14:00 ` Jan Beulich
2017-06-06 14:00 ` Ian Jackson
2017-06-06 14:22 ` Jan Beulich
2017-06-06 19:19 ` Stefano Stabellini
2017-06-07 8:12 ` Jan Beulich
2017-06-09 8:19 ` Jan Beulich
2017-06-09 17:50 ` Stefano Stabellini
2017-06-12 14:30 ` Julien Grall
2017-06-12 14:57 ` Jan Beulich
2017-06-13 9:30 ` Julien Grall
2017-06-14 9:23 ` George Dunlap