* [xen-4.9-testing test] 126201: regressions - FAIL
@ 2018-08-21 1:11 osstest service owner
2018-08-21 11:14 ` Jan Beulich
[not found] ` <5B7BF42E02000078001E06A7@suse.com>
0 siblings, 2 replies; 10+ messages in thread
From: osstest service owner @ 2018-08-21 1:11 UTC (permalink / raw)
To: xen-devel, osstest-admin
flight 126201 xen-4.9-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/126201/
Regressions :-(
Tests which did not succeed and are blocking,
including tests which could not be run:
test-amd64-amd64-libvirt-pair 22 guest-migrate/src_host/dst_host fail REGR. vs. 124328
Tests which are failing intermittently (not blocking):
test-armhf-armhf-xl-arndale 5 host-ping-check-native fail in 126075 pass in 126201
test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 16 guest-localmigrate/x10 fail pass in 126075
Regressions which are regarded as allowable (not blocking):
test-amd64-amd64-xl-rtds 10 debian-install fail REGR. vs. 124328
Tests which did not succeed, but are not blocking:
test-amd64-amd64-xl-qemuu-ws16-amd64 18 guest-start/win.repeat fail blocked in 124328
test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail blocked in 124328
test-amd64-amd64-xl-qemuu-ws16-amd64 14 guest-localmigrate fail in 126075 like 124248
test-amd64-i386-xl-qemut-ws16-amd64 18 guest-start/win.repeat fail in 126075 like 124248
test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail in 126075 like 124328
test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop fail in 126075 like 124328
test-amd64-i386-libvirt-pair 22 guest-migrate/src_host/dst_host fail like 124248
test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 124248
test-amd64-i386-xl-qemuu-win7-amd64 16 guest-localmigrate/x10 fail like 124248
test-amd64-i386-xl-qemuu-ws16-amd64 16 guest-localmigrate/x10 fail like 124248
test-armhf-armhf-xl-rtds 16 guest-start/debian.repeat fail like 124328
test-amd64-amd64-xl-qemut-win7-amd64 16 guest-localmigrate/x10 fail like 124328
test-amd64-i386-xl-qemut-ws16-amd64 16 guest-localmigrate/x10 fail like 124328
test-amd64-amd64-libvirt 13 migrate-support-check fail never pass
test-amd64-i386-libvirt 13 migrate-support-check fail never pass
test-amd64-i386-libvirt-xsm 13 migrate-support-check fail never pass
test-amd64-amd64-libvirt-xsm 13 migrate-support-check fail never pass
test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
test-armhf-armhf-xl-arndale 13 migrate-support-check fail never pass
test-armhf-armhf-xl-arndale 14 saverestore-support-check fail never pass
test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass
test-armhf-armhf-xl-rtds 13 migrate-support-check fail never pass
test-armhf-armhf-xl-rtds 14 saverestore-support-check fail never pass
test-amd64-amd64-libvirt-vhd 12 migrate-support-check fail never pass
test-armhf-armhf-xl-cubietruck 13 migrate-support-check fail never pass
test-armhf-armhf-xl-cubietruck 14 saverestore-support-check fail never pass
test-armhf-armhf-xl 13 migrate-support-check fail never pass
test-armhf-armhf-xl 14 saverestore-support-check fail never pass
test-armhf-armhf-xl-credit2 13 migrate-support-check fail never pass
test-armhf-armhf-xl-credit2 14 saverestore-support-check fail never pass
test-armhf-armhf-libvirt 13 migrate-support-check fail never pass
test-armhf-armhf-libvirt 14 saverestore-support-check fail never pass
test-armhf-armhf-xl-vhd 12 migrate-support-check fail never pass
test-armhf-armhf-xl-vhd 13 saverestore-support-check fail never pass
test-armhf-armhf-xl-multivcpu 13 migrate-support-check fail never pass
test-armhf-armhf-xl-multivcpu 14 saverestore-support-check fail never pass
test-armhf-armhf-libvirt-raw 12 migrate-support-check fail never pass
test-armhf-armhf-libvirt-raw 13 saverestore-support-check fail never pass
test-amd64-amd64-xl-qemuu-win10-i386 10 windows-install fail never pass
test-amd64-amd64-xl-qemut-win10-i386 10 windows-install fail never pass
test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass
test-amd64-i386-xl-qemut-win10-i386 10 windows-install fail never pass
version targeted for testing:
xen 6c9d139cdd0289f2b35b5deea4b41b8e3e1b39b7
baseline version:
xen 238007d6fae9447bf5e8e73d67ae9fb844e7ff2a
Last test of basis 124328 2018-06-17 23:39:07 Z 64 days
Failing since 124807 2018-06-28 17:38:04 Z 53 days 33 attempts
Testing same since 125922 2018-08-15 14:57:13 Z 5 days 3 attempts
------------------------------------------------------------
People who touched revisions under test:
Andrew Cooper <andrew.cooper3@citrix.com>
Christian Lindig <christian.lindig@citrix.com>
George Dunlap <dunlapg@umich.edu>
George Dunlap <george.dunlap@citrix.com>
Ian Jackson <Ian.Jackson@eu.citrix.com>
Jan Beulich <jbeulich@suse.com>
Juergen Gross <jgross@suse.com>
Julien Grall <julien.grall@arm.com>
Kevin Tian <kevin.tian@intel.com>
Lars Kurth <lars.kurth.xen@gmail.com>
Paul Durrant <paul.durrant@citrix.com>
Stefano Stabellini <sstabellini@kernel.org>
Stewart Hildebrand <stewart.hildebrand@dornerworks.com>
Wei Liu <wei.liu2@citrix.com>
jobs:
build-amd64-xsm pass
build-i386-xsm pass
build-amd64-xtf pass
build-amd64 pass
build-armhf pass
build-i386 pass
build-amd64-libvirt pass
build-armhf-libvirt pass
build-i386-libvirt pass
build-amd64-prev pass
build-i386-prev pass
build-amd64-pvops pass
build-armhf-pvops pass
build-i386-pvops pass
build-amd64-rumprun pass
build-i386-rumprun pass
test-xtf-amd64-amd64-1 pass
test-xtf-amd64-amd64-2 pass
test-xtf-amd64-amd64-3 pass
test-xtf-amd64-amd64-4 pass
test-xtf-amd64-amd64-5 pass
test-amd64-amd64-xl pass
test-armhf-armhf-xl pass
test-amd64-i386-xl pass
test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm pass
test-amd64-i386-xl-qemut-debianhvm-amd64-xsm pass
test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm pass
test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm pass
test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm pass
test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm pass
test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm fail
test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm pass
test-amd64-amd64-libvirt-xsm pass
test-amd64-i386-libvirt-xsm pass
test-amd64-amd64-xl-xsm pass
test-amd64-i386-xl-xsm pass
test-amd64-amd64-qemuu-nested-amd fail
test-amd64-i386-qemut-rhel6hvm-amd pass
test-amd64-i386-qemuu-rhel6hvm-amd pass
test-amd64-amd64-xl-qemut-debianhvm-amd64 pass
test-amd64-i386-xl-qemut-debianhvm-amd64 pass
test-amd64-amd64-xl-qemuu-debianhvm-amd64 pass
test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
test-amd64-i386-freebsd10-amd64 pass
test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
test-amd64-i386-xl-qemuu-ovmf-amd64 pass
test-amd64-amd64-rumprun-amd64 pass
test-amd64-amd64-xl-qemut-win7-amd64 fail
test-amd64-i386-xl-qemut-win7-amd64 fail
test-amd64-amd64-xl-qemuu-win7-amd64 fail
test-amd64-i386-xl-qemuu-win7-amd64 fail
test-amd64-amd64-xl-qemut-ws16-amd64 pass
test-amd64-i386-xl-qemut-ws16-amd64 fail
test-amd64-amd64-xl-qemuu-ws16-amd64 fail
test-amd64-i386-xl-qemuu-ws16-amd64 fail
test-armhf-armhf-xl-arndale pass
test-amd64-amd64-xl-credit2 pass
test-armhf-armhf-xl-credit2 pass
test-armhf-armhf-xl-cubietruck pass
test-amd64-i386-freebsd10-i386 pass
test-amd64-i386-rumprun-i386 pass
test-amd64-amd64-xl-qemut-win10-i386 fail
test-amd64-i386-xl-qemut-win10-i386 fail
test-amd64-amd64-xl-qemuu-win10-i386 fail
test-amd64-i386-xl-qemuu-win10-i386 fail
test-amd64-amd64-qemuu-nested-intel pass
test-amd64-i386-qemut-rhel6hvm-intel pass
test-amd64-i386-qemuu-rhel6hvm-intel pass
test-amd64-amd64-libvirt pass
test-armhf-armhf-libvirt pass
test-amd64-i386-libvirt pass
test-amd64-amd64-livepatch pass
test-amd64-i386-livepatch pass
test-amd64-amd64-migrupgrade pass
test-amd64-i386-migrupgrade pass
test-amd64-amd64-xl-multivcpu pass
test-armhf-armhf-xl-multivcpu pass
test-amd64-amd64-pair pass
test-amd64-i386-pair pass
test-amd64-amd64-libvirt-pair fail
test-amd64-i386-libvirt-pair fail
test-amd64-amd64-amd64-pvgrub pass
test-amd64-amd64-i386-pvgrub pass
test-amd64-amd64-pygrub pass
test-amd64-amd64-xl-qcow2 pass
test-armhf-armhf-libvirt-raw pass
test-amd64-i386-xl-raw pass
test-amd64-amd64-xl-rtds fail
test-armhf-armhf-xl-rtds fail
test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow pass
test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow pass
test-amd64-amd64-xl-shadow pass
test-amd64-i386-xl-shadow pass
test-amd64-amd64-libvirt-vhd pass
test-armhf-armhf-xl-vhd pass
------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images
Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs
Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master
Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary
Not pushing.
(No revision log; it would be 1277 lines long.)
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
* Re: [xen-4.9-testing test] 126201: regressions - FAIL
2018-08-21 1:11 [xen-4.9-testing test] 126201: regressions - FAIL osstest service owner
@ 2018-08-21 11:14 ` Jan Beulich
2018-08-21 11:44 ` Roger Pau Monné
[not found] ` <5B7BF42E02000078001E06A7@suse.com>
1 sibling, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2018-08-21 11:14 UTC (permalink / raw)
To: osstest service owner; +Cc: xen-devel, Jim Fehlig
>>> On 21.08.18 at 03:11, <osstest-admin@xenproject.org> wrote:
> flight 126201 xen-4.9-testing real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/126201/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
> test-amd64-amd64-libvirt-pair 22 guest-migrate/src_host/dst_host fail REGR. vs. 124328
Something needs to be done about this, as this continued failure is
blocking the 4.9.3 release. I already mailed about this on Aug 2nd
for flight 125710; this is what I got back from Wei:
>This is libvirtd's error message.
>
>The remote host can't obtain the state change lock because it is already
>held by another task/thread. It could be a libvirt / libxl bug.
>
>2018-08-01 16:12:13.433+0000: 3491: warning : libxlDomainObjBeginJob:151 :
>Cannot start job (modify) for domain debian.guest.osstest; current job is (modify) owned by (24975)
Apparently the same issue is blocking 4.7, and I think it is only
because of some earlier force-push and/or "fail pass in" that 4.8
and 4.6 aren't blocked by this. The failures always seem to be on
the joubertins. 4.10, 4.11, and master all have entries on these
hosts (some not very recent, but anyway), and hence might be
fine.
Jan
* Re: [xen-4.9-testing test] 126201: regressions - FAIL
2018-08-21 11:14 ` Jan Beulich
@ 2018-08-21 11:44 ` Roger Pau Monné
2018-08-21 11:58 ` Jan Beulich
0 siblings, 1 reply; 10+ messages in thread
From: Roger Pau Monné @ 2018-08-21 11:44 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel, Jim Fehlig, osstest service owner
On Tue, Aug 21, 2018 at 05:14:54AM -0600, Jan Beulich wrote:
> >>> On 21.08.18 at 03:11, <osstest-admin@xenproject.org> wrote:
> > flight 126201 xen-4.9-testing real [real]
> > http://logs.test-lab.xenproject.org/osstest/logs/126201/
> >
> > Regressions :-(
> >
> > Tests which did not succeed and are blocking,
> > including tests which could not be run:
> > test-amd64-amd64-libvirt-pair 22 guest-migrate/src_host/dst_host fail REGR. vs. 124328
>
> Something needs to be done about this, as this continued failure is
> blocking the 4.9.3 release. I already mailed about this on Aug 2nd
> for flight 125710; this is what I got back from Wei:
>
> >This is libvirtd's error message.
> >
> >The remote host can't obtain the state change lock because it is already
> >held by another task/thread. It could be a libvirt / libxl bug.
> >
> >2018-08-01 16:12:13.433+0000: 3491: warning : libxlDomainObjBeginJob:151 :
> >Cannot start job (modify) for domain debian.guest.osstest; current job is (modify) owned by (24975)
The error seems mostly the same; it happens on the
destination host. From the libvirt log:
2018-08-19 17:05:19.183+0000: 24982: debug : virEventPollInterruptLocked:726 : Interrupting
2018-08-19 17:05:19.183+0000: 24982: info : virEventPollAddTimeout:253 : EVENT_POLL_ADD_TIMEOUT: timer=3 frequency=60000 cb=0x7f84db8bf87e opaque=0x7f84a00f7240 ff=0x7f84db8bf632
2018-08-19 17:05:19.183+0000: 24982: debug : libvirt_vmessage:76 : libvirt_vmessage: context='libxl' format='%s%s%s%s%s%s'
2018-08-19 17:05:19.183+0000: 24982: info : virEventPollUpdateHandle:152 : EVENT_POLL_UPDATE_HANDLE: watch=10 events=5
2018-08-19 17:05:19.183+0000: 24982: debug : virEventPollInterruptLocked:726 : Interrupting
2018-08-19 17:05:19.188+0000: 24982: debug : libvirt_vmessage:76 : libvirt_vmessage: context='libxl' format='%s%s%s%s%s%s'
2018-08-19 17:05:19.188+0000: 24982: debug : libvirt_vmessage:76 : libvirt_vmessage: context='libxl' format='%s%s%s%s%s%s'
[...]
2018-08-19 17:05:49.253+0000: 3492: warning : libxlDomainObjBeginJob:151 : Cannot start job (modify) for domain debian.guest.osstest; current job is (modify) owned by (24982)
2018-08-19 17:05:49.253+0000: 3492: error : libxlDomainObjBeginJob:155 : Timed out during operation: cannot acquire state change lock
However, I have no idea what's going on.
> Apparently the same issue is blocking 4.7, and I think it is only
> because of some earlier force-push and/or "fail pass in" that 4.8
> and 4.6 aren't blocked by this. The failures always seem to be on
> the joubertins. 4.10, 4.11, and master all have entries on these
> hosts (some not very recent, but anyway), and hence might be
> fine.
AFAICT it only happens with Xen <= 4.9?
Roger.
* Re: [xen-4.9-testing test] 126201: regressions - FAIL
2018-08-21 11:44 ` Roger Pau Monné
@ 2018-08-21 11:58 ` Jan Beulich
0 siblings, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2018-08-21 11:58 UTC (permalink / raw)
To: Roger Pau Monne; +Cc: xen-devel, Jim Fehlig, osstest service owner
>>> On 21.08.18 at 13:44, <roger.pau@citrix.com> wrote:
> On Tue, Aug 21, 2018 at 05:14:54AM -0600, Jan Beulich wrote:
>> Apparently the same issue is blocking 4.7, and I think it is only
>> because of some earlier force-push and/or "fail pass in" that 4.8
>> and 4.6 aren't blocked by this. The failures always seem to be on
>> the joubertins. 4.10, 4.11, and master all have entries on these
>> hosts (some not very recent, but anyway), and hence might be
>> fine.
>
> AFAICT it only happens with Xen <= 4.9?
That's what it currently looks like, and also only on the joubertins.
I have no idea why either of the two criteria would matter;
according to the test history the libvirt commit hasn't changed on
those branches for quite a long time.
Jan
* Re: [xen-4.9-testing test] 126201: regressions - FAIL
[not found] ` <5B7BF42E02000078001E06A7@suse.com>
@ 2018-08-22 22:52 ` Jim Fehlig
2018-08-24 8:58 ` Wei Liu
0 siblings, 1 reply; 10+ messages in thread
From: Jim Fehlig @ 2018-08-22 22:52 UTC (permalink / raw)
To: Jan Beulich, osstest service owner; +Cc: xen-devel
On 08/21/2018 05:14 AM, Jan Beulich wrote:
>>>> On 21.08.18 at 03:11, <osstest-admin@xenproject.org> wrote:
>> flight 126201 xen-4.9-testing real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/126201/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>> test-amd64-amd64-libvirt-pair 22 guest-migrate/src_host/dst_host fail REGR. vs. 124328
>
> Something needs to be done about this, as this continued failure is
> blocking the 4.9.3 release. I already mailed about this on Aug 2nd
> for flight 125710; this is what I got back from Wei:
>
>> This is libvirtd's error message.
>>
>> The remote host can't obtain the state change lock because it is already
>> held by another task/thread. It could be a libvirt / libxl bug.
>>
>> 2018-08-01 16:12:13.433+0000: 3491: warning : libxlDomainObjBeginJob:151 :
>> Cannot start job (modify) for domain debian.guest.osstest; current job is (modify) owned by (24975)
I took a closer look at the logs and it appears the finish phase of migration
fails to acquire the domain job lock since it is already held by the perform
phase. In the perform phase, after the vm has been transferred to the dst, the
qemu process associated with the vm is started. For whatever reason that takes a
long time on this host:
2018-08-19 17:05:19.182+0000: libxl: libxl_dm.c:2235:libxl__spawn_local_dm:
Domain 1:Spawning device-model /usr/local/lib/xen/bin/qemu-system-i386 with
arguments: ...
2018-08-19 17:05:19.188+0000: libxl: libxl_exec.c:398:spawn_watch_event: domain
1 device model: spawn watch p=(null)
...
2018-08-19 17:05:51.529+0000: libxl: libxl_event.c:573:watchfd_callback: watch
w=0x7f84a0047ee8 wpath=/local/domain/0/device-model/1/state token=2/1: event
epath=/local/domain/0/device-model/1/state
2018-08-19 17:05:51.529+0000: libxl: libxl_exec.c:398:spawn_watch_event: domain
1 device model: spawn watch p=running
In the meantime we move to the finish phase and timeout waiting for the above
perform phase to complete
2018-08-19 17:05:19.096+0000: 3492: debug : virThreadJobSet:96 : Thread 3492
(virNetServerHandleJob) is now running job remoteDispatchDomainMigrateFinish3Params
...
2018-08-19 17:05:49.253+0000: 3492: warning : libxlDomainObjBeginJob:151 :
Cannot start job (modify) for domain debian.guest.osstest; current job is
(modify) owned by (24982)
2018-08-19 17:05:49.253+0000: 3492: error : libxlDomainObjBeginJob:155 : Timed
out during operation: cannot acquire state change lock
What could be causing the long startup time of qemu on these hosts? Does dom0
have enough cpu/memory? As you noticed, the libvirt commit used for this test
has not changed in a long time, well before the failures appeared. Perhaps a
subtle change in libxl is exposing the bug?
Regardless, I'm happy to have looked at the issue since I think libvirt can be
improved to cope with the problem. The thread running in the dst receiving the
vm via libxl_domain_create_restore() can be created with joinable flag, then
joined in the finish phase before attempting to acquire the job lock. I'll look
into making such an improvement in libvirt's libxl driver.
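A minimal sketch of that idea in plain C with pthreads (hypothetical names; the
actual driver uses libvirt's virThread APIs, and the worker thread would be
created during the perform phase rather than here):

```c
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t job_lock = PTHREAD_MUTEX_INITIALIZER;
static int vm_restored;

/* Perform phase: holds the job lock while the incoming VM is restored
 * (stand-in for the thread running libxl_domain_create_restore()). */
static void *perform_phase(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&job_lock);
    vm_restored = 1;              /* ... potentially slow restore work ... */
    pthread_mutex_unlock(&job_lock);
    return NULL;
}

/* Finish phase: rather than racing the perform phase for the job lock
 * (and timing out), join the worker thread first, so the lock is
 * guaranteed to be free when we take it. */
int finish_phase(void)
{
    pthread_t worker;

    /* Threads are joinable by default; the fix amounts to keeping the
     * restore thread joinable instead of detaching it. */
    if (pthread_create(&worker, NULL, perform_phase, NULL) != 0)
        return -1;
    pthread_join(worker, NULL);   /* wait for the perform phase to finish */

    pthread_mutex_lock(&job_lock);
    int ok = vm_restored;         /* acquired without contention */
    pthread_mutex_unlock(&job_lock);
    return ok ? 0 : -1;
}
```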
Regards,
Jim
* Re: [xen-4.9-testing test] 126201: regressions - FAIL
2018-08-22 22:52 ` Jim Fehlig
@ 2018-08-24 8:58 ` Wei Liu
2018-08-27 7:50 ` Jan Beulich
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Wei Liu @ 2018-08-24 8:58 UTC (permalink / raw)
To: Jim Fehlig; +Cc: xen-devel, Wei Liu, osstest service owner, Jan Beulich
On Wed, Aug 22, 2018 at 04:52:27PM -0600, Jim Fehlig wrote:
> On 08/21/2018 05:14 AM, Jan Beulich wrote:
> > > > > On 21.08.18 at 03:11, <osstest-admin@xenproject.org> wrote:
> > > flight 126201 xen-4.9-testing real [real]
> > > http://logs.test-lab.xenproject.org/osstest/logs/126201/
> > >
> > > Regressions :-(
> > >
> > > Tests which did not succeed and are blocking,
> > > including tests which could not be run:
> > > test-amd64-amd64-libvirt-pair 22 guest-migrate/src_host/dst_host fail REGR. vs. 124328
> >
> > Something needs to be done about this, as this continued failure is
> > blocking the 4.9.3 release. I already mailed about this on Aug 2nd
> > for flight 125710; this is what I got back from Wei:
> >
> > > This is libvirtd's error message.
> > >
> > > The remote host can't obtain the state change lock because it is already
> > > held by another task/thread. It could be a libvirt / libxl bug.
> > >
> > > 2018-08-01 16:12:13.433+0000: 3491: warning : libxlDomainObjBeginJob:151 :
> > > Cannot start job (modify) for domain debian.guest.osstest; current job is (modify) owned by (24975)
>
> I took a closer look at the logs and it appears the finish phase of
> migration fails to acquire the domain job lock since it is already held by
> the perform phase. In the perform phase, after the vm has been transferred
> to the dst, the qemu process associated with the vm is started. For whatever
> reason that takes a long time on this host:
>
> 2018-08-19 17:05:19.182+0000: libxl: libxl_dm.c:2235:libxl__spawn_local_dm:
> Domain 1:Spawning device-model /usr/local/lib/xen/bin/qemu-system-i386 with
> arguments: ...
> 2018-08-19 17:05:19.188+0000: libxl: libxl_exec.c:398:spawn_watch_event:
> domain 1 device model: spawn watch p=(null)
This is a spurious event after the watch has been set up.
> ...
> 2018-08-19 17:05:51.529+0000: libxl: libxl_event.c:573:watchfd_callback:
> watch w=0x7f84a0047ee8 wpath=/local/domain/0/device-model/1/state token=2/1:
> event epath=/local/domain/0/device-model/1/state
> 2018-08-19 17:05:51.529+0000: libxl: libxl_exec.c:398:spawn_watch_event:
> domain 1 device model: spawn watch p=running
So it has taken 32s for QEMU to write "running" in xenstore. This,
however, is still within the timeout limit set by libxl (60s).
>
> In the meantime we move to the finish phase and timeout waiting for the
> above perform phase to complete
>
> 2018-08-19 17:05:19.096+0000: 3492: debug : virThreadJobSet:96 : Thread 3492
> (virNetServerHandleJob) is now running job
> remoteDispatchDomainMigrateFinish3Params
> ...
> 2018-08-19 17:05:49.253+0000: 3492: warning : libxlDomainObjBeginJob:151 :
> Cannot start job (modify) for domain debian.guest.osstest; current job is
> (modify) owned by (24982)
> 2018-08-19 17:05:49.253+0000: 3492: error : libxlDomainObjBeginJob:155 :
> Timed out during operation: cannot acquire state change lock
>
> What could be causing the long startup time of qemu on these hosts? Does
> dom0 have enough cpu/memory? As you noticed, the libvirt commit used for
> this test has not changed in a long time, well before the failures appeared.
> Perhaps a subtle change in libxl is exposing the bug?
There have only been two changes to libxl in the range of changesets
being tested.
c257e35a libxl: qemu_disk_scsi_drive_string: Break out common parts of disk config
5d92007c libxl: restore passing "readonly=" to qemu for SCSI disks
They wouldn't change how libxl interacts with libvirt. The QEMU tag didn't
change.
Wei.
* Re: [xen-4.9-testing test] 126201: regressions - FAIL
2018-08-24 8:58 ` Wei Liu
@ 2018-08-27 7:50 ` Jan Beulich
2018-08-30 10:57 ` Wei Liu
2018-09-05 21:37 ` Jim Fehlig
2 siblings, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2018-08-27 7:50 UTC (permalink / raw)
To: Wei Liu; +Cc: xen-devel, Jim Fehlig, osstest service owner
>>> On 24.08.18 at 10:58, <wei.liu2@citrix.com> wrote:
> On Wed, Aug 22, 2018 at 04:52:27PM -0600, Jim Fehlig wrote:
>> What could be causing the long startup time of qemu on these hosts? Does
>> dom0 have enough cpu/memory? As you noticed, the libvirt commit used for
>> this test has not changed in a long time, well before the failures appeared.
>> Perhaps a subtle change in libxl is exposing the bug?
>
> There have only been two changes to libxl in the range of changesets
> being tested.
>
> c257e35a libxl: qemu_disk_scsi_drive_string: Break out common parts of disk config
> 5d92007c libxl: restore passing "readonly=" to qemu for SCSI disks
>
> They wouldn't change how libxl interacts with libvirt. The QEMU tag didn't
> change.
I'm afraid this is an unhelpful perspective to take: since the issue
appears to be host-specific, a commit exposing the bad behavior may
have passed the push gate long ago, simply because the test was then
performed on another host.
Jan
* Re: [xen-4.9-testing test] 126201: regressions - FAIL
2018-08-24 8:58 ` Wei Liu
2018-08-27 7:50 ` Jan Beulich
@ 2018-08-30 10:57 ` Wei Liu
2018-09-05 21:37 ` Jim Fehlig
2 siblings, 0 replies; 10+ messages in thread
From: Wei Liu @ 2018-08-30 10:57 UTC (permalink / raw)
To: Jim Fehlig; +Cc: xen-devel, Wei Liu, osstest service owner, Jan Beulich
On Fri, Aug 24, 2018 at 09:58:02AM +0100, Wei Liu wrote:
> On Wed, Aug 22, 2018 at 04:52:27PM -0600, Jim Fehlig wrote:
> > On 08/21/2018 05:14 AM, Jan Beulich wrote:
> > > > > > On 21.08.18 at 03:11, <osstest-admin@xenproject.org> wrote:
> > > > flight 126201 xen-4.9-testing real [real]
> > > > http://logs.test-lab.xenproject.org/osstest/logs/126201/
> > > >
> > > > Regressions :-(
> > > >
> > > > Tests which did not succeed and are blocking,
> > > > including tests which could not be run:
> > > > test-amd64-amd64-libvirt-pair 22 guest-migrate/src_host/dst_host fail REGR. vs. 124328
> > >
> > > Something needs to be done about this, as this continued failure is
> > > blocking the 4.9.3 release. I already mailed about this on Aug 2nd
> > > for flight 125710; this is what I got back from Wei:
> > >
> > > > This is libvirtd's error message.
> > > >
> > > > The remote host can't obtain the state change lock because it is already
> > > > held by another task/thread. It could be a libvirt / libxl bug.
> > > >
> > > > 2018-08-01 16:12:13.433+0000: 3491: warning : libxlDomainObjBeginJob:151 :
> > > > Cannot start job (modify) for domain debian.guest.osstest; current job is (modify) owned by (24975)
> >
> > I took a closer look at the logs and it appears the finish phase of
> > migration fails to acquire the domain job lock since it is already held by
> > the perform phase. In the perform phase, after the vm has been transferred
> > to the dst, the qemu process associated with the vm is started. For whatever
> > reason that takes a long time on this host:
> >
> > 2018-08-19 17:05:19.182+0000: libxl: libxl_dm.c:2235:libxl__spawn_local_dm:
> > Domain 1:Spawning device-model /usr/local/lib/xen/bin/qemu-system-i386 with
> > arguments: ...
> > 2018-08-19 17:05:19.188+0000: libxl: libxl_exec.c:398:spawn_watch_event:
> > domain 1 device model: spawn watch p=(null)
>
> This is a spurious event after the watch has been set up.
>
> > ...
> > 2018-08-19 17:05:51.529+0000: libxl: libxl_event.c:573:watchfd_callback:
> > watch w=0x7f84a0047ee8 wpath=/local/domain/0/device-model/1/state token=2/1:
> > event epath=/local/domain/0/device-model/1/state
> > 2018-08-19 17:05:51.529+0000: libxl: libxl_exec.c:398:spawn_watch_event:
> > domain 1 device model: spawn watch p=running
>
> So it has taken 32s for QEMU to write "running" in xenstore. This,
> however, is still within the timeout limit set by libxl (60s).
>
I haven't been able to reliably reproduce the timeout.
One thing I observe is that libvirt picks the qdisk backend while xl picks
the phys backend.
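For reference, the backend choice shows up in the xl disk specification; a
hypothetical comparison assuming the backendtype= key from
xl-disk-configuration(5) (paths made up):

```
# PV block backend in the dom0 kernel (what xl picks here)
disk = [ 'format=raw, vdev=xvda, access=rw, backendtype=phys, target=/dev/vg/guest-disk' ]

# QEMU-based qdisk backend (what libvirt's libxl driver selects)
disk = [ 'format=raw, vdev=xvda, access=rw, backendtype=qdisk, target=/dev/vg/guest-disk' ]
```

With qdisk, the guest's disk I/O depends on the device-model process being up,
which could plausibly interact with the slow QEMU startup seen above.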
Wei.
* Re: [xen-4.9-testing test] 126201: regressions - FAIL
2018-08-24 8:58 ` Wei Liu
2018-08-27 7:50 ` Jan Beulich
2018-08-30 10:57 ` Wei Liu
@ 2018-09-05 21:37 ` Jim Fehlig
2018-09-11 22:18 ` Jim Fehlig
2 siblings, 1 reply; 10+ messages in thread
From: Jim Fehlig @ 2018-09-05 21:37 UTC (permalink / raw)
To: Wei Liu; +Cc: xen-devel, osstest service owner, Jan Beulich
On 08/24/2018 02:58 AM, Wei Liu wrote:
> On Wed, Aug 22, 2018 at 04:52:27PM -0600, Jim Fehlig wrote:
>> On 08/21/2018 05:14 AM, Jan Beulich wrote:
>>>>>> On 21.08.18 at 03:11, <osstest-admin@xenproject.org> wrote:
>>>> flight 126201 xen-4.9-testing real [real]
>>>> http://logs.test-lab.xenproject.org/osstest/logs/126201/
>>>>
>>>> Regressions :-(
>>>>
>>>> Tests which did not succeed and are blocking,
>>>> including tests which could not be run:
>>>> test-amd64-amd64-libvirt-pair 22 guest-migrate/src_host/dst_host fail REGR. vs. 124328
>>>
>>> Something needs to be done about this, as this continued failure is
>>> blocking the 4.9.3 release. I already mailed about this on Aug 2nd
>>> for flight 125710; this is what I got back from Wei:
>>>
>>>> This is libvirtd's error message.
>>>>
>>>> The remote host can't obtain the state change lock because it is already
>>>> held by another task/thread. It could be a libvirt / libxl bug.
>>>>
>>>> 2018-08-01 16:12:13.433+0000: 3491: warning : libxlDomainObjBeginJob:151 :
>>>> Cannot start job (modify) for domain debian.guest.osstest; current job is (modify) owned by (24975)
>>
>> I took a closer look at the logs and it appears the finish phase of
>> migration fails to acquire the domain job lock since it is already held by
>> the perform phase. In the perform phase, after the vm has been transferred
>> to the dst, the qemu process associated with the vm is started. For whatever
>> reason that takes a long time on this host:
>>
>> 2018-08-19 17:05:19.182+0000: libxl: libxl_dm.c:2235:libxl__spawn_local_dm:
>> Domain 1:Spawning device-model /usr/local/lib/xen/bin/qemu-system-i386 with
>> arguments: ...
>> 2018-08-19 17:05:19.188+0000: libxl: libxl_exec.c:398:spawn_watch_event:
>> domain 1 device model: spawn watch p=(null)
>
> This is a spurious event after the watch has been set up.
>
>> ...
>> 2018-08-19 17:05:51.529+0000: libxl: libxl_event.c:573:watchfd_callback:
>> watch w=0x7f84a0047ee8 wpath=/local/domain/0/device-model/1/state token=2/1:
>> event epath=/local/domain/0/device-model/1/state
>> 2018-08-19 17:05:51.529+0000: libxl: libxl_exec.c:398:spawn_watch_event:
>> domain 1 device model: spawn watch p=running
>
> So it has taken 32s for QEMU to write "running" in xenstore. This,
> however, is still within the timeout limit set by libxl (60s).
Right, but it is not within libvirt's job wait timeout, which is 30s.
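Lining up the timestamps from the two logs makes the window clear (rough
figures; 30s is libvirt's job-wait timeout, 60s libxl's device-model timeout,
as discussed above):

```c
/* Timestamps from the logs, in seconds past 17:05:00 on 2018-08-19. */
static const double dm_spawn   = 19.188; /* libxl spawns qemu-system-i386 */
static const double dm_running = 51.529; /* QEMU finally writes "running" */
static const double fin_start  = 19.096; /* finish phase starts waiting   */
static const double fin_fail   = 49.253; /* libvirt: cannot acquire lock  */

/* The device model took ~32.3s to come up: under libxl's 60s limit, but
 * longer than the ~30s libvirt waited in the finish phase. */
int qemu_outlived_libvirt_wait(void)
{
    double spawn_secs = dm_running - dm_spawn; /* ~32.3 */
    double wait_secs  = fin_fail - fin_start;  /* ~30.2 */
    return wait_secs < spawn_secs && spawn_secs < 60.0;
}
```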
I've sent a series to fix this and other problems I found while testing/debugging:
https://www.redhat.com/archives/libvir-list/2018-September/msg00178.html
Assuming those patches are committed to libvirt.git master, it's not clear how
they will improve this and other tests that use an older, fixed libvirt commit.
Regards,
Jim
* Re: [xen-4.9-testing test] 126201: regressions - FAIL
2018-09-05 21:37 ` Jim Fehlig
@ 2018-09-11 22:18 ` Jim Fehlig
0 siblings, 0 replies; 10+ messages in thread
From: Jim Fehlig @ 2018-09-11 22:18 UTC (permalink / raw)
To: Wei Liu; +Cc: xen-devel, osstest service owner, Jan Beulich
On 9/5/18 3:37 PM, Jim Fehlig wrote:
> On 08/24/2018 02:58 AM, Wei Liu wrote:
>> On Wed, Aug 22, 2018 at 04:52:27PM -0600, Jim Fehlig wrote:
>>> On 08/21/2018 05:14 AM, Jan Beulich wrote:
>>>>>>> On 21.08.18 at 03:11, <osstest-admin@xenproject.org> wrote:
>>>>> flight 126201 xen-4.9-testing real [real]
>>>>> http://logs.test-lab.xenproject.org/osstest/logs/126201/
>>>>>
>>>>> Regressions :-(
>>>>>
>>>>> Tests which did not succeed and are blocking,
>>>>> including tests which could not be run:
>>>>> test-amd64-amd64-libvirt-pair 22 guest-migrate/src_host/dst_host fail
>>>>> REGR. vs. 124328
>>>>
>>>> Something needs to be done about this, as this continued failure is
>>>> blocking the 4.9.3 release. I already mailed about this on Aug 2nd
>>>> for flight 125710; this is what I got back from Wei:
>>>>
>>>>> This is libvirtd's error message.
>>>>>
>>>>> The remote host can't obtain the state change lock because it is already
>>>>> held by another task/thread. It could be a libvirt / libxl bug.
>>>>>
>>>>> 2018-08-01 16:12:13.433+0000: 3491: warning : libxlDomainObjBeginJob:151 :
>>>>> Cannot start job (modify) for domain debian.guest.osstest; current job is
>>>>> (modify) owned by (24975)
>>>
>>> I took a closer look at the logs and it appears the finish phase of
>>> migration fails to acquire the domain job lock since it is already held by
>>> the perform phase. In the perform phase, after the vm has been transferred
>>> to the dst, the qemu process associated with the vm is started. For whatever
>>> reason that takes a long time on this host:
>>>
>>> 2018-08-19 17:05:19.182+0000: libxl: libxl_dm.c:2235:libxl__spawn_local_dm:
>>> Domain 1:Spawning device-model /usr/local/lib/xen/bin/qemu-system-i386 with
>>> arguments: ...
>>> 2018-08-19 17:05:19.188+0000: libxl: libxl_exec.c:398:spawn_watch_event:
>>> domain 1 device model: spawn watch p=(null)
>>
>> This is a spurious event after the watch has been set up.
>>
>>> ...
>>> 2018-08-19 17:05:51.529+0000: libxl: libxl_event.c:573:watchfd_callback:
>>> watch w=0x7f84a0047ee8 wpath=/local/domain/0/device-model/1/state token=2/1:
>>> event epath=/local/domain/0/device-model/1/state
>>> 2018-08-19 17:05:51.529+0000: libxl: libxl_exec.c:398:spawn_watch_event:
>>> domain 1 device model: spawn watch p=running
>>
>> So it has taken 32s for QEMU to write "running" in xenstore. This,
>> however, is still within the timeout limit set by libxl (60s).
>
> Right, but it is not within libvirt's job wait timeout, which is 30s.
>
> I've sent a series to fix this and other problems I found while testing/debugging
>
> https://www.redhat.com/archives/libvir-list/2018-September/msg00178.html
>
> Assuming those patches are committed to libvirt.git master, it's not clear how
> they will improve this and other tests that use an older, fixed libvirt commit.
FYI, the patches fixing this problem from the libvirt side have been committed
to libvirt.git master now. See commits 60b4fd90, e39c66d3, 47da84e0, 0149464a,
and 5ea2abb3.
Regards,
Jim
end of thread, other threads:[~2018-09-11 22:18 UTC | newest]
Thread overview: 10+ messages
2018-08-21 1:11 [xen-4.9-testing test] 126201: regressions - FAIL osstest service owner
2018-08-21 11:14 ` Jan Beulich
2018-08-21 11:44 ` Roger Pau Monné
2018-08-21 11:58 ` Jan Beulich
[not found] ` <5B7BF42E02000078001E06A7@suse.com>
2018-08-22 22:52 ` Jim Fehlig
2018-08-24 8:58 ` Wei Liu
2018-08-27 7:50 ` Jan Beulich
2018-08-30 10:57 ` Wei Liu
2018-09-05 21:37 ` Jim Fehlig
2018-09-11 22:18 ` Jim Fehlig