* [xen-4.9-testing test] 126201: regressions - FAIL
@ 2018-08-21  1:11 osstest service owner
  2018-08-21 11:14 ` Jan Beulich
       [not found] ` <5B7BF42E02000078001E06A7@suse.com>
  0 siblings, 2 replies; 10+ messages in thread
From: osstest service owner @ 2018-08-21  1:11 UTC (permalink / raw)
  To: xen-devel, osstest-admin

flight 126201 xen-4.9-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/126201/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-libvirt-pair 22 guest-migrate/src_host/dst_host fail REGR. vs. 124328

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-arndale 5 host-ping-check-native fail in 126075 pass in 126201
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 16 guest-localmigrate/x10 fail pass in 126075

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds     10 debian-install           fail REGR. vs. 124328

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-ws16-amd64 18 guest-start/win.repeat fail blocked in 124328
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop      fail blocked in 124328
 test-amd64-amd64-xl-qemuu-ws16-amd64 14 guest-localmigrate fail in 126075 like 124248
 test-amd64-i386-xl-qemut-ws16-amd64 18 guest-start/win.repeat fail in 126075 like 124248
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop   fail in 126075 like 124328
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop  fail in 126075 like 124328
 test-amd64-i386-libvirt-pair 22 guest-migrate/src_host/dst_host fail like 124248
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop             fail like 124248
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-localmigrate/x10 fail like 124248
 test-amd64-i386-xl-qemuu-ws16-amd64 16 guest-localmigrate/x10 fail like 124248
 test-armhf-armhf-xl-rtds     16 guest-start/debian.repeat    fail  like 124328
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-localmigrate/x10 fail like 124328
 test-amd64-i386-xl-qemut-ws16-amd64 16 guest-localmigrate/x10 fail like 124328
 test-amd64-amd64-libvirt     13 migrate-support-check        fail   never pass
 test-amd64-i386-libvirt      13 migrate-support-check        fail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-check        fail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-check        fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-check    fail   never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-rtds     13 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-rtds     14 saverestore-support-check    fail   never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-check        fail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-check    fail never pass
 test-armhf-armhf-xl          13 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          14 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-check    fail   never pass
 test-armhf-armhf-libvirt     13 migrate-support-check        fail   never pass
 test-armhf-armhf-libvirt     14 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-vhd      12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-vhd      13 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-check        fail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-check    fail  never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-check        fail   never pass
 test-armhf-armhf-libvirt-raw 13 saverestore-support-check    fail   never pass
 test-amd64-amd64-xl-qemuu-win10-i386 10 windows-install        fail never pass
 test-amd64-amd64-xl-qemut-win10-i386 10 windows-install        fail never pass
 test-amd64-i386-xl-qemuu-win10-i386 10 windows-install         fail never pass
 test-amd64-i386-xl-qemut-win10-i386 10 windows-install         fail never pass

version targeted for testing:
 xen                  6c9d139cdd0289f2b35b5deea4b41b8e3e1b39b7
baseline version:
 xen                  238007d6fae9447bf5e8e73d67ae9fb844e7ff2a

Last test of basis   124328  2018-06-17 23:39:07 Z   64 days
Failing since        124807  2018-06-28 17:38:04 Z   53 days   33 attempts
Testing same since   125922  2018-08-15 14:57:13 Z    5 days    3 attempts

------------------------------------------------------------
People who touched revisions under test:
  Andrew Cooper <andrew.cooper3@citrix.com>
  Christian Lindig <christian.lindig@citrix.com>
  George Dunlap <dunlapg@umich.edu>
  George Dunlap <george.dunlap@citrix.com>
  Ian Jackson <Ian.Jackson@eu.citrix.com>
  Jan Beulich <jbeulich@suse.com>
  Juergen Gross <jgross@suse.com>
  Julien Grall <julien.grall@arm.com>
  Kevin Tian <kevin.tian@intel.com>
  Lars Kurth <lars.kurth.xen@gmail.com>
  Paul Durrant <paul.durrant@citrix.com>
  Stefano Stabellini <sstabellini@kernel.org>
  Stewart Hildebrand <stewart.hildebrand@dornerworks.com>
  Wei Liu <wei.liu2@citrix.com>

jobs:
 build-amd64-xsm                                              pass    
 build-i386-xsm                                               pass    
 build-amd64-xtf                                              pass    
 build-amd64                                                  pass    
 build-armhf                                                  pass    
 build-i386                                                   pass    
 build-amd64-libvirt                                          pass    
 build-armhf-libvirt                                          pass    
 build-i386-libvirt                                           pass    
 build-amd64-prev                                             pass    
 build-i386-prev                                              pass    
 build-amd64-pvops                                            pass    
 build-armhf-pvops                                            pass    
 build-i386-pvops                                             pass    
 build-amd64-rumprun                                          pass    
 build-i386-rumprun                                           pass    
 test-xtf-amd64-amd64-1                                       pass    
 test-xtf-amd64-amd64-2                                       pass    
 test-xtf-amd64-amd64-3                                       pass    
 test-xtf-amd64-amd64-4                                       pass    
 test-xtf-amd64-amd64-5                                       pass    
 test-amd64-amd64-xl                                          pass    
 test-armhf-armhf-xl                                          pass    
 test-amd64-i386-xl                                           pass    
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm                pass    
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm                 pass    
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm           pass    
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm            pass    
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm                pass    
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm                 pass    
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm        fail    
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm         pass    
 test-amd64-amd64-libvirt-xsm                                 pass    
 test-amd64-i386-libvirt-xsm                                  pass    
 test-amd64-amd64-xl-xsm                                      pass    
 test-amd64-i386-xl-xsm                                       pass    
 test-amd64-amd64-qemuu-nested-amd                            fail    
 test-amd64-i386-qemut-rhel6hvm-amd                           pass    
 test-amd64-i386-qemuu-rhel6hvm-amd                           pass    
 test-amd64-amd64-xl-qemut-debianhvm-amd64                    pass    
 test-amd64-i386-xl-qemut-debianhvm-amd64                     pass    
 test-amd64-amd64-xl-qemuu-debianhvm-amd64                    pass    
 test-amd64-i386-xl-qemuu-debianhvm-amd64                     pass    
 test-amd64-i386-freebsd10-amd64                              pass    
 test-amd64-amd64-xl-qemuu-ovmf-amd64                         pass    
 test-amd64-i386-xl-qemuu-ovmf-amd64                          pass    
 test-amd64-amd64-rumprun-amd64                               pass    
 test-amd64-amd64-xl-qemut-win7-amd64                         fail    
 test-amd64-i386-xl-qemut-win7-amd64                          fail    
 test-amd64-amd64-xl-qemuu-win7-amd64                         fail    
 test-amd64-i386-xl-qemuu-win7-amd64                          fail    
 test-amd64-amd64-xl-qemut-ws16-amd64                         pass    
 test-amd64-i386-xl-qemut-ws16-amd64                          fail    
 test-amd64-amd64-xl-qemuu-ws16-amd64                         fail    
 test-amd64-i386-xl-qemuu-ws16-amd64                          fail    
 test-armhf-armhf-xl-arndale                                  pass    
 test-amd64-amd64-xl-credit2                                  pass    
 test-armhf-armhf-xl-credit2                                  pass    
 test-armhf-armhf-xl-cubietruck                               pass    
 test-amd64-i386-freebsd10-i386                               pass    
 test-amd64-i386-rumprun-i386                                 pass    
 test-amd64-amd64-xl-qemut-win10-i386                         fail    
 test-amd64-i386-xl-qemut-win10-i386                          fail    
 test-amd64-amd64-xl-qemuu-win10-i386                         fail    
 test-amd64-i386-xl-qemuu-win10-i386                          fail    
 test-amd64-amd64-qemuu-nested-intel                          pass    
 test-amd64-i386-qemut-rhel6hvm-intel                         pass    
 test-amd64-i386-qemuu-rhel6hvm-intel                         pass    
 test-amd64-amd64-libvirt                                     pass    
 test-armhf-armhf-libvirt                                     pass    
 test-amd64-i386-libvirt                                      pass    
 test-amd64-amd64-livepatch                                   pass    
 test-amd64-i386-livepatch                                    pass    
 test-amd64-amd64-migrupgrade                                 pass    
 test-amd64-i386-migrupgrade                                  pass    
 test-amd64-amd64-xl-multivcpu                                pass    
 test-armhf-armhf-xl-multivcpu                                pass    
 test-amd64-amd64-pair                                        pass    
 test-amd64-i386-pair                                         pass    
 test-amd64-amd64-libvirt-pair                                fail    
 test-amd64-i386-libvirt-pair                                 fail    
 test-amd64-amd64-amd64-pvgrub                                pass    
 test-amd64-amd64-i386-pvgrub                                 pass    
 test-amd64-amd64-pygrub                                      pass    
 test-amd64-amd64-xl-qcow2                                    pass    
 test-armhf-armhf-libvirt-raw                                 pass    
 test-amd64-i386-xl-raw                                       pass    
 test-amd64-amd64-xl-rtds                                     fail    
 test-armhf-armhf-xl-rtds                                     fail    
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow             pass    
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow              pass    
 test-amd64-amd64-xl-shadow                                   pass    
 test-amd64-i386-xl-shadow                                    pass    
 test-amd64-amd64-libvirt-vhd                                 pass    
 test-armhf-armhf-xl-vhd                                      pass    


------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
    http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
    http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 1277 lines long.)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* Re: [xen-4.9-testing test] 126201: regressions - FAIL
  2018-08-21  1:11 [xen-4.9-testing test] 126201: regressions - FAIL osstest service owner
@ 2018-08-21 11:14 ` Jan Beulich
  2018-08-21 11:44   ` Roger Pau Monné
       [not found] ` <5B7BF42E02000078001E06A7@suse.com>
  1 sibling, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2018-08-21 11:14 UTC (permalink / raw)
  To: osstest service owner; +Cc: xen-devel, Jim Fehlig

>>> On 21.08.18 at 03:11, <osstest-admin@xenproject.org> wrote:
> flight 126201 xen-4.9-testing real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/126201/ 
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-amd64-amd64-libvirt-pair 22 guest-migrate/src_host/dst_host fail REGR. vs. 124328

Something needs to be done about this, as this continued failure is
blocking the 4.9.3 release. I already mailed about this on Aug 2nd for
flight 125710 and got this back from Wei:

>This is libvirtd's error message.
>
>The remote host can't obtain the state change lock because it is already
>held by another task/thread. It could be a libvirt / libxl bug.
>
>2018-08-01 16:12:13.433+0000: 3491: warning : libxlDomainObjBeginJob:151 :
>Cannot start job (modify) for domain debian.guest.osstest; current job is (modify) owned by (24975)

The apparently same issue is blocking 4.7, and I think it is only
because of some earlier force-push and/or "fail pass in" that 4.8
and 4.6 aren't blocked by this. The failures look to always be on
the joubertins. 4.10, 4.11, and master all have entries on these
hosts (some not very new, but anyway), and hence might be
fine.

Jan




* Re: [xen-4.9-testing test] 126201: regressions - FAIL
  2018-08-21 11:14 ` Jan Beulich
@ 2018-08-21 11:44   ` Roger Pau Monné
  2018-08-21 11:58     ` Jan Beulich
  0 siblings, 1 reply; 10+ messages in thread
From: Roger Pau Monné @ 2018-08-21 11:44 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Jim Fehlig, osstest service owner

On Tue, Aug 21, 2018 at 05:14:54AM -0600, Jan Beulich wrote:
> >>> On 21.08.18 at 03:11, <osstest-admin@xenproject.org> wrote:
> > flight 126201 xen-4.9-testing real [real]
> > http://logs.test-lab.xenproject.org/osstest/logs/126201/ 
> > 
> > Regressions :-(
> > 
> > Tests which did not succeed and are blocking,
> > including tests which could not be run:
> >  test-amd64-amd64-libvirt-pair 22 guest-migrate/src_host/dst_host fail REGR. vs. 124328
> 
> Something needs to be done about this, as this continued failure is
> blocking the 4.9.3 release. I did mail about this on Aug 2nd already
> for flight 125710, I've got back from Wei:
> 
> >This is libvirtd's error message.
> >
> > >The remote host can't obtain the state change lock because it is already
> > >held by another task/thread. It could be a libvirt / libxl bug.
> >
> >2018-08-01 16:12:13.433+0000: 3491: warning : libxlDomainObjBeginJob:151 :
> >Cannot start job (modify) for domain debian.guest.osstest; current job is (modify) owned by (24975)

The error seems to be mostly the same. It happens on the destination
host; from the libvirt log:

2018-08-19 17:05:19.183+0000: 24982: debug : virEventPollInterruptLocked:726 : Interrupting
2018-08-19 17:05:19.183+0000: 24982: info : virEventPollAddTimeout:253 : EVENT_POLL_ADD_TIMEOUT: timer=3 frequency=60000 cb=0x7f84db8bf87e opaque=0x7f84a00f7240 ff=0x7f84db8bf632
2018-08-19 17:05:19.183+0000: 24982: debug : libvirt_vmessage:76 : libvirt_vmessage: context='libxl' format='%s%s%s%s%s%s'
2018-08-19 17:05:19.183+0000: 24982: info : virEventPollUpdateHandle:152 : EVENT_POLL_UPDATE_HANDLE: watch=10 events=5
2018-08-19 17:05:19.183+0000: 24982: debug : virEventPollInterruptLocked:726 : Interrupting
2018-08-19 17:05:19.188+0000: 24982: debug : libvirt_vmessage:76 : libvirt_vmessage: context='libxl' format='%s%s%s%s%s%s'
2018-08-19 17:05:19.188+0000: 24982: debug : libvirt_vmessage:76 : libvirt_vmessage: context='libxl' format='%s%s%s%s%s%s'
[...]
2018-08-19 17:05:49.253+0000: 3492: warning : libxlDomainObjBeginJob:151 : Cannot start job (modify) for domain debian.guest.osstest; current job is (modify) owned by (24982)
2018-08-19 17:05:49.253+0000: 3492: error : libxlDomainObjBeginJob:155 : Timed out during operation: cannot acquire state change lock

However, I have no idea what's going on.

> The apparently same issue is blocking 4.7, and I think it is only
> because of some earlier force-push and/or "fail pass in" that 4.8
> and 4.6 aren't blocked by this. The failures look to always be on
> the joubertins. 4.10, 4.11, and master all have entries on these
> hosts (some not very new, but anyway), and hence might be
> fine.

AFAICT it only happens with Xen <= 4.9?

Roger.


* Re: [xen-4.9-testing test] 126201: regressions - FAIL
  2018-08-21 11:44   ` Roger Pau Monné
@ 2018-08-21 11:58     ` Jan Beulich
  0 siblings, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2018-08-21 11:58 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel, Jim Fehlig, osstest service owner

>>> On 21.08.18 at 13:44, <roger.pau@citrix.com> wrote:
> On Tue, Aug 21, 2018 at 05:14:54AM -0600, Jan Beulich wrote:
>> The apparently same issue is blocking 4.7, and I think it is only
>> because of some earlier force-push and/or "fail pass in" that 4.8
>> and 4.6 aren't blocked by this. The failures look to always be on
>> the joubertins. 4.10, 4.11, and master all have entries on these
>> hosts (some not very new, but anyway), and hence might be
>> fine.
> 
> AFAICT it only happens with Xen <= 4.9?

That's what it currently looks like, and also only on the joubertins.
I have no idea why either of the two criteria would matter;
according to the test history the libvirt commit hasn't changed on
those branches for quite a long time.

Jan




* Re: [xen-4.9-testing test] 126201: regressions - FAIL
       [not found] ` <5B7BF42E02000078001E06A7@suse.com>
@ 2018-08-22 22:52   ` Jim Fehlig
  2018-08-24  8:58     ` Wei Liu
  0 siblings, 1 reply; 10+ messages in thread
From: Jim Fehlig @ 2018-08-22 22:52 UTC (permalink / raw)
  To: Jan Beulich, osstest service owner; +Cc: xen-devel

On 08/21/2018 05:14 AM, Jan Beulich wrote:
>>>> On 21.08.18 at 03:11, <osstest-admin@xenproject.org> wrote:
>> flight 126201 xen-4.9-testing real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/126201/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>>   test-amd64-amd64-libvirt-pair 22 guest-migrate/src_host/dst_host fail REGR. vs. 124328
> 
> Something needs to be done about this, as this continued failure is
> blocking the 4.9.3 release. I did mail about this on Aug 2nd already
> for flight 125710, I've got back from Wei:
> 
>> This is libvirtd's error message.
>>
>> The remote host can't obtain the state change lock because it is already
>> held by another task/thread. It could be a libvirt / libxl bug.
>>
>> 2018-08-01 16:12:13.433+0000: 3491: warning : libxlDomainObjBeginJob:151 :
>> Cannot start job (modify) for domain debian.guest.osstest; current job is (modify) owned by (24975)

I took a closer look at the logs and it appears the finish phase of migration 
fails to acquire the domain job lock since it is already held by the perform 
phase. In the perform phase, after the vm has been transferred to the dst, the 
qemu process associated with the vm is started. For whatever reason that takes a 
long time on this host:

2018-08-19 17:05:19.182+0000: libxl: libxl_dm.c:2235:libxl__spawn_local_dm: 
Domain 1:Spawning device-model /usr/local/lib/xen/bin/qemu-system-i386 with 
arguments: ...
2018-08-19 17:05:19.188+0000: libxl: libxl_exec.c:398:spawn_watch_event: domain 
1 device model: spawn watch p=(null)
...
2018-08-19 17:05:51.529+0000: libxl: libxl_event.c:573:watchfd_callback: watch 
w=0x7f84a0047ee8 wpath=/local/domain/0/device-model/1/state token=2/1: event 
epath=/local/domain/0/device-model/1/state
2018-08-19 17:05:51.529+0000: libxl: libxl_exec.c:398:spawn_watch_event: domain 
1 device model: spawn watch p=running

In the meantime we move to the finish phase and time out waiting for the above 
perform phase to complete:

2018-08-19 17:05:19.096+0000: 3492: debug : virThreadJobSet:96 : Thread 3492 
(virNetServerHandleJob) is now running job remoteDispatchDomainMigrateFinish3Params
...
2018-08-19 17:05:49.253+0000: 3492: warning : libxlDomainObjBeginJob:151 : 
Cannot start job (modify) for domain debian.guest.osstest; current job is 
(modify) owned by (24982)
2018-08-19 17:05:49.253+0000: 3492: error : libxlDomainObjBeginJob:155 : Timed 
out during operation: cannot acquire state change lock

What could be causing the long startup time of qemu on these hosts? Does dom0 
have enough cpu/memory? As you noticed, the libvirt commit used for this test 
has not changed in a long time, well before the failures appeared. Perhaps a 
subtle change in libxl is exposing the bug?

Regardless, I'm happy to have looked at the issue since I think libvirt can be 
improved to cope with the problem. The thread running in the dst receiving the 
vm via libxl_domain_create_restore() can be created with joinable flag, then 
joined in the finish phase before attempting to acquire the job lock. I'll look 
into making such an improvement in libvirt's libxl driver.
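
The join-before-lock idea described above can be sketched in a few lines. This is a minimal illustration in Python, not libvirt code: job_lock, perform_phase, finish_phase and run are hypothetical names, the sleep stands in for the slow qemu startup, and the short delays are chosen only so the demo finishes quickly. Python threads are always joinable, so the "joinable flag" has no direct equivalent here.

```python
import threading
import time

# Stands in for the per-domain job lock (hypothetical, not libvirt's API).
job_lock = threading.Lock()

def perform_phase(holding, startup_delay):
    """Hold the job lock while a slow 'device model' starts up."""
    with job_lock:
        holding.set()               # tell the caller the lock is now held
        time.sleep(startup_delay)   # stands in for the slow qemu startup

def finish_phase(worker, lock_timeout, join_first):
    """Try to take the job lock, optionally joining the worker first."""
    if join_first:
        worker.join()               # wait for the perform phase to end
    if not job_lock.acquire(timeout=lock_timeout):
        return False                # analogous to "cannot acquire state change lock"
    job_lock.release()
    return True

def run(join_first, startup_delay=0.3, lock_timeout=0.1):
    """Run one perform/finish pair and report whether finish got the lock."""
    holding = threading.Event()
    worker = threading.Thread(target=perform_phase,
                              args=(holding, startup_delay))
    worker.start()
    holding.wait()                  # make sure the perform phase owns the lock
    ok = finish_phase(worker, lock_timeout, join_first)
    worker.join()
    return ok

if __name__ == "__main__":
    print("without join:", run(join_first=False))
    print("with join:   ", run(join_first=True))
```

Without the join, the finish phase gives up as soon as the startup delay exceeds the lock timeout; with the join, the lock wait only begins once the perform phase has released the lock.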

Regards,
Jim


* Re: [xen-4.9-testing test] 126201: regressions - FAIL
  2018-08-22 22:52   ` Jim Fehlig
@ 2018-08-24  8:58     ` Wei Liu
  2018-08-27  7:50       ` Jan Beulich
                         ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Wei Liu @ 2018-08-24  8:58 UTC (permalink / raw)
  To: Jim Fehlig; +Cc: xen-devel, Wei Liu, osstest service owner, Jan Beulich

On Wed, Aug 22, 2018 at 04:52:27PM -0600, Jim Fehlig wrote:
> On 08/21/2018 05:14 AM, Jan Beulich wrote:
> > > > > On 21.08.18 at 03:11, <osstest-admin@xenproject.org> wrote:
> > > flight 126201 xen-4.9-testing real [real]
> > > http://logs.test-lab.xenproject.org/osstest/logs/126201/
> > > 
> > > Regressions :-(
> > > 
> > > Tests which did not succeed and are blocking,
> > > including tests which could not be run:
> > >   test-amd64-amd64-libvirt-pair 22 guest-migrate/src_host/dst_host fail REGR. vs. 124328
> > 
> > Something needs to be done about this, as this continued failure is
> > blocking the 4.9.3 release. I did mail about this on Aug 2nd already
> > for flight 125710, I've got back from Wei:
> > 
> > > This is libvirtd's error message.
> > > 
> > > > The remote host can't obtain the state change lock because it is already
> > > > held by another task/thread. It could be a libvirt / libxl bug.
> > > 
> > > 2018-08-01 16:12:13.433+0000: 3491: warning : libxlDomainObjBeginJob:151 :
> > > Cannot start job (modify) for domain debian.guest.osstest; current job is (modify) owned by (24975)
> 
> I took a closer look at the logs and it appears the finish phase of
> migration fails to acquire the domain job lock since it is already held by
> the perform phase. In the perform phase, after the vm has been transferred
> to the dst, the qemu process associated with the vm is started. For whatever
> reason that takes a long time on this host:
> 
> 2018-08-19 17:05:19.182+0000: libxl: libxl_dm.c:2235:libxl__spawn_local_dm:
> Domain 1:Spawning device-model /usr/local/lib/xen/bin/qemu-system-i386 with
> arguments: ...
> 2018-08-19 17:05:19.188+0000: libxl: libxl_exec.c:398:spawn_watch_event:
> domain 1 device model: spawn watch p=(null)

This is a spurious event after the watch has been set up.

> ...
> 2018-08-19 17:05:51.529+0000: libxl: libxl_event.c:573:watchfd_callback:
> watch w=0x7f84a0047ee8 wpath=/local/domain/0/device-model/1/state token=2/1:
> event epath=/local/domain/0/device-model/1/state
> 2018-08-19 17:05:51.529+0000: libxl: libxl_exec.c:398:spawn_watch_event:
> domain 1 device model: spawn watch p=running

So it has taken 32s for QEMU to write "running" in xenstore. This,
however, is still within the timeout limit set by libxl (60s).
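
The arithmetic behind these figures can be checked directly from the timestamps quoted in this thread. The snippet below is only a sketch: the helper name seconds_between and the constant name LIBXL_SPAWN_TIMEOUT are made up for illustration, and the 60s value is simply the limit stated above. It works out to roughly 32.3s for QEMU startup and roughly 30.2s between the finish phase starting and its lock wait failing.

```python
from datetime import datetime

# Timestamp layout used in the libvirtd/libxl log lines quoted in this thread.
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f%z"

def seconds_between(start, end):
    """Return the elapsed seconds between two log timestamps."""
    t0 = datetime.strptime(start, LOG_TS_FORMAT)
    t1 = datetime.strptime(end, LOG_TS_FORMAT)
    return (t1 - t0).total_seconds()

# Device model spawned -> "running" seen in xenstore.
qemu_startup = seconds_between("2018-08-19 17:05:19.182+0000",
                               "2018-08-19 17:05:51.529+0000")

# Finish phase job started -> "cannot acquire state change lock".
finish_wait = seconds_between("2018-08-19 17:05:19.096+0000",
                              "2018-08-19 17:05:49.253+0000")

# Limit mentioned above; the name is illustrative, not a libxl identifier.
LIBXL_SPAWN_TIMEOUT = 60.0

print(f"qemu startup: {qemu_startup:.3f}s "
      f"(within libxl limit: {qemu_startup < LIBXL_SPAWN_TIMEOUT})")
print(f"finish phase gave up after: {finish_wait:.3f}s")
```

So the finish phase gave up about two seconds before QEMU reported "running", even though libxl itself would have waited longer.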

> 
> In the meantime we move to the finish phase and timeout waiting for the
> above perform phase to complete
> 
> 2018-08-19 17:05:19.096+0000: 3492: debug : virThreadJobSet:96 : Thread 3492
> (virNetServerHandleJob) is now running job
> remoteDispatchDomainMigrateFinish3Params
> ...
> 2018-08-19 17:05:49.253+0000: 3492: warning : libxlDomainObjBeginJob:151 :
> Cannot start job (modify) for domain debian.guest.osstest; current job is
> (modify) owned by (24982)
> 2018-08-19 17:05:49.253+0000: 3492: error : libxlDomainObjBeginJob:155 :
> Timed out during operation: cannot acquire state change lock
> 
> What could be causing the long startup time of qemu on these hosts? Does
> dom0 have enough cpu/memory? As you noticed, the libvirt commit used for
> this test has not changed in a long time, well before the failures appeared.
> Perhaps a subtle change in libxl is exposing the bug?

There have only been two changes to libxl in the range of changesets
being tested.

c257e35a libxl: qemu_disk_scsi_drive_string: Break out common parts of disk config
5d92007c libxl: restore passing "readonly=" to qemu for SCSI disks

They wouldn't change how libxl interacts with libvirt. The QEMU tag
didn't change.

Wei.


* Re: [xen-4.9-testing test] 126201: regressions - FAIL
  2018-08-24  8:58     ` Wei Liu
@ 2018-08-27  7:50       ` Jan Beulich
  2018-08-30 10:57       ` Wei Liu
  2018-09-05 21:37       ` Jim Fehlig
  2 siblings, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2018-08-27  7:50 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, Jim Fehlig, osstest service owner

>>> On 24.08.18 at 10:58, <wei.liu2@citrix.com> wrote:
> On Wed, Aug 22, 2018 at 04:52:27PM -0600, Jim Fehlig wrote:
>> What could be causing the long startup time of qemu on these hosts? Does
>> dom0 have enough cpu/memory? As you noticed, the libvirt commit used for
>> this test has not changed in a long time, well before the failures appeared.
>> Perhaps a subtle change in libxl is exposing the bug?
> 
> There have only been two changes to libxl in the range of changesets
> being tested.
> 
> c257e35a libxl: qemu_disk_scsi_drive_string: Break out common parts of disk config
> 5d92007c libxl: restore passing "readonly=" to qemu for SCSI disks
> 
> They wouldn't change how libxl interact with libvirt. QEMU tag didn't
> change.

I'm afraid this is an unhelpful perspective to take: since the issue is
apparently host-specific, a commit that exposed the bad behavior may
have passed the push gate long ago because the test was performed on
another host.

Jan




* Re: [xen-4.9-testing test] 126201: regressions - FAIL
  2018-08-24  8:58     ` Wei Liu
  2018-08-27  7:50       ` Jan Beulich
@ 2018-08-30 10:57       ` Wei Liu
  2018-09-05 21:37       ` Jim Fehlig
  2 siblings, 0 replies; 10+ messages in thread
From: Wei Liu @ 2018-08-30 10:57 UTC (permalink / raw)
  To: Jim Fehlig; +Cc: xen-devel, Wei Liu, osstest service owner, Jan Beulich

On Fri, Aug 24, 2018 at 09:58:02AM +0100, Wei Liu wrote:
> On Wed, Aug 22, 2018 at 04:52:27PM -0600, Jim Fehlig wrote:
> > On 08/21/2018 05:14 AM, Jan Beulich wrote:
> > > > > > On 21.08.18 at 03:11, <osstest-admin@xenproject.org> wrote:
> > > > flight 126201 xen-4.9-testing real [real]
> > > > http://logs.test-lab.xenproject.org/osstest/logs/126201/
> > > > 
> > > > Regressions :-(
> > > > 
> > > > Tests which did not succeed and are blocking,
> > > > including tests which could not be run:
> > > >   test-amd64-amd64-libvirt-pair 22 guest-migrate/src_host/dst_host fail REGR. vs. 124328
> > > 
> > > Something needs to be done about this, as this continued failure is
> > > blocking the 4.9.3 release. I did mail about this on Aug 2nd already
> > > for flight 125710, I've got back from Wei:
> > > 
> > > > This is libvirtd's error message.
> > > > 
> > > > > The remote host can't obtain the state change lock because it is already
> > > > > held by another task/thread. It could be a libvirt / libxl bug.
> > > > 
> > > > 2018-08-01 16:12:13.433+0000: 3491: warning : libxlDomainObjBeginJob:151 :
> > > > Cannot start job (modify) for domain debian.guest.osstest; current job is (modify) owned by (24975)
> > 
> > I took a closer look at the logs and it appears the finish phase of
> > migration fails to acquire the domain job lock since it is already held by
> > the perform phase. In the perform phase, after the vm has been transferred
> > to the dst, the qemu process associated with the vm is started. For whatever
> > reason that takes a long time on this host:
> > 
> > 2018-08-19 17:05:19.182+0000: libxl: libxl_dm.c:2235:libxl__spawn_local_dm:
> > Domain 1:Spawning device-model /usr/local/lib/xen/bin/qemu-system-i386 with
> > arguments: ...
> > 2018-08-19 17:05:19.188+0000: libxl: libxl_exec.c:398:spawn_watch_event:
> > domain 1 device model: spawn watch p=(null)
> 
> This is a spurious event after the watch has been set up.
> 
> > ...
> > 2018-08-19 17:05:51.529+0000: libxl: libxl_event.c:573:watchfd_callback:
> > watch w=0x7f84a0047ee8 wpath=/local/domain/0/device-model/1/state token=2/1:
> > event epath=/local/domain/0/device-model/1/state
> > 2018-08-19 17:05:51.529+0000: libxl: libxl_exec.c:398:spawn_watch_event:
> > domain 1 device model: spawn watch p=running
> 
> So it has taken 32s for QEMU to write "running" in xenstore. This,
> however, is still within the timeout limit set by libxl (60s).
> 

I haven't been able to reliably reproduce the timeout.

One thing I observe is that libvirt picks the qdisk backend while xl picks
the phys backend.

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* Re: [xen-4.9-testing test] 126201: regressions - FAIL
  2018-08-24  8:58     ` Wei Liu
  2018-08-27  7:50       ` Jan Beulich
  2018-08-30 10:57       ` Wei Liu
@ 2018-09-05 21:37       ` Jim Fehlig
  2018-09-11 22:18         ` Jim Fehlig
  2 siblings, 1 reply; 10+ messages in thread
From: Jim Fehlig @ 2018-09-05 21:37 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, osstest service owner, Jan Beulich

On 08/24/2018 02:58 AM, Wei Liu wrote:
> On Wed, Aug 22, 2018 at 04:52:27PM -0600, Jim Fehlig wrote:
>> On 08/21/2018 05:14 AM, Jan Beulich wrote:
>>>>>> On 21.08.18 at 03:11, <osstest-admin@xenproject.org> wrote:
>>>> flight 126201 xen-4.9-testing real [real]
>>>> http://logs.test-lab.xenproject.org/osstest/logs/126201/
>>>>
>>>> Regressions :-(
>>>>
>>>> Tests which did not succeed and are blocking,
>>>> including tests which could not be run:
>>>>    test-amd64-amd64-libvirt-pair 22 guest-migrate/src_host/dst_host fail REGR. vs. 124328
>>>
>>> Something needs to be done about this, as this continued failure is
>>> blocking the 4.9.3 release. I did mail about this on Aug 2nd already
>>> for flight 125710, I've got back from Wei:
>>>
>>>> This is libvirtd's error message.
>>>>
>>>> The remote host can't obtain the state change log due to it is already
>>>> held by another task/thread. It could be a libvirt / libxl bug.
>>>>
>>>> 2018-08-01 16:12:13.433+0000: 3491: warning : libxlDomainObjBeginJob:151 :
>>>> Cannot start job (modify) for domain debian.guest.osstest; current job is (modify) owned by (24975)
>>
>> I took a closer look at the logs and it appears the finish phase of
>> migration fails to acquire the domain job lock since it is already held by
>> the perform phase. In the perform phase, after the vm has been transferred
>> to the dst, the qemu process associated with the vm is started. For whatever
>> reason that takes a long time on this host:
>>
>> 2018-08-19 17:05:19.182+0000: libxl: libxl_dm.c:2235:libxl__spawn_local_dm:
>> Domain 1:Spawning device-model /usr/local/lib/xen/bin/qemu-system-i386 with
>> arguments: ...
>> 2018-08-19 17:05:19.188+0000: libxl: libxl_exec.c:398:spawn_watch_event:
>> domain 1 device model: spawn watch p=(null)
> 
> This is a spurious event after the watch has been set up.
> 
>> ...
>> 2018-08-19 17:05:51.529+0000: libxl: libxl_event.c:573:watchfd_callback:
>> watch w=0x7f84a0047ee8 wpath=/local/domain/0/device-model/1/state token=2/1:
>> event epath=/local/domain/0/device-model/1/state
>> 2018-08-19 17:05:51.529+0000: libxl: libxl_exec.c:398:spawn_watch_event:
>> domain 1 device model: spawn watch p=running
> 
> So it has taken 32s for QEMU to write "running" in xenstore. This,
> however, is still within the timeout limit set by libxl (60s).

Right, but it is not within libvirt's job wait timeout, which is 30s.
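
To make the arithmetic explicit, here is a minimal sketch that computes the
device-model startup time from the two libxl timestamps quoted above and
compares it with both limits. The 60s and 30s values are the ones stated in
this thread, not read from the libxl or libvirt sources, and the constant and
function names are hypothetical, chosen only for illustration.

```python
from datetime import datetime

# Timestamps copied from the libxl log lines quoted in this thread.
SPAWN_TS = "2018-08-19 17:05:19.182+0000"    # libxl__spawn_local_dm
RUNNING_TS = "2018-08-19 17:05:51.529+0000"  # spawn watch p=running

# Limits as stated in this thread, in seconds (hypothetical constant names).
LIBXL_DM_SPAWN_TIMEOUT = 60
LIBVIRT_JOB_WAIT_TIMEOUT = 30


def parse_ts(ts):
    """Parse a log timestamp of the form 'YYYY-MM-DD HH:MM:SS.mmm+0000'."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f%z")


def elapsed_seconds(start, end):
    """Return the number of seconds between two log timestamps."""
    return (parse_ts(end) - parse_ts(start)).total_seconds()


elapsed = elapsed_seconds(SPAWN_TS, RUNNING_TS)
print(f"device model startup took {elapsed:.3f}s")
print("within libxl timeout:", elapsed <= LIBXL_DM_SPAWN_TIMEOUT)
print("within libvirt job wait:", elapsed <= LIBVIRT_JOB_WAIT_TIMEOUT)
```

The startup time lands between the two limits, which matches the failure
mode described above: libxl keeps waiting for QEMU while the finish phase has
already given up waiting for the job held by the perform phase.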

I've sent a series to fix this and other problems I found while testing and debugging:

https://www.redhat.com/archives/libvir-list/2018-September/msg00178.html

Assuming those patches are committed to libvirt.git master, it's not clear how 
they will improve this and other tests that use an older, fixed libvirt commit.

Regards,
Jim


* Re: [xen-4.9-testing test] 126201: regressions - FAIL
  2018-09-05 21:37       ` Jim Fehlig
@ 2018-09-11 22:18         ` Jim Fehlig
  0 siblings, 0 replies; 10+ messages in thread
From: Jim Fehlig @ 2018-09-11 22:18 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, osstest service owner, Jan Beulich

On 9/5/18 3:37 PM, Jim Fehlig wrote:
> On 08/24/2018 02:58 AM, Wei Liu wrote:
>> On Wed, Aug 22, 2018 at 04:52:27PM -0600, Jim Fehlig wrote:
>>> On 08/21/2018 05:14 AM, Jan Beulich wrote:
>>>>>>> On 21.08.18 at 03:11, <osstest-admin@xenproject.org> wrote:
>>>>> flight 126201 xen-4.9-testing real [real]
>>>>> http://logs.test-lab.xenproject.org/osstest/logs/126201/
>>>>>
>>>>> Regressions :-(
>>>>>
>>>>> Tests which did not succeed and are blocking,
>>>>> including tests which could not be run:
>>>>>    test-amd64-amd64-libvirt-pair 22 guest-migrate/src_host/dst_host fail REGR. vs. 124328
>>>>
>>>> Something needs to be done about this, as this continued failure is
>>>> blocking the 4.9.3 release. I did mail about this on Aug 2nd already
>>>> for flight 125710, I've got back from Wei:
>>>>
>>>>> This is libvirtd's error message.
>>>>>
>>>>> The remote host can't obtain the state change log due to it is already
>>>>> held by another task/thread. It could be a libvirt / libxl bug.
>>>>>
>>>>> 2018-08-01 16:12:13.433+0000: 3491: warning : libxlDomainObjBeginJob:151 :
>>>>> Cannot start job (modify) for domain debian.guest.osstest; current job is (modify) owned by (24975)
>>>
>>> I took a closer look at the logs and it appears the finish phase of
>>> migration fails to acquire the domain job lock since it is already held by
>>> the perform phase. In the perform phase, after the vm has been transferred
>>> to the dst, the qemu process associated with the vm is started. For whatever
>>> reason that takes a long time on this host:
>>>
>>> 2018-08-19 17:05:19.182+0000: libxl: libxl_dm.c:2235:libxl__spawn_local_dm:
>>> Domain 1:Spawning device-model /usr/local/lib/xen/bin/qemu-system-i386 with
>>> arguments: ...
>>> 2018-08-19 17:05:19.188+0000: libxl: libxl_exec.c:398:spawn_watch_event:
>>> domain 1 device model: spawn watch p=(null)
>>
>> This is a spurious event after the watch has been set up.
>>
>>> ...
>>> 2018-08-19 17:05:51.529+0000: libxl: libxl_event.c:573:watchfd_callback:
>>> watch w=0x7f84a0047ee8 wpath=/local/domain/0/device-model/1/state token=2/1:
>>> event epath=/local/domain/0/device-model/1/state
>>> 2018-08-19 17:05:51.529+0000: libxl: libxl_exec.c:398:spawn_watch_event:
>>> domain 1 device model: spawn watch p=running
>>
>> So it has taken 32s for QEMU to write "running" in xenstore. This,
>> however, is still within the timeout limit set by libxl (60s).
> 
> Right, but it is not within libvirt's job wait timeout, which is 30s.
> 
> I've sent a series to fix this and other problems I found while testing/debugging
> 
> https://www.redhat.com/archives/libvir-list/2018-September/msg00178.html
> 
> Assuming those patches are committed to libvirt.git master, it's not clear how 
> they will improve this and other tests that use an older, fixed libvirt commit.

FYI, the patches fixing this problem on the libvirt side have now been
committed to libvirt.git master. See commits 60b4fd90, e39c66d3, 47da84e0,
0149464a, and 5ea2abb3.

Regards,
Jim


end of thread, other threads:[~2018-09-11 22:18 UTC | newest]

Thread overview: 10+ messages
2018-08-21  1:11 [xen-4.9-testing test] 126201: regressions - FAIL osstest service owner
2018-08-21 11:14 ` Jan Beulich
2018-08-21 11:44   ` Roger Pau Monné
2018-08-21 11:58     ` Jan Beulich
     [not found] ` <5B7BF42E02000078001E06A7@suse.com>
2018-08-22 22:52   ` Jim Fehlig
2018-08-24  8:58     ` Wei Liu
2018-08-27  7:50       ` Jan Beulich
2018-08-30 10:57       ` Wei Liu
2018-09-05 21:37       ` Jim Fehlig
2018-09-11 22:18         ` Jim Fehlig
