* [libvirt test] 55257: regressions - FAIL
@ 2015-05-11 12:46 osstest service user
2015-05-11 13:22 ` Ian Campbell
0 siblings, 1 reply; 15+ messages in thread
From: osstest service user @ 2015-05-11 12:46 UTC (permalink / raw)
To: xen-devel; +Cc: ian.jackson
flight 55257 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/55257/
Regressions :-(
Tests which did not succeed and are blocking,
including tests which could not be run:
test-amd64-amd64-libvirt 11 guest-start fail REGR. vs. 53854
Tests which did not succeed, but are not blocking:
test-amd64-i386-libvirt-xsm 11 guest-start fail never pass
test-amd64-amd64-libvirt-xsm 11 guest-start fail never pass
test-armhf-armhf-libvirt-xsm 6 xen-boot fail never pass
test-amd64-i386-libvirt 12 migrate-support-check fail never pass
test-armhf-armhf-libvirt 12 migrate-support-check fail never pass
version targeted for testing:
libvirt 8910e063dbafc09695b2100c80213be569abb7ef
baseline version:
libvirt fd74e231751334b64af0934b680c5cc62f652453
------------------------------------------------------------
People who touched revisions under test:
Cole Robinson <crobinso@redhat.com>
------------------------------------------------------------
jobs:
build-amd64-xsm pass
build-armhf-xsm pass
build-i386-xsm pass
build-amd64 pass
build-armhf pass
build-i386 pass
build-amd64-libvirt pass
build-armhf-libvirt pass
build-i386-libvirt pass
build-amd64-pvops pass
build-armhf-pvops pass
build-i386-pvops pass
test-amd64-amd64-libvirt-xsm fail
test-armhf-armhf-libvirt-xsm fail
test-amd64-i386-libvirt-xsm fail
test-amd64-amd64-libvirt fail
test-armhf-armhf-libvirt pass
test-amd64-i386-libvirt pass
------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/osstest/pub/logs
images: /home/osstest/pub/images
Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs
Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary
Not pushing.
------------------------------------------------------------
commit 8910e063dbafc09695b2100c80213be569abb7ef
Author: Cole Robinson <crobinso@redhat.com>
Date: Wed May 6 18:32:05 2015 -0400
caps: Fix regression defaulting to host arch
My commit 747761a79 (v1.2.15 only) dropped this bit of logic when filling
in a default arch in the XML:
-    /* First try to find one matching host arch */
-    for (i = 0; i < caps->nguests; i++) {
-        if (caps->guests[i]->ostype == ostype) {
-            for (j = 0; j < caps->guests[i]->arch.ndomains; j++) {
-                if (caps->guests[i]->arch.domains[j]->type == domain &&
-                    caps->guests[i]->arch.id == caps->host.arch)
-                    return caps->guests[i]->arch.id;
-            }
-        }
-    }
That attempt to match host.arch is important, otherwise we end up
defaulting to i686 on x86_64 host for KVM, which is not intended.
Duplicate it in the centralized CapsLookup function.
Additionally add some testcases that would have caught this.
https://bugzilla.redhat.com/show_bug.cgi?id=1219191
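The lookup order described in the commit message can be sketched roughly as below. This is a Python illustration of the logic only, not libvirt's C code: the `Guest` class, its fields, and `default_arch` are hypothetical names, and the "first match" fallback is an assumption based on the i686-on-x86_64 symptom mentioned above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Guest:
    """Hypothetical stand-in for one guest entry in the capabilities."""
    ostype: str
    arch: str
    domains: List[str] = field(default_factory=list)


def default_arch(guests: List[Guest], host_arch: str,
                 ostype: str, domain: str) -> Optional[str]:
    """Pick a default arch, preferring one that matches the host."""
    matches = [g.arch for g in guests
               if g.ostype == ostype and domain in g.domains]
    # First try to find one matching the host arch.
    if host_arch in matches:
        return host_arch
    # Assumed fallback: the first guest that matches at all.
    return matches[0] if matches else None


guests = [
    Guest("hvm", "i686", ["qemu", "kvm"]),
    Guest("hvm", "x86_64", ["qemu", "kvm"]),
]
print(default_arch(guests, "x86_64", "hvm", "kvm"))
```

Without the host-arch step, the same input would return the first listed entry, i686, which is the regression the commit describes.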
* Re: [libvirt test] 55257: regressions - FAIL
2015-05-11 12:46 [libvirt test] 55257: regressions - FAIL osstest service user
@ 2015-05-11 13:22 ` Ian Campbell
2015-05-11 16:36 ` Jim Fehlig
0 siblings, 1 reply; 15+ messages in thread
From: Ian Campbell @ 2015-05-11 13:22 UTC (permalink / raw)
To: xen-devel, ian.jackson, Jim Fehlig
On Mon, 2015-05-11 at 12:46 +0000, osstest service user wrote:
> flight 55257 libvirt real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/55257/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
> test-amd64-amd64-libvirt 11 guest-start fail REGR. vs. 53854
I fear this may be a new heisenbug.
I fear a heisenbug because flight 53854 passed and there is only one
more, completely unrelated change here.
I saw something similar in
http://logs.test-lab.xenproject.org/osstest/logs/53721/ which was an
osstest flight against itself (so not posted to the list). That one had:
> test-amd64-i386-libvirt 11 guest-start fail REGR. vs. 53073
> test-amd64-amd64-libvirt 11 guest-start fail REGR. vs. 53073
In that case the range of libvirt was more useful than the one commit
here. It was 225aa80246d5..63a368012df, FWIW. Being a heisenbug I'm not
sure if 225aa80246d5 was OK or not.
http://logs.test-lab.xenproject.org/osstest/logs/55257/test-amd64-amd64-libvirt/merlot1---var-log-libvirt-libxl-libxl-driver.log ends with:
libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk vdev=xvda2 spec.backend=qdisk
libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979cd10750: deregister unregistered
libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk vdev=xvda1 spec.backend=qdisk
libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979ccdd370: deregister unregistered
libxl: debug: libxl_dm.c:1487:libxl__spawn_local_dm: Spawning device-model /usr/local/lib/xen/bin/qemu-system-i386 with arguments:
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: /usr/local/lib/xen/bin/qemu-system-i386
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -xen-domid
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: 1
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -chardev
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: socket,id=libxl-cmd,path=/var/run/xen/qmp-libxl-1,server,nowait
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -no-shutdown
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -mon
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: chardev=libxl-cmd,mode=control
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -chardev
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: socket,id=libxenstat-cmd,path=/var/run/xen/qmp-libxenstat-1,server,nowait
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -mon
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: chardev=libxenstat-cmd,mode=control
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -nodefaults
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -xen-attach
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -name
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: debian.guest.osstest
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -vnc
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: none
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -display
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: none
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -nographic
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -machine
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: xenpv
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -m
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: 512
libxl: debug: libxl_event.c:577:libxl__ev_xswatch_register: watch w=0x7f979cd0f430 wpath=/local/domain/0/device-model/1/state token=3/0: register slotnum=3
libxl: debug: libxl_create.c:1549:do_domain_create: ao 0x7f979cce1b10: inprogress: poller=0x7f979ccdd260, flags=i
libxl: debug: libxl_event.c:514:watchfd_callback: watch w=0x7f979cd0f430 wpath=/local/domain/0/device-model/1/state token=3/0: event epath=/local/domain/0/device-model/1/state
libxl: debug: libxl_aoutils.c:87:xswait_timeout_callback: domain 1 device model startup: xswait timeout (path=/local/domain/0/device-model/1/state)
libxl: debug: libxl_event.c:615:libxl__ev_xswatch_deregister: watch w=0x7f979cd0f430 wpath=/local/domain/0/device-model/1/state token=3/0: deregister slotnum=3
libxl: error: libxl_exec.c:393:spawn_watch_event: domain 1 device model: startup timed out
libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979cd0f430: deregister unregistered
libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979cd0f430: deregister unregistered
libxl: error: libxl_dm.c:1558:device_model_spawn_outcome: domain 1 device model: spawn failed (rc=-3)
libxl: error: libxl_create.c:1351:domcreate_devmodel_started: device model did not start: -3
libxl: debug: libxl_dm.c:1671:kill_device_model: Device Model signaled
libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979cd12ce0: deregister unregistered
libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979cd13010: deregister unregistered
libxl: info: libxl.c:1701:devices_destroy_cb: forked pid 18657 for destroy of domain 1
libxl: debug: libxl_event.c:1766:libxl__ao_complete: ao 0x7f979cce1b10: complete, rc=-3
The qemu log is sadly empty so I've no clue why this timed out.
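For long driver logs like the one above, a small filter can pull out just the error-level lines. The following is a minimal sketch, assuming the `libxl: <level>: <file>:<line>:<function>: <message>` layout seen in the excerpt; `LINE_RE` and `errors_only` are illustrative names, not part of osstest or libxl.

```python
import re

# Matches lines such as
# "libxl: error: libxl_exec.c:393:spawn_watch_event: domain 1 ..."
LINE_RE = re.compile(
    r"^libxl: (?P<level>\w+): (?P<file>[\w.]+):(?P<line>\d+):"
    r"(?P<func>\w+): (?P<msg>.*)$"
)


def errors_only(lines):
    """Yield (file, line, func, msg) for every libxl line at error level."""
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if m and m.group("level") == "error":
            yield (m.group("file"), int(m.group("line")),
                   m.group("func"), m.group("msg"))


# Three lines copied from the log excerpt in this message.
sample = [
    "libxl: debug: libxl_aoutils.c:87:xswait_timeout_callback: domain 1 "
    "device model startup: xswait timeout "
    "(path=/local/domain/0/device-model/1/state)",
    "libxl: error: libxl_exec.c:393:spawn_watch_event: domain 1 "
    "device model: startup timed out",
    "libxl: error: libxl_dm.c:1558:device_model_spawn_outcome: domain 1 "
    "device model: spawn failed (rc=-3)",
]
for entry in errors_only(sample):
    print(entry)
```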
Perhaps there is something in
http://logs.test-lab.xenproject.org/osstest/logs/55257/test-amd64-amd64-libvirt/merlot1---var-log-libvirt-libvirtd.log.gz
I can't make head nor tail of it though.
> Tests which did not succeed, but are not blocking:
> test-amd64-i386-libvirt-xsm 11 guest-start fail never pass
> test-amd64-amd64-libvirt-xsm 11 guest-start fail never pass
> test-armhf-armhf-libvirt-xsm 6 xen-boot fail never pass
> test-amd64-i386-libvirt 12 migrate-support-check fail never pass
> test-armhf-armhf-libvirt 12 migrate-support-check fail never pass
>
> version targeted for testing:
> libvirt 8910e063dbafc09695b2100c80213be569abb7ef
> baseline version:
> libvirt fd74e231751334b64af0934b680c5cc62f652453
>
* Re: [libvirt test] 55257: regressions - FAIL
2015-05-11 13:22 ` Ian Campbell
@ 2015-05-11 16:36 ` Jim Fehlig
2015-05-11 17:02 ` Ian Campbell
2015-05-13 8:46 ` Ian Campbell
0 siblings, 2 replies; 15+ messages in thread
From: Jim Fehlig @ 2015-05-11 16:36 UTC (permalink / raw)
To: Ian Campbell; +Cc: xen-devel, ian.jackson
Ian Campbell wrote:
> On Mon, 2015-05-11 at 12:46 +0000, osstest service user wrote:
>
>> flight 55257 libvirt real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/55257/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>> test-amd64-amd64-libvirt 11 guest-start fail REGR. vs. 53854
>>
>
> I fear this may be a new heisenbug.
>
> I fear a heisenbug because flight 53854 passed and there is only one
> more, completely unrelated change here.
>
> I saw something similar in
> http://logs.test-lab.xenproject.org/osstest/logs/53721/ which was an
> osstest flight against itself (so not posted to the list). That one had:
>
>> test-amd64-i386-libvirt 11 guest-start fail REGR. vs. 53073
>> test-amd64-amd64-libvirt 11 guest-start fail REGR. vs. 53073
>>
>
> In that case the range of libvirt was more useful than the one commit
> here. It was 225aa80246d5..63a368012df, FWIW. Being a heisenbug I'm not
> sure if 225aa80246d5 was OK or not
>
225aa80246d5 only touches the qemu driver and should not affect Xen.
> http://logs.test-lab.xenproject.org/osstest/logs/55257/test-amd64-amd64-libvirt/merlot1---var-log-libvirt-libxl-libxl-driver.log ends with:
>
> libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk vdev=xvda2 spec.backend=qdisk
> libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979cd10750: deregister unregistered
> libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk vdev=xvda1 spec.backend=qdisk
> libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979ccdd370: deregister unregistered
> libxl: debug: libxl_dm.c:1487:libxl__spawn_local_dm: Spawning device-model /usr/local/lib/xen/bin/qemu-system-i386 with arguments:
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: /usr/local/lib/xen/bin/qemu-system-i386
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -xen-domid
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: 1
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -chardev
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: socket,id=libxl-cmd,path=/var/run/xen/qmp-libxl-1,server,nowait
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -no-shutdown
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -mon
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: chardev=libxl-cmd,mode=control
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -chardev
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: socket,id=libxenstat-cmd,path=/var/run/xen/qmp-libxenstat-1,server,nowait
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -mon
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: chardev=libxenstat-cmd,mode=control
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -nodefaults
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -xen-attach
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -name
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: debian.guest.osstest
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -vnc
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: none
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -display
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: none
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -nographic
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -machine
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: xenpv
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -m
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: 512
> libxl: debug: libxl_event.c:577:libxl__ev_xswatch_register: watch w=0x7f979cd0f430 wpath=/local/domain/0/device-model/1/state token=3/0: register slotnum=3
> libxl: debug: libxl_create.c:1549:do_domain_create: ao 0x7f979cce1b10: inprogress: poller=0x7f979ccdd260, flags=i
> libxl: debug: libxl_event.c:514:watchfd_callback: watch w=0x7f979cd0f430 wpath=/local/domain/0/device-model/1/state token=3/0: event epath=/local/domain/0/device-model/1/state
> libxl: debug: libxl_aoutils.c:87:xswait_timeout_callback: domain 1 device model startup: xswait timeout (path=/local/domain/0/device-model/1/state)
> libxl: debug: libxl_event.c:615:libxl__ev_xswatch_deregister: watch w=0x7f979cd0f430 wpath=/local/domain/0/device-model/1/state token=3/0: deregister slotnum=3
> libxl: error: libxl_exec.c:393:spawn_watch_event: domain 1 device model: startup timed out
> libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979cd0f430: deregister unregistered
> libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979cd0f430: deregister unregistered
> libxl: error: libxl_dm.c:1558:device_model_spawn_outcome: domain 1 device model: spawn failed (rc=-3)
> libxl: error: libxl_create.c:1351:domcreate_devmodel_started: device model did not start: -3
> libxl: debug: libxl_dm.c:1671:kill_device_model: Device Model signaled
> libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979cd12ce0: deregister unregistered
> libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979cd13010: deregister unregistered
> libxl: info: libxl.c:1701:devices_destroy_cb: forked pid 18657 for destroy of domain 1
> libxl: debug: libxl_event.c:1766:libxl__ao_complete: ao 0x7f979cce1b10: complete, rc=-3
>
> The qemu log is sadly empty so I've no clue why this timed out.
>
I guess qemu didn't run at all...
> Perhaps there is something in
> http://logs.test-lab.xenproject.org/osstest/logs/55257/test-amd64-amd64-libvirt/merlot1---var-log-libvirt-libvirtd.log.gz
> I can't make heads nor tail though.
>
Nothing interesting. Only the unhelpful
2015-05-11 12:42:17.451+0000: 4280: error : libxlDomainStart:1032 :
internal error: libxenlight failed to create new domain
'debian.guest.osstest'
Off topic, but I'd really like to improve reporting of libxl errors in
libvirt. Currently, when calls to libxl_foo fail, libvirt simply
reports something like "libxenlight failed foo". Users must resort to
/var/log/libvirt/libxl/libxl-driver.log and
/var/log/xen/qemu-dm-<domname>.log for details. Perhaps a topic for the
dev summit.
Regards,
Jim
* Re: [libvirt test] 55257: regressions - FAIL
2015-05-11 16:36 ` Jim Fehlig
@ 2015-05-11 17:02 ` Ian Campbell
2015-05-13 8:46 ` Ian Campbell
1 sibling, 0 replies; 15+ messages in thread
From: Ian Campbell @ 2015-05-11 17:02 UTC (permalink / raw)
To: Jim Fehlig; +Cc: xen-devel, ian.jackson
On Mon, 2015-05-11 at 10:36 -0600, Jim Fehlig wrote:
> Ian Campbell wrote:
> > On Mon, 2015-05-11 at 12:46 +0000, osstest service user wrote:
> >
> >> flight 55257 libvirt real [real]
> >> http://logs.test-lab.xenproject.org/osstest/logs/55257/
> >>
> >> Regressions :-(
> >>
> >> Tests which did not succeed and are blocking,
> >> including tests which could not be run:
> >> test-amd64-amd64-libvirt 11 guest-start fail REGR. vs. 53854
> >>
> >
> > I fear this may be a new heisenbug.
> >
> > I fear a heisenbug because flight 53854 passed and there is only one
> > more, completely unrelated change here.
> >
> > I saw something similar in
> > http://logs.test-lab.xenproject.org/osstest/logs/53721/ which was an
> > osstest flight against itself (so not posted to the list). That one had:
> >
> >> test-amd64-i386-libvirt 11 guest-start fail REGR. vs. 53073
> >> test-amd64-amd64-libvirt 11 guest-start fail REGR. vs. 53073
> >>
> >
> > In that case the range of libvirt was more useful than the one commit
> > here. It was 225aa80246d5..63a368012df, FWIW. Being a heisenbug I'm not
> > sure if 225aa80246d5 was OK or not
> >
>
> 225aa80246d5 only touches the qemu driver and should not affect Xen.
I meant the state of the tree at that point rather than that commit
itself.
Ian.
* Re: [libvirt test] 55257: regressions - FAIL
2015-05-11 16:36 ` Jim Fehlig
2015-05-11 17:02 ` Ian Campbell
@ 2015-05-13 8:46 ` Ian Campbell
2015-05-13 17:46 ` Anthony PERARD
1 sibling, 1 reply; 15+ messages in thread
From: Ian Campbell @ 2015-05-13 8:46 UTC (permalink / raw)
To: Jim Fehlig; +Cc: xen-devel, ian.jackson
On Mon, 2015-05-11 at 10:36 -0600, Jim Fehlig wrote:
[...]
> > The qemu log is sadly empty so I've no clue why this timed out.
> >
>
> I guess qemu didn't run at all...
>
> > Perhaps there is something in
> > http://logs.test-lab.xenproject.org/osstest/logs/55257/test-amd64-amd64-libvirt/merlot1---var-log-libvirt-libvirtd.log.gz
> > I can't make heads nor tail though.
> >
>
> Nothing interesting. Only the unhelpful
>
> 2015-05-11 12:42:17.451+0000: 4280: error : libxlDomainStart:1032 :
> internal error: libxenlight failed to create new domain
> 'debian.guest.osstest'
This happened again in
http://logs.test-lab.xenproject.org/osstest/logs/55349/test-amd64-amd64-libvirt/info.html
Is there anything we could tweak in osstest to produce more helpful
logging?
> Off topic, but I'd really like to improve reporting of libxl errors in
> libvirt. Currently, when calls to libxl_foo fail, libvirt simply
> reports something like "libxenlight failed foo". Users must resort to
> /var/log/libvirt/libxl/libxl-driver.log and
> /var/log/xen/qemu-dm-<domname>.log for details. Perhaps a topic for the
> dev summit.
Indeed.
One thing we would like to do is to have more specific error codes so
that ERROR_FAIL is not returned everywhere. The xapi guys would like
this too. In general we are happy to have error codes which are used for
exactly one specific type of failure and to take patches to switch
things from ERROR_FAIL to use something more meaningful.
Other ideas:
A logger which, as well as logging, would cache the last N messages of a
certain priority or higher, in such a way that the caller could query
them and output them. If the priority was >= ERROR I think that would on
most failures get you the most relevant things.
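The "cache the last N messages" idea could look something like this. It is a sketch in Python using the standard `logging` module, purely to illustrate the shape of the interface; it says nothing about how the xentoollog or libxl side would actually be implemented, and `RecentMessages` is a made-up name.

```python
import logging
from collections import deque


class RecentMessages(logging.Handler):
    """Keep the last `capacity` messages at `level` or above for later queries."""

    def __init__(self, capacity=8, level=logging.ERROR):
        super().__init__(level=level)
        # A bounded deque drops the oldest entry once it is full.
        self._recent = deque(maxlen=capacity)

    def emit(self, record):
        self._recent.append(record.getMessage())

    def recent(self):
        """Return the cached messages, oldest first."""
        return list(self._recent)


log = logging.getLogger("sketch.libxl")
log.setLevel(logging.DEBUG)
log.propagate = False
cache = RecentMessages(capacity=2)
log.addHandler(cache)

log.debug("watch registered")
log.error("device model: startup timed out")
log.error("device model: spawn failed (rc=-3)")
log.error("device model did not start: -3")
print(cache.recent())
```

A caller such as libvirt could then ask for `recent()` after a failed call and attach the result to its own error report.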
I wonder if it would even be possible to buffer up all of the calls to a
given libxl_* entry point, in such a way that the messages associated
with exactly that call could be retrieved. If we could find a way to
integrate that with, say, the GC_INIT infrastructure then we would get
it for free almost everywhere (not sure how recursive calls to libxl_*
rather than libxl__* would be handled there).
Ian.
* Re: [libvirt test] 55257: regressions - FAIL
2015-05-13 8:46 ` Ian Campbell
@ 2015-05-13 17:46 ` Anthony PERARD
2015-05-14 10:47 ` Ian Campbell
0 siblings, 1 reply; 15+ messages in thread
From: Anthony PERARD @ 2015-05-13 17:46 UTC (permalink / raw)
To: Ian Campbell; +Cc: Jim Fehlig, xen-devel, ian.jackson
On Wed, May 13, 2015 at 09:46:28AM +0100, Ian Campbell wrote:
> On Mon, 2015-05-11 at 10:36 -0600, Jim Fehlig wrote:
> [...]
> > > The qemu log is sadly empty so I've no clue why this timed out.
> > >
> >
> > I guess qemu didn't run at all...
> >
> > > Perhaps there is something in
> > > http://logs.test-lab.xenproject.org/osstest/logs/55257/test-amd64-amd64-libvirt/merlot1---var-log-libvirt-libvirtd.log.gz
> > > I can't make heads nor tail though.
> > >
> >
> > Nothing interesting. Only the unhelpful
> >
> > 2015-05-11 12:42:17.451+0000: 4280: error : libxlDomainStart:1032 :
> > internal error: libxenlight failed to create new domain
> > 'debian.guest.osstest'
>
> This happened again in
> http://logs.test-lab.xenproject.org/osstest/logs/55349/test-amd64-amd64-libvirt/info.html
>
> Is there anything we could tweak in osstest to produce more helpful
> logging?
Well we can find in var-log-libvirt-libvirtd.log.gz this:
2015-05-12 17:39:35.180+0000: 4329: error : libxlDomainStart:1032 : internal error: libxenlight failed to create new domain 'debian.guest.osstest'
And for more information we need to look into the driver specific log,
libxl logs in var-log-libvirt-libxl-libxl-driver.log:
libxl: error: libxl_exec.c:393:spawn_watch_event: domain 1 device model: startup timed out
I'm seeing this error a lot on our OpenStack CI loop. I thought the error
was due to the "host" being very busy, but if osstest is having the same
issue, then there is probably something wrong with libxl+libvirt :(.
--
Anthony PERARD
* Re: [libvirt test] 55257: regressions - FAIL
2015-05-13 17:46 ` Anthony PERARD
@ 2015-05-14 10:47 ` Ian Campbell
2015-05-14 11:07 ` Anthony PERARD
2015-05-14 21:21 ` Jim Fehlig
0 siblings, 2 replies; 15+ messages in thread
From: Ian Campbell @ 2015-05-14 10:47 UTC (permalink / raw)
To: Anthony PERARD; +Cc: Jim Fehlig, xen-devel, ian.jackson
On Wed, 2015-05-13 at 18:46 +0100, Anthony PERARD wrote:
> On Wed, May 13, 2015 at 09:46:28AM +0100, Ian Campbell wrote:
> > On Mon, 2015-05-11 at 10:36 -0600, Jim Fehlig wrote:
> > [...]
> > > > The qemu log is sadly empty so I've no clue why this timed out.
> > > >
> > >
> > > I guess qemu didn't run at all...
> > >
> > > > Perhaps there is something in
> > > > http://logs.test-lab.xenproject.org/osstest/logs/55257/test-amd64-amd64-libvirt/merlot1---var-log-libvirt-libvirtd.log.gz
> > > > I can't make heads nor tail though.
> > > >
> > >
> > > Nothing interesting. Only the unhelpful
> > >
> > > 2015-05-11 12:42:17.451+0000: 4280: error : libxlDomainStart:1032 :
> > > internal error: libxenlight failed to create new domain
> > > 'debian.guest.osstest'
> >
> > This happened again in
> > http://logs.test-lab.xenproject.org/osstest/logs/55349/test-amd64-amd64-libvirt/info.html
> >
> > Is there anything we could tweak in osstest to produce more helpful
> > logging?
>
> Well we can find in var-log-libvirt-libvirtd.log.gz this:
> 2015-05-12 17:39:35.180+0000: 4329: error : libxlDomainStart:1032 : internal error: libxenlight failed to create new domain 'debian.guest.osstest'
>
> And for more information we need to look into the driver specific log,
> libxl logs in var-log-libvirt-libxl-libxl-driver.log:
> libxl: error: libxl_exec.c:393:spawn_watch_event: domain 1 device model: startup timed out
Thanks, all of that was mentioned earlier in the thread too, I was
looking for ways to get more info.
> I'm seeing this error a lot on our OpenStack CI loop. I thought the error
> was due to the "host" being very busy, but if osstest is having the same
> issue, then there is probably something wrong with libxl+libvirt :(.
Are you able to reproduce at will or is it like osstest and just a
sporadic failure?
I suppose the openstack CI loop doesn't capture anything more
interesting than osstest does?
FWIW http://logs.test-lab.xenproject.org/osstest/logs/55443/ seems to
have two more instances of this (amd64 and i386), but with no
interesting logs still and a different one on ARM:
http://logs.test-lab.xenproject.org/osstest/logs/55443/test-armhf-armhf-libvirt/11.ts-guest-start.log:
2015-05-13 09:23:32.193+0000: 16389: info : libvirt version: 1.2.16
2015-05-13 09:23:32.193+0000: 16389: warning : virKeepAliveTimerInternal:143 : No response from client 0xb7000c38 after 6 keepalive messages in 35 seconds
2015-05-13 09:23:32.193+0000: 16390: warning : virKeepAliveTimerInternal:143 : No response from client 0xb7000c38 after 6 keepalive messages in 35 seconds
error: Failed to create domain from /etc/xen/debian.guest.osstest.cfg.xml
error: internal error: received hangup / error event on socket
In that case the libxl-driver log ends with:
libxl: debug: libxl_dm.c:1495:libxl__spawn_local_dm: Spawning device-model /usr/local/lib/xen/bin/qemu-system-i386 with arguments:
[...]
libxl: debug: libxl_event.c:600:libxl__ev_xswatch_register: watch w=0xb2e07bcc wpath=/local/domain/0/device-model/1/state token=3/0: register slotnum=3
libxl: debug: libxl_create.c:1560:do_domain_create: ao 0xb2e044f0: inprogress: poller=0xb2e07590, flags=i
libxl: debug: libxl_event.c:537:watchfd_callback: watch w=0xb2e07bcc wpath=/local/domain/0/device-model/1/state token=3/0: event epath=/local/domain/0/device-model/1/state
Which I don't think is complete, i.e. there should be more? Not sure if
this gives a hint for the x86 case too?
I don't see anything useful in
http://logs.test-lab.xenproject.org/osstest/logs/55443/test-armhf-armhf-libvirt/arndale-lakeside---var-log-libvirt-libvirtd.log.gz
Ian.
* Re: [libvirt test] 55257: regressions - FAIL
2015-05-14 10:47 ` Ian Campbell
@ 2015-05-14 11:07 ` Anthony PERARD
2015-05-14 21:27 ` Jim Fehlig
2015-05-14 21:21 ` Jim Fehlig
1 sibling, 1 reply; 15+ messages in thread
From: Anthony PERARD @ 2015-05-14 11:07 UTC (permalink / raw)
To: Ian Campbell; +Cc: Jim Fehlig, xen-devel, ian.jackson
On Thu, May 14, 2015 at 11:47:18AM +0100, Ian Campbell wrote:
> On Wed, 2015-05-13 at 18:46 +0100, Anthony PERARD wrote:
> > On Wed, May 13, 2015 at 09:46:28AM +0100, Ian Campbell wrote:
> > > On Mon, 2015-05-11 at 10:36 -0600, Jim Fehlig wrote:
> > > [...]
> > > > > The qemu log is sadly empty so I've no clue why this timed out.
> > > > >
> > > >
> > > > I guess qemu didn't run at all...
> > > >
> > > > > Perhaps there is something in
> > > > > http://logs.test-lab.xenproject.org/osstest/logs/55257/test-amd64-amd64-libvirt/merlot1---var-log-libvirt-libvirtd.log.gz
> > > > > I can't make heads nor tail though.
> > > > >
> > > >
> > > > Nothing interesting. Only the unhelpful
> > > >
> > > > 2015-05-11 12:42:17.451+0000: 4280: error : libxlDomainStart:1032 :
> > > > internal error: libxenlight failed to create new domain
> > > > 'debian.guest.osstest'
> > >
> > > This happened again in
> > > http://logs.test-lab.xenproject.org/osstest/logs/55349/test-amd64-amd64-libvirt/info.html
> > >
> > > Is there anything we could tweak in osstest to produce more helpful
> > > logging?
> >
> > Well we can find in var-log-libvirt-libvirtd.log.gz this:
> > 2015-05-12 17:39:35.180+0000: 4329: error : libxlDomainStart:1032 : internal error: libxenlight failed to create new domain 'debian.guest.osstest'
> >
> > And for more information we need to look into the driver specific log,
> > libxl logs in var-log-libvirt-libxl-libxl-driver.log:
> > libxl: error: libxl_exec.c:393:spawn_watch_event: domain 1 device model: startup timed out
>
> Thanks, all of that was mentioned earlier in the thread too, I was
> looking for ways to get more info.
>
> > I'm seeing this error a lot on our OpenStack CI loop. I thought the error
> > was due to the "host" being very busy, but if osstest is having the same
> > issue, then there is probably something wrong with libxl+libvirt :(.
>
> Are you able to reproduce at will or is it like osstest and just a
> sporadic failure?
Like osstest. I haven't spent much time on it yet, and I have not tried to
reproduce it.
> I suppose the openstack CI loop doesn't capture anything more
> interesting than osstest does?
No, nothing else interesting. The next step would be to enable more debug
output from libvirtd by playing with "log_level" and "log_filters" in
/etc/libvirt/libvirtd.conf, but I don't know which filter would be interesting.
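As a starting point, a libvirtd.conf fragment along these lines would turn on debug output for the libxl driver. This is only a sketch: the filter string and the output path are assumptions to adapt, not settings taken from osstest or the OpenStack CI.

```conf
# 1 = DEBUG, 2 = INFO, 3 = WARNING, 4 = ERROR
log_level = 1
# Assumed filter: debug-level messages from sources whose name matches "libxl"
log_filters = "1:libxl"
# Send everything at debug level and above to a file (path is an assumption)
log_outputs = "1:file:/var/log/libvirt/libvirtd.log"
```

libvirtd needs a restart to pick up changes to this file.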
--
Anthony PERARD
* Re: [libvirt test] 55257: regressions - FAIL
2015-05-14 10:47 ` Ian Campbell
2015-05-14 11:07 ` Anthony PERARD
@ 2015-05-14 21:21 ` Jim Fehlig
2015-05-14 21:31 ` Jim Fehlig
` (2 more replies)
1 sibling, 3 replies; 15+ messages in thread
From: Jim Fehlig @ 2015-05-14 21:21 UTC (permalink / raw)
To: Ian Campbell; +Cc: Anthony PERARD, xen-devel, ian.jackson
Ian Campbell wrote:
> On Wed, 2015-05-13 at 18:46 +0100, Anthony PERARD wrote:
>
>> On Wed, May 13, 2015 at 09:46:28AM +0100, Ian Campbell wrote:
>>
>>> On Mon, 2015-05-11 at 10:36 -0600, Jim Fehlig wrote:
>>> [...]
>>>
>>>>> The qemu log is sadly empty so I've no clue why this timed out.
>>>>>
>>>>>
>>>> I guess qemu didn't run at all...
>>>>
>>>>
>>>>> Perhaps there is something in
>>>>> http://logs.test-lab.xenproject.org/osstest/logs/55257/test-amd64-amd64-libvirt/merlot1---var-log-libvirt-libvirtd.log.gz
>>>>> I can't make heads nor tail though.
>>>>>
>>>>>
>>>> Nothing interesting. Only the unhelpful
>>>>
>>>> 2015-05-11 12:42:17.451+0000: 4280: error : libxlDomainStart:1032 :
>>>> internal error: libxenlight failed to create new domain
>>>> 'debian.guest.osstest'
>>>>
>>> This happened again in
>>> http://logs.test-lab.xenproject.org/osstest/logs/55349/test-amd64-amd64-libvirt/info.html
>>>
>>> Is there anything we could tweak in osstest to produce more helpful
>>> logging?
>>>
>> Well we can find in var-log-libvirt-libvirtd.log.gz this:
>> 2015-05-12 17:39:35.180+0000: 4329: error : libxlDomainStart:1032 : internal error: libxenlight failed to create new domain 'debian.guest.osstest'
>>
>> And for more information we need to look into the driver specific log,
>> libxl logs in var-log-libvirt-libxl-libxl-driver.log:
>> libxl: error: libxl_exec.c:393:spawn_watch_event: domain 1 device model: startup timed out
>>
>
> Thanks, all of that was mentioned earlier in the thread too, I was
> looking for ways to get more info.
>
>
>> I'm seeing this error a lot on our OpenStack CI loop. I thought the error
>> was due to the "host" being very busy, but if osstest is having the same
>> issue, then there is probably something wrong with libxl+libvirt :(.
>>
>
> Are you able to reproduce at will or is it like osstest and just a
> sporadic failure?
>
> I suppose the openstack CI loop doesn't capture anything more
> interesting than osstest does?
>
> FWIW http://logs.test-lab.xenproject.org/osstest/logs/55443/ seems to
> have two more instances of this (amd64 and i386)
More cases of qemu not starting. I'm not sure how we can get more
details about that.
> but with no
> interesting logs still and a different one on ARM:
>
> http://logs.test-lab.xenproject.org/osstest/logs/55443/test-armhf-armhf-libvirt/11.ts-guest-start.log:
> 2015-05-13 09:23:32.193+0000: 16389: info : libvirt version: 1.2.16
> 2015-05-13 09:23:32.193+0000: 16389: warning : virKeepAliveTimerInternal:143 : No response from client 0xb7000c38 after 6 keepalive messages in 35 seconds
> 2015-05-13 09:23:32.193+0000: 16390: warning : virKeepAliveTimerInternal:143 : No response from client 0xb7000c38 after 6 keepalive messages in 35 seconds
> error: Failed to create domain from /etc/xen/debian.guest.osstest.cfg.xml
> error: internal error: received hangup / error event on socket
>
In this case it seems libvirtd crashed.
> In that case the libxl-driver log ends with:
> libxl: debug: libxl_dm.c:1495:libxl__spawn_local_dm: Spawning device-model /usr/local/lib/xen/bin/qemu-system-i386 with arguments:
> [...]
> libxl: debug: libxl_event.c:600:libxl__ev_xswatch_register: watch w=0xb2e07bcc wpath=/local/domain/0/device-model/1/state token=3/0: register slotnum=3
> libxl: debug: libxl_create.c:1560:do_domain_create: ao 0xb2e044f0: inprogress: poller=0xb2e07590, flags=i
> libxl: debug: libxl_event.c:537:watchfd_callback: watch w=0xb2e07bcc wpath=/local/domain/0/device-model/1/state token=3/0: event epath=/local/domain/0/device-model/1/state
>
> Which I don't think is complete, i.e. there should be more? Not sure if
> this gives a hint for the x86 case too?
>
More hint that libvirtd crashed. Have there been any attempts to
reproduce this outside of the test rig? Or capture a core dump?
Regards,
Jim
* Re: [libvirt test] 55257: regressions - FAIL
2015-05-14 11:07 ` Anthony PERARD
@ 2015-05-14 21:27 ` Jim Fehlig
0 siblings, 0 replies; 15+ messages in thread
From: Jim Fehlig @ 2015-05-14 21:27 UTC (permalink / raw)
To: Anthony PERARD; +Cc: xen-devel, ian.jackson, Ian Campbell
Anthony PERARD wrote:
> On Thu, May 14, 2015 at 11:47:18AM +0100, Ian Campbell wrote:
>
>> I suppose the openstack CI loop doesn't capture anything more
>> interesting than osstest does?
>>
>
> No, nothing else interesting. The next step would be to enable more debug
> output from libvirtd by playing with "log_level" and "log_filters" in
> /etc/libvirtd.conf, but I don't know which filter would be interesting.
>
log_level is already set to DEBUG. And the xen tool logger used by the
libxl driver is also set to XTL_DEBUG. I'm not aware of any more debug
or logging to enable.
Regards,
Jim
* Re: [libvirt test] 55257: regressions - FAIL
2015-05-14 21:21 ` Jim Fehlig
@ 2015-05-14 21:31 ` Jim Fehlig
2015-05-15 8:44 ` Ian Campbell
2015-05-15 10:39 ` Anthony PERARD
2 siblings, 0 replies; 15+ messages in thread
From: Jim Fehlig @ 2015-05-14 21:31 UTC (permalink / raw)
To: Ian Campbell; +Cc: Anthony PERARD, xen-devel, ian.jackson
Jim Fehlig wrote:
> More hint that libvirtd crashed. Have there been any attempts to
> reproduce this outside of the test rig? Or capture a core dump?
>
FYI, I've unsuccessfully tried to reproduce this using config similar to
debian.guest.osstest.cfg.xml.
Regards,
Jim
* Re: [libvirt test] 55257: regressions - FAIL
2015-05-14 21:21 ` Jim Fehlig
2015-05-14 21:31 ` Jim Fehlig
@ 2015-05-15 8:44 ` Ian Campbell
2015-05-15 10:39 ` Anthony PERARD
2 siblings, 0 replies; 15+ messages in thread
From: Ian Campbell @ 2015-05-15 8:44 UTC (permalink / raw)
To: Jim Fehlig; +Cc: Anthony PERARD, xen-devel, ian.jackson
On Thu, 2015-05-14 at 15:21 -0600, Jim Fehlig wrote:
> > FWIW http://logs.test-lab.xenproject.org/osstest/logs/55443/ seems to
> > have two more instances of this (amd64 and i386)
>
> More cases of qemu not starting. I'm not sure how we can get more
> details about that.
FWIW I dug into this a bit more yesterday having discussed this with Ian
and others a bit.
We wondered if qemu had crashed, but the logs show a time out and libxl
has code in the parent process which receives SIGCHLD and logs + errors
out, so I think it probably isn't that, unless the monitoring code is
buggy somehow (not out of the question, it's probably not exercised
much).
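To make the distinction concrete, here is a minimal Python sketch of a parent that tells "child exited" apart from "startup timed out". It is not libxl's code: libxl uses SIGCHLD handling plus a xenstore watch, whereas this sketch simply waits on the child, and the name `watch_child` is a hypothetical helper chosen for illustration.

```python
import subprocess
import sys


def watch_child(argv, timeout):
    """Spawn argv and report whether it exited or outlived the timeout.

    Hypothetical helper for illustration only; it approximates the
    "child died" versus "timed out" split with subprocess, not with
    SIGCHLD and xenstore as libxl does.
    """
    child = subprocess.Popen(argv)
    try:
        status = child.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The child is still alive: this is the "startup timed out" path.
        child.kill()
        child.wait()
        return ("timed out", None)
    # The child finished first, so the parent sees its exit status.
    return ("exited", status)
```

A crash would take the second return path, with a non-zero status, rather than the timeout path that the logs show.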
Also we expect that a crash would produce a segfault message on the
kernel console, which didn't appear.
We also considered where stderr was going. libxl redirects std{out,err}
for the qemu to the qemu-dm-debian.guest.osstest.log file, which is
captured and empty.
There was some question about where libvirt's own stderr was going
(/dev/null or perhaps the console) but it doesn't appear as if anything
is going wrong in libvirt itself and as above we capture the std* for
processes which we spawn ourselves.
Lastly libvirtd is still running and is shown in the ps logs captured.
>
> > but with no
> > interesting logs still and a different one on ARM:
> >
> > http://logs.test-lab.xenproject.org/osstest/logs/55443/test-armhf-armhf-libvirt/11.ts-guest-start.log:
> > 2015-05-13 09:23:32.193+0000: 16389: info : libvirt version: 1.2.16
> > 2015-05-13 09:23:32.193+0000: 16389: warning : virKeepAliveTimerInternal:143 : No response from client 0xb7000c38 after 6 keepalive messages in 35 seconds
> > 2015-05-13 09:23:32.193+0000: 16390: warning : virKeepAliveTimerInternal:143 : No response from client 0xb7000c38 after 6 keepalive messages in 35 seconds
> > error: Failed to create domain from /etc/xen/debian.guest.osstest.cfg.xml
> > error: internal error: received hangup / error event on socket
> >
>
> In this case it seems libvirtd crashed.
http://logs.test-lab.xenproject.org/osstest/logs/55443/test-armhf-armhf-libvirt/arndale-lakeside-output-ps_wwwaxf_-eo_pid%2Ctty%2Cstat%2Ctime%2Cnice%2Cpsr%2Cpcpu%2Cpmem%2Cnwchan%2Cwchan%2325%2Cargs
includes:
2301 ? DLl 00:00:00 0 0 0.0 1.6 ffffff fdget_pos /usr/local/sbin/libvirtd -d
16395 ? S 00:00:00 0 0 0.0 0.5 24b6dc wait \_ /usr/local/sbin/libvirtd -d
16396 ? Ssl 00:00:00 0 0 0.0 1.9 ffffff poll_schedule_timeout \_ /usr/local/lib/xen/bin/qemu-system-i386 -xen-domid 1 -chardev socket,id=libxl-cmd,path=/var/run/xen/qmp-libxl-1,server,nowait -no-shutdown -mon chardev=libxl-cmd,mode=control -chardev socket,id=libxenstat-cmd,path=/var/run/xen/qmp-libxenstat-1,server,nowait -mon chardev=libxenstat-cmd,mode=control -nodefaults -xen-attach -name debian.guest.osstest -vnc none -display none -nographic -machine xenpv -m 512
So I don't think it has crashed, it's even successfully spawned a qemu
it seems.
Comparing the libxl-driver.log here with the amd64 case:
libxl: debug: libxl_event.c:537:watchfd_callback: watch w=0x7ff4d70595e0 wpath=/local/domain/0/device-model/1/state token=3/0: event epath=/local/domain/0/device-model/1/state
[arm stops here, amd64 continues with the remainder]
libxl: debug: libxl_aoutils.c:87:xswait_timeout_callback: domain 1 device model startup: xswait timeout (path=/local/domain/0/device-model/1/state)
libxl: debug: libxl_event.c:638:libxl__ev_xswatch_deregister: watch w=0x7ff4d70595e0 wpath=/local/domain/0/device-model/1/state token=3/0: deregister slotnum=3
libxl: error: libxl_exec.c:393:spawn_watch_event: domain 1 device model: startup timed out
libxl: debug: libxl_event.c:652:libxl__ev_xswatch_deregister: watch w=0x7ff4d70595e0: deregister unregistered
libxl: debug: libxl_event.c:652:libxl__ev_xswatch_deregister: watch w=0x7ff4d70595e0: deregister unregistered
libxl: error: libxl_dm.c:1565:device_model_spawn_outcome: domain 1 device model: spawn failed (rc=-3)
libxl: error: libxl_create.c:1362:domcreate_devmodel_started: device model did not start: -3
libxl: debug: libxl_dm.c:1678:kill_device_model: Device Model signaled
libxl: debug: libxl_event.c:652:libxl__ev_xswatch_deregister: watch w=0x7ff4d702f3c0: deregister unregistered
libxl: debug: libxl_event.c:652:libxl__ev_xswatch_deregister: watch w=0x7ff4d7031290: deregister unregistered
libxl: debug: libxl.c:1701:devices_destroy_cb: forked pid 18588 for destroy of domain 1
libxl: debug: libxl_event.c:1768:libxl__ao_complete: ao 0x7ff4d702ed60: complete, rc=-3
libxl: debug: libxl_event.c:1740:libxl__ao__destroy: ao 0x7ff4d702ed60: destroy
I wonder if we are somehow losing an event or getting the event loop screwed up.
Perhaps in the amd64 case we are somehow losing the xenstore watch, in
the armhf case we are losing some other fd which interferes with
libvirt's own event loop?
So I think we are looking at either a hang or an event processing SNAFU
rather than a crash.
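The comparison of the two libxl-driver logs can be sketched in Python as follows. This is only an illustration, assuming both logs are available as lists of lines; `normalize` and `tail_after` are hypothetical names. Pointer values and source line numbers are masked because they differ between architectures and builds.

```python
import re

# Pointers (0x...) and source line numbers vary between builds and runs,
# so mask them before comparing lines.
_POINTER = re.compile(r"0x[0-9a-fA-F]+")
_SRC_LINE = re.compile(r"(\.c):\d+:")


def normalize(line):
    """Return the line with pointers and source line numbers masked."""
    line = _POINTER.sub("0xPTR", line.strip())
    return _SRC_LINE.sub(r"\1:N:", line)


def tail_after(short_log, long_log):
    """Return the lines of long_log that follow the last line of short_log.

    Matches on the first occurrence after normalization. Returns None
    if the last line of short_log does not appear in long_log.
    """
    if not short_log:
        return list(long_log)
    last = normalize(short_log[-1])
    for i, line in enumerate(long_log):
        if normalize(line) == last:
            return long_log[i + 1:]
    return None
```

Called with the armhf log as `short_log` and the amd64 log as `long_log`, it returns the part of the amd64 log that the armhf run never reached.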
BTW, in the above there is "Device Model signaled", which indicates that
kill(pid, SIGHUP) returned 0 and not e.g. ESRCH (when it would log
"Device Model already exited") or anything else (when it would log
"failed to kill..."). So the qemu process was actually present.
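The three-way outcome described here can be sketched in Python as follows. This is an illustration of the branching only, not libxl's C code; the function name `signal_device_model` and the short outcome strings are assumptions, and the `kill` parameter is injectable purely so the sketch can be exercised without a real process.

```python
import errno
import os
import signal


def signal_device_model(pid, kill=os.kill):
    """Send SIGHUP to pid and classify the result.

    Hypothetical helper: the returned strings are short tags for the
    three cases, not the exact messages libxl logs.
    """
    try:
        kill(pid, signal.SIGHUP)
    except OSError as e:
        if e.errno == errno.ESRCH:
            # No such process: the device model was already gone.
            return "already exited"
        # Any other error, e.g. EPERM.
        return "failed to kill"
    # kill() succeeded, so a process with that pid existed.
    return "signaled"
```

Only the last branch corresponds to what the log shows, which is why the qemu process must have existed when the signal was sent.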
The host is doing nothing other than running this one test case, so it
doesn't seem likely that we are really hitting the 30s qemu startup
timeout.
Ian.
* Re: [libvirt test] 55257: regressions - FAIL
2015-05-14 21:21 ` Jim Fehlig
2015-05-14 21:31 ` Jim Fehlig
2015-05-15 8:44 ` Ian Campbell
@ 2015-05-15 10:39 ` Anthony PERARD
2015-05-15 11:54 ` Ian Campbell
2 siblings, 1 reply; 15+ messages in thread
From: Anthony PERARD @ 2015-05-15 10:39 UTC (permalink / raw)
To: Jim Fehlig; +Cc: xen-devel, ian.jackson, Ian Campbell
On Thu, May 14, 2015 at 03:21:41PM -0600, Jim Fehlig wrote:
> More hint that libvirtd crashed. Have there been any attempts to
> reproduce this outside of the test rig? Or capture a core dump?
Here are two from the OpenStack CI loop:
http://logs.openstack.xenproject.org/10/181110/5/check/dsvm-tempest-xen/6005c68
http://logs.openstack.xenproject.org/21/183221/2/check/dsvm-tempest-xen/56324b0
in logs/libvirt/libxl/libxl-driver.txt.gz, you will find:
libxl: error: libxl_exec.c:396:spawn_timeout: domain 108 device model: startup timed out
libxl: error: libxl_dm.c:1388:device_model_spawn_outcome: domain 108 device model: spawn failed (rc=-3)
libxl: error: libxl_create.c:1186:domcreate_devmodel_started: device model did not start: -3
Weird, it's the same domain number for both logs :).
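For anyone sifting through these driver logs, a minimal Python sketch for pulling out just the error lines might look like this. The regular expression is an assumption inferred from the lines quoted in this thread, and `libxl_errors` is a hypothetical helper name.

```python
import re

# Layout inferred from the quoted lines:
#   libxl: <level>: <file>:<line>:<function>: <message>
LIBXL_LINE = re.compile(
    r"^libxl: (?P<level>\w+): (?P<file>[\w.]+):(?P<line>\d+):"
    r"(?P<func>\w+): (?P<msg>.*)$"
)


def libxl_errors(lines):
    """Yield (function, message) for every libxl line at level 'error'."""
    for raw in lines:
        m = LIBXL_LINE.match(raw.strip())
        if m and m.group("level") == "error":
            yield m.group("func"), m.group("msg")
```

For a compressed log, the file object returned by `gzip.open(path, "rt")` can be passed straight in as `lines`.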
Other useful logs from openstack can be found in logs/screen-n-cpu.txt.gz,
which is the service that talks to libvirtd.
It's running libvirt 1.2.14 with:
f86ae40 libxl: Move job acquisition in libxlDomainStart to callers
894d2ff libxl: acquire a job when destroying a domain
6dfec1e libxl: drop virDomainObj lock when destroying a domain
and xen 4.4.1 with:
9369988 libxl: event handling: Break out ao_work_outstanding
f1335f0 libxl: event handling: ao_inprogress does waits while reports outstanding
4783c99 libxl: In domain death search, start search at first domid we want
188e9c5 libxl: Domain destroy: fork
http://wiki.xenproject.org/wiki/OpenStack_CI_Loop_for_Xen-Libvirt#Baseline
No libvirtd crash.
--
Anthony PERARD
* Re: [libvirt test] 55257: regressions - FAIL
2015-05-15 10:39 ` Anthony PERARD
@ 2015-05-15 11:54 ` Ian Campbell
2015-05-15 15:33 ` Anthony PERARD
0 siblings, 1 reply; 15+ messages in thread
From: Ian Campbell @ 2015-05-15 11:54 UTC (permalink / raw)
To: Anthony PERARD; +Cc: Jim Fehlig, xen-devel, ian.jackson
On Fri, 2015-05-15 at 11:39 +0100, Anthony PERARD wrote:
> On Thu, May 14, 2015 at 03:21:41PM -0600, Jim Fehlig wrote:
> > More hint that libvirtd crashed. Have there been any attempts to
> > reproduce this outside of the test rig? Or capture a core dump?
>
> Here are two from the OpenStack CI loop:
> http://logs.openstack.xenproject.org/10/181110/5/check/dsvm-tempest-xen/6005c68
> http://logs.openstack.xenproject.org/21/183221/2/check/dsvm-tempest-xen/56324b0
>
> in logs/libvirt/libxl/libxl-driver.txt.gz, you will find:
> libxl: error: libxl_exec.c:396:spawn_timeout: domain 108 device model: startup timed out
> libxl: error: libxl_dm.c:1388:device_model_spawn_outcome: domain 108 device model: spawn failed (rc=-3)
> libxl: error: libxl_create.c:1186:domcreate_devmodel_started: device model did not start: -3
>
> Weird, it's the same domain number for both logs :).
>
> Other useful logs from openstack can be found in logs/screen-n-cpu.txt.gz,
> which is the service that talks to libvirtd.
>
> It's running libvirt 1.2.14 with:
> f86ae40 libxl: Move job acquisition in libxlDomainStart to callers
> 894d2ff libxl: acquire a job when destroying a domain
> 6dfec1e libxl: drop virDomainObj lock when destroying a domain
> and xen 4.4.1 with:
> 9369988 libxl: event handling: Break out ao_work_outstanding
> f1335f0 libxl: event handling: ao_inprogress does waits while reports outstanding
> 4783c99 libxl: In domain death search, start search at first domid we want
> 188e9c5 libxl: Domain destroy: fork
> http://wiki.xenproject.org/wiki/OpenStack_CI_Loop_for_Xen-Libvirt#Baseline
Interesting.
We didn't used to see these issues, but there has been a rather large
gap where we didn't get useful results due to upheaval from the colo
move and there were other issues (e.g. the crashing issue) which make it
hard to pinpoint a point in time where this didn't happen.
Did you have a previous baseline which didn't exhibit these problems? Or
did it exhibit enough other problems not to be usable?
If we can find some plausible sounding baseline to try (i.e. commit id,
not a commit id + patch queue) then I could try and run some adhoc tests
to establish a baseline.
Perhaps I should try xen.git#stable-4.5 and libvirt.git#1.2.14 in the
first instance? Or I could pick a xen-unstable flight pass from, say,
Easter-ish and try with that?
This seems to be an intermittent bug, so it's not clear that the
bisector is going to be all that useful. However we do do multiple
domain starts now so perhaps the chances of sneaking past are reduced.
Ian.
* Re: [libvirt test] 55257: regressions - FAIL
2015-05-15 11:54 ` Ian Campbell
@ 2015-05-15 15:33 ` Anthony PERARD
0 siblings, 0 replies; 15+ messages in thread
From: Anthony PERARD @ 2015-05-15 15:33 UTC (permalink / raw)
To: Ian Campbell; +Cc: Jim Fehlig, xen-devel, ian.jackson
On Fri, May 15, 2015 at 12:54:32PM +0100, Ian Campbell wrote:
> On Fri, 2015-05-15 at 11:39 +0100, Anthony PERARD wrote:
> > On Thu, May 14, 2015 at 03:21:41PM -0600, Jim Fehlig wrote:
> > > More hint that libvirtd crashed. Have there been any attempts to
> > > reproduce this outside of the test rig? Or capture a core dump?
> >
> > Here are two from the OpenStack CI loop:
> > http://logs.openstack.xenproject.org/10/181110/5/check/dsvm-tempest-xen/6005c68
> > http://logs.openstack.xenproject.org/21/183221/2/check/dsvm-tempest-xen/56324b0
> >
> > in logs/libvirt/libxl/libxl-driver.txt.gz, you will find:
> > libxl: error: libxl_exec.c:396:spawn_timeout: domain 108 device model: startup timed out
> > libxl: error: libxl_dm.c:1388:device_model_spawn_outcome: domain 108 device model: spawn failed (rc=-3)
> > libxl: error: libxl_create.c:1186:domcreate_devmodel_started: device model did not start: -3
> >
> > Weird, it's the same domain number for both logs :).
> >
> > Other useful logs from openstack can be found in logs/screen-n-cpu.txt.gz,
> > which is the service that talks to libvirtd.
> >
> > It's running libvirt 1.2.14 with:
> > f86ae40 libxl: Move job acquisition in libxlDomainStart to callers
> > 894d2ff libxl: acquire a job when destroying a domain
> > 6dfec1e libxl: drop virDomainObj lock when destroying a domain
> > and xen 4.4.1 with:
> > 9369988 libxl: event handling: Break out ao_work_outstanding
> > f1335f0 libxl: event handling: ao_inprogress does waits while reports outstanding
> > 4783c99 libxl: In domain death search, start search at first domid we want
> > 188e9c5 libxl: Domain destroy: fork
> > http://wiki.xenproject.org/wiki/OpenStack_CI_Loop_for_Xen-Libvirt#Baseline
>
> Interesting.
>
> We didn't used to see these issues, but there has been a rather large
> gap where we didn't get useful results due to upheaval from the colo
> move and there were other issues (e.g. the crashing issue) which make it
> hard to pinpoint a point in time where this didn't happen.
>
> Did you have a previous baseline which didn't exhibit these problems? Or
> did it exhibit enough other problems not to be usable?
This is the first baseline to be useful with the CI loop; previous libvirt
releases had other issues.
So in short, no.
--
Anthony PERARD
Thread overview: 15+ messages
2015-05-11 12:46 [libvirt test] 55257: regressions - FAIL osstest service user
2015-05-11 13:22 ` Ian Campbell
2015-05-11 16:36 ` Jim Fehlig
2015-05-11 17:02 ` Ian Campbell
2015-05-13 8:46 ` Ian Campbell
2015-05-13 17:46 ` Anthony PERARD
2015-05-14 10:47 ` Ian Campbell
2015-05-14 11:07 ` Anthony PERARD
2015-05-14 21:27 ` Jim Fehlig
2015-05-14 21:21 ` Jim Fehlig
2015-05-14 21:31 ` Jim Fehlig
2015-05-15 8:44 ` Ian Campbell
2015-05-15 10:39 ` Anthony PERARD
2015-05-15 11:54 ` Ian Campbell
2015-05-15 15:33 ` Anthony PERARD