All of lore.kernel.org
 help / color / mirror / Atom feed
* [libvirt test] 55257: regressions - FAIL
@ 2015-05-11 12:46 osstest service user
  2015-05-11 13:22 ` Ian Campbell
  0 siblings, 1 reply; 15+ messages in thread
From: osstest service user @ 2015-05-11 12:46 UTC (permalink / raw)
  To: xen-devel; +Cc: ian.jackson

flight 55257 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/55257/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-libvirt     11 guest-start               fail REGR. vs. 53854

Tests which did not succeed, but are not blocking:
 test-amd64-i386-libvirt-xsm  11 guest-start                  fail   never pass
 test-amd64-amd64-libvirt-xsm 11 guest-start                  fail   never pass
 test-armhf-armhf-libvirt-xsm  6 xen-boot                     fail   never pass
 test-amd64-i386-libvirt      12 migrate-support-check        fail   never pass
 test-armhf-armhf-libvirt     12 migrate-support-check        fail   never pass

version targeted for testing:
 libvirt              8910e063dbafc09695b2100c80213be569abb7ef
baseline version:
 libvirt              fd74e231751334b64af0934b680c5cc62f652453

------------------------------------------------------------
People who touched revisions under test:
  Cole Robinson <crobinso@redhat.com>
------------------------------------------------------------

jobs:
 build-amd64-xsm                                              pass    
 build-armhf-xsm                                              pass    
 build-i386-xsm                                               pass    
 build-amd64                                                  pass    
 build-armhf                                                  pass    
 build-i386                                                   pass    
 build-amd64-libvirt                                          pass    
 build-armhf-libvirt                                          pass    
 build-i386-libvirt                                           pass    
 build-amd64-pvops                                            pass    
 build-armhf-pvops                                            pass    
 build-i386-pvops                                             pass    
 test-amd64-amd64-libvirt-xsm                                 fail    
 test-armhf-armhf-libvirt-xsm                                 fail    
 test-amd64-i386-libvirt-xsm                                  fail    
 test-amd64-amd64-libvirt                                     fail    
 test-armhf-armhf-libvirt                                     pass    
 test-amd64-i386-libvirt                                      pass    


------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/osstest/pub/logs
images: /home/osstest/pub/images

Logs, config files, etc. are available at
    http://logs.test-lab.xenproject.org/osstest/logs

Test harness code can be found at
    http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

------------------------------------------------------------
commit 8910e063dbafc09695b2100c80213be569abb7ef
Author: Cole Robinson <crobinso@redhat.com>
Date:   Wed May 6 18:32:05 2015 -0400

    caps: Fix regression defaulting to host arch
    
    My commit 747761a79 (v1.2.15 only) dropped this bit of logic when filling
    in a default arch in the XML:
    
    -    /* First try to find one matching host arch */
    -    for (i = 0; i < caps->nguests; i++) {
    -        if (caps->guests[i]->ostype == ostype) {
    -            for (j = 0; j < caps->guests[i]->arch.ndomains; j++) {
    -                if (caps->guests[i]->arch.domains[j]->type == domain &&
    -                    caps->guests[i]->arch.id == caps->host.arch)
    -                    return caps->guests[i]->arch.id;
    -            }
    -        }
    -    }
    
    That attempt to match host.arch is important, otherwise we end up
    defaulting to i686 on x86_64 host for KVM, which is not intended.
    Duplicate it in the centralized CapsLookup function.
    
    Additionally add some testcases that would have caught this.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1219191

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [libvirt test] 55257: regressions - FAIL
  2015-05-11 12:46 [libvirt test] 55257: regressions - FAIL osstest service user
@ 2015-05-11 13:22 ` Ian Campbell
  2015-05-11 16:36   ` Jim Fehlig
  0 siblings, 1 reply; 15+ messages in thread
From: Ian Campbell @ 2015-05-11 13:22 UTC (permalink / raw)
  To: xen-devel, ian.jackson, Jim Fehlig

On Mon, 2015-05-11 at 12:46 +0000, osstest service user wrote:
> flight 55257 libvirt real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/55257/
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-amd64-amd64-libvirt     11 guest-start               fail REGR. vs. 53854

I fear this may be a new heisenbug.

I fear a heisenbug because flight 53854 passed and there is only one
more, completely unrelated change here.

I saw something similar in
http://logs.test-lab.xenproject.org/osstest/logs/53721/ which was an
osstest flight against itself (so not posted to the list). That one had:
>  test-amd64-i386-libvirt      11 guest-start               fail REGR. vs. 53073
>  test-amd64-amd64-libvirt     11 guest-start               fail REGR. vs. 53073

In that case the range of libvirt was more useful than the one commit
here. It was 225aa80246d5..63a368012df, FWIW. Being a heisenbug I'm not
sure if 225aa80246d5 was OK or not

http://logs.test-lab.xenproject.org/osstest/logs/55257/test-amd64-amd64-libvirt/merlot1---var-log-libvirt-libxl-libxl-driver.log ends with:

libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk vdev=xvda2 spec.backend=qdisk
libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979cd10750: deregister unregistered
libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk vdev=xvda1 spec.backend=qdisk
libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979ccdd370: deregister unregistered
libxl: debug: libxl_dm.c:1487:libxl__spawn_local_dm: Spawning device-model /usr/local/lib/xen/bin/qemu-system-i386 with arguments:
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   /usr/local/lib/xen/bin/qemu-system-i386
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   -xen-domid
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   1
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   -chardev
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   socket,id=libxl-cmd,path=/var/run/xen/qmp-libxl-1,server,nowait
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   -no-shutdown
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   -mon
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   chardev=libxl-cmd,mode=control
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   -chardev
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   socket,id=libxenstat-cmd,path=/var/run/xen/qmp-libxenstat-1,server,nowait
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   -mon
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   chardev=libxenstat-cmd,mode=control
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   -nodefaults
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   -xen-attach
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   -name
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   debian.guest.osstest
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   -vnc
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   none
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   -display
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   none
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   -nographic
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   -machine
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   xenpv
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   -m
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   512
libxl: debug: libxl_event.c:577:libxl__ev_xswatch_register: watch w=0x7f979cd0f430 wpath=/local/domain/0/device-model/1/state token=3/0: register slotnum=3
libxl: debug: libxl_create.c:1549:do_domain_create: ao 0x7f979cce1b10: inprogress: poller=0x7f979ccdd260, flags=i
libxl: debug: libxl_event.c:514:watchfd_callback: watch w=0x7f979cd0f430 wpath=/local/domain/0/device-model/1/state token=3/0: event epath=/local/domain/0/device-model/1/state
libxl: debug: libxl_aoutils.c:87:xswait_timeout_callback: domain 1 device model startup: xswait timeout (path=/local/domain/0/device-model/1/state)
libxl: debug: libxl_event.c:615:libxl__ev_xswatch_deregister: watch w=0x7f979cd0f430 wpath=/local/domain/0/device-model/1/state token=3/0: deregister slotnum=3
libxl: error: libxl_exec.c:393:spawn_watch_event: domain 1 device model: startup timed out
libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979cd0f430: deregister unregistered
libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979cd0f430: deregister unregistered
libxl: error: libxl_dm.c:1558:device_model_spawn_outcome: domain 1 device model: spawn failed (rc=-3)
libxl: error: libxl_create.c:1351:domcreate_devmodel_started: device model did not start: -3
libxl: debug: libxl_dm.c:1671:kill_device_model: Device Model signaled
libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979cd12ce0: deregister unregistered
libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979cd13010: deregister unregistered
libxl: info: libxl.c:1701:devices_destroy_cb: forked pid 18657 for destroy of domain 1
libxl: debug: libxl_event.c:1766:libxl__ao_complete: ao 0x7f979cce1b10: complete, rc=-3

The qemu log is sadly empty so I've no clue why this timed out.

Perhaps there is something in 
http://logs.test-lab.xenproject.org/osstest/logs/55257/test-amd64-amd64-libvirt/merlot1---var-log-libvirt-libvirtd.log.gz
I can't make heads nor tail though.


> Tests which did not succeed, but are not blocking:
>  test-amd64-i386-libvirt-xsm  11 guest-start                  fail   never pass
>  test-amd64-amd64-libvirt-xsm 11 guest-start                  fail   never pass
>  test-armhf-armhf-libvirt-xsm  6 xen-boot                     fail   never pass
>  test-amd64-i386-libvirt      12 migrate-support-check        fail   never pass
>  test-armhf-armhf-libvirt     12 migrate-support-check        fail   never pass
> 
> version targeted for testing:
>  libvirt              8910e063dbafc09695b2100c80213be569abb7ef
> baseline version:
>  libvirt              fd74e231751334b64af0934b680c5cc62f652453
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [libvirt test] 55257: regressions - FAIL
  2015-05-11 13:22 ` Ian Campbell
@ 2015-05-11 16:36   ` Jim Fehlig
  2015-05-11 17:02     ` Ian Campbell
  2015-05-13  8:46     ` Ian Campbell
  0 siblings, 2 replies; 15+ messages in thread
From: Jim Fehlig @ 2015-05-11 16:36 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, ian.jackson

Ian Campbell wrote:
> On Mon, 2015-05-11 at 12:46 +0000, osstest service user wrote:
>   
>> flight 55257 libvirt real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/55257/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>>  test-amd64-amd64-libvirt     11 guest-start               fail REGR. vs. 53854
>>     
>
> I fear this may be a new heisenbug.
>
> I fear a heisenbug because flight 53854 passed and there is only one
> more, completely unrelated change here.
>
> I saw something similar in
> http://logs.test-lab.xenproject.org/osstest/logs/53721/ which was an
> osstest flight against itself (so not posted to the list). That one had:
>   
>>  test-amd64-i386-libvirt      11 guest-start               fail REGR. vs. 53073
>>  test-amd64-amd64-libvirt     11 guest-start               fail REGR. vs. 53073
>>     
>
> In that case the range of libvirt was more useful than the one commit
> here. It was 225aa80246d5..63a368012df, FWIW. Being a heisenbug I'm not
> sure if 225aa80246d5 was OK or not
>   

225aa80246d5 only touches the qemu driver and should not affect Xen.

> http://logs.test-lab.xenproject.org/osstest/logs/55257/test-amd64-amd64-libvirt/merlot1---var-log-libvirt-libxl-libxl-driver.log ends with:
>
> libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk vdev=xvda2 spec.backend=qdisk
> libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979cd10750: deregister unregistered
> libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk vdev=xvda1 spec.backend=qdisk
> libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979ccdd370: deregister unregistered
> libxl: debug: libxl_dm.c:1487:libxl__spawn_local_dm: Spawning device-model /usr/local/lib/xen/bin/qemu-system-i386 with arguments:
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   /usr/local/lib/xen/bin/qemu-system-i386
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   -xen-domid
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   1
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   -chardev
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   socket,id=libxl-cmd,path=/var/run/xen/qmp-libxl-1,server,nowait
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   -no-shutdown
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   -mon
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   chardev=libxl-cmd,mode=control
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   -chardev
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   socket,id=libxenstat-cmd,path=/var/run/xen/qmp-libxenstat-1,server,nowait
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   -mon
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   chardev=libxenstat-cmd,mode=control
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   -nodefaults
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   -xen-attach
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   -name
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   debian.guest.osstest
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   -vnc
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   none
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   -display
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   none
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   -nographic
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   -machine
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   xenpv
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   -m
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm:   512
> libxl: debug: libxl_event.c:577:libxl__ev_xswatch_register: watch w=0x7f979cd0f430 wpath=/local/domain/0/device-model/1/state token=3/0: register slotnum=3
> libxl: debug: libxl_create.c:1549:do_domain_create: ao 0x7f979cce1b10: inprogress: poller=0x7f979ccdd260, flags=i
> libxl: debug: libxl_event.c:514:watchfd_callback: watch w=0x7f979cd0f430 wpath=/local/domain/0/device-model/1/state token=3/0: event epath=/local/domain/0/device-model/1/state
> libxl: debug: libxl_aoutils.c:87:xswait_timeout_callback: domain 1 device model startup: xswait timeout (path=/local/domain/0/device-model/1/state)
> libxl: debug: libxl_event.c:615:libxl__ev_xswatch_deregister: watch w=0x7f979cd0f430 wpath=/local/domain/0/device-model/1/state token=3/0: deregister slotnum=3
> libxl: error: libxl_exec.c:393:spawn_watch_event: domain 1 device model: startup timed out
> libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979cd0f430: deregister unregistered
> libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979cd0f430: deregister unregistered
> libxl: error: libxl_dm.c:1558:device_model_spawn_outcome: domain 1 device model: spawn failed (rc=-3)
> libxl: error: libxl_create.c:1351:domcreate_devmodel_started: device model did not start: -3
> libxl: debug: libxl_dm.c:1671:kill_device_model: Device Model signaled
> libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979cd12ce0: deregister unregistered
> libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979cd13010: deregister unregistered
> libxl: info: libxl.c:1701:devices_destroy_cb: forked pid 18657 for destroy of domain 1
> libxl: debug: libxl_event.c:1766:libxl__ao_complete: ao 0x7f979cce1b10: complete, rc=-3
>
> The qemu log is sadly empty so I've no clue why this timed out.
>   

I guess qemu didn't run at all...

> Perhaps there is something in 
> http://logs.test-lab.xenproject.org/osstest/logs/55257/test-amd64-amd64-libvirt/merlot1---var-log-libvirt-libvirtd.log.gz
> I can't make heads nor tail though.
>   

Nothing interesting.  Only the unhelpful

2015-05-11 12:42:17.451+0000: 4280: error : libxlDomainStart:1032 :
internal error: libxenlight failed to create new domain
'debian.guest.osstest'

Off topic, but I'd really like to improve reporting of libxl errors in
libvirt.  Currently, when calls to libxl_foo fail, libvirt simply
reports something like "libxenlight failed foo".  Users must resort to
/var/log/libvirt/libxl/libxl-driver.log and
/var/log/xen/qemu-dm-<domname>.log for details.  Perhaps a topic for the
dev summit.

Regards,
Jim

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [libvirt test] 55257: regressions - FAIL
  2015-05-11 16:36   ` Jim Fehlig
@ 2015-05-11 17:02     ` Ian Campbell
  2015-05-13  8:46     ` Ian Campbell
  1 sibling, 0 replies; 15+ messages in thread
From: Ian Campbell @ 2015-05-11 17:02 UTC (permalink / raw)
  To: Jim Fehlig; +Cc: xen-devel, ian.jackson

On Mon, 2015-05-11 at 10:36 -0600, Jim Fehlig wrote:
> Ian Campbell wrote:
> > On Mon, 2015-05-11 at 12:46 +0000, osstest service user wrote:
> >   
> >> flight 55257 libvirt real [real]
> >> http://logs.test-lab.xenproject.org/osstest/logs/55257/
> >>
> >> Regressions :-(
> >>
> >> Tests which did not succeed and are blocking,
> >> including tests which could not be run:
> >>  test-amd64-amd64-libvirt     11 guest-start               fail REGR. vs. 53854
> >>     
> >
> > I fear this may be a new heisenbug.
> >
> > I fear a heisenbug because flight 53854 passed and there is only one
> > more, completely unrelated change here.
> >
> > I saw something similar in
> > http://logs.test-lab.xenproject.org/osstest/logs/53721/ which was an
> > osstest flight against itself (so not posted to the list). That one had:
> >   
> >>  test-amd64-i386-libvirt      11 guest-start               fail REGR. vs. 53073
> >>  test-amd64-amd64-libvirt     11 guest-start               fail REGR. vs. 53073
> >>     
> >
> > In that case the range of libvirt was more useful than the one commit
> > here. It was 225aa80246d5..63a368012df, FWIW. Being a heisenbug I'm not
> > sure if 225aa80246d5 was OK or not
> >   
> 
> 225aa80246d5 only touches the qemu driver and should not affect Xen.

I meant the state of the tree at that point rather than that commit
itself.

Ian.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [libvirt test] 55257: regressions - FAIL
  2015-05-11 16:36   ` Jim Fehlig
  2015-05-11 17:02     ` Ian Campbell
@ 2015-05-13  8:46     ` Ian Campbell
  2015-05-13 17:46       ` Anthony PERARD
  1 sibling, 1 reply; 15+ messages in thread
From: Ian Campbell @ 2015-05-13  8:46 UTC (permalink / raw)
  To: Jim Fehlig; +Cc: xen-devel, ian.jackson

On Mon, 2015-05-11 at 10:36 -0600, Jim Fehlig wrote:
[...]
> > The qemu log is sadly empty so I've no clue why this timed out.
> >   
> 
> I guess qemu didn't run at all...
> 
> > Perhaps there is something in 
> > http://logs.test-lab.xenproject.org/osstest/logs/55257/test-amd64-amd64-libvirt/merlot1---var-log-libvirt-libvirtd.log.gz
> > I can't make heads nor tail though.
> >   
> 
> Nothing interesting.  Only the unhelpful
> 
> 2015-05-11 12:42:17.451+0000: 4280: error : libxlDomainStart:1032 :
> internal error: libxenlight failed to create new domain
> 'debian.guest.osstest'

This happened again in
http://logs.test-lab.xenproject.org/osstest/logs/55349/test-amd64-amd64-libvirt/info.html

Is there anything we could tweak in osstest to produce more helpful
logging?

> Off topic, but I'd really like to improve reporting of libxl errors in
> libvirt.  Currently, when calls to libxl_foo fail, libvirt simply
> reports something like "libxenlight failed foo".  Users must resort to
> /var/log/libvirt/libxl/libxl-driver.log and
> /var/log/xen/qemu-dm-<domname>.log for details.  Perhaps a topic for the
> dev summit.

Indeed.

One thing we would like to do is to have more specific error codes so
that ERROR_FAIL is not returned everywhere. The xapi guys would like
this too. In general we are happy to have error codes which are used for
exactly one specific type of failure and to take patches to switch
things from ERROR_FAIL to use something more meaningful.

Other ideas:

A logger which, as well as logging, would cache the last N messages of a
certain priority or higher, in such a way that the caller could query
them and output them. If the priority was >= ERROR I think that would on
most failures get you the most relevant things.

I wonder if it would even be possible to buffer up all of the calls to a
given libxl_* entry point, in such a way that the messages associated
with exactly that call could be retrieved. If we could find a way to
integrate that with, say, the GC_INIT infrastructure then we would get
it for free almost everywhere (not sure how recursive calls to libxl_*
rather than libxl__* would be handled there).

Ian.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [libvirt test] 55257: regressions - FAIL
  2015-05-13  8:46     ` Ian Campbell
@ 2015-05-13 17:46       ` Anthony PERARD
  2015-05-14 10:47         ` Ian Campbell
  0 siblings, 1 reply; 15+ messages in thread
From: Anthony PERARD @ 2015-05-13 17:46 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Jim Fehlig, xen-devel, ian.jackson

On Wed, May 13, 2015 at 09:46:28AM +0100, Ian Campbell wrote:
> On Mon, 2015-05-11 at 10:36 -0600, Jim Fehlig wrote:
> [...]
> > > The qemu log is sadly empty so I've no clue why this timed out.
> > >   
> > 
> > I guess qemu didn't run at all...
> > 
> > > Perhaps there is something in 
> > > http://logs.test-lab.xenproject.org/osstest/logs/55257/test-amd64-amd64-libvirt/merlot1---var-log-libvirt-libvirtd.log.gz
> > > I can't make heads nor tail though.
> > >   
> > 
> > Nothing interesting.  Only the unhelpful
> > 
> > 2015-05-11 12:42:17.451+0000: 4280: error : libxlDomainStart:1032 :
> > internal error: libxenlight failed to create new domain
> > 'debian.guest.osstest'
> 
> This happened again in
> http://logs.test-lab.xenproject.org/osstest/logs/55349/test-amd64-amd64-libvirt/info.html
> 
> Is there anything we could tweak in osstest to produce more helpful
> logging?

Well we can find in var-log-libvirt-libvirtd.log.gz this:
2015-05-12 17:39:35.180+0000: 4329: error : libxlDomainStart:1032 : internal error: libxenlight failed to create new domain 'debian.guest.osstest'

And for more information we need to look into the driver specific log,
libxl logs in var-log-libvirt-libxl-libxl-driver.log:
libxl: error: libxl_exec.c:393:spawn_watch_event: domain 1 device model: startup timed out

I'm seeing this error a lot on our OpenStack CI loop, I thought the error
was due to the "host" been very busy, but if osstest is having the same
issue, then there is probably something wrong with libxl+libvirt :(.

-- 
Anthony PERARD

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [libvirt test] 55257: regressions - FAIL
  2015-05-13 17:46       ` Anthony PERARD
@ 2015-05-14 10:47         ` Ian Campbell
  2015-05-14 11:07           ` Anthony PERARD
  2015-05-14 21:21           ` Jim Fehlig
  0 siblings, 2 replies; 15+ messages in thread
From: Ian Campbell @ 2015-05-14 10:47 UTC (permalink / raw)
  To: Anthony PERARD; +Cc: Jim Fehlig, xen-devel, ian.jackson

On Wed, 2015-05-13 at 18:46 +0100, Anthony PERARD wrote:
> On Wed, May 13, 2015 at 09:46:28AM +0100, Ian Campbell wrote:
> > On Mon, 2015-05-11 at 10:36 -0600, Jim Fehlig wrote:
> > [...]
> > > > The qemu log is sadly empty so I've no clue why this timed out.
> > > >   
> > > 
> > > I guess qemu didn't run at all...
> > > 
> > > > Perhaps there is something in 
> > > > http://logs.test-lab.xenproject.org/osstest/logs/55257/test-amd64-amd64-libvirt/merlot1---var-log-libvirt-libvirtd.log.gz
> > > > I can't make heads nor tail though.
> > > >   
> > > 
> > > Nothing interesting.  Only the unhelpful
> > > 
> > > 2015-05-11 12:42:17.451+0000: 4280: error : libxlDomainStart:1032 :
> > > internal error: libxenlight failed to create new domain
> > > 'debian.guest.osstest'
> > 
> > This happened again in
> > http://logs.test-lab.xenproject.org/osstest/logs/55349/test-amd64-amd64-libvirt/info.html
> > 
> > Is there anything we could tweak in osstest to produce more helpful
> > logging?
> 
> Well we can find in var-log-libvirt-libvirtd.log.gz this:
> 2015-05-12 17:39:35.180+0000: 4329: error : libxlDomainStart:1032 : internal error: libxenlight failed to create new domain 'debian.guest.osstest'
> 
> And for more information we need to look into the driver specific log,
> libxl logs in var-log-libvirt-libxl-libxl-driver.log:
> libxl: error: libxl_exec.c:393:spawn_watch_event: domain 1 device model: startup timed out

Thanks, all of that was mentioned earlier in the thread too, I was
looking for ways to get more info.

> I'm seeing this error a lot on our OpenStack CI loop, I thought the error
> was due to the "host" been very busy, but if osstest is having the same
> issue, then there is probably something wrong with libxl+libvirt :(.

Are you able to reproduce at will or is it like osstest and just a
sporadic failure?

I suppose the openstack CI loop doesn't capture anything more
interesting than osstest does?

FWIW http://logs.test-lab.xenproject.org/osstest/logs/55443/ seems to
have two more instances of this (amd64 and i386), but with no 
interesting logs still and a different one on ARM:

http://logs.test-lab.xenproject.org/osstest/logs/55443/test-armhf-armhf-libvirt/11.ts-guest-start.log:
2015-05-13 09:23:32.193+0000: 16389: info : libvirt version: 1.2.16
2015-05-13 09:23:32.193+0000: 16389: warning : virKeepAliveTimerInternal:143 : No response from client 0xb7000c38 after 6 keepalive messages in 35 seconds
2015-05-13 09:23:32.193+0000: 16390: warning : virKeepAliveTimerInternal:143 : No response from client 0xb7000c38 after 6 keepalive messages in 35 seconds
error: Failed to create domain from /etc/xen/debian.guest.osstest.cfg.xml
error: internal error: received hangup / error event on socket

In that case the the libxl-driver log ends with:
libxl: debug: libxl_dm.c:1495:libxl__spawn_local_dm: Spawning device-model /usr/local/lib/xen/bin/qemu-system-i386 with arguments:
[...]
libxl: debug: libxl_event.c:600:libxl__ev_xswatch_register: watch w=0xb2e07bcc wpath=/local/domain/0/device-model/1/state token=3/0: register slotnum=3
libxl: debug: libxl_create.c:1560:do_domain_create: ao 0xb2e044f0: inprogress: poller=0xb2e07590, flags=i
libxl: debug: libxl_event.c:537:watchfd_callback: watch w=0xb2e07bcc wpath=/local/domain/0/device-model/1/state token=3/0: event epath=/local/domain/0/device-model/1/state

Which I don't think is complete, i.e. there should be more? Not sure if
this gives a hint for the x86 case too?

I don't see anything useful in
http://logs.test-lab.xenproject.org/osstest/logs/55443/test-armhf-armhf-libvirt/arndale-lakeside---var-log-libvirt-libvirtd.log.gz

Ian.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [libvirt test] 55257: regressions - FAIL
  2015-05-14 10:47         ` Ian Campbell
@ 2015-05-14 11:07           ` Anthony PERARD
  2015-05-14 21:27             ` Jim Fehlig
  2015-05-14 21:21           ` Jim Fehlig
  1 sibling, 1 reply; 15+ messages in thread
From: Anthony PERARD @ 2015-05-14 11:07 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Jim Fehlig, xen-devel, ian.jackson

On Thu, May 14, 2015 at 11:47:18AM +0100, Ian Campbell wrote:
> On Wed, 2015-05-13 at 18:46 +0100, Anthony PERARD wrote:
> > On Wed, May 13, 2015 at 09:46:28AM +0100, Ian Campbell wrote:
> > > On Mon, 2015-05-11 at 10:36 -0600, Jim Fehlig wrote:
> > > [...]
> > > > > The qemu log is sadly empty so I've no clue why this timed out.
> > > > >   
> > > > 
> > > > I guess qemu didn't run at all...
> > > > 
> > > > > Perhaps there is something in 
> > > > > http://logs.test-lab.xenproject.org/osstest/logs/55257/test-amd64-amd64-libvirt/merlot1---var-log-libvirt-libvirtd.log.gz
> > > > > I can't make heads nor tail though.
> > > > >   
> > > > 
> > > > Nothing interesting.  Only the unhelpful
> > > > 
> > > > 2015-05-11 12:42:17.451+0000: 4280: error : libxlDomainStart:1032 :
> > > > internal error: libxenlight failed to create new domain
> > > > 'debian.guest.osstest'
> > > 
> > > This happened again in
> > > http://logs.test-lab.xenproject.org/osstest/logs/55349/test-amd64-amd64-libvirt/info.html
> > > 
> > > Is there anything we could tweak in osstest to produce more helpful
> > > logging?
> > 
> > Well we can find in var-log-libvirt-libvirtd.log.gz this:
> > 2015-05-12 17:39:35.180+0000: 4329: error : libxlDomainStart:1032 : internal error: libxenlight failed to create new domain 'debian.guest.osstest'
> > 
> > And for more information we need to look into the driver specific log,
> > libxl logs in var-log-libvirt-libxl-libxl-driver.log:
> > libxl: error: libxl_exec.c:393:spawn_watch_event: domain 1 device model: startup timed out
> 
> Thanks, all of that was mentioned earlier in the thread too, I was
> looking for ways to get more info.
> 
> > I'm seeing this error a lot on our OpenStack CI loop, I thought the error
> > was due to the "host" been very busy, but if osstest is having the same
> > issue, then there is probably something wrong with libxl+libvirt :(.
> 
> Are you able to reproduce at will or is it like osstest and just a
> sporadic failure?

Like osstest. I haven't spend much time on it yet, I did not try to
reproduce it yet.

> I suppose the openstack CI loop doesn't capture anything more
> interesting than osstest does?

No, nothing else interesting. The next step would be to enable more debug
output from libvirtd by playing with "log_level" and "log_filters" in
/etc/libvirtd.conf, but I don't know which filter would be intersting.

-- 
Anthony PERARD

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [libvirt test] 55257: regressions - FAIL
  2015-05-14 10:47         ` Ian Campbell
  2015-05-14 11:07           ` Anthony PERARD
@ 2015-05-14 21:21           ` Jim Fehlig
  2015-05-14 21:31             ` Jim Fehlig
                               ` (2 more replies)
  1 sibling, 3 replies; 15+ messages in thread
From: Jim Fehlig @ 2015-05-14 21:21 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Anthony PERARD, xen-devel, ian.jackson

Ian Campbell wrote:
> On Wed, 2015-05-13 at 18:46 +0100, Anthony PERARD wrote:
>   
>> On Wed, May 13, 2015 at 09:46:28AM +0100, Ian Campbell wrote:
>>     
>>> On Mon, 2015-05-11 at 10:36 -0600, Jim Fehlig wrote:
>>> [...]
>>>       
>>>>> The qemu log is sadly empty so I've no clue why this timed out.
>>>>>   
>>>>>           
>>>> I guess qemu didn't run at all...
>>>>
>>>>         
>>>>> Perhaps there is something in 
>>>>> http://logs.test-lab.xenproject.org/osstest/logs/55257/test-amd64-amd64-libvirt/merlot1---var-log-libvirt-libvirtd.log.gz
>>>>> I can't make heads nor tail though.
>>>>>   
>>>>>           
>>>> Nothing interesting.  Only the unhelpful
>>>>
>>>> 2015-05-11 12:42:17.451+0000: 4280: error : libxlDomainStart:1032 :
>>>> internal error: libxenlight failed to create new domain
>>>> 'debian.guest.osstest'
>>>>         
>>> This happened again in
>>> http://logs.test-lab.xenproject.org/osstest/logs/55349/test-amd64-amd64-libvirt/info.html
>>>
>>> Is there anything we could tweak in osstest to produce more helpful
>>> logging?
>>>       
>> Well we can find in var-log-libvirt-libvirtd.log.gz this:
>> 2015-05-12 17:39:35.180+0000: 4329: error : libxlDomainStart:1032 : internal error: libxenlight failed to create new domain 'debian.guest.osstest'
>>
>> And for more information we need to look into the driver specific log,
>> libxl logs in var-log-libvirt-libxl-libxl-driver.log:
>> libxl: error: libxl_exec.c:393:spawn_watch_event: domain 1 device model: startup timed out
>>     
>
> Thanks, all of that was mentioned earlier in the thread too, I was
> looking for ways to get more info.
>
>   
>> I'm seeing this error a lot on our OpenStack CI loop, I thought the error
>> was due to the "host" been very busy, but if osstest is having the same
>> issue, then there is probably something wrong with libxl+libvirt :(.
>>     
>
> Are you able to reproduce at will or is it like osstest and just a
> sporadic failure?
>
> I suppose the openstack CI loop doesn't capture anything more
> interesting than osstest does?
>
> FWIW http://logs.test-lab.xenproject.org/osstest/logs/55443/ seems to
> have two more instances of this (amd64 and i386)

More cases of qemu not starting.  I'm not sure how we can get more
details about that.

>  but with no 
> interesting logs still and a different one on ARM:
>
> http://logs.test-lab.xenproject.org/osstest/logs/55443/test-armhf-armhf-libvirt/11.ts-guest-start.log:
> 2015-05-13 09:23:32.193+0000: 16389: info : libvirt version: 1.2.16
> 2015-05-13 09:23:32.193+0000: 16389: warning : virKeepAliveTimerInternal:143 : No response from client 0xb7000c38 after 6 keepalive messages in 35 seconds
> 2015-05-13 09:23:32.193+0000: 16390: warning : virKeepAliveTimerInternal:143 : No response from client 0xb7000c38 after 6 keepalive messages in 35 seconds
> error: Failed to create domain from /etc/xen/debian.guest.osstest.cfg.xml
> error: internal error: received hangup / error event on socket
>   

In this case it seems libvirtd crashed.

> In that case the the libxl-driver log ends with:
> libxl: debug: libxl_dm.c:1495:libxl__spawn_local_dm: Spawning device-model /usr/local/lib/xen/bin/qemu-system-i386 with arguments:
> [...]
> libxl: debug: libxl_event.c:600:libxl__ev_xswatch_register: watch w=0xb2e07bcc wpath=/local/domain/0/device-model/1/state token=3/0: register slotnum=3
> libxl: debug: libxl_create.c:1560:do_domain_create: ao 0xb2e044f0: inprogress: poller=0xb2e07590, flags=i
> libxl: debug: libxl_event.c:537:watchfd_callback: watch w=0xb2e07bcc wpath=/local/domain/0/device-model/1/state token=3/0: event epath=/local/domain/0/device-model/1/state
>
> Which I don't think is complete, i.e. there should be more? Not sure if
> this gives a hint for the x86 case too?
>   

More hint that libvirtd crashed.  Have there been any attempts to
reproduce this outside of the test rig?  Or capture a core dump?

Regards,
Jim

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [libvirt test] 55257: regressions - FAIL
  2015-05-14 11:07           ` Anthony PERARD
@ 2015-05-14 21:27             ` Jim Fehlig
  0 siblings, 0 replies; 15+ messages in thread
From: Jim Fehlig @ 2015-05-14 21:27 UTC (permalink / raw)
  To: Anthony PERARD; +Cc: xen-devel, ian.jackson, Ian Campbell

Anthony PERARD wrote:
> On Thu, May 14, 2015 at 11:47:18AM +0100, Ian Campbell wrote:
>   
>> I suppose the openstack CI loop doesn't capture anything more
>> interesting than osstest does?
>>     
>
> No, nothing else interesting. The next step would be to enable more debug
> output from libvirtd by playing with "log_level" and "log_filters" in
> /etc/libvirtd.conf, but I don't know which filter would be intersting.
>   

log_level is already set to DEBUG.  And the xen tool logger used by the
libxl driver is also set to XTL_DEBUG.  I'm not aware of any more debug
or logging to enable.

Regards,
Jim

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [libvirt test] 55257: regressions - FAIL
  2015-05-14 21:21           ` Jim Fehlig
@ 2015-05-14 21:31             ` Jim Fehlig
  2015-05-15  8:44             ` Ian Campbell
  2015-05-15 10:39             ` Anthony PERARD
  2 siblings, 0 replies; 15+ messages in thread
From: Jim Fehlig @ 2015-05-14 21:31 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Anthony PERARD, xen-devel, ian.jackson

Jim Fehlig wrote:
> More hint that libvirtd crashed.  Have there been any attempts to
> reproduce this outside of the test rig?  Or capture a core dump?
>   

FYI, I've unsuccessfully tried to reproduce this using config similar to
debian.guest.osstest.cfg.xml.

Regards,
Jim

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [libvirt test] 55257: regressions - FAIL
  2015-05-14 21:21           ` Jim Fehlig
  2015-05-14 21:31             ` Jim Fehlig
@ 2015-05-15  8:44             ` Ian Campbell
  2015-05-15 10:39             ` Anthony PERARD
  2 siblings, 0 replies; 15+ messages in thread
From: Ian Campbell @ 2015-05-15  8:44 UTC (permalink / raw)
  To: Jim Fehlig; +Cc: Anthony PERARD, xen-devel, ian.jackson

On Thu, 2015-05-14 at 15:21 -0600, Jim Fehlig wrote:

> > FWIW http://logs.test-lab.xenproject.org/osstest/logs/55443/ seems to
> > have two more instances of this (amd64 and i386)
> 
> More cases of qemu not starting.  I'm not sure how we can get more
> details about that.

FWIW I dug into this a bit more yesterday having discussed this with Ian
and others a bit.

We wondered if qemu had crashed, but the logs show a time out and libxl
has code in the parent process which receives SIGCHLD and logs + errors
out, so I think it probably isn't that, unless the monitoring code is
buggy somehow (not out of the question, it's probably not exercised
much).

Also we expect that a crash would produce a segfault message on the
kernel console, which didn't appear.

We also considered where stderr was going. libxl redirects std{out,err}
for the qemu to the qemu-dm-debian.guest.osstest.log file, which is
captured and empty.

There was some question about where libvirt's own stderr was going
(/dev/null or perhaps the console) but it doesn't appear as if anything
is going wrong in libvirt itself and as above we capture the std* for
processes which we spawn ourselves.

Lastly libvirtd is still running and is shown in the ps logs captured.

> 
> >  but with no 
> > interesting logs still and a different one on ARM:
> >
> > http://logs.test-lab.xenproject.org/osstest/logs/55443/test-armhf-armhf-libvirt/11.ts-guest-start.log:
> > 2015-05-13 09:23:32.193+0000: 16389: info : libvirt version: 1.2.16
> > 2015-05-13 09:23:32.193+0000: 16389: warning : virKeepAliveTimerInternal:143 : No response from client 0xb7000c38 after 6 keepalive messages in 35 seconds
> > 2015-05-13 09:23:32.193+0000: 16390: warning : virKeepAliveTimerInternal:143 : No response from client 0xb7000c38 after 6 keepalive messages in 35 seconds
> > error: Failed to create domain from /etc/xen/debian.guest.osstest.cfg.xml
> > error: internal error: received hangup / error event on socket
> >   
> 
> In this case it seems libvirtd crashed.

http://logs.test-lab.xenproject.org/osstest/logs/55443/test-armhf-armhf-libvirt/arndale-lakeside-output-ps_wwwaxf_-eo_pid%2Ctty%2Cstat%2Ctime%2Cnice%2Cpsr%2Cpcpu%2Cpmem%2Cnwchan%2Cwchan%2325%2Cargs 

includes:
 2301 ?        DLl  00:00:00   0   0  0.0  1.6 ffffff fdget_pos                 /usr/local/sbin/libvirtd -d
16395 ?        S    00:00:00   0   0  0.0  0.5 24b6dc wait                       \_ /usr/local/sbin/libvirtd -d
16396 ?        Ssl  00:00:00   0   0  0.0  1.9 ffffff poll_schedule_timeout          \_ /usr/local/lib/xen/bin/qemu-system-i386 -xen-domid 1 -chardev socket,id=libxl-cmd,path=/var/run/xen/qmp-libxl-1,server,nowait -no-shutdown -mon chardev=libxl-cmd,mode=control -chardev socket,id=libxenstat-cmd,path=/var/run/xen/qmp-libxenstat-1,server,nowait -mon chardev=libxenstat-cmd,mode=control -nodefaults -xen-attach -name debian.guest.osstest -vnc none -display none -nographic -machine xenpv -m 512

So I don't think it has crashed, it's even successfully spawned a qemu
it seems.

Comparing the libxl-driver.log here with the amd64 case:

libxl: debug: libxl_event.c:537:watchfd_callback: watch w=0x7ff4d70595e0 wpath=/local/domain/0/device-model/1/state token=3/0: event epath=/local/domain/0/device-model/1/state

[arm stops here, amd64 continues with the remainder]

libxl: debug: libxl_aoutils.c:87:xswait_timeout_callback: domain 1 device model startup: xswait timeout (path=/local/domain/0/device-model/1/state)
libxl: debug: libxl_event.c:638:libxl__ev_xswatch_deregister: watch w=0x7ff4d70595e0 wpath=/local/domain/0/device-model/1/state token=3/0: deregister slotnum=3
libxl: error: libxl_exec.c:393:spawn_watch_event: domain 1 device model: startup timed out
libxl: debug: libxl_event.c:652:libxl__ev_xswatch_deregister: watch w=0x7ff4d70595e0: deregister unregistered
libxl: debug: libxl_event.c:652:libxl__ev_xswatch_deregister: watch w=0x7ff4d70595e0: deregister unregistered
libxl: error: libxl_dm.c:1565:device_model_spawn_outcome: domain 1 device model: spawn failed (rc=-3)
libxl: error: libxl_create.c:1362:domcreate_devmodel_started: device model did not start: -3
libxl: debug: libxl_dm.c:1678:kill_device_model: Device Model signaled
libxl: debug: libxl_event.c:652:libxl__ev_xswatch_deregister: watch w=0x7ff4d702f3c0: deregister unregistered
libxl: debug: libxl_event.c:652:libxl__ev_xswatch_deregister: watch w=0x7ff4d7031290: deregister unregistered
libxl: debug: libxl.c:1701:devices_destroy_cb: forked pid 18588 for destroy of domain 1
libxl: debug: libxl_event.c:1768:libxl__ao_complete: ao 0x7ff4d702ed60: complete, rc=-3
libxl: debug: libxl_event.c:1740:libxl__ao__destroy: ao 0x7ff4d702ed60: destroy

I wonder if we are somehow loosing an event or getting the event loop screwed up.

Perhaps in the amd64 case we are somehow losing the xenstore watch, in
the armhf case we are losing some other fd which interferes with
libvirt's own event loop?

So I think we are looking at either a hang or an event processing SNAFU
rather than a crash.

BTW, in the above there is "Device Model signaled", which indicates that
kill(pid, SIGHUP) returned 0 and not e.g. ESRCH (when it would log
"Device Model already exited") or anything else (when it would log
"failed to kill..."). So the qemu process was actually present.

The host is doing nothing other than running this one test case, so it
doesn't seem likely that we are really hitting the 30s qemu startup
timeout.

Ian.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [libvirt test] 55257: regressions - FAIL
  2015-05-14 21:21           ` Jim Fehlig
  2015-05-14 21:31             ` Jim Fehlig
  2015-05-15  8:44             ` Ian Campbell
@ 2015-05-15 10:39             ` Anthony PERARD
  2015-05-15 11:54               ` Ian Campbell
  2 siblings, 1 reply; 15+ messages in thread
From: Anthony PERARD @ 2015-05-15 10:39 UTC (permalink / raw)
  To: Jim Fehlig; +Cc: xen-devel, ian.jackson, Ian Campbell

On Thu, May 14, 2015 at 03:21:41PM -0600, Jim Fehlig wrote:
> More hint that libvirtd crashed.  Have there been any attempts to
> reproduce this outside of the test rig?  Or capture a core dump?

Here are two from the OpenStack CI loop:
http://logs.openstack.xenproject.org/10/181110/5/check/dsvm-tempest-xen/6005c68
http://logs.openstack.xenproject.org/21/183221/2/check/dsvm-tempest-xen/56324b0

in logs/libvirt/libxl/libxl-driver.txt.gz, you will find:
libxl: error: libxl_exec.c:396:spawn_timeout: domain 108 device model: startup timed out
libxl: error: libxl_dm.c:1388:device_model_spawn_outcome: domain 108 device model: spawn failed (rc=-3)
libxl: error: libxl_create.c:1186:domcreate_devmodel_started: device model did not start: -3

Weird, it's the same domain number for both logs :).

Other usefull logs from openstack can be found in logs/screen-n-cpu.txt.gz,
which is the service that talk to libvirtd.

It's running libvirt 1.2.14 with:
    f86ae40 libxl: Move job acquisition in libxlDomainStart to callers
    894d2ff libxl: acquire a job when destroying a domain
    6dfec1e libxl: drop virDomainObj lock when destroying a domain
and xen 4.4.1 with:
    9369988 libxl: event handling: Break out ao_work_outstanding
    f1335f0 libxl: event handling: ao_inprogress does waits while reports outstanding
    4783c99 libxl: In domain death search, start search at first domid we want
    188e9c5 libxl: Domain destroy: fork
http://wiki.xenproject.org/wiki/OpenStack_CI_Loop_for_Xen-Libvirt#Baseline

No libvirtd carsh.

-- 
Anthony PERARD

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [libvirt test] 55257: regressions - FAIL
  2015-05-15 10:39             ` Anthony PERARD
@ 2015-05-15 11:54               ` Ian Campbell
  2015-05-15 15:33                 ` Anthony PERARD
  0 siblings, 1 reply; 15+ messages in thread
From: Ian Campbell @ 2015-05-15 11:54 UTC (permalink / raw)
  To: Anthony PERARD; +Cc: Jim Fehlig, xen-devel, ian.jackson

On Fri, 2015-05-15 at 11:39 +0100, Anthony PERARD wrote:
> On Thu, May 14, 2015 at 03:21:41PM -0600, Jim Fehlig wrote:
> > More hint that libvirtd crashed.  Have there been any attempts to
> > reproduce this outside of the test rig?  Or capture a core dump?
> 
> Here are two from the OpenStack CI loop:
> http://logs.openstack.xenproject.org/10/181110/5/check/dsvm-tempest-xen/6005c68
> http://logs.openstack.xenproject.org/21/183221/2/check/dsvm-tempest-xen/56324b0
> 
> in logs/libvirt/libxl/libxl-driver.txt.gz, you will find:
> libxl: error: libxl_exec.c:396:spawn_timeout: domain 108 device model: startup timed out
> libxl: error: libxl_dm.c:1388:device_model_spawn_outcome: domain 108 device model: spawn failed (rc=-3)
> libxl: error: libxl_create.c:1186:domcreate_devmodel_started: device model did not start: -3
> 
> Weird, it's the same domain number for both logs :).
> 
> Other usefull logs from openstack can be found in logs/screen-n-cpu.txt.gz,
> which is the service that talk to libvirtd.
> 
> It's running libvirt 1.2.14 with:
>     f86ae40 libxl: Move job acquisition in libxlDomainStart to callers
>     894d2ff libxl: acquire a job when destroying a domain
>     6dfec1e libxl: drop virDomainObj lock when destroying a domain
> and xen 4.4.1 with:
>     9369988 libxl: event handling: Break out ao_work_outstanding
>     f1335f0 libxl: event handling: ao_inprogress does waits while reports outstanding
>     4783c99 libxl: In domain death search, start search at first domid we want
>     188e9c5 libxl: Domain destroy: fork
> http://wiki.xenproject.org/wiki/OpenStack_CI_Loop_for_Xen-Libvirt#Baseline

Interesting.

We didn't used to see these issues, but there has been a rather large
gap where we didn't get useful results due to upheaval from the colo
move and there were other issues (e.g. the crashing issue) which make it
hard to pinpoint a point in time where this didn't happen.

Did you have a previous baseline which didn't exhibit these problems? Or
did it exhibit enough other problems not to be usable?

If we can find some plausible sounding baseline to try (i.e. commit id,
not a commit id + patch queue) then I could try and run some adhoc tests
to establish a baseline.

Perhaps I should try xen.git#stable-4.5 and libvirt.git#1.2.14 in the
first instance? Or I could pick a xen-unstable flight pass from, say,
Easter-ish and try with that?

This seems to be an intermittent bug, so it's not clear that the
bisector is going to be all that useful. However we do do multiple
domain starts now so perhaps the chances of sneaking past are reduced.

Ian.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [libvirt test] 55257: regressions - FAIL
  2015-05-15 11:54               ` Ian Campbell
@ 2015-05-15 15:33                 ` Anthony PERARD
  0 siblings, 0 replies; 15+ messages in thread
From: Anthony PERARD @ 2015-05-15 15:33 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Jim Fehlig, xen-devel, ian.jackson

On Fri, May 15, 2015 at 12:54:32PM +0100, Ian Campbell wrote:
> On Fri, 2015-05-15 at 11:39 +0100, Anthony PERARD wrote:
> > On Thu, May 14, 2015 at 03:21:41PM -0600, Jim Fehlig wrote:
> > > More hint that libvirtd crashed.  Have there been any attempts to
> > > reproduce this outside of the test rig?  Or capture a core dump?
> > 
> > Here are two from the OpenStack CI loop:
> > http://logs.openstack.xenproject.org/10/181110/5/check/dsvm-tempest-xen/6005c68
> > http://logs.openstack.xenproject.org/21/183221/2/check/dsvm-tempest-xen/56324b0
> > 
> > in logs/libvirt/libxl/libxl-driver.txt.gz, you will find:
> > libxl: error: libxl_exec.c:396:spawn_timeout: domain 108 device model: startup timed out
> > libxl: error: libxl_dm.c:1388:device_model_spawn_outcome: domain 108 device model: spawn failed (rc=-3)
> > libxl: error: libxl_create.c:1186:domcreate_devmodel_started: device model did not start: -3
> > 
> > Weird, it's the same domain number for both logs :).
> > 
> > Other usefull logs from openstack can be found in logs/screen-n-cpu.txt.gz,
> > which is the service that talk to libvirtd.
> > 
> > It's running libvirt 1.2.14 with:
> >     f86ae40 libxl: Move job acquisition in libxlDomainStart to callers
> >     894d2ff libxl: acquire a job when destroying a domain
> >     6dfec1e libxl: drop virDomainObj lock when destroying a domain
> > and xen 4.4.1 with:
> >     9369988 libxl: event handling: Break out ao_work_outstanding
> >     f1335f0 libxl: event handling: ao_inprogress does waits while reports outstanding
> >     4783c99 libxl: In domain death search, start search at first domid we want
> >     188e9c5 libxl: Domain destroy: fork
> > http://wiki.xenproject.org/wiki/OpenStack_CI_Loop_for_Xen-Libvirt#Baseline
> 
> Interesting.
> 
> We didn't used to see these issues, but there has been a rather large
> gap where we didn't get useful results due to upheaval from the colo
> move and there were other issues (e.g. the crashing issue) which make it
> hard to pinpoint a point in time where this didn't happen.
> 
> Did you have a previous baseline which didn't exhibit these problems? Or
> did it exhibit enough other problems not to be usable?

This is the first baseline to be usefull with the CI loop, previous libvirt
release had other issues.
So in short, no.

-- 
Anthony PERARD

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2015-05-15 15:33 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-11 12:46 [libvirt test] 55257: regressions - FAIL osstest service user
2015-05-11 13:22 ` Ian Campbell
2015-05-11 16:36   ` Jim Fehlig
2015-05-11 17:02     ` Ian Campbell
2015-05-13  8:46     ` Ian Campbell
2015-05-13 17:46       ` Anthony PERARD
2015-05-14 10:47         ` Ian Campbell
2015-05-14 11:07           ` Anthony PERARD
2015-05-14 21:27             ` Jim Fehlig
2015-05-14 21:21           ` Jim Fehlig
2015-05-14 21:31             ` Jim Fehlig
2015-05-15  8:44             ` Ian Campbell
2015-05-15 10:39             ` Anthony PERARD
2015-05-15 11:54               ` Ian Campbell
2015-05-15 15:33                 ` Anthony PERARD

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.