* Re: [osstest test] 60719: tolerable FAIL - PUSHED
       [not found] <osstest-60719-mainreport@xen.org>
@ 2015-08-21  8:05 ` Ian Campbell
  2015-08-21 14:02   ` Wei Liu
  2015-08-27  3:33   ` Jim Fehlig
  0 siblings, 2 replies; 18+ messages in thread
From: Ian Campbell @ 2015-08-21  8:05 UTC (permalink / raw)
  To: Ian Jackson, Wei Liu, Jim Fehlig; +Cc: xen-devel

On Wed, 2015-08-19 at 00:18 +0000, osstest service owner wrote:
> flight 60719 osstest real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/60719/
> 
> Failures :-/ but no regressions.
> 
> Tests which did not succeed, but are not blocking:
> [...]
>  test-amd64-i386-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass
>  test-amd64-amd64-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass

All of the pending changes are now in production, and the libvirt migration
test is now failing in the production colo with:

2015-08-18 19:07:36 Z executing ssh ... root@172.16.144.34 virsh migrate --live debian.guest.osstest xen+ssh://pinot1
error: unable to connect to 'pinot1.test-lab.xenproject.org:49152': Invalid argument

From the _controller_ pinot1.test-lab.xenproject.org is valid:

    ianc@osstest:~$ ping -c 1 pinot1.test-lab.xenproject.org
    PING pinot1.test-lab.xenproject.org (172.16.144.35) 56(84) bytes of data.
    64 bytes from 172.16.144.35: icmp_req=1 ttl=64 time=0.258 ms

    --- pinot1.test-lab.xenproject.org ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 0.258/0.258/0.258/0.000 ms

Maybe the test boxes are seeing a different view of DNS, but I doubt
it. Also I note that the failure is "Invalid argument" and not "Unknown
host".

Anyone got any idea what is going on?

Ian.

* Re: [osstest test] 60719: tolerable FAIL - PUSHED
  2015-08-21  8:05 ` [osstest test] 60719: tolerable FAIL - PUSHED Ian Campbell
@ 2015-08-21 14:02   ` Wei Liu
  2015-08-22  7:25     ` Ian Campbell
  2015-08-27  3:33   ` Jim Fehlig
  1 sibling, 1 reply; 18+ messages in thread
From: Wei Liu @ 2015-08-21 14:02 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Wei Liu, Jim Fehlig, Ian Jackson, xen-devel

On Fri, Aug 21, 2015 at 09:05:30AM +0100, Ian Campbell wrote:
> On Wed, 2015-08-19 at 00:18 +0000, osstest service owner wrote:
> > flight 60719 osstest real [real]
> > http://logs.test-lab.xenproject.org/osstest/logs/60719/
> > 
> > Failures :-/ but no regressions.
> > 
> > Tests which did not succeed, but are not blocking:
> > [...]
> >  test-amd64-i386-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass
> >  test-amd64-amd64-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass
> 
> All of the pending changes are now in production, the libvirt migration
> test is now failing in the production colo with:
> 
> 2015-08-18 19:07:36 Z executing ssh ... root@172.16.144.34 virsh migrate --live debian.guest.osstest xen+ssh://pinot1
> error: unable to connect to 'pinot1.test-lab.xenproject.org:49152': Invalid argument
> 
> From the _controller_ pinot1.test-lab.xenproject.org is valid:
> 
>     ianc@osstest:~$ ping -c 1 pinot1.test-lab.xenproject.org
>     PING pinot1.test-lab.xenproject.org (172.16.144.35) 56(84) bytes of data.
>     64 bytes from 172.16.144.35: icmp_req=1 ttl=64 time=0.258 ms
> 
>     --- pinot1.test-lab.xenproject.org ping statistics ---
>     1 packets transmitted, 1 received, 0% packet loss, time 0ms
>     rtt min/avg/max/mdev = 0.258/0.258/0.258/0.000 ms
> 
> Maybe the test boxes are seeing a different view of DNS, but I doubt
> it. Also I note that the failure is "Invalid argument" and not "Unknown
> host".
> 
> Anyone got any idea what is going on?
> 

I notice that libvirtd is not configured to listen for TCP connections,
while TLS connections are enabled by default.

# This is disabled by default, uncomment this to enable it.
#listen_tcp = 1


# This is enabled by default, uncomment this to disable it
#listen_tls = 0

I'm not sure if xen+ssh:// requires enabling listen_tcp.

Wei.



> Ian.

* Re: [osstest test] 60719: tolerable FAIL - PUSHED
  2015-08-21 14:02   ` Wei Liu
@ 2015-08-22  7:25     ` Ian Campbell
  0 siblings, 0 replies; 18+ messages in thread
From: Ian Campbell @ 2015-08-22  7:25 UTC (permalink / raw)
  To: Wei Liu; +Cc: Jim Fehlig, Ian Jackson, xen-devel

On Fri, 2015-08-21 at 15:02 +0100, Wei Liu wrote:
> On Fri, Aug 21, 2015 at 09:05:30AM +0100, Ian Campbell wrote:
> > On Wed, 2015-08-19 at 00:18 +0000, osstest service owner wrote:
> > > flight 60719 osstest real [real]
> > > http://logs.test-lab.xenproject.org/osstest/logs/60719/
> > > 
> > > Failures :-/ but no regressions.
> > > 
> > > Tests which did not succeed, but are not blocking:
> > > [...]
> > >  test-amd64-i386-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass
> > >  test-amd64-amd64-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass
> > 
> > All of the pending changes are now in production, the libvirt migration
> > test is now failing in the production colo with:
> > 
> > 2015-08-18 19:07:36 Z executing ssh ... root@172.16.144.34 virsh migrate --live debian.guest.osstest xen+ssh://pinot1
> > error: unable to connect to 'pinot1.test-lab.xenproject.org:49152': Invalid argument
> > 
> > From the _controller_ pinot1.test-lab.xenproject.org is valid:
> > 
> >     ianc@osstest:~$ ping -c 1 pinot1.test-lab.xenproject.org
> >     PING pinot1.test-lab.xenproject.org (172.16.144.35) 56(84) bytes of data.
> >     64 bytes from 172.16.144.35: icmp_req=1 ttl=64 time=0.258 ms
> > 
> >     --- pinot1.test-lab.xenproject.org ping statistics ---
> >     1 packets transmitted, 1 received, 0% packet loss, time 0ms
> >     rtt min/avg/max/mdev = 0.258/0.258/0.258/0.000 ms
> > 
> > Maybe the test boxes are seeing a different view of DNS, but I doubt
> > it. Also I note that the failure is "Invalid argument" and not "Unknown
> > host".
> > 
> > Anyone got any idea what is going on?
> > 
> 
> I notice that libvirtd is not configured to listen to tcp connection,
> while tls connection is enabled by default.
> 
> # This is disabled by default, uncomment this to enable it.
> #listen_tcp = 1
> 
> 
> # This is enabled by default, uncomment this to disable it
> #listen_tls = 0
> 
> I'm not sure if xen+ssh:// requires enabling listen_tcp.

Whether or not it does, the failure here is Invalid Argument, not
Connection Refused, so I don't think we are getting as far as that
particular failure.

Ian.

* Re: [osstest test] 60719: tolerable FAIL - PUSHED
  2015-08-21  8:05 ` [osstest test] 60719: tolerable FAIL - PUSHED Ian Campbell
  2015-08-21 14:02   ` Wei Liu
@ 2015-08-27  3:33   ` Jim Fehlig
  2015-09-01 12:47     ` Ian Jackson
  1 sibling, 1 reply; 18+ messages in thread
From: Jim Fehlig @ 2015-08-27  3:33 UTC (permalink / raw)
  To: Ian Campbell, Ian Jackson, Wei Liu; +Cc: xen-devel

On 08/21/2015 02:05 AM, Ian Campbell wrote:
> On Wed, 2015-08-19 at 00:18 +0000, osstest service owner wrote:
>> flight 60719 osstest real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/60719/
>>
>> Failures :-/ but no regressions.
>>
>> Tests which did not succeed, but are not blocking:
>> [...]
>>   test-amd64-i386-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass
>>   test-amd64-amd64-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass
> All of the pending changes are now in production, the libvirt migration
> test is now failing in the production colo with:
>
> 2015-08-18 19:07:36 Z executing ssh ... root@172.16.144.34 virsh migrate --live debian.guest.osstest xen+ssh://pinot1
> error: unable to connect to 'pinot1.test-lab.xenproject.org:49152': Invalid argument

This sounds a bit like an issue discussed in the Red Hat libvirt troubleshooting FAQ:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Virtualization_Deployment_and_Administration_Guide/sect-Troubleshooting-Common_libvirt_errors_and_troubleshooting.html#sect-Migration_fails_with_Error_unable_to_resolve_address

>
>  From the _controller_ pinot1.test-lab.xenproject.org is valid:
>
>      ianc@osstest:~$ ping -c 1 pinot1.test-lab.xenproject.org
>      PING pinot1.test-lab.xenproject.org (172.16.144.35) 56(84) bytes of data.
>      64 bytes from 172.16.144.35: icmp_req=1 ttl=64 time=0.258 ms
>
>      --- pinot1.test-lab.xenproject.org ping statistics ---
>      1 packets transmitted, 1 received, 0% packet loss, time 0ms
>      rtt min/avg/max/mdev = 0.258/0.258/0.258/0.000 ms
>
> Maybe the test boxes are seeing a different view of DNS, but I doubt
> it. Also I note that the failure is "Invalid argument" and not "Unknown
> host".

Right. If it is a DNS issue, error handling in the libvirt libxl migration code 
needs improving.

Regards,
Jim

* Re: [osstest test] 60719: tolerable FAIL - PUSHED
  2015-08-27  3:33   ` Jim Fehlig
@ 2015-09-01 12:47     ` Ian Jackson
  2015-09-01 13:14       ` Ian Campbell
  2015-09-04  2:47       ` Jim Fehlig
  0 siblings, 2 replies; 18+ messages in thread
From: Ian Jackson @ 2015-09-01 12:47 UTC (permalink / raw)
  To: Jim Fehlig; +Cc: Wei Liu, Ian Campbell, xen-devel

Jim Fehlig writes ("Re: [osstest test] 60719: tolerable FAIL - PUSHED"):
> This sounds a bit like an issue discussed in the Redhat libvirt troubleshooting FAQ
> 
> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Virtualization_Deployment_and_Administration_Guide/sect-Troubleshooting-Common_libvirt_errors_and_troubleshooting.html#sect-Migration_fails_with_Error_unable_to_resolve_address

> Right. If it is a DNS issue, error handling in the libvirt libxl
> migration code needs improving.

I booked out a test host, and (as I expected) forward DNS works, but
reverse DNS on test box IP addresses does not:

root@nocera0:~# host nocera1.test-lab.xenproject.org
nocera1.test-lab.xenproject.org has address 172.16.144.23
root@nocera0:~# host -i 172.16.144.23
Host 23.144.16.172.in-addr.arpa. not found: 3(NXDOMAIN)
root@nocera0:~# cat /etc/hosts
127.0.0.1       localhost
127.0.1.1       nocera0.test-lab.xenproject.org nocera0

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
root@nocera0:~# cat /etc/resolv.conf
domain test-lab.xenproject.org
search test-lab.xenproject.org
nameserver 172.16.148.4
nameserver 172.16.144.3
root@nocera0:~#

That admin guide article isn't quite clear, but reading between the
lines and applying some supposition, maybe libvirt is doing a reverse
lookup on some associated IP address ?
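
For reference, the name that `host -i` queried above is simply the address with its octets reversed under in-addr.arpa. A minimal Python sketch using only the standard library follows; the helper name `ptr_name` is made up for illustration and is not part of osstest or libvirt. It builds that query name so the same reverse lookup can be repeated against any resolver:

```python
import ipaddress


def ptr_name(addr: str) -> str:
    """Return the reverse-DNS (PTR) query name for an IPv4 or IPv6 address."""
    return ipaddress.ip_address(addr).reverse_pointer


if __name__ == "__main__":
    # Address taken from the nocera1 transcript above.
    print(ptr_name("172.16.144.23"))
```

This only constructs the name; whether a PTR record exists for it is still up to the nameservers listed in /etc/resolv.conf.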

I can probably put the test boxes in the reverse DNS, but really I
think at the very least libvirt's error message needs to be improved
too.

Ian.

* Re: [osstest test] 60719: tolerable FAIL - PUSHED
  2015-09-01 12:47     ` Ian Jackson
@ 2015-09-01 13:14       ` Ian Campbell
  2015-09-03  6:38         ` Jim Fehlig
  2015-09-04  2:47       ` Jim Fehlig
  1 sibling, 1 reply; 18+ messages in thread
From: Ian Campbell @ 2015-09-01 13:14 UTC (permalink / raw)
  To: Ian Jackson, Jim Fehlig; +Cc: Wei Liu, xen-devel

On Tue, 2015-09-01 at 13:47 +0100, Ian Jackson wrote:
> Jim Fehlig writes ("Re: [osstest test] 60719: tolerable FAIL - PUSHED"):
> > This sounds a bit like an issue discussed in the Redhat libvirt 
> > troubleshooting FAQ
> > 
> > https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Virtualization_Deployment_and_Administration_Guide/sect-Troubleshooting-Common_libvirt_errors_and_troubleshooting.html#sect-Migration_fails_with_Error_unable_to_resolve_address
> 
> > Right. If it is a DNS issue, error handling in the libvirt libxl
> > migration code needs improving.
> 
> I booked out a test host, and (as I expected) forward DNS works, but
> reverse DNS on test box IP addresses does not:

As discussed IRL I was also investigating this using the Cambridge
instance, which does have correct reverse DNS:

    root@moss-bug:~# host moss-bug.xs.citrite.net
    moss-bug.xs.citrite.net has address 10.80.229.144
    root@moss-bug:~# host -i 10.80.229.144
    144.229.80.10.in-addr.arpa domain name pointer moss-bug.xs.citrite.net.
    root@moss-bug:~# domainname -f
    moss-bug.xs.citrite.net
    root@moss-bug:~# cat /etc/hosts
    127.0.0.1       localhost
    127.0.1.1       moss-bug.xs.citrite.net moss-bug

    # The following lines are desirable for IPv6 capable hosts
    ::1     localhost ip6-localhost ip6-loopback
    ff02::1 ip6-allnodes
    ff02::2 ip6-allrouters
    root@moss-bug:~#

(Previously these machines had a wrong idea about their own FQDN; that is
now fixed.)

I am now seeing the same error as the production instance:

2015-09-01 12:10:00 Z executing ssh ... root@10.80.229.144 virsh --debug 0 migrate --live debian.guest.osstest xen+ssh://10.80.228.77
migrate: live(bool): (none)
migrate: domain(optdata): debian.guest.osstest
migrate: desturi(optdata): xen+ssh://10.80.228.77
migrate: found option <domain>: debian.guest.osstest
migrate: <domain> trying as domain NAME
migrate: found option <domain>: debian.guest.osstest
migrate: <domain> trying as domain NAME
error: unable to connect to 'lace-bug.xs.citrite.net:49152': Invalid argument

> That admin guide article isn't quite clear, but reading between the
> lines and applying some supposition, maybe libvirt is doing a reverse
> lookup on some associated IP address ?

Based on the above that doesn't seem to be the case. So...

> I can probably put the test boxes in the reverse DNS,

... this is probably not needed.

>  but really I
> think at the very least libvirt's error message needs to be improved
> too.

Ian.

* Re: [osstest test] 60719: tolerable FAIL - PUSHED
  2015-09-01 13:14       ` Ian Campbell
@ 2015-09-03  6:38         ` Jim Fehlig
  2015-09-03 10:26           ` Ian Campbell
  0 siblings, 1 reply; 18+ messages in thread
From: Jim Fehlig @ 2015-09-03  6:38 UTC (permalink / raw)
  To: Ian Campbell, Ian Jackson; +Cc: Wei Liu, xen-devel

On 09/01/2015 07:14 AM, Ian Campbell wrote:
> On Tue, 2015-09-01 at 13:47 +0100, Ian Jackson wrote:
>> Jim Fehlig writes ("Re: [osstest test] 60719: tolerable FAIL - PUSHED"):
>>> This sounds a bit like an issue discussed in the Redhat libvirt
>>> troubleshooting FAQ
>>>
>>> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Virtualization_Deployment_and_Administration_Guide/sect-Troubleshooting-Common_libvirt_errors_and_troubleshooting.html#sect-Migration_fails_with_Error_unable_to_resolve_address
>>> Right. If it is a DNS issue, error handling in the libvirt libxl
>>> migration code needs improving.
>> I booked out a test host, and (as I expected) forward DNS works, but
>> reverse DNS on test box IP addresses does not:
> As discussed IRL I was also investigating this using the Cambridge
> instance, which does have correct reverse DNS:
>
>      root@moss-bug:~# host moss-bug.xs.citrite.net
>      moss-bug.xs.citrite.net has address 10.80.229.144
>      root@moss-bug:~# host -i 10.80.229.144
>      144.229.80.10.in-addr.arpa domain name pointer moss-bug.xs.citrite.net.
>      root@moss-bug:~# domainname -f
>      moss-bug.xs.citrite.net
>      root@moss-bug:~# cat /etc/hosts
>      127.0.0.1       localhost
>      127.0.1.1       moss-bug.xs.citrite.net moss-bug
>
>      # The following lines are desirable for IPv6 capable hosts
>      ::1     localhost ip6-localhost ip6-loopback
>      ff02::1 ip6-allnodes
>      ff02::2 ip6-allrouters
>      root@moss-bug:~#
>
> (previously these machines had a wrong idea about their own FQDN, that is
> now fixed)
>
> I am now seeing the same error as the production instance:
>
> 2015-09-01 12:10:00 Z executing ssh ... root@10.80.229.144 virsh --debug 0 migrate --live debian.guest.osstest xen+ssh://10.80.228.77
> migrate: live(bool): (none)
> migrate: domain(optdata): debian.guest.osstest
> migrate: desturi(optdata): xen+ssh://10.80.228.77
> migrate: found option <domain>: debian.guest.osstest
> migrate: <domain> trying as domain NAME
> migrate: found option <domain>: debian.guest.osstest
> migrate: <domain> trying as domain NAME
> error: unable to connect to 'lace-bug.xs.citrite.net:49152': Invalid argument

AFAICT, this error means the source libvirtd cannot open a tcp connection to the 
destination libvirtd during the 'perform' phase of migration. In the preceding 
'prepare' phase, the destination libvirtd opened a socket to listen for the 
incoming migration, and passed the connection details back to the source 
libvirtd. The connection details (hostname:port) are generated on the 
destination libvirtd with

virGetHostname():virPortAllocatorAcquire()

virPortAllocatorAcquire() grabs the next available port in a range of ports. 
virGetHostname() attempts to get the FQDN of the host:

http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/util/virutil.c;h=cddc78a700c12a4f786a1f6544b92b8ee19c85f5;hb=HEAD#l632

Seems the source libvirtd cannot connect to the hostname:port created by the 
destination libvirtd.
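
To check what such a hostname:port pair would resolve to on a given host, something like the following Python sketch could be run on both source and destination. It is not libvirt code: `socket.getfqdn()` is only assumed to be a rough stand-in for virGetHostname(), and the function names `migration_endpoint` and `addresses_for` are hypothetical.

```python
import socket


def migration_endpoint(port, hostname=None):
    """Build a 'hostname:port' string, defaulting to this host's own FQDN."""
    host = hostname or socket.getfqdn()
    return f"{host}:{port}"


def addresses_for(name, resolver=socket.getaddrinfo):
    """Resolve name to a sorted list of unique address strings.

    Returns an empty list if the lookup fails. The resolver argument exists
    so the lookup can be replaced in tests.
    """
    try:
        infos = resolver(name, None)
    except OSError:  # socket.gaierror is a subclass of OSError
        return []
    # sockaddr is info[4]; its first element is the address, possibly with
    # an IPv6 scope suffix such as "%eth0", which is stripped here.
    return sorted({info[4][0].split("%")[0] for info in infos})


if __name__ == "__main__":
    fqdn = socket.getfqdn()
    print(migration_endpoint(49152), "->", addresses_for(fqdn))
```

Comparing the output on the destination with the output for the same name on the source shows whether both sides agree on where that name points.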

Regards,
Jim

* Re: [osstest test] 60719: tolerable FAIL - PUSHED
  2015-09-03  6:38         ` Jim Fehlig
@ 2015-09-03 10:26           ` Ian Campbell
  2015-09-03 10:49             ` Ian Jackson
  2015-09-03 11:37             ` Ian Campbell
  0 siblings, 2 replies; 18+ messages in thread
From: Ian Campbell @ 2015-09-03 10:26 UTC (permalink / raw)
  To: Jim Fehlig, Ian Jackson; +Cc: Wei Liu, xen-devel

On Thu, 2015-09-03 at 00:38 -0600, Jim Fehlig wrote:
> AFAICT, this error means the source libvirtd cannot open a tcp connection to the 
> destination libvirtd during the 'perform' phase of migration. In the preceding 
> 'prepare' phase, the destination libvirtd opened a socket to listen for the 
> incoming migration, and passed the connection details back to the source 
> libvirtd. The connection details (hostname:port) are generated on the 
> destination libvirtd with
> 
> virGetHostname():virPortAllocatorAcquire()
> 
> virPortAllocatorAcquire() grabs the next available port in a range of ports. 
> virGetHostname() attempts to get the FQDN of the host
> 
> http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/util/virutil.c;h=cddc78a700c12a4f786a1f6544b92b8ee19c85f5;hb=HEAD#l632
>
> Seems the source libvirtd cannot connect to the hostname:port created by the 
> destination libvirtd.

Indeed. I've now got two boxes set up to do this and in the libvirtd.log of
the source host I see:

2015-09-03 10:03:56.154+0000: 3440: error : virNetSocketNewConnectTCP:578 : unable to connect to server at 'lace-bug.xs.citrite.net:49154': Connection refused
2015-09-03 10:03:56.154+0000: 3440: error : libxlDomainMigrationPerform:501 : unable to connect to 'lace-bug.xs.citrite.net:49154': Invalid argument

It seems like libxlDomainMigrationPerform is clobbering the errno from
virNetSocketNewConnectTCP. I sent a patch for that:

http://lists.xen.org/archives/html/xen-devel/2015-09/msg00320.html
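
The general pattern behind this kind of bug can be shown with a toy model. The Python sketch below is neither the libvirt code nor the patch linked above; it only imitates C's errno as a single slot that any later call may overwrite. Which call actually overwrote errno in libxlDomainMigrationPerform is not stated here, so `cleanup` is a purely hypothetical placeholder, as are all the other names.

```python
import errno
import os


class FakeThread:
    """Toy stand-in for a C thread: one 'errno' slot that any call may overwrite."""

    def __init__(self):
        self.errno = 0


def connect_tcp(t):
    """Pretend connect(2) failed with ECONNREFUSED."""
    t.errno = errno.ECONNREFUSED
    return -1


def cleanup(t):
    """Hypothetical intervening call that leaves EINVAL behind in errno."""
    t.errno = errno.EINVAL


def report_clobbered(t):
    """Report the failure after cleanup, reading errno too late."""
    if connect_tcp(t) < 0:
        cleanup(t)                    # overwrites errno
        return os.strerror(t.errno)   # reports the cleanup's error, not the cause
    return "ok"


def report_saved(t):
    """Save errno immediately after the failing call, then clean up."""
    if connect_tcp(t) < 0:
        saved = t.errno               # capture the real cause first
        cleanup(t)
        return os.strerror(saved)
    return "ok"


if __name__ == "__main__":
    print("clobbered:", report_clobbered(FakeThread()))
    print("saved:    ", report_saved(FakeThread()))
```

The first variant reports the message for EINVAL and the second the message for ECONNREFUSED, which mirrors the two differing log lines above.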

Looking further at the test failure on the destination host I see:

2015-09-03 10:03:56.133+0000: 3463: info : virNetSocketNew:277 : RPC_SOCKET_NEW: sock=0x7fbb768807a0 fd=28 errfd=-1 pid=0 localAddr=127.0.1.1;49154, remoteAddr=<null>

Notice that it has bound to 127.0.1.1 and not to 10.80.228.77!

I suspect this is down to:

    root@lace-bug:/etc/libvirt# cat /etc/hosts
    127.0.0.1       localhost
    127.0.1.1       lace-bug.xs.citrite.net lace-bug

    # The following lines are desirable for IPv6 capable hosts
    ::1     localhost ip6-localhost ip6-loopback
    ff02::1 ip6-allnodes
    ff02::2 ip6-allrouters

And in particular the line associating 127.0.1.1 with
lace-bug.xs.citrite.net.

This seems to be a Debian thing, possibly the installer; I'm not sure.

https://lists.debian.org/debian-devel/2013/07/msg00809.html looks relevant.

Overall I'm not sure what to do here. The Debian config seems a bit odd,
but I'm not sure if it is actually "wrong". OTOH I'm not sure how libvirt
could be changed to work in this scenario.
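
A quick way to spot this situation on a host is to scan its hosts file for names other than localhost that are mapped to a loopback address. The Python sketch below does that on hosts-file text; the function names are made up for illustration, and the sample text is a normalised copy of the /etc/hosts shown above.

```python
import ipaddress


def hosts_entries(text):
    """Yield (address, [names]) for each non-comment line of hosts-file text."""
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        addr, *names = line.split()
        yield addr, names


def loopback_names(text, ignore=("localhost",)):
    """Return names mapped to a loopback address, skipping the names in
    `ignore` and the conventional ip6-* aliases."""
    found = []
    for addr, names in hosts_entries(text):
        if ipaddress.ip_address(addr).is_loopback:
            found.extend(
                n for n in names if n not in ignore and not n.startswith("ip6-")
            )
    return found


SAMPLE = """\
127.0.0.1       localhost
127.0.1.1       lace-bug.xs.citrite.net lace-bug

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
"""

if __name__ == "__main__":
    print(loopback_names(SAMPLE))
```

Any FQDN in the result is a name that local programs will resolve to loopback while other hosts resolve it through DNS.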

Ian.

* Re: [osstest test] 60719: tolerable FAIL - PUSHED
  2015-09-03 10:26           ` Ian Campbell
@ 2015-09-03 10:49             ` Ian Jackson
  2015-09-03 10:57               ` Ian Campbell
  2015-09-03 16:04               ` Ian Campbell
  2015-09-03 11:37             ` Ian Campbell
  1 sibling, 2 replies; 18+ messages in thread
From: Ian Jackson @ 2015-09-03 10:49 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Jim Fehlig, Wei Liu, xen-devel

Ian Campbell writes ("Re: [Xen-devel] [osstest test] 60719: tolerable FAIL - PUSHED"):
...
> I suspect this is down to:
> 
>     root@lace-bug:/etc/libvirt# cat /etc/hosts
>     127.0.0.1       localhost
>     127.0.1.1       lace-bug.xs.citrite.net lace-bug

This is simply wrong.  It means that when programs on the host try to
find the host's own IP address starting with its host name, they get
different (and wrong) answers to programs on other hosts.

I can see why D-I wants to do this but in our setup it is simply
entirely wrong.  Is there a way to suppress this (from preseed
maybe) ?

> Overall I'm not sure what to do here. The Debian config seems a bit odd,
> but I'm not sure if it is actually "wrong". OTOH I'm not sure how libvirt
> could be changed to work in this scenario.

It might be possible to work around this in libvirt, but this is by no
means libvirt's fault.

Ian.

* Re: [osstest test] 60719: tolerable FAIL - PUSHED
  2015-09-03 10:49             ` Ian Jackson
@ 2015-09-03 10:57               ` Ian Campbell
  2015-09-03 16:04               ` Ian Campbell
  1 sibling, 0 replies; 18+ messages in thread
From: Ian Campbell @ 2015-09-03 10:57 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Jim Fehlig, Wei Liu, xen-devel

On Thu, 2015-09-03 at 11:49 +0100, Ian Jackson wrote:
> Ian Campbell writes ("Re: [Xen-devel] [osstest test] 60719: tolerable 
> FAIL - PUSHED"):
> ...
> > I suspect this is down to:
> > 
> >     root@lace-bug:/etc/libvirt# cat /etc/hosts
> >     127.0.0.1       localhost
> >     127.0.1.1       lace-bug.xs.citrite.net lace-bug
> 
> This is simply wrong.  It means that when programs on the host try to
> find the host's own IP address starting with its host name, they get
> different (and wrong) answers to programs on other hosts.
> 
> I can see why D-I wants to do this but in our setup it is simply
> entirely wrong.  Is there a way to suppress this (from preseed
> maybe) ?

I'll see if I can find the code which generates this file...

> > Overall I'm not sure what to do here. The Debian config seems a bit 
> > odd,
> > but I'm not sure if it is actually "wrong". OTOH I'm not sure how 
> > libvirt
> > could be changed to work in this scenario.
> 
> It might be possible to work around this in libvirt, but this is by no
> means libvirt's fault.

Right.

* Re: [osstest test] 60719: tolerable FAIL - PUSHED
  2015-09-03 10:26           ` Ian Campbell
  2015-09-03 10:49             ` Ian Jackson
@ 2015-09-03 11:37             ` Ian Campbell
  2015-09-03 16:35               ` Jim Fehlig
  1 sibling, 1 reply; 18+ messages in thread
From: Ian Campbell @ 2015-09-03 11:37 UTC (permalink / raw)
  To: Jim Fehlig, Ian Jackson; +Cc: Wei Liu, xen-devel

On Thu, 2015-09-03 at 11:26 +0100, Ian Campbell wrote:
> 
> Notice that it has bound to 127.0.1.1 and not to 10.80.228.77!

So while I investigate how to make d-i not create these entries I also
removed the line from /etc/hosts such that looking up the FQDN gives the
non-local IP. But:


    root@moss-bug:/var/log# strace -o /tmp/virsh -fff virsh --debug 0 migrate --live debian.guest.osstest xen+ssh://10.80.228.77
    migrate: live(bool): (none)
    migrate: domain(optdata): debian.guest.osstest
    migrate: desturi(optdata): xen+ssh://10.80.228.77
    migrate: found option <domain>: debian.guest.osstest
    migrate: <domain> trying as domain NAME
    migrate: found option <domain>: debian.guest.osstest
    migrate: <domain> trying as domain NAME
    error: internal error: Failed to send migration data to destination host

The sender's libxl-driver.log says:

2015-09-03 12:29:45 BST libxl-save-helper: debug: starting save: Success
2015-09-03 12:29:45 BST xc: detail: fd 27, dom 3, max_iters 0, max_factor 0, flags 1, hvm 0
2015-09-03 12:29:45 BST xc: info: Saving domain 3, type x86 PV
2015-09-03 12:29:45 BST xc: detail: 64 bits, 4 levels
2015-09-03 12:29:45 BST xc: detail: max_pfn 0x1ffff, p2m_frames 256
2015-09-03 12:29:45 BST xc: detail: max_mfn 0x120000
2015-09-03 12:29:46 BST xc: error: Failed to write page data to stream (104 = Connection reset by peer): Internal error
2015-09-03 12:29:46 BST xc: error: Save failed (104 = Connection reset by peer): Internal error
2015-09-03 12:29:46 BST libxl-save-helper: debug: complete r=-1: Connection reset by peer
2015-09-03 12:29:46 BST libxl: error: libxl_stream_write.c:329:libxl__xc_domain_save_done: saving domain: domain did not respond to suspend request: Connection reset by peer
2015-09-03 12:29:46 BST libxl: debug: libxl_event.c:1874:libxl__ao_complete: ao 0x7f67b3f63e90: complete, rc=-8
2015-09-03 12:29:46 BST libxl: debug: libxl_event.c:1843:libxl__ao__destroy: ao 0x7f67b3f63e90: destroy
2015-09-03 12:29:46 BST libxl: debug: libxl.c:526:libxl_domain_resume: ao 0x7f67b3fa44b0: create: how=(nil) callback=(nil) poller=0x7f67a0002610
2015-09-03 12:29:46 BST xc: error: Dom 3 not suspended: (shutdown 0, reason 255): Internal error
2015-09-03 12:29:46 BST libxl: error: libxl_dom_suspend.c:409:libxl__domain_resume: xc_domain_resume failed for domain 3: Invalid argument
2015-09-03 12:29:46 BST libxl: debug: libxl_event.c:1874:libxl__ao_complete: ao 0x7f67b3fa44b0: complete, rc=-3
2015-09-03 12:29:46 BST libxl: debug: libxl.c:529:libxl_domain_resume: ao 0x7f67b3fa44b0: inprogress: poller=0x7f67a0002610, flags=ic
2015-09-03 12:29:46 BST libxl: debug: libxl_event.c:1843:libxl__ao__destroy: ao 0x7f67b3fa44b0: destroy

While the receiver has:

2015-09-03 12:29:45 BST libxl-save-helper: debug: starting restore: Success
2015-09-03 12:29:45 BST xc: detail: fd 31, dom 4, hvm 0, pae 0, superpages 0, checkpointed_stream 0
2015-09-03 12:29:45 BST xc: info: Found x86 PV domain from Xen 4.6
2015-09-03 12:29:45 BST xc: info: Restoring domain
2015-09-03 12:29:45 BST xc: detail: 64 bits, 4 levels
2015-09-03 12:29:45 BST xc: detail: max_mfn 0x120000
2015-09-03 12:29:45 BST xc: detail: Expanded p2m from 0 to 0x1ffff
2015-09-03 12:29:45 BST xc: error: Failed to read 4202504 bytes of data for record (0x00000001, Page data) (11 = Resource temporarily unavailabl): Internal error
2015-09-03 12:29:45 BST xc: error: Restore failed (11 = Resource temporarily unavailabl): Internal error
2015-09-03 12:29:45 BST libxl-save-helper: debug: complete r=-1: Resource temporarily unavailable
2015-09-03 12:29:45 BST libxl: error: libxl_stream_read.c:749:libxl__xc_domain_restore_done: restoring domain: Resource temporarily unavailable
2015-09-03 12:29:45 BST libxl: error: libxl_create.c:1141:domcreate_rebuild_done: cannot (re-)build domain: -3
2015-09-03 12:29:46 BST libxl: debug: libxl.c:1708:devices_destroy_cb: forked pid 18738 for destroy of domain 4
2015-09-03 12:29:46 BST libxl: debug: libxl_event.c:1874:libxl__ao_complete: ao 0x7fbb7687e900: complete, rc=-3
2015-09-03 12:29:46 BST libxl: debug: libxl_event.c:1843:libxl__ao__destroy: ao 0x7fbb7687e900: destroy


"xc: error: Failed to write page data to stream (104 = Connection reset by
peer): Internal error" seems to be the initial failure.

Ian.

* Re: [osstest test] 60719: tolerable FAIL - PUSHED
  2015-09-03 10:49             ` Ian Jackson
  2015-09-03 10:57               ` Ian Campbell
@ 2015-09-03 16:04               ` Ian Campbell
  1 sibling, 0 replies; 18+ messages in thread
From: Ian Campbell @ 2015-09-03 16:04 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Jim Fehlig, Wei Liu, xen-devel

On Thu, 2015-09-03 at 11:49 +0100, Ian Jackson wrote:
> Ian Campbell writes ("Re: [Xen-devel] [osstest test] 60719: tolerable 
> FAIL - PUSHED"):
> ...
> > I suspect this is down to:
> > 
> >     root@lace-bug:/etc/libvirt# cat /etc/hosts
> >     127.0.0.1       localhost
> >     127.0.1.1       lace-bug.xs.citrite.net lace-bug
> 
> This is simply wrong.  It means that when programs on the host try to
> find the host's own IP address starting with its host name, they get
> different (and wrong) answers to programs on other hosts.
> 
> I can see why D-I wants to do this but in our setup it is simply
> entirely wrong.  Is there a way to suppress this (from preseed
> maybe) ?

The responsible component in d-i is netcfg, and when it has been told (e.g.
via preseed) to use DHCP it will do as above, with no option to do
otherwise (it _might_ be possible to omit the FQDN by not giving the domain
name in preseed; I can't quite figure that out without trying).

If instead preseed is changed to use a static address then it will write
the given static address instead.

So we could change osstest to use static addresses at preseed time,
although that would be problematic if the host was actually dynamic. Also
it seems like we explicitly stopped doing this in:
    commit 28bc2c8875c30209c2f189ba4d87fc401bb78cf6
    Author: Ian Jackson <iwj@woking.uk.xensource.com>
    Date:   Thu Aug 18 01:23:40 2011 +0100

        OsstestDebian: use dhcp for installation again (avoids reference to NetNetmask and NetGateway)

We could rewrite /etc/hosts in ts-xen-install (around the time we frob
/etc/network/interfaces) to remove the 127.0.1.1 entry altogether, meaning that
lookups of a host's own FQDN would be resolved by DNS instead. This would be
OK even if we stop switching /e/n/i from DHCP to static too.
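
The /etc/hosts rewrite described above amounts to dropping every line whose address field is 127.0.1.1. A minimal Python sketch of that filtering follows. It is an illustration only, not the ts-xen-install change, and the function name `drop_hosts_address` is hypothetical; it works on text and leaves reading and writing the file to the caller.

```python
def drop_hosts_address(text, address="127.0.1.1"):
    """Return hosts-file text with every entry for `address` removed.

    Comment lines and blank lines are kept as they are, even if a comment
    happens to mention the address.
    """
    kept = []
    for line in text.splitlines(keepends=True):
        fields = line.split("#", 1)[0].split()
        if fields and fields[0] == address:
            continue
        kept.append(line)
    return "".join(kept)


if __name__ == "__main__":
    before = (
        "127.0.0.1\tlocalhost\n"
        "127.0.1.1\tmoss-bug.xs.citrite.net\tmoss-bug\n"
        "\n"
        "::1     localhost ip6-localhost ip6-loopback\n"
    )
    print(drop_hosts_address(before), end="")
```

With the entry gone, a lookup of the host's own FQDN falls through to DNS, which is the behaviour wanted here.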

Ian.

* Re: [osstest test] 60719: tolerable FAIL - PUSHED
  2015-09-03 11:37             ` Ian Campbell
@ 2015-09-03 16:35               ` Jim Fehlig
  2015-09-03 16:49                 ` Ian Campbell
  2015-09-10 16:40                 ` Ian Campbell
  0 siblings, 2 replies; 18+ messages in thread
From: Jim Fehlig @ 2015-09-03 16:35 UTC (permalink / raw)
  To: Ian Campbell, Ian Jackson; +Cc: Wei Liu, xen-devel

On 09/03/2015 05:37 AM, Ian Campbell wrote:
> On Thu, 2015-09-03 at 11:26 +0100, Ian Campbell wrote:
>> Notice that it has bound to 127.0.1.1 and not to 10.80.228.77!
> So while I investigate how to make d-i not create these entries I also
> removed the line from /etc/hosts such that looking up the FQDN gives the
> non-local IP. But:
>
>
>      root@moss-bug:/var/log# strace -o /tmp/virsh -fff virsh --debug 0 migrate --live debian.guest.osstest xen+ssh://10.80.228.77
>      migrate: live(bool): (none)
>      migrate: domain(optdata): debian.guest.osstest
>      migrate: desturi(optdata): xen+ssh://10.80.228.77
>      migrate: found option <domain>: debian.guest.osstest
>      migrate: <domain> trying as domain NAME
>      migrate: found option <domain>: debian.guest.osstest
>      migrate: <domain> trying as domain NAME
>      error: internal error: Failed to send migration data to destination host
>
> The senders libxl-driver.log says:
>
> 2015-09-03 12:29:45 BST libxl-save-helper: debug: starting save: Success
> 2015-09-03 12:29:45 BST xc: detail: fd 27, dom 3, max_iters 0, max_factor 0, flags 1, hvm 0
> 2015-09-03 12:29:45 BST xc: info: Saving domain 3, type x86 PV
> 2015-09-03 12:29:45 BST xc: detail: 64 bits, 4 levels
> 2015-09-03 12:29:45 BST xc: detail: max_pfn 0x1ffff, p2m_frames 256
> 2015-09-03 12:29:45 BST xc: detail: max_mfn 0x120000
> 2015-09-03 12:29:46 BST xc: error: Failed to write page data to stream (104 = Connection reset by peer): Internal error
> 2015-09-03 12:29:46 BST xc: error: Save failed (104 = Connection reset by peer): Internal error
> 2015-09-03 12:29:46 BST libxl-save-helper: debug: complete r=-1: Connection reset by peer
> 2015-09-03 12:29:46 BST libxl: error: libxl_stream_write.c:329:libxl__xc_domain_save_done: saving domain: domain did not respond to suspend request: Connection reset by peer
> 2015-09-03 12:29:46 BST libxl: debug: libxl_event.c:1874:libxl__ao_complete: ao 0x7f67b3f63e90: complete, rc=-8
> 2015-09-03 12:29:46 BST libxl: debug: libxl_event.c:1843:libxl__ao__destroy: ao 0x7f67b3f63e90: destroy
> 2015-09-03 12:29:46 BST libxl: debug: libxl.c:526:libxl_domain_resume: ao 0x7f67b3fa44b0: create: how=(nil) callback=(nil) poller=0x7f67a0002610
> 2015-09-03 12:29:46 BST xc: error: Dom 3 not suspended: (shutdown 0, reason 255): Internal error
> 2015-09-03 12:29:46 BST libxl: error: libxl_dom_suspend.c:409:libxl__domain_resume: xc_domain_resume failed for domain 3: Invalid argument
> 2015-09-03 12:29:46 BST libxl: debug: libxl_event.c:1874:libxl__ao_complete: ao 0x7f67b3fa44b0: complete, rc=-3
> 2015-09-03 12:29:46 BST libxl: debug: libxl.c:529:libxl_domain_resume: ao 0x7f67b3fa44b0: inprogress: poller=0x7f67a0002610, flags=ic
> 2015-09-03 12:29:46 BST libxl: debug: libxl_event.c:1843:libxl__ao__destroy: ao 0x7f67b3fa44b0: destroy
>
> While the receiver has:
>
> 2015-09-03 12:29:45 BST libxl-save-helper: debug: starting restore: Success
> 2015-09-03 12:29:45 BST xc: detail: fd 31, dom 4, hvm 0, pae 0, superpages 0, checkpointed_stream 0
> 2015-09-03 12:29:45 BST xc: info: Found x86 PV domain from Xen 4.6
> 2015-09-03 12:29:45 BST xc: info: Restoring domain
> 2015-09-03 12:29:45 BST xc: detail: 64 bits, 4 levels
> 2015-09-03 12:29:45 BST xc: detail: max_mfn 0x120000
> 2015-09-03 12:29:45 BST xc: detail: Expanded p2m from 0 to 0x1ffff
> 2015-09-03 12:29:45 BST xc: error: Failed to read 4202504 bytes of data for record (0x00000001, Page data) (11 = Resource temporarily unavailabl): Internal error
> 2015-09-03 12:29:45 BST xc: error: Restore failed (11 = Resource temporarily unavailabl): Internal error
> 2015-09-03 12:29:45 BST libxl-save-helper: debug: complete r=-1: Resource temporarily unavailable
> 2015-09-03 12:29:45 BST libxl: error: libxl_stream_read.c:749:libxl__xc_domain_restore_done: restoring domain: Resource temporarily unavailable
> 2015-09-03 12:29:45 BST libxl: error: libxl_create.c:1141:domcreate_rebuild_done: cannot (re-)build domain: -3
> 2015-09-03 12:29:46 BST libxl: debug: libxl.c:1708:devices_destroy_cb: forked pid 18738 for destroy of domain 4
> 2015-09-03 12:29:46 BST libxl: debug: libxl_event.c:1874:libxl__ao_complete: ao 0x7fbb7687e900: complete, rc=-3
> 2015-09-03 12:29:46 BST libxl: debug: libxl_event.c:1843:libxl__ao__destroy: ao 0x7fbb7687e900: destroy
>
>
> "xc: error: Failed to write page data to stream (104 = Connection reset by
> peer): Internal error" seems to be the initial failure.

I wonder if this has anything to do with migration V2? I noticed a migration 
regression a few days back, but later realized that the sender was 4.5 and 
receiver was 4.6. I planned to see if migration worked through libvirt between 
two 4.6 hosts, but before doing so I had to re-purpose the machines for another 
task. I think libvirt needs some work to accommodate migration V2...

Regards,
Jim

* Re: [osstest test] 60719: tolerable FAIL - PUSHED
  2015-09-03 16:35               ` Jim Fehlig
@ 2015-09-03 16:49                 ` Ian Campbell
  2015-09-10 16:40                 ` Ian Campbell
  1 sibling, 0 replies; 18+ messages in thread
From: Ian Campbell @ 2015-09-03 16:49 UTC (permalink / raw)
  To: Jim Fehlig, Ian Jackson; +Cc: Wei Liu, xen-devel

On Thu, 2015-09-03 at 10:35 -0600, Jim Fehlig wrote:
> On 09/03/2015 05:37 AM, Ian Campbell wrote:
> > On Thu, 2015-09-03 at 11:26 +0100, Ian Campbell wrote:
> > > Notice that it has bound to 127.0.1.1 and not to 10.80.228.77!
> > So while I investigate how to make d-i not create these entries I also
> > removed the line from /etc/hosts such that looking up the FQDN gives 
> > the
> > non-local IP. But:
> > 
> > 
> >      root@moss-bug:/var/log# strace -o /tmp/virsh -fff virsh --debug 0 
> > migrate --live debian.guest.osstest xen+ssh://10.80.228.77
> >      migrate: live(bool): (none)
> >      migrate: domain(optdata): debian.guest.osstest
> >      migrate: desturi(optdata): xen+ssh://10.80.228.77
> >      migrate: found option <domain>: debian.guest.osstest
> >      migrate: <domain> trying as domain NAME
> >      migrate: found option <domain>: debian.guest.osstest
> >      migrate: <domain> trying as domain NAME
> >      error: internal error: Failed to send migration data to 
> > destination host
> > 
> > The senders libxl-driver.log says:
> > 
> > 2015-09-03 12:29:45 BST libxl-save-helper: debug: starting save: 
> > Success
> > 2015-09-03 12:29:45 BST xc: detail: fd 27, dom 3, max_iters 0, 
> > max_factor 0, flags 1, hvm 0
> > 2015-09-03 12:29:45 BST xc: info: Saving domain 3, type x86 PV
> > 2015-09-03 12:29:45 BST xc: detail: 64 bits, 4 levels
> > 2015-09-03 12:29:45 BST xc: detail: max_pfn 0x1ffff, p2m_frames 256
> > 2015-09-03 12:29:45 BST xc: detail: max_mfn 0x120000
> > 2015-09-03 12:29:46 BST xc: error: Failed to write page data to stream 
> > (104 = Connection reset by peer): Internal error
> > 2015-09-03 12:29:46 BST xc: error: Save failed (104 = Connection reset 
> > by peer): Internal error
> > 2015-09-03 12:29:46 BST libxl-save-helper: debug: complete r=-1: 
> > Connection reset by peer
> > 2015-09-03 12:29:46 BST libxl: error: 
> > libxl_stream_write.c:329:libxl__xc_domain_save_done: saving domain: 
> > domain did not respond to suspend request: Connection reset by peer
> > 2015-09-03 12:29:46 BST libxl: debug: 
> > libxl_event.c:1874:libxl__ao_complete: ao 0x7f67b3f63e90: complete, rc=
> > -8
> > 2015-09-03 12:29:46 BST libxl: debug: 
> > libxl_event.c:1843:libxl__ao__destroy: ao 0x7f67b3f63e90: destroy
> > 2015-09-03 12:29:46 BST libxl: debug: libxl.c:526:libxl_domain_resume: 
> > ao 0x7f67b3fa44b0: create: how=(nil) callback=(nil) 
> > poller=0x7f67a0002610
> > 2015-09-03 12:29:46 BST xc: error: Dom 3 not suspended: (shutdown 0, 
> > reason 255): Internal error
> > 2015-09-03 12:29:46 BST libxl: error: 
> > libxl_dom_suspend.c:409:libxl__domain_resume: xc_domain_resume failed 
> > for domain 3: Invalid argument
> > 2015-09-03 12:29:46 BST libxl: debug: 
> > libxl_event.c:1874:libxl__ao_complete: ao 0x7f67b3fa44b0: complete, rc=
> > -3
> > 2015-09-03 12:29:46 BST libxl: debug: libxl.c:529:libxl_domain_resume: 
> > ao 0x7f67b3fa44b0: inprogress: poller=0x7f67a0002610, flags=ic
> > 2015-09-03 12:29:46 BST libxl: debug: 
> > libxl_event.c:1843:libxl__ao__destroy: ao 0x7f67b3fa44b0: destroy
> > 
> > While the receiver has:
> > 
> > 2015-09-03 12:29:45 BST libxl-save-helper: debug: starting restore: 
> > Success
> > 2015-09-03 12:29:45 BST xc: detail: fd 31, dom 4, hvm 0, pae 0, 
> > superpages 0, checkpointed_stream 0
> > 2015-09-03 12:29:45 BST xc: info: Found x86 PV domain from Xen 4.6
> > 2015-09-03 12:29:45 BST xc: info: Restoring domain
> > 2015-09-03 12:29:45 BST xc: detail: 64 bits, 4 levels
> > 2015-09-03 12:29:45 BST xc: detail: max_mfn 0x120000
> > 2015-09-03 12:29:45 BST xc: detail: Expanded p2m from 0 to 0x1ffff
> > 2015-09-03 12:29:45 BST xc: error: Failed to read 4202504 bytes of data 
> > for record (0x00000001, Page data) (11 = Resource temporarily 
> > unavailabl): Internal error
> > 2015-09-03 12:29:45 BST xc: error: Restore failed (11 = Resource 
> > temporarily unavailabl): Internal error
> > 2015-09-03 12:29:45 BST libxl-save-helper: debug: complete r=-1: 
> > Resource temporarily unavailable
> > 2015-09-03 12:29:45 BST libxl: error: 
> > libxl_stream_read.c:749:libxl__xc_domain_restore_done: restoring 
> > domain: Resource temporarily unavailable
> > 2015-09-03 12:29:45 BST libxl: error: 
> > libxl_create.c:1141:domcreate_rebuild_done: cannot (re-)build domain: 
> > -3
> > 2015-09-03 12:29:46 BST libxl: debug: libxl.c:1708:devices_destroy_cb: 
> > forked pid 18738 for destroy of domain 4
> > 2015-09-03 12:29:46 BST libxl: debug: 
> > libxl_event.c:1874:libxl__ao_complete: ao 0x7fbb7687e900: complete, rc=
> > -3
> > 2015-09-03 12:29:46 BST libxl: debug: 
> > libxl_event.c:1843:libxl__ao__destroy: ao 0x7fbb7687e900: destroy
> > 
> > 
> > "xc: error: Failed to write page data to stream (104 = Connection reset 
> > by
> > peer): Internal error" seems to be the initial failure.
> 
> I wonder if this has anything to do with migration V2?

That would be my first guess.

>  I noticed a migration 
> regression a few days back, but later realized that the sender was 4.5 and 
> receiver was 4.6. I planned to see if migration worked through libvirt between 
> two 4.6 hosts, but before doing so I had to re-purpose the machines for another 
> task.

> I think libvirt needs some work to accommodate migration V2...

If so then I think that would be a bug in libxl, since it is supposed to be
backward compatible at the libxl API level.

(It may also be true that libvirt might like updates to work _better_ with
migration v2)

Ian.

* Re: [osstest test] 60719: tolerable FAIL - PUSHED
  2015-09-01 12:47     ` Ian Jackson
  2015-09-01 13:14       ` Ian Campbell
@ 2015-09-04  2:47       ` Jim Fehlig
  1 sibling, 0 replies; 18+ messages in thread
From: Jim Fehlig @ 2015-09-04  2:47 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Wei Liu, Ian Campbell, xen-devel

On 09/01/2015 06:47 AM, Ian Jackson wrote:
> Jim Fehlig writes ("Re: [osstest test] 60719: tolerable FAIL - PUSHED"):
>> This sounds a bit like an issue discussed in the Redhat libvirt troubleshooting FAQ
>>
>> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Virtualization_Deployment_and_Administration_Guide/sect-Troubleshooting-Common_libvirt_errors_and_troubleshooting.html#sect-Migration_fails_with_Error_unable_to_resolve_address
>> Right. If it is a DNS issue, error handling in the libvirt libxl
>> migration code needs improving.
> I booked out a test host, and (as I expected) forward DNS works, but
> reverse DNS on test box IP addresses does not:
>
> root@nocera0:~# host nocera1.test-lab.xenproject.org
> nocera1.test-lab.xenproject.org has address 172.16.144.23
> root@nocera0:~# host -i 172.16.144.23
> Host 23.144.16.172.in-addr.arpa. not found: 3(NXDOMAIN)
> root@nocera0:~# cat /etc/hosts
> 127.0.0.1       localhost
> 127.0.1.1       nocera0.test-lab.xenproject.org nocera0
>
> # The following lines are desirable for IPv6 capable hosts
> ::1     localhost ip6-localhost ip6-loopback
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> root@nocera0:~# cat /etc/resolv.conf
> domain test-lab.xenproject.org
> search test-lab.xenproject.org
> nameserver 172.16.148.4
> nameserver 172.16.144.3
> root@nocera0:~#
>
> That admin guide article isn't quite clear, but reading between the
> lines and applying some supposition, maybe libvirt is doing a reverse
> lookup on some associated IP address ?
>
> I can probably put the test boxes in the reverse DNS, but really I
> think at the very least libvirt's error message needs to be improved
> too.

The unhelpful "Invalid argument" error has been fixed in libvirt:

http://libvirt.org/git/?p=libvirt.git;a=commit;h=6ce939c2472e8cd97dfe448e902bc878c826351e

Regards,
Jim

* Re: [osstest test] 60719: tolerable FAIL - PUSHED
  2015-09-03 16:35               ` Jim Fehlig
  2015-09-03 16:49                 ` Ian Campbell
@ 2015-09-10 16:40                 ` Ian Campbell
  2015-09-12  3:56                   ` Jim Fehlig
  1 sibling, 1 reply; 18+ messages in thread
From: Ian Campbell @ 2015-09-10 16:40 UTC (permalink / raw)
  To: Jim Fehlig, Ian Jackson; +Cc: Wei Liu, xen-devel

On Thu, 2015-09-03 at 10:35 -0600, Jim Fehlig wrote:

> I wonder if this has anything to do with migration V2? I noticed a migration 
> regression a few days back, but later realized that the sender was 4.5 and 
> receiver was 4.6. I planned to see if migration worked through libvirt between 
> two 4.6 hosts, but before doing so I had to re-purpose the machines for another 
> task. I think libvirt needs some work to accommodate migration V2...

So after shaving a bunch of yaks wrt getting my test boxes set up I've
finally tracked this one down...

libvirt is passing libxl a restore (and perhaps save) file descriptor which
has O_NONBLOCK set. libxl/libxc doesn't expect that and therefore doesn't
handle the resulting EAGAIN.
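
To make the failure mode concrete, here is a minimal sketch in Python (not
the libxl/libxc C code) of what a reader sees on a non-blocking descriptor
when no data has arrived yet. The pipe stands in for the migration stream
and is purely an assumption for the example. On Linux EAGAIN is errno 11,
which matches the "11 = Resource temporarily unavailabl" in the receiver's
log.

```python
import errno
import os

# A pipe stands in for the migration stream (assumption for illustration).
read_fd, write_fd = os.pipe()

# The caller hands over a descriptor with O_NONBLOCK set.
os.set_blocking(read_fd, False)

try:
    os.read(read_fd, 4096)      # nothing has been written yet
    observed = None
except BlockingIOError as exc:
    observed = exc.errno        # EAGAIN: a blocking reader would have waited

os.close(read_fd)
os.close(write_fd)
```

Code that assumes a blocking descriptor treats this as a hard read error
rather than waiting and retrying.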

Ian and I think it would be more convenient for most callers if libxl took
care of this by making the fd blocking again and returning it to the
original state when it was done.
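
A minimal sketch of that approach, in Python rather than libxl's C, and not
the actual patch: force the descriptor into blocking mode for the duration
of the operation, then put back whatever mode the caller had set. The name
temporarily_blocking and the pipe used to exercise it are assumptions made
for the example.

```python
import contextlib
import os


@contextlib.contextmanager
def temporarily_blocking(fd):
    """Make fd blocking inside the with-block, then restore its prior mode."""
    was_blocking = os.get_blocking(fd)
    os.set_blocking(fd, True)
    try:
        yield fd
    finally:
        os.set_blocking(fd, was_blocking)


# Exercise it with a pipe standing in for the migration stream.
read_fd, write_fd = os.pipe()
os.set_blocking(read_fd, False)          # what the caller handed over
os.write(write_fd, b"page data")

with temporarily_blocking(read_fd):
    inside = os.get_blocking(read_fd)    # blocking while the work is done
    payload = os.read(read_fd, 9)

after = os.get_blocking(read_fd)         # caller's non-blocking mode is back
os.close(read_fd)
os.close(write_fd)
```

Restoring the saved mode rather than unconditionally setting O_NONBLOCK
means callers that passed a blocking descriptor are left untouched.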

I'll cook up a patch.

I think migration v1 probably had the same requirement, although it may not
have manifested itself as a bug.

Ian

* Re: [osstest test] 60719: tolerable FAIL - PUSHED
  2015-09-10 16:40                 ` Ian Campbell
@ 2015-09-12  3:56                   ` Jim Fehlig
  2015-09-16  8:28                     ` Ian Campbell
  0 siblings, 1 reply; 18+ messages in thread
From: Jim Fehlig @ 2015-09-12  3:56 UTC (permalink / raw)
  To: Ian Campbell, Ian Jackson; +Cc: Wei Liu, xen-devel

On 09/10/2015 10:40 AM, Ian Campbell wrote:
> On Thu, 2015-09-03 at 10:35 -0600, Jim Fehlig wrote:
>
>> I wonder if this has anything to do with migration V2? I noticed a migration
>> regression a few days back, but later realized that the sender was 4.5 and
>> receiver was 4.6. I planned to see if migration worked through libvirt between
>> two 4.6 hosts, but before doing so I had to re-purpose the machines for another
>> task. I think libvirt needs some work to accommodate migration V2...
> So after shaving a bunch of yakks wrt getting my test boxes setup I've
> finally tracked this one down...

Thanks for investigating this issue! It bubbled to the top of my queue, so I'm 
glad I read this mail before duplicating the effort.

>
> libvirt is passing libxl a restore (and perhaps save) file descriptor which
> is set O_NONBLOCK, which libxl/c doesn't expect and therefore doesn't
> handle the resulting EAGAIN.
>
> Ian and I think it would be more convenient for most callers if libxl took
> care of this by making the fd blocking again and returning it to the
> original state when it was done.
>
> I'll cook up a patch.

I also noticed your patch has been ACK'ed and applied.  Thanks again!

Regards,
Jim

* Re: [osstest test] 60719: tolerable FAIL - PUSHED
  2015-09-12  3:56                   ` Jim Fehlig
@ 2015-09-16  8:28                     ` Ian Campbell
  0 siblings, 0 replies; 18+ messages in thread
From: Ian Campbell @ 2015-09-16  8:28 UTC (permalink / raw)
  To: Jim Fehlig, Ian Jackson; +Cc: Wei Liu, xen-devel

On Fri, 2015-09-11 at 21:56 -0600, Jim Fehlig wrote:
> On 09/10/2015 10:40 AM, Ian Campbell wrote:
> > On Thu, 2015-09-03 at 10:35 -0600, Jim Fehlig wrote:
> > 
> > > I wonder if this has anything to do with migration V2? I noticed a
> > > migration
> > > regression a few days back, but later realized that the sender was
> > > 4.5 and
> > > receiver was 4.6. I planned to see if migration worked through
> > > libvirt between
> > > two 4.6 hosts, but before doing so I had to re-purpose the machines
> > > for another
> > > task. I think libvirt needs some work to accommodate migration V2...
> > So after shaving a bunch of yakks wrt getting my test boxes setup I've
> > finally tracked this one down...
> 
> Thanks for investigating this issue! It bubbled to the top of my queue,
> so I'm 
> glad I read this mail before duplicating the effort.
> 
> > 
> > libvirt is passing libxl a restore (and perhaps save) file descriptor
> > which
> > is set O_NONBLOCK, which libxl/c doesn't expect and therefore doesn't
> > handle the resulting EAGAIN.
> > 
> > Ian and I think it would be more convenient for most callers if libxl
> > took
> > care of this by making the fd blocking again and returning it to the
> > original state when it was done.
> > 
> > I'll cook up a patch.
> 
> I also noticed your patch has been ACK'ed and applied.  Thanks again!

No problem.

FYI the osstest fixup to /etc/hosts hit osstest production yesterday and
flight 62004 is the first to pick up both that and the libxl fix. It's
doing the build phase about now, so I'd expect the actual results tomorrow.
Hopefully we'll get a pass from the test-*-*-libvirt-pair job!

Ian.

> Regards,
> Jim
> 

Thread overview: 18+ messages
     [not found] <osstest-60719-mainreport@xen.org>
2015-08-21  8:05 ` [osstest test] 60719: tolerable FAIL - PUSHED Ian Campbell
2015-08-21 14:02   ` Wei Liu
2015-08-22  7:25     ` Ian Campbell
2015-08-27  3:33   ` Jim Fehlig
2015-09-01 12:47     ` Ian Jackson
2015-09-01 13:14       ` Ian Campbell
2015-09-03  6:38         ` Jim Fehlig
2015-09-03 10:26           ` Ian Campbell
2015-09-03 10:49             ` Ian Jackson
2015-09-03 10:57               ` Ian Campbell
2015-09-03 16:04               ` Ian Campbell
2015-09-03 11:37             ` Ian Campbell
2015-09-03 16:35               ` Jim Fehlig
2015-09-03 16:49                 ` Ian Campbell
2015-09-10 16:40                 ` Ian Campbell
2015-09-12  3:56                   ` Jim Fehlig
2015-09-16  8:28                     ` Ian Campbell
2015-09-04  2:47       ` Jim Fehlig
