* arndales, armhf osstest capacity
From: Ian Jackson @ 2019-04-11 16:51 UTC
  To: Stefano Stabellini, Julien Grall; +Cc: xen-devel

In "[OSSTEST PATCH 00/62] Update to Debian stable (stretch)" I wrote:

>  * I experienced difficulties with the 4 Arndale devboards: high
>    probability guest start failures.  For now I have marked those
>    nodes as unsuitable for use with stretch, which will, effectively,
>    take them out of service - and leave us with a lack of armhf
>    capacity.  It is possible that this problem is due to the
>    ifupdown-hotplug issue, now addressed, so I plan to retest.

I force-pushed this series earlier.  So the arndales are now out of
service and soon we will have to do something about the problems with
armhf capacity.

I reran the tests with a version of osstest bodged to let it run on
hosts not flagged as useable with stretch, and:

  flight 134631 osstest play [commission-arndale]
  http://logs.test-lab.xenproject.org/osstest/logs/134631/

  Failures :-/ but no regressions.

  Tests which did not succeed,
  including tests which could not be run:
   test-armhf-armhf-examine      5 host-install          broken baseline untested

The arndales were unreliable before.  But this one seems odd.  It
doesn't seem to find its storage.

   test-armhf-armhf-libvirt     12 guest-start             fail baseline untested
   test-armhf-armhf-xl          12 guest-start             fail baseline untested
   test-armhf-armhf-xl-multivcpu 12 guest-start            fail baseline untested
   test-armhf-armhf-xl-credit2  12 guest-start             fail baseline untested
   test-armhf-armhf-xl-credit1  12 guest-start             fail baseline untested
   test-armhf-armhf-xl-rtds     12 guest-start             fail baseline untested
   test-armhf-armhf-xl-arndale  12 guest-start             fail baseline untested

This is the real problem.  Only 2 of the tests actually got further
than this.  It works fine with the cubietrucks.

Julien and Stefano, would you be able to look at this and advise ?

   test-armhf-armhf-libvirt-raw 13 saverestore-support-check fail baseline untested
   test-armhf-armhf-libvirt-raw 12 migrate-support-check        fail   never pass
   test-armhf-armhf-xl-vhd      12 migrate-support-check        fail   never pass
   test-armhf-armhf-xl-vhd      13 saverestore-support-check    fail   never pass

These reflect the expected lack of save/migration support on ARM.
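
For readers who want to tally the report rather than read it row by row, here is a minimal sketch that parses the summary lines quoted above and counts failures per test step. The column labels (job, step number, step, status, note) are assumptions made for illustration; they are inferred from the layout of the lines and are not taken from osstest itself.

```python
import re
from collections import Counter

# Summary rows copied from the flight 134631 report quoted in this message.
REPORT = """\
 test-armhf-armhf-examine      5 host-install          broken baseline untested
 test-armhf-armhf-libvirt     12 guest-start             fail baseline untested
 test-armhf-armhf-xl          12 guest-start             fail baseline untested
 test-armhf-armhf-xl-multivcpu 12 guest-start            fail baseline untested
 test-armhf-armhf-xl-credit2  12 guest-start             fail baseline untested
 test-armhf-armhf-xl-credit1  12 guest-start             fail baseline untested
 test-armhf-armhf-xl-rtds     12 guest-start             fail baseline untested
 test-armhf-armhf-xl-arndale  12 guest-start             fail baseline untested
 test-armhf-armhf-libvirt-raw 13 saverestore-support-check fail baseline untested
 test-armhf-armhf-libvirt-raw 12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-vhd      12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-vhd      13 saverestore-support-check    fail   never pass
"""

# Assumed layout: job name, step number, step name, status, free-text note.
LINE_RE = re.compile(
    r"^\s*(?P<job>test-\S+)\s+(?P<stepno>\d+)\s+(?P<step>\S+)"
    r"\s+(?P<status>\S+)\s+(?P<note>.*\S)\s*$"
)


def parse_report(text):
    """Return one dict per summary row that matches the assumed layout."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            rows.append(match.groupdict())
    return rows


rows = parse_report(REPORT)
failures_by_step = Counter(row["step"] for row in rows)

for step, count in failures_by_step.most_common():
    print(f"{step}: {count}")
```

Grouping by step makes it easy to see that guest-start accounts for most of the rows, which matches the reading given above.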

On the plus side, the two ThunderX machines are now in service.

Thanks,
Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

* Re: arndales, armhf osstest capacity
From: Julien Grall @ 2019-04-11 17:29 UTC
  To: Ian Jackson, Stefano Stabellini; +Cc: xen-devel

Hi Ian,

On 4/11/19 5:51 PM, Ian Jackson wrote:
> In "[OSSTEST PATCH 00/62] Update to Debian stable (stretch)" I wrote:
> 
>>   * I experienced difficulties with the 4 Arndale devboards: high
>>     probability guest start failures.  For now I have marked those
>>     nodes as unsuitable for use with stretch, which will, effectively,
>>     take them out of service - and leave us with a lack of armhf
>>     capacity.  It is possible that this problem is due to the
>>     ifupdown-hotplug issue, now addressed, so I plan to retest.
> 
> I force-pushed this series earlier.  So the arndales are now out of
> service and soon we will have to do something about the problems with
> armhf capacity.
> 
> I reran the tests with a version of osstest bodged to let it run on
> hosts not flagged as useable with stretch, and:
> 
>    flight 134631 osstest play [commission-arndale]
>    http://logs.test-lab.xenproject.org/osstest/logs/134631/
> 
>    Failures :-/ but no regressions.
> 
>    Tests which did not succeed,
>    including tests which could not be run:
>     test-armhf-armhf-examine      5 host-install          broken baseline untested
> 
> The arndales were unreliable before.  But this one seems odd.  It
> doesn't seem to find its storage.

It looks to me like a network issue. U-Boot keeps trying to load the
initrd via TFTP because it got a timeout from, I guess, the USB
driver in U-Boot.

We have been using a pretty old U-Boot on the arndale. I am wondering
whether it would be worth trying to upgrade U-Boot to see if it
makes things more reliable.

My only worry is I am not sure if I can do the upgrade safely remotely. 
I would probably need to find a board in Cambridge for trying out the 
firmware first.

> 
>     test-armhf-armhf-libvirt     12 guest-start             fail baseline untested
>     test-armhf-armhf-xl          12 guest-start             fail baseline untested
>     test-armhf-armhf-xl-multivcpu 12 guest-start            fail baseline untested
>     test-armhf-armhf-xl-credit2  12 guest-start             fail baseline untested
>     test-armhf-armhf-xl-credit1  12 guest-start             fail baseline untested
>     test-armhf-armhf-xl-rtds     12 guest-start             fail baseline untested
>     test-armhf-armhf-xl-arndale  12 guest-start             fail baseline untested
> 
> This is the real problem.  Only 2 of the tests actually got further
> than this.  It works fine with the cubietrucks.
> 

The common problem seems to be the network. Some of the logs even have:

[   16.914427] IPv6: eth0: IPv6 duplicate address fe80::a446:13ff:fe77:e82f detected!
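
To match the address in that log line against a particular interface, it can help to recover the MAC address it was probably derived from. The sketch below assumes the link-local address was formed with the modified EUI-64 scheme (RFC 4291). The `ff:fe` in the middle of the interface identifier suggests this, but the log does not confirm it. The function name is hypothetical.

```python
import ipaddress


def mac_from_eui64_link_local(addr: str) -> str:
    """Recover the MAC address from a modified EUI-64 IPv6 address.

    Assumes the low 64 bits were built by inserting ff:fe in the middle
    of the MAC and flipping the universal/local bit.
    """
    interface_id = ipaddress.IPv6Address(addr).packed[8:]  # low 64 bits
    if interface_id[3:5] != b"\xff\xfe":
        raise ValueError("interface identifier is not in modified EUI-64 form")
    mac = bytearray(interface_id[:3] + interface_id[5:])
    mac[0] ^= 0x02  # undo the universal/local bit flip
    return ":".join(f"{byte:02x}" for byte in mac)


mac = mac_from_eui64_link_local("fe80::a446:13ff:fe77:e82f")
# Bit 0x02 of the first octet marks a locally administered MAC.
locally_administered = bool(int(mac.split(":")[0], 16) & 0x02)

print(mac, locally_administered)
```

Under that assumption, the recovered MAC can be compared with the MAC addresses of the host and guest interfaces to see which two are colliding.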

> Julien and Stefano, would you be able to look at this and advise ?

We haven't really updated the Linux branch we use for quite a while. We
are using a pretty old version of 4.14 (4.14.19; the current is 4.14.111).
It is possible that a bug was fixed in a newer release of 4.14.

> On the plus side, the two ThunderX machines are now in service.

Youhou! Just in time for the 2nd anniversary of their purchase :).

Cheers,

-- 
Julien Grall

