* [xen-unstable test] 60076: regressions - FAIL
@ 2015-07-29  6:42 osstest service owner
  2015-07-29  9:05 ` Dario Faggioli
  0 siblings, 1 reply; 12+ messages in thread
From: osstest service owner @ 2015-07-29  6:42 UTC (permalink / raw)
  To: xen-devel, osstest-admin

flight 60076 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/60076/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemuu-ovmf-amd64 9 debian-hvm-install fail REGR. vs. 59817
 test-armhf-armhf-xl-multivcpu 14 guest-start.2            fail REGR. vs. 59817
 test-amd64-i386-xl-qemuu-ovmf-amd64  9 debian-hvm-install fail REGR. vs. 59817

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-rumpuserxen-i386 15 rumpuserxen-demo-xenstorels/xenstorels.repeat fail REGR. vs. 59817
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 15 guest-localmigrate/x10 fail blocked in 59817
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install fail like 59817
 test-armhf-armhf-xl-rtds     11 guest-start                  fail   like 59817
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop             fail like 59817
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop              fail like 59817

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-arndale  12 migrate-support-check        fail   never pass
 test-amd64-amd64-xl-pvh-amd  11 guest-start                  fail   never pass
 test-armhf-armhf-xl-xsm      12 migrate-support-check        fail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          12 migrate-support-check        fail   never pass
 test-amd64-i386-libvirt      12 migrate-support-check        fail   never pass
 test-armhf-armhf-libvirt     12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-check        fail  never pass
 test-amd64-amd64-libvirt     12 migrate-support-check        fail   never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start                  fail  never pass
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop              fail never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-check        fail never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-check        fail   never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-check        fail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-check        fail   never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop             fail never pass

version targeted for testing:
 xen                  3e791ccb1d1d036ed25e880b1ef72ea8dcabe43a
baseline version:
 xen                  7c60c2da3160766a265cb84c7411ff2c9cbd8d0b

Last test of basis    59817  2015-07-22 07:29:29 Z    6 days
Failing since         59833  2015-07-23 10:56:30 Z    5 days    4 attempts
Testing same since    60076  2015-07-28 12:22:00 Z    0 days    1 attempts

------------------------------------------------------------
People who touched revisions under test:
  Andrew Cooper <andrew.cooper3@citrix.com>
  Andrew Cooper <andrew.cooper3@citrix.com> for the x86 bits.
  Chao Peng <chao.p.peng@linux.intel.com>
  Chris (Christopher) Brand <chris.brand@broadcom.com>
  Chris Brand <chris.brand@broadcom.com>
  Daniel De Graaf <dgdegra@tycho.nsa.gov>
  Dario Faggioli <dario.faggioli@citrix.com>
  Ed White <edmund.h.white@intel.com>
  George Dunlap <george.dunlap@eu.citrix.com>
  Ian Campbell <ian.campbell@citrix.com>
  Ian Jackson <ian.jackson@eu.citrix.com>
  Jan Beulich <jbeulich@suse.com>
  Jonathan Creekmore <jonathan.creekmore@gmail.com>
  Juergen Gross <jgross@suse.com>
  Julien Grall <julien.grall@citrix.com>
  Jun Nakajima <jun.nakajima@intel.com>
  Kevin Tian <kevin.tian@intel.com>
  Martin Lucina <martin@lucina.net>
  Ravi Sahita <ravi.sahita@intel.com>
  Roger Pau Monné <roger.pau@citrix.com>
  Tamas K Lengyel <tlengyel@novetta.com>
  Tiejun Chen <tiejun.chen@intel.com>
  Wei Liu <wei.liu2@citrix.com>

jobs:
 build-amd64-xsm                                              pass
 build-armhf-xsm                                              pass
 build-i386-xsm                                               pass
 build-amd64                                                  pass
 build-armhf                                                  pass
 build-i386                                                   pass
 build-amd64-libvirt                                          pass
 build-armhf-libvirt                                          pass
 build-i386-libvirt                                           pass
 build-amd64-oldkern                                          pass
 build-i386-oldkern                                           pass
 build-amd64-pvops                                            pass
 build-armhf-pvops                                            pass
 build-i386-pvops                                             pass
 build-amd64-rumpuserxen                                      pass
 build-i386-rumpuserxen                                       pass
 test-amd64-amd64-xl                                          pass
 test-armhf-armhf-xl                                          pass
 test-amd64-i386-xl                                           pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm                pass
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm                 pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm                pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm                 pass
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm        fail
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm         fail
 test-amd64-amd64-libvirt-xsm                                 pass
 test-armhf-armhf-libvirt-xsm                                 pass
 test-amd64-i386-libvirt-xsm                                  pass
 test-amd64-amd64-xl-xsm                                      pass
 test-armhf-armhf-xl-xsm                                      pass
 test-amd64-i386-xl-xsm                                       pass
 test-amd64-amd64-xl-pvh-amd                                  fail
 test-amd64-i386-qemut-rhel6hvm-amd                           pass
 test-amd64-i386-qemuu-rhel6hvm-amd                           pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64                    pass
 test-amd64-i386-xl-qemut-debianhvm-amd64                     pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64                    pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64                     pass
 test-amd64-i386-freebsd10-amd64                              pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64                         fail
 test-amd64-i386-xl-qemuu-ovmf-amd64                          fail
 test-amd64-amd64-rumpuserxen-amd64                           pass
 test-amd64-amd64-xl-qemut-win7-amd64                         fail
 test-amd64-i386-xl-qemut-win7-amd64                          fail
 test-amd64-amd64-xl-qemuu-win7-amd64                         fail
 test-amd64-i386-xl-qemuu-win7-amd64                          fail
 test-armhf-armhf-xl-arndale                                  pass
 test-amd64-amd64-xl-credit2                                  pass
 test-armhf-armhf-xl-credit2                                  pass
 test-armhf-armhf-xl-cubietruck                               pass
 test-amd64-i386-freebsd10-i386                               pass
 test-amd64-i386-rumpuserxen-i386                             fail
 test-amd64-amd64-xl-pvh-intel                                fail
 test-amd64-i386-qemut-rhel6hvm-intel                         pass
 test-amd64-i386-qemuu-rhel6hvm-intel                         pass
 test-amd64-amd64-libvirt                                     pass
 test-armhf-armhf-libvirt                                     pass
 test-amd64-i386-libvirt                                      pass
 test-amd64-amd64-xl-multivcpu                                pass
 test-armhf-armhf-xl-multivcpu                                fail
 test-amd64-amd64-pair                                        pass
 test-amd64-i386-pair                                         pass
 test-amd64-amd64-xl-rtds                                     pass
 test-armhf-armhf-xl-rtds                                     fail
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1                     pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1                     pass
 test-amd64-amd64-xl-qemut-winxpsp3                           pass
 test-amd64-i386-xl-qemut-winxpsp3                            pass
 test-amd64-amd64-xl-qemuu-winxpsp3                           pass
 test-amd64-i386-xl-qemuu-winxpsp3                            pass


------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
    http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
    http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 1573 lines long.)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

* Re: [xen-unstable test] 60076: regressions - FAIL
  2015-07-29  6:42 [xen-unstable test] 60076: regressions - FAIL osstest service owner
@ 2015-07-29  9:05 ` Dario Faggioli
  2015-07-29 14:10   ` Julien Grall
  0 siblings, 1 reply; 12+ messages in thread
From: Dario Faggioli @ 2015-07-29  9:05 UTC (permalink / raw)
  To: osstest service owner; +Cc: Julien Grall, xen-devel, Ian Campbell


On Wed, 2015-07-29 at 06:42 +0000, osstest service owner wrote:
> flight 60076 xen-unstable real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/60076/
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-amd64-amd64-xl-qemuu-ovmf-amd64 9 debian-hvm-install fail REGR. vs. 59817
>  test-armhf-armhf-xl-multivcpu 14 guest-start.2            fail REGR. vs. 59817
>
I gave a quick look at the logs, and didn't spot any obvious issues.

AFAICT, it seems it was actually working:

--- ---
http://logs.test-lab.xenproject.org/osstest/logs/60076/test-armhf-armhf-xl-multivcpu/serial-arndale-metrocentre.log
Jul 28 20:22:21.525058 [  623.706988] device vif2.0 entered promiscuous mode
Jul 28 20:22:21.669108 [  623.713782] IPv6: ADDRCONF(NETDEV_UP): vif2.0: link is not ready
Jul 28 20:22:21.677039 [  625.296200] xen-blkback:ring-ref 8, event-channel 3, protocol 1 (arm-abi) persistent grants
Jul 28 20:22:23.261086 [  625.325256] xen-blkback:ring-ref 9, event-channel 4, protocol 1 (arm-abi) persistent grants
Jul 28 20:22:23.293017 [  625.400219] IPv6: ADDRCONF(NETDEV_CHANGE): vif2.0: link becomes ready
Jul 28 20:22:23.365065 [  625.405368] xenbr0: port 2(vif2.0) entered forwarding state
Jul 28 20:22:23.365110 [  625.410948] xenbr0: port 2(vif2.0) entered forwarding state

http://logs.test-lab.xenproject.org/osstest/logs/60076/test-armhf-armhf-xl-multivcpu/arndale-metrocentre---var-log-xen-console-guest-debian.guest.osstest.log
INIT: Entering runlevel: 2

[info] Using makefile-style concurrent boot in runlevel 2.
[ ok ] Starting enhanced syslogd: rsyslogd.
[ ok ] Starting periodic command scheduler: cron.
[ ok ] Starting OpenBSD Secure Shell server: sshd.

Debian GNU/Linux 7 debian hvc0

debian login: Debian GNU/Linux 7 debian hvc0

debian login: 

--- ---

Can it be that things are "just" slow, since we're creating a 4 vcpus
guest on a 1 pcpu (not so powerful, I guess) host?

http://logs.test-lab.xenproject.org/osstest/logs/60076/test-armhf-armhf-xl-multivcpu/arndale-metrocentre-output-xl_info_-n
cpu_topology           :
cpu:    core    socket     node
  0:       0        0        0

http://logs.test-lab.xenproject.org/osstest/logs/60076/test-armhf-armhf-xl-multivcpu/arndale-metrocentre-output-xl_vcpu-list
Name                                ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
Domain-0                             0     0    0   r--     337.5  all / all
debian.guest.osstest                 2     0    0   ---      13.5  all / all
debian.guest.osstest                 2     1    0   ---      12.9  all / all
debian.guest.osstest                 2     2    0   ---      12.2  all / all
debian.guest.osstest                 2     3    0   ---      12.5  all / all

If I missed something, and this is just completely off... sorry for the
noise. :-)

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

* Re: [xen-unstable test] 60076: regressions - FAIL
  2015-07-29  9:05 ` Dario Faggioli
@ 2015-07-29 14:10   ` Julien Grall
  2015-07-29 14:15     ` Julien Grall
  0 siblings, 1 reply; 12+ messages in thread
From: Julien Grall @ 2015-07-29 14:10 UTC (permalink / raw)
  To: Dario Faggioli, osstest service owner; +Cc: xen-devel, Ian Campbell

Hi Dario,

On 29/07/15 10:05, Dario Faggioli wrote:
> On Wed, 2015-07-29 at 06:42 +0000, osstest service owner wrote:
>> flight 60076 xen-unstable real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/60076/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>>  test-amd64-amd64-xl-qemuu-ovmf-amd64 9 debian-hvm-install fail REGR. vs. 59817
>>  test-armhf-armhf-xl-multivcpu 14 guest-start.2            fail REGR. vs. 59817
>>
> I gave a quick look at the logs, and didn't spot any obvious issues.
> 
> Can it be that things are "just" slow, since we're creating a 4 vcpus
> guest on a 1 pcpu (not so powerful, I guess) host?

The Arndale board has 2 physical CPUs, although it looks like the
secondary CPU never comes up:

Jul 28 01:35:39.057076 (XEN) Adding cpu 1 to runqueue 0
Jul 28 01:35:39.057104 (XEN) Bringing up CPU1
Jul 28 01:35:39.064998 (XEN) CPU1 never came online
Jul 28 01:35:40.065133 (XEN) Removing cpu 1 from runqueue 0
Jul 28 01:35:40.065176 (XEN) Failed to bring up CPU 1 (error -5)

This broke at some point during Xen 4.6 development. Xen 4.5 boots with
the right number of physical CPUs on the Arndale.

Nonetheless, we are aware of the multi-vcpu test failing from time to
time on the Arndale. It seems to happen only with xen-unstable.

osstest waits 40s for the network to come up in the guest. When the
test passes, osstest typically waits only ~20s. I measured the
time between

guest debian.guest.osstest 5a:36:0e:06:00:20 22 link/ip/tcp: waiting 40s...

and the first

executing ssh ... root@172.16.146.149 echo guest debian.guest.osstest: ok
guest debian.guest.osstest: ok

For instance see
http://logs.test-lab.xenproject.org/osstest/logs/59910/test-armhf-armhf-xl-multivcpu/14.ts-guest-start.log

I will do more testing once the 2 pCPUs case is fixed.

Regards,

-- 
Julien Grall

* Re: [xen-unstable test] 60076: regressions - FAIL
  2015-07-29 14:10   ` Julien Grall
@ 2015-07-29 14:15     ` Julien Grall
  2015-07-29 18:18       ` Arndale secondary CPU boot issue Was " Julien Grall
  2015-07-30  7:48       ` Dario Faggioli
  0 siblings, 2 replies; 12+ messages in thread
From: Julien Grall @ 2015-07-29 14:15 UTC (permalink / raw)
  To: Dario Faggioli, osstest service owner; +Cc: xen-devel, Ian Campbell

On 29/07/15 15:10, Julien Grall wrote:
> Nonetheless, we are aware of the multi-vcpu test failing from time to
> time on the Arndale. It seems to happen only with xen-unstable.
> 
> osstest waits 40s for the network to come up in the guest. When the
> test passes, osstest typically waits only ~20s. I measured the
> time between
> 
> guest debian.guest.osstest 5a:36:0e:06:00:20 22 link/ip/tcp: waiting 40s...
> 
> and the first
> 
> executing ssh ... root@172.16.146.149 echo guest debian.guest.osstest: ok
> guest debian.guest.osstest: ok

> For instance see
> http://logs.test-lab.xenproject.org/osstest/logs/59910/test-armhf-armhf-xl-multivcpu/14.ts-guest-start.log

FWIW, there is also a worse case, where the waiting time is very close
to 40s (38s exactly):

http://logs.test-lab.xenproject.org/osstest/logs/59721/test-armhf-armhf-xl-multivcpu/14.ts-guest-start.log

-- 
Julien Grall

* Arndale secondary CPU boot issue Was Re: [xen-unstable test] 60076: regressions - FAIL
  2015-07-29 14:15     ` Julien Grall
@ 2015-07-29 18:18       ` Julien Grall
  2015-07-30  8:55         ` Ian Campbell
  2015-07-30 10:38         ` Andrew Cooper
  2015-07-30  7:48       ` Dario Faggioli
  1 sibling, 2 replies; 12+ messages in thread
From: Julien Grall @ 2015-07-29 18:18 UTC (permalink / raw)
  To: Dario Faggioli, osstest service owner
  Cc: xen-devel, Ian Campbell, Stefano Stabellini, David Vrabel

On 29/07/15 15:15, Julien Grall wrote:
>>> Can it be that things are "just" slow, since we're creating a 4 vcpus
>>> guest on a 1 pcpu (not so powerful, I guess) host?
>>
>> The Arndale board has 2 physical CPUs, although it looks like the
>> secondary CPU never comes up:
>>
>> Jul 28 01:35:39.057076 (XEN) Adding cpu 1 to runqueue 0
>> Jul 28 01:35:39.057104 (XEN) Bringing up CPU1
>> Jul 28 01:35:39.064998 (XEN) CPU1 never came online
>> Jul 28 01:35:40.065133 (XEN) Removing cpu 1 from runqueue 0
>> Jul 28 01:35:40.065176 (XEN) Failed to bring up CPU 1 (error -5)
>>
>> This broke at some point during Xen 4.6 development. Xen 4.5 boots with
>> the right number of physical CPUs on the Arndale.

I figured out what's going on. Interestingly, the problem appeared after
the commit which added support for ticket locks [1] to Xen.

While reverting this patch solves the problem, the root cause is not
a ticket lock bug on ARM (thank god!).

The old spinlock implementation sends an event (via the SEV
instruction) to the other physical CPUs. This wakes up the other
CPUs waiting in the WFE (Wait For Event) instruction.

It appears to be required on the Arndale to boot the secondary CPUs.
However, the behavior differs depending on where I put the sev:
	- sev in the smp_init callback: the CPU does not come up
	- sev before or after arch_cpu_up: the CPU boots, but not in HYP mode [2]

I haven't yet figured out where the "sev" should be placed in order to
get the CPU to boot correctly.

What I don't understand is how the placement of "sev" could determine
whether the secondary processor boots in HYP mode, in kernel mode, or
not at all.

This platform seems very picky, and I don't remember seeing any
documentation about how SMP boot works on this platform. Linux
seems to avoid the SEV for this platform.
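To illustrate why a missing SEV can leave a secondary parked, here is a minimal single-threaded model. It assumes, without confirmation for the Arndale, that the firmware holds the secondary in a WFE loop polling a release mailbox; `release_addr`, `secondary_step()` and `primary_release()` are hypothetical names. Only the WFE/SEV event-register behaviour follows the ARM architecture, and interrupts and other wake-up sources are ignored.

```c
#include <stdbool.h>

#define NR_CPUS 2

/* Per-CPU state for a toy model of the ARM WFE/SEV event register. */
struct cpu_model {
    bool event_reg;   /* set by SEV, consumed by WFE */
    bool asleep;      /* parked in WFE, waiting for a wake-up event */
};

static struct cpu_model cpus[NR_CPUS];

/* Hypothetical mailbox: CPU0 writes the secondary entry point here. */
static unsigned long release_addr;

/* SEV: signal an event to every CPU. A CPU sleeping in WFE wakes up
 * (consuming the event); any other CPU latches it for its next WFE. */
static void model_sev(void)
{
    for (int i = 0; i < NR_CPUS; i++) {
        if (cpus[i].asleep)
            cpus[i].asleep = false;
        else
            cpus[i].event_reg = true;
    }
}

/* WFE: return at once if an event is pending, otherwise go to sleep. */
static void model_wfe(int cpu)
{
    if (cpus[cpu].event_reg)
        cpus[cpu].event_reg = false;
    else
        cpus[cpu].asleep = true;
}

/* One pass of the secondary's hypothetical holding-pen loop:
 *     while (!release_addr) wfe();
 * Returns true once the secondary has left the pen. */
static bool secondary_step(int cpu)
{
    if (cpus[cpu].asleep)
        return false;           /* nothing has woken it yet */
    if (release_addr != 0)
        return true;
    model_wfe(cpu);
    return false;
}

/* CPU0 side: publish the entry point, optionally followed by SEV. */
static void primary_release(unsigned long entry, bool send_sev)
{
    release_addr = entry;
    if (send_sev)
        model_sev();
}
```

In this model, writing the mailbox without an SEV leaves the secondary asleep forever. This mirrors the "CPU1 never came online" symptom, though it says nothing about the HYP mode problem.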

Regards,

[1] e10784ac424405c82accd0542fcc84cf468c53dc "use ticket locks for spin
locks".
[2] - CPU 00000001 booting -
    - Xen must be entered in NS Hyp mode -
    - Boot failed -

-- 
Julien Grall

* Re: [xen-unstable test] 60076: regressions - FAIL
  2015-07-29 14:15     ` Julien Grall
  2015-07-29 18:18       ` Arndale secondary CPU boot issue Was " Julien Grall
@ 2015-07-30  7:48       ` Dario Faggioli
  1 sibling, 0 replies; 12+ messages in thread
From: Dario Faggioli @ 2015-07-30  7:48 UTC (permalink / raw)
  To: Julien Grall; +Cc: xen-devel, osstest service owner, Ian Campbell


On Wed, 2015-07-29 at 15:15 +0100, Julien Grall wrote:
> On 29/07/15 15:10, Julien Grall wrote:

> > osstest waits 40s for the network to come up in the guest. When the
> > test passes, osstest typically waits only ~20s. I measured the
> > time between
> > 
> > guest debian.guest.osstest 5a:36:0e:06:00:20 22 link/ip/tcp: waiting 40s...
> > 
> > and the first
> > 
> > executing ssh ... root@172.16.146.149 echo guest debian.guest.osstest: ok
> > guest debian.guest.osstest: ok
> 
> > For instance see
> > http://logs.test-lab.xenproject.org/osstest/logs/59910/test-armhf-armhf-xl-multivcpu/14.ts-guest-start.log
> 
> FWIW, there is also a worse case, where the waiting time is very close
> to 40s (38s exactly):
> 
> http://logs.test-lab.xenproject.org/osstest/logs/59721/test-armhf-armhf-xl-multivcpu/14.ts-guest-start.log
> 
Exactly my point, together with this:

http://logs.test-lab.xenproject.org/osstest/logs/60076/test-armhf-armhf-xl-multivcpu/arndale-metrocentre---var-log-xen-console-guest-debian.guest.osstest.log

It shows two instances of a full guest boot, which makes sense, as it is
the second attempt that "fails".

Look at the second one and note:
 - that it actually boots fine
 - for some reason, we have:

    [    1.196443] udevd[69]: starting version 175
    Begin: Loading essential drivers ... done.
    Begin: Running /scripts/init-premount ... done.
    Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
    Begin: Running /scripts/local-premount ... done.
    Begin: Running /scripts/local-premount ... done.
    [   20.741128] EXT4-fs (xvda2): mounting ext3 file system using the ext4 subsystem
    [   20.755723] EXT4-fs (xvda2): mounted filesystem with ordered data mode. Opts: (null)
    ... ... ... ...
    [   47.329342] EXT4-fs (xvda2): re-mounted. Opts: (null)
    [....] Checking root file system...fsck from util-linux 2.20.1
    /dev/xvda2: clean, 14689/262144 files, 124109/1048576 blocks
    ... ... ... ...
    [   47.803550] EXT4-fs (xvda2): re-mounted. Opts: errors=remount-ro

   so it looks like it did take quite a while to start. Yes, that's in
   guest time, but still...

In the first instance, we have this:

[    1.221159] udevd[69]: starting version 175
Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
Begin: Running /scripts/local-premount ... done.
[    2.275805] EXT4-fs (xvda2): mounting ext3 file system using the ext4 subsystem
[    2.300418] EXT4-fs (xvda2): mounted filesystem with ordered data mode. Opts: (null)
... ... ... ...
[    5.958201] EXT4-fs (xvda2): re-mounted. Opts: (null)
[....] Checking root file system...fsck from util-linux 2.20.1
... ... ... ...
[    6.424911] EXT4-fs (xvda2): re-mounted. Opts: errors=remount-ro

Then, no, I don't really see why the pre-mount activities (I don't even
know what those are, although I don't think it matters) are already ~10x
slower, and then the mounting and the fsck check ~6x...
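The ratios can be roughly checked from the guest timestamps quoted above. In this small sketch (struct and function names are made up for illustration), the "mounting ext3 file system" line is taken as the start of the root mount, and the "errors=remount-ro" line as the end of the mount/fsck phase. Dividing the absolute mount timestamps gives the first ratio; measuring from the udevd line instead would give a larger one.

```c
#include <stdio.h>

/* Guest kernel timestamps (seconds since guest boot), taken from the
 * two boots in the flight 60076 guest console log. */
struct boot_times {
    double mount_start;   /* "mounting ext3 file system using the ext4 subsystem" */
    double remount_ro;    /* "re-mounted. Opts: errors=remount-ro" */
};

static const struct boot_times first  = { 2.275805,  6.424911 };
static const struct boot_times second = { 20.741128, 47.803550 };

/* How much later (from guest boot) the root mount started. */
static double mount_start_ratio(void)
{
    return second.mount_start / first.mount_start;
}

/* How much longer the mount + fsck + remount phase took. */
static double mount_phase_ratio(void)
{
    return (second.remount_ro - second.mount_start) /
           (first.remount_ro - first.mount_start);
}

static void print_ratios(void)
{
    printf("root mount starts %.1fx later\n", mount_start_ratio());
    printf("mount/fsck phase takes %.1fx longer\n", mount_phase_ratio());
}
```

Calling `print_ratios()` prints "root mount starts 9.1x later" and "mount/fsck phase takes 6.5x longer", which is in line with the ~10x and ~6x estimates.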

The host is certainly overloaded, in terms of number of vcpus vs. number
of pcpus, but it's not as if all those vcpus should be super busy at this
point... Perhaps the host being practically UP matters (I don't think
I've ever actually run Xen on a UP system! :-P)

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

* Re: Arndale secondary CPU boot issue Was Re: [xen-unstable test] 60076: regressions - FAIL
  2015-07-29 18:18       ` Arndale secondary CPU boot issue Was " Julien Grall
@ 2015-07-30  8:55         ` Ian Campbell
  2015-07-30 10:54           ` Stefano Stabellini
  2015-07-30 11:36           ` Julien Grall
  2015-07-30 10:38         ` Andrew Cooper
  1 sibling, 2 replies; 12+ messages in thread
From: Ian Campbell @ 2015-07-30  8:55 UTC (permalink / raw)
  To: Julien Grall, Dario Faggioli, osstest service owner
  Cc: xen-devel, David Vrabel, Stefano Stabellini

On Wed, 2015-07-29 at 19:18 +0100, Julien Grall wrote:

As an aside from the issue you are seeing:

> The old spinlock implementation sends an event (via the SEV
> instruction) to the other physical CPUs. This wakes up the other
> CPUs waiting in the WFE (Wait For Event) instruction.

Uh, I didn't notice this about the new implementation; sorry, I should
have.

IMHO we should investigate (probably with some urgency) inserting a WFE and
SEV pair into the lock/unlock paths, else power consumption will suck.

I think that probably means using something new to replace the cpu_relax()
calls in the spinlocks with a WFE on ARM (we don't just want to change
relax) and adding an arch-specific hook for the SEV on the release path.
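A minimal C11 sketch of that idea follows. It is a user-space illustration, not Xen's spinlock code. The hook names `arch_lock_relax()` and `arch_lock_signal()` are hypothetical, and the ARM path assumes ARMv7 or later. Waiters sleep in WFE instead of spinning. Unlock publishes the new owner, then issues a barrier plus SEV. On x86 the relax hook is PAUSE; other targets fall back to a plain spin.

```c
#include <stdatomic.h>

/* Ticket lock: each locker takes a ticket and waits for its turn. */
typedef struct {
    atomic_uint next;    /* next ticket to hand out */
    atomic_uint owner;   /* ticket currently holding the lock */
} ticket_lock_t;

/* Hypothetical hook used by waiters instead of cpu_relax(). */
static inline void arch_lock_relax(void)
{
#if defined(__arm__) || defined(__aarch64__)
    __asm__ volatile("wfe" ::: "memory");   /* sleep until an event */
#elif defined(__i386__) || defined(__x86_64__)
    __asm__ volatile("pause" ::: "memory"); /* spin-wait hint */
#endif
}

/* Hypothetical hook on the release path: complete the store, then
 * wake any CPUs sitting in WFE. */
static inline void arch_lock_signal(void)
{
#if defined(__arm__) || defined(__aarch64__)
    __asm__ volatile("dsb ish\n\tsev" ::: "memory");
#endif
}

static void ticket_lock(ticket_lock_t *l)
{
    unsigned int me = atomic_fetch_add_explicit(&l->next, 1,
                                                memory_order_relaxed);

    /* If the unlock's SEV lands between this load and the WFE, the
     * event register is already set, so WFE returns immediately. */
    while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
        arch_lock_relax();
}

static void ticket_unlock(ticket_lock_t *l)
{
    unsigned int cur = atomic_load_explicit(&l->owner, memory_order_relaxed);

    atomic_store_explicit(&l->owner, cur + 1, memory_order_release);
    arch_lock_signal();
}
```

Keeping the SEV in a separate release-path hook, rather than inside the relax hook, matches the point above that cpu_relax() itself should not change.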

If it is too late for 4.6 (which would depend on the eventual complexity of
the actual fix) then we should fix this ASAP in 4.7 and backport for 4.6.1.

> It appears to be required on the Arndale to boot the secondary CPUs.
> However, the behavior differs depending on where I put the sev:
> 	- sev in the smp_init callback: the CPU does not come up
> 	- sev before or after arch_cpu_up: the CPU boots, but not in HYP mode [2]
> 
> I haven't yet figured out where the "sev" should be placed in order to
> get the CPU to boot correctly.

Does the arndale end up using 
.cpu_up = cpu_up_send_sgi,
or
.cpu_up = exynos5_cpu_up,
?

> What I don't understand is how the placement of "sev" could determine
> whether the secondary processor boots in HYP mode, in kernel mode, or
> not at all.
> 
> This platform seems very picky, and I don't remember seeing any
> documentation about how SMP boot works on this platform. Linux
> seems to avoid the SEV for this platform.

32-bit Linux has some in common code paths IIRC, which are not always
apparent at first glance.

Ian.

* Re: Arndale secondary CPU boot issue Was Re: [xen-unstable test] 60076: regressions - FAIL
  2015-07-29 18:18       ` Arndale secondary CPU boot issue Was " Julien Grall
  2015-07-30  8:55         ` Ian Campbell
@ 2015-07-30 10:38         ` Andrew Cooper
  1 sibling, 0 replies; 12+ messages in thread
From: Andrew Cooper @ 2015-07-30 10:38 UTC
  To: Julien Grall, Dario Faggioli, osstest service owner
  Cc: David Vrabel, xen-devel, Ian Campbell, Stefano Stabellini

On 29/07/15 19:18, Julien Grall wrote:
> On 29/07/15 15:15, Julien Grall wrote:
>>>> Can it be that things are "just" slow, since we're creating a 4 vcpus
>>>> guest on a 1 pcpu (not so powerful, I guess) host?
>>> The Arndale board has 2 physical CPUs. However, it looks like the
>>> secondary CPU never comes up:
>>>
>>> Jul 28 01:35:39.057076 (XEN) Adding cpu 1 to runqueue 0
>>> Jul 28 01:35:39.057104 (XEN) Bringing up CPU1
>>> Jul 28 01:35:39.064998 (XEN) CPU1 never came online
>>> Jul 28 01:35:40.065133 (XEN) Removing cpu 1 from runqueue 0
>>> Jul 28 01:35:40.065176 (XEN) Failed to bring up CPU 1 (error -5)
>>>
>>> This broke at some point during Xen 4.6 development. Xen 4.5 boots with
>>> the right number of physical CPUs on the Arndale.
> I figured out what's going on. Interestingly, the problem appeared after the
> commit which added ticket lock support [1] to Xen.
>
> Although reverting this patch solves the problem, the source of the
> issue is not a ticket lock bug on ARM (thank god!).

As an aside, why is failing to bring up a CPU not fatal under ARM?

I admit that x86 isn't much better in this regard: it will spin in an
infinite loop waiting for the upcoming CPU to call in, but at least it
doesn't proceed to boot with some CPUs unexpectedly missing.
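On the policy question, a minimal sketch (not Xen's code; `bringup_failure_is_fatal`, `wait_for_cpu()` and `bring_up_cpu()` are hypothetical names) of a bounded wait for a secondary CPU, with a switch deciding whether a CPU that never calls in aborts the boot or is merely reported. The messages mirror the log quoted above; EIO is 5 on Linux, which matches the "error -5".

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical policy knob: does a missing CPU abort the boot? */
static bool bringup_failure_is_fatal = false;

/* Poll the CPU's "online" flag up to max_polls times; the poll count
 * stands in for a real timeout. */
static int wait_for_cpu(volatile const bool *online, long max_polls)
{
    for (long i = 0; i < max_polls; i++)
        if (*online)
            return 0;
    return -EIO;
}

static int bring_up_cpu(int cpu, volatile const bool *online, long max_polls)
{
    int rc = wait_for_cpu(online, max_polls);

    if (rc == 0)
        return 0;
    fprintf(stderr, "CPU%d never came online\n", cpu);
    if (bringup_failure_is_fatal) {
        /* Fatal policy: stop rather than run with fewer CPUs. */
        fprintf(stderr, "panic: failed to bring up CPU %d (error %d)\n", cpu, rc);
        abort();
    }
    /* Non-fatal policy: report and carry on without this CPU. */
    fprintf(stderr, "Failed to bring up CPU %d (error %d)\n", cpu, rc);
    return rc;
}
```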

~Andrew


* Re: Arndale secondary CPU boot issue Was Re: [xen-unstable test] 60076: regressions - FAIL
  2015-07-30  8:55         ` Ian Campbell
@ 2015-07-30 10:54           ` Stefano Stabellini
  2015-07-30 11:27             ` Ian Campbell
  2015-07-30 11:27             ` David Vrabel
  2015-07-30 11:36           ` Julien Grall
  1 sibling, 2 replies; 12+ messages in thread
From: Stefano Stabellini @ 2015-07-30 10:54 UTC
  To: Ian Campbell
  Cc: xen-devel, Stefano Stabellini, Dario Faggioli,
	osstest service owner, Julien Grall, David Vrabel

On Thu, 30 Jul 2015, Ian Campbell wrote:
> On Wed, 2015-07-29 at 19:18 +0100, Julien Grall wrote:
> 
> As an aside from the issue you are seeing:
> 
> > The old spinlock implementation sends an event (via the assembly
> > instruction SEV) to the other physical CPUs. This wakes up the
> > other CPUs waiting on the assembly instruction WFE (Wait For Event).
> 
> Uh, I didn't notice this about the new implementation, sorry I should have
> done.
> 
> IMHO we should investigate (probably with some urgency) inserting a WFE and
> SEV pair into the lock/unlock paths, else power consumption will suck.
> 
> I think that probably means using something new to replace the cpu_relax()
> calls in the spinlocks with a WFE on ARM (we don't just want to change
> relax) and to add an arch-specific hook for the SEV on the release path.

I agree: adding a WFE in cpu_relax() is too risky at this point.


> If it is too late for 4.6 (which would depend on the eventual complexity of
> the actual fix) then we should fix this ASAP in 4.7 and backport for 4.6.1.

I don't think we can release 4.6 without a WFE in the locks. We might
want to consider reverting to spin_locks on ARM (although I am aware
that the code is common at the moment).


* Re: Arndale secondary CPU boot issue Was Re: [xen-unstable test] 60076: regressions - FAIL
  2015-07-30 10:54           ` Stefano Stabellini
@ 2015-07-30 11:27             ` Ian Campbell
  2015-07-30 11:27             ` David Vrabel
  1 sibling, 0 replies; 12+ messages in thread
From: Ian Campbell @ 2015-07-30 11:27 UTC
  To: Stefano Stabellini
  Cc: Julien Grall, Dario Faggioli, xen-devel, osstest service owner,
	David Vrabel

On Thu, 2015-07-30 at 11:54 +0100, Stefano Stabellini wrote:
> On Thu, 30 Jul 2015, Ian Campbell wrote:
> > On Wed, 2015-07-29 at 19:18 +0100, Julien Grall wrote:
> > 
> > As an aside from the issue you are seeing:
> > 
> > > The old spinlock implementation sends an event (via the assembly
> > > instruction SEV) to the other physical CPUs. This wakes up the
> > > other CPUs waiting on the assembly instruction WFE (Wait For Event).
> > 
> > Uh, I didn't notice this about the new implementation, sorry I should have
> > done.
> > 
> > IMHO we should investigate (probably with some urgency) inserting a WFE and
> > SEV pair into the lock/unlock paths, else power consumption will suck.
> > 
> > I think that probably means using something new to replace the cpu_relax()
> > calls in the spinlocks with a WFE on ARM (we don't just want to change
> > relax) and to add an arch-specific hook for the SEV on the release path.
> 
> I agree: adding a WFE in cpu_relax() is too risky at this point.
> 
> 
> > If it is too late for 4.6 (which would depend on the eventual complexity of
> > the actual fix) then we should fix this ASAP in 4.7 and backport for 4.6.1.
> 
> I don't think we can release 4.6 without a WFE in the locks. We might
> want to consider reverting to spin_locks on ARM (although I am aware
> that the code is common at the moment).

It turns out we were missing the WFE even in the old code (I vaguely recall
having to refactor the Linux original to fit in with our arch/common split
and leaving myself a TODO item).

So I don't think we can justify changing this for 4.6. Investigating for
4.7 would be nice. It needs some careful thought about races between the
event bit and the tickets changing, though.
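To make that race concrete, a deterministic single-threaded model (an assumption-laden sketch, not real hardware behaviour) of the interleaving where the lock holder releases and signals after a waiter has checked its ticket but before it executes WFE. On ARM the event register is sticky, so an SEV sent in that window makes the later WFE return immediately; if events were only delivered to CPUs already asleep, the wake-up would be lost. Real WFE can also return spuriously, so the waiter must always re-check in a loop, and 32-bit Linux issues a barrier before the SEV on unlock (dsb_sev()) so the new owner value is visible first. The names `cpu_model`, `model_sev()` and `replay_release_before_wfe()` are hypothetical.

```c
#include <stdbool.h>

/* Deterministic model of the per-CPU event register (not hardware). */
struct cpu_model {
    bool event;    /* the sticky event register */
    bool asleep;   /* true if WFE found no pending event */
};

/* SEV: wake sleeping CPUs; with sticky=true, also latch the event on
 * CPUs that are awake so their next WFE returns immediately. */
static void model_sev(struct cpu_model *cpus, int n, bool sticky)
{
    for (int i = 0; i < n; i++) {
        if (cpus[i].asleep)
            cpus[i].asleep = false;
        else if (sticky)
            cpus[i].event = true;
    }
}

/* Replay: (1) waiter sees it is not its turn; (2) holder does owner++
 * then SEV; (3) only now does the waiter execute WFE.
 * Returns true if the waiter re-checks and acquires the lock, false if
 * it would sleep with nobody left to wake it. */
static bool replay_release_before_wfe(bool sticky)
{
    struct cpu_model cpus[2] = { { false, false }, { false, false } };
    struct cpu_model *waiter = &cpus[1];
    unsigned owner = 0, my_ticket = 1;

    if (owner == my_ticket)              /* step 1 */
        return true;

    owner++;                             /* step 2: release ... */
    model_sev(cpus, 2, sticky);          /* ... then signal */

    if (waiter->event) {                 /* step 3: WFE */
        waiter->event = false;           /* returns at once */
        return owner == my_ticket;       /* re-check succeeds */
    }
    waiter->asleep = true;               /* lost wake-up */
    return false;
}
```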

Ian.


* Re: Arndale secondary CPU boot issue Was Re: [xen-unstable test] 60076: regressions - FAIL
  2015-07-30 10:54           ` Stefano Stabellini
  2015-07-30 11:27             ` Ian Campbell
@ 2015-07-30 11:27             ` David Vrabel
  1 sibling, 0 replies; 12+ messages in thread
From: David Vrabel @ 2015-07-30 11:27 UTC
  To: Stefano Stabellini, Ian Campbell
  Cc: Julien Grall, Dario Faggioli, xen-devel, osstest service owner

On 30/07/15 11:54, Stefano Stabellini wrote:
> On Thu, 30 Jul 2015, Ian Campbell wrote:
>> On Wed, 2015-07-29 at 19:18 +0100, Julien Grall wrote:
>>
>> As an aside from the issue you are seeing:
>>
>>> The old spinlock implementation sends an event (via the assembly
>>> instruction SEV) to the other physical CPUs. This wakes up the
>>> other CPUs waiting on the assembly instruction WFE (Wait For Event).
>>
>> Uh, I didn't notice this about the new implementation, sorry I should have
>> done.
>>
>> IMHO we should investigate (probably with some urgency) inserting a WFE and
>> SEV pair into the lock/unlock paths, else power consumption will suck.
>>
>> I think that probably means using something new to replace the cpu_relax()
>> calls in the spinlocks with a WFE on ARM (we don't just want to change
>> relax) and to add an arch-specific hook for the SEV on the release path.
> 
> I agree: adding a WFE in cpu_relax() is too risky at this point.

WFE in cpu_relax() would be broken.

However, adding two hooks, spin_relax() (used instead of cpu_relax())
and spin_signal(), which do the WFE/SEV, seems low risk to me.

For x86 use:

#define spin_relax() cpu_relax()
#define spin_signal()
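A minimal sketch of that suggestion, written as a user-space C11 ticket lock rather than Xen's actual implementation. The hook names spin_relax() and spin_signal() are David's; their ARM bodies (WFE, and a DSB followed by SEV, mirroring the barrier-then-SEV order of 32-bit Linux's unlock) and the `ticketlock_t`, `ticket_lock()` and `ticket_unlock()` names are assumptions for illustration. The x86 variant matches the definitions above, using PAUSE as cpu_relax() does.

```c
#include <stdatomic.h>

/* Hook names from the proposal; the ARM bodies are an assumption. */
#if defined(__arm__) || defined(__aarch64__)
#define spin_relax()  __asm__ volatile("wfe" ::: "memory")
#define spin_signal() __asm__ volatile("dsb sy\n\tsev" ::: "memory")
#elif defined(__x86_64__) || defined(__i386__)
#define spin_relax()  __asm__ volatile("pause" ::: "memory") /* cpu_relax() */
#define spin_signal() do { } while (0)
#else
#define spin_relax()  do { } while (0)
#define spin_signal() do { } while (0)
#endif

typedef struct {
    atomic_uint next;    /* next ticket to hand out */
    atomic_uint owner;   /* ticket currently allowed in */
} ticketlock_t;

#define TICKETLOCK_INIT { 0, 0 }

static void ticket_lock(ticketlock_t *l)
{
    unsigned me = atomic_fetch_add_explicit(&l->next, 1, memory_order_relaxed);

    /* Re-check after every wake-up: WFE may return for unrelated events. */
    while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
        spin_relax();
}

static void ticket_unlock(ticketlock_t *l)
{
    unsigned cur = atomic_load_explicit(&l->owner, memory_order_relaxed);

    atomic_store_explicit(&l->owner, cur + 1, memory_order_release);
    spin_signal();   /* on ARM: make the store visible, then wake WFE waiters */
}
```

Keeping cpu_relax() unchanged matters because other busy-wait loops that call it poll conditions whose writers never issue SEV; a WFE there could sleep until some unrelated event or interrupt arrives.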

>> If it is too late for 4.6 (which would depend on the eventual complexity of
>> the actual fix) then we should fix this ASAP in 4.7 and backport for 4.6.1.
> 
> I don't think we can release 4.6 without a WFE in the locks. We might
> want to consider reverting to spin_locks on ARM (although I am aware
> that the code is common at the moment).

You can't revert the ticket locks for one architecture only.

David


* Re: Arndale secondary CPU boot issue Was Re: [xen-unstable test] 60076: regressions - FAIL
  2015-07-30  8:55         ` Ian Campbell
  2015-07-30 10:54           ` Stefano Stabellini
@ 2015-07-30 11:36           ` Julien Grall
  1 sibling, 0 replies; 12+ messages in thread
From: Julien Grall @ 2015-07-30 11:36 UTC
  To: Ian Campbell, Dario Faggioli, osstest service owner
  Cc: xen-devel, David Vrabel, Stefano Stabellini

Hi Ian,

On 30/07/15 09:55, Ian Campbell wrote:
> On Wed, 2015-07-29 at 19:18 +0100, Julien Grall wrote:
>> It appears to be required on the Arndale to boot the secondary CPUs.
>> However, depending on where I put the sev, I get different behavior:
>> 	- sev in the smp_init callback: the CPU does not come up
>> 	- sev before or after arch_cpu_up: the CPU boots, but not in HYP mode [2]
>>
>> I haven't yet figured out where the "sev" should be placed in order to
>> get the CPU to boot correctly.
> 
> Does the arndale end up using 
> .cpu_up = cpu_up_send_sgi,
> or
> .cpu_up = exynos5_cpu_up,
> ?

The first one, given that the Arndale is an Exynos5250. I'm not sure why
we didn't use exynos5_cpu_up, although using it doesn't seem to fix the
problem either.
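A simplified sketch of how a per-board `.cpu_up` hook is typically selected: the board is matched by its device-tree compatible string and the matching descriptor's hook is called. The struct layout, the `board_cpu_up()` helper and the second table entry are hypothetical and do not reproduce Xen's platform code; only the two hook names come from this thread, and their bodies here merely record which one ran.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified platform descriptor (not Xen's struct). */
struct platform_desc {
    const char *compatible;     /* device-tree compatible string */
    int (*cpu_up)(int cpu);     /* optional board-specific hook */
};

static const char *last_method; /* records which hook ran */

/* Generic method: kick all secondaries, rely on the holding pen. */
static int cpu_up_send_sgi(int cpu)
{
    (void)cpu;
    last_method = "send_sgi";
    return 0;
}

/* Board-specific method (e.g. powering the core on first). */
static int exynos5_cpu_up(int cpu)
{
    (void)cpu;
    last_method = "exynos5";
    return 0;
}

/* Illustrative table; the second compatible string is made up. */
static const struct platform_desc platforms[] = {
    { "samsung,exynos5250", cpu_up_send_sgi },
    { "example,other-exynos5", exynos5_cpu_up },
};

/* Find the board by compatible string and dispatch to its hook. */
static int board_cpu_up(const char *compatible, int cpu)
{
    for (size_t i = 0; i < sizeof(platforms) / sizeof(platforms[0]); i++)
        if (strcmp(platforms[i].compatible, compatible) == 0)
            return platforms[i].cpu_up ? platforms[i].cpu_up(cpu) : -ENODEV;
    return -ENODEV;
}
```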

>> What I don't understand is how the placement of "sev" could decide whether
>> the secondary processor boots in HYP mode, in kernel mode, or not at all.
>>
>> This platform seems very picky, and I don't remember having any
>> documentation about how SMP boot works on this platform. Linux
>> seems to avoid the SEV for this platform.
> 
> 32-bit Linux has some SEVs in common code paths IIRC, which are not always
> apparent at first glance.

Good to know.

Regards,

-- 
Julien Grall


end of thread, other threads:[~2015-07-30 11:36 UTC | newest]

Thread overview: 12+ messages
2015-07-29  6:42 [xen-unstable test] 60076: regressions - FAIL osstest service owner
2015-07-29  9:05 ` Dario Faggioli
2015-07-29 14:10   ` Julien Grall
2015-07-29 14:15     ` Julien Grall
2015-07-29 18:18       ` Arndale secondary CPU boot issue Was " Julien Grall
2015-07-30  8:55         ` Ian Campbell
2015-07-30 10:54           ` Stefano Stabellini
2015-07-30 11:27             ` Ian Campbell
2015-07-30 11:27             ` David Vrabel
2015-07-30 11:36           ` Julien Grall
2015-07-30 10:38         ` Andrew Cooper
2015-07-30  7:48       ` Dario Faggioli
