* [xen-unstable test] 60076: regressions - FAIL
@ 2015-07-29 6:42 osstest service owner
2015-07-29 9:05 ` Dario Faggioli
0 siblings, 1 reply; 12+ messages in thread
From: osstest service owner @ 2015-07-29 6:42 UTC (permalink / raw)
To: xen-devel, osstest-admin
flight 60076 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/60076/
Regressions :-(
Tests which did not succeed and are blocking,
including tests which could not be run:
test-amd64-amd64-xl-qemuu-ovmf-amd64 9 debian-hvm-install fail REGR. vs. 59817
test-armhf-armhf-xl-multivcpu 14 guest-start.2 fail REGR. vs. 59817
test-amd64-i386-xl-qemuu-ovmf-amd64 9 debian-hvm-install fail REGR. vs. 59817
Regressions which are regarded as allowable (not blocking):
test-amd64-i386-rumpuserxen-i386 15 rumpuserxen-demo-xenstorels/xenstorels.repeat fail REGR. vs. 59817
test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 15 guest-localmigrate/x10 fail blocked in 59817
test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install fail like 59817
test-armhf-armhf-xl-rtds 11 guest-start fail like 59817
test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail like 59817
test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail like 59817
Tests which did not succeed, but are not blocking:
test-armhf-armhf-xl-arndale 12 migrate-support-check fail never pass
test-amd64-amd64-xl-pvh-amd 11 guest-start fail never pass
test-armhf-armhf-xl-xsm 12 migrate-support-check fail never pass
test-amd64-i386-libvirt-xsm 12 migrate-support-check fail never pass
test-armhf-armhf-xl 12 migrate-support-check fail never pass
test-amd64-i386-libvirt 12 migrate-support-check fail never pass
test-armhf-armhf-libvirt 12 migrate-support-check fail never pass
test-armhf-armhf-xl-multivcpu 12 migrate-support-check fail never pass
test-amd64-amd64-libvirt 12 migrate-support-check fail never pass
test-amd64-amd64-xl-pvh-intel 11 guest-start fail never pass
test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail never pass
test-armhf-armhf-xl-cubietruck 12 migrate-support-check fail never pass
test-armhf-armhf-xl-credit2 12 migrate-support-check fail never pass
test-armhf-armhf-libvirt-xsm 12 migrate-support-check fail never pass
test-amd64-amd64-libvirt-xsm 12 migrate-support-check fail never pass
test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail never pass
version targeted for testing:
xen 3e791ccb1d1d036ed25e880b1ef72ea8dcabe43a
baseline version:
xen 7c60c2da3160766a265cb84c7411ff2c9cbd8d0b
Last test of basis 59817 2015-07-22 07:29:29 Z 6 days
Failing since 59833 2015-07-23 10:56:30 Z 5 days 4 attempts
Testing same since 60076 2015-07-28 12:22:00 Z 0 days 1 attempts
------------------------------------------------------------
People who touched revisions under test:
Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper <andrew.cooper3@citrix.com> for the x86 bits.
Chao Peng <chao.p.peng@linux.intel.com>
Chris (Christopher) Brand <chris.brand@broadcom.com>
Chris Brand <chris.brand@broadcom.com>
Daniel De Graaf <dgdegra@tycho.nsa.gov>
Dario Faggioli <dario.faggioli@citrix.com>
Ed White <edmund.h.white@intel.com>
George Dunlap <george.dunlap@eu.citrix.com>
Ian Campbell <ian.campbell@citrix.com>
Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich <jbeulich@suse.com>
Jonathan Creekmore <jonathan.creekmore@gmail.com>
Juergen Gross <jgross@suse.com>
Julien Grall <julien.grall@citrix.com>
Jun Nakajima <jun.nakajima@intel.com>
Kevin Tian <kevin.tian@intel.com>
Martin Lucina <martin@lucina.net>
Ravi Sahita <ravi.sahita@intel.com>
Roger Pau Monné <roger.pau@citrix.com>
Tamas K Lengyel <tlengyel@novetta.com>
Tiejun Chen <tiejun.chen@intel.com>
Wei Liu <wei.liu2@citrix.com>
jobs:
build-amd64-xsm pass
build-armhf-xsm pass
build-i386-xsm pass
build-amd64 pass
build-armhf pass
build-i386 pass
build-amd64-libvirt pass
build-armhf-libvirt pass
build-i386-libvirt pass
build-amd64-oldkern pass
build-i386-oldkern pass
build-amd64-pvops pass
build-armhf-pvops pass
build-i386-pvops pass
build-amd64-rumpuserxen pass
build-i386-rumpuserxen pass
test-amd64-amd64-xl pass
test-armhf-armhf-xl pass
test-amd64-i386-xl pass
test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm pass
test-amd64-i386-xl-qemut-debianhvm-amd64-xsm pass
test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm pass
test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm pass
test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm fail
test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm fail
test-amd64-amd64-libvirt-xsm pass
test-armhf-armhf-libvirt-xsm pass
test-amd64-i386-libvirt-xsm pass
test-amd64-amd64-xl-xsm pass
test-armhf-armhf-xl-xsm pass
test-amd64-i386-xl-xsm pass
test-amd64-amd64-xl-pvh-amd fail
test-amd64-i386-qemut-rhel6hvm-amd pass
test-amd64-i386-qemuu-rhel6hvm-amd pass
test-amd64-amd64-xl-qemut-debianhvm-amd64 pass
test-amd64-i386-xl-qemut-debianhvm-amd64 pass
test-amd64-amd64-xl-qemuu-debianhvm-amd64 pass
test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
test-amd64-i386-freebsd10-amd64 pass
test-amd64-amd64-xl-qemuu-ovmf-amd64 fail
test-amd64-i386-xl-qemuu-ovmf-amd64 fail
test-amd64-amd64-rumpuserxen-amd64 pass
test-amd64-amd64-xl-qemut-win7-amd64 fail
test-amd64-i386-xl-qemut-win7-amd64 fail
test-amd64-amd64-xl-qemuu-win7-amd64 fail
test-amd64-i386-xl-qemuu-win7-amd64 fail
test-armhf-armhf-xl-arndale pass
test-amd64-amd64-xl-credit2 pass
test-armhf-armhf-xl-credit2 pass
test-armhf-armhf-xl-cubietruck pass
test-amd64-i386-freebsd10-i386 pass
test-amd64-i386-rumpuserxen-i386 fail
test-amd64-amd64-xl-pvh-intel fail
test-amd64-i386-qemut-rhel6hvm-intel pass
test-amd64-i386-qemuu-rhel6hvm-intel pass
test-amd64-amd64-libvirt pass
test-armhf-armhf-libvirt pass
test-amd64-i386-libvirt pass
test-amd64-amd64-xl-multivcpu pass
test-armhf-armhf-xl-multivcpu fail
test-amd64-amd64-pair pass
test-amd64-i386-pair pass
test-amd64-amd64-xl-rtds pass
test-armhf-armhf-xl-rtds fail
test-amd64-i386-xl-qemut-winxpsp3-vcpus1 pass
test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 pass
test-amd64-amd64-xl-qemut-winxpsp3 pass
test-amd64-i386-xl-qemut-winxpsp3 pass
test-amd64-amd64-xl-qemuu-winxpsp3 pass
test-amd64-i386-xl-qemuu-winxpsp3 pass
------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images
Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs
Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master
Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary
Not pushing.
(No revision log; it would be 1573 lines long.)
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
* Re: [xen-unstable test] 60076: regressions - FAIL
2015-07-29 6:42 [xen-unstable test] 60076: regressions - FAIL osstest service owner
@ 2015-07-29 9:05 ` Dario Faggioli
2015-07-29 14:10 ` Julien Grall
0 siblings, 1 reply; 12+ messages in thread
From: Dario Faggioli @ 2015-07-29 9:05 UTC (permalink / raw)
To: osstest service owner; +Cc: Julien Grall, xen-devel, Ian Campbell
On Wed, 2015-07-29 at 06:42 +0000, osstest service owner wrote:
> flight 60076 xen-unstable real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/60076/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
> test-amd64-amd64-xl-qemuu-ovmf-amd64 9 debian-hvm-install fail REGR. vs. 59817
> test-armhf-armhf-xl-multivcpu 14 guest-start.2 fail REGR. vs. 59817
>
I had a quick look at the logs and didn't spot any obvious issues.
AFAICT, it was actually working:
--- ---
http://logs.test-lab.xenproject.org/osstest/logs/60076/test-armhf-armhf-xl-multivcpu/serial-arndale-metrocentre.log
Jul 28 20:22:21.525058 [ 623.706988] device vif2.0 entered promiscuous mode
Jul 28 20:22:21.669108 [ 623.713782] IPv6: ADDRCONF(NETDEV_UP): vif2.0: link is not ready
Jul 28 20:22:21.677039 [ 625.296200] xen-blkback:ring-ref 8, event-channel 3, protocol 1 (arm-abi) persistent grants
Jul 28 20:22:23.261086 [ 625.325256] xen-blkback:ring-ref 9, event-channel 4, protocol 1 (arm-abi) persistent grants
Jul 28 20:22:23.293017 [ 625.400219] IPv6: ADDRCONF(NETDEV_CHANGE): vif2.0: link becomes ready
Jul 28 20:22:23.365065 [ 625.405368] xenbr0: port 2(vif2.0) entered forwarding state
Jul 28 20:22:23.365110 [ 625.410948] xenbr0: port 2(vif2.0) entered forwarding state
http://logs.test-lab.xenproject.org/osstest/logs/60076/test-armhf-armhf-xl-multivcpu/arndale-metrocentre---var-log-xen-console-guest-debian.guest.osstest.log
INIT: Entering runlevel: 2
[^[[36minfo^[[39;49m] Using makefile-style concurrent boot in runlevel 2.
[....] Starting enhanced syslogd: rsyslogd^[[?25l^[[?1c^[7^[[1G[^[[32m ok ^[[39;49m^[8^[[?25h^[[?0c.
[....] Starting periodic command scheduler: cron^[[?25l^[[?1c^[7^[[1G[^[[32m ok ^[[39;49m^[8^[[?25h^[[?0c.
[....] Starting OpenBSD Secure Shell server: sshd^[[?25l^[[?1c^[7^[[1G[^[[32m ok ^[[39;49m^[8^[[?25h^[[?0c.
^[[r^[[H^[[J
^[[r^[[H^[[J
Debian GNU/Linux 7 debian hvc0
debian login: Debian GNU/Linux 7 debian hvc0
debian login:
--- ---
Can it be that things are "just" slow, since we're creating a 4-vcpu
guest on a 1-pcpu (not so powerful, I guess) host?
http://logs.test-lab.xenproject.org/osstest/logs/60076/test-armhf-armhf-xl-multivcpu/arndale-metrocentre-output-xl_info_-n
cpu_topology           :
cpu:    core    socket     node
  0:       0        0        0
http://logs.test-lab.xenproject.org/osstest/logs/60076/test-armhf-armhf-xl-multivcpu/arndale-metrocentre-output-xl_vcpu-list
Name                    ID  VCPU  CPU  State  Time(s)  Affinity (Hard / Soft)
Domain-0                 0     0    0  r--      337.5  all / all
debian.guest.osstest     2     0    0  ---       13.5  all / all
debian.guest.osstest     2     1    0  ---       12.9  all / all
debian.guest.osstest     2     2    0  ---       12.2  all / all
debian.guest.osstest     2     3    0  ---       12.5  all / all
If I missed something, and this is just completely off... sorry for the
noise. :-)
Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: [xen-unstable test] 60076: regressions - FAIL
2015-07-29 9:05 ` Dario Faggioli
@ 2015-07-29 14:10 ` Julien Grall
2015-07-29 14:15 ` Julien Grall
0 siblings, 1 reply; 12+ messages in thread
From: Julien Grall @ 2015-07-29 14:10 UTC (permalink / raw)
To: Dario Faggioli, osstest service owner; +Cc: xen-devel, Ian Campbell
Hi Dario,
On 29/07/15 10:05, Dario Faggioli wrote:
> On Wed, 2015-07-29 at 06:42 +0000, osstest service owner wrote:
>> flight 60076 xen-unstable real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/60076/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>> test-amd64-amd64-xl-qemuu-ovmf-amd64 9 debian-hvm-install fail REGR. vs. 59817
>> test-armhf-armhf-xl-multivcpu 14 guest-start.2 fail REGR. vs. 59817
>>
> I gave a quick look at the logs, and didn't spot any obvious issues.
>
> AFAICT, it seems it was actually working:
>
> --- ---
> http://logs.test-lab.xenproject.org/osstest/logs/60076/test-armhf-armhf-xl-multivcpu/serial-arndale-metrocentre.log
> Jul 28 20:22:21.525058 [ 623.706988] device vif2.0 entered promiscuous mode
>
> Jul 28 20:22:21.669108 [ 623.713782] IPv6: ADDRCONF(NETDEV_UP): vif2.0: link is not ready
>
> Jul 28 20:22:21.677039 [ 625.296200] xen-blkback:ring-ref 8, event-channel 3, protocol 1 (arm-abi) persistent grants
>
> Jul 28 20:22:23.261086 [ 625.325256] xen-blkback:ring-ref 9, event-channel 4, protocol 1 (arm-abi) persistent grants
>
> Jul 28 20:22:23.293017 [ 625.400219] IPv6: ADDRCONF(NETDEV_CHANGE): vif2.0: link becomes ready
>
> Jul 28 20:22:23.365065 [ 625.405368] xenbr0: port 2(vif2.0) entered forwarding state
>
> Jul 28 20:22:23.365110 [ 625.410948] xenbr0: port 2(vif2.0) entered forwarding state
>
> http://logs.test-lab.xenproject.org/osstest/logs/60076/test-armhf-armhf-xl-multivcpu/arndale-metrocentre---var-log-xen-console-guest-debian.guest.osstest.log
> INIT: Entering runlevel: 2
>
> [^[[36minfo^[[39;49m] Using makefile-style concurrent boot in runlevel 2.
> [....] Starting enhanced syslogd: rsyslogd^[[?25l^[[?1c^[7^[[1G[^[[32m ok ^[[39;49m^[8^[[?25h^[[?0c.
> [....] Starting periodic command scheduler: cron^[[?25l^[[?1c^[7^[[1G[^[[32m ok ^[[39;49m^[8^[[?25h^[[?0c.
> [....] Starting OpenBSD Secure Shell server: sshd^[[?25l^[[?1c^[7^[[1G[^[[32m ok ^[[39;49m^[8^[[?25h^[[?0c.
> ^[[r^[[H^[[J
>
> ^[[r^[[H^[[J
>
> Debian GNU/Linux 7 debian hvc0
>
> debian login: Debian GNU/Linux 7 debian hvc0
>
> debian login:
>
> --- ---
>
> Can it be that things are "just" slow, since we're creating a 4 vcpus
> guest on a 1 pcpu (not so powerful, I guess) host?
The Arndale board has 2 physical CPUs, although it looks like the
secondary CPU never comes up:
Jul 28 01:35:39.057076 (XEN) Adding cpu 1 to runqueue 0
Jul 28 01:35:39.057104 (XEN) Bringing up CPU1
Jul 28 01:35:39.064998 (XEN) CPU1 never came online
Jul 28 01:35:40.065133 (XEN) Removing cpu 1 from runqueue 0
Jul 28 01:35:40.065176 (XEN) Failed to bring up CPU 1 (error -5)
This broke at some point during Xen 4.6 development. Xen 4.5 boots with
the right number of physical CPUs on the Arndale.
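A quick way to spot this kind of failure in a serial log is to scan for the final failure message. The following is only a sketch: `failed_cpus` and `SAMPLE` are names made up for illustration, and the pattern is based solely on the log lines quoted above, so it may need adjusting for other Xen versions.

```python
import re

# Matches lines such as "(XEN) Failed to bring up CPU 1 (error -5)".
# The pattern is an assumption derived from the excerpt quoted above.
FAILED_CPU = re.compile(r"\(XEN\) Failed to bring up CPU (\d+) \(error (-?\d+)\)")


def failed_cpus(log_text):
    """Return {cpu_number: error_code} for every failed bring-up in the log."""
    failures = {}
    for line in log_text.splitlines():
        match = FAILED_CPU.search(line)
        if match:
            failures[int(match.group(1))] = int(match.group(2))
    return failures


# The serial log excerpt from this message, used as sample input.
SAMPLE = """\
Jul 28 01:35:39.057076 (XEN) Adding cpu 1 to runqueue 0
Jul 28 01:35:39.057104 (XEN) Bringing up CPU1
Jul 28 01:35:39.064998 (XEN) CPU1 never came online
Jul 28 01:35:40.065133 (XEN) Removing cpu 1 from runqueue 0
Jul 28 01:35:40.065176 (XEN) Failed to bring up CPU 1 (error -5)
"""

print(failed_cpus(SAMPLE))  # {1: -5}
```

An empty result means no bring-up failure was logged, which is what one would expect from a healthy boot.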
Nonetheless, we are aware of the multi-vcpu test failing from time to time
on the Arndale. It seems to happen only with xen-unstable.
osstest waits up to 40s for the network to come up in the guest. When the
test passes, osstest likely waits ~20s. I measured the
time between
guest debian.guest.osstest 5a:36:0e:06:00:20 22 link/ip/tcp: waiting 40s...
and the first
executing ssh ... root@172.16.146.149 echo guest debian.guest.osstest: ok
guest debian.guest.osstest: ok
For instance, see
http://logs.test-lab.xenproject.org/osstest/logs/59910/test-armhf-armhf-xl-multivcpu/14.ts-guest-start.log
I will do more testing once we get the 2-pCPU case fixed.
Regards,
--
Julien Grall
* Re: [xen-unstable test] 60076: regressions - FAIL
2015-07-29 14:10 ` Julien Grall
@ 2015-07-29 14:15 ` Julien Grall
2015-07-29 18:18 ` Arndale secondary CPU boot issue Was " Julien Grall
2015-07-30 7:48 ` Dario Faggioli
0 siblings, 2 replies; 12+ messages in thread
From: Julien Grall @ 2015-07-29 14:15 UTC (permalink / raw)
To: Dario Faggioli, osstest service owner; +Cc: xen-devel, Ian Campbell
On 29/07/15 15:10, Julien Grall wrote:
> Hi Dario,
>
> On 29/07/15 10:05, Dario Faggioli wrote:
>> On Wed, 2015-07-29 at 06:42 +0000, osstest service owner wrote:
>>> flight 60076 xen-unstable real [real]
>>> http://logs.test-lab.xenproject.org/osstest/logs/60076/
>>>
>>> Regressions :-(
>>>
>>> Tests which did not succeed and are blocking,
>>> including tests which could not be run:
>>> test-amd64-amd64-xl-qemuu-ovmf-amd64 9 debian-hvm-install fail REGR. vs. 59817
>>> test-armhf-armhf-xl-multivcpu 14 guest-start.2 fail REGR. vs. 59817
>>>
>> I gave a quick look at the logs, and didn't spot any obvious issues.
>>
>> AFAICT, it seems it was actually working:
>>
>> --- ---
>> http://logs.test-lab.xenproject.org/osstest/logs/60076/test-armhf-armhf-xl-multivcpu/serial-arndale-metrocentre.log
>> Jul 28 20:22:21.525058 [ 623.706988] device vif2.0 entered promiscuous mode
>>
>> Jul 28 20:22:21.669108 [ 623.713782] IPv6: ADDRCONF(NETDEV_UP): vif2.0: link is not ready
>>
>> Jul 28 20:22:21.677039 [ 625.296200] xen-blkback:ring-ref 8, event-channel 3, protocol 1 (arm-abi) persistent grants
>>
>> Jul 28 20:22:23.261086 [ 625.325256] xen-blkback:ring-ref 9, event-channel 4, protocol 1 (arm-abi) persistent grants
>>
>> Jul 28 20:22:23.293017 [ 625.400219] IPv6: ADDRCONF(NETDEV_CHANGE): vif2.0: link becomes ready
>>
>> Jul 28 20:22:23.365065 [ 625.405368] xenbr0: port 2(vif2.0) entered forwarding state
>>
>> Jul 28 20:22:23.365110 [ 625.410948] xenbr0: port 2(vif2.0) entered forwarding state
>>
>> http://logs.test-lab.xenproject.org/osstest/logs/60076/test-armhf-armhf-xl-multivcpu/arndale-metrocentre---var-log-xen-console-guest-debian.guest.osstest.log
>> INIT: Entering runlevel: 2
>>
>> [^[[36minfo^[[39;49m] Using makefile-style concurrent boot in runlevel 2.
>> [....] Starting enhanced syslogd: rsyslogd^[[?25l^[[?1c^[7^[[1G[^[[32m ok ^[[39;49m^[8^[[?25h^[[?0c.
>> [....] Starting periodic command scheduler: cron^[[?25l^[[?1c^[7^[[1G[^[[32m ok ^[[39;49m^[8^[[?25h^[[?0c.
>> [....] Starting OpenBSD Secure Shell server: sshd^[[?25l^[[?1c^[7^[[1G[^[[32m ok ^[[39;49m^[8^[[?25h^[[?0c.
>> ^[[r^[[H^[[J
>>
>> ^[[r^[[H^[[J
>>
>> Debian GNU/Linux 7 debian hvc0
>>
>> debian login: Debian GNU/Linux 7 debian hvc0
>>
>> debian login:
>>
>> --- ---
>>
>> Can it be that things are "just" slow, since we're creating a 4 vcpus
>> guest on a 1 pcpu (not so powerful, I guess) host?
>
> The arndale board has a 2 physical CPUs. Although it looks like that the
> secondary cpu is never coming up:
>
> Jul 28 01:35:39.057076 (XEN) Adding cpu 1 to runqueue 0
> Jul 28 01:35:39.057104 (XEN) Bringing up CPU1
> Jul 28 01:35:39.064998 (XEN) CPU1 never came online
> Jul 28 01:35:40.065133 (XEN) Removing cpu 1 from runqueue 0
> Jul 28 01:35:40.065176 (XEN) Failed to bring up CPU 1 (error -5)
>
> This has been broken at some point in Xen 4.6. Xen 4.5 is booting with
> the right number of physical on the Arndale.
>
> Nonetheless, we are aware on the multi-vcpu test failing time to time on
> the arndale. It seems only happen with Xen-unstable.
>
> osstest is waiting 40s to get the network ready in the guest. When the
> test pass, the osstest is likely waiting ~20s to pass it. I took the
> time between
>
> guest debian.guest.osstest 5a:36:0e:06:00:20 22 link/ip/tcp: waiting 40s...
>
> and the first
>
> executing ssh ... root@172.16.146.149 echo guest debian.guest.osstest: ok
> guest debian.guest.osstest: ok
> For instance see
> http://logs.test-lab.xenproject.org/osstest/logs/59910/test-armhf-armhf-xl-multivcpu/14.ts-guest-start.log
FWIW, there is also a worst case where the waiting time is very close to 40s
(exactly 38s):
http://logs.test-lab.xenproject.org/osstest/logs/59721/test-armhf-armhf-xl-multivcpu/14.ts-guest-start.log
--
Julien Grall
* Arndale secondary CPU boot issue Was Re: [xen-unstable test] 60076: regressions - FAIL
2015-07-29 14:15 ` Julien Grall
@ 2015-07-29 18:18 ` Julien Grall
2015-07-30 8:55 ` Ian Campbell
2015-07-30 10:38 ` Andrew Cooper
2015-07-30 7:48 ` Dario Faggioli
1 sibling, 2 replies; 12+ messages in thread
From: Julien Grall @ 2015-07-29 18:18 UTC (permalink / raw)
To: Dario Faggioli, osstest service owner
Cc: xen-devel, Ian Campbell, Stefano Stabellini, David Vrabel
On 29/07/15 15:15, Julien Grall wrote:
>>> Can it be that things are "just" slow, since we're creating a 4 vcpus
>>> guest on a 1 pcpu (not so powerful, I guess) host?
>>
>> The arndale board has a 2 physical CPUs. Although it looks like that the
>> secondary cpu is never coming up:
>>
>> Jul 28 01:35:39.057076 (XEN) Adding cpu 1 to runqueue 0
>> Jul 28 01:35:39.057104 (XEN) Bringing up CPU1
>> Jul 28 01:35:39.064998 (XEN) CPU1 never came online
>> Jul 28 01:35:40.065133 (XEN) Removing cpu 1 from runqueue 0
>> Jul 28 01:35:40.065176 (XEN) Failed to bring up CPU 1 (error -5)
>>
>> This has been broken at some point in Xen 4.6. Xen 4.5 is booting with
>> the right number of physical on the Arndale.
I figured out what's going on. Interestingly, the problem appeared with the
commit which added ticket lock support [1] to Xen.
While reverting this patch solves the problem, the root cause is not
a ticket lock bug on ARM (thank god!).
The old spinlock implementation sent an event (via the assembly
instruction SEV) to the other physical CPUs. This wakes up the
other CPUs waiting on the assembly instruction WFE (Wait For Event).
This appears to be required on the Arndale to boot secondary CPUs.
However, depending on where I put the sev, I get different behavior:
- sev in smp_init callback: the CPU is not coming up
- sev before or after arch_cpu_up: the CPU is booting but not in HYP
mode [2]
I haven't yet figured out where the "sev" should be placed in order to
get the CPU to boot correctly.
What I don't understand is how the placement of "sev" could cause the
secondary processor to boot in HYP mode, in kernel mode, or not at all.
This platform seems very picky, and I don't remember seeing any
documentation about how SMP boot works on it. Linux
seems to avoid the SEV for this platform.
Regards,
[1] e10784ac424405c82accd0542fcc84cf468c53dc "use ticket locks for spin
locks".
[2] - CPU 00000001 booting -
- Xen must be entered in NS Hyp mode -
- Boot failed -
--
Julien Grall
* Re: [xen-unstable test] 60076: regressions - FAIL
2015-07-29 14:15 ` Julien Grall
2015-07-29 18:18 ` Arndale secondary CPU boot issue Was " Julien Grall
@ 2015-07-30 7:48 ` Dario Faggioli
1 sibling, 0 replies; 12+ messages in thread
From: Dario Faggioli @ 2015-07-30 7:48 UTC (permalink / raw)
To: Julien Grall; +Cc: xen-devel, osstest service owner, Ian Campbell
On Wed, 2015-07-29 at 15:15 +0100, Julien Grall wrote:
> On 29/07/15 15:10, Julien Grall wrote:
> > osstest is waiting 40s to get the network ready in the guest. When the
> > test pass, the osstest is likely waiting ~20s to pass it. I took the
> > time between
> >
> > guest debian.guest.osstest 5a:36:0e:06:00:20 22 link/ip/tcp: waiting 40s...
> >
> > and the first
> >
> > executing ssh ... root@172.16.146.149 echo guest debian.guest.osstest: ok
> > guest debian.guest.osstest: ok
>
> > For instance see
> > http://logs.test-lab.xenproject.org/osstest/logs/59910/test-armhf-armhf-xl-multivcpu/14.ts-guest-start.log
>
> FWIW, there is also worth case where the waiting time very close to 40s
> (exactly 38s):
>
> http://logs.test-lab.xenproject.org/osstest/logs/59721/test-armhf-armhf-xl-multivcpu/14.ts-guest-start.log
>
Exactly my point, together with this:
http://logs.test-lab.xenproject.org/osstest/logs/60076/test-armhf-armhf-xl-multivcpu/arndale-metrocentre---var-log-xen-console-guest-debian.guest.osstest.log
It shows two instances of a full guest boot, which makes sense as it is the
second attempt that "fails".
Look at the second one and note:
- that it actually boots fine
- for some reason, we have:
[ 1.196443] udevd[69]: starting version 175
Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
Begin: Running /scripts/local-premount ... done.
Begin: Running /scripts/local-premount ... done.
[ 20.741128] EXT4-fs (xvda2): mounting ext3 file system using the ext4 subsystem
[ 20.755723] EXT4-fs (xvda2): mounted filesystem with ordered data mode. Opts: (null)
... ... ... ...
[ 47.329342] EXT4-fs (xvda2): re-mounted. Opts: (null)
[....] Checking root file system...fsck from util-linux 2.20.1
/dev/xvda2: clean, 14689/262144 files, 124109/1048576 blocks
... ... ... ...
[ 47.803550] EXT4-fs (xvda2): re-mounted. Opts: errors=remount-ro
so it looks like it did take quite a while to start. Yes, that's in
guest time, but still...
In the first instance, we have this:
[ 1.221159] udevd[69]: starting version 175
Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
Begin: Running /scripts/local-premount ... done.
[ 2.275805] EXT4-fs (xvda2): mounting ext3 file system using the ext4 subsystem
[ 2.300418] EXT4-fs (xvda2): mounted filesystem with ordered data mode. Opts: (null)
... ... ... ...
[ 5.958201] EXT4-fs (xvda2): re-mounted. Opts: (null)
[....] Checking root file system...fsck from util-linux 2.20.1
... ... ... ...
[ 6.424911] EXT4-fs (xvda2): re-mounted. Opts: errors=remount-ro
So, no, I don't see why the pre-mount activities (I don't even
know what those are, although I don't think it matters) are already ~10x
slower, and then the mounting and the fsck check ~6x...
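To put rough numbers on the comparison, the kernel timestamps from the two excerpts can be turned into per-phase durations and slowdown ratios. This is just a sketch: the phase names and the helpers `phase_durations` and `slowdown` are made up for illustration, and the ratios depend on which log lines are taken as phase boundaries, so they need not match the ballpark figures above.

```python
# Kernel timestamps (seconds of guest time) copied from the two boot
# excerpts quoted above. The key names are hypothetical labels.
FIRST = {"udevd": 1.221159, "mounting": 2.275805, "mounted": 2.300418,
         "remount": 5.958201, "remount_ro": 6.424911}
SECOND = {"udevd": 1.196443, "mounting": 20.741128, "mounted": 20.755723,
          "remount": 47.329342, "remount_ro": 47.803550}


def phase_durations(stamps):
    """Time spent between each pair of consecutive markers."""
    keys = list(stamps)
    return {f"{a}->{b}": stamps[b] - stamps[a] for a, b in zip(keys, keys[1:])}


def slowdown(fast, slow):
    """Ratio of each phase's duration in the slow boot to the fast boot."""
    fast_d, slow_d = phase_durations(fast), phase_durations(slow)
    return {phase: slow_d[phase] / fast_d[phase] for phase in fast_d}


for phase, ratio in slowdown(FIRST, SECOND).items():
    print(f"{phase}: {ratio:.1f}x")
```

Comparing durations rather than absolute timestamps keeps an early delay from inflating the ratio of every later phase.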
The host is certainly overloaded, in terms of number of vcpus vs. number
of pcpus, but it's not as if all those vcpus should be super busy at this
point... Perhaps the host being practically UP matters (I don't think
I've actually ever run Xen on a UP system! :-P)
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: Arndale secondary CPU boot issue Was Re: [xen-unstable test] 60076: regressions - FAIL
2015-07-29 18:18 ` Arndale secondary CPU boot issue Was " Julien Grall
@ 2015-07-30 8:55 ` Ian Campbell
2015-07-30 10:54 ` Stefano Stabellini
2015-07-30 11:36 ` Julien Grall
2015-07-30 10:38 ` Andrew Cooper
1 sibling, 2 replies; 12+ messages in thread
From: Ian Campbell @ 2015-07-30 8:55 UTC (permalink / raw)
To: Julien Grall, Dario Faggioli, osstest service owner
Cc: xen-devel, David Vrabel, Stefano Stabellini
On Wed, 2015-07-29 at 19:18 +0100, Julien Grall wrote:
As an aside from the issue you are seeing:
> The old implementation of spinlock is sending an event (via the assembly
> instruction SEV) to the other physical CPUs. This will wake up the
> others CPUs waiting on the assembly instruction WFE (Wait For Event).
Uh, I didn't notice this about the new implementation, sorry I should have
done.
IMHO we should investigate (probably with some urgency) inserting a WFE and
SEV pair into the lock/unlock paths, else power consumption will suck.
I think that probably means replacing the cpu_relax() calls in the
spinlocks with something new that does a WFE on ARM (we don't just want to
change relax) and adding an arch-specific hook for the SEV on the release path.
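To make the shape of that idea concrete, here is a toy model in Python rather than Xen's C. It is only an analogy, not the Xen implementation: `threading.Condition.wait()` stands in for a WFE in the wait loop and `notify_all()` for the SEV hook on the release path. `TicketLock`, its field names and `demo` are hypothetical.

```python
import threading


class TicketLock:
    """Toy ticket lock whose waiters sleep until a release signals them."""

    def __init__(self):
        self._cond = threading.Condition()  # stands in for the event mechanism
        self._next_ticket = 0               # next ticket to hand out
        self._now_serving = 0               # ticket currently allowed in

    def acquire(self):
        with self._cond:
            my_ticket = self._next_ticket
            self._next_ticket += 1
            # WFE analogue: sleep until signalled, then re-check the ticket.
            while self._now_serving != my_ticket:
                self._cond.wait()

    def release(self):
        with self._cond:
            self._now_serving += 1
            # SEV analogue: wake all waiters; only the next ticket proceeds.
            self._cond.notify_all()


def demo(workers=4, rounds=200):
    """Increment a shared counter under the lock from several threads."""
    lock, total = TicketLock(), [0]

    def work():
        for _ in range(rounds):
            lock.acquire()
            total[0] += 1
            lock.release()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

Because the ticket check and the sleep both happen under the condition's mutex, this model cannot lose a wake-up. Real WFE/SEV code has no such mutex and has to make sure a release landing between the check and the sleep is not missed.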
If it is too late for 4.6 (which would depend on the eventual complexity of
the actual fix) then we should fix this ASAP in 4.7 and backport for 4.6.1.
> It appears to be required on the Arndale to boot secondaries CPUs.
> Although, depending on where I put the sev I don't have the same
> behavior:
> - sev in smp_init callback: the CPU is not coming up
> - sev before or after arch_cpu_up: the CPU is booting but not in
> HYP
> mode [2]
>
> I haven't yet figured out where the "sev" should be placed in order to
> get the CPU boot correctly.
Does the arndale end up using
.cpu_up = cpu_up_send_sgi,
or
.cpu_up = exynos5_cpu_up,
?
> What I don't understand is how the placement of "sev" would affect the
> secondary processor to boot in HYP mode or Kernel mode or nothing at all
>
> This platform seems very picky and I don't remember having a
> documentation about how the SMP boot works for this platform. Linux
> seems to avoid the SEV for this platform.
32-bit Linux has some in common code paths IIRC, which are not always
apparent at first glance.
Ian.
* Re: Arndale secondary CPU boot issue Was Re: [xen-unstable test] 60076: regressions - FAIL
2015-07-29 18:18 ` Arndale secondary CPU boot issue Was " Julien Grall
2015-07-30 8:55 ` Ian Campbell
@ 2015-07-30 10:38 ` Andrew Cooper
1 sibling, 0 replies; 12+ messages in thread
From: Andrew Cooper @ 2015-07-30 10:38 UTC (permalink / raw)
To: Julien Grall, Dario Faggioli, osstest service owner
Cc: David Vrabel, xen-devel, Ian Campbell, Stefano Stabellini
On 29/07/15 19:18, Julien Grall wrote:
> On 29/07/15 15:15, Julien Grall wrote:
>>>> Can it be that things are "just" slow, since we're creating a 4 vcpus
>>>> guest on a 1 pcpu (not so powerful, I guess) host?
>>> The arndale board has a 2 physical CPUs. Although it looks like that the
>>> secondary cpu is never coming up:
>>>
>>> Jul 28 01:35:39.057076 (XEN) Adding cpu 1 to runqueue 0
>>> Jul 28 01:35:39.057104 (XEN) Bringing up CPU1
>>> Jul 28 01:35:39.064998 (XEN) CPU1 never came online
>>> Jul 28 01:35:40.065133 (XEN) Removing cpu 1 from runqueue 0
>>> Jul 28 01:35:40.065176 (XEN) Failed to bring up CPU 1 (error -5)
>>>
>>> This has been broken at some point in Xen 4.6. Xen 4.5 is booting with
>>> the right number of physical on the Arndale.
> I figured out what's going on. The problem interestingly came after the
> commit which added the support of the ticket lock [1] in Xen.
>
> While the problem is solved by reverting this patch, the source of the
> issue is not because of a ticket lock issue with ARM (thanks god!).
As an aside, why is failing to bring up a cpu not fatal under ARM?
I admit that x86 isn't much better in this regard - it will spin in an
infinite loop waiting for the upcoming cpu to call in, but at least it
doesn't proceed booting with some cpus unexpectedly missing.
~Andrew
* Re: Arndale secondary CPU boot issue Was Re: [xen-unstable test] 60076: regressions - FAIL
2015-07-30 8:55 ` Ian Campbell
@ 2015-07-30 10:54 ` Stefano Stabellini
2015-07-30 11:27 ` Ian Campbell
2015-07-30 11:27 ` David Vrabel
2015-07-30 11:36 ` Julien Grall
1 sibling, 2 replies; 12+ messages in thread
From: Stefano Stabellini @ 2015-07-30 10:54 UTC (permalink / raw)
To: Ian Campbell
Cc: xen-devel, Stefano Stabellini, Dario Faggioli,
osstest service owner, Julien Grall, David Vrabel
On Thu, 30 Jul 2015, Ian Campbell wrote:
> On Wed, 2015-07-29 at 19:18 +0100, Julien Grall wrote:
>
> As an aside from the issue you are seeing:
>
> > The old implementation of spinlock is sending an event (via the assembly
> > instruction SEV) to the other physical CPUs. This will wake up the
> > others CPUs waiting on the assembly instruction WFE (Wait For Event).
>
> Uh, I didn't notice this about the new implementation, sorry I should have
> done.
>
> IMHO we should investigate (probably with some urgency) inserting a WFE and
> SEV pair into the lock/unlock paths, else power consumption will suck.
>
> I think that probably means using something new to replace the cpu_relax()
> calls in the spinlocks with a WFE on ARM (we don't just want to change
> relax) and to add a arch specific hook for the SEV on the release path.
I agree: adding a WFE in cpu_relax() is too risky at this point.
> If it is too late for 4.6 (which would depend on the eventual complexity of
> the actual fix) then we should fix this ASAP in 4.7 and backport for 4.6.1.
I don't think we can release 4.6 without a WFE in the locks. We might
want to consider reverting to spin_locks on ARM (although I am aware
that the code is common at the moment).
* Re: Arndale secondary CPU boot issue Was Re: [xen-unstable test] 60076: regressions - FAIL
2015-07-30 10:54 ` Stefano Stabellini
@ 2015-07-30 11:27 ` Ian Campbell
2015-07-30 11:27 ` David Vrabel
1 sibling, 0 replies; 12+ messages in thread
From: Ian Campbell @ 2015-07-30 11:27 UTC (permalink / raw)
To: Stefano Stabellini
Cc: Julien Grall, Dario Faggioli, xen-devel, osstest service owner,
David Vrabel
On Thu, 2015-07-30 at 11:54 +0100, Stefano Stabellini wrote:
> On Thu, 30 Jul 2015, Ian Campbell wrote:
> > On Wed, 2015-07-29 at 19:18 +0100, Julien Grall wrote:
> >
> > As an aside from the issue you are seeing:
> >
> > > > The old implementation of spinlock is sending an event (via the
> > > > assembly instruction SEV) to the other physical CPUs. This will wake
> > > > up the other CPUs waiting on the assembly instruction WFE (Wait For
> > > > Event).
> >
> > > Uh, I didn't notice this about the new implementation, sorry I should
> > > have done.
> >
> > > IMHO we should investigate (probably with some urgency) inserting a WFE
> > > and SEV pair into the lock/unlock paths, else power consumption will
> > > suck.
> >
> > > I think that probably means using something new to replace the
> > > cpu_relax() calls in the spinlocks with a WFE on ARM (we don't just
> > > want to change relax) and to add an arch-specific hook for the SEV on
> > > the release path.
>
> I agree: adding a WFE in cpu_relax() is too risky at this point.
>
>
> > > If it is too late for 4.6 (which would depend on the eventual
> > > complexity of the actual fix) then we should fix this ASAP in 4.7 and
> > > backport for 4.6.1.
>
> I don't think we can release 4.6 without a WFE in the locks. We might
> want to consider reverting to spin_locks on ARM (although I am aware
> that the code is common at the moment).
It turns out we were missing the WFE even in the old code (I vaguely recall
having to refactor the Linux original to fit in with our arch/common split
and leaving myself a TODO item).
So I don't think we can justify changing this for 4.6. Investigating for
4.7 would be nice. It needs some careful thought about races between the
event bit and the tickets changing, though.
Ian.
* Re: Arndale secondary CPU boot issue Was Re: [xen-unstable test] 60076: regressions - FAIL
2015-07-30 10:54 ` Stefano Stabellini
2015-07-30 11:27 ` Ian Campbell
@ 2015-07-30 11:27 ` David Vrabel
1 sibling, 0 replies; 12+ messages in thread
From: David Vrabel @ 2015-07-30 11:27 UTC (permalink / raw)
To: Stefano Stabellini, Ian Campbell
Cc: Julien Grall, Dario Faggioli, xen-devel, osstest service owner
On 30/07/15 11:54, Stefano Stabellini wrote:
> On Thu, 30 Jul 2015, Ian Campbell wrote:
>> On Wed, 2015-07-29 at 19:18 +0100, Julien Grall wrote:
>>
>> As an aside from the issue you are seeing:
>>
>>> The old implementation of spinlock is sending an event (via the assembly
>>> instruction SEV) to the other physical CPUs. This will wake up the
>>> other CPUs waiting on the assembly instruction WFE (Wait For Event).
>>
>> Uh, I didn't notice this about the new implementation, sorry I should have
>> done.
>>
>> IMHO we should investigate (probably with some urgency) inserting a WFE and
>> SEV pair into the lock/unlock paths, else power consumption will suck.
>>
>> I think that probably means using something new to replace the cpu_relax()
>> calls in the spinlocks with a WFE on ARM (we don't just want to change
>> relax) and to add an arch-specific hook for the SEV on the release path.
>
> I agree: adding a WFE in cpu_relax() is too risky at this point.
WFE in cpu_relax() would be broken.
However, adding two hooks for spin_relax() (using this instead of
cpu_relax()) and spin_signal() that do the WFE/SEV seems low risk to me.
For x86 use:
#define spin_relax() cpu_relax()
#define spin_signal()
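As a rough illustration only (this is not code from the Xen tree), here is a minimal sketch of how the ARM side of these two hooks might look, together with a simplified ticket lock that calls them. Only the spin_relax()/spin_signal() names come from the proposal above; the ticket_lock_t type, its helper functions, and the choice of barrier before the SEV are assumptions made for the example.

```c
#include <stdatomic.h>

/* Hypothetical arch hooks; only the names come from the proposal. */
#if defined(__arm__) || defined(__aarch64__)
/* ARM: wait for an event instead of spinning hot. */
#define spin_relax()  __asm__ __volatile__("wfe" ::: "memory")
/* ARM: make the unlock store visible, then wake WFE waiters (assumed barrier). */
#define spin_signal() __asm__ __volatile__("dsb ishst\n\tsev" ::: "memory")
#else
/* Other arches: compiler barrier as a stand-in for cpu_relax(); no signal. */
#define spin_relax()  __asm__ __volatile__("" ::: "memory")
#define spin_signal() ((void)0)
#endif

/* Simplified ticket lock, for illustration only. */
typedef struct {
    atomic_uint head;   /* ticket currently being served */
    atomic_uint tail;   /* next ticket to hand out */
} ticket_lock_t;

static inline void ticket_lock_init(ticket_lock_t *l)
{
    atomic_init(&l->head, 0u);
    atomic_init(&l->tail, 0u);
}

static inline void ticket_lock(ticket_lock_t *l)
{
    /* Take a ticket, then wait until it is being served. */
    unsigned int me = atomic_fetch_add_explicit(&l->tail, 1u,
                                                memory_order_relaxed);
    while (atomic_load_explicit(&l->head, memory_order_acquire) != me)
        spin_relax();
}

static inline void ticket_unlock(ticket_lock_t *l)
{
    /* Only the lock holder writes head, so a relaxed read is enough here. */
    unsigned int next =
        atomic_load_explicit(&l->head, memory_order_relaxed) + 1u;
    atomic_store_explicit(&l->head, next, memory_order_release);
    spin_signal();      /* release-path hook */
}
```

The waiter re-reads the ticket each time it wakes, so a spurious wake-up costs only one extra loop iteration. Since SEV sets the event register on the other CPUs, a WFE executed just after the SEV should return immediately rather than sleep, but that ordering would need checking carefully against the real lock code.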
>> If it is too late for 4.6 (which would depend on the eventual complexity of
>> the actual fix) then we should fix this ASAP in 4.7 and backport for 4.6.1.
>
> I don't think we can release 4.6 without a WFE in the locks. We might
> want to consider reverting to spin_locks on ARM (although I am aware
> that the code is common at the moment).
You can't revert the ticket locks for one architecture only.
David
* Re: Arndale secondary CPU boot issue Was Re: [xen-unstable test] 60076: regressions - FAIL
2015-07-30 8:55 ` Ian Campbell
2015-07-30 10:54 ` Stefano Stabellini
@ 2015-07-30 11:36 ` Julien Grall
1 sibling, 0 replies; 12+ messages in thread
From: Julien Grall @ 2015-07-30 11:36 UTC (permalink / raw)
To: Ian Campbell, Dario Faggioli, osstest service owner
Cc: xen-devel, David Vrabel, Stefano Stabellini
Hi Ian,
On 30/07/15 09:55, Ian Campbell wrote:
> On Wed, 2015-07-29 at 19:18 +0100, Julien Grall wrote:
>> It appears to be required on the Arndale to boot secondary CPUs.
>> Although, depending on where I put the sev I don't have the same
>> behavior:
>> - sev in smp_init callback: the CPU is not coming up
>> - sev before or after arch_cpu_up: the CPU is booting but not in
>> HYP mode [2]
>>
>> I haven't yet figured out where the "sev" should be placed in order to
>> get the CPU boot correctly.
>
> Does the arndale end up using
> .cpu_up = cpu_up_send_sgi,
> or
> .cpu_up = exynos5_cpu_up,
> ?
The first one, given that the Arndale is an Exynos5250. I'm not sure why
we didn't use exynos5_cpu_up, although it doesn't seem to fix the problem.
>> What I don't understand is how the placement of "sev" would affect whether
>> the secondary processor boots in HYP mode, in kernel mode, or not at all.
>>
>> This platform seems very picky and I don't remember having any
>> documentation about how the SMP boot works for this platform. Linux
>> seems to avoid the SEV for this platform.
>
> 32-bit Linux has some in common code paths IIRC, which are not always
> apparent at first glance.
Good to know.
Regards,
--
Julien Grall