* [xen-unstable-smoke test] 112957: regressions - trouble: broken/fail/pass
@ 2017-08-30 13:49 osstest service owner
  2017-08-30 13:54 ` Andrew Cooper
  0 siblings, 1 reply; 12+ messages in thread
From: osstest service owner @ 2017-08-30 13:49 UTC (permalink / raw)
  To: xen-devel, osstest-admin

flight 112957 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/112957/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-xl          12 guest-start              fail REGR. vs. 112956

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-xsm       1 build-check(1)               blocked  n/a
 build-arm64-pvops             2 hosts-allocate              broken like 112956
 build-arm64-pvops             3 capture-logs                broken like 112956
 build-arm64                   2 hosts-allocate              broken like 112956
 build-arm64                   3 capture-logs                broken like 112956
 test-amd64-amd64-libvirt     13 migrate-support-check        fail   never pass

version targeted for testing:
 xen                  2b936ea7b716dc1a13c98550f81752ab053e95c0
baseline version:
 xen                  dab6a84aadab11f31332030a1e9f0b9282d76156

Last test of basis   112956  2017-08-30 09:56:56 Z    0 days
Testing same since   112957  2017-08-30 12:02:17 Z    0 days    1 attempts

------------------------------------------------------------
People who touched revisions under test:
  Andrew Cooper <andrew.cooper3@citrix.com>
  Dario Faggioli <dario.faggioli@citrix.com>
  George Dunlap <george.dunlap@citrix.com>
  Jan Beulich <jbeulich@suse.com>
  Tim Deegan <tim@xen.org>

jobs:
 build-amd64                                                  pass    
 build-arm64                                                  broken  
 build-armhf                                                  pass    
 build-amd64-libvirt                                          pass    
 build-arm64-pvops                                            broken  
 test-armhf-armhf-xl                                          fail    
 test-arm64-arm64-xl-xsm                                      broken  
 test-amd64-amd64-xl-qemuu-debianhvm-i386                     pass    
 test-amd64-amd64-libvirt                                     pass    


------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
    http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
    http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

broken-step build-arm64-pvops hosts-allocate
broken-step build-arm64-pvops capture-logs
broken-step build-arm64 hosts-allocate
broken-step build-arm64 capture-logs

Not pushing.

(No revision log; it would be 334 lines long.)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* Re: [xen-unstable-smoke test] 112957: regressions - trouble: broken/fail/pass
  2017-08-30 13:49 [xen-unstable-smoke test] 112957: regressions - trouble: broken/fail/pass osstest service owner
@ 2017-08-30 13:54 ` Andrew Cooper
  2017-08-30 14:15   ` George Dunlap
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Cooper @ 2017-08-30 13:54 UTC (permalink / raw)
  To: osstest service owner, xen-devel
  Cc: George Dunlap, Dario Faggioli, Stefano Stabellini, Julien Grall

On 30/08/17 14:49, osstest service owner wrote:
> flight 112957 xen-unstable-smoke real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/112957/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-armhf-armhf-xl          12 guest-start              fail REGR. vs. 112956

(XEN) Assertion 'cpu == smp_processor_id()' failed at softirq.c:35
(XEN) ----[ Xen-4.10-unstable  arm32  debug=y   Not tainted ]----
(XEN) CPU:    1
(XEN) PC:     0023b710 softirq.c#__do_softirq+0x3c/0x134
(XEN) CPSR:   8000005a MODE:Hypervisor
(XEN)      R0: 4003d000 R1: 00000001 R2: 3fcffd00 R3: 00000001
(XEN)      R4: 002e5f74 R5: 00000000 R6: 0031d694 R7: 0031a224
(XEN)      R8: 002e1f80 R9: 0029b880 R10:00000001 R11:40037f3c R12:00000000
(XEN) HYP: SP: 40037f04 LR: 0025826c
(XEN) 
(XEN)   VTCR_EL2: 80003558
(XEN)  VTTBR_EL2: 00010000bff1e000
(XEN) 
(XEN)  SCTLR_EL2: 30cd187f
(XEN)    HCR_EL2: 000000000038663f
(XEN)  TTBR0_EL2: 00000000ba016000
(XEN) 
(XEN)    ESR_EL2: 00000000
(XEN)  HPFAR_EL2: 0000000000104810
(XEN)      HDFAR: df000f00
(XEN)      HIFAR: 00000000
(XEN) 
(XEN) Xen stack trace from sp=40037f04:
(XEN)    00000000 00000004 002e1f80 00000000 00000000 002e1f80 0031d694 002e1f80
(XEN)    c1203098 00000001 00000000 00000000 c11151a8 40037f44 0023b87c 40037f54
(XEN)    0026b320 c1200000 c1203034 40037f58 0026ef40 00000001 00000000 00000001
(XEN)    c031c520 c1200000 c1203034 c1203098 00000001 00000000 00000000 c11151a8
(XEN)    c12030a0 192b8000 ffffffff 7f5706d3 c031c528 60000093 07e00000 bebcd108
(XEN)    c1318ac0 c030d0a0 c1201fa0 c030928c c1318acc c030d420 c1318ad8 c030d4e0
(XEN)    00000000 00000000 00000000 00000000 00000000 c1318ae4 c1318ae4 60000013
(XEN)    60010193 20000093 60000193 00000000 00000000 00000000 00000000
(XEN) Xen call trace:
(XEN)    [<0023b710>] softirq.c#__do_softirq+0x3c/0x134 (PC)
(XEN)    [<0025826c>] domain.c#schedule_tail+0x2f4/0x308 (LR)
(XEN)    [<0023b87c>] do_softirq+0x18/0x28
(XEN)    [<0026b320>] leave_hypervisor_tail+0x84/0xb8
(XEN)    [<0026ef40>] entry.o#return_to_guest+0xc/0xb8
(XEN) 
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) Assertion 'cpu == smp_processor_id()' failed at softirq.c:35
(XEN) ****************************************
(XEN) 
(XEN) Manual reset required ('noreboot' specified)

At a guess, I'd say the reasoning behind
http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=57450cfe48b56db90166c52d45a411a9279a12e1
is false.

~Andrew


* Re: [xen-unstable-smoke test] 112957: regressions - trouble: broken/fail/pass
  2017-08-30 13:54 ` Andrew Cooper
@ 2017-08-30 14:15   ` George Dunlap
  2017-08-31 15:53     ` Wei Liu
  2017-09-02 15:39     ` Julien Grall
  0 siblings, 2 replies; 12+ messages in thread
From: George Dunlap @ 2017-08-30 14:15 UTC (permalink / raw)
  To: Andrew Cooper, osstest service owner, xen-devel
  Cc: George Dunlap, Dario Faggioli, Stefano Stabellini, Julien Grall

On 08/30/2017 02:54 PM, Andrew Cooper wrote:
> At a guess, I'd say the reasoning behind
> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=57450cfe48b56db90166c52d45a411a9279a12e1
> is false.

Wow -- I actually rather doubt that the reasoning is wrong; I can't see
anywhere in the context switch path that could possibly move the
hypervisor stack to another processor.  I'd be more inclined to suspect
that smp_processor_id() returns the wrong value under certain conditions
-- e.g., between a schedule() softirq and the next VMENTER (whatever
it's called on ARM).

Stefano, any ideas?

 -George


* Re: [xen-unstable-smoke test] 112957: regressions - trouble: broken/fail/pass
  2017-08-30 14:15   ` George Dunlap
@ 2017-08-31 15:53     ` Wei Liu
  2017-09-02 15:39     ` Julien Grall
  1 sibling, 0 replies; 12+ messages in thread
From: Wei Liu @ 2017-08-31 15:53 UTC (permalink / raw)
  To: George Dunlap
  Cc: xen-devel, Wei Liu, George Dunlap, Andrew Cooper, Dario Faggioli,
	osstest service owner, Julien Grall, Stefano Stabellini

On Wed, Aug 30, 2017 at 03:15:09PM +0100, George Dunlap wrote:
> On 08/30/2017 02:54 PM, Andrew Cooper wrote:
> > At a guess, I'd say the reasoning behind
> > http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=57450cfe48b56db90166c52d45a411a9279a12e1
> > is false.
> 
> Wow -- I actually rather doubt that the reasoning is wrong; I can't see
> anywhere in the context switch path that could possibly move the
> hypervisor stack to another processor.  I'd be more inclined to suspect
> that smp_processor_id() returns the wrong value under certain conditions
> -- e.g., between a schedule() softirq and the next VMENTER (whatever
> it's called on ARM).
> 
> Stefano, any ideas?
> 
>  -George
> 

In the meantime, I have reverted the offending patch to unblock
pushgate.


* Re: [xen-unstable-smoke test] 112957: regressions - trouble: broken/fail/pass
  2017-08-30 14:15   ` George Dunlap
  2017-08-31 15:53     ` Wei Liu
@ 2017-09-02 15:39     ` Julien Grall
  2017-09-04  8:46       ` George Dunlap
  1 sibling, 1 reply; 12+ messages in thread
From: Julien Grall @ 2017-09-02 15:39 UTC (permalink / raw)
  To: George Dunlap, Andrew Cooper, osstest service owner, xen-devel,
	andre.przywara
  Cc: George Dunlap, Dario Faggioli, Stefano Stabellini, Julien Grall


Hi,

Sorry for the late reply and formatting, writing from my phone.

On Wed, 30 Aug 2017, 15:17 George Dunlap <george.dunlap@citrix.com> wrote:

> On 08/30/2017 02:54 PM, Andrew Cooper wrote:
> > At a guess, I'd say the reasoning behind
> > http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=57450cfe48b56db90166c52d45a411a9279a12e1
> > is false.
>
> Wow -- I actually rather doubt that the reasoning is wrong; I can't see
> anywhere in the context switch path that could possibly move the
> hypervisor stack to another processor.  I'd be more inclined to suspect
> that smp_processor_id() returns the wrong value under certain conditions
> -- e.g., between a schedule() softirq and the next VMENTER (whatever
> it's called on ARM).
>
> Stefano, any ideas?
>

If I am not mistaken, the hypervisor stack is per-vCPU. So when you move
the vCPU to another pCPU, the stack moves with it, and smp_processor_id()
will return a different value. Isn't it the same on x86?

I have CCed Andre who might be able to help while I am away.

Cheers,


>  -George


* Re: [xen-unstable-smoke test] 112957: regressions - trouble: broken/fail/pass
  2017-09-02 15:39     ` Julien Grall
@ 2017-09-04  8:46       ` George Dunlap
  2017-09-05 15:50         ` Dario Faggioli
  0 siblings, 1 reply; 12+ messages in thread
From: George Dunlap @ 2017-09-04  8:46 UTC (permalink / raw)
  To: Julien Grall, Andrew Cooper, osstest service owner, xen-devel,
	andre.przywara
  Cc: George Dunlap, Dario Faggioli, Stefano Stabellini, Julien Grall

On 09/02/2017 04:39 PM, Julien Grall wrote:
> Hi,
> 
> Sorry for the late reply and formatting, writing from my phone.
> 
> On Wed, 30 Aug 2017, 15:17 George Dunlap <george.dunlap@citrix.com> wrote:
> 
>> On 08/30/2017 02:54 PM, Andrew Cooper wrote:
>>> At a guess, I'd say the reasoning behind
>>> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=57450cfe48b56db90166c52d45a411a9279a12e1
>>> is false.
>>
>> Wow -- I actually rather doubt that the reasoning is wrong; I can't see
>> anywhere in the context switch path that could possibly move the
>> hypervisor stack to another processor.  I'd be more inclined to suspect
>> that smp_processor_id() returns the wrong value under certain conditions
>> -- e.g., between a schedule() softirq and the next VMENTER (whatever
>> it's called on ARM).
>>
>> Stefano, any ideas?
>>
> 
> If I am not mistaken the hypervisor stack is per-vCPU. So when you move the
> vCPU to another pCPU, the stack will be moved.
> This means the smp_processor_id() will return a different value. Isn't it
> the same on x86?

No, the hypervisor stack on x86 has always been per-pcpu.  Apparently
the powerpc port was per-vcpu, which is why the smp_processor_id() was
there.  I (and apparently Dario) assumed the ARM implementation was the
same as x86, which is why I checked in this change.

I guess we'd better leave the patch reverted for now.

 -George


* Re: [xen-unstable-smoke test] 112957: regressions - trouble: broken/fail/pass
  2017-09-04  8:46       ` George Dunlap
@ 2017-09-05 15:50         ` Dario Faggioli
  2017-09-05 22:06           ` Stefano Stabellini
  0 siblings, 1 reply; 12+ messages in thread
From: Dario Faggioli @ 2017-09-05 15:50 UTC (permalink / raw)
  To: George Dunlap, Julien Grall, Andrew Cooper,
	osstest service owner, xen-devel, andre.przywara
  Cc: George Dunlap, Julien Grall, Stefano Stabellini


On Mon, 2017-09-04 at 09:46 +0100, George Dunlap wrote:
> On 09/02/2017 04:39 PM, Julien Grall wrote:
> > 
> > If I am not mistaken the hypervisor stack is per-vCPU. So when you
> > move the vCPU to another pCPU, the stack will be moved.
> > This means the smp_processor_id() will return a different value.
> > Isn't it the same on x86?
> 
> No, the hypervisor stack on x86 has always been per-pcpu.  Apparently
> the powerpc port was per-vcpu, which is why the smp_processor_id()
> was there.  I (and apparently Dario) assumed the ARM implementation
> was the same as x86, which is why I checked in this change.
> 
So, AFAIUI, the reason the re-sampling at every iteration was
introduced (in ae9bfcdc, "[XEN] Various softirq cleanups") is that, on
IA64 (not powerpc :-D), context_switch() actually returns.

Basically, we are in do_softirq(), with SCHEDULE_SOFTIRQ set, so we
call the handler, which is schedule() (__enter_scheduler(), back at the
time), which calls context_switch(), which switches the stack.

On x86, context_switch() does not 'return'; it jumps (via
schedule_tail()) to resuming the guest context (of the vCPU to be
scheduled, which may be a different one). On that path, we do check
softirqs again, and we may go back to do_softirq(), but if we do, we
execute the function from its entry point, and hence we re-initialize
cpu, outside of the loop.

OTOH, on IA64, context_switch(), and hence schedule()
(__enter_scheduler()), does a regular 'return'. So, we go back to the
for(;;) loop in do_softirq(), with (I think, but I don't speak any IA64
:-/) the stack changed. And that's why we need to refresh the content
of the 'cpu' local variable.

So, I now think that what I did not understand, when looking at ARM
code, was that context_switch() does indeed return, and hence we do at
least another step inside the loop, and hit the ASSERT(), which I guess
may trigger if what's in the place of the local variable 'cpu', in the
new stack, is different from smp_processor_id().

Re-checking things now, I actually do see that context_switch() on ARM
is not 'terminal'. It calls schedule_tail(), which on x86 does not
return, while on ARM it does. I must have confused these two... Sorry.
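
The scenario above can be modelled in a few lines of C. This is only a
sketch under stated assumptions, not the Xen code: every model_* name is
hypothetical, a plain variable stands in for "the pCPU we are executing
on", and the effect of a handler that returns on a different pCPU is
faked by changing that variable inside model_schedule(). It shows why a
'cpu' value sampled once before the loop can go stale, and why
re-sampling it at every iteration cannot.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS        2
#define MODEL_SCHEDULE_BIT   (1UL << 0)   /* stands in for SCHEDULE_SOFTIRQ */

/* Hypothetical model state: which pCPU "we" run on, and per-pCPU pending bits. */
static unsigned int model_current_pcpu;
static unsigned long model_pending[MODEL_NR_CPUS];

/* Stand-in for smp_processor_id(): reports the pCPU currently executing. */
static unsigned int model_smp_processor_id(void)
{
    return model_current_pcpu;
}

/*
 * Stand-in for schedule() on a port where context_switch() returns:
 * control comes back to the caller, but possibly on another pCPU.
 */
static void model_schedule(unsigned int resume_on)
{
    model_current_pcpu = resume_on;
}

/* Start on pCPU 0 with only the schedule bit pending there. */
void model_reset(void)
{
    model_current_pcpu = 0;
    model_pending[0] = MODEL_SCHEDULE_BIT;
    model_pending[1] = 0;
}

/*
 * Simplified softirq loop.  Returns false if 'cpu' ever disagrees with
 * model_smp_processor_id(), i.e. where an ASSERT() like the one in the
 * panic above would fire.
 */
bool model_do_softirq(bool resample, unsigned int resume_on)
{
    unsigned int cpu = model_smp_processor_id();
    bool in_sync = true;

    for ( ; ; )
    {
        if ( resample )
            cpu = model_smp_processor_id();

        if ( cpu != model_smp_processor_id() )
            in_sync = false;

        if ( !(model_pending[cpu] & MODEL_SCHEDULE_BIT) )
            break;

        model_pending[cpu] &= ~MODEL_SCHEDULE_BIT;
        model_schedule(resume_on);
    }

    return in_sync;
}

/* Small demonstration of the three cases discussed in the thread. */
void model_softirq_demo(void)
{
    model_reset();
    assert(model_do_softirq(false, 0));   /* handler returns on same pCPU */
    model_reset();
    assert(!model_do_softirq(false, 1));  /* stale 'cpu' after a move */
    model_reset();
    assert(model_do_softirq(true, 1));    /* re-sampling keeps it in sync */
}
```

On a port where the handler never returns into the loop (the x86
behaviour described above), the second case cannot arise, which would
explain why sampling once looked safe there.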

Is this analysis correct?

Also, mostly out of curiosity, still looking at ARM code, I'm not
getting at all how continue_new_vcpu() works (e.g., when/how is it
invoked?).

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: [xen-unstable-smoke test] 112957: regressions - trouble: broken/fail/pass
  2017-09-05 15:50         ` Dario Faggioli
@ 2017-09-05 22:06           ` Stefano Stabellini
  2017-09-06 10:27             ` Dario Faggioli
  0 siblings, 1 reply; 12+ messages in thread
From: Stefano Stabellini @ 2017-09-05 22:06 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: xen-devel, George Dunlap, Andrew Cooper, George Dunlap,
	osstest service owner, Julien Grall, Julien Grall,
	Stefano Stabellini, andre.przywara

On Tue, 5 Sep 2017, Dario Faggioli wrote:
> So, I now think that what I did not understand, when looking at ARM
> code, was that context_switch() does indeed return, and hence we do at
> least another step inside the loop, and hit the ASSERT(), which I guess
> may trigger if what's in the place of the local variable 'cpu', in the
> new stack, is different from smp_processor_id().
> 
> Re-checking things now, I actually do see that context_switch() on ARM
> is not 'terminal'. It calls schedule_tail(), which on x86 does not
> return, while on ARM it does. I must have confused these two... Sorry.
> 
> Is this analysis correct?
>
> Also, mostly out of curiosity, still looking at ARM code, I'm not
> getting at all how continue_new_vcpu() works (e.g., when/how is it
> invoked?).

On ARM, context_switch() returns, unless it's the first time a new vcpu
is run. In that case pc is set to continue_new_vcpu. __context_switch
restores pc to continue_new_vcpu, returning to it.  continue_new_vcpu
reset_stack_and_jumps to return_to_new_vcpu32/64.

__context_switch also saves the new registers, including pc, overwriting
the initial value of continue_new_vcpu with the return address within
context_switch. From the second time onward a vcpu is run,
context_switch returns normally.

(I think the above is correct, but I didn't double check by actually
running the code.)
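
The first-run behaviour described above can be sketched as a tiny state
model in C. It is an illustration only, assuming the description in this
message is accurate: the model_* names and the resume_point enum are
hypothetical, and the saved pc is reduced to one of two symbolic values
instead of a real address.

```c
#include <assert.h>

/* Where a vCPU resumes when it is switched in (symbolic, not an address). */
enum resume_point {
    RESUME_CONTINUE_NEW_VCPU,   /* initial saved pc of a never-run vCPU */
    RESUME_IN_CONTEXT_SWITCH,   /* return address inside context_switch() */
};

/* Hypothetical stand-in for the saved register state of one vCPU. */
struct model_vcpu {
    enum resume_point saved_pc;
};

/* A freshly created vCPU has its saved pc pointing at continue_new_vcpu. */
void model_vcpu_init(struct model_vcpu *v)
{
    v->saved_pc = RESUME_CONTINUE_NEW_VCPU;
}

/*
 * Model of the low-level switch: save the outgoing vCPU's pc (which is
 * now a return address within context_switch()), then resume the
 * incoming vCPU wherever its saved pc says.
 */
enum resume_point model_context_switch(struct model_vcpu *prev,
                                       struct model_vcpu *next)
{
    prev->saved_pc = RESUME_IN_CONTEXT_SWITCH;
    return next->saved_pc;
}

/* First run goes through continue_new_vcpu; later runs return normally. */
void model_first_run_demo(void)
{
    struct model_vcpu idle, guest;

    model_vcpu_init(&idle);
    model_vcpu_init(&guest);

    assert(model_context_switch(&idle, &guest) == RESUME_CONTINUE_NEW_VCPU);
    assert(model_context_switch(&guest, &idle) == RESUME_IN_CONTEXT_SWITCH);
    assert(model_context_switch(&idle, &guest) == RESUME_IN_CONTEXT_SWITCH);
}
```

Once a vCPU has been switched out at least once, its saved pc is always
a return address, so every later switch-in falls back into the caller of
context_switch() and, from there, into the softirq loop.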


* Re: [xen-unstable-smoke test] 112957: regressions - trouble: broken/fail/pass
  2017-09-05 22:06           ` Stefano Stabellini
@ 2017-09-06 10:27             ` Dario Faggioli
  2017-09-06 19:29               ` Stefano Stabellini
  0 siblings, 1 reply; 12+ messages in thread
From: Dario Faggioli @ 2017-09-06 10:27 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, George Dunlap, Andrew Cooper, George Dunlap,
	osstest service owner, Julien Grall, Julien Grall,
	andre.przywara


On Tue, 2017-09-05 at 15:06 -0700, Stefano Stabellini wrote:
> On Tue, 5 Sep 2017, Dario Faggioli wrote:
> > 
> > > Re-checking things now, I actually do see that context_switch() on
> > > ARM is not 'terminal'. It calls schedule_tail(), which on x86 does
> > > not return, while on ARM it does. I must have confused these two...
> > > Sorry.
> > 
> > Also, mostly out of curiosity, still looking at ARM code, I'm not
> > getting at all how continue_new_vcpu() works (e.g., when/how is it
> > invoked?).
> 
> > On ARM, context_switch() returns, unless it's the first time a new
> > vcpu is run. In that case pc is set to continue_new_vcpu.
> > __context_switch restores pc to continue_new_vcpu, returning to it.
>
Ah, yes, that's what I was missing! The fact that PC is assigned the
address of continue_new_vcpu()... that's how it runs. Only the first
time, as you're explaining.

Thanks! :-)

> From the second time onward a vcpu is run,
> context_switch returns normally.
> 
Right. And you (or someone else) can also confirm that the stack is
per-vCPU?

Or, in general, make sense out of the fact that the stack pointer
register changes in such a way that, when we get back in do_softirq(),
what's in the stack in the place where there was the 'cpu' local
variable has (at least in some circumstances) changed?

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: [xen-unstable-smoke test] 112957: regressions - trouble: broken/fail/pass
  2017-09-06 10:27             ` Dario Faggioli
@ 2017-09-06 19:29               ` Stefano Stabellini
  2017-09-06 23:36                 ` Dario Faggioli
  0 siblings, 1 reply; 12+ messages in thread
From: Stefano Stabellini @ 2017-09-06 19:29 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Stefano Stabellini, George Dunlap, Andrew Cooper, George Dunlap,
	osstest service owner, Julien Grall, Julien Grall, xen-devel,
	andre.przywara

On Wed, 6 Sep 2017, Dario Faggioli wrote:
> On Tue, 2017-09-05 at 15:06 -0700, Stefano Stabellini wrote:
> > On Tue, 5 Sep 2017, Dario Faggioli wrote:
> > > 
> > > Re-checking things now, I actually do see that context_switch() on
> > > ARM
> > > is not 'terminal'. It calls schedule_tail(), which on x86 does not
> > > return, while on ARM, it does. I must have confused these two...
> > > Sorry.
> > > 
> > > Also, mostly out of curiosity, still looking at ARM code, I'm not
> > > getting at all how continue_new_vcpu() works (e.g., when/how is it
> > > invoked?).
> > 
> > On ARM, context_switch() returns, unless it's the first time a new
> > vcpu
> > is run. In that case pc is set to continue_new_vcpu. __context_switch
> > restores pc to continue_new_vcpu, returning to it.
> >
> Ah, yes, that's what I was missing! The fact that PC is assigned the
> address of continue_new_vcpu(): that's how it runs. Only the first time,
> as you're explaining.
> 
> Thanks! :-)
> 
> > From the second time onward a vcpu is run,
> > context_switch returns normally.
> > 
> Right. And you (or someone else) can also confirm that the stack is
> per-vCPU?

Yes, we have a per-vCPU stack on ARM.


> Or, in general, make sense out of the fact that the stack pointer
> register changes in such a way that, when we get back in do_softirq(),
> what's in the stack in the place where there was the 'cpu' local
> variable has (at least in some circumstances) changed?

I think yes, it could cause the smp_processor_id() mismatch.
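
A toy model of how a per-vCPU stack could make a cached 'cpu' local go
stale is sketched here. It is an illustration under stated assumptions,
not Xen's implementation: the stack frame is reduced to one field, and
all names (toy_vcpu, toy_current, toy_enter_softirq, toy_context_switch,
toy_cpu_local_is_valid) are hypothetical. The assumption is the one from
the thread: the 'cpu' local lives on the vCPU's own stack, so after a
context switch that returns, the code sees the frame of whichever vCPU
was switched in.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 2

/* One saved frame per vCPU: models the 'cpu' local on its own stack. */
struct toy_vcpu {
    unsigned int cached_cpu;  /* value the 'cpu' local held when saved */
};

/* Which vCPU (and so which stack) each physical CPU is running on. */
static struct toy_vcpu *toy_current[TOY_NR_CPUS];

/* vCPU v runs on pcpu and caches the CPU id in a stack local. */
void toy_enter_softirq(struct toy_vcpu *v, unsigned int pcpu)
{
    assert(pcpu < TOY_NR_CPUS);
    toy_current[pcpu] = v;
    v->cached_cpu = pcpu;
}

/*
 * Context switch on pcpu that returns: the stack pointer now refers to
 * next's stack, so the frame visible afterwards is next's frame.
 */
void toy_context_switch(unsigned int pcpu, struct toy_vcpu *next)
{
    assert(pcpu < TOY_NR_CPUS);
    toy_current[pcpu] = next;
}

/* True if the cached local still names the CPU actually executing. */
bool toy_cpu_local_is_valid(unsigned int pcpu)
{
    return toy_current[pcpu]->cached_cpu == pcpu;
}
```

If a vCPU that cached its id on CPU 1 is later switched in on CPU 0, the
check fails for CPU 0, which is the kind of mismatch with
smp_processor_id() discussed above.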


* Re: [xen-unstable-smoke test] 112957: regressions - trouble: broken/fail/pass
  2017-09-06 19:29               ` Stefano Stabellini
@ 2017-09-06 23:36                 ` Dario Faggioli
  2017-09-07 16:05                   ` Wei Liu
  0 siblings, 1 reply; 12+ messages in thread
From: Dario Faggioli @ 2017-09-06 23:36 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, George Dunlap, Andrew Cooper, George Dunlap,
	osstest service owner, Julien Grall, Julien Grall,
	andre.przywara


On Wed, 2017-09-06 at 12:29 -0700, Stefano Stabellini wrote:
> On Wed, 6 Sep 2017, Dario Faggioli wrote:
> > 
> > Or, in general, make sense out of the fact that the stack pointer
> > register changes in such a way that, when we get back in
> > do_softirq(),
> > what's in the stack in the place where there was the 'cpu' local
> > variable has (at least in some circumstances) changed?
> 
> I think yes, it could cause the smp_processor_id() mismatch.
>
Ok, then the patch was wrong (sorry again), and should stay reverted. 

I still find the comment very confusing (if correct at all), and I'll
probably send a new patch to improve it.

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: [xen-unstable-smoke test] 112957: regressions - trouble: broken/fail/pass
  2017-09-06 23:36                 ` Dario Faggioli
@ 2017-09-07 16:05                   ` Wei Liu
  0 siblings, 0 replies; 12+ messages in thread
From: Wei Liu @ 2017-09-07 16:05 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	osstest service owner, George Dunlap, Julien Grall, Julien Grall,
	xen-devel, andre.przywara

On Thu, Sep 07, 2017 at 01:36:36AM +0200, Dario Faggioli wrote:
> On Wed, 2017-09-06 at 12:29 -0700, Stefano Stabellini wrote:
> > On Wed, 6 Sep 2017, Dario Faggioli wrote:
> > > 
> > > Or, in general, make sense out of the fact that the stack pointer
> > > register changes in such a way that, when we get back in
> > > do_softirq(),
> > > what's in the stack in the place where there was the 'cpu' local
> > > variable has (at least in some circumstances) changed?
> > 
> > I think yes, it could cause the smp_processor_id() mismatch.
> >
> Ok, then the patch was wrong (sorry again), and should stay reverted. 
> 
> I still find the comment very confusing (if correct at all), and I'll
> probably send a new patch to improve it.
> 

Yes please. :-)


Thread overview: 12 messages
2017-08-30 13:49 [xen-unstable-smoke test] 112957: regressions - trouble: broken/fail/pass osstest service owner
2017-08-30 13:54 ` Andrew Cooper
2017-08-30 14:15   ` George Dunlap
2017-08-31 15:53     ` Wei Liu
2017-09-02 15:39     ` Julien Grall
2017-09-04  8:46       ` George Dunlap
2017-09-05 15:50         ` Dario Faggioli
2017-09-05 22:06           ` Stefano Stabellini
2017-09-06 10:27             ` Dario Faggioli
2017-09-06 19:29               ` Stefano Stabellini
2017-09-06 23:36                 ` Dario Faggioli
2017-09-07 16:05                   ` Wei Liu
