* Scheduler regression in 4.7
@ 2016-08-11 11:35 Andrew Cooper
  2016-08-11 13:24 ` George Dunlap
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Cooper @ 2016-08-11 11:35 UTC (permalink / raw)
  To: George Dunlap; +Cc: Dario Faggioli, Jan Beulich, Xen-devel List

Hello,

XenServer testing has discovered a regression from recent changes in
staging-4.7.

The actual cause is _csched_cpu_pick() falling over LIST_POISON, which
happened to occur at the same time as a domain was shutting down.  The
instruction in question is `mov 0x10(%rax),%rax` which looks like
reverse list traversal.
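
For reference, the 0200200200200200 seen in %rax below is a list poison
value: list_del() in Xen, as in Linux, overwrites a removed element's
link pointers with non-canonical constants, so that any later traversal
through the stale element takes a #GP rather than silently corrupting
memory.  A minimal sketch of the pattern (illustrative only, not the
literal Xen source; LIST_POISON2's value is quoted later in this thread,
LIST_POISON1's is assumed by analogy with Linux):

    /* Sketch of Linux/Xen-style list poisoning.  Both constants are
     * non-canonical x86-64 addresses, so dereferencing them raises
     * #GP -- which is exactly the panic below. */
    #define LIST_POISON1 ((void *)0x0100100100100100UL) /* assumed      */
    #define LIST_POISON2 ((void *)0x0200200200200200UL) /* seen in %rax */

    struct list_head { struct list_head *next, *prev; };

    static inline void list_del(struct list_head *entry)
    {
        entry->next->prev = entry->prev;
        entry->prev->next = entry->next;
        entry->next = LIST_POISON1; /* forward walks through entry fault  */
        entry->prev = LIST_POISON2; /* backward walks through entry fault */
    }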

The regression is across the changes

xen-4.7/xen$ git lg d37c2b9^..f2160ba
* f2160ba - x86/mmcfg: Fix initalisation of variables in
pci_mmcfg_nvidia_mcp55() (6 days ago) <Andrew Cooper>
* 471a151 - xen: Remove buggy initial placement algorithm (6 days ago)
<George Dunlap>
* c732d3c - xen: Have schedulers revise initial placement (6 days ago)
<George Dunlap>
* d37c2b9 - x86/EFI + Live Patch: avoid symbol address truncation (6
days ago) <Jan Beulich>

and is almost certainly c732d3c.

The log is below although, being a non-debug build, it has mostly
stack rubble in the stack trace.

~Andrew

(XEN) [ 3315.431878] ----[ Xen-4.7.0-xs127546  x86_64  debug=n  Not
tainted ]----
(XEN) [ 3315.431884] CPU:    3
(XEN) [ 3315.431888] RIP:    e008:[<ffff82d08012944f>]
sched_credit.c#_csched_cpu_pick+0x1af/0x549
(XEN) [ 3315.431900] RFLAGS: 0000000000010206   CONTEXT: hypervisor (d0v6)
(XEN) [ 3315.431907] rax: 0200200200200200   rbx: 0000000000000006  
rcx: 0000000000000006
(XEN) [ 3315.431914] rdx: 0000003fbfc42580   rsi: ffff82d0802df3a0  
rdi: ffff83102dba7c78
(XEN) [ 3315.431919] rbp: ffff83102dba7d28   rsp: ffff83102dba7bb8  
r8:  0000000000000001
(XEN) [ 3315.431924] r9:  0000000000000001   r10: ffff82d080317528  
r11: 0000000000000000
(XEN) [ 3315.431930] r12: ffff831108d7a000   r13: 0000000000000040  
r14: ffff83110889e980
(XEN) [ 3315.431934] r15: 0000000000000000   cr0: 0000000080050033  
cr4: 00000000000426e0
(XEN) [ 3315.431939] cr3: 000000202036a000   cr2: ffff88013dc783d8
(XEN) [ 3315.431944] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss:
e010   cs: e008
(XEN) [ 3315.431952] Xen code around <ffff82d08012944f>
(sched_credit.c#_csched_cpu_pick+0x1af/0x549):
(XEN) [ 3315.431956]  18 48 8b 00 48 8b 40 28 <48> 8b 40 10 66 81 38 ff
7f 75 07 0f ab 9d 50 ff
(XEN) [ 3315.431973] Xen stack trace from rsp=ffff83102dba7bb8:
(XEN) [ 3315.431976]    000000012dba7c78 ffff83102db9d9c0
ffff82d0802da560 0000000100000002
(XEN) [ 3315.431984]    ffff8300bdb7e000 ffff82d080121afd
ffff82d08035ebb0 ffff82d08035eba8
(XEN) [ 3315.431992]    ff00000000000000 0000000100000028
00000011088c4001 0000000000000001
(XEN) [ 3315.431999]    ffff83102dba7c38 0000000000000000
0000000000000000 0000000000000000
(XEN) [ 3315.432005]    0000000000000000 0000000000000003
ffff83102dba7c98 0000000000000206
(XEN) [ 3315.432011]    0000000000000292 000000fb2dba7c78
0000000000000206 ffff82d08032ab78
(XEN) [ 3315.432018]    00000000fffddfb7 ffff82d080121a1c
ffff83102dba7ca8 000000002dba7ca8
(XEN) [ 3315.432025]    ff00000000000000 ffff830000000028
ffff83102dba7ce8 ffff82d08013dc34
(XEN) [ 3315.432032]    00000000ffffffff 0000000000000010
0000000000000048 0000000000000048
(XEN) [ 3315.432038]    0000000000000001 ffff83110889e8c0
ffff83102dba7d38 ffff82d08013dff0
(XEN) [ 3315.432045]    ffff83102dba7d28 ffff8300bdb7e000
ffff831108d7a000 0000000000000040
(XEN) [ 3315.432053]    ffff83110889e980 ffff83110889e8c0
ffff83102dba7d38 ffff82d080129804
(XEN) [ 3315.432060]    ffff83102dba7d78 ffff82d080129833
ffff83102dba7d98 ffff8300bdb7e000
(XEN) [ 3315.432068]    ffff831108d7a000 0000000000000040
0000000000000001 ffff83110889e8c0
(XEN) [ 3315.432074]    ffff83102dba7db8 ffff82d08012f930
0000000000000006 ffff8300bdb7e000
(XEN) [ 3315.432081]    ffff831108d7a000 000000000000001b
0000000000000006 0000000000000020
(XEN) [ 3315.432087]    ffff83102dba7de8 ffff82d080107847
ffff831108d7a000 0000000000000006
(XEN) [ 3315.432095]    00007f9f7007b004 ffff83102db9d9c0
ffff83102dba7f08 ffff82d08010537c
(XEN) [ 3315.432102]    ffff8300bd8fd000 07ff830000000000
000000000000001b 000000000000001b
(XEN) [ 3315.432109]    ffff8310031540c0 0000000000000003
ffff83102dba7e48 ffff83103ffe37c0
(XEN) [ 3315.432116] Xen call trace:
(XEN) [ 3315.432122]    [<ffff82d08012944f>]
sched_credit.c#_csched_cpu_pick+0x1af/0x549
(XEN) [ 3315.432129]    [<ffff82d080121afd>]
page_alloc.c#alloc_heap_pages+0x604/0x6d7
(XEN) [ 3315.432135]    [<ffff82d080121a1c>]
page_alloc.c#alloc_heap_pages+0x523/0x6d7
(XEN) [ 3315.432141]    [<ffff82d08013dc34>] xmem_pool_alloc+0x43f/0x46d
(XEN) [ 3315.432147]    [<ffff82d08013dff0>] _xmalloc+0xcb/0x1fc
(XEN) [ 3315.432153]    [<ffff82d080129804>]
sched_credit.c#csched_cpu_pick+0x1b/0x1d
(XEN) [ 3315.432160]    [<ffff82d080129833>]
sched_credit.c#csched_vcpu_insert+0x2d/0x14f
(XEN) [ 3315.432166]    [<ffff82d08012f930>] sched_init_vcpu+0x24e/0x2ec
(XEN) [ 3315.432173]    [<ffff82d080107847>] alloc_vcpu+0x1d1/0x2ca
(XEN) [ 3315.432178]    [<ffff82d08010537c>] do_domctl+0x98f/0x1de3
(XEN) [ 3315.432189]    [<ffff82d08022ac5b>] lstar_enter+0x9b/0xa0
(XEN) [ 3315.432192]
(XEN) [ 3317.105524]
(XEN) [ 3317.114726] ****************************************
(XEN) [ 3317.139954] Panic on CPU 3:
(XEN) [ 3317.155197] GENERAL PROTECTION FAULT
(XEN) [ 3317.174247] [error_code=0000]
(XEN) [ 3317.190248] ****************************************
(XEN) [ 3317.215469]
(XEN) [ 3317.224674] Reboot in five seconds...
(XEN) [ 3317.243913] Executing kexec image on cpu3
(XEN) [ 3317.265338] Shot down all CPUs

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* Re: Scheduler regression in 4.7
  2016-08-11 11:35 Scheduler regression in 4.7 Andrew Cooper
@ 2016-08-11 13:24 ` George Dunlap
  2016-08-11 13:39   ` Andrew Cooper
  0 siblings, 1 reply; 6+ messages in thread
From: George Dunlap @ 2016-08-11 13:24 UTC (permalink / raw)
  To: Andrew Cooper, George Dunlap; +Cc: Dario Faggioli, Jan Beulich, Xen-devel List

On 11/08/16 12:35, Andrew Cooper wrote:
> Hello,
> 
> XenServer testing has discovered a regression from recent changes in
> staging-4.7.
> 
> The actual cause is _csched_cpu_pick() falling over LIST_POISON, which
> happened to occur at the same time as a domain was shutting down.  The
> instruction in question is `mov 0x10(%rax),%rax` which looks like
> reverse list traversal.

I don't see in sched_credit.c:_csched_cpu_pick() where any list
traversal happens.  The instruction above could easily be any pointer
dereference (although you'd normally expect pointers to be either valid
or NULL).

Could you use addr2line or objdump -dl to get a better idea where the
#GP is happening?

 -George

> 
> The regression is across the changes
> 
> xen-4.7/xen$ git lg d37c2b9^..f2160ba
> * f2160ba - x86/mmcfg: Fix initalisation of variables in
> pci_mmcfg_nvidia_mcp55() (6 days ago) <Andrew Cooper>
> * 471a151 - xen: Remove buggy initial placement algorithm (6 days ago)
> <George Dunlap>
> * c732d3c - xen: Have schedulers revise initial placement (6 days ago)
> <George Dunlap>
> * d37c2b9 - x86/EFI + Live Patch: avoid symbol address truncation (6
> days ago) <Jan Beulich>
> 
> and is almost certainly c732d3c.
> 
> The log is below, although being a non-debug build, has mostly stack
> rubble in the stack trace.
> 
> ~Andrew
> 
> (XEN) [ 3315.431878] ----[ Xen-4.7.0-xs127546  x86_64  debug=n  Not
> tainted ]----
> (XEN) [ 3315.431884] CPU:    3
> (XEN) [ 3315.431888] RIP:    e008:[<ffff82d08012944f>]
> sched_credit.c#_csched_cpu_pick+0x1af/0x549
> (XEN) [ 3315.431900] RFLAGS: 0000000000010206   CONTEXT: hypervisor (d0v6)
> (XEN) [ 3315.431907] rax: 0200200200200200   rbx: 0000000000000006  
> rcx: 0000000000000006
> (XEN) [ 3315.431914] rdx: 0000003fbfc42580   rsi: ffff82d0802df3a0  
> rdi: ffff83102dba7c78
> (XEN) [ 3315.431919] rbp: ffff83102dba7d28   rsp: ffff83102dba7bb8  
> r8:  0000000000000001
> (XEN) [ 3315.431924] r9:  0000000000000001   r10: ffff82d080317528  
> r11: 0000000000000000
> (XEN) [ 3315.431930] r12: ffff831108d7a000   r13: 0000000000000040  
> r14: ffff83110889e980
> (XEN) [ 3315.431934] r15: 0000000000000000   cr0: 0000000080050033  
> cr4: 00000000000426e0
> (XEN) [ 3315.431939] cr3: 000000202036a000   cr2: ffff88013dc783d8
> (XEN) [ 3315.431944] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss:
> e010   cs: e008
> (XEN) [ 3315.431952] Xen code around <ffff82d08012944f>
> (sched_credit.c#_csched_cpu_pick+0x1af/0x549):
> (XEN) [ 3315.431956]  18 48 8b 00 48 8b 40 28 <48> 8b 40 10 66 81 38 ff
> 7f 75 07 0f ab 9d 50 ff
> (XEN) [ 3315.431973] Xen stack trace from rsp=ffff83102dba7bb8:
> (XEN) [ 3315.431976]    000000012dba7c78 ffff83102db9d9c0
> ffff82d0802da560 0000000100000002
> (XEN) [ 3315.431984]    ffff8300bdb7e000 ffff82d080121afd
> ffff82d08035ebb0 ffff82d08035eba8
> (XEN) [ 3315.431992]    ff00000000000000 0000000100000028
> 00000011088c4001 0000000000000001
> (XEN) [ 3315.431999]    ffff83102dba7c38 0000000000000000
> 0000000000000000 0000000000000000
> (XEN) [ 3315.432005]    0000000000000000 0000000000000003
> ffff83102dba7c98 0000000000000206
> (XEN) [ 3315.432011]    0000000000000292 000000fb2dba7c78
> 0000000000000206 ffff82d08032ab78
> (XEN) [ 3315.432018]    00000000fffddfb7 ffff82d080121a1c
> ffff83102dba7ca8 000000002dba7ca8
> (XEN) [ 3315.432025]    ff00000000000000 ffff830000000028
> ffff83102dba7ce8 ffff82d08013dc34
> (XEN) [ 3315.432032]    00000000ffffffff 0000000000000010
> 0000000000000048 0000000000000048
> (XEN) [ 3315.432038]    0000000000000001 ffff83110889e8c0
> ffff83102dba7d38 ffff82d08013dff0
> (XEN) [ 3315.432045]    ffff83102dba7d28 ffff8300bdb7e000
> ffff831108d7a000 0000000000000040
> (XEN) [ 3315.432053]    ffff83110889e980 ffff83110889e8c0
> ffff83102dba7d38 ffff82d080129804
> (XEN) [ 3315.432060]    ffff83102dba7d78 ffff82d080129833
> ffff83102dba7d98 ffff8300bdb7e000
> (XEN) [ 3315.432068]    ffff831108d7a000 0000000000000040
> 0000000000000001 ffff83110889e8c0
> (XEN) [ 3315.432074]    ffff83102dba7db8 ffff82d08012f930
> 0000000000000006 ffff8300bdb7e000
> (XEN) [ 3315.432081]    ffff831108d7a000 000000000000001b
> 0000000000000006 0000000000000020
> (XEN) [ 3315.432087]    ffff83102dba7de8 ffff82d080107847
> ffff831108d7a000 0000000000000006
> (XEN) [ 3315.432095]    00007f9f7007b004 ffff83102db9d9c0
> ffff83102dba7f08 ffff82d08010537c
> (XEN) [ 3315.432102]    ffff8300bd8fd000 07ff830000000000
> 000000000000001b 000000000000001b
> (XEN) [ 3315.432109]    ffff8310031540c0 0000000000000003
> ffff83102dba7e48 ffff83103ffe37c0
> (XEN) [ 3315.432116] Xen call trace:
> (XEN) [ 3315.432122]    [<ffff82d08012944f>]
> sched_credit.c#_csched_cpu_pick+0x1af/0x549
> (XEN) [ 3315.432129]    [<ffff82d080121afd>]
> page_alloc.c#alloc_heap_pages+0x604/0x6d7
> (XEN) [ 3315.432135]    [<ffff82d080121a1c>]
> page_alloc.c#alloc_heap_pages+0x523/0x6d7
> (XEN) [ 3315.432141]    [<ffff82d08013dc34>] xmem_pool_alloc+0x43f/0x46d
> (XEN) [ 3315.432147]    [<ffff82d08013dff0>] _xmalloc+0xcb/0x1fc
> (XEN) [ 3315.432153]    [<ffff82d080129804>]
> sched_credit.c#csched_cpu_pick+0x1b/0x1d
> (XEN) [ 3315.432160]    [<ffff82d080129833>]
> sched_credit.c#csched_vcpu_insert+0x2d/0x14f
> (XEN) [ 3315.432166]    [<ffff82d08012f930>] sched_init_vcpu+0x24e/0x2ec
> (XEN) [ 3315.432173]    [<ffff82d080107847>] alloc_vcpu+0x1d1/0x2ca
> (XEN) [ 3315.432178]    [<ffff82d08010537c>] do_domctl+0x98f/0x1de3
> (XEN) [ 3315.432189]    [<ffff82d08022ac5b>] lstar_enter+0x9b/0xa0
> (XEN) [ 3315.432192]
> (XEN) [ 3317.105524]
> (XEN) [ 3317.114726] ****************************************
> (XEN) [ 3317.139954] Panic on CPU 3:
> (XEN) [ 3317.155197] GENERAL PROTECTION FAULT
> (XEN) [ 3317.174247] [error_code=0000]
> (XEN) [ 3317.190248] ****************************************
> (XEN) [ 3317.215469]
> (XEN) [ 3317.224674] Reboot in five seconds...
> (XEN) [ 3317.243913] Executing kexec image on cpu3
> (XEN) [ 3317.265338] Shot down all CPUs
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* Re: Scheduler regression in 4.7
  2016-08-11 13:24 ` George Dunlap
@ 2016-08-11 13:39   ` Andrew Cooper
  2016-08-11 14:28     ` Dario Faggioli
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Cooper @ 2016-08-11 13:39 UTC (permalink / raw)
  To: George Dunlap, George Dunlap; +Cc: Dario Faggioli, Jan Beulich, Xen-devel List

On 11/08/16 14:24, George Dunlap wrote:
> On 11/08/16 12:35, Andrew Cooper wrote:
>> Hello,
>>
>> XenServer testing has discovered a regression from recent changes in
>> staging-4.7.
>>
>> The actual cause is _csched_cpu_pick() falling over LIST_POISON, which
>> happened to occur at the same time as a domain was shutting down.  The
>> instruction in question is `mov 0x10(%rax),%rax` which looks like
>> reverse list traversal.
> I don't see in sched_credit.c:_csched_cpu_pick() where any list
> traversal happens.  The instruction above could easily be any pointer
> dereference (although you'd normally expect pointers to be either valid
> or NULL).
>
> Could you use addr2line or objdump -dl to get a better idea where the
> #GP is happening?

addr2line -e xen-syms-4.7.0-xs127493 ffff82d08012944f
/obj/RPM_BUILD_DIRECTORY/xen-4.7.0/xen/common/sched_credit.c:775
(discriminator 1)

It will be IS_RUNQ_IDLE() which is the problem.
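
For reference, IS_RUNQ_IDLE() boils down to checking whether a cpu's
runq contains anything other than the idle vcpu.  Roughly (this is a
paraphrase of the macro in common/sched_credit.c, so treat the exact
helper names as approximate):

    /* Peek at the first element of the cpu's runq and check whether it
     * is the idle vcpu.  If that element has just been list_del()'d by
     * another pcpu, the chained loads here walk straight into the
     * poison values. */
    #define IS_RUNQ_IDLE(_cpu)  (list_empty(RUNQ(_cpu)) || \
                                 is_idle_vcpu(__runq_elem(RUNQ(_cpu)->next)->vcpu))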


For linked lists, the pointers are deliberately poisoned when list
elements are deleted, to catch bugs like this.

*** include/asm-x86/config.h:
<global>[88]                   #define LIST_POISON2 ((void *)0x0200200200200200UL)

but it is sufficiently recognisable that one could spot it in %rax before
disassembling the faulting instruction.

~Andrew

>
>  -George
>
>> The regression is across the changes
>>
>> xen-4.7/xen$ git lg d37c2b9^..f2160ba
>> * f2160ba - x86/mmcfg: Fix initalisation of variables in
>> pci_mmcfg_nvidia_mcp55() (6 days ago) <Andrew Cooper>
>> * 471a151 - xen: Remove buggy initial placement algorithm (6 days ago)
>> <George Dunlap>
>> * c732d3c - xen: Have schedulers revise initial placement (6 days ago)
>> <George Dunlap>
>> * d37c2b9 - x86/EFI + Live Patch: avoid symbol address truncation (6
>> days ago) <Jan Beulich>
>>
>> and is almost certainly c732d3c.
>>
>> The log is below, although being a non-debug build, has mostly stack
>> rubble in the stack trace.
>>
>> ~Andrew
>>
>> (XEN) [ 3315.431878] ----[ Xen-4.7.0-xs127546  x86_64  debug=n  Not
>> tainted ]----
>> (XEN) [ 3315.431884] CPU:    3
>> (XEN) [ 3315.431888] RIP:    e008:[<ffff82d08012944f>]
>> sched_credit.c#_csched_cpu_pick+0x1af/0x549
>> (XEN) [ 3315.431900] RFLAGS: 0000000000010206   CONTEXT: hypervisor (d0v6)
>> (XEN) [ 3315.431907] rax: 0200200200200200   rbx: 0000000000000006  
>> rcx: 0000000000000006
>> (XEN) [ 3315.431914] rdx: 0000003fbfc42580   rsi: ffff82d0802df3a0  
>> rdi: ffff83102dba7c78
>> (XEN) [ 3315.431919] rbp: ffff83102dba7d28   rsp: ffff83102dba7bb8  
>> r8:  0000000000000001
>> (XEN) [ 3315.431924] r9:  0000000000000001   r10: ffff82d080317528  
>> r11: 0000000000000000
>> (XEN) [ 3315.431930] r12: ffff831108d7a000   r13: 0000000000000040  
>> r14: ffff83110889e980
>> (XEN) [ 3315.431934] r15: 0000000000000000   cr0: 0000000080050033  
>> cr4: 00000000000426e0
>> (XEN) [ 3315.431939] cr3: 000000202036a000   cr2: ffff88013dc783d8
>> (XEN) [ 3315.431944] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss:
>> e010   cs: e008
>> (XEN) [ 3315.431952] Xen code around <ffff82d08012944f>
>> (sched_credit.c#_csched_cpu_pick+0x1af/0x549):
>> (XEN) [ 3315.431956]  18 48 8b 00 48 8b 40 28 <48> 8b 40 10 66 81 38 ff
>> 7f 75 07 0f ab 9d 50 ff
>> (XEN) [ 3315.431973] Xen stack trace from rsp=ffff83102dba7bb8:
>> (XEN) [ 3315.431976]    000000012dba7c78 ffff83102db9d9c0
>> ffff82d0802da560 0000000100000002
>> (XEN) [ 3315.431984]    ffff8300bdb7e000 ffff82d080121afd
>> ffff82d08035ebb0 ffff82d08035eba8
>> (XEN) [ 3315.431992]    ff00000000000000 0000000100000028
>> 00000011088c4001 0000000000000001
>> (XEN) [ 3315.431999]    ffff83102dba7c38 0000000000000000
>> 0000000000000000 0000000000000000
>> (XEN) [ 3315.432005]    0000000000000000 0000000000000003
>> ffff83102dba7c98 0000000000000206
>> (XEN) [ 3315.432011]    0000000000000292 000000fb2dba7c78
>> 0000000000000206 ffff82d08032ab78
>> (XEN) [ 3315.432018]    00000000fffddfb7 ffff82d080121a1c
>> ffff83102dba7ca8 000000002dba7ca8
>> (XEN) [ 3315.432025]    ff00000000000000 ffff830000000028
>> ffff83102dba7ce8 ffff82d08013dc34
>> (XEN) [ 3315.432032]    00000000ffffffff 0000000000000010
>> 0000000000000048 0000000000000048
>> (XEN) [ 3315.432038]    0000000000000001 ffff83110889e8c0
>> ffff83102dba7d38 ffff82d08013dff0
>> (XEN) [ 3315.432045]    ffff83102dba7d28 ffff8300bdb7e000
>> ffff831108d7a000 0000000000000040
>> (XEN) [ 3315.432053]    ffff83110889e980 ffff83110889e8c0
>> ffff83102dba7d38 ffff82d080129804
>> (XEN) [ 3315.432060]    ffff83102dba7d78 ffff82d080129833
>> ffff83102dba7d98 ffff8300bdb7e000
>> (XEN) [ 3315.432068]    ffff831108d7a000 0000000000000040
>> 0000000000000001 ffff83110889e8c0
>> (XEN) [ 3315.432074]    ffff83102dba7db8 ffff82d08012f930
>> 0000000000000006 ffff8300bdb7e000
>> (XEN) [ 3315.432081]    ffff831108d7a000 000000000000001b
>> 0000000000000006 0000000000000020
>> (XEN) [ 3315.432087]    ffff83102dba7de8 ffff82d080107847
>> ffff831108d7a000 0000000000000006
>> (XEN) [ 3315.432095]    00007f9f7007b004 ffff83102db9d9c0
>> ffff83102dba7f08 ffff82d08010537c
>> (XEN) [ 3315.432102]    ffff8300bd8fd000 07ff830000000000
>> 000000000000001b 000000000000001b
>> (XEN) [ 3315.432109]    ffff8310031540c0 0000000000000003
>> ffff83102dba7e48 ffff83103ffe37c0
>> (XEN) [ 3315.432116] Xen call trace:
>> (XEN) [ 3315.432122]    [<ffff82d08012944f>]
>> sched_credit.c#_csched_cpu_pick+0x1af/0x549
>> (XEN) [ 3315.432129]    [<ffff82d080121afd>]
>> page_alloc.c#alloc_heap_pages+0x604/0x6d7
>> (XEN) [ 3315.432135]    [<ffff82d080121a1c>]
>> page_alloc.c#alloc_heap_pages+0x523/0x6d7
>> (XEN) [ 3315.432141]    [<ffff82d08013dc34>] xmem_pool_alloc+0x43f/0x46d
>> (XEN) [ 3315.432147]    [<ffff82d08013dff0>] _xmalloc+0xcb/0x1fc
>> (XEN) [ 3315.432153]    [<ffff82d080129804>]
>> sched_credit.c#csched_cpu_pick+0x1b/0x1d
>> (XEN) [ 3315.432160]    [<ffff82d080129833>]
>> sched_credit.c#csched_vcpu_insert+0x2d/0x14f
>> (XEN) [ 3315.432166]    [<ffff82d08012f930>] sched_init_vcpu+0x24e/0x2ec
>> (XEN) [ 3315.432173]    [<ffff82d080107847>] alloc_vcpu+0x1d1/0x2ca
>> (XEN) [ 3315.432178]    [<ffff82d08010537c>] do_domctl+0x98f/0x1de3
>> (XEN) [ 3315.432189]    [<ffff82d08022ac5b>] lstar_enter+0x9b/0xa0
>> (XEN) [ 3315.432192]
>> (XEN) [ 3317.105524]
>> (XEN) [ 3317.114726] ****************************************
>> (XEN) [ 3317.139954] Panic on CPU 3:
>> (XEN) [ 3317.155197] GENERAL PROTECTION FAULT
>> (XEN) [ 3317.174247] [error_code=0000]
>> (XEN) [ 3317.190248] ****************************************
>> (XEN) [ 3317.215469]
>> (XEN) [ 3317.224674] Reboot in five seconds...
>> (XEN) [ 3317.243913] Executing kexec image on cpu3
>> (XEN) [ 3317.265338] Shot down all CPUs
>>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* Re: Scheduler regression in 4.7
  2016-08-11 13:39   ` Andrew Cooper
@ 2016-08-11 14:28     ` Dario Faggioli
  2016-08-11 15:42       ` Andrew Cooper
  0 siblings, 1 reply; 6+ messages in thread
From: Dario Faggioli @ 2016-08-11 14:28 UTC (permalink / raw)
  To: Andrew Cooper, George Dunlap, George Dunlap; +Cc: Jan Beulich, Xen-devel List


On Thu, 2016-08-11 at 14:39 +0100, Andrew Cooper wrote:
> On 11/08/16 14:24, George Dunlap wrote:
> > On 11/08/16 12:35, Andrew Cooper wrote:
> > > The actual cause is _csched_cpu_pick() falling over LIST_POISON,
> > > which
> > > happened to occur at the same time as a domain was shutting
> > > down.  The
> > > instruction in question is `mov 0x10(%rax),%rax` which looks like
> > > reverse list traversal.
>
Thanks for the report.

> > Could you use addr2line or objdump -dl to get a better idea where
> > the
> > #GP is happening?
> addr2line -e xen-syms-4.7.0-xs127493 ffff82d08012944f
> /obj/RPM_BUILD_DIRECTORY/xen-4.7.0/xen/common/sched_credit.c:775
> (discriminator 1)
> 
> It will be IS_RUNQ_IDLE() which is the problem.
> 
Ok, that does one step of list traversing (the runq). What I didn't
understand from your report is what crashed when.

IS_RUNQ_IDLE() was introduced a while back and nothing like this has
ever been caught so far. George's patch makes _csched_cpu_pick() be
called during insert_vcpu()-->csched_vcpu_insert() which, in 4.7, is
called:
 1) during domain (well, vcpu) creation,
 2) when domain is moved among cpupools

AFAICR, during domain destruction we basically move the domain to
cpupool0, and without a patch that I sent recently, that is always done
as a full fledged cpupool movement, even if the domain is _already_ in
cpupool0. So, even if you are not using cpupools, and since you mention
domain shutdown we probably are looking at 2).

But this is what I'm not sure I got well... Do you have enough info to
tell precisely when the crash manifests? Is it indeed during a domain
shutdown, or was it during a domain creation (sched_init_vcpu() is in
the stack trace... although I've read it's a non-debug one)? And is it
a 'regular' domain or dom0 that is shutting down/coming up?

The idea behind IS_RUNQ_IDLE() is that we need to know whether there is
someone in the runq of a cpu or not, to correctly initialize --and
hence avoid biasing-- some load balancing calculations. I've never
liked the idea (let alone the code), but it's necessary (or, at
least, I don't see a sensible alternative).

The questions I'm asking above have the aim of figuring out what the
status of the runq could be, and why adding a call to csched_cpu_pick()
from insert_vcpu() is making things explode...

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* Re: Scheduler regression in 4.7
  2016-08-11 14:28     ` Dario Faggioli
@ 2016-08-11 15:42       ` Andrew Cooper
  2016-08-12  3:32         ` Dario Faggioli
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Cooper @ 2016-08-11 15:42 UTC (permalink / raw)
  To: Dario Faggioli, George Dunlap, George Dunlap; +Cc: Jan Beulich, Xen-devel List

On 11/08/16 15:28, Dario Faggioli wrote:
> On Thu, 2016-08-11 at 14:39 +0100, Andrew Cooper wrote:
>> On 11/08/16 14:24, George Dunlap wrote:
>>> On 11/08/16 12:35, Andrew Cooper wrote:
>>>> The actual cause is _csched_cpu_pick() falling over LIST_POISON,
>>>> which
>>>> happened to occur at the same time as a domain was shutting
>>>> down.  The
>>>> instruction in question is `mov 0x10(%rax),%rax` which looks like
>>>> reverse list traversal.
> Thanks for the report.
>
>>> Could you use line2addr or objdump -dl to get a better idea where
>>> the
>>> #GP is happening?
>> addr2line -e xen-syms-4.7.0-xs127493 ffff82d08012944f
>> /obj/RPM_BUILD_DIRECTORY/xen-4.7.0/xen/common/sched_credit.c:775
>> (discriminator 1)
>>
>> It will be IS_RUNQ_IDLE() which is the problem.
>>
> Ok, that does one step of list traversing (the runq). What I didn't
> understand from your report is what crashed when.

IS_RUNQ_IDLE() was traversing a list, and it encountered an element
which was being concurrently deleted on a different pcpu.

>
> IS_RUNQ_IDLE() was introduced a while back and nothing like this has
> ever been caught so far. George's patch makes _csched_cpu_pick() be
> called during insert_vcpu()-->csched_vcpu_insert() which, in 4.7, is
> called:
>  1) during domain (well, vcpu) creation,
>  2) when domain is moved among cpupools
>
> AFAICR, during domain destruction we basically move the domain to
> cpupool0, and without a patch that I sent recently, that is always done
> as a full fledged cpupool movement, even if the domain is _already_ in
> cpupool0. So, even if you are not using cpupools, and since you mention
> domain shutdown we probably are looking at 2).

XenServer doesn't use any cpupools, so all pcpus and vcpus are in cpupool0.

>
> But this is what I'm not sure I got well... Do you have enough info to
> tell precisely when the crash manifests? Is it indeed during a domain
> shutdown, or was it during a domain creation (sched_init_vcpu() is in
> the stack trace... although I've read it's a non-debug one)? And is it
> a 'regular' domain or dom0 that is shutting down/coming up?

It is a VM reboot of an HVM domU (CentOS 7 64bit, although I doubt
that is relevant).

The testcase is vm lifecycle ops on a 32vcpu VM, on a host which happens
to have 32pcpus.

>
> The idea behind IS_RUNQ_IDLE() is that we need to know whether there is
> someone in the runq of a cpu or not, to correctly initialize --and
> hence avoid biasing-- some load balancing calculations. I've never
> liked the idea (let alone the code), but it's necessary (or, at
> least, I don't see a sensible alternative).
>
> The questions I'm asking above have the aim of figuring out what the
> status of the runq could be, and why adding a call to csched_cpu_pick()
> from insert_vcpu() is making things explode...

It turns out that the stack trace is rather less stack rubble than I
first thought.  We are in domain construction, and specifically the
XEN_DOMCTL_max_vcpus hypercall.  All other pcpus are in idle.

    for ( i = 0; i < max; i++ )
    {
            if ( d->vcpu[i] != NULL )
                continue;

            cpu = (i == 0) ?
                cpumask_any(online) :
                cpumask_cycle(d->vcpu[i-1]->processor, online);

            if ( alloc_vcpu(d, i, cpu) == NULL )
                goto maxvcpu_out;
    }

The cpumask_cycle() call is complete and execution has moved into
alloc_vcpu()

Unfortunately, none of the code around here spills i or cpu onto the
stack, so I can't see which values they have from the stack dump.

However, I see that csched_vcpu_insert() plays with vc->processor, which
surely invalidates the cycle logic behind this loop?

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* Re: Scheduler regression in 4.7
  2016-08-11 15:42       ` Andrew Cooper
@ 2016-08-12  3:32         ` Dario Faggioli
  0 siblings, 0 replies; 6+ messages in thread
From: Dario Faggioli @ 2016-08-12  3:32 UTC (permalink / raw)
  To: Andrew Cooper, George Dunlap, George Dunlap; +Cc: Jan Beulich, Xen-devel List


On Thu, 2016-08-11 at 16:42 +0100, Andrew Cooper wrote:
> On 11/08/16 15:28, Dario Faggioli wrote:
> > On Thu, 2016-08-11 at 14:39 +0100, Andrew Cooper wrote:
> > > It will be IS_RUNQ_IDLE() which is the problem.
> > > 
> > Ok, that does one step of list traversing (the runq). What I didn't
> > understand from your report is what crashed when.
> IS_RUNQ_IDLE() was traversing a list, and it encountered an element
> which was being concurrently deleted on a different pcpu.
> 
Yes, I figured it was a race like this; I was asking whether it was
boot, domain creation, domain shutdown, etc. (to which you replied
below, so thanks :-) ).

> > AFAICR, during domain destruction we basically move the domain to
> > cpupool0, and without a patch that I sent recently, that is always
> > done
> > as a full fledged cpupool movement, even if the domain is _already_
> > in
> > cpupool0. So, even if you are not using cpupools, and since you
> > mention
> > domain shutdown we probably are looking at 2).
> XenServer doesn't use any cpupools, so all pcpus and vcpus are in
> cpupool0.
> 
Right. But since you say the issue manifests during domain destruction,
this may still be triggered by this callchain:

  domain_kill(d)
    cpupool_move_domain(d, cpupool0)
      cpupool_move_domain_locked()
        sched_move_domain()
          SCHED_OP(insert_vcpu)
            csched_vcpu_insert()
              csched_cpu_pick()

Considering that, as I was saying, without f6bde162c4 (which is not in
4.7), even if the domain is already in cpupool0, sched_move_domain() is
called, and it then calls insert_vcpu(), etc.

Of course, f6bde162c4 is *not* the solution. It will mitigate the issue
in such a way that it won't show up if you don't really use cpupools,
but if there's a race, it may still happen when using cpupools and
destroying a domain in a pool other than cpupool0.

I have to say that, on staging-4.7, I am able to create, reboot and
destroy a domain, without seeing this issue... but again, if it's a
race, this certainly does not mean it's not there!

Also, below, you say that you think we're in domain construction
(which, I agree, is what the stack trace seems to indicate).

> It is a VM reboot of an HVM domU (CentOS 7 64bit, although I doubt
> that is relevant).
> 
> The testcase is vm lifecycle ops on a 32vcpu VM, on a host which
> happens
> to have 32pcpus.
> 
FWIW, I tried 16 vcpus on a host with 16 pcpus (and a 16-vcpu dom0).

> > The questions I'm asking above have the aim of figuring out what
> > the
> > status of the runq could be, and why adding a call to
> > csched_cpu_pick()
> > from insert_vcpu() is making things explode...
> It turns out that the stack trace is rather less stack rubble than I
> first thought.  We are in domain construction, and specifically the
> XEN_DOMCTL_max_vcpus hypercall.  All other pcpus are in idle.
> 
>     for ( i = 0; i < max; i++ )
>     {
>             if ( d->vcpu[i] != NULL )
>                 continue;
> 
>             cpu = (i == 0) ?
>                 cpumask_any(online) :
>                 cpumask_cycle(d->vcpu[i-1]->processor, online);
> 
>             if ( alloc_vcpu(d, i, cpu) == NULL )
>                 goto maxvcpu_out;
>     }
> 
> The cpumask_cycle() call is complete and execution has moved into
> alloc_vcpu()
> 
> Unfortunately, none of the code around here spills i or cpu onto the
> stack, so I can't see which values they have from the stack dump.
> 
> However, I see that csched_vcpu_insert() plays with vc->processor,
> which
> surely invalidates the cycle logic behind this loop?
> 
Yes, but I don't think that is a problem. The purpose of calling
csched_cpu_pick() from insert_vcpu() is to find a (the best?) placement
for the vcpu. That will have to be put in v->processor (and the vcpu
queued in the runq of that processor).

But the point is that _csched_cpu_pick(), in order to come up with a
(potentially) new (and better!) cpu for the vcpu, *reads* v->processor,
to know where v is now, and to take that into account in load balancing
calculations (mostly, idleness/SMT stuff). So, those calls to
cpumask_any() and cpumask_cycle() are useful as they give
csched_cpu_pick() something to start with, and it's ok for it to be
overridden.

But the fact is, IS_RUNQ_IDLE() *does* access the runq of that initial cpu
and, as explained here, for Credit1, this happens without holding the
proper spinlock:

    /* This is safe because vc isn't yet being scheduled */
    vc->processor = csched_cpu_pick(ops, vc);

    lock = vcpu_schedule_lock_irq(vc);

    if ( !__vcpu_on_runq(svc) && vcpu_runnable(vc) && !vc->is_running )
        __runq_insert(svc);

    vcpu_schedule_unlock_irq(lock, vc);

Which I think may well explain the race.

So, the solution appears to me to be to move the call to
csched_cpu_pick() inside the critical section. As a matter of fact, for
Credit2 it's already like that (while for RTDS, that may indeed not be
necessary).
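
To give an idea, what I have in mind is something along these lines
(just a sketch, not a tested patch; the locking details want double
checking, e.g. the pick can change v->processor, in which case the lock
we hold is the old cpu's one and needs re-taking):

    /* Do the pick while holding the scheduler lock, so IS_RUNQ_IDLE()
     * can never walk a runq concurrently with a deletion. */
    lock = vcpu_schedule_lock_irq(vc);

    vc->processor = csched_cpu_pick(ops, vc);

    /* vc->processor may have changed: drop the (old cpu's) lock and
     * take the one of the new cpu before touching its runq. */
    spin_unlock_irq(lock);
    lock = vcpu_schedule_lock_irq(vc);

    if ( !__vcpu_on_runq(svc) && vcpu_runnable(vc) && !vc->is_running )
        __runq_insert(svc);

    vcpu_schedule_unlock_irq(lock, vc);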

I guess the fact that the runq was actually being accessed in the
Credit1 case as well was hidden deep enough inside IS_RUNQ_IDLE() for
George not to notice it when doing the patch, nor for me to notice when
reviewing... sorry for that.

I'll send a patch for staging (which then will have to be backported).

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

