From: George Dunlap <george.dunlap@citrix.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>,
	George Dunlap <george.dunlap@eu.citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>,
	Jan Beulich <JBeulich@suse.com>,
	Xen-devel List <xen-devel@lists.xen.org>
Subject: Re: Scheduler regression in 4.7
Date: Thu, 11 Aug 2016 14:24:37 +0100
Message-ID: <07d7f503-f4e2-8955-0470-55ecbb532b88@citrix.com>
In-Reply-To: <0ebca08f-71bd-7b96-9182-caa66e4f370f@citrix.com>

On 11/08/16 12:35, Andrew Cooper wrote:
> Hello,
> 
> XenServer testing has discovered a regression from recent changes in
> staging-4.7.
> 
> The actual cause is _csched_cpu_pick() falling over LIST_POISON, which
> happened to occur at the same time as a domain was shutting down.  The
> instruction in question is `mov 0x10(%rax),%rax` which looks like
> reverse list traversal.

I don't see in sched_credit.c:_csched_cpu_pick() where any list
traversal happens.  The instruction above could easily be any pointer
dereference (although you'd normally expect pointers to be either valid
or NULL).

Could you use addr2line or objdump -dl to get a better idea where the
#GP is happening?

 -George

> 
> The regression is across the changes
> 
> xen-4.7/xen$ git lg d37c2b9^..f2160ba
> * f2160ba - x86/mmcfg: Fix initalisation of variables in
> pci_mmcfg_nvidia_mcp55() (6 days ago) <Andrew Cooper>
> * 471a151 - xen: Remove buggy initial placement algorithm (6 days ago)
> <George Dunlap>
> * c732d3c - xen: Have schedulers revise initial placement (6 days ago)
> <George Dunlap>
> * d37c2b9 - x86/EFI + Live Patch: avoid symbol address truncation (6
> days ago) <Jan Beulich>
> 
> and is almost certainly c732d3c.
> 
> The log is below, although being a non-debug build, has mostly stack
> rubble in the stack trace.
> 
> ~Andrew
> 
> (XEN) [ 3315.431878] ----[ Xen-4.7.0-xs127546  x86_64  debug=n  Not
> tainted ]----
> (XEN) [ 3315.431884] CPU:    3
> (XEN) [ 3315.431888] RIP:    e008:[<ffff82d08012944f>]
> sched_credit.c#_csched_cpu_pick+0x1af/0x549
> (XEN) [ 3315.431900] RFLAGS: 0000000000010206   CONTEXT: hypervisor (d0v6)
> (XEN) [ 3315.431907] rax: 0200200200200200   rbx: 0000000000000006  
> rcx: 0000000000000006
> (XEN) [ 3315.431914] rdx: 0000003fbfc42580   rsi: ffff82d0802df3a0  
> rdi: ffff83102dba7c78
> (XEN) [ 3315.431919] rbp: ffff83102dba7d28   rsp: ffff83102dba7bb8  
> r8:  0000000000000001
> (XEN) [ 3315.431924] r9:  0000000000000001   r10: ffff82d080317528  
> r11: 0000000000000000
> (XEN) [ 3315.431930] r12: ffff831108d7a000   r13: 0000000000000040  
> r14: ffff83110889e980
> (XEN) [ 3315.431934] r15: 0000000000000000   cr0: 0000000080050033  
> cr4: 00000000000426e0
> (XEN) [ 3315.431939] cr3: 000000202036a000   cr2: ffff88013dc783d8
> (XEN) [ 3315.431944] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss:
> e010   cs: e008
> (XEN) [ 3315.431952] Xen code around <ffff82d08012944f>
> (sched_credit.c#_csched_cpu_pick+0x1af/0x549):
> (XEN) [ 3315.431956]  18 48 8b 00 48 8b 40 28 <48> 8b 40 10 66 81 38 ff
> 7f 75 07 0f ab 9d 50 ff
> (XEN) [ 3315.431973] Xen stack trace from rsp=ffff83102dba7bb8:
> (XEN) [ 3315.431976]    000000012dba7c78 ffff83102db9d9c0
> ffff82d0802da560 0000000100000002
> (XEN) [ 3315.431984]    ffff8300bdb7e000 ffff82d080121afd
> ffff82d08035ebb0 ffff82d08035eba8
> (XEN) [ 3315.431992]    ff00000000000000 0000000100000028
> 00000011088c4001 0000000000000001
> (XEN) [ 3315.431999]    ffff83102dba7c38 0000000000000000
> 0000000000000000 0000000000000000
> (XEN) [ 3315.432005]    0000000000000000 0000000000000003
> ffff83102dba7c98 0000000000000206
> (XEN) [ 3315.432011]    0000000000000292 000000fb2dba7c78
> 0000000000000206 ffff82d08032ab78
> (XEN) [ 3315.432018]    00000000fffddfb7 ffff82d080121a1c
> ffff83102dba7ca8 000000002dba7ca8
> (XEN) [ 3315.432025]    ff00000000000000 ffff830000000028
> ffff83102dba7ce8 ffff82d08013dc34
> (XEN) [ 3315.432032]    00000000ffffffff 0000000000000010
> 0000000000000048 0000000000000048
> (XEN) [ 3315.432038]    0000000000000001 ffff83110889e8c0
> ffff83102dba7d38 ffff82d08013dff0
> (XEN) [ 3315.432045]    ffff83102dba7d28 ffff8300bdb7e000
> ffff831108d7a000 0000000000000040
> (XEN) [ 3315.432053]    ffff83110889e980 ffff83110889e8c0
> ffff83102dba7d38 ffff82d080129804
> (XEN) [ 3315.432060]    ffff83102dba7d78 ffff82d080129833
> ffff83102dba7d98 ffff8300bdb7e000
> (XEN) [ 3315.432068]    ffff831108d7a000 0000000000000040
> 0000000000000001 ffff83110889e8c0
> (XEN) [ 3315.432074]    ffff83102dba7db8 ffff82d08012f930
> 0000000000000006 ffff8300bdb7e000
> (XEN) [ 3315.432081]    ffff831108d7a000 000000000000001b
> 0000000000000006 0000000000000020
> (XEN) [ 3315.432087]    ffff83102dba7de8 ffff82d080107847
> ffff831108d7a000 0000000000000006
> (XEN) [ 3315.432095]    00007f9f7007b004 ffff83102db9d9c0
> ffff83102dba7f08 ffff82d08010537c
> (XEN) [ 3315.432102]    ffff8300bd8fd000 07ff830000000000
> 000000000000001b 000000000000001b
> (XEN) [ 3315.432109]    ffff8310031540c0 0000000000000003
> ffff83102dba7e48 ffff83103ffe37c0
> (XEN) [ 3315.432116] Xen call trace:
> (XEN) [ 3315.432122]    [<ffff82d08012944f>]
> sched_credit.c#_csched_cpu_pick+0x1af/0x549
> (XEN) [ 3315.432129]    [<ffff82d080121afd>]
> page_alloc.c#alloc_heap_pages+0x604/0x6d7
> (XEN) [ 3315.432135]    [<ffff82d080121a1c>]
> page_alloc.c#alloc_heap_pages+0x523/0x6d7
> (XEN) [ 3315.432141]    [<ffff82d08013dc34>] xmem_pool_alloc+0x43f/0x46d
> (XEN) [ 3315.432147]    [<ffff82d08013dff0>] _xmalloc+0xcb/0x1fc
> (XEN) [ 3315.432153]    [<ffff82d080129804>]
> sched_credit.c#csched_cpu_pick+0x1b/0x1d
> (XEN) [ 3315.432160]    [<ffff82d080129833>]
> sched_credit.c#csched_vcpu_insert+0x2d/0x14f
> (XEN) [ 3315.432166]    [<ffff82d08012f930>] sched_init_vcpu+0x24e/0x2ec
> (XEN) [ 3315.432173]    [<ffff82d080107847>] alloc_vcpu+0x1d1/0x2ca
> (XEN) [ 3315.432178]    [<ffff82d08010537c>] do_domctl+0x98f/0x1de3
> (XEN) [ 3315.432189]    [<ffff82d08022ac5b>] lstar_enter+0x9b/0xa0
> (XEN) [ 3315.432192]
> (XEN) [ 3317.105524]
> (XEN) [ 3317.114726] ****************************************
> (XEN) [ 3317.139954] Panic on CPU 3:
> (XEN) [ 3317.155197] GENERAL PROTECTION FAULT
> (XEN) [ 3317.174247] [error_code=0000]
> (XEN) [ 3317.190248] ****************************************
> (XEN) [ 3317.215469]
> (XEN) [ 3317.224674] Reboot in five seconds...
> (XEN) [ 3317.243913] Executing kexec image on cpu3
> (XEN) [ 3317.265338] Shot down all CPUs
> 


