From: George Dunlap <George.Dunlap@eu.citrix.com>
To: Andre Przywara <andre.przywara@amd.com>
Cc: Keir Fraser <keir@xen.org>,
	Juergen Gross <juergen.gross@ts.fujitsu.com>,
	"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>
Subject: Re: Hypervisor crash(!) on xl cpupool-numa-split
Date: Mon, 31 Jan 2011 15:28:54 +0000
Message-ID: <AANLkTi=ppBtb1nhdfbhGZa0Rt6kVyopdS3iJPr5fVA1x@mail.gmail.com>
In-Reply-To: <4D46CE4F.3090003@amd.com>

On Mon, Jan 31, 2011 at 2:59 PM, Andre Przywara <andre.przywara@amd.com> wrote:
> Right, that was also my impression.
>
> I seemed to get a bit further, though:
> By accident I found that in c/s 22846 the issue is fixed, it works now
> without crashing. I bisected it down to my own patch, which disables the
> NODEID_MSR in Dom0. I could confirm this theory by a) applying this single
> line (clear_bit(NODEID_MSR)) to 22799 and _not_ seeing it crash and b) by
> removing this line from 22846 and seeing it crash.
>
> So my theory is that Dom0 sees different nodes on its virtual CPUs via the
> physical NodeID MSR, but this association can (and will) be changed every
> moment by the Xen scheduler. So Dom0 will build a bogus topology based upon
> these values. As soon as all vCPUs of Dom0 are confined to one node (node
> 0, as a result of the cpupool-numa-split call), the Xen scheduler somehow
> hiccups.
> So it seems to be a bad combination of the NodeID MSR (on newer AMD
> platforms: sockets C32 and G34) and a NodeID-MSR-aware Dom0 (2.6.32.27).
> Since this is a hypervisor crash, I assume that the bug is still there; the
> current tip only makes it much less likely to be triggered.
>
> Hope that helps, I will dig deeper now.

Thanks.  The crashes you're getting are in fact very strange.  They
have to do with assumptions that the credit scheduler makes as part of
its accounting process.  It would only make sense for those to be
triggered if a vcpu was moved from one pool to another pool without
the proper accounting being done.  (Specifically, each vcpu is
classified as either "active" or "inactive"; and each scheduler
instance keeps track of the total weight of all "active" vcpus.  The
BUGs you're tripping over are saying that this invariant has been
violated.)  However, I've looked at the cpupools vcpu-migrate code,
and it looks like it does everything right.  So I'm a bit mystified.
My only thought is that a cpumask somewhere might not be getting set
properly, such that a vcpu was being run on a cpu from another
pool.
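
To make the invariant concrete, here is a minimal standalone sketch in
plain C.  It is not the credit scheduler's actual code: every name in it
(sketch_vcpu, sketch_sched, sketch_move, and so on) is hypothetical and
the structures are heavily simplified.  It only illustrates the idea that
each scheduler instance caches the total weight of its "active" vcpus,
and that moving a vcpu between pools without updating both totals leaves
the cached values inconsistent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; NOT the real Xen credit scheduler types. */
struct sketch_vcpu {
    unsigned int weight;
    bool active;
    int pool_id;                 /* pool whose scheduler accounts for this vcpu */
};

struct sketch_sched {
    int pool_id;
    unsigned int active_weight;  /* cached total weight of active vcpus */
};

/* Mark a vcpu active and add its weight to the cached total. */
static void sketch_activate(struct sketch_sched *s, struct sketch_vcpu *v)
{
    if (!v->active) {
        v->active = true;
        s->active_weight += v->weight;
    }
}

/* Mark a vcpu inactive and subtract its weight from the cached total. */
static void sketch_deactivate(struct sketch_sched *s, struct sketch_vcpu *v)
{
    if (v->active) {
        assert(s->active_weight >= v->weight);  /* total must cover this vcpu */
        v->active = false;
        s->active_weight -= v->weight;
    }
}

/* Recompute the total from scratch and compare it with the cached value. */
static bool sketch_invariant_holds(const struct sketch_sched *s,
                                   const struct sketch_vcpu *vcpus, size_t n)
{
    unsigned int sum = 0;

    for (size_t i = 0; i < n; i++)
        if (vcpus[i].pool_id == s->pool_id && vcpus[i].active)
            sum += vcpus[i].weight;

    return sum == s->active_weight;
}

/* A move with accounting: leave the old total, then join the new one. */
static void sketch_move(struct sketch_sched *from, struct sketch_sched *to,
                        struct sketch_vcpu *v)
{
    bool was_active = v->active;

    sketch_deactivate(from, v);
    v->pool_id = to->pool_id;
    if (was_active)
        sketch_activate(to, v);
}
```

In this model, changing a vcpu's pool_id directly instead of going
through sketch_move() makes sketch_invariant_holds() fail for both
pools, which is the kind of inconsistency I would expect behind the
BUGs above.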

Unfortunately I can't take a good look at this right now; hopefully
I'll be able to take a look next week.

Andre, if you were keen, you might go through the credit code and put
in a bunch of ASSERTs that the current pcpu is in the mask of the
current vcpu; and that the current vcpu is assigned to the pool of the
current pcpu, and so on.
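
As a rough illustration of the kind of checks I mean, here is a
standalone sketch.  It uses the standard C assert() and made-up types
(chk_cpumask_t, chk_pcpu, chk_vcpu) rather than the hypervisor's own
ASSERT macro, cpumask helpers and structures, so treat it as a shape to
follow and not as something that applies to the tree as is.  The mask is
assumed to fit in 64 bits purely to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real code has its own cpumask and vcpu types. */
typedef uint64_t chk_cpumask_t;      /* bit n set means pcpu n is allowed */

struct chk_pcpu {
    unsigned int id;
    int pool_id;                     /* pool this physical cpu belongs to */
};

struct chk_vcpu {
    chk_cpumask_t allowed;           /* pcpus this vcpu may run on */
    int pool_id;                     /* pool this vcpu is assigned to */
};

/* True if the given pcpu number is set in the mask. */
static bool chk_cpu_in_mask(unsigned int cpu, chk_cpumask_t mask)
{
    return cpu < 64 && ((mask >> cpu) & 1u);
}

/* Both conditions together: pcpu is in the vcpu's mask, and pools match. */
static bool chk_placement_ok(const struct chk_vcpu *v, const struct chk_pcpu *p)
{
    return chk_cpu_in_mask(p->id, v->allowed) && v->pool_id == p->pool_id;
}

/* The checks to sprinkle wherever "current vcpu" meets "current pcpu". */
static void chk_assert_placement(const struct chk_vcpu *v,
                                 const struct chk_pcpu *p)
{
    assert(chk_cpu_in_mask(p->id, v->allowed));  /* pcpu in vcpu's mask */
    assert(v->pool_id == p->pool_id);            /* vcpu in pcpu's pool */
}
```

Keeping the two conditions as separate asserts means the failing line
tells you directly whether it was the mask or the pool assignment that
was wrong.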

 -George
