From: George Dunlap <dunlapg@umich.edu>
To: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>
Subject: Re: [Hackathon minutes] PV frontends/backends and NUMA machines
Date: Mon, 20 May 2013 14:48:50 +0100
Message-ID: <CAFLBxZbonzEwo4mF6PTSq6WQjU2haN_Ray-Z_3Td83i=f7zsbA@mail.gmail.com>
In-Reply-To: <alpine.DEB.2.02.1305201443510.4799@kaball.uk.xensource.com>

On Mon, May 20, 2013 at 2:44 PM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> Hi all,
> these are my notes from the discussion that we had at the Hackathon
> regarding PV frontends and backends running on NUMA machines.
>
>
> ---
>
> The problem: how can we make sure that frontends and backends run in the
> same NUMA node?
>
> We would need to run one backend kthread per NUMA node: we already have
> one kthread per netback vif (one per guest); we could pin each of them
> to a different NUMA node, the same one the frontend is running on.
>
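
Just to make the kthread-pinning idea concrete, here is a rough, untested
sketch of what the dom0 side might look like.  The xenvif names are only
approximate, and how the vif learns which node its frontend is on is
hand-waved:

/*
 * Untested sketch: start a per-vif netback kthread restricted to the
 * cpus of one NUMA node.  The xenvif names are approximate, and how
 * the vif learns its frontend's node is hand-waved.
 */
#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/cpumask.h>
#include <linux/topology.h>

static struct task_struct *start_vif_kthread(struct xenvif *vif, int node)
{
    struct task_struct *task;

    task = kthread_create(xenvif_kthread, vif, "vif%d-node%d",
                          vif->domid, node);
    if (IS_ERR(task))
        return task;

    /* Keep the thread on the (dom0 v)cpus that belong to 'node'. */
    set_cpus_allowed_ptr(task, cpumask_of_node(node));
    wake_up_process(task);
    return task;
}
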
> But that means that dom0 would be running on several NUMA nodes at once;
> how much of a performance penalty would that be?
> We would need to export NUMA information to dom0, so that dom0 can make
> smart decisions on memory allocations, and we would also need to allocate
> memory for dom0 from multiple nodes.
>
> We need a way to automatically allocate the initial dom0 memory in Xen
> in a NUMA-aware way and we need Xen to automatically create one dom0 vcpu
> per NUMA node.
>
> After dom0 boots, the toolstack is going to decide where to place any
> new guests: it allocates the memory from the NUMA node it wants to run
> the guest on, and it asks dom0 to allocate the kthread on that node too
> (maybe by writing the NUMA node to xenstore).
>
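
The xenstore part could be as small as the toolstack writing the chosen
node under the vif's backend path; a sketch using libxenstore (the
"numa-node" key is made up, nothing reads it today):

/*
 * Sketch: record the chosen NUMA node for a vif under its backend path
 * so that dom0 can pin the matching kthread.  The "numa-node" key is
 * made up.
 */
#include <stdio.h>
#include <string.h>
#include <stdbool.h>
#include <xenstore.h>

static bool write_vif_numa_node(struct xs_handle *xsh, int domid,
                                int devid, int node)
{
    char path[128], val[16];

    snprintf(path, sizeof(path),
             "/local/domain/0/backend/vif/%d/%d/numa-node", domid, devid);
    snprintf(val, sizeof(val), "%d", node);

    return xs_write(xsh, XBT_NULL, path, val, strlen(val));
}
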
> We need to make sure that the interrupts/MSIs coming from the NIC arrive
> on the same pcpu that is running the vcpu that needs to receive them.
> We need to do irq balancing in dom0; Xen will then automatically make the
> physical MSIs follow the vcpu.
>
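
For the dom0 side of that, irq balancing ultimately comes down to writing
a cpu mask into /proc/irq/<N>/smp_affinity; something like this (sketch
only, and it only handles vcpu numbers below 32 for brevity):

/*
 * Sketch: steer one NIC queue's irq to a given dom0 vcpu by writing the
 * affinity mask; Xen should then make the physical MSI follow that vcpu.
 */
#include <stdio.h>

static int set_irq_affinity(int irq, unsigned int vcpu)
{
    char path[64];
    FILE *f;

    snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
    f = fopen(path, "w");
    if (!f)
        return -1;

    /* smp_affinity takes a hex cpu mask; single-vcpu mask here. */
    fprintf(f, "%x\n", 1u << vcpu);
    fclose(f);
    return 0;
}
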
> If the card is multiqueue, we need to make sure that we use the multiple
> queues so that we can have different sources of interrupts/MSIs for
> each vif. This allows us to independently notify each dom0 vcpu.

So the work items I remember are as follows:
1. Implement NUMA affinity for vcpus
2. Implement Guest NUMA support for PV guests
3. Teach Xen how to make a sensible NUMA allocation layout for dom0
4. Teach the toolstack to pin the netback threads to dom0 vcpus
running on the correct node(s)

Dario will do #1.  I volunteered to take a stab at #2 and #3.  #4 we
should be able to do independently of 2 and 3 -- it should give a
slight performance improvement due to cache proximity even if dom0
memory is striped across the nodes.
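
For what it's worth, the mechanical part of #4 in the toolstack probably
boils down to a sched_setaffinity() call on the kthread's pid, roughly
like this (sketch; finding the pid and the right vcpu set is the
interesting bit, and it assumes the kthread accepts affinity changes):

/*
 * Sketch for #4: pin a netback kthread (by its pid in dom0) to the set
 * of dom0 vcpus that run on the frontend's node.  How the toolstack
 * finds the pid and the vcpu list is hand-waved.
 */
#define _GNU_SOURCE
#include <sched.h>

static int pin_netback_thread(pid_t kthread_pid,
                              const int *vcpus, int nr_vcpus)
{
    cpu_set_t mask;
    int i;

    CPU_ZERO(&mask);
    for (i = 0; i < nr_vcpus; i++)
        CPU_SET(vcpus[i], &mask);

    return sched_setaffinity(kthread_pid, sizeof(mask), &mask);
}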

Does someone want to volunteer to take a look at #4?  I suspect that
the core technical implementation will be simple, but getting a stable
design that everyone is happy with for the future will take a
significant number of iterations.  Learn from my fail w/ USB hot-plug
in 4.3, and start the design process early. :-)

 -George

