From: George Dunlap <george.dunlap@citrix.com>
To: Chao Gao <chao.gao@intel.com>, xen-devel@lists.xen.org
Cc: Kevin Tian <kevin.tian@intel.com>, Wei Liu <wei.liu2@citrix.com>,
	Jun Nakajima <jun.nakajima@intel.com>,
	George Dunlap <George.Dunlap@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Ian Jackson <ian.jackson@eu.citrix.com>,
	Jan Beulich <jbeulich@suse.com>
Subject: Re: [PATCH 0/4] mitigate the per-pCPU blocking list may be too long
Date: Wed, 26 Apr 2017 17:39:57 +0100	[thread overview]
Message-ID: <15f405cc-04aa-ac3d-8ae2-17f684b21d36@citrix.com> (raw)
In-Reply-To: <1493167967-74144-1-git-send-email-chao.gao@intel.com>

On 26/04/17 01:52, Chao Gao wrote:
> VT-d PI introduces a per-pCPU blocking list to track the vCPUs
> blocked on each pCPU. Theoretically, there can be up to 32K domains
> on a single host, with 128 vCPUs per domain. If all vCPUs are blocked
> on the same pCPU, 4M vCPUs end up in the same list, and traversing
> such a list consumes too much time. We have discussed this issue in
> [1,2,3].
> 
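
(For context, a rough sketch of the kind of per-pCPU list and wakeup
handler being described here; the types and names below are simplified
illustrations, not the actual Xen code.)

    /* Sketch only; assumes the usual Xen headers (xen/list.h,
     * xen/percpu.h, xen/sched.h, xen/spinlock.h). */
    struct pi_blocking_list {
        spinlock_t lock;
        struct list_head vcpus;   /* vCPUs blocked with this pCPU as their
                                     wakeup-interrupt destination */
        unsigned int count;       /* current number of entries */
    };

    static DEFINE_PER_CPU(struct pi_blocking_list, pi_blocking);

    /*
     * Wakeup-interrupt handler: walk the whole list and unblock every
     * vCPU whose posted-interrupt descriptor has a pending notification.
     * With millions of entries, this O(n) walk under a lock in interrupt
     * context is what becomes prohibitively expensive.
     */
    static void pi_wakeup_interrupt(unsigned int cpu)
    {
        struct pi_blocking_list *pl = &per_cpu(pi_blocking, cpu);
        struct vcpu *v, *tmp;

        spin_lock(&pl->lock);
        list_for_each_entry_safe ( v, tmp, &pl->vcpus, pi_blocking_entry )
            if ( pi_test_on(&v->pi_desc) )  /* ON set: interrupt pending */
                vcpu_unblock(v);
        spin_unlock(&pl->lock);
    }
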
> To mitigate this issue, we proposed the following two methods [3]:
> 1. Evenly distributing all the blocked vCPUs among all pCPUs.

So you're not actually distributing the *vcpus* among the pcpus (which
would imply some interaction with the scheduler); you're distributing
the vcpu PI wake-up interrupt between pcpus.  Is that right?

Doesn't having a PI on a different pcpu than where the vcpu is running
mean at least one IPI to wake up that vcpu?  If so, aren't we imposing a
constant overhead on basically every single interrupt, as well as
increasing the IPI traffic, in order to avoid a highly unlikely
theoretical corner case?

A general maxim in OS development is "Make the common case fast, and the
uncommon case correct."  It seems like it would be better in the common
case to have the PI vectors on the pcpu on which the vcpu is running,
and only implement the balancing when the list starts to get too long.
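
Something along these lines, purely as an illustration building on the
sketch earlier in this mail (PI_LIST_LIMIT, the per-pCPU count and
pi_least_loaded_pcpu() are made-up names, not a concrete proposal):

    /*
     * Keep the PI wakeup vector on the pCPU the vCPU runs on in the
     * common case, and fall back to another pCPU only once that pCPU's
     * blocking list has already grown too long.
     */
    #define PI_LIST_LIMIT 64    /* arbitrary threshold for the sketch */

    static unsigned int pi_pick_dest_pcpu(const struct vcpu *v)
    {
        unsigned int cpu = v->processor;

        if ( per_cpu(pi_blocking, cpu).count <= PI_LIST_LIMIT )
            return cpu;                 /* common case: wakeup is local */

        return pi_least_loaded_pcpu();  /* rare case: spill elsewhere */
    }

That way the extra IPI and balancing cost would only be paid once the
unlikely pile-up actually starts to happen.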

What do you think?

> 2. Don't put the blocked vCPUs which won't be woken by the wakeup
> interrupt into the per-pCPU list.
> 
> Patch 1/4 adds a trace event for adding an entry to the PI blocking
> list. With this patch, data can be collected to help validate the
> following patches.
> 
> Patch 2/4 randomly distributes entries (vCPUs) among all online
> pCPUs, which can theoretically reduce the maximum number of entries
> in a list by a factor of N, where N is the number of pCPUs.
> 
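
(As an illustration of the random spreading in patch 2, with made-up
helper names rather than the patch code:)

    /*
     * Pick an online pCPU at random as the blocked vCPU's wakeup
     * destination, so that no single per-pCPU list grows N times longer
     * than the average.  The trade-off is that waking the vCPU then
     * usually needs a cross-pCPU IPI, since the chosen pCPU is rarely
     * the one the vCPU runs on.
     */
    static unsigned int pi_spread_dest_pcpu(void)
    {
        /* pick_nth_online_pcpu() and prandom() are made-up names */
        return pick_nth_online_pcpu(prandom() % num_online_cpus());
    }
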
> Patch 3/4 adds a reference count to the vCPU's pi_desc. When the
> pi_desc is recorded in an IRTE, the refcount is incremented by 1;
> when the pi_desc is cleared from an IRTE, it is decremented by 1.
> 
> In patch 4/4, a vCPU is added to the PI blocking list only if its
> pi_desc is referenced by at least one IRTE.
> 
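
(Again only to illustrate the refcounting scheme of patches 3 and 4,
with made-up names; pi_irte_refcnt here stands for a hypothetical
atomic_t added to struct vcpu:)

    /*
     * Count how many IRTEs currently reference a vCPU's pi_desc, and
     * put the vCPU on a per-pCPU blocking list only while that count is
     * non-zero: otherwise no device can post an interrupt to it, so no
     * wakeup interrupt will ever target it.
     */
    static void pi_desc_irte_set(struct vcpu *v)
    {
        atomic_inc(&v->pi_irte_refcnt);
    }

    static void pi_desc_irte_clear(struct vcpu *v)
    {
        atomic_dec(&v->pi_irte_refcnt);
    }

    static bool pi_needs_blocking_list(const struct vcpu *v)
    {
        return atomic_read(&v->pi_irte_refcnt) != 0;
    }
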
> I tested this series in the following scenario:
> * one guest with 128 vCPUs and a NIC assigned to it
> * all 128 vCPUs pinned to one pCPU
> * xentrace used to collect events for 5 minutes
> 
> I compared the maximum number of entries in one list and the number
> of events (adding an entry to the PI blocking list) with and without
> the latter three patches. Here is the result:
> ---------------------------------------------------
> |      Items      | Maximum of #entries | #events |
> ---------------------------------------------------
> | W/  the patches |          6          |  22740  |
> ---------------------------------------------------
> | W/O the patches |         128         |  46481  |
> ---------------------------------------------------

Any chance you could trace how long the list traversal took?  It would
be good for future reference to have an idea what kinds of timescales
we're talking about.

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
