From: David Vrabel <david.vrabel@citrix.com>
To: George Dunlap <george.dunlap@citrix.com>,
Jan Beulich <JBeulich@suse.com>,
Kevin Tian <kevin.tian@intel.com>
Cc: Lars Kurth <lars.kurth@citrix.com>, Feng Wu <feng.wu@intel.com>,
George Dunlap <George.Dunlap@eu.citrix.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
Dario Faggioli <dario.faggioli@citrix.com>,
Ian Jackson <Ian.Jackson@eu.citrix.com>,
"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: vmx: VT-d posted-interrupt core logic handling
Date: Thu, 10 Mar 2016 11:16:54 +0000
Message-ID: <56E157A6.4060909@citrix.com>
In-Reply-To: <56E1509F.1060108@citrix.com>
On 10/03/16 10:46, George Dunlap wrote:
> On 10/03/16 10:35, David Vrabel wrote:
>> On 10/03/16 10:18, Jan Beulich wrote:
>>>>>> On 10.03.16 at 11:05, <kevin.tian@intel.com> wrote:
>>>>> From: Tian, Kevin
>>>>> Sent: Thursday, March 10, 2016 5:20 PM
>>>>>
>>>>>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>>>>> Sent: Thursday, March 10, 2016 5:06 PM
>>>>>>
>>>>>>
>>>>>>> There are many linked list usages today in Xen hypervisor, which
>>>>>>> have different theoretical maximum possible number. The closest
>>>>>>> one to PI might be the usage in tmem (pool->share_list) which is
>>>>>>> page based so could grow 'overly large'. Other examples are
>>>>>>> magnitude lower, e.g. s->ioreq_vcpu_list in ioreq server (which
>>>>>>> could be 8K in above example), and d->arch.hvm_domain.msixtbl_list
>>>>>>> in MSI-x virtualization (which could be 2^11 per spec). Do we
>>>>>>> also want to create some artificial scenarios to examine them
>>>>>>> since based on actual operation K-level entries may also become
>>>>>>> a problem?
>>>>>>>
>>>>>>> Just want to figure out how best we can solve all related linked-list
>>>>>>> usages in current hypervisor.
>>>>>>
>>>>>> As you say, those are (perhaps with the exception of tmem, which
>>>>>> isn't supported anyway due to XSA-15, and which therefore also
>>>>>> isn't on by default) in the order of a few thousand list elements.
>>>>>> And as mentioned above, different bounds apply for lists traversed
>>>>>> in interrupt context vs such traversed only in "normal" context.
>>>>>>
>>>>>
>>>>> That's a good point. Interrupt context should have more restrictions.
>>>>
>>>> Hi, Jan,
>>>>
>>>> I'm thinking your earlier idea about evenly distributed list:
>>>>
>>>> --
>>>> Ah, right, I think that limitation was named before, yet I've
>>>> forgotten about it again. But that only slightly alters the
>>>> suggestion: To distribute vCPU-s evenly would then require to
>>>> change their placement on the pCPU in the course of entering
>>>> blocked state.
>>>> --
>>>>
>>>> Actually after more thinking, there is no hard requirement that
>>>> the vcpu must block on the pcpu which is configured in 'NDST'
>>>> of that vcpu's PI descriptor. What really matters, is that the
>>>> vcpu is added to the linked list of the very pcpu, then when PI
>>>> notification comes we can always find out the vcpu struct from
>>>> that pcpu's linked list. Of course one drawback of such placement
>>>> is additional IPI incurred in wake up path.
>>>>
>>>> Then one possible optimized policy within vmx_vcpu_block could
>>>> be:
>>>>
>>>> (Say PCPU1 which VCPU1 is currently blocked on)
>>>> - As long as the #vcpus in the linked list on PCPU1 is below a
>>>> threshold (say 16), add VCPU1 to the list. NDST set to PCPU1;
>>>> Upon PI notification on PCPU1, local linked list is searched to
>>>> find VCPU1 and then VCPU1 will be unblocked on PCPU1;
>>>>
>>>> - Otherwise, add VCPU1 to PCPU2 based on a simple distribution
>>>> algorithm (based on vcpu_id/vm_id). VCPU1 still blocks on PCPU1
>>>> but NDST set to PCPU2. Upon notification on PCPU2, local linked
>>>> list is searched to find VCPU1 and then an IPI is sent to PCPU1 to
>>>> unblock VCPU1;
>>>
>>> Sounds possible, if the lock handling can be got right. But of
>>> course there can't be any hard limit like 16, at least not alone
>>> (on a system with extremely many mostly idle vCPU-s we'd
>>> need to allow larger counts - see my earlier explanations in this
>>> regard).
>>
>> You could also consider only waking the first N VCPUs and just making
>> the rest runnable. If you wake more VCPUs than PCPUs at the same time
>> most of them won't actually be scheduled.
>
> "Waking" a vcpu means "changing from blocked to runnable", so those two
> things are the same. And I can't figure out what you mean instead --
> can you elaborate?
>
> Waking up 1000 vcpus is going to take strictly more time than checking
> whether there's a PI interrupt pending on 1000 vcpus to see if they need
> to be woken up.
Waking means making it runnable /and/ attempting to make it run.
So I mean: for every VCPU after the N'th, don't call __runq_tickle(),
only call __runq_insert().
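
A rough sketch of the idea, in case it helps. This is illustrative only and not the actual credit scheduler code: the demo_* names and the struct are hypothetical stand-ins for the real vCPU structure and for __runq_insert() / __runq_tickle(), and max_tickle plays the role of N.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vCPU; not the real Xen structure. */
struct demo_vcpu {
    int id;
    bool runnable;  /* has been put on a run queue */
    bool tickled;   /* a pCPU has been poked to pick it up */
};

/* Stand-in for __runq_insert(): only makes the vCPU runnable. */
static void demo_runq_insert(struct demo_vcpu *v)
{
    v->runnable = true;
}

/* Stand-in for __runq_tickle(): asks some pCPU to reschedule. */
static void demo_runq_tickle(struct demo_vcpu *v)
{
    v->tickled = true;
}

/*
 * Make all `count` vCPUs runnable, but tickle only the first
 * `max_tickle` of them. The remainder stay on the run queue and get
 * picked up whenever a pCPU next schedules.
 * Returns the number of vCPUs that were tickled.
 */
size_t demo_wake_batch(struct demo_vcpu *vcpus, size_t count,
                       size_t max_tickle)
{
    size_t tickled = 0;

    assert(vcpus != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        demo_runq_insert(&vcpus[i]);
        if (i < max_tickle) {
            demo_runq_tickle(&vcpus[i]);
            tickled++;
        }
    }
    return tickled;
}
```

With max_tickle set to roughly the number of PCPUs, the cost of the tickle step no longer grows with the length of the blocked list. Whether that saves enough in practice would need measuring.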
David