From: Andrew Cooper
Subject: Re: Xen Platform QoS design discussion
Date: Wed, 7 May 2014 22:18:58 +0100
Message-ID: <536AA342.8030003@citrix.com>
References: <40776A41FC278F40B59438AD47D147A9119F3FEA@SHSMSX104.ccr.corp.intel.com> <1398877340.5166.13.camel@kazak.uk.xensource.com> <40776A41FC278F40B59438AD47D147A9119F42A2@SHSMSX104.ccr.corp.intel.com> <5363804B020000780000E604@mail.emea.novell.com> <40776A41FC278F40B59438AD47D147A9119F4EF4@SHSMSX104.ccr.corp.intel.com> <5363AE54020000780000E7A2@mail.emea.novell.com> <40776A41FC278F40B59438AD47D147A9119FE6BB@SHSMSX104.ccr.corp.intel.com> <5368B418.9000307@citrix.com>
To: George Dunlap
Cc: "Xu, Dongxiao", Ian Campbell, Jan Beulich, "xen-devel@lists.xen.org"
List-Id: xen-devel@lists.xenproject.org

On 07/05/14 14:26, George Dunlap wrote:
> On Tue, May 6, 2014 at 11:06 AM, Andrew Cooper wrote:
>> On 06/05/14 02:40, Xu, Dongxiao wrote:
>>>> -----Original Message-----
>>>> From: Xu, Dongxiao
>>>> Sent: Sunday, May 04, 2014 8:46 AM
>>>> To: Jan Beulich
>>>> Cc: Andrew Cooper(andrew.cooper3@citrix.com); Ian Campbell;
>>>> xen-devel@lists.xen.org
>>>> Subject: RE: Xen Platform QoS design discussion
>>>>
>>>>> -----Original Message-----
>>>>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>>>> Sent: Friday, May 02, 2014 8:40 PM
>>>>> To: Xu, Dongxiao
>>>>> Cc: Andrew Cooper(andrew.cooper3@citrix.com); Ian Campbell;
>>>>> xen-devel@lists.xen.org
>>>>> Subject: RE: Xen Platform QoS design discussion
>>>>>
>>>>>>>> On 02.05.14 at 14:30, wrote:
>>>>>>> -----Original Message-----
>>>>>>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>>>>>> Sent: Friday, May 02, 2014 5:24 PM
>>>>>>> To: Xu, Dongxiao
>>>>>>> Cc: Andrew Cooper(andrew.cooper3@citrix.com); Ian Campbell;
>>>>>>> xen-devel@lists.xen.org
>>>>>>> Subject: RE: Xen Platform QoS design discussion
>>>>>>>
>>>>>>>>>> On 01.05.14 at 02:56, wrote:
>>>>>>>>> From: Ian Campbell [mailto:Ian.Campbell@citrix.com]
>>>>>>>>> Have you asked yourself whether this information even needs to be
>>>>>>>>> exposed all the way up to libxl? Who are the expected consumers of this
>>>>>>>>> interface? Are they low-level CLI tools (i.e. like xenpm is) or are you
>>>>>>>>> expecting toolstacks to plumb this information all the way up to their
>>>>>>>>> GUI or CLI (e.g. xl or virsh)?
>>>>>>>> The information returned to libxl users is the cache utilization for a
>>>>>>>> certain domain in a certain socket, and the main consumers are cloud users
>>>>>>>> like openstack, etc. Of course, we will also provide an xl command to
>>>>>>>> present such information.
>>>>>>> To me this doesn't really address the question Ian asked, yet knowing
>>>>>>> who's going to be the consumer of the data is also quite relevant for
>>>>>>> answering your original question on the method to obtain that data.
>>>>>>> Obviously, if the main use of it is per-domain, a domctl would seem like
>>>>>>> a suitable approach despite the data being more of sysctl kind. But if
>>>>>>> a global view would be more important, that model would seem to make
>>>>>>> life needlessly hard for the consumers.
>>>>>>> In turn, if using a domctl, I tend
>>>>>>> to agree that not using shared pages would be preferable; iirc their use
>>>>>>> was mainly suggested because of the size of the data.
>>>>>> From the discussion with openstack developers, on a certain cloud host, all
>>>>>> running VMs' information (e.g., domain ID) will be stored in a database, and
>>>>>> openstack software will use libvirt/XenAPI to query specific domain
>>>>>> information. That libvirt/XenAPI interface basically accepts the domain
>>>>>> ID as input parameter and gets the domain information, including the
>>>>>> platform QoS data.
>>>>>>
>>>>>> Based on the above information, I think we'd better design the QoS hypercall
>>>>>> per-domain.
>>>>> If you think that this is going to be the only (or at least prevalent)
>>>>> usage model, that's probably okay then. But I'm a little puzzled that
>>>>> all this effort is just for a single, rather specific consumer. I thought
>>>>> that if this is so important to Intel there would be a wider interested
>>>>> audience.
>>> Since there are no further comments, I suppose we all agreed on making the
>>> hypercall per-domain and using a data copying mechanism between the hypervisor
>>> and the Dom0 tool stack?
>>>
>> No - the onus is very much on you to prove that your API will *not* be
>> used in the following way:
>>
>> every $TIMEPERIOD
>>     for each domain
>>         for each type of information
>>             get-$TYPE-information-for-$DOMAIN
>>
>> Which is the source of my concerns regarding overhead.
>>
>> As far as I can see, as soon as you provide access to this QoS
>> information, higher level toolstacks are going to want all information
>> for all domains. Given your proposed domctl, they will have exactly one
>> (bad) way of getting this information.
> Is this really going to be that much of a critical path that we need
> to even have this discussion?

Absolutely.

If that logical set of nested loops runs on a remote control instance, where
get-$TYPE-information-for-$DOMAIN involves an rpc to a particular dom0, then the
domctls can be approximated as being functionally infinite time periods apart.
If the set of nested loops is a daemon or script in dom0, the domctls will be
very close together. As the current implementation involves taking a global
spinlock, IPI'ing the other sockets and performing MSR accesses, the net impact
on the running system can be massive, particularly if back-to-back IPIs
interrupt HVM guests.

> We have two different hypercalls right now for getting "dominfo": a
> domctl and a sysctl. You use the domctl if you want information about
> a single domain, you use sysctl if you want information about all
> domains. The sysctl implementation calls the domctl implementation
> internally.

It is not a fair comparison, given the completely different nature of the
domctls in question. XEN_DOMCTL_getdomaininfo does very little more than read
specific bits of data out of the appropriate struct domain and its struct
vcpus, which can trivially be done by the cpu handling the hypercall.

> Is there a problem with doing the same thing here? Or, with starting
> with a domctl, and then creating a sysctl for iterating over all
> domains (and calling the domctl internally) if we measure the domctl
> to be too slow for many callers?
>
> -George

My problem is not with the domctl per se. My problem is that this is not a QoS
design discussion; this is an email thread about a specific QoS implementation
which is not answering the concerns raised against it to the satisfaction of the
people raising those concerns.
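To be concrete about the overhead concern: the nested loops above, expressed as
the kind of dom0 gatherer I expect people to write, look roughly like this (a
sketch only - xc_get_domain_qos() and the counts are made-up illustrations, not
anything from the posted patches):

    /* Sketch of the usage pattern I expect from a dom0 stats gatherer.
     * xc_get_domain_qos() is a hypothetical stand-in for the proposed
     * per-domain domctl wrapper; the counts are purely illustrative. */
    #include <unistd.h>

    #define NR_DOMAINS      64
    #define NR_EVENT_TYPES   3   /* e.g. cache occupancy now; more types later */
    #define PERIOD_SECONDS   5

    static long xc_get_domain_qos(int domid, int type)
    {
        /* In the implementation under discussion, each call takes a global
         * lock, IPIs the other sockets and performs MSR accesses there. */
        return 0;
    }

    int main(void)
    {
        for (;;) {                                        /* every $TIMEPERIOD */
            for (int d = 0; d < NR_DOMAINS; d++)          /* for each domain */
                for (int t = 0; t < NR_EVENT_TYPES; t++)  /* for each type */
                    xc_get_domain_qos(d, t);              /* one domctl each */
            sleep(PERIOD_SECONDS);
            /* 64 domains x 3 types = 192 back-to-back domctls per period,
             * each paying the lock/IPI/MSR cost described above. */
        }
    }

Every one of those calls hits the whole system, which is why I keep pressing on
this point.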
The core argument here is that a statement of "OpenStack want to get a piece of
QoS data back from libvirt/xenapi when querying a specific domain" is being used
to justify implementing the hypercall in an identical fashion. This is not a
libxl design; this is a single user story forming part of the requirement "I as
a cloud service provider would like QoS information for each VM to be available
to my $CHOSEN_ORCHESTRATION_SOFTWARE so I can {differentially charge customers,
balance my load more evenly, etc}".

The only valid justification for implementing a brand new hypercall in a certain
way is "Because $THIS_CERTAIN_WAY is the $MOST_SENSIBLE way to perform the
actions I need to perform", for appropriate substitutions. Not "because it is
the same way I want to hand this information off at the higher level".

As part of this design discussion, I have raised a concern saying "I believe the
use case of having a stats gathering daemon in dom0 has not been appropriately
considered", qualified with "If you were to use the domctl as currently designed
from a stats gathering daemon, you will cripple Xen with the overhead".

Going back to the original use case, xenapi has a stats daemon for these things.
It has an rpc interface so that a query for a specific domain can return some or
all data for that domain, but it very definitely does not translate each request
into a hypercall for the requested information. I have no real experience with
libvirt, so can't comment on stats gathering in that context.

I have proposed an alternative Xen->libxc interface designed with a stats daemon
in mind, explaining why I believe it has lower overhead for Xen and why it is
more in line with what I expect ${VENDOR}Stack to actually want. I am now
waiting for a reasoned rebuttal which has more content than "because there are a
set of patches which already implement it in this way".

~Andrew
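P.S. For concreteness, the consumer-side shape I have in mind for a dom0 stats
daemon is roughly the following (a sketch only: xc_resource_query_all() is a
placeholder name for the bulk Xen->libxc interface proposed earlier in the
thread, not an existing libxc call, and the data layout is invented for
illustration):

    /* Sketch of a dom0 stats daemon built around a bulk query interface.
     * xc_resource_query_all() is a hypothetical placeholder, not a real
     * libxc function; everything here is illustrative only. */
    #include <stdint.h>

    #define MAX_DOMAINS 1024

    struct qos_record {
        uint32_t domid;
        uint64_t cache_occupancy;   /* per-socket breakdown elided for brevity */
    };

    static struct qos_record snapshot[MAX_DOMAINS];
    static unsigned int snapshot_len;

    /* Placeholder: one operation fetches data for all domains, so Xen pays
     * its lock/IPI/MSR cost once per sampling period, not once per query. */
    static int xc_resource_query_all(struct qos_record *buf, unsigned int *len)
    {
        *len = 0;
        return 0;
    }

    /* Run once per sampling period by the daemon's timer. */
    static void refresh_snapshot(void)
    {
        snapshot_len = MAX_DOMAINS;
        xc_resource_query_all(snapshot, &snapshot_len);
    }

    /* Run per rpc request ("give me domain N's QoS data"): answered from
     * the cached snapshot, with no hypervisor interaction on this path. */
    static const struct qos_record *lookup(uint32_t domid)
    {
        for (unsigned int i = 0; i < snapshot_len; i++)
            if (snapshot[i].domid == domid)
                return &snapshot[i];
        return (void *)0;
    }

    int main(void)
    {
        refresh_snapshot();
        return lookup(0) ? 0 : 1;
    }

This keeps the per-domain query model which libvirt/xenapi/OpenStack want at the
top, while keeping the hypervisor-facing traffic to one bulk operation per
sampling period.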