All of lore.kernel.org
 help / color / mirror / Atom feed
* Xen Platform QoS design discussion
@ 2014-04-30 16:47 Xu, Dongxiao
  2014-04-30 17:02 ` Ian Campbell
  0 siblings, 1 reply; 46+ messages in thread
From: Xu, Dongxiao @ 2014-04-30 16:47 UTC (permalink / raw)
  To: Jan Beulich (JBeulich@suse.com),
	Ian.Campbell, Andrew Cooper (andrew.cooper3@citrix.com)
  Cc: xen-devel

Hello maintainers,

As PQoS feature has already been reviewed for 10 rounds, and also considering that more QoS related features are published in new SDM, like MBM and CQE (refer to chapter 17.14 and 17.15 of Intel SDM Volume 3). Before sending out another version of patch, I'd like to make sure we are aligned on some basic design issues, making the final implementation simple and performance efficient.

1) The hypercall to query QoS monitoring data, should be sysctl or domctl?
Previous QoS monitoring data query hypercall is designed with sysctl, which returns the whole QoS data in a 2-dimenstion array format for all domains. However users don't like to handle such 2-dimension array themselves, but preferring the hypercall API to be issued per-domain, by using domain id as the input parameter and returns the domain related QoS data. Here I propose to use the domctl style hypercall to get QoS data for specific domain. This has the advantage of simplifying the libxl QoS APIs for user-space developers, and also make the QoS memory allocation in Xen much easier.

2) How much memory needs to be allocated in Xen for QoS monitoring data?
There are a lot of comments complaining the previous memory allocation for QoS monitoring data. But if we adopt the domctl style of hypercall proposed in 1), then the needed memory is much smaller since the data structure becomes 1-dimension. We only need to allocate "nr_sockets" related size in initialization time to present the domain's QoS monitoring data. Furthermore, no additional QoS resource allocation logic is needed when CPU online/offline.

3) Copy or share for QoS monitoring data?
In previous patches, sharing mechanism is used for passing data between Xen and dom0 toolstack. However when MBM feature comes out, we may need to re-consider this solution.
CQM and MBM features belong to L3 monitoring category, and they share the same CPUID enumeration method for its QoS data structure, which is 64bit width. From data structure design point of view, all L3 monitoring category features (CQM, MBM and others in L3 category) should better to share the following structure:

struct socket_l3 {
    unsigned int socket_id;
    unsigned int l3_type; // CQM or MBM or others
    unsigned long qm_data_l3;
}

In hypercall, user sets the "l3_type" as an input parameter (CQM or MBM or others in L3 category), and Xen returns the qm_data_l3 as the monitoring data for that certain feature.
In this case, sharing doesn't work since data may be collapsed if there are two xl instances running simultaneously to separately get CQM and MBM data. Besides, if we use the per-domain hypercall (domctl instead of sysctl), the data amount is very small for every hypercall. (saying 4 sockets, the data amount is 4*sizeof(struct socket_l3) = 64 bytes per hypercall). Even for data sharing mechanism, we need to pass the page address as the parameter, which also requires 8 or more bytes. Therefore in my opinion, data copy is acceptable in PQoS case.

Any feedback is welcome!

Thanks,
Dongxiao

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-04-30 16:47 Xen Platform QoS design discussion Xu, Dongxiao
@ 2014-04-30 17:02 ` Ian Campbell
  2014-05-01  0:56   ` Xu, Dongxiao
  0 siblings, 1 reply; 46+ messages in thread
From: Ian Campbell @ 2014-04-30 17:02 UTC (permalink / raw)
  To: Xu, Dongxiao
  Cc: Andrew Cooper (andrew.cooper3@citrix.com),
	Jan Beulich (JBeulich@suse.com),
	xen-devel

On Wed, 2014-04-30 at 16:47 +0000, Xu, Dongxiao wrote:
> domain related QoS data. Here I propose to use the domctl style
> hypercall to get QoS data for specific domain. This has the advantage
> of simplifying the libxl QoS APIs for user-space developers, and also
> make the QoS memory allocation in Xen much easier.

Note that the libxl QoS API need not have any particular resemblance to
the underlying hypercall API, it is perfectly reasonable for libxl (or
libxc even) to massage the data provided by the raw hypercall (or
several hypercalls) into something nicer for end user consumption.

The important thing about any libxl level interface is that the library
API cannot change once it has been introduced, so thought needs to be
given to extensibility and future proofing. The API should also be
structured (so no binary blobs, or arrays of numbers which need special
knowledge to interpret etc).

Have you asked yourself whether this information even needs to be
exposed all the way up to libxl? Who are the expected consumers of this
interface? Are they low-level CLI tools (i.e. like xenpm is) or are you
expecting toolstacks to plumb this information all the way up to their
GUI or CLI (e.g. xl or virsh)?

Ian.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-04-30 17:02 ` Ian Campbell
@ 2014-05-01  0:56   ` Xu, Dongxiao
  2014-05-02  9:23     ` Jan Beulich
  0 siblings, 1 reply; 46+ messages in thread
From: Xu, Dongxiao @ 2014-05-01  0:56 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Andrew Cooper (andrew.cooper3@citrix.com),
	Jan Beulich (JBeulich@suse.com),
	xen-devel

> -----Original Message-----
> From: Ian Campbell [mailto:Ian.Campbell@citrix.com]
> Sent: Thursday, May 01, 2014 1:02 AM
> To: Xu, Dongxiao
> Cc: Jan Beulich (JBeulich@suse.com); Andrew Cooper
> (andrew.cooper3@citrix.com); xen-devel@lists.xen.org
> Subject: Re: Xen Platform QoS design discussion
> 
> On Wed, 2014-04-30 at 16:47 +0000, Xu, Dongxiao wrote:
> > domain related QoS data. Here I propose to use the domctl style
> > hypercall to get QoS data for specific domain. This has the advantage
> > of simplifying the libxl QoS APIs for user-space developers, and also
> > make the QoS memory allocation in Xen much easier.
> 
> Note that the libxl QoS API need not have any particular resemblance to
> the underlying hypercall API, it is perfectly reasonable for libxl (or
> libxc even) to massage the data provided by the raw hypercall (or
> several hypercalls) into something nicer for end user consumption.

Yes, I understand.
If use sysctl to get per-domain QoS info, Xen still needs to provide the entire QoS data for user, and libxl QoS API may extract the per-domain data from it. However this is not high efficient.
While using domctl, Xen provides the exact info that needed by Dom0 toolstack.

> 
> The important thing about any libxl level interface is that the library
> API cannot change once it has been introduced, so thought needs to be
> given to extensibility and future proofing. The API should also be
> structured (so no binary blobs, or arrays of numbers which need special
> knowledge to interpret etc).

Agree.
Previously we are returning an 2-dimention array of nr_rmids * nr_sockets number of data back to Dom0 toolstack. Since the size may already very big, so we structure the data in binary blob to save memory space. If using domctl, then the data is only 1-dimention and much less, and we can use structured data to present to libxl users.

> 
> Have you asked yourself whether this information even needs to be
> exposed all the way up to libxl? Who are the expected consumers of this
> interface? Are they low-level CLI tools (i.e. like xenpm is) or are you
> expecting toolstacks to plumb this information all the way up to their
> GUI or CLI (e.g. xl or virsh)?

The information returned to libxl users is the cache utilization for a certain domain in certain socket, and the main consumers are cloud users like openstack, etc. Of course, we will also provide an xl command to present such information.

Thanks,
Dongxiao

> 
> Ian.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-01  0:56   ` Xu, Dongxiao
@ 2014-05-02  9:23     ` Jan Beulich
  2014-05-02 12:30       ` Xu, Dongxiao
  0 siblings, 1 reply; 46+ messages in thread
From: Jan Beulich @ 2014-05-02  9:23 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: Andrew Cooper(andrew.cooper3@citrix.com), Ian Campbell, xen-devel

>>> On 01.05.14 at 02:56, <dongxiao.xu@intel.com> wrote:
>> From: Ian Campbell [mailto:Ian.Campbell@citrix.com]
>> Have you asked yourself whether this information even needs to be
>> exposed all the way up to libxl? Who are the expected consumers of this
>> interface? Are they low-level CLI tools (i.e. like xenpm is) or are you
>> expecting toolstacks to plumb this information all the way up to their
>> GUI or CLI (e.g. xl or virsh)?
> 
> The information returned to libxl users is the cache utilization for a 
> certain domain in certain socket, and the main consumers are cloud users like 
> openstack, etc. Of course, we will also provide an xl command to present such 
> information.

To me this doesn't really address the question Ian asked, yet knowing
who's going to be the consumer of the data is also quite relevant for
answering your original question on the method to obtain that data.
Obviously, if the main use of it is per-domain, a domctl would seem like
a suitable approach despite the data being more of sysctl kind. But if
a global view would be more important, that model would seem to make
life needlessly hard for the consumers. In turn, if using a domctl, I tend
to agree that not using shared pages would be preferable; iirc their use
was mainly suggested because of the size of the data.

Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-02  9:23     ` Jan Beulich
@ 2014-05-02 12:30       ` Xu, Dongxiao
  2014-05-02 12:40         ` Jan Beulich
  2014-05-02 12:50         ` Andrew Cooper
  0 siblings, 2 replies; 46+ messages in thread
From: Xu, Dongxiao @ 2014-05-02 12:30 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper(andrew.cooper3@citrix.com), Ian Campbell, xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Friday, May 02, 2014 5:24 PM
> To: Xu, Dongxiao
> Cc: Andrew Cooper(andrew.cooper3@citrix.com); Ian Campbell;
> xen-devel@lists.xen.org
> Subject: RE: Xen Platform QoS design discussion
> 
> >>> On 01.05.14 at 02:56, <dongxiao.xu@intel.com> wrote:
> >> From: Ian Campbell [mailto:Ian.Campbell@citrix.com]
> >> Have you asked yourself whether this information even needs to be
> >> exposed all the way up to libxl? Who are the expected consumers of this
> >> interface? Are they low-level CLI tools (i.e. like xenpm is) or are you
> >> expecting toolstacks to plumb this information all the way up to their
> >> GUI or CLI (e.g. xl or virsh)?
> >
> > The information returned to libxl users is the cache utilization for a
> > certain domain in certain socket, and the main consumers are cloud users like
> > openstack, etc. Of course, we will also provide an xl command to present such
> > information.
> 
> To me this doesn't really address the question Ian asked, yet knowing
> who's going to be the consumer of the data is also quite relevant for
> answering your original question on the method to obtain that data.
> Obviously, if the main use of it is per-domain, a domctl would seem like
> a suitable approach despite the data being more of sysctl kind. But if
> a global view would be more important, that model would seem to make
> life needlessly hard for the consumers. In turn, if using a domctl, I tend
> to agree that not using shared pages would be preferable; iirc their use
> was mainly suggested because of the size of the data.

>From the discussion with openstack developers, on certain cloud host, all running VM's information (e.g., domain ID) will be stored in a database, and openstack software will use libvirt/XenAPI to query specific domain information. That libvirt/XenAPI API interface basically accepts the domain ID as input parameter and get the domain information, including the platform QoS one.

Based on above information, I think we'd better design the QoS hypercall per-domain.

Thanks,
Dongxiao




> 
> Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-02 12:30       ` Xu, Dongxiao
@ 2014-05-02 12:40         ` Jan Beulich
  2014-05-04  0:46           ` Xu, Dongxiao
  2014-05-06  1:40           ` Xu, Dongxiao
  2014-05-02 12:50         ` Andrew Cooper
  1 sibling, 2 replies; 46+ messages in thread
From: Jan Beulich @ 2014-05-02 12:40 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: Andrew Cooper(andrew.cooper3@citrix.com), Ian Campbell, xen-devel

>>> On 02.05.14 at 14:30, <dongxiao.xu@intel.com> wrote:
>>  -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Friday, May 02, 2014 5:24 PM
>> To: Xu, Dongxiao
>> Cc: Andrew Cooper(andrew.cooper3@citrix.com); Ian Campbell;
>> xen-devel@lists.xen.org 
>> Subject: RE: Xen Platform QoS design discussion
>> 
>> >>> On 01.05.14 at 02:56, <dongxiao.xu@intel.com> wrote:
>> >> From: Ian Campbell [mailto:Ian.Campbell@citrix.com]
>> >> Have you asked yourself whether this information even needs to be
>> >> exposed all the way up to libxl? Who are the expected consumers of this
>> >> interface? Are they low-level CLI tools (i.e. like xenpm is) or are you
>> >> expecting toolstacks to plumb this information all the way up to their
>> >> GUI or CLI (e.g. xl or virsh)?
>> >
>> > The information returned to libxl users is the cache utilization for a
>> > certain domain in certain socket, and the main consumers are cloud users 
> like
>> > openstack, etc. Of course, we will also provide an xl command to present 
> such
>> > information.
>> 
>> To me this doesn't really address the question Ian asked, yet knowing
>> who's going to be the consumer of the data is also quite relevant for
>> answering your original question on the method to obtain that data.
>> Obviously, if the main use of it is per-domain, a domctl would seem like
>> a suitable approach despite the data being more of sysctl kind. But if
>> a global view would be more important, that model would seem to make
>> life needlessly hard for the consumers. In turn, if using a domctl, I tend
>> to agree that not using shared pages would be preferable; iirc their use
>> was mainly suggested because of the size of the data.
> 
> From the discussion with openstack developers, on certain cloud host, all 
> running VM's information (e.g., domain ID) will be stored in a database, and 
> openstack software will use libvirt/XenAPI to query specific domain 
> information. That libvirt/XenAPI API interface basically accepts the domain 
> ID as input parameter and get the domain information, including the platform 
> QoS one.
> 
> Based on above information, I think we'd better design the QoS hypercall 
> per-domain.

If you think that this is going to be the only (or at least prevalent)
usage model, that's probably okay then. But I'm a little puzzled that
all this effort is just for a single, rather specific consumer. I thought
that if this is so important to Intel there would be wider interested
audience.

Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-02 12:30       ` Xu, Dongxiao
  2014-05-02 12:40         ` Jan Beulich
@ 2014-05-02 12:50         ` Andrew Cooper
  2014-05-04  2:34           ` Xu, Dongxiao
  1 sibling, 1 reply; 46+ messages in thread
From: Andrew Cooper @ 2014-05-02 12:50 UTC (permalink / raw)
  To: Xu, Dongxiao; +Cc: Ian Campbell, Jan Beulich, xen-devel

On 02/05/14 13:30, Xu, Dongxiao wrote:
>> -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Friday, May 02, 2014 5:24 PM
>> To: Xu, Dongxiao
>> Cc: Andrew Cooper(andrew.cooper3@citrix.com); Ian Campbell;
>> xen-devel@lists.xen.org
>> Subject: RE: Xen Platform QoS design discussion
>>
>>>>> On 01.05.14 at 02:56, <dongxiao.xu@intel.com> wrote:
>>>> From: Ian Campbell [mailto:Ian.Campbell@citrix.com]
>>>> Have you asked yourself whether this information even needs to be
>>>> exposed all the way up to libxl? Who are the expected consumers of this
>>>> interface? Are they low-level CLI tools (i.e. like xenpm is) or are you
>>>> expecting toolstacks to plumb this information all the way up to their
>>>> GUI or CLI (e.g. xl or virsh)?
>>> The information returned to libxl users is the cache utilization for a
>>> certain domain in certain socket, and the main consumers are cloud users like
>>> openstack, etc. Of course, we will also provide an xl command to present such
>>> information.
>> To me this doesn't really address the question Ian asked, yet knowing
>> who's going to be the consumer of the data is also quite relevant for
>> answering your original question on the method to obtain that data.
>> Obviously, if the main use of it is per-domain, a domctl would seem like
>> a suitable approach despite the data being more of sysctl kind. But if
>> a global view would be more important, that model would seem to make
>> life needlessly hard for the consumers. In turn, if using a domctl, I tend
>> to agree that not using shared pages would be preferable; iirc their use
>> was mainly suggested because of the size of the data.
> From the discussion with openstack developers, on certain cloud host, all running VM's information (e.g., domain ID) will be stored in a database, and openstack software will use libvirt/XenAPI to query specific domain information. That libvirt/XenAPI API interface basically accepts the domain ID as input parameter and get the domain information, including the platform QoS one.
>
> Based on above information, I think we'd better design the QoS hypercall per-domain.

The design of the hypercall has nothing to do with the design of the
libxl/XenAPI interface.

It is clear from this statement that cloudstack want all information for
all domains.  Therefore, at one level at least there will be a big set
of nested loops like:

every $TIMEPERIOD
  for each domain
    for each type of information
      get-$TYPE-information-for-$DOMAIN

As far as a XenAPI inteface would go, this would be information coming
from the rrdd-daemon.  This daemon most certainly wont want to be using
a hypercall designed like this.

Does anyone know how libvirt/libxl would go about
collecting/storing/passing this information?

~Andrew

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-02 12:40         ` Jan Beulich
@ 2014-05-04  0:46           ` Xu, Dongxiao
  2014-05-06  9:10             ` Ian Campbell
  2014-05-06  1:40           ` Xu, Dongxiao
  1 sibling, 1 reply; 46+ messages in thread
From: Xu, Dongxiao @ 2014-05-04  0:46 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper(andrew.cooper3@citrix.com), Ian Campbell, xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Friday, May 02, 2014 8:40 PM
> To: Xu, Dongxiao
> Cc: Andrew Cooper(andrew.cooper3@citrix.com); Ian Campbell;
> xen-devel@lists.xen.org
> Subject: RE: Xen Platform QoS design discussion
> 
> >>> On 02.05.14 at 14:30, <dongxiao.xu@intel.com> wrote:
> >>  -----Original Message-----
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Friday, May 02, 2014 5:24 PM
> >> To: Xu, Dongxiao
> >> Cc: Andrew Cooper(andrew.cooper3@citrix.com); Ian Campbell;
> >> xen-devel@lists.xen.org
> >> Subject: RE: Xen Platform QoS design discussion
> >>
> >> >>> On 01.05.14 at 02:56, <dongxiao.xu@intel.com> wrote:
> >> >> From: Ian Campbell [mailto:Ian.Campbell@citrix.com]
> >> >> Have you asked yourself whether this information even needs to be
> >> >> exposed all the way up to libxl? Who are the expected consumers of this
> >> >> interface? Are they low-level CLI tools (i.e. like xenpm is) or are you
> >> >> expecting toolstacks to plumb this information all the way up to their
> >> >> GUI or CLI (e.g. xl or virsh)?
> >> >
> >> > The information returned to libxl users is the cache utilization for a
> >> > certain domain in certain socket, and the main consumers are cloud users
> > like
> >> > openstack, etc. Of course, we will also provide an xl command to present
> > such
> >> > information.
> >>
> >> To me this doesn't really address the question Ian asked, yet knowing
> >> who's going to be the consumer of the data is also quite relevant for
> >> answering your original question on the method to obtain that data.
> >> Obviously, if the main use of it is per-domain, a domctl would seem like
> >> a suitable approach despite the data being more of sysctl kind. But if
> >> a global view would be more important, that model would seem to make
> >> life needlessly hard for the consumers. In turn, if using a domctl, I tend
> >> to agree that not using shared pages would be preferable; iirc their use
> >> was mainly suggested because of the size of the data.
> >
> > From the discussion with openstack developers, on certain cloud host, all
> > running VM's information (e.g., domain ID) will be stored in a database, and
> > openstack software will use libvirt/XenAPI to query specific domain
> > information. That libvirt/XenAPI API interface basically accepts the domain
> > ID as input parameter and get the domain information, including the platform
> > QoS one.
> >
> > Based on above information, I think we'd better design the QoS hypercall
> > per-domain.
> 
> If you think that this is going to be the only (or at least prevalent)
> usage model, that's probably okay then. But I'm a little puzzled that
> all this effort is just for a single, rather specific consumer. I thought
> that if this is so important to Intel there would be wider interested
> audience.

Not specifically for a single customer.
Currently we consider Openstack a lot because it is one of the most popular cloud software.

Thanks,
Dongxiao

> 
> Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-02 12:50         ` Andrew Cooper
@ 2014-05-04  2:34           ` Xu, Dongxiao
  2014-05-06  9:12             ` Ian Campbell
  2014-05-06 10:00             ` Andrew Cooper
  0 siblings, 2 replies; 46+ messages in thread
From: Xu, Dongxiao @ 2014-05-04  2:34 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Ian Campbell, Jan Beulich, xen-devel

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Friday, May 02, 2014 8:51 PM
> To: Xu, Dongxiao
> Cc: Jan Beulich; Ian Campbell; xen-devel@lists.xen.org
> Subject: Re: Xen Platform QoS design discussion
> 
> On 02/05/14 13:30, Xu, Dongxiao wrote:
> >> -----Original Message-----
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Friday, May 02, 2014 5:24 PM
> >> To: Xu, Dongxiao
> >> Cc: Andrew Cooper(andrew.cooper3@citrix.com); Ian Campbell;
> >> xen-devel@lists.xen.org
> >> Subject: RE: Xen Platform QoS design discussion
> >>
> >>>>> On 01.05.14 at 02:56, <dongxiao.xu@intel.com> wrote:
> >>>> From: Ian Campbell [mailto:Ian.Campbell@citrix.com]
> >>>> Have you asked yourself whether this information even needs to be
> >>>> exposed all the way up to libxl? Who are the expected consumers of this
> >>>> interface? Are they low-level CLI tools (i.e. like xenpm is) or are you
> >>>> expecting toolstacks to plumb this information all the way up to their
> >>>> GUI or CLI (e.g. xl or virsh)?
> >>> The information returned to libxl users is the cache utilization for a
> >>> certain domain in certain socket, and the main consumers are cloud users
> like
> >>> openstack, etc. Of course, we will also provide an xl command to present
> such
> >>> information.
> >> To me this doesn't really address the question Ian asked, yet knowing
> >> who's going to be the consumer of the data is also quite relevant for
> >> answering your original question on the method to obtain that data.
> >> Obviously, if the main use of it is per-domain, a domctl would seem like
> >> a suitable approach despite the data being more of sysctl kind. But if
> >> a global view would be more important, that model would seem to make
> >> life needlessly hard for the consumers. In turn, if using a domctl, I tend
> >> to agree that not using shared pages would be preferable; iirc their use
> >> was mainly suggested because of the size of the data.
> > From the discussion with openstack developers, on certain cloud host, all
> running VM's information (e.g., domain ID) will be stored in a database, and
> openstack software will use libvirt/XenAPI to query specific domain information.
> That libvirt/XenAPI API interface basically accepts the domain ID as input
> parameter and get the domain information, including the platform QoS one.
> >
> > Based on above information, I think we'd better design the QoS hypercall
> per-domain.
> 
> The design of the hypercall has nothing to do with the design of the
> libxl/XenAPI interface.

If use the share mechanism between Xen and Dom0 user space, plus explicitly listing all the available CQM features as you proposed (see below structure cited from previous mail), then the ABI between Xen and Dom0 user space may need to be changing every time when a new QoS feature is introduced, which breaks the compatibility to some extent. :(

struct
{
    uint64_t[nr l3 events xen knows how to collect] l3_data;
    uint... new categories.
} [system max_rmid];


uint64_t[max_cqm_rmid] l3_occupency;
uint64_t[max_mbm_rmid] l3_total_bandwidth;
uint64_t[max_mbm_rmid] l3_local_bandwidth;
uint... new categories.

Thanks,
Dongxiao

> 
> It is clear from this statement that cloudstack want all information for
> all domains.  Therefore, at one level at least there will be a big set
> of nested loops like:
> 
> every $TIMEPERIOD
>   for each domain
>     for each type of information
>       get-$TYPE-information-for-$DOMAIN
> 
> As far as a XenAPI inteface would go, this would be information coming
> from the rrdd-daemon.  This daemon most certainly wont want to be using
> a hypercall designed like this.
> 
> Does anyone know how libvirt/libxl would go about
> collecting/storing/passing this information?
> 
> ~Andrew

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-02 12:40         ` Jan Beulich
  2014-05-04  0:46           ` Xu, Dongxiao
@ 2014-05-06  1:40           ` Xu, Dongxiao
  2014-05-06  7:55             ` Jan Beulich
  2014-05-06 10:06             ` Andrew Cooper
  1 sibling, 2 replies; 46+ messages in thread
From: Xu, Dongxiao @ 2014-05-06  1:40 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper(andrew.cooper3@citrix.com), Ian Campbell, xen-devel

> -----Original Message-----
> From: Xu, Dongxiao
> Sent: Sunday, May 04, 2014 8:46 AM
> To: Jan Beulich
> Cc: Andrew Cooper(andrew.cooper3@citrix.com); Ian Campbell;
> xen-devel@lists.xen.org
> Subject: RE: Xen Platform QoS design discussion
> 
> > -----Original Message-----
> > From: Jan Beulich [mailto:JBeulich@suse.com]
> > Sent: Friday, May 02, 2014 8:40 PM
> > To: Xu, Dongxiao
> > Cc: Andrew Cooper(andrew.cooper3@citrix.com); Ian Campbell;
> > xen-devel@lists.xen.org
> > Subject: RE: Xen Platform QoS design discussion
> >
> > >>> On 02.05.14 at 14:30, <dongxiao.xu@intel.com> wrote:
> > >>  -----Original Message-----
> > >> From: Jan Beulich [mailto:JBeulich@suse.com]
> > >> Sent: Friday, May 02, 2014 5:24 PM
> > >> To: Xu, Dongxiao
> > >> Cc: Andrew Cooper(andrew.cooper3@citrix.com); Ian Campbell;
> > >> xen-devel@lists.xen.org
> > >> Subject: RE: Xen Platform QoS design discussion
> > >>
> > >> >>> On 01.05.14 at 02:56, <dongxiao.xu@intel.com> wrote:
> > >> >> From: Ian Campbell [mailto:Ian.Campbell@citrix.com]
> > >> >> Have you asked yourself whether this information even needs to be
> > >> >> exposed all the way up to libxl? Who are the expected consumers of this
> > >> >> interface? Are they low-level CLI tools (i.e. like xenpm is) or are you
> > >> >> expecting toolstacks to plumb this information all the way up to their
> > >> >> GUI or CLI (e.g. xl or virsh)?
> > >> >
> > >> > The information returned to libxl users is the cache utilization for a
> > >> > certain domain in certain socket, and the main consumers are cloud users
> > > like
> > >> > openstack, etc. Of course, we will also provide an xl command to present
> > > such
> > >> > information.
> > >>
> > >> To me this doesn't really address the question Ian asked, yet knowing
> > >> who's going to be the consumer of the data is also quite relevant for
> > >> answering your original question on the method to obtain that data.
> > >> Obviously, if the main use of it is per-domain, a domctl would seem like
> > >> a suitable approach despite the data being more of sysctl kind. But if
> > >> a global view would be more important, that model would seem to make
> > >> life needlessly hard for the consumers. In turn, if using a domctl, I tend
> > >> to agree that not using shared pages would be preferable; iirc their use
> > >> was mainly suggested because of the size of the data.
> > >
> > > From the discussion with openstack developers, on certain cloud host, all
> > > running VM's information (e.g., domain ID) will be stored in a database, and
> > > openstack software will use libvirt/XenAPI to query specific domain
> > > information. That libvirt/XenAPI API interface basically accepts the domain
> > > ID as input parameter and get the domain information, including the platform
> > > QoS one.
> > >
> > > Based on above information, I think we'd better design the QoS hypercall
> > > per-domain.
> >
> > If you think that this is going to be the only (or at least prevalent)
> > usage model, that's probably okay then. But I'm a little puzzled that
> > all this effort is just for a single, rather specific consumer. I thought
> > that if this is so important to Intel there would be wider interested
> > audience.

Since there is no further comments, I suppose we all agreed on making the hypercall per-domain and use data copying mechanism between hypervisor and Dom0 tool stack?

Thanks,
Dongxiao

> 
> Not specifically for a single customer.
> Currently we consider Openstack a lot because it is one of the most popular cloud
> software.
> 
> Thanks,
> Dongxiao
> 
> >
> > Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-06  1:40           ` Xu, Dongxiao
@ 2014-05-06  7:55             ` Jan Beulich
  2014-05-06 10:06             ` Andrew Cooper
  1 sibling, 0 replies; 46+ messages in thread
From: Jan Beulich @ 2014-05-06  7:55 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: Andrew Cooper(andrew.cooper3@citrix.com), Ian Campbell, xen-devel

>>> On 06.05.14 at 03:40, <dongxiao.xu@intel.com> wrote:
> Since there is no further comments, I suppose we all agreed on making the 
> hypercall per-domain and use data copying mechanism between hypervisor and 
> Dom0 tool stack?

I think Andrew wasn't really in agreement with this, and I also think
the ball is in your court to tell us whether indeed the prevalent usage
model would benefit from that choice of yours. Considering just
OpenStack, as I said before, seems a little narrow minded to me.

That said, I'm neverthtless not really opposed to the per-domain-
copying model.

Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-04  0:46           ` Xu, Dongxiao
@ 2014-05-06  9:10             ` Ian Campbell
  0 siblings, 0 replies; 46+ messages in thread
From: Ian Campbell @ 2014-05-06  9:10 UTC (permalink / raw)
  To: Xu, Dongxiao
  Cc: Andrew Cooper(andrew.cooper3@citrix.com), Jan Beulich, xen-devel

On Sun, 2014-05-04 at 00:46 +0000, Xu, Dongxiao wrote:
> Currently we consider Openstack a lot because it is one of the most
> popular cloud software.

How are you planning to expose this information from libxl up into the
relevant part of openstack? via libvirt bindings or some other means?
Has any of this been reviewed by potential consumers of the data?

Ian.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-04  2:34           ` Xu, Dongxiao
@ 2014-05-06  9:12             ` Ian Campbell
  2014-05-06 10:00             ` Andrew Cooper
  1 sibling, 0 replies; 46+ messages in thread
From: Ian Campbell @ 2014-05-06  9:12 UTC (permalink / raw)
  To: Xu, Dongxiao; +Cc: Andrew Cooper, Jan Beulich, xen-devel

On Sun, 2014-05-04 at 02:34 +0000, Xu, Dongxiao wrote:
> > -----Original Message-----
> > From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> > Sent: Friday, May 02, 2014 8:51 PM
> > To: Xu, Dongxiao
> > Cc: Jan Beulich; Ian Campbell; xen-devel@lists.xen.org
> > Subject: Re: Xen Platform QoS design discussion
> > 
> > On 02/05/14 13:30, Xu, Dongxiao wrote:
> > >> -----Original Message-----
> > >> From: Jan Beulich [mailto:JBeulich@suse.com]
> > >> Sent: Friday, May 02, 2014 5:24 PM
> > >> To: Xu, Dongxiao
> > >> Cc: Andrew Cooper(andrew.cooper3@citrix.com); Ian Campbell;
> > >> xen-devel@lists.xen.org
> > >> Subject: RE: Xen Platform QoS design discussion
> > >>
> > >>>>> On 01.05.14 at 02:56, <dongxiao.xu@intel.com> wrote:
> > >>>> From: Ian Campbell [mailto:Ian.Campbell@citrix.com]
> > >>>> Have you asked yourself whether this information even needs to be
> > >>>> exposed all the way up to libxl? Who are the expected consumers of this
> > >>>> interface? Are they low-level CLI tools (i.e. like xenpm is) or are you
> > >>>> expecting toolstacks to plumb this information all the way up to their
> > >>>> GUI or CLI (e.g. xl or virsh)?
> > >>> The information returned to libxl users is the cache utilization for a
> > >>> certain domain in certain socket, and the main consumers are cloud users
> > like
> > >>> openstack, etc. Of course, we will also provide an xl command to present
> > such
> > >>> information.
> > >> To me this doesn't really address the question Ian asked, yet knowing
> > >> who's going to be the consumer of the data is also quite relevant for
> > >> answering your original question on the method to obtain that data.
> > >> Obviously, if the main use of it is per-domain, a domctl would seem like
> > >> a suitable approach despite the data being more of sysctl kind. But if
> > >> a global view would be more important, that model would seem to make
> > >> life needlessly hard for the consumers. In turn, if using a domctl, I tend
> > >> to agree that not using shared pages would be preferable; iirc their use
> > >> was mainly suggested because of the size of the data.
> > > From the discussion with openstack developers, on certain cloud host, all
> > running VM's information (e.g., domain ID) will be stored in a database, and
> > openstack software will use libvirt/XenAPI to query specific domain information.
> > That libvirt/XenAPI API interface basically accepts the domain ID as input
> > parameter and get the domain information, including the platform QoS one.
> > >
> > > Based on above information, I think we'd better design the QoS hypercall
> > per-domain.
> > 
> > The design of the hypercall has nothing to do with the design of the
> > libxl/XenAPI interface.
> 
> If use the share mechanism between Xen and Dom0 user space, plus
> explicitly listing all the available CQM features as you proposed (see
> below structure cited from previous mail), then the ABI between Xen
> and Dom0 user space may need to be changing every time when a new QoS
> feature is introduced, which breaks the compatibility to some
> extent. :(

This is generally acceptable for a domctl, although if it can be defined
to avoid it even better.

This isn't acceptable for the libxl layer interface though, there API
compatibility is required.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-04  2:34           ` Xu, Dongxiao
  2014-05-06  9:12             ` Ian Campbell
@ 2014-05-06 10:00             ` Andrew Cooper
  1 sibling, 0 replies; 46+ messages in thread
From: Andrew Cooper @ 2014-05-06 10:00 UTC (permalink / raw)
  To: Xu, Dongxiao; +Cc: Ian Campbell, Jan Beulich, xen-devel

On 04/05/14 03:34, Xu, Dongxiao wrote:
>> -----Original Message-----
>> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
>> Sent: Friday, May 02, 2014 8:51 PM
>> To: Xu, Dongxiao
>> Cc: Jan Beulich; Ian Campbell; xen-devel@lists.xen.org
>> Subject: Re: Xen Platform QoS design discussion
>>
>> On 02/05/14 13:30, Xu, Dongxiao wrote:
>>>> -----Original Message-----
>>>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>>> Sent: Friday, May 02, 2014 5:24 PM
>>>> To: Xu, Dongxiao
>>>> Cc: Andrew Cooper(andrew.cooper3@citrix.com); Ian Campbell;
>>>> xen-devel@lists.xen.org
>>>> Subject: RE: Xen Platform QoS design discussion
>>>>
>>>>>>> On 01.05.14 at 02:56, <dongxiao.xu@intel.com> wrote:
>>>>>> From: Ian Campbell [mailto:Ian.Campbell@citrix.com]
>>>>>> Have you asked yourself whether this information even needs to be
>>>>>> exposed all the way up to libxl? Who are the expected consumers of this
>>>>>> interface? Are they low-level CLI tools (i.e. like xenpm is) or are you
>>>>>> expecting toolstacks to plumb this information all the way up to their
>>>>>> GUI or CLI (e.g. xl or virsh)?
>>>>> The information returned to libxl users is the cache utilization for a
>>>>> certain domain in certain socket, and the main consumers are cloud users
>> like
>>>>> openstack, etc. Of course, we will also provide an xl command to present
>> such
>>>>> information.
>>>> To me this doesn't really address the question Ian asked, yet knowing
>>>> who's going to be the consumer of the data is also quite relevant for
>>>> answering your original question on the method to obtain that data.
>>>> Obviously, if the main use of it is per-domain, a domctl would seem like
>>>> a suitable approach despite the data being more of sysctl kind. But if
>>>> a global view would be more important, that model would seem to make
>>>> life needlessly hard for the consumers. In turn, if using a domctl, I tend
>>>> to agree that not using shared pages would be preferable; iirc their use
>>>> was mainly suggested because of the size of the data.
>>> From the discussion with openstack developers, on certain cloud host, all
>> running VM's information (e.g., domain ID) will be stored in a database, and
>> openstack software will use libvirt/XenAPI to query specific domain information.
>> That libvirt/XenAPI API interface basically accepts the domain ID as input
>> parameter and get the domain information, including the platform QoS one.
>>> Based on above information, I think we'd better design the QoS hypercall
>> per-domain.
>>
>> The design of the hypercall has nothing to do with the design of the
>> libxl/XenAPI interface.
> If use the share mechanism between Xen and Dom0 user space, plus explicitly listing all the available CQM features as you proposed (see below structure cited from previous mail), then the ABI between Xen and Dom0 user space may need to be changing every time when a new QoS feature is introduced, which breaks the compatibility to some extent. :(

Not in the slightest.  Xen and libxc are required to be a matching set,
compiled from the same changeset.  There are no problems at all changing
structures like this going forwards.

~Andrew

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-06  1:40           ` Xu, Dongxiao
  2014-05-06  7:55             ` Jan Beulich
@ 2014-05-06 10:06             ` Andrew Cooper
  2014-05-07  2:08               ` Xu, Dongxiao
  2014-05-07 13:26               ` George Dunlap
  1 sibling, 2 replies; 46+ messages in thread
From: Andrew Cooper @ 2014-05-06 10:06 UTC (permalink / raw)
  To: Xu, Dongxiao; +Cc: Ian Campbell, Jan Beulich, xen-devel

On 06/05/14 02:40, Xu, Dongxiao wrote:
>> -----Original Message-----
>> From: Xu, Dongxiao
>> Sent: Sunday, May 04, 2014 8:46 AM
>> To: Jan Beulich
>> Cc: Andrew Cooper(andrew.cooper3@citrix.com); Ian Campbell;
>> xen-devel@lists.xen.org
>> Subject: RE: Xen Platform QoS design discussion
>>
>>> -----Original Message-----
>>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>> Sent: Friday, May 02, 2014 8:40 PM
>>> To: Xu, Dongxiao
>>> Cc: Andrew Cooper(andrew.cooper3@citrix.com); Ian Campbell;
>>> xen-devel@lists.xen.org
>>> Subject: RE: Xen Platform QoS design discussion
>>>
>>>>>> On 02.05.14 at 14:30, <dongxiao.xu@intel.com> wrote:
>>>>>  -----Original Message-----
>>>>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>>>> Sent: Friday, May 02, 2014 5:24 PM
>>>>> To: Xu, Dongxiao
>>>>> Cc: Andrew Cooper(andrew.cooper3@citrix.com); Ian Campbell;
>>>>> xen-devel@lists.xen.org
>>>>> Subject: RE: Xen Platform QoS design discussion
>>>>>
>>>>>>>> On 01.05.14 at 02:56, <dongxiao.xu@intel.com> wrote:
>>>>>>> From: Ian Campbell [mailto:Ian.Campbell@citrix.com]
>>>>>>> Have you asked yourself whether this information even needs to be
>>>>>>> exposed all the way up to libxl? Who are the expected consumers of this
>>>>>>> interface? Are they low-level CLI tools (i.e. like xenpm is) or are you
>>>>>>> expecting toolstacks to plumb this information all the way up to their
>>>>>>> GUI or CLI (e.g. xl or virsh)?
>>>>>> The information returned to libxl users is the cache utilization for a
>>>>>> certain domain in certain socket, and the main consumers are cloud users
>>>> like
>>>>>> openstack, etc. Of course, we will also provide an xl command to present
>>>> such
>>>>>> information.
>>>>> To me this doesn't really address the question Ian asked, yet knowing
>>>>> who's going to be the consumer of the data is also quite relevant for
>>>>> answering your original question on the method to obtain that data.
>>>>> Obviously, if the main use of it is per-domain, a domctl would seem like
>>>>> a suitable approach despite the data being more of sysctl kind. But if
>>>>> a global view would be more important, that model would seem to make
>>>>> life needlessly hard for the consumers. In turn, if using a domctl, I tend
>>>>> to agree that not using shared pages would be preferable; iirc their use
>>>>> was mainly suggested because of the size of the data.
>>>> From the discussion with openstack developers, on certain cloud host, all
>>>> running VM's information (e.g., domain ID) will be stored in a database, and
>>>> openstack software will use libvirt/XenAPI to query specific domain
>>>> information. That libvirt/XenAPI API interface basically accepts the domain
>>>> ID as input parameter and get the domain information, including the platform
>>>> QoS one.
>>>>
>>>> Based on above information, I think we'd better design the QoS hypercall
>>>> per-domain.
>>> If you think that this is going to be the only (or at least prevalent)
>>> usage model, that's probably okay then. But I'm a little puzzled that
>>> all this effort is just for a single, rather specific consumer. I thought
>>> that if this is so important to Intel there would be wider interested
>>> audience.
> Since there is no further comments, I suppose we all agreed on making the hypercall per-domain and use data copying mechanism between hypervisor and Dom0 tool stack?
>

No - the onus is very much on you to prove that your API will *not* be
used in the following way:

every $TIMEPERIOD
  for each domain
    for each type of information
      get-$TYPE-information-for-$DOMAIN


Which is the source of my concerns regarding overhead.

As far as I can see, as soon as you provide access to this QoS
information, higher level toolstacks are going to want all information
for all domains.  Given your proposed domctl, they will have exactly one
(bad) way of getting this information.

~Andrew

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-06 10:06             ` Andrew Cooper
@ 2014-05-07  2:08               ` Xu, Dongxiao
  2014-05-07  9:10                 ` Ian Campbell
  2014-05-07 13:26               ` George Dunlap
  1 sibling, 1 reply; 46+ messages in thread
From: Xu, Dongxiao @ 2014-05-07  2:08 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Ian Campbell, Jan Beulich, xen-devel

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Tuesday, May 06, 2014 6:06 PM
> To: Xu, Dongxiao
> Cc: Jan Beulich; Ian Campbell; xen-devel@lists.xen.org
> Subject: Re: Xen Platform QoS design discussion
> 
> On 06/05/14 02:40, Xu, Dongxiao wrote:
> >> -----Original Message-----
> >> From: Xu, Dongxiao
> >> Sent: Sunday, May 04, 2014 8:46 AM
> >> To: Jan Beulich
> >> Cc: Andrew Cooper(andrew.cooper3@citrix.com); Ian Campbell;
> >> xen-devel@lists.xen.org
> >> Subject: RE: Xen Platform QoS design discussion
> >>
> >>> -----Original Message-----
> >>> From: Jan Beulich [mailto:JBeulich@suse.com]
> >>> Sent: Friday, May 02, 2014 8:40 PM
> >>> To: Xu, Dongxiao
> >>> Cc: Andrew Cooper(andrew.cooper3@citrix.com); Ian Campbell;
> >>> xen-devel@lists.xen.org
> >>> Subject: RE: Xen Platform QoS design discussion
> >>>
> >>>>>> On 02.05.14 at 14:30, <dongxiao.xu@intel.com> wrote:
> >>>>>  -----Original Message-----
> >>>>> From: Jan Beulich [mailto:JBeulich@suse.com]
> >>>>> Sent: Friday, May 02, 2014 5:24 PM
> >>>>> To: Xu, Dongxiao
> >>>>> Cc: Andrew Cooper(andrew.cooper3@citrix.com); Ian Campbell;
> >>>>> xen-devel@lists.xen.org
> >>>>> Subject: RE: Xen Platform QoS design discussion
> >>>>>
> >>>>>>>> On 01.05.14 at 02:56, <dongxiao.xu@intel.com> wrote:
> >>>>>>> From: Ian Campbell [mailto:Ian.Campbell@citrix.com]
> >>>>>>> Have you asked yourself whether this information even needs to be
> >>>>>>> exposed all the way up to libxl? Who are the expected consumers of
> this
> >>>>>>> interface? Are they low-level CLI tools (i.e. like xenpm is) or are you
> >>>>>>> expecting toolstacks to plumb this information all the way up to their
> >>>>>>> GUI or CLI (e.g. xl or virsh)?
> >>>>>> The information returned to libxl users is the cache utilization for a
> >>>>>> certain domain in certain socket, and the main consumers are cloud
> users
> >>>> like
> >>>>>> openstack, etc. Of course, we will also provide an xl command to present
> >>>> such
> >>>>>> information.
> >>>>> To me this doesn't really address the question Ian asked, yet knowing
> >>>>> who's going to be the consumer of the data is also quite relevant for
> >>>>> answering your original question on the method to obtain that data.
> >>>>> Obviously, if the main use of it is per-domain, a domctl would seem like
> >>>>> a suitable approach despite the data being more of sysctl kind. But if
> >>>>> a global view would be more important, that model would seem to make
> >>>>> life needlessly hard for the consumers. In turn, if using a domctl, I tend
> >>>>> to agree that not using shared pages would be preferable; iirc their use
> >>>>> was mainly suggested because of the size of the data.
> >>>> From the discussion with openstack developers, on certain cloud host, all
> >>>> running VM's information (e.g., domain ID) will be stored in a database, and
> >>>> openstack software will use libvirt/XenAPI to query specific domain
> >>>> information. That libvirt/XenAPI API interface basically accepts the domain
> >>>> ID as input parameter and get the domain information, including the
> platform
> >>>> QoS one.
> >>>>
> >>>> Based on above information, I think we'd better design the QoS hypercall
> >>>> per-domain.
> >>> If you think that this is going to be the only (or at least prevalent)
> >>> usage model, that's probably okay then. But I'm a little puzzled that
> >>> all this effort is just for a single, rather specific consumer. I thought
> >>> that if this is so important to Intel there would be wider interested
> >>> audience.
> > Since there is no further comments, I suppose we all agreed on making the
> hypercall per-domain and use data copying mechanism between hypervisor and
> Dom0 tool stack?
> >
> 

Reply previous Ian and Andrew's comments in this mail.

> No - the onus is very much on you to prove that your API will *not* be
> used in the following way:
> 
> every $TIMEPERIOD
>   for each domain
>     for each type of information
>       get-$TYPE-information-for-$DOMAIN

The "for loop" mentioned here does exist in certain software levels, and there are several options:
1. For loop in libvirt/openstack layer (likely):
In this case, domctl would be better which returns per-domain's QoS info. Otherwise it will repeatedly call sysctl hypercall to get the entire data structure but only returns one domain's info to user space.

2. For loop within libxl API function and returns whole QoS data (unlikely):
If we return such entire PQoS info to Dom0 user space via libxl API, then this API will be changing once new PQoS feature comes out. As Ian mentioned, we need certain compatibility for libxl API.

> 
> Which is the source of my concerns regarding overhead.
> 
> As far as I can see, as soon as you provide access to this QoS
> information, higher level toolstacks are going to want all information
> for all domains.  Given your proposed domctl, they will have exactly one
> (bad) way of getting this information.

I understand your point.
I think the final decision should be a compromise between:
1) Overhead.
2) Compatibility.
3) Code flexibility and simplicity.

Thanks,
Dongxiao

> 
> ~Andrew

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-07  2:08               ` Xu, Dongxiao
@ 2014-05-07  9:10                 ` Ian Campbell
  0 siblings, 0 replies; 46+ messages in thread
From: Ian Campbell @ 2014-05-07  9:10 UTC (permalink / raw)
  To: Xu, Dongxiao; +Cc: Andrew Cooper, Jan Beulich, xen-devel

On Wed, 2014-05-07 at 02:08 +0000, Xu, Dongxiao wrote:
> > -----Original Message-----
> > From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> > Sent: Tuesday, May 06, 2014 6:06 PM
> > To: Xu, Dongxiao
> > Cc: Jan Beulich; Ian Campbell; xen-devel@lists.xen.org
> > Subject: Re: Xen Platform QoS design discussion
> > 
> > On 06/05/14 02:40, Xu, Dongxiao wrote:
> > >> -----Original Message-----
> > >> From: Xu, Dongxiao
> > >> Sent: Sunday, May 04, 2014 8:46 AM
> > >> To: Jan Beulich
> > >> Cc: Andrew Cooper(andrew.cooper3@citrix.com); Ian Campbell;
> > >> xen-devel@lists.xen.org
> > >> Subject: RE: Xen Platform QoS design discussion
> > >>
> > >>> -----Original Message-----
> > >>> From: Jan Beulich [mailto:JBeulich@suse.com]
> > >>> Sent: Friday, May 02, 2014 8:40 PM
> > >>> To: Xu, Dongxiao
> > >>> Cc: Andrew Cooper(andrew.cooper3@citrix.com); Ian Campbell;
> > >>> xen-devel@lists.xen.org
> > >>> Subject: RE: Xen Platform QoS design discussion
> > >>>
> > >>>>>> On 02.05.14 at 14:30, <dongxiao.xu@intel.com> wrote:
> > >>>>>  -----Original Message-----
> > >>>>> From: Jan Beulich [mailto:JBeulich@suse.com]
> > >>>>> Sent: Friday, May 02, 2014 5:24 PM
> > >>>>> To: Xu, Dongxiao
> > >>>>> Cc: Andrew Cooper(andrew.cooper3@citrix.com); Ian Campbell;
> > >>>>> xen-devel@lists.xen.org
> > >>>>> Subject: RE: Xen Platform QoS design discussion
> > >>>>>
> > >>>>>>>> On 01.05.14 at 02:56, <dongxiao.xu@intel.com> wrote:
> > >>>>>>> From: Ian Campbell [mailto:Ian.Campbell@citrix.com]
> > >>>>>>> Have you asked yourself whether this information even needs to be
> > >>>>>>> exposed all the way up to libxl? Who are the expected consumers of
> > this
> > >>>>>>> interface? Are they low-level CLI tools (i.e. like xenpm is) or are you
> > >>>>>>> expecting toolstacks to plumb this information all the way up to their
> > >>>>>>> GUI or CLI (e.g. xl or virsh)?
> > >>>>>> The information returned to libxl users is the cache utilization for a
> > >>>>>> certain domain in certain socket, and the main consumers are cloud
> > users
> > >>>> like
> > >>>>>> openstack, etc. Of course, we will also provide an xl command to present
> > >>>> such
> > >>>>>> information.
> > >>>>> To me this doesn't really address the question Ian asked, yet knowing
> > >>>>> who's going to be the consumer of the data is also quite relevant for
> > >>>>> answering your original question on the method to obtain that data.
> > >>>>> Obviously, if the main use of it is per-domain, a domctl would seem like
> > >>>>> a suitable approach despite the data being more of sysctl kind. But if
> > >>>>> a global view would be more important, that model would seem to make
> > >>>>> life needlessly hard for the consumers. In turn, if using a domctl, I tend
> > >>>>> to agree that not using shared pages would be preferable; iirc their use
> > >>>>> was mainly suggested because of the size of the data.
> > >>>> From the discussion with openstack developers, on certain cloud host, all
> > >>>> running VM's information (e.g., domain ID) will be stored in a database, and
> > >>>> openstack software will use libvirt/XenAPI to query specific domain
> > >>>> information. That libvirt/XenAPI API interface basically accepts the domain
> > >>>> ID as input parameter and get the domain information, including the
> > platform
> > >>>> QoS one.
> > >>>>
> > >>>> Based on above information, I think we'd better design the QoS hypercall
> > >>>> per-domain.
> > >>> If you think that this is going to be the only (or at least prevalent)
> > >>> usage model, that's probably okay then. But I'm a little puzzled that
> > >>> all this effort is just for a single, rather specific consumer. I thought
> > >>> that if this is so important to Intel there would be wider interested
> > >>> audience.
> > > Since there is no further comments, I suppose we all agreed on making the
> > hypercall per-domain and use data copying mechanism between hypervisor and
> > Dom0 tool stack?
> > >
> > 
> 
> Reply previous Ian and Andrew's comments in this mail.
> 
> > No - the onus is very much on you to prove that your API will *not* be
> > used in the following way:
> > 
> > every $TIMEPERIOD
> >   for each domain
> >     for each type of information
> >       get-$TYPE-information-for-$DOMAIN
> 
> The "for loop" mentioned here does exist in certain software levels, and there are several options:
> 1. For loop in libvirt/openstack layer (likely):
> In this case, domctl would be better which returns per-domain's QoS
> info. Otherwise it will repeatedly call sysctl hypercall to get the
> entire data structure but only returns one domain's info to user
> space.
> 
> 2. For loop within libxl API function and returns whole QoS data (unlikely):
> If we return such entire PQoS info to Dom0 user space via libxl API,
> then this API will be changing once new PQoS feature comes out.

I don't see why this is a) any more likely with #2 than #1 or b) why the
API can't be designed in such a way as to make it extensible to start
with.

Please take a look in libxl.h for a bug comment about the mechanisms
which we support for extending the API, the most obvious one being that
adding a new field to a struct (or case to an enum etc) is OK so long as
a suitable #define LIBXL_HAVE_THING is added at the same time.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-06 10:06             ` Andrew Cooper
  2014-05-07  2:08               ` Xu, Dongxiao
@ 2014-05-07 13:26               ` George Dunlap
  2014-05-07 21:18                 ` Andrew Cooper
  1 sibling, 1 reply; 46+ messages in thread
From: George Dunlap @ 2014-05-07 13:26 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xu, Dongxiao, Ian Campbell, Jan Beulich, xen-devel

On Tue, May 6, 2014 at 11:06 AM, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> On 06/05/14 02:40, Xu, Dongxiao wrote:
>>> -----Original Message-----
>>> From: Xu, Dongxiao
>>> Sent: Sunday, May 04, 2014 8:46 AM
>>> To: Jan Beulich
>>> Cc: Andrew Cooper(andrew.cooper3@citrix.com); Ian Campbell;
>>> xen-devel@lists.xen.org
>>> Subject: RE: Xen Platform QoS design discussion
>>>
>>>> -----Original Message-----
>>>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>>> Sent: Friday, May 02, 2014 8:40 PM
>>>> To: Xu, Dongxiao
>>>> Cc: Andrew Cooper(andrew.cooper3@citrix.com); Ian Campbell;
>>>> xen-devel@lists.xen.org
>>>> Subject: RE: Xen Platform QoS design discussion
>>>>
>>>>>>> On 02.05.14 at 14:30, <dongxiao.xu@intel.com> wrote:
>>>>>>  -----Original Message-----
>>>>>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>>>>> Sent: Friday, May 02, 2014 5:24 PM
>>>>>> To: Xu, Dongxiao
>>>>>> Cc: Andrew Cooper(andrew.cooper3@citrix.com); Ian Campbell;
>>>>>> xen-devel@lists.xen.org
>>>>>> Subject: RE: Xen Platform QoS design discussion
>>>>>>
>>>>>>>>> On 01.05.14 at 02:56, <dongxiao.xu@intel.com> wrote:
>>>>>>>> From: Ian Campbell [mailto:Ian.Campbell@citrix.com]
>>>>>>>> Have you asked yourself whether this information even needs to be
>>>>>>>> exposed all the way up to libxl? Who are the expected consumers of this
>>>>>>>> interface? Are they low-level CLI tools (i.e. like xenpm is) or are you
>>>>>>>> expecting toolstacks to plumb this information all the way up to their
>>>>>>>> GUI or CLI (e.g. xl or virsh)?
>>>>>>> The information returned to libxl users is the cache utilization for a
>>>>>>> certain domain in certain socket, and the main consumers are cloud users
>>>>> like
>>>>>>> openstack, etc. Of course, we will also provide an xl command to present
>>>>> such
>>>>>>> information.
>>>>>> To me this doesn't really address the question Ian asked, yet knowing
>>>>>> who's going to be the consumer of the data is also quite relevant for
>>>>>> answering your original question on the method to obtain that data.
>>>>>> Obviously, if the main use of it is per-domain, a domctl would seem like
>>>>>> a suitable approach despite the data being more of sysctl kind. But if
>>>>>> a global view would be more important, that model would seem to make
>>>>>> life needlessly hard for the consumers. In turn, if using a domctl, I tend
>>>>>> to agree that not using shared pages would be preferable; iirc their use
>>>>>> was mainly suggested because of the size of the data.
>>>>> From the discussion with openstack developers, on certain cloud host, all
>>>>> running VM's information (e.g., domain ID) will be stored in a database, and
>>>>> openstack software will use libvirt/XenAPI to query specific domain
>>>>> information. That libvirt/XenAPI API interface basically accepts the domain
>>>>> ID as input parameter and get the domain information, including the platform
>>>>> QoS one.
>>>>>
>>>>> Based on above information, I think we'd better design the QoS hypercall
>>>>> per-domain.
>>>> If you think that this is going to be the only (or at least prevalent)
>>>> usage model, that's probably okay then. But I'm a little puzzled that
>>>> all this effort is just for a single, rather specific consumer. I thought
>>>> that if this is so important to Intel there would be wider interested
>>>> audience.
>> Since there is no further comments, I suppose we all agreed on making the hypercall per-domain and use data copying mechanism between hypervisor and Dom0 tool stack?
>>
>
> No - the onus is very much on you to prove that your API will *not* be
> used in the following way:
>
> every $TIMEPERIOD
>   for each domain
>     for each type of information
>       get-$TYPE-information-for-$DOMAIN
>
>
> Which is the source of my concerns regarding overhead.
>
> As far as I can see, as soon as you provide access to this QoS
> information, higher level toolstacks are going to want all information
> for all domains.  Given your proposed domctl, they will have exactly one
> (bad) way of getting this information.

Is this really going to be that much of a critical path that we need
to even have this discussion?

We have two different hypercalls right now for getting "dominfo": a
domctl and a sysctl.  You use the domctl if you want information about
a single domain, you use sysctl if you want information about all
domains.  The sysctl implementation calls the domctl implementation
internally.

Is there a problem with doing the same thing here?  Or, with starting
with a domctl, and then creating a sysctl if iterating over all
domains (and calling the domctl internally) if we measure the domctl
to be too slow for many callers?

 -George

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-07 13:26               ` George Dunlap
@ 2014-05-07 21:18                 ` Andrew Cooper
  2014-05-08  5:21                   ` Xu, Dongxiao
  0 siblings, 1 reply; 46+ messages in thread
From: Andrew Cooper @ 2014-05-07 21:18 UTC (permalink / raw)
  To: George Dunlap; +Cc: Xu, Dongxiao, Ian Campbell, Jan Beulich, xen-devel

On 07/05/14 14:26, George Dunlap wrote:
> On Tue, May 6, 2014 at 11:06 AM, Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
>> On 06/05/14 02:40, Xu, Dongxiao wrote:
>>>> -----Original Message-----
>>>> From: Xu, Dongxiao
>>>> Sent: Sunday, May 04, 2014 8:46 AM
>>>> To: Jan Beulich
>>>> Cc: Andrew Cooper(andrew.cooper3@citrix.com); Ian Campbell;
>>>> xen-devel@lists.xen.org
>>>> Subject: RE: Xen Platform QoS design discussion
>>>>
>>>>> -----Original Message-----
>>>>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>>>> Sent: Friday, May 02, 2014 8:40 PM
>>>>> To: Xu, Dongxiao
>>>>> Cc: Andrew Cooper(andrew.cooper3@citrix.com); Ian Campbell;
>>>>> xen-devel@lists.xen.org
>>>>> Subject: RE: Xen Platform QoS design discussion
>>>>>
>>>>>>>> On 02.05.14 at 14:30, <dongxiao.xu@intel.com> wrote:
>>>>>>>  -----Original Message-----
>>>>>>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>>>>>> Sent: Friday, May 02, 2014 5:24 PM
>>>>>>> To: Xu, Dongxiao
>>>>>>> Cc: Andrew Cooper(andrew.cooper3@citrix.com); Ian Campbell;
>>>>>>> xen-devel@lists.xen.org
>>>>>>> Subject: RE: Xen Platform QoS design discussion
>>>>>>>
>>>>>>>>>> On 01.05.14 at 02:56, <dongxiao.xu@intel.com> wrote:
>>>>>>>>> From: Ian Campbell [mailto:Ian.Campbell@citrix.com]
>>>>>>>>> Have you asked yourself whether this information even needs to be
>>>>>>>>> exposed all the way up to libxl? Who are the expected consumers of this
>>>>>>>>> interface? Are they low-level CLI tools (i.e. like xenpm is) or are you
>>>>>>>>> expecting toolstacks to plumb this information all the way up to their
>>>>>>>>> GUI or CLI (e.g. xl or virsh)?
>>>>>>>> The information returned to libxl users is the cache utilization for a
>>>>>>>> certain domain in certain socket, and the main consumers are cloud users
>>>>>> like
>>>>>>>> openstack, etc. Of course, we will also provide an xl command to present
>>>>>> such
>>>>>>>> information.
>>>>>>> To me this doesn't really address the question Ian asked, yet knowing
>>>>>>> who's going to be the consumer of the data is also quite relevant for
>>>>>>> answering your original question on the method to obtain that data.
>>>>>>> Obviously, if the main use of it is per-domain, a domctl would seem like
>>>>>>> a suitable approach despite the data being more of sysctl kind. But if
>>>>>>> a global view would be more important, that model would seem to make
>>>>>>> life needlessly hard for the consumers. In turn, if using a domctl, I tend
>>>>>>> to agree that not using shared pages would be preferable; iirc their use
>>>>>>> was mainly suggested because of the size of the data.
>>>>>> From the discussion with openstack developers, on certain cloud host, all
>>>>>> running VM's information (e.g., domain ID) will be stored in a database, and
>>>>>> openstack software will use libvirt/XenAPI to query specific domain
>>>>>> information. That libvirt/XenAPI API interface basically accepts the domain
>>>>>> ID as input parameter and get the domain information, including the platform
>>>>>> QoS one.
>>>>>>
>>>>>> Based on above information, I think we'd better design the QoS hypercall
>>>>>> per-domain.
>>>>> If you think that this is going to be the only (or at least prevalent)
>>>>> usage model, that's probably okay then. But I'm a little puzzled that
>>>>> all this effort is just for a single, rather specific consumer. I thought
>>>>> that if this is so important to Intel there would be wider interested
>>>>> audience.
>>> Since there is no further comments, I suppose we all agreed on making the hypercall per-domain and use data copying mechanism between hypervisor and Dom0 tool stack?
>>>
>> No - the onus is very much on you to prove that your API will *not* be
>> used in the following way:
>>
>> every $TIMEPERIOD
>>   for each domain
>>     for each type of information
>>       get-$TYPE-information-for-$DOMAIN
>>
>>
>> Which is the source of my concerns regarding overhead.
>>
>> As far as I can see, as soon as you provide access to this QoS
>> information, higher level toolstacks are going to want all information
>> for all domains.  Given your proposed domctl, they will have exactly one
>> (bad) way of getting this information.
> Is this really going to be that much of a critical path that we need
> to even have this discussion?

Absolutely.

If that logical set of nested loops is on a remote control instance
where get-$TYPE-information-for-$DOMAIN involves rpc to a particular
dom0, then the domctls can be approximated as being functionally
infinite time periods apart.

If the set of nested loops is a daemon or script in dom0, the domctls
will be very close together.

As the current implementation involves taking a global spinlock, IPI'ing
the other sockets and MSR interactions, the net impact on the running
system can be massive, particularly if back-to-back IPIs interrupt HVM
guests.

>
> We have two different hypercalls right now for getting "dominfo": a
> domctl and a sysctl.  You use the domctl if you want information about
> a single domain, you use sysctl if you want information about all
> domains.  The sysctl implementation calls the domctl implementation
> internally.

It is not a fair comparison, given the completely different nature of
the domctls in question.  XEN_DOMCTL_getdomaininfo is doing very little
more than reading specific bits of data out the appropriate struct
domain and its struct vcpu's which can trivially be done by the cpu
handling the hypercall.

>
> Is there a problem with doing the same thing here?  Or, with starting
> with a domctl, and then creating a sysctl if iterating over all
> domains (and calling the domctl internally) if we measure the domctl
> to be too slow for many callers?
>
>  -George

My problem is not with the domctl per-se.

My problem is that this is not a QoS design discussion;  this is an
email thread about a specific QoS implementation which is not answering
the concerns raised against it to the satisfaction of people raising the
concerns.

The core argument here is that a statement of "OpenStack want to get a
piece of QoS data back from libvirt/xenapi when querying a specific
domain" is being used to justify implementing the hypercall in an
identical fashion.

This is not a libxl design; this is a single user story forming part of
the requirement "I as a cloud service provider would like QoS
information for each VM to be available to my
$CHOSEN_ORCHESTRATION_SOFTWARE so I can {differentially charge
customers, balance my load more evenly, etc}".

The only valid justification for implementing a brand new hypercall in a
certain way is "Because $THIS_CERTAIN_WAY is the $MOST_SENSIBLE way to
perform the actions I need to perform", for appropriately
substitutions.  Not "because it is the same way I want to hand this
information off at the higher level".

As part of this design discussion. I have raised a concern saying "I
believe the usecase of having a stats gathering daemon in dom0 has not
been appropriately considered", qualified with "If you were to use the
domctl as currently designed from a stats gathering daemon, you will
cripple Xen with the overhead".

Going back to the original use, xenapi has a stats daemon for these
things.  It has an rpc interface so a query given a specific domain can
return some or all data for that domain, but it very definitely does not
translate each request into a hypercall for the requested information. 
I have no real experience with libvirt, so can't comment on stats
gathering in that context.

I have proposed an alternative Xen->libxc interface designed with a
stats daemon in mind, explaining why I believe it has lower overheads to
Xen and why is more in line with what I expect ${VENDOR}Stack to
actually want.

I am now waiting for a reasoned rebuttal which has more content than
"because there are a set of patches which already implement it in this way".

~Andrew

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-07 21:18                 ` Andrew Cooper
@ 2014-05-08  5:21                   ` Xu, Dongxiao
  2014-05-08 11:25                     ` Andrew Cooper
  0 siblings, 1 reply; 46+ messages in thread
From: Xu, Dongxiao @ 2014-05-08  5:21 UTC (permalink / raw)
  To: Andrew Cooper, George Dunlap; +Cc: Ian Campbell, Jan Beulich, xen-devel

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Thursday, May 08, 2014 5:19 AM
> To: George Dunlap
> Cc: Xu, Dongxiao; Ian Campbell; Jan Beulich; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] Xen Platform QoS design discussion
> 
> On 07/05/14 14:26, George Dunlap wrote:
> > On Tue, May 6, 2014 at 11:06 AM, Andrew Cooper
> > <andrew.cooper3@citrix.com> wrote:
> >> On 06/05/14 02:40, Xu, Dongxiao wrote:
> >>>> -----Original Message-----
> >>>> From: Xu, Dongxiao
> >>>> Sent: Sunday, May 04, 2014 8:46 AM
> >>>> To: Jan Beulich
> >>>> Cc: Andrew Cooper(andrew.cooper3@citrix.com); Ian Campbell;
> >>>> xen-devel@lists.xen.org
> >>>> Subject: RE: Xen Platform QoS design discussion
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Jan Beulich [mailto:JBeulich@suse.com]
> >>>>> Sent: Friday, May 02, 2014 8:40 PM
> >>>>> To: Xu, Dongxiao
> >>>>> Cc: Andrew Cooper(andrew.cooper3@citrix.com); Ian Campbell;
> >>>>> xen-devel@lists.xen.org
> >>>>> Subject: RE: Xen Platform QoS design discussion
> >>>>>
> >>>>>>>> On 02.05.14 at 14:30, <dongxiao.xu@intel.com> wrote:
> >>>>>>>  -----Original Message-----
> >>>>>>> From: Jan Beulich [mailto:JBeulich@suse.com]
> >>>>>>> Sent: Friday, May 02, 2014 5:24 PM
> >>>>>>> To: Xu, Dongxiao
> >>>>>>> Cc: Andrew Cooper(andrew.cooper3@citrix.com); Ian Campbell;
> >>>>>>> xen-devel@lists.xen.org
> >>>>>>> Subject: RE: Xen Platform QoS design discussion
> >>>>>>>
> >>>>>>>>>> On 01.05.14 at 02:56, <dongxiao.xu@intel.com> wrote:
> >>>>>>>>> From: Ian Campbell [mailto:Ian.Campbell@citrix.com]
> >>>>>>>>> Have you asked yourself whether this information even needs to be
> >>>>>>>>> exposed all the way up to libxl? Who are the expected consumers of
> this
> >>>>>>>>> interface? Are they low-level CLI tools (i.e. like xenpm is) or are you
> >>>>>>>>> expecting toolstacks to plumb this information all the way up to their
> >>>>>>>>> GUI or CLI (e.g. xl or virsh)?
> >>>>>>>> The information returned to libxl users is the cache utilization for a
> >>>>>>>> certain domain in certain socket, and the main consumers are cloud
> users
> >>>>>> like
> >>>>>>>> openstack, etc. Of course, we will also provide an xl command to
> present
> >>>>>> such
> >>>>>>>> information.
> >>>>>>> To me this doesn't really address the question Ian asked, yet knowing
> >>>>>>> who's going to be the consumer of the data is also quite relevant for
> >>>>>>> answering your original question on the method to obtain that data.
> >>>>>>> Obviously, if the main use of it is per-domain, a domctl would seem like
> >>>>>>> a suitable approach despite the data being more of sysctl kind. But if
> >>>>>>> a global view would be more important, that model would seem to
> make
> >>>>>>> life needlessly hard for the consumers. In turn, if using a domctl, I tend
> >>>>>>> to agree that not using shared pages would be preferable; iirc their use
> >>>>>>> was mainly suggested because of the size of the data.
> >>>>>> From the discussion with openstack developers, on certain cloud host, all
> >>>>>> running VM's information (e.g., domain ID) will be stored in a database,
> and
> >>>>>> openstack software will use libvirt/XenAPI to query specific domain
> >>>>>> information. That libvirt/XenAPI API interface basically accepts the
> domain
> >>>>>> ID as input parameter and get the domain information, including the
> platform
> >>>>>> QoS one.
> >>>>>>
> >>>>>> Based on above information, I think we'd better design the QoS
> hypercall
> >>>>>> per-domain.
> >>>>> If you think that this is going to be the only (or at least prevalent)
> >>>>> usage model, that's probably okay then. But I'm a little puzzled that
> >>>>> all this effort is just for a single, rather specific consumer. I thought
> >>>>> that if this is so important to Intel there would be wider interested
> >>>>> audience.
> >>> Since there is no further comments, I suppose we all agreed on making the
> hypercall per-domain and use data copying mechanism between hypervisor and
> Dom0 tool stack?
> >>>
> >> No - the onus is very much on you to prove that your API will *not* be
> >> used in the following way:
> >>
> >> every $TIMEPERIOD
> >>   for each domain
> >>     for each type of information
> >>       get-$TYPE-information-for-$DOMAIN
> >>
> >>
> >> Which is the source of my concerns regarding overhead.
> >>
> >> As far as I can see, as soon as you provide access to this QoS
> >> information, higher level toolstacks are going to want all information
> >> for all domains.  Given your proposed domctl, they will have exactly one
> >> (bad) way of getting this information.
> > Is this really going to be that much of a critical path that we need
> > to even have this discussion?
> 
> Absolutely.
> 
> If that logical set of nested loops is on a remote control instance
> where get-$TYPE-information-for-$DOMAIN involves rpc to a particular
> dom0, then the domctls can be approximated as being functionally
> infinite time periods apart.
> 
> If the set of nested loops is a daemon or script in dom0, the domctls
> will be very close together.
> 
> As the current implementation involves taking a global spinlock, IPI'ing
> the other sockets and MSR interactions, the net impact on the running
> system can be massive, particularly if back-to-back IPIs interrupt HVM
> guests.
> 
> >
> > We have two different hypercalls right now for getting "dominfo": a
> > domctl and a sysctl.  You use the domctl if you want information about
> > a single domain, you use sysctl if you want information about all
> > domains.  The sysctl implementation calls the domctl implementation
> > internally.
> 
> It is not a fair comparison, given the completely different nature of
> the domctls in question.  XEN_DOMCTL_getdomaininfo is doing very little
> more than reading specific bits of data out the appropriate struct
> domain and its struct vcpu's which can trivially be done by the cpu
> handling the hypercall.
> 
> >
> > Is there a problem with doing the same thing here?  Or, with starting
> > with a domctl, and then creating a sysctl if iterating over all
> > domains (and calling the domctl internally) if we measure the domctl
> > to be too slow for many callers?
> >
> >  -George
> 
> My problem is not with the domctl per-se.
> 
> My problem is that this is not a QoS design discussion;  this is an
> email thread about a specific QoS implementation which is not answering
> the concerns raised against it to the satisfaction of people raising the
> concerns.
> 
> The core argument here is that a statement of "OpenStack want to get a
> piece of QoS data back from libvirt/xenapi when querying a specific
> domain" is being used to justify implementing the hypercall in an
> identical fashion.
> 
> This is not a libxl design; this is a single user story forming part of
> the requirement "I as a cloud service provider would like QoS
> information for each VM to be available to my
> $CHOSEN_ORCHESTRATION_SOFTWARE so I can {differentially charge
> customers, balance my load more evenly, etc}".
> 
> The only valid justification for implementing a brand new hypercall in a
> certain way is "Because $THIS_CERTAIN_WAY is the $MOST_SENSIBLE way to
> perform the actions I need to perform", for appropriately
> substitutions.  Not "because it is the same way I want to hand this
> information off at the higher level".
> 
> As part of this design discussion. I have raised a concern saying "I
> believe the usecase of having a stats gathering daemon in dom0 has not
> been appropriately considered", qualified with "If you were to use the
> domctl as currently designed from a stats gathering daemon, you will
> cripple Xen with the overhead".
> 
> Going back to the original use, xenapi has a stats daemon for these
> things.  It has an rpc interface so a query given a specific domain can
> return some or all data for that domain, but it very definitely does not
> translate each request into a hypercall for the requested information.
> I have no real experience with libvirt, so can't comment on stats
> gathering in that context.
> 
> I have proposed an alternative Xen->libxc interface designed with a
> stats daemon in mind, explaining why I believe it has lower overheads to
> Xen and why is more in line with what I expect ${VENDOR}Stack to
> actually want.
> 
> I am now waiting for a reasoned rebuttal which has more content than
> "because there are a set of patches which already implement it in this way".

No, I don't have the patch for domctl implementation. 

In the past half year, all previous v1-v10 patches are implemented in sysctl way, however based on that, people raised a lot of comments (large size of memory, runtime non-0 order of memory allocation, page sharing with user space, CPU online/offline special logic, etc.), and these make the platform QoS implementation more and more complex in Xen. That's why I am proposing the domctl method that can make things easier.

I don't have more things to argue or rebuttal, and if you prefer sysctl, I can continue to work out a v11, v12 or more, to present the big 2-dimension array to end user and let them withdraw their real required data, still includes the extra CPU online/offline logics to handle the QoS resource runtime allocation.

Thanks,
Dongxiao

> 
> ~Andrew

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-08  5:21                   ` Xu, Dongxiao
@ 2014-05-08 11:25                     ` Andrew Cooper
  2014-05-09  2:41                       ` Xu, Dongxiao
                                         ` (2 more replies)
  0 siblings, 3 replies; 46+ messages in thread
From: Andrew Cooper @ 2014-05-08 11:25 UTC (permalink / raw)
  To: Xu, Dongxiao; +Cc: George Dunlap, Ian Campbell, Jan Beulich, xen-devel

On 08/05/14 06:21, Xu, Dongxiao wrote:

<massive snip>

>>
>>> We have two different hypercalls right now for getting "dominfo": a
>>> domctl and a sysctl.  You use the domctl if you want information about
>>> a single domain, you use sysctl if you want information about all
>>> domains.  The sysctl implementation calls the domctl implementation
>>> internally.
>> It is not a fair comparison, given the completely different nature of
>> the domctls in question.  XEN_DOMCTL_getdomaininfo is doing very little
>> more than reading specific bits of data out the appropriate struct
>> domain and its struct vcpu's which can trivially be done by the cpu
>> handling the hypercall.
>>
>>> Is there a problem with doing the same thing here?  Or, with starting
>>> with a domctl, and then creating a sysctl if iterating over all
>>> domains (and calling the domctl internally) if we measure the domctl
>>> to be too slow for many callers?
>>>
>>>  -George
>> My problem is not with the domctl per-se.
>>
>> My problem is that this is not a QoS design discussion;  this is an
>> email thread about a specific QoS implementation which is not answering
>> the concerns raised against it to the satisfaction of people raising the
>> concerns.
>>
>> The core argument here is that a statement of "OpenStack want to get a
>> piece of QoS data back from libvirt/xenapi when querying a specific
>> domain" is being used to justify implementing the hypercall in an
>> identical fashion.
>>
>> This is not a libxl design; this is a single user story forming part of
>> the requirement "I as a cloud service provider would like QoS
>> information for each VM to be available to my
>> $CHOSEN_ORCHESTRATION_SOFTWARE so I can {differentially charge
>> customers, balance my load more evenly, etc}".
>>
>> The only valid justification for implementing a brand new hypercall in a
>> certain way is "Because $THIS_CERTAIN_WAY is the $MOST_SENSIBLE way to
>> perform the actions I need to perform", for appropriately
>> substitutions.  Not "because it is the same way I want to hand this
>> information off at the higher level".
>>
>> As part of this design discussion. I have raised a concern saying "I
>> believe the usecase of having a stats gathering daemon in dom0 has not
>> been appropriately considered", qualified with "If you were to use the
>> domctl as currently designed from a stats gathering daemon, you will
>> cripple Xen with the overhead".
>>
>> Going back to the original use, xenapi has a stats daemon for these
>> things.  It has an rpc interface so a query given a specific domain can
>> return some or all data for that domain, but it very definitely does not
>> translate each request into a hypercall for the requested information.
>> I have no real experience with libvirt, so can't comment on stats
>> gathering in that context.
>>
>> I have proposed an alternative Xen->libxc interface designed with a
>> stats daemon in mind, explaining why I believe it has lower overheads to
>> Xen and why is more in line with what I expect ${VENDOR}Stack to
>> actually want.
>>
>> I am now waiting for a reasoned rebuttal which has more content than
>> "because there are a set of patches which already implement it in this way".
> No, I don't have the patch for domctl implementation. 
>
> In the past half year, all previous v1-v10 patches are implemented in sysctl way, however based on that, people raised a lot of comments (large size of memory, runtime non-0 order of memory allocation, page sharing with user space, CPU online/offline special logic, etc.), and these make the platform QoS implementation more and more complex in Xen. That's why I am proposing the domctl method that can make things easier.
>
> I don't have more things to argue or rebuttal, and if you prefer sysctl, I can continue to work out a v11, v12 or more, to present the big 2-dimension array to end user and let them withdraw their real required data, still includes the extra CPU online/offline logics to handle the QoS resource runtime allocation.
>
> Thanks,
> Dongxiao

I am sorry - I was not trying to make an argument for one of the
proposed mechanisms over the other.  The point I was trying to make
(which on further consideration isn't as clear as I was hoping) is that
you cannot possibly design the hypercall interface before knowing the
library usecases, and there is a clear lack of understanding (or at
least communication) in this regard.


So, starting from the top. OpenStack want QoS information, and want to
get it from libvirt/XenAPI.  I think libvirt/XenAPI is the correct level
to do this at, and think exactly the same would apply to CloudStack as
well.  The relevant part of this is the question "how does
libvirt/XenAPI collect stats".

XenAPI collects stats with the RRD Daemon, running in dom0.  It has an
internal database of statistics, and hands data from this database out
upon RPC requests.  It also has threads whose purpose is to periodically
refresh the data in the database.  This provides a disconnect between
${FOO}Stack requesting stats for a domain and the logic to obtain stats
for that domain.

I am however unfamiliar with libvirt in this regard.  Could you please
explain how the libvirt daemon deals with stats?

~Andrew

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-08 11:25                     ` Andrew Cooper
@ 2014-05-09  2:41                       ` Xu, Dongxiao
  2014-05-13  1:53                       ` Xu, Dongxiao
  2014-05-16  5:11                       ` Xu, Dongxiao
  2 siblings, 0 replies; 46+ messages in thread
From: Xu, Dongxiao @ 2014-05-09  2:41 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Ian Campbell, Jan Beulich, xen-devel

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Thursday, May 08, 2014 7:26 PM
> To: Xu, Dongxiao
> Cc: George Dunlap; Ian Campbell; Jan Beulich; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] Xen Platform QoS design discussion
> 
> On 08/05/14 06:21, Xu, Dongxiao wrote:
> 
> <massive snip>
> 
> >>
> >>> We have two different hypercalls right now for getting "dominfo": a
> >>> domctl and a sysctl.  You use the domctl if you want information about
> >>> a single domain, you use sysctl if you want information about all
> >>> domains.  The sysctl implementation calls the domctl implementation
> >>> internally.
> >> It is not a fair comparison, given the completely different nature of
> >> the domctls in question.  XEN_DOMCTL_getdomaininfo is doing very little
> >> more than reading specific bits of data out the appropriate struct
> >> domain and its struct vcpu's which can trivially be done by the cpu
> >> handling the hypercall.
> >>
> >>> Is there a problem with doing the same thing here?  Or, with starting
> >>> with a domctl, and then creating a sysctl if iterating over all
> >>> domains (and calling the domctl internally) if we measure the domctl
> >>> to be too slow for many callers?
> >>>
> >>>  -George
> >> My problem is not with the domctl per-se.
> >>
> >> My problem is that this is not a QoS design discussion;  this is an
> >> email thread about a specific QoS implementation which is not answering
> >> the concerns raised against it to the satisfaction of people raising the
> >> concerns.
> >>
> >> The core argument here is that a statement of "OpenStack want to get a
> >> piece of QoS data back from libvirt/xenapi when querying a specific
> >> domain" is being used to justify implementing the hypercall in an
> >> identical fashion.
> >>
> >> This is not a libxl design; this is a single user story forming part of
> >> the requirement "I as a cloud service provider would like QoS
> >> information for each VM to be available to my
> >> $CHOSEN_ORCHESTRATION_SOFTWARE so I can {differentially charge
> >> customers, balance my load more evenly, etc}".
> >>
> >> The only valid justification for implementing a brand new hypercall in a
> >> certain way is "Because $THIS_CERTAIN_WAY is the $MOST_SENSIBLE way to
> >> perform the actions I need to perform", for appropriately
> >> substitutions.  Not "because it is the same way I want to hand this
> >> information off at the higher level".
> >>
> >> As part of this design discussion. I have raised a concern saying "I
> >> believe the usecase of having a stats gathering daemon in dom0 has not
> >> been appropriately considered", qualified with "If you were to use the
> >> domctl as currently designed from a stats gathering daemon, you will
> >> cripple Xen with the overhead".
> >>
> >> Going back to the original use, xenapi has a stats daemon for these
> >> things.  It has an rpc interface so a query given a specific domain can
> >> return some or all data for that domain, but it very definitely does not
> >> translate each request into a hypercall for the requested information.
> >> I have no real experience with libvirt, so can't comment on stats
> >> gathering in that context.
> >>
> >> I have proposed an alternative Xen->libxc interface designed with a
> >> stats daemon in mind, explaining why I believe it has lower overheads to
> >> Xen and why is more in line with what I expect ${VENDOR}Stack to
> >> actually want.
> >>
> >> I am now waiting for a reasoned rebuttal which has more content than
> >> "because there are a set of patches which already implement it in this way".
> > No, I don't have the patch for domctl implementation.
> >
> > In the past half year, all previous v1-v10 patches are implemented in sysctl way,
> however based on that, people raised a lot of comments (large size of memory,
> runtime non-0 order of memory allocation, page sharing with user space, CPU
> online/offline special logic, etc.), and these make the platform QoS
> implementation more and more complex in Xen. That's why I am proposing the
> domctl method that can make things easier.
> >
> > I don't have more things to argue or rebuttal, and if you prefer sysctl, I can
> continue to work out a v11, v12 or more, to present the big 2-dimension array to
> end user and let them withdraw their real required data, still includes the extra
> CPU online/offline logics to handle the QoS resource runtime allocation.
> >
> > Thanks,
> > Dongxiao
> 
> I am sorry - I was not trying to make an argument for one of the
> proposed mechanisms over the other.  The point I was trying to make
> (which on further consideration isn't as clear as I was hoping) is that
> you cannot possibly design the hypercall interface before knowing the
> library usecases, and there is a clear lack of understanding (or at
> least communication) in this regard.
> 
> 
> So, starting from the top. OpenStack want QoS information, and want to
> get it from libvirt/XenAPI.  I think libvirt/XenAPI is the correct level
> to do this at, and think exactly the same would apply to CloudStack as
> well.  The relevant part of this is the question "how does
> libvirt/XenAPI collect stats".
> 
> XenAPI collects stats with the RRD Daemon, running in dom0.  It has an
> internal database of statistics, and hands data from this database out
> upon RPC requests.  It also has threads whose purpose is to periodically
> refresh the data in the database.  This provides a disconnect between
> ${FOO}Stack requesting stats for a domain and the logic to obtain stats
> for that domain.
> 
> I am however unfamiliar with libvirt in this regard.  Could you please
> explain how the libvirt daemon deals with stats?

I am not the libvirt expert either.
Consult from other guys who work in libvirt that, libvirt doesn't maintain the domain status itself, but just expose the APIs for upper cloud/openstack to query, and these APIs accept the domain id as input parameter.

Thanks,
Dongxiao

> 
> ~Andrew

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-08 11:25                     ` Andrew Cooper
  2014-05-09  2:41                       ` Xu, Dongxiao
@ 2014-05-13  1:53                       ` Xu, Dongxiao
  2014-05-16  5:11                       ` Xu, Dongxiao
  2 siblings, 0 replies; 46+ messages in thread
From: Xu, Dongxiao @ 2014-05-13  1:53 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Ian Campbell, Jan Beulich, xen-devel

> -----Original Message-----
> From: Xu, Dongxiao
> Sent: Friday, May 09, 2014 10:41 AM
> To: Andrew Cooper
> Cc: George Dunlap; Ian Campbell; Jan Beulich; xen-devel@lists.xen.org
> Subject: RE: [Xen-devel] Xen Platform QoS design discussion
> 
> > -----Original Message-----
> > From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> > Sent: Thursday, May 08, 2014 7:26 PM
> > To: Xu, Dongxiao
> > Cc: George Dunlap; Ian Campbell; Jan Beulich; xen-devel@lists.xen.org
> > Subject: Re: [Xen-devel] Xen Platform QoS design discussion
> >
> > On 08/05/14 06:21, Xu, Dongxiao wrote:
> >
> > <massive snip>
> >
> > >>
> > >>> We have two different hypercalls right now for getting "dominfo": a
> > >>> domctl and a sysctl.  You use the domctl if you want information about
> > >>> a single domain, you use sysctl if you want information about all
> > >>> domains.  The sysctl implementation calls the domctl implementation
> > >>> internally.
> > >> It is not a fair comparison, given the completely different nature of
> > >> the domctls in question.  XEN_DOMCTL_getdomaininfo is doing very little
> > >> more than reading specific bits of data out the appropriate struct
> > >> domain and its struct vcpu's which can trivially be done by the cpu
> > >> handling the hypercall.
> > >>
> > >>> Is there a problem with doing the same thing here?  Or, with starting
> > >>> with a domctl, and then creating a sysctl if iterating over all
> > >>> domains (and calling the domctl internally) if we measure the domctl
> > >>> to be too slow for many callers?
> > >>>
> > >>>  -George
> > >> My problem is not with the domctl per-se.
> > >>
> > >> My problem is that this is not a QoS design discussion;  this is an
> > >> email thread about a specific QoS implementation which is not answering
> > >> the concerns raised against it to the satisfaction of people raising the
> > >> concerns.
> > >>
> > >> The core argument here is that a statement of "OpenStack want to get a
> > >> piece of QoS data back from libvirt/xenapi when querying a specific
> > >> domain" is being used to justify implementing the hypercall in an
> > >> identical fashion.
> > >>
> > >> This is not a libxl design; this is a single user story forming part of
> > >> the requirement "I as a cloud service provider would like QoS
> > >> information for each VM to be available to my
> > >> $CHOSEN_ORCHESTRATION_SOFTWARE so I can {differentially charge
> > >> customers, balance my load more evenly, etc}".
> > >>
> > >> The only valid justification for implementing a brand new hypercall in a
> > >> certain way is "Because $THIS_CERTAIN_WAY is the $MOST_SENSIBLE way to
> > >> perform the actions I need to perform", for appropriately
> > >> substitutions.  Not "because it is the same way I want to hand this
> > >> information off at the higher level".
> > >>
> > >> As part of this design discussion. I have raised a concern saying "I
> > >> believe the usecase of having a stats gathering daemon in dom0 has not
> > >> been appropriately considered", qualified with "If you were to use the
> > >> domctl as currently designed from a stats gathering daemon, you will
> > >> cripple Xen with the overhead".
> > >>
> > >> Going back to the original use, xenapi has a stats daemon for these
> > >> things.  It has an rpc interface so a query given a specific domain can
> > >> return some or all data for that domain, but it very definitely does not
> > >> translate each request into a hypercall for the requested information.
> > >> I have no real experience with libvirt, so can't comment on stats
> > >> gathering in that context.
> > >>
> > >> I have proposed an alternative Xen->libxc interface designed with a
> > >> stats daemon in mind, explaining why I believe it has lower overheads to
> > >> Xen and why is more in line with what I expect ${VENDOR}Stack to
> > >> actually want.
> > >>
> > >> I am now waiting for a reasoned rebuttal which has more content than
> > >> "because there are a set of patches which already implement it in this way".
> > > No, I don't have the patch for domctl implementation.
> > >
> > > In the past half year, all previous v1-v10 patches are implemented in sysctl
> way,
> > however based on that, people raised a lot of comments (large size of memory,
> > runtime non-0 order of memory allocation, page sharing with user space, CPU
> > online/offline special logic, etc.), and these make the platform QoS
> > implementation more and more complex in Xen. That's why I am proposing the
> > domctl method that can make things easier.
> > >
> > > I don't have more things to argue or rebuttal, and if you prefer sysctl, I can
> > continue to work out a v11, v12 or more, to present the big 2-dimension array
> to
> > end user and let them withdraw their real required data, still includes the extra
> > CPU online/offline logics to handle the QoS resource runtime allocation.
> > >
> > > Thanks,
> > > Dongxiao
> >
> > I am sorry - I was not trying to make an argument for one of the
> > proposed mechanisms over the other.  The point I was trying to make
> > (which on further consideration isn't as clear as I was hoping) is that
> > you cannot possibly design the hypercall interface before knowing the
> > library usecases, and there is a clear lack of understanding (or at
> > least communication) in this regard.
> >
> >
> > So, starting from the top. OpenStack want QoS information, and want to
> > get it from libvirt/XenAPI.  I think libvirt/XenAPI is the correct level
> > to do this at, and think exactly the same would apply to CloudStack as
> > well.  The relevant part of this is the question "how does
> > libvirt/XenAPI collect stats".
> >
> > XenAPI collects stats with the RRD Daemon, running in dom0.  It has an
> > internal database of statistics, and hands data from this database out
> > upon RPC requests.  It also has threads whose purpose is to periodically
> > refresh the data in the database.  This provides a disconnect between
> > ${FOO}Stack requesting stats for a domain and the logic to obtain stats
> > for that domain.
> >
> > I am however unfamiliar with libvirt in this regard.  Could you please
> > explain how the libvirt daemon deals with stats?
> 
> I am not the libvirt expert either.
> Consult from other guys who work in libvirt that, libvirt doesn't maintain the
> domain status itself, but just expose the APIs for upper cloud/openstack to query,
> and these APIs accept the domain id as input parameter.

Hi Andrew,

Do you have more thought considering this libvirt usage?

Thanks,
Dongxiao

> 
> Thanks,
> Dongxiao
> 
> >
> > ~Andrew

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-08 11:25                     ` Andrew Cooper
  2014-05-09  2:41                       ` Xu, Dongxiao
  2014-05-13  1:53                       ` Xu, Dongxiao
@ 2014-05-16  5:11                       ` Xu, Dongxiao
  2014-05-19 11:28                         ` George Dunlap
  2 siblings, 1 reply; 46+ messages in thread
From: Xu, Dongxiao @ 2014-05-16  5:11 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Ian Campbell, Jan Beulich, xen-devel

> -----Original Message-----
> From: Xu, Dongxiao
> Sent: Tuesday, May 13, 2014 9:53 AM
> To: Andrew Cooper
> Cc: George Dunlap; Ian Campbell; Jan Beulich; xen-devel@lists.xen.org
> Subject: RE: [Xen-devel] Xen Platform QoS design discussion
> 
> > -----Original Message-----
> > From: Xu, Dongxiao
> > Sent: Friday, May 09, 2014 10:41 AM
> > To: Andrew Cooper
> > Cc: George Dunlap; Ian Campbell; Jan Beulich; xen-devel@lists.xen.org
> > Subject: RE: [Xen-devel] Xen Platform QoS design discussion
> >
> > > -----Original Message-----
> > > From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> > > Sent: Thursday, May 08, 2014 7:26 PM
> > > To: Xu, Dongxiao
> > > Cc: George Dunlap; Ian Campbell; Jan Beulich; xen-devel@lists.xen.org
> > > Subject: Re: [Xen-devel] Xen Platform QoS design discussion
> > >
> > > On 08/05/14 06:21, Xu, Dongxiao wrote:
> > >
> > > <massive snip>
> > >
> > > >>
> > > >>> We have two different hypercalls right now for getting "dominfo": a
> > > >>> domctl and a sysctl.  You use the domctl if you want information about
> > > >>> a single domain, you use sysctl if you want information about all
> > > >>> domains.  The sysctl implementation calls the domctl implementation
> > > >>> internally.
> > > >> It is not a fair comparison, given the completely different nature of
> > > >> the domctls in question.  XEN_DOMCTL_getdomaininfo is doing very little
> > > >> more than reading specific bits of data out the appropriate struct
> > > >> domain and its struct vcpu's which can trivially be done by the cpu
> > > >> handling the hypercall.
> > > >>
> > > >>> Is there a problem with doing the same thing here?  Or, with starting
> > > >>> with a domctl, and then creating a sysctl if iterating over all
> > > >>> domains (and calling the domctl internally) if we measure the domctl
> > > >>> to be too slow for many callers?
> > > >>>
> > > >>>  -George
> > > >> My problem is not with the domctl per-se.
> > > >>
> > > >> My problem is that this is not a QoS design discussion;  this is an
> > > >> email thread about a specific QoS implementation which is not answering
> > > >> the concerns raised against it to the satisfaction of people raising the
> > > >> concerns.
> > > >>
> > > >> The core argument here is that a statement of "OpenStack want to get a
> > > >> piece of QoS data back from libvirt/xenapi when querying a specific
> > > >> domain" is being used to justify implementing the hypercall in an
> > > >> identical fashion.
> > > >>
> > > >> This is not a libxl design; this is a single user story forming part of
> > > >> the requirement "I as a cloud service provider would like QoS
> > > >> information for each VM to be available to my
> > > >> $CHOSEN_ORCHESTRATION_SOFTWARE so I can {differentially charge
> > > >> customers, balance my load more evenly, etc}".
> > > >>
> > > >> The only valid justification for implementing a brand new hypercall in a
> > > >> certain way is "Because $THIS_CERTAIN_WAY is the $MOST_SENSIBLE way
> to
> > > >> perform the actions I need to perform", for appropriately
> > > >> substitutions.  Not "because it is the same way I want to hand this
> > > >> information off at the higher level".
> > > >>
> > > >> As part of this design discussion. I have raised a concern saying "I
> > > >> believe the usecase of having a stats gathering daemon in dom0 has not
> > > >> been appropriately considered", qualified with "If you were to use the
> > > >> domctl as currently designed from a stats gathering daemon, you will
> > > >> cripple Xen with the overhead".
> > > >>
> > > >> Going back to the original use, xenapi has a stats daemon for these
> > > >> things.  It has an rpc interface so a query given a specific domain can
> > > >> return some or all data for that domain, but it very definitely does not
> > > >> translate each request into a hypercall for the requested information.
> > > >> I have no real experience with libvirt, so can't comment on stats
> > > >> gathering in that context.
> > > >>
> > > >> I have proposed an alternative Xen->libxc interface designed with a
> > > >> stats daemon in mind, explaining why I believe it has lower overheads to
> > > >> Xen and why is more in line with what I expect ${VENDOR}Stack to
> > > >> actually want.
> > > >>
> > > >> I am now waiting for a reasoned rebuttal which has more content than
> > > >> "because there are a set of patches which already implement it in this
> way".
> > > > No, I don't have the patch for domctl implementation.
> > > >
> > > > In the past half year, all previous v1-v10 patches are implemented in sysctl
> > way,
> > > however based on that, people raised a lot of comments (large size of
> memory,
> > > runtime non-0 order of memory allocation, page sharing with user space, CPU
> > > online/offline special logic, etc.), and these make the platform QoS
> > > implementation more and more complex in Xen. That's why I am proposing
> the
> > > domctl method that can make things easier.
> > > >
> > > > I don't have more things to argue or rebuttal, and if you prefer sysctl, I can
> > > continue to work out a v11, v12 or more, to present the big 2-dimension array
> > to
> > > end user and let them withdraw their real required data, still includes the
> extra
> > > CPU online/offline logics to handle the QoS resource runtime allocation.
> > > >
> > > > Thanks,
> > > > Dongxiao
> > >
> > > I am sorry - I was not trying to make an argument for one of the
> > > proposed mechanisms over the other.  The point I was trying to make
> > > (which on further consideration isn't as clear as I was hoping) is that
> > > you cannot possibly design the hypercall interface before knowing the
> > > library usecases, and there is a clear lack of understanding (or at
> > > least communication) in this regard.
> > >
> > >
> > > So, starting from the top. OpenStack want QoS information, and want to
> > > get it from libvirt/XenAPI.  I think libvirt/XenAPI is the correct level
> > > to do this at, and think exactly the same would apply to CloudStack as
> > > well.  The relevant part of this is the question "how does
> > > libvirt/XenAPI collect stats".
> > >
> > > XenAPI collects stats with the RRD Daemon, running in dom0.  It has an
> > > internal database of statistics, and hands data from this database out
> > > upon RPC requests.  It also has threads whose purpose is to periodically
> > > refresh the data in the database.  This provides a disconnect between
> > > ${FOO}Stack requesting stats for a domain and the logic to obtain stats
> > > for that domain.
> > >
> > > I am however unfamiliar with libvirt in this regard.  Could you please
> > > explain how the libvirt daemon deals with stats?
> >
> > I am not the libvirt expert either.
> > Consult from other guys who work in libvirt that, libvirt doesn't maintain the
> > domain status itself, but just expose the APIs for upper cloud/openstack to
> query,
> > and these APIs accept the domain id as input parameter.
> 
> Hi Andrew,
> 
> Do you have more thought considering this libvirt usage?

Ping...

Thanks,
Dongxiao

> 
> Thanks,
> Dongxiao
> 
> >
> > Thanks,
> > Dongxiao
> >
> > >
> > > ~Andrew

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-16  5:11                       ` Xu, Dongxiao
@ 2014-05-19 11:28                         ` George Dunlap
  2014-05-19 11:45                           ` Jan Beulich
  0 siblings, 1 reply; 46+ messages in thread
From: George Dunlap @ 2014-05-19 11:28 UTC (permalink / raw)
  To: Xu, Dongxiao; +Cc: Andrew Cooper, Ian Campbell, Jan Beulich, xen-devel

On Fri, May 16, 2014 at 6:11 AM, Xu, Dongxiao <dongxiao.xu@intel.com> wrote:
>> -----Original Message-----
>> From: Xu, Dongxiao
>> Sent: Tuesday, May 13, 2014 9:53 AM
>> To: Andrew Cooper
>> Cc: George Dunlap; Ian Campbell; Jan Beulich; xen-devel@lists.xen.org
>> Subject: RE: [Xen-devel] Xen Platform QoS design discussion
>>
>> > -----Original Message-----
>> > From: Xu, Dongxiao
>> > Sent: Friday, May 09, 2014 10:41 AM
>> > To: Andrew Cooper
>> > Cc: George Dunlap; Ian Campbell; Jan Beulich; xen-devel@lists.xen.org
>> > Subject: RE: [Xen-devel] Xen Platform QoS design discussion
>> >
>> > > -----Original Message-----
>> > > From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
>> > > Sent: Thursday, May 08, 2014 7:26 PM
>> > > To: Xu, Dongxiao
>> > > Cc: George Dunlap; Ian Campbell; Jan Beulich; xen-devel@lists.xen.org
>> > > Subject: Re: [Xen-devel] Xen Platform QoS design discussion
>> > >
>> > > On 08/05/14 06:21, Xu, Dongxiao wrote:
>> > >
>> > > <massive snip>
>> > >
>> > > >>
>> > > >>> We have two different hypercalls right now for getting "dominfo": a
>> > > >>> domctl and a sysctl.  You use the domctl if you want information about
>> > > >>> a single domain, you use sysctl if you want information about all
>> > > >>> domains.  The sysctl implementation calls the domctl implementation
>> > > >>> internally.
>> > > >> It is not a fair comparison, given the completely different nature of
>> > > >> the domctls in question.  XEN_DOMCTL_getdomaininfo is doing very little
>> > > >> more than reading specific bits of data out the appropriate struct
>> > > >> domain and its struct vcpu's which can trivially be done by the cpu
>> > > >> handling the hypercall.
>> > > >>
>> > > >>> Is there a problem with doing the same thing here?  Or, with starting
>> > > >>> with a domctl, and then creating a sysctl if iterating over all
>> > > >>> domains (and calling the domctl internally) if we measure the domctl
>> > > >>> to be too slow for many callers?
>> > > >>>
>> > > >>>  -George
>> > > >> My problem is not with the domctl per-se.
>> > > >>
>> > > >> My problem is that this is not a QoS design discussion;  this is an
>> > > >> email thread about a specific QoS implementation which is not answering
>> > > >> the concerns raised against it to the satisfaction of people raising the
>> > > >> concerns.
>> > > >>
>> > > >> The core argument here is that a statement of "OpenStack want to get a
>> > > >> piece of QoS data back from libvirt/xenapi when querying a specific
>> > > >> domain" is being used to justify implementing the hypercall in an
>> > > >> identical fashion.
>> > > >>
>> > > >> This is not a libxl design; this is a single user story forming part of
>> > > >> the requirement "I as a cloud service provider would like QoS
>> > > >> information for each VM to be available to my
>> > > >> $CHOSEN_ORCHESTRATION_SOFTWARE so I can {differentially charge
>> > > >> customers, balance my load more evenly, etc}".
>> > > >>
>> > > >> The only valid justification for implementing a brand new hypercall in a
>> > > >> certain way is "Because $THIS_CERTAIN_WAY is the $MOST_SENSIBLE way
>> to
>> > > >> perform the actions I need to perform", for appropriately
>> > > >> substitutions.  Not "because it is the same way I want to hand this
>> > > >> information off at the higher level".
>> > > >>
>> > > >> As part of this design discussion. I have raised a concern saying "I
>> > > >> believe the usecase of having a stats gathering daemon in dom0 has not
>> > > >> been appropriately considered", qualified with "If you were to use the
>> > > >> domctl as currently designed from a stats gathering daemon, you will
>> > > >> cripple Xen with the overhead".
>> > > >>
>> > > >> Going back to the original use, xenapi has a stats daemon for these
>> > > >> things.  It has an rpc interface so a query given a specific domain can
>> > > >> return some or all data for that domain, but it very definitely does not
>> > > >> translate each request into a hypercall for the requested information.
>> > > >> I have no real experience with libvirt, so can't comment on stats
>> > > >> gathering in that context.
>> > > >>
>> > > >> I have proposed an alternative Xen->libxc interface designed with a
>> > > >> stats daemon in mind, explaining why I believe it has lower overheads to
>> > > >> Xen and why is more in line with what I expect ${VENDOR}Stack to
>> > > >> actually want.
>> > > >>
>> > > >> I am now waiting for a reasoned rebuttal which has more content than
>> > > >> "because there are a set of patches which already implement it in this
>> way".
>> > > > No, I don't have the patch for domctl implementation.
>> > > >
>> > > > In the past half year, all previous v1-v10 patches are implemented in sysctl
>> > way,
>> > > however based on that, people raised a lot of comments (large size of
>> memory,
>> > > runtime non-0 order of memory allocation, page sharing with user space, CPU
>> > > online/offline special logic, etc.), and these make the platform QoS
>> > > implementation more and more complex in Xen. That's why I am proposing
>> the
>> > > domctl method that can make things easier.
>> > > >
>> > > > I don't have more things to argue or rebuttal, and if you prefer sysctl, I can
>> > > continue to work out a v11, v12 or more, to present the big 2-dimension array
>> > to
>> > > end user and let them withdraw their real required data, still includes the
>> extra
>> > > CPU online/offline logics to handle the QoS resource runtime allocation.
>> > > >
>> > > > Thanks,
>> > > > Dongxiao
>> > >
>> > > I am sorry - I was not trying to make an argument for one of the
>> > > proposed mechanisms over the other.  The point I was trying to make
>> > > (which on further consideration isn't as clear as I was hoping) is that
>> > > you cannot possibly design the hypercall interface before knowing the
>> > > library usecases, and there is a clear lack of understanding (or at
>> > > least communication) in this regard.
>> > >
>> > >
>> > > So, starting from the top. OpenStack want QoS information, and want to
>> > > get it from libvirt/XenAPI.  I think libvirt/XenAPI is the correct level
>> > > to do this at, and think exactly the same would apply to CloudStack as
>> > > well.  The relevant part of this is the question "how does
>> > > libvirt/XenAPI collect stats".
>> > >
>> > > XenAPI collects stats with the RRD Daemon, running in dom0.  It has an
>> > > internal database of statistics, and hands data from this database out
>> > > upon RPC requests.  It also has threads whose purpose is to periodically
>> > > refresh the data in the database.  This provides a disconnect between
>> > > ${FOO}Stack requesting stats for a domain and the logic to obtain stats
>> > > for that domain.
>> > >
>> > > I am however unfamiliar with libvirt in this regard.  Could you please
>> > > explain how the libvirt daemon deals with stats?
>> >
>> > I am not the libvirt expert either.
>> > Consult from other guys who work in libvirt that, libvirt doesn't maintain the
>> > domain status itself, but just expose the APIs for upper cloud/openstack to
>> query,
>> > and these APIs accept the domain id as input parameter.
>>
>> Hi Andrew,
>>
>> Do you have more thought considering this libvirt usage?
>
> Ping...

So AndyC and I had a chat about this, and I think we came up with
something that would be do-able.  (This is from memory, so please
correct me if I missed anything, Andy.)

So the situation, as I understand it, is:

Stats are generated by MSRs on each CPU.  Collecting the stats from
the CPUs is potentially fairly expensive, including a number of IPIs;
and possibly reading the stats may be expensive as well.

However, we expect that many callers (including perhaps libvirt, or
xl/libxl) will want to view the information on a per-domain basis; and
may want to collect them on, say, a 1-second granularity.  Iterating
over each domain, collecting the stats for each one separately, for a
large installation may mean spending a non-negligible amount of time
just doing IPIs and reading MSRs, introducing an unacceptable level of
overhead.

So it seems like it would be better to collect information for all
domains at one time, amortizing the cost in one set of IPIs, and then
answering queries about each domain from that bit of "stored"
information.

The initial idea that comes to mind is having a daemon in dom0 collect
the metrics on a specified granularity (say, 1s) and then answer
per-domain queries.  However, we don't actually have the
infrastructure and standard architecture in place in libxl for
starting, managing, and talking to such a daemon; the entire thing
would have to be designed from scratch.

But in reality, all we need the daemon for is a place to store the
information to query.  The idea we came up with was to allocate memory
*inside the hypervisor* to store the information.  The idea is that
we'd have a sysctl to prompt Xen to *collect* the data into some
memory buffers inside of Xen, and then a domctl that would allow you
query the data on a per-domain basis.

That should be a good balance -- it's not quite as good as having as
separate daemon, but it's a pretty good compromise.

Thoughts?

There are a couple of options regarding collecting the data.  One is
to simply require the caller to do a "poll" sysctl every time they
want to refresh the data.  Another possibility would be to have a
sysctl "freshness" knob: you could say, "Please make sure the data is
no more than 1000ms old"; Xen could then automatically do a refresh
when necessary.

The advantage of the "poll" method is that you could get a consistent
snapshot across all domains; but you'd have to add in code to do the
refresh.  (An xl command querying an individual domain would
undoubtedly end up calling the poll on each execution, for instance.)

An advantage of the "freshness" knob, on the other hand, is that you
automatically get coalescing without having to do anything special
with the interface.

Does that make sense?  Is that something you might be willing to
implement, Dongxiao?

 -George

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-19 11:28                         ` George Dunlap
@ 2014-05-19 11:45                           ` Jan Beulich
  2014-05-19 12:13                             ` George Dunlap
  0 siblings, 1 reply; 46+ messages in thread
From: Jan Beulich @ 2014-05-19 11:45 UTC (permalink / raw)
  To: George Dunlap; +Cc: Andrew Cooper, Dongxiao Xu, Ian Campbell, xen-devel

>>> On 19.05.14 at 13:28, <George.Dunlap@eu.citrix.com> wrote:
> But in reality, all we need the daemon for is a place to store the
> information to query.  The idea we came up with was to allocate memory
> *inside the hypervisor* to store the information.  The idea is that
> we'd have a sysctl to prompt Xen to *collect* the data into some
> memory buffers inside of Xen, and then a domctl that would allow you
> query the data on a per-domain basis.
> 
> That should be a good balance -- it's not quite as good as having as
> separate daemon, but it's a pretty good compromise.

Which all leaves aside the suggested alternative of making available
a couple of simple operations allowing an eventual daemon to do the
MSR accesses without the hypervisor being concerned about where
to store the data and how to make it accessible to the consumer.

> There are a couple of options regarding collecting the data.  One is
> to simply require the caller to do a "poll" sysctl every time they
> want to refresh the data.  Another possibility would be to have a
> sysctl "freshness" knob: you could say, "Please make sure the data is
> no more than 1000ms old"; Xen could then automatically do a refresh
> when necessary.
> 
> The advantage of the "poll" method is that you could get a consistent
> snapshot across all domains; but you'd have to add in code to do the
> refresh.  (An xl command querying an individual domain would
> undoubtedly end up calling the poll on each execution, for instance.)
> 
> An advantage of the "freshness" knob, on the other hand, is that you
> automatically get coalescing without having to do anything special
> with the interface.

With the clear disadvantage of potentially doing work the results of
which is never going to be looked at by anyone.

Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-19 11:45                           ` Jan Beulich
@ 2014-05-19 12:13                             ` George Dunlap
  2014-05-19 12:41                               ` Jan Beulich
  0 siblings, 1 reply; 46+ messages in thread
From: George Dunlap @ 2014-05-19 12:13 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Dongxiao Xu, Ian Campbell, xen-devel

On 05/19/2014 12:45 PM, Jan Beulich wrote:
>>>> On 19.05.14 at 13:28, <George.Dunlap@eu.citrix.com> wrote:
>> But in reality, all we need the daemon for is a place to store the
>> information to query.  The idea we came up with was to allocate memory
>> *inside the hypervisor* to store the information.  The idea is that
>> we'd have a sysctl to prompt Xen to *collect* the data into some
>> memory buffers inside of Xen, and then a domctl that would allow you
>> query the data on a per-domain basis.
>>
>> That should be a good balance -- it's not quite as good as having as
>> separate daemon, but it's a pretty good compromise.
> Which all leaves aside the suggested alternative of making available
> a couple of simple operations allowing an eventual daemon to do the
> MSR accesses without the hypervisor being concerned about where
> to store the data and how to make it accessible to the consumer.

 From a libxl perspective, if we provide "libxl_qos_refresh()" (or 
"libxl_qos_freshness_set()") and "libxl_qos_domain_query()" (or 
something like it), it doesn't matter whether it's backed by memory 
stored in Xen via hypercall or by a daemon.

What I was actually envisioning was an option to either query them by a 
domctl hypercall, or by having a daemon map the pages and read them 
directly.  That way we have the daemon available for those who want it 
(say, maybe xapi, or a future libxl daemon / stat collector), but we can 
get a basic level implemented right now without a terrible amount of 
architectural work.

>> There are a couple of options regarding collecting the data.  One is
>> to simply require the caller to do a "poll" sysctl every time they
>> want to refresh the data.  Another possibility would be to have a
>> sysctl "freshness" knob: you could say, "Please make sure the data is
>> no more than 1000ms old"; Xen could then automatically do a refresh
>> when necessary.
>>
>> The advantage of the "poll" method is that you could get a consistent
>> snapshot across all domains; but you'd have to add in code to do the
>> refresh.  (An xl command querying an individual domain would
>> undoubtedly end up calling the poll on each execution, for instance.)
>>
>> An advantage of the "freshness" knob, on the other hand, is that you
>> automatically get coalescing without having to do anything special
>> with the interface.
> With the clear disadvantage of potentially doing work the results of
> which is never going to be looked at by anyone.

Jan, when you make a criticism it needs to be clear what alternate you 
are suggesting.

AFAICT, regarding "collection" of the data, we have exactly three options:
A. Implement a "collect for all domains" option (with an additional 
"query data for a single domain" mechanism; either by daemon or hypercall).
B. Implement a "collect information for a single domain at a time" option
C. Implement both options.

"Doing work that is never looked at by anyone" will always be a 
potential problem if we choose A, whether we use a daemon, or use the 
polling method, or use the automatic "freshness" knob.  The only way to 
avoid that is to do B or C.

We've already said that we expect the common case we expect is for a 
toolstack to want to query all domains anyway.  If we think that's true, 
"make the common case fast and the uncommon case correct" would dictate 
against B.

So are you suggesting B (disputing the expected use case)?  Or are you 
suggesting C?  Or are you just finding fault without thinking things 
through?

  -George

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-19 12:13                             ` George Dunlap
@ 2014-05-19 12:41                               ` Jan Beulich
  2014-05-22  8:19                                 ` Xu, Dongxiao
  0 siblings, 1 reply; 46+ messages in thread
From: Jan Beulich @ 2014-05-19 12:41 UTC (permalink / raw)
  To: George Dunlap; +Cc: Andrew Cooper, Dongxiao Xu, Ian Campbell, xen-devel

>>> On 19.05.14 at 14:13, <george.dunlap@eu.citrix.com> wrote:
> On 05/19/2014 12:45 PM, Jan Beulich wrote:
>>>>> On 19.05.14 at 13:28, <George.Dunlap@eu.citrix.com> wrote:
>>> But in reality, all we need the daemon for is a place to store the
>>> information to query.  The idea we came up with was to allocate memory
>>> *inside the hypervisor* to store the information.  The idea is that
>>> we'd have a sysctl to prompt Xen to *collect* the data into some
>>> memory buffers inside of Xen, and then a domctl that would allow you
>>> query the data on a per-domain basis.
>>>
>>> That should be a good balance -- it's not quite as good as having as
>>> separate daemon, but it's a pretty good compromise.
>> Which all leaves aside the suggested alternative of making available
>> a couple of simple operations allowing an eventual daemon to do the
>> MSR accesses without the hypervisor being concerned about where
>> to store the data and how to make it accessible to the consumer.
> 
>  From a libxl perspective, if we provide "libxl_qos_refresh()" (or 
> "libxl_qos_freshness_set()") and "libxl_qos_domain_query()" (or 
> something like it), it doesn't matter whether it's backed by memory 
> stored in Xen via hypercall or by a daemon.
> 
> What I was actually envisioning was an option to either query them by a 
> domctl hypercall, or by having a daemon map the pages and read them 
> directly.  That way we have the daemon available for those who want it 
> (say, maybe xapi, or a future libxl daemon / stat collector), but we can 
> get a basic level implemented right now without a terrible amount of 
> architectural work.

But that's all centric towards the daemon concept (if we consider
storing the data inn hypervisor memory also being some kind of a
daemon). Whereas the simple helpers I'm suggesting wouldn't
necessarily require a daemon to be written at all - a query
operation for a domain would then simply be broken down at the
tools level to a number of MSR writes/reads.

>>> There are a couple of options regarding collecting the data.  One is
>>> to simply require the caller to do a "poll" sysctl every time they
>>> want to refresh the data.  Another possibility would be to have a
>>> sysctl "freshness" knob: you could say, "Please make sure the data is
>>> no more than 1000ms old"; Xen could then automatically do a refresh
>>> when necessary.
>>>
>>> The advantage of the "poll" method is that you could get a consistent
>>> snapshot across all domains; but you'd have to add in code to do the
>>> refresh.  (An xl command querying an individual domain would
>>> undoubtedly end up calling the poll on each execution, for instance.)
>>>
>>> An advantage of the "freshness" knob, on the other hand, is that you
>>> automatically get coalescing without having to do anything special
>>> with the interface.
>> With the clear disadvantage of potentially doing work the results of
>> which is never going to be looked at by anyone.
> 
> Jan, when you make a criticism it needs to be clear what alternate you 
> are suggesting.

With there only having been given two options, it seemed clear that
by seeing an obvious downside for one I would mean to other to be
preferable. Of course you're right in saying (further down) that the
risk of obtaining data that no-one is interested in is always there,
just that when someone says "poll" I'd imply (s)he's interested in the
data, as opposed to doing the collect periodically.

> AFAICT, regarding "collection" of the data, we have exactly three options:
> A. Implement a "collect for all domains" option (with an additional 
> "query data for a single domain" mechanism; either by daemon or hypercall).
> B. Implement a "collect information for a single domain at a time" option
> C. Implement both options.
> 
> "Doing work that is never looked at by anyone" will always be a 
> potential problem if we choose A, whether we use a daemon, or use the 
> polling method, or use the automatic "freshness" knob.  The only way to 
> avoid that is to do B or C.
> 
> We've already said that we expect the common case we expect is for a 
> toolstack to want to query all domains anyway.  If we think that's true, 
> "make the common case fast and the uncommon case correct" would dictate 
> against B.
> 
> So are you suggesting B (disputing the expected use case)?  Or are you 
> suggesting C?  Or are you just finding fault without thinking things 
> through?

I'm certainly putting under question whether the supposed use case
indeed is the common one, and I do that no matter which model
someone claims is going to be the "one". I simply see neither backed
by any sufficiently hard data.

And without seeing the need for any advanced access mechanism,
I'm continuing to try to promote D - implement simple, policy free
(platform or sysctl) hypercalls providing MSR access to the tool stack
(along the lines of the msr.ko Linux kernel driver).

Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-19 12:41                               ` Jan Beulich
@ 2014-05-22  8:19                                 ` Xu, Dongxiao
  2014-05-22  8:39                                   ` Jan Beulich
  0 siblings, 1 reply; 46+ messages in thread
From: Xu, Dongxiao @ 2014-05-22  8:19 UTC (permalink / raw)
  To: Jan Beulich, George Dunlap; +Cc: Andrew Cooper, Ian Campbell, xen-devel

> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org
> [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of Jan Beulich
> Sent: Monday, May 19, 2014 8:42 PM
> To: George Dunlap
> Cc: Andrew Cooper; Xu, Dongxiao; Ian Campbell; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] Xen Platform QoS design discussion
> 
> >>> On 19.05.14 at 14:13, <george.dunlap@eu.citrix.com> wrote:
> > On 05/19/2014 12:45 PM, Jan Beulich wrote:
> >>>>> On 19.05.14 at 13:28, <George.Dunlap@eu.citrix.com> wrote:
> >>> But in reality, all we need the daemon for is a place to store the
> >>> information to query.  The idea we came up with was to allocate memory
> >>> *inside the hypervisor* to store the information.  The idea is that
> >>> we'd have a sysctl to prompt Xen to *collect* the data into some
> >>> memory buffers inside of Xen, and then a domctl that would allow you
> >>> query the data on a per-domain basis.
> >>>
> >>> That should be a good balance -- it's not quite as good as having as
> >>> separate daemon, but it's a pretty good compromise.
> >> Which all leaves aside the suggested alternative of making available
> >> a couple of simple operations allowing an eventual daemon to do the
> >> MSR accesses without the hypervisor being concerned about where
> >> to store the data and how to make it accessible to the consumer.
> >
> >  From a libxl perspective, if we provide "libxl_qos_refresh()" (or
> > "libxl_qos_freshness_set()") and "libxl_qos_domain_query()" (or
> > something like it), it doesn't matter whether it's backed by memory
> > stored in Xen via hypercall or by a daemon.
> >
> > What I was actually envisioning was an option to either query them by a
> > domctl hypercall, or by having a daemon map the pages and read them
> > directly.  That way we have the daemon available for those who want it
> > (say, maybe xapi, or a future libxl daemon / stat collector), but we can
> > get a basic level implemented right now without a terrible amount of
> > architectural work.
> 
> But that's all centric towards the daemon concept (if we consider
> storing the data inn hypervisor memory also being some kind of a
> daemon). Whereas the simple helpers I'm suggesting wouldn't
> necessarily require a daemon to be written at all - a query
> operation for a domain would then simply be broken down at the
> tools level to a number of MSR writes/reads.
> 
> >>> There are a couple of options regarding collecting the data.  One is
> >>> to simply require the caller to do a "poll" sysctl every time they
> >>> want to refresh the data.  Another possibility would be to have a
> >>> sysctl "freshness" knob: you could say, "Please make sure the data is
> >>> no more than 1000ms old"; Xen could then automatically do a refresh
> >>> when necessary.
> >>>
> >>> The advantage of the "poll" method is that you could get a consistent
> >>> snapshot across all domains; but you'd have to add in code to do the
> >>> refresh.  (An xl command querying an individual domain would
> >>> undoubtedly end up calling the poll on each execution, for instance.)
> >>>
> >>> An advantage of the "freshness" knob, on the other hand, is that you
> >>> automatically get coalescing without having to do anything special
> >>> with the interface.
> >> With the clear disadvantage of potentially doing work the results of
> >> which is never going to be looked at by anyone.
> >
> > Jan, when you make a criticism it needs to be clear what alternate you
> > are suggesting.
> 
> With there only having been given two options, it seemed clear that
> by seeing an obvious downside for one I would mean to other to be
> preferable. Of course you're right in saying (further down) that the
> risk of obtaining data that no-one is interested in is always there,
> just that when someone says "poll" I'd imply (s)he's interested in the
> data, as opposed to doing the collect periodically.
> 
> > AFAICT, regarding "collection" of the data, we have exactly three options:
> > A. Implement a "collect for all domains" option (with an additional
> > "query data for a single domain" mechanism; either by daemon or hypercall).
> > B. Implement a "collect information for a single domain at a time" option
> > C. Implement both options.
> >
> > "Doing work that is never looked at by anyone" will always be a
> > potential problem if we choose A, whether we use a daemon, or use the
> > polling method, or use the automatic "freshness" knob.  The only way to
> > avoid that is to do B or C.
> >
> > We've already said that we expect the common case we expect is for a
> > toolstack to want to query all domains anyway.  If we think that's true,
> > "make the common case fast and the uncommon case correct" would dictate
> > against B.
> >
> > So are you suggesting B (disputing the expected use case)?  Or are you
> > suggesting C?  Or are you just finding fault without thinking things
> > through?
> 
> I'm certainly putting under question whether the supposed use case
> indeed is the common one, and I do that no matter which model
> someone claims is going to be the "one". I simply see neither backed
> by any sufficiently hard data.
> 
> And without seeing the need for any advanced access mechanism,
> I'm continuing to try to promote D - implement simple, policy free
> (platform or sysctl) hypercalls providing MSR access to the tool stack
> (along the lines of the msr.ko Linux kernel driver).

Do you mean some hypercall implementation like following:
In this case, Dom0 toolstack actually queries the real physical CPU MSRs.

struct xen_sysctl_accessmsr      accessmsr
{
    unsigned int cpu;
    unsigned int msr;
    unsigned long value;
}

do_sysctl () {
...
case XEN_SYSCTL_accessmsr:
    /* store the msr value in accessmsr.value */
    on_selected_cpus(cpumask_of(cpu), read_msr, &(op->u.accessmsr), 1);
}


Thanks,
Dongxiao

> 
> Jan
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-22  8:19                                 ` Xu, Dongxiao
@ 2014-05-22  8:39                                   ` Jan Beulich
  2014-05-22  9:27                                     ` George Dunlap
  0 siblings, 1 reply; 46+ messages in thread
From: Jan Beulich @ 2014-05-22  8:39 UTC (permalink / raw)
  To: Dongxiao Xu; +Cc: George Dunlap, Andrew Cooper, Ian Campbell, xen-devel

>>> On 22.05.14 at 10:19, <dongxiao.xu@intel.com> wrote:
>> From: xen-devel-bounces@lists.xen.org 
>> And without seeing the need for any advanced access mechanism,
>> I'm continuing to try to promote D - implement simple, policy free
>> (platform or sysctl) hypercalls providing MSR access to the tool stack
>> (along the lines of the msr.ko Linux kernel driver).
> 
> Do you mean some hypercall implementation like following:
> In this case, Dom0 toolstack actually queries the real physical CPU MSRs.
> 
> struct xen_sysctl_accessmsr      accessmsr
> {
>     unsigned int cpu;
>     unsigned int msr;
>     unsigned long value;
> }
> 
> do_sysctl () {
> ...
> case XEN_SYSCTL_accessmsr:
>     /* store the msr value in accessmsr.value */
>     on_selected_cpus(cpumask_of(cpu), read_msr, &(op->u.accessmsr), 1);
> }

Yes, along those lines, albeit slightly more sophisticated based on
the specific kind of operations needed for e.g. QoS (Andrew had
some comments to the effect that simple read and write operations
alone may not suffice).

Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-22  8:39                                   ` Jan Beulich
@ 2014-05-22  9:27                                     ` George Dunlap
  2014-05-26  0:51                                       ` Xu, Dongxiao
  2014-05-29  0:45                                       ` Xu, Dongxiao
  0 siblings, 2 replies; 46+ messages in thread
From: George Dunlap @ 2014-05-22  9:27 UTC (permalink / raw)
  To: Jan Beulich, Dongxiao Xu; +Cc: Andrew Cooper, Ian Campbell, xen-devel

On 05/22/2014 09:39 AM, Jan Beulich wrote:
>>>> On 22.05.14 at 10:19, <dongxiao.xu@intel.com> wrote:
>>> From: xen-devel-bounces@lists.xen.org
>>> And without seeing the need for any advanced access mechanism,
>>> I'm continuing to try to promote D - implement simple, policy free
>>> (platform or sysctl) hypercalls providing MSR access to the tool stack
>>> (along the lines of the msr.ko Linux kernel driver).
>> Do you mean some hypercall implementation like following:
>> In this case, Dom0 toolstack actually queries the real physical CPU MSRs.
>>
>> struct xen_sysctl_accessmsr      accessmsr
>> {
>>      unsigned int cpu;
>>      unsigned int msr;
>>      unsigned long value;
>> }
>>
>> do_sysctl () {
>> ...
>> case XEN_SYSCTL_accessmsr:
>>      /* store the msr value in accessmsr.value */
>>      on_selected_cpus(cpumask_of(cpu), read_msr, &(op->u.accessmsr), 1);
>> }
> Yes, along those lines, albeit slightly more sophisticated based on
> the specific kind of operations needed for e.g. QoS (Andrew had
> some comments to the effect that simple read and write operations
> alone may not suffice).

That sounds nice and clean, and hopefully would be flexible enough to do 
stuff in the future.

But fundamentally that doesn't address Andrew's concerns that if callers 
are going to make repeated calls into libxl for each domain, this won't 
scale.

On the other hand, there may be an argument for saying, "We'll optimize 
that if we find it's a problem."

Dongxiao, is this functionality implemented for KVM yet?  Do you know 
how they're doing it?

  -George

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-22  9:27                                     ` George Dunlap
@ 2014-05-26  0:51                                       ` Xu, Dongxiao
  2014-05-29  0:45                                       ` Xu, Dongxiao
  1 sibling, 0 replies; 46+ messages in thread
From: Xu, Dongxiao @ 2014-05-26  0:51 UTC (permalink / raw)
  To: George Dunlap, Jan Beulich; +Cc: Andrew Cooper, Ian Campbell, xen-devel

> -----Original Message-----
> From: George Dunlap [mailto:george.dunlap@eu.citrix.com]
> Sent: Thursday, May 22, 2014 5:27 PM
> To: Jan Beulich; Xu, Dongxiao
> Cc: Andrew Cooper; Ian Campbell; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] Xen Platform QoS design discussion
> 
> On 05/22/2014 09:39 AM, Jan Beulich wrote:
> >>>> On 22.05.14 at 10:19, <dongxiao.xu@intel.com> wrote:
> >>> From: xen-devel-bounces@lists.xen.org
> >>> And without seeing the need for any advanced access mechanism,
> >>> I'm continuing to try to promote D - implement simple, policy free
> >>> (platform or sysctl) hypercalls providing MSR access to the tool stack
> >>> (along the lines of the msr.ko Linux kernel driver).
> >> Do you mean some hypercall implementation like following:
> >> In this case, Dom0 toolstack actually queries the real physical CPU MSRs.
> >>
> >> struct xen_sysctl_accessmsr      accessmsr
> >> {
> >>      unsigned int cpu;
> >>      unsigned int msr;
> >>      unsigned long value;
> >> }
> >>
> >> do_sysctl () {
> >> ...
> >> case XEN_SYSCTL_accessmsr:
> >>      /* store the msr value in accessmsr.value */
> >>      on_selected_cpus(cpumask_of(cpu), read_msr, &(op->u.accessmsr),
> 1);
> >> }
> > Yes, along those lines, albeit slightly more sophisticated based on
> > the specific kind of operations needed for e.g. QoS (Andrew had
> > some comments to the effect that simple read and write operations
> > alone may not suffice).
> 
> That sounds nice and clean, and hopefully would be flexible enough to do
> stuff in the future.
> 
> But fundamentally that doesn't address Andrew's concerns that if callers
> are going to make repeated calls into libxl for each domain, this won't
> scale.
> 
> On the other hand, there may be an argument for saying, "We'll optimize
> that if we find it's a problem."
> 
> Dongxiao, is this functionality implemented for KVM yet?  Do you know
> how they're doing it?

No, KVM CQM is not enabled yet. :(

Thanks,
Dongxiao

> 
>   -George

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-22  9:27                                     ` George Dunlap
  2014-05-26  0:51                                       ` Xu, Dongxiao
@ 2014-05-29  0:45                                       ` Xu, Dongxiao
  2014-05-29  7:01                                         ` Jan Beulich
  1 sibling, 1 reply; 46+ messages in thread
From: Xu, Dongxiao @ 2014-05-29  0:45 UTC (permalink / raw)
  To: George Dunlap, Jan Beulich; +Cc: Andrew Cooper, Ian Campbell, xen-devel

> -----Original Message-----
> From: Xu, Dongxiao
> Sent: Monday, May 26, 2014 8:52 AM
> To: George Dunlap; Jan Beulich
> Cc: Andrew Cooper; Ian Campbell; xen-devel@lists.xen.org
> Subject: RE: [Xen-devel] Xen Platform QoS design discussion
> 
> > -----Original Message-----
> > From: George Dunlap [mailto:george.dunlap@eu.citrix.com]
> > Sent: Thursday, May 22, 2014 5:27 PM
> > To: Jan Beulich; Xu, Dongxiao
> > Cc: Andrew Cooper; Ian Campbell; xen-devel@lists.xen.org
> > Subject: Re: [Xen-devel] Xen Platform QoS design discussion
> >
> > On 05/22/2014 09:39 AM, Jan Beulich wrote:
> > >>>> On 22.05.14 at 10:19, <dongxiao.xu@intel.com> wrote:
> > >>> From: xen-devel-bounces@lists.xen.org
> > >>> And without seeing the need for any advanced access mechanism,
> > >>> I'm continuing to try to promote D - implement simple, policy free
> > >>> (platform or sysctl) hypercalls providing MSR access to the tool stack
> > >>> (along the lines of the msr.ko Linux kernel driver).
> > >> Do you mean some hypercall implementation like following:
> > >> In this case, Dom0 toolstack actually queries the real physical CPU MSRs.
> > >>
> > >> struct xen_sysctl_accessmsr      accessmsr
> > >> {
> > >>      unsigned int cpu;
> > >>      unsigned int msr;
> > >>      unsigned long value;
> > >> }
> > >>
> > >> do_sysctl () {
> > >> ...
> > >> case XEN_SYSCTL_accessmsr:
> > >>      /* store the msr value in accessmsr.value */
> > >>      on_selected_cpus(cpumask_of(cpu), read_msr, &(op->u.accessmsr),
> > 1);
> > >> }
> > > Yes, along those lines, albeit slightly more sophisticated based on
> > > the specific kind of operations needed for e.g. QoS (Andrew had
> > > some comments to the effect that simple read and write operations
> > > alone may not suffice).
> >
> > That sounds nice and clean, and hopefully would be flexible enough to do
> > stuff in the future.
> >
> > But fundamentally that doesn't address Andrew's concerns that if callers
> > are going to make repeated calls into libxl for each domain, this won't
> > scale.
> >
> > On the other hand, there may be an argument for saying, "We'll optimize
> > that if we find it's a problem."
> >
> > Dongxiao, is this functionality implemented for KVM yet?  Do you know
> > how they're doing it?
> 
> No, KVM CQM is not enabled yet. :(

I think Jan's opinion here is similar to what I proposed in the beginning of this thread.
The only difference is that, Jan prefers to get the CQM data per-socket and per-domain with data copying, while I proposed to get the CQM data per-domain for all sockets that can reduce the amount of hypercalls.

Stakeholders, please provide your suggestion, whether the hypercall is designed to get the data per-socket and per-domain, or per-domain for all sockets. 
Do you think whether it is better to implement such a version of patch based on this idea? I am okay to implement either. :)

Thanks,
Dongxiao

> 
> Thanks,
> Dongxiao
> 
> >
> >   -George

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-29  0:45                                       ` Xu, Dongxiao
@ 2014-05-29  7:01                                         ` Jan Beulich
  2014-05-29  7:31                                           ` Xu, Dongxiao
  0 siblings, 1 reply; 46+ messages in thread
From: Jan Beulich @ 2014-05-29  7:01 UTC (permalink / raw)
  To: george.dunlap, dongxiao.xu; +Cc: andrew.cooper3, Ian.Campbell, xen-devel

>>> "Xu, Dongxiao" <dongxiao.xu@intel.com> 05/29/14 2:46 AM >>>
>I think Jan's opinion here is similar to what I proposed in the beginning of this thread.
>The only difference is that, Jan prefers to get the CQM data per-socket and per-domain
>with data copying, while I proposed to get the CQM data per-domain for all sockets
>that can reduce the amount of hypercalls.

I don't think I ever voiced any preference between these two. All I said it depends on
prevalent usage models, and to date I don't think I've seen a proper analysis of what
the main usage model would be - it all seems guesswork and/or taking random
examples.

What I did say I'd prefer is to have all this done outside the hypervisor, with the
hypervisor just providing fundamental infrastructure (MSR accesses).

Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-29  7:01                                         ` Jan Beulich
@ 2014-05-29  7:31                                           ` Xu, Dongxiao
  2014-05-29  9:11                                             ` Jan Beulich
  2014-05-29  9:13                                             ` Andrew Cooper
  0 siblings, 2 replies; 46+ messages in thread
From: Xu, Dongxiao @ 2014-05-29  7:31 UTC (permalink / raw)
  To: Jan Beulich, george.dunlap
  Cc: andrew.cooper3, Auld, Will, Ian.Campbell, Nakajima, Jun, xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:jbeulich@suse.com]
> Sent: Thursday, May 29, 2014 3:02 PM
> To: george.dunlap@eu.citrix.com; Xu, Dongxiao
> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
> xen-devel@lists.xen.org
> Subject: Re: RE: [Xen-devel] Xen Platform QoS design discussion
> 
> >>> "Xu, Dongxiao" <dongxiao.xu@intel.com> 05/29/14 2:46 AM >>>
> >I think Jan's opinion here is similar to what I proposed in the beginning of this
> thread.
> >The only difference is that, Jan prefers to get the CQM data per-socket and
> per-domain
> >with data copying, while I proposed to get the CQM data per-domain for all
> sockets
> >that can reduce the amount of hypercalls.
> 
> I don't think I ever voiced any preference between these two. All I said it
> depends on
> prevalent usage models, and to date I don't think I've seen a proper analysis of
> what
> the main usage model would be - it all seems guesswork and/or taking random
> examples.
> 
> What I did say I'd prefer is to have all this done outside the hypervisor, with the
> hypervisor just providing fundamental infrastructure (MSR accesses).

Okay. If I understand correctly, you prefer to implement a pure MSR access hypercall for one CPU, and put all other CQM things in libxc/libxl layer.

In this case, if libvert/XenAPI is trying to query a domain's cache utilization in the system (say 2 sockets), then it will trigger _two_ such MSR access hypercalls for CPUs in the 2 different sockets.
If you are okay with this idea, I am going to implement it.
 
Thanks,
Dongxiao
 

> 
> Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-29  7:31                                           ` Xu, Dongxiao
@ 2014-05-29  9:11                                             ` Jan Beulich
  2014-05-30  9:10                                               ` Ian Campbell
  2014-05-29  9:13                                             ` Andrew Cooper
  1 sibling, 1 reply; 46+ messages in thread
From: Jan Beulich @ 2014-05-29  9:11 UTC (permalink / raw)
  To: george.dunlap, dongxiao.xu
  Cc: andrew.cooper3, will.auld, Ian.Campbell, jun.nakajima, xen-devel

>>> "Xu, Dongxiao" <dongxiao.xu@intel.com> 05/29/14 9:31 AM >>>
>Okay. If I understand correctly, you prefer to implement a pure MSR access
>hypercall for one CPU, and put all other CQM things in libxc/libxl layer.

>In this case, if libvert/XenAPI is trying to query a domain's cache utilization
>in the system (say 2 sockets), then it will trigger _two_ such MSR access
>hypercalls for CPUs in the 2 different sockets.
>If you are okay with this idea, I am going to implement it.
 
I am okay with it, but give it a couple of days before you start so that others
can voice their opinions too.

Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-29  7:31                                           ` Xu, Dongxiao
  2014-05-29  9:11                                             ` Jan Beulich
@ 2014-05-29  9:13                                             ` Andrew Cooper
  2014-05-30  1:07                                               ` Xu, Dongxiao
  1 sibling, 1 reply; 46+ messages in thread
From: Andrew Cooper @ 2014-05-29  9:13 UTC (permalink / raw)
  To: Xu, Dongxiao, Jan Beulich, george.dunlap
  Cc: Auld, Will, Ian.Campbell, Nakajima, Jun, xen-devel


On 29/05/2014 08:31, Xu, Dongxiao wrote:
>> -----Original Message-----
>> From: Jan Beulich [mailto:jbeulich@suse.com]
>> Sent: Thursday, May 29, 2014 3:02 PM
>> To: george.dunlap@eu.citrix.com; Xu, Dongxiao
>> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
>> xen-devel@lists.xen.org
>> Subject: Re: RE: [Xen-devel] Xen Platform QoS design discussion
>>
>>>>> "Xu, Dongxiao" <dongxiao.xu@intel.com> 05/29/14 2:46 AM >>>
>>> I think Jan's opinion here is similar to what I proposed in the beginning of this
>> thread.
>>> The only difference is that, Jan prefers to get the CQM data per-socket and
>> per-domain
>>> with data copying, while I proposed to get the CQM data per-domain for all
>> sockets
>>> that can reduce the amount of hypercalls.
>> I don't think I ever voiced any preference between these two. All I said it
>> depends on
>> prevalent usage models, and to date I don't think I've seen a proper analysis of
>> what
>> the main usage model would be - it all seems guesswork and/or taking random
>> examples.
>>
>> What I did say I'd prefer is to have all this done outside the hypervisor, with the
>> hypervisor just providing fundamental infrastructure (MSR accesses).
> Okay. If I understand correctly, you prefer to implement a pure MSR access hypercall for one CPU, and put all other CQM things in libxc/libxl layer.
>
> In this case, if libvert/XenAPI is trying to query a domain's cache utilization in the system (say 2 sockets), then it will trigger _two_ such MSR access hypercalls for CPUs in the 2 different sockets.
> If you are okay with this idea, I am going to implement it.
>   
> Thanks,
> Dongxiao

While I can see the use and attraction of a generic MSR access 
hypercalls, using this method for getting QoS data is going to have 
subsantitally higher overhead than even the original domctl suggestion.

I do not believe it will be an effective means of getting large 
quantities of data from ring0 MSRs into dom0 userspace.  This is not to 
say that having a generic MSR interface is a bad thing, but I don't 
think it should be used for this purpose.

~Andrew

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-29  9:13                                             ` Andrew Cooper
@ 2014-05-30  1:07                                               ` Xu, Dongxiao
  2014-05-30  6:23                                                 ` Jan Beulich
  0 siblings, 1 reply; 46+ messages in thread
From: Xu, Dongxiao @ 2014-05-30  1:07 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich, george.dunlap
  Cc: Auld, Will, Ian.Campbell, Nakajima, Jun, xen-devel

> On 29/05/2014 08:31, Xu, Dongxiao wrote:
> >> -----Original Message-----
> > Okay. If I understand correctly, you prefer to implement a pure MSR access
> hypercall for one CPU, and put all other CQM things in libxc/libxl layer.
> >
> > In this case, if libvert/XenAPI is trying to query a domain's cache utilization in
> the system (say 2 sockets), then it will trigger _two_ such MSR access hypercalls
> for CPUs in the 2 different sockets.
> > If you are okay with this idea, I am going to implement it.
> >
> > Thanks,
> > Dongxiao
> 
> While I can see the use and attraction of a generic MSR access
> hypercalls, using this method for getting QoS data is going to have
> subsantitally higher overhead than even the original domctl suggestion.
> 
> I do not believe it will be an effective means of getting large
> quantities of data from ring0 MSRs into dom0 userspace.  This is not to
> say that having a generic MSR interface is a bad thing, but I don't
> think it should be used for this purpose.

They are the two directions to implement this feature:

Generic MSR access hypercall: It removes most of the policy in hypervisor and put them in Dom0 userspace, which makes the implementation in hypervisor very simple, with slightly higher cost (more hypercalls).
Sysctl hypercall: Hypervisor needs to consolidate all the CQM data which burdens the implementation, with more memory allocation, data sharing mechanism (2-level, page's address page, and page itself), also more policies in hypervisor, etc. While its cost is slightly less.

In my opinion, domctl is a compromise between two approaches, with moderate policies in hypervisor, moderate size of memory allocation, with moderate cost.

I would prefer domctl or generic MSR access way, which makes the implementation in hypervisor simple enough, but with only slightly higher cost.

Thanks,
Dongxiao

> 
> ~Andrew

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-30  1:07                                               ` Xu, Dongxiao
@ 2014-05-30  6:23                                                 ` Jan Beulich
  2014-05-30  7:51                                                   ` Xu, Dongxiao
  0 siblings, 1 reply; 46+ messages in thread
From: Jan Beulich @ 2014-05-30  6:23 UTC (permalink / raw)
  To: andrew.cooper3, george.dunlap, dongxiao.xu
  Cc: will.auld, Ian.Campbell, jun.nakajima, xen-devel

>>> "Xu, Dongxiao" <dongxiao.xu@intel.com> 05/30/14 3:07 AM >>>
>> While I can see the use and attraction of a generic MSR access
>> hypercalls, using this method for getting QoS data is going to have
>> subsantitally higher overhead than even the original domctl suggestion.
>> 
>> I do not believe it will be an effective means of getting large
>> quantities of data from ring0 MSRs into dom0 userspace.  This is not to
>> say that having a generic MSR interface is a bad thing, but I don't
>> think it should be used for this purpose.
>
>They are the two directions to implement this feature:
>
>Generic MSR access hypercall: It removes most of the policy in hypervisor
> and put them in Dom0 userspace, which makes the implementation in
> hypervisor very simple, with slightly higher cost (more hypercalls).
>Sysctl hypercall: Hypervisor needs to consolidate all the CQM data which
> burdens the implementation, with more memory allocation, data sharing
> mechanism (2-level, page's address page, and page itself), also more
> policies in hypervisor, etc. While its cost is slightly less.

>In my opinion, domctl is a compromise between two approaches, with
> moderate policies in hypervisor, moderate size of memory allocation, with
> moderate cost.

>I would prefer domctl or generic MSR access way, which makes the
> implementation in hypervisor simple enough, but with only slightly higher cost.

Andrew and I talked a little more about this yesterday, and I think we agreed
that until there is a proven need for the sysctl and/or the domctl approach, the
generic MSR access route should be good enough. Suitably batched the
number of hypercalls doesn't even need to be much higher than either of the
other approaches.

Question is whether this mechanism (which I'd like to be done so it can also
later get added support for e.g. port I/O: see Linux'es dcdbas_smi_request()
for where this might be useful) should become a sysctl op or - to be
easily usable by the kernel too - a platform one.

Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-30  6:23                                                 ` Jan Beulich
@ 2014-05-30  7:51                                                   ` Xu, Dongxiao
  2014-05-30 11:15                                                     ` Jan Beulich
  0 siblings, 1 reply; 46+ messages in thread
From: Xu, Dongxiao @ 2014-05-30  7:51 UTC (permalink / raw)
  To: Jan Beulich, andrew.cooper3, george.dunlap
  Cc: Auld, Will, Ian.Campbell, Nakajima, Jun, xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:jbeulich@suse.com]
> Sent: Friday, May 30, 2014 2:23 PM
> To: andrew.cooper3@citrix.com; george.dunlap@eu.citrix.com; Xu, Dongxiao
> Cc: Ian.Campbell@citrix.com; Nakajima, Jun; Auld, Will; xen-devel@lists.xen.org
> Subject: Re: RE: [Xen-devel] Xen Platform QoS design discussion
> 
> >>> "Xu, Dongxiao" <dongxiao.xu@intel.com> 05/30/14 3:07 AM >>>
> >> While I can see the use and attraction of a generic MSR access
> >> hypercalls, using this method for getting QoS data is going to have
> >> subsantitally higher overhead than even the original domctl suggestion.
> >>
> >> I do not believe it will be an effective means of getting large
> >> quantities of data from ring0 MSRs into dom0 userspace.  This is not to
> >> say that having a generic MSR interface is a bad thing, but I don't
> >> think it should be used for this purpose.
> >
> >They are the two directions to implement this feature:
> >
> >Generic MSR access hypercall: It removes most of the policy in hypervisor
> > and put them in Dom0 userspace, which makes the implementation in
> > hypervisor very simple, with slightly higher cost (more hypercalls).
> >Sysctl hypercall: Hypervisor needs to consolidate all the CQM data which
> > burdens the implementation, with more memory allocation, data sharing
> > mechanism (2-level, page's address page, and page itself), also more
> > policies in hypervisor, etc. While its cost is slightly less.
> 
> >In my opinion, domctl is a compromise between two approaches, with
> > moderate policies in hypervisor, moderate size of memory allocation, with
> > moderate cost.
> 
> >I would prefer domctl or generic MSR access way, which makes the
> > implementation in hypervisor simple enough, but with only slightly higher cost.
> 
> Andrew and I talked a little more about this yesterday, and I think we agreed
> that until there is a proven need for the sysctl and/or the domctl approach, the
> generic MSR access route should be good enough. Suitably batched the
> number of hypercalls doesn't even need to be much higher than either of the
> other approaches.

Okay, I will wait several days to see whether other people have more comments about this MSR access hypercall. If no, I will start to implement it.

> 
> Question is whether this mechanism (which I'd like to be done so it can also
> later get added support for e.g. port I/O: see Linux'es dcdbas_smi_request()
> for where this might be useful) should become a sysctl op or - to be
> easily usable by the kernel too - a platform one.

For CQM, this MSR access hypercall may be somewhat specific, because it requires one MSR write and one MSR read, which cannot be split into two separate hypercalls to avoid preemption in the middle.

Thanks,
Dongxiao

> 
> Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-29  9:11                                             ` Jan Beulich
@ 2014-05-30  9:10                                               ` Ian Campbell
  2014-05-30 11:17                                                 ` Jan Beulich
  0 siblings, 1 reply; 46+ messages in thread
From: Ian Campbell @ 2014-05-30  9:10 UTC (permalink / raw)
  To: Jan Beulich
  Cc: george.dunlap, andrew.cooper3, will.auld, xen-devel, dongxiao.xu,
	jun.nakajima

On Thu, 2014-05-29 at 10:11 +0100, Jan Beulich wrote:
> >>> "Xu, Dongxiao" <dongxiao.xu@intel.com> 05/29/14 9:31 AM >>>
> >Okay. If I understand correctly, you prefer to implement a pure MSR access
> >hypercall for one CPU, and put all other CQM things in libxc/libxl layer.
> 
> >In this case, if libvert/XenAPI is trying to query a domain's cache utilization
> >in the system (say 2 sockets), then it will trigger _two_ such MSR access
> >hypercalls for CPUs in the 2 different sockets.
> >If you are okay with this idea, I am going to implement it.
>  
> I am okay with it, but give it a couple of days before you start so that others
> can voice their opinions too.

Dom0 may not have a vcpu which is scheduled/schedulable on every socket.
scheduled it can probably deal with by doing awful sounding temporary
things to its affinity mask, but if it is not schedulable (e.g. due to
cpupools etc) then that sounds even harder to sort...

Ian.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-30  7:51                                                   ` Xu, Dongxiao
@ 2014-05-30 11:15                                                     ` Jan Beulich
  0 siblings, 0 replies; 46+ messages in thread
From: Jan Beulich @ 2014-05-30 11:15 UTC (permalink / raw)
  To: andrew.cooper3, george.dunlap, dongxiao.xu
  Cc: will.auld, Ian.Campbell, jun.nakajima, xen-devel

>>> "Xu, Dongxiao" <dongxiao.xu@intel.com> 05/30/14 9:52 AM >>>
> From: Jan Beulich [mailto:jbeulich@suse.com]
>> Question is whether this mechanism (which I'd like to be done so it can also
>> later get added support for e.g. port I/O: see Linux'es dcdbas_smi_request()
>> for where this might be useful) should become a sysctl op or - to be
>> easily usable by the kernel too - a platform one.
>
>For CQM, this MSR access hypercall may be somewhat specific, because it
>requires one MSR write and one MSR read, which cannot be split into two
>separate hypercalls to avoid preemption in the middle.

Sure - you'd need a flag to suppress preemption between two specific (batched)
operations.

Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-30  9:10                                               ` Ian Campbell
@ 2014-05-30 11:17                                                 ` Jan Beulich
  2014-05-30 12:33                                                   ` Ian Campbell
  2014-06-05  0:48                                                   ` Xu, Dongxiao
  0 siblings, 2 replies; 46+ messages in thread
From: Jan Beulich @ 2014-05-30 11:17 UTC (permalink / raw)
  To: ian.campbell
  Cc: george.dunlap, andrew.cooper3, will.auld, xen-devel, dongxiao.xu,
	jun.nakajima

>>> Ian Campbell <ian.campbell@citrix.com> 05/30/14 11:11 AM >>>
>On Thu, 2014-05-29 at 10:11 +0100, Jan Beulich wrote:
>> >>> "Xu, Dongxiao" <dongxiao.xu@intel.com> 05/29/14 9:31 AM >>>
>> >Okay. If I understand correctly, you prefer to implement a pure MSR access
>> >hypercall for one CPU, and put all other CQM things in libxc/libxl layer.
>> 
>> >In this case, if libvert/XenAPI is trying to query a domain's cache utilization
>> >in the system (say 2 sockets), then it will trigger _two_ such MSR access
>> >hypercalls for CPUs in the 2 different sockets.
>> >If you are okay with this idea, I am going to implement it.
>>  
>> I am okay with it, but give it a couple of days before you start so that others
>> can voice their opinions too.
>
>Dom0 may not have a vcpu which is scheduled/schedulable on every socket.
>scheduled it can probably deal with by doing awful sounding temporary
>things to its affinity mask, but if it is not schedulable (e.g. due to
>cpupools etc) then that sounds even harder to sort...

But that's why we're intending to add a helper hypercall in the first place. This
isn't intended to be a 'read MSR' one, but a 'read MSR in this CPU'.

Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-30 11:17                                                 ` Jan Beulich
@ 2014-05-30 12:33                                                   ` Ian Campbell
  2014-06-05  0:48                                                   ` Xu, Dongxiao
  1 sibling, 0 replies; 46+ messages in thread
From: Ian Campbell @ 2014-05-30 12:33 UTC (permalink / raw)
  To: Jan Beulich
  Cc: george.dunlap, andrew.cooper3, will.auld, xen-devel, dongxiao.xu,
	jun.nakajima

On Fri, 2014-05-30 at 12:17 +0100, Jan Beulich wrote:
> >>> Ian Campbell <ian.campbell@citrix.com> 05/30/14 11:11 AM >>>
> >On Thu, 2014-05-29 at 10:11 +0100, Jan Beulich wrote:
> >> >>> "Xu, Dongxiao" <dongxiao.xu@intel.com> 05/29/14 9:31 AM >>>
> >> >Okay. If I understand correctly, you prefer to implement a pure MSR access
> >> >hypercall for one CPU, and put all other CQM things in libxc/libxl layer.
> >> 
> >> >In this case, if libvert/XenAPI is trying to query a domain's cache utilization
> >> >in the system (say 2 sockets), then it will trigger _two_ such MSR access
> >> >hypercalls for CPUs in the 2 different sockets.
> >> >If you are okay with this idea, I am going to implement it.
> >>  
> >> I am okay with it, but give it a couple of days before you start so that others
> >> can voice their opinions too.
> >
> >Dom0 may not have a vcpu which is scheduled/schedulable on every socket.
> >scheduled it can probably deal with by doing awful sounding temporary
> >things to its affinity mask, but if it is not schedulable (e.g. due to
> >cpupools etc) then that sounds even harder to sort...
> 
> But that's why we're intending to add a helper hypercall in the first place. This
> isn't intended to be a 'read MSR' one, but a 'read MSR in this CPU'.

Oh good, you should ignore me then ;-)

Ian.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-05-30 11:17                                                 ` Jan Beulich
  2014-05-30 12:33                                                   ` Ian Campbell
@ 2014-06-05  0:48                                                   ` Xu, Dongxiao
  2014-06-05 10:43                                                     ` George Dunlap
  1 sibling, 1 reply; 46+ messages in thread
From: Xu, Dongxiao @ 2014-06-05  0:48 UTC (permalink / raw)
  To: Jan Beulich, ian.campbell
  Cc: george.dunlap, andrew.cooper3, Auld, Will, Nakajima, Jun, xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:jbeulich@suse.com]
> Sent: Friday, May 30, 2014 7:18 PM
> To: ian.campbell@citrix.com
> Cc: andrew.cooper3@citrix.com; george.dunlap@eu.citrix.com; Xu, Dongxiao;
> Nakajima, Jun; Auld, Will; xen-devel@lists.xen.org
> Subject: Re: RE: RE: [Xen-devel] Xen Platform QoS design discussion
> 
> >>> Ian Campbell <ian.campbell@citrix.com> 05/30/14 11:11 AM >>>
> >On Thu, 2014-05-29 at 10:11 +0100, Jan Beulich wrote:
> >> >>> "Xu, Dongxiao" <dongxiao.xu@intel.com> 05/29/14 9:31 AM >>>
> >> >Okay. If I understand correctly, you prefer to implement a pure MSR access
> >> >hypercall for one CPU, and put all other CQM things in libxc/libxl layer.
> >>
> >> >In this case, if libvert/XenAPI is trying to query a domain's cache utilization
> >> >in the system (say 2 sockets), then it will trigger _two_ such MSR access
> >> >hypercalls for CPUs in the 2 different sockets.
> >> >If you are okay with this idea, I am going to implement it.
> >>
> >> I am okay with it, but give it a couple of days before you start so that others
> >> can voice their opinions too.
> >
> >Dom0 may not have a vcpu which is scheduled/schedulable on every socket.
> >scheduled it can probably deal with by doing awful sounding temporary
> >things to its affinity mask, but if it is not schedulable (e.g. due to
> >cpupools etc) then that sounds even harder to sort...
> 
> But that's why we're intending to add a helper hypercall in the first place. This
> isn't intended to be a 'read MSR' one, but a 'read MSR in this CPU'.

No more comments on this MSR access hypercall design now, so I assume people are mostly okay with it?

Thanks,
Dongxiao

> 
> Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Xen Platform QoS design discussion
  2014-06-05  0:48                                                   ` Xu, Dongxiao
@ 2014-06-05 10:43                                                     ` George Dunlap
  0 siblings, 0 replies; 46+ messages in thread
From: George Dunlap @ 2014-06-05 10:43 UTC (permalink / raw)
  To: Xu, Dongxiao
  Cc: ian.campbell, andrew.cooper3, xen-devel, Auld, Will, Jan Beulich,
	Nakajima, Jun

On Thu, Jun 5, 2014 at 1:48 AM, Xu, Dongxiao <dongxiao.xu@intel.com> wrote:
>> -----Original Message-----
>> From: Jan Beulich [mailto:jbeulich@suse.com]
>> Sent: Friday, May 30, 2014 7:18 PM
>> To: ian.campbell@citrix.com
>> Cc: andrew.cooper3@citrix.com; george.dunlap@eu.citrix.com; Xu, Dongxiao;
>> Nakajima, Jun; Auld, Will; xen-devel@lists.xen.org
>> Subject: Re: RE: RE: [Xen-devel] Xen Platform QoS design discussion
>>
>> >>> Ian Campbell <ian.campbell@citrix.com> 05/30/14 11:11 AM >>>
>> >On Thu, 2014-05-29 at 10:11 +0100, Jan Beulich wrote:
>> >> >>> "Xu, Dongxiao" <dongxiao.xu@intel.com> 05/29/14 9:31 AM >>>
>> >> >Okay. If I understand correctly, you prefer to implement a pure MSR access
>> >> >hypercall for one CPU, and put all other CQM things in libxc/libxl layer.
>> >>
>> >> >In this case, if libvert/XenAPI is trying to query a domain's cache utilization
>> >> >in the system (say 2 sockets), then it will trigger _two_ such MSR access
>> >> >hypercalls for CPUs in the 2 different sockets.
>> >> >If you are okay with this idea, I am going to implement it.
>> >>
>> >> I am okay with it, but give it a couple of days before you start so that others
>> >> can voice their opinions too.
>> >
>> >Dom0 may not have a vcpu which is scheduled/schedulable on every socket.
>> >scheduled it can probably deal with by doing awful sounding temporary
>> >things to its affinity mask, but if it is not schedulable (e.g. due to
>> >cpupools etc) then that sounds even harder to sort...
>>
>> But that's why we're intending to add a helper hypercall in the first place. This
>> isn't intended to be a 'read MSR' one, but a 'read MSR in this CPU'.
>
> No more comments on this MSR access hypercall design now, so I assume people are mostly okay with it?

Yes -- I think everyone who commented before is satisfied with that
approach, and anyone who hasn't commented has had ample opportunity to
do so.

 -George

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2014-06-05 10:43 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-30 16:47 Xen Platform QoS design discussion Xu, Dongxiao
2014-04-30 17:02 ` Ian Campbell
2014-05-01  0:56   ` Xu, Dongxiao
2014-05-02  9:23     ` Jan Beulich
2014-05-02 12:30       ` Xu, Dongxiao
2014-05-02 12:40         ` Jan Beulich
2014-05-04  0:46           ` Xu, Dongxiao
2014-05-06  9:10             ` Ian Campbell
2014-05-06  1:40           ` Xu, Dongxiao
2014-05-06  7:55             ` Jan Beulich
2014-05-06 10:06             ` Andrew Cooper
2014-05-07  2:08               ` Xu, Dongxiao
2014-05-07  9:10                 ` Ian Campbell
2014-05-07 13:26               ` George Dunlap
2014-05-07 21:18                 ` Andrew Cooper
2014-05-08  5:21                   ` Xu, Dongxiao
2014-05-08 11:25                     ` Andrew Cooper
2014-05-09  2:41                       ` Xu, Dongxiao
2014-05-13  1:53                       ` Xu, Dongxiao
2014-05-16  5:11                       ` Xu, Dongxiao
2014-05-19 11:28                         ` George Dunlap
2014-05-19 11:45                           ` Jan Beulich
2014-05-19 12:13                             ` George Dunlap
2014-05-19 12:41                               ` Jan Beulich
2014-05-22  8:19                                 ` Xu, Dongxiao
2014-05-22  8:39                                   ` Jan Beulich
2014-05-22  9:27                                     ` George Dunlap
2014-05-26  0:51                                       ` Xu, Dongxiao
2014-05-29  0:45                                       ` Xu, Dongxiao
2014-05-29  7:01                                         ` Jan Beulich
2014-05-29  7:31                                           ` Xu, Dongxiao
2014-05-29  9:11                                             ` Jan Beulich
2014-05-30  9:10                                               ` Ian Campbell
2014-05-30 11:17                                                 ` Jan Beulich
2014-05-30 12:33                                                   ` Ian Campbell
2014-06-05  0:48                                                   ` Xu, Dongxiao
2014-06-05 10:43                                                     ` George Dunlap
2014-05-29  9:13                                             ` Andrew Cooper
2014-05-30  1:07                                               ` Xu, Dongxiao
2014-05-30  6:23                                                 ` Jan Beulich
2014-05-30  7:51                                                   ` Xu, Dongxiao
2014-05-30 11:15                                                     ` Jan Beulich
2014-05-02 12:50         ` Andrew Cooper
2014-05-04  2:34           ` Xu, Dongxiao
2014-05-06  9:12             ` Ian Campbell
2014-05-06 10:00             ` Andrew Cooper

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.