From: Matias Ezequiel Vara Larsen <matiasevara@gmail.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: "Matias Ezequiel Vara Larsen" <matias.vara@vates.fr>,
	"Andrew Cooper" <andrew.cooper3@citrix.com>,
	"Roger Pau Monné" <roger.pau@citrix.com>, "Wei Liu" <wl@xen.org>,
	"George Dunlap" <george.dunlap@citrix.com>,
	"Julien Grall" <julien@xen.org>,
	"Stefano Stabellini" <sstabellini@kernel.org>,
	"Dario Faggioli" <dfaggioli@suse.com>,
	xen-devel@lists.xenproject.org
Subject: Re: [RFC PATCH v2 1/2] xen/memory : Add a stats_table resource type
Date: Thu, 16 Feb 2023 15:48:56 +0100	[thread overview]
Message-ID: <20230216144856.GA2137216@horizon> (raw)
In-Reply-To: <283a8095-3046-3324-3e8a-8671dc1ee8e7@suse.com>

Hello Jan and thanks for your comments. I addressed most of them, but I
still have some questions. Please find them below:

On Tue, Dec 13, 2022 at 06:02:55PM +0100, Jan Beulich wrote:
> On 07.10.2022 14:39, Matias Ezequiel Vara Larsen wrote:
> > This commit proposes a new mechanism to query the RUNSTATE_running counter for
> > a given vcpu from a dom0 userspace application. This commit proposes to expose
> > that counter by using the acquire_resource interface. The current mechanism
> > relies on the XEN_DOMCTL_getvcpuinfo and holds a single global domctl_lock for
> > the entire hypercall; and iterate over every vcpu in the system for every
> > update thus impacting operations that share that lock.
> > 
> > This commit proposes to expose vcpu RUNSTATE_running via the
> > xenforeignmemory interface thus preventing to issue the hypercall and holding
> > the lock. For that purpose, a new resource type named stats_table is added. The
> > first frame of this resource stores per-vcpu counters. The frame has one entry
> > of type struct vcpu_stats per vcpu. The allocation of this frame only happens
> > if the resource is requested. The frame is released after the domain is
> > destroyed.
> > 
> > Note that the updating of this counter is in a hot path, thus, in this commit,
> > copying only happens if it is specifically required.
> > 
> > Note that the exposed structure is extensible in two ways. First, the structure
> > vcpu_stats can be extended with new per-vcpu counters while it fits in a frame.
> 
> I'm afraid I don't see how this is "extensible". I would recommend that
> you outline for yourself how a change would look like to actually add
> such a 2nd counter. While doing that keep in mind that whatever changes
> you make may not break existing consumers.
> 
> It's also not clear what you mean with "fits in a frame": struct
> shared_vcpustatspage is a container for an array with a single element.
> I may guess (looking at just the public interface) that this really is
> meant to be a flexible array (and hence should be marked as such - see
> other uses of XEN_FLEX_ARRAY_DIM in the public headers). Yet if that's
> the case, then a single page already won't suffice for a domain with
> sufficiently many vCPU-s.
> 

I tackled this by using "d->max_vcpus" to calculate the number of frames
required for a given guest. Also, I added a new type-specific resource id named
XENMEM_resource_stats_table_id_vcpustats to host the per-vcpu counters. I
completely forgot the "id" in the previous series.

> > Second, new frames can be added in case new counters are required.
> 
> Are you talking of "new counters" here which aren't "new per-vcpu
> counters"? Or else what's the difference from the 1st way?

Yes, I was referring to that sort of counter. In the next series, such
counters could be added by defining a new type-specific resource id.
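
For illustration only (the second define below is hypothetical and just shows
the direction; only the vcpustats id is part of the next series):

    #define XENMEM_resource_stats_table_id_vcpustats   0
    /* A hypothetical future set of non-per-vcpu counters would get its own id: */
    /* #define XENMEM_resource_stats_table_id_domstats  1 */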

> 
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -741,6 +741,8 @@ void hvm_domain_relinquish_resources(struct domain *d)
> >  
> >      ioreq_server_destroy_all(d);
> >  
> > +    stats_free_vcpu_mfn(d);
> 
> How come this lives here? Surely this new feature should be not only
> guest-type independent, but also arch-agnostic? Clearly you putting
> the new data in struct domain (and not struct arch_domain or yet
> deeper in the hierarchy) indicates you may have been meaning to make
> it so.
> 

The whole feature should be guest-type independent and arch-agnostic.
Would it be better to put it in xen/common/domain.c:domain_kill()?
 
> > --- a/xen/common/memory.c
> > +++ b/xen/common/memory.c
> > @@ -1078,6 +1078,12 @@ unsigned int ioreq_server_max_frames(const struct domain *d)
> >      return nr;
> >  }
> >  
> > +unsigned int stats_table_max_frames(const struct domain *d)
> > +{
> > +    /* One frame per 512 vcpus. */
> > +    return 1;
> > +}
> 
> As alluded to earlier already - 1 isn't going to be suitable for
> arbitrary size domains. (Yes, HVM domains are presently limited to
> 128 vCPU-s, but as per above this shouldn't be a HVM-only feature.)
>

I am going to use "d->max_vcpus" to calculate the number of frames required
for the per-vcpu counters of a given guest.
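
For reference, a minimal sketch of what I have in mind (only a sketch; it
assumes the struct vcpu_stats layout from this RFC and Xen's DIV_ROUND_UP()
macro, and the exact expression may still change in the next series):

    unsigned int stats_table_max_frames(const struct domain *d)
    {
        /* Enough frames to hold one struct vcpu_stats per possible vCPU. */
        return DIV_ROUND_UP(d->max_vcpus * sizeof(struct vcpu_stats), PAGE_SIZE);
    }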
 
> > @@ -1162,6 +1171,88 @@ static int acquire_vmtrace_buf(
> >      return nr_frames;
> >  }
> >  
> > +void stats_free_vcpu_mfn(struct domain * d)
> > +{
> > +    struct page_info *pg = d->vcpustats_page.pg;
> > +
> > +    if ( !pg )
> > +        return;
> > +
> > +    d->vcpustats_page.pg = NULL;
> > +
> > +    if ( d->vcpustats_page.va )
> > +        unmap_domain_page_global(d->vcpustats_page.va);
> > +
> > +    d->vcpustats_page.va = NULL;
> 
> We ought to gain UNMAP_DOMAIN_PAGE_GLOBAL() for purposes like this one,
> paralleling UNMAP_DOMAIN_PAGE().
> 

I do not understand this comment. Could you elaborate on it?

> > +    put_page_alloc_ref(pg);
> > +    put_page_and_type(pg);
> > +}
> > +
> > +static int stats_vcpu_alloc_mfn(struct domain *d)
> > +{
> > +    struct page_info *pg;
> > +
> > +    pg = alloc_domheap_page(d, MEMF_no_refcount);
> > +
> > +    if ( !pg )
> > +        return -ENOMEM;
> > +
> > +    if ( !get_page_and_type(pg, d, PGT_writable_page) ) {
> 
> Style: Brace placement (more elsewhere).
> 
> > +        put_page_alloc_ref(pg);
> 
> This is not allowed when what you may put is the last reference.
> See other examples we have in the tree.
> 

I do not understand this comment. Could you point me to an example? I used
ioreq_server_alloc_mfn() as an example, but it may not be a good one.

> > +        return -ENODATA;
> > +    }
> > +
> > +    d->vcpustats_page.va = __map_domain_page_global(pg);
> > +    if ( !d->vcpustats_page.va )
> > +        goto fail;
> > +
> > +    d->vcpustats_page.pg = pg;
> > +    clear_page(d->vcpustats_page.va);
> 
> I guess this should happen before you globally announce the
> address.
> 

If I understand correctly, I should invoke clear_page() before I assign the
address to "d->vcpustats_page.va". Am I right?
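
In other words, something like the following (just a sketch against this
RFC's fields, with "va" being a local void pointer):

    va = __map_domain_page_global(pg);
    if ( !va )
        goto fail;

    clear_page(va);

    /* Publish the mapping only after the page has been cleared. */
    d->vcpustats_page.va = va;
    d->vcpustats_page.pg = pg;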

> > +    return 1;
> 
> Functions returning -errno on error want to return 0 on success,
> unless e.g. multiple success indicators are needed.
> 
> > +fail:
> 
> Style: Label indentation.
> 
> > +    put_page_alloc_ref(pg);
> > +    put_page_and_type(pg);
> > +
> > +    return -ENOMEM;
> > +}
> > +
> > +static int acquire_stats_table(struct domain *d,
> > +                                unsigned int id,
> > +                                unsigned int frame,
> > +                                unsigned int nr_frames,
> > +                                xen_pfn_t mfn_list[])
> 
> Style: Indentation.
> 
> > +{
> > +    mfn_t mfn;
> > +    int rc;
> > +    unsigned int i;
> > +
> > +    if ( !d )
> > +        return -ENOENT;
> > +
> > +    for ( i = 0; i < nr_frames; i++ )
> > +    {
> > +        switch ( i )
> > +        {
> > +        case XENMEM_resource_stats_frame_vcpustats:
> 
> Isn't this supposed to be indexed by "id" (which presently you ignore
> altogether, which can't be right)?

I forgot the "id". I added a new type-specific resource id in the next
series. 

> 
> > --- a/xen/common/sched/core.c
> > +++ b/xen/common/sched/core.c
> > @@ -264,6 +264,8 @@ static inline void vcpu_runstate_change(
> >  {
> >      s_time_t delta;
> >      struct sched_unit *unit = v->sched_unit;
> > +    shared_vcpustatspage_t * vcpustats_va;
> 
> Style: Stray blank (more elsewhere).
> 
> > @@ -287,6 +289,20 @@ static inline void vcpu_runstate_change(
> >      }
> >  
> >      v->runstate.state = new_state;
> > +
> > +    vcpustats_va = (shared_vcpustatspage_t*)d->vcpustats_page.va;
> 
> There should be no need for a cast here.
> 
> > +    if ( vcpustats_va )
> > +    {
> > +	vcpustats_va->vcpu_info[v->vcpu_id].version =
> 
> Style: Hard tab.
> 
> > +	    version_update_begin(vcpustats_va->vcpu_info[v->vcpu_id].version);
> > +        smp_wmb();
> > +        memcpy(&vcpustats_va->vcpu_info[v->vcpu_id].runstate_running_time,
> > +               &v->runstate.time[RUNSTATE_running],
> > +               sizeof(v->runstate.time[RUNSTATE_running]));
> 
> Why memcpy() and not a plain assignment?
> 
> > +        smp_wmb();
> > +        vcpustats_va->vcpu_info[v->vcpu_id].version =
> > +            version_update_end(vcpustats_va->vcpu_info[v->vcpu_id].version);
> > +    }
> 
> Overall latching &vcpustats_va->vcpu_info[v->vcpu_id] into a helper
> variable would likely help readability quite a bit.
> 
> > --- a/xen/include/public/vcpu.h
> > +++ b/xen/include/public/vcpu.h
> > @@ -235,6 +235,22 @@ struct vcpu_register_time_memory_area {
> >  typedef struct vcpu_register_time_memory_area vcpu_register_time_memory_area_t;
> >  DEFINE_XEN_GUEST_HANDLE(vcpu_register_time_memory_area_t);
> >  
> > +struct vcpu_stats{
> 
> Style: Missing blank.
> 
> > +    /* If the least-significant bit of the version number is set then an update
> 
> Style: Comment layout.
> 

Thanks for the comments regarding style.
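
Regarding the last suggestion, for the next series I plan to rework the
runstate update roughly as follows (a sketch only, reusing the field names
from this RFC):

    shared_vcpustatspage_t *vcpustats_va = d->vcpustats_page.va;

    if ( vcpustats_va )
    {
        struct vcpu_stats *stats = &vcpustats_va->vcpu_info[v->vcpu_id];

        stats->version = version_update_begin(stats->version);
        smp_wmb();
        stats->runstate_running_time = v->runstate.time[RUNSTATE_running];
        smp_wmb();
        stats->version = version_update_end(stats->version);
    }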

Matias


Thread overview: 37+ messages
2022-10-07 12:39 [RFC PATCH v2 0/2] Add a new acquire resource to query vcpu statistics Matias Ezequiel Vara Larsen
2022-10-07 12:39 ` [RFC PATCH v2 1/2] xen/memory : Add a stats_table resource type Matias Ezequiel Vara Larsen
2022-12-13 17:02   ` Jan Beulich
2023-02-16 14:48     ` Matias Ezequiel Vara Larsen [this message]
2023-02-16 15:10       ` Jan Beulich
2022-12-14  7:29   ` Jan Beulich
2022-12-14  7:56     ` Jan Beulich
2023-02-17  8:50       ` Matias Ezequiel Vara Larsen
2023-02-17  8:57         ` Jan Beulich
2023-02-17  9:29           ` Matias Ezequiel Vara Larsen
2023-02-17 14:10             ` Jan Beulich
2023-02-23 12:16               ` Matias Ezequiel Vara Larsen
2023-02-23 12:42                 ` Jan Beulich
2023-03-07 14:44                   ` Matias Ezequiel Vara Larsen
2023-03-07 16:55                     ` Jan Beulich
2023-03-09  9:22                       ` Matias Ezequiel Vara Larsen
2023-02-16 15:07     ` Matias Ezequiel Vara Larsen
2023-02-16 15:15       ` Jan Beulich
2023-02-20 16:51         ` Matias Ezequiel Vara Larsen
2023-02-21  8:48           ` Jan Beulich
2022-10-07 12:39 ` [RFC PATCH v2 2/2] tools/misc: Add xen-vcpus-stats tool Matias Ezequiel Vara Larsen
2023-02-23 16:01   ` Andrew Cooper
2023-02-23 20:31     ` Julien Grall
2023-03-17 11:01       ` Matias Ezequiel Vara Larsen
2023-03-29 21:29         ` Julien Grall
2023-02-24 15:31     ` Matias Ezequiel Vara Larsen
2023-02-23 19:56 ` API/ABIs: Re: [RFC PATCH v2 0/2] Add a new acquire resource to query vcpu statistics Andrew Cooper
2023-03-06 14:23   ` Matias Ezequiel Vara Larsen
2023-03-07 10:12     ` Jan Beulich
2023-03-08 11:54       ` Matias Ezequiel Vara Larsen
2023-03-08 14:16         ` Jan Beulich
2023-03-09 10:38           ` Matias Ezequiel Vara Larsen
2023-03-09 11:50             ` Jan Beulich
2023-03-10 10:58               ` Matias Ezequiel Vara Larsen
2023-03-10 11:34                 ` Jan Beulich
2023-03-14 10:28                   ` Matias Ezequiel Vara Larsen
2023-03-14 10:34                     ` Jan Beulich
