Preface: It looks like this may be one of your first hypervisor patches.  So thank you!  FYI there’s a lot that needs fixing up here; please don’t read a tone of annoyance into it, I’m just trying to tell you what needs fixing in the most efficient manner. :-)

On 17 May 2022, at 15:33, Matias Ezequiel Vara Larsen <matiasevara@gmail.com> wrote:

Allow to map vcpu stats using acquire_resource().

This needs a lot more expansion in terms of what it is that you’re doing in this patch and why.


Signed-off-by: Matias Ezequiel Vara Larsen <matias.vara@vates.fr>
---
xen/common/domain.c         | 42 +++++++++++++++++++++++++++++++++++++
xen/common/memory.c         | 29 +++++++++++++++++++++++++
xen/common/sched/core.c     |  5 +++++
xen/include/public/memory.h |  1 +
xen/include/xen/sched.h     |  5 +++++
5 files changed, 82 insertions(+)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 17cc32fde3..ddd9f88874 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -132,6 +132,42 @@ static void vcpu_info_reset(struct vcpu *v)
    v->vcpu_info_mfn = INVALID_MFN;
}

+static void stats_free_buffer(struct vcpu * v)
+{
+    struct page_info *pg = v->stats.pg;
+
+    if ( !pg )
+        return;
+
+    v->stats.va = NULL;
+
+    if ( v->stats.va )
+        unmap_domain_page_global(v->stats.va);
+
+    v->stats.va = NULL;

Looks like you meant to delete the first `v->stats.va = NULL` but forgot?

+
+    free_domheap_page(pg);

Pretty sure this will crash.

Unfortunately page allocation and freeing is somewhat complicated and requires a bit of boilerplate.  You can look at the vmtrace_alloc_buffer() code for a template, but the general sequence you want is as follows:

* On the allocate side:

1. Allocate the page

   pg = alloc_domheap_page(d, MEMF_no_refcount);

This will allocate a page with the PGC_allocated bit set and a single reference count.  (If you pass a page with PGC_allocated set to free_domheap_page(), it will crash; which is why I said the above.)

2.  Grab a general reference count for the vcpu struct’s reference to it; as well as a writable type count, so that it doesn’t get used as a special page

if ( !get_page_and_type(pg, d, PGT_writable_page) ) {
  put_page_alloc_ref(pg);
  /* failure path */
}

* On the free side, don’t call free_domheap_pages() directly.  Rather, drop the allocation, then drop your own type count, thus:

v->stats.va = NULL;

put_page_alloc_ref(pg);
put_page_and_type(pg);

The issue here is that we can’t free the page until all references have dropped; and the whole point of this exercise is to allow guest userspace in dom0 to gain a reference to the page.  We can’t actually free the page until *all* references are gone, including the userspace one in dom0.  The put_page() which brings the reference count to 0 will automatically free the page.


+}
+
+static int stats_alloc_buffer(struct vcpu *v)
+{
+    struct domain *d = v->domain;
+    struct page_info *pg;
+
+    pg = alloc_domheap_page(d, MEMF_no_refcount);
+
+    if ( !pg )
+        return -ENOMEM;
+
+    v->stats.va = __map_domain_page_global(pg);
+    if ( !v->stats.va )
+        return -ENOMEM;
+
+    v->stats.pg = pg;
+    clear_page(v->stats.va);
+    return 0;
+}

The other thing to say about this is that the memory is being allocated unconditionally, even if nobody is planning to read it.  The vast majority of Xen users will not be running xcp-rrd, so it will be pointless overhead.

At a basic level, you want to follow suit with the vmtrace buffers, which are only allocated if the proper domctl flag is set on domain creation.  (We could consider instead, or in addition, having a global Xen command-line parameter which would enable this feature for all domains.)

+
static void vmtrace_free_buffer(struct vcpu *v)
{
    const struct domain *d = v->domain;
@@ -203,6 +239,9 @@ static int vmtrace_alloc_buffer(struct vcpu *v)
 */
static int vcpu_teardown(struct vcpu *v)
{
+
+    stats_free_buffer(v);
+
    vmtrace_free_buffer(v);

    return 0;
@@ -269,6 +308,9 @@ struct vcpu *vcpu_create(struct domain *d, unsigned int vcpu_id)
    if ( vmtrace_alloc_buffer(v) != 0 )
        goto fail_wq;

+    if ( stats_alloc_buffer(v) != 0 )
+        goto fail_wq;
+
    if ( arch_vcpu_create(v) != 0 )
        goto fail_sched;

diff --git a/xen/common/memory.c b/xen/common/memory.c
index 297b98a562..39de6d9d05 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1099,6 +1099,10 @@ static unsigned int resource_max_frames(const struct domain *d,
    case XENMEM_resource_vmtrace_buf:
        return d->vmtrace_size >> PAGE_SHIFT;

+    // WIP: to figure out the correct size of the resource
+    case XENMEM_resource_stats_table:
+        return 1;
+
    default:
        return -EOPNOTSUPP;
    }
@@ -1162,6 +1166,28 @@ static int acquire_vmtrace_buf(
    return nr_frames;
}

+static int acquire_stats_table(struct domain *d,
+                                unsigned int id,
+                                unsigned int frame,
+                                unsigned int nr_frames,
+                                xen_pfn_t mfn_list[])
+{
+    const struct vcpu *v = domain_vcpu(d, id);
+    mfn_t mfn;
+

Maybe I’m paranoid, but I might add an “ASSERT(nr_frames == 1)” here 

+    if ( !v )
+        return -ENOENT;
+
+    if ( !v->stats.pg )
+        return -EINVAL;
+
+    mfn = page_to_mfn(v->stats.pg);
+    mfn_list[0] = mfn_x(mfn);
+
+    printk("acquire_perf_table: id: %d, nr_frames: %d, %p, domainid: %d\n", id, nr_frames, v->stats.pg, d->domain_id);
+    return 1;
+}
+
/*
 * Returns -errno on error, or positive in the range [1, nr_frames] on
 * success.  Returning less than nr_frames contitutes a request for a
@@ -1182,6 +1208,9 @@ static int _acquire_resource(
    case XENMEM_resource_vmtrace_buf:
        return acquire_vmtrace_buf(d, id, frame, nr_frames, mfn_list);

+    case XENMEM_resource_stats_table:
+        return acquire_stats_table(d, id, frame, nr_frames, mfn_list);
+
    default:
        return -EOPNOTSUPP;
    }
diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 8f4b1ca10d..2a8b534977 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -264,6 +264,7 @@ static inline void vcpu_runstate_change(
{
    s_time_t delta;
    struct sched_unit *unit = v->sched_unit;
+    uint64_t * runstate;

    ASSERT(spin_is_locked(get_sched_res(v->processor)->schedule_lock));
    if ( v->runstate.state == new_state )
@@ -287,6 +288,10 @@ static inline void vcpu_runstate_change(
    }

    v->runstate.state = new_state;
+
+    // WIP: use a different interface
+    runstate = (uint64_t*)v->stats.va;

I think you should look at xen.git/xen/include/public/hvm/ioreq.h:shared_iopage_t for inspiration.  Basically, you cast the void pointer to that type, and then just access `iopage->vcpu_ioreq[vcpuid]`.   Put it in a public header, and then both the userspace consumer and Xen can use the same structure.

As I said in my response to the cover letter, I think vcpu_runstate_info_t would be something to look at and gain inspiration from.

The other thing to say here is that this is a hot path; we don’t want to be copying lots of information around if it’s not going to be used.  So you should only allocate the buffers if specifically enabled, and you should only copy the information over if v->stats.va != NULL.

I think that should be enough to start with; we can nail down more when you send v1. 

Peace,
 -George