From: Matias Ezequiel Vara Larsen <matiasevara@gmail.com>
To: xen-devel@lists.xenproject.org
Cc: Matias Ezequiel Vara Larsen <matias.vara@vates.fr>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	George Dunlap <george.dunlap@citrix.com>,
	Jan Beulich <jbeulich@suse.com>, Julien Grall <julien@xen.org>,
	Stefano Stabellini <sstabellini@kernel.org>, Wei Liu <wl@xen.org>
Subject: [RFC PATCH] xen/docs: Document acquire resource interface
Date: Tue, 24 May 2022 13:19:30 +0200	[thread overview]
Message-ID: <324b2ea5b95ef5233202aa8eed2085c665259753.1653390261.git.matias.vara@vates.fr> (raw)

This commit adds a new document describing the acquire resource interface. It
is a reference document.

Signed-off-by: Matias Ezequiel Vara Larsen <matias.vara@vates.fr>
---
RFC: The current document still contains TODOs. I am not really sure why
different resources are implemented differently. I would like to understand it
better so I can document it and then easily build new resources. I structured
the document in two sections but I am not sure if that is the right way to do
it.

---
 .../acquire_resource_reference.rst            | 337 ++++++++++++++++++
 docs/hypervisor-guide/index.rst               |   2 +
 2 files changed, 339 insertions(+)
 create mode 100644 docs/hypervisor-guide/acquire_resource_reference.rst

diff --git a/docs/hypervisor-guide/acquire_resource_reference.rst b/docs/hypervisor-guide/acquire_resource_reference.rst
new file mode 100644
index 0000000000..a9944aae1d
--- /dev/null
+++ b/docs/hypervisor-guide/acquire_resource_reference.rst
@@ -0,0 +1,337 @@
+.. SPDX-License-Identifier: CC-BY-4.0
+
+Acquire resource reference
+==========================
+
+The acquire resource interface allows a dom0 PV tool to share a resource with a
+domain. Resources are generally represented by pages that are mapped into the
+PV tool's address space. These pages are accessed by Xen and may or may not be
+accessed by the DomU itself. This document describes the API used to build PV
+tools, together with the software components required to create and expose a
+domain's resource. This is not a tutorial or a how-to guide; it merely
+describes the machinery that is already documented in the code itself.
+
+.. warning::
+
+    The code in this document may already be out of date; however, it should
+    still be enough to illustrate how the acquire resource interface works.
+
+
+PV tool API
+-----------
+
+This section describes the API that a PV tool uses to map a resource. The API
+is based on the following functions:
+
+* xenforeignmemory_open()
+
+* xenforeignmemory_resource_size()
+
+* xenforeignmemory_map_resource()
+
+* xenforeignmemory_unmap_resource()
+
+The ``xenforeignmemory_open()`` function returns the handle that is used by the
+rest of the functions:
+
+.. code-block:: c
+
+   fh = xenforeignmemory_open(NULL, 0);
+
+The ``xenforeignmemory_resource_size()`` function returns the size of a
+resource. For example, the following code gets the size of the
+``XENMEM_resource_vmtrace_buf`` resource:
+
+.. code-block:: c
+
+    rc = xenforeignmemory_resource_size(fh, domid, XENMEM_resource_vmtrace_buf, vcpu, &size);
+
+The size of the resource, in bytes, is returned in ``size``.
+
+The ``xenforeignmemory_map_resource()`` function maps a domain's resource. The
+function is declared as follows:
+
+.. code-block:: c
+
+    xenforeignmemory_resource_handle *xenforeignmemory_map_resource(
+        xenforeignmemory_handle *fmem, domid_t domid, unsigned int type,
+        unsigned int id, unsigned long frame, unsigned long nr_frames,
+        void **paddr, int prot, int flags);
+
+The size of the resource is expressed in number of frames (``nr_frames``). For
+example, **QEMU** uses this function to map the ioreq server pages shared
+between the domain and QEMU:
+
+.. code-block:: c
+
+    fres = xenforeignmemory_map_resource(xen_fmem, xen_domid, XENMEM_resource_ioreq_server,
+         state->ioservid, 0, 2, &addr, PROT_READ | PROT_WRITE, 0);
+
+
+The third parameter corresponds to the resource that is requested from the
+domain, e.g., ``XENMEM_resource_ioreq_server``. The seventh parameter is a
+pointer through which the address of the mapped resource is returned.
+
+Finally, the ``xenforeignmemory_unmap_resource()`` function unmaps the region:
+
+.. code-block:: c
+    :caption: tools/misc/xen-vmtrace.c
+
+    if ( fres && xenforeignmemory_unmap_resource(fh, fres) )
+        perror("xenforeignmemory_unmap_resource()");
+
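+Putting these functions together, the following is a minimal sketch of a PV
+tool that maps the vmtrace buffer of a guest, loosely modeled on
+``tools/misc/xen-vmtrace.c``. The domid and vcpu id are illustrative
+assumptions; a real tool would take them as arguments:
+
+.. code-block:: c
+
+    #include <stdio.h>
+    #include <sys/mman.h>
+    #include <xenctrl.h>          /* XC_PAGE_SHIFT */
+    #include <xenforeignmemory.h>
+
+    int main(void)
+    {
+        domid_t domid = 1;        /* assumed guest domain id */
+        unsigned int vcpu = 0;    /* assumed vcpu id */
+        size_t size;
+        void *addr = NULL;
+        xenforeignmemory_resource_handle *fres;
+
+        xenforeignmemory_handle *fh = xenforeignmemory_open(NULL, 0);
+        if ( !fh )
+            return 1;
+
+        if ( xenforeignmemory_resource_size(fh, domid,
+                                            XENMEM_resource_vmtrace_buf,
+                                            vcpu, &size) )
+            return 1;
+
+        /* The mapping is expressed in frames, while size is in bytes. */
+        fres = xenforeignmemory_map_resource(fh, domid,
+                                             XENMEM_resource_vmtrace_buf, vcpu,
+                                             0, size >> XC_PAGE_SHIFT, &addr,
+                                             PROT_READ, 0);
+        if ( !fres )
+            return 1;
+
+        printf("Mapped %zu bytes at %p\n", size, addr);
+
+        if ( xenforeignmemory_unmap_resource(fh, fres) )
+            perror("xenforeignmemory_unmap_resource()");
+
+        xenforeignmemory_close(fh);
+        return 0;
+    }
+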
+Sharing a resource with a PV tool
+---------------------------------
+
+This section describes how to build a new resource and share it with a PV
+tool. Resources are defined in ``xen/include/public/memory.h``. In Xen 4.16,
+there are three resources:
+
+.. code-block:: c
+    :caption: xen/include/public/memory.h
+
+    #define XENMEM_resource_ioreq_server 0
+    #define XENMEM_resource_grant_table 1
+    #define XENMEM_resource_vmtrace_buf 2
+
+The ``resource_max_frames()`` function returns the size of a resource in
+frames. Each resource may provide its own handler to compute the size. This is
+the definition of the ``resource_max_frames()`` function:
+
+.. code-block:: c
+    :linenos:
+    :caption: xen/common/memory.c
+
+    static unsigned int resource_max_frames(const struct domain *d,
+                                            unsigned int type, unsigned int id)
+    {
+        switch ( type )
+        {
+        case XENMEM_resource_grant_table:
+            return gnttab_resource_max_frames(d, id);
+
+        case XENMEM_resource_ioreq_server:
+            return ioreq_server_max_frames(d);
+
+        case XENMEM_resource_vmtrace_buf:
+            return d->vmtrace_size >> PAGE_SHIFT;
+
+        default:
+            return -EOPNOTSUPP;
+        }
+    }
+
+The ``_acquire_resource()`` function invokes the handler that maps the
+resource, using ``type`` to select the right one:
+
+.. code-block:: c
+    :linenos:
+    :caption: xen/common/memory.c
+
+    static int _acquire_resource(
+        struct domain *d, unsigned int type, unsigned int id, unsigned int frame,
+        unsigned int nr_frames, xen_pfn_t mfn_list[])
+    {
+        switch ( type )
+        {
+        case XENMEM_resource_grant_table:
+            return gnttab_acquire_resource(d, id, frame, nr_frames, mfn_list);
+
+        case XENMEM_resource_ioreq_server:
+            return acquire_ioreq_server(d, id, frame, nr_frames, mfn_list);
+
+        case XENMEM_resource_vmtrace_buf:
+            return acquire_vmtrace_buf(d, id, frame, nr_frames, mfn_list);
+
+        default:
+            return -EOPNOTSUPP;
+        }
+    }
+
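+For illustration, the sketch below shows how a hypothetical new resource,
+``XENMEM_resource_foo``, would be wired into both switches. The define and the
+``foo_max_frames()`` and ``acquire_foo()`` helpers are assumptions made up for
+this example, not existing Xen code:
+
+.. code-block:: c
+
+    /* Hypothetical define added to xen/include/public/memory.h: */
+    #define XENMEM_resource_foo 3
+
+    /* Hypothetical case added to resource_max_frames(): */
+    case XENMEM_resource_foo:
+        return foo_max_frames(d);
+
+    /* Hypothetical case added to _acquire_resource(): */
+    case XENMEM_resource_foo:
+        return acquire_foo(d, id, frame, nr_frames, mfn_list);
+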
+As the sketch above illustrates, adding a new resource requires modifying both
+functions. The per-resource handlers share a common declaration:
+
+.. code-block:: c
+    :linenos:
+    :caption: xen/common/memory.c
+
+    static int acquire_vmtrace_buf(
+        struct domain *d, unsigned int id, unsigned int frame,
+        unsigned int nr_frames, xen_pfn_t mfn_list[])
+    {
+
+The handler returns ``nr_frames`` MFNs through ``mfn_list[]``. For example, for
+the ``XENMEM_resource_vmtrace_buf`` resource, the handler is defined as
+follows:
+
+.. code-block:: c
+    :linenos:
+    :caption: xen/common/memory.c
+
+    static int acquire_vmtrace_buf(
+        struct domain *d, unsigned int id, unsigned int frame,
+        unsigned int nr_frames, xen_pfn_t mfn_list[])
+    {
+        const struct vcpu *v = domain_vcpu(d, id);
+        unsigned int i;
+        mfn_t mfn;
+
+        if ( !v )
+            return -ENOENT;
+
+        if ( !v->vmtrace.pg ||
+             (frame + nr_frames) > (d->vmtrace_size >> PAGE_SHIFT) )
+            return -EINVAL;
+
+        mfn = page_to_mfn(v->vmtrace.pg);
+
+        for ( i = 0; i < nr_frames; i++ )
+            mfn_list[i] = mfn_x(mfn) + frame + i;
+
+        return nr_frames;
+    }
+
+Note that the handler only returns MFNs for pages that have been previously
+allocated in ``vmtrace.pg``. The allocation of the resource happens during the
+instantiation of each vcpu. The pages are allocated from the domheap with the
+``MEMF_no_refcount`` flag:
+
+.. What do we require to set this flag?
+
+.. code-block:: c
+
+    v->vmtrace.pg = alloc_domheap_pages(d, get_order_from_bytes(d->vmtrace_size), MEMF_no_refcount);
+
+To access a page from the context of Xen, the page has to be mapped using:
+
+.. code-block:: c
+
+    va_page = __map_domain_page_global(page);
+
+The ``va_page`` pointer can then be used from the context of Xen. After
+allocation, the allocating function verifies the pages. For example, the
+following code is from ``vmtrace_alloc_buffer()``, which allocates the vmtrace
+pages for a given vcpu:
+
+.. Why is this verification required after allocation?
+
+.. code-block:: c
+
+    for ( i = 0; i < (d->vmtrace_size >> PAGE_SHIFT); i++ )
+        if ( unlikely(!get_page_and_type(&pg[i], d, PGT_writable_page)) )
+            /*
+             * The domain can't possibly know about this page yet, so failure
+             * here is a clear indication of something fishy going on.
+             */
+            goto refcnt_err;
+
+The allocated pages are released by first unmapping them with
+``unmap_domain_page_global()`` and then dropping the references that were taken
+on them, which frees the pages. Note that the releasing of these resources may
+vary depending on how they are allocated.
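+
+As an illustration of the release path, the following is a simplified sketch
+based on ``vmtrace_free_buffer()`` in ``xen/common/domain.c`` (the exact code
+may differ across versions):
+
+.. code-block:: c
+    :caption: xen/common/domain.c
+
+    static void vmtrace_free_buffer(struct vcpu *v)
+    {
+        const struct domain *d = v->domain;
+        struct page_info *pg = v->vmtrace.pg;
+        unsigned int i;
+
+        if ( !pg )
+            return;
+
+        v->vmtrace.pg = NULL;
+
+        for ( i = 0; i < (d->vmtrace_size >> PAGE_SHIFT); i++ )
+        {
+            /*
+             * Drop the allocation reference and the writable type
+             * reference taken by get_page_and_type() at allocation time.
+             */
+            put_page_alloc_ref(&pg[i]);
+            put_page_and_type(&pg[i]);
+        }
+    }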
+
+Acquire Resources
+-----------------
+
+This section briefly describes the resources that rely on the acquire resource
+interface. These resources are mapped by PV tools such as QEMU.
+
+Intel Processor Trace (IPT)
+```````````````````````````
+
+This resource is named ``XENMEM_resource_vmtrace_buf`` and its size in bytes is
+set in ``d->vmtrace_size``. It contains the traces that IPT generates for each
+vcpu. The pages are allocated during ``vcpu_create()`` and stored in the
+``vcpu`` structure defined in ``sched.h``:
+
+.. code-block:: c
+
+    struct {
+        struct page_info *pg; /* One contiguous allocation of d->vmtrace_size */
+    } vmtrace;
+
+During ``vcpu_create()``, ``pg`` is allocated from the domain heap:
+
+.. code-block:: c
+
+    pg = alloc_domheap_pages(d, get_order_from_bytes(d->vmtrace_size), MEMF_no_refcount);
+
+For a given vcpu, the address of the trace buffer is programmed into
+``MSR_RTIT_OUTPUT_BASE`` in ``vmx_restore_guest_msrs()``:
+
+.. code-block:: c
+    :caption: xen/arch/x86/hvm/vmx/vmx.c
+
+    wrmsrl(MSR_RTIT_OUTPUT_BASE, page_to_maddr(v->vmtrace.pg));
+
+The pages are released during vcpu teardown, following the release path
+sketched in the previous section.
+
+Grant Table
+```````````
+
+The grant tables are represented by the ``XENMEM_resource_grant_table``
+resource. Grant tables are special because guests can map their own grant
+tables, and dom0 also needs to write into a guest's grant table to set up the
+grants for xenstored and xenconsoled. When the resource is acquired, the pages
+are allocated from the Xen heap in ``gnttab_get_shared_frame_mfn()``:
+
+.. code-block:: c
+    :linenos:
+    :caption: xen/common/grant_table.c
+
+    gt->shared_raw[i] = alloc_xenheap_page();
+    share_xen_page_with_guest(virt_to_page(gt->shared_raw[i]), d, SHARE_rw);
+
+The pages are then shared with the guest, and their virtual addresses are
+converted to MFNs before returning:
+
+.. code-block:: c
+    :linenos:
+
+    for ( i = 0; i < nr_frames; ++i )
+         mfn_list[i] = virt_to_mfn(vaddrs[frame + i]);
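+
+The ``id`` parameter of the acquire interface selects which of the grant table
+frames is requested. The sub-resource ids are defined in the public header:
+
+.. code-block:: c
+    :caption: xen/include/public/memory.h
+
+    #define XENMEM_resource_grant_table_id_shared 0
+    #define XENMEM_resource_grant_table_id_status 1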
+
+Ioreq server
+````````````
+
+The ioreq server is represented by the ``XENMEM_resource_ioreq_server``
+resource. An ioreq server provides emulated devices to HVM and PVH guests. The
+allocation is done in ``ioreq_server_alloc_mfn()``. The following code shows
+part of the allocation of the pages that back the ioreq server:
+
+.. code-block:: c
+    :linenos:
+    :caption: xen/common/ioreq.c
+
+    page = alloc_domheap_page(s->target, MEMF_no_refcount);
+
+    iorp->va = __map_domain_page_global(page);
+    if ( !iorp->va )
+        goto fail;
+
+    iorp->page = page;
+    clear_page(iorp->va);
+    return 0;
+
+The function above is invoked from ``ioreq_server_get_frame()``, which is
+called from ``acquire_ioreq_server()``. When the resource is acquired, the
+allocated pages are returned as follows:
+
+.. code-block:: c
+
+    *mfn = page_to_mfn(s->bufioreq.page);
+
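+For context, the following is a simplified sketch of ``acquire_ioreq_server()``
+(checks such as the ioservid validation are omitted; the exact code may differ
+across versions):
+
+.. code-block:: c
+    :caption: xen/common/memory.c
+
+    static int acquire_ioreq_server(
+        struct domain *d, unsigned int id, unsigned int frame,
+        unsigned int nr_frames, xen_pfn_t mfn_list[])
+    {
+        unsigned int i;
+        int rc;
+
+        for ( i = 0; i < nr_frames; i++ )
+        {
+            mfn_t mfn;
+
+            /* Returns the MFN of the ioreq/bufioreq page for this frame. */
+            rc = ioreq_server_get_frame(d, id, frame + i, &mfn);
+            if ( rc )
+                return rc;
+
+            mfn_list[i] = mfn_x(mfn);
+        }
+
+        return nr_frames;
+    }
+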
+The ``ioreq_server_free_mfn()`` function releases the pages as follows:
+
+.. code-block:: c
+    :linenos:
+    :caption: xen/common/ioreq.c
+
+    unmap_domain_page_global(iorp->va);
+    iorp->va = NULL;
+
+    put_page_alloc_ref(page);
+    put_page_and_type(page);
+
+.. TODO: Why aren't unmap() and free() used instead?
diff --git a/docs/hypervisor-guide/index.rst b/docs/hypervisor-guide/index.rst
index e4393b0697..961a11525f 100644
--- a/docs/hypervisor-guide/index.rst
+++ b/docs/hypervisor-guide/index.rst
@@ -9,3 +9,5 @@ Hypervisor documentation
    code-coverage
 
    x86/index
+
+   acquire_resource_reference
-- 
2.25.1


