From: Wei Chen <Wei.Chen@arm.com>
To: Oleksandr <olekstysh@gmail.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>,
"will@kernel.org" <will@kernel.org>,
"julien.thierry.kdev@gmail.com" <julien.thierry.kdev@gmail.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
"jean-philippe@linaro.org" <jean-philippe@linaro.org>,
Julien Grall <julien@xen.org>,
Andre Przywara <Andre.Przywara@arm.com>,
Marc Zyngier <maz@kernel.org>,
Oleksandr Tyshchenko <Oleksandr_Tyshchenko@epam.com>,
nd <nd@arm.com>
Subject: RE: [Kvmtool] Some thoughts on using kvmtool Virtio for Xen
Date: Thu, 8 Jul 2021 06:51:42 +0000
Message-ID: <DB9PR08MB6857D8C481F55954C9DF72F29E199@DB9PR08MB6857.eurprd08.prod.outlook.com>
In-Reply-To: <17f02c54-4697-7aaa-6c6b-19c2bbeb169b@gmail.com>
Hi Oleksandr,
> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of
> Oleksandr
> Sent: July 6, 2021 20:07
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>; will@kernel.org;
> julien.thierry.kdev@gmail.com; kvm@vger.kernel.org; xen-
> devel@lists.xen.org; jean-philippe@linaro.org; Julien Grall
> <julien@xen.org>; Andre Przywara <Andre.Przywara@arm.com>; Marc Zyngier
> <maz@kernel.org>; Oleksandr Tyshchenko <Oleksandr_Tyshchenko@epam.com>
> Subject: Re: [Kvmtool] Some thoughts on using kvmtool Virtio for Xen
>
>
> Hello Wei,
>
>
> Sorry for the late response.
> And thanks for working in that direction and preparing the document.
>
>
> On 05.07.21 13:02, Wei Chen wrote:
> > Hi Stefano,
> >
> > Thanks for your comments.
> >
> >> -----Original Message-----
> >> From: Stefano Stabellini <sstabellini@kernel.org>
> >> Sent: June 30, 2021 8:43
> >> To: will@kernel.org; julien.thierry.kdev@gmail.com; Wei Chen
> >> <Wei.Chen@arm.com>
> >> Cc: kvm@vger.kernel.org; xen-devel@lists.xen.org; jean-
> philippe@linaro.org;
> >> Julien Grall <julien@xen.org>; Andre Przywara <Andre.Przywara@arm.com>;
> >> Marc Zyngier <maz@kernel.org>; Stefano Stabellini
> <sstabellini@kernel.org>;
> >> Oleksandr Tyshchenko <Oleksandr_Tyshchenko@epam.com>
> >> Subject: Re: [Kvmtool] Some thoughts on using kvmtool Virtio for Xen
> >>
> >> Hi Wei,
> >>
> >> Sorry for the late reply.
> >>
> >>
> >> On Tue, 15 Jun 2021, Wei Chen wrote:
> >>> Hi,
> >>>
> >>> I have some thoughts of using kvmtool Virtio implementation
> >>> for Xen. I copied my markdown file to this email. If you have
> >>> time, could you please help me review it?
> >>>
> >>> Any feedback is welcome!
> >>>
> >>> # Some thoughts on using kvmtool Virtio for Xen
> >>> ## Background
> >>>
> >>> The Xen community is working on adding VIRTIO capability to Xen, and
> >>> we are working on a VIRTIO backend for Xen. Apart from QEMU, which
> >>> supports virtio-net for x86 Xen, no VIRTIO backend currently supports
> >>> Xen. Because of the community's strong preference for an out-of-QEMU
> >>> solution, we want to find a lightweight VIRTIO backend to support Xen.
>
>
> Yes, having something lightweight to provide Virtio backends for at
> least the *main* devices (console, blk, net),
> which we could run on Xen without extra effort, would be really nice.
>
>
> >>>
> >>> We have an idea of utilizing the virtio implementation of kvmtool for
> >>> Xen. We know there was some agreement that kvmtool won't try to be a
> >>> full QEMU alternative, so we have written the two proposals below for
> >>> the communities to discuss in public:
> >>>
> >>> ## Proposals
> >>> ### 1. Introduce a new "dm-only" command
> >>> 1. Introduce a new "dm-only" command to provide a pure device model
> >>> mode. In this mode, kvmtool only handles IO requests. VM creation and
> >>> initialization will be bypassed.
> >>>
> >>> * We will rework the interface between the virtio code and the rest
> >>> of kvmtool to use just the minimal set of information. In the end,
> >>> there would be MMIO accesses and shared memory that control the device
> >>> model, so that could be abstracted to do away with any KVM specifics
> >>> at all. If this is workable, we will send the first set of patches to
> >>> introduce this interface and adapt the existing kvmtool to it. Later,
> >>> we can add Xen support on top of it.
> >>>
> >>> For Xen support, we will detect the presence of the Xen libraries and
> >>> also allow people to ignore them, as kvmtool does with optional
> >>> features like libz or libaio.
> >>>
> >>> Ideally, we want to move all code relying on Xen libraries to a set of
> >>> new files. In this case, these files would only be compiled when the
> >>> Xen libraries are detected. But if we can't decouple this code
> >>> completely, we may introduce a few #ifdefs to protect this code.
> >>>
> >>> If KVM or other VMMs do not need "dm-only" mode, or if "dm-only"
> >>> cannot work without the Xen libraries, we will make the "dm-only"
> >>> command depend on the presence of the Xen libraries.
> >>>
> >>> So a normal compile (without the Xen libraries installed) would create
> >>> a binary as close as possible to the current code, and only people
> >>> who have the Xen libraries installed would ever generate a
> >>> "dm-only"-capable kvmtool.
> >>>
> >>> ### 2. Abstract kvmtool virtio implementation as a library
> >>> 1. Add a kvmtool Makefile target to generate a virtio library. In this
> >>> scenario, not just Xen but any other project that wants to provide a
> >>> userspace virtio backend service can link to this virtio library.
> >>> These users would benefit from the VIRTIO implementation of kvmtool
> >>> and would participate in improvements, upgrades, and maintenance of
> >>> the VIRTIO library.
> >>>
> >>> * In this case, the Xen-specific code will not be upstreamed to the
> >>> kvmtool repo. It would then naturally be part of the xen repo, in
> >>> xen/tools, or maintained in another repo.
> >>>
> >>> We will have a completely separate VIRTIO backend for Xen, just
> >>> linking to kvmtool's VIRTIO library.
> >>>
> >>> * The main changes to kvmtool would be:
> >>> 1. We still need to rework the interface between the virtio code
> >>> and the rest of kvmtool, to abstract the whole virtio
> >>> implementation into a library.
> >>> 2. Modify the current build system to add a new virtio library
> >>> target.
> >>
> >>
> >> I don't really have a preference between the two.
> >>
> >> From my past experience with Xen enablement in QEMU, I can say that the
> >> Xen part of receiving IO emulation requests is actually pretty minimal.
>
> In general, both proposals sound good to me, probably with a little
> preference for #1, but I am not sure that I can see all pitfalls here.
>
>
> > Yes, we have done some prototyping, and the code for receiving Xen
> > IOREQs can be implemented in a separate new file without intruding into
> > the existing kvmtool.
> >
> > The point is that the device implementation calls the hypervisor
> > interfaces to handle these IOREQs, and each device's implementation is
> > currently tightly coupled to Linux-KVM. Without some abstraction work,
> > these adaptations can lead to more intrusive modifications.
> >
> >> See as a reference
> >> https://github.com/qemu/qemu/blob/13d5f87cc3b94bfccc501142df4a7b12fee3a6e7/hw/i386/xen/xen-hvm.c#L1163.
> >> The modifications to rework the internal interfaces that you listed
> >> below are far more "interesting" than the code necessary to receive
> >> emulation requests from Xen.
>
>
> +1
>
> >>
> > I'm glad to hear that : )
> >
> >> So it looks like option 1 would be less effort and fewer code changes
> >> overall to kvmtool. Option 2 is more work. The library could be nice to
> >> have, but then we would have to be very careful about the API/ABI,
> >> compatibility, etc.
> >>
> >> Will Deacon and Julien Thierry might have an opinion.
> >>
> >>
> > Looking forward to Will and Julien's comments.
> >
> >>> ## Reworking the interface is the common work for both proposals
> >>> **In kvmtool, one virtual device can be separated into three layers:**
> >>>
> >>> - A device type layer to provide an abstraction
> >>>   - Provide an interface to collect and store device configuration.
> >>>     Using the block device as an example, kvmtool uses disk_image to
> >>>     collect and store disk parameters like:
> >>>     - backend image format: raw, qcow or block device
> >>>     - backend block device or file image path
> >>>     - readonly, direct, etc.
> >>>   - Provide operations to interact with real backend devices or
> >>>     services:
> >>>     - provide backend device operations:
> >>>       - block device operations
> >>>       - raw image operations
> >>>       - qcow image operations
> >>> - Hypervisor interfaces
> >>>   - Guest memory mapping and unmapping interfaces
> >>>   - Virtual device register interface
> >>>     - MMIO/PIO space register
> >>>     - IRQ register
> >>>   - Virtual IRQ inject interface
> >>>   - Hypervisor eventfd interface
> >> The "hypervisor interfaces" are the ones that are most interesting as we
> >> need an alternative implementation for Xen for each of them. This is
> >> the part that was a bit more delicate when we added Xen support to QEMU.
> >> Especially the memory mapping and unmapping. All doable but we need
> >> proper abstractions.
> >>
> > Yes. If we use option #1, guest memory mapping and unmapping will be a
> > big change introduced in kvmtool. Since Linux-KVM guest memory in kvmtool
> > is flat-mapped in advance, it does not require dynamic guest memory
> > mapping and unmapping. A proper abstract interface can bridge this gap.
>
> The layer separation scheme looks reasonable to me at first sight.
> Agreed, the "Hypervisor interfaces" worry me the most, especially "Guest
> memory mapping and unmapping", which is completely different on Xen
> compared with KVM. If I am not mistaken, in the PoC the Virtio ring(s)
> are mapped once during device initialization and unmapped when the
> device is released, while the payload I/O buffers are mapped/unmapped at
> run-time ...
Yes, the current PoC works this way.
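
To make the two lifetimes concrete, here is a minimal sketch of that pattern, not the actual PoC code. `struct mapper`, `map_guest()`, `unmap_guest()` and the `vdev_*` helpers are hypothetical names invented for illustration, and guest memory is mocked with a plain byte array rather than mapped through any Xen library:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapper: a stand-in for whatever maps guest pages into the
 * backend. Guest RAM is mocked with a caller-supplied buffer. */
struct mapper {
    uint8_t *guest_ram;   /* mock guest memory */
    size_t   size;        /* size of the mock guest memory in bytes */
    int      live;        /* number of mappings currently alive */
    int      total;       /* number of mappings created so far */
};

/* Map [gpa, gpa + len) and return a host pointer, or NULL if out of range. */
static void *map_guest(struct mapper *m, uint64_t gpa, size_t len)
{
    if (gpa > m->size || len > m->size - gpa)
        return NULL;
    m->live++;
    m->total++;
    return m->guest_ram + gpa;
}

static void unmap_guest(struct mapper *m, void *va)
{
    if (va)
        m->live--;
}

/* Hypothetical device state: the ring mapping lives as long as the device. */
struct vdev {
    struct mapper *m;
    void          *ring;
};

/* The ring is mapped once, at device initialization. */
static int vdev_init(struct vdev *d, struct mapper *m,
                     uint64_t ring_gpa, size_t ring_len)
{
    d->m = m;
    d->ring = map_guest(m, ring_gpa, ring_len);
    return d->ring ? 0 : -1;
}

/* Each payload buffer is mapped and unmapped around a single request. */
static int vdev_handle_request(struct vdev *d, uint64_t buf_gpa,
                               size_t len, uint8_t fill)
{
    uint8_t *buf = map_guest(d->m, buf_gpa, len);

    if (!buf)
        return -1;
    for (size_t i = 0; i < len; i++)
        buf[i] = fill;
    unmap_guest(d->m, buf);
    return 0;
}

/* The ring is unmapped only when the device is released. */
static void vdev_exit(struct vdev *d)
{
    unmap_guest(d->m, d->ring);
    d->ring = NULL;
}
```

In this sketch only the ring mapping stays alive between requests, so the run-time cost is one map and one unmap per payload buffer.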
> If only we could map all memory in advance and just calculate virt addr
> at run-time like it was done for Kvm case in guest_flat_to_host(). What
> we would just need is to re-map memory once the guest memory layout is
> changed
Sorry, I am not sure what you mean by the guest memory layout changing here.
Do you mean guest memory hotplug? Ballooning?
> (fortunately, we have invalidate mapcache request to signal about that).
>
>
> FYI, I had a discussion with Julien on IRC regarding foreign memory
> mappings and possible improvements. The main problem today is that we
> need to steal a page from the backend domain memory in order to map a
> guest page into the backend address space, so if we decide to map all
> memory in advance and need to serve guest(s) with a lot of memory, we may
> run out of memory in the host very quickly (see XSA-300). So the idea is
> to try to map guest memory into some unused address space provided by the
> hypervisor and then hot-plugged, without charging real domain pages
> (everything not mapped into P2M could theoretically be treated as
> unused). I have already started investigating, but unfortunately had to
> postpone this due to project-related activities. I definitely plan to
> resume and create at least a PoC. This would simplify things, improve
> performance and eliminate the memory pressure in the host.
>
Yes, definitely. With these improvements, the gap between KVM and Xen
in guest memory mapping and unmapping can be reduced. At the very least, the
mapping/unmapping code embedded in the virtio device implementations
in our PoC would no longer be needed.
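
As a rough sketch of why that code could go away: if guest RAM is mapped once in advance, "mapping" a guest page reduces to address arithmetic and "unmapping" becomes a no-op, in the spirit of guest_flat_to_host(). Everything below is an illustration under assumptions. `struct flat_vmm`, `struct vmm_ops`, `flat_map()`, `flat_unmap()` and `copy_to_guest()` are hypothetical names, and the function signatures are guesses rather than the proposed `vmm_impl` definition:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical handle for a guest RAM region that was mapped once, up front. */
struct flat_vmm {
    uint64_t guest_base;  /* guest physical address where RAM starts */
    size_t   size;        /* size of the region in bytes */
    uint8_t *host_base;   /* host virtual address of the mapped region */
};

/* Ops table in the spirit of `struct vmm_impl`; signatures are assumptions. */
struct vmm_ops {
    void *vmm;
    void *(*map_guest_page)(void *vmm, uint64_t gpa, size_t len);
    void  (*unmap_guest_page)(void *vmm, void *va, size_t len);
};

/* With a flat mapping, map is a bounds check plus an offset calculation. */
static void *flat_map(void *vmm, uint64_t gpa, size_t len)
{
    struct flat_vmm *f = vmm;
    uint64_t off;

    if (gpa < f->guest_base)
        return NULL;
    off = gpa - f->guest_base;
    if (off > f->size || len > f->size - off)
        return NULL;
    return f->host_base + off;
}

/* Nothing to undo: the region stays mapped for the lifetime of the guest. */
static void flat_unmap(void *vmm, void *va, size_t len)
{
    (void)vmm;
    (void)va;
    (void)len;
}

/* Device-side helper: it only sees the ops table, not the hypervisor. */
static int copy_to_guest(struct vmm_ops *ops, uint64_t gpa,
                         const void *src, size_t len)
{
    uint8_t *dst = ops->map_guest_page(ops->vmm, gpa, len);

    if (!dst)
        return -1;
    memcpy(dst, src, len);
    ops->unmap_guest_page(ops->vmm, dst, len);
    return 0;
}
```

A backend that still has to map pages per request could fill the same two hooks with real map/unmap calls, and the device-side helper would stay unchanged.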
>
> >
> >>> - An implementation layer to handle guest IO requests.
> >>>   - Kvmtool provides virtual devices for the guest. Some virtual devices
> >>>     have two kinds of implementations:
> >>>     - VIRTIO implementation
> >>>     - Real hardware emulation
> >>>
> >>> For example, the kvmtool console has two kinds of implementations: virtio
> >>> console and 8250 serial. These implementations depend on device type
> >>> parameters to create devices, and depend on device type ops to forward
> >>> data from/to the real device. The implementation will invoke hypervisor
> >>> interfaces to map/unmap resources and notify the guest.
> >>>
> >>> In the current kvmtool code, the boundaries between these three layers
> >>> are relatively clear, but a few pieces of code are somewhat
> >>> interleaved, for example:
> >>> - In the virtio_blk__init(...) function, the code uses disk_image
> >>>   directly. This data is kvmtool specific. If we want to make the VIRTIO
> >>>   implementation hypervisor agnostic, such code should be moved
> >>>   elsewhere. Alternatively, we keep only the code from
> >>>   virtio_blk__init_one(...) in the virtio block implementation, and keep
> >>>   virtio_blk__init(...) in the kvmtool-specific code.
> >>>
> >>> However, in the current VIRTIO device creation and data handling
> >>> process, the device type and hypervisor API used are both exclusive to
> >>> kvmtool and KVM. If we want to use the current VIRTIO implementation
> >>> for other device models and hypervisors, it is unlikely to work
> >>> properly.
> >>>
> >>> So, the major work of reworking the interface is decoupling the VIRTIO
> >>> implementation from kvmtool and KVM.
> >>>
> >>> **Introduce some intermediate data structures to do the decoupling:**
> >>> 1. Introduce intermediate type data structures like `virtio_disk_type`,
> >>> `virtio_net_type`, `virtio_console_type`, etc. These data structures
> >>> will be the standard device type interfaces between the virtio device
> >>> implementation and the hypervisor. Using virtio_disk_type as an example:
> >>> ~~~~
> >>> struct virtio_disk_type {
> >>>     /*
> >>>      * Essential configuration for virtio block device can be got from
> >>>      * kvmtool disk_image. Other hypervisor device model also can use
> >>>      * this data structure to pass necessary parameters for creating
> >>>      * a virtio block device.
> >>>      */
> >>>     struct virtio_blk_cfg vblk_cfg;
> >>>     /*
> >>>      * Virtio block device MMIO address and IRQ line. These two members
> >>>      * are optional. If hypervisor provides allocate_mmio_space and
> >>>      * allocate_irq_line capability and device model doesn't set these
> >>>      * two fields, virtio block implementation will use hypervisor APIs
> >>>      * to allocate MMIO address and IRQ line. If these two fields are
> >>>      * configured, virtio block implementation will use them.
> >>>      */
> >>>     paddr_t addr;
> >>>     uint32_t irq;
> >>>     /*
> >>>      * In kvmtool, this ops will connect to disk_image APIs. Other
> >>>      * hypervisor device model should provide similar APIs for this
> >>>      * ops to interact with real backend device.
> >>>      */
> >>>     struct disk_type_ops {
> >>>         .read
> >>>         .write
> >>>         .flush
> >>>         .wait
> >>>         ...
> >>>     } ops;
> >>> };
> >>> ~~~~
> >>>
> >>> 2. Introduce an intermediate hypervisor data structure. This data
> >>> structure provides a set of standard hypervisor API interfaces. In the
> >>> virtio implementation, the KVM-specific APIs, like kvm_register_mmio,
> >>> will not be invoked directly. The virtio implementation will use these
> >>> interfaces to access hypervisor-specific APIs, for example
> >>> `struct vmm_impl`:
> >>> ~~~~
> >>> struct vmm_impl {
> >>> /*
> >>> * Pointer that links to the real hypervisor handle, like
> >>> * `struct kvm *kvm`. This pointer will be passed to the vmm ops.
> >>> */
> >>> void *vmm;
> >>> allocate_irq_line_fn_t(void* vmm, ...);
> >>> allocate_mmio_space_fn_t(void* vmm, ...);
> >>> register_mmio_fn_t(void* vmm, ...);
> >>> map_guest_page_fn_t(void* vmm, ...);
> >>> unmap_guest_page_fn_t(void* vmm, ...);
> >>> virtual_irq_inject_fn_t(void* vmm, ...);
> >>> };
> >>> ~~~~
> >> Are the map_guest_page and unmap_guest_page functions already called at
> >> the appropriate places for KVM?
> > As I mentioned above, KVM doesn't need to call map_guest_page and
> > unmap_guest_page dynamically while handling the IOREQ. These two
> > interfaces can point to NULL or to empty functions for KVM.
> >
> >> If not, the main issue is going to be adding the
> >> map_guest_page/unmap_guest_page calls to the virtio device
> >> implementations.
> >>
> > Yes, we can place them in the virtio device implementations and keep
> > them as NOP operations for KVM. Other VMMs can implement them as needed.
> >
> >>> 3. After decoupling from kvmtool, any hypervisor can use the standard
> >>> `vmm_impl` and `virtio_xxxx_type` interfaces to invoke the standard
> >>> virtio implementation interfaces to create virtio devices.
> >>> ~~~~
> >>> /* Prepare VMM interface */
> >>> struct vmm_impl *vmm = ...;
> >>> vmm->register_mmio_fn_t = kvm__register_mmio;
> >>> /* kvm__map_guest_page is a wrapper around guest_flat_to_host */
> >>> vmm->map_guest_page_fn_t = kvm__map_guest_page;
> >>> ...
> >>>
> >>> /* Prepare virtio_disk_type */
> >>> struct virtio_disk_type *vdisk_type = ...;
> >>> vdisk_type->vblk_cfg.capacity = disk_image->size / SECTOR_SIZE;
> >>> ...
> >>> vdisk_type->ops.read = disk_image__read;
> >>> vdisk_type->ops.write = disk_image__write;
> >>> ...
> >>>
> >>> /* Invoke VIRTIO implementation API to create a virtio block device */
> >>> virtio_blk__init_one(vmm, vdisk_type);
> >>> ~~~~
> >>>
> >>> VIRTIO block device simple flow before reworking interface:
> >>> https://drive.google.com/file/d/1k0Grd4RSuCmhKUPktHj9FRamEYrPCFkX/view?usp=sharing
> >>> ![image](https://drive.google.com/uc?export=view&id=1k0Grd4RSuCmhKUPktHj9FRamEYrPCFkX)
> >>> VIRTIO block device simple flow after reworking interface:
> >>> https://drive.google.com/file/d/1rMXRvulwlRO39juWf08Wgk3G1NZtG2nL/view?usp=sharing
> >>> ![image](https://drive.google.com/uc?export=view&id=1rMXRvulwlRO39juWf08Wgk3G1NZtG2nL)
>
> Could you please provide access to these documents if possible?
>
Can you access them through these two links?
https://drive.google.com/file/d/1rMXRvulwlRO39juWf08Wgk3G1NZtG2nL/view?usp=sharing
https://drive.google.com/file/d/1k0Grd4RSuCmhKUPktHj9FRamEYrPCFkX/view?usp=sharing
I am sorry, I had set the wrong sharing option for the second one!
>
> >>>
> >>> Thanks,
> >>> Wei Chen
> >>> IMPORTANT NOTICE: The contents of this email and any attachments are
> >> confidential and may also be privileged. If you are not the intended
> >> recipient, please notify the sender immediately and do not disclose the
> >> contents to any other person, use it for any purpose, or store or copy
> the
> >> information in any medium. Thank you.
>
> --
> Regards,
>
> Oleksandr Tyshchenko
>