From: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
To: dev@dpdk.org, Maxime Coquelin <maxime.coquelin@redhat.com>,
	Tiwei Bie <tiwei.bie@intel.com>,
	Tetsuya Mukawa <mtetsuyah@gmail.com>,
	Thomas Monjalon <thomas@monjalon.net>
Cc: yliu@fridaylinux.org, Stefan Hajnoczi <stefanha@redhat.com>,
	Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Subject: [RFC] vhost: new rte_vhost API proposal
Date: Thu, 10 May 2018 15:22:53 +0200
Message-ID: <1525958573-184361-1-git-send-email-dariuszx.stojaczyk@intel.com>

rte_vhost has been confirmed not to work with some Virtio devices
(it is not vhost-user spec compliant, see details below), and fixing
it directly would require a large number of changes that would
completely break backwards compatibility. The new rte_virtio library
proposed here is intended to smooth out the transition. It exposes
a low-level API for implementing new Virtio drivers/targets. The
existing rte_vhost is to be refactored to use rte_virtio underneath,
and demanding drivers could then use rte_virtio directly.

rte_virtio would offer both vhost and virtio driver APIs. These two
share a lot of code for vhost-user handling or PCI access for
initiator/virtio-vhost-user (and possibly vDPA), so it makes little
sense to keep target and initiator code in separate libraries.
Of course, the APIs would be separate - only some parts of
the code would be shared.

rte_virtio intends to abstract away most vhost-user/virtio-vhost-user
specifics and to allow developers to implement Virtio targets/drivers
with ease. It calls user-provided callbacks once the proper device
initialization state has been reached, e.g. when memory mappings
have changed, virtqueues are ready to be processed, or features have
changed at runtime.
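
To make the callback model more concrete, here is a minimal sketch of how
an application might complete callbacks either inline or later from its own
event loop. It compiles standalone only because the library side is mocked:
the reduced struct definitions, `mock_in_flight`, `mock_dispatch()` and the
body of `rte_virtio_tgt_cb_complete()` are assumptions made for illustration,
not the proposed implementation. Only the type and function names prefixed
with rte_virtio come from this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-ins for the types proposed in this patch. */
struct rte_virtio_dev {
	uint64_t features;
};

struct rte_virtio_tgt_ops {
	void (*device_create)(struct rte_virtio_dev *vdev);
	void (*device_init)(struct rte_virtio_dev *vdev);
};

/* Mock library state (hypothetical): at most one callback in flight. */
static int mock_in_flight;

/* Mock of rte_virtio_tgt_cb_complete(): finishes the in-flight callback. */
static int
rte_virtio_tgt_cb_complete(struct rte_virtio_dev *vdev, int rc)
{
	assert(vdev != NULL);
	(void)rc; /* a real library would act on the result code */
	if (!mock_in_flight)
		return -EINVAL;
	mock_in_flight = 0;
	return 0;
}

/* Mock dispatcher (hypothetical): refuses to run a callback while busy. */
static int
mock_dispatch(void (*cb)(struct rte_virtio_dev *), struct rte_virtio_dev *vdev)
{
	if (mock_in_flight)
		return -EBUSY;
	if (cb == NULL)
		return 0; /* all callbacks are optional */
	mock_in_flight = 1;
	cb(vdev);
	return 0;
}

/* Application side: device whose device_init completion is still pending. */
static struct rte_virtio_dev *app_deferred;

/* Completes inline, before the handler returns. */
static void
app_device_create(struct rte_virtio_dev *vdev)
{
	rte_virtio_tgt_cb_complete(vdev, 0);
}

/* Defers completion, e.g. until worker threads have seen the new memory map. */
static void
app_device_init(struct rte_virtio_dev *vdev)
{
	app_deferred = vdev;
}

/* Run from the application's event loop; returns 1 if it completed an op. */
static int
app_poll(void)
{
	struct rte_virtio_dev *vdev = app_deferred;

	if (vdev == NULL)
		return 0;
	app_deferred = NULL;
	return rte_virtio_tgt_cb_complete(vdev, 0) == 0;
}

static const struct rte_virtio_tgt_ops app_ops = {
	.device_create = app_device_create,
	.device_init = app_device_init,
};
```

A synchronous application would always follow the `app_device_create`
pattern; an asynchronous one can return from the handler immediately and
call the completion function once its own state has settled.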

Compared to rte_vhost, this library additionally offers the following:
* The ability to start/stop particular queues, which is required
by the vhost-user spec. rte_vhost has already been confirmed
not to work with some Virtio devices which do not initialize
some of their management queues.
* Most callbacks are now asynchronous. This greatly simplifies
event handling for asynchronous applications and doesn't
make anything harder for synchronous ones.
* This is a low-level API. It doesn't have any vhost-net, nvme
or crypto references. These backend-specific libraries will
later be refactored to use *this* generic library underneath.
This implies that the library doesn't do any virtqueue processing;
it only delivers vring addresses to the user, who processes
the virtqueues themselves.
* The PCI/vhost-user transport is abstracted away.
* The API constrains how public functions can be called and how
internal data can change, so only minimal work is required
to ensure thread-safety. Possibly no mutexes are required at all.
* Full Virtio 1.0/vhost-user specification compliance.
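
As a rough illustration of the per-queue start/stop point above, the sketch
below shows one way an application could track which virtqueues it currently
owns, so that queues the driver never initializes are simply never polled.
The reduced struct definitions, `APP_MAX_QUEUES` and all `app_*` names are
hypothetical; the handler signatures follow the queue_start/queue_stop
callbacks proposed in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-ins for the types proposed in this patch. */
struct rte_virtio_dev {
	uint64_t features;
};

struct rte_virtio_vq {
	uint16_t size;
};

#define APP_MAX_QUEUES 8 /* arbitrary limit for this sketch */

/* Queues currently owned by the application; NULL marks a free slot. */
static struct rte_virtio_vq *app_running[APP_MAX_QUEUES];

/* queue_start handler: remember the vq so the poller begins servicing it. */
static void
app_queue_start(struct rte_virtio_dev *vdev, struct rte_virtio_vq *vq)
{
	size_t i;

	assert(vdev != NULL && vq != NULL);
	for (i = 0; i < APP_MAX_QUEUES; i++) {
		if (app_running[i] == NULL) {
			app_running[i] = vq;
			break;
		}
	}
	/* A real handler would finish with rte_virtio_tgt_cb_complete(). */
}

/* queue_stop handler: forget the vq; it must not be touched afterwards. */
static void
app_queue_stop(struct rte_virtio_dev *vdev, struct rte_virtio_vq *vq)
{
	size_t i;

	assert(vdev != NULL && vq != NULL);
	for (i = 0; i < APP_MAX_QUEUES; i++) {
		if (app_running[i] == vq)
			app_running[i] = NULL;
	}
}

/* One poller iteration: visits started queues only, returns how many. */
static unsigned int
app_poll_queues(void)
{
	unsigned int visited = 0;
	size_t i;

	for (i = 0; i < APP_MAX_QUEUES; i++) {
		if (app_running[i] != NULL) {
			/* Ring processing (desc/avail/used) would go here. */
			visited++;
		}
	}
	return visited;
}
```

Because the library is expected to serialize callbacks, a single-threaded
poller like this one would need no locking around `app_running`.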

This patch only introduces the API. Some additional functions
for vDPA might still be required, but everything present here
so far shouldn't need changing.

Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
---
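For illustration, the `len` contract documented for rte_virtio_gpa_to_vva()
in the header below (the length is clamped at a region boundary and the
caller retries from `gpa + *len`) can be exercised with a small standalone
sketch. The `sketch_*` types and functions are local, simplified copies
written for this example and are assumptions, not part of the proposed API.
In particular, a fixed two-entry region array replaces the flexible array
member.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified local copies of the memory table (hypothetical names). */
struct sketch_region {
	uint64_t guest_phys_addr;
	uint64_t host_user_addr;
	uint64_t size;
};

struct sketch_memory {
	uint32_t nregions;
	struct sketch_region regions[2];
};

/* Same lookup as in the header: clamps *len to the end of the region. */
static void *
sketch_gpa_to_vva(const struct sketch_memory *mem, uint64_t gpa, uint64_t *len)
{
	const struct sketch_region *r;
	uint64_t off, left;
	uint32_t i;

	assert(mem != NULL && len != NULL);
	for (i = 0; i < mem->nregions; i++) {
		r = &mem->regions[i];
		if (gpa >= r->guest_phys_addr &&
		    gpa - r->guest_phys_addr < r->size) {
			off = gpa - r->guest_phys_addr;
			left = r->size - off;
			if (*len > left)
				*len = left;
			return (void *)(uintptr_t)(r->host_user_addr + off);
		}
	}
	*len = 0;
	return NULL;
}

/* Copy `len` guest bytes, translating again at every region boundary. */
static int
sketch_copy_from_guest(const struct sketch_memory *mem, void *dst,
		       uint64_t gpa, uint64_t len)
{
	uint8_t *out = dst;
	uint64_t chunk;
	void *src;

	while (len > 0) {
		chunk = len;
		src = sketch_gpa_to_vva(mem, gpa, &chunk);
		if (src == NULL)
			return -1; /* hit an unmapped guest address */
		memcpy(out, src, (size_t)chunk);
		out += chunk;
		gpa += chunk;
		len -= chunk;
	}
	return 0;
}
```

The loop matters when two regions are adjacent in guest physical space but
mapped at unrelated host addresses: a single memcpy() across the boundary
would read the wrong host memory.
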
 lib/librte_virtio/rte_virtio.h | 245 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 245 insertions(+)
 create mode 100644 lib/librte_virtio/rte_virtio.h

diff --git a/lib/librte_virtio/rte_virtio.h b/lib/librte_virtio/rte_virtio.h
new file mode 100644
index 0000000..0203d5e
--- /dev/null
+++ b/lib/librte_virtio/rte_virtio.h
@@ -0,0 +1,245 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright (c) Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <linux/vhost.h>
+
+/** Single memory region. Both physically and virtually contiguous */
+struct rte_virtio_mem_region {
+ uint64_t guest_phys_addr;
+ uint64_t guest_user_addr;
+ uint64_t host_user_addr;
+ uint64_t size;
+ void *mmap_addr;
+ uint64_t mmap_size;
+ int fd;
+};
+
+struct rte_virtio_memory {
+ uint32_t nregions;
+ struct rte_virtio_mem_region regions[];
+};
+
+/**
+ * Vhost device created and managed by rte_virtio. Accessible via
+ * \c rte_virtio_tgt_ops callbacks. This is only a part of the real
+ * vhost device data. This struct is published just for inline vdev
+ * functions to access their data directly.
+ */
+struct rte_virtio_dev {
+ struct rte_virtio_memory *mem;
+ uint64_t features;
+};
+
+/**
+ * Virtqueue created and managed by rte_virtio. Accessible via
+ * \c rte_virtio_tgt_ops callbacks.
+ */
+struct rte_virtio_vq {
+ struct vring_desc *desc;
+ struct vring_avail *avail;
+ struct vring_used *used;
+ /* available only if F_LOG_ALL has been negotiated */
+ void *log;
+ uint16_t size;
+};
+
+/**
+ * Device/queue related callbacks, all optional. Provided callback
+ * parameters are guaranteed not to be NULL unless explicitly specified.
+ */
+struct rte_virtio_tgt_ops {
+ /** New initiator connected. */
+ void (*device_create)(struct rte_virtio_dev *vdev);
+ /**
+ * Device is ready to operate. vdev->mem is now available.
+ * This callback may be called multiple times as memory mappings
+ * can change dynamically. All queues are guaranteed to be stopped
+ * by now.
+ */
+ void (*device_init)(struct rte_virtio_dev *vdev);
+ /**
+ * Features have changed at runtime. Queues might still be running
+ * at this point.
+ */
+ void (*device_features_changed)(struct rte_virtio_dev *vdev);
+ /**
+ * Start processing vq. The `vq` is guaranteed not to be modified before
+ * `queue_stop` is called.
+ */
+ void (*queue_start)(struct rte_virtio_dev *vdev, struct rte_virtio_vq *vq);
+ /**
+ * Stop processing vq. It shouldn't be accessed after this callback
+ * completes (via tgt_cb_complete). This can be called prior to shutdown
+ * or before actions that require changing vhost device/vq state.
+ */
+ void (*queue_stop)(struct rte_virtio_dev *vdev, struct rte_virtio_vq *vq);
+ /** Device disconnected. All queues are guaranteed to be stopped by now */
+ void (*device_destroy)(struct rte_virtio_dev *vdev);
+ /**
+ * Custom message handler. `vdev` and `vq` can be NULL. This is called
+ * for backend-specific actions. The `id` should be prefixed by the
+ * backend name (net/crypto/scsi) and `ctx` is message-specific data
+ * that should be available until tgt_cb_complete is called.
+ */
+ void (*custom_msg)(struct rte_virtio_dev *vdev, struct rte_virtio_vq *vq,
+   char *id, void *ctx);
+
+ /**
+ * Interrupt handler, synchronous. If this callback is set to NULL,
+ * rte_virtio will hint the initiators not to send any interrupts.
+ */
+ void (*queue_kick)(struct rte_virtio_dev *vdev, struct rte_virtio_vq *vq);
+ /** Device config read, synchronous. */
+ int (*get_config)(struct rte_virtio_dev *vdev, uint8_t *config,
+  uint32_t config_len);
+ /** Device config changed by the driver, synchronous. */
+ int (*set_config)(struct rte_virtio_dev *vdev, uint8_t *config,
+  uint32_t offset, uint32_t len, uint32_t flags);
+};
+
+/**
+ * Registers a new vhost target accepting remote connections. Multiple
+ * transports are available. It is possible to create a Vhost-user
+ * Unix domain socket polling local connections or connect to a physical
+ * Virtio device and install an interrupt handler.
+ * \param trtype type of the transport used, e.g. "PCI", "PCI-vhost-user",
+ * "PCI-vDPA", "vhost-user".
+ * \param trid identifier of the device. For PCI this would be the BDF address,
+ * for vhost-user the socket name.
+ * \param trctx additional data for the specified transport. Can be NULL.
+ * \param tgt_ops callbacks to be called upon reaching specific initialization
+ * states.
+ * \param features supported Virtio features, to be negotiated with those
+ * of the driver. rte_virtio will append a couple of generic feature bits
+ * which are required by the Virtio spec. TODO list these features here
+ * \return 0 on success, negative errno otherwise
+ */
+int rte_virtio_tgt_register(char *trtype, char *trid, void *trctx,
+   struct rte_virtio_tgt_ops *tgt_ops,
+   uint64_t features);
+
+/**
+ * Finish async device tgt ops callback. Unless a tgt op has been documented
+ * as 'synchronous', this function must be called to complete the op handler.
+ * It can be called either before or after the op handler returns. rte_virtio
+ * won't call any other callback until the pending one has been completed.
+ * \param vdev vhost device
+ * \param rc 0 on success, negative errno otherwise.
+ */
+int rte_virtio_tgt_cb_complete(struct rte_virtio_dev *vdev, int rc);
+
+/**
+ * Unregisters a vhost target asynchronously.
+ * \param cb_fn callback to be called on finish
+ * \param cb_arg argument for \c cb_fn
+ */
+void rte_virtio_tgt_unregister(char *trid,
+      void (*cb_fn)(void *arg), void *cb_arg);
+
+/**
+ * Bypass F_IOMMU_PLATFORM and translate gpa directly.
+ * \param mem vhost device memory
+ * \param gpa guest physical address
+ * \param len length of the memory to translate (in bytes). If requested
+ * memory chunk crosses memory region boundary, the *len will be set to
+ * the remaining, maximum length of virtually contiguous memory. In such
+ * case the user will be required to call another gpa_to_vva(gpa + *len).
+ * \return vhost virtual address or NULL if requested `gpa` is not mapped.
+ */
+static inline void *
+rte_virtio_gpa_to_vva(struct rte_virtio_memory *mem, uint64_t gpa, uint64_t *len)
+{
+	struct rte_virtio_mem_region *r;
+	uint32_t i;
+
+	for (i = 0; i < mem->nregions; i++) {
+		r = &mem->regions[i];
+		if (gpa >= r->guest_phys_addr &&
+		    gpa <  r->guest_phys_addr + r->size) {
+
+			if (unlikely(*len > r->guest_phys_addr + r->size - gpa)) {
+				*len = r->guest_phys_addr + r->size - gpa;
+			}
+
+			return (void *)(uintptr_t)(gpa - r->guest_phys_addr +
+						   r->host_user_addr);
+		}
+	}
+	*len = 0;
+
+	return NULL;
+}
+
+/**
+ * Translate I/O virtual address to vhost address space.
+ * If F_IOMMU_PLATFORM has been negotiated, this might send a TLB miss
+ * and wait for the TLB update response. Otherwise `iova` is a guest
+ * physical address and `perm` is ignored.
+ * \param vdev vhost device
+ * \param vq virtqueue the translation is requested for
+ * \param iova I/O virtual address
+ * \param len length of the memory to translate (in bytes). If requested
+ * memory chunk crosses memory region boundary, the *len will be set to
+ * the remaining, maximum length of virtually contiguous memory. In such
+ * case the user will be required to call another iova_to_vva(iova + *len).
+ * \param perm VHOST_ACCESS_RO, VHOST_ACCESS_WO or VHOST_ACCESS_RW
+ * \return vhost virtual address or NULL if requested `iova` is not mapped
+ * or the `perm` doesn't match.
+ */
+static inline void *
+rte_virtio_iova_to_vva(struct rte_virtio_dev *vdev, struct rte_virtio_vq *vq,
+		       uint64_t iova, uint64_t *len, uint8_t perm)
+{
+	void *__vhost_iova_to_vva(struct rte_virtio_dev *vdev, struct rte_virtio_vq *vq,
+				  uint64_t iova, uint64_t *len, uint8_t perm);
+
+	if (!(vdev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))) {
+		return rte_virtio_gpa_to_vva(vdev->mem, iova, len);
+	}
+
+	return __vhost_iova_to_vva(vdev, vq, iova, len, perm);
+}
+
+/**
+ * Notify the driver about vq change. This is an eventfd_write for vhost-user
+ * or MMIO write for PCI devices.
+ */
+void rte_virtio_dev_call(struct rte_virtio_dev *vdev, struct rte_virtio_vq *vq);
+
+/**
+ * Notify the driver about device config change. This will result in \c
+ * rte_virtio_tgt_ops->get_config being called. This is an eventfd_write
+ * for vhost-user or MMIO write for PCI devices
+ */
+void rte_virtio_dev_cfg_call(struct rte_virtio_dev *vdev, struct rte_virtio_vq *vq);
+
-- 
2.7.4
