All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yongji Xie <xieyongji@bytedance.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	"Jason Wang" <jasowang@redhat.com>,
	"Stefano Garzarella" <sgarzare@redhat.com>,
	"Parav Pandit" <parav@nvidia.com>,
	"Christoph Hellwig" <hch@infradead.org>,
	"Christian Brauner" <christian.brauner@canonical.com>,
	"Randy Dunlap" <rdunlap@infradead.org>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Al Viro" <viro@zeniv.linux.org.uk>,
	"Jens Axboe" <axboe@kernel.dk>,
	bcrl@kvack.org, "Jonathan Corbet" <corbet@lwn.net>,
	"Mika Penttilä" <mika.penttila@nextfour.com>,
	"Dan Carpenter" <dan.carpenter@oracle.com>,
	joro@8bytes.org, "Greg KH" <gregkh@linuxfoundation.org>,
	songmuchun@bytedance.com,
	virtualization <virtualization@lists.linux-foundation.org>,
	netdev@vger.kernel.org, kvm <kvm@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org, iommu@lists.linux-foundation.org,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: Re: [PATCH v8 10/10] Documentation: Add documentation for VDUSE
Date: Tue, 29 Jun 2021 13:43:11 +0800	[thread overview]
Message-ID: <CACycT3uxnQmXWsgmNVxQtiRhz1UXXTAJFY3OiAJqokbJH6ifMA@mail.gmail.com> (raw)
In-Reply-To: <YNSCH6l31zwPxBjL@stefanha-x1.localdomain>

On Mon, Jun 28, 2021 at 9:02 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Tue, Jun 15, 2021 at 10:13:31PM +0800, Xie Yongji wrote:
> > VDUSE (vDPA Device in Userspace) is a framework to support
> > implementing software-emulated vDPA devices in userspace. This
> > document is intended to clarify the VDUSE design and usage.
> >
> > Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
> > ---
> >  Documentation/userspace-api/index.rst |   1 +
> >  Documentation/userspace-api/vduse.rst | 222 ++++++++++++++++++++++++++++++++++
> >  2 files changed, 223 insertions(+)
> >  create mode 100644 Documentation/userspace-api/vduse.rst
> >
> > diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
> > index 0b5eefed027e..c432be070f67 100644
> > --- a/Documentation/userspace-api/index.rst
> > +++ b/Documentation/userspace-api/index.rst
> > @@ -27,6 +27,7 @@ place where this information is gathered.
> >     iommu
> >     media/index
> >     sysfs-platform_profile
> > +   vduse
> >
> >  .. only::  subproject and html
> >
> > diff --git a/Documentation/userspace-api/vduse.rst b/Documentation/userspace-api/vduse.rst
> > new file mode 100644
> > index 000000000000..2f9cd1a4e530
> > --- /dev/null
> > +++ b/Documentation/userspace-api/vduse.rst
> > @@ -0,0 +1,222 @@
> > +==================================
> > +VDUSE - "vDPA Device in Userspace"
> > +==================================
> > +
> > +vDPA (virtio data path acceleration) device is a device that uses a
> > +datapath which complies with the virtio specifications with vendor
> > +specific control path. vDPA devices can be both physically located on
> > +the hardware or emulated by software. VDUSE is a framework that makes it
> > +possible to implement software-emulated vDPA devices in userspace. And
> > +to make it simple, the emulated vDPA device's control path is handled in
> > +the kernel and only the data path is implemented in the userspace.
> > +
> > +Note that only virtio block device is supported by VDUSE framework now,
> > +which can reduce security risks when the userspace process that implements
> > +the data path is run by an unprivileged user. The Support for other device
> > +types can be added after the security issue is clarified or fixed in the future.
> > +
> > +Start/Stop VDUSE devices
> > +------------------------
> > +
> > +VDUSE devices are started as follows:
> > +
> > +1. Create a new VDUSE instance with ioctl(VDUSE_CREATE_DEV) on
> > +   /dev/vduse/control.
> > +
> > +2. Begin processing VDUSE messages from /dev/vduse/$NAME. The first
> > +   messages will arrive while attaching the VDUSE instance to vDPA bus.
> > +
> > +3. Send the VDPA_CMD_DEV_NEW netlink message to attach the VDUSE
> > +   instance to vDPA bus.
> > +
> > +VDUSE devices are stopped as follows:
> > +
> > +1. Send the VDPA_CMD_DEV_DEL netlink message to detach the VDUSE
> > +   instance from vDPA bus.
> > +
> > +2. Close the file descriptor referring to /dev/vduse/$NAME
> > +
> > +3. Destroy the VDUSE instance with ioctl(VDUSE_DESTROY_DEV) on
> > +   /dev/vduse/control
> > +
> > +The netlink messages metioned above can be sent via vdpa tool in iproute2
> > +or use the below sample codes:
> > +
> > +.. code-block:: c
> > +
> > +     static int netlink_add_vduse(const char *name, enum vdpa_command cmd)
> > +     {
> > +             struct nl_sock *nlsock;
> > +             struct nl_msg *msg;
> > +             int famid;
> > +
> > +             nlsock = nl_socket_alloc();
> > +             if (!nlsock)
> > +                     return -ENOMEM;
> > +
> > +             if (genl_connect(nlsock))
> > +                     goto free_sock;
> > +
> > +             famid = genl_ctrl_resolve(nlsock, VDPA_GENL_NAME);
> > +             if (famid < 0)
> > +                     goto close_sock;
> > +
> > +             msg = nlmsg_alloc();
> > +             if (!msg)
> > +                     goto close_sock;
> > +
> > +             if (!genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, famid, 0, 0, cmd, 0))
> > +                     goto nla_put_failure;
> > +
> > +             NLA_PUT_STRING(msg, VDPA_ATTR_DEV_NAME, name);
> > +             if (cmd == VDPA_CMD_DEV_NEW)
> > +                     NLA_PUT_STRING(msg, VDPA_ATTR_MGMTDEV_DEV_NAME, "vduse");
> > +
> > +             if (nl_send_sync(nlsock, msg))
> > +                     goto close_sock;
> > +
> > +             nl_close(nlsock);
> > +             nl_socket_free(nlsock);
> > +
> > +             return 0;
> > +     nla_put_failure:
> > +             nlmsg_free(msg);
> > +     close_sock:
> > +             nl_close(nlsock);
> > +     free_sock:
> > +             nl_socket_free(nlsock);
> > +             return -1;
> > +     }
> > +
> > +How VDUSE works
> > +---------------
> > +
> > +Since the emuldated vDPA device's control path is handled in the kernel,
>
> s/emuldated/emulated/
>

Will fix it.

> > +a message-based communication protocol and few types of control messages
> > +are introduced by VDUSE framework to make userspace be aware of the data
> > +path related changes:
> > +
> > +- VDUSE_GET_VQ_STATE: Get the state for virtqueue from userspace
> > +
> > +- VDUSE_START_DATAPLANE: Notify userspace to start the dataplane
> > +
> > +- VDUSE_STOP_DATAPLANE: Notify userspace to stop the dataplane
> > +
> > +- VDUSE_UPDATE_IOTLB: Notify userspace to update the memory mapping in device IOTLB
> > +
> > +Userspace needs to read()/write() on /dev/vduse/$NAME to receive/reply
> > +those control messages from/to VDUSE kernel module as follows:
> > +
> > +.. code-block:: c
> > +
> > +     static int vduse_message_handler(int dev_fd)
> > +     {
> > +             int len;
> > +             struct vduse_dev_request req;
> > +             struct vduse_dev_response resp;
> > +
> > +             len = read(dev_fd, &req, sizeof(req));
> > +             if (len != sizeof(req))
> > +                     return -1;
> > +
> > +             resp.request_id = req.request_id;
> > +
> > +             switch (req.type) {
> > +
> > +             /* handle different types of message */
> > +
> > +             }
> > +
> > +             if (req.flags & VDUSE_REQ_FLAGS_NO_REPLY)
> > +                     return 0;
> > +
> > +             len = write(dev_fd, &resp, sizeof(resp));
> > +             if (len != sizeof(resp))
> > +                     return -1;
> > +
> > +             return 0;
> > +     }
> > +
> > +After VDUSE_START_DATAPLANE messages is received, userspace should start the
> > +dataplane processing with the help of some ioctls on /dev/vduse/$NAME:
> > +
> > +- VDUSE_IOTLB_GET_FD: get the file descriptor to the first overlapped iova region.
> > +  Userspace can access this iova region by passing fd and corresponding size, offset,
> > +  perm to mmap(). For example:
> > +
> > +.. code-block:: c
> > +
> > +     static int perm_to_prot(uint8_t perm)
> > +     {
> > +             int prot = 0;
> > +
> > +             switch (perm) {
> > +             case VDUSE_ACCESS_WO:
> > +                     prot |= PROT_WRITE;
> > +                     break;
> > +             case VDUSE_ACCESS_RO:
> > +                     prot |= PROT_READ;
> > +                     break;
> > +             case VDUSE_ACCESS_RW:
> > +                     prot |= PROT_READ | PROT_WRITE;
> > +                     break;
> > +             }
> > +
> > +             return prot;
> > +     }
> > +
> > +     static void *iova_to_va(int dev_fd, uint64_t iova, uint64_t *len)
> > +     {
> > +             int fd;
> > +             void *addr;
> > +             size_t size;
> > +             struct vduse_iotlb_entry entry;
> > +
> > +             entry.start = iova;
> > +             entry.last = iova + 1;
>
> Why +1?
>
> I expected the request to include *len so that VDUSE can create a bounce
> buffer for the full iova range, if necessary.
>

The function is used to translate iova to va. And the *len is not
specified by the caller. Instead, it's used to tell the caller the
length of the contiguous iova region from the specified iova. And the
ioctl VDUSE_IOTLB_GET_FD will get the file descriptor to the first
overlapped iova region. So using iova + 1 should be enough here.

> > +             fd = ioctl(dev_fd, VDUSE_IOTLB_GET_FD, &entry);
> > +             if (fd < 0)
> > +                     return NULL;
> > +
> > +             size = entry.last - entry.start + 1;
> > +             *len = entry.last - iova + 1;
> > +             addr = mmap(0, size, perm_to_prot(entry.perm), MAP_SHARED,
> > +                         fd, entry.offset);
> > +             close(fd);
> > +             if (addr == MAP_FAILED)
> > +                     return NULL;
> > +
> > +             /* do something to cache this iova region */
>
> How is userspace expected to manage iotlb mmaps? When should munmap(2)
> be called?
>

The simple way is using a list to store the iotlb mappings. And we
should call the munmap(2) for the old mappings when VDUSE_UPDATE_IOTLB
or VDUSE_STOP_DATAPLANE message is received.

> Should userspace expect VDUSE_IOTLB_GET_FD to return a full chunk of
> guest RAM (e.g. multiple gigabytes) that can be cached permanently or
> will it return just enough pages to cover [start, last)?
>

It should return one iotlb mapping that covers [start, last). In
vhost-vdpa cases, it might be a full chunk of guest RAM. In
virtio-vdpa cases, it might be the whole bounce buffer or one coherent
mapping (produced by dma_alloc_coherent()).

> > +
> > +             return addr + iova - entry.start;
> > +     }
> > +
> > +- VDUSE_DEV_GET_FEATURES: Get the negotiated features
>
> Are these VIRTIO feature bits? Please explain how feature negotiation
> works. There must be a way for userspace to report the device's
> supported feature bits to the kernel.
>

Yes, these are VIRTIO feature bits. Userspace will specify the
device's supported feature bits when creating a new VDUSE device with
ioctl(VDUSE_CREATE_DEV).

> > +- VDUSE_DEV_UPDATE_CONFIG: Update the configuration space and inject a config interrupt
>
> Does this mean the contents of the configuration space are cached by
> VDUSE?

Yes, but the kernel will also store the same contents.

> The downside is that the userspace code cannot generate the
> contents on demand. Most devices doin't need to generate the contents
> on demand, so I think this is okay but I had expected a different
> interface:
>
> kernel->userspace VDUSE_DEV_GET_CONFIG
> userspace->kernel VDUSE_DEV_INJECT_CONFIG_IRQ
>

The problem is how to handle the failure of VDUSE_DEV_GET_CONFIG. We
will need lots of modification of virtio codes to support that. So to
make it simple, we choose this way:

userspace -> kernel VDUSE_DEV_SET_CONFIG
userspace -> kernel VDUSE_DEV_INJECT_CONFIG_IRQ

> I think you can leave it the way it is, but I wanted to mention this in
> case someone thinks it's important to support generating the contents of
> the configuration space on demand.
>

Sorry, I didn't get you here. Can't VDUSE_DEV_SET_CONFIG and
VDUSE_DEV_INJECT_CONFIG_IRQ achieve that?

> > +- VDUSE_VQ_GET_INFO: Get the specified virtqueue's metadata
> > +
> > +- VDUSE_VQ_SETUP_KICKFD: set the kickfd for virtqueue, this eventfd is used
> > +  by VDUSE kernel module to notify userspace to consume the vring.
> > +
> > +- VDUSE_INJECT_VQ_IRQ: inject an interrupt for specific virtqueue
>
> This information is useful but it's not enough to be able to implement a
> userspace device. Please provide more developer documentation or at
> least refer to uapi header files, published documents, etc that contain
> the details.

OK, I will try to add more details.

Thanks,
Yongji

WARNING: multiple messages have this Message-ID (diff)
From: Yongji Xie <xieyongji@bytedance.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: kvm <kvm@vger.kernel.org>, "Michael S. Tsirkin" <mst@redhat.com>,
	"Jason Wang" <jasowang@redhat.com>,
	virtualization <virtualization@lists.linux-foundation.org>,
	"Christian Brauner" <christian.brauner@canonical.com>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Christoph Hellwig" <hch@infradead.org>,
	"Dan Carpenter" <dan.carpenter@oracle.com>,
	"Stefano Garzarella" <sgarzare@redhat.com>,
	"Al Viro" <viro@zeniv.linux.org.uk>,
	songmuchun@bytedance.com, "Jens Axboe" <axboe@kernel.dk>,
	"Greg KH" <gregkh@linuxfoundation.org>,
	"Randy Dunlap" <rdunlap@infradead.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	iommu@lists.linux-foundation.org, bcrl@kvack.org,
	netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	"Mika Penttilä" <mika.penttila@nextfour.com>
Subject: Re: Re: [PATCH v8 10/10] Documentation: Add documentation for VDUSE
Date: Tue, 29 Jun 2021 13:43:11 +0800	[thread overview]
Message-ID: <CACycT3uxnQmXWsgmNVxQtiRhz1UXXTAJFY3OiAJqokbJH6ifMA@mail.gmail.com> (raw)
In-Reply-To: <YNSCH6l31zwPxBjL@stefanha-x1.localdomain>

On Mon, Jun 28, 2021 at 9:02 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Tue, Jun 15, 2021 at 10:13:31PM +0800, Xie Yongji wrote:
> > VDUSE (vDPA Device in Userspace) is a framework to support
> > implementing software-emulated vDPA devices in userspace. This
> > document is intended to clarify the VDUSE design and usage.
> >
> > Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
> > ---
> >  Documentation/userspace-api/index.rst |   1 +
> >  Documentation/userspace-api/vduse.rst | 222 ++++++++++++++++++++++++++++++++++
> >  2 files changed, 223 insertions(+)
> >  create mode 100644 Documentation/userspace-api/vduse.rst
> >
> > diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
> > index 0b5eefed027e..c432be070f67 100644
> > --- a/Documentation/userspace-api/index.rst
> > +++ b/Documentation/userspace-api/index.rst
> > @@ -27,6 +27,7 @@ place where this information is gathered.
> >     iommu
> >     media/index
> >     sysfs-platform_profile
> > +   vduse
> >
> >  .. only::  subproject and html
> >
> > diff --git a/Documentation/userspace-api/vduse.rst b/Documentation/userspace-api/vduse.rst
> > new file mode 100644
> > index 000000000000..2f9cd1a4e530
> > --- /dev/null
> > +++ b/Documentation/userspace-api/vduse.rst
> > @@ -0,0 +1,222 @@
> > +==================================
> > +VDUSE - "vDPA Device in Userspace"
> > +==================================
> > +
> > +vDPA (virtio data path acceleration) device is a device that uses a
> > +datapath which complies with the virtio specifications with vendor
> > +specific control path. vDPA devices can be both physically located on
> > +the hardware or emulated by software. VDUSE is a framework that makes it
> > +possible to implement software-emulated vDPA devices in userspace. And
> > +to make it simple, the emulated vDPA device's control path is handled in
> > +the kernel and only the data path is implemented in the userspace.
> > +
> > +Note that only virtio block device is supported by VDUSE framework now,
> > +which can reduce security risks when the userspace process that implements
> > +the data path is run by an unprivileged user. The Support for other device
> > +types can be added after the security issue is clarified or fixed in the future.
> > +
> > +Start/Stop VDUSE devices
> > +------------------------
> > +
> > +VDUSE devices are started as follows:
> > +
> > +1. Create a new VDUSE instance with ioctl(VDUSE_CREATE_DEV) on
> > +   /dev/vduse/control.
> > +
> > +2. Begin processing VDUSE messages from /dev/vduse/$NAME. The first
> > +   messages will arrive while attaching the VDUSE instance to vDPA bus.
> > +
> > +3. Send the VDPA_CMD_DEV_NEW netlink message to attach the VDUSE
> > +   instance to vDPA bus.
> > +
> > +VDUSE devices are stopped as follows:
> > +
> > +1. Send the VDPA_CMD_DEV_DEL netlink message to detach the VDUSE
> > +   instance from vDPA bus.
> > +
> > +2. Close the file descriptor referring to /dev/vduse/$NAME
> > +
> > +3. Destroy the VDUSE instance with ioctl(VDUSE_DESTROY_DEV) on
> > +   /dev/vduse/control
> > +
> > +The netlink messages metioned above can be sent via vdpa tool in iproute2
> > +or use the below sample codes:
> > +
> > +.. code-block:: c
> > +
> > +     static int netlink_add_vduse(const char *name, enum vdpa_command cmd)
> > +     {
> > +             struct nl_sock *nlsock;
> > +             struct nl_msg *msg;
> > +             int famid;
> > +
> > +             nlsock = nl_socket_alloc();
> > +             if (!nlsock)
> > +                     return -ENOMEM;
> > +
> > +             if (genl_connect(nlsock))
> > +                     goto free_sock;
> > +
> > +             famid = genl_ctrl_resolve(nlsock, VDPA_GENL_NAME);
> > +             if (famid < 0)
> > +                     goto close_sock;
> > +
> > +             msg = nlmsg_alloc();
> > +             if (!msg)
> > +                     goto close_sock;
> > +
> > +             if (!genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, famid, 0, 0, cmd, 0))
> > +                     goto nla_put_failure;
> > +
> > +             NLA_PUT_STRING(msg, VDPA_ATTR_DEV_NAME, name);
> > +             if (cmd == VDPA_CMD_DEV_NEW)
> > +                     NLA_PUT_STRING(msg, VDPA_ATTR_MGMTDEV_DEV_NAME, "vduse");
> > +
> > +             if (nl_send_sync(nlsock, msg))
> > +                     goto close_sock;
> > +
> > +             nl_close(nlsock);
> > +             nl_socket_free(nlsock);
> > +
> > +             return 0;
> > +     nla_put_failure:
> > +             nlmsg_free(msg);
> > +     close_sock:
> > +             nl_close(nlsock);
> > +     free_sock:
> > +             nl_socket_free(nlsock);
> > +             return -1;
> > +     }
> > +
> > +How VDUSE works
> > +---------------
> > +
> > +Since the emuldated vDPA device's control path is handled in the kernel,
>
> s/emuldated/emulated/
>

Will fix it.

> > +a message-based communication protocol and few types of control messages
> > +are introduced by VDUSE framework to make userspace be aware of the data
> > +path related changes:
> > +
> > +- VDUSE_GET_VQ_STATE: Get the state for virtqueue from userspace
> > +
> > +- VDUSE_START_DATAPLANE: Notify userspace to start the dataplane
> > +
> > +- VDUSE_STOP_DATAPLANE: Notify userspace to stop the dataplane
> > +
> > +- VDUSE_UPDATE_IOTLB: Notify userspace to update the memory mapping in device IOTLB
> > +
> > +Userspace needs to read()/write() on /dev/vduse/$NAME to receive/reply
> > +those control messages from/to VDUSE kernel module as follows:
> > +
> > +.. code-block:: c
> > +
> > +     static int vduse_message_handler(int dev_fd)
> > +     {
> > +             int len;
> > +             struct vduse_dev_request req;
> > +             struct vduse_dev_response resp;
> > +
> > +             len = read(dev_fd, &req, sizeof(req));
> > +             if (len != sizeof(req))
> > +                     return -1;
> > +
> > +             resp.request_id = req.request_id;
> > +
> > +             switch (req.type) {
> > +
> > +             /* handle different types of message */
> > +
> > +             }
> > +
> > +             if (req.flags & VDUSE_REQ_FLAGS_NO_REPLY)
> > +                     return 0;
> > +
> > +             len = write(dev_fd, &resp, sizeof(resp));
> > +             if (len != sizeof(resp))
> > +                     return -1;
> > +
> > +             return 0;
> > +     }
> > +
> > +After VDUSE_START_DATAPLANE messages is received, userspace should start the
> > +dataplane processing with the help of some ioctls on /dev/vduse/$NAME:
> > +
> > +- VDUSE_IOTLB_GET_FD: get the file descriptor to the first overlapped iova region.
> > +  Userspace can access this iova region by passing fd and corresponding size, offset,
> > +  perm to mmap(). For example:
> > +
> > +.. code-block:: c
> > +
> > +     static int perm_to_prot(uint8_t perm)
> > +     {
> > +             int prot = 0;
> > +
> > +             switch (perm) {
> > +             case VDUSE_ACCESS_WO:
> > +                     prot |= PROT_WRITE;
> > +                     break;
> > +             case VDUSE_ACCESS_RO:
> > +                     prot |= PROT_READ;
> > +                     break;
> > +             case VDUSE_ACCESS_RW:
> > +                     prot |= PROT_READ | PROT_WRITE;
> > +                     break;
> > +             }
> > +
> > +             return prot;
> > +     }
> > +
> > +     static void *iova_to_va(int dev_fd, uint64_t iova, uint64_t *len)
> > +     {
> > +             int fd;
> > +             void *addr;
> > +             size_t size;
> > +             struct vduse_iotlb_entry entry;
> > +
> > +             entry.start = iova;
> > +             entry.last = iova + 1;
>
> Why +1?
>
> I expected the request to include *len so that VDUSE can create a bounce
> buffer for the full iova range, if necessary.
>

The function is used to translate iova to va. And the *len is not
specified by the caller. Instead, it's used to tell the caller the
length of the contiguous iova region from the specified iova. And the
ioctl VDUSE_IOTLB_GET_FD will get the file descriptor to the first
overlapped iova region. So using iova + 1 should be enough here.

> > +             fd = ioctl(dev_fd, VDUSE_IOTLB_GET_FD, &entry);
> > +             if (fd < 0)
> > +                     return NULL;
> > +
> > +             size = entry.last - entry.start + 1;
> > +             *len = entry.last - iova + 1;
> > +             addr = mmap(0, size, perm_to_prot(entry.perm), MAP_SHARED,
> > +                         fd, entry.offset);
> > +             close(fd);
> > +             if (addr == MAP_FAILED)
> > +                     return NULL;
> > +
> > +             /* do something to cache this iova region */
>
> How is userspace expected to manage iotlb mmaps? When should munmap(2)
> be called?
>

The simple way is using a list to store the iotlb mappings. And we
should call the munmap(2) for the old mappings when VDUSE_UPDATE_IOTLB
or VDUSE_STOP_DATAPLANE message is received.

> Should userspace expect VDUSE_IOTLB_GET_FD to return a full chunk of
> guest RAM (e.g. multiple gigabytes) that can be cached permanently or
> will it return just enough pages to cover [start, last)?
>

It should return one iotlb mapping that covers [start, last). In
vhost-vdpa cases, it might be a full chunk of guest RAM. In
virtio-vdpa cases, it might be the whole bounce buffer or one coherent
mapping (produced by dma_alloc_coherent()).

> > +
> > +             return addr + iova - entry.start;
> > +     }
> > +
> > +- VDUSE_DEV_GET_FEATURES: Get the negotiated features
>
> Are these VIRTIO feature bits? Please explain how feature negotiation
> works. There must be a way for userspace to report the device's
> supported feature bits to the kernel.
>

Yes, these are VIRTIO feature bits. Userspace will specify the
device's supported feature bits when creating a new VDUSE device with
ioctl(VDUSE_CREATE_DEV).

> > +- VDUSE_DEV_UPDATE_CONFIG: Update the configuration space and inject a config interrupt
>
> Does this mean the contents of the configuration space are cached by
> VDUSE?

Yes, but the kernel will also store the same contents.

> The downside is that the userspace code cannot generate the
> contents on demand. Most devices doin't need to generate the contents
> on demand, so I think this is okay but I had expected a different
> interface:
>
> kernel->userspace VDUSE_DEV_GET_CONFIG
> userspace->kernel VDUSE_DEV_INJECT_CONFIG_IRQ
>

The problem is how to handle the failure of VDUSE_DEV_GET_CONFIG. We
will need lots of modification of virtio codes to support that. So to
make it simple, we choose this way:

userspace -> kernel VDUSE_DEV_SET_CONFIG
userspace -> kernel VDUSE_DEV_INJECT_CONFIG_IRQ

> I think you can leave it the way it is, but I wanted to mention this in
> case someone thinks it's important to support generating the contents of
> the configuration space on demand.
>

Sorry, I didn't get you here. Can't VDUSE_DEV_SET_CONFIG and
VDUSE_DEV_INJECT_CONFIG_IRQ achieve that?

> > +- VDUSE_VQ_GET_INFO: Get the specified virtqueue's metadata
> > +
> > +- VDUSE_VQ_SETUP_KICKFD: set the kickfd for virtqueue, this eventfd is used
> > +  by VDUSE kernel module to notify userspace to consume the vring.
> > +
> > +- VDUSE_INJECT_VQ_IRQ: inject an interrupt for specific virtqueue
>
> This information is useful but it's not enough to be able to implement a
> userspace device. Please provide more developer documentation or at
> least refer to uapi header files, published documents, etc that contain
> the details.

OK, I will try to add more details.

Thanks,
Yongji
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

  reply	other threads:[~2021-06-29  5:43 UTC|newest]

Thread overview: 193+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-15 14:13 [PATCH v8 00/10] Introduce VDUSE - vDPA Device in Userspace Xie Yongji
2021-06-15 14:13 ` Xie Yongji
2021-06-15 14:13 ` [PATCH v8 01/10] iova: Export alloc_iova_fast() and free_iova_fast(); Xie Yongji
2021-06-15 14:13   ` Xie Yongji
2021-06-15 14:13 ` [PATCH v8 02/10] file: Export receive_fd() to modules Xie Yongji
2021-06-15 14:13   ` Xie Yongji
2021-06-15 14:13 ` [PATCH v8 03/10] eventfd: Increase the recursion depth of eventfd_signal() Xie Yongji
2021-06-15 14:13   ` Xie Yongji
2021-06-17  8:33   ` He Zhe
2021-06-17  8:33     ` He Zhe
2021-06-17  8:33     ` He Zhe
2021-06-18  3:29     ` Yongji Xie
2021-06-18  3:29       ` Yongji Xie
2021-06-18  8:41       ` He Zhe
2021-06-18  8:41         ` He Zhe
2021-06-18  8:41         ` He Zhe
2021-06-18  8:44       ` [PATCH] eventfd: Enlarge recursion limit to allow vhost to work He Zhe
2021-06-18  8:44         ` He Zhe
2021-06-18  8:44         ` He Zhe
2021-07-03  8:31         ` Michael S. Tsirkin
2021-07-03  8:31           ` Michael S. Tsirkin
2021-07-03  8:31           ` Michael S. Tsirkin
2021-08-25  7:57         ` Yongji Xie
2021-08-25  7:57           ` Yongji Xie
2021-06-15 14:13 ` [PATCH v8 04/10] vhost-iotlb: Add an opaque pointer for vhost IOTLB Xie Yongji
2021-06-15 14:13   ` Xie Yongji
2021-06-15 14:13 ` [PATCH v8 05/10] vdpa: Add an opaque pointer for vdpa_config_ops.dma_map() Xie Yongji
2021-06-15 14:13   ` Xie Yongji
2021-06-15 14:13 ` [PATCH v8 06/10] vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap() Xie Yongji
2021-06-15 14:13   ` Xie Yongji
2021-06-15 14:13 ` [PATCH v8 07/10] vdpa: Support transferring virtual addressing during DMA mapping Xie Yongji
2021-06-15 14:13   ` Xie Yongji
2021-06-15 14:13 ` [PATCH v8 08/10] vduse: Implement an MMU-based IOMMU driver Xie Yongji
2021-06-15 14:13   ` Xie Yongji
2021-06-15 14:13 ` [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace Xie Yongji
2021-06-15 14:13   ` Xie Yongji
2021-06-21  9:13   ` Jason Wang
2021-06-21  9:13     ` Jason Wang
2021-06-21  9:13     ` Jason Wang
2021-06-21 10:41     ` Yongji Xie
2021-06-21 10:41       ` Yongji Xie
2021-06-22  5:06       ` Jason Wang
2021-06-22  5:06         ` Jason Wang
2021-06-22  5:06         ` Jason Wang
2021-06-22  7:22         ` Yongji Xie
2021-06-22  7:22           ` Yongji Xie
2021-06-22  7:49           ` Jason Wang
2021-06-22  7:49             ` Jason Wang
2021-06-22  7:49             ` Jason Wang
2021-06-22  8:14             ` Yongji Xie
2021-06-22  8:14               ` Yongji Xie
2021-06-23  3:30               ` Jason Wang
2021-06-23  3:30                 ` Jason Wang
2021-06-23  3:30                 ` Jason Wang
2021-06-23  5:50                 ` Yongji Xie
2021-06-23  5:50                   ` Yongji Xie
2021-06-24  3:34                   ` Jason Wang
2021-06-24  3:34                     ` Jason Wang
2021-06-24  3:34                     ` Jason Wang
2021-06-24  4:46                     ` Yongji Xie
2021-06-24  4:46                       ` Yongji Xie
2021-06-24  8:13                       ` Jason Wang
2021-06-24  8:13                         ` Jason Wang
2021-06-24  8:13                         ` Jason Wang
2021-06-24  9:16                         ` Yongji Xie
2021-06-24  9:16                           ` Yongji Xie
2021-06-25  3:08                           ` Jason Wang
2021-06-25  3:08                             ` Jason Wang
2021-06-25  3:08                             ` Jason Wang
2021-06-25  4:19                             ` Yongji Xie
2021-06-25  4:19                               ` Yongji Xie
2021-06-28  4:40                               ` Jason Wang
2021-06-28  4:40                                 ` Jason Wang
2021-06-28  4:40                                 ` Jason Wang
2021-06-29  2:26                                 ` Yongji Xie
2021-06-29  2:26                                   ` Yongji Xie
2021-06-29  3:29                                   ` Jason Wang
2021-06-29  3:29                                     ` Jason Wang
2021-06-29  3:29                                     ` Jason Wang
2021-06-29  3:56                                     ` Yongji Xie
2021-06-29  3:56                                       ` Yongji Xie
2021-06-29  4:03                                       ` Jason Wang
2021-06-29  4:03                                         ` Jason Wang
2021-06-29  4:03                                         ` Jason Wang
2021-06-24 14:46   ` Stefan Hajnoczi
2021-06-24 14:46     ` Stefan Hajnoczi
2021-06-24 14:46     ` Stefan Hajnoczi
2021-06-29  2:59     ` Yongji Xie
2021-06-29  2:59       ` Yongji Xie
2021-06-30  9:51       ` Stefan Hajnoczi
2021-06-30  9:51         ` Stefan Hajnoczi
2021-06-30  9:51         ` Stefan Hajnoczi
2021-07-01  6:50         ` Yongji Xie
2021-07-01  6:50           ` Yongji Xie
2021-07-01  7:55           ` Jason Wang
2021-07-01  7:55             ` Jason Wang
2021-07-01  7:55             ` Jason Wang
2021-07-01 10:26             ` Yongji Xie
2021-07-01 10:26               ` Yongji Xie
2021-07-02  3:25               ` Jason Wang
2021-07-02  3:25                 ` Jason Wang
2021-07-02  3:25                 ` Jason Wang
2021-07-07  8:52   ` Stefan Hajnoczi
2021-07-07  8:52     ` Stefan Hajnoczi
2021-07-07  8:52     ` Stefan Hajnoczi
2021-07-07  9:19     ` Yongji Xie
2021-07-07  9:19       ` Yongji Xie
2021-06-15 14:13 ` [PATCH v8 10/10] Documentation: Add documentation for VDUSE Xie Yongji
2021-06-15 14:13   ` Xie Yongji
2021-06-24 13:01   ` Stefan Hajnoczi
2021-06-24 13:01     ` Stefan Hajnoczi
2021-06-24 13:01     ` Stefan Hajnoczi
2021-06-29  5:43     ` Yongji Xie [this message]
2021-06-29  5:43       ` Yongji Xie
2021-06-30 10:06       ` Stefan Hajnoczi
2021-06-30 10:06         ` Stefan Hajnoczi
2021-06-30 10:06         ` Stefan Hajnoczi
2021-07-01 10:00         ` Yongji Xie
2021-07-01 10:00           ` Yongji Xie
2021-07-01 13:15           ` Stefan Hajnoczi
2021-07-01 13:15             ` Stefan Hajnoczi
2021-07-01 13:15             ` Stefan Hajnoczi
2021-07-04  9:49             ` Yongji Xie
2021-07-04  9:49               ` Yongji Xie
2021-07-05  3:36               ` Jason Wang
2021-07-05  3:36                 ` Jason Wang
2021-07-05  3:36                 ` Jason Wang
2021-07-05 12:49                 ` Stefan Hajnoczi
2021-07-05 12:49                   ` Stefan Hajnoczi
2021-07-05 12:49                   ` Stefan Hajnoczi
2021-07-06  2:34                   ` Jason Wang
2021-07-06  2:34                     ` Jason Wang
2021-07-06  2:34                     ` Jason Wang
2021-07-06 10:14                     ` Stefan Hajnoczi
2021-07-06 10:14                       ` Stefan Hajnoczi
2021-07-06 10:14                       ` Stefan Hajnoczi
     [not found]                       ` <CACGkMEs2HHbUfarum8uQ6wuXoDwLQUSXTsa-huJFiqr__4cwRg@mail.gmail.com>
     [not found]                         ` <YOSOsrQWySr0andk@stefanha-x1.localdomain>
     [not found]                           ` <100e6788-7fdf-1505-d69c-bc28a8bc7a78@redhat.com>
     [not found]                             ` <YOVr801d01YOPzLL@stefanha-x1.localdomain>
2021-07-07  9:24                               ` Jason Wang
2021-07-07  9:24                                 ` Jason Wang
2021-07-07  9:24                                 ` Jason Wang
2021-07-07 15:54                                 ` Stefan Hajnoczi
2021-07-07 15:54                                   ` Stefan Hajnoczi
2021-07-07 15:54                                   ` Stefan Hajnoczi
2021-07-08  4:17                                   ` Jason Wang
2021-07-08  4:17                                     ` Jason Wang
2021-07-08  4:17                                     ` Jason Wang
2021-07-08  9:06                                     ` Stefan Hajnoczi
2021-07-08  9:06                                       ` Stefan Hajnoczi
2021-07-08  9:06                                       ` Stefan Hajnoczi
2021-07-08 12:35                                       ` Yongji Xie
2021-07-08 12:35                                         ` Yongji Xie
2021-07-06  3:04                   ` Yongji Xie
2021-07-06  3:04                     ` Yongji Xie
2021-07-06 10:22                     ` Stefan Hajnoczi
2021-07-06 10:22                       ` Stefan Hajnoczi
2021-07-06 10:22                       ` Stefan Hajnoczi
2021-07-07  9:09                       ` Yongji Xie
2021-07-07  9:09                         ` Yongji Xie
2021-07-08  9:07                         ` Stefan Hajnoczi
2021-07-08  9:07                           ` Stefan Hajnoczi
2021-07-08  9:07                           ` Stefan Hajnoczi
2021-06-24 15:12 ` [PATCH v8 00/10] Introduce VDUSE - vDPA Device in Userspace Stefan Hajnoczi
2021-06-24 15:12   ` Stefan Hajnoczi
2021-06-24 15:12   ` Stefan Hajnoczi
2021-06-29  3:15   ` Yongji Xie
2021-06-29  3:15     ` Yongji Xie
2021-06-28 10:33 ` Liu Xiaodong
2021-06-28 10:33   ` Liu Xiaodong
2021-06-28 10:33   ` Liu Xiaodong
2021-06-28  4:35   ` Jason Wang
2021-06-28  4:35     ` Jason Wang
2021-06-28  4:35     ` Jason Wang
2021-06-28  5:54     ` Liu, Xiaodong
2021-06-28  5:54       ` Liu, Xiaodong
2021-06-28  5:54       ` Liu, Xiaodong
2021-06-29  4:10       ` Jason Wang
2021-06-29  4:10         ` Jason Wang
2021-06-29  4:10         ` Jason Wang
2021-06-29  7:56         ` Liu, Xiaodong
2021-06-29  7:56           ` Liu, Xiaodong
2021-06-29  7:56           ` Liu, Xiaodong
2021-06-29  8:14           ` Yongji Xie
2021-06-29  8:14             ` Yongji Xie
2021-06-28 10:32   ` Yongji Xie
2021-06-28 10:32     ` Yongji Xie
2021-06-28 10:32     ` Yongji Xie
2021-06-29  4:12     ` Jason Wang
2021-06-29  4:12       ` Jason Wang
2021-06-29  4:12       ` Jason Wang
2021-06-29  6:40       ` Yongji Xie
2021-06-29  6:40         ` Yongji Xie
2021-06-29  7:33         ` Jason Wang
2021-06-29  7:33           ` Jason Wang
2021-06-29  7:33           ` Jason Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CACycT3uxnQmXWsgmNVxQtiRhz1UXXTAJFY3OiAJqokbJH6ifMA@mail.gmail.com \
    --to=xieyongji@bytedance.com \
    --cc=axboe@kernel.dk \
    --cc=bcrl@kvack.org \
    --cc=christian.brauner@canonical.com \
    --cc=corbet@lwn.net \
    --cc=dan.carpenter@oracle.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hch@infradead.org \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jasowang@redhat.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mika.penttila@nextfour.com \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=parav@nvidia.com \
    --cc=rdunlap@infradead.org \
    --cc=sgarzare@redhat.com \
    --cc=songmuchun@bytedance.com \
    --cc=stefanha@redhat.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.