* [RFC 0/1] Use dmabufs for display updates instead of pixman
@ 2021-03-02  8:03 Vivek Kasireddy
  2021-03-02  8:03 ` [RFC 1/1] virtio-gpu: Use dmabuf for display updates if possible " Vivek Kasireddy
  ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Vivek Kasireddy @ 2021-03-02  8:03 UTC (permalink / raw)
To: qemu-devel
Cc: Daniel Vetter, Marc-André Lureau, Dongwon Kim, Vivek Kasireddy, Gerd Hoffmann

This is still a WIP/RFC patch that attempts to use dmabufs for display
updates with the help of the udmabuf driver instead of pixman. It is
posted to the ML to elicit feedback and start a discussion on whether
something like this would be useful, mainly for non-Virgl-rendered BOs
but also potentially in other cases.

This patch was tested to work OK with the Weston version from here:
https://gitlab.freedesktop.org/Vivek/weston/-/blob/virtgpu_dmabuf/libweston/backend-drm/drm-gbm.c#L282
and Qemu launched with these options:
qemu-system-x86_64 -m 8192m .... -device virtio-gpu-pci,max_outputs=1 -display gtk,gl=on ..... -object memory-backend-memfd,id=mem1,size=8192M -machine memory-backend=mem1

TODO:
- Use blob resources for getting meta-data such as modifier, format, etc.
- Test with Virgil-rendered BOs to see if this can be used in that case.

Considerations/Challenges:
- One of the main concerns with using dmabufs is how to synchronize access
  to them, and this use-case is no different. If the Guest is running Weston,
  it could use a maximum of 4 color buffers but uses only 2 by default,
  flipping between them if it is not sharing the FBs with other plugins while
  running with the drm backend. In this case, how do we make sure that Weston
  and the Qemu UI are not using the same buffer at any given time? This is
  complicated by the fact that the toolkits the Qemu UI uses do not seem to
  provide a way to wait for buffer events.
  For example, GTK apparently does not provide a way either to wait for
  "send done" events or to register a listener for the wl_buffer release
  events that native Wayland/Weston clients have access to.
- If we have Xorg running in the Guest, it gets even more interesting, as
  Xorg in some cases does frontbuffer rendering (uses DRM_IOCTL_MODE_DIRTYFB).
  The same challenge arises here as well: how to safely destroy or reuse the
  buffer in the Guest while it might still be in use on the Host.

Potential solutions to the above challenges include using synchronization
primitives such as fences/sync objects in the Qemu UI to determine when a
buffer/dmabuf has been consumed by the Host display server/compositor, and
holding up the vblank/flip-done event until that time. But this comes with
a performance concern, as the Guest would not be able to queue up another
flip until the previous one finishes. Other options include caching 2 or
more dmabufs and releasing the others, but this may not be feasible without
modifying the Guest display server/compositor to use all color buffers or
to create a new color buffer each time.

Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>

Vivek Kasireddy (1):
  virtio-gpu: Use dmabuf for display updates if possible instead of
    pixman

 hw/display/virtio-gpu.c        | 133 +++++++++++++++++++++++++++++++++
 include/hw/virtio/virtio-gpu.h |  12 +++
 2 files changed, 145 insertions(+)

-- 
2.26.2

^ permalink raw reply	[flat|nested] 13+ messages in thread
* [RFC 1/1] virtio-gpu: Use dmabuf for display updates if possible instead of pixman
  2021-03-02  8:03 [RFC 0/1] Use dmabufs for display updates instead of pixman Vivek Kasireddy
@ 2021-03-02  8:03 ` Vivek Kasireddy
  2021-03-02  8:29   ` Marc-André Lureau
  2021-03-02  8:21 ` [RFC 0/1] Use dmabufs for display updates " no-reply
  2021-03-02 12:03 ` Gerd Hoffmann
  2 siblings, 1 reply; 13+ messages in thread
From: Vivek Kasireddy @ 2021-03-02  8:03 UTC (permalink / raw)
To: qemu-devel; +Cc: Vivek Kasireddy

If a dmabuf can be created with the udmabuf driver for non-Virgil-rendered
resources, it should be preferred over pixman. If a dmabuf cannot be
created, we fall back to pixman.

The dmabuf create and release functions are inspired by similar
functions in vfio/display.c.

Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
---
 hw/display/virtio-gpu.c        | 133 +++++++++++++++++++++++++++++++++
 include/hw/virtio/virtio-gpu.h |  12 +++
 2 files changed, 145 insertions(+)

diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 2e4a9822b6..399d46eac3 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -18,6 +18,7 @@
 #include "trace.h"
 #include "sysemu/dma.h"
 #include "sysemu/sysemu.h"
+#include "exec/ramblock.h"
 #include "hw/virtio/virtio.h"
 #include "migration/qemu-file-types.h"
 #include "hw/virtio/virtio-gpu.h"
@@ -30,9 +31,14 @@
 #include "qemu/module.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
+#include "standard-headers/drm/drm_fourcc.h"
+#include <sys/ioctl.h>
+
+#include <linux/udmabuf.h>
 
 #define VIRTIO_GPU_VM_VERSION 1
 
+static int udmabuf_fd;
 static struct virtio_gpu_simple_resource*
 virtio_gpu_find_resource(VirtIOGPU *g, uint32_t resource_id);
 
@@ -519,6 +525,119 @@ static void virtio_unref_resource(pixman_image_t *image, void *data)
     pixman_image_unref(data);
 }
 
+static VGPUDMABuf *virtio_gpu_get_dmabuf(VirtIOGPU *g,
+                                         struct virtio_gpu_simple_resource *res)
+{
+    VGPUDMABuf *dmabuf;
+    RAMBlock *rb;
+    ram_addr_t offset;
+    struct udmabuf_create_list *create;
+    uint32_t modifier_hi, modifier_lo;
+    uint64_t modifier;
+    static uint64_t ids = 1;
+    int i, dmabuf_fd;
+
+    create = g_malloc0(sizeof(*create) +
+                       res->iov_cnt * sizeof (struct udmabuf_create_item));
+    if (!create)
+        return NULL;
+
+    create->count = res->iov_cnt;
+    create->flags = UDMABUF_FLAGS_CLOEXEC;
+    for (i = 0; i < res->iov_cnt; i++) {
+        rb = qemu_ram_block_from_host(res->iov[i].iov_base, false, &offset);
+        if (!rb || rb->fd < 0) {
+            g_free(create);
+            return NULL;
+        }
+
+        create->list[i].memfd = rb->fd;
+        create->list[i].__pad = 0;
+        create->list[i].offset = offset;
+        create->list[i].size = res->iov[i].iov_len;
+    }
+
+    dmabuf_fd = ioctl(udmabuf_fd, UDMABUF_CREATE_LIST, create);
+    if (dmabuf_fd < 0) {
+        g_free(create);
+        return NULL;
+    }
+
+    /* FIXME: We should get the modifier and format info with blob resources */
+    modifier_hi = fourcc_mod_code(INTEL, I915_FORMAT_MOD_X_TILED) >> 32;
+    modifier_lo = fourcc_mod_code(INTEL,I915_FORMAT_MOD_X_TILED) & 0xFFFFFFFF;
+    modifier = ((uint64_t)modifier_hi << 32) | modifier_lo;
+
+    dmabuf = g_new0(VGPUDMABuf, 1);
+    dmabuf->dmabuf_id = ids++;
+    dmabuf->buf.width = res->width;
+    dmabuf->buf.height = res->height;
+    dmabuf->buf.stride = pixman_image_get_stride(res->image);
+    dmabuf->buf.fourcc = DRM_FORMAT_XRGB8888;
+    dmabuf->buf.modifier = modifier;
+    dmabuf->buf.fd = dmabuf_fd;
+
+    QTAILQ_INSERT_HEAD(&g->dmabuf.bufs, dmabuf, next);
+    g_free(create);
+
+    return dmabuf;
+}
+
+static void virtio_gpu_free_one_dmabuf(VirtIOGPU *g, VGPUDMABuf *dmabuf,
+                                       struct virtio_gpu_scanout *scanout)
+{
+    QTAILQ_REMOVE(&g->dmabuf.bufs, dmabuf, next);
+    dpy_gl_release_dmabuf(scanout->con, &dmabuf->buf);
+
+    close(dmabuf->buf.fd);
+    g_free(dmabuf);
+}
+
+static void virtio_gpu_free_dmabufs(VirtIOGPU *g,
+                                    struct virtio_gpu_scanout *scanout)
+{
+    VGPUDMABuf *dmabuf, *tmp;
+    uint32_t keep = 1;
+
+    QTAILQ_FOREACH_SAFE(dmabuf, &g->dmabuf.bufs, next, tmp) {
+        if (keep > 0) {
+            keep--;
+            continue;
+        }
+        assert(dmabuf != g->dmabuf.primary);
+        virtio_gpu_free_one_dmabuf(g, dmabuf, scanout);
+    }
+}
+
+static int virtio_gpu_dmabuf_update(VirtIOGPU *g,
+                                    struct virtio_gpu_simple_resource *res,
+                                    struct virtio_gpu_scanout *scanout)
+{
+    VGPUDMABuf *primary;
+    bool free_bufs = false;
+
+    primary = virtio_gpu_get_dmabuf(g, res);
+    if (!primary) {
+        return -EINVAL;
+    }
+
+    if (g->dmabuf.primary != primary) {
+        g->dmabuf.primary = primary;
+        qemu_console_resize(scanout->con,
+                            primary->buf.width, primary->buf.height);
+        dpy_gl_scanout_dmabuf(scanout->con, &primary->buf);
+        free_bufs = true;
+    }
+
+    dpy_gl_update(scanout->con, 0, 0, primary->buf.width, primary->buf.height);
+
+    if (free_bufs) {
+        virtio_gpu_free_dmabufs(g, scanout);
+    }
+
+    return 0;
+}
+
 static void virtio_gpu_set_scanout(VirtIOGPU *g,
                                    struct virtio_gpu_ctrl_command *cmd)
 {
@@ -528,6 +647,7 @@ static void virtio_gpu_set_scanout(VirtIOGPU *g,
     uint32_t offset;
     int bpp;
     struct virtio_gpu_set_scanout ss;
+    int ret;
 
     VIRTIO_GPU_FILL_CMD(ss);
     virtio_gpu_bswap_32(&ss, sizeof(ss));
@@ -574,6 +694,12 @@ static void virtio_gpu_set_scanout(VirtIOGPU *g,
 
     scanout = &g->parent_obj.scanout[ss.scanout_id];
 
+    if (udmabuf_fd > 0) {
+        ret = virtio_gpu_dmabuf_update(g, res, scanout);
+        if (!ret)
+            return;
+    }
+
     format = pixman_image_get_format(res->image);
     bpp = DIV_ROUND_UP(PIXMAN_FORMAT_BPP(format), 8);
     offset = (ss.r.x * bpp) + ss.r.y * pixman_image_get_stride(res->image);
@@ -1139,6 +1265,13 @@ static void virtio_gpu_device_realize(DeviceState *qdev, Error **errp)
         return;
     }
 
+    udmabuf_fd = open("/dev/udmabuf", O_RDWR);
+    if (udmabuf_fd < 0) {
+        error_setg_errno(errp, errno,
+                         "udmabuf: failed to open vhost device");
+        return;
+    }
+
     g->ctrl_vq = virtio_get_queue(vdev, 0);
     g->cursor_vq = virtio_get_queue(vdev, 1);
     g->ctrl_bh = qemu_bh_new(virtio_gpu_ctrl_bh, g);
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index fae149235c..153f3364a9 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -131,6 +131,13 @@ struct VirtIOGPUBaseClass {
     DEFINE_PROP_UINT32("xres", _state, _conf.xres, 1024), \
     DEFINE_PROP_UINT32("yres", _state, _conf.yres, 768)
 
+typedef struct VGPUDMABuf {
+    QemuDmaBuf buf;
+    uint32_t x, y;
+    uint64_t dmabuf_id;
+    QTAILQ_ENTRY(VGPUDMABuf) next;
+} VGPUDMABuf;
+
 struct VirtIOGPU {
     VirtIOGPUBase parent_obj;
 
@@ -161,6 +168,11 @@ struct VirtIOGPU {
         uint32_t req_3d;
         uint32_t bytes_3d;
     } stats;
+
+    struct {
+        QTAILQ_HEAD(, VGPUDMABuf) bufs;
+        VGPUDMABuf *primary;
+    } dmabuf;
 };
 
 struct VhostUserGPU {
-- 
2.26.2

^ permalink raw reply related	[flat|nested] 13+ messages in thread
* Re: [RFC 1/1] virtio-gpu: Use dmabuf for display updates if possible instead of pixman 2021-03-02 8:03 ` [RFC 1/1] virtio-gpu: Use dmabuf for display updates if possible " Vivek Kasireddy @ 2021-03-02 8:29 ` Marc-André Lureau 2021-03-09 8:26 ` Kasireddy, Vivek 0 siblings, 1 reply; 13+ messages in thread From: Marc-André Lureau @ 2021-03-02 8:29 UTC (permalink / raw) To: QEMU; +Cc: Vivek Kasireddy, Gerd Hoffmann [-- Attachment #1: Type: text/plain, Size: 8287 bytes --] Hi (adding the maintainer, Gerd, in CC) On Tue, Mar 2, 2021 at 12:17 PM Vivek Kasireddy <vivek.kasireddy@intel.com> wrote: > If a dmabuf can be created using Udmabuf driver for non-Virgil rendered > resources, then this should be preferred over pixman. If a dmabuf cannot > be created then we can fallback to pixman. > > The dmabuf create and release functions are inspired by similar > functions in vfio/display.c. > > Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> > --- > hw/display/virtio-gpu.c | 133 +++++++++++++++++++++++++++++++++ > include/hw/virtio/virtio-gpu.h | 12 +++ > 2 files changed, 145 insertions(+) > > diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c > index 2e4a9822b6..399d46eac3 100644 > --- a/hw/display/virtio-gpu.c > +++ b/hw/display/virtio-gpu.c > @@ -18,6 +18,7 @@ > #include "trace.h" > #include "sysemu/dma.h" > #include "sysemu/sysemu.h" > +#include "exec/ramblock.h" > #include "hw/virtio/virtio.h" > #include "migration/qemu-file-types.h" > #include "hw/virtio/virtio-gpu.h" > @@ -30,9 +31,14 @@ > #include "qemu/module.h" > #include "qapi/error.h" > #include "qemu/error-report.h" > +#include "standard-headers/drm/drm_fourcc.h" > +#include <sys/ioctl.h> > + > +#include <linux/udmabuf.h> > > #define VIRTIO_GPU_VM_VERSION 1 > > +static int udmabuf_fd; > static struct virtio_gpu_simple_resource* > virtio_gpu_find_resource(VirtIOGPU *g, uint32_t resource_id); > > @@ -519,6 +525,119 @@ static void virtio_unref_resource(pixman_image_t > *image, void *data) > 
pixman_image_unref(data); > } > > +static VGPUDMABuf *virtio_gpu_get_dmabuf(VirtIOGPU *g, > + struct > virtio_gpu_simple_resource *res) > +{ > + VGPUDMABuf *dmabuf; > + RAMBlock *rb; > + ram_addr_t offset; > + struct udmabuf_create_list *create; > + uint32_t modifier_hi, modifier_lo; > + uint64_t modifier; > + static uint64_t ids = 1; > + int i, dmabuf_fd; > + > + create = g_malloc0(sizeof(*create) + > + res->iov_cnt * sizeof (struct > udmabuf_create_item)); > + if (!create) > + return NULL; > Pointless allocation check (g_malloc will abort if it failed to allocate) + > + create->count = res->iov_cnt; > + create->flags = UDMABUF_FLAGS_CLOEXEC; > + for (i = 0; i < res->iov_cnt; i++) { > + rb = qemu_ram_block_from_host(res->iov[i].iov_base, false, > &offset); > + if (!rb || rb->fd < 0) { > + g_free(create); > + return NULL; > + } > + > + create->list[i].memfd = rb->fd; > + create->list[i].__pad = 0; > + create->list[i].offset = offset; > + create->list[i].size = res->iov[i].iov_len; > + } > + > + dmabuf_fd = ioctl(udmabuf_fd, UDMABUF_CREATE_LIST, create); > + if (dmabuf_fd < 0) { > It would be useful to print the error with errno. > + g_free(create); > (we now like new code to use g_auto) + return NULL; > + } > + > + /* FIXME: We should get the modifier and format info with blob > resources */ > + modifier_hi = fourcc_mod_code(INTEL, I915_FORMAT_MOD_X_TILED) >> 32; > + modifier_lo = fourcc_mod_code(INTEL,I915_FORMAT_MOD_X_TILED) & > 0xFFFFFFFF; > I have no idea if this format is going to work with various drivers and matches the underlying memory representation. 
> + modifier = ((uint64_t)modifier_hi << 32) | modifier_lo; > + > + dmabuf = g_new0(VGPUDMABuf, 1); > + dmabuf->dmabuf_id = ids++; > + dmabuf->buf.width = res->width; > + dmabuf->buf.height = res->height; > + dmabuf->buf.stride = pixman_image_get_stride(res->image); > + dmabuf->buf.fourcc = DRM_FORMAT_XRGB8888; > + dmabuf->buf.modifier = modifier; > + dmabuf->buf.fd = dmabuf_fd; > + > + QTAILQ_INSERT_HEAD(&g->dmabuf.bufs, dmabuf, next); > + g_free(create); > + > + return dmabuf; > +} > + > +static void virtio_gpu_free_one_dmabuf(VirtIOGPU *g, VGPUDMABuf *dmabuf, > + struct virtio_gpu_scanout *scanout) > +{ > + QTAILQ_REMOVE(&g->dmabuf.bufs, dmabuf, next); > + dpy_gl_release_dmabuf(scanout->con, &dmabuf->buf); > + > + close(dmabuf->buf.fd); > + g_free(dmabuf); > +} > + > +static void virtio_gpu_free_dmabufs(VirtIOGPU *g, > + struct virtio_gpu_scanout *scanout) > +{ > + VGPUDMABuf *dmabuf, *tmp; > + uint32_t keep = 1; > + > + QTAILQ_FOREACH_SAFE(dmabuf, &g->dmabuf.bufs, next, tmp) { > + if (keep > 0) { > + keep--; > + continue; > + } > + assert(dmabuf != g->dmabuf.primary); > + virtio_gpu_free_one_dmabuf(g, dmabuf, scanout); > + } > +} > + > +static int virtio_gpu_dmabuf_update(VirtIOGPU *g, > + struct virtio_gpu_simple_resource > *res, > + struct virtio_gpu_scanout *scanout) > +{ > + VGPUDMABuf *primary; > + bool free_bufs = false; > + > + primary = virtio_gpu_get_dmabuf(g, res); > + if (!primary) { > + return -EINVAL; > + } > + > + if (g->dmabuf.primary != primary) { > + g->dmabuf.primary = primary; > + qemu_console_resize(scanout->con, > + primary->buf.width, primary->buf.height); > + dpy_gl_scanout_dmabuf(scanout->con, &primary->buf); > + free_bufs = true; > + } > + > + dpy_gl_update(scanout->con, 0, 0, primary->buf.width, > primary->buf.height); > + > + if (free_bufs) { > + virtio_gpu_free_dmabufs(g, scanout); > + } > + > + return 0; > +} > + > static void virtio_gpu_set_scanout(VirtIOGPU *g, > struct virtio_gpu_ctrl_command *cmd) > { > @@ -528,6 +647,7 @@ 
static void virtio_gpu_set_scanout(VirtIOGPU *g, > uint32_t offset; > int bpp; > struct virtio_gpu_set_scanout ss; > + int ret; > > VIRTIO_GPU_FILL_CMD(ss); > virtio_gpu_bswap_32(&ss, sizeof(ss)); > @@ -574,6 +694,12 @@ static void virtio_gpu_set_scanout(VirtIOGPU *g, > > scanout = &g->parent_obj.scanout[ss.scanout_id]; > > + if (udmabuf_fd > 0) { > + ret = virtio_gpu_dmabuf_update(g, res, scanout); > + if (!ret) > + return; > + } > + > format = pixman_image_get_format(res->image); > bpp = DIV_ROUND_UP(PIXMAN_FORMAT_BPP(format), 8); > offset = (ss.r.x * bpp) + ss.r.y * > pixman_image_get_stride(res->image); > @@ -1139,6 +1265,13 @@ static void virtio_gpu_device_realize(DeviceState > *qdev, Error **errp) > return; > } > > + udmabuf_fd = open("/dev/udmabuf", O_RDWR); > If udmabuf_fd is already opened, this will leak fds. > + if (udmabuf_fd < 0) { > + error_setg_errno(errp, errno, > + "udmabuf: failed to open vhost device"); > + return; > The fallback path should keep working + } > + > g->ctrl_vq = virtio_get_queue(vdev, 0); > g->cursor_vq = virtio_get_queue(vdev, 1); > g->ctrl_bh = qemu_bh_new(virtio_gpu_ctrl_bh, g); > diff --git a/include/hw/virtio/virtio-gpu.h > b/include/hw/virtio/virtio-gpu.h > index fae149235c..153f3364a9 100644 > --- a/include/hw/virtio/virtio-gpu.h > +++ b/include/hw/virtio/virtio-gpu.h > @@ -131,6 +131,13 @@ struct VirtIOGPUBaseClass { > DEFINE_PROP_UINT32("xres", _state, _conf.xres, 1024), \ > DEFINE_PROP_UINT32("yres", _state, _conf.yres, 768) > > +typedef struct VGPUDMABuf { > + QemuDmaBuf buf; > + uint32_t x, y; > + uint64_t dmabuf_id; > + QTAILQ_ENTRY(VGPUDMABuf) next; > +} VGPUDMABuf; > + > struct VirtIOGPU { > VirtIOGPUBase parent_obj; > > @@ -161,6 +168,11 @@ struct VirtIOGPU { > uint32_t req_3d; > uint32_t bytes_3d; > } stats; > + > + struct { > + QTAILQ_HEAD(, VGPUDMABuf) bufs; > + VGPUDMABuf *primary; > + } dmabuf; > }; > > struct VhostUserGPU { > -- > 2.26.2 > > > It would be worth doing a similar change in vhost-user-gpu. 
-- Marc-André Lureau [-- Attachment #2: Type: text/html, Size: 11092 bytes --] ^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: [RFC 1/1] virtio-gpu: Use dmabuf for display updates if possible instead of pixman
  2021-03-02  8:29 ` Marc-André Lureau
@ 2021-03-09  8:26   ` Kasireddy, Vivek
  0 siblings, 0 replies; 13+ messages in thread
From: Kasireddy, Vivek @ 2021-03-09  8:26 UTC (permalink / raw)
To: Marc-André Lureau, QEMU; +Cc: Gerd Hoffmann, Kim, Dongwon

[-- Attachment #1: Type: text/plain, Size: 2831 bytes --]

Hi Marc-André,

<snip>

+    create = g_malloc0(sizeof(*create) +
+                       res->iov_cnt * sizeof (struct udmabuf_create_item));
+    if (!create)
+        return NULL;

Pointless allocation check (g_malloc will abort if it failed to allocate)

[Kasireddy, Vivek] Ok, will remove it in v2.

+    create->count = res->iov_cnt;
+    create->flags = UDMABUF_FLAGS_CLOEXEC;
+    for (i = 0; i < res->iov_cnt; i++) {
+        rb = qemu_ram_block_from_host(res->iov[i].iov_base, false, &offset);
+        if (!rb || rb->fd < 0) {
+            g_free(create);
+            return NULL;
+        }
+
+        create->list[i].memfd = rb->fd;
+        create->list[i].__pad = 0;
+        create->list[i].offset = offset;
+        create->list[i].size = res->iov[i].iov_len;
+    }
+
+    dmabuf_fd = ioctl(udmabuf_fd, UDMABUF_CREATE_LIST, create);
+    if (dmabuf_fd < 0) {

It would be useful to print the error with errno.

[Kasireddy, Vivek] Sure.

+        g_free(create);

(we now like new code to use g_auto)

+        return NULL;
+    }
+
+    /* FIXME: We should get the modifier and format info with blob resources */
+    modifier_hi = fourcc_mod_code(INTEL, I915_FORMAT_MOD_X_TILED) >> 32;
+    modifier_lo = fourcc_mod_code(INTEL,I915_FORMAT_MOD_X_TILED) & 0xFFFFFFFF;

I have no idea if this format is going to work with various drivers and
matches the underlying memory representation.

[Kasireddy, Vivek] No, it won't work with other drivers. The modifier
information should be obtained from the Guest that allocates the buffers.
I just added it as a temporary hack for testing with an Intel GPU.

+    modifier = ((uint64_t)modifier_hi << 32) | modifier_lo;
+
+    dmabuf = g_new0(VGPUDMABuf, 1);
+    dmabuf->dmabuf_id = ids++;
+    dmabuf->buf.width = res->width;
+    dmabuf->buf.height = res->height;
+    dmabuf->buf.stride = pixman_image_get_stride(res->image);
+    dmabuf->buf.fourcc = DRM_FORMAT_XRGB8888;
+    dmabuf->buf.modifier = modifier;
+    dmabuf->buf.fd = dmabuf_fd;
+
+    QTAILQ_INSERT_HEAD(&g->dmabuf.bufs, dmabuf, next);
+    g_free(create);
+
+    return dmabuf;
+}

+    udmabuf_fd = open("/dev/udmabuf", O_RDWR);

If udmabuf_fd is already opened, this will leak fds.

[Kasireddy, Vivek] Yup, will take care of it in v2.

+    if (udmabuf_fd < 0) {
+        error_setg_errno(errp, errno,
+                         "udmabuf: failed to open vhost device");
+        return;

The fallback path should keep working

[Kasireddy, Vivek] Makes sense; will fix it in v2.

It would be worth doing a similar change in vhost-user-gpu.

[Kasireddy, Vivek] Sure, will look into it after understanding the deltas
between vhost-user and the in-qemu implementations.

Thanks,
Vivek

-- 
Marc-André Lureau

[-- Attachment #2: Type: text/html, Size: 9848 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [RFC 0/1] Use dmabufs for display updates instead of pixman 2021-03-02 8:03 [RFC 0/1] Use dmabufs for display updates instead of pixman Vivek Kasireddy 2021-03-02 8:03 ` [RFC 1/1] virtio-gpu: Use dmabuf for display updates if possible " Vivek Kasireddy @ 2021-03-02 8:21 ` no-reply 2021-03-02 12:03 ` Gerd Hoffmann 2 siblings, 0 replies; 13+ messages in thread From: no-reply @ 2021-03-02 8:21 UTC (permalink / raw) To: vivek.kasireddy Cc: dongwon.kim, daniel.vetter, vivek.kasireddy, qemu-devel, kraxel, marcandre.lureau Patchew URL: https://patchew.org/QEMU/20210302080358.3095748-1-vivek.kasireddy@intel.com/ Hi, This series seems to have some coding style problems. See output below for more information: Type: series Message-id: 20210302080358.3095748-1-vivek.kasireddy@intel.com Subject: [RFC 0/1] Use dmabufs for display updates instead of pixman === TEST SCRIPT BEGIN === #!/bin/bash git rev-parse base > /dev/null || exit 0 git config --local diff.renamelimit 0 git config --local diff.renames True git config --local diff.algorithm histogram ./scripts/checkpatch.pl --mailback base.. === TEST SCRIPT END === Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384 From https://github.com/patchew-project/qemu * [new tag] patchew/20210302080358.3095748-1-vivek.kasireddy@intel.com -> patchew/20210302080358.3095748-1-vivek.kasireddy@intel.com Switched to a new branch 'test' 5509447 virtio-gpu: Use dmabuf for display updates if possible instead of pixman === OUTPUT BEGIN === ERROR: space prohibited between function name and open parenthesis '(' #65: FILE: hw/display/virtio-gpu.c:541: + res->iov_cnt * sizeof (struct udmabuf_create_item)); ERROR: braces {} are necessary for all arms of this statement #66: FILE: hw/display/virtio-gpu.c:542: + if (!create) [...] 
ERROR: space required after that ',' (ctx:VxV) #92: FILE: hw/display/virtio-gpu.c:568: + modifier_lo = fourcc_mod_code(INTEL,I915_FORMAT_MOD_X_TILED) & 0xFFFFFFFF; ^ ERROR: braces {} are necessary for all arms of this statement #182: FILE: hw/display/virtio-gpu.c:699: + if (!ret) [...] total: 4 errors, 0 warnings, 196 lines checked Commit 550944737e2a (virtio-gpu: Use dmabuf for display updates if possible instead of pixman) has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. === OUTPUT END === Test command exited with code: 1 The full log is available at http://patchew.org/logs/20210302080358.3095748-1-vivek.kasireddy@intel.com/testing.checkpatch/?type=message. --- Email generated automatically by Patchew [https://patchew.org/]. Please send your feedback to patchew-devel@redhat.com ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC 0/1] Use dmabufs for display updates instead of pixman 2021-03-02 8:03 [RFC 0/1] Use dmabufs for display updates instead of pixman Vivek Kasireddy 2021-03-02 8:03 ` [RFC 1/1] virtio-gpu: Use dmabuf for display updates if possible " Vivek Kasireddy 2021-03-02 8:21 ` [RFC 0/1] Use dmabufs for display updates " no-reply @ 2021-03-02 12:03 ` Gerd Hoffmann 2021-03-09 8:18 ` Kasireddy, Vivek 2021-03-18 6:24 ` Zhang, Tina 2 siblings, 2 replies; 13+ messages in thread From: Gerd Hoffmann @ 2021-03-02 12:03 UTC (permalink / raw) To: Vivek Kasireddy Cc: Daniel Vetter, Dongwon Kim, qemu-devel, Marc-André Lureau On Tue, Mar 02, 2021 at 12:03:57AM -0800, Vivek Kasireddy wrote: > This is still a WIP/RFC patch that attempts to use dmabufs for display > updates with the help of Udmabuf driver instead of pixman. This patch > is posted to the ML to elicit feedback and start a discussion whether > something like this would be useful or not for mainly non-Virgl > rendered BOs and also potentially in other cases. Yes, it surely makes sense to go into that direction. The patch as-is doesn't, it breaks the guest/host interface. That's ok-ish for a quick proof-of-concept, but clearly not merge-able. > TODO: > - Use Blob resources for getting meta-data such as modifier, format, etc. That is pretty much mandatory. Without blob resources there is no concept of resources shared between host and guest in virtio-gpu, all data is explicitly copied with transfer commands. Which implies quite a bit of work because we don't have blob resource support in qemu yet. > - Test with Virgil rendered BOs to see if this can be used in that case.. That also opens up the question how to go forward with virtio-gpu in general. 
The object hierarchy we have right now (skipping pci + vga variants for
simplicity):

  TYPE_VIRTIO_GPU_BASE (abstract base)
   -> TYPE_VIRTIO_GPU (in-qemu implementation)
   -> TYPE_VHOST_USER_GPU (vhost-user implementation)

When compiled with opengl + virgl, TYPE_VIRTIO_GPU has a virgl=on/off
property. Having a single device is not ideal for modular builds, because
the hw-display-virtio-gpu.so module has a dependency on ui-opengl.so, so
that is needed (due to symbol references) even in the virgl=off case.
Also the code is a bit of an #ifdef mess.

I think we should split TYPE_VIRTIO_GPU into two devices: remove
virgl+opengl support from TYPE_VIRTIO_GPU, and add a new
TYPE_VIRTIO_GPU_VIRGL, with either TYPE_VIRTIO_GPU or
TYPE_VIRTIO_GPU_BASE as parent (not sure which is easier), with all
opengl/virgl support code there.

I think when using opengl it makes sense to also require virgl, so we can
use the virglrenderer library to manage blob resources (even when the
actual rendering isn't done with virgl). This also reduces the complexity
and the test matrix.

Maybe it even makes sense to deprecate in-qemu virgl support and focus
exclusively on the vhost-user implementation, so we don't have to
duplicate all the work for both implementations.

> Considerations/Challenges:
> - One of the main concerns with using dmabufs is how to synchronize access
>   to them and this use-case is no different. If the Guest is running Weston,
>   then it could use a maximum of 4 color buffers but uses only 2 by default
>   and flips between them if it is not sharing the FBs with other plugins
>   while running with the drm backend. In this case, how do we make sure that
>   Weston and Qemu UI are not using the same buffer at any given time?

There is graphic_hw_gl_block + graphic_hw_gl_flushed for synchronization.
Right now this is only wired up in spice, and it is rather simple (it just
stalls virgl rendering instead of providing per-buffer synchronization).
> - If we have Xorg running in the Guest, then it gets even more interesting as > Xorg in some cases does frontbuffer rendering (uses DRM_IOCTL_MODE_DIRTYFB). Well, if the guest does frontbuffer rendering we can't do much about it and have to live with rendering glitches I guess. take care, Gerd ^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: [RFC 0/1] Use dmabufs for display updates instead of pixman
  2021-03-02 12:03 ` Gerd Hoffmann
@ 2021-03-09  8:18   ` Kasireddy, Vivek
  2021-03-09  9:37     ` Gerd Hoffmann
  2021-03-18  6:24 ` Zhang, Tina
  1 sibling, 1 reply; 13+ messages in thread
From: Kasireddy, Vivek @ 2021-03-09  8:18 UTC (permalink / raw)
To: Gerd Hoffmann
Cc: Daniel Vetter, Kim, Dongwon, qemu-devel, Marc-André Lureau

Hi Gerd,

> Yes, it surely makes sense to go into that direction.
> The patch as-is doesn't, it breaks the guest/host interface.
> That's ok-ish for a quick proof-of-concept, but clearly not merge-able.
>
> > TODO:
> > - Use Blob resources for getting meta-data such as modifier, format, etc.
>
> That is pretty much mandatory. Without blob resources there is no concept
> of resources shared between host and guest in virtio-gpu, all data is
> explicitly copied with transfer commands.

[Kasireddy, Vivek] My understanding of virtio-gpu and the concept of
resources is still fairly limited, but are blob resources really needed for
non-Virgl use-cases -- other than something like a dmabuf/scanout blob that
shares meta-data such as the modifier? I thought the main motivation for
blob resources was to avoid the explicit copy you mentioned for Virgl
workloads.

> Which implies quite a bit of work because we don't have blob resource
> support in qemu yet.

[Kasireddy, Vivek] I was scrubbing through old mailing list messages to
understand the motivation behind blob resources and why they are needed,
and came across this:
https://gitlab.freedesktop.org/virgl/qemu/-/commits/virtio-gpu-next

Does your work above not count for anything?

> > - Test with Virgil rendered BOs to see if this can be used in that case..
>
> That also opens up the question how to go forward with virtio-gpu in general.
> The object hierarchy we have right now (skipping pci + vga variants for
> simplicity):
>
>   TYPE_VIRTIO_GPU_BASE (abstract base)
>    -> TYPE_VIRTIO_GPU (in-qemu implementation)
>    -> TYPE_VHOST_USER_GPU (vhost-user implementation)
>
> When compiled with opengl + virgl TYPE_VIRTIO_GPU has a virgl=on/off
> property. Having a single device is not ideal for modular builds,
> because the hw-display-virtio-gpu.so module has a dependency on
> ui-opengl.so so that is needed (due to symbol references) even for the
> virgl=off case. Also the code is a bit of an #ifdef mess.
>
> I think we should split TYPE_VIRTIO_GPU into two devices. Remove
> virgl+opengl support from TYPE_VIRTIO_GPU. Add a new
> TYPE_VIRTIO_GPU_VIRGL, with either TYPE_VIRTIO_GPU or
> TYPE_VIRTIO_GPU_BASE as parent (not sure which is easier), have all
> opengl/virgl support code there.
>
> I think when using opengl it makes sense to also require virgl, so we
> can use the virglrenderer library to manage blob resources (even when
> the actual rendering isn't done with virgl). Also reduces the
> complexity and test matrix.

[Kasireddy, Vivek] When you say "using opengl", are you referring to the
presentation of the rendered buffer via dmabuf or pixman? If yes, I am not
sure why this would need to depend on Virgl. For our use-case(s) where we
are using virtio-gpu in buffer-sharing mode, we'd still need opengl for
submitting the dmabuf to the UI, IIUC.

> Maybe it even makes sense to deprecate in-qemu virgl support and focus
> exclusively on the vhost-user implementation, so we don't have to
> duplicate all work for both implementations.

[Kasireddy, Vivek] Is the vhost-user implementation better in terms of
performance, generally?

> > In this case, how do we make sure that Weston and Qemu UI are not
> > using the same buffer at any given time?
>
> There is graphic_hw_gl_block + graphic_hw_gl_flushed for synchronization.
> Right now this is only wired up in spice, and it is rather simple (just
> stalls virgl rendering instead of providing per-buffer synchronization).

[Kasireddy, Vivek] I guess that might work for Virgl rendering but not for
our use-case. What we need is a way to tell whether the previously
submitted dmabuf has been consumed by the Host compositor before we
release/close it. Weston (wl_buffer.release event and fences) and EGL
(sync and fences) do provide a few options, but I am not sure if GTK lets
us use any of those. Any recommendations? EGLSync objects?

On a different note, is there any particular reason why the Qemu UI EGL
implementation is limited to Xorg and not extended to Wayland/Weston, for
which there is GTK glarea?

Thanks,
Vivek

> take care,
>   Gerd

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [RFC 0/1] Use dmabufs for display updates instead of pixman
  2021-03-09  8:18 ` Kasireddy, Vivek
@ 2021-03-09  9:37 ` Gerd Hoffmann
  2021-03-17  8:28 ` Kasireddy, Vivek
  0 siblings, 1 reply; 13+ messages in thread
From: Gerd Hoffmann @ 2021-03-09 9:37 UTC (permalink / raw)
To: Kasireddy, Vivek
Cc: Daniel Vetter, Kim, Dongwon, qemu-devel, Marc-André Lureau

  Hi,

> > That is pretty much mandatory. Without blob resources there is no concept of
> > resources shared between host and guest in virtio-gpu, all data is
> > explicitly copied with transfer commands.
> [Kasireddy, Vivek] My understanding of virtio-gpu and the concept of resources
> is still fairly limited, but are blob resources really needed for non-Virgl
> use-cases -- other than something like a dmabuf/scanout blob that shares
> meta-data such as the modifier? I thought the main motivation for blob
> resources would be to avoid the explicit copy you mentioned for Virgl
> workloads.

Well, you want to avoid the copy as well, right? With blob resources you can do
that in a well-defined way, i.e. the guest knows what you are doing and behaves
accordingly. Without blob resources you can't, at least not without violating
the guest's expectation that any changes it makes are only visible to the host
after an explicit transfer (aka copy) command.

> > Which implies quite a bit of work because we don't have blob resource
> > support in qemu yet.
> [Kasireddy, Vivek] I was scrubbing through old mailing list messages to
> understand the motivation behind blob resources and why they are needed, and
> came across this:
> https://gitlab.freedesktop.org/virgl/qemu/-/commits/virtio-gpu-next
>
> Does your work above not count for anything?

It is quite old, and I think not up-to-date with the final revision of the blob
resource specification. I wouldn't be able to update this in the near future
due to being busy with other projects. Feel free to grab & update & submit
these patches though.
> > I think when using opengl it makes sense to also require virgl, so we can
> > use the virglrenderer library to manage blob resources (even when the actual
> > rendering isn't done with virgl). Also reduces the complexity and test
> > matrix.
> [Kasireddy, Vivek] When you say "using opengl" are you referring to the
> presentation of the rendered buffer via dmabuf or pixman? If yes, I am not
> sure why this would need to depend on Virgl.

Well, you can probably do it without virgl as well. But why effectively
duplicate the blob resource management bits in qemu instead of just using the
virglrenderer library?

Beside the code duplication this is also a maintenance issue. This adds one
more configuration to virtio-gpu. Right now you can build virtio-gpu with virgl
(depends on opengl), or you can build without virgl (doesn't use opengl then).
I don't think it is a good idea to add a third mode, without virgl support but
using opengl for blob dma-bufs.

> For our use-case(s) where we are using virtio-gpu in buffer sharing mode,
> we'd still need opengl for submitting the dmabuf to UI, IIUC.

Correct. When you want to use dma-bufs you need opengl.

> > Maybe it even makes sense to deprecate in-qemu virgl support and focus
> > exclusively on the vhost-user implementation, so we don't have to duplicate
> > all work for both implementations.
> [Kasireddy, Vivek] Is the vhost-user implementation better in terms of
> performance, generally?

It is better both in terms of security (it's easier to sandbox) and
performance. The in-qemu implementation runs in the qemu iothread, which also
handles a bunch of other jobs. Also, virglrenderer being busy -- for example
with compiling complex shaders -- can block qemu for a while, which in turn can
cause latency spikes in the guest. With the vhost-user implementation this is
not a problem.

Drawback is the extra communication (and synchronization) needed between
vhost-user and qemu to make the guest display available via spice or gtk.
The latter can possibly be solved by exporting the guest display as a pipewire
remote desktop (random idea, I haven't investigated it much yet).

> On a different note, any particular reason why Qemu UI EGL
> implementation is limited to Xorg and not extended to Wayland/Weston
> for which there is GTK glarea?

Well, ideally I'd love to just use glarea, which happens on wayland. The
problem with Xorg is that the gtk x11 backend uses glx, not egl, to create an
opengl context for glarea. At least that used to be the case in the past;
maybe that has changed with newer versions. qemu needs egl contexts though,
otherwise dma-bufs don't work. So we are stuck with our own egl widget
implementation for now. Probably we will be able to drop it at some point in
the future.

HTH,
  Gerd

^ permalink raw reply	[flat|nested] 13+ messages in thread
* RE: [RFC 0/1] Use dmabufs for display updates instead of pixman
  2021-03-09  9:37 ` Gerd Hoffmann
@ 2021-03-17  8:28 ` Kasireddy, Vivek
  2021-03-17 14:20 ` Gerd Hoffmann
  0 siblings, 1 reply; 13+ messages in thread
From: Kasireddy, Vivek @ 2021-03-17 8:28 UTC (permalink / raw)
To: Gerd Hoffmann
Cc: Kim, Dongwon, Daniel Vetter, qemu-devel, Zhang, Tina, Vetter, Daniel,
    Marc-André Lureau

Hi Gerd,
Sorry for the delayed response. I wanted to wait until I finished my
proof-of-concept -- which included adding synchronization -- to ask follow-up
questions.

> > > Does your work above not count for anything?
>
> It is quite old, and I think not up-to-date with the final revision of the
> blob resource specification. I wouldn't be able to update this in the near
> future due to being busy with other projects. Feel free to grab & update &
> submit these patches though.
[Kasireddy, Vivek] Sure, we'll take a look at your work and use that as a
starting point. Roughly, how much of it can be reused? Also, given my limited
understanding of how discrete GPUs work, I was wondering how many copies there
would need to be with blob resources/dmabufs, and whether a zero-copy goal
would be feasible.

> Beside the code duplication this is also a maintenance issue. This adds one
> more configuration to virtio-gpu. Right now you can build virtio-gpu with
> virgl (depends on opengl), or you can build without virgl (doesn't use opengl
> then). I don't think it is a good idea to add a third mode, without virgl
> support but using opengl for blob dma-bufs.
[Kasireddy, Vivek] We'll have to re-visit this part, but for our use-case with
virtio-gpu we are disabling virglrenderer in Qemu and the virgl DRI driver in
the Guest. However, we still need to use Opengl/EGL to convert the dmabuf
(guest fb) to a texture and render it as part of the UI/GTK updates.
> > On a different note, any particular reason why Qemu UI EGL
> > implementation is limited to Xorg and not extended to Wayland/Weston
> > for which there is GTK glarea?
>
> Well, ideally I'd love to just use glarea. Which happens on wayland.
>
> The problem with Xorg is that the gtk x11 backend uses glx not egl to create
> an opengl context for glarea. At least that used to be the case in the past,
> maybe that has changed with newer versions. qemu needs egl contexts though,
> otherwise dma-bufs don't work. So we are stuck with our own egl widget
> implementation for now. Probably we will be able to drop it at some point in
> the future.
[Kasireddy, Vivek] The GTK X11 backend still uses GLX, and it seems like that
is not going to change anytime soon. Having said that, I was wondering if it
makes sense to add a new, purely Wayland backend besides GtkGlArea so that Qemu
UI can more quickly adopt new features such as explicit sync. I was thinking of
the new backend being similar to this:
https://cgit.freedesktop.org/wayland/weston/tree/clients/simple-dmabuf-egl.c

The reason I am proposing this idea is that even if we manage to add explicit
sync support to GTK and it gets merged, upgrading Qemu GTK support from 3.22 to
4.x may prove to be daunting.
Currently, the way I am doing explicit sync is by adding these new APIs to GTK
and calling them from Qemu:

static int create_egl_fence_fd(EGLDisplay dpy)
{
    EGLSyncKHR sync = eglCreateSyncKHR(dpy, EGL_SYNC_NATIVE_FENCE_ANDROID, NULL);
    int fd;

    g_assert(sync != EGL_NO_SYNC_KHR);
    fd = eglDupNativeFenceFDANDROID(dpy, sync);
    g_assert(fd >= 0);

    eglDestroySyncKHR(dpy, sync);
    return fd;
}

static void wait_for_buffer_release_fence(EGLDisplay dpy)
{
    EGLint attrib_list[] = {
        EGL_SYNC_NATIVE_FENCE_FD_ANDROID, release_fence_fd,
        EGL_NONE,
    };

    if (release_fence_fd < 0)
        return;

    EGLSyncKHR sync = eglCreateSyncKHR(dpy, EGL_SYNC_NATIVE_FENCE_ANDROID,
                                       attrib_list);
    g_assert(sync);
    release_fence_fd = -1;

    eglClientWaitSyncKHR(dpy, sync, 0, EGL_FOREVER_KHR);
    eglDestroySyncKHR(dpy, sync);
}

And, of course, I am tying the wait above to a dma_fence associated with the
previous guest FB that is signalled to ensure that the Host is done using the
FB, thereby providing explicit synchronization between Guest and Host. It seems
to work OK, but I was wondering if you had any alternative ideas or suggestions
for doing explicit or implicit sync that are easier.

Lastly, on a different note, I noticed that there is a virtio-gpu Windows
driver here:
https://github.com/virtio-win/kvm-guest-drivers-windows/tree/master/viogpu

We are going to try it out, but do you know how up to date it is kept?

Thanks,
Vivek

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [RFC 0/1] Use dmabufs for display updates instead of pixman
  2021-03-17  8:28 ` Kasireddy, Vivek
@ 2021-03-17 14:20 ` Gerd Hoffmann
  2021-03-18 23:41 ` Kasireddy, Vivek
  0 siblings, 1 reply; 13+ messages in thread
From: Gerd Hoffmann @ 2021-03-17 14:20 UTC (permalink / raw)
To: Kasireddy, Vivek
Cc: Kim, Dongwon, Daniel Vetter, qemu-devel, Zhang, Tina, Vetter, Daniel,
    Marc-André Lureau

On Wed, Mar 17, 2021 at 08:28:33AM +0000, Kasireddy, Vivek wrote:
> Hi Gerd,
> Sorry for the delayed response. I wanted to wait until I finished my
> proof-of-concept -- that included adding synchronization -- to ask follow up
> questions.
>
> > > > Does your work above not count for anything?
> >
> > It is quite old, and I think not up-to-date with the final revision of the
> > blob resource specification. I wouldn't be able to update this in the near
> > future due to being busy with other projects. Feel free to grab & update &
> > submit these patches though.
> [Kasireddy, Vivek] Sure, we'll take a look at your work and use that as a
> starting point. Roughly, how much of your work can be reused?

There are some small udmabuf support patches which can probably be reused
pretty much as-is. Everything else needs larger changes I suspect, but it's
been a while since I looked at this ...

> Also, given my limited understanding of how discrete GPUs work, I was
> wondering how many copies would there need to be with blob resources/dmabufs
> and whether a zero-copy goal would be feasible or not?

Good question. Right now there are two copies (gtk ui):

  (1) guest ram -> DisplaySurface -> gtk widget (gl=off), or
  (2) guest ram -> DisplaySurface -> texture (gl=on).

You should be able to reduce this to one copy for gl=on ...

  (3) guest ram -> texture

... by taking DisplaySurface out of the picture, without any changes to the
guest/host interface.
Drawback is that it requires adding an opengl dependency to virtio-gpu even
with virgl=off, because the virtio-gpu device will then have to handle the copy
to the texture itself, in response to guest TRANSFER commands.

When adding blob resource support:

Easiest is probably supporting VIRTIO_GPU_BLOB_MEM_GUEST (largely identical to
non-blob resources) with VIRTIO_GPU_BLOB_FLAG_USE_SHAREABLE (allows the host
to create a shared mapping). Then you can go create a udmabuf for the resource
on the host side. For the non-gl code path you can mmap() the udmabuf (which
gives you a linear mapping for the scattered guest pages) and create a
DisplaySurface backed by guest ram pages (removing the guest ram ->
DisplaySurface copy). For the gl code path you can create a texture backed by
the udmabuf and go render on the host without copying at all.

Using VIRTIO_GPU_BLOB_MEM_GUEST + VIRTIO_GPU_BLOB_FLAG_USE_SHAREABLE for
resources needs guest changes too, either in mesa (when using virgl) or in the
kernel driver's dumb buffer handling (when not using virgl).

Alternatively (listed more for completeness):

You can create a blob resource with VIRTGPU_BLOB_MEM_HOST3D (requires virgl,
see also virgl_drm_winsys_resource_create_blob in mesa). It will be allocated
by the host, then mapped into the guest using a virtual pci memory bar. Guest
userspace (aka the mesa driver) can mmap() these resources and has direct,
zero-copy access to the host resource.

Going to dma-buf export that, import it into i915, then let the gpu render
implies we are doing p2p dma from a physical (pci-assigned) device to the
memory bar of a virtual pci device. Doing that should be possible, but frankly
I would be surprised if it actually works out-of-the-box. Dunno how many
dragons are lurking here. Could become an interesting challenge to make that
fly.

> > Beside the code duplication this is also a maintenance issue. This adds one
> > more configuration to virtio-gpu.
> > Right now you can build virtio-gpu with virgl (depends on opengl), or you
> > can build without virgl (doesn't use opengl then). I don't think it is a
> > good idea to add a third mode, without virgl support but using opengl for
> > blob dma-bufs.
> [Kasireddy, Vivek] We'll have to re-visit this part but for our use-case with
> virtio-gpu, we are disabling virglrenderer in Qemu and virgl DRI driver in
> the Guest. However, we still need to use Opengl/EGL to convert the dmabuf
> (guest fb) to texture and render as part of the UI/GTK updates.

Well, VIRTGPU_BLOB_MEM_HOST3D blob resources are created using virgl renderer
commands (VIRGL_CCMD_PIPE_RESOURCE_CREATE). So supporting that without
virglrenderer is not an option. VIRTIO_GPU_BLOB_MEM_GUEST might be possible
without too much effort.

> > > On a different note, any particular reason why Qemu UI EGL
> > > implementation is limited to Xorg and not extended to Wayland/Weston
> > > for which there is GTK glarea?
> >
> > Well, ideally I'd love to just use glarea. Which happens on wayland.
> >
> > The problem with Xorg is that the gtk x11 backend uses glx not egl to
> > create an opengl context for glarea. At least that used to be the case in
> > the past, maybe that has changed with newer versions. qemu needs egl
> > contexts though, otherwise dma-bufs don't work. So we are stuck with our
> > own egl widget implementation for now. Probably we will be able to drop it
> > at some point in the future.
> [Kasireddy, Vivek] GTK X11 backend still uses GLX and it seems like that is
> not going to change anytime soon.

Hmm, so the egl backend has to stay for the time being.

> Having said that, I was wondering if it makes sense to add a new purely
> Wayland backend besides GtkGlArea so that Qemu UI can more quickly adopt new
> features such as explicit sync. I was thinking about the new backend being
> similar to this:
> https://cgit.freedesktop.org/wayland/weston/tree/clients/simple-dmabuf-egl.c

I'd prefer to not do that.
> The reason why I am proposing this idea is because even if we manage to add
> explicit sync support to GTK and it gets merged, upgrading Qemu GTK support
> from 3.22 to 4.x may prove to be daunting. Currently, the way I am doing
> explicit sync is by adding these new APIs to GTK and calling them from Qemu:

Well, we had the same code supporting gtk2+3 with #ifdefs. There are also
#ifdefs to avoid using functions deprecated during the 3.x lifetime. So I
expect porting to gtk4 wouldn't be too bad.

Also, I expect qemu wouldn't be the only application needing sync support, so
trying to get that integrated with upstream gtk certainly makes sense.

> Lastly, on a different note, I noticed that there is a virtio-gpu Windows
> driver here:
> https://github.com/virtio-win/kvm-guest-drivers-windows/tree/master/viogpu
>
> We are going to try it out but do you know how up to date it is kept?

No, not following development closely.

take care,
  Gerd

^ permalink raw reply	[flat|nested] 13+ messages in thread
* RE: [RFC 0/1] Use dmabufs for display updates instead of pixman
  2021-03-17 14:20 ` Gerd Hoffmann
@ 2021-03-18 23:41 ` Kasireddy, Vivek
  0 siblings, 0 replies; 13+ messages in thread
From: Kasireddy, Vivek @ 2021-03-18 23:41 UTC (permalink / raw)
To: Gerd Hoffmann
Cc: Kim, Dongwon, Daniel Vetter, qemu-devel, Zhang, Tina, Vetter, Daniel,
    Marc-André Lureau

Hi Gerd,
Thank you for taking the time to explain how support for blob resources needs
to be added. We are going to get started soon, and here are the tasks we are
planning to do, in order of priority:

1) Add support for VIRTIO_GPU_BLOB_MEM_GUEST +
   VIRTIO_GPU_BLOB_FLAG_USE_SHAREABLE
2) Upgrade Qemu GTK UI from 3.22 to 4.x
3) Add explicit sync support to GTK4 and Qemu UI
4) Add support for VIRTGPU_BLOB_MEM_HOST3D

We'll start sending patches as we go along.

Thanks,
Vivek

> > [Kasireddy, Vivek] Sure, we'll take a look at your work and use that
> > as a starting point. Roughly, how much of your work can be reused?
>
> There are some small udmabuf support patches which can probably be reused
> pretty much as-is. Everything else needs larger changes I suspect, but it's
> been a while since I looked at this ...
>
> > Also, given my limited understanding of how discrete GPUs work, I was
> > wondering how many copies would there need to be with blob
> > resources/dmabufs and whether a zero-copy goal would be feasible or not?
>
> Good question.
>
> Right now there are two copies (gtk ui):
>
>   (1) guest ram -> DisplaySurface -> gtk widget (gl=off), or
>   (2) guest ram -> DisplaySurface -> texture (gl=on).
>
> You should be able to reduce this to one copy for gl=on ...
>
>   (3) guest ram -> texture
>
> ... by taking DisplaySurface out of the picture, without any changes to the
> guest/host interface. Drawback is that it requires adding an opengl
> dependency to virtio-gpu even with virgl=off, because the virtio-gpu device
> will have to handle the copy to the texture then, in response to guest
> TRANSFER commands.
> When adding blob resource support:
>
> Easiest is probably supporting VIRTIO_GPU_BLOB_MEM_GUEST (largely identical
> to non-blob resources) with VIRTIO_GPU_BLOB_FLAG_USE_SHAREABLE (allows the
> host to create a shared mapping). Then you can go create a udmabuf for the
> resource on the host side. For the non-gl code path you can mmap() the
> udmabuf (which gives you a linear mapping for the scattered guest pages) and
> create a DisplaySurface backed by guest ram pages (removing the guest ram ->
> DisplaySurface copy). For the gl code path you can create a texture backed
> by the udmabuf and go render on the host without copying at all.
>
> Using VIRTIO_GPU_BLOB_MEM_GUEST + VIRTIO_GPU_BLOB_FLAG_USE_SHAREABLE for
> resources needs guest changes too, either in mesa (when using virgl) or the
> kernel driver's dumb buffer handling (when not using virgl).
>
> Alternatively (listed more for completeness):
>
> You can create a blob resource with VIRTGPU_BLOB_MEM_HOST3D (requires virgl,
> see also virgl_drm_winsys_resource_create_blob in mesa). It will be
> allocated by the host, then mapped into the guest using a virtual pci memory
> bar. Guest userspace (aka mesa driver) can mmap() these resources and has
> direct, zero-copy access to the host resource.
>
> Going to dma-buf export that, import into i915, then let the gpu render
> implies we are doing p2p dma from a physical (pci-assigned) device to the
> memory bar of a virtual pci device.
>
> Doing that should be possible, but frankly I would be surprised if that
> actually works out-of-the-box. Dunno how many dragons are lurking here.
> Could become an interesting challenge to make that fly.
>
> > > Beside the code duplication this is also a maintenance issue. This
> > > adds one more configuration to virtio-gpu. Right now you can build
> > > virtio-gpu with virgl (depends on opengl), or you can build without
> > > virgl (doesn't use opengl then).
> > > I don't think it is a good idea to add a third mode, without virgl
> > > support but using opengl for blob dma-bufs.
> > [Kasireddy, Vivek] We'll have to re-visit this part but for our
> > use-case with virtio-gpu, we are disabling virglrenderer in Qemu and
> > virgl DRI driver in the Guest. However, we still need to use
> > Opengl/EGL to convert the dmabuf (guest fb) to texture and render as part
> > of the UI/GTK updates.
>
> Well, VIRTGPU_BLOB_MEM_HOST3D blob resources are created using virgl
> renderer commands (VIRGL_CCMD_PIPE_RESOURCE_CREATE). So supporting that
> without virglrenderer is not an option.
>
> VIRTIO_GPU_BLOB_MEM_GUEST might be possible without too much effort.
>
> > > > On a different note, any particular reason why Qemu UI EGL
> > > > implementation is limited to Xorg and not extended to
> > > > Wayland/Weston for which there is GTK glarea?
> > >
> > > Well, ideally I'd love to just use glarea. Which happens on wayland.
> > >
> > > The problem with Xorg is that the gtk x11 backend uses glx not egl
> > > to create an opengl context for glarea. At least that used to be
> > > the case in the past, maybe that has changed with newer versions.
> > > qemu needs egl contexts though, otherwise dma-bufs don't work. So
> > > we are stuck with our own egl widget implementation for now. Probably we
> > > will be able to drop it at some point in the future.
> > [Kasireddy, Vivek] GTK X11 backend still uses GLX and it seems like
> > that is not going to change anytime soon.
>
> Hmm, so the egl backend has to stay for the time being.
>
> > Having said that, I was wondering if it makes sense to add a new
> > purely Wayland backend besides GtkGlArea so that Qemu UI can more
> > quickly adopt new features such as explicit sync. I was thinking about the
> > new backend being similar to this:
> > https://cgit.freedesktop.org/wayland/weston/tree/clients/simple-dmabuf-egl.c
>
> I'd prefer to not do that.
> > The reason why I am proposing this idea is because even if we manage
> > to add explicit sync support to GTK and it gets merged, upgrading Qemu
> > GTK support from 3.22 to 4.x may prove to be daunting. Currently,
> > the way I am doing explicit sync is by adding these new APIs to GTK and
> > calling them from Qemu:
>
> Well, we had the same code supporting gtk2+3 with #ifdefs. There are also
> #ifdefs to avoid using functions deprecated during the 3.x lifetime.
> So I expect porting to gtk4 wouldn't be too bad.
>
> Also I expect qemu wouldn't be the only application needing sync support, so
> trying to get that integrated with upstream gtk certainly makes sense.
>
> > Lastly, on a different note, I noticed that there is a virtio-gpu Windows
> > driver here:
> > https://github.com/virtio-win/kvm-guest-drivers-windows/tree/master/viogpu
> >
> > We are going to try it out but do you know how up to date it is kept?
>
> No, not following development closely.
>
> take care,
>   Gerd

^ permalink raw reply	[flat|nested] 13+ messages in thread
* RE: [RFC 0/1] Use dmabufs for display updates instead of pixman
  2021-03-02 12:03 ` Gerd Hoffmann
  2021-03-09  8:18 ` Kasireddy, Vivek
@ 2021-03-18  6:24 ` Zhang, Tina
  2021-03-19 10:45 ` Gerd Hoffmann
  1 sibling, 1 reply; 13+ messages in thread
From: Zhang, Tina @ 2021-03-18 6:24 UTC (permalink / raw)
To: Gerd Hoffmann, Kasireddy, Vivek
Cc: Daniel Vetter, Kim, Dongwon, Marc-André Lureau, qemu-devel

> -----Original Message-----
> From: Qemu-devel <qemu-devel-bounces+tina.zhang=intel.com@nongnu.org>
> On Behalf Of Gerd Hoffmann
> Sent: Tuesday, March 2, 2021 8:04 PM
> To: Kasireddy, Vivek <vivek.kasireddy@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>; Kim, Dongwon
> <dongwon.kim@intel.com>; qemu-devel@nongnu.org; Marc-André Lureau
> <marcandre.lureau@redhat.com>
> Subject: Re: [RFC 0/1] Use dmabufs for display updates instead of pixman
>
> On Tue, Mar 02, 2021 at 12:03:57AM -0800, Vivek Kasireddy wrote:
> > This is still a WIP/RFC patch that attempts to use dmabufs for display
> > updates with the help of Udmabuf driver instead of pixman. This patch
> > is posted to the ML to elicit feedback and start a discussion whether
> > something like this would be useful or not for mainly non-Virgl
> > rendered BOs and also potentially in other cases.
>
> Yes, it surely makes sense to go in that direction.
> The patch as-is doesn't, it breaks the guest/host interface.
> That's ok-ish for a quick proof-of-concept, but clearly not merge-able.

Hi,
Following the proposal at
https://lore.kernel.org/dri-devel/20210212110140.gdpu7kapnr7ovdcn@sirius.home.kraxel.org/,
we made some progress on a 'virtio-gpu (display) + pass-through GPU'
prototype. We leverage the kmsro framework provided by mesa to let the
virtio-gpu display work with a passed-through GPU in headless mode.
The MR is here:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9592

Although our work is different from this ongoing discussion, which is about
enabling a general way to share buffers between guest and host, we'd like to
leverage this patch. So, is there any plan to refine it? E.g. move the uuid
blob support into another patch, as the implementation of the proposal doesn't
require guest user space to share buffers with the host side, and maybe also
add dma-buf support for the cursor plane. Thanks.

BR,
Tina

> > TODO:
> > - Use Blob resources for getting meta-data such as modifier, format, etc.
>
> That is pretty much mandatory. Without blob resources there is no concept of
> resources shared between host and guest in virtio-gpu, all data is
> explicitly copied with transfer commands.
>
> Which implies quite a bit of work because we don't have blob resource
> support in qemu yet.
>
> > - Test with Virgil rendered BOs to see if this can be used in that case..
>
> That also opens up the question how to go forward with virtio-gpu in general.
> The object hierarchy we have right now (skipping pci + vga variants for
> simplicity):
>
>   TYPE_VIRTIO_GPU_BASE (abstract base)
>     -> TYPE_VIRTIO_GPU (in-qemu implementation)
>     -> TYPE_VHOST_USER_GPU (vhost-user implementation)
>
> When compiled with opengl + virgl TYPE_VIRTIO_GPU has a virgl=on/off
> property. Having a single device is not ideal for modular builds, because
> the hw-display-virtio-gpu.so module has a dependency on ui-opengl.so, so
> that is needed (due to symbol references) even for the virgl=off case. Also
> the code is a bit of an #ifdef mess.
>
> I think we should split TYPE_VIRTIO_GPU into two devices. Remove
> virgl+opengl support from TYPE_VIRTIO_GPU. Add a new
> TYPE_VIRTIO_GPU_VIRGL, with either TYPE_VIRTIO_GPU or
> TYPE_VIRTIO_GPU_BASE as parent (not sure which is easier), have all
> opengl/virgl support code there.
> I think when using opengl it makes sense to also require virgl, so we can
> use the virglrenderer library to manage blob resources (even when the actual
> rendering isn't done with virgl). Also reduces the complexity and test
> matrix.
>
> Maybe it even makes sense to deprecate in-qemu virgl support and focus
> exclusively on the vhost-user implementation, so we don't have to duplicate
> all work for both implementations.
>
> > Considerations/Challenges:
> > - One of the main concerns with using dmabufs is how to synchronize
> > access to them and this use-case is no different. If the Guest is
> > running Weston, then it could use a maximum of 4 color buffers but
> > uses only 2 by default and flips between them if it is not sharing the
> > FBs with other plugins while running with the drm backend. In this
> > case, how do we make sure that Weston and Qemu UI are not using the same
> > buffer at any given time?
>
> There is graphic_hw_gl_block + graphic_hw_gl_flushed for synchronization.
> Right now this is only wired up in spice, and it is rather simple (it just
> stalls virgl rendering instead of providing per-buffer synchronization).
>
> > - If we have Xorg running in the Guest, then it gets even more
> > interesting as Xorg in some cases does frontbuffer rendering (uses
> > DRM_IOCTL_MODE_DIRTYFB).
>
> Well, if the guest does frontbuffer rendering we can't do much about it and
> have to live with rendering glitches I guess.
>
> take care,
>   Gerd

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [RFC 0/1] Use dmabufs for display updates instead of pixman
  2021-03-18  6:24 ` Zhang, Tina
@ 2021-03-19 10:45 ` Gerd Hoffmann
  0 siblings, 0 replies; 13+ messages in thread
From: Gerd Hoffmann @ 2021-03-19 10:45 UTC (permalink / raw)
To: Zhang, Tina
Cc: qemu-devel, Daniel Vetter, Kasireddy, Vivek, Marc-André Lureau,
    Kim, Dongwon

> Hi,
> Following the proposal at
> https://lore.kernel.org/dri-devel/20210212110140.gdpu7kapnr7ovdcn@sirius.home.kraxel.org/,
> we made some progress on a 'virtio-gpu (display) + pass-through GPU'
> prototype. We leverage the kmsro framework provided by mesa to let the
> virtio-gpu display work with a passed-through GPU in headless mode.
> And the MR is here:
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9592

Cool.

> Although our work is different from this on-going discussion, which is about
> enabling a general way to share buffers between guest and host, we'd like to
> leverage this patch. So, is there any plan to refine this patch?

Item (1) on Vivek's new TODO list should provide that. Once we have shared
blob resources we can create udmabufs on the host side, which in turn allows
us to drop extra copies in the display path and speed up this use case as well
(both with and without opengl).

take care,
  Gerd

^ permalink raw reply	[flat|nested] 13+ messages in thread
end of thread, other threads:[~2021-03-19 10:46 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-02  8:03 [RFC 0/1] Use dmabufs for display updates instead of pixman Vivek Kasireddy
2021-03-02  8:03 ` [RFC 1/1] virtio-gpu: Use dmabuf for display updates if possible " Vivek Kasireddy
2021-03-02  8:29   ` Marc-André Lureau
2021-03-09  8:26     ` Kasireddy, Vivek
2021-03-02  8:21 ` [RFC 0/1] Use dmabufs for display updates " no-reply
2021-03-02 12:03 ` Gerd Hoffmann
2021-03-09  8:18   ` Kasireddy, Vivek
2021-03-09  9:37     ` Gerd Hoffmann
2021-03-17  8:28       ` Kasireddy, Vivek
2021-03-17 14:20         ` Gerd Hoffmann
2021-03-18 23:41           ` Kasireddy, Vivek
2021-03-18  6:24   ` Zhang, Tina
2021-03-19 10:45     ` Gerd Hoffmann