On Wed, Sep 08, 2021 at 01:37:53PM +0000, John Levon wrote: > On Wed, Sep 08, 2021 at 01:58:46PM +0100, Stefan Hajnoczi wrote: > > > > +static void *vfu_object_attach_ctx(void *opaque) > > > +{ > > > + VfuObject *o = opaque; > > > + int ret; > > > + > > > +retry_attach: > > > + ret = vfu_attach_ctx(o->vfu_ctx); > > > + if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) { > > > > Does this loop consume 100% CPU since this is non-blocking? > > Looks like it. Instead after vfu_create_ctx, there should be a vfu_get_poll_fd() > to get the listen socket, then a qemu_set_fd_handler(vfu_object_attach_ctx) > to handle the attach when the listen socket is ready, modulo the below part. > > > libvfio-user has non-blocking listen_fd but conn_fd is always blocking. > > It is, but in vfu_run_ctx(), we poll on it: > > ``` > 790 if (vfu_ctx->flags & LIBVFIO_USER_FLAG_ATTACH_NB) { > 791 sock_flags = MSG_DONTWAIT | MSG_WAITALL; > 792 } > 793 return get_msg(hdr, sizeof(*hdr), fds, nr_fds, ts->conn_fd, sock_flags); > ``` This is only used for the request header. Other I/O is blocking. > > > This means ATTACH_NB is not useful because vfu_attach_ctx() is actually > > blocking. > > You're correct that vfu_attach_ctx is in fact partially blocking: after > accepting the connection, we call negotiate(), which can indeed block waiting if > the client hasn't sent anything. > > > I think this means vfu_run_ctx() is also blocking in some places > > Correct. There's a presumption that if a message is ready, we can read it all > without blocking, and equally that we can write to the socket without blocking. > > The library docs are not at all clear on this point. > > > and QEMU's event loop might hang :( > > > > Can you make libvfio-user non-blocking in order to solve these issues? > > I presume you're concerned about the security aspect: a malicious client could > withhold a write, and hence hang the device server. > > Problem is the libvfio-user API is synchronous: there's no way to return > half-way through a vfu_attach_ctx() (or a vfu_run_ctx() after we read the > header) then resume. > > We'd have to have a whole separate API to do that, so a separate thread seems a > better approach? Whether to support non-blocking properly in libvfio-user is a decision for you. If libvfio-user doesn't support non-blocking, then QEMU should run a dedicated thread instead of the partially non-blocking approach in this patch. A non-blocking approach is nice when there are many devices hosted in a single process or a lot of async replies (which requires extra thread synchronization with the blocking approach). Stefan