On Thu, Sep 09, 2021 at 05:11:49AM +0000, John Johnson wrote:
> 
> 
> > On Sep 7, 2021, at 6:21 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > 
> > 
> > This way the network communication code doesn't need to know how
> > messages will be processed by the client or server. There is no need for
> > if (isreply) { qemu_cond_signal(&reply->cv); } else {
> > proxy->request(proxy->reqarg, buf, &reqfds); }. The callbacks and
> > threads aren't hardcoded into the network communication code.
> > 
> 
> 	I fear we are talking past each other.  The vfio-user protocol
> is bi-directional.  e.g., the client both sends requests to the server
> and receives requests from the server on the same socket.  No matter
> what threading model we use, the receive algorithm will be:
> 
> 
> read message header
> if it’s a reply
>    schedule the thread waiting for the reply
> else
>    run a callback to process the request
> 
> 
> 	The only way I can see changing this is to establish two
> uni-directional sockets: one for requests outbound to the server,
> and one for requests inbound from the server.
> 
> 	This is the reason I chose the iothread model.  It can run
> independently of any vCPU/main threads waiting for replies and of
> the callback thread.  I did muddle this idea by having the iothread
> become a callback thread by grabbing BQL and running the callback
> inline when it receives a request from the server, but if you like a
> pure event driven model, I can make incoming requests kick a BH from
> the main loop.  e.g.,
> 
> if it’s a reply
>    qemu_cond_signal(reply cv)
> else
>    qemu_bh_schedule(proxy bh)
> 
> 	That would avoid disconnect having to handle the iothread
> blocked on BQL.
> 
> 
> > This goes back to the question earlier about why a dedicated thread is
> > necessary here. I suggest writing the network communication code using
> > coroutines. That way the code is easier to read (no callbacks or
> > thread synchronization), there are fewer thread-safety issues to worry
> > about, and users or management tools don't need to know about additional
> > threads (e.g. CPU/NUMA affinity).
> > 
> 
> 
> 	I did look at coroutines, but they seemed to work when the sender
> is triggering the coroutine on send, not when request packets are arriving
> asynchronously to the sends.

This can be done with a receiver coroutine. It is the only thing that
reads vfio-user messages from the socket. For each message it reads, it
wakes up the coroutine that yielded from vfio_user_send_recv() or
vfio_user_pci_process_req() while waiting for that message.

(Although vfio_user_pci_process_req() could be called directly from the
receiver coroutine, it seems safer to have a separate coroutine that
processes requests so that the receiver isn't blocked in case
vfio_user_pci_process_req() yields while processing a request.)

Going back to what you mentioned above, the receiver coroutine does
something like this:

  if it's a reply
      reply = find_reply(...)
      qemu_coroutine_enter(reply->co) // instead of signalling reply->cv
  else
      was_empty = QSIMPLEQ_EMPTY(&pending_reqs) // check before inserting
      QSIMPLEQ_INSERT_TAIL(&pending_reqs, request, next)
      if was_empty
          qemu_coroutine_enter(process_request_co)

The pending_reqs queue holds incoming requests that the
process_request_co coroutine processes.
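
To make the dispatch step concrete, here is a rough standalone model in
plain C. It is only a sketch and not QEMU code: Msg, Waiter, Proxy and the
proxy_*() helpers are names I made up for illustration, the "woken" flag
stands in for qemu_coroutine_enter(reply->co), and the wakeup counter
stands in for entering process_request_co. Matching replies by a message
id is also an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message: only the fields the dispatcher needs. */
typedef struct Msg {
    uint16_t id;          /* assumed to pair a reply with its request */
    bool is_reply;
    struct Msg *next;     /* link in the pending-request FIFO */
} Msg;

/* Hypothetical record for a sender that yielded waiting for a reply. */
typedef struct Waiter {
    uint16_t id;
    Msg *reply;           /* filled in when the reply arrives */
    bool woken;           /* stand-in for qemu_coroutine_enter(reply->co) */
    struct Waiter *next;
} Waiter;

typedef struct Proxy {
    Waiter *waiters;             /* senders waiting for replies */
    Msg *req_head, *req_tail;    /* models the pending_reqs queue */
    unsigned processor_wakeups;  /* stand-in for entering process_request_co */
} Proxy;

/* Called by a sender before it yields. */
static void proxy_add_waiter(Proxy *p, Waiter *w)
{
    w->reply = NULL;
    w->woken = false;
    w->next = p->waiters;
    p->waiters = w;
}

/* One iteration of the receiver: route a message that was just read.
 * Returns false for a reply that nobody is waiting for. */
static bool proxy_dispatch(Proxy *p, Msg *m)
{
    assert(p != NULL && m != NULL);

    if (m->is_reply) {
        for (Waiter **pw = &p->waiters; *pw != NULL; pw = &(*pw)->next) {
            if ((*pw)->id == m->id) {
                Waiter *w = *pw;
                *pw = w->next;      /* unlink the waiter */
                w->reply = m;
                w->woken = true;    /* "enter" the waiting sender */
                return true;
            }
        }
        return false;
    }

    /* Request: queue it, and wake the processor only on empty -> non-empty. */
    bool was_empty = (p->req_head == NULL);
    m->next = NULL;
    if (was_empty) {
        p->req_head = m;
    } else {
        p->req_tail->next = m;
    }
    p->req_tail = m;
    if (was_empty) {
        p->processor_wakeups++;
    }
    return true;
}

/* What the request-processing side would call to drain the queue. */
static Msg *proxy_pop_request(Proxy *p)
{
    Msg *m = p->req_head;
    if (m != NULL) {
        p->req_head = m->next;
        if (p->req_head == NULL) {
            p->req_tail = NULL;
        }
    }
    return m;
}
```

The point of the was_empty check is that the processing side only needs
one wakeup per burst of requests; it keeps popping until the queue is
empty and then yields again.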

Stefan