On Wed, May 15, 2019 at 04:33:23PM +0000, Anton Yakovlev wrote:
> 
> >> After careful consideration, I think polling mode should make it possible, for example by using a dedicated "data" queue per stream in polling mode (multiplexing is not an option, since it reduces queue throughput in proportion to the number of multiplexed streams). How should this be described in the specification?
> >
> > Polling is an implementation detail for both drivers and devices.
> > Therefore it's not described explicitly in the specification.
> 
> And the only source of information in such a case is to look at existing implementations?

Yes.  The spec focusses on the hardware interface and the semantics but
not on how to implement it.

> > By the way, virtqueue buffers can be completed out-of-order by the
> > device.  This means sharing a virtqueue between multiple streams does
> > not necessarily introduce any kind of waiting.
> >
> > The only issue is the virtqueue size (e.g. 128 buffers) determines how
> > many buffers can be made available from the driver to the device at any
> > given time.  Some device implementations use virtqueue sizes like 1024
> > so that this does not become a practical concern.  Does this change your
> > view on multiplexing streams?
> 
> Personally, I'm not a big fan of it, because:
> 
> 1. It makes the overall logic more complex: you need to serialize access to the queue on the driver side and dispatch messages (find the recipient, serialize access, etc.) on the device side. With separate queues you have only one producer and one consumer, and both know ahead of time how to handle buffers.
> 2. It requires defining a message header for buffer transfers. With separate queues you just put a pointer to the buffer and its length into the descriptors.
> 
> The simpler the better. And with multiplexing you basically have no pros except having one queue. Why are separate queues undesirable?

I'm not sure which approach is best either.  In the real-time audio
programming I've done there was a function like this:

  void process(const float **in_frames, int in_channels,
               float **out_frames, int out_channels,
               int nframes);

This function processes audio frames periodically and is a callback from
the real-time audio API.
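
For illustration, here is a minimal sketch of what a body for such a
callback might look like.  The buffer layout is an assumption on my part:
in_frames[c] and out_frames[c] are each taken to point at nframes samples
for channel c (non-interleaved).  The pass-through behaviour is purely
hypothetical and not taken from any particular audio API.

```c
#include <assert.h>

/*
 * Hypothetical callback body: copy each input channel to the output
 * channel with the same index and silence any extra output channels.
 * Assumes non-interleaved buffers of nframes samples per channel.
 */
void process(const float **in_frames, int in_channels,
             float **out_frames, int out_channels,
             int nframes)
{
    assert(in_channels >= 0 && out_channels >= 0 && nframes >= 0);

    for (int c = 0; c < out_channels; c++) {
        for (int i = 0; i < nframes; i++) {
            /* Output channels without a matching input get silence. */
            out_frames[c][i] = (c < in_channels) ? in_frames[c][i] : 0.0f;
        }
    }
}
```

The point is only that all channels for one period arrive together in a
single call.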

If the device multiplexes streams on a single virtqueue and uses the
batched request layout that I previously described, then the driver
implementation is very simple.  There is no need to bring together audio
data supplied through multiple virtqueues because it all comes in a
single buffer on a single virtqueue.  You mentioned that separate queues
avoid synchronization but in this case it seems to actually introduce a
synchronization requirement.
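
To make the single-buffer idea concrete, here is a rough sketch of how a
driver could pack one period of audio from several streams into one buffer
before placing it on the virtqueue.  The layout (struct batch_hdr followed
by the samples of each stream in turn) and the name pack_batch() are
hypothetical and made up for this example; this is not the request layout
from my earlier mail nor anything defined by the specification.  It also
ignores endianness and stores values in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header for a batched buffer (illustration only). */
struct batch_hdr {
    uint32_t nstreams;  /* number of streams in this buffer */
    uint32_t nframes;   /* samples per stream */
};

/*
 * Pack nframes samples from each of nstreams streams into buf, one
 * stream after another.  Returns the number of bytes used, or 0 if
 * buf is too small.
 */
size_t pack_batch(uint8_t *buf, size_t buf_len,
                  const float **streams, uint32_t nstreams,
                  uint32_t nframes)
{
    assert(buf != NULL && streams != NULL);

    struct batch_hdr hdr = { nstreams, nframes };
    size_t per_stream = (size_t)nframes * sizeof(float);
    size_t total = sizeof(hdr) + (size_t)nstreams * per_stream;

    if (total > buf_len)
        return 0;

    memcpy(buf, &hdr, sizeof(hdr));
    for (uint32_t s = 0; s < nstreams; s++) {
        /* Stream s starts right after the header and s earlier streams. */
        memcpy(buf + sizeof(hdr) + s * per_stream, streams[s], per_stream);
    }
    return total;
}
```

With something like this, process() fills or drains one buffer per period
and there is nothing to collect from other virtqueues.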

Having a single virtqueue where all audio data flows between the
process() function and the device is simple and minimizes per-virtqueue
overhead.  In an interrupt-driven implementation there are per-virtqueue
virtio-pci Queue Notify register writes and per-virtqueue interrupt
handlers, so this can generate a lot of vmexits and interrupts compared
to the multiplexed batched approach.
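
As a back-of-the-envelope illustration, the helper below estimates an upper
bound on notification events per second.  It assumes the worst case of one
Queue Notify write and one interrupt per virtqueue per audio period, with
no notification suppression or interrupt coalescing.  The function name
and the numbers in the usage note are hypothetical.

```c
#include <assert.h>

/*
 * Worst-case notification events per second: each virtqueue costs one
 * Queue Notify write plus one interrupt per audio period.
 */
unsigned long events_per_second(unsigned nqueues, unsigned period_us)
{
    assert(period_us > 0);

    unsigned long periods_per_second = 1000000UL / period_us;

    /* Two events per queue per period: one kick and one interrupt. */
    return 2UL * nqueues * periods_per_second;
}
```

For example, under these assumptions 8 per-stream virtqueues with a 10 ms
period give events_per_second(8, 10000) = 1600, while a single multiplexed
virtqueue gives events_per_second(1, 10000) = 200.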

If your application processes audio streams in different threads and
without inter-stream synchronization, then I agree that multiple
virtqueues are the way to go.

I don't have a strong opinion on the optimal approach, just wanted to
explain why I had this idea.  Please choose as you wish.

Stefan