Hi,

>> I've seen you have a notify_ops in the vmci bits.  Do you have different
>> notify ops depending on socket type or something?  Does it make sense to
>> move the notify ops ptr into "struct vsock_sock" maybe?
> 
> The notify stuff only applies to STREAMs.  However, we have two different
> notify impls, one for legacy ESX and one for newer, and we figure out at
> runtime which protocol we're using with the hypervisor and set the
> callbacks appropriately.

Ok.

> The difference between the two is that the
> newer one is much smarter and knows not to signal (the peer) quite so much,
> i.e., it has some basic but sensible flow-control, which improves
> performance quite a bit.

Yea, with that background it makes more sense.  I think we can reduce
number of hooks though.

Looking at vsock_stream_sendmsg:
  * notify_send_init -- needed I guess.
  * notify_send_pre_block -- needed too, I guess you'll force signaling
    the peer here (buffers full and there is more data).
  * notify_send_pre_enqueue + notify_send_post_enqueue look like they
    could easily be folded into stream_enqueue.

> Again, that might not make any sense at all
> for virtio.  Do you need to signal when you enqueue to a ring?  And is
> there coalescing?

Sure virtio signals, we have to if we don't want poll ;)

Usually virtio only signals in case it finds the ring empty, if there
are requests not yet processed by the peer there is no need to signal.
The ring lives in guest memory and to figure whenever the peer has seen
the requests in there we only need a memory barrier, not a vmexit.

In the end it is up to the driver when he signals the peer.  The virtio
net driver goes beyond the simple "signal-when-the-ring-is-empty" and
has some logic for coalescing things when there is a high network load,
while trying to not trade that for high latencies when the virtual nic
is almost idle.

I guess you try archiving something simliar with vsock/vmci using the
notifications, and yes, we might want use them with virtio too some day.
 Most likely not for the first revision though, there are more important
issues to tackle first.

>> And can we make it optional please (i.e. allow the function pointers to
>> be NULL)?
> 
> They were originally allowed to be NULL, but I changed it in the last
> round of patches while moving them into the transport, since I disliked
> the NULL checks so much.  I can put them back, but that's a bigger
> change, and I'm not sure we want to push large patches to Dave right
> now :)

Yea, it's probably better to discuss+collect the refinements and send
them batched.

>> Which problem you are trying to tackle with the notifications?
> 
> It's to do with signaling the peer, or more appropriately, trying to
> avoid signaling the peer when possible.  The naive impl. is to signal
> every time we enqueue or dequeue data (into our VMCI queuepairs).
> But signaling is slow, since it involves a world exit, so we prefer
> not to.  Which means we need to keep track of rate of flow and figure
> out when we should and should not, and that's what all the notification
> stuff does.  It's...ugly...

But makes sense even if it isn't that pretty.

But /me wonders why you maintain the notify state per-call and not
per-socket/queuepair.  For the data flow it shouldn't make a difference
whenever the application does a few send(big-buffer) or many
send(small-buffer) calls.

>>> For the VMCI transport, it indicates if the underlying queuepair is
>>> still around (i.e., make sure we haven't torn it down while sleeping
>>> in a blocking send or receive).  Perhaps it's not the best name?
>>
>> How you'd hit that?  Peer closing the socket while sleeping?  Other
>> thread closing the socket wile sleeping?  Both?
>>
>> I think a state field in struct vsock_sock would be a better solution here.
> 
> Hrm, lemme think about this one.

There is already vsock->sk.sk_state ...

cheers,
  Gerd