From: Maxime Coquelin <maxime.coquelin@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: mst@redhat.com, vkaplans@redhat.com, jasowang@redhat.com,
	wexu@redhat.com, yuanhan.liu@linux.intel.com,
	virtio-comment@lists.oasis-open.org, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [RFC 2/2] spec/vhost-user spec: Add IOMMU support
Date: Wed, 12 Apr 2017 09:24:47 +0200	[thread overview]
Message-ID: <0f3cde33-98f0-c7b0-2f3b-372a26b83384@redhat.com> (raw)
In-Reply-To: <20170412071708.GE16464@pxdev.xzpeter.org>



On 04/12/2017 09:17 AM, Peter Xu wrote:
> On Tue, Apr 11, 2017 at 05:16:19PM +0200, Maxime Coquelin wrote:
>> On 04/11/2017 03:20 PM, Peter Xu wrote:
>>> On Tue, Apr 11, 2017 at 12:10:02PM +0200, Maxime Coquelin wrote:
>
> [...]
>
>>>
>>>> +slave is expected to reply with a zero payload, non-zero otherwise.
>>>
>>> Is this ack mechanism really necessary? If not, it might be nicer to
>>> keep vhost-user/vhost-kernel aligned on this behavior. At least that
>>> would simplify the vhost-user implementation on the QEMU side (IIUC,
>>> even without introducing new functions for update/invalidate
>>> operations).
>>
>> I think this is necessary, and it won't complicate the vhost-user
>> implementation on the QEMU side, since such a mechanism is already
>> widely used (see the reply-ack feature).
>
> Could you provide a file/function/link pointer to the "reply-ack"
> feature? I failed to find it myself.
>
>>
>> This reply-ack mechanism is used to obtain a behaviour closer to the
>> kernel backend's. Indeed, when QEMU sends a vhost_msg to the kernel
>> backend, it is blocked in the write() while the message is being
>> processed in the kernel. With a user backend, QEMU is unblocked from
>> the write() as soon as the backend has read the message, before it
>> has been processed.
>>
>
> I see. Then I agree with you that we may need a synchronous way to do
> it. One case I can think of is IOMMU page invalidation - it should be
> a synchronous operation, to make sure that all the related cache
> entries have been destroyed by the time the invalidation command
> returns in the QEMU vIOMMU emulation path.
>
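Right, and that is what the ack gives us here: the vIOMMU emulation
path can block until the backend has dropped the entries. As a rough
illustration (the helper names below are made up for the sketch, this
is not the actual QEMU code):

#include <stdint.h>

/* Hypothetical helpers, for the sketch only -- not QEMU's real API. */
int send_iotlb_invalidate_on_channel_a(uint64_t iova, uint64_t size);
int wait_for_backend_ack(void);

/*
 * Called from the vIOMMU emulation path: only returns once the backend
 * has acked, i.e. once all related IOTLB cache entries are gone.
 */
static int vhost_user_iotlb_invalidate(uint64_t iova, uint64_t size)
{
    if (send_iotlb_invalidate_on_channel_a(iova, size) < 0)
        return -1;

    return wait_for_backend_ack();
}
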
>>
>>>> +
>>>> +When the VHOST_USER_PROTOCOL_F_SLAVE_REQ is supported by the slave, and the
>>>> +master initiated the slave to master communication channel using the
>>>> +VHOST_USER_SET_SLAVE_REQ_FD request, the slave can send IOTLB miss and access
>>>> +failure events by sending VHOST_USER_IOTLB_MSG requests to the master with a
>>>> +struct vhost_iotlb_msg payload. For miss events, the iotlb payload has to be
>>>> +filled with the miss message type (1), the I/O virtual address and the
>>>> +permissions flags. For access failure event, the iotlb payload has to be
>>>> +filled with the access failure message type (4), the I/O virtual address and
>>>> +the permissions flags. On success, the master is expected to reply  when the
>>>> +request has been handled (for example, on miss requests, once the device IOTLB
>>>> +has been updated) with a zero payload, non-zero otherwise.
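
For reference, the payload reused here is the kernel's struct
vhost_iotlb_msg. As far as I recall, the definition in the Linux uapi
header <linux/vhost.h> looks like the below (please double-check
against the header); the type values 1 and 4 quoted above correspond
to VHOST_IOTLB_MISS and VHOST_IOTLB_ACCESS_FAIL:

/* Recalled from <linux/vhost.h>, which uses the __u64/__u8 types
 * from <linux/types.h>; check the header for the authoritative
 * definition. */
struct vhost_iotlb_msg {
	__u64 iova;
	__u64 size;
	__u64 uaddr;
#define VHOST_ACCESS_RO      0x1
#define VHOST_ACCESS_WO      0x2
#define VHOST_ACCESS_RW      0x3
	__u8 perm;
#define VHOST_IOTLB_MISS           1
#define VHOST_IOTLB_UPDATE         2
#define VHOST_IOTLB_INVALIDATE     3
#define VHOST_IOTLB_ACCESS_FAIL    4
	__u8 type;
};
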
>>>
>>> I failed to understand the last sentence clearly. IIUC, vhost-net
>>> replies with an UPDATE message when a MISS message is received.
>>> Here, for vhost-user, are we going to send one extra zero payload
>>> after that?
>>
>> Not exactly. There are two channels: one for QEMU-to-backend requests
>> (channel A), and one for backend-to-QEMU requests (channel B).
>>
>> The backend may be multi-threaded (like DPDK): one thread handles
>> QEMU-initiated requests (channel A), while the others handle packet
>> processing (e.g. one for Rx, one for Tx).
>>
>> The processing threads need to translate IOVA addresses by searching
>> the IOTLB cache. In case of a miss, the thread sends an IOTLB miss
>> request on channel B, and then waits for the ack/nack. In case of an
>> ack, it can search the IOTLB cache again and find the translation.
>>
>> On the QEMU side, when the thread handling channel B requests
>> receives the IOTLB miss message, it gets the translation and sends an
>> IOTLB update message on channel A. It then waits for the ack from the
>> backend, meaning that the IOTLB cache has been updated, and finally
>> replies with an ack on channel B.
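
To make that flow concrete, here is roughly what a processing thread
could do -- just a sketch, the helpers are hypothetical and not the
actual DPDK code:

#include <stdint.h>

/* Hypothetical helpers, for the sketch only -- not a real API. */
uint64_t iotlb_cache_lookup(uint64_t iova, uint8_t perm);       /* 0 on miss */
int send_iotlb_miss_on_channel_b(uint64_t iova, uint8_t perm);  /* 0 on ack */

/* Translate an I/O virtual address, sending a miss request if needed. */
static uint64_t translate_iova(uint64_t iova, uint8_t perm)
{
    uint64_t addr = iotlb_cache_lookup(iova, perm);

    if (addr)
        return addr;

    /*
     * Miss: send a VHOST_USER_IOTLB_MSG (miss type) on the backend-to-
     * QEMU channel (channel B) and block until QEMU acks.  By then QEMU
     * has sent the IOTLB update on channel A and the channel A thread
     * has acked it, so the translation should now be in the cache.
     */
    if (send_iotlb_miss_on_channel_b(iova, perm) != 0)
        return 0; /* nack: no valid translation */

    return iotlb_cache_lookup(iova, perm);
}
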
>
> If the ack on channel B is used to notify the processing thread that
> "the cache is ready", then... would it be faster to just let the
> processing thread poll the cache until it finds the entry, or to let
> the other thread notify it when it receives the ack on channel A? I'm
> not sure which would be faster.

Not sure either.
Not requiring an ack can indeed make sense in some cases, for example
with single-threaded backends.

What we can do is remove the mandatory ack reply for
VHOST_USER_IOTLB_MSG slave requests (miss, access failure).
The backend can then just rely on the REPLY_ACK feature, and set the
VHOST_USER_NEED_REPLY flag if it wants to receive such an ack.
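
Concretely, the backend would then only get an ack when it asks for
one by setting the need_reply bit in the request's header -- something
along these lines (just a sketch; the flag bit is the one already
defined for REPLY_ACK in the spec, the helper itself is made up):

#include <stdint.h>
#include <string.h>

/* vhost-user message header, as described in the spec. */
struct vhost_user_msg_hdr {
    uint32_t request;   /* e.g. the new VHOST_USER_IOTLB_MSG request */
    uint32_t flags;     /* bits 0-1: version, bit 3: need_reply */
    uint32_t size;      /* size of the payload that follows */
};

#define VHOST_USER_VERSION          0x1
#define VHOST_USER_NEED_REPLY_MASK  (0x1 << 3)

/* Hypothetical helper: only request an ack when the caller needs one. */
static void vhost_user_fill_hdr(struct vhost_user_msg_hdr *hdr,
                                uint32_t request, uint32_t payload_size,
                                int need_reply)
{
    memset(hdr, 0, sizeof(*hdr));
    hdr->request = request;
    hdr->flags   = VHOST_USER_VERSION;
    if (need_reply)
        hdr->flags |= VHOST_USER_NEED_REPLY_MASK;
    hdr->size    = payload_size;
}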

Would that be fine with you?

Thanks,
Maxime
