From mboxrd@z Thu Jan  1 00:00:00 1970
From: Tetsuya Mukawa <mukawa@igel.co.jp>
Subject: Re: [RFC PATCH v2] vhost: Add VHOST PMD
Date: Wed, 21 Oct 2015 13:30:54 +0900
Message-ID: <562714FE.4030504@igel.co.jp>
References: <1440993326-21205-1-git-send-email-mukawa@igel.co.jp>
 <1440993326-21205-2-git-send-email-mukawa@igel.co.jp>
 <74F120C019F4A64C9B78E802F6AD4CC24CB97F66@IRSMSX106.ger.corp.intel.com>
 <5620B804.1050300@igel.co.jp>
 <74F120C019F4A64C9B78E802F6AD4CC24F7AC24F@IRSMSX106.ger.corp.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Cc: "ann.zhuangyanying@huawei.com" <ann.zhuangyanying@huawei.com>
To: "Loftus, Ciara" <ciara.loftus@intel.com>, "dev@dpdk.org" <dev@dpdk.org>
In-Reply-To: <74F120C019F4A64C9B78E802F6AD4CC24F7AC24F@IRSMSX106.ger.corp.intel.com>

On 2015/10/20 23:13, Loftus, Ciara wrote:
>
>>>> +
>>>> +static uint16_t
>>>> +eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
>>>> +{
>>>> +	struct vhost_queue *r = q;
>>>> +	uint16_t i, nb_tx = 0;
>>>> +
>>>> +	if (unlikely(r->internal == NULL))
>>>> +		return 0;
>>>> +
>>>> +	if (unlikely(rte_atomic16_read(&r->internal->xfer) == 0))
>>>> +		return 0;
>>>> +
>>>> +	rte_atomic16_set(&r->tx_executing, 1);
>>>> +
>>>> +	if (unlikely(rte_atomic16_read(&r->internal->xfer) == 0))
>>>> +		goto out;
>>>> +
>>>> +	nb_tx = (uint16_t)rte_vhost_enqueue_burst(r->device,
>>>> +			VIRTIO_RXQ, bufs, nb_bufs);
>>>> +
>>>> +	rte_atomic64_add(&(r->tx_pkts), nb_tx);
>>>> +	rte_atomic64_add(&(r->err_pkts), nb_bufs - nb_tx);
>>>> +
>>>> +	for (i = 0; likely(i < nb_tx); i++)
>>>> +		rte_pktmbuf_free(bufs[i]);
>>> We may not always want to free these mbufs. For example, if a call is made
>>> to rte_eth_tx_burst with buffers from another (non DPDK) source, they may
>>> not be ours to free.
>>
>> Sorry, I am not sure what type of buffer you want to transfer.
>>
>> This is a PMD that wraps librte_vhost.
>> And I guess other PMDs cannot handle buffers from another non DPDK
>> source.
>> Should we take care of such buffers?
>>
>> I have also checked af_packet PMD.
>> It seems the tx function of af_packet PMD just frees mbuf.
> For example, when using the PMD with an application that receives buffers
> from another source, e.g. a virtual switch receiving packets from an
> interface using the kernel driver.

For example, suppose a software switch on the host sends data to a DPDK
application in the guest using the vhost PMD and the virtio-net PMD.
Also assume that the data the software switch transfers comes from a
kernel driver.
In this case, the data on the software switch is copied and
transferred to the virtio-net PMD through the virtqueue.
Because of this copy, we can free the buffers after sending.
Could you please also check the API documentation of rte_eth_tx_burst?
(Freeing the buffers is the default behavior.)
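
As a rough illustration of the copy-then-free argument, here is a minimal
standalone sketch in plain C. It is not the patch and does not use real
DPDK types: toy_mbuf, toy_virtqueue, toy_enqueue_burst and toy_tx_burst
are hypothetical names invented for this example. It only assumes that
the enqueue step copies the payload into storage owned by the receiving
side, so the sender's buffer is no longer needed once the copy is done.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_RING_SIZE 4
#define TOY_DATA_LEN  64

/* Hypothetical stand-in for an mbuf: a heap-allocated payload. */
struct toy_mbuf {
	uint16_t len;
	char data[TOY_DATA_LEN];
};

/* Hypothetical stand-in for a virtqueue: it owns its own copies. */
struct toy_virtqueue {
	uint16_t used;
	uint16_t len[TOY_RING_SIZE];
	char slot[TOY_RING_SIZE][TOY_DATA_LEN];
};

/* Allocate a toy buffer holding a (possibly truncated) copy of payload. */
static struct toy_mbuf *
toy_mbuf_alloc(const char *payload)
{
	struct toy_mbuf *m = calloc(1, sizeof(*m));
	size_t n;

	if (m == NULL)
		return NULL;
	n = strlen(payload);
	if (n >= TOY_DATA_LEN)
		n = TOY_DATA_LEN - 1;
	memcpy(m->data, payload, n);
	m->len = (uint16_t)n;
	return m;
}

/* Copy up to nb_bufs payloads into the ring; return how many fit. */
static uint16_t
toy_enqueue_burst(struct toy_virtqueue *vq, struct toy_mbuf **bufs,
		uint16_t nb_bufs)
{
	uint16_t i;

	for (i = 0; i < nb_bufs && vq->used < TOY_RING_SIZE; i++) {
		memcpy(vq->slot[vq->used], bufs[i]->data, bufs[i]->len);
		vq->slot[vq->used][bufs[i]->len] = '\0';
		vq->len[vq->used] = bufs[i]->len;
		vq->used++;
	}
	return i;
}

/* TX: copy first, then free only the buffers that were actually sent. */
static uint16_t
toy_tx_burst(struct toy_virtqueue *vq, struct toy_mbuf **bufs,
		uint16_t nb_bufs)
{
	uint16_t i, nb_tx = toy_enqueue_burst(vq, bufs, nb_bufs);

	for (i = 0; i < nb_tx; i++) {
		free(bufs[i]);
		bufs[i] = NULL;
	}
	return nb_tx;
}
```

Buffers beyond the returned count are left untouched in this sketch, so
the caller still owns them and can retry or free them itself.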

> I see that af_packet also frees the mbuf. I've checked the ixgbe and ring
> PMDs, though, and they don't seem to free the buffers, although I may have
> missed something: the code for these is rather large and I am unfamiliar
> with most of it. If I am correct, though, I wonder whether this behaviour
> should vary from PMD to PMD?

I guess the ring PMD is a special case.
Because we don't want to copy data with this PMD, its RX function doesn't
allocate buffers and its TX function doesn't free them.
But other normal PMDs allocate buffers when RX is called and free
buffers when TX is called.
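
The zero-copy case can be sketched in the same spirit. Again this is a
standalone toy in plain C, not the ring PMD itself: toy_ring, toy_ring_tx
and toy_ring_rx are hypothetical names. The assumption is only that TX
stores the buffer pointers and RX hands the very same pointers back, so
neither side allocates or frees anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_RING_SIZE 8 /* power of two, so index wrap-around stays consistent */

/* Hypothetical pointer ring shared by the TX and RX sides. */
struct toy_ring {
	uint16_t head; /* next slot to write */
	uint16_t tail; /* next slot to read */
	void *slot[TOY_RING_SIZE];
};

/* Number of pointers currently stored in the ring. */
static uint16_t
toy_ring_count(const struct toy_ring *r)
{
	return (uint16_t)(r->head - r->tail);
}

/* TX: store the pointers only; the buffers are neither copied nor freed. */
static uint16_t
toy_ring_tx(struct toy_ring *r, void **bufs, uint16_t nb_bufs)
{
	uint16_t i;

	for (i = 0; i < nb_bufs && toy_ring_count(r) < TOY_RING_SIZE; i++)
		r->slot[r->head++ % TOY_RING_SIZE] = bufs[i];
	return i;
}

/* RX: hand back the stored pointers; nothing is allocated here. */
static uint16_t
toy_ring_rx(struct toy_ring *r, void **bufs, uint16_t nb_bufs)
{
	uint16_t i;

	for (i = 0; i < nb_bufs && toy_ring_count(r) > 0; i++)
		bufs[i] = r->slot[r->tail++ % TOY_RING_SIZE];
	return i;
}
```

In this model ownership of each buffer simply moves from the sender to
the receiver, which is why freeing in TX would be wrong here.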

>>>> +
>>>> +
>>>> +	eth_dev = rte_eth_dev_allocated(internal->dev_name);
>>>> +	if (eth_dev == NULL) {
>>>> +		RTE_LOG(INFO, PMD, "failuer to find ethdev\n");
>>> Typo: Failure. Same for the destroy_device function
>> Thanks, I will fix it in next patches.
>>
>>>> +		return -1;
>>>> +	}
>>>> +
>>>> +	internal = eth_dev->data->dev_private;
>>>> +
>>>> +	for (i = 0; i < internal->nb_rx_queues; i++) {
>>>> +		vq = &internal->rx_vhost_queues[i];
>>>> +		vq->device = dev;
>>>> +		vq->internal = internal;
>>>> +	}
>>>> +	for (i = 0; i < internal->nb_tx_queues; i++) {
>>>> +		vq = &internal->tx_vhost_queues[i];
>>>> +		vq->device = dev;
>>>> +		vq->internal = internal;
>>>> +	}
>>>> +
>>>> +	dev->flags |= VIRTIO_DEV_RUNNING;
>>>> +	dev->priv = eth_dev;
>>>> +
>>>> +	eth_dev->data->dev_link.link_status = 1;
>>>> +	rte_atomic16_set(&internal->xfer, 1);
>>>> +
>>>> +	RTE_LOG(INFO, PMD, "New connection established\n");
>>>> +
>>>> +	return 0;
>>> Some freedom is taken away if the new_device and destroy_device
>>> callbacks are implemented in the driver.
>>> For example if one wishes to call the rte_vhost_enable_guest_notification
>>> function when a new device is brought up. They cannot now as there is no
>>> scope to modify these callbacks, as is done in for example the vHost sample
>>> app. Is this correct?
>>
>> So how about adding one more parameter to be able to choose guest
>> notification behavior?
>>
>> ex)
>> ./testpmd --vdev 'eth_vhost0,iface=/tmp/sock0,guest_notification=0'
>>
>> In above case, all queues in this device will have
>> VRING_USED_F_NO_NOTIFY.
> I'm not too concerned about this particular function; I was just giving an
> example. The main concern I was expressing here, and in the other thread
> with Bruce, is the risk that we will lose some functionality that is
> available in the library but not in the PMD. This function is an example of
> that. If we could find some way to retain the functionality available in
> the library, it would be ideal.

I will reply in the other thread.
Anyway, I am going to keep the current vhost library APIs.

Thanks,
Tetsuya