From mboxrd@z Thu Jan  1 00:00:00 1970
From: Tetsuya Mukawa <mukawa@igel.co.jp>
Subject: Re: [PATCH v6 2/2] vhost: Add VHOST PMD
Date: Wed, 3 Feb 2016 11:13:44 +0900
Message-ID: <56B16258.4050607@igel.co.jp>
References: <1448355603-21275-2-git-send-email-mukawa@igel.co.jp>
 <1454411922-5597-3-git-send-email-mukawa@igel.co.jp>
 <20160202234319.GA925@sivlogin002.ir.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Cc: dev@dpdk.org, ann.zhuangyanying@huawei.com, yuanhan.liu@intel.com
To: ferruh.yigit@intel.com
Return-path: <dev-bounces@dpdk.org>
Received: from mail-pa0-f47.google.com (mail-pa0-f47.google.com
 [209.85.220.47]) by dpdk.org (Postfix) with ESMTP id E889B95CC
 for <dev@dpdk.org>; Wed,  3 Feb 2016 03:13:48 +0100 (CET)
Received: by mail-pa0-f47.google.com with SMTP id ho8so4422143pac.2
 for <dev@dpdk.org>; Tue, 02 Feb 2016 18:13:48 -0800 (PST)
In-Reply-To: <20160202234319.GA925@sivlogin002.ir.intel.com>
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

On 2016/02/03 8:43, Ferruh Yigit wrote:
> On Tue, Feb 02, 2016 at 08:18:42PM +0900, Tetsuya Mukawa wrote:
>> diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
>> index 33c9cea..5819cdb 100644
>> --- a/doc/guides/nics/index.rst
>> +++ b/doc/guides/nics/index.rst
>> @@ -47,6 +47,7 @@ Network Interface Controller Drivers
>>      nfp
>>      szedata2
>>      virtio
>> +    vhost
>>      vmxnet3
>>      pcap_ring
>>  
>> diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
> Should this be 2.3 release notes?

Hi Ferruh,

Thanks for your checking.
Yes, it should be. I will fix it.

>> +
>> +struct pmd_internal {
>> +	TAILQ_ENTRY(pmd_internal) next;
>> +	char *dev_name;
>> +	char *iface_name;
>> +	unsigned nb_rx_queues;
>> +	unsigned nb_tx_queues;
>> +	uint8_t port_id;
> nb_rx_queuues, nb_tx_queues and port_id are duplicated in rte_eth_dev_data, there is no reason to not use them but create new ones.
> But you may need to keep list of eth devices instead of internals_list for this update, not sure.
> please check: http://dpdk.org/dev/patchwork/patch/10284/

It seems I wrongly understand how to use queues in virtual PMD, then
duplicates some values.
Sure, I will follow above patch, and fix all same issues in my patch.
Also will check Null PMD fixing.

>> +	for (i = 0; i < internal->nb_rx_queues; i++) {
>> +		vq = internal->rx_vhost_queues[i];
>> +		if (vq == NULL)
>> +			continue;
>> +		vq->device = dev;
>> +		vq->internal = internal;
>> +		rte_vhost_enable_guest_notification(
>> +				dev, vq->virtqueue_id, 0);
> syntax: no line wrap required here

Will fix.

>
>> +	}
>> +	for (i = 0; i < internal->nb_tx_queues; i++) {
>> +		vq = internal->tx_vhost_queues[i];
>> +		if (vq == NULL)
>> +			continue;
>> +		vq->device = dev;
>> +		vq->internal = internal;
>> +		rte_vhost_enable_guest_notification(
>> +				dev, vq->virtqueue_id, 0);
> syntax: no line wrap required here

Will fix it also.

>> +
>> +static int
>> +vring_state_changed(struct virtio_net *dev, uint16_t vring, int enable)
>> +{
>> +	struct rte_vhost_vring_state *state;
>> +	struct pmd_internal *internal;
>> +#ifdef RTE_LIBRTE_VHOST_NUMA
>> +	int newnode, ret;
>> +#endif
>> +
>> +	if (dev == NULL) {
>> +		RTE_LOG(ERR, PMD, "Invalid argument\n");
>> +		return -1;
>> +	}
>> +
>> +	internal = find_internal_resource(dev->ifname);
> Can find_internal_resource() return NULL here?

Will add null pointer checking here.

>> +	state = vring_states[internal->port_id];
>> +	if (!state) {
>> +		RTE_LOG(ERR, PMD, "Unused port\n");
>> +		return -1;
>> +	}
>> +
>> +#ifdef RTE_LIBRTE_VHOST_NUMA
>> +	ret  = get_mempolicy(&newnode, NULL, 0, dev,
>> +			MPOL_F_NODE | MPOL_F_ADDR);
>> +	if (ret < 0) {
>> +		RTE_LOG(ERR, PMD, "Unknow numa node\n");
>> +		return -1;
>> +	}
>> +
>> +	rte_eth_devices[internal->port_id].data->numa_node = newnode;
> Isn't dev->priv already has eth_dev, do we need to access to eth_dev as rte_eth_devices[...]
> valid for above, instead of find_internal_resource() can't we use dev->priv->data->dev_private

'dev->priv' will be filled in new_device(). And this event will be come
before calling new_device().
Because of this, the function accesses rte_eth_devices[].

>
>> +static int
>> +eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
>> +		   uint16_t nb_rx_desc __rte_unused,
>> +		   unsigned int socket_id,
>> +		   const struct rte_eth_rxconf *rx_conf __rte_unused,
>> +		   struct rte_mempool *mb_pool)
>> +{
>> +	struct pmd_internal *internal = dev->data->dev_private;
>> +	struct vhost_queue *vq;
>> +
>> +	rte_free(internal->rx_vhost_queues[rx_queue_id]);
> Why free here, initially this value already should be NULL?
> If possible to call queue_setup multiple times, does make sense to free in rx/tx_queue_release() functions?

Yes, you are right.
I forgot to delete debug code.
Will remove it.

>> +
>> +	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
>> +			RTE_CACHE_LINE_SIZE, socket_id);
>> +	if (vq == NULL) {
>> +		RTE_LOG(ERR, PMD, "Failed to allocate memory for rx queue\n");
>> +		return -ENOMEM;
>> +	}
> Other vPMDs use static memory for queues in internals struct to escape from allocate/free with a cost of memory consumption
> Just another option if you prefer.

This is because queues may be in different numa node in vhost PMD case.

>> +	dev_info->min_rx_bufsize = 0;
>> +}
>> +
>> +static void
>> +eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
> why name is igb_stats, historical?

Yeah, it's came from copy and paste.
I will fix it and below issues you  pointed out.
I appreciate your carefully reviewing!

Tetsuya

>> +static int
>> +eth_dev_vhost_create(const char *name, int index,
>> +		     char *iface_name,
>> +		     int16_t queues,
>> +		     const unsigned numa_node)
>> +{
>> +	struct rte_eth_dev_data *data = NULL;
>> +	struct pmd_internal *internal = NULL;
>> +	struct rte_eth_dev *eth_dev = NULL;
>> +	struct ether_addr *eth_addr = NULL;
>> +	struct rte_vhost_vring_state *vring_state = NULL;
>> +
>> +	RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
>> +		numa_node);
>> +
>> +	/* now do all data allocation - for eth_dev structure, dummy pci driver
>> +	 * and internal (private) data
>> +	 */
>> +	data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
>> +	if (data == NULL)
>> +		goto error;
>> +
>> +	internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
>> +	if (internal == NULL)
>> +		goto error;
>> +
>> +	eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
>> +	if (eth_addr == NULL)
>> +		goto error;
>> +	*eth_addr = base_eth_addr;
>> +	eth_addr->addr_bytes[5] = index;
>> +
>> +	/* reserve an ethdev entry */
>> +	eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
>> +	if (eth_dev == NULL)
>> +		goto error;
>> +
>> +	TAILQ_INIT(&(eth_dev->link_intr_cbs));
>> +
>> +	/* now put it all together
>> +	 * - store queue data in internal,
>> +	 * - store numa_node info in ethdev data
>> +	 * - point eth_dev_data to internals
>> +	 * - and point eth_dev structure to new eth_dev_data structure
>> +	 */
>> +	internal->nb_rx_queues = queues;
>> +	internal->nb_tx_queues = queues;
>> +	internal->dev_name = strdup(name);
>> +	if (internal->dev_name == NULL)
>> +		goto error;
> eth_dev successfully allocated, do we need to do something in error case?
>
>> +	internal->iface_name = strdup(iface_name);
>> +	if (internal->iface_name == NULL) {
>> +		free(internal->dev_name);
>> +		goto error;
>> +	}
>> +	internal->port_id = eth_dev->data->port_id;
>> +
>> +	vring_state = rte_zmalloc_socket(name,
>> +			sizeof(*vring_state), 0, numa_node);
>> +	if (vring_state == NULL)
> free dev_name & iface_name.
>
>> +		goto error;
>> +	rte_spinlock_init(&vring_state->lock);
>> +	vring_states[eth_dev->data->port_id] = vring_state;
>> +
>> +	pthread_mutex_lock(&internal_list_lock);
>> +	TAILQ_INSERT_TAIL(&internals_list, internal, next);
>> +	pthread_mutex_unlock(&internal_list_lock);
>> +
>> +	data->dev_private = internal;
>> +	data->port_id = eth_dev->data->port_id;
>> +	memmove(data->name, eth_dev->data->name, sizeof(data->name));
>> +	data->nb_rx_queues = queues;
>> +	data->nb_tx_queues = queues;
>> +	data->dev_link = pmd_link;
>> +	data->mac_addrs = eth_addr;
>> +
>> +	/* We'll replace the 'data' originally allocated by eth_dev. So the
>> +	 * vhost PMD resources won't be shared between multi processes.
>> +	 */
>> +	eth_dev->data = data;
>> +	eth_dev->dev_ops = &ops;
>> +	eth_dev->driver = NULL;
>> +	eth_dev->data->dev_flags =
>> +		RTE_ETH_DEV_DETACHABLE | RTE_ETH_DEV_INTR_LSC;
>> +	eth_dev->data->kdrv = RTE_KDRV_NONE;
>> +	eth_dev->data->drv_name = internal->dev_name;
>> +	eth_dev->data->numa_node = numa_node;
> Cosmetics but you can access as data->xxx instead of eth_dev->data->xxx
>
>> +static int
>> +rte_pmd_vhost_devinit(const char *name, const char *params)
>> +{
>> +	struct rte_kvargs *kvlist = NULL;
>> +	int ret = 0;
>> +	int index;
>> +	char *iface_name;
>> +	uint16_t queues;
>> +
>> +	RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);
>> +
>> +	if (strlen(name) < strlen("eth_vhost"))
>> +		return -1;
> No need to do this check, rte_eal_vdev_init() already checks name and this functions called only if there is a match.
>
>> +
>> +	index = strtol(name + strlen("eth_vhost"), NULL, 0);
>> +	if (errno == ERANGE)
>> +		return -1;
> Does device name has to contain integer, does "eth_vhostA" valid name? If so does it make sense to use port_id instead of index?
>
>> +
>> +	kvlist = rte_kvargs_parse(params, valid_arguments);
>> +	if (kvlist == NULL)
>> +		return -1;
>> +
>> +	if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
>> +		ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
>> +					 &open_iface, &iface_name);
>> +		if (ret < 0)
>> +			goto out_free;
>> +	} else {
>> +		ret = -1;
>> +		goto out_free;
>> +	}
>> +
>> +	if (rte_kvargs_count(kvlist, ETH_VHOST_QUEUES_ARG) == 1) {
>> +		ret = rte_kvargs_process(kvlist, ETH_VHOST_QUEUES_ARG,
>> +					 &open_queues, &queues);
>> +		if (ret < 0)
>> +			goto out_free;
>> +
>> +	} else
>> +		queues = 1;
>> +
>> +	eth_dev_vhost_create(name, index,
>> +			iface_name, queues, rte_socket_id());
> syntax: no line wrap required here
>
>> +
>> +out_free:
>> +	rte_kvargs_free(kvlist);
>> +	return ret;
>> +}
>> +
>> +static int
>> +rte_pmd_vhost_devuninit(const char *name)
>> +{
>> +	struct rte_eth_dev *eth_dev = NULL;
>> +	struct pmd_internal *internal;
>> +	unsigned int i;
>> +
>> +	RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name);
>> +
>> +	if (name == NULL)
>> +		return -EINVAL;
> This check is not required, already done in rte_eal_vdev_uninit()
>
>> +
>> +	/* find an ethdev entry */
>> +	eth_dev = rte_eth_dev_allocated(name);
>> +	if (eth_dev == NULL)
>> +		return -ENODEV;
>> +
>> +	internal = eth_dev->data->dev_private;
>> +
>> +	rte_free(vring_states[internal->port_id]);
>> +	vring_states[internal->port_id] = NULL;
>> +
>> +	pthread_mutex_lock(&internal_list_lock);
>> +	TAILQ_REMOVE(&internals_list, internal, next);
>> +	pthread_mutex_unlock(&internal_list_lock);
>> +
>> +	eth_dev_stop(eth_dev);
>> +
>> +	if ((internal) && (internal->dev_name))
> if "internal" can be NULL, above internal->port_id reference will crash, if can't be NULL no need to check here.
>
>> +		free(internal->dev_name);
>> +	if ((internal) && (internal->iface_name))
> Do we need parentheses around internal->iface_name (and internal if it will stay)
>
>> +		free(internal->iface_name);
>> +
>> +	rte_free(eth_dev->data->mac_addrs);
>> +	rte_free(eth_dev->data);
>> +
>> +	for (i = 0; i < internal->nb_rx_queues; i++)
>> +		rte_free(internal->rx_vhost_queues[i]);
>> +	for (i = 0; i < internal->nb_tx_queues; i++)
>> +		rte_free(internal->tx_vhost_queues[i]);
>> +	rte_free(internal);
>> +
>> +	rte_eth_dev_release_port(eth_dev);
>> +
>> +	return 0;
>> +}
>> +
>> +static struct rte_driver pmd_vhost_drv = {
>> +	.name = "eth_vhost",
>> +	.type = PMD_VDEV,
>> +	.init = rte_pmd_vhost_devinit,
>> +	.uninit = rte_pmd_vhost_devuninit,
>> +};
>> +
>> +PMD_REGISTER_DRIVER(pmd_vhost_drv);
>> diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
>> new file mode 100644
>> index 0000000..8aa894a
>> --- /dev/null
>> +++ b/drivers/net/vhost/rte_eth_vhost.h
>> +/**
>> + * Disable features in feature_mask. Returns 0 on success.
>> + */
>> +int rte_eth_vhost_feature_disable(uint64_t feature_mask);
>> +
>> +/**
>> + *  Enable features in feature_mask. Returns 0 on success.
>> + */
>> +int rte_eth_vhost_feature_enable(uint64_t feature_mask);
>> +
>> +/* Returns currently supported vhost features */
> This also can be commented in doxygen style.
>
>> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
>> index 8ecab41..04f7087 100644
>> --- a/mk/rte.app.mk
>> +++ b/mk/rte.app.mk
>> @@ -159,7 +159,13 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_QAT)        += -lrte_pmd_qat
>>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)   += -lrte_pmd_aesni_mb
>>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)   += -L$(AESNI_MULTI_BUFFER_LIB_PATH) -lIPSec_MB
>>  
>> -endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
>> +ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
>> +
>> +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)      += -lrte_pmd_vhost
>> +
>> +endif # ! $(CONFIG_RTE_LIBRTE_VHOST)
>> +
>> +endif # $(CONFIG_RTE_BUILD_SHARED_LIB)
> I guess "!" in comment is to say this if block is for SHARED_LIB==n, if so we shouldn't update the comment to remove "!",
> And the line you have added should have "!" in comment : "endif # $(CONFIG_RTE_LIBRTE_VHOST)"
>
>>  
>>  endif # ! CONFIG_RTE_BUILD_COMBINE_LIBS
>>  
>> -- 
>> 2.1.4
>>