From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tetsuya Mukawa Subject: Re: [PATCH v6 2/2] vhost: Add VHOST PMD Date: Wed, 3 Feb 2016 11:13:44 +0900 Message-ID: <56B16258.4050607@igel.co.jp> References: <1448355603-21275-2-git-send-email-mukawa@igel.co.jp> <1454411922-5597-3-git-send-email-mukawa@igel.co.jp> <20160202234319.GA925@sivlogin002.ir.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Cc: dev@dpdk.org, ann.zhuangyanying@huawei.com, yuanhan.liu@intel.com To: ferruh.yigit@intel.com Return-path: Received: from mail-pa0-f47.google.com (mail-pa0-f47.google.com [209.85.220.47]) by dpdk.org (Postfix) with ESMTP id E889B95CC for ; Wed, 3 Feb 2016 03:13:48 +0100 (CET) Received: by mail-pa0-f47.google.com with SMTP id ho8so4422143pac.2 for ; Tue, 02 Feb 2016 18:13:48 -0800 (PST) In-Reply-To: <20160202234319.GA925@sivlogin002.ir.intel.com> List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On 2016/02/03 8:43, Ferruh Yigit wrote: > On Tue, Feb 02, 2016 at 08:18:42PM +0900, Tetsuya Mukawa wrote: >> diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst >> index 33c9cea..5819cdb 100644 >> --- a/doc/guides/nics/index.rst >> +++ b/doc/guides/nics/index.rst >> @@ -47,6 +47,7 @@ Network Interface Controller Drivers >> nfp >> szedata2 >> virtio >> + vhost >> vmxnet3 >> pcap_ring >> >> diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst > Should this be 2.3 release notes? Hi Ferruh, Thanks for your checking. Yes, it should be. I will fix it. >> + >> +struct pmd_internal { >> + TAILQ_ENTRY(pmd_internal) next; >> + char *dev_name; >> + char *iface_name; >> + unsigned nb_rx_queues; >> + unsigned nb_tx_queues; >> + uint8_t port_id; > nb_rx_queuues, nb_tx_queues and port_id are duplicated in rte_eth_dev_data, there is no reason to not use them but create new ones. > But you may need to keep list of eth devices instead of internals_list for this update, not sure. > please check: http://dpdk.org/dev/patchwork/patch/10284/ It seems I wrongly understand how to use queues in virtual PMD, then duplicates some values. Sure, I will follow above patch, and fix all same issues in my patch. Also will check Null PMD fixing. >> + for (i = 0; i < internal->nb_rx_queues; i++) { >> + vq = internal->rx_vhost_queues[i]; >> + if (vq == NULL) >> + continue; >> + vq->device = dev; >> + vq->internal = internal; >> + rte_vhost_enable_guest_notification( >> + dev, vq->virtqueue_id, 0); > syntax: no line wrap required here Will fix. > >> + } >> + for (i = 0; i < internal->nb_tx_queues; i++) { >> + vq = internal->tx_vhost_queues[i]; >> + if (vq == NULL) >> + continue; >> + vq->device = dev; >> + vq->internal = internal; >> + rte_vhost_enable_guest_notification( >> + dev, vq->virtqueue_id, 0); > syntax: no line wrap required here Will fix it also. >> + >> +static int >> +vring_state_changed(struct virtio_net *dev, uint16_t vring, int enable) >> +{ >> + struct rte_vhost_vring_state *state; >> + struct pmd_internal *internal; >> +#ifdef RTE_LIBRTE_VHOST_NUMA >> + int newnode, ret; >> +#endif >> + >> + if (dev == NULL) { >> + RTE_LOG(ERR, PMD, "Invalid argument\n"); >> + return -1; >> + } >> + >> + internal = find_internal_resource(dev->ifname); > Can find_internal_resource() return NULL here? Will add null pointer checking here. >> + state = vring_states[internal->port_id]; >> + if (!state) { >> + RTE_LOG(ERR, PMD, "Unused port\n"); >> + return -1; >> + } >> + >> +#ifdef RTE_LIBRTE_VHOST_NUMA >> + ret = get_mempolicy(&newnode, NULL, 0, dev, >> + MPOL_F_NODE | MPOL_F_ADDR); >> + if (ret < 0) { >> + RTE_LOG(ERR, PMD, "Unknow numa node\n"); >> + return -1; >> + } >> + >> + rte_eth_devices[internal->port_id].data->numa_node = newnode; > Isn't dev->priv already has eth_dev, do we need to access to eth_dev as rte_eth_devices[...] > valid for above, instead of find_internal_resource() can't we use dev->priv->data->dev_private 'dev->priv' will be filled in new_device(). And this event will be come before calling new_device(). Because of this, the function accesses rte_eth_devices[]. > >> +static int >> +eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id, >> + uint16_t nb_rx_desc __rte_unused, >> + unsigned int socket_id, >> + const struct rte_eth_rxconf *rx_conf __rte_unused, >> + struct rte_mempool *mb_pool) >> +{ >> + struct pmd_internal *internal = dev->data->dev_private; >> + struct vhost_queue *vq; >> + >> + rte_free(internal->rx_vhost_queues[rx_queue_id]); > Why free here, initially this value already should be NULL? > If possible to call queue_setup multiple times, does make sense to free in rx/tx_queue_release() functions? Yes, you are right. I forgot to delete debug code. Will remove it. >> + >> + vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue), >> + RTE_CACHE_LINE_SIZE, socket_id); >> + if (vq == NULL) { >> + RTE_LOG(ERR, PMD, "Failed to allocate memory for rx queue\n"); >> + return -ENOMEM; >> + } > Other vPMDs use static memory for queues in internals struct to escape from allocate/free with a cost of memory consumption > Just another option if you prefer. This is because queues may be in different numa node in vhost PMD case. >> + dev_info->min_rx_bufsize = 0; >> +} >> + >> +static void >> +eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats) > why name is igb_stats, historical? Yeah, it's came from copy and paste. I will fix it and below issues you pointed out. I appreciate your carefully reviewing! Tetsuya >> +static int >> +eth_dev_vhost_create(const char *name, int index, >> + char *iface_name, >> + int16_t queues, >> + const unsigned numa_node) >> +{ >> + struct rte_eth_dev_data *data = NULL; >> + struct pmd_internal *internal = NULL; >> + struct rte_eth_dev *eth_dev = NULL; >> + struct ether_addr *eth_addr = NULL; >> + struct rte_vhost_vring_state *vring_state = NULL; >> + >> + RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n", >> + numa_node); >> + >> + /* now do all data allocation - for eth_dev structure, dummy pci driver >> + * and internal (private) data >> + */ >> + data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node); >> + if (data == NULL) >> + goto error; >> + >> + internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node); >> + if (internal == NULL) >> + goto error; >> + >> + eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node); >> + if (eth_addr == NULL) >> + goto error; >> + *eth_addr = base_eth_addr; >> + eth_addr->addr_bytes[5] = index; >> + >> + /* reserve an ethdev entry */ >> + eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL); >> + if (eth_dev == NULL) >> + goto error; >> + >> + TAILQ_INIT(&(eth_dev->link_intr_cbs)); >> + >> + /* now put it all together >> + * - store queue data in internal, >> + * - store numa_node info in ethdev data >> + * - point eth_dev_data to internals >> + * - and point eth_dev structure to new eth_dev_data structure >> + */ >> + internal->nb_rx_queues = queues; >> + internal->nb_tx_queues = queues; >> + internal->dev_name = strdup(name); >> + if (internal->dev_name == NULL) >> + goto error; > eth_dev successfully allocated, do we need to do something in error case? > >> + internal->iface_name = strdup(iface_name); >> + if (internal->iface_name == NULL) { >> + free(internal->dev_name); >> + goto error; >> + } >> + internal->port_id = eth_dev->data->port_id; >> + >> + vring_state = rte_zmalloc_socket(name, >> + sizeof(*vring_state), 0, numa_node); >> + if (vring_state == NULL) > free dev_name & iface_name. > >> + goto error; >> + rte_spinlock_init(&vring_state->lock); >> + vring_states[eth_dev->data->port_id] = vring_state; >> + >> + pthread_mutex_lock(&internal_list_lock); >> + TAILQ_INSERT_TAIL(&internals_list, internal, next); >> + pthread_mutex_unlock(&internal_list_lock); >> + >> + data->dev_private = internal; >> + data->port_id = eth_dev->data->port_id; >> + memmove(data->name, eth_dev->data->name, sizeof(data->name)); >> + data->nb_rx_queues = queues; >> + data->nb_tx_queues = queues; >> + data->dev_link = pmd_link; >> + data->mac_addrs = eth_addr; >> + >> + /* We'll replace the 'data' originally allocated by eth_dev. So the >> + * vhost PMD resources won't be shared between multi processes. >> + */ >> + eth_dev->data = data; >> + eth_dev->dev_ops = &ops; >> + eth_dev->driver = NULL; >> + eth_dev->data->dev_flags = >> + RTE_ETH_DEV_DETACHABLE | RTE_ETH_DEV_INTR_LSC; >> + eth_dev->data->kdrv = RTE_KDRV_NONE; >> + eth_dev->data->drv_name = internal->dev_name; >> + eth_dev->data->numa_node = numa_node; > Cosmetics but you can access as data->xxx instead of eth_dev->data->xxx > >> +static int >> +rte_pmd_vhost_devinit(const char *name, const char *params) >> +{ >> + struct rte_kvargs *kvlist = NULL; >> + int ret = 0; >> + int index; >> + char *iface_name; >> + uint16_t queues; >> + >> + RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name); >> + >> + if (strlen(name) < strlen("eth_vhost")) >> + return -1; > No need to do this check, rte_eal_vdev_init() already checks name and this functions called only if there is a match. > >> + >> + index = strtol(name + strlen("eth_vhost"), NULL, 0); >> + if (errno == ERANGE) >> + return -1; > Does device name has to contain integer, does "eth_vhostA" valid name? If so does it make sense to use port_id instead of index? > >> + >> + kvlist = rte_kvargs_parse(params, valid_arguments); >> + if (kvlist == NULL) >> + return -1; >> + >> + if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) { >> + ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG, >> + &open_iface, &iface_name); >> + if (ret < 0) >> + goto out_free; >> + } else { >> + ret = -1; >> + goto out_free; >> + } >> + >> + if (rte_kvargs_count(kvlist, ETH_VHOST_QUEUES_ARG) == 1) { >> + ret = rte_kvargs_process(kvlist, ETH_VHOST_QUEUES_ARG, >> + &open_queues, &queues); >> + if (ret < 0) >> + goto out_free; >> + >> + } else >> + queues = 1; >> + >> + eth_dev_vhost_create(name, index, >> + iface_name, queues, rte_socket_id()); > syntax: no line wrap required here > >> + >> +out_free: >> + rte_kvargs_free(kvlist); >> + return ret; >> +} >> + >> +static int >> +rte_pmd_vhost_devuninit(const char *name) >> +{ >> + struct rte_eth_dev *eth_dev = NULL; >> + struct pmd_internal *internal; >> + unsigned int i; >> + >> + RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name); >> + >> + if (name == NULL) >> + return -EINVAL; > This check is not required, already done in rte_eal_vdev_uninit() > >> + >> + /* find an ethdev entry */ >> + eth_dev = rte_eth_dev_allocated(name); >> + if (eth_dev == NULL) >> + return -ENODEV; >> + >> + internal = eth_dev->data->dev_private; >> + >> + rte_free(vring_states[internal->port_id]); >> + vring_states[internal->port_id] = NULL; >> + >> + pthread_mutex_lock(&internal_list_lock); >> + TAILQ_REMOVE(&internals_list, internal, next); >> + pthread_mutex_unlock(&internal_list_lock); >> + >> + eth_dev_stop(eth_dev); >> + >> + if ((internal) && (internal->dev_name)) > if "internal" can be NULL, above internal->port_id reference will crash, if can't be NULL no need to check here. > >> + free(internal->dev_name); >> + if ((internal) && (internal->iface_name)) > Do we need parentheses around internal->iface_name (and internal if it will stay) > >> + free(internal->iface_name); >> + >> + rte_free(eth_dev->data->mac_addrs); >> + rte_free(eth_dev->data); >> + >> + for (i = 0; i < internal->nb_rx_queues; i++) >> + rte_free(internal->rx_vhost_queues[i]); >> + for (i = 0; i < internal->nb_tx_queues; i++) >> + rte_free(internal->tx_vhost_queues[i]); >> + rte_free(internal); >> + >> + rte_eth_dev_release_port(eth_dev); >> + >> + return 0; >> +} >> + >> +static struct rte_driver pmd_vhost_drv = { >> + .name = "eth_vhost", >> + .type = PMD_VDEV, >> + .init = rte_pmd_vhost_devinit, >> + .uninit = rte_pmd_vhost_devuninit, >> +}; >> + >> +PMD_REGISTER_DRIVER(pmd_vhost_drv); >> diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h >> new file mode 100644 >> index 0000000..8aa894a >> --- /dev/null >> +++ b/drivers/net/vhost/rte_eth_vhost.h >> +/** >> + * Disable features in feature_mask. Returns 0 on success. >> + */ >> +int rte_eth_vhost_feature_disable(uint64_t feature_mask); >> + >> +/** >> + * Enable features in feature_mask. Returns 0 on success. >> + */ >> +int rte_eth_vhost_feature_enable(uint64_t feature_mask); >> + >> +/* Returns currently supported vhost features */ > This also can be commented in doxygen style. > >> diff --git a/mk/rte.app.mk b/mk/rte.app.mk >> index 8ecab41..04f7087 100644 >> --- a/mk/rte.app.mk >> +++ b/mk/rte.app.mk >> @@ -159,7 +159,13 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_QAT) += -lrte_pmd_qat >> _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB) += -lrte_pmd_aesni_mb >> _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB) += -L$(AESNI_MULTI_BUFFER_LIB_PATH) -lIPSec_MB >> >> -endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB) >> +ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y) >> + >> +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += -lrte_pmd_vhost >> + >> +endif # ! $(CONFIG_RTE_LIBRTE_VHOST) >> + >> +endif # $(CONFIG_RTE_BUILD_SHARED_LIB) > I guess "!" in comment is to say this if block is for SHARED_LIB==n, if so we shouldn't update the comment to remove "!", > And the line you have added should have "!" in comment : "endif # $(CONFIG_RTE_LIBRTE_VHOST)" > >> >> endif # ! CONFIG_RTE_BUILD_COMBINE_LIBS >> >> -- >> 2.1.4 >>