* [PATCH v1 0/6] Introduce AF_XDP PMD
@ 2019-03-01  8:09 Xiaolong Ye
  2019-03-01  8:09 ` [PATCH v1 1/6] net/af_xdp: introduce AF_XDP PMD driver Xiaolong Ye
                   ` (16 more replies)
  0 siblings, 17 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-01  8:09 UTC (permalink / raw)
  To: dev; +Cc: Qi Zhang, Xiaolong Ye

Overview
========

This patchset adds a new PMD driver for AF_XDP, a proposed faster
version of the AF_PACKET interface in Linux. See links [1] and [2]
below for an introduction to AF_XDP.

AF_XDP roadmap
==============
- AF_XDP has been included in the upstream kernel since 4.18, and AF_XDP
  support in libbpf has been merged in bpf-next/master [3].
- The i40e and ixgbe drivers now support zero-copy mode.

Change logs
===========

Changes vs. the RFC sent by Qi last August:

- Reworked based on AF_XDP's interface changes, since the new libbpf
  provides higher-level APIs that hide many details of the AF_XDP uapi.
  The rework removes 300+ lines of code.

- Multi-queue is not supported, because the current libbpf doesn't support
  multiple sockets on a single umem.

- No extra steps are needed to load an XDP program manually, since the
  current libbpf loads a default XDP program when the user calls
  xsk_socket__create(); the userspace application only needs to handle the
  cleanup (see the sketch below).

How to try
==========

1. Take the latest bpf-next/master, build the kernel and replace your host
   kernel with it.

   Make sure XDP sockets are enabled when configuring the kernel:

   Networking support -->
        Networking options -->
                [ * ] XDP sockets
   
2. Build libbpf in tools/lib/bpf, and copy libbpf.a and libbpf.so to /usr/lib64:

   cd tools/lib/bpf
   make
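   # install the just-built libraries (destination as in step 2 above;
   # adjust if your distro does not use /usr/lib64)
   cp libbpf.a libbpf.so /usr/lib64/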

3. Extra steps to build DPDK:

   Add the lines below to drivers/net/af_xdp/Makefile:

   CFLAGS += -I/<your linux src root>/tools/include
   CFLAGS += -I/<your linux src root>/tools/lib/bpf

   Add the line below to config/common_linuxapp to enable the AF_XDP PMD:

   CONFIG_RTE_LIBRTE_PMD_AF_XDP=y


4. Start testpmd:

   ./build/app/testpmd -c 0xc -n 4 --vdev eth_af_xdp,iface=enp59s0f0,queue=0 -- -i --rxq=1 --txq=1

    In this case, the default XDP program will be loaded and attached to
    queue 0 of enp59s0f0; network traffic arriving on queue 0 will be
    redirected to the AF_XDP socket.
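
    To confirm that an XDP program is attached once testpmd is running, the
    interface can be checked with iproute2 (a hedged example; the exact
    output format varies by version):

    ip link show dev enp59s0f0
    # look for a "prog/xdp id <N>" entry in the output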


[1] https://lwn.net/Articles/750845/
[2] https://fosdem.org/2018/schedule/event/af_xdp/
[3] https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=143bdc2e27b44d2559596424bfb017d578be33eb

Xiaolong Ye (6):
  net/af_xdp: introduce AF_XDP PMD driver
  lib/mbuf: enable parse flags when create mempool
  lib/mempool: allow page size aligned mempool
  net/af_xdp: use mbuf mempool for buffer management
  net/af_xdp: enable zero copy
  app/testpmd: add mempool flags parameter

 MAINTAINERS                                   |   6 +
 app/test-pmd/parameters.c                     |  12 +
 app/test-pmd/testpmd.c                        |  17 +-
 app/test-pmd/testpmd.h                        |   1 +
 config/common_base                            |   5 +
 doc/guides/nics/af_xdp.rst                    |  43 +
 doc/guides/rel_notes/release_18_11.rst        |   7 +
 drivers/net/Makefile                          |   1 +
 drivers/net/af_xdp/Makefile                   |  31 +
 drivers/net/af_xdp/meson.build                |   7 +
 drivers/net/af_xdp/rte_eth_af_xdp.c           | 988 ++++++++++++++++++
 drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   4 +
 lib/librte_mbuf/rte_mbuf.c                    |  15 +-
 lib/librte_mbuf/rte_mbuf.h                    |   8 +-
 lib/librte_mempool/rte_mempool.c              |   3 +
 lib/librte_mempool/rte_mempool.h              |   1 +
 mk/rte.app.mk                                 |   1 +
 17 files changed, 1139 insertions(+), 11 deletions(-)
 create mode 100644 doc/guides/nics/af_xdp.rst
 create mode 100644 drivers/net/af_xdp/Makefile
 create mode 100644 drivers/net/af_xdp/meson.build
 create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
 create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map

-- 
2.17.1


* [PATCH v1 1/6] net/af_xdp: introduce AF_XDP PMD driver
  2019-03-01  8:09 [PATCH v1 0/6] Introduce AF_XDP PMD Xiaolong Ye
@ 2019-03-01  8:09 ` Xiaolong Ye
  2019-03-01 15:38   ` Luca Boccassi
                     ` (4 more replies)
  2019-03-01  8:09 ` [PATCH v1 2/6] lib/mbuf: enable parse flags when create mempool Xiaolong Ye
                   ` (15 subsequent siblings)
  16 siblings, 5 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-01  8:09 UTC (permalink / raw)
  To: dev; +Cc: Qi Zhang, Xiaolong Ye

Add a new PMD driver for AF_XDP, a proposed faster version of the
AF_PACKET interface in Linux. For more information about AF_XDP, please
refer to [1] [2].

This is the vanilla version of the PMD, which simply uses a raw buffer
registered as the umem.
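
A minimal sketch of the approach (constants as defined in the driver code
below; error handling omitted, so illustrative only):

    void *bufs;
    struct xsk_umem *umem;
    struct xsk_ring_prod fq;
    struct xsk_ring_cons cq;

    /* raw page-aligned buffer: 4096 frames of 2048 bytes each */
    posix_memalign(&bufs, getpagesize(),
                   ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE);
    /* register the raw buffer with the kernel as the umem */
    xsk_umem__create(&umem, bufs,
                     ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
                     &fq, &cq, NULL /* default ring config */);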

[1] https://fosdem.org/2018/schedule/event/af_xdp/
[2] https://lwn.net/Articles/745934/

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 MAINTAINERS                                   |   6 +
 config/common_base                            |   5 +
 doc/guides/nics/af_xdp.rst                    |  43 +
 doc/guides/rel_notes/release_18_11.rst        |   7 +
 drivers/net/Makefile                          |   1 +
 drivers/net/af_xdp/Makefile                   |  31 +
 drivers/net/af_xdp/meson.build                |   7 +
 drivers/net/af_xdp/rte_eth_af_xdp.c           | 903 ++++++++++++++++++
 drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   4 +
 mk/rte.app.mk                                 |   1 +
 10 files changed, 1008 insertions(+)
 create mode 100644 doc/guides/nics/af_xdp.rst
 create mode 100644 drivers/net/af_xdp/Makefile
 create mode 100644 drivers/net/af_xdp/meson.build
 create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
 create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 15c53888c..baa92a732 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -469,6 +469,12 @@ M: John W. Linville <linville@tuxdriver.com>
 F: drivers/net/af_packet/
 F: doc/guides/nics/features/afpacket.ini
 
+Linux AF_XDP
+M: Xiaolong Ye <xiaolong.ye@intel.com>
+M: Qi Zhang <qi.z.zhang@intel.com>
+F: drivers/net/af_xdp/
+F: doc/guides/nics/af_xdp.rst
+
 Amazon ENA
 M: Marcin Wojtas <mw@semihalf.com>
 M: Michal Krawczyk <mk@semihalf.com>
diff --git a/config/common_base b/config/common_base
index 7c6da5165..c45d2dad1 100644
--- a/config/common_base
+++ b/config/common_base
@@ -416,6 +416,11 @@ CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX_FREE=n
 #
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
 
+#
+# Compile software PMD backed by AF_XDP sockets (Linux only)
+#
+CONFIG_RTE_LIBRTE_PMD_AF_XDP=n
+
 #
 # Compile link bonding PMD library
 #
diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
new file mode 100644
index 000000000..126d9df3c
--- /dev/null
+++ b/doc/guides/nics/af_xdp.rst
@@ -0,0 +1,43 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2018 Intel Corporation.
+
+AF_XDP Poll Mode Driver
+=======================
+
+AF_XDP is an address family that is optimized for high performance
+packet processing. AF_XDP sockets allow an XDP program to redirect
+packets to a memory buffer in userspace.
+
+For full details of AF_XDP sockets, refer to the
+`AF_XDP documentation in the Kernel
+<https://www.kernel.org/doc/Documentation/networking/af_xdp.rst>`_.
+
+This Linux-specific PMD driver creates the AF_XDP socket and binds it to a
+specific netdev queue. It allows a DPDK application to send and receive raw
+packets through the socket, bypassing the kernel network stack.
+
+Options
+-------
+
+The following options can be provided to set up an af_xdp port in DPDK.
+
+*   ``iface`` - name of the Kernel interface to attach to (required);
+*   ``queue`` - netdev queue id (optional, default 0);
+
+Prerequisites
+-------------
+
+This is a Linux-specific PMD, thus the following prerequisites apply:
+
+*  A Linux Kernel with XDP sockets configuration enabled;
+*  libbpf with the latest AF_XDP support installed;
+*  A Kernel bound interface to attach to.
+
+Set up an af_xdp interface
+--------------------------
+
+The following example will set up an af_xdp interface in DPDK:
+
+.. code-block:: console
+
+    --vdev eth_af_xdp,iface=ens786f1,queue=0
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 65bab557d..e0918441a 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -229,6 +229,13 @@ New Features
   The AESNI MB PMD has been updated with additional support for the AES-GCM
   algorithm.
 
+* **Added the AF_XDP PMD.**
+
+  Added a Linux-specific PMD driver for AF_XDP. It creates an AF_XDP socket
+  and binds it to a specific netdev queue, allowing a DPDK application to
+  send and receive raw packets through the socket, bypassing the kernel
+  network stack to achieve high performance packet processing.
+
 * **Added NXP CAAM JR PMD.**
 
   Added the new caam job ring driver for NXP platforms. See the
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 670d7f75a..93cccd2a8 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -9,6 +9,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD),d)
 endif
 
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += af_packet
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += af_xdp
 DIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark
 DIRS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD) += atlantic
 DIRS-$(CONFIG_RTE_LIBRTE_AVF_PMD) += avf
diff --git a/drivers/net/af_xdp/Makefile b/drivers/net/af_xdp/Makefile
new file mode 100644
index 000000000..e3755fff2
--- /dev/null
+++ b/drivers/net/af_xdp/Makefile
@@ -0,0 +1,31 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_af_xdp.a
+
+EXPORT_MAP := rte_pmd_af_xdp_version.map
+
+LIBABIVER := 1
+
+
+CFLAGS += -O3
+# below line should be removed
+CFLAGS += -I/root/yexl/shared_mks0/linux/tools/include
+CFLAGS += -I/root/yexl/shared_mks0/linux/tools/lib/bpf
+
+CFLAGS += $(WERROR_FLAGS)
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
+LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
+LDLIBS += -lrte_bus_vdev
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += rte_eth_af_xdp.c
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
new file mode 100644
index 000000000..4b6652685
--- /dev/null
+++ b/drivers/net/af_xdp/meson.build
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+if host_machine.system() != 'linux'
+	build = false
+endif
+sources = files('rte_eth_af_xdp.c')
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
new file mode 100644
index 000000000..6de769650
--- /dev/null
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -0,0 +1,903 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation.
+ */
+
+#include <rte_mbuf.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+
+#include <linux/if_ether.h>
+#include <linux/if_xdp.h>
+#include <linux/if_link.h>
+#include <asm/barrier.h>
+#include <arpa/inet.h>
+#include <net/if.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <unistd.h>
+#include <poll.h>
+#include <bpf/bpf.h>
+#include <xsk.h>
+
+#ifndef SOL_XDP
+#define SOL_XDP 283
+#endif
+
+#ifndef AF_XDP
+#define AF_XDP 44
+#endif
+
+#ifndef PF_XDP
+#define PF_XDP AF_XDP
+#endif
+
+#define ETH_AF_XDP_IFACE_ARG			"iface"
+#define ETH_AF_XDP_QUEUE_IDX_ARG		"queue"
+
+#define ETH_AF_XDP_FRAME_SIZE		XSK_UMEM__DEFAULT_FRAME_SIZE
+#define ETH_AF_XDP_NUM_BUFFERS		4096
+#define ETH_AF_XDP_DATA_HEADROOM	0
+#define ETH_AF_XDP_DFLT_NUM_DESCS	XSK_RING_CONS__DEFAULT_NUM_DESCS
+#define ETH_AF_XDP_DFLT_QUEUE_IDX	0
+
+#define ETH_AF_XDP_RX_BATCH_SIZE	32
+#define ETH_AF_XDP_TX_BATCH_SIZE	32
+
+#define ETH_AF_XDP_MAX_QUEUE_PAIRS     16
+
+struct xsk_umem_info {
+	struct xsk_ring_prod fq;
+	struct xsk_ring_cons cq;
+	struct xsk_umem *umem;
+	struct rte_ring *buf_ring;
+	void *buffer;
+};
+
+struct pkt_rx_queue {
+	struct xsk_ring_cons rx;
+	struct xsk_umem_info *umem;
+	struct xsk_socket *xsk;
+	struct rte_mempool *mb_pool;
+
+	unsigned long rx_pkts;
+	unsigned long rx_bytes;
+	unsigned long rx_dropped;
+
+	struct pkt_tx_queue *pair;
+	uint16_t queue_idx;
+};
+
+struct pkt_tx_queue {
+	struct xsk_ring_prod tx;
+
+	unsigned long tx_pkts;
+	unsigned long err_pkts;
+	unsigned long tx_bytes;
+
+	struct pkt_rx_queue *pair;
+	uint16_t queue_idx;
+};
+
+struct pmd_internals {
+	int if_index;
+	char if_name[IFNAMSIZ];
+	uint16_t queue_idx;
+	struct ether_addr eth_addr;
+	struct xsk_umem_info *umem;
+	struct rte_mempool *mb_pool_share;
+
+	struct pkt_rx_queue rx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
+	struct pkt_tx_queue tx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
+};
+
+static const char * const valid_arguments[] = {
+	ETH_AF_XDP_IFACE_ARG,
+	ETH_AF_XDP_QUEUE_IDX_ARG,
+	NULL
+};
+
+static struct rte_eth_link pmd_link = {
+	.link_speed = ETH_SPEED_NUM_10G,
+	.link_duplex = ETH_LINK_FULL_DUPLEX,
+	.link_status = ETH_LINK_DOWN,
+	.link_autoneg = ETH_LINK_AUTONEG
+};
+
+static inline int
+reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
+{
+	struct xsk_ring_prod *fq = &umem->fq;
+	uint32_t idx;
+	void *addr = NULL;
+	int i, ret = 0;
+
+	ret = xsk_ring_prod__reserve(fq, reserve_size, &idx);
+	if (!ret) {
+		RTE_LOG(ERR, PMD, "Failed to reserve enough fq descs.\n");
+		return ret;
+	}
+
+	for (i = 0; i < reserve_size; i++) {
+		rte_ring_dequeue(umem->buf_ring, &addr);
+		*xsk_ring_prod__fill_addr(fq, idx++) = (uint64_t)addr;
+	}
+
+	xsk_ring_prod__submit(fq, reserve_size);
+
+	return 0;
+}
+
+static uint16_t
+eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+	struct pkt_rx_queue *rxq = queue;
+	struct xsk_ring_cons *rx = &rxq->rx;
+	struct xsk_umem_info *umem = rxq->umem;
+	struct xsk_ring_prod *fq = &umem->fq;
+	uint32_t idx_rx;
+	uint32_t free_thresh = fq->size >> 1;
+	struct rte_mbuf *mbuf;
+	unsigned long dropped = 0;
+	unsigned long rx_bytes = 0;
+	uint16_t count = 0;
+	int rcvd, i;
+
+	nb_pkts = nb_pkts < ETH_AF_XDP_RX_BATCH_SIZE ?
+		nb_pkts : ETH_AF_XDP_RX_BATCH_SIZE;
+
+	rcvd = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
+	if (!rcvd)
+		return 0;
+
+	if (xsk_prod_nb_free(fq, free_thresh) >= free_thresh)
+		(void)reserve_fill_queue(umem, ETH_AF_XDP_RX_BATCH_SIZE);
+
+	for (i = 0; i < rcvd; i++) {
+		uint64_t addr = xsk_ring_cons__rx_desc(rx, idx_rx)->addr;
+		uint32_t len = xsk_ring_cons__rx_desc(rx, idx_rx++)->len;
+		char *pkt = xsk_umem__get_data(rxq->umem->buffer, addr);
+
+		mbuf = rte_pktmbuf_alloc(rxq->mb_pool);
+		if (mbuf) {
+			memcpy(rte_pktmbuf_mtod(mbuf, void*), pkt, len);
+			rte_pktmbuf_pkt_len(mbuf) =
+				rte_pktmbuf_data_len(mbuf) = len;
+			rx_bytes += len;
+			bufs[count++] = mbuf;
+		} else {
+			dropped++;
+		}
+		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+	}
+
+	xsk_ring_cons__release(rx, rcvd);
+
+	/* statistics */
+	rxq->rx_pkts += (rcvd - dropped);
+	rxq->rx_bytes += rx_bytes;
+	rxq->rx_dropped += dropped;
+
+	return count;
+}
+
+static void pull_umem_cq(struct xsk_umem_info *umem, int size)
+{
+	struct xsk_ring_cons *cq = &umem->cq;
+	int i, n;
+	uint32_t idx_cq;
+	uint64_t addr;
+
+	n = xsk_ring_cons__peek(cq, size, &idx_cq);
+	if (n > 0) {
+		for (i = 0; i < n; i++) {
+			addr = *xsk_ring_cons__comp_addr(cq,
+							 idx_cq++);
+			rte_ring_enqueue(umem->buf_ring, (void *)addr);
+		}
+
+		xsk_ring_cons__release(cq, n);
+	}
+}
+
+static void kick_tx(struct pkt_tx_queue *txq)
+{
+	struct xsk_umem_info *umem = txq->pair->umem;
+	int ret;
+
+	while (1) {
+		ret = sendto(xsk_socket__fd(txq->pair->xsk), NULL, 0,
+			     MSG_DONTWAIT, NULL, 0);
+
+		/* everything is ok */
+		if (ret >= 0)
+			break;
+
+		/* something unexpected */
+		if (errno != EBUSY && errno != EAGAIN)
+			break;
+
+		/* pull from the completion queue to free up more space */
+		if (errno == EAGAIN)
+			pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
+	}
+}
+
+static uint16_t
+eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+	struct pkt_tx_queue *txq = queue;
+	struct xsk_umem_info *umem = txq->pair->umem;
+	struct rte_mbuf *mbuf;
+	void *addrs[ETH_AF_XDP_TX_BATCH_SIZE];
+	unsigned long tx_bytes = 0;
+	int i, valid = 0;
+	uint32_t idx_tx;
+
+	nb_pkts = nb_pkts < ETH_AF_XDP_TX_BATCH_SIZE ?
+		nb_pkts : ETH_AF_XDP_TX_BATCH_SIZE;
+
+	pull_umem_cq(umem, nb_pkts);
+
+	nb_pkts = rte_ring_dequeue_bulk(umem->buf_ring, addrs,
+					nb_pkts, NULL);
+	if (!nb_pkts)
+		return 0;
+
+	if (xsk_ring_prod__reserve(&txq->tx, nb_pkts, &idx_tx)
+			!= nb_pkts)
+		return 0;
+
+	for (i = 0; i < nb_pkts; i++) {
+		struct xdp_desc *desc;
+		char *pkt;
+		unsigned int buf_len = ETH_AF_XDP_FRAME_SIZE
+					- ETH_AF_XDP_DATA_HEADROOM;
+		desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i);
+		mbuf = bufs[i];
+		if (mbuf->pkt_len <= buf_len) {
+			desc->addr = (uint64_t)addrs[valid];
+			desc->len = mbuf->pkt_len;
+			pkt = xsk_umem__get_data(umem->buffer,
+						 desc->addr);
+			memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
+			       desc->len);
+			valid++;
+			tx_bytes += mbuf->pkt_len;
+		}
+		rte_pktmbuf_free(mbuf);
+	}
+
+	xsk_ring_prod__submit(&txq->tx, nb_pkts);
+
+	kick_tx(txq);
+
+	if (valid < nb_pkts)
+		rte_ring_enqueue_bulk(umem->buf_ring, &addrs[valid],
+				 nb_pkts - valid, NULL);
+
+	txq->err_pkts += nb_pkts - valid;
+	txq->tx_pkts += valid;
+	txq->tx_bytes += tx_bytes;
+
+	return nb_pkts;
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = ETH_LINK_UP;
+
+	return 0;
+}
+
+/* This function gets called when the current port gets stopped. */
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = ETH_LINK_DOWN;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
+{
+	/* rx/tx must be paired */
+	if (dev->data->nb_rx_queues != dev->data->nb_tx_queues)
+		return -EINVAL;
+
+	return 0;
+}
+
+static void
+eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	dev_info->if_index = internals->if_index;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_pktlen = (uint32_t)ETH_FRAME_LEN;
+	dev_info->max_rx_queues = 1;
+	dev_info->max_tx_queues = 1;
+	dev_info->min_rx_bufsize = 0;
+
+	dev_info->default_rxportconf.nb_queues = 1;
+	dev_info->default_txportconf.nb_queues = 1;
+	dev_info->default_rxportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
+	dev_info->default_txportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
+}
+
+static int
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct xdp_statistics xdp_stats;
+	struct pkt_rx_queue *rxq;
+	socklen_t optlen;
+	int i;
+
+	optlen = sizeof(struct xdp_statistics);
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		rxq = &internals->rx_queues[i];
+		stats->q_ipackets[i] = internals->rx_queues[i].rx_pkts;
+		stats->q_ibytes[i] = internals->rx_queues[i].rx_bytes;
+
+		stats->q_opackets[i] = internals->tx_queues[i].tx_pkts;
+		stats->q_errors[i] = internals->tx_queues[i].err_pkts;
+		stats->q_obytes[i] = internals->tx_queues[i].tx_bytes;
+
+		stats->ipackets += stats->q_ipackets[i];
+		stats->ibytes += stats->q_ibytes[i];
+		stats->imissed += internals->rx_queues[i].rx_dropped;
+		getsockopt(xsk_socket__fd(rxq->xsk), SOL_XDP, XDP_STATISTICS,
+				&xdp_stats, &optlen);
+		stats->imissed += xdp_stats.rx_dropped;
+
+		stats->opackets += stats->q_opackets[i];
+		stats->oerrors += stats->q_errors[i];
+		stats->obytes += stats->q_obytes[i];
+	}
+
+	return 0;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	int i;
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		internals->rx_queues[i].rx_pkts = 0;
+		internals->rx_queues[i].rx_bytes = 0;
+		internals->rx_queues[i].rx_dropped = 0;
+
+		internals->tx_queues[i].tx_pkts = 0;
+		internals->tx_queues[i].err_pkts = 0;
+		internals->tx_queues[i].tx_bytes = 0;
+	}
+}
+
+static void remove_xdp_program(struct pmd_internals *internals)
+{
+	uint32_t curr_prog_id = 0;
+
+	if (bpf_get_link_xdp_id(internals->if_index, &curr_prog_id,
+				XDP_FLAGS_UPDATE_IF_NOEXIST)) {
+		RTE_LOG(ERR, PMD, "bpf_get_link_xdp_id failed\n");
+		return;
+	}
+	bpf_set_link_xdp_fd(internals->if_index, -1,
+			XDP_FLAGS_UPDATE_IF_NOEXIST);
+}
+
+static void
+eth_dev_close(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct pkt_rx_queue *rxq;
+	int i;
+
+	RTE_LOG(INFO, PMD, "Closing AF_XDP ethdev on numa socket %u\n",
+		rte_socket_id());
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		rxq = &internals->rx_queues[i];
+		if (!rxq->umem)
+			break;
+		xsk_socket__delete(rxq->xsk);
+	}
+
+	(void)xsk_umem__delete(internals->umem->umem);
+	remove_xdp_program(internals);
+}
+
+static void
+eth_queue_release(void *q __rte_unused)
+{
+}
+
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+		int wait_to_complete __rte_unused)
+{
+	return 0;
+}
+
+static void xdp_umem_destroy(struct xsk_umem_info *umem)
+{
+	if (umem->buffer)
+		free(umem->buffer);
+
+	free(umem);
+}
+
+static struct xsk_umem_info *xdp_umem_configure(void)
+{
+	struct xsk_umem_info *umem;
+	struct xsk_umem_config usr_config = {
+		.fill_size = ETH_AF_XDP_DFLT_NUM_DESCS,
+		.comp_size = ETH_AF_XDP_DFLT_NUM_DESCS,
+		.frame_size = ETH_AF_XDP_FRAME_SIZE,
+		.frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
+	void *bufs = NULL;
+	char ring_name[0x100];
+	int ret;
+	uint64_t i;
+
+	umem = calloc(1, sizeof(*umem));
+	if (!umem) {
+		RTE_LOG(ERR, PMD, "Failed to allocate umem info");
+		return NULL;
+	}
+
+	snprintf(ring_name, 0x100, "af_xdp_ring");
+	umem->buf_ring = rte_ring_create(ring_name,
+					 ETH_AF_XDP_NUM_BUFFERS,
+					 SOCKET_ID_ANY,
+					 0x0);
+	if (!umem->buf_ring) {
+		RTE_LOG(ERR, PMD,
+			"Failed to create rte_ring\n");
+		goto err;
+	}
+
+	for (i = 0; i < ETH_AF_XDP_NUM_BUFFERS; i++)
+		rte_ring_enqueue(umem->buf_ring,
+				 (void *)(i * ETH_AF_XDP_FRAME_SIZE +
+					  ETH_AF_XDP_DATA_HEADROOM));
+
+	if (posix_memalign(&bufs, getpagesize(),
+			   ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE)) {
+		RTE_LOG(ERR, PMD, "Failed to allocate memory pool.\n");
+		goto err;
+	}
+	ret = xsk_umem__create(&umem->umem, bufs,
+			       ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
+			       &umem->fq, &umem->cq,
+			       &usr_config);
+
+	if (ret) {
+		RTE_LOG(ERR, PMD, "Failed to create umem");
+		goto err;
+	}
+	umem->buffer = bufs;
+
+	return umem;
+
+err:
+	xdp_umem_destroy(umem);
+	return NULL;
+}
+
+static int
+xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
+	      int ring_size)
+{
+	struct xsk_socket_config cfg;
+	struct pkt_tx_queue *txq = rxq->pair;
+	int ret = 0;
+	int reserve_size;
+
+	rxq->umem = xdp_umem_configure();
+	if (!rxq->umem) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	cfg.rx_size = ring_size;
+	cfg.tx_size = ring_size;
+	cfg.libbpf_flags = 0;
+	cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
+	cfg.bind_flags = 0;
+	ret = xsk_socket__create(&rxq->xsk, internals->if_name,
+			internals->queue_idx, rxq->umem->umem, &rxq->rx,
+			&txq->tx, &cfg);
+	if (ret) {
+		RTE_LOG(ERR, PMD, "Failed to create xsk socket.\n");
+		goto err;
+	}
+
+	reserve_size = ETH_AF_XDP_DFLT_NUM_DESCS / 2;
+	ret = reserve_fill_queue(rxq->umem, reserve_size);
+	if (ret) {
+		RTE_LOG(ERR, PMD, "Failed to reserve fill queue.\n");
+		goto err;
+	}
+
+	return 0;
+
+err:
+	xdp_umem_destroy(rxq->umem);
+
+	return ret;
+}
+
+static void
+queue_reset(struct pmd_internals *internals, uint16_t queue_idx)
+{
+	struct pkt_rx_queue *rxq = &internals->rx_queues[queue_idx];
+	struct pkt_tx_queue *txq = rxq->pair;
+	int xsk_fd = xsk_socket__fd(rxq->xsk);
+
+	if (xsk_fd) {
+		close(xsk_fd);
+		if (internals->umem) {
+			xdp_umem_destroy(internals->umem);
+			internals->umem = NULL;
+		}
+	}
+	memset(rxq, 0, sizeof(*rxq));
+	memset(txq, 0, sizeof(*txq));
+	rxq->pair = txq;
+	txq->pair = rxq;
+	rxq->queue_idx = queue_idx;
+	txq->queue_idx = queue_idx;
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev,
+		   uint16_t rx_queue_id,
+		   uint16_t nb_rx_desc,
+		   unsigned int socket_id __rte_unused,
+		   const struct rte_eth_rxconf *rx_conf __rte_unused,
+		   struct rte_mempool *mb_pool)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	unsigned int buf_size, data_size;
+	struct pkt_rx_queue *rxq;
+	int ret = 0;
+
+	if (mb_pool == NULL) {
+		RTE_LOG(ERR, PMD,
+			"Invalid mb_pool\n");
+		ret = -EINVAL;
+		goto err;
+	}
+
+	if (dev->data->nb_rx_queues <= rx_queue_id) {
+		RTE_LOG(ERR, PMD,
+			"Invalid rx queue id: %d\n", rx_queue_id);
+		ret = -EINVAL;
+		goto err;
+	}
+
+	rxq = &internals->rx_queues[rx_queue_id];
+	queue_reset(internals, rx_queue_id);
+
+	/* Now get the space available for data in the mbuf */
+	buf_size = rte_pktmbuf_data_room_size(mb_pool) -
+		RTE_PKTMBUF_HEADROOM;
+	data_size = ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_DATA_HEADROOM;
+
+	if (data_size > buf_size) {
+		RTE_LOG(ERR, PMD,
+			"%s: %d bytes will not fit in mbuf (%d bytes)\n",
+			dev->device->name, data_size, buf_size);
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	rxq->mb_pool = mb_pool;
+
+	if (xsk_configure(internals, rxq, nb_rx_desc)) {
+		RTE_LOG(ERR, PMD,
+			"Failed to configure xdp socket\n");
+		ret = -EINVAL;
+		goto err;
+	}
+
+	internals->umem = rxq->umem;
+
+	dev->data->rx_queues[rx_queue_id] = rxq;
+	return 0;
+
+err:
+	queue_reset(internals, rx_queue_id);
+	return ret;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev,
+		   uint16_t tx_queue_id,
+		   uint16_t nb_tx_desc,
+		   unsigned int socket_id __rte_unused,
+		   const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct pkt_tx_queue *txq;
+
+	if (dev->data->nb_tx_queues <= tx_queue_id) {
+		RTE_LOG(ERR, PMD, "Invalid tx queue id: %d\n", tx_queue_id);
+		return -EINVAL;
+	}
+
+	RTE_LOG(WARNING, PMD, "tx queue setup size=%d will be skipped\n",
+		nb_tx_desc);
+	txq = &internals->tx_queues[tx_queue_id];
+
+	dev->data->tx_queues[tx_queue_id] = txq;
+	return 0;
+}
+
+static int
+eth_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct ifreq ifr = { .ifr_mtu = mtu };
+	int ret;
+	int s;
+
+	s = socket(PF_INET, SOCK_DGRAM, 0);
+	if (s < 0)
+		return -EINVAL;
+
+	snprintf(ifr.ifr_name, IFNAMSIZ, "%s", internals->if_name);
+	ret = ioctl(s, SIOCSIFMTU, &ifr);
+	close(s);
+
+	if (ret < 0)
+		return -EINVAL;
+
+	return 0;
+}
+
+static void
+eth_dev_change_flags(char *if_name, uint32_t flags, uint32_t mask)
+{
+	struct ifreq ifr;
+	int s;
+
+	s = socket(PF_INET, SOCK_DGRAM, 0);
+	if (s < 0)
+		return;
+
+	snprintf(ifr.ifr_name, IFNAMSIZ, "%s", if_name);
+	if (ioctl(s, SIOCGIFFLAGS, &ifr) < 0)
+		goto out;
+	ifr.ifr_flags &= mask;
+	ifr.ifr_flags |= flags;
+	if (ioctl(s, SIOCSIFFLAGS, &ifr) < 0)
+		goto out;
+out:
+	close(s);
+}
+
+static void
+eth_dev_promiscuous_enable(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	eth_dev_change_flags(internals->if_name, IFF_PROMISC, ~0);
+}
+
+static void
+eth_dev_promiscuous_disable(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	eth_dev_change_flags(internals->if_name, 0, ~IFF_PROMISC);
+}
+
+static const struct eth_dev_ops ops = {
+	.dev_start = eth_dev_start,
+	.dev_stop = eth_dev_stop,
+	.dev_close = eth_dev_close,
+	.dev_configure = eth_dev_configure,
+	.dev_infos_get = eth_dev_info,
+	.mtu_set = eth_dev_mtu_set,
+	.promiscuous_enable = eth_dev_promiscuous_enable,
+	.promiscuous_disable = eth_dev_promiscuous_disable,
+	.rx_queue_setup = eth_rx_queue_setup,
+	.tx_queue_setup = eth_tx_queue_setup,
+	.rx_queue_release = eth_queue_release,
+	.tx_queue_release = eth_queue_release,
+	.link_update = eth_link_update,
+	.stats_get = eth_stats_get,
+	.stats_reset = eth_stats_reset,
+};
+
+static struct rte_vdev_driver pmd_af_xdp_drv;
+
+static void
+parse_parameters(struct rte_kvargs *kvlist,
+		 char **if_name,
+		 int *queue_idx)
+{
+	struct rte_kvargs_pair *pair = NULL;
+	unsigned int k_idx;
+
+	for (k_idx = 0; k_idx < kvlist->count; k_idx++) {
+		pair = &kvlist->pairs[k_idx];
+		if (strstr(pair->key, ETH_AF_XDP_IFACE_ARG))
+			*if_name = pair->value;
+		else if (strstr(pair->key, ETH_AF_XDP_QUEUE_IDX_ARG))
+			*queue_idx = atoi(pair->value);
+	}
+}
+
+static int
+get_iface_info(const char *if_name,
+	       struct ether_addr *eth_addr,
+	       int *if_index)
+{
+	struct ifreq ifr;
+	int sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_IP);
+
+	if (sock < 0)
+		return -1;
+
+	strcpy(ifr.ifr_name, if_name);
+	if (ioctl(sock, SIOCGIFINDEX, &ifr))
+		goto error;
+
+	if (ioctl(sock, SIOCGIFHWADDR, &ifr))
+		goto error;
+
+	memcpy(eth_addr, ifr.ifr_hwaddr.sa_data, 6);
+
+	close(sock);
+	*if_index = if_nametoindex(if_name);
+	return 0;
+
+error:
+	close(sock);
+	return -1;
+}
+
+static int
+init_internals(struct rte_vdev_device *dev,
+	       const char *if_name,
+	       int queue_idx)
+{
+	const char *name = rte_vdev_device_name(dev);
+	struct rte_eth_dev *eth_dev = NULL;
+	const unsigned int numa_node = dev->device.numa_node;
+	struct pmd_internals *internals = NULL;
+	int ret;
+	int i;
+
+	internals = rte_zmalloc_socket(name, sizeof(*internals), 0, numa_node);
+	if (!internals)
+		return -ENOMEM;
+
+	internals->queue_idx = queue_idx;
+	strcpy(internals->if_name, if_name);
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		internals->tx_queues[i].pair = &internals->rx_queues[i];
+		internals->rx_queues[i].pair = &internals->tx_queues[i];
+	}
+
+	ret = get_iface_info(if_name, &internals->eth_addr,
+			     &internals->if_index);
+	if (ret)
+		goto err;
+
+	eth_dev = rte_eth_vdev_allocate(dev, 0);
+	if (!eth_dev)
+		goto err;
+
+	eth_dev->data->dev_private = internals;
+	eth_dev->data->dev_link = pmd_link;
+	eth_dev->data->mac_addrs = &internals->eth_addr;
+	eth_dev->dev_ops = &ops;
+	eth_dev->rx_pkt_burst = eth_af_xdp_rx;
+	eth_dev->tx_pkt_burst = eth_af_xdp_tx;
+
+	rte_eth_dev_probing_finish(eth_dev);
+	return 0;
+
+err:
+	rte_free(internals);
+	return -1;
+}
+
+static int
+rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
+{
+	struct rte_kvargs *kvlist;
+	char *if_name = NULL;
+	int queue_idx = ETH_AF_XDP_DFLT_QUEUE_IDX;
+	struct rte_eth_dev *eth_dev;
+	const char *name;
+	int ret;
+
+	RTE_LOG(INFO, PMD, "Initializing pmd_af_xdp for %s\n",
+		rte_vdev_device_name(dev));
+
+	name = rte_vdev_device_name(dev);
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY &&
+		strlen(rte_vdev_device_args(dev)) == 0) {
+		eth_dev = rte_eth_dev_attach_secondary(name);
+		if (!eth_dev) {
+			RTE_LOG(ERR, PMD, "Failed to probe %s\n", name);
+			return -EINVAL;
+		}
+		eth_dev->dev_ops = &ops;
+		rte_eth_dev_probing_finish(eth_dev);
+	}
+
+	kvlist = rte_kvargs_parse(rte_vdev_device_args(dev), valid_arguments);
+	if (!kvlist) {
+		RTE_LOG(ERR, PMD,
+			"Invalid kvargs\n");
+		return -EINVAL;
+	}
+
+	if (dev->device.numa_node == SOCKET_ID_ANY)
+		dev->device.numa_node = rte_socket_id();
+
+	parse_parameters(kvlist, &if_name,
+			 &queue_idx);
+
+	ret = init_internals(dev, if_name, queue_idx);
+
+	rte_kvargs_free(kvlist);
+
+	return ret;
+}
+
+static int
+rte_pmd_af_xdp_remove(struct rte_vdev_device *dev)
+{
+	struct rte_eth_dev *eth_dev = NULL;
+	struct pmd_internals *internals;
+
+	RTE_LOG(INFO, PMD, "Removing AF_XDP ethdev on numa socket %u\n",
+		rte_socket_id());
+
+	if (!dev)
+		return -1;
+
+	/* find the ethdev entry */
+	eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(dev));
+	if (!eth_dev)
+		return -1;
+
+	internals = eth_dev->data->dev_private;
+
+	rte_ring_free(internals->umem->buf_ring);
+	rte_free(internals->umem->buffer);
+	rte_free(internals->umem);
+	rte_free(internals);
+
+	rte_eth_dev_release_port(eth_dev);
+
+
+	return 0;
+}
+
+static struct rte_vdev_driver pmd_af_xdp_drv = {
+	.probe = rte_pmd_af_xdp_probe,
+	.remove = rte_pmd_af_xdp_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(net_af_xdp, pmd_af_xdp_drv);
+RTE_PMD_REGISTER_ALIAS(net_af_xdp, eth_af_xdp);
+RTE_PMD_REGISTER_PARAM_STRING(net_af_xdp,
+			      "iface=<string> "
+			      "queue=<int> ");
diff --git a/drivers/net/af_xdp/rte_pmd_af_xdp_version.map b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
new file mode 100644
index 000000000..ef3539840
--- /dev/null
+++ b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
@@ -0,0 +1,4 @@
+DPDK_2.0 {
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index d0ab942d5..db3271c7b 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -143,6 +143,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_DPAA2_MEMPOOL)  += -lrte_mempool_dpaa2
 endif
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     += -lrte_pmd_af_xdp -lelf -lbpf
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ARK_PMD)        += -lrte_pmd_ark
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD)   += -lrte_pmd_atlantic
 _LDLIBS-$(CONFIG_RTE_LIBRTE_AVF_PMD)        += -lrte_pmd_avf
-- 
2.17.1


* [PATCH v1 2/6] lib/mbuf: enable parse flags when create mempool
  2019-03-01  8:09 [PATCH v1 0/6] Introduce AF_XDP PMD Xiaolong Ye
  2019-03-01  8:09 ` [PATCH v1 1/6] net/af_xdp: introduce AF_XDP PMD driver Xiaolong Ye
@ 2019-03-01  8:09 ` Xiaolong Ye
  2019-03-05  8:30   ` David Marchand
  2019-03-01  8:09 ` [PATCH v1 3/6] lib/mempool: allow page size aligned mempool Xiaolong Ye
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-01  8:09 UTC (permalink / raw)
  To: dev; +Cc: Qi Zhang, Xiaolong Ye

This gives the application the option to configure each memory chunk's
size precisely (e.g. via MEMPOOL_F_NO_SPREAD).
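
A minimal usage sketch (pool name and sizes are illustrative):

    struct rte_mempool *mp;

    mp = rte_pktmbuf_pool_create_with_flags("no_spread_pool",
                    4096,                /* number of mbufs */
                    250,                 /* per-core cache size */
                    0,                   /* private data size */
                    2048,                /* data room size */
                    MEMPOOL_F_NO_SPREAD, /* keep chunk layout exact */
                    SOCKET_ID_ANY);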

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 lib/librte_mbuf/rte_mbuf.c | 15 ++++++++++++---
 lib/librte_mbuf/rte_mbuf.h |  8 +++++++-
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 21f6f7404..0f6fcff28 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -110,7 +110,7 @@ rte_pktmbuf_init(struct rte_mempool *mp,
 struct rte_mempool *
 rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
 	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
-	int socket_id, const char *ops_name)
+	unsigned int flags, int socket_id, const char *ops_name)
 {
 	struct rte_mempool *mp;
 	struct rte_pktmbuf_pool_private mbp_priv;
@@ -130,7 +130,7 @@ rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
 	mbp_priv.mbuf_priv_size = priv_size;
 
 	mp = rte_mempool_create_empty(name, n, elt_size, cache_size,
-		 sizeof(struct rte_pktmbuf_pool_private), socket_id, 0);
+		 sizeof(struct rte_pktmbuf_pool_private), socket_id, flags);
 	if (mp == NULL)
 		return NULL;
 
@@ -164,9 +164,18 @@ rte_pktmbuf_pool_create(const char *name, unsigned int n,
 	int socket_id)
 {
 	return rte_pktmbuf_pool_create_by_ops(name, n, cache_size, priv_size,
-			data_room_size, socket_id, NULL);
+			data_room_size, 0, socket_id, NULL);
 }
 
+/* helper to create a mbuf pool with the given flags */
+struct rte_mempool *
+rte_pktmbuf_pool_create_with_flags(const char *name, unsigned int n,
+	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
+	unsigned int flags, int socket_id)
+{
+	return rte_pktmbuf_pool_create_by_ops(name, n, cache_size, priv_size,
+			data_room_size, flags, socket_id, NULL);
+}
 /* do some sanity checks on a mbuf: panic if it fails */
 void
 rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header)
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index d961ccaf6..7a3faf11c 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1266,6 +1266,12 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
 	unsigned cache_size, uint16_t priv_size, uint16_t data_room_size,
 	int socket_id);
 
+struct rte_mempool *
+rte_pktmbuf_pool_create_with_flags(const char *name, unsigned int n,
+	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
+	unsigned int flags, int socket_id);
+
+
 /**
  * Create a mbuf pool with a given mempool ops name
  *
@@ -1306,7 +1312,7 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
 struct rte_mempool *
 rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
 	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
-	int socket_id, const char *ops_name);
+	unsigned int flags, int socket_id, const char *ops_name);
 
 /**
  * Get the data room size of mbufs stored in a pktmbuf_pool
-- 
2.17.1


* [PATCH v1 3/6] lib/mempool: allow page size aligned mempool
  2019-03-01  8:09 [PATCH v1 0/6] Introduce AF_XDP PMD Xiaolong Ye
  2019-03-01  8:09 ` [PATCH v1 1/6] net/af_xdp: introduce AF_XDP PMD driver Xiaolong Ye
  2019-03-01  8:09 ` [PATCH v1 2/6] lib/mbuf: enable parse flags when create mempool Xiaolong Ye
@ 2019-03-01  8:09 ` Xiaolong Ye
  2019-03-01  8:09 ` [PATCH v1 4/6] net/af_xdp: use mbuf mempool for buffer management Xiaolong Ye
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-01  8:09 UTC (permalink / raw)
  To: dev; +Cc: Qi Zhang, Xiaolong Ye

Allow creating a mempool with a page-size-aligned base address.
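
A quick sketch of the intended effect, using the helper from the previous
patch (values illustrative):

    struct rte_mempool *mp;
    struct rte_mempool_memhdr *memhdr;

    mp = rte_pktmbuf_pool_create_with_flags("page_aligned_pool",
                    4096, 250, 0, 2048,
                    MEMPOOL_F_PAGE_ALIGN, SOCKET_ID_ANY);
    /* the chunk's base address now starts on a page boundary */
    memhdr = STAILQ_FIRST(&mp->mem_list);
    RTE_ASSERT(((uintptr_t)memhdr->addr % getpagesize()) == 0);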

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 lib/librte_mempool/rte_mempool.c | 3 +++
 lib/librte_mempool/rte_mempool.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 683b216f9..33ab6a2b4 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -543,6 +543,9 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 		if (try_contig)
 			flags |= RTE_MEMZONE_IOVA_CONTIG;
 
+		if (mp->flags & MEMPOOL_F_PAGE_ALIGN)
+			align = getpagesize();
+
 		mz = rte_memzone_reserve_aligned(mz_name, mem_size,
 				mp->socket_id, flags, align);
 
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 7c9cd9a2f..75553b36f 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -264,6 +264,7 @@ struct rte_mempool {
 #define MEMPOOL_F_POOL_CREATED   0x0010 /**< Internal: pool is created. */
 #define MEMPOOL_F_NO_IOVA_CONTIG 0x0020 /**< Don't need IOVA contiguous objs. */
 #define MEMPOOL_F_NO_PHYS_CONTIG MEMPOOL_F_NO_IOVA_CONTIG /* deprecated */
+#define MEMPOOL_F_PAGE_ALIGN     0x0040 /**< Chunk's base address is page aligned */
 
 /**
  * @internal When debug is enabled, store some statistics.
-- 
2.17.1


* [PATCH v1 4/6] net/af_xdp: use mbuf mempool for buffer management
  2019-03-01  8:09 [PATCH v1 0/6] Introduce AF_XDP PMD Xiaolong Ye
                   ` (2 preceding siblings ...)
  2019-03-01  8:09 ` [PATCH v1 3/6] lib/mempool: allow page size aligned mempool Xiaolong Ye
@ 2019-03-01  8:09 ` Xiaolong Ye
  2019-03-01  8:09 ` [PATCH v1 5/6] net/af_xdp: enable zero copy Xiaolong Ye
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-01  8:09 UTC (permalink / raw)
  To: dev; +Cc: Qi Zhang, Xiaolong Ye

Now the af_xdp registered memory buffer is managed by rte_mempool. An
mbuf allocated from the rte_mempool can be converted to an xdp_desc's
address and vice versa.
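
A worked example of the conversion, using the driver's constants
(2048-byte frames; 64-byte mempool header + 128-byte struct rte_mbuf =
192 bytes of overhead, so packet data starts at offset 320 in a frame):

    addr     = 5 * 2048 + 320 = 10560      /* desc addr in frame 5 */
    offset   = addr / 2048 * 2048 = 10240  /* base of frame 5 */
    mbuf     = buffer + offset + 192 - sizeof(struct rte_mbuf)
             = buffer + 10304              /* skip 64-byte pool header */
    data_off = addr - offset - 192 = 128   /* == RTE_PKTMBUF_HEADROOM */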

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 drivers/net/af_xdp/rte_eth_af_xdp.c | 121 +++++++++++++++++-----------
 1 file changed, 75 insertions(+), 46 deletions(-)

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 6de769650..e8270360c 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -41,7 +41,11 @@
 
 #define ETH_AF_XDP_FRAME_SIZE		XSK_UMEM__DEFAULT_FRAME_SIZE
 #define ETH_AF_XDP_NUM_BUFFERS		4096
-#define ETH_AF_XDP_DATA_HEADROOM	0
+/* mempool hdrobj size (64 bytes) + sizeof(struct rte_mbuf) (128 bytes) */
+#define ETH_AF_XDP_MBUF_OVERHEAD	192
+/* data starts at offset 320 (192 + 128) bytes */
+#define ETH_AF_XDP_DATA_HEADROOM				\
+	(ETH_AF_XDP_MBUF_OVERHEAD + RTE_PKTMBUF_HEADROOM)
 #define ETH_AF_XDP_DFLT_NUM_DESCS	XSK_RING_CONS__DEFAULT_NUM_DESCS
 #define ETH_AF_XDP_DFLT_QUEUE_IDX	0
 
@@ -54,7 +58,7 @@ struct xsk_umem_info {
 	struct xsk_ring_prod fq;
 	struct xsk_ring_cons cq;
 	struct xsk_umem *umem;
-	struct rte_ring *buf_ring;
+	struct rte_mempool *mb_pool;
 	void *buffer;
 };
 
@@ -108,12 +112,32 @@ static struct rte_eth_link pmd_link = {
 	.link_autoneg = ETH_LINK_AUTONEG
 };
 
+static inline struct rte_mbuf *
+addr_to_mbuf(struct xsk_umem_info *umem, uint64_t addr)
+{
+	uint64_t offset = (addr / ETH_AF_XDP_FRAME_SIZE *
+			ETH_AF_XDP_FRAME_SIZE);
+	struct rte_mbuf *mbuf = (struct rte_mbuf *)((uint64_t)umem->buffer +
+				    offset + ETH_AF_XDP_MBUF_OVERHEAD -
+				    sizeof(struct rte_mbuf));
+	mbuf->data_off = addr - offset - ETH_AF_XDP_MBUF_OVERHEAD;
+	return mbuf;
+}
+
+static inline uint64_t
+mbuf_to_addr(struct xsk_umem_info *umem, struct rte_mbuf *mbuf)
+{
+	return (uint64_t)mbuf->buf_addr + mbuf->data_off -
+		(uint64_t)umem->buffer;
+}
+
 static inline int
 reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
 {
 	struct xsk_ring_prod *fq = &umem->fq;
+	struct rte_mbuf *mbuf;
 	uint32_t idx;
-	void *addr = NULL;
+	uint64_t addr;
 	int i, ret = 0;
 
 	ret = xsk_ring_prod__reserve(fq, reserve_size, &idx);
@@ -123,8 +147,9 @@ reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
 	}
 
 	for (i = 0; i < reserve_size; i++) {
-		rte_ring_dequeue(umem->buf_ring, &addr);
-		*xsk_ring_prod__fill_addr(fq, idx++) = (uint64_t)addr;
+		mbuf = rte_pktmbuf_alloc(umem->mb_pool);
+		addr = mbuf_to_addr(umem, mbuf);
+		*xsk_ring_prod__fill_addr(fq, idx++) = addr;
 	}
 
 	xsk_ring_prod__submit(fq, reserve_size);
@@ -172,7 +197,7 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		} else {
 			dropped++;
 		}
-		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+		rte_pktmbuf_free(addr_to_mbuf(umem, addr));
 	}
 
 	xsk_ring_cons__release(rx, rcvd);
@@ -195,9 +220,8 @@ static void pull_umem_cq(struct xsk_umem_info *umem, int size)
 	n = xsk_ring_cons__peek(cq, size, &idx_cq);
 	if (n > 0) {
 		for (i = 0; i < n; i++) {
-			addr = *xsk_ring_cons__comp_addr(cq,
-							 idx_cq++);
-			rte_ring_enqueue(umem->buf_ring, (void *)addr);
+			addr = *xsk_ring_cons__comp_addr(cq, idx_cq++);
+			rte_pktmbuf_free(addr_to_mbuf(umem, addr));
 		}
 
 		xsk_ring_cons__release(cq, n);
@@ -233,7 +257,7 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	struct pkt_tx_queue *txq = queue;
 	struct xsk_umem_info *umem = txq->pair->umem;
 	struct rte_mbuf *mbuf;
-	void *addrs[ETH_AF_XDP_TX_BATCH_SIZE];
+	struct rte_mbuf *mbuf_to_tx;
 	unsigned long tx_bytes = 0;
 	int i, valid = 0;
 	uint32_t idx_tx;
@@ -243,11 +267,6 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 
 	pull_umem_cq(umem, nb_pkts);
 
-	nb_pkts = rte_ring_dequeue_bulk(umem->buf_ring, addrs,
-					nb_pkts, NULL);
-	if (!nb_pkts)
-		return 0;
-
 	if (xsk_ring_prod__reserve(&txq->tx, nb_pkts, &idx_tx)
 			!= nb_pkts)
 		return 0;
@@ -260,7 +279,12 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i);
 		mbuf = bufs[i];
 		if (mbuf->pkt_len <= buf_len) {
-			desc->addr = (uint64_t)addrs[valid];
+			mbuf_to_tx = rte_pktmbuf_alloc(umem->mb_pool);
+			if (!mbuf_to_tx) {
+				rte_pktmbuf_free(mbuf);
+				continue;
+			}
+			desc->addr = mbuf_to_addr(umem, mbuf_to_tx);
 			desc->len = mbuf->pkt_len;
 			pkt = xsk_umem__get_data(umem->buffer,
 						 desc->addr);
@@ -276,10 +300,6 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 
 	kick_tx(txq);
 
-	if (valid < nb_pkts)
-		rte_ring_enqueue_bulk(umem->buf_ring, &addrs[valid],
-				 nb_pkts - valid, NULL);
-
 	txq->err_pkts += nb_pkts - valid;
 	txq->tx_pkts += valid;
 	txq->tx_bytes += tx_bytes;
@@ -429,12 +449,28 @@ eth_link_update(struct rte_eth_dev *dev __rte_unused,
 
 static void xdp_umem_destroy(struct xsk_umem_info *umem)
 {
-	if (umem->buffer)
-		free(umem->buffer);
+	if (umem->mb_pool)
+		rte_mempool_free(umem->mb_pool);
 
 	free(umem);
 }
 
+static inline uint64_t get_base_addr(struct rte_mempool *mp)
+{
+	struct rte_mempool_memhdr *memhdr;
+
+	memhdr = STAILQ_FIRST(&mp->mem_list);
+	return (uint64_t)(memhdr->addr);
+}
+
+static inline uint64_t get_len(struct rte_mempool *mp)
+{
+	struct rte_mempool_memhdr *memhdr;
+
+	memhdr = STAILQ_FIRST(&mp->mem_list);
+	return (uint64_t)(memhdr->len);
+}
+
 static struct xsk_umem_info *xdp_umem_configure(void)
 {
 	struct xsk_umem_info *umem;
@@ -443,10 +479,9 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 		.comp_size = ETH_AF_XDP_DFLT_NUM_DESCS,
 		.frame_size = ETH_AF_XDP_FRAME_SIZE,
 		.frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
-	void *bufs = NULL;
-	char ring_name[0x100];
+	void *base_addr = NULL;
+	char pool_name[0x100];
 	int ret;
-	uint64_t i;
 
 	umem = calloc(1, sizeof(*umem));
 	if (!umem) {
@@ -454,28 +489,23 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 		return NULL;
 	}
 
-	snprintf(ring_name, 0x100, "af_xdp_ring");
-	umem->buf_ring = rte_ring_create(ring_name,
-					 ETH_AF_XDP_NUM_BUFFERS,
-					 SOCKET_ID_ANY,
-					 0x0);
-	if (!umem->buf_ring) {
+	snprintf(pool_name, 0x100, "af_xdp_ring");
+	umem->mb_pool = rte_pktmbuf_pool_create_with_flags(pool_name,
+			ETH_AF_XDP_NUM_BUFFERS,
+			250, 0,
+			ETH_AF_XDP_FRAME_SIZE -
+			ETH_AF_XDP_MBUF_OVERHEAD,
+			MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN,
+			SOCKET_ID_ANY);
+
+	if (!umem->mb_pool || umem->mb_pool->nb_mem_chunks != 1) {
 		RTE_LOG(ERR, PMD,
-			"Failed to create rte_ring\n");
+			"Failed to create rte_mempool\n");
 		goto err;
 	}
+	base_addr = (void *)get_base_addr(umem->mb_pool);
 
-	for (i = 0; i < ETH_AF_XDP_NUM_BUFFERS; i++)
-		rte_ring_enqueue(umem->buf_ring,
-				 (void *)(i * ETH_AF_XDP_FRAME_SIZE +
-					  ETH_AF_XDP_DATA_HEADROOM));
-
-	if (posix_memalign(&bufs, getpagesize(),
-			   ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE)) {
-		RTE_LOG(ERR, PMD, "Failed to allocate memory pool.\n");
-		goto err;
-	}
-	ret = xsk_umem__create(&umem->umem, bufs,
+	ret = xsk_umem__create(&umem->umem, base_addr,
 			       ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
 			       &umem->fq, &umem->cq,
 			       &usr_config);
@@ -484,7 +514,7 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 		RTE_LOG(ERR, PMD, "Failed to create umem");
 		goto err;
 	}
-	umem->buffer = bufs;
+	umem->buffer = base_addr;
 
 	return umem;
 
@@ -880,8 +910,7 @@ rte_pmd_af_xdp_remove(struct rte_vdev_device *dev)
 
 	internals = eth_dev->data->dev_private;
 
-	rte_ring_free(internals->umem->buf_ring);
-	rte_free(internals->umem->buffer);
+	rte_mempool_free(internals->umem->mb_pool);
 	rte_free(internals->umem);
 	rte_free(internals);
 
-- 
2.17.1


* [PATCH v1 5/6] net/af_xdp: enable zero copy
  2019-03-01  8:09 [PATCH v1 0/6] Introduce AF_XDP PMD Xiaolong Ye
                   ` (3 preceding siblings ...)
  2019-03-01  8:09 ` [PATCH v1 4/6] net/af_xdp: use mbuf mempool for buffer management Xiaolong Ye
@ 2019-03-01  8:09 ` Xiaolong Ye
  2019-03-01  8:09 ` [PATCH v1 6/6] app/testpmd: add mempool flags parameter Xiaolong Ye
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-01  8:09 UTC (permalink / raw)
  To: dev; +Cc: Qi Zhang, Xiaolong Ye

Check whether the external mempool (from rx_queue_setup) is suitable for
af_xdp; if it is, it will be registered to the af_xdp socket directly and
there will be no packet data copy on Rx and Tx.
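
For reference, a pool passes the zero-copy check when it has a single
contiguous memory chunk, a cache-line-sized object header, a page-aligned
base address, and objects that tile exactly into 2048-byte umem frames; a
pool created as below (sizes taken from the driver's constants) would
qualify:

    mp = rte_pktmbuf_pool_create_with_flags("af_xdp_zc_pool",
                    4096,        /* ETH_AF_XDP_NUM_BUFFERS */
                    250, 0,
                    2048 - 192,  /* frame size minus mbuf overhead */
                    MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN,
                    SOCKET_ID_ANY);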

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 drivers/net/af_xdp/rte_eth_af_xdp.c | 126 ++++++++++++++++++++--------
 1 file changed, 91 insertions(+), 35 deletions(-)

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index e8270360c..bfb93f50d 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -60,6 +60,7 @@ struct xsk_umem_info {
 	struct xsk_umem *umem;
 	struct rte_mempool *mb_pool;
 	void *buffer;
+	uint8_t zc;
 };
 
 struct pkt_rx_queue {
@@ -74,6 +75,7 @@ struct pkt_rx_queue {
 
 	struct pkt_tx_queue *pair;
 	uint16_t queue_idx;
+	uint8_t zc;
 };
 
 struct pkt_tx_queue {
@@ -187,17 +189,24 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		uint32_t len = xsk_ring_cons__rx_desc(rx, idx_rx++)->len;
 		char *pkt = xsk_umem__get_data(rxq->umem->buffer, addr);
 
-		mbuf = rte_pktmbuf_alloc(rxq->mb_pool);
-		if (mbuf) {
-			memcpy(rte_pktmbuf_mtod(mbuf, void*), pkt, len);
+		if (rxq->zc) {
+			mbuf = addr_to_mbuf(rxq->umem, addr);
 			rte_pktmbuf_pkt_len(mbuf) =
 				rte_pktmbuf_data_len(mbuf) = len;
-			rx_bytes += len;
 			bufs[count++] = mbuf;
 		} else {
-			dropped++;
+			mbuf = rte_pktmbuf_alloc(rxq->mb_pool);
+			if (mbuf) {
+				memcpy(rte_pktmbuf_mtod(mbuf, void*), pkt, len);
+				rte_pktmbuf_pkt_len(mbuf) =
+					rte_pktmbuf_data_len(mbuf) = len;
+				rx_bytes += len;
+				bufs[count++] = mbuf;
+			} else {
+				dropped++;
+			}
+			rte_pktmbuf_free(addr_to_mbuf(umem, addr));
 		}
-		rte_pktmbuf_free(addr_to_mbuf(umem, addr));
 	}
 
 	xsk_ring_cons__release(rx, rcvd);
@@ -278,22 +287,29 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 					- ETH_AF_XDP_DATA_HEADROOM;
 		desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i);
 		mbuf = bufs[i];
-		if (mbuf->pkt_len <= buf_len) {
-			mbuf_to_tx = rte_pktmbuf_alloc(umem->mb_pool);
-			if (!mbuf_to_tx) {
-				rte_pktmbuf_free(mbuf);
-				continue;
-			}
-			desc->addr = mbuf_to_addr(umem, mbuf_to_tx);
+		if (txq->pair->zc && mbuf->pool == umem->mb_pool) {
+			desc->addr = mbuf_to_addr(umem, mbuf);
 			desc->len = mbuf->pkt_len;
-			pkt = xsk_umem__get_data(umem->buffer,
-						 desc->addr);
-			memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
-			       desc->len);
 			valid++;
 			tx_bytes += mbuf->pkt_len;
+		} else {
+			if (mbuf->pkt_len <= buf_len) {
+				mbuf_to_tx = rte_pktmbuf_alloc(umem->mb_pool);
+				if (!mbuf_to_tx) {
+					rte_pktmbuf_free(mbuf);
+					continue;
+				}
+				desc->addr = mbuf_to_addr(umem, mbuf_to_tx);
+				desc->len = mbuf->pkt_len;
+				pkt = xsk_umem__get_data(umem->buffer,
+							 desc->addr);
+				memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
+				       desc->len);
+				valid++;
+				tx_bytes += mbuf->pkt_len;
+			}
+			rte_pktmbuf_free(mbuf);
 		}
-		rte_pktmbuf_free(mbuf);
 	}
 
 	xsk_ring_prod__submit(&txq->tx, nb_pkts);
@@ -471,7 +487,7 @@ static inline uint64_t get_len(struct rte_mempool *mp)
 	return (uint64_t)(memhdr->len);
 }
 
-static struct xsk_umem_info *xdp_umem_configure(void)
+static struct xsk_umem_info *xdp_umem_configure(struct rte_mempool *mb_pool)
 {
 	struct xsk_umem_info *umem;
 	struct xsk_umem_config usr_config = {
@@ -489,20 +505,26 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 		return NULL;
 	}
 
-	snprintf(pool_name, 0x100, "af_xdp_ring");
-	umem->mb_pool = rte_pktmbuf_pool_create_with_flags(pool_name,
-			ETH_AF_XDP_NUM_BUFFERS,
-			250, 0,
-			ETH_AF_XDP_FRAME_SIZE -
-			ETH_AF_XDP_MBUF_OVERHEAD,
-			MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN,
-			SOCKET_ID_ANY);
-
-	if (!umem->mb_pool || umem->mb_pool->nb_mem_chunks != 1) {
-		RTE_LOG(ERR, PMD,
-			"Failed to create rte_mempool\n");
-		goto err;
+	if (!mb_pool) {
+		snprintf(pool_name, 0x100, "af_xdp_ring");
+		umem->mb_pool = rte_pktmbuf_pool_create_with_flags(pool_name,
+				ETH_AF_XDP_NUM_BUFFERS,
+				250, 0,
+				ETH_AF_XDP_FRAME_SIZE -
+				ETH_AF_XDP_MBUF_OVERHEAD,
+				MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN,
+				SOCKET_ID_ANY);
+
+		if (!umem->mb_pool || umem->mb_pool->nb_mem_chunks != 1) {
+			RTE_LOG(ERR, PMD,
+					"Failed to create rte_mempool\n");
+			goto err;
+		}
+	} else {
+		umem->mb_pool = mb_pool;
+		umem->zc = 1;
 	}
+
 	base_addr = (void *)get_base_addr(umem->mb_pool);
 
 	ret = xsk_umem__create(&umem->umem, base_addr,
@@ -523,16 +545,43 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 	return NULL;
 }
 
+static uint8_t
+check_mempool_zc(struct rte_mempool *mp)
+{
+	RTE_ASSERT(mp);
+
+	/* memory must be contiguous */
+	if (mp->nb_mem_chunks > 1)
+		return 0;
+
+	/* check header size */
+	if (mp->header_size != RTE_CACHE_LINE_SIZE)
+		return 0;
+
+	/* check base address */
+	if ((uint64_t)get_base_addr(mp) % getpagesize() != 0)
+		return 0;
+
+	/* check chunk size */
+	if ((mp->elt_size + mp->header_size + mp->trailer_size) %
+	    ETH_AF_XDP_FRAME_SIZE != 0)
+		return 0;
+
+	return 1;
+}
+
 static int
 xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
-	      int ring_size)
+	      int ring_size, struct rte_mempool *mb_pool)
 {
 	struct xsk_socket_config cfg;
 	struct pkt_tx_queue *txq = rxq->pair;
+	struct rte_mempool *mp;
 	int ret = 0;
 	int reserve_size;
 
-	rxq->umem = xdp_umem_configure();
+	mp = check_mempool_zc(mb_pool) ? mb_pool : NULL;
+	rxq->umem = xdp_umem_configure(mp);
 	if (!rxq->umem) {
 		ret = -ENOMEM;
 		goto err;
@@ -633,7 +682,7 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
 
 	rxq->mb_pool = mb_pool;
 
-	if (xsk_configure(internals, rxq, nb_rx_desc)) {
+	if (xsk_configure(internals, rxq, nb_rx_desc, mb_pool)) {
 		RTE_LOG(ERR, PMD,
 			"Failed to configure xdp socket\n");
 		ret = -EINVAL;
@@ -642,6 +691,13 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
 
 	internals->umem = rxq->umem;
 
+	if (mb_pool == internals->umem->mb_pool)
+		rxq->zc = internals->umem->zc;
+
+	if (rxq->zc)
+		RTE_LOG(INFO, PMD,
+			"zero copy enabled on rx queue %d\n", rx_queue_id);
+
 	dev->data->rx_queues[rx_queue_id] = rxq;
 	return 0;
 
-- 
2.17.1


* [PATCH v1 6/6] app/testpmd: add mempool flags parameter
  2019-03-01  8:09 [PATCH v1 0/6] Introduce AF_XDP PMD Xiaolong Ye
                   ` (4 preceding siblings ...)
  2019-03-01  8:09 ` [PATCH v1 5/6] net/af_xdp: enable zero copy Xiaolong Ye
@ 2019-03-01  8:09 ` Xiaolong Ye
  2019-03-01 18:34   ` Stephen Hemminger
  2019-03-11 16:46   ` Ferruh Yigit
  2019-03-11 16:43 ` [PATCH v1 0/6] Introduce AF_XDP PMD Ferruh Yigit
                   ` (10 subsequent siblings)
  16 siblings, 2 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-01  8:09 UTC (permalink / raw)
  To: dev; +Cc: Qi Zhang, Xiaolong Ye

When creating the rte_mempool, flags can now be parsed from the command
line. This makes it possible for testpmd to create an af_xdp friendly
mempool (which enables zero copy).
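
For example, assuming the flag values from this series
(MEMPOOL_F_NO_SPREAD = 0x0001, MEMPOOL_F_PAGE_ALIGN = 0x0040, so
0x0041 = 65 decimal; the option is parsed with atoi(), so it takes a
decimal value):

    ./build/app/testpmd -c 0xc -n 4 --vdev eth_af_xdp,iface=enp59s0f0 \
        -- -i --mp-flags=65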

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 app/test-pmd/parameters.c | 12 ++++++++++++
 app/test-pmd/testpmd.c    | 17 ++++++++++-------
 app/test-pmd/testpmd.h    |  1 +
 3 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 38b419767..9d5be0007 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -61,6 +61,7 @@ usage(char* progname)
 	       "--tx-first | --stats-period=PERIOD | "
 	       "--coremask=COREMASK --portmask=PORTMASK --numa "
 	       "--mbuf-size= | --total-num-mbufs= | "
+	       "--mp-flags= | "
 	       "--nb-cores= | --nb-ports= | "
 #ifdef RTE_LIBRTE_CMDLINE
 	       "--eth-peers-configfile= | "
@@ -105,6 +106,7 @@ usage(char* progname)
 	printf("  --socket-num=N: set socket from which all memory is allocated "
 	       "in NUMA mode.\n");
 	printf("  --mbuf-size=N: set the data size of mbuf to N bytes.\n");
+	printf("  --mp-flags=N: set the flags used when creating the mbuf memory pool.\n");
 	printf("  --total-num-mbufs=N: set the number of mbufs to be allocated "
 	       "in mbuf pools.\n");
 	printf("  --max-pkt-len=N: set the maximum size of packet to N bytes.\n");
@@ -585,6 +587,7 @@ launch_args_parse(int argc, char** argv)
 		{ "ring-numa-config",           1, 0, 0 },
 		{ "socket-num",			1, 0, 0 },
 		{ "mbuf-size",			1, 0, 0 },
+		{ "mp-flags",			1, 0, 0 },
 		{ "total-num-mbufs",		1, 0, 0 },
 		{ "max-pkt-len",		1, 0, 0 },
 		{ "pkt-filter-mode",            1, 0, 0 },
@@ -811,6 +814,15 @@ launch_args_parse(int argc, char** argv)
 					rte_exit(EXIT_FAILURE,
 						 "mbuf-size should be > 0 and < 65536\n");
 			}
+			if (!strcmp(lgopts[opt_idx].name, "mp-flags")) {
+				n = atoi(optarg);
+				if (n > 0 && n <= 0xFFFF)
+					mp_flags = (uint16_t)n;
+				else
+					rte_exit(EXIT_FAILURE,
+						 "mp-flags should be > 0 and < 65536\n");
+			}
+
 			if (!strcmp(lgopts[opt_idx].name, "total-num-mbufs")) {
 				n = atoi(optarg);
 				if (n > 1024)
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 98c1baa8b..e0519be3c 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -195,6 +195,7 @@ uint32_t burst_tx_delay_time = BURST_TX_WAIT_US;
 uint32_t burst_tx_retry_num = BURST_TX_RETRIES;
 
 uint16_t mbuf_data_size = DEFAULT_MBUF_DATA_SIZE; /**< Mbuf data space size. */
+uint16_t mp_flags = 0; /**< flags parsed when creating mempool */
 uint32_t param_total_num_mbufs = 0;  /**< number of mbufs in all pools - if
                                       * specified on command-line. */
 uint16_t stats_period; /**< Period to show statistics (disabled by default) */
@@ -834,6 +835,7 @@ setup_extmem(uint32_t nb_mbufs, uint32_t mbuf_sz, bool huge)
  */
 static void
 mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
+		 unsigned int flags,
 		 unsigned int socket_id)
 {
 	char pool_name[RTE_MEMPOOL_NAMESIZE];
@@ -853,8 +855,8 @@ mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
 			/* wrapper to rte_mempool_create() */
 			TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
 					rte_mbuf_best_mempool_ops());
-			rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
-				mb_mempool_cache, 0, mbuf_seg_size, socket_id);
+			rte_mp = rte_pktmbuf_pool_create_with_flags(pool_name, nb_mbuf,
+				mb_mempool_cache, 0, mbuf_seg_size, flags, socket_id);
 			break;
 		}
 	case MP_ALLOC_ANON:
@@ -891,8 +893,8 @@ mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
 
 			TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
 					rte_mbuf_best_mempool_ops());
-			rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
-					mb_mempool_cache, 0, mbuf_seg_size,
+			rte_mp = rte_pktmbuf_pool_create_with_flags(pool_name, nb_mbuf,
+					mb_mempool_cache, 0, mbuf_seg_size, flags,
 					heap_socket);
 			break;
 		}
@@ -1128,13 +1130,14 @@ init_config(void)
 
 		for (i = 0; i < num_sockets; i++)
 			mbuf_pool_create(mbuf_data_size, nb_mbuf_per_pool,
-					 socket_ids[i]);
+					 mp_flags, socket_ids[i]);
 	} else {
 		if (socket_num == UMA_NO_CONFIG)
-			mbuf_pool_create(mbuf_data_size, nb_mbuf_per_pool, 0);
+			mbuf_pool_create(mbuf_data_size, nb_mbuf_per_pool,
+					 mp_flags, 0);
 		else
 			mbuf_pool_create(mbuf_data_size, nb_mbuf_per_pool,
-						 socket_num);
+					 mp_flags, socket_num);
 	}
 
 	init_port_config();
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index fa4887853..3ddb70e3e 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -408,6 +408,7 @@ extern uint8_t dcb_config;
 extern uint8_t dcb_test;
 
 extern uint16_t mbuf_data_size; /**< Mbuf data space size. */
+extern uint16_t mp_flags;  /**< flags for mempool creation. */
 extern uint32_t param_total_num_mbufs;
 
 extern uint16_t stats_period;
-- 
2.17.1


* Re: [PATCH v1 1/6] net/af_xdp: introduce AF_XDP PMD driver
  2019-03-01  8:09 ` [PATCH v1 1/6] net/af_xdp: introduce AF_XDP PMD driver Xiaolong Ye
@ 2019-03-01 15:38   ` Luca Boccassi
  2019-03-02  8:14     ` Ye Xiaolong
  2019-03-01 18:31   ` Stephen Hemminger
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 214+ messages in thread
From: Luca Boccassi @ 2019-03-01 15:38 UTC (permalink / raw)
  To: Xiaolong Ye, dev; +Cc: Qi Zhang

On Fri, 2019-03-01 at 16:09 +0800, Xiaolong Ye wrote:
> Add a new PMD driver for AF_XDP which is a proposed faster version of
> AF_PACKET interface in Linux. More info about AF_XDP, please refer to
> [1]
> [2].
> 
> This is the vanilla version PMD which just uses a raw buffer
> registered as
> the umem.
> 
> [1] https://fosdem.org/2018/schedule/event/af_xdp/
> [2] https://lwn.net/Articles/745934/
> 
> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
> ---
>  MAINTAINERS                                   |   6 +
>  config/common_base                            |   5 +
>  doc/guides/nics/af_xdp.rst                    |  43 +
>  doc/guides/rel_notes/release_18_11.rst        |   7 +
>  drivers/net/Makefile                          |   1 +
>  drivers/net/af_xdp/Makefile                   |  31 +
>  drivers/net/af_xdp/meson.build                |   7 +
>  drivers/net/af_xdp/rte_eth_af_xdp.c           | 903
> ++++++++++++++++++
>  drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   4 +
>  mk/rte.app.mk                                 |   1 +
>  10 files changed, 1008 insertions(+)
>  create mode 100644 doc/guides/nics/af_xdp.rst
>  create mode 100644 drivers/net/af_xdp/Makefile
>  create mode 100644 drivers/net/af_xdp/meson.build
>  create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
>  create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map
> 
<..>

> --- a/drivers/net/Makefile
> +++ b/drivers/net/Makefile
> @@ -9,6 +9,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD),d)
>  endif
>  
>  DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += af_packet
> +DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += af_xdp
>  DIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark
>  DIRS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD) += atlantic
>  DIRS-$(CONFIG_RTE_LIBRTE_AVF_PMD) += avf

You need a similar change in drivers/net/meson.build

> diff --git a/drivers/net/af_xdp/Makefile
> b/drivers/net/af_xdp/Makefile
> new file mode 100644
> index 000000000..e3755fff2
> --- /dev/null
> +++ b/drivers/net/af_xdp/Makefile
> @@ -0,0 +1,31 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2018 Intel Corporation
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +#
> +# library name
> +#
> +LIB = librte_pmd_af_xdp.a
> +
> +EXPORT_MAP := rte_pmd_af_xdp_version.map
> +
> +LIBABIVER := 1
> +
> +
> +CFLAGS += -O3
> +# below line should be removed
> +CFLAGS += -I/root/yexl/shared_mks0/linux/tools/include
> +CFLAGS += -I/root/yexl/shared_mks0/linux/tools/lib/bpf

Leftovers?

> +CFLAGS += $(WERROR_FLAGS)
> +LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
> +LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
> +LDLIBS += -lrte_bus_vdev
> +
> +#
> +# all source are stored in SRCS-y
> +#
> +SRCS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += rte_eth_af_xdp.c
> +
> +include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/drivers/net/af_xdp/meson.build
> b/drivers/net/af_xdp/meson.build
> new file mode 100644
> index 000000000..4b6652685
> --- /dev/null
> +++ b/drivers/net/af_xdp/meson.build
> @@ -0,0 +1,7 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2018 Intel Corporation
> +
> +if host_machine.system() != 'linux'
> +	build = false
> +endif
> +sources = files('rte_eth_af_xdp.c')

You need to add a dependency() on libbpf and, given that upstream doesn't
ship a pkg-config file yet, a fallback to find_library, so that the PMD
can be linked against it.
Check other PMDs to see how dependencies are handled, like
drivers/net/pcap/meson.build

<..>

> diff --git a/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
> b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
> new file mode 100644
> index 000000000..ef3539840
> --- /dev/null
> +++ b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
> @@ -0,0 +1,4 @@
> +DPDK_2.0 {

2.0 is a bit old :-)

> +
> +	local: *;
> +};
> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> index d0ab942d5..db3271c7b 100644
> --- a/mk/rte.app.mk
> +++ b/mk/rte.app.mk
> @@ -143,6 +143,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_DPAA2_MEMPOOL)  +=
> -lrte_mempool_dpaa2
>  endif
>  
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
> +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     += -lrte_pmd_af_xdp
> -lelf -lbpf

Are symbols from libelf being used by the PMD?

>  _LDLIBS-$(CONFIG_RTE_LIBRTE_ARK_PMD)        += -lrte_pmd_ark
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD)   += -lrte_pmd_atlantic
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_AVF_PMD)        += -lrte_pmd_avf

-- 
Kind regards,
Luca Boccassi


* Re: [PATCH v1 1/6] net/af_xdp: introduce AF_XDP PMD driver
  2019-03-01  8:09 ` [PATCH v1 1/6] net/af_xdp: introduce AF_XDP PMD driver Xiaolong Ye
  2019-03-01 15:38   ` Luca Boccassi
@ 2019-03-01 18:31   ` Stephen Hemminger
  2019-03-02  8:08     ` Ye Xiaolong
  2019-03-01 18:32   ` Stephen Hemminger
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 214+ messages in thread
From: Stephen Hemminger @ 2019-03-01 18:31 UTC (permalink / raw)
  To: Xiaolong Ye; +Cc: dev, Qi Zhang

On Fri,  1 Mar 2019 16:09:42 +0800
Xiaolong Ye <xiaolong.ye@intel.com> wrote:

> +	if (umem->buffer)
> +		free(umem->buffer);

Minor nit: you don't need to check for NULL; free() already handles this.
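
That is, the call site can simply become:

	/* free(NULL) is a no-op per the C standard, no check needed */
	free(umem->buffer);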


* Re: [PATCH v1 1/6] net/af_xdp: introduce AF_XDP PMD driver
  2019-03-01  8:09 ` [PATCH v1 1/6] net/af_xdp: introduce AF_XDP PMD driver Xiaolong Ye
  2019-03-01 15:38   ` Luca Boccassi
  2019-03-01 18:31   ` Stephen Hemminger
@ 2019-03-01 18:32   ` Stephen Hemminger
  2019-03-02  8:07     ` Ye Xiaolong
  2019-03-05  8:25   ` David Marchand
  2019-03-11 16:20   ` Ferruh Yigit
  4 siblings, 1 reply; 214+ messages in thread
From: Stephen Hemminger @ 2019-03-01 18:32 UTC (permalink / raw)
  To: Xiaolong Ye; +Cc: dev, Qi Zhang

On Fri,  1 Mar 2019 16:09:42 +0800
Xiaolong Ye <xiaolong.ye@intel.com> wrote:

> +
> +static int
> +rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
> +{
> +	struct rte_kvargs *kvlist;
> +	char *if_name = NULL;
> +	int queue_idx = ETH_AF_XDP_DFLT_QUEUE_IDX;
> +	struct rte_eth_dev *eth_dev;
> +	const char *name;
> +	int ret;
> +
> +	RTE_LOG(INFO, PMD, "Initializing pmd_af_packet for %s\n",
> +		rte_vdev_device_name(dev));

The PMD log type is being phased out. I plan to mark it as deprecated.
All new drivers must use their own log types (see every other device driver
for how to do this).
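
For example, the usual pattern looks like this (just a sketch, with
placeholder af_xdp names):

	#include <rte_common.h>
	#include <rte_log.h>

	static int af_xdp_logtype;

	#define AF_XDP_LOG(level, fmt, args...)			\
		rte_log(RTE_LOG_ ## level, af_xdp_logtype,	\
			"%s(): " fmt, __func__, ##args)

	/* constructor: register the driver's own log type at startup */
	RTE_INIT(af_xdp_init_log)
	{
		af_xdp_logtype = rte_log_register("pmd.net.af_xdp");
		if (af_xdp_logtype >= 0)
			rte_log_set_level(af_xdp_logtype, RTE_LOG_NOTICE);
	}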


* Re: [PATCH v1 6/6] app/testpmd: add mempool flags parameter
  2019-03-01  8:09 ` [PATCH v1 6/6] app/testpmd: add mempool flags parameter Xiaolong Ye
@ 2019-03-01 18:34   ` Stephen Hemminger
  2019-03-02  8:06     ` Ye Xiaolong
  2019-03-11 16:46   ` Ferruh Yigit
  1 sibling, 1 reply; 214+ messages in thread
From: Stephen Hemminger @ 2019-03-01 18:34 UTC (permalink / raw)
  To: Xiaolong Ye; +Cc: dev, Qi Zhang

On Fri,  1 Mar 2019 16:09:47 +0800
Xiaolong Ye <xiaolong.ye@intel.com> wrote:

> When creating an rte_mempool, flags can now be parsed from the command line.
> This makes it possible for testpmd to create an AF_XDP friendly
> mempool (which enables zero copy).
> 
> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>

Please don't expose numeric values in the user API.


* Re: [PATCH v1 6/6] app/testpmd: add mempool flags parameter
  2019-03-01 18:34   ` Stephen Hemminger
@ 2019-03-02  8:06     ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-02  8:06 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Qi Zhang

Hi,

On 03/01, Stephen Hemminger wrote:
>On Fri,  1 Mar 2019 16:09:47 +0800
>Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>
>> When creating an rte_mempool, flags can now be parsed from the command line.
>> This makes it possible for testpmd to create an AF_XDP friendly
>> mempool (which enables zero copy).
>> 
>> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
>> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
>
>Please don't expose numeric values in the user API.

Do you mean that instead of "--mp-flags=<number>", we should use something like
"--mp-flags=<string_of_flags>"?

Thanks,
Xiaolong


* Re: [PATCH v1 1/6] net/af_xdp: introduce AF_XDP PMD driver
  2019-03-01 18:32   ` Stephen Hemminger
@ 2019-03-02  8:07     ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-02  8:07 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Qi Zhang

Hi,

On 03/01, Stephen Hemminger wrote:
>On Fri,  1 Mar 2019 16:09:42 +0800
>Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>
>> +
>> +static int
>> +rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
>> +{
>> +	struct rte_kvargs *kvlist;
>> +	char *if_name = NULL;
>> +	int queue_idx = ETH_AF_XDP_DFLT_QUEUE_IDX;
>> +	struct rte_eth_dev *eth_dev;
>> +	const char *name;
>> +	int ret;
>> +
>> +	RTE_LOG(INFO, PMD, "Initializing pmd_af_packet for %s\n",
>> +		rte_vdev_device_name(dev));
>
>The PMD log type is being phased out. I plan to mark it as deprecated.
>All new drivers must use their own log types (see every other device driver
>for how to do this).

Thanks for the feedback, will do.


* Re: [PATCH v1 1/6] net/af_xdp: introduce AF_XDP PMD driver
  2019-03-01 18:31   ` Stephen Hemminger
@ 2019-03-02  8:08     ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-02  8:08 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Qi Zhang

Hi,

On 03/01, Stephen Hemminger wrote:
>On Fri,  1 Mar 2019 16:09:42 +0800
>Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>
>> +	if (umem->buffer)
>> +		free(umem->buffer);
>
>Minor nit: you don't need to check for NULL; free() already handles this.

Thanks for the suggestion, will change accordingly.


* Re: [PATCH v1 1/6] net/af_xdp: introduce AF_XDP PMD driver
  2019-03-01 15:38   ` Luca Boccassi
@ 2019-03-02  8:14     ` Ye Xiaolong
  2019-03-17  3:34       ` Ye Xiaolong
  0 siblings, 1 reply; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-02  8:14 UTC (permalink / raw)
  To: Luca Boccassi; +Cc: dev, Qi Zhang

Hi, Luca

Thanks for your review.

On 03/01, Luca Boccassi wrote:
>On Fri, 2019-03-01 at 16:09 +0800, Xiaolong Ye wrote:
>> Add a new PMD driver for AF_XDP which is a proposed faster version of
>> AF_PACKET interface in Linux. More info about AF_XDP, please refer to
>> [1]
>> [2].
>> 
>> This is the vanilla version PMD which just uses a raw buffer
>> registered as
>> the umem.
>> 
>> [1] https://fosdem.org/2018/schedule/event/af_xdp/
>> [2] https://lwn.net/Articles/745934/
>> 
>> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
>> ---
>>  MAINTAINERS                                   |   6 +
>>  config/common_base                            |   5 +
>>  doc/guides/nics/af_xdp.rst                    |  43 +
>>  doc/guides/rel_notes/release_18_11.rst        |   7 +
>>  drivers/net/Makefile                          |   1 +
>>  drivers/net/af_xdp/Makefile                   |  31 +
>>  drivers/net/af_xdp/meson.build                |   7 +
>>  drivers/net/af_xdp/rte_eth_af_xdp.c           | 903
>> ++++++++++++++++++
>>  drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   4 +
>>  mk/rte.app.mk                                 |   1 +
>>  10 files changed, 1008 insertions(+)
>>  create mode 100644 doc/guides/nics/af_xdp.rst
>>  create mode 100644 drivers/net/af_xdp/Makefile
>>  create mode 100644 drivers/net/af_xdp/meson.build
>>  create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
>>  create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map
>> 
><..>
>
>> --- a/drivers/net/Makefile
>> +++ b/drivers/net/Makefile
>> @@ -9,6 +9,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD),d)
>>  endif
>>  
>>  DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += af_packet
>> +DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += af_xdp
>>  DIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark
>>  DIRS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD) += atlantic
>>  DIRS-$(CONFIG_RTE_LIBRTE_AVF_PMD) += avf
>
>You need a similar change in drivers/net/meson.build

will do.

>
>> diff --git a/drivers/net/af_xdp/Makefile
>> b/drivers/net/af_xdp/Makefile
>> new file mode 100644
>> index 000000000..e3755fff2
>> --- /dev/null
>> +++ b/drivers/net/af_xdp/Makefile
>> @@ -0,0 +1,31 @@
>> +# SPDX-License-Identifier: BSD-3-Clause
>> +# Copyright(c) 2018 Intel Corporation
>> +
>> +include $(RTE_SDK)/mk/rte.vars.mk
>> +
>> +#
>> +# library name
>> +#
>> +LIB = librte_pmd_af_xdp.a
>> +
>> +EXPORT_MAP := rte_pmd_af_xdp_version.map
>> +
>> +LIBABIVER := 1
>> +
>> +
>> +CFLAGS += -O3
>> +# below line should be removed
>> +CFLAGS += -I/root/yexl/shared_mks0/linux/tools/include
>> +CFLAGS += -I/root/yexl/shared_mks0/linux/tools/lib/bpf
>
>Leftovers?
>

Yes, will remove in next version.

>> +CFLAGS += $(WERROR_FLAGS)
>> +LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
>> +LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
>> +LDLIBS += -lrte_bus_vdev
>> +
>> +#
>> +# all source are stored in SRCS-y
>> +#
>> +SRCS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += rte_eth_af_xdp.c
>> +
>> +include $(RTE_SDK)/mk/rte.lib.mk
>> diff --git a/drivers/net/af_xdp/meson.build
>> b/drivers/net/af_xdp/meson.build
>> new file mode 100644
>> index 000000000..4b6652685
>> --- /dev/null
>> +++ b/drivers/net/af_xdp/meson.build
>> @@ -0,0 +1,7 @@
>> +# SPDX-License-Identifier: BSD-3-Clause
>> +# Copyright(c) 2018 Intel Corporation
>> +
>> +if host_machine.system() != 'linux'
>> +	build = false
>> +endif
>> +sources = files('rte_eth_af_xdp.c')
>
>You need to add a dependency() on libbpf and, given that upstream doesn't
>ship a pkg-config file yet, a fallback to find_library, so that the PMD
>can be linked against it.
>Check other PMDs to see how dependencies are handled, like
>drivers/net/pcap/meson.build
>

will check the example to see how to handle it correctly.

><..>
>
>> diff --git a/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
>> b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
>> new file mode 100644
>> index 000000000..ef3539840
>> --- /dev/null
>> +++ b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
>> @@ -0,0 +1,4 @@
>> +DPDK_2.0 {
>
>2.0 is a bit old :-)

will change it.

>
>> +
>> +	local: *;
>> +};
>> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
>> index d0ab942d5..db3271c7b 100644
>> --- a/mk/rte.app.mk
>> +++ b/mk/rte.app.mk
>> @@ -143,6 +143,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_DPAA2_MEMPOOL)  +=
>> -lrte_mempool_dpaa2
>>  endif
>>  
>>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
>> +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     += -lrte_pmd_af_xdp
>> -lelf -lbpf
>
>Are symbols from libelf being used by the PMD?

Hmm, it is a leftover of RFC, libelf is no longer needed in this version, will
remove it in next version.

Thanks,
Xiaolong
>
>>  _LDLIBS-$(CONFIG_RTE_LIBRTE_ARK_PMD)        += -lrte_pmd_ark
>>  _LDLIBS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD)   += -lrte_pmd_atlantic
>>  _LDLIBS-$(CONFIG_RTE_LIBRTE_AVF_PMD)        += -lrte_pmd_avf
>
>-- 
>Kind regards,
>Luca Boccassi


* Re: [PATCH v1 1/6] net/af_xdp: introduce AF_XDP PMD driver
  2019-03-01  8:09 ` [PATCH v1 1/6] net/af_xdp: introduce AF_XDP PMD driver Xiaolong Ye
                     ` (2 preceding siblings ...)
  2019-03-01 18:32   ` Stephen Hemminger
@ 2019-03-05  8:25   ` David Marchand
  2019-03-07  3:19     ` Ye Xiaolong
  2019-03-11 16:20   ` Ferruh Yigit
  4 siblings, 1 reply; 214+ messages in thread
From: David Marchand @ 2019-03-05  8:25 UTC (permalink / raw)
  To: Xiaolong Ye; +Cc: dev, Qi Zhang

On Fri, Mar 1, 2019 at 9:13 AM Xiaolong Ye <xiaolong.ye@intel.com> wrote:

> diff --git a/doc/guides/rel_notes/release_18_11.rst
> b/doc/guides/rel_notes/release_18_11.rst
> index 65bab557d..e0918441a 100644
> --- a/doc/guides/rel_notes/release_18_11.rst
> +++ b/doc/guides/rel_notes/release_18_11.rst
> @@ -229,6 +229,13 @@ New Features
>    The AESNI MB PMD has been updated with additional support for the
> AES-GCM
>    algorithm.
>
> +* **Added the AF_XDP PMD.**
> +
> +  Added a Linux-specific PMD driver for AF_XDP. It can create an AF_XDP
> +  socket and bind it to a specific netdev queue, allowing a DPDK
> +  application to send and receive raw packets through the socket,
> +  bypassing the kernel network stack to achieve high-performance
> +  packet processing.
> +
>

Should be in 19.05 release notes.


diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> new file mode 100644
> index 000000000..6de769650
> --- /dev/null
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
>
> [snip]

+struct pkt_rx_queue {
> +       struct xsk_ring_cons rx;
> +       struct xsk_umem_info *umem;
> +       struct xsk_socket *xsk;
> +       struct rte_mempool *mb_pool;
> +
> +       unsigned long rx_pkts;
> +       unsigned long rx_bytes;
> +       unsigned long rx_dropped;
>

ethdev stats are uint64_t, why declare those internal stats as unsigned
long ?

+
> +       struct pkt_tx_queue *pair;
> +       uint16_t queue_idx;
> +};
> +
> +struct pkt_tx_queue {
> +       struct xsk_ring_prod tx;
> +
> +       unsigned long tx_pkts;
> +       unsigned long err_pkts;
> +       unsigned long tx_bytes;
>

Idem.

+
> +       struct pkt_rx_queue *pair;
> +       uint16_t queue_idx;
> +};
>

[snip]

+static int
> +eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
> +{
> +       struct pmd_internals *internals = dev->data->dev_private;
> +       struct xdp_statistics xdp_stats;
> +       struct pkt_rx_queue *rxq;
> +       socklen_t optlen;
> +       int i;
> +
> +       optlen = sizeof(struct xdp_statistics);
> +       for (i = 0; i < dev->data->nb_rx_queues; i++) {
> +               rxq = &internals->rx_queues[i];
> +               stats->q_ipackets[i] = internals->rx_queues[i].rx_pkts;
> +               stats->q_ibytes[i] = internals->rx_queues[i].rx_bytes;
> +
> +               stats->q_opackets[i] = internals->tx_queues[i].tx_pkts;
> +               stats->q_errors[i] = internals->tx_queues[i].err_pkts;
>

q_errors[] are for reception errors, see the patchset I sent:
http://mails.dpdk.org/archives/dev/2019-March/125703.html

If you want per queue stats, use xstats.
You can still account those errors in the global stats->oerrors below.

+               stats->q_obytes[i] = internals->tx_queues[i].tx_bytes;
> +
> +               stats->ipackets += stats->q_ipackets[i];
> +               stats->ibytes += stats->q_ibytes[i];
> +               stats->imissed += internals->rx_queues[i].rx_dropped;
> +               getsockopt(xsk_socket__fd(rxq->xsk), SOL_XDP,
> XDP_STATISTICS,
> +                               &xdp_stats, &optlen);
> +               stats->imissed += xdp_stats.rx_dropped;
> +
> +               stats->opackets += stats->q_opackets[i];
> +               stats->oerrors += stats->q_errors[i];
>

-               stats->oerrors += stats->q_errors[i];
+               stats->oerrors += internals->tx_queues[i].err_pkts;

>
> +               stats->obytes += stats->q_obytes[i];
> +       }
> +
> +       return 0;
> +}
>
>

-- 
David Marchand


* Re: [PATCH v1 2/6] lib/mbuf: enable parse flags when create mempool
  2019-03-01  8:09 ` [PATCH v1 2/6] lib/mbuf: enable parse flags when create mempool Xiaolong Ye
@ 2019-03-05  8:30   ` David Marchand
  2019-03-07  3:07     ` Ye Xiaolong
  0 siblings, 1 reply; 214+ messages in thread
From: David Marchand @ 2019-03-05  8:30 UTC (permalink / raw)
  To: Xiaolong Ye; +Cc: dev, Qi Zhang, Olivier Matz

On Fri, Mar 1, 2019 at 9:13 AM Xiaolong Ye <xiaolong.ye@intel.com> wrote:

> This gives the application the option to configure each
> memory chunk's size precisely (via MEMPOOL_F_NO_SPREAD).
>
> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
>

Cc: maintainer

---
>  lib/librte_mbuf/rte_mbuf.c | 15 ++++++++++++---
>  lib/librte_mbuf/rte_mbuf.h |  8 +++++++-
>  2 files changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> index 21f6f7404..0f6fcff28 100644
> --- a/lib/librte_mbuf/rte_mbuf.c
> +++ b/lib/librte_mbuf/rte_mbuf.c
> @@ -110,7 +110,7 @@ rte_pktmbuf_init(struct rte_mempool *mp,
>  struct rte_mempool *
>  rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
>         unsigned int cache_size, uint16_t priv_size, uint16_t
> data_room_size,
> -       int socket_id, const char *ops_name)
> +       unsigned int flags, int socket_id, const char *ops_name)
>  {
>         struct rte_mempool *mp;
>         struct rte_pktmbuf_pool_private mbp_priv;
>

You can't do that: rte_pktmbuf_pool_create_by_ops is exposed to user
applications and is part of the ABI.
You must define a new internal function that takes this flag, keep the
existing interface, and add your new experimental API.


-- 
David Marchand


* Re: [PATCH v1 2/6] lib/mbuf: enable parse flags when create mempool
  2019-03-05  8:30   ` David Marchand
@ 2019-03-07  3:07     ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-07  3:07 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, Qi Zhang, Olivier Matz

Hi David,

Thanks for your comments.

On 03/05, David Marchand wrote:
>On Fri, Mar 1, 2019 at 9:13 AM Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>
>> This gives the application the option to configure each
>> memory chunk's size precisely (via MEMPOOL_F_NO_SPREAD).
>>
>> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
>> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
>>
>
>Cc: maintainer
>
>---
>>  lib/librte_mbuf/rte_mbuf.c | 15 ++++++++++++---
>>  lib/librte_mbuf/rte_mbuf.h |  8 +++++++-
>>  2 files changed, 19 insertions(+), 4 deletions(-)
>>
>> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
>> index 21f6f7404..0f6fcff28 100644
>> --- a/lib/librte_mbuf/rte_mbuf.c
>> +++ b/lib/librte_mbuf/rte_mbuf.c
>> @@ -110,7 +110,7 @@ rte_pktmbuf_init(struct rte_mempool *mp,
>>  struct rte_mempool *
>>  rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
>>         unsigned int cache_size, uint16_t priv_size, uint16_t
>> data_room_size,
>> -       int socket_id, const char *ops_name)
>> +       unsigned int flags, int socket_id, const char *ops_name)
>>  {
>>         struct rte_mempool *mp;
>>         struct rte_pktmbuf_pool_private mbp_priv;
>>
>
>You can't do that: rte_pktmbuf_pool_create_by_ops is exposed to user
>applications and is part of the ABI.
>You must define a new internal function that takes this flag, keep the
>existing interface, and add your new experimental API.
>

In this case, if I define a new function that takes the flag, it seems it
would duplicate a lot of code from rte_pktmbuf_pool_create_by_ops; do you
have any suggestions for handling this better?
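
For instance, would moving the current body into a static helper, with
the existing API and the new experimental API as thin wrappers on top,
be acceptable? A sketch (the helper would keep today's function body,
only gaining the flags argument):

	#include <rte_mbuf.h>

	/* internal helper: the existing rte_pktmbuf_pool_create_by_ops()
	 * body moves here unchanged, with the extra flags argument
	 * passed through to rte_mempool_create_empty() */
	static struct rte_mempool *
	pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
		unsigned int cache_size, uint16_t priv_size,
		uint16_t data_room_size, unsigned int flags,
		int socket_id, const char *ops_name)
	{
		/* ... current implementation ... */
	}

	/* stable API: signature and behaviour unchanged */
	struct rte_mempool *
	rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
		unsigned int cache_size, uint16_t priv_size,
		uint16_t data_room_size, int socket_id, const char *ops_name)
	{
		return pktmbuf_pool_create_by_ops(name, n, cache_size,
			priv_size, data_room_size, 0, socket_id, ops_name);
	}

	/* new experimental API exposing the flags */
	struct rte_mempool * __rte_experimental
	rte_pktmbuf_pool_create_with_flags(const char *name, unsigned int n,
		unsigned int cache_size, uint16_t priv_size,
		uint16_t data_room_size, unsigned int flags, int socket_id)
	{
		return pktmbuf_pool_create_by_ops(name, n, cache_size,
			priv_size, data_room_size, flags, socket_id, NULL);
	}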

Thanks,
Xiaolong

>
>-- 
>David Marchand


* Re: [PATCH v1 1/6] net/af_xdp: introduce AF_XDP PMD driver
  2019-03-05  8:25   ` David Marchand
@ 2019-03-07  3:19     ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-07  3:19 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, Qi Zhang

Hi, David

Thanks for your review.

On 03/05, David Marchand wrote:
>On Fri, Mar 1, 2019 at 9:13 AM Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>
>> diff --git a/doc/guides/rel_notes/release_18_11.rst
>> b/doc/guides/rel_notes/release_18_11.rst
>> index 65bab557d..e0918441a 100644
>> --- a/doc/guides/rel_notes/release_18_11.rst
>> +++ b/doc/guides/rel_notes/release_18_11.rst
>> @@ -229,6 +229,13 @@ New Features
>>    The AESNI MB PMD has been updated with additional support for the
>> AES-GCM
>>    algorithm.
>>
>> +* **Added the AF_XDP PMD.**
>> +
>> +  Added a Linux-specific PMD driver for AF_XDP, it can create the AF_XDP
>> socket
>> +  and bind it to a specific netdev queue, it allows a DPDK application to
>> send
>> +  and receive raw packets through the socket which would bypass the kernel
>> +  network stack to achieve high performance packet processing.
>> +
>>
>
>Should be in 19.05 release notes.

Oops, my bad, will switch to 19.05 in the next version.

>
>
>diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
>> b/drivers/net/af_xdp/rte_eth_af_xdp.c
>> new file mode 100644
>> index 000000000..6de769650
>> --- /dev/null
>> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
>>
>> [snip]
>
>+struct pkt_rx_queue {
>> +       struct xsk_ring_cons rx;
>> +       struct xsk_umem_info *umem;
>> +       struct xsk_socket *xsk;
>> +       struct rte_mempool *mb_pool;
>> +
>> +       unsigned long rx_pkts;
>> +       unsigned long rx_bytes;
>> +       unsigned long rx_dropped;
>>
>
>ethdev stats are uint64_t, why declare those internal stats as unsigned
>long ?

You are right, should use uint64_t instead.

>
>+
>> +       struct pkt_tx_queue *pair;
>> +       uint16_t queue_idx;
>> +};
>> +
>> +struct pkt_tx_queue {
>> +       struct xsk_ring_prod tx;
>> +
>> +       unsigned long tx_pkts;
>> +       unsigned long err_pkts;
>> +       unsigned long tx_bytes;
>>
>
>Idem.

Will change to uint64_t.

>
>+
>> +       struct pkt_rx_queue *pair;
>> +       uint16_t queue_idx;
>> +};
>>
>
>[snip]
>
>+static int
>> +eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
>> +{
>> +       struct pmd_internals *internals = dev->data->dev_private;
>> +       struct xdp_statistics xdp_stats;
>> +       struct pkt_rx_queue *rxq;
>> +       socklen_t optlen;
>> +       int i;
>> +
>> +       optlen = sizeof(struct xdp_statistics);
>> +       for (i = 0; i < dev->data->nb_rx_queues; i++) {
>> +               rxq = &internals->rx_queues[i];
>> +               stats->q_ipackets[i] = internals->rx_queues[i].rx_pkts;
>> +               stats->q_ibytes[i] = internals->rx_queues[i].rx_bytes;
>> +
>> +               stats->q_opackets[i] = internals->tx_queues[i].tx_pkts;
>> +               stats->q_errors[i] = internals->tx_queues[i].err_pkts;
>>
>
>q_errors[] are for reception errors, see the patchset I sent:
>http://mails.dpdk.org/archives/dev/2019-March/125703.html
>
>If you want per queue stats, use xstats.
>You can still account those errors in the global stats->oerrors below.
>
>+               stats->q_obytes[i] = internals->tx_queues[i].tx_bytes;
>> +
>> +               stats->ipackets += stats->q_ipackets[i];
>> +               stats->ibytes += stats->q_ibytes[i];
>> +               stats->imissed += internals->rx_queues[i].rx_dropped;
>> +               getsockopt(xsk_socket__fd(rxq->xsk), SOL_XDP,
>> XDP_STATISTICS,
>> +                               &xdp_stats, &optlen);
>> +               stats->imissed += xdp_stats.rx_dropped;
>> +
>> +               stats->opackets += stats->q_opackets[i];
>> +               stats->oerrors += stats->q_errors[i];
>>
>
>-               stats->oerrors += stats->q_errors[i];
>+               stats->oerrors += internals->tx_queues[i].err_pkts;

will correct according to your patch.

Thanks,
Xiaolong

>
>>
>> +               stats->obytes += stats->q_obytes[i];
>> +       }
>> +
>> +       return 0;
>> +}
>>
>>
>
>-- 
>David Marchand


* Re: [PATCH v1 1/6] net/af_xdp: introduce AF_XDP PMD driver
  2019-03-01  8:09 ` [PATCH v1 1/6] net/af_xdp: introduce AF_XDP PMD driver Xiaolong Ye
                     ` (3 preceding siblings ...)
  2019-03-05  8:25   ` David Marchand
@ 2019-03-11 16:20   ` Ferruh Yigit
  2019-03-12 15:54     ` Ye Xiaolong
  4 siblings, 1 reply; 214+ messages in thread
From: Ferruh Yigit @ 2019-03-11 16:20 UTC (permalink / raw)
  To: Xiaolong Ye, dev; +Cc: Qi Zhang

On 3/1/2019 8:09 AM, Xiaolong Ye wrote:
> Add a new PMD driver for AF_XDP which is a proposed faster version of
> AF_PACKET interface in Linux. More info about AF_XDP, please refer to [1]
> [2].
> 
> This is the vanilla version PMD which just uses a raw buffer registered as
> the umem.
> 
> [1] https://fosdem.org/2018/schedule/event/af_xdp/
> [2] https://lwn.net/Articles/745934/
> 
> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
> ---
>  MAINTAINERS                                   |   6 +
>  config/common_base                            |   5 +
>  doc/guides/nics/af_xdp.rst                    |  43 +

Can you please add the new .rst file to the index file, doc/guides/nics/index.rst?

>  doc/guides/rel_notes/release_18_11.rst        |   7 +

Please switch to latest release notes.

>  drivers/net/Makefile                          |   1 +
>  drivers/net/af_xdp/Makefile                   |  31 +
>  drivers/net/af_xdp/meson.build                |   7 +
>  drivers/net/af_xdp/rte_eth_af_xdp.c           | 903 ++++++++++++++++++
>  drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   4 +
>  mk/rte.app.mk                                 |   1 +

Can you please add .ini file too?

<...>

> @@ -416,6 +416,11 @@ CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX_FREE=n
>  #
>  CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
>  
> +#
> +# Compile software PMD backed by AF_XDP sockets (Linux only)
> +#
> +CONFIG_RTE_LIBRTE_PMD_AF_XDP=n
> +

Why is it not enabled in the Linux config (config/common_linuxapp)? Is it because of
the external library dependencies?
I guess there is a requirement on a specific Linux kernel version; is it
possible to detect this in the Makefile and enable/disable accordingly?

<...>

> +Prerequisites
> +-------------
> +
> +This is a Linux-specific PMD, thus the following prerequisites apply:
> +
> +*  A Linux Kernel with XDP sockets configuration enabled;

Can you please give more details on the exact vanilla kernel version required?

> +*  libbpf with latest af_xdp support installed

Is there a specific version of libbpf for this?
I can see in the Makefile that libelf is also linked; is it a dependency?

<...>

> @@ -0,0 +1,31 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2018 Intel Corporation
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +#
> +# library name
> +#
> +LIB = librte_pmd_af_xdp.a
> +
> +EXPORT_MAP := rte_pmd_af_xdp_version.map
> +
> +LIBABIVER := 1
> +
> +
> +CFLAGS += -O3
> +# below line should be removed

+1

> +CFLAGS += -I/root/yexl/shared_mks0/linux/tools/include
> +CFLAGS += -I/root/yexl/shared_mks0/linux/tools/lib/bpf
> +
> +CFLAGS += $(WERROR_FLAGS)
> +LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
> +LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
> +LDLIBS += -lrte_bus_vdev

Dependent libraries should be linked here.

<...>

> +
> +#include <linux/if_ether.h>
> +#include <linux/if_xdp.h>
> +#include <linux/if_link.h>
> +#include <asm/barrier.h>

Getting a build error for this [1]; could an include path parameter be missing?

[1]
drivers/net/af_xdp/rte_eth_af_xdp.c:15:10: fatal error: asm/barrier.h: No such
file or directory

<...>

> +static void
> +eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
> +{
> +	struct pmd_internals *internals = dev->data->dev_private;
> +
> +	dev_info->if_index = internals->if_index;
> +	dev_info->max_mac_addrs = 1;
> +	dev_info->max_rx_pktlen = (uint32_t)ETH_FRAME_LEN;
> +	dev_info->max_rx_queues = 1;
> +	dev_info->max_tx_queues = 1;

'ETH_AF_XDP_MAX_QUEUE_PAIRS' is '16' but you are forcing the max Rx/Tx queue
number to be '1', intentional?

> +	dev_info->min_rx_bufsize = 0;
> +
> +	dev_info->default_rxportconf.nb_queues = 1;
> +	dev_info->default_txportconf.nb_queues = 1;
> +	dev_info->default_rxportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
> +	dev_info->default_txportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
> +}
> +
> +static int
> +eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
> +{
> +	struct pmd_internals *internals = dev->data->dev_private;
> +	struct xdp_statistics xdp_stats;
> +	struct pkt_rx_queue *rxq;
> +	socklen_t optlen;
> +	int i;
> +
> +	optlen = sizeof(struct xdp_statistics);
> +	for (i = 0; i < dev->data->nb_rx_queues; i++) {
> +		rxq = &internals->rx_queues[i];
> +		stats->q_ipackets[i] = internals->rx_queues[i].rx_pkts;
> +		stats->q_ibytes[i] = internals->rx_queues[i].rx_bytes;
> +
> +		stats->q_opackets[i] = internals->tx_queues[i].tx_pkts;
> +		stats->q_errors[i] = internals->tx_queues[i].err_pkts;

There is a patch from David, which points the 'q_errors' is for Rx only:
https://patches.dpdk.org/cover/50783/

<...>

> +static void xdp_umem_destroy(struct xsk_umem_info *umem)
> +{
> +	if (umem->buffer)
> +		free(umem->buffer);
> +
> +	free(umem);

Should we set the freed pointers to NULL?

Shouldn't we also free 'umem->buf_ring' before freeing 'umem'?
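
E.g. (a sketch):

	static void xdp_umem_destroy(struct xsk_umem_info *umem)
	{
		/* release the ring first, then the buffer it indexes */
		rte_ring_free(umem->buf_ring);
		free(umem->buffer);
		free(umem);
	}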

<...>

> +static int
> +eth_rx_queue_setup(struct rte_eth_dev *dev,
> +		   uint16_t rx_queue_id,
> +		   uint16_t nb_rx_desc,
> +		   unsigned int socket_id __rte_unused,
> +		   const struct rte_eth_rxconf *rx_conf __rte_unused,
> +		   struct rte_mempool *mb_pool)
> +{
> +	struct pmd_internals *internals = dev->data->dev_private;
> +	unsigned int buf_size, data_size;
> +	struct pkt_rx_queue *rxq;
> +	int ret = 0;
> +
> +	if (mb_pool == NULL) {
> +		RTE_LOG(ERR, PMD,
> +			"Invalid mb_pool\n");
> +		ret = -EINVAL;
> +		goto err;
> +	}

If 'mb_pool' is NULL, it will crash in 'rte_eth_rx_queue_setup()' before
reaching here, so I think we can drop this check.

> +
> +	if (dev->data->nb_rx_queues <= rx_queue_id) {
> +		RTE_LOG(ERR, PMD,
> +			"Invalid rx queue id: %d\n", rx_queue_id);
> +		ret = -EINVAL;
> +		goto err;
> +	}

This check is already done in 'rte_eth_rx_queue_setup()', so it shouldn't
need to be done here.
<...>

> +static int
> +eth_tx_queue_setup(struct rte_eth_dev *dev,
> +		   uint16_t tx_queue_id,
> +		   uint16_t nb_tx_desc,
> +		   unsigned int socket_id __rte_unused,
> +		   const struct rte_eth_txconf *tx_conf __rte_unused)
> +{
> +	struct pmd_internals *internals = dev->data->dev_private;
> +	struct pkt_tx_queue *txq;
> +
> +	if (dev->data->nb_tx_queues <= tx_queue_id) {
> +		RTE_LOG(ERR, PMD, "Invalid tx queue id: %d\n", tx_queue_id);
> +		return -EINVAL;
> +	}

Can skip the check, same as above.

> +
> +	RTE_LOG(WARNING, PMD, "tx queue setup size=%d will be skipped\n",
> +		nb_tx_desc);

Why will the setup be skipped?

> +	txq = &internals->tx_queues[tx_queue_id];
> +
> +	dev->data->tx_queues[tx_queue_id] = txq;
> +	return 0;
> +}
> +
> +static int
> +eth_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
> +{
> +	struct pmd_internals *internals = dev->data->dev_private;
> +	struct ifreq ifr = { .ifr_mtu = mtu };
> +	int ret;
> +	int s;
> +
> +	s = socket(PF_INET, SOCK_DGRAM, 0);
> +	if (s < 0)
> +		return -EINVAL;
> +
> +	snprintf(ifr.ifr_name, IFNAMSIZ, "%s", internals->if_name);

Can you please prefer strlcpy?
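
E.g.:

	strlcpy(ifr.ifr_name, internals->if_name, IFNAMSIZ);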

> +	ret = ioctl(s, SIOCSIFMTU, &ifr);
> +	close(s);
> +
> +	if (ret < 0)
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +
> +static void
> +eth_dev_change_flags(char *if_name, uint32_t flags, uint32_t mask)
> +{
> +	struct ifreq ifr;
> +	int s;
> +
> +	s = socket(PF_INET, SOCK_DGRAM, 0);
> +	if (s < 0)
> +		return;
> +
> +	snprintf(ifr.ifr_name, IFNAMSIZ, "%s", if_name);

Can you please prefer strlcpy?

<...>

> +
> +static struct rte_vdev_driver pmd_af_xdp_drv;

Do we need this forward declaration?

> +
> +static void
> +parse_parameters(struct rte_kvargs *kvlist,
> +		 char **if_name,
> +		 int *queue_idx)
> +{
> +	struct rte_kvargs_pair *pair = NULL;
> +	unsigned int k_idx;
> +
> +	for (k_idx = 0; k_idx < kvlist->count; k_idx++) {
> +		pair = &kvlist->pairs[k_idx];
> +		if (strstr(pair->key, ETH_AF_XDP_IFACE_ARG))

It is better to use 'rte_kvargs_process()' instead of accessing the 'kvargs'
internals.
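
I.e. something like (a sketch, error handling elided):

	#include <string.h>
	#include <rte_common.h>
	#include <rte_kvargs.h>

	/* handler invoked by rte_kvargs_process() for each matching pair */
	static int
	parse_iface_arg(const char *key __rte_unused, const char *value,
			void *extra_args)
	{
		char **if_name = extra_args;

		*if_name = strdup(value);
		return *if_name == NULL ? -1 : 0;
	}

	/* in parse_parameters(): */
	if (rte_kvargs_process(kvlist, ETH_AF_XDP_IFACE_ARG,
			       &parse_iface_arg, &if_name) < 0)
		return -EINVAL;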

> +			*if_name = pair->value;
> +		else if (strstr(pair->key, ETH_AF_XDP_QUEUE_IDX_ARG))
> +			*queue_idx = atoi(pair->value);
> +	}
> +}
> +
> +static int
> +get_iface_info(const char *if_name,
> +	       struct ether_addr *eth_addr,
> +	       int *if_index)
> +{
> +	struct ifreq ifr;
> +	int sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_IP);
> +
> +	if (sock < 0)
> +		return -1;
> +
> +	strcpy(ifr.ifr_name, if_name);

Please prefer strlcpy.

> +	if (ioctl(sock, SIOCGIFINDEX, &ifr))
> +		goto error;
> +
> +	if (ioctl(sock, SIOCGIFHWADDR, &ifr))
> +		goto error;
> +
> +	memcpy(eth_addr, ifr.ifr_hwaddr.sa_data, 6);

Can use 'ether_addr_copy()'
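
E.g.:

	ether_addr_copy((struct ether_addr *)ifr.ifr_hwaddr.sa_data,
			eth_addr);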

> +
> +	close(sock);
> +	*if_index = if_nametoindex(if_name);
> +	return 0;
> +
> +error:
> +	close(sock);
> +	return -1;
> +}
> +
> +static int
> +init_internals(struct rte_vdev_device *dev,
> +	       const char *if_name,
> +	       int queue_idx)
> +{
> +	const char *name = rte_vdev_device_name(dev);
> +	struct rte_eth_dev *eth_dev = NULL;
> +	const unsigned int numa_node = dev->device.numa_node;
> +	struct pmd_internals *internals = NULL;
> +	int ret;
> +	int i;
> +
> +	internals = rte_zmalloc_socket(name, sizeof(*internals), 0, numa_node);
> +	if (!internals)
> +		return -ENOMEM;
> +
> +	internals->queue_idx = queue_idx;
> +	strcpy(internals->if_name, if_name);

prefer 'strlcpy' please

> +
> +	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
> +		internals->tx_queues[i].pair = &internals->rx_queues[i];
> +		internals->rx_queues[i].pair = &internals->tx_queues[i];
> +	}
> +
> +	ret = get_iface_info(if_name, &internals->eth_addr,
> +			     &internals->if_index);
> +	if (ret)
> +		goto err;
> +
> +	eth_dev = rte_eth_vdev_allocate(dev, 0);
> +	if (!eth_dev)
> +		goto err;
> +
> +	eth_dev->data->dev_private = internals;
> +	eth_dev->data->dev_link = pmd_link;
> +	eth_dev->data->mac_addrs = &internals->eth_addr;
> +	eth_dev->dev_ops = &ops;
> +	eth_dev->rx_pkt_burst = eth_af_xdp_rx;
> +	eth_dev->tx_pkt_burst = eth_af_xdp_tx;
> +
> +	rte_eth_dev_probing_finish(eth_dev);

What do you think about moving this call into the 'rte_pmd_af_xdp_probe'
function when 'init_internals' returns success, instead of setting it here?

> +	return 0;
> +
> +err:
> +	rte_free(internals);
> +	return -1;
> +}
> +
> +static int
> +rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
> +{
> +	struct rte_kvargs *kvlist;
> +	char *if_name = NULL;
> +	int queue_idx = ETH_AF_XDP_DFLT_QUEUE_IDX;

This 'queue_idx' is the interface queue index passed to the xsk_* API, but we
also have a variable named 'queue_idx' for the DPDK queue index; the two are
easily confused. What do you think about renaming this one to something like
'xsk_queue_idx'?

> +	struct rte_eth_dev *eth_dev;
> +	const char *name;
> +	int ret;
> +
> +	RTE_LOG(INFO, PMD, "Initializing pmd_af_packet for %s\n",
> +		rte_vdev_device_name(dev));
> +
> +	name = rte_vdev_device_name(dev);
> +	if (rte_eal_process_type() == RTE_PROC_SECONDARY &&
> +		strlen(rte_vdev_device_args(dev)) == 0) {
> +		eth_dev = rte_eth_dev_attach_secondary(name);
> +		if (!eth_dev) {
> +			RTE_LOG(ERR, PMD, "Failed to probe %s\n", name);
> +			return -EINVAL;
> +		}
> +		eth_dev->dev_ops = &ops;
> +		rte_eth_dev_probing_finish(eth_dev);
> +	}
> +
> +	kvlist = rte_kvargs_parse(rte_vdev_device_args(dev), valid_arguments);
> +	if (!kvlist) {
> +		RTE_LOG(ERR, PMD,
> +			"Invalid kvargs\n");

No need to break the line.

> +		return -EINVAL;
> +	}
> +
> +	if (dev->device.numa_node == SOCKET_ID_ANY)
> +		dev->device.numa_node = rte_socket_id();
> +
> +	parse_parameters(kvlist, &if_name,
> +			 &queue_idx);

Same, no need to break the line.

> +
> +	ret = init_internals(dev, if_name, queue_idx);
> +
> +	rte_kvargs_free(kvlist);
> +
> +	return ret;
> +}
> +
> +static int
> +rte_pmd_af_xdp_remove(struct rte_vdev_device *dev)
> +{
> +	struct rte_eth_dev *eth_dev = NULL;
> +	struct pmd_internals *internals;
> +
> +	RTE_LOG(INFO, PMD, "Removing AF_XDP ethdev on numa socket %u\n",
> +		rte_socket_id());
> +
> +	if (!dev)
> +		return -1;
> +
> +	/* find the ethdev entry */
> +	eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(dev));
> +	if (!eth_dev)
> +		return -1;
> +
> +	internals = eth_dev->data->dev_private;
> +
> +	rte_ring_free(internals->umem->buf_ring);
> +	rte_free(internals->umem->buffer);
> +	rte_free(internals->umem);
> +	rte_free(internals);
> +
> +	rte_eth_dev_release_port(eth_dev);

This frees 'eth_dev->data->dev_private' (internals); it can be a problem
to try to free the same pointer twice.
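
I.e. drop the explicit free and let the ethdev layer release 'internals':

-	rte_free(internals);

 	rte_eth_dev_release_port(eth_dev);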

> +
> +
> +	return 0;
> +}
> +
> +static struct rte_vdev_driver pmd_af_xdp_drv = {
> +	.probe = rte_pmd_af_xdp_probe,
> +	.remove = rte_pmd_af_xdp_remove,
> +};
> +
> +RTE_PMD_REGISTER_VDEV(net_af_xdp, pmd_af_xdp_drv);
> +RTE_PMD_REGISTER_ALIAS(net_af_xdp, eth_af_xdp);

No need to create the alias, it is for backward compatibility.

> +RTE_PMD_REGISTER_PARAM_STRING(net_af_xdp,
> +			      "iface=<string> "
> +			      "queue=<int> ");
> diff --git a/drivers/net/af_xdp/rte_pmd_af_xdp_version.map b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
> new file mode 100644
> index 000000000..ef3539840
> --- /dev/null
> +++ b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
> @@ -0,0 +1,4 @@
> +DPDK_2.0 {

Use release version please.

> +
> +	local: *;
> +};
> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> index d0ab942d5..db3271c7b 100644
> --- a/mk/rte.app.mk
> +++ b/mk/rte.app.mk
> @@ -143,6 +143,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_DPAA2_MEMPOOL)  += -lrte_mempool_dpaa2
>  endif
>  
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
> +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     += -lrte_pmd_af_xdp -lelf -lbpf

For which API is libelf required?


* Re: [PATCH v1 0/6] Introduce AF_XDP PMD
  2019-03-01  8:09 [PATCH v1 0/6] Introduce AF_XDP PMD Xiaolong Ye
                   ` (5 preceding siblings ...)
  2019-03-01  8:09 ` [PATCH v1 6/6] app/testpmd: add mempool flags parameter Xiaolong Ye
@ 2019-03-11 16:43 ` Ferruh Yigit
  2019-03-11 17:19   ` Thomas Monjalon
  2019-03-19  7:12 ` [PATCH v2 " Xiaolong Ye
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 214+ messages in thread
From: Ferruh Yigit @ 2019-03-11 16:43 UTC (permalink / raw)
  To: Xiaolong Ye, dev; +Cc: Qi Zhang, Thomas Monjalon, Bruce Richardson

On 3/1/2019 8:09 AM, Xiaolong Ye wrote:
> Overview
> ========
> 
> This patchset adds a new PMD driver for AF_XDP which is a proposed
> faster version of AF_PACKET interface in Linux, see below links [1] [2] for
> details of AF_XDP introduction:
> 
> AF_XDP roadmap
> ==============
> - AF_XDP is included in upstream kernel since 4.18, and AF_XDP support
>   in libbpf has been merged in bpf-next/master. [3]

And it seems it has been merged into the main repo [1]; I assume it will be part of
5.1, which I guess will be released mid May.

And we have a release on 10 May. Taking into account that the libbpf APIs are used
extensively, does that mean we can't release af_xdp in 19.05?


[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=1cad078842396f0047a796694b6130fc096d97e2

<...>


* Re: [PATCH v1 6/6] app/testpmd: add mempool flags parameter
  2019-03-01  8:09 ` [PATCH v1 6/6] app/testpmd: add mempool flags parameter Xiaolong Ye
  2019-03-01 18:34   ` Stephen Hemminger
@ 2019-03-11 16:46   ` Ferruh Yigit
  2019-03-12 15:10     ` Ye Xiaolong
  1 sibling, 1 reply; 214+ messages in thread
From: Ferruh Yigit @ 2019-03-11 16:46 UTC (permalink / raw)
  To: Xiaolong Ye, dev; +Cc: Qi Zhang

On 3/1/2019 8:09 AM, Xiaolong Ye wrote:
> When creating an rte_mempool, flags can now be parsed from the command line.
> This makes it possible for testpmd to create an AF_XDP friendly
> mempool (which enables zero copy).
> 
> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
> ---
>  app/test-pmd/parameters.c | 12 ++++++++++++
>  app/test-pmd/testpmd.c    | 17 ++++++++++-------
>  app/test-pmd/testpmd.h    |  1 +

Can you please document new added command line option in
doc/guides/testpmd_app_ug/run_app.rst ?


* Re: [PATCH v1 0/6] Introduce AF_XDP PMD
  2019-03-11 16:43 ` [PATCH v1 0/6] Introduce AF_XDP PMD Ferruh Yigit
@ 2019-03-11 17:19   ` Thomas Monjalon
  2019-03-12  1:51     ` Zhang, Qi Z
  0 siblings, 1 reply; 214+ messages in thread
From: Thomas Monjalon @ 2019-03-11 17:19 UTC (permalink / raw)
  To: Ferruh Yigit, Xiaolong Ye; +Cc: dev, Qi Zhang, Bruce Richardson

11/03/2019 17:43, Ferruh Yigit:
> On 3/1/2019 8:09 AM, Xiaolong Ye wrote:
> > Overview
> > ========
> > 
> > This patchset adds a new PMD driver for AF_XDP which is a proposed
> > faster version of AF_PACKET interface in Linux, see below links [1] [2] for
> > details of AF_XDP introduction:
> > 
> > AF_XDP roadmap
> > ==============
> > - AF_XDP is included in upstream kernel since 4.18, and AF_XDP support
> >   in libbpf has been merged in bpf-next/master. [3]
> 
> And it seems it has been merged into the main repo [1]; I assume it will be part of
> 5.1, which I guess will be released mid May.
> 
> And we have a release on 10 May. Taking into account that the libbpf APIs are used
> extensively, does that mean we can't release af_xdp in 19.05?
> 
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=1cad078842396f0047a796694b6130fc096d97e2

I think the requirement is to have all dependencies upstream,
so we avoid releasing a feature that cannot work because a dependency is not ready.

If all is ready in the Linux mainline branch, I guess we are fine, aren't we?


* Re: [PATCH v1 0/6] Introduce AF_XDP PMD
  2019-03-11 17:19   ` Thomas Monjalon
@ 2019-03-12  1:51     ` Zhang, Qi Z
  2019-03-12  7:55       ` Karlsson, Magnus
  0 siblings, 1 reply; 214+ messages in thread
From: Zhang, Qi Z @ 2019-03-12  1:51 UTC (permalink / raw)
  To: Thomas Monjalon, Yigit, Ferruh, Ye, Xiaolong
  Cc: dev, Richardson, Bruce, Karlsson, Magnus, Topel, Bjorn

+ Magnus & Bjorn who can give more accurate comment about kernel upstream status.


> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Tuesday, March 12, 2019 1:20 AM
> To: Yigit, Ferruh <ferruh.yigit@intel.com>; Ye, Xiaolong <xiaolong.ye@intel.com>
> Cc: dev@dpdk.org; Zhang, Qi Z <qi.z.zhang@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v1 0/6] Introduce AF_XDP PMD
> 
> 11/03/2019 17:43, Ferruh Yigit:
> > On 3/1/2019 8:09 AM, Xiaolong Ye wrote:
> > > Overview
> > > ========
> > >
> > > This patchset adds a new PMD driver for AF_XDP which is a proposed
> > > faster version of AF_PACKET interface in Linux, see below links [1]
> > > [2] for details of AF_XDP introduction:
> > >
> > > AF_XDP roadmap
> > > ==============
> > > - AF_XDP is included in upstream kernel since 4.18, and AF_XDP support
> > >   in libbpf has been merged in bpf-next/master. [3]
> >
> > And it seems it has been merged into main repo [1], I assume it will
> > be part of 5.1, which I guess will be released mid May.
> >
> > And we have release on 10 May. Taking into account that libbpf APIs
> > used extensively, does it mean we can't release af_xdp on 19.05?
> >
> > [1]
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/com
> > mit?id=1cad078842396f0047a796694b6130fc096d97e2
> 
> I think the requirement is to have all dependencies upstream, so we avoid
> releasing a feature not working when dependency is ready.
> 
> If all is ready in Linux mainline branch, I guess we are fine, are we?
> 


* Re: [PATCH v1 0/6] Introduce AF_XDP PMD
  2019-03-12  1:51     ` Zhang, Qi Z
@ 2019-03-12  7:55       ` Karlsson, Magnus
  0 siblings, 0 replies; 214+ messages in thread
From: Karlsson, Magnus @ 2019-03-12  7:55 UTC (permalink / raw)
  To: Zhang, Qi Z, Thomas Monjalon, Yigit, Ferruh, Ye, Xiaolong
  Cc: dev, Richardson, Bruce, Topel, Bjorn



> -----Original Message-----
> From: Zhang, Qi Z
> Sent: Tuesday, March 12, 2019 2:52 AM
> To: Thomas Monjalon <thomas@monjalon.net>; Yigit, Ferruh
> <ferruh.yigit@intel.com>; Ye, Xiaolong <xiaolong.ye@intel.com>
> Cc: dev@dpdk.org; Richardson, Bruce <bruce.richardson@intel.com>;
> Karlsson, Magnus <magnus.karlsson@intel.com>; Topel, Bjorn
> <bjorn.topel@intel.com>
> Subject: RE: [dpdk-dev] [PATCH v1 0/6] Introduce AF_XDP PMD
> 
> + Magnus & Bjorn who can give more accurate comment about kernel
> upstream status.
> 
> 
> > -----Original Message-----
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > Sent: Tuesday, March 12, 2019 1:20 AM
> > To: Yigit, Ferruh <ferruh.yigit@intel.com>; Ye, Xiaolong
> > <xiaolong.ye@intel.com>
> > Cc: dev@dpdk.org; Zhang, Qi Z <qi.z.zhang@intel.com>; Richardson,
> > Bruce <bruce.richardson@intel.com>
> > Subject: Re: [dpdk-dev] [PATCH v1 0/6] Introduce AF_XDP PMD
> >
> > 11/03/2019 17:43, Ferruh Yigit:
> > > On 3/1/2019 8:09 AM, Xiaolong Ye wrote:
> > > > Overview
> > > > ========
> > > >
> > > > This patchset adds a new PMD driver for AF_XDP which is a proposed
> > > > faster version of AF_PACKET interface in Linux, see below links
> > > > [1] [2] for details of AF_XDP introduction:
> > > >
> > > > AF_XDP roadmap
> > > > ==============
> > > > - AF_XDP is included in upstream kernel since 4.18, and AF_XDP support
> > > >   in libbpf has been merged in bpf-next/master. [3]
> > >
> > > And it seems it has been merged into main repo [1], I assume it will
> > > be part of 5.1, which I guess will be released mid May.
> > >
> > > And we have release on 10 May. Taking into account that libbpf APIs
> > > used extensively, does it mean we can't release af_xdp on 19.05?
> > >
> > > [1]
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/c
> > > om
> > > mit?id=1cad078842396f0047a796694b6130fc096d97e2
> >
> > I think the requirement is to have all dependencies upstream, so we
> > avoid releasing a feature not working when dependency is ready.
> >
> > If all is ready in Linux mainline branch, I guess we are fine, are we?

Libbpf is in linux-next, which means it will be in the 5.1 release. This
release will likely be at the end of April if we count the normal 8-week
cadence of releases: 8 weeks means April 28, 9 weeks = May 5, 10 weeks =
May 12. So unless we go all the way to 10 weeks, this feature should be in
a stable release before DPDK 19.05.

/Magnus


* Re: [PATCH v1 6/6] app/testpmd: add mempool flags parameter
  2019-03-11 16:46   ` Ferruh Yigit
@ 2019-03-12 15:10     ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-12 15:10 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev, Qi Zhang

On 03/11, Ferruh Yigit wrote:
>On 3/1/2019 8:09 AM, Xiaolong Ye wrote:
>> When creating an rte_mempool, flags can now be parsed from the command line.
>> This makes it possible for testpmd to create an AF_XDP friendly
>> mempool (which enables zero copy).
>> 
>> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
>> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
>> ---
>>  app/test-pmd/parameters.c | 12 ++++++++++++
>>  app/test-pmd/testpmd.c    | 17 ++++++++++-------
>>  app/test-pmd/testpmd.h    |  1 +
>
>Can you please document new added command line option in
>doc/guides/testpmd_app_ug/run_app.rst ?

Got it, will add in next version.

Thanks,
Xiaolong
>


* Re: [PATCH v1 1/6] net/af_xdp: introduce AF_XDP PMD driver
  2019-03-11 16:20   ` Ferruh Yigit
@ 2019-03-12 15:54     ` Ye Xiaolong
  2019-03-13 10:54       ` Ferruh Yigit
  2019-03-17  3:35       ` Ye Xiaolong
  0 siblings, 2 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-12 15:54 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev, Qi Zhang

Hi, Ferruh

Thanks for your review.

On 03/11, Ferruh Yigit wrote:
>On 3/1/2019 8:09 AM, Xiaolong Ye wrote:
>> Add a new PMD driver for AF_XDP which is a proposed faster version of
>> AF_PACKET interface in Linux. More info about AF_XDP, please refer to [1]
>> [2].
>> 
>> This is the vanilla version PMD which just uses a raw buffer registered as
>> the umem.
>> 
>> [1] https://fosdem.org/2018/schedule/event/af_xdp/
>> [2] https://lwn.net/Articles/745934/
>> 
>> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
>> ---
>>  MAINTAINERS                                   |   6 +
>>  config/common_base                            |   5 +
>>  doc/guides/nics/af_xdp.rst                    |  43 +
>
>Can you please add the new .rst file to the index file, doc/guides/nics/index.rst?

Got it, will do in next version.

>
>>  doc/guides/rel_notes/release_18_11.rst        |   7 +
>
>Please switch to latest release notes.

My bad, will switch to 19.05.

>
>>  drivers/net/Makefile                          |   1 +
>>  drivers/net/af_xdp/Makefile                   |  31 +
>>  drivers/net/af_xdp/meson.build                |   7 +
>>  drivers/net/af_xdp/rte_eth_af_xdp.c           | 903 ++++++++++++++++++
>>  drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   4 +
>>  mk/rte.app.mk                                 |   1 +
>
>Can you please add .ini file too?

will do.

>
><...>
>
>> @@ -416,6 +416,11 @@ CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX_FREE=n
>>  #
>>  CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
>>  
>> +#
>> +# Compile software PMD backed by AF_XDP sockets (Linux only)
>> +#
>> +CONFIG_RTE_LIBRTE_PMD_AF_XDP=n
>> +
>
>Why is it not enabled in the linux config (config/common_linuxapp)? Is it because of
>the external library dependencies?

Yes, the af_xdp PMD depends on libbpf, which is not included in any Linux distribution yet.

>I guess there is a requirement on a specific Linux kernel version, can it be

libbpf should be included in the kernel 5.1 release.

>possible to detect it in the Makefile and enable/disable the config accordingly?
>

Ok, I'll investigate how to do it.
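
Something along these lines might work as a starting point -- a rough,
untested sketch (the version comparison is illustrative only, not what
the patch will necessarily end up doing):

    # sketch: only enable the PMD when the running kernel is new enough
    LINUX_VERSION := $(shell uname -r | cut -d. -f1-2)
    KERNEL_OK := $(shell printf '5.1\n%s\n' $(LINUX_VERSION) | sort -V -C && echo y)
    ifneq ($(KERNEL_OK),y)
    $(error the AF_XDP PMD requires a kernel >= v5.1-rc1)
    endif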

><...>
>
>> +Prerequisites
>> +-------------
>> +
>> +This is a Linux-specific PMD, thus the following prerequisites apply:
>> +
>> +*  A Linux Kernel with XDP sockets configuration enabled;
>
>Can you please give more details on which exact vanilla kernel version?

Do you mean I should write more details about AF_XDP in the kernel in this
introduction document?

>
>> +*  libbpf with latest af_xdp support installed
>
>Is there a specific version of libbpf for this?

I'm not aware of a specific version number for libbpf; it's part of the
Linux kernel source code.

>I can see in the makefile that libelf is also linked, is it a dependency?

libelf is a leftover from the RFC, will delete it in the next version.

>
><...>
>
>> @@ -0,0 +1,31 @@
>> +# SPDX-License-Identifier: BSD-3-Clause
>> +# Copyright(c) 2018 Intel Corporation
>> +
>> +include $(RTE_SDK)/mk/rte.vars.mk
>> +
>> +#
>> +# library name
>> +#
>> +LIB = librte_pmd_af_xdp.a
>> +
>> +EXPORT_MAP := rte_pmd_af_xdp_version.map
>> +
>> +LIBABIVER := 1
>> +
>> +
>> +CFLAGS += -O3
>> +# below line should be removed
>
>+1
>
>> +CFLAGS += -I/root/yexl/shared_mks0/linux/tools/include
>> +CFLAGS += -I/root/yexl/shared_mks0/linux/tools/lib/bpf
>> +
>> +CFLAGS += $(WERROR_FLAGS)
>> +LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
>> +LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
>> +LDLIBS += -lrte_bus_vdev
>
>Dependent libraries should be linked here.

Ok, I'll add libbpf here.

>
><...>
>
>> +
>> +#include <linux/if_ether.h>
>> +#include <linux/if_xdp.h>
>> +#include <linux/if_link.h>
>> +#include <asm/barrier.h>
>
>Getting a build error for this [1], can there be any include path param missing?
>
>[1]
>drivers/net/af_xdp/rte_eth_af_xdp.c:15:10: fatal error: asm/barrier.h: No such
>file or directory

Yes, it needs something like

CFLAGS += -I/root/yexl/shared_mks0/linux/tools/include

as in the Makefile above currently.

>
><...>
>
>> +static void
>> +eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
>> +{
>> +	struct pmd_internals *internals = dev->data->dev_private;
>> +
>> +	dev_info->if_index = internals->if_index;
>> +	dev_info->max_mac_addrs = 1;
>> +	dev_info->max_rx_pktlen = (uint32_t)ETH_FRAME_LEN;
>> +	dev_info->max_rx_queues = 1;
>> +	dev_info->max_tx_queues = 1;
>
>'ETH_AF_XDP_MAX_QUEUE_PAIRS' is '16' but you are forcing the max Rx/Tx queue
>number to be '1', intentional?

Yes, the current implementation is single queue only; we plan to support
multi-queue in the future.

>
>> +	dev_info->min_rx_bufsize = 0;
>> +
>> +	dev_info->default_rxportconf.nb_queues = 1;
>> +	dev_info->default_txportconf.nb_queues = 1;
>> +	dev_info->default_rxportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
>> +	dev_info->default_txportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
>> +}
>> +
>> +static int
>> +eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
>> +{
>> +	struct pmd_internals *internals = dev->data->dev_private;
>> +	struct xdp_statistics xdp_stats;
>> +	struct pkt_rx_queue *rxq;
>> +	socklen_t optlen;
>> +	int i;
>> +
>> +	optlen = sizeof(struct xdp_statistics);
>> +	for (i = 0; i < dev->data->nb_rx_queues; i++) {
>> +		rxq = &internals->rx_queues[i];
>> +		stats->q_ipackets[i] = internals->rx_queues[i].rx_pkts;
>> +		stats->q_ibytes[i] = internals->rx_queues[i].rx_bytes;
>> +
>> +		stats->q_opackets[i] = internals->tx_queues[i].tx_pkts;
>> +		stats->q_errors[i] = internals->tx_queues[i].err_pkts;
>
>There is a patch from David which points out that 'q_errors' is for Rx only:
>https://patches.dpdk.org/cover/50783/

Yes, I got the same comment from David, will change it accordingly.
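
Roughly like this, I think (a sketch of the planned change, not final code):

    /* q_errors[] is Rx-only per David's patch; accumulate Tx errors
     * into the port-level oerrors counter instead */
    stats->oerrors += internals->tx_queues[i].err_pkts;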

>
><...>
>
>> +static void xdp_umem_destroy(struct xsk_umem_info *umem)
>> +{
>> +	if (umem->buffer)
>> +		free(umem->buffer);
>> +
>> +	free(umem);
>
>Should we set freed pointers to 'null'?

will do.

>
>Should we free 'umem->buf_ring' before freeing 'umem'?

Good catch; will free buf_ring as well.

>
><...>
>
>> +static int
>> +eth_rx_queue_setup(struct rte_eth_dev *dev,
>> +		   uint16_t rx_queue_id,
>> +		   uint16_t nb_rx_desc,
>> +		   unsigned int socket_id __rte_unused,
>> +		   const struct rte_eth_rxconf *rx_conf __rte_unused,
>> +		   struct rte_mempool *mb_pool)
>> +{
>> +	struct pmd_internals *internals = dev->data->dev_private;
>> +	unsigned int buf_size, data_size;
>> +	struct pkt_rx_queue *rxq;
>> +	int ret = 0;
>> +
>> +	if (mb_pool == NULL) {
>> +		RTE_LOG(ERR, PMD,
>> +			"Invalid mb_pool\n");
>> +		ret = -EINVAL;
>> +		goto err;
>> +	}
>
>if 'mb_pool' is 'null', it will crash in 'rte_eth_rx_queue_setup()' before
>coming here, I think we can drop this check.

Agree, it's a redundant check, will remove.

>
>> +
>> +	if (dev->data->nb_rx_queues <= rx_queue_id) {
>> +		RTE_LOG(ERR, PMD,
>> +			"Invalid rx queue id: %d\n", rx_queue_id);
>> +		ret = -EINVAL;
>> +		goto err;
>> +	}
>
>This check is already done in 'rte_eth_rx_queue_setup()', so it shouldn't
>need to be done here.

will remove.
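
For reference, the ethdev layer already guards both cases before the PMD
callback runs; conceptually it is something like this (a sketch of the
pattern, not the exact rte_ethdev.c code):

    /* in rte_eth_rx_queue_setup(), before the PMD op is called */
    if (rx_queue_id >= dev->data->nb_rx_queues)
            return -EINVAL;
    if (mp == NULL)
            return -EINVAL;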

><...>
>
>> +static int
>> +eth_tx_queue_setup(struct rte_eth_dev *dev,
>> +		   uint16_t tx_queue_id,
>> +		   uint16_t nb_tx_desc,
>> +		   unsigned int socket_id __rte_unused,
>> +		   const struct rte_eth_txconf *tx_conf __rte_unused)
>> +{
>> +	struct pmd_internals *internals = dev->data->dev_private;
>> +	struct pkt_tx_queue *txq;
>> +
>> +	if (dev->data->nb_tx_queues <= tx_queue_id) {
>> +		RTE_LOG(ERR, PMD, "Invalid tx queue id: %d\n", tx_queue_id);
>> +		return -EINVAL;
>> +	}
>
>Can skip the check, same as above.

Got it.

>
>> +
>> +	RTE_LOG(WARNING, PMD, "tx queue setup size=%d will be skipped\n",
>> +		nb_tx_desc);
>
>Why will the setup be skipped?

A leftover from the RFC; will remove.

>
>> +	txq = &internals->tx_queues[tx_queue_id];
>> +
>> +	dev->data->tx_queues[tx_queue_id] = txq;
>> +	return 0;
>> +}
>> +
>> +static int
>> +eth_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
>> +{
>> +	struct pmd_internals *internals = dev->data->dev_private;
>> +	struct ifreq ifr = { .ifr_mtu = mtu };
>> +	int ret;
>> +	int s;
>> +
>> +	s = socket(PF_INET, SOCK_DGRAM, 0);
>> +	if (s < 0)
>> +		return -EINVAL;
>> +
>> +	snprintf(ifr.ifr_name, IFNAMSIZ, "%s", internals->if_name);
>
>Can you please prefer strlcpy?

Sure.
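
i.e., something like (sketch):

    strlcpy(ifr.ifr_name, internals->if_name, IFNAMSIZ);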

>
>> +	ret = ioctl(s, SIOCSIFMTU, &ifr);
>> +	close(s);
>> +
>> +	if (ret < 0)
>> +		return -EINVAL;
>> +
>> +	return 0;
>> +}
>> +
>> +static void
>> +eth_dev_change_flags(char *if_name, uint32_t flags, uint32_t mask)
>> +{
>> +	struct ifreq ifr;
>> +	int s;
>> +
>> +	s = socket(PF_INET, SOCK_DGRAM, 0);
>> +	if (s < 0)
>> +		return;
>> +
>> +	snprintf(ifr.ifr_name, IFNAMSIZ, "%s", if_name);
>
>Can you please prefer strlcpy?

Sure.

>
><...>
>
>> +
>> +static struct rte_vdev_driver pmd_af_xdp_drv;
>
>Do we need this forward declaration?

Part of the af_xdp PMD refers to the af_packet PMD; this is a simple copy from
that driver, and as you point out, it should be unnecessary, will remove it.

>
>> +
>> +static void
>> +parse_parameters(struct rte_kvargs *kvlist,
>> +		 char **if_name,
>> +		 int *queue_idx)
>> +{
>> +	struct rte_kvargs_pair *pair = NULL;
>> +	unsigned int k_idx;
>> +
>> +	for (k_idx = 0; k_idx < kvlist->count; k_idx++) {
>> +		pair = &kvlist->pairs[k_idx];
>> +		if (strstr(pair->key, ETH_AF_XDP_IFACE_ARG))
>
>It is better to use 'rte_kvargs_process()' instead of accessing the 'kvargs'
>internals.

Will do.
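
The rte_kvargs_process() version would look roughly like this (a sketch;
the handler name is illustrative, and if_name is a char[IFNAMSIZ] buffer):

    static int
    parse_iface_arg(const char *key __rte_unused, const char *value,
                    void *extra_args)
    {
            char *if_name = extra_args;

            /* copy the value out of the kvlist instead of keeping
             * a pointer into its internals */
            strlcpy(if_name, value, IFNAMSIZ);
            return 0;
    }

    ...
    if (rte_kvargs_process(kvlist, ETH_AF_XDP_IFACE_ARG,
                           &parse_iface_arg, if_name) < 0)
            return -EINVAL;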

>
>> +			*if_name = pair->value;
>> +		else if (strstr(pair->key, ETH_AF_XDP_QUEUE_IDX_ARG))
>> +			*queue_idx = atoi(pair->value);
>> +	}
>> +}
>> +
>> +static int
>> +get_iface_info(const char *if_name,
>> +	       struct ether_addr *eth_addr,
>> +	       int *if_index)
>> +{
>> +	struct ifreq ifr;
>> +	int sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_IP);
>> +
>> +	if (sock < 0)
>> +		return -1;
>> +
>> +	strcpy(ifr.ifr_name, if_name);
>
>Please prefer strlcpy.

Sure.

>
>> +	if (ioctl(sock, SIOCGIFINDEX, &ifr))
>> +		goto error;
>> +
>> +	if (ioctl(sock, SIOCGIFHWADDR, &ifr))
>> +		goto error;
>> +
>> +	memcpy(eth_addr, ifr.ifr_hwaddr.sa_data, 6);
>
>Can use 'ether_addr_copy()'

Got it.
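
i.e., roughly (sketch):

    ether_addr_copy((const struct ether_addr *)ifr.ifr_hwaddr.sa_data,
                    eth_addr);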

>
>> +
>> +	close(sock);
>> +	*if_index = if_nametoindex(if_name);
>> +	return 0;
>> +
>> +error:
>> +	close(sock);
>> +	return -1;
>> +}
>> +
>> +static int
>> +init_internals(struct rte_vdev_device *dev,
>> +	       const char *if_name,
>> +	       int queue_idx)
>> +{
>> +	const char *name = rte_vdev_device_name(dev);
>> +	struct rte_eth_dev *eth_dev = NULL;
>> +	const unsigned int numa_node = dev->device.numa_node;
>> +	struct pmd_internals *internals = NULL;
>> +	int ret;
>> +	int i;
>> +
>> +	internals = rte_zmalloc_socket(name, sizeof(*internals), 0, numa_node);
>> +	if (!internals)
>> +		return -ENOMEM;
>> +
>> +	internals->queue_idx = queue_idx;
>> +	strcpy(internals->if_name, if_name);
>
>prefer 'strlcpy' please

Got it.

>
>> +
>> +	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
>> +		internals->tx_queues[i].pair = &internals->rx_queues[i];
>> +		internals->rx_queues[i].pair = &internals->tx_queues[i];
>> +	}
>> +
>> +	ret = get_iface_info(if_name, &internals->eth_addr,
>> +			     &internals->if_index);
>> +	if (ret)
>> +		goto err;
>> +
>> +	eth_dev = rte_eth_vdev_allocate(dev, 0);
>> +	if (!eth_dev)
>> +		goto err;
>> +
>> +	eth_dev->data->dev_private = internals;
>> +	eth_dev->data->dev_link = pmd_link;
>> +	eth_dev->data->mac_addrs = &internals->eth_addr;
>> +	eth_dev->dev_ops = &ops;
>> +	eth_dev->rx_pkt_burst = eth_af_xdp_rx;
>> +	eth_dev->tx_pkt_burst = eth_af_xdp_tx;
>> +
>> +	rte_eth_dev_probing_finish(eth_dev);
>
>What do you think about moving this call into the 'rte_pmd_af_xdp_probe'
>function if 'init_internals' returns success, instead of setting it here?

Sounds better, will do.

>
>> +	return 0;
>> +
>> +err:
>> +	rte_free(internals);
>> +	return -1;
>> +}
>> +
>> +static int
>> +rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
>> +{
>> +	struct rte_kvargs *kvlist;
>> +	char *if_name = NULL;
>> +	int queue_idx = ETH_AF_XDP_DFLT_QUEUE_IDX;
>
>This 'queue_idx' is the interface queue passed to the xsk_* API, and we also
>have a variable named 'queue_idx' that we use for the DPDK queue index; they
>get confused easily. What do you think about renaming this one to something
>like 'xsk_queue_idx'?

Agree, xsk_queue_idx is a better name; will adopt it.

>
>> +	struct rte_eth_dev *eth_dev;
>> +	const char *name;
>> +	int ret;
>> +
>> +	RTE_LOG(INFO, PMD, "Initializing pmd_af_packet for %s\n",
>> +		rte_vdev_device_name(dev));
>> +
>> +	name = rte_vdev_device_name(dev);
>> +	if (rte_eal_process_type() == RTE_PROC_SECONDARY &&
>> +		strlen(rte_vdev_device_args(dev)) == 0) {
>> +		eth_dev = rte_eth_dev_attach_secondary(name);
>> +		if (!eth_dev) {
>> +			RTE_LOG(ERR, PMD, "Failed to probe %s\n", name);
>> +			return -EINVAL;
>> +		}
>> +		eth_dev->dev_ops = &ops;
>> +		rte_eth_dev_probing_finish(eth_dev);
>> +	}
>> +
>> +	kvlist = rte_kvargs_parse(rte_vdev_device_args(dev), valid_arguments);
>> +	if (!kvlist) {
>> +		RTE_LOG(ERR, PMD,
>> +			"Invalid kvargs\n");
>
>No need to break the line.

Got it.

>
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (dev->device.numa_node == SOCKET_ID_ANY)
>> +		dev->device.numa_node = rte_socket_id();
>> +
>> +	parse_parameters(kvlist, &if_name,
>> +			 &queue_idx);
>
>Same, no need to break the line.

Got it.

>
>> +
>> +	ret = init_internals(dev, if_name, queue_idx);
>> +
>> +	rte_kvargs_free(kvlist);
>> +
>> +	return ret;
>> +}
>> +
>> +static int
>> +rte_pmd_af_xdp_remove(struct rte_vdev_device *dev)
>> +{
>> +	struct rte_eth_dev *eth_dev = NULL;
>> +	struct pmd_internals *internals;
>> +
>> +	RTE_LOG(INFO, PMD, "Removing AF_XDP ethdev on numa socket %u\n",
>> +		rte_socket_id());
>> +
>> +	if (!dev)
>> +		return -1;
>> +
>> +	/* find the ethdev entry */
>> +	eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(dev));
>> +	if (!eth_dev)
>> +		return -1;
>> +
>> +	internals = eth_dev->data->dev_private;
>> +
>> +	rte_ring_free(internals->umem->buf_ring);
>> +	rte_free(internals->umem->buffer);
>> +	rte_free(internals->umem);
>> +	rte_free(internals);
>> +
>> +	rte_eth_dev_release_port(eth_dev);
>
>This frees 'eth_dev->data->dev_private' (internals); it can be a problem to
>try to free the same pointer twice.

Thanks for pointing that out; I will remove the `rte_free(internals)` call to avoid a double free.
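
The remove path will then just free what internals owns and release the
port, roughly (sketch):

    rte_ring_free(internals->umem->buf_ring);
    rte_free(internals->umem->buffer);
    rte_free(internals->umem);

    /* rte_eth_dev_release_port() frees dev_private (internals) itself */
    rte_eth_dev_release_port(eth_dev);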

>
>> +
>> +
>> +	return 0;
>> +}
>> +
>> +static struct rte_vdev_driver pmd_af_xdp_drv = {
>> +	.probe = rte_pmd_af_xdp_probe,
>> +	.remove = rte_pmd_af_xdp_remove,
>> +};
>> +
>> +RTE_PMD_REGISTER_VDEV(net_af_xdp, pmd_af_xdp_drv);
>> +RTE_PMD_REGISTER_ALIAS(net_af_xdp, eth_af_xdp);
>
>No need to create the alias, it is for backward compatibility.

Got it.

>
>> +RTE_PMD_REGISTER_PARAM_STRING(net_af_xdp,
>> +			      "iface=<string> "
>> +			      "queue=<int> ");
>> diff --git a/drivers/net/af_xdp/rte_pmd_af_xdp_version.map b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
>> new file mode 100644
>> index 000000000..ef3539840
>> --- /dev/null
>> +++ b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
>> @@ -0,0 +1,4 @@
>> +DPDK_2.0 {
>
>Use release version please.

Got it.

>
>> +
>> +	local: *;
>> +};
>> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
>> index d0ab942d5..db3271c7b 100644
>> --- a/mk/rte.app.mk
>> +++ b/mk/rte.app.mk
>> @@ -143,6 +143,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_DPAA2_MEMPOOL)  += -lrte_mempool_dpaa2
>>  endif
>>  
>>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
>> +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     += -lrte_pmd_af_xdp -lelf -lbpf
>
>For which API is libelf required?

It is a leftover and will be removed.

Thanks,
Xiaolong
>

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v1 1/6] net/af_xdp: introduce AF_XDP PMD driver
  2019-03-12 15:54     ` Ye Xiaolong
@ 2019-03-13 10:54       ` Ferruh Yigit
  2019-03-13 11:12         ` Ye Xiaolong
  2019-03-17  3:35       ` Ye Xiaolong
  1 sibling, 1 reply; 214+ messages in thread
From: Ferruh Yigit @ 2019-03-13 10:54 UTC (permalink / raw)
  To: Ye Xiaolong; +Cc: dev, Qi Zhang


>> <...>
>>
>>> +Prerequisites
>>> +-------------
>>> +
>>> +This is a Linux-specific PMD, thus the following prerequisites apply:
>>> +
>>> +*  A Linux Kernel with XDP sockets configuration enabled;
>>
>> Can you please give more details on which exact vanilla kernel version?
> 
> Do you mean I should write more details about AF_XDP in the kernel in this
> introduction document?

I think it is good to document the exact version information instead of saying
"Linux Kernel with af_xdp".

>>> +*  libbpf with latest af_xdp support installed
>>
>> Is there a specific version of libbpf for this?
> 
> I'm not aware of a specific version number for libbpf; it's part of the
> Linux kernel source code.

If it is coming with Linux kernel, which version of Linux kernel?

>> <...>
>>
>>> +
>>> +#include <linux/if_ether.h>
>>> +#include <linux/if_xdp.h>
>>> +#include <linux/if_link.h>
>>> +#include <asm/barrier.h>
>>
>> Getting a build error for this [1], can there be any include path param missing?
>>
>> [1]
>> drivers/net/af_xdp/rte_eth_af_xdp.c:15:10: fatal error: asm/barrier.h: No such
>> file or directory
> 
> Yes, it needs something like
> 
> CFLAGS += -I/root/yexl/shared_mks0/linux/tools/include
> 
> as in the Makefile above currently.

I see, assuming you will be booting up with that kernel, can something like
below work:

CFLAGS += -I/lib/modules/$(shell uname -r)/build/tools/include/

>> <...>
>>
>>> +static void
>>> +eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
>>> +{
>>> +	struct pmd_internals *internals = dev->data->dev_private;
>>> +
>>> +	dev_info->if_index = internals->if_index;
>>> +	dev_info->max_mac_addrs = 1;
>>> +	dev_info->max_rx_pktlen = (uint32_t)ETH_FRAME_LEN;
>>> +	dev_info->max_rx_queues = 1;
>>> +	dev_info->max_tx_queues = 1;
>>
>> 'ETH_AF_XDP_MAX_QUEUE_PAIRS' is '16' but you are forcing the max Rx/Tx queue
>> number to be '1', intentional?
> 
> Yes, the current implementation is single queue only; we plan to support
> multi-queue in the future.

OK, can you please document this information?

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v1 1/6] net/af_xdp: introduce AF_XDP PMD driver
  2019-03-13 10:54       ` Ferruh Yigit
@ 2019-03-13 11:12         ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-13 11:12 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev, Qi Zhang

On 03/13, Ferruh Yigit wrote:
>
>>> <...>
>>>
>>>> +Prerequisites
>>>> +-------------
>>>> +
>>>> +This is a Linux-specific PMD, thus the following prerequisites apply:
>>>> +
>>>> +*  A Linux Kernel with XDP sockets configuration enabled;
>>>
>>> Can you please give more details on which exact vanilla kernel version?
>> 
>> Do you mean I should write more details about AF_XDP in the kernel in this
>> introduction document?
>
>I think it is good to document the exact version information instead of saying
>"Linux Kernel with af_xdp".

I get your point now; will add the exact kernel info.

>
>>>> +*  libbpf with latest af_xdp support installed
>>>
>>> Is there a specific version of libbpf for this?
>> 
>> I'm not aware of a specific version number for libbpf; it's part of the
>> Linux kernel source code.
>
>If it is coming with Linux kernel, which version of Linux kernel?

Will add the kernel version info.

>
>>> <...>
>>>
>>>> +
>>>> +#include <linux/if_ether.h>
>>>> +#include <linux/if_xdp.h>
>>>> +#include <linux/if_link.h>
>>>> +#include <asm/barrier.h>
>>>
>>> Getting a build error for this [1], can there be any include path param missing?
>>>
>>> [1]
>>> drivers/net/af_xdp/rte_eth_af_xdp.c:15:10: fatal error: asm/barrier.h: No such
>>> file or directory
>> 
>> Yes, it needs something like
>> 
>> CFLAGS += -I/root/yexl/shared_mks0/linux/tools/include
>> 
>> as in the Makefile above currently.
>
>I see, assuming you will be booting up with that kernel, can something like
>below work:
>
>CFLAGS += -I/lib/modules/$(shell uname -r)/build/tools/include/
>

I'll give it a try.

>>> <...>
>>>
>>>> +static void
>>>> +eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
>>>> +{
>>>> +	struct pmd_internals *internals = dev->data->dev_private;
>>>> +
>>>> +	dev_info->if_index = internals->if_index;
>>>> +	dev_info->max_mac_addrs = 1;
>>>> +	dev_info->max_rx_pktlen = (uint32_t)ETH_FRAME_LEN;
>>>> +	dev_info->max_rx_queues = 1;
>>>> +	dev_info->max_tx_queues = 1;
>>>
>>> 'ETH_AF_XDP_MAX_QUEUE_PAIRS' is '16' but you are forcing the max Rx/Tx queue
>>> number to be '1', intentional?
>> 
>> Yes, the current implementation is single queue only; we plan to support
>> multi-queue in the future.
>
>OK, can you please document this information?

Sure.

Thanks,
Xiaolong
>

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v1 1/6] net/af_xdp: introduce AF_XDP PMD driver
  2019-03-02  8:14     ` Ye Xiaolong
@ 2019-03-17  3:34       ` Ye Xiaolong
  2019-03-24 12:07         ` Luca Boccassi
  0 siblings, 1 reply; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-17  3:34 UTC (permalink / raw)
  To: Luca Boccassi; +Cc: dev, Qi Zhang

On 03/02, Ye Xiaolong wrote:
>>>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
>>> +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     += -lrte_pmd_af_xdp
>>> -lelf -lbpf
>>
>>Are symbols from libelf being used by the PMD?
>
>Hmm, it is a leftover from the RFC; libelf is no longer needed in this version,
>will remove it in the next version.
>

Correction, libelf is needed for libbpf, so we still need to keep it. 

Thanks,
Xiaolong
>Thanks,
>Xiaolong
>>
>>>  _LDLIBS-$(CONFIG_RTE_LIBRTE_ARK_PMD)        += -lrte_pmd_ark
>>>  _LDLIBS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD)   += -lrte_pmd_atlantic
>>>  _LDLIBS-$(CONFIG_RTE_LIBRTE_AVF_PMD)        += -lrte_pmd_avf
>>
>>-- 
>>Kind regards,
>>Luca Boccassi

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v1 1/6] net/af_xdp: introduce AF_XDP PMD driver
  2019-03-12 15:54     ` Ye Xiaolong
  2019-03-13 10:54       ` Ferruh Yigit
@ 2019-03-17  3:35       ` Ye Xiaolong
  1 sibling, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-17  3:35 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev, Qi Zhang

On 03/12, Ye Xiaolong wrote:
>>I can see in the makefile that libelf is also linked, is it a dependency?
>
>libelf is a leftover from the RFC, will delete it in the next version.

Correction, libbpf depends on libelf, so I still need to keep it.

Thanks,
Xiaolong

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 0/6] Introduce AF_XDP PMD
  2019-03-01  8:09 [PATCH v1 0/6] Introduce AF_XDP PMD Xiaolong Ye
                   ` (6 preceding siblings ...)
  2019-03-11 16:43 ` [PATCH v1 0/6] Introduce AF_XDP PMD Ferruh Yigit
@ 2019-03-19  7:12 ` Xiaolong Ye
  2019-03-19  7:12   ` [PATCH v2 1/6] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
                     ` (5 more replies)
  2019-03-21  9:18 ` [PATCH v3 0/5] Introduce AF_XDP PMD Xiaolong Ye
                   ` (8 subsequent siblings)
  16 siblings, 6 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-19  7:12 UTC (permalink / raw)
  To: dev; +Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Xiaolong Ye

Overview
========

This patchset adds a new PMD driver for AF_XDP, which is a proposed
faster version of the AF_PACKET interface in Linux; see links [1] [2] below
for an introduction to AF_XDP.

AF_XDP roadmap
==============
- AF_XDP has been included in the upstream kernel since 4.18, and AF_XDP
  support in libbpf was merged in v5.1-rc1.
- The i40e and ixgbe drivers now support zero copy mode.

Change logs
===========

V2:
- Fix a NULL pointer dereference crash issue
- Fix an issue where txonly stops sending traffic in zc mode
- Refactor rte_mbuf.c to avoid ABI breakage.
- Fix multiple style issues pointed out by Ferruh, David, Stephen, Luca.

Changes vs. the RFC sent by Qi last Aug:

- Reworked based on AF_XDP's interface changes, since the new libbpf
  provides higher-level APIs that hide many of the details of the AF_XDP
  uapi. The rework reduces the code by 300+ lines.

- Multi-queue is not supported because the current libbpf doesn't support
  multiple sockets on a single umem.

- No extra steps are needed to load an XDP program manually, since the
  current libbpf loads a default XDP program when the user calls
  xsk_socket__create. The userspace application only needs to handle the
  cleanup.

How to try
==========

1. take a kernel >= v5.1-rc1, build it, and replace your host kernel
   with it.
   
   make sure you turn on XDP sockets when compiling

   Networking support -->
        Networking options -->
                [ * ] XDP sockets
   
2. build libbpf in tools/lib/bpf, and copy the libbpf.a and libbpf.so to /usr/lib64

   cd tools/lib/bpf
   make

3. ethtool -L enp59s0f0 combined 1

4. start testpmd

   ./build/app/testpmd -c 0xc -n 4 --vdev eth_af_xdp,iface=enp59s0f0,queue=0 -- -i --rxq=1 --txq=1

    in this case, the default XDP program will be loaded and attached to queue 0
    of enp59s0f0; network traffic arriving at queue 0 will be redirected to the
    AF_XDP socket.

[1] https://lwn.net/Articles/750845/
[2] https://fosdem.org/2018/schedule/event/af_xdp/

Xiaolong Ye (6):
  net/af_xdp: introduce AF XDP PMD driver
  lib/mbuf: introduce helper to create mempool with flags
  lib/mempool: allow page size aligned mempool
  net/af_xdp: use mbuf mempool for buffer management
  net/af_xdp: enable zero copy
  app/testpmd: add mempool flags parameter

 MAINTAINERS                                   |    6 +
 app/test-pmd/parameters.c                     |   12 +
 app/test-pmd/testpmd.c                        |   17 +-
 app/test-pmd/testpmd.h                        |    1 +
 config/common_base                            |    5 +
 config/common_linux                           |    1 +
 doc/guides/nics/af_xdp.rst                    |   45 +
 doc/guides/nics/features/af_xdp.ini           |   11 +
 doc/guides/nics/index.rst                     |    1 +
 doc/guides/rel_notes/release_19_05.rst        |    7 +
 doc/guides/testpmd_app_ug/run_app.rst         |    4 +
 drivers/net/Makefile                          |    1 +
 drivers/net/af_xdp/Makefile                   |   33 +
 drivers/net/af_xdp/meson.build                |   21 +
 drivers/net/af_xdp/rte_eth_af_xdp.c           | 1013 +++++++++++++++++
 drivers/net/af_xdp/rte_pmd_af_xdp_version.map |    3 +
 drivers/net/meson.build                       |    1 +
 lib/librte_mbuf/rte_mbuf.c                    |   29 +-
 lib/librte_mbuf/rte_mbuf.h                    |   45 +
 lib/librte_mempool/rte_mempool.c              |    3 +
 lib/librte_mempool/rte_mempool.h              |    1 +
 mk/rte.app.mk                                 |    1 +
 22 files changed, 1249 insertions(+), 12 deletions(-)
 create mode 100644 doc/guides/nics/af_xdp.rst
 create mode 100644 doc/guides/nics/features/af_xdp.ini
 create mode 100644 drivers/net/af_xdp/Makefile
 create mode 100644 drivers/net/af_xdp/meson.build
 create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
 create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map

-- 
2.17.1

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 1/6] net/af_xdp: introduce AF XDP PMD driver
  2019-03-19  7:12 ` [PATCH v2 " Xiaolong Ye
@ 2019-03-19  7:12   ` Xiaolong Ye
  2019-03-19  9:07     ` Mattias Rönnblom
                       ` (3 more replies)
  2019-03-19  7:12   ` [PATCH v2 2/6] lib/mbuf: introduce helper to create mempool with flags Xiaolong Ye
                     ` (4 subsequent siblings)
  5 siblings, 4 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-19  7:12 UTC (permalink / raw)
  To: dev; +Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Xiaolong Ye

Add a new PMD driver for AF_XDP, which is a proposed faster version of
the AF_PACKET interface in Linux. For more info about AF_XDP, please refer
to [1] [2].

This is the vanilla version of the PMD, which just uses a raw buffer
registered as the umem.

[1] https://fosdem.org/2018/schedule/event/af_xdp/
[2] https://lwn.net/Articles/745934/

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 MAINTAINERS                                   |   6 +
 config/common_base                            |   5 +
 config/common_linux                           |   1 +
 doc/guides/nics/af_xdp.rst                    |  45 +
 doc/guides/nics/features/af_xdp.ini           |  11 +
 doc/guides/nics/index.rst                     |   1 +
 doc/guides/rel_notes/release_19_05.rst        |   7 +
 drivers/net/Makefile                          |   1 +
 drivers/net/af_xdp/Makefile                   |  33 +
 drivers/net/af_xdp/meson.build                |  21 +
 drivers/net/af_xdp/rte_eth_af_xdp.c           | 930 ++++++++++++++++++
 drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   3 +
 drivers/net/meson.build                       |   1 +
 mk/rte.app.mk                                 |   1 +
 14 files changed, 1066 insertions(+)
 create mode 100644 doc/guides/nics/af_xdp.rst
 create mode 100644 doc/guides/nics/features/af_xdp.ini
 create mode 100644 drivers/net/af_xdp/Makefile
 create mode 100644 drivers/net/af_xdp/meson.build
 create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
 create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 452b8eb82..1cc54b439 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -468,6 +468,12 @@ M: John W. Linville <linville@tuxdriver.com>
 F: drivers/net/af_packet/
 F: doc/guides/nics/features/afpacket.ini
 
+Linux AF_XDP
+M: Xiaolong Ye <xiaolong.ye@intel.com>
+M: Qi Zhang <qi.z.zhang@intel.com>
+F: drivers/net/af_xdp/
+F: doc/guides/nics/features/af_xdp.ini
+
 Amazon ENA
 M: Marcin Wojtas <mw@semihalf.com>
 M: Michal Krawczyk <mk@semihalf.com>
diff --git a/config/common_base b/config/common_base
index 0b09a9348..4044de205 100644
--- a/config/common_base
+++ b/config/common_base
@@ -416,6 +416,11 @@ CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX_FREE=n
 #
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
 
+#
+# Compile software PMD backed by AF_XDP sockets (Linux only)
+#
+CONFIG_RTE_LIBRTE_PMD_AF_XDP=n
+
 #
 # Compile link bonding PMD library
 #
diff --git a/config/common_linux b/config/common_linux
index 75334273d..0b1249da0 100644
--- a/config/common_linux
+++ b/config/common_linux
@@ -19,6 +19,7 @@ CONFIG_RTE_LIBRTE_VHOST_POSTCOPY=n
 CONFIG_RTE_LIBRTE_PMD_VHOST=y
 CONFIG_RTE_LIBRTE_IFC_PMD=y
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
+CONFIG_RTE_LIBRTE_PMD_AF_XDP=y
 CONFIG_RTE_LIBRTE_PMD_SOFTNIC=y
 CONFIG_RTE_LIBRTE_PMD_TAP=y
 CONFIG_RTE_LIBRTE_AVP_PMD=y
diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
new file mode 100644
index 000000000..dd5654dd1
--- /dev/null
+++ b/doc/guides/nics/af_xdp.rst
@@ -0,0 +1,45 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2018 Intel Corporation.
+
+AF_XDP Poll Mode Driver
+==========================
+
+AF_XDP is an address family that is optimized for high performance
+packet processing. AF_XDP sockets make it possible for an XDP program to
+redirect packets to a memory buffer in userspace.
+
+For the full details behind AF_XDP socket, you can refer to
+`AF_XDP documentation in the Kernel
+<https://www.kernel.org/doc/Documentation/networking/af_xdp.rst>`_.
+
+This Linux-specific PMD driver creates the AF_XDP socket and binds it to a
+specific netdev queue; it allows a DPDK application to send and receive raw
+packets through the socket, bypassing the kernel network stack.
+The current implementation only supports a single queue; multi-queue support
+will be added later.
+
+Options
+-------
+
+The following options can be provided to set up an af_xdp port in DPDK.
+
+*   ``iface`` - name of the Kernel interface to attach to (required);
+*   ``queue`` - netdev queue id (optional, default 0);
+
+Prerequisites
+-------------
+
+This is a Linux-specific PMD, thus the following prerequisites apply:
+
+*  A Linux Kernel (version >= 4.18) with XDP sockets configuration enabled;
+*  libbpf (within kernel version >= 5.1) with latest af_xdp support installed
+*  A Kernel bound interface to attach to.
+
+Set up an af_xdp interface
+-----------------------------
+
+The following example will set up an af_xdp interface in DPDK:
+
+.. code-block:: console
+
+    --vdev eth_af_xdp,iface=ens786f1,queue=0
diff --git a/doc/guides/nics/features/af_xdp.ini b/doc/guides/nics/features/af_xdp.ini
new file mode 100644
index 000000000..7b8fcce00
--- /dev/null
+++ b/doc/guides/nics/features/af_xdp.ini
@@ -0,0 +1,11 @@
+;
+; Supported features of the 'af_xdp' network poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Link status          = Y
+MTU update           = Y
+Promiscuous mode     = Y
+Stats per queue      = Y
+x86-64               = Y
\ No newline at end of file
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 5c80e3baa..a4b80a3d0 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -12,6 +12,7 @@ Network Interface Controller Drivers
     features
     build_and_test
     af_packet
+    af_xdp
     ark
     atlantic
     avp
diff --git a/doc/guides/rel_notes/release_19_05.rst b/doc/guides/rel_notes/release_19_05.rst
index 61a2c7383..062facf89 100644
--- a/doc/guides/rel_notes/release_19_05.rst
+++ b/doc/guides/rel_notes/release_19_05.rst
@@ -65,6 +65,13 @@ New Features
     process.
   * Added support for Rx packet types list in a secondary process.
 
+* **Added the AF_XDP PMD.**
+
+  Added a Linux-specific PMD driver for AF_XDP. It creates an AF_XDP socket
+  bound to a specific netdev queue and allows a DPDK application to send
+  and receive raw packets through the socket, bypassing the kernel
+  network stack to achieve high performance packet processing.
+
 * **Updated Mellanox drivers.**
 
    New features and improvements were done in mlx4 and mlx5 PMDs:
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 502869a87..5d401b8c5 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -9,6 +9,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD),d)
 endif
 
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += af_packet
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += af_xdp
 DIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark
 DIRS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD) += atlantic
 DIRS-$(CONFIG_RTE_LIBRTE_AVP_PMD) += avp
diff --git a/drivers/net/af_xdp/Makefile b/drivers/net/af_xdp/Makefile
new file mode 100644
index 000000000..6cf0ed7db
--- /dev/null
+++ b/drivers/net/af_xdp/Makefile
@@ -0,0 +1,33 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_af_xdp.a
+
+EXPORT_MAP := rte_pmd_af_xdp_version.map
+
+LIBABIVER := 1
+
+CFLAGS += -O3
+
+# require kernel version >= v5.1-rc1
+LINUX_VERSION := $(shell uname -r)
+CFLAGS += -I/lib/modules/$(LINUX_VERSION)/build/tools/include
+CFLAGS += -I/lib/modules/$(LINUX_VERSION)/build/tools/lib/bpf
+
+CFLAGS += $(WERROR_FLAGS)
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
+LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
+LDLIBS += -lrte_bus_vdev
+LDLIBS += -lbpf
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += rte_eth_af_xdp.c
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
new file mode 100644
index 000000000..635e67483
--- /dev/null
+++ b/drivers/net/af_xdp/meson.build
@@ -0,0 +1,21 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+if host_machine.system() != 'linux'
+	build = false
+endif
+
+bpf_dep = dependency('libbpf', required: false)
+if bpf_dep.found()
+	build = true
+else
+	bpf_dep = cc.find_library('libbpf', required: false)
+	if bpf_dep.found() and cc.has_header('xsk.h', dependencies: bpf_dep)
+		build = true
+		pkgconfig_extra_libs += '-lbpf'
+	else
+		build = false
+	endif
+endif
+sources = files('rte_eth_af_xdp.c')
+ext_deps += bpf_dep
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
new file mode 100644
index 000000000..96dedc0c4
--- /dev/null
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -0,0 +1,930 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation.
+ */
+
+#include <rte_mbuf.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+#include <rte_string_fns.h>
+
+#include <linux/if_ether.h>
+#include <linux/if_xdp.h>
+#include <linux/if_link.h>
+#include <asm/barrier.h>
+#include <arpa/inet.h>
+#include <net/if.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <unistd.h>
+#include <poll.h>
+#include <bpf/bpf.h>
+#include <xsk.h>
+
+#define RTE_LOGTYPE_AF_XDP RTE_LOGTYPE_USER1
+#ifndef SOL_XDP
+#define SOL_XDP 283
+#endif
+
+#ifndef AF_XDP
+#define AF_XDP 44
+#endif
+
+#ifndef PF_XDP
+#define PF_XDP AF_XDP
+#endif
+
+#define ETH_AF_XDP_IFACE_ARG			"iface"
+#define ETH_AF_XDP_QUEUE_IDX_ARG		"queue"
+
+#define ETH_AF_XDP_FRAME_SIZE		XSK_UMEM__DEFAULT_FRAME_SIZE
+#define ETH_AF_XDP_NUM_BUFFERS		4096
+#define ETH_AF_XDP_DATA_HEADROOM	0
+#define ETH_AF_XDP_DFLT_NUM_DESCS	XSK_RING_CONS__DEFAULT_NUM_DESCS
+#define ETH_AF_XDP_DFLT_QUEUE_IDX	0
+
+#define ETH_AF_XDP_RX_BATCH_SIZE	32
+#define ETH_AF_XDP_TX_BATCH_SIZE	32
+
+#define ETH_AF_XDP_MAX_QUEUE_PAIRS     16
+
+struct xsk_umem_info {
+	struct xsk_ring_prod fq;
+	struct xsk_ring_cons cq;
+	struct xsk_umem *umem;
+	struct rte_ring *buf_ring;
+	void *buffer;
+};
+
+struct pkt_rx_queue {
+	struct xsk_ring_cons rx;
+	struct xsk_umem_info *umem;
+	struct xsk_socket *xsk;
+	struct rte_mempool *mb_pool;
+
+	uint64_t rx_pkts;
+	uint64_t rx_bytes;
+	uint64_t rx_dropped;
+
+	struct pkt_tx_queue *pair;
+	uint16_t queue_idx;
+};
+
+struct pkt_tx_queue {
+	struct xsk_ring_prod tx;
+
+	uint64_t tx_pkts;
+	uint64_t err_pkts;
+	uint64_t tx_bytes;
+
+	struct pkt_rx_queue *pair;
+	uint16_t queue_idx;
+};
+
+struct pmd_internals {
+	int if_index;
+	char if_name[IFNAMSIZ];
+	uint16_t queue_idx;
+	struct ether_addr eth_addr;
+	struct xsk_umem_info *umem;
+	struct rte_mempool *mb_pool_share;
+
+	struct pkt_rx_queue rx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
+	struct pkt_tx_queue tx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
+};
+
+static const char * const valid_arguments[] = {
+	ETH_AF_XDP_IFACE_ARG,
+	ETH_AF_XDP_QUEUE_IDX_ARG,
+	NULL
+};
+
+static struct rte_eth_link pmd_link = {
+	.link_speed = ETH_SPEED_NUM_10G,
+	.link_duplex = ETH_LINK_FULL_DUPLEX,
+	.link_status = ETH_LINK_DOWN,
+	.link_autoneg = ETH_LINK_AUTONEG
+};
+
+static inline int
+reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
+{
+	struct xsk_ring_prod *fq = &umem->fq;
+	uint32_t idx;
+	void *addr = NULL;
+	int i, ret = 0;
+
+	ret = xsk_ring_prod__reserve(fq, reserve_size, &idx);
+	if (!ret) {
+		RTE_LOG(ERR, AF_XDP, "Failed to reserve enough fq descs.\n");
+		return ret;
+	}
+
+	for (i = 0; i < reserve_size; i++) {
+		rte_ring_dequeue(umem->buf_ring, &addr);
+		*xsk_ring_prod__fill_addr(fq, idx++) = (uint64_t)addr;
+	}
+
+	xsk_ring_prod__submit(fq, reserve_size);
+
+	return 0;
+}
+
+static uint16_t
+eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+	struct pkt_rx_queue *rxq = queue;
+	struct xsk_ring_cons *rx = &rxq->rx;
+	struct xsk_umem_info *umem = rxq->umem;
+	struct xsk_ring_prod *fq = &umem->fq;
+	uint32_t idx_rx;
+	uint32_t free_thresh = fq->size >> 1;
+	struct rte_mbuf *mbuf;
+	unsigned long dropped = 0;
+	unsigned long rx_bytes = 0;
+	uint16_t count = 0;
+	int rcvd, i;
+
+	nb_pkts = nb_pkts < ETH_AF_XDP_RX_BATCH_SIZE ?
+		nb_pkts : ETH_AF_XDP_RX_BATCH_SIZE;
+
+	rcvd = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
+	if (!rcvd)
+		return 0;
+
+	if (xsk_prod_nb_free(fq, free_thresh) >= free_thresh)
+		(void)reserve_fill_queue(umem, ETH_AF_XDP_RX_BATCH_SIZE);
+
+	for (i = 0; i < rcvd; i++) {
+		uint64_t addr = xsk_ring_cons__rx_desc(rx, idx_rx)->addr;
+		uint32_t len = xsk_ring_cons__rx_desc(rx, idx_rx++)->len;
+		char *pkt = xsk_umem__get_data(rxq->umem->buffer, addr);
+
+		mbuf = rte_pktmbuf_alloc(rxq->mb_pool);
+		if (mbuf) {
+			memcpy(rte_pktmbuf_mtod(mbuf, void*), pkt, len);
+			rte_pktmbuf_pkt_len(mbuf) =
+				rte_pktmbuf_data_len(mbuf) = len;
+			rx_bytes += len;
+			bufs[count++] = mbuf;
+		} else {
+			dropped++;
+		}
+		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+	}
+
+	xsk_ring_cons__release(rx, rcvd);
+
+	/* statistics */
+	rxq->rx_pkts += (rcvd - dropped);
+	rxq->rx_bytes += rx_bytes;
+	rxq->rx_dropped += dropped;
+
+	return count;
+}
+
+static void pull_umem_cq(struct xsk_umem_info *umem, int size)
+{
+	struct xsk_ring_cons *cq = &umem->cq;
+	int i, n;
+	uint32_t idx_cq;
+	uint64_t addr;
+
+	n = xsk_ring_cons__peek(cq, size, &idx_cq);
+	if (n > 0) {
+		for (i = 0; i < n; i++) {
+			addr = *xsk_ring_cons__comp_addr(cq,
+							 idx_cq++);
+			rte_ring_enqueue(umem->buf_ring, (void *)addr);
+		}
+
+		xsk_ring_cons__release(cq, n);
+	}
+}
+
+static void kick_tx(struct pkt_tx_queue *txq)
+{
+	struct xsk_umem_info *umem = txq->pair->umem;
+	int ret;
+
+	while (1) {
+		ret = sendto(xsk_socket__fd(txq->pair->xsk), NULL, 0,
+			     MSG_DONTWAIT, NULL, 0);
+
+		/* everything is ok */
+		if (ret >= 0)
+			break;
+
+		/* something unexpected */
+		if (errno != EBUSY && errno != EAGAIN)
+			break;
+
+		/* pull from the completion queue to free up more space */
+		if (errno == EAGAIN)
+			pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
+	}
+	pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
+}
+
+static uint16_t
+eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+	struct pkt_tx_queue *txq = queue;
+	struct xsk_umem_info *umem = txq->pair->umem;
+	struct rte_mbuf *mbuf;
+	void *addrs[ETH_AF_XDP_TX_BATCH_SIZE];
+	unsigned long tx_bytes = 0;
+	int i, valid = 0;
+	uint32_t idx_tx;
+
+	nb_pkts = nb_pkts < ETH_AF_XDP_TX_BATCH_SIZE ?
+		nb_pkts : ETH_AF_XDP_TX_BATCH_SIZE;
+
+	pull_umem_cq(umem, nb_pkts);
+
+	nb_pkts = rte_ring_dequeue_bulk(umem->buf_ring, addrs,
+					nb_pkts, NULL);
+	if (!nb_pkts)
+		return 0;
+
+	if (xsk_ring_prod__reserve(&txq->tx, nb_pkts, &idx_tx) != nb_pkts) {
+		kick_tx(txq);
+		return 0;
+	}
+
+	for (i = 0; i < nb_pkts; i++) {
+		struct xdp_desc *desc;
+		char *pkt;
+		unsigned int buf_len = ETH_AF_XDP_FRAME_SIZE
+					- ETH_AF_XDP_DATA_HEADROOM;
+		desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i);
+		mbuf = bufs[i];
+		if (mbuf->pkt_len <= buf_len) {
+			desc->addr = (uint64_t)addrs[valid];
+			desc->len = mbuf->pkt_len;
+			pkt = xsk_umem__get_data(umem->buffer,
+						 desc->addr);
+			memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
+			       desc->len);
+			valid++;
+			tx_bytes += mbuf->pkt_len;
+		}
+		rte_pktmbuf_free(mbuf);
+	}
+
+	xsk_ring_prod__submit(&txq->tx, nb_pkts);
+
+	kick_tx(txq);
+
+	if (valid < nb_pkts)
+		rte_ring_enqueue_bulk(umem->buf_ring, &addrs[valid],
+				 nb_pkts - valid, NULL);
+
+	txq->err_pkts += nb_pkts - valid;
+	txq->tx_pkts += valid;
+	txq->tx_bytes += tx_bytes;
+
+	return nb_pkts;
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = ETH_LINK_UP;
+
+	return 0;
+}
+
+/* This function gets called when the current port gets stopped. */
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = ETH_LINK_DOWN;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
+{
+	/* rx/tx must be paired */
+	if (dev->data->nb_rx_queues != dev->data->nb_tx_queues)
+		return -EINVAL;
+
+	return 0;
+}
+
+static void
+eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	dev_info->if_index = internals->if_index;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_pktlen = (uint32_t)ETH_FRAME_LEN;
+	dev_info->max_rx_queues = 1;
+	dev_info->max_tx_queues = 1;
+	dev_info->min_rx_bufsize = 0;
+
+	dev_info->default_rxportconf.nb_queues = 1;
+	dev_info->default_txportconf.nb_queues = 1;
+	dev_info->default_rxportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
+	dev_info->default_txportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
+}
+
+static int
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct xdp_statistics xdp_stats;
+	struct pkt_rx_queue *rxq;
+	socklen_t optlen;
+	int i;
+
+	optlen = sizeof(struct xdp_statistics);
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		rxq = &internals->rx_queues[i];
+		stats->q_ipackets[i] = internals->rx_queues[i].rx_pkts;
+		stats->q_ibytes[i] = internals->rx_queues[i].rx_bytes;
+
+		stats->q_opackets[i] = internals->tx_queues[i].tx_pkts;
+		stats->q_obytes[i] = internals->tx_queues[i].tx_bytes;
+
+		stats->ipackets += stats->q_ipackets[i];
+		stats->ibytes += stats->q_ibytes[i];
+		stats->imissed += internals->rx_queues[i].rx_dropped;
+		getsockopt(xsk_socket__fd(rxq->xsk), SOL_XDP, XDP_STATISTICS,
+				&xdp_stats, &optlen);
+		stats->imissed += xdp_stats.rx_dropped;
+
+		stats->opackets += stats->q_opackets[i];
+		stats->oerrors += stats->q_errors[i];
+		stats->oerrors += internals->tx_queues[i].err_pkts;
+		stats->obytes += stats->q_obytes[i];
+	}
+
+	return 0;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	int i;
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		internals->rx_queues[i].rx_pkts = 0;
+		internals->rx_queues[i].rx_bytes = 0;
+		internals->rx_queues[i].rx_dropped = 0;
+
+		internals->tx_queues[i].tx_pkts = 0;
+		internals->tx_queues[i].err_pkts = 0;
+		internals->tx_queues[i].tx_bytes = 0;
+	}
+}
+
+static void remove_xdp_program(struct pmd_internals *internals)
+{
+	uint32_t curr_prog_id = 0;
+
+	if (bpf_get_link_xdp_id(internals->if_index, &curr_prog_id,
+				XDP_FLAGS_UPDATE_IF_NOEXIST)) {
+		RTE_LOG(ERR, AF_XDP, "bpf_get_link_xdp_id failed\n");
+		return;
+	}
+	bpf_set_link_xdp_fd(internals->if_index, -1,
+			XDP_FLAGS_UPDATE_IF_NOEXIST);
+}
+
+static void
+eth_dev_close(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct pkt_rx_queue *rxq;
+	int i;
+
+	RTE_LOG(INFO, AF_XDP, "Closing AF_XDP ethdev on numa socket %u\n",
+		rte_socket_id());
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		rxq = &internals->rx_queues[i];
+		if (!rxq->umem)
+			break;
+		xsk_socket__delete(rxq->xsk);
+	}
+
+	(void)xsk_umem__delete(internals->umem->umem);
+	remove_xdp_program(internals);
+}
+
+static void
+eth_queue_release(void *q __rte_unused)
+{
+}
+
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+		int wait_to_complete __rte_unused)
+{
+	return 0;
+}
+
+static void xdp_umem_destroy(struct xsk_umem_info *umem)
+{
+	free(umem->buffer);
+	umem->buffer = NULL;
+
+	rte_ring_free(umem->buf_ring);
+	umem->buf_ring = NULL;
+
+	free(umem);
+	umem = NULL;
+}
+
+static struct xsk_umem_info *xdp_umem_configure(void)
+{
+	struct xsk_umem_info *umem;
+	struct xsk_umem_config usr_config = {
+		.fill_size = ETH_AF_XDP_DFLT_NUM_DESCS,
+		.comp_size = ETH_AF_XDP_DFLT_NUM_DESCS,
+		.frame_size = ETH_AF_XDP_FRAME_SIZE,
+		.frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
+	void *bufs = NULL;
+	char ring_name[0x100];
+	int ret;
+	uint64_t i;
+
+	umem = calloc(1, sizeof(*umem));
+	if (!umem) {
+		RTE_LOG(ERR, AF_XDP, "Failed to allocate umem info");
+		return NULL;
+	}
+
+	snprintf(ring_name, 0x100, "af_xdp_ring");
+	umem->buf_ring = rte_ring_create(ring_name,
+					 ETH_AF_XDP_NUM_BUFFERS,
+					 SOCKET_ID_ANY,
+					 0x0);
+	if (!umem->buf_ring) {
+		RTE_LOG(ERR, AF_XDP,
+			"Failed to create rte_ring\n");
+		goto err;
+	}
+
+	for (i = 0; i < ETH_AF_XDP_NUM_BUFFERS; i++)
+		rte_ring_enqueue(umem->buf_ring,
+				 (void *)(i * ETH_AF_XDP_FRAME_SIZE +
+					  ETH_AF_XDP_DATA_HEADROOM));
+
+	if (posix_memalign(&bufs, getpagesize(),
+			   ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE)) {
+		RTE_LOG(ERR, AF_XDP, "Failed to allocate memory pool.\n");
+		goto err;
+	}
+	ret = xsk_umem__create(&umem->umem, bufs,
+			       ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
+			       &umem->fq, &umem->cq,
+			       &usr_config);
+
+	if (ret) {
+		RTE_LOG(ERR, AF_XDP, "Failed to create umem");
+		goto err;
+	}
+	umem->buffer = bufs;
+
+	return umem;
+
+err:
+	xdp_umem_destroy(umem);
+	return NULL;
+}
+
+static int
+xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
+	      int ring_size)
+{
+	struct xsk_socket_config cfg;
+	struct pkt_tx_queue *txq = rxq->pair;
+	int ret = 0;
+	int reserve_size;
+
+	rxq->umem = xdp_umem_configure();
+	if (!rxq->umem) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	cfg.rx_size = ring_size;
+	cfg.tx_size = ring_size;
+	cfg.libbpf_flags = 0;
+	cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
+	cfg.bind_flags = 0;
+	ret = xsk_socket__create(&rxq->xsk, internals->if_name,
+			internals->queue_idx, rxq->umem->umem, &rxq->rx,
+			&txq->tx, &cfg);
+	if (ret) {
+		RTE_LOG(ERR, AF_XDP, "Failed to create xsk socket.\n");
+		goto err;
+	}
+
+	reserve_size = ETH_AF_XDP_DFLT_NUM_DESCS / 2;
+	ret = reserve_fill_queue(rxq->umem, reserve_size);
+	if (ret) {
+		RTE_LOG(ERR, AF_XDP, "Failed to reserve fill queue.\n");
+		goto err;
+	}
+
+	return 0;
+
+err:
+	xdp_umem_destroy(rxq->umem);
+
+	return ret;
+}
+
+static void
+queue_reset(struct pmd_internals *internals, uint16_t queue_idx)
+{
+	struct pkt_rx_queue *rxq = &internals->rx_queues[queue_idx];
+	struct pkt_tx_queue *txq = rxq->pair;
+	int xsk_fd = xsk_socket__fd(rxq->xsk);
+
+	if (xsk_fd) {
+		close(xsk_fd);
+		if (internals->umem) {
+			xdp_umem_destroy(internals->umem);
+			internals->umem = NULL;
+		}
+	}
+	memset(rxq, 0, sizeof(*rxq));
+	memset(txq, 0, sizeof(*txq));
+	rxq->pair = txq;
+	txq->pair = rxq;
+	rxq->queue_idx = queue_idx;
+	txq->queue_idx = queue_idx;
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev,
+		   uint16_t rx_queue_id,
+		   uint16_t nb_rx_desc,
+		   unsigned int socket_id __rte_unused,
+		   const struct rte_eth_rxconf *rx_conf __rte_unused,
+		   struct rte_mempool *mb_pool)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	unsigned int buf_size, data_size;
+	struct pkt_rx_queue *rxq;
+	int ret = 0;
+
+	rxq = &internals->rx_queues[rx_queue_id];
+	queue_reset(internals, rx_queue_id);
+
+	/* Now get the space available for data in the mbuf */
+	buf_size = rte_pktmbuf_data_room_size(mb_pool) -
+		RTE_PKTMBUF_HEADROOM;
+	data_size = ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_DATA_HEADROOM;
+
+	if (data_size > buf_size) {
+		RTE_LOG(ERR, AF_XDP,
+			"%s: %d bytes will not fit in mbuf (%d bytes)\n",
+			dev->device->name, data_size, buf_size);
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	rxq->mb_pool = mb_pool;
+
+	if (xsk_configure(internals, rxq, nb_rx_desc)) {
+		RTE_LOG(ERR, AF_XDP,
+			"Failed to configure xdp socket\n");
+		ret = -EINVAL;
+		goto err;
+	}
+
+	internals->umem = rxq->umem;
+
+	dev->data->rx_queues[rx_queue_id] = rxq;
+	return 0;
+
+err:
+	queue_reset(internals, rx_queue_id);
+	return ret;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev,
+		   uint16_t tx_queue_id,
+		   uint16_t nb_tx_desc __rte_unused,
+		   unsigned int socket_id __rte_unused,
+		   const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct pkt_tx_queue *txq;
+
+	txq = &internals->tx_queues[tx_queue_id];
+
+	dev->data->tx_queues[tx_queue_id] = txq;
+	return 0;
+}
+
+static int
+eth_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct ifreq ifr = { .ifr_mtu = mtu };
+	int ret;
+	int s;
+
+	s = socket(PF_INET, SOCK_DGRAM, 0);
+	if (s < 0)
+		return -EINVAL;
+
+	strlcpy(ifr.ifr_name, internals->if_name, IFNAMSIZ);
+	ret = ioctl(s, SIOCSIFMTU, &ifr);
+	close(s);
+
+	if (ret < 0)
+		return -EINVAL;
+
+	return 0;
+}
+
+static void
+eth_dev_change_flags(char *if_name, uint32_t flags, uint32_t mask)
+{
+	struct ifreq ifr;
+	int s;
+
+	s = socket(PF_INET, SOCK_DGRAM, 0);
+	if (s < 0)
+		return;
+
+	strlcpy(ifr.ifr_name, if_name, IFNAMSIZ);
+	if (ioctl(s, SIOCGIFFLAGS, &ifr) < 0)
+		goto out;
+	ifr.ifr_flags &= mask;
+	ifr.ifr_flags |= flags;
+	if (ioctl(s, SIOCSIFFLAGS, &ifr) < 0)
+		goto out;
+out:
+	close(s);
+}
+
+static void
+eth_dev_promiscuous_enable(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	eth_dev_change_flags(internals->if_name, IFF_PROMISC, ~0);
+}
+
+static void
+eth_dev_promiscuous_disable(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	eth_dev_change_flags(internals->if_name, 0, ~IFF_PROMISC);
+}
+
+static const struct eth_dev_ops ops = {
+	.dev_start = eth_dev_start,
+	.dev_stop = eth_dev_stop,
+	.dev_close = eth_dev_close,
+	.dev_configure = eth_dev_configure,
+	.dev_infos_get = eth_dev_info,
+	.mtu_set = eth_dev_mtu_set,
+	.promiscuous_enable = eth_dev_promiscuous_enable,
+	.promiscuous_disable = eth_dev_promiscuous_disable,
+	.rx_queue_setup = eth_rx_queue_setup,
+	.tx_queue_setup = eth_tx_queue_setup,
+	.rx_queue_release = eth_queue_release,
+	.tx_queue_release = eth_queue_release,
+	.link_update = eth_link_update,
+	.stats_get = eth_stats_get,
+	.stats_reset = eth_stats_reset,
+};
+
+/** parse integer from integer argument */
+static int
+parse_integer_arg(const char *key __rte_unused,
+		  const char *value, void *extra_args)
+{
+	int *i = (int *)extra_args;
+
+	*i = atoi(value);
+	if (*i < 0) {
+		RTE_LOG(ERR, AF_XDP, "Argument has to be positive.\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/** parse name argument */
+static int
+parse_name_arg(const char *key __rte_unused,
+	       const char *value, void *extra_args)
+{
+	char *name = extra_args;
+
+	if (strlen(value) > IFNAMSIZ) {
+		RTE_LOG(ERR, AF_XDP, "Invalid name %s, should be less than "
+			"%u bytes.\n", value, IFNAMSIZ);
+		return -EINVAL;
+	}
+
+	strlcpy(name, value, IFNAMSIZ);
+
+	return 0;
+}
+
+static int
+parse_parameters(struct rte_kvargs *kvlist,
+		 char *if_name,
+		 int *queue_idx)
+{
+	int ret = 0;
+
+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_IFACE_ARG,
+				 &parse_name_arg, if_name);
+	if (ret < 0)
+		goto free_kvlist;
+
+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_QUEUE_IDX_ARG,
+				 &parse_integer_arg, queue_idx);
+	if (ret < 0)
+		goto free_kvlist;
+
+free_kvlist:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
+static int
+get_iface_info(const char *if_name,
+	       struct ether_addr *eth_addr,
+	       int *if_index)
+{
+	struct ifreq ifr;
+	int sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_IP);
+
+	if (sock < 0)
+		return -1;
+
+	strlcpy(ifr.ifr_name, if_name, IFNAMSIZ);
+	if (ioctl(sock, SIOCGIFINDEX, &ifr))
+		goto error;
+
+	if (ioctl(sock, SIOCGIFHWADDR, &ifr))
+		goto error;
+
+	rte_memcpy(eth_addr, ifr.ifr_hwaddr.sa_data, ETHER_ADDR_LEN);
+
+	close(sock);
+	*if_index = if_nametoindex(if_name);
+	return 0;
+
+error:
+	close(sock);
+	return -1;
+}
+
+static int
+init_internals(struct rte_vdev_device *dev,
+	       const char *if_name,
+	       int queue_idx,
+	       struct rte_eth_dev **eth_dev)
+{
+	const char *name = rte_vdev_device_name(dev);
+	const unsigned int numa_node = dev->device.numa_node;
+	struct pmd_internals *internals = NULL;
+	int ret;
+	int i;
+
+	internals = rte_zmalloc_socket(name, sizeof(*internals), 0, numa_node);
+	if (!internals)
+		return -ENOMEM;
+
+	internals->queue_idx = queue_idx;
+	strlcpy(internals->if_name, if_name, IFNAMSIZ);
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		internals->tx_queues[i].pair = &internals->rx_queues[i];
+		internals->rx_queues[i].pair = &internals->tx_queues[i];
+	}
+
+	ret = get_iface_info(if_name, &internals->eth_addr,
+			     &internals->if_index);
+	if (ret)
+		goto err;
+
+	*eth_dev = rte_eth_vdev_allocate(dev, 0);
+	if (!*eth_dev)
+		goto err;
+
+	(*eth_dev)->data->dev_private = internals;
+	(*eth_dev)->data->dev_link = pmd_link;
+	(*eth_dev)->data->mac_addrs = &internals->eth_addr;
+	(*eth_dev)->dev_ops = &ops;
+	(*eth_dev)->rx_pkt_burst = eth_af_xdp_rx;
+	(*eth_dev)->tx_pkt_burst = eth_af_xdp_tx;
+
+	return 0;
+
+err:
+	rte_free(internals);
+	return -1;
+}
+
+static int
+rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
+{
+	struct rte_kvargs *kvlist;
+	char if_name[IFNAMSIZ];
+	int xsk_queue_idx = ETH_AF_XDP_DFLT_QUEUE_IDX;
+	struct rte_eth_dev *eth_dev = NULL;
+	const char *name;
+	int ret;
+
+	RTE_LOG(INFO, AF_XDP, "Initializing pmd_af_xdp for %s\n",
+		rte_vdev_device_name(dev));
+
+	name = rte_vdev_device_name(dev);
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY &&
+		strlen(rte_vdev_device_args(dev)) == 0) {
+		eth_dev = rte_eth_dev_attach_secondary(name);
+		if (!eth_dev) {
+			RTE_LOG(ERR, AF_XDP, "Failed to probe %s\n", name);
+			return -EINVAL;
+		}
+		eth_dev->dev_ops = &ops;
+		rte_eth_dev_probing_finish(eth_dev);
+		return 0;
+	}
+
+	kvlist = rte_kvargs_parse(rte_vdev_device_args(dev), valid_arguments);
+	if (!kvlist) {
+		RTE_LOG(ERR, AF_XDP, "Invalid kvargs key\n");
+		return -EINVAL;
+	}
+
+	if (dev->device.numa_node == SOCKET_ID_ANY)
+		dev->device.numa_node = rte_socket_id();
+
+	if (parse_parameters(kvlist, if_name, &xsk_queue_idx) < 0) {
+		RTE_LOG(ERR, AF_XDP, "Invalid kvargs value\n");
+		return -EINVAL;
+	}
+
+	ret = init_internals(dev, if_name, xsk_queue_idx, &eth_dev);
+	if (ret) {
+		RTE_LOG(ERR, AF_XDP, "Failed to init internals\n");
+		return ret;
+	}
+
+	rte_eth_dev_probing_finish(eth_dev);
+
+	return 0;
+}
+
+static int
+rte_pmd_af_xdp_remove(struct rte_vdev_device *dev)
+{
+	struct rte_eth_dev *eth_dev = NULL;
+	struct pmd_internals *internals;
+
+	RTE_LOG(INFO, AF_XDP, "Removing AF_XDP ethdev on numa socket %u\n",
+		rte_socket_id());
+
+	if (!dev)
+		return -1;
+
+	/* find the ethdev entry */
+	eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(dev));
+	if (!eth_dev)
+		return -1;
+
+	internals = eth_dev->data->dev_private;
+
+	rte_ring_free(internals->umem->buf_ring);
+	rte_free(internals->umem->buffer);
+	rte_free(internals->umem);
+
+	rte_eth_dev_release_port(eth_dev);
+
+
+	return 0;
+}
+
+static struct rte_vdev_driver pmd_af_xdp_drv = {
+	.probe = rte_pmd_af_xdp_probe,
+	.remove = rte_pmd_af_xdp_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(eth_af_xdp, pmd_af_xdp_drv);
+RTE_PMD_REGISTER_PARAM_STRING(eth_af_xdp,
+			      "iface=<string> "
+			      "queue=<int> ");
diff --git a/drivers/net/af_xdp/rte_pmd_af_xdp_version.map b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
new file mode 100644
index 000000000..c6db030fe
--- /dev/null
+++ b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
@@ -0,0 +1,3 @@
+DPDK_19.05 {
+	local: *;
+};
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index 3ecc78cee..1105e72d8 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2017 Intel Corporation
 
 drivers = ['af_packet',
+	'af_xdp',
 	'ark',
 	'atlantic',
 	'avp',
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 262132fc6..be0af73cc 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -143,6 +143,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_DPAA2_MEMPOOL)  += -lrte_mempool_dpaa2
 endif
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     += -lrte_pmd_af_xdp -lelf -lbpf
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ARK_PMD)        += -lrte_pmd_ark
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD)   += -lrte_pmd_atlantic
 _LDLIBS-$(CONFIG_RTE_LIBRTE_AVP_PMD)        += -lrte_pmd_avp
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* [PATCH v2 2/6] lib/mbuf: introduce helper to create mempool with flags
  2019-03-19  7:12 ` [PATCH v2 " Xiaolong Ye
  2019-03-19  7:12   ` [PATCH v2 1/6] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
@ 2019-03-19  7:12   ` Xiaolong Ye
  2019-03-19  7:12   ` [PATCH v2 3/6] lib/mempool: allow page size aligned mempool Xiaolong Ye
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-19  7:12 UTC (permalink / raw)
  To: dev; +Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Xiaolong Ye

This allows applications to create an mbuf mempool with specific flags,
such as MEMPOOL_F_NO_SPREAD, when they need fixed-size memory objects.
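
For illustration, a minimal usage sketch of the new helper (the pool
name and sizes here are hypothetical, not part of this patch):

    struct rte_mempool *mp;

    mp = rte_pktmbuf_pool_create_with_flags("fixed_pool",
            4095,                      /* n: optimal when 2^q - 1 */
            250,                       /* per-core cache size */
            0,                         /* app private area size */
            RTE_MBUF_DEFAULT_BUF_SIZE, /* data room incl. headroom */
            MEMPOOL_F_NO_SPREAD,       /* keep objects at fixed offsets */
            SOCKET_ID_ANY);
    if (mp == NULL)
        rte_exit(EXIT_FAILURE, "pool creation failed: %s\n",
                 rte_strerror(rte_errno));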

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 lib/librte_mbuf/rte_mbuf.c | 29 +++++++++++++++++++-----
 lib/librte_mbuf/rte_mbuf.h | 45 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+), 5 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 21f6f7404..c1db9e298 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -106,11 +106,10 @@ rte_pktmbuf_init(struct rte_mempool *mp,
 	m->next = NULL;
 }
 
-/* Helper to create a mbuf pool with given mempool ops name*/
-struct rte_mempool *
-rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
+static struct rte_mempool *
+rte_pktmbuf_pool_create_by_ops_with_flags(const char *name, unsigned int n,
 	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
-	int socket_id, const char *ops_name)
+	unsigned int flags, int socket_id, const char *ops_name)
 {
 	struct rte_mempool *mp;
 	struct rte_pktmbuf_pool_private mbp_priv;
@@ -130,7 +129,7 @@ rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
 	mbp_priv.mbuf_priv_size = priv_size;
 
 	mp = rte_mempool_create_empty(name, n, elt_size, cache_size,
-		 sizeof(struct rte_pktmbuf_pool_private), socket_id, 0);
+		 sizeof(struct rte_pktmbuf_pool_private), socket_id, flags);
 	if (mp == NULL)
 		return NULL;
 
@@ -157,6 +156,16 @@ rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
 	return mp;
 }
 
+/* Helper to create a mbuf pool with given mempool ops name*/
+struct rte_mempool *
+rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
+	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
+	int socket_id, const char *ops_name)
+{
+	return rte_pktmbuf_pool_create_by_ops_with_flags(name, n, cache_size,
+			 priv_size, data_room_size, 0, socket_id, ops_name);
+}
+
 /* helper to create a mbuf pool */
 struct rte_mempool *
 rte_pktmbuf_pool_create(const char *name, unsigned int n,
@@ -167,6 +176,16 @@ rte_pktmbuf_pool_create(const char *name, unsigned int n,
 			data_room_size, socket_id, NULL);
 }
 
+/* helper to create a mbuf pool with flags (e.g. NO_SPREAD) */
+struct rte_mempool * __rte_experimental
+rte_pktmbuf_pool_create_with_flags(const char *name, unsigned int n,
+	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
+	unsigned int flags, int socket_id)
+{
+	return rte_pktmbuf_pool_create_by_ops_with_flags(name, n, cache_size,
+			priv_size, data_room_size, flags, socket_id, NULL);
+}
+
 /* do some sanity checks on a mbuf: panic if it fails */
 void
 rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header)
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index d961ccaf6..105ead6de 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1266,6 +1266,51 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
 	unsigned cache_size, uint16_t priv_size, uint16_t data_room_size,
 	int socket_id);
 
+/**
+ * Create a mbuf pool with flags.
+ *
+ * This function creates and initializes a packet mbuf pool. It is
+ * a wrapper to rte_mempool functions.
+ *
+ * @warning
+ * @b EXPERIMENTAL: This API may change without prior notice.
+ *
+ * @param name
+ *   The name of the mbuf pool.
+ * @param n
+ *   The number of elements in the mbuf pool. The optimum size (in terms
+ *   of memory usage) for a mempool is when n is a power of two minus one:
+ *   n = (2^q - 1).
+ * @param cache_size
+ *   Size of the per-core object cache. See rte_mempool_create() for
+ *   details.
+ * @param priv_size
+ *   Size of the application private area between the rte_mbuf structure
+ *   and the data buffer. This value must be aligned to RTE_MBUF_PRIV_ALIGN.
+ * @param data_room_size
+ *   Size of data buffer in each mbuf, including RTE_PKTMBUF_HEADROOM.
+ * @param flags
+ *   Flags controlling the behavior of the mempool. See
+ *   rte_mempool_create() for details.
+ * @param socket_id
+ *   The socket identifier where the memory should be allocated. The
+ *   value can be *SOCKET_ID_ANY* if there is no NUMA constraint for the
+ *   reserved zone.
+ * @return
+ *   The pointer to the newly allocated mempool on success, NULL on error
+ *   with rte_errno set appropriately. Possible rte_errno values include:
+ *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
+ *    - E_RTE_SECONDARY - function was called from a secondary process instance
+ *    - EINVAL - cache size provided is too large, or priv_size is not aligned.
+ *    - ENOSPC - the maximum number of memzones has already been allocated
+ *    - EEXIST - a memzone with the same name already exists
+ *    - ENOMEM - no appropriate memory area found in which to create memzone
+ */
+struct rte_mempool * __rte_experimental
+rte_pktmbuf_pool_create_with_flags(const char *name, unsigned int n,
+	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
+	unsigned int flags, int socket_id);
+
 /**
  * Create a mbuf pool with a given mempool ops name
  *
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* [PATCH v2 3/6] lib/mempool: allow page size aligned mempool
  2019-03-19  7:12 ` [PATCH v2 " Xiaolong Ye
  2019-03-19  7:12   ` [PATCH v2 1/6] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
  2019-03-19  7:12   ` [PATCH v2 2/6] lib/mbuf: introduce helper to create mempool with flags Xiaolong Ye
@ 2019-03-19  7:12   ` Xiaolong Ye
  2019-03-19  7:12   ` [PATCH v2 4/6] net/af_xdp: use mbuf mempool for buffer management Xiaolong Ye
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-19  7:12 UTC (permalink / raw)
  To: dev; +Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Xiaolong Ye

Allow creating a mempool with a page size aligned base address.
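
A brief sketch of the intended usage (pool name and sizes are
hypothetical):

    struct rte_mempool *mp;

    /* request that each memory chunk's base address be page aligned */
    mp = rte_mempool_create_empty("aligned_pool", 4096, 2048, 250,
                                  0, SOCKET_ID_ANY,
                                  MEMPOOL_F_PAGE_ALIGN);
    if (mp != NULL && rte_mempool_populate_default(mp) < 0)
        rte_mempool_free(mp);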

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 lib/librte_mempool/rte_mempool.c | 3 +++
 lib/librte_mempool/rte_mempool.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 683b216f9..33ab6a2b4 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -543,6 +543,9 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 		if (try_contig)
 			flags |= RTE_MEMZONE_IOVA_CONTIG;
 
+		if (mp->flags & MEMPOOL_F_PAGE_ALIGN)
+			align = getpagesize();
+
 		mz = rte_memzone_reserve_aligned(mz_name, mem_size,
 				mp->socket_id, flags, align);
 
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 7c9cd9a2f..75553b36f 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -264,6 +264,7 @@ struct rte_mempool {
 #define MEMPOOL_F_POOL_CREATED   0x0010 /**< Internal: pool is created. */
 #define MEMPOOL_F_NO_IOVA_CONTIG 0x0020 /**< Don't need IOVA contiguous objs. */
 #define MEMPOOL_F_NO_PHYS_CONTIG MEMPOOL_F_NO_IOVA_CONTIG /* deprecated */
+#define MEMPOOL_F_PAGE_ALIGN     0x0040 /**< Chunk's base address is page aligned */
 
 /**
  * @internal When debug is enabled, store some statistics.
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* [PATCH v2 4/6] net/af_xdp: use mbuf mempool for buffer management
  2019-03-19  7:12 ` [PATCH v2 " Xiaolong Ye
                     ` (2 preceding siblings ...)
  2019-03-19  7:12   ` [PATCH v2 3/6] lib/mempool: allow page size aligned mempool Xiaolong Ye
@ 2019-03-19  7:12   ` Xiaolong Ye
  2019-03-19  7:12   ` [PATCH v2 5/6] net/af_xdp: enable zero copy Xiaolong Ye
  2019-03-19  7:12   ` [PATCH v2 6/6] app/testpmd: add mempool flags parameter Xiaolong Ye
  5 siblings, 0 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-19  7:12 UTC (permalink / raw)
  To: dev; +Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Xiaolong Ye

Now the af_xdp registered memory buffer is managed by an rte_mempool.
An mbuf allocated from the rte_mempool can be converted to an
xdp_desc's address and vice versa.
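
For reference, each ETH_AF_XDP_FRAME_SIZE umem frame holds exactly one
mempool object, laid out as below (offsets follow from the defines
added in this patch; a sketch, not normative):

    frame base + 0   : mempool object header  (64 bytes)
    frame base + 64  : struct rte_mbuf        (128 bytes)
    frame base + 192 : RTE_PKTMBUF_HEADROOM   (128 bytes)
    frame base + 320 : packet data, i.e. the xdp_desc address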

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 drivers/net/af_xdp/rte_eth_af_xdp.c | 127 +++++++++++++++++-----------
 1 file changed, 78 insertions(+), 49 deletions(-)

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 96dedc0c4..fc60cb5c5 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -43,7 +43,11 @@
 
 #define ETH_AF_XDP_FRAME_SIZE		XSK_UMEM__DEFAULT_FRAME_SIZE
 #define ETH_AF_XDP_NUM_BUFFERS		4096
-#define ETH_AF_XDP_DATA_HEADROOM	0
+/* mempool hdrobj size (64 bytes) + sizeof(struct rte_mbuf) (128 bytes) */
+#define ETH_AF_XDP_MBUF_OVERHEAD	192
+/* data starts at offset 320 (192 + 128) bytes */
+#define ETH_AF_XDP_DATA_HEADROOM				\
+	(ETH_AF_XDP_MBUF_OVERHEAD + RTE_PKTMBUF_HEADROOM)
 #define ETH_AF_XDP_DFLT_NUM_DESCS	XSK_RING_CONS__DEFAULT_NUM_DESCS
 #define ETH_AF_XDP_DFLT_QUEUE_IDX	0
 
@@ -56,7 +60,7 @@ struct xsk_umem_info {
 	struct xsk_ring_prod fq;
 	struct xsk_ring_cons cq;
 	struct xsk_umem *umem;
-	struct rte_ring *buf_ring;
+	struct rte_mempool *mb_pool;
 	void *buffer;
 };
 
@@ -110,12 +114,32 @@ static struct rte_eth_link pmd_link = {
 	.link_autoneg = ETH_LINK_AUTONEG
 };
 
+static inline struct rte_mbuf *
+addr_to_mbuf(struct xsk_umem_info *umem, uint64_t addr)
+{
+	uint64_t offset = (addr / ETH_AF_XDP_FRAME_SIZE *
+			ETH_AF_XDP_FRAME_SIZE);
+	struct rte_mbuf *mbuf = (struct rte_mbuf *)((uint64_t)umem->buffer +
+				    offset + ETH_AF_XDP_MBUF_OVERHEAD -
+				    sizeof(struct rte_mbuf));
+	mbuf->data_off = addr - offset - ETH_AF_XDP_MBUF_OVERHEAD;
+	return mbuf;
+}
+
+static inline uint64_t
+mbuf_to_addr(struct xsk_umem_info *umem, struct rte_mbuf *mbuf)
+{
+	return (uint64_t)mbuf->buf_addr + mbuf->data_off -
+		(uint64_t)umem->buffer;
+}
+
 static inline int
 reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
 {
 	struct xsk_ring_prod *fq = &umem->fq;
+	struct rte_mbuf *mbuf;
 	uint32_t idx;
-	void *addr = NULL;
+	uint64_t addr;
 	int i, ret = 0;
 
 	ret = xsk_ring_prod__reserve(fq, reserve_size, &idx);
@@ -125,11 +149,14 @@ reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
 	}
 
 	for (i = 0; i < reserve_size; i++) {
-		rte_ring_dequeue(umem->buf_ring, &addr);
-		*xsk_ring_prod__fill_addr(fq, idx++) = (uint64_t)addr;
+		mbuf = rte_pktmbuf_alloc(umem->mb_pool);
+		if (!mbuf)
+			break;
+		addr = mbuf_to_addr(umem, mbuf);
+		*xsk_ring_prod__fill_addr(fq, idx++) = addr;
 	}
 
-	xsk_ring_prod__submit(fq, reserve_size);
+	xsk_ring_prod__submit(fq, i);
 
 	return 0;
 }
@@ -174,7 +201,7 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		} else {
 			dropped++;
 		}
-		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+		rte_pktmbuf_free(addr_to_mbuf(umem, addr));
 	}
 
 	xsk_ring_cons__release(rx, rcvd);
@@ -197,9 +224,8 @@ static void pull_umem_cq(struct xsk_umem_info *umem, int size)
 	n = xsk_ring_cons__peek(cq, size, &idx_cq);
 	if (n > 0) {
 		for (i = 0; i < n; i++) {
-			addr = *xsk_ring_cons__comp_addr(cq,
-							 idx_cq++);
-			rte_ring_enqueue(umem->buf_ring, (void *)addr);
+			addr = *xsk_ring_cons__comp_addr(cq, idx_cq++);
+			rte_pktmbuf_free(addr_to_mbuf(umem, addr));
 		}
 
 		xsk_ring_cons__release(cq, n);
@@ -236,7 +262,7 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	struct pkt_tx_queue *txq = queue;
 	struct xsk_umem_info *umem = txq->pair->umem;
 	struct rte_mbuf *mbuf;
-	void *addrs[ETH_AF_XDP_TX_BATCH_SIZE];
+	struct rte_mbuf *mbuf_to_tx;
 	unsigned long tx_bytes = 0;
 	int i, valid = 0;
 	uint32_t idx_tx;
@@ -246,10 +272,6 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 
 	pull_umem_cq(umem, nb_pkts);
 
-	nb_pkts = rte_ring_dequeue_bulk(umem->buf_ring, addrs,
-					nb_pkts, NULL);
-	if (!nb_pkts)
-		return 0;
 
 	if (xsk_ring_prod__reserve(&txq->tx, nb_pkts, &idx_tx) != nb_pkts) {
 		kick_tx(txq);
@@ -264,7 +286,12 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i);
 		mbuf = bufs[i];
 		if (mbuf->pkt_len <= buf_len) {
-			desc->addr = (uint64_t)addrs[valid];
+			mbuf_to_tx = rte_pktmbuf_alloc(umem->mb_pool);
+			if (!mbuf_to_tx) {
+				rte_pktmbuf_free(mbuf);
+				continue;
+			}
+			desc->addr = mbuf_to_addr(umem, mbuf_to_tx);
 			desc->len = mbuf->pkt_len;
 			pkt = xsk_umem__get_data(umem->buffer,
 						 desc->addr);
@@ -280,10 +307,6 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 
 	kick_tx(txq);
 
-	if (valid < nb_pkts)
-		rte_ring_enqueue_bulk(umem->buf_ring, &addrs[valid],
-				 nb_pkts - valid, NULL);
-
 	txq->err_pkts += nb_pkts - valid;
 	txq->tx_pkts += valid;
 	txq->tx_bytes += tx_bytes;
@@ -433,16 +456,29 @@ eth_link_update(struct rte_eth_dev *dev __rte_unused,
 
 static void xdp_umem_destroy(struct xsk_umem_info *umem)
 {
-	free(umem->buffer);
-	umem->buffer = NULL;
-
-	rte_ring_free(umem->buf_ring);
-	umem->buf_ring = NULL;
+	rte_mempool_free(umem->mb_pool);
+	umem->mb_pool = NULL;
 
 	free(umem);
 	umem = NULL;
 }
 
+static inline uint64_t get_base_addr(struct rte_mempool *mp)
+{
+	struct rte_mempool_memhdr *memhdr;
+
+	memhdr = STAILQ_FIRST(&mp->mem_list);
+	return (uint64_t)(memhdr->addr);
+}
+
+static inline uint64_t get_len(struct rte_mempool *mp)
+{
+	struct rte_mempool_memhdr *memhdr;
+
+	memhdr = STAILQ_FIRST(&mp->mem_list);
+	return (uint64_t)(memhdr->len);
+}
+
 static struct xsk_umem_info *xdp_umem_configure(void)
 {
 	struct xsk_umem_info *umem;
@@ -451,10 +487,9 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 		.comp_size = ETH_AF_XDP_DFLT_NUM_DESCS,
 		.frame_size = ETH_AF_XDP_FRAME_SIZE,
 		.frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
-	void *bufs = NULL;
-	char ring_name[0x100];
+	void *base_addr = NULL;
+	char pool_name[0x100];
 	int ret;
-	uint64_t i;
 
 	umem = calloc(1, sizeof(*umem));
 	if (!umem) {
@@ -462,28 +497,23 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 		return NULL;
 	}
 
-	snprintf(ring_name, 0x100, "af_xdp_ring");
-	umem->buf_ring = rte_ring_create(ring_name,
-					 ETH_AF_XDP_NUM_BUFFERS,
-					 SOCKET_ID_ANY,
-					 0x0);
-	if (!umem->buf_ring) {
+	snprintf(pool_name, 0x100, "af_xdp_ring");
+	umem->mb_pool = rte_pktmbuf_pool_create_with_flags(pool_name,
+			ETH_AF_XDP_NUM_BUFFERS,
+			250, 0,
+			ETH_AF_XDP_FRAME_SIZE -
+			ETH_AF_XDP_MBUF_OVERHEAD,
+			MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN,
+			SOCKET_ID_ANY);
+
+	if (!umem->mb_pool || umem->mb_pool->nb_mem_chunks != 1) {
 		RTE_LOG(ERR, AF_XDP,
-			"Failed to create rte_ring\n");
+			"Failed to create rte_mempool\n");
 		goto err;
 	}
+	base_addr = (void *)get_base_addr(umem->mb_pool);
 
-	for (i = 0; i < ETH_AF_XDP_NUM_BUFFERS; i++)
-		rte_ring_enqueue(umem->buf_ring,
-				 (void *)(i * ETH_AF_XDP_FRAME_SIZE +
-					  ETH_AF_XDP_DATA_HEADROOM));
-
-	if (posix_memalign(&bufs, getpagesize(),
-			   ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE)) {
-		RTE_LOG(ERR, AF_XDP, "Failed to allocate memory pool.\n");
-		goto err;
-	}
-	ret = xsk_umem__create(&umem->umem, bufs,
+	ret = xsk_umem__create(&umem->umem, base_addr,
 			       ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
 			       &umem->fq, &umem->cq,
 			       &usr_config);
@@ -492,7 +522,7 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 		RTE_LOG(ERR, AF_XDP, "Failed to create umem");
 		goto err;
 	}
-	umem->buffer = bufs;
+	umem->buffer = base_addr;
 
 	return umem;
 
@@ -909,8 +939,7 @@ rte_pmd_af_xdp_remove(struct rte_vdev_device *dev)
 
 	internals = eth_dev->data->dev_private;
 
-	rte_ring_free(internals->umem->buf_ring);
-	rte_free(internals->umem->buffer);
+	rte_mempool_free(internals->umem->mb_pool);
 	rte_free(internals->umem);
 
 	rte_eth_dev_release_port(eth_dev);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* [PATCH v2 5/6] net/af_xdp: enable zero copy
  2019-03-19  7:12 ` [PATCH v2 " Xiaolong Ye
                     ` (3 preceding siblings ...)
  2019-03-19  7:12   ` [PATCH v2 4/6] net/af_xdp: use mbuf mempool for buffer management Xiaolong Ye
@ 2019-03-19  7:12   ` Xiaolong Ye
  2019-03-19  8:12     ` Mattias Rönnblom
  2019-03-20  9:22     ` David Marchand
  2019-03-19  7:12   ` [PATCH v2 6/6] app/testpmd: add mempool flags parameter Xiaolong Ye
  5 siblings, 2 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-19  7:12 UTC (permalink / raw)
  To: dev; +Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Xiaolong Ye

Check whether the external mempool (from rx_queue_setup) is a fit for
af_xdp; if it is, it will be registered to the af_xdp socket directly
and there will be no packet data copy on Rx or Tx.
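
For example, an application can make its Rx mempool eligible for zero
copy by creating it with the layout af_xdp expects (a sketch reusing
the helper from patch 2/6; the eligibility checks are implemented in
check_mempool_zc() below):

    mp = rte_pktmbuf_pool_create_with_flags("zc_pool",
            ETH_AF_XDP_NUM_BUFFERS, 250, 0,
            ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_MBUF_OVERHEAD,
            MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN,
            SOCKET_ID_ANY);

Passing such a pool to rte_eth_rx_queue_setup() then enables the zero
copy path on that queue.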

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 drivers/net/af_xdp/rte_eth_af_xdp.c | 128 ++++++++++++++++++++--------
 1 file changed, 91 insertions(+), 37 deletions(-)

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index fc60cb5c5..c22791e51 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -62,6 +62,7 @@ struct xsk_umem_info {
 	struct xsk_umem *umem;
 	struct rte_mempool *mb_pool;
 	void *buffer;
+	uint8_t zc;
 };
 
 struct pkt_rx_queue {
@@ -76,6 +77,7 @@ struct pkt_rx_queue {
 
 	struct pkt_tx_queue *pair;
 	uint16_t queue_idx;
+	uint8_t zc;
 };
 
 struct pkt_tx_queue {
@@ -191,17 +193,24 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		uint32_t len = xsk_ring_cons__rx_desc(rx, idx_rx++)->len;
 		char *pkt = xsk_umem__get_data(rxq->umem->buffer, addr);
 
-		mbuf = rte_pktmbuf_alloc(rxq->mb_pool);
-		if (mbuf) {
-			memcpy(rte_pktmbuf_mtod(mbuf, void*), pkt, len);
+		if (rxq->zc) {
+			mbuf = addr_to_mbuf(rxq->umem, addr);
 			rte_pktmbuf_pkt_len(mbuf) =
 				rte_pktmbuf_data_len(mbuf) = len;
-			rx_bytes += len;
 			bufs[count++] = mbuf;
 		} else {
-			dropped++;
+			mbuf = rte_pktmbuf_alloc(rxq->mb_pool);
+			if (mbuf) {
+				memcpy(rte_pktmbuf_mtod(mbuf, void*), pkt, len);
+				rte_pktmbuf_pkt_len(mbuf) =
+					rte_pktmbuf_data_len(mbuf) = len;
+				rx_bytes += len;
+				bufs[count++] = mbuf;
+			} else {
+				dropped++;
+			}
+			rte_pktmbuf_free(addr_to_mbuf(umem, addr));
 		}
-		rte_pktmbuf_free(addr_to_mbuf(umem, addr));
 	}
 
 	xsk_ring_cons__release(rx, rcvd);
@@ -285,22 +294,29 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 					- ETH_AF_XDP_DATA_HEADROOM;
 		desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i);
 		mbuf = bufs[i];
-		if (mbuf->pkt_len <= buf_len) {
-			mbuf_to_tx = rte_pktmbuf_alloc(umem->mb_pool);
-			if (!mbuf_to_tx) {
-				rte_pktmbuf_free(mbuf);
-				continue;
-			}
-			desc->addr = mbuf_to_addr(umem, mbuf_to_tx);
+		if (txq->pair->zc && mbuf->pool == umem->mb_pool) {
+			desc->addr = mbuf_to_addr(umem, mbuf);
 			desc->len = mbuf->pkt_len;
-			pkt = xsk_umem__get_data(umem->buffer,
-						 desc->addr);
-			memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
-			       desc->len);
 			valid++;
 			tx_bytes += mbuf->pkt_len;
+		} else {
+			if (mbuf->pkt_len <= buf_len) {
+				mbuf_to_tx = rte_pktmbuf_alloc(umem->mb_pool);
+				if (!mbuf_to_tx) {
+					rte_pktmbuf_free(mbuf);
+					continue;
+				}
+				desc->addr = mbuf_to_addr(umem, mbuf_to_tx);
+				desc->len = mbuf->pkt_len;
+				pkt = xsk_umem__get_data(umem->buffer,
+							 desc->addr);
+				memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
+				       desc->len);
+				valid++;
+				tx_bytes += mbuf->pkt_len;
+			}
+			rte_pktmbuf_free(mbuf);
 		}
-		rte_pktmbuf_free(mbuf);
 	}
 
 	xsk_ring_prod__submit(&txq->tx, nb_pkts);
@@ -479,7 +495,7 @@ static inline uint64_t get_len(struct rte_mempool *mp)
 	return (uint64_t)(memhdr->len);
 }
 
-static struct xsk_umem_info *xdp_umem_configure(void)
+static struct xsk_umem_info *xdp_umem_configure(struct rte_mempool *mb_pool)
 {
 	struct xsk_umem_info *umem;
 	struct xsk_umem_config usr_config = {
@@ -497,20 +513,25 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 		return NULL;
 	}
 
-	snprintf(pool_name, 0x100, "af_xdp_ring");
-	umem->mb_pool = rte_pktmbuf_pool_create_with_flags(pool_name,
-			ETH_AF_XDP_NUM_BUFFERS,
-			250, 0,
-			ETH_AF_XDP_FRAME_SIZE -
-			ETH_AF_XDP_MBUF_OVERHEAD,
-			MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN,
-			SOCKET_ID_ANY);
-
-	if (!umem->mb_pool || umem->mb_pool->nb_mem_chunks != 1) {
-		RTE_LOG(ERR, AF_XDP,
-			"Failed to create rte_mempool\n");
-		goto err;
+	if (!mb_pool) {
+		snprintf(pool_name, 0x100, "af_xdp_ring");
+		umem->mb_pool = rte_pktmbuf_pool_create_with_flags(pool_name,
+				ETH_AF_XDP_NUM_BUFFERS,
+				250, 0,
+				ETH_AF_XDP_FRAME_SIZE -
+				ETH_AF_XDP_MBUF_OVERHEAD,
+				MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN,
+				SOCKET_ID_ANY);
+
+		if (!umem->mb_pool || umem->mb_pool->nb_mem_chunks != 1) {
+			RTE_LOG(ERR, AF_XDP, "Failed to create rte_mempool\n");
+			goto err;
+		}
+	} else {
+		umem->mb_pool = mb_pool;
+		umem->zc = 1;
 	}
+
 	base_addr = (void *)get_base_addr(umem->mb_pool);
 
 	ret = xsk_umem__create(&umem->umem, base_addr,
@@ -531,16 +552,43 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 	return NULL;
 }
 
+static uint8_t
+check_mempool_zc(struct rte_mempool *mp)
+{
+	RTE_ASSERT(mp);
+
+	/* memory must be contiguous (a single chunk) */
+	if (mp->nb_mem_chunks > 1)
+		return 0;
+
+	/* check header size */
+	if (mp->header_size != RTE_CACHE_LINE_SIZE)
+		return 0;
+
+	/* check base address */
+	if ((uint64_t)get_base_addr(mp) % getpagesize() != 0)
+		return 0;
+
+	/* check chunk size */
+	if ((mp->elt_size + mp->header_size + mp->trailer_size) %
+	    ETH_AF_XDP_FRAME_SIZE != 0)
+		return 0;
+
+	return 1;
+}
+
 static int
 xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
-	      int ring_size)
+	      int ring_size, struct rte_mempool *mb_pool)
 {
 	struct xsk_socket_config cfg;
 	struct pkt_tx_queue *txq = rxq->pair;
+	struct rte_mempool *mp;
 	int ret = 0;
 	int reserve_size;
 
-	rxq->umem = xdp_umem_configure();
+	mp = check_mempool_zc(mb_pool) ? mb_pool : NULL;
+	rxq->umem = xdp_umem_configure(mp);
 	if (!rxq->umem) {
 		ret = -ENOMEM;
 		goto err;
@@ -627,15 +675,21 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
 
 	rxq->mb_pool = mb_pool;
 
-	if (xsk_configure(internals, rxq, nb_rx_desc)) {
-		RTE_LOG(ERR, AF_XDP,
-			"Failed to configure xdp socket\n");
+	if (xsk_configure(internals, rxq, nb_rx_desc, mb_pool)) {
+		RTE_LOG(ERR, AF_XDP, "Failed to configure xdp socket\n");
 		ret = -EINVAL;
 		goto err;
 	}
 
 	internals->umem = rxq->umem;
 
+	if (mb_pool == internals->umem->mb_pool)
+		rxq->zc = internals->umem->zc;
+
+	if (rxq->zc)
+		RTE_LOG(INFO, AF_XDP,
+			"zero copy enabled on rx queue %d\n", rx_queue_id);
+
 	dev->data->rx_queues[rx_queue_id] = rxq;
 	return 0;
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* [PATCH v2 6/6] app/testpmd: add mempool flags parameter
  2019-03-19  7:12 ` [PATCH v2 " Xiaolong Ye
                     ` (4 preceding siblings ...)
  2019-03-19  7:12   ` [PATCH v2 5/6] net/af_xdp: enable zero copy Xiaolong Ye
@ 2019-03-19  7:12   ` Xiaolong Ye
  2019-03-19 23:36     ` Jerin Jacob Kollanukkaran
  5 siblings, 1 reply; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-19  7:12 UTC (permalink / raw)
  To: dev; +Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Xiaolong Ye

When creating the rte_mempool, flags can now be parsed from the
command line, so it is possible for testpmd to create an af_xdp
friendly mempool (one that enables zero copy).
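
For instance (a sketch; 65 = 0x41 = MEMPOOL_F_NO_SPREAD |
MEMPOOL_F_PAGE_ALIGN, the latter added in patch 3/6):

    ./build/app/testpmd -c 0xc -n 4 --vdev eth_af_xdp,iface=ens786f1 \
        -- -i --mp-flags=65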

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 app/test-pmd/parameters.c             | 12 ++++++++++++
 app/test-pmd/testpmd.c                | 17 ++++++++++-------
 app/test-pmd/testpmd.h                |  1 +
 doc/guides/testpmd_app_ug/run_app.rst |  4 ++++
 4 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 38b419767..9d5be0007 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -61,6 +61,7 @@ usage(char* progname)
 	       "--tx-first | --stats-period=PERIOD | "
 	       "--coremask=COREMASK --portmask=PORTMASK --numa "
 	       "--mbuf-size= | --total-num-mbufs= | "
+	       "--mp-flags= | "
 	       "--nb-cores= | --nb-ports= | "
 #ifdef RTE_LIBRTE_CMDLINE
 	       "--eth-peers-configfile= | "
@@ -105,6 +106,7 @@ usage(char* progname)
 	printf("  --socket-num=N: set socket from which all memory is allocated "
 	       "in NUMA mode.\n");
 	printf("  --mbuf-size=N: set the data size of mbuf to N bytes.\n");
+	printf("  --mp-flags=N: set the flags used when creating the mbuf memory pool.\n");
 	printf("  --total-num-mbufs=N: set the number of mbufs to be allocated "
 	       "in mbuf pools.\n");
 	printf("  --max-pkt-len=N: set the maximum size of packet to N bytes.\n");
@@ -585,6 +587,7 @@ launch_args_parse(int argc, char** argv)
 		{ "ring-numa-config",           1, 0, 0 },
 		{ "socket-num",			1, 0, 0 },
 		{ "mbuf-size",			1, 0, 0 },
+		{ "mp-flags",			1, 0, 0 },
 		{ "total-num-mbufs",		1, 0, 0 },
 		{ "max-pkt-len",		1, 0, 0 },
 		{ "pkt-filter-mode",            1, 0, 0 },
@@ -811,6 +814,15 @@ launch_args_parse(int argc, char** argv)
 					rte_exit(EXIT_FAILURE,
 						 "mbuf-size should be > 0 and < 65536\n");
 			}
+			if (!strcmp(lgopts[opt_idx].name, "mp-flags")) {
+				n = atoi(optarg);
+				if (n > 0 && n <= 0xFFFF)
+					mp_flags = (uint16_t)n;
+				else
+					rte_exit(EXIT_FAILURE,
+						 "mp-flags should be > 0 and < 65536\n");
+			}
+
 			if (!strcmp(lgopts[opt_idx].name, "total-num-mbufs")) {
 				n = atoi(optarg);
 				if (n > 1024)
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index d9d0c16d4..eb46dfa53 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -195,6 +195,7 @@ uint32_t burst_tx_delay_time = BURST_TX_WAIT_US;
 uint32_t burst_tx_retry_num = BURST_TX_RETRIES;
 
 uint16_t mbuf_data_size = DEFAULT_MBUF_DATA_SIZE; /**< Mbuf data space size. */
+uint16_t mp_flags = 0; /**< Flags used when creating the mempool. */
 uint32_t param_total_num_mbufs = 0;  /**< number of mbufs in all pools - if
                                       * specified on command-line. */
 uint16_t stats_period; /**< Period to show statistics (disabled by default) */
@@ -834,6 +835,7 @@ setup_extmem(uint32_t nb_mbufs, uint32_t mbuf_sz, bool huge)
  */
 static void
 mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
+		 unsigned int flags,
 		 unsigned int socket_id)
 {
 	char pool_name[RTE_MEMPOOL_NAMESIZE];
@@ -853,8 +855,8 @@ mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
 			/* wrapper to rte_mempool_create() */
 			TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
 					rte_mbuf_best_mempool_ops());
-			rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
-				mb_mempool_cache, 0, mbuf_seg_size, socket_id);
+			rte_mp = rte_pktmbuf_pool_create_with_flags(pool_name, nb_mbuf,
+				mb_mempool_cache, 0, mbuf_seg_size, flags, socket_id);
 			break;
 		}
 	case MP_ALLOC_ANON:
@@ -891,8 +893,8 @@ mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
 
 			TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
 					rte_mbuf_best_mempool_ops());
-			rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
-					mb_mempool_cache, 0, mbuf_seg_size,
+			rte_mp = rte_pktmbuf_pool_create_with_flags(pool_name, nb_mbuf,
+					mb_mempool_cache, 0, mbuf_seg_size, flags,
 					heap_socket);
 			break;
 		}
@@ -1128,13 +1130,14 @@ init_config(void)
 
 		for (i = 0; i < num_sockets; i++)
 			mbuf_pool_create(mbuf_data_size, nb_mbuf_per_pool,
-					 socket_ids[i]);
+					 mp_flags, socket_ids[i]);
 	} else {
 		if (socket_num == UMA_NO_CONFIG)
-			mbuf_pool_create(mbuf_data_size, nb_mbuf_per_pool, 0);
+			mbuf_pool_create(mbuf_data_size, nb_mbuf_per_pool,
+					 mp_flags, 0);
 		else
 			mbuf_pool_create(mbuf_data_size, nb_mbuf_per_pool,
-						 socket_num);
+					 mp_flags, socket_num);
 	}
 
 	init_port_config();
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index fa4887853..3ddb70e3e 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -408,6 +408,7 @@ extern uint8_t dcb_config;
 extern uint8_t dcb_test;
 
 extern uint16_t mbuf_data_size; /**< Mbuf data space size. */
+extern uint16_t mp_flags;  /**< flags for mempool creation. */
 extern uint32_t param_total_num_mbufs;
 
 extern uint16_t stats_period;
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index 4495ed038..bafb9c493 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -392,6 +392,10 @@ The commandline options are:
     * xmemhuge: create and populate mempool using externally and anonymously
       allocated hugepage area
 
+*   ``--mp-flags=<N>``
+
+    Set the flags to be used when creating the mbuf mempool.
+
 *   ``--noisy-tx-sw-buffer-size``
 
     Set the number of maximum elements  of the FIFO queue to be created
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 5/6] net/af_xdp: enable zero copy
  2019-03-19  7:12   ` [PATCH v2 5/6] net/af_xdp: enable zero copy Xiaolong Ye
@ 2019-03-19  8:12     ` Mattias Rönnblom
  2019-03-19  8:39       ` Ye Xiaolong
  2019-03-20  9:22     ` David Marchand
  1 sibling, 1 reply; 214+ messages in thread
From: Mattias Rönnblom @ 2019-03-19  8:12 UTC (permalink / raw)
  To: Xiaolong Ye, dev; +Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn

On 2019-03-19 08:12, Xiaolong Ye wrote:
> Try to check if external mempool (from rx_queue_setup) is fit for
> af_xdp, if it is, it will be registered to af_xdp socket directly and
> there will be no packet data copy on Rx and Tx.
> 
> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
> ---
>   drivers/net/af_xdp/rte_eth_af_xdp.c | 128 ++++++++++++++++++++--------
>   1 file changed, 91 insertions(+), 37 deletions(-)
> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index fc60cb5c5..c22791e51 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -62,6 +62,7 @@ struct xsk_umem_info {
>   	struct xsk_umem *umem;
>   	struct rte_mempool *mb_pool;
>   	void *buffer;
> +	uint8_t zc;
>   };
>   
>   struct pkt_rx_queue {
> @@ -76,6 +77,7 @@ struct pkt_rx_queue {
>   
>   	struct pkt_tx_queue *pair;
>   	uint16_t queue_idx;
> +	uint8_t zc;
>   };
>   
>   struct pkt_tx_queue {
> @@ -191,17 +193,24 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>   		uint32_t len = xsk_ring_cons__rx_desc(rx, idx_rx++)->len;
>   		char *pkt = xsk_umem__get_data(rxq->umem->buffer, addr);
>   
> -		mbuf = rte_pktmbuf_alloc(rxq->mb_pool);
> -		if (mbuf) {
> -			memcpy(rte_pktmbuf_mtod(mbuf, void*), pkt, len);
> +		if (rxq->zc) {
> +			mbuf = addr_to_mbuf(rxq->umem, addr);
>   			rte_pktmbuf_pkt_len(mbuf) =
>   				rte_pktmbuf_data_len(mbuf) = len;
> -			rx_bytes += len;
>   			bufs[count++] = mbuf;
>   		} else {
> -			dropped++;
> +			mbuf = rte_pktmbuf_alloc(rxq->mb_pool);
> +			if (mbuf) {

if (likely(mbuf != NULL))

> +				memcpy(rte_pktmbuf_mtod(mbuf, void*), pkt, len);

Use rte_memcpy.

> +				rte_pktmbuf_pkt_len(mbuf) =
> +					rte_pktmbuf_data_len(mbuf) = len;
> +				rx_bytes += len;
> +				bufs[count++] = mbuf;
> +			} else {
> +				dropped++;
> +			}
> +			rte_pktmbuf_free(addr_to_mbuf(umem, addr));
>   		}
> -		rte_pktmbuf_free(addr_to_mbuf(umem, addr));
>   	}
>   
>   	xsk_ring_cons__release(rx, rcvd);
> @@ -285,22 +294,29 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>   					- ETH_AF_XDP_DATA_HEADROOM;
>   		desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i);
>   		mbuf = bufs[i];
> -		if (mbuf->pkt_len <= buf_len) {
> -			mbuf_to_tx = rte_pktmbuf_alloc(umem->mb_pool);
> -			if (!mbuf_to_tx) {
> -				rte_pktmbuf_free(mbuf);
> -				continue;
> -			}
> -			desc->addr = mbuf_to_addr(umem, mbuf_to_tx);
> +		if (txq->pair->zc && mbuf->pool == umem->mb_pool) {
> +			desc->addr = mbuf_to_addr(umem, mbuf);
>   			desc->len = mbuf->pkt_len;
> -			pkt = xsk_umem__get_data(umem->buffer,
> -						 desc->addr);
> -			memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
> -			       desc->len);
>   			valid++;
>   			tx_bytes += mbuf->pkt_len;
> +		} else {
> +			if (mbuf->pkt_len <= buf_len) {
> +				mbuf_to_tx = rte_pktmbuf_alloc(umem->mb_pool);
> +				if (!mbuf_to_tx) {

if (unlikely(mbuf_to_tx == NULL))

See DPDK coding conventions 1.8.1 for how to do pointer comparisons.

> +					rte_pktmbuf_free(mbuf);
> +					continue;
> +				}
> +				desc->addr = mbuf_to_addr(umem, mbuf_to_tx);
> +				desc->len = mbuf->pkt_len;
> +				pkt = xsk_umem__get_data(umem->buffer,
> +							 desc->addr);
> +				memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
> +				       desc->len);

rte_memcpy()

> +				valid++;
> +				tx_bytes += mbuf->pkt_len;
> +			}
> +			rte_pktmbuf_free(mbuf);
>   		}
> -		rte_pktmbuf_free(mbuf);
>   	}
>   
>   	xsk_ring_prod__submit(&txq->tx, nb_pkts);
> @@ -479,7 +495,7 @@ static inline uint64_t get_len(struct rte_mempool *mp)
>   	return (uint64_t)(memhdr->len);
>   }
>   
> -static struct xsk_umem_info *xdp_umem_configure(void)
> +static struct xsk_umem_info *xdp_umem_configure(struct rte_mempool *mb_pool)
>   {
>   	struct xsk_umem_info *umem;
>   	struct xsk_umem_config usr_config = {
> @@ -497,20 +513,25 @@ static struct xsk_umem_info *xdp_umem_configure(void)
>   		return NULL;
>   	}
>   
> -	snprintf(pool_name, 0x100, "af_xdp_ring");
> -	umem->mb_pool = rte_pktmbuf_pool_create_with_flags(pool_name,
> -			ETH_AF_XDP_NUM_BUFFERS,
> -			250, 0,
> -			ETH_AF_XDP_FRAME_SIZE -
> -			ETH_AF_XDP_MBUF_OVERHEAD,
> -			MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN,
> -			SOCKET_ID_ANY);
> -
> -	if (!umem->mb_pool || umem->mb_pool->nb_mem_chunks != 1) {
> -		RTE_LOG(ERR, AF_XDP,
> -			"Failed to create rte_mempool\n");
> -		goto err;
> +	if (!mb_pool) {

1.8.1

> +		snprintf(pool_name, 0x100, "af_xdp_ring");

0x100??

> +		umem->mb_pool = rte_pktmbuf_pool_create_with_flags(pool_name,
> +				ETH_AF_XDP_NUM_BUFFERS,
> +				250, 0,
> +				ETH_AF_XDP_FRAME_SIZE -
> +				ETH_AF_XDP_MBUF_OVERHEAD,
> +				MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN,
> +				SOCKET_ID_ANY);
> +
> +		if (!umem->mb_pool || umem->mb_pool->nb_mem_chunks != 1) {

1.8.1

> +			RTE_LOG(ERR, AF_XDP, "Failed to create rte_mempool\n");
> +			goto err;
> +		}
> +	} else {
> +		umem->mb_pool = mb_pool;
> +		umem->zc = 1;
>   	}
> +
>   	base_addr = (void *)get_base_addr(umem->mb_pool);
>   
>   	ret = xsk_umem__create(&umem->umem, base_addr,
> @@ -531,16 +552,43 @@ static struct xsk_umem_info *xdp_umem_configure(void)
>   	return NULL;
>   }
>   
> +static uint8_t
> +check_mempool_zc(struct rte_mempool *mp)
> +{
> +	RTE_ASSERT(mp);
> +
> +	/* must continues */
> +	if (mp->nb_mem_chunks > 1)
> +		return 0;
> +
> +	/* check header size */
> +	if (mp->header_size != RTE_CACHE_LINE_SIZE)
> +		return 0;
> +
> +	/* check base address */
> +	if ((uint64_t)get_base_addr(mp) % getpagesize() != 0)

This should be an uintptr_t cast.

> +		return 0;
> +
> +	/* check chunk size */
> +	if ((mp->elt_size + mp->header_size + mp->trailer_size) %
> +	    ETH_AF_XDP_FRAME_SIZE != 0)
> +		return 0;
> +
> +	return 1;
> +}
> +
>   static int
>   xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
> -	      int ring_size)
> +	      int ring_size, struct rte_mempool *mb_pool)
>   {
>   	struct xsk_socket_config cfg;
>   	struct pkt_tx_queue *txq = rxq->pair;
> +	struct rte_mempool *mp;
>   	int ret = 0;
>   	int reserve_size;
>   
> -	rxq->umem = xdp_umem_configure();
> +	mp = check_mempool_zc(mb_pool) ? mb_pool : NULL;
> +	rxq->umem = xdp_umem_configure(mp);
>   	if (!rxq->umem) {
>   		ret = -ENOMEM;
>   		goto err;
> @@ -627,15 +675,21 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
>   
>   	rxq->mb_pool = mb_pool;
>   
> -	if (xsk_configure(internals, rxq, nb_rx_desc)) {
> -		RTE_LOG(ERR, AF_XDP,
> -			"Failed to configure xdp socket\n");
> +	if (xsk_configure(internals, rxq, nb_rx_desc, mb_pool)) {
> +		RTE_LOG(ERR, AF_XDP, "Failed to configure xdp socket\n");
>   		ret = -EINVAL;
>   		goto err;
>   	}
>   
>   	internals->umem = rxq->umem;
>   
> +	if (mb_pool == internals->umem->mb_pool)
> +		rxq->zc = internals->umem->zc;
> +
> +	if (rxq->zc)
> +		RTE_LOG(INFO, AF_XDP,
> +			"zero copy enabled on rx queue %d\n", rx_queue_id);
> +
>   	dev->data->rx_queues[rx_queue_id] = rxq;
>   	return 0;
>   
> 

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 5/6] net/af_xdp: enable zero copy
  2019-03-19  8:12     ` Mattias Rönnblom
@ 2019-03-19  8:39       ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-19  8:39 UTC (permalink / raw)
  To: Mattias Rönnblom; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

Hi, Mattias

Thanks for the review.

On 03/19, Mattias Rönnblom wrote:
>On 2019-03-19 08:12, Xiaolong Ye wrote:
>> Try to check if external mempool (from rx_queue_setup) is fit for
>> af_xdp, if it is, it will be registered to af_xdp socket directly and
>> there will be no packet data copy on Rx and Tx.
>> 
>> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
>> ---
>>   drivers/net/af_xdp/rte_eth_af_xdp.c | 128 ++++++++++++++++++++--------
>>   1 file changed, 91 insertions(+), 37 deletions(-)
>> 
>> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
>> index fc60cb5c5..c22791e51 100644
>> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
>> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
>> @@ -62,6 +62,7 @@ struct xsk_umem_info {
>>   	struct xsk_umem *umem;
>>   	struct rte_mempool *mb_pool;
>>   	void *buffer;
>> +	uint8_t zc;
>>   };
>>   struct pkt_rx_queue {
>> @@ -76,6 +77,7 @@ struct pkt_rx_queue {
>>   	struct pkt_tx_queue *pair;
>>   	uint16_t queue_idx;
>> +	uint8_t zc;
>>   };
>>   struct pkt_tx_queue {
>> @@ -191,17 +193,24 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>>   		uint32_t len = xsk_ring_cons__rx_desc(rx, idx_rx++)->len;
>>   		char *pkt = xsk_umem__get_data(rxq->umem->buffer, addr);
>> -		mbuf = rte_pktmbuf_alloc(rxq->mb_pool);
>> -		if (mbuf) {
>> -			memcpy(rte_pktmbuf_mtod(mbuf, void*), pkt, len);
>> +		if (rxq->zc) {
>> +			mbuf = addr_to_mbuf(rxq->umem, addr);
>>   			rte_pktmbuf_pkt_len(mbuf) =
>>   				rte_pktmbuf_data_len(mbuf) = len;
>> -			rx_bytes += len;
>>   			bufs[count++] = mbuf;
>>   		} else {
>> -			dropped++;
>> +			mbuf = rte_pktmbuf_alloc(rxq->mb_pool);
>> +			if (mbuf) {
>
>if (likely(mbuf != NULL))

Got it.

>
>> +				memcpy(rte_pktmbuf_mtod(mbuf, void*), pkt, len);
>
>Use rte_memcpy.

Got it.

>
>> +				rte_pktmbuf_pkt_len(mbuf) =
>> +					rte_pktmbuf_data_len(mbuf) = len;
>> +				rx_bytes += len;
>> +				bufs[count++] = mbuf;
>> +			} else {
>> +				dropped++;
>> +			}
>> +			rte_pktmbuf_free(addr_to_mbuf(umem, addr));
>>   		}
>> -		rte_pktmbuf_free(addr_to_mbuf(umem, addr));
>>   	}
>>   	xsk_ring_cons__release(rx, rcvd);
>> @@ -285,22 +294,29 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>>   					- ETH_AF_XDP_DATA_HEADROOM;
>>   		desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i);
>>   		mbuf = bufs[i];
>> -		if (mbuf->pkt_len <= buf_len) {
>> -			mbuf_to_tx = rte_pktmbuf_alloc(umem->mb_pool);
>> -			if (!mbuf_to_tx) {
>> -				rte_pktmbuf_free(mbuf);
>> -				continue;
>> -			}
>> -			desc->addr = mbuf_to_addr(umem, mbuf_to_tx);
>> +		if (txq->pair->zc && mbuf->pool == umem->mb_pool) {
>> +			desc->addr = mbuf_to_addr(umem, mbuf);
>>   			desc->len = mbuf->pkt_len;
>> -			pkt = xsk_umem__get_data(umem->buffer,
>> -						 desc->addr);
>> -			memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
>> -			       desc->len);
>>   			valid++;
>>   			tx_bytes += mbuf->pkt_len;
>> +		} else {
>> +			if (mbuf->pkt_len <= buf_len) {
>> +				mbuf_to_tx = rte_pktmbuf_alloc(umem->mb_pool);
>> +				if (!mbuf_to_tx) {
>
>if (unlikely(mbuf_to_tx == NULL))
>
>See DPDK coding conventions 1.8.1 for how to do pointer comparisons.

I'll check it and do it correctly in next version.

>
>> +					rte_pktmbuf_free(mbuf);
>> +					continue;
>> +				}
>> +				desc->addr = mbuf_to_addr(umem, mbuf_to_tx);
>> +				desc->len = mbuf->pkt_len;
>> +				pkt = xsk_umem__get_data(umem->buffer,
>> +							 desc->addr);
>> +				memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
>> +				       desc->len);
>
>rte_memcpy()

Got it.

>
>> +				valid++;
>> +				tx_bytes += mbuf->pkt_len;
>> +			}
>> +			rte_pktmbuf_free(mbuf);
>>   		}
>> -		rte_pktmbuf_free(mbuf);
>>   	}
>>   	xsk_ring_prod__submit(&txq->tx, nb_pkts);
>> @@ -479,7 +495,7 @@ static inline uint64_t get_len(struct rte_mempool *mp)
>>   	return (uint64_t)(memhdr->len);
>>   }
>> -static struct xsk_umem_info *xdp_umem_configure(void)
>> +static struct xsk_umem_info *xdp_umem_configure(struct rte_mempool *mb_pool)
>>   {
>>   	struct xsk_umem_info *umem;
>>   	struct xsk_umem_config usr_config = {
>> @@ -497,20 +513,25 @@ static struct xsk_umem_info *xdp_umem_configure(void)
>>   		return NULL;
>>   	}
>> -	snprintf(pool_name, 0x100, "af_xdp_ring");
>> -	umem->mb_pool = rte_pktmbuf_pool_create_with_flags(pool_name,
>> -			ETH_AF_XDP_NUM_BUFFERS,
>> -			250, 0,
>> -			ETH_AF_XDP_FRAME_SIZE -
>> -			ETH_AF_XDP_MBUF_OVERHEAD,
>> -			MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN,
>> -			SOCKET_ID_ANY);
>> -
>> -	if (!umem->mb_pool || umem->mb_pool->nb_mem_chunks != 1) {
>> -		RTE_LOG(ERR, AF_XDP,
>> -			"Failed to create rte_mempool\n");
>> -		goto err;
>> +	if (!mb_pool) {
>
>1.8.1

Got it.

>
>> +		snprintf(pool_name, 0x100, "af_xdp_ring");
>
>0x100??

Will use RTE_MEMPOOL_NAMESIZE instead.

>
>> +		umem->mb_pool = rte_pktmbuf_pool_create_with_flags(pool_name,
>> +				ETH_AF_XDP_NUM_BUFFERS,
>> +				250, 0,
>> +				ETH_AF_XDP_FRAME_SIZE -
>> +				ETH_AF_XDP_MBUF_OVERHEAD,
>> +				MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN,
>> +				SOCKET_ID_ANY);
>> +
>> +		if (!umem->mb_pool || umem->mb_pool->nb_mem_chunks != 1) {
>
>1.8.1

Got it.

>
>> +			RTE_LOG(ERR, AF_XDP, "Failed to create rte_mempool\n");
>> +			goto err;
>> +		}
>> +	} else {
>> +		umem->mb_pool = mb_pool;
>> +		umem->zc = 1;
>>   	}
>> +
>>   	base_addr = (void *)get_base_addr(umem->mb_pool);
>>   	ret = xsk_umem__create(&umem->umem, base_addr,
>> @@ -531,16 +552,43 @@ static struct xsk_umem_info *xdp_umem_configure(void)
>>   	return NULL;
>>   }
>> +static uint8_t
>> +check_mempool_zc(struct rte_mempool *mp)
>> +{
>> +	RTE_ASSERT(mp);
>> +
>> +	/* must continues */
>> +	if (mp->nb_mem_chunks > 1)
>> +		return 0;
>> +
>> +	/* check header size */
>> +	if (mp->header_size != RTE_CACHE_LINE_SIZE)
>> +		return 0;
>> +
>> +	/* check base address */
>> +	if ((uint64_t)get_base_addr(mp) % getpagesize() != 0)
>
>This should be an uintptr_t cast.

Got it.

Thanks,
Xiaolong
>
>> +		return 0;
>> +
>> +	/* check chunk size */
>> +	if ((mp->elt_size + mp->header_size + mp->trailer_size) %
>> +	    ETH_AF_XDP_FRAME_SIZE != 0)
>> +		return 0;
>> +
>> +	return 1;
>> +}
>> +
>>   static int
>>   xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
>> -	      int ring_size)
>> +	      int ring_size, struct rte_mempool *mb_pool)
>>   {
>>   	struct xsk_socket_config cfg;
>>   	struct pkt_tx_queue *txq = rxq->pair;
>> +	struct rte_mempool *mp;
>>   	int ret = 0;
>>   	int reserve_size;
>> -	rxq->umem = xdp_umem_configure();
>> +	mp = check_mempool_zc(mb_pool) ? mb_pool : NULL;
>> +	rxq->umem = xdp_umem_configure(mp);
>>   	if (!rxq->umem) {
>>   		ret = -ENOMEM;
>>   		goto err;
>> @@ -627,15 +675,21 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
>>   	rxq->mb_pool = mb_pool;
>> -	if (xsk_configure(internals, rxq, nb_rx_desc)) {
>> -		RTE_LOG(ERR, AF_XDP,
>> -			"Failed to configure xdp socket\n");
>> +	if (xsk_configure(internals, rxq, nb_rx_desc, mb_pool)) {
>> +		RTE_LOG(ERR, AF_XDP, "Failed to configure xdp socket\n");
>>   		ret = -EINVAL;
>>   		goto err;
>>   	}
>>   	internals->umem = rxq->umem;
>> +	if (mb_pool == internals->umem->mb_pool)
>> +		rxq->zc = internals->umem->zc;
>> +
>> +	if (rxq->zc)
>> +		RTE_LOG(INFO, AF_XDP,
>> +			"zero copy enabled on rx queue %d\n", rx_queue_id);
>> +
>>   	dev->data->rx_queues[rx_queue_id] = rxq;
>>   	return 0;
>> 

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 1/6] net/af_xdp: introduce AF XDP PMD driver
  2019-03-19  7:12   ` [PATCH v2 1/6] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
@ 2019-03-19  9:07     ` Mattias Rönnblom
  2019-03-19  9:49       ` Ye Xiaolong
  2019-03-19 16:14     ` Stephen Hemminger
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 214+ messages in thread
From: Mattias Rönnblom @ 2019-03-19  9:07 UTC (permalink / raw)
  To: Xiaolong Ye, dev; +Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn

On 2019-03-19 08:12, Xiaolong Ye wrote:
> Add a new PMD driver for AF_XDP which is a proposed faster version of
> AF_PACKET interface in Linux. More info about AF_XDP, please refer to [1]
> [2].
> 
> This is the vanilla version PMD which just uses a raw buffer registered as
> the umem.
> 
> [1] https://fosdem.org/2018/schedule/event/af_xdp/
> [2] https://lwn.net/Articles/745934/
> 
> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
> ---
>   MAINTAINERS                                   |   6 +
>   config/common_base                            |   5 +
>   config/common_linux                           |   1 +
>   doc/guides/nics/af_xdp.rst                    |  45 +
>   doc/guides/nics/features/af_xdp.ini           |  11 +
>   doc/guides/nics/index.rst                     |   1 +
>   doc/guides/rel_notes/release_19_05.rst        |   7 +
>   drivers/net/Makefile                          |   1 +
>   drivers/net/af_xdp/Makefile                   |  33 +
>   drivers/net/af_xdp/meson.build                |  21 +
>   drivers/net/af_xdp/rte_eth_af_xdp.c           | 930 ++++++++++++++++++
>   drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   3 +
>   drivers/net/meson.build                       |   1 +
>   mk/rte.app.mk                                 |   1 +
>   14 files changed, 1066 insertions(+)
>   create mode 100644 doc/guides/nics/af_xdp.rst
>   create mode 100644 doc/guides/nics/features/af_xdp.ini
>   create mode 100644 drivers/net/af_xdp/Makefile
>   create mode 100644 drivers/net/af_xdp/meson.build
>   create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
>   create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 452b8eb82..1cc54b439 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -468,6 +468,12 @@ M: John W. Linville <linville@tuxdriver.com>
>   F: drivers/net/af_packet/
>   F: doc/guides/nics/features/afpacket.ini
>   
> +Linux AF_XDP
> +M: Xiaolong Ye <xiaolong.ye@intel.com>
> +M: Qi Zhang <qi.z.zhang@intel.com>
> +F: drivers/net/af_xdp/
> +F: doc/guides/nics/features/af_xdp.rst
> +
>   Amazon ENA
>   M: Marcin Wojtas <mw@semihalf.com>
>   M: Michal Krawczyk <mk@semihalf.com>
> diff --git a/config/common_base b/config/common_base
> index 0b09a9348..4044de205 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -416,6 +416,11 @@ CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX_FREE=n
>   #
>   CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
>   
> +#
> +# Compile software PMD backed by AF_XDP sockets (Linux only)
> +#
> +CONFIG_RTE_LIBRTE_PMD_AF_XDP=n
> +
>   #
>   # Compile link bonding PMD library
>   #
> diff --git a/config/common_linux b/config/common_linux
> index 75334273d..0b1249da0 100644
> --- a/config/common_linux
> +++ b/config/common_linux
> @@ -19,6 +19,7 @@ CONFIG_RTE_LIBRTE_VHOST_POSTCOPY=n
>   CONFIG_RTE_LIBRTE_PMD_VHOST=y
>   CONFIG_RTE_LIBRTE_IFC_PMD=y
>   CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
> +CONFIG_RTE_LIBRTE_PMD_AF_XDP=y
>   CONFIG_RTE_LIBRTE_PMD_SOFTNIC=y
>   CONFIG_RTE_LIBRTE_PMD_TAP=y
>   CONFIG_RTE_LIBRTE_AVP_PMD=y
> diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
> new file mode 100644
> index 000000000..dd5654dd1
> --- /dev/null
> +++ b/doc/guides/nics/af_xdp.rst
> @@ -0,0 +1,45 @@
> +..  SPDX-License-Identifier: BSD-3-Clause
> +    Copyright(c) 2018 Intel Corporation.
> +
> +AF_XDP Poll Mode Driver
> +==========================
> +
> +AF_XDP is an address family that is optimized for high performance
> +packet processing. AF_XDP sockets enable the possibility for XDP program to
> +redirect packets to a memory buffer in userspace.
> +
> +For the full details behind AF_XDP socket, you can refer to
> +`AF_XDP documentation in the Kernel
> +<https://www.kernel.org/doc/Documentation/networking/af_xdp.rst>`_.
> +
> +This Linux-specific PMD driver creates the AF_XDP socket and binds it to a
> +specific netdev queue, it allows a DPDK application to send and receive raw
> +packets through the socket which would bypass the kernel network stack.
> +Current implementation only supports single queue, multi-queues feature will
> +be added later.
> +
> +Options
> +-------
> +
> +The following options can be provided to set up an af_xdp port in DPDK.
> +
> +*   ``iface`` - name of the Kernel interface to attach to (required);
> +*   ``queue`` - netdev queue id (optional, default 0);
> +
> +Prerequisites
> +-------------
> +
> +This is a Linux-specific PMD, thus the following prerequisites apply:
> +
> +*  A Linux Kernel (version > 4.18) with XDP sockets configuration enabled;
> +*  libbpf (within kernel version > 5.1) with latest af_xdp support installed
> +*  A Kernel bound interface to attach to.
> +
> +Set up an af_xdp interface
> +-----------------------------
> +
> +The following example will set up an af_xdp interface in DPDK:
> +
> +.. code-block:: console
> +
> +    --vdev eth_af_xdp,iface=ens786f1,queue=0
> diff --git a/doc/guides/nics/features/af_xdp.ini b/doc/guides/nics/features/af_xdp.ini
> new file mode 100644
> index 000000000..7b8fcce00
> --- /dev/null
> +++ b/doc/guides/nics/features/af_xdp.ini
> @@ -0,0 +1,11 @@
> +;
> +; Supported features of the 'af_xdp' network poll mode driver.
> +;
> +; Refer to default.ini for the full list of available PMD features.
> +;
> +[Features]
> +Link status          = Y
> +MTU update           = Y
> +Promiscuous mode     = Y
> +Stats per queue      = Y
> +x86-64               = Y
> \ No newline at end of file
> diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
> index 5c80e3baa..a4b80a3d0 100644
> --- a/doc/guides/nics/index.rst
> +++ b/doc/guides/nics/index.rst
> @@ -12,6 +12,7 @@ Network Interface Controller Drivers
>       features
>       build_and_test
>       af_packet
> +    af_xdp
>       ark
>       atlantic
>       avp
> diff --git a/doc/guides/rel_notes/release_19_05.rst b/doc/guides/rel_notes/release_19_05.rst
> index 61a2c7383..062facf89 100644
> --- a/doc/guides/rel_notes/release_19_05.rst
> +++ b/doc/guides/rel_notes/release_19_05.rst
> @@ -65,6 +65,13 @@ New Features
>       process.
>     * Added support for Rx packet types list in a secondary process.
>   
> +* **Added the AF_XDP PMD.**
> +
> +  Added a Linux-specific PMD driver for AF_XDP, it can create the AF_XDP socket
> +  and bind it to a specific netdev queue, it allows a DPDK application to send
> +  and receive raw packets through the socket which would bypass the kernel
> +  network stack to achieve high performance packet processing.
> +
>   * **Updated Mellanox drivers.**
>   
>      New features and improvements were done in mlx4 and mlx5 PMDs:
> diff --git a/drivers/net/Makefile b/drivers/net/Makefile
> index 502869a87..5d401b8c5 100644
> --- a/drivers/net/Makefile
> +++ b/drivers/net/Makefile
> @@ -9,6 +9,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD),d)
>   endif
>   
>   DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += af_packet
> +DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += af_xdp
>   DIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark
>   DIRS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD) += atlantic
>   DIRS-$(CONFIG_RTE_LIBRTE_AVP_PMD) += avp
> diff --git a/drivers/net/af_xdp/Makefile b/drivers/net/af_xdp/Makefile
> new file mode 100644
> index 000000000..6cf0ed7db
> --- /dev/null
> +++ b/drivers/net/af_xdp/Makefile
> @@ -0,0 +1,33 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2018 Intel Corporation
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +#
> +# library name
> +#
> +LIB = librte_pmd_af_xdp.a
> +
> +EXPORT_MAP := rte_pmd_af_xdp_version.map
> +
> +LIBABIVER := 1
> +
> +CFLAGS += -O3
> +
> +# require kernel version >= v5.1-rc1
> +LINUX_VERSION := $(shell uname -r)
> +CFLAGS += -I/lib/modules/$(LINUX_VERSION)/build/tools/include
> +CFLAGS += -I/lib/modules/$(LINUX_VERSION)/build/tools/lib/bpf
> +
> +CFLAGS += $(WERROR_FLAGS)
> +LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
> +LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
> +LDLIBS += -lrte_bus_vdev
> +LDLIBS += -lbpf
> +
> +#
> +# all source are stored in SRCS-y
> +#
> +SRCS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += rte_eth_af_xdp.c
> +
> +include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
> new file mode 100644
> index 000000000..635e67483
> --- /dev/null
> +++ b/drivers/net/af_xdp/meson.build
> @@ -0,0 +1,21 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2018 Intel Corporation
> +
> +if host_machine.system() != 'linux'
> +	build = false
> +endif
> +
> +bpf_dep = dependency('libbpf', required: false)
> +if bpf_dep.found()
> +	build = true
> +else
> +	bpf_dep = cc.find_library('libbpf', required: false)
> +	if bpf_dep.found() and cc.has_header('xsk.h', dependencies: bpf_dep)
> +		build = true
> +		pkgconfig_extra_libs += '-lbpf'
> +	else
> +		build = false
> +	endif
> +endif
> +sources = files('rte_eth_af_xdp.c')
> +ext_deps += bpf_dep
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
> new file mode 100644
> index 000000000..96dedc0c4
> --- /dev/null
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -0,0 +1,930 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2019 Intel Corporation.
> + */
> +
> +#include <rte_mbuf.h>
> +#include <rte_ethdev_driver.h>
> +#include <rte_ethdev_vdev.h>
> +#include <rte_malloc.h>
> +#include <rte_kvargs.h>
> +#include <rte_bus_vdev.h>
> +#include <rte_string_fns.h>
> +
> +#include <linux/if_ether.h>
> +#include <linux/if_xdp.h>
> +#include <linux/if_link.h>
> +#include <asm/barrier.h>

Is this include used?

> +#include <arpa/inet.h>
> +#include <net/if.h>
> +#include <sys/types.h>
> +#include <sys/socket.h>
> +#include <sys/ioctl.h>
> +#include <sys/mman.h>
> +#include <unistd.h>
> +#include <poll.h>
> +#include <bpf/bpf.h>
> +#include <xsk.h>
> +
> +#define RTE_LOGTYPE_AF_XDP RTE_LOGTYPE_USER1
> +#ifndef SOL_XDP
> +#define SOL_XDP 283
> +#endif
> +
> +#ifndef AF_XDP
> +#define AF_XDP 44
> +#endif
> +
> +#ifndef PF_XDP
> +#define PF_XDP AF_XDP
> +#endif
> +
> +#define ETH_AF_XDP_IFACE_ARG			"iface"
> +#define ETH_AF_XDP_QUEUE_IDX_ARG		"queue"
> +
> +#define ETH_AF_XDP_FRAME_SIZE		XSK_UMEM__DEFAULT_FRAME_SIZE
> +#define ETH_AF_XDP_NUM_BUFFERS		4096
> +#define ETH_AF_XDP_DATA_HEADROOM	0
> +#define ETH_AF_XDP_DFLT_NUM_DESCS	XSK_RING_CONS__DEFAULT_NUM_DESCS
> +#define ETH_AF_XDP_DFLT_QUEUE_IDX	0
> +
> +#define ETH_AF_XDP_RX_BATCH_SIZE	32
> +#define ETH_AF_XDP_TX_BATCH_SIZE	32
> +
> +#define ETH_AF_XDP_MAX_QUEUE_PAIRS     16
> +
> +struct xsk_umem_info {
> +	struct xsk_ring_prod fq;
> +	struct xsk_ring_cons cq;
> +	struct xsk_umem *umem;
> +	struct rte_ring *buf_ring;
> +	void *buffer;
> +};
> +
> +struct pkt_rx_queue {
> +	struct xsk_ring_cons rx;
> +	struct xsk_umem_info *umem;
> +	struct xsk_socket *xsk;
> +	struct rte_mempool *mb_pool;
> +
> +	uint64_t rx_pkts;
> +	uint64_t rx_bytes;
> +	uint64_t rx_dropped;
> +
> +	struct pkt_tx_queue *pair;
> +	uint16_t queue_idx;
> +};
> +
> +struct pkt_tx_queue {
> +	struct xsk_ring_prod tx;
> +
> +	uint64_t tx_pkts;
> +	uint64_t err_pkts;
> +	uint64_t tx_bytes;
> +
> +	struct pkt_rx_queue *pair;
> +	uint16_t queue_idx;
> +};
> +
> +struct pmd_internals {
> +	int if_index;
> +	char if_name[IFNAMSIZ];
> +	uint16_t queue_idx;
> +	struct ether_addr eth_addr;
> +	struct xsk_umem_info *umem;
> +	struct rte_mempool *mb_pool_share;
> +
> +	struct pkt_rx_queue rx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
> +	struct pkt_tx_queue tx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
> +};
> +
> +static const char * const valid_arguments[] = {
> +	ETH_AF_XDP_IFACE_ARG,
> +	ETH_AF_XDP_QUEUE_IDX_ARG,
> +	NULL
> +};
> +
> +static struct rte_eth_link pmd_link = {
> +	.link_speed = ETH_SPEED_NUM_10G,
> +	.link_duplex = ETH_LINK_FULL_DUPLEX,
> +	.link_status = ETH_LINK_DOWN,
> +	.link_autoneg = ETH_LINK_AUTONEG
> +};
> +
> +static inline int
> +reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
> +{
> +	struct xsk_ring_prod *fq = &umem->fq;
> +	uint32_t idx;
> +	void *addr = NULL;
> +	int i, ret = 0;

No need to initialize 'ret'. Is there a point in setting 'addr'?

> +
> +	ret = xsk_ring_prod__reserve(fq, reserve_size, &idx);
> +	if (!ret) {
> +		RTE_LOG(ERR, AF_XDP, "Failed to reserve enough fq descs.\n");
> +		return ret;
> +	}
> +
> +	for (i = 0; i < reserve_size; i++) {
> +		rte_ring_dequeue(umem->buf_ring, &addr);
> +		*xsk_ring_prod__fill_addr(fq, idx++) = (uint64_t)addr;

Consider introducing a tmp variable to make this more readable.
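
Something like this, perhaps (an untested sketch):

	for (i = 0; i < reserve_size; i++) {
		uint64_t *fq_addr;

		rte_ring_dequeue(umem->buf_ring, &addr);
		fq_addr = xsk_ring_prod__fill_addr(fq, idx++);
		*fq_addr = (uint64_t)addr;
	}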

> +	}
> +
> +	xsk_ring_prod__submit(fq, reserve_size);
> +
> +	return 0;
> +}
> +
> +static uint16_t
> +eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
> +{
> +	struct pkt_rx_queue *rxq = queue;
> +	struct xsk_ring_cons *rx = &rxq->rx;
> +	struct xsk_umem_info *umem = rxq->umem;
> +	struct xsk_ring_prod *fq = &umem->fq;
> +	uint32_t idx_rx;
> +	uint32_t free_thresh = fq->size >> 1;
> +	struct rte_mbuf *mbuf;
> +	unsigned long dropped = 0;
> +	unsigned long rx_bytes = 0;
> +	uint16_t count = 0;
> +	int rcvd, i;
> +
> +	nb_pkts = nb_pkts < ETH_AF_XDP_RX_BATCH_SIZE ?
> +		nb_pkts : ETH_AF_XDP_RX_BATCH_SIZE;
> +
> +	rcvd = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
> +	if (!rcvd)

Since peek returns the number of entries, not a boolean, do:
rcvd == 0

> +		return 0;
> +
> +	if (xsk_prod_nb_free(fq, free_thresh) >= free_thresh)
> +		(void)reserve_fill_queue(umem, ETH_AF_XDP_RX_BATCH_SIZE);
> +
> +	for (i = 0; i < rcvd; i++) {
> +		uint64_t addr = xsk_ring_cons__rx_desc(rx, idx_rx)->addr;
> +		uint32_t len = xsk_ring_cons__rx_desc(rx, idx_rx++)->len;

Use a tmp variable, instead of two calls.
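
E.g., a sketch:

	const struct xdp_desc *desc = xsk_ring_cons__rx_desc(rx, idx_rx++);
	uint64_t addr = desc->addr;
	uint32_t len = desc->len;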

> +		char *pkt = xsk_umem__get_data(rxq->umem->buffer, addr);
> +

Don't mix declarations and code. Why is this a char pointer, as opposed
to void?

> +		mbuf = rte_pktmbuf_alloc(rxq->mb_pool);
> +		if (mbuf) {

1.8.1

> +			memcpy(rte_pktmbuf_mtod(mbuf, void*), pkt, len);

rte_memcpy()

> +			rte_pktmbuf_pkt_len(mbuf) =
> +				rte_pktmbuf_data_len(mbuf) = len;

Consider splitting this into two statements.
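
I.e.:

	rte_pktmbuf_pkt_len(mbuf) = len;
	rte_pktmbuf_data_len(mbuf) = len;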

> +			rx_bytes += len;
> +			bufs[count++] = mbuf;
> +		} else {
> +			dropped++;
> +		}
> +		rte_ring_enqueue(umem->buf_ring, (void *)addr);
> +	}
> +
> +	xsk_ring_cons__release(rx, rcvd);
> +
> +	/* statistics */
> +	rxq->rx_pkts += (rcvd - dropped);
> +	rxq->rx_bytes += rx_bytes;
> +	rxq->rx_dropped += dropped;
> +
> +	return count;
> +}
> +
> +static void pull_umem_cq(struct xsk_umem_info *umem, int size)
> +{
> +	struct xsk_ring_cons *cq = &umem->cq;
> +	int i, n;
> +	uint32_t idx_cq;
> +	uint64_t addr;
> +
> +	n = xsk_ring_cons__peek(cq, size, &idx_cq);

Use size_t for n.

> +	if (n > 0) {
> +		for (i = 0; i < n; i++) {

Consider declaring 'addr' in this scope.
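
E.g., a sketch:

	for (i = 0; i < n; i++) {
		uint64_t addr = *xsk_ring_cons__comp_addr(cq, idx_cq++);

		rte_ring_enqueue(umem->buf_ring, (void *)addr);
	}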

> +			addr = *xsk_ring_cons__comp_addr(cq,
> +							 idx_cq++);
> +			rte_ring_enqueue(umem->buf_ring, (void *)addr);
> +		}
> +
> +		xsk_ring_cons__release(cq, n);
> +	}
> +}
> +
> +static void kick_tx(struct pkt_tx_queue *txq)
> +{
> +	struct xsk_umem_info *umem = txq->pair->umem;
> +	int ret;
> +
> +	while (1) {

for (;;)

> +		ret = sendto(xsk_socket__fd(txq->pair->xsk), NULL, 0,
> +			     MSG_DONTWAIT, NULL, 0);
> +
> +		/* everything is ok */
> +		if (ret >= 0)

Use likely()?

> +			break;
> +
> +		/* something unexpected */
> +		if (errno != EBUSY && errno != EAGAIN)
> +			break;
> +
> +		/* pull from the completion queue to leave more space */
> +		if (errno == EAGAIN)
> +			pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
> +	}
> +	pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
> +}
> +
> +static uint16_t
> +eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
> +{
> +	struct pkt_tx_queue *txq = queue;
> +	struct xsk_umem_info *umem = txq->pair->umem;
> +	struct rte_mbuf *mbuf;
> +	void *addrs[ETH_AF_XDP_TX_BATCH_SIZE];
> +	unsigned long tx_bytes = 0;
> +	int i, valid = 0;
> +	uint32_t idx_tx;
> +
> +	nb_pkts = nb_pkts < ETH_AF_XDP_TX_BATCH_SIZE ?
> +		nb_pkts : ETH_AF_XDP_TX_BATCH_SIZE;

Use RTE_MIN().
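
I.e.:

	nb_pkts = RTE_MIN(nb_pkts, ETH_AF_XDP_TX_BATCH_SIZE);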

> +
> +	pull_umem_cq(umem, nb_pkts);
> +
> +	nb_pkts = rte_ring_dequeue_bulk(umem->buf_ring, addrs,
> +					nb_pkts, NULL);
> +	if (!nb_pkts)

nb_pkts == 0

> +		return 0;
> +
> +	if (xsk_ring_prod__reserve(&txq->tx, nb_pkts, &idx_tx) != nb_pkts) {
> +		kick_tx(txq);
> +		return 0;
> +	}
> +
> +	for (i = 0; i < nb_pkts; i++) {
> +		struct xdp_desc *desc;
> +		char *pkt;

Use void pointer?

> +		unsigned int buf_len = ETH_AF_XDP_FRAME_SIZE
> +					- ETH_AF_XDP_DATA_HEADROOM;

Use uint32_t, as you seem to do elsewhere.

> +		desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i);
> +		mbuf = bufs[i];
> +		if (mbuf->pkt_len <= buf_len) {
> +			desc->addr = (uint64_t)addrs[valid];
> +			desc->len = mbuf->pkt_len;
> +			pkt = xsk_umem__get_data(umem->buffer,
> +						 desc->addr);
> +			memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
> +			       desc->len);

rte_memcpy()

> +			valid++;
> +			tx_bytes += mbuf->pkt_len;
> +		}
> +		rte_pktmbuf_free(mbuf);
> +	}
> +
> +	xsk_ring_prod__submit(&txq->tx, nb_pkts);
> +
> +	kick_tx(txq);
> +
> +	if (valid < nb_pkts)
> +		rte_ring_enqueue_bulk(umem->buf_ring, &addrs[valid],
> +				 nb_pkts - valid, NULL);
> +
> +	txq->err_pkts += nb_pkts - valid;
> +	txq->tx_pkts += valid;
> +	txq->tx_bytes += tx_bytes;
> +
> +	return nb_pkts;
> +}
> +
> +static int
> +eth_dev_start(struct rte_eth_dev *dev)
> +{
> +	dev->data->dev_link.link_status = ETH_LINK_UP;
> +
> +	return 0;
> +}
> +
> +/* This function gets called when the current port gets stopped. */
> +static void
> +eth_dev_stop(struct rte_eth_dev *dev)
> +{
> +	dev->data->dev_link.link_status = ETH_LINK_DOWN;
> +}
> +
> +static int
> +eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
> +{
> +	/* rx/tx must be paired */
> +	if (dev->data->nb_rx_queues != dev->data->nb_tx_queues)
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +
> +static void
> +eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
> +{
> +	struct pmd_internals *internals = dev->data->dev_private;
> +
> +	dev_info->if_index = internals->if_index;
> +	dev_info->max_mac_addrs = 1;
> +	dev_info->max_rx_pktlen = (uint32_t)ETH_FRAME_LEN;
> +	dev_info->max_rx_queues = 1;
> +	dev_info->max_tx_queues = 1;
> +	dev_info->min_rx_bufsize = 0;
> +
> +	dev_info->default_rxportconf.nb_queues = 1;
> +	dev_info->default_txportconf.nb_queues = 1;
> +	dev_info->default_rxportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
> +	dev_info->default_txportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
> +}
> +
> +static int
> +eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
> +{
> +	struct pmd_internals *internals = dev->data->dev_private;
> +	struct xdp_statistics xdp_stats;
> +	struct pkt_rx_queue *rxq;
> +	socklen_t optlen;
> +	int i;
> +
> +	optlen = sizeof(struct xdp_statistics);
> +	for (i = 0; i < dev->data->nb_rx_queues; i++) {
> +		rxq = &internals->rx_queues[i];
> +		stats->q_ipackets[i] = internals->rx_queues[i].rx_pkts;
> +		stats->q_ibytes[i] = internals->rx_queues[i].rx_bytes;
> +
> +		stats->q_opackets[i] = internals->tx_queues[i].tx_pkts;
> +		stats->q_obytes[i] = internals->tx_queues[i].tx_bytes;
> +
> +		stats->ipackets += stats->q_ipackets[i];
> +		stats->ibytes += stats->q_ibytes[i];
> +		stats->imissed += internals->rx_queues[i].rx_dropped;
> +		getsockopt(xsk_socket__fd(rxq->xsk), SOL_XDP, XDP_STATISTICS,
> +				&xdp_stats, &optlen);
> +		stats->imissed += xdp_stats.rx_dropped;
> +
> +		stats->opackets += stats->q_opackets[i];
> +		stats->oerrors += stats->q_errors[i];
> +		stats->oerrors += internals->tx_queues[i].err_pkts;
> +		stats->obytes += stats->q_obytes[i];
> +	}
> +
> +	return 0;
> +}
> +
> +static void
> +eth_stats_reset(struct rte_eth_dev *dev)
> +{
> +	struct pmd_internals *internals = dev->data->dev_private;
> +	int i;
> +
> +	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
> +		internals->rx_queues[i].rx_pkts = 0;
> +		internals->rx_queues[i].rx_bytes = 0;
> +		internals->rx_queues[i].rx_dropped = 0;
> +
> +		internals->tx_queues[i].tx_pkts = 0;
> +		internals->tx_queues[i].err_pkts = 0;
> +		internals->tx_queues[i].tx_bytes = 0;
> +	}
> +}
> +
> +static void remove_xdp_program(struct pmd_internals *internals)
> +{
> +	uint32_t curr_prog_id = 0;
> +
> +	if (bpf_get_link_xdp_id(internals->if_index, &curr_prog_id,
> +				XDP_FLAGS_UPDATE_IF_NOEXIST)) {
> +		RTE_LOG(ERR, AF_XDP, "bpf_get_link_xdp_id failed\n");
> +		return;
> +	}
> +	bpf_set_link_xdp_fd(internals->if_index, -1,
> +			XDP_FLAGS_UPDATE_IF_NOEXIST);
> +}
> +
> +static void
> +eth_dev_close(struct rte_eth_dev *dev)
> +{
> +	struct pmd_internals *internals = dev->data->dev_private;
> +	struct pkt_rx_queue *rxq;
> +	int i;
> +
> +	RTE_LOG(INFO, AF_XDP, "Closing AF_XDP ethdev on numa socket %u\n",
> +		rte_socket_id());
> +
> +	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
> +		rxq = &internals->rx_queues[i];
> +		if (!rxq->umem)
> +			break;
> +		xsk_socket__delete(rxq->xsk);
> +	}
> +
> +	(void)xsk_umem__delete(internals->umem->umem);
> +	remove_xdp_program(internals);
> +}
> +
> +static void
> +eth_queue_release(void *q __rte_unused)
> +{
> +}
> +
> +static int
> +eth_link_update(struct rte_eth_dev *dev __rte_unused,
> +		int wait_to_complete __rte_unused)
> +{
> +	return 0;
> +}
> +
> +static void xdp_umem_destroy(struct xsk_umem_info *umem)
> +{
> +	free(umem->buffer);
> +	umem->buffer = NULL;
> +
> +	rte_ring_free(umem->buf_ring);
> +	umem->buf_ring = NULL;
> +
> +	free(umem);
> +	umem = NULL;
> +}
> +
> +static struct xsk_umem_info *xdp_umem_configure(void)
> +{
> +	struct xsk_umem_info *umem;
> +	struct xsk_umem_config usr_config = {
> +		.fill_size = ETH_AF_XDP_DFLT_NUM_DESCS,
> +		.comp_size = ETH_AF_XDP_DFLT_NUM_DESCS,
> +		.frame_size = ETH_AF_XDP_FRAME_SIZE,
> +		.frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
> +	void *bufs = NULL;
> +	char ring_name[0x100];
> +	int ret;
> +	uint64_t i;
> +
> +	umem = calloc(1, sizeof(*umem));
> +	if (!umem) {

1.8.1

> +		RTE_LOG(ERR, AF_XDP, "Failed to allocate umem info");
> +		return NULL;
> +	}
> +
> +	snprintf(ring_name, 0x100, "af_xdp_ring");

Again the magical 0x100.
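
Consider sizeof() instead, e.g.:

	snprintf(ring_name, sizeof(ring_name), "af_xdp_ring");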

> +	umem->buf_ring = rte_ring_create(ring_name,
> +					 ETH_AF_XDP_NUM_BUFFERS,
> +					 SOCKET_ID_ANY,
> +					 0x0);
> +	if (!umem->buf_ring) {

1.8.1

> +		RTE_LOG(ERR, AF_XDP,
> +			"Failed to create rte_ring\n");
> +		goto err;
> +	}
> +
> +	for (i = 0; i < ETH_AF_XDP_NUM_BUFFERS; i++)
> +		rte_ring_enqueue(umem->buf_ring,
> +				 (void *)(i * ETH_AF_XDP_FRAME_SIZE +
> +					  ETH_AF_XDP_DATA_HEADROOM));
> +
> +	if (posix_memalign(&bufs, getpagesize(),
> +			   ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE)) {
> +		RTE_LOG(ERR, AF_XDP, "Failed to allocate memory pool.\n");
> +		goto err;
> +	}
> +	ret = xsk_umem__create(&umem->umem, bufs,
> +			       ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
> +			       &umem->fq, &umem->cq,
> +			       &usr_config);
> +
> +	if (ret) {
> +		RTE_LOG(ERR, AF_XDP, "Failed to create umem");
> +		goto err;
> +	}
> +	umem->buffer = bufs;
> +
> +	return umem;
> +
> +err:
> +	xdp_umem_destroy(umem);
> +	return NULL;
> +}
> +
> +static int
> +xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
> +	      int ring_size)
> +{
> +	struct xsk_socket_config cfg;
> +	struct pkt_tx_queue *txq = rxq->pair;
> +	int ret = 0;
> +	int reserve_size;
> +
> +	rxq->umem = xdp_umem_configure();
> +	if (!rxq->umem) {

1.8.1

> +		ret = -ENOMEM;
> +		goto err;
> +	}
> +
> +	cfg.rx_size = ring_size;
> +	cfg.tx_size = ring_size;
> +	cfg.libbpf_flags = 0;
> +	cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
> +	cfg.bind_flags = 0;
> +	ret = xsk_socket__create(&rxq->xsk, internals->if_name,
> +			internals->queue_idx, rxq->umem->umem, &rxq->rx,
> +			&txq->tx, &cfg);
> +	if (ret) {
> +		RTE_LOG(ERR, AF_XDP, "Failed to create xsk socket.\n");
> +		goto err;
> +	}
> +
> +	reserve_size = ETH_AF_XDP_DFLT_NUM_DESCS / 2;
> +	ret = reserve_fill_queue(rxq->umem, reserve_size);
> +	if (ret) {
> +		RTE_LOG(ERR, AF_XDP, "Failed to reserve fill queue.\n");
> +		goto err;
> +	}
> +
> +	return 0;
> +
> +err:
> +	xdp_umem_destroy(rxq->umem);
> +
> +	return ret;
> +}
> +
> +static void
> +queue_reset(struct pmd_internals *internals, uint16_t queue_idx)
> +{
> +	struct pkt_rx_queue *rxq = &internals->rx_queues[queue_idx];
> +	struct pkt_tx_queue *txq = rxq->pair;
> +	int xsk_fd = xsk_socket__fd(rxq->xsk);
> +
> +	if (xsk_fd) {
> +		close(xsk_fd);
> +		if (internals->umem) {

1.8.1

> +			xdp_umem_destroy(internals->umem);
> +			internals->umem = NULL;
> +		}
> +	}
> +	memset(rxq, 0, sizeof(*rxq));
> +	memset(txq, 0, sizeof(*txq));
> +	rxq->pair = txq;
> +	txq->pair = rxq;
> +	rxq->queue_idx = queue_idx;
> +	txq->queue_idx = queue_idx;
> +}
> +
> +static int
> +eth_rx_queue_setup(struct rte_eth_dev *dev,
> +		   uint16_t rx_queue_id,
> +		   uint16_t nb_rx_desc,
> +		   unsigned int socket_id __rte_unused,
> +		   const struct rte_eth_rxconf *rx_conf __rte_unused,
> +		   struct rte_mempool *mb_pool)
> +{
> +	struct pmd_internals *internals = dev->data->dev_private;
> +	unsigned int buf_size, data_size;

uint32_t

> +	struct pkt_rx_queue *rxq;
> +	int ret = 0;

No need to set 'ret'. Alternatively, restructure so you always return 'ret'.

> +
> +	rxq = &internals->rx_queues[rx_queue_id];
> +	queue_reset(internals, rx_queue_id);
> +
> +	/* Now get the space available for data in the mbuf */
> +	buf_size = rte_pktmbuf_data_room_size(mb_pool) -
> +		RTE_PKTMBUF_HEADROOM;
> +	data_size = ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_DATA_HEADROOM;
> +
> +	if (data_size > buf_size) {
> +		RTE_LOG(ERR, AF_XDP,
> +			"%s: %d bytes will not fit in mbuf (%d bytes)\n",
> +			dev->device->name, data_size, buf_size);
> +		ret = -ENOMEM;
> +		goto err;
> +	}
> +
> +	rxq->mb_pool = mb_pool;
> +
> +	if (xsk_configure(internals, rxq, nb_rx_desc)) {
> +		RTE_LOG(ERR, AF_XDP,
> +			"Failed to configure xdp socket\n");
> +		ret = -EINVAL;
> +		goto err;
> +	}
> +
> +	internals->umem = rxq->umem;
> +
> +	dev->data->rx_queues[rx_queue_id] = rxq;
> +	return 0;
> +
> +err:
> +	queue_reset(internals, rx_queue_id);
> +	return ret;
> +}
> +
> +static int
> +eth_tx_queue_setup(struct rte_eth_dev *dev,
> +		   uint16_t tx_queue_id,
> +		   uint16_t nb_tx_desc __rte_unused,
> +		   unsigned int socket_id __rte_unused,
> +		   const struct rte_eth_txconf *tx_conf __rte_unused)
> +{
> +	struct pmd_internals *internals = dev->data->dev_private;
> +	struct pkt_tx_queue *txq;
> +
> +	txq = &internals->tx_queues[tx_queue_id];
> +
> +	dev->data->tx_queues[tx_queue_id] = txq;
> +	return 0;
> +}
> +
> +static int
> +eth_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
> +{
> +	struct pmd_internals *internals = dev->data->dev_private;
> +	struct ifreq ifr = { .ifr_mtu = mtu };
> +	int ret;
> +	int s;
> +
> +	s = socket(PF_INET, SOCK_DGRAM, 0);
> +	if (s < 0)
> +		return -EINVAL;
> +
> +	strlcpy(ifr.ifr_name, internals->if_name, IFNAMSIZ);
> +	ret = ioctl(s, SIOCSIFMTU, &ifr);
> +	close(s);
> +
> +	if (ret < 0)
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +
> +static void
> +eth_dev_change_flags(char *if_name, uint32_t flags, uint32_t mask)
> +{
> +	struct ifreq ifr;
> +	int s;
> +
> +	s = socket(PF_INET, SOCK_DGRAM, 0);
> +	if (s < 0)
> +		return;
> +
> +	strlcpy(ifr.ifr_name, if_name, IFNAMSIZ);
> +	if (ioctl(s, SIOCGIFFLAGS, &ifr) < 0)
> +		goto out;
> +	ifr.ifr_flags &= mask;
> +	ifr.ifr_flags |= flags;
> +	if (ioctl(s, SIOCSIFFLAGS, &ifr) < 0)
> +		goto out;
> +out:
> +	close(s);
> +}
> +
> +static void
> +eth_dev_promiscuous_enable(struct rte_eth_dev *dev)
> +{
> +	struct pmd_internals *internals = dev->data->dev_private;
> +
> +	eth_dev_change_flags(internals->if_name, IFF_PROMISC, ~0);
> +}
> +
> +static void
> +eth_dev_promiscuous_disable(struct rte_eth_dev *dev)
> +{
> +	struct pmd_internals *internals = dev->data->dev_private;
> +
> +	eth_dev_change_flags(internals->if_name, 0, ~IFF_PROMISC);
> +}
> +
> +static const struct eth_dev_ops ops = {
> +	.dev_start = eth_dev_start,
> +	.dev_stop = eth_dev_stop,
> +	.dev_close = eth_dev_close,
> +	.dev_configure = eth_dev_configure,
> +	.dev_infos_get = eth_dev_info,
> +	.mtu_set = eth_dev_mtu_set,
> +	.promiscuous_enable = eth_dev_promiscuous_enable,
> +	.promiscuous_disable = eth_dev_promiscuous_disable,
> +	.rx_queue_setup = eth_rx_queue_setup,
> +	.tx_queue_setup = eth_tx_queue_setup,
> +	.rx_queue_release = eth_queue_release,
> +	.tx_queue_release = eth_queue_release,
> +	.link_update = eth_link_update,
> +	.stats_get = eth_stats_get,
> +	.stats_reset = eth_stats_reset,
> +};
> +
> +/** parse integer from integer argument */
> +static int
> +parse_integer_arg(const char *key __rte_unused,
> +		  const char *value, void *extra_args)
> +{
> +	int *i = (int *)extra_args;
> +
> +	*i = atoi(value);

Use strtol().
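
Something along these lines, maybe (an untested sketch; needs <limits.h>
for INT_MAX), which also rejects empty strings and trailing garbage:

	char *end;
	long n = strtol(value, &end, 10);

	if (end == value || *end != '\0' || n < 0 || n > INT_MAX)
		return -EINVAL;
	*i = (int)n;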

> +	if (*i < 0) {
> +		RTE_LOG(ERR, AF_XDP, "Argument has to be positive.\n");
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +/** parse name argument */
> +static int
> +parse_name_arg(const char *key __rte_unused,
> +	       const char *value, void *extra_args)
> +{
> +	char *name = extra_args;
> +
> +	if (strlen(value) > IFNAMSIZ) {

The buffer is IFNAMSIZ bytes (which it should be), so it can't hold a 
string with strlen() == IFNAMSIZ.
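
I.e., something like:

	if (strlen(value) >= IFNAMSIZ) {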

> +		RTE_LOG(ERR, AF_XDP, "Invalid name %s, should be less than "
> +			"%u bytes.\n", value, IFNAMSIZ);
> +		return -EINVAL;
> +	}
> +
> +	strlcpy(name, value, IFNAMSIZ);
> +
> +	return 0;
> +}
> +
> +static int
> +parse_parameters(struct rte_kvargs *kvlist,
> +		 char *if_name,
> +		 int *queue_idx)
> +{
> +	int ret = 0;

Should not be initialized.

> +
> +	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_IFACE_ARG,
> +				 &parse_name_arg, if_name);
> +	if (ret < 0)
> +		goto free_kvlist;
> +
> +	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_QUEUE_IDX_ARG,
> +				 &parse_integer_arg, queue_idx);
> +	if (ret < 0)
> +		goto free_kvlist;

I fail to see the point of this goto, but maybe there's more code to 
follow in future patches.

> +
> +free_kvlist:
> +	rte_kvargs_free(kvlist);
> +	return ret;
> +}
> +
> +static int
> +get_iface_info(const char *if_name,
> +	       struct ether_addr *eth_addr,
> +	       int *if_index)
> +{
> +	struct ifreq ifr;
> +	int sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_IP);
> +
> +	if (sock < 0)
> +		return -1;
> +
> +	strlcpy(ifr.ifr_name, if_name, IFNAMSIZ);
> +	if (ioctl(sock, SIOCGIFINDEX, &ifr))
> +		goto error;
> +
> +	if (ioctl(sock, SIOCGIFHWADDR, &ifr))
> +		goto error;
> +
> +	rte_memcpy(eth_addr, ifr.ifr_hwaddr.sa_data, ETHER_ADDR_LEN);
> +
> +	close(sock);
> +	*if_index = if_nametoindex(if_name);
> +	return 0;
> +
> +error:
> +	close(sock);
> +	return -1;
> +}
> +
> +static int
> +init_internals(struct rte_vdev_device *dev,
> +	       const char *if_name,
> +	       int queue_idx,
> +	       struct rte_eth_dev **eth_dev)
> +{
> +	const char *name = rte_vdev_device_name(dev);
> +	const unsigned int numa_node = dev->device.numa_node;
> +	struct pmd_internals *internals = NULL;
> +	int ret;
> +	int i;
> +
> +	internals = rte_zmalloc_socket(name, sizeof(*internals), 0, numa_node);
> +	if (!internals)

1.8.1

> +		return -ENOMEM;
> +
> +	internals->queue_idx = queue_idx;
> +	strlcpy(internals->if_name, if_name, IFNAMSIZ);
> +
> +	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
> +		internals->tx_queues[i].pair = &internals->rx_queues[i];
> +		internals->rx_queues[i].pair = &internals->tx_queues[i];
> +	}
> +
> +	ret = get_iface_info(if_name, &internals->eth_addr,
> +			     &internals->if_index);
> +	if (ret)
> +		goto err;
> +
> +	*eth_dev = rte_eth_vdev_allocate(dev, 0);
> +	if (!*eth_dev)

1.8.1

> +		goto err;
> +
> +	(*eth_dev)->data->dev_private = internals;
> +	(*eth_dev)->data->dev_link = pmd_link;
> +	(*eth_dev)->data->mac_addrs = &internals->eth_addr;
> +	(*eth_dev)->dev_ops = &ops;
> +	(*eth_dev)->rx_pkt_burst = eth_af_xdp_rx;
> +	(*eth_dev)->tx_pkt_burst = eth_af_xdp_tx;
> +
> +	return 0;
> +
> +err:
> +	rte_free(internals);
> +	return -1;
> +}
> +
> +static int
> +rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
> +{
> +	struct rte_kvargs *kvlist;
> +	char if_name[IFNAMSIZ];
> +	int xsk_queue_idx = ETH_AF_XDP_DFLT_QUEUE_IDX;
> +	struct rte_eth_dev *eth_dev = NULL;
> +	const char *name;
> +	int ret;
> +
> +	RTE_LOG(INFO, AF_XDP, "Initializing pmd_af_xdp for %s\n",
> +		rte_vdev_device_name(dev));
> +
> +	name = rte_vdev_device_name(dev);
> +	if (rte_eal_process_type() == RTE_PROC_SECONDARY &&
> +		strlen(rte_vdev_device_args(dev)) == 0) {
> +		eth_dev = rte_eth_dev_attach_secondary(name);
> +		if (!eth_dev) {
> +			RTE_LOG(ERR, AF_XDP, "Failed to probe %s\n", name);
> +			return -EINVAL;
> +		}
> +		eth_dev->dev_ops = &ops;
> +		rte_eth_dev_probing_finish(eth_dev);
> +		return 0;
> +	}
> +
> +	kvlist = rte_kvargs_parse(rte_vdev_device_args(dev), valid_arguments);
> +	if (!kvlist) {
> +		RTE_LOG(ERR, AF_XDP, "Invalid kvargs key\n");
> +		return -EINVAL;
> +	}
> +
> +	if (dev->device.numa_node == SOCKET_ID_ANY)
> +		dev->device.numa_node = rte_socket_id();
> +
> +	if (parse_parameters(kvlist, if_name, &xsk_queue_idx) < 0) {
> +		RTE_LOG(ERR, AF_XDP, "Invalid kvargs value\n");
> +		return -EINVAL;
> +	}
> +
> +	ret = init_internals(dev, if_name, xsk_queue_idx, &eth_dev);
> +	if (ret) {
> +		RTE_LOG(ERR, AF_XDP, "Failed to init internals\n");
> +		return ret;
> +	}
> +
> +	rte_eth_dev_probing_finish(eth_dev);
> +
> +	return 0;
> +}
> +
> +static int
> +rte_pmd_af_xdp_remove(struct rte_vdev_device *dev)
> +{
> +	struct rte_eth_dev *eth_dev = NULL;
> +	struct pmd_internals *internals;
> +
> +	RTE_LOG(INFO, AF_XDP, "Removing AF_XDP ethdev on numa socket %u\n",
> +		rte_socket_id());
> +
> +	if (!dev)
> +		return -1;
> +
> +	/* find the ethdev entry */
> +	eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(dev));
> +	if (!eth_dev)
> +		return -1;
> +
> +	internals = eth_dev->data->dev_private;
> +
> +	rte_ring_free(internals->umem->buf_ring);
> +	rte_free(internals->umem->buffer);
> +	rte_free(internals->umem);
> +
> +	rte_eth_dev_release_port(eth_dev);
> +
> +
> +	return 0;
> +}
> +
> +static struct rte_vdev_driver pmd_af_xdp_drv = {
> +	.probe = rte_pmd_af_xdp_probe,
> +	.remove = rte_pmd_af_xdp_remove,
> +};
> +
> +RTE_PMD_REGISTER_VDEV(eth_af_xdp, pmd_af_xdp_drv);
> +RTE_PMD_REGISTER_PARAM_STRING(eth_af_xdp,
> +			      "iface=<string> "
> +			      "queue=<int> ");
> diff --git a/drivers/net/af_xdp/rte_pmd_af_xdp_version.map b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
> new file mode 100644
> index 000000000..c6db030fe
> --- /dev/null
> +++ b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
> @@ -0,0 +1,3 @@
> +DPDK_19.05 {
> +	local: *;
> +};
> diff --git a/drivers/net/meson.build b/drivers/net/meson.build
> index 3ecc78cee..1105e72d8 100644
> --- a/drivers/net/meson.build
> +++ b/drivers/net/meson.build
> @@ -2,6 +2,7 @@
>   # Copyright(c) 2017 Intel Corporation
>   
>   drivers = ['af_packet',
> +	'af_xdp',
>   	'ark',
>   	'atlantic',
>   	'avp',
> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> index 262132fc6..be0af73cc 100644
> --- a/mk/rte.app.mk
> +++ b/mk/rte.app.mk
> @@ -143,6 +143,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_DPAA2_MEMPOOL)  += -lrte_mempool_dpaa2
>   endif
>   
>   _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
> +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     += -lrte_pmd_af_xdp -lelf -lbpf
>   _LDLIBS-$(CONFIG_RTE_LIBRTE_ARK_PMD)        += -lrte_pmd_ark
>   _LDLIBS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD)   += -lrte_pmd_atlantic
>   _LDLIBS-$(CONFIG_RTE_LIBRTE_AVP_PMD)        += -lrte_pmd_avp
> 


* Re: [PATCH v2 1/6] net/af_xdp: introduce AF XDP PMD driver
  2019-03-19  9:07     ` Mattias Rönnblom
@ 2019-03-19  9:49       ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-19  9:49 UTC (permalink / raw)
  To: Mattias Rönnblom; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

Hi, Mattias

Thanks for your comments.

On 03/19, Mattias Rönnblom wrote:
>On 2019-03-19 08:12, Xiaolong Ye wrote:
>> Add a new PMD driver for AF_XDP which is a proposed faster version of
>> AF_PACKET interface in Linux. More info about AF_XDP, please refer to [1]
>> [2].
>> 
>> This is the vanilla version PMD which just uses a raw buffer registered as
>> the umem.
>> 
>> [1] https://fosdem.org/2018/schedule/event/af_xdp/
>> [2] https://lwn.net/Articles/745934/
>> 
>> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
>> ---
>>   MAINTAINERS                                   |   6 +
>>   config/common_base                            |   5 +
>>   config/common_linux                           |   1 +
>>   doc/guides/nics/af_xdp.rst                    |  45 +
>>   doc/guides/nics/features/af_xdp.ini           |  11 +
>>   doc/guides/nics/index.rst                     |   1 +
>>   doc/guides/rel_notes/release_19_05.rst        |   7 +
>>   drivers/net/Makefile                          |   1 +
>>   drivers/net/af_xdp/Makefile                   |  33 +
>>   drivers/net/af_xdp/meson.build                |  21 +
>>   drivers/net/af_xdp/rte_eth_af_xdp.c           | 930 ++++++++++++++++++
>>   drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   3 +
>>   drivers/net/meson.build                       |   1 +
>>   mk/rte.app.mk                                 |   1 +
>>   14 files changed, 1066 insertions(+)
>>   create mode 100644 doc/guides/nics/af_xdp.rst
>>   create mode 100644 doc/guides/nics/features/af_xdp.ini
>>   create mode 100644 drivers/net/af_xdp/Makefile
>>   create mode 100644 drivers/net/af_xdp/meson.build
>>   create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
>>   create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map
>> 
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 452b8eb82..1cc54b439 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -468,6 +468,12 @@ M: John W. Linville <linville@tuxdriver.com>
>>   F: drivers/net/af_packet/
>>   F: doc/guides/nics/features/afpacket.ini
>> +Linux AF_XDP
>> +M: Xiaolong Ye <xiaolong.ye@intel.com>
>> +M: Qi Zhang <qi.z.zhang@intel.com>
>> +F: drivers/net/af_xdp/
>> +F: doc/guides/nics/features/af_xdp.rst
>> +
>>   Amazon ENA
>>   M: Marcin Wojtas <mw@semihalf.com>
>>   M: Michal Krawczyk <mk@semihalf.com>
>> diff --git a/config/common_base b/config/common_base
>> index 0b09a9348..4044de205 100644
>> --- a/config/common_base
>> +++ b/config/common_base
>> @@ -416,6 +416,11 @@ CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX_FREE=n
>>   #
>>   CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
>> +#
>> +# Compile software PMD backed by AF_XDP sockets (Linux only)
>> +#
>> +CONFIG_RTE_LIBRTE_PMD_AF_XDP=n
>> +
>>   #
>>   # Compile link bonding PMD library
>>   #
>> diff --git a/config/common_linux b/config/common_linux
>> index 75334273d..0b1249da0 100644
>> --- a/config/common_linux
>> +++ b/config/common_linux
>> @@ -19,6 +19,7 @@ CONFIG_RTE_LIBRTE_VHOST_POSTCOPY=n
>>   CONFIG_RTE_LIBRTE_PMD_VHOST=y
>>   CONFIG_RTE_LIBRTE_IFC_PMD=y
>>   CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
>> +CONFIG_RTE_LIBRTE_PMD_AF_XDP=y
>>   CONFIG_RTE_LIBRTE_PMD_SOFTNIC=y
>>   CONFIG_RTE_LIBRTE_PMD_TAP=y
>>   CONFIG_RTE_LIBRTE_AVP_PMD=y
>> diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
>> new file mode 100644
>> index 000000000..dd5654dd1
>> --- /dev/null
>> +++ b/doc/guides/nics/af_xdp.rst
>> @@ -0,0 +1,45 @@
>> +..  SPDX-License-Identifier: BSD-3-Clause
>> +    Copyright(c) 2018 Intel Corporation.
>> +
>> +AF_XDP Poll Mode Driver
>> +==========================
>> +
>> +AF_XDP is an address family that is optimized for high performance
>> +packet processing. AF_XDP sockets enable an XDP program to redirect
>> +packets to a memory buffer in userspace.
>> +
>> +For the full details behind AF_XDP socket, you can refer to
>> +`AF_XDP documentation in the Kernel
>> +<https://www.kernel.org/doc/Documentation/networking/af_xdp.rst>`_.
>> +
>> +This Linux-specific PMD driver creates the AF_XDP socket and binds it to a
>> +specific netdev queue. It allows a DPDK application to send and receive raw
>> +packets through the socket, bypassing the kernel network stack.
>> +The current implementation only supports a single queue; multi-queue support
>> +will be added later.
>> +
>> +Options
>> +-------
>> +
>> +The following options can be provided to set up an af_xdp port in DPDK.
>> +
>> +*   ``iface`` - name of the Kernel interface to attach to (required);
>> +*   ``queue`` - netdev queue id (optional, default 0);
>> +
>> +Prerequisites
>> +-------------
>> +
>> +This is a Linux-specific PMD, thus the following prerequisites apply:
>> +
>> +*  A Linux Kernel (version > 4.18) with XDP sockets configuration enabled;
>> +*  libbpf (within kernel version > 5.1) with latest af_xdp support installed
>> +*  A Kernel bound interface to attach to.
>> +
>> +Set up an af_xdp interface
>> +-----------------------------
>> +
>> +The following example will set up an af_xdp interface in DPDK:
>> +
>> +.. code-block:: console
>> +
>> +    --vdev eth_af_xdp,iface=ens786f1,queue=0
>> diff --git a/doc/guides/nics/features/af_xdp.ini b/doc/guides/nics/features/af_xdp.ini
>> new file mode 100644
>> index 000000000..7b8fcce00
>> --- /dev/null
>> +++ b/doc/guides/nics/features/af_xdp.ini
>> @@ -0,0 +1,11 @@
>> +;
>> +; Supported features of the 'af_xdp' network poll mode driver.
>> +;
>> +; Refer to default.ini for the full list of available PMD features.
>> +;
>> +[Features]
>> +Link status          = Y
>> +MTU update           = Y
>> +Promiscuous mode     = Y
>> +Stats per queue      = Y
>> +x86-64               = Y
>> \ No newline at end of file
>> diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
>> index 5c80e3baa..a4b80a3d0 100644
>> --- a/doc/guides/nics/index.rst
>> +++ b/doc/guides/nics/index.rst
>> @@ -12,6 +12,7 @@ Network Interface Controller Drivers
>>       features
>>       build_and_test
>>       af_packet
>> +    af_xdp
>>       ark
>>       atlantic
>>       avp
>> diff --git a/doc/guides/rel_notes/release_19_05.rst b/doc/guides/rel_notes/release_19_05.rst
>> index 61a2c7383..062facf89 100644
>> --- a/doc/guides/rel_notes/release_19_05.rst
>> +++ b/doc/guides/rel_notes/release_19_05.rst
>> @@ -65,6 +65,13 @@ New Features
>>       process.
>>     * Added support for Rx packet types list in a secondary process.
>> +* **Added the AF_XDP PMD.**
>> +
>> +  Added a Linux-specific PMD driver for AF_XDP. It creates the AF_XDP socket
>> +  and binds it to a specific netdev queue, allowing a DPDK application to
>> +  send and receive raw packets through the socket, bypassing the kernel
>> +  network stack to achieve high-performance packet processing.
>> +
>>   * **Updated Mellanox drivers.**
>>      New features and improvements were done in mlx4 and mlx5 PMDs:
>> diff --git a/drivers/net/Makefile b/drivers/net/Makefile
>> index 502869a87..5d401b8c5 100644
>> --- a/drivers/net/Makefile
>> +++ b/drivers/net/Makefile
>> @@ -9,6 +9,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD),d)
>>   endif
>>   DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += af_packet
>> +DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += af_xdp
>>   DIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark
>>   DIRS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD) += atlantic
>>   DIRS-$(CONFIG_RTE_LIBRTE_AVP_PMD) += avp
>> diff --git a/drivers/net/af_xdp/Makefile b/drivers/net/af_xdp/Makefile
>> new file mode 100644
>> index 000000000..6cf0ed7db
>> --- /dev/null
>> +++ b/drivers/net/af_xdp/Makefile
>> @@ -0,0 +1,33 @@
>> +# SPDX-License-Identifier: BSD-3-Clause
>> +# Copyright(c) 2018 Intel Corporation
>> +
>> +include $(RTE_SDK)/mk/rte.vars.mk
>> +
>> +#
>> +# library name
>> +#
>> +LIB = librte_pmd_af_xdp.a
>> +
>> +EXPORT_MAP := rte_pmd_af_xdp_version.map
>> +
>> +LIBABIVER := 1
>> +
>> +CFLAGS += -O3
>> +
>> +# require kernel version >= v5.1-rc1
>> +LINUX_VERSION := $(shell uname -r)
>> +CFLAGS += -I/lib/modules/$(LINUX_VERSION)/build/tools/include
>> +CFLAGS += -I/lib/modules/$(LINUX_VERSION)/build/tools/lib/bpf
>> +
>> +CFLAGS += $(WERROR_FLAGS)
>> +LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
>> +LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
>> +LDLIBS += -lrte_bus_vdev
>> +LDLIBS += -lbpf
>> +
>> +#
>> +# all source are stored in SRCS-y
>> +#
>> +SRCS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += rte_eth_af_xdp.c
>> +
>> +include $(RTE_SDK)/mk/rte.lib.mk
>> diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
>> new file mode 100644
>> index 000000000..635e67483
>> --- /dev/null
>> +++ b/drivers/net/af_xdp/meson.build
>> @@ -0,0 +1,21 @@
>> +# SPDX-License-Identifier: BSD-3-Clause
>> +# Copyright(c) 2018 Intel Corporation
>> +
>> +if host_machine.system() != 'linux'
>> +	build = false
>> +endif
>> +
>> +bpf_dep = dependency('libbpf', required: false)
>> +if bpf_dep.found()
>> +	build = true
>> +else
>> +	bpf_dep = cc.find_library('libbpf', required: false)
>> +	if bpf_dep.found() and cc.has_header('xsk.h', dependencies: bpf_dep)
>> +		build = true
>> +		pkgconfig_extra_libs += '-lbpf'
>> +	else
>> +		build = false
>> +	endif
>> +endif
>> +sources = files('rte_eth_af_xdp.c')
>> +ext_deps += bpf_dep
>> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
>> new file mode 100644
>> index 000000000..96dedc0c4
>> --- /dev/null
>> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
>> @@ -0,0 +1,930 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2019 Intel Corporation.
>> + */
>> +
>> +#include <rte_mbuf.h>
>> +#include <rte_ethdev_driver.h>
>> +#include <rte_ethdev_vdev.h>
>> +#include <rte_malloc.h>
>> +#include <rte_kvargs.h>
>> +#include <rte_bus_vdev.h>
>> +#include <rte_string_fns.h>
>> +
>> +#include <linux/if_ether.h>
>> +#include <linux/if_xdp.h>
>> +#include <linux/if_link.h>
>> +#include <asm/barrier.h>
>
>Is this include used?

Yes, this is needed by xsk.h below, which calls smp_wmb() and smp_rmb().

>
>> +#include <arpa/inet.h>
>> +#include <net/if.h>
>> +#include <sys/types.h>
>> +#include <sys/socket.h>
>> +#include <sys/ioctl.h>
>> +#include <sys/mman.h>
>> +#include <unistd.h>
>> +#include <poll.h>
>> +#include <bpf/bpf.h>
>> +#include <xsk.h>
>> +
>> +#define RTE_LOGTYPE_AF_XDP RTE_LOGTYPE_USER1
>> +#ifndef SOL_XDP
>> +#define SOL_XDP 283
>> +#endif
>> +
>> +#ifndef AF_XDP
>> +#define AF_XDP 44
>> +#endif
>> +
>> +#ifndef PF_XDP
>> +#define PF_XDP AF_XDP
>> +#endif
>> +
>> +#define ETH_AF_XDP_IFACE_ARG			"iface"
>> +#define ETH_AF_XDP_QUEUE_IDX_ARG		"queue"
>> +
>> +#define ETH_AF_XDP_FRAME_SIZE		XSK_UMEM__DEFAULT_FRAME_SIZE
>> +#define ETH_AF_XDP_NUM_BUFFERS		4096
>> +#define ETH_AF_XDP_DATA_HEADROOM	0
>> +#define ETH_AF_XDP_DFLT_NUM_DESCS	XSK_RING_CONS__DEFAULT_NUM_DESCS
>> +#define ETH_AF_XDP_DFLT_QUEUE_IDX	0
>> +
>> +#define ETH_AF_XDP_RX_BATCH_SIZE	32
>> +#define ETH_AF_XDP_TX_BATCH_SIZE	32
>> +
>> +#define ETH_AF_XDP_MAX_QUEUE_PAIRS     16
>> +
>> +struct xsk_umem_info {
>> +	struct xsk_ring_prod fq;
>> +	struct xsk_ring_cons cq;
>> +	struct xsk_umem *umem;
>> +	struct rte_ring *buf_ring;
>> +	void *buffer;
>> +};
>> +
>> +struct pkt_rx_queue {
>> +	struct xsk_ring_cons rx;
>> +	struct xsk_umem_info *umem;
>> +	struct xsk_socket *xsk;
>> +	struct rte_mempool *mb_pool;
>> +
>> +	uint64_t rx_pkts;
>> +	uint64_t rx_bytes;
>> +	uint64_t rx_dropped;
>> +
>> +	struct pkt_tx_queue *pair;
>> +	uint16_t queue_idx;
>> +};
>> +
>> +struct pkt_tx_queue {
>> +	struct xsk_ring_prod tx;
>> +
>> +	uint64_t tx_pkts;
>> +	uint64_t err_pkts;
>> +	uint64_t tx_bytes;
>> +
>> +	struct pkt_rx_queue *pair;
>> +	uint16_t queue_idx;
>> +};
>> +
>> +struct pmd_internals {
>> +	int if_index;
>> +	char if_name[IFNAMSIZ];
>> +	uint16_t queue_idx;
>> +	struct ether_addr eth_addr;
>> +	struct xsk_umem_info *umem;
>> +	struct rte_mempool *mb_pool_share;
>> +
>> +	struct pkt_rx_queue rx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
>> +	struct pkt_tx_queue tx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
>> +};
>> +
>> +static const char * const valid_arguments[] = {
>> +	ETH_AF_XDP_IFACE_ARG,
>> +	ETH_AF_XDP_QUEUE_IDX_ARG,
>> +	NULL
>> +};
>> +
>> +static struct rte_eth_link pmd_link = {
>> +	.link_speed = ETH_SPEED_NUM_10G,
>> +	.link_duplex = ETH_LINK_FULL_DUPLEX,
>> +	.link_status = ETH_LINK_DOWN,
>> +	.link_autoneg = ETH_LINK_AUTONEG
>> +};
>> +
>> +static inline int
>> +reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
>> +{
>> +	struct xsk_ring_prod *fq = &umem->fq;
>> +	uint32_t idx;
>> +	void *addr = NULL;
>> +	int i, ret = 0;
>
>No need to initialize 'ret'. Is there a point in setting 'addr'?

Ok, I'll keep it uninitialized.
addr needs to be set, otherwise there would be a build error:

drivers/net/af_xdp/rte_eth_af_xdp.c:129:40: error: ‘addr’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
   *xsk_ring_prod__fill_addr(fq, idx++) = (uint64_t)addr;
                                        ^
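
One option might be to keep the initialization but scope 'addr' into the
loop, e.g. (untested):

	for (i = 0; i < reserve_size; i++) {
		void *addr = NULL;

		rte_ring_dequeue(umem->buf_ring, &addr);
		*xsk_ring_prod__fill_addr(fq, idx++) = (uint64_t)addr;
	}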
>
>> +
>> +	ret = xsk_ring_prod__reserve(fq, reserve_size, &idx);
>> +	if (!ret) {
>> +		RTE_LOG(ERR, AF_XDP, "Failed to reserve enough fq descs.\n");
>> +		return ret;
>> +	}
>> +
>> +	for (i = 0; i < reserve_size; i++) {
>> +		rte_ring_dequeue(umem->buf_ring, &addr);
>> +		*xsk_ring_prod__fill_addr(fq, idx++) = (uint64_t)addr;
>
>Consider introducing a tmp variable to make this more readable.

Got it.

>
>> +	}
>> +
>> +	xsk_ring_prod__submit(fq, reserve_size);
>> +
>> +	return 0;
>> +}
>> +
>> +static uint16_t
>> +eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>> +{
>> +	struct pkt_rx_queue *rxq = queue;
>> +	struct xsk_ring_cons *rx = &rxq->rx;
>> +	struct xsk_umem_info *umem = rxq->umem;
>> +	struct xsk_ring_prod *fq = &umem->fq;
>> +	uint32_t idx_rx;
>> +	uint32_t free_thresh = fq->size >> 1;
>> +	struct rte_mbuf *mbuf;
>> +	unsigned long dropped = 0;
>> +	unsigned long rx_bytes = 0;
>> +	uint16_t count = 0;
>> +	int rcvd, i;
>> +
>> +	nb_pkts = nb_pkts < ETH_AF_XDP_RX_BATCH_SIZE ?
>> +		nb_pkts : ETH_AF_XDP_RX_BATCH_SIZE;
>> +
>> +	rcvd = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
>> +	if (!rcvd)
>
>Since peek returns the number of entries, not a boolean, do:
>rcvd == 0

Got it.

>
>> +		return 0;
>> +
>> +	if (xsk_prod_nb_free(fq, free_thresh) >= free_thresh)
>> +		(void)reserve_fill_queue(umem, ETH_AF_XDP_RX_BATCH_SIZE);
>> +
>> +	for (i = 0; i < rcvd; i++) {
>> +		uint64_t addr = xsk_ring_cons__rx_desc(rx, idx_rx)->addr;
>> +		uint32_t len = xsk_ring_cons__rx_desc(rx, idx_rx++)->len;
>
>Use a tmp variable, instead of two calls.

Got it.

>
>> +		char *pkt = xsk_umem__get_data(rxq->umem->buffer, addr);
>> +
>
>Don't mix declarations and code. Why is this a char pointer, as opposed
>to void?

I'll split the declaration and code.
pkt should be a void *; I will correct it in the next version.


>
>> +		mbuf = rte_pktmbuf_alloc(rxq->mb_pool);
>> +		if (mbuf) {
>
>1.8.1

Got it.

>
>> +			memcpy(rte_pktmbuf_mtod(mbuf, void*), pkt, len);
>
>rte_memcpy()

Got it.

>
>> +			rte_pktmbuf_pkt_len(mbuf) =
>> +				rte_pktmbuf_data_len(mbuf) = len;
>
>Consider splitting this into two statements.

Got it.

>
>> +			rx_bytes += len;
>> +			bufs[count++] = mbuf;
>> +		} else {
>> +			dropped++;
>> +		}
>> +		rte_ring_enqueue(umem->buf_ring, (void *)addr);
>> +	}
>> +
>> +	xsk_ring_cons__release(rx, rcvd);
>> +
>> +	/* statistics */
>> +	rxq->rx_pkts += (rcvd - dropped);
>> +	rxq->rx_bytes += rx_bytes;
>> +	rxq->rx_dropped += dropped;
>> +
>> +	return count;
>> +}
>> +
>> +static void pull_umem_cq(struct xsk_umem_info *umem, int size)
>> +{
>> +	struct xsk_ring_cons *cq = &umem->cq;
>> +	int i, n;
>> +	uint32_t idx_cq;
>> +	uint64_t addr;
>> +
>> +	n = xsk_ring_cons__peek(cq, size, &idx_cq);
>
>Use size_t for n.

Got it.
>
>> +	if (n > 0) {
>> +		for (i = 0; i < n; i++) {
>
>Consider declaring 'addr' in this scope.

Got it.

>
>> +			addr = *xsk_ring_cons__comp_addr(cq,
>> +							 idx_cq++);
>> +			rte_ring_enqueue(umem->buf_ring, (void *)addr);
>> +		}
>> +
>> +		xsk_ring_cons__release(cq, n);
>> +	}
>> +}
>> +
>> +static void kick_tx(struct pkt_tx_queue *txq)
>> +{
>> +	struct xsk_umem_info *umem = txq->pair->umem;
>> +	int ret;
>> +
>> +	while (1) {
>
>for (;;)

Got it. Is for (;;) the preferred loop style in the DPDK coding conventions?

>
>> +		ret = sendto(xsk_socket__fd(txq->pair->xsk), NULL, 0,
>> +			     MSG_DONTWAIT, NULL, 0);
>> +
>> +		/* everything is ok */
>> +		if (ret >= 0)
>
>Use likely()?

Got it.

>
>> +			break;
>> +
>> +		/* something unexpected */
>> +		if (errno != EBUSY && errno != EAGAIN)
>> +			break;
>> +
>> +		/* pull from the completion queue to leave more space */
>> +		if (errno == EAGAIN)
>> +			pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
>> +	}
>> +	pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
>> +}
>> +
>> +static uint16_t
>> +eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>> +{
>> +	struct pkt_tx_queue *txq = queue;
>> +	struct xsk_umem_info *umem = txq->pair->umem;
>> +	struct rte_mbuf *mbuf;
>> +	void *addrs[ETH_AF_XDP_TX_BATCH_SIZE];
>> +	unsigned long tx_bytes = 0;
>> +	int i, valid = 0;
>> +	uint32_t idx_tx;
>> +
>> +	nb_pkts = nb_pkts < ETH_AF_XDP_TX_BATCH_SIZE ?
>> +		nb_pkts : ETH_AF_XDP_TX_BATCH_SIZE;
>
>Use RTE_MIN().

Got it.

>
>> +
>> +	pull_umem_cq(umem, nb_pkts);
>> +
>> +	nb_pkts = rte_ring_dequeue_bulk(umem->buf_ring, addrs,
>> +					nb_pkts, NULL);
>> +	if (!nb_pkts)
>
>nb_pkts == 0

Got it.

>
>> +		return 0;
>> +
>> +	if (xsk_ring_prod__reserve(&txq->tx, nb_pkts, &idx_tx) != nb_pkts) {
>> +		kick_tx(txq);
>> +		return 0;
>> +	}
>> +
>> +	for (i = 0; i < nb_pkts; i++) {
>> +		struct xdp_desc *desc;
>> +		char *pkt;
>
>Use void pointer?

Got it.

>
>> +		unsigned int buf_len = ETH_AF_XDP_FRAME_SIZE
>> +					- ETH_AF_XDP_DATA_HEADROOM;
>
>Use uint32_t, as you seem to do elsewhere.

Got it.

>
>> +		desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i);
>> +		mbuf = bufs[i];
>> +		if (mbuf->pkt_len <= buf_len) {
>> +			desc->addr = (uint64_t)addrs[valid];
>> +			desc->len = mbuf->pkt_len;
>> +			pkt = xsk_umem__get_data(umem->buffer,
>> +						 desc->addr);
>> +			memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
>> +			       desc->len);
>
>rte_memcpy()

Got it.

>
>> +			valid++;
>> +			tx_bytes += mbuf->pkt_len;
>> +		}
>> +		rte_pktmbuf_free(mbuf);
>> +	}
>> +
>> +	xsk_ring_prod__submit(&txq->tx, nb_pkts);
>> +
>> +	kick_tx(txq);
>> +
>> +	if (valid < nb_pkts)
>> +		rte_ring_enqueue_bulk(umem->buf_ring, &addrs[valid],
>> +				 nb_pkts - valid, NULL);
>> +
>> +	txq->err_pkts += nb_pkts - valid;
>> +	txq->tx_pkts += valid;
>> +	txq->tx_bytes += tx_bytes;
>> +
>> +	return nb_pkts;
>> +}
>> +
>> +static int
>> +eth_dev_start(struct rte_eth_dev *dev)
>> +{
>> +	dev->data->dev_link.link_status = ETH_LINK_UP;
>> +
>> +	return 0;
>> +}
>> +
>> +/* This function gets called when the current port gets stopped. */
>> +static void
>> +eth_dev_stop(struct rte_eth_dev *dev)
>> +{
>> +	dev->data->dev_link.link_status = ETH_LINK_DOWN;
>> +}
>> +
>> +static int
>> +eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
>> +{
>> +	/* rx/tx must be paired */
>> +	if (dev->data->nb_rx_queues != dev->data->nb_tx_queues)
>> +		return -EINVAL;
>> +
>> +	return 0;
>> +}
>> +
>> +static void
>> +eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
>> +{
>> +	struct pmd_internals *internals = dev->data->dev_private;
>> +
>> +	dev_info->if_index = internals->if_index;
>> +	dev_info->max_mac_addrs = 1;
>> +	dev_info->max_rx_pktlen = (uint32_t)ETH_FRAME_LEN;
>> +	dev_info->max_rx_queues = 1;
>> +	dev_info->max_tx_queues = 1;
>> +	dev_info->min_rx_bufsize = 0;
>> +
>> +	dev_info->default_rxportconf.nb_queues = 1;
>> +	dev_info->default_txportconf.nb_queues = 1;
>> +	dev_info->default_rxportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
>> +	dev_info->default_txportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
>> +}
>> +
>> +static int
>> +eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
>> +{
>> +	struct pmd_internals *internals = dev->data->dev_private;
>> +	struct xdp_statistics xdp_stats;
>> +	struct pkt_rx_queue *rxq;
>> +	socklen_t optlen;
>> +	int i;
>> +
>> +	optlen = sizeof(struct xdp_statistics);
>> +	for (i = 0; i < dev->data->nb_rx_queues; i++) {
>> +		rxq = &internals->rx_queues[i];
>> +		stats->q_ipackets[i] = internals->rx_queues[i].rx_pkts;
>> +		stats->q_ibytes[i] = internals->rx_queues[i].rx_bytes;
>> +
>> +		stats->q_opackets[i] = internals->tx_queues[i].tx_pkts;
>> +		stats->q_obytes[i] = internals->tx_queues[i].tx_bytes;
>> +
>> +		stats->ipackets += stats->q_ipackets[i];
>> +		stats->ibytes += stats->q_ibytes[i];
>> +		stats->imissed += internals->rx_queues[i].rx_dropped;
>> +		getsockopt(xsk_socket__fd(rxq->xsk), SOL_XDP, XDP_STATISTICS,
>> +				&xdp_stats, &optlen);
>> +		stats->imissed += xdp_stats.rx_dropped;
>> +
>> +		stats->opackets += stats->q_opackets[i];
>> +		stats->oerrors += stats->q_errors[i];
>> +		stats->oerrors += internals->tx_queues[i].err_pkts;
>> +		stats->obytes += stats->q_obytes[i];
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static void
>> +eth_stats_reset(struct rte_eth_dev *dev)
>> +{
>> +	struct pmd_internals *internals = dev->data->dev_private;
>> +	int i;
>> +
>> +	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
>> +		internals->rx_queues[i].rx_pkts = 0;
>> +		internals->rx_queues[i].rx_bytes = 0;
>> +		internals->rx_queues[i].rx_dropped = 0;
>> +
>> +		internals->tx_queues[i].tx_pkts = 0;
>> +		internals->tx_queues[i].err_pkts = 0;
>> +		internals->tx_queues[i].tx_bytes = 0;
>> +	}
>> +}
>> +
>> +static void remove_xdp_program(struct pmd_internals *internals)
>> +{
>> +	uint32_t curr_prog_id = 0;
>> +
>> +	if (bpf_get_link_xdp_id(internals->if_index, &curr_prog_id,
>> +				XDP_FLAGS_UPDATE_IF_NOEXIST)) {
>> +		RTE_LOG(ERR, AF_XDP, "bpf_get_link_xdp_id failed\n");
>> +		return;
>> +	}
>> +	bpf_set_link_xdp_fd(internals->if_index, -1,
>> +			XDP_FLAGS_UPDATE_IF_NOEXIST);
>> +}
>> +
>> +static void
>> +eth_dev_close(struct rte_eth_dev *dev)
>> +{
>> +	struct pmd_internals *internals = dev->data->dev_private;
>> +	struct pkt_rx_queue *rxq;
>> +	int i;
>> +
>> +	RTE_LOG(INFO, AF_XDP, "Closing AF_XDP ethdev on numa socket %u\n",
>> +		rte_socket_id());
>> +
>> +	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
>> +		rxq = &internals->rx_queues[i];
>> +		if (!rxq->umem)
>> +			break;
>> +		xsk_socket__delete(rxq->xsk);
>> +	}
>> +
>> +	(void)xsk_umem__delete(internals->umem->umem);
>> +	remove_xdp_program(internals);
>> +}
>> +
>> +static void
>> +eth_queue_release(void *q __rte_unused)
>> +{
>> +}
>> +
>> +static int
>> +eth_link_update(struct rte_eth_dev *dev __rte_unused,
>> +		int wait_to_complete __rte_unused)
>> +{
>> +	return 0;
>> +}
>> +
>> +static void xdp_umem_destroy(struct xsk_umem_info *umem)
>> +{
>> +	free(umem->buffer);
>> +	umem->buffer = NULL;
>> +
>> +	rte_ring_free(umem->buf_ring);
>> +	umem->buf_ring = NULL;
>> +
>> +	free(umem);
>> +	umem = NULL;
>> +}
>> +
>> +static struct xsk_umem_info *xdp_umem_configure(void)
>> +{
>> +	struct xsk_umem_info *umem;
>> +	struct xsk_umem_config usr_config = {
>> +		.fill_size = ETH_AF_XDP_DFLT_NUM_DESCS,
>> +		.comp_size = ETH_AF_XDP_DFLT_NUM_DESCS,
>> +		.frame_size = ETH_AF_XDP_FRAME_SIZE,
>> +		.frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
>> +	void *bufs = NULL;
>> +	char ring_name[0x100];
>> +	int ret;
>> +	uint64_t i;
>> +
>> +	umem = calloc(1, sizeof(*umem));
>> +	if (!umem) {
>
>1.8.1

Got it.

>
>> +		RTE_LOG(ERR, AF_XDP, "Failed to allocate umem info");
>> +		return NULL;
>> +	}
>> +
>> +	snprintf(ring_name, 0x100, "af_xdp_ring");
>
>Again the magical 0x100.

Will correct it.

>
>> +	umem->buf_ring = rte_ring_create(ring_name,
>> +					 ETH_AF_XDP_NUM_BUFFERS,
>> +					 SOCKET_ID_ANY,
>> +					 0x0);
>> +	if (!umem->buf_ring) {
>
>1.8.1

Got it.

>
>> +		RTE_LOG(ERR, AF_XDP,
>> +			"Failed to create rte_ring\n");
>> +		goto err;
>> +	}
>> +
>> +	for (i = 0; i < ETH_AF_XDP_NUM_BUFFERS; i++)
>> +		rte_ring_enqueue(umem->buf_ring,
>> +				 (void *)(i * ETH_AF_XDP_FRAME_SIZE +
>> +					  ETH_AF_XDP_DATA_HEADROOM));
>> +
>> +	if (posix_memalign(&bufs, getpagesize(),
>> +			   ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE)) {
>> +		RTE_LOG(ERR, AF_XDP, "Failed to allocate memory pool.\n");
>> +		goto err;
>> +	}
>> +	ret = xsk_umem__create(&umem->umem, bufs,
>> +			       ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
>> +			       &umem->fq, &umem->cq,
>> +			       &usr_config);
>> +
>> +	if (ret) {
>> +		RTE_LOG(ERR, AF_XDP, "Failed to create umem");
>> +		goto err;
>> +	}
>> +	umem->buffer = bufs;
>> +
>> +	return umem;
>> +
>> +err:
>> +	xdp_umem_destroy(umem);
>> +	return NULL;
>> +}
>> +
>> +static int
>> +xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
>> +	      int ring_size)
>> +{
>> +	struct xsk_socket_config cfg;
>> +	struct pkt_tx_queue *txq = rxq->pair;
>> +	int ret = 0;
>> +	int reserve_size;
>> +
>> +	rxq->umem = xdp_umem_configure();
>> +	if (!rxq->umem) {
>
>1.8.1

Got it.

>
>> +		ret = -ENOMEM;
>> +		goto err;
>> +	}
>> +
>> +	cfg.rx_size = ring_size;
>> +	cfg.tx_size = ring_size;
>> +	cfg.libbpf_flags = 0;
>> +	cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
>> +	cfg.bind_flags = 0;
>> +	ret = xsk_socket__create(&rxq->xsk, internals->if_name,
>> +			internals->queue_idx, rxq->umem->umem, &rxq->rx,
>> +			&txq->tx, &cfg);
>> +	if (ret) {
>> +		RTE_LOG(ERR, AF_XDP, "Failed to create xsk socket.\n");
>> +		goto err;
>> +	}
>> +
>> +	reserve_size = ETH_AF_XDP_DFLT_NUM_DESCS / 2;
>> +	ret = reserve_fill_queue(rxq->umem, reserve_size);
>> +	if (ret) {
>> +		RTE_LOG(ERR, AF_XDP, "Failed to reserve fill queue.\n");
>> +		goto err;
>> +	}
>> +
>> +	return 0;
>> +
>> +err:
>> +	xdp_umem_destroy(rxq->umem);
>> +
>> +	return ret;
>> +}
>> +
>> +static void
>> +queue_reset(struct pmd_internals *internals, uint16_t queue_idx)
>> +{
>> +	struct pkt_rx_queue *rxq = &internals->rx_queues[queue_idx];
>> +	struct pkt_tx_queue *txq = rxq->pair;
>> +	int xsk_fd = xsk_socket__fd(rxq->xsk);
>> +
>> +	if (xsk_fd) {
>> +		close(xsk_fd);
>> +		if (internals->umem) {
>
>1.8.1

Got it.

>
>> +			xdp_umem_destroy(internals->umem);
>> +			internals->umem = NULL;
>> +		}
>> +	}
>> +	memset(rxq, 0, sizeof(*rxq));
>> +	memset(txq, 0, sizeof(*txq));
>> +	rxq->pair = txq;
>> +	txq->pair = rxq;
>> +	rxq->queue_idx = queue_idx;
>> +	txq->queue_idx = queue_idx;
>> +}
>> +
>> +static int
>> +eth_rx_queue_setup(struct rte_eth_dev *dev,
>> +		   uint16_t rx_queue_id,
>> +		   uint16_t nb_rx_desc,
>> +		   unsigned int socket_id __rte_unused,
>> +		   const struct rte_eth_rxconf *rx_conf __rte_unused,
>> +		   struct rte_mempool *mb_pool)
>> +{
>> +	struct pmd_internals *internals = dev->data->dev_private;
>> +	unsigned int buf_size, data_size;
>
>uint32_t

Got it.

>
>> +	struct pkt_rx_queue *rxq;
>> +	int ret = 0;
>
>No need to set 'ret'. Alternatively, restructure so you always return 'ret'.

Got it.

>
>> +
>> +	rxq = &internals->rx_queues[rx_queue_id];
>> +	queue_reset(internals, rx_queue_id);
>> +
>> +	/* Now get the space available for data in the mbuf */
>> +	buf_size = rte_pktmbuf_data_room_size(mb_pool) -
>> +		RTE_PKTMBUF_HEADROOM;
>> +	data_size = ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_DATA_HEADROOM;
>> +
>> +	if (data_size > buf_size) {
>> +		RTE_LOG(ERR, AF_XDP,
>> +			"%s: %d bytes will not fit in mbuf (%d bytes)\n",
>> +			dev->device->name, data_size, buf_size);
>> +		ret = -ENOMEM;
>> +		goto err;
>> +	}
>> +
>> +	rxq->mb_pool = mb_pool;
>> +
>> +	if (xsk_configure(internals, rxq, nb_rx_desc)) {
>> +		RTE_LOG(ERR, AF_XDP,
>> +			"Failed to configure xdp socket\n");
>> +		ret = -EINVAL;
>> +		goto err;
>> +	}
>> +
>> +	internals->umem = rxq->umem;
>> +
>> +	dev->data->rx_queues[rx_queue_id] = rxq;
>> +	return 0;
>> +
>> +err:
>> +	queue_reset(internals, rx_queue_id);
>> +	return ret;
>> +}
>> +
>> +static int
>> +eth_tx_queue_setup(struct rte_eth_dev *dev,
>> +		   uint16_t tx_queue_id,
>> +		   uint16_t nb_tx_desc __rte_unused,
>> +		   unsigned int socket_id __rte_unused,
>> +		   const struct rte_eth_txconf *tx_conf __rte_unused)
>> +{
>> +	struct pmd_internals *internals = dev->data->dev_private;
>> +	struct pkt_tx_queue *txq;
>> +
>> +	txq = &internals->tx_queues[tx_queue_id];
>> +
>> +	dev->data->tx_queues[tx_queue_id] = txq;
>> +	return 0;
>> +}
>> +
>> +static int
>> +eth_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
>> +{
>> +	struct pmd_internals *internals = dev->data->dev_private;
>> +	struct ifreq ifr = { .ifr_mtu = mtu };
>> +	int ret;
>> +	int s;
>> +
>> +	s = socket(PF_INET, SOCK_DGRAM, 0);
>> +	if (s < 0)
>> +		return -EINVAL;
>> +
>> +	strlcpy(ifr.ifr_name, internals->if_name, IFNAMSIZ);
>> +	ret = ioctl(s, SIOCSIFMTU, &ifr);
>> +	close(s);
>> +
>> +	if (ret < 0)
>> +		return -EINVAL;
>> +
>> +	return 0;
>> +}
>> +
>> +static void
>> +eth_dev_change_flags(char *if_name, uint32_t flags, uint32_t mask)
>> +{
>> +	struct ifreq ifr;
>> +	int s;
>> +
>> +	s = socket(PF_INET, SOCK_DGRAM, 0);
>> +	if (s < 0)
>> +		return;
>> +
>> +	strlcpy(ifr.ifr_name, if_name, IFNAMSIZ);
>> +	if (ioctl(s, SIOCGIFFLAGS, &ifr) < 0)
>> +		goto out;
>> +	ifr.ifr_flags &= mask;
>> +	ifr.ifr_flags |= flags;
>> +	if (ioctl(s, SIOCSIFFLAGS, &ifr) < 0)
>> +		goto out;
>> +out:
>> +	close(s);
>> +}
>> +
>> +static void
>> +eth_dev_promiscuous_enable(struct rte_eth_dev *dev)
>> +{
>> +	struct pmd_internals *internals = dev->data->dev_private;
>> +
>> +	eth_dev_change_flags(internals->if_name, IFF_PROMISC, ~0);
>> +}
>> +
>> +static void
>> +eth_dev_promiscuous_disable(struct rte_eth_dev *dev)
>> +{
>> +	struct pmd_internals *internals = dev->data->dev_private;
>> +
>> +	eth_dev_change_flags(internals->if_name, 0, ~IFF_PROMISC);
>> +}
>> +
>> +static const struct eth_dev_ops ops = {
>> +	.dev_start = eth_dev_start,
>> +	.dev_stop = eth_dev_stop,
>> +	.dev_close = eth_dev_close,
>> +	.dev_configure = eth_dev_configure,
>> +	.dev_infos_get = eth_dev_info,
>> +	.mtu_set = eth_dev_mtu_set,
>> +	.promiscuous_enable = eth_dev_promiscuous_enable,
>> +	.promiscuous_disable = eth_dev_promiscuous_disable,
>> +	.rx_queue_setup = eth_rx_queue_setup,
>> +	.tx_queue_setup = eth_tx_queue_setup,
>> +	.rx_queue_release = eth_queue_release,
>> +	.tx_queue_release = eth_queue_release,
>> +	.link_update = eth_link_update,
>> +	.stats_get = eth_stats_get,
>> +	.stats_reset = eth_stats_reset,
>> +};
>> +
>> +/** parse integer from integer argument */
>> +static int
>> +parse_integer_arg(const char *key __rte_unused,
>> +		  const char *value, void *extra_args)
>> +{
>> +	int *i = (int *)extra_args;
>> +
>> +	*i = atoi(value);
>
>Use strtol().

Got it.

>
>> +	if (*i < 0) {
>> +		RTE_LOG(ERR, AF_XDP, "Argument has to be positive.\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +/** parse name argument */
>> +static int
>> +parse_name_arg(const char *key __rte_unused,
>> +	       const char *value, void *extra_args)
>> +{
>> +	char *name = extra_args;
>> +
>> +	if (strlen(value) > IFNAMSIZ) {
>
>The buffer is IFNAMSIZ bytes (which it should be), so it can't hold a string
>with strlen() == IFNAMSIZ.

Good catch. It should be strlen(value) > IFNAMSIZ - 1.

>
>> +		RTE_LOG(ERR, AF_XDP, "Invalid name %s, should be less than "
>> +			"%u bytes.\n", value, IFNAMSIZ);
>> +		return -EINVAL;
>> +	}
>> +
>> +	strlcpy(name, value, IFNAMSIZ);
>> +
>> +	return 0;
>> +}
>> +
>> +static int
>> +parse_parameters(struct rte_kvargs *kvlist,
>> +		 char *if_name,
>> +		 int *queue_idx)
>> +{
>> +	int ret = 0;
>
>Should not be initialized.

Got it.

>
>> +
>> +	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_IFACE_ARG,
>> +				 &parse_name_arg, if_name);
>> +	if (ret < 0)
>> +		goto free_kvlist;
>> +
>> +	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_QUEUE_IDX_ARG,
>> +				 &parse_integer_arg, queue_idx);
>> +	if (ret < 0)
>> +		goto free_kvlist;
>
>I fail to see the point of this goto, but maybe there's more code to follow
>in future patches.

Yes, we'll add more parameters in the future.

>
>> +
>> +free_kvlist:
>> +	rte_kvargs_free(kvlist);
>> +	return ret;
>> +}
>> +
>> +static int
>> +get_iface_info(const char *if_name,
>> +	       struct ether_addr *eth_addr,
>> +	       int *if_index)
>> +{
>> +	struct ifreq ifr;
>> +	int sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_IP);
>> +
>> +	if (sock < 0)
>> +		return -1;
>> +
>> +	strlcpy(ifr.ifr_name, if_name, IFNAMSIZ);
>> +	if (ioctl(sock, SIOCGIFINDEX, &ifr))
>> +		goto error;
>> +
>> +	if (ioctl(sock, SIOCGIFHWADDR, &ifr))
>> +		goto error;
>> +
>> +	rte_memcpy(eth_addr, ifr.ifr_hwaddr.sa_data, ETHER_ADDR_LEN);
>> +
>> +	close(sock);
>> +	*if_index = if_nametoindex(if_name);
>> +	return 0;
>> +
>> +error:
>> +	close(sock);
>> +	return -1;
>> +}
>> +
>> +static int
>> +init_internals(struct rte_vdev_device *dev,
>> +	       const char *if_name,
>> +	       int queue_idx,
>> +	       struct rte_eth_dev **eth_dev)
>> +{
>> +	const char *name = rte_vdev_device_name(dev);
>> +	const unsigned int numa_node = dev->device.numa_node;
>> +	struct pmd_internals *internals = NULL;
>> +	int ret;
>> +	int i;
>> +
>> +	internals = rte_zmalloc_socket(name, sizeof(*internals), 0, numa_node);
>> +	if (!internals)
>
>1.8.1

Got it.

>
>> +		return -ENOMEM;
>> +
>> +	internals->queue_idx = queue_idx;
>> +	strlcpy(internals->if_name, if_name, IFNAMSIZ);
>> +
>> +	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
>> +		internals->tx_queues[i].pair = &internals->rx_queues[i];
>> +		internals->rx_queues[i].pair = &internals->tx_queues[i];
>> +	}
>> +
>> +	ret = get_iface_info(if_name, &internals->eth_addr,
>> +			     &internals->if_index);
>> +	if (ret)
>> +		goto err;
>> +
>> +	*eth_dev = rte_eth_vdev_allocate(dev, 0);
>> +	if (!*eth_dev)
>
>1.8.1

Got it.

>
>> +		goto err;
>> +
>> +	(*eth_dev)->data->dev_private = internals;
>> +	(*eth_dev)->data->dev_link = pmd_link;
>> +	(*eth_dev)->data->mac_addrs = &internals->eth_addr;
>> +	(*eth_dev)->dev_ops = &ops;
>> +	(*eth_dev)->rx_pkt_burst = eth_af_xdp_rx;
>> +	(*eth_dev)->tx_pkt_burst = eth_af_xdp_tx;
>> +
>> +	return 0;
>> +
>> +err:
>> +	rte_free(internals);
>> +	return -1;
>> +}
>> +
>> +static int
>> +rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
>> +{
>> +	struct rte_kvargs *kvlist;
>> +	char if_name[IFNAMSIZ];
>> +	int xsk_queue_idx = ETH_AF_XDP_DFLT_QUEUE_IDX;
>> +	struct rte_eth_dev *eth_dev = NULL;
>> +	const char *name;
>> +	int ret;
>> +
>> +	RTE_LOG(INFO, AF_XDP, "Initializing pmd_af_xdp for %s\n",
>> +		rte_vdev_device_name(dev));
>> +
>> +	name = rte_vdev_device_name(dev);
>> +	if (rte_eal_process_type() == RTE_PROC_SECONDARY &&
>> +		strlen(rte_vdev_device_args(dev)) == 0) {
>> +		eth_dev = rte_eth_dev_attach_secondary(name);
>> +		if (!eth_dev) {
>> +			RTE_LOG(ERR, AF_XDP, "Failed to probe %s\n", name);
>> +			return -EINVAL;
>> +		}
>> +		eth_dev->dev_ops = &ops;
>> +		rte_eth_dev_probing_finish(eth_dev);
>> +		return 0;
>> +	}
>> +
>> +	kvlist = rte_kvargs_parse(rte_vdev_device_args(dev), valid_arguments);
>> +	if (!kvlist) {
>> +		RTE_LOG(ERR, AF_XDP, "Invalid kvargs key\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (dev->device.numa_node == SOCKET_ID_ANY)
>> +		dev->device.numa_node = rte_socket_id();
>> +
>> +	if (parse_parameters(kvlist, if_name, &xsk_queue_idx) < 0) {
>> +		RTE_LOG(ERR, AF_XDP, "Invalid kvargs value\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	ret = init_internals(dev, if_name, xsk_queue_idx, &eth_dev);
>> +	if (ret) {
>> +		RTE_LOG(ERR, AF_XDP, "Failed to init internals\n");
>> +		return ret;
>> +	}
>> +
>> +	rte_eth_dev_probing_finish(eth_dev);
>> +
>> +	return 0;
>> +}
>> +
>> +static int
>> +rte_pmd_af_xdp_remove(struct rte_vdev_device *dev)
>> +{
>> +	struct rte_eth_dev *eth_dev = NULL;
>> +	struct pmd_internals *internals;
>> +
>> +	RTE_LOG(INFO, AF_XDP, "Removing AF_XDP ethdev on numa socket %u\n",
>> +		rte_socket_id());
>> +
>> +	if (!dev)
>> +		return -1;
>> +
>> +	/* find the ethdev entry */
>> +	eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(dev));
>> +	if (!eth_dev)
>> +		return -1;
>> +
>> +	internals = eth_dev->data->dev_private;
>> +
>> +	rte_ring_free(internals->umem->buf_ring);
>> +	rte_free(internals->umem->buffer);
>> +	rte_free(internals->umem);
>> +
>> +	rte_eth_dev_release_port(eth_dev);
>> +
>> +
>> +	return 0;
>> +}
>> +
>> +static struct rte_vdev_driver pmd_af_xdp_drv = {
>> +	.probe = rte_pmd_af_xdp_probe,
>> +	.remove = rte_pmd_af_xdp_remove,
>> +};
>> +
>> +RTE_PMD_REGISTER_VDEV(eth_af_xdp, pmd_af_xdp_drv);
>> +RTE_PMD_REGISTER_PARAM_STRING(eth_af_xdp,
>> +			      "iface=<string> "
>> +			      "queue=<int> ");
>> diff --git a/drivers/net/af_xdp/rte_pmd_af_xdp_version.map b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
>> new file mode 100644
>> index 000000000..c6db030fe
>> --- /dev/null
>> +++ b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
>> @@ -0,0 +1,3 @@
>> +DPDK_19.05 {
>> +	local: *;
>> +};
>> diff --git a/drivers/net/meson.build b/drivers/net/meson.build
>> index 3ecc78cee..1105e72d8 100644
>> --- a/drivers/net/meson.build
>> +++ b/drivers/net/meson.build
>> @@ -2,6 +2,7 @@
>>   # Copyright(c) 2017 Intel Corporation
>>   drivers = ['af_packet',
>> +	'af_xdp',
>>   	'ark',
>>   	'atlantic',
>>   	'avp',
>> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
>> index 262132fc6..be0af73cc 100644
>> --- a/mk/rte.app.mk
>> +++ b/mk/rte.app.mk
>> @@ -143,6 +143,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_DPAA2_MEMPOOL)  += -lrte_mempool_dpaa2
>>   endif
>>   _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
>> +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     += -lrte_pmd_af_xdp -lelf -lbpf
>>   _LDLIBS-$(CONFIG_RTE_LIBRTE_ARK_PMD)        += -lrte_pmd_ark
>>   _LDLIBS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD)   += -lrte_pmd_atlantic
>>   _LDLIBS-$(CONFIG_RTE_LIBRTE_AVP_PMD)        += -lrte_pmd_avp
>> 

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 1/6] net/af_xdp: introduce AF XDP PMD driver
  2019-03-19  7:12   ` [PATCH v2 1/6] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
  2019-03-19  9:07     ` Mattias Rönnblom
@ 2019-03-19 16:14     ` Stephen Hemminger
  2019-03-20  2:32       ` Ye Xiaolong
  2019-03-19 16:16     ` Stephen Hemminger
  2019-03-20  9:23     ` David Marchand
  3 siblings, 1 reply; 214+ messages in thread
From: Stephen Hemminger @ 2019-03-19 16:14 UTC (permalink / raw)
  To: Xiaolong Ye; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

Lots of little review comments. This is what I saw in 30 minutes.
Expect more.


On Tue, 19 Mar 2019 15:12:51 +0800
Xiaolong Ye <xiaolong.ye@intel.com> wrote:

> +	nb_pkts = nb_pkts < ETH_AF_XDP_RX_BATCH_SIZE ?
> +		nb_pkts : ETH_AF_XDP_RX_BATCH_SIZE;

Maybe use RTE_MIN() ?
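
For example, a one-line equivalent:

	nb_pkts = RTE_MIN(nb_pkts, ETH_AF_XDP_RX_BATCH_SIZE);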

> +		mbuf = rte_pktmbuf_alloc(rxq->mb_pool);
> +		if (mbuf) {
> +			memcpy(rte_pktmbuf_mtod(mbuf, void*), pkt, len);

Space necessary in "void*"
Use rte_memcpy.

> +static void pull_umem_cq(struct xsk_umem_info *umem, int size)
> +{
> +	struct xsk_ring_cons *cq = &umem->cq;
> +	int i, n;
> +	uint32_t idx_cq;
> +	uint64_t addr;
> +
> +	n = xsk_ring_cons__peek(cq, size, &idx_cq);
> +	if (n > 0) {
> +		for (i = 0; i < n; i++) {

You don't need the if (n > 0) since if n <= 0 the for loop
would happen 0 times.
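
For example (a sketch using the libbpf completion-ring helpers; the
recycling step is reduced to a comment):

	n = xsk_ring_cons__peek(cq, size, &idx_cq);
	for (i = 0; i < n; i++) {
		addr = *xsk_ring_cons__comp_addr(cq, idx_cq++);
		/* recycle addr back to the umem here, as before */
	}
	xsk_ring_cons__release(cq, n);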

> +
> +static void kick_tx(struct pkt_tx_queue *txq)
> +{
> +	struct xsk_umem_info *umem = txq->pair->umem;
> +	int ret;
> +
> +	while (1) {
> +		ret = sendto(xsk_socket__fd(txq->pair->xsk), NULL, 0,
> +			     MSG_DONTWAIT, NULL, 0);
> +
> +		/* everything is ok */
> +		if (ret >= 0)
> +			break;

I would prefer:
	while (send(xsk_socket__fd(txq->pair->xsk), NULL, 0, MSG_DONTWAIT) < 0) {

Because:
	- use while() to make the looping clearer rather than while(1)
	- use send() rather than sendto() because you aren't sending with an addr
	- you don't care about the return value (i.e. ret)

> +
> +		/* something unexpected */
> +		if (errno != EBUSY && errno != EAGAIN)
> +			break;

What about EINTR?
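
One way to fold that in with the send()/while() points above (an untested
sketch):

	while (send(xsk_socket__fd(txq->pair->xsk), NULL, 0,
		    MSG_DONTWAIT) < 0) {
		/* retry if a signal interrupted the syscall */
		if (errno == EINTR)
			continue;
		/* give up on anything other than a transient busy state */
		if (errno != EBUSY && errno != EAGAIN)
			break;
	}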



> +static void
> +eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
> +{
> +	struct pmd_internals *internals = dev->data->dev_private;
> +
> +	dev_info->if_index = internals->if_index;
> +	dev_info->max_mac_addrs = 1;
> +	dev_info->max_rx_pktlen = (uint32_t)ETH_FRAME_LEN;

Cast here is unnecessary.

> +	dev_info->max_rx_queues = 1;
> +	dev_info->max_tx_queues = 1;
> +	dev_info->min_rx_bufsize = 0;

dev_info is already zero, don't need to fill other values.

> +
> +static void
> +eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
> +{
> +	struct pmd_internals *internals = dev->data->dev_private;
> +
> +	dev_info->if_index = internals->if_index;
> +	dev_info->max_mac_addrs = 1;
> +	dev_info->max_rx_pktlen = (uint32_t)ETH_FRAME_LEN;
> +	dev_info->max_rx_queues = 1;
> +	dev_info->max_tx_queues = 1;
> +	dev_info->min_rx_bufsize = 0;
> +
> +	dev_info->default_rxportconf.nb_queues = 1;
> +	dev_info->default_txportconf.nb_queues = 1;
> +	dev_info->default_rxportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
> +	dev_info->default_txportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
> +}
> +
> +static int
> +eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
> +{
> +	struct pmd_internals *internals = dev->data->dev_private;
> +	struct xdp_statistics xdp_stats;
> +	struct pkt_rx_queue *rxq;
> +	socklen_t optlen;
> +	int i;
> +
> +	optlen = sizeof(struct xdp_statistics);

In theory each call to getsockopt() could change or reduce the value of
optlen. Best to initialize in loop before each call.
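
E.g. (a sketch, also checking the return value as noted further down):

	for (i = 0; i < dev->data->nb_rx_queues; i++) {
		socklen_t optlen = sizeof(struct xdp_statistics);

		rxq = &internals->rx_queues[i];
		if (getsockopt(xsk_socket__fd(rxq->xsk), SOL_XDP,
			       XDP_STATISTICS, &xdp_stats, &optlen) != 0)
			return -1;
		/* accumulate the per-queue counters as in the patch */
	}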

> +	for (i = 0; i < dev->data->nb_rx_queues; i++) {
> +		rxq = &internals->rx_queues[i];
> +		stats->q_ipackets[i] = internals->rx_queues[i].rx_pkts;
> +		stats->q_ibytes[i] = internals->rx_queues[i].rx_bytes;
> +
> +		stats->q_opackets[i] = internals->tx_queues[i].tx_pkts;
> +		stats->q_obytes[i] = internals->tx_queues[i].tx_bytes;
> +
> +		stats->ipackets += stats->q_ipackets[i];
> +		stats->ibytes += stats->q_ibytes[i];
> +		stats->imissed += internals->rx_queues[i].rx_dropped;
> +		getsockopt(xsk_socket__fd(rxq->xsk), SOL_XDP, XDP_STATISTICS,
> +				&xdp_stats, &optlen);

You need to check return value of getsockopt() otherwise coverity will complain.

> +static void
> +eth_stats_reset(struct rte_eth_dev *dev)
> +{
> +	struct pmd_internals *internals = dev->data->dev_private;
> +	int i;
> +
> +	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
> +		internals->rx_queues[i].rx_pkts = 0;
> +		internals->rx_queues[i].rx_bytes = 0;
> +		internals->rx_queues[i].rx_dropped = 0;
> +
> +		internals->tx_queues[i].tx_pkts = 0;
> +		internals->tx_queues[i].err_pkts = 0;
> +		internals->tx_queues[i].tx_bytes = 0;

Put all the statistics together and use memset?
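
E.g. with a hypothetical per-queue sub-struct (it does not exist in this
patch yet):

	struct queue_stats {
		uint64_t pkts;
		uint64_t bytes;
		uint64_t dropped;
	};

	/* the reset then collapses to */
	memset(&internals->rx_queues[i].stats, 0, sizeof(struct queue_stats));
	memset(&internals->tx_queues[i].stats, 0, sizeof(struct queue_stats));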

> +static struct xsk_umem_info *xdp_umem_configure(void)
> +{
> +	struct xsk_umem_info *umem;
> +	struct xsk_umem_config usr_config = {
> +		.fill_size = ETH_AF_XDP_DFLT_NUM_DESCS,
> +		.comp_size = ETH_AF_XDP_DFLT_NUM_DESCS,
> +		.frame_size = ETH_AF_XDP_FRAME_SIZE,
> +		.frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
> +	void *bufs = NULL;
> +	char ring_name[0x100];

0x100 is unconventional here. Instead use RTE_RING_NAMESIZE,
but the variable is unnecessary, see below

> +	int ret;
> +	uint64_t i;
> +
> +	umem = calloc(1, sizeof(*umem));

Why not use rte_zmalloc_node to:
   1. work with primary/secondary
   2. guarantee memory on correct numa node?
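
E.g. with rte_zmalloc_socket(), which the probe path already uses (the
"af_xdp_umem" name is just illustrative):

	umem = rte_zmalloc_socket("af_xdp_umem", sizeof(*umem),
				  0, rte_socket_id());
	if (umem == NULL) {
		RTE_LOG(ERR, AF_XDP, "Failed to allocate umem info");
		return NULL;
	}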

> +	if (!umem) {
> +		RTE_LOG(ERR, AF_XDP, "Failed to allocate umem info");
> +		return NULL;
> +	}
> +
> +	snprintf(ring_name, 0x100, "af_xdp_ring");

If the ring name is always the same, why copy it? Just use a string literal.

> +/** parse name argument */
> +static int
> +parse_name_arg(const char *key __rte_unused,
> +	       const char *value, void *extra_args)
> +{
> +	char *name = extra_args;
> +
> +	if (strlen(value) > IFNAMSIZ) {

Why not:
	if (strnlen(value, IFNAMSIZ) >= IFNAMSIZ) {

> +
> +static int
> +init_internals(struct rte_vdev_device *dev,
> +	       const char *if_name,
> +	       int queue_idx,
> +	       struct rte_eth_dev **eth_dev)

If you changed the function to return the new eth_dev (and return NULL on error)
then you wouldn't have to pass eth_dev by reference.

static struct rte_eth_dev *
allocate_ethdev(struct rte_vdev_device *dev, const char *if_name, uint16_t queue_idx)
{ 

> +{
> +	const char *name = rte_vdev_device_name(dev);
> +	const unsigned int numa_node = dev->device.numa_node;
> +	struct pmd_internals *internals = NULL;

Useless initialization, first thing you do is allocate this.


> +	int ret;
> +	int i;
> +
> +	internals = rte_zmalloc_socket(name, sizeof(*internals), 0, numa_node);

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 1/6] net/af_xdp: introduce AF XDP PMD driver
  2019-03-19  7:12   ` [PATCH v2 1/6] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
  2019-03-19  9:07     ` Mattias Rönnblom
  2019-03-19 16:14     ` Stephen Hemminger
@ 2019-03-19 16:16     ` Stephen Hemminger
  2019-03-19 16:33       ` Bruce Richardson
  2019-03-20  2:05       ` Ye Xiaolong
  2019-03-20  9:23     ` David Marchand
  3 siblings, 2 replies; 214+ messages in thread
From: Stephen Hemminger @ 2019-03-19 16:16 UTC (permalink / raw)
  To: Xiaolong Ye; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

On Tue, 19 Mar 2019 15:12:51 +0800
Xiaolong Ye <xiaolong.ye@intel.com> wrote:

> Add a new PMD driver for AF_XDP which is a proposed faster version of
> AF_PACKET interface in Linux. More info about AF_XDP, please refer to [1]
> [2].
> 
> This is the vanilla version PMD which just uses a raw buffer registered as
> the umem.
> 
> [1] https://fosdem.org/2018/schedule/event/af_xdp/
> [2] https://lwn.net/Articles/745934/
> 
> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
> ---
>  MAINTAINERS                                   |   6 +
>  config/common_base                            |   5 +
>  config/common_linux                           |   1 +
>  doc/guides/nics/af_xdp.rst                    |  45 +
>  doc/guides/nics/features/af_xdp.ini           |  11 +
>  doc/guides/nics/index.rst                     |   1 +
>  doc/guides/rel_notes/release_19_05.rst        |   7 +
>  drivers/net/Makefile                          |   1 +
>  drivers/net/af_xdp/Makefile                   |  33 +
>  drivers/net/af_xdp/meson.build                |  21 +
>  drivers/net/af_xdp/rte_eth_af_xdp.c           | 930 ++++++++++++++++++
>  drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   3 +
>  drivers/net/meson.build                       |   1 +
>  mk/rte.app.mk                                 |   1 +
>  14 files changed, 1066 insertions(+)
>  create mode 100644 doc/guides/nics/af_xdp.rst
>  create mode 100644 doc/guides/nics/features/af_xdp.ini
>  create mode 100644 drivers/net/af_xdp/Makefile
>  create mode 100644 drivers/net/af_xdp/meson.build
>  create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
>  create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 452b8eb82..1cc54b439 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -468,6 +468,12 @@ M: John W. Linville <linville@tuxdriver.com>
>  F: drivers/net/af_packet/
>  F: doc/guides/nics/features/afpacket.ini
>  
> +Linux AF_XDP
> +M: Xiaolong Ye <xiaolong.ye@intel.com>
> +M: Qi Zhang <qi.z.zhang@intel.com>
> +F: drivers/net/af_xdp/
> +F: doc/guides/nics/features/af_xdp.rst
> +
>  Amazon ENA
>  M: Marcin Wojtas <mw@semihalf.com>
>  M: Michal Krawczyk <mk@semihalf.com>
> diff --git a/config/common_base b/config/common_base
> index 0b09a9348..4044de205 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -416,6 +416,11 @@ CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX_FREE=n
>  #
>  CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
>  
> +#
> +# Compile software PMD backed by AF_XDP sockets (Linux only)
> +#
> +CONFIG_RTE_LIBRTE_PMD_AF_XDP=n
> +
>  #
>  # Compile link bonding PMD library
>  #
> diff --git a/config/common_linux b/config/common_linux
> index 75334273d..0b1249da0 100644
> --- a/config/common_linux
> +++ b/config/common_linux
> @@ -19,6 +19,7 @@ CONFIG_RTE_LIBRTE_VHOST_POSTCOPY=n
>  CONFIG_RTE_LIBRTE_PMD_VHOST=y
>  CONFIG_RTE_LIBRTE_IFC_PMD=y
>  CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
> +CONFIG_RTE_LIBRTE_PMD_AF_XDP=y
>  CONFIG_RTE_LIBRTE_PMD_SOFTNIC=y
>  CONFIG_RTE_LIBRTE_PMD_TAP=y
>  CONFIG_RTE_LIBRTE_AVP_PMD=y
> diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
> new file mode 100644
> index 000000000..dd5654dd1
> --- /dev/null
> +++ b/doc/guides/nics/af_xdp.rst
> @@ -0,0 +1,45 @@
> +..  SPDX-License-Identifier: BSD-3-Clause
> +    Copyright(c) 2018 Intel Corporation.
> +
> +AF_XDP Poll Mode Driver
> +==========================
> +
> +AF_XDP is an address family that is optimized for high performance
> +packet processing. AF_XDP sockets enable the possibility for XDP program to
> +redirect packets to a memory buffer in userspace.
> +
> +For the full details behind AF_XDP socket, you can refer to
> +`AF_XDP documentation in the Kernel
> +<https://www.kernel.org/doc/Documentation/networking/af_xdp.rst>`_.
> +
> +This Linux-specific PMD driver creates the AF_XDP socket and binds it to a
> +specific netdev queue; it allows a DPDK application to send and receive raw
> +packets through the socket, bypassing the kernel network stack.
> +The current implementation only supports a single queue; the multi-queue
> +feature will be added later.
> +
> +Options
> +-------
> +
> +The following options can be provided to set up an af_xdp port in DPDK.
> +
> +*   ``iface`` - name of the Kernel interface to attach to (required);
> +*   ``queue`` - netdev queue id (optional, default 0);
> +
> +Prerequisites
> +-------------
> +
> +This is a Linux-specific PMD, thus the following prerequisites apply:
> +
> +*  A Linux Kernel (version > 4.18) with XDP sockets configuration enabled;
> +*  libbpf (within kernel version > 5.1) with latest af_xdp support installed
> +*  A Kernel bound interface to attach to.
> +
> +Set up an af_xdp interface
> +-----------------------------
> +
> +The following example will set up an af_xdp interface in DPDK:
> +
> +.. code-block:: console
> +
> +    --vdev eth_af_xdp,iface=ens786f1,queue=0
> diff --git a/doc/guides/nics/features/af_xdp.ini b/doc/guides/nics/features/af_xdp.ini
> new file mode 100644
> index 000000000..7b8fcce00
> --- /dev/null
> +++ b/doc/guides/nics/features/af_xdp.ini
> @@ -0,0 +1,11 @@
> +;
> +; Supported features of the 'af_xdp' network poll mode driver.
> +;
> +; Refer to default.ini for the full list of available PMD features.
> +;
> +[Features]
> +Link status          = Y
> +MTU update           = Y
> +Promiscuous mode     = Y
> +Stats per queue      = Y
> +x86-64               = Y
> \ No newline at end of file


This is bad. Configure your editor to always put newline at end of file.
In .emacs

   (setq require-final-newline t)

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 1/6] net/af_xdp: introduce AF XDP PMD driver
  2019-03-19 16:16     ` Stephen Hemminger
@ 2019-03-19 16:33       ` Bruce Richardson
  2019-03-20  2:07         ` Ye Xiaolong
  2019-03-20  2:05       ` Ye Xiaolong
  1 sibling, 1 reply; 214+ messages in thread
From: Bruce Richardson @ 2019-03-19 16:33 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Xiaolong Ye, dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

On Tue, Mar 19, 2019 at 09:16:27AM -0700, Stephen Hemminger wrote:
> On Tue, 19 Mar 2019 15:12:51 +0800
> Xiaolong Ye <xiaolong.ye@intel.com> wrote:
> 
> > [... full patch quoted; trimmed to the relevant hunk ...]
> > diff --git a/doc/guides/nics/features/af_xdp.ini b/doc/guides/nics/features/af_xdp.ini
> > new file mode 100644
> > index 000000000..7b8fcce00
> > --- /dev/null
> > +++ b/doc/guides/nics/features/af_xdp.ini
> > @@ -0,0 +1,11 @@
> > +;
> > +; Supported features of the 'af_xdp' network poll mode driver.
> > +;
> > +; Refer to default.ini for the full list of available PMD features.
> > +;
> > +[Features]
> > +Link status          = Y
> > +MTU update           = Y
> > +Promiscuous mode     = Y
> > +Stats per queue      = Y
> > +x86-64               = Y
> > \ No newline at end of file
> 
> 
> This is bad. Configure your editor to always put newline at end of file.
> In .emacs
> 
>    (setq require-final-newline t)
> 
Or use Vim which needs no extra configuration to do this. :-)

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 6/6] app/testpmd: add mempool flags parameter
  2019-03-19  7:12   ` [PATCH v2 6/6] app/testpmd: add mempool flags parameter Xiaolong Ye
@ 2019-03-19 23:36     ` Jerin Jacob Kollanukkaran
  2019-03-20  2:08       ` Ye Xiaolong
  2019-03-20  9:23       ` David Marchand
  0 siblings, 2 replies; 214+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-03-19 23:36 UTC (permalink / raw)
  To: xiaolong.ye, dev; +Cc: olivier.matz, magnus.karlsson, qi.z.zhang, bjorn.topel

On Tue, 2019-03-19 at 15:12 +0800, Xiaolong Ye wrote:
> When creating an rte_mempool, flags can be parsed from the command line.
> Now it is possible for testpmd to create an af_xdp-friendly
> mempool (which enables zero copy).
> 
> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
> ---
>  app/test-pmd/parameters.c             | 12 ++++++++++++
>  app/test-pmd/testpmd.c                | 17 ++++++++++-------
>  app/test-pmd/testpmd.h                |  1 +
>  doc/guides/testpmd_app_ug/run_app.rst |  4 ++++

If I understand it correctly, the user needs to change all the
applications in order to avail of the zero-copy feature of XDP.

If so,

How about creating a wrapper mempool driver for xdp at drivers/mempool/,
and using the mempool library's best-mempool-ops feature to select the
required mempool driver for XDP at runtime, without changing the apps?

see rte_mbuf_best_mempool_ops()
see struct eth_dev_ops::pool_ops_supported
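
A rough sketch of the PMD-side hook (the "af_xdp" mempool ops name is
hypothetical; "ring_mp_mc" is the default ring-based ops):

static int
eth_pool_ops_supported(struct rte_eth_dev *dev __rte_unused,
		       const char *pool)
{
	if (strcmp(pool, "af_xdp") == 0)
		return 0;	/* best choice: enables zero copy */
	if (strcmp(pool, "ring_mp_mc") == 0)
		return 1;	/* supported, via the copy path */
	return -ENOTSUP;
}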

/Jerin





^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 1/6] net/af_xdp: introduce AF XDP PMD driver
  2019-03-19 16:16     ` Stephen Hemminger
  2019-03-19 16:33       ` Bruce Richardson
@ 2019-03-20  2:05       ` Ye Xiaolong
  1 sibling, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-20  2:05 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

On 03/19, Stephen Hemminger wrote:
>> \ No newline at end of file
>
>
>This is bad. Configure your editor to always put newline at end of file.
>In .emacs
>
>   (setq require-final-newline t)
>

I wasn't aware of it before; thanks for the info. I'll configure it for my emacs.

Thanks,
Xiaolong

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 1/6] net/af_xdp: introduce AF XDP PMD driver
  2019-03-19 16:33       ` Bruce Richardson
@ 2019-03-20  2:07         ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-20  2:07 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: Stephen Hemminger, dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

On 03/19, Bruce Richardson wrote:
>> This is bad. Configure your editor to always put newline at end of file.
>> In .emacs
>> 
>>    (setq require-final-newline t)
>> 
>Or use Vim which needs no extra configuration to do this. :-)

I used to be a Vim user; I just switched to emacs recently. :-)
>

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 6/6] app/testpmd: add mempool flags parameter
  2019-03-19 23:36     ` Jerin Jacob Kollanukkaran
@ 2019-03-20  2:08       ` Ye Xiaolong
  2019-03-20  9:23       ` David Marchand
  1 sibling, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-20  2:08 UTC (permalink / raw)
  To: Jerin Jacob Kollanukkaran
  Cc: dev, olivier.matz, magnus.karlsson, qi.z.zhang, bjorn.topel

Hi, 

On 03/19, Jerin Jacob Kollanukkaran wrote:
>On Tue, 2019-03-19 at 15:12 +0800, Xiaolong Ye wrote:
>> When creating an rte_mempool, flags can be parsed from the command line.
>> Now it is possible for testpmd to create an af_xdp-friendly
>> mempool (which enables zero copy).
>> 
>> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
>> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
>> ---
>>  app/test-pmd/parameters.c             | 12 ++++++++++++
>>  app/test-pmd/testpmd.c                | 17 ++++++++++-------
>>  app/test-pmd/testpmd.h                |  1 +
>>  doc/guides/testpmd_app_ug/run_app.rst |  4 ++++
>
>If I understand it correctly, the user needs to change all the
>applications in order to avail of the zero-copy feature of XDP.
>
>If so,
>
>How about creating a wrapper mempool driver for xdp at drivers/mempool/,
>and using the mempool library's best-mempool-ops feature to select the
>required mempool driver for XDP at runtime, without changing the apps?
>
>see rte_mbuf_best_mempool_ops()
>see struct eth_dev_ops::pool_ops_supported

Sounds a good suggestion, I'll investigate and see how to implement it.

Thanks,
Xiaolong
>
>/Jerin
>
>
>
>

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 1/6] net/af_xdp: introduce AF XDP PMD driver
  2019-03-19 16:14     ` Stephen Hemminger
@ 2019-03-20  2:32       ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-20  2:32 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

Hi, Stephen

On 03/19, Stephen Hemminger wrote:
>Lots of little review comments. This is what I saw in 30 minutes.
>Expect more.

Thanks for taking the time to review my patch. These are all valuable inputs.

>
>
>On Tue, 19 Mar 2019 15:12:51 +0800
>Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>
>> +	nb_pkts = nb_pkts < ETH_AF_XDP_RX_BATCH_SIZE ?
>> +		nb_pkts : ETH_AF_XDP_RX_BATCH_SIZE;
>
>Maybe use RTE_MIN() ?

Will do.

>
>> +		mbuf = rte_pktmbuf_alloc(rxq->mb_pool);
>> +		if (mbuf) {
>> +			memcpy(rte_pktmbuf_mtod(mbuf, void*), pkt, len);
>
>Space necessary in "void*"
>Use rte_memcpy.

Will do.

>
>> +static void pull_umem_cq(struct xsk_umem_info *umem, int size)
>> +{
>> +	struct xsk_ring_cons *cq = &umem->cq;
>> +	int i, n;
>> +	uint32_t idx_cq;
>> +	uint64_t addr;
>> +
>> +	n = xsk_ring_cons__peek(cq, size, &idx_cq);
>> +	if (n > 0) {
>> +		for (i = 0; i < n; i++) {
>
>You don't need the if (n > 0) since if n <= 0 the for loop
>would happen 0 times.

Will remove the redundant "if (n > 0)"

>
>> +
>> +static void kick_tx(struct pkt_tx_queue *txq)
>> +{
>> +	struct xsk_umem_info *umem = txq->pair->umem;
>> +	int ret;
>> +
>> +	while (1) {
>> +		ret = sendto(xsk_socket__fd(txq->pair->xsk), NULL, 0,
>> +			     MSG_DONTWAIT, NULL, 0);
>> +
>> +		/* everything is ok */
>> +		if (ret >= 0)
>> +			break;
>
>I would prefer:
>	while (send(xsk_socket__fd(txq->pair->xsk), NULL, 0, MSG_DONTWAIT) < 0) {
>
>Because:
>	- use while() to make the looping clearer rather than while(1)
>	- use send() rather than sendto() because you aren't sending with an addr
>	- you don't care about the return value (i.e. ret)
>
>> +
>> +		/* some thing unexpected */
>> +		/* something unexpected */
>> +			break;
>
>What about EINTR?
>
>
>
>> +static void
>> +eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
>> +{
>> +	struct pmd_internals *internals = dev->data->dev_private;
>> +
>> +	dev_info->if_index = internals->if_index;
>> +	dev_info->max_mac_addrs = 1;
>> +	dev_info->max_rx_pktlen = (uint32_t)ETH_FRAME_LEN;
>
>Cast here is unnecessary.

Got it.

>
>> +	dev_info->max_rx_queues = 1;
>> +	dev_info->max_tx_queues = 1;
>> +	dev_info->min_rx_bufsize = 0;
>
>dev_info is already zero, don't need to fill other values.

Got it.

>
>> +
>> +static void
>> +eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
>> +{
>> +	struct pmd_internals *internals = dev->data->dev_private;
>> +
>> +	dev_info->if_index = internals->if_index;
>> +	dev_info->max_mac_addrs = 1;
>> +	dev_info->max_rx_pktlen = (uint32_t)ETH_FRAME_LEN;
>> +	dev_info->max_rx_queues = 1;
>> +	dev_info->max_tx_queues = 1;
>> +	dev_info->min_rx_bufsize = 0;
>> +
>> +	dev_info->default_rxportconf.nb_queues = 1;
>> +	dev_info->default_txportconf.nb_queues = 1;
>> +	dev_info->default_rxportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
>> +	dev_info->default_txportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
>> +}
>> +
>> +static int
>> +eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
>> +{
>> +	struct pmd_internals *internals = dev->data->dev_private;
>> +	struct xdp_statistics xdp_stats;
>> +	struct pkt_rx_queue *rxq;
>> +	socklen_t optlen;
>> +	int i;
>> +
>> +	optlen = sizeof(struct xdp_statistics);
>
>In theory each call to getsockopt() could change or reduce the value of
>optlen. Best to initialize in loop before each call.

Got it.

>
>> +	for (i = 0; i < dev->data->nb_rx_queues; i++) {
>> +		rxq = &internals->rx_queues[i];
>> +		stats->q_ipackets[i] = internals->rx_queues[i].rx_pkts;
>> +		stats->q_ibytes[i] = internals->rx_queues[i].rx_bytes;
>> +
>> +		stats->q_opackets[i] = internals->tx_queues[i].tx_pkts;
>> +		stats->q_obytes[i] = internals->tx_queues[i].tx_bytes;
>> +
>> +		stats->ipackets += stats->q_ipackets[i];
>> +		stats->ibytes += stats->q_ibytes[i];
>> +		stats->imissed += internals->rx_queues[i].rx_dropped;
>> +		getsockopt(xsk_socket__fd(rxq->xsk), SOL_XDP, XDP_STATISTICS,
>> +				&xdp_stats, &optlen);
>
>You need to check return value of getsockopt() otherwise coverity will complain.

Got it.

>
>> +static void
>> +eth_stats_reset(struct rte_eth_dev *dev)
>> +{
>> +	struct pmd_internals *internals = dev->data->dev_private;
>> +	int i;
>> +
>> +	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
>> +		internals->rx_queues[i].rx_pkts = 0;
>> +		internals->rx_queues[i].rx_bytes = 0;
>> +		internals->rx_queues[i].rx_dropped = 0;
>> +
>> +		internals->tx_queues[i].tx_pkts = 0;
>> +		internals->tx_queues[i].err_pkts = 0;
>> +		internals->tx_queues[i].tx_bytes = 0;
>
>Put all the statistics together and use memset?

Sounds good, will do.

>
>> +static struct xsk_umem_info *xdp_umem_configure(void)
>> +{
>> +	struct xsk_umem_info *umem;
>> +	struct xsk_umem_config usr_config = {
>> +		.fill_size = ETH_AF_XDP_DFLT_NUM_DESCS,
>> +		.comp_size = ETH_AF_XDP_DFLT_NUM_DESCS,
>> +		.frame_size = ETH_AF_XDP_FRAME_SIZE,
>> +		.frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
>> +	void *bufs = NULL;
>> +	char ring_name[0x100];
>
>0x100 is unconventional here. Instead use RTE_RING_NAMESIZE,
>but the variable is unnecessary, see below

Got it.

>
>> +	int ret;
>> +	uint64_t i;
>> +
>> +	umem = calloc(1, sizeof(*umem));
>
>Why not use rte_zmalloc_node to:
>   1. work with primary/secondary
>   2. guarantee memory on correct numa node?

Will do.

>
>> +	if (!umem) {
>> +		RTE_LOG(ERR, AF_XDP, "Failed to allocate umem info");
>> +		return NULL;
>> +	}
>> +
>> +	snprintf(ring_name, 0x100, "af_xdp_ring");
>
>If the ring name is always the same, why copy it? Just use a string literal.

Got it.

>
>> +/** parse name argument */
>> +static int
>> +parse_name_arg(const char *key __rte_unused,
>> +	       const char *value, void *extra_args)
>> +{
>> +	char *name = extra_args;
>> +
>> +	if (strlen(value) > IFNAMSIZ) {
>
>Why not:
>	if (strnlen(value, IFNAMSIZ) >= IFNAMSIZ) {

Will do.

>
>> +
>> +static int
>> +init_internals(struct rte_vdev_device *dev,
>> +	       const char *if_name,
>> +	       int queue_idx,
>> +	       struct rte_eth_dev **eth_dev)
>
>If you changed the function to return the new eth_dev (and return NULL on error)
>then you wouldn't have to pass eth_dev by reference.
>
>static struct rte_eth_dev *
>allocate_ethdev(struct rte_vdev_device *dev, const char *if_name, uint16_t queue_idx)

Sounds better, will do.

>{ 
>
>> +{
>> +	const char *name = rte_vdev_device_name(dev);
>> +	const unsigned int numa_node = dev->device.numa_node;
>> +	struct pmd_internals *internals = NULL;
>
>Useless initialization, first thing you do is allocate this.

Got it.

Thanks,
Xiaolong

>
>
>> +	int ret;
>> +	int i;
>> +
>> +	internals = rte_zmalloc_socket(name, sizeof(*internals), 0, numa_node);
>

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 5/6] net/af_xdp: enable zero copy
  2019-03-19  7:12   ` [PATCH v2 5/6] net/af_xdp: enable zero copy Xiaolong Ye
  2019-03-19  8:12     ` Mattias Rönnblom
@ 2019-03-20  9:22     ` David Marchand
  2019-03-20  9:48       ` Zhang, Qi Z
  1 sibling, 1 reply; 214+ messages in thread
From: David Marchand @ 2019-03-20  9:22 UTC (permalink / raw)
  To: Xiaolong Ye; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

On Tue, Mar 19, 2019 at 8:17 AM Xiaolong Ye <xiaolong.ye@intel.com> wrote:

> Try to check whether the external mempool (from rx_queue_setup) is fit for
> af_xdp; if it is, it will be registered to the af_xdp socket directly and
> there will be no packet data copy on Rx and Tx.
>
> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 128 ++++++++++++++++++++--------
>  1 file changed, 91 insertions(+), 37 deletions(-)
>
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index fc60cb5c5..c22791e51 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -62,6 +62,7 @@ struct xsk_umem_info {
>         struct xsk_umem *umem;
>         struct rte_mempool *mb_pool;
>         void *buffer;
> +       uint8_t zc;
>  };
>
>  struct pkt_rx_queue {
> @@ -76,6 +77,7 @@ struct pkt_rx_queue {
>
>         struct pkt_tx_queue *pair;
>         uint16_t queue_idx;
> +       uint8_t zc;
>  };
>
>  struct pkt_tx_queue {
> @@ -191,17 +193,24 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs,
> uint16_t nb_pkts)
>                 uint32_t len = xsk_ring_cons__rx_desc(rx, idx_rx++)->len;
>                 char *pkt = xsk_umem__get_data(rxq->umem->buffer, addr);
>
> -               mbuf = rte_pktmbuf_alloc(rxq->mb_pool);
> -               if (mbuf) {
> -                       memcpy(rte_pktmbuf_mtod(mbuf, void*), pkt, len);
> +               if (rxq->zc) {
> +                       mbuf = addr_to_mbuf(rxq->umem, addr);
>                         rte_pktmbuf_pkt_len(mbuf) =
>                                 rte_pktmbuf_data_len(mbuf) = len;
> -                       rx_bytes += len;
>                         bufs[count++] = mbuf;
>                 } else {
> -                       dropped++;
> +                       mbuf = rte_pktmbuf_alloc(rxq->mb_pool);
> +                       if (mbuf) {
> +                               memcpy(rte_pktmbuf_mtod(mbuf, void*), pkt,
> len);
> +                               rte_pktmbuf_pkt_len(mbuf) =
> +                                       rte_pktmbuf_data_len(mbuf) = len;
> +                               rx_bytes += len;
> +                               bufs[count++] = mbuf;
> +                       } else {
> +                               dropped++;
> +                       }
> +                       rte_pktmbuf_free(addr_to_mbuf(umem, addr));
>                 }
> -               rte_pktmbuf_free(addr_to_mbuf(umem, addr));
>         }
>

I did not understand how the zc parts are working, but at least looking at
the rx_burst function: when multi-queue support is added, is there any
reason we would have zc enabled on one rxq and not the others?
If the answer is that we would have either all or no rxqs with zc, we
could have dedicated rx_burst functions and avoid this per-mbuf test on
rxq->zc.
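
Something like this at queue setup time (both burst-function names are
hypothetical):

	/* decided once per device, not per mbuf */
	dev->rx_pkt_burst = rxq->zc ? eth_af_xdp_rx_zc
				    : eth_af_xdp_rx_copy;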


For the tx part, I don't understand the relation between rx and tx.
Should not the zc capability be global to the ethdev port?

You might also want to look at "simple" tx burst functions like in i40e so
that you only need to look at the first mbuf to check its originating pool.


-- 
David Marchand

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 1/6] net/af_xdp: introduce AF XDP PMD driver
  2019-03-19  7:12   ` [PATCH v2 1/6] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
                       ` (2 preceding siblings ...)
  2019-03-19 16:16     ` Stephen Hemminger
@ 2019-03-20  9:23     ` David Marchand
  2019-03-20 15:20       ` Ye Xiaolong
  3 siblings, 1 reply; 214+ messages in thread
From: David Marchand @ 2019-03-20  9:23 UTC (permalink / raw)
  To: Xiaolong Ye; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

On Tue, Mar 19, 2019 at 8:17 AM Xiaolong Ye <xiaolong.ye@intel.com> wrote:

> diff --git a/doc/guides/nics/features/af_xdp.ini
> b/doc/guides/nics/features/af_xdp.ini
> new file mode 100644
> index 000000000..7b8fcce00
> --- /dev/null
> +++ b/doc/guides/nics/features/af_xdp.ini
> @@ -0,0 +1,11 @@
> +;
> +; Supported features of the 'af_xdp' network poll mode driver.
> +;
> +; Refer to default.ini for the full list of available PMD features.
> +;
> +[Features]
> +Link status          = Y
> +MTU update           = Y
> +Promiscuous mode     = Y
> +Stats per queue      = Y
> +x86-64               = Y
>

Is there really a limitation on x86?


diff --git a/drivers/net/Makefile b/drivers/net/Makefile
> index 502869a87..5d401b8c5 100644
> --- a/drivers/net/Makefile
> +++ b/drivers/net/Makefile
> @@ -9,6 +9,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD),d)
>  endif
>
>  DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += af_packet
> +DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += af_xdp
>  DIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark
>  DIRS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD) += atlantic
>  DIRS-$(CONFIG_RTE_LIBRTE_AVP_PMD) += avp
> diff --git a/drivers/net/af_xdp/Makefile b/drivers/net/af_xdp/Makefile
> new file mode 100644
> index 000000000..6cf0ed7db
> --- /dev/null
> +++ b/drivers/net/af_xdp/Makefile
> @@ -0,0 +1,33 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2018 Intel Corporation
>

2018? 2019?

+
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +#
> +# library name
> +#
> +LIB = librte_pmd_af_xdp.a
> +
> +EXPORT_MAP := rte_pmd_af_xdp_version.map
> +
> +LIBABIVER := 1
> +
> +CFLAGS += -O3
> +
> +# require kernel version >= v5.1-rc1
> +LINUX_VERSION := $(shell uname -r)
> +CFLAGS += -I/lib/modules/$(LINUX_VERSION)/build/tools/include
> +CFLAGS += -I/lib/modules/$(LINUX_VERSION)/build/tools/lib/bpf
>

We can reuse RTE_KERNELDIR here (even if the docs state that this was to
build kmods so far).
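
E.g. (assuming RTE_KERNELDIR points at the same kernel tree):

CFLAGS += -I$(RTE_KERNELDIR)/tools/include
CFLAGS += -I$(RTE_KERNELDIR)/tools/lib/bpf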


> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> new file mode 100644
> index 000000000..96dedc0c4
> --- /dev/null
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
>

[snip]


> +static int
> +eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
> +{
> +       struct pmd_internals *internals = dev->data->dev_private;
> +       struct xdp_statistics xdp_stats;
> +       struct pkt_rx_queue *rxq;
> +       socklen_t optlen;
> +       int i;
> +
> +       optlen = sizeof(struct xdp_statistics);
> +       for (i = 0; i < dev->data->nb_rx_queues; i++) {
> +               rxq = &internals->rx_queues[i];
> +               stats->q_ipackets[i] = internals->rx_queues[i].rx_pkts;
> +               stats->q_ibytes[i] = internals->rx_queues[i].rx_bytes;
> +
> +               stats->q_opackets[i] = internals->tx_queues[i].tx_pkts;
> +               stats->q_obytes[i] = internals->tx_queues[i].tx_bytes;
> +
> +               stats->ipackets += stats->q_ipackets[i];
> +               stats->ibytes += stats->q_ibytes[i];
> +               stats->imissed += internals->rx_queues[i].rx_dropped;
> +               getsockopt(xsk_socket__fd(rxq->xsk), SOL_XDP,
> XDP_STATISTICS,
> +                               &xdp_stats, &optlen);
> +               stats->imissed += xdp_stats.rx_dropped;
> +
> +               stats->opackets += stats->q_opackets[i];
> +               stats->oerrors += stats->q_errors[i];
>

You forgot to remove stats->q_errors[i];

+               stats->oerrors += internals->tx_queues[i].err_pkts;
> +               stats->obytes += stats->q_obytes[i];
> +       }
> +
> +       return 0;
> +}



-- 
David Marchand

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 6/6] app/testpmd: add mempool flags parameter
  2019-03-19 23:36     ` Jerin Jacob Kollanukkaran
  2019-03-20  2:08       ` Ye Xiaolong
@ 2019-03-20  9:23       ` David Marchand
  2019-03-20 15:22         ` Ye Xiaolong
  1 sibling, 1 reply; 214+ messages in thread
From: David Marchand @ 2019-03-20  9:23 UTC (permalink / raw)
  To: Jerin Jacob Kollanukkaran
  Cc: xiaolong.ye, dev, olivier.matz, magnus.karlsson, qi.z.zhang, bjorn.topel

On Wed, Mar 20, 2019 at 12:37 AM Jerin Jacob Kollanukkaran <
jerinj@marvell.com> wrote:

> On Tue, 2019-03-19 at 15:12 +0800, Xiaolong Ye wrote:
> > When creating an rte_mempool, flags can be parsed from the command line.
> > Now it is possible for testpmd to create an af_xdp-friendly
> > mempool (which enables zero copy).
> >
> > Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> > Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
> > ---
> >  app/test-pmd/parameters.c             | 12 ++++++++++++
> >  app/test-pmd/testpmd.c                | 17 ++++++++++-------
> >  app/test-pmd/testpmd.h                |  1 +
> >  doc/guides/testpmd_app_ug/run_app.rst |  4 ++++
>
> If I understand it correctly, the user needs to change all the
> applications in order to avail of the zero-copy feature of XDP.
>
> If so,
>
> How about creating a wrapper mempool driver for xdp at drivers/mempool/,
> and using the mempool library's best-mempool-ops feature to select the
> required mempool driver for XDP at runtime, without changing the apps?
>
> see rte_mbuf_best_mempool_ops()
> see struct eth_dev_ops::pool_ops_supported
>

Glad to read this; I was under the same impression :-)


-- 
David Marchand

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 5/6] net/af_xdp: enable zero copy
  2019-03-20  9:22     ` David Marchand
@ 2019-03-20  9:48       ` Zhang, Qi Z
  0 siblings, 0 replies; 214+ messages in thread
From: Zhang, Qi Z @ 2019-03-20  9:48 UTC (permalink / raw)
  To: David Marchand, Ye, Xiaolong; +Cc: dev, Karlsson, Magnus, Topel, Bjorn



From: David Marchand [mailto:david.marchand@redhat.com] 
Sent: Wednesday, March 20, 2019 5:22 PM
To: Ye, Xiaolong <xiaolong.ye@intel.com>
Cc: dev <dev@dpdk.org>; Zhang, Qi Z <qi.z.zhang@intel.com>; Karlsson, Magnus <magnus.karlsson@intel.com>; Topel, Bjorn <bjorn.topel@intel.com>
Subject: Re: [dpdk-dev] [PATCH v2 5/6] net/af_xdp: enable zero copy



On Tue, Mar 19, 2019 at 8:17 AM Xiaolong Ye <xiaolong.ye@intel.com> wrote:
Try to check whether the external mempool (from rx_queue_setup) is fit for
af_xdp; if it is, it will be registered to the af_xdp socket directly and
there will be no packet data copy on Rx and Tx.

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 drivers/net/af_xdp/rte_eth_af_xdp.c | 128 ++++++++++++++++++++--------
 1 file changed, 91 insertions(+), 37 deletions(-)

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index fc60cb5c5..c22791e51 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -62,6 +62,7 @@ struct xsk_umem_info {
        struct xsk_umem *umem;
        struct rte_mempool *mb_pool;
        void *buffer;
+       uint8_t zc;
 };

 struct pkt_rx_queue {
@@ -76,6 +77,7 @@ struct pkt_rx_queue {

        struct pkt_tx_queue *pair;
        uint16_t queue_idx;
+       uint8_t zc;
 };

 struct pkt_tx_queue {
@@ -191,17 +193,24 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
                uint32_t len = xsk_ring_cons__rx_desc(rx, idx_rx++)->len;
                char *pkt = xsk_umem__get_data(rxq->umem->buffer, addr);

-               mbuf = rte_pktmbuf_alloc(rxq->mb_pool);
-               if (mbuf) {
-                       memcpy(rte_pktmbuf_mtod(mbuf, void*), pkt, len);
+               if (rxq->zc) {
+                       mbuf = addr_to_mbuf(rxq->umem, addr);
                        rte_pktmbuf_pkt_len(mbuf) =
                                rte_pktmbuf_data_len(mbuf) = len;
-                       rx_bytes += len;
                        bufs[count++] = mbuf;
                } else {
-                       dropped++;
+                       mbuf = rte_pktmbuf_alloc(rxq->mb_pool);
+                       if (mbuf) {
+                               memcpy(rte_pktmbuf_mtod(mbuf, void*), pkt, len);
+                               rte_pktmbuf_pkt_len(mbuf) =
+                                       rte_pktmbuf_data_len(mbuf) = len;
+                               rx_bytes += len;
+                               bufs[count++] = mbuf;
+                       } else {
+                               dropped++;
+                       }
+                       rte_pktmbuf_free(addr_to_mbuf(umem, addr));
                }
-               rte_pktmbuf_free(addr_to_mbuf(umem, addr));
        }

I did not understand how the zc parts are working, but at least looking at the rx_burst function: when multi-queue support is added, is there any reason we would have zc enabled on one rxq and not the others?

[Qi:] the answer is no. We can't anticipate which memory pool the application will use during rx queue setup; also, in the case where multiple queues share the same memory pool, the umem still can't be shared due to race conditions, so only one queue could be zero-copy. To make all the queues zero-copy, we would have to assign each queue a different memory pool.

If the answer is that we would have either all or no rxqs with zc, we could have dedicated rx_burst functions and avoid this per-mbuf test on rxq->zc.




For the tx part, I don't understand the relation between rx and tx.
Should not the zc capability be global to the ethdev port?

You might also want to look at "simple" tx burst functions like in i40e so that you only need to look at the first mbuf to check its originating pool.

[Qi:] if you mean DEV_TX_OFFLOAD_MBUF_FAST_FREE, yes, I think that's a good point.


-- 
David Marchand

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 1/6] net/af_xdp: introduce AF XDP PMD driver
  2019-03-20  9:23     ` David Marchand
@ 2019-03-20 15:20       ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-20 15:20 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

Thanks for your comments.

On 03/20, David Marchand wrote:
>On Tue, Mar 19, 2019 at 8:17 AM Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>
>> diff --git a/doc/guides/nics/features/af_xdp.ini
>> b/doc/guides/nics/features/af_xdp.ini
>> new file mode 100644
>> index 000000000..7b8fcce00
>> --- /dev/null
>> +++ b/doc/guides/nics/features/af_xdp.ini
>> @@ -0,0 +1,11 @@
>> +;
>> +; Supported features of the 'af_xdp' network poll mode driver.
>> +;
>> +; Refer to default.ini for the full list of available PMD features.
>> +;
>> +[Features]
>> +Link status          = Y
>> +MTU update           = Y
>> +Promiscuous mode     = Y
>> +Stats per queue      = Y
>> +x86-64               = Y
>>
>
>Is there really a limitation on x86?

I think not; I just haven't had a chance to try it on an x86-32 machine.

>
>
>diff --git a/drivers/net/Makefile b/drivers/net/Makefile
>> index 502869a87..5d401b8c5 100644
>> --- a/drivers/net/Makefile
>> +++ b/drivers/net/Makefile
>> @@ -9,6 +9,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD),d)
>>  endif
>>
>>  DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += af_packet
>> +DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += af_xdp
>>  DIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark
>>  DIRS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD) += atlantic
>>  DIRS-$(CONFIG_RTE_LIBRTE_AVP_PMD) += avp
>> diff --git a/drivers/net/af_xdp/Makefile b/drivers/net/af_xdp/Makefile
>> new file mode 100644
>> index 000000000..6cf0ed7db
>> --- /dev/null
>> +++ b/drivers/net/af_xdp/Makefile
>> @@ -0,0 +1,33 @@
>> +# SPDX-License-Identifier: BSD-3-Clause
>> +# Copyright(c) 2018 Intel Corporation
>>
>
>2018? 2019?

Will correct it to 2019.

>
>+
>> +include $(RTE_SDK)/mk/rte.vars.mk
>> +
>> +#
>> +# library name
>> +#
>> +LIB = librte_pmd_af_xdp.a
>> +
>> +EXPORT_MAP := rte_pmd_af_xdp_version.map
>> +
>> +LIBABIVER := 1
>> +
>> +CFLAGS += -O3
>> +
>> +# require kernel version >= v5.1-rc1
>> +LINUX_VERSION := $(shell uname -r)
>> +CFLAGS += -I/lib/modules/$(LINUX_VERSION)/build/tools/include
>> +CFLAGS += -I/lib/modules/$(LINUX_VERSION)/build/tools/lib/bpf
>>
>
>We can reuse RTE_KERNELDIR here (even if the docs state that this was to
>build kmods so far).

Will do.

>
>
>> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
>> b/drivers/net/af_xdp/rte_eth_af_xdp.c
>> new file mode 100644
>> index 000000000..96dedc0c4
>> --- /dev/null
>> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
>>
>
>[snip]
>
>
>> +static int
>> +eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
>> +{
>> +       struct pmd_internals *internals = dev->data->dev_private;
>> +       struct xdp_statistics xdp_stats;
>> +       struct pkt_rx_queue *rxq;
>> +       socklen_t optlen;
>> +       int i;
>> +
>> +       optlen = sizeof(struct xdp_statistics);
>> +       for (i = 0; i < dev->data->nb_rx_queues; i++) {
>> +               rxq = &internals->rx_queues[i];
>> +               stats->q_ipackets[i] = internals->rx_queues[i].rx_pkts;
>> +               stats->q_ibytes[i] = internals->rx_queues[i].rx_bytes;
>> +
>> +               stats->q_opackets[i] = internals->tx_queues[i].tx_pkts;
>> +               stats->q_obytes[i] = internals->tx_queues[i].tx_bytes;
>> +
>> +               stats->ipackets += stats->q_ipackets[i];
>> +               stats->ibytes += stats->q_ibytes[i];
>> +               stats->imissed += internals->rx_queues[i].rx_dropped;
>> +               getsockopt(xsk_socket__fd(rxq->xsk), SOL_XDP,
>> XDP_STATISTICS,
>> +                               &xdp_stats, &optlen);
>> +               stats->imissed += xdp_stats.rx_dropped;
>> +
>> +               stats->opackets += stats->q_opackets[i];
>> +               stats->oerrors += stats->q_errors[i];
>>
>
>You forgot to remove stats->q_errors[i];
>
>+               stats->oerrors += internals->tx_queues[i].err_pkts;

My bad, will remove it in next version.


Thanks,
Xiaolong
>> +               stats->obytes += stats->q_obytes[i];
>> +       }
>> +
>> +       return 0;
>> +}
>
>
>
>-- 
>David Marchand

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 6/6] app/testpmd: add mempool flags parameter
  2019-03-20  9:23       ` David Marchand
@ 2019-03-20 15:22         ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-20 15:22 UTC (permalink / raw)
  To: David Marchand
  Cc: Jerin Jacob Kollanukkaran, dev, olivier.matz, magnus.karlsson,
	qi.z.zhang, bjorn.topel

On 03/20, David Marchand wrote:
>On Wed, Mar 20, 2019 at 12:37 AM Jerin Jacob Kollanukkaran <
>jerinj@marvell.com> wrote:
>
>> On Tue, 2019-03-19 at 15:12 +0800, Xiaolong Ye wrote:
>> > When create rte_mempool, flags can be parsed from command line.
>> > Now, it is possible for testpmd to create a af_xdp friendly
>> > mempool (which enable zero copy).
>> >
>> > Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
>> > Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
>> > ---
>> >  app/test-pmd/parameters.c             | 12 ++++++++++++
>> >  app/test-pmd/testpmd.c                | 17 ++++++++++-------
>> >  app/test-pmd/testpmd.h                |  1 +
>> >  doc/guides/testpmd_app_ug/run_app.rst |  4 ++++
>>
>> If I understand it correctly, the user needs to change all the
>> applications in order to avail of the zero-copy feature of XDP.
>>
>> If so,
>>
>> How about creating a wrapper mempool driver for xdp at drivers/mempool/,
>> and using the mempool library's best-mempool-ops feature to select the
>> required mempool driver for XDP at runtime, without changing the apps?
>>
>> see rte_mbuf_best_mempool_ops()
>> see struct eth_dev_ops::pool_ops_supported
>>
>
>Glad to read this, I was under the same impression :-)

We plan to separate this into another patchset and may target it for the next release.
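
For reference, the direction would roughly look like the sketch below: the
PMD reports its preferred mempool ops through eth_dev_ops::pool_ops_supported
(the "af_xdp" ops name is hypothetical -- the wrapper mempool driver doesn't
exist yet):

    /* hypothetical callback wired into struct eth_dev_ops */
    static int
    eth_pool_ops_supported(struct rte_eth_dev *dev __rte_unused,
                           const char *pool)
    {
            /* 0 == best mempool ops choice for this port,
             * 1 == supported; queried through
             * rte_eth_dev_pool_ops_supported().
             */
            if (strcmp(pool, "af_xdp") == 0)
                    return 0;
            return 1;
    }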

Thanks,
Xiaolong
>
>
>-- 
>David Marchand

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v3 0/5] Introduce AF_XDP PMD
  2019-03-01  8:09 [PATCH v1 0/6] Introduce AF_XDP PMD Xiaolong Ye
                   ` (7 preceding siblings ...)
  2019-03-19  7:12 ` [PATCH v2 " Xiaolong Ye
@ 2019-03-21  9:18 ` Xiaolong Ye
  2019-03-21  9:18   ` [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
                     ` (4 more replies)
  2019-03-22 13:01 ` [PATCH v4 0/5] Introduce AF_XDP PMD Xiaolong Ye
                   ` (7 subsequent siblings)
  16 siblings, 5 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-21  9:18 UTC (permalink / raw)
  To: dev; +Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Xiaolong Ye

Overview
========

This patchset adds a new PMD driver for AF_XDP which is a proposed
faster version of AF_PACKET interface in Linux, see below links [1] [2] for
details of AF_XDP introduction:

AF_XDP roadmap
==============
- AF_XDP is included in upstream kernel since 4.18, and AF_XDP support
  in libbpf has been merged in v5.1-rc1.
- Now i40e and ixgbe drivers have supported zero copy mode.

Change logs
===========

V3:

- Fix all style issues pointed out by Stephen, Mattias and David.
- Drop the testpmd change as we'll adopt Jerin's suggestion to add a new
  mempool driver to handle the application use of AF_XDP's zero copy feature.

V2:
- Fix a NULL pointer dereference crash issue
- Fix txonly stopping sending traffic in zc mode
- Refactor rte_mbuf.c to avoid ABI breakage.
- Fix multiple style issues pointed out by Ferruh, David, Stephen and Luca.

changes vs the RFC sent by Qi last Aug:

- Re-worked based on AF_XDP's interface changes, since the new libbpf
  provides higher-level APIs that hide many of the details of the AF_XDP
  uapi. The rework removes 300+ lines of code.

- Multi-queue is not supported, because the current libbpf doesn't support
  multiple sockets on a single umem.

- No extra steps are needed to load an xdp program manually, since the
  current libbpf loads a default xdp program when the user calls
  xsk_socket__create; the userspace application only needs to handle the
  cleanup (see the sketch below).
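
For reference, the cleanup boils down to detaching the XDP program from
the interface, as the remove_xdp_program() helper in this series does
(sketch):

    static void cleanup_xdp(int if_index)
    {
            __u32 curr_prog_id = 0;

            if (bpf_get_link_xdp_id(if_index, &curr_prog_id,
                                    XDP_FLAGS_UPDATE_IF_NOEXIST))
                    return;
            /* fd == -1 detaches the currently attached program */
            bpf_set_link_xdp_fd(if_index, -1,
                                XDP_FLAGS_UPDATE_IF_NOEXIST);
    }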

How to try
==========

1. take a kernel >= v5.1-rc1, build it and replace your host
   kernel with it.
   
   make sure you turn on XDP sockets when compiling

   Networking support -->
        Networking options -->
                [ * ] XDP sockets
   
2. build libbpf in tools/lib/bpf, and copy the libbpf.a and libbpf.so to /usr/lib64

   cd tools/lib/bpf
   make

3. ethtool -L enp59s0f0 combined 1

4. start testpmd

   ./build/app/testpmd -c 0xc -n 4 --vdev eth_af_xdp,iface=enp59s0f0,queue=0 -- -i --rxq=1 --txq=1

    in this case, the default xdp program will be loaded and linked to queue 0 of
    enp59s0f0; network traffic arriving on queue 0 will be redirected to the
    af_xdp socket.
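
Instead of testpmd, a minimal application can drive the port itself; a
rough sketch (port id 0, the ring/pool sizes and the drop-only loop are
arbitrary choices for illustration):

    #include <rte_eal.h>
    #include <rte_ethdev.h>
    #include <rte_lcore.h>
    #include <rte_mbuf.h>

    int main(int argc, char **argv)
    {
            /* run with: -c 0x3 -n 4 --vdev eth_af_xdp,iface=enp59s0f0,queue=0 */
            struct rte_eth_conf conf = { 0 };
            struct rte_mempool *mp;
            struct rte_mbuf *bufs[32];
            uint16_t i, nb;

            if (rte_eal_init(argc, argv) < 0)
                    return -1;

            mp = rte_pktmbuf_pool_create("mb_pool", 4096, 250, 0,
                            RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
            if (mp == NULL)
                    return -1;

            /* the PMD requires nb_rx_queues == nb_tx_queues */
            if (rte_eth_dev_configure(0, 1, 1, &conf) < 0 ||
                rte_eth_rx_queue_setup(0, 0, 1024, rte_socket_id(),
                                       NULL, mp) < 0 ||
                rte_eth_tx_queue_setup(0, 0, 1024, rte_socket_id(),
                                       NULL) < 0 ||
                rte_eth_dev_start(0) < 0)
                    return -1;

            for (;;) {
                    nb = rte_eth_rx_burst(0, 0, bufs, 32);
                    for (i = 0; i < nb; i++)
                            rte_pktmbuf_free(bufs[i]); /* just drop */
            }
            return 0;
    }
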
Xiaolong Ye (5):
  net/af_xdp: introduce AF XDP PMD driver
  lib/mbuf: introduce helper to create mempool with flags
  lib/mempool: allow page size aligned mempool
  net/af_xdp: use mbuf mempool for buffer management
  net/af_xdp: enable zero copy

 MAINTAINERS                                   |    6 +
 config/common_base                            |    5 +
 config/common_linux                           |    1 +
 doc/guides/nics/af_xdp.rst                    |   45 +
 doc/guides/nics/features/af_xdp.ini           |   11 +
 doc/guides/nics/index.rst                     |    1 +
 doc/guides/rel_notes/release_19_05.rst        |    7 +
 drivers/net/Makefile                          |    1 +
 drivers/net/af_xdp/Makefile                   |   32 +
 drivers/net/af_xdp/meson.build                |   21 +
 drivers/net/af_xdp/rte_eth_af_xdp.c           | 1018 +++++++++++++++++
 drivers/net/af_xdp/rte_pmd_af_xdp_version.map |    3 +
 drivers/net/meson.build                       |    1 +
 lib/librte_mbuf/rte_mbuf.c                    |   29 +-
 lib/librte_mbuf/rte_mbuf.h                    |   45 +
 lib/librte_mempool/rte_mempool.c              |    3 +
 lib/librte_mempool/rte_mempool.h              |    1 +
 mk/rte.app.mk                                 |    1 +
 18 files changed, 1226 insertions(+), 5 deletions(-)
 create mode 100644 doc/guides/nics/af_xdp.rst
 create mode 100644 doc/guides/nics/features/af_xdp.ini
 create mode 100644 drivers/net/af_xdp/Makefile
 create mode 100644 drivers/net/af_xdp/meson.build
 create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
 create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map

-- 
2.17.1

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-21  9:18 ` [PATCH v3 0/5] Introduce AF_XDP PMD Xiaolong Ye
@ 2019-03-21  9:18   ` Xiaolong Ye
  2019-03-21 15:24     ` Stephen Hemminger
                       ` (7 more replies)
  2019-03-21  9:18   ` [PATCH v3 2/5] lib/mbuf: introduce helper to create mempool with flags Xiaolong Ye
                     ` (3 subsequent siblings)
  4 siblings, 8 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-21  9:18 UTC (permalink / raw)
  To: dev; +Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Xiaolong Ye

Add a new PMD driver for AF_XDP, a proposed faster version of the
AF_PACKET interface in Linux. For more info about AF_XDP, please refer
to [1] [2].

This is the vanilla version of the PMD, which just uses a raw buffer
registered as the umem.

[1] https://fosdem.org/2018/schedule/event/af_xdp/
[2] https://lwn.net/Articles/745934/
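
(The port can equally be created at runtime instead of via the --vdev EAL
option; a sketch, with an example interface name:)

    #include <rte_bus_vdev.h>

    /* runtime equivalent of --vdev eth_af_xdp,iface=enp59s0f0,queue=0 */
    if (rte_vdev_init("eth_af_xdp", "iface=enp59s0f0,queue=0") != 0)
            /* handle error */;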

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 MAINTAINERS                                   |   6 +
 config/common_base                            |   5 +
 config/common_linux                           |   1 +
 doc/guides/nics/af_xdp.rst                    |  45 +
 doc/guides/nics/features/af_xdp.ini           |  11 +
 doc/guides/nics/index.rst                     |   1 +
 doc/guides/rel_notes/release_19_05.rst        |   7 +
 drivers/net/Makefile                          |   1 +
 drivers/net/af_xdp/Makefile                   |  32 +
 drivers/net/af_xdp/meson.build                |  21 +
 drivers/net/af_xdp/rte_eth_af_xdp.c           | 932 ++++++++++++++++++
 drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   3 +
 drivers/net/meson.build                       |   1 +
 mk/rte.app.mk                                 |   1 +
 14 files changed, 1067 insertions(+)
 create mode 100644 doc/guides/nics/af_xdp.rst
 create mode 100644 doc/guides/nics/features/af_xdp.ini
 create mode 100644 drivers/net/af_xdp/Makefile
 create mode 100644 drivers/net/af_xdp/meson.build
 create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
 create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 452b8eb82..1cc54b439 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -468,6 +468,12 @@ M: John W. Linville <linville@tuxdriver.com>
 F: drivers/net/af_packet/
 F: doc/guides/nics/features/afpacket.ini
 
+Linux AF_XDP
+M: Xiaolong Ye <xiaolong.ye@intel.com>
+M: Qi Zhang <qi.z.zhang@intel.com>
+F: drivers/net/af_xdp/
+F: doc/guides/nics/af_xdp.rst
+
 Amazon ENA
 M: Marcin Wojtas <mw@semihalf.com>
 M: Michal Krawczyk <mk@semihalf.com>
diff --git a/config/common_base b/config/common_base
index 0b09a9348..4044de205 100644
--- a/config/common_base
+++ b/config/common_base
@@ -416,6 +416,11 @@ CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX_FREE=n
 #
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
 
+#
+# Compile software PMD backed by AF_XDP sockets (Linux only)
+#
+CONFIG_RTE_LIBRTE_PMD_AF_XDP=n
+
 #
 # Compile link bonding PMD library
 #
diff --git a/config/common_linux b/config/common_linux
index 75334273d..0b1249da0 100644
--- a/config/common_linux
+++ b/config/common_linux
@@ -19,6 +19,7 @@ CONFIG_RTE_LIBRTE_VHOST_POSTCOPY=n
 CONFIG_RTE_LIBRTE_PMD_VHOST=y
 CONFIG_RTE_LIBRTE_IFC_PMD=y
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
+CONFIG_RTE_LIBRTE_PMD_AF_XDP=y
 CONFIG_RTE_LIBRTE_PMD_SOFTNIC=y
 CONFIG_RTE_LIBRTE_PMD_TAP=y
 CONFIG_RTE_LIBRTE_AVP_PMD=y
diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
new file mode 100644
index 000000000..dd5654dd1
--- /dev/null
+++ b/doc/guides/nics/af_xdp.rst
@@ -0,0 +1,45 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2018 Intel Corporation.
+
+AF_XDP Poll Mode Driver
+=======================
+
+AF_XDP is an address family that is optimized for high performance
+packet processing. AF_XDP sockets enable the possibility for an XDP
+program to redirect packets to a memory buffer in userspace.
+
+For the full details behind AF_XDP sockets, you can refer to
+`AF_XDP documentation in the Kernel
+<https://www.kernel.org/doc/Documentation/networking/af_xdp.rst>`_.
+
+This Linux-specific PMD driver creates the AF_XDP socket and binds it to a
+specific netdev queue. It allows a DPDK application to send and receive raw
+packets through the socket, bypassing the kernel network stack.
+The current implementation only supports a single queue; multi-queue
+support will be added later.
+
+Options
+-------
+
+The following options can be provided to set up an af_xdp port in DPDK.
+
+*   ``iface`` - name of the Kernel interface to attach to (required);
+*   ``queue`` - netdev queue id (optional, default 0);
+
+Prerequisites
+-------------
+
+This is a Linux-specific PMD, thus the following prerequisites apply:
+
+*  A Linux kernel (version > 4.18) with XDP sockets configuration enabled;
+*  libbpf (from kernel >= v5.1-rc1) with the latest af_xdp support installed;
+*  A kernel-bound interface to attach to.
+
+Set up an af_xdp interface
+-----------------------------
+
+The following example will set up an af_xdp interface in DPDK:
+
+.. code-block:: console
+
+    --vdev eth_af_xdp,iface=ens786f1,queue=0
diff --git a/doc/guides/nics/features/af_xdp.ini b/doc/guides/nics/features/af_xdp.ini
new file mode 100644
index 000000000..36953c2de
--- /dev/null
+++ b/doc/guides/nics/features/af_xdp.ini
@@ -0,0 +1,11 @@
+;
+; Supported features of the 'af_xdp' network poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Link status          = Y
+MTU update           = Y
+Promiscuous mode     = Y
+Stats per queue      = Y
+x86-64               = Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 5c80e3baa..a4b80a3d0 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -12,6 +12,7 @@ Network Interface Controller Drivers
     features
     build_and_test
     af_packet
+    af_xdp
     ark
     atlantic
     avp
diff --git a/doc/guides/rel_notes/release_19_05.rst b/doc/guides/rel_notes/release_19_05.rst
index 61a2c7383..062facf89 100644
--- a/doc/guides/rel_notes/release_19_05.rst
+++ b/doc/guides/rel_notes/release_19_05.rst
@@ -65,6 +65,13 @@ New Features
     process.
   * Added support for Rx packet types list in a secondary process.
 
+* **Added the AF_XDP PMD.**
+
+  Added a Linux-specific PMD driver for AF_XDP. It creates the AF_XDP socket
+  and binds it to a specific netdev queue, allowing a DPDK application to
+  send and receive raw packets through the socket, bypassing the kernel
+  network stack, to achieve high performance packet processing.
+
 * **Updated Mellanox drivers.**
 
    New features and improvements were done in mlx4 and mlx5 PMDs:
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 502869a87..5d401b8c5 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -9,6 +9,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD),d)
 endif
 
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += af_packet
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += af_xdp
 DIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark
 DIRS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD) += atlantic
 DIRS-$(CONFIG_RTE_LIBRTE_AVP_PMD) += avp
diff --git a/drivers/net/af_xdp/Makefile b/drivers/net/af_xdp/Makefile
new file mode 100644
index 000000000..db7d9aa57
--- /dev/null
+++ b/drivers/net/af_xdp/Makefile
@@ -0,0 +1,32 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_af_xdp.a
+
+EXPORT_MAP := rte_pmd_af_xdp_version.map
+
+LIBABIVER := 1
+
+CFLAGS += -O3
+
+# require kernel version >= v5.1-rc1
+CFLAGS += -I$(RTE_KERNELDIR)/tools/include
+CFLAGS += -I$(RTE_KERNELDIR)/tools/lib/bpf
+
+CFLAGS += $(WERROR_FLAGS)
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
+LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
+LDLIBS += -lrte_bus_vdev
+LDLIBS += -lbpf
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += rte_eth_af_xdp.c
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
new file mode 100644
index 000000000..635e67483
--- /dev/null
+++ b/drivers/net/af_xdp/meson.build
@@ -0,0 +1,21 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+if host_machine.system() != 'linux'
+	build = false
+endif
+
+bpf_dep = dependency('libbpf', required: false)
+if bpf_dep.found()
+	build = true
+else
+	bpf_dep = cc.find_library('libbpf', required: false)
+	if bpf_dep.found() and cc.has_header('xsk.h', dependencies: bpf_dep)
+		build = true
+		pkgconfig_extra_libs += '-lbpf'
+	else
+		build = false
+	endif
+endif
+sources = files('rte_eth_af_xdp.c')
+ext_deps += bpf_dep
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
new file mode 100644
index 000000000..5e671670a
--- /dev/null
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -0,0 +1,932 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation.
+ */
+
+#include <rte_mbuf.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+#include <rte_string_fns.h>
+
+#include <linux/if_ether.h>
+#include <linux/if_xdp.h>
+#include <linux/if_link.h>
+#include <asm/barrier.h>
+#include <arpa/inet.h>
+#include <net/if.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <unistd.h>
+#include <poll.h>
+#include <bpf/bpf.h>
+#include <xsk.h>
+
+#define RTE_LOGTYPE_AF_XDP RTE_LOGTYPE_USER1
+#ifndef SOL_XDP
+#define SOL_XDP 283
+#endif
+
+#ifndef AF_XDP
+#define AF_XDP 44
+#endif
+
+#ifndef PF_XDP
+#define PF_XDP AF_XDP
+#endif
+
+#define ETH_AF_XDP_IFACE_ARG			"iface"
+#define ETH_AF_XDP_QUEUE_IDX_ARG		"queue"
+
+#define ETH_AF_XDP_FRAME_SIZE		XSK_UMEM__DEFAULT_FRAME_SIZE
+#define ETH_AF_XDP_NUM_BUFFERS		4096
+#define ETH_AF_XDP_DATA_HEADROOM	0
+#define ETH_AF_XDP_DFLT_NUM_DESCS	XSK_RING_CONS__DEFAULT_NUM_DESCS
+#define ETH_AF_XDP_DFLT_QUEUE_IDX	0
+
+#define ETH_AF_XDP_RX_BATCH_SIZE	32
+#define ETH_AF_XDP_TX_BATCH_SIZE	32
+
+#define ETH_AF_XDP_MAX_QUEUE_PAIRS     16
+
+struct xsk_umem_info {
+	struct xsk_ring_prod fq;
+	struct xsk_ring_cons cq;
+	struct xsk_umem *umem;
+	struct rte_ring *buf_ring;
+	void *buffer;
+};
+
+struct rx_stats {
+	uint64_t rx_pkts;
+	uint64_t rx_bytes;
+	uint64_t rx_dropped;
+};
+
+struct pkt_rx_queue {
+	struct xsk_ring_cons rx;
+	struct xsk_umem_info *umem;
+	struct xsk_socket *xsk;
+	struct rte_mempool *mb_pool;
+
+	struct rx_stats stats;
+
+	struct pkt_tx_queue *pair;
+	uint16_t queue_idx;
+};
+
+struct tx_stats {
+	uint64_t tx_pkts;
+	uint64_t err_pkts;
+	uint64_t tx_bytes;
+};
+
+struct pkt_tx_queue {
+	struct xsk_ring_prod tx;
+
+	struct tx_stats stats;
+
+	struct pkt_rx_queue *pair;
+	uint16_t queue_idx;
+};
+
+struct pmd_internals {
+	int if_index;
+	char if_name[IFNAMSIZ];
+	uint16_t queue_idx;
+	struct ether_addr eth_addr;
+	struct xsk_umem_info *umem;
+	struct rte_mempool *mb_pool_share;
+
+	struct pkt_rx_queue rx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
+	struct pkt_tx_queue tx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
+};
+
+static const char * const valid_arguments[] = {
+	ETH_AF_XDP_IFACE_ARG,
+	ETH_AF_XDP_QUEUE_IDX_ARG,
+	NULL
+};
+
+static struct rte_eth_link pmd_link = {
+	.link_speed = ETH_SPEED_NUM_10G,
+	.link_duplex = ETH_LINK_FULL_DUPLEX,
+	.link_status = ETH_LINK_DOWN,
+	.link_autoneg = ETH_LINK_AUTONEG
+};
+
+static inline int
+reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
+{
+	struct xsk_ring_prod *fq = &umem->fq;
+	uint32_t idx;
+	void *addr = NULL;
+	int i, ret;
+
+	ret = xsk_ring_prod__reserve(fq, reserve_size, &idx);
+	if (!ret) {
+		RTE_LOG(ERR, AF_XDP, "Failed to reserve enough fq descs.\n");
+		return -1;
+	}
+
+	for (i = 0; i < reserve_size; i++) {
+		__u64 *fq_addr;
+		rte_ring_dequeue(umem->buf_ring, &addr);
+		fq_addr = xsk_ring_prod__fill_addr(fq, idx++);
+		*fq_addr = (uint64_t)addr;
+	}
+
+	xsk_ring_prod__submit(fq, reserve_size);
+
+	return 0;
+}
+
+static uint16_t
+eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+	struct pkt_rx_queue *rxq = queue;
+	struct xsk_ring_cons *rx = &rxq->rx;
+	struct xsk_umem_info *umem = rxq->umem;
+	struct xsk_ring_prod *fq = &umem->fq;
+	uint32_t idx_rx;
+	uint32_t free_thresh = fq->size >> 1;
+	struct rte_mbuf *mbuf;
+	unsigned long dropped = 0;
+	unsigned long rx_bytes = 0;
+	uint16_t count = 0;
+	int rcvd, i;
+
+	nb_pkts = RTE_MIN(nb_pkts, ETH_AF_XDP_RX_BATCH_SIZE);
+
+	rcvd = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
+	if (rcvd == 0)
+		return 0;
+
+	if (xsk_prod_nb_free(fq, free_thresh) >= free_thresh)
+		(void)reserve_fill_queue(umem, ETH_AF_XDP_RX_BATCH_SIZE);
+
+	for (i = 0; i < rcvd; i++) {
+		const struct xdp_desc *desc;
+		uint64_t addr;
+		uint32_t len;
+		void *pkt;
+
+		desc = xsk_ring_cons__rx_desc(rx, idx_rx++);
+		addr = desc->addr;
+		len = desc->len;
+		pkt = xsk_umem__get_data(rxq->umem->buffer, addr);
+
+		mbuf = rte_pktmbuf_alloc(rxq->mb_pool);
+		if (mbuf != NULL) {
+			rte_memcpy(rte_pktmbuf_mtod(mbuf, void *), pkt, len);
+			rte_pktmbuf_pkt_len(mbuf) = len;
+			rte_pktmbuf_data_len(mbuf) = len;
+			rx_bytes += len;
+			bufs[count++] = mbuf;
+		} else {
+			dropped++;
+		}
+		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+	}
+
+	xsk_ring_cons__release(rx, rcvd);
+
+	/* statistics */
+	rxq->stats.rx_pkts += (rcvd - dropped);
+	rxq->stats.rx_bytes += rx_bytes;
+	rxq->stats.rx_dropped += dropped;
+
+	return count;
+}
+
+static void pull_umem_cq(struct xsk_umem_info *umem, int size)
+{
+	struct xsk_ring_cons *cq = &umem->cq;
+	size_t i, n;
+	uint32_t idx_cq;
+
+	n = xsk_ring_cons__peek(cq, size, &idx_cq);
+
+	for (i = 0; i < n; i++) {
+		uint64_t addr;
+		addr = *xsk_ring_cons__comp_addr(cq, idx_cq++);
+		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+	}
+
+	xsk_ring_cons__release(cq, n);
+}
+
+static void kick_tx(struct pkt_tx_queue *txq)
+{
+	struct xsk_umem_info *umem = txq->pair->umem;
+
+	while (send(xsk_socket__fd(txq->pair->xsk), NULL,
+		      0, MSG_DONTWAIT) < 0) {
+		/* something unexpected */
+		if (errno != EBUSY && errno != EAGAIN)
+			break;
+
+		/* pull from completion queue to leave more space */
+		if (errno == EAGAIN)
+			pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
+	}
+	pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
+}
+
+static uint16_t
+eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+	struct pkt_tx_queue *txq = queue;
+	struct xsk_umem_info *umem = txq->pair->umem;
+	struct rte_mbuf *mbuf;
+	void *addrs[ETH_AF_XDP_TX_BATCH_SIZE];
+	unsigned long tx_bytes = 0;
+	int i, valid = 0;
+	uint32_t idx_tx;
+
+	nb_pkts = RTE_MIN(nb_pkts, ETH_AF_XDP_TX_BATCH_SIZE);
+
+	pull_umem_cq(umem, nb_pkts);
+
+	nb_pkts = rte_ring_dequeue_bulk(umem->buf_ring, addrs,
+					nb_pkts, NULL);
+	if (nb_pkts == 0)
+		return 0;
+
+	if (xsk_ring_prod__reserve(&txq->tx, nb_pkts, &idx_tx) != nb_pkts) {
+		kick_tx(txq);
+		return 0;
+	}
+
+	for (i = 0; i < nb_pkts; i++) {
+		struct xdp_desc *desc;
+		void *pkt;
+		uint32_t buf_len = ETH_AF_XDP_FRAME_SIZE
+					- ETH_AF_XDP_DATA_HEADROOM;
+		desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i);
+		mbuf = bufs[i];
+		if (mbuf->pkt_len <= buf_len) {
+			desc->addr = (uint64_t)addrs[valid];
+			desc->len = mbuf->pkt_len;
+			pkt = xsk_umem__get_data(umem->buffer,
+						 desc->addr);
+			rte_memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
+			       desc->len);
+			valid++;
+			tx_bytes += mbuf->pkt_len;
+		}
+		rte_pktmbuf_free(mbuf);
+	}
+
+	xsk_ring_prod__submit(&txq->tx, nb_pkts);
+
+	kick_tx(txq);
+
+	if (valid < nb_pkts)
+		rte_ring_enqueue_bulk(umem->buf_ring, &addrs[valid],
+				 nb_pkts - valid, NULL);
+
+	txq->stats.err_pkts += nb_pkts - valid;
+	txq->stats.tx_pkts += valid;
+	txq->stats.tx_bytes += tx_bytes;
+
+	return nb_pkts;
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = ETH_LINK_UP;
+
+	return 0;
+}
+
+/* This function gets called when the current port gets stopped. */
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = ETH_LINK_DOWN;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
+{
+	/* rx/tx must be paired */
+	if (dev->data->nb_rx_queues != dev->data->nb_tx_queues)
+		return -EINVAL;
+
+	return 0;
+}
+
+static void
+eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	dev_info->if_index = internals->if_index;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_pktlen = ETH_FRAME_LEN;
+	dev_info->max_rx_queues = 1;
+	dev_info->max_tx_queues = 1;
+
+	dev_info->default_rxportconf.nb_queues = 1;
+	dev_info->default_txportconf.nb_queues = 1;
+	dev_info->default_rxportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
+	dev_info->default_txportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
+}
+
+static int
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct xdp_statistics xdp_stats;
+	struct pkt_rx_queue *rxq;
+	socklen_t optlen;
+	int i, ret;
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		optlen = sizeof(struct xdp_statistics);
+		rxq = &internals->rx_queues[i];
+		stats->q_ipackets[i] = internals->rx_queues[i].stats.rx_pkts;
+		stats->q_ibytes[i] = internals->rx_queues[i].stats.rx_bytes;
+
+		stats->q_opackets[i] = internals->tx_queues[i].stats.tx_pkts;
+		stats->q_obytes[i] = internals->tx_queues[i].stats.tx_bytes;
+
+		stats->ipackets += stats->q_ipackets[i];
+		stats->ibytes += stats->q_ibytes[i];
+		stats->imissed += internals->rx_queues[i].stats.rx_dropped;
+		ret = getsockopt(xsk_socket__fd(rxq->xsk), SOL_XDP,
+				XDP_STATISTICS, &xdp_stats, &optlen);
+		if (ret != 0) {
+			RTE_LOG(ERR, AF_XDP, "getsockopt() failed for XDP_STATISTICS.\n");
+			return -1;
+		}
+		stats->imissed += xdp_stats.rx_dropped;
+
+		stats->opackets += stats->q_opackets[i];
+		stats->oerrors += internals->tx_queues[i].stats.err_pkts;
+		stats->obytes += stats->q_obytes[i];
+	}
+
+	return 0;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	int i;
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		memset(&internals->rx_queues[i].stats, 0,
+					sizeof(struct rx_stats));
+		memset(&internals->tx_queues[i].stats, 0,
+					sizeof(struct tx_stats));
+	}
+}
+
+static void remove_xdp_program(struct pmd_internals *internals)
+{
+	uint32_t curr_prog_id = 0;
+
+	if (bpf_get_link_xdp_id(internals->if_index, &curr_prog_id,
+				XDP_FLAGS_UPDATE_IF_NOEXIST)) {
+		RTE_LOG(ERR, AF_XDP, "bpf_get_link_xdp_id failed\n");
+		return;
+	}
+	bpf_set_link_xdp_fd(internals->if_index, -1,
+			XDP_FLAGS_UPDATE_IF_NOEXIST);
+}
+
+static void
+eth_dev_close(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct pkt_rx_queue *rxq;
+	int i;
+
+	RTE_LOG(INFO, AF_XDP, "Closing AF_XDP ethdev on numa socket %u\n",
+		rte_socket_id());
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		rxq = &internals->rx_queues[i];
+		if (rxq->umem == NULL)
+			break;
+		xsk_socket__delete(rxq->xsk);
+	}
+
+	(void)xsk_umem__delete(internals->umem->umem);
+	remove_xdp_program(internals);
+}
+
+static void
+eth_queue_release(void *q __rte_unused)
+{
+}
+
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+		int wait_to_complete __rte_unused)
+{
+	return 0;
+}
+
+static void xdp_umem_destroy(struct xsk_umem_info *umem)
+{
+	free(umem->buffer);
+	umem->buffer = NULL;
+
+	rte_ring_free(umem->buf_ring);
+	umem->buf_ring = NULL;
+
+	free(umem);
+	umem = NULL;
+}
+
+static struct xsk_umem_info *xdp_umem_configure(void)
+{
+	struct xsk_umem_info *umem;
+	struct xsk_umem_config usr_config = {
+		.fill_size = ETH_AF_XDP_DFLT_NUM_DESCS,
+		.comp_size = ETH_AF_XDP_DFLT_NUM_DESCS,
+		.frame_size = ETH_AF_XDP_FRAME_SIZE,
+		.frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
+	void *bufs = NULL;
+	int ret;
+	uint64_t i;
+
+	umem = rte_zmalloc_socket("umem", sizeof(*umem), 0, rte_socket_id());
+	if (umem == NULL) {
+		RTE_LOG(ERR, AF_XDP, "Failed to allocate umem info");
+		return NULL;
+	}
+
+	umem->buf_ring = rte_ring_create("af_xdp_ring",
+					 ETH_AF_XDP_NUM_BUFFERS,
+					 SOCKET_ID_ANY,
+					 0x0);
+	if (umem->buf_ring == NULL) {
+		RTE_LOG(ERR, AF_XDP,
+			"Failed to create rte_ring\n");
+		goto err;
+	}
+
+	for (i = 0; i < ETH_AF_XDP_NUM_BUFFERS; i++)
+		rte_ring_enqueue(umem->buf_ring,
+				 (void *)(i * ETH_AF_XDP_FRAME_SIZE +
+					  ETH_AF_XDP_DATA_HEADROOM));
+
+	if (posix_memalign(&bufs, getpagesize(),
+			   ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE)) {
+		RTE_LOG(ERR, AF_XDP, "Failed to allocate memory pool.\n");
+		goto err;
+	}
+	ret = xsk_umem__create(&umem->umem, bufs,
+			       ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
+			       &umem->fq, &umem->cq,
+			       &usr_config);
+
+	if (ret) {
+		RTE_LOG(ERR, AF_XDP, "Failed to create umem");
+		goto err;
+	}
+	umem->buffer = bufs;
+
+	return umem;
+
+err:
+	xdp_umem_destroy(umem);
+	return NULL;
+}
+
+static int
+xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
+	      int ring_size)
+{
+	struct xsk_socket_config cfg;
+	struct pkt_tx_queue *txq = rxq->pair;
+	int ret = 0;
+	int reserve_size;
+
+	rxq->umem = xdp_umem_configure();
+	if (rxq->umem == NULL) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	cfg.rx_size = ring_size;
+	cfg.tx_size = ring_size;
+	cfg.libbpf_flags = 0;
+	cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
+	cfg.bind_flags = 0;
+	ret = xsk_socket__create(&rxq->xsk, internals->if_name,
+			internals->queue_idx, rxq->umem->umem, &rxq->rx,
+			&txq->tx, &cfg);
+	if (ret) {
+		RTE_LOG(ERR, AF_XDP, "Failed to create xsk socket.\n");
+		goto err;
+	}
+
+	reserve_size = ETH_AF_XDP_DFLT_NUM_DESCS / 2;
+	ret = reserve_fill_queue(rxq->umem, reserve_size);
+	if (ret) {
+		RTE_LOG(ERR, AF_XDP, "Failed to reserve fill queue.\n");
+		goto err;
+	}
+
+	return 0;
+
+err:
+	xdp_umem_destroy(rxq->umem);
+
+	return ret;
+}
+
+static void
+queue_reset(struct pmd_internals *internals, uint16_t queue_idx)
+{
+	struct pkt_rx_queue *rxq = &internals->rx_queues[queue_idx];
+	struct pkt_tx_queue *txq = rxq->pair;
+	int xsk_fd = xsk_socket__fd(rxq->xsk);
+
+	if (xsk_fd) {
+		close(xsk_fd);
+		if (internals->umem != NULL) {
+			xdp_umem_destroy(internals->umem);
+			internals->umem = NULL;
+		}
+	}
+	memset(rxq, 0, sizeof(*rxq));
+	memset(txq, 0, sizeof(*txq));
+	rxq->pair = txq;
+	txq->pair = rxq;
+	rxq->queue_idx = queue_idx;
+	txq->queue_idx = queue_idx;
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev,
+		   uint16_t rx_queue_id,
+		   uint16_t nb_rx_desc,
+		   unsigned int socket_id __rte_unused,
+		   const struct rte_eth_rxconf *rx_conf __rte_unused,
+		   struct rte_mempool *mb_pool)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	uint32_t buf_size, data_size;
+	struct pkt_rx_queue *rxq;
+	int ret;
+
+	rxq = &internals->rx_queues[rx_queue_id];
+	queue_reset(internals, rx_queue_id);
+
+	/* Now get the space available for data in the mbuf */
+	buf_size = rte_pktmbuf_data_room_size(mb_pool) -
+		RTE_PKTMBUF_HEADROOM;
+	data_size = ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_DATA_HEADROOM;
+
+	if (data_size > buf_size) {
+		RTE_LOG(ERR, AF_XDP,
+			"%s: %d bytes will not fit in mbuf (%d bytes)\n",
+			dev->device->name, data_size, buf_size);
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	rxq->mb_pool = mb_pool;
+
+	if (xsk_configure(internals, rxq, nb_rx_desc)) {
+		RTE_LOG(ERR, AF_XDP,
+			"Failed to configure xdp socket\n");
+		ret = -EINVAL;
+		goto err;
+	}
+
+	internals->umem = rxq->umem;
+
+	dev->data->rx_queues[rx_queue_id] = rxq;
+	return 0;
+
+err:
+	queue_reset(internals, rx_queue_id);
+	return ret;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev,
+		   uint16_t tx_queue_id,
+		   uint16_t nb_tx_desc __rte_unused,
+		   unsigned int socket_id __rte_unused,
+		   const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct pkt_tx_queue *txq;
+
+	txq = &internals->tx_queues[tx_queue_id];
+
+	dev->data->tx_queues[tx_queue_id] = txq;
+	return 0;
+}
+
+static int
+eth_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct ifreq ifr = { .ifr_mtu = mtu };
+	int ret;
+	int s;
+
+	s = socket(PF_INET, SOCK_DGRAM, 0);
+	if (s < 0)
+		return -EINVAL;
+
+	strlcpy(ifr.ifr_name, internals->if_name, IFNAMSIZ);
+	ret = ioctl(s, SIOCSIFMTU, &ifr);
+	close(s);
+
+	if (ret < 0)
+		return -EINVAL;
+
+	return 0;
+}
+
+static void
+eth_dev_change_flags(char *if_name, uint32_t flags, uint32_t mask)
+{
+	struct ifreq ifr;
+	int s;
+
+	s = socket(PF_INET, SOCK_DGRAM, 0);
+	if (s < 0)
+		return;
+
+	strlcpy(ifr.ifr_name, if_name, IFNAMSIZ);
+	if (ioctl(s, SIOCGIFFLAGS, &ifr) < 0)
+		goto out;
+	ifr.ifr_flags &= mask;
+	ifr.ifr_flags |= flags;
+	if (ioctl(s, SIOCSIFFLAGS, &ifr) < 0)
+		goto out;
+out:
+	close(s);
+}
+
+static void
+eth_dev_promiscuous_enable(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	eth_dev_change_flags(internals->if_name, IFF_PROMISC, ~0);
+}
+
+static void
+eth_dev_promiscuous_disable(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	eth_dev_change_flags(internals->if_name, 0, ~IFF_PROMISC);
+}
+
+static const struct eth_dev_ops ops = {
+	.dev_start = eth_dev_start,
+	.dev_stop = eth_dev_stop,
+	.dev_close = eth_dev_close,
+	.dev_configure = eth_dev_configure,
+	.dev_infos_get = eth_dev_info,
+	.mtu_set = eth_dev_mtu_set,
+	.promiscuous_enable = eth_dev_promiscuous_enable,
+	.promiscuous_disable = eth_dev_promiscuous_disable,
+	.rx_queue_setup = eth_rx_queue_setup,
+	.tx_queue_setup = eth_tx_queue_setup,
+	.rx_queue_release = eth_queue_release,
+	.tx_queue_release = eth_queue_release,
+	.link_update = eth_link_update,
+	.stats_get = eth_stats_get,
+	.stats_reset = eth_stats_reset,
+};
+
+/** parse integer from string argument */
+static int
+parse_integer_arg(const char *key __rte_unused,
+		  const char *value, void *extra_args)
+{
+	int *i = (int *)extra_args;
+	char *end;
+
+	*i = strtol(value, &end, 10);
+	if (*i < 0) {
+		RTE_LOG(ERR, AF_XDP, "Argument has to be positive.\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/** parse name argument */
+static int
+parse_name_arg(const char *key __rte_unused,
+	       const char *value, void *extra_args)
+{
+	char *name = extra_args;
+
+	if (strnlen(value, IFNAMSIZ) > IFNAMSIZ - 1) {
+		RTE_LOG(ERR, AF_XDP, "Invalid name %s, should be less than "
+			"%u bytes.\n", value, IFNAMSIZ);
+		return -EINVAL;
+	}
+
+	strlcpy(name, value, IFNAMSIZ);
+
+	return 0;
+}
+
+static int
+parse_parameters(struct rte_kvargs *kvlist,
+		 char *if_name,
+		 int *queue_idx)
+{
+	int ret;
+
+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_IFACE_ARG,
+				 &parse_name_arg, if_name);
+	if (ret < 0)
+		goto free_kvlist;
+
+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_QUEUE_IDX_ARG,
+				 &parse_integer_arg, queue_idx);
+	if (ret < 0)
+		goto free_kvlist;
+
+free_kvlist:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
+static int
+get_iface_info(const char *if_name,
+	       struct ether_addr *eth_addr,
+	       int *if_index)
+{
+	struct ifreq ifr;
+	int sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_IP);
+
+	if (sock < 0)
+		return -1;
+
+	strlcpy(ifr.ifr_name, if_name, IFNAMSIZ);
+	if (ioctl(sock, SIOCGIFINDEX, &ifr))
+		goto error;
+
+	if (ioctl(sock, SIOCGIFHWADDR, &ifr))
+		goto error;
+
+	rte_memcpy(eth_addr, ifr.ifr_hwaddr.sa_data, ETHER_ADDR_LEN);
+
+	close(sock);
+	*if_index = if_nametoindex(if_name);
+	return 0;
+
+error:
+	close(sock);
+	return -1;
+}
+
+static struct rte_eth_dev *
+init_internals(struct rte_vdev_device *dev,
+	       const char *if_name,
+	       int queue_idx)
+{
+	const char *name = rte_vdev_device_name(dev);
+	const unsigned int numa_node = dev->device.numa_node;
+	struct pmd_internals *internals;
+	struct rte_eth_dev *eth_dev;
+	int ret;
+	int i;
+
+	internals = rte_zmalloc_socket(name, sizeof(*internals), 0, numa_node);
+	if (internals == NULL)
+		return NULL;
+
+	internals->queue_idx = queue_idx;
+	strlcpy(internals->if_name, if_name, IFNAMSIZ);
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		internals->tx_queues[i].pair = &internals->rx_queues[i];
+		internals->rx_queues[i].pair = &internals->tx_queues[i];
+	}
+
+	ret = get_iface_info(if_name, &internals->eth_addr,
+			     &internals->if_index);
+	if (ret)
+		goto err;
+
+	eth_dev = rte_eth_vdev_allocate(dev, 0);
+	if (eth_dev == NULL)
+		goto err;
+
+	eth_dev->data->dev_private = internals;
+	eth_dev->data->dev_link = pmd_link;
+	eth_dev->data->mac_addrs = &internals->eth_addr;
+	eth_dev->dev_ops = &ops;
+	eth_dev->rx_pkt_burst = eth_af_xdp_rx;
+	eth_dev->tx_pkt_burst = eth_af_xdp_tx;
+
+	return eth_dev;
+
+err:
+	rte_free(internals);
+	return NULL;
+}
+
+static int
+rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
+{
+	struct rte_kvargs *kvlist;
+	char if_name[IFNAMSIZ];
+	int xsk_queue_idx = ETH_AF_XDP_DFLT_QUEUE_IDX;
+	struct rte_eth_dev *eth_dev = NULL;
+	const char *name;
+
+	RTE_LOG(INFO, AF_XDP, "Initializing pmd_af_xdp for %s\n",
+		rte_vdev_device_name(dev));
+
+	name = rte_vdev_device_name(dev);
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY &&
+		strlen(rte_vdev_device_args(dev)) == 0) {
+		eth_dev = rte_eth_dev_attach_secondary(name);
+		if (eth_dev == NULL) {
+			RTE_LOG(ERR, AF_XDP, "Failed to probe %s\n", name);
+			return -EINVAL;
+		}
+		eth_dev->dev_ops = &ops;
+		rte_eth_dev_probing_finish(eth_dev);
+		return 0;
+	}
+
+	kvlist = rte_kvargs_parse(rte_vdev_device_args(dev), valid_arguments);
+	if (kvlist == NULL) {
+		RTE_LOG(ERR, AF_XDP, "Invalid kvargs key\n");
+		return -EINVAL;
+	}
+
+	if (dev->device.numa_node == SOCKET_ID_ANY)
+		dev->device.numa_node = rte_socket_id();
+
+	if (parse_parameters(kvlist, if_name, &xsk_queue_idx) < 0) {
+		RTE_LOG(ERR, AF_XDP, "Invalid kvargs value\n");
+		return -EINVAL;
+	}
+
+	eth_dev = init_internals(dev, if_name, xsk_queue_idx);
+	if (eth_dev == NULL) {
+		RTE_LOG(ERR, AF_XDP, "Failed to init internals\n");
+		return -1;
+	}
+
+	rte_eth_dev_probing_finish(eth_dev);
+
+	return 0;
+}
+
+static int
+rte_pmd_af_xdp_remove(struct rte_vdev_device *dev)
+{
+	struct rte_eth_dev *eth_dev = NULL;
+	struct pmd_internals *internals;
+
+	RTE_LOG(INFO, AF_XDP, "Removing AF_XDP ethdev on numa socket %u\n",
+		rte_socket_id());
+
+	if (dev == NULL)
+		return -1;
+
+	/* find the ethdev entry */
+	eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(dev));
+	if (eth_dev == NULL)
+		return -1;
+
+	internals = eth_dev->data->dev_private;
+
+	rte_ring_free(internals->umem->buf_ring);
+	rte_free(internals->umem->buffer);
+	rte_free(internals->umem);
+
+	rte_eth_dev_release_port(eth_dev);
+
+
+	return 0;
+}
+
+static struct rte_vdev_driver pmd_af_xdp_drv = {
+	.probe = rte_pmd_af_xdp_probe,
+	.remove = rte_pmd_af_xdp_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(eth_af_xdp, pmd_af_xdp_drv);
+RTE_PMD_REGISTER_PARAM_STRING(eth_af_xdp,
+			      "iface=<string> "
+			      "queue=<int> ");
diff --git a/drivers/net/af_xdp/rte_pmd_af_xdp_version.map b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
new file mode 100644
index 000000000..c6db030fe
--- /dev/null
+++ b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
@@ -0,0 +1,3 @@
+DPDK_19.05 {
+	local: *;
+};
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index 3ecc78cee..1105e72d8 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2017 Intel Corporation
 
 drivers = ['af_packet',
+	'af_xdp',
 	'ark',
 	'atlantic',
 	'avp',
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 262132fc6..be0af73cc 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -143,6 +143,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_DPAA2_MEMPOOL)  += -lrte_mempool_dpaa2
 endif
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     += -lrte_pmd_af_xdp -lelf -lbpf
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ARK_PMD)        += -lrte_pmd_ark
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD)   += -lrte_pmd_atlantic
 _LDLIBS-$(CONFIG_RTE_LIBRTE_AVP_PMD)        += -lrte_pmd_avp
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* [PATCH v3 2/5] lib/mbuf: introduce helper to create mempool with flags
  2019-03-21  9:18 ` [PATCH v3 0/5] Introduce AF_XDP PMD Xiaolong Ye
  2019-03-21  9:18   ` [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
@ 2019-03-21  9:18   ` Xiaolong Ye
  2019-03-21  9:18   ` [PATCH v3 3/5] lib/mempool: allow page size aligned mempool Xiaolong Ye
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-21  9:18 UTC (permalink / raw)
  To: dev; +Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Xiaolong Ye

This allows applications to create an mbuf mempool with specific flags,
such as MEMPOOL_F_NO_SPREAD, if they want fixed size memory objects.
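
A usage sketch of the new helper (the name and sizes are illustrative):

    struct rte_mempool *mp;

    /* fixed-size, non-spread objects, e.g. for af_xdp's umem */
    mp = rte_pktmbuf_pool_create_with_flags("fixed_pool", 4096, 250, 0,
                    RTE_MBUF_DEFAULT_BUF_SIZE,
                    MEMPOOL_F_NO_SPREAD, SOCKET_ID_ANY);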

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 lib/librte_mbuf/rte_mbuf.c | 29 +++++++++++++++++++-----
 lib/librte_mbuf/rte_mbuf.h | 45 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+), 5 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 21f6f7404..c1db9e298 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -106,11 +106,10 @@ rte_pktmbuf_init(struct rte_mempool *mp,
 	m->next = NULL;
 }
 
-/* Helper to create a mbuf pool with given mempool ops name*/
-struct rte_mempool *
-rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
+static struct rte_mempool *
+rte_pktmbuf_pool_create_by_ops_with_flags(const char *name, unsigned int n,
 	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
-	int socket_id, const char *ops_name)
+	unsigned int flags, int socket_id, const char *ops_name)
 {
 	struct rte_mempool *mp;
 	struct rte_pktmbuf_pool_private mbp_priv;
@@ -130,7 +129,7 @@ rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
 	mbp_priv.mbuf_priv_size = priv_size;
 
 	mp = rte_mempool_create_empty(name, n, elt_size, cache_size,
-		 sizeof(struct rte_pktmbuf_pool_private), socket_id, 0);
+		 sizeof(struct rte_pktmbuf_pool_private), socket_id, flags);
 	if (mp == NULL)
 		return NULL;
 
@@ -157,6 +156,16 @@ rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
 	return mp;
 }
 
+/* Helper to create a mbuf pool with given mempool ops name*/
+struct rte_mempool *
+rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
+	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
+	int socket_id, const char *ops_name)
+{
+	return rte_pktmbuf_pool_create_by_ops_with_flags(name, n, cache_size,
+			 priv_size, data_room_size, 0, socket_id, ops_name);
+}
+
 /* helper to create a mbuf pool */
 struct rte_mempool *
 rte_pktmbuf_pool_create(const char *name, unsigned int n,
@@ -167,6 +176,16 @@ rte_pktmbuf_pool_create(const char *name, unsigned int n,
 			data_room_size, socket_id, NULL);
 }
 
+/* helper to create a mbuf pool with flags (e.g. NO_SPREAD) */
+struct rte_mempool * __rte_experimental
+rte_pktmbuf_pool_create_with_flags(const char *name, unsigned int n,
+	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
+	unsigned int flags, int socket_id)
+{
+	return rte_pktmbuf_pool_create_by_ops_with_flags(name, n, cache_size,
+			priv_size, data_room_size, flags, socket_id, NULL);
+}
+
 /* do some sanity checks on a mbuf: panic if it fails */
 void
 rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header)
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index d961ccaf6..105ead6de 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1266,6 +1266,51 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
 	unsigned cache_size, uint16_t priv_size, uint16_t data_room_size,
 	int socket_id);
 
+/**
+ * Create a mbuf pool with flags.
+ *
+ * This function creates and initializes a packet mbuf pool. It is
+ * a wrapper to rte_mempool functions.
+ *
+ * @warning
+ * @b EXPERIMENTAL: This API may change without prior notice.
+ *
+ * @param name
+ *   The name of the mbuf pool.
+ * @param n
+ *   The number of elements in the mbuf pool. The optimum size (in terms
+ *   of memory usage) for a mempool is when n is a power of two minus one:
+ *   n = (2^q - 1).
+ * @param cache_size
+ *   Size of the per-core object cache. See rte_mempool_create() for
+ *   details.
+ * @param priv_size
+ *   Size of application private area between the rte_mbuf structure
+ *   and the data buffer. This value must be aligned to RTE_MBUF_PRIV_ALIGN.
+ * @param data_room_size
+ *   Size of data buffer in each mbuf, including RTE_PKTMBUF_HEADROOM.
+ * @param flags
+ *   Flags controlling the behavior of the mempool. See
+ *   rte_mempool_create() for details.
+ * @param socket_id
+ *   The socket identifier where the memory should be allocated. The
+ *   value can be *SOCKET_ID_ANY* if there is no NUMA constraint for the
+ *   reserved zone.
+ * @return
+ *   The pointer to the new allocated mempool, on success. NULL on error
+ *   with rte_errno set appropriately. Possible rte_errno values include:
+ *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
+ *    - E_RTE_SECONDARY - function was called from a secondary process instance
+ *    - EINVAL - cache size provided is too large, or priv_size is not aligned.
+ *    - ENOSPC - the maximum number of memzones has already been allocated
+ *    - EEXIST - a memzone with the same name already exists
+ *    - ENOMEM - no appropriate memory area found in which to create memzone
+ */
+struct rte_mempool * __rte_experimental
+rte_pktmbuf_pool_create_with_flags(const char *name, unsigned int n,
+	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
+	unsigned int flags, int socket_id);
+
 /**
  * Create a mbuf pool with a given mempool ops name
  *
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* [PATCH v3 3/5] lib/mempool: allow page size aligned mempool
  2019-03-21  9:18 ` [PATCH v3 0/5] Introduce AF_XDP PMD Xiaolong Ye
  2019-03-21  9:18   ` [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
  2019-03-21  9:18   ` [PATCH v3 2/5] lib/mbuf: introduce helper to create mempool with flags Xiaolong Ye
@ 2019-03-21  9:18   ` Xiaolong Ye
  2019-03-21 14:00     ` Ananyev, Konstantin
  2019-03-21  9:18   ` [PATCH v3 4/5] net/af_xdp: use mbuf mempool for buffer management Xiaolong Ye
  2019-03-21  9:18   ` [PATCH v3 5/5] net/af_xdp: enable zero copy Xiaolong Ye
  4 siblings, 1 reply; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-21  9:18 UTC (permalink / raw)
  To: dev; +Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Xiaolong Ye

Allow creating a mempool with a page size aligned base address.
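
Together with the mbuf helper from patch 2/5, this lets the af_xdp PMD
request a page aligned, non-spread pool, roughly as patch 4/5 does:

    umem->mb_pool = rte_pktmbuf_pool_create_with_flags("af_xdp_mempool",
                    ETH_AF_XDP_NUM_BUFFERS, 250, 0,
                    ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_MBUF_OVERHEAD,
                    MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN,
                    SOCKET_ID_ANY);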

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 lib/librte_mempool/rte_mempool.c | 3 +++
 lib/librte_mempool/rte_mempool.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 683b216f9..33ab6a2b4 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -543,6 +543,9 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 		if (try_contig)
 			flags |= RTE_MEMZONE_IOVA_CONTIG;
 
+		if (mp->flags & MEMPOOL_F_PAGE_ALIGN)
+			align = getpagesize();
+
 		mz = rte_memzone_reserve_aligned(mz_name, mem_size,
 				mp->socket_id, flags, align);
 
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 7c9cd9a2f..75553b36f 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -264,6 +264,7 @@ struct rte_mempool {
 #define MEMPOOL_F_POOL_CREATED   0x0010 /**< Internal: pool is created. */
 #define MEMPOOL_F_NO_IOVA_CONTIG 0x0020 /**< Don't need IOVA contiguous objs. */
 #define MEMPOOL_F_NO_PHYS_CONTIG MEMPOOL_F_NO_IOVA_CONTIG /* deprecated */
+#define MEMPOOL_F_PAGE_ALIGN     0x0040 /**< Chunk's base address is page aligned */
 
 /**
  * @internal When debug is enabled, store some statistics.
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* [PATCH v3 4/5] net/af_xdp: use mbuf mempool for buffer management
  2019-03-21  9:18 ` [PATCH v3 0/5] Introduce AF_XDP PMD Xiaolong Ye
                     ` (2 preceding siblings ...)
  2019-03-21  9:18   ` [PATCH v3 3/5] lib/mempool: allow page size aligned mempool Xiaolong Ye
@ 2019-03-21  9:18   ` Xiaolong Ye
  2019-03-21  9:18   ` [PATCH v3 5/5] net/af_xdp: enable zero copy Xiaolong Ye
  4 siblings, 0 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-21  9:18 UTC (permalink / raw)
  To: dev; +Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Xiaolong Ye

Now the af_xdp registered memory buffer is managed by rte_mempool. An mbuf
allocated from rte_mempool can be converted to an xdp_desc's address and
vice versa.
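
A worked example of the conversion, using the defaults in this patch
(2048 B frames; 64 B mempool object header, 128 B struct rte_mbuf,
128 B RTE_PKTMBUF_HEADROOM):

    /* layout of one 2 KB umem frame, frame k = buffer + k * 2048:
     *
     *   +0    64 B  mempool object header
     *   +64   128 B struct rte_mbuf
     *   +192  128 B RTE_PKTMBUF_HEADROOM
     *   +320        packet data  (the xdp_desc address)
     *
     * addr_to_mbuf(): offset = (addr / 2048) * 2048
     *                 mbuf = buffer + offset + 192 - sizeof(struct rte_mbuf)
     *                      = buffer + offset + 64
     * mbuf_to_addr(): addr = mbuf->buf_addr + mbuf->data_off - buffer
     *                      = offset + 192 + 128 = offset + 320
     */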

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 drivers/net/af_xdp/rte_eth_af_xdp.c | 123 +++++++++++++++++-----------
 1 file changed, 77 insertions(+), 46 deletions(-)

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 5e671670a..76783c33d 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -43,7 +43,11 @@
 
 #define ETH_AF_XDP_FRAME_SIZE		XSK_UMEM__DEFAULT_FRAME_SIZE
 #define ETH_AF_XDP_NUM_BUFFERS		4096
-#define ETH_AF_XDP_DATA_HEADROOM	0
+/* mempool hdrobj size (64 bytes) + sizeof(struct rte_mbuf) (128 bytes) */
+#define ETH_AF_XDP_MBUF_OVERHEAD	192
+/* data start from offset 320 (192 + 128) bytes */
+#define ETH_AF_XDP_DATA_HEADROOM				\
+	(ETH_AF_XDP_MBUF_OVERHEAD + RTE_PKTMBUF_HEADROOM)
 #define ETH_AF_XDP_DFLT_NUM_DESCS	XSK_RING_CONS__DEFAULT_NUM_DESCS
 #define ETH_AF_XDP_DFLT_QUEUE_IDX	0
 
@@ -56,7 +60,7 @@ struct xsk_umem_info {
 	struct xsk_ring_prod fq;
 	struct xsk_ring_cons cq;
 	struct xsk_umem *umem;
-	struct rte_ring *buf_ring;
+	struct rte_mempool *mb_pool;
 	void *buffer;
 };
 
@@ -118,12 +122,32 @@ static struct rte_eth_link pmd_link = {
 	.link_autoneg = ETH_LINK_AUTONEG
 };
 
+static inline struct rte_mbuf *
+addr_to_mbuf(struct xsk_umem_info *umem, uint64_t addr)
+{
+	uint64_t offset = (addr / ETH_AF_XDP_FRAME_SIZE *
+			ETH_AF_XDP_FRAME_SIZE);
+	struct rte_mbuf *mbuf = (struct rte_mbuf *)((uint64_t)umem->buffer +
+				    offset + ETH_AF_XDP_MBUF_OVERHEAD -
+				    sizeof(struct rte_mbuf));
+	mbuf->data_off = addr - offset - ETH_AF_XDP_MBUF_OVERHEAD;
+	return mbuf;
+}
+
+static inline uint64_t
+mbuf_to_addr(struct xsk_umem_info *umem, struct rte_mbuf *mbuf)
+{
+	return (uint64_t)mbuf->buf_addr + mbuf->data_off -
+		(uint64_t)umem->buffer;
+}
+
 static inline int
 reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
 {
 	struct xsk_ring_prod *fq = &umem->fq;
+	struct rte_mbuf *mbuf;
 	uint32_t idx;
-	void *addr = NULL;
+	uint64_t addr;
 	int i, ret;
 
 	ret = xsk_ring_prod__reserve(fq, reserve_size, &idx);
@@ -134,12 +158,17 @@ reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
 
 	for (i = 0; i < reserve_size; i++) {
 		__u64 *fq_addr;
-		rte_ring_dequeue(umem->buf_ring, &addr);
+		mbuf = rte_pktmbuf_alloc(umem->mb_pool);
+		if (mbuf == NULL) {
+			i--;
+			break;
+		}
+		addr = mbuf_to_addr(umem, mbuf);
 		fq_addr = xsk_ring_prod__fill_addr(fq, idx++);
-		*fq_addr = (uint64_t)addr;
+		*fq_addr = addr;
 	}
 
-	xsk_ring_prod__submit(fq, reserve_size);
+	xsk_ring_prod__submit(fq, i);
 
 	return 0;
 }
@@ -189,7 +218,7 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		} else {
 			dropped++;
 		}
-		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+		rte_pktmbuf_free(addr_to_mbuf(umem, addr));
 	}
 
 	xsk_ring_cons__release(rx, rcvd);
@@ -213,7 +242,7 @@ static void pull_umem_cq(struct xsk_umem_info *umem, int size)
 	for (i = 0; i < n; i++) {
 		uint64_t addr;
 		addr = *xsk_ring_cons__comp_addr(cq, idx_cq++);
-		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+		rte_pktmbuf_free(addr_to_mbuf(umem, addr));
 	}
 
 	xsk_ring_cons__release(cq, n);
@@ -242,7 +271,7 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	struct pkt_tx_queue *txq = queue;
 	struct xsk_umem_info *umem = txq->pair->umem;
 	struct rte_mbuf *mbuf;
-	void *addrs[ETH_AF_XDP_TX_BATCH_SIZE];
+	struct rte_mbuf *mbuf_to_tx;
 	unsigned long tx_bytes = 0;
 	int i, valid = 0;
 	uint32_t idx_tx;
@@ -251,11 +280,6 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 
 	pull_umem_cq(umem, nb_pkts);
 
-	nb_pkts = rte_ring_dequeue_bulk(umem->buf_ring, addrs,
-					nb_pkts, NULL);
-	if (nb_pkts == 0)
-		return 0;
-
 	if (xsk_ring_prod__reserve(&txq->tx, nb_pkts, &idx_tx) != nb_pkts) {
 		kick_tx(txq);
 		return 0;
@@ -269,7 +293,12 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i);
 		mbuf = bufs[i];
 		if (mbuf->pkt_len <= buf_len) {
-			desc->addr = (uint64_t)addrs[valid];
+			mbuf_to_tx = rte_pktmbuf_alloc(umem->mb_pool);
+			if (mbuf_to_tx == NULL) {
+				rte_pktmbuf_free(mbuf);
+				continue;
+			}
+			desc->addr = mbuf_to_addr(umem, mbuf_to_tx);
 			desc->len = mbuf->pkt_len;
 			pkt = xsk_umem__get_data(umem->buffer,
 						 desc->addr);
@@ -285,10 +314,6 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 
 	kick_tx(txq);
 
-	if (valid < nb_pkts)
-		rte_ring_enqueue_bulk(umem->buf_ring, &addrs[valid],
-				 nb_pkts - valid, NULL);
-
 	txq->stats.err_pkts += nb_pkts - valid;
 	txq->stats.tx_pkts += valid;
 	txq->stats.tx_bytes += tx_bytes;
@@ -437,16 +462,29 @@ eth_link_update(struct rte_eth_dev *dev __rte_unused,
 
 static void xdp_umem_destroy(struct xsk_umem_info *umem)
 {
-	free(umem->buffer);
-	umem->buffer = NULL;
-
-	rte_ring_free(umem->buf_ring);
-	umem->buf_ring = NULL;
+	rte_mempool_free(umem->mb_pool);
+	umem->mb_pool = NULL;
 
 	free(umem);
 	umem = NULL;
 }
 
+static inline uint64_t get_base_addr(struct rte_mempool *mp)
+{
+	struct rte_mempool_memhdr *memhdr;
+
+	memhdr = STAILQ_FIRST(&mp->mem_list);
+	return (uint64_t)(memhdr->addr);
+}
+
+static inline uint64_t get_len(struct rte_mempool *mp)
+{
+	struct rte_mempool_memhdr *memhdr;
+
+	memhdr = STAILQ_FIRST(&mp->mem_list);
+	return (uint64_t)(memhdr->len);
+}
+
 static struct xsk_umem_info *xdp_umem_configure(void)
 {
 	struct xsk_umem_info *umem;
@@ -455,9 +493,8 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 		.comp_size = ETH_AF_XDP_DFLT_NUM_DESCS,
 		.frame_size = ETH_AF_XDP_FRAME_SIZE,
 		.frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
-	void *bufs = NULL;
+	void *base_addr = NULL;
 	int ret;
-	uint64_t i;
 
 	umem = rte_zmalloc_socket("umem", sizeof(*umem), 0, rte_socket_id());
 	if (umem == NULL) {
@@ -465,27 +502,22 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 		return NULL;
 	}
 
-	umem->buf_ring = rte_ring_create("af_xdp_ring",
-					 ETH_AF_XDP_NUM_BUFFERS,
-					 SOCKET_ID_ANY,
-					 0x0);
-	if (umem->buf_ring == NULL) {
+	umem->mb_pool = rte_pktmbuf_pool_create_with_flags("af_xdp_mempool",
+			ETH_AF_XDP_NUM_BUFFERS,
+			250, 0,
+			ETH_AF_XDP_FRAME_SIZE -
+			ETH_AF_XDP_MBUF_OVERHEAD,
+			MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN,
+			SOCKET_ID_ANY);
+
+	if (umem->mb_pool == NULL || umem->mb_pool->nb_mem_chunks != 1) {
 		RTE_LOG(ERR, AF_XDP,
-			"Failed to create rte_ring\n");
+			"Failed to create rte_mempool\n");
 		goto err;
 	}
+	base_addr = (void *)get_base_addr(umem->mb_pool);
 
-	for (i = 0; i < ETH_AF_XDP_NUM_BUFFERS; i++)
-		rte_ring_enqueue(umem->buf_ring,
-				 (void *)(i * ETH_AF_XDP_FRAME_SIZE +
-					  ETH_AF_XDP_DATA_HEADROOM));
-
-	if (posix_memalign(&bufs, getpagesize(),
-			   ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE)) {
-		RTE_LOG(ERR, AF_XDP, "Failed to allocate memory pool.\n");
-		goto err;
-	}
-	ret = xsk_umem__create(&umem->umem, bufs,
+	ret = xsk_umem__create(&umem->umem, base_addr,
 			       ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
 			       &umem->fq, &umem->cq,
 			       &usr_config);
@@ -494,7 +526,7 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 		RTE_LOG(ERR, AF_XDP, "Failed to create umem");
 		goto err;
 	}
-	umem->buffer = bufs;
+	umem->buffer = base_addr;
 
 	return umem;
 
@@ -911,8 +943,7 @@ rte_pmd_af_xdp_remove(struct rte_vdev_device *dev)
 
 	internals = eth_dev->data->dev_private;
 
-	rte_ring_free(internals->umem->buf_ring);
-	rte_free(internals->umem->buffer);
+	rte_mempool_free(internals->umem->mb_pool);
 	rte_free(internals->umem);
 
 	rte_eth_dev_release_port(eth_dev);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* [PATCH v3 5/5] net/af_xdp: enable zero copy
  2019-03-21  9:18 ` [PATCH v3 0/5] Introduce AF_XDP PMD Xiaolong Ye
                     ` (3 preceding siblings ...)
  2019-03-21  9:18   ` [PATCH v3 4/5] net/af_xdp: use mbuf mempool for buffer management Xiaolong Ye
@ 2019-03-21  9:18   ` Xiaolong Ye
  4 siblings, 0 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-21  9:18 UTC (permalink / raw)
  To: dev; +Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Xiaolong Ye

Try to check whether the external mempool (from rx_queue_setup) is fit for
af_xdp; if it is, it will be registered to the af_xdp socket directly and
there will be no packet data copy on Rx and Tx.
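
On the application side, a pool passes the zero copy check roughly when it
is created as below (single memory chunk, page aligned base, one object per
2048 B umem frame; assumes no mempool debug trailer):

    struct rte_mempool *mp;

    /* 64 B obj header + 128 B mbuf + 0 priv + 1856 B data room == 2048 */
    mp = rte_pktmbuf_pool_create_with_flags("zc_pool", 4096, 250, 0,
                    2048 - 192,
                    MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN,
                    SOCKET_ID_ANY);

    /* pass mp to rte_eth_rx_queue_setup(); on success the PMD registers
     * it as the umem and enables zc on the queue.
     */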

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 drivers/net/af_xdp/rte_eth_af_xdp.c | 125 ++++++++++++++++++++--------
 1 file changed, 90 insertions(+), 35 deletions(-)

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 76783c33d..0ff2f8fa8 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -62,6 +62,7 @@ struct xsk_umem_info {
 	struct xsk_umem *umem;
 	struct rte_mempool *mb_pool;
 	void *buffer;
+	uint8_t zc;
 };
 
 struct rx_stats {
@@ -80,6 +81,7 @@ struct pkt_rx_queue {
 
 	struct pkt_tx_queue *pair;
 	uint16_t queue_idx;
+	uint8_t zc;
 };
 
 struct tx_stats {
@@ -208,17 +210,25 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		len = desc->len;
 		pkt = xsk_umem__get_data(rxq->umem->buffer, addr);
 
-		mbuf = rte_pktmbuf_alloc(rxq->mb_pool);
-		if (mbuf != NULL) {
-			rte_memcpy(rte_pktmbuf_mtod(mbuf, void *), pkt, len);
+		if (rxq->zc) {
+			mbuf = addr_to_mbuf(rxq->umem, addr);
 			rte_pktmbuf_pkt_len(mbuf) = len;
 			rte_pktmbuf_data_len(mbuf) = len;
 			rx_bytes += len;
 			bufs[count++] = mbuf;
 		} else {
-			dropped++;
+			mbuf = rte_pktmbuf_alloc(rxq->mb_pool);
+			if (mbuf != NULL) {
+				memcpy(rte_pktmbuf_mtod(mbuf, void*), pkt, len);
+				rte_pktmbuf_pkt_len(mbuf) = len;
+				rte_pktmbuf_data_len(mbuf) = len;
+				rx_bytes += len;
+				bufs[count++] = mbuf;
+			} else {
+				dropped++;
+			}
+			rte_pktmbuf_free(addr_to_mbuf(umem, addr));
 		}
-		rte_pktmbuf_free(addr_to_mbuf(umem, addr));
 	}
 
 	xsk_ring_cons__release(rx, rcvd);
@@ -292,22 +302,29 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 					- ETH_AF_XDP_DATA_HEADROOM;
 		desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i);
 		mbuf = bufs[i];
-		if (mbuf->pkt_len <= buf_len) {
-			mbuf_to_tx = rte_pktmbuf_alloc(umem->mb_pool);
-			if (mbuf_to_tx == NULL) {
-				rte_pktmbuf_free(mbuf);
-				continue;
-			}
-			desc->addr = mbuf_to_addr(umem, mbuf_to_tx);
+		if (txq->pair->zc && mbuf->pool == umem->mb_pool) {
+			desc->addr = mbuf_to_addr(umem, mbuf);
 			desc->len = mbuf->pkt_len;
-			pkt = xsk_umem__get_data(umem->buffer,
-						 desc->addr);
-			rte_memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
-			       desc->len);
 			valid++;
 			tx_bytes += mbuf->pkt_len;
+		} else {
+			if (mbuf->pkt_len <= buf_len) {
+				mbuf_to_tx = rte_pktmbuf_alloc(umem->mb_pool);
+				if (!mbuf_to_tx) {
+					rte_pktmbuf_free(mbuf);
+					continue;
+				}
+				desc->addr = mbuf_to_addr(umem, mbuf_to_tx);
+				desc->len = mbuf->pkt_len;
+				pkt = xsk_umem__get_data(umem->buffer,
+							 desc->addr);
+				memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
+				       desc->len);
+				valid++;
+				tx_bytes += mbuf->pkt_len;
+			}
+			rte_pktmbuf_free(mbuf);
 		}
-		rte_pktmbuf_free(mbuf);
 	}
 
 	xsk_ring_prod__submit(&txq->tx, nb_pkts);
@@ -485,7 +502,7 @@ static inline uint64_t get_len(struct rte_mempool *mp)
 	return (uint64_t)(memhdr->len);
 }
 
-static struct xsk_umem_info *xdp_umem_configure(void)
+static struct xsk_umem_info *xdp_umem_configure(struct rte_mempool *mb_pool)
 {
 	struct xsk_umem_info *umem;
 	struct xsk_umem_config usr_config = {
@@ -502,18 +519,23 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 		return NULL;
 	}
 
-	umem->mb_pool = rte_pktmbuf_pool_create_with_flags("af_xdp_mempool",
-			ETH_AF_XDP_NUM_BUFFERS,
-			250, 0,
-			ETH_AF_XDP_FRAME_SIZE -
-			ETH_AF_XDP_MBUF_OVERHEAD,
-			MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN,
-			SOCKET_ID_ANY);
-
-	if (umem->mb_pool == NULL || umem->mb_pool->nb_mem_chunks != 1) {
-		RTE_LOG(ERR, AF_XDP,
-			"Failed to create rte_mempool\n");
-		goto err;
+	if (!mb_pool) {
+		umem->mb_pool = rte_pktmbuf_pool_create_with_flags("af_xdp_mempool",
+				ETH_AF_XDP_NUM_BUFFERS,
+				250, 0,
+				ETH_AF_XDP_FRAME_SIZE -
+				ETH_AF_XDP_MBUF_OVERHEAD,
+				MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN,
+				SOCKET_ID_ANY);
+
+		if (umem->mb_pool == NULL ||
+				umem->mb_pool->nb_mem_chunks != 1) {
+			RTE_LOG(ERR, AF_XDP, "Failed to create rte_mempool\n");
+			goto err;
+		}
+	} else {
+		umem->mb_pool = mb_pool;
+		umem->zc = 1;
 	}
 	base_addr = (void *)get_base_addr(umem->mb_pool);
 
@@ -535,16 +557,43 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 	return NULL;
 }
 
+static uint8_t
+check_mempool_zc(struct rte_mempool *mp)
+{
+	RTE_ASSERT(mp);
+
+	/* must be contiguous */
+	if (mp->nb_mem_chunks > 1)
+		return 0;
+
+	/* check header size */
+	if (mp->header_size != RTE_CACHE_LINE_SIZE)
+		return 0;
+
+	/* check base address */
+	if ((uint64_t)get_base_addr(mp) % getpagesize() != 0)
+		return 0;
+
+	/* check chunk size */
+	if ((mp->elt_size + mp->header_size + mp->trailer_size) %
+	    ETH_AF_XDP_FRAME_SIZE != 0)
+		return 0;
+
+	return 1;
+}
+
 static int
 xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
-	      int ring_size)
+	      int ring_size, struct rte_mempool *mb_pool)
 {
 	struct xsk_socket_config cfg;
 	struct pkt_tx_queue *txq = rxq->pair;
+	struct rte_mempool *mp;
 	int ret = 0;
 	int reserve_size;
 
-	rxq->umem = xdp_umem_configure();
+	mp = check_mempool_zc(mb_pool) ? mb_pool : NULL;
+	rxq->umem = xdp_umem_configure(mp);
 	if (rxq->umem == NULL) {
 		ret = -ENOMEM;
 		goto err;
@@ -631,15 +680,21 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
 
 	rxq->mb_pool = mb_pool;
 
-	if (xsk_configure(internals, rxq, nb_rx_desc)) {
-		RTE_LOG(ERR, AF_XDP,
-			"Failed to configure xdp socket\n");
+	if (xsk_configure(internals, rxq, nb_rx_desc, mb_pool)) {
+		RTE_LOG(ERR, AF_XDP, "Failed to configure xdp socket\n");
 		ret = -EINVAL;
 		goto err;
 	}
 
 	internals->umem = rxq->umem;
 
+	if (mb_pool == internals->umem->mb_pool)
+		rxq->zc = internals->umem->zc;
+
+	if (rxq->zc)
+		RTE_LOG(INFO, AF_XDP,
+			"zero copy enabled on rx queue %d\n", rx_queue_id);
+
 	dev->data->rx_queues[rx_queue_id] = rxq;
 	return 0;
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* Re: [PATCH v3 3/5] lib/mempool: allow page size aligned mempool
  2019-03-21  9:18   ` [PATCH v3 3/5] lib/mempool: allow page size aligned mempool Xiaolong Ye
@ 2019-03-21 14:00     ` Ananyev, Konstantin
  2019-03-21 14:23       ` Zhang, Qi Z
  0 siblings, 1 reply; 214+ messages in thread
From: Ananyev, Konstantin @ 2019-03-21 14:00 UTC (permalink / raw)
  To: Ye, Xiaolong, dev
  Cc: Zhang, Qi Z, Karlsson, Magnus, Topel, Bjorn, Ye, Xiaolong



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Xiaolong Ye
> Sent: Thursday, March 21, 2019 9:19 AM
> To: dev@dpdk.org
> Cc: Zhang, Qi Z <qi.z.zhang@intel.com>; Karlsson, Magnus <magnus.karlsson@intel.com>; Topel, Bjorn <bjorn.topel@intel.com>; Ye,
> Xiaolong <xiaolong.ye@intel.com>
> Subject: [dpdk-dev] [PATCH v3 3/5] lib/mempool: allow page size aligned mempool
> 
> Allow create a mempool with page size aligned base address.
> 
> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
> ---
>  lib/librte_mempool/rte_mempool.c | 3 +++
>  lib/librte_mempool/rte_mempool.h | 1 +
>  2 files changed, 4 insertions(+)
> 
> diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
> index 683b216f9..33ab6a2b4 100644
> --- a/lib/librte_mempool/rte_mempool.c
> +++ b/lib/librte_mempool/rte_mempool.c
> @@ -543,6 +543,9 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>  		if (try_contig)
>  			flags |= RTE_MEMZONE_IOVA_CONTIG;
> 
> +		if (mp->flags & MEMPOOL_F_PAGE_ALIGN)
> +			align = getpagesize();
> +

Might be a bit safer:
pg_sz = getpagesize();
align = RTE_MAX(align, pg_sz);
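
In context that would look something like this (untested sketch):

	if (mp->flags & MEMPOOL_F_PAGE_ALIGN) {
		size_t pg_sz = getpagesize();

		align = RTE_MAX(align, pg_sz);
	}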

BTW, why do you need it to always be aligned to the default page size?
Is it for 'external' memory allocation or even for eal hugepages too?
Konstantin

>  		mz = rte_memzone_reserve_aligned(mz_name, mem_size,
>  				mp->socket_id, flags, align);
> 

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v3 3/5] lib/mempool: allow page size aligned mempool
  2019-03-21 14:00     ` Ananyev, Konstantin
@ 2019-03-21 14:23       ` Zhang, Qi Z
  0 siblings, 0 replies; 214+ messages in thread
From: Zhang, Qi Z @ 2019-03-21 14:23 UTC (permalink / raw)
  To: Ananyev, Konstantin, Ye, Xiaolong, dev
  Cc: Karlsson, Magnus, Topel, Bjorn, Ye, Xiaolong



> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Thursday, March 21, 2019 10:01 PM
> To: Ye, Xiaolong <xiaolong.ye@intel.com>; dev@dpdk.org
> Cc: Zhang, Qi Z <qi.z.zhang@intel.com>; Karlsson, Magnus
> <magnus.karlsson@intel.com>; Topel, Bjorn <bjorn.topel@intel.com>; Ye,
> Xiaolong <xiaolong.ye@intel.com>
> Subject: RE: [dpdk-dev] [PATCH v3 3/5] lib/mempool: allow page size aligned
> mempool
> 
> 
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Xiaolong Ye
> > Sent: Thursday, March 21, 2019 9:19 AM
> > To: dev@dpdk.org
> > Cc: Zhang, Qi Z <qi.z.zhang@intel.com>; Karlsson, Magnus
> > <magnus.karlsson@intel.com>; Topel, Bjorn <bjorn.topel@intel.com>; Ye,
> > Xiaolong <xiaolong.ye@intel.com>
> > Subject: [dpdk-dev] [PATCH v3 3/5] lib/mempool: allow page size
> > aligned mempool
> >
> > Allow create a mempool with page size aligned base address.
> >
> > Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> > Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
> > ---
> >  lib/librte_mempool/rte_mempool.c | 3 +++
> > lib/librte_mempool/rte_mempool.h | 1 +
> >  2 files changed, 4 insertions(+)
> >
> > diff --git a/lib/librte_mempool/rte_mempool.c
> > b/lib/librte_mempool/rte_mempool.c
> > index 683b216f9..33ab6a2b4 100644
> > --- a/lib/librte_mempool/rte_mempool.c
> > +++ b/lib/librte_mempool/rte_mempool.c
> > @@ -543,6 +543,9 @@ rte_mempool_populate_default(struct rte_mempool
> *mp)
> >  		if (try_contig)
> >  			flags |= RTE_MEMZONE_IOVA_CONTIG;
> >
> > +		if (mp->flags & MEMPOOL_F_PAGE_ALIGN)
> > +			align = getpagesize();
> > +
> 
> Might be a bit safer:
> pg_sz = getpagesize();
> align = RTE_MAX(align, pg_sz);
> 
> BTW, why do you need it to always be aligned to the default page size?
> Is it for 'external' memory allocation or even for eal hugepages too?

This helps us to enable zero copy between the xdp umem and the mbuf:
the af_xdp umem requires a 2K chunk size, with buffers aligned on a 2K
address.
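
For illustration, roughly why page alignment is enough (untested
sketch; assuming the PMD's 2K ETH_AF_XDP_FRAME_SIZE chunk size):

	/* base is page aligned (>= 2K) and each element is sized to a
	 * multiple of the 2K frame size, so chunk n starts on a 2K
	 * boundary as the af_xdp umem requires. */
	static inline uint64_t chunk_addr(uint64_t base, uint64_t n)
	{
		return base + n * ETH_AF_XDP_FRAME_SIZE;
	}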

Qi


> Konstantin
> 
> >  		mz = rte_memzone_reserve_aligned(mz_name, mem_size,
> >  				mp->socket_id, flags, align);
> >

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-21  9:18   ` [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
@ 2019-03-21 15:24     ` Stephen Hemminger
  2019-03-22  2:05       ` Ye Xiaolong
  2019-03-21 15:25     ` Stephen Hemminger
                       ` (6 subsequent siblings)
  7 siblings, 1 reply; 214+ messages in thread
From: Stephen Hemminger @ 2019-03-21 15:24 UTC (permalink / raw)
  To: Xiaolong Ye; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

On Thu, 21 Mar 2019 17:18:41 +0800
Xiaolong Ye <xiaolong.ye@intel.com> wrote:

> +
> +static inline int
> +reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
> +{
> +	struct xsk_ring_prod *fq = &umem->fq;
> +	uint32_t idx;
> +	void *addr = NULL;
> +	int i, ret;
> +
> +	ret = xsk_ring_prod__reserve(fq, reserve_size, &idx);
> +	if (!ret) {
> +		RTE_LOG(ERR, AF_XDP, "Failed to reserve enough fq descs.\n");
> +		return ret;
> +	}
> +
> +	for (i = 0; i < reserve_size; i++) {
> +		__u64 *fq_addr;
> +		rte_ring_dequeue(umem->buf_ring, &addr);

You should check return value of dequeue, otherwise static checkers will
(rightly) complain that "everyone else checks return value of of rte_ring_dequeue()
why not here?"
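
Something like this (untested sketch; how to recover from a short fill
is up to you):

	for (i = 0; i < reserve_size; i++) {
		__u64 *fq_addr;

		if (rte_ring_dequeue(umem->buf_ring, &addr) != 0)
			break;	/* ring empty: fill only what we got */
		fq_addr = xsk_ring_prod__fill_addr(fq, idx++);
		*fq_addr = (uint64_t)addr;
	}

	xsk_ring_prod__submit(fq, i);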

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-21  9:18   ` [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
  2019-03-21 15:24     ` Stephen Hemminger
@ 2019-03-21 15:25     ` Stephen Hemminger
  2019-03-22  2:05       ` Ye Xiaolong
  2019-03-21 15:27     ` Stephen Hemminger
                       ` (5 subsequent siblings)
  7 siblings, 1 reply; 214+ messages in thread
From: Stephen Hemminger @ 2019-03-21 15:25 UTC (permalink / raw)
  To: Xiaolong Ye; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

On Thu, 21 Mar 2019 17:18:41 +0800
Xiaolong Ye <xiaolong.ye@intel.com> wrote:

> +	for (i = 0; i < rcvd; i++) {
> +		const struct xdp_desc *desc;
> +		uint64_t addr;
> +		uint32_t len;
> +		void *pkt;
> +
> +		desc = xsk_ring_cons__rx_desc(rx, idx_rx++);
> +		addr = desc->addr;
> +		len = desc->len;
> +		pkt = xsk_umem__get_data(rxq->umem->buffer, addr);
> +
> +		mbuf = rte_pktmbuf_alloc(rxq->mb_pool);

You could use rte_pktmbuf_alloc_bulk to get the mbufs in one call
before doing this. It saves rcvd-1 atomic operations.
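
Roughly (untested sketch, reusing the driver's batch-size constant):

	struct rte_mbuf *mbufs[ETH_AF_XDP_RX_BATCH_SIZE];

	if (rte_pktmbuf_alloc_bulk(rxq->mb_pool, mbufs, rcvd) != 0)
		return 0;	/* one bulk refill instead of rcvd allocs */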

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-21  9:18   ` [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
  2019-03-21 15:24     ` Stephen Hemminger
  2019-03-21 15:25     ` Stephen Hemminger
@ 2019-03-21 15:27     ` Stephen Hemminger
  2019-03-22  2:04       ` Ye Xiaolong
  2019-03-21 15:28     ` Stephen Hemminger
                       ` (4 subsequent siblings)
  7 siblings, 1 reply; 214+ messages in thread
From: Stephen Hemminger @ 2019-03-21 15:27 UTC (permalink / raw)
  To: Xiaolong Ye; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

On Thu, 21 Mar 2019 17:18:41 +0800
Xiaolong Ye <xiaolong.ye@intel.com> wrote:

> static void kick_tx(struct pkt_tx_queue *txq)
> +{
> +	struct xsk_umem_info *umem = txq->pair->umem;
> +
> +	while (send(xsk_socket__fd(txq->pair->xsk), NULL,
> +		      0, MSG_DONTWAIT) < 0) {
> +		/* something unexpected */
> +		if (errno != EBUSY && errno != EAGAIN)
> +			break;
> +
> +		/* pull from completion queue to leave more space */
> +		if (errno == EAGAIN)
> +			pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
> +	}

What about EINTR??
You should retry the send then.
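
i.e. treat it as retryable too (sketch):

	/* something unexpected */
	if (errno != EBUSY && errno != EAGAIN && errno != EINTR)
		break;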

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-21  9:18   ` [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
                       ` (2 preceding siblings ...)
  2019-03-21 15:27     ` Stephen Hemminger
@ 2019-03-21 15:28     ` Stephen Hemminger
  2019-03-22  2:15       ` Ye Xiaolong
  2019-03-21 15:30     ` Stephen Hemminger
                       ` (3 subsequent siblings)
  7 siblings, 1 reply; 214+ messages in thread
From: Stephen Hemminger @ 2019-03-21 15:28 UTC (permalink / raw)
  To: Xiaolong Ye; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

On Thu, 21 Mar 2019 17:18:41 +0800
Xiaolong Ye <xiaolong.ye@intel.com> wrote:

> +		if (ret != 0) {
> +			RTE_LOG(ERR, AF_XDP, "getsockopt() failed for XDP_STATISTICS.\n");
> +			return -1;

You need to use the new dynamic log types and not have a global logtype.
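
For example, something along these lines (untested sketch; the log
name is illustrative):

	static int af_xdp_logtype;

	#define AF_XDP_LOG(level, fmt, args...)			\
		rte_log(RTE_LOG_ ## level, af_xdp_logtype,	\
			"%s(): " fmt, __func__, ##args)

	RTE_INIT(af_xdp_init_log)
	{
		af_xdp_logtype = rte_log_register("pmd.net.af_xdp");
		if (af_xdp_logtype >= 0)
			rte_log_set_level(af_xdp_logtype, RTE_LOG_NOTICE);
	}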

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-21  9:18   ` [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
                       ` (3 preceding siblings ...)
  2019-03-21 15:28     ` Stephen Hemminger
@ 2019-03-21 15:30     ` Stephen Hemminger
  2019-03-22  2:01       ` Ye Xiaolong
  2019-03-21 15:31     ` Stephen Hemminger
                       ` (2 subsequent siblings)
  7 siblings, 1 reply; 214+ messages in thread
From: Stephen Hemminger @ 2019-03-21 15:30 UTC (permalink / raw)
  To: Xiaolong Ye; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

On Thu, 21 Mar 2019 17:18:41 +0800
Xiaolong Ye <xiaolong.ye@intel.com> wrote:

> +
> +	if (ret < 0)
> +		return -EINVAL;
> +
> +	return 0;

You could propogate kernel errno into DPDK?
	return (ret < 0) ? -errno : 0;

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-21  9:18   ` [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
                       ` (4 preceding siblings ...)
  2019-03-21 15:30     ` Stephen Hemminger
@ 2019-03-21 15:31     ` Stephen Hemminger
  2019-03-22  1:55       ` Ye Xiaolong
  2019-03-21 15:32     ` Stephen Hemminger
  2019-03-21 15:36     ` Stephen Hemminger
  7 siblings, 1 reply; 214+ messages in thread
From: Stephen Hemminger @ 2019-03-21 15:31 UTC (permalink / raw)
  To: Xiaolong Ye; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

On Thu, 21 Mar 2019 17:18:41 +0800
Xiaolong Ye <xiaolong.ye@intel.com> wrote:

> +	if (strnlen(value, IFNAMSIZ) > IFNAMSIZ - 1) {
> +		RTE_LOG(ERR, AF_XDP, "Invalid name %s, should be less than "
> +			"%u bytes.\n", value, IFNAMSIZ)

Please don't break error message strings across multiple source lines.
It makes it harder to use tools like grep to find errors in source.
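
e.g. keep the whole format string on one source line and wrap only the
arguments:

	RTE_LOG(ERR, AF_XDP, "Invalid name %s, should be less than %u bytes.\n",
		value, IFNAMSIZ);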

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-21  9:18   ` [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
                       ` (5 preceding siblings ...)
  2019-03-21 15:31     ` Stephen Hemminger
@ 2019-03-21 15:32     ` Stephen Hemminger
  2019-03-22  1:54       ` Ye Xiaolong
  2019-03-21 15:36     ` Stephen Hemminger
  7 siblings, 1 reply; 214+ messages in thread
From: Stephen Hemminger @ 2019-03-21 15:32 UTC (permalink / raw)
  To: Xiaolong Ye; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

On Thu, 21 Mar 2019 17:18:41 +0800
Xiaolong Ye <xiaolong.ye@intel.com> wrote:

> +
> +	strlcpy(ifr.ifr_name, if_name, IFNAMSIZ);
> +	if (ioctl(sock, SIOCGIFINDEX, &ifr))
> +		goto error;
> +
> +	if (ioctl(sock, SIOCGIFHWADDR, &ifr))
> +		goto error;
> +
> +	rte_memcpy(eth_addr, ifr.ifr_hwaddr.sa_data, ETHER_ADDR_LEN);
> +
> +	close(sock);
> +	*if_index = if_nametoindex(if_name);

This seems confused:
	- first you get ifindex with SIOCGIFINDEX, then you ignore the result
	- then get MAC address.
	- then use if_nametoindex() which does SIOCGIFINDEX internally
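
One way to untangle it (untested sketch): keep the SIOCGIFINDEX result
and drop the later if_nametoindex() call:

	strlcpy(ifr.ifr_name, if_name, IFNAMSIZ);
	if (ioctl(sock, SIOCGIFINDEX, &ifr))
		goto error;
	*if_index = ifr.ifr_ifindex;

	if (ioctl(sock, SIOCGIFHWADDR, &ifr))
		goto error;
	rte_memcpy(eth_addr, ifr.ifr_hwaddr.sa_data, ETHER_ADDR_LEN);

	close(sock);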

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-21  9:18   ` [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
                       ` (6 preceding siblings ...)
  2019-03-21 15:32     ` Stephen Hemminger
@ 2019-03-21 15:36     ` Stephen Hemminger
  2019-03-22  1:49       ` Ye Xiaolong
  7 siblings, 1 reply; 214+ messages in thread
From: Stephen Hemminger @ 2019-03-21 15:36 UTC (permalink / raw)
  To: Xiaolong Ye; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

On Thu, 21 Mar 2019 17:18:41 +0800
Xiaolong Ye <xiaolong.ye@intel.com> wrote:

> +
> +RTE_PMD_REGISTER_VDEV(eth_af_xdp, pmd_af_xdp_drv);

The convention in other network drivers is to use net_XXX in the vdev name.
In AF_XDP that would be:

RTE_PMD_REGISTER_VDEV(net_af_xdp, pmd_af_xdp_drv);

About naming, I would just drop AF_ from the name everywhere, the driver
is about running over XDP, and the "AF_" is just a prefix for address family.

Why not:
	net/xdp

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-21 15:36     ` Stephen Hemminger
@ 2019-03-22  1:49       ` Ye Xiaolong
  2019-03-22  9:32         ` Bruce Richardson
  0 siblings, 1 reply; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-22  1:49 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

On 03/21, Stephen Hemminger wrote:
>On Thu, 21 Mar 2019 17:18:41 +0800
>Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>
>> +
>> +RTE_PMD_REGISTER_VDEV(eth_af_xdp, pmd_af_xdp_drv);
>
>The convention in other network drivers is to use net_XXX in the vdev name.
>In AF_XDP that would be:
>
>RTE_PMD_REGISTER_VDEV(net_af_xdp, pmd_af_xdp_drv);

Got it.

>
>About naming, I would just drop AF_ from the name everywhere, the driver
>is about running over XDP, and the "AF_" is just a prefix for address family.
>
>Why not:
>	net/xdp

Thanks for the advice. Actually this driver is more about AF_XDP than
XDP: the foundational objects it uses, such as the umem, umem fill ring,
umem completion ring, tx ring and rx ring, are all AF_XDP specific, so I
would rather keep the naming.
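
(For reference, the libbpf-exposed objects in question:

	struct xsk_ring_prod fq;	/* umem fill ring */
	struct xsk_ring_cons cq;	/* umem completion ring */
	struct xsk_ring_cons rx;	/* rx ring */
	struct xsk_ring_prod tx;	/* tx ring */
	struct xsk_umem *umem;		/* the umem itself */

none of which exist for plain XDP without the address family.)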

Thanks,
Xiaolong
>

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-21 15:32     ` Stephen Hemminger
@ 2019-03-22  1:54       ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-22  1:54 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

On 03/21, Stephen Hemminger wrote:
>On Thu, 21 Mar 2019 17:18:41 +0800
>Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>
>> +
>> +	strlcpy(ifr.ifr_name, if_name, IFNAMSIZ);
>> +	if (ioctl(sock, SIOCGIFINDEX, &ifr))
>> +		goto error;
>> +
>> +	if (ioctl(sock, SIOCGIFHWADDR, &ifr))
>> +		goto error;
>> +
>> +	rte_memcpy(eth_addr, ifr.ifr_hwaddr.sa_data, ETHER_ADDR_LEN);
>> +
>> +	close(sock);
>> +	*if_index = if_nametoindex(if_name);
>
>This seems confused:
>	- first you get ifindex with SIOCGIFINDEX, then you ignore the result
>	- then get MAC address.
>	- then use if_nametoindex() which does SIOCGIFINDEX internally

You're right, the code is chaotic here; I will improve it in the next version.

Thanks,
Xiaolong

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-21 15:31     ` Stephen Hemminger
@ 2019-03-22  1:55       ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-22  1:55 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

On 03/21, Stephen Hemminger wrote:
>On Thu, 21 Mar 2019 17:18:41 +0800
>Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>
>> +	if (strnlen(value, IFNAMSIZ) > IFNAMSIZ - 1) {
>> +		RTE_LOG(ERR, AF_XDP, "Invalid name %s, should be less than "
>> +			"%u bytes.\n", value, IFNAMSIZ)
>
>Please don't break error message strings across multiple source lines.
>It makes it harder to use tools like grep to find errors in source.

Good point, will keep this in mind.

Thanks,
Xiaolong

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-21 15:30     ` Stephen Hemminger
@ 2019-03-22  2:01       ` Ye Xiaolong
  2019-03-22 15:37         ` Stephen Hemminger
  0 siblings, 1 reply; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-22  2:01 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

On 03/21, Stephen Hemminger wrote:
>On Thu, 21 Mar 2019 17:18:41 +0800
>Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>
>> +
>> +	if (ret < 0)
>> +		return -EINVAL;
>> +
>> +	return 0;
>
>You could propogate kernel errno into DPDK?
>	return (ret < 0) ? -errno : 0;
>

Sorry, could you share the advantage of doing this?

Thanks,
Xiaolong

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-21 15:27     ` Stephen Hemminger
@ 2019-03-22  2:04       ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-22  2:04 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

On 03/21, Stephen Hemminger wrote:
>On Thu, 21 Mar 2019 17:18:41 +0800
>Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>
>> static void kick_tx(struct pkt_tx_queue *txq)
>> +{
>> +	struct xsk_umem_info *umem = txq->pair->umem;
>> +
>> +	while (send(xsk_socket__fd(txq->pair->xsk), NULL,
>> +		      0, MSG_DONTWAIT) < 0) {
>> +		/* something unexpected */
>> +		if (errno != EBUSY && errno != EAGAIN)
>> +			break;
>> +
>> +		/* pull from completion queue to leave more space */
>> +		if (errno == EAGAIN)
>> +			pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
>> +	}
>
>What about EINTR??
>You should retry the send then.

Will do.

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-21 15:25     ` Stephen Hemminger
@ 2019-03-22  2:05       ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-22  2:05 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

On 03/21, Stephen Hemminger wrote:
>On Thu, 21 Mar 2019 17:18:41 +0800
>Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>
>> +	for (i = 0; i < rcvd; i++) {
>> +		const struct xdp_desc *desc;
>> +		uint64_t addr;
>> +		uint32_t len;
>> +		void *pkt;
>> +
>> +		desc = xsk_ring_cons__rx_desc(rx, idx_rx++);
>> +		addr = desc->addr;
>> +		len = desc->len;
>> +		pkt = xsk_umem__get_data(rxq->umem->buffer, addr);
>> +
>> +		mbuf = rte_pktmbuf_alloc(rxq->mb_pool);
>
>You could use rte_pktmbuf_alloc_bulk to get the mbufs in one call
>before doing this. It saves rcvd-1 atomic operations.

Got it, will do.

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-21 15:24     ` Stephen Hemminger
@ 2019-03-22  2:05       ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-22  2:05 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

On 03/21, Stephen Hemminger wrote:
>On Thu, 21 Mar 2019 17:18:41 +0800
>Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>
>> +
>> +static inline int
>> +reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
>> +{
>> +	struct xsk_ring_prod *fq = &umem->fq;
>> +	uint32_t idx;
>> +	void *addr = NULL;
>> +	int i, ret;
>> +
>> +	ret = xsk_ring_prod__reserve(fq, reserve_size, &idx);
>> +	if (!ret) {
>> +		RTE_LOG(ERR, AF_XDP, "Failed to reserve enough fq descs.\n");
>> +		return ret;
>> +	}
>> +
>> +	for (i = 0; i < reserve_size; i++) {
>> +		__u64 *fq_addr;
>> +		rte_ring_dequeue(umem->buf_ring, &addr);
>
>You should check return value of dequeue, otherwise static checkers will
>(rightly) complain that "everyone else checks return value of of rte_ring_dequeue()
>why not here?"

Got it, will do.

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-21 15:28     ` Stephen Hemminger
@ 2019-03-22  2:15       ` Ye Xiaolong
  2019-03-22 15:38         ` Stephen Hemminger
  0 siblings, 1 reply; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-22  2:15 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

On 03/21, Stephen Hemminger wrote:
>On Thu, 21 Mar 2019 17:18:41 +0800
>Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>
>> +		if (ret != 0) {
>> +			RTE_LOG(ERR, AF_XDP, "getsockopt() failed for XDP_STATISTICS.\n");
>> +			return -1;
>
>You need to use the new dynamic log types and not have a global logtype.

You mean for all the logs in this driver, right? Is it because the global
logtype will be deprecated?

Will investigate and implement the dynamic log type.

Thanks,
Xiaolong

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-22  1:49       ` Ye Xiaolong
@ 2019-03-22  9:32         ` Bruce Richardson
  0 siblings, 0 replies; 214+ messages in thread
From: Bruce Richardson @ 2019-03-22  9:32 UTC (permalink / raw)
  To: Ye Xiaolong
  Cc: Stephen Hemminger, dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

On Fri, Mar 22, 2019 at 09:49:03AM +0800, Ye Xiaolong wrote:
> On 03/21, Stephen Hemminger wrote:
> >On Thu, 21 Mar 2019 17:18:41 +0800
> >Xiaolong Ye <xiaolong.ye@intel.com> wrote:
> >
> >> +
> >> +RTE_PMD_REGISTER_VDEV(eth_af_xdp, pmd_af_xdp_drv);
> >
> >The convention in other network drivers is to use net_XXX in the vdev name.
> >In AF_XDP that would be:
> >
> >RTE_PMD_REGISTER_VDEV(net_af_xdp, pmd_af_xdp_drv);
> 
> Got it.
> 
> >
> >About naming, I would just drop AF_ from the name everywhere, the driver
> >is about running over XDP, and the "AF_" is just a prefix for address family.
> >
> >Why not:
> >	net/xdp
> 
> Thanks for the advice, Actually this driver is more about AF_XDP rathan than
> XDP, the foundational objects it uses such as umem, umem fill ring, umem
> completion ring, tx ring, rx ring which are all AF_XDP specfic, so I would
> rather keep the naming.
> 
+1 for the naming. AF_XDP is something different from XDP itself, though
the former does use the latter.

/Bruce

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v4 0/5] Introduce AF_XDP PMD
  2019-03-01  8:09 [PATCH v1 0/6] Introduce AF_XDP PMD Xiaolong Ye
                   ` (8 preceding siblings ...)
  2019-03-21  9:18 ` [PATCH v3 0/5] Introduce AF_XDP PMD Xiaolong Ye
@ 2019-03-22 13:01 ` Xiaolong Ye
  2019-03-22 13:01   ` [PATCH v4 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
                     ` (4 more replies)
  2019-03-25  6:03 ` [PATCH v5 0/5] Introduce AF_XDP PMD Xiaolong Ye
                   ` (6 subsequent siblings)
  16 siblings, 5 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-22 13:01 UTC (permalink / raw)
  To: dev; +Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Xiaolong Ye

Overview
========

This patchset adds a new PMD driver for AF_XDP, which is a proposed
faster version of the AF_PACKET interface in Linux; see links [1] [2]
for an introduction to AF_XDP.

AF_XDP roadmap
==============
- AF_XDP has been included in the upstream kernel since 4.18, and AF_XDP
  support in libbpf has been merged in v5.1-rc1.
- The i40e and ixgbe drivers now support zero copy mode.

Change logs
===========

V4:

- change vdev name to net_af_xdp
- adopt dynamic log type for all logging
- Fix other style issues raised by Stephen

V3:

- Fix all style issues pointed out by Stephen, Mattias and David.
- Drop the testpmd change as we'll adopt Jerin's suggestion to add a new
  mempool driver to handle the application use of AF_XDP's zero copy feature.

V2:
- Fix a NULL pointer reference crash issue
- Fix txonly stop sending traffic in zc mode
- Refactor rte_mbuf.c to avoid ABI breakage.
- Fix multiple style issues pointed out by Ferruh, David, Stephen and Luca.

changes vs RFC sent by Qi last Aug:

- Re-work based on AF_XDP's interface changes, since the new libbpf
  provides higher-level APIs that hide many of the details of the AF_XDP
  uapi. The rework reduces the code by 300+ lines.

- Multi-queue is not supported because the current libbpf doesn't support
  multiple sockets on a single umem.

- No extra steps to load an xdp program manually, since libbpf loads a
  default xdp program when the user calls xsk_socket__create; the
  userspace application only needs to handle the cleanup.

How to try
==========

1. take a kernel >= v5.1-rc1, build it and replace your host kernel
   with it.
   
   make sure you turn on XDP sockets when compiling

   Networking support -->
        Networking options -->
                [ * ] XDP sockets
   
2. build libbpf in tools/lib/bpf, and copy the libbpf.a and libbpf.so to /usr/lib64

   cd tools/lib/bpf
   make

3. ethtool -L enp59s0f0 combined 1

4. start testpmd

   ./build/app/testpmd -c 0xc -n 4 --vdev net_af_xdp,iface=enp59s0f0,queue=0 -- -i --rxq=1 --txq=1

    in this case, the default xdp program will be loaded and linked to queue 0 of enp59s0f0;
    network traffic arriving at queue 0 will be redirected to the af_xdp socket.

Xiaolong Ye (5):
  net/af_xdp: introduce AF XDP PMD driver
  lib/mbuf: introduce helper to create mempool with flags
  lib/mempool: allow page size aligned mempool
  net/af_xdp: use mbuf mempool for buffer management
  net/af_xdp: enable zero copy

 MAINTAINERS                                   |    6 +
 config/common_base                            |    5 +
 config/common_linux                           |    1 +
 doc/guides/nics/af_xdp.rst                    |   45 +
 doc/guides/nics/features/af_xdp.ini           |   11 +
 doc/guides/nics/index.rst                     |    1 +
 doc/guides/rel_notes/release_19_05.rst        |    7 +
 drivers/net/Makefile                          |    1 +
 drivers/net/af_xdp/Makefile                   |   32 +
 drivers/net/af_xdp/meson.build                |   21 +
 drivers/net/af_xdp/rte_eth_af_xdp.c           | 1028 +++++++++++++++++
 drivers/net/af_xdp/rte_pmd_af_xdp_version.map |    3 +
 drivers/net/meson.build                       |    1 +
 lib/librte_mbuf/rte_mbuf.c                    |   29 +-
 lib/librte_mbuf/rte_mbuf.h                    |   45 +
 lib/librte_mempool/rte_mempool.c              |    3 +
 lib/librte_mempool/rte_mempool.h              |    1 +
 mk/rte.app.mk                                 |    1 +
 18 files changed, 1236 insertions(+), 5 deletions(-)
 create mode 100644 doc/guides/nics/af_xdp.rst
 create mode 100644 doc/guides/nics/features/af_xdp.ini
 create mode 100644 drivers/net/af_xdp/Makefile
 create mode 100644 drivers/net/af_xdp/meson.build
 create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
 create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map

-- 
2.17.1

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v4 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-22 13:01 ` [PATCH v4 0/5] Introduce AF_XDP PMD Xiaolong Ye
@ 2019-03-22 13:01   ` Xiaolong Ye
  2019-03-22 14:32     ` Maxime Coquelin
  2019-03-24 12:10     ` Luca Boccassi
  2019-03-22 13:01   ` [PATCH v4 2/5] lib/mbuf: introduce helper to create mempool with flags Xiaolong Ye
                     ` (3 subsequent siblings)
  4 siblings, 2 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-22 13:01 UTC (permalink / raw)
  To: dev; +Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Xiaolong Ye

Add a new PMD driver for AF_XDP, which is a proposed faster version of
the AF_PACKET interface in Linux. For more info about AF_XDP, please
refer to [1] [2].

This is the vanilla version of the PMD, which just uses a raw buffer
registered as the umem.

[1] https://fosdem.org/2018/schedule/event/af_xdp/
[2] https://lwn.net/Articles/745934/

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 MAINTAINERS                                   |   6 +
 config/common_base                            |   5 +
 config/common_linux                           |   1 +
 doc/guides/nics/af_xdp.rst                    |  45 +
 doc/guides/nics/features/af_xdp.ini           |  11 +
 doc/guides/nics/index.rst                     |   1 +
 doc/guides/rel_notes/release_19_05.rst        |   7 +
 drivers/net/Makefile                          |   1 +
 drivers/net/af_xdp/Makefile                   |  32 +
 drivers/net/af_xdp/meson.build                |  21 +
 drivers/net/af_xdp/rte_eth_af_xdp.c           | 940 ++++++++++++++++++
 drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   3 +
 drivers/net/meson.build                       |   1 +
 mk/rte.app.mk                                 |   1 +
 14 files changed, 1075 insertions(+)
 create mode 100644 doc/guides/nics/af_xdp.rst
 create mode 100644 doc/guides/nics/features/af_xdp.ini
 create mode 100644 drivers/net/af_xdp/Makefile
 create mode 100644 drivers/net/af_xdp/meson.build
 create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
 create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 452b8eb82..1cc54b439 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -468,6 +468,12 @@ M: John W. Linville <linville@tuxdriver.com>
 F: drivers/net/af_packet/
 F: doc/guides/nics/features/afpacket.ini
 
+Linux AF_XDP
+M: Xiaolong Ye <xiaolong.ye@intel.com>
+M: Qi Zhang <qi.z.zhang@intel.com>
+F: drivers/net/af_xdp/
+F: doc/guides/nics/af_xdp.rst
+
 Amazon ENA
 M: Marcin Wojtas <mw@semihalf.com>
 M: Michal Krawczyk <mk@semihalf.com>
diff --git a/config/common_base b/config/common_base
index 0b09a9348..4044de205 100644
--- a/config/common_base
+++ b/config/common_base
@@ -416,6 +416,11 @@ CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX_FREE=n
 #
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
 
+#
+# Compile software PMD backed by AF_XDP sockets (Linux only)
+#
+CONFIG_RTE_LIBRTE_PMD_AF_XDP=n
+
 #
 # Compile link bonding PMD library
 #
diff --git a/config/common_linux b/config/common_linux
index 75334273d..0b1249da0 100644
--- a/config/common_linux
+++ b/config/common_linux
@@ -19,6 +19,7 @@ CONFIG_RTE_LIBRTE_VHOST_POSTCOPY=n
 CONFIG_RTE_LIBRTE_PMD_VHOST=y
 CONFIG_RTE_LIBRTE_IFC_PMD=y
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
+CONFIG_RTE_LIBRTE_PMD_AF_XDP=y
 CONFIG_RTE_LIBRTE_PMD_SOFTNIC=y
 CONFIG_RTE_LIBRTE_PMD_TAP=y
 CONFIG_RTE_LIBRTE_AVP_PMD=y
diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
new file mode 100644
index 000000000..dd5654dd1
--- /dev/null
+++ b/doc/guides/nics/af_xdp.rst
@@ -0,0 +1,45 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2018 Intel Corporation.
+
+AF_XDP Poll Mode Driver
+==========================
+
+AF_XDP is an address family that is optimized for high performance
+packet processing. AF_XDP sockets enable an XDP program to
+redirect packets to a memory buffer in userspace.
+
+For the full details of AF_XDP sockets, you can refer to
+`AF_XDP documentation in the Kernel
+<https://www.kernel.org/doc/Documentation/networking/af_xdp.rst>`_.
+
+This Linux-specific PMD driver creates the AF_XDP socket and binds it to a
+specific netdev queue. It allows a DPDK application to send and receive raw
+packets through the socket, bypassing the kernel network stack.
+The current implementation only supports a single queue; multi-queue support
+will be added later.
+
+Options
+-------
+
+The following options can be provided to set up an af_xdp port in DPDK.
+
+*   ``iface`` - name of the Kernel interface to attach to (required);
+*   ``queue`` - netdev queue id (optional, default 0);
+
+Prerequisites
+-------------
+
+This is a Linux-specific PMD, thus the following prerequisites apply:
+
+*  A Linux Kernel (version >= 4.18) with XDP sockets configuration enabled;
+*  libbpf (from the kernel source tree, version >= v5.1-rc1) with the latest af_xdp support installed;
+*  A Kernel bound interface to attach to.
+
+Set up an af_xdp interface
+-----------------------------
+
+The following example will set up an af_xdp interface in DPDK:
+
+.. code-block:: console
+
+    --vdev net_af_xdp,iface=ens786f1,queue=0
diff --git a/doc/guides/nics/features/af_xdp.ini b/doc/guides/nics/features/af_xdp.ini
new file mode 100644
index 000000000..36953c2de
--- /dev/null
+++ b/doc/guides/nics/features/af_xdp.ini
@@ -0,0 +1,11 @@
+;
+; Supported features of the 'af_xdp' network poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Link status          = Y
+MTU update           = Y
+Promiscuous mode     = Y
+Stats per queue      = Y
+x86-64               = Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 5c80e3baa..a4b80a3d0 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -12,6 +12,7 @@ Network Interface Controller Drivers
     features
     build_and_test
     af_packet
+    af_xdp
     ark
     atlantic
     avp
diff --git a/doc/guides/rel_notes/release_19_05.rst b/doc/guides/rel_notes/release_19_05.rst
index 61a2c7383..062facf89 100644
--- a/doc/guides/rel_notes/release_19_05.rst
+++ b/doc/guides/rel_notes/release_19_05.rst
@@ -65,6 +65,13 @@ New Features
     process.
   * Added support for Rx packet types list in a secondary process.
 
+* **Added the AF_XDP PMD.**
+
+  Added a Linux-specific PMD driver for AF_XDP. It can create an AF_XDP
+  socket and bind it to a specific netdev queue, allowing a DPDK application
+  to send and receive raw packets through the socket while bypassing the
+  kernel network stack, achieving high performance packet processing.
+
 * **Updated Mellanox drivers.**
 
    New features and improvements were done in mlx4 and mlx5 PMDs:
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 502869a87..5d401b8c5 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -9,6 +9,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD),d)
 endif
 
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += af_packet
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += af_xdp
 DIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark
 DIRS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD) += atlantic
 DIRS-$(CONFIG_RTE_LIBRTE_AVP_PMD) += avp
diff --git a/drivers/net/af_xdp/Makefile b/drivers/net/af_xdp/Makefile
new file mode 100644
index 000000000..db7d9aa57
--- /dev/null
+++ b/drivers/net/af_xdp/Makefile
@@ -0,0 +1,32 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_af_xdp.a
+
+EXPORT_MAP := rte_pmd_af_xdp_version.map
+
+LIBABIVER := 1
+
+CFLAGS += -O3
+
+# require kernel version >= v5.1-rc1
+CFLAGS += -I$(RTE_KERNELDIR)/tools/include
+CFLAGS += -I$(RTE_KERNELDIR)/tools/lib/bpf
+
+CFLAGS += $(WERROR_FLAGS)
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
+LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
+LDLIBS += -lrte_bus_vdev
+LDLIBS += -lbpf
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += rte_eth_af_xdp.c
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
new file mode 100644
index 000000000..635e67483
--- /dev/null
+++ b/drivers/net/af_xdp/meson.build
@@ -0,0 +1,21 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+if host_machine.system() != 'linux'
+	build = false
+endif
+
+bpf_dep = dependency('libbpf', required: false)
+if bpf_dep.found()
+	build = true
+else
+	bpf_dep = cc.find_library('bpf', required: false)
+	if bpf_dep.found() and cc.has_header('xsk.h', dependencies: bpf_dep)
+		build = true
+		pkgconfig_extra_libs += '-lbpf'
+	else
+		build = false
+	endif
+endif
+sources = files('rte_eth_af_xdp.c')
+ext_deps += bpf_dep
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
new file mode 100644
index 000000000..9f0012347
--- /dev/null
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -0,0 +1,940 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation.
+ */
+
+#include <rte_mbuf.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+#include <rte_string_fns.h>
+
+#include <linux/if_ether.h>
+#include <linux/if_xdp.h>
+#include <linux/if_link.h>
+#include <asm/barrier.h>
+#include <arpa/inet.h>
+#include <net/if.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <unistd.h>
+#include <poll.h>
+#include <bpf/bpf.h>
+#include <xsk.h>
+
+#ifndef SOL_XDP
+#define SOL_XDP 283
+#endif
+
+#ifndef AF_XDP
+#define AF_XDP 44
+#endif
+
+#ifndef PF_XDP
+#define PF_XDP AF_XDP
+#endif
+
+static int af_xdp_logtype;
+
+#define AF_XDP_LOG(level, fmt, args...)			\
+	rte_log(RTE_LOG_ ## level, af_xdp_logtype,	\
+		"%s(): " fmt, __func__, ##args)
+
+#define ETH_AF_XDP_IFACE_ARG			"iface"
+#define ETH_AF_XDP_QUEUE_IDX_ARG		"queue"
+
+#define ETH_AF_XDP_FRAME_SIZE		XSK_UMEM__DEFAULT_FRAME_SIZE
+#define ETH_AF_XDP_NUM_BUFFERS		4096
+#define ETH_AF_XDP_DATA_HEADROOM	0
+#define ETH_AF_XDP_DFLT_NUM_DESCS	XSK_RING_CONS__DEFAULT_NUM_DESCS
+#define ETH_AF_XDP_DFLT_QUEUE_IDX	0
+
+#define ETH_AF_XDP_RX_BATCH_SIZE	32
+#define ETH_AF_XDP_TX_BATCH_SIZE	32
+
+#define ETH_AF_XDP_MAX_QUEUE_PAIRS     16
+
+struct xsk_umem_info {
+	struct xsk_ring_prod fq;
+	struct xsk_ring_cons cq;
+	struct xsk_umem *umem;
+	struct rte_ring *buf_ring;
+	void *buffer;
+};
+
+struct rx_stats {
+	uint64_t rx_pkts;
+	uint64_t rx_bytes;
+	uint64_t rx_dropped;
+};
+
+struct pkt_rx_queue {
+	struct xsk_ring_cons rx;
+	struct xsk_umem_info *umem;
+	struct xsk_socket *xsk;
+	struct rte_mempool *mb_pool;
+
+	struct rx_stats stats;
+
+	struct pkt_tx_queue *pair;
+	uint16_t queue_idx;
+};
+
+struct tx_stats {
+	uint64_t tx_pkts;
+	uint64_t err_pkts;
+	uint64_t tx_bytes;
+};
+
+struct pkt_tx_queue {
+	struct xsk_ring_prod tx;
+
+	struct tx_stats stats;
+
+	struct pkt_rx_queue *pair;
+	uint16_t queue_idx;
+};
+
+struct pmd_internals {
+	int if_index;
+	char if_name[IFNAMSIZ];
+	uint16_t queue_idx;
+	struct ether_addr eth_addr;
+	struct xsk_umem_info *umem;
+	struct rte_mempool *mb_pool_share;
+
+	struct pkt_rx_queue rx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
+	struct pkt_tx_queue tx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
+};
+
+static const char * const valid_arguments[] = {
+	ETH_AF_XDP_IFACE_ARG,
+	ETH_AF_XDP_QUEUE_IDX_ARG,
+	NULL
+};
+
+static struct rte_eth_link pmd_link = {
+	.link_speed = ETH_SPEED_NUM_10G,
+	.link_duplex = ETH_LINK_FULL_DUPLEX,
+	.link_status = ETH_LINK_DOWN,
+	.link_autoneg = ETH_LINK_AUTONEG
+};
+
+static inline int
+reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
+{
+	struct xsk_ring_prod *fq = &umem->fq;
+	uint32_t idx;
+	void *addr = NULL;
+	int i, ret;
+
+	ret = xsk_ring_prod__reserve(fq, reserve_size, &idx);
+	if (!ret) {
+		AF_XDP_LOG(ERR, "Failed to reserve enough fq descs.\n");
+		return ret;
+	}
+
+	for (i = 0; i < reserve_size; i++) {
+		__u64 *fq_addr;
+		if (rte_ring_dequeue(umem->buf_ring, &addr) != 0)
+			break;
+		fq_addr = xsk_ring_prod__fill_addr(fq, idx++);
+		*fq_addr = (uint64_t)addr;
+	}
+
+	xsk_ring_prod__submit(fq, i);
+
+	return 0;
+}
+
+static uint16_t
+eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+	struct pkt_rx_queue *rxq = queue;
+	struct xsk_ring_cons *rx = &rxq->rx;
+	struct xsk_umem_info *umem = rxq->umem;
+	struct xsk_ring_prod *fq = &umem->fq;
+	uint32_t idx_rx;
+	uint32_t free_thresh = fq->size >> 1;
+	struct rte_mbuf *mbufs[ETH_AF_XDP_RX_BATCH_SIZE];
+	unsigned long dropped = 0;
+	unsigned long rx_bytes = 0;
+	uint16_t count = 0;
+	int rcvd, i;
+
+	nb_pkts = RTE_MIN(nb_pkts, ETH_AF_XDP_RX_BATCH_SIZE);
+
+	rcvd = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
+	if (rcvd == 0)
+		return 0;
+
+	if (xsk_prod_nb_free(fq, free_thresh) >= free_thresh)
+		(void)reserve_fill_queue(umem, ETH_AF_XDP_RX_BATCH_SIZE);
+
+	if (rte_pktmbuf_alloc_bulk(rxq->mb_pool, mbufs, rcvd) != 0)
+		return 0;
+
+	for (i = 0; i < rcvd; i++) {
+		const struct xdp_desc *desc;
+		uint64_t addr;
+		uint32_t len;
+		void *pkt;
+
+		desc = xsk_ring_cons__rx_desc(rx, idx_rx++);
+		addr = desc->addr;
+		len = desc->len;
+		pkt = xsk_umem__get_data(rxq->umem->buffer, addr);
+
+		rte_memcpy(rte_pktmbuf_mtod(mbufs[i], void *), pkt, len);
+		rte_pktmbuf_pkt_len(mbufs[i]) = len;
+		rte_pktmbuf_data_len(mbufs[i]) = len;
+		rx_bytes += len;
+		bufs[count++] = mbufs[i];
+
+		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+	}
+
+	xsk_ring_cons__release(rx, rcvd);
+
+	/* statistics */
+	rxq->stats.rx_pkts += (rcvd - dropped);
+	rxq->stats.rx_bytes += rx_bytes;
+
+	return count;
+}
+
+static void pull_umem_cq(struct xsk_umem_info *umem, int size)
+{
+	struct xsk_ring_cons *cq = &umem->cq;
+	size_t i, n;
+	uint32_t idx_cq;
+
+	n = xsk_ring_cons__peek(cq, size, &idx_cq);
+
+	for (i = 0; i < n; i++) {
+		uint64_t addr;
+		addr = *xsk_ring_cons__comp_addr(cq, idx_cq++);
+		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+	}
+
+	xsk_ring_cons__release(cq, n);
+}
+
+static void kick_tx(struct pkt_tx_queue *txq)
+{
+	struct xsk_umem_info *umem = txq->pair->umem;
+
+	while (send(xsk_socket__fd(txq->pair->xsk), NULL,
+		      0, MSG_DONTWAIT) < 0) {
+		/* something unexpected */
+		if (errno != EBUSY && errno != EAGAIN && errno != EINTR)
+			break;
+
+		/* pull from completion queue to leave more space */
+		if (errno == EAGAIN)
+			pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
+	}
+	pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
+}
+
+static uint16_t
+eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+	struct pkt_tx_queue *txq = queue;
+	struct xsk_umem_info *umem = txq->pair->umem;
+	struct rte_mbuf *mbuf;
+	void *addrs[ETH_AF_XDP_TX_BATCH_SIZE];
+	unsigned long tx_bytes = 0;
+	int i, valid = 0;
+	uint32_t idx_tx;
+
+	nb_pkts = RTE_MIN(nb_pkts, ETH_AF_XDP_TX_BATCH_SIZE);
+
+	pull_umem_cq(umem, nb_pkts);
+
+	nb_pkts = rte_ring_dequeue_bulk(umem->buf_ring, addrs,
+					nb_pkts, NULL);
+	if (nb_pkts == 0)
+		return 0;
+
+	if (xsk_ring_prod__reserve(&txq->tx, nb_pkts, &idx_tx) != nb_pkts) {
+		kick_tx(txq);
+		rte_ring_enqueue_bulk(umem->buf_ring, addrs, nb_pkts, NULL);
+		return 0;
+	}
+
+	for (i = 0; i < nb_pkts; i++) {
+		struct xdp_desc *desc;
+		void *pkt;
+		uint32_t buf_len = ETH_AF_XDP_FRAME_SIZE
+					- ETH_AF_XDP_DATA_HEADROOM;
+		desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i);
+		mbuf = bufs[i];
+		if (mbuf->pkt_len <= buf_len) {
+			desc->addr = (uint64_t)addrs[valid];
+			desc->len = mbuf->pkt_len;
+			pkt = xsk_umem__get_data(umem->buffer,
+						 desc->addr);
+			rte_memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
+			       desc->len);
+			valid++;
+			tx_bytes += mbuf->pkt_len;
+		}
+		rte_pktmbuf_free(mbuf);
+	}
+
+	xsk_ring_prod__submit(&txq->tx, nb_pkts);
+
+	kick_tx(txq);
+
+	if (valid < nb_pkts)
+		rte_ring_enqueue_bulk(umem->buf_ring, &addrs[valid],
+				 nb_pkts - valid, NULL);
+
+	txq->stats.err_pkts += nb_pkts - valid;
+	txq->stats.tx_pkts += valid;
+	txq->stats.tx_bytes += tx_bytes;
+
+	return nb_pkts;
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = ETH_LINK_UP;
+
+	return 0;
+}
+
+/* This function gets called when the current port gets stopped. */
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = ETH_LINK_DOWN;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
+{
+	/* rx/tx must be paired */
+	if (dev->data->nb_rx_queues != dev->data->nb_tx_queues)
+		return -EINVAL;
+
+	return 0;
+}
+
+static void
+eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	dev_info->if_index = internals->if_index;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_pktlen = ETH_FRAME_LEN;
+	dev_info->max_rx_queues = 1;
+	dev_info->max_tx_queues = 1;
+
+	dev_info->default_rxportconf.nb_queues = 1;
+	dev_info->default_txportconf.nb_queues = 1;
+	dev_info->default_rxportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
+	dev_info->default_txportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
+}
+
+static int
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct xdp_statistics xdp_stats;
+	struct pkt_rx_queue *rxq;
+	socklen_t optlen;
+	int i, ret;
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		optlen = sizeof(struct xdp_statistics);
+		rxq = &internals->rx_queues[i];
+		stats->q_ipackets[i] = internals->rx_queues[i].stats.rx_pkts;
+		stats->q_ibytes[i] = internals->rx_queues[i].stats.rx_bytes;
+
+		stats->q_opackets[i] = internals->tx_queues[i].stats.tx_pkts;
+		stats->q_obytes[i] = internals->tx_queues[i].stats.tx_bytes;
+
+		stats->ipackets += stats->q_ipackets[i];
+		stats->ibytes += stats->q_ibytes[i];
+		stats->imissed += internals->rx_queues[i].stats.rx_dropped;
+		ret = getsockopt(xsk_socket__fd(rxq->xsk), SOL_XDP,
+				XDP_STATISTICS, &xdp_stats, &optlen);
+		if (ret != 0) {
+			AF_XDP_LOG(ERR, "getsockopt() failed for XDP_STATISTICS.\n");
+			return -1;
+		}
+		stats->imissed += xdp_stats.rx_dropped;
+
+		stats->opackets += stats->q_opackets[i];
+		stats->oerrors += internals->tx_queues[i].stats.err_pkts;
+		stats->obytes += stats->q_obytes[i];
+	}
+
+	return 0;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	int i;
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		memset(&internals->rx_queues[i].stats, 0,
+					sizeof(struct rx_stats));
+		memset(&internals->tx_queues[i].stats, 0,
+					sizeof(struct tx_stats));
+	}
+}
+
+static void remove_xdp_program(struct pmd_internals *internals)
+{
+	uint32_t curr_prog_id = 0;
+
+	if (bpf_get_link_xdp_id(internals->if_index, &curr_prog_id,
+				XDP_FLAGS_UPDATE_IF_NOEXIST)) {
+		AF_XDP_LOG(ERR, "bpf_get_link_xdp_id failed\n");
+		return;
+	}
+	bpf_set_link_xdp_fd(internals->if_index, -1,
+			XDP_FLAGS_UPDATE_IF_NOEXIST);
+}
+
+static void
+eth_dev_close(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct pkt_rx_queue *rxq;
+	int i;
+
+	AF_XDP_LOG(INFO, "Closing AF_XDP ethdev on numa socket %u\n",
+		rte_socket_id());
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		rxq = &internals->rx_queues[i];
+		if (rxq->umem == NULL)
+			break;
+		xsk_socket__delete(rxq->xsk);
+	}
+
+	(void)xsk_umem__delete(internals->umem->umem);
+	remove_xdp_program(internals);
+}
+
+static void
+eth_queue_release(void *q __rte_unused)
+{
+}
+
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+		int wait_to_complete __rte_unused)
+{
+	return 0;
+}
+
+static void xdp_umem_destroy(struct xsk_umem_info *umem)
+{
+	free(umem->buffer);
+	umem->buffer = NULL;
+
+	rte_ring_free(umem->buf_ring);
+	umem->buf_ring = NULL;
+
+	rte_free(umem);
+	umem = NULL;
+}
+
+static struct xsk_umem_info *xdp_umem_configure(void)
+{
+	struct xsk_umem_info *umem;
+	struct xsk_umem_config usr_config = {
+		.fill_size = ETH_AF_XDP_DFLT_NUM_DESCS,
+		.comp_size = ETH_AF_XDP_DFLT_NUM_DESCS,
+		.frame_size = ETH_AF_XDP_FRAME_SIZE,
+		.frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
+	void *bufs = NULL;
+	int ret;
+	uint64_t i;
+
+	umem = rte_zmalloc_socket("umem", sizeof(*umem), 0, rte_socket_id());
+	if (umem == NULL) {
+		AF_XDP_LOG(ERR, "Failed to allocate umem info\n");
+		return NULL;
+	}
+
+	umem->buf_ring = rte_ring_create("af_xdp_ring",
+					 ETH_AF_XDP_NUM_BUFFERS,
+					 SOCKET_ID_ANY,
+					 0x0);
+	if (umem->buf_ring == NULL) {
+		AF_XDP_LOG(ERR, "Failed to create rte_ring\n");
+		goto err;
+	}
+
+	for (i = 0; i < ETH_AF_XDP_NUM_BUFFERS; i++)
+		rte_ring_enqueue(umem->buf_ring,
+				 (void *)(i * ETH_AF_XDP_FRAME_SIZE +
+					  ETH_AF_XDP_DATA_HEADROOM));
+
+	if (posix_memalign(&bufs, getpagesize(),
+			   ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE)) {
+		AF_XDP_LOG(ERR, "Failed to allocate memory pool.\n");
+		goto err;
+	}
+	ret = xsk_umem__create(&umem->umem, bufs,
+			       ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
+			       &umem->fq, &umem->cq,
+			       &usr_config);
+
+	if (ret) {
+		AF_XDP_LOG(ERR, "Failed to create umem\n");
+		goto err;
+	}
+	umem->buffer = bufs;
+
+	return umem;
+
+err:
+	xdp_umem_destroy(umem);
+	return NULL;
+}
+
+static int
+xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
+	      int ring_size)
+{
+	struct xsk_socket_config cfg;
+	struct pkt_tx_queue *txq = rxq->pair;
+	int ret = 0;
+	int reserve_size;
+
+	rxq->umem = xdp_umem_configure();
+	if (rxq->umem == NULL) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	cfg.rx_size = ring_size;
+	cfg.tx_size = ring_size;
+	cfg.libbpf_flags = 0;
+	cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
+	cfg.bind_flags = 0;
+	ret = xsk_socket__create(&rxq->xsk, internals->if_name,
+			internals->queue_idx, rxq->umem->umem, &rxq->rx,
+			&txq->tx, &cfg);
+	if (ret) {
+		AF_XDP_LOG(ERR, "Failed to create xsk socket.\n");
+		goto err;
+	}
+
+	reserve_size = ETH_AF_XDP_DFLT_NUM_DESCS / 2;
+	ret = reserve_fill_queue(rxq->umem, reserve_size);
+	if (ret) {
+		AF_XDP_LOG(ERR, "Failed to reserve fill queue.\n");
+		goto err;
+	}
+
+	return 0;
+
+err:
+	xdp_umem_destroy(rxq->umem);
+
+	return ret;
+}
+
+static void
+queue_reset(struct pmd_internals *internals, uint16_t queue_idx)
+{
+	struct pkt_rx_queue *rxq = &internals->rx_queues[queue_idx];
+	struct pkt_tx_queue *txq = rxq->pair;
+	int xsk_fd = xsk_socket__fd(rxq->xsk);
+
+	if (xsk_fd) {
+		close(xsk_fd);
+		if (internals->umem != NULL) {
+			xdp_umem_destroy(internals->umem);
+			internals->umem = NULL;
+		}
+	}
+	memset(rxq, 0, sizeof(*rxq));
+	memset(txq, 0, sizeof(*txq));
+	rxq->pair = txq;
+	txq->pair = rxq;
+	rxq->queue_idx = queue_idx;
+	txq->queue_idx = queue_idx;
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev,
+		   uint16_t rx_queue_id,
+		   uint16_t nb_rx_desc,
+		   unsigned int socket_id __rte_unused,
+		   const struct rte_eth_rxconf *rx_conf __rte_unused,
+		   struct rte_mempool *mb_pool)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	uint32_t buf_size, data_size;
+	struct pkt_rx_queue *rxq;
+	int ret;
+
+	rxq = &internals->rx_queues[rx_queue_id];
+	queue_reset(internals, rx_queue_id);
+
+	/* Now get the space available for data in the mbuf */
+	buf_size = rte_pktmbuf_data_room_size(mb_pool) -
+		RTE_PKTMBUF_HEADROOM;
+	data_size = ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_DATA_HEADROOM;
+
+	if (data_size > buf_size) {
+		AF_XDP_LOG(ERR, "%s: %d bytes will not fit in mbuf (%d bytes)\n",
+			dev->device->name, data_size, buf_size);
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	rxq->mb_pool = mb_pool;
+
+	if (xsk_configure(internals, rxq, nb_rx_desc)) {
+		AF_XDP_LOG(ERR, "Failed to configure xdp socket\n");
+		ret = -EINVAL;
+		goto err;
+	}
+
+	internals->umem = rxq->umem;
+
+	dev->data->rx_queues[rx_queue_id] = rxq;
+	return 0;
+
+err:
+	queue_reset(internals, rx_queue_id);
+	return ret;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev,
+		   uint16_t tx_queue_id,
+		   uint16_t nb_tx_desc __rte_unused,
+		   unsigned int socket_id __rte_unused,
+		   const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct pkt_tx_queue *txq;
+
+	txq = &internals->tx_queues[tx_queue_id];
+
+	dev->data->tx_queues[tx_queue_id] = txq;
+	return 0;
+}
+
+static int
+eth_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct ifreq ifr = { .ifr_mtu = mtu };
+	int ret;
+	int s;
+
+	s = socket(PF_INET, SOCK_DGRAM, 0);
+	if (s < 0)
+		return -EINVAL;
+
+	strlcpy(ifr.ifr_name, internals->if_name, IFNAMSIZ);
+	ret = ioctl(s, SIOCSIFMTU, &ifr);
+	close(s);
+
+	return (ret < 0) ? -errno : 0;
+}
+
+static void
+eth_dev_change_flags(char *if_name, uint32_t flags, uint32_t mask)
+{
+	struct ifreq ifr;
+	int s;
+
+	s = socket(PF_INET, SOCK_DGRAM, 0);
+	if (s < 0)
+		return;
+
+	strlcpy(ifr.ifr_name, if_name, IFNAMSIZ);
+	if (ioctl(s, SIOCGIFFLAGS, &ifr) < 0)
+		goto out;
+	ifr.ifr_flags &= mask;
+	ifr.ifr_flags |= flags;
+	if (ioctl(s, SIOCSIFFLAGS, &ifr) < 0)
+		goto out;
+out:
+	close(s);
+}
+
+static void
+eth_dev_promiscuous_enable(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	eth_dev_change_flags(internals->if_name, IFF_PROMISC, ~0);
+}
+
+static void
+eth_dev_promiscuous_disable(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	eth_dev_change_flags(internals->if_name, 0, ~IFF_PROMISC);
+}
+
+static const struct eth_dev_ops ops = {
+	.dev_start = eth_dev_start,
+	.dev_stop = eth_dev_stop,
+	.dev_close = eth_dev_close,
+	.dev_configure = eth_dev_configure,
+	.dev_infos_get = eth_dev_info,
+	.mtu_set = eth_dev_mtu_set,
+	.promiscuous_enable = eth_dev_promiscuous_enable,
+	.promiscuous_disable = eth_dev_promiscuous_disable,
+	.rx_queue_setup = eth_rx_queue_setup,
+	.tx_queue_setup = eth_tx_queue_setup,
+	.rx_queue_release = eth_queue_release,
+	.tx_queue_release = eth_queue_release,
+	.link_update = eth_link_update,
+	.stats_get = eth_stats_get,
+	.stats_reset = eth_stats_reset,
+};
+
+/** parse an integer argument */
+static int
+parse_integer_arg(const char *key __rte_unused,
+		  const char *value, void *extra_args)
+{
+	int *i = (int *)extra_args;
+	char *end;
+
+	*i = strtol(value, &end, 10);
+	if (*i < 0) {
+		AF_XDP_LOG(ERR, "Argument has to be positive.\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/** parse name argument */
+static int
+parse_name_arg(const char *key __rte_unused,
+	       const char *value, void *extra_args)
+{
+	char *name = extra_args;
+
+	if (strnlen(value, IFNAMSIZ) > IFNAMSIZ - 1) {
+		AF_XDP_LOG(ERR, "Invalid name %s, should be less than %u bytes.\n",
+			   value, IFNAMSIZ);
+		return -EINVAL;
+	}
+
+	strlcpy(name, value, IFNAMSIZ);
+
+	return 0;
+}
+
+static int
+parse_parameters(struct rte_kvargs *kvlist,
+		 char *if_name,
+		 int *queue_idx)
+{
+	int ret;
+
+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_IFACE_ARG,
+				 &parse_name_arg, if_name);
+	if (ret < 0)
+		goto free_kvlist;
+
+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_QUEUE_IDX_ARG,
+				 &parse_integer_arg, queue_idx);
+	if (ret < 0)
+		goto free_kvlist;
+
+free_kvlist:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
+static int
+get_iface_info(const char *if_name,
+	       struct ether_addr *eth_addr,
+	       int *if_index)
+{
+	struct ifreq ifr;
+	int sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_IP);
+
+	if (sock < 0)
+		return -1;
+
+	strlcpy(ifr.ifr_name, if_name, IFNAMSIZ);
+	if (ioctl(sock, SIOCGIFINDEX, &ifr))
+		goto error;
+
+	*if_index = ifr.ifr_ifindex;
+
+	if (ioctl(sock, SIOCGIFHWADDR, &ifr))
+		goto error;
+
+	rte_memcpy(eth_addr, ifr.ifr_hwaddr.sa_data, ETHER_ADDR_LEN);
+
+	close(sock);
+	return 0;
+
+error:
+	close(sock);
+	return -1;
+}
+
+static struct rte_eth_dev *
+init_internals(struct rte_vdev_device *dev,
+	       const char *if_name,
+	       int queue_idx)
+{
+	const char *name = rte_vdev_device_name(dev);
+	const unsigned int numa_node = dev->device.numa_node;
+	struct pmd_internals *internals;
+	struct rte_eth_dev *eth_dev;
+	int ret;
+	int i;
+
+	internals = rte_zmalloc_socket(name, sizeof(*internals), 0, numa_node);
+	if (internals == NULL)
+		return NULL;
+
+	internals->queue_idx = queue_idx;
+	strlcpy(internals->if_name, if_name, IFNAMSIZ);
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		internals->tx_queues[i].pair = &internals->rx_queues[i];
+		internals->rx_queues[i].pair = &internals->tx_queues[i];
+	}
+
+	ret = get_iface_info(if_name, &internals->eth_addr,
+			     &internals->if_index);
+	if (ret)
+		goto err;
+
+	eth_dev = rte_eth_vdev_allocate(dev, 0);
+	if (eth_dev == NULL)
+		goto err;
+
+	eth_dev->data->dev_private = internals;
+	eth_dev->data->dev_link = pmd_link;
+	eth_dev->data->mac_addrs = &internals->eth_addr;
+	eth_dev->dev_ops = &ops;
+	eth_dev->rx_pkt_burst = eth_af_xdp_rx;
+	eth_dev->tx_pkt_burst = eth_af_xdp_tx;
+
+	return eth_dev;
+
+err:
+	rte_free(internals);
+	return NULL;
+}
+
+static int
+rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
+{
+	struct rte_kvargs *kvlist;
+	char if_name[IFNAMSIZ];
+	int xsk_queue_idx = ETH_AF_XDP_DFLT_QUEUE_IDX;
+	struct rte_eth_dev *eth_dev = NULL;
+	const char *name;
+
+	AF_XDP_LOG(INFO, "Initializing pmd_af_xdp for %s\n",
+		rte_vdev_device_name(dev));
+
+	name = rte_vdev_device_name(dev);
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY &&
+		strlen(rte_vdev_device_args(dev)) == 0) {
+		eth_dev = rte_eth_dev_attach_secondary(name);
+		if (eth_dev == NULL) {
+			AF_XDP_LOG(ERR, "Failed to probe %s\n", name);
+			return -EINVAL;
+		}
+		eth_dev->dev_ops = &ops;
+		rte_eth_dev_probing_finish(eth_dev);
+		return 0;
+	}
+
+	kvlist = rte_kvargs_parse(rte_vdev_device_args(dev), valid_arguments);
+	if (kvlist == NULL) {
+		AF_XDP_LOG(ERR, "Invalid kvargs key\n");
+		return -EINVAL;
+	}
+
+	if (dev->device.numa_node == SOCKET_ID_ANY)
+		dev->device.numa_node = rte_socket_id();
+
+	if (parse_parameters(kvlist, if_name, &xsk_queue_idx) < 0) {
+		AF_XDP_LOG(ERR, "Invalid kvargs value\n");
+		return -EINVAL;
+	}
+
+	eth_dev = init_internals(dev, if_name, xsk_queue_idx);
+	if (eth_dev == NULL) {
+		AF_XDP_LOG(ERR, "Failed to init internals\n");
+		return -1;
+	}
+
+	rte_eth_dev_probing_finish(eth_dev);
+
+	return 0;
+}
+
+static int
+rte_pmd_af_xdp_remove(struct rte_vdev_device *dev)
+{
+	struct rte_eth_dev *eth_dev = NULL;
+	struct pmd_internals *internals;
+
+	AF_XDP_LOG(INFO, "Removing AF_XDP ethdev on numa socket %u\n",
+		rte_socket_id());
+
+	if (dev == NULL)
+		return -1;
+
+	/* find the ethdev entry */
+	eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(dev));
+	if (eth_dev == NULL)
+		return -1;
+
+	internals = eth_dev->data->dev_private;
+
+	rte_ring_free(internals->umem->buf_ring);
+	rte_free(internals->umem->buffer);
+	rte_free(internals->umem);
+
+	rte_eth_dev_release_port(eth_dev);
+
+
+	return 0;
+}
+
+static struct rte_vdev_driver pmd_af_xdp_drv = {
+	.probe = rte_pmd_af_xdp_probe,
+	.remove = rte_pmd_af_xdp_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(net_af_xdp, pmd_af_xdp_drv);
RTE_PMD_REGISTER_PARAM_STRING(net_af_xdp,
+			      "iface=<string> "
+			      "queue=<int> ");
+
+RTE_INIT(af_xdp_init_log)
+{
+	af_xdp_logtype = rte_log_register("pmd.net.xdp");
+	if (af_xdp_logtype >= 0)
+		rte_log_set_level(af_xdp_logtype, RTE_LOG_NOTICE);
+}
diff --git a/drivers/net/af_xdp/rte_pmd_af_xdp_version.map b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
new file mode 100644
index 000000000..c6db030fe
--- /dev/null
+++ b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
@@ -0,0 +1,3 @@
+DPDK_19.05 {
+	local: *;
+};
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index 3ecc78cee..1105e72d8 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2017 Intel Corporation
 
 drivers = ['af_packet',
+	'af_xdp',
 	'ark',
 	'atlantic',
 	'avp',
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 262132fc6..be0af73cc 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -143,6 +143,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_DPAA2_MEMPOOL)  += -lrte_mempool_dpaa2
 endif
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     += -lrte_pmd_af_xdp -lelf -lbpf
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ARK_PMD)        += -lrte_pmd_ark
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD)   += -lrte_pmd_atlantic
 _LDLIBS-$(CONFIG_RTE_LIBRTE_AVP_PMD)        += -lrte_pmd_avp
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* [PATCH v4 2/5] lib/mbuf: introduce helper to create mempool with flags
  2019-03-22 13:01 ` [PATCH v4 0/5] Introduce AF_XDP PMD Xiaolong Ye
  2019-03-22 13:01   ` [PATCH v4 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
@ 2019-03-22 13:01   ` Xiaolong Ye
  2019-03-22 14:36     ` Maxime Coquelin
  2019-03-22 13:01   ` [PATCH v4 3/5] lib/mempool: allow page size aligned mempool Xiaolong Ye
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-22 13:01 UTC (permalink / raw)
  To: dev; +Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Xiaolong Ye

This allows applications to create an mbuf mempool with specific flags
such as MEMPOOL_F_NO_SPREAD if they want fixed-size memory objects.
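
For illustration, a minimal sketch of the intended usage; the pool
name, sizes and error handling below are made up for the example:

	#include <stdio.h>
	#include <rte_errno.h>
	#include <rte_mbuf.h>
	#include <rte_mempool.h>

	/* in the application's init path: fixed-size objects that the
	 * allocator must not spread across memory channels, e.g. to
	 * back an AF_XDP umem later in this series
	 */
	struct rte_mempool *mp = rte_pktmbuf_pool_create_with_flags(
			"fixed_pool", 4095, 250, 0,
			RTE_MBUF_DEFAULT_BUF_SIZE,
			MEMPOOL_F_NO_SPREAD, SOCKET_ID_ANY);
	if (mp == NULL)
		printf("pool creation failed: %s\n",
		       rte_strerror(rte_errno));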

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 lib/librte_mbuf/rte_mbuf.c | 29 +++++++++++++++++++-----
 lib/librte_mbuf/rte_mbuf.h | 45 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+), 5 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 21f6f7404..c1db9e298 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -106,11 +106,10 @@ rte_pktmbuf_init(struct rte_mempool *mp,
 	m->next = NULL;
 }
 
-/* Helper to create a mbuf pool with given mempool ops name*/
-struct rte_mempool *
-rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
+static struct rte_mempool *
+rte_pktmbuf_pool_create_by_ops_with_flags(const char *name, unsigned int n,
 	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
-	int socket_id, const char *ops_name)
+	unsigned int flags, int socket_id, const char *ops_name)
 {
 	struct rte_mempool *mp;
 	struct rte_pktmbuf_pool_private mbp_priv;
@@ -130,7 +129,7 @@ rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
 	mbp_priv.mbuf_priv_size = priv_size;
 
 	mp = rte_mempool_create_empty(name, n, elt_size, cache_size,
-		 sizeof(struct rte_pktmbuf_pool_private), socket_id, 0);
+		 sizeof(struct rte_pktmbuf_pool_private), socket_id, flags);
 	if (mp == NULL)
 		return NULL;
 
@@ -157,6 +156,16 @@ rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
 	return mp;
 }
 
+/* Helper to create a mbuf pool with given mempool ops name*/
+struct rte_mempool *
+rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
+	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
+	int socket_id, const char *ops_name)
+{
+	return rte_pktmbuf_pool_create_by_ops_with_flags(name, n, cache_size,
+			 priv_size, data_room_size, 0, socket_id, ops_name);
+}
+
 /* helper to create a mbuf pool */
 struct rte_mempool *
 rte_pktmbuf_pool_create(const char *name, unsigned int n,
@@ -167,6 +176,16 @@ rte_pktmbuf_pool_create(const char *name, unsigned int n,
 			data_room_size, socket_id, NULL);
 }
 
+/* helper to create a mbuf pool with flags (e.g. NO_SPREAD) */
+struct rte_mempool * __rte_experimental
+rte_pktmbuf_pool_create_with_flags(const char *name, unsigned int n,
+	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
+	unsigned int flags, int socket_id)
+{
+	return rte_pktmbuf_pool_create_by_ops_with_flags(name, n, cache_size,
+			priv_size, data_room_size, flags, socket_id, NULL);
+}
+
 /* do some sanity checks on a mbuf: panic if it fails */
 void
 rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header)
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index d961ccaf6..105ead6de 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1266,6 +1266,51 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
 	unsigned cache_size, uint16_t priv_size, uint16_t data_room_size,
 	int socket_id);
 
+/**
+ * Create a mbuf pool with flags.
+ *
+ * This function creates and initializes a packet mbuf pool. It is
+ * a wrapper to rte_mempool functions.
+ *
+ * @warning
+ * @b EXPERIMENTAL: This API may change without prior notice.
+ *
+ * @param name
+ *   The name of the mbuf pool.
+ * @param n
+ *   The number of elements in the mbuf pool. The optimum size (in terms
+ *   of memory usage) for a mempool is when n is a power of two minus one:
+ *   n = (2^q - 1).
+ * @param cache_size
+ *   Size of the per-core object cache. See rte_mempool_create() for
+ *   details.
+ * @param priv_size
+ *   Size of application private area between the rte_mbuf structure
+ *   and the data buffer. This value must be aligned to RTE_MBUF_PRIV_ALIGN.
+ * @param data_room_size
+ *   Size of data buffer in each mbuf, including RTE_PKTMBUF_HEADROOM.
+ * @param flags
+ *   Flags controlling the behavior of the mempool. See
+ *   rte_mempool_create() for details.
+ * @param socket_id
+ *   The socket identifier where the memory should be allocated. The
+ *   value can be *SOCKET_ID_ANY* if there is no NUMA constraint for the
+ *   reserved zone.
+ * @return
+ *   The pointer to the new allocated mempool, on success. NULL on error
+ *   with rte_errno set appropriately. Possible rte_errno values include:
+ *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
+ *    - E_RTE_SECONDARY - function was called from a secondary process instance
+ *    - EINVAL - cache size provided is too large, or priv_size is not aligned.
+ *    - ENOSPC - the maximum number of memzones has already been allocated
+ *    - EEXIST - a memzone with the same name already exists
+ *    - ENOMEM - no appropriate memory area found in which to create memzone
+ */
+struct rte_mempool * __rte_experimental
+rte_pktmbuf_pool_create_with_flags(const char *name, unsigned int n,
+	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
+	unsigned int flags, int socket_id);
+
 /**
  * Create a mbuf pool with a given mempool ops name
  *
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* [PATCH v4 3/5] lib/mempool: allow page size aligned mempool
  2019-03-22 13:01 ` [PATCH v4 0/5] Introduce AF_XDP PMD Xiaolong Ye
  2019-03-22 13:01   ` [PATCH v4 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
  2019-03-22 13:01   ` [PATCH v4 2/5] lib/mbuf: introduce helper to create mempool with flags Xiaolong Ye
@ 2019-03-22 13:01   ` Xiaolong Ye
  2019-03-22 13:01   ` [PATCH v4 4/5] net/af_xdp: use mbuf mempool for buffer management Xiaolong Ye
  2019-03-22 13:01   ` [PATCH v4 5/5] net/af_xdp: enable zero copy Xiaolong Ye
  4 siblings, 0 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-22 13:01 UTC (permalink / raw)
  To: dev; +Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Xiaolong Ye

Allow creating a mempool with a page size aligned base address.
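
A minimal sketch of how the new flag combines with the helper from the
previous patch to obtain a single page-aligned chunk suitable for a
umem (the pool name and sizes are illustrative):

	struct rte_mempool *mp = rte_pktmbuf_pool_create_with_flags(
			"umem_pool", 4096, 250, 0,
			2048 - 192 /* frame size minus mbuf overhead */,
			MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN,
			SOCKET_ID_ANY);
	/* with MEMPOOL_F_PAGE_ALIGN the base address of the chunk is
	 * page aligned, so it can be passed to xsk_umem__create()
	 */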

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 lib/librte_mempool/rte_mempool.c | 3 +++
 lib/librte_mempool/rte_mempool.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 683b216f9..171ba1057 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -543,6 +543,9 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 		if (try_contig)
 			flags |= RTE_MEMZONE_IOVA_CONTIG;
 
+		if (mp->flags & MEMPOOL_F_PAGE_ALIGN)
+			align = RTE_MAX(align, (size_t)getpagesize());
+
 		mz = rte_memzone_reserve_aligned(mz_name, mem_size,
 				mp->socket_id, flags, align);
 
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 7c9cd9a2f..75553b36f 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -264,6 +264,7 @@ struct rte_mempool {
 #define MEMPOOL_F_POOL_CREATED   0x0010 /**< Internal: pool is created. */
 #define MEMPOOL_F_NO_IOVA_CONTIG 0x0020 /**< Don't need IOVA contiguous objs. */
 #define MEMPOOL_F_NO_PHYS_CONTIG MEMPOOL_F_NO_IOVA_CONTIG /* deprecated */
+#define MEMPOOL_F_PAGE_ALIGN     0x0040 /**< Chunk's base address is page aligned */
 
 /**
  * @internal When debug is enabled, store some statistics.
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* [PATCH v4 4/5] net/af_xdp: use mbuf mempool for buffer management
  2019-03-22 13:01 ` [PATCH v4 0/5] Introduce AF_XDP PMD Xiaolong Ye
                     ` (2 preceding siblings ...)
  2019-03-22 13:01   ` [PATCH v4 3/5] lib/mempool: allow page size aligned mempool Xiaolong Ye
@ 2019-03-22 13:01   ` Xiaolong Ye
  2019-03-22 14:51     ` Maxime Coquelin
  2019-03-22 13:01   ` [PATCH v4 5/5] net/af_xdp: enable zero copy Xiaolong Ye
  4 siblings, 1 reply; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-22 13:01 UTC (permalink / raw)
  To: dev; +Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Xiaolong Ye

Now, af_xdp registered memory buffer is managed by rte_mempool. mbuf be
allocated from rte_mempool can be convert to xdp_desc's address and vice
versa.
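
As a worked example of the mapping (using the constants defined in
this patch: ETH_AF_XDP_FRAME_SIZE = 2048, ETH_AF_XDP_MBUF_OVERHEAD =
192, sizeof(struct rte_mbuf) = 128): a descriptor address of 2368 lies
in frame 1, so offset = 2368 / 2048 * 2048 = 2048, the mbuf header
sits at buffer + 2048 + 192 - 128 = buffer + 2112, and data_off is
restored to 2368 - 2048 - 192 = 128, i.e. RTE_PKTMBUF_HEADROOM.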

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 drivers/net/af_xdp/rte_eth_af_xdp.c | 117 +++++++++++++++++-----------
 1 file changed, 72 insertions(+), 45 deletions(-)

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 9f0012347..6b1bc462a 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -48,7 +48,11 @@ static int af_xdp_logtype;
 
 #define ETH_AF_XDP_FRAME_SIZE		XSK_UMEM__DEFAULT_FRAME_SIZE
 #define ETH_AF_XDP_NUM_BUFFERS		4096
-#define ETH_AF_XDP_DATA_HEADROOM	0
+/* mempool hdrobj size (64 bytes) + sizeof(struct rte_mbuf) (128 bytes) */
+#define ETH_AF_XDP_MBUF_OVERHEAD	192
+/* data start from offset 320 (192 + 128) bytes */
+#define ETH_AF_XDP_DATA_HEADROOM				\
+	(ETH_AF_XDP_MBUF_OVERHEAD + RTE_PKTMBUF_HEADROOM)
 #define ETH_AF_XDP_DFLT_NUM_DESCS	XSK_RING_CONS__DEFAULT_NUM_DESCS
 #define ETH_AF_XDP_DFLT_QUEUE_IDX	0
 
@@ -61,7 +65,7 @@ struct xsk_umem_info {
 	struct xsk_ring_prod fq;
 	struct xsk_ring_cons cq;
 	struct xsk_umem *umem;
-	struct rte_ring *buf_ring;
+	struct rte_mempool *mb_pool;
 	void *buffer;
 };
 
@@ -123,12 +127,32 @@ static struct rte_eth_link pmd_link = {
 	.link_autoneg = ETH_LINK_AUTONEG
 };
 
+static inline struct rte_mbuf *
+addr_to_mbuf(struct xsk_umem_info *umem, uint64_t addr)
+{
+	uint64_t offset = (addr / ETH_AF_XDP_FRAME_SIZE *
+			ETH_AF_XDP_FRAME_SIZE);
+	struct rte_mbuf *mbuf = (struct rte_mbuf *)((uint64_t)umem->buffer +
+				    offset + ETH_AF_XDP_MBUF_OVERHEAD -
+				    sizeof(struct rte_mbuf));
+	mbuf->data_off = addr - offset - ETH_AF_XDP_MBUF_OVERHEAD;
+	return mbuf;
+}
+
+static inline uint64_t
+mbuf_to_addr(struct xsk_umem_info *umem, struct rte_mbuf *mbuf)
+{
+	return (uint64_t)mbuf->buf_addr + mbuf->data_off -
+		(uint64_t)umem->buffer;
+}
+
 static inline int
 reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
 {
 	struct xsk_ring_prod *fq = &umem->fq;
+	struct rte_mbuf *mbuf;
 	uint32_t idx;
-	void *addr = NULL;
+	uint64_t addr;
 	int i, ret;
 
 	ret = xsk_ring_prod__reserve(fq, reserve_size, &idx);
@@ -139,12 +163,14 @@ reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
 
 	for (i = 0; i < reserve_size; i++) {
 		__u64 *fq_addr;
-		if (rte_ring_dequeue(umem->buf_ring, &addr)) {
+		mbuf = rte_pktmbuf_alloc(umem->mb_pool);
+		if (mbuf == NULL) {
 			i--;
 			break;
 		}
+		addr = mbuf_to_addr(umem, mbuf);
 		fq_addr = xsk_ring_prod__fill_addr(fq, idx++);
-		*fq_addr = (uint64_t)addr;
+		*fq_addr = addr;
 	}
 
 	xsk_ring_prod__submit(fq, i);
@@ -196,7 +222,7 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		rx_bytes += len;
 		bufs[count++] = mbufs[i];
 
-		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+		rte_pktmbuf_free(addr_to_mbuf(umem, addr));
 	}
 
 	xsk_ring_cons__release(rx, rcvd);
@@ -219,7 +245,7 @@ static void pull_umem_cq(struct xsk_umem_info *umem, int size)
 	for (i = 0; i < n; i++) {
 		uint64_t addr;
 		addr = *xsk_ring_cons__comp_addr(cq, idx_cq++);
-		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+		rte_pktmbuf_free(addr_to_mbuf(umem, addr));
 	}
 
 	xsk_ring_cons__release(cq, n);
@@ -248,7 +274,7 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	struct pkt_tx_queue *txq = queue;
 	struct xsk_umem_info *umem = txq->pair->umem;
 	struct rte_mbuf *mbuf;
-	void *addrs[ETH_AF_XDP_TX_BATCH_SIZE];
+	struct rte_mbuf *mbuf_to_tx;
 	unsigned long tx_bytes = 0;
 	int i, valid = 0;
 	uint32_t idx_tx;
@@ -257,11 +283,6 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 
 	pull_umem_cq(umem, nb_pkts);
 
-	nb_pkts = rte_ring_dequeue_bulk(umem->buf_ring, addrs,
-					nb_pkts, NULL);
-	if (nb_pkts == 0)
-		return 0;
-
 	if (xsk_ring_prod__reserve(&txq->tx, nb_pkts, &idx_tx) != nb_pkts) {
 		kick_tx(txq);
 		return 0;
@@ -275,7 +296,12 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i);
 		mbuf = bufs[i];
 		if (mbuf->pkt_len <= buf_len) {
-			desc->addr = (uint64_t)addrs[valid];
+			mbuf_to_tx = rte_pktmbuf_alloc(umem->mb_pool);
+			if (mbuf_to_tx == NULL) {
+				rte_pktmbuf_free(mbuf);
+				continue;
+			}
+			desc->addr = mbuf_to_addr(umem, mbuf_to_tx);
 			desc->len = mbuf->pkt_len;
 			pkt = xsk_umem__get_data(umem->buffer,
 						 desc->addr);
@@ -291,10 +317,6 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 
 	kick_tx(txq);
 
-	if (valid < nb_pkts)
-		rte_ring_enqueue_bulk(umem->buf_ring, &addrs[valid],
-				 nb_pkts - valid, NULL);
-
 	txq->stats.err_pkts += nb_pkts - valid;
 	txq->stats.tx_pkts += valid;
 	txq->stats.tx_bytes += tx_bytes;
@@ -443,16 +465,29 @@ eth_link_update(struct rte_eth_dev *dev __rte_unused,
 
 static void xdp_umem_destroy(struct xsk_umem_info *umem)
 {
-	free(umem->buffer);
-	umem->buffer = NULL;
-
-	rte_ring_free(umem->buf_ring);
-	umem->buf_ring = NULL;
+	rte_mempool_free(umem->mb_pool);
+	umem->mb_pool = NULL;
 
 	rte_free(umem);
 	umem = NULL;
 }
 
+static inline uint64_t get_base_addr(struct rte_mempool *mp)
+{
+	struct rte_mempool_memhdr *memhdr;
+
+	memhdr = STAILQ_FIRST(&mp->mem_list);
+	return (uint64_t)(memhdr->addr);
+}
+
+static inline uint64_t get_len(struct rte_mempool *mp)
+{
+	struct rte_mempool_memhdr *memhdr;
+
+	memhdr = STAILQ_FIRST(&mp->mem_list);
+	return (uint64_t)(memhdr->len);
+}
+
 static struct xsk_umem_info *xdp_umem_configure(void)
 {
 	struct xsk_umem_info *umem;
@@ -461,9 +496,8 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 		.comp_size = ETH_AF_XDP_DFLT_NUM_DESCS,
 		.frame_size = ETH_AF_XDP_FRAME_SIZE,
 		.frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
-	void *bufs = NULL;
+	void *base_addr = NULL;
 	int ret;
-	uint64_t i;
 
 	umem = rte_zmalloc_socket("umem", sizeof(*umem), 0, rte_socket_id());
 	if (umem == NULL) {
@@ -471,26 +505,20 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 		return NULL;
 	}
 
-	umem->buf_ring = rte_ring_create("af_xdp_ring",
-					 ETH_AF_XDP_NUM_BUFFERS,
-					 SOCKET_ID_ANY,
-					 0x0);
-	if (umem->buf_ring == NULL) {
-		AF_XDP_LOG(ERR, "Failed to create rte_ring\n");
+	umem->mb_pool = rte_pktmbuf_pool_create_with_flags("af_xdp_mempool",
+			ETH_AF_XDP_NUM_BUFFERS,
+			250, 0,
+			ETH_AF_XDP_FRAME_SIZE -
+			ETH_AF_XDP_MBUF_OVERHEAD,
+			MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN,
+			SOCKET_ID_ANY);
+	if (umem->mb_pool == NULL || umem->mb_pool->nb_mem_chunks != 1) {
+		AF_XDP_LOG(ERR, "Failed to create mempool\n");
 		goto err;
 	}
+	base_addr = (void *)get_base_addr(umem->mb_pool);
 
-	for (i = 0; i < ETH_AF_XDP_NUM_BUFFERS; i++)
-		rte_ring_enqueue(umem->buf_ring,
-				 (void *)(i * ETH_AF_XDP_FRAME_SIZE +
-					  ETH_AF_XDP_DATA_HEADROOM));
-
-	if (posix_memalign(&bufs, getpagesize(),
-			   ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE)) {
-		AF_XDP_LOG(ERR, "Failed to allocate memory pool.\n");
-		goto err;
-	}
-	ret = xsk_umem__create(&umem->umem, bufs,
+	ret = xsk_umem__create(&umem->umem, base_addr,
 			       ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
 			       &umem->fq, &umem->cq,
 			       &usr_config);
@@ -499,7 +527,7 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 		AF_XDP_LOG(ERR, "Failed to create umem");
 		goto err;
 	}
-	umem->buffer = bufs;
+	umem->buffer = base_addr;
 
 	return umem;
 
@@ -912,10 +940,9 @@ rte_pmd_af_xdp_remove(struct rte_vdev_device *dev)
 
 	internals = eth_dev->data->dev_private;
 
-	rte_ring_free(internals->umem->buf_ring);
-	rte_free(internals->umem->buffer);
 	rte_free(internals->umem);
 
+	rte_mempool_free(internals->umem->mb_pool);
 	rte_eth_dev_release_port(eth_dev);
 
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* [PATCH v4 5/5] net/af_xdp: enable zero copy
  2019-03-22 13:01 ` [PATCH v4 0/5] Introduce AF_XDP PMD Xiaolong Ye
                     ` (3 preceding siblings ...)
  2019-03-22 13:01   ` [PATCH v4 4/5] net/af_xdp: use mbuf mempool for buffer management Xiaolong Ye
@ 2019-03-22 13:01   ` Xiaolong Ye
  4 siblings, 0 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-22 13:01 UTC (permalink / raw)
  To: dev; +Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Xiaolong Ye

Check whether the external mempool (from rx_queue_setup) is suitable
for af_xdp; if it is, it will be registered to the af_xdp socket
directly and there will be no packet data copy on Rx or Tx.
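
For reference, a pool that passes the zero copy checks added below has
the same shape as the PMD's internal pool; a sketch (not part of the
patch), reusing the constants from this series:

	/* single memory chunk, page-aligned base address, cache-line
	 * sized header, and elt + header + trailer size a multiple of
	 * the 2048-byte XSK frame size
	 */
	struct rte_mempool *mp = rte_pktmbuf_pool_create_with_flags(
			"zc_pool", ETH_AF_XDP_NUM_BUFFERS, 250, 0,
			ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_MBUF_OVERHEAD,
			MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN,
			SOCKET_ID_ANY);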

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 drivers/net/af_xdp/rte_eth_af_xdp.c | 129 ++++++++++++++++++++--------
 1 file changed, 95 insertions(+), 34 deletions(-)

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 6b1bc462a..124d341d0 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -67,6 +67,7 @@ struct xsk_umem_info {
 	struct xsk_umem *umem;
 	struct rte_mempool *mb_pool;
 	void *buffer;
+	uint8_t zc;
 };
 
 struct rx_stats {
@@ -85,6 +86,7 @@ struct pkt_rx_queue {
 
 	struct pkt_tx_queue *pair;
 	uint16_t queue_idx;
+	uint8_t zc;
 };
 
 struct tx_stats {
@@ -202,7 +204,8 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	if (xsk_prod_nb_free(fq, free_thresh) >= free_thresh)
 		(void)reserve_fill_queue(umem, ETH_AF_XDP_RX_BATCH_SIZE);
 
-	if (rte_pktmbuf_alloc_bulk(rxq->mb_pool, mbufs, rcvd) != 0)
+	if (!rxq->zc &&
+		rte_pktmbuf_alloc_bulk(rxq->mb_pool, mbufs, rcvd) != 0)
 		return 0;
 
 	for (i = 0; i < rcvd; i++) {
@@ -216,13 +219,23 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		len = desc->len;
 		pkt = xsk_umem__get_data(rxq->umem->buffer, addr);
 
-		rte_memcpy(rte_pktmbuf_mtod(mbufs[i], void *), pkt, len);
-		rte_pktmbuf_pkt_len(mbufs[i]) = len;
-		rte_pktmbuf_data_len(mbufs[i]) = len;
-		rx_bytes += len;
-		bufs[count++] = mbufs[i];
-
-		rte_pktmbuf_free(addr_to_mbuf(umem, addr));
+		if (rxq->zc) {
+			struct rte_mbuf *mbuf;
+			mbuf = addr_to_mbuf(rxq->umem, addr);
+			rte_pktmbuf_pkt_len(mbuf) = len;
+			rte_pktmbuf_data_len(mbuf) = len;
+			rx_bytes += len;
+			bufs[count++] = mbuf;
+		} else {
+			rte_memcpy(rte_pktmbuf_mtod(mbufs[i], void *),
+					pkt, len);
+			rte_pktmbuf_pkt_len(mbufs[i]) = len;
+			rte_pktmbuf_data_len(mbufs[i]) = len;
+			rx_bytes += len;
+			bufs[count++] = mbufs[i];
+
+			rte_pktmbuf_free(addr_to_mbuf(umem, addr));
+		}
 	}
 
 	xsk_ring_cons__release(rx, rcvd);
@@ -295,22 +308,29 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 					- ETH_AF_XDP_DATA_HEADROOM;
 		desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i);
 		mbuf = bufs[i];
-		if (mbuf->pkt_len <= buf_len) {
-			mbuf_to_tx = rte_pktmbuf_alloc(umem->mb_pool);
-			if (mbuf_to_tx == NULL) {
-				rte_pktmbuf_free(mbuf);
-				continue;
-			}
-			desc->addr = mbuf_to_addr(umem, mbuf_to_tx);
+		if (txq->pair->zc && mbuf->pool == umem->mb_pool) {
+			desc->addr = mbuf_to_addr(umem, mbuf);
 			desc->len = mbuf->pkt_len;
-			pkt = xsk_umem__get_data(umem->buffer,
-						 desc->addr);
-			rte_memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
-			       desc->len);
 			valid++;
 			tx_bytes += mbuf->pkt_len;
+		} else {
+			if (mbuf->pkt_len <= buf_len) {
+				mbuf_to_tx = rte_pktmbuf_alloc(umem->mb_pool);
+				if (!mbuf_to_tx) {
+					rte_pktmbuf_free(mbuf);
+					continue;
+				}
+				desc->addr = mbuf_to_addr(umem, mbuf_to_tx);
+				desc->len = mbuf->pkt_len;
+				pkt = xsk_umem__get_data(umem->buffer,
+							 desc->addr);
+				memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
+				       desc->len);
+				valid++;
+				tx_bytes += mbuf->pkt_len;
+			}
+			rte_pktmbuf_free(mbuf);
 		}
-		rte_pktmbuf_free(mbuf);
 	}
 
 	xsk_ring_prod__submit(&txq->tx, nb_pkts);
@@ -488,7 +508,7 @@ static inline uint64_t get_len(struct rte_mempool *mp)
 	return (uint64_t)(memhdr->len);
 }
 
-static struct xsk_umem_info *xdp_umem_configure(void)
+static struct xsk_umem_info *xdp_umem_configure(struct rte_mempool *mb_pool)
 {
 	struct xsk_umem_info *umem;
 	struct xsk_umem_config usr_config = {
@@ -505,16 +525,23 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 		return NULL;
 	}
 
-	umem->mb_pool = rte_pktmbuf_pool_create_with_flags("af_xdp_mempool",
-			ETH_AF_XDP_NUM_BUFFERS,
-			250, 0,
-			ETH_AF_XDP_FRAME_SIZE -
-			ETH_AF_XDP_MBUF_OVERHEAD,
-			MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN,
-			SOCKET_ID_ANY);
-	if (umem->mb_pool == NULL || umem->mb_pool->nb_mem_chunks != 1) {
-		AF_XDP_LOG(ERR, "Failed to create mempool\n");
-		goto err;
+	if (!mb_pool) {
+		umem->mb_pool = rte_pktmbuf_pool_create_with_flags("af_xdp_mempool",
+				ETH_AF_XDP_NUM_BUFFERS,
+				250, 0,
+				ETH_AF_XDP_FRAME_SIZE -
+				ETH_AF_XDP_MBUF_OVERHEAD,
+				MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN,
+				SOCKET_ID_ANY);
+
+		if (umem->mb_pool == NULL ||
+				umem->mb_pool->nb_mem_chunks != 1) {
+			AF_XDP_LOG(ERR, "Failed to create mempool\n");
+			goto err;
+		}
+	} else {
+		umem->mb_pool = mb_pool;
+		umem->zc = 1;
 	}
 	base_addr = (void *)get_base_addr(umem->mb_pool);
 
@@ -536,16 +563,43 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 	return NULL;
 }
 
+static uint8_t
+check_mempool_zc(struct rte_mempool *mp)
+{
+	RTE_ASSERT(mp);
+
+	/* must be a single contiguous chunk */
+	if (mp->nb_mem_chunks > 1)
+		return 0;
+
+	/* check header size */
+	if (mp->header_size != RTE_CACHE_LINE_SIZE)
+		return 0;
+
+	/* check base address */
+	if ((uint64_t)get_base_addr(mp) % getpagesize() != 0)
+		return 0;
+
+	/* check chunk size */
+	if ((mp->elt_size + mp->header_size + mp->trailer_size) %
+	    ETH_AF_XDP_FRAME_SIZE != 0)
+		return 0;
+
+	return 1;
+}
+
 static int
 xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
-	      int ring_size)
+	      int ring_size, struct rte_mempool *mb_pool)
 {
 	struct xsk_socket_config cfg;
 	struct pkt_tx_queue *txq = rxq->pair;
+	struct rte_mempool *mp;
 	int ret = 0;
 	int reserve_size;
 
-	rxq->umem = xdp_umem_configure();
+	mp = check_mempool_zc(mb_pool) ? mb_pool : NULL;
+	rxq->umem = xdp_umem_configure(mp);
 	if (rxq->umem == NULL) {
 		ret = -ENOMEM;
 		goto err;
@@ -631,7 +685,7 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
 
 	rxq->mb_pool = mb_pool;
 
-	if (xsk_configure(internals, rxq, nb_rx_desc)) {
+	if (xsk_configure(internals, rxq, nb_rx_desc, mb_pool)) {
 		AF_XDP_LOG(ERR, "Failed to configure xdp socket\n");
 		ret = -EINVAL;
 		goto err;
@@ -639,6 +693,13 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
 
 	internals->umem = rxq->umem;
 
+	if (mb_pool == internals->umem->mb_pool)
+		rxq->zc = internals->umem->zc;
+
+	if (rxq->zc)
+		AF_XDP_LOG(INFO,
+			"zero copy enabled on rx queue %d\n", rx_queue_id);
+
 	dev->data->rx_queues[rx_queue_id] = rxq;
 	return 0;
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* Re: [PATCH v4 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-22 13:01   ` [PATCH v4 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
@ 2019-03-22 14:32     ` Maxime Coquelin
  2019-03-24  9:32       ` Ye Xiaolong
  2019-03-24 12:10     ` Luca Boccassi
  1 sibling, 1 reply; 214+ messages in thread
From: Maxime Coquelin @ 2019-03-22 14:32 UTC (permalink / raw)
  To: Xiaolong Ye, dev; +Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn



On 3/22/19 2:01 PM, Xiaolong Ye wrote:
> Add a new PMD driver for AF_XDP which is a proposed faster version of
> AF_PACKET interface in Linux. More info about AF_XDP, please refer to [1]
> [2].
> 
> This is the vanilla version PMD which just uses a raw buffer registered as
> the umem.
> 
> [1] https://fosdem.org/2018/schedule/event/af_xdp/
> [2] https://lwn.net/Articles/745934/
> 
> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
> ---
>   MAINTAINERS                                   |   6 +
>   config/common_base                            |   5 +
>   config/common_linux                           |   1 +
>   doc/guides/nics/af_xdp.rst                    |  45 +
>   doc/guides/nics/features/af_xdp.ini           |  11 +
>   doc/guides/nics/index.rst                     |   1 +
>   doc/guides/rel_notes/release_19_05.rst        |   7 +
>   drivers/net/Makefile                          |   1 +
>   drivers/net/af_xdp/Makefile                   |  32 +
>   drivers/net/af_xdp/meson.build                |  21 +
>   drivers/net/af_xdp/rte_eth_af_xdp.c           | 940 ++++++++++++++++++
>   drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   3 +
>   drivers/net/meson.build                       |   1 +
>   mk/rte.app.mk                                 |   1 +
>   14 files changed, 1075 insertions(+)
>   create mode 100644 doc/guides/nics/af_xdp.rst
>   create mode 100644 doc/guides/nics/features/af_xdp.ini
>   create mode 100644 drivers/net/af_xdp/Makefile
>   create mode 100644 drivers/net/af_xdp/meson.build
>   create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
>   create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map
> 

...

> diff --git a/config/common_base b/config/common_base
> index 0b09a9348..4044de205 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -416,6 +416,11 @@ CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX_FREE=n
>   #
>   CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
>   
> +#
> +# Compile software PMD backed by AF_XDP sockets (Linux only)
> +#
> +CONFIG_RTE_LIBRTE_PMD_AF_XDP=n
> +
>   #
>   # Compile link bonding PMD library
>   #
> diff --git a/config/common_linux b/config/common_linux
> index 75334273d..0b1249da0 100644
> --- a/config/common_linux
> +++ b/config/common_linux
> @@ -19,6 +19,7 @@ CONFIG_RTE_LIBRTE_VHOST_POSTCOPY=n
>   CONFIG_RTE_LIBRTE_PMD_VHOST=y
>   CONFIG_RTE_LIBRTE_IFC_PMD=y
>   CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
> +CONFIG_RTE_LIBRTE_PMD_AF_XDP=y

It has to be disabled by default as it requires headers from kernels
more recent than the minimum kernel version supported by DPDK.

>   CONFIG_RTE_LIBRTE_PMD_SOFTNIC=y
>   CONFIG_RTE_LIBRTE_PMD_TAP=y
>   CONFIG_RTE_LIBRTE_AVP_PMD=y

...

> diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
> new file mode 100644
> index 000000000..635e67483
> --- /dev/null
> +++ b/drivers/net/af_xdp/meson.build
> @@ -0,0 +1,21 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2018 Intel Corporation
> +
> +if host_machine.system() != 'linux'
> +	build = false
> +endif
> +
> +bpf_dep = dependency('libbpf', required: false)
> +if bpf_dep.found()
> +	build = true
> +else
> +	bpf_dep = cc.find_library('libbpf', required: false)
> +	if bpf_dep.found() and cc.has_header('xsk.h', dependencies: bpf_dep)
> +		build = true
> +		pkgconfig_extra_libs += '-lbpf'
> +	else
> +		build = false
> +	endif
> +endif

I think you need to add more checks, as above does not cover
linux/if_xdp.h header IIUC.
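
Something along these lines, perhaps (an untested sketch):

	has_if_xdp = cc.has_header('linux/if_xdp.h')
	if bpf_dep.found() and has_if_xdp and cc.has_header('xsk.h', dependencies: bpf_dep)
		build = true
		pkgconfig_extra_libs += '-lbpf'
	else
		build = false
	endif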

> +sources = files('rte_eth_af_xdp.c')
> +ext_deps += bpf_dep
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
> new file mode 100644
> index 000000000..9f0012347
> --- /dev/null
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -0,0 +1,940 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2019 Intel Corporation.
> + */
> +
> +#include <rte_mbuf.h>
> +#include <rte_ethdev_driver.h>
> +#include <rte_ethdev_vdev.h>
> +#include <rte_malloc.h>
> +#include <rte_kvargs.h>
> +#include <rte_bus_vdev.h>
> +#include <rte_string_fns.h>
> +
> +#include <linux/if_ether.h>
> +#include <linux/if_xdp.h>
> +#include <linux/if_link.h>
> +#include <asm/barrier.h>
> +#include <arpa/inet.h>
> +#include <net/if.h>
> +#include <sys/types.h>
> +#include <sys/socket.h>
> +#include <sys/ioctl.h>
> +#include <sys/mman.h>
> +#include <unistd.h>
> +#include <poll.h>
> +#include <bpf/bpf.h>
> +#include <xsk.h>
> +
> +#ifndef SOL_XDP
> +#define SOL_XDP 283
> +#endif
> +
> +#ifndef AF_XDP
> +#define AF_XDP 44
> +#endif
> +
> +#ifndef PF_XDP
> +#define PF_XDP AF_XDP
> +#endif
> +
> +static int af_xdp_logtype;
> +
> +#define AF_XDP_LOG(level, fmt, args...)			\
> +	rte_log(RTE_LOG_ ## level, af_xdp_logtype,	\
> +		"%s(): " fmt "\n", __func__, ##args)
> +
> +#define ETH_AF_XDP_IFACE_ARG			"iface"
> +#define ETH_AF_XDP_QUEUE_IDX_ARG		"queue"
> +
> +#define ETH_AF_XDP_FRAME_SIZE		XSK_UMEM__DEFAULT_FRAME_SIZE
> +#define ETH_AF_XDP_NUM_BUFFERS		4096
> +#define ETH_AF_XDP_DATA_HEADROOM	0
> +#define ETH_AF_XDP_DFLT_NUM_DESCS	XSK_RING_CONS__DEFAULT_NUM_DESCS
> +#define ETH_AF_XDP_DFLT_QUEUE_IDX	0
> +
> +#define ETH_AF_XDP_RX_BATCH_SIZE	32
> +#define ETH_AF_XDP_TX_BATCH_SIZE	32
> +
> +#define ETH_AF_XDP_MAX_QUEUE_PAIRS     16
> +
> +struct xsk_umem_info {
> +	struct xsk_ring_prod fq;
> +	struct xsk_ring_cons cq;
> +	struct xsk_umem *umem;
> +	struct rte_ring *buf_ring;
> +	void *buffer;
> +};
> +
> +struct rx_stats {
> +	uint64_t rx_pkts;
> +	uint64_t rx_bytes;
> +	uint64_t rx_dropped;
> +};
> +
> +struct pkt_rx_queue {
> +	struct xsk_ring_cons rx;
> +	struct xsk_umem_info *umem;
> +	struct xsk_socket *xsk;
> +	struct rte_mempool *mb_pool;
> +
> +	struct rx_stats stats;
> +
> +	struct pkt_tx_queue *pair;
> +	uint16_t queue_idx;
> +};
> +
> +struct tx_stats {
> +	uint64_t tx_pkts;
> +	uint64_t err_pkts;
> +	uint64_t tx_bytes;
> +};
> +
> +struct pkt_tx_queue {
> +	struct xsk_ring_prod tx;
> +
> +	struct tx_stats stats;
> +
> +	struct pkt_rx_queue *pair;
> +	uint16_t queue_idx;
> +};
> +
> +struct pmd_internals {
> +	int if_index;
> +	char if_name[IFNAMSIZ];
> +	uint16_t queue_idx;
> +	struct ether_addr eth_addr;
> +	struct xsk_umem_info *umem;
> +	struct rte_mempool *mb_pool_share;
> +
> +	struct pkt_rx_queue rx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
> +	struct pkt_tx_queue tx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
> +};
> +
> +static const char * const valid_arguments[] = {
> +	ETH_AF_XDP_IFACE_ARG,
> +	ETH_AF_XDP_QUEUE_IDX_ARG,
> +	NULL
> +};
> +
> +static struct rte_eth_link pmd_link = {
> +	.link_speed = ETH_SPEED_NUM_10G,
> +	.link_duplex = ETH_LINK_FULL_DUPLEX,
> +	.link_status = ETH_LINK_DOWN,
> +	.link_autoneg = ETH_LINK_AUTONEG
> +};
> +
> +static inline int
> +reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
> +{
> +	struct xsk_ring_prod *fq = &umem->fq;
> +	uint32_t idx;
> +	void *addr = NULL;
> +	int i, ret;
> +
> +	ret = xsk_ring_prod__reserve(fq, reserve_size, &idx);
> +	if (!ret) {

You could use unlikely() here as it is called in the hot path.
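
i.e. something like:

	if (unlikely(!ret)) {
		AF_XDP_LOG(ERR, "Failed to reserve enough fq descs.\n");
		return ret;
	}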

> +		AF_XDP_LOG(ERR, "Failed to reserve enough fq descs.\n");
> +		return ret;
> +	}
> +
> +	for (i = 0; i < reserve_size; i++) {
> +		__u64 *fq_addr;

For consistency, you could either declare addr here, or fq_addr at the
top of the function.

> +		if (rte_ring_dequeue(umem->buf_ring, &addr)) {
> +			i--;
> +			break;
> +		}
> +		fq_addr = xsk_ring_prod__fill_addr(fq, idx++);
> +		*fq_addr = (uint64_t)addr;
> +	}
> +
> +	xsk_ring_prod__submit(fq, i);
> +
> +	return 0;
> +}
> +
> +static uint16_t
> +eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
> +{
> +	struct pkt_rx_queue *rxq = queue;
> +	struct xsk_ring_cons *rx = &rxq->rx;
> +	struct xsk_umem_info *umem = rxq->umem;
> +	struct xsk_ring_prod *fq = &umem->fq;
> +	uint32_t idx_rx;
> +	uint32_t free_thresh = fq->size >> 1;
> +	struct rte_mbuf *mbufs[ETH_AF_XDP_TX_BATCH_SIZE];
> +	unsigned long dropped = 0;
> +	unsigned long rx_bytes = 0;
> +	uint16_t count = 0;
> +	int rcvd, i;
> +
> +	nb_pkts = RTE_MIN(nb_pkts, ETH_AF_XDP_TX_BATCH_SIZE);
> +
> +	rcvd = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
> +	if (rcvd == 0)
> +		return 0;
> +
> +	if (xsk_prod_nb_free(fq, free_thresh) >= free_thresh)
> +		(void)reserve_fill_queue(umem, ETH_AF_XDP_RX_BATCH_SIZE);
> +
> +	if (rte_pktmbuf_alloc_bulk(rxq->mb_pool, mbufs, rcvd) != 0)
unlikely()?
> +		return 0;
> +
> +	for (i = 0; i < rcvd; i++) {
> +		const struct xdp_desc *desc;
> +		uint64_t addr;
> +		uint32_t len;
> +		void *pkt;
> +
> +		desc = xsk_ring_cons__rx_desc(rx, idx_rx++);
> +		addr = desc->addr;
> +		len = desc->len;
> +		pkt = xsk_umem__get_data(rxq->umem->buffer, addr);
> +
> +		rte_memcpy(rte_pktmbuf_mtod(mbufs[i], void *), pkt, len);
> +		rte_pktmbuf_pkt_len(mbufs[i]) = len;
> +		rte_pktmbuf_data_len(mbufs[i]) = len;
> +		rx_bytes += len;
> +		bufs[count++] = mbufs[i];
> +
> +		rte_ring_enqueue(umem->buf_ring, (void *)addr);
> +	}
> +
> +	xsk_ring_cons__release(rx, rcvd);
> +
> +	/* statistics */
> +	rxq->stats.rx_pkts += (rcvd - dropped);
> +	rxq->stats.rx_bytes += rx_bytes;
> +
> +	return count;
> +}
> +

...

> +
> +/* This function gets called when the current port gets stopped. */
> +static void
> +eth_dev_stop(struct rte_eth_dev *dev)
> +{
> +	dev->data->dev_link.link_status = ETH_LINK_DOWN;
> +}
> +
> +static int
> +eth_dev_configure(struct rte_eth_dev *dev __rte_unused)

Remove __rte_unused.

> +{
> +	/* rx/tx must be paired */
> +	if (dev->data->nb_rx_queues != dev->data->nb_tx_queues)
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +

...

> +
> +static int
> +xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
> +	      int ring_size)
> +{
> +	struct xsk_socket_config cfg;
> +	struct pkt_tx_queue *txq = rxq->pair;
> +	int ret = 0;
> +	int reserve_size;
> +
> +	rxq->umem = xdp_umem_configure();
> +	if (rxq->umem == NULL) {
> +		ret = -ENOMEM;
> +		goto err;
You can return directly here as umem == NULL.

> +	}
> +
> +	cfg.rx_size = ring_size;
> +	cfg.tx_size = ring_size;
> +	cfg.libbpf_flags = 0;
> +	cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
> +	cfg.bind_flags = 0;
> +	ret = xsk_socket__create(&rxq->xsk, internals->if_name,
> +			internals->queue_idx, rxq->umem->umem, &rxq->rx,
> +			&txq->tx, &cfg);
> +	if (ret) {
> +		AF_XDP_LOG(ERR, "Failed to create xsk socket.\n");
> +		goto err;
> +	}
> +
> +	reserve_size = ETH_AF_XDP_DFLT_NUM_DESCS / 2;
> +	ret = reserve_fill_queue(rxq->umem, reserve_size);
> +	if (ret) {
> +		AF_XDP_LOG(ERR, "Failed to reserve fill queue.\n");
> +		goto err;

Shouldn't you call xsk_socket__delete(rxq->xsk) here?

> +	}
> +
> +	return 0;
> +
> +err:
> +	xdp_umem_destroy(rxq->umem);
> +
> +	return ret;
> +}
> +
> +static void
> +queue_reset(struct pmd_internals *internals, uint16_t queue_idx)
> +{
> +	struct pkt_rx_queue *rxq = &internals->rx_queues[queue_idx];
> +	struct pkt_tx_queue *txq = rxq->pair;
> +	int xsk_fd = xsk_socket__fd(rxq->xsk);
> +
> +	if (xsk_fd) {
> +		close(xsk_fd);
> +		if (internals->umem != NULL) {

Moving this condition out would work and be cleaner.

Anyway, it seems to never enter this condition as internals->umem is not
set yet when queue_reset() is called.
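
A sketch of the suggested restructuring (assuming rxq->xsk stays NULL
until xsk_configure() has succeeded):

	if (rxq->xsk != NULL)
		close(xsk_socket__fd(rxq->xsk));

	if (internals->umem != NULL) {
		xdp_umem_destroy(internals->umem);
		internals->umem = NULL;
	}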

> +			xdp_umem_destroy(internals->umem);
> +			internals->umem = NULL;
> +		}
> +	}
> +	memset(rxq, 0, sizeof(*rxq));
> +	memset(txq, 0, sizeof(*txq));
> +	rxq->pair = txq;
> +	txq->pair = rxq;
> +	rxq->queue_idx = queue_idx;
> +	txq->queue_idx = queue_idx;
> +}
> +
> +static int
> +eth_rx_queue_setup(struct rte_eth_dev *dev,
> +		   uint16_t rx_queue_id,
> +		   uint16_t nb_rx_desc,
> +		   unsigned int socket_id __rte_unused,
> +		   const struct rte_eth_rxconf *rx_conf __rte_unused,
> +		   struct rte_mempool *mb_pool)
> +{
> +	struct pmd_internals *internals = dev->data->dev_private;
> +	uint32_t buf_size, data_size;
> +	struct pkt_rx_queue *rxq;
> +	int ret;
> +
> +	rxq = &internals->rx_queues[rx_queue_id];
> +	queue_reset(internals, rx_queue_id);
> +
> +	/* Now get the space available for data in the mbuf */
> +	buf_size = rte_pktmbuf_data_room_size(mb_pool) -
> +		RTE_PKTMBUF_HEADROOM;
> +	data_size = ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_DATA_HEADROOM;
> +
> +	if (data_size > buf_size) {
> +		AF_XDP_LOG(ERR, "%s: %d bytes will not fit in mbuf (%d bytes)\n",
> +			dev->device->name, data_size, buf_size);
> +		ret = -ENOMEM;
> +		goto err;
> +	}
> +
> +	rxq->mb_pool = mb_pool;
> +
> +	if (xsk_configure(internals, rxq, nb_rx_desc)) {
> +		AF_XDP_LOG(ERR, "Failed to configure xdp socket\n");
> +		ret = -EINVAL;
> +		goto err;
> +	}
> +
> +	internals->umem = rxq->umem;

If my previous comment is wrong, i.e. internals->umem may be already set
when queue_reset() is called, then it means you might have a leak here.

> +
> +	dev->data->rx_queues[rx_queue_id] = rxq;
> +	return 0;
> +
> +err:
> +	queue_reset(internals, rx_queue_id);
> +	return ret;
> +}
> +

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v4 2/5] lib/mbuf: introduce helper to create mempool with flags
  2019-03-22 13:01   ` [PATCH v4 2/5] lib/mbuf: introduce helper to create mempool with flags Xiaolong Ye
@ 2019-03-22 14:36     ` Maxime Coquelin
  2019-03-24  9:08       ` Ye Xiaolong
  0 siblings, 1 reply; 214+ messages in thread
From: Maxime Coquelin @ 2019-03-22 14:36 UTC (permalink / raw)
  To: Xiaolong Ye, dev; +Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn



On 3/22/19 2:01 PM, Xiaolong Ye wrote:
> This allows applications to create mbuf mempool with specific flags
> such as MEMPOOL_F_NO_SPREAD if they want fixed size memory objects.
> 
> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
> ---
>   lib/librte_mbuf/rte_mbuf.c | 29 +++++++++++++++++++-----
>   lib/librte_mbuf/rte_mbuf.h | 45 ++++++++++++++++++++++++++++++++++++++
>   2 files changed, 69 insertions(+), 5 deletions(-)
> 

> +/**
> + * Create a mbuf pool with flags.
> + *
> + * This function creates and initializes a packet mbuf pool. It is
> + * a wrapper to rte_mempool functions.
> + *
> + * @warning
> + * @b EXPERIMENTAL: This API may change without prior notice.
> + *
> + * @param name
> + *   The name of the mbuf pool.
> + * @param n
> + *   The number of elements in the mbuf pool. The optimum size (in terms
> + *   of memory usage) for a mempool is when n is a power of two minus one:
> + *   n = (2^q - 1).
> + * @param cache_size
> + *   Size of the per-core object cache. See rte_mempool_create() for
> + *   details.
> + * @param priv_size
> + *   Size of application private are between the rte_mbuf structure
> + *   and the data buffer. This value must be aligned to RTE_MBUF_PRIV_ALIGN.
> + * @param data_room_size
> + *   Size of data buffer in each mbuf, including RTE_PKTMBUF_HEADROOM.
> + * @param flags
> + *   Flags controlling the behavior of the mempool. See
> + *   rte_mempool_create() for details.
> + * @param socket_id
> + *   The socket identifier where the memory should be allocated. The
> + *   value can be *SOCKET_ID_ANY* if there is no NUMA constraint for the
> + *   reserved zone.
> + * @return
> + *   The pointer to the new allocated mempool, on success. NULL on error
> + *   with rte_errno set appropriately. Possible rte_errno values include:
> + *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
> + *    - E_RTE_SECONDARY - function was called from a secondary process instance
> + *    - EINVAL - cache size provided is too large, or priv_size is not aligned.
> + *    - ENOSPC - the maximum number of memzones has already been allocated
> + *    - EEXIST - a memzone with the same name already exists
> + *    - ENOMEM - no appropriate memory area found in which to create memzone
> + */
> +struct rte_mempool * __rte_experimental
> +rte_pktmbuf_pool_create_with_flags(const char *name, unsigned int n,
> +	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
> +	unsigned int flags, int socket_id);
> +
>   /**
>    * Create a mbuf pool with a given mempool ops name
>    *
> 

You need to add it to rte_mbuf_version.map too.
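
Presumably an entry along these lines, assuming the usual EXPERIMENTAL
block used for new experimental symbols:

	EXPERIMENTAL {
		global:

		rte_pktmbuf_pool_create_with_flags;
	};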

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v4 4/5] net/af_xdp: use mbuf mempool for buffer management
  2019-03-22 13:01   ` [PATCH v4 4/5] net/af_xdp: use mbuf mempool for buffer management Xiaolong Ye
@ 2019-03-22 14:51     ` Maxime Coquelin
  2019-03-24  9:08       ` Ye Xiaolong
  0 siblings, 1 reply; 214+ messages in thread
From: Maxime Coquelin @ 2019-03-22 14:51 UTC (permalink / raw)
  To: Xiaolong Ye, dev; +Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn



On 3/22/19 2:01 PM, Xiaolong Ye wrote:
> Now, af_xdp registered memory buffer is managed by rte_mempool. mbuf be
s/mbuf be allocated/mbuf allocated/
> allocated from rte_mempool can be convert to xdp_desc's address and vice
s/convert/converted/
> versa.
> 
> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
> ---
>   drivers/net/af_xdp/rte_eth_af_xdp.c | 117 +++++++++++++++++-----------
>   1 file changed, 72 insertions(+), 45 deletions(-)
> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 9f0012347..6b1bc462a 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -48,7 +48,11 @@ static int af_xdp_logtype;
>   
>   #define ETH_AF_XDP_FRAME_SIZE		XSK_UMEM__DEFAULT_FRAME_SIZE
>   #define ETH_AF_XDP_NUM_BUFFERS		4096
> -#define ETH_AF_XDP_DATA_HEADROOM	0
> +/* mempool hdrobj size (64 bytes) + sizeof(struct rte_mbuf) (128 bytes) */
> +#define ETH_AF_XDP_MBUF_OVERHEAD	192
> +/* data start from offset 320 (192 + 128) bytes */
> +#define ETH_AF_XDP_DATA_HEADROOM				\
> +	(ETH_AF_XDP_MBUF_OVERHEAD + RTE_PKTMBUF_HEADROOM)
>   #define ETH_AF_XDP_DFLT_NUM_DESCS	XSK_RING_CONS__DEFAULT_NUM_DESCS
>   #define ETH_AF_XDP_DFLT_QUEUE_IDX	0
>   
> @@ -61,7 +65,7 @@ struct xsk_umem_info {
>   	struct xsk_ring_prod fq;
>   	struct xsk_ring_cons cq;
>   	struct xsk_umem *umem;
> -	struct rte_ring *buf_ring;
> +	struct rte_mempool *mb_pool;
>   	void *buffer;
>   };
>   
> @@ -123,12 +127,32 @@ static struct rte_eth_link pmd_link = {
>   	.link_autoneg = ETH_LINK_AUTONEG
>   };
>   
> +static inline struct rte_mbuf *
> +addr_to_mbuf(struct xsk_umem_info *umem, uint64_t addr)
> +{
> +	uint64_t offset = (addr / ETH_AF_XDP_FRAME_SIZE *
> +			ETH_AF_XDP_FRAME_SIZE);
> +	struct rte_mbuf *mbuf = (struct rte_mbuf *)((uint64_t)umem->buffer +
> +				    offset + ETH_AF_XDP_MBUF_OVERHEAD -
> +				    sizeof(struct rte_mbuf));
> +	mbuf->data_off = addr - offset - ETH_AF_XDP_MBUF_OVERHEAD;
> +	return mbuf;
> +}
> +
> +static inline uint64_t
> +mbuf_to_addr(struct xsk_umem_info *umem, struct rte_mbuf *mbuf)
> +{
> +	return (uint64_t)mbuf->buf_addr + mbuf->data_off -
> +		(uint64_t)umem->buffer;
> +}
> +
>   static inline int
>   reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
>   {
>   	struct xsk_ring_prod *fq = &umem->fq;
> +	struct rte_mbuf *mbuf;
>   	uint32_t idx;
> -	void *addr = NULL;
> +	uint64_t addr;
>   	int i, ret;
>   
>   	ret = xsk_ring_prod__reserve(fq, reserve_size, &idx);
> @@ -139,12 +163,14 @@ reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
>   
>   	for (i = 0; i < reserve_size; i++) {
>   		__u64 *fq_addr;
> -		if (rte_ring_dequeue(umem->buf_ring, &addr)) {
> +		mbuf = rte_pktmbuf_alloc(umem->mb_pool);
> +		if (mbuf == NULL) {

unlikely()

>   			i--;
>   			break;
>   		}
> +		addr = mbuf_to_addr(umem, mbuf);
>   		fq_addr = xsk_ring_prod__fill_addr(fq, idx++);
> -		*fq_addr = (uint64_t)addr;
> +		*fq_addr = addr;
>   	}
>   
>   	xsk_ring_prod__submit(fq, i);
> @@ -196,7 +222,7 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>   		rx_bytes += len;
>   		bufs[count++] = mbufs[i];
>   
> -		rte_ring_enqueue(umem->buf_ring, (void *)addr);
> +		rte_pktmbuf_free(addr_to_mbuf(umem, addr));
>   	}
>   
>   	xsk_ring_cons__release(rx, rcvd);
> @@ -219,7 +245,7 @@ static void pull_umem_cq(struct xsk_umem_info *umem, int size)
>   	for (i = 0; i < n; i++) {
>   		uint64_t addr;
>   		addr = *xsk_ring_cons__comp_addr(cq, idx_cq++);
> -		rte_ring_enqueue(umem->buf_ring, (void *)addr);
> +		rte_pktmbuf_free(addr_to_mbuf(umem, addr));
>   	}
>   
>   	xsk_ring_cons__release(cq, n);
> @@ -248,7 +274,7 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>   	struct pkt_tx_queue *txq = queue;
>   	struct xsk_umem_info *umem = txq->pair->umem;
>   	struct rte_mbuf *mbuf;
> -	void *addrs[ETH_AF_XDP_TX_BATCH_SIZE];
> +	struct rte_mbuf *mbuf_to_tx;
>   	unsigned long tx_bytes = 0;
>   	int i, valid = 0;
>   	uint32_t idx_tx;
> @@ -257,11 +283,6 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>   
>   	pull_umem_cq(umem, nb_pkts);
>   
> -	nb_pkts = rte_ring_dequeue_bulk(umem->buf_ring, addrs,
> -					nb_pkts, NULL);
> -	if (nb_pkts == 0)
> -		return 0;
> -
>   	if (xsk_ring_prod__reserve(&txq->tx, nb_pkts, &idx_tx) != nb_pkts) {
>   		kick_tx(txq);
>   		return 0;
> @@ -275,7 +296,12 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>   		desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i);
>   		mbuf = bufs[i];
>   		if (mbuf->pkt_len <= buf_len) {
> -			desc->addr = (uint64_t)addrs[valid];
> +			mbuf_to_tx = rte_pktmbuf_alloc(umem->mb_pool);
> +			if (mbuf_to_tx == NULL) {

unlikely()

> +				rte_pktmbuf_free(mbuf);
> +				continue;
> +			}
> +			desc->addr = mbuf_to_addr(umem, mbuf_to_tx);
>   			desc->len = mbuf->pkt_len;
>   			pkt = xsk_umem__get_data(umem->buffer,
>   						 desc->addr);
> @@ -291,10 +317,6 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>   
>   	kick_tx(txq);
>   
> -	if (valid < nb_pkts)
> -		rte_ring_enqueue_bulk(umem->buf_ring, &addrs[valid],
> -				 nb_pkts - valid, NULL);
> -
>   	txq->stats.err_pkts += nb_pkts - valid;
>   	txq->stats.tx_pkts += valid;
>   	txq->stats.tx_bytes += tx_bytes;
> @@ -443,16 +465,29 @@ eth_link_update(struct rte_eth_dev *dev __rte_unused,
>   
>   static void xdp_umem_destroy(struct xsk_umem_info *umem)
>   {
> -	free(umem->buffer);
> -	umem->buffer = NULL;
> -
> -	rte_ring_free(umem->buf_ring);
> -	umem->buf_ring = NULL;
> +	rte_mempool_free(umem->mb_pool);
> +	umem->mb_pool = NULL;
>   
>   	rte_free(umem);
>   	umem = NULL;
>   }
>   
> +static inline uint64_t get_base_addr(struct rte_mempool *mp)
> +{
> +	struct rte_mempool_memhdr *memhdr;
> +
> +	memhdr = STAILQ_FIRST(&mp->mem_list);
> +	return (uint64_t)(memhdr->addr);
> +}
> +
> +static inline uint64_t get_len(struct rte_mempool *mp)
> +{
> +	struct rte_mempool_memhdr *memhdr;
> +
> +	memhdr = STAILQ_FIRST(&mp->mem_list);
> +	return (uint64_t)(memhdr->len);
> +}
> +
>   static struct xsk_umem_info *xdp_umem_configure(void)
>   {
>   	struct xsk_umem_info *umem;
> @@ -461,9 +496,8 @@ static struct xsk_umem_info *xdp_umem_configure(void)
>   		.comp_size = ETH_AF_XDP_DFLT_NUM_DESCS,
>   		.frame_size = ETH_AF_XDP_FRAME_SIZE,
>   		.frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
> -	void *bufs = NULL;
> +	void *base_addr = NULL;
>   	int ret;
> -	uint64_t i;
>   
>   	umem = rte_zmalloc_socket("umem", sizeof(*umem), 0, rte_socket_id());
>   	if (umem == NULL) {
> @@ -471,26 +505,20 @@ static struct xsk_umem_info *xdp_umem_configure(void)
>   		return NULL;
>   	}
>   
> -	umem->buf_ring = rte_ring_create("af_xdp_ring",
> -					 ETH_AF_XDP_NUM_BUFFERS,
> -					 SOCKET_ID_ANY,
> -					 0x0);
> -	if (umem->buf_ring == NULL) {
> -		AF_XDP_LOG(ERR, "Failed to create rte_ring\n");
> +	umem->mb_pool = rte_pktmbuf_pool_create_with_flags("af_xdp_mempool",
> +			ETH_AF_XDP_NUM_BUFFERS,
> +			250, 0,
> +			ETH_AF_XDP_FRAME_SIZE -
> +			ETH_AF_XDP_MBUF_OVERHEAD,
> +			MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN,
> +			SOCKET_ID_ANY);
> +	if (umem->mb_pool == NULL || umem->mb_pool->nb_mem_chunks != 1) {
> +		AF_XDP_LOG(ERR, "Failed to create mempool\n");
>   		goto err;
>   	}
> +	base_addr = (void *)get_base_addr(umem->mb_pool);
>   
> -	for (i = 0; i < ETH_AF_XDP_NUM_BUFFERS; i++)
> -		rte_ring_enqueue(umem->buf_ring,
> -				 (void *)(i * ETH_AF_XDP_FRAME_SIZE +
> -					  ETH_AF_XDP_DATA_HEADROOM));
> -
> -	if (posix_memalign(&bufs, getpagesize(),
> -			   ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE)) {
> -		AF_XDP_LOG(ERR, "Failed to allocate memory pool.\n");
> -		goto err;
> -	}
> -	ret = xsk_umem__create(&umem->umem, bufs,
> +	ret = xsk_umem__create(&umem->umem, base_addr,
>   			       ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
>   			       &umem->fq, &umem->cq,
>   			       &usr_config);
> @@ -499,7 +527,7 @@ static struct xsk_umem_info *xdp_umem_configure(void)
>   		AF_XDP_LOG(ERR, "Failed to create umem");
>   		goto err;

You need to destroy mb_pool if xsk_umem__create() fails.

>   	}
> -	umem->buffer = bufs;
> +	umem->buffer = base_addr;
>   
>   	return umem;
>   
> @@ -912,10 +940,9 @@ rte_pmd_af_xdp_remove(struct rte_vdev_device *dev)
>   
>   	internals = eth_dev->data->dev_private;
>   
> -	rte_ring_free(internals->umem->buf_ring);
> -	rte_free(internals->umem->buffer);
>   	rte_free(internals->umem);
>   
> +	rte_mempool_free(internals->umem->mb_pool);
>   	rte_eth_dev_release_port(eth_dev);
>   
>   
> 

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-22  2:01       ` Ye Xiaolong
@ 2019-03-22 15:37         ` Stephen Hemminger
  2019-03-22 23:19           ` Ye Xiaolong
  0 siblings, 1 reply; 214+ messages in thread
From: Stephen Hemminger @ 2019-03-22 15:37 UTC (permalink / raw)
  To: Ye Xiaolong; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

On Fri, 22 Mar 2019 10:01:57 +0800
Ye Xiaolong <xiaolong.ye@intel.com> wrote:

> On 03/21, Stephen Hemminger wrote:
> >On Thu, 21 Mar 2019 17:18:41 +0800
> >Xiaolong Ye <xiaolong.ye@intel.com> wrote:
> >  
> >> +
> >> +	if (ret < 0)
> >> +		return -EINVAL;
> >> +
> >> +	return 0;  
> >
> >You could propogate kernel errno into DPDK?
> >	return (ret < 0) ? -errno : 0;
> >  
> 
> Sorry, could you share the advantage of doing this?
> 
> Thanks,
> Xiaolong

Suppose the kernel returned -ENOTSUPP or some other error; it could go back
to the caller rather than just -EINVAL.

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-22  2:15       ` Ye Xiaolong
@ 2019-03-22 15:38         ` Stephen Hemminger
  2019-03-22 23:20           ` Ye Xiaolong
  0 siblings, 1 reply; 214+ messages in thread
From: Stephen Hemminger @ 2019-03-22 15:38 UTC (permalink / raw)
  To: Ye Xiaolong; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

On Fri, 22 Mar 2019 10:15:23 +0800
Ye Xiaolong <xiaolong.ye@intel.com> wrote:

> On 03/21, Stephen Hemminger wrote:
> >On Thu, 21 Mar 2019 17:18:41 +0800
> >Xiaolong Ye <xiaolong.ye@intel.com> wrote:
> >  
> >> +		if (ret != 0) {
> >> +			RTE_LOG(ERR, AF_XDP, "getsockopt() failed for XDP_STATISTICS.\n");
> >> +			return -1;  
> >
> >You need to use the new dynamic log types and not have a global logtype.  
> 
> You mean for all the logs in this driver, right? Is it because the global logtype
> will be deprecated?

Global log types should not be used or added by any new code.
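
For reference, the dynamic-logtype pattern used across DPDK drivers looks
roughly like this (the logtype name is illustrative):

	static int af_xdp_logtype;

	RTE_INIT(af_xdp_init_log)
	{
		af_xdp_logtype = rte_log_register("pmd.net.af_xdp");
		if (af_xdp_logtype >= 0)
			rte_log_set_level(af_xdp_logtype, RTE_LOG_NOTICE);
	}

Log calls then pass the registered type through rte_log(), as the
AF_XDP_LOG() macro in later revisions of this patch does.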

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-22 15:37         ` Stephen Hemminger
@ 2019-03-22 23:19           ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-22 23:19 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

On 03/22, Stephen Hemminger wrote:
>On Fri, 22 Mar 2019 10:01:57 +0800
>Ye Xiaolong <xiaolong.ye@intel.com> wrote:
>
>> On 03/21, Stephen Hemminger wrote:
>> >On Thu, 21 Mar 2019 17:18:41 +0800
>> >Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>> >  
>> >> +
>> >> +	if (ret < 0)
>> >> +		return -EINVAL;
>> >> +
>> >> +	return 0;  
>> >
>> >You could propagate kernel errno into DPDK?
>> >	return (ret < 0) ? -errno : 0;
>> >  
>> 
>> Sorry, could you share the advantage of doing this?
>> 
>> Thanks,
>> Xiaolong
>
>Suppose the kernel returned -ENOTSUPP or another error; it could go back to
>the caller rather than just a generic -EINVAL.

Got it.

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-22 15:38         ` Stephen Hemminger
@ 2019-03-22 23:20           ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-22 23:20 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

On 03/22, Stephen Hemminger wrote:
>On Fri, 22 Mar 2019 10:15:23 +0800
>Ye Xiaolong <xiaolong.ye@intel.com> wrote:
>
>> On 03/21, Stephen Hemminger wrote:
>> >On Thu, 21 Mar 2019 17:18:41 +0800
>> >Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>> >  
>> >> +		if (ret != 0) {
>> >> +			RTE_LOG(ERR, AF_XDP, "getsockopt() failed for XDP_STATISTICS.\n");
>> >> +			return -1;  
>> >
>> >You need to use the new dynamic log types and not have a global logtype.  
>> 
>> You mean for all the logs in this driver, right? Is it because the global logtype
>> will be deprecated?
>
>Global log types should not be used or added by any new code.

Got it.

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v4 4/5] net/af_xdp: use mbuf mempool for buffer management
  2019-03-22 14:51     ` Maxime Coquelin
@ 2019-03-24  9:08       ` Ye Xiaolong
  2019-03-24 11:52         ` Ye Xiaolong
  0 siblings, 1 reply; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-24  9:08 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

On 03/22, Maxime Coquelin wrote:
>
>
>On 3/22/19 2:01 PM, Xiaolong Ye wrote:
>> Now, af_xdp registered memory buffer is managed by rte_mempool. mbuf be
>s/mbuf be allocated/mbuf allocated/
>> allocated from rte_mempool can be convert to xdp_desc's address and vice
>s/convert/converted/
>> versa.

Will fix these spelling errors.

>> 
>> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
>> ---
>>   drivers/net/af_xdp/rte_eth_af_xdp.c | 117 +++++++++++++++++-----------
>>   1 file changed, 72 insertions(+), 45 deletions(-)
>> 
>> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
>> index 9f0012347..6b1bc462a 100644
>> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
>> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
>> @@ -48,7 +48,11 @@ static int af_xdp_logtype;
>>   #define ETH_AF_XDP_FRAME_SIZE		XSK_UMEM__DEFAULT_FRAME_SIZE
>>   #define ETH_AF_XDP_NUM_BUFFERS		4096
>> -#define ETH_AF_XDP_DATA_HEADROOM	0
>> +/* mempool hdrobj size (64 bytes) + sizeof(struct rte_mbuf) (128 bytes) */
>> +#define ETH_AF_XDP_MBUF_OVERHEAD	192
>> +/* data start from offset 320 (192 + 128) bytes */
>> +#define ETH_AF_XDP_DATA_HEADROOM				\
>> +	(ETH_AF_XDP_MBUF_OVERHEAD + RTE_PKTMBUF_HEADROOM)
>>   #define ETH_AF_XDP_DFLT_NUM_DESCS	XSK_RING_CONS__DEFAULT_NUM_DESCS
>>   #define ETH_AF_XDP_DFLT_QUEUE_IDX	0
>> @@ -61,7 +65,7 @@ struct xsk_umem_info {
>>   	struct xsk_ring_prod fq;
>>   	struct xsk_ring_cons cq;
>>   	struct xsk_umem *umem;
>> -	struct rte_ring *buf_ring;
>> +	struct rte_mempool *mb_pool;
>>   	void *buffer;
>>   };
>> @@ -123,12 +127,32 @@ static struct rte_eth_link pmd_link = {
>>   	.link_autoneg = ETH_LINK_AUTONEG
>>   };
>> +static inline struct rte_mbuf *
>> +addr_to_mbuf(struct xsk_umem_info *umem, uint64_t addr)
>> +{
>> +	uint64_t offset = (addr / ETH_AF_XDP_FRAME_SIZE *
>> +			ETH_AF_XDP_FRAME_SIZE);
>> +	struct rte_mbuf *mbuf = (struct rte_mbuf *)((uint64_t)umem->buffer +
>> +				    offset + ETH_AF_XDP_MBUF_OVERHEAD -
>> +				    sizeof(struct rte_mbuf));
>> +	mbuf->data_off = addr - offset - ETH_AF_XDP_MBUF_OVERHEAD;
>> +	return mbuf;
>> +}
>> +
>> +static inline uint64_t
>> +mbuf_to_addr(struct xsk_umem_info *umem, struct rte_mbuf *mbuf)
>> +{
>> +	return (uint64_t)mbuf->buf_addr + mbuf->data_off -
>> +		(uint64_t)umem->buffer;
>> +}
>> +
>>   static inline int
>>   reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
>>   {
>>   	struct xsk_ring_prod *fq = &umem->fq;
>> +	struct rte_mbuf *mbuf;
>>   	uint32_t idx;
>> -	void *addr = NULL;
>> +	uint64_t addr;
>>   	int i, ret;
>>   	ret = xsk_ring_prod__reserve(fq, reserve_size, &idx);
>> @@ -139,12 +163,14 @@ reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
>>   	for (i = 0; i < reserve_size; i++) {
>>   		__u64 *fq_addr;
>> -		if (rte_ring_dequeue(umem->buf_ring, &addr)) {
>> +		mbuf = rte_pktmbuf_alloc(umem->mb_pool);
>> +		if (mbuf == NULL) {
>
>unlikely()

Got it.
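
That is, wrapping the hot-path allocation check in the branch hint:

	mbuf = rte_pktmbuf_alloc(umem->mb_pool);
	if (unlikely(mbuf == NULL)) {
		i--;
		break;
	}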

>
>>   			i--;
>>   			break;
>>   		}
>> +		addr = mbuf_to_addr(umem, mbuf);
>>   		fq_addr = xsk_ring_prod__fill_addr(fq, idx++);
>> -		*fq_addr = (uint64_t)addr;
>> +		*fq_addr = addr;
>>   	}
>>   	xsk_ring_prod__submit(fq, i);
>> @@ -196,7 +222,7 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>>   		rx_bytes += len;
>>   		bufs[count++] = mbufs[i];
>> -		rte_ring_enqueue(umem->buf_ring, (void *)addr);
>> +		rte_pktmbuf_free(addr_to_mbuf(umem, addr));
>>   	}
>>   	xsk_ring_cons__release(rx, rcvd);
>> @@ -219,7 +245,7 @@ static void pull_umem_cq(struct xsk_umem_info *umem, int size)
>>   	for (i = 0; i < n; i++) {
>>   		uint64_t addr;
>>   		addr = *xsk_ring_cons__comp_addr(cq, idx_cq++);
>> -		rte_ring_enqueue(umem->buf_ring, (void *)addr);
>> +		rte_pktmbuf_free(addr_to_mbuf(umem, addr));
>>   	}
>>   	xsk_ring_cons__release(cq, n);
>> @@ -248,7 +274,7 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>>   	struct pkt_tx_queue *txq = queue;
>>   	struct xsk_umem_info *umem = txq->pair->umem;
>>   	struct rte_mbuf *mbuf;
>> -	void *addrs[ETH_AF_XDP_TX_BATCH_SIZE];
>> +	struct rte_mbuf *mbuf_to_tx;
>>   	unsigned long tx_bytes = 0;
>>   	int i, valid = 0;
>>   	uint32_t idx_tx;
>> @@ -257,11 +283,6 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>>   	pull_umem_cq(umem, nb_pkts);
>> -	nb_pkts = rte_ring_dequeue_bulk(umem->buf_ring, addrs,
>> -					nb_pkts, NULL);
>> -	if (nb_pkts == 0)
>> -		return 0;
>> -
>>   	if (xsk_ring_prod__reserve(&txq->tx, nb_pkts, &idx_tx) != nb_pkts) {
>>   		kick_tx(txq);
>>   		return 0;
>> @@ -275,7 +296,12 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>>   		desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i);
>>   		mbuf = bufs[i];
>>   		if (mbuf->pkt_len <= buf_len) {
>> -			desc->addr = (uint64_t)addrs[valid];
>> +			mbuf_to_tx = rte_pktmbuf_alloc(umem->mb_pool);
>> +			if (mbuf_to_tx == NULL) {
>
>unlikely()

Got it.

>
>> +				rte_pktmbuf_free(mbuf);
>> +				continue;
>> +			}
>> +			desc->addr = mbuf_to_addr(umem, mbuf_to_tx);
>>   			desc->len = mbuf->pkt_len;
>>   			pkt = xsk_umem__get_data(umem->buffer,
>>   						 desc->addr);
>> @@ -291,10 +317,6 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>>   	kick_tx(txq);
>> -	if (valid < nb_pkts)
>> -		rte_ring_enqueue_bulk(umem->buf_ring, &addrs[valid],
>> -				 nb_pkts - valid, NULL);
>> -
>>   	txq->stats.err_pkts += nb_pkts - valid;
>>   	txq->stats.tx_pkts += valid;
>>   	txq->stats.tx_bytes += tx_bytes;
>> @@ -443,16 +465,29 @@ eth_link_update(struct rte_eth_dev *dev __rte_unused,
>>   static void xdp_umem_destroy(struct xsk_umem_info *umem)
>>   {
>> -	free(umem->buffer);
>> -	umem->buffer = NULL;
>> -
>> -	rte_ring_free(umem->buf_ring);
>> -	umem->buf_ring = NULL;
>> +	rte_mempool_free(umem->mb_pool);
>> +	umem->mb_pool = NULL;
>>   	rte_free(umem);
>>   	umem = NULL;
>>   }
>> +static inline uint64_t get_base_addr(struct rte_mempool *mp)
>> +{
>> +	struct rte_mempool_memhdr *memhdr;
>> +
>> +	memhdr = STAILQ_FIRST(&mp->mem_list);
>> +	return (uint64_t)(memhdr->addr);
>> +}
>> +
>> +static inline uint64_t get_len(struct rte_mempool *mp)
>> +{
>> +	struct rte_mempool_memhdr *memhdr;
>> +
>> +	memhdr = STAILQ_FIRST(&mp->mem_list);
>> +	return (uint64_t)(memhdr->len);
>> +}
>> +
>>   static struct xsk_umem_info *xdp_umem_configure(void)
>>   {
>>   	struct xsk_umem_info *umem;
>> @@ -461,9 +496,8 @@ static struct xsk_umem_info *xdp_umem_configure(void)
>>   		.comp_size = ETH_AF_XDP_DFLT_NUM_DESCS,
>>   		.frame_size = ETH_AF_XDP_FRAME_SIZE,
>>   		.frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
>> -	void *bufs = NULL;
>> +	void *base_addr = NULL;
>>   	int ret;
>> -	uint64_t i;
>>   	umem = rte_zmalloc_socket("umem", sizeof(*umem), 0, rte_socket_id());
>>   	if (umem == NULL) {
>> @@ -471,26 +505,20 @@ static struct xsk_umem_info *xdp_umem_configure(void)
>>   		return NULL;
>>   	}
>> -	umem->buf_ring = rte_ring_create("af_xdp_ring",
>> -					 ETH_AF_XDP_NUM_BUFFERS,
>> -					 SOCKET_ID_ANY,
>> -					 0x0);
>> -	if (umem->buf_ring == NULL) {
>> -		AF_XDP_LOG(ERR, "Failed to create rte_ring\n");
>> +	umem->mb_pool = rte_pktmbuf_pool_create_with_flags("af_xdp_mempool",
>> +			ETH_AF_XDP_NUM_BUFFERS,
>> +			250, 0,
>> +			ETH_AF_XDP_FRAME_SIZE -
>> +			ETH_AF_XDP_MBUF_OVERHEAD,
>> +			MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN,
>> +			SOCKET_ID_ANY);
>> +	if (umem->mb_pool == NULL || umem->mb_pool->nb_mem_chunks != 1) {
>> +		AF_XDP_LOG(ERR, "Failed to create mempool\n");
>>   		goto err;
>>   	}
>> +	base_addr = (void *)get_base_addr(umem->mb_pool);
>> -	for (i = 0; i < ETH_AF_XDP_NUM_BUFFERS; i++)
>> -		rte_ring_enqueue(umem->buf_ring,
>> -				 (void *)(i * ETH_AF_XDP_FRAME_SIZE +
>> -					  ETH_AF_XDP_DATA_HEADROOM));
>> -
>> -	if (posix_memalign(&bufs, getpagesize(),
>> -			   ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE)) {
>> -		AF_XDP_LOG(ERR, "Failed to allocate memory pool.\n");
>> -		goto err;
>> -	}
>> -	ret = xsk_umem__create(&umem->umem, bufs,
>> +	ret = xsk_umem__create(&umem->umem, base_addr,
>>   			       ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
>>   			       &umem->fq, &umem->cq,
>>   			       &usr_config);
>> @@ -499,7 +527,7 @@ static struct xsk_umem_info *xdp_umem_configure(void)
>>   		AF_XDP_LOG(ERR, "Failed to create umem");
>>   		goto err;
>
>You need to destroy mb_pool if xsk_umem__create() fails.

Will do.

Thanks,
Xiaolong
>
>>   	}
>> -	umem->buffer = bufs;
>> +	umem->buffer = base_addr;
>>   	return umem;
>> @@ -912,10 +940,9 @@ rte_pmd_af_xdp_remove(struct rte_vdev_device *dev)
>>   	internals = eth_dev->data->dev_private;
>> -	rte_ring_free(internals->umem->buf_ring);
>> -	rte_free(internals->umem->buffer);
>>   	rte_free(internals->umem);
>> +	rte_mempool_free(internals->umem->mb_pool);
>>   	rte_eth_dev_release_port(eth_dev);
>> 

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v4 2/5] lib/mbuf: introduce helper to create mempool with flags
  2019-03-22 14:36     ` Maxime Coquelin
@ 2019-03-24  9:08       ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-24  9:08 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

On 03/22, Maxime Coquelin wrote:
>
>
>On 3/22/19 2:01 PM, Xiaolong Ye wrote:
>> This allows applications to create mbuf mempool with specific flags
>> such as MEMPOOL_F_NO_SPREAD if they want fixed size memory objects.
>> 
>> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
>> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
>> ---
>>   lib/librte_mbuf/rte_mbuf.c | 29 +++++++++++++++++++-----
>>   lib/librte_mbuf/rte_mbuf.h | 45 ++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 69 insertions(+), 5 deletions(-)
>> 
>
>> +/**
>> + * Create a mbuf pool with flags.
>> + *
>> + * This function creates and initializes a packet mbuf pool. It is
>> + * a wrapper to rte_mempool functions.
>> + *
>> + * @warning
>> + * @b EXPERIMENTAL: This API may change without prior notice.
>> + *
>> + * @param name
>> + *   The name of the mbuf pool.
>> + * @param n
>> + *   The number of elements in the mbuf pool. The optimum size (in terms
>> + *   of memory usage) for a mempool is when n is a power of two minus one:
>> + *   n = (2^q - 1).
>> + * @param cache_size
>> + *   Size of the per-core object cache. See rte_mempool_create() for
>> + *   details.
>> + * @param priv_size
>> + *   Size of application private are between the rte_mbuf structure
>> + *   and the data buffer. This value must be aligned to RTE_MBUF_PRIV_ALIGN.
>> + * @param data_room_size
>> + *   Size of data buffer in each mbuf, including RTE_PKTMBUF_HEADROOM.
>> + * @param flags
>> + *   Flags controlling the behavior of the mempool. See
>> + *   rte_mempool_create() for details.
>> + * @param socket_id
>> + *   The socket identifier where the memory should be allocated. The
>> + *   value can be *SOCKET_ID_ANY* if there is no NUMA constraint for the
>> + *   reserved zone.
>> + * @return
>> + *   The pointer to the new allocated mempool, on success. NULL on error
>> + *   with rte_errno set appropriately. Possible rte_errno values include:
>> + *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
>> + *    - E_RTE_SECONDARY - function was called from a secondary process instance
>> + *    - EINVAL - cache size provided is too large, or priv_size is not aligned.
>> + *    - ENOSPC - the maximum number of memzones has already been allocated
>> + *    - EEXIST - a memzone with the same name already exists
>> + *    - ENOMEM - no appropriate memory area found in which to create memzone
>> + */
>> +struct rte_mempool * __rte_experimental
>> +rte_pktmbuf_pool_create_with_flags(const char *name, unsigned int n,
>> +	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
>> +	unsigned int flags, int socket_id);
>> +
>>   /**
>>    * Create a mbuf pool with a given mempool ops name
>>    *
>> 
>
>You need to add it to rte_mbuf_version.map too.

Got it, will add in next version.
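
For illustration, the new symbol would join the EXPERIMENTAL block of
lib/librte_mbuf/rte_mbuf_version.map, roughly:

	EXPERIMENTAL {
		global:

		rte_pktmbuf_pool_create_with_flags;
	};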

Thanks,
Xiaolong

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v4 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-22 14:32     ` Maxime Coquelin
@ 2019-03-24  9:32       ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-24  9:32 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

On 03/22, Maxime Coquelin wrote:
>
>
>On 3/22/19 2:01 PM, Xiaolong Ye wrote:
>> Add a new PMD driver for AF_XDP which is a proposed faster version of
>> AF_PACKET interface in Linux. More info about AF_XDP, please refer to [1]
>> [2].
>> 
>> This is the vanilla version PMD which just uses a raw buffer registered as
>> the umem.
>> 
>> [1] https://fosdem.org/2018/schedule/event/af_xdp/
>> [2] https://lwn.net/Articles/745934/
>> 
>> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
>> ---
>>   MAINTAINERS                                   |   6 +
>>   config/common_base                            |   5 +
>>   config/common_linux                           |   1 +
>>   doc/guides/nics/af_xdp.rst                    |  45 +
>>   doc/guides/nics/features/af_xdp.ini           |  11 +
>>   doc/guides/nics/index.rst                     |   1 +
>>   doc/guides/rel_notes/release_19_05.rst        |   7 +
>>   drivers/net/Makefile                          |   1 +
>>   drivers/net/af_xdp/Makefile                   |  32 +
>>   drivers/net/af_xdp/meson.build                |  21 +
>>   drivers/net/af_xdp/rte_eth_af_xdp.c           | 940 ++++++++++++++++++
>>   drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   3 +
>>   drivers/net/meson.build                       |   1 +
>>   mk/rte.app.mk                                 |   1 +
>>   14 files changed, 1075 insertions(+)
>>   create mode 100644 doc/guides/nics/af_xdp.rst
>>   create mode 100644 doc/guides/nics/features/af_xdp.ini
>>   create mode 100644 drivers/net/af_xdp/Makefile
>>   create mode 100644 drivers/net/af_xdp/meson.build
>>   create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
>>   create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map
>> 
>
>...
>
>> diff --git a/config/common_base b/config/common_base
>> index 0b09a9348..4044de205 100644
>> --- a/config/common_base
>> +++ b/config/common_base
>> @@ -416,6 +416,11 @@ CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX_FREE=n
>>   #
>>   CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
>> +#
>> +# Compile software PMD backed by AF_XDP sockets (Linux only)
>> +#
>> +CONFIG_RTE_LIBRTE_PMD_AF_XDP=n
>> +
>>   #
>>   # Compile link bonding PMD library
>>   #
>> diff --git a/config/common_linux b/config/common_linux
>> index 75334273d..0b1249da0 100644
>> --- a/config/common_linux
>> +++ b/config/common_linux
>> @@ -19,6 +19,7 @@ CONFIG_RTE_LIBRTE_VHOST_POSTCOPY=n
>>   CONFIG_RTE_LIBRTE_PMD_VHOST=y
>>   CONFIG_RTE_LIBRTE_IFC_PMD=y
>>   CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
>> +CONFIG_RTE_LIBRTE_PMD_AF_XDP=y
>
>It has to be disabled by default as it requires headers from kernels
>more recent than minimum kernel version supported by DPDK.

Ok, got it.

>
>>   CONFIG_RTE_LIBRTE_PMD_SOFTNIC=y
>>   CONFIG_RTE_LIBRTE_PMD_TAP=y
>>   CONFIG_RTE_LIBRTE_AVP_PMD=y
>
>...
>
>> diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
>> new file mode 100644
>> index 000000000..635e67483
>> --- /dev/null
>> +++ b/drivers/net/af_xdp/meson.build
>> @@ -0,0 +1,21 @@
>> +# SPDX-License-Identifier: BSD-3-Clause
>> +# Copyright(c) 2018 Intel Corporation
>> +
>> +if host_machine.system() != 'linux'
>> +	build = false
>> +endif
>> +
>> +bpf_dep = dependency('libbpf', required: false)
>> +if bpf_dep.found()
>> +	build = true
>> +else
>> +	bpf_dep = cc.find_library('libbpf', required: false)
>> +	if bpf_dep.found() and cc.has_header('xsk.h', dependencies: bpf_dep)
>> +		build = true
>> +		pkgconfig_extra_libs += '-lbpf'
>> +	else
>> +		build = false
>> +	endif
>> +endif
>
>I think you need to add more checks, as above does not cover
>linux/if_xdp.h header IIUC.

will add more.
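
For reference, the v5 meson.build later in this thread extends the fallback
branch to check the uapi header as well:

	if bpf_dep.found() and cc.has_header('xsk.h', dependencies: bpf_dep) and cc.has_header('linux/if_xdp.h')
		build = true
		pkgconfig_extra_libs += '-lbpf'
	endif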

>
>> +sources = files('rte_eth_af_xdp.c')
>> +ext_deps += bpf_dep
>> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
>> new file mode 100644
>> index 000000000..9f0012347
>> --- /dev/null
>> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
>> @@ -0,0 +1,940 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2019 Intel Corporation.
>> + */
>> +
>> +#include <rte_mbuf.h>
>> +#include <rte_ethdev_driver.h>
>> +#include <rte_ethdev_vdev.h>
>> +#include <rte_malloc.h>
>> +#include <rte_kvargs.h>
>> +#include <rte_bus_vdev.h>
>> +#include <rte_string_fns.h>
>> +
>> +#include <linux/if_ether.h>
>> +#include <linux/if_xdp.h>
>> +#include <linux/if_link.h>
>> +#include <asm/barrier.h>
>> +#include <arpa/inet.h>
>> +#include <net/if.h>
>> +#include <sys/types.h>
>> +#include <sys/socket.h>
>> +#include <sys/ioctl.h>
>> +#include <sys/mman.h>
>> +#include <unistd.h>
>> +#include <poll.h>
>> +#include <bpf/bpf.h>
>> +#include <xsk.h>
>> +
>> +#ifndef SOL_XDP
>> +#define SOL_XDP 283
>> +#endif
>> +
>> +#ifndef AF_XDP
>> +#define AF_XDP 44
>> +#endif
>> +
>> +#ifndef PF_XDP
>> +#define PF_XDP AF_XDP
>> +#endif
>> +
>> +static int af_xdp_logtype;
>> +
>> +#define AF_XDP_LOG(level, fmt, args...)			\
>> +	rte_log(RTE_LOG_ ## level, af_xdp_logtype,	\
>> +		"%s(): " fmt "\n", __func__, ##args)
>> +
>> +#define ETH_AF_XDP_IFACE_ARG			"iface"
>> +#define ETH_AF_XDP_QUEUE_IDX_ARG		"queue"
>> +
>> +#define ETH_AF_XDP_FRAME_SIZE		XSK_UMEM__DEFAULT_FRAME_SIZE
>> +#define ETH_AF_XDP_NUM_BUFFERS		4096
>> +#define ETH_AF_XDP_DATA_HEADROOM	0
>> +#define ETH_AF_XDP_DFLT_NUM_DESCS	XSK_RING_CONS__DEFAULT_NUM_DESCS
>> +#define ETH_AF_XDP_DFLT_QUEUE_IDX	0
>> +
>> +#define ETH_AF_XDP_RX_BATCH_SIZE	32
>> +#define ETH_AF_XDP_TX_BATCH_SIZE	32
>> +
>> +#define ETH_AF_XDP_MAX_QUEUE_PAIRS     16
>> +
>> +struct xsk_umem_info {
>> +	struct xsk_ring_prod fq;
>> +	struct xsk_ring_cons cq;
>> +	struct xsk_umem *umem;
>> +	struct rte_ring *buf_ring;
>> +	void *buffer;
>> +};
>> +
>> +struct rx_stats {
>> +	uint64_t rx_pkts;
>> +	uint64_t rx_bytes;
>> +	uint64_t rx_dropped;
>> +};
>> +
>> +struct pkt_rx_queue {
>> +	struct xsk_ring_cons rx;
>> +	struct xsk_umem_info *umem;
>> +	struct xsk_socket *xsk;
>> +	struct rte_mempool *mb_pool;
>> +
>> +	struct rx_stats stats;
>> +
>> +	struct pkt_tx_queue *pair;
>> +	uint16_t queue_idx;
>> +};
>> +
>> +struct tx_stats {
>> +	uint64_t tx_pkts;
>> +	uint64_t err_pkts;
>> +	uint64_t tx_bytes;
>> +};
>> +
>> +struct pkt_tx_queue {
>> +	struct xsk_ring_prod tx;
>> +
>> +	struct tx_stats stats;
>> +
>> +	struct pkt_rx_queue *pair;
>> +	uint16_t queue_idx;
>> +};
>> +
>> +struct pmd_internals {
>> +	int if_index;
>> +	char if_name[IFNAMSIZ];
>> +	uint16_t queue_idx;
>> +	struct ether_addr eth_addr;
>> +	struct xsk_umem_info *umem;
>> +	struct rte_mempool *mb_pool_share;
>> +
>> +	struct pkt_rx_queue rx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
>> +	struct pkt_tx_queue tx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
>> +};
>> +
>> +static const char * const valid_arguments[] = {
>> +	ETH_AF_XDP_IFACE_ARG,
>> +	ETH_AF_XDP_QUEUE_IDX_ARG,
>> +	NULL
>> +};
>> +
>> +static struct rte_eth_link pmd_link = {
>> +	.link_speed = ETH_SPEED_NUM_10G,
>> +	.link_duplex = ETH_LINK_FULL_DUPLEX,
>> +	.link_status = ETH_LINK_DOWN,
>> +	.link_autoneg = ETH_LINK_AUTONEG
>> +};
>> +
>> +static inline int
>> +reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
>> +{
>> +	struct xsk_ring_prod *fq = &umem->fq;
>> +	uint32_t idx;
>> +	void *addr = NULL;
>> +	int i, ret;
>> +
>> +	ret = xsk_ring_prod__reserve(fq, reserve_size, &idx);
>> +	if (!ret) {
>
>You could use unlikely() here as it is called in the hot path.

Got it.

>
>> +		AF_XDP_LOG(ERR, "Failed to reserve enough fq descs.\n");
>> +		return ret;
>> +	}
>> +
>> +	for (i = 0; i < reserve_size; i++) {
>> +		__u64 *fq_addr;
>
>For consistency, you could either declare addr here, or fq_addr at the
>top of the function.

ok, will make the code consistent in next version.

>
>> +		if (rte_ring_dequeue(umem->buf_ring, &addr)) {
>> +			i--;
>> +			break;
>> +		}
>> +		fq_addr = xsk_ring_prod__fill_addr(fq, idx++);
>> +		*fq_addr = (uint64_t)addr;
>> +	}
>> +
>> +	xsk_ring_prod__submit(fq, i);
>> +
>> +	return 0;
>> +}
>> +
>> +static uint16_t
>> +eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>> +{
>> +	struct pkt_rx_queue *rxq = queue;
>> +	struct xsk_ring_cons *rx = &rxq->rx;
>> +	struct xsk_umem_info *umem = rxq->umem;
>> +	struct xsk_ring_prod *fq = &umem->fq;
>> +	uint32_t idx_rx;
>> +	uint32_t free_thresh = fq->size >> 1;
>> +	struct rte_mbuf *mbufs[ETH_AF_XDP_TX_BATCH_SIZE];
>> +	unsigned long dropped = 0;
>> +	unsigned long rx_bytes = 0;
>> +	uint16_t count = 0;
>> +	int rcvd, i;
>> +
>> +	nb_pkts = RTE_MIN(nb_pkts, ETH_AF_XDP_TX_BATCH_SIZE);
>> +
>> +	rcvd = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
>> +	if (rcvd == 0)
>> +		return 0;
>> +
>> +	if (xsk_prod_nb_free(fq, free_thresh) >= free_thresh)
>> +		(void)reserve_fill_queue(umem, ETH_AF_XDP_RX_BATCH_SIZE);
>> +
>> +	if (rte_pktmbuf_alloc_bulk(rxq->mb_pool, mbufs, rcvd) != 0)
>unlikely()?

Got it.

>> +		return 0;
>> +
>> +	for (i = 0; i < rcvd; i++) {
>> +		const struct xdp_desc *desc;
>> +		uint64_t addr;
>> +		uint32_t len;
>> +		void *pkt;
>> +
>> +		desc = xsk_ring_cons__rx_desc(rx, idx_rx++);
>> +		addr = desc->addr;
>> +		len = desc->len;
>> +		pkt = xsk_umem__get_data(rxq->umem->buffer, addr);
>> +
>> +		rte_memcpy(rte_pktmbuf_mtod(mbufs[i], void *), pkt, len);
>> +		rte_pktmbuf_pkt_len(mbufs[i]) = len;
>> +		rte_pktmbuf_data_len(mbufs[i]) = len;
>> +		rx_bytes += len;
>> +		bufs[count++] = mbufs[i];
>> +
>> +		rte_ring_enqueue(umem->buf_ring, (void *)addr);
>> +	}
>> +
>> +	xsk_ring_cons__release(rx, rcvd);
>> +
>> +	/* statistics */
>> +	rxq->stats.rx_pkts += (rcvd - dropped);
>> +	rxq->stats.rx_bytes += rx_bytes;
>> +
>> +	return count;
>> +}
>> +
>
>...
>
>> +
>> +/* This function gets called when the current port gets stopped. */
>> +static void
>> +eth_dev_stop(struct rte_eth_dev *dev)
>> +{
>> +	dev->data->dev_link.link_status = ETH_LINK_DOWN;
>> +}
>> +
>> +static int
>> +eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
>
>Remove __rte_unused.

oops, my bad, will remove it.

>
>> +{
>> +	/* rx/tx must be paired */
>> +	if (dev->data->nb_rx_queues != dev->data->nb_tx_queues)
>> +		return -EINVAL;
>> +
>> +	return 0;
>> +}
>> +
>
>...
>
>> +
>> +static int
>> +xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
>> +	      int ring_size)
>> +{
>> +	struct xsk_socket_config cfg;
>> +	struct pkt_tx_queue *txq = rxq->pair;
>> +	int ret = 0;
>> +	int reserve_size;
>> +
>> +	rxq->umem = xdp_umem_configure();
>> +	if (rxq->umem == NULL) {
>> +		ret = -ENOMEM;
>> +		goto err;
>You can return directly here as umem == NULL.

Got it.
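
That is, simply:

	rxq->umem = xdp_umem_configure();
	if (rxq->umem == NULL)
		return -ENOMEM;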

>
>> +	}
>> +
>> +	cfg.rx_size = ring_size;
>> +	cfg.tx_size = ring_size;
>> +	cfg.libbpf_flags = 0;
>> +	cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
>> +	cfg.bind_flags = 0;
>> +	ret = xsk_socket__create(&rxq->xsk, internals->if_name,
>> +			internals->queue_idx, rxq->umem->umem, &rxq->rx,
>> +			&txq->tx, &cfg);
>> +	if (ret) {
>> +		AF_XDP_LOG(ERR, "Failed to create xsk socket.\n");
>> +		goto err;
>> +	}
>> +
>> +	reserve_size = ETH_AF_XDP_DFLT_NUM_DESCS / 2;
>> +	ret = reserve_fill_queue(rxq->umem, reserve_size);
>> +	if (ret) {
>> +		AF_XDP_LOG(ERR, "Failed to reserve fill queue.\n");
>> +		goto err;
>
>Shouldn't you call xsk_socket__delete(rxq->xsk) here?

Good catch, xsk socket does need to be deleted here.
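
A sketch of the fix, assuming the socket teardown simply precedes the
existing error path:

	ret = reserve_fill_queue(rxq->umem, reserve_size);
	if (ret) {
		xsk_socket__delete(rxq->xsk);
		AF_XDP_LOG(ERR, "Failed to reserve fill queue.\n");
		goto err;
	}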

>
>> +	}
>> +
>> +	return 0;
>> +
>> +err:
>> +	xdp_umem_destroy(rxq->umem);
>> +
>> +	return ret;
>> +}
>> +
>> +static void
>> +queue_reset(struct pmd_internals *internals, uint16_t queue_idx)
>> +{
>> +	struct pkt_rx_queue *rxq = &internals->rx_queues[queue_idx];
>> +	struct pkt_tx_queue *txq = rxq->pair;
>> +	int xsk_fd = xsk_socket__fd(rxq->xsk);
>> +
>> +	if (xsk_fd) {
>> +		close(xsk_fd);
>> +		if (internals->umem != NULL) {
>
>Moving this condition out would work and be cleaner.
>
>Anyway, it seems it never enters this condition, as internals->umem is not
>set yet when queue_reset() is called.

You are right, internals->umem and xsk_fd shouldn't be handled here; will refine
the code.

>
>> +			xdp_umem_destroy(internals->umem);
>> +			internals->umem = NULL;
>> +		}
>> +	}
>> +	memset(rxq, 0, sizeof(*rxq));
>> +	memset(txq, 0, sizeof(*txq));
>> +	rxq->pair = txq;
>> +	txq->pair = rxq;
>> +	rxq->queue_idx = queue_idx;
>> +	txq->queue_idx = queue_idx;
>> +}
>> +
>> +static int
>> +eth_rx_queue_setup(struct rte_eth_dev *dev,
>> +		   uint16_t rx_queue_id,
>> +		   uint16_t nb_rx_desc,
>> +		   unsigned int socket_id __rte_unused,
>> +		   const struct rte_eth_rxconf *rx_conf __rte_unused,
>> +		   struct rte_mempool *mb_pool)
>> +{
>> +	struct pmd_internals *internals = dev->data->dev_private;
>> +	uint32_t buf_size, data_size;
>> +	struct pkt_rx_queue *rxq;
>> +	int ret;
>> +
>> +	rxq = &internals->rx_queues[rx_queue_id];
>> +	queue_reset(internals, rx_queue_id);
>> +
>> +	/* Now get the space available for data in the mbuf */
>> +	buf_size = rte_pktmbuf_data_room_size(mb_pool) -
>> +		RTE_PKTMBUF_HEADROOM;
>> +	data_size = ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_DATA_HEADROOM;
>> +
>> +	if (data_size > buf_size) {
>> +		AF_XDP_LOG(ERR, "%s: %d bytes will not fit in mbuf (%d bytes)\n",
>> +			dev->device->name, data_size, buf_size);
>> +		ret = -ENOMEM;
>> +		goto err;
>> +	}
>> +
>> +	rxq->mb_pool = mb_pool;
>> +
>> +	if (xsk_configure(internals, rxq, nb_rx_desc)) {
>> +		AF_XDP_LOG(ERR, "Failed to configure xdp socket\n");
>> +		ret = -EINVAL;
>> +		goto err;
>> +	}
>> +
>> +	internals->umem = rxq->umem;
>
>If my previous comment is wrong, i.e. internals->umem may be already set
>when queue_reset() is called, then it means you might have a leak here.
>
>> +
>> +	dev->data->rx_queues[rx_queue_id] = rxq;
>> +	return 0;
>> +
>> +err:
>> +	queue_reset(internals, rx_queue_id);
>> +	return ret;
>> +}
>> +

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v4 4/5] net/af_xdp: use mbuf mempool for buffer management
  2019-03-24  9:08       ` Ye Xiaolong
@ 2019-03-24 11:52         ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-24 11:52 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: dev, Qi Zhang, Karlsson Magnus, Topel Bjorn

On 03/24, Ye Xiaolong wrote:
>>> -	ret = xsk_umem__create(&umem->umem, bufs,
>>> +	ret = xsk_umem__create(&umem->umem, base_addr,
>>>   			       ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
>>>   			       &umem->fq, &umem->cq,
>>>   			       &usr_config);
>>> @@ -499,7 +527,7 @@ static struct xsk_umem_info *xdp_umem_configure(void)
>>>   		AF_XDP_LOG(ERR, "Failed to create umem");
>>>   		goto err;
>>
>>You need to destroy mb_pool if xsk_umem__create() fails.
>
>Will do.

Correction, mb_pool destruction has already been handled in xdp_umem_destroy.
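
For reference, xdp_umem_destroy() in this series already frees the pool:

	static void xdp_umem_destroy(struct xsk_umem_info *umem)
	{
		rte_mempool_free(umem->mb_pool);
		umem->mb_pool = NULL;

		rte_free(umem);
		umem = NULL;
	}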

Thanks,
Xiaolong
>
>Thanks,
>Xiaolong
>>
>>>   	}
>>> -	umem->buffer = bufs;
>>> +	umem->buffer = base_addr;
>>>   	return umem;
>>> @@ -912,10 +940,9 @@ rte_pmd_af_xdp_remove(struct rte_vdev_device *dev)
>>>   	internals = eth_dev->data->dev_private;
>>> -	rte_ring_free(internals->umem->buf_ring);
>>> -	rte_free(internals->umem->buffer);
>>>   	rte_free(internals->umem);
>>> +	rte_mempool_free(internals->umem->mb_pool);
>>>   	rte_eth_dev_release_port(eth_dev);
>>> 

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v1 1/6] net/af_xdp: introduce AF_XDP PMD driver
  2019-03-17  3:34       ` Ye Xiaolong
@ 2019-03-24 12:07         ` Luca Boccassi
  2019-03-25  2:45           ` Ye Xiaolong
  0 siblings, 1 reply; 214+ messages in thread
From: Luca Boccassi @ 2019-03-24 12:07 UTC (permalink / raw)
  To: Ye Xiaolong; +Cc: dev, Qi Zhang

On Sun, 2019-03-17 at 11:34 +0800, Ye Xiaolong wrote:
> On 03/02, Ye Xiaolong wrote:
> > > >  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  +=
> > > > -lrte_pmd_af_packet
> > > > +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     +=
> > > > -lrte_pmd_af_xdp
> > > > -lelf -lbpf
> > > 
> > > Are symbols from libelf being used by the PMD?
> > 
> > Hmm, it is a leftover of RFC, libelf is no longer needed in this
> > version, will
> > remove it in next version.
> > 
> 
> Correction, libelf is needed for libbpf, so we still need to keep
> it. 

If libbpf needs libelf for its internal usage, it should be linked
against it itself. Unless symbols from libelf are used in static
functions defined in libbpf's public headers. Is this the case?

-- 
Kind regards,
Luca Boccassi

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v4 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-22 13:01   ` [PATCH v4 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
  2019-03-22 14:32     ` Maxime Coquelin
@ 2019-03-24 12:10     ` Luca Boccassi
  2019-03-24 16:27       ` Thomas Monjalon
  1 sibling, 1 reply; 214+ messages in thread
From: Luca Boccassi @ 2019-03-24 12:10 UTC (permalink / raw)
  To: Xiaolong Ye, dev; +Cc: Qi Zhang, Thomas Monjalon, Yigit, Ferruh

On Fri, 2019-03-22 at 21:01 +0800, Xiaolong Ye wrote:
> diff --git a/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
> b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
> new file mode 100644
> index 000000000..c6db030fe
> --- /dev/null
> +++ b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
> @@ -0,0 +1,3 @@
> +DPDK_19.05 {
> +       local: *;
> +};

This is a new PMD, shouldn't it be EXPERIMENTAL rather than DPDK_19.05?

-- 
Kind regards,
Luca Boccassi

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v4 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-24 12:10     ` Luca Boccassi
@ 2019-03-24 16:27       ` Thomas Monjalon
  0 siblings, 0 replies; 214+ messages in thread
From: Thomas Monjalon @ 2019-03-24 16:27 UTC (permalink / raw)
  To: Luca Boccassi; +Cc: Xiaolong Ye, dev, Qi Zhang, Yigit, Ferruh

24/03/2019 13:10, Luca Boccassi:
> On Fri, 2019-03-22 at 21:01 +0800, Xiaolong Ye wrote:
> > diff --git a/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
> > b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
> > new file mode 100644
> > index 000000000..c6db030fe
> > --- /dev/null
> > +++ b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
> > @@ -0,0 +1,3 @@
> > +DPDK_19.05 {
> > +       local: *;
> > +};
> 
> This is a new PMD, shouldn't it be EXPERIMENTAL rather than DPDK_19.05?

I don't think so, because there is no API here.

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v1 1/6] net/af_xdp: introduce AF_XDP PMD driver
  2019-03-24 12:07         ` Luca Boccassi
@ 2019-03-25  2:45           ` Ye Xiaolong
  2019-03-25 10:42             ` Luca Boccassi
  0 siblings, 1 reply; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-25  2:45 UTC (permalink / raw)
  To: Luca Boccassi; +Cc: dev, Qi Zhang, Karlsson, Magnus

On 03/24, Luca Boccassi wrote:
>On Sun, 2019-03-17 at 11:34 +0800, Ye Xiaolong wrote:
>> On 03/02, Ye Xiaolong wrote:
>> > > >  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  +=
>> > > > -lrte_pmd_af_packet
>> > > > +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     +=
>> > > > -lrte_pmd_af_xdp
>> > > > -lelf -lbpf
>> > > 
>> > > Are symbols from libelf being used by the PMD?
>> > 
>> > Hmm, it is a leftover of RFC, libelf is no longer needed in this
>> > version, will
>> > remove it in next version.
>> > 
>> 
>> Correction, libelf is needed for libbpf, so we still need to keep
>> it. 
>
>If libbpf needs libelf for its internal usage, it should be linked
>against it itself. Unless symbols from libelf are used in static
>functions defined in libbpf's public headers. Is this the case?
>

Yes, that's the case. Without libelf, the build fails with the errors below,
and these symbols are used in static functions of libbpf.

/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so: undefined reference to `elf_nextscn'
/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so: undefined reference to `elf_rawdata'
/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so: undefined reference to `elf_memory'
/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so: undefined reference to `gelf_getrel'
/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so: undefined reference to `elf_strptr'
/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so: undefined reference to `elf_end'
/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so: undefined reference to `elf_getscn'
/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so: undefined reference to `elf_begin'
/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so: undefined reference to `gelf_getsym'
/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so: undefined reference to `elf_version'
/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so: undefined reference to `gelf_getehdr'
/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so: undefined reference to `elf_getdata'
/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so: undefined reference to `gelf_getshdr'
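
One way to check whether a given libbpf.so records libelf as a runtime
dependency is standard binutils (the result depends on how libbpf was
built):

	readelf -d /usr/lib64/libbpf.so | grep NEEDED

If libelf.so is absent from the NEEDED list, resolving the elf_* symbols is
left to the final application link, which matches the undefined references
above.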

Thanks,
Xiaolong
>-- 
>Kind regards,
>Luca Boccassi

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v5 0/5] Introduce AF_XDP PMD
  2019-03-01  8:09 [PATCH v1 0/6] Introduce AF_XDP PMD Xiaolong Ye
                   ` (9 preceding siblings ...)
  2019-03-22 13:01 ` [PATCH v4 0/5] Introduce AF_XDP PMD Xiaolong Ye
@ 2019-03-25  6:03 ` Xiaolong Ye
  2019-03-25  6:03   ` [PATCH v5 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
                     ` (4 more replies)
  2019-03-26 12:20 ` [PATCH v6 0/5] Introduce AF_XDP PMD Xiaolong Ye
                   ` (5 subsequent siblings)
  16 siblings, 5 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-25  6:03 UTC (permalink / raw)
  To: dev, David Marchand
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Stephen Hemminger, Ferruh Yigit, Luca Boccassi, Bruce Richardson,
	Ananyev Konstantin, Xiaolong Ye

Overview
========

This patchset adds a new PMD driver for AF_XDP which is a proposed
faster version of AF_PACKET interface in Linux, see below links [1] [2] for
details of AF_XDP introduction:

AF_XDP roadmap
==============
- AF_XDP has been included in the upstream kernel since 4.18, and AF_XDP
  support in libbpf was merged in v5.1-rc1.
- The i40e and ixgbe drivers now support zero copy mode.

Change logs
===========

V5:

- disable AF_XDP pmd by default due to it requires kernel more recent
  than minimum kernel version supported by DPDK
- address other review comments of Maxime

V4:

- change vdev name to net_af_xdp
- adopt dynamic log type for all logging
- Fix other style issues raised by Stephen

V3:

- Fix all style issues pointed by Stephen, Mattias, David.
- Drop the testpmd change as we'll adopt Jerin's suggestion to add a new
  mempool driver to handle the application use of AF_XDP's zero copy feature.

V2:
- Fix a NULL pointer reference crash issue
- Fix txonly stop sending traffic in zc mode
- Refactor rte_mbuf.c to avoid ABI breakage.
- Fix multiple style issues pointed by Ferruh, David, Stephen, Luca.

changes vs RFC sent by Qi last Aug:

- Re-work base on AF_XDP's interface changes since the new libbpf has
  provided higher-level APIs that hide many of the details of the AF_XDP
  uapi. After the rework, the driver is reduced by 300+ lines of code.

- multi-queue is not supported because the current libbpf doesn't support
  multiple sockets on a single umem.

- No extra steps to load an xdp program manually, since the current behavior
  of libbpf is to load a default xdp program when the user calls
  xsk_socket__create. The userspace application only needs to handle the
  cleanup.

How to try
==========

1. take the kernel >= v5.1-rc1, build kernel and replace your host
   kernel with it.
   
   make sure you turn on XDP sockets when compiling

   Networking support -->
        Networking options -->
                [ * ] XDP sockets
   
2. build libbpf in tools/lib/bpf, and copy the libbpf.a and libbpf.so to /usr/lib64

   cd tools/lib/bpf
   make

3. ethtool -L enp59s0f0 combined 1

4. extra step to build dpdk

   explicitly enable AF_XDP pmd by adding below line to
   config/common_linux

   CONFIG_RTE_LIBRTE_PMD_AF_XDP=y

5. start testpmd

   ./build/app/testpmd -c 0xc -n 4 --vdev net_af_xdp,iface=enp59s0f0,queue=0 -- -i --rxq=1 --txq=1

    in this case, the default xdp program will be loaded and linked to queue 0 of enp59s0f0;
    network traffic arriving at queue 0 will be redirected to the af_xdp socket.

Xiaolong Ye (5):
  net/af_xdp: introduce AF XDP PMD driver
  lib/mbuf: introduce helper to create mempool with flags
  lib/mempool: allow page size aligned mempool
  net/af_xdp: use mbuf mempool for buffer management
  net/af_xdp: enable zero copy

 MAINTAINERS                                   |    6 +
 config/common_base                            |    5 +
 doc/guides/nics/af_xdp.rst                    |   45 +
 doc/guides/nics/features/af_xdp.ini           |   11 +
 doc/guides/nics/index.rst                     |    1 +
 doc/guides/rel_notes/release_19_05.rst        |    7 +
 drivers/net/Makefile                          |    1 +
 drivers/net/af_xdp/Makefile                   |   32 +
 drivers/net/af_xdp/meson.build                |   21 +
 drivers/net/af_xdp/rte_eth_af_xdp.c           | 1020 +++++++++++++++++
 drivers/net/af_xdp/rte_pmd_af_xdp_version.map |    3 +
 drivers/net/meson.build                       |    1 +
 lib/librte_mbuf/rte_mbuf.c                    |   29 +-
 lib/librte_mbuf/rte_mbuf.h                    |   45 +
 lib/librte_mbuf/rte_mbuf_version.map          |    1 +
 lib/librte_mempool/rte_mempool.c              |    3 +
 lib/librte_mempool/rte_mempool.h              |    1 +
 mk/rte.app.mk                                 |    1 +
 18 files changed, 1228 insertions(+), 5 deletions(-)
 create mode 100644 doc/guides/nics/af_xdp.rst
 create mode 100644 doc/guides/nics/features/af_xdp.ini
 create mode 100644 drivers/net/af_xdp/Makefile
 create mode 100644 drivers/net/af_xdp/meson.build
 create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
 create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map

-- 
2.17.1

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v5 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-25  6:03 ` [PATCH v5 0/5] Introduce AF_XDP PMD Xiaolong Ye
@ 2019-03-25  6:03   ` Xiaolong Ye
  2019-03-25 15:58     ` Stephen Hemminger
  2019-03-25  6:03   ` [PATCH v5 2/5] lib/mbuf: introduce helper to create mempool with flags Xiaolong Ye
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-25  6:03 UTC (permalink / raw)
  To: dev, David Marchand
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Stephen Hemminger, Ferruh Yigit, Luca Boccassi, Bruce Richardson,
	Ananyev Konstantin, Xiaolong Ye

Add a new PMD driver for AF_XDP which is a proposed faster version of
AF_PACKET interface in Linux. More info about AF_XDP, please refer to [1]
[2].

This is the vanilla version PMD which just uses a raw buffer registered as
the umem.

[1] https://fosdem.org/2018/schedule/event/af_xdp/
[2] https://lwn.net/Articles/745934/

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 MAINTAINERS                                   |   6 +
 config/common_base                            |   5 +
 doc/guides/nics/af_xdp.rst                    |  45 +
 doc/guides/nics/features/af_xdp.ini           |  11 +
 doc/guides/nics/index.rst                     |   1 +
 doc/guides/rel_notes/release_19_05.rst        |   7 +
 drivers/net/Makefile                          |   1 +
 drivers/net/af_xdp/Makefile                   |  32 +
 drivers/net/af_xdp/meson.build                |  21 +
 drivers/net/af_xdp/rte_eth_af_xdp.c           | 931 ++++++++++++++++++
 drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   3 +
 drivers/net/meson.build                       |   1 +
 mk/rte.app.mk                                 |   1 +
 13 files changed, 1065 insertions(+)
 create mode 100644 doc/guides/nics/af_xdp.rst
 create mode 100644 doc/guides/nics/features/af_xdp.ini
 create mode 100644 drivers/net/af_xdp/Makefile
 create mode 100644 drivers/net/af_xdp/meson.build
 create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
 create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 452b8eb82..1cc54b439 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -468,6 +468,12 @@ M: John W. Linville <linville@tuxdriver.com>
 F: drivers/net/af_packet/
 F: doc/guides/nics/features/afpacket.ini
 
+Linux AF_XDP
+M: Xiaolong Ye <xiaolong.ye@intel.com>
+M: Qi Zhang <qi.z.zhang@intel.com>
+F: drivers/net/af_xdp/
+F: doc/guides/nics/af_xdp.rst
+
 Amazon ENA
 M: Marcin Wojtas <mw@semihalf.com>
 M: Michal Krawczyk <mk@semihalf.com>
diff --git a/config/common_base b/config/common_base
index 0b09a9348..4044de205 100644
--- a/config/common_base
+++ b/config/common_base
@@ -416,6 +416,11 @@ CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX_FREE=n
 #
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
 
+#
+# Compile software PMD backed by AF_XDP sockets (Linux only)
+#
+CONFIG_RTE_LIBRTE_PMD_AF_XDP=n
+
 #
 # Compile link bonding PMD library
 #
diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
new file mode 100644
index 000000000..dd5654dd1
--- /dev/null
+++ b/doc/guides/nics/af_xdp.rst
@@ -0,0 +1,45 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2018 Intel Corporation.
+
+AF_XDP Poll Mode Driver
+==========================
+
+AF_XDP is an address family that is optimized for high-performance
+packet processing. AF_XDP sockets enable an XDP program to
+redirect packets to a memory buffer in userspace.
+
+For the full details behind AF_XDP socket, you can refer to
+`AF_XDP documentation in the Kernel
+<https://www.kernel.org/doc/Documentation/networking/af_xdp.rst>`_.
+
+This Linux-specific PMD driver creates the AF_XDP socket and binds it to a
+specific netdev queue; it allows a DPDK application to send and receive raw
+packets through the socket, bypassing the kernel network stack.
+The current implementation only supports a single queue; multi-queue support
+will be added later.
+
+Options
+-------
+
+The following options can be provided to set up an af_xdp port in DPDK.
+
+*   ``iface`` - name of the Kernel interface to attach to (required);
+*   ``queue`` - netdev queue id (optional, default 0);
+
+Prerequisites
+-------------
+
+This is a Linux-specific PMD, thus the following prerequisites apply:
+
+*  A Linux Kernel (version > 4.18) with XDP sockets configuration enabled;
+*  libbpf (from kernel source tree, version > 5.1) with the latest AF_XDP support installed
+*  A Kernel bound interface to attach to.
+
+Set up an af_xdp interface
+-----------------------------
+
+The following example will set up an af_xdp interface in DPDK:
+
+.. code-block:: console
+
+    --vdev net_af_xdp,iface=ens786f1,queue=0
diff --git a/doc/guides/nics/features/af_xdp.ini b/doc/guides/nics/features/af_xdp.ini
new file mode 100644
index 000000000..36953c2de
--- /dev/null
+++ b/doc/guides/nics/features/af_xdp.ini
@@ -0,0 +1,11 @@
+;
+; Supported features of the 'af_xdp' network poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Link status          = Y
+MTU update           = Y
+Promiscuous mode     = Y
+Stats per queue      = Y
+x86-64               = Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 5c80e3baa..a4b80a3d0 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -12,6 +12,7 @@ Network Interface Controller Drivers
     features
     build_and_test
     af_packet
+    af_xdp
     ark
     atlantic
     avp
diff --git a/doc/guides/rel_notes/release_19_05.rst b/doc/guides/rel_notes/release_19_05.rst
index 61a2c7383..062facf89 100644
--- a/doc/guides/rel_notes/release_19_05.rst
+++ b/doc/guides/rel_notes/release_19_05.rst
@@ -65,6 +65,13 @@ New Features
     process.
   * Added support for Rx packet types list in a secondary process.
 
+* **Added the AF_XDP PMD.**
+
+  Added a Linux-specific PMD driver for AF_XDP. It creates an AF_XDP socket
+  and binds it to a specific netdev queue, allowing a DPDK application to send
+  and receive raw packets through the socket, bypassing the kernel
+  network stack to achieve high-performance packet processing.
+
 * **Updated Mellanox drivers.**
 
    New features and improvements were done in mlx4 and mlx5 PMDs:
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 502869a87..5d401b8c5 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -9,6 +9,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD),d)
 endif
 
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += af_packet
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += af_xdp
 DIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark
 DIRS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD) += atlantic
 DIRS-$(CONFIG_RTE_LIBRTE_AVP_PMD) += avp
diff --git a/drivers/net/af_xdp/Makefile b/drivers/net/af_xdp/Makefile
new file mode 100644
index 000000000..db7d9aa57
--- /dev/null
+++ b/drivers/net/af_xdp/Makefile
@@ -0,0 +1,32 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_af_xdp.a
+
+EXPORT_MAP := rte_pmd_af_xdp_version.map
+
+LIBABIVER := 1
+
+CFLAGS += -O3
+
+# require kernel version >= v5.1-rc1
+CFLAGS += -I$(RTE_KERNELDIR)/tools/include
+CFLAGS += -I$(RTE_KERNELDIR)/tools/lib/bpf
+
+CFLAGS += $(WERROR_FLAGS)
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
+LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
+LDLIBS += -lrte_bus_vdev
+LDLIBS += -lbpf
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += rte_eth_af_xdp.c
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
new file mode 100644
index 000000000..7e51169b4
--- /dev/null
+++ b/drivers/net/af_xdp/meson.build
@@ -0,0 +1,21 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+if host_machine.system() != 'linux'
+	build = false
+endif
+
+bpf_dep = dependency('libbpf', required: false)
+if bpf_dep.found()
+	build = true
+else
+	bpf_dep = cc.find_library('libbpf', required: false)
+	if bpf_dep.found() and cc.has_header('xsk.h', dependencies: bpf_dep) and cc.has_header('linux/if_xdp.h')
+		build = true
+		pkgconfig_extra_libs += '-lbpf'
+	else
+		build = false
+	endif
+endif
+sources = files('rte_eth_af_xdp.c')
+ext_deps += bpf_dep
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
new file mode 100644
index 000000000..9f6ecace6
--- /dev/null
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -0,0 +1,931 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation.
+ */
+
+#include <rte_mbuf.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+#include <rte_string_fns.h>
+
+#include <linux/if_ether.h>
+#include <linux/if_xdp.h>
+#include <linux/if_link.h>
+#include <asm/barrier.h>
+#include <arpa/inet.h>
+#include <net/if.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <unistd.h>
+#include <poll.h>
+#include <bpf/bpf.h>
+#include <xsk.h>
+
+#ifndef SOL_XDP
+#define SOL_XDP 283
+#endif
+
+#ifndef AF_XDP
+#define AF_XDP 44
+#endif
+
+#ifndef PF_XDP
+#define PF_XDP AF_XDP
+#endif
+
+static int af_xdp_logtype;
+
+#define AF_XDP_LOG(level, fmt, args...)			\
+	rte_log(RTE_LOG_ ## level, af_xdp_logtype,	\
+		"%s(): " fmt "\n", __func__, ##args)
+
+#define ETH_AF_XDP_IFACE_ARG			"iface"
+#define ETH_AF_XDP_QUEUE_IDX_ARG		"queue"
+
+#define ETH_AF_XDP_FRAME_SIZE		XSK_UMEM__DEFAULT_FRAME_SIZE
+#define ETH_AF_XDP_NUM_BUFFERS		4096
+#define ETH_AF_XDP_DATA_HEADROOM	0
+#define ETH_AF_XDP_DFLT_NUM_DESCS	XSK_RING_CONS__DEFAULT_NUM_DESCS
+#define ETH_AF_XDP_DFLT_QUEUE_IDX	0
+
+#define ETH_AF_XDP_RX_BATCH_SIZE	32
+#define ETH_AF_XDP_TX_BATCH_SIZE	32
+
+#define ETH_AF_XDP_MAX_QUEUE_PAIRS     16
+
+struct xsk_umem_info {
+	struct xsk_ring_prod fq;
+	struct xsk_ring_cons cq;
+	struct xsk_umem *umem;
+	struct rte_ring *buf_ring;
+	void *buffer;
+};
+
+struct rx_stats {
+	uint64_t rx_pkts;
+	uint64_t rx_bytes;
+	uint64_t rx_dropped;
+};
+
+struct pkt_rx_queue {
+	struct xsk_ring_cons rx;
+	struct xsk_umem_info *umem;
+	struct xsk_socket *xsk;
+	struct rte_mempool *mb_pool;
+
+	struct rx_stats stats;
+
+	struct pkt_tx_queue *pair;
+	uint16_t queue_idx;
+};
+
+struct tx_stats {
+	uint64_t tx_pkts;
+	uint64_t err_pkts;
+	uint64_t tx_bytes;
+};
+
+struct pkt_tx_queue {
+	struct xsk_ring_prod tx;
+
+	struct tx_stats stats;
+
+	struct pkt_rx_queue *pair;
+	uint16_t queue_idx;
+};
+
+struct pmd_internals {
+	int if_index;
+	char if_name[IFNAMSIZ];
+	uint16_t queue_idx;
+	struct ether_addr eth_addr;
+	struct xsk_umem_info *umem;
+	struct rte_mempool *mb_pool_share;
+
+	struct pkt_rx_queue rx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
+	struct pkt_tx_queue tx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
+};
+
+static const char * const valid_arguments[] = {
+	ETH_AF_XDP_IFACE_ARG,
+	ETH_AF_XDP_QUEUE_IDX_ARG,
+	NULL
+};
+
+static struct rte_eth_link pmd_link = {
+	.link_speed = ETH_SPEED_NUM_10G,
+	.link_duplex = ETH_LINK_FULL_DUPLEX,
+	.link_status = ETH_LINK_DOWN,
+	.link_autoneg = ETH_LINK_AUTONEG
+};
+
+static inline int
+reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
+{
+	struct xsk_ring_prod *fq = &umem->fq;
+	uint32_t idx;
+	int i, ret;
+
+	ret = xsk_ring_prod__reserve(fq, reserve_size, &idx);
+	if (unlikely(!ret)) {
+		AF_XDP_LOG(ERR, "Failed to reserve enough fq descs.\n");
+		return ret;
+	}
+
+	for (i = 0; i < reserve_size; i++) {
+		__u64 *fq_addr;
+		void *addr = NULL;
+		if (rte_ring_dequeue(umem->buf_ring, &addr)) {
+			i--;
+			break;
+		}
+		fq_addr = xsk_ring_prod__fill_addr(fq, idx++);
+		*fq_addr = (uint64_t)addr;
+	}
+
+	xsk_ring_prod__submit(fq, i);
+
+	return 0;
+}
+
+static uint16_t
+eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+	struct pkt_rx_queue *rxq = queue;
+	struct xsk_ring_cons *rx = &rxq->rx;
+	struct xsk_umem_info *umem = rxq->umem;
+	struct xsk_ring_prod *fq = &umem->fq;
+	uint32_t idx_rx;
+	uint32_t free_thresh = fq->size >> 1;
+	struct rte_mbuf *mbufs[ETH_AF_XDP_TX_BATCH_SIZE];
+	unsigned long dropped = 0;
+	unsigned long rx_bytes = 0;
+	uint16_t count = 0;
+	int rcvd, i;
+
+	nb_pkts = RTE_MIN(nb_pkts, ETH_AF_XDP_TX_BATCH_SIZE);
+
+	rcvd = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
+	if (rcvd == 0)
+		return 0;
+
+	if (xsk_prod_nb_free(fq, free_thresh) >= free_thresh)
+		(void)reserve_fill_queue(umem, ETH_AF_XDP_RX_BATCH_SIZE);
+
+	if (unlikely(rte_pktmbuf_alloc_bulk(rxq->mb_pool, mbufs, rcvd) != 0))
+		return 0;
+
+	for (i = 0; i < rcvd; i++) {
+		const struct xdp_desc *desc;
+		uint64_t addr;
+		uint32_t len;
+		void *pkt;
+
+		desc = xsk_ring_cons__rx_desc(rx, idx_rx++);
+		addr = desc->addr;
+		len = desc->len;
+		pkt = xsk_umem__get_data(rxq->umem->buffer, addr);
+
+		rte_memcpy(rte_pktmbuf_mtod(mbufs[i], void *), pkt, len);
+		rte_pktmbuf_pkt_len(mbufs[i]) = len;
+		rte_pktmbuf_data_len(mbufs[i]) = len;
+		rx_bytes += len;
+		bufs[count++] = mbufs[i];
+
+		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+	}
+
+	xsk_ring_cons__release(rx, rcvd);
+
+	/* statistics */
+	rxq->stats.rx_pkts += (rcvd - dropped);
+	rxq->stats.rx_bytes += rx_bytes;
+
+	return count;
+}
+
+static void pull_umem_cq(struct xsk_umem_info *umem, int size)
+{
+	struct xsk_ring_cons *cq = &umem->cq;
+	size_t i, n;
+	uint32_t idx_cq;
+
+	n = xsk_ring_cons__peek(cq, size, &idx_cq);
+
+	for (i = 0; i < n; i++) {
+		uint64_t addr;
+		addr = *xsk_ring_cons__comp_addr(cq, idx_cq++);
+		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+	}
+
+	xsk_ring_cons__release(cq, n);
+}
+
+static void kick_tx(struct pkt_tx_queue *txq)
+{
+	struct xsk_umem_info *umem = txq->pair->umem;
+
+	while (send(xsk_socket__fd(txq->pair->xsk), NULL,
+		      0, MSG_DONTWAIT) < 0) {
+		/* something unexpected */
+		if (errno != EBUSY && errno != EAGAIN && errno != EINTR)
+			break;
+
+		/* pull from the completion queue to leave more space */
+		if (errno == EAGAIN)
+			pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
+	}
+	pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
+}
+
+static uint16_t
+eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+	struct pkt_tx_queue *txq = queue;
+	struct xsk_umem_info *umem = txq->pair->umem;
+	struct rte_mbuf *mbuf;
+	void *addrs[ETH_AF_XDP_TX_BATCH_SIZE];
+	unsigned long tx_bytes = 0;
+	int i, valid = 0;
+	uint32_t idx_tx;
+
+	nb_pkts = RTE_MIN(nb_pkts, ETH_AF_XDP_TX_BATCH_SIZE);
+
+	pull_umem_cq(umem, nb_pkts);
+
+	nb_pkts = rte_ring_dequeue_bulk(umem->buf_ring, addrs,
+					nb_pkts, NULL);
+	if (nb_pkts == 0)
+		return 0;
+
+	if (xsk_ring_prod__reserve(&txq->tx, nb_pkts, &idx_tx) != nb_pkts) {
+		kick_tx(txq);
+		return 0;
+	}
+
+	for (i = 0; i < nb_pkts; i++) {
+		struct xdp_desc *desc;
+		void *pkt;
+		uint32_t buf_len = ETH_AF_XDP_FRAME_SIZE
+					- ETH_AF_XDP_DATA_HEADROOM;
+		desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i);
+		mbuf = bufs[i];
+		if (mbuf->pkt_len <= buf_len) {
+			desc->addr = (uint64_t)addrs[valid];
+			desc->len = mbuf->pkt_len;
+			pkt = xsk_umem__get_data(umem->buffer,
+						 desc->addr);
+			rte_memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
+			       desc->len);
+			valid++;
+			tx_bytes += mbuf->pkt_len;
+		}
+		rte_pktmbuf_free(mbuf);
+	}
+
+	xsk_ring_prod__submit(&txq->tx, nb_pkts);
+
+	kick_tx(txq);
+
+	if (valid < nb_pkts)
+		rte_ring_enqueue_bulk(umem->buf_ring, &addrs[valid],
+				 nb_pkts - valid, NULL);
+
+	txq->stats.err_pkts += nb_pkts - valid;
+	txq->stats.tx_pkts += valid;
+	txq->stats.tx_bytes += tx_bytes;
+
+	return nb_pkts;
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = ETH_LINK_UP;
+
+	return 0;
+}
+
+/* This function gets called when the current port gets stopped. */
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = ETH_LINK_DOWN;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev)
+{
+	/* rx/tx must be paired */
+	if (dev->data->nb_rx_queues != dev->data->nb_tx_queues)
+		return -EINVAL;
+
+	return 0;
+}
+
+static void
+eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	dev_info->if_index = internals->if_index;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_pktlen = ETH_FRAME_LEN;
+	dev_info->max_rx_queues = 1;
+	dev_info->max_tx_queues = 1;
+
+	dev_info->default_rxportconf.nb_queues = 1;
+	dev_info->default_txportconf.nb_queues = 1;
+	dev_info->default_rxportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
+	dev_info->default_txportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
+}
+
+static int
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct xdp_statistics xdp_stats;
+	struct pkt_rx_queue *rxq;
+	socklen_t optlen;
+	int i, ret;
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		optlen = sizeof(struct xdp_statistics);
+		rxq = &internals->rx_queues[i];
+		stats->q_ipackets[i] = internals->rx_queues[i].stats.rx_pkts;
+		stats->q_ibytes[i] = internals->rx_queues[i].stats.rx_bytes;
+
+		stats->q_opackets[i] = internals->tx_queues[i].stats.tx_pkts;
+		stats->q_obytes[i] = internals->tx_queues[i].stats.tx_bytes;
+
+		stats->ipackets += stats->q_ipackets[i];
+		stats->ibytes += stats->q_ibytes[i];
+		stats->imissed += internals->rx_queues[i].stats.rx_dropped;
+		ret = getsockopt(xsk_socket__fd(rxq->xsk), SOL_XDP,
+				XDP_STATISTICS, &xdp_stats, &optlen);
+		if (ret != 0) {
+			AF_XDP_LOG(ERR, "getsockopt() failed for XDP_STATISTICS.\n");
+			return -1;
+		}
+		stats->imissed += xdp_stats.rx_dropped;
+
+		stats->opackets += stats->q_opackets[i];
+		stats->oerrors += internals->tx_queues[i].stats.err_pkts;
+		stats->obytes += stats->q_obytes[i];
+	}
+
+	return 0;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	int i;
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		memset(&internals->rx_queues[i].stats, 0,
+					sizeof(struct rx_stats));
+		memset(&internals->tx_queues[i].stats, 0,
+					sizeof(struct tx_stats));
+	}
+}
+
+static void remove_xdp_program(struct pmd_internals *internals)
+{
+	uint32_t curr_prog_id = 0;
+
+	if (bpf_get_link_xdp_id(internals->if_index, &curr_prog_id,
+				XDP_FLAGS_UPDATE_IF_NOEXIST)) {
+		AF_XDP_LOG(ERR, "bpf_get_link_xdp_id failed\n");
+		return;
+	}
+	bpf_set_link_xdp_fd(internals->if_index, -1,
+			XDP_FLAGS_UPDATE_IF_NOEXIST);
+}
+
+static void
+eth_dev_close(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct pkt_rx_queue *rxq;
+	int i;
+
+	AF_XDP_LOG(INFO, "Closing AF_XDP ethdev on numa socket %u\n",
+		rte_socket_id());
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		rxq = &internals->rx_queues[i];
+		if (rxq->umem == NULL)
+			break;
+		xsk_socket__delete(rxq->xsk);
+	}
+
+	(void)xsk_umem__delete(internals->umem->umem);
+	remove_xdp_program(internals);
+}
+
+static void
+eth_queue_release(void *q __rte_unused)
+{
+}
+
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+		int wait_to_complete __rte_unused)
+{
+	return 0;
+}
+
+static void xdp_umem_destroy(struct xsk_umem_info *umem)
+{
+	free(umem->buffer);
+	umem->buffer = NULL;
+
+	rte_ring_free(umem->buf_ring);
+	umem->buf_ring = NULL;
+
+	rte_free(umem);
+	umem = NULL;
+}
+
+static struct xsk_umem_info *xdp_umem_configure(void)
+{
+	struct xsk_umem_info *umem;
+	struct xsk_umem_config usr_config = {
+		.fill_size = ETH_AF_XDP_DFLT_NUM_DESCS,
+		.comp_size = ETH_AF_XDP_DFLT_NUM_DESCS,
+		.frame_size = ETH_AF_XDP_FRAME_SIZE,
+		.frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
+	void *bufs = NULL;
+	int ret;
+	uint64_t i;
+
+	umem = rte_zmalloc_socket("umem", sizeof(*umem), 0, rte_socket_id());
+	if (umem == NULL) {
+		AF_XDP_LOG(ERR, "Failed to allocate umem info");
+		return NULL;
+	}
+
+	umem->buf_ring = rte_ring_create("af_xdp_ring",
+					 ETH_AF_XDP_NUM_BUFFERS,
+					 SOCKET_ID_ANY,
+					 0x0);
+	if (umem->buf_ring == NULL) {
+		AF_XDP_LOG(ERR, "Failed to create rte_ring\n");
+		goto err;
+	}
+
+	for (i = 0; i < ETH_AF_XDP_NUM_BUFFERS; i++)
+		rte_ring_enqueue(umem->buf_ring,
+				 (void *)(i * ETH_AF_XDP_FRAME_SIZE +
+					  ETH_AF_XDP_DATA_HEADROOM));
+
+	if (posix_memalign(&bufs, getpagesize(),
+			   ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE)) {
+		AF_XDP_LOG(ERR, "Failed to allocate memory pool.\n");
+		goto err;
+	}
+	ret = xsk_umem__create(&umem->umem, bufs,
+			       ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
+			       &umem->fq, &umem->cq,
+			       &usr_config);
+
+	if (ret) {
+		AF_XDP_LOG(ERR, "Failed to create umem");
+		goto err;
+	}
+	umem->buffer = bufs;
+
+	return umem;
+
+err:
+	xdp_umem_destroy(umem);
+	return NULL;
+}
+
+static int
+xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
+	      int ring_size)
+{
+	struct xsk_socket_config cfg;
+	struct pkt_tx_queue *txq = rxq->pair;
+	int ret = 0;
+	int reserve_size;
+
+	rxq->umem = xdp_umem_configure();
+	if (rxq->umem == NULL)
+		return -ENOMEM;
+
+	cfg.rx_size = ring_size;
+	cfg.tx_size = ring_size;
+	cfg.libbpf_flags = 0;
+	cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
+	cfg.bind_flags = 0;
+	ret = xsk_socket__create(&rxq->xsk, internals->if_name,
+			internals->queue_idx, rxq->umem->umem, &rxq->rx,
+			&txq->tx, &cfg);
+	if (ret) {
+		AF_XDP_LOG(ERR, "Failed to create xsk socket.\n");
+		goto err;
+	}
+
+	reserve_size = ETH_AF_XDP_DFLT_NUM_DESCS / 2;
+	ret = reserve_fill_queue(rxq->umem, reserve_size);
+	if (ret) {
+		xsk_socket__delete(rxq->xsk);
+		AF_XDP_LOG(ERR, "Failed to reserve fill queue.\n");
+		goto err;
+	}
+
+	return 0;
+
+err:
+	xdp_umem_destroy(rxq->umem);
+
+	return ret;
+}
+
+static void
+queue_reset(struct pmd_internals *internals, uint16_t queue_idx)
+{
+	struct pkt_rx_queue *rxq = &internals->rx_queues[queue_idx];
+	struct pkt_tx_queue *txq = rxq->pair;
+
+	memset(rxq, 0, sizeof(*rxq));
+	memset(txq, 0, sizeof(*txq));
+	rxq->pair = txq;
+	txq->pair = rxq;
+	rxq->queue_idx = queue_idx;
+	txq->queue_idx = queue_idx;
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev,
+		   uint16_t rx_queue_id,
+		   uint16_t nb_rx_desc,
+		   unsigned int socket_id __rte_unused,
+		   const struct rte_eth_rxconf *rx_conf __rte_unused,
+		   struct rte_mempool *mb_pool)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	uint32_t buf_size, data_size;
+	struct pkt_rx_queue *rxq;
+	int ret;
+
+	rxq = &internals->rx_queues[rx_queue_id];
+	queue_reset(internals, rx_queue_id);
+
+	/* Now get the space available for data in the mbuf */
+	buf_size = rte_pktmbuf_data_room_size(mb_pool) -
+		RTE_PKTMBUF_HEADROOM;
+	data_size = ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_DATA_HEADROOM;
+
+	if (data_size > buf_size) {
+		AF_XDP_LOG(ERR, "%s: %d bytes will not fit in mbuf (%d bytes)\n",
+			dev->device->name, data_size, buf_size);
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	rxq->mb_pool = mb_pool;
+
+	if (xsk_configure(internals, rxq, nb_rx_desc)) {
+		AF_XDP_LOG(ERR, "Failed to configure xdp socket\n");
+		ret = -EINVAL;
+		goto err;
+	}
+
+	internals->umem = rxq->umem;
+
+	dev->data->rx_queues[rx_queue_id] = rxq;
+	return 0;
+
+err:
+	queue_reset(internals, rx_queue_id);
+	return ret;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev,
+		   uint16_t tx_queue_id,
+		   uint16_t nb_tx_desc __rte_unused,
+		   unsigned int socket_id __rte_unused,
+		   const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct pkt_tx_queue *txq;
+
+	txq = &internals->tx_queues[tx_queue_id];
+
+	dev->data->tx_queues[tx_queue_id] = txq;
+	return 0;
+}
+
+static int
+eth_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct ifreq ifr = { .ifr_mtu = mtu };
+	int ret;
+	int s;
+
+	s = socket(PF_INET, SOCK_DGRAM, 0);
+	if (s < 0)
+		return -EINVAL;
+
+	strlcpy(ifr.ifr_name, internals->if_name, IFNAMSIZ);
+	ret = ioctl(s, SIOCSIFMTU, &ifr);
+	close(s);
+
+	return (ret < 0) ? -errno : 0;
+}
+
+static void
+eth_dev_change_flags(char *if_name, uint32_t flags, uint32_t mask)
+{
+	struct ifreq ifr;
+	int s;
+
+	s = socket(PF_INET, SOCK_DGRAM, 0);
+	if (s < 0)
+		return;
+
+	strlcpy(ifr.ifr_name, if_name, IFNAMSIZ);
+	if (ioctl(s, SIOCGIFFLAGS, &ifr) < 0)
+		goto out;
+	ifr.ifr_flags &= mask;
+	ifr.ifr_flags |= flags;
+	if (ioctl(s, SIOCSIFFLAGS, &ifr) < 0)
+		goto out;
+out:
+	close(s);
+}
+
+static void
+eth_dev_promiscuous_enable(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	eth_dev_change_flags(internals->if_name, IFF_PROMISC, ~0);
+}
+
+static void
+eth_dev_promiscuous_disable(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	eth_dev_change_flags(internals->if_name, 0, ~IFF_PROMISC);
+}
+
+static const struct eth_dev_ops ops = {
+	.dev_start = eth_dev_start,
+	.dev_stop = eth_dev_stop,
+	.dev_close = eth_dev_close,
+	.dev_configure = eth_dev_configure,
+	.dev_infos_get = eth_dev_info,
+	.mtu_set = eth_dev_mtu_set,
+	.promiscuous_enable = eth_dev_promiscuous_enable,
+	.promiscuous_disable = eth_dev_promiscuous_disable,
+	.rx_queue_setup = eth_rx_queue_setup,
+	.tx_queue_setup = eth_tx_queue_setup,
+	.rx_queue_release = eth_queue_release,
+	.tx_queue_release = eth_queue_release,
+	.link_update = eth_link_update,
+	.stats_get = eth_stats_get,
+	.stats_reset = eth_stats_reset,
+};
+
+/** parse integer from integer argument */
+static int
+parse_integer_arg(const char *key __rte_unused,
+		  const char *value, void *extra_args)
+{
+	int *i = (int *)extra_args;
+	char *end;
+
+	*i = strtol(value, &end, 10);
+	if (*i < 0) {
+		AF_XDP_LOG(ERR, "Argument has to be positive.\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/** parse name argument */
+static int
+parse_name_arg(const char *key __rte_unused,
+	       const char *value, void *extra_args)
+{
+	char *name = extra_args;
+
+	if (strnlen(value, IFNAMSIZ) > IFNAMSIZ - 1) {
+		AF_XDP_LOG(ERR, "Invalid name %s, should be less than %u bytes.\n",
+			   value, IFNAMSIZ);
+		return -EINVAL;
+	}
+
+	strlcpy(name, value, IFNAMSIZ);
+
+	return 0;
+}
+
+static int
+parse_parameters(struct rte_kvargs *kvlist,
+		 char *if_name,
+		 int *queue_idx)
+{
+	int ret;
+
+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_IFACE_ARG,
+				 &parse_name_arg, if_name);
+	if (ret < 0)
+		goto free_kvlist;
+
+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_QUEUE_IDX_ARG,
+				 &parse_integer_arg, queue_idx);
+	if (ret < 0)
+		goto free_kvlist;
+
+free_kvlist:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
+static int
+get_iface_info(const char *if_name,
+	       struct ether_addr *eth_addr,
+	       int *if_index)
+{
+	struct ifreq ifr;
+	int sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_IP);
+
+	if (sock < 0)
+		return -1;
+
+	strlcpy(ifr.ifr_name, if_name, IFNAMSIZ);
+	if (ioctl(sock, SIOCGIFINDEX, &ifr))
+		goto error;
+
+	*if_index = ifr.ifr_ifindex;
+
+	if (ioctl(sock, SIOCGIFHWADDR, &ifr))
+		goto error;
+
+	rte_memcpy(eth_addr, ifr.ifr_hwaddr.sa_data, ETHER_ADDR_LEN);
+
+	close(sock);
+	return 0;
+
+error:
+	close(sock);
+	return -1;
+}
+
+static struct rte_eth_dev *
+init_internals(struct rte_vdev_device *dev,
+	       const char *if_name,
+	       int queue_idx)
+{
+	const char *name = rte_vdev_device_name(dev);
+	const unsigned int numa_node = dev->device.numa_node;
+	struct pmd_internals *internals;
+	struct rte_eth_dev *eth_dev;
+	int ret;
+	int i;
+
+	internals = rte_zmalloc_socket(name, sizeof(*internals), 0, numa_node);
+	if (internals == NULL)
+		return NULL;
+
+	internals->queue_idx = queue_idx;
+	strlcpy(internals->if_name, if_name, IFNAMSIZ);
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		internals->tx_queues[i].pair = &internals->rx_queues[i];
+		internals->rx_queues[i].pair = &internals->tx_queues[i];
+	}
+
+	ret = get_iface_info(if_name, &internals->eth_addr,
+			     &internals->if_index);
+	if (ret)
+		goto err;
+
+	eth_dev = rte_eth_vdev_allocate(dev, 0);
+	if (eth_dev == NULL)
+		goto err;
+
+	eth_dev->data->dev_private = internals;
+	eth_dev->data->dev_link = pmd_link;
+	eth_dev->data->mac_addrs = &internals->eth_addr;
+	eth_dev->dev_ops = &ops;
+	eth_dev->rx_pkt_burst = eth_af_xdp_rx;
+	eth_dev->tx_pkt_burst = eth_af_xdp_tx;
+
+	return eth_dev;
+
+err:
+	rte_free(internals);
+	return NULL;
+}
+
+static int
+rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
+{
+	struct rte_kvargs *kvlist;
+	char if_name[IFNAMSIZ];
+	int xsk_queue_idx = ETH_AF_XDP_DFLT_QUEUE_IDX;
+	struct rte_eth_dev *eth_dev = NULL;
+	const char *name;
+
+	AF_XDP_LOG(INFO, "Initializing pmd_af_xdp for %s\n",
+		rte_vdev_device_name(dev));
+
+	name = rte_vdev_device_name(dev);
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY &&
+		strlen(rte_vdev_device_args(dev)) == 0) {
+		eth_dev = rte_eth_dev_attach_secondary(name);
+		if (eth_dev == NULL) {
+			AF_XDP_LOG(ERR, "Failed to probe %s\n", name);
+			return -EINVAL;
+		}
+		eth_dev->dev_ops = &ops;
+		rte_eth_dev_probing_finish(eth_dev);
+		return 0;
+	}
+
+	kvlist = rte_kvargs_parse(rte_vdev_device_args(dev), valid_arguments);
+	if (kvlist == NULL) {
+		AF_XDP_LOG(ERR, "Invalid kvargs key\n");
+		return -EINVAL;
+	}
+
+	if (dev->device.numa_node == SOCKET_ID_ANY)
+		dev->device.numa_node = rte_socket_id();
+
+	if (parse_parameters(kvlist, if_name, &xsk_queue_idx) < 0) {
+		AF_XDP_LOG(ERR, "Invalid kvargs value\n");
+		return -EINVAL;
+	}
+
+	eth_dev = init_internals(dev, if_name, xsk_queue_idx);
+	if (eth_dev == NULL) {
+		AF_XDP_LOG(ERR, "Failed to init internals\n");
+		return -1;
+	}
+
+	rte_eth_dev_probing_finish(eth_dev);
+
+	return 0;
+}
+
+static int
+rte_pmd_af_xdp_remove(struct rte_vdev_device *dev)
+{
+	struct rte_eth_dev *eth_dev = NULL;
+	struct pmd_internals *internals;
+
+	AF_XDP_LOG(INFO, "Removing AF_XDP ethdev on numa socket %u\n",
+		rte_socket_id());
+
+	if (dev == NULL)
+		return -1;
+
+	/* find the ethdev entry */
+	eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(dev));
+	if (eth_dev == NULL)
+		return -1;
+
+	internals = eth_dev->data->dev_private;
+
+	rte_ring_free(internals->umem->buf_ring);
+	rte_free(internals->umem->buffer);
+	rte_free(internals->umem);
+
+	rte_eth_dev_release_port(eth_dev);
+
+
+	return 0;
+}
+
+static struct rte_vdev_driver pmd_af_xdp_drv = {
+	.probe = rte_pmd_af_xdp_probe,
+	.remove = rte_pmd_af_xdp_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(net_af_xdp, pmd_af_xdp_drv);
+RTE_PMD_REGISTER_PARAM_STRING(net_af_xdp,
+			      "iface=<string> "
+			      "queue=<int> ");
+
+RTE_INIT(af_xdp_init_log)
+{
+	af_xdp_logtype = rte_log_register("pmd.net.af_xdp");
+	if (af_xdp_logtype >= 0)
+		rte_log_set_level(af_xdp_logtype, RTE_LOG_NOTICE);
+}
diff --git a/drivers/net/af_xdp/rte_pmd_af_xdp_version.map b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
new file mode 100644
index 000000000..c6db030fe
--- /dev/null
+++ b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
@@ -0,0 +1,3 @@
+DPDK_19.05 {
+	local: *;
+};
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index 3ecc78cee..1105e72d8 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2017 Intel Corporation
 
 drivers = ['af_packet',
+	'af_xdp',
 	'ark',
 	'atlantic',
 	'avp',
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 262132fc6..be0af73cc 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -143,6 +143,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_DPAA2_MEMPOOL)  += -lrte_mempool_dpaa2
 endif
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     += -lrte_pmd_af_xdp -lelf -lbpf
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ARK_PMD)        += -lrte_pmd_ark
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD)   += -lrte_pmd_atlantic
 _LDLIBS-$(CONFIG_RTE_LIBRTE_AVP_PMD)        += -lrte_pmd_avp
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* [PATCH v5 2/5] lib/mbuf: introduce helper to create mempool with flags
  2019-03-25  6:03 ` [PATCH v5 0/5] Introduce AF_XDP PMD Xiaolong Ye
  2019-03-25  6:03   ` [PATCH v5 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
@ 2019-03-25  6:03   ` Xiaolong Ye
  2019-03-25  6:03   ` [PATCH v5 3/5] lib/mempool: allow page size aligned mempool Xiaolong Ye
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-25  6:03 UTC (permalink / raw)
  To: dev, David Marchand
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Stephen Hemminger, Ferruh Yigit, Luca Boccassi, Bruce Richardson,
	Ananyev Konstantin, Xiaolong Ye

This allows applications to create an mbuf mempool with specific flags,
such as MEMPOOL_F_NO_SPREAD, if they want fixed-size memory objects.
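
A minimal usage sketch of the new helper (pool name and sizes are
illustrative, not part of this patch):

	#include <rte_mbuf.h>
	#include <rte_mempool.h>

	struct rte_mempool *mp;

	/* fixed-size objects: disable per-channel spreading */
	mp = rte_pktmbuf_pool_create_with_flags("fixed_pool", 4096, 250,
			0, RTE_MBUF_DEFAULT_BUF_SIZE,
			MEMPOOL_F_NO_SPREAD, SOCKET_ID_ANY);
	if (mp == NULL)
		return -1; /* rte_errno holds the failure reason */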

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 lib/librte_mbuf/rte_mbuf.c           | 29 ++++++++++++++----
 lib/librte_mbuf/rte_mbuf.h           | 45 ++++++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf_version.map |  1 +
 3 files changed, 70 insertions(+), 5 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 21f6f7404..c1db9e298 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -106,11 +106,10 @@ rte_pktmbuf_init(struct rte_mempool *mp,
 	m->next = NULL;
 }
 
-/* Helper to create a mbuf pool with given mempool ops name*/
-struct rte_mempool *
-rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
+static struct rte_mempool *
+rte_pktmbuf_pool_create_by_ops_with_flags(const char *name, unsigned int n,
 	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
-	int socket_id, const char *ops_name)
+	unsigned int flags, int socket_id, const char *ops_name)
 {
 	struct rte_mempool *mp;
 	struct rte_pktmbuf_pool_private mbp_priv;
@@ -130,7 +129,7 @@ rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
 	mbp_priv.mbuf_priv_size = priv_size;
 
 	mp = rte_mempool_create_empty(name, n, elt_size, cache_size,
-		 sizeof(struct rte_pktmbuf_pool_private), socket_id, 0);
+		 sizeof(struct rte_pktmbuf_pool_private), socket_id, flags);
 	if (mp == NULL)
 		return NULL;
 
@@ -157,6 +156,16 @@ rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
 	return mp;
 }
 
+/* Helper to create a mbuf pool with given mempool ops name*/
+struct rte_mempool *
+rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
+	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
+	int socket_id, const char *ops_name)
+{
+	return rte_pktmbuf_pool_create_by_ops_with_flags(name, n, cache_size,
+			 priv_size, data_room_size, 0, socket_id, ops_name);
+}
+
 /* helper to create a mbuf pool */
 struct rte_mempool *
 rte_pktmbuf_pool_create(const char *name, unsigned int n,
@@ -167,6 +176,16 @@ rte_pktmbuf_pool_create(const char *name, unsigned int n,
 			data_room_size, socket_id, NULL);
 }
 
+/* helper to create a mbuf pool with flags (e.g. NO_SPREAD) */
+struct rte_mempool * __rte_experimental
+rte_pktmbuf_pool_create_with_flags(const char *name, unsigned int n,
+	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
+	unsigned int flags, int socket_id)
+{
+	return rte_pktmbuf_pool_create_by_ops_with_flags(name, n, cache_size,
+			priv_size, data_room_size, flags, socket_id, NULL);
+}
+
 /* do some sanity checks on a mbuf: panic if it fails */
 void
 rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header)
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index d961ccaf6..105ead6de 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1266,6 +1266,51 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
 	unsigned cache_size, uint16_t priv_size, uint16_t data_room_size,
 	int socket_id);
 
+/**
+ * Create a mbuf pool with flags.
+ *
+ * This function creates and initializes a packet mbuf pool. It is
+ * a wrapper to rte_mempool functions.
+ *
+ * @warning
+ * @b EXPERIMENTAL: This API may change without prior notice.
+ *
+ * @param name
+ *   The name of the mbuf pool.
+ * @param n
+ *   The number of elements in the mbuf pool. The optimum size (in terms
+ *   of memory usage) for a mempool is when n is a power of two minus one:
+ *   n = (2^q - 1).
+ * @param cache_size
+ *   Size of the per-core object cache. See rte_mempool_create() for
+ *   details.
+ * @param priv_size
+ *   Size of application private area between the rte_mbuf structure
+ *   and the data buffer. This value must be aligned to RTE_MBUF_PRIV_ALIGN.
+ * @param data_room_size
+ *   Size of data buffer in each mbuf, including RTE_PKTMBUF_HEADROOM.
+ * @param flags
+ *   Flags controlling the behavior of the mempool. See
+ *   rte_mempool_create() for details.
+ * @param socket_id
+ *   The socket identifier where the memory should be allocated. The
+ *   value can be *SOCKET_ID_ANY* if there is no NUMA constraint for the
+ *   reserved zone.
+ * @return
+ *   The pointer to the new allocated mempool, on success. NULL on error
+ *   with rte_errno set appropriately. Possible rte_errno values include:
+ *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
+ *    - E_RTE_SECONDARY - function was called from a secondary process instance
+ *    - EINVAL - cache size provided is too large, or priv_size is not aligned.
+ *    - ENOSPC - the maximum number of memzones has already been allocated
+ *    - EEXIST - a memzone with the same name already exists
+ *    - ENOMEM - no appropriate memory area found in which to create memzone
+ */
+struct rte_mempool * __rte_experimental
+rte_pktmbuf_pool_create_with_flags(const char *name, unsigned int n,
+	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
+	unsigned int flags, int socket_id);
+
 /**
  * Create a mbuf pool with a given mempool ops name
  *
diff --git a/lib/librte_mbuf/rte_mbuf_version.map b/lib/librte_mbuf/rte_mbuf_version.map
index 2662a37bf..2579538e0 100644
--- a/lib/librte_mbuf/rte_mbuf_version.map
+++ b/lib/librte_mbuf/rte_mbuf_version.map
@@ -50,4 +50,5 @@ EXPERIMENTAL {
 	global:
 
 	rte_mbuf_check;
+	rte_pktmbuf_pool_create_with_flags;
 } DPDK_18.08;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* [PATCH v5 3/5] lib/mempool: allow page size aligned mempool
  2019-03-25  6:03 ` [PATCH v5 0/5] Introduce AF_XDP PMD Xiaolong Ye
  2019-03-25  6:03   ` [PATCH v5 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
  2019-03-25  6:03   ` [PATCH v5 2/5] lib/mbuf: introduce helper to create mempool with flags Xiaolong Ye
@ 2019-03-25  6:03   ` Xiaolong Ye
  2019-03-25  9:04     ` Andrew Rybchenko
  2019-03-25  6:03   ` [PATCH v5 4/5] net/af_xdp: use mbuf mempool for buffer management Xiaolong Ye
  2019-03-25  6:04   ` [PATCH v5 5/5] net/af_xdp: enable zero copy Xiaolong Ye
  4 siblings, 1 reply; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-25  6:03 UTC (permalink / raw)
  To: dev, David Marchand
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Stephen Hemminger, Ferruh Yigit, Luca Boccassi, Bruce Richardson,
	Ananyev Konstantin, Xiaolong Ye

Allow creating a mempool with a page size aligned base address.
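
A sketch of the intended effect, using the helper from the previous
patch of this series (names and sizes are illustrative):

	#include <unistd.h>	/* getpagesize() */

	struct rte_mempool *mp;
	struct rte_mempool_memhdr *memhdr;

	mp = rte_pktmbuf_pool_create_with_flags("aligned_pool", 4096, 250,
			0, RTE_MBUF_DEFAULT_BUF_SIZE,
			MEMPOOL_F_PAGE_ALIGN, SOCKET_ID_ANY);
	if (mp != NULL) {
		/* the chunk's base address is now page aligned */
		memhdr = STAILQ_FIRST(&mp->mem_list);
		RTE_ASSERT(((uintptr_t)memhdr->addr % getpagesize()) == 0);
	}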

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 lib/librte_mempool/rte_mempool.c | 3 +++
 lib/librte_mempool/rte_mempool.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 683b216f9..171ba1057 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -543,6 +543,9 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 		if (try_contig)
 			flags |= RTE_MEMZONE_IOVA_CONTIG;
 
+		if (mp->flags & MEMPOOL_F_PAGE_ALIGN)
+			align = RTE_MAX(align, (size_t)getpagesize());
+
 		mz = rte_memzone_reserve_aligned(mz_name, mem_size,
 				mp->socket_id, flags, align);
 
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 7c9cd9a2f..75553b36f 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -264,6 +264,7 @@ struct rte_mempool {
 #define MEMPOOL_F_POOL_CREATED   0x0010 /**< Internal: pool is created. */
 #define MEMPOOL_F_NO_IOVA_CONTIG 0x0020 /**< Don't need IOVA contiguous objs. */
 #define MEMPOOL_F_NO_PHYS_CONTIG MEMPOOL_F_NO_IOVA_CONTIG /* deprecated */
+#define MEMPOOL_F_PAGE_ALIGN     0x0040 /**< Chunk's base address is page aligned */
 
 /**
  * @internal When debug is enabled, store some statistics.
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* [PATCH v5 4/5] net/af_xdp: use mbuf mempool for buffer management
  2019-03-25  6:03 ` [PATCH v5 0/5] Introduce AF_XDP PMD Xiaolong Ye
                     ` (2 preceding siblings ...)
  2019-03-25  6:03   ` [PATCH v5 3/5] lib/mempool: allow page size aligned mempool Xiaolong Ye
@ 2019-03-25  6:03   ` Xiaolong Ye
  2019-03-25  6:04   ` [PATCH v5 5/5] net/af_xdp: enable zero copy Xiaolong Ye
  4 siblings, 0 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-25  6:03 UTC (permalink / raw)
  To: dev, David Marchand
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Stephen Hemminger, Ferruh Yigit, Luca Boccassi, Bruce Richardson,
	Ananyev Konstantin, Xiaolong Ye

The af_xdp registered memory buffer is now managed by rte_mempool. An
mbuf allocated from the rte_mempool can be converted to an xdp_desc
address and vice versa.
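
For reference, a sketch of the per-frame layout the two conversion
helpers below assume (offsets follow the ETH_AF_XDP_MBUF_OVERHEAD
comment in the diff):

	/*
	 * frame base = (addr / ETH_AF_XDP_FRAME_SIZE) * ETH_AF_XDP_FRAME_SIZE
	 *
	 * | 64 B pool header | 128 B struct rte_mbuf | headroom | pkt data |
	 * 0                 64                      192        320
	 *
	 * addr_to_mbuf(): mbuf = frame base + 192 - sizeof(struct rte_mbuf)
	 * mbuf_to_addr(): addr = mbuf->buf_addr + mbuf->data_off
	 *                        - (uint64_t)umem->buffer
	 */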

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 drivers/net/af_xdp/rte_eth_af_xdp.c | 117 +++++++++++++++++-----------
 1 file changed, 72 insertions(+), 45 deletions(-)

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 9f6ecace6..9a8e0f994 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -48,7 +48,11 @@ static int af_xdp_logtype;
 
 #define ETH_AF_XDP_FRAME_SIZE		XSK_UMEM__DEFAULT_FRAME_SIZE
 #define ETH_AF_XDP_NUM_BUFFERS		4096
-#define ETH_AF_XDP_DATA_HEADROOM	0
+/* mempool per-object header size (64 bytes) + sizeof(struct rte_mbuf) (128 bytes) */
+#define ETH_AF_XDP_MBUF_OVERHEAD	192
+/* data start from offset 320 (192 + 128) bytes */
+#define ETH_AF_XDP_DATA_HEADROOM				\
+	(ETH_AF_XDP_MBUF_OVERHEAD + RTE_PKTMBUF_HEADROOM)
 #define ETH_AF_XDP_DFLT_NUM_DESCS	XSK_RING_CONS__DEFAULT_NUM_DESCS
 #define ETH_AF_XDP_DFLT_QUEUE_IDX	0
 
@@ -61,7 +65,7 @@ struct xsk_umem_info {
 	struct xsk_ring_prod fq;
 	struct xsk_ring_cons cq;
 	struct xsk_umem *umem;
-	struct rte_ring *buf_ring;
+	struct rte_mempool *mb_pool;
 	void *buffer;
 };
 
@@ -123,10 +127,30 @@ static struct rte_eth_link pmd_link = {
 	.link_autoneg = ETH_LINK_AUTONEG
 };
 
+static inline struct rte_mbuf *
+addr_to_mbuf(struct xsk_umem_info *umem, uint64_t addr)
+{
+	uint64_t offset = (addr / ETH_AF_XDP_FRAME_SIZE *
+			ETH_AF_XDP_FRAME_SIZE);
+	struct rte_mbuf *mbuf = (struct rte_mbuf *)((uint64_t)umem->buffer +
+				    offset + ETH_AF_XDP_MBUF_OVERHEAD -
+				    sizeof(struct rte_mbuf));
+	mbuf->data_off = addr - offset - ETH_AF_XDP_MBUF_OVERHEAD;
+	return mbuf;
+}
+
+static inline uint64_t
+mbuf_to_addr(struct xsk_umem_info *umem, struct rte_mbuf *mbuf)
+{
+	return (uint64_t)mbuf->buf_addr + mbuf->data_off -
+		(uint64_t)umem->buffer;
+}
+
 static inline int
 reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
 {
 	struct xsk_ring_prod *fq = &umem->fq;
+	struct rte_mbuf *mbuf;
 	uint32_t idx;
 	int i, ret;
 
@@ -138,13 +162,15 @@ reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
 
 	for (i = 0; i < reserve_size; i++) {
 		__u64 *fq_addr;
-		void *addr = NULL;
-		if (rte_ring_dequeue(umem->buf_ring, &addr)) {
+		uint64_t addr;
+		mbuf = rte_pktmbuf_alloc(umem->mb_pool);
+		if (unlikely(mbuf == NULL)) {
 			i--;
 			break;
 		}
+		addr = mbuf_to_addr(umem, mbuf);
 		fq_addr = xsk_ring_prod__fill_addr(fq, idx++);
-		*fq_addr = (uint64_t)addr;
+		*fq_addr = addr;
 	}
 
 	xsk_ring_prod__submit(fq, i);
@@ -196,7 +222,7 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		rx_bytes += len;
 		bufs[count++] = mbufs[i];
 
-		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+		rte_pktmbuf_free(addr_to_mbuf(umem, addr));
 	}
 
 	xsk_ring_cons__release(rx, rcvd);
@@ -219,7 +245,7 @@ static void pull_umem_cq(struct xsk_umem_info *umem, int size)
 	for (i = 0; i < n; i++) {
 		uint64_t addr;
 		addr = *xsk_ring_cons__comp_addr(cq, idx_cq++);
-		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+		rte_pktmbuf_free(addr_to_mbuf(umem, addr));
 	}
 
 	xsk_ring_cons__release(cq, n);
@@ -248,7 +274,7 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	struct pkt_tx_queue *txq = queue;
 	struct xsk_umem_info *umem = txq->pair->umem;
 	struct rte_mbuf *mbuf;
-	void *addrs[ETH_AF_XDP_TX_BATCH_SIZE];
+	struct rte_mbuf *mbuf_to_tx;
 	unsigned long tx_bytes = 0;
 	int i, valid = 0;
 	uint32_t idx_tx;
@@ -257,11 +283,6 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 
 	pull_umem_cq(umem, nb_pkts);
 
-	nb_pkts = rte_ring_dequeue_bulk(umem->buf_ring, addrs,
-					nb_pkts, NULL);
-	if (nb_pkts == 0)
-		return 0;
-
 	if (xsk_ring_prod__reserve(&txq->tx, nb_pkts, &idx_tx) != nb_pkts) {
 		kick_tx(txq);
 		return 0;
@@ -275,7 +296,12 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i);
 		mbuf = bufs[i];
 		if (mbuf->pkt_len <= buf_len) {
-			desc->addr = (uint64_t)addrs[valid];
+			mbuf_to_tx = rte_pktmbuf_alloc(umem->mb_pool);
+			if (unlikely(mbuf_to_tx == NULL)) {
+				rte_pktmbuf_free(mbuf);
+				continue;
+			}
+			desc->addr = mbuf_to_addr(umem, mbuf_to_tx);
 			desc->len = mbuf->pkt_len;
 			pkt = xsk_umem__get_data(umem->buffer,
 						 desc->addr);
@@ -291,10 +317,6 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 
 	kick_tx(txq);
 
-	if (valid < nb_pkts)
-		rte_ring_enqueue_bulk(umem->buf_ring, &addrs[valid],
-				 nb_pkts - valid, NULL);
-
 	txq->stats.err_pkts += nb_pkts - valid;
 	txq->stats.tx_pkts += valid;
 	txq->stats.tx_bytes += tx_bytes;
@@ -443,16 +465,29 @@ eth_link_update(struct rte_eth_dev *dev __rte_unused,
 
 static void xdp_umem_destroy(struct xsk_umem_info *umem)
 {
-	free(umem->buffer);
-	umem->buffer = NULL;
-
-	rte_ring_free(umem->buf_ring);
-	umem->buf_ring = NULL;
+	rte_mempool_free(umem->mb_pool);
+	umem->mb_pool = NULL;
 
 	rte_free(umem);
 	umem = NULL;
 }
 
+static inline uint64_t get_base_addr(struct rte_mempool *mp)
+{
+	struct rte_mempool_memhdr *memhdr;
+
+	memhdr = STAILQ_FIRST(&mp->mem_list);
+	return (uint64_t)(memhdr->addr);
+}
+
+static inline uint64_t get_len(struct rte_mempool *mp)
+{
+	struct rte_mempool_memhdr *memhdr;
+
+	memhdr = STAILQ_FIRST(&mp->mem_list);
+	return (uint64_t)(memhdr->len);
+}
+
 static struct xsk_umem_info *xdp_umem_configure(void)
 {
 	struct xsk_umem_info *umem;
@@ -461,9 +496,8 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 		.comp_size = ETH_AF_XDP_DFLT_NUM_DESCS,
 		.frame_size = ETH_AF_XDP_FRAME_SIZE,
 		.frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
-	void *bufs = NULL;
+	void *base_addr = NULL;
 	int ret;
-	uint64_t i;
 
 	umem = rte_zmalloc_socket("umem", sizeof(*umem), 0, rte_socket_id());
 	if (umem == NULL) {
@@ -471,26 +505,20 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 		return NULL;
 	}
 
-	umem->buf_ring = rte_ring_create("af_xdp_ring",
-					 ETH_AF_XDP_NUM_BUFFERS,
-					 SOCKET_ID_ANY,
-					 0x0);
-	if (umem->buf_ring == NULL) {
-		AF_XDP_LOG(ERR, "Failed to create rte_ring\n");
+	umem->mb_pool = rte_pktmbuf_pool_create_with_flags("af_xdp_mempool",
+			ETH_AF_XDP_NUM_BUFFERS,
+			250, 0,
+			ETH_AF_XDP_FRAME_SIZE -
+			ETH_AF_XDP_MBUF_OVERHEAD,
+			MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN,
+			SOCKET_ID_ANY);
+	if (umem->mb_pool == NULL || umem->mb_pool->nb_mem_chunks != 1) {
+		AF_XDP_LOG(ERR, "Failed to create mempool\n");
 		goto err;
 	}
+	base_addr = (void *)get_base_addr(umem->mb_pool);
 
-	for (i = 0; i < ETH_AF_XDP_NUM_BUFFERS; i++)
-		rte_ring_enqueue(umem->buf_ring,
-				 (void *)(i * ETH_AF_XDP_FRAME_SIZE +
-					  ETH_AF_XDP_DATA_HEADROOM));
-
-	if (posix_memalign(&bufs, getpagesize(),
-			   ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE)) {
-		AF_XDP_LOG(ERR, "Failed to allocate memory pool.\n");
-		goto err;
-	}
-	ret = xsk_umem__create(&umem->umem, bufs,
+	ret = xsk_umem__create(&umem->umem, base_addr,
 			       ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
 			       &umem->fq, &umem->cq,
 			       &usr_config);
@@ -499,7 +527,7 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 		AF_XDP_LOG(ERR, "Failed to create umem");
 		goto err;
 	}
-	umem->buffer = bufs;
+	umem->buffer = base_addr;
 
 	return umem;
 
@@ -903,10 +931,9 @@ rte_pmd_af_xdp_remove(struct rte_vdev_device *dev)
 
 	internals = eth_dev->data->dev_private;
 
-	rte_ring_free(internals->umem->buf_ring);
-	rte_free(internals->umem->buffer);
+	rte_mempool_free(internals->umem->mb_pool);
 	rte_free(internals->umem);
 
 	rte_eth_dev_release_port(eth_dev);
 
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* [PATCH v5 5/5] net/af_xdp: enable zero copy
  2019-03-25  6:03 ` [PATCH v5 0/5] Introduce AF_XDP PMD Xiaolong Ye
                     ` (3 preceding siblings ...)
  2019-03-25  6:03   ` [PATCH v5 4/5] net/af_xdp: use mbuf mempool for buffer management Xiaolong Ye
@ 2019-03-25  6:04   ` Xiaolong Ye
  4 siblings, 0 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-25  6:04 UTC (permalink / raw)
  To: dev, David Marchand
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Stephen Hemminger, Ferruh Yigit, Luca Boccassi, Bruce Richardson,
	Ananyev Konstantin, Xiaolong Ye

Check whether the external mempool (from rx_queue_setup) is suitable
for af_xdp; if it is, it will be registered to the af_xdp socket
directly and there will be no packet data copy on Rx and Tx.
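
For a pool to be used in zero copy mode it has to satisfy the new
check_mempool_zc() conditions below; a sketch of a conforming pool,
mirroring how the driver sizes its internal one (the ETH_AF_XDP_*
constants are driver-internal and shown here only to illustrate the
sizing):

	mp = rte_pktmbuf_pool_create_with_flags("zc_pool",
			ETH_AF_XDP_NUM_BUFFERS, 250, 0,
			ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_MBUF_OVERHEAD,
			MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN,
			SOCKET_ID_ANY);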

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 drivers/net/af_xdp/rte_eth_af_xdp.c | 130 ++++++++++++++++++++--------
 1 file changed, 96 insertions(+), 34 deletions(-)

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 9a8e0f994..e5e839d85 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -67,6 +67,7 @@ struct xsk_umem_info {
 	struct xsk_umem *umem;
 	struct rte_mempool *mb_pool;
 	void *buffer;
+	uint8_t zc;
 };
 
 struct rx_stats {
@@ -85,6 +86,7 @@ struct pkt_rx_queue {
 
 	struct pkt_tx_queue *pair;
 	uint16_t queue_idx;
+	uint8_t zc;
 };
 
 struct tx_stats {
@@ -202,7 +204,9 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	if (xsk_prod_nb_free(fq, free_thresh) >= free_thresh)
 		(void)reserve_fill_queue(umem, ETH_AF_XDP_RX_BATCH_SIZE);
 
-	if (unlikely(rte_pktmbuf_alloc_bulk(rxq->mb_pool, mbufs, rcvd) != 0))
+	if (!rxq->zc &&
+		unlikely(rte_pktmbuf_alloc_bulk(rxq->mb_pool,
+				mbufs, rcvd) != 0))
 		return 0;
 
 	for (i = 0; i < rcvd; i++) {
@@ -216,13 +220,23 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		len = desc->len;
 		pkt = xsk_umem__get_data(rxq->umem->buffer, addr);
 
-		rte_memcpy(rte_pktmbuf_mtod(mbufs[i], void *), pkt, len);
-		rte_pktmbuf_pkt_len(mbufs[i]) = len;
-		rte_pktmbuf_data_len(mbufs[i]) = len;
-		rx_bytes += len;
-		bufs[count++] = mbufs[i];
-
-		rte_pktmbuf_free(addr_to_mbuf(umem, addr));
+		if (rxq->zc) {
+			struct rte_mbuf *mbuf;
+			mbuf = addr_to_mbuf(rxq->umem, addr);
+			rte_pktmbuf_pkt_len(mbuf) = len;
+			rte_pktmbuf_data_len(mbuf) = len;
+			rx_bytes += len;
+			bufs[count++] = mbuf;
+		} else {
+			rte_memcpy(rte_pktmbuf_mtod(mbufs[i], void *),
+					pkt, len);
+			rte_pktmbuf_pkt_len(mbufs[i]) = len;
+			rte_pktmbuf_data_len(mbufs[i]) = len;
+			rx_bytes += len;
+			bufs[count++] = mbufs[i];
+
+			rte_pktmbuf_free(addr_to_mbuf(umem, addr));
+		}
 	}
 
 	xsk_ring_cons__release(rx, rcvd);
@@ -295,22 +309,29 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 					- ETH_AF_XDP_DATA_HEADROOM;
 		desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i);
 		mbuf = bufs[i];
-		if (mbuf->pkt_len <= buf_len) {
-			mbuf_to_tx = rte_pktmbuf_alloc(umem->mb_pool);
-			if (unlikely(mbuf_to_tx == NULL)) {
-				rte_pktmbuf_free(mbuf);
-				continue;
-			}
-			desc->addr = mbuf_to_addr(umem, mbuf_to_tx);
+		if (txq->pair->zc && mbuf->pool == umem->mb_pool) {
+			desc->addr = mbuf_to_addr(umem, mbuf);
 			desc->len = mbuf->pkt_len;
-			pkt = xsk_umem__get_data(umem->buffer,
-						 desc->addr);
-			rte_memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
-			       desc->len);
 			valid++;
 			tx_bytes += mbuf->pkt_len;
+		} else {
+			if (mbuf->pkt_len <= buf_len) {
+				mbuf_to_tx = rte_pktmbuf_alloc(umem->mb_pool);
+				if (unlikely(mbuf_to_tx == NULL)) {
+					rte_pktmbuf_free(mbuf);
+					continue;
+				}
+				desc->addr = mbuf_to_addr(umem, mbuf_to_tx);
+				desc->len = mbuf->pkt_len;
+				pkt = xsk_umem__get_data(umem->buffer,
+							 desc->addr);
+				rte_memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
+				       desc->len);
+				valid++;
+				tx_bytes += mbuf->pkt_len;
+			}
+			rte_pktmbuf_free(mbuf);
 		}
-		rte_pktmbuf_free(mbuf);
 	}
 
 	xsk_ring_prod__submit(&txq->tx, nb_pkts);
@@ -488,7 +509,7 @@ static inline uint64_t get_len(struct rte_mempool *mp)
 	return (uint64_t)(memhdr->len);
 }
 
-static struct xsk_umem_info *xdp_umem_configure(void)
+static struct xsk_umem_info *xdp_umem_configure(struct rte_mempool *mb_pool)
 {
 	struct xsk_umem_info *umem;
 	struct xsk_umem_config usr_config = {
@@ -505,16 +526,23 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 		return NULL;
 	}
 
-	umem->mb_pool = rte_pktmbuf_pool_create_with_flags("af_xdp_mempool",
-			ETH_AF_XDP_NUM_BUFFERS,
-			250, 0,
-			ETH_AF_XDP_FRAME_SIZE -
-			ETH_AF_XDP_MBUF_OVERHEAD,
-			MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN,
-			SOCKET_ID_ANY);
-	if (umem->mb_pool == NULL || umem->mb_pool->nb_mem_chunks != 1) {
-		AF_XDP_LOG(ERR, "Failed to create mempool\n");
-		goto err;
+	if (!mb_pool) {
+		umem->mb_pool = rte_pktmbuf_pool_create_with_flags("af_xdp_mempool",
+				ETH_AF_XDP_NUM_BUFFERS,
+				250, 0,
+				ETH_AF_XDP_FRAME_SIZE -
+				ETH_AF_XDP_MBUF_OVERHEAD,
+				MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN,
+				SOCKET_ID_ANY);
+
+		if (umem->mb_pool == NULL ||
+				umem->mb_pool->nb_mem_chunks != 1) {
+			AF_XDP_LOG(ERR, "Failed to create mempool\n");
+			goto err;
+		}
+	} else {
+		umem->mb_pool = mb_pool;
+		umem->zc = 1;
 	}
 	base_addr = (void *)get_base_addr(umem->mb_pool);
 
@@ -536,16 +564,43 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 	return NULL;
 }
 
+static uint8_t
+check_mempool_zc(struct rte_mempool *mp)
+{
+	RTE_ASSERT(mp);
+
+	/* the pool must consist of a single contiguous memory chunk */
+	if (mp->nb_mem_chunks > 1)
+		return 0;
+
+	/* check header size */
+	if (mp->header_size != RTE_CACHE_LINE_SIZE)
+		return 0;
+
+	/* check base address */
+	if ((uint64_t)get_base_addr(mp) % getpagesize() != 0)
+		return 0;
+
+	/* check chunk size */
+	if ((mp->elt_size + mp->header_size + mp->trailer_size) %
+	    ETH_AF_XDP_FRAME_SIZE != 0)
+		return 0;
+
+	return 1;
+}
+
 static int
 xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
-	      int ring_size)
+	      int ring_size, struct rte_mempool *mb_pool)
 {
 	struct xsk_socket_config cfg;
 	struct pkt_tx_queue *txq = rxq->pair;
+	struct rte_mempool *mp;
 	int ret = 0;
 	int reserve_size;
 
-	rxq->umem = xdp_umem_configure();
+	mp = check_mempool_zc(mb_pool) ? mb_pool : NULL;
+	rxq->umem = xdp_umem_configure(mp);
 	if (rxq->umem == NULL)
 		return -ENOMEM;
 
@@ -622,7 +677,7 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
 
 	rxq->mb_pool = mb_pool;
 
-	if (xsk_configure(internals, rxq, nb_rx_desc)) {
+	if (xsk_configure(internals, rxq, nb_rx_desc, mb_pool)) {
 		AF_XDP_LOG(ERR, "Failed to configure xdp socket\n");
 		ret = -EINVAL;
 		goto err;
@@ -630,6 +685,13 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
 
 	internals->umem = rxq->umem;
 
+	if (mb_pool == internals->umem->mb_pool)
+		rxq->zc = internals->umem->zc;
+
+	if (rxq->zc)
+		AF_XDP_LOG(INFO,
+			"zero copy enabled on rx queue %d\n", rx_queue_id);
+
 	dev->data->rx_queues[rx_queue_id] = rxq;
 	return 0;
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* Re: [PATCH v5 3/5] lib/mempool: allow page size aligned mempool
  2019-03-25  6:03   ` [PATCH v5 3/5] lib/mempool: allow page size aligned mempool Xiaolong Ye
@ 2019-03-25  9:04     ` Andrew Rybchenko
  2019-03-26  3:27       ` Ye Xiaolong
  0 siblings, 1 reply; 214+ messages in thread
From: Andrew Rybchenko @ 2019-03-25  9:04 UTC (permalink / raw)
  To: Xiaolong Ye, dev, David Marchand
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Stephen Hemminger, Ferruh Yigit, Luca Boccassi, Bruce Richardson,
	Ananyev Konstantin, Olivier Matz

On 3/25/19 9:03 AM, Xiaolong Ye wrote:
> Allow creating a mempool with a page size aligned base address.
>
> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
> ---
>   lib/librte_mempool/rte_mempool.c | 3 +++
>   lib/librte_mempool/rte_mempool.h | 1 +
>   2 files changed, 4 insertions(+)
>
> diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
> index 683b216f9..171ba1057 100644
> --- a/lib/librte_mempool/rte_mempool.c
> +++ b/lib/librte_mempool/rte_mempool.c
> @@ -543,6 +543,9 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>   		if (try_contig)
>   			flags |= RTE_MEMZONE_IOVA_CONTIG;
>   
> +		if (mp->flags & MEMPOOL_F_PAGE_ALIGN)
> +			align = RTE_MAX(align, (size_t)getpagesize());
> +
>   		mz = rte_memzone_reserve_aligned(mz_name, mem_size,
>   				mp->socket_id, flags, align);
>   
> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
> index 7c9cd9a2f..75553b36f 100644
> --- a/lib/librte_mempool/rte_mempool.h
> +++ b/lib/librte_mempool/rte_mempool.h
> @@ -264,6 +264,7 @@ struct rte_mempool {
>   #define MEMPOOL_F_POOL_CREATED   0x0010 /**< Internal: pool is created. */
>   #define MEMPOOL_F_NO_IOVA_CONTIG 0x0020 /**< Don't need IOVA contiguous objs. */
>   #define MEMPOOL_F_NO_PHYS_CONTIG MEMPOOL_F_NO_IOVA_CONTIG /* deprecated */
> +#define MEMPOOL_F_PAGE_ALIGN     0x0040 /**< Chunk's base address is page aligned */

To me the name suggests that mempool objects should be page aligned,
since MEMPOOL_F_NO_SPREAD, MEMPOOL_F_NO_CACHE_ALIGN and
MEMPOOL_F_NO_IOVA_CONTIG describe object properties, not chunk
properties.

Personally I doubt that the final goal is just having the chunk page
aligned.

Andrew.

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v1 1/6] net/af_xdp: introduce AF_XDP PMD driver
  2019-03-25  2:45           ` Ye Xiaolong
@ 2019-03-25 10:42             ` Luca Boccassi
  2019-03-25 12:22               ` Ye Xiaolong
  2019-03-26  2:18               ` Ye Xiaolong
  0 siblings, 2 replies; 214+ messages in thread
From: Luca Boccassi @ 2019-03-25 10:42 UTC (permalink / raw)
  To: Ye Xiaolong; +Cc: dev, Qi Zhang, Karlsson, Magnus

On Mon, 2019-03-25 at 10:45 +0800, Ye Xiaolong wrote:
> On 03/24, Luca Boccassi wrote:
> > On Sun, 2019-03-17 at 11:34 +0800, Ye Xiaolong wrote:
> > > On 03/02, Ye Xiaolong wrote:
> > > > > >  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  +=
> > > > > > -lrte_pmd_af_packet
> > > > > > +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     +=
> > > > > > -lrte_pmd_af_xdp
> > > > > > -lelf -lbpf
> > > > > 
> > > > > Are symbols from libelf being used by the PMD?
> > > > 
> > > > Hmm, it is a leftover of RFC, libelf is no longer needed in
> > > > this
> > > > version, will
> > > > remove it in next version.
> > > > 
> > > 
> > > Correction, libelf is needed for libbpf, so we still need to keep
> > > it. 
> > 
> > If libbpf needs libelf for its internal usage, it should be linked
> > against it itself. Unless symbols from libelf are used in static
> > functions defined in libbpf's public headers. Is this the case?
> > 
> 
> Yes, that's the case. without the libelf, it would have build error
> as below,
> and these symbols are used in static functions of libbpf.
> 
> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
> undefined reference to `elf_nextscn'
> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
> undefined reference to `elf_rawdata'
> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
> undefined reference to `elf_memory'
> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
> undefined reference to `gelf_getrel'
> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
> undefined reference to `elf_strptr'
> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
> undefined reference to `elf_end'
> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
> undefined reference to `elf_getscn'
> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
> undefined reference to `elf_begin'
> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
> undefined reference to `gelf_getsym'
> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
> undefined reference to `elf_version'
> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
> undefined reference to `gelf_getehdr'
> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
> undefined reference to `elf_getdata'
> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
> undefined reference to `gelf_getshdr'
> 
> Thanks,
> Xiaolong

Looking at that list, at least the very first symbol is not used by a
public header, but internally in libbpf:

$ grep -r elf_nextscn
libbpf.c:	while ((scn = elf_nextscn(elf, scn)) != NULL) {

https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/lib/bpf/libbpf.c#n793

None of those symbols are used from the public headers:

$ grep elf_ bpf.h libbpf.h btf.h
$

Therefore, this is an omission in libbpf's Makefile, as it must link
against libelf. Please raise a ticket or send a patch upstream, and
remove the -lelf from DPDK's makefiles.
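
i.e. once libbpf's Makefile links libelf itself, the DPDK side would
reduce to (sketch):

_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     += -lrte_pmd_af_xdp -lbpf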

-- 
Kind regards,
Luca Boccassi

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v1 1/6] net/af_xdp: introduce AF_XDP PMD driver
  2019-03-25 10:42             ` Luca Boccassi
@ 2019-03-25 12:22               ` Ye Xiaolong
  2019-03-26  2:18               ` Ye Xiaolong
  1 sibling, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-25 12:22 UTC (permalink / raw)
  To: Luca Boccassi, Karlsson, Magnus, Topel Bjorn; +Cc: dev, Qi Zhang

On 03/25, Luca Boccassi wrote:
>On Mon, 2019-03-25 at 10:45 +0800, Ye Xiaolong wrote:
>> On 03/24, Luca Boccassi wrote:
>> > On Sun, 2019-03-17 at 11:34 +0800, Ye Xiaolong wrote:
>> > > On 03/02, Ye Xiaolong wrote:
>> > > > > >  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  +=
>> > > > > > -lrte_pmd_af_packet
>> > > > > > +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     +=
>> > > > > > -lrte_pmd_af_xdp
>> > > > > > -lelf -lbpf
>> > > > > 
>> > > > > Are symbols from libelf being used by the PMD?
>> > > > 
>> > > > Hmm, it is a leftover of RFC, libelf is no longer needed in
>> > > > this
>> > > > version, will
>> > > > remove it in next version.
>> > > > 
>> > > 
>> > > Correction, libelf is needed for libbpf, so we still need to keep
>> > > it. 
>> > 
>> > If libbpf needs libelf for its internal usage, it should be linked
>> > against it itself. Unless symbols from libelf are used in static
>> > functions defined in libbpf's public headers. Is this the case?
>> > 
>> 
>> Yes, that's the case. without the libelf, it would have build error
>> as below,
>> and these symbols are used in static functions of libbpf.
>> 
>> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
>> undefined reference to `elf_nextscn'
>> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
>> undefined reference to `elf_rawdata'
>> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
>> undefined reference to `elf_memory'
>> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
>> undefined reference to `gelf_getrel'
>> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
>> undefined reference to `elf_strptr'
>> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
>> undefined reference to `elf_end'
>> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
>> undefined reference to `elf_getscn'
>> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
>> undefined reference to `elf_begin'
>> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
>> undefined reference to `gelf_getsym'
>> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
>> undefined reference to `elf_version'
>> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
>> undefined reference to `gelf_getehdr'
>> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
>> undefined reference to `elf_getdata'
>> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
>> undefined reference to `gelf_getshdr'
>> 
>> Thanks,
>> Xiaolong
>
>Looking at that list, at least the very first symbol is not used by a
>public header, but internally in libbpf:
>
>$ grep -r elf_nextscn
>libbpf.c:	while ((scn = elf_nextscn(elf, scn)) != NULL) {
>
>https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/lib/bpf/libbpf.c#n793
>
>None of those symbols are used from the public headers:
>
>$ grep elf_ bpf.h libbpf.h btf.h
>$
>
>Therefore, this is an omission in libbpf's Makefile, as it must link
>against libelf. Please raise a ticket or send a patch upstream, and
>remove the -lelf from DPDK's makefiles.

Hi, Magnus, could you help handle this?

Thanks,
Xiaolong
>
>-- 
>Kind regards,
>Luca Boccassi

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v5 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-25  6:03   ` [PATCH v5 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
@ 2019-03-25 15:58     ` Stephen Hemminger
  2019-03-26  2:13       ` Ye Xiaolong
  0 siblings, 1 reply; 214+ messages in thread
From: Stephen Hemminger @ 2019-03-25 15:58 UTC (permalink / raw)
  To: Xiaolong Ye
  Cc: dev, David Marchand, Qi Zhang, Karlsson Magnus, Topel Bjorn,
	Maxime Coquelin, Ferruh Yigit, Luca Boccassi, Bruce Richardson,
	Ananyev Konstantin

On Mon, 25 Mar 2019 14:03:56 +0800
Xiaolong Ye <xiaolong.ye@intel.com> wrote:

> +
> +	ret = xsk_ring_prod__reserve(fq, reserve_size, &idx);
> +	if (unlikely(!ret)) {
> +		AF_XDP_LOG(ERR, "Failed to reserve enough fq descs.\n");

You defined AF_XDP_LOG to add a newline (similar to other drivers).
But all the messages already have a newline.
This will cause log to be double spaced.
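
A sketch of the kind of fix, assuming the usual rte_log-based macro
(the actual definition is in the full patch, not quoted here): drop
the newline from the macro and keep it at the call sites:

#define AF_XDP_LOG(level, fmt, args...)			\
	rte_log(RTE_LOG_ ## level, af_xdp_logtype,	\
		"%s(): " fmt, __func__, ##args)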

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v5 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-25 15:58     ` Stephen Hemminger
@ 2019-03-26  2:13       ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-26  2:13 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, David Marchand, Qi Zhang, Karlsson Magnus, Topel Bjorn,
	Maxime Coquelin, Ferruh Yigit, Luca Boccassi, Bruce Richardson,
	Ananyev Konstantin

On 03/25, Stephen Hemminger wrote:
>On Mon, 25 Mar 2019 14:03:56 +0800
>Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>
>> +
>> +	ret = xsk_ring_prod__reserve(fq, reserve_size, &idx);
>> +	if (unlikely(!ret)) {
>> +		AF_XDP_LOG(ERR, "Failed to reserve enough fq descs.\n");
>
>You defined AF_XDP_LOG to add a newline (similar to other drivers).
>But all the messages already have a newline.
>This will cause log to be double spaced.

Thanks for the catch, I'll remove the newline in the definition.

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v1 1/6] net/af_xdp: introduce AF_XDP PMD driver
  2019-03-25 10:42             ` Luca Boccassi
  2019-03-25 12:22               ` Ye Xiaolong
@ 2019-03-26  2:18               ` Ye Xiaolong
  2019-03-26 10:14                 ` Luca Boccassi
  1 sibling, 1 reply; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-26  2:18 UTC (permalink / raw)
  To: Luca Boccassi; +Cc: dev, Qi Zhang, Karlsson, Magnus

On 03/25, Luca Boccassi wrote:
>On Mon, 2019-03-25 at 10:45 +0800, Ye Xiaolong wrote:
>> On 03/24, Luca Boccassi wrote:
>> > On Sun, 2019-03-17 at 11:34 +0800, Ye Xiaolong wrote:
>> > > On 03/02, Ye Xiaolong wrote:
>> > > > > >  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  +=
>> > > > > > -lrte_pmd_af_packet
>> > > > > > +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     +=
>> > > > > > -lrte_pmd_af_xdp
>> > > > > > -lelf -lbpf
>> > > > > 
>> > > > > Are symbols from libelf being used by the PMD?
>> > > > 
>> > > > Hmm, it is a leftover of RFC, libelf is no longer needed in
>> > > > this
>> > > > version, will
>> > > > remove it in next version.
>> > > > 
>> > > 
>> > > Correction, libelf is needed for libbpf, so we still need to keep
>> > > it. 
>> > 
>> > If libbpf needs libelf for its internal usage, it should be linked
>> > against it itself. Unless symbols from libelf are used in static
>> > functions defined in libbpf's public headers. Is this the case?
>> > 
>> 
>> Yes, that's the case. without the libelf, it would have build error
>> as below,
>> and these symbols are used in static functions of libbpf.
>> 
>> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
>> undefined reference to `elf_nextscn'
>> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
>> undefined reference to `elf_rawdata'
>> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
>> undefined reference to `elf_memory'
>> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
>> undefined reference to `gelf_getrel'
>> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
>> undefined reference to `elf_strptr'
>> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
>> undefined reference to `elf_end'
>> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
>> undefined reference to `elf_getscn'
>> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
>> undefined reference to `elf_begin'
>> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
>> undefined reference to `gelf_getsym'
>> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
>> undefined reference to `elf_version'
>> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
>> undefined reference to `gelf_getehdr'
>> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
>> undefined reference to `elf_getdata'
>> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libbpf.so:
>> undefined reference to `gelf_getshdr'
>> 
>> Thanks,
>> Xiaolong
>
>Looking at that list, at least the very first symbol is not used by a
>public header, but internally in libbpf:
>
>$ grep -r elf_nextscn
>libbpf.c:	while ((scn = elf_nextscn(elf, scn)) != NULL) {
>
>https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/lib/bpf/libbpf.c#n793
>
>None of those symbols are used from the public headers:
>
>$ grep elf_ bpf.h libbpf.h btf.h
>$
>
>Therefore, this is an omission in libbpf's Makefile, as it must link
>against libelf. Please raise a ticket or send a patch upstream, and
>remove the -lelf from DPDK's makefiles.

As it may take some time for the kernel community to fix libbpf's
Makefile, we may still need the -lelf for the af_xdp PMD for now; I'll
remove it later once libbpf is corrected to link against libelf. Is
that acceptable?

Thanks,
Xiaolong
>
>-- 
>Kind regards,
>Luca Boccassi

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v5 3/5] lib/mempool: allow page size aligned mempool
  2019-03-25  9:04     ` Andrew Rybchenko
@ 2019-03-26  3:27       ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-26  3:27 UTC (permalink / raw)
  To: Andrew Rybchenko
  Cc: dev, David Marchand, Qi Zhang, Karlsson Magnus, Topel Bjorn,
	Maxime Coquelin, Stephen Hemminger, Ferruh Yigit, Luca Boccassi,
	Bruce Richardson, Ananyev Konstantin, Olivier Matz

On 03/25, Andrew Rybchenko wrote:
>On 3/25/19 9:03 AM, Xiaolong Ye wrote:
>> Allow create a mempool with page size aligned base address.
>> 
>> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
>> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
>> ---
>>   lib/librte_mempool/rte_mempool.c | 3 +++
>>   lib/librte_mempool/rte_mempool.h | 1 +
>>   2 files changed, 4 insertions(+)
>> 
>> diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
>> index 683b216f9..171ba1057 100644
>> --- a/lib/librte_mempool/rte_mempool.c
>> +++ b/lib/librte_mempool/rte_mempool.c
>> @@ -543,6 +543,9 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>>   		if (try_contig)
>>   			flags |= RTE_MEMZONE_IOVA_CONTIG;
>> +		if (mp->flags & MEMPOOL_F_PAGE_ALIGN)
>> +			align = RTE_MAX(align, (size_t)getpagesize());
>> +
>>   		mz = rte_memzone_reserve_aligned(mz_name, mem_size,
>>   				mp->socket_id, flags, align);
>> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
>> index 7c9cd9a2f..75553b36f 100644
>> --- a/lib/librte_mempool/rte_mempool.h
>> +++ b/lib/librte_mempool/rte_mempool.h
>> @@ -264,6 +264,7 @@ struct rte_mempool {
>>   #define MEMPOOL_F_POOL_CREATED   0x0010 /**< Internal: pool is created. */
>>   #define MEMPOOL_F_NO_IOVA_CONTIG 0x0020 /**< Don't need IOVA contiguous objs. */
>>   #define MEMPOOL_F_NO_PHYS_CONTIG MEMPOOL_F_NO_IOVA_CONTIG /* deprecated */
>> +#define MEMPOOL_F_PAGE_ALIGN     0x0040 /**< Chunk's base address is page aligned */
>
>For me it sounds like mempool objects should be page aligned, since
>MEMPOOL_F_NO_SPREAD, MEMPOOL_F_NO_CACHE_ALIGN and
>MEMPOOL_F_NO_IOVA_CONTIG describe object properties, not chunk properties.
>
>Personally, I doubt that the final goal is just having the chunk page aligned.

Hmm, how about naming it MEMPOOL_CHUNK_F_PAGE_ALIGN?
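
To illustrate the distinction (a sketch, not code from the patch): the
flag only constrains the base address of the chunk backing the pool,
not the placement of the objects inside it:

   /*
    * With the flag set:
    *   chunk base -> aligned to getpagesize()
    *   object 0   -> chunk base + object header, not necessarily
    *                 page aligned
    *   object N   -> object 0 + N * element size
    * i.e. it is a property of the chunk, not of the objects, which is
    * what the MEMPOOL_CHUNK_F_ prefix is meant to convey.
    */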

Thanks,
Xiaolong
>
>Andrew.

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v1 1/6] net/af_xdp: introduce AF_XDP PMD driver
  2019-03-26  2:18               ` Ye Xiaolong
@ 2019-03-26 10:14                 ` Luca Boccassi
  2019-03-26 12:12                   ` Ye Xiaolong
  0 siblings, 1 reply; 214+ messages in thread
From: Luca Boccassi @ 2019-03-26 10:14 UTC (permalink / raw)
  To: Ye Xiaolong; +Cc: dev, Qi Zhang, Karlsson, Magnus

On Tue, 2019-03-26 at 10:18 +0800, Ye Xiaolong wrote:
> On 03/25, Luca Boccassi wrote:
> > On Mon, 2019-03-25 at 10:45 +0800, Ye Xiaolong wrote:
> > > On 03/24, Luca Boccassi wrote:
> > > > On Sun, 2019-03-17 at 11:34 +0800, Ye Xiaolong wrote:
> > > > > On 03/02, Ye Xiaolong wrote:
> > > > > > > >  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  +=
> > > > > > > > -lrte_pmd_af_packet
> > > > > > > > +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     +=
> > > > > > > > -lrte_pmd_af_xdp
> > > > > > > > -lelf -lbpf
> > > > > > > 
> > > > > > > Are symbols from libelf being used by the PMD?
> > > > > > 
> > > > > > Hmm, it is a leftover of RFC, libelf is no longer needed in
> > > > > > this
> > > > > > version, will
> > > > > > remove it in next version.
> > > > > > 
> > > > > 
> > > > > Correction, libelf is needed for libbpf, so we still need to
> > > > > keep
> > > > > it. 
> > > > 
> > > > If libbpf needs libelf for its internal usage, it should be
> > > > linked
> > > > against it itself. Unless symbols from libelf are used in
> > > > static
> > > > functions defined in libbpf's public headers. Is this the case?
> > > > 
> > > 
> > > Yes, that's the case. without the libelf, it would have build
> > > error
> > > as below,
> > > and these symbols are used in static functions of libbpf.
> > > 
> > > /usr/lib/gcc/x86_64-redhat-
> > > linux/4.8.5/../../../../lib64/libbpf.so:
> > > undefined reference to `elf_nextscn'
> > > /usr/lib/gcc/x86_64-redhat-
> > > linux/4.8.5/../../../../lib64/libbpf.so:
> > > undefined reference to `elf_rawdata'
> > > /usr/lib/gcc/x86_64-redhat-
> > > linux/4.8.5/../../../../lib64/libbpf.so:
> > > undefined reference to `elf_memory'
> > > /usr/lib/gcc/x86_64-redhat-
> > > linux/4.8.5/../../../../lib64/libbpf.so:
> > > undefined reference to `gelf_getrel'
> > > /usr/lib/gcc/x86_64-redhat-
> > > linux/4.8.5/../../../../lib64/libbpf.so:
> > > undefined reference to `elf_strptr'
> > > /usr/lib/gcc/x86_64-redhat-
> > > linux/4.8.5/../../../../lib64/libbpf.so:
> > > undefined reference to `elf_end'
> > > /usr/lib/gcc/x86_64-redhat-
> > > linux/4.8.5/../../../../lib64/libbpf.so:
> > > undefined reference to `elf_getscn'
> > > /usr/lib/gcc/x86_64-redhat-
> > > linux/4.8.5/../../../../lib64/libbpf.so:
> > > undefined reference to `elf_begin'
> > > /usr/lib/gcc/x86_64-redhat-
> > > linux/4.8.5/../../../../lib64/libbpf.so:
> > > undefined reference to `gelf_getsym'
> > > /usr/lib/gcc/x86_64-redhat-
> > > linux/4.8.5/../../../../lib64/libbpf.so:
> > > undefined reference to `elf_version'
> > > /usr/lib/gcc/x86_64-redhat-
> > > linux/4.8.5/../../../../lib64/libbpf.so:
> > > undefined reference to `gelf_getehdr'
> > > /usr/lib/gcc/x86_64-redhat-
> > > linux/4.8.5/../../../../lib64/libbpf.so:
> > > undefined reference to `elf_getdata'
> > > /usr/lib/gcc/x86_64-redhat-
> > > linux/4.8.5/../../../../lib64/libbpf.so:
> > > undefined reference to `gelf_getshdr'
> > > 
> > > Thanks,
> > > Xiaolong
> > 
> > Looking at that list, at least the very first symbol is not used by
> > a
> > public header, but internally in libbpf:
> > 
> > $ grep -r elf_nextscn
> > libbpf.c:	while ((scn = elf_nextscn(elf, scn)) != NULL) {
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/lib/bpf/libbpf.c#n793
> > 
> > None of those symbols are used from the public headers:
> > 
> > $ grep elf_ bpf.h libbpf.h btf.h
> > $
> > 
> > Therefore, this is an omission in libbpf's Makefile, as it must
> > link
> > against libelf. Please raise a ticket or send a patch upstream, and
> > remove the -lelf from DPDK's makefiles.
> 
> As it may take some time for the kernel community to fix libbpf's
> Makefile, we may still need the -lelf for the af_xdp PMD for now;
> I'll remove it once libbpf is corrected to link against libelf
> itself. Is that acceptable?
> 
> Thanks,
> Xiaolong

The final release isn't out until May anyway, is it? Can the fix be
included before the release?

-- 
Kind regards,
Luca Boccassi

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v1 1/6] net/af_xdp: introduce AF_XDP PMD driver
  2019-03-26 10:14                 ` Luca Boccassi
@ 2019-03-26 12:12                   ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-26 12:12 UTC (permalink / raw)
  To: Luca Boccassi; +Cc: dev, Qi Zhang, Karlsson, Magnus

On 03/26, Luca Boccassi wrote:
>On Tue, 2019-03-26 at 10:18 +0800, Ye Xiaolong wrote:
>> On 03/25, Luca Boccassi wrote:
>> > On Mon, 2019-03-25 at 10:45 +0800, Ye Xiaolong wrote:
>> > > On 03/24, Luca Boccassi wrote:
>> > > > On Sun, 2019-03-17 at 11:34 +0800, Ye Xiaolong wrote:
>> > > > > On 03/02, Ye Xiaolong wrote:
>> > > > > > > >  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  +=
>> > > > > > > > -lrte_pmd_af_packet
>> > > > > > > > +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     +=
>> > > > > > > > -lrte_pmd_af_xdp
>> > > > > > > > -lelf -lbpf
>> > > > > > > 
>> > > > > > > Are symbols from libelf being used by the PMD?
>> > > > > > 
>> > > > > > Hmm, it is a leftover of RFC, libelf is no longer needed in
>> > > > > > this
>> > > > > > version, will
>> > > > > > remove it in next version.
>> > > > > > 
>> > > > > 
>> > > > > Correction, libelf is needed for libbpf, so we still need to
>> > > > > keep
>> > > > > it. 
>> > > > 
>> > > > If libbpf needs libelf for its internal usage, it should be
>> > > > linked
>> > > > against it itself. Unless symbols from libelf are used in
>> > > > static
>> > > > functions defined in libbpf's public headers. Is this the case?
>> > > > 
>> > > 
>> > > Yes, that's the case. without the libelf, it would have build
>> > > error
>> > > as below,
>> > > and these symbols are used in static functions of libbpf.
>> > > 
>> > > /usr/lib/gcc/x86_64-redhat-
>> > > linux/4.8.5/../../../../lib64/libbpf.so:
>> > > undefined reference to `elf_nextscn'
>> > > /usr/lib/gcc/x86_64-redhat-
>> > > linux/4.8.5/../../../../lib64/libbpf.so:
>> > > undefined reference to `elf_rawdata'
>> > > /usr/lib/gcc/x86_64-redhat-
>> > > linux/4.8.5/../../../../lib64/libbpf.so:
>> > > undefined reference to `elf_memory'
>> > > /usr/lib/gcc/x86_64-redhat-
>> > > linux/4.8.5/../../../../lib64/libbpf.so:
>> > > undefined reference to `gelf_getrel'
>> > > /usr/lib/gcc/x86_64-redhat-
>> > > linux/4.8.5/../../../../lib64/libbpf.so:
>> > > undefined reference to `elf_strptr'
>> > > /usr/lib/gcc/x86_64-redhat-
>> > > linux/4.8.5/../../../../lib64/libbpf.so:
>> > > undefined reference to `elf_end'
>> > > /usr/lib/gcc/x86_64-redhat-
>> > > linux/4.8.5/../../../../lib64/libbpf.so:
>> > > undefined reference to `elf_getscn'
>> > > /usr/lib/gcc/x86_64-redhat-
>> > > linux/4.8.5/../../../../lib64/libbpf.so:
>> > > undefined reference to `elf_begin'
>> > > /usr/lib/gcc/x86_64-redhat-
>> > > linux/4.8.5/../../../../lib64/libbpf.so:
>> > > undefined reference to `gelf_getsym'
>> > > /usr/lib/gcc/x86_64-redhat-
>> > > linux/4.8.5/../../../../lib64/libbpf.so:
>> > > undefined reference to `elf_version'
>> > > /usr/lib/gcc/x86_64-redhat-
>> > > linux/4.8.5/../../../../lib64/libbpf.so:
>> > > undefined reference to `gelf_getehdr'
>> > > /usr/lib/gcc/x86_64-redhat-
>> > > linux/4.8.5/../../../../lib64/libbpf.so:
>> > > undefined reference to `elf_getdata'
>> > > /usr/lib/gcc/x86_64-redhat-
>> > > linux/4.8.5/../../../../lib64/libbpf.so:
>> > > undefined reference to `gelf_getshdr'
>> > > 
>> > > Thanks,
>> > > Xiaolong
>> > 
>> > Looking at that list, at least the very first symbol is not used by
>> > a
>> > public header, but internally in libbpf:
>> > 
>> > $ grep -r elf_nextscn
>> > libbpf.c:	while ((scn = elf_nextscn(elf, scn)) != NULL) {
>> > 
>> > https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/lib/bpf/libbpf.c#n793
>> > 
>> > None of those symbols are used from the public headers:
>> > 
>> > $ grep elf_ bpf.h libbpf.h btf.h
>> > $
>> > 
>> > Therefore, this is an omission in libbpf's Makefile, as it must
>> > link
>> > against libelf. Please raise a ticket or send a patch upstream, and
>> > remove the -lelf from DPDK's makefiles.
>> 
>> As it may take some time for the kernel community to fix libbpf's
>> Makefile, we may still need the -lelf for the af_xdp PMD for now;
>> I'll remove it once libbpf is corrected to link against libelf
>> itself. Is that acceptable?
>> 
>> Thanks,
>> Xiaolong
>
>The final release isn't out until May anyway, is it? Can the fix be
>included before the release?

I think so.

Thanks,
Xiaolong
>
>-- 
>Kind regards,
>Luca Boccassi

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v6 0/5] Introduce AF_XDP PMD
  2019-03-01  8:09 [PATCH v1 0/6] Introduce AF_XDP PMD Xiaolong Ye
                   ` (10 preceding siblings ...)
  2019-03-25  6:03 ` [PATCH v5 0/5] Introduce AF_XDP PMD Xiaolong Ye
@ 2019-03-26 12:20 ` Xiaolong Ye
  2019-03-26 12:20   ` [PATCH v6 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
                     ` (4 more replies)
  2019-03-27  9:00 ` [PATCH v7 0/5] Introduce AF_XDP PMD Xiaolong Ye
                   ` (4 subsequent siblings)
  16 siblings, 5 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-26 12:20 UTC (permalink / raw)
  To: dev, David Marchand, Andrew Rybchenko
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Stephen Hemminger, Ferruh Yigit, Luca Boccassi, Bruce Richardson,
	Ananyev Konstantin, Xiaolong Ye

Overview
========

This patchset adds a new PMD driver for AF_XDP, which is a proposed
faster version of the AF_PACKET interface in Linux; see links [1] [2]
for an introduction to AF_XDP:

AF_XDP roadmap
==============
- AF_XDP has been included in the upstream kernel since 4.18, and AF_XDP
  support in libbpf was merged in v5.1-rc1.
- The i40e and ixgbe drivers now support zero copy mode.

Change logs
===========

V6:

- remove the newline in the AF_XDP_LOG definition to avoid a
  double-newline issue.
- rename MEMPOOL_F_PAGE_ALIGN to MEMPOOL_CHUNK_F_PAGE_ALIGN.

V5:

- disable the AF_XDP PMD by default, as it requires a kernel more recent
  than the minimum kernel version supported by DPDK
- address other review comments from Maxime

V4:

- change vdev name to net_af_xdp
- adopt dynamic log type for all logging
- Fix other style issues raised by Stephen

V3:

- Fix all style issues pointed out by Stephen, Mattias, and David.
- Drop the testpmd change, as we'll adopt Jerin's suggestion to add a new
  mempool driver to handle applications' use of AF_XDP's zero copy feature.

V2:
- Fix a NULL pointer dereference crash issue
- Fix an issue where txonly stopped sending traffic in zc mode
- Refactor rte_mbuf.c to avoid ABI breakage.
- Fix multiple style issues pointed out by Ferruh, David, Stephen, and Luca.

changes vs the RFC sent by Qi last Aug:

- Reworked based on AF_XDP's interface changes, since the new libbpf
  provides higher-level APIs that hide many of the details of the AF_XDP
  uapi. The rework removes 300+ lines of code.

- Multiple queues are not supported, because the current libbpf doesn't
  support multiple sockets on a single umem.

- No extra steps are needed to load an XDP program manually, since libbpf
  loads a default XDP program when the user calls xsk_socket__create();
  the userspace application only needs to handle the cleanup. A minimal
  sketch of this flow follows.
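
For illustration (not code from the patchset; umem, rx, tx and ifindex
are assumed to be created elsewhere):

   struct xsk_socket *xsk;
   struct xsk_socket_config cfg = {
           .rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
           .tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
           .libbpf_flags = 0,  /* let libbpf load its default XDP program */
           .xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST,
           .bind_flags = 0,
   };

   /* loads and attaches the default XDP program as a side effect */
   if (xsk_socket__create(&xsk, "enp59s0f0", 0, umem, &rx, &tx, &cfg))
           return -1;

   /* ... datapath ... */

   /* cleanup: the application must detach the XDP program itself */
   xsk_socket__delete(xsk);
   bpf_set_link_xdp_fd(ifindex, -1, XDP_FLAGS_UPDATE_IF_NOEXIST);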

How to try
==========

1. take a kernel >= v5.1-rc1, build it, and replace your host
   kernel with it.
   
   make sure you turn on XDP sockets when compiling

   Networking support -->
        Networking options -->
                [ * ] XDP sockets
   
2. build libbpf in tools/lib/bpf, and copy the libbpf.a and libbpf.so to /usr/lib64

   cd tools/lib/bpf
   make

3. ethtool -L enp59s0f0 combined 1

   this restricts the NIC to a single combined channel, so that all
   traffic arrives on queue 0

4. extra step to build dpdk

   explicitly enable the AF_XDP PMD by adding the line below to
   config/common_linux

   CONFIG_RTE_LIBRTE_PMD_AF_XDP=y

5. start testpmd

   ./build/app/testpmd -c 0xc -n 4 --vdev net_af_xdp,iface=enp59s0f0,queue=0 -- -i --rxq=1 --txq=1

    in this case, the default XDP program will be loaded and linked to queue 0 of enp59s0f0;
    network traffic arriving on queue 0 will be redirected to the af_xdp socket.
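
    to verify that the default program is attached, inspect the interface
    (output format varies with the iproute2 version):

    $ ip link show enp59s0f0

    the link entry will include a prog/xdp line while the program is loaded.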

Xiaolong Ye (5):
  net/af_xdp: introduce AF XDP PMD driver
  lib/mbuf: introduce helper to create mempool with flags
  lib/mempool: allow page size aligned mempool
  net/af_xdp: use mbuf mempool for buffer management
  net/af_xdp: enable zero copy

 MAINTAINERS                                   |    6 +
 config/common_base                            |    5 +
 doc/guides/nics/af_xdp.rst                    |   45 +
 doc/guides/nics/features/af_xdp.ini           |   11 +
 doc/guides/nics/index.rst                     |    1 +
 doc/guides/rel_notes/release_19_05.rst        |    7 +
 drivers/net/Makefile                          |    1 +
 drivers/net/af_xdp/Makefile                   |   32 +
 drivers/net/af_xdp/meson.build                |   21 +
 drivers/net/af_xdp/rte_eth_af_xdp.c           | 1020 +++++++++++++++++
 drivers/net/af_xdp/rte_pmd_af_xdp_version.map |    3 +
 drivers/net/meson.build                       |    1 +
 lib/librte_mbuf/rte_mbuf.c                    |   29 +-
 lib/librte_mbuf/rte_mbuf.h                    |   45 +
 lib/librte_mbuf/rte_mbuf_version.map          |    1 +
 lib/librte_mempool/rte_mempool.c              |    3 +
 lib/librte_mempool/rte_mempool.h              |    1 +
 mk/rte.app.mk                                 |    1 +
 18 files changed, 1228 insertions(+), 5 deletions(-)
 create mode 100644 doc/guides/nics/af_xdp.rst
 create mode 100644 doc/guides/nics/features/af_xdp.ini
 create mode 100644 drivers/net/af_xdp/Makefile
 create mode 100644 drivers/net/af_xdp/meson.build
 create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
 create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map

-- 
2.17.1

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v6 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-26 12:20 ` [PATCH v6 0/5] Introduce AF_XDP PMD Xiaolong Ye
@ 2019-03-26 12:20   ` Xiaolong Ye
  2019-03-26 19:08     ` Stephen Hemminger
  2019-03-26 12:20   ` [PATCH v6 2/5] lib/mbuf: introduce helper to create mempool with flags Xiaolong Ye
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-26 12:20 UTC (permalink / raw)
  To: dev, David Marchand, Andrew Rybchenko
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Stephen Hemminger, Ferruh Yigit, Luca Boccassi, Bruce Richardson,
	Ananyev Konstantin, Xiaolong Ye

Add a new PMD driver for AF_XDP, which is a proposed faster version of
the AF_PACKET interface in Linux. For more information about AF_XDP,
please refer to [1] [2].

This is the vanilla version of the PMD, which just uses a raw buffer
registered as the umem.

[1] https://fosdem.org/2018/schedule/event/af_xdp/
[2] https://lwn.net/Articles/745934/

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
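
For context, the umem setup in this version boils down to registering a
single raw, page-aligned buffer with the kernel. A rough sketch (not the
patch code; NUM_FRAMES, fq and cq are placeholders, see
xdp_umem_configure() in the diff for the real version):

   struct xsk_umem *umem;
   struct xsk_umem_config cfg = {
           .fill_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
           .comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
           .frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE,
           .frame_headroom = 0,
   };
   void *bufs;

   /* one page-aligned raw buffer backs all umem frames */
   if (posix_memalign(&bufs, getpagesize(),
                      NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE))
           return -1;
   if (xsk_umem__create(&umem, bufs,
                        NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE,
                        &fq, &cq, &cfg))
           return -1;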
---
 MAINTAINERS                                   |   6 +
 config/common_base                            |   5 +
 doc/guides/nics/af_xdp.rst                    |  45 +
 doc/guides/nics/features/af_xdp.ini           |  11 +
 doc/guides/nics/index.rst                     |   1 +
 doc/guides/rel_notes/release_19_05.rst        |   7 +
 drivers/net/Makefile                          |   1 +
 drivers/net/af_xdp/Makefile                   |  32 +
 drivers/net/af_xdp/meson.build                |  21 +
 drivers/net/af_xdp/rte_eth_af_xdp.c           | 931 ++++++++++++++++++
 drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   3 +
 drivers/net/meson.build                       |   1 +
 mk/rte.app.mk                                 |   1 +
 13 files changed, 1065 insertions(+)
 create mode 100644 doc/guides/nics/af_xdp.rst
 create mode 100644 doc/guides/nics/features/af_xdp.ini
 create mode 100644 drivers/net/af_xdp/Makefile
 create mode 100644 drivers/net/af_xdp/meson.build
 create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
 create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 452b8eb82..1cc54b439 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -468,6 +468,12 @@ M: John W. Linville <linville@tuxdriver.com>
 F: drivers/net/af_packet/
 F: doc/guides/nics/features/afpacket.ini
 
+Linux AF_XDP
+M: Xiaolong Ye <xiaolong.ye@intel.com>
+M: Qi Zhang <qi.z.zhang@intel.com>
+F: drivers/net/af_xdp/
+F: doc/guides/nics/af_xdp.rst
+
 Amazon ENA
 M: Marcin Wojtas <mw@semihalf.com>
 M: Michal Krawczyk <mk@semihalf.com>
diff --git a/config/common_base b/config/common_base
index 0b09a9348..4044de205 100644
--- a/config/common_base
+++ b/config/common_base
@@ -416,6 +416,11 @@ CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX_FREE=n
 #
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
 
+#
+# Compile software PMD backed by AF_XDP sockets (Linux only)
+#
+CONFIG_RTE_LIBRTE_PMD_AF_XDP=n
+
 #
 # Compile link bonding PMD library
 #
diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
new file mode 100644
index 000000000..dd5654dd1
--- /dev/null
+++ b/doc/guides/nics/af_xdp.rst
@@ -0,0 +1,45 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2018 Intel Corporation.
+
+AF_XDP Poll Mode Driver
+==========================
+
+AF_XDP is an address family that is optimized for high performance
+packet processing. AF_XDP sockets enable an XDP program to redirect
+packets to a memory buffer in userspace.
+
+For the full details behind AF_XDP sockets, you can refer to
+`AF_XDP documentation in the Kernel
+<https://www.kernel.org/doc/Documentation/networking/af_xdp.rst>`_.
+
+This Linux-specific PMD creates the AF_XDP socket and binds it to a
+specific netdev queue. It allows a DPDK application to send and receive raw
+packets through the socket, bypassing the kernel network stack.
+The current implementation only supports a single queue; multi-queue
+support will be added later.
+
+Options
+-------
+
+The following options can be provided to set up an af_xdp port in DPDK.
+
+*   ``iface`` - name of the Kernel interface to attach to (required);
+*   ``queue`` - netdev queue id (optional, default 0);
+
+Prerequisites
+-------------
+
+This is a Linux-specific PMD, thus the following prerequisites apply:
+
+*  A Linux kernel (version > 4.18) with XDP sockets configuration enabled;
+*  libbpf (from kernel v5.1-rc1 or later) with the latest AF_XDP support installed;
+*  A kernel-bound network interface to attach to.
+
+Set up an af_xdp interface
+-----------------------------
+
+The following example will set up an af_xdp interface in DPDK:
+
+.. code-block:: console
+
+    --vdev net_af_xdp,iface=ens786f1,queue=0
diff --git a/doc/guides/nics/features/af_xdp.ini b/doc/guides/nics/features/af_xdp.ini
new file mode 100644
index 000000000..36953c2de
--- /dev/null
+++ b/doc/guides/nics/features/af_xdp.ini
@@ -0,0 +1,11 @@
+;
+; Supported features of the 'af_xdp' network poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Link status          = Y
+MTU update           = Y
+Promiscuous mode     = Y
+Stats per queue      = Y
+x86-64               = Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 5c80e3baa..a4b80a3d0 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -12,6 +12,7 @@ Network Interface Controller Drivers
     features
     build_and_test
     af_packet
+    af_xdp
     ark
     atlantic
     avp
diff --git a/doc/guides/rel_notes/release_19_05.rst b/doc/guides/rel_notes/release_19_05.rst
index 6f76de3ff..14317cf16 100644
--- a/doc/guides/rel_notes/release_19_05.rst
+++ b/doc/guides/rel_notes/release_19_05.rst
@@ -65,6 +65,13 @@ New Features
     process.
   * Added support for Rx packet types list in a secondary process.
 
+* **Added the AF_XDP PMD.**
+
+  Added a Linux-specific PMD for AF_XDP. It creates an AF_XDP socket bound
+  to a specific netdev queue, allowing a DPDK application to send and
+  receive raw packets through the socket while bypassing the kernel network
+  stack, to achieve high performance packet processing.
+
 * **Updated Mellanox drivers.**
 
    New features and improvements were done in mlx4 and mlx5 PMDs:
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 502869a87..5d401b8c5 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -9,6 +9,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD),d)
 endif
 
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += af_packet
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += af_xdp
 DIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark
 DIRS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD) += atlantic
 DIRS-$(CONFIG_RTE_LIBRTE_AVP_PMD) += avp
diff --git a/drivers/net/af_xdp/Makefile b/drivers/net/af_xdp/Makefile
new file mode 100644
index 000000000..db7d9aa57
--- /dev/null
+++ b/drivers/net/af_xdp/Makefile
@@ -0,0 +1,32 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_af_xdp.a
+
+EXPORT_MAP := rte_pmd_af_xdp_version.map
+
+LIBABIVER := 1
+
+CFLAGS += -O3
+
+# require kernel version >= v5.1-rc1
+CFLAGS += -I$(RTE_KERNELDIR)/tools/include
+CFLAGS += -I$(RTE_KERNELDIR)/tools/lib/bpf
+
+CFLAGS += $(WERROR_FLAGS)
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
+LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
+LDLIBS += -lrte_bus_vdev
+LDLIBS += -lbpf
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += rte_eth_af_xdp.c
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
new file mode 100644
index 000000000..7e51169b4
--- /dev/null
+++ b/drivers/net/af_xdp/meson.build
@@ -0,0 +1,21 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+if host_machine.system() != 'linux'
+	build = false
+endif
+
+bpf_dep = dependency('libbpf', required: false)
+if bpf_dep.found()
+	build = true
+else
+	bpf_dep = cc.find_library('bpf', required: false)
+	if bpf_dep.found() and cc.has_header('xsk.h', dependencies: bpf_dep) and cc.has_header('linux/if_xdp.h')
+		build = true
+		pkgconfig_extra_libs += '-lbpf'
+	else
+		build = false
+	endif
+endif
+sources = files('rte_eth_af_xdp.c')
+ext_deps += bpf_dep
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
new file mode 100644
index 000000000..47a496ed7
--- /dev/null
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -0,0 +1,931 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation.
+ */
+
+#include <rte_mbuf.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+#include <rte_string_fns.h>
+
+#include <linux/if_ether.h>
+#include <linux/if_xdp.h>
+#include <linux/if_link.h>
+#include <asm/barrier.h>
+#include <arpa/inet.h>
+#include <net/if.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <unistd.h>
+#include <poll.h>
+#include <bpf/bpf.h>
+#include <xsk.h>
+
+#ifndef SOL_XDP
+#define SOL_XDP 283
+#endif
+
+#ifndef AF_XDP
+#define AF_XDP 44
+#endif
+
+#ifndef PF_XDP
+#define PF_XDP AF_XDP
+#endif
+
+static int af_xdp_logtype;
+
+#define AF_XDP_LOG(level, fmt, args...)			\
+	rte_log(RTE_LOG_ ## level, af_xdp_logtype,	\
+		"%s(): " fmt, __func__, ##args)
+
+#define ETH_AF_XDP_IFACE_ARG			"iface"
+#define ETH_AF_XDP_QUEUE_IDX_ARG		"queue"
+
+#define ETH_AF_XDP_FRAME_SIZE		XSK_UMEM__DEFAULT_FRAME_SIZE
+#define ETH_AF_XDP_NUM_BUFFERS		4096
+#define ETH_AF_XDP_DATA_HEADROOM	0
+#define ETH_AF_XDP_DFLT_NUM_DESCS	XSK_RING_CONS__DEFAULT_NUM_DESCS
+#define ETH_AF_XDP_DFLT_QUEUE_IDX	0
+
+#define ETH_AF_XDP_RX_BATCH_SIZE	32
+#define ETH_AF_XDP_TX_BATCH_SIZE	32
+
+#define ETH_AF_XDP_MAX_QUEUE_PAIRS     16
+
+struct xsk_umem_info {
+	struct xsk_ring_prod fq;
+	struct xsk_ring_cons cq;
+	struct xsk_umem *umem;
+	struct rte_ring *buf_ring;
+	void *buffer;
+};
+
+struct rx_stats {
+	uint64_t rx_pkts;
+	uint64_t rx_bytes;
+	uint64_t rx_dropped;
+};
+
+struct pkt_rx_queue {
+	struct xsk_ring_cons rx;
+	struct xsk_umem_info *umem;
+	struct xsk_socket *xsk;
+	struct rte_mempool *mb_pool;
+
+	struct rx_stats stats;
+
+	struct pkt_tx_queue *pair;
+	uint16_t queue_idx;
+};
+
+struct tx_stats {
+	uint64_t tx_pkts;
+	uint64_t err_pkts;
+	uint64_t tx_bytes;
+};
+
+struct pkt_tx_queue {
+	struct xsk_ring_prod tx;
+
+	struct tx_stats stats;
+
+	struct pkt_rx_queue *pair;
+	uint16_t queue_idx;
+};
+
+struct pmd_internals {
+	int if_index;
+	char if_name[IFNAMSIZ];
+	uint16_t queue_idx;
+	struct ether_addr eth_addr;
+	struct xsk_umem_info *umem;
+	struct rte_mempool *mb_pool_share;
+
+	struct pkt_rx_queue rx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
+	struct pkt_tx_queue tx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
+};
+
+static const char * const valid_arguments[] = {
+	ETH_AF_XDP_IFACE_ARG,
+	ETH_AF_XDP_QUEUE_IDX_ARG,
+	NULL
+};
+
+static struct rte_eth_link pmd_link = {
+	.link_speed = ETH_SPEED_NUM_10G,
+	.link_duplex = ETH_LINK_FULL_DUPLEX,
+	.link_status = ETH_LINK_DOWN,
+	.link_autoneg = ETH_LINK_AUTONEG
+};
+
+static inline int
+reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
+{
+	struct xsk_ring_prod *fq = &umem->fq;
+	uint32_t idx;
+	int i, ret;
+
+	ret = xsk_ring_prod__reserve(fq, reserve_size, &idx);
+	if (unlikely(!ret)) {
+		AF_XDP_LOG(ERR, "Failed to reserve enough fq descs.\n");
+		return -1;
+	}
+
+	for (i = 0; i < reserve_size; i++) {
+		__u64 *fq_addr;
+		void *addr = NULL;
+		if (rte_ring_dequeue(umem->buf_ring, &addr)) {
+			/* submit only the i entries filled so far */
+			break;
+		}
+		fq_addr = xsk_ring_prod__fill_addr(fq, idx++);
+		*fq_addr = (uint64_t)addr;
+	}
+
+	xsk_ring_prod__submit(fq, i);
+
+	return 0;
+}
+
+static uint16_t
+eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+	struct pkt_rx_queue *rxq = queue;
+	struct xsk_ring_cons *rx = &rxq->rx;
+	struct xsk_umem_info *umem = rxq->umem;
+	struct xsk_ring_prod *fq = &umem->fq;
+	uint32_t idx_rx;
+	uint32_t free_thresh = fq->size >> 1;
+	struct rte_mbuf *mbufs[ETH_AF_XDP_RX_BATCH_SIZE];
+	unsigned long dropped = 0;
+	unsigned long rx_bytes = 0;
+	uint16_t count = 0;
+	int rcvd, i;
+
+	nb_pkts = RTE_MIN(nb_pkts, ETH_AF_XDP_RX_BATCH_SIZE);
+
+	rcvd = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
+	if (rcvd == 0)
+		return 0;
+
+	if (xsk_prod_nb_free(fq, free_thresh) >= free_thresh)
+		(void)reserve_fill_queue(umem, ETH_AF_XDP_RX_BATCH_SIZE);
+
+	if (unlikely(rte_pktmbuf_alloc_bulk(rxq->mb_pool, mbufs, rcvd) != 0))
+		return 0;
+
+	for (i = 0; i < rcvd; i++) {
+		const struct xdp_desc *desc;
+		uint64_t addr;
+		uint32_t len;
+		void *pkt;
+
+		desc = xsk_ring_cons__rx_desc(rx, idx_rx++);
+		addr = desc->addr;
+		len = desc->len;
+		pkt = xsk_umem__get_data(rxq->umem->buffer, addr);
+
+		rte_memcpy(rte_pktmbuf_mtod(mbufs[i], void *), pkt, len);
+		rte_pktmbuf_pkt_len(mbufs[i]) = len;
+		rte_pktmbuf_data_len(mbufs[i]) = len;
+		rx_bytes += len;
+		bufs[count++] = mbufs[i];
+
+		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+	}
+
+	xsk_ring_cons__release(rx, rcvd);
+
+	/* statistics */
+	rxq->stats.rx_pkts += (rcvd - dropped);
+	rxq->stats.rx_bytes += rx_bytes;
+
+	return count;
+}
+
+static void pull_umem_cq(struct xsk_umem_info *umem, int size)
+{
+	struct xsk_ring_cons *cq = &umem->cq;
+	size_t i, n;
+	uint32_t idx_cq;
+
+	n = xsk_ring_cons__peek(cq, size, &idx_cq);
+
+	for (i = 0; i < n; i++) {
+		uint64_t addr;
+		addr = *xsk_ring_cons__comp_addr(cq, idx_cq++);
+		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+	}
+
+	xsk_ring_cons__release(cq, n);
+}
+
+static void kick_tx(struct pkt_tx_queue *txq)
+{
+	struct xsk_umem_info *umem = txq->pair->umem;
+
+	while (send(xsk_socket__fd(txq->pair->xsk), NULL,
+		      0, MSG_DONTWAIT) < 0) {
+		/* something unexpected happened */
+		if (errno != EBUSY && errno != EAGAIN && errno != EINTR)
+			break;
+
+		/* pull from the completion queue to free more space */
+		if (errno == EAGAIN)
+			pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
+	}
+	pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
+}
+
+static uint16_t
+eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+	struct pkt_tx_queue *txq = queue;
+	struct xsk_umem_info *umem = txq->pair->umem;
+	struct rte_mbuf *mbuf;
+	void *addrs[ETH_AF_XDP_TX_BATCH_SIZE];
+	unsigned long tx_bytes = 0;
+	int i, valid = 0;
+	uint32_t idx_tx;
+
+	nb_pkts = RTE_MIN(nb_pkts, ETH_AF_XDP_TX_BATCH_SIZE);
+
+	pull_umem_cq(umem, nb_pkts);
+
+	nb_pkts = rte_ring_dequeue_bulk(umem->buf_ring, addrs,
+					nb_pkts, NULL);
+	if (nb_pkts == 0)
+		return 0;
+
+	if (xsk_ring_prod__reserve(&txq->tx, nb_pkts, &idx_tx) != nb_pkts) {
+		kick_tx(txq);
+		return 0;
+	}
+
+	for (i = 0; i < nb_pkts; i++) {
+		struct xdp_desc *desc;
+		void *pkt;
+		uint32_t buf_len = ETH_AF_XDP_FRAME_SIZE
+					- ETH_AF_XDP_DATA_HEADROOM;
+		desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i);
+		mbuf = bufs[i];
+		if (mbuf->pkt_len <= buf_len) {
+			desc->addr = (uint64_t)addrs[valid];
+			desc->len = mbuf->pkt_len;
+			pkt = xsk_umem__get_data(umem->buffer,
+						 desc->addr);
+			rte_memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
+			       desc->len);
+			valid++;
+			tx_bytes += mbuf->pkt_len;
+		}
+		rte_pktmbuf_free(mbuf);
+	}
+
+	xsk_ring_prod__submit(&txq->tx, nb_pkts);
+
+	kick_tx(txq);
+
+	if (valid < nb_pkts)
+		rte_ring_enqueue_bulk(umem->buf_ring, &addrs[valid],
+				 nb_pkts - valid, NULL);
+
+	txq->stats.err_pkts += nb_pkts - valid;
+	txq->stats.tx_pkts += valid;
+	txq->stats.tx_bytes += tx_bytes;
+
+	return nb_pkts;
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = ETH_LINK_UP;
+
+	return 0;
+}
+
+/* This function gets called when the current port gets stopped. */
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = ETH_LINK_DOWN;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev)
+{
+	/* rx/tx must be paired */
+	if (dev->data->nb_rx_queues != dev->data->nb_tx_queues)
+		return -EINVAL;
+
+	return 0;
+}
+
+static void
+eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	dev_info->if_index = internals->if_index;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_pktlen = ETH_FRAME_LEN;
+	dev_info->max_rx_queues = 1;
+	dev_info->max_tx_queues = 1;
+
+	dev_info->default_rxportconf.nb_queues = 1;
+	dev_info->default_txportconf.nb_queues = 1;
+	dev_info->default_rxportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
+	dev_info->default_txportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
+}
+
+static int
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct xdp_statistics xdp_stats;
+	struct pkt_rx_queue *rxq;
+	socklen_t optlen;
+	int i, ret;
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		optlen = sizeof(struct xdp_statistics);
+		rxq = &internals->rx_queues[i];
+		stats->q_ipackets[i] = internals->rx_queues[i].stats.rx_pkts;
+		stats->q_ibytes[i] = internals->rx_queues[i].stats.rx_bytes;
+
+		stats->q_opackets[i] = internals->tx_queues[i].stats.tx_pkts;
+		stats->q_obytes[i] = internals->tx_queues[i].stats.tx_bytes;
+
+		stats->ipackets += stats->q_ipackets[i];
+		stats->ibytes += stats->q_ibytes[i];
+		stats->imissed += internals->rx_queues[i].stats.rx_dropped;
+		ret = getsockopt(xsk_socket__fd(rxq->xsk), SOL_XDP,
+				XDP_STATISTICS, &xdp_stats, &optlen);
+		if (ret != 0) {
+			AF_XDP_LOG(ERR, "getsockopt() failed for XDP_STATISTICS.\n");
+			return -1;
+		}
+		stats->imissed += xdp_stats.rx_dropped;
+
+		stats->opackets += stats->q_opackets[i];
+		stats->oerrors += internals->tx_queues[i].stats.err_pkts;
+		stats->obytes += stats->q_obytes[i];
+	}
+
+	return 0;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	int i;
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		memset(&internals->rx_queues[i].stats, 0,
+					sizeof(struct rx_stats));
+		memset(&internals->tx_queues[i].stats, 0,
+					sizeof(struct tx_stats));
+	}
+}
+
+static void remove_xdp_program(struct pmd_internals *internals)
+{
+	uint32_t curr_prog_id = 0;
+
+	if (bpf_get_link_xdp_id(internals->if_index, &curr_prog_id,
+				XDP_FLAGS_UPDATE_IF_NOEXIST)) {
+		AF_XDP_LOG(ERR, "bpf_get_link_xdp_id failed\n");
+		return;
+	}
+	bpf_set_link_xdp_fd(internals->if_index, -1,
+			XDP_FLAGS_UPDATE_IF_NOEXIST);
+}
+
+static void
+eth_dev_close(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct pkt_rx_queue *rxq;
+	int i;
+
+	AF_XDP_LOG(INFO, "Closing AF_XDP ethdev on numa socket %u\n",
+		rte_socket_id());
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		rxq = &internals->rx_queues[i];
+		if (rxq->umem == NULL)
+			break;
+		xsk_socket__delete(rxq->xsk);
+	}
+
+	(void)xsk_umem__delete(internals->umem->umem);
+	remove_xdp_program(internals);
+}
+
+static void
+eth_queue_release(void *q __rte_unused)
+{
+}
+
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+		int wait_to_complete __rte_unused)
+{
+	return 0;
+}
+
+static void xdp_umem_destroy(struct xsk_umem_info *umem)
+{
+	free(umem->buffer);
+	umem->buffer = NULL;
+
+	rte_ring_free(umem->buf_ring);
+	umem->buf_ring = NULL;
+
+	rte_free(umem);
+	umem = NULL;
+}
+
+static struct xsk_umem_info *xdp_umem_configure(void)
+{
+	struct xsk_umem_info *umem;
+	struct xsk_umem_config usr_config = {
+		.fill_size = ETH_AF_XDP_DFLT_NUM_DESCS,
+		.comp_size = ETH_AF_XDP_DFLT_NUM_DESCS,
+		.frame_size = ETH_AF_XDP_FRAME_SIZE,
+		.frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
+	void *bufs = NULL;
+	int ret;
+	uint64_t i;
+
+	umem = rte_zmalloc_socket("umem", sizeof(*umem), 0, rte_socket_id());
+	if (umem == NULL) {
+		AF_XDP_LOG(ERR, "Failed to allocate umem info");
+		return NULL;
+	}
+
+	umem->buf_ring = rte_ring_create("af_xdp_ring",
+					 ETH_AF_XDP_NUM_BUFFERS,
+					 SOCKET_ID_ANY,
+					 0x0);
+	if (umem->buf_ring == NULL) {
+		AF_XDP_LOG(ERR, "Failed to create rte_ring\n");
+		goto err;
+	}
+
+	for (i = 0; i < ETH_AF_XDP_NUM_BUFFERS; i++)
+		rte_ring_enqueue(umem->buf_ring,
+				 (void *)(i * ETH_AF_XDP_FRAME_SIZE +
+					  ETH_AF_XDP_DATA_HEADROOM));
+
+	if (posix_memalign(&bufs, getpagesize(),
+			   ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE)) {
+		AF_XDP_LOG(ERR, "Failed to allocate memory pool.\n");
+		goto err;
+	}
+	ret = xsk_umem__create(&umem->umem, bufs,
+			       ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
+			       &umem->fq, &umem->cq,
+			       &usr_config);
+
+	if (ret) {
+		AF_XDP_LOG(ERR, "Failed to create umem");
+		goto err;
+	}
+	umem->buffer = bufs;
+
+	return umem;
+
+err:
+	xdp_umem_destroy(umem);
+	return NULL;
+}
+
+static int
+xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
+	      int ring_size)
+{
+	struct xsk_socket_config cfg;
+	struct pkt_tx_queue *txq = rxq->pair;
+	int ret = 0;
+	int reserve_size;
+
+	rxq->umem = xdp_umem_configure();
+	if (rxq->umem == NULL)
+		return -ENOMEM;
+
+	cfg.rx_size = ring_size;
+	cfg.tx_size = ring_size;
+	cfg.libbpf_flags = 0;
+	cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
+	cfg.bind_flags = 0;
+	ret = xsk_socket__create(&rxq->xsk, internals->if_name,
+			internals->queue_idx, rxq->umem->umem, &rxq->rx,
+			&txq->tx, &cfg);
+	if (ret) {
+		AF_XDP_LOG(ERR, "Failed to create xsk socket.\n");
+		goto err;
+	}
+
+	reserve_size = ETH_AF_XDP_DFLT_NUM_DESCS / 2;
+	ret = reserve_fill_queue(rxq->umem, reserve_size);
+	if (ret) {
+		xsk_socket__delete(rxq->xsk);
+		AF_XDP_LOG(ERR, "Failed to reserve fill queue.\n");
+		goto err;
+	}
+
+	return 0;
+
+err:
+	xdp_umem_destroy(rxq->umem);
+
+	return ret;
+}
+
+static void
+queue_reset(struct pmd_internals *internals, uint16_t queue_idx)
+{
+	struct pkt_rx_queue *rxq = &internals->rx_queues[queue_idx];
+	struct pkt_tx_queue *txq = rxq->pair;
+
+	memset(rxq, 0, sizeof(*rxq));
+	memset(txq, 0, sizeof(*txq));
+	rxq->pair = txq;
+	txq->pair = rxq;
+	rxq->queue_idx = queue_idx;
+	txq->queue_idx = queue_idx;
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev,
+		   uint16_t rx_queue_id,
+		   uint16_t nb_rx_desc,
+		   unsigned int socket_id __rte_unused,
+		   const struct rte_eth_rxconf *rx_conf __rte_unused,
+		   struct rte_mempool *mb_pool)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	uint32_t buf_size, data_size;
+	struct pkt_rx_queue *rxq;
+	int ret;
+
+	rxq = &internals->rx_queues[rx_queue_id];
+	queue_reset(internals, rx_queue_id);
+
+	/* Now get the space available for data in the mbuf */
+	buf_size = rte_pktmbuf_data_room_size(mb_pool) -
+		RTE_PKTMBUF_HEADROOM;
+	data_size = ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_DATA_HEADROOM;
+
+	if (data_size > buf_size) {
+		AF_XDP_LOG(ERR, "%s: %d bytes will not fit in mbuf (%d bytes)\n",
+			dev->device->name, data_size, buf_size);
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	rxq->mb_pool = mb_pool;
+
+	if (xsk_configure(internals, rxq, nb_rx_desc)) {
+		AF_XDP_LOG(ERR, "Failed to configure xdp socket\n");
+		ret = -EINVAL;
+		goto err;
+	}
+
+	internals->umem = rxq->umem;
+
+	dev->data->rx_queues[rx_queue_id] = rxq;
+	return 0;
+
+err:
+	queue_reset(internals, rx_queue_id);
+	return ret;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev,
+		   uint16_t tx_queue_id,
+		   uint16_t nb_tx_desc __rte_unused,
+		   unsigned int socket_id __rte_unused,
+		   const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct pkt_tx_queue *txq;
+
+	txq = &internals->tx_queues[tx_queue_id];
+
+	dev->data->tx_queues[tx_queue_id] = txq;
+	return 0;
+}
+
+static int
+eth_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct ifreq ifr = { .ifr_mtu = mtu };
+	int ret;
+	int s;
+
+	s = socket(PF_INET, SOCK_DGRAM, 0);
+	if (s < 0)
+		return -EINVAL;
+
+	strlcpy(ifr.ifr_name, internals->if_name, IFNAMSIZ);
+	ret = ioctl(s, SIOCSIFMTU, &ifr);
+	close(s);
+
+	return (ret < 0) ? -errno : 0;
+}
+
+static void
+eth_dev_change_flags(char *if_name, uint32_t flags, uint32_t mask)
+{
+	struct ifreq ifr;
+	int s;
+
+	s = socket(PF_INET, SOCK_DGRAM, 0);
+	if (s < 0)
+		return;
+
+	strlcpy(ifr.ifr_name, if_name, IFNAMSIZ);
+	if (ioctl(s, SIOCGIFFLAGS, &ifr) < 0)
+		goto out;
+	ifr.ifr_flags &= mask;
+	ifr.ifr_flags |= flags;
+	if (ioctl(s, SIOCSIFFLAGS, &ifr) < 0)
+		goto out;
+out:
+	close(s);
+}
+
+static void
+eth_dev_promiscuous_enable(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	eth_dev_change_flags(internals->if_name, IFF_PROMISC, ~0);
+}
+
+static void
+eth_dev_promiscuous_disable(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	eth_dev_change_flags(internals->if_name, 0, ~IFF_PROMISC);
+}
+
+static const struct eth_dev_ops ops = {
+	.dev_start = eth_dev_start,
+	.dev_stop = eth_dev_stop,
+	.dev_close = eth_dev_close,
+	.dev_configure = eth_dev_configure,
+	.dev_infos_get = eth_dev_info,
+	.mtu_set = eth_dev_mtu_set,
+	.promiscuous_enable = eth_dev_promiscuous_enable,
+	.promiscuous_disable = eth_dev_promiscuous_disable,
+	.rx_queue_setup = eth_rx_queue_setup,
+	.tx_queue_setup = eth_tx_queue_setup,
+	.rx_queue_release = eth_queue_release,
+	.tx_queue_release = eth_queue_release,
+	.link_update = eth_link_update,
+	.stats_get = eth_stats_get,
+	.stats_reset = eth_stats_reset,
+};
+
+/** parse integer from integer argument */
+static int
+parse_integer_arg(const char *key __rte_unused,
+		  const char *value, void *extra_args)
+{
+	int *i = (int *)extra_args;
+	char *end;
+
+	*i = strtol(value, &end, 10);
+	if (*i < 0) {
+		AF_XDP_LOG(ERR, "Argument has to be positive.\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/** parse name argument */
+static int
+parse_name_arg(const char *key __rte_unused,
+	       const char *value, void *extra_args)
+{
+	char *name = extra_args;
+
+	if (strnlen(value, IFNAMSIZ) > IFNAMSIZ - 1) {
+		AF_XDP_LOG(ERR, "Invalid name %s, should be less than %u bytes.\n",
+			   value, IFNAMSIZ);
+		return -EINVAL;
+	}
+
+	strlcpy(name, value, IFNAMSIZ);
+
+	return 0;
+}
+
+static int
+parse_parameters(struct rte_kvargs *kvlist,
+		 char *if_name,
+		 int *queue_idx)
+{
+	int ret;
+
+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_IFACE_ARG,
+				 &parse_name_arg, if_name);
+	if (ret < 0)
+		goto free_kvlist;
+
+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_QUEUE_IDX_ARG,
+				 &parse_integer_arg, queue_idx);
+	if (ret < 0)
+		goto free_kvlist;
+
+free_kvlist:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
+static int
+get_iface_info(const char *if_name,
+	       struct ether_addr *eth_addr,
+	       int *if_index)
+{
+	struct ifreq ifr;
+	int sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_IP);
+
+	if (sock < 0)
+		return -1;
+
+	strlcpy(ifr.ifr_name, if_name, IFNAMSIZ);
+	if (ioctl(sock, SIOCGIFINDEX, &ifr))
+		goto error;
+
+	*if_index = ifr.ifr_ifindex;
+
+	if (ioctl(sock, SIOCGIFHWADDR, &ifr))
+		goto error;
+
+	rte_memcpy(eth_addr, ifr.ifr_hwaddr.sa_data, ETHER_ADDR_LEN);
+
+	close(sock);
+	return 0;
+
+error:
+	close(sock);
+	return -1;
+}
+
+static struct rte_eth_dev *
+init_internals(struct rte_vdev_device *dev,
+	       const char *if_name,
+	       int queue_idx)
+{
+	const char *name = rte_vdev_device_name(dev);
+	const unsigned int numa_node = dev->device.numa_node;
+	struct pmd_internals *internals;
+	struct rte_eth_dev *eth_dev;
+	int ret;
+	int i;
+
+	internals = rte_zmalloc_socket(name, sizeof(*internals), 0, numa_node);
+	if (internals == NULL)
+		return NULL;
+
+	internals->queue_idx = queue_idx;
+	strlcpy(internals->if_name, if_name, IFNAMSIZ);
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		internals->tx_queues[i].pair = &internals->rx_queues[i];
+		internals->rx_queues[i].pair = &internals->tx_queues[i];
+	}
+
+	ret = get_iface_info(if_name, &internals->eth_addr,
+			     &internals->if_index);
+	if (ret)
+		goto err;
+
+	eth_dev = rte_eth_vdev_allocate(dev, 0);
+	if (eth_dev == NULL)
+		goto err;
+
+	eth_dev->data->dev_private = internals;
+	eth_dev->data->dev_link = pmd_link;
+	eth_dev->data->mac_addrs = &internals->eth_addr;
+	eth_dev->dev_ops = &ops;
+	eth_dev->rx_pkt_burst = eth_af_xdp_rx;
+	eth_dev->tx_pkt_burst = eth_af_xdp_tx;
+
+	return eth_dev;
+
+err:
+	rte_free(internals);
+	return NULL;
+}
+
+static int
+rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
+{
+	struct rte_kvargs *kvlist;
+	char if_name[IFNAMSIZ];
+	int xsk_queue_idx = ETH_AF_XDP_DFLT_QUEUE_IDX;
+	struct rte_eth_dev *eth_dev = NULL;
+	const char *name;
+
+	AF_XDP_LOG(INFO, "Initializing pmd_af_xdp for %s\n",
+		rte_vdev_device_name(dev));
+
+	name = rte_vdev_device_name(dev);
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY &&
+		strlen(rte_vdev_device_args(dev)) == 0) {
+		eth_dev = rte_eth_dev_attach_secondary(name);
+		if (eth_dev == NULL) {
+			AF_XDP_LOG(ERR, "Failed to probe %s\n", name);
+			return -EINVAL;
+		}
+		eth_dev->dev_ops = &ops;
+		rte_eth_dev_probing_finish(eth_dev);
+		return 0;
+	}
+
+	kvlist = rte_kvargs_parse(rte_vdev_device_args(dev), valid_arguments);
+	if (kvlist == NULL) {
+		AF_XDP_LOG(ERR, "Invalid kvargs key\n");
+		return -EINVAL;
+	}
+
+	if (dev->device.numa_node == SOCKET_ID_ANY)
+		dev->device.numa_node = rte_socket_id();
+
+	if (parse_parameters(kvlist, if_name, &xsk_queue_idx) < 0) {
+		AF_XDP_LOG(ERR, "Invalid kvargs value\n");
+		return -EINVAL;
+	}
+
+	eth_dev = init_internals(dev, if_name, xsk_queue_idx);
+	if (eth_dev == NULL) {
+		AF_XDP_LOG(ERR, "Failed to init internals\n");
+		return -1;
+	}
+
+	rte_eth_dev_probing_finish(eth_dev);
+
+	return 0;
+}
+
+static int
+rte_pmd_af_xdp_remove(struct rte_vdev_device *dev)
+{
+	struct rte_eth_dev *eth_dev = NULL;
+	struct pmd_internals *internals;
+
+	AF_XDP_LOG(INFO, "Removing AF_XDP ethdev on numa socket %u\n",
+		rte_socket_id());
+
+	if (dev == NULL)
+		return -1;
+
+	/* find the ethdev entry */
+	eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(dev));
+	if (eth_dev == NULL)
+		return -1;
+
+	internals = eth_dev->data->dev_private;
+
+	rte_ring_free(internals->umem->buf_ring);
+	rte_free(internals->umem->buffer);
+	rte_free(internals->umem);
+
+	rte_eth_dev_release_port(eth_dev);
+
+
+	return 0;
+}
+
+static struct rte_vdev_driver pmd_af_xdp_drv = {
+	.probe = rte_pmd_af_xdp_probe,
+	.remove = rte_pmd_af_xdp_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(net_af_xdp, pmd_af_xdp_drv);
+RTE_PMD_REGISTER_PARAM_STRING(net_af_xdp,
+			      "iface=<string> "
+			      "queue=<int> ");
+
+RTE_INIT(af_xdp_init_log)
+{
+	af_xdp_logtype = rte_log_register("pmd.net.af_xdp");
+	if (af_xdp_logtype >= 0)
+		rte_log_set_level(af_xdp_logtype, RTE_LOG_NOTICE);
+}
diff --git a/drivers/net/af_xdp/rte_pmd_af_xdp_version.map b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
new file mode 100644
index 000000000..c6db030fe
--- /dev/null
+++ b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
@@ -0,0 +1,3 @@
+DPDK_19.05 {
+	local: *;
+};
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index 3ecc78cee..1105e72d8 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2017 Intel Corporation
 
 drivers = ['af_packet',
+	'af_xdp',
 	'ark',
 	'atlantic',
 	'avp',
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 262132fc6..be0af73cc 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -143,6 +143,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_DPAA2_MEMPOOL)  += -lrte_mempool_dpaa2
 endif
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     += -lrte_pmd_af_xdp -lelf -lbpf
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ARK_PMD)        += -lrte_pmd_ark
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD)   += -lrte_pmd_atlantic
 _LDLIBS-$(CONFIG_RTE_LIBRTE_AVP_PMD)        += -lrte_pmd_avp
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* [PATCH v6 2/5] lib/mbuf: introduce helper to create mempool with flags
  2019-03-26 12:20 ` [PATCH v6 0/5] Introduce AF_XDP PMD Xiaolong Ye
  2019-03-26 12:20   ` [PATCH v6 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
@ 2019-03-26 12:20   ` Xiaolong Ye
  2019-03-26 12:20   ` [PATCH v6 3/5] lib/mempool: allow page size aligned mempool Xiaolong Ye
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-26 12:20 UTC (permalink / raw)
  To: dev, David Marchand, Andrew Rybchenko
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Stephen Hemminger, Ferruh Yigit, Luca Boccassi, Bruce Richardson,
	Ananyev Konstantin, Xiaolong Ye

This allows applications to create an mbuf mempool with specific flags,
such as MEMPOOL_F_NO_SPREAD, if they want fixed-size memory objects.

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
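
For example (illustrative usage only, sizes are arbitrary), an
application that wants fixed-size objects can now do:

   struct rte_mempool *mp;

   /* like rte_pktmbuf_pool_create(), plus explicit mempool flags */
   mp = rte_pktmbuf_pool_create_with_flags("example_pool", 4095, 250, 0,
                                           RTE_MBUF_DEFAULT_BUF_SIZE,
                                           MEMPOOL_F_NO_SPREAD,
                                           SOCKET_ID_ANY);
   if (mp == NULL)
           rte_exit(EXIT_FAILURE, "pool creation failed: %d\n", rte_errno);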
---
 lib/librte_mbuf/rte_mbuf.c           | 29 ++++++++++++++----
 lib/librte_mbuf/rte_mbuf.h           | 45 ++++++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf_version.map |  1 +
 3 files changed, 70 insertions(+), 5 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 21f6f7404..c1db9e298 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -106,11 +106,10 @@ rte_pktmbuf_init(struct rte_mempool *mp,
 	m->next = NULL;
 }
 
-/* Helper to create a mbuf pool with given mempool ops name*/
-struct rte_mempool *
-rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
+static struct rte_mempool *
+rte_pktmbuf_pool_create_by_ops_with_flags(const char *name, unsigned int n,
 	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
-	int socket_id, const char *ops_name)
+	unsigned int flags, int socket_id, const char *ops_name)
 {
 	struct rte_mempool *mp;
 	struct rte_pktmbuf_pool_private mbp_priv;
@@ -130,7 +129,7 @@ rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
 	mbp_priv.mbuf_priv_size = priv_size;
 
 	mp = rte_mempool_create_empty(name, n, elt_size, cache_size,
-		 sizeof(struct rte_pktmbuf_pool_private), socket_id, 0);
+		 sizeof(struct rte_pktmbuf_pool_private), socket_id, flags);
 	if (mp == NULL)
 		return NULL;
 
@@ -157,6 +156,16 @@ rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
 	return mp;
 }
 
+/* Helper to create a mbuf pool with given mempool ops name*/
+struct rte_mempool *
+rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
+	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
+	int socket_id, const char *ops_name)
+{
+	return rte_pktmbuf_pool_create_by_ops_with_flags(name, n, cache_size,
+			 priv_size, data_room_size, 0, socket_id, ops_name);
+}
+
 /* helper to create a mbuf pool */
 struct rte_mempool *
 rte_pktmbuf_pool_create(const char *name, unsigned int n,
@@ -167,6 +176,16 @@ rte_pktmbuf_pool_create(const char *name, unsigned int n,
 			data_room_size, socket_id, NULL);
 }
 
+/* helper to create a mbuf pool with flags (e.g. NO_SPREAD) */
+struct rte_mempool * __rte_experimental
+rte_pktmbuf_pool_create_with_flags(const char *name, unsigned int n,
+	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
+	unsigned int flags, int socket_id)
+{
+	return rte_pktmbuf_pool_create_by_ops_with_flags(name, n, cache_size,
+			priv_size, data_room_size, flags, socket_id, NULL);
+}
+
 /* do some sanity checks on a mbuf: panic if it fails */
 void
 rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header)
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index d961ccaf6..105ead6de 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1266,6 +1266,51 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
 	unsigned cache_size, uint16_t priv_size, uint16_t data_room_size,
 	int socket_id);
 
+/**
+ * Create a mbuf pool with flags.
+ *
+ * This function creates and initializes a packet mbuf pool. It is
+ * a wrapper to rte_mempool functions.
+ *
+ * @warning
+ * @b EXPERIMENTAL: This API may change without prior notice.
+ *
+ * @param name
+ *   The name of the mbuf pool.
+ * @param n
+ *   The number of elements in the mbuf pool. The optimum size (in terms
+ *   of memory usage) for a mempool is when n is a power of two minus one:
+ *   n = (2^q - 1).
+ * @param cache_size
+ *   Size of the per-core object cache. See rte_mempool_create() for
+ *   details.
+ * @param priv_size
+ *   Size of application private area between the rte_mbuf structure
+ *   and the data buffer. This value must be aligned to RTE_MBUF_PRIV_ALIGN.
+ * @param data_room_size
+ *   Size of data buffer in each mbuf, including RTE_PKTMBUF_HEADROOM.
+ * @param flags
+ *   Flags controlling the behavior of the mempool. See
+ *   rte_mempool_create() for details.
+ * @param socket_id
+ *   The socket identifier where the memory should be allocated. The
+ *   value can be *SOCKET_ID_ANY* if there is no NUMA constraint for the
+ *   reserved zone.
+ * @return
+ *   The pointer to the new allocated mempool, on success. NULL on error
+ *   with rte_errno set appropriately. Possible rte_errno values include:
+ *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
+ *    - E_RTE_SECONDARY - function was called from a secondary process instance
+ *    - EINVAL - cache size provided is too large, or priv_size is not aligned.
+ *    - ENOSPC - the maximum number of memzones has already been allocated
+ *    - EEXIST - a memzone with the same name already exists
+ *    - ENOMEM - no appropriate memory area found in which to create memzone
+ */
+struct rte_mempool * __rte_experimental
+rte_pktmbuf_pool_create_with_flags(const char *name, unsigned int n,
+	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
+	unsigned int flags, int socket_id);
+
 /**
  * Create a mbuf pool with a given mempool ops name
  *
diff --git a/lib/librte_mbuf/rte_mbuf_version.map b/lib/librte_mbuf/rte_mbuf_version.map
index 2662a37bf..2579538e0 100644
--- a/lib/librte_mbuf/rte_mbuf_version.map
+++ b/lib/librte_mbuf/rte_mbuf_version.map
@@ -50,4 +50,5 @@ EXPERIMENTAL {
 	global:
 
 	rte_mbuf_check;
+	rte_pktmbuf_pool_create_with_flags;
 } DPDK_18.08;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* [PATCH v6 3/5] lib/mempool: allow page size aligned mempool
  2019-03-26 12:20 ` [PATCH v6 0/5] Introduce AF_XDP PMD Xiaolong Ye
  2019-03-26 12:20   ` [PATCH v6 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
  2019-03-26 12:20   ` [PATCH v6 2/5] lib/mbuf: introduce helper to create mempool with flags Xiaolong Ye
@ 2019-03-26 12:20   ` Xiaolong Ye
  2019-03-26 12:20   ` [PATCH v6 4/5] net/af_xdp: use mbuf mempool for buffer management Xiaolong Ye
  2019-03-26 12:20   ` [PATCH v6 5/5] net/af_xdp: enable zero copy Xiaolong Ye
  4 siblings, 0 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-26 12:20 UTC (permalink / raw)
  To: dev, David Marchand, Andrew Rybchenko
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Stephen Hemminger, Ferruh Yigit, Luca Boccassi, Bruce Richardson,
	Ananyev Konstantin, Xiaolong Ye

Allow creating a mempool with a page size aligned base address.

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
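
For illustration (a sketch, not code from this patch), combined with the
helper from patch 2/5 this lets the af_xdp PMD request a chunk whose
page-aligned base address can be registered directly as an umem; the
constants are the driver's, and the exact call in patch 4/5 may differ:

   struct rte_mempool *mp;

   /* each element then occupies exactly one umem frame */
   mp = rte_pktmbuf_pool_create_with_flags("af_xdp_pool",
                   ETH_AF_XDP_NUM_BUFFERS, 250, 0,
                   ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_MBUF_OVERHEAD,
                   MEMPOOL_F_NO_SPREAD | MEMPOOL_CHUNK_F_PAGE_ALIGN,
                   SOCKET_ID_ANY);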
---
 lib/librte_mempool/rte_mempool.c | 3 +++
 lib/librte_mempool/rte_mempool.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 683b216f9..cfbb49ea5 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -543,6 +543,9 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 		if (try_contig)
 			flags |= RTE_MEMZONE_IOVA_CONTIG;
 
+		if (mp->flags & MEMPOOL_CHUNK_F_PAGE_ALIGN)
+			align = RTE_MAX(align, (size_t)getpagesize());
+
 		mz = rte_memzone_reserve_aligned(mz_name, mem_size,
 				mp->socket_id, flags, align);
 
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 7c9cd9a2f..47729f7c9 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -264,6 +264,7 @@ struct rte_mempool {
 #define MEMPOOL_F_POOL_CREATED   0x0010 /**< Internal: pool is created. */
 #define MEMPOOL_F_NO_IOVA_CONTIG 0x0020 /**< Don't need IOVA contiguous objs. */
 #define MEMPOOL_F_NO_PHYS_CONTIG MEMPOOL_F_NO_IOVA_CONTIG /* deprecated */
+#define MEMPOOL_CHUNK_F_PAGE_ALIGN     0x0040 /**< Chunk's base address is page aligned */
 
 /**
  * @internal When debug is enabled, store some statistics.
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* [PATCH v6 4/5] net/af_xdp: use mbuf mempool for buffer management
  2019-03-26 12:20 ` [PATCH v6 0/5] Introduce AF_XDP PMD Xiaolong Ye
                     ` (2 preceding siblings ...)
  2019-03-26 12:20   ` [PATCH v6 3/5] lib/mempool: allow page size aligned mempool Xiaolong Ye
@ 2019-03-26 12:20   ` Xiaolong Ye
  2019-03-29 17:42     ` Olivier Matz
  2019-03-26 12:20   ` [PATCH v6 5/5] net/af_xdp: enable zero copy Xiaolong Ye
  4 siblings, 1 reply; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-26 12:20 UTC (permalink / raw)
  To: dev, David Marchand, Andrew Rybchenko
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Stephen Hemminger, Ferruh Yigit, Luca Boccassi, Bruce Richardson,
	Ananyev Konstantin, Xiaolong Ye

The af_xdp registered memory buffer is now managed by rte_mempool. An mbuf
allocated from the rte_mempool can be converted to an xdp_desc address and
vice versa.
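
A worked example of the conversion, using the constants defined in the diff
below (2048-byte umem frames, a 64-byte mempool object header, a 128-byte
struct rte_mbuf, RTE_PKTMBUF_HEADROOM of 128, hence
ETH_AF_XDP_MBUF_OVERHEAD = 192 and ETH_AF_XDP_DATA_HEADROOM = 320; the
numbers are illustrative): for the frame at umem offset 4096, the mempool
header occupies bytes 4096..4159, the struct rte_mbuf bytes 4160..4287, and
packet data starts at 4096 + 320 = 4416. Given a descriptor address
addr = 4416, addr_to_mbuf() computes

	offset   = addr / 2048 * 2048          = 4096
	mbuf     = buffer + offset + 192 - 128 = buffer + 4160
	data_off = addr - offset - 192         = 128 (= RTE_PKTMBUF_HEADROOM)

and mbuf_to_addr() inverts it: buf_addr + data_off - buffer =
(4096 + 192) + 128 = 4416.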

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 drivers/net/af_xdp/rte_eth_af_xdp.c | 117 +++++++++++++++++-----------
 1 file changed, 72 insertions(+), 45 deletions(-)

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 47a496ed7..a1fda9212 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -48,7 +48,11 @@ static int af_xdp_logtype;
 
 #define ETH_AF_XDP_FRAME_SIZE		XSK_UMEM__DEFAULT_FRAME_SIZE
 #define ETH_AF_XDP_NUM_BUFFERS		4096
-#define ETH_AF_XDP_DATA_HEADROOM	0
+/* mempool hdrobj size (64 bytes) + sizeof(struct rte_mbuf) (128 bytes) */
+#define ETH_AF_XDP_MBUF_OVERHEAD	192
+/* data starts at offset 320 (192 + 128) bytes */
+#define ETH_AF_XDP_DATA_HEADROOM				\
+	(ETH_AF_XDP_MBUF_OVERHEAD + RTE_PKTMBUF_HEADROOM)
 #define ETH_AF_XDP_DFLT_NUM_DESCS	XSK_RING_CONS__DEFAULT_NUM_DESCS
 #define ETH_AF_XDP_DFLT_QUEUE_IDX	0
 
@@ -61,7 +65,7 @@ struct xsk_umem_info {
 	struct xsk_ring_prod fq;
 	struct xsk_ring_cons cq;
 	struct xsk_umem *umem;
-	struct rte_ring *buf_ring;
+	struct rte_mempool *mb_pool;
 	void *buffer;
 };
 
@@ -123,10 +127,30 @@ static struct rte_eth_link pmd_link = {
 	.link_autoneg = ETH_LINK_AUTONEG
 };
 
+static inline struct rte_mbuf *
+addr_to_mbuf(struct xsk_umem_info *umem, uint64_t addr)
+{
+	uint64_t offset = (addr / ETH_AF_XDP_FRAME_SIZE *
+			ETH_AF_XDP_FRAME_SIZE);
+	struct rte_mbuf *mbuf = (struct rte_mbuf *)((uint64_t)umem->buffer +
+				    offset + ETH_AF_XDP_MBUF_OVERHEAD -
+				    sizeof(struct rte_mbuf));
+	mbuf->data_off = addr - offset - ETH_AF_XDP_MBUF_OVERHEAD;
+	return mbuf;
+}
+
+static inline uint64_t
+mbuf_to_addr(struct xsk_umem_info *umem, struct rte_mbuf *mbuf)
+{
+	return (uint64_t)mbuf->buf_addr + mbuf->data_off -
+		(uint64_t)umem->buffer;
+}
+
 static inline int
 reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
 {
 	struct xsk_ring_prod *fq = &umem->fq;
+	struct rte_mbuf *mbuf;
 	uint32_t idx;
 	int i, ret;
 
@@ -138,13 +162,15 @@ reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
 
 	for (i = 0; i < reserve_size; i++) {
 		__u64 *fq_addr;
-		void *addr = NULL;
-		if (rte_ring_dequeue(umem->buf_ring, &addr)) {
+		uint64_t addr;
+		mbuf = rte_pktmbuf_alloc(umem->mb_pool);
+		if (unlikely(mbuf == NULL)) {
 			i--;
 			break;
 		}
+		addr = mbuf_to_addr(umem, mbuf);
 		fq_addr = xsk_ring_prod__fill_addr(fq, idx++);
-		*fq_addr = (uint64_t)addr;
+		*fq_addr = addr;
 	}
 
 	xsk_ring_prod__submit(fq, i);
@@ -196,7 +222,7 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		rx_bytes += len;
 		bufs[count++] = mbufs[i];
 
-		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+		rte_pktmbuf_free(addr_to_mbuf(umem, addr));
 	}
 
 	xsk_ring_cons__release(rx, rcvd);
@@ -219,7 +245,7 @@ static void pull_umem_cq(struct xsk_umem_info *umem, int size)
 	for (i = 0; i < n; i++) {
 		uint64_t addr;
 		addr = *xsk_ring_cons__comp_addr(cq, idx_cq++);
-		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+		rte_pktmbuf_free(addr_to_mbuf(umem, addr));
 	}
 
 	xsk_ring_cons__release(cq, n);
@@ -248,7 +274,7 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	struct pkt_tx_queue *txq = queue;
 	struct xsk_umem_info *umem = txq->pair->umem;
 	struct rte_mbuf *mbuf;
-	void *addrs[ETH_AF_XDP_TX_BATCH_SIZE];
+	struct rte_mbuf *mbuf_to_tx;
 	unsigned long tx_bytes = 0;
 	int i, valid = 0;
 	uint32_t idx_tx;
@@ -257,11 +283,6 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 
 	pull_umem_cq(umem, nb_pkts);
 
-	nb_pkts = rte_ring_dequeue_bulk(umem->buf_ring, addrs,
-					nb_pkts, NULL);
-	if (nb_pkts == 0)
-		return 0;
-
 	if (xsk_ring_prod__reserve(&txq->tx, nb_pkts, &idx_tx) != nb_pkts) {
 		kick_tx(txq);
 		return 0;
@@ -275,7 +296,12 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i);
 		mbuf = bufs[i];
 		if (mbuf->pkt_len <= buf_len) {
-			desc->addr = (uint64_t)addrs[valid];
+			mbuf_to_tx = rte_pktmbuf_alloc(umem->mb_pool);
+			if (unlikely(mbuf_to_tx == NULL)) {
+				rte_pktmbuf_free(mbuf);
+				continue;
+			}
+			desc->addr = mbuf_to_addr(umem, mbuf_to_tx);
 			desc->len = mbuf->pkt_len;
 			pkt = xsk_umem__get_data(umem->buffer,
 						 desc->addr);
@@ -291,10 +317,6 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 
 	kick_tx(txq);
 
-	if (valid < nb_pkts)
-		rte_ring_enqueue_bulk(umem->buf_ring, &addrs[valid],
-				 nb_pkts - valid, NULL);
-
 	txq->stats.err_pkts += nb_pkts - valid;
 	txq->stats.tx_pkts += valid;
 	txq->stats.tx_bytes += tx_bytes;
@@ -443,16 +465,29 @@ eth_link_update(struct rte_eth_dev *dev __rte_unused,
 
 static void xdp_umem_destroy(struct xsk_umem_info *umem)
 {
-	free(umem->buffer);
-	umem->buffer = NULL;
-
-	rte_ring_free(umem->buf_ring);
-	umem->buf_ring = NULL;
+	rte_mempool_free(umem->mb_pool);
+	umem->mb_pool = NULL;
 
 	rte_free(umem);
 	umem = NULL;
 }
 
+static inline uint64_t get_base_addr(struct rte_mempool *mp)
+{
+	struct rte_mempool_memhdr *memhdr;
+
+	memhdr = STAILQ_FIRST(&mp->mem_list);
+	return (uint64_t)(memhdr->addr);
+}
+
+static inline uint64_t get_len(struct rte_mempool *mp)
+{
+	struct rte_mempool_memhdr *memhdr;
+
+	memhdr = STAILQ_FIRST(&mp->mem_list);
+	return (uint64_t)(memhdr->len);
+}
+
 static struct xsk_umem_info *xdp_umem_configure(void)
 {
 	struct xsk_umem_info *umem;
@@ -461,9 +496,8 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 		.comp_size = ETH_AF_XDP_DFLT_NUM_DESCS,
 		.frame_size = ETH_AF_XDP_FRAME_SIZE,
 		.frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
-	void *bufs = NULL;
+	void *base_addr = NULL;
 	int ret;
-	uint64_t i;
 
 	umem = rte_zmalloc_socket("umem", sizeof(*umem), 0, rte_socket_id());
 	if (umem == NULL) {
@@ -471,26 +505,20 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 		return NULL;
 	}
 
-	umem->buf_ring = rte_ring_create("af_xdp_ring",
-					 ETH_AF_XDP_NUM_BUFFERS,
-					 SOCKET_ID_ANY,
-					 0x0);
-	if (umem->buf_ring == NULL) {
-		AF_XDP_LOG(ERR, "Failed to create rte_ring\n");
+	umem->mb_pool = rte_pktmbuf_pool_create_with_flags("af_xdp_mempool",
+			ETH_AF_XDP_NUM_BUFFERS,
+			250, 0,
+			ETH_AF_XDP_FRAME_SIZE -
+			ETH_AF_XDP_MBUF_OVERHEAD,
+			MEMPOOL_F_NO_SPREAD | MEMPOOL_CHUNK_F_PAGE_ALIGN,
+			SOCKET_ID_ANY);
+	if (umem->mb_pool == NULL || umem->mb_pool->nb_mem_chunks != 1) {
+		AF_XDP_LOG(ERR, "Failed to create mempool\n");
 		goto err;
 	}
+	base_addr = (void *)get_base_addr(umem->mb_pool);
 
-	for (i = 0; i < ETH_AF_XDP_NUM_BUFFERS; i++)
-		rte_ring_enqueue(umem->buf_ring,
-				 (void *)(i * ETH_AF_XDP_FRAME_SIZE +
-					  ETH_AF_XDP_DATA_HEADROOM));
-
-	if (posix_memalign(&bufs, getpagesize(),
-			   ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE)) {
-		AF_XDP_LOG(ERR, "Failed to allocate memory pool.\n");
-		goto err;
-	}
-	ret = xsk_umem__create(&umem->umem, bufs,
+	ret = xsk_umem__create(&umem->umem, base_addr,
 			       ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
 			       &umem->fq, &umem->cq,
 			       &usr_config);
@@ -499,7 +527,7 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 		AF_XDP_LOG(ERR, "Failed to create umem");
 		goto err;
 	}
-	umem->buffer = bufs;
+	umem->buffer = base_addr;
 
 	return umem;
 
@@ -903,10 +931,9 @@ rte_pmd_af_xdp_remove(struct rte_vdev_device *dev)
 
 	internals = eth_dev->data->dev_private;
 
-	rte_ring_free(internals->umem->buf_ring);
-	rte_free(internals->umem->buffer);
+	rte_mempool_free(internals->umem->mb_pool);
 	rte_free(internals->umem);
 
 	rte_eth_dev_release_port(eth_dev);
 
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* [PATCH v6 5/5] net/af_xdp: enable zero copy
  2019-03-26 12:20 ` [PATCH v6 0/5] Introduce AF_XDP PMD Xiaolong Ye
                     ` (3 preceding siblings ...)
  2019-03-26 12:20   ` [PATCH v6 4/5] net/af_xdp: use mbuf mempool for buffer management Xiaolong Ye
@ 2019-03-26 12:20   ` Xiaolong Ye
  4 siblings, 0 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-26 12:20 UTC (permalink / raw)
  To: dev, David Marchand, Andrew Rybchenko
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Stephen Hemminger, Ferruh Yigit, Luca Boccassi, Bruce Richardson,
	Ananyev Konstantin, Xiaolong Ye

Check whether the external mempool (from rx_queue_setup) is suitable for
af_xdp; if it is, it will be registered to the af_xdp socket directly and
there will be no packet data copy on Rx and Tx.
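
For the zero-copy path to be taken, the application's Rx mempool must pass
the checks in check_mempool_zc() below: one contiguous memory chunk, a
cache-line-sized object header, a page-aligned base address, and an object
size that divides the umem frame size evenly. A rough sketch of an eligible
setup (port_id and nb_rx_desc are illustrative; the ETH_AF_XDP_* values are
the driver-internal constants, 4096 buffers of 2048 bytes):

	struct rte_mempool *mb_pool;

	mb_pool = rte_pktmbuf_pool_create_with_flags("rx_zc_pool",
			ETH_AF_XDP_NUM_BUFFERS, 250, 0,
			ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_MBUF_OVERHEAD,
			MEMPOOL_F_NO_SPREAD | MEMPOOL_CHUNK_F_PAGE_ALIGN,
			SOCKET_ID_ANY);

	/* the PMD registers this pool as the umem and marks the queue as
	 * zero copy when check_mempool_zc() passes */
	rte_eth_rx_queue_setup(port_id, 0, nb_rx_desc, rte_socket_id(),
			       NULL, mb_pool);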

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 drivers/net/af_xdp/rte_eth_af_xdp.c | 130 ++++++++++++++++++++--------
 1 file changed, 96 insertions(+), 34 deletions(-)

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index a1fda9212..c6ade4c94 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -67,6 +67,7 @@ struct xsk_umem_info {
 	struct xsk_umem *umem;
 	struct rte_mempool *mb_pool;
 	void *buffer;
+	uint8_t zc;
 };
 
 struct rx_stats {
@@ -85,6 +86,7 @@ struct pkt_rx_queue {
 
 	struct pkt_tx_queue *pair;
 	uint16_t queue_idx;
+	uint8_t zc;
 };
 
 struct tx_stats {
@@ -202,7 +204,9 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	if (xsk_prod_nb_free(fq, free_thresh) >= free_thresh)
 		(void)reserve_fill_queue(umem, ETH_AF_XDP_RX_BATCH_SIZE);
 
-	if (unlikely(rte_pktmbuf_alloc_bulk(rxq->mb_pool, mbufs, rcvd) != 0))
+	if (!rxq->zc &&
+		unlikely(rte_pktmbuf_alloc_bulk(rxq->mb_pool,
+				mbufs, rcvd) != 0))
 		return 0;
 
 	for (i = 0; i < rcvd; i++) {
@@ -216,13 +220,23 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		len = desc->len;
 		pkt = xsk_umem__get_data(rxq->umem->buffer, addr);
 
-		rte_memcpy(rte_pktmbuf_mtod(mbufs[i], void *), pkt, len);
-		rte_pktmbuf_pkt_len(mbufs[i]) = len;
-		rte_pktmbuf_data_len(mbufs[i]) = len;
-		rx_bytes += len;
-		bufs[count++] = mbufs[i];
-
-		rte_pktmbuf_free(addr_to_mbuf(umem, addr));
+		if (rxq->zc) {
+			struct rte_mbuf *mbuf;
+			mbuf = addr_to_mbuf(rxq->umem, addr);
+			rte_pktmbuf_pkt_len(mbuf) = len;
+			rte_pktmbuf_data_len(mbuf) = len;
+			rx_bytes += len;
+			bufs[count++] = mbuf;
+		} else {
+			rte_memcpy(rte_pktmbuf_mtod(mbufs[i], void *),
+					pkt, len);
+			rte_pktmbuf_pkt_len(mbufs[i]) = len;
+			rte_pktmbuf_data_len(mbufs[i]) = len;
+			rx_bytes += len;
+			bufs[count++] = mbufs[i];
+
+			rte_pktmbuf_free(addr_to_mbuf(umem, addr));
+		}
 	}
 
 	xsk_ring_cons__release(rx, rcvd);
@@ -295,22 +309,29 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 					- ETH_AF_XDP_DATA_HEADROOM;
 		desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i);
 		mbuf = bufs[i];
-		if (mbuf->pkt_len <= buf_len) {
-			mbuf_to_tx = rte_pktmbuf_alloc(umem->mb_pool);
-			if (unlikely(mbuf_to_tx == NULL)) {
-				rte_pktmbuf_free(mbuf);
-				continue;
-			}
-			desc->addr = mbuf_to_addr(umem, mbuf_to_tx);
+		if (txq->pair->zc && mbuf->pool == umem->mb_pool) {
+			desc->addr = mbuf_to_addr(umem, mbuf);
 			desc->len = mbuf->pkt_len;
-			pkt = xsk_umem__get_data(umem->buffer,
-						 desc->addr);
-			rte_memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
-			       desc->len);
 			valid++;
 			tx_bytes += mbuf->pkt_len;
+		} else {
+			if (mbuf->pkt_len <= buf_len) {
+				mbuf_to_tx = rte_pktmbuf_alloc(umem->mb_pool);
+				if (unlikely(mbuf_to_tx == NULL)) {
+					rte_pktmbuf_free(mbuf);
+					continue;
+				}
+				desc->addr = mbuf_to_addr(umem, mbuf_to_tx);
+				desc->len = mbuf->pkt_len;
+				pkt = xsk_umem__get_data(umem->buffer,
+							 desc->addr);
+				memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
+				       desc->len);
+				valid++;
+				tx_bytes += mbuf->pkt_len;
+			}
+			rte_pktmbuf_free(mbuf);
 		}
-		rte_pktmbuf_free(mbuf);
 	}
 
 	xsk_ring_prod__submit(&txq->tx, nb_pkts);
@@ -488,7 +509,7 @@ static inline uint64_t get_len(struct rte_mempool *mp)
 	return (uint64_t)(memhdr->len);
 }
 
-static struct xsk_umem_info *xdp_umem_configure(void)
+static struct xsk_umem_info *xdp_umem_configure(struct rte_mempool *mb_pool)
 {
 	struct xsk_umem_info *umem;
 	struct xsk_umem_config usr_config = {
@@ -505,16 +526,23 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 		return NULL;
 	}
 
-	umem->mb_pool = rte_pktmbuf_pool_create_with_flags("af_xdp_mempool",
-			ETH_AF_XDP_NUM_BUFFERS,
-			250, 0,
-			ETH_AF_XDP_FRAME_SIZE -
-			ETH_AF_XDP_MBUF_OVERHEAD,
-			MEMPOOL_F_NO_SPREAD | MEMPOOL_CHUNK_F_PAGE_ALIGN,
-			SOCKET_ID_ANY);
-	if (umem->mb_pool == NULL || umem->mb_pool->nb_mem_chunks != 1) {
-		AF_XDP_LOG(ERR, "Failed to create mempool\n");
-		goto err;
+	if (!mb_pool) {
+		umem->mb_pool = rte_pktmbuf_pool_create_with_flags("af_xdp_mempool",
+				ETH_AF_XDP_NUM_BUFFERS,
+				250, 0,
+				ETH_AF_XDP_FRAME_SIZE -
+				ETH_AF_XDP_MBUF_OVERHEAD,
+				MEMPOOL_F_NO_SPREAD | MEMPOOL_CHUNK_F_PAGE_ALIGN,
+				SOCKET_ID_ANY);
+
+		if (umem->mb_pool == NULL ||
+				umem->mb_pool->nb_mem_chunks != 1) {
+			AF_XDP_LOG(ERR, "Failed to create mempool\n");
+			goto err;
+		}
+	} else {
+		umem->mb_pool = mb_pool;
+		umem->zc = 1;
 	}
 	base_addr = (void *)get_base_addr(umem->mb_pool);
 
@@ -536,16 +564,43 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 	return NULL;
 }
 
+static uint8_t
+check_mempool_zc(struct rte_mempool *mp)
+{
+	RTE_ASSERT(mp);
+
+	/* memory must be contiguous */
+	if (mp->nb_mem_chunks > 1)
+		return 0;
+
+	/* check header size */
+	if (mp->header_size != RTE_CACHE_LINE_SIZE)
+		return 0;
+
+	/* check base address */
+	if ((uint64_t)get_base_addr(mp) % getpagesize() != 0)
+		return 0;
+
+	/* check chunk size */
+	if ((mp->elt_size + mp->header_size + mp->trailer_size) %
+	    ETH_AF_XDP_FRAME_SIZE != 0)
+		return 0;
+
+	return 1;
+}
+
 static int
 xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
-	      int ring_size)
+	      int ring_size, struct rte_mempool *mb_pool)
 {
 	struct xsk_socket_config cfg;
 	struct pkt_tx_queue *txq = rxq->pair;
+	struct rte_mempool *mp;
 	int ret = 0;
 	int reserve_size;
 
-	rxq->umem = xdp_umem_configure();
+	mp = check_mempool_zc(mb_pool) ? mb_pool : NULL;
+	rxq->umem = xdp_umem_configure(mp);
 	if (rxq->umem == NULL)
 		return -ENOMEM;
 
@@ -622,7 +677,7 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
 
 	rxq->mb_pool = mb_pool;
 
-	if (xsk_configure(internals, rxq, nb_rx_desc)) {
+	if (xsk_configure(internals, rxq, nb_rx_desc, mb_pool)) {
 		AF_XDP_LOG(ERR, "Failed to configure xdp socket\n");
 		ret = -EINVAL;
 		goto err;
@@ -630,6 +685,13 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
 
 	internals->umem = rxq->umem;
 
+	if (mb_pool == internals->umem->mb_pool)
+		rxq->zc = internals->umem->zc;
+
+	if (rxq->zc)
+		AF_XDP_LOG(INFO,
+			"zero copy enabled on rx queue %d\n", rx_queue_id);
+
 	dev->data->rx_queues[rx_queue_id] = rxq;
 	return 0;
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* Re: [PATCH v6 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-26 12:20   ` [PATCH v6 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
@ 2019-03-26 19:08     ` Stephen Hemminger
  2019-03-27  5:33       ` Ye Xiaolong
  0 siblings, 1 reply; 214+ messages in thread
From: Stephen Hemminger @ 2019-03-26 19:08 UTC (permalink / raw)
  To: Xiaolong Ye
  Cc: dev, David Marchand, Andrew Rybchenko, Qi Zhang, Karlsson Magnus,
	Topel Bjorn, Maxime Coquelin, Ferruh Yigit, Luca Boccassi,
	Bruce Richardson, Ananyev Konstantin

On Tue, 26 Mar 2019 20:20:25 +0800
Xiaolong Ye <xiaolong.ye@intel.com> wrote:

> +This Linux-specific PMD driver creates the AF_XDP socket and binds it to a
> +specific netdev queue, it allows a DPDK application to send and receive raw
> +packets through the socket which would bypass the kernel network stack.
> +Current implementation only supports single queue, multi-queues feature will
> +be added later.
> +

It might be worth mentioning that MTU in XDP is limited because the kernel
doesn't allow XDP with segmented packets.

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v6 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-26 19:08     ` Stephen Hemminger
@ 2019-03-27  5:33       ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-27  5:33 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, David Marchand, Andrew Rybchenko, Qi Zhang, Karlsson Magnus,
	Topel Bjorn, Maxime Coquelin, Ferruh Yigit, Luca Boccassi,
	Bruce Richardson, Ananyev Konstantin

On 03/26, Stephen Hemminger wrote:
>On Tue, 26 Mar 2019 20:20:25 +0800
>Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>
>> +This Linux-specific PMD driver creates the AF_XDP socket and binds it to a
>> +specific netdev queue, it allows a DPDK application to send and receive raw
>> +packets through the socket which would bypass the kernel network stack.
>> +Current implementation only supports single queue, multi-queues feature will
>> +be added later.
>> +
>
>It might be worth mentioning that MTU in XDP is limited because the kernel
>doesn't allow XDP with segmented packets.

Ok, will add it in next version.

Thanks,
Xiaolong

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v7 0/5] Introduce AF_XDP PMD
  2019-03-01  8:09 [PATCH v1 0/6] Introduce AF_XDP PMD Xiaolong Ye
                   ` (11 preceding siblings ...)
  2019-03-26 12:20 ` [PATCH v6 0/5] Introduce AF_XDP PMD Xiaolong Ye
@ 2019-03-27  9:00 ` Xiaolong Ye
  2019-03-27  9:00   ` [PATCH v7 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
                     ` (4 more replies)
  2019-04-02 10:45 ` [PATCH v8 0/1] AF_XDP PMD Xiaolong Ye
                   ` (3 subsequent siblings)
  16 siblings, 5 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-27  9:00 UTC (permalink / raw)
  To: dev, David Marchand, Andrew Rybchenko
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Stephen Hemminger, Ferruh Yigit, Luca Boccassi, Bruce Richardson,
	Ananyev Konstantin, Xiaolong Ye

Overview
========

This patchset adds a new PMD driver for AF_XDP, a proposed faster version
of the AF_PACKET interface in Linux; see links [1] [2] below for an
introduction to AF_XDP.

AF_XDP roadmap
==============
- AF_XDP has been included in the upstream kernel since 4.18, and AF_XDP
  support in libbpf was merged in v5.1-rc1.
- The i40e and ixgbe drivers now support zero copy mode.

Change logs
===========

v7:
- mention the MTU limitation in af_xdp.rst
- fix the vdev name in af_xdp.rst

V6:

- remove the newline in AF_XDP_LOG definition to avoid double new lines
  issue.
- rename MEMPOOL_F_PAGE_ALIGN to MEMPOOL_CHUNK_F_PAGE_ALIGN.

V5:

- disable the AF_XDP pmd by default since it requires a kernel more recent
  than the minimum kernel version supported by DPDK
- address other review comments from Maxime

V4:

- change vdev name to net_af_xdp
- adopt dynamic log type for all logging
- Fix other style issues raised by Stephen

V3:

- Fix all style issues pointed by Stephen, Mattias, David.
- Drop the testpmd change as we'll adopt Jerin's suggestion to add a new
  mempool driver to handle the application use of AF_XDP's zero copy feature.

V2:
- Fix a NULL pointer reference crash issue
- Fix txonly stop sending traffic in zc mode
- Refactor rte_mbuf.c to avoid ABI breakage.
- Fix multiple style issues pointed by Ferruh, David, Stephen, Luca.

changes vs RFC sent by Qi last Aug:

- Rework based on AF_XDP's interface changes, since the new libbpf provides
  higher-level APIs that hide many of the details of the AF_XDP uapi. The
  rework removes 300+ lines of code.

- multi-queue is not supported because the current libbpf doesn't support
  multiple sockets on a single umem.

- No extra steps are needed to load an XDP program manually, since libbpf
  loads a default XDP program when the user calls xsk_socket__create; the
  userspace application only needs to handle the cleanup (a minimal sketch
  follows below).
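
  For reference, the cleanup amounts to detaching the default XDP program
  from the netdev, as remove_xdp_program() in the driver does. A minimal
  sketch (if_index stands for the netdev's interface index):

	uint32_t curr_prog_id = 0;

	/* query the id of the XDP program currently attached */
	if (bpf_get_link_xdp_id(if_index, &curr_prog_id,
				XDP_FLAGS_UPDATE_IF_NOEXIST) == 0)
		/* detach it by installing an invalid fd (-1) */
		bpf_set_link_xdp_fd(if_index, -1,
				    XDP_FLAGS_UPDATE_IF_NOEXIST);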

How to try
==========

1. use a kernel >= v5.1-rc1: build it and replace your host
   kernel with it.
   
   make sure you turn on XDP sockets when compiling

   Networking support -->
        Networking options -->
                [ * ] XDP sockets
   
2. build libbpf in tools/lib/bpf, and copy the libbpf.a and libbpf.so to /usr/lib64

   cd tools/lib/bpf
   make

3. ethtool -L enp59s0f0 combined 1

4. extra step to build DPDK

   explicitly enable the AF_XDP pmd by adding the line below to
   config/common_linux

   CONFIG_RTE_LIBRTE_PMD_AF_XDP=y

5. start testpmd

   ./build/app/testpmd -c 0xc -n 4 --vdev net_af_xdp,iface=enp59s0f0,queue=0 -- -i --rxq=1 --txq=1

    in this case, the default XDP program will be loaded and linked to queue 0 of enp59s0f0;
    network traffic arriving at queue 0 will be redirected to the af_xdp socket.

Xiaolong Ye (5):
  net/af_xdp: introduce AF XDP PMD driver
  lib/mbuf: introduce helper to create mempool with flags
  lib/mempool: allow page size aligned mempool
  net/af_xdp: use mbuf mempool for buffer management
  net/af_xdp: enable zero copy

 MAINTAINERS                                   |    6 +
 config/common_base                            |    5 +
 doc/guides/nics/af_xdp.rst                    |   45 +
 doc/guides/nics/features/af_xdp.ini           |   11 +
 doc/guides/nics/index.rst                     |    1 +
 doc/guides/rel_notes/release_19_05.rst        |    7 +
 drivers/net/Makefile                          |    1 +
 drivers/net/af_xdp/Makefile                   |   32 +
 drivers/net/af_xdp/meson.build                |   21 +
 drivers/net/af_xdp/rte_eth_af_xdp.c           | 1020 +++++++++++++++++
 drivers/net/af_xdp/rte_pmd_af_xdp_version.map |    3 +
 drivers/net/meson.build                       |    1 +
 lib/librte_mbuf/rte_mbuf.c                    |   29 +-
 lib/librte_mbuf/rte_mbuf.h                    |   45 +
 lib/librte_mbuf/rte_mbuf_version.map          |    1 +
 lib/librte_mempool/rte_mempool.c              |    3 +
 lib/librte_mempool/rte_mempool.h              |    1 +
 mk/rte.app.mk                                 |    1 +
 18 files changed, 1228 insertions(+), 5 deletions(-)
 create mode 100644 doc/guides/nics/af_xdp.rst
 create mode 100644 doc/guides/nics/features/af_xdp.ini
 create mode 100644 drivers/net/af_xdp/Makefile
 create mode 100644 drivers/net/af_xdp/meson.build
 create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
 create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map

-- 
2.17.1

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v7 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-27  9:00 ` [PATCH v7 0/5] Introduce AF_XDP PMD Xiaolong Ye
@ 2019-03-27  9:00   ` Xiaolong Ye
  2019-03-28 17:51     ` Ferruh Yigit
  2019-03-27  9:00   ` [PATCH v7 2/5] lib/mbuf: introduce helper to create mempool with flags Xiaolong Ye
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-27  9:00 UTC (permalink / raw)
  To: dev, David Marchand, Andrew Rybchenko
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Stephen Hemminger, Ferruh Yigit, Luca Boccassi, Bruce Richardson,
	Ananyev Konstantin, Xiaolong Ye

Add a new PMD driver for AF_XDP, a proposed faster version of the
AF_PACKET interface in Linux. For more info about AF_XDP, please refer to
[1] [2].

This is the vanilla version of the PMD, which just uses a raw buffer
registered as the umem.

[1] https://fosdem.org/2018/schedule/event/af_xdp/
[2] https://lwn.net/Articles/745934/
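
The registration flow this refers to, reduced to its essentials (a sketch
mirroring xdp_umem_configure() in the diff below; error handling omitted):

	struct xsk_umem *umem;
	struct xsk_ring_prod fq;
	struct xsk_ring_cons cq;
	struct xsk_umem_config cfg = {
		.fill_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
		.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
		.frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE,
		.frame_headroom = 0 };
	void *bufs = NULL;

	/* page-aligned raw buffer backing all umem frames */
	posix_memalign(&bufs, getpagesize(),
		       ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE);

	/* register the buffer with the kernel; the umem's fill and
	 * completion rings are mapped back into fq/cq */
	xsk_umem__create(&umem, bufs,
			 ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
			 &fq, &cq, &cfg);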

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 MAINTAINERS                                   |   6 +
 config/common_base                            |   5 +
 doc/guides/nics/af_xdp.rst                    |  48 +
 doc/guides/nics/features/af_xdp.ini           |  11 +
 doc/guides/nics/index.rst                     |   1 +
 doc/guides/rel_notes/release_19_05.rst        |   7 +
 drivers/net/Makefile                          |   1 +
 drivers/net/af_xdp/Makefile                   |  32 +
 drivers/net/af_xdp/meson.build                |  21 +
 drivers/net/af_xdp/rte_eth_af_xdp.c           | 931 ++++++++++++++++++
 drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   3 +
 drivers/net/meson.build                       |   1 +
 mk/rte.app.mk                                 |   1 +
 13 files changed, 1068 insertions(+)
 create mode 100644 doc/guides/nics/af_xdp.rst
 create mode 100644 doc/guides/nics/features/af_xdp.ini
 create mode 100644 drivers/net/af_xdp/Makefile
 create mode 100644 drivers/net/af_xdp/meson.build
 create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
 create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 21e164095..58ebe9d58 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -474,6 +474,12 @@ M: John W. Linville <linville@tuxdriver.com>
 F: drivers/net/af_packet/
 F: doc/guides/nics/features/afpacket.ini
 
+Linux AF_XDP
+M: Xiaolong Ye <xiaolong.ye@intel.com>
+M: Qi Zhang <qi.z.zhang@intel.com>
+F: drivers/net/af_xdp/
+F: doc/guides/nics/af_xdp.rst
+
 Amazon ENA
 M: Marcin Wojtas <mw@semihalf.com>
 M: Michal Krawczyk <mk@semihalf.com>
diff --git a/config/common_base b/config/common_base
index 0b09a9348..4044de205 100644
--- a/config/common_base
+++ b/config/common_base
@@ -416,6 +416,11 @@ CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX_FREE=n
 #
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
 
+#
+# Compile software PMD backed by AF_XDP sockets (Linux only)
+#
+CONFIG_RTE_LIBRTE_PMD_AF_XDP=n
+
 #
 # Compile link bonding PMD library
 #
diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
new file mode 100644
index 000000000..af675d910
--- /dev/null
+++ b/doc/guides/nics/af_xdp.rst
@@ -0,0 +1,48 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2019 Intel Corporation.
+
+AF_XDP Poll Mode Driver
+=======================
+
+AF_XDP is an address family that is optimized for high performance
+packet processing. AF_XDP sockets enable an XDP program to redirect
+packets to a memory buffer in userspace.
+
+For full details of AF_XDP sockets, refer to the
+`AF_XDP documentation in the Kernel
+<https://www.kernel.org/doc/Documentation/networking/af_xdp.rst>`_.
+
+This Linux-specific PMD driver creates the AF_XDP socket and binds it to a
+specific netdev queue. It allows a DPDK application to send and receive raw
+packets through the socket, bypassing the kernel network stack. The current
+implementation only supports a single queue; multi-queue support will be
+added later.
+
+Note that the MTU of the AF_XDP PMD is limited because XDP lacks support
+for packet fragmentation.
+
+Options
+-------
+
+The following options can be provided to set up an af_xdp port in DPDK.
+
+*   ``iface`` - name of the Kernel interface to attach to (required);
+*   ``queue`` - netdev queue id (optional, default 0);
+
+Prerequisites
+-------------
+
+This is a Linux-specific PMD, thus the following prerequisites apply:
+
+*  A Linux Kernel (version > 4.18) with XDP sockets configuration enabled;
+*  libbpf (from kernel >= v5.1-rc1) with the latest af_xdp support installed;
+*  A Kernel bound interface to attach to.
+
+Set up an af_xdp interface
+--------------------------
+
+The following example will set up an af_xdp interface in DPDK:
+
+.. code-block:: console
+
+    --vdev net_af_xdp,iface=ens786f1,queue=0
diff --git a/doc/guides/nics/features/af_xdp.ini b/doc/guides/nics/features/af_xdp.ini
new file mode 100644
index 000000000..36953c2de
--- /dev/null
+++ b/doc/guides/nics/features/af_xdp.ini
@@ -0,0 +1,11 @@
+;
+; Supported features of the 'af_xdp' network poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Link status          = Y
+MTU update           = Y
+Promiscuous mode     = Y
+Stats per queue      = Y
+x86-64               = Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 5c80e3baa..a4b80a3d0 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -12,6 +12,7 @@ Network Interface Controller Drivers
     features
     build_and_test
     af_packet
+    af_xdp
     ark
     atlantic
     avp
diff --git a/doc/guides/rel_notes/release_19_05.rst b/doc/guides/rel_notes/release_19_05.rst
index d11bb5a2b..c6facb88a 100644
--- a/doc/guides/rel_notes/release_19_05.rst
+++ b/doc/guides/rel_notes/release_19_05.rst
@@ -65,6 +65,13 @@ New Features
     process.
   * Added support for Rx packet types list in a secondary process.
 
+* **Added the AF_XDP PMD.**
+
+  Added a Linux-specific PMD driver for AF_XDP. It creates an AF_XDP socket
+  and binds it to a specific netdev queue, allowing a DPDK application to
+  send and receive raw packets through the socket, bypassing the kernel
+  network stack to achieve high-performance packet processing.
+
 * **Updated Mellanox drivers.**
 
    New features and improvements were done in mlx4 and mlx5 PMDs:
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 502869a87..5d401b8c5 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -9,6 +9,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD),d)
 endif
 
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += af_packet
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += af_xdp
 DIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark
 DIRS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD) += atlantic
 DIRS-$(CONFIG_RTE_LIBRTE_AVP_PMD) += avp
diff --git a/drivers/net/af_xdp/Makefile b/drivers/net/af_xdp/Makefile
new file mode 100644
index 000000000..db7d9aa57
--- /dev/null
+++ b/drivers/net/af_xdp/Makefile
@@ -0,0 +1,32 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_af_xdp.a
+
+EXPORT_MAP := rte_pmd_af_xdp_version.map
+
+LIBABIVER := 1
+
+CFLAGS += -O3
+
+# require kernel version >= v5.1-rc1
+CFLAGS += -I$(RTE_KERNELDIR)/tools/include
+CFLAGS += -I$(RTE_KERNELDIR)/tools/lib/bpf
+
+CFLAGS += $(WERROR_FLAGS)
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
+LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
+LDLIBS += -lrte_bus_vdev
+LDLIBS += -lbpf
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += rte_eth_af_xdp.c
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
new file mode 100644
index 000000000..7e51169b4
--- /dev/null
+++ b/drivers/net/af_xdp/meson.build
@@ -0,0 +1,21 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+if host_machine.system() != 'linux'
+	build = false
+endif
+
+bpf_dep = dependency('libbpf', required: false)
+if bpf_dep.found()
+	build = true
+else
+	bpf_dep = cc.find_library('libbpf', required: false)
+	if bpf_dep.found() and cc.has_header('xsk.h', dependencies: bpf_dep) and cc.has_header('linux/if_xdp.h')
+		build = true
+		pkgconfig_extra_libs += '-lbpf'
+	else
+		build = false
+	endif
+endif
+sources = files('rte_eth_af_xdp.c')
+ext_deps += bpf_dep
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
new file mode 100644
index 000000000..47a496ed7
--- /dev/null
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -0,0 +1,931 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation.
+ */
+
+#include <rte_mbuf.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+#include <rte_string_fns.h>
+
+#include <linux/if_ether.h>
+#include <linux/if_xdp.h>
+#include <linux/if_link.h>
+#include <asm/barrier.h>
+#include <arpa/inet.h>
+#include <net/if.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <unistd.h>
+#include <poll.h>
+#include <bpf/bpf.h>
+#include <xsk.h>
+
+#ifndef SOL_XDP
+#define SOL_XDP 283
+#endif
+
+#ifndef AF_XDP
+#define AF_XDP 44
+#endif
+
+#ifndef PF_XDP
+#define PF_XDP AF_XDP
+#endif
+
+static int af_xdp_logtype;
+
+#define AF_XDP_LOG(level, fmt, args...)			\
+	rte_log(RTE_LOG_ ## level, af_xdp_logtype,	\
+		"%s(): " fmt, __func__, ##args)
+
+#define ETH_AF_XDP_IFACE_ARG			"iface"
+#define ETH_AF_XDP_QUEUE_IDX_ARG		"queue"
+
+#define ETH_AF_XDP_FRAME_SIZE		XSK_UMEM__DEFAULT_FRAME_SIZE
+#define ETH_AF_XDP_NUM_BUFFERS		4096
+#define ETH_AF_XDP_DATA_HEADROOM	0
+#define ETH_AF_XDP_DFLT_NUM_DESCS	XSK_RING_CONS__DEFAULT_NUM_DESCS
+#define ETH_AF_XDP_DFLT_QUEUE_IDX	0
+
+#define ETH_AF_XDP_RX_BATCH_SIZE	32
+#define ETH_AF_XDP_TX_BATCH_SIZE	32
+
+#define ETH_AF_XDP_MAX_QUEUE_PAIRS     16
+
+struct xsk_umem_info {
+	struct xsk_ring_prod fq;
+	struct xsk_ring_cons cq;
+	struct xsk_umem *umem;
+	struct rte_ring *buf_ring;
+	void *buffer;
+};
+
+struct rx_stats {
+	uint64_t rx_pkts;
+	uint64_t rx_bytes;
+	uint64_t rx_dropped;
+};
+
+struct pkt_rx_queue {
+	struct xsk_ring_cons rx;
+	struct xsk_umem_info *umem;
+	struct xsk_socket *xsk;
+	struct rte_mempool *mb_pool;
+
+	struct rx_stats stats;
+
+	struct pkt_tx_queue *pair;
+	uint16_t queue_idx;
+};
+
+struct tx_stats {
+	uint64_t tx_pkts;
+	uint64_t err_pkts;
+	uint64_t tx_bytes;
+};
+
+struct pkt_tx_queue {
+	struct xsk_ring_prod tx;
+
+	struct tx_stats stats;
+
+	struct pkt_rx_queue *pair;
+	uint16_t queue_idx;
+};
+
+struct pmd_internals {
+	int if_index;
+	char if_name[IFNAMSIZ];
+	uint16_t queue_idx;
+	struct ether_addr eth_addr;
+	struct xsk_umem_info *umem;
+	struct rte_mempool *mb_pool_share;
+
+	struct pkt_rx_queue rx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
+	struct pkt_tx_queue tx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
+};
+
+static const char * const valid_arguments[] = {
+	ETH_AF_XDP_IFACE_ARG,
+	ETH_AF_XDP_QUEUE_IDX_ARG,
+	NULL
+};
+
+static struct rte_eth_link pmd_link = {
+	.link_speed = ETH_SPEED_NUM_10G,
+	.link_duplex = ETH_LINK_FULL_DUPLEX,
+	.link_status = ETH_LINK_DOWN,
+	.link_autoneg = ETH_LINK_AUTONEG
+};
+
+static inline int
+reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
+{
+	struct xsk_ring_prod *fq = &umem->fq;
+	uint32_t idx;
+	int i, ret;
+
+	ret = xsk_ring_prod__reserve(fq, reserve_size, &idx);
+	if (unlikely(!ret)) {
+		AF_XDP_LOG(ERR, "Failed to reserve enough fq descs.\n");
+		return ret;
+	}
+
+	for (i = 0; i < reserve_size; i++) {
+		__u64 *fq_addr;
+		void *addr = NULL;
+		if (rte_ring_dequeue(umem->buf_ring, &addr)) {
+			i--;
+			break;
+		}
+		fq_addr = xsk_ring_prod__fill_addr(fq, idx++);
+		*fq_addr = (uint64_t)addr;
+	}
+
+	xsk_ring_prod__submit(fq, i);
+
+	return 0;
+}
+
+static uint16_t
+eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+	struct pkt_rx_queue *rxq = queue;
+	struct xsk_ring_cons *rx = &rxq->rx;
+	struct xsk_umem_info *umem = rxq->umem;
+	struct xsk_ring_prod *fq = &umem->fq;
+	uint32_t idx_rx;
+	uint32_t free_thresh = fq->size >> 1;
+	struct rte_mbuf *mbufs[ETH_AF_XDP_TX_BATCH_SIZE];
+	unsigned long dropped = 0;
+	unsigned long rx_bytes = 0;
+	uint16_t count = 0;
+	int rcvd, i;
+
+	nb_pkts = RTE_MIN(nb_pkts, ETH_AF_XDP_TX_BATCH_SIZE);
+
+	rcvd = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
+	if (rcvd == 0)
+		return 0;
+
+	if (xsk_prod_nb_free(fq, free_thresh) >= free_thresh)
+		(void)reserve_fill_queue(umem, ETH_AF_XDP_RX_BATCH_SIZE);
+
+	if (unlikely(rte_pktmbuf_alloc_bulk(rxq->mb_pool, mbufs, rcvd) != 0))
+		return 0;
+
+	for (i = 0; i < rcvd; i++) {
+		const struct xdp_desc *desc;
+		uint64_t addr;
+		uint32_t len;
+		void *pkt;
+
+		desc = xsk_ring_cons__rx_desc(rx, idx_rx++);
+		addr = desc->addr;
+		len = desc->len;
+		pkt = xsk_umem__get_data(rxq->umem->buffer, addr);
+
+		rte_memcpy(rte_pktmbuf_mtod(mbufs[i], void *), pkt, len);
+		rte_pktmbuf_pkt_len(mbufs[i]) = len;
+		rte_pktmbuf_data_len(mbufs[i]) = len;
+		rx_bytes += len;
+		bufs[count++] = mbufs[i];
+
+		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+	}
+
+	xsk_ring_cons__release(rx, rcvd);
+
+	/* statistics */
+	rxq->stats.rx_pkts += (rcvd - dropped);
+	rxq->stats.rx_bytes += rx_bytes;
+
+	return count;
+}
+
+static void pull_umem_cq(struct xsk_umem_info *umem, int size)
+{
+	struct xsk_ring_cons *cq = &umem->cq;
+	size_t i, n;
+	uint32_t idx_cq;
+
+	n = xsk_ring_cons__peek(cq, size, &idx_cq);
+
+	for (i = 0; i < n; i++) {
+		uint64_t addr;
+		addr = *xsk_ring_cons__comp_addr(cq, idx_cq++);
+		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+	}
+
+	xsk_ring_cons__release(cq, n);
+}
+
+static void kick_tx(struct pkt_tx_queue *txq)
+{
+	struct xsk_umem_info *umem = txq->pair->umem;
+
+	while (send(xsk_socket__fd(txq->pair->xsk), NULL,
+		      0, MSG_DONTWAIT) < 0) {
+		/* something unexpected happened */
+		if (errno != EBUSY && errno != EAGAIN && errno != EINTR)
+			break;
+
+		/* pull from the completion queue to free up more space */
+		if (errno == EAGAIN)
+			pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
+	}
+	pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
+}
+
+static uint16_t
+eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+	struct pkt_tx_queue *txq = queue;
+	struct xsk_umem_info *umem = txq->pair->umem;
+	struct rte_mbuf *mbuf;
+	void *addrs[ETH_AF_XDP_TX_BATCH_SIZE];
+	unsigned long tx_bytes = 0;
+	int i, valid = 0;
+	uint32_t idx_tx;
+
+	nb_pkts = RTE_MIN(nb_pkts, ETH_AF_XDP_TX_BATCH_SIZE);
+
+	pull_umem_cq(umem, nb_pkts);
+
+	nb_pkts = rte_ring_dequeue_bulk(umem->buf_ring, addrs,
+					nb_pkts, NULL);
+	if (nb_pkts == 0)
+		return 0;
+
+	if (xsk_ring_prod__reserve(&txq->tx, nb_pkts, &idx_tx) != nb_pkts) {
+		kick_tx(txq);
+		return 0;
+	}
+
+	for (i = 0; i < nb_pkts; i++) {
+		struct xdp_desc *desc;
+		void *pkt;
+		uint32_t buf_len = ETH_AF_XDP_FRAME_SIZE
+					- ETH_AF_XDP_DATA_HEADROOM;
+		desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i);
+		mbuf = bufs[i];
+		if (mbuf->pkt_len <= buf_len) {
+			desc->addr = (uint64_t)addrs[valid];
+			desc->len = mbuf->pkt_len;
+			pkt = xsk_umem__get_data(umem->buffer,
+						 desc->addr);
+			rte_memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
+			       desc->len);
+			valid++;
+			tx_bytes += mbuf->pkt_len;
+		}
+		rte_pktmbuf_free(mbuf);
+	}
+
+	xsk_ring_prod__submit(&txq->tx, nb_pkts);
+
+	kick_tx(txq);
+
+	if (valid < nb_pkts)
+		rte_ring_enqueue_bulk(umem->buf_ring, &addrs[valid],
+				 nb_pkts - valid, NULL);
+
+	txq->stats.err_pkts += nb_pkts - valid;
+	txq->stats.tx_pkts += valid;
+	txq->stats.tx_bytes += tx_bytes;
+
+	return nb_pkts;
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = ETH_LINK_UP;
+
+	return 0;
+}
+
+/* This function gets called when the current port gets stopped. */
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = ETH_LINK_DOWN;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev)
+{
+	/* rx/tx must be paired */
+	if (dev->data->nb_rx_queues != dev->data->nb_tx_queues)
+		return -EINVAL;
+
+	return 0;
+}
+
+static void
+eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	dev_info->if_index = internals->if_index;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_pktlen = ETH_FRAME_LEN;
+	dev_info->max_rx_queues = 1;
+	dev_info->max_tx_queues = 1;
+
+	dev_info->default_rxportconf.nb_queues = 1;
+	dev_info->default_txportconf.nb_queues = 1;
+	dev_info->default_rxportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
+	dev_info->default_txportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
+}
+
+static int
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct xdp_statistics xdp_stats;
+	struct pkt_rx_queue *rxq;
+	socklen_t optlen;
+	int i, ret;
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		optlen = sizeof(struct xdp_statistics);
+		rxq = &internals->rx_queues[i];
+		stats->q_ipackets[i] = internals->rx_queues[i].stats.rx_pkts;
+		stats->q_ibytes[i] = internals->rx_queues[i].stats.rx_bytes;
+
+		stats->q_opackets[i] = internals->tx_queues[i].stats.tx_pkts;
+		stats->q_obytes[i] = internals->tx_queues[i].stats.tx_bytes;
+
+		stats->ipackets += stats->q_ipackets[i];
+		stats->ibytes += stats->q_ibytes[i];
+		stats->imissed += internals->rx_queues[i].stats.rx_dropped;
+		ret = getsockopt(xsk_socket__fd(rxq->xsk), SOL_XDP,
+				XDP_STATISTICS, &xdp_stats, &optlen);
+		if (ret != 0) {
+			AF_XDP_LOG(ERR, "getsockopt() failed for XDP_STATISTICS.\n");
+			return -1;
+		}
+		stats->imissed += xdp_stats.rx_dropped;
+
+		stats->opackets += stats->q_opackets[i];
+		stats->oerrors += internals->tx_queues[i].stats.err_pkts;
+		stats->obytes += stats->q_obytes[i];
+	}
+
+	return 0;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	int i;
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		memset(&internals->rx_queues[i].stats, 0,
+					sizeof(struct rx_stats));
+		memset(&internals->tx_queues[i].stats, 0,
+					sizeof(struct tx_stats));
+	}
+}
+
+static void remove_xdp_program(struct pmd_internals *internals)
+{
+	uint32_t curr_prog_id = 0;
+
+	if (bpf_get_link_xdp_id(internals->if_index, &curr_prog_id,
+				XDP_FLAGS_UPDATE_IF_NOEXIST)) {
+		AF_XDP_LOG(ERR, "bpf_get_link_xdp_id failed\n");
+		return;
+	}
+	bpf_set_link_xdp_fd(internals->if_index, -1,
+			XDP_FLAGS_UPDATE_IF_NOEXIST);
+}
+
+static void
+eth_dev_close(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct pkt_rx_queue *rxq;
+	int i;
+
+	AF_XDP_LOG(INFO, "Closing AF_XDP ethdev on numa socket %u\n",
+		rte_socket_id());
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		rxq = &internals->rx_queues[i];
+		if (rxq->umem == NULL)
+			break;
+		xsk_socket__delete(rxq->xsk);
+	}
+
+	(void)xsk_umem__delete(internals->umem->umem);
+	remove_xdp_program(internals);
+}
+
+static void
+eth_queue_release(void *q __rte_unused)
+{
+}
+
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+		int wait_to_complete __rte_unused)
+{
+	return 0;
+}
+
+static void xdp_umem_destroy(struct xsk_umem_info *umem)
+{
+	free(umem->buffer);
+	umem->buffer = NULL;
+
+	rte_ring_free(umem->buf_ring);
+	umem->buf_ring = NULL;
+
+	rte_free(umem);
+	umem = NULL;
+}
+
+static struct xsk_umem_info *xdp_umem_configure(void)
+{
+	struct xsk_umem_info *umem;
+	struct xsk_umem_config usr_config = {
+		.fill_size = ETH_AF_XDP_DFLT_NUM_DESCS,
+		.comp_size = ETH_AF_XDP_DFLT_NUM_DESCS,
+		.frame_size = ETH_AF_XDP_FRAME_SIZE,
+		.frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
+	void *bufs = NULL;
+	int ret;
+	uint64_t i;
+
+	umem = rte_zmalloc_socket("umem", sizeof(*umem), 0, rte_socket_id());
+	if (umem == NULL) {
+		AF_XDP_LOG(ERR, "Failed to allocate umem info");
+		return NULL;
+	}
+
+	umem->buf_ring = rte_ring_create("af_xdp_ring",
+					 ETH_AF_XDP_NUM_BUFFERS,
+					 SOCKET_ID_ANY,
+					 0x0);
+	if (umem->buf_ring == NULL) {
+		AF_XDP_LOG(ERR, "Failed to create rte_ring\n");
+		goto err;
+	}
+
+	for (i = 0; i < ETH_AF_XDP_NUM_BUFFERS; i++)
+		rte_ring_enqueue(umem->buf_ring,
+				 (void *)(i * ETH_AF_XDP_FRAME_SIZE +
+					  ETH_AF_XDP_DATA_HEADROOM));
+
+	if (posix_memalign(&bufs, getpagesize(),
+			   ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE)) {
+		AF_XDP_LOG(ERR, "Failed to allocate memory pool.\n");
+		goto err;
+	}
+	ret = xsk_umem__create(&umem->umem, bufs,
+			       ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
+			       &umem->fq, &umem->cq,
+			       &usr_config);
+
+	if (ret) {
+		AF_XDP_LOG(ERR, "Failed to create umem");
+		goto err;
+	}
+	umem->buffer = bufs;
+
+	return umem;
+
+err:
+	xdp_umem_destroy(umem);
+	return NULL;
+}
+
+static int
+xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
+	      int ring_size)
+{
+	struct xsk_socket_config cfg;
+	struct pkt_tx_queue *txq = rxq->pair;
+	int ret = 0;
+	int reserve_size;
+
+	rxq->umem = xdp_umem_configure();
+	if (rxq->umem == NULL)
+		return -ENOMEM;
+
+	cfg.rx_size = ring_size;
+	cfg.tx_size = ring_size;
+	cfg.libbpf_flags = 0;
+	cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
+	cfg.bind_flags = 0;
+	ret = xsk_socket__create(&rxq->xsk, internals->if_name,
+			internals->queue_idx, rxq->umem->umem, &rxq->rx,
+			&txq->tx, &cfg);
+	if (ret) {
+		AF_XDP_LOG(ERR, "Failed to create xsk socket.\n");
+		goto err;
+	}
+
+	reserve_size = ETH_AF_XDP_DFLT_NUM_DESCS / 2;
+	ret = reserve_fill_queue(rxq->umem, reserve_size);
+	if (ret) {
+		xsk_socket__delete(rxq->xsk);
+		AF_XDP_LOG(ERR, "Failed to reserve fill queue.\n");
+		goto err;
+	}
+
+	return 0;
+
+err:
+	xdp_umem_destroy(rxq->umem);
+
+	return ret;
+}
+
+static void
+queue_reset(struct pmd_internals *internals, uint16_t queue_idx)
+{
+	struct pkt_rx_queue *rxq = &internals->rx_queues[queue_idx];
+	struct pkt_tx_queue *txq = rxq->pair;
+
+	memset(rxq, 0, sizeof(*rxq));
+	memset(txq, 0, sizeof(*txq));
+	rxq->pair = txq;
+	txq->pair = rxq;
+	rxq->queue_idx = queue_idx;
+	txq->queue_idx = queue_idx;
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev,
+		   uint16_t rx_queue_id,
+		   uint16_t nb_rx_desc,
+		   unsigned int socket_id __rte_unused,
+		   const struct rte_eth_rxconf *rx_conf __rte_unused,
+		   struct rte_mempool *mb_pool)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	uint32_t buf_size, data_size;
+	struct pkt_rx_queue *rxq;
+	int ret;
+
+	rxq = &internals->rx_queues[rx_queue_id];
+	queue_reset(internals, rx_queue_id);
+
+	/* Now get the space available for data in the mbuf */
+	buf_size = rte_pktmbuf_data_room_size(mb_pool) -
+		RTE_PKTMBUF_HEADROOM;
+	data_size = ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_DATA_HEADROOM;
+
+	if (data_size > buf_size) {
+		AF_XDP_LOG(ERR, "%s: %d bytes will not fit in mbuf (%d bytes)\n",
+			dev->device->name, data_size, buf_size);
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	rxq->mb_pool = mb_pool;
+
+	if (xsk_configure(internals, rxq, nb_rx_desc)) {
+		AF_XDP_LOG(ERR, "Failed to configure xdp socket\n");
+		ret = -EINVAL;
+		goto err;
+	}
+
+	internals->umem = rxq->umem;
+
+	dev->data->rx_queues[rx_queue_id] = rxq;
+	return 0;
+
+err:
+	queue_reset(internals, rx_queue_id);
+	return ret;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev,
+		   uint16_t tx_queue_id,
+		   uint16_t nb_tx_desc __rte_unused,
+		   unsigned int socket_id __rte_unused,
+		   const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct pkt_tx_queue *txq;
+
+	txq = &internals->tx_queues[tx_queue_id];
+
+	dev->data->tx_queues[tx_queue_id] = txq;
+	return 0;
+}
+
+static int
+eth_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct ifreq ifr = { .ifr_mtu = mtu };
+	int ret;
+	int s;
+
+	s = socket(PF_INET, SOCK_DGRAM, 0);
+	if (s < 0)
+		return -EINVAL;
+
+	strlcpy(ifr.ifr_name, internals->if_name, IFNAMSIZ);
+	ret = ioctl(s, SIOCSIFMTU, &ifr);
+	close(s);
+
+	return (ret < 0) ? -errno : 0;
+}
+
+static void
+eth_dev_change_flags(char *if_name, uint32_t flags, uint32_t mask)
+{
+	struct ifreq ifr;
+	int s;
+
+	s = socket(PF_INET, SOCK_DGRAM, 0);
+	if (s < 0)
+		return;
+
+	strlcpy(ifr.ifr_name, if_name, IFNAMSIZ);
+	if (ioctl(s, SIOCGIFFLAGS, &ifr) < 0)
+		goto out;
+	ifr.ifr_flags &= mask;
+	ifr.ifr_flags |= flags;
+	if (ioctl(s, SIOCSIFFLAGS, &ifr) < 0)
+		goto out;
+out:
+	close(s);
+}
+
+static void
+eth_dev_promiscuous_enable(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	eth_dev_change_flags(internals->if_name, IFF_PROMISC, ~0);
+}
+
+static void
+eth_dev_promiscuous_disable(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	eth_dev_change_flags(internals->if_name, 0, ~IFF_PROMISC);
+}
+
+static const struct eth_dev_ops ops = {
+	.dev_start = eth_dev_start,
+	.dev_stop = eth_dev_stop,
+	.dev_close = eth_dev_close,
+	.dev_configure = eth_dev_configure,
+	.dev_infos_get = eth_dev_info,
+	.mtu_set = eth_dev_mtu_set,
+	.promiscuous_enable = eth_dev_promiscuous_enable,
+	.promiscuous_disable = eth_dev_promiscuous_disable,
+	.rx_queue_setup = eth_rx_queue_setup,
+	.tx_queue_setup = eth_tx_queue_setup,
+	.rx_queue_release = eth_queue_release,
+	.tx_queue_release = eth_queue_release,
+	.link_update = eth_link_update,
+	.stats_get = eth_stats_get,
+	.stats_reset = eth_stats_reset,
+};
+
+/** parse integer from integer argument */
+static int
+parse_integer_arg(const char *key __rte_unused,
+		  const char *value, void *extra_args)
+{
+	int *i = (int *)extra_args;
+	char *end;
+
+	*i = strtol(value, &end, 10);
+	if (*i < 0) {
+		AF_XDP_LOG(ERR, "Argument has to be positive.\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/** parse name argument */
+static int
+parse_name_arg(const char *key __rte_unused,
+	       const char *value, void *extra_args)
+{
+	char *name = extra_args;
+
+	if (strnlen(value, IFNAMSIZ) > IFNAMSIZ - 1) {
+		AF_XDP_LOG(ERR, "Invalid name %s, should be less than %u bytes.\n",
+			   value, IFNAMSIZ);
+		return -EINVAL;
+	}
+
+	strlcpy(name, value, IFNAMSIZ);
+
+	return 0;
+}
+
+static int
+parse_parameters(struct rte_kvargs *kvlist,
+		 char *if_name,
+		 int *queue_idx)
+{
+	int ret;
+
+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_IFACE_ARG,
+				 &parse_name_arg, if_name);
+	if (ret < 0)
+		goto free_kvlist;
+
+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_QUEUE_IDX_ARG,
+				 &parse_integer_arg, queue_idx);
+	if (ret < 0)
+		goto free_kvlist;
+
+free_kvlist:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
+static int
+get_iface_info(const char *if_name,
+	       struct ether_addr *eth_addr,
+	       int *if_index)
+{
+	struct ifreq ifr;
+	int sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_IP);
+
+	if (sock < 0)
+		return -1;
+
+	strlcpy(ifr.ifr_name, if_name, IFNAMSIZ);
+	if (ioctl(sock, SIOCGIFINDEX, &ifr))
+		goto error;
+
+	*if_index = ifr.ifr_ifindex;
+
+	if (ioctl(sock, SIOCGIFHWADDR, &ifr))
+		goto error;
+
+	rte_memcpy(eth_addr, ifr.ifr_hwaddr.sa_data, ETHER_ADDR_LEN);
+
+	close(sock);
+	return 0;
+
+error:
+	close(sock);
+	return -1;
+}
+
+static struct rte_eth_dev *
+init_internals(struct rte_vdev_device *dev,
+	       const char *if_name,
+	       int queue_idx)
+{
+	const char *name = rte_vdev_device_name(dev);
+	const unsigned int numa_node = dev->device.numa_node;
+	struct pmd_internals *internals;
+	struct rte_eth_dev *eth_dev;
+	int ret;
+	int i;
+
+	internals = rte_zmalloc_socket(name, sizeof(*internals), 0, numa_node);
+	if (internals == NULL)
+		return NULL;
+
+	internals->queue_idx = queue_idx;
+	strlcpy(internals->if_name, if_name, IFNAMSIZ);
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		internals->tx_queues[i].pair = &internals->rx_queues[i];
+		internals->rx_queues[i].pair = &internals->tx_queues[i];
+	}
+
+	ret = get_iface_info(if_name, &internals->eth_addr,
+			     &internals->if_index);
+	if (ret)
+		goto err;
+
+	eth_dev = rte_eth_vdev_allocate(dev, 0);
+	if (eth_dev == NULL)
+		goto err;
+
+	eth_dev->data->dev_private = internals;
+	eth_dev->data->dev_link = pmd_link;
+	eth_dev->data->mac_addrs = &internals->eth_addr;
+	eth_dev->dev_ops = &ops;
+	eth_dev->rx_pkt_burst = eth_af_xdp_rx;
+	eth_dev->tx_pkt_burst = eth_af_xdp_tx;
+
+	return eth_dev;
+
+err:
+	rte_free(internals);
+	return NULL;
+}
+
+static int
+rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
+{
+	struct rte_kvargs *kvlist;
+	char if_name[IFNAMSIZ];
+	int xsk_queue_idx = ETH_AF_XDP_DFLT_QUEUE_IDX;
+	struct rte_eth_dev *eth_dev = NULL;
+	const char *name;
+
+	AF_XDP_LOG(INFO, "Initializing pmd_af_xdp for %s\n",
+		rte_vdev_device_name(dev));
+
+	name = rte_vdev_device_name(dev);
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY &&
+		strlen(rte_vdev_device_args(dev)) == 0) {
+		eth_dev = rte_eth_dev_attach_secondary(name);
+		if (eth_dev == NULL) {
+			AF_XDP_LOG(ERR, "Failed to probe %s\n", name);
+			return -EINVAL;
+		}
+		eth_dev->dev_ops = &ops;
+		rte_eth_dev_probing_finish(eth_dev);
+		return 0;
+	}
+
+	kvlist = rte_kvargs_parse(rte_vdev_device_args(dev), valid_arguments);
+	if (kvlist == NULL) {
+		AF_XDP_LOG(ERR, "Invalid kvargs key\n");
+		return -EINVAL;
+	}
+
+	if (dev->device.numa_node == SOCKET_ID_ANY)
+		dev->device.numa_node = rte_socket_id();
+
+	if (parse_parameters(kvlist, if_name, &xsk_queue_idx) < 0) {
+		AF_XDP_LOG(ERR, "Invalid kvargs value\n");
+		return -EINVAL;
+	}
+
+	eth_dev = init_internals(dev, if_name, xsk_queue_idx);
+	if (eth_dev == NULL) {
+		AF_XDP_LOG(ERR, "Failed to init internals\n");
+		return -1;
+	}
+
+	rte_eth_dev_probing_finish(eth_dev);
+
+	return 0;
+}
+
+static int
+rte_pmd_af_xdp_remove(struct rte_vdev_device *dev)
+{
+	struct rte_eth_dev *eth_dev = NULL;
+	struct pmd_internals *internals;
+
+	AF_XDP_LOG(INFO, "Removing AF_XDP ethdev on numa socket %u\n",
+		rte_socket_id());
+
+	if (dev == NULL)
+		return -1;
+
+	/* find the ethdev entry */
+	eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(dev));
+	if (eth_dev == NULL)
+		return -1;
+
+	internals = eth_dev->data->dev_private;
+
+	rte_ring_free(internals->umem->buf_ring);
+	rte_free(internals->umem->buffer);
+	rte_free(internals->umem);
+
+	rte_eth_dev_release_port(eth_dev);
+
+
+	return 0;
+}
+
+static struct rte_vdev_driver pmd_af_xdp_drv = {
+	.probe = rte_pmd_af_xdp_probe,
+	.remove = rte_pmd_af_xdp_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(net_af_xdp, pmd_af_xdp_drv);
+RTE_PMD_REGISTER_PARAM_STRING(net_af_xdp,
+			      "iface=<string> "
+			      "queue=<int> ");
+
+RTE_INIT(af_xdp_init_log)
+{
+	af_xdp_logtype = rte_log_register("pmd.net.af_xdp");
+	if (af_xdp_logtype >= 0)
+		rte_log_set_level(af_xdp_logtype, RTE_LOG_NOTICE);
+}
diff --git a/drivers/net/af_xdp/rte_pmd_af_xdp_version.map b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
new file mode 100644
index 000000000..c6db030fe
--- /dev/null
+++ b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
@@ -0,0 +1,3 @@
+DPDK_19.05 {
+	local: *;
+};
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index 3ecc78cee..1105e72d8 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2017 Intel Corporation
 
 drivers = ['af_packet',
+	'af_xdp',
 	'ark',
 	'atlantic',
 	'avp',
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 262132fc6..be0af73cc 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -143,6 +143,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_DPAA2_MEMPOOL)  += -lrte_mempool_dpaa2
 endif
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     += -lrte_pmd_af_xdp -lelf -lbpf
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ARK_PMD)        += -lrte_pmd_ark
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD)   += -lrte_pmd_atlantic
 _LDLIBS-$(CONFIG_RTE_LIBRTE_AVP_PMD)        += -lrte_pmd_avp
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* [PATCH v7 2/5] lib/mbuf: introduce helper to create mempool with flags
  2019-03-27  9:00 ` [PATCH v7 0/5] Introduce AF_XDP PMD Xiaolong Ye
  2019-03-27  9:00   ` [PATCH v7 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
@ 2019-03-27  9:00   ` Xiaolong Ye
  2019-03-28 19:30     ` Ferruh Yigit
  2019-03-27  9:00   ` [PATCH v7 3/5] lib/mempool: allow page size aligned mempool Xiaolong Ye
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-27  9:00 UTC (permalink / raw)
  To: dev, David Marchand, Andrew Rybchenko
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Stephen Hemminger, Ferruh Yigit, Luca Boccassi, Bruce Richardson,
	Ananyev Konstantin, Xiaolong Ye

This allows applications to create an mbuf mempool with specific flags,
such as MEMPOOL_F_NO_SPREAD, if they want fixed-size memory objects.
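
Usage is the same as rte_pktmbuf_pool_create() plus a flags argument; a
short sketch (pool name and sizes are illustrative only):

	struct rte_mempool *mp;

	/* don't pad the object size to spread objects across memory
	 * channels: objects keep a fixed size */
	mp = rte_pktmbuf_pool_create_with_flags("fixed_obj_pool", 4096,
			250, 0, 2048, MEMPOOL_F_NO_SPREAD, SOCKET_ID_ANY);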

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 lib/librte_mbuf/rte_mbuf.c           | 29 ++++++++++++++----
 lib/librte_mbuf/rte_mbuf.h           | 45 ++++++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf_version.map |  1 +
 3 files changed, 70 insertions(+), 5 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 21f6f7404..c1db9e298 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -106,11 +106,10 @@ rte_pktmbuf_init(struct rte_mempool *mp,
 	m->next = NULL;
 }
 
-/* Helper to create a mbuf pool with given mempool ops name*/
-struct rte_mempool *
-rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
+static struct rte_mempool *
+rte_pktmbuf_pool_create_by_ops_with_flags(const char *name, unsigned int n,
 	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
-	int socket_id, const char *ops_name)
+	unsigned int flags, int socket_id, const char *ops_name)
 {
 	struct rte_mempool *mp;
 	struct rte_pktmbuf_pool_private mbp_priv;
@@ -130,7 +129,7 @@ rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
 	mbp_priv.mbuf_priv_size = priv_size;
 
 	mp = rte_mempool_create_empty(name, n, elt_size, cache_size,
-		 sizeof(struct rte_pktmbuf_pool_private), socket_id, 0);
+		 sizeof(struct rte_pktmbuf_pool_private), socket_id, flags);
 	if (mp == NULL)
 		return NULL;
 
@@ -157,6 +156,16 @@ rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
 	return mp;
 }
 
+/* Helper to create a mbuf pool with given mempool ops name*/
+struct rte_mempool *
+rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
+	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
+	int socket_id, const char *ops_name)
+{
+	return rte_pktmbuf_pool_create_by_ops_with_flags(name, n, cache_size,
+			 priv_size, data_room_size, 0, socket_id, ops_name);
+}
+
 /* helper to create a mbuf pool */
 struct rte_mempool *
 rte_pktmbuf_pool_create(const char *name, unsigned int n,
@@ -167,6 +176,16 @@ rte_pktmbuf_pool_create(const char *name, unsigned int n,
 			data_room_size, socket_id, NULL);
 }
 
+/* helper to create a mbuf pool with flags (e.g. NO_SPREAD) */
+struct rte_mempool * __rte_experimental
+rte_pktmbuf_pool_create_with_flags(const char *name, unsigned int n,
+	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
+	unsigned int flags, int socket_id)
+{
+	return rte_pktmbuf_pool_create_by_ops_with_flags(name, n, cache_size,
+			priv_size, data_room_size, flags, socket_id, NULL);
+}
+
 /* do some sanity checks on a mbuf: panic if it fails */
 void
 rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header)
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index d961ccaf6..105ead6de 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1266,6 +1266,51 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
 	unsigned cache_size, uint16_t priv_size, uint16_t data_room_size,
 	int socket_id);
 
+/**
+ * Create a mbuf pool with flags.
+ *
+ * This function creates and initializes a packet mbuf pool. It is
+ * a wrapper to rte_mempool functions.
+ *
+ * @warning
+ * @b EXPERIMENTAL: This API may change without prior notice.
+ *
+ * @param name
+ *   The name of the mbuf pool.
+ * @param n
+ *   The number of elements in the mbuf pool. The optimum size (in terms
+ *   of memory usage) for a mempool is when n is a power of two minus one:
+ *   n = (2^q - 1).
+ * @param cache_size
+ *   Size of the per-core object cache. See rte_mempool_create() for
+ *   details.
+ * @param priv_size
+ *   Size of the application private area between the rte_mbuf structure
+ *   and the data buffer. This value must be aligned to RTE_MBUF_PRIV_ALIGN.
+ * @param data_room_size
+ *   Size of data buffer in each mbuf, including RTE_PKTMBUF_HEADROOM.
+ * @param flags
+ *   Flags controlling the behavior of the mempool. See
+ *   rte_mempool_create() for details.
+ * @param socket_id
+ *   The socket identifier where the memory should be allocated. The
+ *   value can be *SOCKET_ID_ANY* if there is no NUMA constraint for the
+ *   reserved zone.
+ * @return
+ *   The pointer to the new allocated mempool, on success. NULL on error
+ *   with rte_errno set appropriately. Possible rte_errno values include:
+ *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
+ *    - E_RTE_SECONDARY - function was called from a secondary process instance
+ *    - EINVAL - cache size provided is too large, or priv_size is not aligned.
+ *    - ENOSPC - the maximum number of memzones has already been allocated
+ *    - EEXIST - a memzone with the same name already exists
+ *    - ENOMEM - no appropriate memory area found in which to create memzone
+ */
+struct rte_mempool * __rte_experimental
+rte_pktmbuf_pool_create_with_flags(const char *name, unsigned int n,
+	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
+	unsigned int flags, int socket_id);
+
 /**
  * Create a mbuf pool with a given mempool ops name
  *
diff --git a/lib/librte_mbuf/rte_mbuf_version.map b/lib/librte_mbuf/rte_mbuf_version.map
index 2662a37bf..2579538e0 100644
--- a/lib/librte_mbuf/rte_mbuf_version.map
+++ b/lib/librte_mbuf/rte_mbuf_version.map
@@ -50,4 +50,5 @@ EXPERIMENTAL {
 	global:
 
 	rte_mbuf_check;
+	rte_pktmbuf_pool_create_with_flags;
 } DPDK_18.08;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* [PATCH v7 3/5] lib/mempool: allow page size aligned mempool
  2019-03-27  9:00 ` [PATCH v7 0/5] Introduce AF_XDP PMD Xiaolong Ye
  2019-03-27  9:00   ` [PATCH v7 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
  2019-03-27  9:00   ` [PATCH v7 2/5] lib/mbuf: introduce helper to create mempool with flags Xiaolong Ye
@ 2019-03-27  9:00   ` Xiaolong Ye
  2019-03-28 19:34     ` Ferruh Yigit
  2019-03-29 10:37     ` Andrew Rybchenko
  2019-03-27  9:00   ` [PATCH v7 4/5] net/af_xdp: use mbuf mempool for buffer management Xiaolong Ye
  2019-03-27  9:00   ` [PATCH v7 5/5] net/af_xdp: enable zero copy Xiaolong Ye
  4 siblings, 2 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-27  9:00 UTC (permalink / raw)
  To: dev, David Marchand, Andrew Rybchenko
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Stephen Hemminger, Ferruh Yigit, Luca Boccassi, Bruce Richardson,
	Ananyev Konstantin, Xiaolong Ye

Allow creating a mempool with a page-size-aligned base address.
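
For example (sketch only; n and the element/cache/private sizes are
placeholders), a pool whose backing memzone starts on a page boundary
can be requested with:

	/* chunk base address will be aligned to getpagesize() */
	mp = rte_mempool_create_empty("aligned_pool", n, elt_size,
			cache_size, priv_size, socket_id,
			MEMPOOL_CHUNK_F_PAGE_ALIGN);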

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 lib/librte_mempool/rte_mempool.c | 3 +++
 lib/librte_mempool/rte_mempool.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 683b216f9..cfbb49ea5 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -543,6 +543,9 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 		if (try_contig)
 			flags |= RTE_MEMZONE_IOVA_CONTIG;
 
+		if (mp->flags & MEMPOOL_CHUNK_F_PAGE_ALIGN)
+			align = RTE_MAX(align, (size_t)getpagesize());
+
 		mz = rte_memzone_reserve_aligned(mz_name, mem_size,
 				mp->socket_id, flags, align);
 
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 7c9cd9a2f..47729f7c9 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -264,6 +264,7 @@ struct rte_mempool {
 #define MEMPOOL_F_POOL_CREATED   0x0010 /**< Internal: pool is created. */
 #define MEMPOOL_F_NO_IOVA_CONTIG 0x0020 /**< Don't need IOVA contiguous objs. */
 #define MEMPOOL_F_NO_PHYS_CONTIG MEMPOOL_F_NO_IOVA_CONTIG /* deprecated */
+#define MEMPOOL_CHUNK_F_PAGE_ALIGN     0x0040 /**< Chunk's base address is page aligned */
 
 /**
  * @internal When debug is enabled, store some statistics.
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* [PATCH v7 4/5] net/af_xdp: use mbuf mempool for buffer management
  2019-03-27  9:00 ` [PATCH v7 0/5] Introduce AF_XDP PMD Xiaolong Ye
                     ` (2 preceding siblings ...)
  2019-03-27  9:00   ` [PATCH v7 3/5] lib/mempool: allow page size aligned mempool Xiaolong Ye
@ 2019-03-27  9:00   ` Xiaolong Ye
  2019-03-27  9:00   ` [PATCH v7 5/5] net/af_xdp: enable zero copy Xiaolong Ye
  4 siblings, 0 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-27  9:00 UTC (permalink / raw)
  To: dev, David Marchand, Andrew Rybchenko
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Stephen Hemminger, Ferruh Yigit, Luca Boccassi, Bruce Richardson,
	Ananyev Konstantin, Xiaolong Ye

Now the af_xdp registered memory buffer is managed by rte_mempool. An mbuf
allocated from rte_mempool can be converted to an xdp_desc address and
vice versa.
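
For illustration, each 2KB (ETH_AF_XDP_FRAME_SIZE) umem frame is laid
out as below, assuming a 64-byte mempool object header, a 128-byte
struct rte_mbuf and a 128-byte RTE_PKTMBUF_HEADROOM:

	frame base +   0: mempool object header (64 bytes)
	frame base +  64: struct rte_mbuf (128 bytes)
	frame base + 192: mbuf buf_addr (headroom, then packet data)

so addr_to_mbuf() recovers the mbuf at frame base + 64, and
mbuf_to_addr() returns the umem-relative address of the mbuf data.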

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 drivers/net/af_xdp/rte_eth_af_xdp.c | 117 +++++++++++++++++-----------
 1 file changed, 72 insertions(+), 45 deletions(-)

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 47a496ed7..a1fda9212 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -48,7 +48,11 @@ static int af_xdp_logtype;
 
 #define ETH_AF_XDP_FRAME_SIZE		XSK_UMEM__DEFAULT_FRAME_SIZE
 #define ETH_AF_XDP_NUM_BUFFERS		4096
-#define ETH_AF_XDP_DATA_HEADROOM	0
+/* mempool hdrobj size (64 bytes) + sizeof(struct rte_mbuf) (128 bytes) */
+#define ETH_AF_XDP_MBUF_OVERHEAD	192
+/* data start from offset 320 (192 + 128) bytes */
+#define ETH_AF_XDP_DATA_HEADROOM				\
+	(ETH_AF_XDP_MBUF_OVERHEAD + RTE_PKTMBUF_HEADROOM)
 #define ETH_AF_XDP_DFLT_NUM_DESCS	XSK_RING_CONS__DEFAULT_NUM_DESCS
 #define ETH_AF_XDP_DFLT_QUEUE_IDX	0
 
@@ -61,7 +65,7 @@ struct xsk_umem_info {
 	struct xsk_ring_prod fq;
 	struct xsk_ring_cons cq;
 	struct xsk_umem *umem;
-	struct rte_ring *buf_ring;
+	struct rte_mempool *mb_pool;
 	void *buffer;
 };
 
@@ -123,10 +127,30 @@ static struct rte_eth_link pmd_link = {
 	.link_autoneg = ETH_LINK_AUTONEG
 };
 
+static inline struct rte_mbuf *
+addr_to_mbuf(struct xsk_umem_info *umem, uint64_t addr)
+{
+	uint64_t offset = (addr / ETH_AF_XDP_FRAME_SIZE *
+			ETH_AF_XDP_FRAME_SIZE);
+	struct rte_mbuf *mbuf = (struct rte_mbuf *)((uint64_t)umem->buffer +
+				    offset + ETH_AF_XDP_MBUF_OVERHEAD -
+				    sizeof(struct rte_mbuf));
+	mbuf->data_off = addr - offset - ETH_AF_XDP_MBUF_OVERHEAD;
+	return mbuf;
+}
+
+static inline uint64_t
+mbuf_to_addr(struct xsk_umem_info *umem, struct rte_mbuf *mbuf)
+{
+	return (uint64_t)mbuf->buf_addr + mbuf->data_off -
+		(uint64_t)umem->buffer;
+}
+
 static inline int
 reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
 {
 	struct xsk_ring_prod *fq = &umem->fq;
+	struct rte_mbuf *mbuf;
 	uint32_t idx;
 	int i, ret;
 
@@ -138,13 +162,15 @@ reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
 
 	for (i = 0; i < reserve_size; i++) {
 		__u64 *fq_addr;
-		void *addr = NULL;
-		if (rte_ring_dequeue(umem->buf_ring, &addr)) {
+		uint64_t addr;
+		mbuf = rte_pktmbuf_alloc(umem->mb_pool);
+		if (unlikely(mbuf == NULL)) {
 			i--;
 			break;
 		}
+		addr = mbuf_to_addr(umem, mbuf);
 		fq_addr = xsk_ring_prod__fill_addr(fq, idx++);
-		*fq_addr = (uint64_t)addr;
+		*fq_addr = addr;
 	}
 
 	xsk_ring_prod__submit(fq, i);
@@ -196,7 +222,7 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		rx_bytes += len;
 		bufs[count++] = mbufs[i];
 
-		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+		rte_pktmbuf_free(addr_to_mbuf(umem, addr));
 	}
 
 	xsk_ring_cons__release(rx, rcvd);
@@ -219,7 +245,7 @@ static void pull_umem_cq(struct xsk_umem_info *umem, int size)
 	for (i = 0; i < n; i++) {
 		uint64_t addr;
 		addr = *xsk_ring_cons__comp_addr(cq, idx_cq++);
-		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+		rte_pktmbuf_free(addr_to_mbuf(umem, addr));
 	}
 
 	xsk_ring_cons__release(cq, n);
@@ -248,7 +274,7 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	struct pkt_tx_queue *txq = queue;
 	struct xsk_umem_info *umem = txq->pair->umem;
 	struct rte_mbuf *mbuf;
-	void *addrs[ETH_AF_XDP_TX_BATCH_SIZE];
+	struct rte_mbuf *mbuf_to_tx;
 	unsigned long tx_bytes = 0;
 	int i, valid = 0;
 	uint32_t idx_tx;
@@ -257,11 +283,6 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 
 	pull_umem_cq(umem, nb_pkts);
 
-	nb_pkts = rte_ring_dequeue_bulk(umem->buf_ring, addrs,
-					nb_pkts, NULL);
-	if (nb_pkts == 0)
-		return 0;
-
 	if (xsk_ring_prod__reserve(&txq->tx, nb_pkts, &idx_tx) != nb_pkts) {
 		kick_tx(txq);
 		return 0;
@@ -275,7 +296,12 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i);
 		mbuf = bufs[i];
 		if (mbuf->pkt_len <= buf_len) {
-			desc->addr = (uint64_t)addrs[valid];
+			mbuf_to_tx = rte_pktmbuf_alloc(umem->mb_pool);
+			if (unlikely(mbuf_to_tx == NULL)) {
+				rte_pktmbuf_free(mbuf);
+				continue;
+			}
+			desc->addr = mbuf_to_addr(umem, mbuf_to_tx);
 			desc->len = mbuf->pkt_len;
 			pkt = xsk_umem__get_data(umem->buffer,
 						 desc->addr);
@@ -291,10 +317,6 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 
 	kick_tx(txq);
 
-	if (valid < nb_pkts)
-		rte_ring_enqueue_bulk(umem->buf_ring, &addrs[valid],
-				 nb_pkts - valid, NULL);
-
 	txq->stats.err_pkts += nb_pkts - valid;
 	txq->stats.tx_pkts += valid;
 	txq->stats.tx_bytes += tx_bytes;
@@ -443,16 +465,29 @@ eth_link_update(struct rte_eth_dev *dev __rte_unused,
 
 static void xdp_umem_destroy(struct xsk_umem_info *umem)
 {
-	free(umem->buffer);
-	umem->buffer = NULL;
-
-	rte_ring_free(umem->buf_ring);
-	umem->buf_ring = NULL;
+	rte_mempool_free(umem->mb_pool);
+	umem->mb_pool = NULL;
 
 	rte_free(umem);
 	umem = NULL;
 }
 
+static inline uint64_t get_base_addr(struct rte_mempool *mp)
+{
+	struct rte_mempool_memhdr *memhdr;
+
+	memhdr = STAILQ_FIRST(&mp->mem_list);
+	return (uint64_t)(memhdr->addr);
+}
+
+static inline uint64_t get_len(struct rte_mempool *mp)
+{
+	struct rte_mempool_memhdr *memhdr;
+
+	memhdr = STAILQ_FIRST(&mp->mem_list);
+	return (uint64_t)(memhdr->len);
+}
+
 static struct xsk_umem_info *xdp_umem_configure(void)
 {
 	struct xsk_umem_info *umem;
@@ -461,9 +496,8 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 		.comp_size = ETH_AF_XDP_DFLT_NUM_DESCS,
 		.frame_size = ETH_AF_XDP_FRAME_SIZE,
 		.frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
-	void *bufs = NULL;
+	void *base_addr = NULL;
 	int ret;
-	uint64_t i;
 
 	umem = rte_zmalloc_socket("umem", sizeof(*umem), 0, rte_socket_id());
 	if (umem == NULL) {
@@ -471,26 +505,20 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 		return NULL;
 	}
 
-	umem->buf_ring = rte_ring_create("af_xdp_ring",
-					 ETH_AF_XDP_NUM_BUFFERS,
-					 SOCKET_ID_ANY,
-					 0x0);
-	if (umem->buf_ring == NULL) {
-		AF_XDP_LOG(ERR, "Failed to create rte_ring\n");
+	umem->mb_pool = rte_pktmbuf_pool_create_with_flags("af_xdp_mempool",
+			ETH_AF_XDP_NUM_BUFFERS,
+			250, 0,
+			ETH_AF_XDP_FRAME_SIZE -
+			ETH_AF_XDP_MBUF_OVERHEAD,
+			MEMPOOL_F_NO_SPREAD | MEMPOOL_CHUNK_F_PAGE_ALIGN,
+			SOCKET_ID_ANY);
+	if (umem->mb_pool == NULL || umem->mb_pool->nb_mem_chunks != 1) {
+		AF_XDP_LOG(ERR, "Failed to create mempool\n");
 		goto err;
 	}
+	base_addr = (void *)get_base_addr(umem->mb_pool);
 
-	for (i = 0; i < ETH_AF_XDP_NUM_BUFFERS; i++)
-		rte_ring_enqueue(umem->buf_ring,
-				 (void *)(i * ETH_AF_XDP_FRAME_SIZE +
-					  ETH_AF_XDP_DATA_HEADROOM));
-
-	if (posix_memalign(&bufs, getpagesize(),
-			   ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE)) {
-		AF_XDP_LOG(ERR, "Failed to allocate memory pool.\n");
-		goto err;
-	}
-	ret = xsk_umem__create(&umem->umem, bufs,
+	ret = xsk_umem__create(&umem->umem, base_addr,
 			       ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
 			       &umem->fq, &umem->cq,
 			       &usr_config);
@@ -499,7 +527,7 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 		AF_XDP_LOG(ERR, "Failed to create umem");
 		goto err;
 	}
-	umem->buffer = bufs;
+	umem->buffer = base_addr;
 
 	return umem;
 
@@ -903,10 +931,9 @@ rte_pmd_af_xdp_remove(struct rte_vdev_device *dev)
 
 	internals = eth_dev->data->dev_private;
 
-	rte_ring_free(internals->umem->buf_ring);
-	rte_free(internals->umem->buffer);
+	rte_mempool_free(internals->umem->mb_pool);
 	rte_free(internals->umem);
 
 	rte_eth_dev_release_port(eth_dev);
 
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* [PATCH v7 5/5] net/af_xdp: enable zero copy
  2019-03-27  9:00 ` [PATCH v7 0/5] Introduce AF_XDP PMD Xiaolong Ye
                     ` (3 preceding siblings ...)
  2019-03-27  9:00   ` [PATCH v7 4/5] net/af_xdp: use mbuf mempool for buffer management Xiaolong Ye
@ 2019-03-27  9:00   ` Xiaolong Ye
  2019-03-28 18:44     ` Ferruh Yigit
  4 siblings, 1 reply; 214+ messages in thread
From: Xiaolong Ye @ 2019-03-27  9:00 UTC (permalink / raw)
  To: dev, David Marchand, Andrew Rybchenko
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Stephen Hemminger, Ferruh Yigit, Luca Boccassi, Bruce Richardson,
	Ananyev Konstantin, Xiaolong Ye

Check whether the external mempool (from rx_queue_setup) is suitable for
af_xdp; if it is, it will be registered to the af_xdp socket directly and
there will be no packet data copy on Rx and Tx.
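
For a mempool to qualify, its objects must line up exactly with the 2KB
umem frames. A sketch of a pool that would pass the checks below, using
the same constants as the driver's internal pool:

	/* elt_size + header + trailer == ETH_AF_XDP_FRAME_SIZE, so
	 * every object occupies exactly one umem frame */
	mp = rte_pktmbuf_pool_create_with_flags("zc_pool",
			ETH_AF_XDP_NUM_BUFFERS, 250, 0,
			ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_MBUF_OVERHEAD,
			MEMPOOL_F_NO_SPREAD | MEMPOOL_CHUNK_F_PAGE_ALIGN,
			SOCKET_ID_ANY);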

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 drivers/net/af_xdp/rte_eth_af_xdp.c | 130 ++++++++++++++++++++--------
 1 file changed, 96 insertions(+), 34 deletions(-)

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index a1fda9212..c6ade4c94 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -67,6 +67,7 @@ struct xsk_umem_info {
 	struct xsk_umem *umem;
 	struct rte_mempool *mb_pool;
 	void *buffer;
+	uint8_t zc;
 };
 
 struct rx_stats {
@@ -85,6 +86,7 @@ struct pkt_rx_queue {
 
 	struct pkt_tx_queue *pair;
 	uint16_t queue_idx;
+	uint8_t zc;
 };
 
 struct tx_stats {
@@ -202,7 +204,9 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	if (xsk_prod_nb_free(fq, free_thresh) >= free_thresh)
 		(void)reserve_fill_queue(umem, ETH_AF_XDP_RX_BATCH_SIZE);
 
-	if (unlikely(rte_pktmbuf_alloc_bulk(rxq->mb_pool, mbufs, rcvd) != 0))
+	if (!rxq->zc &&
+		unlikely(rte_pktmbuf_alloc_bulk(rxq->mb_pool,
+				mbufs, rcvd) != 0))
 		return 0;
 
 	for (i = 0; i < rcvd; i++) {
@@ -216,13 +220,23 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		len = desc->len;
 		pkt = xsk_umem__get_data(rxq->umem->buffer, addr);
 
-		rte_memcpy(rte_pktmbuf_mtod(mbufs[i], void *), pkt, len);
-		rte_pktmbuf_pkt_len(mbufs[i]) = len;
-		rte_pktmbuf_data_len(mbufs[i]) = len;
-		rx_bytes += len;
-		bufs[count++] = mbufs[i];
-
-		rte_pktmbuf_free(addr_to_mbuf(umem, addr));
+		if (rxq->zc) {
+			struct rte_mbuf *mbuf;
+			mbuf = addr_to_mbuf(rxq->umem, addr);
+			rte_pktmbuf_pkt_len(mbuf) = len;
+			rte_pktmbuf_data_len(mbuf) = len;
+			rx_bytes += len;
+			bufs[count++] = mbuf;
+		} else {
+			rte_memcpy(rte_pktmbuf_mtod(mbufs[i], void *),
+					pkt, len);
+			rte_pktmbuf_pkt_len(mbufs[i]) = len;
+			rte_pktmbuf_data_len(mbufs[i]) = len;
+			rx_bytes += len;
+			bufs[count++] = mbufs[i];
+
+			rte_pktmbuf_free(addr_to_mbuf(umem, addr));
+		}
 	}
 
 	xsk_ring_cons__release(rx, rcvd);
@@ -295,22 +309,29 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 					- ETH_AF_XDP_DATA_HEADROOM;
 		desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i);
 		mbuf = bufs[i];
-		if (mbuf->pkt_len <= buf_len) {
-			mbuf_to_tx = rte_pktmbuf_alloc(umem->mb_pool);
-			if (unlikely(mbuf_to_tx == NULL)) {
-				rte_pktmbuf_free(mbuf);
-				continue;
-			}
-			desc->addr = mbuf_to_addr(umem, mbuf_to_tx);
+		if (txq->pair->zc && mbuf->pool == umem->mb_pool) {
+			desc->addr = mbuf_to_addr(umem, mbuf);
 			desc->len = mbuf->pkt_len;
-			pkt = xsk_umem__get_data(umem->buffer,
-						 desc->addr);
-			rte_memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
-			       desc->len);
 			valid++;
 			tx_bytes += mbuf->pkt_len;
+		} else {
+			if (mbuf->pkt_len <= buf_len) {
+				mbuf_to_tx = rte_pktmbuf_alloc(umem->mb_pool);
+				if (unlikely(mbuf_to_tx == NULL)) {
+					rte_pktmbuf_free(mbuf);
+					continue;
+				}
+				desc->addr = mbuf_to_addr(umem, mbuf_to_tx);
+				desc->len = mbuf->pkt_len;
+				pkt = xsk_umem__get_data(umem->buffer,
+							 desc->addr);
+				memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
+				       desc->len);
+				valid++;
+				tx_bytes += mbuf->pkt_len;
+			}
+			rte_pktmbuf_free(mbuf);
 		}
-		rte_pktmbuf_free(mbuf);
 	}
 
 	xsk_ring_prod__submit(&txq->tx, nb_pkts);
@@ -488,7 +509,7 @@ static inline uint64_t get_len(struct rte_mempool *mp)
 	return (uint64_t)(memhdr->len);
 }
 
-static struct xsk_umem_info *xdp_umem_configure(void)
+static struct xsk_umem_info *xdp_umem_configure(struct rte_mempool *mb_pool)
 {
 	struct xsk_umem_info *umem;
 	struct xsk_umem_config usr_config = {
@@ -505,16 +526,23 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 		return NULL;
 	}
 
-	umem->mb_pool = rte_pktmbuf_pool_create_with_flags("af_xdp_mempool",
-			ETH_AF_XDP_NUM_BUFFERS,
-			250, 0,
-			ETH_AF_XDP_FRAME_SIZE -
-			ETH_AF_XDP_MBUF_OVERHEAD,
-			MEMPOOL_F_NO_SPREAD | MEMPOOL_CHUNK_F_PAGE_ALIGN,
-			SOCKET_ID_ANY);
-	if (umem->mb_pool == NULL || umem->mb_pool->nb_mem_chunks != 1) {
-		AF_XDP_LOG(ERR, "Failed to create mempool\n");
-		goto err;
+	if (!mb_pool) {
+		umem->mb_pool = rte_pktmbuf_pool_create_with_flags("af_xdp_mempool",
+				ETH_AF_XDP_NUM_BUFFERS,
+				250, 0,
+				ETH_AF_XDP_FRAME_SIZE -
+				ETH_AF_XDP_MBUF_OVERHEAD,
+				MEMPOOL_F_NO_SPREAD | MEMPOOL_CHUNK_F_PAGE_ALIGN,
+				SOCKET_ID_ANY);
+
+		if (umem->mb_pool == NULL ||
+				umem->mb_pool->nb_mem_chunks != 1) {
+			AF_XDP_LOG(ERR, "Failed to create mempool\n");
+			goto err;
+		}
+	} else {
+		umem->mb_pool = mb_pool;
+		umem->zc = 1;
 	}
 	base_addr = (void *)get_base_addr(umem->mb_pool);
 
@@ -536,16 +564,43 @@ static struct xsk_umem_info *xdp_umem_configure(void)
 	return NULL;
 }
 
+static uint8_t
+check_mempool_zc(struct rte_mempool *mp)
+{
+	RTE_ASSERT(mp);
+
+	/* must be contiguous (a single memory chunk) */
+	if (mp->nb_mem_chunks > 1)
+		return 0;
+
+	/* check header size */
+	if (mp->header_size != RTE_CACHE_LINE_SIZE)
+		return 0;
+
+	/* check base address */
+	if ((uint64_t)get_base_addr(mp) % getpagesize() != 0)
+		return 0;
+
+	/* check chunk size */
+	if ((mp->elt_size + mp->header_size + mp->trailer_size) %
+	    ETH_AF_XDP_FRAME_SIZE != 0)
+		return 0;
+
+	return 1;
+}
+
 static int
 xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
-	      int ring_size)
+	      int ring_size, struct rte_mempool *mb_pool)
 {
 	struct xsk_socket_config cfg;
 	struct pkt_tx_queue *txq = rxq->pair;
+	struct rte_mempool *mp;
 	int ret = 0;
 	int reserve_size;
 
-	rxq->umem = xdp_umem_configure();
+	mp = check_mempool_zc(mb_pool) ? mb_pool : NULL;
+	rxq->umem = xdp_umem_configure(mp);
 	if (rxq->umem == NULL)
 		return -ENOMEM;
 
@@ -622,7 +677,7 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
 
 	rxq->mb_pool = mb_pool;
 
-	if (xsk_configure(internals, rxq, nb_rx_desc)) {
+	if (xsk_configure(internals, rxq, nb_rx_desc, mb_pool)) {
 		AF_XDP_LOG(ERR, "Failed to configure xdp socket\n");
 		ret = -EINVAL;
 		goto err;
@@ -630,6 +685,13 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
 
 	internals->umem = rxq->umem;
 
+	if (mb_pool == internals->umem->mb_pool)
+		rxq->zc = internals->umem->zc;
+
+	if (rxq->zc)
+		AF_XDP_LOG(INFO,
+			"zero copy enabled on rx queue %d\n", rx_queue_id);
+
 	dev->data->rx_queues[rx_queue_id] = rxq;
 	return 0;
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* Re: [PATCH v7 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-27  9:00   ` [PATCH v7 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
@ 2019-03-28 17:51     ` Ferruh Yigit
  2019-03-28 18:52       ` Luca Boccassi
  2019-03-29  2:05       ` Ye Xiaolong
  0 siblings, 2 replies; 214+ messages in thread
From: Ferruh Yigit @ 2019-03-28 17:51 UTC (permalink / raw)
  To: Xiaolong Ye, dev, David Marchand, Andrew Rybchenko
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Stephen Hemminger, Luca Boccassi, Bruce Richardson,
	Ananyev Konstantin

On 3/27/2019 9:00 AM, Xiaolong Ye wrote:
> Add a new PMD driver for AF_XDP, which is a proposed faster version of
> the AF_PACKET interface in Linux. For more info about AF_XDP, please
> refer to [1] [2].
> 
> This is the vanilla version PMD which just uses a raw buffer registered as
> the umem.
> 
> [1] https://fosdem.org/2018/schedule/event/af_xdp/
> [2] https://lwn.net/Articles/745934/
> 
> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>

Hi Xiaolong,

Mostly looks good; there are only a few comments on minor issues, can you
please check them?

<...>

> @@ -474,6 +474,12 @@ M: John W. Linville <linville@tuxdriver.com>
>  F: drivers/net/af_packet/
>  F: doc/guides/nics/features/afpacket.ini
>  
> +Linux AF_XDP
> +M: Xiaolong Ye <xiaolong.ye@intel.com>
> +M: Qi Zhang <qi.z.zhang@intel.com>
> +F: drivers/net/af_xdp/
> +F: doc/guides/nics/features/af_xdp.rst

Can you please add .ini file too?

<...>

> +static const char * const valid_arguments[] = {
> +	ETH_AF_XDP_IFACE_ARG,
> +	ETH_AF_XDP_QUEUE_IDX_ARG,
> +	NULL
> +};

Minor issue, but can you please keep close the .*ARG defines and this structure,
either move this up or move defines down just above this struct?

> +
> +static struct rte_eth_link pmd_link = {
> +	.link_speed = ETH_SPEED_NUM_10G,
> +	.link_duplex = ETH_LINK_FULL_DUPLEX,
> +	.link_status = ETH_LINK_DOWN,
> +	.link_autoneg = ETH_LINK_AUTONEG
> +};

Can this variable be const?

<...>

> +static void
> +eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
> +{
> +	struct pmd_internals *internals = dev->data->dev_private;
> +
> +	dev_info->if_index = internals->if_index;
> +	dev_info->max_mac_addrs = 1;
> +	dev_info->max_rx_pktlen = ETH_FRAME_LEN;
> +	dev_info->max_rx_queues = 1;
> +	dev_info->max_tx_queues = 1;

What do you think about documenting the limitation that only a single queue is supported?

<...>

> +static void remove_xdp_program(struct pmd_internals *internals)
> +{

According to the coding convention it should be:

static void
remove_xdp_program(struct pmd_internals *internals)
{

There is a mixture of usages; can you please update them according to the convention?

<...>

> +	if (parse_parameters(kvlist, if_name, &xsk_queue_idx) < 0) {
> +		AF_XDP_LOG(ERR, "Invalid kvargs value\n");
> +		return -EINVAL;
> +	}

At this point we don't know if the user provided an "if_name" value; we need a
way to confirm the mandatory devargs were provided before continuing.

<...>

> +RTE_PMD_REGISTER_VDEV(net_af_xdp, pmd_af_xdp_drv);
> +RTE_PMD_REGISTER_PARAM_STRING(eth_af_xdp,

s/eth_af_xdp/net_af_xdp

> +			      "iface=<string> "
> +			      "queue=<int> ");
> +
> +RTE_INIT(af_xdp_init_log)
> +{
> +	af_xdp_logtype = rte_log_register("pmd.net.xdp");

why not "pmd.net.af_xdp"?

<...>

> @@ -143,6 +143,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_DPAA2_MEMPOOL)  += -lrte_mempool_dpaa2
>  endif
>  
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
> +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     += -lrte_pmd_af_xdp -lelf -lbpf

Is "-lelf" still required?

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v7 5/5] net/af_xdp: enable zero copy
  2019-03-27  9:00   ` [PATCH v7 5/5] net/af_xdp: enable zero copy Xiaolong Ye
@ 2019-03-28 18:44     ` Ferruh Yigit
  2019-03-29  1:53       ` Ye Xiaolong
  0 siblings, 1 reply; 214+ messages in thread
From: Ferruh Yigit @ 2019-03-28 18:44 UTC (permalink / raw)
  To: Xiaolong Ye, dev, David Marchand, Andrew Rybchenko
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Stephen Hemminger, Luca Boccassi, Bruce Richardson,
	Ananyev Konstantin

On 3/27/2019 9:00 AM, Xiaolong Ye wrote:
> Check whether the external mempool (from rx_queue_setup) is suitable for
> af_xdp; if it is, it will be registered to the af_xdp socket directly and
> there will be no packet data copy on Rx and Tx.
> 
> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 130 ++++++++++++++++++++--------
>  1 file changed, 96 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index a1fda9212..c6ade4c94 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -67,6 +67,7 @@ struct xsk_umem_info {
>  	struct xsk_umem *umem;
>  	struct rte_mempool *mb_pool;
>  	void *buffer;
> +	uint8_t zc;
>  };
>  
>  struct rx_stats {
> @@ -85,6 +86,7 @@ struct pkt_rx_queue {
>  
>  	struct pkt_tx_queue *pair;
>  	uint16_t queue_idx;
> +	uint8_t zc;
>  };
>  
>  struct tx_stats {

<...>

> @@ -630,6 +685,13 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
>  
>  	internals->umem = rxq->umem;
>  
> +	if (mb_pool == internals->umem->mb_pool)
> +		rxq->zc = internals->umem->zc;
> +
> +	if (rxq->zc)
> +		AF_XDP_LOG(INFO,
> +			"zero copy enabled on rx queue %d\n", rx_queue_id);
> +

The "zero copy" implemented in this patch, also the variable 'zc', is from
'af_xdp' umem to mbuf data via versa copy, right?
There is also another "zero copy" support in af_xdp, device to buffers...
Do you think can these be confused with each other, should we have another log
message and variable name for this one?
Indeed I can't think of a good name, but something like, "pmd/driver zero copy"
& 'pmd_zc' ??

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v7 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-28 17:51     ` Ferruh Yigit
@ 2019-03-28 18:52       ` Luca Boccassi
  2019-04-02 19:55         ` Ferruh Yigit
  2019-03-29  2:05       ` Ye Xiaolong
  1 sibling, 1 reply; 214+ messages in thread
From: Luca Boccassi @ 2019-03-28 18:52 UTC (permalink / raw)
  To: Ferruh Yigit, Xiaolong Ye, dev, David Marchand, Andrew Rybchenko
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Stephen Hemminger, Bruce Richardson, Ananyev Konstantin

On Thu, 2019-03-28 at 17:51 +0000, Ferruh Yigit wrote:
> > @@ -143,6 +143,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_DPAA2_MEMPOOL)  +=
> > -lrte_mempool_dpaa2
> >   endif
> >   
> >   _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  +=
> > -lrte_pmd_af_packet
> > +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     += -lrte_pmd_af_xdp
> > -lelf -lbpf
> 
> Is "-lelf" still required?

This was fixed upstream in the bpf tree by Björn:

https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=89dedaef49d36adc2bb5e7e4c38b52fa3013c7c8

-- 
Kind regards,
Luca Boccassi

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v7 2/5] lib/mbuf: introduce helper to create mempool with flags
  2019-03-27  9:00   ` [PATCH v7 2/5] lib/mbuf: introduce helper to create mempool with flags Xiaolong Ye
@ 2019-03-28 19:30     ` Ferruh Yigit
  0 siblings, 0 replies; 214+ messages in thread
From: Ferruh Yigit @ 2019-03-28 19:30 UTC (permalink / raw)
  To: Xiaolong Ye, dev, David Marchand, Andrew Rybchenko, Olivier Matz
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Stephen Hemminger, Luca Boccassi, Bruce Richardson,
	Ananyev Konstantin

On 3/27/2019 9:00 AM, Xiaolong Ye wrote:
> This allows applications to create an mbuf mempool with specific flags
> such as MEMPOOL_F_NO_SPREAD if they want fixed-size memory objects.
> 
> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>

Hi Olivier,

Can you please check this patch?
I would like to get the PMD for rc1 but it has a dependency to this mbuf change.

Thanks,
ferruh

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v7 3/5] lib/mempool: allow page size aligned mempool
  2019-03-27  9:00   ` [PATCH v7 3/5] lib/mempool: allow page size aligned mempool Xiaolong Ye
@ 2019-03-28 19:34     ` Ferruh Yigit
  2019-03-29 10:37     ` Andrew Rybchenko
  1 sibling, 0 replies; 214+ messages in thread
From: Ferruh Yigit @ 2019-03-28 19:34 UTC (permalink / raw)
  To: Andrew Rybchenko, Olivier MATZ
  Cc: Xiaolong Ye, dev, David Marchand, Qi Zhang, Karlsson Magnus,
	Topel Bjorn, Maxime Coquelin, Stephen Hemminger, Luca Boccassi,
	Bruce Richardson, Ananyev Konstantin

On 3/27/2019 9:00 AM, Xiaolong Ye wrote:
> Allow creating a mempool with a page-size-aligned base address.
> 
> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>


Hi Andrew, Olivier,

Can you please check this patch,
It is adding a new mempool flag which is dependency for the pmd.

Thanks,
ferruh

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v7 5/5] net/af_xdp: enable zero copy
  2019-03-28 18:44     ` Ferruh Yigit
@ 2019-03-29  1:53       ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-29  1:53 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: dev, David Marchand, Andrew Rybchenko, Qi Zhang, Karlsson Magnus,
	Topel Bjorn, Maxime Coquelin, Stephen Hemminger, Luca Boccassi,
	Bruce Richardson, Ananyev Konstantin

On 03/28, Ferruh Yigit wrote:
>On 3/27/2019 9:00 AM, Xiaolong Ye wrote:
>> Check whether the external mempool (from rx_queue_setup) is suitable for
>> af_xdp; if it is, it will be registered to the af_xdp socket directly and
>> there will be no packet data copy on Rx and Tx.
>> 
>> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
>> ---
>>  drivers/net/af_xdp/rte_eth_af_xdp.c | 130 ++++++++++++++++++++--------
>>  1 file changed, 96 insertions(+), 34 deletions(-)
>> 
>> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
>> index a1fda9212..c6ade4c94 100644
>> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
>> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
>> @@ -67,6 +67,7 @@ struct xsk_umem_info {
>>  	struct xsk_umem *umem;
>>  	struct rte_mempool *mb_pool;
>>  	void *buffer;
>> +	uint8_t zc;
>>  };
>>  
>>  struct rx_stats {
>> @@ -85,6 +86,7 @@ struct pkt_rx_queue {
>>  
>>  	struct pkt_tx_queue *pair;
>>  	uint16_t queue_idx;
>> +	uint8_t zc;
>>  };
>>  
>>  struct tx_stats {
>
><...>
>
>> @@ -630,6 +685,13 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
>>  
>>  	internals->umem = rxq->umem;
>>  
>> +	if (mb_pool == internals->umem->mb_pool)
>> +		rxq->zc = internals->umem->zc;
>> +
>> +	if (rxq->zc)
>> +		AF_XDP_LOG(INFO,
>> +			"zero copy enabled on rx queue %d\n", rx_queue_id);
>> +
>
>The "zero copy" implemented in this patch, also the variable 'zc', is from
>'af_xdp' umem to mbuf data via versa copy, right?
>There is also another "zero copy" support in af_xdp, device to buffers...
>Do you think can these be confused with each other, should we have another log
>message and variable name for this one?
>Indeed I can't think of a good name, but something like, "pmd/driver zero copy"
>& 'pmd_zc' ??

That's a good suggestion, will adopt it.

Thanks,
Xiaolong

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v7 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-28 17:51     ` Ferruh Yigit
  2019-03-28 18:52       ` Luca Boccassi
@ 2019-03-29  2:05       ` Ye Xiaolong
  2019-03-29  8:10         ` Ferruh Yigit
  1 sibling, 1 reply; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-29  2:05 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: dev, David Marchand, Andrew Rybchenko, Qi Zhang, Karlsson Magnus,
	Topel Bjorn, Maxime Coquelin, Stephen Hemminger, Luca Boccassi,
	Bruce Richardson, Ananyev Konstantin

Hi, Ferruh

Thanks for the comments.

On 03/28, Ferruh Yigit wrote:
>On 3/27/2019 9:00 AM, Xiaolong Ye wrote:
>> Add a new PMD driver for AF_XDP, which is a proposed faster version of
>> the AF_PACKET interface in Linux. For more info about AF_XDP, please
>> refer to [1] [2].
>> 
>> This is the vanilla version PMD which just uses a raw buffer registered as
>> the umem.
>> 
>> [1] https://fosdem.org/2018/schedule/event/af_xdp/
>> [2] https://lwn.net/Articles/745934/
>> 
>> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
>
>Hi Xiaolong,
>
>Mostly looks good; there are only a few comments on minor issues, can you
>please check them?
>
><...>
>
>> @@ -474,6 +474,12 @@ M: John W. Linville <linville@tuxdriver.com>
>>  F: drivers/net/af_packet/
>>  F: doc/guides/nics/features/afpacket.ini
>>  
>> +Linux AF_XDP
>> +M: Xiaolong Ye <xiaolong.ye@intel.com>
>> +M: Qi Zhang <qi.z.zhang@intel.com>
>> +F: drivers/net/af_xdp/
>> +F: doc/guides/nics/features/af_xdp.rst
>
>Can you please add .ini file too?

Will add.

>
><...>
>
>> +static const char * const valid_arguments[] = {
>> +	ETH_AF_XDP_IFACE_ARG,
>> +	ETH_AF_XDP_QUEUE_IDX_ARG,
>> +	NULL
>> +};
>
>Minor issue, but can you please keep close the .*ARG defines and this structure,
>either move this up or move defines down just above this struct?

Got it, I'll put them together.

>
>> +
>> +static struct rte_eth_link pmd_link = {
>> +	.link_speed = ETH_SPEED_NUM_10G,
>> +	.link_duplex = ETH_LINK_FULL_DUPLEX,
>> +	.link_status = ETH_LINK_DOWN,
>> +	.link_autoneg = ETH_LINK_AUTONEG
>> +};
>
>Can this variable be const?

Will do.

>
><...>
>
>> +static void
>> +eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
>> +{
>> +	struct pmd_internals *internals = dev->data->dev_private;
>> +
>> +	dev_info->if_index = internals->if_index;
>> +	dev_info->max_mac_addrs = 1;
>> +	dev_info->max_rx_pktlen = ETH_FRAME_LEN;
>> +	dev_info->max_rx_queues = 1;
>> +	dev_info->max_tx_queues = 1;
>
>What do you think about documenting the limitation that only a single queue is supported?

I've mentioned the single-queue limitation in af_xdp.rst; or do you mean we
also need to note it here?

>
><...>
>
>> +static void remove_xdp_program(struct pmd_internals *internals)
>> +{
>
>According to the coding convention it should be:
>
>static void
>remove_xdp_program(struct pmd_internals *internals)
>{
>
>There is a mixture of usages; can you please update them according to the convention?
>
><...>

Will do.

>
>> +	if (parse_parameters(kvlist, if_name, &xsk_queue_idx) < 0) {
>> +		AF_XDP_LOG(ERR, "Invalid kvargs value\n");
>> +		return -EINVAL;
>> +	}
>
>At this point we don't know if the user provided an "if_name" value; we need a
>way to confirm the mandatory devargs were provided before continuing.
>
><...>

Got it, will do.

>
>> +RTE_PMD_REGISTER_VDEV(net_af_xdp, pmd_af_xdp_drv);
>> +RTE_PMD_REGISTER_PARAM_STRING(eth_af_xdp,
>
>s/eth_af_xdp/net_af_xdp

Got it.

>
>> +			      "iface=<string> "
>> +			      "queue=<int> ");
>> +
>> +RTE_INIT(af_xdp_init_log)
>> +{
>> +	af_xdp_logtype = rte_log_register("pmd.net.xdp");
>
>why not "pmd.net.af_xdp"?

Will do.

>
><...>
>
>> @@ -143,6 +143,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_DPAA2_MEMPOOL)  += -lrte_mempool_dpaa2
>>  endif
>>  
>>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
>> +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     += -lrte_pmd_af_xdp -lelf -lbpf
>
>Is "-lelf" still required?

Will remove it since Bjorn hasn't sent the fix patch to the kernel community.

Thanks,
Xiaolong

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v7 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-29  2:05       ` Ye Xiaolong
@ 2019-03-29  8:10         ` Ferruh Yigit
  0 siblings, 0 replies; 214+ messages in thread
From: Ferruh Yigit @ 2019-03-29  8:10 UTC (permalink / raw)
  To: Ye Xiaolong
  Cc: dev, David Marchand, Andrew Rybchenko, Qi Zhang, Karlsson Magnus,
	Topel Bjorn, Maxime Coquelin, Stephen Hemminger, Luca Boccassi,
	Bruce Richardson, Ananyev Konstantin

On 3/29/2019 2:05 AM, Ye Xiaolong wrote:
>> <...>
>>
>>> +static void
>>> +eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
>>> +{
>>> +	struct pmd_internals *internals = dev->data->dev_private;
>>> +
>>> +	dev_info->if_index = internals->if_index;
>>> +	dev_info->max_mac_addrs = 1;
>>> +	dev_info->max_rx_pktlen = ETH_FRAME_LEN;
>>> +	dev_info->max_rx_queues = 1;
>>> +	dev_info->max_tx_queues = 1;
>> What do you think about documenting the limitation that only a single queue is supported?
> I've mentioned the single-queue limitation in af_xdp.rst; or do you mean we
> also need to note it here?
> 

I missed it in af_xdp.rst; if it is documented there, I think it is OK.

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v7 3/5] lib/mempool: allow page size aligned mempool
  2019-03-27  9:00   ` [PATCH v7 3/5] lib/mempool: allow page size aligned mempool Xiaolong Ye
  2019-03-28 19:34     ` Ferruh Yigit
@ 2019-03-29 10:37     ` Andrew Rybchenko
  2019-03-29 17:42       ` Olivier Matz
  1 sibling, 1 reply; 214+ messages in thread
From: Andrew Rybchenko @ 2019-03-29 10:37 UTC (permalink / raw)
  To: Xiaolong Ye, dev, David Marchand
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Stephen Hemminger, Ferruh Yigit, Luca Boccassi, Bruce Richardson,
	Ananyev Konstantin

On 3/27/19 12:00 PM, Xiaolong Ye wrote:
> Allow creating a mempool with a page-size-aligned base address.
>
> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
> ---
>   lib/librte_mempool/rte_mempool.c | 3 +++
>   lib/librte_mempool/rte_mempool.h | 1 +
>   2 files changed, 4 insertions(+)
>
> diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
> index 683b216f9..cfbb49ea5 100644
> --- a/lib/librte_mempool/rte_mempool.c
> +++ b/lib/librte_mempool/rte_mempool.c
> @@ -543,6 +543,9 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>   		if (try_contig)
>   			flags |= RTE_MEMZONE_IOVA_CONTIG;
>   
> +		if (mp->flags & MEMPOOL_CHUNK_F_PAGE_ALIGN)
> +			align = RTE_MAX(align, (size_t)getpagesize());
> +
>   		mz = rte_memzone_reserve_aligned(mz_name, mem_size,
>   				mp->socket_id, flags, align);
>   
> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
> index 7c9cd9a2f..47729f7c9 100644
> --- a/lib/librte_mempool/rte_mempool.h
> +++ b/lib/librte_mempool/rte_mempool.h
> @@ -264,6 +264,7 @@ struct rte_mempool {
>   #define MEMPOOL_F_POOL_CREATED   0x0010 /**< Internal: pool is created. */
>   #define MEMPOOL_F_NO_IOVA_CONTIG 0x0020 /**< Don't need IOVA contiguous objs. */
>   #define MEMPOOL_F_NO_PHYS_CONTIG MEMPOOL_F_NO_IOVA_CONTIG /* deprecated */
> +#define MEMPOOL_CHUNK_F_PAGE_ALIGN     0x0040 /**< Chunk's base address is page aligned */

Now the define name better explains what happens. I would name it
MEMPOOL_F_CHUNK_PAGE_ALIGN to have the MEMPOOL_F_ prefix for all flags,
but that is minor. More important is what is behind it, and I have doubts
that it is the right way to go.

I have already asked what the final goal is (but agree that the question was
unclear).
Personally I doubt that the final goal is just having the chunk page aligned.

It looks like the patch makes an assumption about how the chunk is sliced
into objects/elements, and that the property of the chunk address allows some
goal to be achieved with the elements' start addresses. It looks very fragile
and easily breakable with other flags (or no flags).

Also, the prefix should be just "mempool:", not "lib/mempool".

Andrew.

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v7 3/5] lib/mempool: allow page size aligned mempool
  2019-03-29 10:37     ` Andrew Rybchenko
@ 2019-03-29 17:42       ` Olivier Matz
  0 siblings, 0 replies; 214+ messages in thread
From: Olivier Matz @ 2019-03-29 17:42 UTC (permalink / raw)
  To: Andrew Rybchenko
  Cc: Xiaolong Ye, dev, David Marchand, Qi Zhang, Karlsson Magnus,
	Topel Bjorn, Maxime Coquelin, Stephen Hemminger, Ferruh Yigit,
	Luca Boccassi, Bruce Richardson, Ananyev Konstantin

Hi,

On Fri, Mar 29, 2019 at 01:37:17PM +0300, Andrew Rybchenko wrote:
> On 3/27/19 12:00 PM, Xiaolong Ye wrote:
> > Allow creating a mempool with a page-size-aligned base address.
> > 
> > Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> > Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
> > ---
> >   lib/librte_mempool/rte_mempool.c | 3 +++
> >   lib/librte_mempool/rte_mempool.h | 1 +
> >   2 files changed, 4 insertions(+)
> > 
> > diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
> > index 683b216f9..cfbb49ea5 100644
> > --- a/lib/librte_mempool/rte_mempool.c
> > +++ b/lib/librte_mempool/rte_mempool.c
> > @@ -543,6 +543,9 @@ rte_mempool_populate_default(struct rte_mempool *mp)
> >   		if (try_contig)
> >   			flags |= RTE_MEMZONE_IOVA_CONTIG;
> > +		if (mp->flags & MEMPOOL_CHUNK_F_PAGE_ALIGN)
> > +			align = RTE_MAX(align, (size_t)getpagesize());
> > +
> >   		mz = rte_memzone_reserve_aligned(mz_name, mem_size,
> >   				mp->socket_id, flags, align);
> > diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
> > index 7c9cd9a2f..47729f7c9 100644
> > --- a/lib/librte_mempool/rte_mempool.h
> > +++ b/lib/librte_mempool/rte_mempool.h
> > @@ -264,6 +264,7 @@ struct rte_mempool {
> >   #define MEMPOOL_F_POOL_CREATED   0x0010 /**< Internal: pool is created. */
> >   #define MEMPOOL_F_NO_IOVA_CONTIG 0x0020 /**< Don't need IOVA contiguous objs. */
> >   #define MEMPOOL_F_NO_PHYS_CONTIG MEMPOOL_F_NO_IOVA_CONTIG /* deprecated */
> > +#define MEMPOOL_CHUNK_F_PAGE_ALIGN     0x0040 /**< Chunk's base address is page aligned */
> 
> Now the define name better explains what happens. I would name it
> MEMPOOL_F_CHUNK_PAGE_ALIGN to have the MEMPOOL_F_ prefix for all flags,
> but that is minor. More important is what is behind it, and I have doubts
> that it is the right way to go.
> 
> I have already asked what the final goal is (but agree that the question
> was unclear).
> Personally I doubt that the final goal is just having the chunk page aligned.
> 
> It looks like the patch makes an assumption about how the chunk is sliced
> into objects/elements, and that the property of the chunk address allows
> some goal to be achieved with the elements' start addresses. It looks very
> fragile and easily breakable with other flags (or no flags).
> 
> Also, the prefix should be just "mempool:", not "lib/mempool".

+1
I agree with Andrew. Please see my comment in patch 4.


Regards,
Olivier

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v6 4/5] net/af_xdp: use mbuf mempool for buffer management
  2019-03-26 12:20   ` [PATCH v6 4/5] net/af_xdp: use mbuf mempool for buffer management Xiaolong Ye
@ 2019-03-29 17:42     ` Olivier Matz
  2019-03-31 12:38       ` Ye Xiaolong
  0 siblings, 1 reply; 214+ messages in thread
From: Olivier Matz @ 2019-03-29 17:42 UTC (permalink / raw)
  To: Xiaolong Ye
  Cc: dev, David Marchand, Andrew Rybchenko, Qi Zhang, Karlsson Magnus,
	Topel Bjorn, Maxime Coquelin, Stephen Hemminger, Ferruh Yigit,
	Luca Boccassi, Bruce Richardson, Ananyev Konstantin

Hi Xiaolong,

On Tue, Mar 26, 2019 at 08:20:28PM +0800, Xiaolong Ye wrote:
> Now the af_xdp registered memory buffer is managed by rte_mempool. An mbuf
> allocated from rte_mempool can be converted to an xdp_desc address and
> vice versa.
> 
> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 117 +++++++++++++++++-----------
>  1 file changed, 72 insertions(+), 45 deletions(-)
> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 47a496ed7..a1fda9212 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -48,7 +48,11 @@ static int af_xdp_logtype;
>  
>  #define ETH_AF_XDP_FRAME_SIZE		XSK_UMEM__DEFAULT_FRAME_SIZE
>  #define ETH_AF_XDP_NUM_BUFFERS		4096
> -#define ETH_AF_XDP_DATA_HEADROOM	0
> +/* mempool hdrobj size (64 bytes) + sizeof(struct rte_mbuf) (128 bytes) */
> +#define ETH_AF_XDP_MBUF_OVERHEAD	192
> +/* data start from offset 320 (192 + 128) bytes */
> +#define ETH_AF_XDP_DATA_HEADROOM				\
> +	(ETH_AF_XDP_MBUF_OVERHEAD + RTE_PKTMBUF_HEADROOM)

Having these constants looks quite dangerous to me. It imposes the size
of the mbuf, and the mempool header size. It would at least require
compilation checks.
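
For instance (just a sketch), the 128-byte mbuf assumption could be
asserted at build time:

	/* fails to compile if struct rte_mbuf is not 2 cache lines */
	RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) !=
			 RTE_CACHE_LINE_MIN_SIZE * 2);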

[...]

> +	umem->mb_pool = rte_pktmbuf_pool_create_with_flags("af_xdp_mempool",
> +			ETH_AF_XDP_NUM_BUFFERS,
> +			250, 0,
> +			ETH_AF_XDP_FRAME_SIZE -
> +			ETH_AF_XDP_MBUF_OVERHEAD,
> +			MEMPOOL_F_NO_SPREAD | MEMPOOL_CHUNK_F_PAGE_ALIGN,
> +			SOCKET_ID_ANY);
> +	if (umem->mb_pool == NULL || umem->mb_pool->nb_mem_chunks != 1) {
> +		AF_XDP_LOG(ERR, "Failed to create mempool\n");
>  		goto err;
>  	}
> +	base_addr = (void *)get_base_addr(umem->mb_pool);

Creating a mempool in the pmd like this does not look good to me for
several reasons:
- the user application creates its mempool with a specific private
  area in its mbufs. Here there is no private area, so it will break
  applications doing this.
- in patch 3 (mempool), you ensure that the chunk starts at a
  page-aligned address, and you expect that given the other flags and
  the constants at the top of the file, the data will be aligned. In
  my opinion it is not ideal.
- the user application may create a large number of mbufs, for instance
  if the application manages large reassembly queues, or tcp sockets.
  Here the driver limits the number of mbufs to 4k per rx queue.
- the socket_id is any, so it won't be efficient on numa architectures.

May I suggest another idea?

Allocate the xsk_umem almost like in patch 1, but using rte_memzone
allocation instead of posix_memalign() (and it will be faster, because
it will use hugepages). And do not allocate any mempool in the driver.

When you receive a packet in the xsk_umem, allocate a new mbuf from
the standard pool. Then, use rte_pktmbuf_attach_extbuf() to attach the
xsk memory to the mbuf. You will have to register a callback to return
the xsk memory when the mbuf is transmitted or freed.
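
Roughly something like this (an untested sketch, reusing the buf_ring
from your first patch; error handling omitted):

	static void
	umem_buf_release(void *addr, void *opaque)
	{
		struct xsk_umem_info *umem = opaque;

		/* last mbuf reference gone: recycle the xsk frame */
		rte_ring_enqueue(umem->buf_ring, addr);
	}

	/* in the rx path, instead of rte_memcpy(): */
	uint16_t buf_len = ETH_AF_XDP_FRAME_SIZE;
	struct rte_mbuf_ext_shared_info *shinfo;

	shinfo = rte_pktmbuf_ext_shinfo_init_helper(pkt, &buf_len,
			umem_buf_release, umem);
	rte_pktmbuf_attach_extbuf(mbufs[i], pkt, rte_mem_virt2iova(pkt),
			buf_len, shinfo);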

This is, by the way, something I don't understand in your current
implementation: what happens if an mbuf is received in the af_xdp driver,
and freed by the application? How is the xsk buffer returned?

Using rte_pktmbuf_attach_extbuf() would remove changes in mbuf and
mempool, at the price of (maybe) decreasing the performance. But I think
there are some places where it can be optimized.

I understand my feedback comes late -- as usual :( -- but if you are in
a hurry for RC1, maybe we can consider to put the 1st patch only, and
add the zero-copy mode in a second patch later. What do you think?

Regards,
Olivier

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v6 4/5] net/af_xdp: use mbuf mempool for buffer management
  2019-03-29 17:42     ` Olivier Matz
@ 2019-03-31 12:38       ` Ye Xiaolong
  2019-04-01  5:47         ` Zhang, Qi Z
  0 siblings, 1 reply; 214+ messages in thread
From: Ye Xiaolong @ 2019-03-31 12:38 UTC (permalink / raw)
  To: Olivier Matz
  Cc: dev, David Marchand, Andrew Rybchenko, Qi Zhang, Karlsson Magnus,
	Topel Bjorn, Maxime Coquelin, Stephen Hemminger, Ferruh Yigit,
	Luca Boccassi, Bruce Richardson, Ananyev Konstantin

Hi, Olivier

Thanks for the comments.

On 03/29, Olivier Matz wrote:
>Hi Xiaolong,
>
>On Tue, Mar 26, 2019 at 08:20:28PM +0800, Xiaolong Ye wrote:
>> Now the af_xdp registered memory buffer is managed by rte_mempool. An mbuf
>> allocated from rte_mempool can be converted to an xdp_desc address and
>> vice versa.
>> 
>> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
>> ---
>>  drivers/net/af_xdp/rte_eth_af_xdp.c | 117 +++++++++++++++++-----------
>>  1 file changed, 72 insertions(+), 45 deletions(-)
>> 
>> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
>> index 47a496ed7..a1fda9212 100644
>> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
>> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
>> @@ -48,7 +48,11 @@ static int af_xdp_logtype;
>>  
>>  #define ETH_AF_XDP_FRAME_SIZE		XSK_UMEM__DEFAULT_FRAME_SIZE
>>  #define ETH_AF_XDP_NUM_BUFFERS		4096
>> -#define ETH_AF_XDP_DATA_HEADROOM	0
>> +/* mempool hdrobj size (64 bytes) + sizeof(struct rte_mbuf) (128 bytes) */
>> +#define ETH_AF_XDP_MBUF_OVERHEAD	192
>> +/* data start from offset 320 (192 + 128) bytes */
>> +#define ETH_AF_XDP_DATA_HEADROOM				\
>> +	(ETH_AF_XDP_MBUF_OVERHEAD + RTE_PKTMBUF_HEADROOM)
>
>Having these constants looks quite dangerous to me. It imposes the size
>of the mbuf, and the mempool header size. It would at least require
>compilation checks.
>
>[...]
>
>> +	umem->mb_pool = rte_pktmbuf_pool_create_with_flags("af_xdp_mempool",
>> +			ETH_AF_XDP_NUM_BUFFERS,
>> +			250, 0,
>> +			ETH_AF_XDP_FRAME_SIZE -
>> +			ETH_AF_XDP_MBUF_OVERHEAD,
>> +			MEMPOOL_F_NO_SPREAD | MEMPOOL_CHUNK_F_PAGE_ALIGN,
>> +			SOCKET_ID_ANY);
>> +	if (umem->mb_pool == NULL || umem->mb_pool->nb_mem_chunks != 1) {
>> +		AF_XDP_LOG(ERR, "Failed to create mempool\n");
>>  		goto err;
>>  	}
>> +	base_addr = (void *)get_base_addr(umem->mb_pool);
>
>Creating a mempool in the pmd like this does not look good to me for
>several reasons:
>- the user application creates its mempool with a specific private
>  area in its mbufs. Here there is no private area, so it will break
>  applications doing this.
>- in patch 3 (mempool), you ensure that the chunk starts at a
>  page-aligned address, and you expect that given the other flags and
>  the constants at the top of the file, the data will be aligned. In
>  my opinion it is not ideal.
>- the user application may create a large number of mbufs, for instance
>  if the application manages large reassembly queues, or tcp sockets.
>  Here the driver limits the number of mbufs to 4k per rx queue.
>- the socket_id is any, so it won't be efficient on numa architectures.

Our mbuf/mempool changes regarding zero copy do have limitations.

>
>May I suggest another idea?
>
>Allocate the xsk_umem almost like in patch 1, but using rte_memzone
>allocation instead of posix_memalign() (and it will be faster, because
>it will use hugepages). And do not allocate any mempool in the driver.
>

rte_memzone_reserve_aligned is better than posix_memalign; I'll use it in my
first patch.

>When you receive a packet in the xsk_umem, allocate a new mbuf from
>the standard pool. Then, use rte_pktmbuf_attach_extbuf() to attach the
>xsk memory to the mbuf. You will have to register a callback to return
>the xsk memory when the mbuf is transmitted or freed.

I'll try to investigate how to implement it.

>
>This is, by the way, something I don't understand in your current
>implementation: what happens if an mbuf is received in the af_xdp driver,
>and freed by the application? How is the xsk buffer returned?

It is coordinated by the fill ring. The fill ring is used by the application
(user space) to send down addrs for the kernel to fill in with Rx packet data.
So on the free side, we just return the mbuf to the mempool, and each time in
rx_pkt_burst we allocate new mbufs and submit the corresponding addrs to the
fill ring; that's how we return the xsk buffers to the kernel.
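
In short, the refill path looks like this (condensed from
reserve_fill_queue() in the driver; error handling omitted):

	xsk_ring_prod__reserve(&umem->fq, n, &idx);
	for (i = 0; i < n; i++) {
		mbuf = rte_pktmbuf_alloc(umem->mb_pool);
		/* hand the frame backing this mbuf to the kernel */
		*xsk_ring_prod__fill_addr(&umem->fq, idx++) =
			mbuf_to_addr(umem, mbuf);
	}
	xsk_ring_prod__submit(&umem->fq, i);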

>
>Using rte_pktmbuf_attach_extbuf() would remove changes in mbuf and
>mempool, at the price of (maybe) decreasing the performance. But I think
>there are some places where it can be optimized.
>
>I understand my feedback comes late -- as usual :( -- but if you are in

Sorry for not Cc'ing you on my patch set.

>a hurry for RC1, maybe we can consider putting in only the 1st patch, and
>add the zero-copy mode in a second patch later. What do you think?

Sounds like a sensible plan; I'll try the external mbuf buffer scheme first.


Thanks,
Xiaolong

>
>Regards,
>Olivier
>
>

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v6 4/5] net/af_xdp: use mbuf mempool for buffer management
  2019-03-31 12:38       ` Ye Xiaolong
@ 2019-04-01  5:47         ` Zhang, Qi Z
  0 siblings, 0 replies; 214+ messages in thread
From: Zhang, Qi Z @ 2019-04-01  5:47 UTC (permalink / raw)
  To: Ye, Xiaolong, Olivier Matz
  Cc: dev, David Marchand, Andrew Rybchenko, Karlsson, Magnus, Topel,
	Bjorn, Maxime Coquelin, Stephen Hemminger, Yigit, Ferruh,
	Luca Boccassi, Richardson, Bruce, Ananyev, Konstantin



> -----Original Message-----
> From: Ye, Xiaolong
> Sent: Sunday, March 31, 2019 8:38 PM
> To: Olivier Matz <olivier.matz@6wind.com>
> Cc: dev@dpdk.org; David Marchand <david.marchand@redhat.com>; Andrew
> Rybchenko <arybchenko@solarflare.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> Karlsson, Magnus <magnus.karlsson@intel.com>; Topel, Bjorn
> <bjorn.topel@intel.com>; Maxime Coquelin <maxime.coquelin@redhat.com>;
> Stephen Hemminger <stephen@networkplumber.org>; Yigit, Ferruh
> <ferruh.yigit@intel.com>; Luca Boccassi <bluca@debian.org>; Richardson, Bruce
> <bruce.richardson@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v6 4/5] net/af_xdp: use mbuf mempool for
> buffer management
> 
> Hi, Olivier
> 
> Thanks for the comments.
> 
> On 03/29, Olivier Matz wrote:
> >Hi Xiaolong,
> >
> >On Tue, Mar 26, 2019 at 08:20:28PM +0800, Xiaolong Ye wrote:
> >> Now the af_xdp registered memory buffer is managed by rte_mempool. An
> >> mbuf allocated from rte_mempool can be converted to an xdp_desc address
> >> and vice versa.
> >>
> >> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
> >> ---
> >>  drivers/net/af_xdp/rte_eth_af_xdp.c | 117
> >> +++++++++++++++++-----------
> >>  1 file changed, 72 insertions(+), 45 deletions(-)
> >>
> >> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> >> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> >> index 47a496ed7..a1fda9212 100644
> >> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> >> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> >> @@ -48,7 +48,11 @@ static int af_xdp_logtype;
> >>
> >>  #define ETH_AF_XDP_FRAME_SIZE
> 	XSK_UMEM__DEFAULT_FRAME_SIZE
> >>  #define ETH_AF_XDP_NUM_BUFFERS		4096
> >> -#define ETH_AF_XDP_DATA_HEADROOM	0
> >> +/* mempool hdrobj size (64 bytes) + sizeof(struct rte_mbuf) (128 bytes) */
> >> +#define ETH_AF_XDP_MBUF_OVERHEAD	192
> >> +/* data start from offset 320 (192 + 128) bytes */
> >> +#define ETH_AF_XDP_DATA_HEADROOM				\
> >> +	(ETH_AF_XDP_MBUF_OVERHEAD + RTE_PKTMBUF_HEADROOM)
> >
> >Having these constants looks quite dangerous too me. It imposes the
> >size of the mbuf, and the mempool header size. It would at least
> >require compilation checks.
> >
> >[...]
> >
> >> +	umem->mb_pool =
> rte_pktmbuf_pool_create_with_flags("af_xdp_mempool",
> >> +			ETH_AF_XDP_NUM_BUFFERS,
> >> +			250, 0,
> >> +			ETH_AF_XDP_FRAME_SIZE -
> >> +			ETH_AF_XDP_MBUF_OVERHEAD,
> >> +			MEMPOOL_F_NO_SPREAD |
> MEMPOOL_CHUNK_F_PAGE_ALIGN,
> >> +			SOCKET_ID_ANY);
> >> +	if (umem->mb_pool == NULL || umem->mb_pool->nb_mem_chunks != 1)
> {
> >> +		AF_XDP_LOG(ERR, "Failed to create mempool\n");
> >>  		goto err;
> >>  	}
> >> +	base_addr = (void *)get_base_addr(umem->mb_pool);
> >
> >Creating a mempool in the pmd like this does not look good to me for
> >several reasons:
> >- the user application creates its mempool with a specific private
> >  area in its mbufs. Here there is no private area, so it will break
> >  applications doing this.
> >- in patch 3 (mempool), you ensure that the chunk starts at a
> >  page-aligned address, and you expect that given the other flags and
> >  the constants at the top of the file, the data will be aligned. In
> >  my opinion it is not ideal.
> >- the user application may create a large number of mbufs, for instance
> >  if the application manages large reassembly queues, or tcp sockets.
> >  Here the driver limits the number of mbufs to 4k per rx queue.
> >- the socket_id is any, so it won't be efficient on numa architectures.
> 
> Our mbuf/mempool changes for zero copy do have limitations.

Just to clarify: the above code is only reached in the non-zero-copy case.
Here we create a private memory pool used to manage the AF_XDP umem; it is not
the Rx queue's mempool itself, so I don't think the concerns about the private
area and the 4k-per-Rx-queue limit apply to this code.
 
> 
> >
> >May I suggest another idea?
> >
> >Allocate the xsk_umem almost like in patch 1, but using rte_memzone
> >allocation instead of posix_memalign() (and it will be faster, because
> >it will use hugepages). And do not allocate any mempool in the driver.
> >
> 
> rte_memzone_reserve_aligned is better than posix_memalign; I'll use it in my
> first patch.
> 
> >When you receive a packet in the xsk_umem, allocate a new mbuf from the
> >standard pool. Then, use rte_pktmbuf_attach_extbuf() to attach the xsk
> >memory to the mbuf. You will have to register a callback to return the
> >xsk memory when the mbuf is transmitted or freed.
> 
> I'll try to investigate how to implement it.
> 
> >
> >This is, by the way, something I don't understand in your current
> >implementation: what happens if a mbuf is received in the af_xdp
> >driver, and freed by the application? How is the xsk buffer returned?
> 
> It is coordinated by the fill ring. The fill ring is used by the application
> (user space) to send down addresses for the kernel to fill in with Rx packet
> data. On the free side, we just return the buffer to the mempool, and each
> time in rx_pkt_burst we allocate new mbufs and submit the corresponding
> addresses to the fill ring; that's how we return the xsk buffers to the
> kernel.
> 
> >
> >Using rte_pktmbuf_attach_extbuf() would remove changes in mbuf and
> >mempool, at the price of (maybe) decreasing the performance. But I think
> >there are some places where it can be optimized.
> >
> >I understand my feedback comes late -- as usual :( -- but if you are in
> 
> Sorry for not Cc'ing you on my patch set.
> 
> >a hurry for RC1, maybe we can consider to put the 1st patch only, and
> >add the zero-copy mode in a second patch later. What do you think?
> 
> Sounds like a sensible plan. I'll try the external mbuf buffer scheme first.
> 
> 
> Thanks,
> Xiaolong
> 
> >
> >Regards,
> >Olivier
> >
> >

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v8 0/1] AF_XDP PMD
  2019-03-01  8:09 [PATCH v1 0/6] Introduce AF_XDP PMD Xiaolong Ye
                   ` (12 preceding siblings ...)
  2019-03-27  9:00 ` [PATCH v7 0/5] Introduce AF_XDP PMD Xiaolong Ye
@ 2019-04-02 10:45 ` Xiaolong Ye
  2019-04-02 10:45   ` [PATCH v8 1/1] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
  2019-04-02 15:46 ` [PATCH v9 0/1] Introduce AF_XDP PMD Xiaolong Ye
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 214+ messages in thread
From: Xiaolong Ye @ 2019-04-02 10:45 UTC (permalink / raw)
  To: dev, David Marchand, Andrew Rybchenko, Olivier Matz
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Stephen Hemminger, Ferruh Yigit, Luca Boccassi, Bruce Richardson,
	Ananyev Konstantin, Xiaolong Ye

Hi, all

Overview
========

This patchset adds a new PMD driver for AF_XDP, a proposed faster alternative
to the AF_PACKET interface in Linux; see links [1] [2] for an introduction to
AF_XDP.

AF_XDP roadmap
==============
- AF_XDP has been included in the upstream kernel since 4.18, and AF_XDP
  support in libbpf was merged in v5.1-rc1.
- The i40e and ixgbe drivers now support zero copy mode.

Change logs
===========

v8:
- address Ferruh's comments on V7
- replace posix_memalign with rte_memzone_reserve_aligned to get better
  performance
- keep only the first patch, as Olivier suggested, since the zero copy
  implementation is still undecided; we may provide the related patch
  later.

v7:
- mention mtu limitation in af_xdp.rst
- fix the vdev name in af_xdp.rst

V6:

- remove the newline in AF_XDP_LOG definition to avoid double new lines
  issue.
- rename MEMPOOL_F_PAGE_ALIGN to MEMPOOL_CHUNK_F_PAGE_ALIGN.

V5:

- disable the AF_XDP pmd by default since it requires a kernel more recent
  than the minimum kernel version supported by DPDK
- address other review comments of Maxime

V4:

- change vdev name to net_af_xdp
- adopt dynamic log type for all logging
- Fix other style issues raised by Stephen

V3:

- Fix all style issues pointed by Stephen, Mattias, David.
- Drop the testpmd change as we'll adopt Jerin's suggestion to add a new
  mempool driver to handle the application use of AF_XDP's zero copy feature.

V2:
- Fix a NULL pointer reference crash issue
- Fix txonly stop sending traffic in zc mode
- Refactor rte_mbuf.c to avoid ABI breakage.
- Fix multiple style issues pointed by Ferruh, David, Stephen, Luca.

changes vs RFC sent by Qi last Aug:

- Rework based on AF_XDP's interface changes, since the new libbpf provides
  higher-level APIs that hide many of the details of the AF_XDP uapi. The
  rework removes 300+ lines of code.

- Multi-queue is not supported because the current libbpf doesn't support
  multiple sockets on a single umem.

- No extra steps are needed to load an xdp program manually, since libbpf
  loads a default xdp program when the user calls xsk_socket__create; the
  userspace application only needs to handle the cleanup.

How to try
==========

1. Take a kernel >= v5.1-rc1, build it, and replace your host kernel with it.

   Make sure XDP sockets are enabled when configuring the kernel:

   Networking support -->
        Networking options -->
                [ * ] XDP sockets
   
2. Build libbpf in tools/lib/bpf, and copy libbpf.a and libbpf.so to /usr/lib64:

   cd tools/lib/bpf
   make

3. ethtool -L enp59s0f0 combined 1

4. Extra step to build DPDK:

   explicitly enable the AF_XDP pmd by adding the line below to
   config/common_linux:

   CONFIG_RTE_LIBRTE_PMD_AF_XDP=y

5. Start testpmd:

   ./build/app/testpmd -c 0xc -n 4 --vdev net_af_xdp,iface=enp59s0f0,queue=0 -- -i --rxq=1 --txq=1

    In this case, the default xdp program will be loaded and attached to
    queue 0 of enp59s0f0, and network traffic arriving on queue 0 will be
    redirected to the af_xdp socket.
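
    To double-check that the default xdp program got attached (optional;
    assumes an iproute2 recent enough to display XDP state):

   ip link show dev enp59s0f0

    An "xdp" entry in the output (e.g. "prog/xdp id <N>") indicates the
    program is attached to the interface.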

Xiaolong Ye (1):
  net/af_xdp: introduce AF XDP PMD driver

 MAINTAINERS                                   |   7 +
 config/common_base                            |   5 +
 doc/guides/nics/af_xdp.rst                    |  48 +
 doc/guides/nics/features/af_xdp.ini           |  11 +
 doc/guides/nics/index.rst                     |   1 +
 doc/guides/rel_notes/release_19_05.rst        |   7 +
 drivers/net/Makefile                          |   1 +
 drivers/net/af_xdp/Makefile                   |  32 +
 drivers/net/af_xdp/meson.build                |  21 +
 drivers/net/af_xdp/rte_eth_af_xdp.c           | 945 ++++++++++++++++++
 drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   3 +
 drivers/net/meson.build                       |   1 +
 mk/rte.app.mk                                 |   1 +
 13 files changed, 1083 insertions(+)
 create mode 100644 doc/guides/nics/af_xdp.rst
 create mode 100644 doc/guides/nics/features/af_xdp.ini
 create mode 100644 drivers/net/af_xdp/Makefile
 create mode 100644 drivers/net/af_xdp/meson.build
 create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
 create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map

-- 
2.17.1

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v8 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-02 10:45 ` [PATCH v8 0/1] AF_XDP PMD Xiaolong Ye
@ 2019-04-02 10:45   ` Xiaolong Ye
  2019-04-02 14:58     ` Stephen Hemminger
  0 siblings, 1 reply; 214+ messages in thread
From: Xiaolong Ye @ 2019-04-02 10:45 UTC (permalink / raw)
  To: dev, David Marchand, Andrew Rybchenko, Olivier Matz
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Stephen Hemminger, Ferruh Yigit, Luca Boccassi, Bruce Richardson,
	Ananyev Konstantin, Xiaolong Ye

Add a new PMD driver for AF_XDP, a proposed faster alternative to the
AF_PACKET interface in Linux. For more info about AF_XDP, please refer to
[1] [2].

This is the vanilla version PMD which just uses a raw buffer registered as
the umem.

[1] https://fosdem.org/2018/schedule/event/af_xdp/
[2] https://lwn.net/Articles/745934/

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 MAINTAINERS                                   |   7 +
 config/common_base                            |   5 +
 doc/guides/nics/af_xdp.rst                    |  48 +
 doc/guides/nics/features/af_xdp.ini           |  11 +
 doc/guides/nics/index.rst                     |   1 +
 doc/guides/rel_notes/release_19_05.rst        |   7 +
 drivers/net/Makefile                          |   1 +
 drivers/net/af_xdp/Makefile                   |  32 +
 drivers/net/af_xdp/meson.build                |  21 +
 drivers/net/af_xdp/rte_eth_af_xdp.c           | 945 ++++++++++++++++++
 drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   3 +
 drivers/net/meson.build                       |   1 +
 mk/rte.app.mk                                 |   1 +
 13 files changed, 1083 insertions(+)
 create mode 100644 doc/guides/nics/af_xdp.rst
 create mode 100644 doc/guides/nics/features/af_xdp.ini
 create mode 100644 drivers/net/af_xdp/Makefile
 create mode 100644 drivers/net/af_xdp/meson.build
 create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
 create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index e9ff2b4c2..c13ae8215 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -479,6 +479,13 @@ M: John W. Linville <linville@tuxdriver.com>
 F: drivers/net/af_packet/
 F: doc/guides/nics/features/afpacket.ini
 
+Linux AF_XDP
+M: Xiaolong Ye <xiaolong.ye@intel.com>
+M: Qi Zhang <qi.z.zhang@intel.com>
+F: drivers/net/af_xdp/
+F: doc/guides/nics/af_xdp.rst
+F: doc/guides/nics/features/af_xdp.ini
+
 Amazon ENA
 M: Marcin Wojtas <mw@semihalf.com>
 M: Michal Krawczyk <mk@semihalf.com>
diff --git a/config/common_base b/config/common_base
index 6292bc4af..b95ee03d7 100644
--- a/config/common_base
+++ b/config/common_base
@@ -430,6 +430,11 @@ CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX_FREE=n
 #
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
 
+#
+# Compile software PMD backed by AF_XDP sockets (Linux only)
+#
+CONFIG_RTE_LIBRTE_PMD_AF_XDP=n
+
 #
 # Compile link bonding PMD library
 #
diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
new file mode 100644
index 000000000..af675d910
--- /dev/null
+++ b/doc/guides/nics/af_xdp.rst
@@ -0,0 +1,48 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2019 Intel Corporation.
+
+AF_XDP Poll Mode Driver
+==========================
+
+AF_XDP is an address family that is optimized for high performance
+packet processing. AF_XDP sockets enable an XDP program to redirect
+packets to a memory buffer in userspace.
+
+For the full details behind AF_XDP socket, you can refer to
+`AF_XDP documentation in the Kernel
+<https://www.kernel.org/doc/Documentation/networking/af_xdp.rst>`_.
+
+This Linux-specific PMD driver creates the AF_XDP socket and binds it to a
+specific netdev queue. It allows a DPDK application to send and receive raw
+packets through the socket, bypassing the kernel network stack. The current
+implementation only supports a single queue; multi-queue support will be
+added later.
+
+Note that the MTU of the AF_XDP PMD is limited because XDP does not support
+fragmentation.
+
+Options
+-------
+
+The following options can be provided to set up an af_xdp port in DPDK.
+
+*   ``iface`` - name of the Kernel interface to attach to (required);
+*   ``queue`` - netdev queue id (optional, default 0);
+
+Prerequisites
+-------------
+
+This is a Linux-specific PMD, thus the following prerequisites apply:
+
+*  A Linux Kernel (version > 4.18) with XDP sockets configuration enabled;
+*  libbpf (from a kernel source tree, version 5.1 or later) with the latest
+   AF_XDP support installed;
+*  A Kernel bound interface to attach to.
+
+Set up an af_xdp interface
+-----------------------------
+
+The following example will set up an af_xdp interface in DPDK:
+
+.. code-block:: console
+
+    --vdev net_af_xdp,iface=ens786f1,queue=0
diff --git a/doc/guides/nics/features/af_xdp.ini b/doc/guides/nics/features/af_xdp.ini
new file mode 100644
index 000000000..36953c2de
--- /dev/null
+++ b/doc/guides/nics/features/af_xdp.ini
@@ -0,0 +1,11 @@
+;
+; Supported features of the 'af_xdp' network poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Link status          = Y
+MTU update           = Y
+Promiscuous mode     = Y
+Stats per queue      = Y
+x86-64               = Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 5c80e3baa..a4b80a3d0 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -12,6 +12,7 @@ Network Interface Controller Drivers
     features
     build_and_test
     af_packet
+    af_xdp
     ark
     atlantic
     avp
diff --git a/doc/guides/rel_notes/release_19_05.rst b/doc/guides/rel_notes/release_19_05.rst
index aae8ff602..cdf0d4a95 100644
--- a/doc/guides/rel_notes/release_19_05.rst
+++ b/doc/guides/rel_notes/release_19_05.rst
@@ -65,6 +65,13 @@ New Features
     process.
   * Added support for Rx packet types list in a secondary process.
 
+* **Added the AF_XDP PMD.**
+
+  Added a Linux-specific PMD driver for AF_XDP. It creates the AF_XDP socket
+  and binds it to a specific netdev queue, allowing a DPDK application to send
+  and receive raw packets through the socket while bypassing the kernel
+  network stack, to achieve high performance packet processing.
+
 * **Updated Mellanox drivers.**
 
    New features and improvements were done in mlx4 and mlx5 PMDs:
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 502869a87..5d401b8c5 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -9,6 +9,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD),d)
 endif
 
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += af_packet
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += af_xdp
 DIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark
 DIRS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD) += atlantic
 DIRS-$(CONFIG_RTE_LIBRTE_AVP_PMD) += avp
diff --git a/drivers/net/af_xdp/Makefile b/drivers/net/af_xdp/Makefile
new file mode 100644
index 000000000..8343e3016
--- /dev/null
+++ b/drivers/net/af_xdp/Makefile
@@ -0,0 +1,32 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Intel Corporation
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_af_xdp.a
+
+EXPORT_MAP := rte_pmd_af_xdp_version.map
+
+LIBABIVER := 1
+
+CFLAGS += -O3
+
+# require kernel version >= v5.1-rc1
+CFLAGS += -I$(RTE_KERNELDIR)/tools/include
+CFLAGS += -I$(RTE_KERNELDIR)/tools/lib/bpf
+
+CFLAGS += $(WERROR_FLAGS)
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
+LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
+LDLIBS += -lrte_bus_vdev
+LDLIBS += -lbpf
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += rte_eth_af_xdp.c
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
new file mode 100644
index 000000000..d40aae190
--- /dev/null
+++ b/drivers/net/af_xdp/meson.build
@@ -0,0 +1,21 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Intel Corporation
+
+if host_machine.system() != 'linux'
+	build = false
+endif
+
+bpf_dep = dependency('libbpf', required: false)
+if bpf_dep.found()
+	build = true
+else
+	bpf_dep = cc.find_library('libbpf', required: false)
+	if bpf_dep.found() and cc.has_header('xsk.h', dependencies: bpf_dep) and cc.has_header('linux/if_xdp.h')
+		build = true
+		pkgconfig_extra_libs += '-lbpf'
+	else
+		build = false
+	endif
+endif
+sources = files('rte_eth_af_xdp.c')
+ext_deps += bpf_dep
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
new file mode 100644
index 000000000..003a1d7db
--- /dev/null
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -0,0 +1,945 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation.
+ */
+
+#include <rte_mbuf.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+#include <rte_string_fns.h>
+
+#include <linux/if_ether.h>
+#include <linux/if_xdp.h>
+#include <linux/if_link.h>
+#include <asm/barrier.h>
+#include <arpa/inet.h>
+#include <net/if.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <unistd.h>
+#include <poll.h>
+#include <bpf/bpf.h>
+#include <xsk.h>
+
+#ifndef SOL_XDP
+#define SOL_XDP 283
+#endif
+
+#ifndef AF_XDP
+#define AF_XDP 44
+#endif
+
+#ifndef PF_XDP
+#define PF_XDP AF_XDP
+#endif
+
+static int af_xdp_logtype;
+
+#define AF_XDP_LOG(level, fmt, args...)			\
+	rte_log(RTE_LOG_ ## level, af_xdp_logtype,	\
+		"%s(): " fmt, __func__, ##args)
+
+#define ETH_AF_XDP_FRAME_SIZE		XSK_UMEM__DEFAULT_FRAME_SIZE
+#define ETH_AF_XDP_NUM_BUFFERS		4096
+#define ETH_AF_XDP_DATA_HEADROOM	0
+#define ETH_AF_XDP_DFLT_NUM_DESCS	XSK_RING_CONS__DEFAULT_NUM_DESCS
+#define ETH_AF_XDP_DFLT_QUEUE_IDX	0
+
+#define ETH_AF_XDP_RX_BATCH_SIZE	32
+#define ETH_AF_XDP_TX_BATCH_SIZE	32
+
+#define ETH_AF_XDP_MAX_QUEUE_PAIRS     16
+
+struct xsk_umem_info {
+	struct xsk_ring_prod fq;
+	struct xsk_ring_cons cq;
+	struct xsk_umem *umem;
+	struct rte_ring *buf_ring;
+	const struct rte_memzone *mz;
+};
+
+struct rx_stats {
+	uint64_t rx_pkts;
+	uint64_t rx_bytes;
+	uint64_t rx_dropped;
+};
+
+struct pkt_rx_queue {
+	struct xsk_ring_cons rx;
+	struct xsk_umem_info *umem;
+	struct xsk_socket *xsk;
+	struct rte_mempool *mb_pool;
+
+	struct rx_stats stats;
+
+	struct pkt_tx_queue *pair;
+	uint16_t queue_idx;
+};
+
+struct tx_stats {
+	uint64_t tx_pkts;
+	uint64_t err_pkts;
+	uint64_t tx_bytes;
+};
+
+struct pkt_tx_queue {
+	struct xsk_ring_prod tx;
+
+	struct tx_stats stats;
+
+	struct pkt_rx_queue *pair;
+	uint16_t queue_idx;
+};
+
+struct pmd_internals {
+	int if_index;
+	char if_name[IFNAMSIZ];
+	uint16_t queue_idx;
+	struct ether_addr eth_addr;
+	struct xsk_umem_info *umem;
+	struct rte_mempool *mb_pool_share;
+
+	struct pkt_rx_queue rx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
+	struct pkt_tx_queue tx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
+};
+
+#define ETH_AF_XDP_IFACE_ARG			"iface"
+#define ETH_AF_XDP_QUEUE_IDX_ARG		"queue"
+
+static const char * const valid_arguments[] = {
+	ETH_AF_XDP_IFACE_ARG,
+	ETH_AF_XDP_QUEUE_IDX_ARG,
+	NULL
+};
+
+static const struct rte_eth_link pmd_link = {
+	.link_speed = ETH_SPEED_NUM_10G,
+	.link_duplex = ETH_LINK_FULL_DUPLEX,
+	.link_status = ETH_LINK_DOWN,
+	.link_autoneg = ETH_LINK_AUTONEG
+};
+
+static inline int
+reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
+{
+	struct xsk_ring_prod *fq = &umem->fq;
+	uint32_t idx;
+	int i, ret;
+
+	ret = xsk_ring_prod__reserve(fq, reserve_size, &idx);
+	if (unlikely(!ret)) {
+		AF_XDP_LOG(ERR, "Failed to reserve enough fq descs.\n");
+		/* ret is 0 here; return an error so callers don't mistake
+		 * failure for success
+		 */
+		return -1;
+	}
+
+	for (i = 0; i < reserve_size; i++) {
+		__u64 *fq_addr;
+		void *addr = NULL;
+		if (rte_ring_dequeue(umem->buf_ring, &addr)) {
+			i--;
+			break;
+		}
+		fq_addr = xsk_ring_prod__fill_addr(fq, idx++);
+		*fq_addr = (uint64_t)addr;
+	}
+
+	xsk_ring_prod__submit(fq, i);
+
+	return 0;
+}
+
+static uint16_t
+eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+	struct pkt_rx_queue *rxq = queue;
+	struct xsk_ring_cons *rx = &rxq->rx;
+	struct xsk_umem_info *umem = rxq->umem;
+	struct xsk_ring_prod *fq = &umem->fq;
+	uint32_t idx_rx;
+	uint32_t free_thresh = fq->size >> 1;
+	struct rte_mbuf *mbufs[ETH_AF_XDP_RX_BATCH_SIZE];
+	unsigned long dropped = 0;
+	unsigned long rx_bytes = 0;
+	uint16_t count = 0;
+	int rcvd, i;
+
+	nb_pkts = RTE_MIN(nb_pkts, ETH_AF_XDP_RX_BATCH_SIZE);
+
+	rcvd = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
+	if (rcvd == 0)
+		return 0;
+
+	if (xsk_prod_nb_free(fq, free_thresh) >= free_thresh)
+		(void)reserve_fill_queue(umem, ETH_AF_XDP_RX_BATCH_SIZE);
+
+	if (unlikely(rte_pktmbuf_alloc_bulk(rxq->mb_pool, mbufs, rcvd) != 0))
+		return 0;
+
+	for (i = 0; i < rcvd; i++) {
+		const struct xdp_desc *desc;
+		uint64_t addr;
+		uint32_t len;
+		void *pkt;
+
+		desc = xsk_ring_cons__rx_desc(rx, idx_rx++);
+		addr = desc->addr;
+		len = desc->len;
+		pkt = xsk_umem__get_data(rxq->umem->mz->addr, addr);
+
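+		/* copy-mode Rx: copy the frame out of the umem chunk; the
+		 * chunk itself is recycled via buf_ring below so it can be
+		 * posted back to the fill ring later
+		 */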
+		rte_memcpy(rte_pktmbuf_mtod(mbufs[i], void *), pkt, len);
+		rte_pktmbuf_pkt_len(mbufs[i]) = len;
+		rte_pktmbuf_data_len(mbufs[i]) = len;
+		rx_bytes += len;
+		bufs[count++] = mbufs[i];
+
+		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+	}
+
+	xsk_ring_cons__release(rx, rcvd);
+
+	/* statistics */
+	rxq->stats.rx_pkts += (rcvd - dropped);
+	rxq->stats.rx_bytes += rx_bytes;
+
+	return count;
+}
+
+static void
+pull_umem_cq(struct xsk_umem_info *umem, int size)
+{
+	struct xsk_ring_cons *cq = &umem->cq;
+	size_t i, n;
+	uint32_t idx_cq;
+
+	n = xsk_ring_cons__peek(cq, size, &idx_cq);
+
+	for (i = 0; i < n; i++) {
+		uint64_t addr;
+		addr = *xsk_ring_cons__comp_addr(cq, idx_cq++);
+		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+	}
+
+	xsk_ring_cons__release(cq, n);
+}
+
+static void
+kick_tx(struct pkt_tx_queue *txq)
+{
+	struct xsk_umem_info *umem = txq->pair->umem;
+
+	while (send(xsk_socket__fd(txq->pair->xsk), NULL,
+		      0, MSG_DONTWAIT) < 0) {
+		/* something unexpected happened */
+		if (errno != EBUSY && errno != EAGAIN && errno != EINTR)
+			break;
+
+		/* pull from the completion queue to free up more space */
+		if (errno == EAGAIN)
+			pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
+	}
+	pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
+}
+
+static uint16_t
+eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+	struct pkt_tx_queue *txq = queue;
+	struct xsk_umem_info *umem = txq->pair->umem;
+	struct rte_mbuf *mbuf;
+	void *addrs[ETH_AF_XDP_TX_BATCH_SIZE];
+	unsigned long tx_bytes = 0;
+	int i, valid = 0;
+	uint32_t idx_tx;
+
+	nb_pkts = RTE_MIN(nb_pkts, ETH_AF_XDP_TX_BATCH_SIZE);
+
+	pull_umem_cq(umem, nb_pkts);
+
+	nb_pkts = rte_ring_dequeue_bulk(umem->buf_ring, addrs,
+					nb_pkts, NULL);
+	if (nb_pkts == 0)
+		return 0;
+
+	if (xsk_ring_prod__reserve(&txq->tx, nb_pkts, &idx_tx) != nb_pkts) {
+		kick_tx(txq);
+		return 0;
+	}
+
+	for (i = 0; i < nb_pkts; i++) {
+		struct xdp_desc *desc;
+		void *pkt;
+		uint32_t buf_len = ETH_AF_XDP_FRAME_SIZE
+					- ETH_AF_XDP_DATA_HEADROOM;
+		desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i);
+		mbuf = bufs[i];
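+		/* XDP has no fragmentation support: frames that don't fit
+		 * in one umem buffer are dropped and counted in err_pkts
+		 * below
+		 */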
+		if (mbuf->pkt_len <= buf_len) {
+			desc->addr = (uint64_t)addrs[valid];
+			desc->len = mbuf->pkt_len;
+			pkt = xsk_umem__get_data(umem->mz->addr,
+						 desc->addr);
+			rte_memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
+			       desc->len);
+			valid++;
+			tx_bytes += mbuf->pkt_len;
+		}
+		rte_pktmbuf_free(mbuf);
+	}
+
+	xsk_ring_prod__submit(&txq->tx, nb_pkts);
+
+	kick_tx(txq);
+
+	if (valid < nb_pkts)
+		rte_ring_enqueue_bulk(umem->buf_ring, &addrs[valid],
+				 nb_pkts - valid, NULL);
+
+	txq->stats.err_pkts += nb_pkts - valid;
+	txq->stats.tx_pkts += valid;
+	txq->stats.tx_bytes += tx_bytes;
+
+	return nb_pkts;
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = ETH_LINK_UP;
+
+	return 0;
+}
+
+/* This function gets called when the current port gets stopped. */
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = ETH_LINK_DOWN;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev)
+{
+	/* rx/tx must be paired */
+	if (dev->data->nb_rx_queues != dev->data->nb_tx_queues)
+		return -EINVAL;
+
+	return 0;
+}
+
+static void
+eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	dev_info->if_index = internals->if_index;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_pktlen = ETH_FRAME_LEN;
+	dev_info->max_rx_queues = 1;
+	dev_info->max_tx_queues = 1;
+
+	dev_info->default_rxportconf.nb_queues = 1;
+	dev_info->default_txportconf.nb_queues = 1;
+	dev_info->default_rxportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
+	dev_info->default_txportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
+}
+
+static int
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct xdp_statistics xdp_stats;
+	struct pkt_rx_queue *rxq;
+	socklen_t optlen;
+	int i, ret;
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		optlen = sizeof(struct xdp_statistics);
+		rxq = &internals->rx_queues[i];
+		stats->q_ipackets[i] = internals->rx_queues[i].stats.rx_pkts;
+		stats->q_ibytes[i] = internals->rx_queues[i].stats.rx_bytes;
+
+		stats->q_opackets[i] = internals->tx_queues[i].stats.tx_pkts;
+		stats->q_obytes[i] = internals->tx_queues[i].stats.tx_bytes;
+
+		stats->ipackets += stats->q_ipackets[i];
+		stats->ibytes += stats->q_ibytes[i];
+		stats->imissed += internals->rx_queues[i].stats.rx_dropped;
+		ret = getsockopt(xsk_socket__fd(rxq->xsk), SOL_XDP,
+				XDP_STATISTICS, &xdp_stats, &optlen);
+		if (ret != 0) {
+			AF_XDP_LOG(ERR, "getsockopt() failed for XDP_STATISTICS.\n");
+			return -1;
+		}
+		stats->imissed += xdp_stats.rx_dropped;
+
+		stats->opackets += stats->q_opackets[i];
+		stats->oerrors += internals->tx_queues[i].stats.err_pkts;
+		stats->obytes += stats->q_obytes[i];
+	}
+
+	return 0;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	int i;
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		memset(&internals->rx_queues[i].stats, 0,
+					sizeof(struct rx_stats));
+		memset(&internals->tx_queues[i].stats, 0,
+					sizeof(struct tx_stats));
+	}
+}
+
+static void
+remove_xdp_program(struct pmd_internals *internals)
+{
+	uint32_t curr_prog_id = 0;
+
+	if (bpf_get_link_xdp_id(internals->if_index, &curr_prog_id,
+				XDP_FLAGS_UPDATE_IF_NOEXIST)) {
+		AF_XDP_LOG(ERR, "bpf_get_link_xdp_id failed\n");
+		return;
+	}
+	bpf_set_link_xdp_fd(internals->if_index, -1,
+			XDP_FLAGS_UPDATE_IF_NOEXIST);
+}
+
+static void
+eth_dev_close(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct pkt_rx_queue *rxq;
+	int i;
+
+	AF_XDP_LOG(INFO, "Closing AF_XDP ethdev on numa socket %u\n",
+		rte_socket_id());
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		rxq = &internals->rx_queues[i];
+		if (rxq->umem == NULL)
+			break;
+		xsk_socket__delete(rxq->xsk);
+	}
+
+	(void)xsk_umem__delete(internals->umem->umem);
+	remove_xdp_program(internals);
+}
+
+static void
+eth_queue_release(void *q __rte_unused)
+{
+}
+
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+		int wait_to_complete __rte_unused)
+{
+	return 0;
+}
+
+static void
+xdp_umem_destroy(struct xsk_umem_info *umem)
+{
+	rte_memzone_free(umem->mz);
+	umem->mz = NULL;
+
+	rte_ring_free(umem->buf_ring);
+	umem->buf_ring = NULL;
+
+	rte_free(umem);
+	umem = NULL;
+}
+
+static struct
+xsk_umem_info *xdp_umem_configure(void)
+{
+	struct xsk_umem_info *umem;
+	const struct rte_memzone *mz;
+	struct xsk_umem_config usr_config = {
+		.fill_size = ETH_AF_XDP_DFLT_NUM_DESCS,
+		.comp_size = ETH_AF_XDP_DFLT_NUM_DESCS,
+		.frame_size = ETH_AF_XDP_FRAME_SIZE,
+		.frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
+	int ret;
+	uint64_t i;
+
+	umem = rte_zmalloc_socket("umem", sizeof(*umem), 0, rte_socket_id());
+	if (umem == NULL) {
+		AF_XDP_LOG(ERR, "Failed to allocate umem info");
+		return NULL;
+	}
+
+	umem->buf_ring = rte_ring_create("af_xdp_ring",
+					 ETH_AF_XDP_NUM_BUFFERS,
+					 rte_socket_id(),
+					 0x0);
+	if (umem->buf_ring == NULL) {
+		AF_XDP_LOG(ERR, "Failed to create rte_ring\n");
+		goto err;
+	}
+
+	for (i = 0; i < ETH_AF_XDP_NUM_BUFFERS; i++)
+		rte_ring_enqueue(umem->buf_ring,
+				 (void *)(i * ETH_AF_XDP_FRAME_SIZE +
+					  ETH_AF_XDP_DATA_HEADROOM));
+
+	mz = rte_memzone_reserve_aligned("af_xdp uemem",
+			ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
+			rte_socket_id(), RTE_MEMZONE_IOVA_CONTIG,
+			getpagesize());
+	if (mz == NULL) {
+		AF_XDP_LOG(ERR, "Failed to reserve memzone for af_xdp umem.\n");
+		goto err;
+	}
+
+	ret = xsk_umem__create(&umem->umem, mz->addr,
+			       ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
+			       &umem->fq, &umem->cq,
+			       &usr_config);
+
+	if (ret) {
+		AF_XDP_LOG(ERR, "Failed to create umem");
+		goto err;
+	}
+	umem->mz = mz;
+
+	return umem;
+
+err:
+	xdp_umem_destroy(umem);
+	return NULL;
+}
+
+static int
+xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
+	      int ring_size)
+{
+	struct xsk_socket_config cfg;
+	struct pkt_tx_queue *txq = rxq->pair;
+	int ret = 0;
+	int reserve_size;
+
+	rxq->umem = xdp_umem_configure();
+	if (rxq->umem == NULL)
+		return -ENOMEM;
+
+	cfg.rx_size = ring_size;
+	cfg.tx_size = ring_size;
+	cfg.libbpf_flags = 0;
+	cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
+	cfg.bind_flags = 0;
+	ret = xsk_socket__create(&rxq->xsk, internals->if_name,
+			internals->queue_idx, rxq->umem->umem, &rxq->rx,
+			&txq->tx, &cfg);
+	if (ret) {
+		AF_XDP_LOG(ERR, "Failed to create xsk socket.\n");
+		goto err;
+	}
+
+	reserve_size = ETH_AF_XDP_DFLT_NUM_DESCS / 2;
+	ret = reserve_fill_queue(rxq->umem, reserve_size);
+	if (ret) {
+		xsk_socket__delete(rxq->xsk);
+		AF_XDP_LOG(ERR, "Failed to reserve fill queue.\n");
+		goto err;
+	}
+
+	return 0;
+
+err:
+	xdp_umem_destroy(rxq->umem);
+
+	return ret;
+}
+
+static void
+queue_reset(struct pmd_internals *internals, uint16_t queue_idx)
+{
+	struct pkt_rx_queue *rxq = &internals->rx_queues[queue_idx];
+	struct pkt_tx_queue *txq = rxq->pair;
+
+	memset(rxq, 0, sizeof(*rxq));
+	memset(txq, 0, sizeof(*txq));
+	rxq->pair = txq;
+	txq->pair = rxq;
+	rxq->queue_idx = queue_idx;
+	txq->queue_idx = queue_idx;
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev,
+		   uint16_t rx_queue_id,
+		   uint16_t nb_rx_desc,
+		   unsigned int socket_id __rte_unused,
+		   const struct rte_eth_rxconf *rx_conf __rte_unused,
+		   struct rte_mempool *mb_pool)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	uint32_t buf_size, data_size;
+	struct pkt_rx_queue *rxq;
+	int ret;
+
+	rxq = &internals->rx_queues[rx_queue_id];
+	queue_reset(internals, rx_queue_id);
+
+	/* Now get the space available for data in the mbuf */
+	buf_size = rte_pktmbuf_data_room_size(mb_pool) -
+		RTE_PKTMBUF_HEADROOM;
+	data_size = ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_DATA_HEADROOM;
+
+	if (data_size > buf_size) {
+		AF_XDP_LOG(ERR, "%s: %d bytes will not fit in mbuf (%d bytes)\n",
+			dev->device->name, data_size, buf_size);
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	rxq->mb_pool = mb_pool;
+
+	if (xsk_configure(internals, rxq, nb_rx_desc)) {
+		AF_XDP_LOG(ERR, "Failed to configure xdp socket\n");
+		ret = -EINVAL;
+		goto err;
+	}
+
+	internals->umem = rxq->umem;
+
+	dev->data->rx_queues[rx_queue_id] = rxq;
+	return 0;
+
+err:
+	queue_reset(internals, rx_queue_id);
+	return ret;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev,
+		   uint16_t tx_queue_id,
+		   uint16_t nb_tx_desc __rte_unused,
+		   unsigned int socket_id __rte_unused,
+		   const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct pkt_tx_queue *txq;
+
+	txq = &internals->tx_queues[tx_queue_id];
+
+	dev->data->tx_queues[tx_queue_id] = txq;
+	return 0;
+}
+
+static int
+eth_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct ifreq ifr = { .ifr_mtu = mtu };
+	int ret;
+	int s;
+
+	s = socket(PF_INET, SOCK_DGRAM, 0);
+	if (s < 0)
+		return -EINVAL;
+
+	strlcpy(ifr.ifr_name, internals->if_name, IFNAMSIZ);
+	ret = ioctl(s, SIOCSIFMTU, &ifr);
+	close(s);
+
+	return (ret < 0) ? -errno : 0;
+}
+
+static void
+eth_dev_change_flags(char *if_name, uint32_t flags, uint32_t mask)
+{
+	struct ifreq ifr;
+	int s;
+
+	s = socket(PF_INET, SOCK_DGRAM, 0);
+	if (s < 0)
+		return;
+
+	strlcpy(ifr.ifr_name, if_name, IFNAMSIZ);
+	if (ioctl(s, SIOCGIFFLAGS, &ifr) < 0)
+		goto out;
+	ifr.ifr_flags &= mask;
+	ifr.ifr_flags |= flags;
+	if (ioctl(s, SIOCSIFFLAGS, &ifr) < 0)
+		goto out;
+out:
+	close(s);
+}
+
+static void
+eth_dev_promiscuous_enable(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	eth_dev_change_flags(internals->if_name, IFF_PROMISC, ~0);
+}
+
+static void
+eth_dev_promiscuous_disable(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	eth_dev_change_flags(internals->if_name, 0, ~IFF_PROMISC);
+}
+
+static const struct eth_dev_ops ops = {
+	.dev_start = eth_dev_start,
+	.dev_stop = eth_dev_stop,
+	.dev_close = eth_dev_close,
+	.dev_configure = eth_dev_configure,
+	.dev_infos_get = eth_dev_info,
+	.mtu_set = eth_dev_mtu_set,
+	.promiscuous_enable = eth_dev_promiscuous_enable,
+	.promiscuous_disable = eth_dev_promiscuous_disable,
+	.rx_queue_setup = eth_rx_queue_setup,
+	.tx_queue_setup = eth_tx_queue_setup,
+	.rx_queue_release = eth_queue_release,
+	.tx_queue_release = eth_queue_release,
+	.link_update = eth_link_update,
+	.stats_get = eth_stats_get,
+	.stats_reset = eth_stats_reset,
+};
+
+/** parse integer from integer argument */
+static int
+parse_integer_arg(const char *key __rte_unused,
+		  const char *value, void *extra_args)
+{
+	int *i = (int *)extra_args;
+	char *end;
+
+	*i = strtol(value, &end, 10);
+	if (*i < 0) {
+		AF_XDP_LOG(ERR, "Argument has to be positive.\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/** parse name argument */
+static int
+parse_name_arg(const char *key __rte_unused,
+	       const char *value, void *extra_args)
+{
+	char *name = extra_args;
+
+	if (strnlen(value, IFNAMSIZ) > IFNAMSIZ - 1) {
+		AF_XDP_LOG(ERR, "Invalid name %s, should be less than %u bytes.\n",
+			   value, IFNAMSIZ);
+		return -EINVAL;
+	}
+
+	strlcpy(name, value, IFNAMSIZ);
+
+	return 0;
+}
+
+static int
+parse_parameters(struct rte_kvargs *kvlist,
+		 char *if_name,
+		 int *queue_idx)
+{
+	int ret;
+
+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_IFACE_ARG,
+				 &parse_name_arg, if_name);
+	if (ret < 0)
+		goto free_kvlist;
+
+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_QUEUE_IDX_ARG,
+				 &parse_integer_arg, queue_idx);
+	if (ret < 0)
+		goto free_kvlist;
+
+free_kvlist:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
+static int
+get_iface_info(const char *if_name,
+	       struct ether_addr *eth_addr,
+	       int *if_index)
+{
+	struct ifreq ifr;
+	int sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_IP);
+
+	if (sock < 0)
+		return -1;
+
+	strlcpy(ifr.ifr_name, if_name, IFNAMSIZ);
+	if (ioctl(sock, SIOCGIFINDEX, &ifr))
+		goto error;
+
+	*if_index = ifr.ifr_ifindex;
+
+	if (ioctl(sock, SIOCGIFHWADDR, &ifr))
+		goto error;
+
+	rte_memcpy(eth_addr, ifr.ifr_hwaddr.sa_data, ETHER_ADDR_LEN);
+
+	close(sock);
+	return 0;
+
+error:
+	close(sock);
+	return -1;
+}
+
+static struct rte_eth_dev *
+init_internals(struct rte_vdev_device *dev,
+	       const char *if_name,
+	       int queue_idx)
+{
+	const char *name = rte_vdev_device_name(dev);
+	const unsigned int numa_node = dev->device.numa_node;
+	struct pmd_internals *internals;
+	struct rte_eth_dev *eth_dev;
+	int ret;
+	int i;
+
+	internals = rte_zmalloc_socket(name, sizeof(*internals), 0, numa_node);
+	if (internals == NULL)
+		return NULL;
+
+	internals->queue_idx = queue_idx;
+	strlcpy(internals->if_name, if_name, IFNAMSIZ);
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		internals->tx_queues[i].pair = &internals->rx_queues[i];
+		internals->rx_queues[i].pair = &internals->tx_queues[i];
+	}
+
+	ret = get_iface_info(if_name, &internals->eth_addr,
+			     &internals->if_index);
+	if (ret)
+		goto err;
+
+	eth_dev = rte_eth_vdev_allocate(dev, 0);
+	if (eth_dev == NULL)
+		goto err;
+
+	eth_dev->data->dev_private = internals;
+	eth_dev->data->dev_link = pmd_link;
+	eth_dev->data->mac_addrs = &internals->eth_addr;
+	eth_dev->dev_ops = &ops;
+	eth_dev->rx_pkt_burst = eth_af_xdp_rx;
+	eth_dev->tx_pkt_burst = eth_af_xdp_tx;
+
+	return eth_dev;
+
+err:
+	rte_free(internals);
+	return NULL;
+}
+
+static int
+rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
+{
+	struct rte_kvargs *kvlist;
+	char if_name[IFNAMSIZ] = {'\0'};
+	int xsk_queue_idx = ETH_AF_XDP_DFLT_QUEUE_IDX;
+	struct rte_eth_dev *eth_dev = NULL;
+	const char *name;
+
+	AF_XDP_LOG(INFO, "Initializing pmd_af_xdp for %s\n",
+		rte_vdev_device_name(dev));
+
+	name = rte_vdev_device_name(dev);
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY &&
+		strlen(rte_vdev_device_args(dev)) == 0) {
+		eth_dev = rte_eth_dev_attach_secondary(name);
+		if (eth_dev == NULL) {
+			AF_XDP_LOG(ERR, "Failed to probe %s\n", name);
+			return -EINVAL;
+		}
+		eth_dev->dev_ops = &ops;
+		rte_eth_dev_probing_finish(eth_dev);
+		return 0;
+	}
+
+	kvlist = rte_kvargs_parse(rte_vdev_device_args(dev), valid_arguments);
+	if (kvlist == NULL) {
+		AF_XDP_LOG(ERR, "Invalid kvargs key\n");
+		return -EINVAL;
+	}
+
+	if (dev->device.numa_node == SOCKET_ID_ANY)
+		dev->device.numa_node = rte_socket_id();
+
+	if (parse_parameters(kvlist, if_name, &xsk_queue_idx) < 0) {
+		AF_XDP_LOG(ERR, "Invalid kvargs value\n");
+		return -EINVAL;
+	}
+
+	if (strlen(if_name) == 0) {
+		AF_XDP_LOG(ERR, "Network interface must be specified\n");
+		return -EINVAL;
+	}
+
+	eth_dev = init_internals(dev, if_name, xsk_queue_idx);
+	if (eth_dev == NULL) {
+		AF_XDP_LOG(ERR, "Failed to init internals\n");
+		return -1;
+	}
+
+	rte_eth_dev_probing_finish(eth_dev);
+
+	return 0;
+}
+
+static int
+rte_pmd_af_xdp_remove(struct rte_vdev_device *dev)
+{
+	struct rte_eth_dev *eth_dev = NULL;
+	struct pmd_internals *internals;
+
+	AF_XDP_LOG(INFO, "Removing AF_XDP ethdev on numa socket %u\n",
+		rte_socket_id());
+
+	if (dev == NULL)
+		return -1;
+
+	/* find the ethdev entry */
+	eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(dev));
+	if (eth_dev == NULL)
+		return -1;
+
+	internals = eth_dev->data->dev_private;
+
+	rte_ring_free(internals->umem->buf_ring);
+	rte_memzone_free(internals->umem->mz);
+	rte_free(internals->umem);
+
+	rte_eth_dev_release_port(eth_dev);
+
+	return 0;
+}
+
+static struct rte_vdev_driver pmd_af_xdp_drv = {
+	.probe = rte_pmd_af_xdp_probe,
+	.remove = rte_pmd_af_xdp_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(net_af_xdp, pmd_af_xdp_drv);
+RTE_PMD_REGISTER_PARAM_STRING(net_af_xdp,
+			      "iface=<string> "
+			      "queue=<int> ");
+
+RTE_INIT(af_xdp_init_log)
+{
+	af_xdp_logtype = rte_log_register("pmd.net.af_xdp");
+	if (af_xdp_logtype >= 0)
+		rte_log_set_level(af_xdp_logtype, RTE_LOG_NOTICE);
+}
diff --git a/drivers/net/af_xdp/rte_pmd_af_xdp_version.map b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
new file mode 100644
index 000000000..c6db030fe
--- /dev/null
+++ b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
@@ -0,0 +1,3 @@
+DPDK_19.05 {
+	local: *;
+};
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index 3ecc78cee..1105e72d8 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2017 Intel Corporation
 
 drivers = ['af_packet',
+	'af_xdp',
 	'ark',
 	'atlantic',
 	'avp',
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 262132fc6..f916bc9ef 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -143,6 +143,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_DPAA2_MEMPOOL)  += -lrte_mempool_dpaa2
 endif
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     += -lrte_pmd_af_xdp -lbpf
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ARK_PMD)        += -lrte_pmd_ark
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD)   += -lrte_pmd_atlantic
 _LDLIBS-$(CONFIG_RTE_LIBRTE_AVP_PMD)        += -lrte_pmd_avp
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* Re: [PATCH v8 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-02 10:45   ` [PATCH v8 1/1] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
@ 2019-04-02 14:58     ` Stephen Hemminger
  2019-04-02 15:10       ` Ye Xiaolong
  0 siblings, 1 reply; 214+ messages in thread
From: Stephen Hemminger @ 2019-04-02 14:58 UTC (permalink / raw)
  To: Xiaolong Ye
  Cc: dev, David Marchand, Andrew Rybchenko, Olivier Matz, Qi Zhang,
	Karlsson Magnus, Topel Bjorn, Maxime Coquelin, Ferruh Yigit,
	Luca Boccassi, Bruce Richardson, Ananyev Konstantin

On Tue,  2 Apr 2019 18:45:54 +0800
Xiaolong Ye <xiaolong.ye@intel.com> wrote:

> +
> +#include <rte_mbuf.h>
> +#include <rte_ethdev_driver.h>
> +#include <rte_ethdev_vdev.h>
> +#include <rte_malloc.h>
> +#include <rte_kvargs.h>
> +#include <rte_bus_vdev.h>
> +#include <rte_string_fns.h>
> +
> +#include <linux/if_ether.h>
> +#include <linux/if_xdp.h>
> +#include <linux/if_link.h>
> +#include <asm/barrier.h>
> +#include <arpa/inet.h>
> +#include <net/if.h>
> +#include <sys/types.h>
> +#include <sys/socket.h>
> +#include <sys/ioctl.h>
> +#include <sys/mman.h>
> +#include <unistd.h>
> +#include <poll.h>
> +#include <bpf/bpf.h>
> +#include <xsk.h>

The ordering here seems surprising. The usual ordering is:

Libc includes:
#include <stdio.h>
...
Sys includes:
#include <sys/types.h>

Linux includes:
#include <linux/if_xdp.h>

DPDK includes:
#include <rte_mbuf.h>


If I run "Include what you use" it has different suggestions.
Some of which you should ignore but overall there are
several good ones.

rte_eth_af_xdp.c should add these lines:
#include <asm/int-ll64.h>           // for __u64
#include <bits/stdint-uintn.h>      // for uint16_t, uint64_t, uint32_t
#include <errno.h>                  // for EINVAL, errno, EAGAIN, ENOMEM, EBUSY
#include <netinet/in.h>             // for IPPROTO_IP
#include <rte_ethdev.h>             // for rte_eth_dev, rte_eth_dev_data
#include <stdlib.h>                 // for strtol
#include <string.h>                 // for NULL, memset, strlen
#include "rte_branch_prediction.h"  // for unlikely
#include "rte_common.h"             // for __rte_unused, RTE_MIN, RTE_PRIORI...
#include "rte_config.h"             // for RTE_PKTMBUF_HEADROOM
#include "rte_dev.h"                // for rte_device, RTE_PMD_REGISTER_PARA...
#include "rte_eal.h"                // for rte_eal_process_type, rte_proc_ty...
#include "rte_ethdev.h"             // for rte_eth_stats, rte_eth_dev_info
#include "rte_ether.h"              // for ETHER_ADDR_LEN, ether_addr
#include "rte_lcore.h"              // for rte_socket_id
#include "rte_log.h"                // for rte_log, RTE_LOG_ERR, RTE_LOG_INFO
#include "rte_memory.h"             // for SOCKET_ID_ANY
#include "rte_memzone.h"            // for rte_memzone_free, RTE_MEMZONE_IOV...
#include "rte_ring.h"               // for rte_ring_free, rte_ring_create
struct rte_mempool;

rte_eth_af_xdp.c should remove these lines:
- #include <arpa/inet.h>  // lines 17-17
- #include <poll.h>  // lines 24-24
- #include <rte_ethdev_vdev.h>  // lines 7-7
- #include <sys/mman.h>  // lines 22-22
- #include <sys/types.h>  // lines 19-19

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v8 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-02 14:58     ` Stephen Hemminger
@ 2019-04-02 15:10       ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-04-02 15:10 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, David Marchand, Andrew Rybchenko, Olivier Matz, Qi Zhang,
	Karlsson Magnus, Topel Bjorn, Maxime Coquelin, Ferruh Yigit,
	Luca Boccassi, Bruce Richardson, Ananyev Konstantin

On 04/02, Stephen Hemminger wrote:
>On Tue,  2 Apr 2019 18:45:54 +0800
>Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>
>> +
>> +#include <rte_mbuf.h>
>> +#include <rte_ethdev_driver.h>
>> +#include <rte_ethdev_vdev.h>
>> +#include <rte_malloc.h>
>> +#include <rte_kvargs.h>
>> +#include <rte_bus_vdev.h>
>> +#include <rte_string_fns.h>
>> +
>> +#include <linux/if_ether.h>
>> +#include <linux/if_xdp.h>
>> +#include <linux/if_link.h>
>> +#include <asm/barrier.h>
>> +#include <arpa/inet.h>
>> +#include <net/if.h>
>> +#include <sys/types.h>
>> +#include <sys/socket.h>
>> +#include <sys/ioctl.h>
>> +#include <sys/mman.h>
>> +#include <unistd.h>
>> +#include <poll.h>
>> +#include <bpf/bpf.h>
>> +#include <xsk.h>
>
>The ordering here seems surprising. The usual ordering is:
>
>Libc includes:
>#include <stdio.h>
>...
>Sys includes:
>#include <sys/types.h>
>
>Linux includes:
>#include <linux/if_xdp.h>
>
>DPDK includes:
>#include <rte_mbuf.h>
>
>
>If I run "Include what you use" it has different suggestions.
>Some of which you should ignore but overall there are
>several good ones.
>
>rte_eth_af_xdp.c should add these lines:
>#include <asm/int-ll64.h>           // for __u64
>#include <bits/stdint-uintn.h>      // for uint16_t, uint64_t, uint32_t
>#include <errno.h>                  // for EINVAL, errno, EAGAIN, ENOMEM, EBUSY
>#include <netinet/in.h>             // for IPPROTO_IP
>#include <rte_ethdev.h>             // for rte_eth_dev, rte_eth_dev_data
>#include <stdlib.h>                 // for strtol
>#include <string.h>                 // for NULL, memset, strlen
>#include "rte_branch_prediction.h"  // for unlikely
>#include "rte_common.h"             // for __rte_unused, RTE_MIN, RTE_PRIORI...
>#include "rte_config.h"             // for RTE_PKTMBUF_HEADROOM
>#include "rte_dev.h"                // for rte_device, RTE_PMD_REGISTER_PARA...
>#include "rte_eal.h"                // for rte_eal_process_type, rte_proc_ty...
>#include "rte_ethdev.h"             // for rte_eth_stats, rte_eth_dev_info
>#include "rte_ether.h"              // for ETHER_ADDR_LEN, ether_addr
>#include "rte_lcore.h"              // for rte_socket_id
>#include "rte_log.h"                // for rte_log, RTE_LOG_ERR, RTE_LOG_INFO
>#include "rte_memory.h"             // for SOCKET_ID_ANY
>#include "rte_memzone.h"            // for rte_memzone_free, RTE_MEMZONE_IOV...
>#include "rte_ring.h"               // for rte_ring_free, rte_ring_create
>struct rte_mempool;
>
>rte_eth_af_xdp.c should remove these lines:
>- #include <arpa/inet.h>  // lines 17-17
>- #include <poll.h>  // lines 24-24
>- #include <rte_ethdev_vdev.h>  // lines 7-7
>- #include <sys/mman.h>  // lines 22-22
>- #include <sys/types.h>  // lines 19-19
>

Thanks for pointing this out; I'll adjust the headers.

Thanks,
Xiaolong
>
>
>

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v9 0/1] Introduce AF_XDP PMD
  2019-03-01  8:09 [PATCH v1 0/6] Introduce AF_XDP PMD Xiaolong Ye
                   ` (13 preceding siblings ...)
  2019-04-02 10:45 ` [PATCH v8 0/1] AF_XDP PMD Xiaolong Ye
@ 2019-04-02 15:46 ` Xiaolong Ye
  2019-04-02 15:46   ` [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
  2019-04-03 16:59 ` [PATCH v10 0/1] Introduce AF_XDP PMD Xiaolong Ye
  2019-04-04  8:51 ` [PATCH v11 0/1] Introduce AF_XDP PMD Xiaolong Ye
  16 siblings, 1 reply; 214+ messages in thread
From: Xiaolong Ye @ 2019-04-02 15:46 UTC (permalink / raw)
  To: dev, Stephen Hemminger, Ferruh Yigit
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Luca Boccassi, Bruce Richardson, Ananyev Konstantin,
	David Marchand, Andrew Rybchenko, Olivier Matz, Xiaolong Ye

Overview
========

This patchset adds a new PMD driver for AF_XDP, a proposed faster alternative
to the AF_PACKET interface in Linux; see links [1] [2] for an introduction to
AF_XDP.

AF_XDP roadmap
==============
- AF_XDP has been included in the upstream kernel since 4.18, and AF_XDP
  support in libbpf was merged in v5.1-rc1.
- The i40e and ixgbe drivers now support zero copy mode.

Change logs
===========
V9:
- adjust header files order according to Stephen's suggestion

v8:
- address Ferruh's comments on V7
- replace posix_memalign with rte_memzone_reserve_aligned to get better
  performance
- keep only the first patch, as Olivier suggested, since the zero copy
  implementation is still undecided; we may provide the related patch
  later.

v7:
- mention mtu limitation in af_xdp.rst
- fix the vdev name in af_xdp.rst

V6:

- remove the newline in AF_XDP_LOG definition to avoid double new lines
  issue.
- rename MEMPOOL_F_PAGE_ALIGN to MEMPOOL_CHUNK_F_PAGE_ALIGN.

V5:

- disable the AF_XDP pmd by default since it requires a kernel more recent
  than the minimum kernel version supported by DPDK
- address other review comments of Maxime

V4:

- change vdev name to net_af_xdp
- adopt dynamic log type for all logging
- Fix other style issues raised by Stephen

V3:

- Fix all style issues pointed by Stephen, Mattias, David.
- Drop the testpmd change as we'll adopt Jerin's suggestion to add a new
  mempool driver to handle the application use of AF_XDP's zero copy feature.

V2:
- Fix a NULL pointer reference crash issue
- Fix txonly stop sending traffic in zc mode
- Refactor rte_mbuf.c to avoid ABI breakage.
- Fix multiple style issues pointed by Ferruh, David, Stephen, Luca.

changes vs RFC sent by Qi last Aug:

- Rework based on AF_XDP's interface changes, since the new libbpf provides
  higher-level APIs that hide many of the details of the AF_XDP uapi. The
  rework removes 300+ lines of code.

- Multi-queue is not supported because the current libbpf doesn't support
  multiple sockets on a single umem.

- No extra steps are needed to load an xdp program manually, since libbpf
  loads a default xdp program when the user calls xsk_socket__create; the
  userspace application only needs to handle the cleanup.

How to try
==========

1. Take a kernel >= v5.1-rc1, build it, and replace your host kernel with it.

   Make sure XDP sockets are enabled when configuring the kernel:

   Networking support -->
        Networking options -->
                [ * ] XDP sockets
   
2. Build libbpf in tools/lib/bpf, and copy libbpf.a and libbpf.so to /usr/lib64:

   cd tools/lib/bpf
   make

3. ethtool -L enp59s0f0 combined 1

4. Extra step to build DPDK:

   explicitly enable the AF_XDP pmd by adding the line below to
   config/common_linux:

   CONFIG_RTE_LIBRTE_PMD_AF_XDP=y

5. Start testpmd:

   ./build/app/testpmd -c 0xc -n 4 --vdev net_af_xdp,iface=enp59s0f0,queue=0 -- -i --rxq=1 --txq=1

    In this case, the default xdp program will be loaded and attached to
    queue 0 of enp59s0f0, and network traffic arriving on queue 0 will be
    redirected to the af_xdp socket.

Xiaolong Ye (1):
  net/af_xdp: introduce AF XDP PMD driver

 MAINTAINERS                                   |   7 +
 config/common_base                            |   5 +
 doc/guides/nics/af_xdp.rst                    |  48 +
 doc/guides/nics/features/af_xdp.ini           |  11 +
 doc/guides/nics/index.rst                     |   1 +
 doc/guides/rel_notes/release_19_05.rst        |   7 +
 drivers/net/Makefile                          |   1 +
 drivers/net/af_xdp/Makefile                   |  32 +
 drivers/net/af_xdp/meson.build                |  21 +
 drivers/net/af_xdp/rte_eth_af_xdp.c           | 956 ++++++++++++++++++
 drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   3 +
 drivers/net/meson.build                       |   1 +
 mk/rte.app.mk                                 |   1 +
 13 files changed, 1094 insertions(+)
 create mode 100644 doc/guides/nics/af_xdp.rst
 create mode 100644 doc/guides/nics/features/af_xdp.ini
 create mode 100644 drivers/net/af_xdp/Makefile
 create mode 100644 drivers/net/af_xdp/meson.build
 create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
 create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map

-- 
2.17.1

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-02 15:46 ` [PATCH v9 0/1] Introduce AF_XDP PMD Xiaolong Ye
@ 2019-04-02 15:46   ` Xiaolong Ye
  2019-04-02 18:56     ` Stephen Hemminger
                       ` (2 more replies)
  0 siblings, 3 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-04-02 15:46 UTC (permalink / raw)
  To: dev, Stephen Hemminger, Ferruh Yigit
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Luca Boccassi, Bruce Richardson, Ananyev Konstantin,
	David Marchand, Andrew Rybchenko, Olivier Matz, Xiaolong Ye

Add a new PMD driver for AF_XDP, a proposed faster alternative to the
AF_PACKET interface in Linux. For more info about AF_XDP, please refer to
[1] [2].

This is the vanilla version PMD which just uses a raw buffer registered as
the umem.

[1] https://fosdem.org/2018/schedule/event/af_xdp/
[2] https://lwn.net/Articles/745934/

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 MAINTAINERS                                   |   7 +
 config/common_base                            |   5 +
 doc/guides/nics/af_xdp.rst                    |  48 +
 doc/guides/nics/features/af_xdp.ini           |  11 +
 doc/guides/nics/index.rst                     |   1 +
 doc/guides/rel_notes/release_19_05.rst        |   7 +
 drivers/net/Makefile                          |   1 +
 drivers/net/af_xdp/Makefile                   |  32 +
 drivers/net/af_xdp/meson.build                |  21 +
 drivers/net/af_xdp/rte_eth_af_xdp.c           | 956 ++++++++++++++++++
 drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   3 +
 drivers/net/meson.build                       |   1 +
 mk/rte.app.mk                                 |   1 +
 13 files changed, 1094 insertions(+)
 create mode 100644 doc/guides/nics/af_xdp.rst
 create mode 100644 doc/guides/nics/features/af_xdp.ini
 create mode 100644 drivers/net/af_xdp/Makefile
 create mode 100644 drivers/net/af_xdp/meson.build
 create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
 create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index e9ff2b4c2..c13ae8215 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -479,6 +479,13 @@ M: John W. Linville <linville@tuxdriver.com>
 F: drivers/net/af_packet/
 F: doc/guides/nics/features/afpacket.ini
 
+Linux AF_XDP
+M: Xiaolong Ye <xiaolong.ye@intel.com>
+M: Qi Zhang <qi.z.zhang@intel.com>
+F: drivers/net/af_xdp/
+F: doc/guides/nics/af_xdp.rst
+F: doc/guides/nics/features/af_xdp.ini
+
 Amazon ENA
 M: Marcin Wojtas <mw@semihalf.com>
 M: Michal Krawczyk <mk@semihalf.com>
diff --git a/config/common_base b/config/common_base
index 6292bc4af..b95ee03d7 100644
--- a/config/common_base
+++ b/config/common_base
@@ -430,6 +430,11 @@ CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX_FREE=n
 #
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
 
+#
+# Compile software PMD backed by AF_XDP sockets (Linux only)
+#
+CONFIG_RTE_LIBRTE_PMD_AF_XDP=n
+
 #
 # Compile link bonding PMD library
 #
diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
new file mode 100644
index 000000000..af675d910
--- /dev/null
+++ b/doc/guides/nics/af_xdp.rst
@@ -0,0 +1,48 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2019 Intel Corporation.
+
+AF_XDP Poll Mode Driver
+==========================
+
+AF_XDP is an address family that is optimized for high performance
+packet processing. AF_XDP sockets enable an XDP program to redirect
+packets to a memory buffer in userspace.
+
+For the full details behind AF_XDP sockets, refer to the
+`AF_XDP documentation in the Kernel
+<https://www.kernel.org/doc/Documentation/networking/af_xdp.rst>`_.
+
+This Linux-specific PMD driver creates the AF_XDP socket and binds it to a
+specific netdev queue. It allows a DPDK application to send and receive raw
+packets through the socket, bypassing the kernel network stack.
+The current implementation only supports a single queue; multi-queue support
+will be added later.
+
+Note that the MTU of the AF_XDP PMD is limited because XDP lacks support
+for fragmentation.
+
+Options
+-------
+
+The following options can be provided to set up an af_xdp port in DPDK.
+
+*   ``iface`` - name of the Kernel interface to attach to (required);
+*   ``queue`` - netdev queue id (optional, default 0);
+
+Prerequisites
+-------------
+
+This is a Linux-specific PMD, thus the following prerequisites apply:
+
+*  A Linux kernel (version > 4.18) with XDP sockets configuration enabled;
+*  libbpf (from kernel version > 5.1) with the latest AF_XDP support installed;
+*  A kernel-bound interface to attach to.
+
+Set up an af_xdp interface
+-----------------------------
+
+The following example will set up an af_xdp interface in DPDK:
+
+.. code-block:: console
+
+    --vdev net_af_xdp,iface=ens786f1,queue=0
diff --git a/doc/guides/nics/features/af_xdp.ini b/doc/guides/nics/features/af_xdp.ini
new file mode 100644
index 000000000..36953c2de
--- /dev/null
+++ b/doc/guides/nics/features/af_xdp.ini
@@ -0,0 +1,11 @@
+;
+; Supported features of the 'af_xdp' network poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Link status          = Y
+MTU update           = Y
+Promiscuous mode     = Y
+Stats per queue      = Y
+x86-64               = Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 5c80e3baa..a4b80a3d0 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -12,6 +12,7 @@ Network Interface Controller Drivers
     features
     build_and_test
     af_packet
+    af_xdp
     ark
     atlantic
     avp
diff --git a/doc/guides/rel_notes/release_19_05.rst b/doc/guides/rel_notes/release_19_05.rst
index bdad1ddbe..79e36739f 100644
--- a/doc/guides/rel_notes/release_19_05.rst
+++ b/doc/guides/rel_notes/release_19_05.rst
@@ -74,6 +74,13 @@ New Features
     process.
   * Added support for Rx packet types list in a secondary process.
 
+* **Added the AF_XDP PMD.**
+
+  Added a Linux-specific PMD driver for AF_XDP. It creates the AF_XDP socket,
+  binds it to a specific netdev queue, and allows a DPDK application to send
+  and receive raw packets through the socket, bypassing the kernel network
+  stack to achieve high performance packet processing.
+
 * **Updated Mellanox drivers.**
 
    New features and improvements were done in mlx4 and mlx5 PMDs:
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 502869a87..5d401b8c5 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -9,6 +9,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD),d)
 endif
 
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += af_packet
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += af_xdp
 DIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark
 DIRS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD) += atlantic
 DIRS-$(CONFIG_RTE_LIBRTE_AVP_PMD) += avp
diff --git a/drivers/net/af_xdp/Makefile b/drivers/net/af_xdp/Makefile
new file mode 100644
index 000000000..8343e3016
--- /dev/null
+++ b/drivers/net/af_xdp/Makefile
@@ -0,0 +1,32 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Intel Corporation
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_af_xdp.a
+
+EXPORT_MAP := rte_pmd_af_xdp_version.map
+
+LIBABIVER := 1
+
+CFLAGS += -O3
+
+# require kernel version >= v5.1-rc1
+CFLAGS += -I$(RTE_KERNELDIR)/tools/include
+CFLAGS += -I$(RTE_KERNELDIR)/tools/lib/bpf
+
+CFLAGS += $(WERROR_FLAGS)
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
+LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
+LDLIBS += -lrte_bus_vdev
+LDLIBS += -lbpf
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += rte_eth_af_xdp.c
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
new file mode 100644
index 000000000..d40aae190
--- /dev/null
+++ b/drivers/net/af_xdp/meson.build
@@ -0,0 +1,21 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Intel Corporation
+
+if host_machine.system() != 'linux'
+	build = false
+endif
+
+bpf_dep = dependency('libbpf', required: false)
+if bpf_dep.found()
+	build = true
+else
+	bpf_dep = cc.find_library('bpf', required: false)
+	if bpf_dep.found() and cc.has_header('xsk.h', dependencies: bpf_dep) and cc.has_header('linux/if_xdp.h')
+		build = true
+		pkgconfig_extra_libs += '-lbpf'
+	else
+		build = false
+	endif
+endif
+sources = files('rte_eth_af_xdp.c')
+ext_deps += bpf_dep
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
new file mode 100644
index 000000000..628b160a2
--- /dev/null
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -0,0 +1,956 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation.
+ */
+#include <unistd.h>
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+#include <netinet/in.h>
+#include <net/if.h>
+#include <bpf/bpf.h>
+#include <sys/socket.h>
+#include <sys/ioctl.h>
+#include <linux/if_ether.h>
+#include <linux/if_xdp.h>
+#include <linux/if_link.h>
+#include <asm/barrier.h>
+#include <xsk.h>
+
+#include <rte_ethdev.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+#include <rte_string_fns.h>
+#include <rte_branch_prediction.h>
+#include <rte_common.h>
+#include <rte_config.h>
+#include <rte_dev.h>
+#include <rte_eal.h>
+#include <rte_ether.h>
+#include <rte_lcore.h>
+#include <rte_log.h>
+#include <rte_memory.h>
+#include <rte_memzone.h>
+#include <rte_mbuf.h>
+#include <rte_malloc.h>
+#include <rte_ring.h>
+
+#ifndef SOL_XDP
+#define SOL_XDP 283
+#endif
+
+#ifndef AF_XDP
+#define AF_XDP 44
+#endif
+
+#ifndef PF_XDP
+#define PF_XDP AF_XDP
+#endif
+
+static int af_xdp_logtype;
+
+#define AF_XDP_LOG(level, fmt, args...)			\
+	rte_log(RTE_LOG_ ## level, af_xdp_logtype,	\
+		"%s(): " fmt, __func__, ##args)
+
+#define ETH_AF_XDP_FRAME_SIZE		XSK_UMEM__DEFAULT_FRAME_SIZE
+#define ETH_AF_XDP_NUM_BUFFERS		4096
+#define ETH_AF_XDP_DATA_HEADROOM	0
+#define ETH_AF_XDP_DFLT_NUM_DESCS	XSK_RING_CONS__DEFAULT_NUM_DESCS
+#define ETH_AF_XDP_DFLT_QUEUE_IDX	0
+
+#define ETH_AF_XDP_RX_BATCH_SIZE	32
+#define ETH_AF_XDP_TX_BATCH_SIZE	32
+
+#define ETH_AF_XDP_MAX_QUEUE_PAIRS     16
+
+struct xsk_umem_info {
+	struct xsk_ring_prod fq;
+	struct xsk_ring_cons cq;
+	struct xsk_umem *umem;
+	struct rte_ring *buf_ring;
+	const struct rte_memzone *mz;
+};
+
+struct rx_stats {
+	uint64_t rx_pkts;
+	uint64_t rx_bytes;
+	uint64_t rx_dropped;
+};
+
+struct pkt_rx_queue {
+	struct xsk_ring_cons rx;
+	struct xsk_umem_info *umem;
+	struct xsk_socket *xsk;
+	struct rte_mempool *mb_pool;
+
+	struct rx_stats stats;
+
+	struct pkt_tx_queue *pair;
+	uint16_t queue_idx;
+};
+
+struct tx_stats {
+	uint64_t tx_pkts;
+	uint64_t err_pkts;
+	uint64_t tx_bytes;
+};
+
+struct pkt_tx_queue {
+	struct xsk_ring_prod tx;
+
+	struct tx_stats stats;
+
+	struct pkt_rx_queue *pair;
+	uint16_t queue_idx;
+};
+
+struct pmd_internals {
+	int if_index;
+	char if_name[IFNAMSIZ];
+	uint16_t queue_idx;
+	struct ether_addr eth_addr;
+	struct xsk_umem_info *umem;
+	struct rte_mempool *mb_pool_share;
+
+	struct pkt_rx_queue rx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
+	struct pkt_tx_queue tx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
+};
+
+#define ETH_AF_XDP_IFACE_ARG			"iface"
+#define ETH_AF_XDP_QUEUE_IDX_ARG		"queue"
+
+static const char * const valid_arguments[] = {
+	ETH_AF_XDP_IFACE_ARG,
+	ETH_AF_XDP_QUEUE_IDX_ARG,
+	NULL
+};
+
+static const struct rte_eth_link pmd_link = {
+	.link_speed = ETH_SPEED_NUM_10G,
+	.link_duplex = ETH_LINK_FULL_DUPLEX,
+	.link_status = ETH_LINK_DOWN,
+	.link_autoneg = ETH_LINK_AUTONEG
+};
+
+static inline int
+reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
+{
+	struct xsk_ring_prod *fq = &umem->fq;
+	uint32_t idx;
+	int i, ret;
+
+	ret = xsk_ring_prod__reserve(fq, reserve_size, &idx);
+	if (unlikely(!ret)) {
+		AF_XDP_LOG(ERR, "Failed to reserve enough fq descs.\n");
+		return ret;
+	}
+
+	for (i = 0; i < reserve_size; i++) {
+		__u64 *fq_addr;
+		void *addr = NULL;
+		if (rte_ring_dequeue(umem->buf_ring, &addr)) {
+			i--;
+			break;
+		}
+		fq_addr = xsk_ring_prod__fill_addr(fq, idx++);
+		*fq_addr = (uint64_t)addr;
+	}
+
+	xsk_ring_prod__submit(fq, i);
+
+	return 0;
+}
+
+static uint16_t
+eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+	struct pkt_rx_queue *rxq = queue;
+	struct xsk_ring_cons *rx = &rxq->rx;
+	struct xsk_umem_info *umem = rxq->umem;
+	struct xsk_ring_prod *fq = &umem->fq;
+	uint32_t idx_rx;
+	uint32_t free_thresh = fq->size >> 1;
+	struct rte_mbuf *mbufs[ETH_AF_XDP_RX_BATCH_SIZE];
+	unsigned long dropped = 0;
+	unsigned long rx_bytes = 0;
+	uint16_t count = 0;
+	int rcvd, i;
+
+	nb_pkts = RTE_MIN(nb_pkts, ETH_AF_XDP_RX_BATCH_SIZE);
+
+	rcvd = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
+	if (rcvd == 0)
+		return 0;
+
+	if (xsk_prod_nb_free(fq, free_thresh) >= free_thresh)
+		(void)reserve_fill_queue(umem, ETH_AF_XDP_RX_BATCH_SIZE);
+
+	if (unlikely(rte_pktmbuf_alloc_bulk(rxq->mb_pool, mbufs, rcvd) != 0))
+		return 0;
+
+	for (i = 0; i < rcvd; i++) {
+		const struct xdp_desc *desc;
+		uint64_t addr;
+		uint32_t len;
+		void *pkt;
+
+		desc = xsk_ring_cons__rx_desc(rx, idx_rx++);
+		addr = desc->addr;
+		len = desc->len;
+		pkt = xsk_umem__get_data(rxq->umem->mz->addr, addr);
+
+		rte_memcpy(rte_pktmbuf_mtod(mbufs[i], void *), pkt, len);
+		rte_pktmbuf_pkt_len(mbufs[i]) = len;
+		rte_pktmbuf_data_len(mbufs[i]) = len;
+		rx_bytes += len;
+		bufs[count++] = mbufs[i];
+
+		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+	}
+
+	xsk_ring_cons__release(rx, rcvd);
+
+	/* statistics */
+	rxq->stats.rx_pkts += (rcvd - dropped);
+	rxq->stats.rx_bytes += rx_bytes;
+
+	return count;
+}
+
+static void
+pull_umem_cq(struct xsk_umem_info *umem, int size)
+{
+	struct xsk_ring_cons *cq = &umem->cq;
+	size_t i, n;
+	uint32_t idx_cq;
+
+	n = xsk_ring_cons__peek(cq, size, &idx_cq);
+
+	for (i = 0; i < n; i++) {
+		uint64_t addr;
+		addr = *xsk_ring_cons__comp_addr(cq, idx_cq++);
+		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+	}
+
+	xsk_ring_cons__release(cq, n);
+}
+
+static void
+kick_tx(struct pkt_tx_queue *txq)
+{
+	struct xsk_umem_info *umem = txq->pair->umem;
+
+	while (send(xsk_socket__fd(txq->pair->xsk), NULL,
+		      0, MSG_DONTWAIT) < 0) {
+		/* something unexpected happened */
+		if (errno != EBUSY && errno != EAGAIN && errno != EINTR)
+			break;
+
+		/* pull from complete qeueu to leave more space */
+		if (errno == EAGAIN)
+			pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
+	}
+	pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
+}
+
+static uint16_t
+eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+	struct pkt_tx_queue *txq = queue;
+	struct xsk_umem_info *umem = txq->pair->umem;
+	struct rte_mbuf *mbuf;
+	void *addrs[ETH_AF_XDP_TX_BATCH_SIZE];
+	unsigned long tx_bytes = 0;
+	int i, valid = 0;
+	uint32_t idx_tx;
+
+	nb_pkts = RTE_MIN(nb_pkts, ETH_AF_XDP_TX_BATCH_SIZE);
+
+	pull_umem_cq(umem, nb_pkts);
+
+	nb_pkts = rte_ring_dequeue_bulk(umem->buf_ring, addrs,
+					nb_pkts, NULL);
+	if (nb_pkts == 0)
+		return 0;
+
+	if (xsk_ring_prod__reserve(&txq->tx, nb_pkts, &idx_tx) != nb_pkts) {
+		kick_tx(txq);
+		return 0;
+	}
+
+	for (i = 0; i < nb_pkts; i++) {
+		struct xdp_desc *desc;
+		void *pkt;
+		uint32_t buf_len = ETH_AF_XDP_FRAME_SIZE
+					- ETH_AF_XDP_DATA_HEADROOM;
+		desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i);
+		mbuf = bufs[i];
+		if (mbuf->pkt_len <= buf_len) {
+			desc->addr = (uint64_t)addrs[valid];
+			desc->len = mbuf->pkt_len;
+			pkt = xsk_umem__get_data(umem->mz->addr,
+						 desc->addr);
+			rte_memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
+			       desc->len);
+			valid++;
+			tx_bytes += mbuf->pkt_len;
+		}
+		rte_pktmbuf_free(mbuf);
+	}
+
+	xsk_ring_prod__submit(&txq->tx, nb_pkts);
+
+	kick_tx(txq);
+
+	if (valid < nb_pkts)
+		rte_ring_enqueue_bulk(umem->buf_ring, &addrs[valid],
+				 nb_pkts - valid, NULL);
+
+	txq->stats.err_pkts += nb_pkts - valid;
+	txq->stats.tx_pkts += valid;
+	txq->stats.tx_bytes += tx_bytes;
+
+	return nb_pkts;
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = ETH_LINK_UP;
+
+	return 0;
+}
+
+/* This function gets called when the current port gets stopped. */
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = ETH_LINK_DOWN;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev)
+{
+	/* rx/tx must be paired */
+	if (dev->data->nb_rx_queues != dev->data->nb_tx_queues)
+		return -EINVAL;
+
+	return 0;
+}
+
+static void
+eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	dev_info->if_index = internals->if_index;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_pktlen = ETH_FRAME_LEN;
+	dev_info->max_rx_queues = 1;
+	dev_info->max_tx_queues = 1;
+
+	dev_info->default_rxportconf.nb_queues = 1;
+	dev_info->default_txportconf.nb_queues = 1;
+	dev_info->default_rxportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
+	dev_info->default_txportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
+}
+
+static int
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct xdp_statistics xdp_stats;
+	struct pkt_rx_queue *rxq;
+	socklen_t optlen;
+	int i, ret;
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		optlen = sizeof(struct xdp_statistics);
+		rxq = &internals->rx_queues[i];
+		stats->q_ipackets[i] = internals->rx_queues[i].stats.rx_pkts;
+		stats->q_ibytes[i] = internals->rx_queues[i].stats.rx_bytes;
+
+		stats->q_opackets[i] = internals->tx_queues[i].stats.tx_pkts;
+		stats->q_obytes[i] = internals->tx_queues[i].stats.tx_bytes;
+
+		stats->ipackets += stats->q_ipackets[i];
+		stats->ibytes += stats->q_ibytes[i];
+		stats->imissed += internals->rx_queues[i].stats.rx_dropped;
+		ret = getsockopt(xsk_socket__fd(rxq->xsk), SOL_XDP,
+				XDP_STATISTICS, &xdp_stats, &optlen);
+		if (ret != 0) {
+			AF_XDP_LOG(ERR, "getsockopt() failed for XDP_STATISTICS.\n");
+			return -1;
+		}
+		stats->imissed += xdp_stats.rx_dropped;
+
+		stats->opackets += stats->q_opackets[i];
+		stats->oerrors += internals->tx_queues[i].stats.err_pkts;
+		stats->obytes += stats->q_obytes[i];
+	}
+
+	return 0;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	int i;
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		memset(&internals->rx_queues[i].stats, 0,
+					sizeof(struct rx_stats));
+		memset(&internals->tx_queues[i].stats, 0,
+					sizeof(struct tx_stats));
+	}
+}
+
+static void
+remove_xdp_program(struct pmd_internals *internals)
+{
+	uint32_t curr_prog_id = 0;
+
+	if (bpf_get_link_xdp_id(internals->if_index, &curr_prog_id,
+				XDP_FLAGS_UPDATE_IF_NOEXIST)) {
+		AF_XDP_LOG(ERR, "bpf_get_link_xdp_id failed\n");
+		return;
+	}
+	bpf_set_link_xdp_fd(internals->if_index, -1,
+			XDP_FLAGS_UPDATE_IF_NOEXIST);
+}
+
+static void
+eth_dev_close(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct pkt_rx_queue *rxq;
+	int i;
+
+	AF_XDP_LOG(INFO, "Closing AF_XDP ethdev on numa socket %u\n",
+		rte_socket_id());
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		rxq = &internals->rx_queues[i];
+		if (rxq->umem == NULL)
+			break;
+		xsk_socket__delete(rxq->xsk);
+	}
+
+	(void)xsk_umem__delete(internals->umem->umem);
+	remove_xdp_program(internals);
+}
+
+static void
+eth_queue_release(void *q __rte_unused)
+{
+}
+
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+		int wait_to_complete __rte_unused)
+{
+	return 0;
+}
+
+static void
+xdp_umem_destroy(struct xsk_umem_info *umem)
+{
+	rte_memzone_free(umem->mz);
+	umem->mz = NULL;
+
+	rte_ring_free(umem->buf_ring);
+	umem->buf_ring = NULL;
+
+	rte_free(umem);
+	umem = NULL;
+}
+
+static struct
+xsk_umem_info *xdp_umem_configure(void)
+{
+	struct xsk_umem_info *umem;
+	const struct rte_memzone *mz;
+	struct xsk_umem_config usr_config = {
+		.fill_size = ETH_AF_XDP_DFLT_NUM_DESCS,
+		.comp_size = ETH_AF_XDP_DFLT_NUM_DESCS,
+		.frame_size = ETH_AF_XDP_FRAME_SIZE,
+		.frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
+	int ret;
+	uint64_t i;
+
+	umem = rte_zmalloc_socket("umem", sizeof(*umem), 0, rte_socket_id());
+	if (umem == NULL) {
+		AF_XDP_LOG(ERR, "Failed to allocate umem info");
+		return NULL;
+	}
+
+	umem->buf_ring = rte_ring_create("af_xdp_ring",
+					 ETH_AF_XDP_NUM_BUFFERS,
+					 rte_socket_id(),
+					 0x0);
+	if (umem->buf_ring == NULL) {
+		AF_XDP_LOG(ERR, "Failed to create rte_ring\n");
+		goto err;
+	}
+
+	for (i = 0; i < ETH_AF_XDP_NUM_BUFFERS; i++)
+		rte_ring_enqueue(umem->buf_ring,
+				 (void *)(i * ETH_AF_XDP_FRAME_SIZE +
+					  ETH_AF_XDP_DATA_HEADROOM));
+
+	mz = rte_memzone_reserve_aligned("af_xdp umem",
+			ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
+			rte_socket_id(), RTE_MEMZONE_IOVA_CONTIG,
+			getpagesize());
+	if (mz == NULL) {
+		AF_XDP_LOG(ERR, "Failed to reserve memzone for af_xdp umem.\n");
+		goto err;
+	}
+
+	ret = xsk_umem__create(&umem->umem, mz->addr,
+			       ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
+			       &umem->fq, &umem->cq,
+			       &usr_config);
+
+	if (ret) {
+		AF_XDP_LOG(ERR, "Failed to create umem");
+		goto err;
+	}
+	umem->mz = mz;
+
+	return umem;
+
+err:
+	xdp_umem_destroy(umem);
+	return NULL;
+}
+
+static int
+xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
+	      int ring_size)
+{
+	struct xsk_socket_config cfg;
+	struct pkt_tx_queue *txq = rxq->pair;
+	int ret = 0;
+	int reserve_size;
+
+	rxq->umem = xdp_umem_configure();
+	if (rxq->umem == NULL)
+		return -ENOMEM;
+
+	cfg.rx_size = ring_size;
+	cfg.tx_size = ring_size;
+	cfg.libbpf_flags = 0;
+	cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
+	cfg.bind_flags = 0;
+	ret = xsk_socket__create(&rxq->xsk, internals->if_name,
+			internals->queue_idx, rxq->umem->umem, &rxq->rx,
+			&txq->tx, &cfg);
+	if (ret) {
+		AF_XDP_LOG(ERR, "Failed to create xsk socket.\n");
+		goto err;
+	}
+
+	reserve_size = ETH_AF_XDP_DFLT_NUM_DESCS / 2;
+	ret = reserve_fill_queue(rxq->umem, reserve_size);
+	if (ret) {
+		xsk_socket__delete(rxq->xsk);
+		AF_XDP_LOG(ERR, "Failed to reserve fill queue.\n");
+		goto err;
+	}
+
+	return 0;
+
+err:
+	xdp_umem_destroy(rxq->umem);
+
+	return ret;
+}
+
+static void
+queue_reset(struct pmd_internals *internals, uint16_t queue_idx)
+{
+	struct pkt_rx_queue *rxq = &internals->rx_queues[queue_idx];
+	struct pkt_tx_queue *txq = rxq->pair;
+
+	memset(rxq, 0, sizeof(*rxq));
+	memset(txq, 0, sizeof(*txq));
+	rxq->pair = txq;
+	txq->pair = rxq;
+	rxq->queue_idx = queue_idx;
+	txq->queue_idx = queue_idx;
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev,
+		   uint16_t rx_queue_id,
+		   uint16_t nb_rx_desc,
+		   unsigned int socket_id __rte_unused,
+		   const struct rte_eth_rxconf *rx_conf __rte_unused,
+		   struct rte_mempool *mb_pool)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	uint32_t buf_size, data_size;
+	struct pkt_rx_queue *rxq;
+	int ret;
+
+	rxq = &internals->rx_queues[rx_queue_id];
+	queue_reset(internals, rx_queue_id);
+
+	/* Now get the space available for data in the mbuf */
+	buf_size = rte_pktmbuf_data_room_size(mb_pool) -
+		RTE_PKTMBUF_HEADROOM;
+	data_size = ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_DATA_HEADROOM;
+
+	if (data_size > buf_size) {
+		AF_XDP_LOG(ERR, "%s: %d bytes will not fit in mbuf (%d bytes)\n",
+			dev->device->name, data_size, buf_size);
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	rxq->mb_pool = mb_pool;
+
+	if (xsk_configure(internals, rxq, nb_rx_desc)) {
+		AF_XDP_LOG(ERR, "Failed to configure xdp socket\n");
+		ret = -EINVAL;
+		goto err;
+	}
+
+	internals->umem = rxq->umem;
+
+	dev->data->rx_queues[rx_queue_id] = rxq;
+	return 0;
+
+err:
+	queue_reset(internals, rx_queue_id);
+	return ret;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev,
+		   uint16_t tx_queue_id,
+		   uint16_t nb_tx_desc __rte_unused,
+		   unsigned int socket_id __rte_unused,
+		   const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct pkt_tx_queue *txq;
+
+	txq = &internals->tx_queues[tx_queue_id];
+
+	dev->data->tx_queues[tx_queue_id] = txq;
+	return 0;
+}
+
+static int
+eth_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct ifreq ifr = { .ifr_mtu = mtu };
+	int ret;
+	int s;
+
+	s = socket(PF_INET, SOCK_DGRAM, 0);
+	if (s < 0)
+		return -EINVAL;
+
+	strlcpy(ifr.ifr_name, internals->if_name, IFNAMSIZ);
+	ret = ioctl(s, SIOCSIFMTU, &ifr);
+	close(s);
+
+	return (ret < 0) ? -errno : 0;
+}
+
+static void
+eth_dev_change_flags(char *if_name, uint32_t flags, uint32_t mask)
+{
+	struct ifreq ifr;
+	int s;
+
+	s = socket(PF_INET, SOCK_DGRAM, 0);
+	if (s < 0)
+		return;
+
+	strlcpy(ifr.ifr_name, if_name, IFNAMSIZ);
+	if (ioctl(s, SIOCGIFFLAGS, &ifr) < 0)
+		goto out;
+	ifr.ifr_flags &= mask;
+	ifr.ifr_flags |= flags;
+	if (ioctl(s, SIOCSIFFLAGS, &ifr) < 0)
+		goto out;
+out:
+	close(s);
+}
+
+static void
+eth_dev_promiscuous_enable(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	eth_dev_change_flags(internals->if_name, IFF_PROMISC, ~0);
+}
+
+static void
+eth_dev_promiscuous_disable(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	eth_dev_change_flags(internals->if_name, 0, ~IFF_PROMISC);
+}
+
+static const struct eth_dev_ops ops = {
+	.dev_start = eth_dev_start,
+	.dev_stop = eth_dev_stop,
+	.dev_close = eth_dev_close,
+	.dev_configure = eth_dev_configure,
+	.dev_infos_get = eth_dev_info,
+	.mtu_set = eth_dev_mtu_set,
+	.promiscuous_enable = eth_dev_promiscuous_enable,
+	.promiscuous_disable = eth_dev_promiscuous_disable,
+	.rx_queue_setup = eth_rx_queue_setup,
+	.tx_queue_setup = eth_tx_queue_setup,
+	.rx_queue_release = eth_queue_release,
+	.tx_queue_release = eth_queue_release,
+	.link_update = eth_link_update,
+	.stats_get = eth_stats_get,
+	.stats_reset = eth_stats_reset,
+};
+
+/** parse integer from integer argument */
+static int
+parse_integer_arg(const char *key __rte_unused,
+		  const char *value, void *extra_args)
+{
+	int *i = (int *)extra_args;
+	char *end;
+
+	*i = strtol(value, &end, 10);
+	if (*i < 0) {
+		AF_XDP_LOG(ERR, "Argument has to be non-negative.\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/** parse name argument */
+static int
+parse_name_arg(const char *key __rte_unused,
+	       const char *value, void *extra_args)
+{
+	char *name = extra_args;
+
+	if (strnlen(value, IFNAMSIZ) > IFNAMSIZ - 1) {
+		AF_XDP_LOG(ERR, "Invalid name %s, should be less than %u bytes.\n",
+			   value, IFNAMSIZ);
+		return -EINVAL;
+	}
+
+	strlcpy(name, value, IFNAMSIZ);
+
+	return 0;
+}
+
+static int
+parse_parameters(struct rte_kvargs *kvlist,
+		 char *if_name,
+		 int *queue_idx)
+{
+	int ret;
+
+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_IFACE_ARG,
+				 &parse_name_arg, if_name);
+	if (ret < 0)
+		goto free_kvlist;
+
+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_QUEUE_IDX_ARG,
+				 &parse_integer_arg, queue_idx);
+	if (ret < 0)
+		goto free_kvlist;
+
+free_kvlist:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
+static int
+get_iface_info(const char *if_name,
+	       struct ether_addr *eth_addr,
+	       int *if_index)
+{
+	struct ifreq ifr;
+	int sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_IP);
+
+	if (sock < 0)
+		return -1;
+
+	strlcpy(ifr.ifr_name, if_name, IFNAMSIZ);
+	if (ioctl(sock, SIOCGIFINDEX, &ifr))
+		goto error;
+
+	*if_index = ifr.ifr_ifindex;
+
+	if (ioctl(sock, SIOCGIFHWADDR, &ifr))
+		goto error;
+
+	rte_memcpy(eth_addr, ifr.ifr_hwaddr.sa_data, ETHER_ADDR_LEN);
+
+	close(sock);
+	return 0;
+
+error:
+	close(sock);
+	return -1;
+}
+
+static struct rte_eth_dev *
+init_internals(struct rte_vdev_device *dev,
+	       const char *if_name,
+	       int queue_idx)
+{
+	const char *name = rte_vdev_device_name(dev);
+	const unsigned int numa_node = dev->device.numa_node;
+	struct pmd_internals *internals;
+	struct rte_eth_dev *eth_dev;
+	int ret;
+	int i;
+
+	internals = rte_zmalloc_socket(name, sizeof(*internals), 0, numa_node);
+	if (internals == NULL)
+		return NULL;
+
+	internals->queue_idx = queue_idx;
+	strlcpy(internals->if_name, if_name, IFNAMSIZ);
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		internals->tx_queues[i].pair = &internals->rx_queues[i];
+		internals->rx_queues[i].pair = &internals->tx_queues[i];
+	}
+
+	ret = get_iface_info(if_name, &internals->eth_addr,
+			     &internals->if_index);
+	if (ret)
+		goto err;
+
+	eth_dev = rte_eth_vdev_allocate(dev, 0);
+	if (eth_dev == NULL)
+		goto err;
+
+	eth_dev->data->dev_private = internals;
+	eth_dev->data->dev_link = pmd_link;
+	eth_dev->data->mac_addrs = &internals->eth_addr;
+	eth_dev->dev_ops = &ops;
+	eth_dev->rx_pkt_burst = eth_af_xdp_rx;
+	eth_dev->tx_pkt_burst = eth_af_xdp_tx;
+
+	return eth_dev;
+
+err:
+	rte_free(internals);
+	return NULL;
+}
+
+static int
+rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
+{
+	struct rte_kvargs *kvlist;
+	char if_name[IFNAMSIZ] = {'\0'};
+	int xsk_queue_idx = ETH_AF_XDP_DFLT_QUEUE_IDX;
+	struct rte_eth_dev *eth_dev = NULL;
+	const char *name;
+
+	AF_XDP_LOG(INFO, "Initializing pmd_af_xdp for %s\n",
+		rte_vdev_device_name(dev));
+
+	name = rte_vdev_device_name(dev);
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY &&
+		strlen(rte_vdev_device_args(dev)) == 0) {
+		eth_dev = rte_eth_dev_attach_secondary(name);
+		if (eth_dev == NULL) {
+			AF_XDP_LOG(ERR, "Failed to probe %s\n", name);
+			return -EINVAL;
+		}
+		eth_dev->dev_ops = &ops;
+		rte_eth_dev_probing_finish(eth_dev);
+		return 0;
+	}
+
+	kvlist = rte_kvargs_parse(rte_vdev_device_args(dev), valid_arguments);
+	if (kvlist == NULL) {
+		AF_XDP_LOG(ERR, "Invalid kvargs key\n");
+		return -EINVAL;
+	}
+
+	if (dev->device.numa_node == SOCKET_ID_ANY)
+		dev->device.numa_node = rte_socket_id();
+
+	if (parse_parameters(kvlist, if_name, &xsk_queue_idx) < 0) {
+		AF_XDP_LOG(ERR, "Invalid kvargs value\n");
+		return -EINVAL;
+	}
+
+	if (strlen(if_name) == 0) {
+		AF_XDP_LOG(ERR, "Network interface must be specified\n");
+		return -EINVAL;
+	}
+
+	eth_dev = init_internals(dev, if_name, xsk_queue_idx);
+	if (eth_dev == NULL) {
+		AF_XDP_LOG(ERR, "Failed to init internals\n");
+		return -1;
+	}
+
+	rte_eth_dev_probing_finish(eth_dev);
+
+	return 0;
+}
+
+static int
+rte_pmd_af_xdp_remove(struct rte_vdev_device *dev)
+{
+	struct rte_eth_dev *eth_dev = NULL;
+	struct pmd_internals *internals;
+
+	AF_XDP_LOG(INFO, "Removing AF_XDP ethdev on numa socket %u\n",
+		rte_socket_id());
+
+	if (dev == NULL)
+		return -1;
+
+	/* find the ethdev entry */
+	eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(dev));
+	if (eth_dev == NULL)
+		return -1;
+
+	internals = eth_dev->data->dev_private;
+
+	rte_ring_free(internals->umem->buf_ring);
+	rte_memzone_free(internals->umem->mz);
+	rte_free(internals->umem);
+
+	rte_eth_dev_release_port(eth_dev);
+
+
+	return 0;
+}
+
+static struct rte_vdev_driver pmd_af_xdp_drv = {
+	.probe = rte_pmd_af_xdp_probe,
+	.remove = rte_pmd_af_xdp_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(net_af_xdp, pmd_af_xdp_drv);
+RTE_PMD_REGISTER_PARAM_STRING(net_af_xdp,
+			      "iface=<string> "
+			      "queue=<int> ");
+
+RTE_INIT(af_xdp_init_log)
+{
+	af_xdp_logtype = rte_log_register("pmd.net.af_xdp");
+	if (af_xdp_logtype >= 0)
+		rte_log_set_level(af_xdp_logtype, RTE_LOG_NOTICE);
+}
diff --git a/drivers/net/af_xdp/rte_pmd_af_xdp_version.map b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
new file mode 100644
index 000000000..c6db030fe
--- /dev/null
+++ b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
@@ -0,0 +1,3 @@
+DPDK_19.05 {
+	local: *;
+};
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index 3ecc78cee..1105e72d8 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2017 Intel Corporation
 
 drivers = ['af_packet',
+	'af_xdp',
 	'ark',
 	'atlantic',
 	'avp',
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 262132fc6..f916bc9ef 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -143,6 +143,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_DPAA2_MEMPOOL)  += -lrte_mempool_dpaa2
 endif
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     += -lrte_pmd_af_xdp -lbpf
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ARK_PMD)        += -lrte_pmd_ark
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD)   += -lrte_pmd_atlantic
 _LDLIBS-$(CONFIG_RTE_LIBRTE_AVP_PMD)        += -lrte_pmd_avp
-- 
2.17.1
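
For context, the vdev does not have to be given on the EAL command line;
an application can also attach it at runtime through the hotplug API. The
following is a minimal sketch under that assumption; the device name
"net_af_xdp0" and the interface name "eth0" are illustrative placeholders,
not values taken from the patch:

    #include <stdlib.h>

    #include <rte_eal.h>
    #include <rte_debug.h>
    #include <rte_dev.h>
    #include <rte_ethdev.h>

    int
    main(int argc, char **argv)
    {
    	uint16_t port_id;
    	int ret;

    	ret = rte_eal_init(argc, argv);
    	if (ret < 0)
    		rte_exit(EXIT_FAILURE, "EAL init failed\n");

    	/* Attach an AF_XDP vdev at runtime; equivalent to passing
    	 * --vdev net_af_xdp0,iface=eth0,queue=0 on the command line.
    	 * "eth0" is a placeholder kernel interface name.
    	 */
    	ret = rte_eal_hotplug_add("vdev", "net_af_xdp0",
    				  "iface=eth0,queue=0");
    	if (ret < 0)
    		rte_exit(EXIT_FAILURE, "failed to attach vdev\n");

    	/* Resolve the new port so it can be configured and started
    	 * with the usual ethdev calls.
    	 */
    	ret = rte_eth_dev_get_port_by_name("net_af_xdp0", &port_id);
    	if (ret != 0)
    		rte_exit(EXIT_FAILURE, "port lookup failed\n");

    	/* ... rte_eth_dev_configure(), rte_eth_rx_queue_setup() with
    	 * a mempool whose data room fits an XDP frame, and
    	 * rte_eth_dev_start() would follow here ...
    	 */
    	return 0;
    }

Both forms end up in rte_pmd_af_xdp_probe() above; the hotplug form is
simply a design choice for applications that create ports dynamically.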


* Re: [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-02 15:46   ` [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
@ 2019-04-02 18:56     ` Stephen Hemminger
  2019-04-02 23:01       ` Ye Xiaolong
  2019-04-02 19:19     ` Luca Boccassi
  2019-04-02 19:43     ` [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver Ferruh Yigit
  2 siblings, 1 reply; 214+ messages in thread
From: Stephen Hemminger @ 2019-04-02 18:56 UTC (permalink / raw)
  To: Xiaolong Ye
  Cc: dev, Ferruh Yigit, Qi Zhang, Karlsson Magnus, Topel Bjorn,
	Maxime Coquelin, Luca Boccassi, Bruce Richardson,
	Ananyev Konstantin, David Marchand, Andrew Rybchenko,
	Olivier Matz

On Tue,  2 Apr 2019 23:46:53 +0800
Xiaolong Ye <xiaolong.ye@intel.com> wrote:

> +		/* pull from complete qeueu to leave more space */

Overall looks good, one last spelling error


* Re: [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-02 15:46   ` [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
  2019-04-02 18:56     ` Stephen Hemminger
@ 2019-04-02 19:19     ` Luca Boccassi
  2019-04-03  9:59       ` Ye Xiaolong
  2019-04-02 19:43     ` [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver Ferruh Yigit
  2 siblings, 1 reply; 214+ messages in thread
From: Luca Boccassi @ 2019-04-02 19:19 UTC (permalink / raw)
  To: Xiaolong Ye, dev, Stephen Hemminger, Ferruh Yigit
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Bruce Richardson, Ananyev Konstantin, David Marchand,
	Andrew Rybchenko, Olivier Matz

On Tue, 2019-04-02 at 23:46 +0800, Xiaolong Ye wrote:
> diff --git a/drivers/net/af_xdp/Makefile
> b/drivers/net/af_xdp/Makefile
> new file mode 100644
> index 000000000..8343e3016
> --- /dev/null
> +++ b/drivers/net/af_xdp/Makefile
> @@ -0,0 +1,32 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2019 Intel Corporation
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +#
> +# library name
> +#
> +LIB = librte_pmd_af_xdp.a
> +
> +EXPORT_MAP := rte_pmd_af_xdp_version.map
> +
> +LIBABIVER := 1
> +
> +CFLAGS += -O3
> +
> +# require kernel version >= v5.1-rc1
> +CFLAGS += -I$(RTE_KERNELDIR)/tools/include
> +CFLAGS += -I$(RTE_KERNELDIR)/tools/lib/bpf

Sorry for not noticing this before, but doesn't this require the full
kernel tree rather than just the typical headers package? Requiring the
full kernel tree to be available at build time will make this
unbuildable on distros that still use makefiles, like RHEL and SUSE. At
least on Debian and Ubuntu, the kernel headers packages distributed do
not include the full kernel tree, only the headers, so there's no
tools/lib or tools/include.

Like other dependencies, this should assume they are installed as
regular libraries, eg:

CFLAGS += $(shell command -v pkg-config > /dev/null 2>&1 && pkg-config --cflags libbpf || echo "-I/usr/include/bpf")

> +CFLAGS += $(WERROR_FLAGS)
> +LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
> +LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
> +LDLIBS += -lrte_bus_vdev
> +LDLIBS += -lbpf

LDLIBS += $(shell command -v pkg-config > /dev/null 2>&1 && pkg-config --libs libbpf || echo "-lbpf")

-- 
Kind regards,
Luca Boccassi


* Re: [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-02 15:46   ` [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
  2019-04-02 18:56     ` Stephen Hemminger
  2019-04-02 19:19     ` Luca Boccassi
@ 2019-04-02 19:43     ` Ferruh Yigit
  2019-04-03 13:22       ` Bruce Richardson
  2 siblings, 1 reply; 214+ messages in thread
From: Ferruh Yigit @ 2019-04-02 19:43 UTC (permalink / raw)
  To: Xiaolong Ye, dev, Stephen Hemminger
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Luca Boccassi, Bruce Richardson, Ananyev Konstantin,
	David Marchand, Andrew Rybchenko, Olivier Matz

On 4/2/2019 4:46 PM, Xiaolong Ye wrote:
> Add a new PMD driver for AF_XDP which is a proposed faster version of
> AF_PACKET interface in Linux. More info about AF_XDP, please refer to [1]
> [2].
> 
> This is the vanilla version PMD which just uses a raw buffer registered as
> the umem.
> 
> [1] https://fosdem.org/2018/schedule/event/af_xdp/
> [2] https://lwn.net/Articles/745934/
> 
> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>

<...>

> @@ -0,0 +1,956 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2019 Intel Corporation.
> + */

> +#include <bpf/bpf.h>
> +#include <xsk.h>

Under Linux, both headers are in the same 'bpf' folder, so why is one
included as 'bpf/bpf.h' but the other as 'xsk.h'?

Perhaps this is not a problem when the headers are installed into system
folders, but I am compiling using RTE_KERNELDIR, which is used in the
Makefile as:
 -I$(RTE_KERNELDIR)/tools/lib/bpf

This fails to find 'bpf/bpf.h'.

Also, for '-lbpf', shouldn't we need to add '-L$(RTE_KERNELDIR)/tools/lib/bpf'
to the newly added line in 'rte.app.mk', so that it can find the library?

I assume you are building on a system with a new kernel, which I think you
need for functionality; where is 'xsk.h' located in that case? I was thinking
that building and installing libbpf could solve the issue, but it does not
install 'xsk.h' (I am not sure why), so it does not exactly solve it.

If you still need "CFLAGS += -I$(RTE_KERNELDIR)/tools/lib/bpf" for your case,
does it make sense to update it as follows:
 CFLAGS += -I$(RTE_KERNELDIR)/tools/lib
 #include <bpf/xsk.h>


* Re: [PATCH v7 1/5] net/af_xdp: introduce AF XDP PMD driver
  2019-03-28 18:52       ` Luca Boccassi
@ 2019-04-02 19:55         ` Ferruh Yigit
  0 siblings, 0 replies; 214+ messages in thread
From: Ferruh Yigit @ 2019-04-02 19:55 UTC (permalink / raw)
  To: Luca Boccassi, Xiaolong Ye, dev, David Marchand, Andrew Rybchenko
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Stephen Hemminger, Bruce Richardson, Ananyev Konstantin

On 3/28/2019 6:52 PM, Luca Boccassi wrote:
> On Thu, 2019-03-28 at 17:51 +0000, Ferruh Yigit wrote:
>>> @@ -143,6 +143,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_DPAA2_MEMPOOL)  +=
>>> -lrte_mempool_dpaa2
>>>   endif
>>>   
>>>   _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  +=
>>> -lrte_pmd_af_packet
>>> +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     += -lrte_pmd_af_xdp
>>> -lelf -lbpf
>>
>> Is "-lelf" still required?
> 
> This was fixed upstream in the bpf tree by Björn:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=89dedaef49d36adc2bb5e7e4c38b52fa3013c7c8
> 

Ahh, but this is the bpf tree, so it means this change won't be in 5.1, right?
Should we add this back for 5.1?


* Re: [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-02 18:56     ` Stephen Hemminger
@ 2019-04-02 23:01       ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-04-02 23:01 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, Ferruh Yigit, Qi Zhang, Karlsson Magnus, Topel Bjorn,
	Maxime Coquelin, Luca Boccassi, Bruce Richardson,
	Ananyev Konstantin, David Marchand, Andrew Rybchenko,
	Olivier Matz

On 04/02, Stephen Hemminger wrote:
>On Tue,  2 Apr 2019 23:46:53 +0800
>Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>
>> +		/* pull from complete qeueu to leave more space */
>
>Overall looks good, one last spelling error

Sorry for the typo, will fix it in the next version.

Thanks,
Xiaolong


* Re: [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-02 19:19     ` Luca Boccassi
@ 2019-04-03  9:59       ` Ye Xiaolong
  2019-04-03 10:36         ` Luca Boccassi
  0 siblings, 1 reply; 214+ messages in thread
From: Ye Xiaolong @ 2019-04-03  9:59 UTC (permalink / raw)
  To: Luca Boccassi
  Cc: dev, Stephen Hemminger, Ferruh Yigit, Qi Zhang, Karlsson Magnus,
	Topel Bjorn, Maxime Coquelin, Bruce Richardson,
	Ananyev Konstantin, David Marchand, Andrew Rybchenko,
	Olivier Matz

Hi, Luca

On 04/02, Luca Boccassi wrote:
>On Tue, 2019-04-02 at 23:46 +0800, Xiaolong Ye wrote:
>> diff --git a/drivers/net/af_xdp/Makefile
>> b/drivers/net/af_xdp/Makefile
>> new file mode 100644
>> index 000000000..8343e3016
>> --- /dev/null
>> +++ b/drivers/net/af_xdp/Makefile
>> @@ -0,0 +1,32 @@
>> +# SPDX-License-Identifier: BSD-3-Clause
>> +# Copyright(c) 2019 Intel Corporation
>> +
>> +include $(RTE_SDK)/mk/rte.vars.mk
>> +
>> +#
>> +# library name
>> +#
>> +LIB = librte_pmd_af_xdp.a
>> +
>> +EXPORT_MAP := rte_pmd_af_xdp_version.map
>> +
>> +LIBABIVER := 1
>> +
>> +CFLAGS += -O3
>> +
>> +# require kernel version >= v5.1-rc1
>> +CFLAGS += -I$(RTE_KERNELDIR)/tools/include
>> +CFLAGS += -I$(RTE_KERNELDIR)/tools/lib/bpf
>
>Sorry for not noticing this before, but doesn't this require the full
>kernel tree rather than just the typical headers package? Requiring the
>full kernel tree to be available at build time will make this
>unbuildable on distros that still use makefiles, like RHEL and SUSE. At
>least on Debian and Ubuntu, the kernel headers packages distributed do
>not include the full kernel tree, only the headers, so there's no
>tools/lib or tools/include.

Currently we do have dependencies on the kernel src tree, as xsk.h and
asm/barrier wouldn't be installed by libbpf. So before libbpf handles these
properly, can we keep the current RTE_KERNELDIR in the Makefile for now,
mention the dependencies in the documentation, and suggest that users set
RTE_KERNELDIR to the correct kernel src tree if they want to use the af_xdp PMD?

Something like:

dependencies:
- kernel source code (>= v5.1-rc1)
- build libbpf and install

Thanks,
Xiaolong
>
>Like other dependencies, this should assume they are installed as
>regular libraries, eg:
>
>CFLAGS += $(shell command -v pkg-config > /dev/null 2>&1 && pkg-config --cflags libbpf || echo "-I/usr/include/bpf")
>
>> +CFLAGS += $(WERROR_FLAGS)
>> +LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
>> +LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
>> +LDLIBS += -lrte_bus_vdev
>> +LDLIBS += -lbpf
>
>LDLIBS += $(shell command -v pkg-config > /dev/null 2>&1 && pkg-config --libs libbpf || echo "-lbpf")
>
>-- 
>Kind regards,
>Luca Boccassi


* Re: [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-03  9:59       ` Ye Xiaolong
@ 2019-04-03 10:36         ` Luca Boccassi
  2019-04-03 10:42           ` Luca Boccassi
  0 siblings, 1 reply; 214+ messages in thread
From: Luca Boccassi @ 2019-04-03 10:36 UTC (permalink / raw)
  To: Ye Xiaolong; +Cc: dev, Karlsson Magnus, Topel Bjorn

On Wed, 2019-04-03 at 17:59 +0800, Ye Xiaolong wrote:
> Hi, Luca
> 
> On 04/02, Luca Boccassi wrote:
> > On Tue, 2019-04-02 at 23:46 +0800, Xiaolong Ye wrote:
> > > diff --git a/drivers/net/af_xdp/Makefile
> > > b/drivers/net/af_xdp/Makefile
> > > new file mode 100644
> > > index 000000000..8343e3016
> > > --- /dev/null
> > > +++ b/drivers/net/af_xdp/Makefile
> > > @@ -0,0 +1,32 @@
> > > +# SPDX-License-Identifier: BSD-3-Clause
> > > +# Copyright(c) 2019 Intel Corporation
> > > +
> > > +include $(RTE_SDK)/mk/rte.vars.mk
> > > +
> > > +#
> > > +# library name
> > > +#
> > > +LIB = librte_pmd_af_xdp.a
> > > +
> > > +EXPORT_MAP := rte_pmd_af_xdp_version.map
> > > +
> > > +LIBABIVER := 1
> > > +
> > > +CFLAGS += -O3
> > > +
> > > +# require kernel version >= v5.1-rc1
> > > +CFLAGS += -I$(RTE_KERNELDIR)/tools/include
> > > +CFLAGS += -I$(RTE_KERNELDIR)/tools/lib/bpf
> > 
> > Sorry for not noticing this before, but doesn't this require the
> > full
> > kernel tree rather than just the typical headers package? Requiring
> > the
> > full kernel tree to be available at build time will make this
> > unbuildable on distros that still use makefiles, like RHEL and
> > SUSE. At
> > least on Debian and Ubuntu, the kernel headers packages distributed
> > do
> > not include the full kernel tree, only the headers, so there's no
> > tools/lib or tools/include.
> 
> Currently we do have dependencies on the kernel src tree, as xsk.h
> and
> asm/barrier wouldn't be installed by libbpf, so before libbpf handles
> these
> properly, can we keep the current RTE_KERNELDIR in Makefile for now,
> and mention
> the dependencies in document, also suggest users to config
> RTE_KERNELDIR to correct
> kernel src tree if they want to use af_xdp pmd?
> 
> Something like:
> 
> dependencies:
> - kernel source code (>= v5.1-rc1)
> - build libbpf and install
> 
> Thanks,
> Xiaolong

asm/barrier.h is installed by the kernel headers packages, so it would
be fine (although not ideal) and would not need the full source tree.
xsk.h is a bit more worrying, as it looks like an internal header from
here.

Is it really necessary for external applications to use an internal-
only header and a kernel header to be able to use libbpf?

-- 
Kind regards,
Luca Boccassi


* Re: [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-03 10:36         ` Luca Boccassi
@ 2019-04-03 10:42           ` Luca Boccassi
  2019-04-03 11:18             ` Ferruh Yigit
  0 siblings, 1 reply; 214+ messages in thread
From: Luca Boccassi @ 2019-04-03 10:42 UTC (permalink / raw)
  To: Ye Xiaolong; +Cc: dev, Karlsson Magnus, Topel Bjorn

On Wed, 2019-04-03 at 11:36 +0100, Luca Boccassi wrote:
> On Wed, 2019-04-03 at 17:59 +0800, Ye Xiaolong wrote:
> > Hi, Luca
> > 
> > On 04/02, Luca Boccassi wrote:
> > > On Tue, 2019-04-02 at 23:46 +0800, Xiaolong Ye wrote:
> > > > diff --git a/drivers/net/af_xdp/Makefile
> > > > b/drivers/net/af_xdp/Makefile
> > > > new file mode 100644
> > > > index 000000000..8343e3016
> > > > --- /dev/null
> > > > +++ b/drivers/net/af_xdp/Makefile
> > > > @@ -0,0 +1,32 @@
> > > > +# SPDX-License-Identifier: BSD-3-Clause
> > > > +# Copyright(c) 2019 Intel Corporation
> > > > +
> > > > +include $(RTE_SDK)/mk/rte.vars.mk
> > > > +
> > > > +#
> > > > +# library name
> > > > +#
> > > > +LIB = librte_pmd_af_xdp.a
> > > > +
> > > > +EXPORT_MAP := rte_pmd_af_xdp_version.map
> > > > +
> > > > +LIBABIVER := 1
> > > > +
> > > > +CFLAGS += -O3
> > > > +
> > > > +# require kernel version >= v5.1-rc1
> > > > +CFLAGS += -I$(RTE_KERNELDIR)/tools/include
> > > > +CFLAGS += -I$(RTE_KERNELDIR)/tools/lib/bpf
> > > 
> > > Sorry for not noticing this before, but doesn't this require the
> > > full
> > > kernel tree rather than just the typical headers package?
> > > Requiring
> > > the
> > > full kernel tree to be available at build time will make this
> > > unbuildable on distros that still use makefiles, like RHEL and
> > > SUSE. At
> > > least on Debian and Ubuntu, the kernel headers packages
> > > distributed
> > > do
> > > not include the full kernel tree, only the headers, so there's no
> > > tools/lib or tools/include.
> > 
> > Currently we do have dependencies on the kernel src tree, as xsk.h
> > and
> > asm/barrier wouldn't be installed by libbpf, so before libbpf
> > handles
> > these
> > properly, can we keep the current RTE_KERNELDIR in Makefile for
> > now,
> > and mention
> > the dependencies in document, also suggest users to config
> > RTE_KERNELDIR to correct
> > kernel src tree if they want to use af_xdp pmd?
> > 
> > Something like:
> > 
> > dependencies:
> > - kernel source code (>= v5.1-rc1)
> > - build libbpf and install
> > 
> > Thanks,
> > Xiaolong
> 
> asm/barrier.h is installed by the kernel headers packages so it would
> be fine (although not ideal) and not need the full source tree.
> xsk.h is a bit more worrying, as it looks like an internal header
> from
> here.
> 
> Is it really necessary for external applications to use an internal-
> only header and a kernel header to be able to use libbpf?

Actually, xsk.h is now installed by the library makefile:

https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=379e2014c95b

So the full kernel source tree is no longer required.

Is asm/barrier.h really required? Isn't there a userspace alternative?
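
For illustration only: DPDK ships its own userspace barriers in
rte_atomic.h, so one conceivable shim (assuming the ring helpers only need
smp_rmb()/smp_wmb() from tools/include/asm/barrier.h, and untested as to
whether the libbpf headers would honor pre-defined macros) could look like
this:

    /* Hypothetical shim, not something the patch currently does. */
    #include <rte_atomic.h>

    #ifndef smp_rmb
    #define smp_rmb() rte_smp_rmb() /* read barrier, consumer rings */
    #endif
    #ifndef smp_wmb
    #define smp_wmb() rte_smp_wmb() /* write barrier, producer rings */
    #endif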

Also, the license in asm/barrier.h is GPL-2.0 only. It is not a
userspace header so it is not covered by the userspace exception, which
means at the very least the af_xdp PMD shared object is also licensed
under GPL-2.0 only, isn't it?

-- 
Kind regards,
Luca Boccassi


* Re: [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-03 10:42           ` Luca Boccassi
@ 2019-04-03 11:18             ` Ferruh Yigit
  2019-04-03 11:35               ` Luca Boccassi
  0 siblings, 1 reply; 214+ messages in thread
From: Ferruh Yigit @ 2019-04-03 11:18 UTC (permalink / raw)
  To: Luca Boccassi, Ye Xiaolong; +Cc: dev, Karlsson Magnus, Topel Bjorn

On 4/3/2019 11:42 AM, Luca Boccassi wrote:
> On Wed, 2019-04-03 at 11:36 +0100, Luca Boccassi wrote:
>> On Wed, 2019-04-03 at 17:59 +0800, Ye Xiaolong wrote:
>>> Hi, Luca
>>>
>>> On 04/02, Luca Boccassi wrote:
>>>> On Tue, 2019-04-02 at 23:46 +0800, Xiaolong Ye wrote:
>>>>> diff --git a/drivers/net/af_xdp/Makefile
>>>>> b/drivers/net/af_xdp/Makefile
>>>>> new file mode 100644
>>>>> index 000000000..8343e3016
>>>>> --- /dev/null
>>>>> +++ b/drivers/net/af_xdp/Makefile
>>>>> @@ -0,0 +1,32 @@
>>>>> +# SPDX-License-Identifier: BSD-3-Clause
>>>>> +# Copyright(c) 2019 Intel Corporation
>>>>> +
>>>>> +include $(RTE_SDK)/mk/rte.vars.mk
>>>>> +
>>>>> +#
>>>>> +# library name
>>>>> +#
>>>>> +LIB = librte_pmd_af_xdp.a
>>>>> +
>>>>> +EXPORT_MAP := rte_pmd_af_xdp_version.map
>>>>> +
>>>>> +LIBABIVER := 1
>>>>> +
>>>>> +CFLAGS += -O3
>>>>> +
>>>>> +# require kernel version >= v5.1-rc1
>>>>> +CFLAGS += -I$(RTE_KERNELDIR)/tools/include
>>>>> +CFLAGS += -I$(RTE_KERNELDIR)/tools/lib/bpf
>>>>
>>>> Sorry for not noticing this before, but doesn't this require the
>>>> full
>>>> kernel tree rather than just the typical headers package?
>>>> Requiring
>>>> the
>>>> full kernel tree to be available at build time will make this
>>>> unbuildable on distros that still use makefiles, like RHEL and
>>>> SUSE. At
>>>> least on Debian and Ubuntu, the kernel headers packages
>>>> distributed
>>>> do
>>>> not include the full kernel tree, only the headers, so there's no
>>>> tools/lib or tools/include.
>>>
>>> Currently we do have dependencies on the kernel src tree, as xsk.h
>>> and
>>> asm/barrier wouldn't be installed by libbpf, so before libbpf
>>> handles
>>> these
>>> properly, can we keep the current RTE_KERNELDIR in Makefile for
>>> now,
>>> and mention
>>> the dependencies in document, also suggest users to config
>>> RTE_KERNELDIR to correct
>>> kernel src tree if they want to use af_xdp pmd?
>>>
>>> Something like:
>>>
>>> dependencies:
>>> - kernel source code (>= v5.1-rc1)
>>> - build libbpf and install
>>>
>>> Thanks,
>>> Xiaolong
>>
>> asm/barrier.h is installed by the kernel headers packages so it would
>> be fine (although not ideal) and not need the full source tree.
>> xsk.h is a bit more worrying, as it looks like an internal header
>> from
>> here.
>>
>> Is it really necessary for external applications to use an internal-
>> only header and a kernel header to be able to use libbpf?
> 
> Actually, xsk.h is now installed by the library makefile:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=379e2014c95b

Good to have this one. But again, it is in the bpf tree, so it won't be in 5.1.

I suggested changing the code as follows for now; it would help to keep the
changes small when the above patch is merged into the kernel:
 CFLAGS += -I$(RTE_KERNELDIR)/tools/lib [in makefile]
 #include <bpf/xsk.h>                   [in .c file]

> 
> So the full kernel source tree is no longer required.
> 
> Is asm/barrier.h really required? Isn't there an userspace alternative?

The 'asm/barrier.h' in the kernel headers and the 'tools/include/asm/barrier.h'
look different; the one in the kernel source has dependencies on other
kernel headers.

I wonder the same thing: what is used from 'tools/include/asm/barrier.h', and
can it be avoided?

Anyway, as Xiaolong mentioned, the following is working; can it work from a
distro point of view:
- get kernel source code (>= v5.1-rc1)
- build libbpf and install
- set 'RTE_KERNELDIR' to point to the kernel source path
- build dpdk with af_xdp enabled

> 
> Also, the license in asm/barrier.h is GPL-2.0 only. It is not a
> userspace header so it is not covered by the userspace exception, which
> means at the very least the af_xdp PMD shared object is also licensed
> under GPL-2.0 only, isn't it?
> 


* Re: [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-03 11:18             ` Ferruh Yigit
@ 2019-04-03 11:35               ` Luca Boccassi
  2019-04-03 12:16                 ` Luca Boccassi
  2019-04-03 13:09                 ` Ferruh Yigit
  0 siblings, 2 replies; 214+ messages in thread
From: Luca Boccassi @ 2019-04-03 11:35 UTC (permalink / raw)
  To: Ferruh Yigit, Ye Xiaolong; +Cc: dev, Karlsson Magnus, Topel Bjorn

On Wed, 2019-04-03 at 12:18 +0100, Ferruh Yigit wrote:
> On 4/3/2019 11:42 AM, Luca Boccassi wrote:
> > On Wed, 2019-04-03 at 11:36 +0100, Luca Boccassi wrote:
> > > On Wed, 2019-04-03 at 17:59 +0800, Ye Xiaolong wrote:
> > > > Hi, Luca
> > > > 
> > > > On 04/02, Luca Boccassi wrote:
> > > > > On Tue, 2019-04-02 at 23:46 +0800, Xiaolong Ye wrote:
> > > > > > diff --git a/drivers/net/af_xdp/Makefile
> > > > > > b/drivers/net/af_xdp/Makefile
> > > > > > new file mode 100644
> > > > > > index 000000000..8343e3016
> > > > > > --- /dev/null
> > > > > > +++ b/drivers/net/af_xdp/Makefile
> > > > > > @@ -0,0 +1,32 @@
> > > > > > +# SPDX-License-Identifier: BSD-3-Clause
> > > > > > +# Copyright(c) 2019 Intel Corporation
> > > > > > +
> > > > > > +include $(RTE_SDK)/mk/rte.vars.mk
> > > > > > +
> > > > > > +#
> > > > > > +# library name
> > > > > > +#
> > > > > > +LIB = librte_pmd_af_xdp.a
> > > > > > +
> > > > > > +EXPORT_MAP := rte_pmd_af_xdp_version.map
> > > > > > +
> > > > > > +LIBABIVER := 1
> > > > > > +
> > > > > > +CFLAGS += -O3
> > > > > > +
> > > > > > +# require kernel version >= v5.1-rc1
> > > > > > +CFLAGS += -I$(RTE_KERNELDIR)/tools/include
> > > > > > +CFLAGS += -I$(RTE_KERNELDIR)/tools/lib/bpf
> > > > > 
> > > > > Sorry for not noticing this before, but doesn't this require
> > > > > the
> > > > > full
> > > > > kernel tree rather than just the typical headers package?
> > > > > Requiring
> > > > > the
> > > > > full kernel tree to be available at build time will make this
> > > > > unbuildable on distros that still use makefiles, like RHEL
> > > > > and
> > > > > SUSE. At
> > > > > least on Debian and Ubuntu, the kernel headers packages
> > > > > distributed
> > > > > do
> > > > > not include the full kernel tree, only the headers, so
> > > > > there's no
> > > > > tools/lib or tools/include.
> > > > 
> > > > Currently we do have dependencies on the kernel src tree, as
> > > > xsk.h
> > > > and
> > > > asm/barrier wouldn't be installed by libbpf, so before libbpf
> > > > handles
> > > > these
> > > > properly, can we keep the current RTE_KERNELDIR in Makefile for
> > > > now,
> > > > and mention
> > > > the dependencies in document, also suggest users to config
> > > > RTE_KERNELDIR to correct
> > > > kernel src tree if they want to use af_xdp pmd?
> > > > 
> > > > Something like:
> > > > 
> > > > dependencies:
> > > > - kernel source code (>= v5.1-rc1)
> > > > - build libbpf and install
> > > > 
> > > > Thanks,
> > > > Xiaolong
> > > 
> > > asm/barrier.h is installed by the kernel headers packages so it
> > > would
> > > be fine (although not ideal) and not need the full source tree.
> > > xsk.h is a bit more worrying, as it looks like an internal header
> > > from
> > > here.
> > > 
> > > Is it really necessary for external applications to use an
> > > internal-
> > > only header and a kernel header to be able to use libbpf?
> > 
> > Actually, xsk.h is now installed by the library makefile:
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=379e2014c95b
> > 
> 
> Good to have this one. But again it is in BPF tree and it won't be in
> 5.1.

It looks like a small and required bug fix to me, and 5.1 is still in
RC state, so perhaps there's still time.

Bjorn and Magnus, any chance the above makefile install fix could be
sent for inclusion in 5.1-rc4?

> I suggested changing code as following for now, it would help to keep
> changes
> small when above patch merged into kernel:
>  CFLAGS += -I$(RTE_KERNELDIR)/tools/lib [in makefile]
>  #include <bpf/xsk.h>                   [in .c file]
> 
> > So the full kernel source tree is no longer required.
> > 
> > Is asm/barrier.h really required? Isn't there an userspace
> > alternative?
> 
> The 'asm/barrier.h' in the kernel headers and the
> 'tools/include/asm/barrier.h'
> looks different, the one in the kernel source has dependency to other
> kernel
> headers.
> 
> I wonder same thing, what is used from 'tools/include/asm/barrier.h'
> and if it
> can be avoided.

The one in tools/include is also GPL-2.0 only, so it cannot be included
from the PMD, which is BSD-3-Clause only (and it recursively includes
the other arch-specific kernel headers).

> Anyway, as Xiaolong mentioned, following is working, can it work from
> a distro
> point of view:
> - get kernel source code (>= v5.1-rc1)
> - build libbpf and install
> - set 'RTE_KERNELDIR' to point kernel source path
> - build dpdk with af_xdp enabled

As long as the full kernel tree is required, we cannot enable it in
Debian and Ubuntu - we can't have it at build time on the build
workers, and also there's the licensing problem.

> > Also, the license in asm/barrier.h is GPL-2.0 only. It is not a
> > userspace header so it is not covered by the userspace exception,
> > which
> > means at the very least the af_xdp PMD shared object is also
> > licensed
> > under GPL-2.0 only, isn't it?
> > 
> 
> 
-- 
Kind regards,
Luca Boccassi


* Re: [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-03 11:35               ` Luca Boccassi
@ 2019-04-03 12:16                 ` Luca Boccassi
  2019-04-03 12:33                   ` Ferruh Yigit
  2019-04-03 13:09                 ` Ferruh Yigit
  1 sibling, 1 reply; 214+ messages in thread
From: Luca Boccassi @ 2019-04-03 12:16 UTC (permalink / raw)
  To: Ferruh Yigit, Ye Xiaolong; +Cc: dev, Karlsson Magnus, Topel Bjorn

On Wed, 2019-04-03 at 12:35 +0100, Luca Boccassi wrote:
> On Wed, 2019-04-03 at 12:18 +0100, Ferruh Yigit wrote:
> > On 4/3/2019 11:42 AM, Luca Boccassi wrote:
> > > On Wed, 2019-04-03 at 11:36 +0100, Luca Boccassi wrote:
> > > > On Wed, 2019-04-03 at 17:59 +0800, Ye Xiaolong wrote:
> > > > > Hi, Luca
> > > > > 
> > > > > On 04/02, Luca Boccassi wrote:
> > > > > > On Tue, 2019-04-02 at 23:46 +0800, Xiaolong Ye wrote:
> > > > > > > diff --git a/drivers/net/af_xdp/Makefile
> > > > > > > b/drivers/net/af_xdp/Makefile
> > > > > > > new file mode 100644
> > > > > > > index 000000000..8343e3016
> > > > > > > --- /dev/null
> > > > > > > +++ b/drivers/net/af_xdp/Makefile
> > > > > > > @@ -0,0 +1,32 @@
> > > > > > > +# SPDX-License-Identifier: BSD-3-Clause
> > > > > > > +# Copyright(c) 2019 Intel Corporation
> > > > > > > +
> > > > > > > +include $(RTE_SDK)/mk/rte.vars.mk
> > > > > > > +
> > > > > > > +#
> > > > > > > +# library name
> > > > > > > +#
> > > > > > > +LIB = librte_pmd_af_xdp.a
> > > > > > > +
> > > > > > > +EXPORT_MAP := rte_pmd_af_xdp_version.map
> > > > > > > +
> > > > > > > +LIBABIVER := 1
> > > > > > > +
> > > > > > > +CFLAGS += -O3
> > > > > > > +
> > > > > > > +# require kernel version >= v5.1-rc1
> > > > > > > +CFLAGS += -I$(RTE_KERNELDIR)/tools/include
> > > > > > > +CFLAGS += -I$(RTE_KERNELDIR)/tools/lib/bpf
> > > > > > 
> > > > > > Sorry for not noticing this before, but doesn't this
> > > > > > require
> > > > > > the
> > > > > > full
> > > > > > kernel tree rather than just the typical headers package?
> > > > > > Requiring
> > > > > > the
> > > > > > full kernel tree to be available at build time will make
> > > > > > this
> > > > > > unbuildable on distros that still use makefiles, like RHEL
> > > > > > and
> > > > > > SUSE. At
> > > > > > least on Debian and Ubuntu, the kernel headers packages
> > > > > > distributed
> > > > > > do
> > > > > > not include the full kernel tree, only the headers, so
> > > > > > there's no
> > > > > > tools/lib or tools/include.
> > > > > 
> > > > > Currently we do have dependencies on the kernel src tree, as
> > > > > xsk.h
> > > > > and
> > > > > asm/barrier wouldn't be installed by libbpf, so before libbpf
> > > > > handles
> > > > > these
> > > > > properly, can we keep the current RTE_KERNELDIR in Makefile
> > > > > for
> > > > > now,
> > > > > and mention
> > > > > the dependencies in document, also suggest users to config
> > > > > RTE_KERNELDIR to correct
> > > > > kernel src tree if they want to use af_xdp pmd?
> > > > > 
> > > > > Something like:
> > > > > 
> > > > > dependencies:
> > > > > - kernel source code (>= v5.1-rc1)
> > > > > - build libbfp and install
> > > > > 
> > > > > Thanks,
> > > > > Xiaolong
> > > > 
> > > > asm/barrier.h is installed by the kernel headers packages so it
> > > > would
> > > > be fine (although not ideal) and not need the full source tree.
> > > > xsk.h is a bit more worrying, as it looks like an internal
> > > > header
> > > > from
> > > > here.
> > > > 
> > > > Is it really necessary for external applications to use an
> > > > internal-
> > > > only header and a kernel header to be able to use libbpf?
> > > 
> > > Actually, xsk.h is now installed by the library makefile:
> > > 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=379e2014c95b
> > > 
> > > 
> > 
> > Good to have this one. But again it is in BPF tree and it won't be
> > in
> > 5.1.
> 
> It looks like a small and required bug fix to me, and 5.1 is still in
> RC state, so perhaps there's still time.
> 
> Bjorn and Magnus, any chance the above makefile install fix could be
> sent for inclusion in 5.1-rc4?

Actually the bpf tree was already merged into the net tree a couple of
days ago. As far as I understand the process, this means the fix should
land in Linus' tree in time for 5.1-rc4:

https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit?id=379e2014c95b7a454713da822b8ef4ec51ab8a75

-- 
Kind regards,
Luca Boccassi

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-03 12:16                 ` Luca Boccassi
@ 2019-04-03 12:33                   ` Ferruh Yigit
  0 siblings, 0 replies; 214+ messages in thread
From: Ferruh Yigit @ 2019-04-03 12:33 UTC (permalink / raw)
  To: Luca Boccassi, Ye Xiaolong; +Cc: dev, Karlsson Magnus, Topel Bjorn

On 4/3/2019 1:16 PM, Luca Boccassi wrote:
> On Wed, 2019-04-03 at 12:35 +0100, Luca Boccassi wrote:
>> On Wed, 2019-04-03 at 12:18 +0100, Ferruh Yigit wrote:
>>> On 4/3/2019 11:42 AM, Luca Boccassi wrote:
>>>> On Wed, 2019-04-03 at 11:36 +0100, Luca Boccassi wrote:
>>>>> On Wed, 2019-04-03 at 17:59 +0800, Ye Xiaolong wrote:
>>>>>> Hi, Luca
>>>>>>
>>>>>> On 04/02, Luca Boccassi wrote:
>>>>>>> On Tue, 2019-04-02 at 23:46 +0800, Xiaolong Ye wrote:
>>>>>>>> diff --git a/drivers/net/af_xdp/Makefile
>>>>>>>> b/drivers/net/af_xdp/Makefile
>>>>>>>> new file mode 100644
>>>>>>>> index 000000000..8343e3016
>>>>>>>> --- /dev/null
>>>>>>>> +++ b/drivers/net/af_xdp/Makefile
>>>>>>>> @@ -0,0 +1,32 @@
>>>>>>>> +# SPDX-License-Identifier: BSD-3-Clause
>>>>>>>> +# Copyright(c) 2019 Intel Corporation
>>>>>>>> +
>>>>>>>> +include $(RTE_SDK)/mk/rte.vars.mk
>>>>>>>> +
>>>>>>>> +#
>>>>>>>> +# library name
>>>>>>>> +#
>>>>>>>> +LIB = librte_pmd_af_xdp.a
>>>>>>>> +
>>>>>>>> +EXPORT_MAP := rte_pmd_af_xdp_version.map
>>>>>>>> +
>>>>>>>> +LIBABIVER := 1
>>>>>>>> +
>>>>>>>> +CFLAGS += -O3
>>>>>>>> +
>>>>>>>> +# require kernel version >= v5.1-rc1
>>>>>>>> +CFLAGS += -I$(RTE_KERNELDIR)/tools/include
>>>>>>>> +CFLAGS += -I$(RTE_KERNELDIR)/tools/lib/bpf
>>>>>>>
>>>>>>> Sorry for not noticing this before, but doesn't this
>>>>>>> require
>>>>>>> the
>>>>>>> full
>>>>>>> kernel tree rather than just the typical headers package?
>>>>>>> Requiring
>>>>>>> the
>>>>>>> full kernel tree to be available at build time will make
>>>>>>> this
>>>>>>> unbuildable on distros that still use makefiles, like RHEL
>>>>>>> and
>>>>>>> SUSE. At
>>>>>>> least on Debian and Ubuntu, the kernel headers packages
>>>>>>> distributed
>>>>>>> do
>>>>>>> not include the full kernel tree, only the headers, so
>>>>>>> there's no
>>>>>>> tools/lib or tools/include.
>>>>>>
>>>>>> Currently we do have dependencies on the kernel src tree, as
>>>>>> xsk.h
>>>>>> and
>>>>>> asm/barrier wouldn't be installed by libbpf, so before libbpf
>>>>>> handles
>>>>>> these
>>>>>> properly, can we keep the current RTE_KERNELDIR in Makefile
>>>>>> for
>>>>>> now,
>>>>>> and mention
>>>>>> the dependencies in document, also suggest users to config
>>>>>> RTE_KERNELDIR to correct
>>>>>> kernel src tree if they want to use af_xdp pmd?
>>>>>>
>>>>>> Something like:
>>>>>>
>>>>>> dependencies:
>>>>>> - kernel source code (>= v5.1-rc1)
>>>>>> - build libbfp and install
>>>>>>
>>>>>> Thanks,
>>>>>> Xiaolong
>>>>>
>>>>> asm/barrier.h is installed by the kernel headers packages so it
>>>>> would
>>>>> be fine (although not ideal) and not need the full source tree.
>>>>> xsk.h is a bit more worrying, as it looks like an internal
>>>>> header
>>>>> from
>>>>> here.
>>>>>
>>>>> Is it really necessary for external applications to use an
>>>>> internal-
>>>>> only header and a kernel header to be able to use libbpf?
>>>>
>>>> Actually, xsk.h is now installed by the library makefile:
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=379e2014c95b
>>>>
>>>>
>>>
>>> Good to have this one. But again it is in BPF tree and it won't be
>>> in
>>> 5.1.
>>
>> It looks like a small and required bug fix to me, and 5.1 is still in
>> RC state, so perhaps there's still time.
>>
>> Bjorn and Magnus, any chance the above makefile install fix could be
>> sent for inclusion in 5.1-rc4?
> 
> Actually the bpf tree was already merged in the net tree a couple of
> days ago. As far as I understand from the process, this should mean
> that this fix should be set for inclusion in Linus' tree in time for
> 5.1-rc4:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit?id=379e2014c95b7a454713da822b8ef4ec51ab8a75
> 

My bad, it seems the 'bpf' tree gets merged into the 'net' tree, while there is
also a separate 'bpf-next' tree for later releases.

So I believe it is OK to expect the fixes you pointed to in the 'bpf' tree to be
in 5.1.

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-03 11:35               ` Luca Boccassi
  2019-04-03 12:16                 ` Luca Boccassi
@ 2019-04-03 13:09                 ` Ferruh Yigit
  2019-04-03 13:29                   ` Luca Boccassi
  2019-04-03 14:22                   ` Ye Xiaolong
  1 sibling, 2 replies; 214+ messages in thread
From: Ferruh Yigit @ 2019-04-03 13:09 UTC (permalink / raw)
  To: Luca Boccassi, Ye Xiaolong; +Cc: dev, Karlsson Magnus, Topel Bjorn

On 4/3/2019 12:35 PM, Luca Boccassi wrote:
> On Wed, 2019-04-03 at 12:18 +0100, Ferruh Yigit wrote:
>> On 4/3/2019 11:42 AM, Luca Boccassi wrote:
>>> On Wed, 2019-04-03 at 11:36 +0100, Luca Boccassi wrote:
>>>> On Wed, 2019-04-03 at 17:59 +0800, Ye Xiaolong wrote:
>>>>> Hi, Luca
>>>>>
>>>>> On 04/02, Luca Boccassi wrote:
>>>>>> On Tue, 2019-04-02 at 23:46 +0800, Xiaolong Ye wrote:
>>>>>>> diff --git a/drivers/net/af_xdp/Makefile
>>>>>>> b/drivers/net/af_xdp/Makefile
>>>>>>> new file mode 100644
>>>>>>> index 000000000..8343e3016
>>>>>>> --- /dev/null
>>>>>>> +++ b/drivers/net/af_xdp/Makefile
>>>>>>> @@ -0,0 +1,32 @@
>>>>>>> +# SPDX-License-Identifier: BSD-3-Clause
>>>>>>> +# Copyright(c) 2019 Intel Corporation
>>>>>>> +
>>>>>>> +include $(RTE_SDK)/mk/rte.vars.mk
>>>>>>> +
>>>>>>> +#
>>>>>>> +# library name
>>>>>>> +#
>>>>>>> +LIB = librte_pmd_af_xdp.a
>>>>>>> +
>>>>>>> +EXPORT_MAP := rte_pmd_af_xdp_version.map
>>>>>>> +
>>>>>>> +LIBABIVER := 1
>>>>>>> +
>>>>>>> +CFLAGS += -O3
>>>>>>> +
>>>>>>> +# require kernel version >= v5.1-rc1
>>>>>>> +CFLAGS += -I$(RTE_KERNELDIR)/tools/include
>>>>>>> +CFLAGS += -I$(RTE_KERNELDIR)/tools/lib/bpf
>>>>>>
>>>>>> Sorry for not noticing this before, but doesn't this require
>>>>>> the
>>>>>> full
>>>>>> kernel tree rather than just the typical headers package?
>>>>>> Requiring
>>>>>> the
>>>>>> full kernel tree to be available at build time will make this
>>>>>> unbuildable on distros that still use makefiles, like RHEL
>>>>>> and
>>>>>> SUSE. At
>>>>>> least on Debian and Ubuntu, the kernel headers packages
>>>>>> distributed
>>>>>> do
>>>>>> not include the full kernel tree, only the headers, so
>>>>>> there's no
>>>>>> tools/lib or tools/include.
>>>>>
>>>>> Currently we do have dependencies on the kernel src tree, as
>>>>> xsk.h
>>>>> and
>>>>> asm/barrier wouldn't be installed by libbpf, so before libbpf
>>>>> handles
>>>>> these
>>>>> properly, can we keep the current RTE_KERNELDIR in Makefile for
>>>>> now,
>>>>> and mention
>>>>> the dependencies in document, also suggest users to config
>>>>> RTE_KERNELDIR to correct
>>>>> kernel src tree if they want to use af_xdp pmd?
>>>>>
>>>>> Something like:
>>>>>
>>>>> dependencies:
>>>>> - kernel source code (>= v5.1-rc1)
>>>>> - build libbfp and install
>>>>>
>>>>> Thanks,
>>>>> Xiaolong
>>>>
>>>> asm/barrier.h is installed by the kernel headers packages so it
>>>> would
>>>> be fine (although not ideal) and not need the full source tree.
>>>> xsk.h is a bit more worrying, as it looks like an internal header
>>>> from
>>>> here.
>>>>
>>>> Is it really necessary for external applications to use an
>>>> internal-
>>>> only header and a kernel header to be able to use libbpf?
>>>
>>> Actually, xsk.h is now installed by the library makefile:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=379e2014c95b
>>>
>>
>> Good to have this one. But again it is in BPF tree and it won't be in
>> 5.1.
> 
> It looks like a small and required bug fix to me, and 5.1 is still in
> RC state, so perhaps there's still time.
> 
> Bjorn and Magnus, any chance the above makefile install fix could be
> sent for inclusion in 5.1-rc4?
> 
>> I suggested changing code as following for now, it would help to keep
>> changes
>> small when above patch merged into kernel:
>>  CFLAGS += -I$(RTE_KERNELDIR)/tools/lib [in makefile]
>>  #include <bpf/xsk.h>                   [in .c file]
>>
>>> So the full kernel source tree is no longer required.
>>>
>>> Is asm/barrier.h really required? Isn't there an userspace
>>> alternative?
>>
>> The 'asm/barrier.h' in the kernel headers and the
>> 'tools/include/asm/barrier.h'
>> looks different, the one in the kernel source has dependency to other
>> kernel
>> headers.
>>
>> I wonder same thing, what is used from 'tools/include/asm/barrier.h'
>> and if it
>> can be avoided.

It seems 'tools/include/asm/barrier.h' is required for 'smp_wmb()' &
'smp_rmb()' in 'xsk.h'.
We have equivalents of these in DPDK [1], so it should be possible to use
them and not include this header at all.

In 'rte_eth_af_xdp.c', before including 'xsk.h', we can include a local
compatibility header which does the following:
#define smp_rmb() rte_rmb()
#define smp_wmb() rte_wmb()
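
For illustration, a minimal sketch of how the include section of
'rte_eth_af_xdp.c' could then look (assuming 'rte_atomic.h' is what pulls
in the rte_rmb()/rte_wmb() definitions from [1], and that xsk.h ends up
installed under the 'bpf/' prefix):

  #include <rte_atomic.h>  /* rte_rmb()/rte_wmb() */

  /* kernel-style barrier names expected by xsk.h, mapped to the DPDK
   * equivalents so that 'tools/include/asm/barrier.h' is not needed */
  #define smp_rmb() rte_rmb()
  #define smp_wmb() rte_wmb()

  #include <bpf/xsk.h>     /* must come after the defines above */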

@Xiaolong, what do you think?

[1]
https://git.dpdk.org/dpdk/tree/lib/librte_eal/common/include/arch/x86/rte_atomic.h?h=v19.02#n30

> 
> The one in tools/include also is GPL-2.0 only so it cannot be included
> from the PMD, which is BSD-3-clause only (and it recursively includes
> the other arch-specific kernel headers)
> 
>> Anyway, as Xiaolong mentioned, following is working, can it work from
>> a distro
>> point of view:
>> - get kernel source code (>= v5.1-rc1)
>> - build libbfp and install
>> - set 'RTE_KERNELDIR' to point kernel source path
>> - build dpdk with af_xdp enabled
> 
> As long as the full kernel tree is required, we cannot enable it in
> Debian and Ubuntu - we can't have it at build time on the build
> workers, and also there's the licensing problem.

Got it.

In the above steps 'libbpf' is also built from the kernel source tree; will it
be a problem for your builds if it is not built from source?

If not, taking into account that xsk.h will also be fixed, only
'tools/include/asm/barrier.h' remains a problem, and it looks like that can be
solved too, please check above.


> 
>>> Also, the license in asm/barrier.h is GPL-2.0 only. It is not a
>>> userspace header so it is not covered by the userspace exception,
>>> which
>>> means at the very least the af_xdp PMD shared object is also
>>> licensed
>>> under GPL-2.0 only, isn't it?
>>>
>>
>>

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-02 19:43     ` [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver Ferruh Yigit
@ 2019-04-03 13:22       ` Bruce Richardson
  2019-04-03 13:34         ` Ferruh Yigit
  0 siblings, 1 reply; 214+ messages in thread
From: Bruce Richardson @ 2019-04-03 13:22 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Xiaolong Ye, dev, Stephen Hemminger, Qi Zhang, Karlsson Magnus,
	Topel Bjorn, Maxime Coquelin, Luca Boccassi, Ananyev Konstantin,
	David Marchand, Andrew Rybchenko, Olivier Matz

On Tue, Apr 02, 2019 at 08:43:48PM +0100, Ferruh Yigit wrote:
> On 4/2/2019 4:46 PM, Xiaolong Ye wrote:
> > Add a new PMD driver for AF_XDP which is a proposed faster version of
> > AF_PACKET interface in Linux. More info about AF_XDP, please refer to [1]
> > [2].
> > 
> > This is the vanilla version PMD which just uses a raw buffer registered as
> > the umem.
> > 
> > [1] https://fosdem.org/2018/schedule/event/af_xdp/
> > [2] https://lwn.net/Articles/745934/
> > 
> > Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
> 
> <...>
> 
> > @@ -0,0 +1,956 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2019 Intel Corporation.
> > + */
> 
> > +#include <bpf/bpf.h>
> > +#include <xsk.h>
> 
> Under linux, both headers are in same 'bpf' folder, why one included as
> 'bpf/bpf.h' but other 'xsk.h'?
> 
> Perhaps this is not problem when headers are installed into system folders, but
> I am compiling using RTE_KERNELDIR, which used in Makefile as:
>  -I$(RTE_KERNELDIR)/tools/lib/bpf

When installed in system folders they will still need the "bpf" prefix. On
my system after running "make headers_install" in libbpf folder, the
headers are placed in "/usr/local/include/bpf/"
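
So both libbpf includes in the PMD would carry the prefix; a sketch,
assuming xsk.h gets installed alongside bpf.h:

  #include <bpf/bpf.h>
  #include <bpf/xsk.h>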

> 
> This fails to find 'bpf/bpf.h'
> 
> Also for '-lbpf', shouldn't need to add '-L$(RTE_KERNELDIR)/tools/lib/bpf', to
> new added line in 'rte.app.mk', so that it can find the library?
> 
> I assume you are building in a system with new kernel, I think you need this for
> functionality, where 'xsk.h' is located in that case? Because I was thinking
> building and installing libbpf can solve the issue but it is not installing
> 'xsk.h', not sure why, so not exactly solving.
> 
> if you still need "CFLAGS += -I$(RTE_KERNELDIR)/tools/lib/bpf" for your case,
> does it make sense update as following:
>  CFLAGS += -I$(RTE_KERNELDIR)/tools/lib
>  #include <bpf/xsk.h>

We should not include in any driver a cflag or ldflag that points to the
kernel dir. We should expect the headers for libbpf in a regular include
folder and the library itself in /usr/lib or /usr/local/lib.

/Bruce

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-03 13:09                 ` Ferruh Yigit
@ 2019-04-03 13:29                   ` Luca Boccassi
  2019-04-03 14:43                     ` Ye Xiaolong
  2019-04-03 14:22                   ` Ye Xiaolong
  1 sibling, 1 reply; 214+ messages in thread
From: Luca Boccassi @ 2019-04-03 13:29 UTC (permalink / raw)
  To: Ferruh Yigit, Ye Xiaolong; +Cc: dev, Karlsson Magnus, Topel Bjorn

On Wed, 2019-04-03 at 14:09 +0100, Ferruh Yigit wrote:
> On 4/3/2019 12:35 PM, Luca Boccassi wrote:
> > On Wed, 2019-04-03 at 12:18 +0100, Ferruh Yigit wrote:
> > > On 4/3/2019 11:42 AM, Luca Boccassi wrote:
> > > > On Wed, 2019-04-03 at 11:36 +0100, Luca Boccassi wrote:
> > > > > On Wed, 2019-04-03 at 17:59 +0800, Ye Xiaolong wrote:
> > > > > > Hi, Luca
> > > > > > 
> > > > > > On 04/02, Luca Boccassi wrote:
> > > > > > > On Tue, 2019-04-02 at 23:46 +0800, Xiaolong Ye wrote:
> > > > > > > > diff --git a/drivers/net/af_xdp/Makefile
> > > > > > > > b/drivers/net/af_xdp/Makefile
> > > > > > > > new file mode 100644
> > > > > > > > index 000000000..8343e3016
> > > > > > > > --- /dev/null
> > > > > > > > +++ b/drivers/net/af_xdp/Makefile
> > > > > > > > @@ -0,0 +1,32 @@
> > > > > > > > +# SPDX-License-Identifier: BSD-3-Clause
> > > > > > > > +# Copyright(c) 2019 Intel Corporation
> > > > > > > > +
> > > > > > > > +include $(RTE_SDK)/mk/rte.vars.mk
> > > > > > > > +
> > > > > > > > +#
> > > > > > > > +# library name
> > > > > > > > +#
> > > > > > > > +LIB = librte_pmd_af_xdp.a
> > > > > > > > +
> > > > > > > > +EXPORT_MAP := rte_pmd_af_xdp_version.map
> > > > > > > > +
> > > > > > > > +LIBABIVER := 1
> > > > > > > > +
> > > > > > > > +CFLAGS += -O3
> > > > > > > > +
> > > > > > > > +# require kernel version >= v5.1-rc1
> > > > > > > > +CFLAGS += -I$(RTE_KERNELDIR)/tools/include
> > > > > > > > +CFLAGS += -I$(RTE_KERNELDIR)/tools/lib/bpf
> > > > > > > 
> > > > > > > Sorry for not noticing this before, but doesn't this
> > > > > > > require
> > > > > > > the
> > > > > > > full
> > > > > > > kernel tree rather than just the typical headers package?
> > > > > > > Requiring
> > > > > > > the
> > > > > > > full kernel tree to be available at build time will make
> > > > > > > this
> > > > > > > unbuildable on distros that still use makefiles, like
> > > > > > > RHEL
> > > > > > > and
> > > > > > > SUSE. At
> > > > > > > least on Debian and Ubuntu, the kernel headers packages
> > > > > > > distributed
> > > > > > > do
> > > > > > > not include the full kernel tree, only the headers, so
> > > > > > > there's no
> > > > > > > tools/lib or tools/include.
> > > > > > 
> > > > > > Currently we do have dependencies on the kernel src tree,
> > > > > > as
> > > > > > xsk.h
> > > > > > and
> > > > > > asm/barrier wouldn't be installed by libbpf, so before
> > > > > > libbpf
> > > > > > handles
> > > > > > these
> > > > > > properly, can we keep the current RTE_KERNELDIR in Makefile
> > > > > > for
> > > > > > now,
> > > > > > and mention
> > > > > > the dependencies in document, also suggest users to config
> > > > > > RTE_KERNELDIR to correct
> > > > > > kernel src tree if they want to use af_xdp pmd?
> > > > > > 
> > > > > > Something like:
> > > > > > 
> > > > > > dependencies:
> > > > > > - kernel source code (>= v5.1-rc1)
> > > > > > - build libbfp and install
> > > > > > 
> > > > > > Thanks,
> > > > > > Xiaolong
> > > > > 
> > > > > asm/barrier.h is installed by the kernel headers packages so
> > > > > it
> > > > > would
> > > > > be fine (although not ideal) and not need the full source
> > > > > tree.
> > > > > xsk.h is a bit more worrying, as it looks like an internal
> > > > > header
> > > > > from
> > > > > here.
> > > > > 
> > > > > Is it really necessary for external applications to use an
> > > > > internal-
> > > > > only header and a kernel header to be able to use libbpf?
> > > > 
> > > > Actually, xsk.h is now installed by the library makefile:
> > > > 
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=379e2014c95b
> > > > 
> > > > 
> > > 
> > > Good to have this one. But again it is in BPF tree and it won't
> > > be in
> > > 5.1.
> > 
> > It looks like a small and required bug fix to me, and 5.1 is still
> > in
> > RC state, so perhaps there's still time.
> > 
> > Bjorn and Magnus, any chance the above makefile install fix could
> > be
> > sent for inclusion in 5.1-rc4?
> > 
> > > I suggested changing code as following for now, it would help to
> > > keep
> > > changes
> > > small when above patch merged into kernel:
> > >  CFLAGS += -I$(RTE_KERNELDIR)/tools/lib [in makefile]
> > >  #include <bpf/xsk.h>                   [in .c file]
> > > 
> > > > So the full kernel source tree is no longer required.
> > > > 
> > > > Is asm/barrier.h really required? Isn't there an userspace
> > > > alternative?
> > > 
> > > The 'asm/barrier.h' in the kernel headers and the
> > > 'tools/include/asm/barrier.h'
> > > looks different, the one in the kernel source has dependency to
> > > other
> > > kernel
> > > headers.
> > > 
> > > I wonder same thing, what is used from
> > > 'tools/include/asm/barrier.h'
> > > and if it
> > > can be avoided.
> 
> It seems, 'tools/include/asm/barrier.h' is required for 'smp_wmb()' &
> 'smp_rmb()' in 'xsk.h'.
> We have equivalents of these in DPDK [1], and perhaps it can be
> possible to use
> them and not include this header at all.
> 
> in 'rte_eth_af_xdp.c', before including 'xsk.h', we can include an
> local
> compatibility header which does following should work:
> #define smp_rmb() rte_rmb()
> #define smp_wmb() rte_wmb()
> 
> @Xiaolong, what do you think?
> 
> [1]
> https://git.dpdk.org/dpdk/tree/lib/librte_eal/common/include/arch/x86/rte_atomic.h?h=v19.02#n30

Perfect, that looks like a great solution for the PMD.

For the broader picture, now that xsk.h is a public userspace header,
it should at some point in the future be fixed to avoid depending on an
internal kernel definition for the barriers, and either ship its own
or depend on another public header that provides them. Otherwise every
application that wants to use bpf with xdp needs to provide its own
implementation - but this is not related to this patchset and we can
live without it for the moment in DPDK.

> > The one in tools/include also is GPL-2.0 only so it cannot be
> > included
> > from the PMD, which is BSD-3-clause only (and it recursively
> > includes
> > the other arch-specific kernel headers)
> > 
> > > Anyway, as Xiaolong mentioned, following is working, can it work
> > > from
> > > a distro
> > > point of view:
> > > - get kernel source code (>= v5.1-rc1)
> > > - build libbfp and install
> > > - set 'RTE_KERNELDIR' to point kernel source path
> > > - build dpdk with af_xdp enabled
> > 
> > As long as the full kernel tree is required, we cannot enable it in
> > Debian and Ubuntu - we can't have it at build time on the build
> > workers, and also there's the licensing problem.
> 
> Got it.
> 
> In above steps, 'libbpf' also build from kernel source tree, will it
> be problem
> in you builds to not have it build from source?
> 
> If not, taking into account that xsk.h also will be fixed, only
> 'tools/include/asm/barrier.h' remains the problem, and it looks like
> it can be
> solved, please check above.

libbpf is already packaged separately in Debian and I think other
distros will follow soon, so it's all good for me once the barrier
issue is solved.

https://packages.debian.org/buster/libbpf-dev

From the makefile's perspective it should not matter where it comes
from - the headers should be expected to be in /usr/include and the
library in /usr/lib* - and pkg-config can help with that if available.
And if a user wants to use a custom path, then it's no different from
any of the other dependencies on other external libraries.

-- 
Kind regards,
Luca Boccassi

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-03 13:22       ` Bruce Richardson
@ 2019-04-03 13:34         ` Ferruh Yigit
  0 siblings, 0 replies; 214+ messages in thread
From: Ferruh Yigit @ 2019-04-03 13:34 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: Xiaolong Ye, dev, Stephen Hemminger, Qi Zhang, Karlsson Magnus,
	Topel Bjorn, Maxime Coquelin, Luca Boccassi, Ananyev Konstantin,
	David Marchand, Andrew Rybchenko, Olivier Matz

On 4/3/2019 2:22 PM, Bruce Richardson wrote:
> On Tue, Apr 02, 2019 at 08:43:48PM +0100, Ferruh Yigit wrote:
>> On 4/2/2019 4:46 PM, Xiaolong Ye wrote:
>>> Add a new PMD driver for AF_XDP which is a proposed faster version of
>>> AF_PACKET interface in Linux. More info about AF_XDP, please refer to [1]
>>> [2].
>>>
>>> This is the vanilla version PMD which just uses a raw buffer registered as
>>> the umem.
>>>
>>> [1] https://fosdem.org/2018/schedule/event/af_xdp/
>>> [2] https://lwn.net/Articles/745934/
>>>
>>> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
>>
>> <...>
>>
>>> @@ -0,0 +1,956 @@
>>> +/* SPDX-License-Identifier: BSD-3-Clause
>>> + * Copyright(c) 2019 Intel Corporation.
>>> + */
>>
>>> +#include <bpf/bpf.h>
>>> +#include <xsk.h>
>>
>> Under linux, both headers are in same 'bpf' folder, why one included as
>> 'bpf/bpf.h' but other 'xsk.h'?
>>
>> Perhaps this is not problem when headers are installed into system folders, but
>> I am compiling using RTE_KERNELDIR, which used in Makefile as:
>>  -I$(RTE_KERNELDIR)/tools/lib/bpf
> 
> When installed in system folders they will still need the "bpf" prefix. On
> my system after running "make headers_install" in libbpf folder, the
> headers are placed in "/usr/local/include/bpf/"

This is for 'xsk.h' which was not installed via "make headers_install", but as
Luca pointed out there is a patch to install 'xsk.h' too, so it should be OK to
remove that line.

> 
>>
>> This fails to find 'bpf/bpf.h'
>>
>> Also for '-lbpf', shouldn't need to add '-L$(RTE_KERNELDIR)/tools/lib/bpf', to
>> new added line in 'rte.app.mk', so that it can find the library?
>>
>> I assume you are building in a system with new kernel, I think you need this for
>> functionality, where 'xsk.h' is located in that case? Because I was thinking
>> building and installing libbpf can solve the issue but it is not installing
>> 'xsk.h', not sure why, so not exactly solving.
>>
>> if you still need "CFLAGS += -I$(RTE_KERNELDIR)/tools/lib/bpf" for your case,
>> does it make sense update as following:
>>  CFLAGS += -I$(RTE_KERNELDIR)/tools/lib
>>  #include <bpf/xsk.h>
> 
> We should not include in any driver a cflag or ldflag that points to the
> kernel dir. We should expect the headers for libbpf in a regular include
> folder and the library itself in /usr/lib or /usr/local/lib.

Overall I agree, but there is a dependency from 'xsk.h' to
'tools/include/asm/barrier.h', and that header is not installed into system
folders; it can be found only in the kernel source code.

Fortunately it looks like there is a way to get rid of the
'tools/include/asm/barrier.h' dependency for DPDK, so we can remove that cflags too.

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-03 13:09                 ` Ferruh Yigit
  2019-04-03 13:29                   ` Luca Boccassi
@ 2019-04-03 14:22                   ` Ye Xiaolong
  2019-04-03 15:52                     ` Ferruh Yigit
  1 sibling, 1 reply; 214+ messages in thread
From: Ye Xiaolong @ 2019-04-03 14:22 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Luca Boccassi, dev, Karlsson Magnus, Topel Bjorn

On 04/03, Ferruh Yigit wrote:
[snip]
>
>It seems, 'tools/include/asm/barrier.h' is required for 'smp_wmb()' &
>'smp_rmb()' in 'xsk.h'.
>We have equivalents of these in DPDK [1], and perhaps it can be possible to use
>them and not include this header at all.
>
>in 'rte_eth_af_xdp.c', before including 'xsk.h', we can include an local
>compatibility header which does following should work:
>#define smp_rmb() rte_rmb()
>#define smp_wmb() rte_wmb()
>
>@Xiaolong, what do you think?

It sounds perfect to me, I'll take it in my next version.
Something to confirm: assuming af_xdp pmd users will use a kernel (say v5.1-rc4)
that contains the fixes regarding xsk.h and libelf, I still need to make the
following changes:

1. I shall use <bpf/xsk.h>, as xsk.h should be installed in system folders.
2. `-lelf` is no longer needed in rte.app.mk.
3. I need to document the libbpf build and install steps in af_xdp.rst.
4. Add the above two defines before including xsk.h.

Thanks,
Xiaolong


>
>[1]
>https://git.dpdk.org/dpdk/tree/lib/librte_eal/common/include/arch/x86/rte_atomic.h?h=v19.02#n30
>
>> 
>> The one in tools/include also is GPL-2.0 only so it cannot be included
>> from the PMD, which is BSD-3-clause only (and it recursively includes
>> the other arch-specific kernel headers)
>> 
>>> Anyway, as Xiaolong mentioned, following is working, can it work from
>>> a distro
>>> point of view:
>>> - get kernel source code (>= v5.1-rc1)
>>> - build libbfp and install
>>> - set 'RTE_KERNELDIR' to point kernel source path
>>> - build dpdk with af_xdp enabled
>> 
>> As long as the full kernel tree is required, we cannot enable it in
>> Debian and Ubuntu - we can't have it at build time on the build
>> workers, and also there's the licensing problem.
>
>Got it.
>
>In above steps, 'libbpf' also build from kernel source tree, will it be problem
>in you builds to not have it build from source?
>
>If not, taking into account that xsk.h also will be fixed, only
>'tools/include/asm/barrier.h' remains the problem, and it looks like it can be
>solved, please check above.
>
>
>> 
>>>> Also, the license in asm/barrier.h is GPL-2.0 only. It is not a
>>>> userspace header so it is not covered by the userspace exception,
>>>> which
>>>> means at the very least the af_xdp PMD shared object is also
>>>> licensed
>>>> under GPL-2.0 only, isn't it?
>>>>
>>>
>>>
>

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-03 13:29                   ` Luca Boccassi
@ 2019-04-03 14:43                     ` Ye Xiaolong
  2019-04-03 14:51                       ` Luca Boccassi
  0 siblings, 1 reply; 214+ messages in thread
From: Ye Xiaolong @ 2019-04-03 14:43 UTC (permalink / raw)
  To: Luca Boccassi; +Cc: Ferruh Yigit, dev, Karlsson Magnus, Topel Bjorn

On 04/03, Luca Boccassi wrote:
[snip]
>> 
>> Got it.
>> 
>> In above steps, 'libbpf' also build from kernel source tree, will it
>> be problem
>> in you builds to not have it build from source?
>> 
>> If not, taking into account that xsk.h also will be fixed, only
>> 'tools/include/asm/barrier.h' remains the problem, and it looks like
>> it can be
>> solved, please check above.
>
>libbpf is already packaged separately in Debian and I think other
>distros will follow soon, so it's all good for me once the barrier
>issue is solved.
>
>https://packages.debian.org/buster/libbpf-dev
>
>From the makefile's perspective it should not matter where it comes
>from - the headers should be expected to be in /usr/include and the
>library in /usr/lib* - and pkg-config can help with that if available.
>And if a user wants to use a custom path, then it's no different than
>any of the other dependencies on other external libraries

From tools/lib/bpf/Makefile, after `make install_lib` and `make install_headers`,
the headers and library would be put in /usr/local/include/bpf and /usr/local/lib*.
Is that OK?

Thanks,
Xiaolong
>
>-- 
>Kind regards,
>Luca Boccassi

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-03 14:43                     ` Ye Xiaolong
@ 2019-04-03 14:51                       ` Luca Boccassi
  2019-04-03 15:14                         ` Ye Xiaolong
  0 siblings, 1 reply; 214+ messages in thread
From: Luca Boccassi @ 2019-04-03 14:51 UTC (permalink / raw)
  To: Ye Xiaolong; +Cc: Ferruh Yigit, dev

On Wed, 2019-04-03 at 22:43 +0800, Ye Xiaolong wrote:
> On 04/03, Luca Boccassi wrote:
> [snip]
> > > Got it.
> > > 
> > > In above steps, 'libbpf' also build from kernel source tree, will
> > > it
> > > be problem
> > > in you builds to not have it build from source?
> > > 
> > > If not, taking into account that xsk.h also will be fixed, only
> > > 'tools/include/asm/barrier.h' remains the problem, and it looks
> > > like
> > > it can be
> > > solved, please check above.
> > 
> > libbpf is already packaged separately in Debian and I think other
> > distros will follow soon, so it's all good for me once the barrier
> > issue is solved.
> > 
> > https://packages.debian.org/buster/libbpf-dev
> > 
> > 
> > From the makefile's perspective it should not matter where it comes
> > from - the headers should be expected to be in /usr/include and the
> > library in /usr/lib* - and pkg-config can help with that if
> > available.
> > And if a user wants to use a custom path, then it's no different
> > than
> > any of the other dependencies on other external libraries
> 
> From tools/lib/bpf/Makefile, after make install_lib and make
> install_headers,
> the headers and library would be put in /usr/local/include/bpf and
> /usr/local/lib*,
> Is it ok?

Yes, certainly that's fine; that's expected for local installations, and
users can specify a prefix with the upstream makefile if they want to
install somewhere else.

-- 
Kind regards,
Luca Boccassi

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-03 14:51                       ` Luca Boccassi
@ 2019-04-03 15:14                         ` Ye Xiaolong
  2019-04-03 15:23                           ` Bruce Richardson
  0 siblings, 1 reply; 214+ messages in thread
From: Ye Xiaolong @ 2019-04-03 15:14 UTC (permalink / raw)
  To: Luca Boccassi; +Cc: Ferruh Yigit, dev

On 04/03, Luca Boccassi wrote:
>On Wed, 2019-04-03 at 22:43 +0800, Ye Xiaolong wrote:
>> On 04/03, Luca Boccassi wrote:
>> [snip]
>> > > Got it.
>> > > 
>> > > In above steps, 'libbpf' also build from kernel source tree, will
>> > > it
>> > > be problem
>> > > in you builds to not have it build from source?
>> > > 
>> > > If not, taking into account that xsk.h also will be fixed, only
>> > > 'tools/include/asm/barrier.h' remains the problem, and it looks
>> > > like
>> > > it can be
>> > > solved, please check above.
>> > 
>> > libbpf is already packaged separately in Debian and I think other
>> > distros will follow soon, so it's all good for me once the barrier
>> > issue is solved.
>> > 
>> > https://packages.debian.org/buster/libbpf-dev
>> > 
>> > 
>> > From the makefile's perspective it should not matter where it comes
>> > from - the headers should be expected to be in /usr/include and the
>> > library in /usr/lib* - and pkg-config can help with that if
>> > available.
>> > And if a user wants to use a custom path, then it's no different
>> > than
>> > any of the other dependencies on other external libraries
>> 
>> From tools/lib/bpf/Makefile, after make install_lib and make
>> install_headers,
>> the headers and library would be put in /usr/local/include/bpf and
>> /usr/local/lib*,
>> Is it ok?
>
>Yes certainly that's fine, that's expected for local installations, and
>users can specify a prefix with the upstream's makefile if they want to
>install somewhere else.

In my local test, if I run `make install_lib` to install libbpf.so to
/usr/local/lib64, the `-lbpf` specified for the af_xdp pmd still fails to find
the library, and the build ends up with a lot of undefined references to symbols
that are defined in libbpf. Does that mean the dpdk compilation doesn't search
for libraries in /usr/local/lib*?

Installing libbpf to /usr/lib64 via `make install_lib prefix=/usr` doesn't have
this issue, so shall I just document that in af_xdp.rst, or is there a more
proper way to do it?

Thanks,
Xiaolong
>
>-- 
>Kind regards,
>Luca Boccassi

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-03 15:14                         ` Ye Xiaolong
@ 2019-04-03 15:23                           ` Bruce Richardson
  2019-04-03 15:34                             ` Ye Xiaolong
  0 siblings, 1 reply; 214+ messages in thread
From: Bruce Richardson @ 2019-04-03 15:23 UTC (permalink / raw)
  To: Ye Xiaolong; +Cc: Luca Boccassi, Ferruh Yigit, dev

On Wed, Apr 03, 2019 at 11:14:58PM +0800, Ye Xiaolong wrote:
> On 04/03, Luca Boccassi wrote:
> >On Wed, 2019-04-03 at 22:43 +0800, Ye Xiaolong wrote:
> >> On 04/03, Luca Boccassi wrote:
> >> [snip]
> >> > > Got it.
> >> > > 
> >> > > In above steps, 'libbpf' also build from kernel source tree, will
> >> > > it
> >> > > be problem
> >> > > in you builds to not have it build from source?
> >> > > 
> >> > > If not, taking into account that xsk.h also will be fixed, only
> >> > > 'tools/include/asm/barrier.h' remains the problem, and it looks
> >> > > like
> >> > > it can be
> >> > > solved, please check above.
> >> > 
> >> > libbpf is already packaged separately in Debian and I think other
> >> > distros will follow soon, so it's all good for me once the barrier
> >> > issue is solved.
> >> > 
> >> > https://packages.debian.org/buster/libbpf-dev
> >> > 
> >> > 
> >> > From the makefile's perspective it should not matter where it comes
> >> > from - the headers should be expected to be in /usr/include and the
> >> > library in /usr/lib* - and pkg-config can help with that if
> >> > available.
> >> > And if a user wants to use a custom path, then it's no different
> >> > than
> >> > any of the other dependencies on other external libraries
> >> 
> >> From tools/lib/bpf/Makefile, after make install_lib and make
> >> install_headers,
> >> the headers and library would be put in /usr/local/include/bpf and
> >> /usr/local/lib*,
> >> Is it ok?
> >
> >Yes certainly that's fine, that's expected for local installations, and
> >users can specify a prefix with the upstream's makefile if they want to
> >install somewhere else.
> 
> In my local test, if I run `make install_lib` to install the libbpf.so to 
> /usr/local/lib64, `-lbpf` specified in af_xdp pmd still fails to find the library,
> the build would end up with a lot of undefined references which are defined in libbpf.
> It means during dpdk compilation, it won't search libraries in /usr/local/lib*, right?
> 
> Install the libbpf to /usr/lib64 via `make install_lib prefix=/usr` doesn't have 
> this issue, so shall I just document it in af_xdp.rst or there is other proper
> way to do it?
> 
At a guess I'd say you are using Fedora Linux, right? Fedora is unusual in
that it doesn't by default add /usr/local to the library and header search
paths, so you need to explicitly add them to your environment. Other distros
should work fine for this.

/Bruce

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-03 15:23                           ` Bruce Richardson
@ 2019-04-03 15:34                             ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-04-03 15:34 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: Luca Boccassi, Ferruh Yigit, dev

On 04/03, Bruce Richardson wrote:
>On Wed, Apr 03, 2019 at 11:14:58PM +0800, Ye Xiaolong wrote:
>> On 04/03, Luca Boccassi wrote:
>> >On Wed, 2019-04-03 at 22:43 +0800, Ye Xiaolong wrote:
>> >> On 04/03, Luca Boccassi wrote:
>> >> [snip]
>> >> > > Got it.
>> >> > > 
>> >> > > In above steps, 'libbpf' also build from kernel source tree, will
>> >> > > it
>> >> > > be problem
>> >> > > in you builds to not have it build from source?
>> >> > > 
>> >> > > If not, taking into account that xsk.h also will be fixed, only
>> >> > > 'tools/include/asm/barrier.h' remains the problem, and it looks
>> >> > > like
>> >> > > it can be
>> >> > > solved, please check above.
>> >> > 
>> >> > libbpf is already packaged separately in Debian and I think other
>> >> > distros will follow soon, so it's all good for me once the barrier
>> >> > issue is solved.
>> >> > 
>> >> > https://packages.debian.org/buster/libbpf-dev
>> >> > 
>> >> > 
>> >> > From the makefile's perspective it should not matter where it comes
>> >> > from - the headers should be expected to be in /usr/include and the
>> >> > library in /usr/lib* - and pkg-config can help with that if
>> >> > available.
>> >> > And if a user wants to use a custom path, then it's no different
>> >> > than
>> >> > any of the other dependencies on other external libraries
>> >> 
>> >> From tools/lib/bpf/Makefile, after make install_lib and make
>> >> install_headers,
>> >> the headers and library would be put in /usr/local/include/bpf and
>> >> /usr/local/lib*,
>> >> Is it ok?
>> >
>> >Yes certainly that's fine, that's expected for local installations, and
>> >users can specify a prefix with the upstream's makefile if they want to
>> >install somewhere else.
>> 
>> In my local test, if I run `make install_lib` to install the libbpf.so to 
>> /usr/local/lib64, `-lbpf` specified in af_xdp pmd still fails to find the library,
>> the build would end up with a lot of undefined references which are defined in libbpf.
>> It means during dpdk compilation, it won't search libraries in /usr/local/lib*, right?
>> 
>> Install the libbpf to /usr/lib64 via `make install_lib prefix=/usr` doesn't have 
>> this issue, so shall I just document it in af_xdp.rst or there is other proper
>> way to do it?
>> 
>At a guess I'd say you are using Fedora Linux, right? Fedora is unusual in
>that it doesn't by default add /usr/local to the library and header search
>paths so you need to explicitly add them to your environment. Other distros
>should work fine for this.

I am using CentOS 7.4; I guess it has the same issue as the Fedora one you
mentioned above.

Thanks,
Xiaolong
>
>/Bruce

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-03 14:22                   ` Ye Xiaolong
@ 2019-04-03 15:52                     ` Ferruh Yigit
  2019-04-03 15:57                       ` Ye Xiaolong
  0 siblings, 1 reply; 214+ messages in thread
From: Ferruh Yigit @ 2019-04-03 15:52 UTC (permalink / raw)
  To: Ye Xiaolong; +Cc: Luca Boccassi, dev, Karlsson Magnus, Topel Bjorn

On 4/3/2019 3:22 PM, Ye Xiaolong wrote:
> On 04/03, Ferruh Yigit wrote:
> [snip]
>>
>> It seems, 'tools/include/asm/barrier.h' is required for 'smp_wmb()' &
>> 'smp_rmb()' in 'xsk.h'.
>> We have equivalents of these in DPDK [1], and perhaps it can be possible to use
>> them and not include this header at all.
>>
>> in 'rte_eth_af_xdp.c', before including 'xsk.h', we can include an local
>> compatibility header which does following should work:
>> #define smp_rmb() rte_rmb()
>> #define smp_wmb() rte_wmb()
>>
>> @Xiaolong, what do you think?
> 
> It sounds perfect to me, I'll take it in my next version.
> Something to confirm, So we can now assume af_xdp pmd user would use kernel (say v5.1-rc4) 
> that contains fixes regarding to xsk.h and libelf, I still need to do following
> changes.
> 
> 1. I shall use <bpf/xsk.h> as xsk.h should be installed in system folders.
> 2. `-lelf` is not needed in rte.app.mk
> 3. I need to document the libbpf build and install steps in af_xdp.rst
> 4. add the above two defines before including xsk.h

Looks good to me.
Only for item 4), instead of putting those defines into the .c file directly, we
can create a private header in the driver folder, put those lines there (and I
assume it will need a few includes for rte_rmb as well), and include that header
before xsk.h.
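
A rough sketch of what that private header could contain (the file name and
guard are just examples here; 'rte_atomic.h' provides rte_rmb()/rte_wmb()):

  /* SPDX-License-Identifier: BSD-3-Clause */
  #ifndef AF_XDP_DEPS_H_
  #define AF_XDP_DEPS_H_

  #include <rte_atomic.h>

  /* emulate the barrier macros that xsk.h expects, using the DPDK
   * equivalents, to avoid depending on tools/include/asm/barrier.h */
  #define smp_rmb() rte_rmb()
  #define smp_wmb() rte_wmb()

  #endif /* AF_XDP_DEPS_H_ */

The .c file would then include this header before 'bpf/xsk.h'.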

> 
> Thanks,
> Xiaolong
> 
> 
>>
>> [1]
>> https://git.dpdk.org/dpdk/tree/lib/librte_eal/common/include/arch/x86/rte_atomic.h?h=v19.02#n30
>>
>>>
>>> The one in tools/include also is GPL-2.0 only so it cannot be included
>>> from the PMD, which is BSD-3-clause only (and it recursively includes
>>> the other arch-specific kernel headers)
>>>
>>>> Anyway, as Xiaolong mentioned, following is working, can it work from
>>>> a distro
>>>> point of view:
>>>> - get kernel source code (>= v5.1-rc1)
>>>> - build libbfp and install
>>>> - set 'RTE_KERNELDIR' to point kernel source path
>>>> - build dpdk with af_xdp enabled
>>>
>>> As long as the full kernel tree is required, we cannot enable it in
>>> Debian and Ubuntu - we can't have it at build time on the build
>>> workers, and also there's the licensing problem.
>>
>> Got it.
>>
>> In above steps, 'libbpf' also build from kernel source tree, will it be problem
>> in you builds to not have it build from source?
>>
>> If not, taking into account that xsk.h also will be fixed, only
>> 'tools/include/asm/barrier.h' remains the problem, and it looks like it can be
>> solved, please check above.
>>
>>
>>>
>>>>> Also, the license in asm/barrier.h is GPL-2.0 only. It is not a
>>>>> userspace header so it is not covered by the userspace exception,
>>>>> which
>>>>> means at the very least the af_xdp PMD shared object is also
>>>>> licensed
>>>>> under GPL-2.0 only, isn't it?
>>>>>
>>>>
>>>>
>>

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-03 15:52                     ` Ferruh Yigit
@ 2019-04-03 15:57                       ` Ye Xiaolong
  2019-04-17 12:30                         ` [dpdk-dev] [BUG] net/af_xdp: Current code can only create one af_xdp device Markus Theil
  0 siblings, 1 reply; 214+ messages in thread
From: Ye Xiaolong @ 2019-04-03 15:57 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Luca Boccassi, dev, Karlsson Magnus, Topel Bjorn

On 04/03, Ferruh Yigit wrote:
>On 4/3/2019 3:22 PM, Ye Xiaolong wrote:
>> On 04/03, Ferruh Yigit wrote:
>> [snip]
>>>
>>> It seems, 'tools/include/asm/barrier.h' is required for 'smp_wmb()' &
>>> 'smp_rmb()' in 'xsk.h'.
>>> We have equivalents of these in DPDK [1], and perhaps it can be possible to use
>>> them and not include this header at all.
>>>
>>> in 'rte_eth_af_xdp.c', before including 'xsk.h', we can include an local
>>> compatibility header which does following should work:
>>> #define smp_rmb() rte_rmb()
>>> #define smp_wmb() rte_wmb()
>>>
>>> @Xiaolong, what do you think?
>> 
>> It sounds perfect to me, I'll take it in my next version.
>> Something to confirm, So we can now assume af_xdp pmd user would use kernel (say v5.1-rc4) 
>> that contains fixes regarding to xsk.h and libelf, I still need to do following
>> changes.
>> 
>> 1. I shall use <bpf/xsk.h> as xsk.h should be installed in system folders.
>> 2. `-lelf` is not needed in rte.app.mk
>> 3. I need to document the libbpf build and install steps in af_xdp.rst
>> 4. add the above two defines before including xsk.h
>
>Looks good to me,
>only for item 4) instead of putting those defines into .c file directly, can
>create a private header in driver folder, put those lines and I assume will need
>a few includes for rte_rmb as well, and include that header before xsk.h.

Sounds better, will do.

Thanks,
Xiaolong

>
>> 
>> Thanks,
>> Xiaolong
>> 
>> 
>>>
>>> [1]
>>> https://git.dpdk.org/dpdk/tree/lib/librte_eal/common/include/arch/x86/rte_atomic.h?h=v19.02#n30
>>>
>>>>
>>>> The one in tools/include also is GPL-2.0 only so it cannot be included
>>>> from the PMD, which is BSD-3-clause only (and it recursively includes
>>>> the other arch-specific kernel headers)
>>>>
>>>>> Anyway, as Xiaolong mentioned, following is working, can it work from
>>>>> a distro
>>>>> point of view:
>>>>> - get kernel source code (>= v5.1-rc1)
>>>>> - build libbfp and install
>>>>> - set 'RTE_KERNELDIR' to point kernel source path
>>>>> - build dpdk with af_xdp enabled
>>>>
>>>> As long as the full kernel tree is required, we cannot enable it in
>>>> Debian and Ubuntu - we can't have it at build time on the build
>>>> workers, and also there's the licensing problem.
>>>
>>> Got it.
>>>
>>> In above steps, 'libbpf' also build from kernel source tree, will it be problem
>>> in you builds to not have it build from source?
>>>
>>> If not, taking into account that xsk.h also will be fixed, only
>>> 'tools/include/asm/barrier.h' remains the problem, and it looks like it can be
>>> solved, please check above.
>>>
>>>
>>>>
>>>>>> Also, the license in asm/barrier.h is GPL-2.0 only. It is not a
>>>>>> userspace header so it is not covered by the userspace exception,
>>>>>> which
>>>>>> means at the very least the af_xdp PMD shared object is also
>>>>>> licensed
>>>>>> under GPL-2.0 only, isn't it?
>>>>>>
>>>>>
>>>>>
>>>
>

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v10 0/1] Introduce AF_XDP PMD
  2019-03-01  8:09 [PATCH v1 0/6] Introduce AF_XDP PMD Xiaolong Ye
                   ` (14 preceding siblings ...)
  2019-04-02 15:46 ` [PATCH v9 0/1] Introduce AF_XDP PMD Xiaolong Ye
@ 2019-04-03 16:59 ` Xiaolong Ye
  2019-04-03 16:59   ` [PATCH v10 1/1] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
  2019-04-04  8:51 ` [PATCH v11 0/1] Introduce AF_XDP PMD Xiaolong Ye
  16 siblings, 1 reply; 214+ messages in thread
From: Xiaolong Ye @ 2019-04-03 16:59 UTC (permalink / raw)
  To: dev, Stephen Hemminger, Ferruh Yigit, Luca Boccassi
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Bruce Richardson, Ananyev Konstantin, David Marchand,
	Andrew Rybchenko, Olivier Matz, Xiaolong Ye

Overview
========

This patchset adds a new PMD driver for AF_XDP, which is a proposed
faster version of the AF_PACKET interface in Linux; see links [1] [2] for
an introduction to AF_XDP.

AF_XDP roadmap
==============
- AF_XDP has been included in the upstream kernel since 4.18, and AF_XDP
  support in libbpf was merged in v5.1-rc1.
- The i40e and ixgbe drivers now support zero copy mode.

Change logs
===========
v10:

- refine the Makefile, remove RTE_KERNELDIR-related CFLAGS
- add a new internal file af_xdp_deps.h to handle the dependency on
  asm/barrier.h
- fix a typo observed by Stephen
- change the xsk.h include to bpf/xsk.h, as xsk.h is assumed to be installed
  in /usr/local/include/bpf
- add libbpf build steps in af_xdp.rst


v9:
- adjust header files order according to Stephen's suggestion

v8:
- address Ferruh's comments on V7
- replace posix_memalign with rte_memzone_reserve_aligned to get better
  performance
- keep the first patch only, as Olivier suggested, since the zero copy
  implementation is still undecided; we may provide the related patch
  later.

v7:
- mention mtu limitation in af_xdp.rst
- fix the vdev name in af_xdp.rst

V6:

- remove the newline in AF_XDP_LOG definition to avoid double new lines
  issue.
- rename MEMPOOL_F_PAGE_ALIGN to MEMPOOL_CHUNK_F_PAGE_ALIGN.

V5:

- disable the AF_XDP pmd by default, as it requires a kernel more recent
  than the minimum kernel version supported by DPDK
- address other review comments of Maxime

V4:

- change vdev name to net_af_xdp
- adopt dynamic log type for all logging
- Fix other style issues raised by Stephen

V3:

- Fix all style issues pointed by Stephen, Mattias, David.
- Drop the testpmd change as we'll adopt Jerin's suggestion to add a new
  mempool driver to handle the application use of AF_XDP's zero copy feature.

V2:
- Fix a NULL pointer reference crash issue
- Fix txonly stop sending traffic in zc mode
- Refactor rte_mbuf.c to avoid ABI breakage.
- Fix multiple style issues pointed by Ferruh, David, Stephen, Luca.

changes vs the RFC sent by Qi last Aug:

- Rework based on AF_XDP's interface changes, since the new libbpf now
  provides higher-level APIs that hide many of the details of the AF_XDP
  uapi. After the rework, 300+ lines of code are saved.

- Multi-queue is not supported, because the current libbpf doesn't support
  multiple sockets on a single umem.

- No extra steps are needed to load an xdp program manually, since the current
  behavior of libbpf is to load a default xdp program when the user calls
  xsk_socket__create; the userspace application only needs to handle the cleanup.

How to try
==========

1. take a kernel >= v5.1-rc1, build it, and replace your host
   kernel with it.
   
   make sure you turn on XDP sockets when compiling

   Networking support -->
        Networking options -->
                [ * ] XDP sockets
   
2. build & install libbpf in tools/lib/bpf

   cd tools/lib/bpf
   make install_lib
   make install_headers

3. ethtool -L enp59s0f0 combined 1 (the PMD currently supports a single
   queue, so reduce the interface to one combined channel)

4. extra step to build dpdk

   explicitly enable the AF_XDP pmd by adding the below line to
   config/common_linux

   CONFIG_RTE_LIBRTE_PMD_AF_XDP=y

5. start testpmd

   ./build/app/testpmd -c 0xc -n 4 --vdev net_af_xdp,iface=enp59s0f0,queue=0 -- -i --rxq=1 --txq=1

    in this case, the default xdp program will be loaded and linked to queue 0 of enp59s0f0;
    network traffic arriving at queue 0 will be redirected to the af_xdp socket.

Xiaolong Ye (1):
  net/af_xdp: introduce AF XDP PMD driver

 MAINTAINERS                                   |   7 +
 config/common_base                            |   5 +
 doc/guides/nics/af_xdp.rst                    |  50 +
 doc/guides/nics/features/af_xdp.ini           |  11 +
 doc/guides/nics/index.rst                     |   1 +
 doc/guides/rel_notes/release_19_05.rst        |   7 +
 drivers/net/Makefile                          |   1 +
 drivers/net/af_xdp/Makefile                   |  28 +
 drivers/net/af_xdp/af_xdp_deps.h              |  14 +
 drivers/net/af_xdp/meson.build                |  21 +
 drivers/net/af_xdp/rte_eth_af_xdp.c           | 955 ++++++++++++++++++
 drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   3 +
 drivers/net/meson.build                       |   1 +
 mk/rte.app.mk                                 |   1 +
 14 files changed, 1105 insertions(+)
 create mode 100644 doc/guides/nics/af_xdp.rst
 create mode 100644 doc/guides/nics/features/af_xdp.ini
 create mode 100644 drivers/net/af_xdp/Makefile
 create mode 100644 drivers/net/af_xdp/af_xdp_deps.h
 create mode 100644 drivers/net/af_xdp/meson.build
 create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
 create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map

-- 
2.17.1

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v10 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-03 16:59 ` [PATCH v10 0/1] Introduce AF_XDP PMD Xiaolong Ye
@ 2019-04-03 16:59   ` Xiaolong Ye
  2019-04-03 17:32     ` Luca Boccassi
  2019-04-03 17:44     ` Ferruh Yigit
  0 siblings, 2 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-04-03 16:59 UTC (permalink / raw)
  To: dev, Stephen Hemminger, Ferruh Yigit, Luca Boccassi
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Bruce Richardson, Ananyev Konstantin, David Marchand,
	Andrew Rybchenko, Olivier Matz, Xiaolong Ye

Add a new PMD driver for AF_XDP, which is a proposed faster version of the
AF_PACKET interface in Linux. For more info about AF_XDP, please refer to
[1] [2].

This is the vanilla version of the PMD, which just uses a raw buffer
registered as the umem.

[1] https://fosdem.org/2018/schedule/event/af_xdp/
[2] https://lwn.net/Articles/745934/

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 MAINTAINERS                                   |   7 +
 config/common_base                            |   5 +
 doc/guides/nics/af_xdp.rst                    |  50 +
 doc/guides/nics/features/af_xdp.ini           |  11 +
 doc/guides/nics/index.rst                     |   1 +
 doc/guides/rel_notes/release_19_05.rst        |   7 +
 drivers/net/Makefile                          |   1 +
 drivers/net/af_xdp/Makefile                   |  28 +
 drivers/net/af_xdp/af_xdp_deps.h              |  14 +
 drivers/net/af_xdp/meson.build                |  21 +
 drivers/net/af_xdp/rte_eth_af_xdp.c           | 955 ++++++++++++++++++
 drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   3 +
 drivers/net/meson.build                       |   1 +
 mk/rte.app.mk                                 |   1 +
 14 files changed, 1105 insertions(+)
 create mode 100644 doc/guides/nics/af_xdp.rst
 create mode 100644 doc/guides/nics/features/af_xdp.ini
 create mode 100644 drivers/net/af_xdp/Makefile
 create mode 100644 drivers/net/af_xdp/af_xdp_deps.h
 create mode 100644 drivers/net/af_xdp/meson.build
 create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
 create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 71ac8cd4b..f30572c07 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -489,6 +489,13 @@ M: John W. Linville <linville@tuxdriver.com>
 F: drivers/net/af_packet/
 F: doc/guides/nics/features/afpacket.ini
 
+Linux AF_XDP
+M: Xiaolong Ye <xiaolong.ye@intel.com>
+M: Qi Zhang <qi.z.zhang@intel.com>
+F: drivers/net/af_xdp/
+F: doc/guides/nics/af_xdp.rst
+F: doc/guides/nics/features/af_xdp.ini
+
 Amazon ENA
 M: Marcin Wojtas <mw@semihalf.com>
 M: Michal Krawczyk <mk@semihalf.com>
diff --git a/config/common_base b/config/common_base
index 6292bc4af..b95ee03d7 100644
--- a/config/common_base
+++ b/config/common_base
@@ -430,6 +430,11 @@ CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX_FREE=n
 #
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
 
+#
+# Compile software PMD backed by AF_XDP sockets (Linux only)
+#
+CONFIG_RTE_LIBRTE_PMD_AF_XDP=n
+
 #
 # Compile link bonding PMD library
 #
diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
new file mode 100644
index 000000000..175038153
--- /dev/null
+++ b/doc/guides/nics/af_xdp.rst
@@ -0,0 +1,50 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2019 Intel Corporation.
+
+AF_XDP Poll Mode Driver
+==========================
+
+AF_XDP is an address family that is optimized for high performance
+packet processing. AF_XDP sockets enable an XDP program to redirect
+packets to a memory buffer in userspace.
+
+For full details of AF_XDP sockets, refer to the
+`AF_XDP documentation in the Kernel
+<https://www.kernel.org/doc/Documentation/networking/af_xdp.rst>`_.
+
+This Linux-specific PMD creates an AF_XDP socket and binds it to a
+specific netdev queue. It allows a DPDK application to send and receive
+raw packets through the socket, bypassing the kernel network stack.
+The current implementation supports only a single queue; multi-queue
+support will be added later.
+
+Note that the MTU of the AF_XDP PMD is limited because XDP lacks
+support for fragmentation.
+
+Options
+-------
+
+The following options can be provided to set up an af_xdp port in DPDK.
+
+*   ``iface`` - name of the Kernel interface to attach to (required);
+*   ``queue`` - netdev queue id (optional, default 0);
+
+Prerequisites
+-------------
+
+This is a Linux-specific PMD, thus the following prerequisites apply:
+
+*  A Linux kernel (version > v4.18) with XDP sockets configuration enabled;
+*  libbpf (from kernel version > v5.1-rc4) with the latest AF_XDP support.
+   libbpf can be installed via `make install_lib` && `make install_headers`
+   in <kernel src tree>/tools/lib/bpf;
+*  A kernel-bound interface to attach to;
+
+Set up an af_xdp interface
+-----------------------------
+
+The following example will set up an af_xdp interface in DPDK:
+
+.. code-block:: console
+
+    --vdev net_af_xdp,iface=ens786f1,queue=0
diff --git a/doc/guides/nics/features/af_xdp.ini b/doc/guides/nics/features/af_xdp.ini
new file mode 100644
index 000000000..36953c2de
--- /dev/null
+++ b/doc/guides/nics/features/af_xdp.ini
@@ -0,0 +1,11 @@
+;
+; Supported features of the 'af_xdp' network poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Link status          = Y
+MTU update           = Y
+Promiscuous mode     = Y
+Stats per queue      = Y
+x86-64               = Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 5c80e3baa..a4b80a3d0 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -12,6 +12,7 @@ Network Interface Controller Drivers
     features
     build_and_test
     af_packet
+    af_xdp
     ark
     atlantic
     avp
diff --git a/doc/guides/rel_notes/release_19_05.rst b/doc/guides/rel_notes/release_19_05.rst
index bdad1ddbe..79e36739f 100644
--- a/doc/guides/rel_notes/release_19_05.rst
+++ b/doc/guides/rel_notes/release_19_05.rst
@@ -74,6 +74,13 @@ New Features
     process.
   * Added support for Rx packet types list in a secondary process.
 
+* **Added the AF_XDP PMD.**
+
+  Added a Linux-specific PMD for AF_XDP. It creates an AF_XDP socket and
+  binds it to a specific netdev queue, allowing a DPDK application to send
+  and receive raw packets through the socket while bypassing the kernel
+  network stack, for high performance packet processing.
+
 * **Updated Mellanox drivers.**
 
    New features and improvements were done in mlx4 and mlx5 PMDs:
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 502869a87..5d401b8c5 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -9,6 +9,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD),d)
 endif
 
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += af_packet
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += af_xdp
 DIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark
 DIRS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD) += atlantic
 DIRS-$(CONFIG_RTE_LIBRTE_AVP_PMD) += avp
diff --git a/drivers/net/af_xdp/Makefile b/drivers/net/af_xdp/Makefile
new file mode 100644
index 000000000..eeba6b693
--- /dev/null
+++ b/drivers/net/af_xdp/Makefile
@@ -0,0 +1,28 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Intel Corporation
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_af_xdp.a
+
+EXPORT_MAP := rte_pmd_af_xdp_version.map
+
+LIBABIVER := 1
+
+CFLAGS += -O3
+
+CFLAGS += $(WERROR_FLAGS)
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
+LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
+LDLIBS += -lrte_bus_vdev
+LDLIBS += $(shell command -v pkg-config > /dev/null 2>&1 && pkg-config --libs libbpf || echo "-lbpf")
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += rte_eth_af_xdp.c
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/af_xdp/af_xdp_deps.h b/drivers/net/af_xdp/af_xdp_deps.h
new file mode 100644
index 000000000..8c8f8b6ab
--- /dev/null
+++ b/drivers/net/af_xdp/af_xdp_deps.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation.
+ */
+
+#ifndef AF_XDP_DEPS_H_
+#define AF_XDP_DEPS_H_
+
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+
+#define smp_rmb() rte_rmb()
+#define smp_wmb() rte_wmb()
+
+#endif /* AF_XDP_DEPS_H_ */
diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
new file mode 100644
index 000000000..d40aae190
--- /dev/null
+++ b/drivers/net/af_xdp/meson.build
@@ -0,0 +1,21 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Intel Corporation
+
+if host_machine.system() != 'linux'
+	build = false
+endif
+
+bpf_dep = dependency('libbpf', required: false)
+if bpf_dep.found()
+	build = true
+else
+	bpf_dep = cc.find_library('libbpf', required: false)
+	if bpf_dep.found() and cc.has_header('xsk.h', dependencies: bpf_dep) and cc.has_header('linux/if_xdp.h')
+		build = true
+		pkgconfig_extra_libs += '-lbpf'
+	else
+		build = false
+	endif
+endif
+sources = files('rte_eth_af_xdp.c')
+ext_deps += bpf_dep
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
new file mode 100644
index 000000000..007a1c6b4
--- /dev/null
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -0,0 +1,955 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation.
+ */
+#include <unistd.h>
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+#include <netinet/in.h>
+#include <net/if.h>
+#include <sys/socket.h>
+#include <sys/ioctl.h>
+#include <linux/if_ether.h>
+#include <linux/if_xdp.h>
+#include <linux/if_link.h>
+#include "af_xdp_deps.h"
+#include <bpf/xsk.h>
+
+#include <rte_ethdev.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+#include <rte_string_fns.h>
+#include <rte_branch_prediction.h>
+#include <rte_common.h>
+#include <rte_config.h>
+#include <rte_dev.h>
+#include <rte_eal.h>
+#include <rte_ether.h>
+#include <rte_lcore.h>
+#include <rte_log.h>
+#include <rte_memory.h>
+#include <rte_memzone.h>
+#include <rte_mbuf.h>
+#include <rte_malloc.h>
+#include <rte_ring.h>
+
+#ifndef SOL_XDP
+#define SOL_XDP 283
+#endif
+
+#ifndef AF_XDP
+#define AF_XDP 44
+#endif
+
+#ifndef PF_XDP
+#define PF_XDP AF_XDP
+#endif
+
+static int af_xdp_logtype;
+
+#define AF_XDP_LOG(level, fmt, args...)			\
+	rte_log(RTE_LOG_ ## level, af_xdp_logtype,	\
+		"%s(): " fmt, __func__, ##args)
+
+#define ETH_AF_XDP_FRAME_SIZE		XSK_UMEM__DEFAULT_FRAME_SIZE
+#define ETH_AF_XDP_NUM_BUFFERS		4096
+#define ETH_AF_XDP_DATA_HEADROOM	0
+#define ETH_AF_XDP_DFLT_NUM_DESCS	XSK_RING_CONS__DEFAULT_NUM_DESCS
+#define ETH_AF_XDP_DFLT_QUEUE_IDX	0
+
+#define ETH_AF_XDP_RX_BATCH_SIZE	32
+#define ETH_AF_XDP_TX_BATCH_SIZE	32
+
+#define ETH_AF_XDP_MAX_QUEUE_PAIRS     16
+
+struct xsk_umem_info {
+	struct xsk_ring_prod fq;
+	struct xsk_ring_cons cq;
+	struct xsk_umem *umem;
+	struct rte_ring *buf_ring;
+	const struct rte_memzone *mz;
+};
+
+struct rx_stats {
+	uint64_t rx_pkts;
+	uint64_t rx_bytes;
+	uint64_t rx_dropped;
+};
+
+struct pkt_rx_queue {
+	struct xsk_ring_cons rx;
+	struct xsk_umem_info *umem;
+	struct xsk_socket *xsk;
+	struct rte_mempool *mb_pool;
+
+	struct rx_stats stats;
+
+	struct pkt_tx_queue *pair;
+	uint16_t queue_idx;
+};
+
+struct tx_stats {
+	uint64_t tx_pkts;
+	uint64_t err_pkts;
+	uint64_t tx_bytes;
+};
+
+struct pkt_tx_queue {
+	struct xsk_ring_prod tx;
+
+	struct tx_stats stats;
+
+	struct pkt_rx_queue *pair;
+	uint16_t queue_idx;
+};
+
+struct pmd_internals {
+	int if_index;
+	char if_name[IFNAMSIZ];
+	uint16_t queue_idx;
+	struct ether_addr eth_addr;
+	struct xsk_umem_info *umem;
+	struct rte_mempool *mb_pool_share;
+
+	struct pkt_rx_queue rx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
+	struct pkt_tx_queue tx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
+};
+
+#define ETH_AF_XDP_IFACE_ARG			"iface"
+#define ETH_AF_XDP_QUEUE_IDX_ARG		"queue"
+
+static const char * const valid_arguments[] = {
+	ETH_AF_XDP_IFACE_ARG,
+	ETH_AF_XDP_QUEUE_IDX_ARG,
+	NULL
+};
+
+static const struct rte_eth_link pmd_link = {
+	.link_speed = ETH_SPEED_NUM_10G,
+	.link_duplex = ETH_LINK_FULL_DUPLEX,
+	.link_status = ETH_LINK_DOWN,
+	.link_autoneg = ETH_LINK_AUTONEG
+};
+
+static inline int
+reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
+{
+	struct xsk_ring_prod *fq = &umem->fq;
+	uint32_t idx;
+	int i, ret;
+
+	ret = xsk_ring_prod__reserve(fq, reserve_size, &idx);
+	if (unlikely(!ret)) {
+		AF_XDP_LOG(ERR, "Failed to reserve enough fq descs.\n");
+		return ret;
+	}
+
+	for (i = 0; i < reserve_size; i++) {
+		__u64 *fq_addr;
+		void *addr = NULL;
+		if (rte_ring_dequeue(umem->buf_ring, &addr)) {
+			i--;
+			break;
+		}
+		fq_addr = xsk_ring_prod__fill_addr(fq, idx++);
+		*fq_addr = (uint64_t)addr;
+	}
+
+	xsk_ring_prod__submit(fq, i);
+
+	return 0;
+}
+
+static uint16_t
+eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+	struct pkt_rx_queue *rxq = queue;
+	struct xsk_ring_cons *rx = &rxq->rx;
+	struct xsk_umem_info *umem = rxq->umem;
+	struct xsk_ring_prod *fq = &umem->fq;
+	uint32_t idx_rx = 0;
+	uint32_t free_thresh = fq->size >> 1;
+	struct rte_mbuf *mbufs[ETH_AF_XDP_RX_BATCH_SIZE];
+	unsigned long dropped = 0;
+	unsigned long rx_bytes = 0;
+	uint16_t count = 0;
+	int rcvd, i;
+
+	nb_pkts = RTE_MIN(nb_pkts, ETH_AF_XDP_RX_BATCH_SIZE);
+
+	rcvd = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
+	if (rcvd == 0)
+		return 0;
+
+	if (xsk_prod_nb_free(fq, free_thresh) >= free_thresh)
+		(void)reserve_fill_queue(umem, ETH_AF_XDP_RX_BATCH_SIZE);
+
+	if (unlikely(rte_pktmbuf_alloc_bulk(rxq->mb_pool, mbufs, rcvd) != 0))
+		return 0;
+
+	for (i = 0; i < rcvd; i++) {
+		const struct xdp_desc *desc;
+		uint64_t addr;
+		uint32_t len;
+		void *pkt;
+
+		desc = xsk_ring_cons__rx_desc(rx, idx_rx++);
+		addr = desc->addr;
+		len = desc->len;
+		pkt = xsk_umem__get_data(rxq->umem->mz->addr, addr);
+
+		rte_memcpy(rte_pktmbuf_mtod(mbufs[i], void *), pkt, len);
+		rte_pktmbuf_pkt_len(mbufs[i]) = len;
+		rte_pktmbuf_data_len(mbufs[i]) = len;
+		rx_bytes += len;
+		bufs[count++] = mbufs[i];
+
+		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+	}
+
+	xsk_ring_cons__release(rx, rcvd);
+
+	/* statistics */
+	rxq->stats.rx_pkts += (rcvd - dropped);
+	rxq->stats.rx_bytes += rx_bytes;
+
+	return count;
+}
+
+static void
+pull_umem_cq(struct xsk_umem_info *umem, int size)
+{
+	struct xsk_ring_cons *cq = &umem->cq;
+	size_t i, n;
+	uint32_t idx_cq = 0;
+
+	n = xsk_ring_cons__peek(cq, size, &idx_cq);
+
+	for (i = 0; i < n; i++) {
+		uint64_t addr;
+		addr = *xsk_ring_cons__comp_addr(cq, idx_cq++);
+		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+	}
+
+	xsk_ring_cons__release(cq, n);
+}
+
+static void
+kick_tx(struct pkt_tx_queue *txq)
+{
+	struct xsk_umem_info *umem = txq->pair->umem;
+
+	while (send(xsk_socket__fd(txq->pair->xsk), NULL,
+		      0, MSG_DONTWAIT) < 0) {
+		/* something unexpected happened */
+		if (errno != EBUSY && errno != EAGAIN && errno != EINTR)
+			break;
+
+		/* pull from completion queue to leave more space */
+		if (errno == EAGAIN)
+			pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
+	}
+	pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
+}
+
+static uint16_t
+eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+	struct pkt_tx_queue *txq = queue;
+	struct xsk_umem_info *umem = txq->pair->umem;
+	struct rte_mbuf *mbuf;
+	void *addrs[ETH_AF_XDP_TX_BATCH_SIZE];
+	unsigned long tx_bytes = 0;
+	int i, valid = 0;
+	uint32_t idx_tx;
+
+	nb_pkts = RTE_MIN(nb_pkts, ETH_AF_XDP_TX_BATCH_SIZE);
+
+	pull_umem_cq(umem, nb_pkts);
+
+	nb_pkts = rte_ring_dequeue_bulk(umem->buf_ring, addrs,
+					nb_pkts, NULL);
+	if (nb_pkts == 0)
+		return 0;
+
+	if (xsk_ring_prod__reserve(&txq->tx, nb_pkts, &idx_tx) != nb_pkts) {
+		kick_tx(txq);
+		return 0;
+	}
+
+	for (i = 0; i < nb_pkts; i++) {
+		struct xdp_desc *desc;
+		void *pkt;
+		uint32_t buf_len = ETH_AF_XDP_FRAME_SIZE
+					- ETH_AF_XDP_DATA_HEADROOM;
+		desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i);
+		mbuf = bufs[i];
+		if (mbuf->pkt_len <= buf_len) {
+			desc->addr = (uint64_t)addrs[valid];
+			desc->len = mbuf->pkt_len;
+			pkt = xsk_umem__get_data(umem->mz->addr,
+						 desc->addr);
+			rte_memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
+			       desc->len);
+			valid++;
+			tx_bytes += mbuf->pkt_len;
+		}
+		rte_pktmbuf_free(mbuf);
+	}
+
+	xsk_ring_prod__submit(&txq->tx, nb_pkts);
+
+	kick_tx(txq);
+
+	if (valid < nb_pkts)
+		rte_ring_enqueue_bulk(umem->buf_ring, &addrs[valid],
+				 nb_pkts - valid, NULL);
+
+	txq->stats.err_pkts += nb_pkts - valid;
+	txq->stats.tx_pkts += valid;
+	txq->stats.tx_bytes += tx_bytes;
+
+	return nb_pkts;
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = ETH_LINK_UP;
+
+	return 0;
+}
+
+/* This function gets called when the current port gets stopped. */
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = ETH_LINK_DOWN;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev)
+{
+	/* rx/tx must be paired */
+	if (dev->data->nb_rx_queues != dev->data->nb_tx_queues)
+		return -EINVAL;
+
+	return 0;
+}
+
+static void
+eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	dev_info->if_index = internals->if_index;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_pktlen = ETH_FRAME_LEN;
+	dev_info->max_rx_queues = 1;
+	dev_info->max_tx_queues = 1;
+
+	dev_info->default_rxportconf.nb_queues = 1;
+	dev_info->default_txportconf.nb_queues = 1;
+	dev_info->default_rxportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
+	dev_info->default_txportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
+}
+
+static int
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct xdp_statistics xdp_stats;
+	struct pkt_rx_queue *rxq;
+	socklen_t optlen;
+	int i, ret;
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		optlen = sizeof(struct xdp_statistics);
+		rxq = &internals->rx_queues[i];
+		stats->q_ipackets[i] = internals->rx_queues[i].stats.rx_pkts;
+		stats->q_ibytes[i] = internals->rx_queues[i].stats.rx_bytes;
+
+		stats->q_opackets[i] = internals->tx_queues[i].stats.tx_pkts;
+		stats->q_obytes[i] = internals->tx_queues[i].stats.tx_bytes;
+
+		stats->ipackets += stats->q_ipackets[i];
+		stats->ibytes += stats->q_ibytes[i];
+		stats->imissed += internals->rx_queues[i].stats.rx_dropped;
+		ret = getsockopt(xsk_socket__fd(rxq->xsk), SOL_XDP,
+				XDP_STATISTICS, &xdp_stats, &optlen);
+		if (ret != 0) {
+			AF_XDP_LOG(ERR, "getsockopt() failed for XDP_STATISTICS.\n");
+			return -1;
+		}
+		stats->imissed += xdp_stats.rx_dropped;
+
+		stats->opackets += stats->q_opackets[i];
+		stats->oerrors += internals->tx_queues[i].stats.err_pkts;
+		stats->obytes += stats->q_obytes[i];
+	}
+
+	return 0;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	int i;
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		memset(&internals->rx_queues[i].stats, 0,
+					sizeof(struct rx_stats));
+		memset(&internals->tx_queues[i].stats, 0,
+					sizeof(struct tx_stats));
+	}
+}
+
+static void
+remove_xdp_program(struct pmd_internals *internals)
+{
+	uint32_t curr_prog_id = 0;
+
+	if (bpf_get_link_xdp_id(internals->if_index, &curr_prog_id,
+				XDP_FLAGS_UPDATE_IF_NOEXIST)) {
+		AF_XDP_LOG(ERR, "bpf_get_link_xdp_id failed\n");
+		return;
+	}
+	bpf_set_link_xdp_fd(internals->if_index, -1,
+			XDP_FLAGS_UPDATE_IF_NOEXIST);
+}
+
+static void
+eth_dev_close(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct pkt_rx_queue *rxq;
+	int i;
+
+	AF_XDP_LOG(INFO, "Closing AF_XDP ethdev on numa socket %u\n",
+		rte_socket_id());
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		rxq = &internals->rx_queues[i];
+		if (rxq->umem == NULL)
+			break;
+		xsk_socket__delete(rxq->xsk);
+	}
+
+	(void)xsk_umem__delete(internals->umem->umem);
+	remove_xdp_program(internals);
+}
+
+static void
+eth_queue_release(void *q __rte_unused)
+{
+}
+
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+		int wait_to_complete __rte_unused)
+{
+	return 0;
+}
+
+static void
+xdp_umem_destroy(struct xsk_umem_info *umem)
+{
+	rte_memzone_free(umem->mz);
+	umem->mz = NULL;
+
+	rte_ring_free(umem->buf_ring);
+	umem->buf_ring = NULL;
+
+	rte_free(umem);
+	umem = NULL;
+}
+
+static struct
+xsk_umem_info *xdp_umem_configure(void)
+{
+	struct xsk_umem_info *umem;
+	const struct rte_memzone *mz;
+	struct xsk_umem_config usr_config = {
+		.fill_size = ETH_AF_XDP_DFLT_NUM_DESCS,
+		.comp_size = ETH_AF_XDP_DFLT_NUM_DESCS,
+		.frame_size = ETH_AF_XDP_FRAME_SIZE,
+		.frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
+	int ret;
+	uint64_t i;
+
+	umem = rte_zmalloc_socket("umem", sizeof(*umem), 0, rte_socket_id());
+	if (umem == NULL) {
+		AF_XDP_LOG(ERR, "Failed to allocate umem info");
+		return NULL;
+	}
+
+	umem->buf_ring = rte_ring_create("af_xdp_ring",
+					 ETH_AF_XDP_NUM_BUFFERS,
+					 rte_socket_id(),
+					 0x0);
+	if (umem->buf_ring == NULL) {
+		AF_XDP_LOG(ERR, "Failed to create rte_ring\n");
+		goto err;
+	}
+
+	for (i = 0; i < ETH_AF_XDP_NUM_BUFFERS; i++)
+		rte_ring_enqueue(umem->buf_ring,
+				 (void *)(i * ETH_AF_XDP_FRAME_SIZE +
+					  ETH_AF_XDP_DATA_HEADROOM));
+
+	mz = rte_memzone_reserve_aligned("af_xdp umem",
+			ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
+			rte_socket_id(), RTE_MEMZONE_IOVA_CONTIG,
+			getpagesize());
+	if (mz == NULL) {
+		AF_XDP_LOG(ERR, "Failed to reserve memzone for af_xdp umem.\n");
+		goto err;
+	}
+
+	ret = xsk_umem__create(&umem->umem, mz->addr,
+			       ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
+			       &umem->fq, &umem->cq,
+			       &usr_config);
+
+	if (ret) {
+		AF_XDP_LOG(ERR, "Failed to create umem");
+		goto err;
+	}
+	umem->mz = mz;
+
+	return umem;
+
+err:
+	xdp_umem_destroy(umem);
+	return NULL;
+}
+
+static int
+xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
+	      int ring_size)
+{
+	struct xsk_socket_config cfg;
+	struct pkt_tx_queue *txq = rxq->pair;
+	int ret = 0;
+	int reserve_size;
+
+	rxq->umem = xdp_umem_configure();
+	if (rxq->umem == NULL)
+		return -ENOMEM;
+
+	cfg.rx_size = ring_size;
+	cfg.tx_size = ring_size;
+	cfg.libbpf_flags = 0;
+	cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
+	cfg.bind_flags = 0;
+	ret = xsk_socket__create(&rxq->xsk, internals->if_name,
+			internals->queue_idx, rxq->umem->umem, &rxq->rx,
+			&txq->tx, &cfg);
+	if (ret) {
+		AF_XDP_LOG(ERR, "Failed to create xsk socket.\n");
+		goto err;
+	}
+
+	reserve_size = ETH_AF_XDP_DFLT_NUM_DESCS / 2;
+	ret = reserve_fill_queue(rxq->umem, reserve_size);
+	if (ret) {
+		xsk_socket__delete(rxq->xsk);
+		AF_XDP_LOG(ERR, "Failed to reserve fill queue.\n");
+		goto err;
+	}
+
+	return 0;
+
+err:
+	xdp_umem_destroy(rxq->umem);
+
+	return ret;
+}
+
+static void
+queue_reset(struct pmd_internals *internals, uint16_t queue_idx)
+{
+	struct pkt_rx_queue *rxq = &internals->rx_queues[queue_idx];
+	struct pkt_tx_queue *txq = rxq->pair;
+
+	memset(rxq, 0, sizeof(*rxq));
+	memset(txq, 0, sizeof(*txq));
+	rxq->pair = txq;
+	txq->pair = rxq;
+	rxq->queue_idx = queue_idx;
+	txq->queue_idx = queue_idx;
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev,
+		   uint16_t rx_queue_id,
+		   uint16_t nb_rx_desc,
+		   unsigned int socket_id __rte_unused,
+		   const struct rte_eth_rxconf *rx_conf __rte_unused,
+		   struct rte_mempool *mb_pool)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	uint32_t buf_size, data_size;
+	struct pkt_rx_queue *rxq;
+	int ret;
+
+	rxq = &internals->rx_queues[rx_queue_id];
+	queue_reset(internals, rx_queue_id);
+
+	/* Now get the space available for data in the mbuf */
+	buf_size = rte_pktmbuf_data_room_size(mb_pool) -
+		RTE_PKTMBUF_HEADROOM;
+	data_size = ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_DATA_HEADROOM;
+
+	if (data_size > buf_size) {
+		AF_XDP_LOG(ERR, "%s: %d bytes will not fit in mbuf (%d bytes)\n",
+			dev->device->name, data_size, buf_size);
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	rxq->mb_pool = mb_pool;
+
+	if (xsk_configure(internals, rxq, nb_rx_desc)) {
+		AF_XDP_LOG(ERR, "Failed to configure xdp socket\n");
+		ret = -EINVAL;
+		goto err;
+	}
+
+	internals->umem = rxq->umem;
+
+	dev->data->rx_queues[rx_queue_id] = rxq;
+	return 0;
+
+err:
+	queue_reset(internals, rx_queue_id);
+	return ret;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev,
+		   uint16_t tx_queue_id,
+		   uint16_t nb_tx_desc __rte_unused,
+		   unsigned int socket_id __rte_unused,
+		   const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct pkt_tx_queue *txq;
+
+	txq = &internals->tx_queues[tx_queue_id];
+
+	dev->data->tx_queues[tx_queue_id] = txq;
+	return 0;
+}
+
+static int
+eth_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct ifreq ifr = { .ifr_mtu = mtu };
+	int ret;
+	int s;
+
+	s = socket(PF_INET, SOCK_DGRAM, 0);
+	if (s < 0)
+		return -EINVAL;
+
+	strlcpy(ifr.ifr_name, internals->if_name, IFNAMSIZ);
+	ret = ioctl(s, SIOCSIFMTU, &ifr);
+	close(s);
+
+	return (ret < 0) ? -errno : 0;
+}
+
+static void
+eth_dev_change_flags(char *if_name, uint32_t flags, uint32_t mask)
+{
+	struct ifreq ifr;
+	int s;
+
+	s = socket(PF_INET, SOCK_DGRAM, 0);
+	if (s < 0)
+		return;
+
+	strlcpy(ifr.ifr_name, if_name, IFNAMSIZ);
+	if (ioctl(s, SIOCGIFFLAGS, &ifr) < 0)
+		goto out;
+	ifr.ifr_flags &= mask;
+	ifr.ifr_flags |= flags;
+	if (ioctl(s, SIOCSIFFLAGS, &ifr) < 0)
+		goto out;
+out:
+	close(s);
+}
+
+static void
+eth_dev_promiscuous_enable(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	eth_dev_change_flags(internals->if_name, IFF_PROMISC, ~0);
+}
+
+static void
+eth_dev_promiscuous_disable(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	eth_dev_change_flags(internals->if_name, 0, ~IFF_PROMISC);
+}
+
+static const struct eth_dev_ops ops = {
+	.dev_start = eth_dev_start,
+	.dev_stop = eth_dev_stop,
+	.dev_close = eth_dev_close,
+	.dev_configure = eth_dev_configure,
+	.dev_infos_get = eth_dev_info,
+	.mtu_set = eth_dev_mtu_set,
+	.promiscuous_enable = eth_dev_promiscuous_enable,
+	.promiscuous_disable = eth_dev_promiscuous_disable,
+	.rx_queue_setup = eth_rx_queue_setup,
+	.tx_queue_setup = eth_tx_queue_setup,
+	.rx_queue_release = eth_queue_release,
+	.tx_queue_release = eth_queue_release,
+	.link_update = eth_link_update,
+	.stats_get = eth_stats_get,
+	.stats_reset = eth_stats_reset,
+};
+
+/** parse integer from integer argument */
+static int
+parse_integer_arg(const char *key __rte_unused,
+		  const char *value, void *extra_args)
+{
+	int *i = (int *)extra_args;
+	char *end;
+
+	*i = strtol(value, &end, 10);
+	if (*i < 0) {
+		AF_XDP_LOG(ERR, "Argument has to be positive.\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/** parse name argument */
+static int
+parse_name_arg(const char *key __rte_unused,
+	       const char *value, void *extra_args)
+{
+	char *name = extra_args;
+
+	if (strnlen(value, IFNAMSIZ) > IFNAMSIZ - 1) {
+		AF_XDP_LOG(ERR, "Invalid name %s, should be less than %u bytes.\n",
+			   value, IFNAMSIZ);
+		return -EINVAL;
+	}
+
+	strlcpy(name, value, IFNAMSIZ);
+
+	return 0;
+}
+
+static int
+parse_parameters(struct rte_kvargs *kvlist,
+		 char *if_name,
+		 int *queue_idx)
+{
+	int ret;
+
+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_IFACE_ARG,
+				 &parse_name_arg, if_name);
+	if (ret < 0)
+		goto free_kvlist;
+
+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_QUEUE_IDX_ARG,
+				 &parse_integer_arg, queue_idx);
+	if (ret < 0)
+		goto free_kvlist;
+
+free_kvlist:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
+static int
+get_iface_info(const char *if_name,
+	       struct ether_addr *eth_addr,
+	       int *if_index)
+{
+	struct ifreq ifr;
+	int sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_IP);
+
+	if (sock < 0)
+		return -1;
+
+	strlcpy(ifr.ifr_name, if_name, IFNAMSIZ);
+	if (ioctl(sock, SIOCGIFINDEX, &ifr))
+		goto error;
+
+	*if_index = ifr.ifr_ifindex;
+
+	if (ioctl(sock, SIOCGIFHWADDR, &ifr))
+		goto error;
+
+	rte_memcpy(eth_addr, ifr.ifr_hwaddr.sa_data, ETHER_ADDR_LEN);
+
+	close(sock);
+	return 0;
+
+error:
+	close(sock);
+	return -1;
+}
+
+static struct rte_eth_dev *
+init_internals(struct rte_vdev_device *dev,
+	       const char *if_name,
+	       int queue_idx)
+{
+	const char *name = rte_vdev_device_name(dev);
+	const unsigned int numa_node = dev->device.numa_node;
+	struct pmd_internals *internals;
+	struct rte_eth_dev *eth_dev;
+	int ret;
+	int i;
+
+	internals = rte_zmalloc_socket(name, sizeof(*internals), 0, numa_node);
+	if (internals == NULL)
+		return NULL;
+
+	internals->queue_idx = queue_idx;
+	strlcpy(internals->if_name, if_name, IFNAMSIZ);
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		internals->tx_queues[i].pair = &internals->rx_queues[i];
+		internals->rx_queues[i].pair = &internals->tx_queues[i];
+	}
+
+	ret = get_iface_info(if_name, &internals->eth_addr,
+			     &internals->if_index);
+	if (ret)
+		goto err;
+
+	eth_dev = rte_eth_vdev_allocate(dev, 0);
+	if (eth_dev == NULL)
+		goto err;
+
+	eth_dev->data->dev_private = internals;
+	eth_dev->data->dev_link = pmd_link;
+	eth_dev->data->mac_addrs = &internals->eth_addr;
+	eth_dev->dev_ops = &ops;
+	eth_dev->rx_pkt_burst = eth_af_xdp_rx;
+	eth_dev->tx_pkt_burst = eth_af_xdp_tx;
+
+	return eth_dev;
+
+err:
+	rte_free(internals);
+	return NULL;
+}
+
+static int
+rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
+{
+	struct rte_kvargs *kvlist;
+	char if_name[IFNAMSIZ] = {'\0'};
+	int xsk_queue_idx = ETH_AF_XDP_DFLT_QUEUE_IDX;
+	struct rte_eth_dev *eth_dev = NULL;
+	const char *name;
+
+	AF_XDP_LOG(INFO, "Initializing pmd_af_xdp for %s\n",
+		rte_vdev_device_name(dev));
+
+	name = rte_vdev_device_name(dev);
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY &&
+		strlen(rte_vdev_device_args(dev)) == 0) {
+		eth_dev = rte_eth_dev_attach_secondary(name);
+		if (eth_dev == NULL) {
+			AF_XDP_LOG(ERR, "Failed to probe %s\n", name);
+			return -EINVAL;
+		}
+		eth_dev->dev_ops = &ops;
+		rte_eth_dev_probing_finish(eth_dev);
+		return 0;
+	}
+
+	kvlist = rte_kvargs_parse(rte_vdev_device_args(dev), valid_arguments);
+	if (kvlist == NULL) {
+		AF_XDP_LOG(ERR, "Invalid kvargs key\n");
+		return -EINVAL;
+	}
+
+	if (dev->device.numa_node == SOCKET_ID_ANY)
+		dev->device.numa_node = rte_socket_id();
+
+	if (parse_parameters(kvlist, if_name, &xsk_queue_idx) < 0) {
+		AF_XDP_LOG(ERR, "Invalid kvargs value\n");
+		return -EINVAL;
+	}
+
+	if (strlen(if_name) == 0) {
+		AF_XDP_LOG(ERR, "Network interface must be specified\n");
+		return -EINVAL;
+	}
+
+	eth_dev = init_internals(dev, if_name, xsk_queue_idx);
+	if (eth_dev == NULL) {
+		AF_XDP_LOG(ERR, "Failed to init internals\n");
+		return -1;
+	}
+
+	rte_eth_dev_probing_finish(eth_dev);
+
+	return 0;
+}
+
+static int
+rte_pmd_af_xdp_remove(struct rte_vdev_device *dev)
+{
+	struct rte_eth_dev *eth_dev = NULL;
+	struct pmd_internals *internals;
+
+	AF_XDP_LOG(INFO, "Removing AF_XDP ethdev on numa socket %u\n",
+		rte_socket_id());
+
+	if (dev == NULL)
+		return -1;
+
+	/* find the ethdev entry */
+	eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(dev));
+	if (eth_dev == NULL)
+		return -1;
+
+	internals = eth_dev->data->dev_private;
+
+	rte_ring_free(internals->umem->buf_ring);
+	rte_memzone_free(internals->umem->mz);
+	rte_free(internals->umem);
+
+	rte_eth_dev_release_port(eth_dev);
+
+
+	return 0;
+}
+
+static struct rte_vdev_driver pmd_af_xdp_drv = {
+	.probe = rte_pmd_af_xdp_probe,
+	.remove = rte_pmd_af_xdp_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(net_af_xdp, pmd_af_xdp_drv);
+RTE_PMD_REGISTER_PARAM_STRING(net_af_xdp,
+			      "iface=<string> "
+			      "queue=<int> ");
+
+RTE_INIT(af_xdp_init_log)
+{
+	af_xdp_logtype = rte_log_register("pmd.net.af_xdp");
+	if (af_xdp_logtype >= 0)
+		rte_log_set_level(af_xdp_logtype, RTE_LOG_NOTICE);
+}
diff --git a/drivers/net/af_xdp/rte_pmd_af_xdp_version.map b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
new file mode 100644
index 000000000..c6db030fe
--- /dev/null
+++ b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
@@ -0,0 +1,3 @@
+DPDK_19.05 {
+	local: *;
+};
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index 3ecc78cee..1105e72d8 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2017 Intel Corporation
 
 drivers = ['af_packet',
+	'af_xdp',
 	'ark',
 	'atlantic',
 	'avp',
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 262132fc6..f916bc9ef 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -143,6 +143,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_DPAA2_MEMPOOL)  += -lrte_mempool_dpaa2
 endif
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     += -lrte_pmd_af_xdp -lbpf
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ARK_PMD)        += -lrte_pmd_ark
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD)   += -lrte_pmd_atlantic
 _LDLIBS-$(CONFIG_RTE_LIBRTE_AVP_PMD)        += -lrte_pmd_avp
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* Re: [PATCH v10 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-03 16:59   ` [PATCH v10 1/1] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
@ 2019-04-03 17:32     ` Luca Boccassi
  2019-04-03 17:44     ` Ferruh Yigit
  1 sibling, 0 replies; 214+ messages in thread
From: Luca Boccassi @ 2019-04-03 17:32 UTC (permalink / raw)
  To: Xiaolong Ye, dev, Stephen Hemminger, Ferruh Yigit
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Bruce Richardson, Ananyev Konstantin, David Marchand,
	Andrew Rybchenko, Olivier Matz

On Thu, 2019-04-04 at 00:59 +0800, Xiaolong Ye wrote:
> Add a new PMD driver for AF_XDP which is a proposed faster version of
> AF_PACKET interface in Linux. More info about AF_XDP, please refer to
> [1]
> [2].
> 
> This is the vanilla version PMD which just uses a raw buffer
> registered as
> the umem.
> 
> [1] 
> https://fosdem.org/2018/schedule/event/af_xdp/
> 
> [2] 
> https://lwn.net/Articles/745934/
> 
> 
> Signed-off-by: Xiaolong Ye <
> xiaolong.ye@intel.com
> >
> ---
>  MAINTAINERS                                   |   7 +
>  config/common_base                            |   5 +
>  doc/guides/nics/af_xdp.rst                    |  50 +
>  doc/guides/nics/features/af_xdp.ini           |  11 +
>  doc/guides/nics/index.rst                     |   1 +
>  doc/guides/rel_notes/release_19_05.rst        |   7 +
>  drivers/net/Makefile                          |   1 +
>  drivers/net/af_xdp/Makefile                   |  28 +
>  drivers/net/af_xdp/af_xdp_deps.h              |  14 +
>  drivers/net/af_xdp/meson.build                |  21 +
>  drivers/net/af_xdp/rte_eth_af_xdp.c           | 955
> ++++++++++++++++++
>  drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   3 +
>  drivers/net/meson.build                       |   1 +
>  mk/rte.app.mk                                 |   1 +
>  14 files changed, 1105 insertions(+)
>  create mode 100644 doc/guides/nics/af_xdp.rst
>  create mode 100644 doc/guides/nics/features/af_xdp.ini
>  create mode 100644 drivers/net/af_xdp/Makefile
>  create mode 100644 drivers/net/af_xdp/af_xdp_deps.h
>  create mode 100644 drivers/net/af_xdp/meson.build
>  create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
>  create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map

Acked-by: Luca Boccassi <bluca@debian.org>

-- 
Kind regards,
Luca Boccassi

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v10 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-03 16:59   ` [PATCH v10 1/1] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
  2019-04-03 17:32     ` Luca Boccassi
@ 2019-04-03 17:44     ` Ferruh Yigit
  2019-04-03 18:52       ` Luca Boccassi
  2019-04-04  5:29       ` Ye Xiaolong
  1 sibling, 2 replies; 214+ messages in thread
From: Ferruh Yigit @ 2019-04-03 17:44 UTC (permalink / raw)
  To: Xiaolong Ye, dev, Stephen Hemminger, Luca Boccassi
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Bruce Richardson, Ananyev Konstantin, David Marchand,
	Andrew Rybchenko, Olivier Matz

On 4/3/2019 5:59 PM, Xiaolong Ye wrote:
> Add a new PMD driver for AF_XDP which is a proposed faster version of
> AF_PACKET interface in Linux. More info about AF_XDP, please refer to [1]
> [2].
> 
> This is the vanilla version PMD which just uses a raw buffer registered as
> the umem.
> 
> [1] https://fosdem.org/2018/schedule/event/af_xdp/
> [2] https://lwn.net/Articles/745934/
> 
> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>

I am not able to test the functionality, but the code looks good to me. I can
compile via the Makefile (with the suggested steps in the doc) but am not able
to build with meson; can you please check the comments below?

<...>

> @@ -0,0 +1,21 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2019 Intel Corporation
> +
> +if host_machine.system() != 'linux'
> +	build = false
> +endif

After this point, if build is false it shouldn't continue to the checks below, I think.

> +
> +bpf_dep = dependency('libbpf', required: false)

My library is in '/usr/local/lib64/libbpf.so' but this line can't find it. Where
does 'dependency()' check for libraries?

> +if bpf_dep.found()
> +	build = true
> +else
> +	bpf_dep = cc.find_library('libbpf', required: false)

Also this line can't find it; the log says "(tried pkgconfig and cmake)", and
yes, there is no pkgconfig for it. Any idea how 'cmake' is used?

> +	if bpf_dep.found() and cc.has_header('xsk.h', dependencies: bpf_dep) and cc.has_header('linux/if_xdp.h')

Should this be 'lib/xsk.h' now?

> +		build = true
> +		pkgconfig_extra_libs += '-lbpf'
> +	else
> +		build = false
> +	endif
> +endif
> +sources = files('rte_eth_af_xdp.c')
> +ext_deps += bpf_dep

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v10 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-03 17:44     ` Ferruh Yigit
@ 2019-04-03 18:52       ` Luca Boccassi
  2019-04-04  5:36         ` Ye Xiaolong
  2019-04-04  5:55         ` Ye Xiaolong
  2019-04-04  5:29       ` Ye Xiaolong
  1 sibling, 2 replies; 214+ messages in thread
From: Luca Boccassi @ 2019-04-03 18:52 UTC (permalink / raw)
  To: Ferruh Yigit, Xiaolong Ye, dev, Stephen Hemminger
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Bruce Richardson, Ananyev Konstantin, David Marchand,
	Andrew Rybchenko, Olivier Matz

On Wed, 2019-04-03 at 18:44 +0100, Ferruh Yigit wrote:
> On 4/3/2019 5:59 PM, Xiaolong Ye wrote:
> > Add a new PMD driver for AF_XDP which is a proposed faster version
> > of
> > AF_PACKET interface in Linux. More info about AF_XDP, please refer
> > to [1]
> > [2].
> > 
> > This is the vanilla version PMD which just uses a raw buffer
> > registered as
> > the umem.
> > 
> > [1] 
> > https://fosdem.org/2018/schedule/event/af_xdp/
> > 
> > [2] 
> > https://lwn.net/Articles/745934/
> > 
> > 
> > Signed-off-by: Xiaolong Ye <
> > xiaolong.ye@intel.com
> > >
> 
> I am not able to test functionality but code looks good to me, I can
> compile via
> Makefile (with suggested steps in doc) but not able to build with
> meson, can you
> please check below comments?
> 
> <...>
> 
> > @@ -0,0 +1,21 @@
> > +# SPDX-License-Identifier: BSD-3-Clause
> > +# Copyright(c) 2019 Intel Corporation
> > +
> > +if host_machine.system() != 'linux'
> > +	build = false
> > +endif
> 
> After this point, if build is false it shouldn't continue to below
> checks I think.
> 
> > +
> > +bpf_dep = dependency('libbpf', required: false)
> 
> My library is in '/usr/local/lib64/libbpf.so' but this line can't
> find it. Where
> does 'dependency()' checks for libraries?

dependency() uses only pkg-config (or cmake or embedded specific tools,
neither of which applies to bpf), so if you haven't built from bpf-next 
you won't have the pkg-config file installed, and it will fall back to
the next block.

Side note, there's an issue open upstream in Meson to merge
dependency() and find_library(), with some traction but it's not done
yet.
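
In other words, the usual idiom - just a minimal sketch of the pattern
here, not the driver's actual file - is to try pkg-config first and fall
back to a plain link test:

    # sketch: dependency() consults pkg-config only
    bpf_dep = dependency('libbpf', required: false)
    if not bpf_dep.found()
        # find_library() just test-links with -lbpf, no pkg-config needed
        bpf_dep = cc.find_library('bpf', required: false)
    endif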

For me building from bpf-next it works fine:

$ PKG_CONFIG_PATH=/tmp/bpf/lib64/pkgconfig/ ninja -C build-gcc-shared
...
Dependency libbpf found: YES 0.0.2
...
$ lddtree build-gcc-shared/drivers/librte_pmd_af_xdp.so.1.1 
librte_pmd_af_xdp.so.1.1 => build-gcc-shared/drivers/librte_pmd_af_xdp.so.1.1 (interpreter => none)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2
    libnuma.so.1 => /lib/x86_64-linux-gnu/libnuma.so.1
    librte_ethdev.so.12 => build-gcc-shared/drivers/../lib/librte_ethdev.so.12
    librte_eal.so.10 => build-gcc-shared/drivers/../lib/librte_eal.so.10
    librte_kvargs.so.1 => build-gcc-shared/drivers/../lib/librte_kvargs.so.1
    librte_net.so.1 => build-gcc-shared/drivers/../lib/librte_net.so.1
    librte_mbuf.so.5 => build-gcc-shared/drivers/../lib/librte_mbuf.so.5
    librte_mempool.so.5 => build-gcc-shared/drivers/../lib/librte_mempool.so.5
    librte_ring.so.2 => build-gcc-shared/drivers/../lib/librte_ring.so.2
    librte_cmdline.so.2 => build-gcc-shared/drivers/../lib/librte_cmdline.so.2
    librte_meter.so.2 => build-gcc-shared/drivers/../lib/librte_meter.so.2
    librte_bus_pci.so.2 => not found
    librte_pci.so.1 => build-gcc-shared/drivers/../lib/librte_pci.so.1
    librte_bus_vdev.so.2 => not found
    libbsd.so.0 => /lib/x86_64-linux-gnu/libbsd.so.0
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1
    libbpf.so.0 => /tmp/bpf/lib64/libbpf.so.0
        libelf.so.1 => /lib/x86_64-linux-gnu/libelf.so.1
            libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
    ld-linux-x86-64.so.2 => /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2

> > +if bpf_dep.found()
> > +	build = true
> > +else
> > +	bpf_dep = cc.find_library('libbpf', required: false)
> 
> Also this line can't find it, in log it says "(tried pkgconfig and
> cmake)" and
> yes there is no pkgconfig for it, any idea how 'cmake' used?

The issue here is that it should be cc.find_library('bpf'), not
'libbpf'. I missed this when reviewing; good catch.

That's because find_library just does a compilation test passing the
value to the compiler as a linker flag - so right now it's passing
-llibbpf. Fixing this line and the header line below makes it work
without pkg-config:

$ CPPFLAGS=-I/tmp/bpf/include LDFLAGS=-L/tmp/bpf/lib64 meson testt
...
Dependency libbpf found: NO (tried pkgconfig and cmake)
Library bpf found: YES

> > +	if bpf_dep.found() and cc.has_header('xsk.h', dependencies:
> > bpf_dep) and cc.has_header('linux/if_xdp.h')
> 
> Should this be 'lib/xsk.h' now?

Yes, this should be 'bpf/xsk.h'
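
Putting the two fixes together, the whole block would presumably end up
looking something like this (untested sketch):

    bpf_dep = cc.find_library('bpf', required: false)
    if bpf_dep.found() and cc.has_header('bpf/xsk.h', dependencies: bpf_dep) and cc.has_header('linux/if_xdp.h')
        build = true
        pkgconfig_extra_libs += '-lbpf'
    else
        build = false
    endif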

-- 
Kind regards,
Luca Boccassi

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v10 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-03 17:44     ` Ferruh Yigit
  2019-04-03 18:52       ` Luca Boccassi
@ 2019-04-04  5:29       ` Ye Xiaolong
  1 sibling, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-04-04  5:29 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: dev, Stephen Hemminger, Luca Boccassi, Qi Zhang, Karlsson Magnus,
	Topel Bjorn, Maxime Coquelin, Bruce Richardson,
	Ananyev Konstantin, David Marchand, Andrew Rybchenko,
	Olivier Matz

Hi, Ferruh

On 04/03, Ferruh Yigit wrote:
>On 4/3/2019 5:59 PM, Xiaolong Ye wrote:
>> Add a new PMD driver for AF_XDP which is a proposed faster version of
>> AF_PACKET interface in Linux. More info about AF_XDP, please refer to [1]
>> [2].
>> 
>> This is the vanilla version PMD which just uses a raw buffer registered as
>> the umem.
>> 
>> [1] https://fosdem.org/2018/schedule/event/af_xdp/
>> [2] https://lwn.net/Articles/745934/
>> 
>> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
>
>I am not able to test functionality but code looks good to me, I can compile via
>Makefile (with suggested steps in doc) but not able to build with meson, can you
>please check below comments?
>

My bad, sorry for not testing out the meson build.

><...>
>
>> @@ -0,0 +1,21 @@
>> +# SPDX-License-Identifier: BSD-3-Clause
>> +# Copyright(c) 2019 Intel Corporation
>> +
>> +if host_machine.system() != 'linux'
>> +	build = false
>> +endif
>
>After this point, if build is false it shouldn't continue to below checks I think.

Makes sense, will do.

>
>> +
>> +bpf_dep = dependency('libbpf', required: false)
>
>My library is in '/usr/local/lib64/libbpf.so' but this line can't find it. Where
>does 'dependency()' checks for libraries?
>
>> +if bpf_dep.found()
>> +	build = true
>> +else
>> +	bpf_dep = cc.find_library('libbpf', required: false)
>
>Also this line can't find it, in log it says "(tried pkgconfig and cmake)" and
>yes there is no pkgconfig for it, any idea how 'cmake' used?

As Luca said, it should be cc.find_library('bpf', required: false); I will
correct it in the next version.

>
>> +	if bpf_dep.found() and cc.has_header('xsk.h', dependencies: bpf_dep) and cc.has_header('linux/if_xdp.h')
>
>Should this be 'lib/xsk.h' now?

Will change to bpf/xsk.h in next version.

Thanks,
Xiaolong
>
>> +		build = true
>> +		pkgconfig_extra_libs += '-lbpf'
>> +	else
>> +		build = false
>> +	endif
>> +endif
>> +sources = files('rte_eth_af_xdp.c')
>> +ext_deps += bpf_dep
>

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v10 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-03 18:52       ` Luca Boccassi
@ 2019-04-04  5:36         ` Ye Xiaolong
  2019-04-04  5:55         ` Ye Xiaolong
  1 sibling, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-04-04  5:36 UTC (permalink / raw)
  To: Luca Boccassi
  Cc: Ferruh Yigit, dev, Stephen Hemminger, Qi Zhang, Karlsson Magnus,
	Topel Bjorn, Maxime Coquelin, Bruce Richardson,
	Ananyev Konstantin, David Marchand, Andrew Rybchenko,
	Olivier Matz

On 04/03, Luca Boccassi wrote:
>On Wed, 2019-04-03 at 18:44 +0100, Ferruh Yigit wrote:
>> On 4/3/2019 5:59 PM, Xiaolong Ye wrote:
>> > Add a new PMD driver for AF_XDP which is a proposed faster version
>> > of
>> > AF_PACKET interface in Linux. More info about AF_XDP, please refer
>> > to [1]
>> > [2].
>> > 
>> > This is the vanilla version PMD which just uses a raw buffer
>> > registered as
>> > the umem.
>> > 
>> > [1] 
>> > https://fosdem.org/2018/schedule/event/af_xdp/
>> > 
>> > [2] 
>> > https://lwn.net/Articles/745934/
>> > 
>> > 
>> > Signed-off-by: Xiaolong Ye <
>> > xiaolong.ye@intel.com
>> > >
>> 
>> I am not able to test functionality but code looks good to me, I can
>> compile via
>> Makefile (with suggested steps in doc) but not able to build with
>> meson, can you
>> please check below comments?
>> 
>> <...>
>> 
>> > @@ -0,0 +1,21 @@
>> > +# SPDX-License-Identifier: BSD-3-Clause
>> > +# Copyright(c) 2019 Intel Corporation
>> > +
>> > +if host_machine.system() != 'linux'
>> > +	build = false
>> > +endif
>> 
>> After this point, if build is false it shouldn't continue to below
>> checks I think.
>> 
>> > +
>> > +bpf_dep = dependency('libbpf', required: false)
>> 
>> My library is in '/usr/local/lib64/libbpf.so' but this line can't
>> find it. Where
>> does 'dependency()' checks for libraries?
>
>dependency() uses only pkg-config (or cmake or embedded specific tools,
>neither of which applies to bpf), so if you haven't built from bpf-next 
>you won't have the pkg-config file installed, and it will fall back to
>the next block.
>
>Side note, there's an issue open upstream in Meson to merge
>dependency() and find_library(), with some traction but it's not done
>yet.
>
>For me building from bpf-next it works fine:
>
>$ PKG_CONFIG_PATH=/tmp/bpf/lib64/pkgconfig/ ninja -C build-gcc-shared
>...
>Dependency libbpf found: YES 0.0.2
>...
>$ lddtree build-gcc-shared/drivers/librte_pmd_af_xdp.so.1.1 
>librte_pmd_af_xdp.so.1.1 => build-gcc-shared/drivers/librte_pmd_af_xdp.so.1.1 (interpreter => none)
>    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6
>    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2
>    libnuma.so.1 => /lib/x86_64-linux-gnu/libnuma.so.1
>    librte_ethdev.so.12 => build-gcc-shared/drivers/../lib/librte_ethdev.so.12
>    librte_eal.so.10 => build-gcc-shared/drivers/../lib/librte_eal.so.10
>    librte_kvargs.so.1 => build-gcc-shared/drivers/../lib/librte_kvargs.so.1
>    librte_net.so.1 => build-gcc-shared/drivers/../lib/librte_net.so.1
>    librte_mbuf.so.5 => build-gcc-shared/drivers/../lib/librte_mbuf.so.5
>    librte_mempool.so.5 => build-gcc-shared/drivers/../lib/librte_mempool.so.5
>    librte_ring.so.2 => build-gcc-shared/drivers/../lib/librte_ring.so.2
>    librte_cmdline.so.2 => build-gcc-shared/drivers/../lib/librte_cmdline.so.2
>    librte_meter.so.2 => build-gcc-shared/drivers/../lib/librte_meter.so.2
>    librte_bus_pci.so.2 => not found
>    librte_pci.so.1 => build-gcc-shared/drivers/../lib/librte_pci.so.1
>    librte_bus_vdev.so.2 => not found
>    libbsd.so.0 => /lib/x86_64-linux-gnu/libbsd.so.0
>        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1
>    libbpf.so.0 => /tmp/bpf/lib64/libbpf.so.0
>        libelf.so.1 => /lib/x86_64-linux-gnu/libelf.so.1
>            libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1
>    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
>    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
>    ld-linux-x86-64.so.2 => /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
>
>> > +if bpf_dep.found()
>> > +	build = true
>> > +else
>> > +	bpf_dep = cc.find_library('libbpf', required: false)
>> 
>> Also this line can't find it, in log it says "(tried pkgconfig and
>> cmake)" and
>> yes there is no pkgconfig for it, any idea how 'cmake' used?
>
>The issue here is that it should be cc.find_library('bpf' - not
>'libbpf'. I missed this when reviewing, good catch.
>
>That's because find_library just does a compilation test passing the
>value to the compiler as a linker flag - so right now it's passing
>-llibbpf. Fixing this line and the header line below makes it work
>without pkg-config:
>
>$ CPPFLAGS=-I/tmp/bpf/include LDFLAGS=-L/tmp/bpf/lib64 meson testt
>...
>Dependency libbpf found: NO (tried pkgconfig and cmake)
>Library bpf found: YES

Thanks for pointing it out, I will adopt it.

>
>> > +	if bpf_dep.found() and cc.has_header('xsk.h', dependencies:
>> > bpf_dep) and cc.has_header('linux/if_xdp.h')
>> 
>> Should this be 'lib/xsk.h' now?
>
>Yes, this should be 'bpf/xsk.h'

Will change it in next version.

Thanks,
Xiaolong
>
>-- 
>Kind regards,
>Luca Boccassi

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v10 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-03 18:52       ` Luca Boccassi
  2019-04-04  5:36         ` Ye Xiaolong
@ 2019-04-04  5:55         ` Ye Xiaolong
  2019-04-04  7:01           ` Phil Yang (Arm Technology China)
  2019-04-04  8:39           ` Luca Boccassi
  1 sibling, 2 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-04-04  5:55 UTC (permalink / raw)
  To: Ferruh Yigit, Luca Boccassi
  Cc: dev, Stephen Hemminger, Qi Zhang, Karlsson Magnus, Topel Bjorn,
	Maxime Coquelin, Bruce Richardson, Ananyev Konstantin,
	David Marchand, Andrew Rybchenko, Olivier Matz

Hi, Luca

On 04/03, Luca Boccassi wrote:
>On Wed, 2019-04-03 at 18:44 +0100, Ferruh Yigit wrote:
>> On 4/3/2019 5:59 PM, Xiaolong Ye wrote:
>> > Add a new PMD driver for AF_XDP which is a proposed faster version
>> > of
>> > AF_PACKET interface in Linux. More info about AF_XDP, please refer
>> > to [1]
>> > [2].
>> > 
>> > This is the vanilla version PMD which just uses a raw buffer
>> > registered as
>> > the umem.
>> > 
>> > [1] 
>> > https://fosdem.org/2018/schedule/event/af_xdp/
>> > 
>> > [2] 
>> > https://lwn.net/Articles/745934/
>> > 
>> > 
>> > Signed-off-by: Xiaolong Ye <
>> > xiaolong.ye@intel.com
>> > >
>> 
>> I am not able to test functionality but code looks good to me, I can
>> compile via
>> Makefile (with suggested steps in doc) but not able to build with
>> meson, can you
>> please check below comments?
>> 
>> <...>
>> 
>> > @@ -0,0 +1,21 @@
>> > +# SPDX-License-Identifier: BSD-3-Clause
>> > +# Copyright(c) 2019 Intel Corporation
>> > +
>> > +if host_machine.system() != 'linux'
>> > +	build = false
>> > +endif
>> 
>> After this point, if build is false it shouldn't continue to below
>> checks I think.
>> 
>> > +
>> > +bpf_dep = dependency('libbpf', required: false)
>> 
>> My library is in '/usr/local/lib64/libbpf.so' but this line can't
>> find it. Where
>> does 'dependency()' checks for libraries?
>
>dependency() uses only pkg-config (or cmake or embedded specific tools,
>neither of which applies to bpf), so if you haven't built from bpf-next 
>you won't have the pkg-config file installed, and it will fall back to
>the next block.
>
>Side note, there's an issue open upstream in Meson to merge
>dependency() and find_library(), with some traction but it's not done
>yet.
>
>For me building from bpf-next it works fine:
>
>$ PKG_CONFIG_PATH=/tmp/bpf/lib64/pkgconfig/ ninja -C build-gcc-shared
>...
>Dependency libbpf found: YES 0.0.2
>...
>$ lddtree build-gcc-shared/drivers/librte_pmd_af_xdp.so.1.1 
>librte_pmd_af_xdp.so.1.1 => build-gcc-shared/drivers/librte_pmd_af_xdp.so.1.1 (interpreter => none)
>    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6
>    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2
>    libnuma.so.1 => /lib/x86_64-linux-gnu/libnuma.so.1
>    librte_ethdev.so.12 => build-gcc-shared/drivers/../lib/librte_ethdev.so.12
>    librte_eal.so.10 => build-gcc-shared/drivers/../lib/librte_eal.so.10
>    librte_kvargs.so.1 => build-gcc-shared/drivers/../lib/librte_kvargs.so.1
>    librte_net.so.1 => build-gcc-shared/drivers/../lib/librte_net.so.1
>    librte_mbuf.so.5 => build-gcc-shared/drivers/../lib/librte_mbuf.so.5
>    librte_mempool.so.5 => build-gcc-shared/drivers/../lib/librte_mempool.so.5
>    librte_ring.so.2 => build-gcc-shared/drivers/../lib/librte_ring.so.2
>    librte_cmdline.so.2 => build-gcc-shared/drivers/../lib/librte_cmdline.so.2
>    librte_meter.so.2 => build-gcc-shared/drivers/../lib/librte_meter.so.2
>    librte_bus_pci.so.2 => not found
>    librte_pci.so.1 => build-gcc-shared/drivers/../lib/librte_pci.so.1
>    librte_bus_vdev.so.2 => not found
>    libbsd.so.0 => /lib/x86_64-linux-gnu/libbsd.so.0
>        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1
>    libbpf.so.0 => /tmp/bpf/lib64/libbpf.so.0
>        libelf.so.1 => /lib/x86_64-linux-gnu/libelf.so.1
>            libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1
>    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
>    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
>    ld-linux-x86-64.so.2 => /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
>
>> > +if bpf_dep.found()
>> > +	build = true
>> > +else
>> > +	bpf_dep = cc.find_library('libbpf', required: false)
>> 
>> Also this line can't find it, in log it says "(tried pkgconfig and
>> cmake)" and
>> yes there is no pkgconfig for it, any idea how 'cmake' used?
>
>The issue here is that it should be cc.find_library('bpf' - not
>'libbpf'. I missed this when reviewing, good catch.
>
>That's because find_library just does a compilation test passing the
>value to the compiler as a linker flag - so right now it's passing
>-llibbpf. Fixing this line and the header line below makes it work
>without pkg-config:
>
>$ CPPFLAGS=-I/tmp/bpf/include LDFLAGS=-L/tmp/bpf/lib64 meson testt
>...
>Dependency libbpf found: NO (tried pkgconfig and cmake)
>Library bpf found: YES

After applying the fix in the af_xdp PMD's meson.build, I was able to build the
library for the af_xdp PMD.

$ ls drivers/ |grep xdp
a715181@@rte_pmd_af_xdp@sha
a715181@@rte_pmd_af_xdp@sta
a715181@@tmp_rte_pmd_af_xdp@sta
librte_pmd_af_xdp.a
librte_pmd_af_xdp.so
librte_pmd_af_xdp.so.1
librte_pmd_af_xdp.so.1.1
libtmp_rte_pmd_af_xdp.a
rte_pmd_af_xdp.pmd.c

But I found that if I install libbpf to /usr/local/lib64 (the default), an
application built by the meson build fails to run:

$ ./dpdk-testpmd
./dpdk-testpmd: error while loading shared libraries: libbpf.so.0: cannot open shared object file: No such file or directory

Installing libbpf to /usr/lib doesn't have this issue (I was testing on an
Ubuntu system). Is this expected behavior? Do we need any fix for it?
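
(My guess is that /usr/local/lib64 is simply not in the default ld.so
search path on Ubuntu, while /usr/lib is. If so, a workaround - untested
sketch, and the conf file name below is arbitrary - would be either:

    $ export LD_LIBRARY_PATH=/usr/local/lib64
    $ ./dpdk-testpmd

or, persistently:

    $ echo /usr/local/lib64 | sudo tee /etc/ld.so.conf.d/libbpf.conf
    $ sudo ldconfig

but I'm not sure whether meson should handle this instead.)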

Thanks,
Xiaolong
>
>> > +	if bpf_dep.found() and cc.has_header('xsk.h', dependencies:
>> > bpf_dep) and cc.has_header('linux/if_xdp.h')
>> 
>> Should this be 'lib/xsk.h' now?
>
>Yes, this should be 'bpf/xsk.h'
>
>-- 
>Kind regards,
>Luca Boccassi

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v10 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-04  5:55         ` Ye Xiaolong
@ 2019-04-04  7:01           ` Phil Yang (Arm Technology China)
  2019-04-04  8:39           ` Luca Boccassi
  1 sibling, 0 replies; 214+ messages in thread
From: Phil Yang (Arm Technology China) @ 2019-04-04  7:01 UTC (permalink / raw)
  To: Ye Xiaolong, Ferruh Yigit, Luca Boccassi
  Cc: dev, Stephen Hemminger, Qi Zhang, Karlsson Magnus, Topel Bjorn,
	Maxime Coquelin, Bruce Richardson, Ananyev Konstantin,
	David Marchand, Andrew Rybchenko, Olivier Matz, nd, nd

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Ye Xiaolong
> Sent: Thursday, April 4, 2019 1:55 PM
> To: Ferruh Yigit <ferruh.yigit@intel.com>; Luca Boccassi <bluca@debian.org>
> Cc: dev@dpdk.org; Stephen Hemminger <stephen@networkplumber.org>;
> Qi Zhang <qi.z.zhang@intel.com>; Karlsson Magnus
> <magnus.karlsson@intel.com>; Topel Bjorn <bjorn.topel@intel.com>;
> Maxime Coquelin <maxime.coquelin@redhat.com>; Bruce Richardson
> <bruce.richardson@intel.com>; Ananyev Konstantin
> <konstantin.ananyev@intel.com>; David Marchand
> <david.marchand@redhat.com>; Andrew Rybchenko
> <arybchenko@solarflare.com>; Olivier Matz <olivier.matz@6wind.com>
> Subject: Re: [dpdk-dev] [PATCH v10 1/1] net/af_xdp: introduce AF XDP PMD
> driver
> 
> Hi, Luca
> 
> On 04/03, Luca Boccassi wrote:
> >On Wed, 2019-04-03 at 18:44 +0100, Ferruh Yigit wrote:
> >> On 4/3/2019 5:59 PM, Xiaolong Ye wrote:
> >> > Add a new PMD driver for AF_XDP which is a proposed faster version
> >> > of AF_PACKET interface in Linux. More info about AF_XDP, please
> >> > refer to [1] [2].
> >> >
> >> > This is the vanilla version PMD which just uses a raw buffer
> >> > registered as the umem.
> >> >
> >> > [1]
> >> > https://fosdem.org/2018/schedule/event/af_xdp/
> >> >
> >> > [2]
> >> > https://lwn.net/Articles/745934/
> >> >
> >> >
> >> > Signed-off-by: Xiaolong Ye <
> >> > xiaolong.ye@intel.com
> >> > >
> >>
> >> I am not able to test functionality but code looks good to me, I can
> >> compile via Makefile (with suggested steps in doc) but not able to
> >> build with meson, can you please check below comments?
> >>
> >> <...>
> >>
> >> > @@ -0,0 +1,21 @@
> >> > +# SPDX-License-Identifier: BSD-3-Clause # Copyright(c) 2019 Intel
> >> > +Corporation
> >> > +
> >> > +if host_machine.system() != 'linux'
> >> > +	build = false
> >> > +endif
> >>
> >> After this point, if build is false it shouldn't continue to below
> >> checks I think.
> >>
> >> > +
> >> > +bpf_dep = dependency('libbpf', required: false)
> >>
> >> My library is in '/usr/local/lib64/libbpf.so' but this line can't
> >> find it. Where does 'dependency()' checks for libraries?
> >
> >dependency() uses only pkg-config (or cmake or embedded specific tools,
> >neither of which applies to bpf), so if you haven't built from bpf-next
> >you won't have the pkg-config file installed, and it will fall back to
> >the next block.
> >
> >Side note, there's an issue open upstream in Meson to merge
> >dependency() and find_library(), with some traction but it's not done
> >yet.
> >
> >For me building from bpf-next it works fine:
> >
> >$ PKG_CONFIG_PATH=/tmp/bpf/lib64/pkgconfig/ ninja -C build-gcc-shared
> >...
> >Dependency libbpf found: YES 0.0.2
> >...
> >$ lddtree build-gcc-shared/drivers/librte_pmd_af_xdp.so.1.1
> >librte_pmd_af_xdp.so.1.1 => build-gcc-
> shared/drivers/librte_pmd_af_xdp.so.1.1 (interpreter => none)
> >    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6
> >    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2
> >    libnuma.so.1 => /lib/x86_64-linux-gnu/libnuma.so.1
> >    librte_ethdev.so.12 => build-gcc-
> shared/drivers/../lib/librte_ethdev.so.12
> >    librte_eal.so.10 => build-gcc-shared/drivers/../lib/librte_eal.so.10
> >    librte_kvargs.so.1 => build-gcc-shared/drivers/../lib/librte_kvargs.so.1
> >    librte_net.so.1 => build-gcc-shared/drivers/../lib/librte_net.so.1
> >    librte_mbuf.so.5 => build-gcc-shared/drivers/../lib/librte_mbuf.so.5
> >    librte_mempool.so.5 => build-gcc-
> shared/drivers/../lib/librte_mempool.so.5
> >    librte_ring.so.2 => build-gcc-shared/drivers/../lib/librte_ring.so.2
> >    librte_cmdline.so.2 => build-gcc-shared/drivers/../lib/librte_cmdline.so.2
> >    librte_meter.so.2 => build-gcc-shared/drivers/../lib/librte_meter.so.2
> >    librte_bus_pci.so.2 => not found
> >    librte_pci.so.1 => build-gcc-shared/drivers/../lib/librte_pci.so.1
> >    librte_bus_vdev.so.2 => not found
> >    libbsd.so.0 => /lib/x86_64-linux-gnu/libbsd.so.0
> >        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1
> >    libbpf.so.0 => /tmp/bpf/lib64/libbpf.so.0
> >        libelf.so.1 => /lib/x86_64-linux-gnu/libelf.so.1
> >            libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1
> >    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
> >    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
> >    ld-linux-x86-64.so.2 => /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
> >
> >> > +if bpf_dep.found()
> >> > +	build = true
> >> > +else
> >> > +	bpf_dep = cc.find_library('libbpf', required: false)
> >>
> >> Also this line can't find it, in log it says "(tried pkgconfig and
> >> cmake)" and yes there is no pkgconfig for it, any idea how 'cmake'
> >> used?
> >
> >The issue here is that it should be cc.find_library('bpf' - not
> >'libbpf'. I missed this when reviewing, good catch.
> >
> >That's because find_library just does a compilation test passing the
> >value to the compiler as a linker flag - so right now it's passing
> >-llibbpf. Fixing this line and the header line below makes it work
> >without pkg-config:
> >
> >$ CPPFLAGS=-I/tmp/bpf/include LDFLAGS=-L/tmp/bpf/lib64 meson testt ...
> >Dependency libbpf found: NO (tried pkgconfig and cmake) Library bpf
> >found: YES
> 
> After apply the fix in af_xdp pmd's meson.build, now I was able to build
> library for af_xdp pmd.
> 
> $ ls drivers/ |grep xdp
> a715181@@rte_pmd_af_xdp@sha
> a715181@@rte_pmd_af_xdp@sta
> a715181@@tmp_rte_pmd_af_xdp@sta
> librte_pmd_af_xdp.a
> librte_pmd_af_xdp.so
> librte_pmd_af_xdp.so.1
> librte_pmd_af_xdp.so.1.1
> libtmp_rte_pmd_af_xdp.a
> rte_pmd_af_xdp.pmd.c
> 
> But I found that if I install libbpf to /usr/local/lib64 by default, application built
> by meson build will fail to run:
> 
> $ ./dpdk-testpmd
> ./dpdk-testpmd: error while loading shared libraries: libbpf.so.0: cannot open
> shared object file: No such file or directory
> 
> While install libbpf to /usr/lib doesn't have this issue (I was testing on ubuntu
> system).
> Is it a expected behavior? Do we need any fix for it?

Hi, Xiaolong

I have encountered the same issue on ubuntu-arm64-server with kernel v5.1.0-rc3+.
If I read correctly, I think Luca's patch can solve this issue:
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=dd399ac9e343c7573c47d6820e4a23013c54749d

So, will this patch land in v5.1-rc4?

Thanks,
Phil

> 
> Thanks,
> Xiaolong
> >
> >> > +	if bpf_dep.found() and cc.has_header('xsk.h', dependencies:
> >> > bpf_dep) and cc.has_header('linux/if_xdp.h')
> >>
> >> Should this be 'lib/xsk.h' now?
> >
> >Yes, this should be 'bpf/xsk.h'
> >
> >--
> >Kind regards,
> >Luca Boccassi

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v10 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-04  5:55         ` Ye Xiaolong
  2019-04-04  7:01           ` Phil Yang (Arm Technology China)
@ 2019-04-04  8:39           ` Luca Boccassi
  2019-04-04  8:40             ` Ye Xiaolong
  1 sibling, 1 reply; 214+ messages in thread
From: Luca Boccassi @ 2019-04-04  8:39 UTC (permalink / raw)
  To: Ye Xiaolong, Ferruh Yigit
  Cc: dev, Stephen Hemminger, Qi Zhang, Karlsson Magnus, Topel Bjorn,
	Maxime Coquelin, Bruce Richardson, Ananyev Konstantin,
	David Marchand, Andrew Rybchenko, Olivier Matz

On Thu, 2019-04-04 at 13:55 +0800, Ye Xiaolong wrote:
> Hi, Luca
> 
> On 04/03, Luca Boccassi wrote:
> > On Wed, 2019-04-03 at 18:44 +0100, Ferruh Yigit wrote:
> > > On 4/3/2019 5:59 PM, Xiaolong Ye wrote:
> > > > Add a new PMD driver for AF_XDP which is a proposed faster
> > > > version
> > > > of
> > > > AF_PACKET interface in Linux. More info about AF_XDP, please
> > > > refer
> > > > to [1]
> > > > [2].
> > > > 
> > > > This is the vanilla version PMD which just uses a raw buffer
> > > > registered as
> > > > the umem.
> > > > 
> > > > [1] 
> > > > https://fosdem.org/2018/schedule/event/af_xdp/
> > > > 
> > > > 
> > > > [2] 
> > > > https://lwn.net/Articles/745934/
> > > > 
> > > > 
> > > > 
> > > > Signed-off-by: Xiaolong Ye <
> > > > xiaolong.ye@intel.com
> > > > 
> > > 
> > > I am not able to test functionality but code looks good to me, I
> > > can
> > > compile via
> > > Makefile (with suggested steps in doc) but not able to build with
> > > meson, can you
> > > please check below comments?
> > > 
> > > <...>
> > > 
> > > > @@ -0,0 +1,21 @@
> > > > +# SPDX-License-Identifier: BSD-3-Clause
> > > > +# Copyright(c) 2019 Intel Corporation
> > > > +
> > > > +if host_machine.system() != 'linux'
> > > > +	build = false
> > > > +endif
> > > 
> > > After this point, if build is false it shouldn't continue to
> > > below
> > > checks I think.
> > > 
> > > > +
> > > > +bpf_dep = dependency('libbpf', required: false)
> > > 
> > > My library is in '/usr/local/lib64/libbpf.so' but this line can't
> > > find it. Where
> > > does 'dependency()' checks for libraries?
> > 
> > dependency() uses only pkg-config (or cmake or embedded specific
> > tools,
> > neither of which applies to bpf), so if you haven't built from bpf-
> > next 
> > you won't have the pkg-config file installed, and it will fall back
> > to
> > the next block.
> > 
> > Side note, there's an issue open upstream in Meson to merge
> > dependency() and find_library(), with some traction but it's not
> > done
> > yet.
> > 
> > For me building from bpf-next it works fine:
> > 
> > $ PKG_CONFIG_PATH=/tmp/bpf/lib64/pkgconfig/ ninja -C build-gcc-
> > shared
> > ...
> > Dependency libbpf found: YES 0.0.2
> > ...
> > $ lddtree build-gcc-shared/drivers/librte_pmd_af_xdp.so.1.1 
> > librte_pmd_af_xdp.so.1.1 => build-gcc-
> > shared/drivers/librte_pmd_af_xdp.so.1.1 (interpreter => none)
> >    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6
> >    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2
> >    libnuma.so.1 => /lib/x86_64-linux-gnu/libnuma.so.1
> >    librte_ethdev.so.12 => build-gcc-
> > shared/drivers/../lib/librte_ethdev.so.12
> >    librte_eal.so.10 => build-gcc-
> > shared/drivers/../lib/librte_eal.so.10
> >    librte_kvargs.so.1 => build-gcc-
> > shared/drivers/../lib/librte_kvargs.so.1
> >    librte_net.so.1 => build-gcc-
> > shared/drivers/../lib/librte_net.so.1
> >    librte_mbuf.so.5 => build-gcc-
> > shared/drivers/../lib/librte_mbuf.so.5
> >    librte_mempool.so.5 => build-gcc-
> > shared/drivers/../lib/librte_mempool.so.5
> >    librte_ring.so.2 => build-gcc-
> > shared/drivers/../lib/librte_ring.so.2
> >    librte_cmdline.so.2 => build-gcc-
> > shared/drivers/../lib/librte_cmdline.so.2
> >    librte_meter.so.2 => build-gcc-
> > shared/drivers/../lib/librte_meter.so.2
> >    librte_bus_pci.so.2 => not found
> >    librte_pci.so.1 => build-gcc-
> > shared/drivers/../lib/librte_pci.so.1
> >    librte_bus_vdev.so.2 => not found
> >    libbsd.so.0 => /lib/x86_64-linux-gnu/libbsd.so.0
> >        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1
> >    libbpf.so.0 => /tmp/bpf/lib64/libbpf.so.0
> >        libelf.so.1 => /lib/x86_64-linux-gnu/libelf.so.1
> >            libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1
> >    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
> >    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
> >    ld-linux-x86-64.so.2 => /lib/x86_64-linux-gnu/ld-linux-x86-
> > 64.so.2
> > 
> > > > +if bpf_dep.found()
> > > > +	build = true
> > > > +else
> > > > +	bpf_dep = cc.find_library('libbpf', required: false)
> > > 
> > > Also this line can't find it, in log it says "(tried pkgconfig
> > > and
> > > cmake)" and
> > > yes there is no pkgconfig for it, any idea how 'cmake' used?
> > 
> > The issue here is that it should be cc.find_library('bpf' - not
> > 'libbpf'. I missed this when reviewing, good catch.
> > 
> > That's because find_library just does a compilation test passing
> > the
> > value to the compiler as a linker flag - so right now it's passing
> > -llibbpf. Fixing this line and the header line below makes it work
> > without pkg-config:
> > 
> > $ CPPFLAGS=-I/tmp/bpf/include LDFLAGS=-L/tmp/bpf/lib64 meson testt
> > ...
> > Dependency libbpf found: NO (tried pkgconfig and cmake)
> > Library bpf found: YES
> 
> After apply the fix in af_xdp pmd's meson.build, now I was able to
> build
> library for af_xdp pmd.
> 
> $ ls drivers/ |grep xdp
> a715181@@rte_pmd_af_xdp@sha
> a715181@@rte_pmd_af_xdp@sta
> a715181@@tmp_rte_pmd_af_xdp@sta
> librte_pmd_af_xdp.a
> librte_pmd_af_xdp.so
> librte_pmd_af_xdp.so.1
> librte_pmd_af_xdp.so.1.1
> libtmp_rte_pmd_af_xdp.a
> rte_pmd_af_xdp.pmd.c
> 
> But I found that if I install libbpf to /usr/local/lib64 by default,
> application
> built by meson build will fail to run:
> 
> $ ./dpdk-testpmd
> ./dpdk-testpmd: error while loading shared libraries: libbpf.so.0:
> cannot open shared object file: No such file or directory
> 
> While install libbpf to /usr/lib doesn't have this issue (I was
> testing on ubuntu system).
> Is it a expected behavior? Do we need any fix for it?

Hi,

That is expected and distro-specific: if your distro doesn't add
/usr/local/lib* to the compiler search path, it also won't be in the
runtime linker's default search path.

So if you do:

LD_LIBRARY_PATH=/usr/local/lib64 ./dpdk-testpmd

It should then work. It's not related to the build system, but just to
what the default paths are in the distro.
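
If you want it picked up without setting LD_LIBRARY_PATH each time, one
option (a sketch, assuming a glibc-based distro that reads
/etc/ld.so.conf.d; the .conf file name below is arbitrary) is to register
the directory with the runtime linker cache:

$ echo /usr/local/lib64 | sudo tee /etc/ld.so.conf.d/local-lib64.conf
$ sudo ldconfig
$ ldconfig -p | grep libbpf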

-- 
Kind regards,
Luca Boccassi

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v10 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-04  8:39           ` Luca Boccassi
@ 2019-04-04  8:40             ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-04-04  8:40 UTC (permalink / raw)
  To: Luca Boccassi
  Cc: Ferruh Yigit, dev, Stephen Hemminger, Qi Zhang, Karlsson Magnus,
	Topel Bjorn, Maxime Coquelin, Bruce Richardson,
	Ananyev Konstantin, David Marchand, Andrew Rybchenko,
	Olivier Matz

On 04/04, Luca Boccassi wrote:
>> 
>> After apply the fix in af_xdp pmd's meson.build, now I was able to
>> build
>> library for af_xdp pmd.
>> 
>> $ ls drivers/ |grep xdp
>> a715181@@rte_pmd_af_xdp@sha
>> a715181@@rte_pmd_af_xdp@sta
>> a715181@@tmp_rte_pmd_af_xdp@sta
>> librte_pmd_af_xdp.a
>> librte_pmd_af_xdp.so
>> librte_pmd_af_xdp.so.1
>> librte_pmd_af_xdp.so.1.1
>> libtmp_rte_pmd_af_xdp.a
>> rte_pmd_af_xdp.pmd.c
>> 
>> But I found that if I install libbpf to /usr/local/lib64 by default,
>> application
>> built by meson build will fail to run:
>> 
>> $ ./dpdk-testpmd
>> ./dpdk-testpmd: error while loading shared libraries: libbpf.so.0:
>> cannot open shared object file: No such file or directory
>> 
>> While install libbpf to /usr/lib doesn't have this issue (I was
>> testing on ubuntu system).
>> Is it a expected behavior? Do we need any fix for it?
>
>Hi,
>
>That is expected and distro specific: if your distro doesn't add
>/usr/local/lib* to the compiler path, it also won't be in the
>LD_LIBRARY_PATH.
>
>So if you do:
>
>LD_LIBRARY_PATH=/usr/local/lib64 ./dpdk-testpmd

Yes, it does work for me.

>
>It should then work. It's not related to the build system, but just to
>what the default paths are in the distro.

Thanks for the confirmation, then I can send my v11 out :)

Thanks,
Xiaolong
>
>-- 
>Kind regards,
>Luca Boccassi

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v11 0/1] Introduce AF_XDP PMD
  2019-03-01  8:09 [PATCH v1 0/6] Introduce AF_XDP PMD Xiaolong Ye
                   ` (15 preceding siblings ...)
  2019-04-03 16:59 ` [PATCH v10 0/1] Introduce AF_XDP PMD Xiaolong Ye
@ 2019-04-04  8:51 ` Xiaolong Ye
  2019-04-04  8:51   ` [PATCH v11 1/1] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
  2019-04-04 16:13   ` [PATCH v11 0/1] Introduce AF_XDP PMD Ferruh Yigit
  16 siblings, 2 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-04-04  8:51 UTC (permalink / raw)
  To: dev, Stephen Hemminger, Ferruh Yigit, Luca Boccassi
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Bruce Richardson, Ananyev Konstantin, David Marchand,
	Andrew Rybchenko, Olivier Matz, Xiaolong Ye

Overview
========

This patchset adds a new PMD driver for AF_XDP, which is a proposed faster
version of the AF_PACKET interface in Linux; see the links [1] [2] referenced
in the patch for an introduction to AF_XDP.

AF_XDP roadmap
==============
- AF_XDP has been included in the upstream kernel since 4.18, and AF_XDP
  support in libbpf was merged in v5.1-rc1.
- The i40e and ixgbe drivers now support zero copy mode.

Change logs
===========
v11:

- fix the meson build issue

v10:

- refine the Makefile, remove RTE_KERNELDIR-related CFLAGS
- add a new internal file af_xdp_deps.h to handle the dependency on
  asm/barrier.h
- fix a typo observed by Stephen
- rename xsk.h to bpf/xsk.h as xsk.h is assumed to be installed in
  /usr/local/include/bpf
- add libbpf build steps in af_xdp.rst


v9:
- adjust header file order according to Stephen's suggestion

v8:
- address Ferruh's comments on V7
- replace posix_memalign with rte_memzone_reserve_aligned to get better
  performance
- keep only the first patch, as Olivier suggested, since the zero copy
  implementation is still on hold; we may provide the related patch
  later.

v7:
- mention mtu limitation in af_xdp.rst
- fix the vdev name in af_xdp.rst

V6:

- remove the newline in the AF_XDP_LOG definition to avoid the double
  newline issue.
- rename MEMPOOL_F_PAGE_ALIGN to MEMPOOL_CHUNK_F_PAGE_ALIGN.

V5:

- disable the AF_XDP pmd by default since it requires a kernel more recent
  than the minimum kernel version supported by DPDK
- address other review comments of Maxime

V4:

- change vdev name to net_af_xdp
- adopt dynamic log type for all logging
- Fix other style issues raised by Stephen

V3:

- Fix all style issues pointed out by Stephen, Mattias, David.
- Drop the testpmd change as we'll adopt Jerin's suggestion to add a new
  mempool driver to handle the application use of AF_XDP's zero copy feature.

V2:
- Fix a NULL pointer dereference crash issue
- Fix an issue where txonly stopped sending traffic in zc mode
- Refactor rte_mbuf.c to avoid ABI breakage.
- Fix multiple style issues pointed out by Ferruh, David, Stephen, Luca.

Changes vs. the RFC sent by Qi last August:

- Re-worked based on AF_XDP's interface changes, since the new libbpf
  provides higher-level APIs that hide many of the details of the AF_XDP
  uapi. The rework removes 300+ lines of code.

- Multi-queue is not supported because the current libbpf doesn't support
  multiple sockets on a single umem.

- No extra steps are needed to load an xdp program manually, since libbpf
  loads a default xdp program when the user calls xsk_socket__create; the
  userspace application only needs to handle the cleanup.

How to try
==========

1. take a kernel >= v5.1-rc1, build it and replace your host
   kernel with it.
   
   make sure you turn on XDP sockets when compiling

   Networking support -->
        Networking options -->
                [ * ] XDP sockets
   
2. build & install libbpf in tools/lib/bpf

   cd tools/lib/bpf
   make install_lib
   make install_headers

3. ethtool -L enp59s0f0 combined 1

4. extra step to build dpdk

   explicitly enable the AF_XDP pmd by adding the line below to
   config/common_linux

   CONFIG_RTE_LIBRTE_PMD_AF_XDP=y

5. start testpmd

   ./build/app/testpmd -c 0xc -n 4 --vdev net_af_xdp,iface=enp59s0f0,queue=0 -- -i --rxq=1 --txq=1

    in this case, the default xdp program will be loaded and linked to queue 0 of enp59s0f0;
    network traffic arriving at queue 0 will be redirected to the af_xdp socket.
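
    As a quick sanity check (a sketch; the 0.0.2 version string and the
    install paths are only examples from this thread's setup), verify that
    libbpf is discoverable before building and that the binary resolves it:

    pkg-config --modversion libbpf    # e.g. 0.0.2 for bpf-next's libbpf
    ldd ./build/app/testpmd | grep bpf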

Xiaolong Ye (1):
  net/af_xdp: introduce AF XDP PMD driver

 MAINTAINERS                                   |   7 +
 config/common_base                            |   5 +
 doc/guides/nics/af_xdp.rst                    |  50 +
 doc/guides/nics/features/af_xdp.ini           |  11 +
 doc/guides/nics/index.rst                     |   1 +
 doc/guides/rel_notes/release_19_05.rst        |   7 +
 drivers/net/Makefile                          |   1 +
 drivers/net/af_xdp/Makefile                   |  28 +
 drivers/net/af_xdp/af_xdp_deps.h              |  15 +
 drivers/net/af_xdp/meson.build                |  19 +
 drivers/net/af_xdp/rte_eth_af_xdp.c           | 955 ++++++++++++++++++
 drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   3 +
 drivers/net/meson.build                       |   1 +
 mk/rte.app.mk                                 |   1 +
 14 files changed, 1104 insertions(+)
 create mode 100644 doc/guides/nics/af_xdp.rst
 create mode 100644 doc/guides/nics/features/af_xdp.ini
 create mode 100644 drivers/net/af_xdp/Makefile
 create mode 100644 drivers/net/af_xdp/af_xdp_deps.h
 create mode 100644 drivers/net/af_xdp/meson.build
 create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
 create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map

-- 
2.17.1

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v11 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-04  8:51 ` [PATCH v11 0/1] Introduce AF_XDP PMD Xiaolong Ye
@ 2019-04-04  8:51   ` Xiaolong Ye
  2019-04-04 16:20     ` Luca Boccassi
  2019-04-04 23:39     ` [dpdk-dev] " Ferruh Yigit
  2019-04-04 16:13   ` [PATCH v11 0/1] Introduce AF_XDP PMD Ferruh Yigit
  1 sibling, 2 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-04-04  8:51 UTC (permalink / raw)
  To: dev, Stephen Hemminger, Ferruh Yigit, Luca Boccassi
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Bruce Richardson, Ananyev Konstantin, David Marchand,
	Andrew Rybchenko, Olivier Matz, Xiaolong Ye

Add a new PMD driver for AF_XDP, which is a proposed faster version of the
AF_PACKET interface in Linux. For more info about AF_XDP, please refer to [1]
[2].

This is the vanilla version PMD which just uses a raw buffer registered as
the umem.

[1] https://fosdem.org/2018/schedule/event/af_xdp/
[2] https://lwn.net/Articles/745934/

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 MAINTAINERS                                   |   7 +
 config/common_base                            |   5 +
 doc/guides/nics/af_xdp.rst                    |  50 +
 doc/guides/nics/features/af_xdp.ini           |  11 +
 doc/guides/nics/index.rst                     |   1 +
 doc/guides/rel_notes/release_19_05.rst        |   7 +
 drivers/net/Makefile                          |   1 +
 drivers/net/af_xdp/Makefile                   |  28 +
 drivers/net/af_xdp/af_xdp_deps.h              |  15 +
 drivers/net/af_xdp/meson.build                |  19 +
 drivers/net/af_xdp/rte_eth_af_xdp.c           | 955 ++++++++++++++++++
 drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   3 +
 drivers/net/meson.build                       |   1 +
 mk/rte.app.mk                                 |   1 +
 14 files changed, 1104 insertions(+)
 create mode 100644 doc/guides/nics/af_xdp.rst
 create mode 100644 doc/guides/nics/features/af_xdp.ini
 create mode 100644 drivers/net/af_xdp/Makefile
 create mode 100644 drivers/net/af_xdp/af_xdp_deps.h
 create mode 100644 drivers/net/af_xdp/meson.build
 create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
 create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 71ac8cd4b..f30572c07 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -489,6 +489,13 @@ M: John W. Linville <linville@tuxdriver.com>
 F: drivers/net/af_packet/
 F: doc/guides/nics/features/afpacket.ini
 
+Linux AF_XDP
+M: Xiaolong Ye <xiaolong.ye@intel.com>
+M: Qi Zhang <qi.z.zhang@intel.com>
+F: drivers/net/af_xdp/
+F: doc/guides/nics/af_xdp.rst
+F: doc/guides/nics/features/af_xdp.ini
+
 Amazon ENA
 M: Marcin Wojtas <mw@semihalf.com>
 M: Michal Krawczyk <mk@semihalf.com>
diff --git a/config/common_base b/config/common_base
index 6292bc4af..b95ee03d7 100644
--- a/config/common_base
+++ b/config/common_base
@@ -430,6 +430,11 @@ CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX_FREE=n
 #
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
 
+#
+# Compile software PMD backed by AF_XDP sockets (Linux only)
+#
+CONFIG_RTE_LIBRTE_PMD_AF_XDP=n
+
 #
 # Compile link bonding PMD library
 #
diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
new file mode 100644
index 000000000..175038153
--- /dev/null
+++ b/doc/guides/nics/af_xdp.rst
@@ -0,0 +1,50 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2019 Intel Corporation.
+
+AF_XDP Poll Mode Driver
+==========================
+
+AF_XDP is an address family that is optimized for high performance
+packet processing. AF_XDP sockets enable an XDP program to redirect
+packets to a memory buffer in userspace.
+
+For the full details behind AF_XDP sockets, you can refer to
+`AF_XDP documentation in the Kernel
+<https://www.kernel.org/doc/Documentation/networking/af_xdp.rst>`_.
+
+This Linux-specific PMD driver creates the AF_XDP socket and binds it to a
+specific netdev queue. It allows a DPDK application to send and receive raw
+packets through the socket, bypassing the kernel network stack.
+The current implementation only supports a single queue; multi-queue support
+will be added later.
+
+Note that the MTU of the AF_XDP PMD is limited because XDP lacks support for
+fragmentation.
+
+Options
+-------
+
+The following options can be provided to set up an af_xdp port in DPDK.
+
+*   ``iface`` - name of the Kernel interface to attach to (required);
+*   ``queue`` - netdev queue id (optional, default 0);
+
+Prerequisites
+-------------
+
+This is a Linux-specific PMD, thus the following prerequisites apply:
+
+*  A Linux kernel (version > v4.18) with XDP sockets configuration enabled;
+*  libbpf (from kernel version > v5.1-rc4) with the latest af_xdp support installed.
+   Users can install libbpf via `make install_lib` && `make install_headers` in
+   <kernel src tree>/tools/lib/bpf;
+*  A kernel-bound interface to attach to;
+
+Set up an af_xdp interface
+-----------------------------
+
+The following example will set up an af_xdp interface in DPDK:
+
+.. code-block:: console
+
+    --vdev net_af_xdp,iface=ens786f1,queue=0
diff --git a/doc/guides/nics/features/af_xdp.ini b/doc/guides/nics/features/af_xdp.ini
new file mode 100644
index 000000000..36953c2de
--- /dev/null
+++ b/doc/guides/nics/features/af_xdp.ini
@@ -0,0 +1,11 @@
+;
+; Supported features of the 'af_xdp' network poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Link status          = Y
+MTU update           = Y
+Promiscuous mode     = Y
+Stats per queue      = Y
+x86-64               = Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 5c80e3baa..a4b80a3d0 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -12,6 +12,7 @@ Network Interface Controller Drivers
     features
     build_and_test
     af_packet
+    af_xdp
     ark
     atlantic
     avp
diff --git a/doc/guides/rel_notes/release_19_05.rst b/doc/guides/rel_notes/release_19_05.rst
index bdad1ddbe..79e36739f 100644
--- a/doc/guides/rel_notes/release_19_05.rst
+++ b/doc/guides/rel_notes/release_19_05.rst
@@ -74,6 +74,13 @@ New Features
     process.
   * Added support for Rx packet types list in a secondary process.
 
+* **Added the AF_XDP PMD.**
+
+  Added a Linux-specific PMD driver for AF_XDP. It creates an AF_XDP socket
+  and binds it to a specific netdev queue, allowing a DPDK application to send
+  and receive raw packets through the socket while bypassing the kernel
+  network stack to achieve high performance packet processing.
+
 * **Updated Mellanox drivers.**
 
    New features and improvements were done in mlx4 and mlx5 PMDs:
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 502869a87..5d401b8c5 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -9,6 +9,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD),d)
 endif
 
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += af_packet
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += af_xdp
 DIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark
 DIRS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD) += atlantic
 DIRS-$(CONFIG_RTE_LIBRTE_AVP_PMD) += avp
diff --git a/drivers/net/af_xdp/Makefile b/drivers/net/af_xdp/Makefile
new file mode 100644
index 000000000..eeba6b693
--- /dev/null
+++ b/drivers/net/af_xdp/Makefile
@@ -0,0 +1,28 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Intel Corporation
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_af_xdp.a
+
+EXPORT_MAP := rte_pmd_af_xdp_version.map
+
+LIBABIVER := 1
+
+CFLAGS += -O3
+
+CFLAGS += $(WERROR_FLAGS)
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
+LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
+LDLIBS += -lrte_bus_vdev
+LDLIBS += $(shell command -v pkg-config > /dev/null 2>&1 && pkg-config --libs libbpf || echo "-lbpf")
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += rte_eth_af_xdp.c
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/af_xdp/af_xdp_deps.h b/drivers/net/af_xdp/af_xdp_deps.h
new file mode 100644
index 000000000..18416d094
--- /dev/null
+++ b/drivers/net/af_xdp/af_xdp_deps.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation.
+ */
+
+#ifndef AF_XDP_DEPS_H_
+#define AF_XDP_DEPS_H_
+
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+
+/* This is to fix xsk.h's dependency on asm/barrier.h */
+#define smp_rmb() rte_rmb()
+#define smp_wmb() rte_wmb()
+
+#endif /* AF_XDP_DEPS_H_ */
diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
new file mode 100644
index 000000000..840c93728
--- /dev/null
+++ b/drivers/net/af_xdp/meson.build
@@ -0,0 +1,19 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Intel Corporation
+
+if host_machine.system() == 'linux'
+	bpf_dep = dependency('libbpf', required: false)
+	if bpf_dep.found()
+		build = true
+	else
+		bpf_dep = cc.find_library('bpf', required: false)
+		if bpf_dep.found() and cc.has_header('bpf/xsk.h', dependencies: bpf_dep) and cc.has_header('linux/if_xdp.h')
+			build = true
+			pkgconfig_extra_libs += '-lbpf'
+		else
+			build = false
+		endif
+	endif
+	ext_deps += bpf_dep
+endif
+sources = files('rte_eth_af_xdp.c')
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
new file mode 100644
index 000000000..007a1c6b4
--- /dev/null
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -0,0 +1,955 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation.
+ */
+#include <unistd.h>
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+#include <netinet/in.h>
+#include <net/if.h>
+#include <sys/socket.h>
+#include <sys/ioctl.h>
+#include <linux/if_ether.h>
+#include <linux/if_xdp.h>
+#include <linux/if_link.h>
+#include "af_xdp_deps.h"
+#include <bpf/xsk.h>
+
+#include <rte_ethdev.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+#include <rte_string_fns.h>
+#include <rte_branch_prediction.h>
+#include <rte_common.h>
+#include <rte_config.h>
+#include <rte_dev.h>
+#include <rte_eal.h>
+#include <rte_ether.h>
+#include <rte_lcore.h>
+#include <rte_log.h>
+#include <rte_memory.h>
+#include <rte_memzone.h>
+#include <rte_mbuf.h>
+#include <rte_malloc.h>
+#include <rte_ring.h>
+
+#ifndef SOL_XDP
+#define SOL_XDP 283
+#endif
+
+#ifndef AF_XDP
+#define AF_XDP 44
+#endif
+
+#ifndef PF_XDP
+#define PF_XDP AF_XDP
+#endif
+
+static int af_xdp_logtype;
+
+#define AF_XDP_LOG(level, fmt, args...)			\
+	rte_log(RTE_LOG_ ## level, af_xdp_logtype,	\
+		"%s(): " fmt, __func__, ##args)
+
+#define ETH_AF_XDP_FRAME_SIZE		XSK_UMEM__DEFAULT_FRAME_SIZE
+#define ETH_AF_XDP_NUM_BUFFERS		4096
+#define ETH_AF_XDP_DATA_HEADROOM	0
+#define ETH_AF_XDP_DFLT_NUM_DESCS	XSK_RING_CONS__DEFAULT_NUM_DESCS
+#define ETH_AF_XDP_DFLT_QUEUE_IDX	0
+
+#define ETH_AF_XDP_RX_BATCH_SIZE	32
+#define ETH_AF_XDP_TX_BATCH_SIZE	32
+
+#define ETH_AF_XDP_MAX_QUEUE_PAIRS     16
+
+struct xsk_umem_info {
+	struct xsk_ring_prod fq;
+	struct xsk_ring_cons cq;
+	struct xsk_umem *umem;
+	struct rte_ring *buf_ring;
+	const struct rte_memzone *mz;
+};
+
+struct rx_stats {
+	uint64_t rx_pkts;
+	uint64_t rx_bytes;
+	uint64_t rx_dropped;
+};
+
+struct pkt_rx_queue {
+	struct xsk_ring_cons rx;
+	struct xsk_umem_info *umem;
+	struct xsk_socket *xsk;
+	struct rte_mempool *mb_pool;
+
+	struct rx_stats stats;
+
+	struct pkt_tx_queue *pair;
+	uint16_t queue_idx;
+};
+
+struct tx_stats {
+	uint64_t tx_pkts;
+	uint64_t err_pkts;
+	uint64_t tx_bytes;
+};
+
+struct pkt_tx_queue {
+	struct xsk_ring_prod tx;
+
+	struct tx_stats stats;
+
+	struct pkt_rx_queue *pair;
+	uint16_t queue_idx;
+};
+
+struct pmd_internals {
+	int if_index;
+	char if_name[IFNAMSIZ];
+	uint16_t queue_idx;
+	struct ether_addr eth_addr;
+	struct xsk_umem_info *umem;
+	struct rte_mempool *mb_pool_share;
+
+	struct pkt_rx_queue rx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
+	struct pkt_tx_queue tx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
+};
+
+#define ETH_AF_XDP_IFACE_ARG			"iface"
+#define ETH_AF_XDP_QUEUE_IDX_ARG		"queue"
+
+static const char * const valid_arguments[] = {
+	ETH_AF_XDP_IFACE_ARG,
+	ETH_AF_XDP_QUEUE_IDX_ARG,
+	NULL
+};
+
+static const struct rte_eth_link pmd_link = {
+	.link_speed = ETH_SPEED_NUM_10G,
+	.link_duplex = ETH_LINK_FULL_DUPLEX,
+	.link_status = ETH_LINK_DOWN,
+	.link_autoneg = ETH_LINK_AUTONEG
+};
+
+static inline int
+reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
+{
+	struct xsk_ring_prod *fq = &umem->fq;
+	uint32_t idx;
+	int i, ret;
+
+	ret = xsk_ring_prod__reserve(fq, reserve_size, &idx);
+	if (unlikely(!ret)) {
+		AF_XDP_LOG(ERR, "Failed to reserve enough fq descs.\n");
+		return ret;
+	}
+
+	for (i = 0; i < reserve_size; i++) {
+		__u64 *fq_addr;
+		void *addr = NULL;
+		if (rte_ring_dequeue(umem->buf_ring, &addr)) {
+			i--;
+			break;
+		}
+		fq_addr = xsk_ring_prod__fill_addr(fq, idx++);
+		*fq_addr = (uint64_t)addr;
+	}
+
+	xsk_ring_prod__submit(fq, i);
+
+	return 0;
+}
+
+static uint16_t
+eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+	struct pkt_rx_queue *rxq = queue;
+	struct xsk_ring_cons *rx = &rxq->rx;
+	struct xsk_umem_info *umem = rxq->umem;
+	struct xsk_ring_prod *fq = &umem->fq;
+	uint32_t idx_rx = 0;
+	uint32_t free_thresh = fq->size >> 1;
+	struct rte_mbuf *mbufs[ETH_AF_XDP_RX_BATCH_SIZE];
+	unsigned long dropped = 0;
+	unsigned long rx_bytes = 0;
+	uint16_t count = 0;
+	int rcvd, i;
+
+	nb_pkts = RTE_MIN(nb_pkts, ETH_AF_XDP_RX_BATCH_SIZE);
+
+	rcvd = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
+	if (rcvd == 0)
+		return 0;
+
+	if (xsk_prod_nb_free(fq, free_thresh) >= free_thresh)
+		(void)reserve_fill_queue(umem, ETH_AF_XDP_RX_BATCH_SIZE);
+
+	if (unlikely(rte_pktmbuf_alloc_bulk(rxq->mb_pool, mbufs, rcvd) != 0))
+		return 0;
+
+	for (i = 0; i < rcvd; i++) {
+		const struct xdp_desc *desc;
+		uint64_t addr;
+		uint32_t len;
+		void *pkt;
+
+		desc = xsk_ring_cons__rx_desc(rx, idx_rx++);
+		addr = desc->addr;
+		len = desc->len;
+		pkt = xsk_umem__get_data(rxq->umem->mz->addr, addr);
+
+		rte_memcpy(rte_pktmbuf_mtod(mbufs[i], void *), pkt, len);
+		rte_pktmbuf_pkt_len(mbufs[i]) = len;
+		rte_pktmbuf_data_len(mbufs[i]) = len;
+		rx_bytes += len;
+		bufs[count++] = mbufs[i];
+
+		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+	}
+
+	xsk_ring_cons__release(rx, rcvd);
+
+	/* statistics */
+	rxq->stats.rx_pkts += (rcvd - dropped);
+	rxq->stats.rx_bytes += rx_bytes;
+
+	return count;
+}
+
+static void
+pull_umem_cq(struct xsk_umem_info *umem, int size)
+{
+	struct xsk_ring_cons *cq = &umem->cq;
+	size_t i, n;
+	uint32_t idx_cq = 0;
+
+	n = xsk_ring_cons__peek(cq, size, &idx_cq);
+
+	for (i = 0; i < n; i++) {
+		uint64_t addr;
+		addr = *xsk_ring_cons__comp_addr(cq, idx_cq++);
+		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+	}
+
+	xsk_ring_cons__release(cq, n);
+}
+
+static void
+kick_tx(struct pkt_tx_queue *txq)
+{
+	struct xsk_umem_info *umem = txq->pair->umem;
+
+	while (send(xsk_socket__fd(txq->pair->xsk), NULL,
+		      0, MSG_DONTWAIT) < 0) {
+		/* something unexpected */
+		if (errno != EBUSY && errno != EAGAIN && errno != EINTR)
+			break;
+
+		/* pull from completion queue to leave more space */
+		if (errno == EAGAIN)
+			pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
+	}
+	pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
+}
+
+static uint16_t
+eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+	struct pkt_tx_queue *txq = queue;
+	struct xsk_umem_info *umem = txq->pair->umem;
+	struct rte_mbuf *mbuf;
+	void *addrs[ETH_AF_XDP_TX_BATCH_SIZE];
+	unsigned long tx_bytes = 0;
+	int i, valid = 0;
+	uint32_t idx_tx;
+
+	nb_pkts = RTE_MIN(nb_pkts, ETH_AF_XDP_TX_BATCH_SIZE);
+
+	pull_umem_cq(umem, nb_pkts);
+
+	nb_pkts = rte_ring_dequeue_bulk(umem->buf_ring, addrs,
+					nb_pkts, NULL);
+	if (nb_pkts == 0)
+		return 0;
+
+	if (xsk_ring_prod__reserve(&txq->tx, nb_pkts, &idx_tx) != nb_pkts) {
+		kick_tx(txq);
+		return 0;
+	}
+
+	for (i = 0; i < nb_pkts; i++) {
+		struct xdp_desc *desc;
+		void *pkt;
+		uint32_t buf_len = ETH_AF_XDP_FRAME_SIZE
+					- ETH_AF_XDP_DATA_HEADROOM;
+		desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i);
+		mbuf = bufs[i];
+		if (mbuf->pkt_len <= buf_len) {
+			desc->addr = (uint64_t)addrs[valid];
+			desc->len = mbuf->pkt_len;
+			pkt = xsk_umem__get_data(umem->mz->addr,
+						 desc->addr);
+			rte_memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
+			       desc->len);
+			valid++;
+			tx_bytes += mbuf->pkt_len;
+		}
+		rte_pktmbuf_free(mbuf);
+	}
+
+	xsk_ring_prod__submit(&txq->tx, nb_pkts);
+
+	kick_tx(txq);
+
+	if (valid < nb_pkts)
+		rte_ring_enqueue_bulk(umem->buf_ring, &addrs[valid],
+				 nb_pkts - valid, NULL);
+
+	txq->stats.err_pkts += nb_pkts - valid;
+	txq->stats.tx_pkts += valid;
+	txq->stats.tx_bytes += tx_bytes;
+
+	return nb_pkts;
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = ETH_LINK_UP;
+
+	return 0;
+}
+
+/* This function gets called when the current port gets stopped. */
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = ETH_LINK_DOWN;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev)
+{
+	/* rx/tx must be paired */
+	if (dev->data->nb_rx_queues != dev->data->nb_tx_queues)
+		return -EINVAL;
+
+	return 0;
+}
+
+static void
+eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	dev_info->if_index = internals->if_index;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_pktlen = ETH_FRAME_LEN;
+	dev_info->max_rx_queues = 1;
+	dev_info->max_tx_queues = 1;
+
+	dev_info->default_rxportconf.nb_queues = 1;
+	dev_info->default_txportconf.nb_queues = 1;
+	dev_info->default_rxportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
+	dev_info->default_txportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
+}
+
+static int
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct xdp_statistics xdp_stats;
+	struct pkt_rx_queue *rxq;
+	socklen_t optlen;
+	int i, ret;
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		optlen = sizeof(struct xdp_statistics);
+		rxq = &internals->rx_queues[i];
+		stats->q_ipackets[i] = internals->rx_queues[i].stats.rx_pkts;
+		stats->q_ibytes[i] = internals->rx_queues[i].stats.rx_bytes;
+
+		stats->q_opackets[i] = internals->tx_queues[i].stats.tx_pkts;
+		stats->q_obytes[i] = internals->tx_queues[i].stats.tx_bytes;
+
+		stats->ipackets += stats->q_ipackets[i];
+		stats->ibytes += stats->q_ibytes[i];
+		stats->imissed += internals->rx_queues[i].stats.rx_dropped;
+		ret = getsockopt(xsk_socket__fd(rxq->xsk), SOL_XDP,
+				XDP_STATISTICS, &xdp_stats, &optlen);
+		if (ret != 0) {
+			AF_XDP_LOG(ERR, "getsockopt() failed for XDP_STATISTICS.\n");
+			return -1;
+		}
+		stats->imissed += xdp_stats.rx_dropped;
+
+		stats->opackets += stats->q_opackets[i];
+		stats->oerrors += internals->tx_queues[i].stats.err_pkts;
+		stats->obytes += stats->q_obytes[i];
+	}
+
+	return 0;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	int i;
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		memset(&internals->rx_queues[i].stats, 0,
+					sizeof(struct rx_stats));
+		memset(&internals->tx_queues[i].stats, 0,
+					sizeof(struct tx_stats));
+	}
+}
+
+static void
+remove_xdp_program(struct pmd_internals *internals)
+{
+	uint32_t curr_prog_id = 0;
+
+	if (bpf_get_link_xdp_id(internals->if_index, &curr_prog_id,
+				XDP_FLAGS_UPDATE_IF_NOEXIST)) {
+		AF_XDP_LOG(ERR, "bpf_get_link_xdp_id failed\n");
+		return;
+	}
+	bpf_set_link_xdp_fd(internals->if_index, -1,
+			XDP_FLAGS_UPDATE_IF_NOEXIST);
+}
+
+static void
+eth_dev_close(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct pkt_rx_queue *rxq;
+	int i;
+
+	AF_XDP_LOG(INFO, "Closing AF_XDP ethdev on numa socket %u\n",
+		rte_socket_id());
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		rxq = &internals->rx_queues[i];
+		if (rxq->umem == NULL)
+			break;
+		xsk_socket__delete(rxq->xsk);
+	}
+
+	(void)xsk_umem__delete(internals->umem->umem);
+	remove_xdp_program(internals);
+}
+
+static void
+eth_queue_release(void *q __rte_unused)
+{
+}
+
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+		int wait_to_complete __rte_unused)
+{
+	return 0;
+}
+
+static void
+xdp_umem_destroy(struct xsk_umem_info *umem)
+{
+	rte_memzone_free(umem->mz);
+	umem->mz = NULL;
+
+	rte_ring_free(umem->buf_ring);
+	umem->buf_ring = NULL;
+
+	rte_free(umem);
+	umem = NULL;
+}
+
+static struct
+xsk_umem_info *xdp_umem_configure(void)
+{
+	struct xsk_umem_info *umem;
+	const struct rte_memzone *mz;
+	struct xsk_umem_config usr_config = {
+		.fill_size = ETH_AF_XDP_DFLT_NUM_DESCS,
+		.comp_size = ETH_AF_XDP_DFLT_NUM_DESCS,
+		.frame_size = ETH_AF_XDP_FRAME_SIZE,
+		.frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
+	int ret;
+	uint64_t i;
+
+	umem = rte_zmalloc_socket("umem", sizeof(*umem), 0, rte_socket_id());
+	if (umem == NULL) {
+		AF_XDP_LOG(ERR, "Failed to allocate umem info");
+		return NULL;
+	}
+
+	umem->buf_ring = rte_ring_create("af_xdp_ring",
+					 ETH_AF_XDP_NUM_BUFFERS,
+					 rte_socket_id(),
+					 0x0);
+	if (umem->buf_ring == NULL) {
+		AF_XDP_LOG(ERR, "Failed to create rte_ring\n");
+		goto err;
+	}
+
+	for (i = 0; i < ETH_AF_XDP_NUM_BUFFERS; i++)
+		rte_ring_enqueue(umem->buf_ring,
+				 (void *)(i * ETH_AF_XDP_FRAME_SIZE +
+					  ETH_AF_XDP_DATA_HEADROOM));
+
+	mz = rte_memzone_reserve_aligned("af_xdp umem",
+			ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
+			rte_socket_id(), RTE_MEMZONE_IOVA_CONTIG,
+			getpagesize());
+	if (mz == NULL) {
+		AF_XDP_LOG(ERR, "Failed to reserve memzone for af_xdp umem.\n");
+		goto err;
+	}
+
+	ret = xsk_umem__create(&umem->umem, mz->addr,
+			       ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
+			       &umem->fq, &umem->cq,
+			       &usr_config);
+
+	if (ret) {
+		AF_XDP_LOG(ERR, "Failed to create umem");
+		goto err;
+	}
+	umem->mz = mz;
+
+	return umem;
+
+err:
+	xdp_umem_destroy(umem);
+	return NULL;
+}
+
+static int
+xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
+	      int ring_size)
+{
+	struct xsk_socket_config cfg;
+	struct pkt_tx_queue *txq = rxq->pair;
+	int ret = 0;
+	int reserve_size;
+
+	rxq->umem = xdp_umem_configure();
+	if (rxq->umem == NULL)
+		return -ENOMEM;
+
+	cfg.rx_size = ring_size;
+	cfg.tx_size = ring_size;
+	cfg.libbpf_flags = 0;
+	cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
+	cfg.bind_flags = 0;
+	ret = xsk_socket__create(&rxq->xsk, internals->if_name,
+			internals->queue_idx, rxq->umem->umem, &rxq->rx,
+			&txq->tx, &cfg);
+	if (ret) {
+		AF_XDP_LOG(ERR, "Failed to create xsk socket.\n");
+		goto err;
+	}
+
+	reserve_size = ETH_AF_XDP_DFLT_NUM_DESCS / 2;
+	ret = reserve_fill_queue(rxq->umem, reserve_size);
+	if (ret) {
+		xsk_socket__delete(rxq->xsk);
+		AF_XDP_LOG(ERR, "Failed to reserve fill queue.\n");
+		goto err;
+	}
+
+	return 0;
+
+err:
+	xdp_umem_destroy(rxq->umem);
+
+	return ret;
+}
+
+static void
+queue_reset(struct pmd_internals *internals, uint16_t queue_idx)
+{
+	struct pkt_rx_queue *rxq = &internals->rx_queues[queue_idx];
+	struct pkt_tx_queue *txq = rxq->pair;
+
+	memset(rxq, 0, sizeof(*rxq));
+	memset(txq, 0, sizeof(*txq));
+	rxq->pair = txq;
+	txq->pair = rxq;
+	rxq->queue_idx = queue_idx;
+	txq->queue_idx = queue_idx;
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev,
+		   uint16_t rx_queue_id,
+		   uint16_t nb_rx_desc,
+		   unsigned int socket_id __rte_unused,
+		   const struct rte_eth_rxconf *rx_conf __rte_unused,
+		   struct rte_mempool *mb_pool)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	uint32_t buf_size, data_size;
+	struct pkt_rx_queue *rxq;
+	int ret;
+
+	rxq = &internals->rx_queues[rx_queue_id];
+	queue_reset(internals, rx_queue_id);
+
+	/* Now get the space available for data in the mbuf */
+	buf_size = rte_pktmbuf_data_room_size(mb_pool) -
+		RTE_PKTMBUF_HEADROOM;
+	data_size = ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_DATA_HEADROOM;
+
+	if (data_size > buf_size) {
+		AF_XDP_LOG(ERR, "%s: %d bytes will not fit in mbuf (%d bytes)\n",
+			dev->device->name, data_size, buf_size);
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	rxq->mb_pool = mb_pool;
+
+	if (xsk_configure(internals, rxq, nb_rx_desc)) {
+		AF_XDP_LOG(ERR, "Failed to configure xdp socket\n");
+		ret = -EINVAL;
+		goto err;
+	}
+
+	internals->umem = rxq->umem;
+
+	dev->data->rx_queues[rx_queue_id] = rxq;
+	return 0;
+
+err:
+	queue_reset(internals, rx_queue_id);
+	return ret;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev,
+		   uint16_t tx_queue_id,
+		   uint16_t nb_tx_desc __rte_unused,
+		   unsigned int socket_id __rte_unused,
+		   const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct pkt_tx_queue *txq;
+
+	txq = &internals->tx_queues[tx_queue_id];
+
+	dev->data->tx_queues[tx_queue_id] = txq;
+	return 0;
+}
+
+static int
+eth_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct ifreq ifr = { .ifr_mtu = mtu };
+	int ret;
+	int s;
+
+	s = socket(PF_INET, SOCK_DGRAM, 0);
+	if (s < 0)
+		return -EINVAL;
+
+	strlcpy(ifr.ifr_name, internals->if_name, IFNAMSIZ);
+	ret = ioctl(s, SIOCSIFMTU, &ifr);
+	close(s);
+
+	return (ret < 0) ? -errno : 0;
+}
+
+static void
+eth_dev_change_flags(char *if_name, uint32_t flags, uint32_t mask)
+{
+	struct ifreq ifr;
+	int s;
+
+	s = socket(PF_INET, SOCK_DGRAM, 0);
+	if (s < 0)
+		return;
+
+	strlcpy(ifr.ifr_name, if_name, IFNAMSIZ);
+	if (ioctl(s, SIOCGIFFLAGS, &ifr) < 0)
+		goto out;
+	ifr.ifr_flags &= mask;
+	ifr.ifr_flags |= flags;
+	if (ioctl(s, SIOCSIFFLAGS, &ifr) < 0)
+		goto out;
+out:
+	close(s);
+}
+
+static void
+eth_dev_promiscuous_enable(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	eth_dev_change_flags(internals->if_name, IFF_PROMISC, ~0);
+}
+
+static void
+eth_dev_promiscuous_disable(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	eth_dev_change_flags(internals->if_name, 0, ~IFF_PROMISC);
+}
+
+static const struct eth_dev_ops ops = {
+	.dev_start = eth_dev_start,
+	.dev_stop = eth_dev_stop,
+	.dev_close = eth_dev_close,
+	.dev_configure = eth_dev_configure,
+	.dev_infos_get = eth_dev_info,
+	.mtu_set = eth_dev_mtu_set,
+	.promiscuous_enable = eth_dev_promiscuous_enable,
+	.promiscuous_disable = eth_dev_promiscuous_disable,
+	.rx_queue_setup = eth_rx_queue_setup,
+	.tx_queue_setup = eth_tx_queue_setup,
+	.rx_queue_release = eth_queue_release,
+	.tx_queue_release = eth_queue_release,
+	.link_update = eth_link_update,
+	.stats_get = eth_stats_get,
+	.stats_reset = eth_stats_reset,
+};
+
+/** parse integer argument */
+static int
+parse_integer_arg(const char *key __rte_unused,
+		  const char *value, void *extra_args)
+{
+	int *i = (int *)extra_args;
+	char *end;
+
+	*i = strtol(value, &end, 10);
+	if (*i < 0) {
+		AF_XDP_LOG(ERR, "Argument has to be positive.\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/** parse name argument */
+static int
+parse_name_arg(const char *key __rte_unused,
+	       const char *value, void *extra_args)
+{
+	char *name = extra_args;
+
+	if (strnlen(value, IFNAMSIZ) > IFNAMSIZ - 1) {
+		AF_XDP_LOG(ERR, "Invalid name %s, should be less than %u bytes.\n",
+			   value, IFNAMSIZ);
+		return -EINVAL;
+	}
+
+	strlcpy(name, value, IFNAMSIZ);
+
+	return 0;
+}
+
+static int
+parse_parameters(struct rte_kvargs *kvlist,
+		 char *if_name,
+		 int *queue_idx)
+{
+	int ret;
+
+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_IFACE_ARG,
+				 &parse_name_arg, if_name);
+	if (ret < 0)
+		goto free_kvlist;
+
+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_QUEUE_IDX_ARG,
+				 &parse_integer_arg, queue_idx);
+	if (ret < 0)
+		goto free_kvlist;
+
+free_kvlist:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
+static int
+get_iface_info(const char *if_name,
+	       struct ether_addr *eth_addr,
+	       int *if_index)
+{
+	struct ifreq ifr;
+	int sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_IP);
+
+	if (sock < 0)
+		return -1;
+
+	strlcpy(ifr.ifr_name, if_name, IFNAMSIZ);
+	if (ioctl(sock, SIOCGIFINDEX, &ifr))
+		goto error;
+
+	*if_index = ifr.ifr_ifindex;
+
+	if (ioctl(sock, SIOCGIFHWADDR, &ifr))
+		goto error;
+
+	rte_memcpy(eth_addr, ifr.ifr_hwaddr.sa_data, ETHER_ADDR_LEN);
+
+	close(sock);
+	return 0;
+
+error:
+	close(sock);
+	return -1;
+}
+
+static struct rte_eth_dev *
+init_internals(struct rte_vdev_device *dev,
+	       const char *if_name,
+	       int queue_idx)
+{
+	const char *name = rte_vdev_device_name(dev);
+	const unsigned int numa_node = dev->device.numa_node;
+	struct pmd_internals *internals;
+	struct rte_eth_dev *eth_dev;
+	int ret;
+	int i;
+
+	internals = rte_zmalloc_socket(name, sizeof(*internals), 0, numa_node);
+	if (internals == NULL)
+		return NULL;
+
+	internals->queue_idx = queue_idx;
+	strlcpy(internals->if_name, if_name, IFNAMSIZ);
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		internals->tx_queues[i].pair = &internals->rx_queues[i];
+		internals->rx_queues[i].pair = &internals->tx_queues[i];
+	}
+
+	ret = get_iface_info(if_name, &internals->eth_addr,
+			     &internals->if_index);
+	if (ret)
+		goto err;
+
+	eth_dev = rte_eth_vdev_allocate(dev, 0);
+	if (eth_dev == NULL)
+		goto err;
+
+	eth_dev->data->dev_private = internals;
+	eth_dev->data->dev_link = pmd_link;
+	eth_dev->data->mac_addrs = &internals->eth_addr;
+	eth_dev->dev_ops = &ops;
+	eth_dev->rx_pkt_burst = eth_af_xdp_rx;
+	eth_dev->tx_pkt_burst = eth_af_xdp_tx;
+
+	return eth_dev;
+
+err:
+	rte_free(internals);
+	return NULL;
+}
+
+static int
+rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
+{
+	struct rte_kvargs *kvlist;
+	char if_name[IFNAMSIZ] = {'\0'};
+	int xsk_queue_idx = ETH_AF_XDP_DFLT_QUEUE_IDX;
+	struct rte_eth_dev *eth_dev = NULL;
+	const char *name;
+
+	AF_XDP_LOG(INFO, "Initializing pmd_af_xdp for %s\n",
+		rte_vdev_device_name(dev));
+
+	name = rte_vdev_device_name(dev);
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY &&
+		strlen(rte_vdev_device_args(dev)) == 0) {
+		eth_dev = rte_eth_dev_attach_secondary(name);
+		if (eth_dev == NULL) {
+			AF_XDP_LOG(ERR, "Failed to probe %s\n", name);
+			return -EINVAL;
+		}
+		eth_dev->dev_ops = &ops;
+		rte_eth_dev_probing_finish(eth_dev);
+		return 0;
+	}
+
+	kvlist = rte_kvargs_parse(rte_vdev_device_args(dev), valid_arguments);
+	if (kvlist == NULL) {
+		AF_XDP_LOG(ERR, "Invalid kvargs key\n");
+		return -EINVAL;
+	}
+
+	if (dev->device.numa_node == SOCKET_ID_ANY)
+		dev->device.numa_node = rte_socket_id();
+
+	if (parse_parameters(kvlist, if_name, &xsk_queue_idx) < 0) {
+		AF_XDP_LOG(ERR, "Invalid kvargs value\n");
+		return -EINVAL;
+	}
+
+	if (strlen(if_name) == 0) {
+		AF_XDP_LOG(ERR, "Network interface must be specified\n");
+		return -EINVAL;
+	}
+
+	eth_dev = init_internals(dev, if_name, xsk_queue_idx);
+	if (eth_dev == NULL) {
+		AF_XDP_LOG(ERR, "Failed to init internals\n");
+		return -1;
+	}
+
+	rte_eth_dev_probing_finish(eth_dev);
+
+	return 0;
+}
+
+static int
+rte_pmd_af_xdp_remove(struct rte_vdev_device *dev)
+{
+	struct rte_eth_dev *eth_dev = NULL;
+	struct pmd_internals *internals;
+
+	AF_XDP_LOG(INFO, "Removing AF_XDP ethdev on numa socket %u\n",
+		rte_socket_id());
+
+	if (dev == NULL)
+		return -1;
+
+	/* find the ethdev entry */
+	eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(dev));
+	if (eth_dev == NULL)
+		return -1;
+
+	internals = eth_dev->data->dev_private;
+
+	rte_ring_free(internals->umem->buf_ring);
+	rte_memzone_free(internals->umem->mz);
+	rte_free(internals->umem);
+
+	rte_eth_dev_release_port(eth_dev);
+
+
+	return 0;
+}
+
+static struct rte_vdev_driver pmd_af_xdp_drv = {
+	.probe = rte_pmd_af_xdp_probe,
+	.remove = rte_pmd_af_xdp_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(net_af_xdp, pmd_af_xdp_drv);
+RTE_PMD_REGISTER_PARAM_STRING(net_af_xdp,
+			      "iface=<string> "
+			      "queue=<int> ");
+
+RTE_INIT(af_xdp_init_log)
+{
+	af_xdp_logtype = rte_log_register("pmd.net.af_xdp");
+	if (af_xdp_logtype >= 0)
+		rte_log_set_level(af_xdp_logtype, RTE_LOG_NOTICE);
+}
diff --git a/drivers/net/af_xdp/rte_pmd_af_xdp_version.map b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
new file mode 100644
index 000000000..c6db030fe
--- /dev/null
+++ b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
@@ -0,0 +1,3 @@
+DPDK_19.05 {
+	local: *;
+};
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index 3ecc78cee..1105e72d8 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2017 Intel Corporation
 
 drivers = ['af_packet',
+	'af_xdp',
 	'ark',
 	'atlantic',
 	'avp',
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 262132fc6..f916bc9ef 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -143,6 +143,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_DPAA2_MEMPOOL)  += -lrte_mempool_dpaa2
 endif
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     += -lrte_pmd_af_xdp -lbpf
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ARK_PMD)        += -lrte_pmd_ark
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD)   += -lrte_pmd_atlantic
 _LDLIBS-$(CONFIG_RTE_LIBRTE_AVP_PMD)        += -lrte_pmd_avp
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 214+ messages in thread

* Re: [PATCH v11 0/1] Introduce AF_XDP PMD
  2019-04-04  8:51 ` [PATCH v11 0/1] Introduce AF_XDP PMD Xiaolong Ye
  2019-04-04  8:51   ` [PATCH v11 1/1] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
@ 2019-04-04 16:13   ` Ferruh Yigit
  1 sibling, 0 replies; 214+ messages in thread
From: Ferruh Yigit @ 2019-04-04 16:13 UTC (permalink / raw)
  To: Xiaolong Ye, dev, Stephen Hemminger, Luca Boccassi
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Bruce Richardson, Ananyev Konstantin, David Marchand,
	Andrew Rybchenko, Olivier Matz

On 4/4/2019 9:51 AM, Xiaolong Ye wrote:
> Overview
> ========
> 
> This patchset adds a new PMD driver for AF_XDP which is a proposed
> faster version of AF_PACKET interface in Linux, see below links [1] [2] for
> details of AF_XDP introduction:
> 
> AF_XDP roadmap
> ==============
> - AF_XDP is included in upstream kernel since 4.18, and AF_XDP support
>   in libbpf has been merged in v5.1-rc1.
> - Now i40e and ixgbe drivers have supported zero copy mode.
> 
> Change logs
> ===========
> v11:
> 
> - fix the meson build issue
> 
> v10:
> 
> - refine the Makefile, remove RTE_KERNELDIR related CFLAGS 
> - add a new internal file af_xdp_deps.h to handle the dependency for
>   asm/barrier.h
> - fix a typo observed by Stephen
> - rename xsk.h to bpf/xsk.h as xsk.h is assumed to be installed in
>   /usr/local/include/bpf
> - add libbpf build steps in af_xdp.rst
> 
> 
> v9:
> - adjust header files order according to Stephen's suggestion
> 
> v8:
> - address Ferruh's comments on V7
> - replace posix_memalign with rte_memzone_reserve_aligned to get better
>   performance
> - keep the first patch only as Oliver suggested as zero copy part
>   implementation is still in suspense, we may provide the related patch
>   later.
> 
> v7:
> - mention mtu limitation in af_xdp.rst
> - fix the vdev name in af_xdp.rst
> 
> V6:
> 
> - remove the newline in AF_XDP_LOG definition to avoid double new lines
>   issue.
> - rename MEMPOOL_F_PAGE_ALIGN to MEMPOOL_CHUNK_F_PAGE_ALIGN.
> 
> V5:
> 
> - disable AF_XDP pmd by default due to it requires kernel more recent
>   than minimum kernel version supported by DPDK
> - address other review comments of Maxime
> 
> V4:
> 
> - change vdev name to net_af_xdp
> - adopt dynamic log type for all logging
> - Fix other style issues raised by Stephen
> 
> V3:
> 
> - Fix all style issues pointed by Stephen, Mattias, David.
> - Drop the testpmd change as we'll adopt Jerin's suggestion to add a new
>   mempool driver to handle the application use of AF_XDP's zero copy feature.
> 
> V2:
> - Fix a NULL pointer reference crash issue
> - Fix txonly stop sending traffic in zc mode
> - Refactor rte_mbuf.c to avoid ABI breakage.
> - Fix multiple style issues pointed by Ferruh, David, Stephen, Luca.
> 
> changes vs RFC sent by Qi last Aug:
> 
> - Re-work base on AF_XDP's interface changes since the new libbpf has
>   provided higher-level APIs that hide many of the details of the AP_XDP
>   uapi. After the rework, it helps to reduce 300+ lines of code.
> 
> - multi-queues is not supported due to current libbpf doesn't support
>   multi-sockets on a single umem.
> 
> - No extra steps to load xdp program manually, since the current behavior of
>   libbpf would load a default xdp program when user calls xsk_socket__create.
>   userspace application only needs to handle the cleanup.
> 
> How to try
> ==========
> 
> 1. take the kernel >= v5.1-rc1, build kernel and replace your host
>    kernel with it.
>    
>    make sure you turn on XDP sockets when compiling
> 
>    Networking support -->
>         Networking options -->
>                 [ * ] XDP sockets
>    
> 2. build & install libbpf in tools/lib/bpf
> 
>    cd tools/lib/bpf
>    make install_lib
>    make install_headers
> 
> 3. ethtool -L enp59s0f0 combined 1
> 
> 4. extra step to build dpdk
> 
>    explicitly enable AF_XDP pmd by adding below line to
>    config/common_linux
> 
>    CONFIG_RTE_LIBRTE_PMD_AF_XDP=y
> 
> 5. start testpmd
> 
>    ./build/app/testpmd -c 0xc -n 4 --vdev net_af_xdp,iface=enp59s0f0,queue=0 -- -i --rxq=1 --txq=1
> 
>     in this case, default xdp program will be loaded and linked to queue 0 of enp59s0f0,
>     network traffics travel to queue 0 will be redirected to af_xdp socket.
> 
> Xiaolong Ye (1):
>   net/af_xdp: introduce AF XDP PMD driver

Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v11 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-04  8:51   ` [PATCH v11 1/1] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
@ 2019-04-04 16:20     ` Luca Boccassi
  2019-04-04 16:41       ` Stephen Hemminger
  2019-04-04 23:39     ` [dpdk-dev] " Ferruh Yigit
  1 sibling, 1 reply; 214+ messages in thread
From: Luca Boccassi @ 2019-04-04 16:20 UTC (permalink / raw)
  To: Xiaolong Ye, dev, Stephen Hemminger, Ferruh Yigit
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Bruce Richardson, Ananyev Konstantin, David Marchand,
	Andrew Rybchenko, Olivier Matz

On Thu, 2019-04-04 at 16:51 +0800, Xiaolong Ye wrote:
> Add a new PMD driver for AF_XDP which is a proposed faster version of
> AF_PACKET interface in Linux. More info about AF_XDP, please refer to
> [1] [2].
> 
> This is the vanilla version PMD which just uses a raw buffer registered
> as the umem.
> 
> [1] https://fosdem.org/2018/schedule/event/af_xdp/
> [2] https://lwn.net/Articles/745934/
> 
> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
> ---
>  MAINTAINERS                                   |   7 +
>  config/common_base                            |   5 +
>  doc/guides/nics/af_xdp.rst                    |  50 +
>  doc/guides/nics/features/af_xdp.ini           |  11 +
>  doc/guides/nics/index.rst                     |   1 +
>  doc/guides/rel_notes/release_19_05.rst        |   7 +
>  drivers/net/Makefile                          |   1 +
>  drivers/net/af_xdp/Makefile                   |  28 +
>  drivers/net/af_xdp/af_xdp_deps.h              |  15 +
>  drivers/net/af_xdp/meson.build                |  19 +
>  drivers/net/af_xdp/rte_eth_af_xdp.c           | 955
> ++++++++++++++++++
>  drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   3 +
>  drivers/net/meson.build                       |   1 +
>  mk/rte.app.mk                                 |   1 +
>  14 files changed, 1104 insertions(+)
>  create mode 100644 doc/guides/nics/af_xdp.rst
>  create mode 100644 doc/guides/nics/features/af_xdp.ini
>  create mode 100644 drivers/net/af_xdp/Makefile
>  create mode 100644 drivers/net/af_xdp/af_xdp_deps.h
>  create mode 100644 drivers/net/af_xdp/meson.build
>  create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
>  create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map

Acked-by: Luca Boccassi <bluca@debian.org>

-- 
Kind regards,
Luca Boccassi

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v11 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-04 16:20     ` Luca Boccassi
@ 2019-04-04 16:41       ` Stephen Hemminger
  2019-04-04 17:05         ` Ferruh Yigit
  0 siblings, 1 reply; 214+ messages in thread
From: Stephen Hemminger @ 2019-04-04 16:41 UTC (permalink / raw)
  To: Luca Boccassi
  Cc: Xiaolong Ye, dev, Ferruh Yigit, Qi Zhang, Karlsson Magnus,
	Topel Bjorn, Maxime Coquelin, Bruce Richardson,
	Ananyev Konstantin, David Marchand, Andrew Rybchenko,
	Olivier Matz

On Thu, 04 Apr 2019 17:20:43 +0100
Luca Boccassi <bluca@debian.org> wrote:

> On Thu, 2019-04-04 at 16:51 +0800, Xiaolong Ye wrote:
> > Add a new PMD driver for AF_XDP which is a proposed faster version of
> > AF_PACKET interface in Linux. More info about AF_XDP, please refer to
> > [1] [2].
> > 
> > This is the vanilla version PMD which just uses a raw buffer registered
> > as the umem.
> > 
> > [1] https://fosdem.org/2018/schedule/event/af_xdp/
> > [2] https://lwn.net/Articles/745934/
> > 
> > Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>

Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v11 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-04 16:41       ` Stephen Hemminger
@ 2019-04-04 17:05         ` Ferruh Yigit
  0 siblings, 0 replies; 214+ messages in thread
From: Ferruh Yigit @ 2019-04-04 17:05 UTC (permalink / raw)
  To: Stephen Hemminger, Luca Boccassi
  Cc: Xiaolong Ye, dev, Qi Zhang, Karlsson Magnus, Topel Bjorn,
	Maxime Coquelin, Bruce Richardson, Ananyev Konstantin,
	David Marchand, Andrew Rybchenko, Olivier Matz

On 4/4/2019 5:41 PM, Stephen Hemminger wrote:
> On Thu, 04 Apr 2019 17:20:43 +0100
> Luca Boccassi <bluca@debian.org> wrote:
> 
>> On Thu, 2019-04-04 at 16:51 +0800, Xiaolong Ye wrote:
>>> Add a new PMD driver for AF_XDP which is a proposed faster version of
>>> AF_PACKET interface in Linux. More info about AF_XDP, please refer to
>>> [1] [2].
>>>
>>> This is the vanilla version PMD which just uses a raw buffer registered
>>> as the umem.
>>>
>>> [1] https://fosdem.org/2018/schedule/event/af_xdp/
>>> [2] https://lwn.net/Articles/745934/
>>>
>>> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
> 
> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
> 

Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

Applied to dpdk-next-net/master, thanks.

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [dpdk-dev] [PATCH v11 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-04  8:51   ` [PATCH v11 1/1] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
  2019-04-04 16:20     ` Luca Boccassi
@ 2019-04-04 23:39     ` Ferruh Yigit
  2019-04-05 15:05       ` Ye Xiaolong
  1 sibling, 1 reply; 214+ messages in thread
From: Ferruh Yigit @ 2019-04-04 23:39 UTC (permalink / raw)
  To: Xiaolong Ye, dev, Stephen Hemminger, Luca Boccassi
  Cc: Qi Zhang, Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Bruce Richardson, Ananyev Konstantin, David Marchand,
	Andrew Rybchenko, Olivier Matz

On 4/4/2019 9:51 AM, Xiaolong Ye wrote:
> Add a new PMD driver for AF_XDP which is a proposed faster version of
> AF_PACKET interface in Linux. More info about AF_XDP, please refer to [1]
> [2].
> 
> This is the vanilla version PMD which just uses a raw buffer registered as
> the umem.
> 
> [1] https://fosdem.org/2018/schedule/event/af_xdp/
> [2] https://lwn.net/Articles/745934/
> 
> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>

<...>

> diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
> new file mode 100644
> index 000000000..840c93728
> --- /dev/null
> +++ b/drivers/net/af_xdp/meson.build
> @@ -0,0 +1,19 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2019 Intel Corporation
> +
> +if host_machine.system() == 'linux'
> +	bpf_dep = dependency('libbpf', required: false)
> +	if bpf_dep.found()
> +		build = true
> +	else
> +		bpf_dep = cc.find_library('bpf', required: false)
> +		if bpf_dep.found() and cc.has_header('bpf/xsk.h', dependencies: bpf_dep) and cc.has_header('linux/if_xdp.h')
> +			build = true
> +			pkgconfig_extra_libs += '-lbpf'
> +		else
> +			build = false
> +		endif
> +	endif
> +	ext_deps += bpf_dep
> +endif
> +sources = files('rte_eth_af_xdp.c')

if system is not 'linux', by default build will be 'true', right, so will it try
to build the driver in that case?
What about setting "build = false" before the linux check, so won't need to set
it false again in the if block, only set it true if dependencies found?
And can 'ext_deps' go out of if block?

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [dpdk-dev] [PATCH v11 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-04 23:39     ` [dpdk-dev] " Ferruh Yigit
@ 2019-04-05 15:05       ` Ye Xiaolong
  2019-04-05 15:17         ` Ferruh Yigit
  2019-04-05 15:23         ` Bruce Richardson
  0 siblings, 2 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-04-05 15:05 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: dev, Stephen Hemminger, Luca Boccassi, Qi Zhang, Karlsson Magnus,
	Topel Bjorn, Maxime Coquelin, Bruce Richardson,
	Ananyev Konstantin, David Marchand, Andrew Rybchenko,
	Olivier Matz

Hi, Ferruh

On 04/05, Ferruh Yigit wrote:
>On 4/4/2019 9:51 AM, Xiaolong Ye wrote:
>> Add a new PMD driver for AF_XDP which is a proposed faster version of
>> AF_PACKET interface in Linux. More info about AF_XDP, please refer to [1]
>> [2].
>> 
>> This is the vanilla version PMD which just uses a raw buffer registered as
>> the umem.
>> 
>> [1] https://fosdem.org/2018/schedule/event/af_xdp/
>> [2] https://lwn.net/Articles/745934/
>> 
>> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
>
><...>
>
>> diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
>> new file mode 100644
>> index 000000000..840c93728
>> --- /dev/null
>> +++ b/drivers/net/af_xdp/meson.build
>> @@ -0,0 +1,19 @@
>> +# SPDX-License-Identifier: BSD-3-Clause
>> +# Copyright(c) 2019 Intel Corporation
>> +
>> +if host_machine.system() == 'linux'
>> +	bpf_dep = dependency('libbpf', required: false)
>> +	if bpf_dep.found()
>> +		build = true
>> +	else
>> +		bpf_dep = cc.find_library('bpf', required: false)
>> +		if bpf_dep.found() and cc.has_header('bpf/xsk.h', dependencies: bpf_dep) and cc.has_header('linux/if_xdp.h')
>> +			build = true
>> +			pkgconfig_extra_libs += '-lbpf'
>> +		else
>> +			build = false
>> +		endif
>> +	endif
>> +	ext_deps += bpf_dep
>> +endif
>> +sources = files('rte_eth_af_xdp.c')
>
>if system is not 'linux', by default build will be 'true', right, so will it try
>to build the driver in that case?
>What about setting "build = false" before the linux check, so won't need to set
>it false again in the if block, only set it true if dependencies found?

This is a good catch, we do need to initialize build = false first, otherwise
meson/ninja would just try to build af_xdp pmd if system is not linux, which is
undesired. Do I need to send a separate patch for this so you can squash it into
af_xdp pmd patch?

>And can 'ext_deps' go out of if block?

If we move `ext_deps += bpf_dep` out of the if block and the build system is
not linux, there would be an error: "ERROR: Unknown variable "bpf_dep".", so we
need to either initialize bpf_dep first (to a value like NULL?) or keep
`ext_deps += bpf_dep` inside the if block. I'd prefer to keep it as it is;
what's your opinion?


Thanks,
Xiaolong


^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [dpdk-dev] [PATCH v11 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-05 15:05       ` Ye Xiaolong
@ 2019-04-05 15:17         ` Ferruh Yigit
  2019-04-05 15:22           ` Ye Xiaolong
  2019-04-05 15:23         ` Bruce Richardson
  1 sibling, 1 reply; 214+ messages in thread
From: Ferruh Yigit @ 2019-04-05 15:17 UTC (permalink / raw)
  To: Ye Xiaolong
  Cc: dev, Stephen Hemminger, Luca Boccassi, Qi Zhang, Karlsson Magnus,
	Topel Bjorn, Maxime Coquelin, Bruce Richardson,
	Ananyev Konstantin, David Marchand, Andrew Rybchenko,
	Olivier Matz

On 4/5/2019 4:05 PM, Ye Xiaolong wrote:
> Hi, Ferruh
> 
> On 04/05, Ferruh Yigit wrote:
>> On 4/4/2019 9:51 AM, Xiaolong Ye wrote:
>>> Add a new PMD driver for AF_XDP which is a proposed faster version of
>>> AF_PACKET interface in Linux. More info about AF_XDP, please refer to [1]
>>> [2].
>>>
>>> This is the vanilla version PMD which just uses a raw buffer registered as
>>> the umem.
>>>
>>> [1] https://fosdem.org/2018/schedule/event/af_xdp/
>>> [2] https://lwn.net/Articles/745934/
>>>
>>> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
>>
>> <...>
>>
>>> diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
>>> new file mode 100644
>>> index 000000000..840c93728
>>> --- /dev/null
>>> +++ b/drivers/net/af_xdp/meson.build
>>> @@ -0,0 +1,19 @@
>>> +# SPDX-License-Identifier: BSD-3-Clause
>>> +# Copyright(c) 2019 Intel Corporation
>>> +
>>> +if host_machine.system() == 'linux'
>>> +	bpf_dep = dependency('libbpf', required: false)
>>> +	if bpf_dep.found()
>>> +		build = true
>>> +	else
>>> +		bpf_dep = cc.find_library('bpf', required: false)
>>> +		if bpf_dep.found() and cc.has_header('bpf/xsk.h', dependencies: bpf_dep) and cc.has_header('linux/if_xdp.h')
>>> +			build = true
>>> +			pkgconfig_extra_libs += '-lbpf'
>>> +		else
>>> +			build = false
>>> +		endif
>>> +	endif
>>> +	ext_deps += bpf_dep
>>> +endif
>>> +sources = files('rte_eth_af_xdp.c')
>>
>> if system is not 'linux', by default build will be 'true', right, so will it try
>> to build the driver in that case?
>> What about setting "build = false" before the linux check, so won't need to set
>> it false again in the if block, only set it true if dependencies found?
> 
> This is a good catch, we do need to initialize build = false first, otherwise
> meson/ninja would just try to build af_xdp pmd if system is not linux, which is
> undesired. Do I need to send a separate patch for this so you can squash it into
> af_xdp pmd patch?

If you agree with the change, I can send the patch and squash it into the
original af_xdp commit:

  diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
  index 840c93728..e3d86c39a 100644
  --- a/drivers/net/af_xdp/meson.build
  +++ b/drivers/net/af_xdp/meson.build
  @@ -1,6 +1,7 @@
   # SPDX-License-Identifier: BSD-3-Clause
   # Copyright(c) 2019 Intel Corporation

  +build = false
   if host_machine.system() == 'linux'
          bpf_dep = dependency('libbpf', required: false)
          if bpf_dep.found()
  @@ -10,8 +11,6 @@ if host_machine.system() == 'linux'
                if bpf_dep.found() and cc.has_header('bpf/xsk.h', dependencies: bpf_dep) and cc.has_header('linux/if_xdp.h')
                          build = true
                          pkgconfig_extra_libs += '-lbpf'
  -               else
  -                       build = false
                  endif
          endif
          ext_deps += bpf_dep

> 
>> And can 'ext_deps' go out of if block?
> 
> If we move `ext_deps += bpf_dep` out of if block and build system is not linux, there
> would be error "ERROR: Unknown variable "bpf_dep".", so we need either initialize
> bpf_dep (to value like NULL?) first or keep `ext_deps += bpf_dep` inside the if 
> block, I'd prefer keep it as it is, what's you opinion?

OK to keep as it is.

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [dpdk-dev] [PATCH v11 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-05 15:17         ` Ferruh Yigit
@ 2019-04-05 15:22           ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-04-05 15:22 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: dev, Stephen Hemminger, Luca Boccassi, Qi Zhang, Karlsson Magnus,
	Topel Bjorn, Maxime Coquelin, Bruce Richardson,
	Ananyev Konstantin, David Marchand, Andrew Rybchenko,
	Olivier Matz

On 04/05, Ferruh Yigit wrote:
>On 4/5/2019 4:05 PM, Ye Xiaolong wrote:
>> Hi, Ferruh
>> 
>> On 04/05, Ferruh Yigit wrote:
>>> On 4/4/2019 9:51 AM, Xiaolong Ye wrote:
>>>> Add a new PMD driver for AF_XDP which is a proposed faster version of
>>>> AF_PACKET interface in Linux. More info about AF_XDP, please refer to [1]
>>>> [2].
>>>>
>>>> This is the vanilla version PMD which just uses a raw buffer registered as
>>>> the umem.
>>>>
>>>> [1] https://fosdem.org/2018/schedule/event/af_xdp/
>>>> [2] https://lwn.net/Articles/745934/
>>>>
>>>> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
>>>
>>> <...>
>>>
>>>> diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
>>>> new file mode 100644
>>>> index 000000000..840c93728
>>>> --- /dev/null
>>>> +++ b/drivers/net/af_xdp/meson.build
>>>> @@ -0,0 +1,19 @@
>>>> +# SPDX-License-Identifier: BSD-3-Clause
>>>> +# Copyright(c) 2019 Intel Corporation
>>>> +
>>>> +if host_machine.system() == 'linux'
>>>> +	bpf_dep = dependency('libbpf', required: false)
>>>> +	if bpf_dep.found()
>>>> +		build = true
>>>> +	else
>>>> +		bpf_dep = cc.find_library('bpf', required: false)
>>>> +		if bpf_dep.found() and cc.has_header('bpf/xsk.h', dependencies: bpf_dep) and cc.has_header('linux/if_xdp.h')
>>>> +			build = true
>>>> +			pkgconfig_extra_libs += '-lbpf'
>>>> +		else
>>>> +			build = false
>>>> +		endif
>>>> +	endif
>>>> +	ext_deps += bpf_dep
>>>> +endif
>>>> +sources = files('rte_eth_af_xdp.c')
>>>
>>> if system is not 'linux', by default build will be 'true', right, so will it try
>>> to build the driver in that case?
>>> What about setting "build = false" before the linux check, so won't need to set
>>> it false again in the if block, only set it true if dependencies found?
>> 
>> This is a good catch, we do need to initialize build = false first, otherwise
>> meson/ninja would just try to build af_xdp pmd if system is not linux, which is
>> undesired. Do I need to send a separate patch for this so you can squash it into
>> af_xdp pmd patch?
>
>If you are agree with the change, I can send the patch and squash it into the
>original af_xdp commit:
>
>  diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
>  index 840c93728..e3d86c39a 100644
>  --- a/drivers/net/af_xdp/meson.build
>  +++ b/drivers/net/af_xdp/meson.build
>  @@ -1,6 +1,7 @@
>   # SPDX-License-Identifier: BSD-3-Clause
>   # Copyright(c) 2019 Intel Corporation
>
>  +build = false
>   if host_machine.system() == 'linux'
>          bpf_dep = dependency('libbpf', required: false)
>          if bpf_dep.found()
>  @@ -10,8 +11,6 @@ if host_machine.system() == 'linux'
>                  if bpf_dep.found() and cc.has_header('bpf/xsk.h',
>dependencies: bpf_dep) and cc.has_header('linux/if_xdp.h')
>                          build = true
>                          pkgconfig_extra_libs += '-lbpf'
>  -               else
>  -                       build = false
>                  endif
>          endif
>          ext_deps += bpf_dep
>

Looks good to me, thanks for the fix.

Thanks,
Xiaolong

>> 
>>> And can 'ext_deps' go out of if block?
>> 
>> If we move `ext_deps += bpf_dep` out of if block and build system is not linux, there
>> would be error "ERROR: Unknown variable "bpf_dep".", so we need either initialize
>> bpf_dep (to value like NULL?) first or keep `ext_deps += bpf_dep` inside the if 
>> block, I'd prefer keep it as it is, what's you opinion?
>
>OK to keep as it is.

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [dpdk-dev] [PATCH v11 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-05 15:05       ` Ye Xiaolong
  2019-04-05 15:17         ` Ferruh Yigit
@ 2019-04-05 15:23         ` Bruce Richardson
  2019-04-05 15:31           ` Ferruh Yigit
  1 sibling, 1 reply; 214+ messages in thread
From: Bruce Richardson @ 2019-04-05 15:23 UTC (permalink / raw)
  To: Ye Xiaolong
  Cc: Ferruh Yigit, dev, Stephen Hemminger, Luca Boccassi, Qi Zhang,
	Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Ananyev Konstantin, David Marchand, Andrew Rybchenko,
	Olivier Matz

On Fri, Apr 05, 2019 at 11:05:25PM +0800, Ye Xiaolong wrote:
> Hi, Ferruh
> 
> On 04/05, Ferruh Yigit wrote:
> >On 4/4/2019 9:51 AM, Xiaolong Ye wrote:
> >> Add a new PMD driver for AF_XDP which is a proposed faster version of
> >> AF_PACKET interface in Linux. More info about AF_XDP, please refer to [1]
> >> [2].
> >> 
> >> This is the vanilla version PMD which just uses a raw buffer registered as
> >> the umem.
> >> 
> >> [1] https://fosdem.org/2018/schedule/event/af_xdp/
> >> [2] https://lwn.net/Articles/745934/
> >> 
> >> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
> >
> ><...>
> >
> >> diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
> >> new file mode 100644
> >> index 000000000..840c93728
> >> --- /dev/null
> >> +++ b/drivers/net/af_xdp/meson.build
> >> @@ -0,0 +1,19 @@
> >> +# SPDX-License-Identifier: BSD-3-Clause
> >> +# Copyright(c) 2019 Intel Corporation
> >> +
> >> +if host_machine.system() == 'linux'
> >> +	bpf_dep = dependency('libbpf', required: false)
> >> +	if bpf_dep.found()
> >> +		build = true
> >> +	else
> >> +		bpf_dep = cc.find_library('bpf', required: false)
> >> +		if bpf_dep.found() and cc.has_header('bpf/xsk.h', dependencies: bpf_dep) and cc.has_header('linux/if_xdp.h')
> >> +			build = true
> >> +			pkgconfig_extra_libs += '-lbpf'
> >> +		else
> >> +			build = false
> >> +		endif
> >> +	endif
> >> +	ext_deps += bpf_dep
> >> +endif
> >> +sources = files('rte_eth_af_xdp.c')
> >
> >if system is not 'linux', by default build will be 'true', right, so will it try
> >to build the driver in that case?
> >What about setting "build = false" before the linux check, so won't need to set
> >it false again in the if block, only set it true if dependencies found?
> 
> This is a good catch, we do need to initialize build = false first, otherwise
> meson/ninja would just try to build af_xdp pmd if system is not linux, which is
> undesired. Do I need to send a separate patch for this so you can squash it into
> af_xdp pmd patch?
> 
> >And can 'ext_deps' go out of if block?
> 
> If we move `ext_deps += bpf_dep` out of if block and build system is not linux, there
> would be error "ERROR: Unknown variable "bpf_dep".", so we need either initialize
> bpf_dep (to value like NULL?) first or keep `ext_deps += bpf_dep` inside the if 
> block, I'd prefer keep it as it is, what's you opinion?

Actually, a suggestion - rather than limiting the build to linux, why not
just limit the build to the presence of libbpf, and not bother checking the
OS?

/Bruce

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [dpdk-dev] [PATCH v11 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-05 15:23         ` Bruce Richardson
@ 2019-04-05 15:31           ` Ferruh Yigit
  2019-04-05 15:35             ` Bruce Richardson
  0 siblings, 1 reply; 214+ messages in thread
From: Ferruh Yigit @ 2019-04-05 15:31 UTC (permalink / raw)
  To: Bruce Richardson, Ye Xiaolong
  Cc: dev, Stephen Hemminger, Luca Boccassi, Qi Zhang, Karlsson Magnus,
	Topel Bjorn, Maxime Coquelin, Ananyev Konstantin, David Marchand,
	Andrew Rybchenko, Olivier Matz

On 4/5/2019 4:23 PM, Bruce Richardson wrote:
> On Fri, Apr 05, 2019 at 11:05:25PM +0800, Ye Xiaolong wrote:
>> Hi, Ferruh
>>
>> On 04/05, Ferruh Yigit wrote:
>>> On 4/4/2019 9:51 AM, Xiaolong Ye wrote:
>>>> Add a new PMD driver for AF_XDP which is a proposed faster version of
>>>> AF_PACKET interface in Linux. More info about AF_XDP, please refer to [1]
>>>> [2].
>>>>
>>>> This is the vanilla version PMD which just uses a raw buffer registered as
>>>> the umem.
>>>>
>>>> [1] https://fosdem.org/2018/schedule/event/af_xdp/
>>>> [2] https://lwn.net/Articles/745934/
>>>>
>>>> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
>>>
>>> <...>
>>>
>>>> diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
>>>> new file mode 100644
>>>> index 000000000..840c93728
>>>> --- /dev/null
>>>> +++ b/drivers/net/af_xdp/meson.build
>>>> @@ -0,0 +1,19 @@
>>>> +# SPDX-License-Identifier: BSD-3-Clause
>>>> +# Copyright(c) 2019 Intel Corporation
>>>> +
>>>> +if host_machine.system() == 'linux'
>>>> +	bpf_dep = dependency('libbpf', required: false)
>>>> +	if bpf_dep.found()
>>>> +		build = true
>>>> +	else
>>>> +		bpf_dep = cc.find_library('bpf', required: false)
>>>> +		if bpf_dep.found() and cc.has_header('bpf/xsk.h', dependencies: bpf_dep) and cc.has_header('linux/if_xdp.h')
>>>> +			build = true
>>>> +			pkgconfig_extra_libs += '-lbpf'
>>>> +		else
>>>> +			build = false
>>>> +		endif
>>>> +	endif
>>>> +	ext_deps += bpf_dep
>>>> +endif
>>>> +sources = files('rte_eth_af_xdp.c')
>>>
>>> if system is not 'linux', by default build will be 'true', right, so will it try
>>> to build the driver in that case?
>>> What about setting "build = false" before the linux check, so won't need to set
>>> it false again in the if block, only set it true if dependencies found?
>>
>> This is a good catch, we do need to initialize build = false first, otherwise
>> meson/ninja would just try to build af_xdp pmd if system is not linux, which is
>> undesired. Do I need to send a separate patch for this so you can squash it into
>> af_xdp pmd patch?
>>
>>> And can 'ext_deps' go out of if block?
>>
>> If we move `ext_deps += bpf_dep` out of if block and build system is not linux, there
>> would be error "ERROR: Unknown variable "bpf_dep".", so we need either initialize
>> bpf_dep (to value like NULL?) first or keep `ext_deps += bpf_dep` inside the if 
>> block, I'd prefer keep it as it is, what's you opinion?
> 
> Actually, a suggestion - rather than limiting the build to linux, why not
> just limit the build to the presence of libbpf, and not bother checking the
> OS?

If this won't cause a problem for other OS in case they have the library, I have
no objection.


^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [dpdk-dev] [PATCH v11 1/1] net/af_xdp: introduce AF XDP PMD driver
  2019-04-05 15:31           ` Ferruh Yigit
@ 2019-04-05 15:35             ` Bruce Richardson
  0 siblings, 0 replies; 214+ messages in thread
From: Bruce Richardson @ 2019-04-05 15:35 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Ye Xiaolong, dev, Stephen Hemminger, Luca Boccassi, Qi Zhang,
	Karlsson Magnus, Topel Bjorn, Maxime Coquelin,
	Ananyev Konstantin, David Marchand, Andrew Rybchenko,
	Olivier Matz

On Fri, Apr 05, 2019 at 04:31:16PM +0100, Ferruh Yigit wrote:
> On 4/5/2019 4:23 PM, Bruce Richardson wrote:
> > On Fri, Apr 05, 2019 at 11:05:25PM +0800, Ye Xiaolong wrote:
> >> Hi, Ferruh
> >>
> >> On 04/05, Ferruh Yigit wrote:
> >>> On 4/4/2019 9:51 AM, Xiaolong Ye wrote:
> >>>> Add a new PMD driver for AF_XDP which is a proposed faster version of
> >>>> AF_PACKET interface in Linux. More info about AF_XDP, please refer to [1]
> >>>> [2].
> >>>>
> >>>> This is the vanilla version PMD which just uses a raw buffer registered as
> >>>> the umem.
> >>>>
> >>>> [1] https://fosdem.org/2018/schedule/event/af_xdp/
> >>>> [2] https://lwn.net/Articles/745934/
> >>>>
> >>>> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
> >>>
> >>> <...>
> >>>
> >>>> diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
> >>>> new file mode 100644
> >>>> index 000000000..840c93728
> >>>> --- /dev/null
> >>>> +++ b/drivers/net/af_xdp/meson.build
> >>>> @@ -0,0 +1,19 @@
> >>>> +# SPDX-License-Identifier: BSD-3-Clause
> >>>> +# Copyright(c) 2019 Intel Corporation
> >>>> +
> >>>> +if host_machine.system() == 'linux'
> >>>> +	bpf_dep = dependency('libbpf', required: false)
> >>>> +	if bpf_dep.found()
> >>>> +		build = true
> >>>> +	else
> >>>> +		bpf_dep = cc.find_library('bpf', required: false)
> >>>> +		if bpf_dep.found() and cc.has_header('bpf/xsk.h', dependencies: bpf_dep) and cc.has_header('linux/if_xdp.h')
> >>>> +			build = true
> >>>> +			pkgconfig_extra_libs += '-lbpf'
> >>>> +		else
> >>>> +			build = false
> >>>> +		endif
> >>>> +	endif
> >>>> +	ext_deps += bpf_dep
> >>>> +endif
> >>>> +sources = files('rte_eth_af_xdp.c')
> >>>
> >>> if system is not 'linux', by default build will be 'true', right, so will it try
> >>> to build the driver in that case?
> >>> What about setting "build = false" before the linux check, so won't need to set
> >>> it false again in the if block, only set it true if dependencies found?
> >>
> >> This is a good catch, we do need to initialize build = false first, otherwise
> >> meson/ninja would just try to build af_xdp pmd if system is not linux, which is
> >> undesired. Do I need to send a separate patch for this so you can squash it into
> >> af_xdp pmd patch?
> >>
> >>> And can 'ext_deps' go out of if block?
> >>
> >> If we move `ext_deps += bpf_dep` out of if block and build system is not linux, there
> >> would be error "ERROR: Unknown variable "bpf_dep".", so we need either initialize
> >> bpf_dep (to value like NULL?) first or keep `ext_deps += bpf_dep` inside the if 
> >> block, I'd prefer keep it as it is, what's you opinion?
> > 
> > Actually, a suggestion - rather than limiting the build to linux, why not
> > just limit the build to the presence of libbpf, and not bother checking the
> > OS?
> 
> If this won't cause a problem for other OS in case they have the library, I have
> no objection.
>
Well, they shouldn't have, and we also check for the header file
"linux/if_xdp.h", so I don't think an additional check for linux itself is
required. Generally, I think it's better to check for the requirements of a
driver directly, rather than assuming that only certain environments would
ever have them.
Just sent my suggested patch: http://patches.dpdk.org/patch/52357/

/Bruce

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [dpdk-dev] [BUG] net/af_xdp: Current code can only create one af_xdp device
  2019-04-03 15:57                       ` Ye Xiaolong
@ 2019-04-17 12:30                         ` Markus Theil
  2019-04-18  1:05                           ` Ye Xiaolong
  2019-04-18 15:20                           ` [dpdk-dev] [PATCH v1 1/2] net/af_xdp: name the buf ring dynamically Xiaolong Ye
  0 siblings, 2 replies; 214+ messages in thread
From: Markus Theil @ 2019-04-17 12:30 UTC (permalink / raw)
  To: dev

I tested the new af_xdp based device on the current master branch and
noticed that the usage of static mempool names allows the creation of
only a single af_xdp vdev. If a second vdev of the same type gets
created, the mempool allocation fails.

Best regards,
Markus Theil

^ permalink raw reply	[flat|nested] 214+ messages in thread
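
The collision Markus reports is the generic behaviour of DPDK named objects:
creating a second object with a name that is already in use fails and sets
rte_errno to EEXIST. Below is a minimal sketch of the failure mode, assuming
the static name the pre-fix driver used (standalone demo code, not part of
the patches):

   #include <stdio.h>
   #include <rte_eal.h>
   #include <rte_lcore.h>
   #include <rte_ring.h>
   #include <rte_errno.h>

   int main(int argc, char **argv)
   {
           struct rte_ring *r1, *r2;

           if (rte_eal_init(argc, argv) < 0)
                   return 1;

           /* first vdev's ring: succeeds */
           r1 = rte_ring_create("af_xdp_ring", 1024, rte_socket_id(), 0);
           /* second vdev's ring with the same static name: returns NULL
            * with rte_errno set to EEXIST, so device creation fails */
           r2 = rte_ring_create("af_xdp_ring", 1024, rte_socket_id(), 0);
           if (r1 != NULL && r2 == NULL)
                   printf("second ring failed: %s\n",
                          rte_strerror(rte_errno));
           return 0;
   }

Per-interface, per-queue names, as in the fix that follows, avoid the clash.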

* Re: [dpdk-dev] [BUG] net/af_xdp: Current code can only create one af_xdp device
  2019-04-17 12:30                         ` [dpdk-dev] [BUG] net/af_xdp: Current code can only create one af_xdp device Markus Theil
@ 2019-04-18  1:05                           ` Ye Xiaolong
  2019-04-23 16:23                             ` Markus Theil
  2019-04-18 15:20                           ` [dpdk-dev] [PATCH v1 1/2] net/af_xdp: name the buf ring dynamically Xiaolong Ye
  1 sibling, 1 reply; 214+ messages in thread
From: Ye Xiaolong @ 2019-04-18  1:05 UTC (permalink / raw)
  To: Markus Theil; +Cc: dev

Hi, Markus

On 04/17, Markus Theil wrote:
>I tested the new af_xdp based device on the current master branch and
>noticed, that the usage of static mempool names allows only for the
>creation of a single af_xdp vdev. If a second vdev of the same type gets
>created, the mempool allocation fails.

Thanks for reporting, could you paste the cmdline you used and the error log?
Are you referring to ring creation or mempool creation?


Thanks,
Xiaolong
>
>Best regards,
>Markus Theil

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [dpdk-dev] [PATCH v1 1/2] net/af_xdp: name the buf ring dynamically
  2019-04-17 12:30                         ` [dpdk-dev] [BUG] net/af_xdp: Current code can only create one af_xdp device Markus Theil
  2019-04-18  1:05                           ` Ye Xiaolong
@ 2019-04-18 15:20                           ` Xiaolong Ye
  2019-04-18 15:20                             ` [dpdk-dev] [PATCH v1 2/2] net/af_xdp: name the umem memzone dynamically Xiaolong Ye
                                               ` (2 more replies)
  1 sibling, 3 replies; 214+ messages in thread
From: Xiaolong Ye @ 2019-04-18 15:20 UTC (permalink / raw)
  To: dev, Ferruh Yigit; +Cc: Qi Zhang, Xiaolong Ye

Naming the buf_ring dynamically allows creating multiple af_xdp vdevs.

Fixes: f1debd77efaf ("net/af_xdp: introduce AF_XDP PMD")

Reported-by: Markus Theil <markus.theil@tu-ilmenau.de>
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 drivers/net/af_xdp/rte_eth_af_xdp.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 497e2cfde..d8e99204e 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -473,7 +473,7 @@ xdp_umem_destroy(struct xsk_umem_info *umem)
 }
 
 static struct
-xsk_umem_info *xdp_umem_configure(void)
+xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals)
 {
 	struct xsk_umem_info *umem;
 	const struct rte_memzone *mz;
@@ -482,6 +482,7 @@ xsk_umem_info *xdp_umem_configure(void)
 		.comp_size = ETH_AF_XDP_DFLT_NUM_DESCS,
 		.frame_size = ETH_AF_XDP_FRAME_SIZE,
 		.frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
+	char ring_name[RTE_RING_NAMESIZE];
 	int ret;
 	uint64_t i;
 
@@ -491,7 +492,9 @@ xsk_umem_info *xdp_umem_configure(void)
 		return NULL;
 	}
 
-	umem->buf_ring = rte_ring_create("af_xdp_ring",
+	ret = snprintf(ring_name, sizeof(ring_name), "af_xdp_ring_%s_%d",
+		       internals->if_name, internals->queue_idx);
+	umem->buf_ring = rte_ring_create(ring_name,
 					 ETH_AF_XDP_NUM_BUFFERS,
 					 rte_socket_id(),
 					 0x0);
@@ -541,7 +544,7 @@ xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
 	int ret = 0;
 	int reserve_size;
 
-	rxq->umem = xdp_umem_configure();
+	rxq->umem = xdp_umem_configure(internals);
 	if (rxq->umem == NULL)
 		return -ENOMEM;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 214+ messages in thread

* [dpdk-dev] [PATCH v1 2/2] net/af_xdp: name the umem memzone dynamically
  2019-04-18 15:20                           ` [dpdk-dev] [PATCH v1 1/2] net/af_xdp: name the buf ring dynamically Xiaolong Ye
@ 2019-04-18 15:20                             ` Xiaolong Ye
  2019-04-19  9:47                               ` David Marchand
  2019-04-19  9:46                             ` [dpdk-dev] [PATCH v1 1/2] net/af_xdp: name the buf ring dynamically David Marchand
  2019-04-19 12:47                             ` [dpdk-dev] [PATCH v2] net/af_xdp: fix creating multiple instance Ferruh Yigit
  2 siblings, 1 reply; 214+ messages in thread
From: Xiaolong Ye @ 2019-04-18 15:20 UTC (permalink / raw)
  To: dev, Ferruh Yigit; +Cc: Qi Zhang, Xiaolong Ye

Naming the umem memzone dynamically allows creating multiple af_xdp vdevs.

Fixes: f1debd77efaf ("net/af_xdp: introduce AF_XDP PMD")

Reported-by: Markus Theil <markus.theil@tu-ilmenau.de>
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 drivers/net/af_xdp/rte_eth_af_xdp.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index d8e99204e..666b4c17e 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -483,6 +483,7 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals)
 		.frame_size = ETH_AF_XDP_FRAME_SIZE,
 		.frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
 	char ring_name[RTE_RING_NAMESIZE];
+	char mz_name[RTE_MEMZONE_NAMESIZE];
 	int ret;
 	uint64_t i;
 
@@ -508,7 +509,9 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals)
 				 (void *)(i * ETH_AF_XDP_FRAME_SIZE +
 					  ETH_AF_XDP_DATA_HEADROOM));
 
-	mz = rte_memzone_reserve_aligned("af_xdp uemem",
+	ret = snprintf(mz_name, sizeof(mz_name), "af_xdp_umem_%s_%d",
+		       internals->if_name, internals->queue_idx);
+	mz = rte_memzone_reserve_aligned(mz_name,
 			ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
 			rte_socket_id(), RTE_MEMZONE_IOVA_CONTIG,
 			getpagesize());
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 214+ messages in thread

* Re: [dpdk-dev] [PATCH v1 1/2] net/af_xdp: name the buf ring dynamically
  2019-04-18 15:20                           ` [dpdk-dev] [PATCH v1 1/2] net/af_xdp: name the buf ring dynamically Xiaolong Ye
  2019-04-18 15:20                             ` [dpdk-dev] [PATCH v1 2/2] net/af_xdp: name the umem memzone dynamically Xiaolong Ye
@ 2019-04-19  9:46                             ` David Marchand
  2019-04-19 12:47                             ` [dpdk-dev] [PATCH v2] net/af_xdp: fix creating multiple instance Ferruh Yigit
  2 siblings, 0 replies; 214+ messages in thread
From: David Marchand @ 2019-04-19  9:46 UTC (permalink / raw)
  To: Xiaolong Ye; +Cc: dev, Ferruh Yigit, Qi Zhang

On Thu, Apr 18, 2019 at 5:27 PM Xiaolong Ye <xiaolong.ye@intel.com> wrote:

> Naming the buf_ring dynamically allows to create multiple af_xdp vdevs.
>
> Fixes: f1debd77efaf ("net/af_xdp: introduce AF_XDP PMD")
>
> Reported-by: Markus Theil <markus.theil@tu-ilmenau.de>
> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 497e2cfde..d8e99204e 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -473,7 +473,7 @@ xdp_umem_destroy(struct xsk_umem_info *umem)
>  }
>
>  static struct
> -xsk_umem_info *xdp_umem_configure(void)
> +xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals)
>  {
>         struct xsk_umem_info *umem;
>         const struct rte_memzone *mz;
> @@ -482,6 +482,7 @@ xsk_umem_info *xdp_umem_configure(void)
>                 .comp_size = ETH_AF_XDP_DFLT_NUM_DESCS,
>                 .frame_size = ETH_AF_XDP_FRAME_SIZE,
>                 .frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
> +       char ring_name[RTE_RING_NAMESIZE];
>         int ret;
>         uint64_t i;
>
> @@ -491,7 +492,9 @@ xsk_umem_info *xdp_umem_configure(void)
>                 return NULL;
>         }
>
> -       umem->buf_ring = rte_ring_create("af_xdp_ring",
> +       ret = snprintf(ring_name, sizeof(ring_name), "af_xdp_ring_%s_%d",
> +                      internals->if_name, internals->queue_idx);
>

You can drop the ret assignment since you won't check it anyway.
And queue_idx is unsigned, so %d -> %u?

> +       umem->buf_ring = rte_ring_create(ring_name,
>                                          ETH_AF_XDP_NUM_BUFFERS,
>                                          rte_socket_id(),
>                                          0x0);
> @@ -541,7 +544,7 @@ xsk_configure(struct pmd_internals *internals, struct
> pkt_rx_queue *rxq,
>         int ret = 0;
>         int reserve_size;
>
> -       rxq->umem = xdp_umem_configure();
> +       rxq->umem = xdp_umem_configure(internals);
>         if (rxq->umem == NULL)
>                 return -ENOMEM;
>
> --
> 2.17.1
>
>
Reviewed-by: David Marchand <david.marchand@redhat.com>


-- 
David Marchand

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [dpdk-dev] [PATCH v1 2/2] net/af_xdp: name the umem memzone dynamically
  2019-04-18 15:20                             ` [dpdk-dev] [PATCH v1 2/2] net/af_xdp: name the umem memzone dynamically Xiaolong Ye
@ 2019-04-19  9:47                               ` David Marchand
  2019-04-19 12:33                                 ` Ferruh Yigit
  0 siblings, 1 reply; 214+ messages in thread
From: David Marchand @ 2019-04-19  9:47 UTC (permalink / raw)
  To: Xiaolong Ye; +Cc: dev, Ferruh Yigit, Qi Zhang

On Thu, Apr 18, 2019 at 5:27 PM Xiaolong Ye <xiaolong.ye@intel.com> wrote:

> Naming the umem memzone dynamically allows to create multiple af_xdp vdevs.
>
> Fixes: f1debd77efaf ("net/af_xdp: introduce AF_XDP PMD")
>
> Reported-by: Markus Theil <markus.theil@tu-ilmenau.de>
> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index d8e99204e..666b4c17e 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -483,6 +483,7 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals
> *internals)
>                 .frame_size = ETH_AF_XDP_FRAME_SIZE,
>                 .frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
>         char ring_name[RTE_RING_NAMESIZE];
> +       char mz_name[RTE_MEMZONE_NAMESIZE];
>         int ret;
>         uint64_t i;
>
> @@ -508,7 +509,9 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals
> *internals)
>                                  (void *)(i * ETH_AF_XDP_FRAME_SIZE +
>                                           ETH_AF_XDP_DATA_HEADROOM));
>
> -       mz = rte_memzone_reserve_aligned("af_xdp uemem",
> +       ret = snprintf(mz_name, sizeof(mz_name), "af_xdp_umem_%s_%d",
> +                      internals->if_name, internals->queue_idx);
>

Same remarks as on the previous patch.

> +       mz = rte_memzone_reserve_aligned(mz_name,
>                         ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
>                         rte_socket_id(), RTE_MEMZONE_IOVA_CONTIG,
>                         getpagesize());
> --
> 2.17.1
>
>
How about squashing those two patches into a single one?
The issue is that you can't create multiple devices; having only the first
patch still leaves the issue.


Reviewed-by: David Marchand <david.marchand@redhat.com>

-- 
David Marchand

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [dpdk-dev] [PATCH v1 2/2] net/af_xdp: name the umem memzone dynamically
  2019-04-19  9:47                               ` David Marchand
@ 2019-04-19 12:33                                 ` Ferruh Yigit
  2019-04-19 15:05                                   ` Ye Xiaolong
  0 siblings, 1 reply; 214+ messages in thread
From: Ferruh Yigit @ 2019-04-19 12:33 UTC (permalink / raw)
  To: David Marchand, Xiaolong Ye; +Cc: dev, Qi Zhang

On 4/19/2019 10:47 AM, David Marchand wrote:
> On Thu, Apr 18, 2019 at 5:27 PM Xiaolong Ye <xiaolong.ye@intel.com> wrote:
> 
>> Naming the umem memzone dynamically allows to create multiple af_xdp vdevs.
>>
>> Fixes: f1debd77efaf ("net/af_xdp: introduce AF_XDP PMD")
>>
>> Reported-by: Markus Theil <markus.theil@tu-ilmenau.de>
>> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
>> ---
>>  drivers/net/af_xdp/rte_eth_af_xdp.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
>> b/drivers/net/af_xdp/rte_eth_af_xdp.c
>> index d8e99204e..666b4c17e 100644
>> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
>> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
>> @@ -483,6 +483,7 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals
>> *internals)
>>                 .frame_size = ETH_AF_XDP_FRAME_SIZE,
>>                 .frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
>>         char ring_name[RTE_RING_NAMESIZE];
>> +       char mz_name[RTE_MEMZONE_NAMESIZE];
>>         int ret;
>>         uint64_t i;
>>
>> @@ -508,7 +509,9 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals
>> *internals)
>>                                  (void *)(i * ETH_AF_XDP_FRAME_SIZE +
>>                                           ETH_AF_XDP_DATA_HEADROOM));
>>
>> -       mz = rte_memzone_reserve_aligned("af_xdp uemem",
>> +       ret = snprintf(mz_name, sizeof(mz_name), "af_xdp_umem_%s_%d",
>> +                      internals->if_name, internals->queue_idx);
>>
> 
> Idem previous patch.
> 
> +       mz = rte_memzone_reserve_aligned(mz_name,
>>                         ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
>>                         rte_socket_id(), RTE_MEMZONE_IOVA_CONTIG,
>>                         getpagesize());
>> --
>> 2.17.1
>>
>>
> How about squashing those two patches as a single one ?
> The issue is that you can't create multiple devices. Having the first one
> still leaves the issue.
> 

+1 to squash. Let me make a new version applying the minor fixes you pointed
out in the other patch, squashing both and keeping your review tag.

> 
> Reviewed-by: David Marchand <david.marchand@redhat.com>
> 


^ permalink raw reply	[flat|nested] 214+ messages in thread

* [dpdk-dev] [PATCH v2] net/af_xdp: fix creating multiple instance
  2019-04-18 15:20                           ` [dpdk-dev] [PATCH v1 1/2] net/af_xdp: name the buf ring dynamically Xiaolong Ye
  2019-04-18 15:20                             ` [dpdk-dev] [PATCH v1 2/2] net/af_xdp: name the umem memzone dynamically Xiaolong Ye
  2019-04-19  9:46                             ` [dpdk-dev] [PATCH v1 1/2] net/af_xdp: name the buf ring dynamically David Marchand
@ 2019-04-19 12:47                             ` Ferruh Yigit
  2019-04-19 12:51                               ` Ferruh Yigit
  2 siblings, 1 reply; 214+ messages in thread
From: Ferruh Yigit @ 2019-04-19 12:47 UTC (permalink / raw)
  To: Xiaolong Ye, Qi Zhang; +Cc: dev, Markus Theil, David Marchand

From: Xiaolong Ye <xiaolong.ye@intel.com>

Naming the buf_ring and umem memzone dynamically allows
creating multiple af_xdp vdevs.

Fixes: f1debd77efaf ("net/af_xdp: introduce AF_XDP PMD")

Reported-by: Markus Theil <markus.theil@tu-ilmenau.de>
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
---
v2:
* squashed buf_ring & memzone patches
* removed unused return value
* changed format specifier to unsigned
---
 drivers/net/af_xdp/rte_eth_af_xdp.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 497e2cfde..acf9ad605 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -473,7 +473,7 @@ xdp_umem_destroy(struct xsk_umem_info *umem)
 }
 
 static struct
-xsk_umem_info *xdp_umem_configure(void)
+xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals)
 {
 	struct xsk_umem_info *umem;
 	const struct rte_memzone *mz;
@@ -482,6 +482,8 @@ xsk_umem_info *xdp_umem_configure(void)
 		.comp_size = ETH_AF_XDP_DFLT_NUM_DESCS,
 		.frame_size = ETH_AF_XDP_FRAME_SIZE,
 		.frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
+	char ring_name[RTE_RING_NAMESIZE];
+	char mz_name[RTE_MEMZONE_NAMESIZE];
 	int ret;
 	uint64_t i;
 
@@ -491,7 +493,9 @@ xsk_umem_info *xdp_umem_configure(void)
 		return NULL;
 	}
 
-	umem->buf_ring = rte_ring_create("af_xdp_ring",
+	snprintf(ring_name, sizeof(ring_name), "af_xdp_ring_%s_%u",
+		       internals->if_name, internals->queue_idx);
+	umem->buf_ring = rte_ring_create(ring_name,
 					 ETH_AF_XDP_NUM_BUFFERS,
 					 rte_socket_id(),
 					 0x0);
@@ -505,7 +509,9 @@ xsk_umem_info *xdp_umem_configure(void)
 				 (void *)(i * ETH_AF_XDP_FRAME_SIZE +
 					  ETH_AF_XDP_DATA_HEADROOM));
 
-	mz = rte_memzone_reserve_aligned("af_xdp uemem",
+	snprintf(mz_name, sizeof(mz_name), "af_xdp_umem_%s_%u",
+		       internals->if_name, internals->queue_idx);
+	mz = rte_memzone_reserve_aligned(mz_name,
 			ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
 			rte_socket_id(), RTE_MEMZONE_IOVA_CONTIG,
 			getpagesize());
@@ -541,7 +547,7 @@ xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
 	int ret = 0;
 	int reserve_size;
 
-	rxq->umem = xdp_umem_configure();
+	rxq->umem = xdp_umem_configure(internals);
 	if (rxq->umem == NULL)
 		return -ENOMEM;
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 214+ messages in thread

* Re: [dpdk-dev] [PATCH v2] net/af_xdp: fix creating multiple instance
  2019-04-19 12:47                             ` [dpdk-dev] [PATCH v2] net/af_xdp: fix creating multiple instance Ferruh Yigit
@ 2019-04-19 12:51                               ` Ferruh Yigit
  0 siblings, 0 replies; 214+ messages in thread
From: Ferruh Yigit @ 2019-04-19 12:51 UTC (permalink / raw)
  To: Xiaolong Ye, Qi Zhang; +Cc: dev, Markus Theil, David Marchand

On 4/19/2019 1:47 PM, Ferruh Yigit wrote:
> From: Xiaolong Ye <xiaolong.ye@intel.com>
> 
> Naming the buf_ring and umem memzone dynamically allows
> to create multiple af_xdp vdevs.
> 
> Fixes: f1debd77efaf ("net/af_xdp: introduce AF_XDP PMD")
> 
> Reported-by: Markus Theil <markus.theil@tu-ilmenau.de>
> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
> Reviewed-by: David Marchand <david.marchand@redhat.com>

Applied to dpdk-next-net/master, thanks.

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [dpdk-dev] [PATCH v1 2/2] net/af_xdp: name the umem memzone dynamically
  2019-04-19 12:33                                 ` Ferruh Yigit
@ 2019-04-19 15:05                                   ` Ye Xiaolong
  0 siblings, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-04-19 15:05 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: David Marchand, dev, Qi Zhang

On 04/19, Ferruh Yigit wrote:
>On 4/19/2019 10:47 AM, David Marchand wrote:
>> On Thu, Apr 18, 2019 at 5:27 PM Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>> 
>>> Naming the umem memzone dynamically allows to create multiple af_xdp vdevs.
>>>
>>> Fixes: f1debd77efaf ("net/af_xdp: introduce AF_XDP PMD")
>>>
>>> Reported-by: Markus Theil <markus.theil@tu-ilmenau.de>
>>> Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
>>> ---
>>>  drivers/net/af_xdp/rte_eth_af_xdp.c | 5 ++++-
>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
>>> b/drivers/net/af_xdp/rte_eth_af_xdp.c
>>> index d8e99204e..666b4c17e 100644
>>> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
>>> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
>>> @@ -483,6 +483,7 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals
>>> *internals)
>>>                 .frame_size = ETH_AF_XDP_FRAME_SIZE,
>>>                 .frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
>>>         char ring_name[RTE_RING_NAMESIZE];
>>> +       char mz_name[RTE_MEMZONE_NAMESIZE];
>>>         int ret;
>>>         uint64_t i;
>>>
>>> @@ -508,7 +509,9 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals
>>> *internals)
>>>                                  (void *)(i * ETH_AF_XDP_FRAME_SIZE +
>>>                                           ETH_AF_XDP_DATA_HEADROOM));
>>>
>>> -       mz = rte_memzone_reserve_aligned("af_xdp uemem",
>>> +       ret = snprintf(mz_name, sizeof(mz_name), "af_xdp_umem_%s_%d",
>>> +                      internals->if_name, internals->queue_idx);
>>>
>> 
>> Idem previous patch.
>> 
>> +       mz = rte_memzone_reserve_aligned(mz_name,
>>>                         ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
>>>                         rte_socket_id(), RTE_MEMZONE_IOVA_CONTIG,
>>>                         getpagesize());
>>> --
>>> 2.17.1
>>>
>>>
>> How about squashing those two patches as a single one ?
>> The issue is that you can't create multiple devices. Having the first one
>> still leaves the issue.
>> 
>
>+1 to squash. let me make a new version applying minor issues you pointed in
>other patch, squashing both and keeping your review tag.

Thanks for doing the new patch.

>
>> 
>> Reviewed-by: David Marchand <david.marchand@redhat.com>
>> 
>

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [dpdk-dev] [BUG] net/af_xdp: Current code can only create one af_xdp device
  2019-04-18  1:05                           ` Ye Xiaolong
@ 2019-04-23 16:23                             ` Markus Theil
  2019-04-24  6:35                               ` Ye Xiaolong
  0 siblings, 1 reply; 214+ messages in thread
From: Markus Theil @ 2019-04-23 16:23 UTC (permalink / raw)
  To: Ye Xiaolong; +Cc: dev

Hi Xiaolong,

I tested your commit "net/af_xdp: fix creating multiple instance" on the
current master branch. It does not work for me in the following minimal
test setting:

1) allocate 2x 1GB huge pages for DPDK

2) ip link add p1 type veth peer name p2

3) ./dpdk-testpmd --vdev=net_af_xdp0,iface=p1
--vdev=net_af_xdp1,iface=p2 (I also tested this with two igb devices,
with the same errors)

I'm using Linux 5.1-rc6 and an up-to-date libbpf. The setup works for
the first device and fails for the second device when creating bpf maps
in libbpf ("qidconf_map" or "xsks_map"). It seems that these maps also
need unique names and cannot exist twice under the same name.
Furthermore, if running step 3 again after it failed for the first time,
xdp vdev allocation already fails for the first xdp vdev and does not
reach the second one. Please let me know if you need some program output
or more information from me.

Best regards,
Markus


On 4/18/19 3:05 AM, Ye Xiaolong wrote:
> Hi, Markus
>
> On 04/17, Markus Theil wrote:
>> I tested the new af_xdp based device on the current master branch and
>> noticed, that the usage of static mempool names allows only for the
>> creation of a single af_xdp vdev. If a second vdev of the same type gets
>> created, the mempool allocation fails.
> Thanks for reporting, could you paste the cmdline you used and the error log?
> Are you referring to ring creation or mempool creation?
>
>
> Thanks,
> Xiaolong
>> Best regards,
>> Markus Theil


^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [dpdk-dev] [BUG] net/af_xdp: Current code can only create one af_xdp device
  2019-04-23 16:23                             ` Markus Theil
@ 2019-04-24  6:35                               ` Ye Xiaolong
  2019-04-24  9:21                                 ` Markus Theil
  0 siblings, 1 reply; 214+ messages in thread
From: Ye Xiaolong @ 2019-04-24  6:35 UTC (permalink / raw)
  To: Markus Theil; +Cc: dev

Hi, Markus

On 04/23, Markus Theil wrote:
>Hi Xiaolong,
>
>I tested your commit "net/af_xdp: fix creating multiple instance" on the
>current master branch. It does not work for me in the following minimal
>test setting:
>
>1) allocate 2x 1GB huge pages for DPDK
>
>2) ip link add p1 type veth peer name p2
>
>3) ./dpdk-testpmd --vdev=net_af_xdp0,iface=p1
>--vdev=net_af_xdp1,iface=p2 (I also tested this with two igb devices,
>with the same errors)

I've tested 19.05-rc2, started testpmd with 2 af_xdp vdev (with two i40e devices),
and it works for me.

$ ./x86_64-native-linuxapp-gcc/app/testpmd -l 5,6 -n 4 --log-level=pmd.net.af_xdp:info -b 82:00.1 --no-pci --vdev net_af_xdp0,iface=ens786f1 --vdev net_af_xdp1,iface=ens786f0
EAL: Detected 88 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Probing VFIO support...
rte_pmd_af_xdp_probe(): Initializing pmd_af_xdp for net_af_xdp0
rte_pmd_af_xdp_probe(): Initializing pmd_af_xdp for net_af_xdp1
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=155456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Configuring Port 0 (socket 0)
Port 0: 3C:FD:FE:C5:E2:41
Configuring Port 1 (socket 0)
Port 1: 3C:FD:FE:C5:E2:40
Checking link statuses...
Done
No commandline core given, start packet forwarding
io packet forwarding - ports=2 - cores=1 - streams=2 - NUMA support enabled, MP allocation mode: native
Logical Core 6 (socket 0) forwards packets on 2 streams:
  RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01
  RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00

  io packet forwarding packets/burst=32
  nb forwarding cores=1 - nb forwarding ports=2
  port 0: RX queue number: 1 Tx queue number: 1
    Rx offloads=0x0 Tx offloads=0x0
    RX queue: 0
      RX desc=0 - RX free threshold=0
      RX threshold registers: pthresh=0 hthresh=0  wthresh=0
      RX Offloads=0x0
    TX queue: 0
      TX desc=0 - TX free threshold=0
      TX threshold registers: pthresh=0 hthresh=0  wthresh=0
      TX offloads=0x0 - TX RS bit threshold=0
  port 1: RX queue number: 1 Tx queue number: 1
    Rx offloads=0x0 Tx offloads=0x0
    RX queue: 0
      RX desc=0 - RX free threshold=0
      RX threshold registers: pthresh=0 hthresh=0  wthresh=0
      RX Offloads=0x0
    TX queue: 0
      TX desc=0 - TX free threshold=0
      TX threshold registers: pthresh=0 hthresh=0  wthresh=0
      TX offloads=0x0 - TX RS bit threshold=0
Press enter to exit

Could you paste your whole failure log here?
>
>I'm using Linux 5.1-rc6 and an up to date libbpf. The setup works for
>the first device and fails for the second device when creating bpf maps
>in libbpf ("qidconf_map" or "xsks_map"). It seems, that these maps also
>need unique names and cannot exist twice under the same name.

So far as I know, there should not be such a constraint; the bpf map
creations are wrapped in libbpf.

>Furthermore if running step 3 again after it failed for the first time,
>xdp vdev allocation already fails for the first xdp vdev and does not
>reach the second one. Please let me know if you need some program output
>or more information from me.
>
>Best regards,
>Markus
>

Thanks,
Xiaolong

>
>On 4/18/19 3:05 AM, Ye Xiaolong wrote:
>> Hi, Markus
>>
>> On 04/17, Markus Theil wrote:
>>> I tested the new af_xdp based device on the current master branch and
>>> noticed, that the usage of static mempool names allows only for the
>>> creation of a single af_xdp vdev. If a second vdev of the same type gets
>>> created, the mempool allocation fails.
>> Thanks for reporting, could you paste the cmdline you used and the error log?
>> Are you referring to ring creation or mempool creation?
>>
>>
>> Thanks,
>> Xiaolong
>>> Best regards,
>>> Markus Theil
>

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [dpdk-dev] [BUG] net/af_xdp: Current code can only create one af_xdp device
  2019-04-24  6:35                               ` Ye Xiaolong
@ 2019-04-24  9:21                                 ` Markus Theil
  2019-04-24 14:47                                   ` Ye Xiaolong
  2019-04-25  5:43                                   ` Ye Xiaolong
  0 siblings, 2 replies; 214+ messages in thread
From: Markus Theil @ 2019-04-24  9:21 UTC (permalink / raw)
  To: Ye Xiaolong; +Cc: dev

Hi Xiaolong,

I also tested with i40e devices, with the same result.

./dpdk-testpmd -n 4 --log-level=pmd.net.af_xdp:debug --no-pci --vdev
net_af_xdp0,iface=enp36s0f0 --vdev net_af_xdp1,iface=enp36s0f1
EAL: Detected 16 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: No free hugepages reported in hugepages-2048kB
EAL: No available hugepages reported in hugepages-2048kB
EAL: Probing VFIO support...
rte_pmd_af_xdp_probe(): Initializing pmd_af_xdp for net_af_xdp0
rte_pmd_af_xdp_probe(): Initializing pmd_af_xdp for net_af_xdp1
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=267456,
size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Configuring Port 0 (socket 0)
Port 0: 3C:FD:FE:A3:E7:30
Configuring Port 1 (socket 0)
xsk_configure(): Failed to create xsk socket. (-1)
eth_rx_queue_setup(): Failed to configure xdp socket
Fail to configure port 1 rx queues
EAL: Error - exiting with code: 1
  Cause: Start ports failed

If I execute the same call again, I get error -16 already on the first port:

./dpdk-testpmd -n 4 --log-level=pmd.net.af_xdp:debug --no-pci --vdev
net_af_xdp0,iface=enp36s0f0 --vdev net_af_xdp1,iface=enp36s0f1
EAL: Detected 16 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: No free hugepages reported in hugepages-2048kB
EAL: No available hugepages reported in hugepages-2048kB
EAL: Probing VFIO support...
rte_pmd_af_xdp_probe(): Initializing pmd_af_xdp for net_af_xdp0
rte_pmd_af_xdp_probe(): Initializing pmd_af_xdp for net_af_xdp1
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=267456,
size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Configuring Port 0 (socket 0)
xsk_configure(): Failed to create xsk socket. (-16)
eth_rx_queue_setup(): Failed to configure xdp socket
Fail to configure port 0 rx queues
EAL: Error - exiting with code: 1
  Cause: Start ports failed

Software versions/commits/infos:

- Linux 5.1-rc6
- DPDK 7f251bcf22c5729792f9243480af1b3c072876a5 (19.05-rc2)
- libbpf from https://github.com/libbpf/libbpf
(910c475f09e5c269f441d7496c27dace30dc2335)
- DPDK and libbpf built with meson

Best regards,
Markus

On 4/24/19 8:35 AM, Ye Xiaolong wrote:
> Hi, Markus
>
> On 04/23, Markus Theil wrote:
>> Hi Xiaolong,
>>
>> I tested your commit "net/af_xdp: fix creating multiple instance" on the
>> current master branch. It does not work for me in the following minimal
>> test setting:
>>
>> 1) allocate 2x 1GB huge pages for DPDK
>>
>> 2) ip link add p1 type veth peer name p2
>>
>> 3) ./dpdk-testpmd --vdev=net_af_xdp0,iface=p1
>> --vdev=net_af_xdp1,iface=p2 (I also tested this with two igb devices,
>> with the same errors)
> I've tested 19.05-rc2, started testpmd with 2 af_xdp vdevs (with two i40e devices),
> and it works for me.
>
> $ ./x86_64-native-linuxapp-gcc/app/testpmd -l 5,6 -n 4 --log-level=pmd.net.af_xdp:info -b 82:00.1 --no-pci --vdev net_af_xdp0,iface=ens786f1 --vdev net_af_xdp1,iface=ens786f0
> EAL: Detected 88 lcore(s)
> EAL: Detected 2 NUMA nodes
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> EAL: Probing VFIO support...
> rte_pmd_af_xdp_probe(): Initializing pmd_af_xdp for net_af_xdp0
> rte_pmd_af_xdp_probe(): Initializing pmd_af_xdp for net_af_xdp1
> testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=155456, size=2176, socket=0
> testpmd: preferred mempool ops selected: ring_mp_mc
> Configuring Port 0 (socket 0)
> Port 0: 3C:FD:FE:C5:E2:41
> Configuring Port 1 (socket 0)
> Port 1: 3C:FD:FE:C5:E2:40
> Checking link statuses...
> Done
> No commandline core given, start packet forwarding
> io packet forwarding - ports=2 - cores=1 - streams=2 - NUMA support enabled, MP allocation mode: native
> Logical Core 6 (socket 0) forwards packets on 2 streams:
>   RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01
>   RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
>
>   io packet forwarding packets/burst=32
>   nb forwarding cores=1 - nb forwarding ports=2
>   port 0: RX queue number: 1 Tx queue number: 1
>     Rx offloads=0x0 Tx offloads=0x0
>     RX queue: 0
>       RX desc=0 - RX free threshold=0
>       RX threshold registers: pthresh=0 hthresh=0  wthresh=0
>       RX Offloads=0x0
>     TX queue: 0
>       TX desc=0 - TX free threshold=0
>       TX threshold registers: pthresh=0 hthresh=0  wthresh=0
>       TX offloads=0x0 - TX RS bit threshold=0
>   port 1: RX queue number: 1 Tx queue number: 1
>     Rx offloads=0x0 Tx offloads=0x0
>     RX queue: 0
>       RX desc=0 - RX free threshold=0
>       RX threshold registers: pthresh=0 hthresh=0  wthresh=0
>       RX Offloads=0x0
>     TX queue: 0
>       TX desc=0 - TX free threshold=0
>       TX threshold registers: pthresh=0 hthresh=0  wthresh=0
>       TX offloads=0x0 - TX RS bit threshold=0
> Press enter to exit
>
> Could you paste your whole failure log here?
>> I'm using Linux 5.1-rc6 and an up-to-date libbpf. The setup works for
>> the first device and fails for the second device when creating bpf maps
>> in libbpf ("qidconf_map" or "xsks_map"). It seems that these maps also
>> need unique names and cannot exist twice under the same name.
> So far as I know, there should be no such constraint; the BPF map
> creation is wrapped inside libbpf.
>
>> Furthermore, if I run step 3 again after it failed the first time,
>> xdp vdev allocation already fails for the first xdp vdev and never
>> reaches the second one. Please let me know if you need some program
>> output or more information from me.
>>
>> Best regards,
>> Markus
>>
> Thanks,
> Xiaolong
>
>> On 4/18/19 3:05 AM, Ye Xiaolong wrote:
>>> Hi, Markus
>>>
>>> On 04/17, Markus Theil wrote:
>>>> I tested the new af_xdp based device on the current master branch and
>>>> noticed that the use of static mempool names only allows the creation
>>>> of a single af_xdp vdev. If a second vdev of the same type gets
>>>> created, the mempool allocation fails.
>>> Thanks for reporting; could you paste the cmdline you used and the error log?
>>> Are you referring to ring creation or mempool creation?
>>>
>>>
>>> Thanks,
>>> Xiaolong
>>>> Best regards,
>>>> Markus Theil


* Re: [dpdk-dev] [BUG] net/af_xdp: Current code can only create one af_xdp device
  2019-04-24  9:21                                 ` Markus Theil
@ 2019-04-24 14:47                                   ` Ye Xiaolong
  2019-04-24 20:33                                     ` Markus Theil
  2019-04-25  5:43                                   ` Ye Xiaolong
  1 sibling, 1 reply; 214+ messages in thread
From: Ye Xiaolong @ 2019-04-24 14:47 UTC (permalink / raw)
  To: Markus Theil; +Cc: dev

Hi, Markus

On 04/24, Markus Theil wrote:
>Hi Xiaolong,
>
>I also tested with i40e devices, with the same result.
>
>./dpdk-testpmd -n 4 --log-level=pmd.net.af_xdp:debug --no-pci --vdev
>net_af_xdp0,iface=enp36s0f0 --vdev net_af_xdp1,iface=enp36s0f1
>EAL: Detected 16 lcore(s)
>EAL: Detected 1 NUMA nodes
>EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
>EAL: No free hugepages reported in hugepages-2048kB
>EAL: No available hugepages reported in hugepages-2048kB
>EAL: Probing VFIO support...
>rte_pmd_af_xdp_probe(): Initializing pmd_af_xdp for net_af_xdp0
>rte_pmd_af_xdp_probe(): Initializing pmd_af_xdp for net_af_xdp1
>testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=267456,
>size=2176, socket=0
>testpmd: preferred mempool ops selected: ring_mp_mc
>Configuring Port 0 (socket 0)
>Port 0: 3C:FD:FE:A3:E7:30
>Configuring Port 1 (socket 0)
>xsk_configure(): Failed to create xsk socket. (-1)
>eth_rx_queue_setup(): Failed to configure xdp socket
>Fail to configure port 1 rx queues
>EAL: Error - exiting with code: 1
>  Cause: Start ports failed
>

Does a single vdev instance work on your side? And have you brought the
interfaces up? xsk_configure() requires the interface to be in the up state.
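For example, to bring both interfaces up before starting testpmd
(interface names taken from your log):

   ip link set dev enp36s0f0 up
   ip link set dev enp36s0f1 up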

Thanks,
Xiaolong


>If I execute the same call again, I get error -16 already on the first port:
>
>./dpdk-testpmd -n 4 --log-level=pmd.net.af_xdp:debug --no-pci --vdev
>net_af_xdp0,iface=enp36s0f0 --vdev net_af_xdp1,iface=enp36s0f1
>EAL: Detected 16 lcore(s)
>EAL: Detected 1 NUMA nodes
>EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
>EAL: No free hugepages reported in hugepages-2048kB
>EAL: No available hugepages reported in hugepages-2048kB
>EAL: Probing VFIO support...
>rte_pmd_af_xdp_probe(): Initializing pmd_af_xdp for net_af_xdp0
>rte_pmd_af_xdp_probe(): Initializing pmd_af_xdp for net_af_xdp1
>testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=267456,
>size=2176, socket=0
>testpmd: preferred mempool ops selected: ring_mp_mc
>Configuring Port 0 (socket 0)
>xsk_configure(): Failed to create xsk socket. (-16)
>eth_rx_queue_setup(): Failed to configure xdp socket
>Fail to configure port 0 rx queues
>EAL: Error - exiting with code: 1
>  Cause: Start ports failed
>
>Software versions/commits/infos:
>
>- Linux 5.1-rc6
>- DPDK 7f251bcf22c5729792f9243480af1b3c072876a5 (19.05-rc2)
>- libbpf from https://github.com/libbpf/libbpf
>(910c475f09e5c269f441d7496c27dace30dc2335)
>- DPDK and libbpf built with meson
>
>Best regards,
>Markus
>
>On 4/24/19 8:35 AM, Ye Xiaolong wrote:
>> Hi, Markus
>>
>> On 04/23, Markus Theil wrote:
>>> Hi Xiaolong,
>>>
>>> I tested your commit "net/af_xdp: fix creating multiple instance" on the
>>> current master branch. It does not work for me in the following minimal
>>> test setting:
>>>
>>> 1) allocate 2x 1GB huge pages for DPDK
>>>
>>> 2) ip link add p1 type veth peer name p2
>>>
>>> 3) ./dpdk-testpmd --vdev=net_af_xdp0,iface=p1
>>> --vdev=net_af_xdp1,iface=p2 (I also tested this with two igb devices,
>>> with the same errors)
>> I've tested 19.05-rc2, started testpmd with 2 af_xdp vdevs (with two i40e devices),
>> and it works for me.
>>
>> $ ./x86_64-native-linuxapp-gcc/app/testpmd -l 5,6 -n 4 --log-level=pmd.net.af_xdp:info -b 82:00.1 --no-pci --vdev net_af_xdp0,iface=ens786f1 --vdev net_af_xdp1,iface=ens786f0
>> EAL: Detected 88 lcore(s)
>> EAL: Detected 2 NUMA nodes
>> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
>> EAL: Probing VFIO support...
>> rte_pmd_af_xdp_probe(): Initializing pmd_af_xdp for net_af_xdp0
>> rte_pmd_af_xdp_probe(): Initializing pmd_af_xdp for net_af_xdp1
>> testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=155456, size=2176, socket=0
>> testpmd: preferred mempool ops selected: ring_mp_mc
>> Configuring Port 0 (socket 0)
>> Port 0: 3C:FD:FE:C5:E2:41
>> Configuring Port 1 (socket 0)
>> Port 1: 3C:FD:FE:C5:E2:40
>> Checking link statuses...
>> Done
>> No commandline core given, start packet forwarding
>> io packet forwarding - ports=2 - cores=1 - streams=2 - NUMA support enabled, MP allocation mode: native
>> Logical Core 6 (socket 0) forwards packets on 2 streams:
>>   RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01
>>   RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
>>
>>   io packet forwarding packets/burst=32
>>   nb forwarding cores=1 - nb forwarding ports=2
>>   port 0: RX queue number: 1 Tx queue number: 1
>>     Rx offloads=0x0 Tx offloads=0x0
>>     RX queue: 0
>>       RX desc=0 - RX free threshold=0
>>       RX threshold registers: pthresh=0 hthresh=0  wthresh=0
>>       RX Offloads=0x0
>>     TX queue: 0
>>       TX desc=0 - TX free threshold=0
>>       TX threshold registers: pthresh=0 hthresh=0  wthresh=0
>>       TX offloads=0x0 - TX RS bit threshold=0
>>   port 1: RX queue number: 1 Tx queue number: 1
>>     Rx offloads=0x0 Tx offloads=0x0
>>     RX queue: 0
>>       RX desc=0 - RX free threshold=0
>>       RX threshold registers: pthresh=0 hthresh=0  wthresh=0
>>       RX Offloads=0x0
>>     TX queue: 0
>>       TX desc=0 - TX free threshold=0
>>       TX threshold registers: pthresh=0 hthresh=0  wthresh=0
>>       TX offloads=0x0 - TX RS bit threshold=0
>> Press enter to exit
>>
>> Could you paste your whole failure log here?
>>> I'm using Linux 5.1-rc6 and an up-to-date libbpf. The setup works for
>>> the first device and fails for the second device when creating bpf maps
>>> in libbpf ("qidconf_map" or "xsks_map"). It seems that these maps also
>>> need unique names and cannot exist twice under the same name.
>> So far as I know, there should be no such constraint; the BPF map
>> creation is wrapped inside libbpf.
>>
>>> Furthermore, if I run step 3 again after it failed the first time,
>>> xdp vdev allocation already fails for the first xdp vdev and never
>>> reaches the second one. Please let me know if you need some program
>>> output or more information from me.
>>>
>>> Best regards,
>>> Markus
>>>
>> Thanks,
>> Xiaolong
>>
>>> On 4/18/19 3:05 AM, Ye Xiaolong wrote:
>>>> Hi, Markus
>>>>
>>>> On 04/17, Markus Theil wrote:
>>>>> I tested the new af_xdp based device on the current master branch and
>>>>> noticed that the use of static mempool names only allows the creation
>>>>> of a single af_xdp vdev. If a second vdev of the same type gets
>>>>> created, the mempool allocation fails.
>>>> Thanks for reporting; could you paste the cmdline you used and the error log?
>>>> Are you referring to ring creation or mempool creation?
>>>>
>>>>
>>>> Thanks,
>>>> Xiaolong
>>>>> Best regards,
>>>>> Markus Theil


* Re: [dpdk-dev] [BUG] net/af_xdp: Current code can only create one af_xdp device
  2019-04-24 14:47                                   ` Ye Xiaolong
@ 2019-04-24 20:33                                     ` Markus Theil
  0 siblings, 0 replies; 214+ messages in thread
From: Markus Theil @ 2019-04-24 20:33 UTC (permalink / raw)
  To: Ye Xiaolong; +Cc: dev

Hi Xiaolong,

with only one vdev everything works. It stops working if I use two
vdevs. Both interfaces were brought up before testing.

Best regards,
Markus

On 24.04.19 16:47, Ye Xiaolong wrote:
> Hi, Markus
>
> On 04/24, Markus Theil wrote:
>> Hi Xiaolong,
>>
>> I also tested with i40e devices, with the same result.
>>
>> ./dpdk-testpmd -n 4 --log-level=pmd.net.af_xdp:debug --no-pci --vdev
>> net_af_xdp0,iface=enp36s0f0 --vdev net_af_xdp1,iface=enp36s0f1
>> EAL: Detected 16 lcore(s)
>> EAL: Detected 1 NUMA nodes
>> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
>> EAL: No free hugepages reported in hugepages-2048kB
>> EAL: No available hugepages reported in hugepages-2048kB
>> EAL: Probing VFIO support...
>> rte_pmd_af_xdp_probe(): Initializing pmd_af_xdp for net_af_xdp0
>> rte_pmd_af_xdp_probe(): Initializing pmd_af_xdp for net_af_xdp1
>> testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=267456,
>> size=2176, socket=0
>> testpmd: preferred mempool ops selected: ring_mp_mc
>> Configuring Port 0 (socket 0)
>> Port 0: 3C:FD:FE:A3:E7:30
>> Configuring Port 1 (socket 0)
>> xsk_configure(): Failed to create xsk socket. (-1)
>> eth_rx_queue_setup(): Failed to configure xdp socket
>> Fail to configure port 1 rx queues
>> EAL: Error - exiting with code: 1
>>   Cause: Start ports failed
>>
> Does a single vdev instance work on your side? And have you brought the
> interfaces up? xsk_configure() requires the interface to be in the up state.
>
> Thanks,
> Xiaolong
>
>
>> If I execute the same call again, I get error -16 already on the first port:
>>
>> ./dpdk-testpmd -n 4 --log-level=pmd.net.af_xdp:debug --no-pci --vdev
>> net_af_xdp0,iface=enp36s0f0 --vdev net_af_xdp1,iface=enp36s0f1
>> EAL: Detected 16 lcore(s)
>> EAL: Detected 1 NUMA nodes
>> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
>> EAL: No free hugepages reported in hugepages-2048kB
>> EAL: No available hugepages reported in hugepages-2048kB
>> EAL: Probing VFIO support...
>> rte_pmd_af_xdp_probe(): Initializing pmd_af_xdp for net_af_xdp0
>> rte_pmd_af_xdp_probe(): Initializing pmd_af_xdp for net_af_xdp1
>> testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=267456,
>> size=2176, socket=0
>> testpmd: preferred mempool ops selected: ring_mp_mc
>> Configuring Port 0 (socket 0)
>> xsk_configure(): Failed to create xsk socket. (-16)
>> eth_rx_queue_setup(): Failed to configure xdp socket
>> Fail to configure port 0 rx queues
>> EAL: Error - exiting with code: 1
>>   Cause: Start ports failed
>>
>> Software versions/commits/infos:
>>
>> - Linux 5.1-rc6
>> - DPDK 7f251bcf22c5729792f9243480af1b3c072876a5 (19.05-rc2)
>> - libbpf from https://github.com/libbpf/libbpf
>> (910c475f09e5c269f441d7496c27dace30dc2335)
>> - DPDK and libbpf built with meson
>>
>> Best regards,
>> Markus
>>
>> On 4/24/19 8:35 AM, Ye Xiaolong wrote:
>>> Hi, Markus
>>>
>>> On 04/23, Markus Theil wrote:
>>>> Hi Xiaolong,
>>>>
>>>> I tested your commit "net/af_xdp: fix creating multiple instance" on the
>>>> current master branch. It does not work for me in the following minimal
>>>> test setting:
>>>>
>>>> 1) allocate 2x 1GB huge pages for DPDK
>>>>
>>>> 2) ip link add p1 type veth peer name p2
>>>>
>>>> 3) ./dpdk-testpmd --vdev=net_af_xdp0,iface=p1
>>>> --vdev=net_af_xdp1,iface=p2 (I also tested this with two igb devices,
>>>> with the same errors)
>>> I've tested 19.05-rc2, started testpmd with 2 af_xdp vdevs (with two i40e devices),
>>> and it works for me.
>>>
>>> $ ./x86_64-native-linuxapp-gcc/app/testpmd -l 5,6 -n 4 --log-level=pmd.net.af_xdp:info -b 82:00.1 --no-pci --vdev net_af_xdp0,iface=ens786f1 --vdev net_af_xdp1,iface=ens786f0
>>> EAL: Detected 88 lcore(s)
>>> EAL: Detected 2 NUMA nodes
>>> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
>>> EAL: Probing VFIO support...
>>> rte_pmd_af_xdp_probe(): Initializing pmd_af_xdp for net_af_xdp0
>>> rte_pmd_af_xdp_probe(): Initializing pmd_af_xdp for net_af_xdp1
>>> testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=155456, size=2176, socket=0
>>> testpmd: preferred mempool ops selected: ring_mp_mc
>>> Configuring Port 0 (socket 0)
>>> Port 0: 3C:FD:FE:C5:E2:41
>>> Configuring Port 1 (socket 0)
>>> Port 1: 3C:FD:FE:C5:E2:40
>>> Checking link statuses...
>>> Done
>>> No commandline core given, start packet forwarding
>>> io packet forwarding - ports=2 - cores=1 - streams=2 - NUMA support enabled, MP allocation mode: native
>>> Logical Core 6 (socket 0) forwards packets on 2 streams:
>>>   RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01
>>>   RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
>>>
>>>   io packet forwarding packets/burst=32
>>>   nb forwarding cores=1 - nb forwarding ports=2
>>>   port 0: RX queue number: 1 Tx queue number: 1
>>>     Rx offloads=0x0 Tx offloads=0x0
>>>     RX queue: 0
>>>       RX desc=0 - RX free threshold=0
>>>       RX threshold registers: pthresh=0 hthresh=0  wthresh=0
>>>       RX Offloads=0x0
>>>     TX queue: 0
>>>       TX desc=0 - TX free threshold=0
>>>       TX threshold registers: pthresh=0 hthresh=0  wthresh=0
>>>       TX offloads=0x0 - TX RS bit threshold=0
>>>   port 1: RX queue number: 1 Tx queue number: 1
>>>     Rx offloads=0x0 Tx offloads=0x0
>>>     RX queue: 0
>>>       RX desc=0 - RX free threshold=0
>>>       RX threshold registers: pthresh=0 hthresh=0  wthresh=0
>>>       RX Offloads=0x0
>>>     TX queue: 0
>>>       TX desc=0 - TX free threshold=0
>>>       TX threshold registers: pthresh=0 hthresh=0  wthresh=0
>>>       TX offloads=0x0 - TX RS bit threshold=0
>>> Press enter to exit
>>>
>>> Could you paste your whole failure log here?
>>>> I'm using Linux 5.1-rc6 and an up-to-date libbpf. The setup works for
>>>> the first device and fails for the second device when creating bpf maps
>>>> in libbpf ("qidconf_map" or "xsks_map"). It seems that these maps also
>>>> need unique names and cannot exist twice under the same name.
>>> So far as I know, there should be no such constraint; the BPF map
>>> creation is wrapped inside libbpf.
>>>
>>>> Furthermore, if I run step 3 again after it failed the first time,
>>>> xdp vdev allocation already fails for the first xdp vdev and never
>>>> reaches the second one. Please let me know if you need some program
>>>> output or more information from me.
>>>>
>>>> Best regards,
>>>> Markus
>>>>
>>> Thanks,
>>> Xiaolong
>>>
>>>> On 4/18/19 3:05 AM, Ye Xiaolong wrote:
>>>>> Hi, Markus
>>>>>
>>>>> On 04/17, Markus Theil wrote:
>>>>>> I tested the new af_xdp based device on the current master branch and
>>>>>> noticed that the use of static mempool names only allows the creation
>>>>>> of a single af_xdp vdev. If a second vdev of the same type gets
>>>>>> created, the mempool allocation fails.
>>>>> Thanks for reporting; could you paste the cmdline you used and the error log?
>>>>> Are you referring to ring creation or mempool creation?
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Xiaolong
>>>>>> Best regards,
>>>>>> Markus Theil

-- 
Markus Theil

Technische Universität Ilmenau, Fachgebiet Telematik/Rechnernetze
Postfach 100565
98684 Ilmenau, Germany

Phone: +49 3677 69-4582
Email: markus[dot]theil[at]tu-ilmenau[dot]de
Web: http://www.tu-ilmenau.de/telematik



* Re: [dpdk-dev] [BUG] net/af_xdp: Current code can only create one af_xdp device
  2019-04-24  9:21                                 ` Markus Theil
  2019-04-24 14:47                                   ` Ye Xiaolong
@ 2019-04-25  5:43                                   ` Ye Xiaolong
  1 sibling, 0 replies; 214+ messages in thread
From: Ye Xiaolong @ 2019-04-25  5:43 UTC (permalink / raw)
  To: Markus Theil; +Cc: dev

Hi, Markus

On 04/24, Markus Theil wrote:
>Hi Xiaolong,
>
>I also tested with i40e devices, with the same result.
>
>./dpdk-testpmd -n 4 --log-level=pmd.net.af_xdp:debug --no-pci --vdev
>net_af_xdp0,iface=enp36s0f0 --vdev net_af_xdp1,iface=enp36s0f1
>EAL: Detected 16 lcore(s)
>EAL: Detected 1 NUMA nodes
>EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
>EAL: No free hugepages reported in hugepages-2048kB
>EAL: No available hugepages reported in hugepages-2048kB
>EAL: Probing VFIO support...
>rte_pmd_af_xdp_probe(): Initializing pmd_af_xdp for net_af_xdp0
>rte_pmd_af_xdp_probe(): Initializing pmd_af_xdp for net_af_xdp1
>testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=267456,
>size=2176, socket=0
>testpmd: preferred mempool ops selected: ring_mp_mc
>Configuring Port 0 (socket 0)
>Port 0: 3C:FD:FE:A3:E7:30
>Configuring Port 1 (socket 0)
>xsk_configure(): Failed to create xsk socket. (-1)
>eth_rx_queue_setup(): Failed to configure xdp socket
>Fail to configure port 1 rx queues
>EAL: Error - exiting with code: 1
>  Cause: Start ports failed

An error of (-1) typically means EPERM ("Operation not permitted"). Is there
any special configuration on your interfaces, and were you running with root
privileges? Out of curiosity, how did you get (-1) in your log? Did you add
some private patch to print the errno?
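
As for the (-16) on the second run: that looks like EBUSY, which would fit
a default XDP program left attached to the interface by the failed first
run. A possible way to check and clear it from the shell (assuming
iproute2; interface names taken from your log):

   # an attached program shows up in the link info (look for "xdp")
   ip link show dev enp36s0f0
   # detach any leftover XDP program before retrying
   ip link set dev enp36s0f0 xdp off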

Thanks,
Xiaolong

>
>If I execute the same call again, I get error -16 already on the first port:
>
>./dpdk-testpmd -n 4 --log-level=pmd.net.af_xdp:debug --no-pci --vdev
>net_af_xdp0,iface=enp36s0f0 --vdev net_af_xdp1,iface=enp36s0f1
>EAL: Detected 16 lcore(s)
>EAL: Detected 1 NUMA nodes
>EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
>EAL: No free hugepages reported in hugepages-2048kB
>EAL: No available hugepages reported in hugepages-2048kB
>EAL: Probing VFIO support...
>rte_pmd_af_xdp_probe(): Initializing pmd_af_xdp for net_af_xdp0
>rte_pmd_af_xdp_probe(): Initializing pmd_af_xdp for net_af_xdp1
>testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=267456,
>size=2176, socket=0
>testpmd: preferred mempool ops selected: ring_mp_mc
>Configuring Port 0 (socket 0)
>xsk_configure(): Failed to create xsk socket. (-16)
>eth_rx_queue_setup(): Failed to configure xdp socket
>Fail to configure port 0 rx queues
>EAL: Error - exiting with code: 1
>  Cause: Start ports failed
>
>Software versions/commits/infos:
>
>- Linux 5.1-rc6
>- DPDK 7f251bcf22c5729792f9243480af1b3c072876a5 (19.05-rc2)
>- libbpf from https://github.com/libbpf/libbpf
>(910c475f09e5c269f441d7496c27dace30dc2335)
>- DPDK and libbpf built with meson
>
>Best regards,
>Markus
>
>On 4/24/19 8:35 AM, Ye Xiaolong wrote:
>> Hi, Markus
>>
>> On 04/23, Markus Theil wrote:
>>> Hi Xiaolong,
>>>
>>> I tested your commit "net/af_xdp: fix creating multiple instance" on the
>>> current master branch. It does not work for me in the following minimal
>>> test setting:
>>>
>>> 1) allocate 2x 1GB huge pages for DPDK
>>>
>>> 2) ip link add p1 type veth peer name p2
>>>
>>> 3) ./dpdk-testpmd --vdev=net_af_xdp0,iface=p1
>>> --vdev=net_af_xdp1,iface=p2 (I also tested this with two igb devices,
>>> with the same errors)
>> I've tested 19.05-rc2, started testpmd with 2 af_xdp vdevs (with two i40e devices),
>> and it works for me.
>>
>> $ ./x86_64-native-linuxapp-gcc/app/testpmd -l 5,6 -n 4 --log-level=pmd.net.af_xdp:info -b 82:00.1 --no-pci --vdev net_af_xdp0,iface=ens786f1 --vdev net_af_xdp1,iface=ens786f0
>> EAL: Detected 88 lcore(s)
>> EAL: Detected 2 NUMA nodes
>> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
>> EAL: Probing VFIO support...
>> rte_pmd_af_xdp_probe(): Initializing pmd_af_xdp for net_af_xdp0
>> rte_pmd_af_xdp_probe(): Initializing pmd_af_xdp for net_af_xdp1
>> testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=155456, size=2176, socket=0
>> testpmd: preferred mempool ops selected: ring_mp_mc
>> Configuring Port 0 (socket 0)
>> Port 0: 3C:FD:FE:C5:E2:41
>> Configuring Port 1 (socket 0)
>> Port 1: 3C:FD:FE:C5:E2:40
>> Checking link statuses...
>> Done
>> No commandline core given, start packet forwarding
>> io packet forwarding - ports=2 - cores=1 - streams=2 - NUMA support enabled, MP allocation mode: native
>> Logical Core 6 (socket 0) forwards packets on 2 streams:
>>   RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01
>>   RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
>>
>>   io packet forwarding packets/burst=32
>>   nb forwarding cores=1 - nb forwarding ports=2
>>   port 0: RX queue number: 1 Tx queue number: 1
>>     Rx offloads=0x0 Tx offloads=0x0
>>     RX queue: 0
>>       RX desc=0 - RX free threshold=0
>>       RX threshold registers: pthresh=0 hthresh=0  wthresh=0
>>       RX Offloads=0x0
>>     TX queue: 0
>>       TX desc=0 - TX free threshold=0
>>       TX threshold registers: pthresh=0 hthresh=0  wthresh=0
>>       TX offloads=0x0 - TX RS bit threshold=0
>>   port 1: RX queue number: 1 Tx queue number: 1
>>     Rx offloads=0x0 Tx offloads=0x0
>>     RX queue: 0
>>       RX desc=0 - RX free threshold=0
>>       RX threshold registers: pthresh=0 hthresh=0  wthresh=0
>>       RX Offloads=0x0
>>     TX queue: 0
>>       TX desc=0 - TX free threshold=0
>>       TX threshold registers: pthresh=0 hthresh=0  wthresh=0
>>       TX offloads=0x0 - TX RS bit threshold=0
>> Press enter to exit
>>
>> Could you paste your whole failure log here?
>>> I'm using Linux 5.1-rc6 and an up-to-date libbpf. The setup works for
>>> the first device and fails for the second device when creating bpf maps
>>> in libbpf ("qidconf_map" or "xsks_map"). It seems that these maps also
>>> need unique names and cannot exist twice under the same name.
>> So far as I know, there should be no such constraint; the BPF map
>> creation is wrapped inside libbpf.
>>
>>> Furthermore, if I run step 3 again after it failed the first time,
>>> xdp vdev allocation already fails for the first xdp vdev and never
>>> reaches the second one. Please let me know if you need some program
>>> output or more information from me.
>>>
>>> Best regards,
>>> Markus
>>>
>> Thanks,
>> Xiaolong
>>
>>> On 4/18/19 3:05 AM, Ye Xiaolong wrote:
>>>> Hi, Markus
>>>>
>>>> On 04/17, Markus Theil wrote:
>>>>> I tested the new af_xdp based device on the current master branch and
>>>>> noticed that the use of static mempool names only allows the creation
>>>>> of a single af_xdp vdev. If a second vdev of the same type gets
>>>>> created, the mempool allocation fails.
>>>> Thanks for reporting; could you paste the cmdline you used and the error log?
>>>> Are you referring to ring creation or mempool creation?
>>>>
>>>>
>>>> Thanks,
>>>> Xiaolong
>>>>> Best regards,
>>>>> Markus Theil


end of thread

Thread overview: 214+ messages
2019-03-01  8:09 [PATCH v1 0/6] Introduce AF_XDP PMD Xiaolong Ye
2019-03-01  8:09 ` [PATCH v1 1/6] net/af_xdp: introduce AF_XDP PMD driver Xiaolong Ye
2019-03-01 15:38   ` Luca Boccassi
2019-03-02  8:14     ` Ye Xiaolong
2019-03-17  3:34       ` Ye Xiaolong
2019-03-24 12:07         ` Luca Boccassi
2019-03-25  2:45           ` Ye Xiaolong
2019-03-25 10:42             ` Luca Boccassi
2019-03-25 12:22               ` Ye Xiaolong
2019-03-26  2:18               ` Ye Xiaolong
2019-03-26 10:14                 ` Luca Boccassi
2019-03-26 12:12                   ` Ye Xiaolong
2019-03-01 18:31   ` Stephen Hemminger
2019-03-02  8:08     ` Ye Xiaolong
2019-03-01 18:32   ` Stephen Hemminger
2019-03-02  8:07     ` Ye Xiaolong
2019-03-05  8:25   ` David Marchand
2019-03-07  3:19     ` Ye Xiaolong
2019-03-11 16:20   ` Ferruh Yigit
2019-03-12 15:54     ` Ye Xiaolong
2019-03-13 10:54       ` Ferruh Yigit
2019-03-13 11:12         ` Ye Xiaolong
2019-03-17  3:35       ` Ye Xiaolong
2019-03-01  8:09 ` [PATCH v1 2/6] lib/mbuf: enable parse flags when create mempool Xiaolong Ye
2019-03-05  8:30   ` David Marchand
2019-03-07  3:07     ` Ye Xiaolong
2019-03-01  8:09 ` [PATCH v1 3/6] lib/mempool: allow page size aligned mempool Xiaolong Ye
2019-03-01  8:09 ` [PATCH v1 4/6] net/af_xdp: use mbuf mempool for buffer management Xiaolong Ye
2019-03-01  8:09 ` [PATCH v1 5/6] net/af_xdp: enable zero copy Xiaolong Ye
2019-03-01  8:09 ` [PATCH v1 6/6] app/testpmd: add mempool flags parameter Xiaolong Ye
2019-03-01 18:34   ` Stephen Hemminger
2019-03-02  8:06     ` Ye Xiaolong
2019-03-11 16:46   ` Ferruh Yigit
2019-03-12 15:10     ` Ye Xiaolong
2019-03-11 16:43 ` [PATCH v1 0/6] Introduce AF_XDP PMD Ferruh Yigit
2019-03-11 17:19   ` Thomas Monjalon
2019-03-12  1:51     ` Zhang, Qi Z
2019-03-12  7:55       ` Karlsson, Magnus
2019-03-19  7:12 ` [PATCH v2 " Xiaolong Ye
2019-03-19  7:12   ` [PATCH v2 1/6] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
2019-03-19  9:07     ` Mattias Rönnblom
2019-03-19  9:49       ` Ye Xiaolong
2019-03-19 16:14     ` Stephen Hemminger
2019-03-20  2:32       ` Ye Xiaolong
2019-03-19 16:16     ` Stephen Hemminger
2019-03-19 16:33       ` Bruce Richardson
2019-03-20  2:07         ` Ye Xiaolong
2019-03-20  2:05       ` Ye Xiaolong
2019-03-20  9:23     ` David Marchand
2019-03-20 15:20       ` Ye Xiaolong
2019-03-19  7:12   ` [PATCH v2 2/6] lib/mbuf: introduce helper to create mempool with flags Xiaolong Ye
2019-03-19  7:12   ` [PATCH v2 3/6] lib/mempool: allow page size aligned mempool Xiaolong Ye
2019-03-19  7:12   ` [PATCH v2 4/6] net/af_xdp: use mbuf mempool for buffer management Xiaolong Ye
2019-03-19  7:12   ` [PATCH v2 5/6] net/af_xdp: enable zero copy Xiaolong Ye
2019-03-19  8:12     ` Mattias Rönnblom
2019-03-19  8:39       ` Ye Xiaolong
2019-03-20  9:22     ` David Marchand
2019-03-20  9:48       ` Zhang, Qi Z
2019-03-19  7:12   ` [PATCH v2 6/6] app/testpmd: add mempool flags parameter Xiaolong Ye
2019-03-19 23:36     ` Jerin Jacob Kollanukkaran
2019-03-20  2:08       ` Ye Xiaolong
2019-03-20  9:23       ` David Marchand
2019-03-20 15:22         ` Ye Xiaolong
2019-03-21  9:18 ` [PATCH v3 0/5] Introduce AF_XDP PMD Xiaolong Ye
2019-03-21  9:18   ` [PATCH v3 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
2019-03-21 15:24     ` Stephen Hemminger
2019-03-22  2:05       ` Ye Xiaolong
2019-03-21 15:25     ` Stephen Hemminger
2019-03-22  2:05       ` Ye Xiaolong
2019-03-21 15:27     ` Stephen Hemminger
2019-03-22  2:04       ` Ye Xiaolong
2019-03-21 15:28     ` Stephen Hemminger
2019-03-22  2:15       ` Ye Xiaolong
2019-03-22 15:38         ` Stephen Hemminger
2019-03-22 23:20           ` Ye Xiaolong
2019-03-21 15:30     ` Stephen Hemminger
2019-03-22  2:01       ` Ye Xiaolong
2019-03-22 15:37         ` Stephen Hemminger
2019-03-22 23:19           ` Ye Xiaolong
2019-03-21 15:31     ` Stephen Hemminger
2019-03-22  1:55       ` Ye Xiaolong
2019-03-21 15:32     ` Stephen Hemminger
2019-03-22  1:54       ` Ye Xiaolong
2019-03-21 15:36     ` Stephen Hemminger
2019-03-22  1:49       ` Ye Xiaolong
2019-03-22  9:32         ` Bruce Richardson
2019-03-21  9:18   ` [PATCH v3 2/5] lib/mbuf: introduce helper to create mempool with flags Xiaolong Ye
2019-03-21  9:18   ` [PATCH v3 3/5] lib/mempool: allow page size aligned mempool Xiaolong Ye
2019-03-21 14:00     ` Ananyev, Konstantin
2019-03-21 14:23       ` Zhang, Qi Z
2019-03-21  9:18   ` [PATCH v3 4/5] net/af_xdp: use mbuf mempool for buffer management Xiaolong Ye
2019-03-21  9:18   ` [PATCH v3 5/5] net/af_xdp: enable zero copy Xiaolong Ye
2019-03-22 13:01 ` [PATCH v4 0/5] Introduce AF_XDP PMD Xiaolong Ye
2019-03-22 13:01   ` [PATCH v4 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
2019-03-22 14:32     ` Maxime Coquelin
2019-03-24  9:32       ` Ye Xiaolong
2019-03-24 12:10     ` Luca Boccassi
2019-03-24 16:27       ` Thomas Monjalon
2019-03-22 13:01   ` [PATCH v4 2/5] lib/mbuf: introduce helper to create mempool with flags Xiaolong Ye
2019-03-22 14:36     ` Maxime Coquelin
2019-03-24  9:08       ` Ye Xiaolong
2019-03-22 13:01   ` [PATCH v4 3/5] lib/mempool: allow page size aligned mempool Xiaolong Ye
2019-03-22 13:01   ` [PATCH v4 4/5] net/af_xdp: use mbuf mempool for buffer management Xiaolong Ye
2019-03-22 14:51     ` Maxime Coquelin
2019-03-24  9:08       ` Ye Xiaolong
2019-03-24 11:52         ` Ye Xiaolong
2019-03-22 13:01   ` [PATCH v4 5/5] net/af_xdp: enable zero copy Xiaolong Ye
2019-03-25  6:03 ` [PATCH v5 0/5] Introduce AF_XDP PMD Xiaolong Ye
2019-03-25  6:03   ` [PATCH v5 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
2019-03-25 15:58     ` Stephen Hemminger
2019-03-26  2:13       ` Ye Xiaolong
2019-03-25  6:03   ` [PATCH v5 2/5] lib/mbuf: introduce helper to create mempool with flags Xiaolong Ye
2019-03-25  6:03   ` [PATCH v5 3/5] lib/mempool: allow page size aligned mempool Xiaolong Ye
2019-03-25  9:04     ` Andrew Rybchenko
2019-03-26  3:27       ` Ye Xiaolong
2019-03-25  6:03   ` [PATCH v5 4/5] net/af_xdp: use mbuf mempool for buffer management Xiaolong Ye
2019-03-25  6:04   ` [PATCH v5 5/5] net/af_xdp: enable zero copy Xiaolong Ye
2019-03-26 12:20 ` [PATCH v6 0/5] Introduce AF_XDP PMD Xiaolong Ye
2019-03-26 12:20   ` [PATCH v6 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
2019-03-26 19:08     ` Stephen Hemminger
2019-03-27  5:33       ` Ye Xiaolong
2019-03-26 12:20   ` [PATCH v6 2/5] lib/mbuf: introduce helper to create mempool with flags Xiaolong Ye
2019-03-26 12:20   ` [PATCH v6 3/5] lib/mempool: allow page size aligned mempool Xiaolong Ye
2019-03-26 12:20   ` [PATCH v6 4/5] net/af_xdp: use mbuf mempool for buffer management Xiaolong Ye
2019-03-29 17:42     ` Olivier Matz
2019-03-31 12:38       ` Ye Xiaolong
2019-04-01  5:47         ` Zhang, Qi Z
2019-03-26 12:20   ` [PATCH v6 5/5] net/af_xdp: enable zero copy Xiaolong Ye
2019-03-27  9:00 ` [PATCH v7 0/5] Introduce AF_XDP PMD Xiaolong Ye
2019-03-27  9:00   ` [PATCH v7 1/5] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
2019-03-28 17:51     ` Ferruh Yigit
2019-03-28 18:52       ` Luca Boccassi
2019-04-02 19:55         ` Ferruh Yigit
2019-03-29  2:05       ` Ye Xiaolong
2019-03-29  8:10         ` Ferruh Yigit
2019-03-27  9:00   ` [PATCH v7 2/5] lib/mbuf: introduce helper to create mempool with flags Xiaolong Ye
2019-03-28 19:30     ` Ferruh Yigit
2019-03-27  9:00   ` [PATCH v7 3/5] lib/mempool: allow page size aligned mempool Xiaolong Ye
2019-03-28 19:34     ` Ferruh Yigit
2019-03-29 10:37     ` Andrew Rybchenko
2019-03-29 17:42       ` Olivier Matz
2019-03-27  9:00   ` [PATCH v7 4/5] net/af_xdp: use mbuf mempool for buffer management Xiaolong Ye
2019-03-27  9:00   ` [PATCH v7 5/5] net/af_xdp: enable zero copy Xiaolong Ye
2019-03-28 18:44     ` Ferruh Yigit
2019-03-29  1:53       ` Ye Xiaolong
2019-04-02 10:45 ` [PATCH v8 0/1] AF_XDP PMD Xiaolong Ye
2019-04-02 10:45   ` [PATCH v8 1/1] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
2019-04-02 14:58     ` Stephen Hemminger
2019-04-02 15:10       ` Ye Xiaolong
2019-04-02 15:46 ` [PATCH v9 0/1] Introduce AF_XDP PMD Xiaolong Ye
2019-04-02 15:46   ` [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
2019-04-02 18:56     ` Stephen Hemminger
2019-04-02 23:01       ` Ye Xiaolong
2019-04-02 19:19     ` Luca Boccassi
2019-04-03  9:59       ` Ye Xiaolong
2019-04-03 10:36         ` Luca Boccassi
2019-04-03 10:42           ` Luca Boccassi
2019-04-03 11:18             ` Ferruh Yigit
2019-04-03 11:35               ` Luca Boccassi
2019-04-03 12:16                 ` Luca Boccassi
2019-04-03 12:33                   ` Ferruh Yigit
2019-04-03 13:09                 ` Ferruh Yigit
2019-04-03 13:29                   ` Luca Boccassi
2019-04-03 14:43                     ` Ye Xiaolong
2019-04-03 14:51                       ` Luca Boccassi
2019-04-03 15:14                         ` Ye Xiaolong
2019-04-03 15:23                           ` Bruce Richardson
2019-04-03 15:34                             ` Ye Xiaolong
2019-04-03 14:22                   ` Ye Xiaolong
2019-04-03 15:52                     ` Ferruh Yigit
2019-04-03 15:57                       ` Ye Xiaolong
2019-04-17 12:30                         ` [dpdk-dev] [BUG] net/af_xdp: Current code can only create one af_xdp device Markus Theil
2019-04-18  1:05                           ` Ye Xiaolong
2019-04-23 16:23                             ` Markus Theil
2019-04-24  6:35                               ` Ye Xiaolong
2019-04-24  9:21                                 ` Markus Theil
2019-04-24 14:47                                   ` Ye Xiaolong
2019-04-24 20:33                                     ` Markus Theil
2019-04-25  5:43                                   ` Ye Xiaolong
2019-04-18 15:20                           ` [dpdk-dev] [PATCH v1 1/2] net/af_xdp: name the buf ring dynamically Xiaolong Ye
2019-04-18 15:20                             ` [dpdk-dev] [PATCH v1 2/2] net/af_xdp: name the umem memzone dynamically Xiaolong Ye
2019-04-19  9:47                               ` David Marchand
2019-04-19 12:33                                 ` Ferruh Yigit
2019-04-19 15:05                                   ` Ye Xiaolong
2019-04-19  9:46                             ` [dpdk-dev] [PATCH v1 1/2] net/af_xdp: name the buf ring dynamically David Marchand
2019-04-19 12:47                             ` [dpdk-dev] [PATCH v2] net/af_xdp: fix creating multiple instance Ferruh Yigit
2019-04-19 12:51                               ` Ferruh Yigit
2019-04-02 19:43     ` [PATCH v9 1/1] net/af_xdp: introduce AF XDP PMD driver Ferruh Yigit
2019-04-03 13:22       ` Bruce Richardson
2019-04-03 13:34         ` Ferruh Yigit
2019-04-03 16:59 ` [PATCH v10 0/1] Introduce AF_XDP PMD Xiaolong Ye
2019-04-03 16:59   ` [PATCH v10 1/1] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
2019-04-03 17:32     ` Luca Boccassi
2019-04-03 17:44     ` Ferruh Yigit
2019-04-03 18:52       ` Luca Boccassi
2019-04-04  5:36         ` Ye Xiaolong
2019-04-04  5:55         ` Ye Xiaolong
2019-04-04  7:01           ` Phil Yang (Arm Technology China)
2019-04-04  8:39           ` Luca Boccassi
2019-04-04  8:40             ` Ye Xiaolong
2019-04-04  5:29       ` Ye Xiaolong
2019-04-04  8:51 ` [PATCH v11 0/1] Introduce AF_XDP PMD Xiaolong Ye
2019-04-04  8:51   ` [PATCH v11 1/1] net/af_xdp: introduce AF XDP PMD driver Xiaolong Ye
2019-04-04 16:20     ` Luca Boccassi
2019-04-04 16:41       ` Stephen Hemminger
2019-04-04 17:05         ` Ferruh Yigit
2019-04-04 23:39     ` [dpdk-dev] " Ferruh Yigit
2019-04-05 15:05       ` Ye Xiaolong
2019-04-05 15:17         ` Ferruh Yigit
2019-04-05 15:22           ` Ye Xiaolong
2019-04-05 15:23         ` Bruce Richardson
2019-04-05 15:31           ` Ferruh Yigit
2019-04-05 15:35             ` Bruce Richardson
2019-04-04 16:13   ` [PATCH v11 0/1] Introduce AF_XDP PMD Ferruh Yigit
