* [RFC v2 0/7] PMD driver for AF_XDP
@ 2018-03-08 13:52 Qi Zhang
  2018-03-08 13:52 ` [RFC v2 1/7] net/af_xdp: new PMD driver Qi Zhang
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: Qi Zhang @ 2018-03-08 13:52 UTC (permalink / raw)
  To: dev; +Cc: magnus.karlsson, bjorn.topel, Qi Zhang

v2:
- fix license header
- clean up bpf dependency: the bpf program is embedded, no "xdpsock_kern.o" required
- clean up makefile: only linux_header is required
- fix all compile warnings
- fix the packet count returned in Tx

These RFC patches add a new PMD driver for AF_XDP, a proposed faster
version of the AF_PACKET interface in Linux. See the links below for a
detailed introduction to AF_XDP:
https://fosdem.org/2018/schedule/event/af_xdp/
https://lwn.net/Articles/745934/

This patchset is based on v18.02.
It also requires a Linux kernel with the AF_XDP RFC patches below
applied:
https://patchwork.ozlabs.org/patch/867961/
https://patchwork.ozlabs.org/patch/867960/
https://patchwork.ozlabs.org/patch/867938/
https://patchwork.ozlabs.org/patch/867939/
https://patchwork.ozlabs.org/patch/867940/
https://patchwork.ozlabs.org/patch/867941/
https://patchwork.ozlabs.org/patch/867942/
https://patchwork.ozlabs.org/patch/867943/
https://patchwork.ozlabs.org/patch/867944/
https://patchwork.ozlabs.org/patch/867945/
https://patchwork.ozlabs.org/patch/867946/
https://patchwork.ozlabs.org/patch/867947/
https://patchwork.ozlabs.org/patch/867948/
https://patchwork.ozlabs.org/patch/867949/
https://patchwork.ozlabs.org/patch/867950/
https://patchwork.ozlabs.org/patch/867951/
https://patchwork.ozlabs.org/patch/867952/
https://patchwork.ozlabs.org/patch/867953/
https://patchwork.ozlabs.org/patch/867954/
https://patchwork.ozlabs.org/patch/867955/
https://patchwork.ozlabs.org/patch/867956/
https://patchwork.ozlabs.org/patch/867957/
https://patchwork.ozlabs.org/patch/867958/
https://patchwork.ozlabs.org/patch/867959/

There is no clean upstream target yet since the kernel patches are still
in the RFC stage. The purpose of this patchset is to let anyone who wants
to evaluate AF_XDP with a DPDK application do so, and to gather feedback
for further improvement.

To try the new PMD:
1. Compile and install the kernel with the above patches applied
   (commit 91e6dd8284256e should work).
2. Configure $LINUX_HEADER_DIR (the "make headers_install" output directory)
   in drivers/net/af_xdp/Makefile.
3. Make sure libelf and libbpf are installed.
4. testpmd command-line example:
./testpmd -c 0x3 -n 4 --vdev 'eth_af_xdp,iface=eth0' -- -i --mbuf-size=1856 --mp-flags=65

BTW, performance tests show our PMD can reach 94%~98% of the original
benchmark when shared memory is enabled.

Qi Zhang (7):
  net/af_xdp: new PMD driver
  lib/mbuf: enable parse flags when create mempool
  lib/mempool: allow page size aligned mempool
  net/af_xdp: use mbuf mempool for buffer management
  net/af_xdp: enable share mempool
  net/af_xdp: load BPF file
  app/testpmd: enable parameter for mempool flags

 app/test-pmd/parameters.c                     |  12 +
 app/test-pmd/testpmd.c                        |  15 +-
 app/test-pmd/testpmd.h                        |   1 +
 config/common_base                            |   5 +
 config/common_linuxapp                        |   1 +
 drivers/net/Makefile                          |   1 +
 drivers/net/af_xdp/Makefile                   |  29 +
 drivers/net/af_xdp/bpf_load.c                 | 168 +++++
 drivers/net/af_xdp/bpf_load.h                 |  11 +
 drivers/net/af_xdp/meson.build                |   7 +
 drivers/net/af_xdp/rte_eth_af_xdp.c           | 909 ++++++++++++++++++++++++++
 drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   4 +
 drivers/net/af_xdp/xdpsock_queue.h            |  66 ++
 lib/librte_mbuf/rte_mbuf.c                    |  15 +-
 lib/librte_mbuf/rte_mbuf.h                    |   8 +-
 lib/librte_mempool/rte_mempool.c              |   2 +
 lib/librte_mempool/rte_mempool.h              |   1 +
 mk/rte.app.mk                                 |   1 +
 18 files changed, 1246 insertions(+), 10 deletions(-)
 create mode 100644 drivers/net/af_xdp/Makefile
 create mode 100644 drivers/net/af_xdp/bpf_load.c
 create mode 100644 drivers/net/af_xdp/bpf_load.h
 create mode 100644 drivers/net/af_xdp/meson.build
 create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
 create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map
 create mode 100644 drivers/net/af_xdp/xdpsock_queue.h

-- 
2.13.6


* [RFC v2 1/7] net/af_xdp: new PMD driver
  2018-03-08 13:52 [RFC v2 0/7] PMD driver for AF_XDP Qi Zhang
@ 2018-03-08 13:52 ` Qi Zhang
  2018-03-08 13:52 ` [RFC v2 2/7] lib/mbuf: enable parse flags when create mempool Qi Zhang
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Qi Zhang @ 2018-03-08 13:52 UTC (permalink / raw)
  To: dev; +Cc: magnus.karlsson, bjorn.topel, Qi Zhang

This is the vanilla version.
Packet data is copied between the af_xdp memory buffer and the mbuf mempool.
Indexes of the memory buffer are simply managed by a FIFO ring.

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
---
 config/common_base                            |   5 +
 config/common_linuxapp                        |   1 +
 drivers/net/Makefile                          |   1 +
 drivers/net/af_xdp/Makefile                   |  26 +
 drivers/net/af_xdp/meson.build                |   7 +
 drivers/net/af_xdp/rte_eth_af_xdp.c           | 760 ++++++++++++++++++++++++++
 drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   4 +
 drivers/net/af_xdp/xdpsock_queue.h            |  66 +++
 mk/rte.app.mk                                 |   1 +
 9 files changed, 871 insertions(+)
 create mode 100644 drivers/net/af_xdp/Makefile
 create mode 100644 drivers/net/af_xdp/meson.build
 create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
 create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map
 create mode 100644 drivers/net/af_xdp/xdpsock_queue.h
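
Note on the buffer management described in the commit message: the
following is only a minimal, self-contained sketch of the idea, not the
driver code. It assumes a tiny pool of 8 frames and replaces the rte_ring
with a hypothetical array-based FIFO (struct idx_fifo, fifo_put, fifo_get,
frame_addr and copy_to_frame are illustrative names that do not appear in
the patch). It shows how a free frame index is taken from the FIFO and
turned into an address with the same shift used by get_pkt_data().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_SIZE_LOG2 11u                  /* 2048-byte frames, as in the patch */
#define FRAME_SIZE      (1u << FRAME_SIZE_LOG2)
#define NUM_FRAMES      8u                   /* assumption: tiny pool, power of two */

/* Hypothetical stand-in for the rte_ring holding free frame indexes. */
struct idx_fifo {
	uint32_t slots[NUM_FRAMES];
	uint32_t head;                       /* next slot to dequeue */
	uint32_t tail;                       /* next slot to enqueue */
};

static int fifo_put(struct idx_fifo *f, uint32_t idx)
{
	if (f->tail - f->head == NUM_FRAMES)
		return -1;                   /* full */
	f->slots[f->tail++ & (NUM_FRAMES - 1)] = idx;
	return 0;
}

static int fifo_get(struct idx_fifo *f, uint32_t *idx)
{
	if (f->tail == f->head)
		return -1;                   /* empty */
	*idx = f->slots[f->head++ & (NUM_FRAMES - 1)];
	return 0;
}

/* Frame base (index << log2(frame size)) plus an offset inside the frame. */
static char *frame_addr(char *umem, uint32_t idx, uint32_t offset)
{
	assert(idx < NUM_FRAMES && offset < FRAME_SIZE);
	return umem + ((size_t)idx << FRAME_SIZE_LOG2) + offset;
}

/* Copy one packet into a free frame; returns the frame index or -1. */
static int copy_to_frame(struct idx_fifo *f, char *umem,
			 const void *pkt, uint32_t len)
{
	uint32_t idx;

	/* The length check comes first so no index is taken and then lost. */
	if (len > FRAME_SIZE || fifo_get(f, &idx) < 0)
		return -1;
	memcpy(frame_addr(umem, idx, 0), pkt, len);
	return (int)idx;
}
```

In the driver the index goes back into the ring once the kernel has
finished with the frame; in this sketch that corresponds to calling
fifo_put() with the returned index.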

diff --git a/config/common_base b/config/common_base
index ad03cf433..84b7b3b7e 100644
--- a/config/common_base
+++ b/config/common_base
@@ -368,6 +368,11 @@ CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX_FREE=n
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
 
 #
+# Compile software PMD backed by AF_XDP sockets (Linux only)
+#
+CONFIG_RTE_LIBRTE_PMD_AF_XDP=n
+
+#
 # Compile link bonding PMD library
 #
 CONFIG_RTE_LIBRTE_PMD_BOND=y
diff --git a/config/common_linuxapp b/config/common_linuxapp
index ff98f2355..3b10695b6 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -16,6 +16,7 @@ CONFIG_RTE_LIBRTE_VHOST=y
 CONFIG_RTE_LIBRTE_VHOST_NUMA=y
 CONFIG_RTE_LIBRTE_PMD_VHOST=y
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
+CONFIG_RTE_LIBRTE_PMD_AF_XDP=y
 CONFIG_RTE_LIBRTE_PMD_TAP=y
 CONFIG_RTE_LIBRTE_AVP_PMD=y
 CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD=y
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index e1127326b..409234ac3 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -9,6 +9,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD),d)
 endif
 
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += af_packet
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += af_xdp
 DIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark
 DIRS-$(CONFIG_RTE_LIBRTE_AVF_PMD) += avf
 DIRS-$(CONFIG_RTE_LIBRTE_AVP_PMD) += avp
diff --git a/drivers/net/af_xdp/Makefile b/drivers/net/af_xdp/Makefile
new file mode 100644
index 000000000..990073655
--- /dev/null
+++ b/drivers/net/af_xdp/Makefile
@@ -0,0 +1,26 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_af_xdp.a
+
+EXPORT_MAP := rte_pmd_af_xdp_version.map
+
+LIBABIVER := 1
+
+CFLAGS += -O3 -I/opt/af_xdp/linux_headers/include
+CFLAGS += $(WERROR_FLAGS)
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
+LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
+LDLIBS += -lrte_bus_vdev
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += rte_eth_af_xdp.c
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
new file mode 100644
index 000000000..4b6652685
--- /dev/null
+++ b/drivers/net/af_xdp/meson.build
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+if host_machine.system() != 'linux'
+	build = false
+endif
+sources = files('rte_eth_af_xdp.c')
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
new file mode 100644
index 000000000..5c7c53aeb
--- /dev/null
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -0,0 +1,760 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation.
+ */
+
+#include <rte_mbuf.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+
+#include <linux/if_ether.h>
+#include <linux/if_xdp.h>
+#include <arpa/inet.h>
+#include <net/if.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <unistd.h>
+#include <poll.h>
+#include "xdpsock_queue.h"
+
+#ifndef SOL_XDP
+#define SOL_XDP 283
+#endif
+
+#ifndef AF_XDP
+#define AF_XDP 44
+#endif
+
+#ifndef PF_XDP
+#define PF_XDP AF_XDP
+#endif
+
+#define ETH_AF_XDP_IFACE_ARG		"iface"
+#define ETH_AF_XDP_QUEUE_IDX_ARG	"queue"
+#define ETH_AF_XDP_RING_SIZE_ARG	"ringsz"
+
+#define ETH_AF_XDP_FRAME_SIZE		2048
+#define ETH_AF_XDP_NUM_BUFFERS		131072
+#define ETH_AF_XDP_DATA_HEADROOM	0
+#define ETH_AF_XDP_DFLT_RING_SIZE	1024
+#define ETH_AF_XDP_DFLT_QUEUE_IDX	0
+
+#define ETH_AF_XDP_RX_BATCH_SIZE	32
+#define ETH_AF_XDP_TX_BATCH_SIZE	32
+
+struct xdp_umem {
+	char *buffer;
+	size_t size;
+	unsigned int frame_size;
+	unsigned int frame_size_log2;
+	unsigned int nframes;
+	int mr_fd;
+};
+
+struct pmd_internals {
+	int sfd;
+	int if_index;
+	char if_name[IFNAMSIZ];
+	struct ether_addr eth_addr;
+	struct xdp_queue rx;
+	struct xdp_queue tx;
+	struct xdp_umem *umem;
+	struct rte_mempool *mb_pool;
+
+	unsigned long rx_pkts;
+	unsigned long rx_bytes;
+	unsigned long rx_dropped;
+
+	unsigned long tx_pkts;
+	unsigned long err_pkts;
+	unsigned long tx_bytes;
+
+	uint16_t port_id;
+	uint16_t queue_idx;
+	int ring_size;
+	struct rte_ring *buf_ring;
+};
+
+static const char * const valid_arguments[] = {
+	ETH_AF_XDP_IFACE_ARG,
+	ETH_AF_XDP_QUEUE_IDX_ARG,
+	ETH_AF_XDP_RING_SIZE_ARG,
+	NULL
+};
+
+static struct rte_eth_link pmd_link = {
+	.link_speed = ETH_SPEED_NUM_10G,
+	.link_duplex = ETH_LINK_FULL_DUPLEX,
+	.link_status = ETH_LINK_DOWN,
+	.link_autoneg = ETH_LINK_AUTONEG
+};
+
+static void *get_pkt_data(struct pmd_internals *internals,
+			  uint32_t index,
+			  uint32_t offset)
+{
+	return (void *)(internals->umem->buffer +
+			(index << internals->umem->frame_size_log2) + offset);
+}
+
+static uint16_t
+eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+	struct pmd_internals *internals = queue;
+	struct xdp_queue *rxq = &internals->rx;
+	struct rte_mbuf *mbuf;
+	unsigned long dropped = 0;
+	unsigned long rx_bytes = 0;
+	uint16_t count = 0;
+
+	nb_pkts = nb_pkts < ETH_AF_XDP_RX_BATCH_SIZE ?
+		  nb_pkts : ETH_AF_XDP_RX_BATCH_SIZE;
+
+	struct xdp_desc descs[ETH_AF_XDP_RX_BATCH_SIZE];
+	void *indexes[ETH_AF_XDP_RX_BATCH_SIZE];
+	int rcvd, i;
+
+	/* fill rx ring */
+	if (rxq->num_free >= ETH_AF_XDP_RX_BATCH_SIZE) {
+		int n = rte_ring_dequeue_bulk(internals->buf_ring,
+					      indexes,
+					      ETH_AF_XDP_RX_BATCH_SIZE,
+					      NULL);
+		for (i = 0; i < n; i++)
+			descs[i].idx = (uint32_t)((long int)indexes[i]);
+		xq_enq(rxq, descs, n);
+	}
+
+	/* read data */
+	rcvd = xq_deq(rxq, descs, nb_pkts);
+	if (rcvd == 0)
+		return 0;
+
+	for (i = 0; i < rcvd; i++) {
+		char *pkt;
+		uint32_t idx = descs[i].idx;
+
+		mbuf = rte_pktmbuf_alloc(internals->mb_pool);
+		if (mbuf) {
+			/* set lengths only after the NULL check */
+			rte_pktmbuf_pkt_len(mbuf) = descs[i].len;
+			rte_pktmbuf_data_len(mbuf) = descs[i].len;
+			pkt = get_pkt_data(internals, idx, descs[i].offset);
+			memcpy(rte_pktmbuf_mtod(mbuf, void *),
+			       pkt, descs[i].len);
+			rx_bytes += descs[i].len;
+			bufs[count++] = mbuf;
+		} else {
+			dropped++;
+		}
+		indexes[i] = (void *)((long int)idx);
+	}
+
+	rte_ring_enqueue_bulk(internals->buf_ring, indexes, rcvd, NULL);
+
+	internals->rx_pkts += (rcvd - dropped);
+	internals->rx_bytes += rx_bytes;
+	internals->rx_dropped += dropped;
+
+	return count;
+}
+
+static void kick_tx(int fd)
+{
+	int ret;
+
+	for (;;) {
+		ret = sendto(fd, NULL, 0, MSG_DONTWAIT, NULL, 0);
+		if (ret >= 0 || errno == ENOBUFS)
+			return;
+		if (errno == EAGAIN)
+			continue;
+	}
+}
+
+static uint16_t
+eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+	struct pmd_internals *internals = queue;
+	struct xdp_queue *txq = &internals->tx;
+	struct rte_mbuf *mbuf;
+	struct xdp_desc descs[ETH_AF_XDP_TX_BATCH_SIZE];
+	void *indexes[ETH_AF_XDP_TX_BATCH_SIZE];
+	uint16_t i, valid;
+	unsigned long tx_bytes = 0;
+
+	nb_pkts = nb_pkts < ETH_AF_XDP_TX_BATCH_SIZE ?
+		  nb_pkts : ETH_AF_XDP_TX_BATCH_SIZE;
+
+	if (txq->num_free < ETH_AF_XDP_TX_BATCH_SIZE * 2) {
+		int n = xq_deq(txq, descs, ETH_AF_XDP_TX_BATCH_SIZE);
+
+		for (i = 0; i < n; i++)
+			indexes[i] = (void *)((long int)descs[i].idx);
+		rte_ring_enqueue_bulk(internals->buf_ring, indexes, n, NULL);
+	}
+
+	nb_pkts = nb_pkts > txq->num_free ? txq->num_free : nb_pkts;
+	nb_pkts = rte_ring_dequeue_bulk(internals->buf_ring, indexes,
+					nb_pkts, NULL);
+
+	valid = 0;
+	for (i = 0; i < nb_pkts; i++) {
+		char *pkt;
+		unsigned int buf_len =
+			internals->umem->frame_size - ETH_AF_XDP_DATA_HEADROOM;
+		mbuf = bufs[i];
+		if (mbuf->pkt_len <= buf_len) {
+			descs[valid].idx = (uint32_t)((long int)indexes[valid]);
+			descs[valid].offset = ETH_AF_XDP_DATA_HEADROOM;
+			descs[valid].flags = 0;
+			descs[valid].len = mbuf->pkt_len;
+			pkt = get_pkt_data(internals, descs[valid].idx,
+					   descs[valid].offset);
+			memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
+			       descs[valid].len);
+			valid++;
+			tx_bytes += mbuf->pkt_len;
+		}
+		/* packet will be consumed anyway */
+		rte_pktmbuf_free(mbuf);
+	}
+
+	xq_enq(txq, descs, valid);
+	kick_tx(internals->sfd);
+
+	if (valid < nb_pkts)
+		rte_ring_enqueue_bulk(internals->buf_ring, &indexes[valid],
+				      nb_pkts - valid, NULL);
+
+	internals->err_pkts += (nb_pkts - valid);
+	internals->tx_pkts += valid;
+	internals->tx_bytes += tx_bytes;
+
+	return nb_pkts;
+}
+
+static void
+fill_rx_desc(struct pmd_internals *internals)
+{
+	int num_free = internals->rx.num_free;
+	void *p = NULL;
+	int i;
+
+	for (i = 0; i < num_free; i++) {
+		struct xdp_desc desc = {};
+
+		rte_ring_dequeue(internals->buf_ring, &p);
+		desc.idx = (uint32_t)((long int)p);
+		xq_enq(&internals->rx, &desc, 1);
+	}
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	dev->data->dev_link.link_status = ETH_LINK_UP;
+	fill_rx_desc(internals);
+
+	return 0;
+}
+
+/* This function gets called when the current port gets stopped. */
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = ETH_LINK_DOWN;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
+{
+	return 0;
+}
+
+static void
+eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	dev_info->if_index = internals->if_index;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_pktlen = (uint32_t)ETH_FRAME_LEN;
+	dev_info->max_rx_queues = 1;
+	dev_info->max_tx_queues = 1;
+	dev_info->min_rx_bufsize = 0;
+}
+
+static int
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	const struct pmd_internals *internals = dev->data->dev_private;
+
+	stats->ipackets = stats->q_ipackets[0] =
+		internals->rx_pkts;
+	stats->ibytes = stats->q_ibytes[0] =
+		internals->rx_bytes;
+	stats->imissed =
+		internals->rx_dropped;
+
+	stats->opackets = stats->q_opackets[0]
+		= internals->tx_pkts;
+	stats->oerrors = stats->q_errors[0] =
+		internals->err_pkts;
+	stats->obytes = stats->q_obytes[0] =
+		internals->tx_bytes;
+
+	return 0;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	internals->rx_pkts = 0;
+	internals->rx_bytes = 0;
+	internals->rx_dropped = 0;
+
+	internals->tx_pkts = 0;
+	internals->err_pkts = 0;
+	internals->tx_bytes = 0;
+}
+
+static void
+eth_dev_close(struct rte_eth_dev *dev __rte_unused)
+{
+}
+
+static void
+eth_queue_release(void *q __rte_unused)
+{
+}
+
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+		int wait_to_complete __rte_unused)
+{
+	return 0;
+}
+
+static struct xdp_umem *xsk_alloc_and_mem_reg_buffers(int sfd, size_t nbuffers)
+{
+	struct xdp_mr_req req = { .frame_size = ETH_AF_XDP_FRAME_SIZE,
+				  .data_headroom = ETH_AF_XDP_DATA_HEADROOM };
+	struct xdp_umem *umem;
+	void *bufs;
+	int ret;
+
+	ret = posix_memalign((void **)&bufs, getpagesize(),
+			     nbuffers * req.frame_size);
+	if (ret)
+		return NULL;
+
+	umem = calloc(1, sizeof(*umem));
+	if (!umem) {
+		free(bufs);
+		return NULL;
+	}
+
+	req.addr = (unsigned long)bufs;
+	req.len = nbuffers * req.frame_size;
+	ret = setsockopt(sfd, SOL_XDP, XDP_MEM_REG, &req, sizeof(req));
+	RTE_ASSERT(ret == 0);
+
+	umem->frame_size = ETH_AF_XDP_FRAME_SIZE;
+	umem->frame_size_log2 = 11;
+	umem->buffer = bufs;
+	umem->size = nbuffers * req.frame_size;
+	umem->nframes = nbuffers;
+	umem->mr_fd = sfd;
+
+	return umem;
+}
+
+static int
+xdp_configure(struct pmd_internals *internals)
+{
+	struct sockaddr_xdp sxdp;
+	struct xdp_ring_req req;
+	char ring_name[0x100];
+	int ret = 0;
+	long int i;
+
+	snprintf(ring_name, 0x100, "%s_%s_%d", "af_xdp_ring",
+		 internals->if_name, internals->queue_idx);
+	internals->buf_ring = rte_ring_create(ring_name,
+					      ETH_AF_XDP_NUM_BUFFERS,
+					      SOCKET_ID_ANY,
+					      0x0);
+	if (!internals->buf_ring)
+		return -1;
+
+	for (i = 0; i < ETH_AF_XDP_NUM_BUFFERS; i++)
+		rte_ring_enqueue(internals->buf_ring, (void *)i);
+
+	internals->umem = xsk_alloc_and_mem_reg_buffers(internals->sfd,
+							ETH_AF_XDP_NUM_BUFFERS);
+	if (!internals->umem)
+		goto error;
+
+	req.mr_fd = internals->umem->mr_fd;
+	req.desc_nr = internals->ring_size;
+
+	ret = setsockopt(internals->sfd, SOL_XDP, XDP_RX_RING,
+			 &req, sizeof(req));
+
+	RTE_ASSERT(ret == 0);
+
+	ret = setsockopt(internals->sfd, SOL_XDP, XDP_TX_RING,
+			 &req, sizeof(req));
+
+	RTE_ASSERT(ret == 0);
+
+	internals->rx.ring = mmap(0, req.desc_nr * sizeof(struct xdp_desc),
+				  PROT_READ | PROT_WRITE,
+				  MAP_SHARED | MAP_LOCKED | MAP_POPULATE,
+				  internals->sfd,
+				  XDP_PGOFF_RX_RING);
+	RTE_ASSERT(internals->rx.ring != MAP_FAILED);
+
+	internals->rx.num_free = req.desc_nr;
+	internals->rx.ring_mask = req.desc_nr - 1;
+
+	internals->tx.ring = mmap(0, req.desc_nr * sizeof(struct xdp_desc),
+				  PROT_READ | PROT_WRITE,
+				  MAP_SHARED | MAP_LOCKED | MAP_POPULATE,
+				  internals->sfd,
+				  XDP_PGOFF_TX_RING);
+	RTE_ASSERT(internals->tx.ring != MAP_FAILED);
+
+	internals->tx.num_free = req.desc_nr;
+	internals->tx.ring_mask = req.desc_nr - 1;
+
+	sxdp.sxdp_family = PF_XDP;
+	sxdp.sxdp_ifindex = internals->if_index;
+	sxdp.sxdp_queue_id = internals->queue_idx;
+
+	ret = bind(internals->sfd, (struct sockaddr *)&sxdp, sizeof(sxdp));
+	RTE_ASSERT(ret == 0);
+
+	return ret;
+error:
+	rte_ring_free(internals->buf_ring);
+	internals->buf_ring = NULL;
+	return -1;
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev,
+		   uint16_t rx_queue_id,
+		   uint16_t nb_rx_desc __rte_unused,
+		   unsigned int socket_id __rte_unused,
+		   const struct rte_eth_rxconf *rx_conf __rte_unused,
+		   struct rte_mempool *mb_pool)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	unsigned int buf_size, data_size;
+
+	RTE_ASSERT(rx_queue_id == 0);
+	internals->mb_pool = mb_pool;
+	xdp_configure(internals);
+
+	/* Now get the space available for data in the mbuf */
+	buf_size = rte_pktmbuf_data_room_size(internals->mb_pool) -
+		RTE_PKTMBUF_HEADROOM;
+	data_size = internals->umem->frame_size;
+
+	if (data_size > buf_size) {
+		RTE_LOG(ERR, PMD,
+			"%s: %d bytes will not fit in mbuf (%d bytes)\n",
+			dev->device->name, data_size, buf_size);
+		return -ENOMEM;
+	}
+
+	dev->data->rx_queues[rx_queue_id] = internals;
+	return 0;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev,
+		   uint16_t tx_queue_id,
+		   uint16_t nb_tx_desc __rte_unused,
+		   unsigned int socket_id __rte_unused,
+		   const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	RTE_ASSERT(tx_queue_id == 0);
+	dev->data->tx_queues[tx_queue_id] = internals;
+	return 0;
+}
+
+static int
+eth_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct ifreq ifr = { .ifr_mtu = mtu };
+	int ret;
+	int s;
+
+	s = socket(PF_INET, SOCK_DGRAM, 0);
+	if (s < 0)
+		return -EINVAL;
+
+	snprintf(ifr.ifr_name, IFNAMSIZ, "%s", internals->if_name);
+	ret = ioctl(s, SIOCSIFMTU, &ifr);
+	close(s);
+
+	if (ret < 0)
+		return -EINVAL;
+
+	return 0;
+}
+
+static void
+eth_dev_change_flags(char *if_name, uint32_t flags, uint32_t mask)
+{
+	struct ifreq ifr;
+	int s;
+
+	s = socket(PF_INET, SOCK_DGRAM, 0);
+	if (s < 0)
+		return;
+
+	snprintf(ifr.ifr_name, IFNAMSIZ, "%s", if_name);
+	if (ioctl(s, SIOCGIFFLAGS, &ifr) < 0)
+		goto out;
+	ifr.ifr_flags &= mask;
+	ifr.ifr_flags |= flags;
+	if (ioctl(s, SIOCSIFFLAGS, &ifr) < 0)
+		goto out;
+out:
+	close(s);
+}
+
+static void
+eth_dev_promiscuous_enable(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	eth_dev_change_flags(internals->if_name, IFF_PROMISC, ~0);
+}
+
+static void
+eth_dev_promiscuous_disable(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	eth_dev_change_flags(internals->if_name, 0, ~IFF_PROMISC);
+}
+
+static const struct eth_dev_ops ops = {
+	.dev_start = eth_dev_start,
+	.dev_stop = eth_dev_stop,
+	.dev_close = eth_dev_close,
+	.dev_configure = eth_dev_configure,
+	.dev_infos_get = eth_dev_info,
+	.mtu_set = eth_dev_mtu_set,
+	.promiscuous_enable = eth_dev_promiscuous_enable,
+	.promiscuous_disable = eth_dev_promiscuous_disable,
+	.rx_queue_setup = eth_rx_queue_setup,
+	.tx_queue_setup = eth_tx_queue_setup,
+	.rx_queue_release = eth_queue_release,
+	.tx_queue_release = eth_queue_release,
+	.link_update = eth_link_update,
+	.stats_get = eth_stats_get,
+	.stats_reset = eth_stats_reset,
+};
+
+static struct rte_vdev_driver pmd_af_xdp_drv;
+
+static void
+parse_parameters(struct rte_kvargs *kvlist,
+		 char **if_name,
+		 int *queue_idx,
+		 int *ring_size)
+{
+	struct rte_kvargs_pair *pair = NULL;
+	unsigned int k_idx;
+
+	for (k_idx = 0; k_idx < kvlist->count; k_idx++) {
+		pair = &kvlist->pairs[k_idx];
+		if (strstr(pair->key, ETH_AF_XDP_IFACE_ARG))
+			*if_name = pair->value;
+		else if (strstr(pair->key, ETH_AF_XDP_QUEUE_IDX_ARG))
+			*queue_idx = atoi(pair->value);
+		else if (strstr(pair->key, ETH_AF_XDP_RING_SIZE_ARG))
+			*ring_size = atoi(pair->value);
+	}
+}
+
+static int
+get_iface_info(const char *if_name,
+	       struct ether_addr *eth_addr,
+	       int *if_index)
+{
+	struct ifreq ifr;
+	int sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_IP);
+
+	if (sock < 0)
+		return -1;
+
+	snprintf(ifr.ifr_name, IFNAMSIZ, "%s", if_name);
+	if (ioctl(sock, SIOCGIFINDEX, &ifr))
+		goto error;
+	*if_index = ifr.ifr_ifindex;
+
+	if (ioctl(sock, SIOCGIFHWADDR, &ifr))
+		goto error;
+
+	memcpy(eth_addr, ifr.ifr_hwaddr.sa_data, 6);
+
+	close(sock);
+	return 0;
+
+error:
+	close(sock);
+	return -1;
+}
+
+static int
+init_internals(struct rte_vdev_device *dev,
+	       const char *if_name,
+	       int queue_idx,
+	       int ring_size)
+{
+	const char *name = rte_vdev_device_name(dev);
+	struct rte_eth_dev *eth_dev = NULL;
+	struct rte_eth_dev_data *data = NULL;
+	const unsigned int numa_node = dev->device.numa_node;
+	struct pmd_internals *internals = NULL;
+	int ret;
+
+	data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
+	if (!data)
+		return -1;
+
+	internals = rte_zmalloc_socket(name, sizeof(*internals), 0, numa_node);
+	if (!internals)
+		goto error_1;
+
+	internals->queue_idx = queue_idx;
+	internals->ring_size = ring_size;
+	snprintf(internals->if_name, IFNAMSIZ, "%s", if_name);
+	internals->sfd = socket(PF_XDP, SOCK_RAW, 0);
+	if (internals->sfd < 0)
+		goto error_2;
+
+	ret = get_iface_info(if_name, &internals->eth_addr,
+			     &internals->if_index);
+	if (ret)
+		goto error_3;
+
+	eth_dev = rte_eth_vdev_allocate(dev, 0);
+	if (!eth_dev)
+		goto error_3;
+
+	rte_memcpy(data, eth_dev->data, sizeof(*data));
+	internals->port_id = eth_dev->data->port_id;
+	data->dev_private = internals;
+	data->nb_rx_queues = 1;
+	data->nb_tx_queues = 1;
+	data->dev_link = pmd_link;
+	data->mac_addrs = &internals->eth_addr;
+
+	eth_dev->data = data;
+	eth_dev->dev_ops = &ops;
+
+	eth_dev->rx_pkt_burst = eth_af_xdp_rx;
+	eth_dev->tx_pkt_burst = eth_af_xdp_tx;
+
+	return 0;
+
+error_3:
+	close(internals->sfd);
+
+error_2:
+	rte_free(internals);
+
+error_1:
+	rte_free(data);
+	return -1;
+}
+
+static int
+rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
+{
+	struct rte_kvargs *kvlist;
+	char *if_name = NULL;
+	int ring_size = ETH_AF_XDP_DFLT_RING_SIZE;
+	int queue_idx = ETH_AF_XDP_DFLT_QUEUE_IDX;
+	int ret;
+
+	RTE_LOG(INFO, PMD, "Initializing pmd_af_xdp for %s\n",
+		rte_vdev_device_name(dev));
+
+	kvlist = rte_kvargs_parse(rte_vdev_device_args(dev), valid_arguments);
+	if (!kvlist) {
+		RTE_LOG(ERR, PMD,
+			"Invalid kvargs\n");
+		return -1;
+	}
+
+	if (dev->device.numa_node == SOCKET_ID_ANY)
+		dev->device.numa_node = rte_socket_id();
+
+	parse_parameters(kvlist, &if_name, &queue_idx, &ring_size);
+
+	ret = init_internals(dev, if_name, queue_idx, ring_size);
+	rte_kvargs_free(kvlist);
+
+	return ret;
+}
+
+static int
+rte_pmd_af_xdp_remove(struct rte_vdev_device *dev)
+{
+	struct rte_eth_dev *eth_dev = NULL;
+	struct pmd_internals *internals;
+
+	RTE_LOG(INFO, PMD, "Closing AF_XDP ethdev on numa socket %u\n",
+		rte_socket_id());
+
+	if (!dev)
+		return -1;
+
+	/* find the ethdev entry */
+	eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(dev));
+	if (!eth_dev)
+		return -1;
+
+	internals = eth_dev->data->dev_private;
+	close(internals->sfd);	/* before internals is freed */
+	rte_ring_free(internals->buf_ring);
+	free(internals->umem);	/* allocated with calloc() */
+	rte_free(internals);
+	rte_free(eth_dev->data);
+
+	rte_eth_dev_release_port(eth_dev);
+
+	return 0;
+}
+
+static struct rte_vdev_driver pmd_af_xdp_drv = {
+	.probe = rte_pmd_af_xdp_probe,
+	.remove = rte_pmd_af_xdp_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(net_af_xdp, pmd_af_xdp_drv);
+RTE_PMD_REGISTER_ALIAS(net_af_xdp, eth_af_xdp);
+RTE_PMD_REGISTER_PARAM_STRING(net_af_xdp,
+			      "iface=<string> "
+			      "queue=<int> "
+			      "ringsz=<int> ");
diff --git a/drivers/net/af_xdp/rte_pmd_af_xdp_version.map b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
new file mode 100644
index 000000000..ef3539840
--- /dev/null
+++ b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
@@ -0,0 +1,4 @@
+DPDK_2.0 {
+
+	local: *;
+};
diff --git a/drivers/net/af_xdp/xdpsock_queue.h b/drivers/net/af_xdp/xdpsock_queue.h
new file mode 100644
index 000000000..c5d0cb56a
--- /dev/null
+++ b/drivers/net/af_xdp/xdpsock_queue.h
@@ -0,0 +1,66 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation.
+ */
+
+#ifndef __XDPSOCK_QUEUE_H
+#define __XDPSOCK_QUEUE_H
+
+static inline int xq_enq(struct xdp_queue *q,
+			 const struct xdp_desc *descs,
+			 unsigned int ndescs)
+{
+	unsigned int avail_idx = q->avail_idx;
+	unsigned int i;
+	int j;
+
+	if (q->num_free < ndescs)
+		return -ENOSPC;
+
+	q->num_free -= ndescs;
+
+	for (i = 0; i < ndescs; i++) {
+		unsigned int idx = avail_idx++ & q->ring_mask;
+
+		q->ring[idx].idx	= descs[i].idx;
+		q->ring[idx].len	= descs[i].len;
+		q->ring[idx].offset	= descs[i].offset;
+		q->ring[idx].error	= 0;
+	}
+	rte_smp_wmb();
+
+	for (j = ndescs - 1; j >= 0; j--) {
+		unsigned int idx = (q->avail_idx + j) & q->ring_mask;
+
+		q->ring[idx].flags = descs[j].flags | XDP_DESC_KERNEL;
+	}
+	q->avail_idx += ndescs;
+
+	return 0;
+}
+
+static inline int xq_deq(struct xdp_queue *q,
+			 struct xdp_desc *descs,
+			 int ndescs)
+{
+	unsigned int idx, last_used_idx = q->last_used_idx;
+	int i, entries = 0;
+
+	for (i = 0; i < ndescs; i++) {
+		idx = (last_used_idx++) & q->ring_mask;
+		if (q->ring[idx].flags & XDP_DESC_KERNEL)
+			break;
+		entries++;
+	}
+	q->num_free += entries;
+
+	rte_smp_rmb();
+
+	for (i = 0; i < entries; i++) {
+		idx = q->last_used_idx++ & q->ring_mask;
+		descs[i] = q->ring[idx];
+	}
+
+	return entries;
+}
+
+#endif /* __XDPSOCK_QUEUE_H */
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 3eb41d176..bc26e1457 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -120,6 +120,7 @@ ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),n)
 _LDLIBS-$(CONFIG_RTE_DRIVER_MEMPOOL_STACK)  += -lrte_mempool_stack
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     += -lrte_pmd_af_xdp
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ARK_PMD)        += -lrte_pmd_ark
 _LDLIBS-$(CONFIG_RTE_LIBRTE_AVF_PMD)        += -lrte_pmd_avf
 _LDLIBS-$(CONFIG_RTE_LIBRTE_AVP_PMD)        += -lrte_pmd_avp
-- 
2.13.6


* [RFC v2 2/7] lib/mbuf: enable parse flags when create mempool
  2018-03-08 13:52 [RFC v2 0/7] PMD driver for AF_XDP Qi Zhang
  2018-03-08 13:52 ` [RFC v2 1/7] net/af_xdp: new PMD driver Qi Zhang
@ 2018-03-08 13:52 ` Qi Zhang
  2018-03-08 13:52 ` [RFC v2 3/7] lib/mempool: allow page size aligned mempool Qi Zhang
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Qi Zhang @ 2018-03-08 13:52 UTC (permalink / raw)
  To: dev; +Cc: magnus.karlsson, bjorn.topel, Qi Zhang

This gives the application the option to configure each
memory chunk's size precisely (via MEMPOOL_F_NO_SPREAD).

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
---
 lib/librte_mbuf/rte_mbuf.c | 15 ++++++++++++---
 lib/librte_mbuf/rte_mbuf.h |  8 +++++++-
 2 files changed, 19 insertions(+), 4 deletions(-)
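
Note on why exact element sizing may matter here: the sketch below is a
rough approximation of the per-object size arithmetic, not DPDK's actual
implementation. The constants are assumptions (a 64-byte cache line, a
128-byte struct rte_mbuf, and a 24-byte per-object header that is rounded
up to a cache line), and obj_size_no_spread() is a hypothetical helper.
Under those assumptions, the --mbuf-size=1856 value from the cover letter
gives an object footprint of exactly 2048 bytes, which matches
ETH_AF_XDP_FRAME_SIZE, provided no extra spreading padding is added.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, not taken from the patch: typical for x86-64 builds
 * without mempool debug enabled. */
#define CACHE_LINE      64u    /* assumed mempool alignment */
#define MBUF_STRUCT_SZ  128u   /* assumed sizeof(struct rte_mbuf) */
#define OBJ_HDR_RAW_SZ  24u    /* assumed raw size of the per-object header */

/* Round v up to a multiple of align (align must be a power of two). */
static uint32_t align_ceil(uint32_t v, uint32_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (v + align - 1) & ~(align - 1);
}

/* Footprint of one mbuf object when no spreading padding is added. */
static uint32_t obj_size_no_spread(uint32_t priv_size, uint32_t data_room)
{
	uint32_t hdr = align_ceil(OBJ_HDR_RAW_SZ, CACHE_LINE);
	uint32_t elt = align_ceil(MBUF_STRUCT_SZ + priv_size + data_room, 8);

	return align_ceil(hdr + elt, CACHE_LINE);
}
```

With these assumed constants, obj_size_no_spread(0, 1856) is 64 + 1984 =
2048. Without the no-spread flag the mempool library may pad objects
further, so the footprint would no longer be guaranteed to equal the frame
size.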

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 091d388d3..5fd91c87c 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -125,7 +125,7 @@ rte_pktmbuf_init(struct rte_mempool *mp,
 struct rte_mempool * __rte_experimental
 rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
 	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
-	int socket_id, const char *ops_name)
+	unsigned int flags, int socket_id, const char *ops_name)
 {
 	struct rte_mempool *mp;
 	struct rte_pktmbuf_pool_private mbp_priv;
@@ -145,7 +145,7 @@ rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
 	mbp_priv.mbuf_priv_size = priv_size;
 
 	mp = rte_mempool_create_empty(name, n, elt_size, cache_size,
-		 sizeof(struct rte_pktmbuf_pool_private), socket_id, 0);
+		 sizeof(struct rte_pktmbuf_pool_private), socket_id, flags);
 	if (mp == NULL)
 		return NULL;
 
@@ -179,9 +179,18 @@ rte_pktmbuf_pool_create(const char *name, unsigned int n,
 	int socket_id)
 {
 	return rte_pktmbuf_pool_create_by_ops(name, n, cache_size, priv_size,
-			data_room_size, socket_id, NULL);
+			data_room_size, 0, socket_id, NULL);
 }
 
+/* helper to create a mbuf pool with caller-supplied mempool flags */
+struct rte_mempool *
+rte_pktmbuf_pool_create_with_flags(const char *name, unsigned int n,
+	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
+	unsigned int flags, int socket_id)
+{
+	return rte_pktmbuf_pool_create_by_ops(name, n, cache_size, priv_size,
+			data_room_size, flags, socket_id, NULL);
+}
 /* do some sanity checks on a mbuf: panic if it fails */
 void
 rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header)
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 62740254d..6f6af42a8 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1079,6 +1079,12 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
 	unsigned cache_size, uint16_t priv_size, uint16_t data_room_size,
 	int socket_id);
 
+struct rte_mempool *
+rte_pktmbuf_pool_create_with_flags(const char *name, unsigned int n,
+	unsigned cache_size, uint16_t priv_size, uint16_t data_room_size,
+	unsigned flags, int socket_id);
+
+
 /**
  * Create a mbuf pool with a given mempool ops name
  *
@@ -1119,7 +1125,7 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
 struct rte_mempool * __rte_experimental
 rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
 	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
-	int socket_id, const char *ops_name);
+	unsigned int flags, int socket_id, const char *ops_name);
 
 /**
  * Get the data room size of mbufs stored in a pktmbuf_pool
-- 
2.13.6


* [RFC v2 3/7] lib/mempool: allow page size aligned mempool
  2018-03-08 13:52 [RFC v2 0/7] PMD driver for AF_XDP Qi Zhang
  2018-03-08 13:52 ` [RFC v2 1/7] net/af_xdp: new PMD driver Qi Zhang
  2018-03-08 13:52 ` [RFC v2 2/7] lib/mbuf: enable parse flags when create mempool Qi Zhang
@ 2018-03-08 13:52 ` Qi Zhang
  2018-03-08 13:52 ` [RFC v2 4/7] net/af_xdp: use mbuf mempool for buffer management Qi Zhang
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Qi Zhang @ 2018-03-08 13:52 UTC (permalink / raw)
  To: dev; +Cc: magnus.karlsson, bjorn.topel, Qi Zhang

Allow creating a mempool with a page-size-aligned base address.
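
For reference, the property this flag is meant to guarantee can be checked
with a few lines of C. This is only a sketch: is_page_aligned() is a
hypothetical helper, and it uses sysconf(_SC_PAGESIZE) where the patch
itself calls getpagesize().

```c
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: non-zero if addr sits on a page boundary. */
static inline int
is_page_aligned(const void *addr)
{
	uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);

	/* Page sizes are powers of two, so a mask test is sufficient. */
	return ((uintptr_t)addr & (page - 1)) == 0;
}
```

Patch 5/7 performs an equivalent check on the base address of an external
mempool before registering it with the socket.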

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
---
 lib/librte_mempool/rte_mempool.c | 2 ++
 lib/librte_mempool/rte_mempool.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 54f7f4ba4..f8d4814ad 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -567,6 +567,8 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 		pg_shift = 0; /* not needed, zone is physically contiguous */
 		pg_sz = 0;
 		align = RTE_CACHE_LINE_SIZE;
+		if (mp->flags & MEMPOOL_F_PAGE_ALIGN)
+			align = getpagesize();
 	} else {
 		pg_sz = getpagesize();
 		pg_shift = rte_bsf32(pg_sz);
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 8b1b7f7ed..774ab0f66 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -245,6 +245,7 @@ struct rte_mempool {
 #define MEMPOOL_F_SC_GET         0x0008 /**< Default get is "single-consumer".*/
 #define MEMPOOL_F_POOL_CREATED   0x0010 /**< Internal: pool is created. */
 #define MEMPOOL_F_NO_PHYS_CONTIG 0x0020 /**< Don't need physically contiguous objs. */
+#define MEMPOOL_F_PAGE_ALIGN     0x0040 /**< Base address is page aligned. */
 /**
  * This capability flag is advertised by a mempool handler, if the whole
  * memory area containing the objects must be physically contiguous.
-- 
2.13.6


* [RFC v2 4/7] net/af_xdp: use mbuf mempool for buffer management
  2018-03-08 13:52 [RFC v2 0/7] PMD driver for AF_XDP Qi Zhang
                   ` (2 preceding siblings ...)
  2018-03-08 13:52 ` [RFC v2 3/7] lib/mempool: allow page size aligned mempool Qi Zhang
@ 2018-03-08 13:52 ` Qi Zhang
  2018-03-08 13:52 ` [RFC v2 5/7] net/af_xdp: enable share mempool Qi Zhang
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Qi Zhang @ 2018-03-08 13:52 UTC (permalink / raw)
  To: dev; +Cc: magnus.karlsson, bjorn.topel, Qi Zhang

The memory buffer registered with af_xdp is now managed by rte_mempool.
An mbuf allocated from that rte_mempool can be converted to a descriptor
index and vice versa.
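
The conversion is plain address arithmetic over fixed-size frames. The
sketch below mirrors mbuf_to_idx()/idx_to_mbuf() from the diff without
depending on DPDK types; addr_to_idx() and idx_to_obj() are hypothetical
names, and the 2048-byte frame and 0x40-byte object header are the values
used in this patch.

```c
#include <stdint.h>

#define FRAME_SIZE_LOG2 11u   /* 2048-byte frames, as in the driver */
#define OBJ_HDR_SIZE    0x40u /* mempool object header before the mbuf */

/* Frame index of any address inside the registered region. */
static inline uint32_t
addr_to_idx(const char *base, const void *addr)
{
	return (uint32_t)(((uintptr_t)addr - (uintptr_t)base) >>
			  FRAME_SIZE_LOG2);
}

/* Address of the mbuf structure that lives in frame idx. */
static inline void *
idx_to_obj(char *base, uint32_t idx)
{
	return base + ((uintptr_t)idx << FRAME_SIZE_LOG2) + OBJ_HDR_SIZE;
}
```

The round trip only works if every object starts exactly on a frame
boundary, which is why the pool is created with MEMPOOL_F_NO_SPREAD and
MEMPOOL_F_PAGE_ALIGN and must consist of a single memory chunk.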

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
---
 drivers/net/af_xdp/rte_eth_af_xdp.c | 166 +++++++++++++++++++++---------------
 1 file changed, 98 insertions(+), 68 deletions(-)

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 5c7c53aeb..65c4c37bf 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -39,7 +39,11 @@
 
 #define ETH_AF_XDP_FRAME_SIZE		2048
 #define ETH_AF_XDP_NUM_BUFFERS		131072
-#define ETH_AF_XDP_DATA_HEADROOM	0
+/* mempool hdrobj size (64 bytes) + sizeof(struct rte_mbuf) (128 bytes) */
+#define ETH_AF_XDP_MBUF_OVERHEAD	192
+/* data starts at offset 320 (192 + 128) bytes */
+#define ETH_AF_XDP_DATA_HEADROOM \
+	(ETH_AF_XDP_MBUF_OVERHEAD + RTE_PKTMBUF_HEADROOM)
 #define ETH_AF_XDP_DFLT_RING_SIZE	1024
 #define ETH_AF_XDP_DFLT_QUEUE_IDX	0
 
@@ -53,6 +57,7 @@ struct xdp_umem {
 	unsigned int frame_size_log2;
 	unsigned int nframes;
 	int mr_fd;
+	struct rte_mempool *mb_pool;
 };
 
 struct pmd_internals {
@@ -63,7 +68,7 @@ struct pmd_internals {
 	struct xdp_queue rx;
 	struct xdp_queue tx;
 	struct xdp_umem *umem;
-	struct rte_mempool *mb_pool;
+	struct rte_mempool *ext_mb_pool;
 
 	unsigned long rx_pkts;
 	unsigned long rx_bytes;
@@ -76,7 +81,6 @@ struct pmd_internals {
 	uint16_t port_id;
 	uint16_t queue_idx;
 	int ring_size;
-	struct rte_ring *buf_ring;
 };
 
 static const char * const valid_arguments[] = {
@@ -101,6 +105,22 @@ static void *get_pkt_data(struct pmd_internals *internals,
 			(index << internals->umem->frame_size_log2) + offset);
 }
 
+static uint32_t
+mbuf_to_idx(struct pmd_internals *internals, struct rte_mbuf *mbuf)
+{
+	return (uint32_t)(((uint64_t)mbuf->buf_addr -
+			  (uint64_t)internals->umem->buffer) >>
+			  internals->umem->frame_size_log2);
+}
+
+static struct rte_mbuf *
+idx_to_mbuf(struct pmd_internals *internals, uint32_t idx)
+{
+	return (struct rte_mbuf *)(internals->umem->buffer +
+				   (idx << internals->umem->frame_size_log2) +
+				    0x40);
+}
+
 static uint16_t
 eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 {
@@ -115,18 +135,19 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		  nb_pkts : ETH_AF_XDP_RX_BATCH_SIZE;
 
 	struct xdp_desc descs[ETH_AF_XDP_RX_BATCH_SIZE];
-	void *indexes[ETH_AF_XDP_RX_BATCH_SIZE];
+	struct rte_mbuf *mbufs[ETH_AF_XDP_RX_BATCH_SIZE];
 	int rcvd, i;
 
 	/* fill rx ring */
 	if (rxq->num_free >= ETH_AF_XDP_RX_BATCH_SIZE) {
-		int n = rte_ring_dequeue_bulk(internals->buf_ring,
-					      indexes,
-					      ETH_AF_XDP_RX_BATCH_SIZE,
-					      NULL);
-		for (i = 0; i < n; i++)
-			descs[i].idx = (uint32_t)((long int)indexes[i]);
-		xq_enq(rxq, descs, n);
+		int ret = rte_mempool_get_bulk(internals->umem->mb_pool,
+					       (void *)mbufs,
+					       ETH_AF_XDP_RX_BATCH_SIZE);
+		if (!ret) {
+			for (i = 0; i < ETH_AF_XDP_RX_BATCH_SIZE; i++)
+				descs[i].idx = mbuf_to_idx(internals, mbufs[i]);
+			xq_enq(rxq, descs, ETH_AF_XDP_RX_BATCH_SIZE);
+		}
 	}
 
 	/* read data */
@@ -138,7 +159,7 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		char *pkt;
 		uint32_t idx = descs[i].idx;
 
-		mbuf = rte_pktmbuf_alloc(internals->mb_pool);
+		mbuf = rte_pktmbuf_alloc(internals->ext_mb_pool);
 		rte_pktmbuf_pkt_len(mbuf) =
 			rte_pktmbuf_data_len(mbuf) =
 			descs[i].len;
@@ -151,11 +172,9 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		} else {
 			dropped++;
 		}
-		indexes[i] = (void *)((long int)idx);
+		rte_pktmbuf_free(idx_to_mbuf(internals, idx));
 	}
 
-	rte_ring_enqueue_bulk(internals->buf_ring, indexes, rcvd, NULL);
-
 	internals->rx_pkts += (rcvd - dropped);
 	internals->rx_bytes += rx_bytes;
 	internals->rx_dropped += dropped;
@@ -183,9 +202,10 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	struct xdp_queue *txq = &internals->tx;
 	struct rte_mbuf *mbuf;
 	struct xdp_desc descs[ETH_AF_XDP_TX_BATCH_SIZE];
-	void *indexes[ETH_AF_XDP_TX_BATCH_SIZE];
+	struct rte_mbuf *mbufs[ETH_AF_XDP_TX_BATCH_SIZE];
 	uint16_t i, valid;
 	unsigned long tx_bytes = 0;
+	int ret;
 
 	nb_pkts = nb_pkts < ETH_AF_XDP_TX_BATCH_SIZE ?
 		  nb_pkts : ETH_AF_XDP_TX_BATCH_SIZE;
@@ -194,13 +214,15 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		int n = xq_deq(txq, descs, ETH_AF_XDP_TX_BATCH_SIZE);
 
 		for (i = 0; i < n; i++)
-			indexes[i] = (void *)((long int)descs[i].idx);
-		rte_ring_enqueue_bulk(internals->buf_ring, indexes, n, NULL);
+			rte_pktmbuf_free(idx_to_mbuf(internals, descs[i].idx));
 	}
 
 	nb_pkts = nb_pkts > txq->num_free ? txq->num_free : nb_pkts;
-	nb_pkts = rte_ring_dequeue_bulk(internals->buf_ring, indexes,
-					nb_pkts, NULL);
+	ret = rte_mempool_get_bulk(internals->umem->mb_pool,
+				   (void *)mbufs,
+				   nb_pkts);
+	if (ret)
+		return 0;
 
 	valid = 0;
 	for (i = 0; i < nb_pkts; i++) {
@@ -209,14 +231,14 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 			internals->umem->frame_size - ETH_AF_XDP_DATA_HEADROOM;
 		mbuf = bufs[i];
 		if (mbuf->pkt_len <= buf_len) {
-			descs[valid].idx = (uint32_t)((long int)indexes[valid]);
+			descs[valid].idx = mbuf_to_idx(internals, mbufs[i]);
 			descs[valid].offset = ETH_AF_XDP_DATA_HEADROOM;
 			descs[valid].flags = 0;
 			descs[valid].len = mbuf->pkt_len;
 			pkt = get_pkt_data(internals, descs[i].idx,
 					   descs[i].offset);
 			memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
-			       descs[i].len);
+					   descs[i].len);
 			valid++;
 			tx_bytes += mbuf->pkt_len;
 		}
@@ -227,9 +249,10 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	xq_enq(txq, descs, valid);
 	kick_tx(internals->sfd);
 
-	if (valid < nb_pkts)
-		rte_ring_enqueue_bulk(internals->buf_ring, &indexes[valid],
-				      nb_pkts - valid, NULL);
+	if (valid < nb_pkts) {
+		for (i = valid; i < nb_pkts; i++)
+			rte_pktmbuf_free(mbufs[i]);
+	}
 
 	internals->err_pkts += (nb_pkts - valid);
 	internals->tx_pkts += valid;
@@ -242,14 +265,13 @@ static void
 fill_rx_desc(struct pmd_internals *internals)
 {
 	int num_free = internals->rx.num_free;
-	void *p = NULL;
 	int i;
-
 	for (i = 0; i < num_free; i++) {
 		struct xdp_desc desc = {};
+		struct rte_mbuf *mbuf =
+			rte_pktmbuf_alloc(internals->umem->mb_pool);
 
-		rte_ring_dequeue(internals->buf_ring, &p);
-		desc.idx = (uint32_t)((long int)p);
+		desc.idx = mbuf_to_idx(internals, mbuf);
 		xq_enq(&internals->rx, &desc, 1);
 	}
 }
@@ -344,33 +366,53 @@ eth_link_update(struct rte_eth_dev *dev __rte_unused,
 	return 0;
 }
 
-static struct xdp_umem *xsk_alloc_and_mem_reg_buffers(int sfd, size_t nbuffers)
+static void *get_base_addr(struct rte_mempool *mb_pool)
+{
+	struct rte_mempool_memhdr *memhdr;
+
+	STAILQ_FOREACH(memhdr, &mb_pool->mem_list, next) {
+		return memhdr->addr;
+	}
+	return NULL;
+}
+
+static struct xdp_umem *xsk_alloc_and_mem_reg_buffers(int sfd,
+						      size_t nbuffers,
+						      const char *pool_name)
 {
 	struct xdp_mr_req req = { .frame_size = ETH_AF_XDP_FRAME_SIZE,
 				  .data_headroom = ETH_AF_XDP_DATA_HEADROOM };
-	struct xdp_umem *umem;
-	void *bufs;
-	int ret;
+	struct xdp_umem *umem = calloc(1, sizeof(*umem));
 
-	ret = posix_memalign((void **)&bufs, getpagesize(),
-			     nbuffers * req.frame_size);
-	if (ret)
+	if (!umem)
+		return NULL;
+
+	umem->mb_pool =
+		rte_pktmbuf_pool_create_with_flags(
+			pool_name, nbuffers,
+			250, 0,
+			(ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_MBUF_OVERHEAD),
+			MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN,
+			SOCKET_ID_ANY);
+
+	if (!umem->mb_pool) {
+		free(umem);
 		return NULL;
+	}
 
-	umem = calloc(1, sizeof(*umem));
-	if (!umem) {
-		free(bufs);
+	if (umem->mb_pool->nb_mem_chunks > 1) {
+		rte_mempool_free(umem->mb_pool);
+		free(umem);
 		return NULL;
 	}
 
-	req.addr = (unsigned long)bufs;
+	req.addr = (uint64_t)get_base_addr(umem->mb_pool);
 	req.len = nbuffers * req.frame_size;
-	ret = setsockopt(sfd, SOL_XDP, XDP_MEM_REG, &req, sizeof(req));
-	RTE_ASSERT(ret == 0);
+	setsockopt(sfd, SOL_XDP, XDP_MEM_REG, &req, sizeof(req));
 
 	umem->frame_size = ETH_AF_XDP_FRAME_SIZE;
 	umem->frame_size_log2 = 11;
-	umem->buffer = bufs;
+	umem->buffer = (char *)req.addr;
 	umem->size = nbuffers * req.frame_size;
 	umem->nframes = nbuffers;
 	umem->mr_fd = sfd;
@@ -383,38 +425,27 @@ xdp_configure(struct pmd_internals *internals)
 {
 	struct sockaddr_xdp sxdp;
 	struct xdp_ring_req req;
-	char ring_name[0x100];
+	char pool_name[0x100];
+
 	int ret = 0;
-	long int i;
 
-	snprintf(ring_name, 0x100, "%s_%s_%d", "af_xdp_ring",
+	snprintf(pool_name, 0x100, "%s_%s_%d", "af_xdp_pool",
 		 internals->if_name, internals->queue_idx);
-	internals->buf_ring = rte_ring_create(ring_name,
-					      ETH_AF_XDP_NUM_BUFFERS,
-					      SOCKET_ID_ANY,
-					      0x0);
-	if (!internals->buf_ring)
-		return -1;
-
-	for (i = 0; i < ETH_AF_XDP_NUM_BUFFERS; i++)
-		rte_ring_enqueue(internals->buf_ring, (void *)i);
-
 	internals->umem = xsk_alloc_and_mem_reg_buffers(internals->sfd,
-							ETH_AF_XDP_NUM_BUFFERS);
+							ETH_AF_XDP_NUM_BUFFERS,
+							pool_name);
 	if (!internals->umem)
-		goto error;
+		return -1;
 
 	req.mr_fd = internals->umem->mr_fd;
 	req.desc_nr = internals->ring_size;
 
 	ret = setsockopt(internals->sfd, SOL_XDP, XDP_RX_RING,
 			 &req, sizeof(req));
-
 	RTE_ASSERT(ret == 0);
 
 	ret = setsockopt(internals->sfd, SOL_XDP, XDP_TX_RING,
 			 &req, sizeof(req));
-
 	RTE_ASSERT(ret == 0);
 
 	internals->rx.ring = mmap(0, req.desc_nr * sizeof(struct xdp_desc),
@@ -445,10 +476,6 @@ xdp_configure(struct pmd_internals *internals)
 	RTE_ASSERT(ret == 0);
 
 	return ret;
-error:
-	rte_ring_free(internals->buf_ring);
-	internals->buf_ring = NULL;
-	return -1;
 }
 
 static int
@@ -463,11 +490,11 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
 	unsigned int buf_size, data_size;
 
 	RTE_ASSERT(rx_queue_id == 0);
-	internals->mb_pool = mb_pool;
+	internals->ext_mb_pool = mb_pool;
 	xdp_configure(internals);
 
 	/* Now get the space available for data in the mbuf */
-	buf_size = rte_pktmbuf_data_room_size(internals->mb_pool) -
+	buf_size = rte_pktmbuf_data_room_size(internals->ext_mb_pool) -
 		RTE_PKTMBUF_HEADROOM;
 	data_size = internals->umem->frame_size;
 
@@ -736,8 +763,11 @@ rte_pmd_af_xdp_remove(struct rte_vdev_device *dev)
 		return -1;
 
 	internals = eth_dev->data->dev_private;
-	rte_ring_free(internals->buf_ring);
-	rte_free(internals->umem);
+	if (internals->umem) {
+		if (internals->umem->mb_pool)
+			rte_mempool_free(internals->umem->mb_pool);
+		rte_free(internals->umem);
+	}
 	rte_free(eth_dev->data->dev_private);
 	rte_free(eth_dev->data);
 	close(internals->sfd);
-- 
2.13.6


* [RFC v2 5/7] net/af_xdp: enable share mempool
  2018-03-08 13:52 [RFC v2 0/7] PMD driver for AF_XDP Qi Zhang
                   ` (3 preceding siblings ...)
  2018-03-08 13:52 ` [RFC v2 4/7] net/af_xdp: use mbuf mempool for buffer management Qi Zhang
@ 2018-03-08 13:52 ` Qi Zhang
  2018-03-08 13:52 ` [RFC v2 6/7] net/af_xdp: load BPF file Qi Zhang
  2018-03-08 13:52 ` [RFC v2 7/7] app/testpmd: enable parameter for mempool flags Qi Zhang
  6 siblings, 0 replies; 11+ messages in thread
From: Qi Zhang @ 2018-03-08 13:52 UTC (permalink / raw)
  To: dev; +Cc: magnus.karlsson, bjorn.topel, Qi Zhang

Check whether the external mempool (from rx_queue_setup) is suitable
for af_xdp. If it is, it is registered with the af_xdp socket directly
and no packet data is copied on Rx or Tx.
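
The suitability test boils down to four layout conditions. The sketch
below restates check_mempool() from the diff against a hypothetical
struct pool_layout, so it can be read and compiled without DPDK; the
field names follow struct rte_mempool, but the struct itself and
pool_fits_af_xdp() are illustrative only.

```c
#include <stdint.h>

#define FRAME_SIZE 2048u /* ETH_AF_XDP_FRAME_SIZE */
#define CACHE_LINE 64u   /* assumed RTE_CACHE_LINE_SIZE */

/* Hypothetical stand-in for the mempool fields the check reads. */
struct pool_layout {
	uint32_t nb_mem_chunks;
	uint32_t header_size;
	uint32_t elt_size;
	uint32_t trailer_size;
	uintptr_t base_addr;
};

/* Returns 1 if the pool can be registered directly, 0 otherwise. */
static inline int
pool_fits_af_xdp(const struct pool_layout *p, uintptr_t page_size)
{
	if (p->nb_mem_chunks > 1)            /* memory must be contiguous */
		return 0;
	if (p->header_size != CACHE_LINE)    /* idx_to_mbuf() assumes 0x40 */
		return 0;
	if (p->base_addr % page_size != 0)   /* base must be page aligned */
		return 0;
	if ((p->elt_size + p->header_size + p->trailer_size) %
	    FRAME_SIZE != 0)                 /* objects must tile frames */
		return 0;
	return 1;
}
```

A pool that fails any of these conditions still works, but the driver
falls back to its own internal pool and copies packet data.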

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
---
 drivers/net/af_xdp/rte_eth_af_xdp.c | 193 +++++++++++++++++++++++-------------
 1 file changed, 126 insertions(+), 67 deletions(-)

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 65c4c37bf..7e839f0da 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -56,7 +56,6 @@ struct xdp_umem {
 	unsigned int frame_size;
 	unsigned int frame_size_log2;
 	unsigned int nframes;
-	int mr_fd;
 	struct rte_mempool *mb_pool;
 };
 
@@ -69,6 +68,7 @@ struct pmd_internals {
 	struct xdp_queue tx;
 	struct xdp_umem *umem;
 	struct rte_mempool *ext_mb_pool;
+	uint8_t share_mb_pool;
 
 	unsigned long rx_pkts;
 	unsigned long rx_bytes;
@@ -159,20 +159,30 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		char *pkt;
 		uint32_t idx = descs[i].idx;
 
-		mbuf = rte_pktmbuf_alloc(internals->ext_mb_pool);
-		rte_pktmbuf_pkt_len(mbuf) =
-			rte_pktmbuf_data_len(mbuf) =
-			descs[i].len;
-		if (mbuf) {
-			pkt = get_pkt_data(internals, idx, descs[i].offset);
-			memcpy(rte_pktmbuf_mtod(mbuf, void *),
-			       pkt, descs[i].len);
-			rx_bytes += descs[i].len;
-			bufs[count++] = mbuf;
+		if (!internals->share_mb_pool) {
+			mbuf = rte_pktmbuf_alloc(internals->ext_mb_pool);
+			if (mbuf) {
+				rte_pktmbuf_pkt_len(mbuf) =
+					rte_pktmbuf_data_len(mbuf) =
+					descs[i].len;
+				pkt = get_pkt_data(internals, idx,
+						   descs[i].offset);
+				memcpy(rte_pktmbuf_mtod(mbuf, void *), pkt,
+				       descs[i].len);
+				rx_bytes += descs[i].len;
+				bufs[count++] = mbuf;
+			} else {
+				dropped++;
+			}
+			rte_pktmbuf_free(idx_to_mbuf(internals, idx));
 		} else {
-			dropped++;
+			mbuf = idx_to_mbuf(internals, idx);
+			rte_pktmbuf_pkt_len(mbuf) =
+				rte_pktmbuf_data_len(mbuf) =
+				descs[i].len;
+			bufs[count++] = mbuf;
+			rx_bytes += descs[i].len;
 		}
-		rte_pktmbuf_free(idx_to_mbuf(internals, idx));
 	}
 
 	internals->rx_pkts += (rcvd - dropped);
@@ -206,52 +216,72 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	uint16_t i, valid;
 	unsigned long tx_bytes = 0;
 	int ret;
+	uint8_t share_mempool = 0;
 
 	nb_pkts = nb_pkts < ETH_AF_XDP_TX_BATCH_SIZE ?
 		  nb_pkts : ETH_AF_XDP_TX_BATCH_SIZE;
 
 	if (txq->num_free < ETH_AF_XDP_TX_BATCH_SIZE * 2) {
 		int n = xq_deq(txq, descs, ETH_AF_XDP_TX_BATCH_SIZE);
-
 		for (i = 0; i < n; i++)
 			rte_pktmbuf_free(idx_to_mbuf(internals, descs[i].idx));
 	}
 
 	nb_pkts = nb_pkts > txq->num_free ? txq->num_free : nb_pkts;
-	ret = rte_mempool_get_bulk(internals->umem->mb_pool,
-				   (void *)mbufs,
-				   nb_pkts);
-	if (ret)
+	if (nb_pkts == 0)
 		return 0;
 
+	if (bufs[0]->pool == internals->ext_mb_pool && internals->share_mb_pool)
+		share_mempool = 1;
+
+	if (!share_mempool) {
+		ret = rte_mempool_get_bulk(internals->umem->mb_pool,
+					   (void *)mbufs,
+					   nb_pkts);
+		if (ret)
+			return 0;
+	}
+
 	valid = 0;
 	for (i = 0; i < nb_pkts; i++) {
 		char *pkt;
-		unsigned int buf_len =
-			internals->umem->frame_size - ETH_AF_XDP_DATA_HEADROOM;
 		mbuf = bufs[i];
-		if (mbuf->pkt_len <= buf_len) {
-			descs[valid].idx = mbuf_to_idx(internals, mbufs[i]);
-			descs[valid].offset = ETH_AF_XDP_DATA_HEADROOM;
-			descs[valid].flags = 0;
-			descs[valid].len = mbuf->pkt_len;
-			pkt = get_pkt_data(internals, descs[i].idx,
-					   descs[i].offset);
-			memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
-					   descs[i].len);
-			valid++;
+		if (!share_mempool) {
+			if (mbuf->pkt_len <=
+				(internals->umem->frame_size -
+				 ETH_AF_XDP_DATA_HEADROOM)) {
+				descs[valid].idx =
+					mbuf_to_idx(internals, mbufs[i]);
+				descs[valid].offset = ETH_AF_XDP_DATA_HEADROOM;
+				descs[valid].flags = 0;
+				descs[valid].len = mbuf->pkt_len;
+				pkt = get_pkt_data(internals, descs[valid].idx,
+						   descs[valid].offset);
+				memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
+				       descs[valid].len);
+				valid++;
+				tx_bytes += mbuf->pkt_len;
+			}
+			/* packet will be consumed anyway */
+			rte_pktmbuf_free(mbuf);
+		} else {
+			descs[i].idx = mbuf_to_idx(internals, mbuf);
+			descs[i].offset = ETH_AF_XDP_DATA_HEADROOM;
+			descs[i].flags = 0;
+			descs[i].len = mbuf->pkt_len;
 			tx_bytes += mbuf->pkt_len;
+			valid++;
 		}
-		/* packet will be consumed anyway */
-		rte_pktmbuf_free(mbuf);
 	}
 
 	xq_enq(txq, descs, valid);
 	kick_tx(internals->sfd);
 
-	if (valid < nb_pkts) {
-		for (i = valid; i < nb_pkts; i++)
-			rte_pktmbuf_free(mbufs[i]);
+	if (!share_mempool) {
+		if (valid < nb_pkts) {
+			for (i = valid; i < nb_pkts; i++)
+				rte_pktmbuf_free(mbufs[i]);
+		}
 	}
 
 	internals->err_pkts += (nb_pkts - valid);
@@ -376,46 +406,81 @@ static void *get_base_addr(struct rte_mempool *mb_pool)
 	return NULL;
 }
 
-static struct xdp_umem *xsk_alloc_and_mem_reg_buffers(int sfd,
-						      size_t nbuffers,
-						      const char *pool_name)
+static uint8_t
+check_mempool(struct rte_mempool *mp)
+{
+	RTE_ASSERT(mp);
+
+	/* memory must be contiguous */
+	if (mp->nb_mem_chunks > 1)
+		return 0;
+
+	/* check header size */
+	if (mp->header_size != RTE_CACHE_LINE_SIZE)
+		return 0;
+
+	/* check base address */
+	if ((uint64_t)get_base_addr(mp) % getpagesize() != 0)
+		return 0;
+
+	/* check chunk size */
+	if ((mp->elt_size + mp->header_size + mp->trailer_size) %
+			ETH_AF_XDP_FRAME_SIZE != 0)
+		return 0;
+
+	return 1;
+}
+
+static struct xdp_umem *
+xsk_alloc_and_mem_reg_buffers(struct pmd_internals *internals)
 {
 	struct xdp_mr_req req = { .frame_size = ETH_AF_XDP_FRAME_SIZE,
 				  .data_headroom = ETH_AF_XDP_DATA_HEADROOM };
+	char pool_name[0x100];
+	int nbuffers;
 	struct xdp_umem *umem = calloc(1, sizeof(*umem));
 
 	if (!umem)
 		return NULL;
 
-	umem->mb_pool =
-		rte_pktmbuf_pool_create_with_flags(
-			pool_name, nbuffers,
-			250, 0,
-			(ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_MBUF_OVERHEAD),
-			MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN,
-			SOCKET_ID_ANY);
-
-	if (!umem->mb_pool) {
-		free(umem);
-		return NULL;
-	}
+	internals->share_mb_pool = check_mempool(internals->ext_mb_pool);
+	if (!internals->share_mb_pool) {
+		snprintf(pool_name, 0x100, "%s_%s_%d", "af_xdp_pool",
+			 internals->if_name, internals->queue_idx);
+		umem->mb_pool =
+			rte_pktmbuf_pool_create_with_flags(
+				pool_name,
+				ETH_AF_XDP_NUM_BUFFERS,
+				250, 0,
+				(ETH_AF_XDP_FRAME_SIZE -
+				 ETH_AF_XDP_MBUF_OVERHEAD),
+				MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN,
+				SOCKET_ID_ANY);
+		if (!umem->mb_pool) {
+			free(umem);
+			return NULL;
+		}
 
-	if (umem->mb_pool->nb_mem_chunks > 1) {
-		rte_mempool_free(umem->mb_pool);
-		free(umem);
-		return NULL;
+		if (umem->mb_pool->nb_mem_chunks > 1) {
+			rte_mempool_free(umem->mb_pool);
+			free(umem);
+			return NULL;
+		}
+		nbuffers = ETH_AF_XDP_NUM_BUFFERS;
+	} else {
+		umem->mb_pool = internals->ext_mb_pool;
+		nbuffers = umem->mb_pool->populated_size;
 	}
 
 	req.addr = (uint64_t)get_base_addr(umem->mb_pool);
-	req.len = nbuffers * req.frame_size;
-	setsockopt(sfd, SOL_XDP, XDP_MEM_REG, &req, sizeof(req));
+	req.len = nbuffers * req.frame_size;
+	setsockopt(internals->sfd, SOL_XDP, XDP_MEM_REG, &req, sizeof(req));
 
 	umem->frame_size = ETH_AF_XDP_FRAME_SIZE;
 	umem->frame_size_log2 = 11;
 	umem->buffer = (char *)req.addr;
 	umem->size = nbuffers * req.frame_size;
 	umem->nframes = nbuffers;
-	umem->mr_fd = sfd;
 
 	return umem;
 }
@@ -425,19 +490,13 @@ xdp_configure(struct pmd_internals *internals)
 {
 	struct sockaddr_xdp sxdp;
 	struct xdp_ring_req req;
-	char pool_name[0x100];
-
 	int ret = 0;
 
-	snprintf(pool_name, 0x100, "%s_%s_%d", "af_xdp_pool",
-		 internals->if_name, internals->queue_idx);
-	internals->umem = xsk_alloc_and_mem_reg_buffers(internals->sfd,
-							ETH_AF_XDP_NUM_BUFFERS,
-							pool_name);
+	internals->umem = xsk_alloc_and_mem_reg_buffers(internals);
 	if (!internals->umem)
 		return -1;
 
-	req.mr_fd = internals->umem->mr_fd;
+	req.mr_fd = internals->sfd;
 	req.desc_nr = internals->ring_size;
 
 	ret = setsockopt(internals->sfd, SOL_XDP, XDP_RX_RING,
@@ -498,7 +557,7 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
 		RTE_PKTMBUF_HEADROOM;
 	data_size = internals->umem->frame_size;
 
-	if (data_size > buf_size) {
+	if (data_size - ETH_AF_XDP_DATA_HEADROOM > buf_size) {
 		RTE_LOG(ERR, PMD,
 			"%s: %d bytes will not fit in mbuf (%d bytes)\n",
 			dev->device->name, data_size, buf_size);
@@ -764,7 +823,7 @@ rte_pmd_af_xdp_remove(struct rte_vdev_device *dev)
 
 	internals = eth_dev->data->dev_private;
 	if (internals->umem) {
-		if (internals->umem->mb_pool)
+		if (internals->umem->mb_pool && !internals->share_mb_pool)
 			rte_mempool_free(internals->umem->mb_pool);
 		rte_free(internals->umem);
 	}
-- 
2.13.6


* [RFC v2 6/7] net/af_xdp: load BPF file
  2018-03-08 13:52 [RFC v2 0/7] PMD driver for AF_XDP Qi Zhang
                   ` (4 preceding siblings ...)
  2018-03-08 13:52 ` [RFC v2 5/7] net/af_xdp: enable share mempool Qi Zhang
@ 2018-03-08 13:52 ` Qi Zhang
  2018-03-08 14:20   ` Zhang, Qi Z
  2018-03-08 23:15   ` Stephen Hemminger
  2018-03-08 13:52 ` [RFC v2 7/7] app/testpmd: enable parameter for mempool flags Qi Zhang
  6 siblings, 2 replies; 11+ messages in thread
From: Qi Zhang @ 2018-03-08 13:52 UTC (permalink / raw)
  To: dev; +Cc: magnus.karlsson, bjorn.topel, Qi Zhang

Add libbpf and libelf dependencies to the Makefile.
During initialization, a BPF program that issues a "call imm" to
"xdpsk_redirect" is loaded. The driver then always tries to link the
XDP fd in DRV mode first and falls back to SKB mode if that fails. The
link is released during dev_close.

Note: this is a workaround; af_xdp may remove the BPF dependency
in the future.
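
The DRV-then-SKB fallback can be sketched independently of netlink as
follows. Everything here is illustrative: attach_fn stands in for
set_link_xdp_fd(), MODE_DRV and MODE_SKB are local stand-ins for
XDP_FLAGS_DRV_MODE and XDP_FLAGS_SKB_MODE, and the stub_* functions exist
only to exercise the logic.

```c
#include <stdint.h>

/* Local stand-ins for the XDP attach mode flags. */
#define MODE_SKB (1u << 1)
#define MODE_DRV (1u << 2)

/* Same shape as set_link_xdp_fd(): returns 0 on success. */
typedef int (*attach_fn)(int ifindex, int prog_fd, uint32_t flags);

/* Try native driver mode first, then generic SKB mode. */
static inline int
attach_with_fallback(attach_fn attach, int ifindex, int prog_fd,
		     uint32_t *mode_out)
{
	if (attach(ifindex, prog_fd, MODE_DRV) == 0) {
		*mode_out = MODE_DRV;
		return 0;
	}
	if (attach(ifindex, prog_fd, MODE_SKB) == 0) {
		*mode_out = MODE_SKB;
		return 0;
	}
	return -1;
}

/* Illustrative stubs: a NIC without native XDP, one with, and a failure. */
static inline int
stub_skb_only(int ifindex, int prog_fd, uint32_t flags)
{
	(void)ifindex; (void)prog_fd;
	return flags == MODE_SKB ? 0 : -1;
}

static inline int
stub_any(int ifindex, int prog_fd, uint32_t flags)
{
	(void)ifindex; (void)prog_fd; (void)flags;
	return 0;
}

static inline int
stub_none(int ifindex, int prog_fd, uint32_t flags)
{
	(void)ifindex; (void)prog_fd; (void)flags;
	return -1;
}
```

Recording the mode that succeeded matters because the same flag has to be
passed again, with fd -1, when the program is detached in dev_close.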

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
---
 drivers/net/af_xdp/Makefile         |   5 +-
 drivers/net/af_xdp/bpf_load.c       | 168 ++++++++++++++++++++++++++++++++++++
 drivers/net/af_xdp/bpf_load.h       |  11 +++
 drivers/net/af_xdp/rte_eth_af_xdp.c |  80 ++++++++++++++---
 mk/rte.app.mk                       |   2 +-
 5 files changed, 254 insertions(+), 12 deletions(-)
 create mode 100644 drivers/net/af_xdp/bpf_load.c
 create mode 100644 drivers/net/af_xdp/bpf_load.h

diff --git a/drivers/net/af_xdp/Makefile b/drivers/net/af_xdp/Makefile
index 990073655..f16b5306b 100644
--- a/drivers/net/af_xdp/Makefile
+++ b/drivers/net/af_xdp/Makefile
@@ -12,7 +12,9 @@ EXPORT_MAP := rte_pmd_af_xdp_version.map
 
 LIBABIVER := 1
 
-CFLAGS += -O3 -I/opt/af_xdp/linux_headers/include
+LINUX_HEADER_DIR := /opt/af_xdp/linux_headers/include
+
+CFLAGS += -O3 -I$(LINUX_HEADER_DIR)
 CFLAGS += $(WERROR_FLAGS)
 LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
 LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
@@ -22,5 +24,6 @@ LDLIBS += -lrte_bus_vdev
 # all source are stored in SRCS-y
 #
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += rte_eth_af_xdp.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += bpf_load.c
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/af_xdp/bpf_load.c b/drivers/net/af_xdp/bpf_load.c
new file mode 100644
index 000000000..255e67187
--- /dev/null
+++ b/drivers/net/af_xdp/bpf_load.c
@@ -0,0 +1,168 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+#include <stdio.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <libelf.h>
+#include <gelf.h>
+#include <errno.h>
+#include <unistd.h>
+#include <string.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <linux/bpf.h>
+#include <linux/filter.h>
+#include <linux/netlink.h>
+#include <linux/rtnetlink.h>
+#include <linux/types.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/syscall.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <poll.h>
+#include <ctype.h>
+#include <assert.h>
+#include "bpf_load.h"
+
+static char bpf_log_buf[BPF_LOG_BUF_SIZE];
+
+struct bpf_insn prog[] = {
+	{
+		.code = 0x85, //call imm
+		.dst_reg = 0,
+		.src_reg = 0,
+		.off = 0,
+		.imm = BPF_FUNC_xdpsk_redirect,
+	},
+	{
+		.code = 0x95, //exit
+		.dst_reg = 0,
+		.src_reg = 0,
+		.off = 0,
+		.imm = 0,
+	},
+};
+
+int load_bpf_file(void)
+{
+	int fd;
+
+	fd = bpf_load_program(BPF_PROG_TYPE_XDP, prog,
+			      ARRAY_SIZE(prog),
+			      "GPL", 0,
+			      bpf_log_buf, BPF_LOG_BUF_SIZE);
+
+	if (fd < 0) {
+		printf("bpf_load_program() err=%d\n%s", errno, bpf_log_buf);
+		return -1;
+	}
+
+	return fd;
+}
+
+int set_link_xdp_fd(int ifindex, int fd, __u32 flags)
+{
+	struct sockaddr_nl sa;
+	int sock, len, ret = -1;
+	uint32_t seq = 0;
+	char buf[4096];
+	struct nlattr *nla, *nla_xdp;
+	struct {
+		struct nlmsghdr  nh;
+		struct ifinfomsg ifinfo;
+		char             attrbuf[64];
+	} req;
+	struct nlmsghdr *nh;
+	struct nlmsgerr *err;
+
+	memset(&sa, 0, sizeof(sa));
+	sa.nl_family = AF_NETLINK;
+
+	sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
+	if (sock < 0) {
+		printf("open netlink socket: %s\n", strerror(errno));
+		return -1;
+	}
+
+	if (bind(sock, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
+		printf("bind to netlink: %s\n", strerror(errno));
+		goto cleanup;
+	}
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
+	req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_type = RTM_SETLINK;
+	req.nh.nlmsg_pid = 0;
+	req.nh.nlmsg_seq = ++seq;
+	req.ifinfo.ifi_family = AF_UNSPEC;
+	req.ifinfo.ifi_index = ifindex;
+
+	/* start nested attribute for XDP */
+	nla = (struct nlattr *)(((char *)&req)
+				+ NLMSG_ALIGN(req.nh.nlmsg_len));
+	nla->nla_type = NLA_F_NESTED | 43/*IFLA_XDP*/;
+	nla->nla_len = NLA_HDRLEN;
+
+	/* add XDP fd */
+	nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
+	nla_xdp->nla_type = 1/*IFLA_XDP_FD*/;
+	nla_xdp->nla_len = NLA_HDRLEN + sizeof(int);
+	memcpy((char *)nla_xdp + NLA_HDRLEN, &fd, sizeof(fd));
+	nla->nla_len += nla_xdp->nla_len;
+
+	/* if user passed in any flags, add those too */
+	if (flags) {
+		nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
+		nla_xdp->nla_type = 3/*IFLA_XDP_FLAGS*/;
+		nla_xdp->nla_len = NLA_HDRLEN + sizeof(flags);
+		memcpy((char *)nla_xdp + NLA_HDRLEN, &flags, sizeof(flags));
+		nla->nla_len += nla_xdp->nla_len;
+	}
+
+	req.nh.nlmsg_len += NLA_ALIGN(nla->nla_len);
+
+	if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		printf("send to netlink: %s\n", strerror(errno));
+		goto cleanup;
+	}
+
+	len = recv(sock, buf, sizeof(buf), 0);
+	if (len < 0) {
+		printf("recv from netlink: %s\n", strerror(errno));
+		goto cleanup;
+	}
+
+	for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, (unsigned int)len);
+	     nh = NLMSG_NEXT(nh, len)) {
+		if (nh->nlmsg_pid != (uint32_t)getpid()) {
+			printf("Wrong pid %d, expected %d\n",
+			       nh->nlmsg_pid, getpid());
+			goto cleanup;
+		}
+		if (nh->nlmsg_seq != seq) {
+			printf("Wrong seq %d, expected %d\n",
+			       nh->nlmsg_seq, seq);
+			goto cleanup;
+		}
+		switch (nh->nlmsg_type) {
+		case NLMSG_ERROR:
+			err = (struct nlmsgerr *)NLMSG_DATA(nh);
+			if (!err->error)
+				continue;
+			printf("nlmsg error %s\n", strerror(-err->error));
+			goto cleanup;
+		case NLMSG_DONE:
+			break;
+		}
+	}
+
+	ret = 0;
+
+cleanup:
+	close(sock);
+	return ret;
+}
diff --git a/drivers/net/af_xdp/bpf_load.h b/drivers/net/af_xdp/bpf_load.h
new file mode 100644
index 000000000..2561ede55
--- /dev/null
+++ b/drivers/net/af_xdp/bpf_load.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+#ifndef __BPF_LOAD_H
+#define __BPF_LOAD_H
+
+#include <bpf/bpf.h>
+
+int load_bpf_file(void);
+int set_link_xdp_fd(int ifindex, int fd, __u32 flags);
+#endif
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 7e839f0da..825273c11 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -11,6 +11,7 @@
 
 #include <linux/if_ether.h>
 #include <linux/if_xdp.h>
+#include <linux/if_link.h>
 #include <arpa/inet.h>
 #include <net/if.h>
 #include <sys/types.h>
@@ -20,6 +21,7 @@
 #include <unistd.h>
 #include <poll.h>
 #include "xdpsock_queue.h"
+#include "bpf_load.h"
 
 #ifndef SOL_XDP
 #define SOL_XDP 283
@@ -81,6 +83,9 @@ struct pmd_internals {
 	uint16_t port_id;
 	uint16_t queue_idx;
 	int ring_size;
+
+	uint32_t xdp_flags;
+	int bpf_fd;
 };
 
 static const char * const valid_arguments[] = {
@@ -97,6 +102,39 @@ static struct rte_eth_link pmd_link = {
 	.link_autoneg = ETH_LINK_AUTONEG
 };
 
+static int load_bpf(struct pmd_internals *internals)
+{
+	/* need fix: hard coded bpf file */
+	int fd = load_bpf_file();
+
+	if (fd < 0)
+		return -1;
+
+	internals->bpf_fd = fd;
+	return 0;
+}
+
+static int link_bpf_file(struct pmd_internals *internals)
+{
+	if (!set_link_xdp_fd(internals->if_index,
+			     internals->bpf_fd,
+			     XDP_FLAGS_DRV_MODE))
+		internals->xdp_flags = XDP_FLAGS_DRV_MODE;
+	else if (!set_link_xdp_fd(internals->if_index,
+				  internals->bpf_fd,
+				  XDP_FLAGS_SKB_MODE))
+		internals->xdp_flags = XDP_FLAGS_SKB_MODE;
+	else
+		return -1;
+
+	return 0;
+}
+
+static void unlink_bpf_file(struct pmd_internals *internals)
+{
+	set_link_xdp_fd(internals->if_index, -1, internals->xdp_flags);
+}
+
 static void *get_pkt_data(struct pmd_internals *internals,
 			  uint32_t index,
 			  uint32_t offset)
@@ -380,8 +418,26 @@ eth_stats_reset(struct rte_eth_dev *dev)
 }
 
 static void
-eth_dev_close(struct rte_eth_dev *dev __rte_unused)
+eth_dev_close(struct rte_eth_dev *dev)
 {
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	if (internals->xdp_flags) {
+		unlink_bpf_file(internals);
+		internals->xdp_flags = 0;
+	}
+
+	if (internals->umem) {
+		if (internals->umem->mb_pool && !internals->share_mb_pool)
+			rte_mempool_free(internals->umem->mb_pool);
+		free(internals->umem);
+		internals->umem = NULL;
+	}
+
+	if (internals->sfd != -1) {
+		close(internals->sfd);
+		internals->sfd = -1;
+	}
 }
 
 static void
@@ -743,9 +799,17 @@ init_internals(struct rte_vdev_device *dev,
 	if (ret)
 		goto error_3;
 
+	if (load_bpf(internals)) {
+		printf("load bpf file failed\n");
+		goto error_3;
+	}
+
+	if (link_bpf_file(internals))
+		goto error_3;
+
 	eth_dev = rte_eth_vdev_allocate(dev, 0);
 	if (!eth_dev)
-		goto error_3;
+		goto error_4;
 
 	rte_memcpy(data, eth_dev->data, sizeof(*data));
 	internals->port_id = eth_dev->data->port_id;
@@ -763,6 +827,9 @@ init_internals(struct rte_vdev_device *dev,
 
 	return 0;
 
+error_4:
+	unlink_bpf_file(internals);
+
 error_3:
 	close(internals->sfd);
 
@@ -808,7 +875,6 @@ static int
 rte_pmd_af_xdp_remove(struct rte_vdev_device *dev)
 {
 	struct rte_eth_dev *eth_dev = NULL;
-	struct pmd_internals *internals;
 
 	RTE_LOG(INFO, PMD, "Closing AF_XDP ethdev on numa socket %u\n",
 		rte_socket_id());
@@ -821,15 +887,9 @@ rte_pmd_af_xdp_remove(struct rte_vdev_device *dev)
 	if (!eth_dev)
 		return -1;
 
-	internals = eth_dev->data->dev_private;
-	if (internals->umem) {
-		if (internals->umem->mb_pool && !internals->share_mb_pool)
-			rte_mempool_free(internals->umem->mb_pool);
-		rte_free(internals->umem);
-	}
+	eth_dev_close(eth_dev);
 	rte_free(eth_dev->data->dev_private);
 	rte_free(eth_dev->data);
-	close(internals->sfd);
 
 	rte_eth_dev_release_port(eth_dev);
 
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index bc26e1457..d05e6c0e4 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -120,7 +120,7 @@ ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),n)
 _LDLIBS-$(CONFIG_RTE_DRIVER_MEMPOOL_STACK)  += -lrte_mempool_stack
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
-_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     += -lrte_pmd_af_xdp
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     += -lrte_pmd_af_xdp -lelf -lbpf
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ARK_PMD)        += -lrte_pmd_ark
 _LDLIBS-$(CONFIG_RTE_LIBRTE_AVF_PMD)        += -lrte_pmd_avf
 _LDLIBS-$(CONFIG_RTE_LIBRTE_AVP_PMD)        += -lrte_pmd_avp
-- 
2.13.6


* [RFC v2 7/7] app/testpmd: enable parameter for mempool flags
  2018-03-08 13:52 [RFC v2 0/7] PMD driver for AF_XDP Qi Zhang
                   ` (5 preceding siblings ...)
  2018-03-08 13:52 ` [RFC v2 6/7] net/af_xdp: load BPF file Qi Zhang
@ 2018-03-08 13:52 ` Qi Zhang
  6 siblings, 0 replies; 11+ messages in thread
From: Qi Zhang @ 2018-03-08 13:52 UTC (permalink / raw)
  To: dev; +Cc: magnus.karlsson, bjorn.topel, Qi Zhang

It is now possible for testpmd to create an AF_XDP-friendly mempool.

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
---
 app/test-pmd/parameters.c | 12 ++++++++++++
 app/test-pmd/testpmd.c    | 15 +++++++++------
 app/test-pmd/testpmd.h    |  1 +
 3 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 97d22b860..19675671e 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -61,6 +61,7 @@ usage(char* progname)
 	       "--tx-first | --stats-period=PERIOD | "
 	       "--coremask=COREMASK --portmask=PORTMASK --numa "
 	       "--mbuf-size= | --total-num-mbufs= | "
+	       "--mp-flags= | "
 	       "--nb-cores= | --nb-ports= | "
 #ifdef RTE_LIBRTE_CMDLINE
 	       "--eth-peers-configfile= | "
@@ -105,6 +106,7 @@ usage(char* progname)
 	printf("  --socket-num=N: set socket from which all memory is allocated "
 	       "in NUMA mode.\n");
 	printf("  --mbuf-size=N: set the data size of mbuf to N bytes.\n");
+	printf("  --mp-flags=N: set the flags used when creating the mbuf memory pool.\n");
 	printf("  --total-num-mbufs=N: set the number of mbufs to be allocated "
 	       "in mbuf pools.\n");
 	printf("  --max-pkt-len=N: set the maximum size of packet to N bytes.\n");
@@ -568,6 +570,7 @@ launch_args_parse(int argc, char** argv)
 		{ "ring-numa-config",           1, 0, 0 },
 		{ "socket-num",			1, 0, 0 },
 		{ "mbuf-size",			1, 0, 0 },
+		{ "mp-flags",			1, 0, 0 },
 		{ "total-num-mbufs",		1, 0, 0 },
 		{ "max-pkt-len",		1, 0, 0 },
 		{ "pkt-filter-mode",            1, 0, 0 },
@@ -769,6 +772,15 @@ launch_args_parse(int argc, char** argv)
 					rte_exit(EXIT_FAILURE,
 						 "mbuf-size should be > 0 and < 65536\n");
 			}
+			if (!strcmp(lgopts[opt_idx].name, "mp-flags")) {
+				n = atoi(optarg);
+				if (n > 0 && n <= 0xFFFF)
+					mp_flags = (uint16_t)n;
+				else
+					rte_exit(EXIT_FAILURE,
+						 "mp-flags should be > 0 and < 65536\n");
+			}
+
 			if (!strcmp(lgopts[opt_idx].name, "total-num-mbufs")) {
 				n = atoi(optarg);
 				if (n > 1024)
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 4c0e2586c..887899919 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -171,6 +171,7 @@ uint32_t burst_tx_delay_time = BURST_TX_WAIT_US;
 uint32_t burst_tx_retry_num = BURST_TX_RETRIES;
 
 uint16_t mbuf_data_size = DEFAULT_MBUF_DATA_SIZE; /**< Mbuf data space size. */
+uint16_t mp_flags = 0; /**< flags passed when creating the mempool */
 uint32_t param_total_num_mbufs = 0;  /**< number of mbufs in all pools - if
                                       * specified on command-line. */
 uint16_t stats_period; /**< Period to show statistics (disabled by default) */
@@ -486,6 +487,7 @@ set_def_fwd_config(void)
  */
 static void
 mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
+		 unsigned int flags,
 		 unsigned int socket_id)
 {
 	char pool_name[RTE_MEMPOOL_NAMESIZE];
@@ -503,7 +505,7 @@ mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
 		rte_mp = rte_mempool_create_empty(pool_name, nb_mbuf,
 			mb_size, (unsigned) mb_mempool_cache,
 			sizeof(struct rte_pktmbuf_pool_private),
-			socket_id, 0);
+			socket_id, flags);
 		if (rte_mp == NULL)
 			goto err;
 
@@ -518,8 +520,8 @@ mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
 		/* wrapper to rte_mempool_create() */
 		TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
 				rte_mbuf_best_mempool_ops());
-		rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
-			mb_mempool_cache, 0, mbuf_seg_size, socket_id);
+		rte_mp = rte_pktmbuf_pool_create_with_flags(pool_name, nb_mbuf,
+			mb_mempool_cache, 0, mbuf_seg_size, flags, socket_id);
 	}
 
 err:
@@ -735,13 +737,14 @@ init_config(void)
 
 		for (i = 0; i < num_sockets; i++)
 			mbuf_pool_create(mbuf_data_size, nb_mbuf_per_pool,
-					 socket_ids[i]);
+					 mp_flags, socket_ids[i]);
 	} else {
 		if (socket_num == UMA_NO_CONFIG)
-			mbuf_pool_create(mbuf_data_size, nb_mbuf_per_pool, 0);
+			mbuf_pool_create(mbuf_data_size, nb_mbuf_per_pool,
+					 mp_flags, 0);
 		else
 			mbuf_pool_create(mbuf_data_size, nb_mbuf_per_pool,
-						 socket_num);
+					 mp_flags, socket_num);
 	}
 
 	init_port_config();
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 153abea05..11c2ea681 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -386,6 +386,7 @@ extern uint8_t dcb_config;
 extern uint8_t dcb_test;
 
 extern uint16_t mbuf_data_size; /**< Mbuf data space size. */
+extern uint16_t mp_flags;  /**< flags for mempool creation. */
 extern uint32_t param_total_num_mbufs;
 
 extern uint16_t stats_period;
-- 
2.13.6


* Re: [RFC v2 6/7] net/af_xdp: load BPF file
  2018-03-08 13:52 ` [RFC v2 6/7] net/af_xdp: load BPF file Qi Zhang
@ 2018-03-08 14:20   ` Zhang, Qi Z
  2018-03-08 23:15   ` Stephen Hemminger
  1 sibling, 0 replies; 11+ messages in thread
From: Zhang, Qi Z @ 2018-03-08 14:20 UTC (permalink / raw)
  To: dev; +Cc: Karlsson, Magnus, Topel, Bjorn



> -----Original Message-----
> From: Zhang, Qi Z
> Sent: Thursday, March 8, 2018 9:53 PM
> To: dev@dpdk.org
> Cc: Karlsson, Magnus <magnus.karlsson@intel.com>; Topel, Bjorn
> <bjorn.topel@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>
> Subject: [RFC v2 6/7] net/af_xdp: load BPF file
> 
> Add libbpf and libelf dependency in Makefile.
> During initialization, a bpf prog which calls imm "xdpsk_redirect"
> will be loaded. The driver will always try to link the XDP fd in DRV mode
> first, then fall back to SKB mode if that fails. The link will be released
> during dev_close.
> 
> Note: this is a workaround solution; af_xdp may remove the BPF dependency in
> future.
> 
> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> ---
>  drivers/net/af_xdp/Makefile         |   5 +-
>  drivers/net/af_xdp/bpf_load.c       | 168 ++++++++++++++++++++++++++++++++++++
>  drivers/net/af_xdp/bpf_load.h       |  11 +++
>  drivers/net/af_xdp/rte_eth_af_xdp.c |  80 ++++++++++++++---
>  mk/rte.app.mk                       |   2 +-
>  5 files changed, 254 insertions(+), 12 deletions(-)
>  create mode 100644 drivers/net/af_xdp/bpf_load.c
>  create mode 100644 drivers/net/af_xdp/bpf_load.h
> 
> diff --git a/drivers/net/af_xdp/Makefile b/drivers/net/af_xdp/Makefile
> index 990073655..f16b5306b 100644
> --- a/drivers/net/af_xdp/Makefile
> +++ b/drivers/net/af_xdp/Makefile
> @@ -12,7 +12,9 @@ EXPORT_MAP := rte_pmd_af_xdp_version.map
> 
> +static char bpf_log_buf[BPF_LOG_BUF_SIZE];
> +
> +struct bpf_insn prog[] = {
> +	{
> +		.code = 0x85, //call imm
> +		.dst_reg = 0,
> +		.src_reg = 0,
> +		.off = 0,
> +		.imm = BPF_FUNC_xdpsk_redirect,
> +	},
> +	{
> +		.code = 0x95, //exit
> +		.dst_reg = 0,
> +		.src_reg = 0,
> +		.off = 0,
> +		.imm = 0,
> +	},
> +};
> +
> +int load_bpf_file(void)
> +{
> +	int fd;
> +
> +	fd = bpf_load_program(BPF_PROG_TYPE_XDP, prog,
> +			      ARRAY_SIZE(prog),
Sorry, one mistake here.
checkpatch recommends using ARRAY_SIZE here, but the macro does not seem to be defined by default, so compilation fails; replacing it with "2" is a quick fix.
> +			      "GPL", 0,
> +			      bpf_log_buf, BPF_LOG_BUF_SIZE);
> +
> +	
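
For reference, a minimal sketch of a local fallback definition for the macro
discussed above, as an alternative to hard-coding "2". ARRAY_SIZE is not part
of the standard C headers, so code that wants it usually defines it itself
(DPDK's rte_common.h offers the equivalent RTE_DIM). Everything prefixed with
sketch_ or SKETCH_ below is hypothetical: struct sketch_insn is only a
stand-in for struct bpf_insn so the snippet builds without kernel headers, and
SKETCH_HELPER_ID is a placeholder rather than a real helper id.

```c
#include <stddef.h>
#include <stdint.h>

/* Fallback for builds whose headers do not provide ARRAY_SIZE. */
#ifndef ARRAY_SIZE
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#endif

/* Hypothetical stand-in for struct bpf_insn, for illustration only. */
struct sketch_insn {
	uint8_t code;  /* opcode */
	uint8_t regs;  /* dst_reg and src_reg packed into one byte */
	int16_t off;   /* signed offset */
	int32_t imm;   /* signed immediate */
};

#define SKETCH_HELPER_ID 0 /* placeholder, not a real helper id */

static const struct sketch_insn sketch_prog[] = {
	{ .code = 0x85, .imm = SKETCH_HELPER_ID }, /* call imm */
	{ .code = 0x95 },                          /* exit */
};

/* Instruction count derived from the array instead of a literal 2. */
static inline size_t sketch_prog_len(void)
{
	return ARRAY_SIZE(sketch_prog);
}
```

With this in place the instruction count passed to the loader follows the
array automatically if instructions are added later.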


* Re: [RFC v2 6/7] net/af_xdp: load BPF file
  2018-03-08 13:52 ` [RFC v2 6/7] net/af_xdp: load BPF file Qi Zhang
  2018-03-08 14:20   ` Zhang, Qi Z
@ 2018-03-08 23:15   ` Stephen Hemminger
  2018-05-09  7:02     ` Björn Töpel
  1 sibling, 1 reply; 11+ messages in thread
From: Stephen Hemminger @ 2018-03-08 23:15 UTC (permalink / raw)
  To: Qi Zhang; +Cc: dev, magnus.karlsson, bjorn.topel

On Thu,  8 Mar 2018 21:52:48 +0800
Qi Zhang <qi.z.zhang@intel.com> wrote:

> +struct bpf_insn prog[] = {
> +	{
> +		.code = 0x85, //call imm
> +		.dst_reg = 0,
> +		.src_reg = 0,
> +		.off = 0,
> +		.imm = BPF_FUNC_xdpsk_redirect,
> +	},
> +	{
> +		.code = 0x95, //exit
> +		.dst_reg = 0,
> +		.src_reg = 0,
> +		.off = 0,
> +		.imm = 0,
> +	},
> +};
> +
> +int load_bpf_file(void)
> +{
> +	int fd;
> +
> +	fd = bpf_load_program(BPF_PROG_TYPE_XDP, prog,
> +			      ARRAY_SIZE(prog),
> +			      "GPL", 0,
> +			      bpf_log_buf, BPF_LOG_BUF_SIZE);

There is still a license conflict here. The short bpf program is in BSD code and therefore
is BSD, not GPL. But the kernel won't let you load non-GPL programs.

Please check with Intel open source compliance to find a GPL solution.

A possible license-safe solution is more complex. You need to provide the original program
source for the BPF program under a dual license (GPL-2/BSD-3), then read in that object
file and load it. A user wishing to exercise their GPL rights can then take your
source file, modify it, and create a new file to load.

Doing this also creates additional GPL issues for appliance vendors using AF_XDP.
They need to make available the source of all these XDP BPF programs.

Complying with mixed licenses is hard.


* Re: [RFC v2 6/7] net/af_xdp: load BPF file
  2018-03-08 23:15   ` Stephen Hemminger
@ 2018-05-09  7:02     ` Björn Töpel
  0 siblings, 0 replies; 11+ messages in thread
From: Björn Töpel @ 2018-05-09  7:02 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Qi Zhang, dev, Karlsson, Magnus, Björn Töpel,
	Jesper Dangaard Brouer

2018-03-09 0:15 GMT+01:00 Stephen Hemminger <stephen@networkplumber.org>:
> On Thu,  8 Mar 2018 21:52:48 +0800
> Qi Zhang <qi.z.zhang@intel.com> wrote:
>
>> +struct bpf_insn prog[] = {
>> +     {
>> +             .code = 0x85, //call imm
>> +             .dst_reg = 0,
>> +             .src_reg = 0,
>> +             .off = 0,
>> +             .imm = BPF_FUNC_xdpsk_redirect,
>> +     },
>> +     {
>> +             .code = 0x95, //exit
>> +             .dst_reg = 0,
>> +             .src_reg = 0,
>> +             .off = 0,
>> +             .imm = 0,
>> +     },
>> +};
>> +
>> +int load_bpf_file(void)
>> +{
>> +     int fd;
>> +
>> +     fd = bpf_load_program(BPF_PROG_TYPE_XDP, prog,
>> +                           ARRAY_SIZE(prog),
>> +                           "GPL", 0,
>> +                           bpf_log_buf, BPF_LOG_BUF_SIZE);
>
> Still have license conflict here. The short bpf program is in BSD code and therefore
> is BSD, not GPL. But kernel won't let you load non-GPL programs.
>

Raising a dead thread! Loading a bpf program that's *not* gpl is not
an issue. The only thing to keep in mind is that some bpf helpers are
gpl only -- still -- loading non-gpl bpf code is perfectly ok. So, the
issue here is that bpf_load_program passes "GPL", thereby making
the program gpl.
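
To make the point about the license string concrete, here is a minimal
userspace sketch of the kind of string check involved. It assumes the accepted
strings match the kernel's license_is_gpl_compatible() in
include/linux/license.h; the function name sketch_license_is_gpl_compatible is
hypothetical, and this is not code from the patch.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Returns true when the license string declared at load time is one the
 * kernel treats as GPL-compatible. The list is an assumption based on
 * include/linux/license.h.
 */
static bool sketch_license_is_gpl_compatible(const char *license)
{
	static const char * const gpl_compatible[] = {
		"GPL",
		"GPL v2",
		"GPL and additional rights",
		"Dual BSD/GPL",
		"Dual MIT/GPL",
		"Dual MPL/GPL",
	};
	size_t i;

	for (i = 0; i < sizeof(gpl_compatible) / sizeof(gpl_compatible[0]); i++) {
		if (strcmp(license, gpl_compatible[i]) == 0)
			return true;
	}
	return false;
}
```

A string such as "BSD" fails this check, which only matters when the program
calls a helper that is marked gpl only.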


Björn

> Please check with Intel open source compliance to find a GPL solution.
>
> A possible license safe solution is more complex. You need to provide original program
> source for the BPF program under dual clause (GPL-2/BSD-3); then read in that object
> file and load it.  A user wishing to exercise their GPL rights can then take your
> source file and modify and create new file to load.
>
> Doing this also creates additional GPL issues for appliance vendors using AF_XDP.
> They need to make available the source of all these XDP BPF programs.
>
> Complying with mixed licenses is hard.


end of thread, other threads:[~2018-05-09  7:02 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-08 13:52 [RFC v2 0/7] PMD driver for AF_XDP Qi Zhang
2018-03-08 13:52 ` [RFC v2 1/7] net/af_xdp: new PMD driver Qi Zhang
2018-03-08 13:52 ` [RFC v2 2/7] lib/mbuf: enable parse flags when create mempool Qi Zhang
2018-03-08 13:52 ` [RFC v2 3/7] lib/mempool: allow page size aligned mempool Qi Zhang
2018-03-08 13:52 ` [RFC v2 4/7] net/af_xdp: use mbuf mempool for buffer management Qi Zhang
2018-03-08 13:52 ` [RFC v2 5/7] net/af_xdp: enable share mempool Qi Zhang
2018-03-08 13:52 ` [RFC v2 6/7] net/af_xdp: load BPF file Qi Zhang
2018-03-08 14:20   ` Zhang, Qi Z
2018-03-08 23:15   ` Stephen Hemminger
2018-05-09  7:02     ` Björn Töpel
2018-03-08 13:52 ` [RFC v2 7/7] app/testpmd: enable parameter for mempool flags Qi Zhang
