[PATCH v5 net-next 0/6] net: ethernet: ti: cpsw: Add XDP support

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v5 net-next 0/6] net: ethernet: ti: cpsw: Add XDP support
@ 2019-06-30 17:23 Ivan Khoronzhuk
  2019-06-30 17:23 ` [PATCH v5 net-next 1/6] xdp: allow same allocator usage Ivan Khoronzhuk
                   ` (5 more replies)
  0 siblings, 6 replies; 30+ messages in thread
From: Ivan Khoronzhuk @ 2019-06-30 17:23 UTC (permalink / raw)
  To: grygorii.strashko, hawk, davem
  Cc: ast, linux-kernel, linux-omap, xdp-newbies, ilias.apalodimas,
	netdev, daniel, jakub.kicinski, john.fastabend, Ivan Khoronzhuk

This patchset adds XDP support for TI cpsw driver and base it on
page_pool allocator. It was verified on af_xdp socket drop,
af_xdp l2f, ebpf XDP_DROP, XDP_REDIRECT, XDP_PASS, XDP_TX.

It was verified with following configs enabled:
CONFIG_JIT=y
CONFIG_BPFILTER=y
CONFIG_BPF_SYSCALL=y
CONFIG_XDP_SOCKETS=y
CONFIG_BPF_EVENTS=y
CONFIG_HAVE_EBPF_JIT=y
CONFIG_BPF_JIT=y
CONFIG_CGROUP_BPF=y

Link on previous v4:
https://lkml.org/lkml/2019/6/25/996

Also regular tests with iperf2 were done in order to verify impact on
regular netstack performance, compared with base commit:
https://pastebin.com/JSMT0iZ4

v4..v5:
- added two plreliminary patches:
  net: ethernet: ti: davinci_cpdma: allow desc split while down
  net: ethernet: ti: cpsw_ethtool: allow res split while down
- added xdp alocator refcnt on xdp level, avoiding page pool refcnt
- moved flush status as separate argument for cpdma_chan_process
- reworked cpsw code according to last changes to allocator
- added missed statistic counter

v3..v4:
- added page pool user counter
- use same pool for ndevs in dual mac
- restructured page pool create/destroy according to the last changes in API

v2..v3:
- each rxq and ndev has its own page pool

v1..v2:
- combined xdp_xmit functions
- used page allocation w/o refcnt juggle
- unmapped page for skb netstack
- moved rxq/page pool allocation to open/close pair
- added several preliminary patches:
  net: page_pool: add helper function to retrieve dma addresses
  net: page_pool: add helper function to unmap dma addresses
  net: ethernet: ti: cpsw: use cpsw as drv data
  net: ethernet: ti: cpsw_ethtool: simplify slave loops


Based on net-next/master

Ivan Khoronzhuk (6):
  xdp: allow same allocator usage
  net: ethernet: ti: davinci_cpdma: add dma mapped submit
  net: ethernet: ti: davinci_cpdma: return handler status
  net: ethernet: ti: davinci_cpdma: allow desc split while down
  net: ethernet: ti: cpsw_ethtool: allow res split while down
  net: ethernet: ti: cpsw: add XDP support

 drivers/net/ethernet/ti/Kconfig         |   1 +
 drivers/net/ethernet/ti/cpsw.c          | 520 +++++++++++++++++++++---
 drivers/net/ethernet/ti/cpsw_ethtool.c  |  78 +++-
 drivers/net/ethernet/ti/cpsw_priv.h     |   9 +-
 drivers/net/ethernet/ti/davinci_cpdma.c | 125 +++++-
 drivers/net/ethernet/ti/davinci_cpdma.h |  11 +-
 drivers/net/ethernet/ti/davinci_emac.c  |  17 +-
 include/net/xdp_priv.h                  |   1 +
 net/core/xdp.c                          |  46 +++
 9 files changed, 701 insertions(+), 107 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v5 net-next 1/6] xdp: allow same allocator usage
  2019-06-30 17:23 [PATCH v5 net-next 0/6] net: ethernet: ti: cpsw: Add XDP support Ivan Khoronzhuk
@ 2019-06-30 17:23 ` Ivan Khoronzhuk
  2019-07-01 11:40   ` Jesper Dangaard Brouer
  2019-06-30 17:23 ` [PATCH v5 net-next 2/6] net: ethernet: ti: davinci_cpdma: add dma mapped submit Ivan Khoronzhuk
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 30+ messages in thread
From: Ivan Khoronzhuk @ 2019-06-30 17:23 UTC (permalink / raw)
  To: grygorii.strashko, hawk, davem
  Cc: ast, linux-kernel, linux-omap, xdp-newbies, ilias.apalodimas,
	netdev, daniel, jakub.kicinski, john.fastabend, Ivan Khoronzhuk

XDP rxqs can be same for ndevs running under same rx napi softirq.
But there is no ability to register same allocator for both rxqs,
by fact it can same rxq but has different ndev as a reference.

Due to last changes allocator destroy can be defered till the moment
all packets are recycled by destination interface, afterwards it's
freed. In order to schedule allocator destroy only after all users are
unregistered, add refcnt to allocator object and schedule to destroy
only it reaches 0.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
 include/net/xdp_priv.h |  1 +
 net/core/xdp.c         | 46 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/include/net/xdp_priv.h b/include/net/xdp_priv.h
index 6a8cba6ea79a..995b21da2f27 100644
--- a/include/net/xdp_priv.h
+++ b/include/net/xdp_priv.h
@@ -18,6 +18,7 @@ struct xdp_mem_allocator {
 	struct rcu_head rcu;
 	struct delayed_work defer_wq;
 	unsigned long defer_warn;
+	unsigned long refcnt;
 };
 
 #endif /* __LINUX_NET_XDP_PRIV_H__ */
diff --git a/net/core/xdp.c b/net/core/xdp.c
index b29d7b513a18..a44621190fdc 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -98,6 +98,18 @@ bool __mem_id_disconnect(int id, bool force)
 		WARN(1, "Request remove non-existing id(%d), driver bug?", id);
 		return true;
 	}
+
+	/* to avoid calling hash lookup twice, decrement refcnt here till it
+	 * reaches zero, then it can be called from workqueue afterwards.
+	 */
+	if (xa->refcnt)
+		xa->refcnt--;
+
+	if (xa->refcnt) {
+		mutex_unlock(&mem_id_lock);
+		return true;
+	}
+
 	xa->disconnect_cnt++;
 
 	/* Detects in-flight packet-pages for page_pool */
@@ -312,6 +324,33 @@ static bool __is_supported_mem_type(enum xdp_mem_type type)
 	return true;
 }
 
+static struct xdp_mem_allocator *xdp_allocator_get(void *allocator)
+{
+	struct xdp_mem_allocator *xae, *xa = NULL;
+	struct rhashtable_iter iter;
+
+	mutex_lock(&mem_id_lock);
+	rhashtable_walk_enter(mem_id_ht, &iter);
+	do {
+		rhashtable_walk_start(&iter);
+
+		while ((xae = rhashtable_walk_next(&iter)) && !IS_ERR(xae)) {
+			if (xae->allocator == allocator) {
+				xae->refcnt++;
+				xa = xae;
+				break;
+			}
+		}
+
+		rhashtable_walk_stop(&iter);
+
+	} while (xae == ERR_PTR(-EAGAIN));
+	rhashtable_walk_exit(&iter);
+	mutex_unlock(&mem_id_lock);
+
+	return xa;
+}
+
 int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
 			       enum xdp_mem_type type, void *allocator)
 {
@@ -347,6 +386,12 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
 		}
 	}
 
+	xdp_alloc = xdp_allocator_get(allocator);
+	if (xdp_alloc) {
+		xdp_rxq->mem.id = xdp_alloc->mem.id;
+		return 0;
+	}
+
 	xdp_alloc = kzalloc(sizeof(*xdp_alloc), gfp);
 	if (!xdp_alloc)
 		return -ENOMEM;
@@ -360,6 +405,7 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
 	xdp_rxq->mem.id = id;
 	xdp_alloc->mem  = xdp_rxq->mem;
 	xdp_alloc->allocator = allocator;
+	xdp_alloc->refcnt = 1;
 
 	/* Insert allocator into ID lookup table */
 	ptr = rhashtable_insert_slow(mem_id_ht, &id, &xdp_alloc->node);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v5 net-next 2/6] net: ethernet: ti: davinci_cpdma: add dma mapped submit
  2019-06-30 17:23 [PATCH v5 net-next 0/6] net: ethernet: ti: cpsw: Add XDP support Ivan Khoronzhuk
  2019-06-30 17:23 ` [PATCH v5 net-next 1/6] xdp: allow same allocator usage Ivan Khoronzhuk
@ 2019-06-30 17:23 ` Ivan Khoronzhuk
  2019-06-30 17:23 ` [PATCH v5 net-next 3/6] net: ethernet: ti: davinci_cpdma: return handler status Ivan Khoronzhuk
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 30+ messages in thread
From: Ivan Khoronzhuk @ 2019-06-30 17:23 UTC (permalink / raw)
  To: grygorii.strashko, hawk, davem
  Cc: ast, linux-kernel, linux-omap, xdp-newbies, ilias.apalodimas,
	netdev, daniel, jakub.kicinski, john.fastabend, Ivan Khoronzhuk

In case if dma mapped packet needs to be sent, like with XDP
page pool, the "mapped" submit can be used. This patch adds dma
mapped submit based on regular one.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
 drivers/net/ethernet/ti/davinci_cpdma.c | 89 ++++++++++++++++++++++---
 drivers/net/ethernet/ti/davinci_cpdma.h |  4 ++
 2 files changed, 83 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
index 5cf1758d425b..8da46394c0e7 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -139,6 +139,7 @@ struct submit_info {
 	int directed;
 	void *token;
 	void *data;
+	int flags;
 	int len;
 };
 
@@ -184,6 +185,8 @@ static struct cpdma_control_info controls[] = {
 				 (directed << CPDMA_TO_PORT_SHIFT));	\
 	} while (0)
 
+#define CPDMA_DMA_EXT_MAP		BIT(16)
+
 static void cpdma_desc_pool_destroy(struct cpdma_ctlr *ctlr)
 {
 	struct cpdma_desc_pool *pool = ctlr->pool;
@@ -1015,6 +1018,7 @@ static int cpdma_chan_submit_si(struct submit_info *si)
 	struct cpdma_chan		*chan = si->chan;
 	struct cpdma_ctlr		*ctlr = chan->ctlr;
 	int				len = si->len;
+	int				swlen = len;
 	struct cpdma_desc __iomem	*desc;
 	dma_addr_t			buffer;
 	u32				mode;
@@ -1036,16 +1040,22 @@ static int cpdma_chan_submit_si(struct submit_info *si)
 		chan->stats.runt_transmit_buff++;
 	}
 
-	buffer = dma_map_single(ctlr->dev, si->data, len, chan->dir);
-	ret = dma_mapping_error(ctlr->dev, buffer);
-	if (ret) {
-		cpdma_desc_free(ctlr->pool, desc, 1);
-		return -EINVAL;
-	}
-
 	mode = CPDMA_DESC_OWNER | CPDMA_DESC_SOP | CPDMA_DESC_EOP;
 	cpdma_desc_to_port(chan, mode, si->directed);
 
+	if (si->flags & CPDMA_DMA_EXT_MAP) {
+		buffer = (u32)si->data;
+		dma_sync_single_for_device(ctlr->dev, buffer, len, chan->dir);
+		swlen |= CPDMA_DMA_EXT_MAP;
+	} else {
+		buffer = dma_map_single(ctlr->dev, si->data, len, chan->dir);
+		ret = dma_mapping_error(ctlr->dev, buffer);
+		if (ret) {
+			cpdma_desc_free(ctlr->pool, desc, 1);
+			return -EINVAL;
+		}
+	}
+
 	/* Relaxed IO accessors can be used here as there is read barrier
 	 * at the end of write sequence.
 	 */
@@ -1055,7 +1065,7 @@ static int cpdma_chan_submit_si(struct submit_info *si)
 	writel_relaxed(mode | len, &desc->hw_mode);
 	writel_relaxed((uintptr_t)si->token, &desc->sw_token);
 	writel_relaxed(buffer, &desc->sw_buffer);
-	writel_relaxed(len, &desc->sw_len);
+	writel_relaxed(swlen, &desc->sw_len);
 	desc_read(desc, sw_len);
 
 	__cpdma_chan_submit(chan, desc);
@@ -1079,6 +1089,32 @@ int cpdma_chan_idle_submit(struct cpdma_chan *chan, void *token, void *data,
 	si.data = data;
 	si.len = len;
 	si.directed = directed;
+	si.flags = 0;
+
+	spin_lock_irqsave(&chan->lock, flags);
+	if (chan->state == CPDMA_STATE_TEARDOWN) {
+		spin_unlock_irqrestore(&chan->lock, flags);
+		return -EINVAL;
+	}
+
+	ret = cpdma_chan_submit_si(&si);
+	spin_unlock_irqrestore(&chan->lock, flags);
+	return ret;
+}
+
+int cpdma_chan_idle_submit_mapped(struct cpdma_chan *chan, void *token,
+				  dma_addr_t data, int len, int directed)
+{
+	struct submit_info si;
+	unsigned long flags;
+	int ret;
+
+	si.chan = chan;
+	si.token = token;
+	si.data = (void *)(u32)data;
+	si.len = len;
+	si.directed = directed;
+	si.flags = CPDMA_DMA_EXT_MAP;
 
 	spin_lock_irqsave(&chan->lock, flags);
 	if (chan->state == CPDMA_STATE_TEARDOWN) {
@@ -1103,6 +1139,32 @@ int cpdma_chan_submit(struct cpdma_chan *chan, void *token, void *data,
 	si.data = data;
 	si.len = len;
 	si.directed = directed;
+	si.flags = 0;
+
+	spin_lock_irqsave(&chan->lock, flags);
+	if (chan->state != CPDMA_STATE_ACTIVE) {
+		spin_unlock_irqrestore(&chan->lock, flags);
+		return -EINVAL;
+	}
+
+	ret = cpdma_chan_submit_si(&si);
+	spin_unlock_irqrestore(&chan->lock, flags);
+	return ret;
+}
+
+int cpdma_chan_submit_mapped(struct cpdma_chan *chan, void *token,
+			     dma_addr_t data, int len, int directed)
+{
+	struct submit_info si;
+	unsigned long flags;
+	int ret;
+
+	si.chan = chan;
+	si.token = token;
+	si.data = (void *)(u32)data;
+	si.len = len;
+	si.directed = directed;
+	si.flags = CPDMA_DMA_EXT_MAP;
 
 	spin_lock_irqsave(&chan->lock, flags);
 	if (chan->state != CPDMA_STATE_ACTIVE) {
@@ -1140,10 +1202,17 @@ static void __cpdma_chan_free(struct cpdma_chan *chan,
 	uintptr_t			token;
 
 	token      = desc_read(desc, sw_token);
-	buff_dma   = desc_read(desc, sw_buffer);
 	origlen    = desc_read(desc, sw_len);
 
-	dma_unmap_single(ctlr->dev, buff_dma, origlen, chan->dir);
+	buff_dma   = desc_read(desc, sw_buffer);
+	if (origlen & CPDMA_DMA_EXT_MAP) {
+		origlen &= ~CPDMA_DMA_EXT_MAP;
+		dma_sync_single_for_cpu(ctlr->dev, buff_dma, origlen,
+					chan->dir);
+	} else {
+		dma_unmap_single(ctlr->dev, buff_dma, origlen, chan->dir);
+	}
+
 	cpdma_desc_free(pool, desc, 1);
 	(*chan->handler)((void *)token, outlen, status);
 }
diff --git a/drivers/net/ethernet/ti/davinci_cpdma.h b/drivers/net/ethernet/ti/davinci_cpdma.h
index 9343c8c73c1b..0271a20c2e09 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.h
+++ b/drivers/net/ethernet/ti/davinci_cpdma.h
@@ -77,8 +77,12 @@ int cpdma_chan_stop(struct cpdma_chan *chan);
 
 int cpdma_chan_get_stats(struct cpdma_chan *chan,
 			 struct cpdma_chan_stats *stats);
+int cpdma_chan_submit_mapped(struct cpdma_chan *chan, void *token,
+			     dma_addr_t data, int len, int directed);
 int cpdma_chan_submit(struct cpdma_chan *chan, void *token, void *data,
 		      int len, int directed);
+int cpdma_chan_idle_submit_mapped(struct cpdma_chan *chan, void *token,
+				  dma_addr_t data, int len, int directed);
 int cpdma_chan_idle_submit(struct cpdma_chan *chan, void *token, void *data,
 			   int len, int directed);
 int cpdma_chan_process(struct cpdma_chan *chan, int quota);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v5 net-next 3/6] net: ethernet: ti: davinci_cpdma: return handler status
  2019-06-30 17:23 [PATCH v5 net-next 0/6] net: ethernet: ti: cpsw: Add XDP support Ivan Khoronzhuk
  2019-06-30 17:23 ` [PATCH v5 net-next 1/6] xdp: allow same allocator usage Ivan Khoronzhuk
  2019-06-30 17:23 ` [PATCH v5 net-next 2/6] net: ethernet: ti: davinci_cpdma: add dma mapped submit Ivan Khoronzhuk
@ 2019-06-30 17:23 ` Ivan Khoronzhuk
  2019-06-30 17:23 ` [PATCH v5 net-next 4/6] net: ethernet: ti: davinci_cpdma: allow desc split while down Ivan Khoronzhuk
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 30+ messages in thread
From: Ivan Khoronzhuk @ 2019-06-30 17:23 UTC (permalink / raw)
  To: grygorii.strashko, hawk, davem
  Cc: ast, linux-kernel, linux-omap, xdp-newbies, ilias.apalodimas,
	netdev, daniel, jakub.kicinski, john.fastabend, Ivan Khoronzhuk

This change is needed to return flush status of rx handler for
flushing redirected xdp frames after processing channel packets.
Do it as separate patch for simplicity.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
 drivers/net/ethernet/ti/cpsw.c          | 23 +++++++++++++---------
 drivers/net/ethernet/ti/cpsw_ethtool.c  |  2 +-
 drivers/net/ethernet/ti/cpsw_priv.h     |  2 +-
 drivers/net/ethernet/ti/davinci_cpdma.c | 26 ++++++++++++++-----------
 drivers/net/ethernet/ti/davinci_cpdma.h |  4 ++--
 drivers/net/ethernet/ti/davinci_emac.c  | 17 ++++++++++------
 6 files changed, 44 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 32b7b3b74a6b..4f72dbb5a428 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -337,7 +337,7 @@ void cpsw_intr_disable(struct cpsw_common *cpsw)
 	return;
 }
 
-void cpsw_tx_handler(void *token, int len, int status)
+int cpsw_tx_handler(void *token, int len, int status)
 {
 	struct netdev_queue	*txq;
 	struct sk_buff		*skb = token;
@@ -355,6 +355,7 @@ void cpsw_tx_handler(void *token, int len, int status)
 	ndev->stats.tx_packets++;
 	ndev->stats.tx_bytes += len;
 	dev_kfree_skb_any(skb);
+	return 0;
 }
 
 static void cpsw_rx_vlan_encap(struct sk_buff *skb)
@@ -400,7 +401,7 @@ static void cpsw_rx_vlan_encap(struct sk_buff *skb)
 	}
 }
 
-static void cpsw_rx_handler(void *token, int len, int status)
+static int cpsw_rx_handler(void *token, int len, int status)
 {
 	struct cpdma_chan	*ch;
 	struct sk_buff		*skb = token;
@@ -434,7 +435,7 @@ static void cpsw_rx_handler(void *token, int len, int status)
 
 		/* the interface is going down, skbs are purged */
 		dev_kfree_skb_any(skb);
-		return;
+		return 0;
 	}
 
 	new_skb = netdev_alloc_skb_ip_align(ndev, cpsw->rx_packet_max);
@@ -464,6 +465,8 @@ static void cpsw_rx_handler(void *token, int len, int status)
 		WARN_ON(ret == -ENOMEM);
 		dev_kfree_skb_any(new_skb);
 	}
+
+	return 0;
 }
 
 void cpsw_split_res(struct cpsw_common *cpsw)
@@ -588,6 +591,7 @@ static int cpsw_tx_mq_poll(struct napi_struct *napi_tx, int budget)
 	u32			ch_map;
 	int			num_tx, cur_budget, ch;
 	struct cpsw_common	*cpsw = napi_to_cpsw(napi_tx);
+	int			flags;
 	struct cpsw_vector	*txv;
 
 	/* process every unprocessed channel */
@@ -602,7 +606,7 @@ static int cpsw_tx_mq_poll(struct napi_struct *napi_tx, int budget)
 		else
 			cur_budget = txv->budget;
 
-		num_tx += cpdma_chan_process(txv->ch, cur_budget);
+		num_tx += cpdma_chan_process(txv->ch, cur_budget, &flags);
 		if (num_tx >= budget)
 			break;
 	}
@@ -618,9 +622,9 @@ static int cpsw_tx_mq_poll(struct napi_struct *napi_tx, int budget)
 static int cpsw_tx_poll(struct napi_struct *napi_tx, int budget)
 {
 	struct cpsw_common *cpsw = napi_to_cpsw(napi_tx);
-	int num_tx;
+	int num_tx, flags;
 
-	num_tx = cpdma_chan_process(cpsw->txv[0].ch, budget);
+	num_tx = cpdma_chan_process(cpsw->txv[0].ch, budget, &flags);
 	if (num_tx < budget) {
 		napi_complete(napi_tx);
 		writel(0xff, &cpsw->wr_regs->tx_en);
@@ -638,6 +642,7 @@ static int cpsw_rx_mq_poll(struct napi_struct *napi_rx, int budget)
 	u32			ch_map;
 	int			num_rx, cur_budget, ch;
 	struct cpsw_common	*cpsw = napi_to_cpsw(napi_rx);
+	int			flags;
 	struct cpsw_vector	*rxv;
 
 	/* process every unprocessed channel */
@@ -652,7 +657,7 @@ static int cpsw_rx_mq_poll(struct napi_struct *napi_rx, int budget)
 		else
 			cur_budget = rxv->budget;
 
-		num_rx += cpdma_chan_process(rxv->ch, cur_budget);
+		num_rx += cpdma_chan_process(rxv->ch, cur_budget, &flags);
 		if (num_rx >= budget)
 			break;
 	}
@@ -668,9 +673,9 @@ static int cpsw_rx_mq_poll(struct napi_struct *napi_rx, int budget)
 static int cpsw_rx_poll(struct napi_struct *napi_rx, int budget)
 {
 	struct cpsw_common *cpsw = napi_to_cpsw(napi_rx);
-	int num_rx;
+	int num_rx, flags;
 
-	num_rx = cpdma_chan_process(cpsw->rxv[0].ch, budget);
+	num_rx = cpdma_chan_process(cpsw->rxv[0].ch, budget, &flags);
 	if (num_rx < budget) {
 		napi_complete_done(napi_rx, num_rx);
 		writel(0xff, &cpsw->wr_regs->rx_en);
diff --git a/drivers/net/ethernet/ti/cpsw_ethtool.c b/drivers/net/ethernet/ti/cpsw_ethtool.c
index f60dc1dfc443..7c19eebbabcc 100644
--- a/drivers/net/ethernet/ti/cpsw_ethtool.c
+++ b/drivers/net/ethernet/ti/cpsw_ethtool.c
@@ -532,8 +532,8 @@ static int cpsw_update_channels_res(struct cpsw_priv *priv, int ch_num, int rx,
 				    cpdma_handler_fn rx_handler)
 {
 	struct cpsw_common *cpsw = priv->cpsw;
-	void (*handler)(void *, int, int);
 	struct netdev_queue *queue;
+	cpdma_handler_fn handler;
 	struct cpsw_vector *vec;
 	int ret, *ch, vch;
 
diff --git a/drivers/net/ethernet/ti/cpsw_priv.h b/drivers/net/ethernet/ti/cpsw_priv.h
index 04795b97ee71..2ecb3af59fe9 100644
--- a/drivers/net/ethernet/ti/cpsw_priv.h
+++ b/drivers/net/ethernet/ti/cpsw_priv.h
@@ -390,7 +390,7 @@ void cpsw_split_res(struct cpsw_common *cpsw);
 int cpsw_fill_rx_channels(struct cpsw_priv *priv);
 void cpsw_intr_enable(struct cpsw_common *cpsw);
 void cpsw_intr_disable(struct cpsw_common *cpsw);
-void cpsw_tx_handler(void *token, int len, int status);
+int cpsw_tx_handler(void *token, int len, int status);
 
 /* ethtool */
 u32 cpsw_get_msglevel(struct net_device *ndev);
diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
index 8da46394c0e7..ea25b23c8058 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -1191,15 +1191,16 @@ bool cpdma_check_free_tx_desc(struct cpdma_chan *chan)
 	return free_tx_desc;
 }
 
-static void __cpdma_chan_free(struct cpdma_chan *chan,
-			      struct cpdma_desc __iomem *desc,
-			      int outlen, int status)
+static int __cpdma_chan_free(struct cpdma_chan *chan,
+			     struct cpdma_desc __iomem *desc, int outlen,
+			     int status)
 {
 	struct cpdma_ctlr		*ctlr = chan->ctlr;
 	struct cpdma_desc_pool		*pool = ctlr->pool;
 	dma_addr_t			buff_dma;
 	int				origlen;
 	uintptr_t			token;
+	int				ret;
 
 	token      = desc_read(desc, sw_token);
 	origlen    = desc_read(desc, sw_len);
@@ -1214,14 +1215,16 @@ static void __cpdma_chan_free(struct cpdma_chan *chan,
 	}
 
 	cpdma_desc_free(pool, desc, 1);
-	(*chan->handler)((void *)token, outlen, status);
+	ret = (*chan->handler)((void *)token, outlen, status);
+
+	return ret;
 }
 
 static int __cpdma_chan_process(struct cpdma_chan *chan)
 {
+	int				status, outlen, ret;
 	struct cpdma_ctlr		*ctlr = chan->ctlr;
 	struct cpdma_desc __iomem	*desc;
-	int				status, outlen;
 	int				cb_status = 0;
 	struct cpdma_desc_pool		*pool = ctlr->pool;
 	dma_addr_t			desc_dma;
@@ -1232,7 +1235,7 @@ static int __cpdma_chan_process(struct cpdma_chan *chan)
 	desc = chan->head;
 	if (!desc) {
 		chan->stats.empty_dequeue++;
-		status = -ENOENT;
+		ret = -ENOENT;
 		goto unlock_ret;
 	}
 	desc_dma = desc_phys(pool, desc);
@@ -1241,7 +1244,7 @@ static int __cpdma_chan_process(struct cpdma_chan *chan)
 	outlen	= status & 0x7ff;
 	if (status & CPDMA_DESC_OWNER) {
 		chan->stats.busy_dequeue++;
-		status = -EBUSY;
+		ret = -EBUSY;
 		goto unlock_ret;
 	}
 
@@ -1267,15 +1270,15 @@ static int __cpdma_chan_process(struct cpdma_chan *chan)
 	else
 		cb_status = status;
 
-	__cpdma_chan_free(chan, desc, outlen, cb_status);
-	return status;
+	ret = __cpdma_chan_free(chan, desc, outlen, cb_status);
+	return ret;
 
 unlock_ret:
 	spin_unlock_irqrestore(&chan->lock, flags);
-	return status;
+	return ret;
 }
 
-int cpdma_chan_process(struct cpdma_chan *chan, int quota)
+int cpdma_chan_process(struct cpdma_chan *chan, int quota, int *flags)
 {
 	int used = 0, ret = 0;
 
@@ -1286,6 +1289,7 @@ int cpdma_chan_process(struct cpdma_chan *chan, int quota)
 		ret = __cpdma_chan_process(chan);
 		if (ret < 0)
 			break;
+		*flags |= ret;
 		used++;
 	}
 	return used;
diff --git a/drivers/net/ethernet/ti/davinci_cpdma.h b/drivers/net/ethernet/ti/davinci_cpdma.h
index 0271a20c2e09..aafa8889c789 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.h
+++ b/drivers/net/ethernet/ti/davinci_cpdma.h
@@ -61,7 +61,7 @@ struct cpdma_chan_stats {
 struct cpdma_ctlr;
 struct cpdma_chan;
 
-typedef void (*cpdma_handler_fn)(void *token, int len, int status);
+typedef int (*cpdma_handler_fn)(void *token, int len, int status);
 
 struct cpdma_ctlr *cpdma_ctlr_create(struct cpdma_params *params);
 int cpdma_ctlr_destroy(struct cpdma_ctlr *ctlr);
@@ -85,7 +85,7 @@ int cpdma_chan_idle_submit_mapped(struct cpdma_chan *chan, void *token,
 				  dma_addr_t data, int len, int directed);
 int cpdma_chan_idle_submit(struct cpdma_chan *chan, void *token, void *data,
 			   int len, int directed);
-int cpdma_chan_process(struct cpdma_chan *chan, int quota);
+int cpdma_chan_process(struct cpdma_chan *chan, int quota, int *flags);
 
 int cpdma_ctlr_int_ctrl(struct cpdma_ctlr *ctlr, bool enable);
 void cpdma_ctlr_eoi(struct cpdma_ctlr *ctlr, u32 value);
diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c
index 5f4ece0d5a73..06756471d586 100644
--- a/drivers/net/ethernet/ti/davinci_emac.c
+++ b/drivers/net/ethernet/ti/davinci_emac.c
@@ -860,7 +860,7 @@ static struct sk_buff *emac_rx_alloc(struct emac_priv *priv)
 	return skb;
 }
 
-static void emac_rx_handler(void *token, int len, int status)
+static int emac_rx_handler(void *token, int len, int status)
 {
 	struct sk_buff		*skb = token;
 	struct net_device	*ndev = skb->dev;
@@ -871,7 +871,7 @@ static void emac_rx_handler(void *token, int len, int status)
 	/* free and bail if we are shutting down */
 	if (unlikely(!netif_running(ndev))) {
 		dev_kfree_skb_any(skb);
-		return;
+		return 0;
 	}
 
 	/* recycle on receive error */
@@ -892,7 +892,7 @@ static void emac_rx_handler(void *token, int len, int status)
 	if (!skb) {
 		if (netif_msg_rx_err(priv) && net_ratelimit())
 			dev_err(emac_dev, "failed rx buffer alloc\n");
-		return;
+		return 0;
 	}
 
 recycle:
@@ -902,9 +902,11 @@ static void emac_rx_handler(void *token, int len, int status)
 	WARN_ON(ret == -ENOMEM);
 	if (unlikely(ret < 0))
 		dev_kfree_skb_any(skb);
+
+	return 0;
 }
 
-static void emac_tx_handler(void *token, int len, int status)
+static int emac_tx_handler(void *token, int len, int status)
 {
 	struct sk_buff		*skb = token;
 	struct net_device	*ndev = skb->dev;
@@ -917,6 +919,7 @@ static void emac_tx_handler(void *token, int len, int status)
 	ndev->stats.tx_packets++;
 	ndev->stats.tx_bytes += len;
 	dev_kfree_skb_any(skb);
+	return 0;
 }
 
 /**
@@ -1227,6 +1230,7 @@ static int emac_poll(struct napi_struct *napi, int budget)
 	struct device *emac_dev = &ndev->dev;
 	u32 status = 0;
 	u32 num_tx_pkts = 0, num_rx_pkts = 0;
+	int flags;
 
 	/* Check interrupt vectors and call packet processing */
 	status = emac_read(EMAC_MACINVECTOR);
@@ -1238,7 +1242,8 @@ static int emac_poll(struct napi_struct *napi, int budget)
 
 	if (status & mask) {
 		num_tx_pkts = cpdma_chan_process(priv->txchan,
-					      EMAC_DEF_TX_MAX_SERVICE);
+						 EMAC_DEF_TX_MAX_SERVICE,
+						 &flags);
 	} /* TX processing */
 
 	mask = EMAC_DM644X_MAC_IN_VECTOR_RX_INT_VEC;
@@ -1247,7 +1252,7 @@ static int emac_poll(struct napi_struct *napi, int budget)
 		mask = EMAC_DM646X_MAC_IN_VECTOR_RX_INT_VEC;
 
 	if (status & mask) {
-		num_rx_pkts = cpdma_chan_process(priv->rxchan, budget);
+		num_rx_pkts = cpdma_chan_process(priv->rxchan, budget, &flags);
 	} /* RX processing */
 
 	mask = EMAC_DM644X_MAC_IN_VECTOR_HOST_INT;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v5 net-next 4/6] net: ethernet: ti: davinci_cpdma: allow desc split while down
  2019-06-30 17:23 [PATCH v5 net-next 0/6] net: ethernet: ti: cpsw: Add XDP support Ivan Khoronzhuk
                   ` (2 preceding siblings ...)
  2019-06-30 17:23 ` [PATCH v5 net-next 3/6] net: ethernet: ti: davinci_cpdma: return handler status Ivan Khoronzhuk
@ 2019-06-30 17:23 ` Ivan Khoronzhuk
  2019-06-30 17:23 ` [PATCH v5 net-next 5/6] net: ethernet: ti: cpsw_ethtool: allow res " Ivan Khoronzhuk
  2019-06-30 17:23 ` [PATCH v5 net-next 6/6] net: ethernet: ti: cpsw: add XDP support Ivan Khoronzhuk
  5 siblings, 0 replies; 30+ messages in thread
From: Ivan Khoronzhuk @ 2019-06-30 17:23 UTC (permalink / raw)
  To: grygorii.strashko, hawk, davem
  Cc: ast, linux-kernel, linux-omap, xdp-newbies, ilias.apalodimas,
	netdev, daniel, jakub.kicinski, john.fastabend, Ivan Khoronzhuk

That's possible to set ring params while interfaces are down. When
interface gets up it uses number of descs to fill rx queue and on
later on changes to create rx pools. Usually, this resplit can happen
after phy is up, but it can be needed before this, so allow it to
happen while setting number of rx descs, when interfaces are down.
Also, if no dependency on intf state, move it to cpdma layer, where
it should be.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
 drivers/net/ethernet/ti/cpsw_ethtool.c  |  9 ++++-----
 drivers/net/ethernet/ti/davinci_cpdma.c | 10 +++++++++-
 drivers/net/ethernet/ti/davinci_cpdma.h |  3 +--
 3 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw_ethtool.c b/drivers/net/ethernet/ti/cpsw_ethtool.c
index 7c19eebbabcc..6ab0cec8560a 100644
--- a/drivers/net/ethernet/ti/cpsw_ethtool.c
+++ b/drivers/net/ethernet/ti/cpsw_ethtool.c
@@ -664,15 +664,14 @@ int cpsw_set_ringparam(struct net_device *ndev,
 
 	cpsw_suspend_data_pass(ndev);
 
-	cpdma_set_num_rx_descs(cpsw->dma, ering->rx_pending);
-
-	if (cpsw->usage_count)
-		cpdma_chan_split_pool(cpsw->dma);
+	ret = cpdma_set_num_rx_descs(cpsw->dma, ering->rx_pending);
+	if (ret)
+		goto err;
 
 	ret = cpsw_resume_data_pass(ndev);
 	if (!ret)
 		return 0;
-
+err:
 	dev_err(cpsw->dev, "cannot set ring params, closing device\n");
 	dev_close(ndev);
 	return ret;
diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
index ea25b23c8058..7dc2c1ee6238 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -1427,8 +1427,16 @@ int cpdma_get_num_tx_descs(struct cpdma_ctlr *ctlr)
 	return ctlr->num_tx_desc;
 }
 
-void cpdma_set_num_rx_descs(struct cpdma_ctlr *ctlr, int num_rx_desc)
+int cpdma_set_num_rx_descs(struct cpdma_ctlr *ctlr, int num_rx_desc)
 {
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&ctlr->lock, flags);
 	ctlr->num_rx_desc = num_rx_desc;
 	ctlr->num_tx_desc = ctlr->pool->num_desc - ctlr->num_rx_desc;
+	ret = cpdma_chan_split_pool(ctlr);
+	spin_unlock_irqrestore(&ctlr->lock, flags);
+
+	return ret;
 }
diff --git a/drivers/net/ethernet/ti/davinci_cpdma.h b/drivers/net/ethernet/ti/davinci_cpdma.h
index aafa8889c789..df66b8c797ee 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.h
+++ b/drivers/net/ethernet/ti/davinci_cpdma.h
@@ -116,8 +116,7 @@ enum cpdma_control {
 int cpdma_control_get(struct cpdma_ctlr *ctlr, int control);
 int cpdma_control_set(struct cpdma_ctlr *ctlr, int control, int value);
 int cpdma_get_num_rx_descs(struct cpdma_ctlr *ctlr);
-void cpdma_set_num_rx_descs(struct cpdma_ctlr *ctlr, int num_rx_desc);
+int cpdma_set_num_rx_descs(struct cpdma_ctlr *ctlr, int num_rx_desc);
 int cpdma_get_num_tx_descs(struct cpdma_ctlr *ctlr);
-int cpdma_chan_split_pool(struct cpdma_ctlr *ctlr);
 
 #endif
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v5 net-next 5/6] net: ethernet: ti: cpsw_ethtool: allow res split while down
  2019-06-30 17:23 [PATCH v5 net-next 0/6] net: ethernet: ti: cpsw: Add XDP support Ivan Khoronzhuk
                   ` (3 preceding siblings ...)
  2019-06-30 17:23 ` [PATCH v5 net-next 4/6] net: ethernet: ti: davinci_cpdma: allow desc split while down Ivan Khoronzhuk
@ 2019-06-30 17:23 ` Ivan Khoronzhuk
  2019-06-30 17:23 ` [PATCH v5 net-next 6/6] net: ethernet: ti: cpsw: add XDP support Ivan Khoronzhuk
  5 siblings, 0 replies; 30+ messages in thread
From: Ivan Khoronzhuk @ 2019-06-30 17:23 UTC (permalink / raw)
  To: grygorii.strashko, hawk, davem
  Cc: ast, linux-kernel, linux-omap, xdp-newbies, ilias.apalodimas,
	netdev, daniel, jakub.kicinski, john.fastabend, Ivan Khoronzhuk

That's possible to set channel num while interfaces are down. When
interface gets up it should resplit budget. This resplit can happen
after phy is up but only if speed is changed, so should be set before
this, for this allow it to happen while changing number of channels,
when interfaces are down.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
 drivers/net/ethernet/ti/cpsw_ethtool.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw_ethtool.c b/drivers/net/ethernet/ti/cpsw_ethtool.c
index 6ab0cec8560a..99935c1d265d 100644
--- a/drivers/net/ethernet/ti/cpsw_ethtool.c
+++ b/drivers/net/ethernet/ti/cpsw_ethtool.c
@@ -620,8 +620,7 @@ int cpsw_set_channels_common(struct net_device *ndev,
 		}
 	}
 
-	if (cpsw->usage_count)
-		cpsw_split_res(cpsw);
+	cpsw_split_res(cpsw);
 
 	ret = cpsw_resume_data_pass(ndev);
 	if (!ret)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v5 net-next 6/6] net: ethernet: ti: cpsw: add XDP support
  2019-06-30 17:23 [PATCH v5 net-next 0/6] net: ethernet: ti: cpsw: Add XDP support Ivan Khoronzhuk
                   ` (4 preceding siblings ...)
  2019-06-30 17:23 ` [PATCH v5 net-next 5/6] net: ethernet: ti: cpsw_ethtool: allow res " Ivan Khoronzhuk
@ 2019-06-30 17:23 ` Ivan Khoronzhuk
  2019-07-01 16:19   ` Jesper Dangaard Brouer
  2019-07-03  7:26   ` [PATCH v5 net-next 6/6] net: ethernet: ti: cpsw: add XDP support Jesper Dangaard Brouer
  5 siblings, 2 replies; 30+ messages in thread
From: Ivan Khoronzhuk @ 2019-06-30 17:23 UTC (permalink / raw)
  To: grygorii.strashko, hawk, davem
  Cc: ast, linux-kernel, linux-omap, xdp-newbies, ilias.apalodimas,
	netdev, daniel, jakub.kicinski, john.fastabend, Ivan Khoronzhuk

Add XDP support based on rx page_pool allocator, one frame per page.
Page pool allocator is used with assumption that only one rx_handler
is running simultaneously. DMA map/unmap is reused from page pool
despite there is no need to map whole page.

Due to specific of cpsw, the same TX/RX handler can be used by 2
network devices, so special fields in buffer are added to identify
an interface the frame is destined to. Thus XDP works for both
interfaces, that allows to test xdp redirect between two interfaces
easily. Aslo, each rx queue have own page pools, but common for both
netdevs.

XDP prog is common for all channels till appropriate changes are added
in XDP infrastructure. Also, once page_pool recycling becomes part of
skb netstack some simplifications can be added, like removing
page_pool_release_page() before skb receive.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
 drivers/net/ethernet/ti/Kconfig        |   1 +
 drivers/net/ethernet/ti/cpsw.c         | 501 ++++++++++++++++++++++---
 drivers/net/ethernet/ti/cpsw_ethtool.c |  66 +++-
 drivers/net/ethernet/ti/cpsw_priv.h    |   7 +
 4 files changed, 515 insertions(+), 60 deletions(-)

diff --git a/drivers/net/ethernet/ti/Kconfig b/drivers/net/ethernet/ti/Kconfig
index a800d3417411..834afca3a019 100644
--- a/drivers/net/ethernet/ti/Kconfig
+++ b/drivers/net/ethernet/ti/Kconfig
@@ -50,6 +50,7 @@ config TI_CPSW
 	depends on ARCH_DAVINCI || ARCH_OMAP2PLUS || COMPILE_TEST
 	select TI_DAVINCI_MDIO
 	select MFD_SYSCON
+	select PAGE_POOL
 	select REGMAP
 	---help---
 	  This driver supports TI's CPSW Ethernet Switch.
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 4f72dbb5a428..62ad8d4e231d 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -31,6 +31,10 @@
 #include <linux/if_vlan.h>
 #include <linux/kmemleak.h>
 #include <linux/sys_soc.h>
+#include <net/page_pool.h>
+#include <linux/bpf.h>
+#include <linux/bpf_trace.h>
+#include <linux/filter.h>
 
 #include <linux/pinctrl/consumer.h>
 #include <net/pkt_cls.h>
@@ -60,6 +64,10 @@ static int descs_pool_size = CPSW_CPDMA_DESCS_POOL_SIZE_DEFAULT;
 module_param(descs_pool_size, int, 0444);
 MODULE_PARM_DESC(descs_pool_size, "Number of CPDMA CPPI descriptors in pool");
 
+/* The buf includes headroom compatible with both skb and xdpf */
+#define CPSW_HEADROOM_NA (max(XDP_PACKET_HEADROOM, NET_SKB_PAD) + NET_IP_ALIGN)
+#define CPSW_HEADROOM  ALIGN(CPSW_HEADROOM_NA, sizeof(long))
+
 #define for_each_slave(priv, func, arg...)				\
 	do {								\
 		struct cpsw_slave *slave;				\
@@ -74,6 +82,13 @@ MODULE_PARM_DESC(descs_pool_size, "Number of CPDMA CPPI descriptors in pool");
 				(func)(slave++, ##arg);			\
 	} while (0)
 
+#define CPSW_XMETA_OFFSET	ALIGN(sizeof(struct xdp_frame), sizeof(long))
+
+#define CPSW_XDP_CONSUMED		1
+#define CPSW_XDP_CONSUMED_FLUSH		2
+#define CPSW_XDP_PASS			0
+#define CPSW_FLUSH_XDP_MAP		1
+
 static int cpsw_ndo_vlan_rx_add_vid(struct net_device *ndev,
 				    __be16 proto, u16 vid);
 
@@ -337,24 +352,58 @@ void cpsw_intr_disable(struct cpsw_common *cpsw)
 	return;
 }
 
+static int cpsw_is_xdpf_handle(void *handle)
+{
+	return (unsigned long)handle & BIT(0);
+}
+
+static void *cpsw_xdpf_to_handle(struct xdp_frame *xdpf)
+{
+	return (void *)((unsigned long)xdpf | BIT(0));
+}
+
+static struct xdp_frame *cpsw_handle_to_xdpf(void *handle)
+{
+	return (struct xdp_frame *)((unsigned long)handle & ~BIT(0));
+}
+
+struct __aligned(sizeof(long)) cpsw_meta_xdp {
+	struct net_device *ndev;
+	int ch;
+};
+
 int cpsw_tx_handler(void *token, int len, int status)
 {
+	struct cpsw_meta_xdp	*xmeta;
+	struct xdp_frame	*xdpf;
+	struct net_device	*ndev;
 	struct netdev_queue	*txq;
-	struct sk_buff		*skb = token;
-	struct net_device	*ndev = skb->dev;
-	struct cpsw_common	*cpsw = ndev_to_cpsw(ndev);
+	struct sk_buff		*skb;
+	int			ch;
+
+	if (cpsw_is_xdpf_handle(token)) {
+		xdpf = cpsw_handle_to_xdpf(token);
+		xmeta = (void *)xdpf + CPSW_XMETA_OFFSET;
+		ndev = xmeta->ndev;
+		ch = xmeta->ch;
+		xdp_return_frame(xdpf);
+	} else {
+		skb = token;
+		ndev = skb->dev;
+		ch = skb_get_queue_mapping(skb);
+		cpts_tx_timestamp(ndev_to_cpsw(ndev)->cpts, skb);
+		dev_kfree_skb_any(skb);
+	}
 
 	/* Check whether the queue is stopped due to stalled tx dma, if the
 	 * queue is stopped then start the queue as we have free desc for tx
 	 */
-	txq = netdev_get_tx_queue(ndev, skb_get_queue_mapping(skb));
+	txq = netdev_get_tx_queue(ndev, ch);
 	if (unlikely(netif_tx_queue_stopped(txq)))
 		netif_tx_wake_queue(txq);
 
-	cpts_tx_timestamp(cpsw->cpts, skb);
 	ndev->stats.tx_packets++;
 	ndev->stats.tx_bytes += len;
-	dev_kfree_skb_any(skb);
 	return 0;
 }
 
@@ -401,24 +450,233 @@ static void cpsw_rx_vlan_encap(struct sk_buff *skb)
 	}
 }
 
+static int cpsw_xdp_tx_frame(struct cpsw_priv *priv, struct xdp_frame *xdpf,
+			     struct page *page)
+{
+	struct cpsw_common *cpsw = priv->cpsw;
+	struct cpsw_meta_xdp *xmeta;
+	struct cpdma_chan *txch;
+	dma_addr_t dma;
+	int ret, port;
+
+	xmeta = (void *)xdpf + CPSW_XMETA_OFFSET;
+	xmeta->ndev = priv->ndev;
+	xmeta->ch = 0;
+	txch = cpsw->txv[0].ch;
+
+	port = priv->emac_port + cpsw->data.dual_emac;
+	if (page) {
+		dma = page_pool_get_dma_addr(page);
+		dma += xdpf->data - (void *)xdpf;
+		ret = cpdma_chan_submit_mapped(txch, cpsw_xdpf_to_handle(xdpf),
+					       dma, xdpf->len, port);
+	} else {
+		if (sizeof(*xmeta) > xdpf->headroom) {
+			xdp_return_frame_rx_napi(xdpf);
+			return -EINVAL;
+		}
+
+		ret = cpdma_chan_submit(txch, cpsw_xdpf_to_handle(xdpf),
+					xdpf->data, xdpf->len, port);
+	}
+
+	if (ret) {
+		priv->ndev->stats.tx_dropped++;
+		xdp_return_frame_rx_napi(xdpf);
+	}
+
+	return ret;
+}
+
+static int cpsw_run_xdp(struct cpsw_priv *priv, int ch, struct xdp_buff *xdp,
+			struct page *page)
+{
+	struct cpsw_common *cpsw = priv->cpsw;
+	struct net_device *ndev = priv->ndev;
+	int ret = CPSW_XDP_CONSUMED;
+	struct xdp_frame *xdpf;
+	struct bpf_prog *prog;
+	u32 act;
+
+	rcu_read_lock();
+
+	prog = READ_ONCE(priv->xdp_prog);
+	if (!prog) {
+		ret = CPSW_XDP_PASS;
+		goto out;
+	}
+
+	act = bpf_prog_run_xdp(prog, xdp);
+	switch (act) {
+	case XDP_PASS:
+		ret = CPSW_XDP_PASS;
+		break;
+	case XDP_TX:
+		xdpf = convert_to_xdp_frame(xdp);
+		if (unlikely(!xdpf))
+			goto drop;
+
+		cpsw_xdp_tx_frame(priv, xdpf, page);
+		break;
+	case XDP_REDIRECT:
+		if (xdp_do_redirect(ndev, xdp, prog))
+			goto drop;
+
+		ret = CPSW_XDP_CONSUMED_FLUSH;
+		break;
+	default:
+		bpf_warn_invalid_xdp_action(act);
+		/* fall through */
+	case XDP_ABORTED:
+		trace_xdp_exception(ndev, prog, act);
+		/* fall through -- handle aborts by dropping packet */
+	case XDP_DROP:
+		goto drop;
+	}
+out:
+	rcu_read_unlock();
+	return ret;
+drop:
+	rcu_read_unlock();
+	page_pool_recycle_direct(cpsw->page_pool[ch], page);
+	return ret;
+}
+
+static unsigned int cpsw_rxbuf_total_len(unsigned int len)
+{
+	len += CPSW_HEADROOM;
+	len += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+
+	return SKB_DATA_ALIGN(len);
+}
+
+static struct page_pool *cpsw_create_page_pool(struct cpsw_common *cpsw,
+					       int size)
+{
+	struct page_pool_params pp_params;
+	struct page_pool *pool;
+
+	pp_params.order = 0;
+	pp_params.flags = PP_FLAG_DMA_MAP;
+	pp_params.pool_size = size;
+	pp_params.nid = NUMA_NO_NODE;
+	pp_params.dma_dir = DMA_BIDIRECTIONAL;
+	pp_params.dev = cpsw->dev;
+
+	pool = page_pool_create(&pp_params);
+	if (IS_ERR(pool))
+		dev_err(cpsw->dev, "cannot create rx page pool\n");
+
+	return pool;
+}
+
+static int cpsw_create_rx_pool(struct cpsw_common *cpsw, int ch)
+{
+	struct page_pool *pool;
+	int ret = 0, pool_size;
+
+	pool_size = cpdma_chan_get_rx_buf_num(cpsw->rxv[ch].ch);
+	pool = cpsw_create_page_pool(cpsw, pool_size);
+	if (IS_ERR(pool))
+		ret = PTR_ERR(pool);
+	else
+		cpsw->page_pool[ch] = pool;
+
+	return ret;
+}
+
+static int cpsw_ndev_create_xdp_rxq(struct cpsw_priv *priv, int ch)
+{
+	struct cpsw_common *cpsw = priv->cpsw;
+	int ret, new_pool = false;
+	struct xdp_rxq_info *rxq;
+
+	rxq = &priv->xdp_rxq[ch];
+
+	ret = xdp_rxq_info_reg(rxq, priv->ndev, ch);
+	if (ret)
+		return ret;
+
+	if (!cpsw->page_pool[ch]) {
+		ret =  cpsw_create_rx_pool(cpsw, ch);
+		if (ret)
+			goto err_rxq;
+
+		new_pool = true;
+	}
+
+	ret = xdp_rxq_info_reg_mem_model(rxq, MEM_TYPE_PAGE_POOL,
+					 cpsw->page_pool[ch]);
+	if (!ret)
+		return 0;
+
+	if (new_pool) {
+		page_pool_free(cpsw->page_pool[ch]);
+		cpsw->page_pool[ch] = NULL;
+	}
+
+err_rxq:
+	xdp_rxq_info_unreg(rxq);
+	return ret;
+}
+
+void cpsw_ndev_destroy_xdp_rxqs(struct cpsw_priv *priv)
+{
+	struct cpsw_common *cpsw = priv->cpsw;
+	struct xdp_rxq_info *rxq;
+	int i;
+
+	for (i = 0; i < cpsw->rx_ch_num; i++) {
+		rxq = &priv->xdp_rxq[i];
+		if (xdp_rxq_info_is_reg(rxq))
+			xdp_rxq_info_unreg(rxq);
+	}
+}
+
+int cpsw_ndev_create_xdp_rxqs(struct cpsw_priv *priv)
+{
+	struct cpsw_common *cpsw = priv->cpsw;
+	int i, ret;
+
+	for (i = 0; i < cpsw->rx_ch_num; i++) {
+		ret = cpsw_ndev_create_xdp_rxq(priv, i);
+		if (ret)
+			goto err_cleanup;
+	}
+
+	return 0;
+
+err_cleanup:
+	cpsw_ndev_destroy_xdp_rxqs(priv);
+
+	return ret;
+}
+
 static int cpsw_rx_handler(void *token, int len, int status)
 {
-	struct cpdma_chan	*ch;
-	struct sk_buff		*skb = token;
-	struct sk_buff		*new_skb;
-	struct net_device	*ndev = skb->dev;
-	int			ret = 0, port;
-	struct cpsw_common	*cpsw = ndev_to_cpsw(ndev);
+	struct page		*new_page, *page = token;
+	void			*pa = page_address(page);
+	struct cpsw_meta_xdp	*xmeta = pa + CPSW_XMETA_OFFSET;
+	struct cpsw_common	*cpsw = ndev_to_cpsw(xmeta->ndev);
+	int			pkt_size = cpsw->rx_packet_max;
+	int			ret = 0, port, ch = xmeta->ch;
+	int			headroom = CPSW_HEADROOM;
+	struct net_device	*ndev = xmeta->ndev;
+	int			res = 0;
 	struct cpsw_priv	*priv;
+	struct page_pool	*pool;
+	struct sk_buff		*skb;
+	struct xdp_buff		xdp;
+	dma_addr_t		dma;
 
-	if (cpsw->data.dual_emac) {
+	if (cpsw->data.dual_emac && status >= 0) {
 		port = CPDMA_RX_SOURCE_PORT(status);
-		if (port) {
+		if (port)
 			ndev = cpsw->slaves[--port].ndev;
-			skb->dev = ndev;
-		}
 	}
 
+	priv = netdev_priv(ndev);
+	pool = cpsw->page_pool[ch];
 	if (unlikely(status < 0) || unlikely(!netif_running(ndev))) {
 		/* In dual emac mode check for all interfaces */
 		if (cpsw->data.dual_emac && cpsw->usage_count &&
@@ -427,46 +685,94 @@ static int cpsw_rx_handler(void *token, int len, int status)
 			 * is already down and the other interface is up
 			 * and running, instead of freeing which results
 			 * in reducing of the number of rx descriptor in
-			 * DMA engine, requeue skb back to cpdma.
+			 * DMA engine, requeue page back to cpdma.
 			 */
-			new_skb = skb;
+			new_page = page;
 			goto requeue;
 		}
 
-		/* the interface is going down, skbs are purged */
-		dev_kfree_skb_any(skb);
+		/* the interface is going down, pages are purged */
+		page_pool_recycle_direct(pool, page);
 		return 0;
 	}
 
-	new_skb = netdev_alloc_skb_ip_align(ndev, cpsw->rx_packet_max);
-	if (new_skb) {
-		skb_copy_queue_mapping(new_skb, skb);
-		skb_put(skb, len);
-		if (status & CPDMA_RX_VLAN_ENCAP)
-			cpsw_rx_vlan_encap(skb);
-		priv = netdev_priv(ndev);
-		if (priv->rx_ts_enabled)
-			cpts_rx_timestamp(cpsw->cpts, skb);
-		skb->protocol = eth_type_trans(skb, ndev);
-		netif_receive_skb(skb);
-		ndev->stats.rx_bytes += len;
-		ndev->stats.rx_packets++;
-		kmemleak_not_leak(new_skb);
-	} else {
+	new_page = page_pool_dev_alloc_pages(pool);
+	if (unlikely(!new_page)) {
+		new_page = page;
+		ndev->stats.rx_dropped++;
+		goto requeue;
+	}
+
+	if (priv->xdp_prog) {
+		if (status & CPDMA_RX_VLAN_ENCAP) {
+			xdp.data = pa + CPSW_HEADROOM +
+				   CPSW_RX_VLAN_ENCAP_HDR_SIZE;
+			xdp.data_end = xdp.data + len -
+				       CPSW_RX_VLAN_ENCAP_HDR_SIZE;
+		} else {
+			xdp.data = pa + CPSW_HEADROOM;
+			xdp.data_end = xdp.data + len;
+		}
+
+		xdp_set_data_meta_invalid(&xdp);
+
+		xdp.data_hard_start = pa;
+		xdp.rxq = &priv->xdp_rxq[ch];
+
+		ret = cpsw_run_xdp(priv, ch, &xdp, page);
+		if (ret != CPSW_XDP_PASS) {
+			if (ret == CPSW_XDP_CONSUMED_FLUSH)
+				res = CPSW_FLUSH_XDP_MAP;
+
+			goto requeue;
+		}
+
+		/* XDP prog might have changed packet data and boundaries */
+		len = xdp.data_end - xdp.data;
+		headroom = xdp.data - xdp.data_hard_start;
+
+		/* XDP prog can modify vlan tag, so can't use encap header */
+		status &= ~CPDMA_RX_VLAN_ENCAP;
+	}
+
+	/* pass skb to netstack if no XDP prog or returned XDP_PASS */
+	skb = build_skb(pa, cpsw_rxbuf_total_len(pkt_size));
+	if (!skb) {
 		ndev->stats.rx_dropped++;
-		new_skb = skb;
+		page_pool_recycle_direct(pool, page);
+		goto requeue;
 	}
 
+	skb_reserve(skb, headroom);
+	skb_put(skb, len);
+	skb->dev = ndev;
+	if (status & CPDMA_RX_VLAN_ENCAP)
+		cpsw_rx_vlan_encap(skb);
+	if (priv->rx_ts_enabled)
+		cpts_rx_timestamp(cpsw->cpts, skb);
+	skb->protocol = eth_type_trans(skb, ndev);
+
+	/* unmap page as no netstack skb page recycling */
+	page_pool_release_page(pool, page);
+	netif_receive_skb(skb);
+
+	ndev->stats.rx_bytes += len;
+	ndev->stats.rx_packets++;
+
 requeue:
-	ch = cpsw->rxv[skb_get_queue_mapping(new_skb)].ch;
-	ret = cpdma_chan_submit(ch, new_skb, new_skb->data,
-				skb_tailroom(new_skb), 0);
+	xmeta = page_address(new_page) + CPSW_XMETA_OFFSET;
+	xmeta->ndev = ndev;
+	xmeta->ch = ch;
+
+	dma = page_pool_get_dma_addr(new_page) + CPSW_HEADROOM;
+	ret = cpdma_chan_submit_mapped(cpsw->rxv[ch].ch, new_page, dma,
+				       pkt_size, 0);
 	if (ret < 0) {
 		WARN_ON(ret == -ENOMEM);
-		dev_kfree_skb_any(new_skb);
+		page_pool_recycle_direct(pool, new_page);
 	}
 
-	return 0;
+	return res;
 }
 
 void cpsw_split_res(struct cpsw_common *cpsw)
@@ -640,8 +946,8 @@ static int cpsw_tx_poll(struct napi_struct *napi_tx, int budget)
 static int cpsw_rx_mq_poll(struct napi_struct *napi_rx, int budget)
 {
 	u32			ch_map;
-	int			num_rx, cur_budget, ch;
 	struct cpsw_common	*cpsw = napi_to_cpsw(napi_rx);
+	int			num_rx, cur_budget, ch;
 	int			flags;
 	struct cpsw_vector	*rxv;
 
@@ -657,7 +963,11 @@ static int cpsw_rx_mq_poll(struct napi_struct *napi_rx, int budget)
 		else
 			cur_budget = rxv->budget;
 
+		flags = 0;
 		num_rx += cpdma_chan_process(rxv->ch, cur_budget, &flags);
+		if (flags & CPSW_FLUSH_XDP_MAP)
+			xdp_do_flush_map();
+
 		if (num_rx >= budget)
 			break;
 	}
@@ -673,9 +983,12 @@ static int cpsw_rx_mq_poll(struct napi_struct *napi_rx, int budget)
 static int cpsw_rx_poll(struct napi_struct *napi_rx, int budget)
 {
 	struct cpsw_common *cpsw = napi_to_cpsw(napi_rx);
-	int num_rx, flags;
+	int num_rx, flags = 0;
 
 	num_rx = cpdma_chan_process(cpsw->rxv[0].ch, budget, &flags);
+	if (flags & CPSW_FLUSH_XDP_MAP)
+		xdp_do_flush_map();
+
 	if (num_rx < budget) {
 		napi_complete_done(napi_rx, num_rx);
 		writel(0xff, &cpsw->wr_regs->rx_en);
@@ -1037,33 +1350,39 @@ static void cpsw_init_host_port(struct cpsw_priv *priv)
 int cpsw_fill_rx_channels(struct cpsw_priv *priv)
 {
 	struct cpsw_common *cpsw = priv->cpsw;
-	struct sk_buff *skb;
+	struct cpsw_meta_xdp *xmeta;
+	struct page_pool *pool;
+	struct page *page;
 	int ch_buf_num;
 	int ch, i, ret;
+	dma_addr_t dma;
 
 	for (ch = 0; ch < cpsw->rx_ch_num; ch++) {
+		pool = cpsw->page_pool[ch];
 		ch_buf_num = cpdma_chan_get_rx_buf_num(cpsw->rxv[ch].ch);
 		for (i = 0; i < ch_buf_num; i++) {
-			skb = __netdev_alloc_skb_ip_align(priv->ndev,
-							  cpsw->rx_packet_max,
-							  GFP_KERNEL);
-			if (!skb) {
-				cpsw_err(priv, ifup, "cannot allocate skb\n");
+			page = page_pool_dev_alloc_pages(pool);
+			if (!page) {
+				cpsw_err(priv, ifup, "allocate rx page err\n");
 				return -ENOMEM;
 			}
 
-			skb_set_queue_mapping(skb, ch);
-			ret = cpdma_chan_idle_submit(cpsw->rxv[ch].ch, skb,
-						     skb->data,
-						     skb_tailroom(skb), 0);
+			xmeta = page_address(page) + CPSW_XMETA_OFFSET;
+			xmeta->ndev = priv->ndev;
+			xmeta->ch = ch;
+
+			dma = page_pool_get_dma_addr(page) + CPSW_HEADROOM;
+			ret = cpdma_chan_idle_submit_mapped(cpsw->rxv[ch].ch,
+							    page, dma,
+							    cpsw->rx_packet_max,
+							    0);
 			if (ret < 0) {
 				cpsw_err(priv, ifup,
-					 "cannot submit skb to channel %d rx, error %d\n",
+					 "cannot submit page to channel %d rx, error %d\n",
 					 ch, ret);
-				kfree_skb(skb);
+				page_pool_recycle_direct(pool, page);
 				return ret;
 			}
-			kmemleak_not_leak(skb);
 		}
 
 		cpsw_info(priv, ifup, "ch %d rx, submitted %d descriptors\n",
@@ -1375,6 +1694,10 @@ static int cpsw_ndo_open(struct net_device *ndev)
 		cpsw_ale_add_vlan(cpsw->ale, cpsw->data.default_vlan,
 				  ALE_ALL_PORTS, ALE_ALL_PORTS, 0, 0);
 
+	ret = cpsw_ndev_create_xdp_rxqs(priv);
+	if (ret)
+		goto err_cleanup;
+
 	/* initialize shared resources for every ndev */
 	if (!cpsw->usage_count) {
 		/* disable priority elevation */
@@ -1427,9 +1750,10 @@ static int cpsw_ndo_open(struct net_device *ndev)
 err_cleanup:
 	if (!cpsw->usage_count) {
 		cpdma_ctlr_stop(cpsw->dma);
-		for_each_slave(priv, cpsw_slave_stop, cpsw);
+		memset(cpsw->page_pool, 0, sizeof(cpsw->page_pool));
 	}
 
+	for_each_slave(priv, cpsw_slave_stop, cpsw);
 	pm_runtime_put_sync(cpsw->dev);
 	netif_carrier_off(priv->ndev);
 	return ret;
@@ -1452,9 +1776,12 @@ static int cpsw_ndo_stop(struct net_device *ndev)
 		cpsw_intr_disable(cpsw);
 		cpdma_ctlr_stop(cpsw->dma);
 		cpsw_ale_stop(cpsw->ale);
+		memset(cpsw->page_pool, 0, sizeof(cpsw->page_pool));
 	}
 	for_each_slave(priv, cpsw_slave_stop, cpsw);
 
+	cpsw_ndev_destroy_xdp_rxqs(priv);
+
 	if (cpsw_need_resplit(cpsw))
 		cpsw_split_res(cpsw);
 
@@ -2009,6 +2336,64 @@ static int cpsw_ndo_setup_tc(struct net_device *ndev, enum tc_setup_type type,
 	}
 }
 
+static int cpsw_xdp_prog_setup(struct cpsw_priv *priv, struct netdev_bpf *bpf)
+{
+	struct bpf_prog *prog = bpf->prog;
+
+	if (!priv->xdpi.prog && !prog)
+		return 0;
+
+	if (!xdp_attachment_flags_ok(&priv->xdpi, bpf))
+		return -EBUSY;
+
+	WRITE_ONCE(priv->xdp_prog, prog);
+
+	xdp_attachment_setup(&priv->xdpi, bpf);
+
+	return 0;
+}
+
+static int cpsw_ndo_bpf(struct net_device *ndev, struct netdev_bpf *bpf)
+{
+	struct cpsw_priv *priv = netdev_priv(ndev);
+
+	switch (bpf->command) {
+	case XDP_SETUP_PROG:
+		return cpsw_xdp_prog_setup(priv, bpf);
+
+	case XDP_QUERY_PROG:
+		return xdp_attachment_query(&priv->xdpi, bpf);
+
+	default:
+		return -EINVAL;
+	}
+}
+
+static int cpsw_ndo_xdp_xmit(struct net_device *ndev, int n,
+			     struct xdp_frame **frames, u32 flags)
+{
+	struct cpsw_priv *priv = netdev_priv(ndev);
+	struct xdp_frame *xdpf;
+	int i, drops = 0;
+
+	if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK))
+		return -EINVAL;
+
+	for (i = 0; i < n; i++) {
+		xdpf = frames[i];
+		if (xdpf->len < CPSW_MIN_PACKET_SIZE) {
+			xdp_return_frame_rx_napi(xdpf);
+			drops++;
+			continue;
+		}
+
+		if (cpsw_xdp_tx_frame(priv, xdpf, NULL))
+			drops++;
+	}
+
+	return n - drops;
+}
+
 #ifdef CONFIG_NET_POLL_CONTROLLER
 static void cpsw_ndo_poll_controller(struct net_device *ndev)
 {
@@ -2037,6 +2422,8 @@ static const struct net_device_ops cpsw_netdev_ops = {
 	.ndo_vlan_rx_add_vid	= cpsw_ndo_vlan_rx_add_vid,
 	.ndo_vlan_rx_kill_vid	= cpsw_ndo_vlan_rx_kill_vid,
 	.ndo_setup_tc           = cpsw_ndo_setup_tc,
+	.ndo_bpf		= cpsw_ndo_bpf,
+	.ndo_xdp_xmit		= cpsw_ndo_xdp_xmit,
 };
 
 static void cpsw_get_drvinfo(struct net_device *ndev,
diff --git a/drivers/net/ethernet/ti/cpsw_ethtool.c b/drivers/net/ethernet/ti/cpsw_ethtool.c
index 99935c1d265d..59f26d1fe09b 100644
--- a/drivers/net/ethernet/ti/cpsw_ethtool.c
+++ b/drivers/net/ethernet/ti/cpsw_ethtool.c
@@ -578,6 +578,48 @@ static int cpsw_update_channels_res(struct cpsw_priv *priv, int ch_num, int rx,
 	return 0;
 }
 
+static void cpsw_destroy_xdp_rxqs(struct cpsw_common *cpsw)
+{
+	struct net_device *ndev;
+	struct cpsw_priv *priv;
+	int i;
+
+	for (i = 0; i < cpsw->data.slaves; i++) {
+		ndev = cpsw->slaves[i].ndev;
+		if (!ndev || !netif_running(ndev))
+			continue;
+
+		priv = netdev_priv(ndev);
+		cpsw_ndev_destroy_xdp_rxqs(priv);
+	}
+
+	memset(cpsw->page_pool, 0, sizeof(cpsw->page_pool));
+}
+
+static int cpsw_create_xdp_rxqs(struct cpsw_common *cpsw)
+{
+	struct net_device *ndev;
+	struct cpsw_priv *priv;
+	int i, ret;
+
+	for (i = 0; i < cpsw->data.slaves; i++) {
+		ndev = cpsw->slaves[i].ndev;
+		if (!ndev || !netif_running(ndev))
+			continue;
+
+		priv = netdev_priv(ndev);
+		ret = cpsw_ndev_create_xdp_rxqs(priv);
+		if (ret)
+			goto err_cleanup;
+	}
+
+	return 0;
+
+err_cleanup:
+	cpsw_destroy_xdp_rxqs(cpsw);
+	return ret;
+}
+
 int cpsw_set_channels_common(struct net_device *ndev,
 			     struct ethtool_channels *chs,
 			     cpdma_handler_fn rx_handler)
@@ -585,7 +627,7 @@ int cpsw_set_channels_common(struct net_device *ndev,
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct cpsw_common *cpsw = priv->cpsw;
 	struct net_device *sl_ndev;
-	int i, ret;
+	int i, new_pools, ret;
 
 	ret = cpsw_check_ch_settings(cpsw, chs);
 	if (ret < 0)
@@ -593,6 +635,10 @@ int cpsw_set_channels_common(struct net_device *ndev,
 
 	cpsw_suspend_data_pass(ndev);
 
+	new_pools = (chs->rx_count != cpsw->rx_ch_num) && cpsw->usage_count;
+	if (new_pools)
+		cpsw_destroy_xdp_rxqs(cpsw);
+
 	ret = cpsw_update_channels_res(priv, chs->rx_count, 1, rx_handler);
 	if (ret)
 		goto err;
@@ -622,6 +668,12 @@ int cpsw_set_channels_common(struct net_device *ndev,
 
 	cpsw_split_res(cpsw);
 
+	if (new_pools) {
+		ret = cpsw_create_xdp_rxqs(cpsw);
+		if (ret)
+			goto err;
+	}
+
 	ret = cpsw_resume_data_pass(ndev);
 	if (!ret)
 		return 0;
@@ -647,8 +699,7 @@ void cpsw_get_ringparam(struct net_device *ndev,
 int cpsw_set_ringparam(struct net_device *ndev,
 		       struct ethtool_ringparam *ering)
 {
-	struct cpsw_priv *priv = netdev_priv(ndev);
-	struct cpsw_common *cpsw = priv->cpsw;
+	struct cpsw_common *cpsw = ndev_to_cpsw(ndev);
 	int ret;
 
 	/* ignore ering->tx_pending - only rx_pending adjustment is supported */
@@ -663,10 +714,19 @@ int cpsw_set_ringparam(struct net_device *ndev,
 
 	cpsw_suspend_data_pass(ndev);
 
+	if (cpsw->usage_count)
+		cpsw_destroy_xdp_rxqs(cpsw);
+
 	ret = cpdma_set_num_rx_descs(cpsw->dma, ering->rx_pending);
 	if (ret)
 		goto err;
 
+	if (cpsw->usage_count) {
+		ret = cpsw_create_xdp_rxqs(cpsw);
+		if (ret)
+			goto err;
+	}
+
 	ret = cpsw_resume_data_pass(ndev);
 	if (!ret)
 		return 0;
diff --git a/drivers/net/ethernet/ti/cpsw_priv.h b/drivers/net/ethernet/ti/cpsw_priv.h
index 2ecb3af59fe9..b177f94267cd 100644
--- a/drivers/net/ethernet/ti/cpsw_priv.h
+++ b/drivers/net/ethernet/ti/cpsw_priv.h
@@ -346,6 +346,7 @@ struct cpsw_common {
 	int				rx_ch_num, tx_ch_num;
 	int				speed;
 	int				usage_count;
+	struct page_pool		*page_pool[CPSW_MAX_QUEUES];
 };
 
 struct cpsw_priv {
@@ -360,6 +361,10 @@ struct cpsw_priv {
 	int				shp_cfg_speed;
 	int				tx_ts_enabled;
 	int				rx_ts_enabled;
+	struct bpf_prog			*xdp_prog;
+	struct xdp_rxq_info		xdp_rxq[CPSW_MAX_QUEUES];
+	struct xdp_attachment_info	xdpi;
+
 	u32 emac_port;
 	struct cpsw_common *cpsw;
 };
@@ -391,6 +396,8 @@ int cpsw_fill_rx_channels(struct cpsw_priv *priv);
 void cpsw_intr_enable(struct cpsw_common *cpsw);
 void cpsw_intr_disable(struct cpsw_common *cpsw);
 int cpsw_tx_handler(void *token, int len, int status);
+int cpsw_ndev_create_xdp_rxqs(struct cpsw_priv *priv);
+void cpsw_ndev_destroy_xdp_rxqs(struct cpsw_priv *priv);
 
 /* ethtool */
 u32 cpsw_get_msglevel(struct net_device *ndev);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 net-next 1/6] xdp: allow same allocator usage
  2019-06-30 17:23 ` [PATCH v5 net-next 1/6] xdp: allow same allocator usage Ivan Khoronzhuk
@ 2019-07-01 11:40   ` Jesper Dangaard Brouer
  2019-07-02 10:27     ` Ivan Khoronzhuk
  0 siblings, 1 reply; 30+ messages in thread
From: Jesper Dangaard Brouer @ 2019-07-01 11:40 UTC (permalink / raw)
  To: Ivan Khoronzhuk
  Cc: grygorii.strashko, hawk, davem, ast, linux-kernel, linux-omap,
	xdp-newbies, ilias.apalodimas, netdev, daniel, jakub.kicinski,
	john.fastabend, brouer


I'm very skeptical about this approach.

On Sun, 30 Jun 2019 20:23:43 +0300
Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:

> XDP rxqs can be same for ndevs running under same rx napi softirq.
> But there is no ability to register same allocator for both rxqs,
> by fact it can same rxq but has different ndev as a reference.

This description is not very clear. It can easily be misunderstood.

It is an absolute requirement that each RX-queue have their own
page_pool object/allocator. (This where the performance comes from) as
the page_pool have NAPI protected array for alloc and XDP_DROP recycle.

Your driver/hardware seems to have special case, where a single
RX-queue can receive packets for two different net_device'es.

Do you violate this XDP devmap redirect assumption[1]?
[1] https://github.com/torvalds/linux/blob/v5.2-rc7/kernel/bpf/devmap.c#L324-L329


> Due to last changes allocator destroy can be defered till the moment
> all packets are recycled by destination interface, afterwards it's
> freed. In order to schedule allocator destroy only after all users are
> unregistered, add refcnt to allocator object and schedule to destroy
> only it reaches 0.

The guiding principles when designing an API, is to make it easy to
use, but also make it hard to misuse.

Your API change makes it easy to misuse the API.  As it make it easy to
(re)use the allocator pointer (likely page_pool) for multiple
xdp_rxq_info structs.  It is only valid for your use-case, because you
have hardware where a single RX-queue delivers to two different
net_devices.  For other normal use-cases, this will be a violation.

If I was a user of this API, and saw your xdp_allocator_get(), then I
would assume that this was the normal case.  As minimum, we need to add
a comment in the code, about this specific/intended use-case.  I
through about detecting the misuse, by adding a queue_index to
xdp_mem_allocator, that can be checked against, when calling
xdp_rxq_info_reg_mem_model() with another xdp_rxq_info struct (to catch
the obvious mistake where queue_index mismatch).


> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
> ---
>  include/net/xdp_priv.h |  1 +
>  net/core/xdp.c         | 46 ++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 47 insertions(+)
> 
> diff --git a/include/net/xdp_priv.h b/include/net/xdp_priv.h
> index 6a8cba6ea79a..995b21da2f27 100644
> --- a/include/net/xdp_priv.h
> +++ b/include/net/xdp_priv.h
> @@ -18,6 +18,7 @@ struct xdp_mem_allocator {
>  	struct rcu_head rcu;
>  	struct delayed_work defer_wq;
>  	unsigned long defer_warn;
> +	unsigned long refcnt;
>  };
>  
>  #endif /* __LINUX_NET_XDP_PRIV_H__ */
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index b29d7b513a18..a44621190fdc 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -98,6 +98,18 @@ bool __mem_id_disconnect(int id, bool force)
>  		WARN(1, "Request remove non-existing id(%d), driver bug?", id);
>  		return true;
>  	}
> +
> +	/* to avoid calling hash lookup twice, decrement refcnt here till it
> +	 * reaches zero, then it can be called from workqueue afterwards.
> +	 */
> +	if (xa->refcnt)
> +		xa->refcnt--;
> +
> +	if (xa->refcnt) {
> +		mutex_unlock(&mem_id_lock);
> +		return true;
> +	}
> +
>  	xa->disconnect_cnt++;
>  
>  	/* Detects in-flight packet-pages for page_pool */
> @@ -312,6 +324,33 @@ static bool __is_supported_mem_type(enum xdp_mem_type type)
>  	return true;
>  }
>  
> +static struct xdp_mem_allocator *xdp_allocator_get(void *allocator)

API wise, when you have "get" operation, you usually also have a "put"
operation...

> +{
> +	struct xdp_mem_allocator *xae, *xa = NULL;
> +	struct rhashtable_iter iter;
> +
> +	mutex_lock(&mem_id_lock);
> +	rhashtable_walk_enter(mem_id_ht, &iter);
> +	do {
> +		rhashtable_walk_start(&iter);
> +
> +		while ((xae = rhashtable_walk_next(&iter)) && !IS_ERR(xae)) {
> +			if (xae->allocator == allocator) {
> +				xae->refcnt++;
> +				xa = xae;
> +				break;
> +			}
> +		}
> +
> +		rhashtable_walk_stop(&iter);
> +
> +	} while (xae == ERR_PTR(-EAGAIN));
> +	rhashtable_walk_exit(&iter);
> +	mutex_unlock(&mem_id_lock);
> +
> +	return xa;
> +}
> +
>  int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
>  			       enum xdp_mem_type type, void *allocator)
>  {
> @@ -347,6 +386,12 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
>  		}
>  	}
>  
> +	xdp_alloc = xdp_allocator_get(allocator);
> +	if (xdp_alloc) {
> +		xdp_rxq->mem.id = xdp_alloc->mem.id;
> +		return 0;
> +	}
> +

The allocator pointer (in-practice) becomes the identifier for the
mem.id (which rhashtable points to xdp_mem_allocator object).


>  	xdp_alloc = kzalloc(sizeof(*xdp_alloc), gfp);
>  	if (!xdp_alloc)
>  		return -ENOMEM;
> @@ -360,6 +405,7 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
>  	xdp_rxq->mem.id = id;
>  	xdp_alloc->mem  = xdp_rxq->mem;
>  	xdp_alloc->allocator = allocator;
> +	xdp_alloc->refcnt = 1;
>  
>  	/* Insert allocator into ID lookup table */
>  	ptr = rhashtable_insert_slow(mem_id_ht, &id, &xdp_alloc->node);



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 net-next 6/6] net: ethernet: ti: cpsw: add XDP support
  2019-06-30 17:23 ` [PATCH v5 net-next 6/6] net: ethernet: ti: cpsw: add XDP support Ivan Khoronzhuk
@ 2019-07-01 16:19   ` Jesper Dangaard Brouer
  2019-07-01 18:09     ` Ilias Apalodimas
  2019-07-02 11:37     ` Ivan Khoronzhuk
  2019-07-03  7:26   ` [PATCH v5 net-next 6/6] net: ethernet: ti: cpsw: add XDP support Jesper Dangaard Brouer
  1 sibling, 2 replies; 30+ messages in thread
From: Jesper Dangaard Brouer @ 2019-07-01 16:19 UTC (permalink / raw)
  To: Ivan Khoronzhuk
  Cc: grygorii.strashko, davem, ast, linux-kernel, linux-omap,
	ilias.apalodimas, netdev, daniel, jakub.kicinski, john.fastabend,
	brouer

On Sun, 30 Jun 2019 20:23:48 +0300
Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:

> +static int cpsw_ndev_create_xdp_rxq(struct cpsw_priv *priv, int ch)
> +{
> +	struct cpsw_common *cpsw = priv->cpsw;
> +	int ret, new_pool = false;
> +	struct xdp_rxq_info *rxq;
> +
> +	rxq = &priv->xdp_rxq[ch];
> +
> +	ret = xdp_rxq_info_reg(rxq, priv->ndev, ch);
> +	if (ret)
> +		return ret;
> +
> +	if (!cpsw->page_pool[ch]) {
> +		ret =  cpsw_create_rx_pool(cpsw, ch);
> +		if (ret)
> +			goto err_rxq;
> +
> +		new_pool = true;
> +	}
> +
> +	ret = xdp_rxq_info_reg_mem_model(rxq, MEM_TYPE_PAGE_POOL,
> +					 cpsw->page_pool[ch]);
> +	if (!ret)
> +		return 0;
> +
> +	if (new_pool) {
> +		page_pool_free(cpsw->page_pool[ch]);
> +		cpsw->page_pool[ch] = NULL;
> +	}
> +
> +err_rxq:
> +	xdp_rxq_info_unreg(rxq);
> +	return ret;
> +}

Looking at this, and Ilias'es XDP-netsec error handling path, it might
be a mistake that I removed page_pool_destroy() and instead put the
responsibility on xdp_rxq_info_unreg().

As here, we have to detect if page_pool_create() was a success, and then
if xdp_rxq_info_reg_mem_model() was a failure, explicitly call
page_pool_free() because the xdp_rxq_info_unreg() call cannot "free"
the page_pool object given it was not registered.  

Ivan's patch in[1], might be a better approach, which forced all
drivers to explicitly call page_pool_free(), even-though it just
dec-refcnt and the real call to page_pool_free() happened via
xdp_rxq_info_unreg().

To better handle error path, I would re-introduce page_pool_destroy(),
as a driver API, that would gracefully handle NULL-pointer case, and
then call page_pool_free() with the atomic_dec_and_test().  (It should
hopefully simplify the error handling code a bit)

[1] https://lore.kernel.org/netdev/20190625175948.24771-2-ivan.khoronzhuk@linaro.org/

> +void cpsw_ndev_destroy_xdp_rxqs(struct cpsw_priv *priv)
> +{
> +	struct cpsw_common *cpsw = priv->cpsw;
> +	struct xdp_rxq_info *rxq;
> +	int i;
> +
> +	for (i = 0; i < cpsw->rx_ch_num; i++) {
> +		rxq = &priv->xdp_rxq[i];
> +		if (xdp_rxq_info_is_reg(rxq))
> +			xdp_rxq_info_unreg(rxq);
> +	}
> +}

Are you sure you need to test xdp_rxq_info_is_reg() here?

You should just call xdp_rxq_info_unreg(rxq), if you know that this rxq
should be registered.  If your assumption failed, you will get a
WARNing, and discover your driver level bug.  This is one of the ways
the API is designed to "detect" misuse of the API.  (I found this
rather useful, when I converted the approx 12 drivers using this
xdp_rxq_info API).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 net-next 6/6] net: ethernet: ti: cpsw: add XDP support
  2019-07-01 16:19   ` Jesper Dangaard Brouer
@ 2019-07-01 18:09     ` Ilias Apalodimas
  2019-07-02 11:37     ` Ivan Khoronzhuk
  1 sibling, 0 replies; 30+ messages in thread
From: Ilias Apalodimas @ 2019-07-01 18:09 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Ivan Khoronzhuk, grygorii.strashko, davem, ast, linux-kernel,
	linux-omap, netdev, daniel, jakub.kicinski, john.fastabend

On Mon, Jul 01, 2019 at 06:19:01PM +0200, Jesper Dangaard Brouer wrote:
> On Sun, 30 Jun 2019 20:23:48 +0300
> Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:
> 
> > +static int cpsw_ndev_create_xdp_rxq(struct cpsw_priv *priv, int ch)
> > +{
> > +	struct cpsw_common *cpsw = priv->cpsw;
> > +	int ret, new_pool = false;
> > +	struct xdp_rxq_info *rxq;
> > +
> > +	rxq = &priv->xdp_rxq[ch];
> > +
> > +	ret = xdp_rxq_info_reg(rxq, priv->ndev, ch);
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (!cpsw->page_pool[ch]) {
> > +		ret =  cpsw_create_rx_pool(cpsw, ch);
> > +		if (ret)
> > +			goto err_rxq;
> > +
> > +		new_pool = true;
> > +	}
> > +
> > +	ret = xdp_rxq_info_reg_mem_model(rxq, MEM_TYPE_PAGE_POOL,
> > +					 cpsw->page_pool[ch]);
> > +	if (!ret)
> > +		return 0;
> > +
> > +	if (new_pool) {
> > +		page_pool_free(cpsw->page_pool[ch]);
> > +		cpsw->page_pool[ch] = NULL;
> > +	}
> > +
> > +err_rxq:
> > +	xdp_rxq_info_unreg(rxq);
> > +	return ret;
> > +}
> 
> Looking at this, and Ilias'es XDP-netsec error handling path, it might
> be a mistake that I removed page_pool_destroy() and instead put the
> responsibility on xdp_rxq_info_unreg().
> 
> As here, we have to detect if page_pool_create() was a success, and then
> if xdp_rxq_info_reg_mem_model() was a failure, explicitly call
> page_pool_free() because the xdp_rxq_info_unreg() call cannot "free"
> the page_pool object given it was not registered.  
> 
> Ivan's patch in[1], might be a better approach, which forced all
> drivers to explicitly call page_pool_free(), even-though it just
> dec-refcnt and the real call to page_pool_free() happened via
> xdp_rxq_info_unreg().
We did discuss that xdp_XXXXX naming might be confusing.
That being said since Ivan's approach serves 'special' hardware and fixes the
naming irregularity, i perfectly fine doing that as long as we clearly document
that the API is supposed to serve a pool per queue (unless the hardware needs to
deal with it differently)
> 
> To better handle error path, I would re-introduce page_pool_destroy(),
> as a driver API, that would gracefully handle NULL-pointer case, and
> then call page_pool_free() with the atomic_dec_and_test().  (It should
> hopefully simplify the error handling code a bit)
> 
> [1] https://lore.kernel.org/netdev/20190625175948.24771-2-ivan.khoronzhuk@linaro.org/
> 
>

Thanks
/Ilias
> > +void cpsw_ndev_destroy_xdp_rxqs(struct cpsw_priv *priv)
> > +{
> > +	struct cpsw_common *cpsw = priv->cpsw;
> > +	struct xdp_rxq_info *rxq;
> > +	int i;
> > +
> > +	for (i = 0; i < cpsw->rx_ch_num; i++) {
> > +		rxq = &priv->xdp_rxq[i];
> > +		if (xdp_rxq_info_is_reg(rxq))
> > +			xdp_rxq_info_unreg(rxq);
> > +	}
> > +}
> 
> Are you sure you need to test xdp_rxq_info_is_reg() here?
> 
> You should just call xdp_rxq_info_unreg(rxq), if you know that this rxq
> should be registered.  If your assumption failed, you will get a
> WARNing, and discover your driver level bug.  This is one of the ways
> the API is designed to "detect" misuse of the API.  (I found this
> rather useful, when I converted the approx 12 drivers using this
> xdp_rxq_info API).
> 
> -- 
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 net-next 1/6] xdp: allow same allocator usage
  2019-07-01 11:40   ` Jesper Dangaard Brouer
@ 2019-07-02 10:27     ` Ivan Khoronzhuk
  2019-07-02 14:46       ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 30+ messages in thread
From: Ivan Khoronzhuk @ 2019-07-02 10:27 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: grygorii.strashko, hawk, davem, ast, linux-kernel, linux-omap,
	xdp-newbies, ilias.apalodimas, netdev, daniel, jakub.kicinski,
	john.fastabend

On Mon, Jul 01, 2019 at 01:40:59PM +0200, Jesper Dangaard Brouer wrote:
>
>I'm very skeptical about this approach.
>
>On Sun, 30 Jun 2019 20:23:43 +0300
>Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:
>
>> XDP rxqs can be same for ndevs running under same rx napi softirq.
>> But there is no ability to register same allocator for both rxqs,
>> by fact it can same rxq but has different ndev as a reference.
>
>This description is not very clear. It can easily be misunderstood.
>
>It is an absolute requirement that each RX-queue have their own
>page_pool object/allocator. (This where the performance comes from) as
>the page_pool have NAPI protected array for alloc and XDP_DROP recycle.
>
>Your driver/hardware seems to have special case, where a single
>RX-queue can receive packets for two different net_device'es.
>
>Do you violate this XDP devmap redirect assumption[1]?
>[1] https://github.com/torvalds/linux/blob/v5.2-rc7/kernel/bpf/devmap.c#L324-L329
Seems that yes, but that's used only for trace for now.
As it runs under napi and flush clear dev_rx i must do it right in the
rx_handler. So next patchset version will have one patch less.

Thanks!

>
>
>> Due to last changes allocator destroy can be defered till the moment
>> all packets are recycled by destination interface, afterwards it's
>> freed. In order to schedule allocator destroy only after all users are
>> unregistered, add refcnt to allocator object and schedule to destroy
>> only it reaches 0.
>
>The guiding principles when designing an API, is to make it easy to
>use, but also make it hard to misuse.
>
>Your API change makes it easy to misuse the API.  As it make it easy to
>(re)use the allocator pointer (likely page_pool) for multiple
>xdp_rxq_info structs.  It is only valid for your use-case, because you
>have hardware where a single RX-queue delivers to two different
>net_devices.  For other normal use-cases, this will be a violation.
>
>If I was a user of this API, and saw your xdp_allocator_get(), then I
>would assume that this was the normal case.  As minimum, we need to add
>a comment in the code, about this specific/intended use-case.  I
>through about detecting the misuse, by adding a queue_index to
>xdp_mem_allocator, that can be checked against, when calling
>xdp_rxq_info_reg_mem_model() with another xdp_rxq_info struct (to catch
>the obvious mistake where queue_index mismatch).

I can add, but not sure if it has or can have some conflicts with another
memory allocators, now or in future. Main here to not became a cornerstone
in some not obvious use-cases.

So, for now, let it be in this way:

--- a/include/net/xdp_priv.h
+++ b/include/net/xdp_priv.h
@@ -19,6 +19,7 @@ struct xdp_mem_allocator {
        struct delayed_work defer_wq;
        unsigned long defer_warn;
        unsigned long refcnt;
+       u32 queue_index;
 };

 #endif /* __LINUX_NET_XDP_PRIV_H__ */
diff --git a/net/core/xdp.c b/net/core/xdp.c
index a44621190fdc..c4bf29810f4d 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -324,7 +324,7 @@ static bool __is_supported_mem_type(enum xdp_mem_type type)
        return true;
 }

-static struct xdp_mem_allocator *xdp_allocator_get(void *allocator)
+static struct xdp_mem_allocator *xdp_allocator_find(void *allocator)
 {
        struct xdp_mem_allocator *xae, *xa = NULL;
        struct rhashtable_iter iter;
@@ -336,7 +336,6 @@ static struct xdp_mem_allocator *xdp_allocator_get(void *allocator)

                while ((xae = rhashtable_walk_next(&iter)) && !IS_ERR(xae)) {
                        if (xae->allocator == allocator) {
-                               xae->refcnt++;
                                xa = xae;
                                break;
                        }
@@ -386,9 +385,13 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
                }
        }

-       xdp_alloc = xdp_allocator_get(allocator);
+       xdp_alloc = xdp_allocator_find(allocator);
        if (xdp_alloc) {
+               if (xdp_alloc->queue_index != xdp_rxq->queue_index)
+                       return -EINVAL;
+
                xdp_rxq->mem.id = xdp_alloc->mem.id;
+               xdp_alloc->refcnt++;
                return 0;
        }

@@ -406,6 +409,7 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
        xdp_alloc->mem  = xdp_rxq->mem;
        xdp_alloc->allocator = allocator;
        xdp_alloc->refcnt = 1;
+       xdp_alloc->queue_index = xdp_rxq->queue_index;

        /* Insert allocator into ID lookup table */
        ptr = rhashtable_insert_slow(mem_id_ht, &id, &xdp_alloc->node);

Jesper, are you Ok with this version?

>
>
>> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
>> ---
>>  include/net/xdp_priv.h |  1 +
>>  net/core/xdp.c         | 46 ++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 47 insertions(+)
>>
>> diff --git a/include/net/xdp_priv.h b/include/net/xdp_priv.h
>> index 6a8cba6ea79a..995b21da2f27 100644
>> --- a/include/net/xdp_priv.h
>> +++ b/include/net/xdp_priv.h
>> @@ -18,6 +18,7 @@ struct xdp_mem_allocator {
>>  	struct rcu_head rcu;
>>  	struct delayed_work defer_wq;
>>  	unsigned long defer_warn;
>> +	unsigned long refcnt;
>>  };
>>
>>  #endif /* __LINUX_NET_XDP_PRIV_H__ */
>> diff --git a/net/core/xdp.c b/net/core/xdp.c
>> index b29d7b513a18..a44621190fdc 100644
>> --- a/net/core/xdp.c
>> +++ b/net/core/xdp.c
>> @@ -98,6 +98,18 @@ bool __mem_id_disconnect(int id, bool force)
>>  		WARN(1, "Request remove non-existing id(%d), driver bug?", id);
>>  		return true;
>>  	}
>> +
>> +	/* to avoid calling hash lookup twice, decrement refcnt here till it
>> +	 * reaches zero, then it can be called from workqueue afterwards.
>> +	 */
>> +	if (xa->refcnt)
>> +		xa->refcnt--;
>> +
>> +	if (xa->refcnt) {
>> +		mutex_unlock(&mem_id_lock);
>> +		return true;
>> +	}
>> +
>>  	xa->disconnect_cnt++;
>>
>>  	/* Detects in-flight packet-pages for page_pool */
>> @@ -312,6 +324,33 @@ static bool __is_supported_mem_type(enum xdp_mem_type type)
>>  	return true;
>>  }
>>
>> +static struct xdp_mem_allocator *xdp_allocator_get(void *allocator)
>
>API wise, when you have "get" operation, you usually also have a "put"
>operation...

It's not part of external API.
I propose to rename it on xdp_allocator_find() as in above diff.
What do you say?

>
>> +{
>> +	struct xdp_mem_allocator *xae, *xa = NULL;
>> +	struct rhashtable_iter iter;
>> +
>> +	mutex_lock(&mem_id_lock);
>> +	rhashtable_walk_enter(mem_id_ht, &iter);
>> +	do {
>> +		rhashtable_walk_start(&iter);
>> +
>> +		while ((xae = rhashtable_walk_next(&iter)) && !IS_ERR(xae)) {
>> +			if (xae->allocator == allocator) {
>> +				xae->refcnt++;
>> +				xa = xae;
>> +				break;
>> +			}
>> +		}
>> +
>> +		rhashtable_walk_stop(&iter);
>> +
>> +	} while (xae == ERR_PTR(-EAGAIN));
>> +	rhashtable_walk_exit(&iter);
>> +	mutex_unlock(&mem_id_lock);
>> +
>> +	return xa;
>> +}
>> +
>>  int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
>>  			       enum xdp_mem_type type, void *allocator)
>>  {
>> @@ -347,6 +386,12 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
>>  		}
>>  	}
>>
>> +	xdp_alloc = xdp_allocator_get(allocator);
>> +	if (xdp_alloc) {
>> +		xdp_rxq->mem.id = xdp_alloc->mem.id;
>> +		return 0;
>> +	}
>> +
>
>The allocator pointer (in-practice) becomes the identifier for the
>mem.id (which rhashtable points to xdp_mem_allocator object).

So, you have no obj against it?

[...]

-- 
Regards,
Ivan Khoronzhuk

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 net-next 6/6] net: ethernet: ti: cpsw: add XDP support
  2019-07-01 16:19   ` Jesper Dangaard Brouer
  2019-07-01 18:09     ` Ilias Apalodimas
@ 2019-07-02 11:37     ` Ivan Khoronzhuk
  2019-07-02 13:39       ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 30+ messages in thread
From: Ivan Khoronzhuk @ 2019-07-02 11:37 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: grygorii.strashko, davem, ast, linux-kernel, linux-omap,
	ilias.apalodimas, netdev, daniel, jakub.kicinski, john.fastabend

On Mon, Jul 01, 2019 at 06:19:01PM +0200, Jesper Dangaard Brouer wrote:
>On Sun, 30 Jun 2019 20:23:48 +0300
>Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:
>
>> +static int cpsw_ndev_create_xdp_rxq(struct cpsw_priv *priv, int ch)
>> +{
>> +	struct cpsw_common *cpsw = priv->cpsw;
>> +	int ret, new_pool = false;
>> +	struct xdp_rxq_info *rxq;
>> +
>> +	rxq = &priv->xdp_rxq[ch];
>> +
>> +	ret = xdp_rxq_info_reg(rxq, priv->ndev, ch);
>> +	if (ret)
>> +		return ret;
>> +
>> +	if (!cpsw->page_pool[ch]) {
>> +		ret =  cpsw_create_rx_pool(cpsw, ch);
>> +		if (ret)
>> +			goto err_rxq;
>> +
>> +		new_pool = true;
>> +	}
>> +
>> +	ret = xdp_rxq_info_reg_mem_model(rxq, MEM_TYPE_PAGE_POOL,
>> +					 cpsw->page_pool[ch]);
>> +	if (!ret)
>> +		return 0;
>> +
>> +	if (new_pool) {
>> +		page_pool_free(cpsw->page_pool[ch]);
>> +		cpsw->page_pool[ch] = NULL;
>> +	}
>> +
>> +err_rxq:
>> +	xdp_rxq_info_unreg(rxq);
>> +	return ret;
>> +}
>
>Looking at this, and Ilias'es XDP-netsec error handling path, it might
>be a mistake that I removed page_pool_destroy() and instead put the
>responsibility on xdp_rxq_info_unreg().
As for me this is started not from page_pool_free, but rather from calling
unreg_mem_model from rxq_info_unreg. Then, if page_pool_free is hidden
it looks more a while normal to move all chain to be self destroyed.

>
>As here, we have to detect if page_pool_create() was a success, and then
>if xdp_rxq_info_reg_mem_model() was a failure, explicitly call
>page_pool_free() because the xdp_rxq_info_unreg() call cannot "free"
>the page_pool object given it was not registered.
Yes, it looked a little bit ugly from the beginning, but, frankly,
I have got used to this already.

>
>Ivan's patch in[1], might be a better approach, which forced all
>drivers to explicitly call page_pool_free(), even-though it just
>dec-refcnt and the real call to page_pool_free() happened via
>xdp_rxq_info_unreg().
>
>To better handle error path, I would re-introduce page_pool_destroy(),
So, you might to do it later as I understand, and not for my special
case but becouse it makes error path to look a little bit more pretty.
I'm perfectly fine with this, and better you add this, for now my
implementation requires only "xdp: allow same allocator usage" patch,
but if you insist I can resend also patch in question afterwards my
series is applied (with modification to cpsw & netsec & mlx5 & page_pool).

What's your choice? I can add to your series patch needed for cpsw to
avoid some misuse.

>as a driver API, that would gracefully handle NULL-pointer case, and
>then call page_pool_free() with the atomic_dec_and_test().  (It should
>hopefully simplify the error handling code a bit)
>
>[1] https://lore.kernel.org/netdev/20190625175948.24771-2-ivan.khoronzhuk@linaro.org/
>
>
>> +void cpsw_ndev_destroy_xdp_rxqs(struct cpsw_priv *priv)
>> +{
>> +	struct cpsw_common *cpsw = priv->cpsw;
>> +	struct xdp_rxq_info *rxq;
>> +	int i;
>> +
>> +	for (i = 0; i < cpsw->rx_ch_num; i++) {
>> +		rxq = &priv->xdp_rxq[i];
>> +		if (xdp_rxq_info_is_reg(rxq))
>> +			xdp_rxq_info_unreg(rxq);
>> +	}
>> +}
>
>Are you sure you need to test xdp_rxq_info_is_reg() here?
Yes it's required in my case as it's used in error path where
an rx queue can be even not registered and no need in this warn.

>
>You should just call xdp_rxq_info_unreg(rxq), if you know that this rxq
>should be registered.  If your assumption failed, you will get a
>WARNing, and discover your driver level bug.  This is one of the ways
>the API is designed to "detect" misuse of the API.  (I found this
>rather useful, when I converted the approx 12 drivers using this
>xdp_rxq_info API).
>
>-- 
>Best regards,
>  Jesper Dangaard Brouer
>  MSc.CS, Principal Kernel Engineer at Red Hat
>  LinkedIn: http://www.linkedin.com/in/brouer

-- 
Regards,
Ivan Khoronzhuk

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 net-next 6/6] net: ethernet: ti: cpsw: add XDP support
  2019-07-02 11:37     ` Ivan Khoronzhuk
@ 2019-07-02 13:39       ` Jesper Dangaard Brouer
  2019-07-02 14:24         ` Ivan Khoronzhuk
  2019-07-02 14:31         ` [PATCH] net: core: page_pool: add user refcnt and reintroduce page_pool_destroy Jesper Dangaard Brouer
  0 siblings, 2 replies; 30+ messages in thread
From: Jesper Dangaard Brouer @ 2019-07-02 13:39 UTC (permalink / raw)
  To: Ivan Khoronzhuk
  Cc: grygorii.strashko, davem, ast, linux-kernel, linux-omap,
	ilias.apalodimas, netdev, daniel, jakub.kicinski, john.fastabend,
	brouer

On Tue, 2 Jul 2019 14:37:39 +0300
Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:

> On Mon, Jul 01, 2019 at 06:19:01PM +0200, Jesper Dangaard Brouer wrote:
> >On Sun, 30 Jun 2019 20:23:48 +0300
> >Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:
> >  
> >> +static int cpsw_ndev_create_xdp_rxq(struct cpsw_priv *priv, int ch)
> >> +{
> >> +	struct cpsw_common *cpsw = priv->cpsw;
> >> +	int ret, new_pool = false;
> >> +	struct xdp_rxq_info *rxq;
> >> +
> >> +	rxq = &priv->xdp_rxq[ch];
> >> +
> >> +	ret = xdp_rxq_info_reg(rxq, priv->ndev, ch);
> >> +	if (ret)
> >> +		return ret;
> >> +
> >> +	if (!cpsw->page_pool[ch]) {
> >> +		ret =  cpsw_create_rx_pool(cpsw, ch);
> >> +		if (ret)
> >> +			goto err_rxq;
> >> +
> >> +		new_pool = true;
> >> +	}
> >> +
> >> +	ret = xdp_rxq_info_reg_mem_model(rxq, MEM_TYPE_PAGE_POOL,
> >> +					 cpsw->page_pool[ch]);
> >> +	if (!ret)
> >> +		return 0;
> >> +
> >> +	if (new_pool) {
> >> +		page_pool_free(cpsw->page_pool[ch]);
> >> +		cpsw->page_pool[ch] = NULL;
> >> +	}
> >> +
> >> +err_rxq:
> >> +	xdp_rxq_info_unreg(rxq);
> >> +	return ret;
> >> +}  
> >
> >Looking at this, and Ilias'es XDP-netsec error handling path, it might
> >be a mistake that I removed page_pool_destroy() and instead put the
> >responsibility on xdp_rxq_info_unreg().  
>
> As for me this is started not from page_pool_free, but rather from calling
> unreg_mem_model from rxq_info_unreg. Then, if page_pool_free is hidden
> it looks more a while normal to move all chain to be self destroyed.
> 
> >
> >As here, we have to detect if page_pool_create() was a success, and then
> >if xdp_rxq_info_reg_mem_model() was a failure, explicitly call
> >page_pool_free() because the xdp_rxq_info_unreg() call cannot "free"
> >the page_pool object given it was not registered.  
>
> Yes, it looked a little bit ugly from the beginning, but, frankly,
> I have got used to this already.
> 
> >
> >Ivan's patch in[1], might be a better approach, which forced all
> >drivers to explicitly call page_pool_free(), even-though it just
> >dec-refcnt and the real call to page_pool_free() happened via
> >xdp_rxq_info_unreg().
> >
> >To better handle error path, I would re-introduce page_pool_destroy(),
>
> So, you might to do it later as I understand, and not for my special
> case but becouse it makes error path to look a little bit more pretty.
> I'm perfectly fine with this, and better you add this, for now my
> implementation requires only "xdp: allow same allocator usage" patch,
> but if you insist I can resend also patch in question afterwards my
> series is applied (with modification to cpsw & netsec & mlx5 & page_pool).
> 
> What's your choice? I can add to your series patch needed for cpsw to
> avoid some misuse.

I will try to create a cleaned-up version of your patch[1] and
re-introduce page_pool_destroy() for drivers to use, then we can build
your driver on top of that.


> >as a driver API, that would gracefully handle NULL-pointer case, and
> >then call page_pool_free() with the atomic_dec_and_test().  (It should
> >hopefully simplify the error handling code a bit)
> >
> >[1] https://lore.kernel.org/netdev/20190625175948.24771-2-ivan.khoronzhuk@linaro.org/
[...]

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 net-next 6/6] net: ethernet: ti: cpsw: add XDP support
  2019-07-02 13:39       ` Jesper Dangaard Brouer
@ 2019-07-02 14:24         ` Ivan Khoronzhuk
  2019-07-02 14:31         ` [PATCH] net: core: page_pool: add user refcnt and reintroduce page_pool_destroy Jesper Dangaard Brouer
  1 sibling, 0 replies; 30+ messages in thread
From: Ivan Khoronzhuk @ 2019-07-02 14:24 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: grygorii.strashko, davem, ast, linux-kernel, linux-omap,
	ilias.apalodimas, netdev, daniel, jakub.kicinski, john.fastabend

On Tue, Jul 02, 2019 at 03:39:02PM +0200, Jesper Dangaard Brouer wrote:
>On Tue, 2 Jul 2019 14:37:39 +0300
>Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:
>
>> On Mon, Jul 01, 2019 at 06:19:01PM +0200, Jesper Dangaard Brouer wrote:
>> >On Sun, 30 Jun 2019 20:23:48 +0300
>> >Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:
>> >
>> >> +static int cpsw_ndev_create_xdp_rxq(struct cpsw_priv *priv, int ch)
>> >> +{
>> >> +	struct cpsw_common *cpsw = priv->cpsw;
>> >> +	int ret, new_pool = false;
>> >> +	struct xdp_rxq_info *rxq;
>> >> +
>> >> +	rxq = &priv->xdp_rxq[ch];
>> >> +
>> >> +	ret = xdp_rxq_info_reg(rxq, priv->ndev, ch);
>> >> +	if (ret)
>> >> +		return ret;
>> >> +
>> >> +	if (!cpsw->page_pool[ch]) {
>> >> +		ret =  cpsw_create_rx_pool(cpsw, ch);
>> >> +		if (ret)
>> >> +			goto err_rxq;
>> >> +
>> >> +		new_pool = true;
>> >> +	}
>> >> +
>> >> +	ret = xdp_rxq_info_reg_mem_model(rxq, MEM_TYPE_PAGE_POOL,
>> >> +					 cpsw->page_pool[ch]);
>> >> +	if (!ret)
>> >> +		return 0;
>> >> +
>> >> +	if (new_pool) {
>> >> +		page_pool_free(cpsw->page_pool[ch]);
>> >> +		cpsw->page_pool[ch] = NULL;
>> >> +	}
>> >> +
>> >> +err_rxq:
>> >> +	xdp_rxq_info_unreg(rxq);
>> >> +	return ret;
>> >> +}
>> >
>> >Looking at this, and Ilias'es XDP-netsec error handling path, it might
>> >be a mistake that I removed page_pool_destroy() and instead put the
>> >responsibility on xdp_rxq_info_unreg().
>>
>> As for me this is started not from page_pool_free, but rather from calling
>> unreg_mem_model from rxq_info_unreg. Then, if page_pool_free is hidden
>> it looks more a while normal to move all chain to be self destroyed.
>>
>> >
>> >As here, we have to detect if page_pool_create() was a success, and then
>> >if xdp_rxq_info_reg_mem_model() was a failure, explicitly call
>> >page_pool_free() because the xdp_rxq_info_unreg() call cannot "free"
>> >the page_pool object given it was not registered.
>>
>> Yes, it looked a little bit ugly from the beginning, but, frankly,
>> I have got used to this already.
>>
>> >
>> >Ivan's patch in[1], might be a better approach, which forced all
>> >drivers to explicitly call page_pool_free(), even-though it just
>> >dec-refcnt and the real call to page_pool_free() happened via
>> >xdp_rxq_info_unreg().
>> >
>> >To better handle error path, I would re-introduce page_pool_destroy(),
>>
>> So, you might to do it later as I understand, and not for my special
>> case but becouse it makes error path to look a little bit more pretty.
>> I'm perfectly fine with this, and better you add this, for now my
>> implementation requires only "xdp: allow same allocator usage" patch,
>> but if you insist I can resend also patch in question afterwards my
>> series is applied (with modification to cpsw & netsec & mlx5 & page_pool).
>>
>> What's your choice? I can add to your series patch needed for cpsw to
>> avoid some misuse.
>
>I will try to create a cleaned-up version of your patch[1] and
>re-introduce page_pool_destroy() for drivers to use, then we can build
>your driver on top of that.

I've corrected patch to xdp core and tested. The "page pool API" change
seems is orthogonal now. So no limits to send v6 that is actually done
and no more strict dependency on page pool API changes whenever that
can happen.

-- 
Regards,
Ivan Khoronzhuk

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH] net: core: page_pool: add user refcnt and reintroduce page_pool_destroy
  2019-07-02 13:39       ` Jesper Dangaard Brouer
  2019-07-02 14:24         ` Ivan Khoronzhuk
@ 2019-07-02 14:31         ` Jesper Dangaard Brouer
  2019-07-02 14:44           ` Ivan Khoronzhuk
  1 sibling, 1 reply; 30+ messages in thread
From: Jesper Dangaard Brouer @ 2019-07-02 14:31 UTC (permalink / raw)
  To: netdev, Ilias Apalodimas, ivan.khoronzhuk, Jesper Dangaard Brouer
  Cc: grygorii.strashko, jakub.kicinski, daniel, john.fastabend, ast,
	linux-kernel, linux-omap

From: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>

Jesper recently removed page_pool_destroy() (from driver invocation) and
moved shutdown and free of page_pool into xdp_rxq_info_unreg(), in-order to
handle in-flight packets/pages. This created an asymmetry in drivers
create/destroy pairs.

This patch add page_pool user refcnt and reintroduce page_pool_destroy.
This serves two purposes, (1) simplify drivers error handling as driver now
drivers always calls page_pool_destroy() and don't need to track if
xdp_rxq_info_reg_mem_model() was unsuccessful. (2) allow special cases
where a single RX-queue (with a single page_pool) provides packets for two
net_device'es, and thus needs to register the same page_pool twice with two
xdp_rxq_info structures.

This patch is a modified version of Ivan Khoronzhuk's original patch.
Thus, Jesper gives author ownership to Ivan.

Link: https://lore.kernel.org/netdev/20190625175948.24771-2-ivan.khoronzhuk@linaro.org/
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---

To Ivan,
 If you agree with this patch, please add your Signed-off-by.

You can also say if you prefer to take this patch and make it
part of your driver patchset, what ever you prefer.
--Jesper

 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |    6 ++---
 drivers/net/ethernet/socionext/netsec.c           |    8 ++----
 include/net/page_pool.h                           |   27 +++++++++++++++++++++
 net/core/page_pool.c                              |    8 ++++++
 net/core/xdp.c                                    |    3 ++
 5 files changed, 43 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 1085040675ae..ce1c7a449eae 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -545,10 +545,8 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
 	}
 	err = xdp_rxq_info_reg_mem_model(&rq->xdp_rxq,
 					 MEM_TYPE_PAGE_POOL, rq->page_pool);
-	if (err) {
-		page_pool_free(rq->page_pool);
+	if (err)
 		goto err_free;
-	}
 
 	for (i = 0; i < wq_sz; i++) {
 		if (rq->wq_type == MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ) {
@@ -613,6 +611,7 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
 	if (rq->xdp_prog)
 		bpf_prog_put(rq->xdp_prog);
 	xdp_rxq_info_unreg(&rq->xdp_rxq);
+	page_pool_destroy(rq->page_pool);
 	mlx5_wq_destroy(&rq->wq_ctrl);
 
 	return err;
@@ -643,6 +642,7 @@ static void mlx5e_free_rq(struct mlx5e_rq *rq)
 	}
 
 	xdp_rxq_info_unreg(&rq->xdp_rxq);
+	page_pool_destroy(rq->page_pool);
 	mlx5_wq_destroy(&rq->wq_ctrl);
 }
 
diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
index 5544a722543f..43ab0ce90704 100644
--- a/drivers/net/ethernet/socionext/netsec.c
+++ b/drivers/net/ethernet/socionext/netsec.c
@@ -1210,15 +1210,11 @@ static void netsec_uninit_pkt_dring(struct netsec_priv *priv, int id)
 		}
 	}
 
-	/* Rx is currently using page_pool
-	 * since the pool is created during netsec_setup_rx_dring(), we need to
-	 * free the pool manually if the registration failed
-	 */
+	/* Rx is currently using page_pool */
 	if (id == NETSEC_RING_RX) {
 		if (xdp_rxq_info_is_reg(&dring->xdp_rxq))
 			xdp_rxq_info_unreg(&dring->xdp_rxq);
-		else
-			page_pool_free(dring->page_pool);
+		page_pool_destroy(dring->page_pool);
 	}
 
 	memset(dring->desc, 0, sizeof(struct netsec_desc) * DESC_NUM);
diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index ee9c871d2043..ea974856d0f7 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -101,6 +101,14 @@ struct page_pool {
 	struct ptr_ring ring;
 
 	atomic_t pages_state_release_cnt;
+
+	/* A page_pool is strictly tied to a single RX-queue being
+	 * protected by NAPI, due to above pp_alloc_cache.  This
+	 * refcnt serves two purposes. (1) simplify drivers error
+	 * handling, and (2) allow special cases where a single
+	 * RX-queue provides packet for two net_device'es.
+	 */
+	refcount_t user_cnt;
 };
 
 struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp);
@@ -134,6 +142,15 @@ static inline void page_pool_free(struct page_pool *pool)
 #endif
 }
 
+/* Drivers use this instead of page_pool_free */
+static inline void page_pool_destroy(struct page_pool *pool)
+{
+	if (!pool)
+		return;
+
+	page_pool_free(pool);
+}
+
 /* Never call this directly, use helpers below */
 void __page_pool_put_page(struct page_pool *pool,
 			  struct page *page, bool allow_direct);
@@ -201,4 +218,14 @@ static inline bool is_page_pool_compiled_in(void)
 #endif
 }
 
+static inline void page_pool_get(struct page_pool *pool)
+{
+	refcount_inc(&pool->user_cnt);
+}
+
+static inline bool page_pool_put(struct page_pool *pool)
+{
+	return refcount_dec_and_test(&pool->user_cnt);
+}
+
 #endif /* _NET_PAGE_POOL_H */
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index b366f59885c1..3272dc7a8c81 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -49,6 +49,9 @@ static int page_pool_init(struct page_pool *pool,
 
 	atomic_set(&pool->pages_state_release_cnt, 0);
 
+	/* Driver calling page_pool_create() also call page_pool_destroy() */
+	refcount_set(&pool->user_cnt, 1);
+
 	if (pool->p.flags & PP_FLAG_DMA_MAP)
 		get_device(pool->p.dev);
 
@@ -70,6 +73,7 @@ struct page_pool *page_pool_create(const struct page_pool_params *params)
 		kfree(pool);
 		return ERR_PTR(err);
 	}
+
 	return pool;
 }
 EXPORT_SYMBOL(page_pool_create);
@@ -356,6 +360,10 @@ static void __warn_in_flight(struct page_pool *pool)
 
 void __page_pool_free(struct page_pool *pool)
 {
+	/* Only last user actually free/release resources */
+	if (!page_pool_put(pool))
+		return;
+
 	WARN(pool->alloc.count, "API usage violation");
 	WARN(!ptr_ring_empty(&pool->ring), "ptr_ring is not empty");
 
diff --git a/net/core/xdp.c b/net/core/xdp.c
index b29d7b513a18..e57a0eb1feb7 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -372,6 +372,9 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
 
 	mutex_unlock(&mem_id_lock);
 
+	if (type == MEM_TYPE_PAGE_POOL)
+		page_pool_get(xdp_alloc->page_pool);
+
 	trace_mem_connect(xdp_alloc, xdp_rxq);
 	return 0;
 err:


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH] net: core: page_pool: add user refcnt and reintroduce page_pool_destroy
  2019-07-02 14:31         ` [PATCH] net: core: page_pool: add user refcnt and reintroduce page_pool_destroy Jesper Dangaard Brouer
@ 2019-07-02 14:44           ` Ivan Khoronzhuk
  2019-07-02 14:52             ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 30+ messages in thread
From: Ivan Khoronzhuk @ 2019-07-02 14:44 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, Ilias Apalodimas, grygorii.strashko, jakub.kicinski,
	daniel, john.fastabend, ast, linux-kernel, linux-omap

On Tue, Jul 02, 2019 at 04:31:39PM +0200, Jesper Dangaard Brouer wrote:
>From: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
>
>Jesper recently removed page_pool_destroy() (from driver invocation) and
>moved shutdown and free of page_pool into xdp_rxq_info_unreg(), in-order to
>handle in-flight packets/pages. This created an asymmetry in drivers
>create/destroy pairs.
>
>This patch add page_pool user refcnt and reintroduce page_pool_destroy.
>This serves two purposes, (1) simplify drivers error handling as driver now
>drivers always calls page_pool_destroy() and don't need to track if
>xdp_rxq_info_reg_mem_model() was unsuccessful. (2) allow special cases
>where a single RX-queue (with a single page_pool) provides packets for two
>net_device'es, and thus needs to register the same page_pool twice with two
>xdp_rxq_info structures.

As I tend to use xdp level patch there is no more reason to mention (2) case
here. XDP patch serves it better and can prevent not only obj deletion but also
pool flush, so, this one patch I could better leave only for (1) case.

-- 
Regards,
Ivan Khoronzhuk

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 net-next 1/6] xdp: allow same allocator usage
  2019-07-02 10:27     ` Ivan Khoronzhuk
@ 2019-07-02 14:46       ` Jesper Dangaard Brouer
  2019-07-02 14:53         ` Ivan Khoronzhuk
  0 siblings, 1 reply; 30+ messages in thread
From: Jesper Dangaard Brouer @ 2019-07-02 14:46 UTC (permalink / raw)
  To: Ivan Khoronzhuk
  Cc: grygorii.strashko, hawk, davem, ast, linux-kernel, linux-omap,
	xdp-newbies, ilias.apalodimas, netdev, daniel, jakub.kicinski,
	john.fastabend, brouer

On Tue, 2 Jul 2019 13:27:07 +0300
Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:

> On Mon, Jul 01, 2019 at 01:40:59PM +0200, Jesper Dangaard Brouer wrote:
> >
> >I'm very skeptical about this approach.
> >
> >On Sun, 30 Jun 2019 20:23:43 +0300
> >Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:
> >  
> >> XDP rxqs can be same for ndevs running under same rx napi softirq.
> >> But there is no ability to register same allocator for both rxqs,
> >> by fact it can same rxq but has different ndev as a reference.  
> >
> >This description is not very clear. It can easily be misunderstood.
> >
> >It is an absolute requirement that each RX-queue have their own
> >page_pool object/allocator. (This where the performance comes from) as
> >the page_pool have NAPI protected array for alloc and XDP_DROP recycle.
> >
> >Your driver/hardware seems to have special case, where a single
> >RX-queue can receive packets for two different net_device'es.
> >
> >Do you violate this XDP devmap redirect assumption[1]?
> >[1] https://github.com/torvalds/linux/blob/v5.2-rc7/kernel/bpf/devmap.c#L324-L329  
> Seems that yes, but that's used only for trace for now.
> As it runs under napi and flush clear dev_rx i must do it right in the
> rx_handler. So next patchset version will have one patch less.
> 
> Thanks!
> 
> >
> >  
> >> Due to last changes allocator destroy can be defered till the moment
> >> all packets are recycled by destination interface, afterwards it's
> >> freed. In order to schedule allocator destroy only after all users are
> >> unregistered, add refcnt to allocator object and schedule to destroy
> >> only it reaches 0.  
> >
> >The guiding principles when designing an API, is to make it easy to
> >use, but also make it hard to misuse.
> >
> >Your API change makes it easy to misuse the API.  As it make it easy to
> >(re)use the allocator pointer (likely page_pool) for multiple
> >xdp_rxq_info structs.  It is only valid for your use-case, because you
> >have hardware where a single RX-queue delivers to two different
> >net_devices.  For other normal use-cases, this will be a violation.
> >
> >If I was a user of this API, and saw your xdp_allocator_get(), then I
> >would assume that this was the normal case.  As minimum, we need to add
> >a comment in the code, about this specific/intended use-case.  I
> >through about detecting the misuse, by adding a queue_index to
> >xdp_mem_allocator, that can be checked against, when calling
> >xdp_rxq_info_reg_mem_model() with another xdp_rxq_info struct (to catch
> >the obvious mistake where queue_index mismatch).  
> 
> I can add, but not sure if it has or can have some conflicts with another
> memory allocators, now or in future. Main here to not became a cornerstone
> in some not obvious use-cases.
> 
> So, for now, let it be in this way:
> 
> --- a/include/net/xdp_priv.h
> +++ b/include/net/xdp_priv.h
> @@ -19,6 +19,7 @@ struct xdp_mem_allocator {
>         struct delayed_work defer_wq;
>         unsigned long defer_warn;
>         unsigned long refcnt;
> +       u32 queue_index;
>  };
> 
>  #endif /* __LINUX_NET_XDP_PRIV_H__ */
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index a44621190fdc..c4bf29810f4d 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -324,7 +324,7 @@ static bool __is_supported_mem_type(enum xdp_mem_type type)
>         return true;
>  }
> 
> -static struct xdp_mem_allocator *xdp_allocator_get(void *allocator)
> +static struct xdp_mem_allocator *xdp_allocator_find(void *allocator)
>  {
>         struct xdp_mem_allocator *xae, *xa = NULL;
>         struct rhashtable_iter iter;
> @@ -336,7 +336,6 @@ static struct xdp_mem_allocator *xdp_allocator_get(void *allocator)
> 
>                 while ((xae = rhashtable_walk_next(&iter)) && !IS_ERR(xae)) {
>                         if (xae->allocator == allocator) {
> -                               xae->refcnt++;
>                                 xa = xae;
>                                 break;
>                         }
> @@ -386,9 +385,13 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
>                 }
>         }
> 
> -       xdp_alloc = xdp_allocator_get(allocator);
> +       xdp_alloc = xdp_allocator_find(allocator);
>         if (xdp_alloc) {
> +               if (xdp_alloc->queue_index != xdp_rxq->queue_index)
> +                       return -EINVAL;
> +
>                 xdp_rxq->mem.id = xdp_alloc->mem.id;
> +               xdp_alloc->refcnt++;

This is now adjusted outside lock, not good.

>                 return 0;
>         }
> 
> @@ -406,6 +409,7 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
>         xdp_alloc->mem  = xdp_rxq->mem;
>         xdp_alloc->allocator = allocator;
>         xdp_alloc->refcnt = 1;
> +       xdp_alloc->queue_index = xdp_rxq->queue_index;
> 
>         /* Insert allocator into ID lookup table */
>         ptr = rhashtable_insert_slow(mem_id_ht, &id, &xdp_alloc->node);
> 
> Jesper, are you Ok with this version?

Please see my other patch, this is based on our first refcnt attempt.
I think that other patch is a better way forward.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] net: core: page_pool: add user refcnt and reintroduce page_pool_destroy
  2019-07-02 14:44           ` Ivan Khoronzhuk
@ 2019-07-02 14:52             ` Jesper Dangaard Brouer
  2019-07-02 14:56               ` Ivan Khoronzhuk
  0 siblings, 1 reply; 30+ messages in thread
From: Jesper Dangaard Brouer @ 2019-07-02 14:52 UTC (permalink / raw)
  To: Ivan Khoronzhuk
  Cc: netdev, Ilias Apalodimas, grygorii.strashko, jakub.kicinski,
	daniel, john.fastabend, ast, linux-kernel, linux-omap, brouer

On Tue, 2 Jul 2019 17:44:27 +0300
Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:

> On Tue, Jul 02, 2019 at 04:31:39PM +0200, Jesper Dangaard Brouer wrote:
> >From: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
> >
> >Jesper recently removed page_pool_destroy() (from driver invocation) and
> >moved shutdown and free of page_pool into xdp_rxq_info_unreg(), in-order to
> >handle in-flight packets/pages. This created an asymmetry in drivers
> >create/destroy pairs.
> >
> >This patch add page_pool user refcnt and reintroduce page_pool_destroy.
> >This serves two purposes, (1) simplify drivers error handling as driver now
> >drivers always calls page_pool_destroy() and don't need to track if
> >xdp_rxq_info_reg_mem_model() was unsuccessful. (2) allow special cases
> >where a single RX-queue (with a single page_pool) provides packets for two
> >net_device'es, and thus needs to register the same page_pool twice with two
> >xdp_rxq_info structures.  
> 
> As I tend to use xdp level patch there is no more reason to mention (2) case
> here. XDP patch serves it better and can prevent not only obj deletion but also
> pool flush, so, this one patch I could better leave only for (1) case.
 
I don't understand what you are saying.

Do you approve this patch, or do you reject this patch?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 net-next 1/6] xdp: allow same allocator usage
  2019-07-02 14:46       ` Jesper Dangaard Brouer
@ 2019-07-02 14:53         ` Ivan Khoronzhuk
  0 siblings, 0 replies; 30+ messages in thread
From: Ivan Khoronzhuk @ 2019-07-02 14:53 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: grygorii.strashko, hawk, davem, ast, linux-kernel, linux-omap,
	xdp-newbies, ilias.apalodimas, netdev, daniel, jakub.kicinski,
	john.fastabend

On Tue, Jul 02, 2019 at 04:46:48PM +0200, Jesper Dangaard Brouer wrote:
>On Tue, 2 Jul 2019 13:27:07 +0300
>Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:
>
>> On Mon, Jul 01, 2019 at 01:40:59PM +0200, Jesper Dangaard Brouer wrote:
>> >
>> >I'm very skeptical about this approach.
>> >
>> >On Sun, 30 Jun 2019 20:23:43 +0300
>> >Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:
>> >
>> >> XDP rxqs can be same for ndevs running under same rx napi softirq.
>> >> But there is no ability to register same allocator for both rxqs,
>> >> by fact it can same rxq but has different ndev as a reference.
>> >
>> >This description is not very clear. It can easily be misunderstood.
>> >
>> >It is an absolute requirement that each RX-queue have their own
>> >page_pool object/allocator. (This where the performance comes from) as
>> >the page_pool have NAPI protected array for alloc and XDP_DROP recycle.
>> >
>> >Your driver/hardware seems to have special case, where a single
>> >RX-queue can receive packets for two different net_device'es.
>> >
>> >Do you violate this XDP devmap redirect assumption[1]?
>> >[1] https://github.com/torvalds/linux/blob/v5.2-rc7/kernel/bpf/devmap.c#L324-L329
>> Seems that yes, but that's used only for trace for now.
>> As it runs under napi and flush clear dev_rx i must do it right in the
>> rx_handler. So next patchset version will have one patch less.
>>
>> Thanks!
>>
>> >
>> >
>> >> Due to last changes allocator destroy can be defered till the moment
>> >> all packets are recycled by destination interface, afterwards it's
>> >> freed. In order to schedule allocator destroy only after all users are
>> >> unregistered, add refcnt to allocator object and schedule to destroy
>> >> only it reaches 0.
>> >
>> >The guiding principles when designing an API, is to make it easy to
>> >use, but also make it hard to misuse.
>> >
>> >Your API change makes it easy to misuse the API.  As it make it easy to
>> >(re)use the allocator pointer (likely page_pool) for multiple
>> >xdp_rxq_info structs.  It is only valid for your use-case, because you
>> >have hardware where a single RX-queue delivers to two different
>> >net_devices.  For other normal use-cases, this will be a violation.
>> >
>> >If I was a user of this API, and saw your xdp_allocator_get(), then I
>> >would assume that this was the normal case.  As minimum, we need to add
>> >a comment in the code, about this specific/intended use-case.  I
>> >through about detecting the misuse, by adding a queue_index to
>> >xdp_mem_allocator, that can be checked against, when calling
>> >xdp_rxq_info_reg_mem_model() with another xdp_rxq_info struct (to catch
>> >the obvious mistake where queue_index mismatch).
>>
>> I can add, but not sure if it has or can have some conflicts with another
>> memory allocators, now or in future. Main here to not became a cornerstone
>> in some not obvious use-cases.
>>
>> So, for now, let it be in this way:
>>
>> --- a/include/net/xdp_priv.h
>> +++ b/include/net/xdp_priv.h
>> @@ -19,6 +19,7 @@ struct xdp_mem_allocator {
>>         struct delayed_work defer_wq;
>>         unsigned long defer_warn;
>>         unsigned long refcnt;
>> +       u32 queue_index;
>>  };
>>
>>  #endif /* __LINUX_NET_XDP_PRIV_H__ */
>> diff --git a/net/core/xdp.c b/net/core/xdp.c
>> index a44621190fdc..c4bf29810f4d 100644
>> --- a/net/core/xdp.c
>> +++ b/net/core/xdp.c
>> @@ -324,7 +324,7 @@ static bool __is_supported_mem_type(enum xdp_mem_type type)
>>         return true;
>>  }
>>
>> -static struct xdp_mem_allocator *xdp_allocator_get(void *allocator)
>> +static struct xdp_mem_allocator *xdp_allocator_find(void *allocator)
>>  {
>>         struct xdp_mem_allocator *xae, *xa = NULL;
>>         struct rhashtable_iter iter;
>> @@ -336,7 +336,6 @@ static struct xdp_mem_allocator *xdp_allocator_get(void *allocator)
>>
>>                 while ((xae = rhashtable_walk_next(&iter)) && !IS_ERR(xae)) {
>>                         if (xae->allocator == allocator) {
>> -                               xae->refcnt++;
>>                                 xa = xae;
>>                                 break;
>>                         }
>> @@ -386,9 +385,13 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
>>                 }
>>         }
>>
>> -       xdp_alloc = xdp_allocator_get(allocator);
>> +       xdp_alloc = xdp_allocator_find(allocator);
>>         if (xdp_alloc) {
>> +               if (xdp_alloc->queue_index != xdp_rxq->queue_index)
>> +                       return -EINVAL;
>> +
>>                 xdp_rxq->mem.id = xdp_alloc->mem.id;
>> +               xdp_alloc->refcnt++;
>
>This is now adjusted outside lock, not good.

In final it serves:

From f43a0b85838f75814cc93e5a724c4c7e5615f936 Mon Sep 17 00:00:00 2001
From: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Date: Fri, 28 Jun 2019 03:17:24 +0300
Subject: [PATCH] xdp: allow same allocator usage

XDP rxqs can be same for ndevs running under same rx napi softirq.
But there is no ability to register same allocator for both rxqs,
by fact it can same rxq but has different ndev as a reference.

Due to last changes allocator destroy can be defered till the moment
all packets are recycled by destination interface, afterwards it's
freed. In order to schedule allocator destroy only after all users are
unregistered, add refcnt to allocator object and schedule to destroy
only it reaches 0.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
 include/net/xdp_priv.h |  2 ++
 net/core/xdp.c         | 52 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+)

diff --git a/include/net/xdp_priv.h b/include/net/xdp_priv.h
index 6a8cba6ea79a..9858a4057842 100644
--- a/include/net/xdp_priv.h
+++ b/include/net/xdp_priv.h
@@ -18,6 +18,8 @@ struct xdp_mem_allocator {
 	struct rcu_head rcu;
 	struct delayed_work defer_wq;
 	unsigned long defer_warn;
+	unsigned long refcnt;
+	u32 queue_index;
 };
 
 #endif /* __LINUX_NET_XDP_PRIV_H__ */
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 829377cc83db..090f26e4f793 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -98,6 +98,18 @@ static bool __mem_id_disconnect(int id, bool force)
 		WARN(1, "Request remove non-existing id(%d), driver bug?", id);
 		return true;
 	}
+
+	/* to avoid calling hash lookup twice, decrement refcnt here till it
+	 * reaches zero, then it can be called from workqueue afterwards.
+	 */
+	if (xa->refcnt)
+		xa->refcnt--;
+
+	if (xa->refcnt) {
+		mutex_unlock(&mem_id_lock);
+		return true;
+	}
+
 	xa->disconnect_cnt++;
 
 	/* Detects in-flight packet-pages for page_pool */
@@ -312,6 +324,30 @@ static bool __is_supported_mem_type(enum xdp_mem_type type)
 	return true;
 }
 
+static struct xdp_mem_allocator *xdp_allocator_find(void *allocator)
+{
+	struct xdp_mem_allocator *xae, *xa = NULL;
+	struct rhashtable_iter iter;
+
+	rhashtable_walk_enter(mem_id_ht, &iter);
+	do {
+		rhashtable_walk_start(&iter);
+
+		while ((xae = rhashtable_walk_next(&iter)) && !IS_ERR(xae)) {
+			if (xae->allocator == allocator) {
+				xa = xae;
+				break;
+			}
+		}
+
+		rhashtable_walk_stop(&iter);
+
+	} while (xae == ERR_PTR(-EAGAIN));
+	rhashtable_walk_exit(&iter);
+
+	return xa;
+}
+
 int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
 			       enum xdp_mem_type type, void *allocator)
 {
@@ -347,6 +383,20 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
 		}
 	}
 
+	mutex_lock(&mem_id_lock);
+	xdp_alloc = xdp_allocator_find(allocator);
+	if (xdp_alloc) {
+		/* One allocator per queue is supposed only */
+		if (xdp_alloc->queue_index != xdp_rxq->queue_index)
+			return -EINVAL;
+
+		xdp_rxq->mem.id = xdp_alloc->mem.id;
+		xdp_alloc->refcnt++;
+		mutex_unlock(&mem_id_lock);
+		return 0;
+	}
+	mutex_unlock(&mem_id_lock);
+
 	xdp_alloc = kzalloc(sizeof(*xdp_alloc), gfp);
 	if (!xdp_alloc)
 		return -ENOMEM;
@@ -360,6 +410,8 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
 	xdp_rxq->mem.id = id;
 	xdp_alloc->mem  = xdp_rxq->mem;
 	xdp_alloc->allocator = allocator;
+	xdp_alloc->refcnt = 1;
+	xdp_alloc->queue_index = xdp_rxq->queue_index;
 
 	/* Insert allocator into ID lookup table */
 	ptr = rhashtable_insert_slow(mem_id_ht, &id, &xdp_alloc->node);

>
>>                 return 0;
>>         }
>>
>> @@ -406,6 +409,7 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
>>         xdp_alloc->mem  = xdp_rxq->mem;
>>         xdp_alloc->allocator = allocator;
>>         xdp_alloc->refcnt = 1;
>> +       xdp_alloc->queue_index = xdp_rxq->queue_index;
>>
>>         /* Insert allocator into ID lookup table */
>>         ptr = rhashtable_insert_slow(mem_id_ht, &id, &xdp_alloc->node);
>>
>> Jesper, are you Ok with this version?
>
>Please see my other patch, this is based on our first refcnt attempt.
>I think that other patch is a better way forward.
XDP patch serves it better and can prevent not only obj deletion but also
pool flush. So I propose use 2 patches.

-- 
Regards,
Ivan Khoronzhuk

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH] net: core: page_pool: add user refcnt and reintroduce page_pool_destroy
  2019-07-02 14:52             ` Jesper Dangaard Brouer
@ 2019-07-02 14:56               ` Ivan Khoronzhuk
  2019-07-02 15:10                 ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 30+ messages in thread
From: Ivan Khoronzhuk @ 2019-07-02 14:56 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, Ilias Apalodimas, grygorii.strashko, jakub.kicinski,
	daniel, john.fastabend, ast, linux-kernel, linux-omap

On Tue, Jul 02, 2019 at 04:52:30PM +0200, Jesper Dangaard Brouer wrote:
>On Tue, 2 Jul 2019 17:44:27 +0300
>Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:
>
>> On Tue, Jul 02, 2019 at 04:31:39PM +0200, Jesper Dangaard Brouer wrote:
>> >From: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
>> >
>> >Jesper recently removed page_pool_destroy() (from driver invocation) and
>> >moved shutdown and free of page_pool into xdp_rxq_info_unreg(), in-order to
>> >handle in-flight packets/pages. This created an asymmetry in drivers
>> >create/destroy pairs.
>> >
>> >This patch add page_pool user refcnt and reintroduce page_pool_destroy.
>> >This serves two purposes, (1) simplify drivers error handling as driver now
>> >drivers always calls page_pool_destroy() and don't need to track if
>> >xdp_rxq_info_reg_mem_model() was unsuccessful. (2) allow special cases
>> >where a single RX-queue (with a single page_pool) provides packets for two
>> >net_device'es, and thus needs to register the same page_pool twice with two
>> >xdp_rxq_info structures.
>>
>> As I tend to use xdp level patch there is no more reason to mention (2) case
>> here. XDP patch serves it better and can prevent not only obj deletion but also
>> pool flush, so, this one patch I could better leave only for (1) case.
>
>I don't understand what you are saying.
>
>Do you approve this patch, or do you reject this patch?
>
It's not reject, it's proposition to use both, XDP and page pool patches,
each having its goal.

-- 
Regards,
Ivan Khoronzhuk

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] net: core: page_pool: add user refcnt and reintroduce page_pool_destroy
  2019-07-02 14:56               ` Ivan Khoronzhuk
@ 2019-07-02 15:10                 ` Jesper Dangaard Brouer
  2019-07-02 15:21                   ` Ivan Khoronzhuk
  0 siblings, 1 reply; 30+ messages in thread
From: Jesper Dangaard Brouer @ 2019-07-02 15:10 UTC (permalink / raw)
  To: Ivan Khoronzhuk
  Cc: netdev, Ilias Apalodimas, grygorii.strashko, jakub.kicinski,
	daniel, john.fastabend, ast, linux-kernel, linux-omap, brouer

On Tue, 2 Jul 2019 17:56:13 +0300
Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:

> On Tue, Jul 02, 2019 at 04:52:30PM +0200, Jesper Dangaard Brouer wrote:
> >On Tue, 2 Jul 2019 17:44:27 +0300
> >Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:
> >  
> >> On Tue, Jul 02, 2019 at 04:31:39PM +0200, Jesper Dangaard Brouer wrote:  
> >> >From: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
> >> >
> >> >Jesper recently removed page_pool_destroy() (from driver invocation) and
> >> >moved shutdown and free of page_pool into xdp_rxq_info_unreg(), in-order to
> >> >handle in-flight packets/pages. This created an asymmetry in drivers
> >> >create/destroy pairs.
> >> >
> >> >This patch add page_pool user refcnt and reintroduce page_pool_destroy.
> >> >This serves two purposes, (1) simplify drivers error handling as driver now
> >> >drivers always calls page_pool_destroy() and don't need to track if
> >> >xdp_rxq_info_reg_mem_model() was unsuccessful. (2) allow special cases
> >> >where a single RX-queue (with a single page_pool) provides packets for two
> >> >net_device'es, and thus needs to register the same page_pool twice with two
> >> >xdp_rxq_info structures.  
> >>
> >> As I tend to use xdp level patch there is no more reason to mention (2) case
> >> here. XDP patch serves it better and can prevent not only obj deletion but also
> >> pool flush, so, this one patch I could better leave only for (1) case.  
> >
> >I don't understand what you are saying.
> >
> >Do you approve this patch, or do you reject this patch?
> >  
> It's not reject, it's proposition to use both, XDP and page pool patches,
> each having its goal.

Just to be clear, if you want this patch to get accepted you have to
reply with your Signed-off-by (as I wrote).

Maybe we should discuss it in another thread, about why you want two
solutions to the same problem.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] net: core: page_pool: add user refcnt and reintroduce page_pool_destroy
  2019-07-02 15:10                 ` Jesper Dangaard Brouer
@ 2019-07-02 15:21                   ` Ivan Khoronzhuk
  2019-07-02 18:29                     ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 30+ messages in thread
From: Ivan Khoronzhuk @ 2019-07-02 15:21 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, Ilias Apalodimas, grygorii.strashko, jakub.kicinski,
	daniel, john.fastabend, ast, linux-kernel, linux-omap

On Tue, Jul 02, 2019 at 05:10:29PM +0200, Jesper Dangaard Brouer wrote:
>On Tue, 2 Jul 2019 17:56:13 +0300
>Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:
>
>> On Tue, Jul 02, 2019 at 04:52:30PM +0200, Jesper Dangaard Brouer wrote:
>> >On Tue, 2 Jul 2019 17:44:27 +0300
>> >Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:
>> >
>> >> On Tue, Jul 02, 2019 at 04:31:39PM +0200, Jesper Dangaard Brouer wrote:
>> >> >From: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
>> >> >
>> >> >Jesper recently removed page_pool_destroy() (from driver invocation) and
>> >> >moved shutdown and free of page_pool into xdp_rxq_info_unreg(), in-order to
>> >> >handle in-flight packets/pages. This created an asymmetry in drivers
>> >> >create/destroy pairs.
>> >> >
>> >> >This patch add page_pool user refcnt and reintroduce page_pool_destroy.
>> >> >This serves two purposes, (1) simplify drivers error handling as driver now
>> >> >drivers always calls page_pool_destroy() and don't need to track if
>> >> >xdp_rxq_info_reg_mem_model() was unsuccessful. (2) allow special cases
>> >> >where a single RX-queue (with a single page_pool) provides packets for two
>> >> >net_device'es, and thus needs to register the same page_pool twice with two
>> >> >xdp_rxq_info structures.
>> >>
>> >> As I tend to use xdp level patch there is no more reason to mention (2) case
>> >> here. XDP patch serves it better and can prevent not only obj deletion but also
>> >> pool flush, so, this one patch I could better leave only for (1) case.
>> >
>> >I don't understand what you are saying.
>> >
>> >Do you approve this patch, or do you reject this patch?
>> >
>> It's not reject, it's proposition to use both, XDP and page pool patches,
>> each having its goal.
>
>Just to be clear, if you want this patch to get accepted you have to
>reply with your Signed-off-by (as I wrote).
>
>Maybe we should discuss it in another thread, about why you want two
>solutions to the same problem.

If it solves same problem I propose to reject this one and use this:
https://lkml.org/lkml/2019/7/2/651

-- 
Regards,
Ivan Khoronzhuk

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] net: core: page_pool: add user refcnt and reintroduce page_pool_destroy
  2019-07-02 15:21                   ` Ivan Khoronzhuk
@ 2019-07-02 18:29                     ` Jesper Dangaard Brouer
  2019-07-02 18:58                       ` Ivan Khoronzhuk
  0 siblings, 1 reply; 30+ messages in thread
From: Jesper Dangaard Brouer @ 2019-07-02 18:29 UTC (permalink / raw)
  To: Ivan Khoronzhuk
  Cc: netdev, Ilias Apalodimas, grygorii.strashko, jakub.kicinski,
	daniel, john.fastabend, ast, linux-kernel, linux-omap, brouer

On Tue, 2 Jul 2019 18:21:13 +0300
Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:

> On Tue, Jul 02, 2019 at 05:10:29PM +0200, Jesper Dangaard Brouer wrote:
> >On Tue, 2 Jul 2019 17:56:13 +0300
> >Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:
> >  
> >> On Tue, Jul 02, 2019 at 04:52:30PM +0200, Jesper Dangaard Brouer wrote:  
> >> >On Tue, 2 Jul 2019 17:44:27 +0300
> >> >Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:
> >> >  
> >> >> On Tue, Jul 02, 2019 at 04:31:39PM +0200, Jesper Dangaard Brouer wrote:  
> >> >> >From: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
> >> >> >
> >> >> >Jesper recently removed page_pool_destroy() (from driver invocation) and
> >> >> >moved shutdown and free of page_pool into xdp_rxq_info_unreg(), in-order to
> >> >> >handle in-flight packets/pages. This created an asymmetry in drivers
> >> >> >create/destroy pairs.
> >> >> >
> >> >> >This patch add page_pool user refcnt and reintroduce page_pool_destroy.
> >> >> >This serves two purposes, (1) simplify drivers error handling as driver now
> >> >> >drivers always calls page_pool_destroy() and don't need to track if
> >> >> >xdp_rxq_info_reg_mem_model() was unsuccessful. (2) allow special cases
> >> >> >where a single RX-queue (with a single page_pool) provides packets for two
> >> >> >net_device'es, and thus needs to register the same page_pool twice with two
> >> >> >xdp_rxq_info structures.  
> >> >>
> >> >> As I tend to use xdp level patch there is no more reason to mention (2) case
> >> >> here. XDP patch serves it better and can prevent not only obj deletion but also
> >> >> pool flush, so, this one patch I could better leave only for (1) case.  
> >> >
> >> >I don't understand what you are saying.
> >> >
> >> >Do you approve this patch, or do you reject this patch?
> >> >  
> >> It's not reject, it's proposition to use both, XDP and page pool patches,
> >> each having its goal.  
> >
> >Just to be clear, if you want this patch to get accepted you have to
> >reply with your Signed-off-by (as I wrote).
> >
> >Maybe we should discuss it in another thread, about why you want two
> >solutions to the same problem.  
> 
> If it solves same problem I propose to reject this one and use this:
> https://lkml.org/lkml/2019/7/2/651

No, I propose using this one, and rejecting the other one.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] net: core: page_pool: add user refcnt and reintroduce page_pool_destroy
  2019-07-02 18:29                     ` Jesper Dangaard Brouer
@ 2019-07-02 18:58                       ` Ivan Khoronzhuk
  2019-07-02 20:28                         ` Ivan Khoronzhuk
  2019-07-02 21:02                         ` Jesper Dangaard Brouer
  0 siblings, 2 replies; 30+ messages in thread
From: Ivan Khoronzhuk @ 2019-07-02 18:58 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, Ilias Apalodimas, grygorii.strashko, jakub.kicinski,
	daniel, john.fastabend, ast, linux-kernel, linux-omap

On Tue, Jul 02, 2019 at 08:29:07PM +0200, Jesper Dangaard Brouer wrote:
>On Tue, 2 Jul 2019 18:21:13 +0300
>Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:
>
>> On Tue, Jul 02, 2019 at 05:10:29PM +0200, Jesper Dangaard Brouer wrote:
>> >On Tue, 2 Jul 2019 17:56:13 +0300
>> >Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:
>> >
>> >> On Tue, Jul 02, 2019 at 04:52:30PM +0200, Jesper Dangaard Brouer wrote:
>> >> >On Tue, 2 Jul 2019 17:44:27 +0300
>> >> >Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:
>> >> >
>> >> >> On Tue, Jul 02, 2019 at 04:31:39PM +0200, Jesper Dangaard Brouer wrote:
>> >> >> >From: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
>> >> >> >
>> >> >> >Jesper recently removed page_pool_destroy() (from driver invocation) and
>> >> >> >moved shutdown and free of page_pool into xdp_rxq_info_unreg(), in-order to
>> >> >> >handle in-flight packets/pages. This created an asymmetry in drivers
>> >> >> >create/destroy pairs.
>> >> >> >
>> >> >> >This patch add page_pool user refcnt and reintroduce page_pool_destroy.
>> >> >> >This serves two purposes, (1) simplify drivers error handling as driver now
>> >> >> >drivers always calls page_pool_destroy() and don't need to track if
>> >> >> >xdp_rxq_info_reg_mem_model() was unsuccessful. (2) allow special cases
>> >> >> >where a single RX-queue (with a single page_pool) provides packets for two
>> >> >> >net_device'es, and thus needs to register the same page_pool twice with two
>> >> >> >xdp_rxq_info structures.
>> >> >>
>> >> >> As I tend to use xdp level patch there is no more reason to mention (2) case
>> >> >> here. XDP patch serves it better and can prevent not only obj deletion but also
>> >> >> pool flush, so, this one patch I could better leave only for (1) case.
>> >> >
>> >> >I don't understand what you are saying.
>> >> >
>> >> >Do you approve this patch, or do you reject this patch?
>> >> >
>> >> It's not reject, it's proposition to use both, XDP and page pool patches,
>> >> each having its goal.
>> >
>> >Just to be clear, if you want this patch to get accepted you have to
>> >reply with your Signed-off-by (as I wrote).
>> >
>> >Maybe we should discuss it in another thread, about why you want two
>> >solutions to the same problem.
>>
>> If it solves same problem I propose to reject this one and use this:
>> https://lkml.org/lkml/2019/7/2/651
>
>No, I propose using this one, and rejecting the other one.

There is at least several arguments against this one (related (2) purpose)

It allows:
- avoid changes to page_pool/mlx5/netsec
- save not only allocator obj but allocator "page/buffer flush"
- buffer flush can be present not only in page_pool but for other allocators
  that can behave differently and not so simple solution.
- to not limit cpsw/(potentially others) to use "page_pool" allocator only
....

This patch better leave also, as it simplifies error path for page_pool and
have more error prone usage comparing with existent one.

Please, don't limit cpsw and potentially other drivers to use only
page_pool it can be zca or etc... I don't won't to modify each allocator.
I propose to add both as by fact they solve different problems with common
solution.

-- 
Regards,
Ivan Khoronzhuk

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] net: core: page_pool: add user refcnt and reintroduce page_pool_destroy
  2019-07-02 18:58                       ` Ivan Khoronzhuk
@ 2019-07-02 20:28                         ` Ivan Khoronzhuk
  2019-07-02 21:02                         ` Jesper Dangaard Brouer
  1 sibling, 0 replies; 30+ messages in thread
From: Ivan Khoronzhuk @ 2019-07-02 20:28 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, netdev, Ilias Apalodimas,
	grygorii.strashko, jakub.kicinski, daniel, john.fastabend, ast,
	linux-kernel, linux-omap

On Tue, Jul 02, 2019 at 09:58:40PM +0300, Ivan Khoronzhuk wrote:
>On Tue, Jul 02, 2019 at 08:29:07PM +0200, Jesper Dangaard Brouer wrote:
>>On Tue, 2 Jul 2019 18:21:13 +0300
>>Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:
>>
>>>On Tue, Jul 02, 2019 at 05:10:29PM +0200, Jesper Dangaard Brouer wrote:
>>>>On Tue, 2 Jul 2019 17:56:13 +0300
>>>>Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:
>>>>
>>>>> On Tue, Jul 02, 2019 at 04:52:30PM +0200, Jesper Dangaard Brouer wrote:
>>>>> >On Tue, 2 Jul 2019 17:44:27 +0300
>>>>> >Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:
>>>>> >
>>>>> >> On Tue, Jul 02, 2019 at 04:31:39PM +0200, Jesper Dangaard Brouer wrote:
>>>>> >> >From: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
>>>>> >> >
>>>>> >> >Jesper recently removed page_pool_destroy() (from driver invocation) and
>>>>> >> >moved shutdown and free of page_pool into xdp_rxq_info_unreg(), in-order to
>>>>> >> >handle in-flight packets/pages. This created an asymmetry in drivers
>>>>> >> >create/destroy pairs.
>>>>> >> >
>>>>> >> >This patch add page_pool user refcnt and reintroduce page_pool_destroy.
>>>>> >> >This serves two purposes, (1) simplify drivers error handling as driver now
>>>>> >> >drivers always calls page_pool_destroy() and don't need to track if
>>>>> >> >xdp_rxq_info_reg_mem_model() was unsuccessful. (2) allow special cases
>>>>> >> >where a single RX-queue (with a single page_pool) provides packets for two
>>>>> >> >net_device'es, and thus needs to register the same page_pool twice with two
>>>>> >> >xdp_rxq_info structures.
>>>>> >>
>>>>> >> As I tend to use xdp level patch there is no more reason to mention (2) case
>>>>> >> here. XDP patch serves it better and can prevent not only obj deletion but also
>>>>> >> pool flush, so, this one patch I could better leave only for (1) case.
>>>>> >
>>>>> >I don't understand what you are saying.
>>>>> >
>>>>> >Do you approve this patch, or do you reject this patch?
>>>>> >
>>>>> It's not reject, it's proposition to use both, XDP and page pool patches,
>>>>> each having its goal.
>>>>
>>>>Just to be clear, if you want this patch to get accepted you have to
>>>>reply with your Signed-off-by (as I wrote).
>>>>
>>>>Maybe we should discuss it in another thread, about why you want two
>>>>solutions to the same problem.
>>>
>>>If it solves same problem I propose to reject this one and use this:
>>>https://lkml.org/lkml/2019/7/2/651
>>
>>No, I propose using this one, and rejecting the other one.
>
>There is at least several arguments against this one (related (2) purpose)
>
>It allows:
>- avoid changes to page_pool/mlx5/netsec
>- save not only allocator obj but allocator "page/buffer flush"
>- buffer flush can be present not only in page_pool but for other allocators
> that can behave differently and not so simple solution.
>- to not limit cpsw/(potentially others) to use "page_pool" allocator only
>....
>
>This patch better leave also, as it simplifies error path for page_pool and
>have more error prone usage comparing with existent one.
>
>Please, don't limit cpsw and potentially other drivers to use only
>page_pool it can be zca or etc... I don't won't to modify each allocator.
>I propose to add both as by fact they solve different problems with common
>solution.

I can pick up this one but remove description related to (2) and add
appropriate modifications to cpsw.

-- 
Regards,
Ivan Khoronzhuk

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] net: core: page_pool: add user refcnt and reintroduce page_pool_destroy
  2019-07-02 18:58                       ` Ivan Khoronzhuk
  2019-07-02 20:28                         ` Ivan Khoronzhuk
@ 2019-07-02 21:02                         ` Jesper Dangaard Brouer
  2019-07-02 21:15                           ` Ilias Apalodimas
  1 sibling, 1 reply; 30+ messages in thread
From: Jesper Dangaard Brouer @ 2019-07-02 21:02 UTC (permalink / raw)
  To: Ivan Khoronzhuk
  Cc: netdev, Ilias Apalodimas, grygorii.strashko, jakub.kicinski,
	daniel, john.fastabend, ast, linux-kernel, linux-omap, brouer

On Tue, 2 Jul 2019 21:58:40 +0300
Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:

> On Tue, Jul 02, 2019 at 08:29:07PM +0200, Jesper Dangaard Brouer wrote:
> >On Tue, 2 Jul 2019 18:21:13 +0300
> >Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:
> >  
> >> On Tue, Jul 02, 2019 at 05:10:29PM +0200, Jesper Dangaard Brouer wrote:  
> >> >On Tue, 2 Jul 2019 17:56:13 +0300
> >> >Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:
> >> >  
> >> >> On Tue, Jul 02, 2019 at 04:52:30PM +0200, Jesper Dangaard Brouer wrote:  
> >> >> >On Tue, 2 Jul 2019 17:44:27 +0300
> >> >> >Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:
> >> >> >  
> >> >> >> On Tue, Jul 02, 2019 at 04:31:39PM +0200, Jesper Dangaard Brouer wrote:  
> >> >> >> >From: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
> >> >> >> >
> >> >> >> >Jesper recently removed page_pool_destroy() (from driver invocation) and
> >> >> >> >moved shutdown and free of page_pool into xdp_rxq_info_unreg(), in-order to
> >> >> >> >handle in-flight packets/pages. This created an asymmetry in drivers
> >> >> >> >create/destroy pairs.
> >> >> >> >
> >> >> >> >This patch add page_pool user refcnt and reintroduce page_pool_destroy.
> >> >> >> >This serves two purposes, (1) simplify drivers error handling as driver now
> >> >> >> >drivers always calls page_pool_destroy() and don't need to track if
> >> >> >> >xdp_rxq_info_reg_mem_model() was unsuccessful. (2) allow special cases
> >> >> >> >where a single RX-queue (with a single page_pool) provides packets for two
> >> >> >> >net_device'es, and thus needs to register the same page_pool twice with two
> >> >> >> >xdp_rxq_info structures.  
> >> >> >>
> >> >> >> As I tend to use xdp level patch there is no more reason to mention (2) case
> >> >> >> here. XDP patch serves it better and can prevent not only obj deletion but also
> >> >> >> pool flush, so, this one patch I could better leave only for (1) case.  
> >> >> >
> >> >> >I don't understand what you are saying.
> >> >> >
> >> >> >Do you approve this patch, or do you reject this patch?
> >> >> >  
> >> >> It's not reject, it's proposition to use both, XDP and page pool patches,
> >> >> each having its goal.  
> >> >
> >> >Just to be clear, if you want this patch to get accepted you have to
> >> >reply with your Signed-off-by (as I wrote).
> >> >
> >> >Maybe we should discuss it in another thread, about why you want two
> >> >solutions to the same problem.  
> >>
> >> If it solves same problem I propose to reject this one and use this:
> >> https://lkml.org/lkml/2019/7/2/651  
> >
> >No, I propose using this one, and rejecting the other one.  
> 
> There is at least several arguments against this one (related (2) purpose)
> 
> It allows:
> - avoid changes to page_pool/mlx5/netsec
> - save not only allocator obj but allocator "page/buffer flush"
> - buffer flush can be present not only in page_pool but for other allocators
>   that can behave differently and not so simple solution.
> - to not limit cpsw/(potentially others) to use "page_pool" allocator only
> ....
> 
> This patch better leave also, as it simplifies error path for page_pool and
> have more error prone usage comparing with existent one.
> 
> Please, don't limit cpsw and potentially other drivers to use only
> page_pool it can be zca or etc... I don't won't to modify each allocator.
> I propose to add both as by fact they solve different problems with common
> solution.


I'm trying to limit the scope of your changes, for your special case,
because I'm afraid this more common solution is going to limit our
options, painting ourselves into a corner.

E.g. for correct lifetime handling, I think we actually need to do a
dev_hold() on the net_device. (Changes in f71fec47c2 might not be
enough, but I first need to dig into the details and ask Hellwig about
some details).  Adding that after your patch is more complicated (if
even doable).

E.g. doing dev_hold() on the net_device, can also turn into a
performance advantage, when/if page_pool is extended to also "travel"
into SKBs. (Allowing to elide such dev_hold() calls in netstack).

I also worry about the possible performance impact these changes will
have down the road.  (For the RX/alloc side it should be clear by now
that we gain a lot of performance with the single RX-queue binding and
napi protection).  On the return/free side performance *need* to be
improved (it doesn't scale).  I'm basically looking at different ways
to bulk return pages into the ptr_ring, which requires changes in
page_pool and likely in xdp_allocator structure.  Which your changes
are complicating.

This special use-case, seems confined to your driver. And Ilias told me
that XDP is not really a performance benefit for this driver as the HW
PPS-limit is hit before the XDP and netstack limit.  I ask, does it
make sense to add XDP to this driver, if it complicates the code for
everybody else?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] net: core: page_pool: add user refcnt and reintroduce page_pool_destroy
  2019-07-02 21:02                         ` Jesper Dangaard Brouer
@ 2019-07-02 21:15                           ` Ilias Apalodimas
  2019-07-02 21:41                             ` Ivan Khoronzhuk
  0 siblings, 1 reply; 30+ messages in thread
From: Ilias Apalodimas @ 2019-07-02 21:15 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Ivan Khoronzhuk, netdev, grygorii.strashko, jakub.kicinski,
	daniel, john.fastabend, ast, linux-kernel, linux-omap

Hi Jesper,
Getting late here, i'll respond in detail tomorrow. One point though

[...]
> 
> This special use-case, seems confined to your driver. And Ilias told me
> that XDP is not really a performance benefit for this driver as the HW
> PPS-limit is hit before the XDP and netstack limit.  I ask, does it
> make sense to add XDP to this driver, if it complicates the code for
> everybody else?
I think yes. This is a widely used driver on TI embedded devices so having XDP 
to play along is a nice feature. It's also the first and only armv7 we have
supporting this. Ivan already found a couple of issues due to the 32-bit
architecture he is trying to fix, i think there's real benefit in having that,
performance aside.
I fully agree we should not impact the performance of the API to support a
special hardware though. I'll have a look on the 2 solutions tomorrow, but the
general approach on this one should be 'the simpler the better'

Cheers
/Ilias

> 
> -- 
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] net: core: page_pool: add user refcnt and reintroduce page_pool_destroy
  2019-07-02 21:15                           ` Ilias Apalodimas
@ 2019-07-02 21:41                             ` Ivan Khoronzhuk
  0 siblings, 0 replies; 30+ messages in thread
From: Ivan Khoronzhuk @ 2019-07-02 21:41 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: Jesper Dangaard Brouer, netdev, grygorii.strashko,
	jakub.kicinski, daniel, john.fastabend, ast, linux-kernel,
	linux-omap

On Wed, Jul 03, 2019 at 12:15:36AM +0300, Ilias Apalodimas wrote:
>Hi Jesper,
>Getting late here, i'll respond in detail tomorrow. One point though
>
>[...]
>>
>> This special use-case, seems confined to your driver. And Ilias told me
>> that XDP is not really a performance benefit for this driver as the HW
>> PPS-limit is hit before the XDP and netstack limit.  I ask, does it
>> make sense to add XDP to this driver, if it complicates the code for
>> everybody else?
>I think yes. This is a widely used driver on TI embedded devices so having XDP
>to play along is a nice feature. It's also the first and only armv7 we have
>supporting this. Ivan already found a couple of issues due to the 32-bit
>architecture he is trying to fix, i think there's real benefit in having that,
>performance aside.
>I fully agree we should not impact the performance of the API to support a
>special hardware though. I'll have a look on the 2 solutions tomorrow, but the
>general approach on this one should be 'the simpler the better'
>
>Cheers
>/Ilias

BTW even w/o optimization it has close to 300kpps (but with increased number of
descs) on drop which is very close to netsec measurements Ilias sent. But from
what I know there is no h/w limit on cpsw at all that this CPU can serve, so
my assumption it's rather s/w limit. But that's not main here and XDP usage
has not been estimated enough yet in embedded, where hi speed not only benefit
that can be taken from XDP.

I need more clear circumstances to send v6 ...

-- 
Regards,
Ivan Khoronzhuk

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 net-next 6/6] net: ethernet: ti: cpsw: add XDP support
  2019-06-30 17:23 ` [PATCH v5 net-next 6/6] net: ethernet: ti: cpsw: add XDP support Ivan Khoronzhuk
  2019-07-01 16:19   ` Jesper Dangaard Brouer
@ 2019-07-03  7:26   ` Jesper Dangaard Brouer
  2019-07-03  7:38     ` Ivan Khoronzhuk
  1 sibling, 1 reply; 30+ messages in thread
From: Jesper Dangaard Brouer @ 2019-07-03  7:26 UTC (permalink / raw)
  To: Ivan Khoronzhuk
  Cc: grygorii.strashko, hawk, davem, ast, linux-kernel, linux-omap,
	xdp-newbies, ilias.apalodimas, netdev, daniel, jakub.kicinski,
	john.fastabend, brouer


On Sun, 30 Jun 2019 20:23:48 +0300 Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:

> Add XDP support based on rx page_pool allocator, one frame per page.
> Page pool allocator is used with assumption that only one rx_handler
> is running simultaneously. DMA map/unmap is reused from page pool
> despite there is no need to map whole page.
> 
> Due to specific of cpsw, the same TX/RX handler can be used by 2
> network devices, so special fields in buffer are added to identify
> an interface the frame is destined to. Thus XDP works for both
> interfaces, that allows to test xdp redirect between two interfaces
> easily. Aslo, each rx queue have own page pools, but common for both
> netdevs.

Looking at the details what happen when a single RX-queue can receive
into multiple net_device'es.  I realize that this driver will
violate/kill some of the "hidden"/implicit RX-bulking that the
XDP_REDIRECT code depend on for performance.

Specifically, it violate this assumption:
 https://github.com/torvalds/linux/blob/v5.2-rc7/kernel/bpf/devmap.c#L324-L329

	/* Ingress dev_rx will be the same for all xdp_frame's in
	 * bulk_queue, because bq stored per-CPU and must be flushed
	 * from net_device drivers NAPI func end.
	 */
	if (!bq->dev_rx)
		bq->dev_rx = dev_rx;

This drivers "NAPI func end", can have received into multiple
net_devices, before it's NAPI cycle ends.  Thus, violating this code
assumption.

Knowing all xdp_frame's in the bulk queue is from the same net_device,
can be used to further optimize XDP.  E.g. the dev->netdev_ops->ndo_xdp_xmit()
call don't take fully advantage of this, yet.  If we merge this driver,
it will block optimizations in this area.

NACK

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 net-next 6/6] net: ethernet: ti: cpsw: add XDP support
  2019-07-03  7:26   ` [PATCH v5 net-next 6/6] net: ethernet: ti: cpsw: add XDP support Jesper Dangaard Brouer
@ 2019-07-03  7:38     ` Ivan Khoronzhuk
  0 siblings, 0 replies; 30+ messages in thread
From: Ivan Khoronzhuk @ 2019-07-03  7:38 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: grygorii.strashko, hawk, davem, ast, linux-kernel, linux-omap,
	xdp-newbies, ilias.apalodimas, netdev, daniel, jakub.kicinski,
	john.fastabend

On Wed, Jul 03, 2019 at 09:26:03AM +0200, Jesper Dangaard Brouer wrote:
>
>On Sun, 30 Jun 2019 20:23:48 +0300 Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:
>
>> Add XDP support based on rx page_pool allocator, one frame per page.
>> Page pool allocator is used with assumption that only one rx_handler
>> is running simultaneously. DMA map/unmap is reused from page pool
>> despite there is no need to map whole page.
>>
>> Due to specific of cpsw, the same TX/RX handler can be used by 2
>> network devices, so special fields in buffer are added to identify
>> an interface the frame is destined to. Thus XDP works for both
>> interfaces, that allows to test xdp redirect between two interfaces
>> easily. Aslo, each rx queue have own page pools, but common for both
>> netdevs.
>
>Looking at the details what happen when a single RX-queue can receive
>into multiple net_device'es.  I realize that this driver will
>violate/kill some of the "hidden"/implicit RX-bulking that the
>XDP_REDIRECT code depend on for performance.
>
>Specifically, it violate this assumption:
> https://github.com/torvalds/linux/blob/v5.2-rc7/kernel/bpf/devmap.c#L324-L329
>
>	/* Ingress dev_rx will be the same for all xdp_frame's in
>	 * bulk_queue, because bq stored per-CPU and must be flushed
>	 * from net_device drivers NAPI func end.
>	 */
>	if (!bq->dev_rx)
>		bq->dev_rx = dev_rx;
>
>This drivers "NAPI func end", can have received into multiple
>net_devices, before it's NAPI cycle ends.  Thus, violating this code
>assumption.
). I said, I moved to be per device in rx_handler. It violates nothing.

>
>Knowing all xdp_frame's in the bulk queue is from the same net_device,
>can be used to further optimize XDP.  E.g. the dev->netdev_ops->ndo_xdp_xmit()
>call don't take fully advantage of this, yet.  If we merge this driver,
>it will block optimizations in this area.
>
>NACK

Jesper,

Seems I said that I moved it to flush, that does
dev->netdev_ops->ndo_xdp_xmit(), to rx_handler, so that it's done per device,
so device is knows per each flush.

In the code, I hope everyone can see ..., after each flush dev_rx is cleared
to 0. So no any impact on it.

As for me, it's very not clear and strange decision.

-- 
Regards,
Ivan Khoronzhuk

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2019-07-03  7:39 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-30 17:23 [PATCH v5 net-next 0/6] net: ethernet: ti: cpsw: Add XDP support Ivan Khoronzhuk
2019-06-30 17:23 ` [PATCH v5 net-next 1/6] xdp: allow same allocator usage Ivan Khoronzhuk
2019-07-01 11:40   ` Jesper Dangaard Brouer
2019-07-02 10:27     ` Ivan Khoronzhuk
2019-07-02 14:46       ` Jesper Dangaard Brouer
2019-07-02 14:53         ` Ivan Khoronzhuk
2019-06-30 17:23 ` [PATCH v5 net-next 2/6] net: ethernet: ti: davinci_cpdma: add dma mapped submit Ivan Khoronzhuk
2019-06-30 17:23 ` [PATCH v5 net-next 3/6] net: ethernet: ti: davinci_cpdma: return handler status Ivan Khoronzhuk
2019-06-30 17:23 ` [PATCH v5 net-next 4/6] net: ethernet: ti: davinci_cpdma: allow desc split while down Ivan Khoronzhuk
2019-06-30 17:23 ` [PATCH v5 net-next 5/6] net: ethernet: ti: cpsw_ethtool: allow res " Ivan Khoronzhuk
2019-06-30 17:23 ` [PATCH v5 net-next 6/6] net: ethernet: ti: cpsw: add XDP support Ivan Khoronzhuk
2019-07-01 16:19   ` Jesper Dangaard Brouer
2019-07-01 18:09     ` Ilias Apalodimas
2019-07-02 11:37     ` Ivan Khoronzhuk
2019-07-02 13:39       ` Jesper Dangaard Brouer
2019-07-02 14:24         ` Ivan Khoronzhuk
2019-07-02 14:31         ` [PATCH] net: core: page_pool: add user refcnt and reintroduce page_pool_destroy Jesper Dangaard Brouer
2019-07-02 14:44           ` Ivan Khoronzhuk
2019-07-02 14:52             ` Jesper Dangaard Brouer
2019-07-02 14:56               ` Ivan Khoronzhuk
2019-07-02 15:10                 ` Jesper Dangaard Brouer
2019-07-02 15:21                   ` Ivan Khoronzhuk
2019-07-02 18:29                     ` Jesper Dangaard Brouer
2019-07-02 18:58                       ` Ivan Khoronzhuk
2019-07-02 20:28                         ` Ivan Khoronzhuk
2019-07-02 21:02                         ` Jesper Dangaard Brouer
2019-07-02 21:15                           ` Ilias Apalodimas
2019-07-02 21:41                             ` Ivan Khoronzhuk
2019-07-03  7:26   ` [PATCH v5 net-next 6/6] net: ethernet: ti: cpsw: add XDP support Jesper Dangaard Brouer
2019-07-03  7:38     ` Ivan Khoronzhuk

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).