* [PATCH 00/11] XDP unaligned chunk placement support
@ 2019-06-20  8:39 Kevin Laatz
  2019-06-20  8:39 ` [PATCH 01/11] i40e: simplify Rx buffer recycle Kevin Laatz
                   ` (12 more replies)
  0 siblings, 13 replies; 28+ messages in thread
From: Kevin Laatz @ 2019-06-20  8:39 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson
  Cc: bpf, intel-wired-lan, bruce.richardson, ciara.loftus, Kevin Laatz

This patchset adds the ability to use unaligned chunks in the XDP umem.

Currently, all chunk addresses passed to the umem are masked to be chunk
size aligned (default is 2k, max is PAGE_SIZE). This limits where we can
place chunks within the umem as well as limiting the packet sizes that are
supported.

The changes in this patchset remove these restrictions, allowing XDP to be
more flexible in where it can place a chunk within a umem. By relaxing where
the chunks can be placed, it allows us to use an arbitrary buffer size and
place that wherever we have a free address in the umem. These changes add the
ability to support jumbo frames and make it easy to integrate with other
existing frameworks that have their own memory management systems, such as
DPDK.
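
As a rough illustration only (not part of the diffs in this set),
registering a umem in unaligned chunk mode with the new uapi looks
roughly like the sketch below. It assumes headers from this patchset
(the new 'flags' field in xdp_umem_reg); the AF_XDP/SOL_XDP fallback
defines, the 16MB area and the 3000-byte chunk size are arbitrary
choices for the sketch:

  #include <stdlib.h>
  #include <string.h>
  #include <sys/mman.h>
  #include <sys/socket.h>
  #include <linux/if_xdp.h>

  #ifndef AF_XDP
  #define AF_XDP 44
  #endif
  #ifndef SOL_XDP
  #define SOL_XDP 283
  #endif
  #ifndef XDP_UMEM_UNALIGNED_CHUNKS
  #define XDP_UMEM_UNALIGNED_CHUNKS (1 << 0) /* added in patch 6 */
  #endif

  int main(void)
  {
          struct xdp_umem_reg mr;
          __u64 size = 1ULL << 24; /* 16MB umem */
          void *area;
          int fd;

          fd = socket(AF_XDP, SOCK_RAW, 0);
          if (fd < 0)
                  exit(EXIT_FAILURE);

          /* hugepages (MAP_HUGETLB) are preferable here, see patch 10 */
          area = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (area == MAP_FAILED)
                  exit(EXIT_FAILURE);

          memset(&mr, 0, sizeof(mr));
          mr.addr = (__u64)(unsigned long)area;
          mr.len = size;
          mr.chunk_size = 3000; /* need not be a power of 2 any more */
          mr.headroom = 0;      /* unaligned mode expects no headroom */
          mr.flags = XDP_UMEM_UNALIGNED_CHUNKS;

          if (setsockopt(fd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr)))
                  exit(EXIT_FAILURE);

          return 0;
  }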

Structure of the patchset:
Patch 1:
  - Remove unnecessary masking and headroom addition during zero-copy Rx
    buffer recycling in i40e. This change is required in order for the
    buffer recycling to work in the unaligned chunk mode.

Patch 2:
  - Remove unnecessary masking and headroom addition during
    zero-copy Rx buffer recycling in ixgbe. This change is required in
    order for the buffer recycling to work in the unaligned chunk mode.

Patch 3:
  - Adds an offset parameter to zero_copy_allocator. This change will
    enable us to calculate the original handle in zca_free. This will be
    required for unaligned chunk mode since we can't easily mask back to
    the original handle.

Patch 4:
  - Adds the offset parameter to i40e_zca_free. This change is needed for
    calculating the handle since we can't easily mask back to the original
    handle like we can in the aligned case.

Patch 5:
  - Adds the offset parameter to ixgbe_zca_free. This change is needed for
    calculating the handle since we can't easily mask back to the original
    handle like we can in the aligned case.


Patch 6:
  - Add infrastructure for unaligned chunks. Since we are dealing
    with unaligned chunks that could potentially cross a physical page
    boundary, we add checks to keep track of that information. We can
    later use this information to correctly handle buffers that are
    placed at an address where they cross a page boundary.

Patch 7:
  - Add flags for umem configuration to libbpf

Patch 8:
  - Modify xdpsock application to add a command line option for
    unaligned chunks

Patch 9:
  - Add a command line argument to pass in a desired buffer size, along
    with buffer recycling for unaligned mode. Passing in a buffer size
    allows the application to use buffers of that size with the unaligned
    chunk mode. Since we are now using unaligned chunks, we need to
    recycle our buffers in a slightly different way.

Patch 10:
  - Adds hugepage support to the xdpsock application

Patch 11:
  - Documentation update to include the unaligned chunk scenario. We need
    to explicitly state that the incoming addresses are only masked in the
    aligned chunk mode and not the unaligned chunk mode.

Kevin Laatz (11):
  i40e: simplify Rx buffer recycle
  ixgbe: simplify Rx buffer recycle
  xdp: add offset param to zero_copy_allocator
  i40e: add offset to zca_free
  ixgbe: add offset to zca_free
  xsk: add support to allow unaligned chunk placement
  libbpf: add flags to umem config
  samples/bpf: add unaligned chunks mode support to xdpsock
  samples/bpf: add buffer recycling for unaligned chunks to xdpsock
  samples/bpf: use hugepages in xdpsock app
  doc/af_xdp: include unaligned chunk case

 Documentation/networking/af_xdp.rst           | 10 +-
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 21 ++--
 drivers/net/ethernet/intel/i40e/i40e_xsk.h    |  3 +-
 .../ethernet/intel/ixgbe/ixgbe_txrx_common.h  |  3 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 21 ++--
 include/net/xdp.h                             |  3 +-
 include/net/xdp_sock.h                        |  2 +
 include/uapi/linux/if_xdp.h                   |  4 +
 net/core/xdp.c                                | 11 ++-
 net/xdp/xdp_umem.c                            | 17 ++--
 net/xdp/xsk.c                                 | 60 +++++++++--
 net/xdp/xsk_queue.h                           | 60 +++++++++--
 samples/bpf/xdpsock_user.c                    | 99 ++++++++++++++-----
 tools/include/uapi/linux/if_xdp.h             |  4 +
 tools/lib/bpf/xsk.c                           |  7 ++
 tools/lib/bpf/xsk.h                           |  2 +
 16 files changed, 241 insertions(+), 86 deletions(-)

-- 
2.17.1



* [PATCH 01/11] i40e: simplify Rx buffer recycle
  2019-06-20  8:39 [PATCH 00/11] XDP unaligned chunk placement support Kevin Laatz
@ 2019-06-20  8:39 ` Kevin Laatz
  2019-06-20  8:39 ` [PATCH 02/11] ixgbe: " Kevin Laatz
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Kevin Laatz @ 2019-06-20  8:39 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson
  Cc: bpf, intel-wired-lan, bruce.richardson, ciara.loftus, Kevin Laatz

Currently, the dma, addr and handle are modified when we reuse Rx buffers
in zero-copy mode. However, this is not required as the inputs to the
function are copies, not the original values themselves. As we use the
copies within the function, we can use the original 'old_bi' values
directly without having to mask and add the headroom.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 1b17486543ac..c89e692e8663 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -419,8 +419,6 @@ static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring,
 				    struct i40e_rx_buffer *old_bi)
 {
 	struct i40e_rx_buffer *new_bi = &rx_ring->rx_bi[rx_ring->next_to_alloc];
-	unsigned long mask = (unsigned long)rx_ring->xsk_umem->chunk_mask;
-	u64 hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
 	u16 nta = rx_ring->next_to_alloc;
 
 	/* update, and store next to alloc */
@@ -428,14 +426,9 @@ static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring,
 	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
 	/* transfer page from old buffer to new buffer */
-	new_bi->dma = old_bi->dma & mask;
-	new_bi->dma += hr;
-
-	new_bi->addr = (void *)((unsigned long)old_bi->addr & mask);
-	new_bi->addr += hr;
-
-	new_bi->handle = old_bi->handle & mask;
-	new_bi->handle += rx_ring->xsk_umem->headroom;
+	new_bi->dma = old_bi->dma;
+	new_bi->addr = old_bi->addr;
+	new_bi->handle = old_bi->handle;
 
 	old_bi->addr = NULL;
 }
-- 
2.17.1



* [PATCH 02/11] ixgbe: simplify Rx buffer recycle
  2019-06-20  8:39 [PATCH 00/11] XDP unaligned chunk placement support Kevin Laatz
  2019-06-20  8:39 ` [PATCH 01/11] i40e: simplify Rx buffer recycle Kevin Laatz
@ 2019-06-20  8:39 ` Kevin Laatz
  2019-06-20  8:39 ` [PATCH 03/11] xdp: add offset param to zero_copy_allocator Kevin Laatz
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Kevin Laatz @ 2019-06-20  8:39 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson
  Cc: bpf, intel-wired-lan, bruce.richardson, ciara.loftus, Kevin Laatz

Currently, the dma, addr and handle are modified when we reuse Rx buffers
in zero-copy mode. However, this is not required as the inputs to the
function are copies, not the original values themselves. As we use the
copies within the function, we can use the original 'obi' values
directly without having to mask and add the headroom.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index bfe95ce0bd7f..49536adafe8e 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -251,8 +251,6 @@ ixgbe_rx_buffer *ixgbe_get_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 static void ixgbe_reuse_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 				     struct ixgbe_rx_buffer *obi)
 {
-	unsigned long mask = (unsigned long)rx_ring->xsk_umem->chunk_mask;
-	u64 hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
 	u16 nta = rx_ring->next_to_alloc;
 	struct ixgbe_rx_buffer *nbi;
 
@@ -262,14 +260,9 @@ static void ixgbe_reuse_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
 	/* transfer page from old buffer to new buffer */
-	nbi->dma = obi->dma & mask;
-	nbi->dma += hr;
-
-	nbi->addr = (void *)((unsigned long)obi->addr & mask);
-	nbi->addr += hr;
-
-	nbi->handle = obi->handle & mask;
-	nbi->handle += rx_ring->xsk_umem->headroom;
+	nbi->dma = obi->dma;
+	nbi->addr = obi->addr;
+	nbi->handle = obi->handle;
 
 	obi->addr = NULL;
 	obi->skb = NULL;
-- 
2.17.1



* [PATCH 03/11] xdp: add offset param to zero_copy_allocator
  2019-06-20  8:39 [PATCH 00/11] XDP unaligned chunk placement support Kevin Laatz
  2019-06-20  8:39 ` [PATCH 01/11] i40e: simplify Rx buffer recycle Kevin Laatz
  2019-06-20  8:39 ` [PATCH 02/11] ixgbe: " Kevin Laatz
@ 2019-06-20  8:39 ` Kevin Laatz
  2019-06-20  8:39 ` [PATCH 04/11] i40e: add offset to zca_free Kevin Laatz
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Kevin Laatz @ 2019-06-20  8:39 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson
  Cc: bpf, intel-wired-lan, bruce.richardson, ciara.loftus, Kevin Laatz

This patch adds an offset parameter for zero_copy_allocator.

This change is required for the unaligned chunk mode which will come later
in this patch set. The offset parameter is required for calculating the
original handle in unaligned mode since we can't easily mask back to it
like in the aligned case.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 include/net/xdp.h |  3 ++-
 net/core/xdp.c    | 11 ++++++-----
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 0f25b3675c5c..ea801fd2bf98 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -53,7 +53,8 @@ struct xdp_mem_info {
 struct page_pool;
 
 struct zero_copy_allocator {
-	void (*free)(struct zero_copy_allocator *zca, unsigned long handle);
+	void (*free)(struct zero_copy_allocator *zca, unsigned long handle,
+			off_t off);
 };
 
 struct xdp_rxq_info {
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 4b2b194f4f1f..a77a7162d213 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -322,7 +322,7 @@ EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model);
  * of xdp_frames/pages in those cases.
  */
 static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
-			 unsigned long handle)
+			 unsigned long handle, off_t off)
 {
 	struct xdp_mem_allocator *xa;
 	struct page *page;
@@ -353,7 +353,7 @@ static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
 		rcu_read_lock();
 		/* mem->id is valid, checked in xdp_rxq_info_reg_mem_model() */
 		xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
-		xa->zc_alloc->free(xa->zc_alloc, handle);
+		xa->zc_alloc->free(xa->zc_alloc, handle, off);
 		rcu_read_unlock();
 	default:
 		/* Not possible, checked in xdp_rxq_info_reg_mem_model() */
@@ -363,19 +363,20 @@ static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
 
 void xdp_return_frame(struct xdp_frame *xdpf)
 {
-	__xdp_return(xdpf->data, &xdpf->mem, false, 0);
+	__xdp_return(xdpf->data, &xdpf->mem, false, 0, 0);
 }
 EXPORT_SYMBOL_GPL(xdp_return_frame);
 
 void xdp_return_frame_rx_napi(struct xdp_frame *xdpf)
 {
-	__xdp_return(xdpf->data, &xdpf->mem, true, 0);
+	__xdp_return(xdpf->data, &xdpf->mem, true, 0, 0);
 }
 EXPORT_SYMBOL_GPL(xdp_return_frame_rx_napi);
 
 void xdp_return_buff(struct xdp_buff *xdp)
 {
-	__xdp_return(xdp->data, &xdp->rxq->mem, true, xdp->handle);
+	__xdp_return(xdp->data, &xdp->rxq->mem, true, xdp->handle,
+			xdp->data - xdp->data_hard_start);
 }
 EXPORT_SYMBOL_GPL(xdp_return_buff);
 
-- 
2.17.1



* [PATCH 04/11] i40e: add offset to zca_free
  2019-06-20  8:39 [PATCH 00/11] XDP unaligned chunk placement support Kevin Laatz
                   ` (2 preceding siblings ...)
  2019-06-20  8:39 ` [PATCH 03/11] xdp: add offset param to zero_copy_allocator Kevin Laatz
@ 2019-06-20  8:39 ` Kevin Laatz
  2019-06-20  8:39 ` [PATCH 05/11] ixgbe: " Kevin Laatz
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Kevin Laatz @ 2019-06-20  8:39 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson
  Cc: bpf, intel-wired-lan, bruce.richardson, ciara.loftus, Kevin Laatz

This patch adds the offset param from zero_copy_allocator to
i40e_zca_free. This change is required to calculate the handle; without
the offset, this function will not work in unaligned chunk mode since we
can't easily mask back to the original handle.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 8 ++++----
 drivers/net/ethernet/intel/i40e/i40e_xsk.h | 3 ++-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index c89e692e8663..8c281f356293 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -438,16 +438,16 @@ static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring,
  * @alloc: Zero-copy allocator
  * @handle: Buffer handle
  **/
-void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
+void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle,
+		off_t off)
 {
 	struct i40e_rx_buffer *bi;
 	struct i40e_ring *rx_ring;
-	u64 hr, mask;
+	u64 hr;
 	u16 nta;
 
 	rx_ring = container_of(alloc, struct i40e_ring, zca);
 	hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
-	mask = rx_ring->xsk_umem->chunk_mask;
 
 	nta = rx_ring->next_to_alloc;
 	bi = &rx_ring->rx_bi[nta];
@@ -455,7 +455,7 @@ void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
 	nta++;
 	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
-	handle &= mask;
+	handle -= off;
 
 	bi->dma = xdp_umem_get_dma(rx_ring->xsk_umem, handle);
 	bi->dma += hr;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.h b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
index 8cc0a2e7d9a2..85691dc9ac42 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
@@ -12,7 +12,8 @@ int i40e_queue_pair_disable(struct i40e_vsi *vsi, int queue_pair);
 int i40e_queue_pair_enable(struct i40e_vsi *vsi, int queue_pair);
 int i40e_xsk_umem_setup(struct i40e_vsi *vsi, struct xdp_umem *umem,
 			u16 qid);
-void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle);
+void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle,
+		off_t off);
 bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 cleaned_count);
 int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget);
 
-- 
2.17.1



* [PATCH 05/11] ixgbe: add offset to zca_free
  2019-06-20  8:39 [PATCH 00/11] XDP unaligned chunk placement support Kevin Laatz
                   ` (3 preceding siblings ...)
  2019-06-20  8:39 ` [PATCH 04/11] i40e: add offset to zca_free Kevin Laatz
@ 2019-06-20  8:39 ` Kevin Laatz
  2019-06-20  8:39 ` [PATCH 06/11] xsk: add support to allow unaligned chunk placement Kevin Laatz
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Kevin Laatz @ 2019-06-20  8:39 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson
  Cc: bpf, intel-wired-lan, bruce.richardson, ciara.loftus, Kevin Laatz

This patch adds the offset param from zero_copy_allocator to
ixgbe_zca_free. This change is required to calculate the handle; without
the offset, this function will not work in unaligned chunk mode since we
can't easily mask back to the original handle.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h | 3 ++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c         | 8 ++++----
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
index d93a690aff74..49702e2a4360 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
@@ -33,7 +33,8 @@ struct xdp_umem *ixgbe_xsk_umem(struct ixgbe_adapter *adapter,
 int ixgbe_xsk_umem_setup(struct ixgbe_adapter *adapter, struct xdp_umem *umem,
 			 u16 qid);
 
-void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle);
+void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle,
+		off_t off);
 
 void ixgbe_alloc_rx_buffers_zc(struct ixgbe_ring *rx_ring, u16 cleaned_count);
 int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index 49536adafe8e..1ec02077ccb2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -268,16 +268,16 @@ static void ixgbe_reuse_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 	obi->skb = NULL;
 }
 
-void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
+void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle,
+		off_t off)
 {
 	struct ixgbe_rx_buffer *bi;
 	struct ixgbe_ring *rx_ring;
-	u64 hr, mask;
+	u64 hr;
 	u16 nta;
 
 	rx_ring = container_of(alloc, struct ixgbe_ring, zca);
 	hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
-	mask = rx_ring->xsk_umem->chunk_mask;
 
 	nta = rx_ring->next_to_alloc;
 	bi = rx_ring->rx_buffer_info;
@@ -285,7 +285,7 @@ void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
 	nta++;
 	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
-	handle &= mask;
+	handle -= off;
 
 	bi->dma = xdp_umem_get_dma(rx_ring->xsk_umem, handle);
 	bi->dma += hr;
-- 
2.17.1



* [PATCH 06/11] xsk: add support to allow unaligned chunk placement
  2019-06-20  8:39 [PATCH 00/11] XDP unaligned chunk placement support Kevin Laatz
                   ` (4 preceding siblings ...)
  2019-06-20  8:39 ` [PATCH 05/11] ixgbe: " Kevin Laatz
@ 2019-06-20  8:39 ` Kevin Laatz
  2019-06-20  8:39 ` [PATCH 07/11] libbpf: add flags to umem config Kevin Laatz
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Kevin Laatz @ 2019-06-20  8:39 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson
  Cc: bpf, intel-wired-lan, bruce.richardson, ciara.loftus, Kevin Laatz

Currently, addresses are chunk size aligned. This means we are very
restricted in terms of where we can place chunks within the umem. For
example, if we have a chunk size of 2k, then our chunks can only be
placed at 0, 2k, 4k, 6k, 8k... and so on (i.e. every 2k starting from 0).

This patch introduces the ability to use unaligned chunks. With these
changes, we are no longer bound to placing chunks at 2k (or whatever the
chunk size is) intervals. Since we are no longer dealing with
aligned chunks, they can now cross page boundaries. Checks for page
contiguity have been added in order to keep track of which pages are
followed by a physically contiguous page.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
---
 include/net/xdp_sock.h      |  2 ++
 include/uapi/linux/if_xdp.h |  4 +++
 net/xdp/xdp_umem.c          | 17 +++++++----
 net/xdp/xsk.c               | 60 ++++++++++++++++++++++++++++++++-----
 net/xdp/xsk_queue.h         | 60 ++++++++++++++++++++++++++++++++-----
 5 files changed, 121 insertions(+), 22 deletions(-)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index ae0f368a62bb..c07977e271c4 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -19,6 +19,7 @@ struct xsk_queue;
 struct xdp_umem_page {
 	void *addr;
 	dma_addr_t dma;
+	bool next_pg_contig;
 };
 
 struct xdp_umem_fq_reuse {
@@ -48,6 +49,7 @@ struct xdp_umem {
 	bool zc;
 	spinlock_t xsk_list_lock;
 	struct list_head xsk_list;
+	u32 flags;
 };
 
 struct xdp_sock {
diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index caed8b1614ff..8548f2110a77 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -17,6 +17,9 @@
 #define XDP_COPY	(1 << 1) /* Force copy-mode */
 #define XDP_ZEROCOPY	(1 << 2) /* Force zero-copy mode */
 
+/* Flags for xsk_umem_config flags */
+#define XDP_UMEM_UNALIGNED_CHUNKS (1 << 0)
+
 struct sockaddr_xdp {
 	__u16 sxdp_family;
 	__u16 sxdp_flags;
@@ -52,6 +55,7 @@ struct xdp_umem_reg {
 	__u64 len; /* Length of packet data area */
 	__u32 chunk_size;
 	__u32 headroom;
+	__u32 flags;
 };
 
 struct xdp_statistics {
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 2b18223e7eb8..4d4782572855 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -301,12 +301,14 @@ static int xdp_umem_account_pages(struct xdp_umem *umem)
 
 static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 {
+	bool unaligned_chunks = mr->flags & XDP_UMEM_UNALIGNED_CHUNKS;
 	u32 chunk_size = mr->chunk_size, headroom = mr->headroom;
 	unsigned int chunks, chunks_per_page;
 	u64 addr = mr->addr, size = mr->len;
 	int size_chk, err, i;
 
-	if (chunk_size < XDP_UMEM_MIN_CHUNK_SIZE || chunk_size > PAGE_SIZE) {
+	if ((!unaligned_chunks && chunk_size < XDP_UMEM_MIN_CHUNK_SIZE) ||
+			chunk_size > PAGE_SIZE) {
 		/* Strictly speaking we could support this, if:
 		 * - huge pages, or*
 		 * - using an IOMMU, or
@@ -316,7 +318,7 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 		return -EINVAL;
 	}
 
-	if (!is_power_of_2(chunk_size))
+	if (!unaligned_chunks && !is_power_of_2(chunk_size))
 		return -EINVAL;
 
 	if (!PAGE_ALIGNED(addr)) {
@@ -333,9 +335,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 	if (chunks == 0)
 		return -EINVAL;
 
-	chunks_per_page = PAGE_SIZE / chunk_size;
-	if (chunks < chunks_per_page || chunks % chunks_per_page)
-		return -EINVAL;
+	if (!unaligned_chunks) {
+		chunks_per_page = PAGE_SIZE / chunk_size;
+		if (chunks < chunks_per_page || chunks % chunks_per_page)
+			return -EINVAL;
+	}
 
 	headroom = ALIGN(headroom, 64);
 
@@ -344,13 +348,14 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 		return -EINVAL;
 
 	umem->address = (unsigned long)addr;
-	umem->chunk_mask = ~((u64)chunk_size - 1);
+	umem->chunk_mask = unaligned_chunks ? U64_MAX : ~((u64)chunk_size - 1);
 	umem->size = size;
 	umem->headroom = headroom;
 	umem->chunk_size_nohr = chunk_size - headroom;
 	umem->npgs = size / PAGE_SIZE;
 	umem->pgs = NULL;
 	umem->user = NULL;
+	umem->flags = mr->flags;
 	INIT_LIST_HEAD(&umem->xsk_list);
 	spin_lock_init(&umem->xsk_list_lock);
 
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index a14e8864e4fa..dfae19942b70 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -39,7 +39,7 @@ bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs)
 
 u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
 {
-	return xskq_peek_addr(umem->fq, addr);
+	return xskq_peek_addr(umem->fq, addr, umem);
 }
 EXPORT_SYMBOL(xsk_umem_peek_addr);
 
@@ -49,14 +49,36 @@ void xsk_umem_discard_addr(struct xdp_umem *umem)
 }
 EXPORT_SYMBOL(xsk_umem_discard_addr);
 
+/* If a buffer crosses a page boundary, we need to do 2 memcpy's, one for
+ * each page. This is only required in copy mode.
+ */
+static void __xsk_rcv_memcpy(struct xdp_umem *umem, u64 addr, void *from_buf,
+		u32 len, u32 metalen)
+{
+	void *to_buf = xdp_umem_get_data(umem, addr);
+
+	if (xskq_crosses_non_contig_pg(umem, addr)) {
+		void *next_pg_addr = umem->pages[(addr >> PAGE_SHIFT) + 1].addr;
+		u64 page_start = addr & ~((u64)PAGE_SIZE - 1);
+		u64 first_len = PAGE_SIZE - (addr - page_start);
+
+		memcpy(to_buf, from_buf, first_len);
+		memcpy(next_pg_addr, from_buf + first_len, len + metalen - first_len);
+
+		return;
+	}
+
+	memcpy(to_buf, from_buf, len + metalen);
+}
+
 static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 {
-	void *to_buf, *from_buf;
+	void *from_buf;
 	u32 metalen;
 	u64 addr;
 	int err;
 
-	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
+	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
 	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
 		xs->rx_dropped++;
 		return -ENOSPC;
@@ -72,8 +94,8 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 		metalen = xdp->data - xdp->data_meta;
 	}
 
-	to_buf = xdp_umem_get_data(xs->umem, addr);
-	memcpy(to_buf, from_buf, len + metalen);
+	__xsk_rcv_memcpy(xs->umem, addr, from_buf, len, metalen);
+
 	addr += metalen;
 	err = xskq_produce_batch_desc(xs->rx, addr, len);
 	if (!err) {
@@ -126,7 +148,7 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 	if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
 		return -EINVAL;
 
-	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
+	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
 	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
 		xs->rx_dropped++;
 		return -ENOSPC;
@@ -173,7 +195,7 @@ bool xsk_umem_consume_tx(struct xdp_umem *umem, dma_addr_t *dma, u32 *len)
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(xs, &umem->xsk_list, list) {
-		if (!xskq_peek_desc(xs->tx, &desc))
+		if (!xskq_peek_desc(xs->tx, &desc, umem))
 			continue;
 
 		if (xskq_produce_addr_lazy(umem->cq, desc.addr))
@@ -226,7 +248,7 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
 
 	mutex_lock(&xs->mutex);
 
-	while (xskq_peek_desc(xs->tx, &desc)) {
+	while (xskq_peek_desc(xs->tx, &desc, xs->umem)) {
 		char *buffer;
 		u64 addr;
 		u32 len;
@@ -393,6 +415,26 @@ static struct socket *xsk_lookup_xsk_from_fd(int fd)
 	return sock;
 }
 
+/* Check if umem pages are contiguous.
+ * In zero-copy mode, use the DMA address to do the page contiguity check.
+ * For all other modes, use addr (the kernel virtual address).
+ */
+static void xsk_check_page_contiguity(struct xdp_umem *umem, u32 flags)
+{
+	int i;
+
+	if (flags & XDP_ZEROCOPY) {
+		for (i = 0; i < umem->npgs - 1; i++)
+			umem->pages[i].next_pg_contig =
+					(umem->pages[i].dma + PAGE_SIZE == umem->pages[i + 1].dma);
+		return;
+	}
+
+	for (i = 0; i < umem->npgs - 1; i++)
+		umem->pages[i].next_pg_contig =
+				(umem->pages[i].addr + PAGE_SIZE == umem->pages[i + 1].addr);
+}
+
 static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 {
 	struct sockaddr_xdp *sxdp = (struct sockaddr_xdp *)addr;
@@ -480,6 +522,8 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 		err = xdp_umem_assign_dev(xs->umem, dev, qid, flags);
 		if (err)
 			goto out_unlock;
+
+		xsk_check_page_contiguity(xs->umem, flags);
 	}
 
 	xs->dev = dev;
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 88b9ae24658d..c2676a1df938 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -119,6 +119,14 @@ static inline u32 xskq_nb_free(struct xsk_queue *q, u32 producer, u32 dcnt)
 
 /* UMEM queue */
 
+static inline bool xskq_crosses_non_contig_pg(struct xdp_umem *umem, u64 addr)
+{
+	bool cross_pg = (addr & (PAGE_SIZE - 1)) + umem->chunk_size_nohr > PAGE_SIZE;
+	bool next_pg_contig = umem->pages[(addr >> PAGE_SHIFT) + 1].next_pg_contig;
+
+	return cross_pg && !next_pg_contig;
+}
+
 static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
 {
 	if (addr >= q->size) {
@@ -129,23 +137,44 @@ static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
 	return true;
 }
 
-static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr)
+static inline bool xskq_is_valid_addr_unaligned(struct xsk_queue *q, u64 addr,
+		struct xdp_umem *umem)
+{
+	if (addr >= q->size ||
+			xskq_crosses_non_contig_pg(umem, addr)) {
+		q->invalid_descs++;
+		return false;
+	}
+
+	return true;
+}
+
+static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr,
+		struct xdp_umem *umem)
 {
 	while (q->cons_tail != q->cons_head) {
 		struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
 		unsigned int idx = q->cons_tail & q->ring_mask;
 
 		*addr = READ_ONCE(ring->desc[idx]) & q->chunk_mask;
+		if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS) {
+			if (xskq_is_valid_addr_unaligned(q, *addr, umem))
+				return addr;
+			goto out;
+		}
+
 		if (xskq_is_valid_addr(q, *addr))
 			return addr;
 
+out:
 		q->cons_tail++;
 	}
 
 	return NULL;
 }
 
-static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
+static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr,
+		struct xdp_umem *umem)
 {
 	if (q->cons_tail == q->cons_head) {
 		smp_mb(); /* D, matches A */
@@ -156,7 +185,7 @@ static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
 		smp_rmb();
 	}
 
-	return xskq_validate_addr(q, addr);
+	return xskq_validate_addr(q, addr, umem);
 }
 
 static inline void xskq_discard_addr(struct xsk_queue *q)
@@ -215,8 +244,21 @@ static inline int xskq_reserve_addr(struct xsk_queue *q)
 
 /* Rx/Tx queue */
 
-static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
+static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d,
+						   struct xdp_umem *umem)
 {
+	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS) {
+		if (!xskq_is_valid_addr_unaligned(q, d->addr, umem))
+			return false;
+
+		if ((d->len > umem->chunk_size_nohr) || d->options) {
+			q->invalid_descs++;
+			return false;
+		}
+
+		return true;
+	}
+
 	if (!xskq_is_valid_addr(q, d->addr))
 		return false;
 
@@ -230,14 +272,15 @@ static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
 }
 
 static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
-						  struct xdp_desc *desc)
+						  struct xdp_desc *desc,
+						  struct xdp_umem *umem)
 {
 	while (q->cons_tail != q->cons_head) {
 		struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
 		unsigned int idx = q->cons_tail & q->ring_mask;
 
 		*desc = READ_ONCE(ring->desc[idx]);
-		if (xskq_is_valid_desc(q, desc))
+		if (xskq_is_valid_desc(q, desc, umem))
 			return desc;
 
 		q->cons_tail++;
@@ -247,7 +290,8 @@ static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
 }
 
 static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
-					      struct xdp_desc *desc)
+					      struct xdp_desc *desc,
+					      struct xdp_umem *umem)
 {
 	if (q->cons_tail == q->cons_head) {
 		smp_mb(); /* D, matches A */
@@ -258,7 +302,7 @@ static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
 		smp_rmb(); /* C, matches B */
 	}
 
-	return xskq_validate_desc(q, desc);
+	return xskq_validate_desc(q, desc, umem);
 }
 
 static inline void xskq_discard_desc(struct xsk_queue *q)
-- 
2.17.1



* [PATCH 07/11] libbpf: add flags to umem config
  2019-06-20  8:39 [PATCH 00/11] XDP unaligned chunk placement support Kevin Laatz
                   ` (5 preceding siblings ...)
  2019-06-20  8:39 ` [PATCH 06/11] xsk: add support to allow unaligned chunk placement Kevin Laatz
@ 2019-06-20  8:39 ` Kevin Laatz
  2019-06-20  8:39 ` [PATCH 08/11] samples/bpf: add unaligned chunks mode support to xdpsock Kevin Laatz
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Kevin Laatz @ 2019-06-20  8:39 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson
  Cc: bpf, intel-wired-lan, bruce.richardson, ciara.loftus, Kevin Laatz

This patch adds a 'flags' field to the umem_config and umem_reg structs.
This will allow for more options to be added for configuring umems.

The first use for the flags field is to add a flag for unaligned chunks
mode. These flags can either be user-provided or filled with a default.
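
As a usage sketch only (the helper name and the 3000-byte frame size
are made up for illustration; buffer allocation and error handling are
elided), an application opts in to unaligned chunks like so:

  #include <linux/if_xdp.h>
  #include "xsk.h" /* tools/lib/bpf/xsk.h */

  static struct xsk_umem *create_unaligned_umem(void *buffer, __u64 size,
                                                struct xsk_ring_prod *fq,
                                                struct xsk_ring_cons *cq)
  {
          struct xsk_umem_config cfg = {
                  .fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
                  .comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
                  .frame_size = 3000,  /* no power-of-2 requirement */
                  .frame_headroom = 0, /* must be 0 in unaligned mode */
                  .flags = XDP_UMEM_UNALIGNED_CHUNKS,
          };
          struct xsk_umem *umem;

          /* buffer: a page-aligned, size-byte area, e.g. from mmap() */
          if (xsk_umem__create(&umem, buffer, size, fq, cq, &cfg))
                  return NULL;
          return umem;
  }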

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
---
 tools/include/uapi/linux/if_xdp.h | 4 ++++
 tools/lib/bpf/xsk.c               | 7 +++++++
 tools/lib/bpf/xsk.h               | 2 ++
 3 files changed, 13 insertions(+)

diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h
index caed8b1614ff..8548f2110a77 100644
--- a/tools/include/uapi/linux/if_xdp.h
+++ b/tools/include/uapi/linux/if_xdp.h
@@ -17,6 +17,9 @@
 #define XDP_COPY	(1 << 1) /* Force copy-mode */
 #define XDP_ZEROCOPY	(1 << 2) /* Force zero-copy mode */
 
+/* Flags for xsk_umem_config flags */
+#define XDP_UMEM_UNALIGNED_CHUNKS (1 << 0)
+
 struct sockaddr_xdp {
 	__u16 sxdp_family;
 	__u16 sxdp_flags;
@@ -52,6 +55,7 @@ struct xdp_umem_reg {
 	__u64 len; /* Length of packet data area */
 	__u32 chunk_size;
 	__u32 headroom;
+	__u32 flags;
 };
 
 struct xdp_statistics {
diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index 7ef6293b4fd7..df4207d4ff4a 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -115,6 +115,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg,
 		cfg->comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
 		cfg->frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
 		cfg->frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM;
+		cfg->flags = XSK_UMEM__DEFAULT_FLAGS;
 		return;
 	}
 
@@ -122,6 +123,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg,
 	cfg->comp_size = usr_cfg->comp_size;
 	cfg->frame_size = usr_cfg->frame_size;
 	cfg->frame_headroom = usr_cfg->frame_headroom;
+	cfg->flags = usr_cfg->flags;
 }
 
 static int xsk_set_xdp_socket_config(struct xsk_socket_config *cfg,
@@ -181,6 +183,11 @@ int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
 	mr.len = size;
 	mr.chunk_size = umem->config.frame_size;
 	mr.headroom = umem->config.frame_headroom;
+	mr.flags = umem->config.flags;
+
+	/* Headroom must be 0 for unaligned chunks */
+	if ((mr.flags & XDP_UMEM_UNALIGNED_CHUNKS) && mr.headroom != 0)
+		return -EINVAL;
 
 	err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr));
 	if (err) {
diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
index 82ea71a0f3ec..8d393873b70f 100644
--- a/tools/lib/bpf/xsk.h
+++ b/tools/lib/bpf/xsk.h
@@ -170,12 +170,14 @@ LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
 #define XSK_UMEM__DEFAULT_FRAME_SHIFT    11 /* 2048 bytes */
 #define XSK_UMEM__DEFAULT_FRAME_SIZE     (1 << XSK_UMEM__DEFAULT_FRAME_SHIFT)
 #define XSK_UMEM__DEFAULT_FRAME_HEADROOM 0
+#define XSK_UMEM__DEFAULT_FLAGS 0
 
 struct xsk_umem_config {
 	__u32 fill_size;
 	__u32 comp_size;
 	__u32 frame_size;
 	__u32 frame_headroom;
+	__u32 flags;
 };
 
 /* Flags for the libbpf_flags field. */
-- 
2.17.1



* [PATCH 08/11] samples/bpf: add unaligned chunks mode support to xdpsock
  2019-06-20  8:39 [PATCH 00/11] XDP unaligned chunk placement support Kevin Laatz
                   ` (6 preceding siblings ...)
  2019-06-20  8:39 ` [PATCH 07/11] libbpf: add flags to umem config Kevin Laatz
@ 2019-06-20  8:39 ` Kevin Laatz
  2019-06-20  8:39 ` [PATCH 09/11] samples/bpf: add buffer recycling for unaligned chunks " Kevin Laatz
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Kevin Laatz @ 2019-06-20  8:39 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson
  Cc: bpf, intel-wired-lan, bruce.richardson, ciara.loftus, Kevin Laatz

This patch adds support for the unaligned chunks mode. The addition of the
unaligned chunks option will allow users to run the application with more
relaxed chunk placement in the XDP umem.

Unaligned chunks mode can be used with the '-u' or '--unaligned' command
line options.
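
An illustrative invocation (interface and queue are placeholders):

  ./xdpsock -i eth0 -q 0 -r -u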

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
---
 samples/bpf/xdpsock_user.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index d08ee1ab7bb4..e26f43382d01 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -67,6 +67,8 @@ static int opt_ifindex;
 static int opt_queue;
 static int opt_poll;
 static int opt_interval = 1;
+static u32 opt_umem_flags;
+static int opt_unaligned_chunks;
 static u32 opt_xdp_bind_flags;
 static __u32 prog_id;
 
@@ -276,14 +278,21 @@ static size_t gen_eth_frame(struct xsk_umem_info *umem, u64 addr)
 static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
 {
 	struct xsk_umem_info *umem;
+	struct xsk_umem_config umem_cfg;
 	int ret;
 
 	umem = calloc(1, sizeof(*umem));
 	if (!umem)
 		exit_with_error(errno);
 
+	umem_cfg.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS;
+	umem_cfg.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
+	umem_cfg.frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
+	umem_cfg.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM;
+	umem_cfg.flags = opt_umem_flags;
+
 	ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
-			       NULL);
+			       &umem_cfg);
 	if (ret)
 		exit_with_error(-ret);
 
@@ -346,6 +355,7 @@ static struct option long_options[] = {
 	{"interval", required_argument, 0, 'n'},
 	{"zero-copy", no_argument, 0, 'z'},
 	{"copy", no_argument, 0, 'c'},
+	{"unaligned", no_argument, 0, 'u'},
 	{0, 0, 0, 0}
 };
 
@@ -365,6 +375,7 @@ static void usage(const char *prog)
 		"  -n, --interval=n	Specify statistics update interval (default 1 sec).\n"
 		"  -z, --zero-copy      Force zero-copy mode.\n"
 		"  -c, --copy           Force copy mode.\n"
+		"  -u, --unaligned	Enable unaligned chunk placement\n"
 		"\n";
 	fprintf(stderr, str, prog);
 	exit(EXIT_FAILURE);
@@ -377,7 +388,7 @@ static void parse_command_line(int argc, char **argv)
 	opterr = 0;
 
 	for (;;) {
-		c = getopt_long(argc, argv, "Frtli:q:psSNn:cz", long_options,
+		c = getopt_long(argc, argv, "Frtli:q:psSNn:czu", long_options,
 				&option_index);
 		if (c == -1)
 			break;
@@ -417,9 +428,14 @@ static void parse_command_line(int argc, char **argv)
 		case 'c':
 			opt_xdp_bind_flags |= XDP_COPY;
 			break;
+		case 'u':
+			opt_umem_flags |= XDP_UMEM_UNALIGNED_CHUNKS;
+			opt_unaligned_chunks = 1;
+			break;
 		case 'F':
 			opt_xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
 			break;
+
 		default:
 			usage(basename(argv[0]));
 		}
-- 
2.17.1



* [PATCH 09/11] samples/bpf: add buffer recycling for unaligned chunks to xdpsock
  2019-06-20  8:39 [PATCH 00/11] XDP unaligned chunk placement support Kevin Laatz
                   ` (7 preceding siblings ...)
  2019-06-20  8:39 ` [PATCH 08/11] samples/bpf: add unaligned chunks mode support to xdpsock Kevin Laatz
@ 2019-06-20  8:39 ` Kevin Laatz
  2019-06-20  8:39 ` [PATCH 10/11] samples/bpf: use hugepages in xdpsock app Kevin Laatz
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Kevin Laatz @ 2019-06-20  8:39 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson
  Cc: bpf, intel-wired-lan, bruce.richardson, ciara.loftus, Kevin Laatz

This patch adds buffer recycling support for unaligned buffers. Since we
don't mask the addr to 2k at umem_reg in unaligned mode, we need to make
sure we give back the correct, original addr to the fill queue. To do
this, we mask the addr with the buffer size.

To pass in a buffer size, use the --buf-size=n argument.
NOTE: For xdpsock to work in aligned chunk mode, you still need to pass
a power-of-2 buffer size.
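
As a sketch of the recycling step (assuming, as xdpsock does, that
buffers are laid out at multiples of a power-of-2 buffer size):

  #include <stdint.h>

  /* Mask an addr taken from the rx/completion ring (which may carry
   * a headroom/metadata offset) back to the start of its buffer,
   * e.g. with a 4096-byte buffer size, 0x3100 masks back to 0x3000.
   */
  static uint64_t buf_start(uint64_t addr, uint64_t buf_size)
  {
          return addr & ~(buf_size - 1);
  }

An illustrative invocation combining both options:

  ./xdpsock -i eth0 -q 0 -l -u -b 4096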

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 samples/bpf/xdpsock_user.c | 71 +++++++++++++++++++++++++++-----------
 1 file changed, 51 insertions(+), 20 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index e26f43382d01..7b4ce047deb2 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -60,6 +60,10 @@ enum benchmark_type {
 	BENCH_L2FWD = 2,
 };
 
+#define LENGTH (256UL*1024*1024)
+#define ADDR (void *)(0x0UL)
+#define SHMAT_FLAGS (0)
+
 static enum benchmark_type opt_bench = BENCH_RXDROP;
 static u32 opt_xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
 static const char *opt_if = "";
@@ -67,6 +71,7 @@ static int opt_ifindex;
 static int opt_queue;
 static int opt_poll;
 static int opt_interval = 1;
+static u64 opt_buffer_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
 static u32 opt_umem_flags;
 static int opt_unaligned_chunks;
 static u32 opt_xdp_bind_flags;
@@ -287,7 +292,7 @@ static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
 
 	umem_cfg.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS;
 	umem_cfg.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
-	umem_cfg.frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
+	umem_cfg.frame_size = opt_buffer_size;
 	umem_cfg.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM;
 	umem_cfg.flags = opt_umem_flags;
 
@@ -334,8 +339,8 @@ static struct xsk_socket_info *xsk_configure_socket(struct xsk_umem_info *umem)
 		exit_with_error(-ret);
 	for (i = 0;
 	     i < XSK_RING_PROD__DEFAULT_NUM_DESCS *
-		     XSK_UMEM__DEFAULT_FRAME_SIZE;
-	     i += XSK_UMEM__DEFAULT_FRAME_SIZE)
+		     opt_buffer_size;
+	     i += opt_buffer_size)
 		*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx++) = i;
 	xsk_ring_prod__submit(&xsk->umem->fq,
 			      XSK_RING_PROD__DEFAULT_NUM_DESCS);
@@ -356,6 +361,7 @@ static struct option long_options[] = {
 	{"zero-copy", no_argument, 0, 'z'},
 	{"copy", no_argument, 0, 'c'},
 	{"unaligned", no_argument, 0, 'u'},
+	{"buf-size", required_argument, 0, 'b'},
 	{0, 0, 0, 0}
 };
 
@@ -376,6 +382,7 @@ static void usage(const char *prog)
 		"  -z, --zero-copy      Force zero-copy mode.\n"
 		"  -c, --copy           Force copy mode.\n"
 		"  -u, --unaligned	Enable unaligned chunk placement\n"
+		"  -b, --buf-size=n	Specify the buffer size to use\n"
 		"\n";
 	fprintf(stderr, str, prog);
 	exit(EXIT_FAILURE);
@@ -388,7 +395,7 @@ static void parse_command_line(int argc, char **argv)
 	opterr = 0;
 
 	for (;;) {
-		c = getopt_long(argc, argv, "Frtli:q:psSNn:czu", long_options,
+		c = getopt_long(argc, argv, "Frtli:q:psSNn:czub:", long_options,
 				&option_index);
 		if (c == -1)
 			break;
@@ -432,6 +439,9 @@ static void parse_command_line(int argc, char **argv)
 			opt_umem_flags |= XDP_UMEM_UNALIGNED_CHUNKS;
 			opt_unaligned_chunks = 1;
 			break;
+		case 'b':
+			opt_buffer_size = atoi(optarg);
+			break;
 		case 'F':
 			opt_xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
 			break;
@@ -483,13 +493,22 @@ static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk)
 		while (ret != rcvd) {
 			if (ret < 0)
 				exit_with_error(-ret);
-			ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd,
-						     &idx_fq);
+			ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
+		}
+
+		if (opt_umem_flags & XDP_UMEM_UNALIGNED_CHUNKS) {
+			for (i = 0; i < rcvd; i++) {
+				u64 comp_addr = *xsk_ring_cons__comp_addr(&xsk->umem->cq,
+						idx_cq++);
+				u64 masked_comp = (comp_addr & ~((u64)opt_buffer_size-1));
+				*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) =
+						masked_comp;
+			}
+		} else {
+			for (i = 0; i < rcvd; i++)
+				*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) =
+						*xsk_ring_cons__comp_addr(&xsk->umem->cq, idx_cq++);
 		}
-		for (i = 0; i < rcvd; i++)
-			*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) =
-				*xsk_ring_cons__comp_addr(&xsk->umem->cq,
-							  idx_cq++);
 
 		xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
 		xsk_ring_cons__release(&xsk->umem->cq, rcvd);
@@ -533,13 +552,25 @@ static void rx_drop(struct xsk_socket_info *xsk)
 		ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
 	}
 
-	for (i = 0; i < rcvd; i++) {
-		u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
-		u32 len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++)->len;
-		char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
+	if (opt_umem_flags & XDP_UMEM_UNALIGNED_CHUNKS) {
+		for (i = 0; i < rcvd; i++) {
+			u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
+			u32 len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++)->len;
+			char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
+			u64 masked_addr = (addr & ~((u64)opt_buffer_size-1));
+
+			hex_dump(pkt, len, addr);
+			*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = masked_addr;
+		}
+	} else {
+		for (i = 0; i < rcvd; i++) {
+			u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
+			u32 len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++)->len;
+			char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
 
-		hex_dump(pkt, len, addr);
-		*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = addr;
+			hex_dump(pkt, len, addr);
+			*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = addr;
+		}
 	}
 
 	xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
@@ -677,20 +708,20 @@ int main(int argc, char **argv)
 	}
 
 	ret = posix_memalign(&bufs, getpagesize(), /* PAGE_SIZE aligned */
-			     NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE);
+			     NUM_FRAMES * opt_buffer_size);
 	if (ret)
 		exit_with_error(ret);
 
        /* Create sockets... */
 	umem = xsk_configure_umem(bufs,
-				  NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE);
+				  NUM_FRAMES * opt_buffer_size);
 	xsks[num_socks++] = xsk_configure_socket(umem);
 
 	if (opt_bench == BENCH_TXONLY) {
 		int i;
 
-		for (i = 0; i < NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE;
-		     i += XSK_UMEM__DEFAULT_FRAME_SIZE)
+		for (i = 0; i < NUM_FRAMES * opt_buffer_size;
+		     i += opt_buffer_size)
 			(void)gen_eth_frame(umem, i);
 	}
 
-- 
2.17.1



* [PATCH 10/11] samples/bpf: use hugepages in xdpsock app
  2019-06-20  8:39 [PATCH 00/11] XDP unaligned chunk placement support Kevin Laatz
                   ` (8 preceding siblings ...)
  2019-06-20  8:39 ` [PATCH 09/11] samples/bpf: add buffer recycling for unaligned chunks " Kevin Laatz
@ 2019-06-20  8:39 ` Kevin Laatz
  2019-06-20  8:39 ` [PATCH 11/11] doc/af_xdp: include unaligned chunk case Kevin Laatz
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Kevin Laatz @ 2019-06-20  8:39 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson
  Cc: bpf, intel-wired-lan, bruce.richardson, ciara.loftus, Kevin Laatz

This patch modifies xdpsock to use mmap instead of posix_memalign. With
this change, we can use hugepages when running the application in unaligned
chunks mode. Using hugepages makes it more likely that we have physically
contiguous memory, which supports the unaligned chunk mode better.
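
Note that MAP_HUGETLB requires hugepages to be reserved up front; this
is an operational detail, not part of the patch. A minimal sketch,
equivalent to 'echo 256 > /proc/sys/vm/nr_hugepages':

  #include <stdio.h>

  int main(void)
  {
          FILE *f = fopen("/proc/sys/vm/nr_hugepages", "w");

          if (!f)
                  return 1;
          fprintf(f, "%d\n", 256);
          return fclose(f) ? 1 : 0;
  }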

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 samples/bpf/xdpsock_user.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 7b4ce047deb2..8ed63ad68428 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -74,6 +74,7 @@ static int opt_interval = 1;
 static u64 opt_buffer_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
 static u32 opt_umem_flags;
 static int opt_unaligned_chunks;
+static int opt_mmap_flags;
 static u32 opt_xdp_bind_flags;
 static __u32 prog_id;
 
@@ -438,6 +439,7 @@ static void parse_command_line(int argc, char **argv)
 		case 'u':
 			opt_umem_flags |= XDP_UMEM_UNALIGNED_CHUNKS;
 			opt_unaligned_chunks = 1;
+			opt_mmap_flags = MAP_HUGETLB;
 			break;
 		case 'b':
 			opt_buffer_size = atoi(optarg);
@@ -707,11 +709,13 @@ int main(int argc, char **argv)
 		exit(EXIT_FAILURE);
 	}
 
-	ret = posix_memalign(&bufs, getpagesize(), /* PAGE_SIZE aligned */
-			     NUM_FRAMES * opt_buffer_size);
-	if (ret)
-		exit_with_error(ret);
-
+	/* Reserve memory for the umem. Use hugepages if unaligned chunk mode */
+	bufs = mmap(NULL, NUM_FRAMES * opt_buffer_size, PROT_READ|PROT_WRITE,
+			MAP_PRIVATE|MAP_ANONYMOUS|opt_mmap_flags, -1, 0);
+	if (bufs == MAP_FAILED) {
+		printf("ERROR: mmap failed\n");
+		exit(EXIT_FAILURE);
+	}
        /* Create sockets... */
 	umem = xsk_configure_umem(bufs,
 				  NUM_FRAMES * opt_buffer_size);
-- 
2.17.1



* [PATCH 11/11] doc/af_xdp: include unaligned chunk case
  2019-06-20  8:39 [PATCH 00/11] XDP unaligned chunk placement support Kevin Laatz
                   ` (9 preceding siblings ...)
  2019-06-20  8:39 ` [PATCH 10/11] samples/bpf: use hugepages in xdpsock app Kevin Laatz
@ 2019-06-20  8:39 ` Kevin Laatz
  2019-06-24 15:38 ` [PATCH 00/11] XDP unaligned chunk placement support Björn Töpel
  2019-06-25 18:44 ` Jonathan Lemon
  12 siblings, 0 replies; 28+ messages in thread
From: Kevin Laatz @ 2019-06-20  8:39 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson
  Cc: bpf, intel-wired-lan, bruce.richardson, ciara.loftus, Kevin Laatz

With the addition of unaligned chunks mode, the documentation needs to
be updated to indicate that the incoming addr to the fill ring will only
be masked if the user application is run in the aligned chunk mode. This
patch also adds a line to explicitly state that the incoming addr will
not be masked if running in the unaligned chunks mode.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 Documentation/networking/af_xdp.rst | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst
index e14d7d40fc75..16fbc68cac50 100644
--- a/Documentation/networking/af_xdp.rst
+++ b/Documentation/networking/af_xdp.rst
@@ -153,10 +153,12 @@ an example, if the UMEM is 64k and each chunk is 4k, then the UMEM has
 
 Frames passed to the kernel are used for the ingress path (RX rings).
 
-The user application produces UMEM addrs to this ring. Note that the
-kernel will mask the incoming addr. E.g. for a chunk size of 2k, the
-log2(2048) LSB of the addr will be masked off, meaning that 2048, 2050
-and 3000 refers to the same chunk.
+The user application produces UMEM addrs to this ring. Note that, if
+running the application in aligned chunk mode, the kernel will mask
+the incoming addr. E.g. for a chunk size of 2k, the log2(2048) LSB of
+the addr will be masked off, meaning that 2048, 2050 and 3000 refer
+to the same chunk. If the user application is run in unaligned chunks
+mode, the incoming addr will be left untouched.
 
 
 UMEM Completion Ring
-- 
2.17.1



* Re: [PATCH 00/11] XDP unaligned chunk placement support
  2019-06-20  8:39 [PATCH 00/11] XDP unaligned chunk placement support Kevin Laatz
                   ` (10 preceding siblings ...)
  2019-06-20  8:39 ` [PATCH 11/11] doc/af_xdp: include unaligned chunk case Kevin Laatz
@ 2019-06-24 15:38 ` Björn Töpel
  2019-06-25 13:12   ` Laatz, Kevin
  2019-06-25 18:44 ` Jonathan Lemon
  12 siblings, 1 reply; 28+ messages in thread
From: Björn Töpel @ 2019-06-24 15:38 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: Netdev, Alexei Starovoitov, Daniel Borkmann,
	Björn Töpel, Karlsson, Magnus, bpf, intel-wired-lan,
	Bruce Richardson, ciara.loftus

On Thu, 20 Jun 2019 at 18:55, Kevin Laatz <kevin.laatz@intel.com> wrote:
>
> This patchset adds the ability to use unaligned chunks in the XDP umem.
>
> Currently, all chunk addresses passed to the umem are masked to be chunk
> size aligned (default is 2k, max is PAGE_SIZE). This limits where we can
> place chunks within the umem as well as limiting the packet sizes that are
> supported.
>
> The changes in this patchset remove these restrictions, allowing XDP to be
> more flexible in where it can place a chunk within a umem. By relaxing where
> the chunks can be placed, it allows us to use an arbitrary buffer size and
> place that wherever we have a free address in the umem. These changes add the
> ability to support jumbo frames and make it easy to integrate with other
> existing frameworks that have their own memory management systems, such as
> DPDK.
>

Thanks for working on this, Kevin and Ciara!

I have some minor comments on the series, but in general I think it's
in good shape!

For some reason the series was submitted twice (at least on my side)?


Thanks,
Björn

> Structure of the patchset:
> Patch 1:
>   - Remove unnecessary masking and headroom addition during zero-copy Rx
>     buffer recycling in i40e. This change is required in order for the
>     buffer recycling to work in the unaligned chunk mode.
>
> Patch 2:
>   - Remove unnecessary masking and headroom addition during
>     zero-copy Rx buffer recycling in ixgbe. This change is required in
>     order for the buffer recycling to work in the unaligned chunk mode.
>
> Patch 3:
>   - Adds an offset parameter to zero_copy_allocator. This change will
>     enable us to calculate the original handle in zca_free. This will be
>     required for unaligned chunk mode since we can't easily mask back to
>     the original handle.
>
> Patch 4:
>   - Adds the offset parameter to i40e_zca_free. This change is needed for
>     calculating the handle since we can't easily mask back to the original
>     handle like we can in the aligned case.
>
> Patch 5:
>   - Adds the offset parameter to ixgbe_zca_free. This change is needed for
>     calculating the handle since we can't easily mask back to the original
>     handle like we can in the aligned case.
>
>
> Patch 6:
>   - Add infrastructure for unaligned chunks. Since we are dealing
>     with unaligned chunks that could potentially cross a physical page
>     boundary, we add checks to keep track of that information. We can
>     later use this information to correctly handle buffers that are
>     placed at an address where they cross a page boundary.
>
> Patch 7:
>   - Add flags for umem configuration to libbpf
>
> Patch 8:
>   - Modify xdpsock application to add a command line option for
>     unaligned chunks
>
> Patch 9:
>   - Add a command line argument to pass in a desired buffer size, along
>     with buffer recycling for unaligned mode. Passing in a buffer size
>     allows the application to use buffers of that size with the unaligned
>     chunk mode. Since we are now using unaligned chunks, we need to
>     recycle our buffers in a slightly different way.
>
> Patch 10:
>   - Adds hugepage support to the xdpsock application
>
> Patch 11:
>   - Documentation update to include the unaligned chunk scenario. We need
>     to explicitly state that the incoming addresses are only masked in the
>     aligned chunk mode and not the unaligned chunk mode.
>
> Kevin Laatz (11):
>   i40e: simplify Rx buffer recycle
>   ixgbe: simplify Rx buffer recycle
>   xdp: add offset param to zero_copy_allocator
>   i40e: add offset to zca_free
>   ixgbe: add offset to zca_free
>   xsk: add support to allow unaligned chunk placement
>   libbpf: add flags to umem config
>   samples/bpf: add unaligned chunks mode support to xdpsock
>   samples/bpf: add buffer recycling for unaligned chunks to xdpsock
>   samples/bpf: use hugepages in xdpsock app
>   doc/af_xdp: include unaligned chunk case
>
>  Documentation/networking/af_xdp.rst           | 10 +-
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 21 ++--
>  drivers/net/ethernet/intel/i40e/i40e_xsk.h    |  3 +-
>  .../ethernet/intel/ixgbe/ixgbe_txrx_common.h  |  3 +-
>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 21 ++--
>  include/net/xdp.h                             |  3 +-
>  include/net/xdp_sock.h                        |  2 +
>  include/uapi/linux/if_xdp.h                   |  4 +
>  net/core/xdp.c                                | 11 ++-
>  net/xdp/xdp_umem.c                            | 17 ++--
>  net/xdp/xsk.c                                 | 60 +++++++++--
>  net/xdp/xsk_queue.h                           | 60 +++++++++--
>  samples/bpf/xdpsock_user.c                    | 99 ++++++++++++++-----
>  tools/include/uapi/linux/if_xdp.h             |  4 +
>  tools/lib/bpf/xsk.c                           |  7 ++
>  tools/lib/bpf/xsk.h                           |  2 +
>  16 files changed, 241 insertions(+), 86 deletions(-)
>
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/11] XDP unaligned chunk placement support
  2019-06-24 15:38 ` [PATCH 00/11] XDP unaligned chunk placement support Björn Töpel
@ 2019-06-25 13:12   ` Laatz, Kevin
  0 siblings, 0 replies; 28+ messages in thread
From: Laatz, Kevin @ 2019-06-25 13:12 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Netdev, Alexei Starovoitov, Daniel Borkmann,
	Björn Töpel, Karlsson, Magnus, bpf, intel-wired-lan,
	Bruce Richardson, ciara.loftus


On 24/06/2019 16:38, Björn Töpel wrote:
> On Thu, 20 Jun 2019 at 18:55, Kevin Laatz <kevin.laatz@intel.com> wrote:
>>
>> This patchset adds the ability to use unaligned chunks in the XDP umem.
>>
>> Currently, all chunk addresses passed to the umem are masked to be chunk
>> size aligned (default is 2k, max is PAGE_SIZE). This limits where we can
>> place chunks within the umem as well as limiting the packet sizes 
>> that are
>> supported.
>>
>> The changes in this patchset remove these restrictions, allowing XDP 
>> to be
>> more flexible in where it can place a chunk within a umem. By 
>> relaxing where
>> the chunks can be placed, it allows us to use an arbitrary buffer 
>> size and
>> place that wherever we have a free address in the umem. These changes 
>> add the
>> ability to support jumboframes and make it easy to integrate with other
>> existing frameworks that have their own memory management systems, 
>> such as
>> DPDK.
>>
>
> Thanks for working on this, Kevin and Ciara!
>
> I have some minor comments on the series, but in general I think it's
> in good shape!
>
> For some reason the series was submitted twice (at least on my side)?
Apologies for the confusion... The first set had a typo in the bpf 
mailing list address (.com vs .org). Will fix for the v2.
>
>
> Thanks,
> Björn


Thanks for reviewing. Will address your comments in the v2.

>
>> Structure of the patchset:
>> Patch 1:
>>   - Remove unnecessary masking and headroom addition during zero-copy Rx
>>     buffer recycling in i40e. This change is required in order for the
>>     buffer recycling to work in the unaligned chunk mode.
>>
>> Patch 2:
>>   - Remove unnecessary masking and headroom addition during
>>     zero-copy Rx buffer recycling in ixgbe. This change is required in
>>     order for the  buffer recycling to work in the unaligned chunk mode.
>>
>> Patch 3:
>>   - Adds an offset parameter to zero_copy_allocator. This change will
>>     enable us to calculate the original handle in zca_free. This will be
>>     required for unaligned chunk mode since we can't easily mask back to
>>     the original handle.
>>
>> Patch 4:
>>   - Adds the offset parameter to i40e_zca_free. This change is needed 
>> for
>>     calculating the handle since we can't easily mask back to the 
>> original
>>     handle like we can in the aligned case.
>>
>> Patch 5:
>>   - Adds the offset parameter to ixgbe_zca_free. This change is 
>> needed for
>>     calculating the handle since we can't easily mask back to the 
>> original
>>     handle like we can in the aligned case.
>>
>>
>> Patch 6:
>>   - Add infrastructure for unaligned chunks. Since we are dealing
>>     with unaligned chunks that could potentially cross a physical page
>>     boundary, we add checks to keep track of that information. We can
>>     later use this information to correctly handle buffers that are
>>     placed at an address where they cross a page boundary.
>>
>> Patch 7:
>>   - Add flags for umem configuration to libbpf
>>
>> Patch 8:
>>   - Modify xdpsock application to add a command line option for
>>     unaligned chunks
>>
>> Patch 9:
>>   - Addition of command line argument to pass in a desired buffer size
>>     and buffer recycling for unaligned mode. Passing in a buffer size 
>> will
>>     allow the application to use unaligned chunks with the unaligned 
>> chunk
>>     mode. Since we are now using unaligned chunks, we need to recycle 
>> our
>>     buffers in a slightly different way.
>>
>> Patch 10:
>>   - Adds hugepage support to the xdpsock application
>>
>> Patch 11:
>>   - Documentation update to include the unaligned chunk scenario. We 
>> need
>>     to explicitly state that the incoming addresses are only masked 
>> in the
>>     aligned chunk mode and not the unaligned chunk mode.
>>
>> Kevin Laatz (11):
>>   i40e: simplify Rx buffer recycle
>>   ixgbe: simplify Rx buffer recycle
>>   xdp: add offset param to zero_copy_allocator
>>   i40e: add offset to zca_free
>>   ixgbe: add offset to zca_free
>>   xsk: add support to allow unaligned chunk placement
>>   libbpf: add flags to umem config
>>   samples/bpf: add unaligned chunks mode support to xdpsock
>>   samples/bpf: add buffer recycling for unaligned chunks to xdpsock
>>   samples/bpf: use hugepages in xdpsock app
>>   doc/af_xdp: include unaligned chunk case
>>
>>  Documentation/networking/af_xdp.rst           | 10 +-
>>  drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 21 ++--
>>  drivers/net/ethernet/intel/i40e/i40e_xsk.h    |  3 +-
>>  .../ethernet/intel/ixgbe/ixgbe_txrx_common.h  |  3 +-
>>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 21 ++--
>>  include/net/xdp.h                             |  3 +-
>>  include/net/xdp_sock.h                        |  2 +
>>  include/uapi/linux/if_xdp.h                   |  4 +
>>  net/core/xdp.c                                | 11 ++-
>>  net/xdp/xdp_umem.c                            | 17 ++--
>>  net/xdp/xsk.c                                 | 60 +++++++++--
>>  net/xdp/xsk_queue.h                           | 60 +++++++++--
>>  samples/bpf/xdpsock_user.c                    | 99 ++++++++++++++-----
>>  tools/include/uapi/linux/if_xdp.h             |  4 +
>>  tools/lib/bpf/xsk.c                           |  7 ++
>>  tools/lib/bpf/xsk.h                           |  2 +
>>  16 files changed, 241 insertions(+), 86 deletions(-)
>>
>> -- 
>> 2.17.1
>>
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/11] XDP unaligned chunk placement support
  2019-06-20  8:39 [PATCH 00/11] XDP unaligned chunk placement support Kevin Laatz
                   ` (11 preceding siblings ...)
  2019-06-24 15:38 ` [PATCH 00/11] XDP unaligned chunk placement support Björn Töpel
@ 2019-06-25 18:44 ` Jonathan Lemon
  2019-06-27 11:14   ` Laatz, Kevin
  12 siblings, 1 reply; 28+ messages in thread
From: Jonathan Lemon @ 2019-06-25 18:44 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson, bpf,
	intel-wired-lan, bruce.richardson, ciara.loftus

On 20 Jun 2019, at 1:39, Kevin Laatz wrote:

> This patchset adds the ability to use unaligned chunks in the XDP 
> umem.
>
> Currently, all chunk addresses passed to the umem are masked to be 
> chunk
> size aligned (default is 2k, max is PAGE_SIZE). This limits where we 
> can
> place chunks within the umem as well as limiting the packet sizes that 
> are
> supported.
>
> The changes in this patchset remove these restrictions, allowing XDP 
> to be
> more flexible in where it can place a chunk within a umem. By relaxing 
> where
> the chunks can be placed, it allows us to use an arbitrary buffer size 
> and
> place that wherever we have a free address in the umem. These changes 
> add the
> ability to support jumboframes and make it easy to integrate with 
> other
> existing frameworks that have their own memory management systems, 
> such as
> DPDK.

I'm a little unclear on how this should work, and have a few issues here:

  1) There isn't any support for the user-defined umem->headroom.

  2) When queuing RX buffers, the handle (aka umem offset) is used, which
     points to the start of the buffer area.  When the buffer appears in
     the completion queue, the handle points to the start of the received
     data, which might be different from the buffer start address.

     Normally, this RX address is just put back in the fill queue, and the
     mask is used to find the buffer start address again.  This no longer
     works, so my question is, how is the buffer start address recomputed
     from the actual data payload address?  (See the sketch after this
     list.)

     Same with TX - if the TX payload isn't aligned with the start of
     the buffer, what happens?

  3) This appears limited to crossing a single page boundary, but there
     is no constraint check on chunk_size.
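
For reference, a minimal sketch of the recovery I'm asking about in 2)
(the aligned half is how net/xdp works today; the unaligned half is my
reading of the problem, not code from this series):

    /* aligned mode: chunk_size is a power of two and chunk_mask is
     * ~((u64)chunk_size - 1), so masking any address within the chunk
     * recovers the chunk (buffer) start address. */
    u64 buf_start = data_addr & umem->chunk_mask;

    /* unaligned mode: chunks sit at arbitrary addresses, so the mask no
     * longer lands on the chunk start - something else has to map the
     * payload address back to the buffer start. */
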
-- 
Jonathan

>
> Structure of the patchset:
> Patch 1:
>   - Remove unnecessary masking and headroom addition during zero-copy 
> Rx
>     buffer recycling in i40e. This change is required in order for the
>     buffer recycling to work in the unaligned chunk mode.
>
> Patch 2:
>   - Remove unnecessary masking and headroom addition during
>     zero-copy Rx buffer recycling in ixgbe. This change is required in
>     order for the  buffer recycling to work in the unaligned chunk 
> mode.
>
> Patch 3:
>   - Adds an offset parameter to zero_copy_allocator. This change will
>     enable us to calculate the original handle in zca_free. This will 
> be
>     required for unaligned chunk mode since we can't easily mask back 
> to
>     the original handle.
>
> Patch 4:
>   - Adds the offset parameter to i40e_zca_free. This change is needed 
> for
>     calculating the handle since we can't easily mask back to the 
> original
>     handle like we can in the aligned case.
>
> Patch 5:
>   - Adds the offset parameter to ixgbe_zca_free. This change is needed 
> for
>     calculating the handle since we can't easily mask back to the 
> original
>     handle like we can in the aligned case.
>
>
> Patch 6:
>   - Add infrastructure for unaligned chunks. Since we are dealing
>     with unaligned chunks that could potentially cross a physical page
>     boundary, we add checks to keep track of that information. We can
>     later use this information to correctly handle buffers that are
>     placed at an address where they cross a page boundary.
>
> Patch 7:
>   - Add flags for umem configuration to libbpf
>
> Patch 8:
>   - Modify xdpsock application to add a command line option for
>     unaligned chunks
>
> Patch 9:
>   - Addition of command line argument to pass in a desired buffer size
>     and buffer recycling for unaligned mode. Passing in a buffer size 
> will
>     allow the application to use unaligned chunks with the unaligned 
> chunk
>     mode. Since we are now using unaligned chunks, we need to recycle 
> our
>     buffers in a slightly different way.
>
> Patch 10:
>   - Adds hugepage support to the xdpsock application
>
> Patch 11:
>   - Documentation update to include the unaligned chunk scenario. We 
> need
>     to explicitly state that the incoming addresses are only masked in 
> the
>     aligned chunk mode and not the unaligned chunk mode.
>
> Kevin Laatz (11):
>   i40e: simplify Rx buffer recycle
>   ixgbe: simplify Rx buffer recycle
>   xdp: add offset param to zero_copy_allocator
>   i40e: add offset to zca_free
>   ixgbe: add offset to zca_free
>   xsk: add support to allow unaligned chunk placement
>   libbpf: add flags to umem config
>   samples/bpf: add unaligned chunks mode support to xdpsock
>   samples/bpf: add buffer recycling for unaligned chunks to xdpsock
>   samples/bpf: use hugepages in xdpsock app
>   doc/af_xdp: include unaligned chunk case
>
>  Documentation/networking/af_xdp.rst           | 10 +-
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 21 ++--
>  drivers/net/ethernet/intel/i40e/i40e_xsk.h    |  3 +-
>  .../ethernet/intel/ixgbe/ixgbe_txrx_common.h  |  3 +-
>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 21 ++--
>  include/net/xdp.h                             |  3 +-
>  include/net/xdp_sock.h                        |  2 +
>  include/uapi/linux/if_xdp.h                   |  4 +
>  net/core/xdp.c                                | 11 ++-
>  net/xdp/xdp_umem.c                            | 17 ++--
>  net/xdp/xsk.c                                 | 60 +++++++++--
>  net/xdp/xsk_queue.h                           | 60 +++++++++--
>  samples/bpf/xdpsock_user.c                    | 99 
> ++++++++++++++-----
>  tools/include/uapi/linux/if_xdp.h             |  4 +
>  tools/lib/bpf/xsk.c                           |  7 ++
>  tools/lib/bpf/xsk.h                           |  2 +
>  16 files changed, 241 insertions(+), 86 deletions(-)
>
> -- 
> 2.17.1

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/11] XDP unaligned chunk placement support
  2019-06-25 18:44 ` Jonathan Lemon
@ 2019-06-27 11:14   ` Laatz, Kevin
  2019-06-27 21:25     ` Jakub Kicinski
  0 siblings, 1 reply; 28+ messages in thread
From: Laatz, Kevin @ 2019-06-27 11:14 UTC (permalink / raw)
  To: Jonathan Lemon
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson, bpf,
	intel-wired-lan, bruce.richardson, ciara.loftus


On 25/06/2019 19:44, Jonathan Lemon wrote:
> On 20 Jun 2019, at 1:39, Kevin Laatz wrote:
>
>> This patchset adds the ability to use unaligned chunks in the XDP umem.
>>
>> Currently, all chunk addresses passed to the umem are masked to be chunk
>> size aligned (default is 2k, max is PAGE_SIZE). This limits where we can
>> place chunks within the umem as well as limiting the packet sizes 
>> that are
>> supported.
>>
>> The changes in this patchset remove these restrictions, allowing XDP 
>> to be
>> more flexible in where it can place a chunk within a umem. By 
>> relaxing where
>> the chunks can be placed, it allows us to use an arbitrary buffer 
>> size and
>> place that wherever we have a free address in the umem. These changes 
>> add the
>> ability to support jumboframes and make it easy to integrate with other
>> existing frameworks that have their own memory management systems, 
>> such as
>> DPDK.
>
> I'm a little unclear on how this should work, and have a few issues here:
>
>  1) There isn't any support for the user defined umem->headroom
>

For the unaligned chunks case, it does not make sense to support a
user-defined headroom, since the user can point directly to where they
want the data to start via the buffer address. Therefore, for unaligned
chunks, the user-defined headroom should always be 0 (i.e. the user did
not define a headroom and the default value of 0 is used). Any other
value will be caught and we return an invalid argument error.
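
Concretely, the check would be along these lines (a sketch against
xdp_umem_reg(); the flag name assumes the unaligned chunk flag added in
patch 6):

    /* sketch: in unaligned chunk mode the buffer address itself encodes
     * where the data starts, so reject any user-defined headroom. */
    if ((mr->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG) && mr->headroom)
        return -EINVAL;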


>  2) When queuing RX buffers, the handle (aka umem offset) is used, which
>     points to the start of the buffer area.  When the buffer appears in
>     the completion queue, handle points to the start of the received 
> data,
>     which might be different from the buffer start address.
>
>     Normally, this RX address is just put back in the fill queue, and the
>     mask is used to find the buffer start address again.  This no longer
>     works, so my question is, how is the buffer start address recomputed
>     from the actual data payload address?
>
>     Same with TX - if the TX payload isn't aligned in with the start of
>     the buffer, what happens?

On the application side (xdpsock), we don't have to worry about the
user-defined headroom, since it is 0, so we only need to account for
XDP_PACKET_HEADROOM when computing the original address (in the default
scenario). This was missing from the v1; we will add it in the v2 and
have xdpsock use the default value from libbpf. If the user is running
another BPF program that uses a different offset, the computation will
need to be adjusted accordingly; in the v2 we'll add support for this
via a command-line parameter.
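
In the default scenario, that computation is just a fixed subtraction,
something like (a sketch; 'addr' is the address from the rx descriptor):

    /* sketch: with no user-defined headroom, the payload starts
     * XDP_PACKET_HEADROOM bytes into the chunk, so the address to put
     * back on the fill queue is: */
    u64 orig_addr = addr - XDP_PACKET_HEADROOM;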

However, we are also working on an "in-order" patchset, hopefully to be
published soon, to guarantee that the buffers returned to the application
are in the same order as those provided to the kernel. Longer term, this
is the best solution here, as it allows the application to track for
itself, via a "shadow ring" or otherwise, the buffers sent to the kernel
and any metadata associated with them, such as the start-of-buffer
address.

>
>  3) This appears limited to crossing a single page boundary, but there
>     is no constraint check on chunk_size.

There is an existing check for chunk_size during xdp_umem_reg (in
xdp_umem.c). The check makes sure that the chunk size is at least
XDP_UMEM_MIN_CHUNK_SIZE and at most PAGE_SIZE. Since the max is page
size, we only need to check the immediate next page for contiguity.
While this patchset allows a max of 4k sized buffers, it is still an
improvement on the current state. Future enhancements could look into
extending the 4k limit, but for now it is a good first step towards
supporting hugepages efficiently.
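
For reference, the existing check is along these lines (paraphrased from
xdp_umem_reg() in net/xdp/xdp_umem.c):

    if (chunk_size < XDP_UMEM_MIN_CHUNK_SIZE || chunk_size > PAGE_SIZE) {
        /* strictly larger than PAGE_SIZE is not supported */
        return -EINVAL;
    }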

Best regards,
Kevin

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/11] XDP unaligned chunk placement support
  2019-06-27 11:14   ` Laatz, Kevin
@ 2019-06-27 21:25     ` Jakub Kicinski
  2019-06-28 16:19       ` Laatz, Kevin
  0 siblings, 1 reply; 28+ messages in thread
From: Jakub Kicinski @ 2019-06-27 21:25 UTC (permalink / raw)
  To: Laatz, Kevin
  Cc: Jonathan Lemon, netdev, ast, daniel, bjorn.topel,
	magnus.karlsson, bpf, intel-wired-lan, bruce.richardson,
	ciara.loftus

On Thu, 27 Jun 2019 12:14:50 +0100, Laatz, Kevin wrote:
> On the application side (xdpsock), we don't have to worry about the user 
> defined headroom, since it is 0, so we only need to account for the 
> XDP_PACKET_HEADROOM when computing the original address (in the default 
> scenario).

That assumes a specific layout for the data inside the buffer.  Some NICs
will prepend information like a timestamp to the packet, meaning the
packet would start at offset XDP_PACKET_HEADROOM + metadata len.

I think that's very limiting.  What is the challenge in providing
aligned addresses, exactly?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/11] XDP unaligned chunk placement support
  2019-06-27 21:25     ` Jakub Kicinski
@ 2019-06-28 16:19       ` Laatz, Kevin
  2019-06-28 16:51         ` Björn Töpel
                           ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Laatz, Kevin @ 2019-06-28 16:19 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jonathan Lemon, netdev, ast, daniel, bjorn.topel,
	magnus.karlsson, bpf, intel-wired-lan, bruce.richardson,
	ciara.loftus

On 27/06/2019 22:25, Jakub Kicinski wrote:
> On Thu, 27 Jun 2019 12:14:50 +0100, Laatz, Kevin wrote:
>> On the application side (xdpsock), we don't have to worry about the user
>> defined headroom, since it is 0, so we only need to account for the
>> XDP_PACKET_HEADROOM when computing the original address (in the default
>> scenario).
> That assumes specific layout for the data inside the buffer.  Some NICs
> will prepend information like timestamp to the packet, meaning the
> packet would start at offset XDP_PACKET_HEADROOM + metadata len..

Yes, if NICs prepend extra data to the packet, that would be a problem
for using this feature in isolation. However, if we also add in support
for in-order RX and TX rings, that would no longer be an issue. And even
for NICs which do prepend data, this patchset should not break anything
that is currently working.

>
> I think that's very limiting.  What is the challenge in providing
> aligned addresses, exactly?
The challenges are two-fold:
1) It prevents using arbitrary buffer sizes, which will be an issue
   when supporting e.g. jumbo frames in the future.
2) Higher-level user-space frameworks which may want to use AF_XDP, such
   as DPDK, do not currently support having buffers with 'fixed'
   alignment.
     The reasons that DPDK uses arbitrary placement are that:
         - it would stop things working on certain NICs which need the
           actual writable space specified in units of 1k - therefore we
           need 2k + metadata space.
         - we place padding between buffers to avoid constantly hitting
           the same memory channels when accessing memory.
         - it allows the application to choose the actual buffer size it
           wants to use.
     We make use of the above to speed up processing significantly and
     also to reduce the packet buffer memory size.

     Not having arbitrary buffer alignment also means an AF_XDP driver
     for DPDK cannot be a drop-in replacement for existing drivers in
     those frameworks. Even with a new capability to allow arbitrary
     buffer alignment, existing apps would need to be modified to use
     that new capability.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/11] XDP unaligned chunk placement support
  2019-06-28 16:19       ` Laatz, Kevin
@ 2019-06-28 16:51         ` Björn Töpel
  2019-06-28 20:08           ` Jakub Kicinski
  2019-06-28 20:25         ` Jakub Kicinski
  2019-06-28 20:29         ` Jonathan Lemon
  2 siblings, 1 reply; 28+ messages in thread
From: Björn Töpel @ 2019-06-28 16:51 UTC (permalink / raw)
  To: Laatz, Kevin, Jakub Kicinski
  Cc: Jonathan Lemon, netdev, ast, daniel, magnus.karlsson, bpf,
	intel-wired-lan, bruce.richardson, ciara.loftus

On 2019-06-28 18:19, Laatz, Kevin wrote:
> On 27/06/2019 22:25, Jakub Kicinski wrote:
>> On Thu, 27 Jun 2019 12:14:50 +0100, Laatz, Kevin wrote:
>>> On the application side (xdpsock), we don't have to worry about the user
>>> defined headroom, since it is 0, so we only need to account for the
>>> XDP_PACKET_HEADROOM when computing the original address (in the default
>>> scenario).
>> That assumes specific layout for the data inside the buffer.  Some NICs
>> will prepend information like timestamp to the packet, meaning the
>> packet would start at offset XDP_PACKET_HEADROOM + metadata len..
> 
> Yes, if NICs prepend extra data to the packet that would be a problem for
> using this feature in isolation. However, if we also add in support for 
> in-order
> RX and TX rings, that would no longer be an issue. However, even for NICs
> which do prepend data, this patchset should not break anything that is 
> currently
> working.

(Late on the ball. I'm in vacation mode.)

In your example, Jakub, how would this look in XDP? Wouldn't the
timestamp be part of the metadata (xdp_md.data_meta)? Isn't
data - data_meta (if valid) <= XDP_PACKET_HEADROOM? That was my assumption.

There was some discussion on having the metadata length in struct
xdp_desc before AF_XDP was merged, but the conclusion was that this was
*not* needed, because AF_XDP and the XDP program have an implicit
contract. If you're running AF_XDP, you also have an XDP program running
and you can determine the metadata length (and also get back the
original buffer).

So, today in AF_XDP, if XDP metadata is added, the userland application
can look it up before xdp_desc.addr (just like regular XDP), and how the
XDP/AF_XDP application determines the length/layout of the metadata is
out-of-band/not specified.

This is a bit messy/handwavy TBH, so maybe adding the length to the
descriptor *is* a good idea (extending the options part of the
xdp_desc)? Less clean, though. OTOH the layout of the metadata still
needs to be determined.


Björn

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/11] XDP unaligned chunk placement support
  2019-06-28 16:51         ` Björn Töpel
@ 2019-06-28 20:08           ` Jakub Kicinski
  0 siblings, 0 replies; 28+ messages in thread
From: Jakub Kicinski @ 2019-06-28 20:08 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Laatz, Kevin, Jonathan Lemon, netdev, ast, daniel,
	magnus.karlsson, bpf, intel-wired-lan, bruce.richardson,
	ciara.loftus

On Fri, 28 Jun 2019 18:51:37 +0200, Björn Töpel wrote:
> In your example Jakub, how would this look in XDP? Wouldn't the
> timestamp be part of the metadata (xdp_md.data_meta)? Isn't
> data-data_meta (if valid) <= XDP_PACKET_HEADROOM? That was my assumption.

The driver parses the metadata and copies it outside of the prepend
before XDP runs.  Then XDP runs unaware of the prepend contents.  
That's the current situation.

XDP_PACKET_HEADROOM is before the entire frame.  Like this:

    buffer start
  /               DMA addr given to the device
 /              /
v              v
| XDP_HEADROOM | meta data | packet data |

Length of meta data comes in the standard fixed size descriptor.

The metadata prepend is in TV form ("TLV with no length field", length's
implied by type).

> There were some discussion on having meta data length in the struct
> xdp_desc, before AF_XDP was merged, but the conclusion was that this was
> *not* needed, because AF_XDP and the XDP program had an implicit
> contract. If you're running AF_XDP, you also have an XDP program running
> and you can determine the meta data length (and also getting back the
> original buffer).
> 
> So, today in AF_XDP if XDP metadata is added, the userland application
> can look it up before the xdp_desc.addr (just like regular XDP), and how
> the XDP/AF_XDP application determines length/layout of the metadata i
> out-of-band/not specified.
> 
> This is a bit messy/handwavy TBH, so maybe adding the length to the
> descriptor *is* a good idea (extending the options part of the
> xdp_desc)? Less clean though. OTOH the layout of the meta data still
> need to be determined.

Right, the device prepend is not exposed as metadata to XDP.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/11] XDP unaligned chunk placement support
  2019-06-28 16:19       ` Laatz, Kevin
  2019-06-28 16:51         ` Björn Töpel
@ 2019-06-28 20:25         ` Jakub Kicinski
  2019-06-28 20:29         ` Jonathan Lemon
  2 siblings, 0 replies; 28+ messages in thread
From: Jakub Kicinski @ 2019-06-28 20:25 UTC (permalink / raw)
  To: Laatz, Kevin
  Cc: Jonathan Lemon, netdev, ast, daniel, bjorn.topel,
	magnus.karlsson, bpf, intel-wired-lan, bruce.richardson,
	ciara.loftus

On Fri, 28 Jun 2019 17:19:09 +0100, Laatz, Kevin wrote:
> On 27/06/2019 22:25, Jakub Kicinski wrote:
> > On Thu, 27 Jun 2019 12:14:50 +0100, Laatz, Kevin wrote:  
> >> On the application side (xdpsock), we don't have to worry about the user
> >> defined headroom, since it is 0, so we only need to account for the
> >> XDP_PACKET_HEADROOM when computing the original address (in the default
> >> scenario).  
> > That assumes specific layout for the data inside the buffer.  Some NICs
> > will prepend information like timestamp to the packet, meaning the
> > packet would start at offset XDP_PACKET_HEADROOM + metadata len..  
> 
> Yes, if NICs prepend extra data to the packet that would be a problem for
> using this feature in isolation. However, if we also add in support for 
> in-order RX and TX rings, that would no longer be an issue.

Can you shed more light on the in-order rings?  Do you mean that RX
frames come back in the order their buffers were placed in the fill
queue?  That wouldn't make practical sense, no?  Even if the application
does no reordering there is also XDP_DROP and XDP_TX.  Please explain :)

> However, even for NICs which do prepend data, this patchset should
> not break anything that is currently working.

My understanding from the beginnings of AF_XDP was that we were
searching for a format flexible enough to support most if not all NICs.
Creating an ABI which will preclude vendors from supporting DPDK via
AF_XDP would seriously undermine the neutrality aspect.

> > I think that's very limiting.  What is the challenge in providing
> > aligned addresses, exactly?  
> The challenges are two-fold:
> 1) it prevents using arbitrary buffer sizes, which will be an issue 
> supporting e.g. jumbo frames in future.

Presumably support for jumbos would require a multi-buffer setup, and
therefore extensions to the ring format. Should we perhaps look into
implementing unaligned chunks by extending the ring format as well?

> 2) higher level user-space frameworks which may want to use AF_XDP, such 
> as DPDK, do not currently support having buffers with 'fixed' alignment.
>      The reason that DPDK uses arbitrary placement is that:
>          - it would stop things working on certain NICs which need the 
> actual writable space specified in units of 1k - therefore we need 2k + 
> metadata space.
>          - we place padding between buffers to avoid constantly hitting 
> the same memory channels when accessing memory.
>          - it allows the application to choose the actual buffer size it 
> wants to use.
>      We make use of the above to allow us to speed up processing 
> significantly and also reduce the packet buffer memory size.
> 
>      Not having arbitrary buffer alignment also means an AF_XDP driver 
> for DPDK cannot be a drop-in replacement for existing drivers in those 
> frameworks. Even with a new capability to allow an arbitrary buffer 
> alignment, existing apps will need to be modified to use that new 
> capability.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/11] XDP unaligned chunk placement support
  2019-06-28 16:19       ` Laatz, Kevin
  2019-06-28 16:51         ` Björn Töpel
  2019-06-28 20:25         ` Jakub Kicinski
@ 2019-06-28 20:29         ` Jonathan Lemon
  2019-07-01 14:58           ` Laatz, Kevin
       [not found]           ` <07e404eb-f712-b15a-4884-315aff3f7c7d@intel.com>
  2 siblings, 2 replies; 28+ messages in thread
From: Jonathan Lemon @ 2019-06-28 20:29 UTC (permalink / raw)
  To: Laatz, Kevin
  Cc: Jakub Kicinski, netdev, ast, daniel, bjorn.topel,
	magnus.karlsson, bpf, intel-wired-lan, bruce.richardson,
	ciara.loftus



On 28 Jun 2019, at 9:19, Laatz, Kevin wrote:

> On 27/06/2019 22:25, Jakub Kicinski wrote:
>> On Thu, 27 Jun 2019 12:14:50 +0100, Laatz, Kevin wrote:
>>> On the application side (xdpsock), we don't have to worry about the 
>>> user
>>> defined headroom, since it is 0, so we only need to account for the
>>> XDP_PACKET_HEADROOM when computing the original address (in the 
>>> default
>>> scenario).
>> That assumes specific layout for the data inside the buffer.  Some 
>> NICs
>> will prepend information like timestamp to the packet, meaning the
>> packet would start at offset XDP_PACKET_HEADROOM + metadata len..
>
> Yes, if NICs prepend extra data to the packet that would be a problem 
> for
> using this feature in isolation. However, if we also add in support 
> for in-order
> RX and TX rings, that would no longer be an issue. However, even for 
> NICs
> which do prepend data, this patchset should not break anything that is 
> currently
> working.

I read this as "the correct buffer address is recovered from the shadow
ring".  I'm not sure I'm comfortable with that, and I'm also not sold on
in-order completion for the RX/TX rings.



>> I think that's very limiting.  What is the challenge in providing
>> aligned addresses, exactly?
> The challenges are two-fold:
> 1) it prevents using arbitrary buffer sizes, which will be an issue 
> supporting e.g. jumbo frames in future.
> 2) higher level user-space frameworks which may want to use AF_XDP, 
> such as DPDK, do not currently support having buffers with 'fixed' 
> alignment.
>     The reason that DPDK uses arbitrary placement is that:
>         - it would stop things working on certain NICs which 
> need the actual writable space specified in units of 1k - therefore we 
> need 2k + metadata space.
>         - we place padding between buffers to avoid constantly 
> hitting the same memory channels when accessing memory.
>         - it allows the application to choose the actual buffer 
> size it wants to use.
>     We make use of the above to allow us to speed up processing 
> significantly and also reduce the packet buffer memory size.
>
>     Not having arbitrary buffer alignment also means an AF_XDP 
> driver for DPDK cannot be a drop-in replacement for existing drivers 
> in those frameworks. Even with a new capability to allow an arbitrary 
> buffer alignment, existing apps will need to be modified to use that 
> new capability.

Since all buffers in the umem are the same chunk size, the original
buffer address can be recalculated with some multiply/shift math.
However, this is more expensive than just a mask operation.
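
I.e., something like (a sketch):

    /* aligned, power-of-two chunk size: a single AND */
    base = addr & ~((u64)chunk_size - 1);

    /* arbitrary chunk size: find the chunk index, then multiply back to
     * the chunk base (the division can become a multiply/shift with a
     * precomputed reciprocal). */
    base = (addr / chunk_size) * chunk_size;
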
-- 
Jonathan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/11] XDP unaligned chunk placement support
  2019-06-28 20:29         ` Jonathan Lemon
@ 2019-07-01 14:58           ` Laatz, Kevin
       [not found]           ` <07e404eb-f712-b15a-4884-315aff3f7c7d@intel.com>
  1 sibling, 0 replies; 28+ messages in thread
From: Laatz, Kevin @ 2019-07-01 14:58 UTC (permalink / raw)
  To: Jonathan Lemon
  Cc: Jakub Kicinski, netdev, ast, daniel, bjorn.topel,
	magnus.karlsson, bpf, intel-wired-lan, bruce.richardson,
	ciara.loftus

On 28/06/2019 21:29, Jonathan Lemon wrote:
> On 28 Jun 2019, at 9:19, Laatz, Kevin wrote:
>> On 27/06/2019 22:25, Jakub Kicinski wrote:
>>> I think that's very limiting.  What is the challenge in providing
>>> aligned addresses, exactly?
>> The challenges are two-fold:
>> 1) it prevents using arbitrary buffer sizes, which will be an issue 
>> supporting e.g. jumbo frames in future.
>> 2) higher level user-space frameworks which may want to use AF_XDP, 
>> such as DPDK, do not currently support having buffers with 'fixed' 
>> alignment.
>>     The reason that DPDK uses arbitrary placement is that:
>>         - it would stop things working on certain NICs which need the 
>> actual writable space specified in units of 1k - therefore we need 2k 
>> + metadata space.
>>         - we place padding between buffers to avoid constantly 
>> hitting the same memory channels when accessing memory.
>>         - it allows the application to choose the actual buffer size 
>> it wants to use.
>>     We make use of the above to allow us to speed up processing 
>> significantly and also reduce the packet buffer memory size.
>>
>>     Not having arbitrary buffer alignment also means an AF_XDP driver 
>> for DPDK cannot be a drop-in replacement for existing drivers in 
>> those frameworks. Even with a new capability to allow an arbitrary 
>> buffer alignment, existing apps will need to be modified to use that 
>> new capability.
>
> Since all buffers in the umem are the same chunk size, the original 
> buffer
> address can be recalculated with some multiply/shift math. However, 
> this is
> more expensive than just a mask operation.


Yes, we can do this.

Another option we have is to add a socket option for querying the 
metadata length from the driver (assuming it doesn't vary per packet). 
We can use that information to get back the original address using 
subtraction.

Alternatively, we can change the Rx descriptor format to include the
metadata length. We could do this in a couple of ways: for example,
rather than returning the address at the start of the packet, instead
return the buffer address that was passed in, and add another 16-bit
field to specify the start-of-packet offset within that buffer. If
using 16 bits of descriptor space is not desirable, an alternative could
be to limit umem sizes to e.g. 2^48 bytes (256 terabytes should be
enough, right :-) ) and use the remaining 16 bits of the address as a
packet offset. Other variations on these approaches are obviously
possible too.
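
A sketch of that last 48/16 encoding (names made up for illustration):

    /* sketch: low 48 bits carry the buffer (chunk) base address within
     * the umem, high 16 bits carry the start-of-packet offset into that
     * buffer. */
    #define XSK_ADDR_MASK ((1ULL << 48) - 1)

    static inline u64 xsk_desc_base(u64 addr)
    {
            return addr & XSK_ADDR_MASK;
    }

    static inline u64 xsk_desc_offset(u64 addr)
    {
            return addr >> 48;
    }

The application would get its original buffer address back with a single
mask, and the kernel gets 16 bits to describe where the packet starts.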


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/11] XDP unaligned chunk placement support
       [not found]           ` <07e404eb-f712-b15a-4884-315aff3f7c7d@intel.com>
@ 2019-07-01 21:20             ` Jakub Kicinski
  2019-07-02  9:27               ` Richardson, Bruce
  0 siblings, 1 reply; 28+ messages in thread
From: Jakub Kicinski @ 2019-07-01 21:20 UTC (permalink / raw)
  To: Laatz, Kevin
  Cc: Jonathan Lemon, netdev, ast, daniel, bjorn.topel,
	magnus.karlsson, bpf, intel-wired-lan, bruce.richardson,
	ciara.loftus

On Mon, 1 Jul 2019 15:44:29 +0100, Laatz, Kevin wrote:
> On 28/06/2019 21:29, Jonathan Lemon wrote:
> > On 28 Jun 2019, at 9:19, Laatz, Kevin wrote:  
> >> On 27/06/2019 22:25, Jakub Kicinski wrote:  
> >>> I think that's very limiting.  What is the challenge in providing
> >>> aligned addresses, exactly?  
> >> The challenges are two-fold:
> >> 1) it prevents using arbitrary buffer sizes, which will be an issue 
> >> supporting e.g. jumbo frames in future.
> >> 2) higher level user-space frameworks which may want to use AF_XDP, 
> >> such as DPDK, do not currently support having buffers with 'fixed' 
> >> alignment.
> >>     The reason that DPDK uses arbitrary placement is that:
> >>         - it would stop things working on certain NICs which need the 
> >> actual writable space specified in units of 1k - therefore we need 2k 
> >> + metadata space.
> >>         - we place padding between buffers to avoid constantly 
> >> hitting the same memory channels when accessing memory.
> >>         - it allows the application to choose the actual buffer size 
> >> it wants to use.
> >>     We make use of the above to allow us to speed up processing 
> >> significantly and also reduce the packet buffer memory size.
> >>
> >>     Not having arbitrary buffer alignment also means an AF_XDP driver 
> >> for DPDK cannot be a drop-in replacement for existing drivers in 
> >> those frameworks. Even with a new capability to allow an arbitrary 
> >> buffer alignment, existing apps will need to be modified to use that 
> >> new capability.  
> >
> > Since all buffers in the umem are the same chunk size, the original 
> > buffer
> > address can be recalculated with some multiply/shift math. However, 
> > this is
> > more expensive than just a mask operation.  
> 
> Yes, we can do this.

That'd be best. Can DPDK reasonably guarantee the slicing is uniform?
E.g. it's not disparate buffer pools with different bases?

> Another option we have is to add a socket option for querying the 
> metadata length from the driver (assuming it doesn't vary per packet). 
> We can use that information to get back to the original address using 
> subtraction.

Unfortunately the metadata depends on the packet and how much info 
the device was able to extract.  So it's variable length.

> Alternatively, we can change the Rx descriptor format to include the 
> metadata length. We could do this in a couple of ways, for example, 
> rather than returning the address as the start of the packet, instead 
> return the buffer address that was passed in, and adding another 16-bit 
> field to specify the start of packet offset with that buffer. If using 
> another 16-bits of the descriptor space is not desirable, an alternative 
> could be to limit umem sizes to e.g. 2^48 bytes (256 terabytes should be 
> enough, right :-) ) and use the remaining 16 bits of the address as a 
> packet offset. Other variations on these approach are obviously possible 
> too.

Seems reasonable to me..

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH 00/11] XDP unaligned chunk placement support
  2019-07-01 21:20             ` Jakub Kicinski
@ 2019-07-02  9:27               ` Richardson, Bruce
  2019-07-02 16:33                 ` Jonathan Lemon
  0 siblings, 1 reply; 28+ messages in thread
From: Richardson, Bruce @ 2019-07-02  9:27 UTC (permalink / raw)
  To: Jakub Kicinski, Laatz, Kevin
  Cc: Jonathan Lemon, netdev, ast, daniel, Topel, Bjorn, Karlsson,
	Magnus, bpf, intel-wired-lan, Loftus, Ciara



> -----Original Message-----
> From: Jakub Kicinski [mailto:jakub.kicinski@netronome.com]
> Sent: Monday, July 1, 2019 10:20 PM
> To: Laatz, Kevin <kevin.laatz@intel.com>
> Cc: Jonathan Lemon <jonathan.lemon@gmail.com>; netdev@vger.kernel.org;
> ast@kernel.org; daniel@iogearbox.net; Topel, Bjorn
> <bjorn.topel@intel.com>; Karlsson, Magnus <magnus.karlsson@intel.com>;
> bpf@vger.kernel.org; intel-wired-lan@lists.osuosl.org; Richardson, Bruce
> <bruce.richardson@intel.com>; Loftus, Ciara <ciara.loftus@intel.com>
> Subject: Re: [PATCH 00/11] XDP unaligned chunk placement support
> 
> On Mon, 1 Jul 2019 15:44:29 +0100, Laatz, Kevin wrote:
> > On 28/06/2019 21:29, Jonathan Lemon wrote:
> > > On 28 Jun 2019, at 9:19, Laatz, Kevin wrote:
> > >> On 27/06/2019 22:25, Jakub Kicinski wrote:
> > >>> I think that's very limiting.  What is the challenge in providing
> > >>> aligned addresses, exactly?
> > >> The challenges are two-fold:
> > >> 1) it prevents using arbitrary buffer sizes, which will be an issue
> > >> supporting e.g. jumbo frames in future.
> > >> 2) higher level user-space frameworks which may want to use AF_XDP,
> > >> such as DPDK, do not currently support having buffers with 'fixed'
> > >> alignment.
> > >>     The reason that DPDK uses arbitrary placement is that:
> > >>         - it would stop things working on certain NICs which need
> > >> the actual writable space specified in units of 1k - therefore we
> > >> need 2k
> > >> + metadata space.
> > >>         - we place padding between buffers to avoid constantly
> > >> hitting the same memory channels when accessing memory.
> > >>         - it allows the application to choose the actual buffer
> > >> size it wants to use.
> > >>     We make use of the above to allow us to speed up processing
> > >> significantly and also reduce the packet buffer memory size.
> > >>
> > >>     Not having arbitrary buffer alignment also means an AF_XDP
> > >> driver for DPDK cannot be a drop-in replacement for existing
> > >> drivers in those frameworks. Even with a new capability to allow an
> > >> arbitrary buffer alignment, existing apps will need to be modified
> > >> to use that new capability.
> > >
> > > Since all buffers in the umem are the same chunk size, the original
> > > buffer address can be recalculated with some multiply/shift math.
> > > However, this is more expensive than just a mask operation.
> >
> > Yes, we can do this.
> 
> That'd be best, can DPDK reasonably guarantee the slicing is uniform?
> E.g. it's not disparate buffer pools with different bases?

It's generally uniform, but handling the crossing of (huge)page boundaries
complicates things a bit. Therefore I think the final option below
is best as it avoids any such problems.

> 
> > Another option we have is to add a socket option for querying the
> > metadata length from the driver (assuming it doesn't vary per packet).
> > We can use that information to get back to the original address using
> > subtraction.
> 
> Unfortunately the metadata depends on the packet and how much info the
> device was able to extract.  So it's variable length.
> 
> > Alternatively, we can change the Rx descriptor format to include the
> > metadata length. We could do this in a couple of ways, for example,
> > rather than returning the address as the start of the packet, instead
> > return the buffer address that was passed in, and adding another
> > 16-bit field to specify the start of packet offset with that buffer.
> > If using another 16-bits of the descriptor space is not desirable, an
> > alternative could be to limit umem sizes to e.g. 2^48 bytes (256
> > terabytes should be enough, right :-) ) and use the remaining 16 bits
> > of the address as a packet offset. Other variations on these approach
> > are obviously possible too.
> 
> Seems reasonable to me..

I think this is probably the best solution, and it also has the advantage
that a buffer retains its base address the whole way through the Rx and Tx
cycle.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/11] XDP unaligned chunk placement support
  2019-07-02  9:27               ` Richardson, Bruce
@ 2019-07-02 16:33                 ` Jonathan Lemon
  0 siblings, 0 replies; 28+ messages in thread
From: Jonathan Lemon @ 2019-07-02 16:33 UTC (permalink / raw)
  To: Richardson, Bruce
  Cc: Jakub Kicinski, Laatz, Kevin, netdev, ast, daniel, Topel, Bjorn,
	Karlsson, Magnus, bpf, intel-wired-lan, Loftus, Ciara



On 2 Jul 2019, at 2:27, Richardson, Bruce wrote:

>> -----Original Message-----
>> From: Jakub Kicinski [mailto:jakub.kicinski@netronome.com]
>> Sent: Monday, July 1, 2019 10:20 PM
>> To: Laatz, Kevin <kevin.laatz@intel.com>
>> Cc: Jonathan Lemon <jonathan.lemon@gmail.com>; 
>> netdev@vger.kernel.org;
>> ast@kernel.org; daniel@iogearbox.net; Topel, Bjorn
>> <bjorn.topel@intel.com>; Karlsson, Magnus 
>> <magnus.karlsson@intel.com>;
>> bpf@vger.kernel.org; intel-wired-lan@lists.osuosl.org; Richardson, 
>> Bruce
>> <bruce.richardson@intel.com>; Loftus, Ciara <ciara.loftus@intel.com>
>> Subject: Re: [PATCH 00/11] XDP unaligned chunk placement support
>>
>> On Mon, 1 Jul 2019 15:44:29 +0100, Laatz, Kevin wrote:
>>> On 28/06/2019 21:29, Jonathan Lemon wrote:
>>>> On 28 Jun 2019, at 9:19, Laatz, Kevin wrote:
>>>>> On 27/06/2019 22:25, Jakub Kicinski wrote:
>>>>>> I think that's very limiting.  What is the challenge in 
>>>>>> providing
>>>>>> aligned addresses, exactly?
>>>>> The challenges are two-fold:
>>>>> 1) it prevents using arbitrary buffer sizes, which will be an 
>>>>> issue
>>>>> supporting e.g. jumbo frames in future.
>>>>> 2) higher level user-space frameworks which may want to use 
>>>>> AF_XDP,
>>>>> such as DPDK, do not currently support having buffers with 'fixed'
>>>>> alignment.
>>>>>     The reason that DPDK uses arbitrary placement is that:
>>>>>         - it would stop things working on certain NICs which 
>>>>> need
>>>>> the actual writable space specified in units of 1k - therefore we
>>>>> need 2k
>>>>> + metadata space.
>>>>>         - we place padding between buffers to avoid 
>>>>> constantly
>>>>> hitting the same memory channels when accessing memory.
>>>>>         - it allows the application to choose the actual 
>>>>> buffer
>>>>> size it wants to use.
>>>>>     We make use of the above to allow us to speed up processing
>>>>> significantly and also reduce the packet buffer memory size.
>>>>>
>>>>>     Not having arbitrary buffer alignment also means an AF_XDP
>>>>> driver for DPDK cannot be a drop-in replacement for existing
>>>>> drivers in those frameworks. Even with a new capability to allow 
>>>>> an
>>>>> arbitrary buffer alignment, existing apps will need to be modified
>>>>> to use that new capability.
>>>>
>>>> Since all buffers in the umem are the same chunk size, the original
>>>> buffer address can be recalculated with some multiply/shift math.
>>>> However, this is more expensive than just a mask operation.
>>>
>>> Yes, we can do this.
>>
>> That'd be best, can DPDK reasonably guarantee the slicing is uniform?
>> E.g. it's not disparate buffer pools with different bases?
>
> It's generally uniform, but handling the crossing of (huge)page 
> boundaries
> complicates things a bit. Therefore I think the final option below
> is best as it avoids any such problems.
>
>>
>>> Another option we have is to add a socket option for querying the
>>> metadata length from the driver (assuming it doesn't vary per 
>>> packet).
>>> We can use that information to get back to the original address 
>>> using
>>> subtraction.
>>
>> Unfortunately the metadata depends on the packet and how much info 
>> the
>> device was able to extract.  So it's variable length.
>>
>>> Alternatively, we can change the Rx descriptor format to include the
>>> metadata length. We could do this in a couple of ways, for example,
>>> rather than returning the address as the start of the packet, 
>>> instead
>>> return the buffer address that was passed in, and adding another
>>> 16-bit field to specify the start of packet offset with that buffer.
>>> If using another 16-bits of the descriptor space is not desirable, 
>>> an
>>> alternative could be to limit umem sizes to e.g. 2^48 bits (256
>>> terabytes should be enough, right :-) ) and use the remaining 16 
>>> bits
>>> of the address as a packet offset. Other variations on these 
>>> approach
>>> are obviously possible too.
>>
>> Seems reasonable to me..
>
> I think this is probably the best solution, and also has the advantage 
> that
> a buffer retains its base address the full way through the cycle of Rx 
> and Tx.

I like this as well - it also has the advantage that drivers can keep
performing adjustments on the handle, which ends up just modifying the
offset.
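
With that 48/16 encoding, an adjustment is just an add into the upper
bits, something like (a sketch, names made up):

    /* sketch: bump the start-of-packet offset without disturbing the
     * buffer base address in the low 48 bits. */
    static inline u64 xsk_adjust_offset(u64 addr, u64 delta)
    {
            return addr + (delta << 48);
    }
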
-- 
Jonathan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 04/11] i40e: add offset to zca_free
  2019-06-20  9:09 ` [PATCH 04/11] i40e: add offset to zca_free Kevin Laatz
@ 2019-06-24 14:32   ` Björn Töpel
  0 siblings, 0 replies; 28+ messages in thread
From: Björn Töpel @ 2019-06-24 14:32 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: Netdev, Alexei Starovoitov, Daniel Borkmann,
	Björn Töpel, Karlsson, Magnus, bpf, intel-wired-lan,
	Bruce Richardson, ciara.loftus

On Thu, 20 Jun 2019 at 19:25, Kevin Laatz <kevin.laatz@intel.com> wrote:
>
> This patch adds the offset param of the zero_copy_allocator to
> i40e_zca_free. This change is required to calculate the handle;
> otherwise, this function will not work in unaligned chunk mode since we
> can't easily mask back to the original handle.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>

Acked-by: Björn Töpel <bjorn.topel@intel.com>


> ---
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c | 8 ++++----
>  drivers/net/ethernet/intel/i40e/i40e_xsk.h | 3 ++-
>  2 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> index c89e692e8663..8c281f356293 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> @@ -438,16 +438,16 @@ static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring,
>   * @alloc: Zero-copy allocator
>   * @handle: Buffer handle
>   **/
> -void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
> +void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle,
> +               off_t off)
>  {
>         struct i40e_rx_buffer *bi;
>         struct i40e_ring *rx_ring;
> -       u64 hr, mask;
> +       u64 hr;
>         u16 nta;
>
>         rx_ring = container_of(alloc, struct i40e_ring, zca);
>         hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
> -       mask = rx_ring->xsk_umem->chunk_mask;
>
>         nta = rx_ring->next_to_alloc;
>         bi = &rx_ring->rx_bi[nta];
> @@ -455,7 +455,7 @@ void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
>         nta++;
>         rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
>
> -       handle &= mask;
> +       handle -= off;
>
>         bi->dma = xdp_umem_get_dma(rx_ring->xsk_umem, handle);
>         bi->dma += hr;
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.h b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
> index 8cc0a2e7d9a2..85691dc9ac42 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
> @@ -12,7 +12,8 @@ int i40e_queue_pair_disable(struct i40e_vsi *vsi, int queue_pair);
>  int i40e_queue_pair_enable(struct i40e_vsi *vsi, int queue_pair);
>  int i40e_xsk_umem_setup(struct i40e_vsi *vsi, struct xdp_umem *umem,
>                         u16 qid);
> -void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle);
> +void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle,
> +               off_t off);
>  bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 cleaned_count);
>  int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget);
>
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 04/11] i40e: add offset to zca_free
  2019-06-20  9:09 Kevin Laatz
@ 2019-06-20  9:09 ` Kevin Laatz
  2019-06-24 14:32   ` Björn Töpel
  0 siblings, 1 reply; 28+ messages in thread
From: Kevin Laatz @ 2019-06-20  9:09 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson
  Cc: bpf, intel-wired-lan, bruce.richardson, ciara.loftus, Kevin Laatz

This patch adds the offset param of the zero_copy_allocator to
i40e_zca_free. This change is required to calculate the handle;
otherwise, this function will not work in unaligned chunk mode since we
can't easily mask back to the original handle.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 8 ++++----
 drivers/net/ethernet/intel/i40e/i40e_xsk.h | 3 ++-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index c89e692e8663..8c281f356293 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -438,16 +438,16 @@ static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring,
  * @alloc: Zero-copy allocator
  * @handle: Buffer handle
  **/
-void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
+void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle,
+		off_t off)
 {
 	struct i40e_rx_buffer *bi;
 	struct i40e_ring *rx_ring;
-	u64 hr, mask;
+	u64 hr;
 	u16 nta;
 
 	rx_ring = container_of(alloc, struct i40e_ring, zca);
 	hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
-	mask = rx_ring->xsk_umem->chunk_mask;
 
 	nta = rx_ring->next_to_alloc;
 	bi = &rx_ring->rx_bi[nta];
@@ -455,7 +455,7 @@ void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
 	nta++;
 	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
-	handle &= mask;
+	handle -= off;
 
 	bi->dma = xdp_umem_get_dma(rx_ring->xsk_umem, handle);
 	bi->dma += hr;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.h b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
index 8cc0a2e7d9a2..85691dc9ac42 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
@@ -12,7 +12,8 @@ int i40e_queue_pair_disable(struct i40e_vsi *vsi, int queue_pair);
 int i40e_queue_pair_enable(struct i40e_vsi *vsi, int queue_pair);
 int i40e_xsk_umem_setup(struct i40e_vsi *vsi, struct xdp_umem *umem,
 			u16 qid);
-void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle);
+void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle,
+		off_t off);
 bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 cleaned_count);
 int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2019-07-02 16:33 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-20  8:39 [PATCH 00/11] XDP unaligned chunk placement support Kevin Laatz
2019-06-20  8:39 ` [PATCH 01/11] i40e: simplify Rx buffer recycle Kevin Laatz
2019-06-20  8:39 ` [PATCH 02/11] ixgbe: " Kevin Laatz
2019-06-20  8:39 ` [PATCH 03/11] xdp: add offset param to zero_copy_allocator Kevin Laatz
2019-06-20  8:39 ` [PATCH 04/11] i40e: add offset to zca_free Kevin Laatz
2019-06-20  8:39 ` [PATCH 05/11] ixgbe: " Kevin Laatz
2019-06-20  8:39 ` [PATCH 06/11] xsk: add support to allow unaligned chunk placement Kevin Laatz
2019-06-20  8:39 ` [PATCH 07/11] libbpf: add flags to umem config Kevin Laatz
2019-06-20  8:39 ` [PATCH 08/11] samples/bpf: add unaligned chunks mode support to xdpsock Kevin Laatz
2019-06-20  8:39 ` [PATCH 09/11] samples/bpf: add buffer recycling for unaligned chunks " Kevin Laatz
2019-06-20  8:39 ` [PATCH 10/11] samples/bpf: use hugepages in xdpsock app Kevin Laatz
2019-06-20  8:39 ` [PATCH 11/11] doc/af_xdp: include unaligned chunk case Kevin Laatz
2019-06-24 15:38 ` [PATCH 00/11] XDP unaligned chunk placement support Björn Töpel
2019-06-25 13:12   ` Laatz, Kevin
2019-06-25 18:44 ` Jonathan Lemon
2019-06-27 11:14   ` Laatz, Kevin
2019-06-27 21:25     ` Jakub Kicinski
2019-06-28 16:19       ` Laatz, Kevin
2019-06-28 16:51         ` Björn Töpel
2019-06-28 20:08           ` Jakub Kicinski
2019-06-28 20:25         ` Jakub Kicinski
2019-06-28 20:29         ` Jonathan Lemon
2019-07-01 14:58           ` Laatz, Kevin
     [not found]           ` <07e404eb-f712-b15a-4884-315aff3f7c7d@intel.com>
2019-07-01 21:20             ` Jakub Kicinski
2019-07-02  9:27               ` Richardson, Bruce
2019-07-02 16:33                 ` Jonathan Lemon
2019-06-20  9:09 Kevin Laatz
2019-06-20  9:09 ` [PATCH 04/11] i40e: add offset to zca_free Kevin Laatz
2019-06-24 14:32   ` Björn Töpel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).