* [PATCH 00/11] XDP unaligned chunk placement support
@ 2019-06-20  9:09 ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-06-20  9:09 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson
  Cc: bpf, intel-wired-lan, bruce.richardson, ciara.loftus, Kevin Laatz

This patchset adds the ability to use unaligned chunks in the XDP umem.

Currently, all chunk addresses passed to the umem are masked to be chunk
size aligned (default is 2k, max is PAGE_SIZE). This limits where we can
place chunks within the umem as well as the packet sizes that are
supported.

The changes in this patchset remove these restrictions, allowing XDP to be
more flexible about where it can place a chunk within a umem. By relaxing
the placement constraint, an application can use an arbitrary buffer size
and place it at any free address within the umem. These changes make it
possible to support jumbo frames and make it easier to integrate with other
existing frameworks that have their own memory management systems, such as
DPDK. A sketch of how an application opts in to the new mode is shown below.
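
For illustration only (not part of the series), a minimal sketch of opting
in through the raw XDP_UMEM_REG setsockopt. It assumes headers that carry
the new 'flags' field and XDP_UMEM_UNALIGNED_CHUNKS flag from patch 6;
error handling and ring setup are omitted:

  #include <sys/socket.h>
  #include <linux/if_xdp.h>      /* struct xdp_umem_reg, XDP_UMEM_REG */

  /* Register 'umem_area' (page aligned, 'size' bytes) on an AF_XDP socket
   * with unaligned chunk placement enabled.
   */
  static int register_unaligned_umem(int xsk_fd, void *umem_area, __u64 size)
  {
          struct xdp_umem_reg mr = {
                  .addr = (__u64)(unsigned long)umem_area,
                  .len = size,
                  .chunk_size = 2048,   /* buffer size chosen by the app */
                  .headroom = 0,
                  .flags = XDP_UMEM_UNALIGNED_CHUNKS,  /* new in patch 6 */
          };

          return setsockopt(xsk_fd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr));
  }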

Structure of the patchset:
Patch 1:
  - Remove unnecessary masking and headroom addition during zero-copy Rx
    buffer recycling in i40e. This change is required in order for the
    buffer recycling to work in the unaligned chunk mode.

Patch 2:
  - Remove unnecessary masking and headroom addition during
    zero-copy Rx buffer recycling in ixgbe. This change is required in
    order for the buffer recycling to work in the unaligned chunk mode.

Patch 3:
  - Adds an offset parameter to zero_copy_allocator. This change will
    enable us to calculate the original handle in zca_free. This will be
    required for unaligned chunk mode since we can't easily mask back to
    the original handle.

Patch 4:
  - Adds the offset parameter to i40e_zca_free. This change is needed for
    calculating the handle since we can't easily mask back to the original
    handle like we can in the aligned case.

Patch 5:
  - Adds the offset parameter to ixgbe_zca_free. This change is needed for
    calculating the handle since we can't easily mask back to the original
    handle like we can in the aligned case.


Patch 6:
  - Add infrastructure for unaligned chunks. Since we are dealing
    with unaligned chunks that could potentially cross a physical page
    boundary, we add checks to keep track of that information. We can
    later use this information to correctly handle buffers that are
    placed at an address where they cross a page boundary.

Patch 7:
  - Add flags for umem configuration to libbpf (a usage sketch follows this
    list)

Patch 8:
  - Modify xdpsock application to add a command line option for
    unaligned chunks

Patch 9:
  - Add a command line argument to pass in a desired buffer size, and add
    buffer recycling for unaligned mode. Passing in a buffer size allows
    the application to use unaligned chunks with the unaligned chunk mode.
    Since we are now using unaligned chunks, we need to recycle our buffers
    in a slightly different way.

Patch 10:
  - Adds hugepage support to the xdpsock application

Patch 11:
  - Documentation update to include the unaligned chunk scenario. We need
    to explicitly state that the incoming addresses are only masked in the
    aligned chunk mode and not the unaligned chunk mode.
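
As referenced under Patch 7 above, a hedged sketch of how an application
using libbpf might enable the new mode (the 'flags' member of struct
xsk_umem_config is assumed to be the field added by patch 7; ring setup,
buffer allocation and error handling are omitted):

  #include <bpf/xsk.h>           /* tools/lib/bpf/xsk.h */

  /* Create a umem over 'bufs' (page aligned, 'size' bytes, e.g. hugepage
   * backed as in patch 10) with unaligned chunk placement enabled.
   */
  static struct xsk_umem *create_unaligned_umem(void *bufs, __u64 size,
                                                struct xsk_ring_prod *fq,
                                                struct xsk_ring_cons *cq)
  {
          struct xsk_umem_config cfg = {
                  .fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
                  .comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
                  .frame_size = 2048,
                  .frame_headroom = 0,
                  .flags = XDP_UMEM_UNALIGNED_CHUNKS,  /* assumed, patch 7 */
          };
          struct xsk_umem *umem = NULL;

          if (xsk_umem__create(&umem, bufs, size, fq, cq, &cfg))
                  return NULL;

          return umem;
  }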

Kevin Laatz (11):
  i40e: simplify Rx buffer recycle
  ixgbe: simplify Rx buffer recycle
  xdp: add offset param to zero_copy_allocator
  i40e: add offset to zca_free
  ixgbe: add offset to zca_free
  xsk: add support to allow unaligned chunk placement
  libbpf: add flags to umem config
  samples/bpf: add unaligned chunks mode support to xdpsock
  samples/bpf: add buffer recycling for unaligned chunks to xdpsock
  samples/bpf: use hugepages in xdpsock app
  doc/af_xdp: include unaligned chunk case

 Documentation/networking/af_xdp.rst           | 10 +-
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 21 ++--
 drivers/net/ethernet/intel/i40e/i40e_xsk.h    |  3 +-
 .../ethernet/intel/ixgbe/ixgbe_txrx_common.h  |  3 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 21 ++--
 include/net/xdp.h                             |  3 +-
 include/net/xdp_sock.h                        |  2 +
 include/uapi/linux/if_xdp.h                   |  4 +
 net/core/xdp.c                                | 11 ++-
 net/xdp/xdp_umem.c                            | 17 ++--
 net/xdp/xsk.c                                 | 60 +++++++++--
 net/xdp/xsk_queue.h                           | 60 +++++++++--
 samples/bpf/xdpsock_user.c                    | 99 ++++++++++++++-----
 tools/include/uapi/linux/if_xdp.h             |  4 +
 tools/lib/bpf/xsk.c                           |  7 ++
 tools/lib/bpf/xsk.h                           |  2 +
 16 files changed, 241 insertions(+), 86 deletions(-)

-- 
2.17.1


* [PATCH 01/11] i40e: simplify Rx buffer recycle
  2019-06-20  9:09 ` [Intel-wired-lan] " Kevin Laatz
@ 2019-06-20  9:09   ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-06-20  9:09 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson
  Cc: bpf, intel-wired-lan, bruce.richardson, ciara.loftus, Kevin Laatz

Currently, the dma, addr and handle are modified when we reuse Rx buffers
in zero-copy mode. However, this is not required as the inputs to the
function are copies, not the original values themselves. Since we use the
copies within the function, we can use the original 'old_bi' values
directly, without having to mask and add the headroom.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 1b17486543ac..c89e692e8663 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -419,8 +419,6 @@ static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring,
 				    struct i40e_rx_buffer *old_bi)
 {
 	struct i40e_rx_buffer *new_bi = &rx_ring->rx_bi[rx_ring->next_to_alloc];
-	unsigned long mask = (unsigned long)rx_ring->xsk_umem->chunk_mask;
-	u64 hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
 	u16 nta = rx_ring->next_to_alloc;
 
 	/* update, and store next to alloc */
@@ -428,14 +426,9 @@ static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring,
 	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
 	/* transfer page from old buffer to new buffer */
-	new_bi->dma = old_bi->dma & mask;
-	new_bi->dma += hr;
-
-	new_bi->addr = (void *)((unsigned long)old_bi->addr & mask);
-	new_bi->addr += hr;
-
-	new_bi->handle = old_bi->handle & mask;
-	new_bi->handle += rx_ring->xsk_umem->headroom;
+	new_bi->dma = old_bi->dma;
+	new_bi->addr = old_bi->addr;
+	new_bi->handle = old_bi->handle;
 
 	old_bi->addr = NULL;
 }
-- 
2.17.1


* [PATCH 02/11] ixgbe: simplify Rx buffer recycle
  2019-06-20  9:09 ` [Intel-wired-lan] " Kevin Laatz
@ 2019-06-20  9:09   ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-06-20  9:09 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson
  Cc: bpf, intel-wired-lan, bruce.richardson, ciara.loftus, Kevin Laatz

Currently, the dma, addr and handle are modified when we reuse Rx buffers
in zero-copy mode. However, this is not required as the inputs to the
function are copies, not the original values themselves. Since we use the
copies within the function, we can use the original 'obi' values
directly, without having to mask and add the headroom.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index bfe95ce0bd7f..49536adafe8e 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -251,8 +251,6 @@ ixgbe_rx_buffer *ixgbe_get_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 static void ixgbe_reuse_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 				     struct ixgbe_rx_buffer *obi)
 {
-	unsigned long mask = (unsigned long)rx_ring->xsk_umem->chunk_mask;
-	u64 hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
 	u16 nta = rx_ring->next_to_alloc;
 	struct ixgbe_rx_buffer *nbi;
 
@@ -262,14 +260,9 @@ static void ixgbe_reuse_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
 	/* transfer page from old buffer to new buffer */
-	nbi->dma = obi->dma & mask;
-	nbi->dma += hr;
-
-	nbi->addr = (void *)((unsigned long)obi->addr & mask);
-	nbi->addr += hr;
-
-	nbi->handle = obi->handle & mask;
-	nbi->handle += rx_ring->xsk_umem->headroom;
+	nbi->dma = obi->dma;
+	nbi->addr = obi->addr;
+	nbi->handle = obi->handle;
 
 	obi->addr = NULL;
 	obi->skb = NULL;
-- 
2.17.1


* [PATCH 03/11] xdp: add offset param to zero_copy_allocator
  2019-06-20  9:09 ` [Intel-wired-lan] " Kevin Laatz
@ 2019-06-20  9:09   ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-06-20  9:09 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson
  Cc: bpf, intel-wired-lan, bruce.richardson, ciara.loftus, Kevin Laatz

This patch adds an offset parameter to the zero_copy_allocator free
callback.

This change is required for the unaligned chunk mode which comes later in
this patch set. The offset parameter is needed to calculate the original
handle in unaligned mode, since we can no longer simply mask back to it as
we can in the aligned case.
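
To make the "mask vs offset" reasoning concrete, a small sketch (not part
of the patch) of the two ways a driver can recover the chunk start from a
handle:

  /* Aligned mode: chunks sit on chunk_size boundaries, so masking the low
   * bits of any address inside the chunk gives its start.
   */
  static u64 original_handle_aligned(u64 handle, u64 chunk_size)
  {
          return handle & ~(chunk_size - 1);
  }

  /* Unaligned mode: a chunk can start anywhere, so masking no longer
   * works. Instead, subtract the offset of the data within its buffer
   * (xdp->data - xdp->data_hard_start), which __xdp_return() now passes
   * through to the driver's zca free callback.
   */
  static u64 original_handle_unaligned(u64 handle, off_t off)
  {
          return handle - off;
  }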

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 include/net/xdp.h |  3 ++-
 net/core/xdp.c    | 11 ++++++-----
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 0f25b3675c5c..ea801fd2bf98 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -53,7 +53,8 @@ struct xdp_mem_info {
 struct page_pool;
 
 struct zero_copy_allocator {
-	void (*free)(struct zero_copy_allocator *zca, unsigned long handle);
+	void (*free)(struct zero_copy_allocator *zca, unsigned long handle,
+			off_t off);
 };
 
 struct xdp_rxq_info {
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 4b2b194f4f1f..a77a7162d213 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -322,7 +322,7 @@ EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model);
  * of xdp_frames/pages in those cases.
  */
 static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
-			 unsigned long handle)
+			 unsigned long handle, off_t off)
 {
 	struct xdp_mem_allocator *xa;
 	struct page *page;
@@ -353,7 +353,7 @@ static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
 		rcu_read_lock();
 		/* mem->id is valid, checked in xdp_rxq_info_reg_mem_model() */
 		xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
-		xa->zc_alloc->free(xa->zc_alloc, handle);
+		xa->zc_alloc->free(xa->zc_alloc, handle, off);
 		rcu_read_unlock();
 	default:
 		/* Not possible, checked in xdp_rxq_info_reg_mem_model() */
@@ -363,19 +363,20 @@ static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
 
 void xdp_return_frame(struct xdp_frame *xdpf)
 {
-	__xdp_return(xdpf->data, &xdpf->mem, false, 0);
+	__xdp_return(xdpf->data, &xdpf->mem, false, 0, 0);
 }
 EXPORT_SYMBOL_GPL(xdp_return_frame);
 
 void xdp_return_frame_rx_napi(struct xdp_frame *xdpf)
 {
-	__xdp_return(xdpf->data, &xdpf->mem, true, 0);
+	__xdp_return(xdpf->data, &xdpf->mem, true, 0, 0);
 }
 EXPORT_SYMBOL_GPL(xdp_return_frame_rx_napi);
 
 void xdp_return_buff(struct xdp_buff *xdp)
 {
-	__xdp_return(xdp->data, &xdp->rxq->mem, true, xdp->handle);
+	__xdp_return(xdp->data, &xdp->rxq->mem, true, xdp->handle,
+			xdp->data - xdp->data_hard_start);
 }
 EXPORT_SYMBOL_GPL(xdp_return_buff);
 
-- 
2.17.1


* [PATCH 04/11] i40e: add offset to zca_free
  2019-06-20  9:09 ` [Intel-wired-lan] " Kevin Laatz
@ 2019-06-20  9:09   ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-06-20  9:09 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson
  Cc: bpf, intel-wired-lan, bruce.richardson, ciara.loftus, Kevin Laatz

This patch adds the new zero_copy_allocator offset parameter to
i40e_zca_free. The offset is needed to calculate the handle in unaligned
chunk mode, where we cannot simply mask back to the original handle as we
can in the aligned case.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 8 ++++----
 drivers/net/ethernet/intel/i40e/i40e_xsk.h | 3 ++-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index c89e692e8663..8c281f356293 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -438,16 +438,16 @@ static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring,
  * @alloc: Zero-copy allocator
  * @handle: Buffer handle
  **/
-void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
+void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle,
+		off_t off)
 {
 	struct i40e_rx_buffer *bi;
 	struct i40e_ring *rx_ring;
-	u64 hr, mask;
+	u64 hr;
 	u16 nta;
 
 	rx_ring = container_of(alloc, struct i40e_ring, zca);
 	hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
-	mask = rx_ring->xsk_umem->chunk_mask;
 
 	nta = rx_ring->next_to_alloc;
 	bi = &rx_ring->rx_bi[nta];
@@ -455,7 +455,7 @@ void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
 	nta++;
 	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
-	handle &= mask;
+	handle -= off;
 
 	bi->dma = xdp_umem_get_dma(rx_ring->xsk_umem, handle);
 	bi->dma += hr;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.h b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
index 8cc0a2e7d9a2..85691dc9ac42 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
@@ -12,7 +12,8 @@ int i40e_queue_pair_disable(struct i40e_vsi *vsi, int queue_pair);
 int i40e_queue_pair_enable(struct i40e_vsi *vsi, int queue_pair);
 int i40e_xsk_umem_setup(struct i40e_vsi *vsi, struct xdp_umem *umem,
 			u16 qid);
-void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle);
+void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle,
+		off_t off);
 bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 cleaned_count);
 int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget);
 
-- 
2.17.1


* [PATCH 05/11] ixgbe: add offset to zca_free
  2019-06-20  9:09 ` [Intel-wired-lan] " Kevin Laatz
@ 2019-06-20  9:09   ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-06-20  9:09 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson
  Cc: bpf, intel-wired-lan, bruce.richardson, ciara.loftus, Kevin Laatz

This patch adds the new zero_copy_allocator offset parameter to
ixgbe_zca_free. The offset is needed to calculate the handle in unaligned
chunk mode, where we cannot simply mask back to the original handle as we
can in the aligned case.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h | 3 ++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c         | 8 ++++----
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
index d93a690aff74..49702e2a4360 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
@@ -33,7 +33,8 @@ struct xdp_umem *ixgbe_xsk_umem(struct ixgbe_adapter *adapter,
 int ixgbe_xsk_umem_setup(struct ixgbe_adapter *adapter, struct xdp_umem *umem,
 			 u16 qid);
 
-void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle);
+void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle,
+		off_t off);
 
 void ixgbe_alloc_rx_buffers_zc(struct ixgbe_ring *rx_ring, u16 cleaned_count);
 int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index 49536adafe8e..1ec02077ccb2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -268,16 +268,16 @@ static void ixgbe_reuse_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 	obi->skb = NULL;
 }
 
-void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
+void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle,
+		off_t off)
 {
 	struct ixgbe_rx_buffer *bi;
 	struct ixgbe_ring *rx_ring;
-	u64 hr, mask;
+	u64 hr;
 	u16 nta;
 
 	rx_ring = container_of(alloc, struct ixgbe_ring, zca);
 	hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
-	mask = rx_ring->xsk_umem->chunk_mask;
 
 	nta = rx_ring->next_to_alloc;
 	bi = rx_ring->rx_buffer_info;
@@ -285,7 +285,7 @@ void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
 	nta++;
 	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
-	handle &= mask;
+	handle -= off;
 
 	bi->dma = xdp_umem_get_dma(rx_ring->xsk_umem, handle);
 	bi->dma += hr;
-- 
2.17.1


* [PATCH 06/11] xsk: add support to allow unaligned chunk placement
  2019-06-20  9:09 ` [Intel-wired-lan] " Kevin Laatz
@ 2019-06-20  9:09   ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-06-20  9:09 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson
  Cc: bpf, intel-wired-lan, bruce.richardson, ciara.loftus, Kevin Laatz

Currently, addresses are chunk size aligned. This means we are very
restricted in terms of where we can place a chunk within the umem. For
example, with a chunk size of 2k, chunks can only be placed at 0, 2k, 4k,
6k, 8k and so on (i.e. every 2k starting from 0).

This patch introduces the ability to use unaligned chunks. With these
changes, we are no longer bound to placing chunks at a fixed chunk-size
interval. Since we are no longer dealing with aligned chunks, they can now
cross page boundaries. Checks for page contiguity have been added in order
to keep track of which pages are followed by a physically contiguous page.
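
A worked example of the new check (illustrative numbers only, restating the
xskq_crosses_non_contig_pg() logic added below): with PAGE_SIZE = 4096 and
chunk_size_nohr = 2048, consider a chunk placed at umem offset addr = 3072:

  bool cross_pg = (addr & (PAGE_SIZE - 1)) + umem->chunk_size_nohr > PAGE_SIZE;
  /* (3072 & 4095) + 2048 = 5120 > 4096, so the chunk spills into the next
   * page. It is only treated as valid if that next page is physically
   * contiguous (tracked via the next_pg_contig flags set up at bind time);
   * otherwise the address is counted as an invalid descriptor and skipped.
   */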

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
---
 include/net/xdp_sock.h      |  2 ++
 include/uapi/linux/if_xdp.h |  4 +++
 net/xdp/xdp_umem.c          | 17 +++++++----
 net/xdp/xsk.c               | 60 ++++++++++++++++++++++++++++++++-----
 net/xdp/xsk_queue.h         | 60 ++++++++++++++++++++++++++++++++-----
 5 files changed, 121 insertions(+), 22 deletions(-)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index ae0f368a62bb..c07977e271c4 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -19,6 +19,7 @@ struct xsk_queue;
 struct xdp_umem_page {
 	void *addr;
 	dma_addr_t dma;
+	bool next_pg_contig;
 };
 
 struct xdp_umem_fq_reuse {
@@ -48,6 +49,7 @@ struct xdp_umem {
 	bool zc;
 	spinlock_t xsk_list_lock;
 	struct list_head xsk_list;
+	u32 flags;
 };
 
 struct xdp_sock {
diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index caed8b1614ff..8548f2110a77 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -17,6 +17,9 @@
 #define XDP_COPY	(1 << 1) /* Force copy-mode */
 #define XDP_ZEROCOPY	(1 << 2) /* Force zero-copy mode */
 
+/* Flags for xsk_umem_config flags */
+#define XDP_UMEM_UNALIGNED_CHUNKS (1 << 0)
+
 struct sockaddr_xdp {
 	__u16 sxdp_family;
 	__u16 sxdp_flags;
@@ -52,6 +55,7 @@ struct xdp_umem_reg {
 	__u64 len; /* Length of packet data area */
 	__u32 chunk_size;
 	__u32 headroom;
+	__u32 flags;
 };
 
 struct xdp_statistics {
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 2b18223e7eb8..4d4782572855 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -301,12 +301,14 @@ static int xdp_umem_account_pages(struct xdp_umem *umem)
 
 static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 {
+	bool unaligned_chunks = mr->flags & XDP_UMEM_UNALIGNED_CHUNKS;
 	u32 chunk_size = mr->chunk_size, headroom = mr->headroom;
 	unsigned int chunks, chunks_per_page;
 	u64 addr = mr->addr, size = mr->len;
 	int size_chk, err, i;
 
-	if (chunk_size < XDP_UMEM_MIN_CHUNK_SIZE || chunk_size > PAGE_SIZE) {
+	if ((!unaligned_chunks && chunk_size < XDP_UMEM_MIN_CHUNK_SIZE) ||
+			chunk_size > PAGE_SIZE) {
 		/* Strictly speaking we could support this, if:
 		 * - huge pages, or*
 		 * - using an IOMMU, or
@@ -316,7 +318,7 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 		return -EINVAL;
 	}
 
-	if (!is_power_of_2(chunk_size))
+	if (!unaligned_chunks && !is_power_of_2(chunk_size))
 		return -EINVAL;
 
 	if (!PAGE_ALIGNED(addr)) {
@@ -333,9 +335,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 	if (chunks == 0)
 		return -EINVAL;
 
-	chunks_per_page = PAGE_SIZE / chunk_size;
-	if (chunks < chunks_per_page || chunks % chunks_per_page)
-		return -EINVAL;
+	if (!unaligned_chunks) {
+		chunks_per_page = PAGE_SIZE / chunk_size;
+		if (chunks < chunks_per_page || chunks % chunks_per_page)
+			return -EINVAL;
+	}
 
 	headroom = ALIGN(headroom, 64);
 
@@ -344,13 +348,14 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 		return -EINVAL;
 
 	umem->address = (unsigned long)addr;
-	umem->chunk_mask = ~((u64)chunk_size - 1);
+	umem->chunk_mask = unaligned_chunks ? U64_MAX : ~((u64)chunk_size - 1);
 	umem->size = size;
 	umem->headroom = headroom;
 	umem->chunk_size_nohr = chunk_size - headroom;
 	umem->npgs = size / PAGE_SIZE;
 	umem->pgs = NULL;
 	umem->user = NULL;
+	umem->flags = mr->flags;
 	INIT_LIST_HEAD(&umem->xsk_list);
 	spin_lock_init(&umem->xsk_list_lock);
 
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index a14e8864e4fa..dfae19942b70 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -39,7 +39,7 @@ bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs)
 
 u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
 {
-	return xskq_peek_addr(umem->fq, addr);
+	return xskq_peek_addr(umem->fq, addr, umem);
 }
 EXPORT_SYMBOL(xsk_umem_peek_addr);
 
@@ -49,14 +49,36 @@ void xsk_umem_discard_addr(struct xdp_umem *umem)
 }
 EXPORT_SYMBOL(xsk_umem_discard_addr);
 
+/* If a buffer crosses a page boundary, we need to do 2 memcpy's, one for
+ * each page. This is only required in copy mode.
+ */
+static void __xsk_rcv_memcpy(struct xdp_umem *umem, u64 addr, void *from_buf,
+		u32 len, u32 metalen)
+{
+	void *to_buf = xdp_umem_get_data(umem, addr);
+
+	if (xskq_crosses_non_contig_pg(umem, addr)) {
+		void *next_pg_addr = umem->pages[(addr >> PAGE_SHIFT)+1].addr;
+		u64 page_start = addr & (PAGE_SIZE-1);
+		u64 first_len = PAGE_SIZE - (addr - page_start);
+
+		memcpy(to_buf, from_buf, first_len + metalen);
+		memcpy(next_pg_addr, from_buf + first_len, len - first_len);
+
+		return;
+	}
+
+	memcpy(to_buf, from_buf, len + metalen);
+}
+
 static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 {
-	void *to_buf, *from_buf;
+	void *from_buf;
 	u32 metalen;
 	u64 addr;
 	int err;
 
-	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
+	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
 	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
 		xs->rx_dropped++;
 		return -ENOSPC;
@@ -72,8 +94,8 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 		metalen = xdp->data - xdp->data_meta;
 	}
 
-	to_buf = xdp_umem_get_data(xs->umem, addr);
-	memcpy(to_buf, from_buf, len + metalen);
+	__xsk_rcv_memcpy(xs->umem, addr, from_buf, len, metalen);
+
 	addr += metalen;
 	err = xskq_produce_batch_desc(xs->rx, addr, len);
 	if (!err) {
@@ -126,7 +148,7 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 	if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
 		return -EINVAL;
 
-	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
+	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
 	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
 		xs->rx_dropped++;
 		return -ENOSPC;
@@ -173,7 +195,7 @@ bool xsk_umem_consume_tx(struct xdp_umem *umem, dma_addr_t *dma, u32 *len)
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(xs, &umem->xsk_list, list) {
-		if (!xskq_peek_desc(xs->tx, &desc))
+		if (!xskq_peek_desc(xs->tx, &desc, umem))
 			continue;
 
 		if (xskq_produce_addr_lazy(umem->cq, desc.addr))
@@ -226,7 +248,7 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
 
 	mutex_lock(&xs->mutex);
 
-	while (xskq_peek_desc(xs->tx, &desc)) {
+	while (xskq_peek_desc(xs->tx, &desc, xs->umem)) {
 		char *buffer;
 		u64 addr;
 		u32 len;
@@ -393,6 +415,26 @@ static struct socket *xsk_lookup_xsk_from_fd(int fd)
 	return sock;
 }
 
+/* Check if umem pages are contiguous.
+ * If zero-copy mode, use the DMA address to do the page contiguity check
+ * For all other modes we use addr (kernel virtual address)
+ */
+static void xsk_check_page_contiguity(struct xdp_umem *umem, u32 flags)
+{
+	int i;
+
+	if (flags & XDP_ZEROCOPY) {
+		for (i = 0; i < umem->npgs-1; i++)
+			umem->pages[i].next_pg_contig =
+					(umem->pages[i].dma + PAGE_SIZE == umem->pages[i+1].dma);
+		return;
+	}
+
+	for (i = 0; i < umem->npgs-1; i++)
+		umem->pages[i].next_pg_contig =
+				(umem->pages[i].addr + PAGE_SIZE == umem->pages[i+1].addr);
+}
+
 static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 {
 	struct sockaddr_xdp *sxdp = (struct sockaddr_xdp *)addr;
@@ -480,6 +522,8 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 		err = xdp_umem_assign_dev(xs->umem, dev, qid, flags);
 		if (err)
 			goto out_unlock;
+
+		xsk_check_page_contiguity(xs->umem, flags);
 	}
 
 	xs->dev = dev;
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 88b9ae24658d..c2676a1df938 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -119,6 +119,14 @@ static inline u32 xskq_nb_free(struct xsk_queue *q, u32 producer, u32 dcnt)
 
 /* UMEM queue */
 
+static inline bool xskq_crosses_non_contig_pg(struct xdp_umem *umem, u64 addr)
+{
+	bool cross_pg = (addr & (PAGE_SIZE-1)) + umem->chunk_size_nohr > PAGE_SIZE;
+	bool next_pg_contig = umem->pages[(addr >> PAGE_SHIFT)+1].next_pg_contig;
+
+	return cross_pg && !next_pg_contig;
+}
+
 static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
 {
 	if (addr >= q->size) {
@@ -129,23 +137,44 @@ static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
 	return true;
 }
 
-static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr)
+static inline bool xskq_is_valid_addr_unaligned(struct xsk_queue *q, u64 addr,
+		struct xdp_umem *umem)
+{
+	if (addr >= q->size ||
+			xskq_crosses_non_contig_pg(umem, addr)) {
+		q->invalid_descs++;
+		return false;
+	}
+
+	return true;
+}
+
+static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr,
+		struct xdp_umem *umem)
 {
 	while (q->cons_tail != q->cons_head) {
 		struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
 		unsigned int idx = q->cons_tail & q->ring_mask;
 
 		*addr = READ_ONCE(ring->desc[idx]) & q->chunk_mask;
+		if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS) {
+			if (xskq_is_valid_addr_unaligned(q, *addr, umem))
+				return addr;
+			goto out;
+		}
+
 		if (xskq_is_valid_addr(q, *addr))
 			return addr;
 
+out:
 		q->cons_tail++;
 	}
 
 	return NULL;
 }
 
-static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
+static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr,
+		struct xdp_umem *umem)
 {
 	if (q->cons_tail == q->cons_head) {
 		smp_mb(); /* D, matches A */
@@ -156,7 +185,7 @@ static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
 		smp_rmb();
 	}
 
-	return xskq_validate_addr(q, addr);
+	return xskq_validate_addr(q, addr, umem);
 }
 
 static inline void xskq_discard_addr(struct xsk_queue *q)
@@ -215,8 +244,21 @@ static inline int xskq_reserve_addr(struct xsk_queue *q)
 
 /* Rx/Tx queue */
 
-static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
+static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d,
+						   struct xdp_umem *umem)
 {
+	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS) {
+		if (!xskq_is_valid_addr_unaligned(q, d->addr, umem))
+			return false;
+
+		if ((d->len > umem->chunk_size_nohr) || d->options) {
+			q->invalid_descs++;
+			return false;
+		}
+
+		return true;
+	}
+
 	if (!xskq_is_valid_addr(q, d->addr))
 		return false;
 
@@ -230,14 +272,15 @@ static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
 }
 
 static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
-						  struct xdp_desc *desc)
+						  struct xdp_desc *desc,
+						  struct xdp_umem *umem)
 {
 	while (q->cons_tail != q->cons_head) {
 		struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
 		unsigned int idx = q->cons_tail & q->ring_mask;
 
 		*desc = READ_ONCE(ring->desc[idx]);
-		if (xskq_is_valid_desc(q, desc))
+		if (xskq_is_valid_desc(q, desc, umem))
 			return desc;
 
 		q->cons_tail++;
@@ -247,7 +290,8 @@ static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
 }
 
 static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
-					      struct xdp_desc *desc)
+					      struct xdp_desc *desc,
+					      struct xdp_umem *umem)
 {
 	if (q->cons_tail == q->cons_head) {
 		smp_mb(); /* D, matches A */
@@ -258,7 +302,7 @@ static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
 		smp_rmb(); /* C, matches B */
 	}
 
-	return xskq_validate_desc(q, desc);
+	return xskq_validate_desc(q, desc, umem);
 }
 
 static inline void xskq_discard_desc(struct xsk_queue *q)
-- 
2.17.1


* [PATCH 07/11] libbpf: add flags to umem config
  2019-06-20  9:09 ` [Intel-wired-lan] " Kevin Laatz
@ 2019-06-20  9:09   ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-06-20  9:09 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson
  Cc: bpf, intel-wired-lan, bruce.richardson, ciara.loftus, Kevin Laatz

This patch adds a 'flags' field to the umem_config and umem_reg structs.
This will allow for more options to be added for configuring umems.

The first use for the flags field is to add a flag for unaligned chunks
mode. These flags can either be user-provided or filled with a default.
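
As a rough illustration (not part of this patch), an application built
against this change could request unaligned chunk mode along the lines
below; umem_area, umem_size, fq and cq are placeholders for objects the
application sets up itself:

    struct xsk_umem_config cfg = {
        .fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
        .comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
        .frame_size = 4096,
        .frame_headroom = 0,    /* must stay 0 with unaligned chunks */
        .flags = XDP_UMEM_UNALIGNED_CHUNKS,
    };
    struct xsk_umem *umem = NULL;
    int err;

    /* umem_area, umem_size, fq and cq are application-provided */
    err = xsk_umem__create(&umem, umem_area, umem_size, &fq, &cq, &cfg);
    /* err < 0 on failure, e.g. -EINVAL if headroom is non-zero here */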

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
---
 tools/include/uapi/linux/if_xdp.h | 4 ++++
 tools/lib/bpf/xsk.c               | 7 +++++++
 tools/lib/bpf/xsk.h               | 2 ++
 3 files changed, 13 insertions(+)

diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h
index caed8b1614ff..8548f2110a77 100644
--- a/tools/include/uapi/linux/if_xdp.h
+++ b/tools/include/uapi/linux/if_xdp.h
@@ -17,6 +17,9 @@
 #define XDP_COPY	(1 << 1) /* Force copy-mode */
 #define XDP_ZEROCOPY	(1 << 2) /* Force zero-copy mode */
 
+/* Flags for xsk_umem_config flags */
+#define XDP_UMEM_UNALIGNED_CHUNKS (1 << 0)
+
 struct sockaddr_xdp {
 	__u16 sxdp_family;
 	__u16 sxdp_flags;
@@ -52,6 +55,7 @@ struct xdp_umem_reg {
 	__u64 len; /* Length of packet data area */
 	__u32 chunk_size;
 	__u32 headroom;
+	__u32 flags;
 };
 
 struct xdp_statistics {
diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index 7ef6293b4fd7..df4207d4ff4a 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -115,6 +115,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg,
 		cfg->comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
 		cfg->frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
 		cfg->frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM;
+		cfg->flags = XSK_UMEM__DEFAULT_FLAGS;
 		return;
 	}
 
@@ -122,6 +123,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg,
 	cfg->comp_size = usr_cfg->comp_size;
 	cfg->frame_size = usr_cfg->frame_size;
 	cfg->frame_headroom = usr_cfg->frame_headroom;
+	cfg->flags = usr_cfg->flags;
 }
 
 static int xsk_set_xdp_socket_config(struct xsk_socket_config *cfg,
@@ -181,6 +183,11 @@ int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
 	mr.len = size;
 	mr.chunk_size = umem->config.frame_size;
 	mr.headroom = umem->config.frame_headroom;
+	mr.flags = umem->config.flags;
+
+	/* Headroom must be 0 for unaligned chunks */
+	if ((mr.flags & XDP_UMEM_UNALIGNED_CHUNKS) && mr.headroom != 0)
+		return -EINVAL;
 
 	err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr));
 	if (err) {
diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
index 82ea71a0f3ec..8d393873b70f 100644
--- a/tools/lib/bpf/xsk.h
+++ b/tools/lib/bpf/xsk.h
@@ -170,12 +170,14 @@ LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
 #define XSK_UMEM__DEFAULT_FRAME_SHIFT    11 /* 2048 bytes */
 #define XSK_UMEM__DEFAULT_FRAME_SIZE     (1 << XSK_UMEM__DEFAULT_FRAME_SHIFT)
 #define XSK_UMEM__DEFAULT_FRAME_HEADROOM 0
+#define XSK_UMEM__DEFAULT_FLAGS 0
 
 struct xsk_umem_config {
 	__u32 fill_size;
 	__u32 comp_size;
 	__u32 frame_size;
 	__u32 frame_headroom;
+	__u32 flags;
 };
 
 /* Flags for the libbpf_flags field. */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH 08/11] samples/bpf: add unaligned chunks mode support to xdpsock
  2019-06-20  9:09 ` [Intel-wired-lan] " Kevin Laatz
@ 2019-06-20  9:09   ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-06-20  9:09 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson
  Cc: bpf, intel-wired-lan, bruce.richardson, ciara.loftus, Kevin Laatz

This patch adds support for the unaligned chunks mode. The addition of the
unaligned chunks option will allow users to run the application with more
relaxed chunk placement in the XDP umem.

Unaligned chunks mode can be used with the '-u' or '--unaligned' command
line options.
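
For example, assuming an interface named eth0 and queue 0, rxdrop can be
run in unaligned chunk mode with:

    ./xdpsock -i eth0 -q 0 -r -u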

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
---
 samples/bpf/xdpsock_user.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index d08ee1ab7bb4..e26f43382d01 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -67,6 +67,8 @@ static int opt_ifindex;
 static int opt_queue;
 static int opt_poll;
 static int opt_interval = 1;
+static u32 opt_umem_flags;
+static int opt_unaligned_chunks;
 static u32 opt_xdp_bind_flags;
 static __u32 prog_id;
 
@@ -276,14 +278,21 @@ static size_t gen_eth_frame(struct xsk_umem_info *umem, u64 addr)
 static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
 {
 	struct xsk_umem_info *umem;
+	struct xsk_umem_config umem_cfg;
 	int ret;
 
 	umem = calloc(1, sizeof(*umem));
 	if (!umem)
 		exit_with_error(errno);
 
+	umem_cfg.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS;
+	umem_cfg.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
+	umem_cfg.frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
+	umem_cfg.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM;
+	umem_cfg.flags = opt_umem_flags;
+
 	ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
-			       NULL);
+			       &umem_cfg);
 	if (ret)
 		exit_with_error(-ret);
 
@@ -346,6 +355,7 @@ static struct option long_options[] = {
 	{"interval", required_argument, 0, 'n'},
 	{"zero-copy", no_argument, 0, 'z'},
 	{"copy", no_argument, 0, 'c'},
+	{"unaligned", no_argument, 0, 'u'},
 	{0, 0, 0, 0}
 };
 
@@ -365,6 +375,7 @@ static void usage(const char *prog)
 		"  -n, --interval=n	Specify statistics update interval (default 1 sec).\n"
 		"  -z, --zero-copy      Force zero-copy mode.\n"
 		"  -c, --copy           Force copy mode.\n"
+		"  -u, --unaligned	Enable unaligned chunk placement\n"
 		"\n";
 	fprintf(stderr, str, prog);
 	exit(EXIT_FAILURE);
@@ -377,7 +388,7 @@ static void parse_command_line(int argc, char **argv)
 	opterr = 0;
 
 	for (;;) {
-		c = getopt_long(argc, argv, "Frtli:q:psSNn:cz", long_options,
+		c = getopt_long(argc, argv, "Frtli:q:psSNn:czu", long_options,
 				&option_index);
 		if (c == -1)
 			break;
@@ -417,9 +428,14 @@ static void parse_command_line(int argc, char **argv)
 		case 'c':
 			opt_xdp_bind_flags |= XDP_COPY;
 			break;
+		case 'u':
+			opt_umem_flags |= XDP_UMEM_UNALIGNED_CHUNKS;
+			opt_unaligned_chunks = 1;
+			break;
 		case 'F':
 			opt_xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
 			break;
+
 		default:
 			usage(basename(argv[0]));
 		}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH 09/11] samples/bpf: add buffer recycling for unaligned chunks to xdpsock
  2019-06-20  9:09 ` [Intel-wired-lan] " Kevin Laatz
@ 2019-06-20  9:09   ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-06-20  9:09 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson
  Cc: bpf, intel-wired-lan, bruce.richardson, ciara.loftus, Kevin Laatz

This patch adds buffer recycling support for unaligned buffers. Since we
don't mask the addr to 2k at umem_reg in unaligned mode, we need to make
sure we give back the correct, original addr to the fill queue. To do this,
we need to mask the addr with the buffer size.

To pass in a buffer size, use the --buf-size=n argument.
NOTE: For xdpsock to work in aligned chunk mode, you still need to pass a
'power of 2' buffer size.
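
As a purely illustrative example, with --buf-size=4096 a completion ring
address of 0x11a40 is handed back to the fill queue as its chunk start,
0x11000:

    u64 buf_size = 4096;                         /* --buf-size value */
    u64 comp_addr = 0x11a40;                     /* example completion addr */
    u64 orig_addr = comp_addr & ~(buf_size - 1); /* 0x11000, back to fill queue */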

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 samples/bpf/xdpsock_user.c | 71 +++++++++++++++++++++++++++-----------
 1 file changed, 51 insertions(+), 20 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index e26f43382d01..7b4ce047deb2 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -60,6 +60,10 @@ enum benchmark_type {
 	BENCH_L2FWD = 2,
 };
 
+#define LENGTH (256UL*1024*1024)
+#define ADDR (void *)(0x0UL)
+#define SHMAT_FLAGS (0)
+
 static enum benchmark_type opt_bench = BENCH_RXDROP;
 static u32 opt_xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
 static const char *opt_if = "";
@@ -67,6 +71,7 @@ static int opt_ifindex;
 static int opt_queue;
 static int opt_poll;
 static int opt_interval = 1;
+static u64 opt_buffer_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
 static u32 opt_umem_flags;
 static int opt_unaligned_chunks;
 static u32 opt_xdp_bind_flags;
@@ -287,7 +292,7 @@ static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
 
 	umem_cfg.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS;
 	umem_cfg.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
-	umem_cfg.frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
+	umem_cfg.frame_size = opt_buffer_size;
 	umem_cfg.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM;
 	umem_cfg.flags = opt_umem_flags;
 
@@ -334,8 +339,8 @@ static struct xsk_socket_info *xsk_configure_socket(struct xsk_umem_info *umem)
 		exit_with_error(-ret);
 	for (i = 0;
 	     i < XSK_RING_PROD__DEFAULT_NUM_DESCS *
-		     XSK_UMEM__DEFAULT_FRAME_SIZE;
-	     i += XSK_UMEM__DEFAULT_FRAME_SIZE)
+		     opt_buffer_size;
+	     i += opt_buffer_size)
 		*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx++) = i;
 	xsk_ring_prod__submit(&xsk->umem->fq,
 			      XSK_RING_PROD__DEFAULT_NUM_DESCS);
@@ -356,6 +361,7 @@ static struct option long_options[] = {
 	{"zero-copy", no_argument, 0, 'z'},
 	{"copy", no_argument, 0, 'c'},
 	{"unaligned", no_argument, 0, 'u'},
+	{"buf-size", required_argument, 0, 'b'},
 	{0, 0, 0, 0}
 };
 
@@ -376,6 +382,7 @@ static void usage(const char *prog)
 		"  -z, --zero-copy      Force zero-copy mode.\n"
 		"  -c, --copy           Force copy mode.\n"
 		"  -u, --unaligned	Enable unaligned chunk placement\n"
+		"  -b, --buf-size=n	Specify the buffer size to use\n"
 		"\n";
 	fprintf(stderr, str, prog);
 	exit(EXIT_FAILURE);
@@ -388,7 +395,7 @@ static void parse_command_line(int argc, char **argv)
 	opterr = 0;
 
 	for (;;) {
-		c = getopt_long(argc, argv, "Frtli:q:psSNn:czu", long_options,
+		c = getopt_long(argc, argv, "Frtli:q:psSNn:czub", long_options,
 				&option_index);
 		if (c == -1)
 			break;
@@ -432,6 +439,9 @@ static void parse_command_line(int argc, char **argv)
 			opt_umem_flags |= XDP_UMEM_UNALIGNED_CHUNKS;
 			opt_unaligned_chunks = 1;
 			break;
+		case 'b':
+			opt_buffer_size = atoi(optarg);
+			break;
 		case 'F':
 			opt_xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
 			break;
@@ -483,13 +493,22 @@ static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk)
 		while (ret != rcvd) {
 			if (ret < 0)
 				exit_with_error(-ret);
-			ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd,
-						     &idx_fq);
+			ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
+		}
+
+		if (opt_umem_flags & XDP_UMEM_UNALIGNED_CHUNKS) {
+			for (i = 0; i < rcvd; i++) {
+				u64 comp_addr = *xsk_ring_cons__comp_addr(&xsk->umem->cq,
+						idx_cq++);
+				u64 masked_comp = (comp_addr & ~((u64)opt_buffer_size-1));
+				*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) =
+						masked_comp;
+			}
+		} else {
+			for (i = 0; i < rcvd; i++)
+				*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) =
+						*xsk_ring_cons__comp_addr(&xsk->umem->cq, idx_cq++);
 		}
-		for (i = 0; i < rcvd; i++)
-			*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) =
-				*xsk_ring_cons__comp_addr(&xsk->umem->cq,
-							  idx_cq++);
 
 		xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
 		xsk_ring_cons__release(&xsk->umem->cq, rcvd);
@@ -533,13 +552,25 @@ static void rx_drop(struct xsk_socket_info *xsk)
 		ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
 	}
 
-	for (i = 0; i < rcvd; i++) {
-		u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
-		u32 len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++)->len;
-		char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
+	if (opt_umem_flags & XDP_UMEM_UNALIGNED_CHUNKS) {
+		for (i = 0; i < rcvd; i++) {
+			u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
+			u32 len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++)->len;
+			char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
+			u64 masked_addr = (addr & ~((u64)opt_buffer_size-1));
+
+			hex_dump(pkt, len, addr);
+			*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = masked_addr;
+		}
+	} else {
+		for (i = 0; i < rcvd; i++) {
+			u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
+			u32 len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++)->len;
+			char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
 
-		hex_dump(pkt, len, addr);
-		*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = addr;
+			hex_dump(pkt, len, addr);
+			*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = addr;
+		}
 	}
 
 	xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
@@ -677,20 +708,20 @@ int main(int argc, char **argv)
 	}
 
 	ret = posix_memalign(&bufs, getpagesize(), /* PAGE_SIZE aligned */
-			     NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE);
+			     NUM_FRAMES * opt_buffer_size);
 	if (ret)
 		exit_with_error(ret);
 
        /* Create sockets... */
 	umem = xsk_configure_umem(bufs,
-				  NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE);
+				  NUM_FRAMES * opt_buffer_size);
 	xsks[num_socks++] = xsk_configure_socket(umem);
 
 	if (opt_bench == BENCH_TXONLY) {
 		int i;
 
-		for (i = 0; i < NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE;
-		     i += XSK_UMEM__DEFAULT_FRAME_SIZE)
+		for (i = 0; i < NUM_FRAMES * opt_buffer_size;
+		     i += opt_buffer_size)
 			(void)gen_eth_frame(umem, i);
 	}
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH 10/11] samples/bpf: use hugepages in xdpsock app
  2019-06-20  9:09 ` [Intel-wired-lan] " Kevin Laatz
@ 2019-06-20  9:09   ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-06-20  9:09 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson
  Cc: bpf, intel-wired-lan, bruce.richardson, ciara.loftus, Kevin Laatz

This patch modifies xdpsock to use mmap instead of posix_memalign. With
this change, we can use hugepages when running the application in unaligned
chunks mode. Using hugepages makes it more likely that we have physically
contiguous memory, which better supports the unaligned chunk mode.
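
Note that mapping with MAP_HUGETLB will only succeed if hugepages have
been reserved beforehand, e.g. (assuming 2MB hugepages and that 512 of
them are enough to cover NUM_FRAMES * buffer size):

    echo 512 > /proc/sys/vm/nr_hugepages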

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 samples/bpf/xdpsock_user.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 7b4ce047deb2..8ed63ad68428 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -74,6 +74,7 @@ static int opt_interval = 1;
 static u64 opt_buffer_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
 static u32 opt_umem_flags;
 static int opt_unaligned_chunks;
+static int opt_mmap_flags;
 static u32 opt_xdp_bind_flags;
 static __u32 prog_id;
 
@@ -438,6 +439,7 @@ static void parse_command_line(int argc, char **argv)
 		case 'u':
 			opt_umem_flags |= XDP_UMEM_UNALIGNED_CHUNKS;
 			opt_unaligned_chunks = 1;
+			opt_mmap_flags = MAP_HUGETLB;
 			break;
 		case 'b':
 			opt_buffer_size = atoi(optarg);
@@ -707,11 +709,13 @@ int main(int argc, char **argv)
 		exit(EXIT_FAILURE);
 	}
 
-	ret = posix_memalign(&bufs, getpagesize(), /* PAGE_SIZE aligned */
-			     NUM_FRAMES * opt_buffer_size);
-	if (ret)
-		exit_with_error(ret);
-
+	/* Reserve memory for the umem. Use hugepages if unaligned chunk mode */
+	bufs = mmap(NULL, NUM_FRAMES * opt_buffer_size, PROT_READ|PROT_WRITE,
+			MAP_PRIVATE|MAP_ANONYMOUS|opt_mmap_flags, -1, 0);
+	if (bufs == MAP_FAILED) {
+		printf("ERROR: mmap failed\n");
+		exit(EXIT_FAILURE);
+	}
        /* Create sockets... */
 	umem = xsk_configure_umem(bufs,
 				  NUM_FRAMES * opt_buffer_size);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH 11/11] doc/af_xdp: include unaligned chunk case
  2019-06-20  9:09 ` [Intel-wired-lan] " Kevin Laatz
@ 2019-06-20  9:09   ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-06-20  9:09 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson
  Cc: bpf, intel-wired-lan, bruce.richardson, ciara.loftus, Kevin Laatz

With the addition of unaligned chunks mode, the documentation needs to be
updated to indicate that the incoming addr to the fill ring will only be
masked if the user application is run in the aligned chunk mode. This patch
also adds a line to explicitly indicate that the incoming addr will not be
masked if running the user application in the unaligned chunk mode.
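
For reference, the aligned-mode masking described in the updated text is
simply the following (illustrative, for the default 2k chunk size):

    addr &= ~((u64)2048 - 1);    /* 2048, 2050 and 3000 all become 2048 */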

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 Documentation/networking/af_xdp.rst | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst
index e14d7d40fc75..16fbc68cac50 100644
--- a/Documentation/networking/af_xdp.rst
+++ b/Documentation/networking/af_xdp.rst
@@ -153,10 +153,12 @@ an example, if the UMEM is 64k and each chunk is 4k, then the UMEM has
 
 Frames passed to the kernel are used for the ingress path (RX rings).
 
-The user application produces UMEM addrs to this ring. Note that the
-kernel will mask the incoming addr. E.g. for a chunk size of 2k, the
-log2(2048) LSB of the addr will be masked off, meaning that 2048, 2050
-and 3000 refers to the same chunk.
+The user application produces UMEM addrs to this ring. Note that, if
+running the application with aligned chunk mode, the kernel will mask
+the incoming addr.  E.g. for a chunk size of 2k, the log2(2048) LSB of
+the addr will be masked off, meaning that 2048, 2050 and 3000 refers
+to the same chunk. If the user application is run in the unaligned
+chunks mode, then the incoming addr will be left untouched.
 
 
 UMEM Completion Ring
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* Re: [PATCH 01/11] i40e: simplify Rx buffer recycle
  2019-06-20  9:09   ` [Intel-wired-lan] " Kevin Laatz
@ 2019-06-24 14:29     ` Björn Töpel
  -1 siblings, 0 replies; 274+ messages in thread
From: Björn Töpel @ 2019-06-24 14:29 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: Netdev, Alexei Starovoitov, Daniel Borkmann,
	Björn Töpel, Karlsson, Magnus, bpf, intel-wired-lan,
	Bruce Richardson, ciara.loftus

On Thu, 20 Jun 2019 at 19:25, Kevin Laatz <kevin.laatz@intel.com> wrote:
>
> Currently, the dma, addr and handle are modified when we reuse Rx buffers
> in zero-copy mode. However, this is not required as the inputs to the
> function are copies, not the original values themselves. As we use the
> copies within the function, we can use the original 'old_bi' values
> directly without having to mask and add the headroom.
>

I like that the required change was turned into a cleanup! Nice!

Acked-by: Björn Töpel <bjorn.topel@intel.com>

> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c | 13 +++----------
>  1 file changed, 3 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> index 1b17486543ac..c89e692e8663 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> @@ -419,8 +419,6 @@ static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring,
>                                     struct i40e_rx_buffer *old_bi)
>  {
>         struct i40e_rx_buffer *new_bi = &rx_ring->rx_bi[rx_ring->next_to_alloc];
> -       unsigned long mask = (unsigned long)rx_ring->xsk_umem->chunk_mask;
> -       u64 hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
>         u16 nta = rx_ring->next_to_alloc;
>
>         /* update, and store next to alloc */
> @@ -428,14 +426,9 @@ static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring,
>         rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
>
>         /* transfer page from old buffer to new buffer */
> -       new_bi->dma = old_bi->dma & mask;
> -       new_bi->dma += hr;
> -
> -       new_bi->addr = (void *)((unsigned long)old_bi->addr & mask);
> -       new_bi->addr += hr;
> -
> -       new_bi->handle = old_bi->handle & mask;
> -       new_bi->handle += rx_ring->xsk_umem->headroom;
> +       new_bi->dma = old_bi->dma;
> +       new_bi->addr = old_bi->addr;
> +       new_bi->handle = old_bi->handle;
>
>         old_bi->addr = NULL;
>  }
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH 02/11] ixgbe: simplify Rx buffer recycle
  2019-06-20  9:09   ` [Intel-wired-lan] " Kevin Laatz
@ 2019-06-24 14:30     ` Björn Töpel
  -1 siblings, 0 replies; 274+ messages in thread
From: Björn Töpel @ 2019-06-24 14:30 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: Netdev, Alexei Starovoitov, Daniel Borkmann,
	Björn Töpel, Karlsson, Magnus, bpf, intel-wired-lan,
	Bruce Richardson, ciara.loftus

On Thu, 20 Jun 2019 at 19:25, Kevin Laatz <kevin.laatz@intel.com> wrote:
>
> Currently, the dma, addr and handle are modified when we reuse Rx buffers
> in zero-copy mode. However, this is not required as the inputs to the
> function are copies, not the original values themselves. As we use the
> copies within the function, we can use the original 'obi' values
> directly without having to mask and add the headroom.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>

Acked-by: Björn Töpel <bjorn.topel@intel.com>


> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 13 +++----------
>  1 file changed, 3 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> index bfe95ce0bd7f..49536adafe8e 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> @@ -251,8 +251,6 @@ ixgbe_rx_buffer *ixgbe_get_rx_buffer_zc(struct ixgbe_ring *rx_ring,
>  static void ixgbe_reuse_rx_buffer_zc(struct ixgbe_ring *rx_ring,
>                                      struct ixgbe_rx_buffer *obi)
>  {
> -       unsigned long mask = (unsigned long)rx_ring->xsk_umem->chunk_mask;
> -       u64 hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
>         u16 nta = rx_ring->next_to_alloc;
>         struct ixgbe_rx_buffer *nbi;
>
> @@ -262,14 +260,9 @@ static void ixgbe_reuse_rx_buffer_zc(struct ixgbe_ring *rx_ring,
>         rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
>
>         /* transfer page from old buffer to new buffer */
> -       nbi->dma = obi->dma & mask;
> -       nbi->dma += hr;
> -
> -       nbi->addr = (void *)((unsigned long)obi->addr & mask);
> -       nbi->addr += hr;
> -
> -       nbi->handle = obi->handle & mask;
> -       nbi->handle += rx_ring->xsk_umem->headroom;
> +       nbi->dma = obi->dma;
> +       nbi->addr = obi->addr;
> +       nbi->handle = obi->handle;
>
>         obi->addr = NULL;
>         obi->skb = NULL;
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH 03/11] xdp: add offset param to zero_copy_allocator
  2019-06-20  9:09   ` [Intel-wired-lan] " Kevin Laatz
@ 2019-06-24 14:31     ` Björn Töpel
  -1 siblings, 0 replies; 274+ messages in thread
From: Björn Töpel @ 2019-06-24 14:31 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: Netdev, Alexei Starovoitov, Daniel Borkmann,
	Björn Töpel, Karlsson, Magnus, bpf, intel-wired-lan,
	Bruce Richardson, ciara.loftus

On Thu, 20 Jun 2019 at 19:25, Kevin Laatz <kevin.laatz@intel.com> wrote:
>
> This patch adds an offset parameter for zero_copy_allocator.
>
> This change is required for the unaligned chunk mode which will come later
> in this patch set. The offset parameter is required for calculating the
> original handle in unaligned mode since we can't easily mask back to it
> like in the aligned case.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>

Acked-by: Björn Töpel <bjorn.topel@intel.com>


> ---
>  include/net/xdp.h |  3 ++-
>  net/core/xdp.c    | 11 ++++++-----
>  2 files changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 0f25b3675c5c..ea801fd2bf98 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -53,7 +53,8 @@ struct xdp_mem_info {
>  struct page_pool;
>
>  struct zero_copy_allocator {
> -       void (*free)(struct zero_copy_allocator *zca, unsigned long handle);
> +       void (*free)(struct zero_copy_allocator *zca, unsigned long handle,
> +                       off_t off);
>  };
>
>  struct xdp_rxq_info {
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index 4b2b194f4f1f..a77a7162d213 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -322,7 +322,7 @@ EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model);
>   * of xdp_frames/pages in those cases.
>   */
>  static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
> -                        unsigned long handle)
> +                        unsigned long handle, off_t off)
>  {
>         struct xdp_mem_allocator *xa;
>         struct page *page;
> @@ -353,7 +353,7 @@ static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
>                 rcu_read_lock();
>                 /* mem->id is valid, checked in xdp_rxq_info_reg_mem_model() */
>                 xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
> -               xa->zc_alloc->free(xa->zc_alloc, handle);
> +               xa->zc_alloc->free(xa->zc_alloc, handle, off);
>                 rcu_read_unlock();
>         default:
>                 /* Not possible, checked in xdp_rxq_info_reg_mem_model() */
> @@ -363,19 +363,20 @@ static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
>
>  void xdp_return_frame(struct xdp_frame *xdpf)
>  {
> -       __xdp_return(xdpf->data, &xdpf->mem, false, 0);
> +       __xdp_return(xdpf->data, &xdpf->mem, false, 0, 0);
>  }
>  EXPORT_SYMBOL_GPL(xdp_return_frame);
>
>  void xdp_return_frame_rx_napi(struct xdp_frame *xdpf)
>  {
> -       __xdp_return(xdpf->data, &xdpf->mem, true, 0);
> +       __xdp_return(xdpf->data, &xdpf->mem, true, 0, 0);
>  }
>  EXPORT_SYMBOL_GPL(xdp_return_frame_rx_napi);
>
>  void xdp_return_buff(struct xdp_buff *xdp)
>  {
> -       __xdp_return(xdp->data, &xdp->rxq->mem, true, xdp->handle);
> +       __xdp_return(xdp->data, &xdp->rxq->mem, true, xdp->handle,
> +                       xdp->data - xdp->data_hard_start);
>  }
>  EXPORT_SYMBOL_GPL(xdp_return_buff);
>
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH 04/11] i40e: add offset to zca_free
  2019-06-20  9:09   ` [Intel-wired-lan] " Kevin Laatz
@ 2019-06-24 14:32     ` Björn Töpel
  -1 siblings, 0 replies; 274+ messages in thread
From: Björn Töpel @ 2019-06-24 14:32 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: Netdev, Alexei Starovoitov, Daniel Borkmann,
	Björn Töpel, Karlsson, Magnus, bpf, intel-wired-lan,
	Bruce Richardson, ciara.loftus

On Thu, 20 Jun 2019 at 19:25, Kevin Laatz <kevin.laatz@intel.com> wrote:
>
> This patch adds the offset param for zero_copy_allocator to
> i40e_zca_free. This change is required to calculate the handle; otherwise,
> this function will not work in unaligned chunk mode since we can't easily
> mask back to the original handle.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>

Acked-by: Björn Töpel <bjorn.topel@intel.com>


> ---
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c | 8 ++++----
>  drivers/net/ethernet/intel/i40e/i40e_xsk.h | 3 ++-
>  2 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> index c89e692e8663..8c281f356293 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> @@ -438,16 +438,16 @@ static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring,
>   * @alloc: Zero-copy allocator
>   * @handle: Buffer handle
>   **/
> -void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
> +void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle,
> +               off_t off)
>  {
>         struct i40e_rx_buffer *bi;
>         struct i40e_ring *rx_ring;
> -       u64 hr, mask;
> +       u64 hr;
>         u16 nta;
>
>         rx_ring = container_of(alloc, struct i40e_ring, zca);
>         hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
> -       mask = rx_ring->xsk_umem->chunk_mask;
>
>         nta = rx_ring->next_to_alloc;
>         bi = &rx_ring->rx_bi[nta];
> @@ -455,7 +455,7 @@ void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
>         nta++;
>         rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
>
> -       handle &= mask;
> +       handle -= off;
>
>         bi->dma = xdp_umem_get_dma(rx_ring->xsk_umem, handle);
>         bi->dma += hr;
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.h b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
> index 8cc0a2e7d9a2..85691dc9ac42 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
> @@ -12,7 +12,8 @@ int i40e_queue_pair_disable(struct i40e_vsi *vsi, int queue_pair);
>  int i40e_queue_pair_enable(struct i40e_vsi *vsi, int queue_pair);
>  int i40e_xsk_umem_setup(struct i40e_vsi *vsi, struct xdp_umem *umem,
>                         u16 qid);
> -void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle);
> +void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle,
> +               off_t off);
>  bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 cleaned_count);
>  int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget);
>
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH 05/11] ixgbe: add offset to zca_free
  2019-06-20  9:09   ` [Intel-wired-lan] " Kevin Laatz
@ 2019-06-24 14:32     ` Björn Töpel
  -1 siblings, 0 replies; 274+ messages in thread
From: Björn Töpel @ 2019-06-24 14:32 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: Netdev, Alexei Starovoitov, Daniel Borkmann,
	Björn Töpel, Karlsson, Magnus, bpf, intel-wired-lan,
	Bruce Richardson, ciara.loftus

On Thu, 20 Jun 2019 at 19:25, Kevin Laatz <kevin.laatz@intel.com> wrote:
>
> This patch adds the zero_copy_allocator offset param to ixgbe_zca_free.
> The offset is needed to calculate the original handle; without it, this
> function cannot work in unaligned chunk mode, since we can no longer
> simply mask back to the original handle.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>

Acked-by: Björn Töpel <bjorn.topel@intel.com>


> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h | 3 ++-
>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c         | 8 ++++----
>  2 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
> index d93a690aff74..49702e2a4360 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
> @@ -33,7 +33,8 @@ struct xdp_umem *ixgbe_xsk_umem(struct ixgbe_adapter *adapter,
>  int ixgbe_xsk_umem_setup(struct ixgbe_adapter *adapter, struct xdp_umem *umem,
>                          u16 qid);
>
> -void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle);
> +void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle,
> +               off_t off);
>
>  void ixgbe_alloc_rx_buffers_zc(struct ixgbe_ring *rx_ring, u16 cleaned_count);
>  int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> index 49536adafe8e..1ec02077ccb2 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> @@ -268,16 +268,16 @@ static void ixgbe_reuse_rx_buffer_zc(struct ixgbe_ring *rx_ring,
>         obi->skb = NULL;
>  }
>
> -void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
> +void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle,
> +               off_t off)
>  {
>         struct ixgbe_rx_buffer *bi;
>         struct ixgbe_ring *rx_ring;
> -       u64 hr, mask;
> +       u64 hr;
>         u16 nta;
>
>         rx_ring = container_of(alloc, struct ixgbe_ring, zca);
>         hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
> -       mask = rx_ring->xsk_umem->chunk_mask;
>
>         nta = rx_ring->next_to_alloc;
>         bi = rx_ring->rx_buffer_info;
> @@ -285,7 +285,7 @@ void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
>         nta++;
>         rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
>
> -       handle &= mask;
> +       handle -= off;
>
>         bi->dma = xdp_umem_get_dma(rx_ring->xsk_umem, handle);
>         bi->dma += hr;
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH 06/11] xsk: add support to allow unaligned chunk placement
  2019-06-20  9:09   ` [Intel-wired-lan] " Kevin Laatz
@ 2019-06-24 15:29     ` Björn Töpel
  -1 siblings, 0 replies; 274+ messages in thread
From: Björn Töpel @ 2019-06-24 15:29 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: Netdev, Alexei Starovoitov, Daniel Borkmann,
	Björn Töpel, Karlsson, Magnus, bpf, intel-wired-lan,
	Bruce Richardson, ciara.loftus

On Thu, 20 Jun 2019 at 19:25, Kevin Laatz <kevin.laatz@intel.com> wrote:
>
> Currently, addresses are chunk size aligned. This means we are very
> restricted in terms of where we can place a chunk within the umem. For
> example, with a chunk size of 2k, chunks can only be placed at 0, 2k, 4k,
> 6k, 8k... and so on (i.e. every 2k starting from 0).
>
> This patch introduces the ability to use unaligned chunks. With these
> changes, we are no longer bound to placing chunks at 2k (or whatever the
> chunk size is) intervals. Since we are no longer dealing with aligned
> chunks, they can now cross page boundaries. Checks for page contiguity
> have been added in order to keep track of which pages are followed by a
> physically contiguous page.
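
The cross-page rule can be sketched in isolation like so (userspace
illustration only; next_pg_contig mirrors the per-page flag added below,
and the numbers are placeholders):

	#include <stdbool.h>
	#include <stdint.h>
	#include <stdio.h>

	#define PG_SIZE 4096UL

	/* An unaligned chunk that spills past a page boundary is only usable
	 * when the following page is physically contiguous.
	 */
	static bool chunk_usable(uint64_t addr, uint32_t chunk_len,
				 const bool *next_pg_contig)
	{
		bool cross_pg = (addr & (PG_SIZE - 1)) + chunk_len > PG_SIZE;

		return !cross_pg || next_pg_contig[addr / PG_SIZE];
	}

	int main(void)
	{
		bool contig_after_page0 = false;

		/* page offset 0xF00 (3840) + 3000 bytes > 4096: the chunk
		 * crosses into page 1, which is not contiguous here, so it
		 * is not usable (prints 0)
		 */
		printf("usable: %d\n",
		       chunk_usable(0xF00, 3000, &contig_after_page0));
		return 0;
	}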
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> ---
>  include/net/xdp_sock.h      |  2 ++
>  include/uapi/linux/if_xdp.h |  4 +++
>  net/xdp/xdp_umem.c          | 17 +++++++----
>  net/xdp/xsk.c               | 60 ++++++++++++++++++++++++++++++++-----
>  net/xdp/xsk_queue.h         | 60 ++++++++++++++++++++++++++++++++-----
>  5 files changed, 121 insertions(+), 22 deletions(-)
>
> diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
> index ae0f368a62bb..c07977e271c4 100644
> --- a/include/net/xdp_sock.h
> +++ b/include/net/xdp_sock.h
> @@ -19,6 +19,7 @@ struct xsk_queue;
>  struct xdp_umem_page {
>         void *addr;
>         dma_addr_t dma;
> +       bool next_pg_contig;
>  };
>
>  struct xdp_umem_fq_reuse {
> @@ -48,6 +49,7 @@ struct xdp_umem {
>         bool zc;
>         spinlock_t xsk_list_lock;
>         struct list_head xsk_list;
> +       u32 flags;
>  };
>
>  struct xdp_sock {
> diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
> index caed8b1614ff..8548f2110a77 100644
> --- a/include/uapi/linux/if_xdp.h
> +++ b/include/uapi/linux/if_xdp.h
> @@ -17,6 +17,9 @@
>  #define XDP_COPY       (1 << 1) /* Force copy-mode */
>  #define XDP_ZEROCOPY   (1 << 2) /* Force zero-copy mode */
>
> +/* Flags for xsk_umem_config flags */
> +#define XDP_UMEM_UNALIGNED_CHUNKS (1 << 0)
> +
>  struct sockaddr_xdp {
>         __u16 sxdp_family;
>         __u16 sxdp_flags;
> @@ -52,6 +55,7 @@ struct xdp_umem_reg {
>         __u64 len; /* Length of packet data area */
>         __u32 chunk_size;
>         __u32 headroom;
> +       __u32 flags;
>  };
>
>  struct xdp_statistics {
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index 2b18223e7eb8..4d4782572855 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -301,12 +301,14 @@ static int xdp_umem_account_pages(struct xdp_umem *umem)
>
>  static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>  {
> +       bool unaligned_chunks = mr->flags & XDP_UMEM_UNALIGNED_CHUNKS;

You need to validate the flags field from userland, i.e. check for
unknown flags.

Also, headroom != 0 doesn't make sense in the unaligned chunks case,
right? Perhaps add a check that rejects headroom != 0 for the unaligned
case?
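
Something along these lines would cover both points (a sketch only;
XDP_UMEM_FLAGS_VALID is a made-up name for a mask of the flags the kernel
currently knows about):

	#define XDP_UMEM_FLAGS_VALID	XDP_UMEM_UNALIGNED_CHUNKS

	/* early in xdp_umem_reg(), before the flags are used */
	if (mr->flags & ~XDP_UMEM_FLAGS_VALID)
		return -EINVAL;

	if (unaligned_chunks && mr->headroom)
		return -EINVAL;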

>         u32 chunk_size = mr->chunk_size, headroom = mr->headroom;
>         unsigned int chunks, chunks_per_page;
>         u64 addr = mr->addr, size = mr->len;
>         int size_chk, err, i;
>
> -       if (chunk_size < XDP_UMEM_MIN_CHUNK_SIZE || chunk_size > PAGE_SIZE) {
> +       if ((!unaligned_chunks && chunk_size < XDP_UMEM_MIN_CHUNK_SIZE) ||

Hmm, so the chunk_size is the "buffer length" in the unaligned chunks
case. Any reason to relax (and allow) chunk_size==0? Is there a need
for the extra !unaligned_chunks check here?

> +                       chunk_size > PAGE_SIZE) {
>                 /* Strictly speaking we could support this, if:
>                  * - huge pages, or*
>                  * - using an IOMMU, or
> @@ -316,7 +318,7 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>                 return -EINVAL;
>         }
>
> -       if (!is_power_of_2(chunk_size))
> +       if (!unaligned_chunks && !is_power_of_2(chunk_size))
>                 return -EINVAL;
>
>         if (!PAGE_ALIGNED(addr)) {
> @@ -333,9 +335,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>         if (chunks == 0)
>                 return -EINVAL;
>
> -       chunks_per_page = PAGE_SIZE / chunk_size;
> -       if (chunks < chunks_per_page || chunks % chunks_per_page)
> -               return -EINVAL;
> +       if (!unaligned_chunks) {
> +               chunks_per_page = PAGE_SIZE / chunk_size;
> +               if (chunks < chunks_per_page || chunks % chunks_per_page)
> +                       return -EINVAL;
> +       }
>
>         headroom = ALIGN(headroom, 64);
>
> @@ -344,13 +348,14 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>                 return -EINVAL;
>
>         umem->address = (unsigned long)addr;
> -       umem->chunk_mask = ~((u64)chunk_size - 1);
> +       umem->chunk_mask = unaligned_chunks ? U64_MAX : ~((u64)chunk_size - 1);

Now that chunk_mask can be "invalid", xsk_diag.c needs a change as
well, since the mask is used to get the size. I suggest using
chunk_size_nohr and headroom in xsk_diag.c
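
i.e. something like this in the umem fill-in helper of xsk_diag.c (a sketch
only; it assumes the current code derives the chunk size from chunk_mask and
that the xdp_diag_umem fields keep their existing names):

	/* chunk_mask is no longer meaningful in unaligned mode; report the
	 * registered chunk size directly instead.
	 */
	du.chunk_size = umem->chunk_size_nohr + umem->headroom;
	du.headroom = umem->headroom;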

Related, are you using chunk_mask at all in the unaligned-paths
(probably, right)?

>         umem->size = size;
>         umem->headroom = headroom;
>         umem->chunk_size_nohr = chunk_size - headroom;
>         umem->npgs = size / PAGE_SIZE;
>         umem->pgs = NULL;
>         umem->user = NULL;
> +       umem->flags = mr->flags;
>         INIT_LIST_HEAD(&umem->xsk_list);
>         spin_lock_init(&umem->xsk_list_lock);
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index a14e8864e4fa..dfae19942b70 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -39,7 +39,7 @@ bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs)
>
>  u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
>  {
> -       return xskq_peek_addr(umem->fq, addr);
> +       return xskq_peek_addr(umem->fq, addr, umem);
>  }
>  EXPORT_SYMBOL(xsk_umem_peek_addr);
>
> @@ -49,14 +49,36 @@ void xsk_umem_discard_addr(struct xdp_umem *umem)
>  }
>  EXPORT_SYMBOL(xsk_umem_discard_addr);
>
> +/* If a buffer crosses a page boundary, we need to do 2 memcpy's, one for
> + * each page. This is only required in copy mode.
> + */
> +static void __xsk_rcv_memcpy(struct xdp_umem *umem, u64 addr, void *from_buf,
> +               u32 len, u32 metalen)
> +{
> +       void *to_buf = xdp_umem_get_data(umem, addr);
> +
> +       if (xskq_crosses_non_contig_pg(umem, addr)) {
> +               void *next_pg_addr = umem->pages[(addr >> PAGE_SHIFT)+1].addr;
> +               u64 page_start = addr & (PAGE_SIZE-1);
> +               u64 first_len = PAGE_SIZE - (addr - page_start);
> +
> +               memcpy(to_buf, from_buf, first_len + metalen);
> +               memcpy(next_pg_addr, from_buf + first_len, len - first_len);
> +
> +               return;
> +       }
> +
> +       memcpy(to_buf, from_buf, len + metalen);
> +}
> +
>  static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
>  {
> -       void *to_buf, *from_buf;
> +       void *from_buf;
>         u32 metalen;
>         u64 addr;
>         int err;
>
> -       if (!xskq_peek_addr(xs->umem->fq, &addr) ||
> +       if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
>             len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
>                 xs->rx_dropped++;
>                 return -ENOSPC;
> @@ -72,8 +94,8 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
>                 metalen = xdp->data - xdp->data_meta;
>         }
>
> -       to_buf = xdp_umem_get_data(xs->umem, addr);
> -       memcpy(to_buf, from_buf, len + metalen);
> +       __xsk_rcv_memcpy(xs->umem, addr, from_buf, len, metalen);
> +
>         addr += metalen;
>         err = xskq_produce_batch_desc(xs->rx, addr, len);
>         if (!err) {
> @@ -126,7 +148,7 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
>         if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
>                 return -EINVAL;
>
> -       if (!xskq_peek_addr(xs->umem->fq, &addr) ||
> +       if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
>             len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
>                 xs->rx_dropped++;
>                 return -ENOSPC;
> @@ -173,7 +195,7 @@ bool xsk_umem_consume_tx(struct xdp_umem *umem, dma_addr_t *dma, u32 *len)
>
>         rcu_read_lock();
>         list_for_each_entry_rcu(xs, &umem->xsk_list, list) {
> -               if (!xskq_peek_desc(xs->tx, &desc))
> +               if (!xskq_peek_desc(xs->tx, &desc, umem))
>                         continue;
>
>                 if (xskq_produce_addr_lazy(umem->cq, desc.addr))
> @@ -226,7 +248,7 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
>
>         mutex_lock(&xs->mutex);
>
> -       while (xskq_peek_desc(xs->tx, &desc)) {
> +       while (xskq_peek_desc(xs->tx, &desc, xs->umem)) {
>                 char *buffer;
>                 u64 addr;
>                 u32 len;
> @@ -393,6 +415,26 @@ static struct socket *xsk_lookup_xsk_from_fd(int fd)
>         return sock;
>  }
>
> +/* Check if umem pages are contiguous.
> + * If zero-copy mode, use the DMA address to do the page contiguity check
> + * For all other modes we use addr (kernel virtual address)
> + */
> +static void xsk_check_page_contiguity(struct xdp_umem *umem, u32 flags)
> +{
> +       int i;
> +
> +       if (flags & XDP_ZEROCOPY) {
> +               for (i = 0; i < umem->npgs-1; i++)
> +                       umem->pages[i].next_pg_contig =
> +                                       (umem->pages[i].dma + PAGE_SIZE == umem->pages[i+1].dma);
> +               return;
> +       }
> +
> +       for (i = 0; i < umem->npgs-1; i++)
> +               umem->pages[i].next_pg_contig =
> +                               (umem->pages[i].addr + PAGE_SIZE == umem->pages[i+1].addr);
> +}
> +
>  static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
>  {
>         struct sockaddr_xdp *sxdp = (struct sockaddr_xdp *)addr;
> @@ -480,6 +522,8 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
>                 err = xdp_umem_assign_dev(xs->umem, dev, qid, flags);
>                 if (err)
>                         goto out_unlock;
> +
> +               xsk_check_page_contiguity(xs->umem, flags);
>         }
>
>         xs->dev = dev;
> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> index 88b9ae24658d..c2676a1df938 100644
> --- a/net/xdp/xsk_queue.h
> +++ b/net/xdp/xsk_queue.h
> @@ -119,6 +119,14 @@ static inline u32 xskq_nb_free(struct xsk_queue *q, u32 producer, u32 dcnt)
>
>  /* UMEM queue */
>
> +static inline bool xskq_crosses_non_contig_pg(struct xdp_umem *umem, u64 addr)
> +{
> +       bool cross_pg = (addr & (PAGE_SIZE-1)) + umem->chunk_size_nohr > PAGE_SIZE;
> +       bool next_pg_contig = umem->pages[(addr >> PAGE_SHIFT)+1].next_pg_contig;
> +
> +       return cross_pg && !next_pg_contig;
> +}
> +
>  static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
>  {
>         if (addr >= q->size) {
> @@ -129,23 +137,44 @@ static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
>         return true;
>  }
>
> -static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr)
> +static inline bool xskq_is_valid_addr_unaligned(struct xsk_queue *q, u64 addr,
> +               struct xdp_umem *umem)
> +{
> +       if (addr >= q->size ||
> +                       xskq_crosses_non_contig_pg(umem, addr)) {
> +               q->invalid_descs++;
> +               return false;
> +       }
> +
> +       return true;
> +}
> +
> +static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr,
> +               struct xdp_umem *umem)
>  {
>         while (q->cons_tail != q->cons_head) {
>                 struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
>                 unsigned int idx = q->cons_tail & q->ring_mask;
>
>                 *addr = READ_ONCE(ring->desc[idx]) & q->chunk_mask;
> +               if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS) {
> +                       if (xskq_is_valid_addr_unaligned(q, *addr, umem))
> +                               return addr;
> +                       goto out;
> +               }
> +
>                 if (xskq_is_valid_addr(q, *addr))
>                         return addr;
>
> +out:
>                 q->cons_tail++;
>         }
>
>         return NULL;
>  }
>
> -static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
> +static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr,
> +               struct xdp_umem *umem)
>  {
>         if (q->cons_tail == q->cons_head) {
>                 smp_mb(); /* D, matches A */
> @@ -156,7 +185,7 @@ static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
>                 smp_rmb();
>         }
>
> -       return xskq_validate_addr(q, addr);
> +       return xskq_validate_addr(q, addr, umem);
>  }
>
>  static inline void xskq_discard_addr(struct xsk_queue *q)
> @@ -215,8 +244,21 @@ static inline int xskq_reserve_addr(struct xsk_queue *q)
>
>  /* Rx/Tx queue */
>
> -static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
> +static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d,
> +                                                  struct xdp_umem *umem)
>  {
> +       if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS) {
> +               if (!xskq_is_valid_addr_unaligned(q, d->addr, umem))
> +                       return false;
> +
> +               if ((d->len > umem->chunk_size_nohr) || d->options) {
> +                       q->invalid_descs++;
> +                       return false;
> +               }
> +
> +               return true;
> +       }
> +
>         if (!xskq_is_valid_addr(q, d->addr))
>                 return false;
>
> @@ -230,14 +272,15 @@ static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
>  }
>
>  static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
> -                                                 struct xdp_desc *desc)
> +                                                 struct xdp_desc *desc,
> +                                                 struct xdp_umem *umem)
>  {
>         while (q->cons_tail != q->cons_head) {
>                 struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
>                 unsigned int idx = q->cons_tail & q->ring_mask;
>
>                 *desc = READ_ONCE(ring->desc[idx]);
> -               if (xskq_is_valid_desc(q, desc))
> +               if (xskq_is_valid_desc(q, desc, umem))
>                         return desc;
>
>                 q->cons_tail++;
> @@ -247,7 +290,8 @@ static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
>  }
>
>  static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
> -                                             struct xdp_desc *desc)
> +                                             struct xdp_desc *desc,
> +                                             struct xdp_umem *umem)
>  {
>         if (q->cons_tail == q->cons_head) {
>                 smp_mb(); /* D, matches A */
> @@ -258,7 +302,7 @@ static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
>                 smp_rmb(); /* C, matches B */
>         }
>
> -       return xskq_validate_desc(q, desc);
> +       return xskq_validate_desc(q, desc, umem);
>  }
>
>  static inline void xskq_discard_desc(struct xsk_queue *q)
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH 07/11] libbpf: add flags to umem config
  2019-06-20  9:09   ` [Intel-wired-lan] " Kevin Laatz
@ 2019-06-24 15:30     ` Björn Töpel
  -1 siblings, 0 replies; 274+ messages in thread
From: Björn Töpel @ 2019-06-24 15:30 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: Netdev, Alexei Starovoitov, Daniel Borkmann,
	Björn Töpel, Karlsson, Magnus, bpf, intel-wired-lan,
	Bruce Richardson, ciara.loftus

On Thu, 20 Jun 2019 at 19:26, Kevin Laatz <kevin.laatz@intel.com> wrote:
>
> This patch adds a 'flags' field to the umem_config and umem_reg structs.
> This will allow for more options to be added for configuring umems.
>
> The first use for the flags field is to add a flag for unaligned chunks
> mode. These flags can either be user-provided or filled with a default.
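
For illustration, an application would opt in roughly like this (a sketch;
umem_area, size and the umem/ring pointers are placeholders):

	struct xsk_umem_config cfg = {
		.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
		.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
		.frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE,
		.frame_headroom = 0, /* must be 0 with unaligned chunks */
		.flags = XDP_UMEM_UNALIGNED_CHUNKS,
	};

	ret = xsk_umem__create(&umem, umem_area, size, &fq, &cq, &cfg);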
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> ---
>  tools/include/uapi/linux/if_xdp.h | 4 ++++
>  tools/lib/bpf/xsk.c               | 7 +++++++
>  tools/lib/bpf/xsk.h               | 2 ++
>  3 files changed, 13 insertions(+)
>
> diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h
> index caed8b1614ff..8548f2110a77 100644
> --- a/tools/include/uapi/linux/if_xdp.h
> +++ b/tools/include/uapi/linux/if_xdp.h
> @@ -17,6 +17,9 @@
>  #define XDP_COPY       (1 << 1) /* Force copy-mode */
>  #define XDP_ZEROCOPY   (1 << 2) /* Force zero-copy mode */
>
> +/* Flags for xsk_umem_config flags */
> +#define XDP_UMEM_UNALIGNED_CHUNKS (1 << 0)
> +
>  struct sockaddr_xdp {
>         __u16 sxdp_family;
>         __u16 sxdp_flags;
> @@ -52,6 +55,7 @@ struct xdp_umem_reg {
>         __u64 len; /* Length of packet data area */
>         __u32 chunk_size;
>         __u32 headroom;
> +       __u32 flags;
>  };
>
>  struct xdp_statistics {
> diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
> index 7ef6293b4fd7..df4207d4ff4a 100644
> --- a/tools/lib/bpf/xsk.c
> +++ b/tools/lib/bpf/xsk.c
> @@ -115,6 +115,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg,
>                 cfg->comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
>                 cfg->frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
>                 cfg->frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM;
> +               cfg->flags = XSK_UMEM__DEFAULT_FLAGS;
>                 return;
>         }
>
> @@ -122,6 +123,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg,
>         cfg->comp_size = usr_cfg->comp_size;
>         cfg->frame_size = usr_cfg->frame_size;
>         cfg->frame_headroom = usr_cfg->frame_headroom;
> +       cfg->flags = usr_cfg->flags;
>  }
>
>  static int xsk_set_xdp_socket_config(struct xsk_socket_config *cfg,
> @@ -181,6 +183,11 @@ int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
>         mr.len = size;
>         mr.chunk_size = umem->config.frame_size;
>         mr.headroom = umem->config.frame_headroom;
> +       mr.flags = umem->config.flags;
> +
> +       /* Headroom must be 0 for unaligned chunks */
> +       if ((mr.flags & XDP_UMEM_UNALIGNED_CHUNKS) && mr.headroom != 0)
> +               return -EINVAL;

Ah. :-) I'd prefer that this is done in the bind syscall.

>
>         err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr));
>         if (err) {
> diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
> index 82ea71a0f3ec..8d393873b70f 100644
> --- a/tools/lib/bpf/xsk.h
> +++ b/tools/lib/bpf/xsk.h
> @@ -170,12 +170,14 @@ LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
>  #define XSK_UMEM__DEFAULT_FRAME_SHIFT    11 /* 2048 bytes */
>  #define XSK_UMEM__DEFAULT_FRAME_SIZE     (1 << XSK_UMEM__DEFAULT_FRAME_SHIFT)
>  #define XSK_UMEM__DEFAULT_FRAME_HEADROOM 0
> +#define XSK_UMEM__DEFAULT_FLAGS 0
>
>  struct xsk_umem_config {
>         __u32 fill_size;
>         __u32 comp_size;
>         __u32 frame_size;
>         __u32 frame_headroom;
> +       __u32 flags;
>  };
>
>  /* Flags for the libbpf_flags field. */
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH 08/11] samples/bpf: add unaligned chunks mode support to xdpsock
  2019-06-20  9:09   ` [Intel-wired-lan] " Kevin Laatz
@ 2019-06-24 15:31     ` Björn Töpel
  -1 siblings, 0 replies; 274+ messages in thread
From: Björn Töpel @ 2019-06-24 15:31 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: Netdev, Alexei Starovoitov, Daniel Borkmann,
	Björn Töpel, Karlsson, Magnus, bpf, intel-wired-lan,
	Bruce Richardson, ciara.loftus

On Thu, 20 Jun 2019 at 19:25, Kevin Laatz <kevin.laatz@intel.com> wrote:
>
> This patch adds support for unaligned chunks mode. The new option allows
> users to run the application with more relaxed chunk placement in the
> XDP umem.
>
> Unaligned chunks mode can be used with the '-u' or '--unaligned' command
> line options.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>

Acked-by: Björn Töpel <bjorn.topel@intel.com>

> ---
>  samples/bpf/xdpsock_user.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
> index d08ee1ab7bb4..e26f43382d01 100644
> --- a/samples/bpf/xdpsock_user.c
> +++ b/samples/bpf/xdpsock_user.c
> @@ -67,6 +67,8 @@ static int opt_ifindex;
>  static int opt_queue;
>  static int opt_poll;
>  static int opt_interval = 1;
> +static u32 opt_umem_flags;
> +static int opt_unaligned_chunks;
>  static u32 opt_xdp_bind_flags;
>  static __u32 prog_id;
>
> @@ -276,14 +278,21 @@ static size_t gen_eth_frame(struct xsk_umem_info *umem, u64 addr)
>  static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
>  {
>         struct xsk_umem_info *umem;
> +       struct xsk_umem_config umem_cfg;
>         int ret;
>
>         umem = calloc(1, sizeof(*umem));
>         if (!umem)
>                 exit_with_error(errno);
>
> +       umem_cfg.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS;
> +       umem_cfg.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
> +       umem_cfg.frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
> +       umem_cfg.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM;
> +       umem_cfg.flags = opt_umem_flags;
> +
>         ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
> -                              NULL);
> +                              &umem_cfg);
>         if (ret)
>                 exit_with_error(-ret);
>
> @@ -346,6 +355,7 @@ static struct option long_options[] = {
>         {"interval", required_argument, 0, 'n'},
>         {"zero-copy", no_argument, 0, 'z'},
>         {"copy", no_argument, 0, 'c'},
> +       {"unaligned", no_argument, 0, 'u'},
>         {0, 0, 0, 0}
>  };
>
> @@ -365,6 +375,7 @@ static void usage(const char *prog)
>                 "  -n, --interval=n     Specify statistics update interval (default 1 sec).\n"
>                 "  -z, --zero-copy      Force zero-copy mode.\n"
>                 "  -c, --copy           Force copy mode.\n"
> +               "  -u, --unaligned      Enable unaligned chunk placement\n"
>                 "\n";
>         fprintf(stderr, str, prog);
>         exit(EXIT_FAILURE);
> @@ -377,7 +388,7 @@ static void parse_command_line(int argc, char **argv)
>         opterr = 0;
>
>         for (;;) {
> -               c = getopt_long(argc, argv, "Frtli:q:psSNn:cz", long_options,
> +               c = getopt_long(argc, argv, "Frtli:q:psSNn:czu", long_options,
>                                 &option_index);
>                 if (c == -1)
>                         break;
> @@ -417,9 +428,14 @@ static void parse_command_line(int argc, char **argv)
>                 case 'c':
>                         opt_xdp_bind_flags |= XDP_COPY;
>                         break;
> +               case 'u':
> +                       opt_umem_flags |= XDP_UMEM_UNALIGNED_CHUNKS;
> +                       opt_unaligned_chunks = 1;
> +                       break;
>                 case 'F':
>                         opt_xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
>                         break;
> +
>                 default:
>                         usage(basename(argv[0]));
>                 }
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH 08/11] samples/bpf: add unaligned chunks mode support to xdpsock
@ 2019-06-24 15:31     ` =?unknown-8bit?q?Bj=C3=B6rn_T=C3=B6pel?=
  0 siblings, 0 replies; 274+ messages in thread
From: =?unknown-8bit?q?Bj=C3=B6rn_T=C3=B6pel?= @ 2019-06-24 15:31 UTC (permalink / raw)
  To: intel-wired-lan

On Thu, 20 Jun 2019 at 19:25, Kevin Laatz <kevin.laatz@intel.com> wrote:
>
> This patch adds support for the unaligned chunks mode. The addition of the
> unaligned chunks option will allow users to run the application with more
> relaxed chunk placement in the XDP umem.
>
> Unaligned chunks mode can be used with the '-u' or '--unaligned' command
> line options.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>

Acked-by: Björn Töpel <bjorn.topel@intel.com>

> ---
>  samples/bpf/xdpsock_user.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
> index d08ee1ab7bb4..e26f43382d01 100644
> --- a/samples/bpf/xdpsock_user.c
> +++ b/samples/bpf/xdpsock_user.c
> @@ -67,6 +67,8 @@ static int opt_ifindex;
>  static int opt_queue;
>  static int opt_poll;
>  static int opt_interval = 1;
> +static u32 opt_umem_flags;
> +static int opt_unaligned_chunks;
>  static u32 opt_xdp_bind_flags;
>  static __u32 prog_id;
>
> @@ -276,14 +278,21 @@ static size_t gen_eth_frame(struct xsk_umem_info *umem, u64 addr)
>  static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
>  {
>         struct xsk_umem_info *umem;
> +       struct xsk_umem_config umem_cfg;
>         int ret;
>
>         umem = calloc(1, sizeof(*umem));
>         if (!umem)
>                 exit_with_error(errno);
>
> +       umem_cfg.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS;
> +       umem_cfg.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
> +       umem_cfg.frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
> +       umem_cfg.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM;
> +       umem_cfg.flags = opt_umem_flags;
> +
>         ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
> -                              NULL);
> +                              &umem_cfg);
>         if (ret)
>                 exit_with_error(-ret);
>
> @@ -346,6 +355,7 @@ static struct option long_options[] = {
>         {"interval", required_argument, 0, 'n'},
>         {"zero-copy", no_argument, 0, 'z'},
>         {"copy", no_argument, 0, 'c'},
> +       {"unaligned", no_argument, 0, 'u'},
>         {0, 0, 0, 0}
>  };
>
> @@ -365,6 +375,7 @@ static void usage(const char *prog)
>                 "  -n, --interval=n     Specify statistics update interval (default 1 sec).\n"
>                 "  -z, --zero-copy      Force zero-copy mode.\n"
>                 "  -c, --copy           Force copy mode.\n"
> +               "  -u, --unaligned      Enable unaligned chunk placement\n"
>                 "\n";
>         fprintf(stderr, str, prog);
>         exit(EXIT_FAILURE);
> @@ -377,7 +388,7 @@ static void parse_command_line(int argc, char **argv)
>         opterr = 0;
>
>         for (;;) {
> -               c = getopt_long(argc, argv, "Frtli:q:psSNn:cz", long_options,
> +               c = getopt_long(argc, argv, "Frtli:q:psSNn:czu", long_options,
>                                 &option_index);
>                 if (c == -1)
>                         break;
> @@ -417,9 +428,14 @@ static void parse_command_line(int argc, char **argv)
>                 case 'c':
>                         opt_xdp_bind_flags |= XDP_COPY;
>                         break;
> +               case 'u':
> +                       opt_umem_flags |= XDP_UMEM_UNALIGNED_CHUNKS;
> +                       opt_unaligned_chunks = 1;
> +                       break;
>                 case 'F':
>                         opt_xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
>                         break;
> +
>                 default:
>                         usage(basename(argv[0]));
>                 }
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH 11/11] doc/af_xdp: include unaligned chunk case
  2019-06-20  9:09   ` [Intel-wired-lan] " Kevin Laatz
@ 2019-06-24 15:34     ` Björn Töpel
  -1 siblings, 0 replies; 274+ messages in thread
From: Björn Töpel @ 2019-06-24 15:34 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: Netdev, Alexei Starovoitov, Daniel Borkmann,
	Björn Töpel, Karlsson, Magnus, bpf, intel-wired-lan,
	Bruce Richardson, ciara.loftus

On Thu, 20 Jun 2019 at 19:25, Kevin Laatz <kevin.laatz@intel.com> wrote:
>
> With the addition of unaligned chunks mode, the documentation needs to be
> updated to indicate that the incoming addr to the fill ring will only be
> masked if the user application is run in the aligned chunk mode. This patch
> also adds a line to explicitly indicate that the incoming addr will not be
> masked if running the user application in the unaligned chunk mode.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>

Acked-by: Björn Töpel <bjorn.topel@intel.com>

> ---
>  Documentation/networking/af_xdp.rst | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst
> index e14d7d40fc75..16fbc68cac50 100644
> --- a/Documentation/networking/af_xdp.rst
> +++ b/Documentation/networking/af_xdp.rst
> @@ -153,10 +153,12 @@ an example, if the UMEM is 64k and each chunk is 4k, then the UMEM has
>
>  Frames passed to the kernel are used for the ingress path (RX rings).
>
> -The user application produces UMEM addrs to this ring. Note that the
> -kernel will mask the incoming addr. E.g. for a chunk size of 2k, the
> -log2(2048) LSB of the addr will be masked off, meaning that 2048, 2050
> -and 3000 refers to the same chunk.
> +The user application produces UMEM addrs to this ring. Note that, if
> +running the application with aligned chunk mode, the kernel will mask
> +the incoming addr.  E.g. for a chunk size of 2k, the log2(2048) LSB of
> +the addr will be masked off, meaning that 2048, 2050 and 3000 refers
> +to the same chunk. If the user application is run in the unaligned
> +chunks mode, then the incoming addr will be left untouched.
>
>
>  UMEM Completion Ring
> --
> 2.17.1
>
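
As an aside, the masking behaviour described in this hunk boils down to
clearing the low log2(chunk_size) bits of the incoming addr, so every
address inside a chunk resolves to the same chunk base. A minimal
user-space sketch (the chunk size and helper name are illustrative, not
part of the patch):

  #include <stdint.h>
  #include <stdio.h>

  #define CHUNK_SIZE 2048ULL  /* aligned mode assumes a power-of-two chunk size */

  /* Map an arbitrary umem addr back to the base of its chunk. */
  static uint64_t chunk_base(uint64_t addr)
  {
      return addr & ~(CHUNK_SIZE - 1);
  }

  int main(void)
  {
      /* 2048, 2050 and 3000 all resolve to chunk base 2048. */
      printf("%llu %llu %llu\n",
             (unsigned long long)chunk_base(2048),
             (unsigned long long)chunk_base(2050),
             (unsigned long long)chunk_base(3000));
      return 0;
  }

In unaligned chunks mode no such masking happens, so the address handed
back to the fill ring is used as-is.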

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH 11/11] doc/af_xdp: include unaligned chunk case
@ 2019-06-24 15:34     ` Björn Töpel
  0 siblings, 0 replies; 274+ messages in thread
From: Björn Töpel @ 2019-06-24 15:34 UTC (permalink / raw)
  To: intel-wired-lan

On Thu, 20 Jun 2019 at 19:25, Kevin Laatz <kevin.laatz@intel.com> wrote:
>
> With the addition of unaligned chunks mode, the documentation needs to be
> updated to indicate that the incoming addr to the fill ring will only be
> masked if the user application is run in the aligned chunk mode. This patch
> also adds a line to explicitly indicate that the incoming addr will not be
> masked if running the user application in the unaligned chunk mode.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>

Acked-by: Björn Töpel <bjorn.topel@intel.com>

> ---
>  Documentation/networking/af_xdp.rst | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst
> index e14d7d40fc75..16fbc68cac50 100644
> --- a/Documentation/networking/af_xdp.rst
> +++ b/Documentation/networking/af_xdp.rst
> @@ -153,10 +153,12 @@ an example, if the UMEM is 64k and each chunk is 4k, then the UMEM has
>
>  Frames passed to the kernel are used for the ingress path (RX rings).
>
> -The user application produces UMEM addrs to this ring. Note that the
> -kernel will mask the incoming addr. E.g. for a chunk size of 2k, the
> -log2(2048) LSB of the addr will be masked off, meaning that 2048, 2050
> -and 3000 refers to the same chunk.
> +The user application produces UMEM addrs to this ring. Note that, if
> +running the application with aligned chunk mode, the kernel will mask
> +the incoming addr.  E.g. for a chunk size of 2k, the log2(2048) LSB of
> +the addr will be masked off, meaning that 2048, 2050 and 3000 refers
> +to the same chunk. If the user application is run in the unaligned
> +chunks mode, then the incoming addr will be left untouched.
>
>
>  UMEM Completion Ring
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH 09/11] samples/bpf: add buffer recycling for unaligned chunks to xdpsock
  2019-06-20  9:09   ` [Intel-wired-lan] " Kevin Laatz
@ 2019-06-24 15:35     ` Björn Töpel
  -1 siblings, 0 replies; 274+ messages in thread
From: Björn Töpel @ 2019-06-24 15:35 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: Netdev, Alexei Starovoitov, Daniel Borkmann,
	Björn Töpel, Karlsson, Magnus, bpf, intel-wired-lan,
	Bruce Richardson, ciara.loftus

On Thu, 20 Jun 2019 at 19:25, Kevin Laatz <kevin.laatz@intel.com> wrote:
>
> This patch adds buffer recycling support for unaligned buffers. Since we
> don't mask the addr to 2k at umem_reg in unaligned mode, we need to make
> sure we give back the correct, original addr to the fill queue. To do this,
> we need to mask the addr with the buffer size.
>
> To pass in a buffer size, use the --buf-size=n argument.
> NOTE: For xdpsock to work in aligned chunk mode, you still need to pass a
> 'power of 2' buffer size.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> ---
>  samples/bpf/xdpsock_user.c | 71 +++++++++++++++++++++++++++-----------
>  1 file changed, 51 insertions(+), 20 deletions(-)
>
> diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
> index e26f43382d01..7b4ce047deb2 100644
> --- a/samples/bpf/xdpsock_user.c
> +++ b/samples/bpf/xdpsock_user.c
> @@ -60,6 +60,10 @@ enum benchmark_type {
>         BENCH_L2FWD = 2,
>  };
>
> +#define LENGTH (256UL*1024*1024)
> +#define ADDR (void *)(0x0UL)
> +#define SHMAT_FLAGS (0)

Not used.

> +
>  static enum benchmark_type opt_bench = BENCH_RXDROP;
>  static u32 opt_xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
>  static const char *opt_if = "";
> @@ -67,6 +71,7 @@ static int opt_ifindex;
>  static int opt_queue;
>  static int opt_poll;
>  static int opt_interval = 1;
> +static u64 opt_buffer_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
>  static u32 opt_umem_flags;
>  static int opt_unaligned_chunks;
>  static u32 opt_xdp_bind_flags;
> @@ -287,7 +292,7 @@ static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
>
>         umem_cfg.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS;
>         umem_cfg.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
> -       umem_cfg.frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
> +       umem_cfg.frame_size = opt_buffer_size;
>         umem_cfg.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM;
>         umem_cfg.flags = opt_umem_flags;
>
> @@ -334,8 +339,8 @@ static struct xsk_socket_info *xsk_configure_socket(struct xsk_umem_info *umem)
>                 exit_with_error(-ret);
>         for (i = 0;
>              i < XSK_RING_PROD__DEFAULT_NUM_DESCS *
> -                    XSK_UMEM__DEFAULT_FRAME_SIZE;
> -            i += XSK_UMEM__DEFAULT_FRAME_SIZE)
> +                    opt_buffer_size;
> +            i += opt_buffer_size)
>                 *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx++) = i;
>         xsk_ring_prod__submit(&xsk->umem->fq,
>                               XSK_RING_PROD__DEFAULT_NUM_DESCS);
> @@ -356,6 +361,7 @@ static struct option long_options[] = {
>         {"zero-copy", no_argument, 0, 'z'},
>         {"copy", no_argument, 0, 'c'},
>         {"unaligned", no_argument, 0, 'u'},
> +       {"buf-size", required_argument, 0, 'b'},
>         {0, 0, 0, 0}
>  };
>
> @@ -376,6 +382,7 @@ static void usage(const char *prog)
>                 "  -z, --zero-copy      Force zero-copy mode.\n"
>                 "  -c, --copy           Force copy mode.\n"
>                 "  -u, --unaligned      Enable unaligned chunk placement\n"
> +               "  -b, --buf-size=n     Specify the buffer size to use\n"
>                 "\n";
>         fprintf(stderr, str, prog);
>         exit(EXIT_FAILURE);
> @@ -388,7 +395,7 @@ static void parse_command_line(int argc, char **argv)
>         opterr = 0;
>
>         for (;;) {
> -               c = getopt_long(argc, argv, "Frtli:q:psSNn:czu", long_options,
> +               c = getopt_long(argc, argv, "Frtli:q:psSNn:czub", long_options,
>                                 &option_index);
>                 if (c == -1)
>                         break;
> @@ -432,6 +439,9 @@ static void parse_command_line(int argc, char **argv)
>                         opt_umem_flags |= XDP_UMEM_UNALIGNED_CHUNKS;
>                         opt_unaligned_chunks = 1;
>                         break;
> +               case 'b':
> +                       opt_buffer_size = atoi(optarg);
> +                       break;
>                 case 'F':
>                         opt_xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
>                         break;
> @@ -483,13 +493,22 @@ static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk)
>                 while (ret != rcvd) {
>                         if (ret < 0)
>                                 exit_with_error(-ret);
> -                       ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd,
> -                                                    &idx_fq);
> +                       ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
> +               }
> +
> +               if (opt_umem_flags & XDP_UMEM_UNALIGNED_CHUNKS) {
> +                       for (i = 0; i < rcvd; i++) {
> +                               u64 comp_addr = *xsk_ring_cons__comp_addr(&xsk->umem->cq,
> +                                               idx_cq++);
> +                               u64 masked_comp = (comp_addr & ~((u64)opt_buffer_size-1));
> +                               *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) =
> +                                               masked_comp;
> +                       }
> +               } else {
> +                       for (i = 0; i < rcvd; i++)
> +                               *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) =
> +                                               *xsk_ring_cons__comp_addr(&xsk->umem->cq, idx_cq++);
>                 }
> -               for (i = 0; i < rcvd; i++)
> -                       *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) =
> -                               *xsk_ring_cons__comp_addr(&xsk->umem->cq,
> -                                                         idx_cq++);
>
>                 xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
>                 xsk_ring_cons__release(&xsk->umem->cq, rcvd);
> @@ -533,13 +552,25 @@ static void rx_drop(struct xsk_socket_info *xsk)
>                 ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
>         }
>
> -       for (i = 0; i < rcvd; i++) {
> -               u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
> -               u32 len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++)->len;
> -               char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
> +       if (opt_umem_flags & XDP_UMEM_UNALIGNED_CHUNKS) {
> +               for (i = 0; i < rcvd; i++) {
> +                       u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
> +                       u32 len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++)->len;
> +                       char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
> +                       u64 masked_addr = (addr & ~((u64)opt_buffer_size-1));
> +
> +                       hex_dump(pkt, len, addr);
> +                       *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = masked_addr;
> +               }
> +       } else {
> +               for (i = 0; i < rcvd; i++) {
> +                       u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
> +                       u32 len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++)->len;
> +                       char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
>
> -               hex_dump(pkt, len, addr);
> -               *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = addr;
> +                       hex_dump(pkt, len, addr);
> +                       *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = addr;
> +               }
>         }
>
>         xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
> @@ -677,20 +708,20 @@ int main(int argc, char **argv)
>         }
>
>         ret = posix_memalign(&bufs, getpagesize(), /* PAGE_SIZE aligned */
> -                            NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE);
> +                            NUM_FRAMES * opt_buffer_size);
>         if (ret)
>                 exit_with_error(ret);
>
>         /* Create sockets... */
>         umem = xsk_configure_umem(bufs,
> -                                 NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE);
> +                                 NUM_FRAMES * opt_buffer_size);
>         xsks[num_socks++] = xsk_configure_socket(umem);
>
>         if (opt_bench == BENCH_TXONLY) {
>                 int i;
>
> -               for (i = 0; i < NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE;
> -                    i += XSK_UMEM__DEFAULT_FRAME_SIZE)
> +               for (i = 0; i < NUM_FRAMES * opt_buffer_size;
> +                    i += opt_buffer_size)
>                         (void)gen_eth_frame(umem, i);
>         }
>
> --
> 2.17.1
>
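
One note on the recycling scheme above: recovering the chunk base by
masking with ~(buf_size - 1) only works for a power-of-two buffer size.
A stand-alone sketch of that calculation (the helper name is illustrative,
not the sample's code):

  #include <assert.h>
  #include <stdint.h>

  /* Recover the chunk base from a completed/received addr, as the
   * recycling loops above do. Only valid for power-of-two buf_size. */
  static uint64_t recycle_addr(uint64_t addr, uint64_t buf_size)
  {
      return addr & ~(buf_size - 1);
  }

  int main(void)
  {
      /* power-of-two size: 4196 maps back to its chunk base 4096 */
      assert(recycle_addr(4196, 4096) == 4096);
      /* non-power-of-two size: 3100 & ~(3000 - 1) gives 1032, which is
       * not the chunk base 3000, so the mask trick breaks down */
      assert(recycle_addr(3100, 3000) != 3000);
      return 0;
  }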

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH 09/11] samples/bpf: add buffer recycling for unaligned chunks to xdpsock
@ 2019-06-24 15:35     ` Björn Töpel
  0 siblings, 0 replies; 274+ messages in thread
From: Björn Töpel @ 2019-06-24 15:35 UTC (permalink / raw)
  To: intel-wired-lan

On Thu, 20 Jun 2019 at 19:25, Kevin Laatz <kevin.laatz@intel.com> wrote:
>
> This patch adds buffer recycling support for unaligned buffers. Since we
> don't mask the addr to 2k at umem_reg in unaligned mode, we need to make
> sure we give back the correct, original addr to the fill queue. To do this,
> we need to mask the addr with the buffer size.
>
> To pass in a buffer size, use the --buf-size=n argument.
> NOTE: For xdpsock to work in aligned chunk mode, you still need to pass a
> 'power of 2' buffer size.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> ---
>  samples/bpf/xdpsock_user.c | 71 +++++++++++++++++++++++++++-----------
>  1 file changed, 51 insertions(+), 20 deletions(-)
>
> diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
> index e26f43382d01..7b4ce047deb2 100644
> --- a/samples/bpf/xdpsock_user.c
> +++ b/samples/bpf/xdpsock_user.c
> @@ -60,6 +60,10 @@ enum benchmark_type {
>         BENCH_L2FWD = 2,
>  };
>
> +#define LENGTH (256UL*1024*1024)
> +#define ADDR (void *)(0x0UL)
> +#define SHMAT_FLAGS (0)

Not used.

> +
>  static enum benchmark_type opt_bench = BENCH_RXDROP;
>  static u32 opt_xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
>  static const char *opt_if = "";
> @@ -67,6 +71,7 @@ static int opt_ifindex;
>  static int opt_queue;
>  static int opt_poll;
>  static int opt_interval = 1;
> +static u64 opt_buffer_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
>  static u32 opt_umem_flags;
>  static int opt_unaligned_chunks;
>  static u32 opt_xdp_bind_flags;
> @@ -287,7 +292,7 @@ static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
>
>         umem_cfg.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS;
>         umem_cfg.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
> -       umem_cfg.frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
> +       umem_cfg.frame_size = opt_buffer_size;
>         umem_cfg.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM;
>         umem_cfg.flags = opt_umem_flags;
>
> @@ -334,8 +339,8 @@ static struct xsk_socket_info *xsk_configure_socket(struct xsk_umem_info *umem)
>                 exit_with_error(-ret);
>         for (i = 0;
>              i < XSK_RING_PROD__DEFAULT_NUM_DESCS *
> -                    XSK_UMEM__DEFAULT_FRAME_SIZE;
> -            i += XSK_UMEM__DEFAULT_FRAME_SIZE)
> +                    opt_buffer_size;
> +            i += opt_buffer_size)
>                 *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx++) = i;
>         xsk_ring_prod__submit(&xsk->umem->fq,
>                               XSK_RING_PROD__DEFAULT_NUM_DESCS);
> @@ -356,6 +361,7 @@ static struct option long_options[] = {
>         {"zero-copy", no_argument, 0, 'z'},
>         {"copy", no_argument, 0, 'c'},
>         {"unaligned", no_argument, 0, 'u'},
> +       {"buf-size", required_argument, 0, 'b'},
>         {0, 0, 0, 0}
>  };
>
> @@ -376,6 +382,7 @@ static void usage(const char *prog)
>                 "  -z, --zero-copy      Force zero-copy mode.\n"
>                 "  -c, --copy           Force copy mode.\n"
>                 "  -u, --unaligned      Enable unaligned chunk placement\n"
> +               "  -b, --buf-size=n     Specify the buffer size to use\n"
>                 "\n";
>         fprintf(stderr, str, prog);
>         exit(EXIT_FAILURE);
> @@ -388,7 +395,7 @@ static void parse_command_line(int argc, char **argv)
>         opterr = 0;
>
>         for (;;) {
> -               c = getopt_long(argc, argv, "Frtli:q:psSNn:czu", long_options,
> +               c = getopt_long(argc, argv, "Frtli:q:psSNn:czub", long_options,
>                                 &option_index);
>                 if (c == -1)
>                         break;
> @@ -432,6 +439,9 @@ static void parse_command_line(int argc, char **argv)
>                         opt_umem_flags |= XDP_UMEM_UNALIGNED_CHUNKS;
>                         opt_unaligned_chunks = 1;
>                         break;
> +               case 'b':
> +                       opt_buffer_size = atoi(optarg);
> +                       break;
>                 case 'F':
>                         opt_xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
>                         break;
> @@ -483,13 +493,22 @@ static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk)
>                 while (ret != rcvd) {
>                         if (ret < 0)
>                                 exit_with_error(-ret);
> -                       ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd,
> -                                                    &idx_fq);
> +                       ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
> +               }
> +
> +               if (opt_umem_flags & XDP_UMEM_UNALIGNED_CHUNKS) {
> +                       for (i = 0; i < rcvd; i++) {
> +                               u64 comp_addr = *xsk_ring_cons__comp_addr(&xsk->umem->cq,
> +                                               idx_cq++);
> +                               u64 masked_comp = (comp_addr & ~((u64)opt_buffer_size-1));
> +                               *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) =
> +                                               masked_comp;
> +                       }
> +               } else {
> +                       for (i = 0; i < rcvd; i++)
> +                               *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) =
> +                                               *xsk_ring_cons__comp_addr(&xsk->umem->cq, idx_cq++);
>                 }
> -               for (i = 0; i < rcvd; i++)
> -                       *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) =
> -                               *xsk_ring_cons__comp_addr(&xsk->umem->cq,
> -                                                         idx_cq++);
>
>                 xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
>                 xsk_ring_cons__release(&xsk->umem->cq, rcvd);
> @@ -533,13 +552,25 @@ static void rx_drop(struct xsk_socket_info *xsk)
>                 ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
>         }
>
> -       for (i = 0; i < rcvd; i++) {
> -               u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
> -               u32 len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++)->len;
> -               char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
> +       if (opt_umem_flags & XDP_UMEM_UNALIGNED_CHUNKS) {
> +               for (i = 0; i < rcvd; i++) {
> +                       u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
> +                       u32 len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++)->len;
> +                       char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
> +                       u64 masked_addr = (addr & ~((u64)opt_buffer_size-1));
> +
> +                       hex_dump(pkt, len, addr);
> +                       *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = masked_addr;
> +               }
> +       } else {
> +               for (i = 0; i < rcvd; i++) {
> +                       u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
> +                       u32 len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++)->len;
> +                       char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
>
> -               hex_dump(pkt, len, addr);
> -               *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = addr;
> +                       hex_dump(pkt, len, addr);
> +                       *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = addr;
> +               }
>         }
>
>         xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
> @@ -677,20 +708,20 @@ int main(int argc, char **argv)
>         }
>
>         ret = posix_memalign(&bufs, getpagesize(), /* PAGE_SIZE aligned */
> -                            NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE);
> +                            NUM_FRAMES * opt_buffer_size);
>         if (ret)
>                 exit_with_error(ret);
>
>         /* Create sockets... */
>         umem = xsk_configure_umem(bufs,
> -                                 NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE);
> +                                 NUM_FRAMES * opt_buffer_size);
>         xsks[num_socks++] = xsk_configure_socket(umem);
>
>         if (opt_bench == BENCH_TXONLY) {
>                 int i;
>
> -               for (i = 0; i < NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE;
> -                    i += XSK_UMEM__DEFAULT_FRAME_SIZE)
> +               for (i = 0; i < NUM_FRAMES * opt_buffer_size;
> +                    i += opt_buffer_size)
>                         (void)gen_eth_frame(umem, i);
>         }
>
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH 10/11] samples/bpf: use hugepages in xdpsock app
  2019-06-20  9:09   ` [Intel-wired-lan] " Kevin Laatz
@ 2019-06-24 15:36     ` Björn Töpel
  -1 siblings, 0 replies; 274+ messages in thread
From: Björn Töpel @ 2019-06-24 15:36 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: Netdev, Alexei Starovoitov, Daniel Borkmann,
	Björn Töpel, Karlsson, Magnus, bpf, intel-wired-lan,
	Bruce Richardson, ciara.loftus

On Thu, 20 Jun 2019 at 19:25, Kevin Laatz <kevin.laatz@intel.com> wrote:
>
> This patch modifies xdpsock to use mmap instead of posix_memalign. With
> this change, we can use hugepages when running the application in unaligned
> chunks mode. Using hugepages makes it more likely that we have physically
> contiguous memory, which supports the unaligned chunk mode better.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>

Acked-by: Björn Töpel <bjorn.topel@intel.com>

> ---
>  samples/bpf/xdpsock_user.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
> index 7b4ce047deb2..8ed63ad68428 100644
> --- a/samples/bpf/xdpsock_user.c
> +++ b/samples/bpf/xdpsock_user.c
> @@ -74,6 +74,7 @@ static int opt_interval = 1;
>  static u64 opt_buffer_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
>  static u32 opt_umem_flags;
>  static int opt_unaligned_chunks;
> +static int opt_mmap_flags;
>  static u32 opt_xdp_bind_flags;
>  static __u32 prog_id;
>
> @@ -438,6 +439,7 @@ static void parse_command_line(int argc, char **argv)
>                 case 'u':
>                         opt_umem_flags |= XDP_UMEM_UNALIGNED_CHUNKS;
>                         opt_unaligned_chunks = 1;
> +                       opt_mmap_flags = MAP_HUGETLB;
>                         break;
>                 case 'b':
>                         opt_buffer_size = atoi(optarg);
> @@ -707,11 +709,13 @@ int main(int argc, char **argv)
>                 exit(EXIT_FAILURE);
>         }
>
> -       ret = posix_memalign(&bufs, getpagesize(), /* PAGE_SIZE aligned */
> -                            NUM_FRAMES * opt_buffer_size);
> -       if (ret)
> -               exit_with_error(ret);
> -
> +       /* Reserve memory for the umem. Use hugepages if unaligned chunk mode */
> +       bufs = mmap(NULL, NUM_FRAMES * opt_buffer_size, PROT_READ|PROT_WRITE,
> +                       MAP_PRIVATE|MAP_ANONYMOUS|opt_mmap_flags, -1, 0);
> +       if (bufs == MAP_FAILED) {
> +               printf("ERROR: mmap failed\n");
> +               exit(EXIT_FAILURE);
> +       }
>         /* Create sockets... */
>         umem = xsk_configure_umem(bufs,
>                                   NUM_FRAMES * opt_buffer_size);
> --
> 2.17.1
>
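
One practical note on the hunk above: MAP_HUGETLB mappings typically
require hugepages to be reserved up front (for example via
/proc/sys/vm/nr_hugepages), otherwise mmap() fails. A rough user-space
sketch of reserving the umem area with a fallback to regular pages
(illustrative only, not the sample's code):

  #define _GNU_SOURCE
  #include <stdio.h>
  #include <sys/mman.h>

  static void *reserve_umem(size_t size, int use_hugepages)
  {
      int flags = MAP_PRIVATE | MAP_ANONYMOUS;
      void *bufs;

      if (use_hugepages) {
          bufs = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      flags | MAP_HUGETLB, -1, 0);
          if (bufs != MAP_FAILED)
              return bufs;
          fprintf(stderr, "hugepage mmap failed, using regular pages\n");
      }

      bufs = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
      return bufs == MAP_FAILED ? NULL : bufs;
  }

  int main(void)
  {
      void *area = reserve_umem(1UL << 21, 1); /* 2 MB, one hugepage on x86-64 */

      if (!area)
          return 1;
      munmap(area, 1UL << 21);
      return 0;
  }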

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH 10/11] samples/bpf: use hugepages in xdpsock app
@ 2019-06-24 15:36     ` Björn Töpel
  0 siblings, 0 replies; 274+ messages in thread
From: Björn Töpel @ 2019-06-24 15:36 UTC (permalink / raw)
  To: intel-wired-lan

On Thu, 20 Jun 2019 at 19:25, Kevin Laatz <kevin.laatz@intel.com> wrote:
>
> This patch modifies xdpsock to use mmap instead of posix_memalign. With
> this change, we can use hugepages when running the application in unaligned
> chunks mode. Using hugepages makes it more likely that we have physically
> contiguous memory, which supports the unaligned chunk mode better.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>

Acked-by: Björn Töpel <bjorn.topel@intel.com>

> ---
>  samples/bpf/xdpsock_user.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
> index 7b4ce047deb2..8ed63ad68428 100644
> --- a/samples/bpf/xdpsock_user.c
> +++ b/samples/bpf/xdpsock_user.c
> @@ -74,6 +74,7 @@ static int opt_interval = 1;
>  static u64 opt_buffer_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
>  static u32 opt_umem_flags;
>  static int opt_unaligned_chunks;
> +static int opt_mmap_flags;
>  static u32 opt_xdp_bind_flags;
>  static __u32 prog_id;
>
> @@ -438,6 +439,7 @@ static void parse_command_line(int argc, char **argv)
>                 case 'u':
>                         opt_umem_flags |= XDP_UMEM_UNALIGNED_CHUNKS;
>                         opt_unaligned_chunks = 1;
> +                       opt_mmap_flags = MAP_HUGETLB;
>                         break;
>                 case 'b':
>                         opt_buffer_size = atoi(optarg);
> @@ -707,11 +709,13 @@ int main(int argc, char **argv)
>                 exit(EXIT_FAILURE);
>         }
>
> -       ret = posix_memalign(&bufs, getpagesize(), /* PAGE_SIZE aligned */
> -                            NUM_FRAMES * opt_buffer_size);
> -       if (ret)
> -               exit_with_error(ret);
> -
> +       /* Reserve memory for the umem. Use hugepages if unaligned chunk mode */
> +       bufs = mmap(NULL, NUM_FRAMES * opt_buffer_size, PROT_READ|PROT_WRITE,
> +                       MAP_PRIVATE|MAP_ANONYMOUS|opt_mmap_flags, -1, 0);
> +       if (bufs == MAP_FAILED) {
> +               printf("ERROR: mmap failed\n");
> +               exit(EXIT_FAILURE);
> +       }
>         /* Create sockets... */
>         umem = xsk_configure_umem(bufs,
>                                   NUM_FRAMES * opt_buffer_size);
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH 03/11] xdp: add offset param to zero_copy_allocator
  2019-06-20  9:09   ` [Intel-wired-lan] " Kevin Laatz
@ 2019-06-24 19:23     ` Jakub Kicinski
  -1 siblings, 0 replies; 274+ messages in thread
From: Jakub Kicinski @ 2019-06-24 19:23 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson, bpf,
	intel-wired-lan, bruce.richardson, ciara.loftus

On Thu, 20 Jun 2019 09:09:50 +0000, Kevin Laatz wrote:
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 0f25b3675c5c..ea801fd2bf98 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -53,7 +53,8 @@ struct xdp_mem_info {
>  struct page_pool;
>  
>  struct zero_copy_allocator {
> -	void (*free)(struct zero_copy_allocator *zca, unsigned long handle);
> +	void (*free)(struct zero_copy_allocator *zca, unsigned long handle,
> +			off_t off);
>  };

Please run checkpatch --strict on all your changes.  The code
formatting is incorrect in many ways in this series.

Please include performance measurements proving the slow down
is negligible in the cover letter.

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH 03/11] xdp: add offset param to zero_copy_allocator
@ 2019-06-24 19:23     ` Jakub Kicinski
  0 siblings, 0 replies; 274+ messages in thread
From: Jakub Kicinski @ 2019-06-24 19:23 UTC (permalink / raw)
  To: intel-wired-lan

On Thu, 20 Jun 2019 09:09:50 +0000, Kevin Laatz wrote:
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 0f25b3675c5c..ea801fd2bf98 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -53,7 +53,8 @@ struct xdp_mem_info {
>  struct page_pool;
>  
>  struct zero_copy_allocator {
> -	void (*free)(struct zero_copy_allocator *zca, unsigned long handle);
> +	void (*free)(struct zero_copy_allocator *zca, unsigned long handle,
> +			off_t off);
>  };

Please run checkpatch --strict on all your changes.  The code
formatting is incorrect in many ways in this series.

Please include performance measurements proving the slow down
is negligible in the cover letter.

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH 03/11] xdp: add offset param to zero_copy_allocator
  2019-06-24 19:23     ` [Intel-wired-lan] " Jakub Kicinski
@ 2019-06-25 13:14       ` Laatz, Kevin
  -1 siblings, 0 replies; 274+ messages in thread
From: Laatz, Kevin @ 2019-06-25 13:14 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson, bpf,
	intel-wired-lan, bruce.richardson, ciara.loftus


On 24/06/2019 20:23, Jakub Kicinski wrote:
> On Thu, 20 Jun 2019 09:09:50 +0000, Kevin Laatz wrote:
>> diff --git a/include/net/xdp.h b/include/net/xdp.h
>> index 0f25b3675c5c..ea801fd2bf98 100644
>> --- a/include/net/xdp.h
>> +++ b/include/net/xdp.h
>> @@ -53,7 +53,8 @@ struct xdp_mem_info {
>>  struct page_pool;
>>
>>  struct zero_copy_allocator {
>> -    void (*free)(struct zero_copy_allocator *zca, unsigned long 
>> handle);
>> +    void (*free)(struct zero_copy_allocator *zca, unsigned long handle,
>> +            off_t off);
>>  };
>
> Please run checkpatch --strict on all your changes.  The code
> formatting is incorrect in many ways in this series.
>
Thanks, will fix in the v2.


> Please include performance measurements proving the slow down
> is negligible in the cover letter.
>
Good suggestion. Will add to the cover letter in the v2!

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH 03/11] xdp: add offset param to zero_copy_allocator
@ 2019-06-25 13:14       ` Laatz, Kevin
  0 siblings, 0 replies; 274+ messages in thread
From: Laatz, Kevin @ 2019-06-25 13:14 UTC (permalink / raw)
  To: intel-wired-lan


On 24/06/2019 20:23, Jakub Kicinski wrote:
> On Thu, 20 Jun 2019 09:09:50 +0000, Kevin Laatz wrote:
>> diff --git a/include/net/xdp.h b/include/net/xdp.h
>> index 0f25b3675c5c..ea801fd2bf98 100644
>> --- a/include/net/xdp.h
>> +++ b/include/net/xdp.h
>> @@ -53,7 +53,8 @@ struct xdp_mem_info {
>>  struct page_pool;
>>
>>  struct zero_copy_allocator {
>> -    void (*free)(struct zero_copy_allocator *zca, unsigned long
>> handle);
>> +    void (*free)(struct zero_copy_allocator *zca, unsigned long handle,
>> +            off_t off);
>>  };
>
> Please run checkpatch --strict on all your changes.  The code
> formatting is incorrect in many ways in this series.
>
Thanks, will fix in the v2.


> Please include performance measurements proving the slow down
> is negligible in the cover letter.
>
Good suggestion. Will add to the cover letter in the v2!

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [PATCH v2 00/10] XDP unaligned chunk placement support
  2019-06-20  9:09 ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-16  3:06   ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-16  3:06 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

This patch set adds the ability to use unaligned chunks in the XDP umem.

Currently, all chunk addresses passed to the umem are masked to be chunk
size aligned (default is 2k, max is PAGE_SIZE). This limits where we can
place chunks within the umem as well as limiting the packet sizes that are
supported.

The changes in this patch set remove these restrictions, allowing XDP to
be more flexible in where it can place a chunk within a umem. By relaxing
where the chunks can be placed, it allows us to use an arbitrary buffer
size and place that wherever we have a free address in the umem. These
changes add the ability to support arbitrary frame sizes up to 4k
(PAGE_SIZE) and make it easy to integrate with other existing frameworks
that have their own memory management systems, such as DPDK.

Since we are now dealing with arbitrary frame sizes, we also need to
update how we pass around addresses. Currently, the addresses can simply be
masked to 2k to get back to the original address. This becomes less trivial
when using frame sizes that are not a 'power of 2' size. This patch set
modifies the Rx/Tx descriptor format to use the upper 16-bits of the addr
field for an offset value, leaving the lower 48-bits for the address (this
leaves us with 256 Terabytes, which should be enough!). We only need to use
the upper 16-bits to store the offset when running in unaligned mode.
Rather than adding the offset (headroom etc) to the address, we will store
it in the upper 16-bits of the address field. This way, we can easily add
the offset to the address where we need it, using some bit manipulation and
addition, and we can also easily get the original address wherever we need
it (for example in i40e_zca_free) by simply masking to get the lower
48-bits of the address field.
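
A minimal sketch of that 48/16 split (the macro and helper names below are
illustrative; the defines that actually land in if_xdp.h may differ):

  #include <assert.h>
  #include <stdint.h>

  /* Illustrative only: 48-bit address in the low bits, 16-bit offset above. */
  #define UMEM_ADDR_BITS 48
  #define UMEM_ADDR_MASK ((1ULL << UMEM_ADDR_BITS) - 1)

  static uint64_t pack_desc_addr(uint64_t base, uint64_t offset)
  {
      return (offset << UMEM_ADDR_BITS) | (base & UMEM_ADDR_MASK);
  }

  static uint64_t desc_base(uint64_t addr)
  {
      return addr & UMEM_ADDR_MASK; /* what e.g. i40e_zca_free needs back */
  }

  static uint64_t desc_data(uint64_t addr)
  {
      /* base plus offset gives the start of the packet data */
      return (addr & UMEM_ADDR_MASK) + (addr >> UMEM_ADDR_BITS);
  }

  int main(void)
  {
      uint64_t d = pack_desc_addr(0x12345678ULL, 256); /* 256 B of headroom */

      assert(desc_base(d) == 0x12345678ULL);
      assert(desc_data(d) == 0x12345778ULL);
      return 0;
  }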

The numbers below were recorded with the following set up:
  - Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
  - Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
  - Driver: i40e
  - Application: xdpsock with l2fwd (single interface)

These are solely for comparing performance with and without the patches.
The largest drop was ~1% (in zero-copy mode).

+-------------------------+------------+-----------------+-------------+
| Buffer size: 2048       | SKB mode   | Zero-copy       | Copy        |
+-------------------------+------------+-----------------+-------------+
| Aligned (baseline)      | 1.7 Mpps   | 15.3 Mpps       | 2.08 Mpps   |
+-------------------------+------------+-----------------+-------------+
| Aligned (with patches)  | 1.7 Mpps   | 15.1 Mpps       | 2.08 Mpps   |
+-------------------------+------------+-----------------+-------------+
| Unaligned               | 1.7 Mpps   | 14.5 Mpps       | 2.08 Mpps   |
+-------------------------+------------+-----------------+-------------+ 

NOTE: We are currently working on the changes required in the Mellanox
driver. We will include these in the v3.

Structure of the patchset:
Patch 1:
  - Remove unnecessary masking and headroom addition during zero-copy Rx
    buffer recycling in i40e. This change is required in order for the
    buffer recycling to work in the unaligned chunk mode.

Patch 2:
  - Remove unnecessary masking and headroom addition during
    zero-copy Rx buffer recycling in ixgbe. This change is required in
    order for the  buffer recycling to work in the unaligned chunk mode.

Patch 3:
  - Add infrastructure for unaligned chunks. Since we are dealing with
    unaligned chunks that could potentially cross a physical page boundary,
    we add checks to keep track of that information. We can later use this
    information to correctly handle buffers that are placed at an address
    where they cross a page boundary.  This patch also modifies the
    existing Rx and Tx functions to use the new descriptor format. To
    handle addresses correctly, we need to mask appropriately based on
    whether we are in aligned or unaligned mode.

Patch 4:
  - This patch updates the i40e driver to make use of the new descriptor
    format. The new format is particularly useful here since we can now
    retrieve the original address in places like i40e_zca_free with ease.
    This saves us doing various calculations to get the original address
    back.

Patch 5:
  - This patch updates the ixgbe driver to make use of the new descriptor
    format. The new format is particularly useful here since we can now
    retrieve the original address in places like ixgbe_zca_free with ease.
    This saves us doing various calculations to get the original address
    back.

Patch 6:
  - Add flags for umem configuration to libbpf

Patch 7:
  - Modify xdpsock application to add a command line option for
    unaligned chunks

Patch 8:
  - Since we can now run the application in unaligned chunk mode, we need
    to make sure we recycle the buffers appropriately.

Patch 9:
  - Adds hugepage support to the xdpsock application

Patch 10:
  - Documentation update to include the unaligned chunk scenario. We need
    to explicitly state that the incoming addresses are only masked in the
    aligned chunk mode and not the unaligned chunk mode.

---
v2:
  - fixed checkpatch issues
  - fixed Rx buffer recycling for unaligned chunks in xdpsock
  - removed unused defines
  - fixed how chunk_size is calculated in xsk_diag.c
  - added some performance numbers to cover letter
  - modified descriptor format to make it easier to retrieve original
    address
  - removed patch adding off_t off to the zero copy allocator. This is no
    longer needed with the new descriptor format.

Kevin Laatz (10):
  i40e: simplify Rx buffer recycle
  ixgbe: simplify Rx buffer recycle
  xsk: add support to allow unaligned chunk placement
  i40e: modify driver for handling offsets
  ixgbe: modify driver for handling offsets
  libbpf: add flags to umem config
  samples/bpf: add unaligned chunks mode support to xdpsock
  samples/bpf: add buffer recycling for unaligned chunks to xdpsock
  samples/bpf: use hugepages in xdpsock app
  doc/af_xdp: include unaligned chunk case

 Documentation/networking/af_xdp.rst          | 10 ++-
 drivers/net/ethernet/intel/i40e/i40e_xsk.c   | 39 +++++----
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 39 +++++----
 include/net/xdp_sock.h                       |  2 +
 include/uapi/linux/if_xdp.h                  |  9 ++
 net/xdp/xdp_umem.c                           | 17 ++--
 net/xdp/xsk.c                                | 89 ++++++++++++++++----
 net/xdp/xsk_diag.c                           |  2 +-
 net/xdp/xsk_queue.h                          | 70 +++++++++++++--
 samples/bpf/xdpsock_user.c                   | 61 ++++++++++----
 tools/include/uapi/linux/if_xdp.h            |  4 +
 tools/lib/bpf/xsk.c                          |  3 +
 tools/lib/bpf/xsk.h                          |  2 +
 13 files changed, 266 insertions(+), 81 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH v2 00/10] XDP unaligned chunk placement support
@ 2019-07-16  3:06   ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-16  3:06 UTC (permalink / raw)
  To: intel-wired-lan

This patch set adds the ability to use unaligned chunks in the XDP umem.

Currently, all chunk addresses passed to the umem are masked to be chunk
size aligned (default is 2k, max is PAGE_SIZE). This limits where we can
place chunks within the umem as well as limiting the packet sizes that are
supported.

The changes in this patch set remove these restrictions, allowing XDP to
be more flexible in where it can place a chunk within a umem. By relaxing
where the chunks can be placed, it allows us to use an arbitrary buffer
size and place that wherever we have a free address in the umem. These
changes add the ability to support arbitrary frame sizes up to 4k
(PAGE_SIZE) and make it easy to integrate with other existing frameworks
that have their own memory management systems, such as DPDK.

Since we are now dealing with arbitrary frame sizes, we also need to
update how we pass around addresses. Currently, the addresses can simply be
masked to 2k to get back to the original address. This becomes less trivial
when using frame sizes that are not a 'power of 2' size. This patch set
modifies the Rx/Tx descriptor format to use the upper 16-bits of the addr
field for an offset value, leaving the lower 48-bits for the address (this
leaves us with 256 Terabytes, which should be enough!). We only need to use
the upper 16-bits to store the offset when running in unaligned mode.
Rather than adding the offset (headroom etc) to the address, we will store
it in the upper 16-bits of the address field. This way, we can easily add
the offset to the address where we need it, using some bit manipulation and
addition, and we can also easily get the original address wherever we need
it (for example in i40e_zca_free) by simply masking to get the lower
48-bits of the address field.

The numbers below were recorded with the following set up:
  - Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
  - Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
  - Driver: i40e
  - Application: xdpsock with l2fwd (single interface)

These are solely for comparing performance with and without the patches.
The largest drop was ~1% (in zero-copy mode).

+-------------------------+------------+-----------------+-------------+
| Buffer size: 2048       | SKB mode   | Zero-copy       | Copy        |
+-------------------------+------------+-----------------+-------------+
| Aligned (baseline)      | 1.7 Mpps   | 15.3 Mpps       | 2.08 Mpps   |
+-------------------------+------------+-----------------+-------------+
| Aligned (with patches)  | 1.7 Mpps   | 15.1 Mpps       | 2.08 Mpps   |
+-------------------------+------------+-----------------+-------------+
| Unaligned               | 1.7 Mpps   | 14.5 Mpps       | 2.08 Mpps   |
+-------------------------+------------+-----------------+-------------+ 

NOTE: We are currently working on the changes required in the Mellanox
driver. We will include these in the v3.

Structure of the patchset:
Patch 1:
  - Remove unnecessary masking and headroom addition during zero-copy Rx
    buffer recycling in i40e. This change is required in order for the
    buffer recycling to work in the unaligned chunk mode.

Patch 2:
  - Remove unnecessary masking and headroom addition during
    zero-copy Rx buffer recycling in ixgbe. This change is required in
    order for the  buffer recycling to work in the unaligned chunk mode.

Patch 3:
  - Add infrastructure for unaligned chunks. Since we are dealing with
    unaligned chunks that could potentially cross a physical page boundary,
    we add checks to keep track of that information. We can later use this
    information to correctly handle buffers that are placed at an address
    where they cross a page boundary.  This patch also modifies the
    existing Rx and Tx functions to use the new descriptor format. To
    handle addresses correctly, we need to mask appropriately based on
    whether we are in aligned or unaligned mode.

Patch 4:
  - This patch updates the i40e driver to make use of the new descriptor
    format. The new format is particularly useful here since we can now
    retrieve the original address in places like i40e_zca_free with ease.
    This saves us doing various calculations to get the original address
    back.

Patch 5:
  - This patch updates the ixgbe driver to make use of the new descriptor
    format. The new format is particularly useful here since we can now
    retrieve the original address in places like ixgbe_zca_free with ease.
    This saves us doing various calculations to get the original address
    back.

Patch 6:
  - Add flags for umem configuration to libbpf

Patch 7:
  - Modify xdpsock application to add a command line option for
    unaligned chunks

Patch 8:
  - Since we can now run the application in unaligned chunk mode, we need
    to make sure we recycle the buffers appropriately.

Patch 9:
  - Adds hugepage support to the xdpsock application

Patch 10:
  - Documentation update to include the unaligned chunk scenario. We need
    to explicitly state that the incoming addresses are only masked in the
    aligned chunk mode and not the unaligned chunk mode.

---
v2:
  - fixed checkpatch issues
  - fixed Rx buffer recycling for unaligned chunks in xdpsock
  - removed unused defines
  - fixed how chunk_size is calculated in xsk_diag.c
  - added some performance numbers to cover letter
  - modified descriptor format to make it easier to retrieve original
    address
  - removed patch adding off_t off to the zero copy allocator. This is no
    longer needed with the new descriptor format.

Kevin Laatz (10):
  i40e: simplify Rx buffer recycle
  ixgbe: simplify Rx buffer recycle
  xsk: add support to allow unaligned chunk placement
  i40e: modify driver for handling offsets
  ixgbe: modify driver for handling offsets
  libbpf: add flags to umem config
  samples/bpf: add unaligned chunks mode support to xdpsock
  samples/bpf: add buffer recycling for unaligned chunks to xdpsock
  samples/bpf: use hugepages in xdpsock app
  doc/af_xdp: include unaligned chunk case

 Documentation/networking/af_xdp.rst          | 10 ++-
 drivers/net/ethernet/intel/i40e/i40e_xsk.c   | 39 +++++----
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 39 +++++----
 include/net/xdp_sock.h                       |  2 +
 include/uapi/linux/if_xdp.h                  |  9 ++
 net/xdp/xdp_umem.c                           | 17 ++--
 net/xdp/xsk.c                                | 89 ++++++++++++++++----
 net/xdp/xsk_diag.c                           |  2 +-
 net/xdp/xsk_queue.h                          | 70 +++++++++++++--
 samples/bpf/xdpsock_user.c                   | 61 ++++++++++----
 tools/include/uapi/linux/if_xdp.h            |  4 +
 tools/lib/bpf/xsk.c                          |  3 +
 tools/lib/bpf/xsk.h                          |  2 +
 13 files changed, 266 insertions(+), 81 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 274+ messages in thread

* [PATCH v2 01/10] i40e: simplify Rx buffer recycle
  2019-07-16  3:06   ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-16  3:06     ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-16  3:06 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

Currently, the dma, addr and handle are modified when we reuse Rx buffers
in zero-copy mode. However, this is not required as the inputs to the
function are copies, not the original values themselves. As we use the
copies within the function, we can use the original 'old_bi' values
directly without having to mask and add the headroom.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 32bad014d76c..dfa096db2244 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -420,8 +420,6 @@ static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring,
 				    struct i40e_rx_buffer *old_bi)
 {
 	struct i40e_rx_buffer *new_bi = &rx_ring->rx_bi[rx_ring->next_to_alloc];
-	unsigned long mask = (unsigned long)rx_ring->xsk_umem->chunk_mask;
-	u64 hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
 	u16 nta = rx_ring->next_to_alloc;
 
 	/* update, and store next to alloc */
@@ -429,14 +427,9 @@ static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring,
 	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
 	/* transfer page from old buffer to new buffer */
-	new_bi->dma = old_bi->dma & mask;
-	new_bi->dma += hr;
-
-	new_bi->addr = (void *)((unsigned long)old_bi->addr & mask);
-	new_bi->addr += hr;
-
-	new_bi->handle = old_bi->handle & mask;
-	new_bi->handle += rx_ring->xsk_umem->headroom;
+	new_bi->dma = old_bi->dma;
+	new_bi->addr = old_bi->addr;
+	new_bi->handle = old_bi->handle;
 
 	old_bi->addr = NULL;
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH v2 01/10] i40e: simplify Rx buffer recycle
@ 2019-07-16  3:06     ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-16  3:06 UTC (permalink / raw)
  To: intel-wired-lan

Currently, the dma, addr and handle are modified when we reuse Rx buffers
in zero-copy mode. However, this is not required as the inputs to the
function are copies, not the original values themselves. As we use the
copies within the function, we can use the original 'old_bi' values
directly without having to mask and add the headroom.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 32bad014d76c..dfa096db2244 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -420,8 +420,6 @@ static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring,
 				    struct i40e_rx_buffer *old_bi)
 {
 	struct i40e_rx_buffer *new_bi = &rx_ring->rx_bi[rx_ring->next_to_alloc];
-	unsigned long mask = (unsigned long)rx_ring->xsk_umem->chunk_mask;
-	u64 hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
 	u16 nta = rx_ring->next_to_alloc;
 
 	/* update, and store next to alloc */
@@ -429,14 +427,9 @@ static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring,
 	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
 	/* transfer page from old buffer to new buffer */
-	new_bi->dma = old_bi->dma & mask;
-	new_bi->dma += hr;
-
-	new_bi->addr = (void *)((unsigned long)old_bi->addr & mask);
-	new_bi->addr += hr;
-
-	new_bi->handle = old_bi->handle & mask;
-	new_bi->handle += rx_ring->xsk_umem->headroom;
+	new_bi->dma = old_bi->dma;
+	new_bi->addr = old_bi->addr;
+	new_bi->handle = old_bi->handle;
 
 	old_bi->addr = NULL;
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v2 02/10] ixgbe: simplify Rx buffer recycle
  2019-07-16  3:06   ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-16  3:06     ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-16  3:06 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

Currently, the dma, addr and handle are modified when we reuse Rx buffers
in zero-copy mode. However, this is not required as the inputs to the
function are copies, not the original values themselves. As we use the
copies within the function, we can use the original 'obi' values
directly without having to mask and add the headroom.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index 6b609553329f..bc86057628c8 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -201,8 +201,6 @@ ixgbe_rx_buffer *ixgbe_get_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 static void ixgbe_reuse_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 				     struct ixgbe_rx_buffer *obi)
 {
-	unsigned long mask = (unsigned long)rx_ring->xsk_umem->chunk_mask;
-	u64 hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
 	u16 nta = rx_ring->next_to_alloc;
 	struct ixgbe_rx_buffer *nbi;
 
@@ -212,14 +210,9 @@ static void ixgbe_reuse_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
 	/* transfer page from old buffer to new buffer */
-	nbi->dma = obi->dma & mask;
-	nbi->dma += hr;
-
-	nbi->addr = (void *)((unsigned long)obi->addr & mask);
-	nbi->addr += hr;
-
-	nbi->handle = obi->handle & mask;
-	nbi->handle += rx_ring->xsk_umem->headroom;
+	nbi->dma = obi->dma;
+	nbi->addr = obi->addr;
+	nbi->handle = obi->handle;
 
 	obi->addr = NULL;
 	obi->skb = NULL;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH v2 02/10] ixgbe: simplify Rx buffer recycle
@ 2019-07-16  3:06     ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-16  3:06 UTC (permalink / raw)
  To: intel-wired-lan

Currently, the dma, addr and handle are modified when we reuse Rx buffers
in zero-copy mode. However, this is not required: these fields already
hold their final, headroom-adjusted values, so we can copy the original
'obi' values across directly without having to mask and re-add the
headroom.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index 6b609553329f..bc86057628c8 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -201,8 +201,6 @@ ixgbe_rx_buffer *ixgbe_get_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 static void ixgbe_reuse_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 				     struct ixgbe_rx_buffer *obi)
 {
-	unsigned long mask = (unsigned long)rx_ring->xsk_umem->chunk_mask;
-	u64 hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
 	u16 nta = rx_ring->next_to_alloc;
 	struct ixgbe_rx_buffer *nbi;
 
@@ -212,14 +210,9 @@ static void ixgbe_reuse_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
 	/* transfer page from old buffer to new buffer */
-	nbi->dma = obi->dma & mask;
-	nbi->dma += hr;
-
-	nbi->addr = (void *)((unsigned long)obi->addr & mask);
-	nbi->addr += hr;
-
-	nbi->handle = obi->handle & mask;
-	nbi->handle += rx_ring->xsk_umem->headroom;
+	nbi->dma = obi->dma;
+	nbi->addr = obi->addr;
+	nbi->handle = obi->handle;
 
 	obi->addr = NULL;
 	obi->skb = NULL;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v2 03/10] xsk: add support to allow unaligned chunk placement
  2019-07-16  3:06   ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-16  3:06     ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-16  3:06 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

Currently, addresses are chunk size aligned. This means we are very
restricted in terms of where we can place a chunk within the umem. For
example, if we have a chunk size of 2k, then chunks can only be placed at
0, 2k, 4k, 6k, 8k... and so on (i.e. every 2k starting from 0).

This patch introduces the ability to use unaligned chunks. With these
changes, we are no longer bound to placing chunks at a 2k (or whatever the
chunk size is) interval. Since we are no longer dealing with aligned
chunks, they can now cross page boundaries. Checks for page contiguity
have been added in order to keep track of which pages are followed by a
physically contiguous page.
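
As an editorial aside (not part of the patch; the helper names below are
made up), the new descriptor address format keeps the chunk base in the
lower 48 bits and the offset into the chunk in the upper 16 bits, using
the masks added to if_xdp.h:

	/* Sketch of the unaligned descriptor address layout. */
	static u64 xsk_addr_encode(u64 base, u64 offset)
	{
		return (base & XSK_UNALIGNED_BUF_ADDR_MASK) |
		       (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
	}

	static u64 xsk_addr_decode(u64 desc_addr)
	{
		/* fold the offset back into the base, as the Tx path
		 * does before looking up the DMA/data address
		 */
		return (desc_addr & XSK_UNALIGNED_BUF_ADDR_MASK) +
		       (desc_addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
	}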

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

---
v2:
  - Add checks for the flags coming from userspace
  - Fix how we get chunk_size in xsk_diag.c
  - Add defines for masking the new descriptor format
  - Modified the rx functions to use new descriptor format
  - Modified the tx functions to use new descriptor format
---
 include/net/xdp_sock.h      |  2 +
 include/uapi/linux/if_xdp.h |  9 ++++
 net/xdp/xdp_umem.c          | 17 ++++---
 net/xdp/xsk.c               | 89 ++++++++++++++++++++++++++++++-------
 net/xdp/xsk_diag.c          |  2 +-
 net/xdp/xsk_queue.h         | 70 +++++++++++++++++++++++++----
 6 files changed, 159 insertions(+), 30 deletions(-)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index 69796d264f06..f7ab8ff33f06 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -19,6 +19,7 @@ struct xsk_queue;
 struct xdp_umem_page {
 	void *addr;
 	dma_addr_t dma;
+	bool next_pg_contig;
 };
 
 struct xdp_umem_fq_reuse {
@@ -48,6 +49,7 @@ struct xdp_umem {
 	bool zc;
 	spinlock_t xsk_list_lock;
 	struct list_head xsk_list;
+	u32 flags;
 };
 
 struct xdp_sock {
diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index faaa5ca2a117..f8dc68fcdf78 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -17,6 +17,9 @@
 #define XDP_COPY	(1 << 1) /* Force copy-mode */
 #define XDP_ZEROCOPY	(1 << 2) /* Force zero-copy mode */
 
+/* Flags for xsk_umem_config flags */
+#define XDP_UMEM_UNALIGNED_CHUNKS (1 << 0)
+
 struct sockaddr_xdp {
 	__u16 sxdp_family;
 	__u16 sxdp_flags;
@@ -53,6 +56,7 @@ struct xdp_umem_reg {
 	__u64 len; /* Length of packet data area */
 	__u32 chunk_size;
 	__u32 headroom;
+	__u32 flags;
 };
 
 struct xdp_statistics {
@@ -74,6 +78,11 @@ struct xdp_options {
 #define XDP_UMEM_PGOFF_FILL_RING	0x100000000ULL
 #define XDP_UMEM_PGOFF_COMPLETION_RING	0x180000000ULL
 
+/* Masks for unaligned chunks mode */
+#define XSK_UNALIGNED_BUF_OFFSET_SHIFT 48
+#define XSK_UNALIGNED_BUF_ADDR_MASK \
+	((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1)
+
 /* Rx/Tx descriptor */
 struct xdp_desc {
 	__u64 addr;
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 20c91f02d3d8..6130735bdd3d 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -303,6 +303,7 @@ static int xdp_umem_account_pages(struct xdp_umem *umem)
 
 static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 {
+	bool unaligned_chunks = mr->flags & XDP_UMEM_UNALIGNED_CHUNKS;
 	u32 chunk_size = mr->chunk_size, headroom = mr->headroom;
 	unsigned int chunks, chunks_per_page;
 	u64 addr = mr->addr, size = mr->len;
@@ -318,7 +319,10 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 		return -EINVAL;
 	}
 
-	if (!is_power_of_2(chunk_size))
+	if (mr->flags & ~(XDP_UMEM_UNALIGNED_CHUNKS))
+		return -EINVAL;
+
+	if (!unaligned_chunks && !is_power_of_2(chunk_size))
 		return -EINVAL;
 
 	if (!PAGE_ALIGNED(addr)) {
@@ -335,9 +339,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 	if (chunks == 0)
 		return -EINVAL;
 
-	chunks_per_page = PAGE_SIZE / chunk_size;
-	if (chunks < chunks_per_page || chunks % chunks_per_page)
-		return -EINVAL;
+	if (!unaligned_chunks) {
+		chunks_per_page = PAGE_SIZE / chunk_size;
+		if (chunks < chunks_per_page || chunks % chunks_per_page)
+			return -EINVAL;
+	}
 
 	headroom = ALIGN(headroom, 64);
 
@@ -346,13 +352,14 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 		return -EINVAL;
 
 	umem->address = (unsigned long)addr;
-	umem->chunk_mask = ~((u64)chunk_size - 1);
+	umem->chunk_mask = unaligned_chunks ? U64_MAX : ~((u64)chunk_size - 1);
 	umem->size = size;
 	umem->headroom = headroom;
 	umem->chunk_size_nohr = chunk_size - headroom;
 	umem->npgs = size / PAGE_SIZE;
 	umem->pgs = NULL;
 	umem->user = NULL;
+	umem->flags = mr->flags;
 	INIT_LIST_HEAD(&umem->xsk_list);
 	spin_lock_init(&umem->xsk_list_lock);
 
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index d4d6f10aa936..78089825821a 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -45,7 +45,7 @@ EXPORT_SYMBOL(xsk_umem_has_addrs);
 
 u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
 {
-	return xskq_peek_addr(umem->fq, addr);
+	return xskq_peek_addr(umem->fq, addr, umem);
 }
 EXPORT_SYMBOL(xsk_umem_peek_addr);
 
@@ -55,21 +55,42 @@ void xsk_umem_discard_addr(struct xdp_umem *umem)
 }
 EXPORT_SYMBOL(xsk_umem_discard_addr);
 
+/* If a buffer crosses a page boundary, we need to do 2 memcpy's, one for
+ * each page. This is only required in copy mode.
+ */
+static void __xsk_rcv_memcpy(struct xdp_umem *umem, u64 addr, void *from_buf,
+			     u32 len, u32 metalen)
+{
+	void *to_buf = xdp_umem_get_data(umem, addr);
+
+	if (xskq_crosses_non_contig_pg(umem, addr, len + metalen)) {
+		void *next_pg_addr = umem->pages[(addr >> PAGE_SHIFT) + 1].addr;
+		u64 page_start = addr & (PAGE_SIZE - 1);
+		u64 first_len = PAGE_SIZE - (addr - page_start);
+
+		memcpy(to_buf, from_buf, first_len + metalen);
+		memcpy(next_pg_addr, from_buf + first_len, len - first_len);
+
+		return;
+	}
+
+	memcpy(to_buf, from_buf, len + metalen);
+}
+
 static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 {
-	void *to_buf, *from_buf;
+	u64 offset = xs->umem->headroom;
+	void *from_buf;
 	u32 metalen;
 	u64 addr;
 	int err;
 
-	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
+	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
 	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
 		xs->rx_dropped++;
 		return -ENOSPC;
 	}
 
-	addr += xs->umem->headroom;
-
 	if (unlikely(xdp_data_meta_unsupported(xdp))) {
 		from_buf = xdp->data;
 		metalen = 0;
@@ -78,9 +99,13 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 		metalen = xdp->data - xdp->data_meta;
 	}
 
-	to_buf = xdp_umem_get_data(xs->umem, addr);
-	memcpy(to_buf, from_buf, len + metalen);
-	addr += metalen;
+	__xsk_rcv_memcpy(xs->umem, addr + offset, from_buf, len, metalen);
+
+	offset += metalen;
+	if (xs->umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
+		addr |= offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+	else
+		addr += offset;
 	err = xskq_produce_batch_desc(xs->rx, addr, len);
 	if (!err) {
 		xskq_discard_addr(xs->umem->fq);
@@ -127,6 +152,7 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 	u32 len = xdp->data_end - xdp->data;
 	void *buffer;
 	u64 addr;
+	u64 offset = xs->umem->headroom;
 	int err;
 
 	spin_lock_bh(&xs->rx_lock);
@@ -136,17 +162,20 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 		goto out_unlock;
 	}
 
-	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
+	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
 	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
 		err = -ENOSPC;
 		goto out_drop;
 	}
 
-	addr += xs->umem->headroom;
-
-	buffer = xdp_umem_get_data(xs->umem, addr);
+	buffer = xdp_umem_get_data(xs->umem, addr + offset);
 	memcpy(buffer, xdp->data_meta, len + metalen);
-	addr += metalen;
+	offset += metalen;
+
+	if (xs->umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
+		addr |= offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+	else
+		addr += offset;
 	err = xskq_produce_batch_desc(xs->rx, addr, len);
 	if (err)
 		goto out_drop;
@@ -190,7 +219,7 @@ bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc)
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(xs, &umem->xsk_list, list) {
-		if (!xskq_peek_desc(xs->tx, desc))
+		if (!xskq_peek_desc(xs->tx, desc, umem))
 			continue;
 
 		if (xskq_produce_addr_lazy(umem->cq, desc->addr))
@@ -240,7 +269,7 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
 
 	mutex_lock(&xs->mutex);
 
-	while (xskq_peek_desc(xs->tx, &desc)) {
+	while (xskq_peek_desc(xs->tx, &desc, xs->umem)) {
 		char *buffer;
 		u64 addr;
 		u32 len;
@@ -265,6 +294,10 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
 
 		skb_put(skb, len);
 		addr = desc.addr;
+		if (xs->umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
+			addr = (addr & XSK_UNALIGNED_BUF_ADDR_MASK) |
+				(addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
+
 		buffer = xdp_umem_get_data(xs->umem, addr);
 		err = skb_store_bits(skb, 0, buffer, len);
 		if (unlikely(err)) {
@@ -275,7 +308,7 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
 		skb->dev = xs->dev;
 		skb->priority = sk->sk_priority;
 		skb->mark = sk->sk_mark;
-		skb_shinfo(skb)->destructor_arg = (void *)(long)addr;
+		skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
 		skb->destructor = xsk_destruct_skb;
 
 		err = dev_direct_xmit(skb, xs->queue_id);
@@ -415,6 +448,28 @@ static struct socket *xsk_lookup_xsk_from_fd(int fd)
 	return sock;
 }
 
+/* Check if umem pages are contiguous.
+ * If zero-copy mode, use the DMA address to do the page contiguity check
+ * For all other modes we use addr (kernel virtual address)
+ */
+static void xsk_check_page_contiguity(struct xdp_umem *umem, u32 flags)
+{
+	int i;
+
+	if (flags & XDP_ZEROCOPY) {
+		for (i = 0; i < umem->npgs - 1; i++)
+			umem->pages[i].next_pg_contig =
+					(umem->pages[i].dma + PAGE_SIZE ==
+						umem->pages[i + 1].dma);
+		return;
+	}
+
+	for (i = 0; i < umem->npgs - 1; i++)
+		umem->pages[i].next_pg_contig =
+				(umem->pages[i].addr + PAGE_SIZE ==
+					umem->pages[i + 1].addr);
+}
+
 static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 {
 	struct sockaddr_xdp *sxdp = (struct sockaddr_xdp *)addr;
@@ -502,6 +557,8 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 		err = xdp_umem_assign_dev(xs->umem, dev, qid, flags);
 		if (err)
 			goto out_unlock;
+
+		xsk_check_page_contiguity(xs->umem, flags);
 	}
 
 	xs->dev = dev;
diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c
index d5e06c8e0cbf..9986a759fe06 100644
--- a/net/xdp/xsk_diag.c
+++ b/net/xdp/xsk_diag.c
@@ -56,7 +56,7 @@ static int xsk_diag_put_umem(const struct xdp_sock *xs, struct sk_buff *nlskb)
 	du.id = umem->id;
 	du.size = umem->size;
 	du.num_pages = umem->npgs;
-	du.chunk_size = (__u32)(~umem->chunk_mask + 1);
+	du.chunk_size = umem->chunk_size_nohr + umem->headroom;
 	du.headroom = umem->headroom;
 	du.ifindex = umem->dev ? umem->dev->ifindex : 0;
 	du.queue_id = umem->queue_id;
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 909c5168ed0f..04afc9de86d9 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -133,6 +133,16 @@ static inline bool xskq_has_addrs(struct xsk_queue *q, u32 cnt)
 
 /* UMEM queue */
 
+static inline bool xskq_crosses_non_contig_pg(struct xdp_umem *umem, u64 addr,
+					      u64 length)
+{
+	bool cross_pg = (addr & (PAGE_SIZE - 1)) + length > PAGE_SIZE;
+	bool next_pg_contig =
+		umem->pages[(addr >> PAGE_SHIFT) + 1].next_pg_contig;
+
+	return cross_pg && !next_pg_contig;
+}
+
 static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
 {
 	if (addr >= q->size) {
@@ -143,23 +153,52 @@ static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
 	return true;
 }
 
-static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr)
+static inline bool xskq_is_valid_addr_unaligned(struct xsk_queue *q, u64 addr,
+						u64 length,
+						struct xdp_umem *umem)
+{
+	addr += addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+	addr &= XSK_UNALIGNED_BUF_ADDR_MASK;
+	if (addr >= q->size ||
+	    xskq_crosses_non_contig_pg(umem, addr, length)) {
+		q->invalid_descs++;
+		return false;
+	}
+
+	return true;
+}
+
+static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr,
+				      struct xdp_umem *umem)
 {
 	while (q->cons_tail != q->cons_head) {
 		struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
 		unsigned int idx = q->cons_tail & q->ring_mask;
 
 		*addr = READ_ONCE(ring->desc[idx]) & q->chunk_mask;
+		if (*addr & (~XSK_UNALIGNED_BUF_ADDR_MASK))
+			goto out;
+
+		if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS) {
+			if (xskq_is_valid_addr_unaligned(q, *addr,
+							 umem->chunk_size_nohr,
+							 umem))
+				return addr;
+			goto out;
+		}
+
 		if (xskq_is_valid_addr(q, *addr))
 			return addr;
 
+out:
 		q->cons_tail++;
 	}
 
 	return NULL;
 }
 
-static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
+static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr,
+				  struct xdp_umem *umem)
 {
 	if (q->cons_tail == q->cons_head) {
 		smp_mb(); /* D, matches A */
@@ -170,7 +209,7 @@ static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
 		smp_rmb();
 	}
 
-	return xskq_validate_addr(q, addr);
+	return xskq_validate_addr(q, addr, umem);
 }
 
 static inline void xskq_discard_addr(struct xsk_queue *q)
@@ -229,8 +268,21 @@ static inline int xskq_reserve_addr(struct xsk_queue *q)
 
 /* Rx/Tx queue */
 
-static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
+static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d,
+				      struct xdp_umem *umem)
 {
+	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS) {
+		if (!xskq_is_valid_addr_unaligned(q, d->addr, d->len, umem))
+			return false;
+
+		if (d->len > umem->chunk_size_nohr || d->options) {
+			q->invalid_descs++;
+			return false;
+		}
+
+		return true;
+	}
+
 	if (!xskq_is_valid_addr(q, d->addr))
 		return false;
 
@@ -244,14 +296,15 @@ static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
 }
 
 static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
-						  struct xdp_desc *desc)
+						  struct xdp_desc *desc,
+						  struct xdp_umem *umem)
 {
 	while (q->cons_tail != q->cons_head) {
 		struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
 		unsigned int idx = q->cons_tail & q->ring_mask;
 
 		*desc = READ_ONCE(ring->desc[idx]);
-		if (xskq_is_valid_desc(q, desc))
+		if (xskq_is_valid_desc(q, desc, umem))
 			return desc;
 
 		q->cons_tail++;
@@ -261,7 +314,8 @@ static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
 }
 
 static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
-					      struct xdp_desc *desc)
+					      struct xdp_desc *desc,
+					      struct xdp_umem *umem)
 {
 	if (q->cons_tail == q->cons_head) {
 		smp_mb(); /* D, matches A */
@@ -272,7 +326,7 @@ static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
 		smp_rmb(); /* C, matches B */
 	}
 
-	return xskq_validate_desc(q, desc);
+	return xskq_validate_desc(q, desc, umem);
 }
 
 static inline void xskq_discard_desc(struct xsk_queue *q)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH v2 03/10] xsk: add support to allow unaligned chunk placement
@ 2019-07-16  3:06     ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-16  3:06 UTC (permalink / raw)
  To: intel-wired-lan

Currently, addresses are chunk size aligned. This means we are very
restricted in terms of where we can place a chunk within the umem. For
example, if we have a chunk size of 2k, then chunks can only be placed at
0, 2k, 4k, 6k, 8k... and so on (i.e. every 2k starting from 0).

This patch introduces the ability to use unaligned chunks. With these
changes, we are no longer bound to placing chunks at a 2k (or whatever the
chunk size is) interval. Since we are no longer dealing with aligned
chunks, they can now cross page boundaries. Checks for page contiguity
have been added in order to keep track of which pages are followed by a
physically contiguous page.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

---
v2:
  - Add checks for the flags coming from userspace
  - Fix how we get chunk_size in xsk_diag.c
  - Add defines for masking the new descriptor format
  - Modified the rx functions to use new descriptor format
  - Modified the tx functions to use new descriptor format
---
 include/net/xdp_sock.h      |  2 +
 include/uapi/linux/if_xdp.h |  9 ++++
 net/xdp/xdp_umem.c          | 17 ++++---
 net/xdp/xsk.c               | 89 ++++++++++++++++++++++++++++++-------
 net/xdp/xsk_diag.c          |  2 +-
 net/xdp/xsk_queue.h         | 70 +++++++++++++++++++++++++----
 6 files changed, 159 insertions(+), 30 deletions(-)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index 69796d264f06..f7ab8ff33f06 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -19,6 +19,7 @@ struct xsk_queue;
 struct xdp_umem_page {
 	void *addr;
 	dma_addr_t dma;
+	bool next_pg_contig;
 };
 
 struct xdp_umem_fq_reuse {
@@ -48,6 +49,7 @@ struct xdp_umem {
 	bool zc;
 	spinlock_t xsk_list_lock;
 	struct list_head xsk_list;
+	u32 flags;
 };
 
 struct xdp_sock {
diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index faaa5ca2a117..f8dc68fcdf78 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -17,6 +17,9 @@
 #define XDP_COPY	(1 << 1) /* Force copy-mode */
 #define XDP_ZEROCOPY	(1 << 2) /* Force zero-copy mode */
 
+/* Flags for xsk_umem_config flags */
+#define XDP_UMEM_UNALIGNED_CHUNKS (1 << 0)
+
 struct sockaddr_xdp {
 	__u16 sxdp_family;
 	__u16 sxdp_flags;
@@ -53,6 +56,7 @@ struct xdp_umem_reg {
 	__u64 len; /* Length of packet data area */
 	__u32 chunk_size;
 	__u32 headroom;
+	__u32 flags;
 };
 
 struct xdp_statistics {
@@ -74,6 +78,11 @@ struct xdp_options {
 #define XDP_UMEM_PGOFF_FILL_RING	0x100000000ULL
 #define XDP_UMEM_PGOFF_COMPLETION_RING	0x180000000ULL
 
+/* Masks for unaligned chunks mode */
+#define XSK_UNALIGNED_BUF_OFFSET_SHIFT 48
+#define XSK_UNALIGNED_BUF_ADDR_MASK \
+	((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1)
+
 /* Rx/Tx descriptor */
 struct xdp_desc {
 	__u64 addr;
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 20c91f02d3d8..6130735bdd3d 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -303,6 +303,7 @@ static int xdp_umem_account_pages(struct xdp_umem *umem)
 
 static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 {
+	bool unaligned_chunks = mr->flags & XDP_UMEM_UNALIGNED_CHUNKS;
 	u32 chunk_size = mr->chunk_size, headroom = mr->headroom;
 	unsigned int chunks, chunks_per_page;
 	u64 addr = mr->addr, size = mr->len;
@@ -318,7 +319,10 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 		return -EINVAL;
 	}
 
-	if (!is_power_of_2(chunk_size))
+	if (mr->flags & ~(XDP_UMEM_UNALIGNED_CHUNKS))
+		return -EINVAL;
+
+	if (!unaligned_chunks && !is_power_of_2(chunk_size))
 		return -EINVAL;
 
 	if (!PAGE_ALIGNED(addr)) {
@@ -335,9 +339,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 	if (chunks == 0)
 		return -EINVAL;
 
-	chunks_per_page = PAGE_SIZE / chunk_size;
-	if (chunks < chunks_per_page || chunks % chunks_per_page)
-		return -EINVAL;
+	if (!unaligned_chunks) {
+		chunks_per_page = PAGE_SIZE / chunk_size;
+		if (chunks < chunks_per_page || chunks % chunks_per_page)
+			return -EINVAL;
+	}
 
 	headroom = ALIGN(headroom, 64);
 
@@ -346,13 +352,14 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 		return -EINVAL;
 
 	umem->address = (unsigned long)addr;
-	umem->chunk_mask = ~((u64)chunk_size - 1);
+	umem->chunk_mask = unaligned_chunks ? U64_MAX : ~((u64)chunk_size - 1);
 	umem->size = size;
 	umem->headroom = headroom;
 	umem->chunk_size_nohr = chunk_size - headroom;
 	umem->npgs = size / PAGE_SIZE;
 	umem->pgs = NULL;
 	umem->user = NULL;
+	umem->flags = mr->flags;
 	INIT_LIST_HEAD(&umem->xsk_list);
 	spin_lock_init(&umem->xsk_list_lock);
 
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index d4d6f10aa936..78089825821a 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -45,7 +45,7 @@ EXPORT_SYMBOL(xsk_umem_has_addrs);
 
 u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
 {
-	return xskq_peek_addr(umem->fq, addr);
+	return xskq_peek_addr(umem->fq, addr, umem);
 }
 EXPORT_SYMBOL(xsk_umem_peek_addr);
 
@@ -55,21 +55,42 @@ void xsk_umem_discard_addr(struct xdp_umem *umem)
 }
 EXPORT_SYMBOL(xsk_umem_discard_addr);
 
+/* If a buffer crosses a page boundary, we need to do 2 memcpy's, one for
+ * each page. This is only required in copy mode.
+ */
+static void __xsk_rcv_memcpy(struct xdp_umem *umem, u64 addr, void *from_buf,
+			     u32 len, u32 metalen)
+{
+	void *to_buf = xdp_umem_get_data(umem, addr);
+
+	if (xskq_crosses_non_contig_pg(umem, addr, len + metalen)) {
+		void *next_pg_addr = umem->pages[(addr >> PAGE_SHIFT) + 1].addr;
+		u64 page_start = addr & (PAGE_SIZE - 1);
+		u64 first_len = PAGE_SIZE - (addr - page_start);
+
+		memcpy(to_buf, from_buf, first_len + metalen);
+		memcpy(next_pg_addr, from_buf + first_len, len - first_len);
+
+		return;
+	}
+
+	memcpy(to_buf, from_buf, len + metalen);
+}
+
 static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 {
-	void *to_buf, *from_buf;
+	u64 offset = xs->umem->headroom;
+	void *from_buf;
 	u32 metalen;
 	u64 addr;
 	int err;
 
-	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
+	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
 	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
 		xs->rx_dropped++;
 		return -ENOSPC;
 	}
 
-	addr += xs->umem->headroom;
-
 	if (unlikely(xdp_data_meta_unsupported(xdp))) {
 		from_buf = xdp->data;
 		metalen = 0;
@@ -78,9 +99,13 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 		metalen = xdp->data - xdp->data_meta;
 	}
 
-	to_buf = xdp_umem_get_data(xs->umem, addr);
-	memcpy(to_buf, from_buf, len + metalen);
-	addr += metalen;
+	__xsk_rcv_memcpy(xs->umem, addr + offset, from_buf, len, metalen);
+
+	offset += metalen;
+	if (xs->umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
+		addr |= offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+	else
+		addr += offset;
 	err = xskq_produce_batch_desc(xs->rx, addr, len);
 	if (!err) {
 		xskq_discard_addr(xs->umem->fq);
@@ -127,6 +152,7 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 	u32 len = xdp->data_end - xdp->data;
 	void *buffer;
 	u64 addr;
+	u64 offset = xs->umem->headroom;
 	int err;
 
 	spin_lock_bh(&xs->rx_lock);
@@ -136,17 +162,20 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 		goto out_unlock;
 	}
 
-	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
+	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
 	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
 		err = -ENOSPC;
 		goto out_drop;
 	}
 
-	addr += xs->umem->headroom;
-
-	buffer = xdp_umem_get_data(xs->umem, addr);
+	buffer = xdp_umem_get_data(xs->umem, addr + offset);
 	memcpy(buffer, xdp->data_meta, len + metalen);
-	addr += metalen;
+	offset += metalen;
+
+	if (xs->umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
+		addr |= offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+	else
+		addr += offset;
 	err = xskq_produce_batch_desc(xs->rx, addr, len);
 	if (err)
 		goto out_drop;
@@ -190,7 +219,7 @@ bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc)
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(xs, &umem->xsk_list, list) {
-		if (!xskq_peek_desc(xs->tx, desc))
+		if (!xskq_peek_desc(xs->tx, desc, umem))
 			continue;
 
 		if (xskq_produce_addr_lazy(umem->cq, desc->addr))
@@ -240,7 +269,7 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
 
 	mutex_lock(&xs->mutex);
 
-	while (xskq_peek_desc(xs->tx, &desc)) {
+	while (xskq_peek_desc(xs->tx, &desc, xs->umem)) {
 		char *buffer;
 		u64 addr;
 		u32 len;
@@ -265,6 +294,10 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
 
 		skb_put(skb, len);
 		addr = desc.addr;
+		if (xs->umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
+			addr = (addr & XSK_UNALIGNED_BUF_ADDR_MASK) |
+				(addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
+
 		buffer = xdp_umem_get_data(xs->umem, addr);
 		err = skb_store_bits(skb, 0, buffer, len);
 		if (unlikely(err)) {
@@ -275,7 +308,7 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
 		skb->dev = xs->dev;
 		skb->priority = sk->sk_priority;
 		skb->mark = sk->sk_mark;
-		skb_shinfo(skb)->destructor_arg = (void *)(long)addr;
+		skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
 		skb->destructor = xsk_destruct_skb;
 
 		err = dev_direct_xmit(skb, xs->queue_id);
@@ -415,6 +448,28 @@ static struct socket *xsk_lookup_xsk_from_fd(int fd)
 	return sock;
 }
 
+/* Check if umem pages are contiguous.
+ * If zero-copy mode, use the DMA address to do the page contiguity check
+ * For all other modes we use addr (kernel virtual address)
+ */
+static void xsk_check_page_contiguity(struct xdp_umem *umem, u32 flags)
+{
+	int i;
+
+	if (flags & XDP_ZEROCOPY) {
+		for (i = 0; i < umem->npgs - 1; i++)
+			umem->pages[i].next_pg_contig =
+					(umem->pages[i].dma + PAGE_SIZE ==
+						umem->pages[i + 1].dma);
+		return;
+	}
+
+	for (i = 0; i < umem->npgs - 1; i++)
+		umem->pages[i].next_pg_contig =
+				(umem->pages[i].addr + PAGE_SIZE ==
+					umem->pages[i + 1].addr);
+}
+
 static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 {
 	struct sockaddr_xdp *sxdp = (struct sockaddr_xdp *)addr;
@@ -502,6 +557,8 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 		err = xdp_umem_assign_dev(xs->umem, dev, qid, flags);
 		if (err)
 			goto out_unlock;
+
+		xsk_check_page_contiguity(xs->umem, flags);
 	}
 
 	xs->dev = dev;
diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c
index d5e06c8e0cbf..9986a759fe06 100644
--- a/net/xdp/xsk_diag.c
+++ b/net/xdp/xsk_diag.c
@@ -56,7 +56,7 @@ static int xsk_diag_put_umem(const struct xdp_sock *xs, struct sk_buff *nlskb)
 	du.id = umem->id;
 	du.size = umem->size;
 	du.num_pages = umem->npgs;
-	du.chunk_size = (__u32)(~umem->chunk_mask + 1);
+	du.chunk_size = umem->chunk_size_nohr + umem->headroom;
 	du.headroom = umem->headroom;
 	du.ifindex = umem->dev ? umem->dev->ifindex : 0;
 	du.queue_id = umem->queue_id;
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 909c5168ed0f..04afc9de86d9 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -133,6 +133,16 @@ static inline bool xskq_has_addrs(struct xsk_queue *q, u32 cnt)
 
 /* UMEM queue */
 
+static inline bool xskq_crosses_non_contig_pg(struct xdp_umem *umem, u64 addr,
+					      u64 length)
+{
+	bool cross_pg = (addr & (PAGE_SIZE - 1)) + length > PAGE_SIZE;
+	bool next_pg_contig =
+		umem->pages[(addr >> PAGE_SHIFT) + 1].next_pg_contig;
+
+	return cross_pg && !next_pg_contig;
+}
+
 static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
 {
 	if (addr >= q->size) {
@@ -143,23 +153,52 @@ static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
 	return true;
 }
 
-static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr)
+static inline bool xskq_is_valid_addr_unaligned(struct xsk_queue *q, u64 addr,
+						u64 length,
+						struct xdp_umem *umem)
+{
+	addr += addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+	addr &= XSK_UNALIGNED_BUF_ADDR_MASK;
+	if (addr >= q->size ||
+	    xskq_crosses_non_contig_pg(umem, addr, length)) {
+		q->invalid_descs++;
+		return false;
+	}
+
+	return true;
+}
+
+static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr,
+				      struct xdp_umem *umem)
 {
 	while (q->cons_tail != q->cons_head) {
 		struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
 		unsigned int idx = q->cons_tail & q->ring_mask;
 
 		*addr = READ_ONCE(ring->desc[idx]) & q->chunk_mask;
+		if (*addr & (~XSK_UNALIGNED_BUF_ADDR_MASK))
+			goto out;
+
+		if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS) {
+			if (xskq_is_valid_addr_unaligned(q, *addr,
+							 umem->chunk_size_nohr,
+							 umem))
+				return addr;
+			goto out;
+		}
+
 		if (xskq_is_valid_addr(q, *addr))
 			return addr;
 
+out:
 		q->cons_tail++;
 	}
 
 	return NULL;
 }
 
-static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
+static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr,
+				  struct xdp_umem *umem)
 {
 	if (q->cons_tail == q->cons_head) {
 		smp_mb(); /* D, matches A */
@@ -170,7 +209,7 @@ static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
 		smp_rmb();
 	}
 
-	return xskq_validate_addr(q, addr);
+	return xskq_validate_addr(q, addr, umem);
 }
 
 static inline void xskq_discard_addr(struct xsk_queue *q)
@@ -229,8 +268,21 @@ static inline int xskq_reserve_addr(struct xsk_queue *q)
 
 /* Rx/Tx queue */
 
-static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
+static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d,
+				      struct xdp_umem *umem)
 {
+	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS) {
+		if (!xskq_is_valid_addr_unaligned(q, d->addr, d->len, umem))
+			return false;
+
+		if (d->len > umem->chunk_size_nohr || d->options) {
+			q->invalid_descs++;
+			return false;
+		}
+
+		return true;
+	}
+
 	if (!xskq_is_valid_addr(q, d->addr))
 		return false;
 
@@ -244,14 +296,15 @@ static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
 }
 
 static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
-						  struct xdp_desc *desc)
+						  struct xdp_desc *desc,
+						  struct xdp_umem *umem)
 {
 	while (q->cons_tail != q->cons_head) {
 		struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
 		unsigned int idx = q->cons_tail & q->ring_mask;
 
 		*desc = READ_ONCE(ring->desc[idx]);
-		if (xskq_is_valid_desc(q, desc))
+		if (xskq_is_valid_desc(q, desc, umem))
 			return desc;
 
 		q->cons_tail++;
@@ -261,7 +314,8 @@ static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
 }
 
 static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
-					      struct xdp_desc *desc)
+					      struct xdp_desc *desc,
+					      struct xdp_umem *umem)
 {
 	if (q->cons_tail == q->cons_head) {
 		smp_mb(); /* D, matches A */
@@ -272,7 +326,7 @@ static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
 		smp_rmb(); /* C, matches B */
 	}
 
-	return xskq_validate_desc(q, desc);
+	return xskq_validate_desc(q, desc, umem);
 }
 
 static inline void xskq_discard_desc(struct xsk_queue *q)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v2 04/10] i40e: modify driver for handling offsets
  2019-07-16  3:06   ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-16  3:06     ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-16  3:06 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

With the addition of the unaligned chunks option, we need to make sure we
handle the offsets correctly based on the mode we are currently running
in. This patch modifies the driver to mask the address appropriately for
each case.
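
For reference, an editorial condensation (not part of the patch; the
helper name is made up) of the masking decision the driver now makes when
recovering the chunk base from a handle:

	static u64 chunk_base_from_handle(struct xdp_umem *umem, u64 handle,
					  u64 chunk_mask)
	{
		if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
			return handle & XSK_UNALIGNED_BUF_ADDR_MASK;
		return handle & chunk_mask;
	}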

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 26 +++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index dfa096db2244..b8316e9ba159 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -190,7 +190,9 @@ int i40e_xsk_umem_setup(struct i40e_vsi *vsi, struct xdp_umem *umem,
  **/
 static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 {
+	struct xdp_umem *umem = rx_ring->xsk_umem;
 	int err, result = I40E_XDP_PASS;
+	u64 offset = umem->headroom;
 	struct i40e_ring *xdp_ring;
 	struct bpf_prog *xdp_prog;
 	u32 act;
@@ -201,7 +203,13 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 	 */
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
 	act = bpf_prog_run_xdp(xdp_prog, xdp);
-	xdp->handle += xdp->data - xdp->data_hard_start;
+	offset += xdp->data - xdp->data_hard_start;
+
+	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
+		xdp->handle |= (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
+	else
+		xdp->handle += offset;
+
 	switch (act) {
 	case XDP_PASS:
 		break;
@@ -262,7 +270,7 @@ static bool i40e_alloc_buffer_zc(struct i40e_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr(umem);
 	return true;
@@ -299,7 +307,7 @@ static bool i40e_alloc_buffer_slow_zc(struct i40e_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr_rq(umem);
 	return true;
@@ -456,7 +464,10 @@ void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
 	nta++;
 	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
-	handle &= mask;
+	if (rx_ring->xsk_umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
+		handle &= XSK_UNALIGNED_BUF_ADDR_MASK;
+	else
+		handle &= mask;
 
 	bi->dma = xdp_umem_get_dma(rx_ring->xsk_umem, handle);
 	bi->dma += hr;
@@ -635,6 +646,7 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
 	struct i40e_tx_buffer *tx_bi;
 	bool work_done = true;
 	struct xdp_desc desc;
+	u64 addr, offset;
 	dma_addr_t dma;
 
 	while (budget-- > 0) {
@@ -647,7 +659,11 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
 		if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &desc))
 			break;
 
-		dma = xdp_umem_get_dma(xdp_ring->xsk_umem, desc.addr);
+		/* for unaligned chunks need to take offset from upper bits */
+		offset = (desc.addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
+		addr = (desc.addr & XSK_UNALIGNED_BUF_ADDR_MASK);
+
+		dma = xdp_umem_get_dma(xdp_ring->xsk_umem, addr + offset);
 
 		dma_sync_single_for_device(xdp_ring->dev, dma, desc.len,
 					   DMA_BIDIRECTIONAL);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH v2 04/10] i40e: modify driver for handling offsets
@ 2019-07-16  3:06     ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-16  3:06 UTC (permalink / raw)
  To: intel-wired-lan

With the addition of the unaligned chunks option, we need to make sure we
handle the offsets correctly based on the mode we are currently running
in. This patch modifies the driver to mask the address appropriately for
each case.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 26 +++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index dfa096db2244..b8316e9ba159 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -190,7 +190,9 @@ int i40e_xsk_umem_setup(struct i40e_vsi *vsi, struct xdp_umem *umem,
  **/
 static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 {
+	struct xdp_umem *umem = rx_ring->xsk_umem;
 	int err, result = I40E_XDP_PASS;
+	u64 offset = umem->headroom;
 	struct i40e_ring *xdp_ring;
 	struct bpf_prog *xdp_prog;
 	u32 act;
@@ -201,7 +203,13 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 	 */
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
 	act = bpf_prog_run_xdp(xdp_prog, xdp);
-	xdp->handle += xdp->data - xdp->data_hard_start;
+	offset += xdp->data - xdp->data_hard_start;
+
+	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
+		xdp->handle |= (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
+	else
+		xdp->handle += offset;
+
 	switch (act) {
 	case XDP_PASS:
 		break;
@@ -262,7 +270,7 @@ static bool i40e_alloc_buffer_zc(struct i40e_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr(umem);
 	return true;
@@ -299,7 +307,7 @@ static bool i40e_alloc_buffer_slow_zc(struct i40e_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr_rq(umem);
 	return true;
@@ -456,7 +464,10 @@ void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
 	nta++;
 	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
-	handle &= mask;
+	if (rx_ring->xsk_umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
+		handle &= XSK_UNALIGNED_BUF_ADDR_MASK;
+	else
+		handle &= mask;
 
 	bi->dma = xdp_umem_get_dma(rx_ring->xsk_umem, handle);
 	bi->dma += hr;
@@ -635,6 +646,7 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
 	struct i40e_tx_buffer *tx_bi;
 	bool work_done = true;
 	struct xdp_desc desc;
+	u64 addr, offset;
 	dma_addr_t dma;
 
 	while (budget-- > 0) {
@@ -647,7 +659,11 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
 		if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &desc))
 			break;
 
-		dma = xdp_umem_get_dma(xdp_ring->xsk_umem, desc.addr);
+		/* for unaligned chunks need to take offset from upper bits */
+		offset = (desc.addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
+		addr = (desc.addr & XSK_UNALIGNED_BUF_ADDR_MASK);
+
+		dma = xdp_umem_get_dma(xdp_ring->xsk_umem, addr + offset);
 
 		dma_sync_single_for_device(xdp_ring->dev, dma, desc.len,
 					   DMA_BIDIRECTIONAL);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v2 05/10] ixgbe: modify driver for handling offsets
  2019-07-16  3:06   ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-16  3:06     ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-16  3:06 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

With the addition of the unaligned chunks option, we need to make sure we
handle the offsets correctly based on the mode we are currently running
in. This patch modifies the driver to mask the address appropriately for
each case.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 26 ++++++++++++++++----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index bc86057628c8..ac1669b18d13 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -143,7 +143,9 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
 			    struct ixgbe_ring *rx_ring,
 			    struct xdp_buff *xdp)
 {
+	struct xdp_umem *umem = rx_ring->xsk_umem;
 	int err, result = IXGBE_XDP_PASS;
+	u64 offset = umem->headroom;
 	struct bpf_prog *xdp_prog;
 	struct xdp_frame *xdpf;
 	u32 act;
@@ -151,7 +153,13 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
 	rcu_read_lock();
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
 	act = bpf_prog_run_xdp(xdp_prog, xdp);
-	xdp->handle += xdp->data - xdp->data_hard_start;
+	offset += xdp->data - xdp->data_hard_start;
+
+	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
+		xdp->handle |= (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
+	else
+		xdp->handle += offset;
+
 	switch (act) {
 	case XDP_PASS:
 		break;
@@ -235,7 +243,10 @@ void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
 	nta++;
 	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
-	handle &= mask;
+	if (rx_ring->xsk_umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
+		handle &= XSK_UNALIGNED_BUF_ADDR_MASK;
+	else
+		handle &= mask;
 
 	bi->dma = xdp_umem_get_dma(rx_ring->xsk_umem, handle);
 	bi->dma += hr;
@@ -269,7 +280,7 @@ static bool ixgbe_alloc_buffer_zc(struct ixgbe_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr(umem);
 	return true;
@@ -296,7 +307,7 @@ static bool ixgbe_alloc_buffer_slow_zc(struct ixgbe_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr_rq(umem);
 	return true;
@@ -565,6 +576,7 @@ static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
 	struct ixgbe_tx_buffer *tx_bi;
 	bool work_done = true;
 	struct xdp_desc desc;
+	u64 addr, offset;
 	dma_addr_t dma;
 	u32 cmd_type;
 
@@ -578,7 +590,11 @@ static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
 		if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &desc))
 			break;
 
-		dma = xdp_umem_get_dma(xdp_ring->xsk_umem, desc.addr);
+		/* for unaligned chunks need to take offset from upper bits */
+		offset = (desc.addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
+		addr = (desc.addr & XSK_UNALIGNED_BUF_ADDR_MASK);
+
+		dma = xdp_umem_get_dma(xdp_ring->xsk_umem, addr + offset);
 
 		dma_sync_single_for_device(xdp_ring->dev, dma, desc.len,
 					   DMA_BIDIRECTIONAL);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH v2 05/10] ixgbe: modify driver for handling offsets
@ 2019-07-16  3:06     ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-16  3:06 UTC (permalink / raw)
  To: intel-wired-lan

With the addition of the unaligned chunks option, we need to make sure we
handle the offsets correctly based on the mode we are currently running
in. This patch modifies the driver to mask the address appropriately for
each case.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 26 ++++++++++++++++----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index bc86057628c8..ac1669b18d13 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -143,7 +143,9 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
 			    struct ixgbe_ring *rx_ring,
 			    struct xdp_buff *xdp)
 {
+	struct xdp_umem *umem = rx_ring->xsk_umem;
 	int err, result = IXGBE_XDP_PASS;
+	u64 offset = umem->headroom;
 	struct bpf_prog *xdp_prog;
 	struct xdp_frame *xdpf;
 	u32 act;
@@ -151,7 +153,13 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
 	rcu_read_lock();
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
 	act = bpf_prog_run_xdp(xdp_prog, xdp);
-	xdp->handle += xdp->data - xdp->data_hard_start;
+	offset += xdp->data - xdp->data_hard_start;
+
+	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
+		xdp->handle |= (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
+	else
+		xdp->handle += offset;
+
 	switch (act) {
 	case XDP_PASS:
 		break;
@@ -235,7 +243,10 @@ void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
 	nta++;
 	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
-	handle &= mask;
+	if (rx_ring->xsk_umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
+		handle &= XSK_UNALIGNED_BUF_ADDR_MASK;
+	else
+		handle &= mask;
 
 	bi->dma = xdp_umem_get_dma(rx_ring->xsk_umem, handle);
 	bi->dma += hr;
@@ -269,7 +280,7 @@ static bool ixgbe_alloc_buffer_zc(struct ixgbe_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr(umem);
 	return true;
@@ -296,7 +307,7 @@ static bool ixgbe_alloc_buffer_slow_zc(struct ixgbe_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr_rq(umem);
 	return true;
@@ -565,6 +576,7 @@ static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
 	struct ixgbe_tx_buffer *tx_bi;
 	bool work_done = true;
 	struct xdp_desc desc;
+	u64 addr, offset;
 	dma_addr_t dma;
 	u32 cmd_type;
 
@@ -578,7 +590,11 @@ static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
 		if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &desc))
 			break;
 
-		dma = xdp_umem_get_dma(xdp_ring->xsk_umem, desc.addr);
+		/* for unaligned chunks need to take offset from upper bits */
+		offset = (desc.addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
+		addr = (desc.addr & XSK_UNALIGNED_BUF_ADDR_MASK);
+
+		dma = xdp_umem_get_dma(xdp_ring->xsk_umem, addr + offset);
 
 		dma_sync_single_for_device(xdp_ring->dev, dma, desc.len,
 					   DMA_BIDIRECTIONAL);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v2 06/10] libbpf: add flags to umem config
  2019-07-16  3:06   ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-16  3:06     ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-16  3:06 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

This patch adds a 'flags' field to the umem_config and umem_reg structs.
This will allow for more options to be added for configuring umems.

The first use for the flags field is to add a flag for unaligned chunks
mode. These flags can either be user-provided or filled with a default.
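
As an editorial usage sketch (not part of the patch; umem allocation and
error handling are omitted), an application opts in to unaligned chunks by
setting the new flag in its umem config before calling xsk_umem__create():

	struct xsk_umem *umem;
	struct xsk_ring_prod fq;
	struct xsk_ring_cons cq;
	void *buffer;	/* page-aligned umem area, allocation omitted */
	__u64 size;	/* length of the umem area in bytes */
	int ret;

	struct xsk_umem_config cfg = {
		.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
		.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
		.frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE,
		.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM,
		.flags = XDP_UMEM_UNALIGNED_CHUNKS,	/* new in this series */
	};

	ret = xsk_umem__create(&umem, buffer, size, &fq, &cq, &cfg);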

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>

---
v2:
  - Removed the headroom check from this patch. It has moved to the
    previous patch.
---
 tools/include/uapi/linux/if_xdp.h | 4 ++++
 tools/lib/bpf/xsk.c               | 3 +++
 tools/lib/bpf/xsk.h               | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h
index faaa5ca2a117..594bcebb189c 100644
--- a/tools/include/uapi/linux/if_xdp.h
+++ b/tools/include/uapi/linux/if_xdp.h
@@ -17,6 +17,9 @@
 #define XDP_COPY	(1 << 1) /* Force copy-mode */
 #define XDP_ZEROCOPY	(1 << 2) /* Force zero-copy mode */
 
+/* Flags for xsk_umem_config flags */
+#define XDP_UMEM_UNALIGNED_CHUNKS (1 << 0)
+
 struct sockaddr_xdp {
 	__u16 sxdp_family;
 	__u16 sxdp_flags;
@@ -53,6 +56,7 @@ struct xdp_umem_reg {
 	__u64 len; /* Length of packet data area */
 	__u32 chunk_size;
 	__u32 headroom;
+	__u32 flags;
 };
 
 struct xdp_statistics {
diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index b33740221b7e..d5ff3fc39e32 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -116,6 +116,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg,
 		cfg->comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
 		cfg->frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
 		cfg->frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM;
+		cfg->flags = XSK_UMEM__DEFAULT_FLAGS;
 		return;
 	}
 
@@ -123,6 +124,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg,
 	cfg->comp_size = usr_cfg->comp_size;
 	cfg->frame_size = usr_cfg->frame_size;
 	cfg->frame_headroom = usr_cfg->frame_headroom;
+	cfg->flags = usr_cfg->flags;
 }
 
 static int xsk_set_xdp_socket_config(struct xsk_socket_config *cfg,
@@ -182,6 +184,7 @@ int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
 	mr.len = size;
 	mr.chunk_size = umem->config.frame_size;
 	mr.headroom = umem->config.frame_headroom;
+	mr.flags = umem->config.flags;
 
 	err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr));
 	if (err) {
diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
index 833a6e60d065..44a03d8c34b9 100644
--- a/tools/lib/bpf/xsk.h
+++ b/tools/lib/bpf/xsk.h
@@ -170,12 +170,14 @@ LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
 #define XSK_UMEM__DEFAULT_FRAME_SHIFT    12 /* 4096 bytes */
 #define XSK_UMEM__DEFAULT_FRAME_SIZE     (1 << XSK_UMEM__DEFAULT_FRAME_SHIFT)
 #define XSK_UMEM__DEFAULT_FRAME_HEADROOM 0
+#define XSK_UMEM__DEFAULT_FLAGS 0
 
 struct xsk_umem_config {
 	__u32 fill_size;
 	__u32 comp_size;
 	__u32 frame_size;
 	__u32 frame_headroom;
+	__u32 flags;
 };
 
 /* Flags for the libbpf_flags field. */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH v2 06/10] libbpf: add flags to umem config
@ 2019-07-16  3:06     ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-16  3:06 UTC (permalink / raw)
  To: intel-wired-lan

This patch adds a 'flags' field to the umem_config and umem_reg structs.
This will allow for more options to be added for configuring umems.

The first use for the flags field is to add a flag for unaligned chunks
mode. These flags can either be user-provided or filled with a default.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>

---
v2:
  - Removed the headroom check from this patch. It has moved to the
    previous patch.
---
 tools/include/uapi/linux/if_xdp.h | 4 ++++
 tools/lib/bpf/xsk.c               | 3 +++
 tools/lib/bpf/xsk.h               | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h
index faaa5ca2a117..594bcebb189c 100644
--- a/tools/include/uapi/linux/if_xdp.h
+++ b/tools/include/uapi/linux/if_xdp.h
@@ -17,6 +17,9 @@
 #define XDP_COPY	(1 << 1) /* Force copy-mode */
 #define XDP_ZEROCOPY	(1 << 2) /* Force zero-copy mode */
 
+/* Flags for xsk_umem_config flags */
+#define XDP_UMEM_UNALIGNED_CHUNKS (1 << 0)
+
 struct sockaddr_xdp {
 	__u16 sxdp_family;
 	__u16 sxdp_flags;
@@ -53,6 +56,7 @@ struct xdp_umem_reg {
 	__u64 len; /* Length of packet data area */
 	__u32 chunk_size;
 	__u32 headroom;
+	__u32 flags;
 };
 
 struct xdp_statistics {
diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index b33740221b7e..d5ff3fc39e32 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -116,6 +116,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg,
 		cfg->comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
 		cfg->frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
 		cfg->frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM;
+		cfg->flags = XSK_UMEM__DEFAULT_FLAGS;
 		return;
 	}
 
@@ -123,6 +124,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg,
 	cfg->comp_size = usr_cfg->comp_size;
 	cfg->frame_size = usr_cfg->frame_size;
 	cfg->frame_headroom = usr_cfg->frame_headroom;
+	cfg->flags = usr_cfg->flags;
 }
 
 static int xsk_set_xdp_socket_config(struct xsk_socket_config *cfg,
@@ -182,6 +184,7 @@ int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
 	mr.len = size;
 	mr.chunk_size = umem->config.frame_size;
 	mr.headroom = umem->config.frame_headroom;
+	mr.flags = umem->config.flags;
 
 	err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr));
 	if (err) {
diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
index 833a6e60d065..44a03d8c34b9 100644
--- a/tools/lib/bpf/xsk.h
+++ b/tools/lib/bpf/xsk.h
@@ -170,12 +170,14 @@ LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
 #define XSK_UMEM__DEFAULT_FRAME_SHIFT    12 /* 4096 bytes */
 #define XSK_UMEM__DEFAULT_FRAME_SIZE     (1 << XSK_UMEM__DEFAULT_FRAME_SHIFT)
 #define XSK_UMEM__DEFAULT_FRAME_HEADROOM 0
+#define XSK_UMEM__DEFAULT_FLAGS 0
 
 struct xsk_umem_config {
 	__u32 fill_size;
 	__u32 comp_size;
 	__u32 frame_size;
 	__u32 frame_headroom;
+	__u32 flags;
 };
 
 /* Flags for the libbpf_flags field. */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v2 07/10] samples/bpf: add unaligned chunks mode support to xdpsock
  2019-07-16  3:06   ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-16  3:06     ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-16  3:06 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

This patch adds support for unaligned chunks mode, allowing users to run
the application with more relaxed chunk placement in the XDP umem.

Unaligned chunks mode is enabled with the '-u' (or '--unaligned') command
line option.
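
As an editorial usage note (interface name and frame size chosen purely
for illustration), the new option combines with the existing ones, e.g.
zero-copy with a non-power-of-two frame size is accepted once unaligned
mode is requested:

	./xdpsock -i eth0 -z -u -f 4000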

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
---
 samples/bpf/xdpsock_user.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 93eaaf7239b2..26ba1a1fd582 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -67,6 +67,8 @@ static int opt_ifindex;
 static int opt_queue;
 static int opt_poll;
 static int opt_interval = 1;
+static u32 opt_umem_flags;
+static int opt_unaligned_chunks;
 static u32 opt_xdp_bind_flags;
 static int opt_xsk_frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
 static __u32 prog_id;
@@ -282,7 +284,9 @@ static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
 		.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
 		.frame_size = opt_xsk_frame_size,
 		.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM,
+		.flags = opt_umem_flags
 	};
+
 	int ret;
 
 	umem = calloc(1, sizeof(*umem));
@@ -291,6 +295,7 @@ static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
 
 	ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
 			       &cfg);
+
 	if (ret)
 		exit_with_error(-ret);
 
@@ -352,6 +357,7 @@ static struct option long_options[] = {
 	{"zero-copy", no_argument, 0, 'z'},
 	{"copy", no_argument, 0, 'c'},
 	{"frame-size", required_argument, 0, 'f'},
+	{"unaligned", no_argument, 0, 'u'},
 	{0, 0, 0, 0}
 };
 
@@ -372,6 +378,7 @@ static void usage(const char *prog)
 		"  -z, --zero-copy      Force zero-copy mode.\n"
 		"  -c, --copy           Force copy mode.\n"
 		"  -f, --frame-size=n   Set the frame size (must be a power of two, default is %d).\n"
+		"  -u, --unaligned	Enable unaligned chunk placement\n"
 		"\n";
 	fprintf(stderr, str, prog, XSK_UMEM__DEFAULT_FRAME_SIZE);
 	exit(EXIT_FAILURE);
@@ -384,7 +391,7 @@ static void parse_command_line(int argc, char **argv)
 	opterr = 0;
 
 	for (;;) {
-		c = getopt_long(argc, argv, "Frtli:q:psSNn:czf:", long_options,
+		c = getopt_long(argc, argv, "Frtli:q:psSNn:czf:u", long_options,
 				&option_index);
 		if (c == -1)
 			break;
@@ -424,12 +431,17 @@ static void parse_command_line(int argc, char **argv)
 		case 'c':
 			opt_xdp_bind_flags |= XDP_COPY;
 			break;
+		case 'u':
+			opt_umem_flags |= XDP_UMEM_UNALIGNED_CHUNKS;
+			opt_unaligned_chunks = 1;
+			break;
 		case 'F':
 			opt_xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
 			break;
 		case 'f':
 			opt_xsk_frame_size = atoi(optarg);
 			break;
+
 		default:
 			usage(basename(argv[0]));
 		}
@@ -442,7 +454,8 @@ static void parse_command_line(int argc, char **argv)
 		usage(basename(argv[0]));
 	}
 
-	if (opt_xsk_frame_size & (opt_xsk_frame_size - 1)) {
+	if ((opt_xsk_frame_size & (opt_xsk_frame_size - 1)) &&
+			!opt_unaligned_chunks) {
 		fprintf(stderr, "--frame-size=%d is not a power of two\n",
 			opt_xsk_frame_size);
 		usage(basename(argv[0]));
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH v2 07/10] samples/bpf: add unaligned chunks mode support to xdpsock
@ 2019-07-16  3:06     ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-16  3:06 UTC (permalink / raw)
  To: intel-wired-lan

This patch adds support for the unaligned chunks mode. The addition of the
unaligned chunks option will allow users to run the application with more
relaxed chunk placement in the XDP umem.

Unaligned chunks mode can be used with the '-u' or '--unaligned' command
line options.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
---
 samples/bpf/xdpsock_user.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 93eaaf7239b2..26ba1a1fd582 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -67,6 +67,8 @@ static int opt_ifindex;
 static int opt_queue;
 static int opt_poll;
 static int opt_interval = 1;
+static u32 opt_umem_flags;
+static int opt_unaligned_chunks;
 static u32 opt_xdp_bind_flags;
 static int opt_xsk_frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
 static __u32 prog_id;
@@ -282,7 +284,9 @@ static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
 		.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
 		.frame_size = opt_xsk_frame_size,
 		.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM,
+		.flags = opt_umem_flags
 	};
+
 	int ret;
 
 	umem = calloc(1, sizeof(*umem));
@@ -291,6 +295,7 @@ static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
 
 	ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
 			       &cfg);
+
 	if (ret)
 		exit_with_error(-ret);
 
@@ -352,6 +357,7 @@ static struct option long_options[] = {
 	{"zero-copy", no_argument, 0, 'z'},
 	{"copy", no_argument, 0, 'c'},
 	{"frame-size", required_argument, 0, 'f'},
+	{"unaligned", no_argument, 0, 'u'},
 	{0, 0, 0, 0}
 };
 
@@ -372,6 +378,7 @@ static void usage(const char *prog)
 		"  -z, --zero-copy      Force zero-copy mode.\n"
 		"  -c, --copy           Force copy mode.\n"
 		"  -f, --frame-size=n   Set the frame size (must be a power of two, default is %d).\n"
+		"  -u, --unaligned	Enable unaligned chunk placement\n"
 		"\n";
 	fprintf(stderr, str, prog, XSK_UMEM__DEFAULT_FRAME_SIZE);
 	exit(EXIT_FAILURE);
@@ -384,7 +391,7 @@ static void parse_command_line(int argc, char **argv)
 	opterr = 0;
 
 	for (;;) {
-		c = getopt_long(argc, argv, "Frtli:q:psSNn:czf:", long_options,
+		c = getopt_long(argc, argv, "Frtli:q:psSNn:czf:u", long_options,
 				&option_index);
 		if (c == -1)
 			break;
@@ -424,12 +431,17 @@ static void parse_command_line(int argc, char **argv)
 		case 'c':
 			opt_xdp_bind_flags |= XDP_COPY;
 			break;
+		case 'u':
+			opt_umem_flags |= XDP_UMEM_UNALIGNED_CHUNKS;
+			opt_unaligned_chunks = 1;
+			break;
 		case 'F':
 			opt_xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
 			break;
 		case 'f':
 			opt_xsk_frame_size = atoi(optarg);
 			break;
+
 		default:
 			usage(basename(argv[0]));
 		}
@@ -442,7 +454,8 @@ static void parse_command_line(int argc, char **argv)
 		usage(basename(argv[0]));
 	}
 
-	if (opt_xsk_frame_size & (opt_xsk_frame_size - 1)) {
+	if ((opt_xsk_frame_size & (opt_xsk_frame_size - 1)) &&
+			!opt_unaligned_chunks) {
 		fprintf(stderr, "--frame-size=%d is not a power of two\n",
 			opt_xsk_frame_size);
 		usage(basename(argv[0]));
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v2 08/10] samples/bpf: add buffer recycling for unaligned chunks to xdpsock
  2019-07-16  3:06   ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-16  3:06     ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-16  3:06 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

This patch adds buffer recycling support for unaligned buffers. Since we
don't mask the addr to 2k at umem_reg in unaligned mode, we need to make
sure we give back the correct (original) addr to the fill queue. We achieve
this using the new descriptor format and associated masks. The new format
uses the upper 16-bits for the offset and the lower 48-bits for the addr.
Since we have a field for the offset, we no longer need to modify the
actual address. As such, all we have to do to get back the original address
is to mask for the lower 48 bits (i.e. strip the offset, leaving the address
on its own).
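
For illustration, a minimal sketch of the address handling described above,
assuming the XSK_UNALIGNED_BUF_ADDR_MASK and XSK_UNALIGNED_BUF_OFFSET_SHIFT
defines added by the if_xdp.h changes in this series (the helper names here
are made up for the example):

#include <linux/types.h>

static inline __u64 orig_addr(__u64 addr)
{
	/* strip the offset; this is the value to put back on the fill queue */
	return addr & XSK_UNALIGNED_BUF_ADDR_MASK;
}

static inline void *pkt_data(void *umem_area, __u64 addr)
{
	/* packet data = base address (lower 48 bits) + offset (upper 16 bits) */
	return (char *)umem_area + orig_addr(addr) +
	       (addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
}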

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

---
v2:
  - Removed unused defines
  - Fix buffer recycling for unaligned case
  - Remove --buf-size (--frame-size merged before this)
  - Modifications to use the new descriptor format for buffer recycling
---
 samples/bpf/xdpsock_user.c | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 26ba1a1fd582..8f220afd549a 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -474,6 +474,7 @@ static void kick_tx(struct xsk_socket_info *xsk)
 
 static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk)
 {
+	struct xsk_umem_info *umem = xsk->umem;
 	u32 idx_cq = 0, idx_fq = 0;
 	unsigned int rcvd;
 	size_t ndescs;
@@ -486,22 +487,24 @@ static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk)
 		xsk->outstanding_tx;
 
 	/* re-add completed Tx buffers */
-	rcvd = xsk_ring_cons__peek(&xsk->umem->cq, ndescs, &idx_cq);
+	rcvd = xsk_ring_cons__peek(&umem->cq, ndescs, &idx_cq);
 	if (rcvd > 0) {
 		unsigned int i;
 		int ret;
 
-		ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
+		ret = xsk_ring_prod__reserve(&umem->fq, rcvd, &idx_fq);
 		while (ret != rcvd) {
 			if (ret < 0)
 				exit_with_error(-ret);
-			ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd,
-						     &idx_fq);
+			ret = xsk_ring_prod__reserve(&umem->fq, rcvd, &idx_fq);
+		}
+
+		for (i = 0; i < rcvd; i++) {
+			u64 comp_addr =
+				*xsk_ring_cons__comp_addr(&umem->cq, idx_cq++);
+			*xsk_ring_prod__fill_addr(&umem->fq, idx_fq++) =
+				comp_addr & XSK_UNALIGNED_BUF_ADDR_MASK;
 		}
-		for (i = 0; i < rcvd; i++)
-			*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) =
-				*xsk_ring_cons__comp_addr(&xsk->umem->cq,
-							  idx_cq++);
 
 		xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
 		xsk_ring_cons__release(&xsk->umem->cq, rcvd);
@@ -548,7 +551,11 @@ static void rx_drop(struct xsk_socket_info *xsk)
 	for (i = 0; i < rcvd; i++) {
 		u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
 		u32 len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++)->len;
-		char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
+		u64 offset = addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+
+		addr &= XSK_UNALIGNED_BUF_ADDR_MASK;
+		char *pkt = xsk_umem__get_data(xsk->umem->buffer,
+				addr + offset);
 
 		hex_dump(pkt, len, addr);
 		*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = addr;
@@ -654,7 +661,9 @@ static void l2fwd(struct xsk_socket_info *xsk)
 							  idx_rx)->addr;
 			u32 len = xsk_ring_cons__rx_desc(&xsk->rx,
 							 idx_rx++)->len;
-			char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
+			u64 offset = addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+			char *pkt = xsk_umem__get_data(xsk->umem->buffer,
+				(addr & XSK_UNALIGNED_BUF_ADDR_MASK) + offset);
 
 			swap_mac_addresses(pkt);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH v2 08/10] samples/bpf: add buffer recycling for unaligned chunks to xdpsock
@ 2019-07-16  3:06     ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-16  3:06 UTC (permalink / raw)
  To: intel-wired-lan

This patch adds buffer recycling support for unaligned buffers. Since we
don't mask the addr to 2k at umem_reg in unaligned mode, we need to make
sure we give back the correct (original) addr to the fill queue. We achieve
this using the new descriptor format and associated masks. The new format
uses the upper 16-bits for the offset and the lower 48-bits for the addr.
Since we have a field for the offset, we no longer need to modify the
actual address. As such, all we have to do to get back the original address
is to mask for the lower 48 bits (i.e. strip the offset, leaving the address
on its own).

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

---
v2:
  - Removed unused defines
  - Fix buffer recycling for unaligned case
  - Remove --buf-size (--frame-size merged before this)
  - Modifications to use the new descriptor format for buffer recycling
---
 samples/bpf/xdpsock_user.c | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 26ba1a1fd582..8f220afd549a 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -474,6 +474,7 @@ static void kick_tx(struct xsk_socket_info *xsk)
 
 static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk)
 {
+	struct xsk_umem_info *umem = xsk->umem;
 	u32 idx_cq = 0, idx_fq = 0;
 	unsigned int rcvd;
 	size_t ndescs;
@@ -486,22 +487,24 @@ static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk)
 		xsk->outstanding_tx;
 
 	/* re-add completed Tx buffers */
-	rcvd = xsk_ring_cons__peek(&xsk->umem->cq, ndescs, &idx_cq);
+	rcvd = xsk_ring_cons__peek(&umem->cq, ndescs, &idx_cq);
 	if (rcvd > 0) {
 		unsigned int i;
 		int ret;
 
-		ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
+		ret = xsk_ring_prod__reserve(&umem->fq, rcvd, &idx_fq);
 		while (ret != rcvd) {
 			if (ret < 0)
 				exit_with_error(-ret);
-			ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd,
-						     &idx_fq);
+			ret = xsk_ring_prod__reserve(&umem->fq, rcvd, &idx_fq);
+		}
+
+		for (i = 0; i < rcvd; i++) {
+			u64 comp_addr =
+				*xsk_ring_cons__comp_addr(&umem->cq, idx_cq++);
+			*xsk_ring_prod__fill_addr(&umem->fq, idx_fq++) =
+				comp_addr & XSK_UNALIGNED_BUF_ADDR_MASK;
 		}
-		for (i = 0; i < rcvd; i++)
-			*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) =
-				*xsk_ring_cons__comp_addr(&xsk->umem->cq,
-							  idx_cq++);
 
 		xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
 		xsk_ring_cons__release(&xsk->umem->cq, rcvd);
@@ -548,7 +551,11 @@ static void rx_drop(struct xsk_socket_info *xsk)
 	for (i = 0; i < rcvd; i++) {
 		u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
 		u32 len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++)->len;
-		char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
+		u64 offset = addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+
+		addr &= XSK_UNALIGNED_BUF_ADDR_MASK;
+		char *pkt = xsk_umem__get_data(xsk->umem->buffer,
+				addr + offset);
 
 		hex_dump(pkt, len, addr);
 		*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = addr;
@@ -654,7 +661,9 @@ static void l2fwd(struct xsk_socket_info *xsk)
 							  idx_rx)->addr;
 			u32 len = xsk_ring_cons__rx_desc(&xsk->rx,
 							 idx_rx++)->len;
-			char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
+			u64 offset = addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+			char *pkt = xsk_umem__get_data(xsk->umem->buffer,
+				(addr & XSK_UNALIGNED_BUF_ADDR_MASK) + offset);
 
 			swap_mac_addresses(pkt);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v2 09/10] samples/bpf: use hugepages in xdpsock app
  2019-07-16  3:06   ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-16  3:06     ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-16  3:06 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

This patch modifies xdpsock to use mmap instead of posix_memalign. With
this change, we can use hugepages when running the application in unaligned
chunks mode. Using hugepages makes it more likely that we have physically
contiguous memory, which supports the unaligned chunk mode better.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 samples/bpf/xdpsock_user.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 8f220afd549a..958a27193582 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -69,6 +69,7 @@ static int opt_poll;
 static int opt_interval = 1;
 static u32 opt_umem_flags;
 static int opt_unaligned_chunks;
+static int opt_mmap_flags;
 static u32 opt_xdp_bind_flags;
 static int opt_xsk_frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
 static __u32 prog_id;
@@ -434,6 +435,7 @@ static void parse_command_line(int argc, char **argv)
 		case 'u':
 			opt_umem_flags |= XDP_UMEM_UNALIGNED_CHUNKS;
 			opt_unaligned_chunks = 1;
+			opt_mmap_flags = MAP_HUGETLB;
 			break;
 		case 'F':
 			opt_xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
@@ -696,11 +698,14 @@ int main(int argc, char **argv)
 		exit(EXIT_FAILURE);
 	}
 
-	ret = posix_memalign(&bufs, getpagesize(), /* PAGE_SIZE aligned */
-			     NUM_FRAMES * opt_xsk_frame_size);
-	if (ret)
-		exit_with_error(ret);
-
+	/* Reserve memory for the umem. Use hugepages if unaligned chunk mode */
+	bufs = mmap(NULL, NUM_FRAMES * opt_xsk_frame_size,
+		    PROT_READ | PROT_WRITE,
+		    MAP_PRIVATE | MAP_ANONYMOUS | opt_mmap_flags, -1, 0);
+	if (bufs == MAP_FAILED) {
+		printf("ERROR: mmap failed\n");
+		exit(EXIT_FAILURE);
+	}
        /* Create sockets... */
 	umem = xsk_configure_umem(bufs, NUM_FRAMES * opt_xsk_frame_size);
 	xsks[num_socks++] = xsk_configure_socket(umem);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH v2 09/10] samples/bpf: use hugepages in xdpsock app
@ 2019-07-16  3:06     ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-16  3:06 UTC (permalink / raw)
  To: intel-wired-lan

This patch modifies xdpsock to use mmap instead of posix_memalign. With
this change, we can use hugepages when running the application in unaligned
chunks mode. Using hugepages makes it more likely that we have physically
contiguous memory, which supports the unaligned chunk mode better.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 samples/bpf/xdpsock_user.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 8f220afd549a..958a27193582 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -69,6 +69,7 @@ static int opt_poll;
 static int opt_interval = 1;
 static u32 opt_umem_flags;
 static int opt_unaligned_chunks;
+static int opt_mmap_flags;
 static u32 opt_xdp_bind_flags;
 static int opt_xsk_frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
 static __u32 prog_id;
@@ -434,6 +435,7 @@ static void parse_command_line(int argc, char **argv)
 		case 'u':
 			opt_umem_flags |= XDP_UMEM_UNALIGNED_CHUNKS;
 			opt_unaligned_chunks = 1;
+			opt_mmap_flags = MAP_HUGETLB;
 			break;
 		case 'F':
 			opt_xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
@@ -696,11 +698,14 @@ int main(int argc, char **argv)
 		exit(EXIT_FAILURE);
 	}
 
-	ret = posix_memalign(&bufs, getpagesize(), /* PAGE_SIZE aligned */
-			     NUM_FRAMES * opt_xsk_frame_size);
-	if (ret)
-		exit_with_error(ret);
-
+	/* Reserve memory for the umem. Use hugepages if unaligned chunk mode */
+	bufs = mmap(NULL, NUM_FRAMES * opt_xsk_frame_size,
+		    PROT_READ | PROT_WRITE,
+		    MAP_PRIVATE | MAP_ANONYMOUS | opt_mmap_flags, -1, 0);
+	if (bufs == MAP_FAILED) {
+		printf("ERROR: mmap failed\n");
+		exit(EXIT_FAILURE);
+	}
        /* Create sockets... */
 	umem = xsk_configure_umem(bufs, NUM_FRAMES * opt_xsk_frame_size);
 	xsks[num_socks++] = xsk_configure_socket(umem);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v2 10/10] doc/af_xdp: include unaligned chunk case
  2019-07-16  3:06   ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-16  3:06     ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-16  3:06 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

With the addition of unaligned chunks mode, the documentation needs to be
updated to indicate that the incoming addr to the fill ring will only be
masked if the user application is run in the aligned chunk mode. This patch
also adds a line to explicitly indicate that the incoming addr will not be
masked if running the user application in the unaligned chunk mode.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 Documentation/networking/af_xdp.rst | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst
index eeedc2e826aa..83f7ae5fc045 100644
--- a/Documentation/networking/af_xdp.rst
+++ b/Documentation/networking/af_xdp.rst
@@ -153,10 +153,12 @@ an example, if the UMEM is 64k and each chunk is 4k, then the UMEM has
 
 Frames passed to the kernel are used for the ingress path (RX rings).
 
-The user application produces UMEM addrs to this ring. Note that the
-kernel will mask the incoming addr. E.g. for a chunk size of 2k, the
-log2(2048) LSB of the addr will be masked off, meaning that 2048, 2050
-and 3000 refers to the same chunk.
+The user application produces UMEM addrs to this ring. Note that, if
+running the application with aligned chunk mode, the kernel will mask
+the incoming addr.  E.g. for a chunk size of 2k, the log2(2048) LSB of
+the addr will be masked off, meaning that 2048, 2050 and 3000 refers
+to the same chunk. If the user application is run in the unaligned
+chunks mode, then the incoming addr will be left untouched.
 
 
 UMEM Completion Ring
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH v2 10/10] doc/af_xdp: include unaligned chunk case
@ 2019-07-16  3:06     ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-16  3:06 UTC (permalink / raw)
  To: intel-wired-lan

With the addition of unaligned chunks mode, the documentation needs to be
updated to indicate that the incoming addr to the fill ring will only be
masked if the user application is run in the aligned chunk mode. This patch
also adds a line to explicitly indicate that the incoming addr will not be
masked if running the user application in the unaligned chunk mode.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 Documentation/networking/af_xdp.rst | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst
index eeedc2e826aa..83f7ae5fc045 100644
--- a/Documentation/networking/af_xdp.rst
+++ b/Documentation/networking/af_xdp.rst
@@ -153,10 +153,12 @@ an example, if the UMEM is 64k and each chunk is 4k, then the UMEM has
 
 Frames passed to the kernel are used for the ingress path (RX rings).
 
-The user application produces UMEM addrs to this ring. Note that the
-kernel will mask the incoming addr. E.g. for a chunk size of 2k, the
-log2(2048) LSB of the addr will be masked off, meaning that 2048, 2050
-and 3000 refers to the same chunk.
+The user application produces UMEM addrs to this ring. Note that, if
+running the application with aligned chunk mode, the kernel will mask
+the incoming addr.  E.g. for a chunk size of 2k, the log2(2048) LSB of
+the addr will be masked off, meaning that 2048, 2050 and 3000 refers
+to the same chunk. If the user application is run in the unaligned
+chunks mode, then the incoming addr will be left untouched.
 
 
 UMEM Completion Ring
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* RE: [Intel-wired-lan] [PATCH v2 01/10] i40e: simplify Rx buffer recycle
  2019-07-16  3:06     ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-19 17:19       ` Bowers, AndrewX
  -1 siblings, 0 replies; 274+ messages in thread
From: Bowers, AndrewX @ 2019-07-19 17:19 UTC (permalink / raw)
  To: netdev; +Cc: intel-wired-lan, bpf

> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
> Behalf Of Kevin Laatz
> Sent: Monday, July 15, 2019 8:06 PM
> To: netdev@vger.kernel.org; ast@kernel.org; daniel@iogearbox.net; Topel,
> Bjorn <bjorn.topel@intel.com>; Karlsson, Magnus
> <magnus.karlsson@intel.com>; jakub.kicinski@netronome.com;
> jonathan.lemon@gmail.com
> Cc: Richardson, Bruce <bruce.richardson@intel.com>; Loftus, Ciara
> <ciara.loftus@intel.com>; intel-wired-lan@lists.osuosl.org;
> bpf@vger.kernel.org; Laatz, Kevin <kevin.laatz@intel.com>
> Subject: [Intel-wired-lan] [PATCH v2 01/10] i40e: simplify Rx buffer recycle
> 
> Currently, the dma, addr and handle are modified when we reuse Rx buffers
> in zero-copy mode. However, this is not required as the inputs to the
> function are copies, not the original values themselves. As we use the copies
> within the function, we can use the original 'old_bi' values directly without
> having to mask and add the headroom.
> 
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c | 13 +++----------
>  1 file changed, 3 insertions(+), 10 deletions(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>



^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH v2 01/10] i40e: simplify Rx buffer recycle
@ 2019-07-19 17:19       ` Bowers, AndrewX
  0 siblings, 0 replies; 274+ messages in thread
From: Bowers, AndrewX @ 2019-07-19 17:19 UTC (permalink / raw)
  To: intel-wired-lan

> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces at osuosl.org] On
> Behalf Of Kevin Laatz
> Sent: Monday, July 15, 2019 8:06 PM
> To: netdev at vger.kernel.org; ast at kernel.org; daniel at iogearbox.net; Topel,
> Bjorn <bjorn.topel@intel.com>; Karlsson, Magnus
> <magnus.karlsson@intel.com>; jakub.kicinski at netronome.com;
> jonathan.lemon at gmail.com
> Cc: Richardson, Bruce <bruce.richardson@intel.com>; Loftus, Ciara
> <ciara.loftus@intel.com>; intel-wired-lan at lists.osuosl.org;
> bpf at vger.kernel.org; Laatz, Kevin <kevin.laatz@intel.com>
> Subject: [Intel-wired-lan] [PATCH v2 01/10] i40e: simplify Rx buffer recycle
> 
> Currently, the dma, addr and handle are modified when we reuse Rx buffers
> in zero-copy mode. However, this is not required as the inputs to the
> function are copies, not the original values themselves. As we use the copies
> within the function, we can use the original 'old_bi' values directly without
> having to mask and add the headroom.
> 
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c | 13 +++----------
>  1 file changed, 3 insertions(+), 10 deletions(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>



^ permalink raw reply	[flat|nested] 274+ messages in thread

* RE: [Intel-wired-lan] [PATCH v2 02/10] ixgbe: simplify Rx buffer recycle
  2019-07-16  3:06     ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-19 17:20       ` Bowers, AndrewX
  -1 siblings, 0 replies; 274+ messages in thread
From: Bowers, AndrewX @ 2019-07-19 17:20 UTC (permalink / raw)
  To: netdev, ast; +Cc: intel-wired-lan, bpf

> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
> Behalf Of Kevin Laatz
> Sent: Monday, July 15, 2019 8:06 PM
> To: netdev@vger.kernel.org; ast@kernel.org; daniel@iogearbox.net; Topel,
> Bjorn <bjorn.topel@intel.com>; Karlsson, Magnus
> <magnus.karlsson@intel.com>; jakub.kicinski@netronome.com;
> jonathan.lemon@gmail.com
> Cc: Richardson, Bruce <bruce.richardson@intel.com>; Loftus, Ciara
> <ciara.loftus@intel.com>; intel-wired-lan@lists.osuosl.org;
> bpf@vger.kernel.org; Laatz, Kevin <kevin.laatz@intel.com>
> Subject: [Intel-wired-lan] [PATCH v2 02/10] ixgbe: simplify Rx buffer recycle
> 
> Currently, the dma, addr and handle are modified when we reuse Rx buffers
> in zero-copy mode. However, this is not required as the inputs to the
> function are copies, not the original values themselves. As we use the copies
> within the function, we can use the original 'obi' values directly without
> having to mask and add the headroom.
> 
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 13 +++----------
>  1 file changed, 3 insertions(+), 10 deletions(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>



^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH v2 02/10] ixgbe: simplify Rx buffer recycle
@ 2019-07-19 17:20       ` Bowers, AndrewX
  0 siblings, 0 replies; 274+ messages in thread
From: Bowers, AndrewX @ 2019-07-19 17:20 UTC (permalink / raw)
  To: intel-wired-lan

> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces at osuosl.org] On
> Behalf Of Kevin Laatz
> Sent: Monday, July 15, 2019 8:06 PM
> To: netdev at vger.kernel.org; ast at kernel.org; daniel at iogearbox.net; Topel,
> Bjorn <bjorn.topel@intel.com>; Karlsson, Magnus
> <magnus.karlsson@intel.com>; jakub.kicinski at netronome.com;
> jonathan.lemon at gmail.com
> Cc: Richardson, Bruce <bruce.richardson@intel.com>; Loftus, Ciara
> <ciara.loftus@intel.com>; intel-wired-lan at lists.osuosl.org;
> bpf at vger.kernel.org; Laatz, Kevin <kevin.laatz@intel.com>
> Subject: [Intel-wired-lan] [PATCH v2 02/10] ixgbe: simplify Rx buffer recycle
> 
> Currently, the dma, addr and handle are modified when we reuse Rx buffers
> in zero-copy mode. However, this is not required as the inputs to the
> function are copies, not the original values themselves. As we use the copies
> within the function, we can use the original 'obi' values directly without
> having to mask and add the headroom.
> 
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 13 +++----------
>  1 file changed, 3 insertions(+), 10 deletions(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>



^ permalink raw reply	[flat|nested] 274+ messages in thread

* RE: [Intel-wired-lan] [PATCH v2 03/10] xsk: add support to allow unaligned chunk placement
  2019-07-16  3:06     ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-19 17:21       ` Bowers, AndrewX
  -1 siblings, 0 replies; 274+ messages in thread
From: Bowers, AndrewX @ 2019-07-19 17:21 UTC (permalink / raw)
  To: netdev, ast; +Cc: intel-wired-lan, bpf

> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
> Behalf Of Kevin Laatz
> Sent: Monday, July 15, 2019 8:07 PM
> To: netdev@vger.kernel.org; ast@kernel.org; daniel@iogearbox.net; Topel,
> Bjorn <bjorn.topel@intel.com>; Karlsson, Magnus
> <magnus.karlsson@intel.com>; jakub.kicinski@netronome.com;
> jonathan.lemon@gmail.com
> Cc: Richardson, Bruce <bruce.richardson@intel.com>; Loftus, Ciara
> <ciara.loftus@intel.com>; intel-wired-lan@lists.osuosl.org;
> bpf@vger.kernel.org; Laatz, Kevin <kevin.laatz@intel.com>
> Subject: [Intel-wired-lan] [PATCH v2 03/10] xsk: add support to allow
> unaligned chunk placement
> 
> Currently, addresses are chunk size aligned. This means, we are very
> restricted in terms of where we can place a chunk within the umem. For
> example, if we have a chunk size of 2k, then our chunks can only be placed at
> 0,2k,4k,6k,8k... and so on (ie. every 2k starting from 0).
> 
> This patch introduces the ability to use unaligned chunks. With these
> changes, we are no longer bound to having to place chunks at a 2k (or
> whatever your chunk size is) interval. Since we are no longer dealing with
> aligned chunks, they can now cross page boundaries. Checks for page
> contiguity have been added in order to keep track of which pages are
> followed by a physically contiguous page.
> 
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> 
> ---
> v2:
>   - Add checks for the flags coming from userspace
>   - Fix how we get chunk_size in xsk_diag.c
>   - Add defines for masking the new descriptor format
>   - Modified the rx functions to use new descriptor format
>   - Modified the tx functions to use new descriptor format
> ---
>  include/net/xdp_sock.h      |  2 +
>  include/uapi/linux/if_xdp.h |  9 ++++
>  net/xdp/xdp_umem.c          | 17 ++++---
>  net/xdp/xsk.c               | 89 ++++++++++++++++++++++++++++++-------
>  net/xdp/xsk_diag.c          |  2 +-
>  net/xdp/xsk_queue.h         | 70 +++++++++++++++++++++++++----
>  6 files changed, 159 insertions(+), 30 deletions(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>



^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH v2 03/10] xsk: add support to allow unaligned chunk placement
@ 2019-07-19 17:21       ` Bowers, AndrewX
  0 siblings, 0 replies; 274+ messages in thread
From: Bowers, AndrewX @ 2019-07-19 17:21 UTC (permalink / raw)
  To: intel-wired-lan

> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces at osuosl.org] On
> Behalf Of Kevin Laatz
> Sent: Monday, July 15, 2019 8:07 PM
> To: netdev at vger.kernel.org; ast at kernel.org; daniel at iogearbox.net; Topel,
> Bjorn <bjorn.topel@intel.com>; Karlsson, Magnus
> <magnus.karlsson@intel.com>; jakub.kicinski at netronome.com;
> jonathan.lemon at gmail.com
> Cc: Richardson, Bruce <bruce.richardson@intel.com>; Loftus, Ciara
> <ciara.loftus@intel.com>; intel-wired-lan at lists.osuosl.org;
> bpf at vger.kernel.org; Laatz, Kevin <kevin.laatz@intel.com>
> Subject: [Intel-wired-lan] [PATCH v2 03/10] xsk: add support to allow
> unaligned chunk placement
> 
> Currently, addresses are chunk size aligned. This means, we are very
> restricted in terms of where we can place a chunk within the umem. For
> example, if we have a chunk size of 2k, then our chunks can only be placed at
> 0,2k,4k,6k,8k... and so on (ie. every 2k starting from 0).
> 
> This patch introduces the ability to use unaligned chunks. With these
> changes, we are no longer bound to having to place chunks at a 2k (or
> whatever your chunk size is) interval. Since we are no longer dealing with
> aligned chunks, they can now cross page boundaries. Checks for page
> contiguity have been added in order to keep track of which pages are
> followed by a physically contiguous page.
> 
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> 
> ---
> v2:
>   - Add checks for the flags coming from userspace
>   - Fix how we get chunk_size in xsk_diag.c
>   - Add defines for masking the new descriptor format
>   - Modified the rx functions to use new descriptor format
>   - Modified the tx functions to use new descriptor format
> ---
>  include/net/xdp_sock.h      |  2 +
>  include/uapi/linux/if_xdp.h |  9 ++++
>  net/xdp/xdp_umem.c          | 17 ++++---
>  net/xdp/xsk.c               | 89 ++++++++++++++++++++++++++++++-------
>  net/xdp/xsk_diag.c          |  2 +-
>  net/xdp/xsk_queue.h         | 70 +++++++++++++++++++++++++----
>  6 files changed, 159 insertions(+), 30 deletions(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>



^ permalink raw reply	[flat|nested] 274+ messages in thread

* RE: [Intel-wired-lan] [PATCH v2 04/10] i40e: modify driver for handling offsets
  2019-07-16  3:06     ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-19 17:22       ` Bowers, AndrewX
  -1 siblings, 0 replies; 274+ messages in thread
From: Bowers, AndrewX @ 2019-07-19 17:22 UTC (permalink / raw)
  To: netdev, ast; +Cc: intel-wired-lan, bpf

> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
> Behalf Of Kevin Laatz
> Sent: Monday, July 15, 2019 8:07 PM
> To: netdev@vger.kernel.org; ast@kernel.org; daniel@iogearbox.net; Topel,
> Bjorn <bjorn.topel@intel.com>; Karlsson, Magnus
> <magnus.karlsson@intel.com>; jakub.kicinski@netronome.com;
> jonathan.lemon@gmail.com
> Cc: Richardson, Bruce <bruce.richardson@intel.com>; Loftus, Ciara
> <ciara.loftus@intel.com>; intel-wired-lan@lists.osuosl.org;
> bpf@vger.kernel.org; Laatz, Kevin <kevin.laatz@intel.com>
> Subject: [Intel-wired-lan] [PATCH v2 04/10] i40e: modify driver for handling
> offsets
> 
> With the addition of the unaligned chunks option, we need to make sure we
> handle the offsets accordingly based on the mode we are currently running
> in. This patch modifies the driver to appropriately mask the address for each
> case.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c | 26 +++++++++++++++++-----
>  1 file changed, 21 insertions(+), 5 deletions(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>



^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH v2 04/10] i40e: modify driver for handling offsets
@ 2019-07-19 17:22       ` Bowers, AndrewX
  0 siblings, 0 replies; 274+ messages in thread
From: Bowers, AndrewX @ 2019-07-19 17:22 UTC (permalink / raw)
  To: intel-wired-lan

> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces at osuosl.org] On
> Behalf Of Kevin Laatz
> Sent: Monday, July 15, 2019 8:07 PM
> To: netdev at vger.kernel.org; ast at kernel.org; daniel at iogearbox.net; Topel,
> Bjorn <bjorn.topel@intel.com>; Karlsson, Magnus
> <magnus.karlsson@intel.com>; jakub.kicinski at netronome.com;
> jonathan.lemon at gmail.com
> Cc: Richardson, Bruce <bruce.richardson@intel.com>; Loftus, Ciara
> <ciara.loftus@intel.com>; intel-wired-lan at lists.osuosl.org;
> bpf at vger.kernel.org; Laatz, Kevin <kevin.laatz@intel.com>
> Subject: [Intel-wired-lan] [PATCH v2 04/10] i40e: modify driver for handling
> offsets
> 
> With the addition of the unaligned chunks option, we need to make sure we
> handle the offsets accordingly based on the mode we are currently running
> in. This patch modifies the driver to appropriately mask the address for each
> case.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c | 26 +++++++++++++++++-----
>  1 file changed, 21 insertions(+), 5 deletions(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>



^ permalink raw reply	[flat|nested] 274+ messages in thread

* RE: [Intel-wired-lan] [PATCH v2 05/10] ixgbe: modify driver for handling offsets
  2019-07-16  3:06     ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-19 17:22       ` Bowers, AndrewX
  -1 siblings, 0 replies; 274+ messages in thread
From: Bowers, AndrewX @ 2019-07-19 17:22 UTC (permalink / raw)
  To: netdev, ast; +Cc: intel-wired-lan, bpf

> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
> Behalf Of Kevin Laatz
> Sent: Monday, July 15, 2019 8:07 PM
> To: netdev@vger.kernel.org; ast@kernel.org; daniel@iogearbox.net; Topel,
> Bjorn <bjorn.topel@intel.com>; Karlsson, Magnus
> <magnus.karlsson@intel.com>; jakub.kicinski@netronome.com;
> jonathan.lemon@gmail.com
> Cc: Richardson, Bruce <bruce.richardson@intel.com>; Loftus, Ciara
> <ciara.loftus@intel.com>; intel-wired-lan@lists.osuosl.org;
> bpf@vger.kernel.org; Laatz, Kevin <kevin.laatz@intel.com>
> Subject: [Intel-wired-lan] [PATCH v2 05/10] ixgbe: modify driver for handling
> offsets
> 
> With the addition of the unaligned chunks option, we need to make sure we
> handle the offsets accordingly based on the mode we are currently running
> in. This patch modifies the driver to appropriately mask the address for each
> case.
> 
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 26 ++++++++++++++++----
>  1 file changed, 21 insertions(+), 5 deletions(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>



^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH v2 05/10] ixgbe: modify driver for handling offsets
@ 2019-07-19 17:22       ` Bowers, AndrewX
  0 siblings, 0 replies; 274+ messages in thread
From: Bowers, AndrewX @ 2019-07-19 17:22 UTC (permalink / raw)
  To: intel-wired-lan

> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces at osuosl.org] On
> Behalf Of Kevin Laatz
> Sent: Monday, July 15, 2019 8:07 PM
> To: netdev at vger.kernel.org; ast at kernel.org; daniel at iogearbox.net; Topel,
> Bjorn <bjorn.topel@intel.com>; Karlsson, Magnus
> <magnus.karlsson@intel.com>; jakub.kicinski at netronome.com;
> jonathan.lemon at gmail.com
> Cc: Richardson, Bruce <bruce.richardson@intel.com>; Loftus, Ciara
> <ciara.loftus@intel.com>; intel-wired-lan at lists.osuosl.org;
> bpf at vger.kernel.org; Laatz, Kevin <kevin.laatz@intel.com>
> Subject: [Intel-wired-lan] [PATCH v2 05/10] ixgbe: modify driver for handling
> offsets
> 
> With the addition of the unaligned chunks option, we need to make sure we
> handle the offsets accordingly based on the mode we are currently running
> in. This patch modifies the driver to appropriately mask the address for each
> case.
> 
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 26 ++++++++++++++++----
>  1 file changed, 21 insertions(+), 5 deletions(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>



^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v2 00/10] XDP unaligned chunk placement support
  2019-07-16  3:06   ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-23 21:08     ` Alexei Starovoitov
  -1 siblings, 0 replies; 274+ messages in thread
From: Alexei Starovoitov @ 2019-07-23 21:08 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: Network Development, Alexei Starovoitov, Daniel Borkmann,
	Björn Töpel, Karlsson, Magnus, Jakub Kicinski,
	Jonathan Lemon, bruce.richardson, ciara.loftus, bpf,
	intel-wired-lan

Jonathan, Bjorn, Jakub,
Please review!
The patch set has been pending for a week.

On Tue, Jul 16, 2019 at 4:21 AM Kevin Laatz <kevin.laatz@intel.com> wrote:
>
> This patch set adds the ability to use unaligned chunks in the XDP umem.
>
> Currently, all chunk addresses passed to the umem are masked to be chunk
> size aligned (default is 2k, max is PAGE_SIZE). This limits where we can
> place chunks within the umem as well as limiting the packet sizes that are
> supported.
>
> The changes in this patch set remove these restrictions, allowing XDP to
> be more flexible in where it can place a chunk within a umem. By relaxing
> where the chunks can be placed, it allows us to use an arbitrary buffer
> size and place that wherever we have a free address in the umem. These
> changes add the ability to support arbitrary frame sizes up to 4k
> (PAGE_SIZE) and make it easy to integrate with other existing frameworks
> that have their own memory management systems, such as DPDK.
>
> Since we are now dealing with arbitrary frame sizes, we also need to
> update how we pass around addresses. Currently, the addresses can simply be
> masked to 2k to get back to the original address. This becomes less trivial
> when using frame sizes that are not a 'power of 2' size. This patch set
> modifies the Rx/Tx descriptor format to use the upper 16-bits of the addr
> field for an offset value, leaving the lower 48-bits for the address (this
> leaves us with 256 Terabytes, which should be enough!). We only need to use
> the upper 16-bits to store the offset when running in unaligned mode.
> Rather than adding the offset (headroom etc) to the address, we will store
> it in the upper 16-bits of the address field. This way, we can easily add
> the offset to the address where we need it, using some bit manipulation and
> addition, and we can also easily get the original address wherever we need
> it (for example in i40e_zca_free) by simply masking to get the lower
> 48-bits of the address field.
>
> The numbers below were recorded with the following set up:
>   - Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
>   - Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
>   - Driver: i40e
>   - Application: xdpsock with l2fwd (single interface)
>
> These are solely for comparing performance with and without the patches.
> The largest drop was ~1% (in zero-copy mode).
>
> +-------------------------+------------+-----------------+-------------+
> | Buffer size: 2048       | SKB mode   | Zero-copy       | Copy        |
> +-------------------------+------------+-----------------+-------------+
> | Aligned (baseline)      | 1.7 Mpps   | 15.3 Mpps       | 2.08 Mpps   |
> +-------------------------+------------+-----------------+-------------+
> | Aligned (with patches)  | 1.7 Mpps   | 15.1 Mpps       | 2.08 Mpps   |
> +-------------------------+------------+-----------------+-------------+
> | Unaligned               | 1.7 Mpps   | 14.5 Mpps       | 2.08 Mpps   |
> +-------------------------+------------+-----------------+-------------+
>
> NOTE: We are currently working on the changes required in the Mellanox
> driver. We will include these in the v3.
>
> Structure of the patchset:
> Patch 1:
>   - Remove unnecessary masking and headroom addition during zero-copy Rx
>     buffer recycling in i40e. This change is required in order for the
>     buffer recycling to work in the unaligned chunk mode.
>
> Patch 2:
>   - Remove unnecessary masking and headroom addition during
>     zero-copy Rx buffer recycling in ixgbe. This change is required in
>     order for the  buffer recycling to work in the unaligned chunk mode.
>
> Patch 3:
>   - Add infrastructure for unaligned chunks. Since we are dealing with
>     unaligned chunks that could potentially cross a physical page boundary,
>     we add checks to keep track of that information. We can later use this
>     information to correctly handle buffers that are placed at an address
>     where they cross a page boundary.  This patch also modifies the
>     existing Rx and Tx functions to use the new descriptor format. To
>     handle addresses correctly, we need to mask appropriately based on
>     whether we are in aligned or unaligned mode.
>
> Patch 4:
>   - This patch updates the i40e driver to make use of the new descriptor
>     format. The new format is particularly useful here since we can now
>     retrieve the original address in places like i40e_zca_free with ease.
>     This saves us doing various calculations to get the original address
>     back.
>
> Patch 5:
>   - This patch updates the ixgbe driver to make use of the new descriptor
>     format. The new format is particularly useful here since we can now
>     retrieve the original address in places like ixgbe_zca_free with ease.
>     This saves us doing various calculations to get the original address
>     back.
>
> Patch 6:
>   - Add flags for umem configuration to libbpf
>
> Patch 7:
>   - Modify xdpsock application to add a command line option for
>     unaligned chunks
>
> Patch 8:
>   - Since we can now run the application in unaligned chunk mode, we need
>     to make sure we recycle the buffers appropriately.
>
> Patch 9:
>   - Adds hugepage support to the xdpsock application
>
> Patch 10:
>   - Documentation update to include the unaligned chunk scenario. We need
>     to explicitly state that the incoming addresses are only masked in the
>     aligned chunk mode and not the unaligned chunk mode.
>
> ---
> v2:
>   - fixed checkpatch issues
>   - fixed Rx buffer recycling for unaligned chunks in xdpsock
>   - removed unused defines
>   - fixed how chunk_size is calculated in xsk_diag.c
>   - added some performance numbers to cover letter
>   - modified descriptor format to make it easier to retrieve original
>     address
>   - removed patch adding off_t off to the zero copy allocator. This is no
>     longer needed with the new descriptor format.
>
> Kevin Laatz (10):
>   i40e: simplify Rx buffer recycle
>   ixgbe: simplify Rx buffer recycle
>   xsk: add support to allow unaligned chunk placement
>   i40e: modify driver for handling offsets
>   ixgbe: modify driver for handling offsets
>   libbpf: add flags to umem config
>   samples/bpf: add unaligned chunks mode support to xdpsock
>   samples/bpf: add buffer recycling for unaligned chunks to xdpsock
>   samples/bpf: use hugepages in xdpsock app
>   doc/af_xdp: include unaligned chunk case
>
>  Documentation/networking/af_xdp.rst          | 10 ++-
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c   | 39 +++++----
>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 39 +++++----
>  include/net/xdp_sock.h                       |  2 +
>  include/uapi/linux/if_xdp.h                  |  9 ++
>  net/xdp/xdp_umem.c                           | 17 ++--
>  net/xdp/xsk.c                                | 89 ++++++++++++++++----
>  net/xdp/xsk_diag.c                           |  2 +-
>  net/xdp/xsk_queue.h                          | 70 +++++++++++++--
>  samples/bpf/xdpsock_user.c                   | 61 ++++++++++----
>  tools/include/uapi/linux/if_xdp.h            |  4 +
>  tools/lib/bpf/xsk.c                          |  3 +
>  tools/lib/bpf/xsk.h                          |  2 +
>  13 files changed, 266 insertions(+), 81 deletions(-)
>
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH v2 00/10] XDP unaligned chunk placement support
@ 2019-07-23 21:08     ` Alexei Starovoitov
  0 siblings, 0 replies; 274+ messages in thread
From: Alexei Starovoitov @ 2019-07-23 21:08 UTC (permalink / raw)
  To: intel-wired-lan

Jonathan, Bjorn, Jakub,
Please review!
The patch set has been pending for a week.

On Tue, Jul 16, 2019 at 4:21 AM Kevin Laatz <kevin.laatz@intel.com> wrote:
>
> This patch set adds the ability to use unaligned chunks in the XDP umem.
>
> Currently, all chunk addresses passed to the umem are masked to be chunk
> size aligned (default is 2k, max is PAGE_SIZE). This limits where we can
> place chunks within the umem as well as limiting the packet sizes that are
> supported.
>
> The changes in this patch set remove these restrictions, allowing XDP to
> be more flexible in where it can place a chunk within a umem. By relaxing
> where the chunks can be placed, it allows us to use an arbitrary buffer
> size and place that wherever we have a free address in the umem. These
> changes add the ability to support arbitrary frame sizes up to 4k
> (PAGE_SIZE) and make it easy to integrate with other existing frameworks
> that have their own memory management systems, such as DPDK.
>
> Since we are now dealing with arbitrary frame sizes, we also need to
> update how we pass around addresses. Currently, the addresses can simply be
> masked to 2k to get back to the original address. This becomes less trivial
> when using frame sizes that are not a 'power of 2' size. This patch set
> modifies the Rx/Tx descriptor format to use the upper 16-bits of the addr
> field for an offset value, leaving the lower 48-bits for the address (this
> leaves us with 256 Terabytes, which should be enough!). We only need to use
> the upper 16-bits to store the offset when running in unaligned mode.
> Rather than adding the offset (headroom etc) to the address, we will store
> it in the upper 16-bits of the address field. This way, we can easily add
> the offset to the address where we need it, using some bit manipulation and
> addition, and we can also easily get the original address wherever we need
> it (for example in i40e_zca_free) by simply masking to get the lower
> 48-bits of the address field.
>
> The numbers below were recorded with the following set up:
>   - Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
>   - Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
>   - Driver: i40e
>   - Application: xdpsock with l2fwd (single interface)
>
> These are solely for comparing performance with and without the patches.
> The largest drop was ~1% (in zero-copy mode).
>
> +-------------------------+------------+-----------------+-------------+
> | Buffer size: 2048       | SKB mode   | Zero-copy       | Copy        |
> +-------------------------+------------+-----------------+-------------+
> | Aligned (baseline)      | 1.7 Mpps   | 15.3 Mpps       | 2.08 Mpps   |
> +-------------------------+------------+-----------------+-------------+
> | Aligned (with patches)  | 1.7 Mpps   | 15.1 Mpps       | 2.08 Mpps   |
> +-------------------------+------------+-----------------+-------------+
> | Unaligned               | 1.7 Mpps   | 14.5 Mpps       | 2.08 Mpps   |
> +-------------------------+------------+-----------------+-------------+
>
> NOTE: We are currently working on the changes required in the Mellanox
> driver. We will include these in the v3.
>
> Structure of the patchset:
> Patch 1:
>   - Remove unnecessary masking and headroom addition during zero-copy Rx
>     buffer recycling in i40e. This change is required in order for the
>     buffer recycling to work in the unaligned chunk mode.
>
> Patch 2:
>   - Remove unnecessary masking and headroom addition during
>     zero-copy Rx buffer recycling in ixgbe. This change is required in
>     order for the  buffer recycling to work in the unaligned chunk mode.
>
> Patch 3:
>   - Add infrastructure for unaligned chunks. Since we are dealing with
>     unaligned chunks that could potentially cross a physical page boundary,
>     we add checks to keep track of that information. We can later use this
>     information to correctly handle buffers that are placed at an address
>     where they cross a page boundary.  This patch also modifies the
>     existing Rx and Tx functions to use the new descriptor format. To
>     handle addresses correctly, we need to mask appropriately based on
>     whether we are in aligned or unaligned mode.
>
> Patch 4:
>   - This patch updates the i40e driver to make use of the new descriptor
>     format. The new format is particularly useful here since we can now
>     retrieve the original address in places like i40e_zca_free with ease.
>     This saves us doing various calculations to get the original address
>     back.
>
> Patch 5:
>   - This patch updates the ixgbe driver to make use of the new descriptor
>     format. The new format is particularly useful here since we can now
>     retrieve the original address in places like ixgbe_zca_free with ease.
>     This saves us doing various calculations to get the original address
>     back.
>
> Patch 6:
>   - Add flags for umem configuration to libbpf
>
> Patch 7:
>   - Modify xdpsock application to add a command line option for
>     unaligned chunks
>
> Patch 8:
>   - Since we can now run the application in unaligned chunk mode, we need
>     to make sure we recycle the buffers appropriately.
>
> Patch 9:
>   - Adds hugepage support to the xdpsock application
>
> Patch 10:
>   - Documentation update to include the unaligned chunk scenario. We need
>     to explicitly state that the incoming addresses are only masked in the
>     aligned chunk mode and not the unaligned chunk mode.
>
> ---
> v2:
>   - fixed checkpatch issues
>   - fixed Rx buffer recycling for unaligned chunks in xdpsock
>   - removed unused defines
>   - fixed how chunk_size is calculated in xsk_diag.c
>   - added some performance numbers to cover letter
>   - modified descriptor format to make it easier to retrieve original
>     address
>   - removed patch adding off_t off to the zero copy allocator. This is no
>     longer needed with the new descriptor format.
>
> Kevin Laatz (10):
>   i40e: simplify Rx buffer recycle
>   ixgbe: simplify Rx buffer recycle
>   xsk: add support to allow unaligned chunk placement
>   i40e: modify driver for handling offsets
>   ixgbe: modify driver for handling offsets
>   libbpf: add flags to umem config
>   samples/bpf: add unaligned chunks mode support to xdpsock
>   samples/bpf: add buffer recycling for unaligned chunks to xdpsock
>   samples/bpf: use hugepages in xdpsock app
>   doc/af_xdp: include unaligned chunk case
>
>  Documentation/networking/af_xdp.rst          | 10 ++-
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c   | 39 +++++----
>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 39 +++++----
>  include/net/xdp_sock.h                       |  2 +
>  include/uapi/linux/if_xdp.h                  |  9 ++
>  net/xdp/xdp_umem.c                           | 17 ++--
>  net/xdp/xsk.c                                | 89 ++++++++++++++++----
>  net/xdp/xsk_diag.c                           |  2 +-
>  net/xdp/xsk_queue.h                          | 70 +++++++++++++--
>  samples/bpf/xdpsock_user.c                   | 61 ++++++++++----
>  tools/include/uapi/linux/if_xdp.h            |  4 +
>  tools/lib/bpf/xsk.c                          |  3 +
>  tools/lib/bpf/xsk.h                          |  2 +
>  13 files changed, 266 insertions(+), 81 deletions(-)
>
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v3 00/11] XDP unaligned chunk placement support
  2019-07-16  3:06   ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-24  5:10     ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-24  5:10 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

This patch set adds the ability to use unaligned chunks in the XDP umem.

Currently, all chunk addresses passed to the umem are masked to be chunk
size aligned (max is PAGE_SIZE). This limits where we can place chunks
within the umem as well as limiting the packet sizes that are supported.

The changes in this patch set remove these restrictions, allowing XDP to
be more flexible in where it can place a chunk within a umem. By relaxing
where the chunks can be placed, it allows us to use an arbitrary buffer
size and place that wherever we have a free address in the umem. These
changes add the ability to support arbitrary frame sizes up to 4k
(PAGE_SIZE) and make it easy to integrate with other existing frameworks
that have their own memory management systems, such as DPDK.
In DPDK, for example, there is already support for AF_XDP with zero-copy.
However, with this patch set the integration will be much more seamless.
You can find the DPDK AF_XDP driver at:
https://git.dpdk.org/dpdk/tree/drivers/net/af_xdp

Since we are now dealing with arbitrary frame sizes, we also need to
update how we pass around addresses. Currently, the addresses can simply be
masked to 2k to get back to the original address. This becomes less trivial
when using frame sizes that are not a 'power of 2' size. This patch set
modifies the Rx/Tx descriptor format to use the upper 16-bits of the addr
field for an offset value, leaving the lower 48-bits for the address (this
leaves us with 256 Terabytes, which should be enough!). We only need to use
the upper 16-bits to store the offset when running in unaligned mode.
Rather than adding the offset (headroom etc) to the address, we will store
it in the upper 16-bits of the address field. This way, we can easily add
the offset to the address where we need it, using some bit manipulation and
addition, and we can also easily get the original address wherever we need
it (for example in i40e_zca_free) by simply masking to get the lower
48-bits of the address field.
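
As a minimal illustration (not part of the patches themselves), the sketch
below shows how an address in the new format could be packed and unpacked,
using the XSK_UNALIGNED_BUF_OFFSET_SHIFT and XSK_UNALIGNED_BUF_ADDR_MASK
values added to the uapi in patch 3; the helper names are purely
illustrative:

  #include <linux/if_xdp.h>	/* XSK_UNALIGNED_BUF_* (added in patch 3) */

  /* Pack a base address and an offset (headroom etc.) into a single
   * 64-bit descriptor address: offset in the upper 16 bits, base
   * address in the lower 48 bits.
   */
  static __u64 xdp_desc_pack(__u64 base, __u64 offset)
  {
  	return base | (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
  }

  /* Recover the original base address, e.g. in i40e_zca_free or when
   * returning the buffer to the fill queue.
   */
  static __u64 xdp_desc_base(__u64 addr)
  {
  	return addr & XSK_UNALIGNED_BUF_ADDR_MASK;
  }

  /* Address at which the packet data actually starts. */
  static __u64 xdp_desc_data(__u64 addr)
  {
  	return (addr & XSK_UNALIGNED_BUF_ADDR_MASK) +
  	       (addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
  }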

The numbers below were recorded with the following set up:
  - Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
  - Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
  - Driver: i40e
  - Application: xdpsock with l2fwd (single interface)

These are solely for comparing performance with and without the patches.
The largest drop was ~1% (in zero-copy mode).

+-------------------------+------------+-----------------+-------------+
| Buffer size: 2048       | SKB mode   | Zero-copy       | Copy        |
+-------------------------+------------+-----------------+-------------+
| Aligned (baseline)      | 1.7 Mpps   | 15.3 Mpps       | 2.08 Mpps   |
+-------------------------+------------+-----------------+-------------+
| Aligned (with patches)  | 1.7 Mpps   | 15.1 Mpps       | 2.08 Mpps   |
+-------------------------+------------+-----------------+-------------+
| Unaligned               | 1.7 Mpps   | 14.5 Mpps       | 2.08 Mpps   |
+-------------------------+------------+-----------------+-------------+

This patch set has been applied against commit 66b5f1c43984
("net-ipv6-ndisc: add support for RFC7710 RA Captive Portal Identifier")

Structure of the patch set:
Patch 1:
  - Remove unnecessary masking and headroom addition during zero-copy Rx
    buffer recycling in i40e. This change is required in order for the
    buffer recycling to work in the unaligned chunk mode.

Patch 2:
  - Remove unnecessary masking and headroom addition during
    zero-copy Rx buffer recycling in ixgbe. This change is required in
    order for the buffer recycling to work in the unaligned chunk mode.

Patch 3:
  - Add infrastructure for unaligned chunks. Since we are dealing with
    unaligned chunks that could potentially cross a physical page boundary,
    we add checks to keep track of that information. We can later use this
    information to correctly handle buffers that are placed at an address
    where they cross a page boundary.  This patch also modifies the
    existing Rx and Tx functions to use the new descriptor format. To
    handle addresses correctly, we need to mask appropriately based on
    whether we are in aligned or unaligned mode.

Patch 4:
  - This patch updates the i40e driver to make use of the new descriptor
    format.

Patch 5:
  - This patch updates the ixgbe driver to make use of the new descriptor
    format.

Patch 6:
  - This patch updates the mlx5e driver to make use of the new descriptor
    format. These changes are required to handle the new descriptor format
    and for unaligned chunks support.

Patch 7:
  - Add flags for umem configuration to libbpf

Patch 8:
  - Modify xdpsock application to add a command line option for
    unaligned chunks

Patch 9:
  - Since we can now run the application in unaligned chunk mode, we need
    to make sure we recycle the buffers appropriately (a short recycling
    sketch follows this list).

Patch 10:
  - Adds hugepage support to the xdpsock application

Patch 11:
  - Documentation update to include the unaligned chunk scenario. We need
    to explicitly state that the incoming addresses are only masked in the
    aligned chunk mode and not the unaligned chunk mode.
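
As a rough sketch of the recycling mentioned in patch 9: in unaligned mode
an application cannot recover the buffer start by masking to a fixed chunk
size, so it instead strips the offset carried in the upper 16 bits and
returns the base address to the fill queue. The libbpf fill-ring helpers
used below (xsk_ring_prod__reserve/__fill_addr/__submit) are assumed from
the existing xdpsock sample; the actual sample code may differ.

  #include <linux/if_xdp.h>	/* XSK_UNALIGNED_BUF_ADDR_MASK */
  #include <bpf/xsk.h>		/* libbpf AF_XDP ring helpers (assumed) */

  /* Hypothetical helper: recycle one completed buffer in unaligned mode. */
  static void recycle_unaligned_addr(struct xsk_ring_prod *fq, __u64 comp_addr)
  {
  	/* drop the offset stored in the upper 16 bits */
  	__u64 base = comp_addr & XSK_UNALIGNED_BUF_ADDR_MASK;
  	__u32 idx;

  	if (xsk_ring_prod__reserve(fq, 1, &idx) == 1) {
  		*xsk_ring_prod__fill_addr(fq, idx) = base;
  		xsk_ring_prod__submit(fq, 1);
  	}
  }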

---
v2:
  - fixed checkpatch issues
  - fixed Rx buffer recycling for unaligned chunks in xdpsock
  - removed unused defines
  - fixed how chunk_size is calculated in xsk_diag.c
  - added some performance numbers to cover letter
  - modified descriptor format to make it easier to retrieve original
    address
  - removed patch adding off_t off to the zero copy allocator. This is no
    longer needed with the new descriptor format.

v3:
  - added patch for mlx5 driver changes needed for unaligned chunks
  - moved offset handling to new helper function
  - changed value used for the umem chunk_mask. Now using the new
    descriptor format to save us doing the calculations in a number of
    places meaning more of the code is left unchanged while adding
    unaligned chunk support.

Kevin Laatz (11):
  i40e: simplify Rx buffer recycle
  ixgbe: simplify Rx buffer recycle
  xsk: add support to allow unaligned chunk placement
  i40e: modify driver for handling offsets
  ixgbe: modify driver for handling offsets
  mlx5e: modify driver for handling offsets
  libbpf: add flags to umem config
  samples/bpf: add unaligned chunks mode support to xdpsock
  samples/bpf: add buffer recycling for unaligned chunks to xdpsock
  samples/bpf: use hugepages in xdpsock app
  doc/af_xdp: include unaligned chunk case

 Documentation/networking/af_xdp.rst           | 10 ++-
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 33 +++----
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 33 +++----
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  8 +-
 .../ethernet/mellanox/mlx5/core/en/xsk/tx.c   |  9 +-
 include/net/xdp_sock.h                        | 17 ++++
 include/uapi/linux/if_xdp.h                   |  9 ++
 net/xdp/xdp_umem.c                            | 18 ++--
 net/xdp/xsk.c                                 | 86 +++++++++++++++----
 net/xdp/xsk_diag.c                            |  2 +-
 net/xdp/xsk_queue.h                           | 68 +++++++++++++--
 samples/bpf/xdpsock_user.c                    | 56 ++++++++----
 tools/include/uapi/linux/if_xdp.h             |  4 +
 tools/lib/bpf/xsk.c                           |  3 +
 tools/lib/bpf/xsk.h                           |  2 +
 15 files changed, 274 insertions(+), 84 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v3 00/11] XDP unaligned chunk placement support
@ 2019-07-24  5:10     ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-24  5:10 UTC (permalink / raw)
  To: intel-wired-lan

This patch set adds the ability to use unaligned chunks in the XDP umem.

Currently, all chunk addresses passed to the umem are masked to be chunk
size aligned (max is PAGE_SIZE). This limits where we can place chunks
within the umem as well as limiting the packet sizes that are supported.

The changes in this patch set remove these restrictions, allowing XDP to
be more flexible in where it can place a chunk within a umem. By relaxing
where the chunks can be placed, it allows us to use an arbitrary buffer
size and place it wherever we have a free address in the umem. These
changes add the ability to support arbitrary frame sizes up to 4k
(PAGE_SIZE) and make it easy to integrate with other existing frameworks
that have their own memory management systems, such as DPDK.
In DPDK, for example, there is already support for AF_XDP with zero-copy.
However, with this patch set the integration will be much more seamless.
You can find the DPDK AF_XDP driver at:
https://git.dpdk.org/dpdk/tree/drivers/net/af_xdp

Since we are now dealing with arbitrary frame sizes, we also need to update
how we pass around addresses. Currently, the addresses can simply be masked
to the 2k chunk boundary to get back to the original address. This becomes
less trivial when using frame sizes that are not a power of 2. This patch set
modifies the Rx/Tx descriptor format to use the upper 16-bits of the addr
field for an offset value, leaving the lower 48-bits for the address (this
leaves us with 256 Terabytes, which should be enough!). We only need to use
the upper 16-bits to store the offset when running in unaligned mode.
Rather than adding the offset (headroom etc) to the address, we will store
it in the upper 16-bits of the address field. This way, we can easily add
the offset to the address where we need it, using some bit manipulation and
addition, and we can also easily get the original address wherever we need
it (for example in i40e_zca_free) by simply masking to get the lower
48-bits of the address field.

The numbers below were recorded with the following set up:
  - Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
  - Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
  - Driver: i40e
  - Application: xdpsock with l2fwd (single interface)

These are solely for comparing performance with and without the patches.
The largest drop was ~1% (in zero-copy mode).

+-------------------------+------------+-----------------+-------------+
| Buffer size: 2048       | SKB mode   | Zero-copy       | Copy        |
+-------------------------+------------+-----------------+-------------+
| Aligned (baseline)      | 1.7 Mpps   | 15.3 Mpps       | 2.08 Mpps   |
+-------------------------+------------+-----------------+-------------+
| Aligned (with patches)  | 1.7 Mpps   | 15.1 Mpps       | 2.08 Mpps   |
+-------------------------+------------+-----------------+-------------+
| Unaligned               | 1.7 Mpps   | 14.5 Mpps       | 2.08 Mpps   |
+-------------------------+------------+-----------------+-------------+

This patch set has been applied against commit 66b5f1c43984
("net-ipv6-ndisc: add support for RFC7710 RA Captive Portal Identifier")

Structure of the patch set:
Patch 1:
  - Remove unnecessary masking and headroom addition during zero-copy Rx
    buffer recycling in i40e. This change is required in order for the
    buffer recycling to work in the unaligned chunk mode.

Patch 2:
  - Remove unnecessary masking and headroom addition during
    zero-copy Rx buffer recycling in ixgbe. This change is required in
    order for the buffer recycling to work in the unaligned chunk mode.

Patch 3:
  - Add infrastructure for unaligned chunks. Since we are dealing with
    unaligned chunks that could potentially cross a physical page boundary,
    we add checks to keep track of that information. We can later use this
    information to correctly handle buffers that are placed at an address
    where they cross a page boundary.  This patch also modifies the
    existing Rx and Tx functions to use the new descriptor format. To
    handle addresses correctly, we need to mask appropriately based on
    whether we are in aligned or unaligned mode.

Patch 4:
  - This patch updates the i40e driver to make use of the new descriptor
    format.

Patch 5:
  - This patch updates the ixgbe driver to make use of the new descriptor
    format.

Patch 6:
  - This patch updates the mlx5e driver to make use of the new descriptor
    format. These changes are required to handle the new descriptor format
    and for unaligned chunks support.

Patch 7:
  - Add flags for umem configuration to libbpf

Patch 8:
  - Modify xdpsock application to add a command line option for
    unaligned chunks

Patch 9:
  - Since we can now run the application in unaligned chunk mode, we need
    to make sure we recycle the buffers appropriately.

Patch 10:
  - Adds hugepage support to the xdpsock application

Patch 11:
  - Documentation update to include the unaligned chunk scenario. We need
    to explicitly state that the incoming addresses are only masked in the
    aligned chunk mode and not the unaligned chunk mode.

---
v2:
  - fixed checkpatch issues
  - fixed Rx buffer recycling for unaligned chunks in xdpsock
  - removed unused defines
  - fixed how chunk_size is calculated in xsk_diag.c
  - added some performance numbers to cover letter
  - modified descriptor format to make it easier to retrieve original
    address
  - removed patch adding off_t off to the zero copy allocator. This is no
    longer needed with the new descriptor format.

v3:
  - added patch for mlx5 driver changes needed for unaligned chunks
  - moved offset handling to new helper function
  - changed value used for the umem chunk_mask. Now using the new
    descriptor format to save us doing the calculations in a number of
    places meaning more of the code is left unchanged while adding
    unaligned chunk support.

Kevin Laatz (11):
  i40e: simplify Rx buffer recycle
  ixgbe: simplify Rx buffer recycle
  xsk: add support to allow unaligned chunk placement
  i40e: modify driver for handling offsets
  ixgbe: modify driver for handling offsets
  mlx5e: modify driver for handling offsets
  libbpf: add flags to umem config
  samples/bpf: add unaligned chunks mode support to xdpsock
  samples/bpf: add buffer recycling for unaligned chunks to xdpsock
  samples/bpf: use hugepages in xdpsock app
  doc/af_xdp: include unaligned chunk case

 Documentation/networking/af_xdp.rst           | 10 ++-
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 33 +++----
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 33 +++----
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  8 +-
 .../ethernet/mellanox/mlx5/core/en/xsk/tx.c   |  9 +-
 include/net/xdp_sock.h                        | 17 ++++
 include/uapi/linux/if_xdp.h                   |  9 ++
 net/xdp/xdp_umem.c                            | 18 ++--
 net/xdp/xsk.c                                 | 86 +++++++++++++++----
 net/xdp/xsk_diag.c                            |  2 +-
 net/xdp/xsk_queue.h                           | 68 +++++++++++++--
 samples/bpf/xdpsock_user.c                    | 56 ++++++++----
 tools/include/uapi/linux/if_xdp.h             |  4 +
 tools/lib/bpf/xsk.c                           |  3 +
 tools/lib/bpf/xsk.h                           |  2 +
 15 files changed, 274 insertions(+), 84 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v3 01/11] i40e: simplify Rx buffer recycle
  2019-07-24  5:10     ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-24  5:10       ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-24  5:10 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

Currently, the dma, addr and handle are modified when we reuse Rx buffers
in zero-copy mode. However, this is not required as the inputs to the
function are copies, not the original values themselves. As we use the
copies within the function, we can use the original 'old_bi' values
directly without having to mask and add the headroom.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 32bad014d76c..dfa096db2244 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -420,8 +420,6 @@ static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring,
 				    struct i40e_rx_buffer *old_bi)
 {
 	struct i40e_rx_buffer *new_bi = &rx_ring->rx_bi[rx_ring->next_to_alloc];
-	unsigned long mask = (unsigned long)rx_ring->xsk_umem->chunk_mask;
-	u64 hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
 	u16 nta = rx_ring->next_to_alloc;
 
 	/* update, and store next to alloc */
@@ -429,14 +427,9 @@ static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring,
 	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
 	/* transfer page from old buffer to new buffer */
-	new_bi->dma = old_bi->dma & mask;
-	new_bi->dma += hr;
-
-	new_bi->addr = (void *)((unsigned long)old_bi->addr & mask);
-	new_bi->addr += hr;
-
-	new_bi->handle = old_bi->handle & mask;
-	new_bi->handle += rx_ring->xsk_umem->headroom;
+	new_bi->dma = old_bi->dma;
+	new_bi->addr = old_bi->addr;
+	new_bi->handle = old_bi->handle;
 
 	old_bi->addr = NULL;
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v3 01/11] i40e: simplify Rx buffer recycle
@ 2019-07-24  5:10       ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-24  5:10 UTC (permalink / raw)
  To: intel-wired-lan

Currently, the dma, addr and handle are modified when we reuse Rx buffers
in zero-copy mode. However, this is not required as the inputs to the
function are copies, not the original values themselves. As we use the
copies within the function, we can use the original 'old_bi' values
directly without having to mask and add the headroom.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 32bad014d76c..dfa096db2244 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -420,8 +420,6 @@ static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring,
 				    struct i40e_rx_buffer *old_bi)
 {
 	struct i40e_rx_buffer *new_bi = &rx_ring->rx_bi[rx_ring->next_to_alloc];
-	unsigned long mask = (unsigned long)rx_ring->xsk_umem->chunk_mask;
-	u64 hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
 	u16 nta = rx_ring->next_to_alloc;
 
 	/* update, and store next to alloc */
@@ -429,14 +427,9 @@ static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring,
 	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
 	/* transfer page from old buffer to new buffer */
-	new_bi->dma = old_bi->dma & mask;
-	new_bi->dma += hr;
-
-	new_bi->addr = (void *)((unsigned long)old_bi->addr & mask);
-	new_bi->addr += hr;
-
-	new_bi->handle = old_bi->handle & mask;
-	new_bi->handle += rx_ring->xsk_umem->headroom;
+	new_bi->dma = old_bi->dma;
+	new_bi->addr = old_bi->addr;
+	new_bi->handle = old_bi->handle;
 
 	old_bi->addr = NULL;
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v3 02/11] ixgbe: simplify Rx buffer recycle
  2019-07-24  5:10     ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-24  5:10       ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-24  5:10 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

Currently, the dma, addr and handle are modified when we reuse Rx buffers
in zero-copy mode. However, this is not required as the inputs to the
function are copies, not the original values themselves. As we use the
copies within the function, we can use the original 'obi' values
directly without having to mask and add the headroom.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index 6b609553329f..bc86057628c8 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -201,8 +201,6 @@ ixgbe_rx_buffer *ixgbe_get_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 static void ixgbe_reuse_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 				     struct ixgbe_rx_buffer *obi)
 {
-	unsigned long mask = (unsigned long)rx_ring->xsk_umem->chunk_mask;
-	u64 hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
 	u16 nta = rx_ring->next_to_alloc;
 	struct ixgbe_rx_buffer *nbi;
 
@@ -212,14 +210,9 @@ static void ixgbe_reuse_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
 	/* transfer page from old buffer to new buffer */
-	nbi->dma = obi->dma & mask;
-	nbi->dma += hr;
-
-	nbi->addr = (void *)((unsigned long)obi->addr & mask);
-	nbi->addr += hr;
-
-	nbi->handle = obi->handle & mask;
-	nbi->handle += rx_ring->xsk_umem->headroom;
+	nbi->dma = obi->dma;
+	nbi->addr = obi->addr;
+	nbi->handle = obi->handle;
 
 	obi->addr = NULL;
 	obi->skb = NULL;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v3 02/11] ixgbe: simplify Rx buffer recycle
@ 2019-07-24  5:10       ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-24  5:10 UTC (permalink / raw)
  To: intel-wired-lan

Currently, the dma, addr and handle are modified when we reuse Rx buffers
in zero-copy mode. However, this is not required as the inputs to the
function are copies, not the original values themselves. As we use the
copies within the function, we can use the original 'obi' values
directly without having to mask and add the headroom.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index 6b609553329f..bc86057628c8 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -201,8 +201,6 @@ ixgbe_rx_buffer *ixgbe_get_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 static void ixgbe_reuse_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 				     struct ixgbe_rx_buffer *obi)
 {
-	unsigned long mask = (unsigned long)rx_ring->xsk_umem->chunk_mask;
-	u64 hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
 	u16 nta = rx_ring->next_to_alloc;
 	struct ixgbe_rx_buffer *nbi;
 
@@ -212,14 +210,9 @@ static void ixgbe_reuse_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
 	/* transfer page from old buffer to new buffer */
-	nbi->dma = obi->dma & mask;
-	nbi->dma += hr;
-
-	nbi->addr = (void *)((unsigned long)obi->addr & mask);
-	nbi->addr += hr;
-
-	nbi->handle = obi->handle & mask;
-	nbi->handle += rx_ring->xsk_umem->headroom;
+	nbi->dma = obi->dma;
+	nbi->addr = obi->addr;
+	nbi->handle = obi->handle;
 
 	obi->addr = NULL;
 	obi->skb = NULL;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v3 03/11] xsk: add support to allow unaligned chunk placement
  2019-07-24  5:10     ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-24  5:10       ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-24  5:10 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

Currently, addresses are chunk size aligned. This means we are very
restricted in terms of where we can place a chunk within the umem. For
example, if we have a chunk size of 2k, then our chunks can only be placed
at 0, 2k, 4k, 6k, 8k and so on (i.e. every 2k starting from 0).

This patch introduces the ability to use unaligned chunks. With these
changes, we are no longer bound to having to place chunks at a 2k (or
whatever your chunk size is) interval. Since we are no longer dealing with
aligned chunks, they can now cross page boundaries. Checks for page
contiguity have been added in order to keep track of which pages are
followed by a physically contiguous page.
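
As a rough userspace sketch (not part of this patch), a umem would opt in
to unaligned chunks by setting the new flag in struct xdp_umem_reg before
the XDP_UMEM_REG setsockopt. The field and flag names below are taken from
the uapi change in this patch; everything else is illustrative:

  #include <sys/socket.h>
  #include <linux/if_xdp.h>

  #ifndef SOL_XDP
  #define SOL_XDP 283
  #endif

  /* umem_area must be page aligned, e.g. an mmap()'d (hugepage) region */
  static int reg_unaligned_umem(int xsk_fd, void *umem_area, __u64 size)
  {
  	struct xdp_umem_reg mr = {
  		.addr = (__u64)(unsigned long)umem_area,
  		.len = size,
  		.chunk_size = 3000,	/* need not be a power of 2 any more */
  		.headroom = 0,
  		.flags = XDP_UMEM_UNALIGNED_CHUNKS,
  	};

  	return setsockopt(xsk_fd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr));
  }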

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

---
v2:
  - Add checks for the flags coming from userspace
  - Fix how we get chunk_size in xsk_diag.c
  - Add defines for masking the new descriptor format
  - Modified the rx functions to use new descriptor format
  - Modified the tx functions to use new descriptor format

v3:
  - Add helper function to do address/offset masking/addition
---
 include/net/xdp_sock.h      | 17 ++++++++
 include/uapi/linux/if_xdp.h |  9 ++++
 net/xdp/xdp_umem.c          | 18 +++++---
 net/xdp/xsk.c               | 86 ++++++++++++++++++++++++++++++-------
 net/xdp/xsk_diag.c          |  2 +-
 net/xdp/xsk_queue.h         | 68 +++++++++++++++++++++++++----
 6 files changed, 170 insertions(+), 30 deletions(-)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index 69796d264f06..738996c0f995 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -19,6 +19,7 @@ struct xsk_queue;
 struct xdp_umem_page {
 	void *addr;
 	dma_addr_t dma;
+	bool next_pg_contig;
 };
 
 struct xdp_umem_fq_reuse {
@@ -48,6 +49,7 @@ struct xdp_umem {
 	bool zc;
 	spinlock_t xsk_list_lock;
 	struct list_head xsk_list;
+	u32 flags;
 };
 
 struct xdp_sock {
@@ -144,6 +146,15 @@ static inline void xsk_umem_fq_reuse(struct xdp_umem *umem, u64 addr)
 
 	rq->handles[rq->length++] = addr;
 }
+
+static inline u64 xsk_umem_handle_offset(struct xdp_umem *umem, u64 handle,
+					 u64 offset)
+{
+	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
+		return handle |= (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
+	else
+		return handle += offset;
+}
 #else
 static inline int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
@@ -241,6 +252,12 @@ static inline void xsk_umem_fq_reuse(struct xdp_umem *umem, u64 addr)
 {
 }
 
+static inline u64 xsk_umem_handle_offset(struct xdp_umem *umem, u64 handle,
+					 u64 offset)
+{
+	return 0;
+}
+
 #endif /* CONFIG_XDP_SOCKETS */
 
 #endif /* _LINUX_XDP_SOCK_H */
diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index faaa5ca2a117..f8dc68fcdf78 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -17,6 +17,9 @@
 #define XDP_COPY	(1 << 1) /* Force copy-mode */
 #define XDP_ZEROCOPY	(1 << 2) /* Force zero-copy mode */
 
+/* Flags for xsk_umem_config flags */
+#define XDP_UMEM_UNALIGNED_CHUNKS (1 << 0)
+
 struct sockaddr_xdp {
 	__u16 sxdp_family;
 	__u16 sxdp_flags;
@@ -53,6 +56,7 @@ struct xdp_umem_reg {
 	__u64 len; /* Length of packet data area */
 	__u32 chunk_size;
 	__u32 headroom;
+	__u32 flags;
 };
 
 struct xdp_statistics {
@@ -74,6 +78,11 @@ struct xdp_options {
 #define XDP_UMEM_PGOFF_FILL_RING	0x100000000ULL
 #define XDP_UMEM_PGOFF_COMPLETION_RING	0x180000000ULL
 
+/* Masks for unaligned chunks mode */
+#define XSK_UNALIGNED_BUF_OFFSET_SHIFT 48
+#define XSK_UNALIGNED_BUF_ADDR_MASK \
+	((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1)
+
 /* Rx/Tx descriptor */
 struct xdp_desc {
 	__u64 addr;
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 83de74ca729a..952ca22103e9 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -299,6 +299,7 @@ static int xdp_umem_account_pages(struct xdp_umem *umem)
 
 static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 {
+	bool unaligned_chunks = mr->flags & XDP_UMEM_UNALIGNED_CHUNKS;
 	u32 chunk_size = mr->chunk_size, headroom = mr->headroom;
 	unsigned int chunks, chunks_per_page;
 	u64 addr = mr->addr, size = mr->len;
@@ -314,7 +315,10 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 		return -EINVAL;
 	}
 
-	if (!is_power_of_2(chunk_size))
+	if (mr->flags & ~(XDP_UMEM_UNALIGNED_CHUNKS))
+		return -EINVAL;
+
+	if (!unaligned_chunks && !is_power_of_2(chunk_size))
 		return -EINVAL;
 
 	if (!PAGE_ALIGNED(addr)) {
@@ -331,9 +335,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 	if (chunks == 0)
 		return -EINVAL;
 
-	chunks_per_page = PAGE_SIZE / chunk_size;
-	if (chunks < chunks_per_page || chunks % chunks_per_page)
-		return -EINVAL;
+	if (!unaligned_chunks) {
+		chunks_per_page = PAGE_SIZE / chunk_size;
+		if (chunks < chunks_per_page || chunks % chunks_per_page)
+			return -EINVAL;
+	}
 
 	headroom = ALIGN(headroom, 64);
 
@@ -342,13 +348,15 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 		return -EINVAL;
 
 	umem->address = (unsigned long)addr;
-	umem->chunk_mask = ~((u64)chunk_size - 1);
+	umem->chunk_mask = unaligned_chunks ? XSK_UNALIGNED_BUF_ADDR_MASK
+					    : ~((u64)chunk_size - 1);
 	umem->size = size;
 	umem->headroom = headroom;
 	umem->chunk_size_nohr = chunk_size - headroom;
 	umem->npgs = size / PAGE_SIZE;
 	umem->pgs = NULL;
 	umem->user = NULL;
+	umem->flags = mr->flags;
 	INIT_LIST_HEAD(&umem->xsk_list);
 	spin_lock_init(&umem->xsk_list_lock);
 
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 59b57d708697..b3ab653091c4 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -45,7 +45,7 @@ EXPORT_SYMBOL(xsk_umem_has_addrs);
 
 u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
 {
-	return xskq_peek_addr(umem->fq, addr);
+	return xskq_peek_addr(umem->fq, addr, umem);
 }
 EXPORT_SYMBOL(xsk_umem_peek_addr);
 
@@ -55,21 +55,42 @@ void xsk_umem_discard_addr(struct xdp_umem *umem)
 }
 EXPORT_SYMBOL(xsk_umem_discard_addr);
 
+/* If a buffer crosses a page boundary, we need to do 2 memcpy's, one for
+ * each page. This is only required in copy mode.
+ */
+static void __xsk_rcv_memcpy(struct xdp_umem *umem, u64 addr, void *from_buf,
+			     u32 len, u32 metalen)
+{
+	void *to_buf = xdp_umem_get_data(umem, addr);
+
+	if (xskq_crosses_non_contig_pg(umem, addr, len + metalen)) {
+		void *next_pg_addr = umem->pages[(addr >> PAGE_SHIFT) + 1].addr;
+		u64 page_start = addr & ~(PAGE_SIZE - 1);
+		u64 first_len = PAGE_SIZE - (addr - page_start);
+
+		memcpy(to_buf, from_buf, first_len + metalen);
+		memcpy(next_pg_addr, from_buf + first_len, len - first_len);
+
+		return;
+	}
+
+	memcpy(to_buf, from_buf, len + metalen);
+}
+
 static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 {
-	void *to_buf, *from_buf;
+	u64 offset = xs->umem->headroom;
+	void *from_buf;
 	u32 metalen;
 	u64 addr;
 	int err;
 
-	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
+	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
 	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
 		xs->rx_dropped++;
 		return -ENOSPC;
 	}
 
-	addr += xs->umem->headroom;
-
 	if (unlikely(xdp_data_meta_unsupported(xdp))) {
 		from_buf = xdp->data;
 		metalen = 0;
@@ -78,9 +99,13 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 		metalen = xdp->data - xdp->data_meta;
 	}
 
-	to_buf = xdp_umem_get_data(xs->umem, addr);
-	memcpy(to_buf, from_buf, len + metalen);
-	addr += metalen;
+	__xsk_rcv_memcpy(xs->umem, addr + offset, from_buf, len, metalen);
+
+	offset += metalen;
+	if (xs->umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
+		addr |= offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+	else
+		addr += offset;
 	err = xskq_produce_batch_desc(xs->rx, addr, len);
 	if (!err) {
 		xskq_discard_addr(xs->umem->fq);
@@ -127,6 +152,7 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 	u32 len = xdp->data_end - xdp->data;
 	void *buffer;
 	u64 addr;
+	u64 offset = xs->umem->headroom;
 	int err;
 
 	spin_lock_bh(&xs->rx_lock);
@@ -136,17 +162,17 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 		goto out_unlock;
 	}
 
-	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
+	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
 	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
 		err = -ENOSPC;
 		goto out_drop;
 	}
 
-	addr += xs->umem->headroom;
-
-	buffer = xdp_umem_get_data(xs->umem, addr);
+	buffer = xdp_umem_get_data(xs->umem, addr + offset);
 	memcpy(buffer, xdp->data_meta, len + metalen);
-	addr += metalen;
+	offset += metalen;
+
+	addr = xsk_umem_handle_offset(xs->umem, addr, offset);
 	err = xskq_produce_batch_desc(xs->rx, addr, len);
 	if (err)
 		goto out_drop;
@@ -190,7 +216,7 @@ bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc)
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(xs, &umem->xsk_list, list) {
-		if (!xskq_peek_desc(xs->tx, desc))
+		if (!xskq_peek_desc(xs->tx, desc, umem))
 			continue;
 
 		if (xskq_produce_addr_lazy(umem->cq, desc->addr))
@@ -243,7 +269,7 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
 	if (xs->queue_id >= xs->dev->real_num_tx_queues)
 		goto out;
 
-	while (xskq_peek_desc(xs->tx, &desc)) {
+	while (xskq_peek_desc(xs->tx, &desc, xs->umem)) {
 		char *buffer;
 		u64 addr;
 		u32 len;
@@ -262,6 +288,10 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
 
 		skb_put(skb, len);
 		addr = desc.addr;
+		if (xs->umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
+			addr = (addr & XSK_UNALIGNED_BUF_ADDR_MASK) |
+				(addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
+
 		buffer = xdp_umem_get_data(xs->umem, addr);
 		err = skb_store_bits(skb, 0, buffer, len);
 		if (unlikely(err) || xskq_reserve_addr(xs->umem->cq)) {
@@ -272,7 +302,7 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
 		skb->dev = xs->dev;
 		skb->priority = sk->sk_priority;
 		skb->mark = sk->sk_mark;
-		skb_shinfo(skb)->destructor_arg = (void *)(long)addr;
+		skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
 		skb->destructor = xsk_destruct_skb;
 
 		err = dev_direct_xmit(skb, xs->queue_id);
@@ -412,6 +442,28 @@ static struct socket *xsk_lookup_xsk_from_fd(int fd)
 	return sock;
 }
 
+/* Check if umem pages are contiguous.
+ * If zero-copy mode, use the DMA address to do the page contiguity check
+ * For all other modes we use addr (kernel virtual address)
+ */
+static void xsk_check_page_contiguity(struct xdp_umem *umem, u32 flags)
+{
+	int i;
+
+	if (flags & XDP_ZEROCOPY) {
+		for (i = 0; i < umem->npgs - 1; i++)
+			umem->pages[i].next_pg_contig =
+					(umem->pages[i].dma + PAGE_SIZE ==
+						umem->pages[i + 1].dma);
+		return;
+	}
+
+	for (i = 0; i < umem->npgs - 1; i++)
+		umem->pages[i].next_pg_contig =
+				(umem->pages[i].addr + PAGE_SIZE ==
+					umem->pages[i + 1].addr);
+}
+
 static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 {
 	struct sockaddr_xdp *sxdp = (struct sockaddr_xdp *)addr;
@@ -500,6 +552,8 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 		err = xdp_umem_assign_dev(xs->umem, dev, qid, flags);
 		if (err)
 			goto out_unlock;
+
+		xsk_check_page_contiguity(xs->umem, flags);
 	}
 
 	xs->dev = dev;
diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c
index d5e06c8e0cbf..9986a759fe06 100644
--- a/net/xdp/xsk_diag.c
+++ b/net/xdp/xsk_diag.c
@@ -56,7 +56,7 @@ static int xsk_diag_put_umem(const struct xdp_sock *xs, struct sk_buff *nlskb)
 	du.id = umem->id;
 	du.size = umem->size;
 	du.num_pages = umem->npgs;
-	du.chunk_size = (__u32)(~umem->chunk_mask + 1);
+	du.chunk_size = umem->chunk_size_nohr + umem->headroom;
 	du.headroom = umem->headroom;
 	du.ifindex = umem->dev ? umem->dev->ifindex : 0;
 	du.queue_id = umem->queue_id;
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 909c5168ed0f..0d77212367f0 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -133,6 +133,16 @@ static inline bool xskq_has_addrs(struct xsk_queue *q, u32 cnt)
 
 /* UMEM queue */
 
+static inline bool xskq_crosses_non_contig_pg(struct xdp_umem *umem, u64 addr,
+					      u64 length)
+{
+	bool cross_pg = (addr & (PAGE_SIZE - 1)) + length > PAGE_SIZE;
+	bool next_pg_contig =
+		umem->pages[(addr >> PAGE_SHIFT) + 1].next_pg_contig;
+
+	return cross_pg && !next_pg_contig;
+}
+
 static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
 {
 	if (addr >= q->size) {
@@ -143,23 +153,50 @@ static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
 	return true;
 }
 
-static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr)
+static inline bool xskq_is_valid_addr_unaligned(struct xsk_queue *q, u64 addr,
+						u64 length,
+						struct xdp_umem *umem)
+{
+	addr += addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+	addr &= XSK_UNALIGNED_BUF_ADDR_MASK;
+	if (addr >= q->size ||
+	    xskq_crosses_non_contig_pg(umem, addr, length)) {
+		q->invalid_descs++;
+		return false;
+	}
+
+	return true;
+}
+
+static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr,
+				      struct xdp_umem *umem)
 {
 	while (q->cons_tail != q->cons_head) {
 		struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
 		unsigned int idx = q->cons_tail & q->ring_mask;
 
 		*addr = READ_ONCE(ring->desc[idx]) & q->chunk_mask;
+
+		if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS) {
+			if (xskq_is_valid_addr_unaligned(q, *addr,
+							 umem->chunk_size_nohr,
+							 umem))
+				return addr;
+			goto out;
+		}
+
 		if (xskq_is_valid_addr(q, *addr))
 			return addr;
 
+out:
 		q->cons_tail++;
 	}
 
 	return NULL;
 }
 
-static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
+static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr,
+				  struct xdp_umem *umem)
 {
 	if (q->cons_tail == q->cons_head) {
 		smp_mb(); /* D, matches A */
@@ -170,7 +207,7 @@ static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
 		smp_rmb();
 	}
 
-	return xskq_validate_addr(q, addr);
+	return xskq_validate_addr(q, addr, umem);
 }
 
 static inline void xskq_discard_addr(struct xsk_queue *q)
@@ -229,8 +266,21 @@ static inline int xskq_reserve_addr(struct xsk_queue *q)
 
 /* Rx/Tx queue */
 
-static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
+static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d,
+				      struct xdp_umem *umem)
 {
+	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS) {
+		if (!xskq_is_valid_addr_unaligned(q, d->addr, d->len, umem))
+			return false;
+
+		if (d->len > umem->chunk_size_nohr || d->options) {
+			q->invalid_descs++;
+			return false;
+		}
+
+		return true;
+	}
+
 	if (!xskq_is_valid_addr(q, d->addr))
 		return false;
 
@@ -244,14 +294,15 @@ static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
 }
 
 static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
-						  struct xdp_desc *desc)
+						  struct xdp_desc *desc,
+						  struct xdp_umem *umem)
 {
 	while (q->cons_tail != q->cons_head) {
 		struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
 		unsigned int idx = q->cons_tail & q->ring_mask;
 
 		*desc = READ_ONCE(ring->desc[idx]);
-		if (xskq_is_valid_desc(q, desc))
+		if (xskq_is_valid_desc(q, desc, umem))
 			return desc;
 
 		q->cons_tail++;
@@ -261,7 +312,8 @@ static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
 }
 
 static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
-					      struct xdp_desc *desc)
+					      struct xdp_desc *desc,
+					      struct xdp_umem *umem)
 {
 	if (q->cons_tail == q->cons_head) {
 		smp_mb(); /* D, matches A */
@@ -272,7 +324,7 @@ static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
 		smp_rmb(); /* C, matches B */
 	}
 
-	return xskq_validate_desc(q, desc);
+	return xskq_validate_desc(q, desc, umem);
 }
 
 static inline void xskq_discard_desc(struct xsk_queue *q)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v3 03/11] xsk: add support to allow unaligned chunk placement
@ 2019-07-24  5:10       ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-24  5:10 UTC (permalink / raw)
  To: intel-wired-lan

Currently, addresses are chunk size aligned. This means we are very
restricted in terms of where we can place a chunk within the umem. For
example, if we have a chunk size of 2k, then our chunks can only be placed
at 0, 2k, 4k, 6k, 8k and so on (i.e. every 2k starting from 0).

This patch introduces the ability to use unaligned chunks. With these
changes, we are no longer bound to having to place chunks at a 2k (or
whatever your chunk size is) interval. Since we are no longer dealing with
aligned chunks, they can now cross page boundaries. Checks for page
contiguity have been added in order to keep track of which pages are
followed by a physically contiguous page.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

---
v2:
  - Add checks for the flags coming from userspace
  - Fix how we get chunk_size in xsk_diag.c
  - Add defines for masking the new descriptor format
  - Modified the rx functions to use new descriptor format
  - Modified the tx functions to use new descriptor format

v3:
  - Add helper function to do address/offset masking/addition
---
 include/net/xdp_sock.h      | 17 ++++++++
 include/uapi/linux/if_xdp.h |  9 ++++
 net/xdp/xdp_umem.c          | 18 +++++---
 net/xdp/xsk.c               | 86 ++++++++++++++++++++++++++++++-------
 net/xdp/xsk_diag.c          |  2 +-
 net/xdp/xsk_queue.h         | 68 +++++++++++++++++++++++++----
 6 files changed, 170 insertions(+), 30 deletions(-)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index 69796d264f06..738996c0f995 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -19,6 +19,7 @@ struct xsk_queue;
 struct xdp_umem_page {
 	void *addr;
 	dma_addr_t dma;
+	bool next_pg_contig;
 };
 
 struct xdp_umem_fq_reuse {
@@ -48,6 +49,7 @@ struct xdp_umem {
 	bool zc;
 	spinlock_t xsk_list_lock;
 	struct list_head xsk_list;
+	u32 flags;
 };
 
 struct xdp_sock {
@@ -144,6 +146,15 @@ static inline void xsk_umem_fq_reuse(struct xdp_umem *umem, u64 addr)
 
 	rq->handles[rq->length++] = addr;
 }
+
+static inline u64 xsk_umem_handle_offset(struct xdp_umem *umem, u64 handle,
+					 u64 offset)
+{
+	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
+		return handle |= (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
+	else
+		return handle += offset;
+}
 #else
 static inline int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
@@ -241,6 +252,12 @@ static inline void xsk_umem_fq_reuse(struct xdp_umem *umem, u64 addr)
 {
 }
 
+static inline u64 xsk_umem_handle_offset(struct xdp_umem *umem, u64 handle,
+					 u64 offset)
+{
+	return 0;
+}
+
 #endif /* CONFIG_XDP_SOCKETS */
 
 #endif /* _LINUX_XDP_SOCK_H */
diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index faaa5ca2a117..f8dc68fcdf78 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -17,6 +17,9 @@
 #define XDP_COPY	(1 << 1) /* Force copy-mode */
 #define XDP_ZEROCOPY	(1 << 2) /* Force zero-copy mode */
 
+/* Flags for xsk_umem_config flags */
+#define XDP_UMEM_UNALIGNED_CHUNKS (1 << 0)
+
 struct sockaddr_xdp {
 	__u16 sxdp_family;
 	__u16 sxdp_flags;
@@ -53,6 +56,7 @@ struct xdp_umem_reg {
 	__u64 len; /* Length of packet data area */
 	__u32 chunk_size;
 	__u32 headroom;
+	__u32 flags;
 };
 
 struct xdp_statistics {
@@ -74,6 +78,11 @@ struct xdp_options {
 #define XDP_UMEM_PGOFF_FILL_RING	0x100000000ULL
 #define XDP_UMEM_PGOFF_COMPLETION_RING	0x180000000ULL
 
+/* Masks for unaligned chunks mode */
+#define XSK_UNALIGNED_BUF_OFFSET_SHIFT 48
+#define XSK_UNALIGNED_BUF_ADDR_MASK \
+	((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1)
+
 /* Rx/Tx descriptor */
 struct xdp_desc {
 	__u64 addr;
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 83de74ca729a..952ca22103e9 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -299,6 +299,7 @@ static int xdp_umem_account_pages(struct xdp_umem *umem)
 
 static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 {
+	bool unaligned_chunks = mr->flags & XDP_UMEM_UNALIGNED_CHUNKS;
 	u32 chunk_size = mr->chunk_size, headroom = mr->headroom;
 	unsigned int chunks, chunks_per_page;
 	u64 addr = mr->addr, size = mr->len;
@@ -314,7 +315,10 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 		return -EINVAL;
 	}
 
-	if (!is_power_of_2(chunk_size))
+	if (mr->flags & ~(XDP_UMEM_UNALIGNED_CHUNKS))
+		return -EINVAL;
+
+	if (!unaligned_chunks && !is_power_of_2(chunk_size))
 		return -EINVAL;
 
 	if (!PAGE_ALIGNED(addr)) {
@@ -331,9 +335,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 	if (chunks == 0)
 		return -EINVAL;
 
-	chunks_per_page = PAGE_SIZE / chunk_size;
-	if (chunks < chunks_per_page || chunks % chunks_per_page)
-		return -EINVAL;
+	if (!unaligned_chunks) {
+		chunks_per_page = PAGE_SIZE / chunk_size;
+		if (chunks < chunks_per_page || chunks % chunks_per_page)
+			return -EINVAL;
+	}
 
 	headroom = ALIGN(headroom, 64);
 
@@ -342,13 +348,15 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 		return -EINVAL;
 
 	umem->address = (unsigned long)addr;
-	umem->chunk_mask = ~((u64)chunk_size - 1);
+	umem->chunk_mask = unaligned_chunks ? XSK_UNALIGNED_BUF_ADDR_MASK
+					    : ~((u64)chunk_size - 1);
 	umem->size = size;
 	umem->headroom = headroom;
 	umem->chunk_size_nohr = chunk_size - headroom;
 	umem->npgs = size / PAGE_SIZE;
 	umem->pgs = NULL;
 	umem->user = NULL;
+	umem->flags = mr->flags;
 	INIT_LIST_HEAD(&umem->xsk_list);
 	spin_lock_init(&umem->xsk_list_lock);
 
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 59b57d708697..b3ab653091c4 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -45,7 +45,7 @@ EXPORT_SYMBOL(xsk_umem_has_addrs);
 
 u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
 {
-	return xskq_peek_addr(umem->fq, addr);
+	return xskq_peek_addr(umem->fq, addr, umem);
 }
 EXPORT_SYMBOL(xsk_umem_peek_addr);
 
@@ -55,21 +55,42 @@ void xsk_umem_discard_addr(struct xdp_umem *umem)
 }
 EXPORT_SYMBOL(xsk_umem_discard_addr);
 
+/* If a buffer crosses a page boundary, we need to do 2 memcpy's, one for
+ * each page. This is only required in copy mode.
+ */
+static void __xsk_rcv_memcpy(struct xdp_umem *umem, u64 addr, void *from_buf,
+			     u32 len, u32 metalen)
+{
+	void *to_buf = xdp_umem_get_data(umem, addr);
+
+	if (xskq_crosses_non_contig_pg(umem, addr, len + metalen)) {
+		void *next_pg_addr = umem->pages[(addr >> PAGE_SHIFT) + 1].addr;
+		u64 page_start = addr & ~(PAGE_SIZE - 1);
+		u64 first_len = PAGE_SIZE - (addr - page_start);
+
+		memcpy(to_buf, from_buf, first_len + metalen);
+		memcpy(next_pg_addr, from_buf + first_len, len - first_len);
+
+		return;
+	}
+
+	memcpy(to_buf, from_buf, len + metalen);
+}
+
 static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 {
-	void *to_buf, *from_buf;
+	u64 offset = xs->umem->headroom;
+	void *from_buf;
 	u32 metalen;
 	u64 addr;
 	int err;
 
-	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
+	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
 	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
 		xs->rx_dropped++;
 		return -ENOSPC;
 	}
 
-	addr += xs->umem->headroom;
-
 	if (unlikely(xdp_data_meta_unsupported(xdp))) {
 		from_buf = xdp->data;
 		metalen = 0;
@@ -78,9 +99,13 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 		metalen = xdp->data - xdp->data_meta;
 	}
 
-	to_buf = xdp_umem_get_data(xs->umem, addr);
-	memcpy(to_buf, from_buf, len + metalen);
-	addr += metalen;
+	__xsk_rcv_memcpy(xs->umem, addr + offset, from_buf, len, metalen);
+
+	offset += metalen;
+	if (xs->umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
+		addr |= offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+	else
+		addr += offset;
 	err = xskq_produce_batch_desc(xs->rx, addr, len);
 	if (!err) {
 		xskq_discard_addr(xs->umem->fq);
@@ -127,6 +152,7 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 	u32 len = xdp->data_end - xdp->data;
 	void *buffer;
 	u64 addr;
+	u64 offset = xs->umem->headroom;
 	int err;
 
 	spin_lock_bh(&xs->rx_lock);
@@ -136,17 +162,17 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 		goto out_unlock;
 	}
 
-	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
+	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
 	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
 		err = -ENOSPC;
 		goto out_drop;
 	}
 
-	addr += xs->umem->headroom;
-
-	buffer = xdp_umem_get_data(xs->umem, addr);
+	buffer = xdp_umem_get_data(xs->umem, addr + offset);
 	memcpy(buffer, xdp->data_meta, len + metalen);
-	addr += metalen;
+	offset += metalen;
+
+	addr = xsk_umem_handle_offset(xs->umem, addr, offset);
 	err = xskq_produce_batch_desc(xs->rx, addr, len);
 	if (err)
 		goto out_drop;
@@ -190,7 +216,7 @@ bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc)
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(xs, &umem->xsk_list, list) {
-		if (!xskq_peek_desc(xs->tx, desc))
+		if (!xskq_peek_desc(xs->tx, desc, umem))
 			continue;
 
 		if (xskq_produce_addr_lazy(umem->cq, desc->addr))
@@ -243,7 +269,7 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
 	if (xs->queue_id >= xs->dev->real_num_tx_queues)
 		goto out;
 
-	while (xskq_peek_desc(xs->tx, &desc)) {
+	while (xskq_peek_desc(xs->tx, &desc, xs->umem)) {
 		char *buffer;
 		u64 addr;
 		u32 len;
@@ -262,6 +288,10 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
 
 		skb_put(skb, len);
 		addr = desc.addr;
+		if (xs->umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
+			addr = (addr & XSK_UNALIGNED_BUF_ADDR_MASK) |
+				(addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
+
 		buffer = xdp_umem_get_data(xs->umem, addr);
 		err = skb_store_bits(skb, 0, buffer, len);
 		if (unlikely(err) || xskq_reserve_addr(xs->umem->cq)) {
@@ -272,7 +302,7 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
 		skb->dev = xs->dev;
 		skb->priority = sk->sk_priority;
 		skb->mark = sk->sk_mark;
-		skb_shinfo(skb)->destructor_arg = (void *)(long)addr;
+		skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
 		skb->destructor = xsk_destruct_skb;
 
 		err = dev_direct_xmit(skb, xs->queue_id);
@@ -412,6 +442,28 @@ static struct socket *xsk_lookup_xsk_from_fd(int fd)
 	return sock;
 }
 
+/* Check if umem pages are contiguous.
+ * If zero-copy mode, use the DMA address to do the page contiguity check
+ * For all other modes we use addr (kernel virtual address)
+ */
+static void xsk_check_page_contiguity(struct xdp_umem *umem, u32 flags)
+{
+	int i;
+
+	if (flags & XDP_ZEROCOPY) {
+		for (i = 0; i < umem->npgs - 1; i++)
+			umem->pages[i].next_pg_contig =
+					(umem->pages[i].dma + PAGE_SIZE ==
+						umem->pages[i + 1].dma);
+		return;
+	}
+
+	for (i = 0; i < umem->npgs - 1; i++)
+		umem->pages[i].next_pg_contig =
+				(umem->pages[i].addr + PAGE_SIZE ==
+					umem->pages[i + 1].addr);
+}
+
 static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 {
 	struct sockaddr_xdp *sxdp = (struct sockaddr_xdp *)addr;
@@ -500,6 +552,8 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 		err = xdp_umem_assign_dev(xs->umem, dev, qid, flags);
 		if (err)
 			goto out_unlock;
+
+		xsk_check_page_contiguity(xs->umem, flags);
 	}
 
 	xs->dev = dev;
diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c
index d5e06c8e0cbf..9986a759fe06 100644
--- a/net/xdp/xsk_diag.c
+++ b/net/xdp/xsk_diag.c
@@ -56,7 +56,7 @@ static int xsk_diag_put_umem(const struct xdp_sock *xs, struct sk_buff *nlskb)
 	du.id = umem->id;
 	du.size = umem->size;
 	du.num_pages = umem->npgs;
-	du.chunk_size = (__u32)(~umem->chunk_mask + 1);
+	du.chunk_size = umem->chunk_size_nohr + umem->headroom;
 	du.headroom = umem->headroom;
 	du.ifindex = umem->dev ? umem->dev->ifindex : 0;
 	du.queue_id = umem->queue_id;
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 909c5168ed0f..0d77212367f0 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -133,6 +133,16 @@ static inline bool xskq_has_addrs(struct xsk_queue *q, u32 cnt)
 
 /* UMEM queue */
 
+static inline bool xskq_crosses_non_contig_pg(struct xdp_umem *umem, u64 addr,
+					      u64 length)
+{
+	bool cross_pg = (addr & (PAGE_SIZE - 1)) + length > PAGE_SIZE;
+	bool next_pg_contig =
+		umem->pages[(addr >> PAGE_SHIFT) + 1].next_pg_contig;
+
+	return cross_pg && !next_pg_contig;
+}
+
 static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
 {
 	if (addr >= q->size) {
@@ -143,23 +153,50 @@ static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
 	return true;
 }
 
-static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr)
+static inline bool xskq_is_valid_addr_unaligned(struct xsk_queue *q, u64 addr,
+						u64 length,
+						struct xdp_umem *umem)
+{
+	addr += addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+	addr &= XSK_UNALIGNED_BUF_ADDR_MASK;
+	if (addr >= q->size ||
+	    xskq_crosses_non_contig_pg(umem, addr, length)) {
+		q->invalid_descs++;
+		return false;
+	}
+
+	return true;
+}
+
+static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr,
+				      struct xdp_umem *umem)
 {
 	while (q->cons_tail != q->cons_head) {
 		struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
 		unsigned int idx = q->cons_tail & q->ring_mask;
 
 		*addr = READ_ONCE(ring->desc[idx]) & q->chunk_mask;
+
+		if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS) {
+			if (xskq_is_valid_addr_unaligned(q, *addr,
+							 umem->chunk_size_nohr,
+							 umem))
+				return addr;
+			goto out;
+		}
+
 		if (xskq_is_valid_addr(q, *addr))
 			return addr;
 
+out:
 		q->cons_tail++;
 	}
 
 	return NULL;
 }
 
-static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
+static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr,
+				  struct xdp_umem *umem)
 {
 	if (q->cons_tail == q->cons_head) {
 		smp_mb(); /* D, matches A */
@@ -170,7 +207,7 @@ static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
 		smp_rmb();
 	}
 
-	return xskq_validate_addr(q, addr);
+	return xskq_validate_addr(q, addr, umem);
 }
 
 static inline void xskq_discard_addr(struct xsk_queue *q)
@@ -229,8 +266,21 @@ static inline int xskq_reserve_addr(struct xsk_queue *q)
 
 /* Rx/Tx queue */
 
-static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
+static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d,
+				      struct xdp_umem *umem)
 {
+	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS) {
+		if (!xskq_is_valid_addr_unaligned(q, d->addr, d->len, umem))
+			return false;
+
+		if (d->len > umem->chunk_size_nohr || d->options) {
+			q->invalid_descs++;
+			return false;
+		}
+
+		return true;
+	}
+
 	if (!xskq_is_valid_addr(q, d->addr))
 		return false;
 
@@ -244,14 +294,15 @@ static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
 }
 
 static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
-						  struct xdp_desc *desc)
+						  struct xdp_desc *desc,
+						  struct xdp_umem *umem)
 {
 	while (q->cons_tail != q->cons_head) {
 		struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
 		unsigned int idx = q->cons_tail & q->ring_mask;
 
 		*desc = READ_ONCE(ring->desc[idx]);
-		if (xskq_is_valid_desc(q, desc))
+		if (xskq_is_valid_desc(q, desc, umem))
 			return desc;
 
 		q->cons_tail++;
@@ -261,7 +312,8 @@ static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
 }
 
 static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
-					      struct xdp_desc *desc)
+					      struct xdp_desc *desc,
+					      struct xdp_umem *umem)
 {
 	if (q->cons_tail == q->cons_head) {
 		smp_mb(); /* D, matches A */
@@ -272,7 +324,7 @@ static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
 		smp_rmb(); /* C, matches B */
 	}
 
-	return xskq_validate_desc(q, desc);
+	return xskq_validate_desc(q, desc, umem);
 }
 
 static inline void xskq_discard_desc(struct xsk_queue *q)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v3 04/11] i40e: modify driver for handling offsets
  2019-07-24  5:10     ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-24  5:10       ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-24  5:10 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

With the addition of the unaligned chunks option, we need to make sure we
handle the offsets accordingly based on the mode we are currently running
in. This patch modifies the driver to appropriately mask the address for
each case.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>

---
v3:
  - Use new helper function for handling the offset
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index dfa096db2244..599498e21e4a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -190,7 +190,9 @@ int i40e_xsk_umem_setup(struct i40e_vsi *vsi, struct xdp_umem *umem,
  **/
 static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 {
+	struct xdp_umem *umem = rx_ring->xsk_umem;
 	int err, result = I40E_XDP_PASS;
+	u64 offset = umem->headroom;
 	struct i40e_ring *xdp_ring;
 	struct bpf_prog *xdp_prog;
 	u32 act;
@@ -201,7 +203,10 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 	 */
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
 	act = bpf_prog_run_xdp(xdp_prog, xdp);
-	xdp->handle += xdp->data - xdp->data_hard_start;
+	offset += xdp->data - xdp->data_hard_start;
+
+	xdp->handle = xsk_umem_handle_offset(umem, xdp->handle, offset);
+
 	switch (act) {
 	case XDP_PASS:
 		break;
@@ -262,7 +267,7 @@ static bool i40e_alloc_buffer_zc(struct i40e_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr(umem);
 	return true;
@@ -299,7 +304,7 @@ static bool i40e_alloc_buffer_slow_zc(struct i40e_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr_rq(umem);
 	return true;
@@ -464,7 +469,7 @@ void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
 	bi->addr = xdp_umem_get_data(rx_ring->xsk_umem, handle);
 	bi->addr += hr;
 
-	bi->handle = (u64)handle + rx_ring->xsk_umem->headroom;
+	bi->handle = (u64)handle;
 }
 
 /**
@@ -635,6 +640,7 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
 	struct i40e_tx_buffer *tx_bi;
 	bool work_done = true;
 	struct xdp_desc desc;
+	u64 addr, offset;
 	dma_addr_t dma;
 
 	while (budget-- > 0) {
@@ -647,7 +653,11 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
 		if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &desc))
 			break;
 
-		dma = xdp_umem_get_dma(xdp_ring->xsk_umem, desc.addr);
+		/* for unaligned chunks need to take offset from upper bits */
+		offset = (desc.addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
+		addr = (desc.addr & XSK_UNALIGNED_BUF_ADDR_MASK);
+
+		dma = xdp_umem_get_dma(xdp_ring->xsk_umem, addr + offset);
 
 		dma_sync_single_for_device(xdp_ring->dev, dma, desc.len,
 					   DMA_BIDIRECTIONAL);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v3 05/11] ixgbe: modify driver for handling offsets
  2019-07-24  5:10     ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-24  5:10       ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-24  5:10 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

With the addition of the unaligned chunks option, we need to make sure we
handle the offsets accordingly based on the mode we are currently running
in. This patch modifies the driver to appropriately mask the address for
each case.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>

---
v3:
  - Use new helper function to handle offset
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index bc86057628c8..4481916563a8 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -143,7 +143,9 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
 			    struct ixgbe_ring *rx_ring,
 			    struct xdp_buff *xdp)
 {
+	struct xdp_umem *umem = rx_ring->xsk_umem;
 	int err, result = IXGBE_XDP_PASS;
+	u64 offset = umem->headroom;
 	struct bpf_prog *xdp_prog;
 	struct xdp_frame *xdpf;
 	u32 act;
@@ -151,7 +153,10 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
 	rcu_read_lock();
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
 	act = bpf_prog_run_xdp(xdp_prog, xdp);
-	xdp->handle += xdp->data - xdp->data_hard_start;
+	offset += xdp->data - xdp->data_hard_start;
+
+	xdp->handle = xsk_umem_handle_offset(umem, xdp->handle, offset);
+
 	switch (act) {
 	case XDP_PASS:
 		break;
@@ -243,7 +248,7 @@ void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
 	bi->addr = xdp_umem_get_data(rx_ring->xsk_umem, handle);
 	bi->addr += hr;
 
-	bi->handle = (u64)handle + rx_ring->xsk_umem->headroom;
+	bi->handle = (u64)handle;
 }
 
 static bool ixgbe_alloc_buffer_zc(struct ixgbe_ring *rx_ring,
@@ -269,7 +274,7 @@ static bool ixgbe_alloc_buffer_zc(struct ixgbe_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr(umem);
 	return true;
@@ -296,7 +301,7 @@ static bool ixgbe_alloc_buffer_slow_zc(struct ixgbe_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr_rq(umem);
 	return true;
@@ -565,6 +570,7 @@ static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
 	struct ixgbe_tx_buffer *tx_bi;
 	bool work_done = true;
 	struct xdp_desc desc;
+	u64 addr, offset;
 	dma_addr_t dma;
 	u32 cmd_type;
 
@@ -578,7 +584,11 @@ static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
 		if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &desc))
 			break;
 
-		dma = xdp_umem_get_dma(xdp_ring->xsk_umem, desc.addr);
+		/* for unaligned chunks need to take offset from upper bits */
+		offset = (desc.addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
+		addr = (desc.addr & XSK_UNALIGNED_BUF_ADDR_MASK);
+
+		dma = xdp_umem_get_dma(xdp_ring->xsk_umem, addr + offset);
 
 		dma_sync_single_for_device(xdp_ring->dev, dma, desc.len,
 					   DMA_BIDIRECTIONAL);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v3 06/11] mlx5e: modify driver for handling offsets
  2019-07-24  5:10     ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-24  5:10       ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-24  5:10 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

With the addition of the unaligned chunks option, we need to make sure we
handle the offsets accordingly based on the mode we are currently running
in. This patch modifies the driver to appropriately mask the address for
each case.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>

---
v3:
  - Use new helper function to handle offset
---
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c    | 8 ++++++--
 drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c | 9 +++++++--
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index b0b982cf69bb..d5245893d2c8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -122,6 +122,7 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
 		      void *va, u16 *rx_headroom, u32 *len, bool xsk)
 {
 	struct bpf_prog *prog = READ_ONCE(rq->xdp_prog);
+	struct xdp_umem *umem = rq->umem;
 	struct xdp_buff xdp;
 	u32 act;
 	int err;
@@ -138,8 +139,11 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
 	xdp.rxq = &rq->xdp_rxq;
 
 	act = bpf_prog_run_xdp(prog, &xdp);
-	if (xsk)
-		xdp.handle += xdp.data - xdp.data_hard_start;
+	if (xsk) {
+		u64 off = xdp.data - xdp.data_hard_start;
+
+		xdp.handle = xsk_umem_handle_offset(umem, xdp.handle, off);
+	}
 	switch (act) {
 	case XDP_PASS:
 		*rx_headroom = xdp.data - xdp.data_hard_start;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
index 35e188cf4ea4..f596e63cba00 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
@@ -61,6 +61,7 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
 	struct mlx5e_xdp_xmit_data xdptxd;
 	bool work_done = true;
 	bool flush = false;
+	u64 addr, offset;
 
 	xdpi.mode = MLX5E_XDP_XMIT_MODE_XSK;
 
@@ -82,8 +83,12 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
 			break;
 		}
 
-		xdptxd.dma_addr = xdp_umem_get_dma(umem, desc.addr);
-		xdptxd.data = xdp_umem_get_data(umem, desc.addr);
+		/* for unaligned chunks need to take offset from upper bits */
+		offset = (desc.addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
+		addr = (desc.addr & XSK_UNALIGNED_BUF_ADDR_MASK);
+
+		xdptxd.dma_addr = xdp_umem_get_dma(umem, addr + offset);
+		xdptxd.data = xdp_umem_get_data(umem, addr + offset);
 		xdptxd.len = desc.len;
 
 		dma_sync_single_for_device(sq->pdev, xdptxd.dma_addr,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v3 07/11] libbpf: add flags to umem config
  2019-07-24  5:10     ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-24  5:10       ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-24  5:10 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

This patch adds a 'flags' field to the umem_config and umem_reg structs.
This will allow for more options to be added for configuring umems.

The first use for the flags field is to add a flag for unaligned chunks
mode. These flags can either be user-provided or filled with a default.
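
For illustration only, not part of the patch: an application would request
unaligned chunk mode roughly as follows. Here bufs and size are placeholders
for the caller's umem memory area, and the 3000-byte frame size is just an
example of a non-power-of-2 size that only the unaligned mode accepts:

	struct xsk_umem_config cfg = {
		.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
		.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
		.frame_size = 3000,
		.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM,
		.flags = XDP_UMEM_UNALIGNED_CHUNKS,
	};
	struct xsk_ring_prod fq;
	struct xsk_ring_cons cq;
	struct xsk_umem *umem;
	int err;

	err = xsk_umem__create(&umem, bufs, size, &fq, &cq, &cfg);
	if (err)
		/* err is a negative errno value */
		return err;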

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>

---
v2:
  - Removed the headroom check from this patch. It has moved to the
    previous patch.
---
 tools/include/uapi/linux/if_xdp.h | 4 ++++
 tools/lib/bpf/xsk.c               | 3 +++
 tools/lib/bpf/xsk.h               | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h
index faaa5ca2a117..594bcebb189c 100644
--- a/tools/include/uapi/linux/if_xdp.h
+++ b/tools/include/uapi/linux/if_xdp.h
@@ -17,6 +17,9 @@
 #define XDP_COPY	(1 << 1) /* Force copy-mode */
 #define XDP_ZEROCOPY	(1 << 2) /* Force zero-copy mode */
 
+/* Flags for xsk_umem_config flags */
+#define XDP_UMEM_UNALIGNED_CHUNKS (1 << 0)
+
 struct sockaddr_xdp {
 	__u16 sxdp_family;
 	__u16 sxdp_flags;
@@ -53,6 +56,7 @@ struct xdp_umem_reg {
 	__u64 len; /* Length of packet data area */
 	__u32 chunk_size;
 	__u32 headroom;
+	__u32 flags;
 };
 
 struct xdp_statistics {
diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index 5007b5d4fd2c..5e7e4d420ee0 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -116,6 +116,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg,
 		cfg->comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
 		cfg->frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
 		cfg->frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM;
+		cfg->flags = XSK_UMEM__DEFAULT_FLAGS;
 		return;
 	}
 
@@ -123,6 +124,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg,
 	cfg->comp_size = usr_cfg->comp_size;
 	cfg->frame_size = usr_cfg->frame_size;
 	cfg->frame_headroom = usr_cfg->frame_headroom;
+	cfg->flags = usr_cfg->flags;
 }
 
 static int xsk_set_xdp_socket_config(struct xsk_socket_config *cfg,
@@ -182,6 +184,7 @@ int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
 	mr.len = size;
 	mr.chunk_size = umem->config.frame_size;
 	mr.headroom = umem->config.frame_headroom;
+	mr.flags = umem->config.flags;
 
 	err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr));
 	if (err) {
diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
index 833a6e60d065..44a03d8c34b9 100644
--- a/tools/lib/bpf/xsk.h
+++ b/tools/lib/bpf/xsk.h
@@ -170,12 +170,14 @@ LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
 #define XSK_UMEM__DEFAULT_FRAME_SHIFT    12 /* 4096 bytes */
 #define XSK_UMEM__DEFAULT_FRAME_SIZE     (1 << XSK_UMEM__DEFAULT_FRAME_SHIFT)
 #define XSK_UMEM__DEFAULT_FRAME_HEADROOM 0
+#define XSK_UMEM__DEFAULT_FLAGS 0
 
 struct xsk_umem_config {
 	__u32 fill_size;
 	__u32 comp_size;
 	__u32 frame_size;
 	__u32 frame_headroom;
+	__u32 flags;
 };
 
 /* Flags for the libbpf_flags field. */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v3 08/11] samples/bpf: add unaligned chunks mode support to xdpsock
  2019-07-24  5:10     ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-24  5:10       ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-24  5:10 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

This patch adds support for the unaligned chunks mode. The addition of the
unaligned chunks option will allow users to run the application with more
relaxed chunk placement in the XDP umem.

Unaligned chunks mode can be used with the '-u' or '--unaligned' command
line options.
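
For example, an illustrative invocation (interface and queue number are
placeholders) running l2fwd in zero-copy mode with unaligned, non-power-of-2
sized frames could look like:

	./xdpsock -i eth0 -q 0 -l -z -u -f 3000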

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
---
 samples/bpf/xdpsock_user.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 93eaaf7239b2..26ba1a1fd582 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -67,6 +67,8 @@ static int opt_ifindex;
 static int opt_queue;
 static int opt_poll;
 static int opt_interval = 1;
+static u32 opt_umem_flags;
+static int opt_unaligned_chunks;
 static u32 opt_xdp_bind_flags;
 static int opt_xsk_frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
 static __u32 prog_id;
@@ -282,7 +284,9 @@ static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
 		.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
 		.frame_size = opt_xsk_frame_size,
 		.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM,
+		.flags = opt_umem_flags
 	};
+
 	int ret;
 
 	umem = calloc(1, sizeof(*umem));
@@ -291,6 +295,7 @@ static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
 
 	ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
 			       &cfg);
+
 	if (ret)
 		exit_with_error(-ret);
 
@@ -352,6 +357,7 @@ static struct option long_options[] = {
 	{"zero-copy", no_argument, 0, 'z'},
 	{"copy", no_argument, 0, 'c'},
 	{"frame-size", required_argument, 0, 'f'},
+	{"unaligned", no_argument, 0, 'u'},
 	{0, 0, 0, 0}
 };
 
@@ -372,6 +378,7 @@ static void usage(const char *prog)
 		"  -z, --zero-copy      Force zero-copy mode.\n"
 		"  -c, --copy           Force copy mode.\n"
 		"  -f, --frame-size=n   Set the frame size (must be a power of two, default is %d).\n"
+		"  -u, --unaligned	Enable unaligned chunk placement\n"
 		"\n";
 	fprintf(stderr, str, prog, XSK_UMEM__DEFAULT_FRAME_SIZE);
 	exit(EXIT_FAILURE);
@@ -384,7 +391,7 @@ static void parse_command_line(int argc, char **argv)
 	opterr = 0;
 
 	for (;;) {
-		c = getopt_long(argc, argv, "Frtli:q:psSNn:czf:", long_options,
+		c = getopt_long(argc, argv, "Frtli:q:psSNn:czf:u", long_options,
 				&option_index);
 		if (c == -1)
 			break;
@@ -424,12 +431,17 @@ static void parse_command_line(int argc, char **argv)
 		case 'c':
 			opt_xdp_bind_flags |= XDP_COPY;
 			break;
+		case 'u':
+			opt_umem_flags |= XDP_UMEM_UNALIGNED_CHUNKS;
+			opt_unaligned_chunks = 1;
+			break;
 		case 'F':
 			opt_xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
 			break;
 		case 'f':
 			opt_xsk_frame_size = atoi(optarg);
 			break;
+
 		default:
 			usage(basename(argv[0]));
 		}
@@ -442,7 +454,8 @@ static void parse_command_line(int argc, char **argv)
 		usage(basename(argv[0]));
 	}
 
-	if (opt_xsk_frame_size & (opt_xsk_frame_size - 1)) {
+	if ((opt_xsk_frame_size & (opt_xsk_frame_size - 1)) &&
+			!opt_unaligned_chunks) {
 		fprintf(stderr, "--frame-size=%d is not a power of two\n",
 			opt_xsk_frame_size);
 		usage(basename(argv[0]));
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v3 09/11] samples/bpf: add buffer recycling for unaligned chunks to xdpsock
  2019-07-24  5:10     ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-24  5:10       ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-24  5:10 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

This patch adds buffer recycling support for unaligned buffers. Since we
don't mask the addr to 2k at umem_reg in unaligned mode, we need to make
sure we give back the correct (original) addr to the fill queue. We achieve
this using the new descriptor format and associated masks. The new format
uses the upper 16-bits for the offset and the lower 48-bits for the addr.
Since we have a field for the offset, we no longer need to modify the
actual address. As such, all we have to do to get back the original address
is mask off the upper 16 bits (i.e. strip the offset and we get the address
on its own).
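
As a concrete, illustrative example: if a buffer's base address within the
umem is 0x3000 and the packet data begins 256 bytes into it, the descriptor
addr carries ((u64)256 << 48) | 0x3000. Masking with
XSK_UNALIGNED_BUF_ADDR_MASK recovers 0x3000, which is what gets put back on
the fill ring, while 0x3000 + (addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT) =
0x3100 is where the packet data actually lives.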

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

---
v2:
  - Removed unused defines
  - Fix buffer recycling for unaligned case
  - Remove --buf-size (--frame-size merged before this)
  - Modifications to use the new descriptor format for buffer recycling
---
 samples/bpf/xdpsock_user.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 26ba1a1fd582..f824fa3b5021 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -474,6 +474,7 @@ static void kick_tx(struct xsk_socket_info *xsk)
 
 static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk)
 {
+	struct xsk_umem_info *umem = xsk->umem;
 	u32 idx_cq = 0, idx_fq = 0;
 	unsigned int rcvd;
 	size_t ndescs;
@@ -486,22 +487,21 @@ static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk)
 		xsk->outstanding_tx;
 
 	/* re-add completed Tx buffers */
-	rcvd = xsk_ring_cons__peek(&xsk->umem->cq, ndescs, &idx_cq);
+	rcvd = xsk_ring_cons__peek(&umem->cq, ndescs, &idx_cq);
 	if (rcvd > 0) {
 		unsigned int i;
 		int ret;
 
-		ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
+		ret = xsk_ring_prod__reserve(&umem->fq, rcvd, &idx_fq);
 		while (ret != rcvd) {
 			if (ret < 0)
 				exit_with_error(-ret);
-			ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd,
-						     &idx_fq);
+			ret = xsk_ring_prod__reserve(&umem->fq, rcvd, &idx_fq);
 		}
+
 		for (i = 0; i < rcvd; i++)
-			*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) =
-				*xsk_ring_cons__comp_addr(&xsk->umem->cq,
-							  idx_cq++);
+			*xsk_ring_prod__fill_addr(&umem->fq, idx_fq++) =
+				*xsk_ring_cons__comp_addr(&umem->cq, idx_cq++);
 
 		xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
 		xsk_ring_cons__release(&xsk->umem->cq, rcvd);
@@ -548,7 +548,11 @@ static void rx_drop(struct xsk_socket_info *xsk)
 	for (i = 0; i < rcvd; i++) {
 		u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
 		u32 len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++)->len;
-		char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
+		u64 offset = addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+
+		addr &= XSK_UNALIGNED_BUF_ADDR_MASK;
+		char *pkt = xsk_umem__get_data(xsk->umem->buffer,
+				addr + offset);
 
 		hex_dump(pkt, len, addr);
 		*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = addr;
@@ -654,7 +658,9 @@ static void l2fwd(struct xsk_socket_info *xsk)
 							  idx_rx)->addr;
 			u32 len = xsk_ring_cons__rx_desc(&xsk->rx,
 							 idx_rx++)->len;
-			char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
+			u64 offset = addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+			char *pkt = xsk_umem__get_data(xsk->umem->buffer,
+				(addr & XSK_UNALIGNED_BUF_ADDR_MASK) + offset);
 
 			swap_mac_addresses(pkt);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v3 10/11] samples/bpf: use hugepages in xdpsock app
  2019-07-24  5:10     ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-24  5:10       ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-24  5:10 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

This patch modifies xdpsock to use mmap instead of posix_memalign. With
this change, we can use hugepages when running the application in unaligned
chunks mode. Using hugepages makes it more likely that we have physically
contiguous memory, which supports the unaligned chunk mode better.
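
Note, not part of the patch: MAP_HUGETLB mappings only succeed if hugepages
have been reserved on the system beforehand, for example with something like

	echo 512 > /proc/sys/vm/nr_hugepages

where the count should be large enough to cover the umem size
(NUM_FRAMES * opt_xsk_frame_size in this sample); otherwise the mmap() call
below fails and the application exits.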

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 samples/bpf/xdpsock_user.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index f824fa3b5021..d5cf3111d3b0 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -69,6 +69,7 @@ static int opt_poll;
 static int opt_interval = 1;
 static u32 opt_umem_flags;
 static int opt_unaligned_chunks;
+static int opt_mmap_flags;
 static u32 opt_xdp_bind_flags;
 static int opt_xsk_frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
 static __u32 prog_id;
@@ -434,6 +435,7 @@ static void parse_command_line(int argc, char **argv)
 		case 'u':
 			opt_umem_flags |= XDP_UMEM_UNALIGNED_CHUNKS;
 			opt_unaligned_chunks = 1;
+			opt_mmap_flags = MAP_HUGETLB;
 			break;
 		case 'F':
 			opt_xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
@@ -693,11 +695,14 @@ int main(int argc, char **argv)
 		exit(EXIT_FAILURE);
 	}
 
-	ret = posix_memalign(&bufs, getpagesize(), /* PAGE_SIZE aligned */
-			     NUM_FRAMES * opt_xsk_frame_size);
-	if (ret)
-		exit_with_error(ret);
-
+	/* Reserve memory for the umem. Use hugepages if unaligned chunk mode */
+	bufs = mmap(NULL, NUM_FRAMES * opt_xsk_frame_size,
+		    PROT_READ | PROT_WRITE,
+		    MAP_PRIVATE | MAP_ANONYMOUS | opt_mmap_flags, -1, 0);
+	if (bufs == MAP_FAILED) {
+		printf("ERROR: mmap failed\n");
+		exit(EXIT_FAILURE);
+	}
        /* Create sockets... */
 	umem = xsk_configure_umem(bufs, NUM_FRAMES * opt_xsk_frame_size);
 	xsks[num_socks++] = xsk_configure_socket(umem);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v3 11/11] doc/af_xdp: include unaligned chunk case
  2019-07-24  5:10     ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-24  5:10       ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-24  5:10 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

With the addition of unaligned chunks mode, the documentation needs to be
updated to indicate that the incoming addr to the fill ring will only be
masked if the user application is run in the aligned chunk mode. This patch
also adds a line to explicitly indicate that the incoming addr will not be
masked if running the user application in the unaligned chunk mode.
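
As an illustrative example of the difference: with 2k chunks, addrs 2048,
2050 and 3000 all refer to the same chunk in aligned mode, whereas in
unaligned mode an addr of 2050 is passed through untouched and refers to a
buffer starting exactly at 2050.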

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 Documentation/networking/af_xdp.rst | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst
index eeedc2e826aa..83f7ae5fc045 100644
--- a/Documentation/networking/af_xdp.rst
+++ b/Documentation/networking/af_xdp.rst
@@ -153,10 +153,12 @@ an example, if the UMEM is 64k and each chunk is 4k, then the UMEM has
 
 Frames passed to the kernel are used for the ingress path (RX rings).
 
-The user application produces UMEM addrs to this ring. Note that the
-kernel will mask the incoming addr. E.g. for a chunk size of 2k, the
-log2(2048) LSB of the addr will be masked off, meaning that 2048, 2050
-and 3000 refers to the same chunk.
+The user application produces UMEM addrs to this ring. Note that, if
+running the application with aligned chunk mode, the kernel will mask
+the incoming addr.  E.g. for a chunk size of 2k, the log2(2048) LSB of
+the addr will be masked off, meaning that 2048, 2050 and 3000 refers
+to the same chunk. If the user application is run in the unaligned
+chunks mode, then the incoming addr will be left untouched.
 
 
 UMEM Completion Ring
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* Re: [Intel-wired-lan] [PATCH v2 00/10] XDP unaligned chunk placement support
  2019-07-23 21:08     ` [Intel-wired-lan] " Alexei Starovoitov
@ 2019-07-24 13:25       ` Magnus Karlsson
  -1 siblings, 0 replies; 274+ messages in thread
From: Magnus Karlsson @ 2019-07-24 13:25 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kevin Laatz, Jakub Kicinski, Daniel Borkmann,
	Network Development, ciara.loftus, Alexei Starovoitov,
	intel-wired-lan, Jonathan Lemon, bruce.richardson, bpf,
	Björn Töpel, Karlsson, Magnus

On Tue, Jul 23, 2019 at 11:08 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> Jonathan, Bjorn, Jakub,
> Please review!
> The patch set has been pending for a week.

There is a v3 coming out shortly so I suggest we wait for that. It
will have Mellanox support for this feature too and some clean ups. I
refrained from posting a review on the mailing list due to the merge
window being closed last week, but maybe that was not correct. Should
I still post reviews for new features submitted during the closed
merge window period? I am happy to do it since I do not have any other
tasks during that time. It is a quiet period for me. Just let me know.

/Magnus

> On Tue, Jul 16, 2019 at 4:21 AM Kevin Laatz <kevin.laatz@intel.com> wrote:
> >
> > This patch set adds the ability to use unaligned chunks in the XDP umem.
> >
> > Currently, all chunk addresses passed to the umem are masked to be chunk
> > size aligned (default is 2k, max is PAGE_SIZE). This limits where we can
> > place chunks within the umem as well as limiting the packet sizes that are
> > supported.
> >
> > The changes in this patch set remove these restrictions, allowing XDP to
> > be more flexible in where it can place a chunk within a umem. By relaxing
> > where the chunks can be placed, it allows us to use an arbitrary buffer
> > size and place that wherever we have a free address in the umem. These
> > changes add the ability to support arbitrary frame sizes up to 4k
> > (PAGE_SIZE) and make it easy to integrate with other existing frameworks
> > that have their own memory management systems, such as DPDK.
> >
> > Since we are now dealing with arbitrary frame sizes, we also need to
> > update how we pass around addresses. Currently, the addresses can simply be
> > masked to 2k to get back to the original address. This becomes less trivial
> > when using frame sizes that are not a 'power of 2' size. This patch set
> > modifies the Rx/Tx descriptor format to use the upper 16-bits of the addr
> > field for an offset value, leaving the lower 48-bits for the address (this
> > leaves us with 256 Terabytes, which should be enough!). We only need to use
> > the upper 16-bits to store the offset when running in unaligned mode.
> > Rather than adding the offset (headroom etc) to the address, we will store
> > it in the upper 16-bits of the address field. This way, we can easily add
> > the offset to the address where we need it, using some bit manipulation and
> > addition, and we can also easily get the original address wherever we need
> > it (for example in i40e_zca_free) by simply masking to get the lower
> > 48-bits of the address field.
> >
> > The numbers below were recorded with the following set up:
> >   - Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
> >   - Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
> >   - Driver: i40e
> >   - Application: xdpsock with l2fwd (single interface)
> >
> > These are solely for comparing performance with and without the patches.
> > The largest drop was ~1% (in zero-copy mode).
> >
> > +-------------------------+------------+-----------------+-------------+
> > | Buffer size: 2048       | SKB mode   | Zero-copy       | Copy        |
> > +-------------------------+------------+-----------------+-------------+
> > | Aligned (baseline)      | 1.7 Mpps   | 15.3 Mpps       | 2.08 Mpps   |
> > +-------------------------+------------+-----------------+-------------+
> > | Aligned (with patches)  | 1.7 Mpps   | 15.1 Mpps       | 2.08 Mpps   |
> > +-------------------------+------------+-----------------+-------------+
> > | Unaligned               | 1.7 Mpps   | 14.5 Mpps       | 2.08 Mpps   |
> > +-------------------------+------------+-----------------+-------------+
> >
> > NOTE: We are currently working on the changes required in the Mellanox
> > driver. We will include these in the v3.
> >
> > Structure of the patchset:
> > Patch 1:
> >   - Remove unnecessary masking and headroom addition during zero-copy Rx
> >     buffer recycling in i40e. This change is required in order for the
> >     buffer recycling to work in the unaligned chunk mode.
> >
> > Patch 2:
> >   - Remove unnecessary masking and headroom addition during
> >     zero-copy Rx buffer recycling in ixgbe. This change is required in
> >     order for the  buffer recycling to work in the unaligned chunk mode.
> >
> > Patch 3:
> >   - Add infrastructure for unaligned chunks. Since we are dealing with
> >     unaligned chunks that could potentially cross a physical page boundary,
> >     we add checks to keep track of that information. We can later use this
> >     information to correctly handle buffers that are placed at an address
> >     where they cross a page boundary.  This patch also modifies the
> >     existing Rx and Tx functions to use the new descriptor format. To
> >     handle addresses correctly, we need to mask appropriately based on
> >     whether we are in aligned or unaligned mode.
> >
> > Patch 4:
> >   - This patch updates the i40e driver to make use of the new descriptor
> >     format. The new format is particularly useful here since we can now
> >     retrieve the original address in places like i40e_zca_free with ease.
> >     This saves us doing various calculations to get the original address
> >     back.
> >
> > Patch 5:
> >   - This patch updates the ixgbe driver to make use of the new descriptor
> >     format. The new format is particularly useful here since we can now
> >     retrieve the original address in places like ixgbe_zca_free with ease.
> >     This saves us doing various calculations to get the original address
> >     back.
> >
> > Patch 6:
> >   - Add flags for umem configuration to libbpf
> >
> > Patch 7:
> >   - Modify xdpsock application to add a command line option for
> >     unaligned chunks
> >
> > Patch 8:
> >   - Since we can now run the application in unaligned chunk mode, we need
> >     to make sure we recycle the buffers appropriately.
> >
> > Patch 9:
> >   - Adds hugepage support to the xdpsock application
> >
> > Patch 10:
> >   - Documentation update to include the unaligned chunk scenario. We need
> >     to explicitly state that the incoming addresses are only masked in the
> >     aligned chunk mode and not the unaligned chunk mode.
> >
> > ---
> > v2:
> >   - fixed checkpatch issues
> >   - fixed Rx buffer recycling for unaligned chunks in xdpsock
> >   - removed unused defines
> >   - fixed how chunk_size is calculated in xsk_diag.c
> >   - added some performance numbers to cover letter
> >   - modified descriptor format to make it easier to retrieve original
> >     address
> >   - removed patch adding off_t off to the zero copy allocator. This is no
> >     longer needed with the new descriptor format.
> >
> > Kevin Laatz (10):
> >   i40e: simplify Rx buffer recycle
> >   ixgbe: simplify Rx buffer recycle
> >   xsk: add support to allow unaligned chunk placement
> >   i40e: modify driver for handling offsets
> >   ixgbe: modify driver for handling offsets
> >   libbpf: add flags to umem config
> >   samples/bpf: add unaligned chunks mode support to xdpsock
> >   samples/bpf: add buffer recycling for unaligned chunks to xdpsock
> >   samples/bpf: use hugepages in xdpsock app
> >   doc/af_xdp: include unaligned chunk case
> >
> >  Documentation/networking/af_xdp.rst          | 10 ++-
> >  drivers/net/ethernet/intel/i40e/i40e_xsk.c   | 39 +++++----
> >  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 39 +++++----
> >  include/net/xdp_sock.h                       |  2 +
> >  include/uapi/linux/if_xdp.h                  |  9 ++
> >  net/xdp/xdp_umem.c                           | 17 ++--
> >  net/xdp/xsk.c                                | 89 ++++++++++++++++----
> >  net/xdp/xsk_diag.c                           |  2 +-
> >  net/xdp/xsk_queue.h                          | 70 +++++++++++++--
> >  samples/bpf/xdpsock_user.c                   | 61 ++++++++++----
> >  tools/include/uapi/linux/if_xdp.h            |  4 +
> >  tools/lib/bpf/xsk.c                          |  3 +
> >  tools/lib/bpf/xsk.h                          |  2 +
> >  13 files changed, 266 insertions(+), 81 deletions(-)
> >
> > --
> > 2.17.1
> >
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan@osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH v2 00/10] XDP unaligned chunk placement support
@ 2019-07-24 13:25       ` Magnus Karlsson
  0 siblings, 0 replies; 274+ messages in thread
From: Magnus Karlsson @ 2019-07-24 13:25 UTC (permalink / raw)
  To: intel-wired-lan

On Tue, Jul 23, 2019 at 11:08 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> Jonathan, Bjorn, Jakub,
> Please review!
> The patch set has been pending for a week.

There is a v3 coming out shortly so I suggest we wait for that. It
will have Mellanox support for this feature too and some cleanups. I
refrained from posting a review on the mailing list due to the merge
window being closed last week, but maybe that was not correct. Should
I still post reviews for new features submitted during the closed
merge window period? I am happy to do it since I do not have any other
tasks during that time. It is a quiet period for me. Just let me know.

/Magnus

> On Tue, Jul 16, 2019 at 4:21 AM Kevin Laatz <kevin.laatz@intel.com> wrote:
> >
> > This patch set adds the ability to use unaligned chunks in the XDP umem.
> >
> > Currently, all chunk addresses passed to the umem are masked to be chunk
> > size aligned (default is 2k, max is PAGE_SIZE). This limits where we can
> > place chunks within the umem as well as limiting the packet sizes that are
> > supported.
> >
> > The changes in this patch set removes these restrictions, allowing XDP to
> > be more flexible in where it can place a chunk within a umem. By relaxing
> > where the chunks can be placed, it allows us to use an arbitrary buffer
> > size and place that wherever we have a free address in the umem. These
> > changes add the ability to support arbitrary frame sizes up to 4k
> > (PAGE_SIZE) and make it easy to integrate with other existing frameworks
> > that have their own memory management systems, such as DPDK.
> >
> > Since we are now dealing with arbitrary frame sizes, we also need to
> > update how we pass around addresses. Currently, the addresses can simply be
> > masked to 2k to get back to the original address. This becomes less trivial
> > when using frame sizes that are not a 'power of 2' size. This patch set
> > modifies the Rx/Tx descriptor format to use the upper 16-bits of the addr
> > field for an offset value, leaving the lower 48-bits for the address (this
> > leaves us with 256 Terabytes, which should be enough!). We only need to use
> > the upper 16-bits to store the offset when running in unaligned mode.
> > Rather than adding the offset (headroom etc) to the address, we will store
> > it in the upper 16-bits of the address field. This way, we can easily add
> > the offset to the address where we need it, using some bit manipulation and
> > addition, and we can also easily get the original address wherever we need
> > it (for example in i40e_zca_free) by simply masking to get the lower
> > 48-bits of the address field.
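> >
> > In rough pseudo-C, the encoding works out to something like the below
> > (illustrative helpers only; the shift and mask are the ones added in
> > patch 3):
> >
> >     /* low 48 bits: base address, high 16 bits: offset */
> >     static __u64 pack_addr(__u64 base, __u64 offset)
> >     {
> >             return base | (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
> >     }
> >
> >     static __u64 orig_addr(__u64 addr)    /* e.g. what zca_free needs */
> >     {
> >             return addr & XSK_UNALIGNED_BUF_ADDR_MASK;
> >     }
> >
> >     static __u64 data_addr(__u64 addr)    /* where the packet data sits */
> >     {
> >             return orig_addr(addr) + (addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
> >     }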
> >
> > The numbers below were recorded with the following set up:
> >   - Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
> >   - Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
> >   - Driver: i40e
> >   - Application: xdpsock with l2fwd (single interface)
> >
> > These are solely for comparing performance with and without the patches.
> > The largest drop was ~1% (in zero-copy mode).
> >
> > +-------------------------+------------+-----------------+-------------+
> > | Buffer size: 2048       | SKB mode   | Zero-copy       | Copy        |
> > +-------------------------+------------+-----------------+-------------+
> > | Aligned (baseline)      | 1.7 Mpps   | 15.3 Mpps       | 2.08 Mpps   |
> > +-------------------------+------------+-----------------+-------------+
> > | Aligned (with patches)  | 1.7 Mpps   | 15.1 Mpps       | 2.08 Mpps   |
> > +-------------------------+------------+-----------------+-------------+
> > | Unaligned               | 1.7 Mpps   | 14.5 Mpps       | 2.08 Mpps   |
> > +-------------------------+------------+-----------------+-------------+
> >
> > NOTE: We are currently working on the changes required in the Mellanox
> > driver. We will include these in the v3.
> >
> > Structure of the patchset:
> > Patch 1:
> >   - Remove unnecessary masking and headroom addition during zero-copy Rx
> >     buffer recycling in i40e. This change is required in order for the
> >     buffer recycling to work in the unaligned chunk mode.
> >
> > Patch 2:
> >   - Remove unnecessary masking and headroom addition during
> >     zero-copy Rx buffer recycling in ixgbe. This change is required in
> >     order for the buffer recycling to work in the unaligned chunk mode.
> >
> > Patch 3:
> >   - Add infrastructure for unaligned chunks. Since we are dealing with
> >     unaligned chunks that could potentially cross a physical page boundary,
> >     we add checks to keep track of that information. We can later use this
> >     information to correctly handle buffers that are placed at an address
> >     where they cross a page boundary.  This patch also modifies the
> >     existing Rx and Tx functions to use the new descriptor format. To
> >     handle addresses correctly, we need to mask appropriately based on
> >     whether we are in aligned or unaligned mode.
> >
> > Patch 4:
> >   - This patch updates the i40e driver to make use of the new descriptor
> >     format. The new format is particularly useful here since we can now
> >     retrieve the original address in places like i40e_zca_free with ease.
> >     This saves us doing various calculations to get the original address
> >     back.
> >
> > Patch 5:
> >   - This patch updates the ixgbe driver to make use of the new descriptor
> >     format. The new format is particularly useful here since we can now
> >     retrieve the original address in places like ixgbe_zca_free with ease.
> >     This saves us doing various calculations to get the original address
> >     back.
> >
> > Patch 6:
> >   - Add flags for umem configuration to libbpf
> >
> > Patch 7:
> >   - Modify xdpsock application to add a command line option for
> >     unaligned chunks
> >
> > Patch 8:
> >   - Since we can now run the application in unaligned chunk mode, we need
> >     to make sure we recycle the buffers appropriately.
> >
> > Patch 9:
> >   - Adds hugepage support to the xdpsock application
> >
> > Patch 10:
> >   - Documentation update to include the unaligned chunk scenario. We need
> >     to explicitly state that the incoming addresses are only masked in the
> >     aligned chunk mode and not the unaligned chunk mode.
> >
> > ---
> > v2:
> >   - fixed checkpatch issues
> >   - fixed Rx buffer recycling for unaligned chunks in xdpsock
> >   - removed unused defines
> >   - fixed how chunk_size is calculated in xsk_diag.c
> >   - added some performance numbers to cover letter
> >   - modified descriptor format to make it easier to retrieve original
> >     address
> >   - removed patch adding off_t off to the zero copy allocator. This is no
> >     longer needed with the new descriptor format.
> >
> > Kevin Laatz (10):
> >   i40e: simplify Rx buffer recycle
> >   ixgbe: simplify Rx buffer recycle
> >   xsk: add support to allow unaligned chunk placement
> >   i40e: modify driver for handling offsets
> >   ixgbe: modify driver for handling offsets
> >   libbpf: add flags to umem config
> >   samples/bpf: add unaligned chunks mode support to xdpsock
> >   samples/bpf: add buffer recycling for unaligned chunks to xdpsock
> >   samples/bpf: use hugepages in xdpsock app
> >   doc/af_xdp: include unaligned chunk case
> >
> >  Documentation/networking/af_xdp.rst          | 10 ++-
> >  drivers/net/ethernet/intel/i40e/i40e_xsk.c   | 39 +++++----
> >  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 39 +++++----
> >  include/net/xdp_sock.h                       |  2 +
> >  include/uapi/linux/if_xdp.h                  |  9 ++
> >  net/xdp/xdp_umem.c                           | 17 ++--
> >  net/xdp/xsk.c                                | 89 ++++++++++++++++----
> >  net/xdp/xsk_diag.c                           |  2 +-
> >  net/xdp/xsk_queue.h                          | 70 +++++++++++++--
> >  samples/bpf/xdpsock_user.c                   | 61 ++++++++++----
> >  tools/include/uapi/linux/if_xdp.h            |  4 +
> >  tools/lib/bpf/xsk.c                          |  3 +
> >  tools/lib/bpf/xsk.h                          |  2 +
> >  13 files changed, 266 insertions(+), 81 deletions(-)
> >
> > --
> > 2.17.1
> >
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan at osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v3 03/11] xsk: add support to allow unaligned chunk placement
  2019-07-24  5:10       ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-25  2:22         ` Jakub Kicinski
  -1 siblings, 0 replies; 274+ messages in thread
From: Jakub Kicinski @ 2019-07-25  2:22 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jonathan.lemon, saeedm, maximmi, stephen, bruce.richardson,
	ciara.loftus, bpf, intel-wired-lan

On Wed, 24 Jul 2019 05:10:35 +0000, Kevin Laatz wrote:
> Currently, addresses are chunk size aligned. This means, we are very
> restricted in terms of where we can place chunks within the umem. For
> example, if we have a chunk size of 2k, then our chunks can only be placed
> at 0,2k,4k,6k,8k... and so on (ie. every 2k starting from 0).
> 
> This patch introduces the ability to use unaligned chunks. With these
> changes, we are no longer bound to having to place chunks at a 2k (or
> whatever your chunk size is) interval. Since we are no longer dealing with
> aligned chunks, they can now cross page boundaries. Checks for page
> contiguity have been added in order to keep track of which pages are
> followed by a physically contiguous page.
> 
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> 
> ---
> v2:
>   - Add checks for the flags coming from userspace
>   - Fix how we get chunk_size in xsk_diag.c
>   - Add defines for masking the new descriptor format
>   - Modified the rx functions to use new descriptor format
>   - Modified the tx functions to use new descriptor format
> 
> v3:
>   - Add helper function to do address/offset masking/addition
> ---
>  include/net/xdp_sock.h      | 17 ++++++++
>  include/uapi/linux/if_xdp.h |  9 ++++
>  net/xdp/xdp_umem.c          | 18 +++++---
>  net/xdp/xsk.c               | 86 ++++++++++++++++++++++++++++++-------
>  net/xdp/xsk_diag.c          |  2 +-
>  net/xdp/xsk_queue.h         | 68 +++++++++++++++++++++++++----
>  6 files changed, 170 insertions(+), 30 deletions(-)
> 
> diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
> index 69796d264f06..738996c0f995 100644
> --- a/include/net/xdp_sock.h
> +++ b/include/net/xdp_sock.h
> @@ -19,6 +19,7 @@ struct xsk_queue;
>  struct xdp_umem_page {
>  	void *addr;
>  	dma_addr_t dma;
> +	bool next_pg_contig;

IIRC accesses to xdp_umem_page cause a lot of cache misses, so having
this structure grow from 16 to 24B is a little unfortunate :(
Can we try to steal the lower bits of addr or dma? Or perhaps not
precompute this info at all?
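
Something like this, perhaps (just a sketch of the bit-stealing idea;
it assumes the stored page addresses always have bit 0 clear, and the
names are made up):

    /* reuse bit 0 of xdp_umem_page.addr instead of a separate bool */
    #define XSK_NEXT_PG_CONTIG_MASK	0x1UL

    static inline void *xdp_umem_page_addr(const struct xdp_umem_page *pg)
    {
            return (void *)((unsigned long)pg->addr & ~XSK_NEXT_PG_CONTIG_MASK);
    }

    static inline bool xdp_umem_next_pg_contig(const struct xdp_umem_page *pg)
    {
            return (unsigned long)pg->addr & XSK_NEXT_PG_CONTIG_MASK;
    }

That would keep struct xdp_umem_page at 16 bytes on 64-bit.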

>  };
>  
>  struct xdp_umem_fq_reuse {
> @@ -48,6 +49,7 @@ struct xdp_umem {
>  	bool zc;
>  	spinlock_t xsk_list_lock;
>  	struct list_head xsk_list;
> +	u32 flags;
>  };
>  
>  struct xdp_sock {
> @@ -144,6 +146,15 @@ static inline void xsk_umem_fq_reuse(struct xdp_umem *umem, u64 addr)
>  
>  	rq->handles[rq->length++] = addr;
>  }
> +
> +static inline u64 xsk_umem_handle_offset(struct xdp_umem *umem, u64 handle,
> +					 u64 offset)
> +{
> +	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
> +		return handle |= (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
> +	else
> +		return handle += offset;
> +}
>  #else
>  static inline int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
>  {
> @@ -241,6 +252,12 @@ static inline void xsk_umem_fq_reuse(struct xdp_umem *umem, u64 addr)
>  {
>  }
>  
> +static inline u64 xsk_umem_handle_offset(struct xdp_umem *umem, u64 handle,
> +					 u64 offset)
> +{
> +	return NULL;

	return 0?

> +}
> +
>  #endif /* CONFIG_XDP_SOCKETS */
>  
>  #endif /* _LINUX_XDP_SOCK_H */
> diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
> index faaa5ca2a117..f8dc68fcdf78 100644
> --- a/include/uapi/linux/if_xdp.h
> +++ b/include/uapi/linux/if_xdp.h
> @@ -17,6 +17,9 @@
>  #define XDP_COPY	(1 << 1) /* Force copy-mode */
>  #define XDP_ZEROCOPY	(1 << 2) /* Force zero-copy mode */
>  
> +/* Flags for xsk_umem_config flags */
> +#define XDP_UMEM_UNALIGNED_CHUNKS (1 << 0)
> +
>  struct sockaddr_xdp {
>  	__u16 sxdp_family;
>  	__u16 sxdp_flags;
> @@ -53,6 +56,7 @@ struct xdp_umem_reg {
>  	__u64 len; /* Length of packet data area */
>  	__u32 chunk_size;
>  	__u32 headroom;
> +	__u32 flags;
>  };
>  
>  struct xdp_statistics {
> @@ -74,6 +78,11 @@ struct xdp_options {
>  #define XDP_UMEM_PGOFF_FILL_RING	0x100000000ULL
>  #define XDP_UMEM_PGOFF_COMPLETION_RING	0x180000000ULL
>  
> +/* Masks for unaligned chunks mode */
> +#define XSK_UNALIGNED_BUF_OFFSET_SHIFT 48
> +#define XSK_UNALIGNED_BUF_ADDR_MASK \
> +	((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1)
> +
>  /* Rx/Tx descriptor */
>  struct xdp_desc {
>  	__u64 addr;
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index 83de74ca729a..952ca22103e9 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -299,6 +299,7 @@ static int xdp_umem_account_pages(struct xdp_umem *umem)
>  
>  static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>  {
> +	bool unaligned_chunks = mr->flags & XDP_UMEM_UNALIGNED_CHUNKS;
>  	u32 chunk_size = mr->chunk_size, headroom = mr->headroom;
>  	unsigned int chunks, chunks_per_page;
>  	u64 addr = mr->addr, size = mr->len;
> @@ -314,7 +315,10 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>  		return -EINVAL;
>  	}
>  
> -	if (!is_power_of_2(chunk_size))
> +	if (mr->flags & ~(XDP_UMEM_UNALIGNED_CHUNKS))

parens unnecessary, consider adding a define for known flags.
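
e.g. (name made up):

    #define XDP_UMEM_VALID_FLAGS	XDP_UMEM_UNALIGNED_CHUNKS

            if (mr->flags & ~XDP_UMEM_VALID_FLAGS)
                    return -EINVAL;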

> +		return -EINVAL;
> +
> +	if (!unaligned_chunks && !is_power_of_2(chunk_size))
>  		return -EINVAL;
>  
>  	if (!PAGE_ALIGNED(addr)) {
> @@ -331,9 +335,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>  	if (chunks == 0)
>  		return -EINVAL;
>  
> -	chunks_per_page = PAGE_SIZE / chunk_size;
> -	if (chunks < chunks_per_page || chunks % chunks_per_page)
> -		return -EINVAL;
> +	if (!unaligned_chunks) {
> +		chunks_per_page = PAGE_SIZE / chunk_size;
> +		if (chunks < chunks_per_page || chunks % chunks_per_page)
> +			return -EINVAL;
> +	}
>  
>  	headroom = ALIGN(headroom, 64);
>  
> @@ -342,13 +348,15 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>  		return -EINVAL;
>  
>  	umem->address = (unsigned long)addr;
> -	umem->chunk_mask = ~((u64)chunk_size - 1);
> +	umem->chunk_mask = unaligned_chunks ? XSK_UNALIGNED_BUF_ADDR_MASK
> +					    : ~((u64)chunk_size - 1);
>  	umem->size = size;
>  	umem->headroom = headroom;
>  	umem->chunk_size_nohr = chunk_size - headroom;
>  	umem->npgs = size / PAGE_SIZE;
>  	umem->pgs = NULL;
>  	umem->user = NULL;
> +	umem->flags = mr->flags;
>  	INIT_LIST_HEAD(&umem->xsk_list);
>  	spin_lock_init(&umem->xsk_list_lock);
>  
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 59b57d708697..b3ab653091c4 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -45,7 +45,7 @@ EXPORT_SYMBOL(xsk_umem_has_addrs);
>  
>  u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
>  {
> -	return xskq_peek_addr(umem->fq, addr);
> +	return xskq_peek_addr(umem->fq, addr, umem);
>  }
>  EXPORT_SYMBOL(xsk_umem_peek_addr);
>  
> @@ -55,21 +55,42 @@ void xsk_umem_discard_addr(struct xdp_umem *umem)
>  }
>  EXPORT_SYMBOL(xsk_umem_discard_addr);
>  
> +/* If a buffer crosses a page boundary, we need to do 2 memcpy's, one for
> + * each page. This is only required in copy mode.
> + */
> +static void __xsk_rcv_memcpy(struct xdp_umem *umem, u64 addr, void *from_buf,
> +			     u32 len, u32 metalen)
> +{
> +	void *to_buf = xdp_umem_get_data(umem, addr);
> +
> +	if (xskq_crosses_non_contig_pg(umem, addr, len + metalen)) {
> +		void *next_pg_addr = umem->pages[(addr >> PAGE_SHIFT) + 1].addr;
> +		u64 page_start = addr & (PAGE_SIZE - 1);
> +		u64 first_len = PAGE_SIZE - (addr - page_start);
> +
> +		memcpy(to_buf, from_buf, first_len + metalen);
> +		memcpy(next_pg_addr, from_buf + first_len, len - first_len);
> +
> +		return;
> +	}
> +
> +	memcpy(to_buf, from_buf, len + metalen);
> +}

Why handle this case gracefully? Real XSK use is the zero copy mode,
having extra code to make copy mode more permissive seems a little
counter productive IMHO.

>  static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
>  {
> -	void *to_buf, *from_buf;
> +	u64 offset = xs->umem->headroom;
> +	void *from_buf;
>  	u32 metalen;
>  	u64 addr;
>  	int err;
>  
> -	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
> +	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
>  	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
>  		xs->rx_dropped++;
>  		return -ENOSPC;
>  	}
>  
> -	addr += xs->umem->headroom;
> -
>  	if (unlikely(xdp_data_meta_unsupported(xdp))) {
>  		from_buf = xdp->data;
>  		metalen = 0;
> @@ -78,9 +99,13 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
>  		metalen = xdp->data - xdp->data_meta;
>  	}
>  
> -	to_buf = xdp_umem_get_data(xs->umem, addr);
> -	memcpy(to_buf, from_buf, len + metalen);
> -	addr += metalen;
> +	__xsk_rcv_memcpy(xs->umem, addr + offset, from_buf, len, metalen);
> +
> +	offset += metalen;
> +	if (xs->umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
> +		addr |= offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT;
> +	else
> +		addr += offset;
>  	err = xskq_produce_batch_desc(xs->rx, addr, len);
>  	if (!err) {
>  		xskq_discard_addr(xs->umem->fq);
> @@ -127,6 +152,7 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
>  	u32 len = xdp->data_end - xdp->data;
>  	void *buffer;
>  	u64 addr;
> +	u64 offset = xs->umem->headroom;

reverse xmas tree, please
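
i.e. locals ordered longest declaration first, roughly:

            u32 len = xdp->data_end - xdp->data;
            u64 offset = xs->umem->headroom;
            void *buffer;
            u64 addr;
            int err;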

>  	int err;
>  
>  	spin_lock_bh(&xs->rx_lock);


^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v3 03/11] xsk: add support to allow unaligned chunk placement
@ 2019-07-25  2:22         ` Jakub Kicinski
  0 siblings, 0 replies; 274+ messages in thread
From: Jakub Kicinski @ 2019-07-25  2:22 UTC (permalink / raw)
  To: intel-wired-lan

On Wed, 24 Jul 2019 05:10:35 +0000, Kevin Laatz wrote:
> Currently, addresses are chunk size aligned. This means, we are very
> restricted in terms of where we can place chunks within the umem. For
> example, if we have a chunk size of 2k, then our chunks can only be placed
> at 0,2k,4k,6k,8k... and so on (ie. every 2k starting from 0).
> 
> This patch introduces the ability to use unaligned chunks. With these
> changes, we are no longer bound to having to place chunks at a 2k (or
> whatever your chunk size is) interval. Since we are no longer dealing with
> aligned chunks, they can now cross page boundaries. Checks for page
> contiguity have been added in order to keep track of which pages are
> followed by a physically contiguous page.
> 
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> 
> ---
> v2:
>   - Add checks for the flags coming from userspace
>   - Fix how we get chunk_size in xsk_diag.c
>   - Add defines for masking the new descriptor format
>   - Modified the rx functions to use new descriptor format
>   - Modified the tx functions to use new descriptor format
> 
> v3:
>   - Add helper function to do address/offset masking/addition
> ---
>  include/net/xdp_sock.h      | 17 ++++++++
>  include/uapi/linux/if_xdp.h |  9 ++++
>  net/xdp/xdp_umem.c          | 18 +++++---
>  net/xdp/xsk.c               | 86 ++++++++++++++++++++++++++++++-------
>  net/xdp/xsk_diag.c          |  2 +-
>  net/xdp/xsk_queue.h         | 68 +++++++++++++++++++++++++----
>  6 files changed, 170 insertions(+), 30 deletions(-)
> 
> diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
> index 69796d264f06..738996c0f995 100644
> --- a/include/net/xdp_sock.h
> +++ b/include/net/xdp_sock.h
> @@ -19,6 +19,7 @@ struct xsk_queue;
>  struct xdp_umem_page {
>  	void *addr;
>  	dma_addr_t dma;
> +	bool next_pg_contig;

IIRC accesses to xdp_umem_page cause a lot of cache misses, so having
this structure grow from 16 to 24B is a little unfortunate :(
Can we try to steal the lower bits of addr or dma? Or perhaps not
precompute this info at all?

>  };
>  
>  struct xdp_umem_fq_reuse {
> @@ -48,6 +49,7 @@ struct xdp_umem {
>  	bool zc;
>  	spinlock_t xsk_list_lock;
>  	struct list_head xsk_list;
> +	u32 flags;
>  };
>  
>  struct xdp_sock {
> @@ -144,6 +146,15 @@ static inline void xsk_umem_fq_reuse(struct xdp_umem *umem, u64 addr)
>  
>  	rq->handles[rq->length++] = addr;
>  }
> +
> +static inline u64 xsk_umem_handle_offset(struct xdp_umem *umem, u64 handle,
> +					 u64 offset)
> +{
> +	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
> +		return handle |= (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
> +	else
> +		return handle += offset;
> +}
>  #else
>  static inline int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
>  {
> @@ -241,6 +252,12 @@ static inline void xsk_umem_fq_reuse(struct xdp_umem *umem, u64 addr)
>  {
>  }
>  
> +static inline u64 xsk_umem_handle_offset(struct xdp_umem *umem, u64 handle,
> +					 u64 offset)
> +{
> +	return NULL;

	return 0?

> +}
> +
>  #endif /* CONFIG_XDP_SOCKETS */
>  
>  #endif /* _LINUX_XDP_SOCK_H */
> diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
> index faaa5ca2a117..f8dc68fcdf78 100644
> --- a/include/uapi/linux/if_xdp.h
> +++ b/include/uapi/linux/if_xdp.h
> @@ -17,6 +17,9 @@
>  #define XDP_COPY	(1 << 1) /* Force copy-mode */
>  #define XDP_ZEROCOPY	(1 << 2) /* Force zero-copy mode */
>  
> +/* Flags for xsk_umem_config flags */
> +#define XDP_UMEM_UNALIGNED_CHUNKS (1 << 0)
> +
>  struct sockaddr_xdp {
>  	__u16 sxdp_family;
>  	__u16 sxdp_flags;
> @@ -53,6 +56,7 @@ struct xdp_umem_reg {
>  	__u64 len; /* Length of packet data area */
>  	__u32 chunk_size;
>  	__u32 headroom;
> +	__u32 flags;
>  };
>  
>  struct xdp_statistics {
> @@ -74,6 +78,11 @@ struct xdp_options {
>  #define XDP_UMEM_PGOFF_FILL_RING	0x100000000ULL
>  #define XDP_UMEM_PGOFF_COMPLETION_RING	0x180000000ULL
>  
> +/* Masks for unaligned chunks mode */
> +#define XSK_UNALIGNED_BUF_OFFSET_SHIFT 48
> +#define XSK_UNALIGNED_BUF_ADDR_MASK \
> +	((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1)
> +
>  /* Rx/Tx descriptor */
>  struct xdp_desc {
>  	__u64 addr;
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index 83de74ca729a..952ca22103e9 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -299,6 +299,7 @@ static int xdp_umem_account_pages(struct xdp_umem *umem)
>  
>  static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>  {
> +	bool unaligned_chunks = mr->flags & XDP_UMEM_UNALIGNED_CHUNKS;
>  	u32 chunk_size = mr->chunk_size, headroom = mr->headroom;
>  	unsigned int chunks, chunks_per_page;
>  	u64 addr = mr->addr, size = mr->len;
> @@ -314,7 +315,10 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>  		return -EINVAL;
>  	}
>  
> -	if (!is_power_of_2(chunk_size))
> +	if (mr->flags & ~(XDP_UMEM_UNALIGNED_CHUNKS))

parens unnecessary, consider adding a define for known flags.

> +		return -EINVAL;
> +
> +	if (!unaligned_chunks && !is_power_of_2(chunk_size))
>  		return -EINVAL;
>  
>  	if (!PAGE_ALIGNED(addr)) {
> @@ -331,9 +335,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>  	if (chunks == 0)
>  		return -EINVAL;
>  
> -	chunks_per_page = PAGE_SIZE / chunk_size;
> -	if (chunks < chunks_per_page || chunks % chunks_per_page)
> -		return -EINVAL;
> +	if (!unaligned_chunks) {
> +		chunks_per_page = PAGE_SIZE / chunk_size;
> +		if (chunks < chunks_per_page || chunks % chunks_per_page)
> +			return -EINVAL;
> +	}
>  
>  	headroom = ALIGN(headroom, 64);
>  
> @@ -342,13 +348,15 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>  		return -EINVAL;
>  
>  	umem->address = (unsigned long)addr;
> -	umem->chunk_mask = ~((u64)chunk_size - 1);
> +	umem->chunk_mask = unaligned_chunks ? XSK_UNALIGNED_BUF_ADDR_MASK
> +					    : ~((u64)chunk_size - 1);
>  	umem->size = size;
>  	umem->headroom = headroom;
>  	umem->chunk_size_nohr = chunk_size - headroom;
>  	umem->npgs = size / PAGE_SIZE;
>  	umem->pgs = NULL;
>  	umem->user = NULL;
> +	umem->flags = mr->flags;
>  	INIT_LIST_HEAD(&umem->xsk_list);
>  	spin_lock_init(&umem->xsk_list_lock);
>  
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 59b57d708697..b3ab653091c4 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -45,7 +45,7 @@ EXPORT_SYMBOL(xsk_umem_has_addrs);
>  
>  u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
>  {
> -	return xskq_peek_addr(umem->fq, addr);
> +	return xskq_peek_addr(umem->fq, addr, umem);
>  }
>  EXPORT_SYMBOL(xsk_umem_peek_addr);
>  
> @@ -55,21 +55,42 @@ void xsk_umem_discard_addr(struct xdp_umem *umem)
>  }
>  EXPORT_SYMBOL(xsk_umem_discard_addr);
>  
> +/* If a buffer crosses a page boundary, we need to do 2 memcpy's, one for
> + * each page. This is only required in copy mode.
> + */
> +static void __xsk_rcv_memcpy(struct xdp_umem *umem, u64 addr, void *from_buf,
> +			     u32 len, u32 metalen)
> +{
> +	void *to_buf = xdp_umem_get_data(umem, addr);
> +
> +	if (xskq_crosses_non_contig_pg(umem, addr, len + metalen)) {
> +		void *next_pg_addr = umem->pages[(addr >> PAGE_SHIFT) + 1].addr;
> +		u64 page_start = addr & (PAGE_SIZE - 1);
> +		u64 first_len = PAGE_SIZE - (addr - page_start);
> +
> +		memcpy(to_buf, from_buf, first_len + metalen);
> +		memcpy(next_pg_addr, from_buf + first_len, len - first_len);
> +
> +		return;
> +	}
> +
> +	memcpy(to_buf, from_buf, len + metalen);
> +}

Why handle this case gracefully? Real XSK use is the zero copy mode,
having extra code to make copy mode more permissive seems a little
counter productive IMHO.

>  static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
>  {
> -	void *to_buf, *from_buf;
> +	u64 offset = xs->umem->headroom;
> +	void *from_buf;
>  	u32 metalen;
>  	u64 addr;
>  	int err;
>  
> -	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
> +	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
>  	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
>  		xs->rx_dropped++;
>  		return -ENOSPC;
>  	}
>  
> -	addr += xs->umem->headroom;
> -
>  	if (unlikely(xdp_data_meta_unsupported(xdp))) {
>  		from_buf = xdp->data;
>  		metalen = 0;
> @@ -78,9 +99,13 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
>  		metalen = xdp->data - xdp->data_meta;
>  	}
>  
> -	to_buf = xdp_umem_get_data(xs->umem, addr);
> -	memcpy(to_buf, from_buf, len + metalen);
> -	addr += metalen;
> +	__xsk_rcv_memcpy(xs->umem, addr + offset, from_buf, len, metalen);
> +
> +	offset += metalen;
> +	if (xs->umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
> +		addr |= offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT;
> +	else
> +		addr += offset;
>  	err = xskq_produce_batch_desc(xs->rx, addr, len);
>  	if (!err) {
>  		xskq_discard_addr(xs->umem->fq);
> @@ -127,6 +152,7 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
>  	u32 len = xdp->data_end - xdp->data;
>  	void *buffer;
>  	u64 addr;
> +	u64 offset = xs->umem->headroom;

reverse xmas tree, please

>  	int err;
>  
>  	spin_lock_bh(&xs->rx_lock);


^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v3 03/11] xsk: add support to allow unaligned chunk placement
  2019-07-24  5:10       ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-25  9:27         ` Maxim Mikityanskiy
  -1 siblings, 0 replies; 274+ messages in thread
From: Maxim Mikityanskiy @ 2019-07-25  9:27 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, Saeed Mahameed, stephen,
	bruce.richardson, ciara.loftus, bpf, intel-wired-lan

On 2019-07-24 08:10, Kevin Laatz wrote:
> Currently, addresses are chunk size aligned. This means, we are very
> restricted in terms of where we can place chunks within the umem. For
> example, if we have a chunk size of 2k, then our chunks can only be placed
> at 0,2k,4k,6k,8k... and so on (ie. every 2k starting from 0).
> 
> This patch introduces the ability to use unaligned chunks. With these
> changes, we are no longer bound to having to place chunks at a 2k (or
> whatever your chunk size is) interval. Since we are no longer dealing with
> aligned chunks, they can now cross page boundaries. Checks for page
> contiguity have been added in order to keep track of which pages are
> followed by a physically contiguous page.
> 
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> 
> ---
> v2:
>    - Add checks for the flags coming from userspace
>    - Fix how we get chunk_size in xsk_diag.c
>    - Add defines for masking the new descriptor format
>    - Modified the rx functions to use new descriptor format
>    - Modified the tx functions to use new descriptor format
> 
> v3:
>    - Add helper function to do address/offset masking/addition
> ---
>   include/net/xdp_sock.h      | 17 ++++++++
>   include/uapi/linux/if_xdp.h |  9 ++++
>   net/xdp/xdp_umem.c          | 18 +++++---
>   net/xdp/xsk.c               | 86 ++++++++++++++++++++++++++++++-------
>   net/xdp/xsk_diag.c          |  2 +-
>   net/xdp/xsk_queue.h         | 68 +++++++++++++++++++++++++----
>   6 files changed, 170 insertions(+), 30 deletions(-)
> 

<...>

> +/* If a buffer crosses a page boundary, we need to do 2 memcpy's, one for
> + * each page. This is only required in copy mode.
> + */
> +static void __xsk_rcv_memcpy(struct xdp_umem *umem, u64 addr, void *from_buf,
> +			     u32 len, u32 metalen)
> +{
> +	void *to_buf = xdp_umem_get_data(umem, addr);
> +
> +	if (xskq_crosses_non_contig_pg(umem, addr, len + metalen)) {
> +		void *next_pg_addr = umem->pages[(addr >> PAGE_SHIFT) + 1].addr;
> +		u64 page_start = addr & (PAGE_SIZE - 1);
> +		u64 first_len = PAGE_SIZE - (addr - page_start);

Let addr = 0x12345, PAGE_SIZE = 0x1000, len = 0x1000. Your calculations 
lead to page_start = 0x345, first_len = 0x1000 - 0x12000, which is 
negative. I think page_start is calculated incorrectly (is ~ missing?).
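
i.e. presumably something like:

		/* addr = 0x12345 -> page_start = 0x12000, first_len = 0xcbb */
		u64 page_start = addr & ~((u64)PAGE_SIZE - 1);
		u64 first_len = PAGE_SIZE - (addr - page_start);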

> +
> +		memcpy(to_buf, from_buf, first_len + metalen);
> +		memcpy(next_pg_addr, from_buf + first_len, len - first_len);
> +
> +		return;
> +	}
> +
> +	memcpy(to_buf, from_buf, len + metalen);
> +}
> +

<...>

> +static inline bool xskq_is_valid_addr_unaligned(struct xsk_queue *q, u64 addr,
> +						u64 length,
> +						struct xdp_umem *umem)
> +{
> +	addr += addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
> +	addr &= XSK_UNALIGNED_BUF_ADDR_MASK;
> +	if (addr >= q->size ||

Addresses like 0x00aaffffffffffff will pass the validation (0xaa + 
0xffffffffffff will overflow mod 2^48 and become a small number), 
whereas such addresses don't look valid to me.
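
For example, tracing the two lines above:

	u64 addr = 0x00aaffffffffffffULL;

	addr += addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT; /* +0xaa -> 0x00ab0000000000a9 */
	addr &= XSK_UNALIGNED_BUF_ADDR_MASK;            /* -> 0xa9, passes addr < q->size */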

> +	    xskq_crosses_non_contig_pg(umem, addr, length)) {

If the region is not contiguous, we can't RX into it - that's clear. 
However, how can the userspace determine whether these two pages of UMEM 
are mapped contiguously in the DMA space? Are we going to silently drop 
descriptors to non-contiguous frames and leak them? Please explain how 
to use it correctly from the application side.

> +		q->invalid_descs++;
> +		return false;
> +	}
> +
> +	return true;
> +}

<...>


^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v3 03/11] xsk: add support to allow unaligned chunk placement
@ 2019-07-25  9:27         ` Maxim Mikityanskiy
  0 siblings, 0 replies; 274+ messages in thread
From: Maxim Mikityanskiy @ 2019-07-25  9:27 UTC (permalink / raw)
  To: intel-wired-lan

On 2019-07-24 08:10, Kevin Laatz wrote:
> Currently, addresses are chunk size aligned. This means, we are very
> restricted in terms of where we can place chunks within the umem. For
> example, if we have a chunk size of 2k, then our chunks can only be placed
> at 0,2k,4k,6k,8k... and so on (ie. every 2k starting from 0).
> 
> This patch introduces the ability to use unaligned chunks. With these
> changes, we are no longer bound to having to place chunks at a 2k (or
> whatever your chunk size is) interval. Since we are no longer dealing with
> aligned chunks, they can now cross page boundaries. Checks for page
> contiguity have been added in order to keep track of which pages are
> followed by a physically contiguous page.
> 
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> 
> ---
> v2:
>    - Add checks for the flags coming from userspace
>    - Fix how we get chunk_size in xsk_diag.c
>    - Add defines for masking the new descriptor format
>    - Modified the rx functions to use new descriptor format
>    - Modified the tx functions to use new descriptor format
> 
> v3:
>    - Add helper function to do address/offset masking/addition
> ---
>   include/net/xdp_sock.h      | 17 ++++++++
>   include/uapi/linux/if_xdp.h |  9 ++++
>   net/xdp/xdp_umem.c          | 18 +++++---
>   net/xdp/xsk.c               | 86 ++++++++++++++++++++++++++++++-------
>   net/xdp/xsk_diag.c          |  2 +-
>   net/xdp/xsk_queue.h         | 68 +++++++++++++++++++++++++----
>   6 files changed, 170 insertions(+), 30 deletions(-)
> 

<...>

> +/* If a buffer crosses a page boundary, we need to do 2 memcpy's, one for
> + * each page. This is only required in copy mode.
> + */
> +static void __xsk_rcv_memcpy(struct xdp_umem *umem, u64 addr, void *from_buf,
> +			     u32 len, u32 metalen)
> +{
> +	void *to_buf = xdp_umem_get_data(umem, addr);
> +
> +	if (xskq_crosses_non_contig_pg(umem, addr, len + metalen)) {
> +		void *next_pg_addr = umem->pages[(addr >> PAGE_SHIFT) + 1].addr;
> +		u64 page_start = addr & (PAGE_SIZE - 1);
> +		u64 first_len = PAGE_SIZE - (addr - page_start);

Let addr = 0x12345, PAGE_SIZE = 0x1000, len = 0x1000. Your calculations 
lead to page_start = 0x345, first_len = 0x1000 - 0x12000, which is 
negative. I think page_start is calculated incorrectly (is ~ missing?).

> +
> +		memcpy(to_buf, from_buf, first_len + metalen);
> +		memcpy(next_pg_addr, from_buf + first_len, len - first_len);
> +
> +		return;
> +	}
> +
> +	memcpy(to_buf, from_buf, len + metalen);
> +}
> +

<...>

> +static inline bool xskq_is_valid_addr_unaligned(struct xsk_queue *q, u64 addr,
> +						u64 length,
> +						struct xdp_umem *umem)
> +{
> +	addr += addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
> +	addr &= XSK_UNALIGNED_BUF_ADDR_MASK;
> +	if (addr >= q->size ||

Addresses like 0x00aaffffffffffff will pass the validation (0xaa + 
0xffffffffffff will overflow mod 2^48 and become a small number), 
whereas such addresses don't look valid to me.

> +	    xskq_crosses_non_contig_pg(umem, addr, length)) {

If the region is not contiguous, we can't RX into it - that's clear. 
However, how can the userspace determine whether these two pages of UMEM 
are mapped contiguously in the DMA space? Are we going to silently drop 
descriptors to non-contiguous frames and leak them? Please explain how 
to use it correctly from the application side.

> +		q->invalid_descs++;
> +		return false;
> +	}
> +
> +	return true;
> +}

<...>


^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v3 08/11] samples/bpf: add unaligned chunks mode support to xdpsock
  2019-07-24  5:10       ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-25  9:43         ` Maxim Mikityanskiy
  -1 siblings, 0 replies; 274+ messages in thread
From: Maxim Mikityanskiy @ 2019-07-25  9:43 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, Saeed Mahameed, stephen,
	bruce.richardson, ciara.loftus, bpf, intel-wired-lan

On 2019-07-24 08:10, Kevin Laatz wrote:
> This patch adds support for the unaligned chunks mode. The addition of the
> unaligned chunks option will allow users to run the application with more
> relaxed chunk placement in the XDP umem.
> 
> Unaligned chunks mode can be used with the '-u' or '--unaligned' command
> line options.
> 
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> ---
>   samples/bpf/xdpsock_user.c | 17 +++++++++++++++--
>   1 file changed, 15 insertions(+), 2 deletions(-)

<...>

> @@ -372,6 +378,7 @@ static void usage(const char *prog)
>   		"  -z, --zero-copy      Force zero-copy mode.\n"
>   		"  -c, --copy           Force copy mode.\n"
>   		"  -f, --frame-size=n   Set the frame size (must be a power of two, default is %d).\n"

Help text for -f has to be updated; it doesn't have to be a power of two
if -u is specified.
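
e.g.:

		"  -f, --frame-size=n   Set the frame size (must be a power of two in aligned mode, default is %d).\n"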

> +		"  -u, --unaligned	Enable unaligned chunk placement\n"

<...>

>   
> -	if (opt_xsk_frame_size & (opt_xsk_frame_size - 1)) {
> +	if ((opt_xsk_frame_size & (opt_xsk_frame_size - 1)) &&
> +			!opt_unaligned_chunks) {
>   		fprintf(stderr, "--frame-size=%d is not a power of two\n",
>   			opt_xsk_frame_size);
>   		usage(basename(argv[0]));
> 


^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v3 08/11] samples/bpf: add unaligned chunks mode support to xdpsock
@ 2019-07-25  9:43         ` Maxim Mikityanskiy
  0 siblings, 0 replies; 274+ messages in thread
From: Maxim Mikityanskiy @ 2019-07-25  9:43 UTC (permalink / raw)
  To: intel-wired-lan

On 2019-07-24 08:10, Kevin Laatz wrote:
> This patch adds support for the unaligned chunks mode. The addition of the
> unaligned chunks option will allow users to run the application with more
> relaxed chunk placement in the XDP umem.
> 
> Unaligned chunks mode can be used with the '-u' or '--unaligned' command
> line options.
> 
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> ---
>   samples/bpf/xdpsock_user.c | 17 +++++++++++++++--
>   1 file changed, 15 insertions(+), 2 deletions(-)

<...>

> @@ -372,6 +378,7 @@ static void usage(const char *prog)
>   		"  -z, --zero-copy      Force zero-copy mode.\n"
>   		"  -c, --copy           Force copy mode.\n"
>   		"  -f, --frame-size=n   Set the frame size (must be a power of two, default is %d).\n"

Help text for -f has to be updated; it doesn't have to be a power of two
if -u is specified.

> +		"  -u, --unaligned	Enable unaligned chunk placement\n"

<...>

>   
> -	if (opt_xsk_frame_size & (opt_xsk_frame_size - 1)) {
> +	if ((opt_xsk_frame_size & (opt_xsk_frame_size - 1)) &&
> +			!opt_unaligned_chunks) {
>   		fprintf(stderr, "--frame-size=%d is not a power of two\n",
>   			opt_xsk_frame_size);
>   		usage(basename(argv[0]));
> 


^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v3 03/11] xsk: add support to allow unaligned chunk placement
  2019-07-24  5:10       ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-25 10:08         ` Maxim Mikityanskiy
  -1 siblings, 0 replies; 274+ messages in thread
From: Maxim Mikityanskiy @ 2019-07-25 10:08 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, Saeed Mahameed, stephen,
	bruce.richardson, ciara.loftus, bpf, intel-wired-lan

On 2019-07-24 08:10, Kevin Laatz wrote:
> Currently, addresses are chunk size aligned. This means, we are very
> restricted in terms of where we can place chunks within the umem. For
> example, if we have a chunk size of 2k, then our chunks can only be placed
> at 0,2k,4k,6k,8k... and so on (ie. every 2k starting from 0).
> 
> This patch introduces the ability to use unaligned chunks. With these
> changes, we are no longer bound to having to place chunks at a 2k (or
> whatever your chunk size is) interval. Since we are no longer dealing with
> aligned chunks, they can now cross page boundaries. Checks for page
> contiguity have been added in order to keep track of which pages are
> followed by a physically contiguous page.
> 
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> 
> ---
> v2:
>    - Add checks for the flags coming from userspace
>    - Fix how we get chunk_size in xsk_diag.c
>    - Add defines for masking the new descriptor format
>    - Modified the rx functions to use new descriptor format
>    - Modified the tx functions to use new descriptor format
> 
> v3:
>    - Add helper function to do address/offset masking/addition
> ---
>   include/net/xdp_sock.h      | 17 ++++++++
>   include/uapi/linux/if_xdp.h |  9 ++++
>   net/xdp/xdp_umem.c          | 18 +++++---
>   net/xdp/xsk.c               | 86 ++++++++++++++++++++++++++++++-------
>   net/xdp/xsk_diag.c          |  2 +-
>   net/xdp/xsk_queue.h         | 68 +++++++++++++++++++++++++----
>   6 files changed, 170 insertions(+), 30 deletions(-)

<...>

> +
> +static inline u64 xsk_umem_handle_offset(struct xdp_umem *umem, u64 handle,
> +					 u64 offset)
> +{
> +	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
> +		return handle |= (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
> +	else
> +		return handle += offset;
> +}

|= and += are not necessary here, | and + are enough.
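
i.e. simply:

	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
		return handle | (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);

	return handle + offset;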

In unaligned mode, running xsk_umem_handle_offset multiple times is not
supported, and it is assumed that the offset is zero. I'll explain the
need in my comment on patch 6.

<...>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v3 03/11] xsk: add support to allow unaligned chunk placement
@ 2019-07-25 10:08         ` Maxim Mikityanskiy
  0 siblings, 0 replies; 274+ messages in thread
From: Maxim Mikityanskiy @ 2019-07-25 10:08 UTC (permalink / raw)
  To: intel-wired-lan

On 2019-07-24 08:10, Kevin Laatz wrote:
> Currently, addresses are chunk size aligned. This means, we are very
> restricted in terms of where we can place chunks within the umem. For
> example, if we have a chunk size of 2k, then our chunks can only be placed
> at 0,2k,4k,6k,8k... and so on (ie. every 2k starting from 0).
> 
> This patch introduces the ability to use unaligned chunks. With these
> changes, we are no longer bound to having to place chunks at a 2k (or
> whatever your chunk size is) interval. Since we are no longer dealing with
> aligned chunks, they can now cross page boundaries. Checks for page
> contiguity have been added in order to keep track of which pages are
> followed by a physically contiguous page.
> 
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> 
> ---
> v2:
>    - Add checks for the flags coming from userspace
>    - Fix how we get chunk_size in xsk_diag.c
>    - Add defines for masking the new descriptor format
>    - Modified the rx functions to use new descriptor format
>    - Modified the tx functions to use new descriptor format
> 
> v3:
>    - Add helper function to do address/offset masking/addition
> ---
>   include/net/xdp_sock.h      | 17 ++++++++
>   include/uapi/linux/if_xdp.h |  9 ++++
>   net/xdp/xdp_umem.c          | 18 +++++---
>   net/xdp/xsk.c               | 86 ++++++++++++++++++++++++++++++-------
>   net/xdp/xsk_diag.c          |  2 +-
>   net/xdp/xsk_queue.h         | 68 +++++++++++++++++++++++++----
>   6 files changed, 170 insertions(+), 30 deletions(-)

<...>

> +
> +static inline u64 xsk_umem_handle_offset(struct xdp_umem *umem, u64 handle,
> +					 u64 offset)
> +{
> +	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
> +		return handle |= (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
> +	else
> +		return handle += offset;
> +}

|= and += are not necessary here, | and + are enough.

In unaligned mode, running xsk_umem_handle_offset multiple times is not
supported, and it is assumed that the offset is zero. I'll explain the
need in my comment on patch 6.

<...>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v3 06/11] mlx5e: modify driver for handling offsets
  2019-07-24  5:10       ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-25 10:15         ` Maxim Mikityanskiy
  -1 siblings, 0 replies; 274+ messages in thread
From: Maxim Mikityanskiy @ 2019-07-25 10:15 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, Saeed Mahameed, stephen,
	bruce.richardson, ciara.loftus, bpf, intel-wired-lan

On 2019-07-24 08:10, Kevin Laatz wrote:
> With the addition of the unaligned chunks option, we need to make sure we
> handle the offsets accordingly based on the mode we are currently running
> in. This patch modifies the driver to appropriately mask the address for
> each case.
> 
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> 
> ---
> v3:
>    - Use new helper function to handle offset
> ---
>   drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c    | 8 ++++++--
>   drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c | 9 +++++++--
>   2 files changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> index b0b982cf69bb..d5245893d2c8 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> @@ -122,6 +122,7 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
>   		      void *va, u16 *rx_headroom, u32 *len, bool xsk)
>   {
>   	struct bpf_prog *prog = READ_ONCE(rq->xdp_prog);
> +	struct xdp_umem *umem = rq->umem;
>   	struct xdp_buff xdp;
>   	u32 act;
>   	int err;
> @@ -138,8 +139,11 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
>   	xdp.rxq = &rq->xdp_rxq;
>   
>   	act = bpf_prog_run_xdp(prog, &xdp);
> -	if (xsk)
> -		xdp.handle += xdp.data - xdp.data_hard_start;
> +	if (xsk) {
> +		u64 off = xdp.data - xdp.data_hard_start;
> +
> +		xdp.handle = xsk_umem_handle_offset(umem, xdp.handle, off);
> +	}

What's missed is that umem_headroom is added to the handle directly in
mlx5e_xsk_page_alloc_umem. In my understanding, umem_headroom should go
to the offset part (high 16 bits) of the handle, so
xsk_umem_handle_offset has to support increasing the offset.
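
Something along these lines would let the offset accumulate instead of
assuming it starts at zero (a sketch only, the name is made up):

    static inline u64 xsk_umem_add_offset(struct xdp_umem *umem, u64 handle,
                                          u64 offset)
    {
            if (!(umem->flags & XDP_UMEM_UNALIGNED_CHUNKS))
                    return handle + offset;

            /* add to whatever offset is already stored in the high 16 bits */
            offset += handle >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
            handle &= XSK_UNALIGNED_BUF_ADDR_MASK;
            return handle | (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
    }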

>   	switch (act) {
>   	case XDP_PASS:
>   		*rx_headroom = xdp.data - xdp.data_hard_start;
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
> index 35e188cf4ea4..f596e63cba00 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
> @@ -61,6 +61,7 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
>   	struct mlx5e_xdp_xmit_data xdptxd;
>   	bool work_done = true;
>   	bool flush = false;
> +	u64 addr, offset;
>   
>   	xdpi.mode = MLX5E_XDP_XMIT_MODE_XSK;
>   
> @@ -82,8 +83,12 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
>   			break;
>   		}
>   
> -		xdptxd.dma_addr = xdp_umem_get_dma(umem, desc.addr);
> -		xdptxd.data = xdp_umem_get_data(umem, desc.addr);
> +		/* for unaligned chunks need to take offset from upper bits */
> +		offset = (desc.addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
> +		addr = (desc.addr & XSK_UNALIGNED_BUF_ADDR_MASK);
> +
> +		xdptxd.dma_addr = xdp_umem_get_dma(umem, addr + offset);
> +		xdptxd.data = xdp_umem_get_data(umem, addr + offset);

Why can't these calculations be encapsulated into 
xdp_umem_get_{dma,data}? I think they are common for all drivers, aren't 
they?

Even if there is some reason not to put this bitshifting stuff into 
xdp_umem_get_* functions, I suggest encapsulating it in a function 
anyway, because it's a good idea to keep those calculations in a single 
place.
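
For instance (name made up), with the drivers then passing desc.addr
straight through:

    static inline u64 xsk_umem_extract_addr(u64 addr)
    {
            return (addr & XSK_UNALIGNED_BUF_ADDR_MASK) +
                   (addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
    }

            xdptxd.dma_addr = xdp_umem_get_dma(umem, xsk_umem_extract_addr(desc.addr));
            xdptxd.data = xdp_umem_get_data(umem, xsk_umem_extract_addr(desc.addr));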

>   		xdptxd.len = desc.len;
>   
>   		dma_sync_single_for_device(sq->pdev, xdptxd.dma_addr,
> 


^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v3 06/11] mlx5e: modify driver for handling offsets
@ 2019-07-25 10:15         ` Maxim Mikityanskiy
  0 siblings, 0 replies; 274+ messages in thread
From: Maxim Mikityanskiy @ 2019-07-25 10:15 UTC (permalink / raw)
  To: intel-wired-lan

On 2019-07-24 08:10, Kevin Laatz wrote:
> With the addition of the unaligned chunks option, we need to make sure we
> handle the offsets accordingly based on the mode we are currently running
> in. This patch modifies the driver to appropriately mask the address for
> each case.
> 
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> 
> ---
> v3:
>    - Use new helper function to handle offset
> ---
>   drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c    | 8 ++++++--
>   drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c | 9 +++++++--
>   2 files changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> index b0b982cf69bb..d5245893d2c8 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> @@ -122,6 +122,7 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
>   		      void *va, u16 *rx_headroom, u32 *len, bool xsk)
>   {
>   	struct bpf_prog *prog = READ_ONCE(rq->xdp_prog);
> +	struct xdp_umem *umem = rq->umem;
>   	struct xdp_buff xdp;
>   	u32 act;
>   	int err;
> @@ -138,8 +139,11 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
>   	xdp.rxq = &rq->xdp_rxq;
>   
>   	act = bpf_prog_run_xdp(prog, &xdp);
> -	if (xsk)
> -		xdp.handle += xdp.data - xdp.data_hard_start;
> +	if (xsk) {
> +		u64 off = xdp.data - xdp.data_hard_start;
> +
> +		xdp.handle = xsk_umem_handle_offset(umem, xdp.handle, off);
> +	}

What's missed is that umem_headroom is added to the handle directly in
mlx5e_xsk_page_alloc_umem. In my understanding, umem_headroom should go
to the offset part (high 16 bits) of the handle, so
xsk_umem_handle_offset has to support increasing the offset.

>   	switch (act) {
>   	case XDP_PASS:
>   		*rx_headroom = xdp.data - xdp.data_hard_start;
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
> index 35e188cf4ea4..f596e63cba00 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
> @@ -61,6 +61,7 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
>   	struct mlx5e_xdp_xmit_data xdptxd;
>   	bool work_done = true;
>   	bool flush = false;
> +	u64 addr, offset;
>   
>   	xdpi.mode = MLX5E_XDP_XMIT_MODE_XSK;
>   
> @@ -82,8 +83,12 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
>   			break;
>   		}
>   
> -		xdptxd.dma_addr = xdp_umem_get_dma(umem, desc.addr);
> -		xdptxd.data = xdp_umem_get_data(umem, desc.addr);
> +		/* for unaligned chunks need to take offset from upper bits */
> +		offset = (desc.addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
> +		addr = (desc.addr & XSK_UNALIGNED_BUF_ADDR_MASK);
> +
> +		xdptxd.dma_addr = xdp_umem_get_dma(umem, addr + offset);
> +		xdptxd.data = xdp_umem_get_data(umem, addr + offset);

Why can't these calculations be encapsulated into 
xdp_umem_get_{dma,data}? I think they are common for all drivers, aren't 
they?

Even if there is some reason not to put this bitshifting stuff into 
xdp_umem_get_* functions, I suggest encapsulating it in a function 
anyway, because it's a good idea to keep those calculations in a single 
place.

>   		xdptxd.len = desc.len;
>   
>   		dma_sync_single_for_device(sq->pdev, xdptxd.dma_addr,
> 


^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v3 03/11] xsk: add support to allow unaligned chunk placement
  2019-07-24  5:10       ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-25 15:39         ` Jonathan Lemon
  -1 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-07-25 15:39 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, saeedm, maximmi, stephen, bruce.richardson,
	ciara.loftus, bpf, intel-wired-lan



On 23 Jul 2019, at 22:10, Kevin Laatz wrote:

> Currently, addresses are chunk size aligned. This means, we are very
> restricted in terms of where we can place chunks within the umem. For
> example, if we have a chunk size of 2k, then our chunks can only be 
> placed
> at 0,2k,4k,6k,8k... and so on (ie. every 2k starting from 0).
>
> This patch introduces the ability to use unaligned chunks. With these
> changes, we are no longer bound to having to place chunks at a 2k (or
> whatever your chunk size is) interval. Since we are no longer dealing 
> with
> aligned chunks, they can now cross page boundaries. Checks for page
> contiguity have been added in order to keep track of which pages are
> followed by a physically contiguous page.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
>
> ---
> v2:
>   - Add checks for the flags coming from userspace
>   - Fix how we get chunk_size in xsk_diag.c
>   - Add defines for masking the new descriptor format
>   - Modified the rx functions to use new descriptor format
>   - Modified the tx functions to use new descriptor format
>
> v3:
>   - Add helper function to do address/offset masking/addition
> ---
>  include/net/xdp_sock.h      | 17 ++++++++
>  include/uapi/linux/if_xdp.h |  9 ++++
>  net/xdp/xdp_umem.c          | 18 +++++---
>  net/xdp/xsk.c               | 86 
> ++++++++++++++++++++++++++++++-------
>  net/xdp/xsk_diag.c          |  2 +-
>  net/xdp/xsk_queue.h         | 68 +++++++++++++++++++++++++----
>  6 files changed, 170 insertions(+), 30 deletions(-)
>
> diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
> index 69796d264f06..738996c0f995 100644
> --- a/include/net/xdp_sock.h
> +++ b/include/net/xdp_sock.h
> @@ -19,6 +19,7 @@ struct xsk_queue;
>  struct xdp_umem_page {
>  	void *addr;
>  	dma_addr_t dma;
> +	bool next_pg_contig;
>  };
>
>  struct xdp_umem_fq_reuse {
> @@ -48,6 +49,7 @@ struct xdp_umem {
>  	bool zc;
>  	spinlock_t xsk_list_lock;
>  	struct list_head xsk_list;
> +	u32 flags;
>  };
>
>  struct xdp_sock {
> @@ -144,6 +146,15 @@ static inline void xsk_umem_fq_reuse(struct 
> xdp_umem *umem, u64 addr)
>
>  	rq->handles[rq->length++] = addr;
>  }
> +
> +static inline u64 xsk_umem_handle_offset(struct xdp_umem *umem, u64 
> handle,
> +					 u64 offset)
> +{
> +	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
> +		return handle |= (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
> +	else
> +		return handle += offset;
> +}

This should be something like 'xsk_umem_adjust_offset()', and use "+=" 
for both cases.


>  #else
>  static inline int xsk_generic_rcv(struct xdp_sock *xs, struct 
> xdp_buff *xdp)
>  {
> @@ -241,6 +252,12 @@ static inline void xsk_umem_fq_reuse(struct 
> xdp_umem *umem, u64 addr)
>  {
>  }
>
> +static inline u64 xsk_umem_handle_offset(struct xdp_umem *umem, u64 
> handle,
> +					 u64 offset)
> +{
> +	return NULL;
> +}
> +
>  #endif /* CONFIG_XDP_SOCKETS */
>
>  #endif /* _LINUX_XDP_SOCK_H */
> diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
> index faaa5ca2a117..f8dc68fcdf78 100644
> --- a/include/uapi/linux/if_xdp.h
> +++ b/include/uapi/linux/if_xdp.h
> @@ -17,6 +17,9 @@
>  #define XDP_COPY	(1 << 1) /* Force copy-mode */
>  #define XDP_ZEROCOPY	(1 << 2) /* Force zero-copy mode */
>
> +/* Flags for xsk_umem_config flags */
> +#define XDP_UMEM_UNALIGNED_CHUNKS (1 << 0)
> +
>  struct sockaddr_xdp {
>  	__u16 sxdp_family;
>  	__u16 sxdp_flags;
> @@ -53,6 +56,7 @@ struct xdp_umem_reg {
>  	__u64 len; /* Length of packet data area */
>  	__u32 chunk_size;
>  	__u32 headroom;
> +	__u32 flags;
>  };
>
>  struct xdp_statistics {
> @@ -74,6 +78,11 @@ struct xdp_options {
>  #define XDP_UMEM_PGOFF_FILL_RING	0x100000000ULL
>  #define XDP_UMEM_PGOFF_COMPLETION_RING	0x180000000ULL
>
> +/* Masks for unaligned chunks mode */
> +#define XSK_UNALIGNED_BUF_OFFSET_SHIFT 48
> +#define XSK_UNALIGNED_BUF_ADDR_MASK \
> +	((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1)
> +
>  /* Rx/Tx descriptor */
>  struct xdp_desc {
>  	__u64 addr;
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index 83de74ca729a..952ca22103e9 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -299,6 +299,7 @@ static int xdp_umem_account_pages(struct xdp_umem 
> *umem)
>
>  static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg 
> *mr)
>  {
> +	bool unaligned_chunks = mr->flags & XDP_UMEM_UNALIGNED_CHUNKS;
>  	u32 chunk_size = mr->chunk_size, headroom = mr->headroom;
>  	unsigned int chunks, chunks_per_page;
>  	u64 addr = mr->addr, size = mr->len;
> @@ -314,7 +315,10 @@ static int xdp_umem_reg(struct xdp_umem *umem, 
> struct xdp_umem_reg *mr)
>  		return -EINVAL;
>  	}
>
> -	if (!is_power_of_2(chunk_size))
> +	if (mr->flags & ~(XDP_UMEM_UNALIGNED_CHUNKS))
> +		return -EINVAL;
> +
> +	if (!unaligned_chunks && !is_power_of_2(chunk_size))
>  		return -EINVAL;
>
>  	if (!PAGE_ALIGNED(addr)) {
> @@ -331,9 +335,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, 
> struct xdp_umem_reg *mr)
>  	if (chunks == 0)
>  		return -EINVAL;
>
> -	chunks_per_page = PAGE_SIZE / chunk_size;
> -	if (chunks < chunks_per_page || chunks % chunks_per_page)
> -		return -EINVAL;
> +	if (!unaligned_chunks) {
> +		chunks_per_page = PAGE_SIZE / chunk_size;
> +		if (chunks < chunks_per_page || chunks % chunks_per_page)
> +			return -EINVAL;
> +	}
>
>  	headroom = ALIGN(headroom, 64);
>
> @@ -342,13 +348,15 @@ static int xdp_umem_reg(struct xdp_umem *umem, 
> struct xdp_umem_reg *mr)
>  		return -EINVAL;
>
>  	umem->address = (unsigned long)addr;
> -	umem->chunk_mask = ~((u64)chunk_size - 1);
> +	umem->chunk_mask = unaligned_chunks ? XSK_UNALIGNED_BUF_ADDR_MASK
> +					    : ~((u64)chunk_size - 1);

The handle needs to be cleaned (reset to base address) when removed
from the fill queue or recycle stack.  This will not provide the correct
semantics for unaligned mode.
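
To make the concern concrete, a sketch of what masking with chunk_mask recovers in each mode (xsk_handle_clean() is a hypothetical name, not part of the patch):

static inline u64 xsk_handle_clean(struct xdp_umem *umem, u64 handle)
{
	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
		/* chunk_mask is XSK_UNALIGNED_BUF_ADDR_MASK here: this only
		 * drops the 16-bit offset field and cannot snap an arbitrary
		 * address back to a chunk base, so the fill queue / recycle
		 * stack must already hold clean base addresses.
		 */
		return handle & XSK_UNALIGNED_BUF_ADDR_MASK;

	/* Aligned mode: chunk_mask is ~(chunk_size - 1), so masking really
	 * does reset the handle to its chunk base.
	 */
	return handle & umem->chunk_mask;
}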


>  	umem->size = size;
>  	umem->headroom = headroom;
>  	umem->chunk_size_nohr = chunk_size - headroom;
>  	umem->npgs = size / PAGE_SIZE;
>  	umem->pgs = NULL;
>  	umem->user = NULL;
> +	umem->flags = mr->flags;
>  	INIT_LIST_HEAD(&umem->xsk_list);
>  	spin_lock_init(&umem->xsk_list_lock);
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 59b57d708697..b3ab653091c4 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -45,7 +45,7 @@ EXPORT_SYMBOL(xsk_umem_has_addrs);
>
>  u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
>  {
> -	return xskq_peek_addr(umem->fq, addr);
> +	return xskq_peek_addr(umem->fq, addr, umem);
>  }
>  EXPORT_SYMBOL(xsk_umem_peek_addr);
>
> @@ -55,21 +55,42 @@ void xsk_umem_discard_addr(struct xdp_umem *umem)
>  }
>  EXPORT_SYMBOL(xsk_umem_discard_addr);
>
> +/* If a buffer crosses a page boundary, we need to do 2 memcpy's, one 
> for
> + * each page. This is only required in copy mode.
> + */
> +static void __xsk_rcv_memcpy(struct xdp_umem *umem, u64 addr, void 
> *from_buf,
> +			     u32 len, u32 metalen)
> +{
> +	void *to_buf = xdp_umem_get_data(umem, addr);
> +
> +	if (xskq_crosses_non_contig_pg(umem, addr, len + metalen)) {
> +		void *next_pg_addr = umem->pages[(addr >> PAGE_SHIFT) + 1].addr;
> +		u64 page_start = addr & (PAGE_SIZE - 1);
> +		u64 first_len = PAGE_SIZE - (addr - page_start);
> +
> +		memcpy(to_buf, from_buf, first_len + metalen);
> +		memcpy(next_pg_addr, from_buf + first_len, len - first_len);
> +
> +		return;
> +	}
> +
> +	memcpy(to_buf, from_buf, len + metalen);
> +}
> +
>  static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 
> len)
>  {
> -	void *to_buf, *from_buf;
> +	u64 offset = xs->umem->headroom;
> +	void *from_buf;
>  	u32 metalen;
>  	u64 addr;
>  	int err;
>
> -	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
> +	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
>  	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
>  		xs->rx_dropped++;
>  		return -ENOSPC;
>  	}
>
> -	addr += xs->umem->headroom;
> -
>  	if (unlikely(xdp_data_meta_unsupported(xdp))) {
>  		from_buf = xdp->data;
>  		metalen = 0;
> @@ -78,9 +99,13 @@ static int __xsk_rcv(struct xdp_sock *xs, struct 
> xdp_buff *xdp, u32 len)
>  		metalen = xdp->data - xdp->data_meta;
>  	}
>
> -	to_buf = xdp_umem_get_data(xs->umem, addr);
> -	memcpy(to_buf, from_buf, len + metalen);
> -	addr += metalen;
> +	__xsk_rcv_memcpy(xs->umem, addr + offset, from_buf, len, metalen);
> +
> +	offset += metalen;
> +	if (xs->umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
> +		addr |= offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT;
> +	else
> +		addr += offset;
>  	err = xskq_produce_batch_desc(xs->rx, addr, len);
>  	if (!err) {
>  		xskq_discard_addr(xs->umem->fq);
> @@ -127,6 +152,7 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct 
> xdp_buff *xdp)
>  	u32 len = xdp->data_end - xdp->data;
>  	void *buffer;
>  	u64 addr;
> +	u64 offset = xs->umem->headroom;
>  	int err;
>
>  	spin_lock_bh(&xs->rx_lock);
> @@ -136,17 +162,17 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct 
> xdp_buff *xdp)
>  		goto out_unlock;
>  	}
>
> -	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
> +	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
>  	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
>  		err = -ENOSPC;
>  		goto out_drop;
>  	}
>
> -	addr += xs->umem->headroom;
> -
> -	buffer = xdp_umem_get_data(xs->umem, addr);
> +	buffer = xdp_umem_get_data(xs->umem, addr + offset);
>  	memcpy(buffer, xdp->data_meta, len + metalen);
> -	addr += metalen;
> +	offset += metalen;
> +
> +	addr = xsk_umem_handle_offset(xs->umem, addr, offset);
>  	err = xskq_produce_batch_desc(xs->rx, addr, len);
>  	if (err)
>  		goto out_drop;
> @@ -190,7 +216,7 @@ bool xsk_umem_consume_tx(struct xdp_umem *umem, 
> struct xdp_desc *desc)
>
>  	rcu_read_lock();
>  	list_for_each_entry_rcu(xs, &umem->xsk_list, list) {
> -		if (!xskq_peek_desc(xs->tx, desc))
> +		if (!xskq_peek_desc(xs->tx, desc, umem))
>  			continue;
>
>  		if (xskq_produce_addr_lazy(umem->cq, desc->addr))
> @@ -243,7 +269,7 @@ static int xsk_generic_xmit(struct sock *sk, 
> struct msghdr *m,
>  	if (xs->queue_id >= xs->dev->real_num_tx_queues)
>  		goto out;
>
> -	while (xskq_peek_desc(xs->tx, &desc)) {
> +	while (xskq_peek_desc(xs->tx, &desc, xs->umem)) {
>  		char *buffer;
>  		u64 addr;
>  		u32 len;
> @@ -262,6 +288,10 @@ static int xsk_generic_xmit(struct sock *sk, 
> struct msghdr *m,
>
>  		skb_put(skb, len);
>  		addr = desc.addr;
> +		if (xs->umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
> +			addr = (addr & XSK_UNALIGNED_BUF_ADDR_MASK) |
> +				(addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);

This doesn't look right to me.  Shouldn't it be "(addr & mask) + (addr >> shift)"?
I'd also prefer to see this type of logic in an inline/macro
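
A minimal sketch of such an inline, assuming the descriptor encoding from this patch (the helper name is illustrative):

static inline u64 xsk_umem_add_offset_to_addr(u64 addr)
{
	/* Lower 48 bits hold the base address, upper 16 bits the offset;
	 * addition (not OR) is needed because the base is not aligned.
	 */
	return (addr & XSK_UNALIGNED_BUF_ADDR_MASK) +
	       (addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
}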

> +
>  		buffer = xdp_umem_get_data(xs->umem, addr);
>  		err = skb_store_bits(skb, 0, buffer, len);
>  		if (unlikely(err) || xskq_reserve_addr(xs->umem->cq)) {
> @@ -272,7 +302,7 @@ static int xsk_generic_xmit(struct sock *sk, 
> struct msghdr *m,
>  		skb->dev = xs->dev;
>  		skb->priority = sk->sk_priority;
>  		skb->mark = sk->sk_mark;
> -		skb_shinfo(skb)->destructor_arg = (void *)(long)addr;
> +		skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
>  		skb->destructor = xsk_destruct_skb;
>
>  		err = dev_direct_xmit(skb, xs->queue_id);
> @@ -412,6 +442,28 @@ static struct socket *xsk_lookup_xsk_from_fd(int 
> fd)
>  	return sock;
>  }
>
> +/* Check if umem pages are contiguous.
> + * If zero-copy mode, use the DMA address to do the page contiguity 
> check
> + * For all other modes we use addr (kernel virtual address)
> + */
> +static void xsk_check_page_contiguity(struct xdp_umem *umem, u32 
> flags)
> +{
> +	int i;
> +
> +	if (flags & XDP_ZEROCOPY) {
> +		for (i = 0; i < umem->npgs - 1; i++)
> +			umem->pages[i].next_pg_contig =
> +					(umem->pages[i].dma + PAGE_SIZE ==
> +						umem->pages[i + 1].dma);
> +		return;
> +	}
> +
> +	for (i = 0; i < umem->npgs - 1; i++)
> +		umem->pages[i].next_pg_contig =
> +				(umem->pages[i].addr + PAGE_SIZE ==
> +					umem->pages[i + 1].addr);
> +}
> +
>  static int xsk_bind(struct socket *sock, struct sockaddr *addr, int 
> addr_len)
>  {
>  	struct sockaddr_xdp *sxdp = (struct sockaddr_xdp *)addr;
> @@ -500,6 +552,8 @@ static int xsk_bind(struct socket *sock, struct 
> sockaddr *addr, int addr_len)
>  		err = xdp_umem_assign_dev(xs->umem, dev, qid, flags);
>  		if (err)
>  			goto out_unlock;
> +
> +		xsk_check_page_contiguity(xs->umem, flags);
>  	}
>
>  	xs->dev = dev;
> diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c
> index d5e06c8e0cbf..9986a759fe06 100644
> --- a/net/xdp/xsk_diag.c
> +++ b/net/xdp/xsk_diag.c
> @@ -56,7 +56,7 @@ static int xsk_diag_put_umem(const struct xdp_sock 
> *xs, struct sk_buff *nlskb)
>  	du.id = umem->id;
>  	du.size = umem->size;
>  	du.num_pages = umem->npgs;
> -	du.chunk_size = (__u32)(~umem->chunk_mask + 1);
> +	du.chunk_size = umem->chunk_size_nohr + umem->headroom;
>  	du.headroom = umem->headroom;
>  	du.ifindex = umem->dev ? umem->dev->ifindex : 0;
>  	du.queue_id = umem->queue_id;
> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> index 909c5168ed0f..0d77212367f0 100644
> --- a/net/xdp/xsk_queue.h
> +++ b/net/xdp/xsk_queue.h
> @@ -133,6 +133,16 @@ static inline bool xskq_has_addrs(struct 
> xsk_queue *q, u32 cnt)
>
>  /* UMEM queue */
>
> +static inline bool xskq_crosses_non_contig_pg(struct xdp_umem *umem, 
> u64 addr,
> +					      u64 length)
> +{
> +	bool cross_pg = (addr & (PAGE_SIZE - 1)) + length > PAGE_SIZE;
> +	bool next_pg_contig =
> +		umem->pages[(addr >> PAGE_SHIFT) + 1].next_pg_contig;
> +
> +	return cross_pg && !next_pg_contig;
> +}
> +
>  static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
>  {
>  	if (addr >= q->size) {
> @@ -143,23 +153,50 @@ static inline bool xskq_is_valid_addr(struct 
> xsk_queue *q, u64 addr)
>  	return true;
>  }
>
> -static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr)
> +static inline bool xskq_is_valid_addr_unaligned(struct xsk_queue *q, 
> u64 addr,
> +						u64 length,
> +						struct xdp_umem *umem)
> +{
> +	addr += addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
> +	addr &= XSK_UNALIGNED_BUF_ADDR_MASK;
> +	if (addr >= q->size ||
> +	    xskq_crosses_non_contig_pg(umem, addr, length)) {
> +		q->invalid_descs++;
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
> +static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr,
> +				      struct xdp_umem *umem)
>  {
>  	while (q->cons_tail != q->cons_head) {
>  		struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
>  		unsigned int idx = q->cons_tail & q->ring_mask;
>
>  		*addr = READ_ONCE(ring->desc[idx]) & q->chunk_mask;
> +
> +		if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS) {
> +			if (xskq_is_valid_addr_unaligned(q, *addr,
> +							 umem->chunk_size_nohr,
> +							 umem))
> +				return addr;
> +			goto out;
> +		}
> +
>  		if (xskq_is_valid_addr(q, *addr))
>  			return addr;
>
> +out:
>  		q->cons_tail++;
>  	}
>
>  	return NULL;
>  }
>
> -static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
> +static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr,
> +				  struct xdp_umem *umem)
>  {
>  	if (q->cons_tail == q->cons_head) {
>  		smp_mb(); /* D, matches A */
> @@ -170,7 +207,7 @@ static inline u64 *xskq_peek_addr(struct xsk_queue 
> *q, u64 *addr)
>  		smp_rmb();
>  	}
>
> -	return xskq_validate_addr(q, addr);
> +	return xskq_validate_addr(q, addr, umem);
>  }
>
>  static inline void xskq_discard_addr(struct xsk_queue *q)
> @@ -229,8 +266,21 @@ static inline int xskq_reserve_addr(struct 
> xsk_queue *q)
>
>  /* Rx/Tx queue */
>
> -static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct 
> xdp_desc *d)
> +static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct 
> xdp_desc *d,
> +				      struct xdp_umem *umem)
>  {
> +	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS) {
> +		if (!xskq_is_valid_addr_unaligned(q, d->addr, d->len, umem))
> +			return false;
> +
> +		if (d->len > umem->chunk_size_nohr || d->options) {
> +			q->invalid_descs++;
> +			return false;
> +		}
> +
> +		return true;
> +	}
> +
>  	if (!xskq_is_valid_addr(q, d->addr))
>  		return false;
>
> @@ -244,14 +294,15 @@ static inline bool xskq_is_valid_desc(struct 
> xsk_queue *q, struct xdp_desc *d)
>  }
>
>  static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue 
> *q,
> -						  struct xdp_desc *desc)
> +						  struct xdp_desc *desc,
> +						  struct xdp_umem *umem)
>  {
>  	while (q->cons_tail != q->cons_head) {
>  		struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
>  		unsigned int idx = q->cons_tail & q->ring_mask;
>
>  		*desc = READ_ONCE(ring->desc[idx]);
> -		if (xskq_is_valid_desc(q, desc))
> +		if (xskq_is_valid_desc(q, desc, umem))
>  			return desc;
>
>  		q->cons_tail++;
> @@ -261,7 +312,8 @@ static inline struct xdp_desc 
> *xskq_validate_desc(struct xsk_queue *q,
>  }
>
>  static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
> -					      struct xdp_desc *desc)
> +					      struct xdp_desc *desc,
> +					      struct xdp_umem *umem)
>  {
>  	if (q->cons_tail == q->cons_head) {
>  		smp_mb(); /* D, matches A */
> @@ -272,7 +324,7 @@ static inline struct xdp_desc 
> *xskq_peek_desc(struct xsk_queue *q,
>  		smp_rmb(); /* C, matches B */
>  	}
>
> -	return xskq_validate_desc(q, desc);
> +	return xskq_validate_desc(q, desc, umem);
>  }
>
>  static inline void xskq_discard_desc(struct xsk_queue *q)
> -- 
> 2.17.1

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v3 00/11] XDP unaligned chunk placement support
  2019-07-24  5:10     ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-25 15:39       ` Jonathan Lemon
  -1 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-07-25 15:39 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, saeedm, maximmi, stephen, bruce.richardson,
	ciara.loftus, bpf, intel-wired-lan



On 23 Jul 2019, at 22:10, Kevin Laatz wrote:

> This patch set adds the ability to use unaligned chunks in the XDP umem.
>
> Currently, all chunk addresses passed to the umem are masked to be chunk
> size aligned (max is PAGE_SIZE). This limits where we can place chunks
> within the umem as well as limiting the packet sizes that are supported.
>
> The changes in this patch set remove these restrictions, allowing XDP to
> be more flexible in where it can place a chunk within a umem. By relaxing
> where the chunks can be placed, it allows us to use an arbitrary buffer
> size and place that wherever we have a free address in the umem. These
> changes add the ability to support arbitrary frame sizes up to 4k
> (PAGE_SIZE) and make it easy to integrate with other existing frameworks
> that have their own memory management systems, such as DPDK.
> In DPDK, for example, there is already support for AF_XDP with zero-copy.
> However, with this patch set the integration will be much more seamless.
> You can find the DPDK AF_XDP driver at:
> https://git.dpdk.org/dpdk/tree/drivers/net/af_xdp
>
> Since we are now dealing with arbitrary frame sizes, we also need to
> update how we pass around addresses. Currently, the addresses can simply be
> masked to 2k to get back to the original address. This becomes less trivial
> when using frame sizes that are not a 'power of 2' size. This patch set
> modifies the Rx/Tx descriptor format to use the upper 16-bits of the addr
> field for an offset value, leaving the lower 48-bits for the address (this
> leaves us with 256 Terabytes, which should be enough!). We only need to use
> the upper 16-bits to store the offset when running in unaligned mode.
> Rather than adding the offset (headroom etc) to the address, we will store
> it in the upper 16-bits of the address field. This way, we can easily add
> the offset to the address where we need it, using some bit manipulation and
> addition, and we can also easily get the original address wherever we need
> it (for example in i40e_zca_free) by simply masking to get the lower
> 48-bits of the address field.

I wonder if it would be better to break backwards compatibility here and
say that a handle is going to change from [addr] to [base | offset], or
even [index | offset], where address = (index * chunk size) + offset, and
then use accessor macros to manipulate the queue entries.

This way, the XDP hotpath can adjust the handle with simple arithmetic,
bypassing the "if (unaligned)", check, as it changes the offset directly.

Using a chunk index instead of a base address is safer, otherwise it is
too easy to corrupt things.
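
A rough sketch of the accessor idea, assuming a hypothetical [index | offset] handle layout (the names and the 16-bit offset width are illustrative, not anything defined by this patch set):

#define XSK_HANDLE_OFFSET_BITS	16
#define XSK_HANDLE_OFFSET_MASK	((1ULL << XSK_HANDLE_OFFSET_BITS) - 1)

static inline u64 xsk_handle(u64 index, u64 offset)
{
	return (index << XSK_HANDLE_OFFSET_BITS) | offset;
}

static inline u64 xsk_handle_adjust(u64 handle, u64 delta)
{
	/* Plain addition works for aligned and unaligned umems alike,
	 * as long as the offset field does not overflow into the index.
	 */
	return handle + delta;
}

static inline u64 xsk_handle_to_addr(u64 handle, u64 chunk_size)
{
	u64 index = handle >> XSK_HANDLE_OFFSET_BITS;
	u64 offset = handle & XSK_HANDLE_OFFSET_MASK;

	return index * chunk_size + offset;
}
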
-- 
Jonathan

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [Intel-wired-lan] [PATCH v2 00/10] XDP unaligned chunk placement support
  2019-07-24 13:25       ` Magnus Karlsson
@ 2019-07-25 15:43         ` Jonathan Lemon
  -1 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-07-25 15:43 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: Alexei Starovoitov, Kevin Laatz, Jakub Kicinski, Daniel Borkmann,
	Network Development, ciara.loftus, Alexei Starovoitov,
	intel-wired-lan, bruce.richardson, bpf, Björn Töpel,
	Karlsson, Magnus



On 24 Jul 2019, at 6:25, Magnus Karlsson wrote:

> On Tue, Jul 23, 2019 at 11:08 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>>
>> Johnathan, Bjorn, Jakub,
>> Please review!
>> The patch set has been pending for a week.
>
> There is a v3 coming out shortly so I suggest we wait for that. It
> will have Mellanox support for this feature too and some clean ups. I
> refrained from posting a review on the mailing list due to the merge
> window being closed last week, but maybe that was not correct. Should
> I still post reviews for new features submitted during the closed
> merge window period? I am happy to do it since I do not have any other
> tasks during that time. It is a quiet period for me. Just let me know.

Same here - last time I posted a patch when the merge window was closed,
it was signaled to me (Hi Jakub!) to wait until it reopened.
-- 
Jonathan



>
> /Magnus
>
>> On Tue, Jul 16, 2019 at 4:21 AM Kevin Laatz <kevin.laatz@intel.com> 
>> wrote:
>>>
>>> This patch set adds the ability to use unaligned chunks in the XDP 
>>> umem.
>>>
>>> Currently, all chunk addresses passed to the umem are masked to be 
>>> chunk
>>> size aligned (default is 2k, max is PAGE_SIZE). This limits where we 
>>> can
>>> place chunks within the umem as well as limiting the packet sizes 
>>> that are
>>> supported.
>>>
>>> The changes in this patch set remove these restrictions, allowing XDP to
>>> be more flexible in where it can place a chunk within a umem. By 
>>> relaxing
>>> where the chunks can be placed, it allows us to use an arbitrary 
>>> buffer
>>> size and place that wherever we have a free address in the umem. 
>>> These
>>> changes add the ability to support arbitrary frame sizes up to 4k
>>> (PAGE_SIZE) and make it easy to integrate with other existing 
>>> frameworks
>>> that have their own memory management systems, such as DPDK.
>>>
>>> Since we are now dealing with arbitrary frame sizes, we also need to
>>> update how we pass around addresses. Currently, the addresses can 
>>> simply be
>>> masked to 2k to get back to the original address. This becomes less 
>>> trivial
>>> when using frame sizes that are not a 'power of 2' size. This patch 
>>> set
>>> modifies the Rx/Tx descriptor format to use the upper 16-bits of the 
>>> addr
>>> field for an offset value, leaving the lower 48-bits for the address 
>>> (this
>>> leaves us with 256 Terabytes, which should be enough!). We only need 
>>> to use
>>> the upper 16-bits to store the offset when running in unaligned 
>>> mode.
>>> Rather than adding the offset (headroom etc) to the address, we will 
>>> store
>>> it in the upper 16-bits of the address field. This way, we can 
>>> easily add
>>> the offset to the address where we need it, using some bit 
>>> manipulation and
>>> addition, and we can also easily get the original address wherever 
>>> we need
>>> it (for example in i40e_zca_free) by simply masking to get the lower
>>> 48-bits of the address field.
>>>
>>> The numbers below were recorded with the following set up:
>>>   - Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
>>>   - Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 
>>> (rev 02)
>>>   - Driver: i40e
>>>   - Application: xdpsock with l2fwd (single interface)
>>>
>>> These are solely for comparing performance with and without the 
>>> patches.
>>> The largest drop was ~1% (in zero-copy mode).
>>>
>>> +-------------------------+------------+-----------------+-------------+
>>> | Buffer size: 2048       | SKB mode   | Zero-copy       | Copy        |
>>> +-------------------------+------------+-----------------+-------------+
>>> | Aligned (baseline)      | 1.7 Mpps   | 15.3 Mpps       | 2.08 Mpps   |
>>> +-------------------------+------------+-----------------+-------------+
>>> | Aligned (with patches)  | 1.7 Mpps   | 15.1 Mpps       | 2.08 Mpps   |
>>> +-------------------------+------------+-----------------+-------------+
>>> | Unaligned               | 1.7 Mpps   | 14.5 Mpps       | 2.08 Mpps   |
>>> +-------------------------+------------+-----------------+-------------+
>>>
>>> NOTE: We are currently working on the changes required in the 
>>> Mellanox
>>> driver. We will include these in the v3.
>>>
>>> Structure of the patchset:
>>> Patch 1:
>>>   - Remove unnecessary masking and headroom addition during 
>>> zero-copy Rx
>>>     buffer recycling in i40e. This change is required in order for 
>>> the
>>>     buffer recycling to work in the unaligned chunk mode.
>>>
>>> Patch 2:
>>>   - Remove unnecessary masking and headroom addition during
>>>     zero-copy Rx buffer recycling in ixgbe. This change is required 
>>> in
>>>     order for the  buffer recycling to work in the unaligned chunk 
>>> mode.
>>>
>>> Patch 3:
>>>   - Add infrastructure for unaligned chunks. Since we are dealing 
>>> with
>>>     unaligned chunks that could potentially cross a physical page 
>>> boundary,
>>>     we add checks to keep track of that information. We can later 
>>> use this
>>>     information to correctly handle buffers that are placed at an 
>>> address
>>>     where they cross a page boundary.  This patch also modifies the
>>>     existing Rx and Tx functions to use the new descriptor format. 
>>> To
>>>     handle addresses correctly, we need to mask appropriately based 
>>> on
>>>     whether we are in aligned or unaligned mode.
>>>
>>> Patch 4:
>>>   - This patch updates the i40e driver to make use of the new 
>>> descriptor
>>>     format. The new format is particularly useful here since we can 
>>> now
>>>     retrieve the original address in places like i40e_zca_free with 
>>> ease.
>>>     This saves us doing various calculations to get the original 
>>> address
>>>     back.
>>>
>>> Patch 5:
>>>   - This patch updates the ixgbe driver to make use of the new 
>>> descriptor
>>>     format. The new format is particularly useful here since we can 
>>> now
>>>     retrieve the original address in places like ixgbe_zca_free with 
>>> ease.
>>>     This saves us doing various calculations to get the original 
>>> address
>>>     back.
>>>
>>> Patch 6:
>>>   - Add flags for umem configuration to libbpf
>>>
>>> Patch 7:
>>>   - Modify xdpsock application to add a command line option for
>>>     unaligned chunks
>>>
>>> Patch 8:
>>>   - Since we can now run the application in unaligned chunk mode, we 
>>> need
>>>     to make sure we recycle the buffers appropriately.
>>>
>>> Patch 9:
>>>   - Adds hugepage support to the xdpsock application
>>>
>>> Patch 10:
>>>   - Documentation update to include the unaligned chunk scenario. We 
>>> need
>>>     to explicitly state that the incoming addresses are only masked 
>>> in the
>>>     aligned chunk mode and not the unaligned chunk mode.
>>>
>>> ---
>>> v2:
>>>   - fixed checkpatch issues
>>>   - fixed Rx buffer recycling for unaligned chunks in xdpsock
>>>   - removed unused defines
>>>   - fixed how chunk_size is calculated in xsk_diag.c
>>>   - added some performance numbers to cover letter
>>>   - modified descriptor format to make it easier to retrieve 
>>> original
>>>     address
>>>   - removed patch adding off_t off to the zero copy allocator. This 
>>> is no
>>>     longer needed with the new descriptor format.
>>>
>>> Kevin Laatz (10):
>>>   i40e: simplify Rx buffer recycle
>>>   ixgbe: simplify Rx buffer recycle
>>>   xsk: add support to allow unaligned chunk placement
>>>   i40e: modify driver for handling offsets
>>>   ixgbe: modify driver for handling offsets
>>>   libbpf: add flags to umem config
>>>   samples/bpf: add unaligned chunks mode support to xdpsock
>>>   samples/bpf: add buffer recycling for unaligned chunks to xdpsock
>>>   samples/bpf: use hugepages in xdpsock app
>>>   doc/af_xdp: include unaligned chunk case
>>>
>>>  Documentation/networking/af_xdp.rst          | 10 ++-
>>>  drivers/net/ethernet/intel/i40e/i40e_xsk.c   | 39 +++++----
>>>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 39 +++++----
>>>  include/net/xdp_sock.h                       |  2 +
>>>  include/uapi/linux/if_xdp.h                  |  9 ++
>>>  net/xdp/xdp_umem.c                           | 17 ++--
>>>  net/xdp/xsk.c                                | 89 
>>> ++++++++++++++++----
>>>  net/xdp/xsk_diag.c                           |  2 +-
>>>  net/xdp/xsk_queue.h                          | 70 +++++++++++++--
>>>  samples/bpf/xdpsock_user.c                   | 61 ++++++++++----
>>>  tools/include/uapi/linux/if_xdp.h            |  4 +
>>>  tools/lib/bpf/xsk.c                          |  3 +
>>>  tools/lib/bpf/xsk.h                          |  2 +
>>>  13 files changed, 266 insertions(+), 81 deletions(-)
>>>
>>> --
>>> 2.17.1
>>>
>> _______________________________________________
>> Intel-wired-lan mailing list
>> Intel-wired-lan@osuosl.org
>> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 274+ messages in thread

* RE: [PATCH bpf-next v3 00/11] XDP unaligned chunk placement support
  2019-07-25 15:39       ` [Intel-wired-lan] " Jonathan Lemon
@ 2019-07-25 15:56         ` Richardson, Bruce
  -1 siblings, 0 replies; 274+ messages in thread
From: Richardson, Bruce @ 2019-07-25 15:56 UTC (permalink / raw)
  To: Jonathan Lemon, Laatz, Kevin
  Cc: netdev, ast, daniel, Topel, Bjorn, Karlsson, Magnus,
	jakub.kicinski, saeedm, maximmi, stephen, Loftus, Ciara, bpf,
	intel-wired-lan



> -----Original Message-----
> From: Jonathan Lemon [mailto:jonathan.lemon@gmail.com]
> Sent: Thursday, July 25, 2019 4:39 PM
> To: Laatz, Kevin <kevin.laatz@intel.com>
> Cc: netdev@vger.kernel.org; ast@kernel.org; daniel@iogearbox.net; Topel,
> Bjorn <bjorn.topel@intel.com>; Karlsson, Magnus
> <magnus.karlsson@intel.com>; jakub.kicinski@netronome.com;
> saeedm@mellanox.com; maximmi@mellanox.com; stephen@networkplumber.org;
> Richardson, Bruce <bruce.richardson@intel.com>; Loftus, Ciara
> <ciara.loftus@intel.com>; bpf@vger.kernel.org; intel-wired-
> lan@lists.osuosl.org
> Subject: Re: [PATCH bpf-next v3 00/11] XDP unaligned chunk placement
> support
> 
> 
> 
> On 23 Jul 2019, at 22:10, Kevin Laatz wrote:
> 
> > This patch set adds the ability to use unaligned chunks in the XDP umem.
> >
> > Currently, all chunk addresses passed to the umem are masked to be
> > chunk size aligned (max is PAGE_SIZE). This limits where we can place
> > chunks within the umem as well as limiting the packet sizes that are
> > supported.
> >
> > The changes in this patch set remove these restrictions, allowing XDP
> > to be more flexible in where it can place a chunk within a umem. By
> > relaxing where the chunks can be placed, it allows us to use an
> > arbitrary buffer size and place that wherever we have a free address
> > in the umem. These changes add the ability to support arbitrary frame
> > sizes up to 4k
> > (PAGE_SIZE) and make it easy to integrate with other existing
> > frameworks that have their own memory management systems, such as DPDK.
> > In DPDK, for example, there is already support for AF_XDP with
> > zero-copy.
> > However, with this patch set the integration will be much more seamless.
> > You can find the DPDK AF_XDP driver at:
> > https://git.dpdk.org/dpdk/tree/drivers/net/af_xdp
> >
> > Since we are now dealing with arbitrary frame sizes, we also need
> > to update how we pass around addresses. Currently, the addresses can
> > simply be masked to 2k to get back to the original address. This
> > becomes less trivial when using frame sizes that are not a 'power of
> > 2' size. This patch set modifies the Rx/Tx descriptor format to use
> > the upper 16-bits of the addr field for an offset value, leaving the
> > lower 48-bits for the address (this leaves us with 256 Terabytes,
> > which should be enough!). We only need to use the upper 16-bits to
> > store the offset when running in unaligned mode.
> > Rather than adding the offset (headroom etc) to the address, we will
> > store it in the upper 16-bits of the address field. This way, we can
> > easily add the offset to the address where we need it, using some bit
> > manipulation and addition, and we can also easily get the original
> > address wherever we need it (for example in i40e_zca_free) by
> > simply masking to get the lower 48-bits of the address field.
> 
> I wonder if it would be better to break backwards compatibility here and
> say that a handle is going to change from [addr] to [base | offset], or
> even [index | offset], where address = (index * chunk size) + offset, and
> then use accessor macros to manipulate the queue entries.
> 
> This way, the XDP hotpath can adjust the handle with simple arithmetic,
> bypassing the "if (unaligned)", check, as it changes the offset directly.
> 
> Using a chunk index instead of a base address is safer, otherwise it is
> too easy to corrupt things.
> --

The trouble with using a chunk index is that it assumes that all chunks are
contiguous, which is not always going to be the case. For example, for
userspace apps the easiest way to get memory that is IOVA/physically 
contiguous is to use hugepages, but even then we still need to skip space
when crossing a 2MB barrier.

Specifically in this example case, with a 3k buffer size and 2MB hugepages,
we'd get 666 buffers on a single page, but then waste a few KB before
starting the 667th buffer at byte 0 on the second 2MB page.
This is the setup used in DPDK, for instance, when we allocate memory for
use in buffer pools. 

Therefore, I think it's better to just have the kernel sanity checking the
request for safety and leave userspace the freedom to decide where in its
memory area it wants to place the buffers.
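
(For illustration, this placement scheme boils down to something like the
following userspace sketch; the function and variable names are made up for
the example, not taken from DPDK or the patch set, and <stdint.h> is assumed.)

#define HUGEPAGE_SIZE	(2ULL * 1024 * 1024)

/* Place nbufs buffers of bufsz bytes each in a hugepage-backed region,
 * never letting a buffer straddle a 2MB hugepage boundary.
 */
static void layout_buffers(uint64_t *buf_addr, unsigned int nbufs,
			   uint64_t bufsz)
{
	uint64_t addr = 0;
	unsigned int i;

	for (i = 0; i < nbufs; i++) {
		/* skip the unusable tail of the current hugepage */
		if (HUGEPAGE_SIZE - (addr % HUGEPAGE_SIZE) < bufsz)
			addr = (addr / HUGEPAGE_SIZE + 1) * HUGEPAGE_SIZE;
		buf_addr[i] = addr;
		addr += bufsz;
	}
}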

Regards,
/Bruce

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v3 03/11] xsk: add support to allow unaligned chunk placement
  2019-07-25  9:27         ` [Intel-wired-lan] " Maxim Mikityanskiy
@ 2019-07-25 17:00           ` Laatz, Kevin
  -1 siblings, 0 replies; 274+ messages in thread
From: Laatz, Kevin @ 2019-07-25 17:00 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: netdev, ast, daniel, Topel, Bjorn, Karlsson, Magnus,
	jakub.kicinski, jonathan.lemon, Saeed Mahameed, stephen,
	Richardson, Bruce, Loftus, Ciara, bpf, intel-wired-lan


On 25/07/2019 10:27, Maxim Mikityanskiy wrote:
> On 2019-07-24 08:10, Kevin Laatz wrote:
>> Currently, addresses are chunk size aligned. This means, we are very
>> restricted in terms of where we can place chunks within the umem. For
>> example, if we have a chunk size of 2k, then our chunks can only be placed
>> at 0,2k,4k,6k,8k... and so on (ie. every 2k starting from 0).
>>
>> This patch introduces the ability to use unaligned chunks. With these
>> changes, we are no longer bound to having to place chunks at a 2k (or
>> whatever your chunk size is) interval. Since we are no longer dealing with
>> aligned chunks, they can now cross page boundaries. Checks for page
>> contiguity have been added in order to keep track of which pages are
>> followed by a physically contiguous page.
>>
>> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
>> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
>> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
>>
>> ---
>> v2:
>>     - Add checks for the flags coming from userspace
>>     - Fix how we get chunk_size in xsk_diag.c
>>     - Add defines for masking the new descriptor format
>>     - Modified the rx functions to use new descriptor format
>>     - Modified the tx functions to use new descriptor format
>>
>> v3:
>>     - Add helper function to do address/offset masking/addition
>> ---
>>    include/net/xdp_sock.h      | 17 ++++++++
>>    include/uapi/linux/if_xdp.h |  9 ++++
>>    net/xdp/xdp_umem.c          | 18 +++++---
>>    net/xdp/xsk.c               | 86 ++++++++++++++++++++++++++++++-------
>>    net/xdp/xsk_diag.c          |  2 +-
>>    net/xdp/xsk_queue.h         | 68 +++++++++++++++++++++++++----
>>    6 files changed, 170 insertions(+), 30 deletions(-)
>>
> <...>
>
>> +/* If a buffer crosses a page boundary, we need to do 2 memcpy's, one for
>> + * each page. This is only required in copy mode.
>> + */
>> +static void __xsk_rcv_memcpy(struct xdp_umem *umem, u64 addr, void *from_buf,
>> +			     u32 len, u32 metalen)
>> +{
>> +	void *to_buf = xdp_umem_get_data(umem, addr);
>> +
>> +	if (xskq_crosses_non_contig_pg(umem, addr, len + metalen)) {
>> +		void *next_pg_addr = umem->pages[(addr >> PAGE_SHIFT) + 1].addr;
>> +		u64 page_start = addr & (PAGE_SIZE - 1);
>> +		u64 first_len = PAGE_SIZE - (addr - page_start);
> Let addr = 0x12345, PAGE_SIZE = 0x1000, len = 0x1000. Your calculations
> lead to page_start = 0x345, first_len = 0x1000 - 0x12000, which is
> negative. I think page_start is calculated incorrectly (is ~ missing?).

Correct, the ~ is missing in the page_start calculation. Nice spot, thanks!
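
(For reference, a minimal sketch of the corrected two lines, using the same
variable names as the quoted v3 code; the only change is the added ~.)

		/* start of the 4K page that addr falls in */
		u64 page_start = addr & ~(PAGE_SIZE - 1);
		/* bytes left on that page before the boundary */
		u64 first_len = PAGE_SIZE - (addr - page_start);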

>
>> +
>> +		memcpy(to_buf, from_buf, first_len + metalen);
>> +		memcpy(next_pg_addr, from_buf + first_len, len - first_len);
>> +
>> +		return;
>> +	}
>> +
>> +	memcpy(to_buf, from_buf, len + metalen);
>> +}
>> +
> <...>
>
>> +static inline bool xskq_is_valid_addr_unaligned(struct xsk_queue *q, u64 addr,
>> +						u64 length,
>> +						struct xdp_umem *umem)
>> +{
>> +	addr += addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
>> +	addr &= XSK_UNALIGNED_BUF_ADDR_MASK;
>> +	if (addr >= q->size ||
> Addresses like 0x00aaffffffffffff will pass the validation (0xaa +
> 0xffffffffffff will overflow mod 2^48 and become a small number),
> whereas such addresses don't look valid for me.

If you are referring to the addr >= q->size check... that check was 
already in xskq_is_valid_addr (which I based this function on). If this 
doesn't make sense, then it should be removed/fixed for both.
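
(One possible way to close the wrap-around hole would be to add the offset in
full 64-bit arithmetic before checking against the umem size; this is only an
illustrative sketch, not code from the patch set.)

	u64 base = addr & XSK_UNALIGNED_BUF_ADDR_MASK;
	u64 offset = addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;

	/* base < 2^48 and offset < 2^16, so the sum cannot overflow a u64;
	 * anything that would have wrapped mod 2^48 now fails the size check
	 */
	if (base + offset >= q->size ||
	    xskq_crosses_non_contig_pg(umem, base + offset, length))
		return false;
	return true;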

>
>> +	    xskq_crosses_non_contig_pg(umem, addr, length)) {
> If the region is not contiguous, we can't RX into it - that's clear.
> However, how can the userspace determine whether these two pages of UMEM
> are mapped contiguously in the DMA space? Are we going to silently drop
> descriptors to non-contiguous frames and leak them? Please explain how
> to use it correctly from the application side.

Correct, if it is not contiguous then we cannot Rx into it.

Userspace apps should be aware of the page contiguity when using 
zero-copy and if not, then should not be using the unaligned mode. 
Existing frameworks that have their own memory management subsystem, 
such as DPDK, are aware of page contiguity and manage this themselves 
while simpler apps that don't have this kind of visibility can use 
hugepages, as is shown in the xdpsock sample changes in this set.
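
(As a rough illustration of the hugepage route, the umem area can be backed by
2MB hugepages along these lines; the buffer-count and frame-size names are
placeholders and error handling is minimal.)

	void *bufs = mmap(NULL, num_frames * frame_size,
			  PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (bufs == MAP_FAILED) {
		fprintf(stderr, "mmap of hugepage-backed umem failed\n");
		exit(EXIT_FAILURE);
	}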




^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v3 08/11] samples/bpf: add unaligned chunks mode support to xdpsock
  2019-07-25  9:43         ` [Intel-wired-lan] " Maxim Mikityanskiy
@ 2019-07-25 17:00           ` Laatz, Kevin
  -1 siblings, 0 replies; 274+ messages in thread
From: Laatz, Kevin @ 2019-07-25 17:00 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: netdev, ast, daniel, Topel, Bjorn, Karlsson, Magnus,
	jakub.kicinski, jonathan.lemon, Saeed Mahameed, stephen,
	Richardson, Bruce, Loftus, Ciara, bpf, intel-wired-lan


On 25/07/2019 10:43, Maxim Mikityanskiy wrote:
> On 2019-07-24 08:10, Kevin Laatz wrote:
>> This patch adds support for the unaligned chunks mode. The addition of the
>> unaligned chunks option will allow users to run the application with more
>> relaxed chunk placement in the XDP umem.
>>
>> Unaligned chunks mode can be used with the '-u' or '--unaligned' command
>> line options.
>>
>> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
>> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
>> ---
>>    samples/bpf/xdpsock_user.c | 17 +++++++++++++++--
>>    1 file changed, 15 insertions(+), 2 deletions(-)
> <...>
>
>> @@ -372,6 +378,7 @@ static void usage(const char *prog)
>>    		"  -z, --zero-copy      Force zero-copy mode.\n"
>>    		"  -c, --copy           Force copy mode.\n"
>>    		"  -f, --frame-size=n   Set the frame size (must be a power of two, default is %d).\n"
> Help text for -f has to be updated, it doesn't have to be a power of two
> if -u is specified.

Will fix in the v4, thanks!
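
(For context, the '-u' option ultimately just selects the new umem flag when
the umem is created; a hedged sketch of how an application might build the
config, where opt_unaligned_chunks and opt_xsk_frame_size are illustrative
option variables and .flags is the field added by the libbpf patch in this
set.)

	struct xsk_umem_config cfg = {
		.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
		.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
		.frame_size = opt_xsk_frame_size,
		.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM,
		.flags = opt_unaligned_chunks ? XDP_UMEM_UNALIGNED_CHUNKS : 0,
	};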




^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v3 06/11] mlx5e: modify driver for handling offsets
  2019-07-25 10:15         ` [Intel-wired-lan] " Maxim Mikityanskiy
@ 2019-07-25 17:00           ` Laatz, Kevin
  -1 siblings, 0 replies; 274+ messages in thread
From: Laatz, Kevin @ 2019-07-25 17:00 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: netdev, ast, daniel, Topel, Bjorn, Karlsson, Magnus,
	jakub.kicinski, jonathan.lemon, Saeed Mahameed, stephen,
	Richardson, Bruce, Loftus, Ciara, bpf, intel-wired-lan

On 25/07/2019 11:15, Maxim Mikityanskiy wrote:
> On 2019-07-24 08:10, Kevin Laatz wrote:
>> With the addition of the unaligned chunks option, we need to make sure we
>> handle the offsets accordingly based on the mode we are currently running
>> in. This patch modifies the driver to appropriately mask the address for
>> each case.
>>
>> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
>>
>> ---
>> v3:
>>     - Use new helper function to handle offset
>> ---
>>    drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c    | 8 ++++++--
>>    drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c | 9 +++++++--
>>    2 files changed, 13 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
>> index b0b982cf69bb..d5245893d2c8 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
>> @@ -122,6 +122,7 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
>>    		      void *va, u16 *rx_headroom, u32 *len, bool xsk)
>>    {
>>    	struct bpf_prog *prog = READ_ONCE(rq->xdp_prog);
>> +	struct xdp_umem *umem = rq->umem;
>>    	struct xdp_buff xdp;
>>    	u32 act;
>>    	int err;
>> @@ -138,8 +139,11 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
>>    	xdp.rxq = &rq->xdp_rxq;
>>    
>>    	act = bpf_prog_run_xdp(prog, &xdp);
>> -	if (xsk)
>> -		xdp.handle += xdp.data - xdp.data_hard_start;
>> +	if (xsk) {
>> +		u64 off = xdp.data - xdp.data_hard_start;
>> +
>> +		xdp.handle = xsk_umem_handle_offset(umem, xdp.handle, off);
>> +	}
> What's missed is that umem_headroom is added to handle directly in
> mlx5e_xsk_page_alloc_umem. In my understanding umem_headroom should go
> to the offset part (high 16 bits) of the handle, so
> xsk_umem_handle_offset has to support increasing the offset.

Will look into it and make the changes for the v4

>>    	switch (act) {
>>    	case XDP_PASS:
>>    		*rx_headroom = xdp.data - xdp.data_hard_start;
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
>> index 35e188cf4ea4..f596e63cba00 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
>> @@ -61,6 +61,7 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
>>    	struct mlx5e_xdp_xmit_data xdptxd;
>>    	bool work_done = true;
>>    	bool flush = false;
>> +	u64 addr, offset;
>>    
>>    	xdpi.mode = MLX5E_XDP_XMIT_MODE_XSK;
>>    
>> @@ -82,8 +83,12 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
>>    			break;
>>    		}
>>    
>> -		xdptxd.dma_addr = xdp_umem_get_dma(umem, desc.addr);
>> -		xdptxd.data = xdp_umem_get_data(umem, desc.addr);
>> +		/* for unaligned chunks need to take offset from upper bits */
>> +		offset = (desc.addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
>> +		addr = (desc.addr & XSK_UNALIGNED_BUF_ADDR_MASK);
>> +
>> +		xdptxd.dma_addr = xdp_umem_get_dma(umem, addr + offset);
>> +		xdptxd.data = xdp_umem_get_data(umem, addr + offset);
> Why can't these calculations be encapsulated into
> xdp_umem_get_{dma,data}? I think they are common for all drivers, aren't
> they?
>
> Even if there is some reason not to put this bitshifting stuff into
> xdp_umem_get_* functions, I suggest to encapsulate it into a function
> anyway, because it's a good idea to keep those calculations in a single
> place.
Nice suggestion! I will move it to the xdp_umem_get_* functions in the 
v4. Thanks
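
(A minimal sketch of what folding the offset handling into those helpers could
look like; the helper name is an assumption, not the final v4 code.)

/* recover base + offset from an unaligned-mode handle */
static inline u64 xsk_umem_add_offset_to_addr(u64 addr)
{
	return (addr & XSK_UNALIGNED_BUF_ADDR_MASK) +
	       (addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
}

static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr)
{
	addr = xsk_umem_add_offset_to_addr(addr);
	return umem->pages[addr >> PAGE_SHIFT].addr +
	       (addr & (PAGE_SIZE - 1));
}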


^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v3 03/11] xsk: add support to allow unaligned chunk placement
  2019-07-25  2:22         ` [Intel-wired-lan] " Jakub Kicinski
@ 2019-07-25 17:01           ` Laatz, Kevin
  -1 siblings, 0 replies; 274+ messages in thread
From: Laatz, Kevin @ 2019-07-25 17:01 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, ast, daniel, Topel, Bjorn, Karlsson, Magnus,
	jonathan.lemon, saeedm, maximmi, stephen, Richardson, Bruce,
	Loftus, Ciara, bpf, intel-wired-lan

On 25/07/2019 03:22, Jakub Kicinski wrote:
> On Wed, 24 Jul 2019 05:10:35 +0000, Kevin Laatz wrote:
>> Currently, addresses are chunk size aligned. This means, we are very
>> restricted in terms of where we can place chunks within the umem. For
>> example, if we have a chunk size of 2k, then our chunks can only be placed
>> at 0,2k,4k,6k,8k... and so on (ie. every 2k starting from 0).
>>
>> This patch introduces the ability to use unaligned chunks. With these
>> changes, we are no longer bound to having to place chunks at a 2k (or
>> whatever your chunk size is) interval. Since we are no longer dealing with
>> aligned chunks, they can now cross page boundaries. Checks for page
>> contiguity have been added in order to keep track of which pages are
>> followed by a physically contiguous page.
>>
>> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
>> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
>> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
>>
>> ---
>> v2:
>>    - Add checks for the flags coming from userspace
>>    - Fix how we get chunk_size in xsk_diag.c
>>    - Add defines for masking the new descriptor format
>>    - Modified the rx functions to use new descriptor format
>>    - Modified the tx functions to use new descriptor format
>>
>> v3:
>>    - Add helper function to do address/offset masking/addition
>> ---
>>   include/net/xdp_sock.h      | 17 ++++++++
>>   include/uapi/linux/if_xdp.h |  9 ++++
>>   net/xdp/xdp_umem.c          | 18 +++++---
>>   net/xdp/xsk.c               | 86 ++++++++++++++++++++++++++++++-------
>>   net/xdp/xsk_diag.c          |  2 +-
>>   net/xdp/xsk_queue.h         | 68 +++++++++++++++++++++++++----
>>   6 files changed, 170 insertions(+), 30 deletions(-)
>>
>> diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
>> index 69796d264f06..738996c0f995 100644
>> --- a/include/net/xdp_sock.h
>> +++ b/include/net/xdp_sock.h
>> @@ -19,6 +19,7 @@ struct xsk_queue;
>>   struct xdp_umem_page {
>>   	void *addr;
>>   	dma_addr_t dma;
>> +	bool next_pg_contig;
> IIRC accesses to xdp_umem_page cause a lot of cache misses, so having
> this structure grow from 16 to 24B is a little unfortunate :(
> Can we try to steal lower bits of addr or dma? Or perhaps not pre
> compute this info at all?

Pre-computing is saving us from re-computing the same information 
multiple times for the same chunk in this case. Keeping that in mind, I 
would be more in favor of stealing a bit.

Will look into a nicer solution for the v4 :)
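
(One shape the bit-stealing could take, sketched for illustration only; the
names are assumptions. Since the per-page dma addresses are at least PAGE_SIZE
aligned, the low bits are free to carry the contiguity flag.)

#define XSK_NEXT_PG_CONTIG_SHIFT	0
#define XSK_NEXT_PG_CONTIG_MASK		BIT_ULL(XSK_NEXT_PG_CONTIG_SHIFT)

/* contiguity flag kept in bit 0 of dma instead of a separate bool */
static inline bool xsk_next_pg_contig(struct xdp_umem_page *pg)
{
	return pg->dma & XSK_NEXT_PG_CONTIG_MASK;
}
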

>>   };
>>   
>>   struct xdp_umem_fq_reuse {
>> @@ -48,6 +49,7 @@ struct xdp_umem {
>>   	bool zc;
>>   	spinlock_t xsk_list_lock;
>>   	struct list_head xsk_list;
>> +	u32 flags;
>>   };
>>   
>>   struct xdp_sock {
>> @@ -144,6 +146,15 @@ static inline void xsk_umem_fq_reuse(struct xdp_umem *umem, u64 addr)
>>   
>>   	rq->handles[rq->length++] = addr;
>>   }
>> +
>> +static inline u64 xsk_umem_handle_offset(struct xdp_umem *umem, u64 handle,
>> +					 u64 offset)
>> +{
>> +	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
>> +		return handle |= (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
>> +	else
>> +		return handle += offset;
>> +}
>>   #else
>>   static inline int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
>>   {
>> @@ -241,6 +252,12 @@ static inline void xsk_umem_fq_reuse(struct xdp_umem *umem, u64 addr)
>>   {
>>   }
>>   
>> +static inline u64 xsk_umem_handle_offset(struct xdp_umem *umem, u64 handle,
>> +					 u64 offset)
>> +{
>> +	return NULL;
> 	return 0?
Will change for the v4, thanks.
>
>> +}
>> +
>>   #endif /* CONFIG_XDP_SOCKETS */
>>   
>>   #endif /* _LINUX_XDP_SOCK_H */
>> diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
>> index faaa5ca2a117..f8dc68fcdf78 100644
>> --- a/include/uapi/linux/if_xdp.h
>> +++ b/include/uapi/linux/if_xdp.h
>> @@ -17,6 +17,9 @@
>>   #define XDP_COPY	(1 << 1) /* Force copy-mode */
>>   #define XDP_ZEROCOPY	(1 << 2) /* Force zero-copy mode */
>>   
>> +/* Flags for xsk_umem_config flags */
>> +#define XDP_UMEM_UNALIGNED_CHUNKS (1 << 0)
>> +
>>   struct sockaddr_xdp {
>>   	__u16 sxdp_family;
>>   	__u16 sxdp_flags;
>> @@ -53,6 +56,7 @@ struct xdp_umem_reg {
>>   	__u64 len; /* Length of packet data area */
>>   	__u32 chunk_size;
>>   	__u32 headroom;
>> +	__u32 flags;
>>   };
>>   
>>   struct xdp_statistics {
>> @@ -74,6 +78,11 @@ struct xdp_options {
>>   #define XDP_UMEM_PGOFF_FILL_RING	0x100000000ULL
>>   #define XDP_UMEM_PGOFF_COMPLETION_RING	0x180000000ULL
>>   
>> +/* Masks for unaligned chunks mode */
>> +#define XSK_UNALIGNED_BUF_OFFSET_SHIFT 48
>> +#define XSK_UNALIGNED_BUF_ADDR_MASK \
>> +	((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1)
>> +
>>   /* Rx/Tx descriptor */
>>   struct xdp_desc {
>>   	__u64 addr;
>> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
>> index 83de74ca729a..952ca22103e9 100644
>> --- a/net/xdp/xdp_umem.c
>> +++ b/net/xdp/xdp_umem.c
>> @@ -299,6 +299,7 @@ static int xdp_umem_account_pages(struct xdp_umem *umem)
>>   
>>   static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>>   {
>> +	bool unaligned_chunks = mr->flags & XDP_UMEM_UNALIGNED_CHUNKS;
>>   	u32 chunk_size = mr->chunk_size, headroom = mr->headroom;
>>   	unsigned int chunks, chunks_per_page;
>>   	u64 addr = mr->addr, size = mr->len;
>> @@ -314,7 +315,10 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>>   		return -EINVAL;
>>   	}
>>   
>> -	if (!is_power_of_2(chunk_size))
>> +	if (mr->flags & ~(XDP_UMEM_UNALIGNED_CHUNKS))
> parens unnecessary, consider adding a define for known flags.
Will fix in the v4.
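
(For example, the check might end up looking something like this; the define
name is an assumption.)

/* all xdp_umem_reg flags the kernel currently understands */
#define XDP_UMEM_VALID_FLAGS	XDP_UMEM_UNALIGNED_CHUNKS

	if (mr->flags & ~XDP_UMEM_VALID_FLAGS)
		return -EINVAL;
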
>
>> +		return -EINVAL;
>> +
>> +	if (!unaligned_chunks && !is_power_of_2(chunk_size))
>>   		return -EINVAL;
>>   
>>   	if (!PAGE_ALIGNED(addr)) {
>> @@ -331,9 +335,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>>   	if (chunks == 0)
>>   		return -EINVAL;
>>   
>> -	chunks_per_page = PAGE_SIZE / chunk_size;
>> -	if (chunks < chunks_per_page || chunks % chunks_per_page)
>> -		return -EINVAL;
>> +	if (!unaligned_chunks) {
>> +		chunks_per_page = PAGE_SIZE / chunk_size;
>> +		if (chunks < chunks_per_page || chunks % chunks_per_page)
>> +			return -EINVAL;
>> +	}
>>   
>>   	headroom = ALIGN(headroom, 64);
>>   
>> @@ -342,13 +348,15 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>>   		return -EINVAL;
>>   
>>   	umem->address = (unsigned long)addr;
>> -	umem->chunk_mask = ~((u64)chunk_size - 1);
>> +	umem->chunk_mask = unaligned_chunks ? XSK_UNALIGNED_BUF_ADDR_MASK
>> +					    : ~((u64)chunk_size - 1);
>>   	umem->size = size;
>>   	umem->headroom = headroom;
>>   	umem->chunk_size_nohr = chunk_size - headroom;
>>   	umem->npgs = size / PAGE_SIZE;
>>   	umem->pgs = NULL;
>>   	umem->user = NULL;
>> +	umem->flags = mr->flags;
>>   	INIT_LIST_HEAD(&umem->xsk_list);
>>   	spin_lock_init(&umem->xsk_list_lock);
>>   
>> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
>> index 59b57d708697..b3ab653091c4 100644
>> --- a/net/xdp/xsk.c
>> +++ b/net/xdp/xsk.c
>> @@ -45,7 +45,7 @@ EXPORT_SYMBOL(xsk_umem_has_addrs);
>>   
>>   u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
>>   {
>> -	return xskq_peek_addr(umem->fq, addr);
>> +	return xskq_peek_addr(umem->fq, addr, umem);
>>   }
>>   EXPORT_SYMBOL(xsk_umem_peek_addr);
>>   
>> @@ -55,21 +55,42 @@ void xsk_umem_discard_addr(struct xdp_umem *umem)
>>   }
>>   EXPORT_SYMBOL(xsk_umem_discard_addr);
>>   
>> +/* If a buffer crosses a page boundary, we need to do 2 memcpy's, one for
>> + * each page. This is only required in copy mode.
>> + */
>> +static void __xsk_rcv_memcpy(struct xdp_umem *umem, u64 addr, void *from_buf,
>> +			     u32 len, u32 metalen)
>> +{
>> +	void *to_buf = xdp_umem_get_data(umem, addr);
>> +
>> +	if (xskq_crosses_non_contig_pg(umem, addr, len + metalen)) {
>> +		void *next_pg_addr = umem->pages[(addr >> PAGE_SHIFT) + 1].addr;
>> +		u64 page_start = addr & (PAGE_SIZE - 1);
>> +		u64 first_len = PAGE_SIZE - (addr - page_start);
>> +
>> +		memcpy(to_buf, from_buf, first_len + metalen);
>> +		memcpy(next_pg_addr, from_buf + first_len, len - first_len);
>> +
>> +		return;
>> +	}
>> +
>> +	memcpy(to_buf, from_buf, len + metalen);
>> +}
> Why handle this case gracefully? Real XSK use is the zero copy mode,
> having extra code to make copy mode more permissive seems a little
> counter productive IMHO.
Agree that zero copy mode is the main use and that this is somewhat 
unnecessary. However, since we are now allowing for unaligned chunks 
which can cross page boundaries, there's no harm in adding this extra 
check and handling it gracefully.
>
>>   static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
>>   {
>> -	void *to_buf, *from_buf;
>> +	u64 offset = xs->umem->headroom;
>> +	void *from_buf;
>>   	u32 metalen;
>>   	u64 addr;
>>   	int err;
>>   
>> -	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
>> +	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
>>   	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
>>   		xs->rx_dropped++;
>>   		return -ENOSPC;
>>   	}
>>   
>> -	addr += xs->umem->headroom;
>> -
>>   	if (unlikely(xdp_data_meta_unsupported(xdp))) {
>>   		from_buf = xdp->data;
>>   		metalen = 0;
>> @@ -78,9 +99,13 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
>>   		metalen = xdp->data - xdp->data_meta;
>>   	}
>>   
>> -	to_buf = xdp_umem_get_data(xs->umem, addr);
>> -	memcpy(to_buf, from_buf, len + metalen);
>> -	addr += metalen;
>> +	__xsk_rcv_memcpy(xs->umem, addr + offset, from_buf, len, metalen);
>> +
>> +	offset += metalen;
>> +	if (xs->umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
>> +		addr |= offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT;
>> +	else
>> +		addr += offset;
>>   	err = xskq_produce_batch_desc(xs->rx, addr, len);
>>   	if (!err) {
>>   		xskq_discard_addr(xs->umem->fq);
>> @@ -127,6 +152,7 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
>>   	u32 len = xdp->data_end - xdp->data;
>>   	void *buffer;
>>   	u64 addr;
>> +	u64 offset = xs->umem->headroom;
> reverse xmas tree, please

Will fix in the v4.

Thanks for reviewing!


^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v3 00/11] XDP unaligned chunk placement support
  2019-07-25 15:56         ` [Intel-wired-lan] " Richardson, Bruce
@ 2019-07-25 17:30           ` Jonathan Lemon
  -1 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-07-25 17:30 UTC (permalink / raw)
  To: Richardson, Bruce
  Cc: Laatz, Kevin, netdev, ast, daniel, Topel, Bjorn, Karlsson,
	Magnus, jakub.kicinski, saeedm, maximmi, stephen, Loftus, Ciara,
	bpf, intel-wired-lan



On 25 Jul 2019, at 8:56, Richardson, Bruce wrote:

>> -----Original Message-----
>> From: Jonathan Lemon [mailto:jonathan.lemon@gmail.com]
>> Sent: Thursday, July 25, 2019 4:39 PM
>> To: Laatz, Kevin <kevin.laatz@intel.com>
>> Cc: netdev@vger.kernel.org; ast@kernel.org; daniel@iogearbox.net; Topel,
>> Bjorn <bjorn.topel@intel.com>; Karlsson, Magnus
>> <magnus.karlsson@intel.com>; jakub.kicinski@netronome.com;
>> saeedm@mellanox.com; maximmi@mellanox.com; stephen@networkplumber.org;
>> Richardson, Bruce <bruce.richardson@intel.com>; Loftus, Ciara
>> <ciara.loftus@intel.com>; bpf@vger.kernel.org; intel-wired-
>> lan@lists.osuosl.org
>> Subject: Re: [PATCH bpf-next v3 00/11] XDP unaligned chunk placement
>> support
>>
>>
>>
>> On 23 Jul 2019, at 22:10, Kevin Laatz wrote:
>>
>>> This patch set adds the ability to use unaligned chunks in the XDP umem.
>>>
>>> Currently, all chunk addresses passed to the umem are masked to be
>>> chunk size aligned (max is PAGE_SIZE). This limits where we can place
>>> chunks within the umem as well as limiting the packet sizes that are
>>> supported.
>>>
>>> The changes in this patch set remove these restrictions, allowing XDP
>>> to be more flexible in where it can place a chunk within a umem. By
>>> relaxing where the chunks can be placed, it allows us to use an
>>> arbitrary buffer size and place that wherever we have a free address
>>> in the umem. These changes add the ability to support arbitrary frame
>>> sizes up to 4k
>>> (PAGE_SIZE) and make it easy to integrate with other existing
>>> frameworks that have their own memory management systems, such as DPDK.
>>> In DPDK, for example, there is already support for AF_XDP with
>>> zero-copy.
>>> However, with this patch set the integration will be much more seamless.
>>> You can find the DPDK AF_XDP driver at:
>>> https://git.dpdk.org/dpdk/tree/drivers/net/af_xdp
>>>
>>> Since we are now dealing with arbitrary frame sizes, we also need
>>> to update how we pass around addresses. Currently, the addresses can
>>> simply be masked to 2k to get back to the original address. This
>>> becomes less trivial when using frame sizes that are not a 'power of
>>> 2' size. This patch set modifies the Rx/Tx descriptor format to use
>>> the upper 16-bits of the addr field for an offset value, leaving the
>>> lower 48-bits for the address (this leaves us with 256 Terabytes,
>>> which should be enough!). We only need to use the upper 16-bits to
>>> store the offset when running in unaligned mode.
>>> Rather than adding the offset (headroom etc) to the address, we will
>>> store it in the upper 16-bits of the address field. This way, we can
>>> easily add the offset to the address where we need it, using some bit
>>> manipulation and addition, and we can also easily get the original
>>> address wherever we need it (for example in i40e_zca_free) by
>>> simply masking to get the lower 48-bits of the address field.
>>
>> I wonder if it would be better to break backwards compatibility here and
>> say that a handle is going to change from [addr] to [base | offset], or
>> even [index | offset], where address = (index * chunk size) + offset, and
>> then use accessor macros to manipulate the queue entries.
>>
>> This way, the XDP hotpath can adjust the handle with simple arithmetic,
>> bypassing the "if (unaligned)", check, as it changes the offset directly.
>>
>> Using a chunk index instead of a base address is safer, otherwise it is
>> too easy to corrupt things.
>> --
>
> The trouble with using a chunk index is that it assumes that all chunks are
> contiguous, which is not always going to be the case. For example, for
> userspace apps the easiest way to get memory that is IOVA/physically
> contiguous is to use hugepages, but even then we still need to skip space
> when crossing a 2MB barrier.
>
> Specifically in this example case, with a 3k buffer size and 2MB hugepages,
> we'd get 666 buffers on a single page, but then waste a few KB before
> starting the 667th buffer at byte 0 on the second 2MB page.
> This is the setup used in DPDK, for instance, when we allocate memory for
> use in buffer pools.
>
> Therefore, I think it's better to just have the kernel sanity checking the
> request for safety and leave userspace the freedom to decide where in its
> memory area it wants to place the buffers.

That makes sense, thanks.  My concern is that this makes it difficult for
the kernel to validate the buffer start address, but perhaps that isn't a
concern.

I still believe it would simplify things if the format was [base | offset];
any opinion on that?
-- 
Jonathan

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v3 00/11] XDP unaligned chunk placement support
@ 2019-07-25 17:30           ` Jonathan Lemon
  0 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-07-25 17:30 UTC (permalink / raw)
  To: intel-wired-lan



On 25 Jul 2019, at 8:56, Richardson, Bruce wrote:

>> -----Original Message-----
>> From: Jonathan Lemon [mailto:jonathan.lemon at gmail.com]
>> Sent: Thursday, July 25, 2019 4:39 PM
>> To: Laatz, Kevin <kevin.laatz@intel.com>
>> Cc: netdev at vger.kernel.org; ast at kernel.org; daniel at iogearbox.net; Topel,
>> Bjorn <bjorn.topel@intel.com>; Karlsson, Magnus
>> <magnus.karlsson@intel.com>; jakub.kicinski at netronome.com;
>> saeedm at mellanox.com; maximmi at mellanox.com; stephen at networkplumber.org;
>> Richardson, Bruce <bruce.richardson@intel.com>; Loftus, Ciara
>> <ciara.loftus@intel.com>; bpf at vger.kernel.org; intel-wired-
>> lan at lists.osuosl.org
>> Subject: Re: [PATCH bpf-next v3 00/11] XDP unaligned chunk placement
>> support
>>
>>
>>
>> On 23 Jul 2019, at 22:10, Kevin Laatz wrote:
>>
>>> This patch set adds the ability to use unaligned chunks in the XDP umem.
>>>
>>> Currently, all chunk addresses passed to the umem are masked to be
>>> chunk size aligned (max is PAGE_SIZE). This limits where we can place
>>> chunks within the umem as well as limiting the packet sizes that are
>> supported.
>>>
>>> The changes in this patch set remove these restrictions, allowing XDP
>>> to be more flexible in where it can place a chunk within a umem. By
>>> relaxing where the chunks can be placed, it allows us to use an
>>> arbitrary buffer size and place that wherever we have a free address
>>> in the umem. These changes add the ability to support arbitrary frame
>>> sizes up to 4k
>>> (PAGE_SIZE) and make it easy to integrate with other existing
>>> frameworks that have their own memory management systems, such as DPDK.
>>> In DPDK, for example, there is already support for AF_XDP with zero-
>> copy.
>>> However, with this patch set the integration will be much more seamless.
>>> You can find the DPDK AF_XDP driver at:
>>> https://git.dpdk.org/dpdk/tree/drivers/net/af_xdp
>>>
>>> Since we are now dealing with arbitrary frame sizes, we also need
>>> to update how we pass around addresses. Currently, the addresses can
>>> simply be masked to 2k to get back to the original address. This
>>> becomes less trivial when using frame sizes that are not a 'power of
>>> 2' size. This patch set modifies the Rx/Tx descriptor format to use
>>> the upper 16-bits of the addr field for an offset value, leaving the
>>> lower 48-bits for the address (this leaves us with 256 Terabytes,
>>> which should be enough!). We only need to use the upper 16-bits to store
>> the offset when running in unaligned mode.
>>> Rather than adding the offset (headroom etc) to the address, we will
>>> store it in the upper 16-bits of the address field. This way, we can
>>> easily add the offset to the address where we need it, using some bit
>>> manipulation and addition, and we can also easily get the original
>>> address wherever we need it (for example in i40e_zca_free) by
>>> simply masking to get the lower 48-bits of the address field.
>>
>> I wonder if it would be better to break backwards compatibility here and
>> say that a handle is going to change from [addr] to [base | offset], or
>> even [index | offset], where address = (index * chunk size) + offset, and
>> then use accessor macros to manipulate the queue entries.
>>
>> This way, the XDP hotpath can adjust the handle with simple arithmetic,
>> bypassing the "if (unaligned)", check, as it changes the offset directly.
>>
>> Using a chunk index instead of a base address is safer, otherwise it is
>> too easy to corrupt things.
>> --
>
> The trouble with using a chunk index is that it assumes that all chunks are
> contiguous, which is not always going to be the case. For example, for
> userspace apps the easiest way to get memory that is IOVA/physically
> contiguous is to use hugepages, but even then we still need to skip space
> when crossing a 2MB barrier.
>
> Specifically in this example case, with a 3k buffer size and 2MB hugepages,
> we'd get 666 buffers on a single page, but then waste a few KB before
> starting the 667th buffer at byte 0 on the second 2MB page.
> This is the setup used in DPDK, for instance, when we allocate memory for
> use in buffer pools.
>
> Therefore, I think it's better to just have the kernel sanity checking the
> request for safety and leave userspace the freedom to decide where in its
> memory area it wants to place the buffers.

That makes sense, thanks.  My concern is that this makes it difficult for
the kernel to validate the buffer start address, but perhaps that isn't a
concern.

I still believe it would simplify things if the format was [base | offset],
any opinion on that?
-- 
Jonathan

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v3 00/11] XDP unaligned chunk placement support
  2019-07-25 17:30           ` [Intel-wired-lan] " Jonathan Lemon
@ 2019-07-26  8:41             ` Bruce Richardson
  -1 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2019-07-26  8:41 UTC (permalink / raw)
  To: Jonathan Lemon
  Cc: Laatz, Kevin, netdev, ast, daniel, Topel, Bjorn, Karlsson,
	Magnus, jakub.kicinski, saeedm, maximmi, stephen, Loftus, Ciara,
	bpf, intel-wired-lan

On Thu, Jul 25, 2019 at 10:30:51AM -0700, Jonathan Lemon wrote:
> 
> 
> On 25 Jul 2019, at 8:56, Richardson, Bruce wrote:
> 
> >> -----Original Message-----
> >> From: Jonathan Lemon [mailto:jonathan.lemon@gmail.com]
> >> Sent: Thursday, July 25, 2019 4:39 PM
> >> To: Laatz, Kevin <kevin.laatz@intel.com>
> >> Cc: netdev@vger.kernel.org; ast@kernel.org; daniel@iogearbox.net; Topel,
> >> Bjorn <bjorn.topel@intel.com>; Karlsson, Magnus
> >> <magnus.karlsson@intel.com>; jakub.kicinski@netronome.com;
> >> saeedm@mellanox.com; maximmi@mellanox.com; stephen@networkplumber.org;
> >> Richardson, Bruce <bruce.richardson@intel.com>; Loftus, Ciara
> >> <ciara.loftus@intel.com>; bpf@vger.kernel.org; intel-wired-
> >> lan@lists.osuosl.org
> >> Subject: Re: [PATCH bpf-next v3 00/11] XDP unaligned chunk placement
> >> support
> >>
> >>
> >>
> >> On 23 Jul 2019, at 22:10, Kevin Laatz wrote:
> >>
> >>> This patch set adds the ability to use unaligned chunks in the XDP umem.
> >>>
> >>> Currently, all chunk addresses passed to the umem are masked to be
> >>> chunk size aligned (max is PAGE_SIZE). This limits where we can place
> >>> chunks within the umem as well as limiting the packet sizes that are
> >> supported.
> >>>
> >>> The changes in this patch set remove these restrictions, allowing XDP
> >>> to be more flexible in where it can place a chunk within a umem. By
> >>> relaxing where the chunks can be placed, it allows us to use an
> >>> arbitrary buffer size and place that wherever we have a free address
> >>> in the umem. These changes add the ability to support arbitrary frame
> >>> sizes up to 4k
> >>> (PAGE_SIZE) and make it easy to integrate with other existing
> >>> frameworks that have their own memory management systems, such as DPDK.
> >>> In DPDK, for example, there is already support for AF_XDP with zero-
> >> copy.
> >>> However, with this patch set the integration will be much more seamless.
> >>> You can find the DPDK AF_XDP driver at:
> >>> https://git.dpdk.org/dpdk/tree/drivers/net/af_xdp
> >>>
> >>> Since we are now dealing with arbitrary frame sizes, we also need
> >>> to update how we pass around addresses. Currently, the addresses can
> >>> simply be masked to 2k to get back to the original address. This
> >>> becomes less trivial when using frame sizes that are not a 'power of
> >>> 2' size. This patch set modifies the Rx/Tx descriptor format to use
> >>> the upper 16-bits of the addr field for an offset value, leaving the
> >>> lower 48-bits for the address (this leaves us with 256 Terabytes,
> >>> which should be enough!). We only need to use the upper 16-bits to store
> >> the offset when running in unaligned mode.
> >>> Rather than adding the offset (headroom etc) to the address, we will
> >>> store it in the upper 16-bits of the address field. This way, we can
> >>> easily add the offset to the address where we need it, using some bit
> >>> manipulation and addition, and we can also easily get the original
> >>> address wherever we need it (for example in i40e_zca_free) by
> >>> simply masking to get the lower 48-bits of the address field.
> >>
> >> I wonder if it would be better to break backwards compatibility here and
> >> say that a handle is going to change from [addr] to [base | offset], or
> >> even [index | offset], where address = (index * chunk size) + offset, and
> >> then use accessor macros to manipulate the queue entries.
> >>
> >> This way, the XDP hotpath can adjust the handle with simple arithmetic,
> >> bypassing the "if (unaligned)", check, as it changes the offset directly.
> >>
> >> Using a chunk index instead of a base address is safer, otherwise it is
> >> too easy to corrupt things.
> >> --
> >
> > The trouble with using a chunk index is that it assumes that all chunks are
> > contiguous, which is not always going to be the case. For example, for
> > userspace apps the easiest way to get memory that is IOVA/physically
> > contiguous is to use hugepages, but even then we still need to skip space
> > when crossing a 2MB barrier.
> >
> > Specifically in this example case, with a 3k buffer size and 2MB hugepages,
> > we'd get 666 buffers on a single page, but then waste a few KB before
> > starting the 667th buffer at byte 0 on the second 2MB page.
> > This is the setup used in DPDK, for instance, when we allocate memory for
> > use in buffer pools.
> >
> > Therefore, I think it's better to just have the kernel sanity checking the
> > request for safety and leave userspace the freedom to decide where in its
> > memory area it wants to place the buffers.
> 
> That makes sense, thanks.  My concern is that this makes it difficult for
> the kernel to validate the buffer start address, but perhaps that isn't a
> concern.
> 
I don't think that's a problem right now, since the start address is still
an offset into the umem, and so must be between 0 and size (or size -
chunk_size, more specifically). Are there other checks that should be
performed on it, do you think?
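
For illustration, a minimal sketch of the kind of bound check being discussed
(hypothetical helper, not code from the patch set): since the base address is
an offset into the umem, the main thing to verify is that a whole chunk still
fits before the end of the registered area.

#include <stdbool.h>
#include <stdint.h>

static bool chunk_base_valid(uint64_t base, uint64_t umem_size,
			     uint64_t chunk_size)
{
	if (chunk_size > umem_size)
		return false;
	/* base is an offset into the umem, so 0 is always a valid lower bound */
	return base <= umem_size - chunk_size;
}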

> I still believe it would simplify things if the format was [base | offset],
> any opinion on that?

Can you perhaps clarify what you mean here. Are you suggesting globally
changing the address format to always use this, irrespective of chunk
alignment, or something else?

Regards,
/Bruce

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v3 00/11] XDP unaligned chunk placement support
@ 2019-07-26  8:41             ` Bruce Richardson
  0 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2019-07-26  8:41 UTC (permalink / raw)
  To: intel-wired-lan

On Thu, Jul 25, 2019 at 10:30:51AM -0700, Jonathan Lemon wrote:
> 
> 
> On 25 Jul 2019, at 8:56, Richardson, Bruce wrote:
> 
> >> -----Original Message-----
> >> From: Jonathan Lemon [mailto:jonathan.lemon at gmail.com]
> >> Sent: Thursday, July 25, 2019 4:39 PM
> >> To: Laatz, Kevin <kevin.laatz@intel.com>
> >> Cc: netdev at vger.kernel.org; ast at kernel.org; daniel at iogearbox.net; Topel,
> >> Bjorn <bjorn.topel@intel.com>; Karlsson, Magnus
> >> <magnus.karlsson@intel.com>; jakub.kicinski at netronome.com;
> >> saeedm at mellanox.com; maximmi at mellanox.com; stephen at networkplumber.org;
> >> Richardson, Bruce <bruce.richardson@intel.com>; Loftus, Ciara
> >> <ciara.loftus@intel.com>; bpf at vger.kernel.org; intel-wired-
> >> lan at lists.osuosl.org
> >> Subject: Re: [PATCH bpf-next v3 00/11] XDP unaligned chunk placement
> >> support
> >>
> >>
> >>
> >> On 23 Jul 2019, at 22:10, Kevin Laatz wrote:
> >>
> >>> This patch set adds the ability to use unaligned chunks in the XDP umem.
> >>>
> >>> Currently, all chunk addresses passed to the umem are masked to be
> >>> chunk size aligned (max is PAGE_SIZE). This limits where we can place
> >>> chunks within the umem as well as limiting the packet sizes that are
> >> supported.
> >>>
> >>> The changes in this patch set remove these restrictions, allowing XDP
> >>> to be more flexible in where it can place a chunk within a umem. By
> >>> relaxing where the chunks can be placed, it allows us to use an
> >>> arbitrary buffer size and place that wherever we have a free address
> >>> in the umem. These changes add the ability to support arbitrary frame
> >>> sizes up to 4k
> >>> (PAGE_SIZE) and make it easy to integrate with other existing
> >>> frameworks that have their own memory management systems, such as DPDK.
> >>> In DPDK, for example, there is already support for AF_XDP with zero-
> >> copy.
> >>> However, with this patch set the integration will be much more seamless.
> >>> You can find the DPDK AF_XDP driver at:
> >>> https://git.dpdk.org/dpdk/tree/drivers/net/af_xdp
> >>>
> >>> Since we are now dealing with arbitrary frame sizes, we also need
> >>> to update how we pass around addresses. Currently, the addresses can
> >>> simply be masked to 2k to get back to the original address. This
> >>> becomes less trivial when using frame sizes that are not a 'power of
> >>> 2' size. This patch set modifies the Rx/Tx descriptor format to use
> >>> the upper 16-bits of the addr field for an offset value, leaving the
> >>> lower 48-bits for the address (this leaves us with 256 Terabytes,
> >>> which should be enough!). We only need to use the upper 16-bits to store
> >> the offset when running in unaligned mode.
> >>> Rather than adding the offset (headroom etc) to the address, we will
> >>> store it in the upper 16-bits of the address field. This way, we can
> >>> easily add the offset to the address where we need it, using some bit
> >>> manipulation and addition, and we can also easily get the original
> >>> address wherever we need it (for example in i40e_zca_free) by
> >>> simply masking to get the lower 48-bits of the address field.
> >>
> >> I wonder if it would be better to break backwards compatibility here and
> >> say that a handle is going to change from [addr] to [base | offset], or
> >> even [index | offset], where address = (index * chunk size) + offset, and
> >> then use accessor macros to manipulate the queue entries.
> >>
> >> This way, the XDP hotpath can adjust the handle with simple arithmetic,
> >> bypassing the "if (unaligned)", check, as it changes the offset directly.
> >>
> >> Using a chunk index instead of a base address is safer, otherwise it is
> >> too easy to corrupt things.
> >> --
> >
> > The trouble with using a chunk index is that it assumes that all chunks are
> > contiguous, which is not always going to be the case. For example, for
> > userspace apps the easiest way to get memory that is IOVA/physically
> > contiguous is to use hugepages, but even then we still need to skip space
> > when crossing a 2MB barrier.
> >
> > Specifically in this example case, with a 3k buffer size and 2MB hugepages,
> > we'd get 666 buffers on a single page, but then waste a few KB before
> > starting the 667th buffer at byte 0 on the second 2MB page.
> > This is the setup used in DPDK, for instance, when we allocate memory for
> > use in buffer pools.
> >
> > Therefore, I think it's better to just have the kernel sanity checking the
> > request for safety and leave userspace the freedom to decide where in its
> > memory area it wants to place the buffers.
> 
> That makes sense, thanks.  My concern is that this makes it difficult for
> the kernel to validate the buffer start address, but perhaps that isn't a
> concern.
> 
I don't think that's a problem right now, since the start address is still
an offset into the umem, and so must be between 0 and size (or size -
chunk_size, more specifically). Are there other checks that should be
performed on it, do you think?

> I still believe it would simplify things if the format was [base | offset],
> any opinion on that?

Can you perhaps clarify what you mean here. Are you suggesting globally
changing the address format to always use this, irrespective of chunk
alignment, or something else?

Regards,
/Bruce

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v3 03/11] xsk: add support to allow unaligned chunk placement
  2019-07-25 15:39         ` [Intel-wired-lan] " Jonathan Lemon
  (?)
@ 2019-07-26  9:47         ` Laatz, Kevin
  2019-07-26  9:58           ` Laatz, Kevin
  -1 siblings, 1 reply; 274+ messages in thread
From: Laatz, Kevin @ 2019-07-26  9:47 UTC (permalink / raw)
  To: intel-wired-lan

On 25/07/2019 16:39, Jonathan Lemon wrote:
> On 23 Jul 2019, at 22:10, Kevin Laatz wrote:
>
>> Currently, addresses are chunk size aligned. This means we are very
>> restricted in terms of where we can place chunks within the umem. For
>> example, if we have a chunk size of 2k, then our chunks can only be 
>> placed
>> at 0,2k,4k,6k,8k... and so on (ie. every 2k starting from 0).
>>
>> This patch introduces the ability to use unaligned chunks. With these
>> changes, we are no longer bound to having to place chunks at a 2k (or
>> whatever your chunk size is) interval. Since we are no longer dealing 
>> with
>> aligned chunks, they can now cross page boundaries. Checks for page
>> contiguity have been added in order to keep track of which pages are
>> followed by a physically contiguous page.
>>
>> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
>> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
>> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
>>
>> ---
>> v2:
>>   - Add checks for the flags coming from userspace
>>   - Fix how we get chunk_size in xsk_diag.c
>>   - Add defines for masking the new descriptor format
>>   - Modified the rx functions to use new descriptor format
>>   - Modified the tx functions to use new descriptor format
>>
>> v3:
>>   - Add helper function to do address/offset masking/addition
>> ---
>>  include/net/xdp_sock.h      | 17 ++++++++
>>  include/uapi/linux/if_xdp.h |  9 ++++
>>  net/xdp/xdp_umem.c          | 18 +++++---
>>  net/xdp/xsk.c               | 86 ++++++++++++++++++++++++++++++-------
>>  net/xdp/xsk_diag.c          |  2 +-
>>  net/xdp/xsk_queue.h         | 68 +++++++++++++++++++++++++----
>>  6 files changed, 170 insertions(+), 30 deletions(-)
>>
>> diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
>> index 69796d264f06..738996c0f995 100644
>> --- a/include/net/xdp_sock.h
>> +++ b/include/net/xdp_sock.h
>> @@ -19,6 +19,7 @@ struct xsk_queue;
>>  struct xdp_umem_page {
>>  	void *addr;
>>  	dma_addr_t dma;
>> +	bool next_pg_contig;
>>  };
>>
>>  struct xdp_umem_fq_reuse {
>> @@ -48,6 +49,7 @@ struct xdp_umem {
>>  	bool zc;
>>  	spinlock_t xsk_list_lock;
>>  	struct list_head xsk_list;
>> +	u32 flags;
>>  };
>>
>>  struct xdp_sock {
>> @@ -144,6 +146,15 @@ static inline void xsk_umem_fq_reuse(struct xdp_umem *umem, u64 addr)
>>
>>  	rq->handles[rq->length++] = addr;
>>  }
>> +
>> +static inline u64 xsk_umem_handle_offset(struct xdp_umem *umem, u64 handle,
>> +					 u64 offset)
>> +{
>> +	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
>> +		return handle |= (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
>> +	else
>> +		return handle += offset;
>> +}
>
> This should be something like 'xsk_umem_adjust_offset()', and use "+=" 
> for both cases.

I'll try to come up with a better name for this, I see how 
"handle_offset" could be misleading :)

In terms of the "+=", we can't do that for both. We need to use | for 
the unaligned case since we store the offset in the upper 16-bits of the 
address field so that we leave the lower 48-bits (the original base 
address) untouched.


<...>

>>
>>  	if (!PAGE_ALIGNED(addr)) {
>> @@ -331,9 +335,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>>  	if (chunks == 0)
>>  		return -EINVAL;
>>
>> -	chunks_per_page = PAGE_SIZE / chunk_size;
>> -	if (chunks < chunks_per_page || chunks % chunks_per_page)
>> -		return -EINVAL;
>> +	if (!unaligned_chunks) {
>> +		chunks_per_page = PAGE_SIZE / chunk_size;
>> +		if (chunks < chunks_per_page || chunks % chunks_per_page)
>> +			return -EINVAL;
>> +	}
>>
>>  	headroom = ALIGN(headroom, 64);
>>
>> @@ -342,13 +348,15 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>>  		return -EINVAL;
>>
>>  	umem->address = (unsigned long)addr;
>> -	umem->chunk_mask = ~((u64)chunk_size - 1);
>> +	umem->chunk_mask = unaligned_chunks ? XSK_UNALIGNED_BUF_ADDR_MASK
>> +			: ~((u64)chunk_size - 1);
>
> The handle needs to be cleaned (reset to base address) when removed
> from the fill queue or recycle stack. This will not provide the correct
> semantics for unaligned mode.

When we use aligned mode, the chunk mask is ~0x07FF which will remove 
the low 11-bits in order to get us back to the original base address.

In unaligned mode, the chunk mask is ~0xFFFF00....00 which will remove 
the upper 16-bits. This will remove the offset from the address field 
and, since we have not directly modified the low 48-bits (original base 
address), leave us with the original base address.

Cleaning the handle will therefore work as it does now, using the 
appropriate mask based on which mode we are using.
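
A small standalone sketch of that masking behaviour (illustrative only; the
mask and shift values are assumed from the 48-bit address / 16-bit offset
split described in the cover letter, and the defines here only mirror the
patch's names):

#include <stdint.h>
#include <stdio.h>

#define XSK_UNALIGNED_BUF_OFFSET_SHIFT	48
#define XSK_UNALIGNED_BUF_ADDR_MASK \
	((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1)

int main(void)
{
	uint64_t chunk_size = 2048;
	uint64_t aligned_mask = ~(chunk_size - 1);	/* aligned mode */

	/* Aligned mode: base 0x4000 with 256 bytes of headroom added in. */
	uint64_t aligned_handle = 0x4000 + 256;
	printf("aligned base:   0x%llx\n",
	       (unsigned long long)(aligned_handle & aligned_mask));

	/* Unaligned mode: base 0x4100, offset 256 kept in the top 16 bits. */
	uint64_t unaligned_handle =
		0x4100ULL | (256ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
	printf("unaligned base: 0x%llx\n",
	       (unsigned long long)(unaligned_handle &
				    XSK_UNALIGNED_BUF_ADDR_MASK));
	return 0;
}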

<...>

>>
>>  		if (xskq_produce_addr_lazy(umem->cq, desc->addr))
>> @@ -243,7 +269,7 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
>>  	if (xs->queue_id >= xs->dev->real_num_tx_queues)
>>  		goto out;
>>
>> -	while (xskq_peek_desc(xs->tx, &desc)) {
>> +	while (xskq_peek_desc(xs->tx, &desc, xs->umem)) {
>>  		char *buffer;
>>  		u64 addr;
>>  		u32 len;
>> @@ -262,6 +288,10 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
>>
>>  		skb_put(skb, len);
>>  		addr = desc.addr;
>> +		if (xs->umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
>> +			addr = (addr & XSK_UNALIGNED_BUF_ADDR_MASK) |
>> +				(addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
>
> This doesn't look right to me. Shouldn't it be "(addr & mask) + (addr 
> >> shift)"?
> I'd also prefer to see this type of logic in an inline/macro

Will fix in the v4, thanks!

I'll look to move this to an inline function as well.

<...>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v3 03/11] xsk: add support to allow unaligned chunk placement
  2019-07-26  9:47         ` Laatz, Kevin
@ 2019-07-26  9:58           ` Laatz, Kevin
  0 siblings, 0 replies; 274+ messages in thread
From: Laatz, Kevin @ 2019-07-26  9:58 UTC (permalink / raw)
  To: intel-wired-lan

On 26/07/2019 10:47, Laatz, Kevin wrote:
> On 25/07/2019 16:39, Jonathan Lemon wrote:
>> On 23 Jul 2019, at 22:10, Kevin Laatz wrote:
>>
>>> Currently, addresses are chunk size aligned. This means we are very
>>> restricted in terms of where we can place chunks within the umem. For
>>> example, if we have a chunk size of 2k, then our chunks can only be 
>>> placed
>>> at 0,2k,4k,6k,8k... and so on (ie. every 2k starting from 0).
>>>
>>> This patch introduces the ability to use unaligned chunks. With these
>>> changes, we are no longer bound to having to place chunks at a 2k (or
>>> whatever your chunk size is) interval. Since we are no longer 
>>> dealing with
>>> aligned chunks, they can now cross page boundaries. Checks for page
>>> contiguity have been added in order to keep track of which pages are
>>> followed by a physically contiguous page.
>>>
>>> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
>>> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
>>> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
>>>
>>> ---
>>> v2:
>>>   - Add checks for the flags coming from userspace
>>>   - Fix how we get chunk_size in xsk_diag.c
>>>   - Add defines for masking the new descriptor format
>>>   - Modified the rx functions to use new descriptor format
>>>   - Modified the tx functions to use new descriptor format
>>>
>>> v3:
>>>   - Add helper function to do address/offset masking/addition
>>> ---
>>>  include/net/xdp_sock.h      | 17 ++++++++
>>>  include/uapi/linux/if_xdp.h |  9 ++++
>>>  net/xdp/xdp_umem.c          | 18 +++++---
>>>  net/xdp/xsk.c               | 86 ++++++++++++++++++++++++++++++-------
>>>  net/xdp/xsk_diag.c          |  2 +-
>>>  net/xdp/xsk_queue.h         | 68 +++++++++++++++++++++++++----
>>>  6 files changed, 170 insertions(+), 30 deletions(-)
>>>
>>> diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
>>> index 69796d264f06..738996c0f995 100644
>>> --- a/include/net/xdp_sock.h
>>> +++ b/include/net/xdp_sock.h
>>> @@ -19,6 +19,7 @@ struct xsk_queue;
>>>  struct xdp_umem_page {
>>>  	void *addr;
>>>  	dma_addr_t dma;
>>> +	bool next_pg_contig;
>>>  };
>>>
>>>  struct xdp_umem_fq_reuse {
>>> @@ -48,6 +49,7 @@ struct xdp_umem {
>>>  	bool zc;
>>>  	spinlock_t xsk_list_lock;
>>>  	struct list_head xsk_list;
>>> +	u32 flags;
>>>  };
>>>
>>>  struct xdp_sock {
>>> @@ -144,6 +146,15 @@ static inline void xsk_umem_fq_reuse(struct xdp_umem *umem, u64 addr)
>>>
>>>  	rq->handles[rq->length++] = addr;
>>>  }
>>> +
>>> +static inline u64 xsk_umem_handle_offset(struct xdp_umem *umem, u64 handle,
>>> +					 u64 offset)
>>> +{
>>> +	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
>>> +		return handle |= (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
>>> +	else
>>> +		return handle += offset;
>>> +}
>>
>> This should be something like 'xsk_umem_adjust_offset()', and use 
>> "+=" for both cases.
>
> I'll try to come up with a better name for this, I see how 
> "handle_offset" could be misleading :)
>
> In terms of the "+=", we can't do that for both. We need to use | for 
> the unaligned case since we store the offset in the upper 16-bits of 
> the address field so that we leave the lower 48-bits (the original 
> base address) untouched.
>
Correction, we can do "+=". With that, the lower 48-bits will still 
remain untouched though, so we still have the original address :)
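
A tiny check of that point (standalone sketch, using the 48-bit split from the
cover letter): once the offset has been shifted into the upper 16 bits, OR-ing
it in and adding it touch exactly the same bits, provided those bits start out
as zero, and masking recovers the original base either way.

#include <assert.h>
#include <stdint.h>

#define OFFSET_SHIFT 48	/* upper 16 bits hold the offset */

int main(void)
{
	uint64_t base = 0x12345;	/* lower 48 bits: original address */
	uint64_t offset = 256;		/* headroom etc. */

	uint64_t via_or  = base | (offset << OFFSET_SHIFT);
	uint64_t via_add = base + (offset << OFFSET_SHIFT);

	/* Identical as long as the upper 16 bits of 'base' were zero. */
	assert(via_or == via_add);
	/* Either way, masking the low 48 bits gives back the base address. */
	assert((via_add & ((1ULL << OFFSET_SHIFT) - 1)) == base);
	return 0;
}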
>
> <...>
>
>>>
>>>  	if (!PAGE_ALIGNED(addr)) {
>>> @@ -331,9 +335,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>>>  	if (chunks == 0)
>>>  		return -EINVAL;
>>>
>>> -	chunks_per_page = PAGE_SIZE / chunk_size;
>>> -	if (chunks < chunks_per_page || chunks % chunks_per_page)
>>> -		return -EINVAL;
>>> +	if (!unaligned_chunks) {
>>> +		chunks_per_page = PAGE_SIZE / chunk_size;
>>> +		if (chunks < chunks_per_page || chunks % chunks_per_page)
>>> +			return -EINVAL;
>>> +	}
>>>
>>>  	headroom = ALIGN(headroom, 64);
>>>
>>> @@ -342,13 +348,15 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>>>  		return -EINVAL;
>>>
>>>  	umem->address = (unsigned long)addr;
>>> -	umem->chunk_mask = ~((u64)chunk_size - 1);
>>> +	umem->chunk_mask = unaligned_chunks ? XSK_UNALIGNED_BUF_ADDR_MASK
>>> +			: ~((u64)chunk_size - 1);
>>
>> The handle needs to be cleaned (reset to base address) when removed
>> from the fill queue or recycle stack. This will not provide the correct
>> semantics for unaligned mode.
>
> When we use aligned mode, the chunk mask is ~0x07FF which will remove 
> the low 11-bits in order to get us back to the original base address.
>
> In unaligned mode, the chunk mask is ~0xFFFF00....00 which will remove 
> the upper 16-bits. This will remove the offset from the address field 
> and, since we have not directly modified the low 48-bits (original 
> base address), leave us with the original base address.
>
> Cleaning the handle will therefore work as it does now, using the 
> appropriate mask based on which mode we are using.
>
> <...>
>
>>>
>>>  		if (xskq_produce_addr_lazy(umem->cq, desc->addr))
>>> @@ -243,7 +269,7 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
>>>  	if (xs->queue_id >= xs->dev->real_num_tx_queues)
>>>  		goto out;
>>>
>>> -	while (xskq_peek_desc(xs->tx, &desc)) {
>>> +	while (xskq_peek_desc(xs->tx, &desc, xs->umem)) {
>>>  		char *buffer;
>>>  		u64 addr;
>>>  		u32 len;
>>> @@ -262,6 +288,10 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
>>>
>>>  		skb_put(skb, len);
>>>  		addr = desc.addr;
>>> +		if (xs->umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
>>> +			addr = (addr & XSK_UNALIGNED_BUF_ADDR_MASK) |
>>> +				(addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
>>
>> This doesn't look right to me. Shouldn't it be "(addr & mask) + 
>> (addr >> shift)"?
>> I'd also prefer to see this type of logic in an inline/macro
>
> Will fix in the v4, thanks!
>
> I'll look to move this to an inline function as well.
>
> <...>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v4 00/11] XDP unaligned chunk placement support
  2019-07-24  5:10     ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-30  8:53       ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-30  8:53 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

This patch set adds the ability to use unaligned chunks in the XDP umem.

Currently, all chunk addresses passed to the umem are masked to be chunk
size aligned (max is PAGE_SIZE). This limits where we can place chunks
within the umem as well as limiting the packet sizes that are supported.

The changes in this patch set remove these restrictions, allowing XDP to
be more flexible in where it can place a chunk within a umem. By relaxing
where the chunks can be placed, it allows us to use an arbitrary buffer
size and place that wherever we have a free address in the umem. These
changes add the ability to support arbitrary frame sizes up to 4k
(PAGE_SIZE) and make it easy to integrate with other existing frameworks
that have their own memory management systems, such as DPDK.
In DPDK, for example, there is already support for AF_XDP with zero-copy.
However, with this patch set the integration will be much more seamless.
You can find the DPDK AF_XDP driver at:
https://git.dpdk.org/dpdk/tree/drivers/net/af_xdp

Since we are now dealing with arbitrary frame sizes, we also need to
update how we pass around addresses. Currently, the addresses can simply be
masked to 2k to get back to the original address. This becomes less trivial
when using frame sizes that are not a 'power of 2' size. This patch set
modifies the Rx/Tx descriptor format to use the upper 16-bits of the addr
field for an offset value, leaving the lower 48-bits for the address (this
leaves us with 256 Terabytes, which should be enough!). We only need to use
the upper 16-bits to store the offset when running in unaligned mode.
Rather than adding the offset (headroom etc) to the address, we will store
it in the upper 16-bits of the address field. This way, we can easily add
the offset to the address where we need it, using some bit manipulation and
addition, and we can also easily get the original address wherever we need
it (for example in i40e_zca_free) by simply masking to get the lower
48-bits of the address field.
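
As a rough illustration of that format (a sketch only; these helper names are
made up for the example and the shift/mask values follow from the 48-bit
address / 16-bit offset split described above):

#include <stdint.h>

#define XSK_OFFSET_SHIFT	48
#define XSK_ADDR_MASK		((1ULL << XSK_OFFSET_SHIFT) - 1)

/* Store an offset (headroom etc.) in the upper 16 bits of a descriptor addr. */
static inline uint64_t xsk_addr_add_offset(uint64_t addr, uint64_t offset)
{
	return addr | (offset << XSK_OFFSET_SHIFT);
}

/* Recover the original base address, e.g. for buffer recycling. */
static inline uint64_t xsk_addr_base(uint64_t addr)
{
	return addr & XSK_ADDR_MASK;
}

/* Compute the actual data location: base address plus the stored offset. */
static inline uint64_t xsk_addr_real(uint64_t addr)
{
	return (addr & XSK_ADDR_MASK) + (addr >> XSK_OFFSET_SHIFT);
}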

The numbers below were recorded with the following set up:
  - Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
  - Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
  - Driver: i40e
  - Application: xdpsock with l2fwd (single interface)
  - Turbo disabled in BIOS

These are solely for comparing performance with and without the patches.
The largest drop was ~1.5% (in zero-copy mode).

+-------------------------+-------------+-----------------+-------------+
| Buffer size: 2048       | SKB mode    | Zero-copy       | Copy        |
+-------------------------+-------------+-----------------+-------------+
| Aligned (baseline)      | 1.25 Mpps   | 10.0 Mpps       | 1.66 Mpps   |
+-------------------------+-------------+-----------------+-------------+
| Aligned (with patches)  | 1.25 Mpps   | 9.85 Mpps       | 1.66 Mpps   |
+-------------------------+-------------+-----------------+-------------+
| Unaligned               | 1.25 Mpps   | 9.65 Mpps       | 1.66 Mpps   |
+-------------------------+-------------+-----------------+-------------+

This patch set has been applied against
commit 475e31f8da1b("Merge branch 'revamp-test_progs'")

Structure of the patch set:
Patch 1:
  - Remove unnecessary masking and headroom addition during zero-copy Rx
    buffer recycling in i40e. This change is required in order for the
    buffer recycling to work in the unaligned chunk mode.

Patch 2:
  - Remove unnecessary masking and headroom addition during
    zero-copy Rx buffer recycling in ixgbe. This change is required in
    order for the  buffer recycling to work in the unaligned chunk mode.

Patch 3:
  - Add flags for umem configuration to libbpf

Patch 4:
  - Add infrastructure for unaligned chunks. Since we are dealing with
    unaligned chunks that could potentially cross a physical page boundary,
    we add checks to keep track of that information. We can later use this
    information to correctly handle buffers that are placed at an address
    where they cross a page boundary.  This patch also modifies the
    existing Rx and Tx functions to use the new descriptor format. To
    handle addresses correctly, we need to mask appropriately based on
    whether we are in aligned or unaligned mode.

Patch 5:
  - This patch updates the i40e driver to make use of the new descriptor
    format.

Patch 6:
  - This patch updates the ixgbe driver to make use of the new descriptor
    format.

Patch 7:
  - This patch updates the mlx5e driver to make use of the new descriptor
    format. These changes are required to handle the new descriptor format
    and for unaligned chunks support.

Patch 8:
  - Modify xdpsock application to add a command line option for
    unaligned chunks

Patch 9:
  - Since we can now run the application in unaligned chunk mode, we need
    to make sure we recycle the buffers appropriately.

Patch 10:
  - Adds hugepage support to the xdpsock application

Patch 11:
  - Documentation update to include the unaligned chunk scenario. We need
    to explicitly state that the incoming addresses are only masked in the
    aligned chunk mode and not the unaligned chunk mode.

---
v2:
  - fixed checkpatch issues
  - fixed Rx buffer recycling for unaligned chunks in xdpsock
  - removed unused defines
  - fixed how chunk_size is calculated in xsk_diag.c
  - added some performance numbers to cover letter
  - modified descriptor format to make it easier to retrieve original
    address
  - removed patch adding off_t off to the zero copy allocator. This is no
    longer needed with the new descriptor format.

v3:
  - added patch for mlx5 driver changes needed for unaligned chunks
  - moved offset handling to new helper function
  - changed value used for the umem chunk_mask. Now using the new
    descriptor format to save us doing the calculations in a number of
    places meaning more of the code is left unchanged while adding
    unaligned chunk support.

v4:
  - reworked the next_pg_contig field in the xdp_umem_page struct. We now
    use the low 12 bits of the addr for flags rather than adding an extra
    field in the struct.
  - modified unaligned chunks flag define
  - fixed page_start calculation in __xsk_rcv_memcpy().
  - move offset handling to the xdp_umem_get_* functions
  - modified the len field in xdp_umem_reg struct. We now use 16 bits from
    this for the flags field.
  - removed next_pg_contig field from xdp_umem_page struct. Using low 12
    bits of addr to store flags instead.
  - fixed headroom addition to handle in the mlx5e driver
  - other minor changes based on review comments

Kevin Laatz (11):
  i40e: simplify Rx buffer recycle
  ixgbe: simplify Rx buffer recycle
  libbpf: add flags to umem config
  xsk: add support to allow unaligned chunk placement
  i40e: modify driver for handling offsets
  ixgbe: modify driver for handling offsets
  mlx5e: modify driver for handling offsets
  samples/bpf: add unaligned chunks mode support to xdpsock
  samples/bpf: add buffer recycling for unaligned chunks to xdpsock
  samples/bpf: use hugepages in xdpsock app
  doc/af_xdp: include unaligned chunk case

 Documentation/networking/af_xdp.rst           | 10 ++-
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 26 +++---
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 26 +++---
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  8 +-
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.c   |  3 +-
 include/net/xdp_sock.h                        | 40 +++++++++-
 include/uapi/linux/if_xdp.h                   | 14 +++-
 net/xdp/xdp_umem.c                            | 18 +++--
 net/xdp/xsk.c                                 | 79 +++++++++++++++----
 net/xdp/xsk_diag.c                            |  2 +-
 net/xdp/xsk_queue.h                           | 69 ++++++++++++++--
 samples/bpf/xdpsock_user.c                    | 59 ++++++++++----
 tools/include/uapi/linux/if_xdp.h             |  9 ++-
 tools/lib/bpf/xsk.c                           |  3 +
 tools/lib/bpf/xsk.h                           |  2 +
 15 files changed, 280 insertions(+), 88 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v4 00/11] XDP unaligned chunk placement support
@ 2019-07-30  8:53       ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-30  8:53 UTC (permalink / raw)
  To: intel-wired-lan

This patch set adds the ability to use unaligned chunks in the XDP umem.

Currently, all chunk addresses passed to the umem are masked to be chunk
size aligned (max is PAGE_SIZE). This limits where we can place chunks
within the umem as well as limiting the packet sizes that are supported.

The changes in this patch set remove these restrictions, allowing XDP to
be more flexible in where it can place a chunk within a umem. By relaxing
where the chunks can be placed, it allows us to use an arbitrary buffer
size and place that wherever we have a free address in the umem. These
changes add the ability to support arbitrary frame sizes up to 4k
(PAGE_SIZE) and make it easy to integrate with other existing frameworks
that have their own memory management systems, such as DPDK.
In DPDK, for example, there is already support for AF_XDP with zero-copy.
However, with this patch set the integration will be much more seamless.
You can find the DPDK AF_XDP driver at:
https://git.dpdk.org/dpdk/tree/drivers/net/af_xdp

Since we are now dealing with arbitrary frame sizes, we also need to
update how we pass around addresses. Currently, the addresses can simply be
masked to 2k to get back to the original address. This becomes less trivial
when using frame sizes that are not a 'power of 2' size. This patch set
modifies the Rx/Tx descriptor format to use the upper 16-bits of the addr
field for an offset value, leaving the lower 48-bits for the address (this
leaves us with 256 Terabytes, which should be enough!). We only need to use
the upper 16-bits to store the offset when running in unaligned mode.
Rather than adding the offset (headroom etc) to the address, we will store
it in the upper 16-bits of the address field. This way, we can easily add
the offset to the address where we need it, using some bit manipulation and
addition, and we can also easily get the original address wherever we need
it (for example in i40e_zca_free) by simply masking to get the lower
48-bits of the address field.

The numbers below were recorded with the following set up:
  - Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
  - Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
  - Driver: i40e
  - Application: xdpsock with l2fwd (single interface)
  - Turbo disabled in BIOS

These are solely for comparing performance with and without the patches.
The largest drop was ~1.5% (in zero-copy mode).

+-------------------------+-------------+-----------------+-------------+
| Buffer size: 2048       | SKB mode    | Zero-copy       | Copy        |
+-------------------------+-------------+-----------------+-------------+
| Aligned (baseline)      | 1.25 Mpps   | 10.0 Mpps       | 1.66 Mpps   |
+-------------------------+-------------+-----------------+-------------+
| Aligned (with patches)  | 1.25 Mpps   | 9.85 Mpps       | 1.66 Mpps   |
+-------------------------+-------------+-----------------+-------------+
| Unaligned               | 1.25 Mpps   | 9.65 Mpps       | 1.66 Mpps   |
+-------------------------+-------------+-----------------+-------------+

This patch set has been applied against
commit 475e31f8da1b("Merge branch 'revamp-test_progs'")

Structure of the patch set:
Patch 1:
  - Remove unnecessary masking and headroom addition during zero-copy Rx
    buffer recycling in i40e. This change is required in order for the
    buffer recycling to work in the unaligned chunk mode.

Patch 2:
  - Remove unnecessary masking and headroom addition during
    zero-copy Rx buffer recycling in ixgbe. This change is required in
    order for the  buffer recycling to work in the unaligned chunk mode.

Patch 3:
  - Add flags for umem configuration to libbpf

Patch 4:
  - Add infrastructure for unaligned chunks. Since we are dealing with
    unaligned chunks that could potentially cross a physical page boundary,
    we add checks to keep track of that information. We can later use this
    information to correctly handle buffers that are placed at an address
    where they cross a page boundary.  This patch also modifies the
    existing Rx and Tx functions to use the new descriptor format. To
    handle addresses correctly, we need to mask appropriately based on
    whether we are in aligned or unaligned mode.

Patch 5:
  - This patch updates the i40e driver to make use of the new descriptor
    format.

Patch 6:
  - This patch updates the ixgbe driver to make use of the new descriptor
    format.

Patch 7:
  - This patch updates the mlx5e driver to make use of the new descriptor
    format. These changes are required to handle the new descriptor format
    and for unaligned chunks support.

Patch 8:
  - Modify xdpsock application to add a command line option for
    unaligned chunks

Patch 9:
  - Since we can now run the application in unaligned chunk mode, we need
    to make sure we recycle the buffers appropriately.

Patch 10:
  - Adds hugepage support to the xdpsock application

Patch 11:
  - Documentation update to include the unaligned chunk scenario. We need
    to explicitly state that the incoming addresses are only masked in the
    aligned chunk mode and not the unaligned chunk mode.

---
v2:
  - fixed checkpatch issues
  - fixed Rx buffer recycling for unaligned chunks in xdpsock
  - removed unused defines
  - fixed how chunk_size is calculated in xsk_diag.c
  - added some performance numbers to cover letter
  - modified descriptor format to make it easier to retrieve original
    address
  - removed patch adding off_t off to the zero copy allocator. This is no
    longer needed with the new descriptor format.

v3:
  - added patch for mlx5 driver changes needed for unaligned chunks
  - moved offset handling to new helper function
  - changed value used for the umem chunk_mask. Now using the new
    descriptor format to save us doing the calculations in a number of
    places meaning more of the code is left unchanged while adding
    unaligned chunk support.

v4:
  - reworked the next_pg_contig field in the xdp_umem_page struct. We now
    use the low 12 bits of the addr for flags rather than adding an extra
    field in the struct.
  - modified unaligned chunks flag define
  - fixed page_start calculation in __xsk_rcv_memcpy().
  - move offset handling to the xdp_umem_get_* functions
  - modified the len field in xdp_umem_reg struct. We now use 16 bits from
    this for the flags field.
  - removed next_pg_contig field from xdp_umem_page struct. Using low 12
    bits of addr to store flags instead.
  - fixed headroom addition to handle in the mlx5e driver
  - other minor changes based on review comments

Kevin Laatz (11):
  i40e: simplify Rx buffer recycle
  ixgbe: simplify Rx buffer recycle
  libbpf: add flags to umem config
  xsk: add support to allow unaligned chunk placement
  i40e: modify driver for handling offsets
  ixgbe: modify driver for handling offsets
  mlx5e: modify driver for handling offsets
  samples/bpf: add unaligned chunks mode support to xdpsock
  samples/bpf: add buffer recycling for unaligned chunks to xdpsock
  samples/bpf: use hugepages in xdpsock app
  doc/af_xdp: include unaligned chunk case

 Documentation/networking/af_xdp.rst           | 10 ++-
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 26 +++---
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 26 +++---
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  8 +-
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.c   |  3 +-
 include/net/xdp_sock.h                        | 40 +++++++++-
 include/uapi/linux/if_xdp.h                   | 14 +++-
 net/xdp/xdp_umem.c                            | 18 +++--
 net/xdp/xsk.c                                 | 79 +++++++++++++++----
 net/xdp/xsk_diag.c                            |  2 +-
 net/xdp/xsk_queue.h                           | 69 ++++++++++++++--
 samples/bpf/xdpsock_user.c                    | 59 ++++++++++----
 tools/include/uapi/linux/if_xdp.h             |  9 ++-
 tools/lib/bpf/xsk.c                           |  3 +
 tools/lib/bpf/xsk.h                           |  2 +
 15 files changed, 280 insertions(+), 88 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v4 01/11] i40e: simplify Rx buffer recycle
  2019-07-30  8:53       ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-30  8:53         ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-30  8:53 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

Currently, the dma, addr and handle are modified when we reuse Rx buffers
in zero-copy mode. However, this is not required as the inputs to the
function are copies, not the original values themselves. As we use the
copies within the function, we can use the original 'old_bi' values
directly without having to mask and add the headroom.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 32bad014d76c..dfa096db2244 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -420,8 +420,6 @@ static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring,
 				    struct i40e_rx_buffer *old_bi)
 {
 	struct i40e_rx_buffer *new_bi = &rx_ring->rx_bi[rx_ring->next_to_alloc];
-	unsigned long mask = (unsigned long)rx_ring->xsk_umem->chunk_mask;
-	u64 hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
 	u16 nta = rx_ring->next_to_alloc;
 
 	/* update, and store next to alloc */
@@ -429,14 +427,9 @@ static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring,
 	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
 	/* transfer page from old buffer to new buffer */
-	new_bi->dma = old_bi->dma & mask;
-	new_bi->dma += hr;
-
-	new_bi->addr = (void *)((unsigned long)old_bi->addr & mask);
-	new_bi->addr += hr;
-
-	new_bi->handle = old_bi->handle & mask;
-	new_bi->handle += rx_ring->xsk_umem->headroom;
+	new_bi->dma = old_bi->dma;
+	new_bi->addr = old_bi->addr;
+	new_bi->handle = old_bi->handle;
 
 	old_bi->addr = NULL;
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v4 01/11] i40e: simplify Rx buffer recycle
@ 2019-07-30  8:53         ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-30  8:53 UTC (permalink / raw)
  To: intel-wired-lan

Currently, the dma, addr and handle are modified when we reuse Rx buffers
in zero-copy mode. However, this is not required as the inputs to the
function are copies, not the original values themselves. As we use the
copies within the function, we can use the original 'old_bi' values
directly without having to mask and add the headroom.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 32bad014d76c..dfa096db2244 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -420,8 +420,6 @@ static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring,
 				    struct i40e_rx_buffer *old_bi)
 {
 	struct i40e_rx_buffer *new_bi = &rx_ring->rx_bi[rx_ring->next_to_alloc];
-	unsigned long mask = (unsigned long)rx_ring->xsk_umem->chunk_mask;
-	u64 hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
 	u16 nta = rx_ring->next_to_alloc;
 
 	/* update, and store next to alloc */
@@ -429,14 +427,9 @@ static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring,
 	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
 	/* transfer page from old buffer to new buffer */
-	new_bi->dma = old_bi->dma & mask;
-	new_bi->dma += hr;
-
-	new_bi->addr = (void *)((unsigned long)old_bi->addr & mask);
-	new_bi->addr += hr;
-
-	new_bi->handle = old_bi->handle & mask;
-	new_bi->handle += rx_ring->xsk_umem->headroom;
+	new_bi->dma = old_bi->dma;
+	new_bi->addr = old_bi->addr;
+	new_bi->handle = old_bi->handle;
 
 	old_bi->addr = NULL;
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v4 02/11] ixgbe: simplify Rx buffer recycle
  2019-07-30  8:53       ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-30  8:53         ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-30  8:53 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

Currently, the dma, addr and handle are modified when we reuse Rx buffers
in zero-copy mode. However, this is not required as the inputs to the
function are copies, not the original values themselves. As we use the
copies within the function, we can use the original 'obi' values
directly without having to mask and add the headroom.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index 6b609553329f..bc86057628c8 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -201,8 +201,6 @@ ixgbe_rx_buffer *ixgbe_get_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 static void ixgbe_reuse_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 				     struct ixgbe_rx_buffer *obi)
 {
-	unsigned long mask = (unsigned long)rx_ring->xsk_umem->chunk_mask;
-	u64 hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
 	u16 nta = rx_ring->next_to_alloc;
 	struct ixgbe_rx_buffer *nbi;
 
@@ -212,14 +210,9 @@ static void ixgbe_reuse_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
 	/* transfer page from old buffer to new buffer */
-	nbi->dma = obi->dma & mask;
-	nbi->dma += hr;
-
-	nbi->addr = (void *)((unsigned long)obi->addr & mask);
-	nbi->addr += hr;
-
-	nbi->handle = obi->handle & mask;
-	nbi->handle += rx_ring->xsk_umem->headroom;
+	nbi->dma = obi->dma;
+	nbi->addr = obi->addr;
+	nbi->handle = obi->handle;
 
 	obi->addr = NULL;
 	obi->skb = NULL;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v4 02/11] ixgbe: simplify Rx buffer recycle
@ 2019-07-30  8:53         ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-30  8:53 UTC (permalink / raw)
  To: intel-wired-lan

Currently, the dma, addr and handle are modified when we reuse Rx buffers
in zero-copy mode. However, this is not required as the inputs to the
function are copies, not the original values themselves. As we use the
copies within the function, we can use the original 'obi' values
directly without having to mask and add the headroom.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index 6b609553329f..bc86057628c8 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -201,8 +201,6 @@ ixgbe_rx_buffer *ixgbe_get_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 static void ixgbe_reuse_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 				     struct ixgbe_rx_buffer *obi)
 {
-	unsigned long mask = (unsigned long)rx_ring->xsk_umem->chunk_mask;
-	u64 hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
 	u16 nta = rx_ring->next_to_alloc;
 	struct ixgbe_rx_buffer *nbi;
 
@@ -212,14 +210,9 @@ static void ixgbe_reuse_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
 	/* transfer page from old buffer to new buffer */
-	nbi->dma = obi->dma & mask;
-	nbi->dma += hr;
-
-	nbi->addr = (void *)((unsigned long)obi->addr & mask);
-	nbi->addr += hr;
-
-	nbi->handle = obi->handle & mask;
-	nbi->handle += rx_ring->xsk_umem->headroom;
+	nbi->dma = obi->dma;
+	nbi->addr = obi->addr;
+	nbi->handle = obi->handle;
 
 	obi->addr = NULL;
 	obi->skb = NULL;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v4 03/11] libbpf: add flags to umem config
  2019-07-30  8:53       ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-30  8:53         ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-30  8:53 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

This patch adds a 'flags' field to the umem_config and umem_reg structs.
This will allow for more options to be added for configuring umems.

The first use for the flags field is to add a flag for unaligned chunks
mode. These flags can either be user-provided or filled with a default.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>

---
v2:
  - Removed the headroom check from this patch. It has moved to the
    previous patch.

v4:
  - modified chunk flag define
---
 tools/include/uapi/linux/if_xdp.h | 9 +++++++--
 tools/lib/bpf/xsk.c               | 3 +++
 tools/lib/bpf/xsk.h               | 2 ++
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h
index faaa5ca2a117..a691802d7915 100644
--- a/tools/include/uapi/linux/if_xdp.h
+++ b/tools/include/uapi/linux/if_xdp.h
@@ -17,6 +17,10 @@
 #define XDP_COPY	(1 << 1) /* Force copy-mode */
 #define XDP_ZEROCOPY	(1 << 2) /* Force zero-copy mode */
 
+/* Flags for xsk_umem_config flags */
+#define XDP_UMEM_UNALIGNED_CHUNK_FLAG_SHIFT 15
+#define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << XDP_UMEM_UNALIGNED_CHUNK_FLAG_SHIFT)
+
 struct sockaddr_xdp {
 	__u16 sxdp_family;
 	__u16 sxdp_flags;
@@ -49,8 +53,9 @@ struct xdp_mmap_offsets {
 #define XDP_OPTIONS			8
 
 struct xdp_umem_reg {
-	__u64 addr; /* Start of packet data area */
-	__u64 len; /* Length of packet data area */
+	__u64 addr;     /* Start of packet data area */
+	__u64 len:48;   /* Length of packet data area */
+	__u64 flags:16; /* Flags for umem */
 	__u32 chunk_size;
 	__u32 headroom;
 };
diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index 5007b5d4fd2c..5e7e4d420ee0 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -116,6 +116,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg,
 		cfg->comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
 		cfg->frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
 		cfg->frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM;
+		cfg->flags = XSK_UMEM__DEFAULT_FLAGS;
 		return;
 	}
 
@@ -123,6 +124,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg,
 	cfg->comp_size = usr_cfg->comp_size;
 	cfg->frame_size = usr_cfg->frame_size;
 	cfg->frame_headroom = usr_cfg->frame_headroom;
+	cfg->flags = usr_cfg->flags;
 }
 
 static int xsk_set_xdp_socket_config(struct xsk_socket_config *cfg,
@@ -182,6 +184,7 @@ int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
 	mr.len = size;
 	mr.chunk_size = umem->config.frame_size;
 	mr.headroom = umem->config.frame_headroom;
+	mr.flags = umem->config.flags;
 
 	err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr));
 	if (err) {
diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
index 833a6e60d065..44a03d8c34b9 100644
--- a/tools/lib/bpf/xsk.h
+++ b/tools/lib/bpf/xsk.h
@@ -170,12 +170,14 @@ LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
 #define XSK_UMEM__DEFAULT_FRAME_SHIFT    12 /* 4096 bytes */
 #define XSK_UMEM__DEFAULT_FRAME_SIZE     (1 << XSK_UMEM__DEFAULT_FRAME_SHIFT)
 #define XSK_UMEM__DEFAULT_FRAME_HEADROOM 0
+#define XSK_UMEM__DEFAULT_FLAGS 0
 
 struct xsk_umem_config {
 	__u32 fill_size;
 	__u32 comp_size;
 	__u32 frame_size;
 	__u32 frame_headroom;
+	__u32 flags;
 };
 
 /* Flags for the libbpf_flags field. */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v4 03/11] libbpf: add flags to umem config
@ 2019-07-30  8:53         ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-30  8:53 UTC (permalink / raw)
  To: intel-wired-lan

This patch adds a 'flags' field to the umem_config and umem_reg structs.
This will allow for more options to be added for configuring umems.

The first use for the flags field is to add a flag for unaligned chunks
mode. These flags can either be user-provided or filled with a default.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>

---
v2:
  - Removed the headroom check from this patch. It has moved to the
    previous patch.

v4:
  - modified chunk flag define
---
 tools/include/uapi/linux/if_xdp.h | 9 +++++++--
 tools/lib/bpf/xsk.c               | 3 +++
 tools/lib/bpf/xsk.h               | 2 ++
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h
index faaa5ca2a117..a691802d7915 100644
--- a/tools/include/uapi/linux/if_xdp.h
+++ b/tools/include/uapi/linux/if_xdp.h
@@ -17,6 +17,10 @@
 #define XDP_COPY	(1 << 1) /* Force copy-mode */
 #define XDP_ZEROCOPY	(1 << 2) /* Force zero-copy mode */
 
+/* Flags for xsk_umem_config flags */
+#define XDP_UMEM_UNALIGNED_CHUNK_FLAG_SHIFT 15
+#define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << XDP_UMEM_UNALIGNED_CHUNK_FLAG_SHIFT)
+
 struct sockaddr_xdp {
 	__u16 sxdp_family;
 	__u16 sxdp_flags;
@@ -49,8 +53,9 @@ struct xdp_mmap_offsets {
 #define XDP_OPTIONS			8
 
 struct xdp_umem_reg {
-	__u64 addr; /* Start of packet data area */
-	__u64 len; /* Length of packet data area */
+	__u64 addr;     /* Start of packet data area */
+	__u64 len:48;   /* Length of packet data area */
+	__u64 flags:16; /* Flags for umem */
 	__u32 chunk_size;
 	__u32 headroom;
 };
diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index 5007b5d4fd2c..5e7e4d420ee0 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -116,6 +116,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg,
 		cfg->comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
 		cfg->frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
 		cfg->frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM;
+		cfg->flags = XSK_UMEM__DEFAULT_FLAGS;
 		return;
 	}
 
@@ -123,6 +124,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg,
 	cfg->comp_size = usr_cfg->comp_size;
 	cfg->frame_size = usr_cfg->frame_size;
 	cfg->frame_headroom = usr_cfg->frame_headroom;
+	cfg->flags = usr_cfg->flags;
 }
 
 static int xsk_set_xdp_socket_config(struct xsk_socket_config *cfg,
@@ -182,6 +184,7 @@ int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
 	mr.len = size;
 	mr.chunk_size = umem->config.frame_size;
 	mr.headroom = umem->config.frame_headroom;
+	mr.flags = umem->config.flags;
 
 	err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr));
 	if (err) {
diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
index 833a6e60d065..44a03d8c34b9 100644
--- a/tools/lib/bpf/xsk.h
+++ b/tools/lib/bpf/xsk.h
@@ -170,12 +170,14 @@ LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
 #define XSK_UMEM__DEFAULT_FRAME_SHIFT    12 /* 4096 bytes */
 #define XSK_UMEM__DEFAULT_FRAME_SIZE     (1 << XSK_UMEM__DEFAULT_FRAME_SHIFT)
 #define XSK_UMEM__DEFAULT_FRAME_HEADROOM 0
+#define XSK_UMEM__DEFAULT_FLAGS 0
 
 struct xsk_umem_config {
 	__u32 fill_size;
 	__u32 comp_size;
 	__u32 frame_size;
 	__u32 frame_headroom;
+	__u32 flags;
 };
 
 /* Flags for the libbpf_flags field. */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v4 04/11] xsk: add support to allow unaligned chunk placement
  2019-07-30  8:53       ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-30  8:53         ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-30  8:53 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

Currently, addresses are chunk size aligned. This means we are very
restricted in terms of where we can place a chunk within the umem. For
example, if we have a chunk size of 2k, then our chunks can only be placed
at 0, 2k, 4k, 6k, 8k and so on (i.e. every 2k starting from 0).

This patch introduces the ability to use unaligned chunks. With these
changes, we are no longer bound to placing chunks at 2k (or whatever the
chunk size is) intervals. Since we are no longer dealing with
aligned chunks, they can now cross page boundaries. Checks for page
contiguity have been added in order to keep track of which pages are
followed by a physically contiguous page.
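
To make the new handle format concrete, here is a small illustration-only
sketch of the packing used in unaligned chunks mode. The pack/unpack
function names are made up; the defines mirror the ones added in this
patch:

#include <linux/types.h>

#define XSK_UNALIGNED_BUF_OFFSET_SHIFT 48
#define XSK_UNALIGNED_BUF_ADDR_MASK \
	((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1)

/* The base address of the buffer lives in the low 48 bits, the offset
 * into the chunk (headroom, metadata, XDP data offset) in the upper 16.
 */
static inline __u64 pack_unaligned_addr(__u64 base, __u64 offset)
{
	return base + (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
}

/* Same arithmetic the kernel side uses when consuming such an address:
 * add the offset back onto the base to get the absolute umem address.
 */
static inline __u64 unpack_unaligned_addr(__u64 handle)
{
	return (handle & XSK_UNALIGNED_BUF_ADDR_MASK) +
	       (handle >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
}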

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

---
v2:
  - Add checks for the flags coming from userspace
  - Fix how we get chunk_size in xsk_diag.c
  - Add defines for masking the new descriptor format
  - Modified the rx functions to use new descriptor format
  - Modified the tx functions to use new descriptor format

v3:
  - Add helper function to do address/offset masking/addition

v4:
  - fixed page_start calculation in __xsk_rcv_memcpy().
  - move offset handling to the xdp_umem_get_* functions
  - modified the len field in xdp_umem_reg struct. We now use 16 bits from
    this for the flags field.
  - removed next_pg_contig field from xdp_umem_page struct. Using low 12
    bits of addr to store flags instead.
  - other minor changes based on review comments
---
 include/net/xdp_sock.h      | 40 ++++++++++++++++++-
 include/uapi/linux/if_xdp.h | 14 ++++++-
 net/xdp/xdp_umem.c          | 18 ++++++---
 net/xdp/xsk.c               | 79 +++++++++++++++++++++++++++++--------
 net/xdp/xsk_diag.c          |  2 +-
 net/xdp/xsk_queue.h         | 69 ++++++++++++++++++++++++++++----
 6 files changed, 188 insertions(+), 34 deletions(-)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index 69796d264f06..a755e8ab6cac 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -16,6 +16,13 @@
 struct net_device;
 struct xsk_queue;
 
+/* Masks for xdp_umem_page flags.
+ * The low 12-bits of the addr will be 0 since this is the page address, so we
+ * can use them for flags.
+ */
+#define XSK_NEXT_PG_CONTIG_SHIFT 0
+#define XSK_NEXT_PG_CONTIG_MASK (1ULL << XSK_NEXT_PG_CONTIG_SHIFT)
+
 struct xdp_umem_page {
 	void *addr;
 	dma_addr_t dma;
@@ -48,6 +55,7 @@ struct xdp_umem {
 	bool zc;
 	spinlock_t xsk_list_lock;
 	struct list_head xsk_list;
+	u16 flags;
 };
 
 struct xdp_sock {
@@ -98,12 +106,21 @@ struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev, u16 queue_id);
 
 static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr)
 {
-	return umem->pages[addr >> PAGE_SHIFT].addr + (addr & (PAGE_SIZE - 1));
+	unsigned long page_addr;
+
+	addr += addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+	addr &= XSK_UNALIGNED_BUF_ADDR_MASK;
+	page_addr = (unsigned long)umem->pages[addr >> PAGE_SHIFT].addr;
+
+	return (char *)(page_addr & PAGE_MASK) + (addr & ~PAGE_MASK);
 }
 
 static inline dma_addr_t xdp_umem_get_dma(struct xdp_umem *umem, u64 addr)
 {
-	return umem->pages[addr >> PAGE_SHIFT].dma + (addr & (PAGE_SIZE - 1));
+	addr += addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+	addr &= XSK_UNALIGNED_BUF_ADDR_MASK;
+
+	return umem->pages[addr >> PAGE_SHIFT].dma + (addr & ~PAGE_MASK);
 }
 
 /* Reuse-queue aware version of FILL queue helpers */
@@ -144,6 +161,19 @@ static inline void xsk_umem_fq_reuse(struct xdp_umem *umem, u64 addr)
 
 	rq->handles[rq->length++] = addr;
 }
+
+/* Handle the offset appropriately depending on aligned or unaligned mode.
+ * For unaligned mode, we store the offset in the upper 16-bits of the address.
+ * For aligned mode, we simply add the offset to the address.
+ */
+static inline u64 xsk_umem_adjust_offset(struct xdp_umem *umem, u64 address,
+					 u64 offset)
+{
+	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG)
+		return address + (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
+	else
+		return address + offset;
+}
 #else
 static inline int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
@@ -241,6 +271,12 @@ static inline void xsk_umem_fq_reuse(struct xdp_umem *umem, u64 addr)
 {
 }
 
+static inline u64 xsk_umem_handle_offset(struct xdp_umem *umem, u64 handle,
+					 u64 offset)
+{
+	return 0;
+}
+
 #endif /* CONFIG_XDP_SOCKETS */
 
 #endif /* _LINUX_XDP_SOCK_H */
diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index faaa5ca2a117..4a5490651b22 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -17,6 +17,10 @@
 #define XDP_COPY	(1 << 1) /* Force copy-mode */
 #define XDP_ZEROCOPY	(1 << 2) /* Force zero-copy mode */
 
+/* Flags for xsk_umem_config flags */
+#define XDP_UMEM_UNALIGNED_CHUNK_FLAG_SHIFT 15
+#define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << XDP_UMEM_UNALIGNED_CHUNK_FLAG_SHIFT)
+
 struct sockaddr_xdp {
 	__u16 sxdp_family;
 	__u16 sxdp_flags;
@@ -49,8 +53,9 @@ struct xdp_mmap_offsets {
 #define XDP_OPTIONS			8
 
 struct xdp_umem_reg {
-	__u64 addr; /* Start of packet data area */
-	__u64 len; /* Length of packet data area */
+	__u64 addr;    /* Start of packet data area */
+	__u64 len:48;  /* Length of packet data area */
+	__u64 flags:16; /* Flags for umem */
 	__u32 chunk_size;
 	__u32 headroom;
 };
@@ -74,6 +79,11 @@ struct xdp_options {
 #define XDP_UMEM_PGOFF_FILL_RING	0x100000000ULL
 #define XDP_UMEM_PGOFF_COMPLETION_RING	0x180000000ULL
 
+/* Masks for unaligned chunks mode */
+#define XSK_UNALIGNED_BUF_OFFSET_SHIFT 48
+#define XSK_UNALIGNED_BUF_ADDR_MASK \
+	((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1)
+
 /* Rx/Tx descriptor */
 struct xdp_desc {
 	__u64 addr;
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 83de74ca729a..5590ca7bbe15 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -299,6 +299,7 @@ static int xdp_umem_account_pages(struct xdp_umem *umem)
 
 static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 {
+	bool unaligned_chunks = mr->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG;
 	u32 chunk_size = mr->chunk_size, headroom = mr->headroom;
 	unsigned int chunks, chunks_per_page;
 	u64 addr = mr->addr, size = mr->len;
@@ -314,7 +315,10 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 		return -EINVAL;
 	}
 
-	if (!is_power_of_2(chunk_size))
+	if (mr->flags & ~XDP_UMEM_UNALIGNED_CHUNK_FLAG)
+		return -EINVAL;
+
+	if (!unaligned_chunks && !is_power_of_2(chunk_size))
 		return -EINVAL;
 
 	if (!PAGE_ALIGNED(addr)) {
@@ -331,9 +335,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 	if (chunks == 0)
 		return -EINVAL;
 
-	chunks_per_page = PAGE_SIZE / chunk_size;
-	if (chunks < chunks_per_page || chunks % chunks_per_page)
-		return -EINVAL;
+	if (!unaligned_chunks) {
+		chunks_per_page = PAGE_SIZE / chunk_size;
+		if (chunks < chunks_per_page || chunks % chunks_per_page)
+			return -EINVAL;
+	}
 
 	headroom = ALIGN(headroom, 64);
 
@@ -342,13 +348,15 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 		return -EINVAL;
 
 	umem->address = (unsigned long)addr;
-	umem->chunk_mask = ~((u64)chunk_size - 1);
+	umem->chunk_mask = unaligned_chunks ? XSK_UNALIGNED_BUF_ADDR_MASK
+					    : ~((u64)chunk_size - 1);
 	umem->size = size;
 	umem->headroom = headroom;
 	umem->chunk_size_nohr = chunk_size - headroom;
 	umem->npgs = size / PAGE_SIZE;
 	umem->pgs = NULL;
 	umem->user = NULL;
+	umem->flags = mr->flags;
 	INIT_LIST_HEAD(&umem->xsk_list);
 	spin_lock_init(&umem->xsk_list_lock);
 
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 59b57d708697..9b834d54549e 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -45,7 +45,7 @@ EXPORT_SYMBOL(xsk_umem_has_addrs);
 
 u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
 {
-	return xskq_peek_addr(umem->fq, addr);
+	return xskq_peek_addr(umem->fq, addr, umem);
 }
 EXPORT_SYMBOL(xsk_umem_peek_addr);
 
@@ -55,21 +55,42 @@ void xsk_umem_discard_addr(struct xdp_umem *umem)
 }
 EXPORT_SYMBOL(xsk_umem_discard_addr);
 
+/* If a buffer crosses a page boundary, we need to do 2 memcpy's, one for
+ * each page. This is only required in copy mode.
+ */
+static void __xsk_rcv_memcpy(struct xdp_umem *umem, u64 addr, void *from_buf,
+			     u32 len, u32 metalen)
+{
+	void *to_buf = xdp_umem_get_data(umem, addr);
+
+	if (xskq_crosses_non_contig_pg(umem, addr, len + metalen)) {
+		void *next_pg_addr = umem->pages[(addr >> PAGE_SHIFT) + 1].addr;
+		u64 page_start = addr & ~(PAGE_SIZE - 1);
+		u64 first_len = PAGE_SIZE - (addr - page_start);
+
+		memcpy(to_buf, from_buf, first_len + metalen);
+		memcpy(next_pg_addr, from_buf + first_len, len - first_len);
+
+		return;
+	}
+
+	memcpy(to_buf, from_buf, len + metalen);
+}
+
 static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 {
-	void *to_buf, *from_buf;
+	u64 offset = xs->umem->headroom;
+	void *from_buf;
 	u32 metalen;
 	u64 addr;
 	int err;
 
-	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
+	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
 	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
 		xs->rx_dropped++;
 		return -ENOSPC;
 	}
 
-	addr += xs->umem->headroom;
-
 	if (unlikely(xdp_data_meta_unsupported(xdp))) {
 		from_buf = xdp->data;
 		metalen = 0;
@@ -78,9 +99,10 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 		metalen = xdp->data - xdp->data_meta;
 	}
 
-	to_buf = xdp_umem_get_data(xs->umem, addr);
-	memcpy(to_buf, from_buf, len + metalen);
-	addr += metalen;
+	__xsk_rcv_memcpy(xs->umem, addr + offset, from_buf, len, metalen);
+
+	offset += metalen;
+	addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
 	err = xskq_produce_batch_desc(xs->rx, addr, len);
 	if (!err) {
 		xskq_discard_addr(xs->umem->fq);
@@ -125,6 +147,7 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
 	u32 metalen = xdp->data - xdp->data_meta;
 	u32 len = xdp->data_end - xdp->data;
+	u64 offset = xs->umem->headroom;
 	void *buffer;
 	u64 addr;
 	int err;
@@ -136,17 +159,17 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 		goto out_unlock;
 	}
 
-	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
+	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
 	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
 		err = -ENOSPC;
 		goto out_drop;
 	}
 
-	addr += xs->umem->headroom;
-
-	buffer = xdp_umem_get_data(xs->umem, addr);
+	buffer = xdp_umem_get_data(xs->umem, addr + offset);
 	memcpy(buffer, xdp->data_meta, len + metalen);
-	addr += metalen;
+	offset += metalen;
+
+	addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
 	err = xskq_produce_batch_desc(xs->rx, addr, len);
 	if (err)
 		goto out_drop;
@@ -190,7 +213,7 @@ bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc)
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(xs, &umem->xsk_list, list) {
-		if (!xskq_peek_desc(xs->tx, desc))
+		if (!xskq_peek_desc(xs->tx, desc, umem))
 			continue;
 
 		if (xskq_produce_addr_lazy(umem->cq, desc->addr))
@@ -243,7 +266,7 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
 	if (xs->queue_id >= xs->dev->real_num_tx_queues)
 		goto out;
 
-	while (xskq_peek_desc(xs->tx, &desc)) {
+	while (xskq_peek_desc(xs->tx, &desc, xs->umem)) {
 		char *buffer;
 		u64 addr;
 		u32 len;
@@ -262,6 +285,10 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
 
 		skb_put(skb, len);
 		addr = desc.addr;
+		if (xs->umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG)
+			addr = (addr & XSK_UNALIGNED_BUF_ADDR_MASK) +
+				(addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
+
 		buffer = xdp_umem_get_data(xs->umem, addr);
 		err = skb_store_bits(skb, 0, buffer, len);
 		if (unlikely(err) || xskq_reserve_addr(xs->umem->cq)) {
@@ -272,7 +299,7 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
 		skb->dev = xs->dev;
 		skb->priority = sk->sk_priority;
 		skb->mark = sk->sk_mark;
-		skb_shinfo(skb)->destructor_arg = (void *)(long)addr;
+		skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
 		skb->destructor = xsk_destruct_skb;
 
 		err = dev_direct_xmit(skb, xs->queue_id);
@@ -412,6 +439,24 @@ static struct socket *xsk_lookup_xsk_from_fd(int fd)
 	return sock;
 }
 
+/* Check if umem pages are contiguous.
+ * If zero-copy mode, use the DMA address to do the page contiguity check
+ * For all other modes we use addr (kernel virtual address)
+ * Store the result in the low bits of addr.
+ */
+static void xsk_check_page_contiguity(struct xdp_umem *umem, u32 flags)
+{
+	struct xdp_umem_page *pgs = umem->pages;
+	int i, is_contig;
+
+	for (i = 0; i < umem->npgs - 1; i++) {
+		is_contig = (flags & XDP_ZEROCOPY) ?
+			(pgs[i].dma + PAGE_SIZE == pgs[i + 1].dma) :
+			(pgs[i].addr + PAGE_SIZE == pgs[i + 1].addr);
+		pgs[i].addr += is_contig << XSK_NEXT_PG_CONTIG_SHIFT;
+	}
+}
+
 static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 {
 	struct sockaddr_xdp *sxdp = (struct sockaddr_xdp *)addr;
@@ -500,6 +545,8 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 		err = xdp_umem_assign_dev(xs->umem, dev, qid, flags);
 		if (err)
 			goto out_unlock;
+
+		xsk_check_page_contiguity(xs->umem, flags);
 	}
 
 	xs->dev = dev;
diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c
index d5e06c8e0cbf..9986a759fe06 100644
--- a/net/xdp/xsk_diag.c
+++ b/net/xdp/xsk_diag.c
@@ -56,7 +56,7 @@ static int xsk_diag_put_umem(const struct xdp_sock *xs, struct sk_buff *nlskb)
 	du.id = umem->id;
 	du.size = umem->size;
 	du.num_pages = umem->npgs;
-	du.chunk_size = (__u32)(~umem->chunk_mask + 1);
+	du.chunk_size = umem->chunk_size_nohr + umem->headroom;
 	du.headroom = umem->headroom;
 	du.ifindex = umem->dev ? umem->dev->ifindex : 0;
 	du.queue_id = umem->queue_id;
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 909c5168ed0f..3d045c1c94b1 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -133,6 +133,17 @@ static inline bool xskq_has_addrs(struct xsk_queue *q, u32 cnt)
 
 /* UMEM queue */
 
+static inline bool xskq_crosses_non_contig_pg(struct xdp_umem *umem, u64 addr,
+					      u64 length)
+{
+	bool cross_pg = (addr & (PAGE_SIZE - 1)) + length > PAGE_SIZE;
+	bool next_pg_contig =
+		(unsigned long)umem->pages[(addr >> PAGE_SHIFT)].addr &
+			XSK_NEXT_PG_CONTIG_MASK;
+
+	return cross_pg && !next_pg_contig;
+}
+
 static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
 {
 	if (addr >= q->size) {
@@ -143,23 +154,50 @@ static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
 	return true;
 }
 
-static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr)
+static inline bool xskq_is_valid_addr_unaligned(struct xsk_queue *q, u64 addr,
+						u64 length,
+						struct xdp_umem *umem)
+{
+	addr += addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+	addr &= XSK_UNALIGNED_BUF_ADDR_MASK;
+	if (addr >= q->size ||
+	    xskq_crosses_non_contig_pg(umem, addr, length)) {
+		q->invalid_descs++;
+		return false;
+	}
+
+	return true;
+}
+
+static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr,
+				      struct xdp_umem *umem)
 {
 	while (q->cons_tail != q->cons_head) {
 		struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
 		unsigned int idx = q->cons_tail & q->ring_mask;
 
 		*addr = READ_ONCE(ring->desc[idx]) & q->chunk_mask;
+
+		if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG) {
+			if (xskq_is_valid_addr_unaligned(q, *addr,
+							 umem->chunk_size_nohr,
+							 umem))
+				return addr;
+			goto out;
+		}
+
 		if (xskq_is_valid_addr(q, *addr))
 			return addr;
 
+out:
 		q->cons_tail++;
 	}
 
 	return NULL;
 }
 
-static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
+static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr,
+				  struct xdp_umem *umem)
 {
 	if (q->cons_tail == q->cons_head) {
 		smp_mb(); /* D, matches A */
@@ -170,7 +208,7 @@ static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
 		smp_rmb();
 	}
 
-	return xskq_validate_addr(q, addr);
+	return xskq_validate_addr(q, addr, umem);
 }
 
 static inline void xskq_discard_addr(struct xsk_queue *q)
@@ -229,8 +267,21 @@ static inline int xskq_reserve_addr(struct xsk_queue *q)
 
 /* Rx/Tx queue */
 
-static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
+static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d,
+				      struct xdp_umem *umem)
 {
+	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG) {
+		if (!xskq_is_valid_addr_unaligned(q, d->addr, d->len, umem))
+			return false;
+
+		if (d->len > umem->chunk_size_nohr || d->options) {
+			q->invalid_descs++;
+			return false;
+		}
+
+		return true;
+	}
+
 	if (!xskq_is_valid_addr(q, d->addr))
 		return false;
 
@@ -244,14 +295,15 @@ static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
 }
 
 static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
-						  struct xdp_desc *desc)
+						  struct xdp_desc *desc,
+						  struct xdp_umem *umem)
 {
 	while (q->cons_tail != q->cons_head) {
 		struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
 		unsigned int idx = q->cons_tail & q->ring_mask;
 
 		*desc = READ_ONCE(ring->desc[idx]);
-		if (xskq_is_valid_desc(q, desc))
+		if (xskq_is_valid_desc(q, desc, umem))
 			return desc;
 
 		q->cons_tail++;
@@ -261,7 +313,8 @@ static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
 }
 
 static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
-					      struct xdp_desc *desc)
+					      struct xdp_desc *desc,
+					      struct xdp_umem *umem)
 {
 	if (q->cons_tail == q->cons_head) {
 		smp_mb(); /* D, matches A */
@@ -272,7 +325,7 @@ static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
 		smp_rmb(); /* C, matches B */
 	}
 
-	return xskq_validate_desc(q, desc);
+	return xskq_validate_desc(q, desc, umem);
 }
 
 static inline void xskq_discard_desc(struct xsk_queue *q)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v4 04/11] xsk: add support to allow unaligned chunk placement
@ 2019-07-30  8:53         ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-30  8:53 UTC (permalink / raw)
  To: intel-wired-lan

Currently, addresses are chunk size aligned. This means we are very
restricted in terms of where we can place a chunk within the umem. For
example, if we have a chunk size of 2k, then our chunks can only be placed
at 0, 2k, 4k, 6k, 8k and so on (i.e. every 2k starting from 0).

This patch introduces the ability to use unaligned chunks. With these
changes, we are no longer bound to placing chunks at 2k (or whatever the
chunk size is) intervals. Since we are no longer dealing with
aligned chunks, they can now cross page boundaries. Checks for page
contiguity have been added in order to keep track of which pages are
followed by a physically contiguous page.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

---
v2:
  - Add checks for the flags coming from userspace
  - Fix how we get chunk_size in xsk_diag.c
  - Add defines for masking the new descriptor format
  - Modified the rx functions to use new descriptor format
  - Modified the tx functions to use new descriptor format

v3:
  - Add helper function to do address/offset masking/addition

v4:
  - fixed page_start calculation in __xsk_rcv_memcpy().
  - move offset handling to the xdp_umem_get_* functions
  - modified the len field in xdp_umem_reg struct. We now use 16 bits from
    this for the flags field.
  - removed next_pg_contig field from xdp_umem_page struct. Using low 12
    bits of addr to store flags instead.
  - other minor changes based on review comments
---
 include/net/xdp_sock.h      | 40 ++++++++++++++++++-
 include/uapi/linux/if_xdp.h | 14 ++++++-
 net/xdp/xdp_umem.c          | 18 ++++++---
 net/xdp/xsk.c               | 79 +++++++++++++++++++++++++++++--------
 net/xdp/xsk_diag.c          |  2 +-
 net/xdp/xsk_queue.h         | 69 ++++++++++++++++++++++++++++----
 6 files changed, 188 insertions(+), 34 deletions(-)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index 69796d264f06..a755e8ab6cac 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -16,6 +16,13 @@
 struct net_device;
 struct xsk_queue;
 
+/* Masks for xdp_umem_page flags.
+ * The low 12-bits of the addr will be 0 since this is the page address, so we
+ * can use them for flags.
+ */
+#define XSK_NEXT_PG_CONTIG_SHIFT 0
+#define XSK_NEXT_PG_CONTIG_MASK (1ULL << XSK_NEXT_PG_CONTIG_SHIFT)
+
 struct xdp_umem_page {
 	void *addr;
 	dma_addr_t dma;
@@ -48,6 +55,7 @@ struct xdp_umem {
 	bool zc;
 	spinlock_t xsk_list_lock;
 	struct list_head xsk_list;
+	u16 flags;
 };
 
 struct xdp_sock {
@@ -98,12 +106,21 @@ struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev, u16 queue_id);
 
 static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr)
 {
-	return umem->pages[addr >> PAGE_SHIFT].addr + (addr & (PAGE_SIZE - 1));
+	unsigned long page_addr;
+
+	addr += addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+	addr &= XSK_UNALIGNED_BUF_ADDR_MASK;
+	page_addr = (unsigned long)umem->pages[addr >> PAGE_SHIFT].addr;
+
+	return (char *)(page_addr & PAGE_MASK) + (addr & ~PAGE_MASK);
 }
 
 static inline dma_addr_t xdp_umem_get_dma(struct xdp_umem *umem, u64 addr)
 {
-	return umem->pages[addr >> PAGE_SHIFT].dma + (addr & (PAGE_SIZE - 1));
+	addr += addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+	addr &= XSK_UNALIGNED_BUF_ADDR_MASK;
+
+	return umem->pages[addr >> PAGE_SHIFT].dma + (addr & ~PAGE_MASK);
 }
 
 /* Reuse-queue aware version of FILL queue helpers */
@@ -144,6 +161,19 @@ static inline void xsk_umem_fq_reuse(struct xdp_umem *umem, u64 addr)
 
 	rq->handles[rq->length++] = addr;
 }
+
+/* Handle the offset appropriately depending on aligned or unaligned mode.
+ * For unaligned mode, we store the offset in the upper 16-bits of the address.
+ * For aligned mode, we simply add the offset to the address.
+ */
+static inline u64 xsk_umem_adjust_offset(struct xdp_umem *umem, u64 address,
+					 u64 offset)
+{
+	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG)
+		return address + (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
+	else
+		return address + offset;
+}
 #else
 static inline int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
@@ -241,6 +271,12 @@ static inline void xsk_umem_fq_reuse(struct xdp_umem *umem, u64 addr)
 {
 }
 
+static inline u64 xsk_umem_handle_offset(struct xdp_umem *umem, u64 handle,
+					 u64 offset)
+{
+	return 0;
+}
+
 #endif /* CONFIG_XDP_SOCKETS */
 
 #endif /* _LINUX_XDP_SOCK_H */
diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index faaa5ca2a117..4a5490651b22 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -17,6 +17,10 @@
 #define XDP_COPY	(1 << 1) /* Force copy-mode */
 #define XDP_ZEROCOPY	(1 << 2) /* Force zero-copy mode */
 
+/* Flags for xsk_umem_config flags */
+#define XDP_UMEM_UNALIGNED_CHUNK_FLAG_SHIFT 15
+#define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << XDP_UMEM_UNALIGNED_CHUNK_FLAG_SHIFT)
+
 struct sockaddr_xdp {
 	__u16 sxdp_family;
 	__u16 sxdp_flags;
@@ -49,8 +53,9 @@ struct xdp_mmap_offsets {
 #define XDP_OPTIONS			8
 
 struct xdp_umem_reg {
-	__u64 addr; /* Start of packet data area */
-	__u64 len; /* Length of packet data area */
+	__u64 addr;    /* Start of packet data area */
+	__u64 len:48;  /* Length of packet data area */
+	__u64 flags:16; /* Flags for umem */
 	__u32 chunk_size;
 	__u32 headroom;
 };
@@ -74,6 +79,11 @@ struct xdp_options {
 #define XDP_UMEM_PGOFF_FILL_RING	0x100000000ULL
 #define XDP_UMEM_PGOFF_COMPLETION_RING	0x180000000ULL
 
+/* Masks for unaligned chunks mode */
+#define XSK_UNALIGNED_BUF_OFFSET_SHIFT 48
+#define XSK_UNALIGNED_BUF_ADDR_MASK \
+	((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1)
+
 /* Rx/Tx descriptor */
 struct xdp_desc {
 	__u64 addr;
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 83de74ca729a..5590ca7bbe15 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -299,6 +299,7 @@ static int xdp_umem_account_pages(struct xdp_umem *umem)
 
 static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 {
+	bool unaligned_chunks = mr->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG;
 	u32 chunk_size = mr->chunk_size, headroom = mr->headroom;
 	unsigned int chunks, chunks_per_page;
 	u64 addr = mr->addr, size = mr->len;
@@ -314,7 +315,10 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 		return -EINVAL;
 	}
 
-	if (!is_power_of_2(chunk_size))
+	if (mr->flags & ~XDP_UMEM_UNALIGNED_CHUNK_FLAG)
+		return -EINVAL;
+
+	if (!unaligned_chunks && !is_power_of_2(chunk_size))
 		return -EINVAL;
 
 	if (!PAGE_ALIGNED(addr)) {
@@ -331,9 +335,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 	if (chunks == 0)
 		return -EINVAL;
 
-	chunks_per_page = PAGE_SIZE / chunk_size;
-	if (chunks < chunks_per_page || chunks % chunks_per_page)
-		return -EINVAL;
+	if (!unaligned_chunks) {
+		chunks_per_page = PAGE_SIZE / chunk_size;
+		if (chunks < chunks_per_page || chunks % chunks_per_page)
+			return -EINVAL;
+	}
 
 	headroom = ALIGN(headroom, 64);
 
@@ -342,13 +348,15 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 		return -EINVAL;
 
 	umem->address = (unsigned long)addr;
-	umem->chunk_mask = ~((u64)chunk_size - 1);
+	umem->chunk_mask = unaligned_chunks ? XSK_UNALIGNED_BUF_ADDR_MASK
+					    : ~((u64)chunk_size - 1);
 	umem->size = size;
 	umem->headroom = headroom;
 	umem->chunk_size_nohr = chunk_size - headroom;
 	umem->npgs = size / PAGE_SIZE;
 	umem->pgs = NULL;
 	umem->user = NULL;
+	umem->flags = mr->flags;
 	INIT_LIST_HEAD(&umem->xsk_list);
 	spin_lock_init(&umem->xsk_list_lock);
 
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 59b57d708697..9b834d54549e 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -45,7 +45,7 @@ EXPORT_SYMBOL(xsk_umem_has_addrs);
 
 u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
 {
-	return xskq_peek_addr(umem->fq, addr);
+	return xskq_peek_addr(umem->fq, addr, umem);
 }
 EXPORT_SYMBOL(xsk_umem_peek_addr);
 
@@ -55,21 +55,42 @@ void xsk_umem_discard_addr(struct xdp_umem *umem)
 }
 EXPORT_SYMBOL(xsk_umem_discard_addr);
 
+/* If a buffer crosses a page boundary, we need to do 2 memcpy's, one for
+ * each page. This is only required in copy mode.
+ */
+static void __xsk_rcv_memcpy(struct xdp_umem *umem, u64 addr, void *from_buf,
+			     u32 len, u32 metalen)
+{
+	void *to_buf = xdp_umem_get_data(umem, addr);
+
+	if (xskq_crosses_non_contig_pg(umem, addr, len + metalen)) {
+		void *next_pg_addr = umem->pages[(addr >> PAGE_SHIFT) + 1].addr;
+		u64 page_start = addr & ~(PAGE_SIZE - 1);
+		u64 first_len = PAGE_SIZE - (addr - page_start);
+
+		memcpy(to_buf, from_buf, first_len + metalen);
+		memcpy(next_pg_addr, from_buf + first_len, len - first_len);
+
+		return;
+	}
+
+	memcpy(to_buf, from_buf, len + metalen);
+}
+
 static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 {
-	void *to_buf, *from_buf;
+	u64 offset = xs->umem->headroom;
+	void *from_buf;
 	u32 metalen;
 	u64 addr;
 	int err;
 
-	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
+	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
 	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
 		xs->rx_dropped++;
 		return -ENOSPC;
 	}
 
-	addr += xs->umem->headroom;
-
 	if (unlikely(xdp_data_meta_unsupported(xdp))) {
 		from_buf = xdp->data;
 		metalen = 0;
@@ -78,9 +99,10 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 		metalen = xdp->data - xdp->data_meta;
 	}
 
-	to_buf = xdp_umem_get_data(xs->umem, addr);
-	memcpy(to_buf, from_buf, len + metalen);
-	addr += metalen;
+	__xsk_rcv_memcpy(xs->umem, addr + offset, from_buf, len, metalen);
+
+	offset += metalen;
+	addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
 	err = xskq_produce_batch_desc(xs->rx, addr, len);
 	if (!err) {
 		xskq_discard_addr(xs->umem->fq);
@@ -125,6 +147,7 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
 	u32 metalen = xdp->data - xdp->data_meta;
 	u32 len = xdp->data_end - xdp->data;
+	u64 offset = xs->umem->headroom;
 	void *buffer;
 	u64 addr;
 	int err;
@@ -136,17 +159,17 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 		goto out_unlock;
 	}
 
-	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
+	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
 	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
 		err = -ENOSPC;
 		goto out_drop;
 	}
 
-	addr += xs->umem->headroom;
-
-	buffer = xdp_umem_get_data(xs->umem, addr);
+	buffer = xdp_umem_get_data(xs->umem, addr + offset);
 	memcpy(buffer, xdp->data_meta, len + metalen);
-	addr += metalen;
+	offset += metalen;
+
+	addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
 	err = xskq_produce_batch_desc(xs->rx, addr, len);
 	if (err)
 		goto out_drop;
@@ -190,7 +213,7 @@ bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc)
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(xs, &umem->xsk_list, list) {
-		if (!xskq_peek_desc(xs->tx, desc))
+		if (!xskq_peek_desc(xs->tx, desc, umem))
 			continue;
 
 		if (xskq_produce_addr_lazy(umem->cq, desc->addr))
@@ -243,7 +266,7 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
 	if (xs->queue_id >= xs->dev->real_num_tx_queues)
 		goto out;
 
-	while (xskq_peek_desc(xs->tx, &desc)) {
+	while (xskq_peek_desc(xs->tx, &desc, xs->umem)) {
 		char *buffer;
 		u64 addr;
 		u32 len;
@@ -262,6 +285,10 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
 
 		skb_put(skb, len);
 		addr = desc.addr;
+		if (xs->umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG)
+			addr = (addr & XSK_UNALIGNED_BUF_ADDR_MASK) +
+				(addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
+
 		buffer = xdp_umem_get_data(xs->umem, addr);
 		err = skb_store_bits(skb, 0, buffer, len);
 		if (unlikely(err) || xskq_reserve_addr(xs->umem->cq)) {
@@ -272,7 +299,7 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
 		skb->dev = xs->dev;
 		skb->priority = sk->sk_priority;
 		skb->mark = sk->sk_mark;
-		skb_shinfo(skb)->destructor_arg = (void *)(long)addr;
+		skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
 		skb->destructor = xsk_destruct_skb;
 
 		err = dev_direct_xmit(skb, xs->queue_id);
@@ -412,6 +439,24 @@ static struct socket *xsk_lookup_xsk_from_fd(int fd)
 	return sock;
 }
 
+/* Check if umem pages are contiguous.
+ * If zero-copy mode, use the DMA address to do the page contiguity check
+ * For all other modes we use addr (kernel virtual address)
+ * Store the result in the low bits of addr.
+ */
+static void xsk_check_page_contiguity(struct xdp_umem *umem, u32 flags)
+{
+	struct xdp_umem_page *pgs = umem->pages;
+	int i, is_contig;
+
+	for (i = 0; i < umem->npgs - 1; i++) {
+		is_contig = (flags & XDP_ZEROCOPY) ?
+			(pgs[i].dma + PAGE_SIZE == pgs[i + 1].dma) :
+			(pgs[i].addr + PAGE_SIZE == pgs[i + 1].addr);
+		pgs[i].addr += is_contig << XSK_NEXT_PG_CONTIG_SHIFT;
+	}
+}
+
 static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 {
 	struct sockaddr_xdp *sxdp = (struct sockaddr_xdp *)addr;
@@ -500,6 +545,8 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 		err = xdp_umem_assign_dev(xs->umem, dev, qid, flags);
 		if (err)
 			goto out_unlock;
+
+		xsk_check_page_contiguity(xs->umem, flags);
 	}
 
 	xs->dev = dev;
diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c
index d5e06c8e0cbf..9986a759fe06 100644
--- a/net/xdp/xsk_diag.c
+++ b/net/xdp/xsk_diag.c
@@ -56,7 +56,7 @@ static int xsk_diag_put_umem(const struct xdp_sock *xs, struct sk_buff *nlskb)
 	du.id = umem->id;
 	du.size = umem->size;
 	du.num_pages = umem->npgs;
-	du.chunk_size = (__u32)(~umem->chunk_mask + 1);
+	du.chunk_size = umem->chunk_size_nohr + umem->headroom;
 	du.headroom = umem->headroom;
 	du.ifindex = umem->dev ? umem->dev->ifindex : 0;
 	du.queue_id = umem->queue_id;
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 909c5168ed0f..3d045c1c94b1 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -133,6 +133,17 @@ static inline bool xskq_has_addrs(struct xsk_queue *q, u32 cnt)
 
 /* UMEM queue */
 
+static inline bool xskq_crosses_non_contig_pg(struct xdp_umem *umem, u64 addr,
+					      u64 length)
+{
+	bool cross_pg = (addr & (PAGE_SIZE - 1)) + length > PAGE_SIZE;
+	bool next_pg_contig =
+		(unsigned long)umem->pages[(addr >> PAGE_SHIFT)].addr &
+			XSK_NEXT_PG_CONTIG_MASK;
+
+	return cross_pg && !next_pg_contig;
+}
+
 static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
 {
 	if (addr >= q->size) {
@@ -143,23 +154,50 @@ static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
 	return true;
 }
 
-static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr)
+static inline bool xskq_is_valid_addr_unaligned(struct xsk_queue *q, u64 addr,
+						u64 length,
+						struct xdp_umem *umem)
+{
+	addr += addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+	addr &= XSK_UNALIGNED_BUF_ADDR_MASK;
+	if (addr >= q->size ||
+	    xskq_crosses_non_contig_pg(umem, addr, length)) {
+		q->invalid_descs++;
+		return false;
+	}
+
+	return true;
+}
+
+static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr,
+				      struct xdp_umem *umem)
 {
 	while (q->cons_tail != q->cons_head) {
 		struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
 		unsigned int idx = q->cons_tail & q->ring_mask;
 
 		*addr = READ_ONCE(ring->desc[idx]) & q->chunk_mask;
+
+		if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG) {
+			if (xskq_is_valid_addr_unaligned(q, *addr,
+							 umem->chunk_size_nohr,
+							 umem))
+				return addr;
+			goto out;
+		}
+
 		if (xskq_is_valid_addr(q, *addr))
 			return addr;
 
+out:
 		q->cons_tail++;
 	}
 
 	return NULL;
 }
 
-static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
+static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr,
+				  struct xdp_umem *umem)
 {
 	if (q->cons_tail == q->cons_head) {
 		smp_mb(); /* D, matches A */
@@ -170,7 +208,7 @@ static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
 		smp_rmb();
 	}
 
-	return xskq_validate_addr(q, addr);
+	return xskq_validate_addr(q, addr, umem);
 }
 
 static inline void xskq_discard_addr(struct xsk_queue *q)
@@ -229,8 +267,21 @@ static inline int xskq_reserve_addr(struct xsk_queue *q)
 
 /* Rx/Tx queue */
 
-static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
+static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d,
+				      struct xdp_umem *umem)
 {
+	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG) {
+		if (!xskq_is_valid_addr_unaligned(q, d->addr, d->len, umem))
+			return false;
+
+		if (d->len > umem->chunk_size_nohr || d->options) {
+			q->invalid_descs++;
+			return false;
+		}
+
+		return true;
+	}
+
 	if (!xskq_is_valid_addr(q, d->addr))
 		return false;
 
@@ -244,14 +295,15 @@ static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
 }
 
 static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
-						  struct xdp_desc *desc)
+						  struct xdp_desc *desc,
+						  struct xdp_umem *umem)
 {
 	while (q->cons_tail != q->cons_head) {
 		struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
 		unsigned int idx = q->cons_tail & q->ring_mask;
 
 		*desc = READ_ONCE(ring->desc[idx]);
-		if (xskq_is_valid_desc(q, desc))
+		if (xskq_is_valid_desc(q, desc, umem))
 			return desc;
 
 		q->cons_tail++;
@@ -261,7 +313,8 @@ static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
 }
 
 static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
-					      struct xdp_desc *desc)
+					      struct xdp_desc *desc,
+					      struct xdp_umem *umem)
 {
 	if (q->cons_tail == q->cons_head) {
 		smp_mb(); /* D, matches A */
@@ -272,7 +325,7 @@ static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
 		smp_rmb(); /* C, matches B */
 	}
 
-	return xskq_validate_desc(q, desc);
+	return xskq_validate_desc(q, desc, umem);
 }
 
 static inline void xskq_discard_desc(struct xsk_queue *q)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v4 05/11] i40e: modify driver for handling offsets
  2019-07-30  8:53       ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-30  8:53         ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-30  8:53 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

With the addition of the unaligned chunks option, we need to make sure we
handle the offsets accordingly based on the mode we are currently running
in. This patch modifies the driver to appropriately mask the address for
each case.
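
As a rough worked example (numbers made up, not taken from this patch) of
the two handle formats the Rx path now has to produce:

/* A chunk at umem base 0x3000, with a 320-byte offset (headroom plus the
 * XDP data offset):
 *
 *   aligned mode:   handle = 0x3000 + 320         = 0x0000000000003140
 *   unaligned mode: handle = 0x3000 + (320 << 48) = 0x0140000000003000
 *
 * xsk_umem_adjust_offset(), added in the previous patch, hides this
 * difference, which is why the driver switches to it below.
 */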

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>

---
v3:
  - Use new helper function for handling the offset
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index dfa096db2244..09dd8fe28c35 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -190,7 +190,9 @@ int i40e_xsk_umem_setup(struct i40e_vsi *vsi, struct xdp_umem *umem,
  **/
 static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 {
+	struct xdp_umem *umem = rx_ring->xsk_umem;
 	int err, result = I40E_XDP_PASS;
+	u64 offset = umem->headroom;
 	struct i40e_ring *xdp_ring;
 	struct bpf_prog *xdp_prog;
 	u32 act;
@@ -201,7 +203,10 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 	 */
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
 	act = bpf_prog_run_xdp(xdp_prog, xdp);
-	xdp->handle += xdp->data - xdp->data_hard_start;
+	offset += xdp->data - xdp->data_hard_start;
+
+	xdp->handle = xsk_umem_adjust_offset(umem, xdp->handle, offset);
+
 	switch (act) {
 	case XDP_PASS:
 		break;
@@ -262,7 +267,7 @@ static bool i40e_alloc_buffer_zc(struct i40e_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr(umem);
 	return true;
@@ -299,7 +304,7 @@ static bool i40e_alloc_buffer_slow_zc(struct i40e_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr_rq(umem);
 	return true;
@@ -464,7 +469,7 @@ void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
 	bi->addr = xdp_umem_get_data(rx_ring->xsk_umem, handle);
 	bi->addr += hr;
 
-	bi->handle = (u64)handle + rx_ring->xsk_umem->headroom;
+	bi->handle = (u64)handle;
 }
 
 /**
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v4 05/11] i40e: modify driver for handling offsets
@ 2019-07-30  8:53         ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-30  8:53 UTC (permalink / raw)
  To: intel-wired-lan

With the addition of the unaligned chunks option, we need to make sure we
handle the offsets accordingly based on the mode we are currently running
in. This patch modifies the driver to appropriately mask the address for
each case.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>

---
v3:
  - Use new helper function for handling the offset
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index dfa096db2244..09dd8fe28c35 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -190,7 +190,9 @@ int i40e_xsk_umem_setup(struct i40e_vsi *vsi, struct xdp_umem *umem,
  **/
 static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 {
+	struct xdp_umem *umem = rx_ring->xsk_umem;
 	int err, result = I40E_XDP_PASS;
+	u64 offset = umem->headroom;
 	struct i40e_ring *xdp_ring;
 	struct bpf_prog *xdp_prog;
 	u32 act;
@@ -201,7 +203,10 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 	 */
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
 	act = bpf_prog_run_xdp(xdp_prog, xdp);
-	xdp->handle += xdp->data - xdp->data_hard_start;
+	offset += xdp->data - xdp->data_hard_start;
+
+	xdp->handle = xsk_umem_adjust_offset(umem, xdp->handle, offset);
+
 	switch (act) {
 	case XDP_PASS:
 		break;
@@ -262,7 +267,7 @@ static bool i40e_alloc_buffer_zc(struct i40e_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr(umem);
 	return true;
@@ -299,7 +304,7 @@ static bool i40e_alloc_buffer_slow_zc(struct i40e_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr_rq(umem);
 	return true;
@@ -464,7 +469,7 @@ void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
 	bi->addr = xdp_umem_get_data(rx_ring->xsk_umem, handle);
 	bi->addr += hr;
 
-	bi->handle = (u64)handle + rx_ring->xsk_umem->headroom;
+	bi->handle = (u64)handle;
 }
 
 /**
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v4 06/11] ixgbe: modify driver for handling offsets
  2019-07-30  8:53       ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-30  8:53         ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-30  8:53 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

With the addition of the unaligned chunks option, we need to make sure we
handle the offsets accordingly based on the mode we are currently running
in. This patch modifies the driver to appropriately mask the address for
each case.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>

---
v3:
  - Use new helper function to handle offset
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index bc86057628c8..11c400f8a6df 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -143,7 +143,9 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
 			    struct ixgbe_ring *rx_ring,
 			    struct xdp_buff *xdp)
 {
+	struct xdp_umem *umem = rx_ring->xsk_umem;
 	int err, result = IXGBE_XDP_PASS;
+	u64 offset = umem->headroom;
 	struct bpf_prog *xdp_prog;
 	struct xdp_frame *xdpf;
 	u32 act;
@@ -151,7 +153,10 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
 	rcu_read_lock();
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
 	act = bpf_prog_run_xdp(xdp_prog, xdp);
-	xdp->handle += xdp->data - xdp->data_hard_start;
+	offset += xdp->data - xdp->data_hard_start;
+
+	xdp->handle = xsk_umem_adjust_offset(umem, xdp->handle, offset);
+
 	switch (act) {
 	case XDP_PASS:
 		break;
@@ -243,7 +248,7 @@ void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
 	bi->addr = xdp_umem_get_data(rx_ring->xsk_umem, handle);
 	bi->addr += hr;
 
-	bi->handle = (u64)handle + rx_ring->xsk_umem->headroom;
+	bi->handle = (u64)handle;
 }
 
 static bool ixgbe_alloc_buffer_zc(struct ixgbe_ring *rx_ring,
@@ -269,7 +274,7 @@ static bool ixgbe_alloc_buffer_zc(struct ixgbe_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr(umem);
 	return true;
@@ -296,7 +301,7 @@ static bool ixgbe_alloc_buffer_slow_zc(struct ixgbe_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr_rq(umem);
 	return true;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v4 06/11] ixgbe: modify driver for handling offsets
@ 2019-07-30  8:53         ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-30  8:53 UTC (permalink / raw)
  To: intel-wired-lan

With the addition of the unaligned chunks option, we need to make sure we
handle the offsets accordingly based on the mode we are currently running
in. This patch modifies the driver to appropriately mask the address for
each case.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>

---
v3:
  - Use new helper function to handle offset
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index bc86057628c8..11c400f8a6df 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -143,7 +143,9 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
 			    struct ixgbe_ring *rx_ring,
 			    struct xdp_buff *xdp)
 {
+	struct xdp_umem *umem = rx_ring->xsk_umem;
 	int err, result = IXGBE_XDP_PASS;
+	u64 offset = umem->headroom;
 	struct bpf_prog *xdp_prog;
 	struct xdp_frame *xdpf;
 	u32 act;
@@ -151,7 +153,10 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
 	rcu_read_lock();
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
 	act = bpf_prog_run_xdp(xdp_prog, xdp);
-	xdp->handle += xdp->data - xdp->data_hard_start;
+	offset += xdp->data - xdp->data_hard_start;
+
+	xdp->handle = xsk_umem_adjust_offset(umem, xdp->handle, offset);
+
 	switch (act) {
 	case XDP_PASS:
 		break;
@@ -243,7 +248,7 @@ void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
 	bi->addr = xdp_umem_get_data(rx_ring->xsk_umem, handle);
 	bi->addr += hr;
 
-	bi->handle = (u64)handle + rx_ring->xsk_umem->headroom;
+	bi->handle = (u64)handle;
 }
 
 static bool ixgbe_alloc_buffer_zc(struct ixgbe_ring *rx_ring,
@@ -269,7 +274,7 @@ static bool ixgbe_alloc_buffer_zc(struct ixgbe_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr(umem);
 	return true;
@@ -296,7 +301,7 @@ static bool ixgbe_alloc_buffer_slow_zc(struct ixgbe_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr_rq(umem);
 	return true;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v4 07/11] mlx5e: modify driver for handling offsets
  2019-07-30  8:53       ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-30  8:53         ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-30  8:53 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

With the addition of the unaligned chunks option, we need to make sure we
handle the offsets accordingly based on the mode we are currently running
in. This patch modifies the driver to appropriately mask the address for
each case.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>

---
v3:
  - Use new helper function to handle offset

v4:
  - fixed headroom addition to handle. Using xsk_umem_adjust_offset()
    now.
---
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c    | 8 ++++++--
 drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c | 3 ++-
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index b0b982cf69bb..d5245893d2c8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -122,6 +122,7 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
 		      void *va, u16 *rx_headroom, u32 *len, bool xsk)
 {
 	struct bpf_prog *prog = READ_ONCE(rq->xdp_prog);
+	struct xdp_umem *umem = rq->umem;
 	struct xdp_buff xdp;
 	u32 act;
 	int err;
@@ -138,8 +139,11 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
 	xdp.rxq = &rq->xdp_rxq;
 
 	act = bpf_prog_run_xdp(prog, &xdp);
-	if (xsk)
-		xdp.handle += xdp.data - xdp.data_hard_start;
+	if (xsk) {
+		u64 off = xdp.data - xdp.data_hard_start;
+
+		xdp.handle = xsk_umem_handle_offset(umem, xdp.handle, off);
+	}
 	switch (act) {
 	case XDP_PASS:
 		*rx_headroom = xdp.data - xdp.data_hard_start;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
index 6a55573ec8f2..7c49a66d28c9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
@@ -24,7 +24,8 @@ int mlx5e_xsk_page_alloc_umem(struct mlx5e_rq *rq,
 	if (!xsk_umem_peek_addr_rq(umem, &handle))
 		return -ENOMEM;
 
-	dma_info->xsk.handle = handle + rq->buff.umem_headroom;
+	dma_info->xsk.handle = xsk_umem_adjust_offset(umem, handle,
+						      rq->buff.umem_headroom);
 	dma_info->xsk.data = xdp_umem_get_data(umem, dma_info->xsk.handle);
 
 	/* No need to add headroom to the DMA address. In striding RQ case, we
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v4 08/11] samples/bpf: add unaligned chunks mode support to xdpsock
  2019-07-30  8:53       ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-30  8:53         ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-30  8:53 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

This patch adds support for the unaligned chunks mode. The addition of the
unaligned chunks option will allow users to run the application with more
relaxed chunk placement in the XDP umem.

Unaligned chunks mode can be used with the '-u' or '--unaligned' command
line options.
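
For context, a rough sketch of what enabling the mode looks like from the
application side (field values other than the flag are the sample's defaults;
the flags field itself comes from the libbpf patch in this series):

	struct xsk_umem_config cfg = {
		.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
		.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
		.frame_size = opt_xsk_frame_size, /* need not be a power of two here */
		.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM,
		.flags = XDP_UMEM_UNALIGNED_CHUNK_FLAG,
	};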

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>

---
v4:
  - updated help text for -f
  - use new chunk flag define
---
 samples/bpf/xdpsock_user.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 93eaaf7239b2..756b00eb1afe 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -51,6 +51,7 @@
 
 typedef __u64 u64;
 typedef __u32 u32;
+typedef __u16 u16;
 
 static unsigned long prev_time;
 
@@ -67,6 +68,8 @@ static int opt_ifindex;
 static int opt_queue;
 static int opt_poll;
 static int opt_interval = 1;
+static u16 opt_umem_flags;
+static int opt_unaligned_chunks;
 static u32 opt_xdp_bind_flags;
 static int opt_xsk_frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
 static __u32 prog_id;
@@ -282,7 +285,9 @@ static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
 		.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
 		.frame_size = opt_xsk_frame_size,
 		.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM,
+		.flags = opt_umem_flags
 	};
+
 	int ret;
 
 	umem = calloc(1, sizeof(*umem));
@@ -291,6 +296,7 @@ static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
 
 	ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
 			       &cfg);
+
 	if (ret)
 		exit_with_error(-ret);
 
@@ -352,6 +358,7 @@ static struct option long_options[] = {
 	{"zero-copy", no_argument, 0, 'z'},
 	{"copy", no_argument, 0, 'c'},
 	{"frame-size", required_argument, 0, 'f'},
+	{"unaligned", no_argument, 0, 'u'},
 	{0, 0, 0, 0}
 };
 
@@ -371,7 +378,8 @@ static void usage(const char *prog)
 		"  -n, --interval=n	Specify statistics update interval (default 1 sec).\n"
 		"  -z, --zero-copy      Force zero-copy mode.\n"
 		"  -c, --copy           Force copy mode.\n"
-		"  -f, --frame-size=n   Set the frame size (must be a power of two, default is %d).\n"
+		"  -f, --frame-size=n   Set the frame size (must be a power of two in aligned mode, default is %d).\n"
+		"  -u, --unaligned	Enable unaligned chunk placement\n"
 		"\n";
 	fprintf(stderr, str, prog, XSK_UMEM__DEFAULT_FRAME_SIZE);
 	exit(EXIT_FAILURE);
@@ -384,7 +392,7 @@ static void parse_command_line(int argc, char **argv)
 	opterr = 0;
 
 	for (;;) {
-		c = getopt_long(argc, argv, "Frtli:q:psSNn:czf:", long_options,
+		c = getopt_long(argc, argv, "Frtli:q:psSNn:czf:u", long_options,
 				&option_index);
 		if (c == -1)
 			break;
@@ -424,12 +432,17 @@ static void parse_command_line(int argc, char **argv)
 		case 'c':
 			opt_xdp_bind_flags |= XDP_COPY;
 			break;
+		case 'u':
+			opt_umem_flags |= XDP_UMEM_UNALIGNED_CHUNK_FLAG;
+			opt_unaligned_chunks = 1;
+			break;
 		case 'F':
 			opt_xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
 			break;
 		case 'f':
 			opt_xsk_frame_size = atoi(optarg);
 			break;
+
 		default:
 			usage(basename(argv[0]));
 		}
@@ -442,7 +455,8 @@ static void parse_command_line(int argc, char **argv)
 		usage(basename(argv[0]));
 	}
 
-	if (opt_xsk_frame_size & (opt_xsk_frame_size - 1)) {
+	if ((opt_xsk_frame_size & (opt_xsk_frame_size - 1)) &&
+	    !opt_unaligned_chunks) {
 		fprintf(stderr, "--frame-size=%d is not a power of two\n",
 			opt_xsk_frame_size);
 		usage(basename(argv[0]));
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v4 09/11] samples/bpf: add buffer recycling for unaligned chunks to xdpsock
  2019-07-30  8:53       ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-30  8:53         ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-30  8:53 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

This patch adds buffer recycling support for unaligned buffers. Since we
don't mask the addr to 2k at umem_reg in unaligned mode, we need to make
sure we give back the correct (original) addr to the fill queue. We achieve
this using the new descriptor format and its associated masks. The new
format uses the upper 16 bits for the offset and the lower 48 bits for the
addr. Since we have a dedicated field for the offset, we no longer need to
modify the actual address. As such, all we have to do to get back the
original address is mask off the upper 16 bits (i.e. strip the offset and
we are left with the address on its own).
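
As a rough sketch (reusing the sample's helpers and the masks from the uapi
header), recovering the packet data and recycling the original addr looks
like this:

	u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
	u64 offset = addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT; /* upper 16 bits */
	char *pkt;

	addr &= XSK_UNALIGNED_BUF_ADDR_MASK; /* original addr, lower 48 bits */
	pkt = xsk_umem__get_data(xsk->umem->buffer, addr + offset);

	/* ... use pkt ..., then give the unmodified addr back to the fill queue */
	*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = addr;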

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

---
v2:
  - Removed unused defines
  - Fix buffer recycling for unaligned case
  - Remove --buf-size (--frame-size merged before this)
  - Modifications to use the new descriptor format for buffer recycling
---
 samples/bpf/xdpsock_user.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 756b00eb1afe..62b2059cd0e3 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -475,6 +475,7 @@ static void kick_tx(struct xsk_socket_info *xsk)
 
 static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk)
 {
+	struct xsk_umem_info *umem = xsk->umem;
 	u32 idx_cq = 0, idx_fq = 0;
 	unsigned int rcvd;
 	size_t ndescs;
@@ -487,22 +488,21 @@ static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk)
 		xsk->outstanding_tx;
 
 	/* re-add completed Tx buffers */
-	rcvd = xsk_ring_cons__peek(&xsk->umem->cq, ndescs, &idx_cq);
+	rcvd = xsk_ring_cons__peek(&umem->cq, ndescs, &idx_cq);
 	if (rcvd > 0) {
 		unsigned int i;
 		int ret;
 
-		ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
+		ret = xsk_ring_prod__reserve(&umem->fq, rcvd, &idx_fq);
 		while (ret != rcvd) {
 			if (ret < 0)
 				exit_with_error(-ret);
-			ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd,
-						     &idx_fq);
+			ret = xsk_ring_prod__reserve(&umem->fq, rcvd, &idx_fq);
 		}
+
 		for (i = 0; i < rcvd; i++)
-			*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) =
-				*xsk_ring_cons__comp_addr(&xsk->umem->cq,
-							  idx_cq++);
+			*xsk_ring_prod__fill_addr(&umem->fq, idx_fq++) =
+				*xsk_ring_cons__comp_addr(&umem->cq, idx_cq++);
 
 		xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
 		xsk_ring_cons__release(&xsk->umem->cq, rcvd);
@@ -549,7 +549,11 @@ static void rx_drop(struct xsk_socket_info *xsk)
 	for (i = 0; i < rcvd; i++) {
 		u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
 		u32 len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++)->len;
-		char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
+		u64 offset = addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+
+		addr &= XSK_UNALIGNED_BUF_ADDR_MASK;
+		char *pkt = xsk_umem__get_data(xsk->umem->buffer,
+				addr + offset);
 
 		hex_dump(pkt, len, addr);
 		*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = addr;
@@ -655,7 +659,9 @@ static void l2fwd(struct xsk_socket_info *xsk)
 							  idx_rx)->addr;
 			u32 len = xsk_ring_cons__rx_desc(&xsk->rx,
 							 idx_rx++)->len;
-			char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
+			u64 offset = addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+			char *pkt = xsk_umem__get_data(xsk->umem->buffer,
+				(addr & XSK_UNALIGNED_BUF_ADDR_MASK) + offset);
 
 			swap_mac_addresses(pkt);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v4 10/11] samples/bpf: use hugepages in xdpsock app
  2019-07-30  8:53       ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-30  8:53         ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-30  8:53 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

This patch modifies xdpsock to use mmap instead of posix_memalign. With
this change, we can use hugepages when running the application in unaligned
chunks mode. Using hugepages makes it more likely that we get physically
contiguous memory, which better suits the unaligned chunk mode.
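
Note that MAP_HUGETLB typically requires hugepages to be reserved up front
(e.g. via vm.nr_hugepages), so a possible fallback, sketched below and not
part of this patch, is to retry with regular pages:

	bufs = mmap(NULL, NUM_FRAMES * opt_xsk_frame_size,
		    PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS | opt_mmap_flags, -1, 0);
	if (bufs == MAP_FAILED && opt_mmap_flags) {
		/* no hugepages available: fall back to regular pages */
		bufs = mmap(NULL, NUM_FRAMES * opt_xsk_frame_size,
			    PROT_READ | PROT_WRITE,
			    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	}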

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 samples/bpf/xdpsock_user.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 62b2059cd0e3..d1c61ec0e697 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -70,6 +70,7 @@ static int opt_poll;
 static int opt_interval = 1;
 static u16 opt_umem_flags;
 static int opt_unaligned_chunks;
+static int opt_mmap_flags;
 static u32 opt_xdp_bind_flags;
 static int opt_xsk_frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
 static __u32 prog_id;
@@ -435,6 +436,7 @@ static void parse_command_line(int argc, char **argv)
 		case 'u':
 			opt_umem_flags |= XDP_UMEM_UNALIGNED_CHUNK_FLAG;
 			opt_unaligned_chunks = 1;
+			opt_mmap_flags = MAP_HUGETLB;
 			break;
 		case 'F':
 			opt_xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
@@ -694,11 +696,14 @@ int main(int argc, char **argv)
 		exit(EXIT_FAILURE);
 	}
 
-	ret = posix_memalign(&bufs, getpagesize(), /* PAGE_SIZE aligned */
-			     NUM_FRAMES * opt_xsk_frame_size);
-	if (ret)
-		exit_with_error(ret);
-
+	/* Reserve memory for the umem. Use hugepages if unaligned chunk mode */
+	bufs = mmap(NULL, NUM_FRAMES * opt_xsk_frame_size,
+		    PROT_READ | PROT_WRITE,
+		    MAP_PRIVATE | MAP_ANONYMOUS | opt_mmap_flags, -1, 0);
+	if (bufs == MAP_FAILED) {
+		printf("ERROR: mmap failed\n");
+		exit(EXIT_FAILURE);
+	}
        /* Create sockets... */
 	umem = xsk_configure_umem(bufs, NUM_FRAMES * opt_xsk_frame_size);
 	xsks[num_socks++] = xsk_configure_socket(umem);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v4 11/11] doc/af_xdp: include unaligned chunk case
  2019-07-30  8:53       ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-30  8:54         ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-07-30  8:54 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

With the addition of unaligned chunks mode, the documentation needs to be
updated to indicate that the incoming addr to the fill ring will only be
masked if the user application is run in the aligned chunk mode. This patch
also adds a line to explicitly indicate that the incoming addr will not be
masked if running the user application in the unaligned chunk mode.
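
In other words, a short sketch of the difference (chunk_size and addr are
just illustrative values matching the example below):

	u64 chunk_size = 2048, addr = 3000;

	/* aligned chunk mode: the log2(chunk_size) LSBs are masked off,
	 * so 2048, 2050 and 3000 all refer to the chunk at 2048
	 */
	u64 chunk = addr & ~(chunk_size - 1);

	/* unaligned chunk mode: the incoming addr (3000) is used as-is */
	u64 chunk_unaligned = addr;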

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 Documentation/networking/af_xdp.rst | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst
index eeedc2e826aa..83f7ae5fc045 100644
--- a/Documentation/networking/af_xdp.rst
+++ b/Documentation/networking/af_xdp.rst
@@ -153,10 +153,12 @@ an example, if the UMEM is 64k and each chunk is 4k, then the UMEM has
 
 Frames passed to the kernel are used for the ingress path (RX rings).
 
-The user application produces UMEM addrs to this ring. Note that the
-kernel will mask the incoming addr. E.g. for a chunk size of 2k, the
-log2(2048) LSB of the addr will be masked off, meaning that 2048, 2050
-and 3000 refers to the same chunk.
+The user application produces UMEM addrs to this ring. Note that, if
+running the application with aligned chunk mode, the kernel will mask
+the incoming addr.  E.g. for a chunk size of 2k, the log2(2048) LSB of
+the addr will be masked off, meaning that 2048, 2050 and 3000 refers
+to the same chunk. If the user application is run in the unaligned
+chunks mode, then the incoming addr will be left untouched.
 
 
 UMEM Completion Ring
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* Re: [Intel-wired-lan] [PATCH bpf-next v4 03/11] libbpf: add flags to umem config
  2019-07-30  8:53         ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-31 12:45           ` Björn Töpel
  -1 siblings, 0 replies; 274+ messages in thread
From: Björn Töpel @ 2019-07-31 12:45 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: Netdev, Alexei Starovoitov, Daniel Borkmann,
	Björn Töpel, Karlsson, Magnus, Jakub Kicinski,
	Jonathan Lemon, Saeed Mahameed, Maxim Mikityanskiy,
	Stephen Hemminger, Bruce Richardson, ciara.loftus,
	intel-wired-lan, bpf

On Tue, 30 Jul 2019 at 19:43, Kevin Laatz <kevin.laatz@intel.com> wrote:
>
> This patch adds a 'flags' field to the umem_config and umem_reg structs.
> This will allow for more options to be added for configuring umems.
>
> The first use for the flags field is to add a flag for unaligned chunks
> mode. These flags can either be user-provided or filled with a default.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
>
> ---
> v2:
>   - Removed the headroom check from this patch. It has moved to the
>     previous patch.
>
> v4:
>   - modified chunk flag define
> ---
>  tools/include/uapi/linux/if_xdp.h | 9 +++++++--
>  tools/lib/bpf/xsk.c               | 3 +++
>  tools/lib/bpf/xsk.h               | 2 ++
>  3 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h
> index faaa5ca2a117..a691802d7915 100644
> --- a/tools/include/uapi/linux/if_xdp.h
> +++ b/tools/include/uapi/linux/if_xdp.h
> @@ -17,6 +17,10 @@
>  #define XDP_COPY       (1 << 1) /* Force copy-mode */
>  #define XDP_ZEROCOPY   (1 << 2) /* Force zero-copy mode */
>
> +/* Flags for xsk_umem_config flags */
> +#define XDP_UMEM_UNALIGNED_CHUNK_FLAG_SHIFT 15
> +#define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << XDP_UMEM_UNALIGNED_CHUNK_FLAG_SHIFT)
> +
>  struct sockaddr_xdp {
>         __u16 sxdp_family;
>         __u16 sxdp_flags;
> @@ -49,8 +53,9 @@ struct xdp_mmap_offsets {
>  #define XDP_OPTIONS                    8
>
>  struct xdp_umem_reg {
> -       __u64 addr; /* Start of packet data area */
> -       __u64 len; /* Length of packet data area */
> +       __u64 addr;     /* Start of packet data area */
> +       __u64 len:48;   /* Length of packet data area */
> +       __u64 flags:16; /* Flags for umem */

So, the flags member moved from struct sockaddr_xdp to struct
xdp_umem_reg. Makes sense. However, I'm not a fan of the bitfield. Why
not just add the flags member after the last member (headroom) and
deal with it in xsk.c/xsk_setsockopt and libbpf? The bitfield
preserves the size, but makes it hard to read/error prone IMO.


Björn


>         __u32 chunk_size;
>         __u32 headroom;
>  };
> diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
> index 5007b5d4fd2c..5e7e4d420ee0 100644
> --- a/tools/lib/bpf/xsk.c
> +++ b/tools/lib/bpf/xsk.c
> @@ -116,6 +116,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg,
>                 cfg->comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
>                 cfg->frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
>                 cfg->frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM;
> +               cfg->flags = XSK_UMEM__DEFAULT_FLAGS;
>                 return;
>         }
>
> @@ -123,6 +124,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg,
>         cfg->comp_size = usr_cfg->comp_size;
>         cfg->frame_size = usr_cfg->frame_size;
>         cfg->frame_headroom = usr_cfg->frame_headroom;
> +       cfg->flags = usr_cfg->flags;
>  }
>
>  static int xsk_set_xdp_socket_config(struct xsk_socket_config *cfg,
> @@ -182,6 +184,7 @@ int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
>         mr.len = size;
>         mr.chunk_size = umem->config.frame_size;
>         mr.headroom = umem->config.frame_headroom;
> +       mr.flags = umem->config.flags;
>
>         err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr));
>         if (err) {
> diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
> index 833a6e60d065..44a03d8c34b9 100644
> --- a/tools/lib/bpf/xsk.h
> +++ b/tools/lib/bpf/xsk.h
> @@ -170,12 +170,14 @@ LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
>  #define XSK_UMEM__DEFAULT_FRAME_SHIFT    12 /* 4096 bytes */
>  #define XSK_UMEM__DEFAULT_FRAME_SIZE     (1 << XSK_UMEM__DEFAULT_FRAME_SHIFT)
>  #define XSK_UMEM__DEFAULT_FRAME_HEADROOM 0
> +#define XSK_UMEM__DEFAULT_FLAGS 0
>
>  struct xsk_umem_config {
>         __u32 fill_size;
>         __u32 comp_size;
>         __u32 frame_size;
>         __u32 frame_headroom;
> +       __u32 flags;
>  };
>
>  /* Flags for the libbpf_flags field. */
> --
> 2.17.1
>
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan@osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [Intel-wired-lan] [PATCH bpf-next v4 03/11] libbpf: add flags to umem config
  2019-07-30  8:53         ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-31 14:25           ` Björn Töpel
  -1 siblings, 0 replies; 274+ messages in thread
From: Björn Töpel @ 2019-07-31 14:25 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: Netdev, Alexei Starovoitov, Daniel Borkmann,
	Björn Töpel, Karlsson, Magnus, Jakub Kicinski,
	Jonathan Lemon, Saeed Mahameed, Maxim Mikityanskiy,
	Stephen Hemminger, Bruce Richardson, ciara.loftus,
	intel-wired-lan, bpf

On Tue, 30 Jul 2019 at 19:43, Kevin Laatz <kevin.laatz@intel.com> wrote:
>
> This patch adds a 'flags' field to the umem_config and umem_reg structs.
> This will allow for more options to be added for configuring umems.
>
> The first use for the flags field is to add a flag for unaligned chunks
> mode. These flags can either be user-provided or filled with a default.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
>
> ---
> v2:
>   - Removed the headroom check from this patch. It has moved to the
>     previous patch.
>
> v4:
>   - modified chunk flag define
> ---
>  tools/include/uapi/linux/if_xdp.h | 9 +++++++--
>  tools/lib/bpf/xsk.c               | 3 +++
>  tools/lib/bpf/xsk.h               | 2 ++
>  3 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h
> index faaa5ca2a117..a691802d7915 100644
> --- a/tools/include/uapi/linux/if_xdp.h
> +++ b/tools/include/uapi/linux/if_xdp.h
> @@ -17,6 +17,10 @@
>  #define XDP_COPY       (1 << 1) /* Force copy-mode */
>  #define XDP_ZEROCOPY   (1 << 2) /* Force zero-copy mode */
>
> +/* Flags for xsk_umem_config flags */
> +#define XDP_UMEM_UNALIGNED_CHUNK_FLAG_SHIFT 15
> +#define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << XDP_UMEM_UNALIGNED_CHUNK_FLAG_SHIFT)
> +
>  struct sockaddr_xdp {
>         __u16 sxdp_family;
>         __u16 sxdp_flags;
> @@ -49,8 +53,9 @@ struct xdp_mmap_offsets {
>  #define XDP_OPTIONS                    8
>
>  struct xdp_umem_reg {
> -       __u64 addr; /* Start of packet data area */
> -       __u64 len; /* Length of packet data area */
> +       __u64 addr;     /* Start of packet data area */
> +       __u64 len:48;   /* Length of packet data area */
> +       __u64 flags:16; /* Flags for umem */
>         __u32 chunk_size;
>         __u32 headroom;
>  };
> diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
> index 5007b5d4fd2c..5e7e4d420ee0 100644
> --- a/tools/lib/bpf/xsk.c
> +++ b/tools/lib/bpf/xsk.c
> @@ -116,6 +116,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg,
>                 cfg->comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
>                 cfg->frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
>                 cfg->frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM;
> +               cfg->flags = XSK_UMEM__DEFAULT_FLAGS;
>                 return;
>         }
>
> @@ -123,6 +124,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg,
>         cfg->comp_size = usr_cfg->comp_size;
>         cfg->frame_size = usr_cfg->frame_size;
>         cfg->frame_headroom = usr_cfg->frame_headroom;
> +       cfg->flags = usr_cfg->flags;
>  }
>
>  static int xsk_set_xdp_socket_config(struct xsk_socket_config *cfg,
> @@ -182,6 +184,7 @@ int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
>         mr.len = size;
>         mr.chunk_size = umem->config.frame_size;
>         mr.headroom = umem->config.frame_headroom;
> +       mr.flags = umem->config.flags;
>
>         err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr));
>         if (err) {
> diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
> index 833a6e60d065..44a03d8c34b9 100644
> --- a/tools/lib/bpf/xsk.h
> +++ b/tools/lib/bpf/xsk.h
> @@ -170,12 +170,14 @@ LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
>  #define XSK_UMEM__DEFAULT_FRAME_SHIFT    12 /* 4096 bytes */
>  #define XSK_UMEM__DEFAULT_FRAME_SIZE     (1 << XSK_UMEM__DEFAULT_FRAME_SHIFT)
>  #define XSK_UMEM__DEFAULT_FRAME_HEADROOM 0
> +#define XSK_UMEM__DEFAULT_FLAGS 0
>
>  struct xsk_umem_config {
>         __u32 fill_size;
>         __u32 comp_size;
>         __u32 frame_size;
>         __u32 frame_headroom;
> +       __u32 flags;

And the flags addition here, unfortunately, requires symbol versioning
of xsk_umem__create(). That'll be the first in libbpf! :-)
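
Roughly along these lines, I think (sketch only; the version node names are
just placeholders for whatever libbpf.map ends up defining):

	/* old binding keeps reading the smaller xsk_umem_config (no flags),
	 * the default binding takes the new layout with the flags member
	 */
	asm(".symver xsk_umem__create_v0_0_2, xsk_umem__create@LIBBPF_0.0.2");
	asm(".symver xsk_umem__create_v0_0_4, xsk_umem__create@@LIBBPF_0.0.4");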


Björn

>  };
>
>  /* Flags for the libbpf_flags field. */
> --
> 2.17.1
>
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan@osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v4 07/11] mlx5e: modify driver for handling offsets
  2019-07-30  8:53         ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-31 18:10           ` Jonathan Lemon
  -1 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-07-31 18:10 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, saeedm, maximmi, stephen, bruce.richardson,
	ciara.loftus, bpf, intel-wired-lan



On 30 Jul 2019, at 1:53, Kevin Laatz wrote:

> With the addition of the unaligned chunks option, we need to make sure 
> we
> handle the offsets accordingly based on the mode we are currently 
> running
> in. This patch modifies the driver to appropriately mask the address 
> for
> each case.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
>
> ---
> v3:
>   - Use new helper function to handle offset
>
> v4:
>   - fixed headroom addition to handle. Using 
> xsk_umem_adjust_headroom()
>     now.
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c    | 8 ++++++--
>  drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c | 3 ++-
>  2 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> index b0b982cf69bb..d5245893d2c8 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> @@ -122,6 +122,7 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct 
> mlx5e_dma_info *di,
>  		      void *va, u16 *rx_headroom, u32 *len, bool xsk)
>  {
>  	struct bpf_prog *prog = READ_ONCE(rq->xdp_prog);
> +	struct xdp_umem *umem = rq->umem;
>  	struct xdp_buff xdp;
>  	u32 act;
>  	int err;
> @@ -138,8 +139,11 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct 
> mlx5e_dma_info *di,
>  	xdp.rxq = &rq->xdp_rxq;
>
>  	act = bpf_prog_run_xdp(prog, &xdp);
> -	if (xsk)
> -		xdp.handle += xdp.data - xdp.data_hard_start;
> +	if (xsk) {
> +		u64 off = xdp.data - xdp.data_hard_start;
> +
> +		xdp.handle = xsk_umem_handle_offset(umem, xdp.handle, off);

Shouldn't this be xdp_umem_adjust_offset()?


> +	}
>  	switch (act) {
>  	case XDP_PASS:
>  		*rx_headroom = xdp.data - xdp.data_hard_start;
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
> index 6a55573ec8f2..7c49a66d28c9 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
> @@ -24,7 +24,8 @@ int mlx5e_xsk_page_alloc_umem(struct mlx5e_rq *rq,
>  	if (!xsk_umem_peek_addr_rq(umem, &handle))
>  		return -ENOMEM;
>
> -	dma_info->xsk.handle = handle + rq->buff.umem_headroom;
> +	dma_info->xsk.handle = xsk_umem_adjust_offset(umem, handle,
> +						      rq->buff.umem_headroom);
>  	dma_info->xsk.data = xdp_umem_get_data(umem, dma_info->xsk.handle);
>
>  	/* No need to add headroom to the DMA address. In striding RQ case, 
> we
> -- 
> 2.17.1

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v4 04/11] xsk: add support to allow unaligned chunk placement
  2019-07-30  8:53         ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-31 18:11           ` Jonathan Lemon
  -1 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-07-31 18:11 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, saeedm, maximmi, stephen, bruce.richardson,
	ciara.loftus, bpf, intel-wired-lan



On 30 Jul 2019, at 1:53, Kevin Laatz wrote:

> Currently, addresses are chunk size aligned. This means, we are very
> restricted in terms of where we can place chunk within the umem. For
> example, if we have a chunk size of 2k, then our chunks can only be 
> placed
> at 0,2k,4k,6k,8k... and so on (ie. every 2k starting from 0).
>
> This patch introduces the ability to use unaligned chunks. With these
> changes, we are no longer bound to having to place chunks at a 2k (or
> whatever your chunk size is) interval. Since we are no longer dealing 
> with
> aligned chunks, they can now cross page boundaries. Checks for page
> contiguity have been added in order to keep track of which pages are
> followed by a physically contiguous page.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
>
> ---
> v2:
>   - Add checks for the flags coming from userspace
>   - Fix how we get chunk_size in xsk_diag.c
>   - Add defines for masking the new descriptor format
>   - Modified the rx functions to use new descriptor format
>   - Modified the tx functions to use new descriptor format
>
> v3:
>   - Add helper function to do address/offset masking/addition
>
> v4:
>   - fixed page_start calculation in __xsk_rcv_memcpy().
>   - move offset handling to the xdp_umem_get_* functions
>   - modified the len field in xdp_umem_reg struct. We now use 16 bits 
> from
>     this for the flags field.
>   - removed next_pg_contig field from xdp_umem_page struct. Using low 
> 12
>     bits of addr to store flags instead.
>   - other minor changes based on review comments
> ---
>  include/net/xdp_sock.h      | 40 ++++++++++++++++++-
>  include/uapi/linux/if_xdp.h | 14 ++++++-
>  net/xdp/xdp_umem.c          | 18 ++++++---
>  net/xdp/xsk.c               | 79 
> +++++++++++++++++++++++++++++--------
>  net/xdp/xsk_diag.c          |  2 +-
>  net/xdp/xsk_queue.h         | 69 ++++++++++++++++++++++++++++----
>  6 files changed, 188 insertions(+), 34 deletions(-)
>
> diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
> index 69796d264f06..a755e8ab6cac 100644
> --- a/include/net/xdp_sock.h
> +++ b/include/net/xdp_sock.h
> @@ -16,6 +16,13 @@
>  struct net_device;
>  struct xsk_queue;
>
> +/* Masks for xdp_umem_page flags.
> + * The low 12-bits of the addr will be 0 since this is the page 
> address, so we
> + * can use them for flags.
> + */
> +#define XSK_NEXT_PG_CONTIG_SHIFT 0
> +#define XSK_NEXT_PG_CONTIG_MASK (1ULL << XSK_NEXT_PG_CONTIG_SHIFT)
> +
>  struct xdp_umem_page {
>  	void *addr;
>  	dma_addr_t dma;
> @@ -48,6 +55,7 @@ struct xdp_umem {
>  	bool zc;
>  	spinlock_t xsk_list_lock;
>  	struct list_head xsk_list;
> +	u16 flags;
>  };
>
>  struct xdp_sock {
> @@ -98,12 +106,21 @@ struct xdp_umem *xdp_get_umem_from_qid(struct 
> net_device *dev, u16 queue_id);
>
>  static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 
> addr)
>  {
> -	return umem->pages[addr >> PAGE_SHIFT].addr + (addr & (PAGE_SIZE - 
> 1));
> +	unsigned long page_addr;
> +
> +	addr += addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
> +	addr &= XSK_UNALIGNED_BUF_ADDR_MASK;
> +	page_addr = (unsigned long)umem->pages[addr >> PAGE_SHIFT].addr;
> +
> +	return (char *)(page_addr & PAGE_MASK) + (addr & ~PAGE_MASK);
>  }
>
>  static inline dma_addr_t xdp_umem_get_dma(struct xdp_umem *umem, u64 
> addr)
>  {
> -	return umem->pages[addr >> PAGE_SHIFT].dma + (addr & (PAGE_SIZE - 
> 1));
> +	addr += addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
> +	addr &= XSK_UNALIGNED_BUF_ADDR_MASK;
> +
> +	return umem->pages[addr >> PAGE_SHIFT].dma + (addr & ~PAGE_MASK);
>  }
>
>  /* Reuse-queue aware version of FILL queue helpers */
> @@ -144,6 +161,19 @@ static inline void xsk_umem_fq_reuse(struct 
> xdp_umem *umem, u64 addr)
>
>  	rq->handles[rq->length++] = addr;
>  }
> +
> +/* Handle the offset appropriately depending on aligned or unaligned 
> mode.
> + * For unaligned mode, we store the offset in the upper 16-bits of 
> the address.
> + * For aligned mode, we simply add the offset to the address.
> + */
> +static inline u64 xsk_umem_adjust_offset(struct xdp_umem *umem, u64 
> address,
> +					 u64 offset)
> +{
> +	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> +		return address + (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
> +	else
> +		return address + offset;
> +}
>  #else
>  static inline int xsk_generic_rcv(struct xdp_sock *xs, struct 
> xdp_buff *xdp)
>  {
> @@ -241,6 +271,12 @@ static inline void xsk_umem_fq_reuse(struct 
> xdp_umem *umem, u64 addr)
>  {
>  }
>
> +static inline u64 xsk_umem_handle_offset(struct xdp_umem *umem, u64 
> handle,
> +					 u64 offset)
> +{
> +	return 0;
> +}
> +
>  #endif /* CONFIG_XDP_SOCKETS */
>
>  #endif /* _LINUX_XDP_SOCK_H */
> diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
> index faaa5ca2a117..4a5490651b22 100644
> --- a/include/uapi/linux/if_xdp.h
> +++ b/include/uapi/linux/if_xdp.h
> @@ -17,6 +17,10 @@
>  #define XDP_COPY	(1 << 1) /* Force copy-mode */
>  #define XDP_ZEROCOPY	(1 << 2) /* Force zero-copy mode */
>
> +/* Flags for xsk_umem_config flags */
> +#define XDP_UMEM_UNALIGNED_CHUNK_FLAG_SHIFT 15
> +#define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 
> XDP_UMEM_UNALIGNED_CHUNK_FLAG_SHIFT)
> +
>  struct sockaddr_xdp {
>  	__u16 sxdp_family;
>  	__u16 sxdp_flags;
> @@ -49,8 +53,9 @@ struct xdp_mmap_offsets {
>  #define XDP_OPTIONS			8
>
>  struct xdp_umem_reg {
> -	__u64 addr; /* Start of packet data area */
> -	__u64 len; /* Length of packet data area */
> +	__u64 addr;    /* Start of packet data area */
> +	__u64 len:48;  /* Length of packet data area */
> +	__u64 flags:16; /*Flags for umem */
>  	__u32 chunk_size;
>  	__u32 headroom;
>  };
> @@ -74,6 +79,11 @@ struct xdp_options {
>  #define XDP_UMEM_PGOFF_FILL_RING	0x100000000ULL
>  #define XDP_UMEM_PGOFF_COMPLETION_RING	0x180000000ULL
>
> +/* Masks for unaligned chunks mode */
> +#define XSK_UNALIGNED_BUF_OFFSET_SHIFT 48
> +#define XSK_UNALIGNED_BUF_ADDR_MASK \
> +	((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1)
> +
>  /* Rx/Tx descriptor */
>  struct xdp_desc {
>  	__u64 addr;
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index 83de74ca729a..5590ca7bbe15 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -299,6 +299,7 @@ static int xdp_umem_account_pages(struct xdp_umem 
> *umem)
>
>  static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg 
> *mr)
>  {
> +	bool unaligned_chunks = mr->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG;
>  	u32 chunk_size = mr->chunk_size, headroom = mr->headroom;
>  	unsigned int chunks, chunks_per_page;
>  	u64 addr = mr->addr, size = mr->len;
> @@ -314,7 +315,10 @@ static int xdp_umem_reg(struct xdp_umem *umem, 
> struct xdp_umem_reg *mr)
>  		return -EINVAL;
>  	}
>
> -	if (!is_power_of_2(chunk_size))
> +	if (mr->flags & ~XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> +		return -EINVAL;
> +
> +	if (!unaligned_chunks && !is_power_of_2(chunk_size))
>  		return -EINVAL;
>
>  	if (!PAGE_ALIGNED(addr)) {
> @@ -331,9 +335,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, 
> struct xdp_umem_reg *mr)
>  	if (chunks == 0)
>  		return -EINVAL;
>
> -	chunks_per_page = PAGE_SIZE / chunk_size;
> -	if (chunks < chunks_per_page || chunks % chunks_per_page)
> -		return -EINVAL;
> +	if (!unaligned_chunks) {
> +		chunks_per_page = PAGE_SIZE / chunk_size;
> +		if (chunks < chunks_per_page || chunks % chunks_per_page)
> +			return -EINVAL;
> +	}
>
>  	headroom = ALIGN(headroom, 64);
>
> @@ -342,13 +348,15 @@ static int xdp_umem_reg(struct xdp_umem *umem, 
> struct xdp_umem_reg *mr)
>  		return -EINVAL;
>
>  	umem->address = (unsigned long)addr;
> -	umem->chunk_mask = ~((u64)chunk_size - 1);
> +	umem->chunk_mask = unaligned_chunks ? XSK_UNALIGNED_BUF_ADDR_MASK
> +					    : ~((u64)chunk_size - 1);
>  	umem->size = size;
>  	umem->headroom = headroom;
>  	umem->chunk_size_nohr = chunk_size - headroom;
>  	umem->npgs = size / PAGE_SIZE;
>  	umem->pgs = NULL;
>  	umem->user = NULL;
> +	umem->flags = mr->flags;
>  	INIT_LIST_HEAD(&umem->xsk_list);
>  	spin_lock_init(&umem->xsk_list_lock);
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 59b57d708697..9b834d54549e 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -45,7 +45,7 @@ EXPORT_SYMBOL(xsk_umem_has_addrs);
>
>  u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
>  {
> -	return xskq_peek_addr(umem->fq, addr);
> +	return xskq_peek_addr(umem->fq, addr, umem);
>  }
>  EXPORT_SYMBOL(xsk_umem_peek_addr);
>
> @@ -55,21 +55,42 @@ void xsk_umem_discard_addr(struct xdp_umem *umem)
>  }
>  EXPORT_SYMBOL(xsk_umem_discard_addr);
>
> +/* If a buffer crosses a page boundary, we need to do 2 memcpy's, one 
> for
> + * each page. This is only required in copy mode.
> + */
> +static void __xsk_rcv_memcpy(struct xdp_umem *umem, u64 addr, void 
> *from_buf,
> +			     u32 len, u32 metalen)
> +{
> +	void *to_buf = xdp_umem_get_data(umem, addr);
> +
> +	if (xskq_crosses_non_contig_pg(umem, addr, len + metalen)) {
> +		void *next_pg_addr = umem->pages[(addr >> PAGE_SHIFT) + 1].addr;

If 'addr' (really a handle) is in the v2 format and carries an offset, this
will give incorrect results.  Please use accessors that correctly extract
the page and offset from a handle.
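
As a rough sketch of what that could look like (xsk_umem_extract_addr() is
a hypothetical helper, not part of the posted patch, and the copy split is
simplified):

	/* Fold the 16-bit offset of a v2 handle back into its 48-bit base. */
	static inline u64 xsk_umem_extract_addr(u64 handle)
	{
		return (handle & XSK_UNALIGNED_BUF_ADDR_MASK) +
		       (handle >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
	}

	static void __xsk_rcv_memcpy(struct xdp_umem *umem, u64 handle,
				     void *from_buf, u32 len, u32 metalen)
	{
		/* resolve the handle once, use the plain address for page math */
		u64 addr = xsk_umem_extract_addr(handle);
		void *to_buf = xdp_umem_get_data(umem, handle);
		u64 total = len + metalen;

		if (xskq_crosses_non_contig_pg(umem, addr, total)) {
			void *next_pg = umem->pages[(addr >> PAGE_SHIFT) + 1].addr;
			u64 first = PAGE_SIZE - (addr & (PAGE_SIZE - 1));

			memcpy(to_buf, from_buf, first);
			memcpy(next_pg, from_buf + first, total - first);
			return;
		}

		memcpy(to_buf, from_buf, total);
	}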


> +		u64 page_start = addr & ~(PAGE_SIZE - 1);
> +		u64 first_len = PAGE_SIZE - (addr - page_start);
> +
> +		memcpy(to_buf, from_buf, first_len + metalen);
> +		memcpy(next_pg_addr, from_buf + first_len, len - first_len);
> +
> +		return;
> +	}
> +
> +	memcpy(to_buf, from_buf, len + metalen);
> +}
> +
>  static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 
> len)
>  {
> -	void *to_buf, *from_buf;
> +	u64 offset = xs->umem->headroom;
> +	void *from_buf;
>  	u32 metalen;
>  	u64 addr;
>  	int err;
>
> -	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
> +	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
>  	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
>  		xs->rx_dropped++;
>  		return -ENOSPC;
>  	}
>
> -	addr += xs->umem->headroom;
> -
>  	if (unlikely(xdp_data_meta_unsupported(xdp))) {
>  		from_buf = xdp->data;
>  		metalen = 0;
> @@ -78,9 +99,10 @@ static int __xsk_rcv(struct xdp_sock *xs, struct 
> xdp_buff *xdp, u32 len)
>  		metalen = xdp->data - xdp->data_meta;
>  	}
>
> -	to_buf = xdp_umem_get_data(xs->umem, addr);
> -	memcpy(to_buf, from_buf, len + metalen);
> -	addr += metalen;
> +	__xsk_rcv_memcpy(xs->umem, addr + offset, from_buf, len, metalen);

'addr' is actually a handle here.  You can't simply add the offset; this
needs the 'adjust offset' helper.
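
As a sketch, the call pattern being asked for would look roughly like this
(build the handle with the helper first, then let the umem accessors
resolve it; illustrative only):

		addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
		__xsk_rcv_memcpy(xs->umem, addr, from_buf, len, metalen);

		addr = xsk_umem_adjust_offset(xs->umem, addr, metalen);
		err = xskq_produce_batch_desc(xs->rx, addr, len);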


> +
> +	offset += metalen;
> +	addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
>  	err = xskq_produce_batch_desc(xs->rx, addr, len);
>  	if (!err) {
>  		xskq_discard_addr(xs->umem->fq);
> @@ -125,6 +147,7 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct 
> xdp_buff *xdp)
>  {
>  	u32 metalen = xdp->data - xdp->data_meta;
>  	u32 len = xdp->data_end - xdp->data;
> +	u64 offset = xs->umem->headroom;
>  	void *buffer;
>  	u64 addr;
>  	int err;
> @@ -136,17 +159,17 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct 
> xdp_buff *xdp)
>  		goto out_unlock;
>  	}
>
> -	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
> +	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
>  	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
>  		err = -ENOSPC;
>  		goto out_drop;
>  	}
>
> -	addr += xs->umem->headroom;
> -
> -	buffer = xdp_umem_get_data(xs->umem, addr);
> +	buffer = xdp_umem_get_data(xs->umem, addr + offset);
>  	memcpy(buffer, xdp->data_meta, len + metalen);
> -	addr += metalen;
> +	offset += metalen;
> +
> +	addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
>  	err = xskq_produce_batch_desc(xs->rx, addr, len);
>  	if (err)
>  		goto out_drop;
> @@ -190,7 +213,7 @@ bool xsk_umem_consume_tx(struct xdp_umem *umem, 
> struct xdp_desc *desc)
>
>  	rcu_read_lock();
>  	list_for_each_entry_rcu(xs, &umem->xsk_list, list) {
> -		if (!xskq_peek_desc(xs->tx, desc))
> +		if (!xskq_peek_desc(xs->tx, desc, umem))
>  			continue;
>
>  		if (xskq_produce_addr_lazy(umem->cq, desc->addr))
> @@ -243,7 +266,7 @@ static int xsk_generic_xmit(struct sock *sk, 
> struct msghdr *m,
>  	if (xs->queue_id >= xs->dev->real_num_tx_queues)
>  		goto out;
>
> -	while (xskq_peek_desc(xs->tx, &desc)) {
> +	while (xskq_peek_desc(xs->tx, &desc, xs->umem)) {
>  		char *buffer;
>  		u64 addr;
>  		u32 len;
> @@ -262,6 +285,10 @@ static int xsk_generic_xmit(struct sock *sk, 
> struct msghdr *m,
>
>  		skb_put(skb, len);
>  		addr = desc.addr;
> +		if (xs->umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> +			addr = (addr & XSK_UNALIGNED_BUF_ADDR_MASK) +
> +				(addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);

This is repeating the computations done in xdp_umem_get_data().  Remove.
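
A sketch of the simplification being asked for (illustrative only): since
xdp_umem_get_data() already folds the offset out of the handle, the
open-coded masking can be dropped and the raw descriptor address passed
straight through:

		skb_put(skb, len);
		addr = desc.addr;
		buffer = xdp_umem_get_data(xs->umem, addr);
		err = skb_store_bits(skb, 0, buffer, len);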


>  		buffer = xdp_umem_get_data(xs->umem, addr);
>  		err = skb_store_bits(skb, 0, buffer, len);
>  		if (unlikely(err) || xskq_reserve_addr(xs->umem->cq)) {
> @@ -272,7 +299,7 @@ static int xsk_generic_xmit(struct sock *sk, 
> struct msghdr *m,
>  		skb->dev = xs->dev;
>  		skb->priority = sk->sk_priority;
>  		skb->mark = sk->sk_mark;
> -		skb_shinfo(skb)->destructor_arg = (void *)(long)addr;
> +		skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
>  		skb->destructor = xsk_destruct_skb;
>
>  		err = dev_direct_xmit(skb, xs->queue_id);
> @@ -412,6 +439,24 @@ static struct socket *xsk_lookup_xsk_from_fd(int 
> fd)
>  	return sock;
>  }
>
> +/* Check if umem pages are contiguous.
> + * If zero-copy mode, use the DMA address to do the page contiguity 
> check
> + * For all other modes we use addr (kernel virtual address)
> + * Store the result in the low bits of addr.
> + */
> +static void xsk_check_page_contiguity(struct xdp_umem *umem, u32 
> flags)
> +{
> +	struct xdp_umem_page *pgs = umem->pages;
> +	int i, is_contig;
> +
> +	for (i = 0; i < umem->npgs - 1; i++) {
> +		is_contig = (flags & XDP_ZEROCOPY) ?
> +			(pgs[i].dma + PAGE_SIZE == pgs[i + 1].dma) :
> +			(pgs[i].addr + PAGE_SIZE == pgs[i + 1].addr);
> +		pgs[i].addr += is_contig << XSK_NEXT_PG_CONTIG_SHIFT;
> +	}
> +}
> +
>  static int xsk_bind(struct socket *sock, struct sockaddr *addr, int 
> addr_len)
>  {
>  	struct sockaddr_xdp *sxdp = (struct sockaddr_xdp *)addr;
> @@ -500,6 +545,8 @@ static int xsk_bind(struct socket *sock, struct 
> sockaddr *addr, int addr_len)
>  		err = xdp_umem_assign_dev(xs->umem, dev, qid, flags);
>  		if (err)
>  			goto out_unlock;
> +
> +		xsk_check_page_contiguity(xs->umem, flags);
>  	}
>
>  	xs->dev = dev;
> diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c
> index d5e06c8e0cbf..9986a759fe06 100644
> --- a/net/xdp/xsk_diag.c
> +++ b/net/xdp/xsk_diag.c
> @@ -56,7 +56,7 @@ static int xsk_diag_put_umem(const struct xdp_sock 
> *xs, struct sk_buff *nlskb)
>  	du.id = umem->id;
>  	du.size = umem->size;
>  	du.num_pages = umem->npgs;
> -	du.chunk_size = (__u32)(~umem->chunk_mask + 1);
> +	du.chunk_size = umem->chunk_size_nohr + umem->headroom;
>  	du.headroom = umem->headroom;
>  	du.ifindex = umem->dev ? umem->dev->ifindex : 0;
>  	du.queue_id = umem->queue_id;
> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> index 909c5168ed0f..3d045c1c94b1 100644
> --- a/net/xdp/xsk_queue.h
> +++ b/net/xdp/xsk_queue.h
> @@ -133,6 +133,17 @@ static inline bool xskq_has_addrs(struct 
> xsk_queue *q, u32 cnt)
>
>  /* UMEM queue */
>
> +static inline bool xskq_crosses_non_contig_pg(struct xdp_umem *umem, 
> u64 addr,
> +					      u64 length)
> +{
> +	bool cross_pg = (addr & (PAGE_SIZE - 1)) + length > PAGE_SIZE;
> +	bool next_pg_contig =
> +		(unsigned long)umem->pages[(addr >> PAGE_SHIFT)].addr &
> +			XSK_NEXT_PG_CONTIG_MASK;
> +
> +	return cross_pg && !next_pg_contig;
> +}
> +
>  static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
>  {
>  	if (addr >= q->size) {
> @@ -143,23 +154,50 @@ static inline bool xskq_is_valid_addr(struct 
> xsk_queue *q, u64 addr)
>  	return true;
>  }
>
> -static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr)
> +static inline bool xskq_is_valid_addr_unaligned(struct xsk_queue *q, 
> u64 addr,
> +						u64 length,
> +						struct xdp_umem *umem)
> +{
> +	addr += addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
> +	addr &= XSK_UNALIGNED_BUF_ADDR_MASK;
> +	if (addr >= q->size ||
> +	    xskq_crosses_non_contig_pg(umem, addr, length)) {
> +		q->invalid_descs++;
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
> +static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr,
> +				      struct xdp_umem *umem)
>  {
>  	while (q->cons_tail != q->cons_head) {
>  		struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
>  		unsigned int idx = q->cons_tail & q->ring_mask;
>
>  		*addr = READ_ONCE(ring->desc[idx]) & q->chunk_mask;
> +
> +		if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG) {
> +			if (xskq_is_valid_addr_unaligned(q, *addr,
> +							 umem->chunk_size_nohr,
> +							 umem))
> +				return addr;
> +			goto out;
> +		}
> +
>  		if (xskq_is_valid_addr(q, *addr))
>  			return addr;
>
> +out:
>  		q->cons_tail++;
>  	}
>
>  	return NULL;
>  }
>
> -static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
> +static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr,
> +				  struct xdp_umem *umem)
>  {
>  	if (q->cons_tail == q->cons_head) {
>  		smp_mb(); /* D, matches A */
> @@ -170,7 +208,7 @@ static inline u64 *xskq_peek_addr(struct xsk_queue 
> *q, u64 *addr)
>  		smp_rmb();
>  	}
>
> -	return xskq_validate_addr(q, addr);
> +	return xskq_validate_addr(q, addr, umem);
>  }
>
>  static inline void xskq_discard_addr(struct xsk_queue *q)
> @@ -229,8 +267,21 @@ static inline int xskq_reserve_addr(struct 
> xsk_queue *q)
>
>  /* Rx/Tx queue */
>
> -static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct 
> xdp_desc *d)
> +static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct 
> xdp_desc *d,
> +				      struct xdp_umem *umem)
>  {
> +	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG) {
> +		if (!xskq_is_valid_addr_unaligned(q, d->addr, d->len, umem))
> +			return false;
> +
> +		if (d->len > umem->chunk_size_nohr || d->options) {
> +			q->invalid_descs++;
> +			return false;
> +		}
> +
> +		return true;
> +	}
> +
>  	if (!xskq_is_valid_addr(q, d->addr))
>  		return false;
>
> @@ -244,14 +295,15 @@ static inline bool xskq_is_valid_desc(struct 
> xsk_queue *q, struct xdp_desc *d)
>  }
>
>  static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue 
> *q,
> -						  struct xdp_desc *desc)
> +						  struct xdp_desc *desc,
> +						  struct xdp_umem *umem)
>  {
>  	while (q->cons_tail != q->cons_head) {
>  		struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
>  		unsigned int idx = q->cons_tail & q->ring_mask;
>
>  		*desc = READ_ONCE(ring->desc[idx]);
> -		if (xskq_is_valid_desc(q, desc))
> +		if (xskq_is_valid_desc(q, desc, umem))
>  			return desc;
>
>  		q->cons_tail++;
> @@ -261,7 +313,8 @@ static inline struct xdp_desc 
> *xskq_validate_desc(struct xsk_queue *q,
>  }
>
>  static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
> -					      struct xdp_desc *desc)
> +					      struct xdp_desc *desc,
> +					      struct xdp_umem *umem)
>  {
>  	if (q->cons_tail == q->cons_head) {
>  		smp_mb(); /* D, matches A */
> @@ -272,7 +325,7 @@ static inline struct xdp_desc 
> *xskq_peek_desc(struct xsk_queue *q,
>  		smp_rmb(); /* C, matches B */
>  	}
>
> -	return xskq_validate_desc(q, desc);
> +	return xskq_validate_desc(q, desc, umem);
>  }
>
>  static inline void xskq_discard_desc(struct xsk_queue *q)
> -- 
> 2.17.1

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v4 09/11] samples/bpf: add buffer recycling for unaligned chunks to xdpsock
  2019-07-30  8:53         ` [Intel-wired-lan] " Kevin Laatz
@ 2019-07-31 18:26           ` Jonathan Lemon
  -1 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-07-31 18:26 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, saeedm, maximmi, stephen, bruce.richardson,
	ciara.loftus, bpf, intel-wired-lan



On 30 Jul 2019, at 1:53, Kevin Laatz wrote:

> This patch adds buffer recycling support for unaligned buffers. Since we
> don't mask the addr to 2k at umem_reg in unaligned mode, we need to make
> sure we give back the correct (original) addr to the fill queue. We
> achieve this using the new descriptor format and associated masks. The
> new format uses the upper 16 bits for the offset and the lower 48 bits
> for the addr. Since we have a field for the offset, we no longer need to
> modify the actual address. As such, all we have to do to get back the
> original address is mask for the lower 48 bits (i.e. strip the offset
> and we get the address on its own).
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
>
> ---
> v2:
>   - Removed unused defines
>   - Fix buffer recycling for unaligned case
>   - Remove --buf-size (--frame-size merged before this)
>   - Modifications to use the new descriptor format for buffer 
> recycling
> ---
>  samples/bpf/xdpsock_user.c | 24 +++++++++++++++---------
>  1 file changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
> index 756b00eb1afe..62b2059cd0e3 100644
> --- a/samples/bpf/xdpsock_user.c
> +++ b/samples/bpf/xdpsock_user.c
> @@ -475,6 +475,7 @@ static void kick_tx(struct xsk_socket_info *xsk)
>
>  static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk)
>  {
> +	struct xsk_umem_info *umem = xsk->umem;
>  	u32 idx_cq = 0, idx_fq = 0;
>  	unsigned int rcvd;
>  	size_t ndescs;
> @@ -487,22 +488,21 @@ static inline void complete_tx_l2fwd(struct 
> xsk_socket_info *xsk)
>  		xsk->outstanding_tx;
>
>  	/* re-add completed Tx buffers */
> -	rcvd = xsk_ring_cons__peek(&xsk->umem->cq, ndescs, &idx_cq);
> +	rcvd = xsk_ring_cons__peek(&umem->cq, ndescs, &idx_cq);
>  	if (rcvd > 0) {
>  		unsigned int i;
>  		int ret;
>
> -		ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
> +		ret = xsk_ring_prod__reserve(&umem->fq, rcvd, &idx_fq);
>  		while (ret != rcvd) {
>  			if (ret < 0)
>  				exit_with_error(-ret);
> -			ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd,
> -						     &idx_fq);
> +			ret = xsk_ring_prod__reserve(&umem->fq, rcvd, &idx_fq);
>  		}
> +
>  		for (i = 0; i < rcvd; i++)
> -			*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) =
> -				*xsk_ring_cons__comp_addr(&xsk->umem->cq,
> -							  idx_cq++);
> +			*xsk_ring_prod__fill_addr(&umem->fq, idx_fq++) =
> +				*xsk_ring_cons__comp_addr(&umem->cq, idx_cq++);
>
>  		xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
>  		xsk_ring_cons__release(&xsk->umem->cq, rcvd);
> @@ -549,7 +549,11 @@ static void rx_drop(struct xsk_socket_info *xsk)
>  	for (i = 0; i < rcvd; i++) {
>  		u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
>  		u32 len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++)->len;
> -		char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
> +		u64 offset = addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
> +
> +		addr &= XSK_UNALIGNED_BUF_ADDR_MASK;
> +		char *pkt = xsk_umem__get_data(xsk->umem->buffer,
> +				addr + offset);

The mask constants should not be part of the API - this should be
hidden behind an accessor.

Something like:
   u64 addr = xsk_umem__get_addr(xsk->umem, handle);
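
A sketch of what such an accessor pair could look like in the sample (the
name follows the suggestion above, the umem argument is dropped for
brevity, and none of this is an existing libbpf API at this point):

	/* resolve a descriptor/fill handle to the packet data address */
	static inline __u64 xsk_umem__get_addr(__u64 handle)
	{
		return (handle & XSK_UNALIGNED_BUF_ADDR_MASK) +
		       (handle >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
	}

	/* strip the offset to recover the original address for the fill ring */
	static inline __u64 xsk_umem__orig_addr(__u64 handle)
	{
		return handle & XSK_UNALIGNED_BUF_ADDR_MASK;
	}

which would let rx_drop() resolve the packet and recycle the buffer
roughly as:

		char *pkt = xsk_umem__get_data(xsk->umem->buffer,
					       xsk_umem__get_addr(addr));

		*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) =
			xsk_umem__orig_addr(addr);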




>  		hex_dump(pkt, len, addr);
>  		*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = addr;
> @@ -655,7 +659,9 @@ static void l2fwd(struct xsk_socket_info *xsk)
>  							  idx_rx)->addr;
>  			u32 len = xsk_ring_cons__rx_desc(&xsk->rx,
>  							 idx_rx++)->len;
> -			char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
> +			u64 offset = addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
> +			char *pkt = xsk_umem__get_data(xsk->umem->buffer,
> +				(addr & XSK_UNALIGNED_BUF_ADDR_MASK) + offset);
>
>  			swap_mac_addresses(pkt);
>
> -- 
> 2.17.1

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [Intel-wired-lan] [PATCH bpf-next v4 03/11] libbpf: add flags to umem config
  2019-07-31 14:25           ` Björn Töpel
@ 2019-08-01  6:59             ` Andrii Nakryiko
  -1 siblings, 0 replies; 274+ messages in thread
From: Andrii Nakryiko @ 2019-08-01  6:59 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Kevin Laatz, Netdev, Alexei Starovoitov, Daniel Borkmann,
	Björn Töpel, Karlsson, Magnus, Jakub Kicinski,
	Jonathan Lemon, Saeed Mahameed, Maxim Mikityanskiy,
	Stephen Hemminger, Bruce Richardson, ciara.loftus,
	intel-wired-lan, bpf

On Wed, Jul 31, 2019 at 8:21 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> On Tue, 30 Jul 2019 at 19:43, Kevin Laatz <kevin.laatz@intel.com> wrote:
> >
> > This patch adds a 'flags' field to the umem_config and umem_reg structs.
> > This will allow for more options to be added for configuring umems.
> >
> > The first use for the flags field is to add a flag for unaligned chunks
> > mode. These flags can either be user-provided or filled with a default.
> >
> > Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> > Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> >
> > ---
> > v2:
> >   - Removed the headroom check from this patch. It has moved to the
> >     previous patch.
> >
> > v4:
> >   - modified chunk flag define
> > ---

[...]

> > diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
> > index 833a6e60d065..44a03d8c34b9 100644
> > --- a/tools/lib/bpf/xsk.h
> > +++ b/tools/lib/bpf/xsk.h
> > @@ -170,12 +170,14 @@ LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
> >  #define XSK_UMEM__DEFAULT_FRAME_SHIFT    12 /* 4096 bytes */
> >  #define XSK_UMEM__DEFAULT_FRAME_SIZE     (1 << XSK_UMEM__DEFAULT_FRAME_SHIFT)
> >  #define XSK_UMEM__DEFAULT_FRAME_HEADROOM 0
> > +#define XSK_UMEM__DEFAULT_FLAGS 0
> >
> >  struct xsk_umem_config {
> >         __u32 fill_size;
> >         __u32 comp_size;
> >         __u32 frame_size;
> >         __u32 frame_headroom;
> > +       __u32 flags;
>
> And the flags addition here, unfortunately, requires symbol versioning
> of xsk_umem__create(). That'll be the first in libbpf! :-)

xsk_umem_config is passed by pointer to xsk_umem__create(), so this
doesn't break ABI, does it?

>
>
> Björn
>
> >  };
> >
> >  /* Flags for the libbpf_flags field. */
> > --
> > 2.17.1
> >
> > _______________________________________________
> > Intel-wired-lan mailing list
> > Intel-wired-lan@osuosl.org
> > https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [Intel-wired-lan] [PATCH bpf-next v4 03/11] libbpf: add flags to umem config
  2019-08-01  6:59             ` Andrii Nakryiko
@ 2019-08-01  7:34               ` Björn Töpel
  -1 siblings, 0 replies; 274+ messages in thread
From: Björn Töpel @ 2019-08-01  7:34 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Kevin Laatz, Netdev, Alexei Starovoitov, Daniel Borkmann,
	Björn Töpel, Karlsson, Magnus, Jakub Kicinski,
	Jonathan Lemon, Saeed Mahameed, Maxim Mikityanskiy,
	Stephen Hemminger, Bruce Richardson, ciara.loftus,
	intel-wired-lan, bpf

On Thu, 1 Aug 2019 at 08:59, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Wed, Jul 31, 2019 at 8:21 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
> >
> > On Tue, 30 Jul 2019 at 19:43, Kevin Laatz <kevin.laatz@intel.com> wrote:
> > >
> > > This patch adds a 'flags' field to the umem_config and umem_reg structs.
> > > This will allow for more options to be added for configuring umems.
> > >
> > > The first use for the flags field is to add a flag for unaligned chunks
> > > mode. These flags can either be user-provided or filled with a default.
> > >
> > > Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> > > Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> > >
> > > ---
> > > v2:
> > >   - Removed the headroom check from this patch. It has moved to the
> > >     previous patch.
> > >
> > > v4:
> > >   - modified chunk flag define
> > > ---
>
> [...]
>
> > > diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
> > > index 833a6e60d065..44a03d8c34b9 100644
> > > --- a/tools/lib/bpf/xsk.h
> > > +++ b/tools/lib/bpf/xsk.h
> > > @@ -170,12 +170,14 @@ LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
> > >  #define XSK_UMEM__DEFAULT_FRAME_SHIFT    12 /* 4096 bytes */
> > >  #define XSK_UMEM__DEFAULT_FRAME_SIZE     (1 << XSK_UMEM__DEFAULT_FRAME_SHIFT)
> > >  #define XSK_UMEM__DEFAULT_FRAME_HEADROOM 0
> > > +#define XSK_UMEM__DEFAULT_FLAGS 0
> > >
> > >  struct xsk_umem_config {
> > >         __u32 fill_size;
> > >         __u32 comp_size;
> > >         __u32 frame_size;
> > >         __u32 frame_headroom;
> > > +       __u32 flags;
> >
> > And the flags addition here, unfortunately, requires symbol versioning
> > of xsk_umem__create(). That'll be the first in libbpf! :-)
>
> xsk_umem_config is passed by pointer to xsk_umem__create(), so this
> doesn't break ABI, does it?
>

An old application dynamically linked against a new libbpf.so will crash,
right? The old application passes the old layout of xsk_umem_config, and
the new library accesses the (non-existent) flags struct member.
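
For reference, a rough sketch of the symbol-versioning pattern this points
to (the version nodes, compat struct and helper names below are
illustrative assumptions, not necessarily what libbpf will end up doing):

	#include <string.h>
	#include <bpf/xsk.h>

	/* the pre-flags layout that old binaries were compiled against */
	struct xsk_umem_config_v1 {
		__u32 fill_size;
		__u32 comp_size;
		__u32 frame_size;
		__u32 frame_headroom;
	};

	/* new, flags-aware implementation (body elsewhere in xsk.c) */
	int xsk_umem__create_v0_0_4(struct xsk_umem **umem_ptr, void *umem_area,
				    __u64 size, struct xsk_ring_prod *fill,
				    struct xsk_ring_cons *comp,
				    const struct xsk_umem_config *usr_config);

	/* compat entry point: widen the old config and forward with flags = 0 */
	int xsk_umem__create_v0_0_2(struct xsk_umem **umem_ptr, void *umem_area,
				    __u64 size, struct xsk_ring_prod *fill,
				    struct xsk_ring_cons *comp,
				    const struct xsk_umem_config_v1 *usr_config)
	{
		struct xsk_umem_config config;

		memcpy(&config, usr_config, sizeof(struct xsk_umem_config_v1));
		config.flags = 0;

		return xsk_umem__create_v0_0_4(umem_ptr, umem_area, size, fill,
					       comp, &config);
	}

	/* old binaries keep resolving xsk_umem__create to the compat version */
	__asm__(".symver xsk_umem__create_v0_0_2, xsk_umem__create@LIBBPF_0.0.2");
	__asm__(".symver xsk_umem__create_v0_0_4, xsk_umem__create@@LIBBPF_0.0.4");

Another common option is to add a size field to the config struct so a
single entry point can tell the two layouts apart.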


Björn


> >
> >
> > Björn
> >
> > >  };
> > >
> > >  /* Flags for the libbpf_flags field. */
> > > --
> > > 2.17.1
> > >
> > > _______________________________________________
> > > Intel-wired-lan mailing list
> > > Intel-wired-lan@osuosl.org
> > > https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v4 07/11] mlx5e: modify driver for handling offsets
  2019-07-30  8:53         ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-01 10:05           ` Maxim Mikityanskiy
  -1 siblings, 0 replies; 274+ messages in thread
From: Maxim Mikityanskiy @ 2019-08-01 10:05 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, Saeed Mahameed, stephen,
	bruce.richardson, ciara.loftus, bpf, intel-wired-lan

On 2019-07-30 11:53, Kevin Laatz wrote:
> With the addition of the unaligned chunks option, we need to make sure we
> handle the offsets accordingly based on the mode we are currently running
> in. This patch modifies the driver to appropriately mask the address for
> each case.
> 
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>

Please note that this patch doesn't actually add support for the new
feature, because the validation checks in mlx5e_rx_get_linear_frag_sz
and mlx5e_validate_xsk_param still need to be relaxed. Currently a frame
size of PAGE_SIZE is forced, and the fragment size is increased to
PAGE_SIZE in the XDP case (including XSK).

After making the changes required to permit frame sizes smaller than 
PAGE_SIZE, our Striding RQ feature will be used in a way we haven't used 
it before, so we need to verify with the hardware team that this usage 
is legitimate.

> ---
> v3:
>    - Use new helper function to handle offset
> 
> v4:
>    - fixed headroom addition to handle. Using xsk_umem_adjust_headroom()
>      now.
> ---
>   drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c    | 8 ++++++--
>   drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c | 3 ++-
>   2 files changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> index b0b982cf69bb..d5245893d2c8 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> @@ -122,6 +122,7 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
>   		      void *va, u16 *rx_headroom, u32 *len, bool xsk)
>   {
>   	struct bpf_prog *prog = READ_ONCE(rq->xdp_prog);
> +	struct xdp_umem *umem = rq->umem;
>   	struct xdp_buff xdp;
>   	u32 act;
>   	int err;
> @@ -138,8 +139,11 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
>   	xdp.rxq = &rq->xdp_rxq;
>   
>   	act = bpf_prog_run_xdp(prog, &xdp);
> -	if (xsk)
> -		xdp.handle += xdp.data - xdp.data_hard_start;
> +	if (xsk) {
> +		u64 off = xdp.data - xdp.data_hard_start;
> +
> +		xdp.handle = xsk_umem_handle_offset(umem, xdp.handle, off);
> +	}
>   	switch (act) {
>   	case XDP_PASS:
>   		*rx_headroom = xdp.data - xdp.data_hard_start;
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
> index 6a55573ec8f2..7c49a66d28c9 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
> @@ -24,7 +24,8 @@ int mlx5e_xsk_page_alloc_umem(struct mlx5e_rq *rq,
>   	if (!xsk_umem_peek_addr_rq(umem, &handle))
>   		return -ENOMEM;
>   
> -	dma_info->xsk.handle = handle + rq->buff.umem_headroom;
> +	dma_info->xsk.handle = xsk_umem_adjust_offset(umem, handle,
> +						      rq->buff.umem_headroom);
>   	dma_info->xsk.data = xdp_umem_get_data(umem, dma_info->xsk.handle);
>   
>   	/* No need to add headroom to the DMA address. In striding RQ case, we
> 


^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v4 07/11] mlx5e: modify driver for handling offsets
@ 2019-08-01 10:05           ` Maxim Mikityanskiy
  0 siblings, 0 replies; 274+ messages in thread
From: Maxim Mikityanskiy @ 2019-08-01 10:05 UTC (permalink / raw)
  To: intel-wired-lan

On 2019-07-30 11:53, Kevin Laatz wrote:
> With the addition of the unaligned chunks option, we need to make sure we
> handle the offsets accordingly based on the mode we are currently running
> in. This patch modifies the driver to appropriately mask the address for
> each case.
> 
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>

Please note that this patch doesn't actually add the support for the new 
feature, because the validation checks in mlx5e_rx_get_linear_frag_sz 
and mlx5e_validate_xsk_param need to be relaxed. Currently the frame 
size of PAGE_SIZE is forced, and the fragment size is increased to 
PAGE_SIZE in case of XDP (including XSK).

After making the changes required to permit frame sizes smaller than 
PAGE_SIZE, our Striding RQ feature will be used in a way we haven't used 
it before, so we need to verify with the hardware team that this usage 
is legitimate.

> ---
> v3:
>    - Use new helper function to handle offset
> 
> v4:
>    - fixed headroom addition to handle. Using xsk_umem_adjust_headroom()
>      now.
> ---
>   drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c    | 8 ++++++--
>   drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c | 3 ++-
>   2 files changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> index b0b982cf69bb..d5245893d2c8 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> @@ -122,6 +122,7 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
>   		      void *va, u16 *rx_headroom, u32 *len, bool xsk)
>   {
>   	struct bpf_prog *prog = READ_ONCE(rq->xdp_prog);
> +	struct xdp_umem *umem = rq->umem;
>   	struct xdp_buff xdp;
>   	u32 act;
>   	int err;
> @@ -138,8 +139,11 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
>   	xdp.rxq = &rq->xdp_rxq;
>   
>   	act = bpf_prog_run_xdp(prog, &xdp);
> -	if (xsk)
> -		xdp.handle += xdp.data - xdp.data_hard_start;
> +	if (xsk) {
> +		u64 off = xdp.data - xdp.data_hard_start;
> +
> +		xdp.handle = xsk_umem_handle_offset(umem, xdp.handle, off);
> +	}
>   	switch (act) {
>   	case XDP_PASS:
>   		*rx_headroom = xdp.data - xdp.data_hard_start;
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
> index 6a55573ec8f2..7c49a66d28c9 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
> @@ -24,7 +24,8 @@ int mlx5e_xsk_page_alloc_umem(struct mlx5e_rq *rq,
>   	if (!xsk_umem_peek_addr_rq(umem, &handle))
>   		return -ENOMEM;
>   
> -	dma_info->xsk.handle = handle + rq->buff.umem_headroom;
> +	dma_info->xsk.handle = xsk_umem_adjust_offset(umem, handle,
> +						      rq->buff.umem_headroom);
>   	dma_info->xsk.data = xdp_umem_get_data(umem, dma_info->xsk.handle);
>   
>   	/* No need to add headroom to the DMA address. In striding RQ case, we
> 


^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [Intel-wired-lan] [PATCH bpf-next v4 03/11] libbpf: add flags to umem config
  2019-08-01  7:34               ` Björn Töpel
@ 2019-08-02  7:19                 ` Andrii Nakryiko
  -1 siblings, 0 replies; 274+ messages in thread
From: Andrii Nakryiko @ 2019-08-02  7:19 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Kevin Laatz, Netdev, Alexei Starovoitov, Daniel Borkmann,
	Björn Töpel, Karlsson, Magnus, Jakub Kicinski,
	Jonathan Lemon, Saeed Mahameed, Maxim Mikityanskiy,
	Stephen Hemminger, Bruce Richardson, ciara.loftus,
	intel-wired-lan, bpf

On Thu, Aug 1, 2019 at 12:34 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> On Thu, 1 Aug 2019 at 08:59, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > On Wed, Jul 31, 2019 at 8:21 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
> > >
> > > On Tue, 30 Jul 2019 at 19:43, Kevin Laatz <kevin.laatz@intel.com> wrote:
> > > >
> > > > This patch adds a 'flags' field to the umem_config and umem_reg structs.
> > > > This will allow for more options to be added for configuring umems.
> > > >
> > > > The first use for the flags field is to add a flag for unaligned chunks
> > > > mode. These flags can either be user-provided or filled with a default.
> > > >
> > > > Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> > > > Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> > > >
> > > > ---
> > > > v2:
> > > >   - Removed the headroom check from this patch. It has moved to the
> > > >     previous patch.
> > > >
> > > > v4:
> > > >   - modified chunk flag define
> > > > ---
> >
> > [...]
> >
> > > > diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
> > > > index 833a6e60d065..44a03d8c34b9 100644
> > > > --- a/tools/lib/bpf/xsk.h
> > > > +++ b/tools/lib/bpf/xsk.h
> > > > @@ -170,12 +170,14 @@ LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
> > > >  #define XSK_UMEM__DEFAULT_FRAME_SHIFT    12 /* 4096 bytes */
> > > >  #define XSK_UMEM__DEFAULT_FRAME_SIZE     (1 << XSK_UMEM__DEFAULT_FRAME_SHIFT)
> > > >  #define XSK_UMEM__DEFAULT_FRAME_HEADROOM 0
> > > > +#define XSK_UMEM__DEFAULT_FLAGS 0
> > > >
> > > >  struct xsk_umem_config {
> > > >         __u32 fill_size;
> > > >         __u32 comp_size;
> > > >         __u32 frame_size;
> > > >         __u32 frame_headroom;
> > > > +       __u32 flags;
> > >
> > > And the flags addition here, unfortunately, requires symbol versioning
> > > of xsk_umem__create(). That'll be the first in libbpf! :-)
> >
> > xsk_umem_config is passed by pointer to xsk_umem__create(), so this
> > doesn't break ABI, does it?
> >
>
> Old application, dynamically linked to new libbpf.so will crash,
> right? Old application passes old version of xsk_umem_config, and new
> library accesses (non-existing) flag struct member.

I think we have similar problems for all the _xattr type of commands
(as well as some of the btf stuff accepting extra opts structs). How is
this problem solved in general? Do we version the same function multiple
times, one for each added field? It feels like there should be some better
way to handle this...
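
As a rough sketch of the hazard under discussion (the struct and function
names below are purely illustrative, not libbpf's actual definitions): an
application built against the old, smaller struct passes a pointer to it,
and a newer library that unconditionally reads the appended member
dereferences memory past the end of the caller's object.

	struct cfg_old {                /* layout the old application was built with */
		unsigned int frame_size;
	};

	struct cfg_new {                /* layout the new library was built with */
		unsigned int frame_size;
		unsigned int flags;     /* member appended in the new version */
	};

	static unsigned int lib_get_flags(const struct cfg_new *cfg)
	{
		/* If the caller really allocated a cfg_old, this read goes
		 * past the end of the object: garbage at best, a crash at
		 * worst. */
		return cfg->flags;
	}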

>
>
> Björn
>
>
> > >
> > >
> > > Björn
> > >
> > > >  };
> > > >
> > > >  /* Flags for the libbpf_flags field. */
> > > > --
> > > > 2.17.1
> > > >
> > > > _______________________________________________
> > > > Intel-wired-lan mailing list
> > > > Intel-wired-lan@osuosl.org
> > > > https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v4 03/11] libbpf: add flags to umem config
@ 2019-08-02  7:19                 ` Andrii Nakryiko
  0 siblings, 0 replies; 274+ messages in thread
From: Andrii Nakryiko @ 2019-08-02  7:19 UTC (permalink / raw)
  To: intel-wired-lan

On Thu, Aug 1, 2019 at 12:34 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> On Thu, 1 Aug 2019 at 08:59, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > On Wed, Jul 31, 2019 at 8:21 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
> > >
> > > On Tue, 30 Jul 2019 at 19:43, Kevin Laatz <kevin.laatz@intel.com> wrote:
> > > >
> > > > This patch adds a 'flags' field to the umem_config and umem_reg structs.
> > > > This will allow for more options to be added for configuring umems.
> > > >
> > > > The first use for the flags field is to add a flag for unaligned chunks
> > > > mode. These flags can either be user-provided or filled with a default.
> > > >
> > > > Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> > > > Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> > > >
> > > > ---
> > > > v2:
> > > >   - Removed the headroom check from this patch. It has moved to the
> > > >     previous patch.
> > > >
> > > > v4:
> > > >   - modified chunk flag define
> > > > ---
> >
> > [...]
> >
> > > > diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
> > > > index 833a6e60d065..44a03d8c34b9 100644
> > > > --- a/tools/lib/bpf/xsk.h
> > > > +++ b/tools/lib/bpf/xsk.h
> > > > @@ -170,12 +170,14 @@ LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
> > > >  #define XSK_UMEM__DEFAULT_FRAME_SHIFT    12 /* 4096 bytes */
> > > >  #define XSK_UMEM__DEFAULT_FRAME_SIZE     (1 << XSK_UMEM__DEFAULT_FRAME_SHIFT)
> > > >  #define XSK_UMEM__DEFAULT_FRAME_HEADROOM 0
> > > > +#define XSK_UMEM__DEFAULT_FLAGS 0
> > > >
> > > >  struct xsk_umem_config {
> > > >         __u32 fill_size;
> > > >         __u32 comp_size;
> > > >         __u32 frame_size;
> > > >         __u32 frame_headroom;
> > > > +       __u32 flags;
> > >
> > > And the flags addition here, unfortunately, requires symbol versioning
> > > of xsk_umem__create(). That'll be the first in libbpf! :-)
> >
> > xsk_umem_config is passed by pointer to xsk_umem__create(), so this
> > doesn't break ABI, does it?
> >
>
> Old application, dynamically linked to new libbpf.so will crash,
> right? Old application passes old version of xsk_umem_config, and new
> library accesses (non-existing) flag struct member.

I think we have similar problems for all the _xattr type of commands
(as well as some of the btf stuff accepting extra opts structs). How is
this problem solved in general? Do we version the same function multiple
times, one for each added field? It feels like there should be some better
way to handle this...

>
>
> Björn
>
>
> > >
> > >
> > > Björn
> > >
> > > >  };
> > > >
> > > >  /* Flags for the libbpf_flags field. */
> > > > --
> > > > 2.17.1
> > > >
> > > > _______________________________________________
> > > > Intel-wired-lan mailing list
> > > > Intel-wired-lan at osuosl.org
> > > > https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [Intel-wired-lan] [PATCH bpf-next v4 03/11] libbpf: add flags to umem config
  2019-08-02  7:19                 ` Andrii Nakryiko
@ 2019-08-02  7:26                   ` Björn Töpel
  -1 siblings, 0 replies; 274+ messages in thread
From: Björn Töpel @ 2019-08-02  7:26 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Kevin Laatz, Netdev, Alexei Starovoitov, Daniel Borkmann,
	Björn Töpel, Karlsson, Magnus, Jakub Kicinski,
	Jonathan Lemon, Saeed Mahameed, Maxim Mikityanskiy,
	Stephen Hemminger, Bruce Richardson, ciara.loftus,
	intel-wired-lan, bpf

On Fri, 2 Aug 2019 at 09:19, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Aug 1, 2019 at 12:34 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
> >
[...]
> >
> > Old application, dynamically linked to new libbpf.so will crash,
> > right? Old application passes old version of xsk_umem_config, and new
> > library accesses (non-existing) flag struct member.
>
> I think we have similar problems for all the _xattr type of commands
> (as well some of btf stuff accepting extra opts structs). How is this
> problem solved in general? Do we version same function multiple times,
> one for each added field? It feels like there should be some better
> way to handle this...
>

If the size of the struct was passed as an argument (and extra care is
taken when adding members to the struct), it could be handled w/o
versioning... but that's not the case here. :-( Versioning is a mess
to deal with, so I'd be happy if it could be avoided...
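
A minimal sketch of the size-field convention being suggested (the names
are illustrative only, not an existing libbpf API): the caller records
sizeof() of the struct it was compiled against, and the library only reads
members that fall within that size, falling back to defaults otherwise.

	#include <stddef.h>

	struct umem_opts {
		size_t sz;               /* caller sets this to sizeof(*opts) */
		unsigned int frame_size;
		unsigned int flags;      /* member added in a later release */
	};

	/* Read a member only if the caller's struct is large enough to hold
	 * it; otherwise fall back to a default value. */
	#define OPTS_GET(opts, field, def)                                  \
		(((opts)->sz >= offsetof(struct umem_opts, field) +         \
				sizeof((opts)->field)) ? (opts)->field : (def))

	static unsigned int lib_get_flags(const struct umem_opts *opts)
	{
		return OPTS_GET(opts, flags, 0); /* old callers safely get 0 */
	}

With this convention an old binary that never heard of 'flags' keeps
working against a newer library, at the cost of only ever appending
members to the struct.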

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v4 03/11] libbpf: add flags to umem config
@ 2019-08-02  7:26                   ` Björn Töpel
  0 siblings, 0 replies; 274+ messages in thread
From: Björn Töpel @ 2019-08-02  7:26 UTC (permalink / raw)
  To: intel-wired-lan

On Fri, 2 Aug 2019 at 09:19, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> > On Thu, Aug 1, 2019 at 12:34 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
> >
[...]
> >
> > Old application, dynamically linked to new libbpf.so will crash,
> > right? Old application passes old version of xsk_umem_config, and new
> > library accesses (non-existing) flag struct member.
>
> I think we have similar problems for all the _xattr type of commands
> (as well some of btf stuff accepting extra opts structs). How is this
> problem solved in general? Do we version same function multiple times,
> one for each added field? It feels like there should be some better
> way to handle this...
>

If the size of the struct was passed as an argument (and extra care is
taken when adding members to the struct), it could be handled w/o
versioning... but that's not the case here. :-( Versioning is a mess
to deal with, so I'd be happy if it could be avoided...

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v4 07/11] mlx5e: modify driver for handling offsets
  2019-08-01 10:05           ` [Intel-wired-lan] " Maxim Mikityanskiy
@ 2019-08-19 14:36             ` Maxim Mikityanskiy
  -1 siblings, 0 replies; 274+ messages in thread
From: Maxim Mikityanskiy @ 2019-08-19 14:36 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, Saeed Mahameed, stephen,
	bruce.richardson, ciara.loftus, bpf, intel-wired-lan

On 2019-08-01 13:05, Maxim Mikityanskiy wrote:
> On 2019-07-30 11:53, Kevin Laatz wrote:
>> With the addition of the unaligned chunks option, we need to make sure we
>> handle the offsets accordingly based on the mode we are currently running
>> in. This patch modifies the driver to appropriately mask the address for
>> each case.
>>
>> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> 
> Please note that this patch doesn't actually add the support for the new 
> feature, because the validation checks in mlx5e_rx_get_linear_frag_sz 
> and mlx5e_validate_xsk_param need to be relaxed. Currently the frame 
> size of PAGE_SIZE is forced, and the fragment size is increased to 
> PAGE_SIZE in case of XDP (including XSK).
> 
> After making the changes required to permit frame sizes smaller than 
> PAGE_SIZE, our Striding RQ feature will be used in a way we haven't used 
> it before, so we need to verify with the hardware team that this usage 
> is legitimate.

After discussing it internally, we found a way to support unaligned XSK 
with Striding RQ, and the hardware is compatible with this way. I have 
performed some testing, and it appears to be working.

Your patch only adds support for the new handle format to our driver, 
and I've made another patch that actually enables the new feature (makes 
mlx5e accept frame sizes different from PAGE_SIZE). It's currently on 
internal review.

Please also don't forget to fix the s/_handle_/_adjust_/ typo.

>> ---
>> v3:
>>    - Use new helper function to handle offset
>>
>> v4:
>>    - fixed headroom addition to handle. Using xsk_umem_adjust_headroom()
>>      now.
>> ---
>>   drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c    | 8 ++++++--
>>   drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c | 3 ++-
>>   2 files changed, 8 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c 
>> b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
>> index b0b982cf69bb..d5245893d2c8 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
>> @@ -122,6 +122,7 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct 
>> mlx5e_dma_info *di,
>>                 void *va, u16 *rx_headroom, u32 *len, bool xsk)
>>   {
>>       struct bpf_prog *prog = READ_ONCE(rq->xdp_prog);
>> +    struct xdp_umem *umem = rq->umem;
>>       struct xdp_buff xdp;
>>       u32 act;
>>       int err;
>> @@ -138,8 +139,11 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct 
>> mlx5e_dma_info *di,
>>       xdp.rxq = &rq->xdp_rxq;
>>       act = bpf_prog_run_xdp(prog, &xdp);
>> -    if (xsk)
>> -        xdp.handle += xdp.data - xdp.data_hard_start;
>> +    if (xsk) {
>> +        u64 off = xdp.data - xdp.data_hard_start;
>> +
>> +        xdp.handle = xsk_umem_handle_offset(umem, xdp.handle, off);
>> +    }
>>       switch (act) {
>>       case XDP_PASS:
>>           *rx_headroom = xdp.data - xdp.data_hard_start;
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c 
>> b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
>> index 6a55573ec8f2..7c49a66d28c9 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
>> @@ -24,7 +24,8 @@ int mlx5e_xsk_page_alloc_umem(struct mlx5e_rq *rq,
>>       if (!xsk_umem_peek_addr_rq(umem, &handle))
>>           return -ENOMEM;
>> -    dma_info->xsk.handle = handle + rq->buff.umem_headroom;
>> +    dma_info->xsk.handle = xsk_umem_adjust_offset(umem, handle,
>> +                              rq->buff.umem_headroom);
>>       dma_info->xsk.data = xdp_umem_get_data(umem, dma_info->xsk.handle);
>>       /* No need to add headroom to the DMA address. In striding RQ 
>> case, we
>>
> 


^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v4 07/11] mlx5e: modify driver for handling offsets
@ 2019-08-19 14:36             ` Maxim Mikityanskiy
  0 siblings, 0 replies; 274+ messages in thread
From: Maxim Mikityanskiy @ 2019-08-19 14:36 UTC (permalink / raw)
  To: intel-wired-lan

On 2019-08-01 13:05, Maxim Mikityanskiy wrote:
> On 2019-07-30 11:53, Kevin Laatz wrote:
>> With the addition of the unaligned chunks option, we need to make sure we
>> handle the offsets accordingly based on the mode we are currently running
>> in. This patch modifies the driver to appropriately mask the address for
>> each case.
>>
>> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> 
> Please note that this patch doesn't actually add the support for the new 
> feature, because the validation checks in mlx5e_rx_get_linear_frag_sz 
> and mlx5e_validate_xsk_param need to be relaxed. Currently the frame 
> size of PAGE_SIZE is forced, and the fragment size is increased to 
> PAGE_SIZE in case of XDP (including XSK).
> 
> After making the changes required to permit frame sizes smaller than 
> PAGE_SIZE, our Striding RQ feature will be used in a way we haven't used 
> it before, so we need to verify with the hardware team that this usage 
> is legitimate.

After discussing it internally, we found a way to support unaligned XSK 
with Striding RQ, and the hardware is compatible with this way. I have 
performed some testing, and it appears to be working.

Your patch only adds support for the new handle format to our driver, 
and I've made another patch that actually enables the new feature (makes 
mlx5e accept frame sizes different from PAGE_SIZE). It's currently on 
internal review.

Please also don't forget to fix the s/_handle_/_adjust_/ typo.

>> ---
>> v3:
>>    - Use new helper function to handle offset
>>
>> v4:
>>    - fixed headroom addition to handle. Using xsk_umem_adjust_headroom()
>>      now.
>> ---
>>   drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c    | 8 ++++++--
>>   drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c | 3 ++-
>>   2 files changed, 8 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c 
>> b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
>> index b0b982cf69bb..d5245893d2c8 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
>> @@ -122,6 +122,7 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct 
>> mlx5e_dma_info *di,
>>                 void *va, u16 *rx_headroom, u32 *len, bool xsk)
>>   {
>>       struct bpf_prog *prog = READ_ONCE(rq->xdp_prog);
>> +    struct xdp_umem *umem = rq->umem;
>>       struct xdp_buff xdp;
>>       u32 act;
>>       int err;
>> @@ -138,8 +139,11 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct 
>> mlx5e_dma_info *di,
>>       xdp.rxq = &rq->xdp_rxq;
>>       act = bpf_prog_run_xdp(prog, &xdp);
>> -    if (xsk)
>> -        xdp.handle += xdp.data - xdp.data_hard_start;
>> +    if (xsk) {
>> +        u64 off = xdp.data - xdp.data_hard_start;
>> +
>> +        xdp.handle = xsk_umem_handle_offset(umem, xdp.handle, off);
>> +    }
>>       switch (act) {
>>       case XDP_PASS:
>>           *rx_headroom = xdp.data - xdp.data_hard_start;
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c 
>> b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
>> index 6a55573ec8f2..7c49a66d28c9 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
>> @@ -24,7 +24,8 @@ int mlx5e_xsk_page_alloc_umem(struct mlx5e_rq *rq,
>>       if (!xsk_umem_peek_addr_rq(umem, &handle))
>>           return -ENOMEM;
>> -    dma_info->xsk.handle = handle + rq->buff.umem_headroom;
>> +    dma_info->xsk.handle = xsk_umem_adjust_offset(umem, handle,
>> +                              rq->buff.umem_headroom);
>>       dma_info->xsk.data = xdp_umem_get_data(umem, dma_info->xsk.handle);
>>       /* No need to add headroom to the DMA address. In striding RQ 
>> case, we
>>
> 


^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v4 07/11] mlx5e: modify driver for handling offsets
  2019-08-19 14:36             ` [Intel-wired-lan] " Maxim Mikityanskiy
@ 2019-08-19 14:47               ` Laatz, Kevin
  -1 siblings, 0 replies; 274+ messages in thread
From: Laatz, Kevin @ 2019-08-19 14:47 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, Saeed Mahameed, stephen,
	bruce.richardson, ciara.loftus, bpf, intel-wired-lan

On 19/08/2019 15:36, Maxim Mikityanskiy wrote:
> On 2019-08-01 13:05, Maxim Mikityanskiy wrote:
>> On 2019-07-30 11:53, Kevin Laatz wrote:
>>> With the addition of the unaligned chunks option, we need to make sure we
>>> handle the offsets accordingly based on the mode we are currently running
>>> in. This patch modifies the driver to appropriately mask the address for
>>> each case.
>>>
>>> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
>> Please note that this patch doesn't actually add the support for the new
>> feature, because the validation checks in mlx5e_rx_get_linear_frag_sz
>> and mlx5e_validate_xsk_param need to be relaxed. Currently the frame
>> size of PAGE_SIZE is forced, and the fragment size is increased to
>> PAGE_SIZE in case of XDP (including XSK).
>>
>> After making the changes required to permit frame sizes smaller than
>> PAGE_SIZE, our Striding RQ feature will be used in a way we haven't used
>> it before, so we need to verify with the hardware team that this usage
>> is legitimate.
> After discussing it internally, we found a way to support unaligned XSK
> with Striding RQ, and the hardware is compatible with this way. I have
> performed some testing, and it appears to be working.
Great news! :-)
>
> Your patch only adds support for the new handle format to our driver,
> and I've made another patch that actually enables the new feature (makes
> mlx5e accept frame sizes different from PAGE_SIZE). It's currently on
> internal review.
Thanks for taking the time to prepare the patch!
>
> Please also don't forget to fix the s/_handle_/_adjust_/ typo.

I have fixed it in the next version (also currently on internal review).

Regards,

Kevin

>
>>> ---
>>> v3:
>>>     - Use new helper function to handle offset
>>>
>>> v4:
>>>     - fixed headroom addition to handle. Using xsk_umem_adjust_headroom()
>>>       now.
>>> ---
>>>    drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c    | 8 ++++++--
>>>    drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c | 3 ++-
>>>    2 files changed, 8 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
>>> b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
>>> index b0b982cf69bb..d5245893d2c8 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
>>> @@ -122,6 +122,7 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct
>>> mlx5e_dma_info *di,
>>>                  void *va, u16 *rx_headroom, u32 *len, bool xsk)
>>>    {
>>>        struct bpf_prog *prog = READ_ONCE(rq->xdp_prog);
>>> +    struct xdp_umem *umem = rq->umem;
>>>        struct xdp_buff xdp;
>>>        u32 act;
>>>        int err;
>>> @@ -138,8 +139,11 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct
>>> mlx5e_dma_info *di,
>>>        xdp.rxq = &rq->xdp_rxq;
>>>        act = bpf_prog_run_xdp(prog, &xdp);
>>> -    if (xsk)
>>> -        xdp.handle += xdp.data - xdp.data_hard_start;
>>> +    if (xsk) {
>>> +        u64 off = xdp.data - xdp.data_hard_start;
>>> +
>>> +        xdp.handle = xsk_umem_handle_offset(umem, xdp.handle, off);
>>> +    }
>>>        switch (act) {
>>>        case XDP_PASS:
>>>            *rx_headroom = xdp.data - xdp.data_hard_start;
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
>>> b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
>>> index 6a55573ec8f2..7c49a66d28c9 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
>>> @@ -24,7 +24,8 @@ int mlx5e_xsk_page_alloc_umem(struct mlx5e_rq *rq,
>>>        if (!xsk_umem_peek_addr_rq(umem, &handle))
>>>            return -ENOMEM;
>>> -    dma_info->xsk.handle = handle + rq->buff.umem_headroom;
>>> +    dma_info->xsk.handle = xsk_umem_adjust_offset(umem, handle,
>>> +                              rq->buff.umem_headroom);
>>>        dma_info->xsk.data = xdp_umem_get_data(umem, dma_info->xsk.handle);
>>>        /* No need to add headroom to the DMA address. In striding RQ
>>> case, we
>>>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v4 07/11] mlx5e: modify driver for handling offsets
@ 2019-08-19 14:47               ` Laatz, Kevin
  0 siblings, 0 replies; 274+ messages in thread
From: Laatz, Kevin @ 2019-08-19 14:47 UTC (permalink / raw)
  To: intel-wired-lan

On 19/08/2019 15:36, Maxim Mikityanskiy wrote:
> On 2019-08-01 13:05, Maxim Mikityanskiy wrote:
>> On 2019-07-30 11:53, Kevin Laatz wrote:
>>> With the addition of the unaligned chunks option, we need to make sure we
>>> handle the offsets accordingly based on the mode we are currently running
>>> in. This patch modifies the driver to appropriately mask the address for
>>> each case.
>>>
>>> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
>> Please note that this patch doesn't actually add the support for the new
>> feature, because the validation checks in mlx5e_rx_get_linear_frag_sz
>> and mlx5e_validate_xsk_param need to be relaxed. Currently the frame
>> size of PAGE_SIZE is forced, and the fragment size is increased to
>> PAGE_SIZE in case of XDP (including XSK).
>>
>> After making the changes required to permit frame sizes smaller than
>> PAGE_SIZE, our Striding RQ feature will be used in a way we haven't used
>> it before, so we need to verify with the hardware team that this usage
>> is legitimate.
> After discussing it internally, we found a way to support unaligned XSK
> with Striding RQ, and the hardware is compatible with this way. I have
> performed some testing, and it appears to be working.
Great news! :-)
>
> Your patch only adds support for the new handle format to our driver,
> and I've made another patch that actually enables the new feature (makes
> mlx5e accept frame sizes different from PAGE_SIZE). It's currently on
> internal review.
Thanks for taking the time to prepare the patch!
>
> Please also don't forget to fix the s/_handle_/_adjust_/ typo.

I have fixed it in the next version (also currently on internal review).

Regards,

Kevin

>
>>> ---
>>> v3:
>>>     - Use new helper function to handle offset
>>>
>>> v4:
>>>     - fixed headroom addition to handle. Using xsk_umem_adjust_headroom()
>>>       now.
>>> ---
>>>    drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c    | 8 ++++++--
>>>    drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c | 3 ++-
>>>    2 files changed, 8 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
>>> b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
>>> index b0b982cf69bb..d5245893d2c8 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
>>> @@ -122,6 +122,7 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct
>>> mlx5e_dma_info *di,
>>>                  void *va, u16 *rx_headroom, u32 *len, bool xsk)
>>>    {
>>>        struct bpf_prog *prog = READ_ONCE(rq->xdp_prog);
>>> +    struct xdp_umem *umem = rq->umem;
>>>        struct xdp_buff xdp;
>>>        u32 act;
>>>        int err;
>>> @@ -138,8 +139,11 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct
>>> mlx5e_dma_info *di,
>>>        xdp.rxq = &rq->xdp_rxq;
>>>        act = bpf_prog_run_xdp(prog, &xdp);
>>> -    if (xsk)
>>> -        xdp.handle += xdp.data - xdp.data_hard_start;
>>> +    if (xsk) {
>>> +        u64 off = xdp.data - xdp.data_hard_start;
>>> +
>>> +        xdp.handle = xsk_umem_handle_offset(umem, xdp.handle, off);
>>> +    }
>>>        switch (act) {
>>>        case XDP_PASS:
>>>            *rx_headroom = xdp.data - xdp.data_hard_start;
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
>>> b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
>>> index 6a55573ec8f2..7c49a66d28c9 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
>>> @@ -24,7 +24,8 @@ int mlx5e_xsk_page_alloc_umem(struct mlx5e_rq *rq,
>>>        if (!xsk_umem_peek_addr_rq(umem, &handle))
>>>            return -ENOMEM;
>>> -    dma_info->xsk.handle = handle + rq->buff.umem_headroom;
>>> +    dma_info->xsk.handle = xsk_umem_adjust_offset(umem, handle,
>>> +                              rq->buff.umem_headroom);
>>>        dma_info->xsk.data = xdp_umem_get_data(umem, dma_info->xsk.handle);
>>>        /* No need to add headroom to the DMA address. In striding RQ
>>> case, we
>>>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v5 00/11] XDP unaligned chunk placement support
  2019-07-30  8:53       ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-22  1:44         ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-22  1:44 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

This patch set adds the ability to use unaligned chunks in the XDP umem.

Currently, all chunk addresses passed to the umem are masked to be chunk
size aligned (max is PAGE_SIZE). This limits where we can place chunks
within the umem as well as limiting the packet sizes that are supported.

The changes in this patch set removes these restrictions, allowing XDP to
be more flexible in where it can place a chunk within a umem. By relaxing
where the chunks can be placed, it allows us to use an arbitrary buffer
size and place that wherever we have a free address in the umem. These
changes add the ability to support arbitrary frame sizes up to 4k
(PAGE_SIZE) and make it easy to integrate with other existing frameworks
that have their own memory management systems, such as DPDK.
In DPDK, for example, there is already support for AF_XDP with zero-copy.
However, with this patch set the integration will be much more seamless.
You can find the DPDK AF_XDP driver at:
https://git.dpdk.org/dpdk/tree/drivers/net/af_xdp

Since we are now dealing with arbitrary frame sizes, we also need to
update how we pass around addresses. Currently, the addresses can simply be
masked to 2k to get back to the original address. This becomes less trivial
when using frame sizes that are not a 'power of 2' size. This patch set
modifies the Rx/Tx descriptor format to use the upper 16-bits of the addr
field for an offset value, leaving the lower 48-bits for the address (this
leaves us with 256 Terabytes, which should be enough!). We only need to use
the upper 16-bits to store the offset when running in unaligned mode.
Rather than adding the offset (headroom etc) to the address, we will store
it in the upper 16-bits of the address field. This way, we can easily add
the offset to the address where we need it, using some bit manipulation and
addition, and we can also easily get the original address wherever we need
it (for example in i40e_zca_free) by simply masking to get the lower
48-bits of the address field.
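
The split can be expressed with a couple of helpers roughly like the
following (the constant and helper names here are only illustrative; the
exact macros in the uapi headers may differ from this cover letter):

	#include <stdint.h>

	#define BUF_OFFSET_SHIFT 48
	#define BUF_ADDR_MASK    ((1ULL << BUF_OFFSET_SHIFT) - 1)

	/* Lower 48 bits: the original chunk address within the umem. */
	static inline uint64_t buf_addr(uint64_t desc_addr)
	{
		return desc_addr & BUF_ADDR_MASK;
	}

	/* Upper 16 bits: the offset (headroom etc.), only used in unaligned
	 * mode. */
	static inline uint64_t buf_offset(uint64_t desc_addr)
	{
		return desc_addr >> BUF_OFFSET_SHIFT;
	}

	/* Where the packet data actually starts: base address plus offset. */
	static inline uint64_t buf_data_addr(uint64_t desc_addr)
	{
		return buf_addr(desc_addr) + buf_offset(desc_addr);
	}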

Note: The mlx5e patch only adds support for the new offset handling. Maxim
Mikityanski <maximmi@mellanox.com> is working on an additional patch that
enables the unaligned chunks feature (makes mlx5e accept arbitrary frame
sizes) which will be added later and applies on top of this patch.

The numbers below were recorded with the following set up:
  - Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
  - Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
  - Driver: i40e
  - Application: xdpsock with l2fwd (single interface)
  - Turbo disabled in BIOS

These are solely for comparing performance with and without the patches.
The largest drop was ~1.5% (in zero-copy mode).

+-------------------------+-------------+-----------------+-------------+
| Buffer size: 2048       | SKB mode    | Zero-copy       | Copy        |
+-------------------------+-------------+-----------------+-------------+
| Aligned (baseline)      | 1.25 Mpps   | 10.0 Mpps       | 1.66 Mpps   |
+-------------------------+-------------+-----------------+-------------+
| Aligned (with patches)  | 1.25 Mpps   | 9.85 Mpps       | 1.66 Mpps   |
+-------------------------+-------------+-----------------+-------------+
| Unaligned               | 1.25 Mpps   | 9.65 Mpps       | 1.66 Mpps   |
+-------------------------+-------------+-----------------+-------------+

This patch set has been applied against
commit 0bb52b0dfc88 ("tools: bpftool: add 'bpftool map freeze' subcommand")

Structure of the patch set:
Patch 1:
  - Remove unnecessary masking and headroom addition during zero-copy Rx
    buffer recycling in i40e. This change is required in order for the
    buffer recycling to work in the unaligned chunk mode.

Patch 2:
  - Remove unnecessary masking and headroom addition during
    zero-copy Rx buffer recycling in ixgbe. This change is required in
    order for the  buffer recycling to work in the unaligned chunk mode.

Patch 3:
  - Add infrastructure for unaligned chunks. Since we are dealing with
    unaligned chunks that could potentially cross a physical page boundary,
    we add checks to keep track of that information. We can later use this
    information to correctly handle buffers that are placed at an address
    where they cross a page boundary.  This patch also modifies the
    existing Rx and Tx functions to use the new descriptor format. To
    handle addresses correctly, we need to mask appropriately based on
    whether we are in aligned or unaligned mode.

Patch 4:
  - This patch updates the i40e driver to make use of the new descriptor
    format.

Patch 5:
  - This patch updates the ixgbe driver to make use of the new descriptor
    format.

Patch 6:
  - This patch updates the mlx5e driver to make use of the new descriptor
    format. These changes are required to handle the new descriptor format
    and for unaligned chunks support.

Patch 7:
  - Add flags for umem configuration to libbpf. Since we increase the size
    of the struct by adding flags, we also need to add the ABI versioning
    in this patch.

Patch 8:
  - Modify xdpsock application to add a command line option for
    unaligned chunks

Patch 9:
  - Since we can now run the application in unaligned chunk mode, we need
    to make sure we recycle the buffers appropriately.

Patch 10:
  - Adds hugepage support to the xdpsock application

Patch 11:
  - Documentation update to include the unaligned chunk scenario. We need
    to explicitly state that the incoming addresses are only masked in the
    aligned chunk mode and not the unaligned chunk mode.

---
v2:
  - fixed checkpatch issues
  - fixed Rx buffer recycling for unaligned chunks in xdpsock
  - removed unused defines
  - fixed how chunk_size is calculated in xsk_diag.c
  - added some performance numbers to cover letter
  - modified descriptor format to make it easier to retrieve original
    address
  - removed patch adding off_t off to the zero copy allocator. This is no
    longer needed with the new descriptor format.

v3:
  - added patch for mlx5 driver changes needed for unaligned chunks
  - moved offset handling to new helper function
  - changed value used for the umem chunk_mask. Now using the new
    descriptor format to save us doing the calculations in a number of
    places meaning more of the code is left unchanged while adding
    unaligned chunk support.

v4:
  - reworked the next_pg_contig field in the xdp_umem_page struct. We now
    use the low 12 bits of the addr for flags rather than adding an extra
    field in the struct.
  - modified unaligned chunks flag define
  - fixed page_start calculation in __xsk_rcv_memcpy().
  - move offset handling to the xdp_umem_get_* functions
  - modified the len field in xdp_umem_reg struct. We now use 16 bits from
    this for the flags field.
  - fixed headroom addition to handle in the mlx5e driver
  - other minor changes based on review comments

v5:
  - Added ABI versioning in the libbpf patch
  - Removed bitfields in the xdp_umem_reg struct. Adding new flags field.
  - Added accessors for getting addr and offset.
  - Added helper function for adding the offset to the addr.
  - Fixed conflicts with 'bpf-af-xdp-wakeup' which was merged recently.
  - Fixed typo in mlx driver patch.
  - Moved libbpf patch to later in the set (7/11, just before the sample
    app changes)

Kevin Laatz (11):
  i40e: simplify Rx buffer recycle
  ixgbe: simplify Rx buffer recycle
  xsk: add support to allow unaligned chunk placement
  i40e: modify driver for handling offsets
  ixgbe: modify driver for handling offsets
  mlx5e: modify driver for handling offsets
  libbpf: add flags to umem config
  samples/bpf: add unaligned chunks mode support to xdpsock
  samples/bpf: add buffer recycling for unaligned chunks to xdpsock
  samples/bpf: use hugepages in xdpsock app
  doc/af_xdp: include unaligned chunk case

 Documentation/networking/af_xdp.rst           | 10 +-
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 26 +++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 26 +++--
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  8 +-
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.c   |  3 +-
 include/net/xdp_sock.h                        | 75 ++++++++++++++-
 include/uapi/linux/if_xdp.h                   |  9 ++
 net/xdp/xdp_umem.c                            | 19 +++-
 net/xdp/xsk.c                                 | 96 +++++++++++++++----
 net/xdp/xsk_diag.c                            |  2 +-
 net/xdp/xsk_queue.h                           | 68 +++++++++++--
 samples/bpf/xdpsock_user.c                    | 61 ++++++++----
 tools/include/uapi/linux/if_xdp.h             |  9 ++
 tools/lib/bpf/Makefile                        |  5 +-
 tools/lib/bpf/libbpf.map                      |  1 +
 tools/lib/bpf/xsk.c                           | 33 ++++++-
 tools/lib/bpf/xsk.h                           | 27 ++++++
 17 files changed, 384 insertions(+), 94 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v5 00/11] XDP unaligned chunk placement support
@ 2019-08-22  1:44         ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-22  1:44 UTC (permalink / raw)
  To: intel-wired-lan

This patch set adds the ability to use unaligned chunks in the XDP umem.

Currently, all chunk addresses passed to the umem are masked to be chunk
size aligned (max is PAGE_SIZE). This limits where we can place chunks
within the umem as well as limiting the packet sizes that are supported.

The changes in this patch set removes these restrictions, allowing XDP to
be more flexible in where it can place a chunk within a umem. By relaxing
where the chunks can be placed, it allows us to use an arbitrary buffer
size and place that wherever we have a free address in the umem. These
changes add the ability to support arbitrary frame sizes up to 4k
(PAGE_SIZE) and make it easy to integrate with other existing frameworks
that have their own memory management systems, such as DPDK.
In DPDK, for example, there is already support for AF_XDP with zero-copy.
However, with this patch set the integration will be much more seamless.
You can find the DPDK AF_XDP driver at:
https://git.dpdk.org/dpdk/tree/drivers/net/af_xdp

Since we are now dealing with arbitrary frame sizes, we also need to
update how we pass around addresses. Currently, the addresses can simply be
masked to 2k to get back to the original address. This becomes less trivial
when using frame sizes that are not a 'power of 2' size. This patch set
modifies the Rx/Tx descriptor format to use the upper 16-bits of the addr
field for an offset value, leaving the lower 48-bits for the address (this
leaves us with 256 Terabytes, which should be enough!). We only need to use
the upper 16-bits to store the offset when running in unaligned mode.
Rather than adding the offset (headroom etc) to the address, we will store
it in the upper 16-bits of the address field. This way, we can easily add
the offset to the address where we need it, using some bit manipulation and
addition, and we can also easily get the original address wherever we need
it (for example in i40e_zca_free) by simply masking to get the lower
48-bits of the address field.

Note: The mlx5e patch only adds support for the new offset handling. Maxim
Mikityanski <maximmi@mellanox.com> is working on an additional patch that
enables the unaligned chunks feature (makes mlx5e accept arbitrary frame
sizes) which will be added later and applies on top of this patch.

The numbers below were recorded with the following set up:
  - Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
  - Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
  - Driver: i40e
  - Application: xdpsock with l2fwd (single interface)
  - Turbo disabled in BIOS

These are solely for comparing performance with and without the patches.
The largest drop was ~1.5% (in zero-copy mode).

+-------------------------+-------------+-----------------+-------------+
| Buffer size: 2048       | SKB mode    | Zero-copy       | Copy        |
+-------------------------+-------------+-----------------+-------------+
| Aligned (baseline)      | 1.25 Mpps   | 10.0 Mpps       | 1.66 Mpps   |
+-------------------------+-------------+-----------------+-------------+
| Aligned (with patches)  | 1.25 Mpps   | 9.85 Mpps       | 1.66 Mpps   |
+-------------------------+-------------+-----------------+-------------+
| Unaligned               | 1.25 Mpps   | 9.65 Mpps       | 1.66 Mpps   |
+-------------------------+-------------+-----------------+-------------+

This patch set has been applied against
commit 0bb52b0dfc88 ("tools: bpftool: add 'bpftool map freeze' subcommand")

Structure of the patch set:
Patch 1:
  - Remove unnecessary masking and headroom addition during zero-copy Rx
    buffer recycling in i40e. This change is required in order for the
    buffer recycling to work in the unaligned chunk mode.

Patch 2:
  - Remove unnecessary masking and headroom addition during
    zero-copy Rx buffer recycling in ixgbe. This change is required in
    order for the  buffer recycling to work in the unaligned chunk mode.

Patch 3:
  - Add infrastructure for unaligned chunks. Since we are dealing with
    unaligned chunks that could potentially cross a physical page boundary,
    we add checks to keep track of that information. We can later use this
    information to correctly handle buffers that are placed at an address
    where they cross a page boundary.  This patch also modifies the
    existing Rx and Tx functions to use the new descriptor format. To
    handle addresses correctly, we need to mask appropriately based on
    whether we are in aligned or unaligned mode.

Patch 4:
  - This patch updates the i40e driver to make use of the new descriptor
    format.

Patch 5:
  - This patch updates the ixgbe driver to make use of the new descriptor
    format.

Patch 6:
  - This patch updates the mlx5e driver to make use of the new descriptor
    format. These changes are required to handle the new descriptor format
    and for unaligned chunks support.

Patch 7:
  - Add flags for umem configuration to libbpf. Since we increase the size
    of the struct by adding flags, we also need to add the ABI versioning
    in this patch.

Patch 8:
  - Modify xdpsock application to add a command line option for
    unaligned chunks

Patch 9:
  - Since we can now run the application in unaligned chunk mode, we need
    to make sure we recycle the buffers appropriately.

Patch 10:
  - Adds hugepage support to the xdpsock application

Patch 11:
  - Documentation update to include the unaligned chunk scenario. We need
    to explicitly state that the incoming addresses are only masked in the
    aligned chunk mode and not the unaligned chunk mode.

---
v2:
  - fixed checkpatch issues
  - fixed Rx buffer recycling for unaligned chunks in xdpsock
  - removed unused defines
  - fixed how chunk_size is calculated in xsk_diag.c
  - added some performance numbers to cover letter
  - modified descriptor format to make it easier to retrieve original
    address
  - removed patch adding off_t off to the zero copy allocator. This is no
    longer needed with the new descriptor format.

v3:
  - added patch for mlx5 driver changes needed for unaligned chunks
  - moved offset handling to new helper function
  - changed value used for the umem chunk_mask. Now using the new
    descriptor format to save us doing the calculations in a number of
    places meaning more of the code is left unchanged while adding
    unaligned chunk support.

v4:
  - reworked the next_pg_contig field in the xdp_umem_page struct. We now
    use the low 12 bits of the addr for flags rather than adding an extra
    field in the struct.
  - modified unaligned chunks flag define
  - fixed page_start calculation in __xsk_rcv_memcpy().
  - move offset handling to the xdp_umem_get_* functions
  - modified the len field in xdp_umem_reg struct. We now use 16 bits from
    this for the flags field.
  - fixed headroom addition to handle in the mlx5e driver
  - other minor changes based on review comments

v5:
  - Added ABI versioning in the libbpf patch
  - Removed bitfields in the xdp_umem_reg struct. Adding new flags field.
  - Added accessors for getting addr and offset.
  - Added helper function for adding the offset to the addr.
  - Fixed conflicts with 'bpf-af-xdp-wakeup' which was merged recently.
  - Fixed typo in mlx driver patch.
  - Moved libbpf patch to later in the set (7/11, just before the sample
    app changes)

Kevin Laatz (11):
  i40e: simplify Rx buffer recycle
  ixgbe: simplify Rx buffer recycle
  xsk: add support to allow unaligned chunk placement
  i40e: modify driver for handling offsets
  ixgbe: modify driver for handling offsets
  mlx5e: modify driver for handling offsets
  libbpf: add flags to umem config
  samples/bpf: add unaligned chunks mode support to xdpsock
  samples/bpf: add buffer recycling for unaligned chunks to xdpsock
  samples/bpf: use hugepages in xdpsock app
  doc/af_xdp: include unaligned chunk case

 Documentation/networking/af_xdp.rst           | 10 +-
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 26 +++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 26 +++--
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  8 +-
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.c   |  3 +-
 include/net/xdp_sock.h                        | 75 ++++++++++++++-
 include/uapi/linux/if_xdp.h                   |  9 ++
 net/xdp/xdp_umem.c                            | 19 +++-
 net/xdp/xsk.c                                 | 96 +++++++++++++++----
 net/xdp/xsk_diag.c                            |  2 +-
 net/xdp/xsk_queue.h                           | 68 +++++++++++--
 samples/bpf/xdpsock_user.c                    | 61 ++++++++----
 tools/include/uapi/linux/if_xdp.h             |  9 ++
 tools/lib/bpf/Makefile                        |  5 +-
 tools/lib/bpf/libbpf.map                      |  1 +
 tools/lib/bpf/xsk.c                           | 33 ++++++-
 tools/lib/bpf/xsk.h                           | 27 ++++++
 17 files changed, 384 insertions(+), 94 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v5 01/11] i40e: simplify Rx buffer recycle
  2019-08-22  1:44         ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-22  1:44           ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-22  1:44 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

Currently, the dma, addr and handle are modified when we reuse Rx buffers
in zero-copy mode. However, this is not required as the inputs to the
function are copies, not the original values themselves. As we use the
copies within the function, we can use the original 'old_bi' values
directly without having to mask and add the headroom.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 42c90126b6c4..2d6e82f72f48 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -420,8 +420,6 @@ static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring,
 				    struct i40e_rx_buffer *old_bi)
 {
 	struct i40e_rx_buffer *new_bi = &rx_ring->rx_bi[rx_ring->next_to_alloc];
-	unsigned long mask = (unsigned long)rx_ring->xsk_umem->chunk_mask;
-	u64 hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
 	u16 nta = rx_ring->next_to_alloc;
 
 	/* update, and store next to alloc */
@@ -429,14 +427,9 @@ static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring,
 	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
 	/* transfer page from old buffer to new buffer */
-	new_bi->dma = old_bi->dma & mask;
-	new_bi->dma += hr;
-
-	new_bi->addr = (void *)((unsigned long)old_bi->addr & mask);
-	new_bi->addr += hr;
-
-	new_bi->handle = old_bi->handle & mask;
-	new_bi->handle += rx_ring->xsk_umem->headroom;
+	new_bi->dma = old_bi->dma;
+	new_bi->addr = old_bi->addr;
+	new_bi->handle = old_bi->handle;
 
 	old_bi->addr = NULL;
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v5 01/11] i40e: simplify Rx buffer recycle
@ 2019-08-22  1:44           ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-22  1:44 UTC (permalink / raw)
  To: intel-wired-lan

Currently, the dma, addr and handle are modified when we reuse Rx buffers
in zero-copy mode. However, this is not required as the inputs to the
function are copies, not the original values themselves. As we use the
copies within the function, we can use the original 'old_bi' values
directly without having to mask and add the headroom.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 42c90126b6c4..2d6e82f72f48 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -420,8 +420,6 @@ static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring,
 				    struct i40e_rx_buffer *old_bi)
 {
 	struct i40e_rx_buffer *new_bi = &rx_ring->rx_bi[rx_ring->next_to_alloc];
-	unsigned long mask = (unsigned long)rx_ring->xsk_umem->chunk_mask;
-	u64 hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
 	u16 nta = rx_ring->next_to_alloc;
 
 	/* update, and store next to alloc */
@@ -429,14 +427,9 @@ static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring,
 	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
 	/* transfer page from old buffer to new buffer */
-	new_bi->dma = old_bi->dma & mask;
-	new_bi->dma += hr;
-
-	new_bi->addr = (void *)((unsigned long)old_bi->addr & mask);
-	new_bi->addr += hr;
-
-	new_bi->handle = old_bi->handle & mask;
-	new_bi->handle += rx_ring->xsk_umem->headroom;
+	new_bi->dma = old_bi->dma;
+	new_bi->addr = old_bi->addr;
+	new_bi->handle = old_bi->handle;
 
 	old_bi->addr = NULL;
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v5 02/11] ixgbe: simplify Rx buffer recycle
  2019-08-22  1:44         ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-22  1:44           ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-22  1:44 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

Currently, the dma, addr and handle are modified when we reuse Rx buffers
in zero-copy mode. However, this is not required as the inputs to the
function are copies, not the original values themselves. As we use the
copies within the function, we can use the original 'obi' values
directly without having to mask and add the headroom.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index 9a28d98a1484..e1f743470ae9 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -201,8 +201,6 @@ ixgbe_rx_buffer *ixgbe_get_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 static void ixgbe_reuse_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 				     struct ixgbe_rx_buffer *obi)
 {
-	unsigned long mask = (unsigned long)rx_ring->xsk_umem->chunk_mask;
-	u64 hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
 	u16 nta = rx_ring->next_to_alloc;
 	struct ixgbe_rx_buffer *nbi;
 
@@ -212,14 +210,9 @@ static void ixgbe_reuse_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
 	/* transfer page from old buffer to new buffer */
-	nbi->dma = obi->dma & mask;
-	nbi->dma += hr;
-
-	nbi->addr = (void *)((unsigned long)obi->addr & mask);
-	nbi->addr += hr;
-
-	nbi->handle = obi->handle & mask;
-	nbi->handle += rx_ring->xsk_umem->headroom;
+	nbi->dma = obi->dma;
+	nbi->addr = obi->addr;
+	nbi->handle = obi->handle;
 
 	obi->addr = NULL;
 	obi->skb = NULL;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v5 02/11] ixgbe: simplify Rx buffer recycle
@ 2019-08-22  1:44           ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-22  1:44 UTC (permalink / raw)
  To: intel-wired-lan

Currently, the dma, addr and handle are modified when we reuse Rx buffers
in zero-copy mode. However, this is not required as the inputs to the
function are copies, not the original values themselves. As we use the
copies within the function, we can use the original 'obi' values
directly without having to mask and add the headroom.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index 9a28d98a1484..e1f743470ae9 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -201,8 +201,6 @@ ixgbe_rx_buffer *ixgbe_get_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 static void ixgbe_reuse_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 				     struct ixgbe_rx_buffer *obi)
 {
-	unsigned long mask = (unsigned long)rx_ring->xsk_umem->chunk_mask;
-	u64 hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
 	u16 nta = rx_ring->next_to_alloc;
 	struct ixgbe_rx_buffer *nbi;
 
@@ -212,14 +210,9 @@ static void ixgbe_reuse_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
 	/* transfer page from old buffer to new buffer */
-	nbi->dma = obi->dma & mask;
-	nbi->dma += hr;
-
-	nbi->addr = (void *)((unsigned long)obi->addr & mask);
-	nbi->addr += hr;
-
-	nbi->handle = obi->handle & mask;
-	nbi->handle += rx_ring->xsk_umem->headroom;
+	nbi->dma = obi->dma;
+	nbi->addr = obi->addr;
+	nbi->handle = obi->handle;
 
 	obi->addr = NULL;
 	obi->skb = NULL;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v5 03/11] xsk: add support to allow unaligned chunk placement
  2019-08-22  1:44         ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-22  1:44           ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-22  1:44 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

Currently, addresses are chunk size aligned. This means we are very
restricted in terms of where we can place a chunk within the umem. For
example, if we have a chunk size of 2k, then our chunks can only be placed
at 0, 2k, 4k, 6k, 8k... and so on (i.e. every 2k starting from 0).

This patch introduces the ability to use unaligned chunks. With these
changes, we are no longer bound to placing chunks at 2k (or whatever the
chunk size is) intervals. Since we are no longer dealing with
aligned chunks, they can now cross page boundaries. Checks for page
contiguity have been added in order to keep track of which pages are
followed by a physically contiguous page.
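
For illustration, a minimal standalone sketch of the descriptor address
format this enables (not part of this patch; the helper names below are
made up): in unaligned chunks mode the chunk's base address occupies the
lower 48 bits of the 64-bit address and the offset into the chunk occupies
the upper 16 bits. The in-kernel equivalents are the
xsk_umem_extract_addr()/xsk_umem_extract_offset() helpers added below.

#include <linux/types.h>

#define UNALIGNED_OFFSET_SHIFT	48	/* matches XSK_UNALIGNED_BUF_OFFSET_SHIFT */
#define UNALIGNED_ADDR_MASK	((1ULL << UNALIGNED_OFFSET_SHIFT) - 1)

/* Compose a descriptor address from a chunk base and an offset into it. */
static inline __u64 compose_unaligned_addr(__u64 base, __u64 offset)
{
	return base | (offset << UNALIGNED_OFFSET_SHIFT);
}

/* Recover the chunk base address from a descriptor address. */
static inline __u64 unaligned_addr_base(__u64 addr)
{
	return addr & UNALIGNED_ADDR_MASK;
}

/* Recover the offset within the chunk from a descriptor address. */
static inline __u64 unaligned_addr_offset(__u64 addr)
{
	return addr >> UNALIGNED_OFFSET_SHIFT;
}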

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

---
v2:
  - Add checks for the flags coming from userspace
  - Fix how we get chunk_size in xsk_diag.c
  - Add defines for masking the new descriptor format
  - Modified the rx functions to use new descriptor format
  - Modified the tx functions to use new descriptor format

v3:
  - Add helper function to do address/offset masking/addition

v4:
  - fixed page_start calculation in __xsk_rcv_memcpy().
  - move offset handling to the xdp_umem_get_* functions
  - modified the len field in xdp_umem_reg struct. We now use 16 bits from
    this for the flags field.
  - removed next_pg_contig field from xdp_umem_page struct. Using low 12
    bits of addr to store flags instead.
  - other minor changes based on review comments

v5:
  - Added accessors for getting addr and offset
  - Added helper function to add offset to addr
  - Fixed offset handling in xsk_rcv
  - Removed bitfields from xdp_umem_reg
  - Added struct size checking for xdp_umem_reg in xsk_setsockopt to handle
    different versions of the struct.
  - fix conflicts after 'bpf-af-xdp-wakeup' was merged.
---
 include/net/xdp_sock.h      | 75 +++++++++++++++++++++++++++--
 include/uapi/linux/if_xdp.h |  9 ++++
 net/xdp/xdp_umem.c          | 19 ++++++--
 net/xdp/xsk.c               | 96 +++++++++++++++++++++++++++++--------
 net/xdp/xsk_diag.c          |  2 +-
 net/xdp/xsk_queue.h         | 68 ++++++++++++++++++++++----
 6 files changed, 232 insertions(+), 37 deletions(-)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index f023b9940d64..c9398ce7960f 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -16,6 +16,13 @@
 struct net_device;
 struct xsk_queue;
 
+/* Masks for xdp_umem_page flags.
+ * The low 12-bits of the addr will be 0 since this is the page address, so we
+ * can use them for flags.
+ */
+#define XSK_NEXT_PG_CONTIG_SHIFT 0
+#define XSK_NEXT_PG_CONTIG_MASK (1ULL << XSK_NEXT_PG_CONTIG_SHIFT)
+
 struct xdp_umem_page {
 	void *addr;
 	dma_addr_t dma;
@@ -27,8 +34,12 @@ struct xdp_umem_fq_reuse {
 	u64 handles[];
 };
 
-/* Flags for the umem flags field. */
-#define XDP_UMEM_USES_NEED_WAKEUP (1 << 0)
+/* Flags for the umem flags field.
+ *
+ * The NEED_WAKEUP flag is 1 due to the reuse of the flags field for public
+ * flags. See include/uapi/linux/if_xdp.h.
+ */
+#define XDP_UMEM_USES_NEED_WAKEUP (1 << 1)
 
 struct xdp_umem {
 	struct xsk_queue *fq;
@@ -124,14 +135,36 @@ void xsk_map_try_sock_delete(struct xsk_map *map, struct xdp_sock *xs,
 int xsk_map_inc(struct xsk_map *map);
 void xsk_map_put(struct xsk_map *map);
 
+static inline u64 xsk_umem_extract_addr(u64 addr)
+{
+	return addr & XSK_UNALIGNED_BUF_ADDR_MASK;
+}
+
+static inline u64 xsk_umem_extract_offset(u64 addr)
+{
+	return addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+}
+
+static inline u64 xsk_umem_add_offset_to_addr(u64 addr)
+{
+	return xsk_umem_extract_addr(addr) + xsk_umem_extract_offset(addr);
+}
+
 static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr)
 {
-	return umem->pages[addr >> PAGE_SHIFT].addr + (addr & (PAGE_SIZE - 1));
+	unsigned long page_addr;
+
+	addr = xsk_umem_add_offset_to_addr(addr);
+	page_addr = (unsigned long)umem->pages[addr >> PAGE_SHIFT].addr;
+
+	return (char *)(page_addr & PAGE_MASK) + (addr & ~PAGE_MASK);
 }
 
 static inline dma_addr_t xdp_umem_get_dma(struct xdp_umem *umem, u64 addr)
 {
-	return umem->pages[addr >> PAGE_SHIFT].dma + (addr & (PAGE_SIZE - 1));
+	addr = xsk_umem_add_offset_to_addr(addr);
+
+	return umem->pages[addr >> PAGE_SHIFT].dma + (addr & ~PAGE_MASK);
 }
 
 /* Reuse-queue aware version of FILL queue helpers */
@@ -172,6 +205,19 @@ static inline void xsk_umem_fq_reuse(struct xdp_umem *umem, u64 addr)
 
 	rq->handles[rq->length++] = addr;
 }
+
+/* Handle the offset appropriately depending on aligned or unaligned mode.
+ * For unaligned mode, we store the offset in the upper 16-bits of the address.
+ * For aligned mode, we simply add the offset to the address.
+ */
+static inline u64 xsk_umem_adjust_offset(struct xdp_umem *umem, u64 address,
+					 u64 offset)
+{
+	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG)
+		return address + (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
+	else
+		return address + offset;
+}
 #else
 static inline int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
@@ -241,6 +287,21 @@ static inline struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
 	return NULL;
 }
 
+static inline u64 xsk_umem_extract_addr(u64 addr)
+{
+	return 0;
+}
+
+static inline u64 xsk_umem_extract_offset(u64 addr)
+{
+	return 0;
+}
+
+static inline u64 xsk_umem_add_offset_to_addr(u64 addr)
+{
+	return 0;
+}
+
 static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr)
 {
 	return NULL;
@@ -290,6 +351,12 @@ static inline bool xsk_umem_uses_need_wakeup(struct xdp_umem *umem)
 	return false;
 }
 
+static inline u64 xsk_umem_adjust_offset(struct xdp_umem *umem, u64 handle,
+					 u64 offset)
+{
+	return 0;
+}
+
 #endif /* CONFIG_XDP_SOCKETS */
 
 #endif /* _LINUX_XDP_SOCK_H */
diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index 62b80d57b72a..be328c59389d 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -26,6 +26,9 @@
  */
 #define XDP_USE_NEED_WAKEUP (1 << 3)
 
+/* Flags for xsk_umem_config flags */
+#define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0)
+
 struct sockaddr_xdp {
 	__u16 sxdp_family;
 	__u16 sxdp_flags;
@@ -66,6 +69,7 @@ struct xdp_umem_reg {
 	__u64 len; /* Length of packet data area */
 	__u32 chunk_size;
 	__u32 headroom;
+	__u32 flags;
 };
 
 struct xdp_statistics {
@@ -87,6 +91,11 @@ struct xdp_options {
 #define XDP_UMEM_PGOFF_FILL_RING	0x100000000ULL
 #define XDP_UMEM_PGOFF_COMPLETION_RING	0x180000000ULL
 
+/* Masks for unaligned chunks mode */
+#define XSK_UNALIGNED_BUF_OFFSET_SHIFT 48
+#define XSK_UNALIGNED_BUF_ADDR_MASK \
+	((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1)
+
 /* Rx/Tx descriptor */
 struct xdp_desc {
 	__u64 addr;
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 2d65779282a1..e997b263a0dd 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -340,6 +340,7 @@ static int xdp_umem_account_pages(struct xdp_umem *umem)
 
 static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 {
+	bool unaligned_chunks = mr->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG;
 	u32 chunk_size = mr->chunk_size, headroom = mr->headroom;
 	unsigned int chunks, chunks_per_page;
 	u64 addr = mr->addr, size = mr->len;
@@ -355,7 +356,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 		return -EINVAL;
 	}
 
-	if (!is_power_of_2(chunk_size))
+	if (mr->flags & ~(XDP_UMEM_UNALIGNED_CHUNK_FLAG |
+			XDP_UMEM_USES_NEED_WAKEUP))
+		return -EINVAL;
+
+	if (!unaligned_chunks && !is_power_of_2(chunk_size))
 		return -EINVAL;
 
 	if (!PAGE_ALIGNED(addr)) {
@@ -372,9 +377,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 	if (chunks == 0)
 		return -EINVAL;
 
-	chunks_per_page = PAGE_SIZE / chunk_size;
-	if (chunks < chunks_per_page || chunks % chunks_per_page)
-		return -EINVAL;
+	if (!unaligned_chunks) {
+		chunks_per_page = PAGE_SIZE / chunk_size;
+		if (chunks < chunks_per_page || chunks % chunks_per_page)
+			return -EINVAL;
+	}
 
 	headroom = ALIGN(headroom, 64);
 
@@ -383,13 +390,15 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 		return -EINVAL;
 
 	umem->address = (unsigned long)addr;
-	umem->chunk_mask = ~((u64)chunk_size - 1);
+	umem->chunk_mask = unaligned_chunks ? XSK_UNALIGNED_BUF_ADDR_MASK
+					    : ~((u64)chunk_size - 1);
 	umem->size = size;
 	umem->headroom = headroom;
 	umem->chunk_size_nohr = chunk_size - headroom;
 	umem->npgs = size / PAGE_SIZE;
 	umem->pgs = NULL;
 	umem->user = NULL;
+	umem->flags = mr->flags;
 	INIT_LIST_HEAD(&umem->xsk_list);
 	spin_lock_init(&umem->xsk_list_lock);
 
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index ee4428a892fa..907e5f12338f 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -45,7 +45,7 @@ EXPORT_SYMBOL(xsk_umem_has_addrs);
 
 u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
 {
-	return xskq_peek_addr(umem->fq, addr);
+	return xskq_peek_addr(umem->fq, addr, umem);
 }
 EXPORT_SYMBOL(xsk_umem_peek_addr);
 
@@ -115,21 +115,43 @@ bool xsk_umem_uses_need_wakeup(struct xdp_umem *umem)
 }
 EXPORT_SYMBOL(xsk_umem_uses_need_wakeup);
 
+/* If a buffer crosses a page boundary, we need to do 2 memcpy's, one for
+ * each page. This is only required in copy mode.
+ */
+static void __xsk_rcv_memcpy(struct xdp_umem *umem, u64 addr, void *from_buf,
+			     u32 len, u32 metalen)
+{
+	void *to_buf = xdp_umem_get_data(umem, addr);
+
+	addr = xsk_umem_add_offset_to_addr(addr);
+	if (xskq_crosses_non_contig_pg(umem, addr, len + metalen)) {
+		void *next_pg_addr = umem->pages[(addr >> PAGE_SHIFT) + 1].addr;
+		u64 page_start = addr & ~(PAGE_SIZE - 1);
+		u64 first_len = PAGE_SIZE - (addr - page_start);
+
+		memcpy(to_buf, from_buf, first_len + metalen);
+		memcpy(next_pg_addr, from_buf + first_len, len - first_len);
+
+		return;
+	}
+
+	memcpy(to_buf, from_buf, len + metalen);
+}
+
 static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 {
-	void *to_buf, *from_buf;
+	u64 offset = xs->umem->headroom;
+	u64 addr, memcpy_addr;
+	void *from_buf;
 	u32 metalen;
-	u64 addr;
 	int err;
 
-	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
+	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
 	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
 		xs->rx_dropped++;
 		return -ENOSPC;
 	}
 
-	addr += xs->umem->headroom;
-
 	if (unlikely(xdp_data_meta_unsupported(xdp))) {
 		from_buf = xdp->data;
 		metalen = 0;
@@ -138,9 +160,11 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 		metalen = xdp->data - xdp->data_meta;
 	}
 
-	to_buf = xdp_umem_get_data(xs->umem, addr);
-	memcpy(to_buf, from_buf, len + metalen);
-	addr += metalen;
+	memcpy_addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
+	__xsk_rcv_memcpy(xs->umem, memcpy_addr, from_buf, len, metalen);
+
+	offset += metalen;
+	addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
 	err = xskq_produce_batch_desc(xs->rx, addr, len);
 	if (!err) {
 		xskq_discard_addr(xs->umem->fq);
@@ -185,6 +209,7 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
 	u32 metalen = xdp->data - xdp->data_meta;
 	u32 len = xdp->data_end - xdp->data;
+	u64 offset = xs->umem->headroom;
 	void *buffer;
 	u64 addr;
 	int err;
@@ -196,17 +221,17 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 		goto out_unlock;
 	}
 
-	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
+	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
 	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
 		err = -ENOSPC;
 		goto out_drop;
 	}
 
-	addr += xs->umem->headroom;
-
-	buffer = xdp_umem_get_data(xs->umem, addr);
+	buffer = xdp_umem_get_data(xs->umem, addr + offset);
 	memcpy(buffer, xdp->data_meta, len + metalen);
-	addr += metalen;
+	offset += metalen;
+
+	addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
 	err = xskq_produce_batch_desc(xs->rx, addr, len);
 	if (err)
 		goto out_drop;
@@ -250,7 +275,7 @@ bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc)
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(xs, &umem->xsk_list, list) {
-		if (!xskq_peek_desc(xs->tx, desc))
+		if (!xskq_peek_desc(xs->tx, desc, umem))
 			continue;
 
 		if (xskq_produce_addr_lazy(umem->cq, desc->addr))
@@ -304,7 +329,7 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
 	if (xs->queue_id >= xs->dev->real_num_tx_queues)
 		goto out;
 
-	while (xskq_peek_desc(xs->tx, &desc)) {
+	while (xskq_peek_desc(xs->tx, &desc, xs->umem)) {
 		char *buffer;
 		u64 addr;
 		u32 len;
@@ -333,7 +358,7 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
 		skb->dev = xs->dev;
 		skb->priority = sk->sk_priority;
 		skb->mark = sk->sk_mark;
-		skb_shinfo(skb)->destructor_arg = (void *)(long)addr;
+		skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
 		skb->destructor = xsk_destruct_skb;
 
 		err = dev_direct_xmit(skb, xs->queue_id);
@@ -526,6 +551,24 @@ static struct socket *xsk_lookup_xsk_from_fd(int fd)
 	return sock;
 }
 
+/* Check if umem pages are contiguous.
+ * If zero-copy mode, use the DMA address to do the page contiguity check
+ * For all other modes we use addr (kernel virtual address)
+ * Store the result in the low bits of addr.
+ */
+static void xsk_check_page_contiguity(struct xdp_umem *umem, u32 flags)
+{
+	struct xdp_umem_page *pgs = umem->pages;
+	int i, is_contig;
+
+	for (i = 0; i < umem->npgs - 1; i++) {
+		is_contig = (flags & XDP_ZEROCOPY) ?
+			(pgs[i].dma + PAGE_SIZE == pgs[i + 1].dma) :
+			(pgs[i].addr + PAGE_SIZE == pgs[i + 1].addr);
+		pgs[i].addr += is_contig << XSK_NEXT_PG_CONTIG_SHIFT;
+	}
+}
+
 static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 {
 	struct sockaddr_xdp *sxdp = (struct sockaddr_xdp *)addr;
@@ -616,6 +659,8 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 		err = xdp_umem_assign_dev(xs->umem, dev, qid, flags);
 		if (err)
 			goto out_unlock;
+
+		xsk_check_page_contiguity(xs->umem, flags);
 	}
 
 	xs->dev = dev;
@@ -636,6 +681,13 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 	return err;
 }
 
+struct xdp_umem_reg_v1 {
+	__u64 addr; /* Start of packet data area */
+	__u64 len; /* Length of packet data area */
+	__u32 chunk_size;
+	__u32 headroom;
+};
+
 static int xsk_setsockopt(struct socket *sock, int level, int optname,
 			  char __user *optval, unsigned int optlen)
 {
@@ -673,10 +725,16 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname,
 	}
 	case XDP_UMEM_REG:
 	{
-		struct xdp_umem_reg mr;
+		size_t mr_size = sizeof(struct xdp_umem_reg);
+		struct xdp_umem_reg mr = {};
 		struct xdp_umem *umem;
 
-		if (copy_from_user(&mr, optval, sizeof(mr)))
+		if (optlen < sizeof(struct xdp_umem_reg_v1))
+			return -EINVAL;
+		else if (optlen < sizeof(mr))
+			mr_size = sizeof(struct xdp_umem_reg_v1);
+
+		if (copy_from_user(&mr, optval, mr_size))
 			return -EFAULT;
 
 		mutex_lock(&xs->mutex);
diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c
index d5e06c8e0cbf..9986a759fe06 100644
--- a/net/xdp/xsk_diag.c
+++ b/net/xdp/xsk_diag.c
@@ -56,7 +56,7 @@ static int xsk_diag_put_umem(const struct xdp_sock *xs, struct sk_buff *nlskb)
 	du.id = umem->id;
 	du.size = umem->size;
 	du.num_pages = umem->npgs;
-	du.chunk_size = (__u32)(~umem->chunk_mask + 1);
+	du.chunk_size = umem->chunk_size_nohr + umem->headroom;
 	du.headroom = umem->headroom;
 	du.ifindex = umem->dev ? umem->dev->ifindex : 0;
 	du.queue_id = umem->queue_id;
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index dd9e985c2461..6c67c9d0294f 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -134,6 +134,17 @@ static inline bool xskq_has_addrs(struct xsk_queue *q, u32 cnt)
 
 /* UMEM queue */
 
+static inline bool xskq_crosses_non_contig_pg(struct xdp_umem *umem, u64 addr,
+					      u64 length)
+{
+	bool cross_pg = (addr & (PAGE_SIZE - 1)) + length > PAGE_SIZE;
+	bool next_pg_contig =
+		(unsigned long)umem->pages[(addr >> PAGE_SHIFT)].addr &
+			XSK_NEXT_PG_CONTIG_MASK;
+
+	return cross_pg && !next_pg_contig;
+}
+
 static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
 {
 	if (addr >= q->size) {
@@ -144,23 +155,49 @@ static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
 	return true;
 }
 
-static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr)
+static inline bool xskq_is_valid_addr_unaligned(struct xsk_queue *q, u64 addr,
+						u64 length,
+						struct xdp_umem *umem)
+{
+	addr = xsk_umem_add_offset_to_addr(addr);
+	if (addr >= q->size ||
+	    xskq_crosses_non_contig_pg(umem, addr, length)) {
+		q->invalid_descs++;
+		return false;
+	}
+
+	return true;
+}
+
+static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr,
+				      struct xdp_umem *umem)
 {
 	while (q->cons_tail != q->cons_head) {
 		struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
 		unsigned int idx = q->cons_tail & q->ring_mask;
 
 		*addr = READ_ONCE(ring->desc[idx]) & q->chunk_mask;
+
+		if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG) {
+			if (xskq_is_valid_addr_unaligned(q, *addr,
+							 umem->chunk_size_nohr,
+							 umem))
+				return addr;
+			goto out;
+		}
+
 		if (xskq_is_valid_addr(q, *addr))
 			return addr;
 
+out:
 		q->cons_tail++;
 	}
 
 	return NULL;
 }
 
-static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
+static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr,
+				  struct xdp_umem *umem)
 {
 	if (q->cons_tail == q->cons_head) {
 		smp_mb(); /* D, matches A */
@@ -171,7 +208,7 @@ static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
 		smp_rmb();
 	}
 
-	return xskq_validate_addr(q, addr);
+	return xskq_validate_addr(q, addr, umem);
 }
 
 static inline void xskq_discard_addr(struct xsk_queue *q)
@@ -230,8 +267,21 @@ static inline int xskq_reserve_addr(struct xsk_queue *q)
 
 /* Rx/Tx queue */
 
-static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
+static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d,
+				      struct xdp_umem *umem)
 {
+	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG) {
+		if (!xskq_is_valid_addr_unaligned(q, d->addr, d->len, umem))
+			return false;
+
+		if (d->len > umem->chunk_size_nohr || d->options) {
+			q->invalid_descs++;
+			return false;
+		}
+
+		return true;
+	}
+
 	if (!xskq_is_valid_addr(q, d->addr))
 		return false;
 
@@ -245,14 +295,15 @@ static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
 }
 
 static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
-						  struct xdp_desc *desc)
+						  struct xdp_desc *desc,
+						  struct xdp_umem *umem)
 {
 	while (q->cons_tail != q->cons_head) {
 		struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
 		unsigned int idx = q->cons_tail & q->ring_mask;
 
 		*desc = READ_ONCE(ring->desc[idx]);
-		if (xskq_is_valid_desc(q, desc))
+		if (xskq_is_valid_desc(q, desc, umem))
 			return desc;
 
 		q->cons_tail++;
@@ -262,7 +313,8 @@ static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
 }
 
 static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
-					      struct xdp_desc *desc)
+					      struct xdp_desc *desc,
+					      struct xdp_umem *umem)
 {
 	if (q->cons_tail == q->cons_head) {
 		smp_mb(); /* D, matches A */
@@ -273,7 +325,7 @@ static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
 		smp_rmb(); /* C, matches B */
 	}
 
-	return xskq_validate_desc(q, desc);
+	return xskq_validate_desc(q, desc, umem);
 }
 
 static inline void xskq_discard_desc(struct xsk_queue *q)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v5 03/11] xsk: add support to allow unaligned chunk placement
@ 2019-08-22  1:44           ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-22  1:44 UTC (permalink / raw)
  To: intel-wired-lan

Currently, addresses are chunk size aligned. This means we are very
restricted in terms of where we can place a chunk within the umem. For
example, if we have a chunk size of 2k, then our chunks can only be placed
at 0, 2k, 4k, 6k, 8k... and so on (i.e. every 2k starting from 0).

This patch introduces the ability to use unaligned chunks. With these
changes, we are no longer bound to placing chunks at 2k (or whatever the
chunk size is) intervals. Since we are no longer dealing with
aligned chunks, they can now cross page boundaries. Checks for page
contiguity have been added in order to keep track of which pages are
followed by a physically contiguous page.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

---
v2:
  - Add checks for the flags coming from userspace
  - Fix how we get chunk_size in xsk_diag.c
  - Add defines for masking the new descriptor format
  - Modified the rx functions to use new descriptor format
  - Modified the tx functions to use new descriptor format

v3:
  - Add helper function to do address/offset masking/addition

v4:
  - fixed page_start calculation in __xsk_rcv_memcpy().
  - move offset handling to the xdp_umem_get_* functions
  - modified the len field in xdp_umem_reg struct. We now use 16 bits from
    this for the flags field.
  - removed next_pg_contig field from xdp_umem_page struct. Using low 12
    bits of addr to store flags instead.
  - other minor changes based on review comments

v5:
  - Added accessors for getting addr and offset
  - Added helper function to add offset to addr
  - Fixed offset handling in xsk_rcv
  - Removed bitfields from xdp_umem_reg
  - Added struct size checking for xdp_umem_reg in xsk_setsockopt to handle
    different versions of the struct.
  - fix conflicts after 'bpf-af-xdp-wakeup' was merged.
---
 include/net/xdp_sock.h      | 75 +++++++++++++++++++++++++++--
 include/uapi/linux/if_xdp.h |  9 ++++
 net/xdp/xdp_umem.c          | 19 ++++++--
 net/xdp/xsk.c               | 96 +++++++++++++++++++++++++++++--------
 net/xdp/xsk_diag.c          |  2 +-
 net/xdp/xsk_queue.h         | 68 ++++++++++++++++++++++----
 6 files changed, 232 insertions(+), 37 deletions(-)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index f023b9940d64..c9398ce7960f 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -16,6 +16,13 @@
 struct net_device;
 struct xsk_queue;
 
+/* Masks for xdp_umem_page flags.
+ * The low 12-bits of the addr will be 0 since this is the page address, so we
+ * can use them for flags.
+ */
+#define XSK_NEXT_PG_CONTIG_SHIFT 0
+#define XSK_NEXT_PG_CONTIG_MASK (1ULL << XSK_NEXT_PG_CONTIG_SHIFT)
+
 struct xdp_umem_page {
 	void *addr;
 	dma_addr_t dma;
@@ -27,8 +34,12 @@ struct xdp_umem_fq_reuse {
 	u64 handles[];
 };
 
-/* Flags for the umem flags field. */
-#define XDP_UMEM_USES_NEED_WAKEUP (1 << 0)
+/* Flags for the umem flags field.
+ *
+ * The NEED_WAKEUP flag is 1 due to the reuse of the flags field for public
+ * flags. See include/uapi/linux/if_xdp.h.
+ */
+#define XDP_UMEM_USES_NEED_WAKEUP (1 << 1)
 
 struct xdp_umem {
 	struct xsk_queue *fq;
@@ -124,14 +135,36 @@ void xsk_map_try_sock_delete(struct xsk_map *map, struct xdp_sock *xs,
 int xsk_map_inc(struct xsk_map *map);
 void xsk_map_put(struct xsk_map *map);
 
+static inline u64 xsk_umem_extract_addr(u64 addr)
+{
+	return addr & XSK_UNALIGNED_BUF_ADDR_MASK;
+}
+
+static inline u64 xsk_umem_extract_offset(u64 addr)
+{
+	return addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+}
+
+static inline u64 xsk_umem_add_offset_to_addr(u64 addr)
+{
+	return xsk_umem_extract_addr(addr) + xsk_umem_extract_offset(addr);
+}
+
 static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr)
 {
-	return umem->pages[addr >> PAGE_SHIFT].addr + (addr & (PAGE_SIZE - 1));
+	unsigned long page_addr;
+
+	addr = xsk_umem_add_offset_to_addr(addr);
+	page_addr = (unsigned long)umem->pages[addr >> PAGE_SHIFT].addr;
+
+	return (char *)(page_addr & PAGE_MASK) + (addr & ~PAGE_MASK);
 }
 
 static inline dma_addr_t xdp_umem_get_dma(struct xdp_umem *umem, u64 addr)
 {
-	return umem->pages[addr >> PAGE_SHIFT].dma + (addr & (PAGE_SIZE - 1));
+	addr = xsk_umem_add_offset_to_addr(addr);
+
+	return umem->pages[addr >> PAGE_SHIFT].dma + (addr & ~PAGE_MASK);
 }
 
 /* Reuse-queue aware version of FILL queue helpers */
@@ -172,6 +205,19 @@ static inline void xsk_umem_fq_reuse(struct xdp_umem *umem, u64 addr)
 
 	rq->handles[rq->length++] = addr;
 }
+
+/* Handle the offset appropriately depending on aligned or unaligned mode.
+ * For unaligned mode, we store the offset in the upper 16-bits of the address.
+ * For aligned mode, we simply add the offset to the address.
+ */
+static inline u64 xsk_umem_adjust_offset(struct xdp_umem *umem, u64 address,
+					 u64 offset)
+{
+	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG)
+		return address + (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
+	else
+		return address + offset;
+}
 #else
 static inline int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
@@ -241,6 +287,21 @@ static inline struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
 	return NULL;
 }
 
+static inline u64 xsk_umem_extract_addr(u64 addr)
+{
+	return 0;
+}
+
+static inline u64 xsk_umem_extract_offset(u64 addr)
+{
+	return 0;
+}
+
+static inline u64 xsk_umem_add_offset_to_addr(u64 addr)
+{
+	return 0;
+}
+
 static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr)
 {
 	return NULL;
@@ -290,6 +351,12 @@ static inline bool xsk_umem_uses_need_wakeup(struct xdp_umem *umem)
 	return false;
 }
 
+static inline u64 xsk_umem_adjust_offset(struct xdp_umem *umem, u64 handle,
+					 u64 offset)
+{
+	return 0;
+}
+
 #endif /* CONFIG_XDP_SOCKETS */
 
 #endif /* _LINUX_XDP_SOCK_H */
diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index 62b80d57b72a..be328c59389d 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -26,6 +26,9 @@
  */
 #define XDP_USE_NEED_WAKEUP (1 << 3)
 
+/* Flags for xsk_umem_config flags */
+#define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0)
+
 struct sockaddr_xdp {
 	__u16 sxdp_family;
 	__u16 sxdp_flags;
@@ -66,6 +69,7 @@ struct xdp_umem_reg {
 	__u64 len; /* Length of packet data area */
 	__u32 chunk_size;
 	__u32 headroom;
+	__u32 flags;
 };
 
 struct xdp_statistics {
@@ -87,6 +91,11 @@ struct xdp_options {
 #define XDP_UMEM_PGOFF_FILL_RING	0x100000000ULL
 #define XDP_UMEM_PGOFF_COMPLETION_RING	0x180000000ULL
 
+/* Masks for unaligned chunks mode */
+#define XSK_UNALIGNED_BUF_OFFSET_SHIFT 48
+#define XSK_UNALIGNED_BUF_ADDR_MASK \
+	((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1)
+
 /* Rx/Tx descriptor */
 struct xdp_desc {
 	__u64 addr;
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 2d65779282a1..e997b263a0dd 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -340,6 +340,7 @@ static int xdp_umem_account_pages(struct xdp_umem *umem)
 
 static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 {
+	bool unaligned_chunks = mr->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG;
 	u32 chunk_size = mr->chunk_size, headroom = mr->headroom;
 	unsigned int chunks, chunks_per_page;
 	u64 addr = mr->addr, size = mr->len;
@@ -355,7 +356,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 		return -EINVAL;
 	}
 
-	if (!is_power_of_2(chunk_size))
+	if (mr->flags & ~(XDP_UMEM_UNALIGNED_CHUNK_FLAG |
+			XDP_UMEM_USES_NEED_WAKEUP))
+		return -EINVAL;
+
+	if (!unaligned_chunks && !is_power_of_2(chunk_size))
 		return -EINVAL;
 
 	if (!PAGE_ALIGNED(addr)) {
@@ -372,9 +377,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 	if (chunks == 0)
 		return -EINVAL;
 
-	chunks_per_page = PAGE_SIZE / chunk_size;
-	if (chunks < chunks_per_page || chunks % chunks_per_page)
-		return -EINVAL;
+	if (!unaligned_chunks) {
+		chunks_per_page = PAGE_SIZE / chunk_size;
+		if (chunks < chunks_per_page || chunks % chunks_per_page)
+			return -EINVAL;
+	}
 
 	headroom = ALIGN(headroom, 64);
 
@@ -383,13 +390,15 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 		return -EINVAL;
 
 	umem->address = (unsigned long)addr;
-	umem->chunk_mask = ~((u64)chunk_size - 1);
+	umem->chunk_mask = unaligned_chunks ? XSK_UNALIGNED_BUF_ADDR_MASK
+					    : ~((u64)chunk_size - 1);
 	umem->size = size;
 	umem->headroom = headroom;
 	umem->chunk_size_nohr = chunk_size - headroom;
 	umem->npgs = size / PAGE_SIZE;
 	umem->pgs = NULL;
 	umem->user = NULL;
+	umem->flags = mr->flags;
 	INIT_LIST_HEAD(&umem->xsk_list);
 	spin_lock_init(&umem->xsk_list_lock);
 
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index ee4428a892fa..907e5f12338f 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -45,7 +45,7 @@ EXPORT_SYMBOL(xsk_umem_has_addrs);
 
 u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
 {
-	return xskq_peek_addr(umem->fq, addr);
+	return xskq_peek_addr(umem->fq, addr, umem);
 }
 EXPORT_SYMBOL(xsk_umem_peek_addr);
 
@@ -115,21 +115,43 @@ bool xsk_umem_uses_need_wakeup(struct xdp_umem *umem)
 }
 EXPORT_SYMBOL(xsk_umem_uses_need_wakeup);
 
+/* If a buffer crosses a page boundary, we need to do 2 memcpy's, one for
+ * each page. This is only required in copy mode.
+ */
+static void __xsk_rcv_memcpy(struct xdp_umem *umem, u64 addr, void *from_buf,
+			     u32 len, u32 metalen)
+{
+	void *to_buf = xdp_umem_get_data(umem, addr);
+
+	addr = xsk_umem_add_offset_to_addr(addr);
+	if (xskq_crosses_non_contig_pg(umem, addr, len + metalen)) {
+		void *next_pg_addr = umem->pages[(addr >> PAGE_SHIFT) + 1].addr;
+		u64 page_start = addr & ~(PAGE_SIZE - 1);
+		u64 first_len = PAGE_SIZE - (addr - page_start);
+
+		memcpy(to_buf, from_buf, first_len + metalen);
+		memcpy(next_pg_addr, from_buf + first_len, len - first_len);
+
+		return;
+	}
+
+	memcpy(to_buf, from_buf, len + metalen);
+}
+
 static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 {
-	void *to_buf, *from_buf;
+	u64 offset = xs->umem->headroom;
+	u64 addr, memcpy_addr;
+	void *from_buf;
 	u32 metalen;
-	u64 addr;
 	int err;
 
-	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
+	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
 	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
 		xs->rx_dropped++;
 		return -ENOSPC;
 	}
 
-	addr += xs->umem->headroom;
-
 	if (unlikely(xdp_data_meta_unsupported(xdp))) {
 		from_buf = xdp->data;
 		metalen = 0;
@@ -138,9 +160,11 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 		metalen = xdp->data - xdp->data_meta;
 	}
 
-	to_buf = xdp_umem_get_data(xs->umem, addr);
-	memcpy(to_buf, from_buf, len + metalen);
-	addr += metalen;
+	memcpy_addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
+	__xsk_rcv_memcpy(xs->umem, memcpy_addr, from_buf, len, metalen);
+
+	offset += metalen;
+	addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
 	err = xskq_produce_batch_desc(xs->rx, addr, len);
 	if (!err) {
 		xskq_discard_addr(xs->umem->fq);
@@ -185,6 +209,7 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
 	u32 metalen = xdp->data - xdp->data_meta;
 	u32 len = xdp->data_end - xdp->data;
+	u64 offset = xs->umem->headroom;
 	void *buffer;
 	u64 addr;
 	int err;
@@ -196,17 +221,17 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 		goto out_unlock;
 	}
 
-	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
+	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
 	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
 		err = -ENOSPC;
 		goto out_drop;
 	}
 
-	addr += xs->umem->headroom;
-
-	buffer = xdp_umem_get_data(xs->umem, addr);
+	buffer = xdp_umem_get_data(xs->umem, addr + offset);
 	memcpy(buffer, xdp->data_meta, len + metalen);
-	addr += metalen;
+	offset += metalen;
+
+	addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
 	err = xskq_produce_batch_desc(xs->rx, addr, len);
 	if (err)
 		goto out_drop;
@@ -250,7 +275,7 @@ bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc)
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(xs, &umem->xsk_list, list) {
-		if (!xskq_peek_desc(xs->tx, desc))
+		if (!xskq_peek_desc(xs->tx, desc, umem))
 			continue;
 
 		if (xskq_produce_addr_lazy(umem->cq, desc->addr))
@@ -304,7 +329,7 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
 	if (xs->queue_id >= xs->dev->real_num_tx_queues)
 		goto out;
 
-	while (xskq_peek_desc(xs->tx, &desc)) {
+	while (xskq_peek_desc(xs->tx, &desc, xs->umem)) {
 		char *buffer;
 		u64 addr;
 		u32 len;
@@ -333,7 +358,7 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
 		skb->dev = xs->dev;
 		skb->priority = sk->sk_priority;
 		skb->mark = sk->sk_mark;
-		skb_shinfo(skb)->destructor_arg = (void *)(long)addr;
+		skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
 		skb->destructor = xsk_destruct_skb;
 
 		err = dev_direct_xmit(skb, xs->queue_id);
@@ -526,6 +551,24 @@ static struct socket *xsk_lookup_xsk_from_fd(int fd)
 	return sock;
 }
 
+/* Check if umem pages are contiguous.
+ * If zero-copy mode, use the DMA address to do the page contiguity check
+ * For all other modes we use addr (kernel virtual address)
+ * Store the result in the low bits of addr.
+ */
+static void xsk_check_page_contiguity(struct xdp_umem *umem, u32 flags)
+{
+	struct xdp_umem_page *pgs = umem->pages;
+	int i, is_contig;
+
+	for (i = 0; i < umem->npgs - 1; i++) {
+		is_contig = (flags & XDP_ZEROCOPY) ?
+			(pgs[i].dma + PAGE_SIZE == pgs[i + 1].dma) :
+			(pgs[i].addr + PAGE_SIZE == pgs[i + 1].addr);
+		pgs[i].addr += is_contig << XSK_NEXT_PG_CONTIG_SHIFT;
+	}
+}
+
 static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 {
 	struct sockaddr_xdp *sxdp = (struct sockaddr_xdp *)addr;
@@ -616,6 +659,8 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 		err = xdp_umem_assign_dev(xs->umem, dev, qid, flags);
 		if (err)
 			goto out_unlock;
+
+		xsk_check_page_contiguity(xs->umem, flags);
 	}
 
 	xs->dev = dev;
@@ -636,6 +681,13 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 	return err;
 }
 
+struct xdp_umem_reg_v1 {
+	__u64 addr; /* Start of packet data area */
+	__u64 len; /* Length of packet data area */
+	__u32 chunk_size;
+	__u32 headroom;
+};
+
 static int xsk_setsockopt(struct socket *sock, int level, int optname,
 			  char __user *optval, unsigned int optlen)
 {
@@ -673,10 +725,16 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname,
 	}
 	case XDP_UMEM_REG:
 	{
-		struct xdp_umem_reg mr;
+		size_t mr_size = sizeof(struct xdp_umem_reg);
+		struct xdp_umem_reg mr = {};
 		struct xdp_umem *umem;
 
-		if (copy_from_user(&mr, optval, sizeof(mr)))
+		if (optlen < sizeof(struct xdp_umem_reg_v1))
+			return -EINVAL;
+		else if (optlen < sizeof(mr))
+			mr_size = sizeof(struct xdp_umem_reg_v1);
+
+		if (copy_from_user(&mr, optval, mr_size))
 			return -EFAULT;
 
 		mutex_lock(&xs->mutex);
diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c
index d5e06c8e0cbf..9986a759fe06 100644
--- a/net/xdp/xsk_diag.c
+++ b/net/xdp/xsk_diag.c
@@ -56,7 +56,7 @@ static int xsk_diag_put_umem(const struct xdp_sock *xs, struct sk_buff *nlskb)
 	du.id = umem->id;
 	du.size = umem->size;
 	du.num_pages = umem->npgs;
-	du.chunk_size = (__u32)(~umem->chunk_mask + 1);
+	du.chunk_size = umem->chunk_size_nohr + umem->headroom;
 	du.headroom = umem->headroom;
 	du.ifindex = umem->dev ? umem->dev->ifindex : 0;
 	du.queue_id = umem->queue_id;
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index dd9e985c2461..6c67c9d0294f 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -134,6 +134,17 @@ static inline bool xskq_has_addrs(struct xsk_queue *q, u32 cnt)
 
 /* UMEM queue */
 
+static inline bool xskq_crosses_non_contig_pg(struct xdp_umem *umem, u64 addr,
+					      u64 length)
+{
+	bool cross_pg = (addr & (PAGE_SIZE - 1)) + length > PAGE_SIZE;
+	bool next_pg_contig =
+		(unsigned long)umem->pages[(addr >> PAGE_SHIFT)].addr &
+			XSK_NEXT_PG_CONTIG_MASK;
+
+	return cross_pg && !next_pg_contig;
+}
+
 static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
 {
 	if (addr >= q->size) {
@@ -144,23 +155,49 @@ static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
 	return true;
 }
 
-static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr)
+static inline bool xskq_is_valid_addr_unaligned(struct xsk_queue *q, u64 addr,
+						u64 length,
+						struct xdp_umem *umem)
+{
+	addr = xsk_umem_add_offset_to_addr(addr);
+	if (addr >= q->size ||
+	    xskq_crosses_non_contig_pg(umem, addr, length)) {
+		q->invalid_descs++;
+		return false;
+	}
+
+	return true;
+}
+
+static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr,
+				      struct xdp_umem *umem)
 {
 	while (q->cons_tail != q->cons_head) {
 		struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
 		unsigned int idx = q->cons_tail & q->ring_mask;
 
 		*addr = READ_ONCE(ring->desc[idx]) & q->chunk_mask;
+
+		if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG) {
+			if (xskq_is_valid_addr_unaligned(q, *addr,
+							 umem->chunk_size_nohr,
+							 umem))
+				return addr;
+			goto out;
+		}
+
 		if (xskq_is_valid_addr(q, *addr))
 			return addr;
 
+out:
 		q->cons_tail++;
 	}
 
 	return NULL;
 }
 
-static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
+static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr,
+				  struct xdp_umem *umem)
 {
 	if (q->cons_tail == q->cons_head) {
 		smp_mb(); /* D, matches A */
@@ -171,7 +208,7 @@ static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
 		smp_rmb();
 	}
 
-	return xskq_validate_addr(q, addr);
+	return xskq_validate_addr(q, addr, umem);
 }
 
 static inline void xskq_discard_addr(struct xsk_queue *q)
@@ -230,8 +267,21 @@ static inline int xskq_reserve_addr(struct xsk_queue *q)
 
 /* Rx/Tx queue */
 
-static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
+static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d,
+				      struct xdp_umem *umem)
 {
+	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG) {
+		if (!xskq_is_valid_addr_unaligned(q, d->addr, d->len, umem))
+			return false;
+
+		if (d->len > umem->chunk_size_nohr || d->options) {
+			q->invalid_descs++;
+			return false;
+		}
+
+		return true;
+	}
+
 	if (!xskq_is_valid_addr(q, d->addr))
 		return false;
 
@@ -245,14 +295,15 @@ static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
 }
 
 static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
-						  struct xdp_desc *desc)
+						  struct xdp_desc *desc,
+						  struct xdp_umem *umem)
 {
 	while (q->cons_tail != q->cons_head) {
 		struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
 		unsigned int idx = q->cons_tail & q->ring_mask;
 
 		*desc = READ_ONCE(ring->desc[idx]);
-		if (xskq_is_valid_desc(q, desc))
+		if (xskq_is_valid_desc(q, desc, umem))
 			return desc;
 
 		q->cons_tail++;
@@ -262,7 +313,8 @@ static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
 }
 
 static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
-					      struct xdp_desc *desc)
+					      struct xdp_desc *desc,
+					      struct xdp_umem *umem)
 {
 	if (q->cons_tail == q->cons_head) {
 		smp_mb(); /* D, matches A */
@@ -273,7 +325,7 @@ static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
 		smp_rmb(); /* C, matches B */
 	}
 
-	return xskq_validate_desc(q, desc);
+	return xskq_validate_desc(q, desc, umem);
 }
 
 static inline void xskq_discard_desc(struct xsk_queue *q)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v5 04/11] i40e: modify driver for handling offsets
  2019-08-22  1:44         ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-22  1:44           ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-22  1:44 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

With the addition of the unaligned chunks option, we need to make sure we
handle the offsets accordingly based on the mode we are currently running
in. This patch modifies the driver to appropriately mask the address for
each case.
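
As a rough illustration of the two cases (a minimal sketch with assumed
values, not part of this patch), the offset folded into the handle here is
umem->headroom plus the distance the XDP program moved xdp->data, and it is
applied differently per mode by xsk_umem_adjust_offset():

#include <linux/types.h>

/* Hypothetical standalone helpers mirroring xsk_umem_adjust_offset(). */
static inline __u64 adjust_offset_aligned(__u64 handle, __u64 offset)
{
	/* e.g. 0x3000 + 0x140 = 0x3140 */
	return handle + offset;
}

static inline __u64 adjust_offset_unaligned(__u64 handle, __u64 offset)
{
	/* e.g. 0x3000 | (0x140ULL << 48) = 0x0140000000003000 */
	return handle + (offset << 48);
}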

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>

---
v3:
  - Use new helper function for handling the offset
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 2d6e82f72f48..eaca6162a6e6 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -190,7 +190,9 @@ int i40e_xsk_umem_setup(struct i40e_vsi *vsi, struct xdp_umem *umem,
  **/
 static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 {
+	struct xdp_umem *umem = rx_ring->xsk_umem;
 	int err, result = I40E_XDP_PASS;
+	u64 offset = umem->headroom;
 	struct i40e_ring *xdp_ring;
 	struct bpf_prog *xdp_prog;
 	u32 act;
@@ -201,7 +203,10 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 	 */
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
 	act = bpf_prog_run_xdp(xdp_prog, xdp);
-	xdp->handle += xdp->data - xdp->data_hard_start;
+	offset += xdp->data - xdp->data_hard_start;
+
+	xdp->handle = xsk_umem_adjust_offset(umem, xdp->handle, offset);
+
 	switch (act) {
 	case XDP_PASS:
 		break;
@@ -262,7 +267,7 @@ static bool i40e_alloc_buffer_zc(struct i40e_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr(umem);
 	return true;
@@ -299,7 +304,7 @@ static bool i40e_alloc_buffer_slow_zc(struct i40e_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr_rq(umem);
 	return true;
@@ -464,7 +469,7 @@ void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
 	bi->addr = xdp_umem_get_data(rx_ring->xsk_umem, handle);
 	bi->addr += hr;
 
-	bi->handle = (u64)handle + rx_ring->xsk_umem->headroom;
+	bi->handle = (u64)handle;
 }
 
 /**
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v5 04/11] i40e: modify driver for handling offsets
@ 2019-08-22  1:44           ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-22  1:44 UTC (permalink / raw)
  To: intel-wired-lan

With the addition of the unaligned chunks option, we need to make sure we
handle the offsets accordingly based on the mode we are currently running
in. This patch modifies the driver to appropriately mask the address for
each case.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>

---
v3:
  - Use new helper function for handling the offset
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 2d6e82f72f48..eaca6162a6e6 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -190,7 +190,9 @@ int i40e_xsk_umem_setup(struct i40e_vsi *vsi, struct xdp_umem *umem,
  **/
 static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 {
+	struct xdp_umem *umem = rx_ring->xsk_umem;
 	int err, result = I40E_XDP_PASS;
+	u64 offset = umem->headroom;
 	struct i40e_ring *xdp_ring;
 	struct bpf_prog *xdp_prog;
 	u32 act;
@@ -201,7 +203,10 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 	 */
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
 	act = bpf_prog_run_xdp(xdp_prog, xdp);
-	xdp->handle += xdp->data - xdp->data_hard_start;
+	offset += xdp->data - xdp->data_hard_start;
+
+	xdp->handle = xsk_umem_adjust_offset(umem, xdp->handle, offset);
+
 	switch (act) {
 	case XDP_PASS:
 		break;
@@ -262,7 +267,7 @@ static bool i40e_alloc_buffer_zc(struct i40e_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr(umem);
 	return true;
@@ -299,7 +304,7 @@ static bool i40e_alloc_buffer_slow_zc(struct i40e_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr_rq(umem);
 	return true;
@@ -464,7 +469,7 @@ void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
 	bi->addr = xdp_umem_get_data(rx_ring->xsk_umem, handle);
 	bi->addr += hr;
 
-	bi->handle = (u64)handle + rx_ring->xsk_umem->headroom;
+	bi->handle = (u64)handle;
 }
 
 /**
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v5 05/11] ixgbe: modify driver for handling offsets
  2019-08-22  1:44         ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-22  1:44           ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-22  1:44 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

With the addition of the unaligned chunks option, we need to make sure we
handle the offsets accordingly based on the mode we are currently running
in. This patch modifies the driver to appropriately mask the address for
each case.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>

---
v3:
  - Use new helper function to handle offset
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index e1f743470ae9..17061c799f72 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -143,7 +143,9 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
 			    struct ixgbe_ring *rx_ring,
 			    struct xdp_buff *xdp)
 {
+	struct xdp_umem *umem = rx_ring->xsk_umem;
 	int err, result = IXGBE_XDP_PASS;
+	u64 offset = umem->headroom;
 	struct bpf_prog *xdp_prog;
 	struct xdp_frame *xdpf;
 	u32 act;
@@ -151,7 +153,10 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
 	rcu_read_lock();
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
 	act = bpf_prog_run_xdp(xdp_prog, xdp);
-	xdp->handle += xdp->data - xdp->data_hard_start;
+	offset += xdp->data - xdp->data_hard_start;
+
+	xdp->handle = xsk_umem_adjust_offset(umem, xdp->handle, offset);
+
 	switch (act) {
 	case XDP_PASS:
 		break;
@@ -243,7 +248,7 @@ void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
 	bi->addr = xdp_umem_get_data(rx_ring->xsk_umem, handle);
 	bi->addr += hr;
 
-	bi->handle = (u64)handle + rx_ring->xsk_umem->headroom;
+	bi->handle = (u64)handle;
 }
 
 static bool ixgbe_alloc_buffer_zc(struct ixgbe_ring *rx_ring,
@@ -269,7 +274,7 @@ static bool ixgbe_alloc_buffer_zc(struct ixgbe_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr(umem);
 	return true;
@@ -296,7 +301,7 @@ static bool ixgbe_alloc_buffer_slow_zc(struct ixgbe_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr_rq(umem);
 	return true;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v5 05/11] ixgbe: modify driver for handling offsets
@ 2019-08-22  1:44           ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-22  1:44 UTC (permalink / raw)
  To: intel-wired-lan

With the addition of the unaligned chunks option, we need to make sure we
handle the offsets accordingly based on the mode we are currently running
in. This patch modifies the driver to appropriately mask the address for
each case.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>

---
v3:
  - Use new helper function to handle offset
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index e1f743470ae9..17061c799f72 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -143,7 +143,9 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
 			    struct ixgbe_ring *rx_ring,
 			    struct xdp_buff *xdp)
 {
+	struct xdp_umem *umem = rx_ring->xsk_umem;
 	int err, result = IXGBE_XDP_PASS;
+	u64 offset = umem->headroom;
 	struct bpf_prog *xdp_prog;
 	struct xdp_frame *xdpf;
 	u32 act;
@@ -151,7 +153,10 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
 	rcu_read_lock();
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
 	act = bpf_prog_run_xdp(xdp_prog, xdp);
-	xdp->handle += xdp->data - xdp->data_hard_start;
+	offset += xdp->data - xdp->data_hard_start;
+
+	xdp->handle = xsk_umem_adjust_offset(umem, xdp->handle, offset);
+
 	switch (act) {
 	case XDP_PASS:
 		break;
@@ -243,7 +248,7 @@ void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
 	bi->addr = xdp_umem_get_data(rx_ring->xsk_umem, handle);
 	bi->addr += hr;
 
-	bi->handle = (u64)handle + rx_ring->xsk_umem->headroom;
+	bi->handle = (u64)handle;
 }
 
 static bool ixgbe_alloc_buffer_zc(struct ixgbe_ring *rx_ring,
@@ -269,7 +274,7 @@ static bool ixgbe_alloc_buffer_zc(struct ixgbe_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr(umem);
 	return true;
@@ -296,7 +301,7 @@ static bool ixgbe_alloc_buffer_slow_zc(struct ixgbe_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr_rq(umem);
 	return true;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v5 06/11] mlx5e: modify driver for handling offsets
  2019-08-22  1:44         ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-22  1:44           ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-22  1:44 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

With the addition of the unaligned chunks option, we need to make sure we
handle the offsets accordingly based on the mode we are currently running
in. This patch modifies the driver to appropriately mask the address for
each case.

Note: This patch only adds support for the new offset handling. An
additional patch that enables the unaligned chunks feature (makes mlx5e
accept arbitrary frame sizes) will be added later and will apply on top
of this patch.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>

---
v3:
  - Use new helper function to handle offset

v4:
  - fixed headroom addition to handle. Using xsk_umem_adjust_headroom()
    now.

v5:
  - Fixed typo: handle_offset -> adjust_offset
---
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c    | 8 ++++++--
 drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c | 3 ++-
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index 1ed5c33e022f..f049e0ac308a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -122,6 +122,7 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
 		      void *va, u16 *rx_headroom, u32 *len, bool xsk)
 {
 	struct bpf_prog *prog = READ_ONCE(rq->xdp_prog);
+	struct xdp_umem *umem = rq->umem;
 	struct xdp_buff xdp;
 	u32 act;
 	int err;
@@ -138,8 +139,11 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
 	xdp.rxq = &rq->xdp_rxq;
 
 	act = bpf_prog_run_xdp(prog, &xdp);
-	if (xsk)
-		xdp.handle += xdp.data - xdp.data_hard_start;
+	if (xsk) {
+		u64 off = xdp.data - xdp.data_hard_start;
+
+		xdp.handle = xsk_umem_adjust_offset(umem, xdp.handle, off);
+	}
 	switch (act) {
 	case XDP_PASS:
 		*rx_headroom = xdp.data - xdp.data_hard_start;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
index 6a55573ec8f2..7c49a66d28c9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
@@ -24,7 +24,8 @@ int mlx5e_xsk_page_alloc_umem(struct mlx5e_rq *rq,
 	if (!xsk_umem_peek_addr_rq(umem, &handle))
 		return -ENOMEM;
 
-	dma_info->xsk.handle = handle + rq->buff.umem_headroom;
+	dma_info->xsk.handle = xsk_umem_adjust_offset(umem, handle,
+						      rq->buff.umem_headroom);
 	dma_info->xsk.data = xdp_umem_get_data(umem, dma_info->xsk.handle);
 
 	/* No need to add headroom to the DMA address. In striding RQ case, we
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v5 06/11] mlx5e: modify driver for handling offsets
@ 2019-08-22  1:44           ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-22  1:44 UTC (permalink / raw)
  To: intel-wired-lan

With the addition of the unaligned chunks option, we need to make sure we
handle the offsets accordingly based on the mode we are currently running
in. This patch modifies the driver to appropriately mask the address for
each case.

Note: This patch only adds support for the new offset handling. An
additional patch that enables the unaligned chunks feature (makes mlx5e
accept arbitrary frame sizes) will be added later and will apply on top
of this patch.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>

---
v3:
  - Use new helper function to handle offset

v4:
  - fixed headroom addition to handle. Using xsk_umem_adjust_headroom()
    now.

v5:
  - Fixed typo: handle_offset -> adjust_offset
---
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c    | 8 ++++++--
 drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c | 3 ++-
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index 1ed5c33e022f..f049e0ac308a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -122,6 +122,7 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
 		      void *va, u16 *rx_headroom, u32 *len, bool xsk)
 {
 	struct bpf_prog *prog = READ_ONCE(rq->xdp_prog);
+	struct xdp_umem *umem = rq->umem;
 	struct xdp_buff xdp;
 	u32 act;
 	int err;
@@ -138,8 +139,11 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
 	xdp.rxq = &rq->xdp_rxq;
 
 	act = bpf_prog_run_xdp(prog, &xdp);
-	if (xsk)
-		xdp.handle += xdp.data - xdp.data_hard_start;
+	if (xsk) {
+		u64 off = xdp.data - xdp.data_hard_start;
+
+		xdp.handle = xsk_umem_adjust_offset(umem, xdp.handle, off);
+	}
 	switch (act) {
 	case XDP_PASS:
 		*rx_headroom = xdp.data - xdp.data_hard_start;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
index 6a55573ec8f2..7c49a66d28c9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
@@ -24,7 +24,8 @@ int mlx5e_xsk_page_alloc_umem(struct mlx5e_rq *rq,
 	if (!xsk_umem_peek_addr_rq(umem, &handle))
 		return -ENOMEM;
 
-	dma_info->xsk.handle = handle + rq->buff.umem_headroom;
+	dma_info->xsk.handle = xsk_umem_adjust_offset(umem, handle,
+						      rq->buff.umem_headroom);
 	dma_info->xsk.data = xdp_umem_get_data(umem, dma_info->xsk.handle);
 
 	/* No need to add headroom to the DMA address. In striding RQ case, we
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v5 07/11] libbpf: add flags to umem config
  2019-08-22  1:44         ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-22  1:44           ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-22  1:44 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

This patch adds a 'flags' field to the umem_config and umem_reg structs.
This will allow for more options to be added for configuring umems.

The first use for the flags field is to add a flag for unaligned chunks
mode. These flags can either be user-provided or filled with a default.

Since we change the size of the xsk_umem_config struct, we need to version
the ABI. This patch includes the ABI versioning for xsk_umem__create. The
Makefile was also updated to handle multiple function versions in
check-abi.
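
As an illustration of the new field, a minimal usage sketch follows. The
'bufs' pointer, its 'size' and the 3000-byte frame size are placeholders
only; the flag and the remaining defaults come from the headers changed
below:

	struct xsk_umem *umem;
	struct xsk_ring_prod fq;
	struct xsk_ring_cons cq;
	struct xsk_umem_config cfg = {
		.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
		.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
		.frame_size = 3000, /* non-power-of-two, only valid with the flag */
		.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM,
		.flags = XDP_UMEM_UNALIGNED_CHUNK_FLAG,
	};
	int ret;

	/* bufs/size describe a page-aligned, already mmap'd area */
	ret = xsk_umem__create(&umem, bufs, size, &fq, &cq, &cfg);
	/* ret is negative on failure, e.g. if the kernel rejects the config */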

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>

---
v2:
  - Removed the headroom check from this patch. It has moved to the
    previous patch.

v4:
  - Modified chunk flag define.

v5:
  - Added ABI versioning for xsk_umem__create().
  - Updated Makefile to handle multiple function versions.
  - Removed bitfields from xdp_umem_reg
  - Fix conflicts after 'bpf-af-xdp-wakeup' was merged
---
 tools/include/uapi/linux/if_xdp.h |  9 +++++++++
 tools/lib/bpf/Makefile            |  5 ++++-
 tools/lib/bpf/libbpf.map          |  1 +
 tools/lib/bpf/xsk.c               | 33 ++++++++++++++++++++++++++++---
 tools/lib/bpf/xsk.h               | 27 +++++++++++++++++++++++++
 5 files changed, 71 insertions(+), 4 deletions(-)

diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h
index 62b80d57b72a..be328c59389d 100644
--- a/tools/include/uapi/linux/if_xdp.h
+++ b/tools/include/uapi/linux/if_xdp.h
@@ -26,6 +26,9 @@
  */
 #define XDP_USE_NEED_WAKEUP (1 << 3)
 
+/* Flags for xsk_umem_config flags */
+#define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0)
+
 struct sockaddr_xdp {
 	__u16 sxdp_family;
 	__u16 sxdp_flags;
@@ -66,6 +69,7 @@ struct xdp_umem_reg {
 	__u64 len; /* Length of packet data area */
 	__u32 chunk_size;
 	__u32 headroom;
+	__u32 flags;
 };
 
 struct xdp_statistics {
@@ -87,6 +91,11 @@ struct xdp_options {
 #define XDP_UMEM_PGOFF_FILL_RING	0x100000000ULL
 #define XDP_UMEM_PGOFF_COMPLETION_RING	0x180000000ULL
 
+/* Masks for unaligned chunks mode */
+#define XSK_UNALIGNED_BUF_OFFSET_SHIFT 48
+#define XSK_UNALIGNED_BUF_ADDR_MASK \
+	((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1)
+
 /* Rx/Tx descriptor */
 struct xdp_desc {
 	__u64 addr;
diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index 613acb93b144..c6f94cffe06e 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -134,7 +134,9 @@ LIB_FILE	:= $(addprefix $(OUTPUT),$(LIB_FILE))
 PC_FILE		:= $(addprefix $(OUTPUT),$(PC_FILE))
 
 GLOBAL_SYM_COUNT = $(shell readelf -s --wide $(BPF_IN) | \
-			   awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {s++} END{print s}')
+			   cut -d "@" -f1 | sed 's/_v[0-9]_[0-9]_[0-9].*//' | \
+			   awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {print $$8}' | \
+			   sort -u | wc -l)
 VERSIONED_SYM_COUNT = $(shell readelf -s --wide $(OUTPUT)libbpf.so | \
 			      grep -Eo '[^ ]+@LIBBPF_' | cut -d@ -f1 | sort -u | wc -l)
 
@@ -201,6 +203,7 @@ check_abi: $(OUTPUT)libbpf.so
 		     "Please make sure all LIBBPF_API symbols are"	 \
 		     "versioned in $(VERSION_SCRIPT)." >&2;		 \
 		readelf -s --wide $(OUTPUT)libbpf-in.o |		 \
+		    cut -d "@" -f1 | sed 's/_v[0-9]_[0-9]_[0-9].*//' |	 \
 		    awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {print $$8}'|   \
 		    sort -u > $(OUTPUT)libbpf_global_syms.tmp;		 \
 		readelf -s --wide $(OUTPUT)libbpf.so |			 \
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 664ce8e7a60e..d04c7cb623ed 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -183,6 +183,7 @@ LIBBPF_0.0.4 {
 		perf_buffer__new;
 		perf_buffer__new_raw;
 		perf_buffer__poll;
+		xsk_umem__create;
 } LIBBPF_0.0.3;
 
 LIBBPF_0.0.5 {
diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index 12ad78510147..842c4fd55859 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -99,6 +99,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg,
 		cfg->comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
 		cfg->frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
 		cfg->frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM;
+		cfg->flags = XSK_UMEM__DEFAULT_FLAGS;
 		return;
 	}
 
@@ -106,6 +107,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg,
 	cfg->comp_size = usr_cfg->comp_size;
 	cfg->frame_size = usr_cfg->frame_size;
 	cfg->frame_headroom = usr_cfg->frame_headroom;
+	cfg->flags = usr_cfg->flags;
 }
 
 static int xsk_set_xdp_socket_config(struct xsk_socket_config *cfg,
@@ -132,9 +134,10 @@ static int xsk_set_xdp_socket_config(struct xsk_socket_config *cfg,
 	return 0;
 }
 
-int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
-		     struct xsk_ring_prod *fill, struct xsk_ring_cons *comp,
-		     const struct xsk_umem_config *usr_config)
+int xsk_umem__create_v0_0_4(struct xsk_umem **umem_ptr, void *umem_area,
+			    __u64 size, struct xsk_ring_prod *fill,
+			    struct xsk_ring_cons *comp,
+			    const struct xsk_umem_config *usr_config)
 {
 	struct xdp_mmap_offsets off;
 	struct xdp_umem_reg mr;
@@ -165,6 +168,7 @@ int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
 	mr.len = size;
 	mr.chunk_size = umem->config.frame_size;
 	mr.headroom = umem->config.frame_headroom;
+	mr.flags = umem->config.flags;
 
 	err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr));
 	if (err) {
@@ -238,6 +242,29 @@ int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
 	return err;
 }
 
+struct xsk_umem_config_v1 {
+	__u32 fill_size;
+	__u32 comp_size;
+	__u32 frame_size;
+	__u32 frame_headroom;
+};
+
+int xsk_umem__create_v0_0_2(struct xsk_umem **umem_ptr, void *umem_area,
+			    __u64 size, struct xsk_ring_prod *fill,
+			    struct xsk_ring_cons *comp,
+			    const struct xsk_umem_config *usr_config)
+{
+	struct xsk_umem_config config;
+
+	memcpy(&config, usr_config, sizeof(struct xsk_umem_config_v1));
+	config.flags = 0;
+
+	return xsk_umem__create_v0_0_4(umem_ptr, umem_area, size, fill, comp,
+					&config);
+}
+asm(".symver xsk_umem__create_v0_0_2, xsk_umem__create@LIBBPF_0.0.2");
+asm(".symver xsk_umem__create_v0_0_4, xsk_umem__create@@LIBBPF_0.0.4");
+
 static int xsk_load_xdp_prog(struct xsk_socket *xsk)
 {
 	static const int log_buf_size = 16 * 1024;
diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
index aa1d6122b7db..584f6820a639 100644
--- a/tools/lib/bpf/xsk.h
+++ b/tools/lib/bpf/xsk.h
@@ -168,6 +168,21 @@ static inline void *xsk_umem__get_data(void *umem_area, __u64 addr)
 	return &((char *)umem_area)[addr];
 }
 
+static inline __u64 xsk_umem__extract_addr(__u64 addr)
+{
+	return addr & XSK_UNALIGNED_BUF_ADDR_MASK;
+}
+
+static inline __u64 xsk_umem__extract_offset(__u64 addr)
+{
+	return addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+}
+
+static inline __u64 xsk_umem__add_offset_to_addr(__u64 addr)
+{
+	return xsk_umem__extract_addr(addr) + xsk_umem__extract_offset(addr);
+}
+
 LIBBPF_API int xsk_umem__fd(const struct xsk_umem *umem);
 LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
 
@@ -176,12 +191,14 @@ LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
 #define XSK_UMEM__DEFAULT_FRAME_SHIFT    12 /* 4096 bytes */
 #define XSK_UMEM__DEFAULT_FRAME_SIZE     (1 << XSK_UMEM__DEFAULT_FRAME_SHIFT)
 #define XSK_UMEM__DEFAULT_FRAME_HEADROOM 0
+#define XSK_UMEM__DEFAULT_FLAGS 0
 
 struct xsk_umem_config {
 	__u32 fill_size;
 	__u32 comp_size;
 	__u32 frame_size;
 	__u32 frame_headroom;
+	__u32 flags;
 };
 
 /* Flags for the libbpf_flags field. */
@@ -201,6 +218,16 @@ LIBBPF_API int xsk_umem__create(struct xsk_umem **umem,
 				struct xsk_ring_prod *fill,
 				struct xsk_ring_cons *comp,
 				const struct xsk_umem_config *config);
+LIBBPF_API int xsk_umem__create_v0_0_2(struct xsk_umem **umem,
+				       void *umem_area, __u64 size,
+				       struct xsk_ring_prod *fill,
+				       struct xsk_ring_cons *comp,
+				       const struct xsk_umem_config *config);
+LIBBPF_API int xsk_umem__create_v0_0_4(struct xsk_umem **umem,
+				       void *umem_area, __u64 size,
+				       struct xsk_ring_prod *fill,
+				       struct xsk_ring_cons *comp,
+				       const struct xsk_umem_config *config);
 LIBBPF_API int xsk_socket__create(struct xsk_socket **xsk,
 				  const char *ifname, __u32 queue_id,
 				  struct xsk_umem *umem,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v5 07/11] libbpf: add flags to umem config
@ 2019-08-22  1:44           ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-22  1:44 UTC (permalink / raw)
  To: intel-wired-lan

This patch adds a 'flags' field to the umem_config and umem_reg structs.
This will allow for more options to be added for configuring umems.

The first use for the flags field is to add a flag for unaligned chunks
mode. These flags can either be user-provided or filled with a default.

Since we change the size of the xsk_umem_config struct, we need to version
the ABI. This patch includes the ABI versioning for xsk_umem__create. The
Makefile was also updated to handle multiple function versions in
check-abi.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>

---
v2:
  - Removed the headroom check from this patch. It has moved to the
    previous patch.

v4:
  - Modified chunk flag define.

v5:
  - Added ABI versioning for xsk_umem__create().
  - Updated Makefile to handle multiple function versions.
  - Removed bitfields from xdp_umem_reg
  - Fix conflicts after 'bpf-af-xdp-wakeup' was merged
---
 tools/include/uapi/linux/if_xdp.h |  9 +++++++++
 tools/lib/bpf/Makefile            |  5 ++++-
 tools/lib/bpf/libbpf.map          |  1 +
 tools/lib/bpf/xsk.c               | 33 ++++++++++++++++++++++++++++---
 tools/lib/bpf/xsk.h               | 27 +++++++++++++++++++++++++
 5 files changed, 71 insertions(+), 4 deletions(-)

diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h
index 62b80d57b72a..be328c59389d 100644
--- a/tools/include/uapi/linux/if_xdp.h
+++ b/tools/include/uapi/linux/if_xdp.h
@@ -26,6 +26,9 @@
  */
 #define XDP_USE_NEED_WAKEUP (1 << 3)
 
+/* Flags for xsk_umem_config flags */
+#define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0)
+
 struct sockaddr_xdp {
 	__u16 sxdp_family;
 	__u16 sxdp_flags;
@@ -66,6 +69,7 @@ struct xdp_umem_reg {
 	__u64 len; /* Length of packet data area */
 	__u32 chunk_size;
 	__u32 headroom;
+	__u32 flags;
 };
 
 struct xdp_statistics {
@@ -87,6 +91,11 @@ struct xdp_options {
 #define XDP_UMEM_PGOFF_FILL_RING	0x100000000ULL
 #define XDP_UMEM_PGOFF_COMPLETION_RING	0x180000000ULL
 
+/* Masks for unaligned chunks mode */
+#define XSK_UNALIGNED_BUF_OFFSET_SHIFT 48
+#define XSK_UNALIGNED_BUF_ADDR_MASK \
+	((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1)
+
 /* Rx/Tx descriptor */
 struct xdp_desc {
 	__u64 addr;
diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index 613acb93b144..c6f94cffe06e 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -134,7 +134,9 @@ LIB_FILE	:= $(addprefix $(OUTPUT),$(LIB_FILE))
 PC_FILE		:= $(addprefix $(OUTPUT),$(PC_FILE))
 
 GLOBAL_SYM_COUNT = $(shell readelf -s --wide $(BPF_IN) | \
-			   awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {s++} END{print s}')
+			   cut -d "@" -f1 | sed 's/_v[0-9]_[0-9]_[0-9].*//' | \
+			   awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {print $$8}' | \
+			   sort -u | wc -l)
 VERSIONED_SYM_COUNT = $(shell readelf -s --wide $(OUTPUT)libbpf.so | \
 			      grep -Eo '[^ ]+@LIBBPF_' | cut -d@ -f1 | sort -u | wc -l)
 
@@ -201,6 +203,7 @@ check_abi: $(OUTPUT)libbpf.so
 		     "Please make sure all LIBBPF_API symbols are"	 \
 		     "versioned in $(VERSION_SCRIPT)." >&2;		 \
 		readelf -s --wide $(OUTPUT)libbpf-in.o |		 \
+		    cut -d "@" -f1 | sed 's/_v[0-9]_[0-9]_[0-9].*//' |	 \
 		    awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {print $$8}'|   \
 		    sort -u > $(OUTPUT)libbpf_global_syms.tmp;		 \
 		readelf -s --wide $(OUTPUT)libbpf.so |			 \
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 664ce8e7a60e..d04c7cb623ed 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -183,6 +183,7 @@ LIBBPF_0.0.4 {
 		perf_buffer__new;
 		perf_buffer__new_raw;
 		perf_buffer__poll;
+		xsk_umem__create;
 } LIBBPF_0.0.3;
 
 LIBBPF_0.0.5 {
diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index 12ad78510147..842c4fd55859 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -99,6 +99,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg,
 		cfg->comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
 		cfg->frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
 		cfg->frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM;
+		cfg->flags = XSK_UMEM__DEFAULT_FLAGS;
 		return;
 	}
 
@@ -106,6 +107,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg,
 	cfg->comp_size = usr_cfg->comp_size;
 	cfg->frame_size = usr_cfg->frame_size;
 	cfg->frame_headroom = usr_cfg->frame_headroom;
+	cfg->flags = usr_cfg->flags;
 }
 
 static int xsk_set_xdp_socket_config(struct xsk_socket_config *cfg,
@@ -132,9 +134,10 @@ static int xsk_set_xdp_socket_config(struct xsk_socket_config *cfg,
 	return 0;
 }
 
-int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
-		     struct xsk_ring_prod *fill, struct xsk_ring_cons *comp,
-		     const struct xsk_umem_config *usr_config)
+int xsk_umem__create_v0_0_4(struct xsk_umem **umem_ptr, void *umem_area,
+			    __u64 size, struct xsk_ring_prod *fill,
+			    struct xsk_ring_cons *comp,
+			    const struct xsk_umem_config *usr_config)
 {
 	struct xdp_mmap_offsets off;
 	struct xdp_umem_reg mr;
@@ -165,6 +168,7 @@ int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
 	mr.len = size;
 	mr.chunk_size = umem->config.frame_size;
 	mr.headroom = umem->config.frame_headroom;
+	mr.flags = umem->config.flags;
 
 	err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr));
 	if (err) {
@@ -238,6 +242,29 @@ int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
 	return err;
 }
 
+struct xsk_umem_config_v1 {
+	__u32 fill_size;
+	__u32 comp_size;
+	__u32 frame_size;
+	__u32 frame_headroom;
+};
+
+int xsk_umem__create_v0_0_2(struct xsk_umem **umem_ptr, void *umem_area,
+			    __u64 size, struct xsk_ring_prod *fill,
+			    struct xsk_ring_cons *comp,
+			    const struct xsk_umem_config *usr_config)
+{
+	struct xsk_umem_config config;
+
+	memcpy(&config, usr_config, sizeof(struct xsk_umem_config_v1));
+	config.flags = 0;
+
+	return xsk_umem__create_v0_0_4(umem_ptr, umem_area, size, fill, comp,
+					&config);
+}
+asm(".symver xsk_umem__create_v0_0_2, xsk_umem__create at LIBBPF_0.0.2");
+asm(".symver xsk_umem__create_v0_0_4, xsk_umem__create@@LIBBPF_0.0.4");
+
 static int xsk_load_xdp_prog(struct xsk_socket *xsk)
 {
 	static const int log_buf_size = 16 * 1024;
diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
index aa1d6122b7db..584f6820a639 100644
--- a/tools/lib/bpf/xsk.h
+++ b/tools/lib/bpf/xsk.h
@@ -168,6 +168,21 @@ static inline void *xsk_umem__get_data(void *umem_area, __u64 addr)
 	return &((char *)umem_area)[addr];
 }
 
+static inline __u64 xsk_umem__extract_addr(__u64 addr)
+{
+	return addr & XSK_UNALIGNED_BUF_ADDR_MASK;
+}
+
+static inline __u64 xsk_umem__extract_offset(__u64 addr)
+{
+	return addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+}
+
+static inline __u64 xsk_umem__add_offset_to_addr(__u64 addr)
+{
+	return xsk_umem__extract_addr(addr) + xsk_umem__extract_offset(addr);
+}
+
 LIBBPF_API int xsk_umem__fd(const struct xsk_umem *umem);
 LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
 
@@ -176,12 +191,14 @@ LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
 #define XSK_UMEM__DEFAULT_FRAME_SHIFT    12 /* 4096 bytes */
 #define XSK_UMEM__DEFAULT_FRAME_SIZE     (1 << XSK_UMEM__DEFAULT_FRAME_SHIFT)
 #define XSK_UMEM__DEFAULT_FRAME_HEADROOM 0
+#define XSK_UMEM__DEFAULT_FLAGS 0
 
 struct xsk_umem_config {
 	__u32 fill_size;
 	__u32 comp_size;
 	__u32 frame_size;
 	__u32 frame_headroom;
+	__u32 flags;
 };
 
 /* Flags for the libbpf_flags field. */
@@ -201,6 +218,16 @@ LIBBPF_API int xsk_umem__create(struct xsk_umem **umem,
 				struct xsk_ring_prod *fill,
 				struct xsk_ring_cons *comp,
 				const struct xsk_umem_config *config);
+LIBBPF_API int xsk_umem__create_v0_0_2(struct xsk_umem **umem,
+				       void *umem_area, __u64 size,
+				       struct xsk_ring_prod *fill,
+				       struct xsk_ring_cons *comp,
+				       const struct xsk_umem_config *config);
+LIBBPF_API int xsk_umem__create_v0_0_4(struct xsk_umem **umem,
+				       void *umem_area, __u64 size,
+				       struct xsk_ring_prod *fill,
+				       struct xsk_ring_cons *comp,
+				       const struct xsk_umem_config *config);
 LIBBPF_API int xsk_socket__create(struct xsk_socket **xsk,
 				  const char *ifname, __u32 queue_id,
 				  struct xsk_umem *umem,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v5 08/11] samples/bpf: add unaligned chunks mode support to xdpsock
  2019-08-22  1:44         ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-22  1:44           ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-22  1:44 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

This patch adds support for the unaligned chunks mode. The addition of the
unaligned chunks option will allow users to run the application with more
relaxed chunk placement in the XDP umem.

Unaligned chunks mode can be used with the '-u' or '--unaligned' command
line options.
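
As a usage illustration (the interface name, queue id and frame size below
are placeholders, and -r is assumed to be the sample's existing rx-drop
mode):

	./xdpsock -i eth0 -q 0 -r -u -f 3000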

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>

---
v4:
  - updated help text for -f
  - use new chunk flag define
---
 samples/bpf/xdpsock_user.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index da84c760c094..7312eab4d201 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -68,6 +68,9 @@ static int opt_queue;
 static int opt_poll;
 static int opt_interval = 1;
 static u32 opt_xdp_bind_flags = XDP_USE_NEED_WAKEUP;
+static u32 opt_umem_flags;
+static int opt_unaligned_chunks;
+static u32 opt_xdp_bind_flags;
 static int opt_xsk_frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
 static int opt_timeout = 1000;
 static bool opt_need_wakeup = true;
@@ -284,7 +287,9 @@ static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
 		.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
 		.frame_size = opt_xsk_frame_size,
 		.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM,
+		.flags = opt_umem_flags
 	};
+
 	int ret;
 
 	umem = calloc(1, sizeof(*umem));
@@ -293,6 +298,7 @@ static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
 
 	ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
 			       &cfg);
+
 	if (ret)
 		exit_with_error(-ret);
 
@@ -355,6 +361,7 @@ static struct option long_options[] = {
 	{"copy", no_argument, 0, 'c'},
 	{"frame-size", required_argument, 0, 'f'},
 	{"no-need-wakeup", no_argument, 0, 'm'},
+	{"unaligned", no_argument, 0, 'u'},
 	{0, 0, 0, 0}
 };
 
@@ -376,6 +383,8 @@ static void usage(const char *prog)
 		"  -c, --copy           Force copy mode.\n"
 		"  -f, --frame-size=n   Set the frame size (must be a power of two, default is %d).\n"
 		"  -m, --no-need-wakeup Turn off use of driver need wakeup flag.\n"
+		"  -f, --frame-size=n   Set the frame size (must be a power of two in aligned mode, default is %d).\n"
+		"  -u, --unaligned	Enable unaligned chunk placement\n"
 		"\n";
 	fprintf(stderr, str, prog, XSK_UMEM__DEFAULT_FRAME_SIZE);
 	exit(EXIT_FAILURE);
@@ -388,8 +397,7 @@ static void parse_command_line(int argc, char **argv)
 	opterr = 0;
 
 	for (;;) {
-
-		c = getopt_long(argc, argv, "Frtli:q:psSNn:czf:m",
+		c = getopt_long(argc, argv, "Frtli:q:psSNn:czf:mu",
 				long_options, &option_index);
 		if (c == -1)
 			break;
@@ -429,6 +437,10 @@ static void parse_command_line(int argc, char **argv)
 		case 'c':
 			opt_xdp_bind_flags |= XDP_COPY;
 			break;
+		case 'u':
+			opt_umem_flags |= XDP_UMEM_UNALIGNED_CHUNK_FLAG;
+			opt_unaligned_chunks = 1;
+			break;
 		case 'F':
 			opt_xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
 			break;
@@ -438,6 +450,7 @@ static void parse_command_line(int argc, char **argv)
 			opt_need_wakeup = false;
 			opt_xdp_bind_flags &= ~XDP_USE_NEED_WAKEUP;
 			break;
+
 		default:
 			usage(basename(argv[0]));
 		}
@@ -450,7 +463,8 @@ static void parse_command_line(int argc, char **argv)
 		usage(basename(argv[0]));
 	}
 
-	if (opt_xsk_frame_size & (opt_xsk_frame_size - 1)) {
+	if ((opt_xsk_frame_size & (opt_xsk_frame_size - 1)) &&
+	    !opt_unaligned_chunks) {
 		fprintf(stderr, "--frame-size=%d is not a power of two\n",
 			opt_xsk_frame_size);
 		usage(basename(argv[0]));
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v5 08/11] samples/bpf: add unaligned chunks mode support to xdpsock
@ 2019-08-22  1:44           ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-22  1:44 UTC (permalink / raw)
  To: intel-wired-lan

This patch adds support for the unaligned chunks mode. The addition of the
unaligned chunks option will allow users to run the application with more
relaxed chunk placement in the XDP umem.

Unaligned chunks mode can be used with the '-u' or '--unaligned' command
line options.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>

---
v4:
  - updated help text for -f
  - use new chunk flag define
---
 samples/bpf/xdpsock_user.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index da84c760c094..7312eab4d201 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -68,6 +68,9 @@ static int opt_queue;
 static int opt_poll;
 static int opt_interval = 1;
 static u32 opt_xdp_bind_flags = XDP_USE_NEED_WAKEUP;
+static u32 opt_umem_flags;
+static int opt_unaligned_chunks;
+static u32 opt_xdp_bind_flags;
 static int opt_xsk_frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
 static int opt_timeout = 1000;
 static bool opt_need_wakeup = true;
@@ -284,7 +287,9 @@ static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
 		.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
 		.frame_size = opt_xsk_frame_size,
 		.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM,
+		.flags = opt_umem_flags
 	};
+
 	int ret;
 
 	umem = calloc(1, sizeof(*umem));
@@ -293,6 +298,7 @@ static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
 
 	ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
 			       &cfg);
+
 	if (ret)
 		exit_with_error(-ret);
 
@@ -355,6 +361,7 @@ static struct option long_options[] = {
 	{"copy", no_argument, 0, 'c'},
 	{"frame-size", required_argument, 0, 'f'},
 	{"no-need-wakeup", no_argument, 0, 'm'},
+	{"unaligned", no_argument, 0, 'u'},
 	{0, 0, 0, 0}
 };
 
@@ -376,6 +383,8 @@ static void usage(const char *prog)
 		"  -c, --copy           Force copy mode.\n"
 		"  -f, --frame-size=n   Set the frame size (must be a power of two, default is %d).\n"
 		"  -m, --no-need-wakeup Turn off use of driver need wakeup flag.\n"
+		"  -f, --frame-size=n   Set the frame size (must be a power of two in aligned mode, default is %d).\n"
+		"  -u, --unaligned	Enable unaligned chunk placement\n"
 		"\n";
 	fprintf(stderr, str, prog, XSK_UMEM__DEFAULT_FRAME_SIZE);
 	exit(EXIT_FAILURE);
@@ -388,8 +397,7 @@ static void parse_command_line(int argc, char **argv)
 	opterr = 0;
 
 	for (;;) {
-
-		c = getopt_long(argc, argv, "Frtli:q:psSNn:czf:m",
+		c = getopt_long(argc, argv, "Frtli:q:psSNn:czf:mu",
 				long_options, &option_index);
 		if (c == -1)
 			break;
@@ -429,6 +437,10 @@ static void parse_command_line(int argc, char **argv)
 		case 'c':
 			opt_xdp_bind_flags |= XDP_COPY;
 			break;
+		case 'u':
+			opt_umem_flags |= XDP_UMEM_UNALIGNED_CHUNK_FLAG;
+			opt_unaligned_chunks = 1;
+			break;
 		case 'F':
 			opt_xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
 			break;
@@ -438,6 +450,7 @@ static void parse_command_line(int argc, char **argv)
 			opt_need_wakeup = false;
 			opt_xdp_bind_flags &= ~XDP_USE_NEED_WAKEUP;
 			break;
+
 		default:
 			usage(basename(argv[0]));
 		}
@@ -450,7 +463,8 @@ static void parse_command_line(int argc, char **argv)
 		usage(basename(argv[0]));
 	}
 
-	if (opt_xsk_frame_size & (opt_xsk_frame_size - 1)) {
+	if ((opt_xsk_frame_size & (opt_xsk_frame_size - 1)) &&
+	    !opt_unaligned_chunks) {
 		fprintf(stderr, "--frame-size=%d is not a power of two\n",
 			opt_xsk_frame_size);
 		usage(basename(argv[0]));
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v5 09/11] samples/bpf: add buffer recycling for unaligned chunks to xdpsock
  2019-08-22  1:44         ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-22  1:44           ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-22  1:44 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

This patch adds buffer recycling support for unaligned buffers. Since we
don't mask the addr to 2k at umem_reg in unaligned mode, we need to make
sure we give back the correct (original) addr to the fill queue. We achieve
this using the new descriptor format and associated masks. The new format
uses the upper 16-bits for the offset and the lower 48-bits for the addr.
Since we have a field for the offset, we no longer need to modify the
actual address. As such, all we have to do to get back the original address
is mask off the upper 16-bit offset, keeping only the lower 48 bits (i.e. we
strip the offset and get the address on its own).
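
Concretely, the recycling pattern used in the diff below builds on the
accessors added to libbpf in patch 07/11. A short sketch of that pattern
(fragment only, variable names as in xdpsock):

	u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
	u64 orig = xsk_umem__extract_addr(addr);	/* lower 48 bits */

	/* packet data lives at the original address plus the offset */
	char *pkt = xsk_umem__get_data(xsk->umem->buffer,
				       xsk_umem__add_offset_to_addr(addr));

	/* ... process pkt ... */

	/* the fill queue gets back the offset-free address */
	*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = orig;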

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

---
v2:
  - Removed unused defines
  - Fix buffer recycling for unaligned case
  - Remove --buf-size (--frame-size merged before this)
  - Modifications to use the new descriptor format for buffer recycling

v5:
  - Use new accessors for addr/offset instead of macros.
---
 samples/bpf/xdpsock_user.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 7312eab4d201..dc3d50f8ed86 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -484,6 +484,7 @@ static void kick_tx(struct xsk_socket_info *xsk)
 static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk,
 				     struct pollfd *fds)
 {
+	struct xsk_umem_info *umem = xsk->umem;
 	u32 idx_cq = 0, idx_fq = 0;
 	unsigned int rcvd;
 	size_t ndescs;
@@ -498,24 +499,23 @@ static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk,
 		xsk->outstanding_tx;
 
 	/* re-add completed Tx buffers */
-	rcvd = xsk_ring_cons__peek(&xsk->umem->cq, ndescs, &idx_cq);
+	rcvd = xsk_ring_cons__peek(&umem->cq, ndescs, &idx_cq);
 	if (rcvd > 0) {
 		unsigned int i;
 		int ret;
 
-		ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
+		ret = xsk_ring_prod__reserve(&umem->fq, rcvd, &idx_fq);
 		while (ret != rcvd) {
 			if (ret < 0)
 				exit_with_error(-ret);
-			if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq))
+			if (xsk_ring_prod__needs_wakeup(&umem->fq))
 				ret = poll(fds, num_socks, opt_timeout);
-			ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd,
-						     &idx_fq);
+			ret = xsk_ring_prod__reserve(&umem->fq, rcvd, &idx_fq);
 		}
+
 		for (i = 0; i < rcvd; i++)
-			*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) =
-				*xsk_ring_cons__comp_addr(&xsk->umem->cq,
-							  idx_cq++);
+			*xsk_ring_prod__fill_addr(&umem->fq, idx_fq++) =
+				*xsk_ring_cons__comp_addr(&umem->cq, idx_cq++);
 
 		xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
 		xsk_ring_cons__release(&xsk->umem->cq, rcvd);
@@ -568,10 +568,13 @@ static void rx_drop(struct xsk_socket_info *xsk, struct pollfd *fds)
 	for (i = 0; i < rcvd; i++) {
 		u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
 		u32 len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++)->len;
+		u64 orig = xsk_umem__extract_addr(addr);
+
+		addr = xsk_umem__add_offset_to_addr(addr);
 		char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
 
 		hex_dump(pkt, len, addr);
-		*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = addr;
+		*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = orig;
 	}
 
 	xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
@@ -680,12 +683,15 @@ static void l2fwd(struct xsk_socket_info *xsk, struct pollfd *fds)
 	for (i = 0; i < rcvd; i++) {
 		u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
 		u32 len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++)->len;
+		u64 orig = xsk_umem__extract_addr(addr);
+
+		addr = xsk_umem__add_offset_to_addr(addr);
 		char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
 
 		swap_mac_addresses(pkt);
 
 		hex_dump(pkt, len, addr);
-		xsk_ring_prod__tx_desc(&xsk->tx, idx_tx)->addr = addr;
+		xsk_ring_prod__tx_desc(&xsk->tx, idx_tx)->addr = orig;
 		xsk_ring_prod__tx_desc(&xsk->tx, idx_tx++)->len = len;
 	}
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v5 09/11] samples/bpf: add buffer recycling for unaligned chunks to xdpsock
@ 2019-08-22  1:44           ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-22  1:44 UTC (permalink / raw)
  To: intel-wired-lan

This patch adds buffer recycling support for unaligned buffers. Since we
don't mask the addr to 2k at umem_reg in unaligned mode, we need to make
sure we give back the correct (original) addr to the fill queue. We achieve
this using the new descriptor format and associated masks. The new format
uses the upper 16-bits for the offset and the lower 48-bits for the addr.
Since we have a field for the offset, we no longer need to modify the
actual address. As such, all we have to do to get back the original address
is mask off the upper 16-bit offset, keeping only the lower 48 bits (i.e. we
strip the offset and get the address on its own).

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

---
v2:
  - Removed unused defines
  - Fix buffer recycling for unaligned case
  - Remove --buf-size (--frame-size merged before this)
  - Modifications to use the new descriptor format for buffer recycling

v5:
  - Use new accessors for addr/offset instead of macros.
---
 samples/bpf/xdpsock_user.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 7312eab4d201..dc3d50f8ed86 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -484,6 +484,7 @@ static void kick_tx(struct xsk_socket_info *xsk)
 static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk,
 				     struct pollfd *fds)
 {
+	struct xsk_umem_info *umem = xsk->umem;
 	u32 idx_cq = 0, idx_fq = 0;
 	unsigned int rcvd;
 	size_t ndescs;
@@ -498,24 +499,23 @@ static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk,
 		xsk->outstanding_tx;
 
 	/* re-add completed Tx buffers */
-	rcvd = xsk_ring_cons__peek(&xsk->umem->cq, ndescs, &idx_cq);
+	rcvd = xsk_ring_cons__peek(&umem->cq, ndescs, &idx_cq);
 	if (rcvd > 0) {
 		unsigned int i;
 		int ret;
 
-		ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
+		ret = xsk_ring_prod__reserve(&umem->fq, rcvd, &idx_fq);
 		while (ret != rcvd) {
 			if (ret < 0)
 				exit_with_error(-ret);
-			if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq))
+			if (xsk_ring_prod__needs_wakeup(&umem->fq))
 				ret = poll(fds, num_socks, opt_timeout);
-			ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd,
-						     &idx_fq);
+			ret = xsk_ring_prod__reserve(&umem->fq, rcvd, &idx_fq);
 		}
+
 		for (i = 0; i < rcvd; i++)
-			*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) =
-				*xsk_ring_cons__comp_addr(&xsk->umem->cq,
-							  idx_cq++);
+			*xsk_ring_prod__fill_addr(&umem->fq, idx_fq++) =
+				*xsk_ring_cons__comp_addr(&umem->cq, idx_cq++);
 
 		xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
 		xsk_ring_cons__release(&xsk->umem->cq, rcvd);
@@ -568,10 +568,13 @@ static void rx_drop(struct xsk_socket_info *xsk, struct pollfd *fds)
 	for (i = 0; i < rcvd; i++) {
 		u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
 		u32 len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++)->len;
+		u64 orig = xsk_umem__extract_addr(addr);
+
+		addr = xsk_umem__add_offset_to_addr(addr);
 		char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
 
 		hex_dump(pkt, len, addr);
-		*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = addr;
+		*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = orig;
 	}
 
 	xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
@@ -680,12 +683,15 @@ static void l2fwd(struct xsk_socket_info *xsk, struct pollfd *fds)
 	for (i = 0; i < rcvd; i++) {
 		u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
 		u32 len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++)->len;
+		u64 orig = xsk_umem__extract_addr(addr);
+
+		addr = xsk_umem__add_offset_to_addr(addr);
 		char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
 
 		swap_mac_addresses(pkt);
 
 		hex_dump(pkt, len, addr);
-		xsk_ring_prod__tx_desc(&xsk->tx, idx_tx)->addr = addr;
+		xsk_ring_prod__tx_desc(&xsk->tx, idx_tx)->addr = orig;
 		xsk_ring_prod__tx_desc(&xsk->tx, idx_tx++)->len = len;
 	}
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v5 10/11] samples/bpf: use hugepages in xdpsock app
  2019-08-22  1:44         ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-22  1:44           ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-22  1:44 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

This patch modifies xdpsock to use mmap instead of posix_memalign. With
this change, we can use hugepages when running the application in unaligned
chunks mode. Using hugepages makes it more likely that we have physically
contiguous memory, which supports the unaligned chunk mode better.
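
For context, the reservation boils down to the pattern below (a simplified
sketch of the hunk that follows; MAP_HUGETLB is only OR'd in when -u is
given, and hugepages must have been reserved on the system beforehand):

	#include <sys/mman.h>

	int mmap_flags = MAP_PRIVATE | MAP_ANONYMOUS;
	void *bufs;

	if (opt_unaligned_chunks)	/* -u: back the umem with hugepages */
		mmap_flags |= MAP_HUGETLB;

	bufs = mmap(NULL, NUM_FRAMES * opt_xsk_frame_size,
		    PROT_READ | PROT_WRITE, mmap_flags, -1, 0);
	if (bufs == MAP_FAILED)
		exit(EXIT_FAILURE);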

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 samples/bpf/xdpsock_user.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index dc3d50f8ed86..102eace22956 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -70,6 +70,7 @@ static int opt_interval = 1;
 static u32 opt_xdp_bind_flags = XDP_USE_NEED_WAKEUP;
 static u32 opt_umem_flags;
 static int opt_unaligned_chunks;
+static int opt_mmap_flags;
 static u32 opt_xdp_bind_flags;
 static int opt_xsk_frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
 static int opt_timeout = 1000;
@@ -440,6 +441,7 @@ static void parse_command_line(int argc, char **argv)
 		case 'u':
 			opt_umem_flags |= XDP_UMEM_UNALIGNED_CHUNK_FLAG;
 			opt_unaligned_chunks = 1;
+			opt_mmap_flags = MAP_HUGETLB;
 			break;
 		case 'F':
 			opt_xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
@@ -742,11 +744,14 @@ int main(int argc, char **argv)
 		exit(EXIT_FAILURE);
 	}
 
-	ret = posix_memalign(&bufs, getpagesize(), /* PAGE_SIZE aligned */
-			     NUM_FRAMES * opt_xsk_frame_size);
-	if (ret)
-		exit_with_error(ret);
-
+	/* Reserve memory for the umem. Use hugepages if unaligned chunk mode */
+	bufs = mmap(NULL, NUM_FRAMES * opt_xsk_frame_size,
+		    PROT_READ | PROT_WRITE,
+		    MAP_PRIVATE | MAP_ANONYMOUS | opt_mmap_flags, -1, 0);
+	if (bufs == MAP_FAILED) {
+		printf("ERROR: mmap failed\n");
+		exit(EXIT_FAILURE);
+	}
        /* Create sockets... */
 	umem = xsk_configure_umem(bufs, NUM_FRAMES * opt_xsk_frame_size);
 	xsks[num_socks++] = xsk_configure_socket(umem);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v5 10/11] samples/bpf: use hugepages in xdpsock app
@ 2019-08-22  1:44           ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-22  1:44 UTC (permalink / raw)
  To: intel-wired-lan

This patch modifies xdpsock to use mmap instead of posix_memalign. With
this change, we can use hugepages when running the application in unaligned
chunks mode. Using hugepages makes it more likely that we have physically
contiguous memory, which supports the unaligned chunk mode better.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 samples/bpf/xdpsock_user.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index dc3d50f8ed86..102eace22956 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -70,6 +70,7 @@ static int opt_interval = 1;
 static u32 opt_xdp_bind_flags = XDP_USE_NEED_WAKEUP;
 static u32 opt_umem_flags;
 static int opt_unaligned_chunks;
+static int opt_mmap_flags;
 static u32 opt_xdp_bind_flags;
 static int opt_xsk_frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
 static int opt_timeout = 1000;
@@ -440,6 +441,7 @@ static void parse_command_line(int argc, char **argv)
 		case 'u':
 			opt_umem_flags |= XDP_UMEM_UNALIGNED_CHUNK_FLAG;
 			opt_unaligned_chunks = 1;
+			opt_mmap_flags = MAP_HUGETLB;
 			break;
 		case 'F':
 			opt_xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
@@ -742,11 +744,14 @@ int main(int argc, char **argv)
 		exit(EXIT_FAILURE);
 	}
 
-	ret = posix_memalign(&bufs, getpagesize(), /* PAGE_SIZE aligned */
-			     NUM_FRAMES * opt_xsk_frame_size);
-	if (ret)
-		exit_with_error(ret);
-
+	/* Reserve memory for the umem. Use hugepages if unaligned chunk mode */
+	bufs = mmap(NULL, NUM_FRAMES * opt_xsk_frame_size,
+		    PROT_READ | PROT_WRITE,
+		    MAP_PRIVATE | MAP_ANONYMOUS | opt_mmap_flags, -1, 0);
+	if (bufs == MAP_FAILED) {
+		printf("ERROR: mmap failed\n");
+		exit(EXIT_FAILURE);
+	}
        /* Create sockets... */
 	umem = xsk_configure_umem(bufs, NUM_FRAMES * opt_xsk_frame_size);
 	xsks[num_socks++] = xsk_configure_socket(umem);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v5 11/11] doc/af_xdp: include unaligned chunk case
  2019-08-22  1:44         ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-22  1:44           ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-22  1:44 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

With the addition of unaligned chunks mode, the documentation needs to be
updated to indicate that the incoming addr to the fill ring will only be
masked if the user application is run in the aligned chunk mode. This patch
also adds a line to explicitly indicate that the incoming addr will not be
masked if running the user application in the unaligned chunk mode.
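
As a concrete illustration of the behaviour being documented (the mask
expression and the example values come from the kernel code and the text
below):

	/* Aligned mode, chunk_size = 2048: the kernel applies
	 *	addr & ~((u64)chunk_size - 1)
	 * so 2048, 2050 and 3000 all refer to the chunk at 2048.
	 *
	 * Unaligned mode: no masking is done, so 2050 and 3000 are used
	 * exactly as produced by the application.
	 */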

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 Documentation/networking/af_xdp.rst | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst
index eeedc2e826aa..83f7ae5fc045 100644
--- a/Documentation/networking/af_xdp.rst
+++ b/Documentation/networking/af_xdp.rst
@@ -153,10 +153,12 @@ an example, if the UMEM is 64k and each chunk is 4k, then the UMEM has
 
 Frames passed to the kernel are used for the ingress path (RX rings).
 
-The user application produces UMEM addrs to this ring. Note that the
-kernel will mask the incoming addr. E.g. for a chunk size of 2k, the
-log2(2048) LSB of the addr will be masked off, meaning that 2048, 2050
-and 3000 refers to the same chunk.
+The user application produces UMEM addrs to this ring. Note that, if
+running the application with aligned chunk mode, the kernel will mask
+the incoming addr.  E.g. for a chunk size of 2k, the log2(2048) LSB of
+the addr will be masked off, meaning that 2048, 2050 and 3000 refers
+to the same chunk. If the user application is run in the unaligned
+chunks mode, then the incoming addr will be left untouched.
 
 
 UMEM Completion Ring
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v5 11/11] doc/af_xdp: include unaligned chunk case
@ 2019-08-22  1:44           ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-22  1:44 UTC (permalink / raw)
  To: intel-wired-lan

With the addition of unaligned chunks mode, the documentation needs to be
updated to indicate that the incoming addr to the fill ring will only be
masked if the user application is run in the aligned chunk mode. This patch
also adds a line to explicitly indicate that the incoming addr will not be
masked if running the user application in the unaligned chunk mode.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 Documentation/networking/af_xdp.rst | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst
index eeedc2e826aa..83f7ae5fc045 100644
--- a/Documentation/networking/af_xdp.rst
+++ b/Documentation/networking/af_xdp.rst
@@ -153,10 +153,12 @@ an example, if the UMEM is 64k and each chunk is 4k, then the UMEM has
 
 Frames passed to the kernel are used for the ingress path (RX rings).
 
-The user application produces UMEM addrs to this ring. Note that the
-kernel will mask the incoming addr. E.g. for a chunk size of 2k, the
-log2(2048) LSB of the addr will be masked off, meaning that 2048, 2050
-and 3000 refers to the same chunk.
+The user application produces UMEM addrs to this ring. Note that, if
+running the application with aligned chunk mode, the kernel will mask
+the incoming addr.  E.g. for a chunk size of 2k, the log2(2048) LSB of
+the addr will be masked off, meaning that 2048, 2050 and 3000 refers
+to the same chunk. If the user application is run in the unaligned
+chunks mode, then the incoming addr will be left untouched.
 
 
 UMEM Completion Ring
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v5 03/11] xsk: add support to allow unaligned chunk placement
  2019-08-22  1:44           ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-22 18:43             ` Jonathan Lemon
  -1 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-08-22 18:43 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, saeedm, maximmi, stephen, bruce.richardson,
	ciara.loftus, bpf, intel-wired-lan



On 21 Aug 2019, at 18:44, Kevin Laatz wrote:

> Currently, addresses are chunk size aligned. This means, we are very
> restricted in terms of where we can place chunk within the umem. For
> example, if we have a chunk size of 2k, then our chunks can only be 
> placed
> at 0,2k,4k,6k,8k... and so on (ie. every 2k starting from 0).
>
> This patch introduces the ability to use unaligned chunks. With these
> changes, we are no longer bound to having to place chunks at a 2k (or
> whatever your chunk size is) interval. Since we are no longer dealing 
> with
> aligned chunks, they can now cross page boundaries. Checks for page
> contiguity have been added in order to keep track of which pages are
> followed by a physically contiguous page.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
>
> ---
> v2:
>   - Add checks for the flags coming from userspace
>   - Fix how we get chunk_size in xsk_diag.c
>   - Add defines for masking the new descriptor format
>   - Modified the rx functions to use new descriptor format
>   - Modified the tx functions to use new descriptor format
>
> v3:
>   - Add helper function to do address/offset masking/addition
>
> v4:
>   - fixed page_start calculation in __xsk_rcv_memcpy().
>   - move offset handling to the xdp_umem_get_* functions
>   - modified the len field in xdp_umem_reg struct. We now use 16 bits 
> from
>     this for the flags field.
>   - removed next_pg_contig field from xdp_umem_page struct. Using low 
> 12
>     bits of addr to store flags instead.
>   - other minor changes based on review comments
>
> v5:
>   - Added accessors for getting addr and offset
>   - Added helper function to add offset to addr
>   - Fixed offset handling in xsk_rcv
>   - Removed bitfields from xdp_umem_reg
>   - Added struct size checking for xdp_umem_reg in xsk_setsockopt to 
> handle
>     different versions of the struct.
>   - fix conflicts after 'bpf-af-xdp-wakeup' was merged.
> ---
>  include/net/xdp_sock.h      | 75 +++++++++++++++++++++++++++--
>  include/uapi/linux/if_xdp.h |  9 ++++
>  net/xdp/xdp_umem.c          | 19 ++++++--
>  net/xdp/xsk.c               | 96 
> +++++++++++++++++++++++++++++--------
>  net/xdp/xsk_diag.c          |  2 +-
>  net/xdp/xsk_queue.h         | 68 ++++++++++++++++++++++----
>  6 files changed, 232 insertions(+), 37 deletions(-)
>
> diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
> index f023b9940d64..c9398ce7960f 100644
> --- a/include/net/xdp_sock.h
> +++ b/include/net/xdp_sock.h
> @@ -16,6 +16,13 @@
>  struct net_device;
>  struct xsk_queue;
>
> +/* Masks for xdp_umem_page flags.
> + * The low 12-bits of the addr will be 0 since this is the page 
> address, so we
> + * can use them for flags.
> + */
> +#define XSK_NEXT_PG_CONTIG_SHIFT 0
> +#define XSK_NEXT_PG_CONTIG_MASK (1ULL << XSK_NEXT_PG_CONTIG_SHIFT)
> +
>  struct xdp_umem_page {
>  	void *addr;
>  	dma_addr_t dma;
> @@ -27,8 +34,12 @@ struct xdp_umem_fq_reuse {
>  	u64 handles[];
>  };
>
> -/* Flags for the umem flags field. */
> -#define XDP_UMEM_USES_NEED_WAKEUP (1 << 0)
> +/* Flags for the umem flags field.
> + *
> + * The NEED_WAKEUP flag is 1 due to the reuse of the flags field for 
> public
> + * flags. See inlude/uapi/include/linux/if_xdp.h.
> + */
> +#define XDP_UMEM_USES_NEED_WAKEUP (1 << 1)
>
>  struct xdp_umem {
>  	struct xsk_queue *fq;
> @@ -124,14 +135,36 @@ void xsk_map_try_sock_delete(struct xsk_map 
> *map, struct xdp_sock *xs,
>  int xsk_map_inc(struct xsk_map *map);
>  void xsk_map_put(struct xsk_map *map);
>
> +static inline u64 xsk_umem_extract_addr(u64 addr)
> +{
> +	return addr & XSK_UNALIGNED_BUF_ADDR_MASK;
> +}
> +
> +static inline u64 xsk_umem_extract_offset(u64 addr)
> +{
> +	return addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
> +}
> +
> +static inline u64 xsk_umem_add_offset_to_addr(u64 addr)
> +{
> +	return xsk_umem_extract_addr(addr) + xsk_umem_extract_offset(addr);
> +}
> +
>  static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 
> addr)
>  {
> -	return umem->pages[addr >> PAGE_SHIFT].addr + (addr & (PAGE_SIZE - 
> 1));
> +	unsigned long page_addr;
> +
> +	addr = xsk_umem_add_offset_to_addr(addr);
> +	page_addr = (unsigned long)umem->pages[addr >> PAGE_SHIFT].addr;
> +
> +	return (char *)(page_addr & PAGE_MASK) + (addr & ~PAGE_MASK);
>  }
>
>  static inline dma_addr_t xdp_umem_get_dma(struct xdp_umem *umem, u64 
> addr)
>  {
> -	return umem->pages[addr >> PAGE_SHIFT].dma + (addr & (PAGE_SIZE - 
> 1));
> +	addr = xsk_umem_add_offset_to_addr(addr);
> +
> +	return umem->pages[addr >> PAGE_SHIFT].dma + (addr & ~PAGE_MASK);
>  }
>
>  /* Reuse-queue aware version of FILL queue helpers */
> @@ -172,6 +205,19 @@ static inline void xsk_umem_fq_reuse(struct 
> xdp_umem *umem, u64 addr)
>
>  	rq->handles[rq->length++] = addr;
>  }
> +
> +/* Handle the offset appropriately depending on aligned or unaligned 
> mode.
> + * For unaligned mode, we store the offset in the upper 16-bits of 
> the address.
> + * For aligned mode, we simply add the offset to the address.
> + */
> +static inline u64 xsk_umem_adjust_offset(struct xdp_umem *umem, u64 
> address,
> +					 u64 offset)
> +{
> +	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> +		return address + (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
> +	else
> +		return address + offset;
> +}
>  #else
>  static inline int xsk_generic_rcv(struct xdp_sock *xs, struct 
> xdp_buff *xdp)
>  {
> @@ -241,6 +287,21 @@ static inline struct xdp_umem 
> *xdp_get_umem_from_qid(struct net_device *dev,
>  	return NULL;
>  }
>
> +static inline u64 xsk_umem_extract_addr(u64 addr)
> +{
> +	return 0;
> +}
> +
> +static inline u64 xsk_umem_extract_offset(u64 addr)
> +{
> +	return 0;
> +}
> +
> +static inline u64 xsk_umem_add_offset_to_addr(u64 addr)
> +{
> +	return 0;
> +}
> +
>  static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 
> addr)
>  {
>  	return NULL;
> @@ -290,6 +351,12 @@ static inline bool 
> xsk_umem_uses_need_wakeup(struct xdp_umem *umem)
>  	return false;
>  }
>
> +static inline u64 xsk_umem_adjust_offset(struct xdp_umem *umem, u64 
> handle,
> +					 u64 offset)
> +{
> +	return 0;
> +}
> +
>  #endif /* CONFIG_XDP_SOCKETS */
>
>  #endif /* _LINUX_XDP_SOCK_H */
> diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
> index 62b80d57b72a..be328c59389d 100644
> --- a/include/uapi/linux/if_xdp.h
> +++ b/include/uapi/linux/if_xdp.h
> @@ -26,6 +26,9 @@
>   */
>  #define XDP_USE_NEED_WAKEUP (1 << 3)
>
> +/* Flags for xsk_umem_config flags */
> +#define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0)
> +
>  struct sockaddr_xdp {
>  	__u16 sxdp_family;
>  	__u16 sxdp_flags;
> @@ -66,6 +69,7 @@ struct xdp_umem_reg {
>  	__u64 len; /* Length of packet data area */
>  	__u32 chunk_size;
>  	__u32 headroom;
> +	__u32 flags;
>  };
>
>  struct xdp_statistics {
> @@ -87,6 +91,11 @@ struct xdp_options {
>  #define XDP_UMEM_PGOFF_FILL_RING	0x100000000ULL
>  #define XDP_UMEM_PGOFF_COMPLETION_RING	0x180000000ULL
>
> +/* Masks for unaligned chunks mode */
> +#define XSK_UNALIGNED_BUF_OFFSET_SHIFT 48
> +#define XSK_UNALIGNED_BUF_ADDR_MASK \
> +	((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1)
> +
>  /* Rx/Tx descriptor */
>  struct xdp_desc {
>  	__u64 addr;
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index 2d65779282a1..e997b263a0dd 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -340,6 +340,7 @@ static int xdp_umem_account_pages(struct xdp_umem 
> *umem)
>
>  static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg 
> *mr)
>  {
> +	bool unaligned_chunks = mr->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG;
>  	u32 chunk_size = mr->chunk_size, headroom = mr->headroom;
>  	unsigned int chunks, chunks_per_page;
>  	u64 addr = mr->addr, size = mr->len;
> @@ -355,7 +356,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, 
> struct xdp_umem_reg *mr)
>  		return -EINVAL;
>  	}
>
> -	if (!is_power_of_2(chunk_size))
> +	if (mr->flags & ~(XDP_UMEM_UNALIGNED_CHUNK_FLAG |
> +			XDP_UMEM_USES_NEED_WAKEUP))
> +		return -EINVAL;
> +
> +	if (!unaligned_chunks && !is_power_of_2(chunk_size))
>  		return -EINVAL;
>
>  	if (!PAGE_ALIGNED(addr)) {
> @@ -372,9 +377,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, 
> struct xdp_umem_reg *mr)
>  	if (chunks == 0)
>  		return -EINVAL;
>
> -	chunks_per_page = PAGE_SIZE / chunk_size;
> -	if (chunks < chunks_per_page || chunks % chunks_per_page)
> -		return -EINVAL;
> +	if (!unaligned_chunks) {
> +		chunks_per_page = PAGE_SIZE / chunk_size;
> +		if (chunks < chunks_per_page || chunks % chunks_per_page)
> +			return -EINVAL;
> +	}
>
>  	headroom = ALIGN(headroom, 64);
>
> @@ -383,13 +390,15 @@ static int xdp_umem_reg(struct xdp_umem *umem, 
> struct xdp_umem_reg *mr)
>  		return -EINVAL;
>
>  	umem->address = (unsigned long)addr;
> -	umem->chunk_mask = ~((u64)chunk_size - 1);
> +	umem->chunk_mask = unaligned_chunks ? XSK_UNALIGNED_BUF_ADDR_MASK
> +					    : ~((u64)chunk_size - 1);
>  	umem->size = size;
>  	umem->headroom = headroom;
>  	umem->chunk_size_nohr = chunk_size - headroom;
>  	umem->npgs = size / PAGE_SIZE;
>  	umem->pgs = NULL;
>  	umem->user = NULL;
> +	umem->flags = mr->flags;
>  	INIT_LIST_HEAD(&umem->xsk_list);
>  	spin_lock_init(&umem->xsk_list_lock);
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index ee4428a892fa..907e5f12338f 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -45,7 +45,7 @@ EXPORT_SYMBOL(xsk_umem_has_addrs);
>
>  u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
>  {
> -	return xskq_peek_addr(umem->fq, addr);
> +	return xskq_peek_addr(umem->fq, addr, umem);
>  }
>  EXPORT_SYMBOL(xsk_umem_peek_addr);
>
> @@ -115,21 +115,43 @@ bool xsk_umem_uses_need_wakeup(struct xdp_umem 
> *umem)
>  }
>  EXPORT_SYMBOL(xsk_umem_uses_need_wakeup);
>
> +/* If a buffer crosses a page boundary, we need to do 2 memcpy's, one 
> for
> + * each page. This is only required in copy mode.
> + */
> +static void __xsk_rcv_memcpy(struct xdp_umem *umem, u64 addr, void 
> *from_buf,
> +			     u32 len, u32 metalen)
> +{
> +	void *to_buf = xdp_umem_get_data(umem, addr);
> +
> +	addr = xsk_umem_add_offset_to_addr(addr);
> +	if (xskq_crosses_non_contig_pg(umem, addr, len + metalen)) {
> +		void *next_pg_addr = umem->pages[(addr >> PAGE_SHIFT) + 1].addr;
> +		u64 page_start = addr & ~(PAGE_SIZE - 1);
> +		u64 first_len = PAGE_SIZE - (addr - page_start);
> +
> +		memcpy(to_buf, from_buf, first_len + metalen);
> +		memcpy(next_pg_addr, from_buf + first_len, len - first_len);
> +
> +		return;
> +	}
> +
> +	memcpy(to_buf, from_buf, len + metalen);
> +}
> +
>  static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 
> len)
>  {
> -	void *to_buf, *from_buf;
> +	u64 offset = xs->umem->headroom;
> +	u64 addr, memcpy_addr;
> +	void *from_buf;
>  	u32 metalen;
> -	u64 addr;
>  	int err;
>
> -	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
> +	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
>  	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
>  		xs->rx_dropped++;
>  		return -ENOSPC;
>  	}
>
> -	addr += xs->umem->headroom;
> -
>  	if (unlikely(xdp_data_meta_unsupported(xdp))) {
>  		from_buf = xdp->data;
>  		metalen = 0;
> @@ -138,9 +160,11 @@ static int __xsk_rcv(struct xdp_sock *xs, struct 
> xdp_buff *xdp, u32 len)
>  		metalen = xdp->data - xdp->data_meta;
>  	}
>
> -	to_buf = xdp_umem_get_data(xs->umem, addr);
> -	memcpy(to_buf, from_buf, len + metalen);
> -	addr += metalen;
> +	memcpy_addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
> +	__xsk_rcv_memcpy(xs->umem, memcpy_addr, from_buf, len, metalen);
> +
> +	offset += metalen;
> +	addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
>  	err = xskq_produce_batch_desc(xs->rx, addr, len);
>  	if (!err) {
>  		xskq_discard_addr(xs->umem->fq);
> @@ -185,6 +209,7 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct 
> xdp_buff *xdp)
>  {
>  	u32 metalen = xdp->data - xdp->data_meta;
>  	u32 len = xdp->data_end - xdp->data;
> +	u64 offset = xs->umem->headroom;
>  	void *buffer;
>  	u64 addr;
>  	int err;
> @@ -196,17 +221,17 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct 
> xdp_buff *xdp)
>  		goto out_unlock;
>  	}
>
> -	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
> +	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
>  	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
>  		err = -ENOSPC;
>  		goto out_drop;
>  	}
>
> -	addr += xs->umem->headroom;
> -
> -	buffer = xdp_umem_get_data(xs->umem, addr);
> +	buffer = xdp_umem_get_data(xs->umem, addr + offset);
>  	memcpy(buffer, xdp->data_meta, len + metalen);
> -	addr += metalen;
> +	offset += metalen;
> +
> +	addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
>  	err = xskq_produce_batch_desc(xs->rx, addr, len);
>  	if (err)
>  		goto out_drop;

Can't just add address and offset any longer.  This should read:

	addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
	buffer = xdp_umem_get_data(xs->umem, addr);

	addr = xsk_umem_adjust_offset(xs->umem, addr, metalen);
	

so that offset and then metalen are added.  (or preserve the
address across the calls like memcpy_addr earlier).
-- 
Jonathan
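
For reference, the 48/16 split behind these helpers works roughly as in the
sketch below. The shift and mask values match the if_xdp.h additions in this
patch; the function names are illustrative userspace stand-ins for
xsk_umem_adjust_offset() and the xdp_umem_get_data()/get_dma() pair, not
kernel code.

	#include <stdint.h>

	#define OFFSET_SHIFT	48
	#define ADDR_MASK	((1ULL << OFFSET_SHIFT) - 1)

	/* unaligned-mode branch of xsk_umem_adjust_offset(): the chunk base
	 * stays in the low 48 bits, offsets accumulate in the high 16 bits.
	 */
	static inline uint64_t adjust_offset(uint64_t addr, uint64_t off)
	{
		return addr + (off << OFFSET_SHIFT);
	}

	/* linear umem offset that the get_data()/get_dma() helpers compute
	 * before indexing into the page array.
	 */
	static inline uint64_t resolve(uint64_t addr)
	{
		return (addr & ADDR_MASK) + (addr >> OFFSET_SHIFT);
	}

A plain "addr + offset" would instead fold the offset into the low 48 bits,
so the original chunk base could no longer be recovered by masking (as, for
example, i40e_zca_free needs to do), which is why both the headroom and the
metalen adjustments have to go through the helper.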

> @@ -250,7 +275,7 @@ bool xsk_umem_consume_tx(struct xdp_umem *umem, 
> struct xdp_desc *desc)
>
>  	rcu_read_lock();
>  	list_for_each_entry_rcu(xs, &umem->xsk_list, list) {
> -		if (!xskq_peek_desc(xs->tx, desc))
> +		if (!xskq_peek_desc(xs->tx, desc, umem))
>  			continue;
>
>  		if (xskq_produce_addr_lazy(umem->cq, desc->addr))
> @@ -304,7 +329,7 @@ static int xsk_generic_xmit(struct sock *sk, 
> struct msghdr *m,
>  	if (xs->queue_id >= xs->dev->real_num_tx_queues)
>  		goto out;
>
> -	while (xskq_peek_desc(xs->tx, &desc)) {
> +	while (xskq_peek_desc(xs->tx, &desc, xs->umem)) {
>  		char *buffer;
>  		u64 addr;
>  		u32 len;
> @@ -333,7 +358,7 @@ static int xsk_generic_xmit(struct sock *sk, 
> struct msghdr *m,
>  		skb->dev = xs->dev;
>  		skb->priority = sk->sk_priority;
>  		skb->mark = sk->sk_mark;
> -		skb_shinfo(skb)->destructor_arg = (void *)(long)addr;
> +		skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
>  		skb->destructor = xsk_destruct_skb;
>
>  		err = dev_direct_xmit(skb, xs->queue_id);
> @@ -526,6 +551,24 @@ static struct socket *xsk_lookup_xsk_from_fd(int 
> fd)
>  	return sock;
>  }
>
> +/* Check if umem pages are contiguous.
> + * If zero-copy mode, use the DMA address to do the page contiguity 
> check
> + * For all other modes we use addr (kernel virtual address)
> + * Store the result in the low bits of addr.
> + */
> +static void xsk_check_page_contiguity(struct xdp_umem *umem, u32 
> flags)
> +{
> +	struct xdp_umem_page *pgs = umem->pages;
> +	int i, is_contig;
> +
> +	for (i = 0; i < umem->npgs - 1; i++) {
> +		is_contig = (flags & XDP_ZEROCOPY) ?
> +			(pgs[i].dma + PAGE_SIZE == pgs[i + 1].dma) :
> +			(pgs[i].addr + PAGE_SIZE == pgs[i + 1].addr);
> +		pgs[i].addr += is_contig << XSK_NEXT_PG_CONTIG_SHIFT;
> +	}
> +}
> +
>  static int xsk_bind(struct socket *sock, struct sockaddr *addr, int 
> addr_len)
>  {
>  	struct sockaddr_xdp *sxdp = (struct sockaddr_xdp *)addr;
> @@ -616,6 +659,8 @@ static int xsk_bind(struct socket *sock, struct 
> sockaddr *addr, int addr_len)
>  		err = xdp_umem_assign_dev(xs->umem, dev, qid, flags);
>  		if (err)
>  			goto out_unlock;
> +
> +		xsk_check_page_contiguity(xs->umem, flags);
>  	}
>
>  	xs->dev = dev;
> @@ -636,6 +681,13 @@ static int xsk_bind(struct socket *sock, struct 
> sockaddr *addr, int addr_len)
>  	return err;
>  }
>
> +struct xdp_umem_reg_v1 {
> +	__u64 addr; /* Start of packet data area */
> +	__u64 len; /* Length of packet data area */
> +	__u32 chunk_size;
> +	__u32 headroom;
> +};
> +
>  static int xsk_setsockopt(struct socket *sock, int level, int 
> optname,
>  			  char __user *optval, unsigned int optlen)
>  {
> @@ -673,10 +725,16 @@ static int xsk_setsockopt(struct socket *sock, 
> int level, int optname,
>  	}
>  	case XDP_UMEM_REG:
>  	{
> -		struct xdp_umem_reg mr;
> +		size_t mr_size = sizeof(struct xdp_umem_reg);
> +		struct xdp_umem_reg mr = {};
>  		struct xdp_umem *umem;
>
> -		if (copy_from_user(&mr, optval, sizeof(mr)))
> +		if (optlen < sizeof(struct xdp_umem_reg_v1))
> +			return -EINVAL;
> +		else if (optlen < sizeof(mr))
> +			mr_size = sizeof(struct xdp_umem_reg_v1);
> +
> +		if (copy_from_user(&mr, optval, mr_size))
>  			return -EFAULT;
>
>  		mutex_lock(&xs->mutex);
> diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c
> index d5e06c8e0cbf..9986a759fe06 100644
> --- a/net/xdp/xsk_diag.c
> +++ b/net/xdp/xsk_diag.c
> @@ -56,7 +56,7 @@ static int xsk_diag_put_umem(const struct xdp_sock 
> *xs, struct sk_buff *nlskb)
>  	du.id = umem->id;
>  	du.size = umem->size;
>  	du.num_pages = umem->npgs;
> -	du.chunk_size = (__u32)(~umem->chunk_mask + 1);
> +	du.chunk_size = umem->chunk_size_nohr + umem->headroom;
>  	du.headroom = umem->headroom;
>  	du.ifindex = umem->dev ? umem->dev->ifindex : 0;
>  	du.queue_id = umem->queue_id;
> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> index dd9e985c2461..6c67c9d0294f 100644
> --- a/net/xdp/xsk_queue.h
> +++ b/net/xdp/xsk_queue.h
> @@ -134,6 +134,17 @@ static inline bool xskq_has_addrs(struct 
> xsk_queue *q, u32 cnt)
>
>  /* UMEM queue */
>
> +static inline bool xskq_crosses_non_contig_pg(struct xdp_umem *umem, 
> u64 addr,
> +					      u64 length)
> +{
> +	bool cross_pg = (addr & (PAGE_SIZE - 1)) + length > PAGE_SIZE;
> +	bool next_pg_contig =
> +		(unsigned long)umem->pages[(addr >> PAGE_SHIFT)].addr &
> +			XSK_NEXT_PG_CONTIG_MASK;
> +
> +	return cross_pg && !next_pg_contig;
> +}
> +
>  static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
>  {
>  	if (addr >= q->size) {
> @@ -144,23 +155,49 @@ static inline bool xskq_is_valid_addr(struct 
> xsk_queue *q, u64 addr)
>  	return true;
>  }
>
> -static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr)
> +static inline bool xskq_is_valid_addr_unaligned(struct xsk_queue *q, 
> u64 addr,
> +						u64 length,
> +						struct xdp_umem *umem)
> +{
> +	addr = xsk_umem_add_offset_to_addr(addr);
> +	if (addr >= q->size ||
> +	    xskq_crosses_non_contig_pg(umem, addr, length)) {
> +		q->invalid_descs++;
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
> +static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr,
> +				      struct xdp_umem *umem)
>  {
>  	while (q->cons_tail != q->cons_head) {
>  		struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
>  		unsigned int idx = q->cons_tail & q->ring_mask;
>
>  		*addr = READ_ONCE(ring->desc[idx]) & q->chunk_mask;
> +
> +		if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG) {
> +			if (xskq_is_valid_addr_unaligned(q, *addr,
> +							 umem->chunk_size_nohr,
> +							 umem))
> +				return addr;
> +			goto out;
> +		}
> +
>  		if (xskq_is_valid_addr(q, *addr))
>  			return addr;
>
> +out:
>  		q->cons_tail++;
>  	}
>
>  	return NULL;
>  }
>
> -static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
> +static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr,
> +				  struct xdp_umem *umem)
>  {
>  	if (q->cons_tail == q->cons_head) {
>  		smp_mb(); /* D, matches A */
> @@ -171,7 +208,7 @@ static inline u64 *xskq_peek_addr(struct xsk_queue 
> *q, u64 *addr)
>  		smp_rmb();
>  	}
>
> -	return xskq_validate_addr(q, addr);
> +	return xskq_validate_addr(q, addr, umem);
>  }
>
>  static inline void xskq_discard_addr(struct xsk_queue *q)
> @@ -230,8 +267,21 @@ static inline int xskq_reserve_addr(struct 
> xsk_queue *q)
>
>  /* Rx/Tx queue */
>
> -static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct 
> xdp_desc *d)
> +static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct 
> xdp_desc *d,
> +				      struct xdp_umem *umem)
>  {
> +	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG) {
> +		if (!xskq_is_valid_addr_unaligned(q, d->addr, d->len, umem))
> +			return false;
> +
> +		if (d->len > umem->chunk_size_nohr || d->options) {
> +			q->invalid_descs++;
> +			return false;
> +		}
> +
> +		return true;
> +	}
> +
>  	if (!xskq_is_valid_addr(q, d->addr))
>  		return false;
>
> @@ -245,14 +295,15 @@ static inline bool xskq_is_valid_desc(struct 
> xsk_queue *q, struct xdp_desc *d)
>  }
>
>  static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue 
> *q,
> -						  struct xdp_desc *desc)
> +						  struct xdp_desc *desc,
> +						  struct xdp_umem *umem)
>  {
>  	while (q->cons_tail != q->cons_head) {
>  		struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
>  		unsigned int idx = q->cons_tail & q->ring_mask;
>
>  		*desc = READ_ONCE(ring->desc[idx]);
> -		if (xskq_is_valid_desc(q, desc))
> +		if (xskq_is_valid_desc(q, desc, umem))
>  			return desc;
>
>  		q->cons_tail++;
> @@ -262,7 +313,8 @@ static inline struct xdp_desc 
> *xskq_validate_desc(struct xsk_queue *q,
>  }
>
>  static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
> -					      struct xdp_desc *desc)
> +					      struct xdp_desc *desc,
> +					      struct xdp_umem *umem)
>  {
>  	if (q->cons_tail == q->cons_head) {
>  		smp_mb(); /* D, matches A */
> @@ -273,7 +325,7 @@ static inline struct xdp_desc 
> *xskq_peek_desc(struct xsk_queue *q,
>  		smp_rmb(); /* C, matches B */
>  	}
>
> -	return xskq_validate_desc(q, desc);
> +	return xskq_validate_desc(q, desc, umem);
>  }
>
>  static inline void xskq_discard_desc(struct xsk_queue *q)
> -- 
> 2.17.1

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v5 03/11] xsk: add support to allow unaligned chunk placement
  2019-08-22 18:43             ` [Intel-wired-lan] " Jonathan Lemon
@ 2019-08-23 13:35               ` Laatz, Kevin
  -1 siblings, 0 replies; 274+ messages in thread
From: Laatz, Kevin @ 2019-08-23 13:35 UTC (permalink / raw)
  To: Jonathan Lemon
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, saeedm, maximmi, stephen, bruce.richardson,
	ciara.loftus, bpf, intel-wired-lan

On 22/08/2019 19:43, Jonathan Lemon wrote:
> On 21 Aug 2019, at 18:44, Kevin Laatz wrote:
>> Currently, addresses are chunk size aligned. This means, we are very
>> restricted in terms of where we can place chunk within the umem. For
>> example, if we have a chunk size of 2k, then our chunks can only be 
>> placed
>> at 0,2k,4k,6k,8k... and so on (ie. every 2k starting from 0).
>>
>> This patch introduces the ability to use unaligned chunks. With these
>> changes, we are no longer bound to having to place chunks at a 2k (or
>> whatever your chunk size is) interval. Since we are no longer dealing 
>> with
>> aligned chunks, they can now cross page boundaries. Checks for page
>> contiguity have been added in order to keep track of which pages are
>> followed by a physically contiguous page.
>>
>> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
>> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
>> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
>>
>> ---
>> v2:
>>   - Add checks for the flags coming from userspace
>>   - Fix how we get chunk_size in xsk_diag.c
>>   - Add defines for masking the new descriptor format
>>   - Modified the rx functions to use new descriptor format
>>   - Modified the tx functions to use new descriptor format
>>
>> v3:
>>   - Add helper function to do address/offset masking/addition
>>
>> v4:
>>   - fixed page_start calculation in __xsk_rcv_memcpy().
>>   - move offset handling to the xdp_umem_get_* functions
>>   - modified the len field in xdp_umem_reg struct. We now use 16 bits 
>> from
>>     this for the flags field.
>>   - removed next_pg_contig field from xdp_umem_page struct. Using low 12
>>     bits of addr to store flags instead.
>>   - other minor changes based on review comments
>>
>> v5:
>>   - Added accessors for getting addr and offset
>>   - Added helper function to add offset to addr
>>   - Fixed offset handling in xsk_rcv
>>   - Removed bitfields from xdp_umem_reg
>>   - Added struct size checking for xdp_umem_reg in xsk_setsockopt to 
>> handle
>>     different versions of the struct.
>>   - fix conflicts after 'bpf-af-xdp-wakeup' was merged.
>> ---
>>  include/net/xdp_sock.h      | 75 +++++++++++++++++++++++++++--
>>  include/uapi/linux/if_xdp.h |  9 ++++
>>  net/xdp/xdp_umem.c          | 19 ++++++--
>>  net/xdp/xsk.c               | 96 +++++++++++++++++++++++++++++--------
>>  net/xdp/xsk_diag.c          |  2 +-
>>  net/xdp/xsk_queue.h         | 68 ++++++++++++++++++++++----
>>  6 files changed, 232 insertions(+), 37 deletions(-)
>>
>>
[...]

>> @@ -196,17 +221,17 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct 
>> xdp_buff *xdp)
>>          goto out_unlock;
>>      }
>>
>> -    if (!xskq_peek_addr(xs->umem->fq, &addr) ||
>> +    if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
>>          len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
>>          err = -ENOSPC;
>>          goto out_drop;
>>      }
>>
>> -    addr += xs->umem->headroom;
>> -
>> -    buffer = xdp_umem_get_data(xs->umem, addr);
>> +    buffer = xdp_umem_get_data(xs->umem, addr + offset);
>>      memcpy(buffer, xdp->data_meta, len + metalen);
>> -    addr += metalen;
>> +    offset += metalen;
>> +
>> +    addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
>>      err = xskq_produce_batch_desc(xs->rx, addr, len);
>>      if (err)
>>          goto out_drop;
>
> Can't just add address and offset any longer.  This should read:
>
>     addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
>     buffer = xdp_umem_get_data(xs->umem, addr);
>
>     addr = xsk_umem_adjust_offset(xs->umem, addr, metalen);
>
>
> so that offset and then metalen are added.  (or preserve the
> address across the calls like memcpy_addr earlier).


Will fix this, thanks!

-Kevin
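
For readers following the thread, the corrected xsk_generic_rcv() sequence
would presumably look like the sketch below, following the suggestion above;
treat it as a sketch rather than the final v6 code:

	addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
	buffer = xdp_umem_get_data(xs->umem, addr);
	memcpy(buffer, xdp->data_meta, len + metalen);

	/* account for the metadata through the helper as well, so the chunk
	 * base in the low 48 bits stays intact in unaligned mode
	 */
	addr = xsk_umem_adjust_offset(xs->umem, addr, metalen);
	err = xskq_produce_batch_desc(xs->rx, addr, len);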


^ permalink raw reply	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v6 00/12] XDP unaligned chunk placement support
  2019-08-22  1:44         ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-27  2:25           ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-27  2:25 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

This patch set adds the ability to use unaligned chunks in the XDP umem.

Currently, all chunk addresses passed to the umem are masked to be chunk
size aligned (max is PAGE_SIZE). This limits where we can place chunks
within the umem as well as limiting the packet sizes that are supported.

The changes in this patch set remove these restrictions, allowing XDP to
be more flexible in where it can place a chunk within a umem. By relaxing
where the chunks can be placed, we can use an arbitrary buffer size and
place it wherever we have a free address in the umem. These
changes add the ability to support arbitrary frame sizes up to 4k
(PAGE_SIZE) and make it easy to integrate with other existing frameworks
that have their own memory management systems, such as DPDK.
In DPDK, for example, there is already support for AF_XDP with zero-copy.
However, with this patch set the integration will be much more seamless.
You can find the DPDK AF_XDP driver at:
https://git.dpdk.org/dpdk/tree/drivers/net/af_xdp

Since we are now dealing with arbitrary frame sizes, we also need to
update how we pass around addresses. Currently, the addresses can simply be
masked to 2k to get back to the original address. This becomes less trivial
when using frame sizes that are not a 'power of 2' size. This patch set
modifies the Rx/Tx descriptor format to use the upper 16-bits of the addr
field for an offset value, leaving the lower 48-bits for the address (this
leaves us with 256 Terabytes, which should be enough!). We only need to use
the upper 16-bits to store the offset when running in unaligned mode.
Rather than adding the offset (headroom etc) to the address, we will store
it in the upper 16-bits of the address field. This way, we can easily add
the offset to the address where we need it, using some bit manipulation and
addition, and we can also easily get the original address wherever we need
it (for example in i40e_zca_free) by simply masking to get the lower
48-bits of the address field.
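
As an illustration of the layout described above, an application consuming
the new descriptor format could split an Rx address and recycle the chunk
roughly as follows. The XSK_UNALIGNED_BUF_* macros are the ones added to
include/uapi/linux/if_xdp.h in this set; the libbpf ring accessors are the
existing ones from tools/lib/bpf/xsk.h, and the function itself is only a
sketch (ring peek/submit/release handling is left to the caller):

	#include <linux/if_xdp.h>
	#include "xsk.h"	/* tools/lib/bpf ring accessors */

	static void handle_rx_desc(void *umem_area, struct xsk_ring_cons *rx,
				   __u32 rx_idx, struct xsk_ring_prod *fq,
				   __u32 fq_idx)
	{
		__u64 addr = xsk_ring_cons__rx_desc(rx, rx_idx)->addr;
		__u64 base = addr & XSK_UNALIGNED_BUF_ADDR_MASK;
		__u64 off  = addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
		void *pkt  = (char *)umem_area + base + off;

		/* ... process the packet at pkt, length from the descriptor ... */
		(void)pkt;

		/* recycle the original chunk address: low 48 bits only */
		*xsk_ring_prod__fill_addr(fq, fq_idx) = base;
	}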

The patch set was tested with the following set up:
  - Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
  - Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
  - Driver: i40e
  - Application: xdpsock with l2fwd (single interface)
  - Turbo disabled in BIOS

There are no changes to performance before and after these patches for SKB
mode and Copy mode. Zero-copy mode saw a performance degradation of ~1.5%.

This patch set has been applied against
commit 0bb52b0dfc88 ("tools: bpftool: add 'bpftool map freeze' subcommand")

Structure of the patch set:
Patch 1:
  - Remove unnecessary masking and headroom addition during zero-copy Rx
    buffer recycling in i40e. This change is required in order for the
    buffer recycling to work in the unaligned chunk mode.

Patch 2:
  - Remove unnecessary masking and headroom addition during
    zero-copy Rx buffer recycling in ixgbe. This change is required in
    order for the buffer recycling to work in the unaligned chunk mode.

Patch 3:
  - Add infrastructure for unaligned chunks. Since we are dealing with
    unaligned chunks that could potentially cross a physical page boundary,
    we add checks to keep track of that information. We can later use this
    information to correctly handle buffers that are placed at an address
    where they cross a page boundary.  This patch also modifies the
    existing Rx and Tx functions to use the new descriptor format. To
    handle addresses correctly, we need to mask appropriately based on
    whether we are in aligned or unaligned mode.

Patch 4:
  - This patch updates the i40e driver to make use of the new descriptor
    format.

Patch 5:
  - This patch updates the ixgbe driver to make use of the new descriptor
    format.

Patch 6:
  - This patch updates the mlx5e driver to make use of the new descriptor
    format. These changes are required to handle the new descriptor format
    and for unaligned chunks support.

Patch 7:
  - This patch allows XSK frames smaller than the page size in the mlx5e
    driver. It relaxes the requirements on the XSK frame size, allowing it
    to be smaller than a page and even a non-power-of-two size. The current
    implementation can work in this mode, both with Striding RQ and without
    it.

Patch 8:
  - Add flags for umem configuration to libbpf. Since we increase the size
    of the struct by adding flags, we also need to add the ABI versioning
    in this patch (see the usage sketch after this list).

Patch 9:
  - Modify xdpsock application to add a command line option for
    unaligned chunks

Patch 10:
  - Since we can now run the application in unaligned chunk mode, we need
    to make sure we recycle the buffers appropriately.

Patch 11:
  - Adds hugepage support to the xdpsock application

Patch 12:
  - Documentation update to include the unaligned chunk scenario. We need
    to explicitly state that the incoming addresses are only masked in the
    aligned chunk mode and not the unaligned chunk mode.
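
As a usage sketch tying the libbpf flag (patch 8) and the hugepage-backed
umem (patch 11) together, an application might set up a umem in unaligned
chunk mode roughly as follows. The flags field of xsk_umem_config is the one
added in patch 8; constant and field names should be checked against the
final libbpf headers, and error handling is kept minimal:

	#include <sys/mman.h>
	#include <linux/if_xdp.h>
	#include "xsk.h"	/* tools/lib/bpf */

	#define NUM_FRAMES	4096
	#define FRAME_SIZE	3000	/* arbitrary, need not be a power of 2 */

	static struct xsk_umem *create_unaligned_umem(struct xsk_ring_prod *fq,
						      struct xsk_ring_cons *cq)
	{
		struct xsk_umem_config cfg = {
			.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
			.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
			.frame_size = FRAME_SIZE,
			.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM,
			.flags = XDP_UMEM_UNALIGNED_CHUNK_FLAG,	/* patch 8 */
		};
		__u64 size = (__u64)NUM_FRAMES * FRAME_SIZE;
		struct xsk_umem *umem;
		void *area;

		/* hugepage-backed buffer, as the xdpsock changes in patch 11 use */
		area = mmap(NULL, size, PROT_READ | PROT_WRITE,
			    MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
		if (area == MAP_FAILED)
			return NULL;

		if (xsk_umem__create(&umem, area, size, fq, cq, &cfg))
			return NULL;

		return umem;
	}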

---
v2:
  - fixed checkpatch issues
  - fixed Rx buffer recycling for unaligned chunks in xdpsock
  - removed unused defines
  - fixed how chunk_size is calculated in xsk_diag.c
  - added some performance numbers to cover letter
  - modified descriptor format to make it easier to retrieve original
    address
  - removed patch adding off_t off to the zero copy allocator. This is no
    longer needed with the new descriptor format.

v3:
  - added patch for mlx5 driver changes needed for unaligned chunks
  - moved offset handling to new helper function
  - changed value used for the umem chunk_mask. Now using the new
    descriptor format to save us doing the calculations in a number of
    places meaning more of the code is left unchanged while adding
    unaligned chunk support.

v4:
  - reworked the next_pg_contig field in the xdp_umem_page struct. We now
    use the low 12 bits of the addr for flags rather than adding an extra
    field in the struct.
  - modified unaligned chunks flag define
  - fixed page_start calculation in __xsk_rcv_memcpy().
  - move offset handling to the xdp_umem_get_* functions
  - modified the len field in xdp_umem_reg struct. We now use 16 bits from
    this for the flags field.
  - fixed headroom addition to handle in the mlx5e driver
  - other minor changes based on review comments

v5:
  - Added ABI versioning in the libbpf patch
  - Removed bitfields in the xdp_umem_reg struct. Adding new flags field.
  - Added accessors for getting addr and offset.
  - Added helper function for adding the offset to the addr.
  - Fixed conflicts with 'bpf-af-xdp-wakeup' which was merged recently.
  - Fixed typo in mlx driver patch.
  - Moved libbpf patch to later in the set (7/11, just before the sample
    app changes)

v6:
  - Added support for XSK frames smaller than a page in the mlx5e driver
    (Maxim Mikityanskiy <maximmi@mellanox.com>).
  - Fixed offset handling in xsk_generic_rcv.
  - Added check for base address in xskq_is_valid_addr_unaligned.

Kevin Laatz (11):
  i40e: simplify Rx buffer recycle
  ixgbe: simplify Rx buffer recycle
  xsk: add support to allow unaligned chunk placement
  i40e: modify driver for handling offsets
  ixgbe: modify driver for handling offsets
  mlx5e: modify driver for handling offsets
  libbpf: add flags to umem config
  samples/bpf: add unaligned chunks mode support to xdpsock
  samples/bpf: add buffer recycling for unaligned chunks to xdpsock
  samples/bpf: use hugepages in xdpsock app
  doc/af_xdp: include unaligned chunk case

Maxim Mikityanskiy (1):
  net/mlx5e: Allow XSK frames smaller than a page

 Documentation/networking/af_xdp.rst           | 10 +-
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 26 +++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 26 +++--
 .../ethernet/mellanox/mlx5/core/en/params.c   | 23 ++++-
 .../ethernet/mellanox/mlx5/core/en/params.h   |  2 +
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  8 +-
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.c   |  5 +-
 .../mellanox/mlx5/core/en/xsk/setup.c         | 15 ++-
 include/net/xdp_sock.h                        | 75 ++++++++++++++-
 include/uapi/linux/if_xdp.h                   |  9 ++
 net/xdp/xdp_umem.c                            | 19 +++-
 net/xdp/xsk.c                                 | 94 +++++++++++++++----
 net/xdp/xsk_diag.c                            |  2 +-
 net/xdp/xsk_queue.h                           | 70 ++++++++++++--
 samples/bpf/xdpsock_user.c                    | 61 ++++++++----
 tools/include/uapi/linux/if_xdp.h             |  9 ++
 tools/lib/bpf/Makefile                        |  5 +-
 tools/lib/bpf/libbpf.map                      |  1 +
 tools/lib/bpf/xsk.c                           | 33 ++++++-
 tools/lib/bpf/xsk.h                           | 27 ++++++
 20 files changed, 417 insertions(+), 103 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v6 01/12] i40e: simplify Rx buffer recycle
  2019-08-27  2:25           ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-27  2:25             ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-27  2:25 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

Currently, the dma, addr and handle are modified when we reuse Rx buffers
in zero-copy mode. However, this is not required as the inputs to the
function are copies, not the original values themselves. As we use the
copies within the function, we can use the original 'old_bi' values
directly without having to mask and add the headroom.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 42c90126b6c4..2d6e82f72f48 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -420,8 +420,6 @@ static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring,
 				    struct i40e_rx_buffer *old_bi)
 {
 	struct i40e_rx_buffer *new_bi = &rx_ring->rx_bi[rx_ring->next_to_alloc];
-	unsigned long mask = (unsigned long)rx_ring->xsk_umem->chunk_mask;
-	u64 hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
 	u16 nta = rx_ring->next_to_alloc;
 
 	/* update, and store next to alloc */
@@ -429,14 +427,9 @@ static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring,
 	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
 	/* transfer page from old buffer to new buffer */
-	new_bi->dma = old_bi->dma & mask;
-	new_bi->dma += hr;
-
-	new_bi->addr = (void *)((unsigned long)old_bi->addr & mask);
-	new_bi->addr += hr;
-
-	new_bi->handle = old_bi->handle & mask;
-	new_bi->handle += rx_ring->xsk_umem->headroom;
+	new_bi->dma = old_bi->dma;
+	new_bi->addr = old_bi->addr;
+	new_bi->handle = old_bi->handle;
 
 	old_bi->addr = NULL;
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v6 02/12] ixgbe: simplify Rx buffer recycle
  2019-08-27  2:25           ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-27  2:25             ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-27  2:25 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

Currently, the dma, addr and handle are modified when we reuse Rx buffers
in zero-copy mode. However, this is not required as the inputs to the
function are copies, not the original values themselves. As we use the
copies within the function, we can use the original 'obi' values
directly without having to mask and add the headroom.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index 9a28d98a1484..e1f743470ae9 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -201,8 +201,6 @@ ixgbe_rx_buffer *ixgbe_get_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 static void ixgbe_reuse_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 				     struct ixgbe_rx_buffer *obi)
 {
-	unsigned long mask = (unsigned long)rx_ring->xsk_umem->chunk_mask;
-	u64 hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
 	u16 nta = rx_ring->next_to_alloc;
 	struct ixgbe_rx_buffer *nbi;
 
@@ -212,14 +210,9 @@ static void ixgbe_reuse_rx_buffer_zc(struct ixgbe_ring *rx_ring,
 	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
 	/* transfer page from old buffer to new buffer */
-	nbi->dma = obi->dma & mask;
-	nbi->dma += hr;
-
-	nbi->addr = (void *)((unsigned long)obi->addr & mask);
-	nbi->addr += hr;
-
-	nbi->handle = obi->handle & mask;
-	nbi->handle += rx_ring->xsk_umem->headroom;
+	nbi->dma = obi->dma;
+	nbi->addr = obi->addr;
+	nbi->handle = obi->handle;
 
 	obi->addr = NULL;
 	obi->skb = NULL;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v6 03/12] xsk: add support to allow unaligned chunk placement
  2019-08-27  2:25           ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-27  2:25             ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-27  2:25 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

Currently, addresses are chunk size aligned. This means we are very
restricted in terms of where we can place a chunk within the umem. For
example, if we have a chunk size of 2k, then our chunks can only be placed
at 0, 2k, 4k, 6k, 8k and so on (i.e. every 2k starting from 0).

This patch introduces the ability to use unaligned chunks. With these
changes, we are no longer bound to having to place chunks at a 2k (or
whatever your chunk size is) interval. Since we are no longer dealing with
aligned chunks, they can now cross page boundaries. Checks for page
contiguity have been added in order to keep track of which pages are
followed by a physically contiguous page.
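
To make the new descriptor format concrete, the fragment below is an
illustration for reviewers, not part of the patch; it simply mirrors the
xsk_umem_extract_addr()/xsk_umem_extract_offset() helpers added here and
shows how a 64-bit descriptor address is split when the unaligned chunk
flag is set:

    #include <linux/types.h>

    /* bits  0-47: base address of the buffer within the umem
     * bits 48-63: offset from that base address
     * (XSK_UNALIGNED_BUF_OFFSET_SHIFT == 48 in this patch)
     */
    #define OFFSET_SHIFT 48
    #define ADDR_MASK    ((1ULL << OFFSET_SHIFT) - 1)

    static inline __u64 desc_base(__u64 addr)
    {
        return addr & ADDR_MASK;        /* as xsk_umem_extract_addr() */
    }

    static inline __u64 desc_offset(__u64 addr)
    {
        return addr >> OFFSET_SHIFT;    /* as xsk_umem_extract_offset() */
    }

    /* Example: addr = 0x1800 | (256ULL << OFFSET_SHIFT) describes a buffer
     * that starts 256 bytes into the chunk based at umem offset 0x1800.
     */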

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

---
v2:
  - Add checks for the flags coming from userspace
  - Fix how we get chunk_size in xsk_diag.c
  - Add defines for masking the new descriptor format
  - Modified the rx functions to use new descriptor format
  - Modified the tx functions to use new descriptor format

v3:
  - Add helper function to do address/offset masking/addition

v4:
  - Fixed page_start calculation in __xsk_rcv_memcpy().
  - Move offset handling to the xdp_umem_get_* functions
  - Modified the len field in xdp_umem_reg struct. We now use 16 bits from
    this for the flags field.
  - Removed next_pg_contig field from xdp_umem_page struct. Using low 12
    bits of addr to store flags instead.
  - Other minor changes based on review comments

v5:
  - Added accessors for getting addr and offset
  - Added helper function to add offset to addr
  - Fixed offset handling in xsk_rcv
  - Removed bitfields from xdp_umem_reg
  - Added struct size checking for xdp_umem_reg in xsk_setsockopt to handle
    different versions of the struct.
  - Fixed conflicts after 'bpf-af-xdp-wakeup' was merged.

v6:
  - Fixed offset handling in xsk_generic_rcv.
  - Added check for base address in xskq_is_valid_addr_unaligned().
---
 include/net/xdp_sock.h      | 75 +++++++++++++++++++++++++++--
 include/uapi/linux/if_xdp.h |  9 ++++
 net/xdp/xdp_umem.c          | 19 ++++++--
 net/xdp/xsk.c               | 94 ++++++++++++++++++++++++++++++-------
 net/xdp/xsk_diag.c          |  2 +-
 net/xdp/xsk_queue.h         | 70 +++++++++++++++++++++++----
 6 files changed, 233 insertions(+), 36 deletions(-)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index f023b9940d64..c9398ce7960f 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -16,6 +16,13 @@
 struct net_device;
 struct xsk_queue;
 
+/* Masks for xdp_umem_page flags.
+ * The low 12-bits of the addr will be 0 since this is the page address, so we
+ * can use them for flags.
+ */
+#define XSK_NEXT_PG_CONTIG_SHIFT 0
+#define XSK_NEXT_PG_CONTIG_MASK (1ULL << XSK_NEXT_PG_CONTIG_SHIFT)
+
 struct xdp_umem_page {
 	void *addr;
 	dma_addr_t dma;
@@ -27,8 +34,12 @@ struct xdp_umem_fq_reuse {
 	u64 handles[];
 };
 
-/* Flags for the umem flags field. */
-#define XDP_UMEM_USES_NEED_WAKEUP (1 << 0)
+/* Flags for the umem flags field.
+ *
+ * The NEED_WAKEUP flag is 1 due to the reuse of the flags field for public
+ * flags. See include/uapi/linux/if_xdp.h.
+ */
+#define XDP_UMEM_USES_NEED_WAKEUP (1 << 1)
 
 struct xdp_umem {
 	struct xsk_queue *fq;
@@ -124,14 +135,36 @@ void xsk_map_try_sock_delete(struct xsk_map *map, struct xdp_sock *xs,
 int xsk_map_inc(struct xsk_map *map);
 void xsk_map_put(struct xsk_map *map);
 
+static inline u64 xsk_umem_extract_addr(u64 addr)
+{
+	return addr & XSK_UNALIGNED_BUF_ADDR_MASK;
+}
+
+static inline u64 xsk_umem_extract_offset(u64 addr)
+{
+	return addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+}
+
+static inline u64 xsk_umem_add_offset_to_addr(u64 addr)
+{
+	return xsk_umem_extract_addr(addr) + xsk_umem_extract_offset(addr);
+}
+
 static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr)
 {
-	return umem->pages[addr >> PAGE_SHIFT].addr + (addr & (PAGE_SIZE - 1));
+	unsigned long page_addr;
+
+	addr = xsk_umem_add_offset_to_addr(addr);
+	page_addr = (unsigned long)umem->pages[addr >> PAGE_SHIFT].addr;
+
+	return (char *)(page_addr & PAGE_MASK) + (addr & ~PAGE_MASK);
 }
 
 static inline dma_addr_t xdp_umem_get_dma(struct xdp_umem *umem, u64 addr)
 {
-	return umem->pages[addr >> PAGE_SHIFT].dma + (addr & (PAGE_SIZE - 1));
+	addr = xsk_umem_add_offset_to_addr(addr);
+
+	return umem->pages[addr >> PAGE_SHIFT].dma + (addr & ~PAGE_MASK);
 }
 
 /* Reuse-queue aware version of FILL queue helpers */
@@ -172,6 +205,19 @@ static inline void xsk_umem_fq_reuse(struct xdp_umem *umem, u64 addr)
 
 	rq->handles[rq->length++] = addr;
 }
+
+/* Handle the offset appropriately depending on aligned or unaligned mode.
+ * For unaligned mode, we store the offset in the upper 16-bits of the address.
+ * For aligned mode, we simply add the offset to the address.
+ */
+static inline u64 xsk_umem_adjust_offset(struct xdp_umem *umem, u64 address,
+					 u64 offset)
+{
+	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG)
+		return address + (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
+	else
+		return address + offset;
+}
 #else
 static inline int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
@@ -241,6 +287,21 @@ static inline struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
 	return NULL;
 }
 
+static inline u64 xsk_umem_extract_addr(u64 addr)
+{
+	return 0;
+}
+
+static inline u64 xsk_umem_extract_offset(u64 addr)
+{
+	return 0;
+}
+
+static inline u64 xsk_umem_add_offset_to_addr(u64 addr)
+{
+	return 0;
+}
+
 static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr)
 {
 	return NULL;
@@ -290,6 +351,12 @@ static inline bool xsk_umem_uses_need_wakeup(struct xdp_umem *umem)
 	return false;
 }
 
+static inline u64 xsk_umem_adjust_offset(struct xdp_umem *umem, u64 handle,
+					 u64 offset)
+{
+	return 0;
+}
+
 #endif /* CONFIG_XDP_SOCKETS */
 
 #endif /* _LINUX_XDP_SOCK_H */
diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index 62b80d57b72a..be328c59389d 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -26,6 +26,9 @@
  */
 #define XDP_USE_NEED_WAKEUP (1 << 3)
 
+/* Flags for xsk_umem_config flags */
+#define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0)
+
 struct sockaddr_xdp {
 	__u16 sxdp_family;
 	__u16 sxdp_flags;
@@ -66,6 +69,7 @@ struct xdp_umem_reg {
 	__u64 len; /* Length of packet data area */
 	__u32 chunk_size;
 	__u32 headroom;
+	__u32 flags;
 };
 
 struct xdp_statistics {
@@ -87,6 +91,11 @@ struct xdp_options {
 #define XDP_UMEM_PGOFF_FILL_RING	0x100000000ULL
 #define XDP_UMEM_PGOFF_COMPLETION_RING	0x180000000ULL
 
+/* Masks for unaligned chunks mode */
+#define XSK_UNALIGNED_BUF_OFFSET_SHIFT 48
+#define XSK_UNALIGNED_BUF_ADDR_MASK \
+	((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1)
+
 /* Rx/Tx descriptor */
 struct xdp_desc {
 	__u64 addr;
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 2d65779282a1..e997b263a0dd 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -340,6 +340,7 @@ static int xdp_umem_account_pages(struct xdp_umem *umem)
 
 static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 {
+	bool unaligned_chunks = mr->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG;
 	u32 chunk_size = mr->chunk_size, headroom = mr->headroom;
 	unsigned int chunks, chunks_per_page;
 	u64 addr = mr->addr, size = mr->len;
@@ -355,7 +356,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 		return -EINVAL;
 	}
 
-	if (!is_power_of_2(chunk_size))
+	if (mr->flags & ~(XDP_UMEM_UNALIGNED_CHUNK_FLAG |
+			XDP_UMEM_USES_NEED_WAKEUP))
+		return -EINVAL;
+
+	if (!unaligned_chunks && !is_power_of_2(chunk_size))
 		return -EINVAL;
 
 	if (!PAGE_ALIGNED(addr)) {
@@ -372,9 +377,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 	if (chunks == 0)
 		return -EINVAL;
 
-	chunks_per_page = PAGE_SIZE / chunk_size;
-	if (chunks < chunks_per_page || chunks % chunks_per_page)
-		return -EINVAL;
+	if (!unaligned_chunks) {
+		chunks_per_page = PAGE_SIZE / chunk_size;
+		if (chunks < chunks_per_page || chunks % chunks_per_page)
+			return -EINVAL;
+	}
 
 	headroom = ALIGN(headroom, 64);
 
@@ -383,13 +390,15 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 		return -EINVAL;
 
 	umem->address = (unsigned long)addr;
-	umem->chunk_mask = ~((u64)chunk_size - 1);
+	umem->chunk_mask = unaligned_chunks ? XSK_UNALIGNED_BUF_ADDR_MASK
+					    : ~((u64)chunk_size - 1);
 	umem->size = size;
 	umem->headroom = headroom;
 	umem->chunk_size_nohr = chunk_size - headroom;
 	umem->npgs = size / PAGE_SIZE;
 	umem->pgs = NULL;
 	umem->user = NULL;
+	umem->flags = mr->flags;
 	INIT_LIST_HEAD(&umem->xsk_list);
 	spin_lock_init(&umem->xsk_list_lock);
 
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index ee4428a892fa..187fd157fcff 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -45,7 +45,7 @@ EXPORT_SYMBOL(xsk_umem_has_addrs);
 
 u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
 {
-	return xskq_peek_addr(umem->fq, addr);
+	return xskq_peek_addr(umem->fq, addr, umem);
 }
 EXPORT_SYMBOL(xsk_umem_peek_addr);
 
@@ -115,21 +115,43 @@ bool xsk_umem_uses_need_wakeup(struct xdp_umem *umem)
 }
 EXPORT_SYMBOL(xsk_umem_uses_need_wakeup);
 
+/* If a buffer crosses a page boundary, we need to do 2 memcpy's, one for
+ * each page. This is only required in copy mode.
+ */
+static void __xsk_rcv_memcpy(struct xdp_umem *umem, u64 addr, void *from_buf,
+			     u32 len, u32 metalen)
+{
+	void *to_buf = xdp_umem_get_data(umem, addr);
+
+	addr = xsk_umem_add_offset_to_addr(addr);
+	if (xskq_crosses_non_contig_pg(umem, addr, len + metalen)) {
+		void *next_pg_addr = umem->pages[(addr >> PAGE_SHIFT) + 1].addr;
+		u64 page_start = addr & ~(PAGE_SIZE - 1);
+		u64 first_len = PAGE_SIZE - (addr - page_start);
+
+		memcpy(to_buf, from_buf, first_len + metalen);
+		memcpy(next_pg_addr, from_buf + first_len, len - first_len);
+
+		return;
+	}
+
+	memcpy(to_buf, from_buf, len + metalen);
+}
+
 static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 {
-	void *to_buf, *from_buf;
+	u64 offset = xs->umem->headroom;
+	u64 addr, memcpy_addr;
+	void *from_buf;
 	u32 metalen;
-	u64 addr;
 	int err;
 
-	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
+	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
 	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
 		xs->rx_dropped++;
 		return -ENOSPC;
 	}
 
-	addr += xs->umem->headroom;
-
 	if (unlikely(xdp_data_meta_unsupported(xdp))) {
 		from_buf = xdp->data;
 		metalen = 0;
@@ -138,9 +160,11 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 		metalen = xdp->data - xdp->data_meta;
 	}
 
-	to_buf = xdp_umem_get_data(xs->umem, addr);
-	memcpy(to_buf, from_buf, len + metalen);
-	addr += metalen;
+	memcpy_addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
+	__xsk_rcv_memcpy(xs->umem, memcpy_addr, from_buf, len, metalen);
+
+	offset += metalen;
+	addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
 	err = xskq_produce_batch_desc(xs->rx, addr, len);
 	if (!err) {
 		xskq_discard_addr(xs->umem->fq);
@@ -185,6 +209,7 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
 	u32 metalen = xdp->data - xdp->data_meta;
 	u32 len = xdp->data_end - xdp->data;
+	u64 offset = xs->umem->headroom;
 	void *buffer;
 	u64 addr;
 	int err;
@@ -196,17 +221,17 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 		goto out_unlock;
 	}
 
-	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
+	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
 	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
 		err = -ENOSPC;
 		goto out_drop;
 	}
 
-	addr += xs->umem->headroom;
-
+	addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
 	buffer = xdp_umem_get_data(xs->umem, addr);
 	memcpy(buffer, xdp->data_meta, len + metalen);
-	addr += metalen;
+
+	addr = xsk_umem_adjust_offset(xs->umem, addr, metalen);
 	err = xskq_produce_batch_desc(xs->rx, addr, len);
 	if (err)
 		goto out_drop;
@@ -250,7 +275,7 @@ bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc)
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(xs, &umem->xsk_list, list) {
-		if (!xskq_peek_desc(xs->tx, desc))
+		if (!xskq_peek_desc(xs->tx, desc, umem))
 			continue;
 
 		if (xskq_produce_addr_lazy(umem->cq, desc->addr))
@@ -304,7 +329,7 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
 	if (xs->queue_id >= xs->dev->real_num_tx_queues)
 		goto out;
 
-	while (xskq_peek_desc(xs->tx, &desc)) {
+	while (xskq_peek_desc(xs->tx, &desc, xs->umem)) {
 		char *buffer;
 		u64 addr;
 		u32 len;
@@ -333,7 +358,7 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
 		skb->dev = xs->dev;
 		skb->priority = sk->sk_priority;
 		skb->mark = sk->sk_mark;
-		skb_shinfo(skb)->destructor_arg = (void *)(long)addr;
+		skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
 		skb->destructor = xsk_destruct_skb;
 
 		err = dev_direct_xmit(skb, xs->queue_id);
@@ -526,6 +551,24 @@ static struct socket *xsk_lookup_xsk_from_fd(int fd)
 	return sock;
 }
 
+/* Check if umem pages are contiguous.
+ * If zero-copy mode, use the DMA address to do the page contiguity check
+ * For all other modes we use addr (kernel virtual address)
+ * Store the result in the low bits of addr.
+ */
+static void xsk_check_page_contiguity(struct xdp_umem *umem, u32 flags)
+{
+	struct xdp_umem_page *pgs = umem->pages;
+	int i, is_contig;
+
+	for (i = 0; i < umem->npgs - 1; i++) {
+		is_contig = (flags & XDP_ZEROCOPY) ?
+			(pgs[i].dma + PAGE_SIZE == pgs[i + 1].dma) :
+			(pgs[i].addr + PAGE_SIZE == pgs[i + 1].addr);
+		pgs[i].addr += is_contig << XSK_NEXT_PG_CONTIG_SHIFT;
+	}
+}
+
 static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 {
 	struct sockaddr_xdp *sxdp = (struct sockaddr_xdp *)addr;
@@ -616,6 +659,8 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 		err = xdp_umem_assign_dev(xs->umem, dev, qid, flags);
 		if (err)
 			goto out_unlock;
+
+		xsk_check_page_contiguity(xs->umem, flags);
 	}
 
 	xs->dev = dev;
@@ -636,6 +681,13 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 	return err;
 }
 
+struct xdp_umem_reg_v1 {
+	__u64 addr; /* Start of packet data area */
+	__u64 len; /* Length of packet data area */
+	__u32 chunk_size;
+	__u32 headroom;
+};
+
 static int xsk_setsockopt(struct socket *sock, int level, int optname,
 			  char __user *optval, unsigned int optlen)
 {
@@ -673,10 +725,16 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname,
 	}
 	case XDP_UMEM_REG:
 	{
-		struct xdp_umem_reg mr;
+		size_t mr_size = sizeof(struct xdp_umem_reg);
+		struct xdp_umem_reg mr = {};
 		struct xdp_umem *umem;
 
-		if (copy_from_user(&mr, optval, sizeof(mr)))
+		if (optlen < sizeof(struct xdp_umem_reg_v1))
+			return -EINVAL;
+		else if (optlen < sizeof(mr))
+			mr_size = sizeof(struct xdp_umem_reg_v1);
+
+		if (copy_from_user(&mr, optval, mr_size))
 			return -EFAULT;
 
 		mutex_lock(&xs->mutex);
diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c
index d5e06c8e0cbf..9986a759fe06 100644
--- a/net/xdp/xsk_diag.c
+++ b/net/xdp/xsk_diag.c
@@ -56,7 +56,7 @@ static int xsk_diag_put_umem(const struct xdp_sock *xs, struct sk_buff *nlskb)
 	du.id = umem->id;
 	du.size = umem->size;
 	du.num_pages = umem->npgs;
-	du.chunk_size = (__u32)(~umem->chunk_mask + 1);
+	du.chunk_size = umem->chunk_size_nohr + umem->headroom;
 	du.headroom = umem->headroom;
 	du.ifindex = umem->dev ? umem->dev->ifindex : 0;
 	du.queue_id = umem->queue_id;
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index dd9e985c2461..eddae4688862 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -134,6 +134,17 @@ static inline bool xskq_has_addrs(struct xsk_queue *q, u32 cnt)
 
 /* UMEM queue */
 
+static inline bool xskq_crosses_non_contig_pg(struct xdp_umem *umem, u64 addr,
+					      u64 length)
+{
+	bool cross_pg = (addr & (PAGE_SIZE - 1)) + length > PAGE_SIZE;
+	bool next_pg_contig =
+		(unsigned long)umem->pages[(addr >> PAGE_SHIFT)].addr &
+			XSK_NEXT_PG_CONTIG_MASK;
+
+	return cross_pg && !next_pg_contig;
+}
+
 static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
 {
 	if (addr >= q->size) {
@@ -144,23 +155,51 @@ static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
 	return true;
 }
 
-static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr)
+static inline bool xskq_is_valid_addr_unaligned(struct xsk_queue *q, u64 addr,
+						u64 length,
+						struct xdp_umem *umem)
+{
+	u64 base_addr = xsk_umem_extract_addr(addr);
+
+	addr = xsk_umem_add_offset_to_addr(addr);
+	if (base_addr >= q->size || addr >= q->size ||
+	    xskq_crosses_non_contig_pg(umem, addr, length)) {
+		q->invalid_descs++;
+		return false;
+	}
+
+	return true;
+}
+
+static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr,
+				      struct xdp_umem *umem)
 {
 	while (q->cons_tail != q->cons_head) {
 		struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
 		unsigned int idx = q->cons_tail & q->ring_mask;
 
 		*addr = READ_ONCE(ring->desc[idx]) & q->chunk_mask;
+
+		if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG) {
+			if (xskq_is_valid_addr_unaligned(q, *addr,
+							 umem->chunk_size_nohr,
+							 umem))
+				return addr;
+			goto out;
+		}
+
 		if (xskq_is_valid_addr(q, *addr))
 			return addr;
 
+out:
 		q->cons_tail++;
 	}
 
 	return NULL;
 }
 
-static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
+static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr,
+				  struct xdp_umem *umem)
 {
 	if (q->cons_tail == q->cons_head) {
 		smp_mb(); /* D, matches A */
@@ -171,7 +210,7 @@ static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
 		smp_rmb();
 	}
 
-	return xskq_validate_addr(q, addr);
+	return xskq_validate_addr(q, addr, umem);
 }
 
 static inline void xskq_discard_addr(struct xsk_queue *q)
@@ -230,8 +269,21 @@ static inline int xskq_reserve_addr(struct xsk_queue *q)
 
 /* Rx/Tx queue */
 
-static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
+static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d,
+				      struct xdp_umem *umem)
 {
+	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG) {
+		if (!xskq_is_valid_addr_unaligned(q, d->addr, d->len, umem))
+			return false;
+
+		if (d->len > umem->chunk_size_nohr || d->options) {
+			q->invalid_descs++;
+			return false;
+		}
+
+		return true;
+	}
+
 	if (!xskq_is_valid_addr(q, d->addr))
 		return false;
 
@@ -245,14 +297,15 @@ static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
 }
 
 static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
-						  struct xdp_desc *desc)
+						  struct xdp_desc *desc,
+						  struct xdp_umem *umem)
 {
 	while (q->cons_tail != q->cons_head) {
 		struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
 		unsigned int idx = q->cons_tail & q->ring_mask;
 
 		*desc = READ_ONCE(ring->desc[idx]);
-		if (xskq_is_valid_desc(q, desc))
+		if (xskq_is_valid_desc(q, desc, umem))
 			return desc;
 
 		q->cons_tail++;
@@ -262,7 +315,8 @@ static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
 }
 
 static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
-					      struct xdp_desc *desc)
+					      struct xdp_desc *desc,
+					      struct xdp_umem *umem)
 {
 	if (q->cons_tail == q->cons_head) {
 		smp_mb(); /* D, matches A */
@@ -273,7 +327,7 @@ static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
 		smp_rmb(); /* C, matches B */
 	}
 
-	return xskq_validate_desc(q, desc);
+	return xskq_validate_desc(q, desc, umem);
 }
 
 static inline void xskq_discard_desc(struct xsk_queue *q)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v6 04/12] i40e: modify driver for handling offsets
  2019-08-27  2:25           ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-27  2:25             ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-27  2:25 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

With the addition of the unaligned chunks option, we need to make sure we
handle the offsets accordingly based on the mode we are currently running
in. This patch modifies the driver to appropriately mask the address for
each case.
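
For reviewers, the behavioural difference can be sketched as follows. This
is only an illustration of what the xsk_umem_adjust_offset() helper from
patch 3 does with the handle, not code from this patch, and it assumes the
kernel-internal types and defines introduced there:

    /* aligned mode:   the offset is simply added to the chunk-aligned handle
     * unaligned mode: the base stays in the low 48 bits and the offset is
     *                 carried in the upper 16 bits of the handle
     */
    static u64 adjust_offset_sketch(struct xdp_umem *umem, u64 handle, u64 offset)
    {
        if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG)
            return handle + (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);

        return handle + offset;
    }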

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>

---
v3:
  - Use new helper function for handling the offset
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 2d6e82f72f48..eaca6162a6e6 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -190,7 +190,9 @@ int i40e_xsk_umem_setup(struct i40e_vsi *vsi, struct xdp_umem *umem,
  **/
 static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 {
+	struct xdp_umem *umem = rx_ring->xsk_umem;
 	int err, result = I40E_XDP_PASS;
+	u64 offset = umem->headroom;
 	struct i40e_ring *xdp_ring;
 	struct bpf_prog *xdp_prog;
 	u32 act;
@@ -201,7 +203,10 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 	 */
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
 	act = bpf_prog_run_xdp(xdp_prog, xdp);
-	xdp->handle += xdp->data - xdp->data_hard_start;
+	offset += xdp->data - xdp->data_hard_start;
+
+	xdp->handle = xsk_umem_adjust_offset(umem, xdp->handle, offset);
+
 	switch (act) {
 	case XDP_PASS:
 		break;
@@ -262,7 +267,7 @@ static bool i40e_alloc_buffer_zc(struct i40e_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr(umem);
 	return true;
@@ -299,7 +304,7 @@ static bool i40e_alloc_buffer_slow_zc(struct i40e_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr_rq(umem);
 	return true;
@@ -464,7 +469,7 @@ void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
 	bi->addr = xdp_umem_get_data(rx_ring->xsk_umem, handle);
 	bi->addr += hr;
 
-	bi->handle = (u64)handle + rx_ring->xsk_umem->headroom;
+	bi->handle = (u64)handle;
 }
 
 /**
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v6 05/12] ixgbe: modify driver for handling offsets
  2019-08-27  2:25           ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-27  2:25             ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-27  2:25 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

With the addition of the unaligned chunks option, we need to make sure we
handle the offsets accordingly based on the mode we are currently running
in. This patch modifies the driver to appropriately mask the address for
each case.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>

---
v3:
  - Use new helper function to handle offset
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index e1f743470ae9..17061c799f72 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -143,7 +143,9 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
 			    struct ixgbe_ring *rx_ring,
 			    struct xdp_buff *xdp)
 {
+	struct xdp_umem *umem = rx_ring->xsk_umem;
 	int err, result = IXGBE_XDP_PASS;
+	u64 offset = umem->headroom;
 	struct bpf_prog *xdp_prog;
 	struct xdp_frame *xdpf;
 	u32 act;
@@ -151,7 +153,10 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
 	rcu_read_lock();
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
 	act = bpf_prog_run_xdp(xdp_prog, xdp);
-	xdp->handle += xdp->data - xdp->data_hard_start;
+	offset += xdp->data - xdp->data_hard_start;
+
+	xdp->handle = xsk_umem_adjust_offset(umem, xdp->handle, offset);
+
 	switch (act) {
 	case XDP_PASS:
 		break;
@@ -243,7 +248,7 @@ void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
 	bi->addr = xdp_umem_get_data(rx_ring->xsk_umem, handle);
 	bi->addr += hr;
 
-	bi->handle = (u64)handle + rx_ring->xsk_umem->headroom;
+	bi->handle = (u64)handle;
 }
 
 static bool ixgbe_alloc_buffer_zc(struct ixgbe_ring *rx_ring,
@@ -269,7 +274,7 @@ static bool ixgbe_alloc_buffer_zc(struct ixgbe_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr(umem);
 	return true;
@@ -296,7 +301,7 @@ static bool ixgbe_alloc_buffer_slow_zc(struct ixgbe_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr_rq(umem);
 	return true;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v6 05/12] ixgbe: modify driver for handling offsets
@ 2019-08-27  2:25             ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-27  2:25 UTC (permalink / raw)
  To: intel-wired-lan

With the addition of the unaligned chunks option, we need to make sure we
handle the offsets accordingly based on the mode we are currently running
in. This patch modifies the driver to appropriately mask the address for
each case.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>

---
v3:
  - Use new helper function to handle offset
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index e1f743470ae9..17061c799f72 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -143,7 +143,9 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
 			    struct ixgbe_ring *rx_ring,
 			    struct xdp_buff *xdp)
 {
+	struct xdp_umem *umem = rx_ring->xsk_umem;
 	int err, result = IXGBE_XDP_PASS;
+	u64 offset = umem->headroom;
 	struct bpf_prog *xdp_prog;
 	struct xdp_frame *xdpf;
 	u32 act;
@@ -151,7 +153,10 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
 	rcu_read_lock();
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
 	act = bpf_prog_run_xdp(xdp_prog, xdp);
-	xdp->handle += xdp->data - xdp->data_hard_start;
+	offset += xdp->data - xdp->data_hard_start;
+
+	xdp->handle = xsk_umem_adjust_offset(umem, xdp->handle, offset);
+
 	switch (act) {
 	case XDP_PASS:
 		break;
@@ -243,7 +248,7 @@ void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
 	bi->addr = xdp_umem_get_data(rx_ring->xsk_umem, handle);
 	bi->addr += hr;
 
-	bi->handle = (u64)handle + rx_ring->xsk_umem->headroom;
+	bi->handle = (u64)handle;
 }
 
 static bool ixgbe_alloc_buffer_zc(struct ixgbe_ring *rx_ring,
@@ -269,7 +274,7 @@ static bool ixgbe_alloc_buffer_zc(struct ixgbe_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr(umem);
 	return true;
@@ -296,7 +301,7 @@ static bool ixgbe_alloc_buffer_slow_zc(struct ixgbe_ring *rx_ring,
 	bi->addr = xdp_umem_get_data(umem, handle);
 	bi->addr += hr;
 
-	bi->handle = handle + umem->headroom;
+	bi->handle = handle;
 
 	xsk_umem_discard_addr_rq(umem);
 	return true;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v6 06/12] mlx5e: modify driver for handling offsets
  2019-08-27  2:25           ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-27  2:25             ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-27  2:25 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

With the addition of the unaligned chunks option, we need to make sure we
handle the offsets accordingly based on the mode we are currently running
in. This patch modifies the driver to appropriately mask the address for
each case.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>

---
v3:
  - Use new helper function to handle offset

v4:
  - fixed headroom addition to handle. Using xsk_umem_adjust_headroom()
    now.

v5:
  - Fixed typo: handle_offset -> adjust_offset
---
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c    | 8 ++++++--
 drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c | 3 ++-
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index 1ed5c33e022f..f049e0ac308a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -122,6 +122,7 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
 		      void *va, u16 *rx_headroom, u32 *len, bool xsk)
 {
 	struct bpf_prog *prog = READ_ONCE(rq->xdp_prog);
+	struct xdp_umem *umem = rq->umem;
 	struct xdp_buff xdp;
 	u32 act;
 	int err;
@@ -138,8 +139,11 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
 	xdp.rxq = &rq->xdp_rxq;
 
 	act = bpf_prog_run_xdp(prog, &xdp);
-	if (xsk)
-		xdp.handle += xdp.data - xdp.data_hard_start;
+	if (xsk) {
+		u64 off = xdp.data - xdp.data_hard_start;
+
+		xdp.handle = xsk_umem_adjust_offset(umem, xdp.handle, off);
+	}
 	switch (act) {
 	case XDP_PASS:
 		*rx_headroom = xdp.data - xdp.data_hard_start;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
index 6a55573ec8f2..7c49a66d28c9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
@@ -24,7 +24,8 @@ int mlx5e_xsk_page_alloc_umem(struct mlx5e_rq *rq,
 	if (!xsk_umem_peek_addr_rq(umem, &handle))
 		return -ENOMEM;
 
-	dma_info->xsk.handle = handle + rq->buff.umem_headroom;
+	dma_info->xsk.handle = xsk_umem_adjust_offset(umem, handle,
+						      rq->buff.umem_headroom);
 	dma_info->xsk.data = xdp_umem_get_data(umem, dma_info->xsk.handle);
 
 	/* No need to add headroom to the DMA address. In striding RQ case, we
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v6 06/12] mlx5e: modify driver for handling offsets
@ 2019-08-27  2:25             ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-27  2:25 UTC (permalink / raw)
  To: intel-wired-lan

With the addition of the unaligned chunks option, we need to make sure we
handle the offsets accordingly based on the mode we are currently running
in. This patch modifies the driver to appropriately mask the address for
each case.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>

---
v3:
  - Use new helper function to handle offset

v4:
  - fixed headroom addition to handle. Using xsk_umem_adjust_headroom()
    now.

v5:
  - Fixed typo: handle_offset -> adjust_offset
---
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c    | 8 ++++++--
 drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c | 3 ++-
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index 1ed5c33e022f..f049e0ac308a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -122,6 +122,7 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
 		      void *va, u16 *rx_headroom, u32 *len, bool xsk)
 {
 	struct bpf_prog *prog = READ_ONCE(rq->xdp_prog);
+	struct xdp_umem *umem = rq->umem;
 	struct xdp_buff xdp;
 	u32 act;
 	int err;
@@ -138,8 +139,11 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
 	xdp.rxq = &rq->xdp_rxq;
 
 	act = bpf_prog_run_xdp(prog, &xdp);
-	if (xsk)
-		xdp.handle += xdp.data - xdp.data_hard_start;
+	if (xsk) {
+		u64 off = xdp.data - xdp.data_hard_start;
+
+		xdp.handle = xsk_umem_adjust_offset(umem, xdp.handle, off);
+	}
 	switch (act) {
 	case XDP_PASS:
 		*rx_headroom = xdp.data - xdp.data_hard_start;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
index 6a55573ec8f2..7c49a66d28c9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
@@ -24,7 +24,8 @@ int mlx5e_xsk_page_alloc_umem(struct mlx5e_rq *rq,
 	if (!xsk_umem_peek_addr_rq(umem, &handle))
 		return -ENOMEM;
 
-	dma_info->xsk.handle = handle + rq->buff.umem_headroom;
+	dma_info->xsk.handle = xsk_umem_adjust_offset(umem, handle,
+						      rq->buff.umem_headroom);
 	dma_info->xsk.data = xdp_umem_get_data(umem, dma_info->xsk.handle);
 
 	/* No need to add headroom to the DMA address. In striding RQ case, we
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v6 07/12] net/mlx5e: Allow XSK frames smaller than a page
  2019-08-27  2:25           ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-27  2:25             ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-27  2:25 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan

From: Maxim Mikityanskiy <maximmi@mellanox.com>

Relax the requirements on the XSK frame size to allow it to be smaller
than a page and not necessarily a power of two. The current
implementation can work in this mode, both with Striding RQ and
without it.

The code that checks `mtu + headroom <= XSK frame size` is modified
accordingly. Any frame size between 2048 and PAGE_SIZE is accepted.
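
In other words, a chunk size now passes validation when the following
holds (a sketch that just restates the checks in the setup.c hunk
below; min_frag_sz stands for mlx5e_rx_get_min_frag_sz(), i.e.
headroom + HW MTU):

  	bool ok = xsk->chunk_size >= MLX5E_MIN_XSK_CHUNK_SIZE &&	/* 2048 */
  		  xsk->chunk_size <= PAGE_SIZE &&
  		  min_frag_sz <= xsk->chunk_size;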

Functions that worked with pages only now work with XSK frames, even if
their size is different from PAGE_SIZE.

With XSK queues, regardless of the frame size, Striding RQ uses the
stride size of PAGE_SIZE, and UMR MTTs are posted using starting
addresses of frames, but PAGE_SIZE as page size. MTU guarantees that no
packet data will overlap with other frames. UMR MTT size is made equal
to the stride size of the RQ, because UMEM frames may come in random
order, and we need to handle them one by one. PAGE_SIZE is just a power
of two that is bigger than any allowed XSK frame size, and also it
doesn't require making additional changes to the code.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../ethernet/mellanox/mlx5/core/en/params.c   | 23 +++++++++++++++----
 .../ethernet/mellanox/mlx5/core/en/params.h   |  2 ++
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.c   |  2 +-
 .../mellanox/mlx5/core/en/xsk/setup.c         | 15 ++++++++----
 4 files changed, 32 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
index 79301d116667..eb2e1f2138e4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
@@ -25,18 +25,33 @@ u16 mlx5e_get_linear_rq_headroom(struct mlx5e_params *params,
 	return headroom;
 }
 
-u32 mlx5e_rx_get_linear_frag_sz(struct mlx5e_params *params,
-				struct mlx5e_xsk_param *xsk)
+u32 mlx5e_rx_get_min_frag_sz(struct mlx5e_params *params,
+			     struct mlx5e_xsk_param *xsk)
 {
 	u32 hw_mtu = MLX5E_SW2HW_MTU(params, params->sw_mtu);
 	u16 linear_rq_headroom = mlx5e_get_linear_rq_headroom(params, xsk);
-	u32 frag_sz = linear_rq_headroom + hw_mtu;
+
+	return linear_rq_headroom + hw_mtu;
+}
+
+u32 mlx5e_rx_get_linear_frag_sz(struct mlx5e_params *params,
+				struct mlx5e_xsk_param *xsk)
+{
+	u32 frag_sz = mlx5e_rx_get_min_frag_sz(params, xsk);
 
 	/* AF_XDP doesn't build SKBs in place. */
 	if (!xsk)
 		frag_sz = MLX5_SKB_FRAG_SZ(frag_sz);
 
-	/* XDP in mlx5e doesn't support multiple packets per page. */
+	/* XDP in mlx5e doesn't support multiple packets per page. AF_XDP is a
+	 * special case. It can run with frames smaller than a page, as it
+	 * doesn't allocate pages dynamically. However, here we pretend that
+	 * fragments are page-sized: it allows us to treat XSK frames like pages
+	 * by redirecting alloc and free operations to XSK rings and by using
+	 * the fact there are no multiple packets per "page" (which is a frame).
+	 * The latter is important, because frames may come in a random order,
+	 * and we will have trouble assembling a real page of multiple frames.
+	 */
 	if (mlx5e_rx_is_xdp(params, xsk))
 		frag_sz = max_t(u32, frag_sz, PAGE_SIZE);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.h b/drivers/net/ethernet/mellanox/mlx5/core/en/params.h
index 3a615d663d84..989d8f429438 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.h
@@ -76,6 +76,8 @@ static inline bool mlx5e_qid_validate(const struct mlx5e_profile *profile,
 
 u16 mlx5e_get_linear_rq_headroom(struct mlx5e_params *params,
 				 struct mlx5e_xsk_param *xsk);
+u32 mlx5e_rx_get_min_frag_sz(struct mlx5e_params *params,
+			     struct mlx5e_xsk_param *xsk);
 u32 mlx5e_rx_get_linear_frag_sz(struct mlx5e_params *params,
 				struct mlx5e_xsk_param *xsk);
 u8 mlx5e_mpwqe_log_pkts_per_wqe(struct mlx5e_params *params,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
index 7c49a66d28c9..475b6bd5d29b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
@@ -105,7 +105,7 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
 
 	/* head_offset is not used in this function, because di->xsk.data and
 	 * di->addr point directly to the necessary place. Furthermore, in the
-	 * current implementation, one page = one packet = one frame, so
+	 * current implementation, UMR pages are mapped to XSK frames, so
 	 * head_offset should always be 0.
 	 */
 	WARN_ON_ONCE(head_offset);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
index f701e4f3c076..d549f770cb4f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
@@ -4,18 +4,23 @@
 #include "setup.h"
 #include "en/params.h"
 
+/* It matches XDP_UMEM_MIN_CHUNK_SIZE, but as this constant is private and may
+ * change unexpectedly, and mlx5e has a minimum valid stride size for striding
+ * RQ, keep this check in the driver.
+ */
+#define MLX5E_MIN_XSK_CHUNK_SIZE 2048
+
 bool mlx5e_validate_xsk_param(struct mlx5e_params *params,
 			      struct mlx5e_xsk_param *xsk,
 			      struct mlx5_core_dev *mdev)
 {
-	/* AF_XDP doesn't support frames larger than PAGE_SIZE, and the current
-	 * mlx5e XDP implementation doesn't support multiple packets per page.
-	 */
-	if (xsk->chunk_size != PAGE_SIZE)
+	/* AF_XDP doesn't support frames larger than PAGE_SIZE. */
+	if (xsk->chunk_size > PAGE_SIZE ||
+			xsk->chunk_size < MLX5E_MIN_XSK_CHUNK_SIZE)
 		return false;
 
 	/* Current MTU and XSK headroom don't allow packets to fit the frames. */
-	if (mlx5e_rx_get_linear_frag_sz(params, xsk) > xsk->chunk_size)
+	if (mlx5e_rx_get_min_frag_sz(params, xsk) > xsk->chunk_size)
 		return false;
 
 	/* frag_sz is different for regular and XSK RQs, so ensure that linear
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v6 07/12] net/mlx5e: Allow XSK frames smaller than a page
@ 2019-08-27  2:25             ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-27  2:25 UTC (permalink / raw)
  To: intel-wired-lan

From: Maxim Mikityanskiy <maximmi@mellanox.com>

Relax the requirements on the XSK frame size to allow it to be smaller
than a page and not necessarily a power of two. The current
implementation can work in this mode, both with Striding RQ and
without it.

The code that checks `mtu + headroom <= XSK frame size` is modified
accordingly. Any frame size between 2048 and PAGE_SIZE is accepted.

Functions that worked with pages only now work with XSK frames, even if
their size is different from PAGE_SIZE.

With XSK queues, regardless of the frame size, Striding RQ uses the
stride size of PAGE_SIZE, and UMR MTTs are posted using starting
addresses of frames, but PAGE_SIZE as page size. MTU guarantees that no
packet data will overlap with other frames. UMR MTT size is made equal
to the stride size of the RQ, because UMEM frames may come in random
order, and we need to handle them one by one. PAGE_SIZE is just a power
of two that is bigger than any allowed XSK frame size, and also it
doesn't require making additional changes to the code.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../ethernet/mellanox/mlx5/core/en/params.c   | 23 +++++++++++++++----
 .../ethernet/mellanox/mlx5/core/en/params.h   |  2 ++
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.c   |  2 +-
 .../mellanox/mlx5/core/en/xsk/setup.c         | 15 ++++++++----
 4 files changed, 32 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
index 79301d116667..eb2e1f2138e4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
@@ -25,18 +25,33 @@ u16 mlx5e_get_linear_rq_headroom(struct mlx5e_params *params,
 	return headroom;
 }
 
-u32 mlx5e_rx_get_linear_frag_sz(struct mlx5e_params *params,
-				struct mlx5e_xsk_param *xsk)
+u32 mlx5e_rx_get_min_frag_sz(struct mlx5e_params *params,
+			     struct mlx5e_xsk_param *xsk)
 {
 	u32 hw_mtu = MLX5E_SW2HW_MTU(params, params->sw_mtu);
 	u16 linear_rq_headroom = mlx5e_get_linear_rq_headroom(params, xsk);
-	u32 frag_sz = linear_rq_headroom + hw_mtu;
+
+	return linear_rq_headroom + hw_mtu;
+}
+
+u32 mlx5e_rx_get_linear_frag_sz(struct mlx5e_params *params,
+				struct mlx5e_xsk_param *xsk)
+{
+	u32 frag_sz = mlx5e_rx_get_min_frag_sz(params, xsk);
 
 	/* AF_XDP doesn't build SKBs in place. */
 	if (!xsk)
 		frag_sz = MLX5_SKB_FRAG_SZ(frag_sz);
 
-	/* XDP in mlx5e doesn't support multiple packets per page. */
+	/* XDP in mlx5e doesn't support multiple packets per page. AF_XDP is a
+	 * special case. It can run with frames smaller than a page, as it
+	 * doesn't allocate pages dynamically. However, here we pretend that
+	 * fragments are page-sized: it allows us to treat XSK frames like pages
+	 * by redirecting alloc and free operations to XSK rings and by using
+	 * the fact there are no multiple packets per "page" (which is a frame).
+	 * The latter is important, because frames may come in a random order,
+	 * and we will have trouble assembling a real page of multiple frames.
+	 */
 	if (mlx5e_rx_is_xdp(params, xsk))
 		frag_sz = max_t(u32, frag_sz, PAGE_SIZE);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.h b/drivers/net/ethernet/mellanox/mlx5/core/en/params.h
index 3a615d663d84..989d8f429438 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.h
@@ -76,6 +76,8 @@ static inline bool mlx5e_qid_validate(const struct mlx5e_profile *profile,
 
 u16 mlx5e_get_linear_rq_headroom(struct mlx5e_params *params,
 				 struct mlx5e_xsk_param *xsk);
+u32 mlx5e_rx_get_min_frag_sz(struct mlx5e_params *params,
+			     struct mlx5e_xsk_param *xsk);
 u32 mlx5e_rx_get_linear_frag_sz(struct mlx5e_params *params,
 				struct mlx5e_xsk_param *xsk);
 u8 mlx5e_mpwqe_log_pkts_per_wqe(struct mlx5e_params *params,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
index 7c49a66d28c9..475b6bd5d29b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
@@ -105,7 +105,7 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
 
 	/* head_offset is not used in this function, because di->xsk.data and
 	 * di->addr point directly to the necessary place. Furthermore, in the
-	 * current implementation, one page = one packet = one frame, so
+	 * current implementation, UMR pages are mapped to XSK frames, so
 	 * head_offset should always be 0.
 	 */
 	WARN_ON_ONCE(head_offset);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
index f701e4f3c076..d549f770cb4f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
@@ -4,18 +4,23 @@
 #include "setup.h"
 #include "en/params.h"
 
+/* It matches XDP_UMEM_MIN_CHUNK_SIZE, but as this constant is private and may
+ * change unexpectedly, and mlx5e has a minimum valid stride size for striding
+ * RQ, keep this check in the driver.
+ */
+#define MLX5E_MIN_XSK_CHUNK_SIZE 2048
+
 bool mlx5e_validate_xsk_param(struct mlx5e_params *params,
 			      struct mlx5e_xsk_param *xsk,
 			      struct mlx5_core_dev *mdev)
 {
-	/* AF_XDP doesn't support frames larger than PAGE_SIZE, and the current
-	 * mlx5e XDP implementation doesn't support multiple packets per page.
-	 */
-	if (xsk->chunk_size != PAGE_SIZE)
+	/* AF_XDP doesn't support frames larger than PAGE_SIZE. */
+	if (xsk->chunk_size > PAGE_SIZE ||
+			xsk->chunk_size < MLX5E_MIN_XSK_CHUNK_SIZE)
 		return false;
 
 	/* Current MTU and XSK headroom don't allow packets to fit the frames. */
-	if (mlx5e_rx_get_linear_frag_sz(params, xsk) > xsk->chunk_size)
+	if (mlx5e_rx_get_min_frag_sz(params, xsk) > xsk->chunk_size)
 		return false;
 
 	/* frag_sz is different for regular and XSK RQs, so ensure that linear
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v6 08/12] libbpf: add flags to umem config
  2019-08-27  2:25           ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-27  2:25             ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-27  2:25 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

This patch adds a 'flags' field to the umem_config and umem_reg structs.
This will allow for more options to be added for configuring umems.

The first use for the flags field is to add a flag for unaligned chunks
mode. These flags can either be user-provided or filled with a default.

Since we change the size of the xsk_umem_config struct, we need to version
the ABI. This patch includes the ABI versioning for xsk_umem__create. The
Makefile was also updated to handle multiple function versions in
check-abi.
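
For example, an application opting in to unaligned chunks now fills in
the new field before creating the umem. A minimal sketch (the
create_unaligned_umem() wrapper is illustrative only; the caller is
assumed to have allocated the buffer and ring structs):

  #include "xsk.h"	/* tools/lib/bpf/xsk.h */

  static int create_unaligned_umem(void *buffer, __u64 size,
  				 struct xsk_ring_prod *fill,
  				 struct xsk_ring_cons *comp,
  				 struct xsk_umem **umem)
  {
  	struct xsk_umem_config cfg = {
  		.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
  		.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
  		.frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE,
  		.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM,
  		/* new field: opt in to unaligned chunk placement */
  		.flags = XDP_UMEM_UNALIGNED_CHUNK_FLAG,
  	};

  	return xsk_umem__create(umem, buffer, size, fill, comp, &cfg);
  }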

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>

---
v2:
  - Removed the headroom check from this patch. It has moved to the
    previous patch.

v4:
  - Modified chunk flag define.

v5:
  - Added ABI versioning for xsk_umem__create().
  - Updated Makefile to handle multiple function versions.
  - Removed bitfields from xdp_umem_reg
  - Fix conflicts after 'bpf-af-xdp-wakeup' was merged
---
 tools/include/uapi/linux/if_xdp.h |  9 +++++++++
 tools/lib/bpf/Makefile            |  5 ++++-
 tools/lib/bpf/libbpf.map          |  1 +
 tools/lib/bpf/xsk.c               | 33 ++++++++++++++++++++++++++++---
 tools/lib/bpf/xsk.h               | 27 +++++++++++++++++++++++++
 5 files changed, 71 insertions(+), 4 deletions(-)

diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h
index 62b80d57b72a..be328c59389d 100644
--- a/tools/include/uapi/linux/if_xdp.h
+++ b/tools/include/uapi/linux/if_xdp.h
@@ -26,6 +26,9 @@
  */
 #define XDP_USE_NEED_WAKEUP (1 << 3)
 
+/* Flags for xsk_umem_config flags */
+#define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0)
+
 struct sockaddr_xdp {
 	__u16 sxdp_family;
 	__u16 sxdp_flags;
@@ -66,6 +69,7 @@ struct xdp_umem_reg {
 	__u64 len; /* Length of packet data area */
 	__u32 chunk_size;
 	__u32 headroom;
+	__u32 flags;
 };
 
 struct xdp_statistics {
@@ -87,6 +91,11 @@ struct xdp_options {
 #define XDP_UMEM_PGOFF_FILL_RING	0x100000000ULL
 #define XDP_UMEM_PGOFF_COMPLETION_RING	0x180000000ULL
 
+/* Masks for unaligned chunks mode */
+#define XSK_UNALIGNED_BUF_OFFSET_SHIFT 48
+#define XSK_UNALIGNED_BUF_ADDR_MASK \
+	((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1)
+
 /* Rx/Tx descriptor */
 struct xdp_desc {
 	__u64 addr;
diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index 613acb93b144..c6f94cffe06e 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -134,7 +134,9 @@ LIB_FILE	:= $(addprefix $(OUTPUT),$(LIB_FILE))
 PC_FILE		:= $(addprefix $(OUTPUT),$(PC_FILE))
 
 GLOBAL_SYM_COUNT = $(shell readelf -s --wide $(BPF_IN) | \
-			   awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {s++} END{print s}')
+			   cut -d "@" -f1 | sed 's/_v[0-9]_[0-9]_[0-9].*//' | \
+			   awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {print $$8}' | \
+			   sort -u | wc -l)
 VERSIONED_SYM_COUNT = $(shell readelf -s --wide $(OUTPUT)libbpf.so | \
 			      grep -Eo '[^ ]+@LIBBPF_' | cut -d@ -f1 | sort -u | wc -l)
 
@@ -201,6 +203,7 @@ check_abi: $(OUTPUT)libbpf.so
 		     "Please make sure all LIBBPF_API symbols are"	 \
 		     "versioned in $(VERSION_SCRIPT)." >&2;		 \
 		readelf -s --wide $(OUTPUT)libbpf-in.o |		 \
+		    cut -d "@" -f1 | sed 's/_v[0-9]_[0-9]_[0-9].*//' |	 \
 		    awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {print $$8}'|   \
 		    sort -u > $(OUTPUT)libbpf_global_syms.tmp;		 \
 		readelf -s --wide $(OUTPUT)libbpf.so |			 \
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 664ce8e7a60e..d04c7cb623ed 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -183,6 +183,7 @@ LIBBPF_0.0.4 {
 		perf_buffer__new;
 		perf_buffer__new_raw;
 		perf_buffer__poll;
+		xsk_umem__create;
 } LIBBPF_0.0.3;
 
 LIBBPF_0.0.5 {
diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index 12ad78510147..842c4fd55859 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -99,6 +99,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg,
 		cfg->comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
 		cfg->frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
 		cfg->frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM;
+		cfg->flags = XSK_UMEM__DEFAULT_FLAGS;
 		return;
 	}
 
@@ -106,6 +107,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg,
 	cfg->comp_size = usr_cfg->comp_size;
 	cfg->frame_size = usr_cfg->frame_size;
 	cfg->frame_headroom = usr_cfg->frame_headroom;
+	cfg->flags = usr_cfg->flags;
 }
 
 static int xsk_set_xdp_socket_config(struct xsk_socket_config *cfg,
@@ -132,9 +134,10 @@ static int xsk_set_xdp_socket_config(struct xsk_socket_config *cfg,
 	return 0;
 }
 
-int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
-		     struct xsk_ring_prod *fill, struct xsk_ring_cons *comp,
-		     const struct xsk_umem_config *usr_config)
+int xsk_umem__create_v0_0_4(struct xsk_umem **umem_ptr, void *umem_area,
+			    __u64 size, struct xsk_ring_prod *fill,
+			    struct xsk_ring_cons *comp,
+			    const struct xsk_umem_config *usr_config)
 {
 	struct xdp_mmap_offsets off;
 	struct xdp_umem_reg mr;
@@ -165,6 +168,7 @@ int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
 	mr.len = size;
 	mr.chunk_size = umem->config.frame_size;
 	mr.headroom = umem->config.frame_headroom;
+	mr.flags = umem->config.flags;
 
 	err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr));
 	if (err) {
@@ -238,6 +242,29 @@ int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
 	return err;
 }
 
+struct xsk_umem_config_v1 {
+	__u32 fill_size;
+	__u32 comp_size;
+	__u32 frame_size;
+	__u32 frame_headroom;
+};
+
+int xsk_umem__create_v0_0_2(struct xsk_umem **umem_ptr, void *umem_area,
+			    __u64 size, struct xsk_ring_prod *fill,
+			    struct xsk_ring_cons *comp,
+			    const struct xsk_umem_config *usr_config)
+{
+	struct xsk_umem_config config;
+
+	memcpy(&config, usr_config, sizeof(struct xsk_umem_config_v1));
+	config.flags = 0;
+
+	return xsk_umem__create_v0_0_4(umem_ptr, umem_area, size, fill, comp,
+					&config);
+}
+asm(".symver xsk_umem__create_v0_0_2, xsk_umem__create@LIBBPF_0.0.2");
+asm(".symver xsk_umem__create_v0_0_4, xsk_umem__create@@LIBBPF_0.0.4");
+
 static int xsk_load_xdp_prog(struct xsk_socket *xsk)
 {
 	static const int log_buf_size = 16 * 1024;
diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
index aa1d6122b7db..584f6820a639 100644
--- a/tools/lib/bpf/xsk.h
+++ b/tools/lib/bpf/xsk.h
@@ -168,6 +168,21 @@ static inline void *xsk_umem__get_data(void *umem_area, __u64 addr)
 	return &((char *)umem_area)[addr];
 }
 
+static inline __u64 xsk_umem__extract_addr(__u64 addr)
+{
+	return addr & XSK_UNALIGNED_BUF_ADDR_MASK;
+}
+
+static inline __u64 xsk_umem__extract_offset(__u64 addr)
+{
+	return addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+}
+
+static inline __u64 xsk_umem__add_offset_to_addr(__u64 addr)
+{
+	return xsk_umem__extract_addr(addr) + xsk_umem__extract_offset(addr);
+}
+
 LIBBPF_API int xsk_umem__fd(const struct xsk_umem *umem);
 LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
 
@@ -176,12 +191,14 @@ LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
 #define XSK_UMEM__DEFAULT_FRAME_SHIFT    12 /* 4096 bytes */
 #define XSK_UMEM__DEFAULT_FRAME_SIZE     (1 << XSK_UMEM__DEFAULT_FRAME_SHIFT)
 #define XSK_UMEM__DEFAULT_FRAME_HEADROOM 0
+#define XSK_UMEM__DEFAULT_FLAGS 0
 
 struct xsk_umem_config {
 	__u32 fill_size;
 	__u32 comp_size;
 	__u32 frame_size;
 	__u32 frame_headroom;
+	__u32 flags;
 };
 
 /* Flags for the libbpf_flags field. */
@@ -201,6 +218,16 @@ LIBBPF_API int xsk_umem__create(struct xsk_umem **umem,
 				struct xsk_ring_prod *fill,
 				struct xsk_ring_cons *comp,
 				const struct xsk_umem_config *config);
+LIBBPF_API int xsk_umem__create_v0_0_2(struct xsk_umem **umem,
+				       void *umem_area, __u64 size,
+				       struct xsk_ring_prod *fill,
+				       struct xsk_ring_cons *comp,
+				       const struct xsk_umem_config *config);
+LIBBPF_API int xsk_umem__create_v0_0_4(struct xsk_umem **umem,
+				       void *umem_area, __u64 size,
+				       struct xsk_ring_prod *fill,
+				       struct xsk_ring_cons *comp,
+				       const struct xsk_umem_config *config);
 LIBBPF_API int xsk_socket__create(struct xsk_socket **xsk,
 				  const char *ifname, __u32 queue_id,
 				  struct xsk_umem *umem,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v6 08/12] libbpf: add flags to umem config
@ 2019-08-27  2:25             ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-27  2:25 UTC (permalink / raw)
  To: intel-wired-lan

This patch adds a 'flags' field to the umem_config and umem_reg structs.
This will allow for more options to be added for configuring umems.

The first use for the flags field is to add a flag for unaligned chunks
mode. These flags can either be user-provided or filled with a default.

Since we change the size of the xsk_umem_config struct, we need to version
the ABI. This patch includes the ABI versioning for xsk_umem__create. The
Makefile was also updated to handle multiple function versions in
check-abi.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>

---
v2:
  - Removed the headroom check from this patch. It has moved to the
    previous patch.

v4:
  - Modified chunk flag define.

v5:
  - Added ABI versioning for xsk_umem__create().
  - Updated Makefile to handle multiple function versions.
  - Removed bitfields from xdp_umem_reg
  - Fix conflicts after 'bpf-af-xdp-wakeup' was merged
---
 tools/include/uapi/linux/if_xdp.h |  9 +++++++++
 tools/lib/bpf/Makefile            |  5 ++++-
 tools/lib/bpf/libbpf.map          |  1 +
 tools/lib/bpf/xsk.c               | 33 ++++++++++++++++++++++++++++---
 tools/lib/bpf/xsk.h               | 27 +++++++++++++++++++++++++
 5 files changed, 71 insertions(+), 4 deletions(-)

diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h
index 62b80d57b72a..be328c59389d 100644
--- a/tools/include/uapi/linux/if_xdp.h
+++ b/tools/include/uapi/linux/if_xdp.h
@@ -26,6 +26,9 @@
  */
 #define XDP_USE_NEED_WAKEUP (1 << 3)
 
+/* Flags for xsk_umem_config flags */
+#define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0)
+
 struct sockaddr_xdp {
 	__u16 sxdp_family;
 	__u16 sxdp_flags;
@@ -66,6 +69,7 @@ struct xdp_umem_reg {
 	__u64 len; /* Length of packet data area */
 	__u32 chunk_size;
 	__u32 headroom;
+	__u32 flags;
 };
 
 struct xdp_statistics {
@@ -87,6 +91,11 @@ struct xdp_options {
 #define XDP_UMEM_PGOFF_FILL_RING	0x100000000ULL
 #define XDP_UMEM_PGOFF_COMPLETION_RING	0x180000000ULL
 
+/* Masks for unaligned chunks mode */
+#define XSK_UNALIGNED_BUF_OFFSET_SHIFT 48
+#define XSK_UNALIGNED_BUF_ADDR_MASK \
+	((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1)
+
 /* Rx/Tx descriptor */
 struct xdp_desc {
 	__u64 addr;
diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index 613acb93b144..c6f94cffe06e 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -134,7 +134,9 @@ LIB_FILE	:= $(addprefix $(OUTPUT),$(LIB_FILE))
 PC_FILE		:= $(addprefix $(OUTPUT),$(PC_FILE))
 
 GLOBAL_SYM_COUNT = $(shell readelf -s --wide $(BPF_IN) | \
-			   awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {s++} END{print s}')
+			   cut -d "@" -f1 | sed 's/_v[0-9]_[0-9]_[0-9].*//' | \
+			   awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {print $$8}' | \
+			   sort -u | wc -l)
 VERSIONED_SYM_COUNT = $(shell readelf -s --wide $(OUTPUT)libbpf.so | \
 			      grep -Eo '[^ ]+@LIBBPF_' | cut -d@ -f1 | sort -u | wc -l)
 
@@ -201,6 +203,7 @@ check_abi: $(OUTPUT)libbpf.so
 		     "Please make sure all LIBBPF_API symbols are"	 \
 		     "versioned in $(VERSION_SCRIPT)." >&2;		 \
 		readelf -s --wide $(OUTPUT)libbpf-in.o |		 \
+		    cut -d "@" -f1 | sed 's/_v[0-9]_[0-9]_[0-9].*//' |	 \
 		    awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {print $$8}'|   \
 		    sort -u > $(OUTPUT)libbpf_global_syms.tmp;		 \
 		readelf -s --wide $(OUTPUT)libbpf.so |			 \
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 664ce8e7a60e..d04c7cb623ed 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -183,6 +183,7 @@ LIBBPF_0.0.4 {
 		perf_buffer__new;
 		perf_buffer__new_raw;
 		perf_buffer__poll;
+		xsk_umem__create;
 } LIBBPF_0.0.3;
 
 LIBBPF_0.0.5 {
diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index 12ad78510147..842c4fd55859 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -99,6 +99,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg,
 		cfg->comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
 		cfg->frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
 		cfg->frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM;
+		cfg->flags = XSK_UMEM__DEFAULT_FLAGS;
 		return;
 	}
 
@@ -106,6 +107,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg,
 	cfg->comp_size = usr_cfg->comp_size;
 	cfg->frame_size = usr_cfg->frame_size;
 	cfg->frame_headroom = usr_cfg->frame_headroom;
+	cfg->flags = usr_cfg->flags;
 }
 
 static int xsk_set_xdp_socket_config(struct xsk_socket_config *cfg,
@@ -132,9 +134,10 @@ static int xsk_set_xdp_socket_config(struct xsk_socket_config *cfg,
 	return 0;
 }
 
-int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
-		     struct xsk_ring_prod *fill, struct xsk_ring_cons *comp,
-		     const struct xsk_umem_config *usr_config)
+int xsk_umem__create_v0_0_4(struct xsk_umem **umem_ptr, void *umem_area,
+			    __u64 size, struct xsk_ring_prod *fill,
+			    struct xsk_ring_cons *comp,
+			    const struct xsk_umem_config *usr_config)
 {
 	struct xdp_mmap_offsets off;
 	struct xdp_umem_reg mr;
@@ -165,6 +168,7 @@ int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
 	mr.len = size;
 	mr.chunk_size = umem->config.frame_size;
 	mr.headroom = umem->config.frame_headroom;
+	mr.flags = umem->config.flags;
 
 	err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr));
 	if (err) {
@@ -238,6 +242,29 @@ int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
 	return err;
 }
 
+struct xsk_umem_config_v1 {
+	__u32 fill_size;
+	__u32 comp_size;
+	__u32 frame_size;
+	__u32 frame_headroom;
+};
+
+int xsk_umem__create_v0_0_2(struct xsk_umem **umem_ptr, void *umem_area,
+			    __u64 size, struct xsk_ring_prod *fill,
+			    struct xsk_ring_cons *comp,
+			    const struct xsk_umem_config *usr_config)
+{
+	struct xsk_umem_config config;
+
+	memcpy(&config, usr_config, sizeof(struct xsk_umem_config_v1));
+	config.flags = 0;
+
+	return xsk_umem__create_v0_0_4(umem_ptr, umem_area, size, fill, comp,
+					&config);
+}
+asm(".symver xsk_umem__create_v0_0_2, xsk_umem__create at LIBBPF_0.0.2");
+asm(".symver xsk_umem__create_v0_0_4, xsk_umem__create@@LIBBPF_0.0.4");
+
 static int xsk_load_xdp_prog(struct xsk_socket *xsk)
 {
 	static const int log_buf_size = 16 * 1024;
diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
index aa1d6122b7db..584f6820a639 100644
--- a/tools/lib/bpf/xsk.h
+++ b/tools/lib/bpf/xsk.h
@@ -168,6 +168,21 @@ static inline void *xsk_umem__get_data(void *umem_area, __u64 addr)
 	return &((char *)umem_area)[addr];
 }
 
+static inline __u64 xsk_umem__extract_addr(__u64 addr)
+{
+	return addr & XSK_UNALIGNED_BUF_ADDR_MASK;
+}
+
+static inline __u64 xsk_umem__extract_offset(__u64 addr)
+{
+	return addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+}
+
+static inline __u64 xsk_umem__add_offset_to_addr(__u64 addr)
+{
+	return xsk_umem__extract_addr(addr) + xsk_umem__extract_offset(addr);
+}
+
 LIBBPF_API int xsk_umem__fd(const struct xsk_umem *umem);
 LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
 
@@ -176,12 +191,14 @@ LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
 #define XSK_UMEM__DEFAULT_FRAME_SHIFT    12 /* 4096 bytes */
 #define XSK_UMEM__DEFAULT_FRAME_SIZE     (1 << XSK_UMEM__DEFAULT_FRAME_SHIFT)
 #define XSK_UMEM__DEFAULT_FRAME_HEADROOM 0
+#define XSK_UMEM__DEFAULT_FLAGS 0
 
 struct xsk_umem_config {
 	__u32 fill_size;
 	__u32 comp_size;
 	__u32 frame_size;
 	__u32 frame_headroom;
+	__u32 flags;
 };
 
 /* Flags for the libbpf_flags field. */
@@ -201,6 +218,16 @@ LIBBPF_API int xsk_umem__create(struct xsk_umem **umem,
 				struct xsk_ring_prod *fill,
 				struct xsk_ring_cons *comp,
 				const struct xsk_umem_config *config);
+LIBBPF_API int xsk_umem__create_v0_0_2(struct xsk_umem **umem,
+				       void *umem_area, __u64 size,
+				       struct xsk_ring_prod *fill,
+				       struct xsk_ring_cons *comp,
+				       const struct xsk_umem_config *config);
+LIBBPF_API int xsk_umem__create_v0_0_4(struct xsk_umem **umem,
+				       void *umem_area, __u64 size,
+				       struct xsk_ring_prod *fill,
+				       struct xsk_ring_cons *comp,
+				       const struct xsk_umem_config *config);
 LIBBPF_API int xsk_socket__create(struct xsk_socket **xsk,
 				  const char *ifname, __u32 queue_id,
 				  struct xsk_umem *umem,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v6 09/12] samples/bpf: add unaligned chunks mode support to xdpsock
  2019-08-27  2:25           ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-27  2:25             ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-27  2:25 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

This patch adds support for the unaligned chunks mode. The addition of the
unaligned chunks option will allow users to run the application with more
relaxed chunk placement in the XDP umem.

Unaligned chunks mode can be used with the '-u' or '--unaligned' command
line options.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>

---
v4:
  - updated help text for -f
  - use new chunk flag define
---
 samples/bpf/xdpsock_user.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index da84c760c094..7312eab4d201 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -68,6 +68,9 @@ static int opt_queue;
 static int opt_poll;
 static int opt_interval = 1;
 static u32 opt_xdp_bind_flags = XDP_USE_NEED_WAKEUP;
+static u32 opt_umem_flags;
+static int opt_unaligned_chunks;
+static u32 opt_xdp_bind_flags;
 static int opt_xsk_frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
 static int opt_timeout = 1000;
 static bool opt_need_wakeup = true;
@@ -284,7 +287,9 @@ static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
 		.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
 		.frame_size = opt_xsk_frame_size,
 		.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM,
+		.flags = opt_umem_flags
 	};
+
 	int ret;
 
 	umem = calloc(1, sizeof(*umem));
@@ -293,6 +298,7 @@ static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
 
 	ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
 			       &cfg);
+
 	if (ret)
 		exit_with_error(-ret);
 
@@ -355,6 +361,7 @@ static struct option long_options[] = {
 	{"copy", no_argument, 0, 'c'},
 	{"frame-size", required_argument, 0, 'f'},
 	{"no-need-wakeup", no_argument, 0, 'm'},
+	{"unaligned", no_argument, 0, 'u'},
 	{0, 0, 0, 0}
 };
 
@@ -376,6 +383,8 @@ static void usage(const char *prog)
 		"  -c, --copy           Force copy mode.\n"
 		"  -f, --frame-size=n   Set the frame size (must be a power of two, default is %d).\n"
 		"  -m, --no-need-wakeup Turn off use of driver need wakeup flag.\n"
+		"  -f, --frame-size=n   Set the frame size (must be a power of two in aligned mode, default is %d).\n"
+		"  -u, --unaligned	Enable unaligned chunk placement\n"
 		"\n";
 	fprintf(stderr, str, prog, XSK_UMEM__DEFAULT_FRAME_SIZE);
 	exit(EXIT_FAILURE);
@@ -388,8 +397,7 @@ static void parse_command_line(int argc, char **argv)
 	opterr = 0;
 
 	for (;;) {
-
-		c = getopt_long(argc, argv, "Frtli:q:psSNn:czf:m",
+		c = getopt_long(argc, argv, "Frtli:q:psSNn:czf:mu",
 				long_options, &option_index);
 		if (c == -1)
 			break;
@@ -429,6 +437,10 @@ static void parse_command_line(int argc, char **argv)
 		case 'c':
 			opt_xdp_bind_flags |= XDP_COPY;
 			break;
+		case 'u':
+			opt_umem_flags |= XDP_UMEM_UNALIGNED_CHUNK_FLAG;
+			opt_unaligned_chunks = 1;
+			break;
 		case 'F':
 			opt_xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
 			break;
@@ -438,6 +450,7 @@ static void parse_command_line(int argc, char **argv)
 			opt_need_wakeup = false;
 			opt_xdp_bind_flags &= ~XDP_USE_NEED_WAKEUP;
 			break;
+
 		default:
 			usage(basename(argv[0]));
 		}
@@ -450,7 +463,8 @@ static void parse_command_line(int argc, char **argv)
 		usage(basename(argv[0]));
 	}
 
-	if (opt_xsk_frame_size & (opt_xsk_frame_size - 1)) {
+	if ((opt_xsk_frame_size & (opt_xsk_frame_size - 1)) &&
+	    !opt_unaligned_chunks) {
 		fprintf(stderr, "--frame-size=%d is not a power of two\n",
 			opt_xsk_frame_size);
 		usage(basename(argv[0]));
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v6 09/12] samples/bpf: add unaligned chunks mode support to xdpsock
@ 2019-08-27  2:25             ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-27  2:25 UTC (permalink / raw)
  To: intel-wired-lan

This patch adds support for the unaligned chunks mode. The addition of the
unaligned chunks option will allow users to run the application with more
relaxed chunk placement in the XDP umem.

Unaligned chunks mode can be used with the '-u' or '--unaligned' command
line options.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>

---
v4:
  - updated help text for -f
  - use new chunk flag define
---
 samples/bpf/xdpsock_user.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index da84c760c094..7312eab4d201 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -68,6 +68,9 @@ static int opt_queue;
 static int opt_poll;
 static int opt_interval = 1;
 static u32 opt_xdp_bind_flags = XDP_USE_NEED_WAKEUP;
+static u32 opt_umem_flags;
+static int opt_unaligned_chunks;
+static u32 opt_xdp_bind_flags;
 static int opt_xsk_frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
 static int opt_timeout = 1000;
 static bool opt_need_wakeup = true;
@@ -284,7 +287,9 @@ static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
 		.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
 		.frame_size = opt_xsk_frame_size,
 		.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM,
+		.flags = opt_umem_flags
 	};
+
 	int ret;
 
 	umem = calloc(1, sizeof(*umem));
@@ -293,6 +298,7 @@ static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
 
 	ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
 			       &cfg);
+
 	if (ret)
 		exit_with_error(-ret);
 
@@ -355,6 +361,7 @@ static struct option long_options[] = {
 	{"copy", no_argument, 0, 'c'},
 	{"frame-size", required_argument, 0, 'f'},
 	{"no-need-wakeup", no_argument, 0, 'm'},
+	{"unaligned", no_argument, 0, 'u'},
 	{0, 0, 0, 0}
 };
 
@@ -376,6 +383,8 @@ static void usage(const char *prog)
 		"  -c, --copy           Force copy mode.\n"
 		"  -f, --frame-size=n   Set the frame size (must be a power of two, default is %d).\n"
 		"  -m, --no-need-wakeup Turn off use of driver need wakeup flag.\n"
+		"  -f, --frame-size=n   Set the frame size (must be a power of two in aligned mode, default is %d).\n"
+		"  -u, --unaligned	Enable unaligned chunk placement\n"
 		"\n";
 	fprintf(stderr, str, prog, XSK_UMEM__DEFAULT_FRAME_SIZE);
 	exit(EXIT_FAILURE);
@@ -388,8 +397,7 @@ static void parse_command_line(int argc, char **argv)
 	opterr = 0;
 
 	for (;;) {
-
-		c = getopt_long(argc, argv, "Frtli:q:psSNn:czf:m",
+		c = getopt_long(argc, argv, "Frtli:q:psSNn:czf:mu",
 				long_options, &option_index);
 		if (c == -1)
 			break;
@@ -429,6 +437,10 @@ static void parse_command_line(int argc, char **argv)
 		case 'c':
 			opt_xdp_bind_flags |= XDP_COPY;
 			break;
+		case 'u':
+			opt_umem_flags |= XDP_UMEM_UNALIGNED_CHUNK_FLAG;
+			opt_unaligned_chunks = 1;
+			break;
 		case 'F':
 			opt_xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
 			break;
@@ -438,6 +450,7 @@ static void parse_command_line(int argc, char **argv)
 			opt_need_wakeup = false;
 			opt_xdp_bind_flags &= ~XDP_USE_NEED_WAKEUP;
 			break;
+
 		default:
 			usage(basename(argv[0]));
 		}
@@ -450,7 +463,8 @@ static void parse_command_line(int argc, char **argv)
 		usage(basename(argv[0]));
 	}
 
-	if (opt_xsk_frame_size & (opt_xsk_frame_size - 1)) {
+	if ((opt_xsk_frame_size & (opt_xsk_frame_size - 1)) &&
+	    !opt_unaligned_chunks) {
 		fprintf(stderr, "--frame-size=%d is not a power of two\n",
 			opt_xsk_frame_size);
 		usage(basename(argv[0]));
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v6 10/12] samples/bpf: add buffer recycling for unaligned chunks to xdpsock
  2019-08-27  2:25           ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-27  2:25             ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-27  2:25 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

This patch adds buffer recycling support for unaligned buffers. Since we
don't mask the addr to 2k at umem_reg in unaligned mode, we need to make
sure we give back the correct (original) addr to the fill queue. We achieve
this using the new descriptor format and associated masks. The new format
uses the upper 16 bits for the offset and the lower 48 bits for the addr.
Since we have a field for the offset, we no longer need to modify the
actual address. As such, all we have to do to get back the original
address is to mask out all but the lower 48 bits (i.e. strip the offset
and we get the address on its own).
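
Put concretely, for a chunk whose fill queue address is 0x3000 and a
packet written 256 bytes into it, the descriptor address and the new
accessors relate as follows (a sketch only; the shift constant and the
helpers are the ones added to if_xdp.h and xsk.h earlier in the
series, and are assumed to be on the include path):

  #include <assert.h>
  #include "xsk.h"	/* tools/lib/bpf; pulls in <linux/if_xdp.h> */

  static void show_unaligned_addr_layout(void)
  {
  	__u64 base   = 0x3000;	/* address originally put on the fill ring */
  	__u64 offset = 256;	/* headroom + XDP data offset */

  	/* what the kernel hands back in the Rx/completion descriptor:
  	 * lower 48 bits = base, upper 16 bits = offset
  	 */
  	__u64 addr = base | (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);

  	assert(xsk_umem__extract_addr(addr) == base);
  	assert(xsk_umem__extract_offset(addr) == offset);
  	assert(xsk_umem__add_offset_to_addr(addr) == base + offset);
  }

base is what goes back on the fill ring; base + offset is where the
packet data starts in the umem.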

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

---
v2:
  - Removed unused defines
  - Fix buffer recycling for unaligned case
  - Remove --buf-size (--frame-size merged before this)
  - Modifications to use the new descriptor format for buffer recycling

v5:
  - Use new accessors for addr/offset instead of macros.
---
 samples/bpf/xdpsock_user.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 7312eab4d201..dc3d50f8ed86 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -484,6 +484,7 @@ static void kick_tx(struct xsk_socket_info *xsk)
 static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk,
 				     struct pollfd *fds)
 {
+	struct xsk_umem_info *umem = xsk->umem;
 	u32 idx_cq = 0, idx_fq = 0;
 	unsigned int rcvd;
 	size_t ndescs;
@@ -498,24 +499,23 @@ static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk,
 		xsk->outstanding_tx;
 
 	/* re-add completed Tx buffers */
-	rcvd = xsk_ring_cons__peek(&xsk->umem->cq, ndescs, &idx_cq);
+	rcvd = xsk_ring_cons__peek(&umem->cq, ndescs, &idx_cq);
 	if (rcvd > 0) {
 		unsigned int i;
 		int ret;
 
-		ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
+		ret = xsk_ring_prod__reserve(&umem->fq, rcvd, &idx_fq);
 		while (ret != rcvd) {
 			if (ret < 0)
 				exit_with_error(-ret);
-			if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq))
+			if (xsk_ring_prod__needs_wakeup(&umem->fq))
 				ret = poll(fds, num_socks, opt_timeout);
-			ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd,
-						     &idx_fq);
+			ret = xsk_ring_prod__reserve(&umem->fq, rcvd, &idx_fq);
 		}
+
 		for (i = 0; i < rcvd; i++)
-			*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) =
-				*xsk_ring_cons__comp_addr(&xsk->umem->cq,
-							  idx_cq++);
+			*xsk_ring_prod__fill_addr(&umem->fq, idx_fq++) =
+				*xsk_ring_cons__comp_addr(&umem->cq, idx_cq++);
 
 		xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
 		xsk_ring_cons__release(&xsk->umem->cq, rcvd);
@@ -568,10 +568,13 @@ static void rx_drop(struct xsk_socket_info *xsk, struct pollfd *fds)
 	for (i = 0; i < rcvd; i++) {
 		u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
 		u32 len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++)->len;
+		u64 orig = xsk_umem__extract_addr(addr);
+
+		addr = xsk_umem__add_offset_to_addr(addr);
 		char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
 
 		hex_dump(pkt, len, addr);
-		*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = addr;
+		*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = orig;
 	}
 
 	xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
@@ -680,12 +683,15 @@ static void l2fwd(struct xsk_socket_info *xsk, struct pollfd *fds)
 	for (i = 0; i < rcvd; i++) {
 		u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
 		u32 len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++)->len;
+		u64 orig = xsk_umem__extract_addr(addr);
+
+		addr = xsk_umem__add_offset_to_addr(addr);
 		char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
 
 		swap_mac_addresses(pkt);
 
 		hex_dump(pkt, len, addr);
-		xsk_ring_prod__tx_desc(&xsk->tx, idx_tx)->addr = addr;
+		xsk_ring_prod__tx_desc(&xsk->tx, idx_tx)->addr = orig;
 		xsk_ring_prod__tx_desc(&xsk->tx, idx_tx++)->len = len;
 	}
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v6 10/12] samples/bpf: add buffer recycling for unaligned chunks to xdpsock
@ 2019-08-27  2:25             ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-27  2:25 UTC (permalink / raw)
  To: intel-wired-lan

This patch adds buffer recycling support for unaligned buffers. Since we
don't mask the addr to 2k at umem_reg in unaligned mode, we need to make
sure we give back the correct (original) addr to the fill queue. We achieve
this using the new descriptor format and associated masks. The new format
uses the upper 16 bits for the offset and the lower 48 bits for the addr.
Since we have a field for the offset, we no longer need to modify the
actual address. As such, all we have to do to get back the original
address is to mask out all but the lower 48 bits (i.e. strip the offset
and we get the address on its own).

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

---
v2:
  - Removed unused defines
  - Fix buffer recycling for unaligned case
  - Remove --buf-size (--frame-size merged before this)
  - Modifications to use the new descriptor format for buffer recycling

v5:
  - Use new accessors for addr/offset instead of macros.
---
 samples/bpf/xdpsock_user.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 7312eab4d201..dc3d50f8ed86 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -484,6 +484,7 @@ static void kick_tx(struct xsk_socket_info *xsk)
 static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk,
 				     struct pollfd *fds)
 {
+	struct xsk_umem_info *umem = xsk->umem;
 	u32 idx_cq = 0, idx_fq = 0;
 	unsigned int rcvd;
 	size_t ndescs;
@@ -498,24 +499,23 @@ static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk,
 		xsk->outstanding_tx;
 
 	/* re-add completed Tx buffers */
-	rcvd = xsk_ring_cons__peek(&xsk->umem->cq, ndescs, &idx_cq);
+	rcvd = xsk_ring_cons__peek(&umem->cq, ndescs, &idx_cq);
 	if (rcvd > 0) {
 		unsigned int i;
 		int ret;
 
-		ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
+		ret = xsk_ring_prod__reserve(&umem->fq, rcvd, &idx_fq);
 		while (ret != rcvd) {
 			if (ret < 0)
 				exit_with_error(-ret);
-			if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq))
+			if (xsk_ring_prod__needs_wakeup(&umem->fq))
 				ret = poll(fds, num_socks, opt_timeout);
-			ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd,
-						     &idx_fq);
+			ret = xsk_ring_prod__reserve(&umem->fq, rcvd, &idx_fq);
 		}
+
 		for (i = 0; i < rcvd; i++)
-			*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) =
-				*xsk_ring_cons__comp_addr(&xsk->umem->cq,
-							  idx_cq++);
+			*xsk_ring_prod__fill_addr(&umem->fq, idx_fq++) =
+				*xsk_ring_cons__comp_addr(&umem->cq, idx_cq++);
 
 		xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
 		xsk_ring_cons__release(&xsk->umem->cq, rcvd);
@@ -568,10 +568,13 @@ static void rx_drop(struct xsk_socket_info *xsk, struct pollfd *fds)
 	for (i = 0; i < rcvd; i++) {
 		u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
 		u32 len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++)->len;
+		u64 orig = xsk_umem__extract_addr(addr);
+
+		addr = xsk_umem__add_offset_to_addr(addr);
 		char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
 
 		hex_dump(pkt, len, addr);
-		*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = addr;
+		*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = orig;
 	}
 
 	xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
@@ -680,12 +683,15 @@ static void l2fwd(struct xsk_socket_info *xsk, struct pollfd *fds)
 	for (i = 0; i < rcvd; i++) {
 		u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
 		u32 len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++)->len;
+		u64 orig = xsk_umem__extract_addr(addr);
+
+		addr = xsk_umem__add_offset_to_addr(addr);
 		char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
 
 		swap_mac_addresses(pkt);
 
 		hex_dump(pkt, len, addr);
-		xsk_ring_prod__tx_desc(&xsk->tx, idx_tx)->addr = addr;
+		xsk_ring_prod__tx_desc(&xsk->tx, idx_tx)->addr = orig;
 		xsk_ring_prod__tx_desc(&xsk->tx, idx_tx++)->len = len;
 	}
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v6 11/12] samples/bpf: use hugepages in xdpsock app
  2019-08-27  2:25           ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-27  2:25             ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-27  2:25 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

This patch modifies xdpsock to use mmap instead of posix_memalign. With
this change, we can use hugepages when running the application in unaligned
chunk mode. Using hugepages makes it more likely that the umem is backed by
physically contiguous memory, which benefits the unaligned chunk mode.
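
As an illustration only (the patch below simply exits on failure), the
reservation could also fall back to regular pages when no hugepages have
been reserved via /proc/sys/vm/nr_hugepages. The function and variable
names here are made up for the sketch:

	#define _GNU_SOURCE	/* MAP_ANONYMOUS, MAP_HUGETLB */
	#include <sys/mman.h>
	#include <stdio.h>
	#include <stdlib.h>

	static void *reserve_umem(size_t size, int unaligned_chunks)
	{
		int flags = MAP_PRIVATE | MAP_ANONYMOUS;
		void *bufs;

		if (unaligned_chunks)
			flags |= MAP_HUGETLB;

		bufs = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
		if (bufs == MAP_FAILED && (flags & MAP_HUGETLB)) {
			/* no hugepages available, retry with regular pages */
			flags &= ~MAP_HUGETLB;
			bufs = mmap(NULL, size, PROT_READ | PROT_WRITE,
				    flags, -1, 0);
		}
		if (bufs == MAP_FAILED) {
			perror("mmap");
			exit(EXIT_FAILURE);
		}
		return bufs;
	}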

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 samples/bpf/xdpsock_user.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index dc3d50f8ed86..102eace22956 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -70,6 +70,7 @@ static int opt_interval = 1;
 static u32 opt_xdp_bind_flags = XDP_USE_NEED_WAKEUP;
 static u32 opt_umem_flags;
 static int opt_unaligned_chunks;
+static int opt_mmap_flags;
 static u32 opt_xdp_bind_flags;
 static int opt_xsk_frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
 static int opt_timeout = 1000;
@@ -440,6 +441,7 @@ static void parse_command_line(int argc, char **argv)
 		case 'u':
 			opt_umem_flags |= XDP_UMEM_UNALIGNED_CHUNK_FLAG;
 			opt_unaligned_chunks = 1;
+			opt_mmap_flags = MAP_HUGETLB;
 			break;
 		case 'F':
 			opt_xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
@@ -742,11 +744,14 @@ int main(int argc, char **argv)
 		exit(EXIT_FAILURE);
 	}
 
-	ret = posix_memalign(&bufs, getpagesize(), /* PAGE_SIZE aligned */
-			     NUM_FRAMES * opt_xsk_frame_size);
-	if (ret)
-		exit_with_error(ret);
-
+	/* Reserve memory for the umem. Use hugepages if unaligned chunk mode */
+	bufs = mmap(NULL, NUM_FRAMES * opt_xsk_frame_size,
+		    PROT_READ | PROT_WRITE,
+		    MAP_PRIVATE | MAP_ANONYMOUS | opt_mmap_flags, -1, 0);
+	if (bufs == MAP_FAILED) {
+		printf("ERROR: mmap failed\n");
+		exit(EXIT_FAILURE);
+	}
        /* Create sockets... */
 	umem = xsk_configure_umem(bufs, NUM_FRAMES * opt_xsk_frame_size);
 	xsks[num_socks++] = xsk_configure_socket(umem);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v6 11/12] samples/bpf: use hugepages in xdpsock app
@ 2019-08-27  2:25             ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-27  2:25 UTC (permalink / raw)
  To: intel-wired-lan

This patch modifies xdpsock to use mmap instead of posix_memalign. With
this change, we can use hugepages when running the application in unaligned
chunk mode. Using hugepages makes it more likely that the umem is backed by
physically contiguous memory, which benefits the unaligned chunk mode.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 samples/bpf/xdpsock_user.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index dc3d50f8ed86..102eace22956 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -70,6 +70,7 @@ static int opt_interval = 1;
 static u32 opt_xdp_bind_flags = XDP_USE_NEED_WAKEUP;
 static u32 opt_umem_flags;
 static int opt_unaligned_chunks;
+static int opt_mmap_flags;
 static u32 opt_xdp_bind_flags;
 static int opt_xsk_frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
 static int opt_timeout = 1000;
@@ -440,6 +441,7 @@ static void parse_command_line(int argc, char **argv)
 		case 'u':
 			opt_umem_flags |= XDP_UMEM_UNALIGNED_CHUNK_FLAG;
 			opt_unaligned_chunks = 1;
+			opt_mmap_flags = MAP_HUGETLB;
 			break;
 		case 'F':
 			opt_xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
@@ -742,11 +744,14 @@ int main(int argc, char **argv)
 		exit(EXIT_FAILURE);
 	}
 
-	ret = posix_memalign(&bufs, getpagesize(), /* PAGE_SIZE aligned */
-			     NUM_FRAMES * opt_xsk_frame_size);
-	if (ret)
-		exit_with_error(ret);
-
+	/* Reserve memory for the umem. Use hugepages if unaligned chunk mode */
+	bufs = mmap(NULL, NUM_FRAMES * opt_xsk_frame_size,
+		    PROT_READ | PROT_WRITE,
+		    MAP_PRIVATE | MAP_ANONYMOUS | opt_mmap_flags, -1, 0);
+	if (bufs == MAP_FAILED) {
+		printf("ERROR: mmap failed\n");
+		exit(EXIT_FAILURE);
+	}
        /* Create sockets... */
 	umem = xsk_configure_umem(bufs, NUM_FRAMES * opt_xsk_frame_size);
 	xsks[num_socks++] = xsk_configure_socket(umem);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH bpf-next v6 12/12] doc/af_xdp: include unaligned chunk case
  2019-08-27  2:25           ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-27  2:25             ` Kevin Laatz
  -1 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-27  2:25 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz

With the addition of unaligned chunk mode, the documentation needs to be
updated to indicate that the incoming addr to the fill ring will only be
masked if the user application is run in aligned chunk mode. This patch
also adds a line to explicitly state that the incoming addr will not be
masked when the user application is run in unaligned chunk mode.
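
As a quick illustration of the behaviour being documented (the values match
the example used in the diff below; the helper name is only for the sketch):

	#include <linux/types.h>

	/* Aligned mode: the kernel masks the incoming addr down to the
	 * start of its chunk, so 2048, 2050 and 3000 all refer to the
	 * same 2k chunk.
	 */
	static inline __u64 chunk_start(__u64 addr, __u64 chunk_size)
	{
		return addr & ~(chunk_size - 1);	/* e.g. 3000 -> 2048 */
	}

	/* Unaligned mode: the incoming addr is used as-is, no masking. */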

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 Documentation/networking/af_xdp.rst | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst
index eeedc2e826aa..83f7ae5fc045 100644
--- a/Documentation/networking/af_xdp.rst
+++ b/Documentation/networking/af_xdp.rst
@@ -153,10 +153,12 @@ an example, if the UMEM is 64k and each chunk is 4k, then the UMEM has
 
 Frames passed to the kernel are used for the ingress path (RX rings).
 
-The user application produces UMEM addrs to this ring. Note that the
-kernel will mask the incoming addr. E.g. for a chunk size of 2k, the
-log2(2048) LSB of the addr will be masked off, meaning that 2048, 2050
-and 3000 refers to the same chunk.
+The user application produces UMEM addrs to this ring. Note that, if
+running the application with aligned chunk mode, the kernel will mask
+the incoming addr.  E.g. for a chunk size of 2k, the log2(2048) LSB of
+the addr will be masked off, meaning that 2048, 2050 and 3000 refers
+to the same chunk. If the user application is run in the unaligned
+chunks mode, then the incoming addr will be left untouched.
 
 
 UMEM Completion Ring
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v6 12/12] doc/af_xdp: include unaligned chunk case
@ 2019-08-27  2:25             ` Kevin Laatz
  0 siblings, 0 replies; 274+ messages in thread
From: Kevin Laatz @ 2019-08-27  2:25 UTC (permalink / raw)
  To: intel-wired-lan

With the addition of unaligned chunk mode, the documentation needs to be
updated to indicate that the incoming addr to the fill ring will only be
masked if the user application is run in aligned chunk mode. This patch
also adds a line to explicitly state that the incoming addr will not be
masked when the user application is run in unaligned chunk mode.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 Documentation/networking/af_xdp.rst | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst
index eeedc2e826aa..83f7ae5fc045 100644
--- a/Documentation/networking/af_xdp.rst
+++ b/Documentation/networking/af_xdp.rst
@@ -153,10 +153,12 @@ an example, if the UMEM is 64k and each chunk is 4k, then the UMEM has
 
 Frames passed to the kernel are used for the ingress path (RX rings).
 
-The user application produces UMEM addrs to this ring. Note that the
-kernel will mask the incoming addr. E.g. for a chunk size of 2k, the
-log2(2048) LSB of the addr will be masked off, meaning that 2048, 2050
-and 3000 refers to the same chunk.
+The user application produces UMEM addrs to this ring. Note that, if
+running the application with aligned chunk mode, the kernel will mask
+the incoming addr.  E.g. for a chunk size of 2k, the log2(2048) LSB of
+the addr will be masked off, meaning that 2048, 2050 and 3000 refers
+to the same chunk. If the user application is run in the unaligned
+chunks mode, then the incoming addr will be left untouched.
 
 
 UMEM Completion Ring
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v5 03/11] xsk: add support to allow unaligned chunk placement
  2019-08-22  1:44           ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-27  7:36             ` Maxim Mikityanskiy
  -1 siblings, 0 replies; 274+ messages in thread
From: Maxim Mikityanskiy @ 2019-08-27  7:36 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, Saeed Mahameed, stephen,
	bruce.richardson, ciara.loftus, bpf, intel-wired-lan

On 2019-08-22 04:44, Kevin Laatz wrote:
> Currently, addresses are chunk size aligned. This means, we are very
> restricted in terms of where we can place chunk within the umem. For
> example, if we have a chunk size of 2k, then our chunks can only be placed
> at 0,2k,4k,6k,8k... and so on (ie. every 2k starting from 0).
> 
> This patch introduces the ability to use unaligned chunks. With these
> changes, we are no longer bound to having to place chunks at a 2k (or
> whatever your chunk size is) interval. Since we are no longer dealing with
> aligned chunks, they can now cross page boundaries. Checks for page
> contiguity have been added in order to keep track of which pages are
> followed by a physically contiguous page.
> 
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> 
> ---
> v2:
>    - Add checks for the flags coming from userspace
>    - Fix how we get chunk_size in xsk_diag.c
>    - Add defines for masking the new descriptor format
>    - Modified the rx functions to use new descriptor format
>    - Modified the tx functions to use new descriptor format
> 
> v3:
>    - Add helper function to do address/offset masking/addition
> 
> v4:
>    - fixed page_start calculation in __xsk_rcv_memcpy().
>    - move offset handling to the xdp_umem_get_* functions
>    - modified the len field in xdp_umem_reg struct. We now use 16 bits from
>      this for the flags field.
>    - removed next_pg_contig field from xdp_umem_page struct. Using low 12
>      bits of addr to store flags instead.
>    - other minor changes based on review comments
> 
> v5:
>    - Added accessors for getting addr and offset
>    - Added helper function to add offset to addr
>    - Fixed offset handling in xsk_rcv
>    - Removed bitfields from xdp_umem_reg
>    - Added struct size checking for xdp_umem_reg in xsk_setsockopt to handle
>      different versions of the struct.
>    - fix conflicts after 'bpf-af-xdp-wakeup' was merged.
> ---
>   include/net/xdp_sock.h      | 75 +++++++++++++++++++++++++++--
>   include/uapi/linux/if_xdp.h |  9 ++++
>   net/xdp/xdp_umem.c          | 19 ++++++--
>   net/xdp/xsk.c               | 96 +++++++++++++++++++++++++++++--------
>   net/xdp/xsk_diag.c          |  2 +-
>   net/xdp/xsk_queue.h         | 68 ++++++++++++++++++++++----
>   6 files changed, 232 insertions(+), 37 deletions(-)
> 
> diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
> index f023b9940d64..c9398ce7960f 100644
> --- a/include/net/xdp_sock.h
> +++ b/include/net/xdp_sock.h
> @@ -16,6 +16,13 @@
>   struct net_device;
>   struct xsk_queue;
>   
> +/* Masks for xdp_umem_page flags.
> + * The low 12-bits of the addr will be 0 since this is the page address, so we
> + * can use them for flags.
> + */
> +#define XSK_NEXT_PG_CONTIG_SHIFT 0
> +#define XSK_NEXT_PG_CONTIG_MASK (1ULL << XSK_NEXT_PG_CONTIG_SHIFT)
> +
>   struct xdp_umem_page {
>   	void *addr;
>   	dma_addr_t dma;
> @@ -27,8 +34,12 @@ struct xdp_umem_fq_reuse {
>   	u64 handles[];
>   };
>   
> -/* Flags for the umem flags field. */
> -#define XDP_UMEM_USES_NEED_WAKEUP (1 << 0)
> +/* Flags for the umem flags field.
> + *
> + * The NEED_WAKEUP flag is 1 due to the reuse of the flags field for public
> + * flags. See inlude/uapi/include/linux/if_xdp.h.
> + */
> +#define XDP_UMEM_USES_NEED_WAKEUP (1 << 1)
>   
>   struct xdp_umem {
>   	struct xsk_queue *fq;
> @@ -124,14 +135,36 @@ void xsk_map_try_sock_delete(struct xsk_map *map, struct xdp_sock *xs,
>   int xsk_map_inc(struct xsk_map *map);
>   void xsk_map_put(struct xsk_map *map);
>   
> +static inline u64 xsk_umem_extract_addr(u64 addr)
> +{
> +	return addr & XSK_UNALIGNED_BUF_ADDR_MASK;
> +}
> +
> +static inline u64 xsk_umem_extract_offset(u64 addr)
> +{
> +	return addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
> +}
> +
> +static inline u64 xsk_umem_add_offset_to_addr(u64 addr)
> +{
> +	return xsk_umem_extract_addr(addr) + xsk_umem_extract_offset(addr);
> +}
> +
>   static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr)
>   {
> -	return umem->pages[addr >> PAGE_SHIFT].addr + (addr & (PAGE_SIZE - 1));
> +	unsigned long page_addr;
> +
> +	addr = xsk_umem_add_offset_to_addr(addr);
> +	page_addr = (unsigned long)umem->pages[addr >> PAGE_SHIFT].addr;
> +
> +	return (char *)(page_addr & PAGE_MASK) + (addr & ~PAGE_MASK);
>   }
>   
>   static inline dma_addr_t xdp_umem_get_dma(struct xdp_umem *umem, u64 addr)
>   {
> -	return umem->pages[addr >> PAGE_SHIFT].dma + (addr & (PAGE_SIZE - 1));
> +	addr = xsk_umem_add_offset_to_addr(addr);
> +
> +	return umem->pages[addr >> PAGE_SHIFT].dma + (addr & ~PAGE_MASK);
>   }
>   
>   /* Reuse-queue aware version of FILL queue helpers */
> @@ -172,6 +205,19 @@ static inline void xsk_umem_fq_reuse(struct xdp_umem *umem, u64 addr)
>   
>   	rq->handles[rq->length++] = addr;
>   }
> +
> +/* Handle the offset appropriately depending on aligned or unaligned mode.
> + * For unaligned mode, we store the offset in the upper 16-bits of the address.
> + * For aligned mode, we simply add the offset to the address.
> + */
> +static inline u64 xsk_umem_adjust_offset(struct xdp_umem *umem, u64 address,
> +					 u64 offset)
> +{
> +	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> +		return address + (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
> +	else
> +		return address + offset;
> +}
>   #else
>   static inline int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
>   {
> @@ -241,6 +287,21 @@ static inline struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
>   	return NULL;
>   }
>   
> +static inline u64 xsk_umem_extract_addr(u64 addr)
> +{
> +	return 0;
> +}
> +
> +static inline u64 xsk_umem_extract_offset(u64 addr)
> +{
> +	return 0;
> +}
> +
> +static inline u64 xsk_umem_add_offset_to_addr(u64 addr)
> +{
> +	return 0;
> +}
> +
>   static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr)
>   {
>   	return NULL;
> @@ -290,6 +351,12 @@ static inline bool xsk_umem_uses_need_wakeup(struct xdp_umem *umem)
>   	return false;
>   }
>   
> +static inline u64 xsk_umem_adjust_offset(struct xdp_umem *umem, u64 handle,
> +					 u64 offset)
> +{
> +	return 0;
> +}
> +
>   #endif /* CONFIG_XDP_SOCKETS */
>   
>   #endif /* _LINUX_XDP_SOCK_H */
> diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
> index 62b80d57b72a..be328c59389d 100644
> --- a/include/uapi/linux/if_xdp.h
> +++ b/include/uapi/linux/if_xdp.h
> @@ -26,6 +26,9 @@
>    */
>   #define XDP_USE_NEED_WAKEUP (1 << 3)
>   
> +/* Flags for xsk_umem_config flags */
> +#define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0)
> +
>   struct sockaddr_xdp {
>   	__u16 sxdp_family;
>   	__u16 sxdp_flags;
> @@ -66,6 +69,7 @@ struct xdp_umem_reg {
>   	__u64 len; /* Length of packet data area */
>   	__u32 chunk_size;
>   	__u32 headroom;
> +	__u32 flags;
>   };
>   
>   struct xdp_statistics {
> @@ -87,6 +91,11 @@ struct xdp_options {
>   #define XDP_UMEM_PGOFF_FILL_RING	0x100000000ULL
>   #define XDP_UMEM_PGOFF_COMPLETION_RING	0x180000000ULL
>   
> +/* Masks for unaligned chunks mode */
> +#define XSK_UNALIGNED_BUF_OFFSET_SHIFT 48
> +#define XSK_UNALIGNED_BUF_ADDR_MASK \
> +	((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1)
> +
>   /* Rx/Tx descriptor */
>   struct xdp_desc {
>   	__u64 addr;
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index 2d65779282a1..e997b263a0dd 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -340,6 +340,7 @@ static int xdp_umem_account_pages(struct xdp_umem *umem)
>   
>   static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>   {
> +	bool unaligned_chunks = mr->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG;
>   	u32 chunk_size = mr->chunk_size, headroom = mr->headroom;
>   	unsigned int chunks, chunks_per_page;
>   	u64 addr = mr->addr, size = mr->len;
> @@ -355,7 +356,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>   		return -EINVAL;
>   	}
>   
> -	if (!is_power_of_2(chunk_size))
> +	if (mr->flags & ~(XDP_UMEM_UNALIGNED_CHUNK_FLAG |
> +			XDP_UMEM_USES_NEED_WAKEUP))
> +		return -EINVAL;
> +
> +	if (!unaligned_chunks && !is_power_of_2(chunk_size))
>   		return -EINVAL;
>   
>   	if (!PAGE_ALIGNED(addr)) {
> @@ -372,9 +377,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>   	if (chunks == 0)
>   		return -EINVAL;
>   
> -	chunks_per_page = PAGE_SIZE / chunk_size;
> -	if (chunks < chunks_per_page || chunks % chunks_per_page)
> -		return -EINVAL;
> +	if (!unaligned_chunks) {
> +		chunks_per_page = PAGE_SIZE / chunk_size;
> +		if (chunks < chunks_per_page || chunks % chunks_per_page)
> +			return -EINVAL;
> +	}
>   
>   	headroom = ALIGN(headroom, 64);
>   
> @@ -383,13 +390,15 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>   		return -EINVAL;
>   
>   	umem->address = (unsigned long)addr;
> -	umem->chunk_mask = ~((u64)chunk_size - 1);
> +	umem->chunk_mask = unaligned_chunks ? XSK_UNALIGNED_BUF_ADDR_MASK
> +					    : ~((u64)chunk_size - 1);
>   	umem->size = size;
>   	umem->headroom = headroom;
>   	umem->chunk_size_nohr = chunk_size - headroom;
>   	umem->npgs = size / PAGE_SIZE;
>   	umem->pgs = NULL;
>   	umem->user = NULL;
> +	umem->flags = mr->flags;
>   	INIT_LIST_HEAD(&umem->xsk_list);
>   	spin_lock_init(&umem->xsk_list_lock);
>   
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index ee4428a892fa..907e5f12338f 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -45,7 +45,7 @@ EXPORT_SYMBOL(xsk_umem_has_addrs);
>   
>   u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
>   {
> -	return xskq_peek_addr(umem->fq, addr);
> +	return xskq_peek_addr(umem->fq, addr, umem);
>   }
>   EXPORT_SYMBOL(xsk_umem_peek_addr);
>   
> @@ -115,21 +115,43 @@ bool xsk_umem_uses_need_wakeup(struct xdp_umem *umem)
>   }
>   EXPORT_SYMBOL(xsk_umem_uses_need_wakeup);
>   
> +/* If a buffer crosses a page boundary, we need to do 2 memcpy's, one for
> + * each page. This is only required in copy mode.
> + */
> +static void __xsk_rcv_memcpy(struct xdp_umem *umem, u64 addr, void *from_buf,
> +			     u32 len, u32 metalen)
> +{
> +	void *to_buf = xdp_umem_get_data(umem, addr);
> +
> +	addr = xsk_umem_add_offset_to_addr(addr);
> +	if (xskq_crosses_non_contig_pg(umem, addr, len + metalen)) {
> +		void *next_pg_addr = umem->pages[(addr >> PAGE_SHIFT) + 1].addr;
> +		u64 page_start = addr & ~(PAGE_SIZE - 1);
> +		u64 first_len = PAGE_SIZE - (addr - page_start);
> +
> +		memcpy(to_buf, from_buf, first_len + metalen);
> +		memcpy(next_pg_addr, from_buf + first_len, len - first_len);
> +
> +		return;
> +	}
> +
> +	memcpy(to_buf, from_buf, len + metalen);
> +}
> +
>   static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
>   {
> -	void *to_buf, *from_buf;
> +	u64 offset = xs->umem->headroom;
> +	u64 addr, memcpy_addr;
> +	void *from_buf;
>   	u32 metalen;
> -	u64 addr;
>   	int err;
>   
> -	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
> +	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
>   	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
>   		xs->rx_dropped++;
>   		return -ENOSPC;
>   	}
>   
> -	addr += xs->umem->headroom;
> -
>   	if (unlikely(xdp_data_meta_unsupported(xdp))) {
>   		from_buf = xdp->data;
>   		metalen = 0;
> @@ -138,9 +160,11 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
>   		metalen = xdp->data - xdp->data_meta;
>   	}
>   
> -	to_buf = xdp_umem_get_data(xs->umem, addr);
> -	memcpy(to_buf, from_buf, len + metalen);
> -	addr += metalen;
> +	memcpy_addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
> +	__xsk_rcv_memcpy(xs->umem, memcpy_addr, from_buf, len, metalen);
> +
> +	offset += metalen;
> +	addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
>   	err = xskq_produce_batch_desc(xs->rx, addr, len);
>   	if (!err) {
>   		xskq_discard_addr(xs->umem->fq);
> @@ -185,6 +209,7 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
>   {
>   	u32 metalen = xdp->data - xdp->data_meta;
>   	u32 len = xdp->data_end - xdp->data;
> +	u64 offset = xs->umem->headroom;
>   	void *buffer;
>   	u64 addr;
>   	int err;
> @@ -196,17 +221,17 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
>   		goto out_unlock;
>   	}
>   
> -	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
> +	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
>   	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
>   		err = -ENOSPC;
>   		goto out_drop;
>   	}
>   
> -	addr += xs->umem->headroom;
> -
> -	buffer = xdp_umem_get_data(xs->umem, addr);
> +	buffer = xdp_umem_get_data(xs->umem, addr + offset);
>   	memcpy(buffer, xdp->data_meta, len + metalen);
> -	addr += metalen;
> +	offset += metalen;
> +
> +	addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
>   	err = xskq_produce_batch_desc(xs->rx, addr, len);
>   	if (err)
>   		goto out_drop;
> @@ -250,7 +275,7 @@ bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc)
>   
>   	rcu_read_lock();
>   	list_for_each_entry_rcu(xs, &umem->xsk_list, list) {
> -		if (!xskq_peek_desc(xs->tx, desc))
> +		if (!xskq_peek_desc(xs->tx, desc, umem))
>   			continue;
>   
>   		if (xskq_produce_addr_lazy(umem->cq, desc->addr))
> @@ -304,7 +329,7 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
>   	if (xs->queue_id >= xs->dev->real_num_tx_queues)
>   		goto out;
>   
> -	while (xskq_peek_desc(xs->tx, &desc)) {
> +	while (xskq_peek_desc(xs->tx, &desc, xs->umem)) {
>   		char *buffer;
>   		u64 addr;
>   		u32 len;
> @@ -333,7 +358,7 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
>   		skb->dev = xs->dev;
>   		skb->priority = sk->sk_priority;
>   		skb->mark = sk->sk_mark;
> -		skb_shinfo(skb)->destructor_arg = (void *)(long)addr;
> +		skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
>   		skb->destructor = xsk_destruct_skb;
>   
>   		err = dev_direct_xmit(skb, xs->queue_id);
> @@ -526,6 +551,24 @@ static struct socket *xsk_lookup_xsk_from_fd(int fd)
>   	return sock;
>   }
>   
> +/* Check if umem pages are contiguous.
> + * If zero-copy mode, use the DMA address to do the page contiguity check
> + * For all other modes we use addr (kernel virtual address)
> + * Store the result in the low bits of addr.
> + */
> +static void xsk_check_page_contiguity(struct xdp_umem *umem, u32 flags)
> +{
> +	struct xdp_umem_page *pgs = umem->pages;
> +	int i, is_contig;
> +
> +	for (i = 0; i < umem->npgs - 1; i++) {
> +		is_contig = (flags & XDP_ZEROCOPY) ?
> +			(pgs[i].dma + PAGE_SIZE == pgs[i + 1].dma) :
> +			(pgs[i].addr + PAGE_SIZE == pgs[i + 1].addr);
> +		pgs[i].addr += is_contig << XSK_NEXT_PG_CONTIG_SHIFT;
> +	}
> +}
> +
>   static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
>   {
>   	struct sockaddr_xdp *sxdp = (struct sockaddr_xdp *)addr;
> @@ -616,6 +659,8 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
>   		err = xdp_umem_assign_dev(xs->umem, dev, qid, flags);
>   		if (err)
>   			goto out_unlock;
> +
> +		xsk_check_page_contiguity(xs->umem, flags);
>   	}
>   
>   	xs->dev = dev;
> @@ -636,6 +681,13 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
>   	return err;
>   }
>   
> +struct xdp_umem_reg_v1 {
> +	__u64 addr; /* Start of packet data area */
> +	__u64 len; /* Length of packet data area */
> +	__u32 chunk_size;
> +	__u32 headroom;
> +};
> +
>   static int xsk_setsockopt(struct socket *sock, int level, int optname,
>   			  char __user *optval, unsigned int optlen)
>   {
> @@ -673,10 +725,16 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname,
>   	}
>   	case XDP_UMEM_REG:
>   	{
> -		struct xdp_umem_reg mr;
> +		size_t mr_size = sizeof(struct xdp_umem_reg);
> +		struct xdp_umem_reg mr = {};
>   		struct xdp_umem *umem;
>   
> -		if (copy_from_user(&mr, optval, sizeof(mr)))
> +		if (optlen < sizeof(struct xdp_umem_reg_v1))
> +			return -EINVAL;
> +		else if (optlen < sizeof(mr))
> +			mr_size = sizeof(struct xdp_umem_reg_v1);
> +
> +		if (copy_from_user(&mr, optval, mr_size))
>   			return -EFAULT;
>   
>   		mutex_lock(&xs->mutex);
> diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c
> index d5e06c8e0cbf..9986a759fe06 100644
> --- a/net/xdp/xsk_diag.c
> +++ b/net/xdp/xsk_diag.c
> @@ -56,7 +56,7 @@ static int xsk_diag_put_umem(const struct xdp_sock *xs, struct sk_buff *nlskb)
>   	du.id = umem->id;
>   	du.size = umem->size;
>   	du.num_pages = umem->npgs;
> -	du.chunk_size = (__u32)(~umem->chunk_mask + 1);
> +	du.chunk_size = umem->chunk_size_nohr + umem->headroom;
>   	du.headroom = umem->headroom;
>   	du.ifindex = umem->dev ? umem->dev->ifindex : 0;
>   	du.queue_id = umem->queue_id;
> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> index dd9e985c2461..6c67c9d0294f 100644
> --- a/net/xdp/xsk_queue.h
> +++ b/net/xdp/xsk_queue.h
> @@ -134,6 +134,17 @@ static inline bool xskq_has_addrs(struct xsk_queue *q, u32 cnt)
>   
>   /* UMEM queue */
>   
> +static inline bool xskq_crosses_non_contig_pg(struct xdp_umem *umem, u64 addr,
> +					      u64 length)
> +{
> +	bool cross_pg = (addr & (PAGE_SIZE - 1)) + length > PAGE_SIZE;
> +	bool next_pg_contig =
> +		(unsigned long)umem->pages[(addr >> PAGE_SHIFT)].addr &
> +			XSK_NEXT_PG_CONTIG_MASK;
> +
> +	return cross_pg && !next_pg_contig;
> +}
> +
>   static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
>   {
>   	if (addr >= q->size) {
> @@ -144,23 +155,49 @@ static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
>   	return true;
>   }
>   
> -static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr)
> +static inline bool xskq_is_valid_addr_unaligned(struct xsk_queue *q, u64 addr,
> +						u64 length,
> +						struct xdp_umem *umem)
> +{
> +	addr = xsk_umem_add_offset_to_addr(addr);
> +	if (addr >= q->size ||

I know I already asked about it, but I feel we need to clarify things 
here further.

The `addr >= q->size` check is good: it verifies that the address does not
overflow the UMEM. However, the new encoding of UMEM handles consists of
two parts, the base address of the frame (low bits) and the offset (high
bits), which are added together by xsk_umem_add_offset_to_addr. It's
possible to craft an address whose frame base address is too large (all
0xff...f) and whose offset is non-zero, such that after adding them
together the `addr < q->size` check passes even though the base address
exceeds the UMEM. In my opinion, we should not allow such addresses,
because the base address of the frame is out of bounds, so we need one
more check here. What do you think about this point?
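
(One possible shape for that extra check -- purely illustrative, not a
concrete patch -- would be to bound the 48-bit base address on its own,
before the offset is folded in; 'handle' here stands for the raw
descriptor addr as read from the ring, prior to
xsk_umem_add_offset_to_addr:)

	/* reject handles whose base frame address is already outside the umem */
	if ((handle & XSK_UNALIGNED_BUF_ADDR_MASK) >= q->size) {
		q->invalid_descs++;
		return false;
	}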

> +	    xskq_crosses_non_contig_pg(umem, addr, length)) {
> +		q->invalid_descs++;
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
> +static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr,
> +				      struct xdp_umem *umem)
>   {
>   	while (q->cons_tail != q->cons_head) {
>   		struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
>   		unsigned int idx = q->cons_tail & q->ring_mask;
>   
>   		*addr = READ_ONCE(ring->desc[idx]) & q->chunk_mask;
> +
> +		if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG) {
> +			if (xskq_is_valid_addr_unaligned(q, *addr,
> +							 umem->chunk_size_nohr,
> +							 umem))
> +				return addr;
> +			goto out;
> +		}
> +
>   		if (xskq_is_valid_addr(q, *addr))
>   			return addr;
>   
> +out:
>   		q->cons_tail++;
>   	}
>   
>   	return NULL;
>   }
>   
> -static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
> +static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr,
> +				  struct xdp_umem *umem)
>   {
>   	if (q->cons_tail == q->cons_head) {
>   		smp_mb(); /* D, matches A */
> @@ -171,7 +208,7 @@ static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
>   		smp_rmb();
>   	}
>   
> -	return xskq_validate_addr(q, addr);
> +	return xskq_validate_addr(q, addr, umem);
>   }
>   
>   static inline void xskq_discard_addr(struct xsk_queue *q)
> @@ -230,8 +267,21 @@ static inline int xskq_reserve_addr(struct xsk_queue *q)
>   
>   /* Rx/Tx queue */
>   
> -static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
> +static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d,
> +				      struct xdp_umem *umem)
>   {
> +	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG) {
> +		if (!xskq_is_valid_addr_unaligned(q, d->addr, d->len, umem))
> +			return false;
> +
> +		if (d->len > umem->chunk_size_nohr || d->options) {
> +			q->invalid_descs++;
> +			return false;
> +		}
> +
> +		return true;
> +	}
> +
>   	if (!xskq_is_valid_addr(q, d->addr))
>   		return false;
>   
> @@ -245,14 +295,15 @@ static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
>   }
>   
>   static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
> -						  struct xdp_desc *desc)
> +						  struct xdp_desc *desc,
> +						  struct xdp_umem *umem)
>   {
>   	while (q->cons_tail != q->cons_head) {
>   		struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
>   		unsigned int idx = q->cons_tail & q->ring_mask;
>   
>   		*desc = READ_ONCE(ring->desc[idx]);
> -		if (xskq_is_valid_desc(q, desc))
> +		if (xskq_is_valid_desc(q, desc, umem))
>   			return desc;
>   
>   		q->cons_tail++;
> @@ -262,7 +313,8 @@ static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
>   }
>   
>   static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
> -					      struct xdp_desc *desc)
> +					      struct xdp_desc *desc,
> +					      struct xdp_umem *umem)
>   {
>   	if (q->cons_tail == q->cons_head) {
>   		smp_mb(); /* D, matches A */
> @@ -273,7 +325,7 @@ static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
>   		smp_rmb(); /* C, matches B */
>   	}
>   
> -	return xskq_validate_desc(q, desc);
> +	return xskq_validate_desc(q, desc, umem);
>   }
>   
>   static inline void xskq_discard_desc(struct xsk_queue *q)
> 


^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v5 03/11] xsk: add support to allow unaligned chunk placement
@ 2019-08-27  7:36             ` Maxim Mikityanskiy
  0 siblings, 0 replies; 274+ messages in thread
From: Maxim Mikityanskiy @ 2019-08-27  7:36 UTC (permalink / raw)
  To: intel-wired-lan

On 2019-08-22 04:44, Kevin Laatz wrote:
> Currently, addresses are chunk size aligned. This means, we are very
> restricted in terms of where we can place chunk within the umem. For
> example, if we have a chunk size of 2k, then our chunks can only be placed
> at 0,2k,4k,6k,8k... and so on (ie. every 2k starting from 0).
> 
> This patch introduces the ability to use unaligned chunks. With these
> changes, we are no longer bound to having to place chunks at a 2k (or
> whatever your chunk size is) interval. Since we are no longer dealing with
> aligned chunks, they can now cross page boundaries. Checks for page
> contiguity have been added in order to keep track of which pages are
> followed by a physically contiguous page.
> 
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> 
> ---
> v2:
>    - Add checks for the flags coming from userspace
>    - Fix how we get chunk_size in xsk_diag.c
>    - Add defines for masking the new descriptor format
>    - Modified the rx functions to use new descriptor format
>    - Modified the tx functions to use new descriptor format
> 
> v3:
>    - Add helper function to do address/offset masking/addition
> 
> v4:
>    - fixed page_start calculation in __xsk_rcv_memcpy().
>    - move offset handling to the xdp_umem_get_* functions
>    - modified the len field in xdp_umem_reg struct. We now use 16 bits from
>      this for the flags field.
>    - removed next_pg_contig field from xdp_umem_page struct. Using low 12
>      bits of addr to store flags instead.
>    - other minor changes based on review comments
> 
> v5:
>    - Added accessors for getting addr and offset
>    - Added helper function to add offset to addr
>    - Fixed offset handling in xsk_rcv
>    - Removed bitfields from xdp_umem_reg
>    - Added struct size checking for xdp_umem_reg in xsk_setsockopt to handle
>      different versions of the struct.
>    - fix conflicts after 'bpf-af-xdp-wakeup' was merged.
> ---
>   include/net/xdp_sock.h      | 75 +++++++++++++++++++++++++++--
>   include/uapi/linux/if_xdp.h |  9 ++++
>   net/xdp/xdp_umem.c          | 19 ++++++--
>   net/xdp/xsk.c               | 96 +++++++++++++++++++++++++++++--------
>   net/xdp/xsk_diag.c          |  2 +-
>   net/xdp/xsk_queue.h         | 68 ++++++++++++++++++++++----
>   6 files changed, 232 insertions(+), 37 deletions(-)
> 
> diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
> index f023b9940d64..c9398ce7960f 100644
> --- a/include/net/xdp_sock.h
> +++ b/include/net/xdp_sock.h
> @@ -16,6 +16,13 @@
>   struct net_device;
>   struct xsk_queue;
>   
> +/* Masks for xdp_umem_page flags.
> + * The low 12-bits of the addr will be 0 since this is the page address, so we
> + * can use them for flags.
> + */
> +#define XSK_NEXT_PG_CONTIG_SHIFT 0
> +#define XSK_NEXT_PG_CONTIG_MASK (1ULL << XSK_NEXT_PG_CONTIG_SHIFT)
> +
>   struct xdp_umem_page {
>   	void *addr;
>   	dma_addr_t dma;
> @@ -27,8 +34,12 @@ struct xdp_umem_fq_reuse {
>   	u64 handles[];
>   };
>   
> -/* Flags for the umem flags field. */
> -#define XDP_UMEM_USES_NEED_WAKEUP (1 << 0)
> +/* Flags for the umem flags field.
> + *
> + * The NEED_WAKEUP flag is 1 due to the reuse of the flags field for public
> + * flags. See inlude/uapi/include/linux/if_xdp.h.
> + */
> +#define XDP_UMEM_USES_NEED_WAKEUP (1 << 1)
>   
>   struct xdp_umem {
>   	struct xsk_queue *fq;
> @@ -124,14 +135,36 @@ void xsk_map_try_sock_delete(struct xsk_map *map, struct xdp_sock *xs,
>   int xsk_map_inc(struct xsk_map *map);
>   void xsk_map_put(struct xsk_map *map);
>   
> +static inline u64 xsk_umem_extract_addr(u64 addr)
> +{
> +	return addr & XSK_UNALIGNED_BUF_ADDR_MASK;
> +}
> +
> +static inline u64 xsk_umem_extract_offset(u64 addr)
> +{
> +	return addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
> +}
> +
> +static inline u64 xsk_umem_add_offset_to_addr(u64 addr)
> +{
> +	return xsk_umem_extract_addr(addr) + xsk_umem_extract_offset(addr);
> +}
> +
>   static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr)
>   {
> -	return umem->pages[addr >> PAGE_SHIFT].addr + (addr & (PAGE_SIZE - 1));
> +	unsigned long page_addr;
> +
> +	addr = xsk_umem_add_offset_to_addr(addr);
> +	page_addr = (unsigned long)umem->pages[addr >> PAGE_SHIFT].addr;
> +
> +	return (char *)(page_addr & PAGE_MASK) + (addr & ~PAGE_MASK);
>   }
>   
>   static inline dma_addr_t xdp_umem_get_dma(struct xdp_umem *umem, u64 addr)
>   {
> -	return umem->pages[addr >> PAGE_SHIFT].dma + (addr & (PAGE_SIZE - 1));
> +	addr = xsk_umem_add_offset_to_addr(addr);
> +
> +	return umem->pages[addr >> PAGE_SHIFT].dma + (addr & ~PAGE_MASK);
>   }
>   
>   /* Reuse-queue aware version of FILL queue helpers */
> @@ -172,6 +205,19 @@ static inline void xsk_umem_fq_reuse(struct xdp_umem *umem, u64 addr)
>   
>   	rq->handles[rq->length++] = addr;
>   }
> +
> +/* Handle the offset appropriately depending on aligned or unaligned mode.
> + * For unaligned mode, we store the offset in the upper 16-bits of the address.
> + * For aligned mode, we simply add the offset to the address.
> + */
> +static inline u64 xsk_umem_adjust_offset(struct xdp_umem *umem, u64 address,
> +					 u64 offset)
> +{
> +	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> +		return address + (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
> +	else
> +		return address + offset;
> +}
>   #else
>   static inline int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
>   {
> @@ -241,6 +287,21 @@ static inline struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
>   	return NULL;
>   }
>   
> +static inline u64 xsk_umem_extract_addr(u64 addr)
> +{
> +	return 0;
> +}
> +
> +static inline u64 xsk_umem_extract_offset(u64 addr)
> +{
> +	return 0;
> +}
> +
> +static inline u64 xsk_umem_add_offset_to_addr(u64 addr)
> +{
> +	return 0;
> +}
> +
>   static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr)
>   {
>   	return NULL;
> @@ -290,6 +351,12 @@ static inline bool xsk_umem_uses_need_wakeup(struct xdp_umem *umem)
>   	return false;
>   }
>   
> +static inline u64 xsk_umem_adjust_offset(struct xdp_umem *umem, u64 handle,
> +					 u64 offset)
> +{
> +	return 0;
> +}
> +
>   #endif /* CONFIG_XDP_SOCKETS */
>   
>   #endif /* _LINUX_XDP_SOCK_H */
> diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
> index 62b80d57b72a..be328c59389d 100644
> --- a/include/uapi/linux/if_xdp.h
> +++ b/include/uapi/linux/if_xdp.h
> @@ -26,6 +26,9 @@
>    */
>   #define XDP_USE_NEED_WAKEUP (1 << 3)
>   
> +/* Flags for xsk_umem_config flags */
> +#define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0)
> +
>   struct sockaddr_xdp {
>   	__u16 sxdp_family;
>   	__u16 sxdp_flags;
> @@ -66,6 +69,7 @@ struct xdp_umem_reg {
>   	__u64 len; /* Length of packet data area */
>   	__u32 chunk_size;
>   	__u32 headroom;
> +	__u32 flags;
>   };
>   
>   struct xdp_statistics {
> @@ -87,6 +91,11 @@ struct xdp_options {
>   #define XDP_UMEM_PGOFF_FILL_RING	0x100000000ULL
>   #define XDP_UMEM_PGOFF_COMPLETION_RING	0x180000000ULL
>   
> +/* Masks for unaligned chunks mode */
> +#define XSK_UNALIGNED_BUF_OFFSET_SHIFT 48
> +#define XSK_UNALIGNED_BUF_ADDR_MASK \
> +	((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1)
> +
>   /* Rx/Tx descriptor */
>   struct xdp_desc {
>   	__u64 addr;
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index 2d65779282a1..e997b263a0dd 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -340,6 +340,7 @@ static int xdp_umem_account_pages(struct xdp_umem *umem)
>   
>   static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>   {
> +	bool unaligned_chunks = mr->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG;
>   	u32 chunk_size = mr->chunk_size, headroom = mr->headroom;
>   	unsigned int chunks, chunks_per_page;
>   	u64 addr = mr->addr, size = mr->len;
> @@ -355,7 +356,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>   		return -EINVAL;
>   	}
>   
> -	if (!is_power_of_2(chunk_size))
> +	if (mr->flags & ~(XDP_UMEM_UNALIGNED_CHUNK_FLAG |
> +			XDP_UMEM_USES_NEED_WAKEUP))
> +		return -EINVAL;
> +
> +	if (!unaligned_chunks && !is_power_of_2(chunk_size))
>   		return -EINVAL;
>   
>   	if (!PAGE_ALIGNED(addr)) {
> @@ -372,9 +377,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>   	if (chunks == 0)
>   		return -EINVAL;
>   
> -	chunks_per_page = PAGE_SIZE / chunk_size;
> -	if (chunks < chunks_per_page || chunks % chunks_per_page)
> -		return -EINVAL;
> +	if (!unaligned_chunks) {
> +		chunks_per_page = PAGE_SIZE / chunk_size;
> +		if (chunks < chunks_per_page || chunks % chunks_per_page)
> +			return -EINVAL;
> +	}
>   
>   	headroom = ALIGN(headroom, 64);
>   
> @@ -383,13 +390,15 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>   		return -EINVAL;
>   
>   	umem->address = (unsigned long)addr;
> -	umem->chunk_mask = ~((u64)chunk_size - 1);
> +	umem->chunk_mask = unaligned_chunks ? XSK_UNALIGNED_BUF_ADDR_MASK
> +					    : ~((u64)chunk_size - 1);
>   	umem->size = size;
>   	umem->headroom = headroom;
>   	umem->chunk_size_nohr = chunk_size - headroom;
>   	umem->npgs = size / PAGE_SIZE;
>   	umem->pgs = NULL;
>   	umem->user = NULL;
> +	umem->flags = mr->flags;
>   	INIT_LIST_HEAD(&umem->xsk_list);
>   	spin_lock_init(&umem->xsk_list_lock);
>   
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index ee4428a892fa..907e5f12338f 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -45,7 +45,7 @@ EXPORT_SYMBOL(xsk_umem_has_addrs);
>   
>   u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
>   {
> -	return xskq_peek_addr(umem->fq, addr);
> +	return xskq_peek_addr(umem->fq, addr, umem);
>   }
>   EXPORT_SYMBOL(xsk_umem_peek_addr);
>   
> @@ -115,21 +115,43 @@ bool xsk_umem_uses_need_wakeup(struct xdp_umem *umem)
>   }
>   EXPORT_SYMBOL(xsk_umem_uses_need_wakeup);
>   
> +/* If a buffer crosses a page boundary, we need to do 2 memcpy's, one for
> + * each page. This is only required in copy mode.
> + */
> +static void __xsk_rcv_memcpy(struct xdp_umem *umem, u64 addr, void *from_buf,
> +			     u32 len, u32 metalen)
> +{
> +	void *to_buf = xdp_umem_get_data(umem, addr);
> +
> +	addr = xsk_umem_add_offset_to_addr(addr);
> +	if (xskq_crosses_non_contig_pg(umem, addr, len + metalen)) {
> +		void *next_pg_addr = umem->pages[(addr >> PAGE_SHIFT) + 1].addr;
> +		u64 page_start = addr & ~(PAGE_SIZE - 1);
> +		u64 first_len = PAGE_SIZE - (addr - page_start);
> +
> +		memcpy(to_buf, from_buf, first_len + metalen);
> +		memcpy(next_pg_addr, from_buf + first_len, len - first_len);
> +
> +		return;
> +	}
> +
> +	memcpy(to_buf, from_buf, len + metalen);
> +}
> +
>   static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
>   {
> -	void *to_buf, *from_buf;
> +	u64 offset = xs->umem->headroom;
> +	u64 addr, memcpy_addr;
> +	void *from_buf;
>   	u32 metalen;
> -	u64 addr;
>   	int err;
>   
> -	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
> +	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
>   	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
>   		xs->rx_dropped++;
>   		return -ENOSPC;
>   	}
>   
> -	addr += xs->umem->headroom;
> -
>   	if (unlikely(xdp_data_meta_unsupported(xdp))) {
>   		from_buf = xdp->data;
>   		metalen = 0;
> @@ -138,9 +160,11 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
>   		metalen = xdp->data - xdp->data_meta;
>   	}
>   
> -	to_buf = xdp_umem_get_data(xs->umem, addr);
> -	memcpy(to_buf, from_buf, len + metalen);
> -	addr += metalen;
> +	memcpy_addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
> +	__xsk_rcv_memcpy(xs->umem, memcpy_addr, from_buf, len, metalen);
> +
> +	offset += metalen;
> +	addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
>   	err = xskq_produce_batch_desc(xs->rx, addr, len);
>   	if (!err) {
>   		xskq_discard_addr(xs->umem->fq);
> @@ -185,6 +209,7 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
>   {
>   	u32 metalen = xdp->data - xdp->data_meta;
>   	u32 len = xdp->data_end - xdp->data;
> +	u64 offset = xs->umem->headroom;
>   	void *buffer;
>   	u64 addr;
>   	int err;
> @@ -196,17 +221,17 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
>   		goto out_unlock;
>   	}
>   
> -	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
> +	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
>   	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
>   		err = -ENOSPC;
>   		goto out_drop;
>   	}
>   
> -	addr += xs->umem->headroom;
> -
> -	buffer = xdp_umem_get_data(xs->umem, addr);
> +	buffer = xdp_umem_get_data(xs->umem, addr + offset);
>   	memcpy(buffer, xdp->data_meta, len + metalen);
> -	addr += metalen;
> +	offset += metalen;
> +
> +	addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
>   	err = xskq_produce_batch_desc(xs->rx, addr, len);
>   	if (err)
>   		goto out_drop;
> @@ -250,7 +275,7 @@ bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc)
>   
>   	rcu_read_lock();
>   	list_for_each_entry_rcu(xs, &umem->xsk_list, list) {
> -		if (!xskq_peek_desc(xs->tx, desc))
> +		if (!xskq_peek_desc(xs->tx, desc, umem))
>   			continue;
>   
>   		if (xskq_produce_addr_lazy(umem->cq, desc->addr))
> @@ -304,7 +329,7 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
>   	if (xs->queue_id >= xs->dev->real_num_tx_queues)
>   		goto out;
>   
> -	while (xskq_peek_desc(xs->tx, &desc)) {
> +	while (xskq_peek_desc(xs->tx, &desc, xs->umem)) {
>   		char *buffer;
>   		u64 addr;
>   		u32 len;
> @@ -333,7 +358,7 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
>   		skb->dev = xs->dev;
>   		skb->priority = sk->sk_priority;
>   		skb->mark = sk->sk_mark;
> -		skb_shinfo(skb)->destructor_arg = (void *)(long)addr;
> +		skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
>   		skb->destructor = xsk_destruct_skb;
>   
>   		err = dev_direct_xmit(skb, xs->queue_id);
> @@ -526,6 +551,24 @@ static struct socket *xsk_lookup_xsk_from_fd(int fd)
>   	return sock;
>   }
>   
> +/* Check if umem pages are contiguous.
> + * If zero-copy mode, use the DMA address to do the page contiguity check
> + * For all other modes we use addr (kernel virtual address)
> + * Store the result in the low bits of addr.
> + */
> +static void xsk_check_page_contiguity(struct xdp_umem *umem, u32 flags)
> +{
> +	struct xdp_umem_page *pgs = umem->pages;
> +	int i, is_contig;
> +
> +	for (i = 0; i < umem->npgs - 1; i++) {
> +		is_contig = (flags & XDP_ZEROCOPY) ?
> +			(pgs[i].dma + PAGE_SIZE == pgs[i + 1].dma) :
> +			(pgs[i].addr + PAGE_SIZE == pgs[i + 1].addr);
> +		pgs[i].addr += is_contig << XSK_NEXT_PG_CONTIG_SHIFT;
> +	}
> +}
> +
>   static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
>   {
>   	struct sockaddr_xdp *sxdp = (struct sockaddr_xdp *)addr;
> @@ -616,6 +659,8 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
>   		err = xdp_umem_assign_dev(xs->umem, dev, qid, flags);
>   		if (err)
>   			goto out_unlock;
> +
> +		xsk_check_page_contiguity(xs->umem, flags);
>   	}
>   
>   	xs->dev = dev;
> @@ -636,6 +681,13 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
>   	return err;
>   }
>   
> +struct xdp_umem_reg_v1 {
> +	__u64 addr; /* Start of packet data area */
> +	__u64 len; /* Length of packet data area */
> +	__u32 chunk_size;
> +	__u32 headroom;
> +};
> +
>   static int xsk_setsockopt(struct socket *sock, int level, int optname,
>   			  char __user *optval, unsigned int optlen)
>   {
> @@ -673,10 +725,16 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname,
>   	}
>   	case XDP_UMEM_REG:
>   	{
> -		struct xdp_umem_reg mr;
> +		size_t mr_size = sizeof(struct xdp_umem_reg);
> +		struct xdp_umem_reg mr = {};
>   		struct xdp_umem *umem;
>   
> -		if (copy_from_user(&mr, optval, sizeof(mr)))
> +		if (optlen < sizeof(struct xdp_umem_reg_v1))
> +			return -EINVAL;
> +		else if (optlen < sizeof(mr))
> +			mr_size = sizeof(struct xdp_umem_reg_v1);
> +
> +		if (copy_from_user(&mr, optval, mr_size))
>   			return -EFAULT;
>   
>   		mutex_lock(&xs->mutex);
> diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c
> index d5e06c8e0cbf..9986a759fe06 100644
> --- a/net/xdp/xsk_diag.c
> +++ b/net/xdp/xsk_diag.c
> @@ -56,7 +56,7 @@ static int xsk_diag_put_umem(const struct xdp_sock *xs, struct sk_buff *nlskb)
>   	du.id = umem->id;
>   	du.size = umem->size;
>   	du.num_pages = umem->npgs;
> -	du.chunk_size = (__u32)(~umem->chunk_mask + 1);
> +	du.chunk_size = umem->chunk_size_nohr + umem->headroom;
>   	du.headroom = umem->headroom;
>   	du.ifindex = umem->dev ? umem->dev->ifindex : 0;
>   	du.queue_id = umem->queue_id;
> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> index dd9e985c2461..6c67c9d0294f 100644
> --- a/net/xdp/xsk_queue.h
> +++ b/net/xdp/xsk_queue.h
> @@ -134,6 +134,17 @@ static inline bool xskq_has_addrs(struct xsk_queue *q, u32 cnt)
>   
>   /* UMEM queue */
>   
> +static inline bool xskq_crosses_non_contig_pg(struct xdp_umem *umem, u64 addr,
> +					      u64 length)
> +{
> +	bool cross_pg = (addr & (PAGE_SIZE - 1)) + length > PAGE_SIZE;
> +	bool next_pg_contig =
> +		(unsigned long)umem->pages[(addr >> PAGE_SHIFT)].addr &
> +			XSK_NEXT_PG_CONTIG_MASK;
> +
> +	return cross_pg && !next_pg_contig;
> +}
> +
>   static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
>   {
>   	if (addr >= q->size) {
> @@ -144,23 +155,49 @@ static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
>   	return true;
>   }
>   
> -static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr)
> +static inline bool xskq_is_valid_addr_unaligned(struct xsk_queue *q, u64 addr,
> +						u64 length,
> +						struct xdp_umem *umem)
> +{
> +	addr = xsk_umem_add_offset_to_addr(addr);
> +	if (addr >= q->size ||

I know I already asked about it, but I feel we need to clarify things 
here further.

The `addr >= q->size` check is good: it verifies that the address does not
overflow the UMEM. However, the new encoding of UMEM handles consists of
two parts, the base address of the frame (low bits) and the offset (high
bits), which are added together by xsk_umem_add_offset_to_addr. It's
possible to craft an address whose frame base address is too large (all
0xff...f) and whose offset is non-zero, such that after adding them
together the `addr < q->size` check passes even though the base address
exceeds the UMEM. In my opinion, we should not allow such addresses,
because the base address of the frame is out of bounds, so we need one
more check here. What do you think about this point?

> +	    xskq_crosses_non_contig_pg(umem, addr, length)) {
> +		q->invalid_descs++;
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
> +static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr,
> +				      struct xdp_umem *umem)
>   {
>   	while (q->cons_tail != q->cons_head) {
>   		struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
>   		unsigned int idx = q->cons_tail & q->ring_mask;
>   
>   		*addr = READ_ONCE(ring->desc[idx]) & q->chunk_mask;
> +
> +		if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG) {
> +			if (xskq_is_valid_addr_unaligned(q, *addr,
> +							 umem->chunk_size_nohr,
> +							 umem))
> +				return addr;
> +			goto out;
> +		}
> +
>   		if (xskq_is_valid_addr(q, *addr))
>   			return addr;
>   
> +out:
>   		q->cons_tail++;
>   	}
>   
>   	return NULL;
>   }
>   
> -static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
> +static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr,
> +				  struct xdp_umem *umem)
>   {
>   	if (q->cons_tail == q->cons_head) {
>   		smp_mb(); /* D, matches A */
> @@ -171,7 +208,7 @@ static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
>   		smp_rmb();
>   	}
>   
> -	return xskq_validate_addr(q, addr);
> +	return xskq_validate_addr(q, addr, umem);
>   }
>   
>   static inline void xskq_discard_addr(struct xsk_queue *q)
> @@ -230,8 +267,21 @@ static inline int xskq_reserve_addr(struct xsk_queue *q)
>   
>   /* Rx/Tx queue */
>   
> -static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
> +static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d,
> +				      struct xdp_umem *umem)
>   {
> +	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG) {
> +		if (!xskq_is_valid_addr_unaligned(q, d->addr, d->len, umem))
> +			return false;
> +
> +		if (d->len > umem->chunk_size_nohr || d->options) {
> +			q->invalid_descs++;
> +			return false;
> +		}
> +
> +		return true;
> +	}
> +
>   	if (!xskq_is_valid_addr(q, d->addr))
>   		return false;
>   
> @@ -245,14 +295,15 @@ static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
>   }
>   
>   static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
> -						  struct xdp_desc *desc)
> +						  struct xdp_desc *desc,
> +						  struct xdp_umem *umem)
>   {
>   	while (q->cons_tail != q->cons_head) {
>   		struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
>   		unsigned int idx = q->cons_tail & q->ring_mask;
>   
>   		*desc = READ_ONCE(ring->desc[idx]);
> -		if (xskq_is_valid_desc(q, desc))
> +		if (xskq_is_valid_desc(q, desc, umem))
>   			return desc;
>   
>   		q->cons_tail++;
> @@ -262,7 +313,8 @@ static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
>   }
>   
>   static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
> -					      struct xdp_desc *desc)
> +					      struct xdp_desc *desc,
> +					      struct xdp_umem *umem)
>   {
>   	if (q->cons_tail == q->cons_head) {
>   		smp_mb(); /* D, matches A */
> @@ -273,7 +325,7 @@ static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
>   		smp_rmb(); /* C, matches B */
>   	}
>   
> -	return xskq_validate_desc(q, desc);
> +	return xskq_validate_desc(q, desc, umem);
>   }
>   
>   static inline void xskq_discard_desc(struct xsk_queue *q)
> 


^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v6 01/12] i40e: simplify Rx buffer recycle
  2019-08-27  2:25             ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-30 15:37               ` Jonathan Lemon
  -1 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-08-30 15:37 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, saeedm, maximmi, stephen, bruce.richardson,
	ciara.loftus, bpf, intel-wired-lan



On 26 Aug 2019, at 19:25, Kevin Laatz wrote:

> Currently, the dma, addr and handle are modified when we reuse Rx buffers
> in zero-copy mode. However, this is not required as the inputs to the
> function are copies, not the original values themselves. As we use the
> copies within the function, we can use the original 'old_bi' values
> directly without having to mask and add the headroom.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v6 01/12] i40e: simplify Rx buffer recycle
@ 2019-08-30 15:37               ` Jonathan Lemon
  0 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-08-30 15:37 UTC (permalink / raw)
  To: intel-wired-lan



On 26 Aug 2019, at 19:25, Kevin Laatz wrote:

> Currently, the dma, addr and handle are modified when we reuse Rx buffers
> in zero-copy mode. However, this is not required as the inputs to the
> function are copies, not the original values themselves. As we use the
> copies within the function, we can use the original 'old_bi' values
> directly without having to mask and add the headroom.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v6 02/12] ixgbe: simplify Rx buffer recycle
  2019-08-27  2:25             ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-30 15:39               ` Jonathan Lemon
  -1 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-08-30 15:39 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, saeedm, maximmi, stephen, bruce.richardson,
	ciara.loftus, bpf, intel-wired-lan

On 26 Aug 2019, at 19:25, Kevin Laatz wrote:

> Currently, the dma, addr and handle are modified when we reuse Rx buffers
> in zero-copy mode. However, this is not required as the inputs to the
> function are copies, not the original values themselves. As we use the
> copies within the function, we can use the original 'obi' values
> directly without having to mask and add the headroom.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v6 02/12] ixgbe: simplify Rx buffer recycle
@ 2019-08-30 15:39               ` Jonathan Lemon
  0 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-08-30 15:39 UTC (permalink / raw)
  To: intel-wired-lan

On 26 Aug 2019, at 19:25, Kevin Laatz wrote:

> Currently, the dma, addr and handle are modified when we reuse Rx buffers
> in zero-copy mode. However, this is not required as the inputs to the
> function are copies, not the original values themselves. As we use the
> copies within the function, we can use the original 'obi' values
> directly without having to mask and add the headroom.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v6 03/12] xsk: add support to allow unaligned chunk placement
  2019-08-27  2:25             ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-30 15:41               ` Jonathan Lemon
  -1 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-08-30 15:41 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, saeedm, maximmi, stephen, bruce.richardson,
	ciara.loftus, bpf, intel-wired-lan



On 26 Aug 2019, at 19:25, Kevin Laatz wrote:

> Currently, addresses are chunk size aligned. This means we are very
> restricted in terms of where we can place a chunk within the umem. For
> example, if we have a chunk size of 2k, then our chunks can only be placed
> at 0, 2k, 4k, 6k, 8k... and so on (i.e. every 2k starting from 0).
>
> This patch introduces the ability to use unaligned chunks. With these
> changes, we are no longer required to place chunks at a 2k (or whatever
> your chunk size is) interval. Since we are no longer dealing with
> aligned chunks, they can now cross page boundaries. Checks for page
> contiguity have been added in order to keep track of which pages are
> followed by a physically contiguous page.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
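
A small sketch of the page-contiguity bookkeeping described above, using made-up structure and field names (the in-kernel representation differs; later revisions of the series fold the flag into spare address bits rather than a separate field):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define DEMO_PAGE_SIZE 4096ULL

    /* Illustration only: per umem page, record whether the following page
     * is physically contiguous, so a chunk crossing the boundary can still
     * be used as one buffer. */
    struct demo_umem_page {
        uint64_t phys;       /* physical address of this page */
        bool next_contig;    /* next page physically follows this one */
    };

    static void demo_mark_contiguity(struct demo_umem_page *pgs, size_t npgs)
    {
        for (size_t i = 0; i + 1 < npgs; i++)
            pgs[i].next_contig =
                (pgs[i].phys + DEMO_PAGE_SIZE == pgs[i + 1].phys);
    }

    /* A chunk [addr, addr + len) is usable if it stays within one page or
     * starts on a page whose successor is contiguous. */
    static bool demo_chunk_usable(const struct demo_umem_page *pgs,
                                  uint64_t addr, uint32_t len)
    {
        uint64_t first = addr / DEMO_PAGE_SIZE;
        uint64_t last = (addr + len - 1) / DEMO_PAGE_SIZE;

        return first == last || pgs[first].next_contig;
    }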

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v6 03/12] xsk: add support to allow unaligned chunk placement
@ 2019-08-30 15:41               ` Jonathan Lemon
  0 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-08-30 15:41 UTC (permalink / raw)
  To: intel-wired-lan



On 26 Aug 2019, at 19:25, Kevin Laatz wrote:

> Currently, addresses are chunk size aligned. This means we are very
> restricted in terms of where we can place a chunk within the umem. For
> example, if we have a chunk size of 2k, then our chunks can only be placed
> at 0, 2k, 4k, 6k, 8k... and so on (i.e. every 2k starting from 0).
>
> This patch introduces the ability to use unaligned chunks. With these
> changes, we are no longer required to place chunks at a 2k (or whatever
> your chunk size is) interval. Since we are no longer dealing with
> aligned chunks, they can now cross page boundaries. Checks for page
> contiguity have been added in order to keep track of which pages are
> followed by a physically contiguous page.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v6 04/12] i40e: modify driver for handling offsets
  2019-08-27  2:25             ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-30 15:42               ` Jonathan Lemon
  -1 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-08-30 15:42 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, saeedm, maximmi, stephen, bruce.richardson,
	ciara.loftus, bpf, intel-wired-lan



On 26 Aug 2019, at 19:25, Kevin Laatz wrote:

> With the addition of the unaligned chunks option, we need to make sure we
> handle the offsets accordingly based on the mode we are currently running
> in. This patch modifies the driver to appropriately mask the address for
> each case.
>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
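
A sketch of the per-mode address handling described above. The 48-bit address / 16-bit offset split is the descriptor format described in the cover letter later in this thread; the helper and constant names below are illustrative, not the driver's:

    #include <stdint.h>

    #define DEMO_OFFSET_SHIFT 48
    #define DEMO_ADDR_MASK    ((1ULL << DEMO_OFFSET_SHIFT) - 1)

    static uint64_t demo_handle_to_addr(uint64_t handle, uint64_t chunk_mask,
                                        int unaligned)
    {
        if (unaligned)
            /* unaligned mode: base address in the low 48 bits, offset
             * (headroom etc.) carried in the high 16 bits */
            return (handle & DEMO_ADDR_MASK) + (handle >> DEMO_OFFSET_SHIFT);

        /* aligned mode: mask back to the chunk-aligned base, as before */
        return handle & chunk_mask;
    }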

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v6 04/12] i40e: modify driver for handling offsets
@ 2019-08-30 15:42               ` Jonathan Lemon
  0 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-08-30 15:42 UTC (permalink / raw)
  To: intel-wired-lan



On 26 Aug 2019, at 19:25, Kevin Laatz wrote:

> With the addition of the unaligned chunks option, we need to make sure we
> handle the offsets accordingly based on the mode we are currently running
> in. This patch modifies the driver to appropriately mask the address for
> each case.
>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v6 05/12] ixgbe: modify driver for handling offsets
  2019-08-27  2:25             ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-30 15:42               ` Jonathan Lemon
  -1 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-08-30 15:42 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, saeedm, maximmi, stephen, bruce.richardson,
	ciara.loftus, bpf, intel-wired-lan



On 26 Aug 2019, at 19:25, Kevin Laatz wrote:

> With the addition of the unaligned chunks option, we need to make sure we
> handle the offsets accordingly based on the mode we are currently running
> in. This patch modifies the driver to appropriately mask the address for
> each case.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v6 05/12] ixgbe: modify driver for handling offsets
@ 2019-08-30 15:42               ` Jonathan Lemon
  0 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-08-30 15:42 UTC (permalink / raw)
  To: intel-wired-lan



On 26 Aug 2019, at 19:25, Kevin Laatz wrote:

> With the addition of the unaligned chunks option, we need to make sure we
> handle the offsets accordingly based on the mode we are currently running
> in. This patch modifies the driver to appropriately mask the address for
> each case.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v6 06/12] mlx5e: modify driver for handling offsets
  2019-08-27  2:25             ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-30 15:43               ` Jonathan Lemon
  -1 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-08-30 15:43 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, saeedm, maximmi, stephen, bruce.richardson,
	ciara.loftus, bpf, intel-wired-lan



On 26 Aug 2019, at 19:25, Kevin Laatz wrote:

> With the addition of the unaligned chunks option, we need to make sure we
> handle the offsets accordingly based on the mode we are currently running
> in. This patch modifies the driver to appropriately mask the address for
> each case.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v6 06/12] mlx5e: modify driver for handling offsets
@ 2019-08-30 15:43               ` Jonathan Lemon
  0 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-08-30 15:43 UTC (permalink / raw)
  To: intel-wired-lan



On 26 Aug 2019, at 19:25, Kevin Laatz wrote:

> With the addition of the unaligned chunks option, we need to make sure we
> handle the offsets accordingly based on the mode we are currently running
> in. This patch modifies the driver to appropriately mask the address for
> each case.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v6 07/12] net/mlx5e: Allow XSK frames smaller than a page
  2019-08-27  2:25             ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-30 15:45               ` Jonathan Lemon
  -1 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-08-30 15:45 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, saeedm, maximmi, stephen, bruce.richardson,
	ciara.loftus, bpf, intel-wired-lan



On 26 Aug 2019, at 19:25, Kevin Laatz wrote:

> From: Maxim Mikityanskiy <maximmi@mellanox.com>
>
> Relax the requirements on the XSK frame size to allow it to be smaller
> than a page and even not a power of two. The current implementation can
> work in this mode, both with Striding RQ and without it.
>
> The code that checks `mtu + headroom <= XSK frame size` is modified
> accordingly. Any frame size between 2048 and PAGE_SIZE is accepted.
>
> Functions that worked with pages only now work with XSK frames, even if
> their size is different from PAGE_SIZE.
>
> With XSK queues, regardless of the frame size, Striding RQ uses the
> stride size of PAGE_SIZE, and UMR MTTs are posted using starting
> addresses of frames, but PAGE_SIZE as page size. MTU guarantees that no
> packet data will overlap with other frames. UMR MTT size is made equal
> to the stride size of the RQ, because UMEM frames may come in random
> order, and we need to handle them one by one. PAGE_SIZE is just a power
> of two that is bigger than any allowed XSK frame size, and also it
> doesn't require making additional changes to the code.
>
> Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
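
A sketch of the relaxed check described above (illustrative only, not the mlx5e code): any frame size between 2048 and PAGE_SIZE is accepted, provided MTU plus headroom still fits within one frame.

    #include <stdbool.h>
    #include <stdint.h>

    #define DEMO_PAGE_SIZE 4096u

    static bool demo_xsk_frame_size_ok(uint32_t frame_size, uint32_t mtu,
                                       uint32_t headroom)
    {
        /* frame sizes from 2048 up to PAGE_SIZE are allowed */
        if (frame_size < 2048 || frame_size > DEMO_PAGE_SIZE)
            return false;

        /* the packet plus headroom must fit in a single frame */
        return mtu + headroom <= frame_size;
    }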

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v6 07/12] net/mlx5e: Allow XSK frames smaller than a page
@ 2019-08-30 15:45               ` Jonathan Lemon
  0 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-08-30 15:45 UTC (permalink / raw)
  To: intel-wired-lan



On 26 Aug 2019, at 19:25, Kevin Laatz wrote:

> From: Maxim Mikityanskiy <maximmi@mellanox.com>
>
> Relax the requirements on the XSK frame size to allow it to be smaller
> than a page and even not a power of two. The current implementation can
> work in this mode, both with Striding RQ and without it.
>
> The code that checks `mtu + headroom <= XSK frame size` is modified
> accordingly. Any frame size between 2048 and PAGE_SIZE is accepted.
>
> Functions that worked with pages only now work with XSK frames, even if
> their size is different from PAGE_SIZE.
>
> With XSK queues, regardless of the frame size, Striding RQ uses the
> stride size of PAGE_SIZE, and UMR MTTs are posted using starting
> addresses of frames, but PAGE_SIZE as page size. MTU guarantees that no
> packet data will overlap with other frames. UMR MTT size is made equal
> to the stride size of the RQ, because UMEM frames may come in random
> order, and we need to handle them one by one. PAGE_SIZE is just a power
> of two that is bigger than any allowed XSK frame size, and also it
> doesn't require making additional changes to the code.
>
> Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v6 08/12] libbpf: add flags to umem config
  2019-08-27  2:25             ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-30 15:46               ` Jonathan Lemon
  -1 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-08-30 15:46 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, saeedm, maximmi, stephen, bruce.richardson,
	ciara.loftus, bpf, intel-wired-lan



On 26 Aug 2019, at 19:25, Kevin Laatz wrote:

> This patch adds a 'flags' field to the umem_config and umem_reg structs.
> This will allow for more options to be added for configuring umems.
>
> The first use for the flags field is to add a flag for unaligned chunks
> mode. These flags can either be user-provided or filled with a default.
>
> Since we change the size of the xsk_umem_config struct, we need to version
> the ABI. This patch includes the ABI versioning for xsk_umem__create. The
> Makefile was also updated to handle multiple function versions in
> check-abi.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
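
A sketch of how an application might request unaligned-chunk mode through the new flags field, assuming the libbpf xsk.h helpers and the uapi flag introduced by this series; the frame size and default ring sizes are just example values.

    #include <stdint.h>
    #include <bpf/xsk.h>           /* libbpf AF_XDP helpers */
    #include <linux/if_xdp.h>      /* XDP_UMEM_UNALIGNED_CHUNK_FLAG */

    static int demo_create_unaligned_umem(struct xsk_umem **umem, void *buffer,
                                          uint64_t size, struct xsk_ring_prod *fq,
                                          struct xsk_ring_cons *cq)
    {
        struct xsk_umem_config cfg = {
            .fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
            .comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
            .frame_size = 2176,    /* need not be a power of two in this mode */
            .frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM,
            .flags = XDP_UMEM_UNALIGNED_CHUNK_FLAG,
        };

        return xsk_umem__create(umem, buffer, size, fq, cq, &cfg);
    }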

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v6 08/12] libbpf: add flags to umem config
@ 2019-08-30 15:46               ` Jonathan Lemon
  0 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-08-30 15:46 UTC (permalink / raw)
  To: intel-wired-lan



On 26 Aug 2019, at 19:25, Kevin Laatz wrote:

> This patch adds a 'flags' field to the umem_config and umem_reg structs.
> This will allow for more options to be added for configuring umems.
>
> The first use for the flags field is to add a flag for unaligned chunks
> mode. These flags can either be user-provided or filled with a default.
>
> Since we change the size of the xsk_umem_config struct, we need to version
> the ABI. This patch includes the ABI versioning for xsk_umem__create. The
> Makefile was also updated to handle multiple function versions in
> check-abi.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v6 09/12] samples/bpf: add unaligned chunks mode support to xdpsock
  2019-08-27  2:25             ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-30 15:47               ` Jonathan Lemon
  -1 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-08-30 15:47 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, saeedm, maximmi, stephen, bruce.richardson,
	ciara.loftus, bpf, intel-wired-lan



On 26 Aug 2019, at 19:25, Kevin Laatz wrote:

> This patch adds support for the unaligned chunks mode. The addition of the
> unaligned chunks option will allow users to run the application with more
> relaxed chunk placement in the XDP umem.
>
> Unaligned chunks mode can be used with the '-u' or '--unaligned' command
> line options.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
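
A minimal sketch of wiring such an option into the umem flags, in the spirit of the xdpsock change (names are illustrative; the real option handling in the sample differs in detail):

    #include <getopt.h>
    #include <stdint.h>
    #include <linux/if_xdp.h>      /* XDP_UMEM_UNALIGNED_CHUNK_FLAG */

    static uint32_t demo_umem_flags;

    static const struct option demo_long_opts[] = {
        { "unaligned", no_argument, NULL, 'u' },
        { NULL, 0, NULL, 0 },
    };

    static void demo_parse_args(int argc, char **argv)
    {
        int c;

        /* '-u' / '--unaligned' selects the unaligned-chunk umem flag */
        while ((c = getopt_long(argc, argv, "u", demo_long_opts, NULL)) != -1) {
            if (c == 'u')
                demo_umem_flags |= XDP_UMEM_UNALIGNED_CHUNK_FLAG;
        }
    }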

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v6 09/12] samples/bpf: add unaligned chunks mode support to xdpsock
@ 2019-08-30 15:47               ` Jonathan Lemon
  0 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-08-30 15:47 UTC (permalink / raw)
  To: intel-wired-lan



On 26 Aug 2019, at 19:25, Kevin Laatz wrote:

> This patch adds support for the unaligned chunks mode. The addition of the
> unaligned chunks option will allow users to run the application with more
> relaxed chunk placement in the XDP umem.
>
> Unaligned chunks mode can be used with the '-u' or '--unaligned' command
> line options.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v6 10/12] samples/bpf: add buffer recycling for unaligned chunks to xdpsock
  2019-08-27  2:25             ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-30 15:49               ` Jonathan Lemon
  -1 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-08-30 15:49 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, saeedm, maximmi, stephen, bruce.richardson,
	ciara.loftus, bpf, intel-wired-lan



On 26 Aug 2019, at 19:25, Kevin Laatz wrote:

> This patch adds buffer recycling support for unaligned buffers. Since we
> don't mask the addr to 2k at umem_reg in unaligned mode, we need to make
> sure we give back the correct (original) addr to the fill queue. We achieve
> this using the new descriptor format and associated masks. The new format
> uses the upper 16-bits for the offset and the lower 48-bits for the addr.
> Since we have a field for the offset, we no longer need to modify the
> actual address. As such, all we have to do to get back the original address
> is to mask for the lower 48 bits (i.e. strip the offset and we get the
> address on its own).
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
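
A sketch of the address recovery described above, complementing the offset-handling sketch earlier in the thread. The constant names are illustrative; the uapi header added by this series defines equivalents:

    #include <stdint.h>

    #define DEMO_OFFSET_SHIFT 48
    #define DEMO_ADDR_MASK    ((1ULL << DEMO_OFFSET_SHIFT) - 1)

    /* The original address handed back to the fill ring is just the low
     * 48 bits of the descriptor address. */
    static uint64_t demo_orig_addr(uint64_t desc_addr)
    {
        return desc_addr & DEMO_ADDR_MASK;
    }

    /* The headroom/offset part lives in the upper 16 bits. */
    static uint64_t demo_offset(uint64_t desc_addr)
    {
        return desc_addr >> DEMO_OFFSET_SHIFT;
    }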

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v6 10/12] samples/bpf: add buffer recycling for unaligned chunks to xdpsock
@ 2019-08-30 15:49               ` Jonathan Lemon
  0 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-08-30 15:49 UTC (permalink / raw)
  To: intel-wired-lan



On 26 Aug 2019, at 19:25, Kevin Laatz wrote:

> This patch adds buffer recycling support for unaligned buffers. Since we
> don't mask the addr to 2k at umem_reg in unaligned mode, we need to make
> sure we give back the correct (original) addr to the fill queue. We achieve
> this using the new descriptor format and associated masks. The new format
> uses the upper 16-bits for the offset and the lower 48-bits for the addr.
> Since we have a field for the offset, we no longer need to modify the
> actual address. As such, all we have to do to get back the original address
> is to mask for the lower 48 bits (i.e. strip the offset and we get the
> address on its own).
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v6 11/12] samples/bpf: use hugepages in xdpsock app
  2019-08-27  2:25             ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-30 15:51               ` Jonathan Lemon
  -1 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-08-30 15:51 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, saeedm, maximmi, stephen, bruce.richardson,
	ciara.loftus, bpf, intel-wired-lan



On 26 Aug 2019, at 19:25, Kevin Laatz wrote:

> This patch modifies xdpsock to use mmap instead of posix_memalign. With
> this change, we can use hugepages when running the application in unaligned
> chunks mode. Using hugepages makes it more likely that we have physically
> contiguous memory, which supports the unaligned chunk mode better.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
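
A sketch of the allocation change described above (sizes and error handling are illustrative; hugepages must also be reserved on the system beforehand, e.g. via /proc/sys/vm/nr_hugepages):

    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    /* mmap instead of posix_memalign, so MAP_HUGETLB can be requested and
     * the umem is more likely to be physically contiguous. */
    static void *demo_alloc_umem(size_t size, int use_hugepages)
    {
        int flags = MAP_PRIVATE | MAP_ANONYMOUS;
        void *buf;

        if (use_hugepages)
            flags |= MAP_HUGETLB;

        buf = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
        if (buf == MAP_FAILED) {
            perror("mmap");
            exit(EXIT_FAILURE);
        }

        return buf;
    }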

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v6 11/12] samples/bpf: use hugepages in xdpsock app
@ 2019-08-30 15:51               ` Jonathan Lemon
  0 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-08-30 15:51 UTC (permalink / raw)
  To: intel-wired-lan



On 26 Aug 2019, at 19:25, Kevin Laatz wrote:

> This patch modifies xdpsock to use mmap instead of posix_memalign. With
> this change, we can use hugepages when running the application in unaligned
> chunks mode. Using hugepages makes it more likely that we have physically
> contiguous memory, which supports the unaligned chunk mode better.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v6 12/12] doc/af_xdp: include unaligned chunk case
  2019-08-27  2:25             ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-30 15:51               ` Jonathan Lemon
  -1 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-08-30 15:51 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, saeedm, maximmi, stephen, bruce.richardson,
	ciara.loftus, bpf, intel-wired-lan



On 26 Aug 2019, at 19:25, Kevin Laatz wrote:

> With the addition of unaligned chunks mode, the documentation needs to be
> updated to indicate that the incoming addr to the fill ring will only be
> masked if the user application is run in the aligned chunk mode. This patch
> also adds a line to explicitly indicate that the incoming addr will not be
> masked if running the user application in the unaligned chunk mode.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v6 12/12] doc/af_xdp: include unaligned chunk case
@ 2019-08-30 15:51               ` Jonathan Lemon
  0 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-08-30 15:51 UTC (permalink / raw)
  To: intel-wired-lan



On 26 Aug 2019, at 19:25, Kevin Laatz wrote:

> With the addition of unaligned chunks mode, the documentation needs to be
> updated to indicate that the incoming addr to the fill ring will only be
> masked if the user application is run in the aligned chunk mode. This patch
> also adds a line to explicitly indicate that the incoming addr will not be
> masked if running the user application in the unaligned chunk mode.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v6 00/12] XDP unaligned chunk placement support
  2019-08-27  2:25           ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-30 15:52             ` Jonathan Lemon
  -1 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-08-30 15:52 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson,
	jakub.kicinski, saeedm, maximmi, stephen, bruce.richardson,
	ciara.loftus, bpf, intel-wired-lan



On 26 Aug 2019, at 19:25, Kevin Laatz wrote:

> This patch set adds the ability to use unaligned chunks in the XDP umem.

Thanks, Kevin, for your hard work and perseverance on this patch set!
-- 
Jonathan

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v6 00/12] XDP unaligned chunk placement support
@ 2019-08-30 15:52             ` Jonathan Lemon
  0 siblings, 0 replies; 274+ messages in thread
From: Jonathan Lemon @ 2019-08-30 15:52 UTC (permalink / raw)
  To: intel-wired-lan



On 26 Aug 2019, at 19:25, Kevin Laatz wrote:

> This patch set adds the ability to use unaligned chunks in the XDP umem.

Thanks, Kevin, for your hard work and perseverance on this patch set!
-- 
Jonathan

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH bpf-next v6 00/12] XDP unaligned chunk placement support
  2019-08-27  2:25           ` [Intel-wired-lan] " Kevin Laatz
@ 2019-08-30 23:29             ` Daniel Borkmann
  -1 siblings, 0 replies; 274+ messages in thread
From: Daniel Borkmann @ 2019-08-30 23:29 UTC (permalink / raw)
  To: Kevin Laatz, netdev, ast, bjorn.topel, magnus.karlsson,
	jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
  Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan

On 8/27/19 4:25 AM, Kevin Laatz wrote:
> This patch set adds the ability to use unaligned chunks in the XDP umem.
> 
> Currently, all chunk addresses passed to the umem are masked to be chunk
> size aligned (max is PAGE_SIZE). This limits where we can place chunks
> within the umem as well as limiting the packet sizes that are supported.
> 
> The changes in this patch set remove these restrictions, allowing XDP to
> be more flexible in where it can place a chunk within a umem. By relaxing
> where the chunks can be placed, it allows us to use an arbitrary buffer
> size and place that wherever we have a free address in the umem. These
> changes add the ability to support arbitrary frame sizes up to 4k
> (PAGE_SIZE) and make it easy to integrate with other existing frameworks
> that have their own memory management systems, such as DPDK.
> In DPDK, for example, there is already support for AF_XDP with zero-copy.
> However, with this patch set the integration will be much more seamless.
> You can find the DPDK AF_XDP driver at:
> https://git.dpdk.org/dpdk/tree/drivers/net/af_xdp
> 
> Since we are now dealing with arbitrary frame sizes, we also need to
> update how we pass around addresses. Currently, the addresses can simply be
> masked to 2k to get back to the original address. This becomes less trivial
> when using frame sizes that are not a 'power of 2' size. This patch set
> modifies the Rx/Tx descriptor format to use the upper 16-bits of the addr
> field for an offset value, leaving the lower 48-bits for the address (this
> leaves us with 256 Terabytes, which should be enough!). We only need to use
> the upper 16-bits to store the offset when running in unaligned mode.
> Rather than adding the offset (headroom etc) to the address, we will store
> it in the upper 16-bits of the address field. This way, we can easily add
> the offset to the address where we need it, using some bit manipulation and
> addition, and we can also easily get the original address wherever we need
> it (for example in i40e_zca_free) by simply masking to get the lower
> 48-bits of the address field.
> 
> The patch set was tested with the following set up:
>    - Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
>    - Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
>    - Driver: i40e
>    - Application: xdpsock with l2fwd (single interface)
>    - Turbo disabled in BIOS
> 
> There are no changes to performance before and after these patches for SKB
> mode and Copy mode. Zero-copy mode saw a performance degradation of ~1.5%.
> 
> This patch set has been applied against
> commit 0bb52b0dfc88 ("tools: bpftool: add 'bpftool map freeze' subcommand")
> 
> Structure of the patch set:
> Patch 1:
>    - Remove unnecessary masking and headroom addition during zero-copy Rx
>      buffer recycling in i40e. This change is required in order for the
>      buffer recycling to work in the unaligned chunk mode.
> 
> Patch 2:
>    - Remove unnecessary masking and headroom addition during
>      zero-copy Rx buffer recycling in ixgbe. This change is required in
>      order for the buffer recycling to work in the unaligned chunk mode.
> 
> Patch 3:
>    - Add infrastructure for unaligned chunks. Since we are dealing with
>      unaligned chunks that could potentially cross a physical page boundary,
>      we add checks to keep track of that information. We can later use this
>      information to correctly handle buffers that are placed at an address
>      where they cross a page boundary.  This patch also modifies the
>      existing Rx and Tx functions to use the new descriptor format. To
>      handle addresses correctly, we need to mask appropriately based on
>      whether we are in aligned or unaligned mode.
> 
> Patch 4:
>    - This patch updates the i40e driver to make use of the new descriptor
>      format.
> 
> Patch 5:
>    - This patch updates the ixgbe driver to make use of the new descriptor
>      format.
> 
> Patch 6:
>    - This patch updates the mlx5e driver to make use of the new descriptor
>      format. These changes are required to handle the new descriptor format
>      and for unaligned chunks support.
> 
> Patch 7:
>    - This patch allows XSK frames smaller than page size in the mlx5e
>      driver. Relax the requirements to the XSK frame size to allow it to be
>      smaller than a page and even not a power of two. The current
>      implementation can work in this mode, both with Striding RQ and without
>      it.
> 
> Patch 8:
>    - Add flags for umem configuration to libbpf. Since we increase the size
>      of the struct by adding flags, we also need to add the ABI versioning
>      in this patch.
> 
> Patch 9:
>    - Modify xdpsock application to add a command line option for
>      unaligned chunks
> 
> Patch 10:
>    - Since we can now run the application in unaligned chunk mode, we need
>      to make sure we recycle the buffers appropriately.
> 
> Patch 11:
>    - Adds hugepage support to the xdpsock application
> 
> Patch 12:
>    - Documentation update to include the unaligned chunk scenario. We need
>      to explicitly state that the incoming addresses are only masked in the
>      aligned chunk mode and not the unaligned chunk mode.
> 
> ---
> v2:
>    - fixed checkpatch issues
>    - fixed Rx buffer recycling for unaligned chunks in xdpsock
>    - removed unused defines
>    - fixed how chunk_size is calculated in xsk_diag.c
>    - added some performance numbers to cover letter
>    - modified descriptor format to make it easier to retrieve original
>      address
>    - removed patch adding off_t off to the zero copy allocator. This is no
>      longer needed with the new descriptor format.
> 
> v3:
>    - added patch for mlx5 driver changes needed for unaligned chunks
>    - moved offset handling to new helper function
>    - changed value used for the umem chunk_mask. Now using the new
>      descriptor format to save us doing the calculations in a number of
>      places meaning more of the code is left unchanged while adding
>      unaligned chunk support.
> 
> v4:
>    - reworked the next_pg_contig field in the xdp_umem_page struct. We now
>      use the low 12 bits of the addr for flags rather than adding an extra
>      field in the struct.
>    - modified unaligned chunks flag define
>    - fixed page_start calculation in __xsk_rcv_memcpy().
>    - move offset handling to the xdp_umem_get_* functions
>    - modified the len field in xdp_umem_reg struct. We now use 16 bits from
>      this for the flags field.
>    - fixed headroom addition to handle in the mlx5e driver
>    - other minor changes based on review comments
> 
> v5:
>    - Added ABI versioning in the libbpf patch
>    - Removed bitfields in the xdp_umem_reg struct. Adding new flags field.
>    - Added accessors for getting addr and offset.
>    - Added helper function for adding the offset to the addr.
>    - Fixed conflicts with 'bpf-af-xdp-wakeup' which was merged recently.
>    - Fixed typo in mlx driver patch.
>    - Moved libbpf patch to later in the set (7/11, just before the sample
>      app changes)
> 
> v6:
>    - Added support for XSK frames smaller than page in mlx5e driver (Maxim
>      Mikityanskiy <maximmi@mellanox.com).
>    - Fixed offset handling in xsk_generic_rcv.
>    - Added check for base address in xskq_is_valid_addr_unaligned.
> 
> Kevin Laatz (11):
>    i40e: simplify Rx buffer recycle
>    ixgbe: simplify Rx buffer recycle
>    xsk: add support to allow unaligned chunk placement
>    i40e: modify driver for handling offsets
>    ixgbe: modify driver for handling offsets
>    mlx5e: modify driver for handling offsets
>    libbpf: add flags to umem config
>    samples/bpf: add unaligned chunks mode support to xdpsock
>    samples/bpf: add buffer recycling for unaligned chunks to xdpsock
>    samples/bpf: use hugepages in xdpsock app
>    doc/af_xdp: include unaligned chunk case
> 
> Maxim Mikityanskiy (1):
>    net/mlx5e: Allow XSK frames smaller than a page
> 
>   Documentation/networking/af_xdp.rst           | 10 +-
>   drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 26 +++--
>   drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 26 +++--
>   .../ethernet/mellanox/mlx5/core/en/params.c   | 23 ++++-
>   .../ethernet/mellanox/mlx5/core/en/params.h   |  2 +
>   .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  8 +-
>   .../ethernet/mellanox/mlx5/core/en/xsk/rx.c   |  5 +-
>   .../mellanox/mlx5/core/en/xsk/setup.c         | 15 ++-
>   include/net/xdp_sock.h                        | 75 ++++++++++++++-
>   include/uapi/linux/if_xdp.h                   |  9 ++
>   net/xdp/xdp_umem.c                            | 19 +++-
>   net/xdp/xsk.c                                 | 94 +++++++++++++++----
>   net/xdp/xsk_diag.c                            |  2 +-
>   net/xdp/xsk_queue.h                           | 70 ++++++++++++--
>   samples/bpf/xdpsock_user.c                    | 61 ++++++++----
>   tools/include/uapi/linux/if_xdp.h             |  9 ++
>   tools/lib/bpf/Makefile                        |  5 +-
>   tools/lib/bpf/libbpf.map                      |  1 +
>   tools/lib/bpf/xsk.c                           | 33 ++++++-
>   tools/lib/bpf/xsk.h                           | 27 ++++++
>   20 files changed, 417 insertions(+), 103 deletions(-)
> 

Applied, thanks!

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [Intel-wired-lan] [PATCH bpf-next v6 00/12] XDP unaligned chunk placement support
@ 2019-08-30 23:29             ` Daniel Borkmann
  0 siblings, 0 replies; 274+ messages in thread
From: Daniel Borkmann @ 2019-08-30 23:29 UTC (permalink / raw)
  To: intel-wired-lan

On 8/27/19 4:25 AM, Kevin Laatz wrote:
> This patch set adds the ability to use unaligned chunks in the XDP umem.
> 
> Currently, all chunk addresses passed to the umem are masked to be chunk
> size aligned (max is PAGE_SIZE). This limits where we can place chunks
> within the umem as well as limiting the packet sizes that are supported.
> 
> The changes in this patch set remove these restrictions, allowing XDP to
> be more flexible in where it can place a chunk within a umem. By relaxing
> where the chunks can be placed, it allows us to use an arbitrary buffer
> size and place that wherever we have a free address in the umem. These
> changes add the ability to support arbitrary frame sizes up to 4k
> (PAGE_SIZE) and make it easy to integrate with other existing frameworks
> that have their own memory management systems, such as DPDK.
> In DPDK, for example, there is already support for AF_XDP with zero-copy.
> However, with this patch set the integration will be much more seamless.
> You can find the DPDK AF_XDP driver at:
> https://git.dpdk.org/dpdk/tree/drivers/net/af_xdp
> 
> Since we are now dealing with arbitrary frame sizes, we also need to
> update how we pass around addresses. Currently, the addresses can simply be
> masked to 2k to get back to the original address. This becomes less trivial
> when using frame sizes that are not a 'power of 2' size. This patch set
> modifies the Rx/Tx descriptor format to use the upper 16-bits of the addr
> field for an offset value, leaving the lower 48-bits for the address (this
> leaves us with 256 Terabytes, which should be enough!). We only need to use
> the upper 16-bits to store the offset when running in unaligned mode.
> Rather than adding the offset (headroom etc) to the address, we will store
> it in the upper 16-bits of the address field. This way, we can easily add
> the offset to the address where we need it, using some bit manipulation and
> addition, and we can also easily get the original address wherever we need
> it (for example in i40e_zca_free) by simply masking to get the lower
> 48-bits of the address field.
> 
> The patch set was tested with the following set up:
>    - Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
>    - Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
>    - Driver: i40e
>    - Application: xdpsock with l2fwd (single interface)
>    - Turbo disabled in BIOS
> 
> There are no changes to performance before and after these patches for SKB
> mode and Copy mode. Zero-copy mode saw a performance degradation of ~1.5%.
> 
> This patch set has been applied against
> commit 0bb52b0dfc88 ("tools: bpftool: add 'bpftool map freeze' subcommand")
> 
> Structure of the patch set:
> Patch 1:
>    - Remove unnecessary masking and headroom addition during zero-copy Rx
>      buffer recycling in i40e. This change is required in order for the
>      buffer recycling to work in the unaligned chunk mode.
> 
> Patch 2:
>    - Remove unnecessary masking and headroom addition during
>      zero-copy Rx buffer recycling in ixgbe. This change is required in
>      order for the buffer recycling to work in the unaligned chunk mode.
> 
> Patch 3:
>    - Add infrastructure for unaligned chunks. Since we are dealing with
>      unaligned chunks that could potentially cross a physical page boundary,
>      we add checks to keep track of that information. We can later use this
>      information to correctly handle buffers that are placed at an address
>      where they cross a page boundary.  This patch also modifies the
>      existing Rx and Tx functions to use the new descriptor format. To
>      handle addresses correctly, we need to mask appropriately based on
>      whether we are in aligned or unaligned mode.
> 
> Patch 4:
>    - This patch updates the i40e driver to make use of the new descriptor
>      format.
> 
> Patch 5:
>    - This patch updates the ixgbe driver to make use of the new descriptor
>      format.
> 
> Patch 6:
>    - This patch updates the mlx5e driver to make use of the new descriptor
>      format. These changes are required to handle the new descriptor format
>      and for unaligned chunks support.
> 
> Patch 7:
>    - This patch allows XSK frames smaller than page size in the mlx5e
>      driver. Relax the requirements to the XSK frame size to allow it to be
>      smaller than a page and even not a power of two. The current
>      implementation can work in this mode, both with Striding RQ and without
>      it.
> 
> Patch 8:
>    - Add flags for umem configuration to libbpf. Since we increase the size
>      of the struct by adding flags, we also need to add the ABI versioning
>      in this patch.
> 
> Patch 9:
>    - Modify xdpsock application to add a command line option for
>      unaligned chunks
> 
> Patch 10:
>    - Since we can now run the application in unaligned chunk mode, we need
>      to make sure we recycle the buffers appropriately.
> 
> Patch 11:
>    - Adds hugepage support to the xdpsock application
> 
> Patch 12:
>    - Documentation update to include the unaligned chunk scenario. We need
>      to explicitly state that the incoming addresses are only masked in the
>      aligned chunk mode and not the unaligned chunk mode.
> 
> ---
> v2:
>    - fixed checkpatch issues
>    - fixed Rx buffer recycling for unaligned chunks in xdpsock
>    - removed unused defines
>    - fixed how chunk_size is calculated in xsk_diag.c
>    - added some performance numbers to cover letter
>    - modified descriptor format to make it easier to retrieve original
>      address
>    - removed patch adding off_t off to the zero copy allocator. This is no
>      longer needed with the new descriptor format.
> 
> v3:
>    - added patch for mlx5 driver changes needed for unaligned chunks
>    - moved offset handling to new helper function
>    - changed value used for the umem chunk_mask. Now using the new
>      descriptor format to save us doing the calculations in a number of
>      places meaning more of the code is left unchanged while adding
>      unaligned chunk support.
> 
> v4:
>    - reworked the next_pg_contig field in the xdp_umem_page struct. We now
>      use the low 12 bits of the addr for flags rather than adding an extra
>      field in the struct.
>    - modified unaligned chunks flag define
>    - fixed page_start calculation in __xsk_rcv_memcpy().
>    - move offset handling to the xdp_umem_get_* functions
>    - modified the len field in xdp_umem_reg struct. We now use 16 bits from
>      this for the flags field.
>    - fixed headroom addition to handle in the mlx5e driver
>    - other minor changes based on review comments
> 
> v5:
>    - Added ABI versioning in the libbpf patch
>    - Removed bitfields in the xdp_umem_reg struct. Adding new flags field.
>    - Added accessors for getting addr and offset.
>    - Added helper function for adding the offset to the addr.
>    - Fixed conflicts with 'bpf-af-xdp-wakeup' which was merged recently.
>    - Fixed typo in mlx driver patch.
>    - Moved libbpf patch to later in the set (7/11, just before the sample
>      app changes)
> 
> v6:
>    - Added support for XSK frames smaller than page in mlx5e driver (Maxim
>      Mikityanskiy <maximmi@mellanox.com).
>    - Fixed offset handling in xsk_generic_rcv.
>    - Added check for base address in xskq_is_valid_addr_unaligned.
> 
> Kevin Laatz (11):
>    i40e: simplify Rx buffer recycle
>    ixgbe: simplify Rx buffer recycle
>    xsk: add support to allow unaligned chunk placement
>    i40e: modify driver for handling offsets
>    ixgbe: modify driver for handling offsets
>    mlx5e: modify driver for handling offsets
>    libbpf: add flags to umem config
>    samples/bpf: add unaligned chunks mode support to xdpsock
>    samples/bpf: add buffer recycling for unaligned chunks to xdpsock
>    samples/bpf: use hugepages in xdpsock app
>    doc/af_xdp: include unaligned chunk case
> 
> Maxim Mikityanskiy (1):
>    net/mlx5e: Allow XSK frames smaller than a page
> 
>   Documentation/networking/af_xdp.rst           | 10 +-
>   drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 26 +++--
>   drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 26 +++--
>   .../ethernet/mellanox/mlx5/core/en/params.c   | 23 ++++-
>   .../ethernet/mellanox/mlx5/core/en/params.h   |  2 +
>   .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  8 +-
>   .../ethernet/mellanox/mlx5/core/en/xsk/rx.c   |  5 +-
>   .../mellanox/mlx5/core/en/xsk/setup.c         | 15 ++-
>   include/net/xdp_sock.h                        | 75 ++++++++++++++-
>   include/uapi/linux/if_xdp.h                   |  9 ++
>   net/xdp/xdp_umem.c                            | 19 +++-
>   net/xdp/xsk.c                                 | 94 +++++++++++++++----
>   net/xdp/xsk_diag.c                            |  2 +-
>   net/xdp/xsk_queue.h                           | 70 ++++++++++++--
>   samples/bpf/xdpsock_user.c                    | 61 ++++++++----
>   tools/include/uapi/linux/if_xdp.h             |  9 ++
>   tools/lib/bpf/Makefile                        |  5 +-
>   tools/lib/bpf/libbpf.map                      |  1 +
>   tools/lib/bpf/xsk.c                           | 33 ++++++-
>   tools/lib/bpf/xsk.h                           | 27 ++++++
>   20 files changed, 417 insertions(+), 103 deletions(-)
> 

Applied, thanks!

^ permalink raw reply	[flat|nested] 274+ messages in thread

end of thread, other threads:[~2019-08-30 23:29 UTC | newest]

Thread overview: 274+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-20  9:09 [PATCH 00/11] XDP unaligned chunk placement support Kevin Laatz
2019-06-20  9:09 ` [Intel-wired-lan] " Kevin Laatz
2019-06-20  9:09 ` [PATCH 01/11] i40e: simplify Rx buffer recycle Kevin Laatz
2019-06-20  9:09   ` [Intel-wired-lan] " Kevin Laatz
2019-06-24 14:29   ` Björn Töpel
2019-06-24 14:29     ` [Intel-wired-lan] " Björn Töpel
2019-06-20  9:09 ` [PATCH 02/11] ixgbe: " Kevin Laatz
2019-06-20  9:09   ` [Intel-wired-lan] " Kevin Laatz
2019-06-24 14:30   ` Björn Töpel
2019-06-24 14:30     ` [Intel-wired-lan] " Björn Töpel
2019-06-20  9:09 ` [PATCH 03/11] xdp: add offset param to zero_copy_allocator Kevin Laatz
2019-06-20  9:09   ` [Intel-wired-lan] " Kevin Laatz
2019-06-24 14:31   ` Björn Töpel
2019-06-24 14:31     ` [Intel-wired-lan] " Björn Töpel
2019-06-24 19:23   ` Jakub Kicinski
2019-06-24 19:23     ` [Intel-wired-lan] " Jakub Kicinski
2019-06-25 13:14     ` Laatz, Kevin
2019-06-25 13:14       ` [Intel-wired-lan] " Laatz, Kevin
2019-06-20  9:09 ` [PATCH 04/11] i40e: add offset to zca_free Kevin Laatz
2019-06-20  9:09   ` [Intel-wired-lan] " Kevin Laatz
2019-06-24 14:32   ` Björn Töpel
2019-06-24 14:32     ` [Intel-wired-lan] " Björn Töpel
2019-06-20  9:09 ` [PATCH 05/11] ixgbe: " Kevin Laatz
2019-06-20  9:09   ` [Intel-wired-lan] " Kevin Laatz
2019-06-24 14:32   ` Björn Töpel
2019-06-24 14:32     ` [Intel-wired-lan] " Björn Töpel
2019-06-20  9:09 ` [PATCH 06/11] xsk: add support to allow unaligned chunk placement Kevin Laatz
2019-06-20  9:09   ` [Intel-wired-lan] " Kevin Laatz
2019-06-24 15:29   ` Björn Töpel
2019-06-24 15:29     ` [Intel-wired-lan] " Björn Töpel
2019-06-20  9:09 ` [PATCH 07/11] libbpf: add flags to umem config Kevin Laatz
2019-06-20  9:09   ` [Intel-wired-lan] " Kevin Laatz
2019-06-24 15:30   ` Björn Töpel
2019-06-24 15:30     ` [Intel-wired-lan] " Björn Töpel
2019-06-20  9:09 ` [PATCH 08/11] samples/bpf: add unaligned chunks mode support to xdpsock Kevin Laatz
2019-06-20  9:09   ` [Intel-wired-lan] " Kevin Laatz
2019-06-24 15:31   ` Björn Töpel
2019-06-24 15:31     ` [Intel-wired-lan] " Björn Töpel
2019-06-20  9:09 ` [PATCH 09/11] samples/bpf: add buffer recycling for unaligned chunks " Kevin Laatz
2019-06-20  9:09   ` [Intel-wired-lan] " Kevin Laatz
2019-06-24 15:35   ` Björn Töpel
2019-06-24 15:35     ` [Intel-wired-lan] " Björn Töpel
2019-06-20  9:09 ` [PATCH 10/11] samples/bpf: use hugepages in xdpsock app Kevin Laatz
2019-06-20  9:09   ` [Intel-wired-lan] " Kevin Laatz
2019-06-24 15:36   ` Björn Töpel
2019-06-24 15:36     ` [Intel-wired-lan] " Björn Töpel
2019-06-20  9:09 ` [PATCH 11/11] doc/af_xdp: include unaligned chunk case Kevin Laatz
2019-06-20  9:09   ` [Intel-wired-lan] " Kevin Laatz
2019-06-24 15:34   ` Björn Töpel
2019-06-24 15:34     ` [Intel-wired-lan] " Björn Töpel
2019-07-16  3:06 ` [PATCH v2 00/10] XDP unaligned chunk placement support Kevin Laatz
2019-07-16  3:06   ` [Intel-wired-lan] " Kevin Laatz
2019-07-16  3:06   ` [PATCH v2 01/10] i40e: simplify Rx buffer recycle Kevin Laatz
2019-07-16  3:06     ` [Intel-wired-lan] " Kevin Laatz
2019-07-19 17:19     ` Bowers, AndrewX
2019-07-19 17:19       ` Bowers, AndrewX
2019-07-16  3:06   ` [PATCH v2 02/10] ixgbe: " Kevin Laatz
2019-07-16  3:06     ` [Intel-wired-lan] " Kevin Laatz
2019-07-19 17:20     ` Bowers, AndrewX
2019-07-19 17:20       ` Bowers, AndrewX
2019-07-16  3:06   ` [PATCH v2 03/10] xsk: add support to allow unaligned chunk placement Kevin Laatz
2019-07-16  3:06     ` [Intel-wired-lan] " Kevin Laatz
2019-07-19 17:21     ` Bowers, AndrewX
2019-07-19 17:21       ` Bowers, AndrewX
2019-07-16  3:06   ` [PATCH v2 04/10] i40e: modify driver for handling offsets Kevin Laatz
2019-07-16  3:06     ` [Intel-wired-lan] " Kevin Laatz
2019-07-19 17:22     ` Bowers, AndrewX
2019-07-19 17:22       ` Bowers, AndrewX
2019-07-16  3:06   ` [PATCH v2 05/10] ixgbe: " Kevin Laatz
2019-07-16  3:06     ` [Intel-wired-lan] " Kevin Laatz
2019-07-19 17:22     ` Bowers, AndrewX
2019-07-19 17:22       ` Bowers, AndrewX
2019-07-16  3:06   ` [PATCH v2 06/10] libbpf: add flags to umem config Kevin Laatz
2019-07-16  3:06     ` [Intel-wired-lan] " Kevin Laatz
2019-07-16  3:06   ` [PATCH v2 07/10] samples/bpf: add unaligned chunks mode support to xdpsock Kevin Laatz
2019-07-16  3:06     ` [Intel-wired-lan] " Kevin Laatz
2019-07-16  3:06   ` [PATCH v2 08/10] samples/bpf: add buffer recycling for unaligned chunks " Kevin Laatz
2019-07-16  3:06     ` [Intel-wired-lan] " Kevin Laatz
2019-07-16  3:06   ` [PATCH v2 09/10] samples/bpf: use hugepages in xdpsock app Kevin Laatz
2019-07-16  3:06     ` [Intel-wired-lan] " Kevin Laatz
2019-07-16  3:06   ` [PATCH v2 10/10] doc/af_xdp: include unaligned chunk case Kevin Laatz
2019-07-16  3:06     ` [Intel-wired-lan] " Kevin Laatz
2019-07-23 21:08   ` [PATCH v2 00/10] XDP unaligned chunk placement support Alexei Starovoitov
2019-07-23 21:08     ` [Intel-wired-lan] " Alexei Starovoitov
2019-07-24 13:25     ` Magnus Karlsson
2019-07-24 13:25       ` Magnus Karlsson
2019-07-25 15:43       ` Jonathan Lemon
2019-07-25 15:43         ` Jonathan Lemon
2019-07-24  5:10   ` [PATCH bpf-next v3 00/11] " Kevin Laatz
2019-07-24  5:10     ` [Intel-wired-lan] " Kevin Laatz
2019-07-24  5:10     ` [PATCH bpf-next v3 01/11] i40e: simplify Rx buffer recycle Kevin Laatz
2019-07-24  5:10       ` [Intel-wired-lan] " Kevin Laatz
2019-07-24  5:10     ` [PATCH bpf-next v3 02/11] ixgbe: " Kevin Laatz
2019-07-24  5:10       ` [Intel-wired-lan] " Kevin Laatz
2019-07-24  5:10     ` [PATCH bpf-next v3 03/11] xsk: add support to allow unaligned chunk placement Kevin Laatz
2019-07-24  5:10       ` [Intel-wired-lan] " Kevin Laatz
2019-07-25  2:22       ` Jakub Kicinski
2019-07-25  2:22         ` [Intel-wired-lan] " Jakub Kicinski
2019-07-25 17:01         ` Laatz, Kevin
2019-07-25 17:01           ` [Intel-wired-lan] " Laatz, Kevin
2019-07-25  9:27       ` Maxim Mikityanskiy
2019-07-25  9:27         ` [Intel-wired-lan] " Maxim Mikityanskiy
2019-07-25 17:00         ` Laatz, Kevin
2019-07-25 17:00           ` [Intel-wired-lan] " Laatz, Kevin
2019-07-25 10:08       ` Maxim Mikityanskiy
2019-07-25 10:08         ` [Intel-wired-lan] " Maxim Mikityanskiy
2019-07-25 15:39       ` Jonathan Lemon
2019-07-25 15:39         ` [Intel-wired-lan] " Jonathan Lemon
2019-07-26  9:47         ` Laatz, Kevin
2019-07-26  9:58           ` Laatz, Kevin
2019-07-24  5:10     ` [PATCH bpf-next v3 04/11] i40e: modify driver for handling offsets Kevin Laatz
2019-07-24  5:10       ` [Intel-wired-lan] " Kevin Laatz
2019-07-24  5:10     ` [PATCH bpf-next v3 05/11] ixgbe: " Kevin Laatz
2019-07-24  5:10       ` [Intel-wired-lan] " Kevin Laatz
2019-07-24  5:10     ` [PATCH bpf-next v3 06/11] mlx5e: " Kevin Laatz
2019-07-24  5:10       ` [Intel-wired-lan] " Kevin Laatz
2019-07-25 10:15       ` Maxim Mikityanskiy
2019-07-25 10:15         ` [Intel-wired-lan] " Maxim Mikityanskiy
2019-07-25 17:00         ` Laatz, Kevin
2019-07-25 17:00           ` [Intel-wired-lan] " Laatz, Kevin
2019-07-24  5:10     ` [PATCH bpf-next v3 07/11] libbpf: add flags to umem config Kevin Laatz
2019-07-24  5:10       ` [Intel-wired-lan] " Kevin Laatz
2019-07-24  5:10     ` [PATCH bpf-next v3 08/11] samples/bpf: add unaligned chunks mode support to xdpsock Kevin Laatz
2019-07-24  5:10       ` [Intel-wired-lan] " Kevin Laatz
2019-07-25  9:43       ` Maxim Mikityanskiy
2019-07-25  9:43         ` [Intel-wired-lan] " Maxim Mikityanskiy
2019-07-25 17:00         ` Laatz, Kevin
2019-07-25 17:00           ` [Intel-wired-lan] " Laatz, Kevin
2019-07-24  5:10     ` [PATCH bpf-next v3 09/11] samples/bpf: add buffer recycling for unaligned chunks " Kevin Laatz
2019-07-24  5:10       ` [Intel-wired-lan] " Kevin Laatz
2019-07-24  5:10     ` [PATCH bpf-next v3 10/11] samples/bpf: use hugepages in xdpsock app Kevin Laatz
2019-07-24  5:10       ` [Intel-wired-lan] " Kevin Laatz
2019-07-24  5:10     ` [PATCH bpf-next v3 11/11] doc/af_xdp: include unaligned chunk case Kevin Laatz
2019-07-24  5:10       ` [Intel-wired-lan] " Kevin Laatz
2019-07-25 15:39     ` [PATCH bpf-next v3 00/11] XDP unaligned chunk placement support Jonathan Lemon
2019-07-25 15:39       ` [Intel-wired-lan] " Jonathan Lemon
2019-07-25 15:56       ` Richardson, Bruce
2019-07-25 15:56         ` [Intel-wired-lan] " Richardson, Bruce
2019-07-25 17:30         ` Jonathan Lemon
2019-07-25 17:30           ` [Intel-wired-lan] " Jonathan Lemon
2019-07-26  8:41           ` Bruce Richardson
2019-07-26  8:41             ` [Intel-wired-lan] " Bruce Richardson
2019-07-30  8:53     ` [PATCH bpf-next v4 " Kevin Laatz
2019-07-30  8:53       ` [Intel-wired-lan] " Kevin Laatz
2019-07-30  8:53       ` [PATCH bpf-next v4 01/11] i40e: simplify Rx buffer recycle Kevin Laatz
2019-07-30  8:53         ` [Intel-wired-lan] " Kevin Laatz
2019-07-30  8:53       ` [PATCH bpf-next v4 02/11] ixgbe: " Kevin Laatz
2019-07-30  8:53         ` [Intel-wired-lan] " Kevin Laatz
2019-07-30  8:53       ` [PATCH bpf-next v4 03/11] libbpf: add flags to umem config Kevin Laatz
2019-07-30  8:53         ` [Intel-wired-lan] " Kevin Laatz
2019-07-31 12:45         ` Björn Töpel
2019-07-31 12:45           ` Björn Töpel
2019-07-31 14:25         ` Björn Töpel
2019-07-31 14:25           ` Björn Töpel
2019-08-01  6:59           ` Andrii Nakryiko
2019-08-01  6:59             ` Andrii Nakryiko
2019-08-01  7:34             ` Björn Töpel
2019-08-01  7:34               ` Björn Töpel
2019-08-02  7:19               ` Andrii Nakryiko
2019-08-02  7:19                 ` Andrii Nakryiko
2019-08-02  7:26                 ` Björn Töpel
2019-08-02  7:26                   ` Björn Töpel
2019-07-30  8:53       ` [PATCH bpf-next v4 04/11] xsk: add support to allow unaligned chunk placement Kevin Laatz
2019-07-30  8:53         ` [Intel-wired-lan] " Kevin Laatz
2019-07-31 18:11         ` Jonathan Lemon
2019-07-31 18:11           ` [Intel-wired-lan] " Jonathan Lemon
2019-07-30  8:53       ` [PATCH bpf-next v4 05/11] i40e: modify driver for handling offsets Kevin Laatz
2019-07-30  8:53         ` [Intel-wired-lan] " Kevin Laatz
2019-07-30  8:53       ` [PATCH bpf-next v4 06/11] ixgbe: " Kevin Laatz
2019-07-30  8:53         ` [Intel-wired-lan] " Kevin Laatz
2019-07-30  8:53       ` [PATCH bpf-next v4 07/11] mlx5e: " Kevin Laatz
2019-07-30  8:53         ` [Intel-wired-lan] " Kevin Laatz
2019-07-31 18:10         ` Jonathan Lemon
2019-07-31 18:10           ` [Intel-wired-lan] " Jonathan Lemon
2019-08-01 10:05         ` Maxim Mikityanskiy
2019-08-01 10:05           ` [Intel-wired-lan] " Maxim Mikityanskiy
2019-08-19 14:36           ` Maxim Mikityanskiy
2019-08-19 14:36             ` [Intel-wired-lan] " Maxim Mikityanskiy
2019-08-19 14:47             ` Laatz, Kevin
2019-08-19 14:47               ` [Intel-wired-lan] " Laatz, Kevin
2019-07-30  8:53       ` [PATCH bpf-next v4 08/11] samples/bpf: add unaligned chunks mode support to xdpsock Kevin Laatz
2019-07-30  8:53         ` [Intel-wired-lan] " Kevin Laatz
2019-07-30  8:53       ` [PATCH bpf-next v4 09/11] samples/bpf: add buffer recycling for unaligned chunks " Kevin Laatz
2019-07-30  8:53         ` [Intel-wired-lan] " Kevin Laatz
2019-07-31 18:26         ` Jonathan Lemon
2019-07-31 18:26           ` [Intel-wired-lan] " Jonathan Lemon
2019-07-30  8:53       ` [PATCH bpf-next v4 10/11] samples/bpf: use hugepages in xdpsock app Kevin Laatz
2019-07-30  8:53         ` [Intel-wired-lan] " Kevin Laatz
2019-07-30  8:54       ` [PATCH bpf-next v4 11/11] doc/af_xdp: include unaligned chunk case Kevin Laatz
2019-07-30  8:54         ` [Intel-wired-lan] " Kevin Laatz
2019-08-22  1:44       ` [PATCH bpf-next v5 00/11] XDP unaligned chunk placement support Kevin Laatz
2019-08-22  1:44         ` [Intel-wired-lan] " Kevin Laatz
2019-08-22  1:44         ` [PATCH bpf-next v5 01/11] i40e: simplify Rx buffer recycle Kevin Laatz
2019-08-22  1:44           ` [Intel-wired-lan] " Kevin Laatz
2019-08-22  1:44         ` [PATCH bpf-next v5 02/11] ixgbe: " Kevin Laatz
2019-08-22  1:44           ` [Intel-wired-lan] " Kevin Laatz
2019-08-22  1:44         ` [PATCH bpf-next v5 03/11] xsk: add support to allow unaligned chunk placement Kevin Laatz
2019-08-22  1:44           ` [Intel-wired-lan] " Kevin Laatz
2019-08-22 18:43           ` Jonathan Lemon
2019-08-22 18:43             ` [Intel-wired-lan] " Jonathan Lemon
2019-08-23 13:35             ` Laatz, Kevin
2019-08-23 13:35               ` [Intel-wired-lan] " Laatz, Kevin
2019-08-27  7:36           ` Maxim Mikityanskiy
2019-08-27  7:36             ` [Intel-wired-lan] " Maxim Mikityanskiy
2019-08-22  1:44         ` [PATCH bpf-next v5 04/11] i40e: modify driver for handling offsets Kevin Laatz
2019-08-22  1:44           ` [Intel-wired-lan] " Kevin Laatz
2019-08-22  1:44         ` [PATCH bpf-next v5 05/11] ixgbe: " Kevin Laatz
2019-08-22  1:44           ` [Intel-wired-lan] " Kevin Laatz
2019-08-22  1:44         ` [PATCH bpf-next v5 06/11] mlx5e: " Kevin Laatz
2019-08-22  1:44           ` [Intel-wired-lan] " Kevin Laatz
2019-08-22  1:44         ` [PATCH bpf-next v5 07/11] libbpf: add flags to umem config Kevin Laatz
2019-08-22  1:44           ` [Intel-wired-lan] " Kevin Laatz
2019-08-22  1:44         ` [PATCH bpf-next v5 08/11] samples/bpf: add unaligned chunks mode support to xdpsock Kevin Laatz
2019-08-22  1:44           ` [Intel-wired-lan] " Kevin Laatz
2019-08-22  1:44         ` [PATCH bpf-next v5 09/11] samples/bpf: add buffer recycling for unaligned chunks " Kevin Laatz
2019-08-22  1:44           ` [Intel-wired-lan] " Kevin Laatz
2019-08-22  1:44         ` [PATCH bpf-next v5 10/11] samples/bpf: use hugepages in xdpsock app Kevin Laatz
2019-08-22  1:44           ` [Intel-wired-lan] " Kevin Laatz
2019-08-22  1:44         ` [PATCH bpf-next v5 11/11] doc/af_xdp: include unaligned chunk case Kevin Laatz
2019-08-22  1:44           ` [Intel-wired-lan] " Kevin Laatz
2019-08-27  2:25         ` [PATCH bpf-next v6 00/12] XDP unaligned chunk placement support Kevin Laatz
2019-08-27  2:25           ` [Intel-wired-lan] " Kevin Laatz
2019-08-27  2:25           ` [PATCH bpf-next v6 01/12] i40e: simplify Rx buffer recycle Kevin Laatz
2019-08-27  2:25             ` [Intel-wired-lan] " Kevin Laatz
2019-08-30 15:37             ` Jonathan Lemon
2019-08-30 15:37               ` [Intel-wired-lan] " Jonathan Lemon
2019-08-27  2:25           ` [PATCH bpf-next v6 02/12] ixgbe: " Kevin Laatz
2019-08-27  2:25             ` [Intel-wired-lan] " Kevin Laatz
2019-08-30 15:39             ` Jonathan Lemon
2019-08-30 15:39               ` [Intel-wired-lan] " Jonathan Lemon
2019-08-27  2:25           ` [PATCH bpf-next v6 03/12] xsk: add support to allow unaligned chunk placement Kevin Laatz
2019-08-27  2:25             ` [Intel-wired-lan] " Kevin Laatz
2019-08-30 15:41             ` Jonathan Lemon
2019-08-30 15:41               ` [Intel-wired-lan] " Jonathan Lemon
2019-08-27  2:25           ` [PATCH bpf-next v6 04/12] i40e: modify driver for handling offsets Kevin Laatz
2019-08-27  2:25             ` [Intel-wired-lan] " Kevin Laatz
2019-08-30 15:42             ` Jonathan Lemon
2019-08-30 15:42               ` [Intel-wired-lan] " Jonathan Lemon
2019-08-27  2:25           ` [PATCH bpf-next v6 05/12] ixgbe: " Kevin Laatz
2019-08-27  2:25             ` [Intel-wired-lan] " Kevin Laatz
2019-08-30 15:42             ` Jonathan Lemon
2019-08-30 15:42               ` [Intel-wired-lan] " Jonathan Lemon
2019-08-27  2:25           ` [PATCH bpf-next v6 06/12] mlx5e: " Kevin Laatz
2019-08-27  2:25             ` [Intel-wired-lan] " Kevin Laatz
2019-08-30 15:43             ` Jonathan Lemon
2019-08-30 15:43               ` [Intel-wired-lan] " Jonathan Lemon
2019-08-27  2:25           ` [PATCH bpf-next v6 07/12] net/mlx5e: Allow XSK frames smaller than a page Kevin Laatz
2019-08-27  2:25             ` [Intel-wired-lan] " Kevin Laatz
2019-08-30 15:45             ` Jonathan Lemon
2019-08-30 15:45               ` [Intel-wired-lan] " Jonathan Lemon
2019-08-27  2:25           ` [PATCH bpf-next v6 08/12] libbpf: add flags to umem config Kevin Laatz
2019-08-27  2:25             ` [Intel-wired-lan] " Kevin Laatz
2019-08-30 15:46             ` Jonathan Lemon
2019-08-30 15:46               ` [Intel-wired-lan] " Jonathan Lemon
2019-08-27  2:25           ` [PATCH bpf-next v6 09/12] samples/bpf: add unaligned chunks mode support to xdpsock Kevin Laatz
2019-08-27  2:25             ` [Intel-wired-lan] " Kevin Laatz
2019-08-30 15:47             ` Jonathan Lemon
2019-08-30 15:47               ` [Intel-wired-lan] " Jonathan Lemon
2019-08-27  2:25           ` [PATCH bpf-next v6 10/12] samples/bpf: add buffer recycling for unaligned chunks " Kevin Laatz
2019-08-27  2:25             ` [Intel-wired-lan] " Kevin Laatz
2019-08-30 15:49             ` Jonathan Lemon
2019-08-30 15:49               ` [Intel-wired-lan] " Jonathan Lemon
2019-08-27  2:25           ` [PATCH bpf-next v6 11/12] samples/bpf: use hugepages in xdpsock app Kevin Laatz
2019-08-27  2:25             ` [Intel-wired-lan] " Kevin Laatz
2019-08-30 15:51             ` Jonathan Lemon
2019-08-30 15:51               ` [Intel-wired-lan] " Jonathan Lemon
2019-08-27  2:25           ` [PATCH bpf-next v6 12/12] doc/af_xdp: include unaligned chunk case Kevin Laatz
2019-08-27  2:25             ` [Intel-wired-lan] " Kevin Laatz
2019-08-30 15:51             ` Jonathan Lemon
2019-08-30 15:51               ` [Intel-wired-lan] " Jonathan Lemon
2019-08-30 15:52           ` [PATCH bpf-next v6 00/12] XDP unaligned chunk placement support Jonathan Lemon
2019-08-30 15:52             ` [Intel-wired-lan] " Jonathan Lemon
2019-08-30 23:29           ` Daniel Borkmann
2019-08-30 23:29             ` [Intel-wired-lan] " Daniel Borkmann