bpf.vger.kernel.org archive mirror
* [PATCH bpf-next v5 0/4] xsk: Support UMEM chunk_size > PAGE_SIZE
@ 2023-04-10 12:06 Kal Conley
  2023-04-10 12:06 ` [PATCH bpf-next v5 1/4] xsk: Use pool->dma_pages to check for DMA Kal Conley
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Kal Conley @ 2023-04-10 12:06 UTC (permalink / raw)
  To: Magnus Karlsson, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend
  Cc: Kal Conley, netdev, bpf

The main purpose of this patchset is to add AF_XDP support for UMEM
chunk sizes > PAGE_SIZE. This is enabled for UMEMs backed by HugeTLB
pages.
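
For illustration, registering such a UMEM from userspace could look
roughly like the sketch below. This is only a sketch: error handling is
minimal, the helper name and the 16 MiB / 9000-byte sizes are example
choices, and it assumes 2 MiB hugepages have already been reserved in
the kernel's persistent pool.

#include <stdlib.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <linux/if_xdp.h>

/* Sketch only: illustrative values, not part of this patchset. */
static int register_jumbo_umem(void)
{
        size_t size = 16UL << 20;       /* multiple of the 2 MiB hugepage size */
        void *area;
        int fd;

        /* chunk_size > PAGE_SIZE requires the UMEM to consist of HugeTLB
         * VMAs and to be hugepage aligned.
         */
        area = mmap(NULL, size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (area == MAP_FAILED)
                return -1;

        fd = socket(AF_XDP, SOCK_RAW, 0);
        if (fd < 0)
                return -1;

        struct xdp_umem_reg mr = {
                .addr = (__u64)(unsigned long)area,
                .len = size,
                .chunk_size = 9000,     /* > 4K: only possible with hugepages */
                .headroom = 0,
                /* 9000 is not a power of two, so unaligned chunk mode is used */
                .flags = XDP_UMEM_UNALIGNED_CHUNK_FLAG,
        };
        if (setsockopt(fd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr)) < 0)
                return -1;

        return fd;
}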

Note, v5 fixes a major bug in previous versions of this patchset.
In particular, dma_map_page_attrs used to be called once for each
order-0 page in a hugepage with the assumption that returned I/O
addresses are contiguous within a hugepage. This assumption is incorrect
when an IOMMU is enabled. To fix this, v5 does DMA page accounting
at hugepage granularity.

Changes since v4:
  * Use hugepages in DMA map (fixes zero-copy mode with IOMMU).
  * Use pool->dma_pages to check for DMA. This change is needed to avoid
    performance regressions.
  * Update commit message and benchmark table.

Changes since v3:
  * Fix checkpatch.pl whitespace error.

Changes since v2:
  * Related fixes/improvements included with v2 have been removed. These
    changes have all been resubmitted as standalone patchsets.
  * Minimize uses of #ifdef CONFIG_HUGETLB_PAGE.
  * Improve AF_XDP documentation.
  * Update benchmark table in commit message.

Changes since v1:
  * Add many fixes/improvements to the XSK selftests.
  * Add check for unaligned descriptors that overrun UMEM.
  * Fix compile errors when CONFIG_HUGETLB_PAGE is not set.
  * Fix incorrect use of _Static_assert.
  * Update AF_XDP documentation.
  * Rename unaligned 9K frame size test.
  * Make xp_check_dma_contiguity less conservative.
  * Add more information to benchmark table.

Thanks to Magnus Karlsson for all his support!

Happy Easter!

Kal Conley (4):
  xsk: Use pool->dma_pages to check for DMA
  xsk: Support UMEM chunk_size > PAGE_SIZE
  selftests: xsk: Use hugepages when umem->frame_size > PAGE_SIZE
  selftests: xsk: Add tests for 8K and 9K frame sizes

 Documentation/networking/af_xdp.rst      | 36 ++++++++++------
 include/net/xdp_sock.h                   |  2 +
 include/net/xdp_sock_drv.h               | 12 ++++++
 include/net/xsk_buff_pool.h              | 12 +++---
 net/xdp/xdp_umem.c                       | 55 +++++++++++++++++++-----
 net/xdp/xsk_buff_pool.c                  | 43 ++++++++++--------
 tools/testing/selftests/bpf/xskxceiver.c | 27 +++++++++++-
 tools/testing/selftests/bpf/xskxceiver.h |  2 +
 8 files changed, 142 insertions(+), 47 deletions(-)

-- 
2.39.2



* [PATCH bpf-next v5 1/4] xsk: Use pool->dma_pages to check for DMA
  2023-04-10 12:06 [PATCH bpf-next v5 0/4] xsk: Support UMEM chunk_size > PAGE_SIZE Kal Conley
@ 2023-04-10 12:06 ` Kal Conley
  2023-04-10 12:06 ` [PATCH bpf-next v5 2/4] xsk: Support UMEM chunk_size > PAGE_SIZE Kal Conley
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Kal Conley @ 2023-04-10 12:06 UTC (permalink / raw)
  To: Magnus Karlsson, Björn Töpel, Maciej Fijalkowski,
	Jonathan Lemon, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend
  Cc: Kal Conley, netdev, bpf, linux-kernel

Read pool->dma_pages instead of pool->dma_pages_cnt to check for an
active DMA mapping. pool->dma_pages needs to be read anyway to access
the map, so this compiles to more efficient code.

Signed-off-by: Kal Conley <kal.conley@dectris.com>
---
 include/net/xsk_buff_pool.h | 2 +-
 net/xdp/xsk_buff_pool.c     | 7 ++++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index d318c769b445..a8d7b8a3688a 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -180,7 +180,7 @@ static inline bool xp_desc_crosses_non_contig_pg(struct xsk_buff_pool *pool,
 	if (likely(!cross_pg))
 		return false;
 
-	return pool->dma_pages_cnt &&
+	return pool->dma_pages &&
 	       !(pool->dma_pages[addr >> PAGE_SHIFT] & XSK_NEXT_PG_CONTIG_MASK);
 }
 
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index b2df1e0f8153..26f6d304451e 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -350,7 +350,7 @@ void xp_dma_unmap(struct xsk_buff_pool *pool, unsigned long attrs)
 {
 	struct xsk_dma_map *dma_map;
 
-	if (pool->dma_pages_cnt == 0)
+	if (!pool->dma_pages)
 		return;
 
 	dma_map = xp_find_dma_map(pool);
@@ -364,6 +364,7 @@ void xp_dma_unmap(struct xsk_buff_pool *pool, unsigned long attrs)
 
 	__xp_dma_unmap(dma_map, attrs);
 	kvfree(pool->dma_pages);
+	pool->dma_pages = NULL;
 	pool->dma_pages_cnt = 0;
 	pool->dev = NULL;
 }
@@ -503,7 +504,7 @@ static struct xdp_buff_xsk *__xp_alloc(struct xsk_buff_pool *pool)
 	if (pool->unaligned) {
 		xskb = pool->free_heads[--pool->free_heads_cnt];
 		xp_init_xskb_addr(xskb, pool, addr);
-		if (pool->dma_pages_cnt)
+		if (pool->dma_pages)
 			xp_init_xskb_dma(xskb, pool, pool->dma_pages, addr);
 	} else {
 		xskb = &pool->heads[xp_aligned_extract_idx(pool, addr)];
@@ -569,7 +570,7 @@ static u32 xp_alloc_new_from_fq(struct xsk_buff_pool *pool, struct xdp_buff **xd
 		if (pool->unaligned) {
 			xskb = pool->free_heads[--pool->free_heads_cnt];
 			xp_init_xskb_addr(xskb, pool, addr);
-			if (pool->dma_pages_cnt)
+			if (pool->dma_pages)
 				xp_init_xskb_dma(xskb, pool, pool->dma_pages, addr);
 		} else {
 			xskb = &pool->heads[xp_aligned_extract_idx(pool, addr)];
-- 
2.39.2



* [PATCH bpf-next v5 2/4] xsk: Support UMEM chunk_size > PAGE_SIZE
  2023-04-10 12:06 [PATCH bpf-next v5 0/4] xsk: Support UMEM chunk_size > PAGE_SIZE Kal Conley
  2023-04-10 12:06 ` [PATCH bpf-next v5 1/4] xsk: Use pool->dma_pages to check for DMA Kal Conley
@ 2023-04-10 12:06 ` Kal Conley
  2023-04-11  3:10   ` Bagas Sanjaya
  2023-04-12 13:39   ` Magnus Karlsson
  2023-04-10 12:06 ` [PATCH bpf-next v5 3/4] selftests: xsk: Use hugepages when umem->frame_size " Kal Conley
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 12+ messages in thread
From: Kal Conley @ 2023-04-10 12:06 UTC (permalink / raw)
  To: Magnus Karlsson, Björn Töpel, Maciej Fijalkowski,
	Jonathan Lemon, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Jonathan Corbet, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend
  Cc: Kal Conley, netdev, bpf, linux-doc, linux-kernel

Add core AF_XDP support for chunk sizes larger than PAGE_SIZE. This
enables sending/receiving jumbo ethernet frames up to the theoretical
maximum of 64 KiB. For chunk sizes > PAGE_SIZE, the UMEM is required
to consist of HugeTLB VMAs (and be hugepage aligned). Initially, only
SKB mode is usable pending future driver work.

For consistency, check for HugeTLB pages during UMEM registration. This
implies that hugepages are required for XDP_COPY mode despite DMA not
being used. This restriction is desirable since it ensures user software
can take advantage of future driver support.

Despite this change, always store order-0 pages in the umem->pgs array
since this is what is returned by pin_user_pages(). Conversely, XSK
pools bound to HugeTLB UMEMs do DMA page accounting at hugepage
granularity (HPAGE_SIZE).
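
As a rough illustration of what hugepage-granularity accounting means
for address translation, consider the simplified sketch below. The EX_*
names are made up for the example, 2 MiB hugepages are assumed, and the
real code additionally masks off the contiguity flag bit.

#include <stdint.h>

#define EX_HPAGE_SHIFT  21                      /* 2 MiB hugepages */
#define EX_HPAGE_SIZE   (1UL << EX_HPAGE_SHIFT)

/* One dma_pages[] entry per hugepage, so the UMEM address is split with the
 * hugepage shift instead of PAGE_SHIFT. For example, addr = 3 MiB resolves
 * to dma_pages[1] + 1 MiB, whereas order-0 accounting would have indexed
 * dma_pages[768].
 */
static uint64_t ex_frame_dma(const uint64_t *dma_pages, uint64_t addr)
{
        return dma_pages[addr >> EX_HPAGE_SHIFT] + (addr & (EX_HPAGE_SIZE - 1));
}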

No significant change in RX/TX performance was observed with this patch.
A few data points are reproduced below:

Machine : Dell PowerEdge R940
CPU     : Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz
NIC     : MT27700 Family [ConnectX-4]

+-----+------+------+-------+--------+--------+--------+
|     |      |      | chunk | packet | rxdrop | rxdrop |
|     | mode |  mtu |  size |   size | (Mpps) | (Gbps) |
+-----+------+------+-------+--------+--------+--------+
| old |   -z | 3498 |  4000 |    320 |   15.9 |   40.8 |
| new |   -z | 3498 |  4000 |    320 |   15.9 |   40.8 |
+-----+------+------+-------+--------+--------+--------+
| old |   -z | 3498 |  4096 |    320 |   16.5 |   42.2 |
| new |   -z | 3498 |  4096 |    320 |   16.5 |   42.3 |
+-----+------+------+-------+--------+--------+--------+
| new |   -c | 3498 | 10240 |    320 |    6.1 |   15.7 |
+-----+------+------+-------+--------+--------+--------+
| new |   -S | 9000 | 10240 |   9000 |   0.37 |   26.4 |
+-----+------+------+-------+--------+--------+--------+

Signed-off-by: Kal Conley <kal.conley@dectris.com>
---
 Documentation/networking/af_xdp.rst | 36 +++++++++++--------
 include/net/xdp_sock.h              |  2 ++
 include/net/xdp_sock_drv.h          | 12 +++++++
 include/net/xsk_buff_pool.h         | 10 +++---
 net/xdp/xdp_umem.c                  | 55 +++++++++++++++++++++++------
 net/xdp/xsk_buff_pool.c             | 36 +++++++++++--------
 6 files changed, 109 insertions(+), 42 deletions(-)

diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst
index 247c6c4127e9..ea65cd882af6 100644
--- a/Documentation/networking/af_xdp.rst
+++ b/Documentation/networking/af_xdp.rst
@@ -105,12 +105,13 @@ with AF_XDP". It can be found at https://lwn.net/Articles/750845/.
 UMEM
 ----
 
-UMEM is a region of virtual contiguous memory, divided into
-equal-sized frames. An UMEM is associated to a netdev and a specific
-queue id of that netdev. It is created and configured (chunk size,
-headroom, start address and size) by using the XDP_UMEM_REG setsockopt
-system call. A UMEM is bound to a netdev and queue id, via the bind()
-system call.
+UMEM is a region of virtual contiguous memory divided into equal-sized
+frames. This is the area that contains all the buffers that packets can
+reside in. A UMEM is associated with a netdev and a specific queue id of
+that netdev. It is created and configured (start address, size,
+chunk size, and headroom) by using the XDP_UMEM_REG setsockopt system
+call. A UMEM is bound to a netdev and queue id via the bind() system
+call.
 
 An AF_XDP is socket linked to a single UMEM, but one UMEM can have
 multiple AF_XDP sockets. To share an UMEM created via one socket A,
@@ -418,14 +419,21 @@ negatively impact performance.
 XDP_UMEM_REG setsockopt
 -----------------------
 
-This setsockopt registers a UMEM to a socket. This is the area that
-contain all the buffers that packet can reside in. The call takes a
-pointer to the beginning of this area and the size of it. Moreover, it
-also has parameter called chunk_size that is the size that the UMEM is
-divided into. It can only be 2K or 4K at the moment. If you have an
-UMEM area that is 128K and a chunk size of 2K, this means that you
-will be able to hold a maximum of 128K / 2K = 64 packets in your UMEM
-area and that your largest packet size can be 2K.
+This setsockopt registers a UMEM to a socket. The call takes a pointer
+to the beginning of this area and the size of it. Moreover, there is a
+parameter called chunk_size that is the size that the UMEM is divided
+into. The chunk size limits the maximum packet size that can be sent or
+received. For example, if you have a UMEM area that is 128K and a chunk
+size of 2K, then you will be able to hold a maximum of 128K / 2K = 64
+packets in your UMEM. In this case, the maximum packet size will be 2K.
+
+Valid chunk sizes range from 2K to 64K. However, in aligned mode, the
+chunk size must also be a power of two. Additionally, the chunk size
+must not exceed the size of a page (usually 4K). This limitation is
+relaxed for UMEM areas allocated with HugeTLB pages, in which case
+chunk sizes up to 64K are allowed. Note, this only works with hugepages
+allocated from the kernel's persistent pool. Using Transparent Huge
+Pages (THP) has no effect on the maximum chunk size.
 
 There is also an option to set the headroom of each single buffer in
 the UMEM. If you set this to N bytes, it means that the packet will
diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index e96a1151ec75..a71589539c38 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -25,6 +25,8 @@ struct xdp_umem {
 	u32 chunk_size;
 	u32 chunks;
 	u32 npgs;
+	u32 page_shift;
+	u32 page_size;
 	struct user_struct *user;
 	refcount_t users;
 	u8 flags;
diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
index 9c0d860609ba..83fba3060c9a 100644
--- a/include/net/xdp_sock_drv.h
+++ b/include/net/xdp_sock_drv.h
@@ -12,6 +12,18 @@
 #define XDP_UMEM_MIN_CHUNK_SHIFT 11
 #define XDP_UMEM_MIN_CHUNK_SIZE (1 << XDP_UMEM_MIN_CHUNK_SHIFT)
 
+static_assert(XDP_UMEM_MIN_CHUNK_SIZE <= PAGE_SIZE);
+
+/* Allow chunk sizes up to the maximum size of an ethernet frame (64 KiB).
+ * Larger chunks are not guaranteed to fit in a single SKB.
+ */
+#ifdef CONFIG_HUGETLB_PAGE
+#define XDP_UMEM_MAX_CHUNK_SHIFT min(16, HPAGE_SHIFT)
+#else
+#define XDP_UMEM_MAX_CHUNK_SHIFT min(16, PAGE_SHIFT)
+#endif
+#define XDP_UMEM_MAX_CHUNK_SIZE (1 << XDP_UMEM_MAX_CHUNK_SHIFT)
+
 #ifdef CONFIG_XDP_SOCKETS
 
 void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries);
diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index a8d7b8a3688a..af822b322d89 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -68,6 +68,8 @@ struct xsk_buff_pool {
 	struct xdp_desc *tx_descs;
 	u64 chunk_mask;
 	u64 addrs_cnt;
+	u32 page_shift;
+	u32 page_size;
 	u32 free_list_cnt;
 	u32 dma_pages_cnt;
 	u32 free_heads_cnt;
@@ -123,8 +125,8 @@ static inline void xp_init_xskb_addr(struct xdp_buff_xsk *xskb, struct xsk_buff_
 static inline void xp_init_xskb_dma(struct xdp_buff_xsk *xskb, struct xsk_buff_pool *pool,
 				    dma_addr_t *dma_pages, u64 addr)
 {
-	xskb->frame_dma = (dma_pages[addr >> PAGE_SHIFT] & ~XSK_NEXT_PG_CONTIG_MASK) +
-		(addr & ~PAGE_MASK);
+	xskb->frame_dma = (dma_pages[addr >> pool->page_shift] & ~XSK_NEXT_PG_CONTIG_MASK) +
+			  (addr & (pool->page_size - 1));
 	xskb->dma = xskb->frame_dma + pool->headroom + XDP_PACKET_HEADROOM;
 }
 
@@ -175,13 +177,13 @@ static inline void xp_dma_sync_for_device(struct xsk_buff_pool *pool,
 static inline bool xp_desc_crosses_non_contig_pg(struct xsk_buff_pool *pool,
 						 u64 addr, u32 len)
 {
-	bool cross_pg = (addr & (PAGE_SIZE - 1)) + len > PAGE_SIZE;
+	bool cross_pg = (addr & (pool->page_size - 1)) + len > pool->page_size;
 
 	if (likely(!cross_pg))
 		return false;
 
 	return pool->dma_pages &&
-	       !(pool->dma_pages[addr >> PAGE_SHIFT] & XSK_NEXT_PG_CONTIG_MASK);
+	       !(pool->dma_pages[addr >> pool->page_shift] & XSK_NEXT_PG_CONTIG_MASK);
 }
 
 static inline u64 xp_aligned_extract_addr(struct xsk_buff_pool *pool, u64 addr)
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 4681e8e8ad94..6fb984be8f40 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -10,6 +10,8 @@
 #include <linux/uaccess.h>
 #include <linux/slab.h>
 #include <linux/bpf.h>
+#include <linux/hugetlb.h>
+#include <linux/hugetlb_inline.h>
 #include <linux/mm.h>
 #include <linux/netdevice.h>
 #include <linux/rtnetlink.h>
@@ -91,9 +93,39 @@ void xdp_put_umem(struct xdp_umem *umem, bool defer_cleanup)
 	}
 }
 
+/* NOTE: The mmap_lock must be held by the caller. */
+static void xdp_umem_init_page_size(struct xdp_umem *umem, unsigned long address)
+{
+#ifdef CONFIG_HUGETLB_PAGE
+	struct vm_area_struct *vma;
+	struct vma_iterator vmi;
+	unsigned long end;
+
+	if (!IS_ALIGNED(address, HPAGE_SIZE))
+		goto no_hugetlb;
+
+	vma_iter_init(&vmi, current->mm, address);
+	end = address + umem->size;
+
+	for_each_vma_range(vmi, vma, end) {
+		if (!is_vm_hugetlb_page(vma))
+			goto no_hugetlb;
+		/* Hugepage sizes smaller than the default are not supported. */
+		if (huge_page_size(hstate_vma(vma)) < HPAGE_SIZE)
+			goto no_hugetlb;
+	}
+
+	umem->page_shift = HPAGE_SHIFT;
+	umem->page_size = HPAGE_SIZE;
+	return;
+no_hugetlb:
+#endif
+	umem->page_shift = PAGE_SHIFT;
+	umem->page_size = PAGE_SIZE;
+}
+
 static int xdp_umem_pin_pages(struct xdp_umem *umem, unsigned long address)
 {
-	unsigned int gup_flags = FOLL_WRITE;
 	long npgs;
 	int err;
 
@@ -102,8 +134,18 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem, unsigned long address)
 		return -ENOMEM;
 
 	mmap_read_lock(current->mm);
+
+	xdp_umem_init_page_size(umem, address);
+
+	if (umem->chunk_size > umem->page_size) {
+		mmap_read_unlock(current->mm);
+		err = -EINVAL;
+		goto out_pgs;
+	}
+
 	npgs = pin_user_pages(address, umem->npgs,
-			      gup_flags | FOLL_LONGTERM, &umem->pgs[0], NULL);
+			      FOLL_WRITE | FOLL_LONGTERM, &umem->pgs[0], NULL);
+
 	mmap_read_unlock(current->mm);
 
 	if (npgs != umem->npgs) {
@@ -156,15 +198,8 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 	unsigned int chunks, chunks_rem;
 	int err;
 
-	if (chunk_size < XDP_UMEM_MIN_CHUNK_SIZE || chunk_size > PAGE_SIZE) {
-		/* Strictly speaking we could support this, if:
-		 * - huge pages, or*
-		 * - using an IOMMU, or
-		 * - making sure the memory area is consecutive
-		 * but for now, we simply say "computer says no".
-		 */
+	if (chunk_size < XDP_UMEM_MIN_CHUNK_SIZE || chunk_size > XDP_UMEM_MAX_CHUNK_SIZE)
 		return -EINVAL;
-	}
 
 	if (mr->flags & ~XDP_UMEM_UNALIGNED_CHUNK_FLAG)
 		return -EINVAL;
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 26f6d304451e..85b36c31b505 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -75,14 +75,16 @@ struct xsk_buff_pool *xp_create_and_assign_umem(struct xdp_sock *xs,
 
 	pool->chunk_mask = ~((u64)umem->chunk_size - 1);
 	pool->addrs_cnt = umem->size;
+	pool->page_shift = umem->page_shift;
+	pool->page_size = umem->page_size;
 	pool->heads_cnt = umem->chunks;
 	pool->free_heads_cnt = umem->chunks;
 	pool->headroom = umem->headroom;
 	pool->chunk_size = umem->chunk_size;
 	pool->chunk_shift = ffs(umem->chunk_size) - 1;
-	pool->unaligned = unaligned;
 	pool->frame_len = umem->chunk_size - umem->headroom -
 		XDP_PACKET_HEADROOM;
+	pool->unaligned = unaligned;
 	pool->umem = umem;
 	pool->addrs = umem->addrs;
 	INIT_LIST_HEAD(&pool->free_list);
@@ -328,7 +330,8 @@ static void xp_destroy_dma_map(struct xsk_dma_map *dma_map)
 	kfree(dma_map);
 }
 
-static void __xp_dma_unmap(struct xsk_dma_map *dma_map, unsigned long attrs)
+static void __xp_dma_unmap(struct xsk_buff_pool *pool, struct xsk_dma_map *dma_map,
+			   unsigned long attrs)
 {
 	dma_addr_t *dma;
 	u32 i;
@@ -337,7 +340,7 @@ static void __xp_dma_unmap(struct xsk_dma_map *dma_map, unsigned long attrs)
 		dma = &dma_map->dma_pages[i];
 		if (*dma) {
 			*dma &= ~XSK_NEXT_PG_CONTIG_MASK;
-			dma_unmap_page_attrs(dma_map->dev, *dma, PAGE_SIZE,
+			dma_unmap_page_attrs(dma_map->dev, *dma, pool->page_size,
 					     DMA_BIDIRECTIONAL, attrs);
 			*dma = 0;
 		}
@@ -362,7 +365,7 @@ void xp_dma_unmap(struct xsk_buff_pool *pool, unsigned long attrs)
 	if (!refcount_dec_and_test(&dma_map->users))
 		return;
 
-	__xp_dma_unmap(dma_map, attrs);
+	__xp_dma_unmap(pool, dma_map, attrs);
 	kvfree(pool->dma_pages);
 	pool->dma_pages = NULL;
 	pool->dma_pages_cnt = 0;
@@ -370,16 +373,17 @@ void xp_dma_unmap(struct xsk_buff_pool *pool, unsigned long attrs)
 }
 EXPORT_SYMBOL(xp_dma_unmap);
 
-static void xp_check_dma_contiguity(struct xsk_dma_map *dma_map)
+static void xp_check_dma_contiguity(struct xsk_dma_map *dma_map, u32 page_size)
 {
 	u32 i;
 
-	for (i = 0; i < dma_map->dma_pages_cnt - 1; i++) {
-		if (dma_map->dma_pages[i] + PAGE_SIZE == dma_map->dma_pages[i + 1])
+	for (i = 0; i + 1 < dma_map->dma_pages_cnt; i++) {
+		if (dma_map->dma_pages[i] + page_size == dma_map->dma_pages[i + 1])
 			dma_map->dma_pages[i] |= XSK_NEXT_PG_CONTIG_MASK;
 		else
 			dma_map->dma_pages[i] &= ~XSK_NEXT_PG_CONTIG_MASK;
 	}
+	dma_map->dma_pages[i] &= ~XSK_NEXT_PG_CONTIG_MASK;
 }
 
 static int xp_init_dma_info(struct xsk_buff_pool *pool, struct xsk_dma_map *dma_map)
@@ -412,6 +416,7 @@ int xp_dma_map(struct xsk_buff_pool *pool, struct device *dev,
 {
 	struct xsk_dma_map *dma_map;
 	dma_addr_t dma;
+	u32 stride;
 	int err;
 	u32 i;
 
@@ -425,15 +430,19 @@ int xp_dma_map(struct xsk_buff_pool *pool, struct device *dev,
 		return 0;
 	}
 
+	/* dma_pages use pool->page_size whereas `pages` are always order-0. */
+	stride = pool->page_size >> PAGE_SHIFT; /* in order-0 pages */
+	nr_pages = (nr_pages + stride - 1) >> (pool->page_shift - PAGE_SHIFT);
+
 	dma_map = xp_create_dma_map(dev, pool->netdev, nr_pages, pool->umem);
 	if (!dma_map)
 		return -ENOMEM;
 
 	for (i = 0; i < dma_map->dma_pages_cnt; i++) {
-		dma = dma_map_page_attrs(dev, pages[i], 0, PAGE_SIZE,
+		dma = dma_map_page_attrs(dev, pages[i * stride], 0, pool->page_size,
 					 DMA_BIDIRECTIONAL, attrs);
 		if (dma_mapping_error(dev, dma)) {
-			__xp_dma_unmap(dma_map, attrs);
+			__xp_dma_unmap(pool, dma_map, attrs);
 			return -ENOMEM;
 		}
 		if (dma_need_sync(dev, dma))
@@ -442,11 +451,11 @@ int xp_dma_map(struct xsk_buff_pool *pool, struct device *dev,
 	}
 
 	if (pool->unaligned)
-		xp_check_dma_contiguity(dma_map);
+		xp_check_dma_contiguity(dma_map, pool->page_size);
 
 	err = xp_init_dma_info(pool, dma_map);
 	if (err) {
-		__xp_dma_unmap(dma_map, attrs);
+		__xp_dma_unmap(pool, dma_map, attrs);
 		return err;
 	}
 
@@ -663,9 +672,8 @@ EXPORT_SYMBOL(xp_raw_get_data);
 dma_addr_t xp_raw_get_dma(struct xsk_buff_pool *pool, u64 addr)
 {
 	addr = pool->unaligned ? xp_unaligned_add_offset_to_addr(addr) : addr;
-	return (pool->dma_pages[addr >> PAGE_SHIFT] &
-		~XSK_NEXT_PG_CONTIG_MASK) +
-		(addr & ~PAGE_MASK);
+	return (pool->dma_pages[addr >> pool->page_shift] & ~XSK_NEXT_PG_CONTIG_MASK) +
+	       (addr & (pool->page_size - 1));
 }
 EXPORT_SYMBOL(xp_raw_get_dma);
 
-- 
2.39.2



* [PATCH bpf-next v5 3/4] selftests: xsk: Use hugepages when umem->frame_size > PAGE_SIZE
  2023-04-10 12:06 [PATCH bpf-next v5 0/4] xsk: Support UMEM chunk_size > PAGE_SIZE Kal Conley
  2023-04-10 12:06 ` [PATCH bpf-next v5 1/4] xsk: Use pool->dma_pages to check for DMA Kal Conley
  2023-04-10 12:06 ` [PATCH bpf-next v5 2/4] xsk: Support UMEM chunk_size > PAGE_SIZE Kal Conley
@ 2023-04-10 12:06 ` Kal Conley
  2023-04-10 12:06 ` [PATCH bpf-next v5 4/4] selftests: xsk: Add tests for 8K and 9K frame sizes Kal Conley
  2023-04-12 13:43 ` [PATCH bpf-next v5 0/4] xsk: Support UMEM chunk_size > PAGE_SIZE Magnus Karlsson
  4 siblings, 0 replies; 12+ messages in thread
From: Kal Conley @ 2023-04-10 12:06 UTC (permalink / raw)
  To: Magnus Karlsson, Björn Töpel, Maciej Fijalkowski,
	Jonathan Lemon, Andrii Nakryiko, Mykola Lysenko,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Shuah Khan
  Cc: Kal Conley, netdev, bpf, linux-kselftest, linux-kernel

HugeTLB UMEMs now support chunk_size > PAGE_SIZE. Set MAP_HUGETLB when
frame_size > PAGE_SIZE for future tests.

Signed-off-by: Kal Conley <kal.conley@dectris.com>
---
 tools/testing/selftests/bpf/xskxceiver.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/xskxceiver.c b/tools/testing/selftests/bpf/xskxceiver.c
index 5a9691e942de..7eccf57a0ccc 100644
--- a/tools/testing/selftests/bpf/xskxceiver.c
+++ b/tools/testing/selftests/bpf/xskxceiver.c
@@ -1289,7 +1289,7 @@ static void thread_common_ops(struct test_spec *test, struct ifobject *ifobject)
 	void *bufs;
 	int ret;
 
-	if (ifobject->umem->unaligned_mode)
+	if (ifobject->umem->frame_size > sysconf(_SC_PAGESIZE) || ifobject->umem->unaligned_mode)
 		mmap_flags |= MAP_HUGETLB;
 
 	if (ifobject->shared_umem)
-- 
2.39.2



* [PATCH bpf-next v5 4/4] selftests: xsk: Add tests for 8K and 9K frame sizes
  2023-04-10 12:06 [PATCH bpf-next v5 0/4] xsk: Support UMEM chunk_size > PAGE_SIZE Kal Conley
                   ` (2 preceding siblings ...)
  2023-04-10 12:06 ` [PATCH bpf-next v5 3/4] selftests: xsk: Use hugepages when umem->frame_size " Kal Conley
@ 2023-04-10 12:06 ` Kal Conley
  2023-04-12 13:43 ` [PATCH bpf-next v5 0/4] xsk: Support UMEM chunk_size > PAGE_SIZE Magnus Karlsson
  4 siblings, 0 replies; 12+ messages in thread
From: Kal Conley @ 2023-04-10 12:06 UTC (permalink / raw)
  To: Magnus Karlsson, Björn Töpel, Maciej Fijalkowski,
	Jonathan Lemon, Andrii Nakryiko, Mykola Lysenko,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Shuah Khan
  Cc: Kal Conley, netdev, bpf, linux-kselftest, linux-kernel

Add tests:
- RUN_TO_COMPLETION_8K_FRAME_SIZE: frame_size=8192 (aligned)
- UNALIGNED_9K_FRAME_SIZE: frame_size=9000 (unaligned)

Signed-off-by: Kal Conley <kal.conley@dectris.com>
---
 tools/testing/selftests/bpf/xskxceiver.c | 25 ++++++++++++++++++++++++
 tools/testing/selftests/bpf/xskxceiver.h |  2 ++
 2 files changed, 27 insertions(+)

diff --git a/tools/testing/selftests/bpf/xskxceiver.c b/tools/testing/selftests/bpf/xskxceiver.c
index 7eccf57a0ccc..86797de7fc50 100644
--- a/tools/testing/selftests/bpf/xskxceiver.c
+++ b/tools/testing/selftests/bpf/xskxceiver.c
@@ -1841,6 +1841,17 @@ static void run_pkt_test(struct test_spec *test, enum test_mode mode, enum test_
 		pkt_stream_replace(test, DEFAULT_PKT_CNT, PKT_SIZE);
 		testapp_validate_traffic(test);
 		break;
+	case TEST_TYPE_RUN_TO_COMPLETION_8K_FRAME:
+		if (!hugepages_present(test->ifobj_tx)) {
+			ksft_test_result_skip("No 2M huge pages present.\n");
+			return;
+		}
+		test_spec_set_name(test, "RUN_TO_COMPLETION_8K_FRAME_SIZE");
+		test->ifobj_tx->umem->frame_size = 8192;
+		test->ifobj_rx->umem->frame_size = 8192;
+		pkt_stream_replace(test, DEFAULT_PKT_CNT, PKT_SIZE);
+		testapp_validate_traffic(test);
+		break;
 	case TEST_TYPE_RX_POLL:
 		test->ifobj_rx->use_poll = true;
 		test_spec_set_name(test, "POLL_RX");
@@ -1904,6 +1915,20 @@ static void run_pkt_test(struct test_spec *test, enum test_mode mode, enum test_
 		if (!testapp_unaligned(test))
 			return;
 		break;
+	case TEST_TYPE_UNALIGNED_9K_FRAME:
+		if (!hugepages_present(test->ifobj_tx)) {
+			ksft_test_result_skip("No 2M huge pages present.\n");
+			return;
+		}
+		test_spec_set_name(test, "UNALIGNED_9K_FRAME_SIZE");
+		test->ifobj_tx->umem->frame_size = 9000;
+		test->ifobj_rx->umem->frame_size = 9000;
+		test->ifobj_tx->umem->unaligned_mode = true;
+		test->ifobj_rx->umem->unaligned_mode = true;
+		pkt_stream_replace(test, DEFAULT_PKT_CNT, PKT_SIZE);
+		test->ifobj_rx->pkt_stream->use_addr_for_fill = true;
+		testapp_validate_traffic(test);
+		break;
 	case TEST_TYPE_HEADROOM:
 		testapp_headroom(test);
 		break;
diff --git a/tools/testing/selftests/bpf/xskxceiver.h b/tools/testing/selftests/bpf/xskxceiver.h
index 919327807a4e..7f52f737f5e9 100644
--- a/tools/testing/selftests/bpf/xskxceiver.h
+++ b/tools/testing/selftests/bpf/xskxceiver.h
@@ -69,12 +69,14 @@ enum test_mode {
 enum test_type {
 	TEST_TYPE_RUN_TO_COMPLETION,
 	TEST_TYPE_RUN_TO_COMPLETION_2K_FRAME,
+	TEST_TYPE_RUN_TO_COMPLETION_8K_FRAME,
 	TEST_TYPE_RUN_TO_COMPLETION_SINGLE_PKT,
 	TEST_TYPE_RX_POLL,
 	TEST_TYPE_TX_POLL,
 	TEST_TYPE_POLL_RXQ_TMOUT,
 	TEST_TYPE_POLL_TXQ_TMOUT,
 	TEST_TYPE_UNALIGNED,
+	TEST_TYPE_UNALIGNED_9K_FRAME,
 	TEST_TYPE_ALIGNED_INV_DESC,
 	TEST_TYPE_ALIGNED_INV_DESC_2K_FRAME,
 	TEST_TYPE_UNALIGNED_INV_DESC,
-- 
2.39.2



* Re: [PATCH bpf-next v5 2/4] xsk: Support UMEM chunk_size > PAGE_SIZE
  2023-04-10 12:06 ` [PATCH bpf-next v5 2/4] xsk: Support UMEM chunk_size > PAGE_SIZE Kal Conley
@ 2023-04-11  3:10   ` Bagas Sanjaya
  2023-04-12 13:39   ` Magnus Karlsson
  1 sibling, 0 replies; 12+ messages in thread
From: Bagas Sanjaya @ 2023-04-11  3:10 UTC (permalink / raw)
  To: Kal Conley, Magnus Karlsson, Björn Töpel,
	Maciej Fijalkowski, Jonathan Lemon, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jonathan Corbet,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend
  Cc: netdev, bpf, linux-doc, linux-kernel


On Mon, Apr 10, 2023 at 02:06:27PM +0200, Kal Conley wrote:
> diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst
> index 247c6c4127e9..ea65cd882af6 100644
> --- a/Documentation/networking/af_xdp.rst
> +++ b/Documentation/networking/af_xdp.rst
> @@ -105,12 +105,13 @@ with AF_XDP". It can be found at https://lwn.net/Articles/750845/.
>  UMEM
>  ----
>  
> -UMEM is a region of virtual contiguous memory, divided into
> -equal-sized frames. An UMEM is associated to a netdev and a specific
> -queue id of that netdev. It is created and configured (chunk size,
> -headroom, start address and size) by using the XDP_UMEM_REG setsockopt
> -system call. A UMEM is bound to a netdev and queue id, via the bind()
> -system call.
> +UMEM is a region of virtual contiguous memory divided into equal-sized
> +frames. This is the area that contains all the buffers that packets can
> +reside in. A UMEM is associated with a netdev and a specific queue id of
> +that netdev. It is created and configured (start address, size,
> +chunk size, and headroom) by using the XDP_UMEM_REG setsockopt system
> +call. A UMEM is bound to a netdev and queue id via the bind() system
> +call.
>  
>  An AF_XDP is socket linked to a single UMEM, but one UMEM can have
>  multiple AF_XDP sockets. To share an UMEM created via one socket A,
> @@ -418,14 +419,21 @@ negatively impact performance.
>  XDP_UMEM_REG setsockopt
>  -----------------------
>  
> -This setsockopt registers a UMEM to a socket. This is the area that
> -contain all the buffers that packet can reside in. The call takes a
> -pointer to the beginning of this area and the size of it. Moreover, it
> -also has parameter called chunk_size that is the size that the UMEM is
> -divided into. It can only be 2K or 4K at the moment. If you have an
> -UMEM area that is 128K and a chunk size of 2K, this means that you
> -will be able to hold a maximum of 128K / 2K = 64 packets in your UMEM
> -area and that your largest packet size can be 2K.
> +This setsockopt registers a UMEM to a socket. The call takes a pointer
> +to the beginning of this area and the size of it. Moreover, there is a
> +parameter called chunk_size that is the size that the UMEM is divided
> +into. The chunk size limits the maximum packet size that can be sent or
> +received. For example, if you have a UMEM area that is 128K and a chunk
> +size of 2K, then you will be able to hold a maximum of 128K / 2K = 64
> +packets in your UMEM. In this case, the maximum packet size will be 2K.
> +
> +Valid chunk sizes range from 2K to 64K. However, in aligned mode, the
> +chunk size must also be a power of two. Additionally, the chunk size
> +must not exceed the size of a page (usually 4K). This limitation is
> +relaxed for UMEM areas allocated with HugeTLB pages, in which case
> +chunk sizes up to 64K are allowed. Note, this only works with hugepages
> +allocated from the kernel's persistent pool. Using Transparent Huge
> +Pages (THP) has no effect on the maximum chunk size.
>  
>  There is also an option to set the headroom of each single buffer in
>  the UMEM. If you set this to N bytes, it means that the packet will

The doc LGTM, thanks!

For the doc part,

Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>

-- 
An old man doll... just what I always wanted! - Clara



* Re: [PATCH bpf-next v5 2/4] xsk: Support UMEM chunk_size > PAGE_SIZE
  2023-04-10 12:06 ` [PATCH bpf-next v5 2/4] xsk: Support UMEM chunk_size > PAGE_SIZE Kal Conley
  2023-04-11  3:10   ` Bagas Sanjaya
@ 2023-04-12 13:39   ` Magnus Karlsson
  2023-04-12 14:35     ` Kal Cutter Conley
  1 sibling, 1 reply; 12+ messages in thread
From: Magnus Karlsson @ 2023-04-12 13:39 UTC (permalink / raw)
  To: Kal Conley
  Cc: Magnus Karlsson, Björn Töpel, Maciej Fijalkowski,
	Jonathan Lemon, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Jonathan Corbet, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, linux-doc, linux-kernel

On Mon, 10 Apr 2023 at 14:08, Kal Conley <kal.conley@dectris.com> wrote:
>
> Add core AF_XDP support for chunk sizes larger than PAGE_SIZE. This
> enables sending/receiving jumbo ethernet frames up to the theoretical
> maximum of 64 KiB. For chunk sizes > PAGE_SIZE, the UMEM is required
> to consist of HugeTLB VMAs (and be hugepage aligned). Initially, only
> SKB mode is usable pending future driver work.
>
> For consistency, check for HugeTLB pages during UMEM registration. This
> implies that hugepages are required for XDP_COPY mode despite DMA not
> being used. This restriction is desirable since it ensures user software
> can take advantage of future driver support.
>
> Despite this change, always store order-0 pages in the umem->pgs array
> since this is what is returned by pin_user_pages(). Conversely, XSK
> pools bound to HugeTLB UMEMs do DMA page accounting at hugepage
> granularity (HPAGE_SIZE).
>
> No significant change in RX/TX performance was observed with this patch.
> A few data points are reproduced below:
>
> Machine : Dell PowerEdge R940
> CPU     : Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz
> NIC     : MT27700 Family [ConnectX-4]
>
> +-----+------+------+-------+--------+--------+--------+
> |     |      |      | chunk | packet | rxdrop | rxdrop |
> |     | mode |  mtu |  size |   size | (Mpps) | (Gbps) |
> +-----+------+------+-------+--------+--------+--------+
> | old |   -z | 3498 |  4000 |    320 |   15.9 |   40.8 |
> | new |   -z | 3498 |  4000 |    320 |   15.9 |   40.8 |
> +-----+------+------+-------+--------+--------+--------+
> | old |   -z | 3498 |  4096 |    320 |   16.5 |   42.2 |
> | new |   -z | 3498 |  4096 |    320 |   16.5 |   42.3 |
> +-----+------+------+-------+--------+--------+--------+
> | new |   -c | 3498 | 10240 |    320 |    6.1 |   15.7 |
> +-----+------+------+-------+--------+--------+--------+
> | new |   -S | 9000 | 10240 |   9000 |   0.37 |   26.4 |
> +-----+------+------+-------+--------+--------+--------+
>
> Signed-off-by: Kal Conley <kal.conley@dectris.com>
> ---
>  Documentation/networking/af_xdp.rst | 36 +++++++++++--------
>  include/net/xdp_sock.h              |  2 ++
>  include/net/xdp_sock_drv.h          | 12 +++++++
>  include/net/xsk_buff_pool.h         | 10 +++---
>  net/xdp/xdp_umem.c                  | 55 +++++++++++++++++++++++------
>  net/xdp/xsk_buff_pool.c             | 36 +++++++++++--------
>  6 files changed, 109 insertions(+), 42 deletions(-)
>
> diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst
> index 247c6c4127e9..ea65cd882af6 100644
> --- a/Documentation/networking/af_xdp.rst
> +++ b/Documentation/networking/af_xdp.rst
> @@ -105,12 +105,13 @@ with AF_XDP". It can be found at https://lwn.net/Articles/750845/.
>  UMEM
>  ----
>
> -UMEM is a region of virtual contiguous memory, divided into
> -equal-sized frames. An UMEM is associated to a netdev and a specific
> -queue id of that netdev. It is created and configured (chunk size,
> -headroom, start address and size) by using the XDP_UMEM_REG setsockopt
> -system call. A UMEM is bound to a netdev and queue id, via the bind()
> -system call.
> +UMEM is a region of virtual contiguous memory divided into equal-sized
> +frames. This is the area that contains all the buffers that packets can
> +reside in. A UMEM is associated with a netdev and a specific queue id of
> +that netdev. It is created and configured (start address, size,
> +chunk size, and headroom) by using the XDP_UMEM_REG setsockopt system
> +call. A UMEM is bound to a netdev and queue id via the bind() system
> +call.
>
>  An AF_XDP is socket linked to a single UMEM, but one UMEM can have
>  multiple AF_XDP sockets. To share an UMEM created via one socket A,
> @@ -418,14 +419,21 @@ negatively impact performance.
>  XDP_UMEM_REG setsockopt
>  -----------------------
>
> -This setsockopt registers a UMEM to a socket. This is the area that
> -contain all the buffers that packet can reside in. The call takes a
> -pointer to the beginning of this area and the size of it. Moreover, it
> -also has parameter called chunk_size that is the size that the UMEM is
> -divided into. It can only be 2K or 4K at the moment. If you have an
> -UMEM area that is 128K and a chunk size of 2K, this means that you
> -will be able to hold a maximum of 128K / 2K = 64 packets in your UMEM
> -area and that your largest packet size can be 2K.
> +This setsockopt registers a UMEM to a socket. The call takes a pointer
> +to the beginning of this area and the size of it. Moreover, there is a
> +parameter called chunk_size that is the size that the UMEM is divided
> +into. The chunk size limits the maximum packet size that can be sent or
> +received. For example, if you have a UMEM area that is 128K and a chunk
> +size of 2K, then you will be able to hold a maximum of 128K / 2K = 64
> +packets in your UMEM. In this case, the maximum packet size will be 2K.
> +
> +Valid chunk sizes range from 2K to 64K. However, in aligned mode, the
> +chunk size must also be a power of two. Additionally, the chunk size
> +must not exceed the size of a page (usually 4K). This limitation is
> +relaxed for UMEM areas allocated with HugeTLB pages, in which case
> +chunk sizes up to 64K are allowed. Note, this only works with hugepages
> +allocated from the kernel's persistent pool. Using Transparent Huge
> +Pages (THP) has no effect on the maximum chunk size.
>
>  There is also an option to set the headroom of each single buffer in
>  the UMEM. If you set this to N bytes, it means that the packet will
> diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
> index e96a1151ec75..a71589539c38 100644
> --- a/include/net/xdp_sock.h
> +++ b/include/net/xdp_sock.h
> @@ -25,6 +25,8 @@ struct xdp_umem {
>         u32 chunk_size;
>         u32 chunks;
>         u32 npgs;
> +       u32 page_shift;
> +       u32 page_size;
>         struct user_struct *user;
>         refcount_t users;
>         u8 flags;
> diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
> index 9c0d860609ba..83fba3060c9a 100644
> --- a/include/net/xdp_sock_drv.h
> +++ b/include/net/xdp_sock_drv.h
> @@ -12,6 +12,18 @@
>  #define XDP_UMEM_MIN_CHUNK_SHIFT 11
>  #define XDP_UMEM_MIN_CHUNK_SIZE (1 << XDP_UMEM_MIN_CHUNK_SHIFT)
>
> +static_assert(XDP_UMEM_MIN_CHUNK_SIZE <= PAGE_SIZE);
> +
> +/* Allow chunk sizes up to the maximum size of an ethernet frame (64 KiB).
> + * Larger chunks are not guaranteed to fit in a single SKB.
> + */
> +#ifdef CONFIG_HUGETLB_PAGE
> +#define XDP_UMEM_MAX_CHUNK_SHIFT min(16, HPAGE_SHIFT)
> +#else
> +#define XDP_UMEM_MAX_CHUNK_SHIFT min(16, PAGE_SHIFT)
> +#endif
> +#define XDP_UMEM_MAX_CHUNK_SIZE (1 << XDP_UMEM_MAX_CHUNK_SHIFT)
> +
>  #ifdef CONFIG_XDP_SOCKETS
>
>  void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries);
> diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
> index a8d7b8a3688a..af822b322d89 100644
> --- a/include/net/xsk_buff_pool.h
> +++ b/include/net/xsk_buff_pool.h
> @@ -68,6 +68,8 @@ struct xsk_buff_pool {
>         struct xdp_desc *tx_descs;
>         u64 chunk_mask;
>         u64 addrs_cnt;
> +       u32 page_shift;
> +       u32 page_size;
>         u32 free_list_cnt;
>         u32 dma_pages_cnt;
>         u32 free_heads_cnt;
> @@ -123,8 +125,8 @@ static inline void xp_init_xskb_addr(struct xdp_buff_xsk *xskb, struct xsk_buff_
>  static inline void xp_init_xskb_dma(struct xdp_buff_xsk *xskb, struct xsk_buff_pool *pool,
>                                     dma_addr_t *dma_pages, u64 addr)
>  {
> -       xskb->frame_dma = (dma_pages[addr >> PAGE_SHIFT] & ~XSK_NEXT_PG_CONTIG_MASK) +
> -               (addr & ~PAGE_MASK);
> +       xskb->frame_dma = (dma_pages[addr >> pool->page_shift] & ~XSK_NEXT_PG_CONTIG_MASK) +
> +                         (addr & (pool->page_size - 1));
>         xskb->dma = xskb->frame_dma + pool->headroom + XDP_PACKET_HEADROOM;
>  }
>
> @@ -175,13 +177,13 @@ static inline void xp_dma_sync_for_device(struct xsk_buff_pool *pool,
>  static inline bool xp_desc_crosses_non_contig_pg(struct xsk_buff_pool *pool,
>                                                  u64 addr, u32 len)
>  {
> -       bool cross_pg = (addr & (PAGE_SIZE - 1)) + len > PAGE_SIZE;
> +       bool cross_pg = (addr & (pool->page_size - 1)) + len > pool->page_size;
>
>         if (likely(!cross_pg))
>                 return false;
>
>         return pool->dma_pages &&
> -              !(pool->dma_pages[addr >> PAGE_SHIFT] & XSK_NEXT_PG_CONTIG_MASK);
> +              !(pool->dma_pages[addr >> pool->page_shift] & XSK_NEXT_PG_CONTIG_MASK);
>  }
>
>  static inline u64 xp_aligned_extract_addr(struct xsk_buff_pool *pool, u64 addr)
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index 4681e8e8ad94..6fb984be8f40 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -10,6 +10,8 @@
>  #include <linux/uaccess.h>
>  #include <linux/slab.h>
>  #include <linux/bpf.h>
> +#include <linux/hugetlb.h>
> +#include <linux/hugetlb_inline.h>
>  #include <linux/mm.h>
>  #include <linux/netdevice.h>
>  #include <linux/rtnetlink.h>
> @@ -91,9 +93,39 @@ void xdp_put_umem(struct xdp_umem *umem, bool defer_cleanup)
>         }
>  }
>
> +/* NOTE: The mmap_lock must be held by the caller. */
> +static void xdp_umem_init_page_size(struct xdp_umem *umem, unsigned long address)
> +{
> +#ifdef CONFIG_HUGETLB_PAGE
> +       struct vm_area_struct *vma;
> +       struct vma_iterator vmi;
> +       unsigned long end;
> +
> +       if (!IS_ALIGNED(address, HPAGE_SIZE))
> +               goto no_hugetlb;
> +
> +       vma_iter_init(&vmi, current->mm, address);
> +       end = address + umem->size;
> +
> +       for_each_vma_range(vmi, vma, end) {
> +               if (!is_vm_hugetlb_page(vma))
> +                       goto no_hugetlb;
> +               /* Hugepage sizes smaller than the default are not supported. */
> +               if (huge_page_size(hstate_vma(vma)) < HPAGE_SIZE)
> +                       goto no_hugetlb;
> +       }
> +
> +       umem->page_shift = HPAGE_SHIFT;
> +       umem->page_size = HPAGE_SIZE;
> +       return;
> +no_hugetlb:
> +#endif
> +       umem->page_shift = PAGE_SHIFT;
> +       umem->page_size = PAGE_SIZE;
> +}
> +
>  static int xdp_umem_pin_pages(struct xdp_umem *umem, unsigned long address)
>  {
> -       unsigned int gup_flags = FOLL_WRITE;
>         long npgs;
>         int err;
>
> @@ -102,8 +134,18 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem, unsigned long address)
>                 return -ENOMEM;
>
>         mmap_read_lock(current->mm);
> +
> +       xdp_umem_init_page_size(umem, address);
> +
> +       if (umem->chunk_size > umem->page_size) {
> +               mmap_read_unlock(current->mm);
> +               err = -EINVAL;
> +               goto out_pgs;
> +       }
> +
>         npgs = pin_user_pages(address, umem->npgs,
> -                             gup_flags | FOLL_LONGTERM, &umem->pgs[0], NULL);
> +                             FOLL_WRITE | FOLL_LONGTERM, &umem->pgs[0], NULL);
> +
>         mmap_read_unlock(current->mm);
>
>         if (npgs != umem->npgs) {
> @@ -156,15 +198,8 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>         unsigned int chunks, chunks_rem;
>         int err;
>
> -       if (chunk_size < XDP_UMEM_MIN_CHUNK_SIZE || chunk_size > PAGE_SIZE) {
> -               /* Strictly speaking we could support this, if:
> -                * - huge pages, or*
> -                * - using an IOMMU, or
> -                * - making sure the memory area is consecutive
> -                * but for now, we simply say "computer says no".
> -                */
> +       if (chunk_size < XDP_UMEM_MIN_CHUNK_SIZE || chunk_size > XDP_UMEM_MAX_CHUNK_SIZE)
>                 return -EINVAL;
> -       }
>
>         if (mr->flags & ~XDP_UMEM_UNALIGNED_CHUNK_FLAG)
>                 return -EINVAL;
> diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
> index 26f6d304451e..85b36c31b505 100644
> --- a/net/xdp/xsk_buff_pool.c
> +++ b/net/xdp/xsk_buff_pool.c
> @@ -75,14 +75,16 @@ struct xsk_buff_pool *xp_create_and_assign_umem(struct xdp_sock *xs,
>
>         pool->chunk_mask = ~((u64)umem->chunk_size - 1);
>         pool->addrs_cnt = umem->size;
> +       pool->page_shift = umem->page_shift;
> +       pool->page_size = umem->page_size;
>         pool->heads_cnt = umem->chunks;
>         pool->free_heads_cnt = umem->chunks;
>         pool->headroom = umem->headroom;
>         pool->chunk_size = umem->chunk_size;
>         pool->chunk_shift = ffs(umem->chunk_size) - 1;
> -       pool->unaligned = unaligned;
>         pool->frame_len = umem->chunk_size - umem->headroom -
>                 XDP_PACKET_HEADROOM;
> +       pool->unaligned = unaligned;

nit: This change is not necessary.

>         pool->umem = umem;
>         pool->addrs = umem->addrs;
>         INIT_LIST_HEAD(&pool->free_list);
> @@ -328,7 +330,8 @@ static void xp_destroy_dma_map(struct xsk_dma_map *dma_map)
>         kfree(dma_map);
>  }
>
> -static void __xp_dma_unmap(struct xsk_dma_map *dma_map, unsigned long attrs)
> +static void __xp_dma_unmap(struct xsk_buff_pool *pool, struct xsk_dma_map *dma_map,
> +                          unsigned long attrs)

Instead of sending down the whole buffer pool, it would be better to
pass down the page_size here. __xp_dma_unmap(*dma_map, attrs,
page_size)

Also makes it consistent with the check_dma_contiguity below.

>  {
>         dma_addr_t *dma;
>         u32 i;
> @@ -337,7 +340,7 @@ static void __xp_dma_unmap(struct xsk_dma_map *dma_map, unsigned long attrs)
>                 dma = &dma_map->dma_pages[i];
>                 if (*dma) {
>                         *dma &= ~XSK_NEXT_PG_CONTIG_MASK;
> -                       dma_unmap_page_attrs(dma_map->dev, *dma, PAGE_SIZE,
> +                       dma_unmap_page_attrs(dma_map->dev, *dma, pool->page_size,
>                                              DMA_BIDIRECTIONAL, attrs);
>                         *dma = 0;
>                 }
> @@ -362,7 +365,7 @@ void xp_dma_unmap(struct xsk_buff_pool *pool, unsigned long attrs)
>         if (!refcount_dec_and_test(&dma_map->users))
>                 return;
>
> -       __xp_dma_unmap(dma_map, attrs);
> +       __xp_dma_unmap(pool, dma_map, attrs);
>         kvfree(pool->dma_pages);
>         pool->dma_pages = NULL;
>         pool->dma_pages_cnt = 0;
> @@ -370,16 +373,17 @@ void xp_dma_unmap(struct xsk_buff_pool *pool, unsigned long attrs)
>  }
>  EXPORT_SYMBOL(xp_dma_unmap);
>
> -static void xp_check_dma_contiguity(struct xsk_dma_map *dma_map)
> +static void xp_check_dma_contiguity(struct xsk_dma_map *dma_map, u32 page_size)
>  {
>         u32 i;
>
> -       for (i = 0; i < dma_map->dma_pages_cnt - 1; i++) {
> -               if (dma_map->dma_pages[i] + PAGE_SIZE == dma_map->dma_pages[i + 1])
> +       for (i = 0; i + 1 < dma_map->dma_pages_cnt; i++) {

I think the previous version is clearer than this new one.

> +               if (dma_map->dma_pages[i] + page_size == dma_map->dma_pages[i + 1])
>                         dma_map->dma_pages[i] |= XSK_NEXT_PG_CONTIG_MASK;
>                 else
>                         dma_map->dma_pages[i] &= ~XSK_NEXT_PG_CONTIG_MASK;
>         }
> +       dma_map->dma_pages[i] &= ~XSK_NEXT_PG_CONTIG_MASK;
>  }
>
>  static int xp_init_dma_info(struct xsk_buff_pool *pool, struct xsk_dma_map *dma_map)
> @@ -412,6 +416,7 @@ int xp_dma_map(struct xsk_buff_pool *pool, struct device *dev,
>  {
>         struct xsk_dma_map *dma_map;
>         dma_addr_t dma;
> +       u32 stride;
>         int err;
>         u32 i;
>
> @@ -425,15 +430,19 @@ int xp_dma_map(struct xsk_buff_pool *pool, struct device *dev,
>                 return 0;
>         }
>
> +       /* dma_pages use pool->page_size whereas `pages` are always order-0. */
> +       stride = pool->page_size >> PAGE_SHIFT; /* in order-0 pages */
> +       nr_pages = (nr_pages + stride - 1) >> (pool->page_shift - PAGE_SHIFT);
> +
>         dma_map = xp_create_dma_map(dev, pool->netdev, nr_pages, pool->umem);
>         if (!dma_map)
>                 return -ENOMEM;
>
>         for (i = 0; i < dma_map->dma_pages_cnt; i++) {
> -               dma = dma_map_page_attrs(dev, pages[i], 0, PAGE_SIZE,
> +               dma = dma_map_page_attrs(dev, pages[i * stride], 0, pool->page_size,
>                                          DMA_BIDIRECTIONAL, attrs);
>                 if (dma_mapping_error(dev, dma)) {
> -                       __xp_dma_unmap(dma_map, attrs);
> +                       __xp_dma_unmap(pool, dma_map, attrs);
>                         return -ENOMEM;
>                 }
>                 if (dma_need_sync(dev, dma))
> @@ -442,11 +451,11 @@ int xp_dma_map(struct xsk_buff_pool *pool, struct device *dev,
>         }
>
>         if (pool->unaligned)
> -               xp_check_dma_contiguity(dma_map);
> +               xp_check_dma_contiguity(dma_map, pool->page_size);
>
>         err = xp_init_dma_info(pool, dma_map);
>         if (err) {
> -               __xp_dma_unmap(dma_map, attrs);
> +               __xp_dma_unmap(pool, dma_map, attrs);
>                 return err;
>         }
>
> @@ -663,9 +672,8 @@ EXPORT_SYMBOL(xp_raw_get_data);
>  dma_addr_t xp_raw_get_dma(struct xsk_buff_pool *pool, u64 addr)
>  {
>         addr = pool->unaligned ? xp_unaligned_add_offset_to_addr(addr) : addr;
> -       return (pool->dma_pages[addr >> PAGE_SHIFT] &
> -               ~XSK_NEXT_PG_CONTIG_MASK) +
> -               (addr & ~PAGE_MASK);
> +       return (pool->dma_pages[addr >> pool->page_shift] & ~XSK_NEXT_PG_CONTIG_MASK) +
> +              (addr & (pool->page_size - 1));
>  }
>  EXPORT_SYMBOL(xp_raw_get_dma);
>
> --
> 2.39.2
>


* Re: [PATCH bpf-next v5 0/4] xsk: Support UMEM chunk_size > PAGE_SIZE
  2023-04-10 12:06 [PATCH bpf-next v5 0/4] xsk: Support UMEM chunk_size > PAGE_SIZE Kal Conley
                   ` (3 preceding siblings ...)
  2023-04-10 12:06 ` [PATCH bpf-next v5 4/4] selftests: xsk: Add tests for 8K and 9K frame sizes Kal Conley
@ 2023-04-12 13:43 ` Magnus Karlsson
  2023-04-12 18:51   ` Kal Cutter Conley
  4 siblings, 1 reply; 12+ messages in thread
From: Magnus Karlsson @ 2023-04-12 13:43 UTC (permalink / raw)
  To: Kal Conley
  Cc: Magnus Karlsson, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, netdev, bpf

On Mon, 10 Apr 2023 at 14:08, Kal Conley <kal.conley@dectris.com> wrote:
>
> The main purpose of this patchset is to add AF_XDP support for UMEM
> chunk sizes > PAGE_SIZE. This is enabled for UMEMs backed by HugeTLB
> pages.
>
> Note, v5 fixes a major bug in previous versions of this patchset.
> In particular, dma_map_page_attrs used to be called once for each
> order-0 page in a hugepage with the assumption that returned I/O
> addresses are contiguous within a hugepage. This assumption is incorrect
> when an IOMMU is enabled. To fix this, v5 does DMA page accounting
> accounting at hugepage granularity.

Thank you so much Kal for implementing this feature. After you have
fixed the three small things I had for patch #2, you have my ack for
the whole set below. Please add it.

For the whole set:
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>

It would be great if you have the time and desire to also take this to
zero-copy mode. I have had multiple AF_XDP users mailing me privately
that such a feature would be very useful for them. For some of them it
was even a requirement to be able to get down to the latencies they
were aiming for.

> Changes since v4:
>   * Use hugepages in DMA map (fixes zero-copy mode with IOMMU).
>   * Use pool->dma_pages to check for DMA. This change is needed to avoid
>     performance regressions).
>   * Update commit message and benchmark table.
>
> Changes since v3:
>   * Fix checkpatch.pl whitespace error.
>
> Changes since v2:
>   * Related fixes/improvements included with v2 have been removed. These
>     changes have all been resubmitted as standalone patchsets.
>   * Minimize uses of #ifdef CONFIG_HUGETLB_PAGE.
>   * Improve AF_XDP documentation.
>   * Update benchmark table in commit message.
>
> Changes since v1:
>   * Add many fixes/improvements to the XSK selftests.
>   * Add check for unaligned descriptors that overrun UMEM.
>   * Fix compile errors when CONFIG_HUGETLB_PAGE is not set.
>   * Fix incorrect use of _Static_assert.
>   * Update AF_XDP documentation.
>   * Rename unaligned 9K frame size test.
>   * Make xp_check_dma_contiguity less conservative.
>   * Add more information to benchmark table.
>
> Thanks to Magnus Karlsson for all his support!
>
> Happy Easter!
>
> Kal Conley (4):
>   xsk: Use pool->dma_pages to check for DMA
>   xsk: Support UMEM chunk_size > PAGE_SIZE
>   selftests: xsk: Use hugepages when umem->frame_size > PAGE_SIZE
>   selftests: xsk: Add tests for 8K and 9K frame sizes
>
>  Documentation/networking/af_xdp.rst      | 36 ++++++++++------
>  include/net/xdp_sock.h                   |  2 +
>  include/net/xdp_sock_drv.h               | 12 ++++++
>  include/net/xsk_buff_pool.h              | 12 +++---
>  net/xdp/xdp_umem.c                       | 55 +++++++++++++++++++-----
>  net/xdp/xsk_buff_pool.c                  | 43 ++++++++++--------
>  tools/testing/selftests/bpf/xskxceiver.c | 27 +++++++++++-
>  tools/testing/selftests/bpf/xskxceiver.h |  2 +
>  8 files changed, 142 insertions(+), 47 deletions(-)
>
> --
> 2.39.2
>


* Re: [PATCH bpf-next v5 2/4] xsk: Support UMEM chunk_size > PAGE_SIZE
  2023-04-12 13:39   ` Magnus Karlsson
@ 2023-04-12 14:35     ` Kal Cutter Conley
  2023-04-12 15:02       ` Magnus Karlsson
  0 siblings, 1 reply; 12+ messages in thread
From: Kal Cutter Conley @ 2023-04-12 14:35 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: Magnus Karlsson, Björn Töpel, Maciej Fijalkowski,
	Jonathan Lemon, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Jonathan Corbet, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, linux-doc, linux-kernel

> > -       pool->unaligned = unaligned;
> >         pool->frame_len = umem->chunk_size - umem->headroom -
> >                 XDP_PACKET_HEADROOM;
> > +       pool->unaligned = unaligned;
>
> nit: This change is not necessary.

Do you mind if we keep it? It makes the assignments better match the
order in the struct declaration.

> > -static void xp_check_dma_contiguity(struct xsk_dma_map *dma_map)
> > +static void xp_check_dma_contiguity(struct xsk_dma_map *dma_map, u32 page_size)
> >  {
> >         u32 i;
> >
> > -       for (i = 0; i < dma_map->dma_pages_cnt - 1; i++) {
> > -               if (dma_map->dma_pages[i] + PAGE_SIZE == dma_map->dma_pages[i + 1])
> > +       for (i = 0; i + 1 < dma_map->dma_pages_cnt; i++) {
>
> I think the previous version is clearer than this new one.

I like using `i + 1` since it matches the subscript usage. I'm used to
writing it like this for SIMD code where subtraction may wrap if the
length is unsigned; that doesn't matter in this case, though. I can
restore the old way if you want.
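
A standalone sketch of the wrap hazard in question (illustration only,
not part of the patch; as noted above, the kernel function never sees a
zero count in practice):

#include <stdio.h>

int main(void)
{
        unsigned int cnt = 0;   /* imagine an empty array */
        unsigned int i, a = 0, b = 0;

        /* cnt - 1 underflows to UINT_MAX, so this form would iterate (and
         * index out of bounds in real code); capped at 8 just for the demo.
         */
        for (i = 0; i < cnt - 1 && i < 8; i++)
                a++;

        /* i + 1 < cnt is simply false for cnt == 0, so the loop is skipped. */
        for (i = 0; i + 1 < cnt; i++)
                b++;

        printf("cnt-1 form: %u iterations (runaway), i+1 form: %u\n", a, b);
        return 0;
}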


* Re: [PATCH bpf-next v5 2/4] xsk: Support UMEM chunk_size > PAGE_SIZE
  2023-04-12 14:35     ` Kal Cutter Conley
@ 2023-04-12 15:02       ` Magnus Karlsson
  0 siblings, 0 replies; 12+ messages in thread
From: Magnus Karlsson @ 2023-04-12 15:02 UTC (permalink / raw)
  To: Kal Cutter Conley
  Cc: Magnus Karlsson, Björn Töpel, Maciej Fijalkowski,
	Jonathan Lemon, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Jonathan Corbet, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, linux-doc, linux-kernel

On Wed, 12 Apr 2023 at 16:30, Kal Cutter Conley <kal.conley@dectris.com> wrote:
>
> > > -       pool->unaligned = unaligned;
> > >         pool->frame_len = umem->chunk_size - umem->headroom -
> > >                 XDP_PACKET_HEADROOM;
> > > +       pool->unaligned = unaligned;
> >
> > nit: This change is not necessary.
>
> Do you mind if we keep it? It makes the assignments better match the
> order in the struct declaration.

Do not mind.

> > > -static void xp_check_dma_contiguity(struct xsk_dma_map *dma_map)
> > > +static void xp_check_dma_contiguity(struct xsk_dma_map *dma_map, u32 page_size)
> > >  {
> > >         u32 i;
> > >
> > > -       for (i = 0; i < dma_map->dma_pages_cnt - 1; i++) {
> > > -               if (dma_map->dma_pages[i] + PAGE_SIZE == dma_map->dma_pages[i + 1])
> > > +       for (i = 0; i + 1 < dma_map->dma_pages_cnt; i++) {
> >
> > I think the previous version is clearer than this new one.
>
> I like using `i + 1` since it matches the subscript usage. I'm used to
> writing it like this for SIMD code where subtraction may wrap if the
> length is unsigned, that doesn't matter in this case though. I can
> restore the old way if you want.

Please restore it in that case. I am not used to SIMD code :-).


* Re: [PATCH bpf-next v5 0/4] xsk: Support UMEM chunk_size > PAGE_SIZE
  2023-04-12 13:43 ` [PATCH bpf-next v5 0/4] xsk: Support UMEM chunk_size > PAGE_SIZE Magnus Karlsson
@ 2023-04-12 18:51   ` Kal Cutter Conley
  2023-04-20 23:41     ` Alexei Starovoitov
  0 siblings, 1 reply; 12+ messages in thread
From: Kal Cutter Conley @ 2023-04-12 18:51 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: Magnus Karlsson, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, open list:NETWORKING [GENERAL],
	bpf, Tariq Toukan, Gal Pressman, Saeed Mahameed

> Thank you so much Kal for implementing this feature. After you have
> fixed the three small things I had for patch #2, you have my ack for
> the whole set below. Please add it.
>
> For the whole set:
> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
>
> It would be great if you have the time and desire to also take this to
> zero-copy mode. I have had multiple AF_XDP users mailing me privately
> that such a feature would be very useful for them. For some of them it
> was even a requirement to be able to get down to the latencies they
> were aiming for.

Yes. We need this to work with zero-copy, so next I will look into
implementing it for the mlx5 driver, since it has to work for us on
Mellanox adapters.

Roping in the Mellanox engineers in case they want to add something :-)


* Re: [PATCH bpf-next v5 0/4] xsk: Support UMEM chunk_size > PAGE_SIZE
  2023-04-12 18:51   ` Kal Cutter Conley
@ 2023-04-20 23:41     ` Alexei Starovoitov
  0 siblings, 0 replies; 12+ messages in thread
From: Alexei Starovoitov @ 2023-04-20 23:41 UTC (permalink / raw)
  To: Kal Cutter Conley
  Cc: Magnus Karlsson, Magnus Karlsson, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend,
	open list:NETWORKING [GENERAL],
	bpf, Tariq Toukan, Gal Pressman, Saeed Mahameed

On Wed, Apr 12, 2023 at 11:47 AM Kal Cutter Conley
<kal.conley@dectris.com> wrote:
>
> > Thank you so much Kal for implementing this feature. After you have
> > fixed the three small things I had for patch #2, you have my ack for
> > the whole set below. Please add it.
> >
> > For the whole set:
> > Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
> >
> > It would be great if you have the time and desire to also take this to
> > zero-copy mode. I have had multiple AF_XDP users mailing me privately
> > that such a feature would be very useful for them. For some of them it
> > was even a requirement to be able to get down to the latencies they
> > were aiming for.
>
> Yes. We need this to work with zero-copy so next I will look into
> implementing this for the mlx5 driver since it has to work for us on
> Mellanox adapters.

My understanding is that the v6 version of this set is good to go.
Unfortunately it gained a merge conflict while sitting in patchwork.
Please respin with all acks.

