linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next v2 0/3] xdpsock: allow mmap2 usage for 32bits
@ 2019-08-15 12:13 Ivan Khoronzhuk
  2019-08-15 12:13 ` [PATCH bpf-next v2 1/3] libbpf: use LFS (_FILE_OFFSET_BITS) instead of direct mmap2 syscall Ivan Khoronzhuk
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Ivan Khoronzhuk @ 2019-08-15 12:13 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel
  Cc: davem, hawk, john.fastabend, jakub.kicinski, daniel, netdev, bpf,
	xdp-newbies, linux-kernel, jlemon, yhs, andrii.nakryiko,
	Ivan Khoronzhuk

This patchset contains several improvements for af_xdp socket umem
mappings for 32bit systems. Also, there is one more patch outside of
this series that on linux-next tree and related to mmap2 af_xdp umem
offsets: "mm: mmap: increase sockets maximum memory size pgoff for 32bits"
https://lkml.org/lkml/2019/8/12/549

Based on bpf-next/master

Prev: https://lkml.org/lkml/2019/8/13/437

v2..v1:
	- replaced "libbpf: add asm/unistd.h to xsk to get __NR_mmap2" on
	 "libbpf: use LFS (_FILE_OFFSET_BITS) instead of direct mmap2
	 syscall"
	- use vmap along with page_address to avoid overkill
	- define mmap syscall trace5 for mmap if defined

Ivan Khoronzhuk (3):
  libbpf: use LFS (_FILE_OFFSET_BITS) instead of direct mmap2 syscall
  xdp: xdp_umem: replace kmap on vmap for umem map
  samples: bpf: syscal_nrs: use mmap2 if defined

 net/xdp/xdp_umem.c         | 36 +++++++++++++++++++++++-----
 samples/bpf/syscall_nrs.c  |  6 +++++
 samples/bpf/tracex5_kern.c | 13 ++++++++++
 tools/lib/bpf/Makefile     |  1 +
 tools/lib/bpf/xsk.c        | 49 +++++++++++---------------------------
 5 files changed, 64 insertions(+), 41 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH bpf-next v2 1/3] libbpf: use LFS (_FILE_OFFSET_BITS) instead of direct mmap2 syscall
  2019-08-15 12:13 [PATCH bpf-next v2 0/3] xdpsock: allow mmap2 usage for 32bits Ivan Khoronzhuk
@ 2019-08-15 12:13 ` Ivan Khoronzhuk
  2019-08-16  0:06   ` Yonghong Song
  2019-08-15 12:13 ` [PATCH bpf-next v2 2/3] xdp: xdp_umem: replace kmap on vmap for umem map Ivan Khoronzhuk
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Ivan Khoronzhuk @ 2019-08-15 12:13 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel
  Cc: davem, hawk, john.fastabend, jakub.kicinski, daniel, netdev, bpf,
	xdp-newbies, linux-kernel, jlemon, yhs, andrii.nakryiko,
	Ivan Khoronzhuk

Drop __NR_mmap2 fork in flavor of LFS, that is _FILE_OFFSET_BITS=64
(glibc & bionic) / LARGEFILE64_SOURCE (for musl) decision. It allows
mmap() to use 64bit offset that is passed to mmap2 syscall. As result
pgoff is not truncated and no need to use direct access to mmap2 for
32 bits systems.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
 tools/lib/bpf/Makefile |  1 +
 tools/lib/bpf/xsk.c    | 49 ++++++++++++------------------------------
 2 files changed, 15 insertions(+), 35 deletions(-)

diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index 9312066a1ae3..844f6cd79c03 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -113,6 +113,7 @@ override CFLAGS += -Werror -Wall
 override CFLAGS += -fPIC
 override CFLAGS += $(INCLUDES)
 override CFLAGS += -fvisibility=hidden
+override CFLAGS += -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64
 
 ifeq ($(VERBOSE),1)
   Q =
diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index 680e63066cf3..7392f428c07b 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -74,23 +74,6 @@ struct xsk_nl_info {
 	int fd;
 };
 
-/* For 32-bit systems, we need to use mmap2 as the offsets are 64-bit.
- * Unfortunately, it is not part of glibc.
- */
-static inline void *xsk_mmap(void *addr, size_t length, int prot, int flags,
-			     int fd, __u64 offset)
-{
-#ifdef __NR_mmap2
-	unsigned int page_shift = __builtin_ffs(getpagesize()) - 1;
-	long ret = syscall(__NR_mmap2, addr, length, prot, flags, fd,
-			   (off_t)(offset >> page_shift));
-
-	return (void *)ret;
-#else
-	return mmap(addr, length, prot, flags, fd, offset);
-#endif
-}
-
 int xsk_umem__fd(const struct xsk_umem *umem)
 {
 	return umem ? umem->fd : -EINVAL;
@@ -210,10 +193,9 @@ int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
 		goto out_socket;
 	}
 
-	map = xsk_mmap(NULL, off.fr.desc +
-		       umem->config.fill_size * sizeof(__u64),
-		       PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
-		       umem->fd, XDP_UMEM_PGOFF_FILL_RING);
+	map = mmap(NULL, off.fr.desc + umem->config.fill_size * sizeof(__u64),
+		   PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, umem->fd,
+		   XDP_UMEM_PGOFF_FILL_RING);
 	if (map == MAP_FAILED) {
 		err = -errno;
 		goto out_socket;
@@ -227,10 +209,9 @@ int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
 	fill->ring = map + off.fr.desc;
 	fill->cached_cons = umem->config.fill_size;
 
-	map = xsk_mmap(NULL,
-		       off.cr.desc + umem->config.comp_size * sizeof(__u64),
-		       PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
-		       umem->fd, XDP_UMEM_PGOFF_COMPLETION_RING);
+	map = mmap(NULL, off.cr.desc + umem->config.comp_size * sizeof(__u64),
+		   PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, umem->fd,
+		   XDP_UMEM_PGOFF_COMPLETION_RING);
 	if (map == MAP_FAILED) {
 		err = -errno;
 		goto out_mmap;
@@ -550,11 +531,10 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
 	}
 
 	if (rx) {
-		rx_map = xsk_mmap(NULL, off.rx.desc +
-				  xsk->config.rx_size * sizeof(struct xdp_desc),
-				  PROT_READ | PROT_WRITE,
-				  MAP_SHARED | MAP_POPULATE,
-				  xsk->fd, XDP_PGOFF_RX_RING);
+		rx_map = mmap(NULL, off.rx.desc +
+			      xsk->config.rx_size * sizeof(struct xdp_desc),
+			      PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
+			      xsk->fd, XDP_PGOFF_RX_RING);
 		if (rx_map == MAP_FAILED) {
 			err = -errno;
 			goto out_socket;
@@ -569,11 +549,10 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
 	xsk->rx = rx;
 
 	if (tx) {
-		tx_map = xsk_mmap(NULL, off.tx.desc +
-				  xsk->config.tx_size * sizeof(struct xdp_desc),
-				  PROT_READ | PROT_WRITE,
-				  MAP_SHARED | MAP_POPULATE,
-				  xsk->fd, XDP_PGOFF_TX_RING);
+		tx_map = mmap(NULL, off.tx.desc +
+			      xsk->config.tx_size * sizeof(struct xdp_desc),
+			      PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
+			      xsk->fd, XDP_PGOFF_TX_RING);
 		if (tx_map == MAP_FAILED) {
 			err = -errno;
 			goto out_mmap_rx;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH bpf-next v2 2/3] xdp: xdp_umem: replace kmap on vmap for umem map
  2019-08-15 12:13 [PATCH bpf-next v2 0/3] xdpsock: allow mmap2 usage for 32bits Ivan Khoronzhuk
  2019-08-15 12:13 ` [PATCH bpf-next v2 1/3] libbpf: use LFS (_FILE_OFFSET_BITS) instead of direct mmap2 syscall Ivan Khoronzhuk
@ 2019-08-15 12:13 ` Ivan Khoronzhuk
  2019-08-15 18:23   ` Jonathan Lemon
  2019-08-15 19:33   ` Jonathan Lemon
  2019-08-15 12:13 ` [PATCH bpf-next v2 3/3] samples: bpf: syscal_nrs: use mmap2 if defined Ivan Khoronzhuk
  2019-08-21 12:35 ` [PATCH bpf-next v2 0/3] xdpsock: allow mmap2 usage for 32bits Daniel Borkmann
  3 siblings, 2 replies; 11+ messages in thread
From: Ivan Khoronzhuk @ 2019-08-15 12:13 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel
  Cc: davem, hawk, john.fastabend, jakub.kicinski, daniel, netdev, bpf,
	xdp-newbies, linux-kernel, jlemon, yhs, andrii.nakryiko,
	Ivan Khoronzhuk

For 64-bit there is no reason to use vmap/vunmap, so use page_address
as it was initially. For 32 bits, in some apps, like in samples
xdpsock_user.c when number of pgs in use is quite big, the kmap
memory can be not enough, despite on this, kmap looks like is
deprecated in such cases as it can block and should be used rather
for dynamic mm.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
 net/xdp/xdp_umem.c | 36 ++++++++++++++++++++++++++++++------
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index a0607969f8c0..d740c4f8810c 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -14,7 +14,7 @@
 #include <linux/netdevice.h>
 #include <linux/rtnetlink.h>
 #include <linux/idr.h>
-#include <linux/highmem.h>
+#include <linux/vmalloc.h>
 
 #include "xdp_umem.h"
 #include "xsk_queue.h"
@@ -170,7 +170,30 @@ static void xdp_umem_unmap_pages(struct xdp_umem *umem)
 	unsigned int i;
 
 	for (i = 0; i < umem->npgs; i++)
-		kunmap(umem->pgs[i]);
+		if (PageHighMem(umem->pgs[i]))
+			vunmap(umem->pages[i].addr);
+}
+
+static int xdp_umem_map_pages(struct xdp_umem *umem)
+{
+	unsigned int i;
+	void *addr;
+
+	for (i = 0; i < umem->npgs; i++) {
+		if (PageHighMem(umem->pgs[i]))
+			addr = vmap(&umem->pgs[i], 1, VM_MAP, PAGE_KERNEL);
+		else
+			addr = page_address(umem->pgs[i]);
+
+		if (!addr) {
+			xdp_umem_unmap_pages(umem);
+			return -ENOMEM;
+		}
+
+		umem->pages[i].addr = addr;
+	}
+
+	return 0;
 }
 
 static void xdp_umem_unpin_pages(struct xdp_umem *umem)
@@ -312,7 +335,7 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 	u32 chunk_size = mr->chunk_size, headroom = mr->headroom;
 	unsigned int chunks, chunks_per_page;
 	u64 addr = mr->addr, size = mr->len;
-	int size_chk, err, i;
+	int size_chk, err;
 
 	if (chunk_size < XDP_UMEM_MIN_CHUNK_SIZE || chunk_size > PAGE_SIZE) {
 		/* Strictly speaking we could support this, if:
@@ -378,10 +401,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 		goto out_account;
 	}
 
-	for (i = 0; i < umem->npgs; i++)
-		umem->pages[i].addr = kmap(umem->pgs[i]);
+	err = xdp_umem_map_pages(umem);
+	if (!err)
+		return 0;
 
-	return 0;
+	kfree(umem->pages);
 
 out_account:
 	xdp_umem_unaccount_pages(umem);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH bpf-next v2 3/3] samples: bpf: syscal_nrs: use mmap2 if defined
  2019-08-15 12:13 [PATCH bpf-next v2 0/3] xdpsock: allow mmap2 usage for 32bits Ivan Khoronzhuk
  2019-08-15 12:13 ` [PATCH bpf-next v2 1/3] libbpf: use LFS (_FILE_OFFSET_BITS) instead of direct mmap2 syscall Ivan Khoronzhuk
  2019-08-15 12:13 ` [PATCH bpf-next v2 2/3] xdp: xdp_umem: replace kmap on vmap for umem map Ivan Khoronzhuk
@ 2019-08-15 12:13 ` Ivan Khoronzhuk
  2019-08-19 21:17   ` Jonathan Lemon
  2019-08-21 12:35 ` [PATCH bpf-next v2 0/3] xdpsock: allow mmap2 usage for 32bits Daniel Borkmann
  3 siblings, 1 reply; 11+ messages in thread
From: Ivan Khoronzhuk @ 2019-08-15 12:13 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel
  Cc: davem, hawk, john.fastabend, jakub.kicinski, daniel, netdev, bpf,
	xdp-newbies, linux-kernel, jlemon, yhs, andrii.nakryiko,
	Ivan Khoronzhuk

For arm32 xdp sockets mmap2 is preferred, so use it if it's defined.
Declaration of __NR_mmap can be skipped and it breaks build.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
 samples/bpf/syscall_nrs.c  |  6 ++++++
 samples/bpf/tracex5_kern.c | 13 +++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/samples/bpf/syscall_nrs.c b/samples/bpf/syscall_nrs.c
index 516e255cbe8f..88f940052450 100644
--- a/samples/bpf/syscall_nrs.c
+++ b/samples/bpf/syscall_nrs.c
@@ -9,5 +9,11 @@ void syscall_defines(void)
 	COMMENT("Linux system call numbers.");
 	SYSNR(__NR_write);
 	SYSNR(__NR_read);
+#ifdef __NR_mmap2
+	SYSNR(__NR_mmap2);
+#endif
+#ifdef __NR_mmap
 	SYSNR(__NR_mmap);
+#endif
+
 }
diff --git a/samples/bpf/tracex5_kern.c b/samples/bpf/tracex5_kern.c
index f57f4e1ea1ec..35cb0eed3be5 100644
--- a/samples/bpf/tracex5_kern.c
+++ b/samples/bpf/tracex5_kern.c
@@ -68,12 +68,25 @@ PROG(SYS__NR_read)(struct pt_regs *ctx)
 	return 0;
 }
 
+#ifdef __NR_mmap2
+PROG(SYS__NR_mmap2)(struct pt_regs *ctx)
+{
+	char fmt[] = "mmap2\n";
+
+	bpf_trace_printk(fmt, sizeof(fmt));
+	return 0;
+}
+#endif
+
+#ifdef __NR_mmap
 PROG(SYS__NR_mmap)(struct pt_regs *ctx)
 {
 	char fmt[] = "mmap\n";
+
 	bpf_trace_printk(fmt, sizeof(fmt));
 	return 0;
 }
+#endif
 
 char _license[] SEC("license") = "GPL";
 u32 _version SEC("version") = LINUX_VERSION_CODE;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next v2 2/3] xdp: xdp_umem: replace kmap on vmap for umem map
  2019-08-15 12:13 ` [PATCH bpf-next v2 2/3] xdp: xdp_umem: replace kmap on vmap for umem map Ivan Khoronzhuk
@ 2019-08-15 18:23   ` Jonathan Lemon
  2019-08-15 19:19     ` Ivan Khoronzhuk
  2019-08-15 19:33   ` Jonathan Lemon
  1 sibling, 1 reply; 11+ messages in thread
From: Jonathan Lemon @ 2019-08-15 18:23 UTC (permalink / raw)
  To: Ivan Khoronzhuk
  Cc: magnus.karlsson, bjorn.topel, davem, hawk, john.fastabend,
	jakub.kicinski, daniel, netdev, bpf, xdp-newbies, linux-kernel,
	yhs, andrii.nakryiko

On 15 Aug 2019, at 5:13, Ivan Khoronzhuk wrote:

> For 64-bit there is no reason to use vmap/vunmap, so use page_address
> as it was initially. For 32 bits, in some apps, like in samples
> xdpsock_user.c when number of pgs in use is quite big, the kmap
> memory can be not enough, despite on this, kmap looks like is
> deprecated in such cases as it can block and should be used rather
> for dynamic mm.
>
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
> ---
>  net/xdp/xdp_umem.c | 36 ++++++++++++++++++++++++++++++------
>  1 file changed, 30 insertions(+), 6 deletions(-)
>
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index a0607969f8c0..d740c4f8810c 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -14,7 +14,7 @@
>  #include <linux/netdevice.h>
>  #include <linux/rtnetlink.h>
>  #include <linux/idr.h>
> -#include <linux/highmem.h>
> +#include <linux/vmalloc.h>
>
>  #include "xdp_umem.h"
>  #include "xsk_queue.h"
> @@ -170,7 +170,30 @@ static void xdp_umem_unmap_pages(struct xdp_umem 
> *umem)
>  	unsigned int i;
>
>  	for (i = 0; i < umem->npgs; i++)
> -		kunmap(umem->pgs[i]);
> +		if (PageHighMem(umem->pgs[i]))
> +			vunmap(umem->pages[i].addr);
> +}
> +
> +static int xdp_umem_map_pages(struct xdp_umem *umem)
> +{
> +	unsigned int i;
> +	void *addr;
> +
> +	for (i = 0; i < umem->npgs; i++) {
> +		if (PageHighMem(umem->pgs[i]))
> +			addr = vmap(&umem->pgs[i], 1, VM_MAP, PAGE_KERNEL);
> +		else
> +			addr = page_address(umem->pgs[i]);
> +
> +		if (!addr) {
> +			xdp_umem_unmap_pages(umem);
> +			return -ENOMEM;
> +		}
> +
> +		umem->pages[i].addr = addr;
> +	}
> +
> +	return 0;
>  }

You'll want a __xdp_umem_unmap_pages() helper here that takes an
count of the number of pages to unmap, so it can be called from
xdp_umem_unmap_pages() in the normal case, and xdp_umem_map_pages()
in the error case.  Otherwise the error case ends up calling
PageHighMem on a null page.
-- 
Jonathan

>  static void xdp_umem_unpin_pages(struct xdp_umem *umem)
> @@ -312,7 +335,7 @@ static int xdp_umem_reg(struct xdp_umem *umem, 
> struct xdp_umem_reg *mr)
>  	u32 chunk_size = mr->chunk_size, headroom = mr->headroom;
>  	unsigned int chunks, chunks_per_page;
>  	u64 addr = mr->addr, size = mr->len;
> -	int size_chk, err, i;
> +	int size_chk, err;
>
>  	if (chunk_size < XDP_UMEM_MIN_CHUNK_SIZE || chunk_size > PAGE_SIZE) 
> {
>  		/* Strictly speaking we could support this, if:
> @@ -378,10 +401,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, 
> struct xdp_umem_reg *mr)
>  		goto out_account;
>  	}
>
> -	for (i = 0; i < umem->npgs; i++)
> -		umem->pages[i].addr = kmap(umem->pgs[i]);
> +	err = xdp_umem_map_pages(umem);
> +	if (!err)
> +		return 0;
>
> -	return 0;
> +	kfree(umem->pages);
>
>  out_account:
>  	xdp_umem_unaccount_pages(umem);
> -- 
> 2.17.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next v2 2/3] xdp: xdp_umem: replace kmap on vmap for umem map
  2019-08-15 18:23   ` Jonathan Lemon
@ 2019-08-15 19:19     ` Ivan Khoronzhuk
  2019-08-15 19:32       ` Jonathan Lemon
  0 siblings, 1 reply; 11+ messages in thread
From: Ivan Khoronzhuk @ 2019-08-15 19:19 UTC (permalink / raw)
  To: Jonathan Lemon
  Cc: magnus.karlsson, bjorn.topel, davem, hawk, john.fastabend,
	jakub.kicinski, daniel, netdev, bpf, xdp-newbies, linux-kernel,
	yhs, andrii.nakryiko

On Thu, Aug 15, 2019 at 11:23:16AM -0700, Jonathan Lemon wrote:
>On 15 Aug 2019, at 5:13, Ivan Khoronzhuk wrote:
>
>>For 64-bit there is no reason to use vmap/vunmap, so use page_address
>>as it was initially. For 32 bits, in some apps, like in samples
>>xdpsock_user.c when number of pgs in use is quite big, the kmap
>>memory can be not enough, despite on this, kmap looks like is
>>deprecated in such cases as it can block and should be used rather
>>for dynamic mm.
>>
>>Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
>>---
>> net/xdp/xdp_umem.c | 36 ++++++++++++++++++++++++++++++------
>> 1 file changed, 30 insertions(+), 6 deletions(-)
>>
>>diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
>>index a0607969f8c0..d740c4f8810c 100644
>>--- a/net/xdp/xdp_umem.c
>>+++ b/net/xdp/xdp_umem.c
>>@@ -14,7 +14,7 @@
>> #include <linux/netdevice.h>
>> #include <linux/rtnetlink.h>
>> #include <linux/idr.h>
>>-#include <linux/highmem.h>
>>+#include <linux/vmalloc.h>
>>
>> #include "xdp_umem.h"
>> #include "xsk_queue.h"
>>@@ -170,7 +170,30 @@ static void xdp_umem_unmap_pages(struct 
>>xdp_umem *umem)
>> 	unsigned int i;
>>
>> 	for (i = 0; i < umem->npgs; i++)
>>-		kunmap(umem->pgs[i]);
>>+		if (PageHighMem(umem->pgs[i]))
>>+			vunmap(umem->pages[i].addr);
>>+}
>>+
>>+static int xdp_umem_map_pages(struct xdp_umem *umem)
>>+{
>>+	unsigned int i;
>>+	void *addr;
>>+
>>+	for (i = 0; i < umem->npgs; i++) {
>>+		if (PageHighMem(umem->pgs[i]))
>>+			addr = vmap(&umem->pgs[i], 1, VM_MAP, PAGE_KERNEL);
>>+		else
>>+			addr = page_address(umem->pgs[i]);
>>+
>>+		if (!addr) {
>>+			xdp_umem_unmap_pages(umem);
>>+			return -ENOMEM;
>>+		}
>>+
>>+		umem->pages[i].addr = addr;
>>+	}
>>+
>>+	return 0;
>> }
>
>You'll want a __xdp_umem_unmap_pages() helper here that takes an
>count of the number of pages to unmap, so it can be called from
>xdp_umem_unmap_pages() in the normal case, and xdp_umem_map_pages()
>in the error case.  Otherwise the error case ends up calling
>PageHighMem on a null page.
>-- 
>Jonathan

Do you mean null address?
If so, then vunmap do nothing if it's null, and addr is null if it's not
assigned... and it's not assigned w/o correct mapping...

If you mean null page, then it is not possible after all they are
pinned above, here: xdp_umem_pin_pages(), thus assigned.

Or I missed smth?

Despite of this, seems like here should be one more patch, adding unpinning page
in error path, but this not related to this change. Will do this in follow up
fix patch, if no objection to my explanation, ofc.

-- 
Regards,
Ivan Khoronzhuk

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next v2 2/3] xdp: xdp_umem: replace kmap on vmap for umem map
  2019-08-15 19:19     ` Ivan Khoronzhuk
@ 2019-08-15 19:32       ` Jonathan Lemon
  0 siblings, 0 replies; 11+ messages in thread
From: Jonathan Lemon @ 2019-08-15 19:32 UTC (permalink / raw)
  To: Ivan Khoronzhuk
  Cc: magnus.karlsson, bjorn.topel, davem, hawk, john.fastabend,
	jakub.kicinski, daniel, netdev, bpf, xdp-newbies, linux-kernel,
	yhs, andrii.nakryiko



On 15 Aug 2019, at 12:19, Ivan Khoronzhuk wrote:

> On Thu, Aug 15, 2019 at 11:23:16AM -0700, Jonathan Lemon wrote:
>> On 15 Aug 2019, at 5:13, Ivan Khoronzhuk wrote:
>>
>>> For 64-bit there is no reason to use vmap/vunmap, so use 
>>> page_address
>>> as it was initially. For 32 bits, in some apps, like in samples
>>> xdpsock_user.c when number of pgs in use is quite big, the kmap
>>> memory can be not enough, despite on this, kmap looks like is
>>> deprecated in such cases as it can block and should be used rather
>>> for dynamic mm.
>>>
>>> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
>>> ---
>>> net/xdp/xdp_umem.c | 36 ++++++++++++++++++++++++++++++------
>>> 1 file changed, 30 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
>>> index a0607969f8c0..d740c4f8810c 100644
>>> --- a/net/xdp/xdp_umem.c
>>> +++ b/net/xdp/xdp_umem.c
>>> @@ -14,7 +14,7 @@
>>> #include <linux/netdevice.h>
>>> #include <linux/rtnetlink.h>
>>> #include <linux/idr.h>
>>> -#include <linux/highmem.h>
>>> +#include <linux/vmalloc.h>
>>>
>>> #include "xdp_umem.h"
>>> #include "xsk_queue.h"
>>> @@ -170,7 +170,30 @@ static void xdp_umem_unmap_pages(struct 
>>> xdp_umem *umem)
>>> 	unsigned int i;
>>>
>>> 	for (i = 0; i < umem->npgs; i++)
>>> -		kunmap(umem->pgs[i]);
>>> +		if (PageHighMem(umem->pgs[i]))
>>> +			vunmap(umem->pages[i].addr);
>>> +}
>>> +
>>> +static int xdp_umem_map_pages(struct xdp_umem *umem)
>>> +{
>>> +	unsigned int i;
>>> +	void *addr;
>>> +
>>> +	for (i = 0; i < umem->npgs; i++) {
>>> +		if (PageHighMem(umem->pgs[i]))
>>> +			addr = vmap(&umem->pgs[i], 1, VM_MAP, PAGE_KERNEL);
>>> +		else
>>> +			addr = page_address(umem->pgs[i]);
>>> +
>>> +		if (!addr) {
>>> +			xdp_umem_unmap_pages(umem);
>>> +			return -ENOMEM;
>>> +		}
>>> +
>>> +		umem->pages[i].addr = addr;
>>> +	}
>>> +
>>> +	return 0;
>>> }
>>
>> You'll want a __xdp_umem_unmap_pages() helper here that takes an
>> count of the number of pages to unmap, so it can be called from
>> xdp_umem_unmap_pages() in the normal case, and xdp_umem_map_pages()
>> in the error case.  Otherwise the error case ends up calling
>> PageHighMem on a null page.
>> -- 
>> Jonathan
>
> Do you mean null address?
> If so, then vunmap do nothing if it's null, and addr is null if it's 
> not
> assigned... and it's not assigned w/o correct mapping...
>
> If you mean null page, then it is not possible after all they are
> pinned above, here: xdp_umem_pin_pages(), thus assigned.
>
> Or I missed smth?

No - I forgot about umem_pin_pages() - feel free to ignore my comments.
--
Jonathan

>
> Despite of this, seems like here should be one more patch, adding 
> unpinning page
> in error path, but this not related to this change. Will do this in 
> follow up
> fix patch, if no objection to my explanation, ofc.
>
> -- 
> Regards,
> Ivan Khoronzhuk

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next v2 2/3] xdp: xdp_umem: replace kmap on vmap for umem map
  2019-08-15 12:13 ` [PATCH bpf-next v2 2/3] xdp: xdp_umem: replace kmap on vmap for umem map Ivan Khoronzhuk
  2019-08-15 18:23   ` Jonathan Lemon
@ 2019-08-15 19:33   ` Jonathan Lemon
  1 sibling, 0 replies; 11+ messages in thread
From: Jonathan Lemon @ 2019-08-15 19:33 UTC (permalink / raw)
  To: Ivan Khoronzhuk
  Cc: magnus.karlsson, bjorn.topel, davem, hawk, john.fastabend,
	jakub.kicinski, daniel, netdev, bpf, xdp-newbies, linux-kernel,
	yhs, andrii.nakryiko



On 15 Aug 2019, at 5:13, Ivan Khoronzhuk wrote:

> For 64-bit there is no reason to use vmap/vunmap, so use page_address
> as it was initially. For 32 bits, in some apps, like in samples
> xdpsock_user.c when number of pgs in use is quite big, the kmap
> memory can be not enough, despite on this, kmap looks like is
> deprecated in such cases as it can block and should be used rather
> for dynamic mm.
>
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>

Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next v2 1/3] libbpf: use LFS (_FILE_OFFSET_BITS) instead of direct mmap2 syscall
  2019-08-15 12:13 ` [PATCH bpf-next v2 1/3] libbpf: use LFS (_FILE_OFFSET_BITS) instead of direct mmap2 syscall Ivan Khoronzhuk
@ 2019-08-16  0:06   ` Yonghong Song
  0 siblings, 0 replies; 11+ messages in thread
From: Yonghong Song @ 2019-08-16  0:06 UTC (permalink / raw)
  To: Ivan Khoronzhuk, magnus.karlsson, bjorn.topel
  Cc: davem, hawk, john.fastabend, jakub.kicinski, daniel, netdev, bpf,
	xdp-newbies, linux-kernel, jlemon, andrii.nakryiko



On 8/15/19 5:13 AM, Ivan Khoronzhuk wrote:
> Drop __NR_mmap2 fork in flavor of LFS, that is _FILE_OFFSET_BITS=64
> (glibc & bionic) / LARGEFILE64_SOURCE (for musl) decision. It allows
> mmap() to use 64bit offset that is passed to mmap2 syscall. As result
> pgoff is not truncated and no need to use direct access to mmap2 for
> 32 bits systems.
> 
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>

Acked-by: Yonghong Song <yhs@fb.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next v2 3/3] samples: bpf: syscal_nrs: use mmap2 if defined
  2019-08-15 12:13 ` [PATCH bpf-next v2 3/3] samples: bpf: syscal_nrs: use mmap2 if defined Ivan Khoronzhuk
@ 2019-08-19 21:17   ` Jonathan Lemon
  0 siblings, 0 replies; 11+ messages in thread
From: Jonathan Lemon @ 2019-08-19 21:17 UTC (permalink / raw)
  To: Ivan Khoronzhuk
  Cc: magnus.karlsson, bjorn.topel, davem, hawk, john.fastabend,
	jakub.kicinski, daniel, netdev, bpf, xdp-newbies, linux-kernel,
	yhs, andrii.nakryiko



On 15 Aug 2019, at 5:13, Ivan Khoronzhuk wrote:

> For arm32 xdp sockets mmap2 is preferred, so use it if it's defined.
> Declaration of __NR_mmap can be skipped and it breaks build.
>
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>

Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>


> ---
>  samples/bpf/syscall_nrs.c  |  6 ++++++
>  samples/bpf/tracex5_kern.c | 13 +++++++++++++
>  2 files changed, 19 insertions(+)
>
> diff --git a/samples/bpf/syscall_nrs.c b/samples/bpf/syscall_nrs.c
> index 516e255cbe8f..88f940052450 100644
> --- a/samples/bpf/syscall_nrs.c
> +++ b/samples/bpf/syscall_nrs.c
> @@ -9,5 +9,11 @@ void syscall_defines(void)
>  	COMMENT("Linux system call numbers.");
>  	SYSNR(__NR_write);
>  	SYSNR(__NR_read);
> +#ifdef __NR_mmap2
> +	SYSNR(__NR_mmap2);
> +#endif
> +#ifdef __NR_mmap
>  	SYSNR(__NR_mmap);
> +#endif
> +
>  }
> diff --git a/samples/bpf/tracex5_kern.c b/samples/bpf/tracex5_kern.c
> index f57f4e1ea1ec..35cb0eed3be5 100644
> --- a/samples/bpf/tracex5_kern.c
> +++ b/samples/bpf/tracex5_kern.c
> @@ -68,12 +68,25 @@ PROG(SYS__NR_read)(struct pt_regs *ctx)
>  	return 0;
>  }
>
> +#ifdef __NR_mmap2
> +PROG(SYS__NR_mmap2)(struct pt_regs *ctx)
> +{
> +	char fmt[] = "mmap2\n";
> +
> +	bpf_trace_printk(fmt, sizeof(fmt));
> +	return 0;
> +}
> +#endif
> +
> +#ifdef __NR_mmap
>  PROG(SYS__NR_mmap)(struct pt_regs *ctx)
>  {
>  	char fmt[] = "mmap\n";
> +
>  	bpf_trace_printk(fmt, sizeof(fmt));
>  	return 0;
>  }
> +#endif
>
>  char _license[] SEC("license") = "GPL";
>  u32 _version SEC("version") = LINUX_VERSION_CODE;
> -- 
> 2.17.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next v2 0/3] xdpsock: allow mmap2 usage for 32bits
  2019-08-15 12:13 [PATCH bpf-next v2 0/3] xdpsock: allow mmap2 usage for 32bits Ivan Khoronzhuk
                   ` (2 preceding siblings ...)
  2019-08-15 12:13 ` [PATCH bpf-next v2 3/3] samples: bpf: syscal_nrs: use mmap2 if defined Ivan Khoronzhuk
@ 2019-08-21 12:35 ` Daniel Borkmann
  3 siblings, 0 replies; 11+ messages in thread
From: Daniel Borkmann @ 2019-08-21 12:35 UTC (permalink / raw)
  To: Ivan Khoronzhuk, magnus.karlsson, bjorn.topel
  Cc: davem, hawk, john.fastabend, jakub.kicinski, netdev, bpf,
	xdp-newbies, linux-kernel, jlemon, yhs, andrii.nakryiko

On 8/15/19 2:13 PM, Ivan Khoronzhuk wrote:
> This patchset contains several improvements for af_xdp socket umem
> mappings for 32bit systems. Also, there is one more patch outside of
> this series that on linux-next tree and related to mmap2 af_xdp umem
> offsets: "mm: mmap: increase sockets maximum memory size pgoff for 32bits"
> https://lkml.org/lkml/2019/8/12/549
> 
> Based on bpf-next/master
> 
> Prev: https://lkml.org/lkml/2019/8/13/437
> 
> v2..v1:
> 	- replaced "libbpf: add asm/unistd.h to xsk to get __NR_mmap2" on
> 	 "libbpf: use LFS (_FILE_OFFSET_BITS) instead of direct mmap2
> 	 syscall"
> 	- use vmap along with page_address to avoid overkill
> 	- define mmap syscall trace5 for mmap if defined
> 
> Ivan Khoronzhuk (3):
>    libbpf: use LFS (_FILE_OFFSET_BITS) instead of direct mmap2 syscall
>    xdp: xdp_umem: replace kmap on vmap for umem map
>    samples: bpf: syscal_nrs: use mmap2 if defined
> 
>   net/xdp/xdp_umem.c         | 36 +++++++++++++++++++++++-----
>   samples/bpf/syscall_nrs.c  |  6 +++++
>   samples/bpf/tracex5_kern.c | 13 ++++++++++
>   tools/lib/bpf/Makefile     |  1 +
>   tools/lib/bpf/xsk.c        | 49 +++++++++++---------------------------
>   5 files changed, 64 insertions(+), 41 deletions(-)
> 

Applied, and fixed up typo in last one's subject, thanks!

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2019-08-21 12:35 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-15 12:13 [PATCH bpf-next v2 0/3] xdpsock: allow mmap2 usage for 32bits Ivan Khoronzhuk
2019-08-15 12:13 ` [PATCH bpf-next v2 1/3] libbpf: use LFS (_FILE_OFFSET_BITS) instead of direct mmap2 syscall Ivan Khoronzhuk
2019-08-16  0:06   ` Yonghong Song
2019-08-15 12:13 ` [PATCH bpf-next v2 2/3] xdp: xdp_umem: replace kmap on vmap for umem map Ivan Khoronzhuk
2019-08-15 18:23   ` Jonathan Lemon
2019-08-15 19:19     ` Ivan Khoronzhuk
2019-08-15 19:32       ` Jonathan Lemon
2019-08-15 19:33   ` Jonathan Lemon
2019-08-15 12:13 ` [PATCH bpf-next v2 3/3] samples: bpf: syscal_nrs: use mmap2 if defined Ivan Khoronzhuk
2019-08-19 21:17   ` Jonathan Lemon
2019-08-21 12:35 ` [PATCH bpf-next v2 0/3] xdpsock: allow mmap2 usage for 32bits Daniel Borkmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).