* [bpf-next PATCH v2 00/18] bpf,sockmap: sendmsg/sendfile ULP
@ 2018-03-12 19:23 John Fastabend
  2018-03-12 19:23 ` [bpf-next PATCH v2 01/18] sock: make static tls function alloc_sg generic sock helper John Fastabend
                   ` (17 more replies)
  0 siblings, 18 replies; 49+ messages in thread
From: John Fastabend @ 2018-03-12 19:23 UTC (permalink / raw)
  To: davem, ast, daniel, davejwatson; +Cc: netdev

This series adds a BPF hook for sendmsg and sendfile by using
the ULP infrastructure and sockmap. A simple pseudocode example
would be,

  // load the programs
  bpf_prog_load(SOCKMAP_TCP_MSG_PROG, BPF_PROG_TYPE_SK_MSG,
                &obj, &msg_prog);

  // lookup the sockmap
  bpf_map_msg = bpf_object__find_map_by_name(obj, "my_sock_map");

  // get fd for sockmap
  map_fd_msg = bpf_map__fd(bpf_map_msg);

  // attach program to sockmap
  bpf_prog_attach(msg_prog, map_fd_msg, BPF_SK_MSG_VERDICT, 0);

  // Add a socket 'fd' to sockmap at location 'i'
  bpf_map_update_elem(map_fd_msg, &i, &fd, BPF_ANY);

After the above snippet any socket attached to the map would run
msg_prog on sendmsg and sendfile system calls.

Three additional helpers are added: bpf_msg_apply_bytes(),
bpf_msg_cork_bytes(), and bpf_msg_pull_data(). With
bpf_msg_apply_bytes() BPF programs can tell the infrastructure how
many bytes the given verdict should apply to. This has two cases.
First, if a BPF program applies a verdict to fewer bytes than are
in the current sendmsg/sendfile msg, the infrastructure applies
the verdict to the first N bytes of the message and then runs the
BPF program again with the data pointers recalculated to byte N+1.
Second, if the BPF program applies a verdict to more bytes than
the current sendmsg or sendfile system call provides, the
infrastructure caches the verdict and applies it to future
sendmsg/sendfile calls until the byte limit is reached. This
avoids the overhead of running BPF programs on large payloads.
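
For example, a program that wants one SK_PASS verdict to cover a
larger window of bytes could look roughly like the following (a
minimal sketch in the style of the samples later in this series;
the program name and 4096-byte window are illustrative),

  SEC("sk_msg")
  int msg_prog(struct sk_msg_md *msg)
  {
          /* Apply this verdict to the first 4096 bytes; past that
           * the program is run again with the data pointers
           * recalculated to the start of the remaining data.
           */
          bpf_msg_apply_bytes(msg, 4096);
          return SK_PASS;
  }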

The helper bpf_msg_cork_bytes() handles a different case, where a
BPF program cannot reach a verdict on a msg until it receives more
bytes AND the program doesn't want to forward the data until it is
known to be "good". The example case being a user (albeit probably
a dumb one) that sends an N byte header in 1B system calls. The
BPF program can call bpf_msg_cork_bytes() with the number of bytes
required to reach a verdict, and the program will then only be
called again once N bytes are received.
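
A sketch of this pattern (again modeled on the samples; the 8-byte
header size is illustrative),

  SEC("sk_msg")
  int msg_prog(struct sk_msg_md *msg)
  {
          void *data = (void *)(long)msg->data;
          void *data_end = (void *)(long)msg->data_end;

          /* Wait until a full 8-byte header is buffered before
           * reaching a verdict; the program is called again once
           * 8 bytes are queued.
           */
          if (data + 8 > data_end) {
                  bpf_msg_cork_bytes(msg, 8);
                  return SK_PASS;
          }

          /* ... inspect header, return SK_PASS or SK_DROP ... */
          return SK_PASS;
  }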

The last helper added in this series is bpf_msg_pull_data(). It
is used to pull data in for modification or reading. Similar to
how bpf_skb_pull_data() works, msg_pull_data can be used to access
data not in the initial (data_start, data_end) range. For
sendpage() calls this is needed if any data is accessed, because
the BPF sendpage hook initializes the data_start and data_end
pointers to zero. We do this because sendpage data is shared with
the user and can be modified during or after the BPF verdict,
possibly invalidating any verdict the BPF program reaches. For
sendmsg the data is already copied by the sendmsg BPF
infrastructure, so we only copy the data if the user requests a
data range that is not already linearized. This happens if the
user requests larger blocks of data that are not in a single
scatterlist element. The common case seems to be accessing
headers, which normally are in the first scatterlist element and
already linearized.
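
For instance, reading the first 10 bytes of a sendpage() msg
requires pulling the data in first (a sketch; the sizes are
illustrative),

  SEC("sk_msg")
  int msg_prog(struct sk_msg_md *msg)
  {
          void *data, *data_end;

          /* Copy bytes [0, 10) into a private linear buffer and
           * point data/data_end at it.
           */
          if (bpf_msg_pull_data(msg, 0, 10, 0))
                  return SK_DROP;

          data = (void *)(long)msg->data;
          data_end = (void *)(long)msg->data_end;
          if (data + 10 > data_end)
                  return SK_DROP;

          /* ... read or modify the first 10 bytes via data ... */
          return SK_PASS;
  }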

For more examples please review the sample program. There are
examples for all the actions and helpers there.

Patches 1-8 implement the above sockmap/BPF infrastructure. The
remaining patches flesh out some minimal selftests and the sample
sockmap program. The sockmap sample program is the main vehicle
for testing this infrastructure and will be moved into selftests
shortly. The final patch in this series is a simple shell script
to run a set of tests. These are the tests I run after any changes
to sockmap. The next task on the list after this series is to
push those into selftests so we can avoid manual testing.

Couple notes on future items in the pipeline,

  0. move sample sockmap programs into selftests (noted above)
  1. add additional support for tcp flags, most are ignored now.
  2. add a Documentation/bpf/sockmap file with these details
  3. support stacked ULP types to allow this and ktls to cooperate
  4. ingress flag support, redirect only supports egress here; the
     other redirect helpers support ingress and egress flags
  5. add optimizations, I cut a few optimizations here in the
     first iteration of the code for later study/implementation

-v2 updates (discussion):

Dave noticed that the sendpage call was previously (in v1) running
on the data directly. This allowed users to potentially modify the
data after or during the BPF program. However, doing a copy
automatically, even if the data is not accessed, has a measurable
performance impact. So we added another helper, modeled after the
existing bpf_skb_pull_data() helper, to allow users to selectively
pull data from the msg. This is also useful in the sendmsg case
when users need to access data outside the first scatterlist
element or across scatterlist boundaries.

While doing this I also unified the sendmsg and sendfile handlers
a bit. Originally the sendfile call was optimized for never
touching the data. I've decided for a first submission to drop
this optimization and we can add it back later. It introduced
unnecessary complexity, at least for a first posting, for a use
case I have not entirely fleshed out yet. When the use case is
deployed we can add it back if needed. Then we can review concrete
performance deltas as well on real-world use cases/applications.

Lastly, I reorganized the patches a bit. Now all sockmap changes
are in a single patch and each helper gets its own patch. This,
at least IMO, makes it easier to review because sockmap changes
are not spread across the patch series. On the other hand, the
apply_bytes and cork_bytes logic is only activated later in the
series. But that should be OK.

I've propagated ACKs from Dave forward on patches that have
not changed from v1.

Thanks,
John

---

John Fastabend (18):
      sock: make static tls function alloc_sg generic sock helper
      sockmap: convert refcnt to an atomic refcnt
      net: do_tcp_sendpages flag to avoid SKBTX_SHARED_FRAG
      net: generalize sk_alloc_sg to work with scatterlist rings
      bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data
      bpf: sockmap, add bpf_msg_apply_bytes() helper
      bpf: sockmap, add msg_cork_bytes() helper
      bpf: sk_msg program helper bpf_sk_msg_pull_data
      bpf: add map tests for BPF_PROG_TYPE_SK_MSG
      bpf: add verifier tests for BPF_PROG_TYPE_SK_MSG
      bpf: sockmap sample, add option to attach SK_MSG program
      bpf: sockmap sample, add sendfile test
      bpf: sockmap sample, add data verification option
      bpf: sockmap, add sample option to test apply_bytes helper
      bpf: sockmap sample support for bpf_msg_cork_bytes()
      bpf: sockmap add SK_DROP tests
      bpf: sockmap sample test for bpf_msg_pull_data
      bpf: sockmap test script


 include/linux/bpf.h                                |    1 
 include/linux/bpf_types.h                          |    1 
 include/linux/filter.h                             |   17 
 include/linux/socket.h                             |    1 
 include/net/sock.h                                 |    4 
 include/uapi/linux/bpf.h                           |   31 +
 include/uapi/linux/bpf_common.h                    |    7 
 kernel/bpf/sockmap.c                               |  735 +++++++++++++++++++-
 kernel/bpf/syscall.c                               |   14 
 kernel/bpf/verifier.c                              |    5 
 net/core/filter.c                                  |  271 +++++++
 net/core/sock.c                                    |   61 ++
 net/ipv4/tcp.c                                     |    4 
 net/tls/tls_sw.c                                   |   69 --
 samples/bpf/bpf_load.c                             |    8 
 samples/sockmap/sockmap_kern.c                     |  197 +++++
 samples/sockmap/sockmap_test.sh                    |  450 ++++++++++++
 samples/sockmap/sockmap_user.c                     |  301 ++++++++
 tools/include/uapi/linux/bpf.h                     |   30 +
 tools/lib/bpf/libbpf.c                             |    1 
 tools/testing/selftests/bpf/Makefile               |    2 
 tools/testing/selftests/bpf/bpf_helpers.h          |   10 
 tools/testing/selftests/bpf/sockmap_parse_prog.c   |   15 
 tools/testing/selftests/bpf/sockmap_verdict_prog.c |    7 
 tools/testing/selftests/bpf/test_maps.c            |   55 +
 tools/testing/selftests/bpf/test_verifier.c        |   54 +
 26 files changed, 2214 insertions(+), 137 deletions(-)
 create mode 100755 samples/sockmap/sockmap_test.sh

--
Signature


* [bpf-next PATCH v2 01/18] sock: make static tls function alloc_sg generic sock helper
  2018-03-12 19:23 [bpf-next PATCH v2 00/18] bpf,sockmap: sendmsg/sendfile ULP John Fastabend
@ 2018-03-12 19:23 ` John Fastabend
  2018-03-12 19:23 ` [bpf-next PATCH v2 02/18] sockmap: convert refcnt to an atomic refcnt John Fastabend
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 49+ messages in thread
From: John Fastabend @ 2018-03-12 19:23 UTC (permalink / raw)
  To: davem, ast, daniel, davejwatson; +Cc: netdev

The TLS ULP module builds scatterlists from a sock using
page_frag_refill(). This is going to be useful for other ULPs so
move it into the sock file for more general use.

In the process remove the useless goto at the end of the while
loop.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
---
 include/net/sock.h |    4 +++
 net/core/sock.c    |   56 ++++++++++++++++++++++++++++++++++++++++++
 net/tls/tls_sw.c   |   69 +++++-----------------------------------------------
 3 files changed, 67 insertions(+), 62 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index b962458..447150c 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2141,6 +2141,10 @@ static inline struct page_frag *sk_page_frag(struct sock *sk)
 
 bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag);
 
+int sk_alloc_sg(struct sock *sk, int len, struct scatterlist *sg,
+		int *sg_num_elem, unsigned int *sg_size,
+		int first_coalesce);
+
 /*
  *	Default write policy as shown to user space via poll/select/SIGIO
  */
diff --git a/net/core/sock.c b/net/core/sock.c
index 507d8c6..4bda3e9 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2238,6 +2238,62 @@ bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag)
 }
 EXPORT_SYMBOL(sk_page_frag_refill);
 
+int sk_alloc_sg(struct sock *sk, int len, struct scatterlist *sg,
+		int *sg_num_elem, unsigned int *sg_size,
+		int first_coalesce)
+{
+	struct page_frag *pfrag;
+	unsigned int size = *sg_size;
+	int num_elem = *sg_num_elem, use = 0, rc = 0;
+	struct scatterlist *sge;
+	unsigned int orig_offset;
+
+	len -= size;
+	pfrag = sk_page_frag(sk);
+
+	while (len > 0) {
+		if (!sk_page_frag_refill(sk, pfrag)) {
+			rc = -ENOMEM;
+			goto out;
+		}
+
+		use = min_t(int, len, pfrag->size - pfrag->offset);
+
+		if (!sk_wmem_schedule(sk, use)) {
+			rc = -ENOMEM;
+			goto out;
+		}
+
+		sk_mem_charge(sk, use);
+		size += use;
+		orig_offset = pfrag->offset;
+		pfrag->offset += use;
+
+		sge = sg + num_elem - 1;
+		if (num_elem > first_coalesce && sg_page(sg) == pfrag->page &&
+		    sg->offset + sg->length == orig_offset) {
+			sg->length += use;
+		} else {
+			sge++;
+			sg_unmark_end(sge);
+			sg_set_page(sge, pfrag->page, use, orig_offset);
+			get_page(pfrag->page);
+			++num_elem;
+			if (num_elem == MAX_SKB_FRAGS) {
+				rc = -ENOSPC;
+				break;
+			}
+		}
+
+		len -= use;
+	}
+out:
+	*sg_size = size;
+	*sg_num_elem = num_elem;
+	return rc;
+}
+EXPORT_SYMBOL(sk_alloc_sg);
+
 static void __lock_sock(struct sock *sk)
 	__releases(&sk->sk_lock.slock)
 	__acquires(&sk->sk_lock.slock)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index f26376e..0fc8a24 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -87,71 +87,16 @@ static void trim_both_sgl(struct sock *sk, int target_size)
 		target_size);
 }
 
-static int alloc_sg(struct sock *sk, int len, struct scatterlist *sg,
-		    int *sg_num_elem, unsigned int *sg_size,
-		    int first_coalesce)
-{
-	struct page_frag *pfrag;
-	unsigned int size = *sg_size;
-	int num_elem = *sg_num_elem, use = 0, rc = 0;
-	struct scatterlist *sge;
-	unsigned int orig_offset;
-
-	len -= size;
-	pfrag = sk_page_frag(sk);
-
-	while (len > 0) {
-		if (!sk_page_frag_refill(sk, pfrag)) {
-			rc = -ENOMEM;
-			goto out;
-		}
-
-		use = min_t(int, len, pfrag->size - pfrag->offset);
-
-		if (!sk_wmem_schedule(sk, use)) {
-			rc = -ENOMEM;
-			goto out;
-		}
-
-		sk_mem_charge(sk, use);
-		size += use;
-		orig_offset = pfrag->offset;
-		pfrag->offset += use;
-
-		sge = sg + num_elem - 1;
-		if (num_elem > first_coalesce && sg_page(sg) == pfrag->page &&
-		    sg->offset + sg->length == orig_offset) {
-			sg->length += use;
-		} else {
-			sge++;
-			sg_unmark_end(sge);
-			sg_set_page(sge, pfrag->page, use, orig_offset);
-			get_page(pfrag->page);
-			++num_elem;
-			if (num_elem == MAX_SKB_FRAGS) {
-				rc = -ENOSPC;
-				break;
-			}
-		}
-
-		len -= use;
-	}
-	goto out;
-
-out:
-	*sg_size = size;
-	*sg_num_elem = num_elem;
-	return rc;
-}
-
 static int alloc_encrypted_sg(struct sock *sk, int len)
 {
 	struct tls_context *tls_ctx = tls_get_ctx(sk);
 	struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx);
 	int rc = 0;
 
-	rc = alloc_sg(sk, len, ctx->sg_encrypted_data,
-		      &ctx->sg_encrypted_num_elem, &ctx->sg_encrypted_size, 0);
+	rc = sk_alloc_sg(sk, len,
+			 ctx->sg_encrypted_data,
+			 &ctx->sg_encrypted_num_elem,
+			 &ctx->sg_encrypted_size, 0);
 
 	return rc;
 }
@@ -162,9 +107,9 @@ static int alloc_plaintext_sg(struct sock *sk, int len)
 	struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx);
 	int rc = 0;
 
-	rc = alloc_sg(sk, len, ctx->sg_plaintext_data,
-		      &ctx->sg_plaintext_num_elem, &ctx->sg_plaintext_size,
-		      tls_ctx->pending_open_record_frags);
+	rc = sk_alloc_sg(sk, len, ctx->sg_plaintext_data,
+			 &ctx->sg_plaintext_num_elem, &ctx->sg_plaintext_size,
+			 tls_ctx->pending_open_record_frags);
 
 	return rc;
 }


* [bpf-next PATCH v2 02/18] sockmap: convert refcnt to an atomic refcnt
  2018-03-12 19:23 [bpf-next PATCH v2 00/18] bpf,sockmap: sendmsg/sendfile ULP John Fastabend
  2018-03-12 19:23 ` [bpf-next PATCH v2 01/18] sock: make static tls function alloc_sg generic sock helper John Fastabend
@ 2018-03-12 19:23 ` John Fastabend
  2018-03-12 19:23 ` [bpf-next PATCH v2 03/18] net: do_tcp_sendpages flag to avoid SKBTX_SHARED_FRAG John Fastabend
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 49+ messages in thread
From: John Fastabend @ 2018-03-12 19:23 UTC (permalink / raw)
  To: davem, ast, daniel, davejwatson; +Cc: netdev

The sockmap refcnt has up until now been wrapped in the
sk_callback_lock(), so it has not actually needed any locking of
its own. The counter itself tracks the lifetime of the psock
object. Sockets in a sockmap have a lifetime that is independent
of the map they are part of. This is possible because a single
socket may be in multiple maps. When this happens we can only
release the psock data associated with the socket when the refcnt
reaches zero. There are three paths that decrement the sock
reference: first, the normal sockmap path, where the user deletes
the socket from the map; second, the map is removed and all
sockets in the map are removed, a delete path similar to case 1;
third, an asynchronous socket event such as the socket closing.
The last case handles removing sockets that are no longer
available. For completeness, although inc does not pose any
problems in this patch series, the inc case only happens when a
psock is added to a map.

Next we plan to add another socket prog type to handle policy and
monitoring on the TX path. When we do this, however, we will need
to keep a reference count open across the sendmsg/sendpage call,
and holding the sk_callback_lock() there (on every send) seems
less than ideal; it may also sleep in cases where we hit memory
pressure. Instead of dealing with these issues in some clever way,
simply make the reference counting a refcount_t type and do proper
atomic ops.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
---
 kernel/bpf/sockmap.c |   23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index a927e89..051b2242 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -62,8 +62,7 @@ struct smap_psock_map_entry {
 
 struct smap_psock {
 	struct rcu_head	rcu;
-	/* refcnt is used inside sk_callback_lock */
-	u32 refcnt;
+	refcount_t refcnt;
 
 	/* datapath variables */
 	struct sk_buff_head rxqueue;
@@ -373,15 +372,13 @@ static void smap_destroy_psock(struct rcu_head *rcu)
 
 static void smap_release_sock(struct smap_psock *psock, struct sock *sock)
 {
-	psock->refcnt--;
-	if (psock->refcnt)
-		return;
-
-	tcp_cleanup_ulp(sock);
-	smap_stop_sock(psock, sock);
-	clear_bit(SMAP_TX_RUNNING, &psock->state);
-	rcu_assign_sk_user_data(sock, NULL);
-	call_rcu_sched(&psock->rcu, smap_destroy_psock);
+	if (refcount_dec_and_test(&psock->refcnt)) {
+		tcp_cleanup_ulp(sock);
+		smap_stop_sock(psock, sock);
+		clear_bit(SMAP_TX_RUNNING, &psock->state);
+		rcu_assign_sk_user_data(sock, NULL);
+		call_rcu_sched(&psock->rcu, smap_destroy_psock);
+	}
 }
 
 static int smap_parse_func_strparser(struct strparser *strp,
@@ -511,7 +508,7 @@ static struct smap_psock *smap_init_psock(struct sock *sock,
 	INIT_WORK(&psock->tx_work, smap_tx_work);
 	INIT_WORK(&psock->gc_work, smap_gc_work);
 	INIT_LIST_HEAD(&psock->maps);
-	psock->refcnt = 1;
+	refcount_set(&psock->refcnt, 1);
 
 	rcu_assign_sk_user_data(sock, psock);
 	sock_hold(sock);
@@ -772,7 +769,7 @@ static int sock_map_ctx_update_elem(struct bpf_sock_ops_kern *skops,
 			err = -EBUSY;
 			goto out_progs;
 		}
-		psock->refcnt++;
+		refcount_inc(&psock->refcnt);
 	} else {
 		psock = smap_init_psock(sock, stab);
 		if (IS_ERR(psock)) {


* [bpf-next PATCH v2 03/18] net: do_tcp_sendpages flag to avoid SKBTX_SHARED_FRAG
  2018-03-12 19:23 [bpf-next PATCH v2 00/18] bpf,sockmap: sendmsg/sendfile ULP John Fastabend
  2018-03-12 19:23 ` [bpf-next PATCH v2 01/18] sock: make static tls function alloc_sg generic sock helper John Fastabend
  2018-03-12 19:23 ` [bpf-next PATCH v2 02/18] sockmap: convert refcnt to an atomic refcnt John Fastabend
@ 2018-03-12 19:23 ` John Fastabend
  2018-03-15 18:40   ` David Miller
  2018-03-12 19:23 ` [bpf-next PATCH v2 04/18] net: generalize sk_alloc_sg to work with scatterlist rings John Fastabend
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 49+ messages in thread
From: John Fastabend @ 2018-03-12 19:23 UTC (permalink / raw)
  To: davem, ast, daniel, davejwatson; +Cc: netdev

When calling do_tcp_sendpages() from inside the kernel, where we
know the data has no references from the user side, we can omit
the SKBTX_SHARED_FRAG flag. This patch adds an internal flag,
MSG_NO_SHARED_FRAGS, that can be used to omit setting
SKBTX_SHARED_FRAG.

The flag is not exposed to userspace because the sendpage call
from the splice logic masks out all bits except MSG_MORE.
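
A later patch in this series uses the flag when pushing
kernel-owned scatterlist pages, along the lines of,

  /* data here is a private kernel copy, the frags are not
   * shared with userspace so SKBTX_SHARED_FRAG can be skipped
   */
  ret = do_tcp_sendpages(sk, page, offset, size,
                         flags | MSG_NO_SHARED_FRAGS);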

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 include/linux/socket.h |    1 +
 net/ipv4/tcp.c         |    4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 1ce1f76..60e0148 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -287,6 +287,7 @@ struct ucred {
 #define MSG_SENDPAGE_NOTLAST 0x20000 /* sendpage() internal : not the last page */
 #define MSG_BATCH	0x40000 /* sendmmsg(): more messages coming */
 #define MSG_EOF         MSG_FIN
+#define MSG_NO_SHARED_FRAGS 0x80000 /* sendpage() internal : page frags are not shared */
 
 #define MSG_ZEROCOPY	0x4000000	/* Use user data in kernel path */
 #define MSG_FASTOPEN	0x20000000	/* Send data in TCP SYN */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index a335397..ff8a8d3 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -994,7 +994,9 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
 			get_page(page);
 			skb_fill_page_desc(skb, i, page, offset, copy);
 		}
-		skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
+
+		if (!(flags & MSG_NO_SHARED_FRAGS))
+			skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
 
 		skb->len += copy;
 		skb->data_len += copy;


* [bpf-next PATCH v2 04/18] net: generalize sk_alloc_sg to work with scatterlist rings
  2018-03-12 19:23 [bpf-next PATCH v2 00/18] bpf,sockmap: sendmsg/sendfile ULP John Fastabend
                   ` (2 preceding siblings ...)
  2018-03-12 19:23 ` [bpf-next PATCH v2 03/18] net: do_tcp_sendpages flag to avoid SKBTX_SHARED_FRAG John Fastabend
@ 2018-03-12 19:23 ` John Fastabend
  2018-03-12 19:23 ` [bpf-next PATCH v2 05/18] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data John Fastabend
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 49+ messages in thread
From: John Fastabend @ 2018-03-12 19:23 UTC (permalink / raw)
  To: davem, ast, daniel, davejwatson; +Cc: netdev

The current implementation of sk_alloc_sg expects the scatterlist
to always start at entry 0 and complete at entry MAX_SKB_FRAGS.

Future patches will want to support starting at an arbitrary
offset into the scatterlist, so add an additional sg_start
parameter and default to the current values in the TLS code paths.
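
The next patch in the series uses the ring-style calling
convention, roughly,

  /* append into a scatterlist ring: start filling at sg_end and
   * treat a wrap back around to sg_start as ENOSPC
   */
  err = sk_alloc_sg(sk, copy, m->sg_data,
                    m->sg_start, &m->sg_end, &sg_copy,
                    m->sg_end - 1);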

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
---
 include/net/sock.h |    2 +-
 net/core/sock.c    |   27 ++++++++++++++++-----------
 net/tls/tls_sw.c   |    4 ++--
 3 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 447150c..b7c75e0 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2142,7 +2142,7 @@ static inline struct page_frag *sk_page_frag(struct sock *sk)
 bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag);
 
 int sk_alloc_sg(struct sock *sk, int len, struct scatterlist *sg,
-		int *sg_num_elem, unsigned int *sg_size,
+		int sg_start, int *sg_curr, unsigned int *sg_size,
 		int first_coalesce);
 
 /*
diff --git a/net/core/sock.c b/net/core/sock.c
index 4bda3e9..d14f64b 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2239,19 +2239,20 @@ bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag)
 EXPORT_SYMBOL(sk_page_frag_refill);
 
 int sk_alloc_sg(struct sock *sk, int len, struct scatterlist *sg,
-		int *sg_num_elem, unsigned int *sg_size,
+		int sg_start, int *sg_curr_index, unsigned int *sg_curr_size,
 		int first_coalesce)
 {
+	int sg_curr = *sg_curr_index, use = 0, rc = 0;
+	unsigned int size = *sg_curr_size;
 	struct page_frag *pfrag;
-	unsigned int size = *sg_size;
-	int num_elem = *sg_num_elem, use = 0, rc = 0;
 	struct scatterlist *sge;
-	unsigned int orig_offset;
 
 	len -= size;
 	pfrag = sk_page_frag(sk);
 
 	while (len > 0) {
+		unsigned int orig_offset;
+
 		if (!sk_page_frag_refill(sk, pfrag)) {
 			rc = -ENOMEM;
 			goto out;
@@ -2269,17 +2270,21 @@ int sk_alloc_sg(struct sock *sk, int len, struct scatterlist *sg,
 		orig_offset = pfrag->offset;
 		pfrag->offset += use;
 
-		sge = sg + num_elem - 1;
-		if (num_elem > first_coalesce && sg_page(sg) == pfrag->page &&
+		sge = sg + sg_curr - 1;
+		if (sg_curr > first_coalesce && sg_page(sg) == pfrag->page &&
 		    sg->offset + sg->length == orig_offset) {
 			sg->length += use;
 		} else {
-			sge++;
+			sge = sg + sg_curr;
 			sg_unmark_end(sge);
 			sg_set_page(sge, pfrag->page, use, orig_offset);
 			get_page(pfrag->page);
-			++num_elem;
-			if (num_elem == MAX_SKB_FRAGS) {
+			sg_curr++;
+
+			if (sg_curr == MAX_SKB_FRAGS)
+				sg_curr = 0;
+
+			if (sg_curr == sg_start) {
 				rc = -ENOSPC;
 				break;
 			}
@@ -2288,8 +2293,8 @@ int sk_alloc_sg(struct sock *sk, int len, struct scatterlist *sg,
 		len -= use;
 	}
 out:
-	*sg_size = size;
-	*sg_num_elem = num_elem;
+	*sg_curr_size = size;
+	*sg_curr_index = sg_curr;
 	return rc;
 }
 EXPORT_SYMBOL(sk_alloc_sg);
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 0fc8a24..057a558 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -94,7 +94,7 @@ static int alloc_encrypted_sg(struct sock *sk, int len)
 	int rc = 0;
 
 	rc = sk_alloc_sg(sk, len,
-			 ctx->sg_encrypted_data,
+			 ctx->sg_encrypted_data, 0,
 			 &ctx->sg_encrypted_num_elem,
 			 &ctx->sg_encrypted_size, 0);
 
@@ -107,7 +107,7 @@ static int alloc_plaintext_sg(struct sock *sk, int len)
 	struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx);
 	int rc = 0;
 
-	rc = sk_alloc_sg(sk, len, ctx->sg_plaintext_data,
+	rc = sk_alloc_sg(sk, len, ctx->sg_plaintext_data, 0,
 			 &ctx->sg_plaintext_num_elem, &ctx->sg_plaintext_size,
 			 tls_ctx->pending_open_record_frags);
 


* [bpf-next PATCH v2 05/18] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data
  2018-03-12 19:23 [bpf-next PATCH v2 00/18] bpf,sockmap: sendmsg/sendfile ULP John Fastabend
                   ` (3 preceding siblings ...)
  2018-03-12 19:23 ` [bpf-next PATCH v2 04/18] net: generalize sk_alloc_sg to work with scatterlist rings John Fastabend
@ 2018-03-12 19:23 ` John Fastabend
  2018-03-15 18:41   ` David Miller
  2018-03-15 21:59   ` Alexei Starovoitov
  2018-03-12 19:23 ` [bpf-next PATCH v2 06/18] bpf: sockmap, add bpf_msg_apply_bytes() helper John Fastabend
                   ` (12 subsequent siblings)
  17 siblings, 2 replies; 49+ messages in thread
From: John Fastabend @ 2018-03-12 19:23 UTC (permalink / raw)
  To: davem, ast, daniel, davejwatson; +Cc: netdev

This implements a BPF ULP layer to allow policy enforcement and
monitoring at the socket layer. In order to support this a new
program type BPF_PROG_TYPE_SK_MSG is used to run the policy at
the sendmsg/sendpage hook. To attach the policy to sockets a
sockmap is used with a new program attach type BPF_SK_MSG_VERDICT.

Similar to previous sockmap usages, when a sock is added to a
sockmap via a map update and the map has a BPF_SK_MSG_VERDICT
program attached, the BPF ULP layer is created on the socket and
the attached BPF_PROG_TYPE_SK_MSG program is run for every msg in
the sendmsg case and every page/offset in the sendpage case.

BPF_PROG_TYPE_SK_MSG Semantics/API:

BPF_PROG_TYPE_SK_MSG supports only two return codes, SK_PASS and
SK_DROP. Returning SK_DROP frees the copied data in the sendmsg
case and in the sendpage case leaves the data untouched. Both
cases return -EACCES to the user. Returning SK_PASS will allow
the msg to be sent.

In the sendmsg case data is copied into kernel space buffers before
running the BPF program. The kernel space buffers are stored in a
scatterlist object where each element is a kernel memory buffer.
Some effort is made to coalesce data from the sendmsg call here.
For example a sendmsg call with many one byte iov entries will
likely be pushed into a single entry. The BPF program is run with
data pointers (start/end) pointing to the first sg element.

In the sendpage case data is not copied. We opt not to copy the
data by default here, because the BPF infrastructure does not
know what bytes will be needed nor when they will be needed, so
copying all bytes may be wasteful. Because of this the initial
start/end data pointers are (0,0), meaning no data can be read or
written. This avoids reading data that may be modified by the
user. A new helper is added later in this series for when reading
and writing the data is needed. The helper call will do a copy by
default so that the page is exclusively owned by the BPF call.

The verdict from the BPF_PROG_TYPE_SK_MSG program applies to the
entire msg in the sendmsg() case and the entire page/offset in
the sendpage case. This avoids ambiguity on how to handle mixed
return codes in the sendmsg case. Again, a helper is added later
in the series if a verdict needs to apply to multiple system
calls and/or only a subpart of the message currently being
processed.

The helper msg_redirect_map() can be used to select the socket to
send the data on. This is used similarly to existing redirect use
cases and allows policy to redirect msgs.
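
From a BPF program this looks roughly like (a sketch; the map
name "sock_map" and the key are illustrative),

  SEC("sk_msg")
  int msg_prog(struct sk_msg_md *msg)
  {
          /* Redirect this msg to the socket stored at index 1 of
           * sock_map; the helper's return code is the verdict.
           */
          return bpf_msg_redirect_map(msg, &sock_map, 1, 0);
  }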

Pseudo code simple example:

The basic logic to attach a program to a socket is as follows,

  // load the programs
  bpf_prog_load(SOCKMAP_TCP_MSG_PROG, BPF_PROG_TYPE_SK_MSG,
		&obj, &msg_prog);

  // lookup the sockmap
  bpf_map_msg = bpf_object__find_map_by_name(obj, "my_sock_map");

  // get fd for sockmap
  map_fd_msg = bpf_map__fd(bpf_map_msg);

  // attach program to sockmap
  bpf_prog_attach(msg_prog, map_fd_msg, BPF_SK_MSG_VERDICT, 0);

Adding sockets to the map is done in the normal way,

  // Add a socket 'fd' to sockmap at location 'i'
  bpf_map_update_elem(map_fd_msg, &i, &fd, BPF_ANY);

After the above any socket attached to "my_sock_map", in this case
'fd', will run the BPF msg verdict program (msg_prog) on every
sendmsg and sendpage system call.

For a complete example see BPF selftests or sockmap samples.

Implementation notes:

It seemed simplest, to me at least, to use a refcnt to ensure the
psock is not lost across the sendmsg copy into the sg, the BPF
program running on the data in sg_data, and the final pass to the
TCP stack. Some performance testing may show a better method to
do this and avoid the refcnt cost, but for now use the simpler
method.

Another item that will come after basic support is in place is
support for the MSG_MORE flag. At the moment we call sendpages
even if the MSG_MORE flag is set. An enhancement would be to
collect the pages into a larger scatterlist and pass it down the
stack. Notice that bpf_tcp_sendmsg() could support this with some
additional state saved across sendmsg calls. I built the code to
support this without having to do refactoring work. Other features
TBD include ZEROCOPY and the TCP_RECV_QUEUE/TCP_NO_QUEUE support.
These will follow the initial series shortly.

Future work could improve size limits on the scatterlist rings used
here. Currently, we use MAX_SKB_FRAGS simply because this was being
used already in the TLS case. Future work could extend the kernel sk
APIs to tune this depending on workload. This is a trade-off
between memory usage and throughput performance.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 include/linux/bpf.h       |    1 
 include/linux/bpf_types.h |    1 
 include/linux/filter.h    |   17 +
 include/uapi/linux/bpf.h  |   28 ++
 kernel/bpf/sockmap.c      |  714 ++++++++++++++++++++++++++++++++++++++++++++-
 kernel/bpf/syscall.c      |   14 +
 kernel/bpf/verifier.c     |    5 
 net/core/filter.c         |  106 +++++++
 8 files changed, 863 insertions(+), 23 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 66df387..819229c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -21,6 +21,7 @@
 struct perf_event;
 struct bpf_prog;
 struct bpf_map;
+struct sock;
 
 /* map is generic key/value storage optionally accesible by eBPF programs */
 struct bpf_map_ops {
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 19b8349..5e2e8a4 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -13,6 +13,7 @@
 BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_XMIT, lwt_xmit)
 BPF_PROG_TYPE(BPF_PROG_TYPE_SOCK_OPS, sock_ops)
 BPF_PROG_TYPE(BPF_PROG_TYPE_SK_SKB, sk_skb)
+BPF_PROG_TYPE(BPF_PROG_TYPE_SK_MSG, sk_msg)
 #endif
 #ifdef CONFIG_BPF_EVENTS
 BPF_PROG_TYPE(BPF_PROG_TYPE_KPROBE, kprobe)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index fdb691b..8afb723 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -507,6 +507,22 @@ struct xdp_buff {
 	struct xdp_rxq_info *rxq;
 };
 
+struct sk_msg_buff {
+	void *data;
+	void *data_end;
+	int apply_bytes;
+	int cork_bytes;
+	int sg_copybreak;
+	int sg_start;
+	int sg_curr;
+	int sg_end;
+	struct scatterlist sg_data[MAX_SKB_FRAGS];
+	bool sg_copy[MAX_SKB_FRAGS];
+	__u32 key;
+	__u32 flags;
+	struct bpf_map *map;
+};
+
 /* Compute the linear packet data range [data, data_end) which
  * will be accessed by various program types (cls_bpf, act_bpf,
  * lwt, ...). Subsystems allowing direct data access must (!)
@@ -771,6 +787,7 @@ int xdp_do_redirect(struct net_device *dev,
 void bpf_warn_invalid_xdp_action(u32 act);
 
 struct sock *do_sk_redirect_map(struct sk_buff *skb);
+struct sock *do_msg_redirect_map(struct sk_msg_buff *md);
 
 #ifdef CONFIG_BPF_JIT
 extern int bpf_jit_enable;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 2a66769..b8275f0 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -133,6 +133,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_SOCK_OPS,
 	BPF_PROG_TYPE_SK_SKB,
 	BPF_PROG_TYPE_CGROUP_DEVICE,
+	BPF_PROG_TYPE_SK_MSG,
 };
 
 enum bpf_attach_type {
@@ -143,6 +144,7 @@ enum bpf_attach_type {
 	BPF_SK_SKB_STREAM_PARSER,
 	BPF_SK_SKB_STREAM_VERDICT,
 	BPF_CGROUP_DEVICE,
+	BPF_SK_MSG_VERDICT,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -696,6 +698,15 @@ enum bpf_attach_type {
  * int bpf_override_return(pt_regs, rc)
  *	@pt_regs: pointer to struct pt_regs
  *	@rc: the return value to set
+ *
+ * int bpf_msg_redirect_map(map, key, flags)
+ *     Redirect msg to a sock in map using key as a lookup key for the
+ *     sock in map.
+ *     @map: pointer to sockmap
+ *     @key: key to lookup sock in map
+ *     @flags: reserved for future use
+ *     Return: SK_PASS
+ *
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -757,7 +768,8 @@ enum bpf_attach_type {
 	FN(perf_prog_read_value),	\
 	FN(getsockopt),			\
 	FN(override_return),		\
-	FN(sock_ops_cb_flags_set),
+	FN(sock_ops_cb_flags_set),	\
+	FN(msg_redirect_map),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -920,6 +932,20 @@ enum sk_action {
 	SK_PASS,
 };
 
+/* User return codes for SK_MSG prog type. */
+enum sk_msg_action {
+	SK_MSG_DROP = 0,
+	SK_MSG_PASS,
+};
+
+/* user accessible metadata for SK_MSG packet hook, new fields must
+ * be added to the end of this structure
+ */
+struct sk_msg_md {
+	__u32 data;
+	__u32 data_end;
+};
+
 #define BPF_TAG_SIZE	8
 
 struct bpf_prog_info {
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 051b2242..374cfd4 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -38,6 +38,7 @@
 #include <linux/skbuff.h>
 #include <linux/workqueue.h>
 #include <linux/list.h>
+#include <linux/mm.h>
 #include <net/strparser.h>
 #include <net/tcp.h>
 
@@ -47,6 +48,7 @@
 struct bpf_stab {
 	struct bpf_map map;
 	struct sock **sock_map;
+	struct bpf_prog *bpf_tx_msg;
 	struct bpf_prog *bpf_parse;
 	struct bpf_prog *bpf_verdict;
 };
@@ -73,7 +75,16 @@ struct smap_psock {
 	int save_off;
 	struct sk_buff *save_skb;
 
+	/* datapath variables for tx_msg ULP */
+	struct sock *sk_redir;
+	int apply_bytes;
+	int cork_bytes;
+	int sg_size;
+	int eval;
+	struct sk_msg_buff *cork;
+
 	struct strparser strp;
+	struct bpf_prog *bpf_tx_msg;
 	struct bpf_prog *bpf_parse;
 	struct bpf_prog *bpf_verdict;
 	struct list_head maps;
@@ -91,6 +102,11 @@ struct smap_psock {
 	void (*save_write_space)(struct sock *sk);
 };
 
+static void smap_release_sock(struct smap_psock *psock, struct sock *sock);
+static int bpf_tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
+static int bpf_tcp_sendpage(struct sock *sk, struct page *page,
+			    int offset, size_t size, int flags);
+
 static inline struct smap_psock *smap_psock_sk(const struct sock *sk)
 {
 	return rcu_dereference_sk_user_data(sk);
@@ -115,27 +131,41 @@ static int bpf_tcp_init(struct sock *sk)
 
 	psock->save_close = sk->sk_prot->close;
 	psock->sk_proto = sk->sk_prot;
+
+	if (psock->bpf_tx_msg) {
+		tcp_bpf_proto.sendmsg = bpf_tcp_sendmsg;
+		tcp_bpf_proto.sendpage = bpf_tcp_sendpage;
+	}
+
 	sk->sk_prot = &tcp_bpf_proto;
 	rcu_read_unlock();
 	return 0;
 }
 
+static void smap_release_sock(struct smap_psock *psock, struct sock *sock);
+static int free_start_sg(struct sock *sk, struct sk_msg_buff *md);
+
 static void bpf_tcp_release(struct sock *sk)
 {
 	struct smap_psock *psock;
 
 	rcu_read_lock();
 	psock = smap_psock_sk(sk);
+	if (unlikely(!psock))
+		goto out;
 
-	if (likely(psock)) {
-		sk->sk_prot = psock->sk_proto;
-		psock->sk_proto = NULL;
+	if (psock->cork) {
+		free_start_sg(psock->sock, psock->cork);
+		kfree(psock->cork);
+		psock->cork = NULL;
 	}
+
+	sk->sk_prot = psock->sk_proto;
+	psock->sk_proto = NULL;
+out:
 	rcu_read_unlock();
 }
 
-static void smap_release_sock(struct smap_psock *psock, struct sock *sock);
-
 static void bpf_tcp_close(struct sock *sk, long timeout)
 {
 	void (*close_fun)(struct sock *sk, long timeout);
@@ -174,6 +204,7 @@ enum __sk_action {
 	__SK_DROP = 0,
 	__SK_PASS,
 	__SK_REDIRECT,
+	__SK_NONE,
 };
 
 static struct tcp_ulp_ops bpf_tcp_ulp_ops __read_mostly = {
@@ -185,10 +216,621 @@ enum __sk_action {
 	.release	= bpf_tcp_release,
 };
 
+static int memcopy_from_iter(struct sock *sk,
+			     struct sk_msg_buff *md,
+			     struct iov_iter *from, int bytes)
+{
+	struct scatterlist *sg = md->sg_data;
+	int i = md->sg_curr, rc = -ENOSPC;
+
+	do {
+		int copy;
+		char *to;
+
+		if (md->sg_copybreak >= sg[i].length) {
+			md->sg_copybreak = 0;
+
+			if (++i == MAX_SKB_FRAGS)
+				i = 0;
+
+			if (i == md->sg_end)
+				break;
+		}
+
+		copy = sg[i].length - md->sg_copybreak;
+		to = sg_virt(&sg[i]) + md->sg_copybreak;
+		md->sg_copybreak += copy;
+
+		if (sk->sk_route_caps & NETIF_F_NOCACHE_COPY)
+			rc = copy_from_iter_nocache(to, copy, from);
+		else
+			rc = copy_from_iter(to, copy, from);
+
+		if (rc != copy) {
+			rc = -EFAULT;
+			goto out;
+		}
+
+		bytes -= copy;
+		if (!bytes)
+			break;
+
+		md->sg_copybreak = 0;
+		if (++i == MAX_SKB_FRAGS)
+			i = 0;
+	} while (i != md->sg_end);
+out:
+	md->sg_curr = i;
+	return rc;
+}
+
+static int bpf_tcp_push(struct sock *sk, int apply_bytes,
+			struct sk_msg_buff *md,
+			int flags, bool uncharge)
+{
+	bool apply = apply_bytes;
+	struct scatterlist *sg;
+	int offset, ret = 0;
+	struct page *p;
+	size_t size;
+
+	while (1) {
+		sg = md->sg_data + md->sg_start;
+		size = (apply && apply_bytes < sg->length) ?
+			apply_bytes : sg->length;
+		offset = sg->offset;
+
+		tcp_rate_check_app_limited(sk);
+		p = sg_page(sg);
+retry:
+		ret = do_tcp_sendpages(sk, p, offset, size, flags);
+		if (ret != size) {
+			if (ret > 0) {
+				if (apply)
+					apply_bytes -= ret;
+				size -= ret;
+				offset += ret;
+				if (uncharge)
+					sk_mem_uncharge(sk, ret);
+				goto retry;
+			}
+
+			sg->length = size;
+			sg->offset = offset;
+			return ret;
+		}
+
+		if (apply)
+			apply_bytes -= ret;
+		sg->offset += ret;
+		sg->length -= ret;
+		if (uncharge)
+			sk_mem_uncharge(sk, ret);
+
+		if (!sg->length) {
+			put_page(p);
+			md->sg_start++;
+			if (md->sg_start == MAX_SKB_FRAGS)
+				md->sg_start = 0;
+			memset(sg, 0, sizeof(*sg));
+
+			if (md->sg_start == md->sg_end)
+				break;
+		}
+
+		if (apply && !apply_bytes)
+			break;
+	}
+	return 0;
+}
+
+static inline void bpf_compute_data_pointers_sg(struct sk_msg_buff *md)
+{
+	struct scatterlist *sg = md->sg_data + md->sg_start;
+
+	if (md->sg_copy[md->sg_start]) {
+		md->data = md->data_end = 0;
+	} else {
+		md->data = sg_virt(sg);
+		md->data_end = md->data + sg->length;
+	}
+}
+
+static void return_mem_sg(struct sock *sk, int bytes, struct sk_msg_buff *md)
+{
+	struct scatterlist *sg = md->sg_data;
+	int i = md->sg_start;
+
+	do {
+		int uncharge = (bytes < sg[i].length) ? bytes : sg[i].length;
+
+		sk_mem_uncharge(sk, uncharge);
+		bytes -= uncharge;
+		if (!bytes)
+			break;
+		i++;
+		if (i == MAX_SKB_FRAGS)
+			i = 0;
+	} while (i != md->sg_end);
+}
+
+static void free_bytes_sg(struct sock *sk, int bytes, struct sk_msg_buff *md)
+{
+	struct scatterlist *sg = md->sg_data;
+	int i = md->sg_start, free;
+
+	while (bytes && sg[i].length) {
+		free = sg[i].length;
+		if (bytes < free) {
+			sg[i].length -= bytes;
+			sg[i].offset += bytes;
+			sk_mem_uncharge(sk, bytes);
+			break;
+		}
+
+		sk_mem_uncharge(sk, sg[i].length);
+		put_page(sg_page(&sg[i]));
+		bytes -= sg[i].length;
+		sg[i].length = 0;
+		sg[i].page_link = 0;
+		sg[i].offset = 0;
+		i++;
+
+		if (i == MAX_SKB_FRAGS)
+			i = 0;
+	}
+}
+
+static int free_sg(struct sock *sk, int start, struct sk_msg_buff *md)
+{
+	struct scatterlist *sg = md->sg_data;
+	int i = start, free = 0;
+
+	while (sg[i].length) {
+		free += sg[i].length;
+		sk_mem_uncharge(sk, sg[i].length);
+		put_page(sg_page(&sg[i]));
+		sg[i].length = 0;
+		sg[i].page_link = 0;
+		sg[i].offset = 0;
+		i++;
+
+		if (i == MAX_SKB_FRAGS)
+			i = 0;
+	}
+
+	return free;
+}
+
+static int free_start_sg(struct sock *sk, struct sk_msg_buff *md)
+{
+	int free = free_sg(sk, md->sg_start, md);
+
+	md->sg_start = md->sg_end;
+	return free;
+}
+
+static int free_curr_sg(struct sock *sk, struct sk_msg_buff *md)
+{
+	return free_sg(sk, md->sg_curr, md);
+}
+
+static int bpf_map_msg_verdict(int _rc, struct sk_msg_buff *md)
+{
+	return ((_rc == SK_PASS) ?
+	       (md->map ? __SK_REDIRECT : __SK_PASS) :
+	       __SK_DROP);
+}
+
+static unsigned int smap_do_tx_msg(struct sock *sk,
+				   struct smap_psock *psock,
+				   struct sk_msg_buff *md)
+{
+	struct bpf_prog *prog;
+	unsigned int rc, _rc;
+
+	preempt_disable();
+	rcu_read_lock();
+
+	/* If the policy was removed mid-send then default to 'accept' */
+	prog = READ_ONCE(psock->bpf_tx_msg);
+	if (unlikely(!prog)) {
+		_rc = SK_PASS;
+		goto verdict;
+	}
+
+	bpf_compute_data_pointers_sg(md);
+	rc = (*prog->bpf_func)(md, prog->insnsi);
+	psock->apply_bytes = md->apply_bytes;
+
+	/* Moving return codes from UAPI namespace into internal namespace */
+	_rc = bpf_map_msg_verdict(rc, md);
+
+	/* The psock has a refcount on the sock but not on the map and because
+	 * we need to drop rcu read lock here its possible the map could be
+	 * removed between here and when we need it to execute the sock
+	 * redirect. So do the map lookup now for future use.
+	 */
+	if (_rc == __SK_REDIRECT) {
+		if (psock->sk_redir)
+			sock_put(psock->sk_redir);
+		psock->sk_redir = do_msg_redirect_map(md);
+		if (!psock->sk_redir) {
+			_rc = __SK_DROP;
+			goto verdict;
+		}
+		sock_hold(psock->sk_redir);
+	}
+verdict:
+	rcu_read_unlock();
+	preempt_enable();
+
+	return _rc;
+}
+
+static int bpf_tcp_sendmsg_do_redirect(struct sock *sk, int send,
+				       struct sk_msg_buff *md,
+				       int flags)
+{
+	struct smap_psock *psock;
+	struct scatterlist *sg;
+	int i, err, free = 0;
+
+	sg = md->sg_data;
+
+	rcu_read_lock();
+	psock = smap_psock_sk(sk);
+	if (unlikely(!psock))
+		goto out_rcu;
+
+	if (!refcount_inc_not_zero(&psock->refcnt))
+		goto out_rcu;
+
+	rcu_read_unlock();
+	lock_sock(sk);
+	err = bpf_tcp_push(sk, send, md, flags, false);
+	release_sock(sk);
+	smap_release_sock(psock, sk);
+	if (unlikely(err))
+		goto out;
+	return 0;
+out_rcu:
+	rcu_read_unlock();
+out:
+	i = md->sg_start;
+	while (sg[i].length) {
+		free += sg[i].length;
+		put_page(sg_page(&sg[i]));
+		sg[i].length = 0;
+		i++;
+		if (i == MAX_SKB_FRAGS)
+			i = 0;
+	}
+	return free;
+}
+
+static inline void bpf_md_init(struct smap_psock *psock)
+{
+	if (!psock->apply_bytes) {
+		psock->eval =  __SK_NONE;
+		if (psock->sk_redir) {
+			sock_put(psock->sk_redir);
+			psock->sk_redir = NULL;
+		}
+	}
+}
+
+static void apply_bytes_dec(struct smap_psock *psock, int i)
+{
+	if (psock->apply_bytes) {
+		if (psock->apply_bytes < i)
+			psock->apply_bytes = 0;
+		else
+			psock->apply_bytes -= i;
+	}
+}
+
+static int bpf_exec_tx_verdict(struct smap_psock *psock,
+			       struct sk_msg_buff *m,
+			       struct sock *sk,
+			       int *copied, int flags)
+{
+	bool cork = false, enospc = (m->sg_start == m->sg_end);
+	struct sock *redir;
+	int err = 0;
+	int send;
+
+more_data:
+	if (psock->eval == __SK_NONE)
+		psock->eval = smap_do_tx_msg(sk, psock, m);
+
+	if (m->cork_bytes &&
+	    m->cork_bytes > psock->sg_size && !enospc) {
+		psock->cork_bytes = m->cork_bytes - psock->sg_size;
+		if (!psock->cork) {
+			psock->cork = kcalloc(1,
+					sizeof(struct sk_msg_buff),
+					GFP_ATOMIC | __GFP_NOWARN);
+
+			if (!psock->cork) {
+				err = -ENOMEM;
+				goto out_err;
+			}
+		}
+		memcpy(psock->cork, m, sizeof(*m));
+		goto out_err;
+	}
+
+	send = psock->sg_size;
+	if (psock->apply_bytes && psock->apply_bytes < send)
+		send = psock->apply_bytes;
+
+	switch (psock->eval) {
+	case __SK_PASS:
+		err = bpf_tcp_push(sk, send, m, flags, true);
+		if (unlikely(err)) {
+			*copied -= free_start_sg(sk, m);
+			break;
+		}
+
+		apply_bytes_dec(psock, send);
+		psock->sg_size -= send;
+		break;
+	case __SK_REDIRECT:
+		redir = psock->sk_redir;
+		apply_bytes_dec(psock, send);
+
+		if (psock->cork) {
+			cork = true;
+			psock->cork = NULL;
+		}
+
+		return_mem_sg(sk, send, m);
+		release_sock(sk);
+
+		err = bpf_tcp_sendmsg_do_redirect(redir, send, m, flags);
+		lock_sock(sk);
+
+		if (cork) {
+			free_start_sg(sk, m);
+			kfree(m);
+			m = NULL;
+		}
+		if (unlikely(err))
+			*copied -= err;
+		else
+			psock->sg_size -= send;
+		break;
+	case __SK_DROP:
+	default:
+		free_bytes_sg(sk, send, m);
+		apply_bytes_dec(psock, send);
+		*copied -= send;
+		psock->sg_size -= send;
+		err = -EACCES;
+		break;
+	}
+
+	if (likely(!err)) {
+		bpf_md_init(psock);
+		if (m &&
+		    m->sg_data[m->sg_start].page_link &&
+		    m->sg_data[m->sg_start].length)
+			goto more_data;
+	}
+
+out_err:
+	return err;
+}
+
+static int bpf_tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
+{
+	int flags = msg->msg_flags | MSG_NO_SHARED_FRAGS;
+	struct sk_msg_buff md = {0};
+	unsigned int sg_copy = 0;
+	struct smap_psock *psock;
+	int copied = 0, err = 0;
+	struct scatterlist *sg;
+	long timeo;
+
+	/* Its possible a sock event or user removed the psock _but_ the ops
+	 * have not been reprogrammed yet so we get here. In this case fallback
+	 * to tcp_sendmsg. Note this only works because we _only_ ever allow
+	 * a single ULP there is no hierarchy here.
+	 */
+	rcu_read_lock();
+	psock = smap_psock_sk(sk);
+	if (unlikely(!psock)) {
+		rcu_read_unlock();
+		return tcp_sendmsg(sk, msg, size);
+	}
+
+	/* Increment the psock refcnt to ensure its not released while sending a
+	 * message. Required because sk lookup and bpf programs are used in
+	 * separate rcu critical sections. Its OK if we lose the map entry
+	 * but we can't lose the sock reference.
+	 */
+	if (!refcount_inc_not_zero(&psock->refcnt)) {
+		rcu_read_unlock();
+		return tcp_sendmsg(sk, msg, size);
+	}
+
+	sg = md.sg_data;
+	sg_init_table(sg, MAX_SKB_FRAGS);
+	rcu_read_unlock();
+
+	lock_sock(sk);
+	timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
+
+	while (msg_data_left(msg)) {
+		struct sk_msg_buff *m;
+		bool enospc = false;
+		int copy;
+
+		if (sk->sk_err) {
+			err = sk->sk_err;
+			goto out_err;
+		}
+
+		copy = msg_data_left(msg);
+		if (!sk_stream_memory_free(sk))
+			goto wait_for_sndbuf;
+
+		m = psock->cork_bytes ? psock->cork : &md;
+		m->sg_curr = m->sg_copybreak ? m->sg_curr : m->sg_end;
+		err = sk_alloc_sg(sk, copy, m->sg_data,
+				  m->sg_start, &m->sg_end, &sg_copy,
+				  m->sg_end - 1);
+		if (err) {
+			if (err != -ENOSPC)
+				goto wait_for_memory;
+			enospc = true;
+			copy = sg_copy;
+		}
+
+		err = memcopy_from_iter(sk, m, &msg->msg_iter, copy);
+		if (err < 0) {
+			free_curr_sg(sk, m);
+			goto out_err;
+		}
+
+		psock->sg_size += copy;
+		copied += copy;
+		sg_copy = 0;
+
+		/* When bytes are being corked skip running BPF program and
+		 * applying verdict unless there is no more buffer space. In
+		 * the ENOSPC case simply run BPF prorgram with currently
+		 * accumulated data. We don't have much choice at this point
+		 * we could try extending the page frags or chaining complex
+		 * frags but even in these cases _eventually_ we will hit an
+		 * OOM scenario. More complex recovery schemes may be
+		 * implemented in the future, but BPF programs must handle
+		 * the case where apply_cork requests are not honored. The
+		 * canonical method to verify this is to check data length.
+		 */
+		if (psock->cork_bytes) {
+			if (copy > psock->cork_bytes)
+				psock->cork_bytes = 0;
+			else
+				psock->cork_bytes -= copy;
+
+			if (psock->cork_bytes && !enospc)
+				goto out_cork;
+
+			/* All cork bytes accounted for re-run filter */
+			psock->eval = __SK_NONE;
+			psock->cork_bytes = 0;
+		}
+
+		err = bpf_exec_tx_verdict(psock, m, sk, &copied, flags);
+		if (unlikely(err < 0))
+			goto out_err;
+		continue;
+wait_for_sndbuf:
+		set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
+wait_for_memory:
+		err = sk_stream_wait_memory(sk, &timeo);
+		if (err)
+			goto out_err;
+	}
+out_err:
+	if (err < 0)
+		err = sk_stream_error(sk, msg->msg_flags, err);
+out_cork:
+	release_sock(sk);
+	smap_release_sock(psock, sk);
+	return copied ? copied : err;
+}
+
+static int bpf_tcp_sendpage(struct sock *sk, struct page *page,
+			    int offset, size_t size, int flags)
+{
+	struct sk_msg_buff md = {0}, *m = NULL;
+	int err = 0, copied = 0;
+	struct smap_psock *psock;
+	struct scatterlist *sg;
+	bool enospc = false;
+
+	rcu_read_lock();
+	psock = smap_psock_sk(sk);
+	if (unlikely(!psock))
+		goto accept;
+
+	if (!refcount_inc_not_zero(&psock->refcnt))
+		goto accept;
+	rcu_read_unlock();
+
+	lock_sock(sk);
+
+	if (psock->cork_bytes)
+		m = psock->cork;
+	else
+		m = &md;
+
+	/* Catch case where ring is full and sendpage is stalled. */
+	if (unlikely(m->sg_end == m->sg_start &&
+	    m->sg_data[m->sg_end].length))
+		goto out_err;
+
+	psock->sg_size += size;
+	sg = &m->sg_data[m->sg_end];
+	sg_set_page(sg, page, size, offset);
+	get_page(page);
+	m->sg_copy[m->sg_end] = true;
+	sk_mem_charge(sk, size);
+	m->sg_end++;
+	copied = size;
+
+	if (m->sg_end == MAX_SKB_FRAGS)
+		m->sg_end = 0;
+
+	if (m->sg_end == m->sg_start)
+		enospc = true;
+
+	if (psock->cork_bytes) {
+		if (size > psock->cork_bytes)
+			psock->cork_bytes = 0;
+		else
+			psock->cork_bytes -= size;
+
+		if (psock->cork_bytes && !enospc)
+			goto out_err;
+
+		/* All cork bytes accounted for re-run filter */
+		psock->eval = __SK_NONE;
+		psock->cork_bytes = 0;
+	}
+
+	err = bpf_exec_tx_verdict(psock, m, sk, &copied, flags);
+out_err:
+	release_sock(sk);
+	smap_release_sock(psock, sk);
+	return copied ? copied : err;
+accept:
+	rcu_read_unlock();
+	return tcp_sendpage(sk, page, offset, size, flags);
+}
+
+static void bpf_tcp_msg_add(struct smap_psock *psock,
+			    struct sock *sk,
+			    struct bpf_prog *tx_msg)
+{
+	struct bpf_prog *orig_tx_msg;
+
+	orig_tx_msg = xchg(&psock->bpf_tx_msg, tx_msg);
+	if (orig_tx_msg)
+		bpf_prog_put(orig_tx_msg);
+}
+
 static int bpf_tcp_ulp_register(void)
 {
 	tcp_bpf_proto = tcp_prot;
 	tcp_bpf_proto.close = bpf_tcp_close;
+	/* Once BPF TX ULP is registered it is never unregistered. It
+	 * will be in the ULP list for the lifetime of the system. Doing
+	 * duplicate registers is not a problem.
+	 */
 	return tcp_register_ulp(&bpf_tcp_ulp_ops);
 }
 
@@ -412,7 +1054,6 @@ static int smap_parse_func_strparser(struct strparser *strp,
 	return rc;
 }
 
-
 static int smap_read_sock_done(struct strparser *strp, int err)
 {
 	return err;
@@ -482,12 +1123,22 @@ static void smap_gc_work(struct work_struct *w)
 		bpf_prog_put(psock->bpf_parse);
 	if (psock->bpf_verdict)
 		bpf_prog_put(psock->bpf_verdict);
+	if (psock->bpf_tx_msg)
+		bpf_prog_put(psock->bpf_tx_msg);
+
+	if (psock->cork) {
+		free_start_sg(psock->sock, psock->cork);
+		kfree(psock->cork);
+	}
 
 	list_for_each_entry_safe(e, tmp, &psock->maps, list) {
 		list_del(&e->list);
 		kfree(e);
 	}
 
+	if (psock->sk_redir)
+		sock_put(psock->sk_redir);
+
 	sock_put(psock->sock);
 	kfree(psock);
 }
@@ -503,6 +1154,7 @@ static struct smap_psock *smap_init_psock(struct sock *sock,
 	if (!psock)
 		return ERR_PTR(-ENOMEM);
 
+	psock->eval =  __SK_NONE;
 	psock->sock = sock;
 	skb_queue_head_init(&psock->rxqueue);
 	INIT_WORK(&psock->tx_work, smap_tx_work);
@@ -668,8 +1320,6 @@ static int sock_map_delete_elem(struct bpf_map *map, void *key)
 	if (!psock)
 		goto out;
 
-	if (psock->bpf_parse)
-		smap_stop_sock(psock, sock);
 	smap_list_remove(psock, &stab->sock_map[k]);
 	smap_release_sock(psock, sock);
 out:
@@ -711,10 +1361,11 @@ static int sock_map_ctx_update_elem(struct bpf_sock_ops_kern *skops,
 {
 	struct bpf_stab *stab = container_of(map, struct bpf_stab, map);
 	struct smap_psock_map_entry *e = NULL;
-	struct bpf_prog *verdict, *parse;
+	struct bpf_prog *verdict, *parse, *tx_msg;
 	struct sock *osock, *sock;
 	struct smap_psock *psock;
 	u32 i = *(u32 *)key;
+	bool new = false;
 	int err;
 
 	if (unlikely(flags > BPF_EXIST))
@@ -737,6 +1388,7 @@ static int sock_map_ctx_update_elem(struct bpf_sock_ops_kern *skops,
 	 */
 	verdict = READ_ONCE(stab->bpf_verdict);
 	parse = READ_ONCE(stab->bpf_parse);
+	tx_msg = READ_ONCE(stab->bpf_tx_msg);
 
 	if (parse && verdict) {
 		/* bpf prog refcnt may be zero if a concurrent attach operation
@@ -755,6 +1407,17 @@ static int sock_map_ctx_update_elem(struct bpf_sock_ops_kern *skops,
 		}
 	}
 
+	if (tx_msg) {
+		tx_msg = bpf_prog_inc_not_zero(stab->bpf_tx_msg);
+		if (IS_ERR(tx_msg)) {
+			if (verdict)
+				bpf_prog_put(verdict);
+			if (parse)
+				bpf_prog_put(parse);
+			return PTR_ERR(tx_msg);
+		}
+	}
+
 	write_lock_bh(&sock->sk_callback_lock);
 	psock = smap_psock_sk(sock);
 
@@ -769,7 +1432,14 @@ static int sock_map_ctx_update_elem(struct bpf_sock_ops_kern *skops,
 			err = -EBUSY;
 			goto out_progs;
 		}
-		refcount_inc(&psock->refcnt);
+		if (READ_ONCE(psock->bpf_tx_msg) && tx_msg) {
+			err = -EBUSY;
+			goto out_progs;
+		}
+		if (!refcount_inc_not_zero(&psock->refcnt)) {
+			err = -EAGAIN;
+			goto out_progs;
+		}
 	} else {
 		psock = smap_init_psock(sock, stab);
 		if (IS_ERR(psock)) {
@@ -777,11 +1447,8 @@ static int sock_map_ctx_update_elem(struct bpf_sock_ops_kern *skops,
 			goto out_progs;
 		}
 
-		err = tcp_set_ulp_id(sock, TCP_ULP_BPF);
-		if (err)
-			goto out_progs;
-
 		set_bit(SMAP_TX_RUNNING, &psock->state);
+		new = true;
 	}
 
 	e = kzalloc(sizeof(*e), GFP_ATOMIC | __GFP_NOWARN);
@@ -794,6 +1461,14 @@ static int sock_map_ctx_update_elem(struct bpf_sock_ops_kern *skops,
 	/* 3. At this point we have a reference to a valid psock that is
 	 * running. Attach any BPF programs needed.
 	 */
+	if (tx_msg)
+		bpf_tcp_msg_add(psock, sock, tx_msg);
+	if (new) {
+		err = tcp_set_ulp_id(sock, TCP_ULP_BPF);
+		if (err)
+			goto out_free;
+	}
+
 	if (parse && verdict && !psock->strp_enabled) {
 		err = smap_init_sock(psock, sock);
 		if (err)
@@ -815,8 +1490,6 @@ static int sock_map_ctx_update_elem(struct bpf_sock_ops_kern *skops,
 		struct smap_psock *opsock = smap_psock_sk(osock);
 
 		write_lock_bh(&osock->sk_callback_lock);
-		if (osock != sock && parse)
-			smap_stop_sock(opsock, osock);
 		smap_list_remove(opsock, &stab->sock_map[i]);
 		smap_release_sock(opsock, osock);
 		write_unlock_bh(&osock->sk_callback_lock);
@@ -829,6 +1502,8 @@ static int sock_map_ctx_update_elem(struct bpf_sock_ops_kern *skops,
 		bpf_prog_put(verdict);
 	if (parse)
 		bpf_prog_put(parse);
+	if (tx_msg)
+		bpf_prog_put(tx_msg);
 	write_unlock_bh(&sock->sk_callback_lock);
 	kfree(e);
 	return err;
@@ -843,6 +1518,9 @@ int sock_map_prog(struct bpf_map *map, struct bpf_prog *prog, u32 type)
 		return -EINVAL;
 
 	switch (type) {
+	case BPF_SK_MSG_VERDICT:
+		orig = xchg(&stab->bpf_tx_msg, prog);
+		break;
 	case BPF_SK_SKB_STREAM_PARSER:
 		orig = xchg(&stab->bpf_parse, prog);
 		break;
@@ -904,6 +1582,10 @@ static void sock_map_release(struct bpf_map *map, struct file *map_file)
 	orig = xchg(&stab->bpf_verdict, NULL);
 	if (orig)
 		bpf_prog_put(orig);
+
+	orig = xchg(&stab->bpf_tx_msg, NULL);
+	if (orig)
+		bpf_prog_put(orig);
 }
 
 const struct bpf_map_ops sock_map_ops = {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index e24aa32..3aeb4ea 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1315,7 +1315,8 @@ static int bpf_obj_get(const union bpf_attr *attr)
 
 #define BPF_PROG_ATTACH_LAST_FIELD attach_flags
 
-static int sockmap_get_from_fd(const union bpf_attr *attr, bool attach)
+static int sockmap_get_from_fd(const union bpf_attr *attr,
+			       int type, bool attach)
 {
 	struct bpf_prog *prog = NULL;
 	int ufd = attr->target_fd;
@@ -1329,8 +1330,7 @@ static int sockmap_get_from_fd(const union bpf_attr *attr, bool attach)
 		return PTR_ERR(map);
 
 	if (attach) {
-		prog = bpf_prog_get_type(attr->attach_bpf_fd,
-					 BPF_PROG_TYPE_SK_SKB);
+		prog = bpf_prog_get_type(attr->attach_bpf_fd, type);
 		if (IS_ERR(prog)) {
 			fdput(f);
 			return PTR_ERR(prog);
@@ -1382,9 +1382,11 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	case BPF_CGROUP_DEVICE:
 		ptype = BPF_PROG_TYPE_CGROUP_DEVICE;
 		break;
+	case BPF_SK_MSG_VERDICT:
+		return sockmap_get_from_fd(attr, BPF_PROG_TYPE_SK_MSG, true);
 	case BPF_SK_SKB_STREAM_PARSER:
 	case BPF_SK_SKB_STREAM_VERDICT:
-		return sockmap_get_from_fd(attr, true);
+		return sockmap_get_from_fd(attr, BPF_PROG_TYPE_SK_SKB, true);
 	default:
 		return -EINVAL;
 	}
@@ -1437,9 +1439,11 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 	case BPF_CGROUP_DEVICE:
 		ptype = BPF_PROG_TYPE_CGROUP_DEVICE;
 		break;
+	case BPF_SK_MSG_VERDICT:
+		return sockmap_get_from_fd(attr, BPF_PROG_TYPE_SK_MSG, false);
 	case BPF_SK_SKB_STREAM_PARSER:
 	case BPF_SK_SKB_STREAM_VERDICT:
-		return sockmap_get_from_fd(attr, false);
+		return sockmap_get_from_fd(attr, BPF_PROG_TYPE_SK_SKB, false);
 	default:
 		return -EINVAL;
 	}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3c74b16..3d14059 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1248,6 +1248,7 @@ static bool may_access_direct_pkt_data(struct bpf_verifier_env *env,
 	case BPF_PROG_TYPE_XDP:
 	case BPF_PROG_TYPE_LWT_XMIT:
 	case BPF_PROG_TYPE_SK_SKB:
+	case BPF_PROG_TYPE_SK_MSG:
 		if (meta)
 			return meta->pkt_access;
 
@@ -2062,7 +2063,8 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 	case BPF_MAP_TYPE_SOCKMAP:
 		if (func_id != BPF_FUNC_sk_redirect_map &&
 		    func_id != BPF_FUNC_sock_map_update &&
-		    func_id != BPF_FUNC_map_delete_elem)
+		    func_id != BPF_FUNC_map_delete_elem &&
+		    func_id != BPF_FUNC_msg_redirect_map)
 			goto error;
 		break;
 	default:
@@ -2100,6 +2102,7 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 			goto error;
 		break;
 	case BPF_FUNC_sk_redirect_map:
+	case BPF_FUNC_msg_redirect_map:
 		if (map->map_type != BPF_MAP_TYPE_SOCKMAP)
 			goto error;
 		break;
diff --git a/net/core/filter.c b/net/core/filter.c
index 33edfa8..314c311 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1890,6 +1890,44 @@ struct sock *do_sk_redirect_map(struct sk_buff *skb)
 	.arg4_type      = ARG_ANYTHING,
 };
 
+BPF_CALL_4(bpf_msg_redirect_map, struct sk_msg_buff *, msg,
+	   struct bpf_map *, map, u32, key, u64, flags)
+{
+	/* If user passes invalid input drop the packet. */
+	if (unlikely(flags))
+		return SK_DROP;
+
+	msg->key = key;
+	msg->flags = flags;
+	msg->map = map;
+
+	return SK_PASS;
+}
+
+struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
+{
+	struct sock *sk = NULL;
+
+	if (msg->map) {
+		sk = __sock_map_lookup_elem(msg->map, msg->key);
+
+		msg->key = 0;
+		msg->map = NULL;
+	}
+
+	return sk;
+}
+
+static const struct bpf_func_proto bpf_msg_redirect_map_proto = {
+	.func           = bpf_msg_redirect_map,
+	.gpl_only       = false,
+	.ret_type       = RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type      = ARG_CONST_MAP_PTR,
+	.arg3_type      = ARG_ANYTHING,
+	.arg4_type      = ARG_ANYTHING,
+};
+
 BPF_CALL_1(bpf_get_cgroup_classid, const struct sk_buff *, skb)
 {
 	return task_get_classid(skb);
@@ -3591,6 +3629,16 @@ static unsigned long bpf_xdp_copy(void *dst_buff, const void *src_buff,
 	}
 }
 
+static const struct bpf_func_proto *sk_msg_func_proto(enum bpf_func_id func_id)
+{
+	switch (func_id) {
+	case BPF_FUNC_msg_redirect_map:
+		return &bpf_msg_redirect_map_proto;
+	default:
+		return bpf_base_func_proto(func_id);
+	}
+}
+
 static const struct bpf_func_proto *sk_skb_func_proto(enum bpf_func_id func_id)
 {
 	switch (func_id) {
@@ -3980,6 +4028,32 @@ static bool sk_skb_is_valid_access(int off, int size,
 	return bpf_skb_is_valid_access(off, size, type, info);
 }
 
+static bool sk_msg_is_valid_access(int off, int size,
+				   enum bpf_access_type type,
+				   struct bpf_insn_access_aux *info)
+{
+	if (type == BPF_WRITE)
+		return false;
+
+	switch (off) {
+	case offsetof(struct sk_msg_md, data):
+		info->reg_type = PTR_TO_PACKET;
+		break;
+	case offsetof(struct sk_msg_md, data_end):
+		info->reg_type = PTR_TO_PACKET_END;
+		break;
+	}
+
+	if (off < 0 || off >= sizeof(struct sk_msg_md))
+		return false;
+	if (off % size != 0)
+		return false;
+	if (size != sizeof(__u32))
+		return false;
+
+	return true;
+}
+
 static u32 bpf_convert_ctx_access(enum bpf_access_type type,
 				  const struct bpf_insn *si,
 				  struct bpf_insn *insn_buf,
@@ -4778,6 +4852,29 @@ static u32 sk_skb_convert_ctx_access(enum bpf_access_type type,
 	return insn - insn_buf;
 }
 
+static u32 sk_msg_convert_ctx_access(enum bpf_access_type type,
+				     const struct bpf_insn *si,
+				     struct bpf_insn *insn_buf,
+				     struct bpf_prog *prog, u32 *target_size)
+{
+	struct bpf_insn *insn = insn_buf;
+
+	switch (si->off) {
+	case offsetof(struct sk_msg_md, data):
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct sk_msg_buff, data),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct sk_msg_buff, data));
+		break;
+	case offsetof(struct sk_msg_md, data_end):
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct sk_msg_buff, data_end),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct sk_msg_buff, data_end));
+		break;
+	}
+
+	return insn - insn_buf;
+}
+
 const struct bpf_verifier_ops sk_filter_verifier_ops = {
 	.get_func_proto		= sk_filter_func_proto,
 	.is_valid_access	= sk_filter_is_valid_access,
@@ -4868,6 +4965,15 @@ static u32 sk_skb_convert_ctx_access(enum bpf_access_type type,
 const struct bpf_prog_ops sk_skb_prog_ops = {
 };
 
+const struct bpf_verifier_ops sk_msg_verifier_ops = {
+	.get_func_proto		= sk_msg_func_proto,
+	.is_valid_access	= sk_msg_is_valid_access,
+	.convert_ctx_access	= sk_msg_convert_ctx_access,
+};
+
+const struct bpf_prog_ops sk_msg_prog_ops = {
+};
+
 int sk_detach_filter(struct sock *sk)
 {
 	int ret = -ENOENT;


* [bpf-next PATCH v2 06/18] bpf: sockmap, add bpf_msg_apply_bytes() helper
  2018-03-12 19:23 [bpf-next PATCH v2 00/18] bpf,sockmap: sendmsg/sendfile ULP John Fastabend
                   ` (4 preceding siblings ...)
  2018-03-12 19:23 ` [bpf-next PATCH v2 05/18] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data John Fastabend
@ 2018-03-12 19:23 ` John Fastabend
  2018-03-15 18:41   ` David Miller
                     ` (2 more replies)
  2018-03-12 19:23 ` [bpf-next PATCH v2 07/18] bpf: sockmap, add msg_cork_bytes() helper John Fastabend
                   ` (11 subsequent siblings)
  17 siblings, 3 replies; 49+ messages in thread
From: John Fastabend @ 2018-03-12 19:23 UTC (permalink / raw)
  To: davem, ast, daniel, davejwatson; +Cc: netdev

A single sendmsg or sendfile system call can contain multiple logical
messages that a BPF program may want to read and apply a verdict to.
But, without an apply_bytes helper, any verdict on the data applies to
all bytes in the sendmsg/sendfile. Alternatively, a BPF program may
only care to read the first N bytes of a msg. If the payload is large,
say MBs or even GBs, setting up and calling the BPF program repeatedly
for all bytes, even though the verdict is already known, creates
unnecessary overhead.

To allow BPF programs to control how many bytes a given verdict
applies to, we implement a bpf_msg_apply_bytes() helper. When called
from within a BPF program this sets a counter, internal to the
BPF infrastructure, that applies the last verdict to the next N
bytes. If N is smaller than the current data being processed
from a sendmsg/sendfile call, the first N bytes will be sent and
the BPF program will be re-run with data_start pointing to the N+1
byte. If N is larger than the current data being processed, the
BPF verdict will be applied to multiple sendmsg/sendfile calls
until N bytes are consumed.
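
As an illustration only, a minimal SK_MSG program using the helper
could look like the sketch below (the 4KB limit and program name are
arbitrary, and the bpf_msg_apply_bytes() wrapper is assumed to be
exposed via bpf_helpers.h as done later in this series):

  SEC("sk_msg")
  int apply_verdict_4k(struct sk_msg_md *msg)
  {
          /* Apply this program's verdict to the next 4KB of data;
           * the program is not invoked again until the 4KB are
           * consumed.
           */
          bpf_msg_apply_bytes(msg, 4096);
          return SK_PASS;
  }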

Note1: if a socket closes with the apply_bytes counter non-zero this
is not a problem, because data is not buffered for N bytes and is
sent as it is received.

Note2: if this is operating in the sendpage context the data
pointers may be zeroed after this call if the apply limit walks
beyond the data range specified by a msg_pull_data() call (a
helper implemented later in this series).

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 include/uapi/linux/bpf.h |    3 ++-
 net/core/filter.c        |   16 ++++++++++++++++
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b8275f0..e50c61f 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -769,7 +769,8 @@ enum bpf_attach_type {
 	FN(getsockopt),			\
 	FN(override_return),		\
 	FN(sock_ops_cb_flags_set),	\
-	FN(msg_redirect_map),
+	FN(msg_redirect_map),		\
+	FN(msg_apply_bytes),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/net/core/filter.c b/net/core/filter.c
index 314c311..df2a8f4 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1928,6 +1928,20 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
 	.arg4_type      = ARG_ANYTHING,
 };
 
+BPF_CALL_2(bpf_msg_apply_bytes, struct sk_msg_buff *, msg, u64, bytes)
+{
+	msg->apply_bytes = bytes;
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_msg_apply_bytes_proto = {
+	.func           = bpf_msg_apply_bytes,
+	.gpl_only       = false,
+	.ret_type       = RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type      = ARG_ANYTHING,
+};
+
 BPF_CALL_1(bpf_get_cgroup_classid, const struct sk_buff *, skb)
 {
 	return task_get_classid(skb);
@@ -3634,6 +3648,8 @@ static const struct bpf_func_proto *sk_msg_func_proto(enum bpf_func_id func_id)
 	switch (func_id) {
 	case BPF_FUNC_msg_redirect_map:
 		return &bpf_msg_redirect_map_proto;
+	case BPF_FUNC_msg_apply_bytes:
+		return &bpf_msg_apply_bytes_proto;
 	default:
 		return bpf_base_func_proto(func_id);
 	}


* [bpf-next PATCH v2 07/18] bpf: sockmap, add msg_cork_bytes() helper
  2018-03-12 19:23 [bpf-next PATCH v2 00/18] bpf,sockmap: sendmsg/sendfile ULP John Fastabend
                   ` (5 preceding siblings ...)
  2018-03-12 19:23 ` [bpf-next PATCH v2 06/18] bpf: sockmap, add bpf_msg_apply_bytes() helper John Fastabend
@ 2018-03-12 19:23 ` John Fastabend
  2018-03-15 18:41   ` David Miller
  2018-03-12 19:23 ` [bpf-next PATCH v2 08/18] bpf: sk_msg program helper bpf_sk_msg_pull_data John Fastabend
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 49+ messages in thread
From: John Fastabend @ 2018-03-12 19:23 UTC (permalink / raw)
  To: davem, ast, daniel, davejwatson; +Cc: netdev

In the case where we need a specific number of bytes before a
verdict can be assigned, even if the data spans multiple sendmsg
or sendfile calls, the BPF program may use msg_cork_bytes().

The extreme case is a user can call sendmsg repeatedly with
1-byte msg segments. Obviously, this is bad for performance but
is still valid. If the BPF program needs N bytes to validate
a header it can use msg_cork_bytes to specify N bytes and the
BPF program will not be called again until N bytes have been
accumulated. The infrastructure will attempt to coalesce data
if possible, so in many cases (most of my use cases at least)
the data will be in a single scatterlist element with data
pointers pointing to the start/end of the element. However, this
depends on available memory so it is not guaranteed. BPF programs
must therefore validate data pointer ranges, but that is required
anyway to convince the verifier the accesses are valid.
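
For illustration, a program that needs an 8-byte header before it
can reach a verdict might be structured as in the sketch below (the
8-byte size is an arbitrary example and bpf_msg_cork_bytes() is
assumed to be exposed via a bpf_helpers.h wrapper):

  SEC("sk_msg")
  int verdict_on_hdr(struct sk_msg_md *msg)
  {
          void *data = (void *)(long)msg->data;
          void *data_end = (void *)(long)msg->data_end;

          if (data + 8 > data_end) {
                  /* Not enough bytes yet: do not run again until
                   * at least 8 bytes have accumulated. If the
                   * corked data is not coalesced into one element
                   * msg_pull_data() can be used to linearize it.
                   */
                  bpf_msg_cork_bytes(msg, 8);
                  return SK_PASS;
          }

          /* 8 bytes are available and validated; parse the header */
          return SK_PASS;
  }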

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 include/uapi/linux/bpf.h |    3 ++-
 net/core/filter.c        |   16 ++++++++++++++++
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e50c61f..cfcc002 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -770,7 +770,8 @@ enum bpf_attach_type {
 	FN(override_return),		\
 	FN(sock_ops_cb_flags_set),	\
 	FN(msg_redirect_map),		\
-	FN(msg_apply_bytes),
+	FN(msg_apply_bytes),		\
+	FN(msg_cork_bytes),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/net/core/filter.c b/net/core/filter.c
index df2a8f4..2c73af0 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1942,6 +1942,20 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
 	.arg2_type      = ARG_ANYTHING,
 };
 
+BPF_CALL_2(bpf_msg_cork_bytes, struct sk_msg_buff *, msg, u64, bytes)
+{
+	msg->cork_bytes = bytes;
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_msg_cork_bytes_proto = {
+	.func           = bpf_msg_cork_bytes,
+	.gpl_only       = false,
+	.ret_type       = RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type      = ARG_ANYTHING,
+};
+
 BPF_CALL_1(bpf_get_cgroup_classid, const struct sk_buff *, skb)
 {
 	return task_get_classid(skb);
@@ -3650,6 +3664,8 @@ static const struct bpf_func_proto *sk_msg_func_proto(enum bpf_func_id func_id)
 		return &bpf_msg_redirect_map_proto;
 	case BPF_FUNC_msg_apply_bytes:
 		return &bpf_msg_apply_bytes_proto;
+	case BPF_FUNC_msg_cork_bytes:
+		return &bpf_msg_cork_bytes_proto;
 	default:
 		return bpf_base_func_proto(func_id);
 	}


* [bpf-next PATCH v2 08/18] bpf: sk_msg program helper bpf_sk_msg_pull_data
  2018-03-12 19:23 [bpf-next PATCH v2 00/18] bpf,sockmap: sendmsg/sendfile ULP John Fastabend
                   ` (6 preceding siblings ...)
  2018-03-12 19:23 ` [bpf-next PATCH v2 07/18] bpf: sockmap, add msg_cork_bytes() helper John Fastabend
@ 2018-03-12 19:23 ` John Fastabend
  2018-03-15 18:42   ` David Miller
  2018-03-15 20:25   ` Daniel Borkmann
  2018-03-12 19:23 ` [bpf-next PATCH v2 09/18] bpf: add map tests for BPF_PROG_TYPE_SK_MSG John Fastabend
                   ` (9 subsequent siblings)
  17 siblings, 2 replies; 49+ messages in thread
From: John Fastabend @ 2018-03-12 19:23 UTC (permalink / raw)
  To: davem, ast, daniel, davejwatson; +Cc: netdev

Currently, if a bpf sk msg program is run the program
can only parse data that the (start,end) pointers already
cover. For sendmsg hooks this is likely the first
scatterlist element. For sendpage this will be the range
(0,0) because the data is shared with userspace and by
default we want to avoid allowing userspace to modify
data while (or after) the BPF verdict is being decided.

To support pulling in additional bytes for parsing, use
a new helper bpf_msg_pull_data(start, end, flags) which
works similarly to the tc cls logic. This helper will
attempt to point the data start pointer at 'start' bytes
offset into the msg and the data end pointer at 'end'
bytes offset into the message.

After basic sanity checks to ensure 'start' <= 'end' and
'end' <= msg_length there are a few cases we need to
handle.

First, the sendmsg hook has already copied the data from
userspace and has exclusive access to it. Therefore, it
is not necessary to copy the data. However, a copy may
still be required. After finding the scatterlist element
containing the 'start' offset byte there are two cases.
In the first, the range (start, end) is entirely contained
in that sg element and is already linear. All that is
needed is to update the data pointers; no allocate/copy
is needed. In the other case, (start, end) crosses sg
element boundaries. Here we allocate a block of size
'end - start' and copy the data to linearize it.

Next, the sendpage hook has not copied any data in its
initial state, so the data pointers are (0,0). In this
case we handle it similarly to the sendmsg case above,
except the allocation/copy must always happen. Then when
sending the data we have possibly three memory regions
that need to be sent: (0, start - 1), (start, end), and
(end + 1, msg_length). This is required to ensure any
writes by the BPF program are correctly transmitted.

Lastly, this operation invalidates any previous data
checks, so BPF programs will have to revalidate the data
pointers after making this call.
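
For example, a program that always wants to inspect the first 10
bytes, regardless of hook, might do the following (a sketch only,
with minimal error handling):

  SEC("sk_msg")
  int read_hdr(struct sk_msg_md *msg)
  {
          void *data, *data_end;

          /* Request bytes [0, 10); this may allocate/copy and it
           * invalidates any prior data checks.
           */
          if (bpf_msg_pull_data(msg, 0, 10, 0))
                  return SK_DROP;

          /* Revalidate the pointers before touching the data */
          data = (void *)(long)msg->data;
          data_end = (void *)(long)msg->data_end;
          if (data + 10 > data_end)
                  return SK_DROP;

          return SK_PASS;
  }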

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 include/uapi/linux/bpf.h |    3 +
 net/core/filter.c        |  135 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 136 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index cfcc002..c9c4a0b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -771,7 +771,8 @@ enum bpf_attach_type {
 	FN(sock_ops_cb_flags_set),	\
 	FN(msg_redirect_map),		\
 	FN(msg_apply_bytes),		\
-	FN(msg_cork_bytes),
+	FN(msg_cork_bytes),		\
+	FN(msg_pull_data),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/net/core/filter.c b/net/core/filter.c
index 2c73af0..7b9e63e 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1956,6 +1956,136 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
 	.arg2_type      = ARG_ANYTHING,
 };
 
+BPF_CALL_4(bpf_msg_pull_data,
+	   struct sk_msg_buff *, msg, u32, start, u32, end, u64, flags)
+{
+	unsigned int len = 0, offset = 0, copy = 0;
+	struct scatterlist *sg = msg->sg_data;
+	int first_sg, last_sg, i, shift;
+	unsigned char *p, *to, *from;
+	int bytes = end - start;
+	struct page *page;
+
+	if (unlikely(end < start))
+		return -EINVAL;
+
+	/* First find the starting scatterlist element */
+	i = msg->sg_start;
+	do {
+		len = sg[i].length;
+		if (start < offset + len)
+			break;
+		offset += len;
+		i++;
+		if (i == MAX_SKB_FRAGS)
+			i = 0;
+	} while (i != msg->sg_end);
+
+	if (unlikely(start >= offset + len))
+		return -EINVAL;
+
+	if (!msg->sg_copy[i] && bytes <= len)
+		goto out;
+
+	first_sg = i;
+
+	/* At this point we need to linearize multiple scatterlist
+	 * elements or a single shared page. Either way we need to
+	 * copy into a linear buffer exclusively owned by BPF. Then
+	 * place the buffer in the scatterlist and fixup the original
+	 * entries by removing the entries now in the linear buffer
+	 * and shifting the remaining entries. For now we do not try
+	 * to copy partial entries to avoid complexity of running out
+	 * of sg_entry slots. The downside is reading a single byte
+	 * will copy the entire sg entry.
+	 */
+	do {
+		copy += sg[i].length;
+		i++;
+		if (i == MAX_SKB_FRAGS)
+			i = 0;
+		if (bytes < copy)
+			break;
+	} while (i != msg->sg_end);
+	last_sg = i;
+
+	if (unlikely(copy < end - start))
+		return -EINVAL;
+
+	page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC, get_order(copy));
+	if (unlikely(!page))
+		return -ENOMEM;
+	p = page_address(page);
+	offset = 0;
+
+	i = first_sg;
+	do {
+		from = sg_virt(&sg[i]);
+		len = sg[i].length;
+		to = p + offset;
+
+		memcpy(to, from, len);
+		offset += len;
+		sg[i].length = 0;
+		put_page(sg_page(&sg[i]));
+
+		i++;
+		if (i == MAX_SKB_FRAGS)
+			i = 0;
+	} while (i != last_sg);
+
+	sg[first_sg].length = copy;
+	sg_set_page(&sg[first_sg], page, copy, 0);
+
+	/* To repair sg ring we need to shift entries. If we only
+	 * had a single entry though we can just replace it and
+	 * be done. Otherwise walk the ring and shift the entries.
+	 */
+	shift = last_sg - first_sg - 1;
+	if (!shift)
+		goto out;
+
+	i = first_sg + 1;
+	do {
+		int move_from;
+
+		if (i + shift >= MAX_SKB_FRAGS)
+			move_from = i + shift - MAX_SKB_FRAGS;
+		else
+			move_from = i + shift;
+
+		if (move_from == msg->sg_end)
+			break;
+
+		sg[i] = sg[move_from];
+		sg[move_from].length = 0;
+		sg[move_from].page_link = 0;
+		sg[move_from].offset = 0;
+
+		i++;
+		if (i == MAX_SKB_FRAGS)
+			i = 0;
+	} while (1);
+	msg->sg_end -= shift;
+	if (msg->sg_end < 0)
+		msg->sg_end += MAX_SKB_FRAGS;
+out:
+	msg->data = sg_virt(&sg[i]) + start - offset;
+	msg->data_end = msg->data + bytes;
+
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_msg_pull_data_proto = {
+	.func		= bpf_msg_pull_data,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_ANYTHING,
+	.arg4_type	= ARG_ANYTHING,
+};
+
 BPF_CALL_1(bpf_get_cgroup_classid, const struct sk_buff *, skb)
 {
 	return task_get_classid(skb);
@@ -2897,7 +3025,8 @@ bool bpf_helper_changes_pkt_data(void *func)
 	    func == bpf_l3_csum_replace ||
 	    func == bpf_l4_csum_replace ||
 	    func == bpf_xdp_adjust_head ||
-	    func == bpf_xdp_adjust_meta)
+	    func == bpf_xdp_adjust_meta ||
+	    func == bpf_msg_pull_data)
 		return true;
 
 	return false;
@@ -3666,6 +3795,8 @@ static const struct bpf_func_proto *sk_msg_func_proto(enum bpf_func_id func_id)
 		return &bpf_msg_apply_bytes_proto;
 	case BPF_FUNC_msg_cork_bytes:
 		return &bpf_msg_cork_bytes_proto;
+	case BPF_FUNC_msg_pull_data:
+		return &bpf_msg_pull_data_proto;
 	default:
 		return bpf_base_func_proto(func_id);
 	}


* [bpf-next PATCH v2 09/18] bpf: add map tests for BPF_PROG_TYPE_SK_MSG
  2018-03-12 19:23 [bpf-next PATCH v2 00/18] bpf,sockmap: sendmsg/sendfile ULP John Fastabend
                   ` (7 preceding siblings ...)
  2018-03-12 19:23 ` [bpf-next PATCH v2 08/18] bpf: sk_msg program helper bpf_sk_msg_pull_data John Fastabend
@ 2018-03-12 19:23 ` John Fastabend
  2018-03-15 18:42   ` David Miller
  2018-03-12 19:23 ` [bpf-next PATCH v2 10/18] bpf: add verifier " John Fastabend
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 49+ messages in thread
From: John Fastabend @ 2018-03-12 19:23 UTC (permalink / raw)
  To: davem, ast, daniel, davejwatson; +Cc: netdev

Add map tests that attach BPF_PROG_TYPE_SK_MSG programs to a sockmap.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 tools/include/uapi/linux/bpf.h                     |   16 ++++++
 tools/testing/selftests/bpf/Makefile               |    2 -
 tools/testing/selftests/bpf/bpf_helpers.h          |    2 +
 tools/testing/selftests/bpf/sockmap_parse_prog.c   |   15 +++++
 tools/testing/selftests/bpf/sockmap_verdict_prog.c |    7 +++
 tools/testing/selftests/bpf/test_maps.c            |   55 +++++++++++++++++++-
 6 files changed, 90 insertions(+), 7 deletions(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index db6bdc3..eb483b5 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -133,6 +133,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_SOCK_OPS,
 	BPF_PROG_TYPE_SK_SKB,
 	BPF_PROG_TYPE_CGROUP_DEVICE,
+	BPF_PROG_TYPE_SK_MSG,
 };
 
 enum bpf_attach_type {
@@ -143,6 +144,7 @@ enum bpf_attach_type {
 	BPF_SK_SKB_STREAM_PARSER,
 	BPF_SK_SKB_STREAM_VERDICT,
 	BPF_CGROUP_DEVICE,
+	BPF_SK_MSG_VERDICT,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -919,6 +921,20 @@ enum sk_action {
 	SK_PASS,
 };
 
+/* User return codes for SK_MSG prog type. */
+enum sk_msg_action {
+	SK_MSG_DROP = 0,
+	SK_MSG_PASS,
+};
+
+/* user accessible metadata for SK_MSG packet hook, new fields must
+ * be added to the end of this structure
+ */
+struct sk_msg_md {
+	__u32 data;
+	__u32 data_end;
+};
+
 #define BPF_TAG_SIZE	8
 
 struct bpf_prog_info {
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 8567a858..b6618d6 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -21,7 +21,7 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test
 	test_pkt_md_access.o test_xdp_redirect.o test_xdp_meta.o sockmap_parse_prog.o     \
 	sockmap_verdict_prog.o dev_cgroup.o sample_ret0.o test_tracepoint.o \
 	test_l4lb_noinline.o test_xdp_noinline.o test_stacktrace_map.o \
-	sample_map_ret0.o test_tcpbpf_kern.o
+	sample_map_ret0.o test_tcpbpf_kern.o sockmap_tcp_msg_prog.o
 
 # Order correspond to 'make run_tests' order
 TEST_PROGS := test_kmod.sh \
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index dde2c11..1558fe8 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -123,6 +123,8 @@ static int (*bpf_skb_under_cgroup)(void *ctx, void *map, int index) =
 	(void *) BPF_FUNC_skb_under_cgroup;
 static int (*bpf_skb_change_head)(void *, int len, int flags) =
 	(void *) BPF_FUNC_skb_change_head;
+static int (*bpf_skb_pull_data)(void *, int len) =
+	(void *) BPF_FUNC_skb_pull_data;
 
 /* Scan the ARCH passed in from ARCH env variable (see Makefile) */
 #if defined(__TARGET_ARCH_x86)
diff --git a/tools/testing/selftests/bpf/sockmap_parse_prog.c b/tools/testing/selftests/bpf/sockmap_parse_prog.c
index a1dec2b..0f92858 100644
--- a/tools/testing/selftests/bpf/sockmap_parse_prog.c
+++ b/tools/testing/selftests/bpf/sockmap_parse_prog.c
@@ -20,14 +20,25 @@ int bpf_prog1(struct __sk_buff *skb)
 	__u32 lport = skb->local_port;
 	__u32 rport = skb->remote_port;
 	__u8 *d = data;
+	__u32 len = (__u32) data_end - (__u32) data;
+	int err;
 
-	if (data + 10 > data_end)
-		return skb->len;
+	if (data + 10 > data_end) {
+		err = bpf_skb_pull_data(skb, 10);
+		if (err)
+			return SK_DROP;
+
+		data_end = (void *)(long)skb->data_end;
+		data = (void *)(long)skb->data;
+		if (data + 10 > data_end)
+			return SK_DROP;
+	}
 
 	/* This write/read is a bit pointless but tests the verifier and
 	 * strparser handler for read/write pkt data and access into sk
 	 * fields.
 	 */
+	d = data;
 	d[7] = 1;
 	return skb->len;
 }
diff --git a/tools/testing/selftests/bpf/sockmap_verdict_prog.c b/tools/testing/selftests/bpf/sockmap_verdict_prog.c
index d7bea97..2ce7634 100644
--- a/tools/testing/selftests/bpf/sockmap_verdict_prog.c
+++ b/tools/testing/selftests/bpf/sockmap_verdict_prog.c
@@ -26,6 +26,13 @@ struct bpf_map_def SEC("maps") sock_map_tx = {
 	.max_entries = 20,
 };
 
+struct bpf_map_def SEC("maps") sock_map_msg = {
+	.type = BPF_MAP_TYPE_SOCKMAP,
+	.key_size = sizeof(int),
+	.value_size = sizeof(int),
+	.max_entries = 20,
+};
+
 struct bpf_map_def SEC("maps") sock_map_break = {
 	.type = BPF_MAP_TYPE_ARRAY,
 	.key_size = sizeof(int),
diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
index 1238733..6c25334 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -464,15 +464,17 @@ static void test_devmap(int task, void *data)
 #include <linux/err.h>
 #define SOCKMAP_PARSE_PROG "./sockmap_parse_prog.o"
 #define SOCKMAP_VERDICT_PROG "./sockmap_verdict_prog.o"
+#define SOCKMAP_TCP_MSG_PROG "./sockmap_tcp_msg_prog.o"
 static void test_sockmap(int tasks, void *data)
 {
-	int one = 1, map_fd_rx = 0, map_fd_tx = 0, map_fd_break, s, sc, rc;
-	struct bpf_map *bpf_map_rx, *bpf_map_tx, *bpf_map_break;
+	struct bpf_map *bpf_map_rx, *bpf_map_tx, *bpf_map_msg, *bpf_map_break;
+	int map_fd_msg = 0, map_fd_rx = 0, map_fd_tx = 0, map_fd_break;
 	int ports[] = {50200, 50201, 50202, 50204};
 	int err, i, fd, udp, sfd[6] = {0xdeadbeef};
 	u8 buf[20] = {0x0, 0x5, 0x3, 0x2, 0x1, 0x0};
-	int parse_prog, verdict_prog;
+	int parse_prog, verdict_prog, msg_prog;
 	struct sockaddr_in addr;
+	int one = 1, s, sc, rc;
 	struct bpf_object *obj;
 	struct timeval to;
 	__u32 key, value;
@@ -584,6 +586,12 @@ static void test_sockmap(int tasks, void *data)
 		goto out_sockmap;
 	}
 
+	err = bpf_prog_attach(-1, fd, BPF_SK_MSG_VERDICT, 0);
+	if (!err) {
+		printf("Failed invalid msg verdict prog attach\n");
+		goto out_sockmap;
+	}
+
 	err = bpf_prog_attach(-1, fd, __MAX_BPF_ATTACH_TYPE, 0);
 	if (!err) {
 		printf("Failed unknown prog attach\n");
@@ -602,6 +610,12 @@ static void test_sockmap(int tasks, void *data)
 		goto out_sockmap;
 	}
 
+	err = bpf_prog_detach(fd, BPF_SK_MSG_VERDICT);
+	if (err) {
+		printf("Failed empty msg verdict prog detach\n");
+		goto out_sockmap;
+	}
+
 	err = bpf_prog_detach(fd, __MAX_BPF_ATTACH_TYPE);
 	if (!err) {
 		printf("Detach invalid prog successful\n");
@@ -616,6 +630,13 @@ static void test_sockmap(int tasks, void *data)
 		goto out_sockmap;
 	}
 
+	err = bpf_prog_load(SOCKMAP_TCP_MSG_PROG,
+			    BPF_PROG_TYPE_SK_MSG, &obj, &msg_prog);
+	if (err) {
+		printf("Failed to load SK_MSG prog\n");
+		goto out_sockmap;
+	}
+
 	err = bpf_prog_load(SOCKMAP_VERDICT_PROG,
 			    BPF_PROG_TYPE_SK_SKB, &obj, &verdict_prog);
 	if (err) {
@@ -631,7 +652,7 @@ static void test_sockmap(int tasks, void *data)
 
 	map_fd_rx = bpf_map__fd(bpf_map_rx);
 	if (map_fd_rx < 0) {
-		printf("Failed to get map fd\n");
+		printf("Failed to get map rx fd\n");
 		goto out_sockmap;
 	}
 
@@ -647,6 +668,18 @@ static void test_sockmap(int tasks, void *data)
 		goto out_sockmap;
 	}
 
+	bpf_map_msg = bpf_object__find_map_by_name(obj, "sock_map_msg");
+	if (IS_ERR(bpf_map_msg)) {
+		printf("Failed to load map msg from msg_verdict prog\n");
+		goto out_sockmap;
+	}
+
+	map_fd_msg = bpf_map__fd(bpf_map_msg);
+	if (map_fd_msg < 0) {
+		printf("Failed to get map msg fd\n");
+		goto out_sockmap;
+	}
+
 	bpf_map_break = bpf_object__find_map_by_name(obj, "sock_map_break");
 	if (IS_ERR(bpf_map_break)) {
 		printf("Failed to load map tx from verdict prog\n");
@@ -680,6 +713,12 @@ static void test_sockmap(int tasks, void *data)
 		goto out_sockmap;
 	}
 
+	err = bpf_prog_attach(msg_prog, map_fd_msg, BPF_SK_MSG_VERDICT, 0);
+	if (err) {
+		printf("Failed msg verdict bpf prog attach\n");
+		goto out_sockmap;
+	}
+
 	err = bpf_prog_attach(verdict_prog, map_fd_rx,
 			      __MAX_BPF_ATTACH_TYPE, 0);
 	if (!err) {
@@ -719,6 +758,14 @@ static void test_sockmap(int tasks, void *data)
 		}
 	}
 
+	/* Put sfd[2] (sending fd below) into msg map to test sendmsg bpf */
+	i = 0;
+	err = bpf_map_update_elem(map_fd_msg, &i, &sfd[2], BPF_ANY);
+	if (err) {
+		printf("Failed map_fd_msg update sockmap %i\n", err);
+		goto out_sockmap;
+	}
+
 	/* Test map send/recv */
 	for (i = 0; i < 2; i++) {
 		buf[0] = i;


* [bpf-next PATCH v2 10/18] bpf: add verifier tests for BPF_PROG_TYPE_SK_MSG
  2018-03-12 19:23 [bpf-next PATCH v2 00/18] bpf,sockmap: sendmsg/sendfile ULP John Fastabend
                   ` (8 preceding siblings ...)
  2018-03-12 19:23 ` [bpf-next PATCH v2 09/18] bpf: add map tests for BPF_PROG_TYPE_SK_MSG John Fastabend
@ 2018-03-12 19:23 ` John Fastabend
  2018-03-15 18:42   ` David Miller
  2018-03-12 19:24 ` [bpf-next PATCH v2 11/18] bpf: sockmap sample, add option to attach SK_MSG program John Fastabend
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 49+ messages in thread
From: John Fastabend @ 2018-03-12 19:23 UTC (permalink / raw)
  To: davem, ast, daniel, davejwatson; +Cc: netdev

Test read and writes for BPF_PROG_TYPE_SK_MSG.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 tools/testing/selftests/bpf/test_verifier.c |   54 +++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 9eb05f3..db49528 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -1597,6 +1597,60 @@ struct test_val {
 		.prog_type = BPF_PROG_TYPE_SK_SKB,
 	},
 	{
+		"direct packet read for SK_MSG",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+				    offsetof(struct sk_msg_md, data)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
+				    offsetof(struct sk_msg_md, data_end)),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 8),
+			BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_3, 1),
+			BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_2, 0),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_SK_MSG,
+	},
+	{
+		"direct packet write for SK_MSG",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+				    offsetof(struct sk_msg_md, data)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
+				    offsetof(struct sk_msg_md, data_end)),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 8),
+			BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_3, 1),
+			BPF_STX_MEM(BPF_B, BPF_REG_2, BPF_REG_2, 0),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_SK_MSG,
+	},
+	{
+		"overlapping checks for direct packet access SK_MSG",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+				    offsetof(struct sk_msg_md, data)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
+				    offsetof(struct sk_msg_md, data_end)),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 8),
+			BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_3, 4),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_2),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 6),
+			BPF_JMP_REG(BPF_JGT, BPF_REG_1, BPF_REG_3, 1),
+			BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_2, 6),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_SK_MSG,
+	},
+	{
 		"check skb->mark is not writeable by sockets",
 		.insns = {
 			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_1,


* [bpf-next PATCH v2 11/18] bpf: sockmap sample, add option to attach SK_MSG program
  2018-03-12 19:23 [bpf-next PATCH v2 00/18] bpf,sockmap: sendmsg/sendfile ULP John Fastabend
                   ` (9 preceding siblings ...)
  2018-03-12 19:23 ` [bpf-next PATCH v2 10/18] bpf: add verifier " John Fastabend
@ 2018-03-12 19:24 ` John Fastabend
  2018-03-15 18:42   ` David Miller
  2018-03-12 19:24 ` [bpf-next PATCH v2 12/18] bpf: sockmap sample, add sendfile test John Fastabend
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 49+ messages in thread
From: John Fastabend @ 2018-03-12 19:24 UTC (permalink / raw)
  To: davem, ast, daniel, davejwatson; +Cc: netdev

Add sockmap option to use SK_MSG program types.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 samples/bpf/bpf_load.c                    |    8 +++
 samples/sockmap/sockmap_kern.c            |   52 +++++++++++++++++++++++
 samples/sockmap/sockmap_user.c            |   67 ++++++++++++++++++++++++++---
 tools/include/uapi/linux/bpf.h            |   13 +++++-
 tools/lib/bpf/libbpf.c                    |    1 
 tools/testing/selftests/bpf/bpf_helpers.h |    3 +
 6 files changed, 135 insertions(+), 9 deletions(-)

diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index 69806d7..b1a310c 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -67,6 +67,7 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 	bool is_cgroup_sk = strncmp(event, "cgroup/sock", 11) == 0;
 	bool is_sockops = strncmp(event, "sockops", 7) == 0;
 	bool is_sk_skb = strncmp(event, "sk_skb", 6) == 0;
+	bool is_sk_msg = strncmp(event, "sk_msg", 6) == 0;
 	size_t insns_cnt = size / sizeof(struct bpf_insn);
 	enum bpf_prog_type prog_type;
 	char buf[256];
@@ -96,6 +97,8 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 		prog_type = BPF_PROG_TYPE_SOCK_OPS;
 	} else if (is_sk_skb) {
 		prog_type = BPF_PROG_TYPE_SK_SKB;
+	} else if (is_sk_msg) {
+		prog_type = BPF_PROG_TYPE_SK_MSG;
 	} else {
 		printf("Unknown event '%s'\n", event);
 		return -1;
@@ -113,7 +116,7 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 	if (is_xdp || is_perf_event || is_cgroup_skb || is_cgroup_sk)
 		return 0;
 
-	if (is_socket || is_sockops || is_sk_skb) {
+	if (is_socket || is_sockops || is_sk_skb || is_sk_msg) {
 		if (is_socket)
 			event += 6;
 		else
@@ -589,7 +592,8 @@ static int do_load_bpf_file(const char *path, fixup_map_cb fixup_map)
 		    memcmp(shname, "socket", 6) == 0 ||
 		    memcmp(shname, "cgroup/", 7) == 0 ||
 		    memcmp(shname, "sockops", 7) == 0 ||
-		    memcmp(shname, "sk_skb", 6) == 0) {
+		    memcmp(shname, "sk_skb", 6) == 0 ||
+		    memcmp(shname, "sk_msg", 6) == 0) {
 			ret = load_and_attach(shname, data->d_buf,
 					      data->d_size);
 			if (ret != 0)
diff --git a/samples/sockmap/sockmap_kern.c b/samples/sockmap/sockmap_kern.c
index 52b0053..75edb2f 100644
--- a/samples/sockmap/sockmap_kern.c
+++ b/samples/sockmap/sockmap_kern.c
@@ -43,6 +43,20 @@ struct bpf_map_def SEC("maps") sock_map = {
 	.max_entries = 20,
 };
 
+struct bpf_map_def SEC("maps") sock_map_txmsg = {
+	.type = BPF_MAP_TYPE_SOCKMAP,
+	.key_size = sizeof(int),
+	.value_size = sizeof(int),
+	.max_entries = 20,
+};
+
+struct bpf_map_def SEC("maps") sock_map_redir = {
+	.type = BPF_MAP_TYPE_SOCKMAP,
+	.key_size = sizeof(int),
+	.value_size = sizeof(int),
+	.max_entries = 1,
+};
+
 SEC("sk_skb1")
 int bpf_prog1(struct __sk_buff *skb)
 {
@@ -105,4 +119,42 @@ int bpf_sockmap(struct bpf_sock_ops *skops)
 
 	return 0;
 }
+
+SEC("sk_msg1")
+int bpf_prog4(struct sk_msg_md *msg)
+{
+	return SK_PASS;
+}
+
+SEC("sk_msg2")
+int bpf_prog5(struct sk_msg_md *msg)
+{
+	void *data_end = (void *)(long) msg->data_end;
+	void *data = (void *)(long) msg->data;
+
+	bpf_printk("sk_msg2: data length %i\n", (__u32)data_end - (__u32)data);
+	return SK_PASS;
+}
+
+SEC("sk_msg3")
+int bpf_prog6(struct sk_msg_md *msg)
+{
+	void *data_end = (void *)(long) msg->data_end;
+	void *data = (void *)(long) msg->data;
+	int ret = 0;
+
+	return bpf_msg_redirect_map(msg, &sock_map_redir, ret, 0);
+}
+
+SEC("sk_msg4")
+int bpf_prog7(struct sk_msg_md *msg)
+{
+	void *data_end = (void *)(long) msg->data_end;
+	void *data = (void *)(long) msg->data;
+	int ret = 0;
+
+	bpf_printk("sk_msg4: redirect(%iB)\n", (__u32)data_end - (__u32)data);
+	return bpf_msg_redirect_map(msg, &sock_map_redir, ret, 0);
+}
+
 char _license[] SEC("license") = "GPL";
diff --git a/samples/sockmap/sockmap_user.c b/samples/sockmap/sockmap_user.c
index 95a54a8..bbfe3a2 100644
--- a/samples/sockmap/sockmap_user.c
+++ b/samples/sockmap/sockmap_user.c
@@ -54,6 +54,11 @@
 /* global sockets */
 int s1, s2, c1, c2, p1, p2;
 
+int txmsg_pass;
+int txmsg_noisy;
+int txmsg_redir;
+int txmsg_redir_noisy;
+
 static const struct option long_options[] = {
 	{"help",	no_argument,		NULL, 'h' },
 	{"cgroup",	required_argument,	NULL, 'c' },
@@ -62,6 +67,10 @@
 	{"iov_count",	required_argument,	NULL, 'i' },
 	{"length",	required_argument,	NULL, 'l' },
 	{"test",	required_argument,	NULL, 't' },
+	{"txmsg",		no_argument,		&txmsg_pass,  1  },
+	{"txmsg_noisy",		no_argument,		&txmsg_noisy, 1  },
+	{"txmsg_redir",		no_argument,		&txmsg_redir, 1  },
+	{"txmsg_redir_noisy",	no_argument,		&txmsg_redir_noisy, 1},
 	{0, 0, NULL, 0 }
 };
 
@@ -447,13 +456,13 @@ enum {
 
 int main(int argc, char **argv)
 {
-	int iov_count = 1, length = 1024, rate = 1, verbose = 0;
+	int iov_count = 1, length = 1024, rate = 1, verbose = 0, tx_prog_fd;
 	struct rlimit r = {10 * 1024 * 1024, RLIM_INFINITY};
 	int opt, longindex, err, cg_fd = 0;
 	int test = PING_PONG;
 	char filename[256];
 
-	while ((opt = getopt_long(argc, argv, "hvc:r:i:l:t:",
+	while ((opt = getopt_long(argc, argv, ":hvc:r:i:l:t:",
 				  long_options, &longindex)) != -1) {
 		switch (opt) {
 		/* Cgroup configuration */
@@ -490,6 +499,8 @@ int main(int argc, char **argv)
 				return -1;
 			}
 			break;
+		case 0:
+			break;
 		case 'h':
 		default:
 			usage(argv);
@@ -515,16 +526,16 @@ int main(int argc, char **argv)
 	/* catch SIGINT */
 	signal(SIGINT, running_handler);
 
-	/* If base test skip BPF setup */
-	if (test == BASE)
-		goto run;
-
 	if (load_bpf_file(filename)) {
 		fprintf(stderr, "load_bpf_file: (%s) %s\n",
 			filename, strerror(errno));
 		return 1;
 	}
 
+	/* If base test skip BPF setup */
+	if (test == BASE)
+		goto run;
+
 	/* Attach programs to sockmap */
 	err = bpf_prog_attach(prog_fd[0], map_fd[0],
 				BPF_SK_SKB_STREAM_PARSER, 0);
@@ -557,6 +568,50 @@ int main(int argc, char **argv)
 		goto out;
 	}
 
+	/* Attach txmsg program to sockmap */
+	if (txmsg_pass)
+		tx_prog_fd = prog_fd[3];
+	else if (txmsg_noisy)
+		tx_prog_fd = prog_fd[4];
+	else if (txmsg_redir)
+		tx_prog_fd = prog_fd[5];
+	else if (txmsg_redir_noisy)
+		tx_prog_fd = prog_fd[6];
+	else
+		tx_prog_fd = 0;
+
+	if (tx_prog_fd) {
+		int redir_fd, i = 0;
+
+		err = bpf_prog_attach(tx_prog_fd,
+				      map_fd[1], BPF_SK_MSG_VERDICT, 0);
+		if (err) {
+			fprintf(stderr,
+				"ERROR: bpf_prog_attach (txmsg): %d (%s)\n",
+				err, strerror(errno));
+			return err;
+		}
+
+		err = bpf_map_update_elem(map_fd[1], &i, &c1, BPF_ANY);
+		if (err) {
+			fprintf(stderr,
+				"ERROR: bpf_map_update_elem (txmsg): %d (%s)\n",
+				err, strerror(errno));
+			return err;
+		}
+		if (test == SENDMSG)
+			redir_fd = c2;
+		else
+			redir_fd = c1;
+
+		err = bpf_map_update_elem(map_fd[2], &i, &redir_fd, BPF_ANY);
+		if (err) {
+			fprintf(stderr,
+				"ERROR: bpf_map_update_elem (txmsg): %d (%s)\n",
+				err, strerror(errno));
+			return err;
+		}
+	}
 	if (test == PING_PONG)
 		err = forever_ping_pong(rate, verbose);
 	else if (test == SENDMSG)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index eb483b5..609456f 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -698,6 +698,15 @@ enum bpf_attach_type {
  * int bpf_override_return(pt_regs, rc)
  *	@pt_regs: pointer to struct pt_regs
  *	@rc: the return value to set
+ *
+ * int bpf_msg_redirect_map(map, key, flags)
+ *     Redirect msg to a sock in map using key as a lookup key for the
+ *     sock in map.
+ *     @map: pointer to sockmap
+ *     @key: key to lookup sock in map
+ *     @flags: reserved for future use
+ *     Return: SK_PASS
+ *
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -759,7 +768,9 @@ enum bpf_attach_type {
 	FN(perf_prog_read_value),	\
 	FN(getsockopt),			\
 	FN(override_return),		\
-	FN(sock_ops_cb_flags_set),
+	FN(sock_ops_cb_flags_set),	\
+	FN(msg_redirect_map),		\
+	FN(msg_apply_bytes),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 5bbbf28..64a8fc3 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1857,6 +1857,7 @@ static bool bpf_program__is_type(struct bpf_program *prog,
 	BPF_PROG_SEC("lwt_xmit",	BPF_PROG_TYPE_LWT_XMIT),
 	BPF_PROG_SEC("sockops",		BPF_PROG_TYPE_SOCK_OPS),
 	BPF_PROG_SEC("sk_skb",		BPF_PROG_TYPE_SK_SKB),
+	BPF_PROG_SEC("sk_msg",		BPF_PROG_TYPE_SK_MSG),
 };
 #undef BPF_PROG_SEC
 
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 1558fe8..bba7ee6 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -86,6 +86,9 @@ static int (*bpf_perf_prog_read_value)(void *ctx, void *buf,
 	(void *) BPF_FUNC_perf_prog_read_value;
 static int (*bpf_override_return)(void *ctx, unsigned long rc) =
 	(void *) BPF_FUNC_override_return;
+static int (*bpf_msg_redirect_map)(void *ctx, void *map, int key, int flags) =
+	(void *) BPF_FUNC_msg_redirect_map;
+
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions


* [bpf-next PATCH v2 12/18] bpf: sockmap sample, add sendfile test
  2018-03-12 19:23 [bpf-next PATCH v2 00/18] bpf,sockmap: sendmsg/sendfile ULP John Fastabend
                   ` (10 preceding siblings ...)
  2018-03-12 19:24 ` [bpf-next PATCH v2 11/18] bpf: sockmap sample, add option to attach SK_MSG program John Fastabend
@ 2018-03-12 19:24 ` John Fastabend
  2018-03-15 18:42   ` David Miller
  2018-03-12 19:24 ` [bpf-next PATCH v2 13/18] bpf: sockmap sample, add data verification option John Fastabend
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 49+ messages in thread
From: John Fastabend @ 2018-03-12 19:24 UTC (permalink / raw)
  To: davem, ast, daniel, davejwatson; +Cc: netdev

To exercise the TX ULP sendpage implementation we need a test that
does a sendfile. Add a sendfile test option here.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 samples/sockmap/sockmap_user.c |   70 ++++++++++++++++++++++++++++++++++------
 1 file changed, 60 insertions(+), 10 deletions(-)

diff --git a/samples/sockmap/sockmap_user.c b/samples/sockmap/sockmap_user.c
index bbfe3a2..ec624a8 100644
--- a/samples/sockmap/sockmap_user.c
+++ b/samples/sockmap/sockmap_user.c
@@ -29,6 +29,7 @@
 #include <sys/time.h>
 #include <sys/resource.h>
 #include <sys/types.h>
+#include <sys/sendfile.h>
 
 #include <linux/netlink.h>
 #include <linux/socket.h>
@@ -67,10 +68,10 @@
 	{"iov_count",	required_argument,	NULL, 'i' },
 	{"length",	required_argument,	NULL, 'l' },
 	{"test",	required_argument,	NULL, 't' },
-	{"txmsg",		no_argument,		&txmsg_pass,  1  },
-	{"txmsg_noisy",		no_argument,		&txmsg_noisy, 1  },
-	{"txmsg_redir",		no_argument,		&txmsg_redir, 1  },
-	{"txmsg_redir_noisy",	no_argument,		&txmsg_redir_noisy, 1},
+	{"txmsg",		no_argument,	&txmsg_pass,  1  },
+	{"txmsg_noisy",		no_argument,	&txmsg_noisy, 1  },
+	{"txmsg_redir",		no_argument,	&txmsg_redir, 1  },
+	{"txmsg_redir_noisy",	no_argument,	&txmsg_redir_noisy, 1},
 	{0, 0, NULL, 0 }
 };
 
@@ -204,6 +205,35 @@ struct msg_stats {
 	struct timespec end;
 };
 
+static int msg_loop_sendpage(int fd, int iov_length, int cnt,
+			     struct msg_stats *s)
+{
+	off_t offset = 0;
+	FILE *file;
+	int i, fp;
+
+	file = fopen(".sendpage_tst.tmp", "w+");
+	fseek(file, iov_length * cnt, SEEK_CUR);
+	fprintf(file, "A");
+	fseek(file, 0, SEEK_SET);
+
+	fp = fileno(file);
+	clock_gettime(CLOCK_MONOTONIC, &s->start);
+	for (i = 0; i < cnt; i++) {
+		int sent = sendfile(fd, fp, &offset, iov_length);
+
+		if (sent < 0) {
+			perror("send loop error:");
+			fclose(file);
+			return sent;
+		}
+		s->bytes_sent += sent;
+	}
+	clock_gettime(CLOCK_MONOTONIC, &s->end);
+	fclose(file);
+	return 0;
+}
+
 static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
 		    struct msg_stats *s, bool tx)
 {
@@ -309,7 +339,7 @@ static inline float recvdBps(struct msg_stats s)
 }
 
 static int sendmsg_test(int iov_count, int iov_buf, int cnt,
-			int verbose, bool base)
+			int verbose, bool base, bool sendpage)
 {
 	float sent_Bps = 0, recvd_Bps = 0;
 	int rx_fd, txpid, rxpid, err = 0;
@@ -325,6 +355,8 @@ static int sendmsg_test(int iov_count, int iov_buf, int cnt,
 
 	rxpid = fork();
 	if (rxpid == 0) {
+		if (sendpage)
+			iov_count = 1;
 		err = msg_loop(rx_fd, iov_count, iov_buf, cnt, &s, false);
 		if (err)
 			fprintf(stderr,
@@ -348,7 +380,11 @@ static int sendmsg_test(int iov_count, int iov_buf, int cnt,
 
 	txpid = fork();
 	if (txpid == 0) {
-		err = msg_loop(c1, iov_count, iov_buf, cnt, &s, true);
+		if (sendpage)
+			err = msg_loop_sendpage(c1, iov_buf, cnt, &s);
+		else
+			err = msg_loop(c1, iov_count, iov_buf, cnt, &s, true);
+
 		if (err)
 			fprintf(stderr,
 				"msg_loop_tx: iov_count %i iov_buf %i cnt %i err %i\n",
@@ -452,6 +488,8 @@ enum {
 	PING_PONG,
 	SENDMSG,
 	BASE,
+	BASE_SENDPAGE,
+	SENDPAGE,
 };
 
 int main(int argc, char **argv)
@@ -494,6 +532,10 @@ int main(int argc, char **argv)
 				test = SENDMSG;
 			} else if (strcmp(optarg, "base") == 0) {
 				test = BASE;
+			} else if (strcmp(optarg, "base_sendpage") == 0) {
+				test = BASE_SENDPAGE;
+			} else if (strcmp(optarg, "sendpage") == 0) {
+				test = SENDPAGE;
 			} else {
 				usage(argv);
 				return -1;
@@ -533,7 +575,7 @@ int main(int argc, char **argv)
 	}
 
 	/* If base test skip BPF setup */
-	if (test == BASE)
+	if (test == BASE || test == BASE_SENDPAGE)
 		goto run;
 
 	/* Attach programs to sockmap */
@@ -599,7 +641,7 @@ int main(int argc, char **argv)
 				err, strerror(errno));
 			return err;
 		}
-		if (test == SENDMSG)
+		if (txmsg_redir || txmsg_redir_noisy)
 			redir_fd = c2;
 		else
 			redir_fd = c1;
@@ -615,9 +657,17 @@ int main(int argc, char **argv)
 	if (test == PING_PONG)
 		err = forever_ping_pong(rate, verbose);
 	else if (test == SENDMSG)
-		err = sendmsg_test(iov_count, length, rate, verbose, false);
+		err = sendmsg_test(iov_count, length, rate,
+				   verbose, false, false);
+	else if (test == SENDPAGE)
+		err = sendmsg_test(iov_count, length, rate,
+				   verbose, false, true);
 	else if (test == BASE)
-		err = sendmsg_test(iov_count, length, rate, verbose, true);
+		err = sendmsg_test(iov_count, length, rate,
+				   verbose, true, false);
+	else if (test == BASE_SENDPAGE)
+		err = sendmsg_test(iov_count, length, rate,
+				   verbose, true, true);
 	else
 		fprintf(stderr, "unknown test\n");
 out:


* [bpf-next PATCH v2 13/18] bpf: sockmap sample, add data verification option
  2018-03-12 19:23 [bpf-next PATCH v2 00/18] bpf,sockmap: sendmsg/sendfile ULP John Fastabend
                   ` (11 preceding siblings ...)
  2018-03-12 19:24 ` [bpf-next PATCH v2 12/18] bpf: sockmap sample, add sendfile test John Fastabend
@ 2018-03-12 19:24 ` John Fastabend
  2018-03-15 18:42   ` David Miller
  2018-03-12 19:24 ` [bpf-next PATCH v2 14/18] bpf: sockmap, add sample option to test apply_bytes helper John Fastabend
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 49+ messages in thread
From: John Fastabend @ 2018-03-12 19:24 UTC (permalink / raw)
  To: davem, ast, daniel, davejwatson; +Cc: netdev

To verify data is not being dropped or corrupted, this adds an option
to verify test patterns on recv.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 samples/sockmap/sockmap_user.c |  118 ++++++++++++++++++++++++++++------------
 1 file changed, 84 insertions(+), 34 deletions(-)

diff --git a/samples/sockmap/sockmap_user.c b/samples/sockmap/sockmap_user.c
index ec624a8..8017ad7a 100644
--- a/samples/sockmap/sockmap_user.c
+++ b/samples/sockmap/sockmap_user.c
@@ -68,6 +68,7 @@
 	{"iov_count",	required_argument,	NULL, 'i' },
 	{"length",	required_argument,	NULL, 'l' },
 	{"test",	required_argument,	NULL, 't' },
+	{"data_test",   no_argument,		NULL, 'd' },
 	{"txmsg",		no_argument,	&txmsg_pass,  1  },
 	{"txmsg_noisy",		no_argument,	&txmsg_noisy, 1  },
 	{"txmsg_redir",		no_argument,	&txmsg_redir, 1  },
@@ -208,45 +209,49 @@ struct msg_stats {
 static int msg_loop_sendpage(int fd, int iov_length, int cnt,
 			     struct msg_stats *s)
 {
-	off_t offset = 0;
+	unsigned char k = 0;
 	FILE *file;
 	int i, fp;
 
 	file = fopen(".sendpage_tst.tmp", "w+");
-	fseek(file, iov_length * cnt, SEEK_CUR);
-	fprintf(file, "A");
+	for (i = 0; i < iov_length * cnt; i++, k++)
+		fwrite(&k, sizeof(char), 1, file);
+	fflush(file);
 	fseek(file, 0, SEEK_SET);
+	fclose(file);
 
-	fp = fileno(file);
+	fp = open(".sendpage_tst.tmp", O_RDONLY);
 	clock_gettime(CLOCK_MONOTONIC, &s->start);
 	for (i = 0; i < cnt; i++) {
-		int sent = sendfile(fd, fp, &offset, iov_length);
+		int sent = sendfile(fd, fp, NULL, iov_length);
 
 		if (sent < 0) {
 			perror("send loop error:");
-			fclose(file);
+			close(fp);
 			return sent;
 		}
 		s->bytes_sent += sent;
 	}
 	clock_gettime(CLOCK_MONOTONIC, &s->end);
-	fclose(file);
+	close(fp);
 	return 0;
 }
 
 static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
-		    struct msg_stats *s, bool tx)
+		    struct msg_stats *s, bool tx, bool data_test)
 {
 	struct msghdr msg = {0};
 	int err, i, flags = MSG_NOSIGNAL;
 	struct iovec *iov;
+	unsigned char k;
 
 	iov = calloc(iov_count, sizeof(struct iovec));
 	if (!iov)
 		return errno;
 
+	k = 0;
 	for (i = 0; i < iov_count; i++) {
-		char *d = calloc(iov_length, sizeof(char));
+		unsigned char *d = calloc(iov_length, sizeof(char));
 
 		if (!d) {
 			fprintf(stderr, "iov_count %i/%i OOM\n", i, iov_count);
@@ -254,10 +259,18 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
 		}
 		iov[i].iov_base = d;
 		iov[i].iov_len = iov_length;
+
+		if (data_test && tx) {
+			int j;
+
+			for (j = 0; j < iov_length; j++)
+				d[j] = k++;
+		}
 	}
 
 	msg.msg_iov = iov;
 	msg.msg_iovlen = iov_count;
+	k = 0;
 
 	if (tx) {
 		clock_gettime(CLOCK_MONOTONIC, &s->start);
@@ -311,6 +324,26 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
 			}
 
 			s->bytes_recvd += recv;
+
+			if (data_test) {
+				int j;
+
+				for (i = 0; i < msg.msg_iovlen; i++) {
+					unsigned char *d = iov[i].iov_base;
+
+					for (j = 0;
+					     j < iov[i].iov_len && recv; j++) {
+						if (d[j] != k++) {
+							errno = -EIO;
+							fprintf(stderr,
+								"detected data corruption @iov[%i]:%i %02x != %02x, %02x ?= %02x\n",
+								i, j, d[j], k - 1, d[j+1], k + 1);
+							goto out_errno;
+						}
+						recv--;
+					}
+				}
+			}
 		}
 		clock_gettime(CLOCK_MONOTONIC, &s->end);
 	}
@@ -338,8 +371,15 @@ static inline float recvdBps(struct msg_stats s)
 	return s.bytes_recvd / (s.end.tv_sec - s.start.tv_sec);
 }
 
+struct sockmap_options {
+	int verbose;
+	bool base;
+	bool sendpage;
+	bool data_test;
+};
+
 static int sendmsg_test(int iov_count, int iov_buf, int cnt,
-			int verbose, bool base, bool sendpage)
+			struct sockmap_options *opt)
 {
 	float sent_Bps = 0, recvd_Bps = 0;
 	int rx_fd, txpid, rxpid, err = 0;
@@ -348,16 +388,17 @@ static int sendmsg_test(int iov_count, int iov_buf, int cnt,
 
 	errno = 0;
 
-	if (base)
+	if (opt->base)
 		rx_fd = p1;
 	else
 		rx_fd = p2;
 
 	rxpid = fork();
 	if (rxpid == 0) {
-		if (sendpage)
+		if (opt->sendpage)
 			iov_count = 1;
-		err = msg_loop(rx_fd, iov_count, iov_buf, cnt, &s, false);
+		err = msg_loop(rx_fd, iov_count, iov_buf,
+			       cnt, &s, false, opt->data_test);
 		if (err)
 			fprintf(stderr,
 				"msg_loop_rx: iov_count %i iov_buf %i cnt %i err %i\n",
@@ -380,10 +421,11 @@ static int sendmsg_test(int iov_count, int iov_buf, int cnt,
 
 	txpid = fork();
 	if (txpid == 0) {
-		if (sendpage)
+		if (opt->sendpage)
 			err = msg_loop_sendpage(c1, iov_buf, cnt, &s);
 		else
-			err = msg_loop(c1, iov_count, iov_buf, cnt, &s, true);
+			err = msg_loop(c1, iov_count, iov_buf,
+				       cnt, &s, true, opt->data_test);
 
 		if (err)
 			fprintf(stderr,
@@ -409,7 +451,7 @@ static int sendmsg_test(int iov_count, int iov_buf, int cnt,
 	return err;
 }
 
-static int forever_ping_pong(int rate, int verbose)
+static int forever_ping_pong(int rate, struct sockmap_options *opt)
 {
 	struct timeval timeout;
 	char buf[1024] = {0};
@@ -474,7 +516,7 @@ static int forever_ping_pong(int rate, int verbose)
 		if (rate)
 			sleep(rate);
 
-		if (verbose) {
+		if (opt->verbose) {
 			printf(".");
 			fflush(stdout);
 
@@ -494,13 +536,14 @@ enum {
 
 int main(int argc, char **argv)
 {
-	int iov_count = 1, length = 1024, rate = 1, verbose = 0, tx_prog_fd;
+	int iov_count = 1, length = 1024, rate = 1, tx_prog_fd;
 	struct rlimit r = {10 * 1024 * 1024, RLIM_INFINITY};
 	int opt, longindex, err, cg_fd = 0;
+	struct sockmap_options options = {0};
 	int test = PING_PONG;
 	char filename[256];
 
-	while ((opt = getopt_long(argc, argv, ":hvc:r:i:l:t:",
+	while ((opt = getopt_long(argc, argv, ":dhvc:r:i:l:t:",
 				  long_options, &longindex)) != -1) {
 		switch (opt) {
 		/* Cgroup configuration */
@@ -517,7 +560,7 @@ int main(int argc, char **argv)
 			rate = atoi(optarg);
 			break;
 		case 'v':
-			verbose = 1;
+			options.verbose = 1;
 			break;
 		case 'i':
 			iov_count = atoi(optarg);
@@ -525,6 +568,9 @@ int main(int argc, char **argv)
 		case 'l':
 			length = atoi(optarg);
 			break;
+		case 'd':
+			options.data_test = true;
+			break;
 		case 't':
 			if (strcmp(optarg, "ping") == 0) {
 				test = PING_PONG;
@@ -655,20 +701,24 @@ int main(int argc, char **argv)
 		}
 	}
 	if (test == PING_PONG)
-		err = forever_ping_pong(rate, verbose);
-	else if (test == SENDMSG)
-		err = sendmsg_test(iov_count, length, rate,
-				   verbose, false, false);
-	else if (test == SENDPAGE)
-		err = sendmsg_test(iov_count, length, rate,
-				   verbose, false, true);
-	else if (test == BASE)
-		err = sendmsg_test(iov_count, length, rate,
-				   verbose, true, false);
-	else if (test == BASE_SENDPAGE)
-		err = sendmsg_test(iov_count, length, rate,
-				   verbose, true, true);
-	else
+		err = forever_ping_pong(rate, &options);
+	else if (test == SENDMSG) {
+		options.base = false;
+		options.sendpage = false;
+		err = sendmsg_test(iov_count, length, rate, &options);
+	} else if (test == SENDPAGE) {
+		options.base = false;
+		options.sendpage = true;
+		err = sendmsg_test(iov_count, length, rate, &options);
+	} else if (test == BASE) {
+		options.base = true;
+		options.sendpage = false;
+		err = sendmsg_test(iov_count, length, rate, &options);
+	} else if (test == BASE_SENDPAGE) {
+		options.base = true;
+		options.sendpage = true;
+		err = sendmsg_test(iov_count, length, rate, &options);
+	} else
 		fprintf(stderr, "unknown test\n");
 out:
 	bpf_prog_detach2(prog_fd[2], cg_fd, BPF_CGROUP_SOCK_OPS);


* [bpf-next PATCH v2 14/18] bpf: sockmap, add sample option to test apply_bytes helper
  2018-03-12 19:23 [bpf-next PATCH v2 00/18] bpf,sockmap: sendmsg/sendfile ULP John Fastabend
                   ` (12 preceding siblings ...)
  2018-03-12 19:24 ` [bpf-next PATCH v2 13/18] bpf: sockmap sample, add data verification option John Fastabend
@ 2018-03-12 19:24 ` John Fastabend
  2018-03-15 18:42   ` David Miller
  2018-03-12 19:24 ` [bpf-next PATCH v2 15/18] bpf: sockmap sample support for bpf_msg_cork_bytes() John Fastabend
                   ` (3 subsequent siblings)
  17 siblings, 1 reply; 49+ messages in thread
From: John Fastabend @ 2018-03-12 19:24 UTC (permalink / raw)
  To: davem, ast, daniel, davejwatson; +Cc: netdev

This adds an option to test the apply_bytes helper. The option lets
the user pass an integer on the command line specifying how much data
each verdict should apply to.

When this is set, a map entry is written with the byte count given by
the user, and the specified program, --txmsg or --txmsg_redir, will
use the value to set the applied data. If no other option is set then
a default --txmsg_apply program is run. This program drops packets if
an error is detected on the bytes map lookup. This is useful to verify
that the map lookup and apply helper are working, causing a hard error
if they are not.
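
For reference, both halves of the flow in one minimal sketch (map_fd
and msg come from the surrounding sample code; option parsing and
error handling are elided):

  /* user side: write the requested byte count into the array map */
  int key = 0, apply = atoi(optarg);      /* --txmsg_apply N */

  bpf_map_update_elem(map_fd, &key, &apply, BPF_ANY);

  /* BPF side: bind the current verdict to that many bytes */
  int *bytes = bpf_map_lookup_elem(&sock_apply_bytes, &key);

  if (bytes)
          bpf_msg_apply_bytes(msg, *bytes);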

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 samples/sockmap/sockmap_kern.c            |   54 ++++++++++++++++++++++++++---
 samples/sockmap/sockmap_user.c            |   19 ++++++++++
 tools/testing/selftests/bpf/bpf_helpers.h |    3 +-
 3 files changed, 68 insertions(+), 8 deletions(-)

diff --git a/samples/sockmap/sockmap_kern.c b/samples/sockmap/sockmap_kern.c
index 75edb2f..5a51f15 100644
--- a/samples/sockmap/sockmap_kern.c
+++ b/samples/sockmap/sockmap_kern.c
@@ -57,6 +57,13 @@ struct bpf_map_def SEC("maps") sock_map_redir = {
 	.max_entries = 1,
 };
 
+struct bpf_map_def SEC("maps") sock_apply_bytes = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(int),
+	.value_size = sizeof(int),
+	.max_entries = 1
+};
+
 SEC("sk_skb1")
 int bpf_prog1(struct __sk_buff *skb)
 {
@@ -123,6 +130,11 @@ int bpf_sockmap(struct bpf_sock_ops *skops)
 SEC("sk_msg1")
 int bpf_prog4(struct sk_msg_md *msg)
 {
+	int *bytes, zero = 0;
+
+	bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
+	if (bytes)
+		bpf_msg_apply_bytes(msg, *bytes);
 	return SK_PASS;
 }
 
@@ -131,8 +143,13 @@ int bpf_prog5(struct sk_msg_md *msg)
 {
 	void *data_end = (void *)(long) msg->data_end;
 	void *data = (void *)(long) msg->data;
+	int *bytes, err = 0, zero = 0;
 
-	bpf_printk("sk_msg2: data length %i\n", (__u32)data_end - (__u32)data);
+	bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
+	if (bytes)
+		err = bpf_msg_apply_bytes(msg, *bytes);
+	bpf_printk("sk_msg2: data length %i err %i\n",
+		   (__u32)data_end - (__u32)data, err);
 	return SK_PASS;
 }
 
@@ -141,9 +158,12 @@ int bpf_prog6(struct sk_msg_md *msg)
 {
 	void *data_end = (void *)(long) msg->data_end;
 	void *data = (void *)(long) msg->data;
-	int ret = 0;
+	int *bytes, zero = 0;
 
-	return bpf_msg_redirect_map(msg, &sock_map_redir, ret, 0);
+	bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
+	if (bytes)
+		bpf_msg_apply_bytes(msg, *bytes);
+	return bpf_msg_redirect_map(msg, &sock_map_redir, zero, 0);
 }
 
 SEC("sk_msg4")
@@ -151,10 +171,32 @@ int bpf_prog7(struct sk_msg_md *msg)
 {
 	void *data_end = (void *)(long) msg->data_end;
 	void *data = (void *)(long) msg->data;
-	int ret = 0;
+	int *bytes, err = 0, zero = 0;
+
+	bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
+	if (bytes)
+		err = bpf_msg_apply_bytes(msg, *bytes);
+	bpf_printk("sk_msg3: redirect(%iB) err=%i\n",
+		   (__u32)data_end - (__u32)data, err);
+	return bpf_msg_redirect_map(msg, &sock_map_redir, zero, 0);
+}
 
-	bpf_printk("sk_msg3: redirect(%iB)\n", (__u32)data_end - (__u32)data);
-	return bpf_msg_redirect_map(msg, &sock_map_redir, ret, 0);
+SEC("sk_msg5")
+int bpf_prog8(struct sk_msg_md *msg)
+{
+	void *data_end = (void *)(long) msg->data_end;
+	void *data = (void *)(long) msg->data;
+	int ret = 0, *bytes, zero = 0;
+
+	bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
+	if (bytes) {
+		ret = bpf_msg_apply_bytes(msg, *bytes);
+		if (ret)
+			return SK_DROP;
+	} else {
+		return SK_DROP;
+	}
+	return SK_PASS;
 }
 
 char _license[] SEC("license") = "GPL";
diff --git a/samples/sockmap/sockmap_user.c b/samples/sockmap/sockmap_user.c
index 8017ad7a..41774ec 100644
--- a/samples/sockmap/sockmap_user.c
+++ b/samples/sockmap/sockmap_user.c
@@ -59,6 +59,7 @@
 int txmsg_noisy;
 int txmsg_redir;
 int txmsg_redir_noisy;
+int txmsg_apply;
 
 static const struct option long_options[] = {
 	{"help",	no_argument,		NULL, 'h' },
@@ -73,6 +74,7 @@
 	{"txmsg_noisy",		no_argument,	&txmsg_noisy, 1  },
 	{"txmsg_redir",		no_argument,	&txmsg_redir, 1  },
 	{"txmsg_redir_noisy",	no_argument,	&txmsg_redir_noisy, 1},
+	{"txmsg_apply",	required_argument,	NULL, 'a'},
 	{0, 0, NULL, 0 }
 };
 
@@ -546,7 +548,9 @@ int main(int argc, char **argv)
 	while ((opt = getopt_long(argc, argv, ":dhvc:r:i:l:t:",
 				  long_options, &longindex)) != -1) {
 		switch (opt) {
-		/* Cgroup configuration */
+		case 'a':
+			txmsg_apply = atoi(optarg);
+			break;
 		case 'c':
 			cg_fd = open(optarg, O_DIRECTORY, O_RDONLY);
 			if (cg_fd < 0) {
@@ -665,6 +669,8 @@ int main(int argc, char **argv)
 		tx_prog_fd = prog_fd[5];
 	else if (txmsg_redir_noisy)
 		tx_prog_fd = prog_fd[6];
+	else if (txmsg_apply)
+		tx_prog_fd = prog_fd[7];
 	else
 		tx_prog_fd = 0;
 
@@ -699,6 +705,17 @@ int main(int argc, char **argv)
 				err, strerror(errno));
 			return err;
 		}
+
+		if (txmsg_apply) {
+			err = bpf_map_update_elem(map_fd[3],
+						  &i, &txmsg_apply, BPF_ANY);
+			if (err) {
+				fprintf(stderr,
+					"ERROR: bpf_map_update_elem (apply_bytes):  %d (%s)\n",
+					err, strerror(errno));
+				return err;
+			}
+		}
 	}
 	if (test == PING_PONG)
 		err = forever_ping_pong(rate, &options);
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index bba7ee6..4713de4 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -88,7 +88,8 @@ static int (*bpf_override_return)(void *ctx, unsigned long rc) =
 	(void *) BPF_FUNC_override_return;
 static int (*bpf_msg_redirect_map)(void *ctx, void *map, int key, int flags) =
 	(void *) BPF_FUNC_msg_redirect_map;
-
+static int (*bpf_msg_apply_bytes)(void *ctx, int len) =
+	(void *) BPF_FUNC_msg_apply_bytes;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions


* [bpf-next PATCH v2 15/18] bpf: sockmap sample support for bpf_msg_cork_bytes()
  2018-03-12 19:23 [bpf-next PATCH v2 00/18] bpf,sockmap: sendmsg/sendfile ULP John Fastabend
                   ` (13 preceding siblings ...)
  2018-03-12 19:24 ` [bpf-next PATCH v2 14/18] bpf: sockmap, add sample option to test apply_bytes helper John Fastabend
@ 2018-03-12 19:24 ` John Fastabend
  2018-03-15 18:42   ` David Miller
  2018-03-15 20:15   ` Alexei Starovoitov
  2018-03-12 19:24 ` [bpf-next PATCH v2 16/18] bpf: sockmap add SK_DROP tests John Fastabend
                   ` (2 subsequent siblings)
  17 siblings, 2 replies; 49+ messages in thread
From: John Fastabend @ 2018-03-12 19:24 UTC (permalink / raw)
  To: davem, ast, daniel, davejwatson; +Cc: netdev

Add sample application support for the bpf_msg_cork_bytes helper. This
lets the user specify how many bytes must be queued before a verdict
is applied to the data.

Similar to the apply_bytes() tests, these can be run as a stand-alone
test when used without other options, or inline with other tests by
using the txmsg_cork option along with any of the basic tests: txmsg,
txmsg_redir, txmsg_drop.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 include/uapi/linux/bpf_common.h           |    7 ++--
 samples/sockmap/sockmap_kern.c            |   53 +++++++++++++++++++++++++----
 samples/sockmap/sockmap_user.c            |   19 ++++++++++
 tools/include/uapi/linux/bpf.h            |    3 +-
 tools/testing/selftests/bpf/bpf_helpers.h |    2 +
 5 files changed, 71 insertions(+), 13 deletions(-)

diff --git a/include/uapi/linux/bpf_common.h b/include/uapi/linux/bpf_common.h
index ee97668..18be907 100644
--- a/include/uapi/linux/bpf_common.h
+++ b/include/uapi/linux/bpf_common.h
@@ -15,10 +15,9 @@
 
 /* ld/ldx fields */
 #define BPF_SIZE(code)  ((code) & 0x18)
-#define		BPF_W		0x00 /* 32-bit */
-#define		BPF_H		0x08 /* 16-bit */
-#define		BPF_B		0x10 /*  8-bit */
-/* eBPF		BPF_DW		0x18    64-bit */
+#define		BPF_W		0x00
+#define		BPF_H		0x08
+#define		BPF_B		0x10
 #define BPF_MODE(code)  ((code) & 0xe0)
 #define		BPF_IMM		0x00
 #define		BPF_ABS		0x20
diff --git a/samples/sockmap/sockmap_kern.c b/samples/sockmap/sockmap_kern.c
index 5a51f15..1c430926 100644
--- a/samples/sockmap/sockmap_kern.c
+++ b/samples/sockmap/sockmap_kern.c
@@ -64,6 +64,13 @@ struct bpf_map_def SEC("maps") sock_apply_bytes = {
 	.max_entries = 1
 };
 
+struct bpf_map_def SEC("maps") sock_cork_bytes = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(int),
+	.value_size = sizeof(int),
+	.max_entries = 1
+};
+
 SEC("sk_skb1")
 int bpf_prog1(struct __sk_buff *skb)
 {
@@ -135,6 +142,9 @@ int bpf_prog4(struct sk_msg_md *msg)
 	bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
 	if (bytes)
 		bpf_msg_apply_bytes(msg, *bytes);
+	bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero);
+	if (bytes)
+		bpf_msg_cork_bytes(msg, *bytes);
 	return SK_PASS;
 }
 
@@ -143,13 +153,16 @@ int bpf_prog5(struct sk_msg_md *msg)
 {
 	void *data_end = (void *)(long) msg->data_end;
 	void *data = (void *)(long) msg->data;
-	int *bytes, err = 0, zero = 0;
+	int *bytes, err1 = -1, err2 = -1, zero = 0;
 
 	bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
 	if (bytes)
-		err = bpf_msg_apply_bytes(msg, *bytes);
-	bpf_printk("sk_msg2: data length %i err %i\n",
-		   (__u32)data_end - (__u32)data, err);
+		err1 = bpf_msg_apply_bytes(msg, *bytes);
+	bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero);
+	if (bytes)
+		err2 = bpf_msg_cork_bytes(msg, *bytes);
+	bpf_printk("sk_msg2: data length %i err1 %i err2 %i\n",
+		   (__u32)data_end - (__u32)data, err1, err2);
 	return SK_PASS;
 }
 
@@ -163,6 +176,9 @@ int bpf_prog6(struct sk_msg_md *msg)
 	bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
 	if (bytes)
 		bpf_msg_apply_bytes(msg, *bytes);
+	bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero);
+	if (bytes)
+		bpf_msg_cork_bytes(msg, *bytes);
 	return bpf_msg_redirect_map(msg, &sock_map_redir, zero, 0);
 }
 
@@ -171,13 +187,17 @@ int bpf_prog7(struct sk_msg_md *msg)
 {
 	void *data_end = (void *)(long) msg->data_end;
 	void *data = (void *)(long) msg->data;
-	int *bytes, err = 0, zero = 0;
+	int *bytes, err1 = 0, err2 = 0, zero = 0;
 
 	bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
 	if (bytes)
-		err = bpf_msg_apply_bytes(msg, *bytes);
-	bpf_printk("sk_msg3: redirect(%iB) err=%i\n",
-		   (__u32)data_end - (__u32)data, err);
+		err1 = bpf_msg_apply_bytes(msg, *bytes);
+	bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero);
+	if (bytes)
+		err2 = bpf_msg_cork_bytes(msg, *bytes);
+
+	bpf_printk("sk_msg3: redirect(%iB) err1=%i err2=%i\n",
+		   (__u32)data_end - (__u32)data, err1, err2);
 	return bpf_msg_redirect_map(msg, &sock_map_redir, zero, 0);
 }
 
@@ -198,5 +218,22 @@ int bpf_prog8(struct sk_msg_md *msg)
 	}
 	return SK_PASS;
 }
+SEC("sk_msg6")
+int bpf_prog9(struct sk_msg_md *msg)
+{
+	void *data_end = (void *)(long) msg->data_end;
+	void *data = (void *)(long) msg->data;
+	int ret = 0, *bytes, zero = 0;
+
+	bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero);
+	if (bytes) {
+		if (((__u32)data_end - (__u32)data) >= *bytes)
+			return SK_PASS;
+		ret = bpf_msg_cork_bytes(msg, *bytes);
+		if (ret)
+			return SK_DROP;
+	}
+	return SK_PASS;
+}
 
 char _license[] SEC("license") = "GPL";
diff --git a/samples/sockmap/sockmap_user.c b/samples/sockmap/sockmap_user.c
index 41774ec..4e0a3d8 100644
--- a/samples/sockmap/sockmap_user.c
+++ b/samples/sockmap/sockmap_user.c
@@ -60,6 +60,7 @@
 int txmsg_redir;
 int txmsg_redir_noisy;
 int txmsg_apply;
+int txmsg_cork;
 
 static const struct option long_options[] = {
 	{"help",	no_argument,		NULL, 'h' },
@@ -75,6 +76,7 @@
 	{"txmsg_redir",		no_argument,	&txmsg_redir, 1  },
 	{"txmsg_redir_noisy",	no_argument,	&txmsg_redir_noisy, 1},
 	{"txmsg_apply",	required_argument,	NULL, 'a'},
+	{"txmsg_cork",	required_argument,	NULL, 'k'},
 	{0, 0, NULL, 0 }
 };
 
@@ -551,6 +553,9 @@ int main(int argc, char **argv)
 		case 'a':
 			txmsg_apply = atoi(optarg);
 			break;
+		case 'k':
+			txmsg_cork = atoi(optarg);
+			break;
 		case 'c':
 			cg_fd = open(optarg, O_DIRECTORY, O_RDONLY);
 			if (cg_fd < 0) {
@@ -671,6 +676,8 @@ int main(int argc, char **argv)
 		tx_prog_fd = prog_fd[6];
 	else if (txmsg_apply)
 		tx_prog_fd = prog_fd[7];
+	else if (txmsg_cork)
+		tx_prog_fd = prog_fd[8];
 	else
 		tx_prog_fd = 0;
 
@@ -716,6 +723,18 @@ int main(int argc, char **argv)
 				return err;
 			}
 		}
+
+		if (txmsg_cork) {
+			err = bpf_map_update_elem(map_fd[4],
+						  &i, &txmsg_cork, BPF_ANY);
+			if (err) {
+				fprintf(stderr,
+					"ERROR: bpf_map_update_elem (cork_bytes):  %d (%s)\n",
+					err, strerror(errno));
+				return err;
+			}
+		}
+
 	}
 	if (test == PING_PONG)
 		err = forever_ping_pong(rate, &options);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 609456f..ce07a13 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -770,7 +770,8 @@ enum bpf_attach_type {
 	FN(override_return),		\
 	FN(sock_ops_cb_flags_set),	\
 	FN(msg_redirect_map),		\
-	FN(msg_apply_bytes),
+	FN(msg_apply_bytes),		\
+	FN(msg_cork_bytes),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 4713de4..b5b45ff 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -90,6 +90,8 @@ static int (*bpf_msg_redirect_map)(void *ctx, void *map, int key, int flags) =
 	(void *) BPF_FUNC_msg_redirect_map;
 static int (*bpf_msg_apply_bytes)(void *ctx, int len) =
 	(void *) BPF_FUNC_msg_apply_bytes;
+static int (*bpf_msg_cork_bytes)(void *ctx, int len) =
+	(void *) BPF_FUNC_msg_cork_bytes;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions


* [bpf-next PATCH v2 16/18] bpf: sockmap add SK_DROP tests
  2018-03-12 19:23 [bpf-next PATCH v2 00/18] bpf,sockmap: sendmsg/sendfile ULP John Fastabend
                   ` (14 preceding siblings ...)
  2018-03-12 19:24 ` [bpf-next PATCH v2 15/18] bpf: sockmap sample support for bpf_msg_cork_bytes() John Fastabend
@ 2018-03-12 19:24 ` John Fastabend
  2018-03-15 18:43   ` David Miller
  2018-03-12 19:24 ` [bpf-next PATCH v2 17/18] bpf: sockmap sample test for bpf_msg_pull_data John Fastabend
  2018-03-12 19:24 ` [bpf-next PATCH v2 18/18] bpf: sockmap test script John Fastabend
  17 siblings, 1 reply; 49+ messages in thread
From: John Fastabend @ 2018-03-12 19:24 UTC (permalink / raw)
  To: davem, ast, daniel, davejwatson; +Cc: netdev

Add tests for SK_DROP.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 samples/sockmap/sockmap_kern.c |   15 ++++++++++
 samples/sockmap/sockmap_user.c |   62 ++++++++++++++++++++++++++++++----------
 2 files changed, 61 insertions(+), 16 deletions(-)

diff --git a/samples/sockmap/sockmap_kern.c b/samples/sockmap/sockmap_kern.c
index 1c430926..5842f1e 100644
--- a/samples/sockmap/sockmap_kern.c
+++ b/samples/sockmap/sockmap_kern.c
@@ -236,4 +236,19 @@ int bpf_prog9(struct sk_msg_md *msg)
 	return SK_PASS;
 }
 
+SEC("sk_msg7")
+int bpf_prog10(struct sk_msg_md *msg)
+{
+	int *bytes, zero = 0;
+
+	bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
+	if (bytes)
+		bpf_msg_apply_bytes(msg, *bytes);
+	bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero);
+	if (bytes)
+		bpf_msg_cork_bytes(msg, *bytes);
+	return SK_DROP;
+}
+
+
 char _license[] SEC("license") = "GPL";
diff --git a/samples/sockmap/sockmap_user.c b/samples/sockmap/sockmap_user.c
index 4e0a3d8..52c4ed7 100644
--- a/samples/sockmap/sockmap_user.c
+++ b/samples/sockmap/sockmap_user.c
@@ -59,6 +59,7 @@
 int txmsg_noisy;
 int txmsg_redir;
 int txmsg_redir_noisy;
+int txmsg_drop;
 int txmsg_apply;
 int txmsg_cork;
 
@@ -75,6 +76,7 @@
 	{"txmsg_noisy",		no_argument,	&txmsg_noisy, 1  },
 	{"txmsg_redir",		no_argument,	&txmsg_redir, 1  },
 	{"txmsg_redir_noisy",	no_argument,	&txmsg_redir_noisy, 1},
+	{"txmsg_drop",		no_argument,	&txmsg_drop, 1 },
 	{"txmsg_apply",	required_argument,	NULL, 'a'},
 	{"txmsg_cork",	required_argument,	NULL, 'k'},
 	{0, 0, NULL, 0 }
@@ -210,9 +212,19 @@ struct msg_stats {
 	struct timespec end;
 };
 
+struct sockmap_options {
+	int verbose;
+	bool base;
+	bool sendpage;
+	bool data_test;
+	bool drop_expected;
+};
+
 static int msg_loop_sendpage(int fd, int iov_length, int cnt,
-			     struct msg_stats *s)
+			     struct msg_stats *s,
+			     struct sockmap_options *opt)
 {
+	bool drop = opt->drop_expected;
 	unsigned char k = 0;
 	FILE *file;
 	int i, fp;
@@ -229,12 +241,18 @@ static int msg_loop_sendpage(int fd, int iov_length, int cnt,
 	for (i = 0; i < cnt; i++) {
 		int sent = sendfile(fd, fp, NULL, iov_length);
 
-		if (sent < 0) {
+		if (!drop && sent < 0) {
 			perror("send loop error:");
 			close(fp);
 			return sent;
+		} else if (drop && sent >= 0) {
+			printf("sendpage loop error expected: %i\n", sent);
+			close(fp);
+			return -EIO;
 		}
-		s->bytes_sent += sent;
+
+		if (sent > 0)
+			s->bytes_sent += sent;
 	}
 	clock_gettime(CLOCK_MONOTONIC, &s->end);
 	close(fp);
@@ -242,12 +260,15 @@ static int msg_loop_sendpage(int fd, int iov_length, int cnt,
 }
 
 static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
-		    struct msg_stats *s, bool tx, bool data_test)
+		    struct msg_stats *s, bool tx,
+		    struct sockmap_options *opt)
 {
 	struct msghdr msg = {0};
 	int err, i, flags = MSG_NOSIGNAL;
 	struct iovec *iov;
 	unsigned char k;
+	bool data_test = opt->data_test;
+	bool drop = opt->drop_expected;
 
 	iov = calloc(iov_count, sizeof(struct iovec));
 	if (!iov)
@@ -281,11 +302,16 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
 		for (i = 0; i < cnt; i++) {
 			int sent = sendmsg(fd, &msg, flags);
 
-			if (sent < 0) {
+			if (!drop && sent < 0) {
 				perror("send loop error:");
 				goto out_errno;
+			} else if (drop && sent >= 0) {
+				printf("send loop error expected: %i\n", sent);
+				errno = EIO;
+				goto out_errno;
 			}
-			s->bytes_sent += sent;
+			if (sent > 0)
+				s->bytes_sent += sent;
 		}
 		clock_gettime(CLOCK_MONOTONIC, &s->end);
 	} else {
@@ -375,13 +401,6 @@ static inline float recvdBps(struct msg_stats s)
 	return s.bytes_recvd / (s.end.tv_sec - s.start.tv_sec);
 }
 
-struct sockmap_options {
-	int verbose;
-	bool base;
-	bool sendpage;
-	bool data_test;
-};
-
 static int sendmsg_test(int iov_count, int iov_buf, int cnt,
 			struct sockmap_options *opt)
 {
@@ -399,10 +418,13 @@ static int sendmsg_test(int iov_count, int iov_buf, int cnt,
 
 	rxpid = fork();
 	if (rxpid == 0) {
+		if (opt->drop_expected)
+			exit(1);
+
 		if (opt->sendpage)
 			iov_count = 1;
 		err = msg_loop(rx_fd, iov_count, iov_buf,
-			       cnt, &s, false, opt->data_test);
+			       cnt, &s, false, opt);
 		if (err)
 			fprintf(stderr,
 				"msg_loop_rx: iov_count %i iov_buf %i cnt %i err %i\n",
@@ -426,10 +448,10 @@ static int sendmsg_test(int iov_count, int iov_buf, int cnt,
 	txpid = fork();
 	if (txpid == 0) {
 		if (opt->sendpage)
-			err = msg_loop_sendpage(c1, iov_buf, cnt, &s);
+			err = msg_loop_sendpage(c1, iov_buf, cnt, &s, opt);
 		else
 			err = msg_loop(c1, iov_count, iov_buf,
-				       cnt, &s, true, opt->data_test);
+				       cnt, &s, true, opt);
 
 		if (err)
 			fprintf(stderr,
@@ -674,6 +696,9 @@ int main(int argc, char **argv)
 		tx_prog_fd = prog_fd[5];
 	else if (txmsg_redir_noisy)
 		tx_prog_fd = prog_fd[6];
+	else if (txmsg_drop)
+		tx_prog_fd = prog_fd[9];
+	/* apply and cork must be last */
 	else if (txmsg_apply)
 		tx_prog_fd = prog_fd[7];
 	else if (txmsg_cork)
@@ -700,6 +725,7 @@ int main(int argc, char **argv)
 				err, strerror(errno));
 			return err;
 		}
+
 		if (txmsg_redir || txmsg_redir_noisy)
 			redir_fd = c2;
 		else
@@ -736,6 +762,10 @@ int main(int argc, char **argv)
 		}
 
 	}
+
+	if (txmsg_drop)
+		options.drop_expected = true;
+
 	if (test == PING_PONG)
 		err = forever_ping_pong(rate, &options);
 	else if (test == SENDMSG) {


* [bpf-next PATCH v2 17/18] bpf: sockmap sample test for bpf_msg_pull_data
  2018-03-12 19:23 [bpf-next PATCH v2 00/18] bpf,sockmap: sendmsg/sendfile ULP John Fastabend
                   ` (15 preceding siblings ...)
  2018-03-12 19:24 ` [bpf-next PATCH v2 16/18] bpf: sockmap add SK_DROP tests John Fastabend
@ 2018-03-12 19:24 ` John Fastabend
  2018-03-15 18:43   ` David Miller
  2018-03-12 19:24 ` [bpf-next PATCH v2 18/18] bpf: sockmap test script John Fastabend
  17 siblings, 1 reply; 49+ messages in thread
From: John Fastabend @ 2018-03-12 19:24 UTC (permalink / raw)
  To: davem, ast, daniel, davejwatson; +Cc: netdev

This adds an option to test the msg_pull_data helper. It uses
two options, txmsg_start and txmsg_end, to let the user specify
the start and end bytes to pull.

The options can be used with the txmsg_apply and txmsg_cork
options, as well as with any of the basic tests, txmsg, txmsg_redir
and txmsg_drop (plus noisy variants), to run pull_data inline with
those tests. By giving the user direct control over the variables
we can easily do negative as well as positive testing.
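
Condensed sketch of the plumbing, both halves together (map_fd and
msg come from the surrounding sample; error handling elided):

  /* user side: slot 0 carries txmsg_start, slot 1 carries txmsg_end */
  int zero = 0, one = 1;

  bpf_map_update_elem(map_fd, &zero, &txmsg_start, BPF_ANY);
  bpf_map_update_elem(map_fd, &one, &txmsg_end, BPF_ANY);

  /* BPF side: pull the requested byte range in for access */
  int *start = bpf_map_lookup_elem(&sock_pull_bytes, &zero);
  int *end = bpf_map_lookup_elem(&sock_pull_bytes, &one);

  if (start && end)
          bpf_msg_pull_data(msg, *start, *end, 0);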

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 samples/sockmap/sockmap_kern.c            |   79 ++++++++++++++++++++++++-----
 samples/sockmap/sockmap_user.c            |   32 ++++++++++++
 tools/testing/selftests/bpf/bpf_helpers.h |    2 +
 3 files changed, 99 insertions(+), 14 deletions(-)

diff --git a/samples/sockmap/sockmap_kern.c b/samples/sockmap/sockmap_kern.c
index 5842f1e..a8b555a 100644
--- a/samples/sockmap/sockmap_kern.c
+++ b/samples/sockmap/sockmap_kern.c
@@ -71,6 +71,14 @@ struct bpf_map_def SEC("maps") sock_cork_bytes = {
 	.max_entries = 1
 };
 
+struct bpf_map_def SEC("maps") sock_pull_bytes = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(int),
+	.value_size = sizeof(int),
+	.max_entries = 2
+};
+
+
 SEC("sk_skb1")
 int bpf_prog1(struct __sk_buff *skb)
 {
@@ -137,7 +145,8 @@ int bpf_sockmap(struct bpf_sock_ops *skops)
 SEC("sk_msg1")
 int bpf_prog4(struct sk_msg_md *msg)
 {
-	int *bytes, zero = 0;
+	int *bytes, zero = 0, one = 1;
+	int *start, *end;
 
 	bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
 	if (bytes)
@@ -145,15 +154,18 @@ int bpf_prog4(struct sk_msg_md *msg)
 	bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero);
 	if (bytes)
 		bpf_msg_cork_bytes(msg, *bytes);
+	start = bpf_map_lookup_elem(&sock_pull_bytes, &zero);
+	end = bpf_map_lookup_elem(&sock_pull_bytes, &one);
+	if (start && end)
+		bpf_msg_pull_data(msg, *start, *end, 0);
 	return SK_PASS;
 }
 
 SEC("sk_msg2")
 int bpf_prog5(struct sk_msg_md *msg)
 {
-	void *data_end = (void *)(long) msg->data_end;
-	void *data = (void *)(long) msg->data;
-	int *bytes, err1 = -1, err2 = -1, zero = 0;
+	int err1 = -1, err2 = -1, zero = 0, one = 1;
+	int *bytes, *start, *end, len1, len2;
 
 	bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
 	if (bytes)
@@ -161,17 +173,32 @@ int bpf_prog5(struct sk_msg_md *msg)
 	bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero);
 	if (bytes)
 		err2 = bpf_msg_cork_bytes(msg, *bytes);
+	len1 = (__u32)msg->data_end - (__u32)msg->data;
+	start = bpf_map_lookup_elem(&sock_pull_bytes, &zero);
+	end = bpf_map_lookup_elem(&sock_pull_bytes, &one);
+	if (start && end) {
+		int err;
+
+		bpf_printk("sk_msg2: pull(%i:%i)\n",
+			   *start, *end);
+		err = bpf_msg_pull_data(msg, *start, *end, 0);
+		if (err)
+			bpf_printk("sk_msg2: pull_data err %i\n",
+				   err);
+		len2 = (__u32)msg->data_end - (__u32)msg->data;
+		bpf_printk("sk_msg2: length update %i->%i\n",
+			   len1, len2);
+	}
 	bpf_printk("sk_msg2: data length %i err1 %i err2 %i\n",
-		   (__u32)data_end - (__u32)data, err1, err2);
+		   len1, err1, err2);
 	return SK_PASS;
 }
 
 SEC("sk_msg3")
 int bpf_prog6(struct sk_msg_md *msg)
 {
-	void *data_end = (void *)(long) msg->data_end;
-	void *data = (void *)(long) msg->data;
-	int *bytes, zero = 0;
+	int *bytes, zero = 0, one = 1;
+	int *start, *end;
 
 	bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
 	if (bytes)
@@ -179,15 +206,18 @@ int bpf_prog6(struct sk_msg_md *msg)
 	bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero);
 	if (bytes)
 		bpf_msg_cork_bytes(msg, *bytes);
+	start = bpf_map_lookup_elem(&sock_pull_bytes, &zero);
+	end = bpf_map_lookup_elem(&sock_pull_bytes, &one);
+	if (start && end)
+		bpf_msg_pull_data(msg, *start, *end, 0);
 	return bpf_msg_redirect_map(msg, &sock_map_redir, zero, 0);
 }
 
 SEC("sk_msg4")
 int bpf_prog7(struct sk_msg_md *msg)
 {
-	void *data_end = (void *)(long) msg->data_end;
-	void *data = (void *)(long) msg->data;
-	int *bytes, err1 = 0, err2 = 0, zero = 0;
+	int err1 = 0, err2 = 0, zero = 0, one = 1;
+	int *bytes, *start, *end, len1, len2;
 
 	bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
 	if (bytes)
@@ -195,9 +225,24 @@ int bpf_prog7(struct sk_msg_md *msg)
 	bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero);
 	if (bytes)
 		err2 = bpf_msg_cork_bytes(msg, *bytes);
-
+	len1 = (__u32)msg->data_end - (__u32)msg->data;
+	start = bpf_map_lookup_elem(&sock_pull_bytes, &zero);
+	end = bpf_map_lookup_elem(&sock_pull_bytes, &one);
+	if (start && end) {
+		int err;
+
+		bpf_printk("sk_msg3: pull(%i:%i)\n",
+			   *start, *end);
+		err = bpf_msg_pull_data(msg, *start, *end, 0);
+		if (err)
+			bpf_printk("sk_msg3: pull_data err %i\n",
+				   err);
+		len2 = (__u32)msg->data_end - (__u32)msg->data;
+		bpf_printk("sk_msg3: length update %i->%i\n",
+			   len1, len2);
+	}
 	bpf_printk("sk_msg3: redirect(%iB) err1=%i err2=%i\n",
-		   (__u32)data_end - (__u32)data, err1, err2);
+		   len1, err1, err2);
 	return bpf_msg_redirect_map(msg, &sock_map_redir, zero, 0);
 }
 
@@ -239,7 +284,8 @@ int bpf_prog9(struct sk_msg_md *msg)
 SEC("sk_msg7")
 int bpf_prog10(struct sk_msg_md *msg)
 {
-	int *bytes, zero = 0;
+	int *bytes, zero = 0, one = 1;
+	int *start, *end;
 
 	bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
 	if (bytes)
@@ -247,6 +293,11 @@ int bpf_prog10(struct sk_msg_md *msg)
 	bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero);
 	if (bytes)
 		bpf_msg_cork_bytes(msg, *bytes);
+	start = bpf_map_lookup_elem(&sock_pull_bytes, &zero);
+	end = bpf_map_lookup_elem(&sock_pull_bytes, &one);
+	if (start && end)
+		bpf_msg_pull_data(msg, *start, *end, 0);
+
 	return SK_DROP;
 }
 
diff --git a/samples/sockmap/sockmap_user.c b/samples/sockmap/sockmap_user.c
index 52c4ed7..307d097 100644
--- a/samples/sockmap/sockmap_user.c
+++ b/samples/sockmap/sockmap_user.c
@@ -62,6 +62,8 @@
 int txmsg_drop;
 int txmsg_apply;
 int txmsg_cork;
+int txmsg_start;
+int txmsg_end;
 
 static const struct option long_options[] = {
 	{"help",	no_argument,		NULL, 'h' },
@@ -79,6 +81,8 @@
 	{"txmsg_drop",		no_argument,	&txmsg_drop, 1 },
 	{"txmsg_apply",	required_argument,	NULL, 'a'},
 	{"txmsg_cork",	required_argument,	NULL, 'k'},
+	{"txmsg_start", required_argument,	NULL, 's'},
+	{"txmsg_end",	required_argument,	NULL, 'e'},
 	{0, 0, NULL, 0 }
 };
 
@@ -572,6 +576,12 @@ int main(int argc, char **argv)
 	while ((opt = getopt_long(argc, argv, ":dhvc:r:i:l:t:",
 				  long_options, &longindex)) != -1) {
 		switch (opt) {
+		case 's':
+			txmsg_start = atoi(optarg);
+			break;
+		case 'e':
+			txmsg_end = atoi(optarg);
+			break;
 		case 'a':
 			txmsg_apply = atoi(optarg);
 			break;
@@ -761,6 +771,28 @@ int main(int argc, char **argv)
 			}
 		}
 
+		if (txmsg_start) {
+			err = bpf_map_update_elem(map_fd[5],
+						  &i, &txmsg_start, BPF_ANY);
+			if (err) {
+				fprintf(stderr,
+					"ERROR: bpf_map_update_elem (txmsg_start):  %d (%s)\n",
+					err, strerror(errno));
+				return err;
+			}
+		}
+
+		if (txmsg_end) {
+			i = 1;
+			err = bpf_map_update_elem(map_fd[5],
+						  &i, &txmsg_end, BPF_ANY);
+			if (err) {
+				fprintf(stderr,
+					"ERROR: bpf_map_update_elem (txmsg_end):  %d (%s)\n",
+					err, strerror(errno));
+				return err;
+			}
+		}
 	}
 
 	if (txmsg_drop)
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index b5b45ff..7cae376 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -92,6 +92,8 @@ static int (*bpf_msg_apply_bytes)(void *ctx, int len) =
 	(void *) BPF_FUNC_msg_apply_bytes;
 static int (*bpf_msg_cork_bytes)(void *ctx, int len) =
 	(void *) BPF_FUNC_msg_cork_bytes;
+static int (*bpf_msg_pull_data)(void *ctx, int start, int end, int flags) =
+	(void *) BPF_FUNC_msg_pull_data;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions


* [bpf-next PATCH v2 18/18] bpf: sockmap test script
  2018-03-12 19:23 [bpf-next PATCH v2 00/18] bpf,sockmap: sendmsg/sendfile ULP John Fastabend
                   ` (16 preceding siblings ...)
  2018-03-12 19:24 ` [bpf-next PATCH v2 17/18] bpf: sockmap sample test for bpf_msg_pull_data John Fastabend
@ 2018-03-12 19:24 ` John Fastabend
  2018-03-15 18:43   ` David Miller
  17 siblings, 1 reply; 49+ messages in thread
From: John Fastabend @ 2018-03-12 19:24 UTC (permalink / raw)
  To: davem, ast, daniel, davejwatson; +Cc: netdev

This adds the test script I am currently using to validate
the latest sockmap changes. Shortly sockmap will be ported
to selftests and these will be run from the infrastructure
there. Until then add the script here so we have a coverage
checklist when porting into selftests.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 samples/sockmap/sockmap_test.sh |  450 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 450 insertions(+)
 create mode 100755 samples/sockmap/sockmap_test.sh

diff --git a/samples/sockmap/sockmap_test.sh b/samples/sockmap/sockmap_test.sh
new file mode 100755
index 0000000..21c8598
--- /dev/null
+++ b/samples/sockmap/sockmap_test.sh
@@ -0,0 +1,450 @@
+# Test a bunch of positive cases to verify basic functionality
+for prog in "--txmsg" "--txmsg_redir" "--txmsg_drop"; do
+for t in "sendmsg" "sendpage"; do
+for r in 1 10 100; do
+	for i in 1 10 100; do
+		for l in 1 10 100; do
+			TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+			echo $TEST
+			$TEST
+			sleep 2
+		done
+	done
+done
+done
+done
+
+#Test max iov
+t="sendmsg"
+r=1
+i=1024
+l=1
+prog="--txmsg"
+
+TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+echo $TEST
+$TEST
+sleep 2
+prog="--txmsg_redir"
+TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+echo $TEST
+$TEST
+
+# Test max iov with 1k send
+
+t="sendmsg"
+r=1
+i=1024
+l=1024
+prog="--txmsg"
+
+TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+echo $TEST
+$TEST
+sleep 2
+prog="--txmsg_redir"
+TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+echo $TEST
+$TEST
+sleep 2
+
+# Test apply with 1B
+r=1
+i=1024
+l=1024
+prog="--txmsg_apply 1"
+
+for t in "sendmsg" "sendpage"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Test apply with larger value than send
+r=1
+i=8
+l=1024
+prog="--txmsg_apply 2048"
+
+for t in "sendmsg" "sendpage"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Test apply with apply that never reaches limit
+r=1024
+i=1
+l=1
+prog="--txmsg_apply 2048"
+
+for t in "sendmsg" "sendpage"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Test apply and redirect with 1B
+r=1
+i=1024
+l=1024
+prog="--txmsg_redir --txmsg_apply 1"
+
+for t in "sendmsg" "sendpage"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Test apply and redirect with larger value than send
+r=1
+i=8
+l=1024
+prog="--txmsg_redir --txmsg_apply 2048"
+
+for t in "sendmsg" "sendpage"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Test apply and redirect with apply that never reaches limit
+r=1024
+i=1
+l=1
+prog="--txmsg_redir --txmsg_apply 2048"
+
+for t in "sendmsg" "sendpage"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Test cork with 1B not really useful but test it anyways
+r=1
+i=1024
+l=1024
+prog="--txmsg_cork 1"
+
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Test cork with a more reasonable 100B
+r=1
+i=1000
+l=1000
+prog="--txmsg_cork 100"
+
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Test cork with larger value than send
+r=1
+i=8
+l=1024
+prog="--txmsg_cork 2048"
+
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Test cork with cork that never reaches limit
+r=1024
+i=1
+l=1
+prog="--txmsg_cork 2048"
+
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+r=1
+i=1024
+l=1024
+prog="--txmsg_redir --txmsg_cork 1"
+
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Test cork with a more reasonable 100B
+r=1
+i=1000
+l=1000
+prog="--txmsg_redir --txmsg_cork 100"
+
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Test cork with larger value than send
+r=1
+i=8
+l=1024
+prog="--txmsg_redir --txmsg_cork 2048"
+
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Test cork and redirect with cork that never reaches limit
+r=1024
+i=1
+l=1
+prog="--txmsg_redir --txmsg_cork 2048"
+
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+
+# mix and match cork and apply not really useful but valid programs
+
+# Test apply < cork
+r=100
+i=1
+l=5
+prog="--txmsg_apply 10 --txmsg_cork 100"
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Try again with larger sizes so we hit overflow case
+r=100
+i=1000
+l=2048
+prog="--txmsg_apply 4096 --txmsg_cork 8096"
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Test apply > cork
+r=100
+i=1
+l=5
+prog="--txmsg_apply 100 --txmsg_cork 10"
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Again with larger sizes so we hit overflow cases
+r=100
+i=1000
+l=2048
+prog="--txmsg_apply 8096 --txmsg_cork 4096"
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+
+# Test apply = cork
+r=100
+i=1
+l=5
+prog="--txmsg_apply 10 --txmsg_cork 10"
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+r=100
+i=1000
+l=2048
+prog="--txmsg_apply 4096 --txmsg_cork 4096"
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Test apply < cork
+r=100
+i=1
+l=5
+prog="--txmsg_redir --txmsg_apply 10 --txmsg_cork 100"
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Try again with larger sizes so we hit overflow case
+r=100
+i=1000
+l=2048
+prog="--txmsg_redir --txmsg_apply 4096 --txmsg_cork 8096"
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Test apply > cork
+r=100
+i=1
+l=5
+prog="--txmsg_redir --txmsg_apply 100 --txmsg_cork 10"
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Again with larger sizes so we hit overflow cases
+r=100
+i=1000
+l=2048
+prog="--txmsg_redir --txmsg_apply 8096 --txmsg_cork 4096"
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+
+# Test apply = cork
+r=100
+i=1
+l=5
+prog="--txmsg_redir --txmsg_apply 10 --txmsg_cork 10"
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+r=100
+i=1000
+l=2048
+prog="--txmsg_redir --txmsg_apply 4096 --txmsg_cork 4096"
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Tests for bpf_msg_pull_data()
+for i in `seq 99 100 1600`; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t sendpage -r 16 -i 1 -l 100 \
+		--txmsg --txmsg_start 0 --txmsg_end $i --txmsg_cork 1600"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+for i in `seq 199 100 1600`; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t sendpage -r 16 -i 1 -l 100 \
+		--txmsg --txmsg_start 100 --txmsg_end $i --txmsg_cork 1600"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+TEST="./sockmap --cgroup /mnt/cgroup2/ -t sendpage -r 16 -i 1 -l 100 \
+	--txmsg --txmsg_start 1500 --txmsg_end 1600 --txmsg_cork 1600"
+echo $TEST
+$TEST
+sleep 2
+
+TEST="./sockmap --cgroup /mnt/cgroup2/ -t sendpage -r 16 -i 1 -l 100 \
+	--txmsg --txmsg_start 1111 --txmsg_end 1112 --txmsg_cork 1600"
+echo $TEST
+$TEST
+sleep 2
+
+TEST="./sockmap --cgroup /mnt/cgroup2/ -t sendpage -r 16 -i 1 -l 100 \
+	--txmsg --txmsg_start 1111 --txmsg_end 0 --txmsg_cork 1600"
+echo $TEST
+$TEST
+sleep 2
+
+TEST="./sockmap --cgroup /mnt/cgroup2/ -t sendpage -r 16 -i 1 -l 100 \
+	--txmsg --txmsg_start 0 --txmsg_end 1601 --txmsg_cork 1600"
+echo $TEST
+$TEST
+sleep 2
+
+TEST="./sockmap --cgroup /mnt/cgroup2/ -t sendpage -r 16 -i 1 -l 100 \
+	--txmsg --txmsg_start 0 --txmsg_end 1601 --txmsg_cork 1602"
+echo $TEST
+$TEST
+sleep 2
+
+# Run through gamut again with start and end
+for prog in "--txmsg" "--txmsg_redir" "--txmsg_drop"; do
+for t in "sendmsg" "sendpage"; do
+for r in 1 10 100; do
+	for i in 1 10 100; do
+		for l in 1 10 100; do
+			TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog --txmsg_start 1 --txmsg_end 2"
+			echo $TEST
+		#	$TEST
+			sleep 2
+		done
+	done
+done
+done
+done
+
+# Some specific tests to cover specific code paths
+./sockmap --cgroup /mnt/cgroup2/ -t sendpage \
+	-r 5 -i 1 -l 1 --txmsg_redir --txmsg_cork 5 --txmsg_apply 3
+./sockmap --cgroup /mnt/cgroup2/ -t sendmsg \
+	-r 5 -i 1 -l 1 --txmsg_redir --txmsg_cork 5 --txmsg_apply 3
+./sockmap --cgroup /mnt/cgroup2/ -t sendpage \
+	-r 5 -i 1 -l 1 --txmsg_redir --txmsg_cork 5 --txmsg_apply 5
+./sockmap --cgroup /mnt/cgroup2/ -t sendmsg \
+	-r 5 -i 1 -l 1 --txmsg_redir --txmsg_cork 5 --txmsg_apply 5


* Re: [bpf-next PATCH v2 03/18] net: do_tcp_sendpages flag to avoid SKBTX_SHARED_FRAG
  2018-03-12 19:23 ` [bpf-next PATCH v2 03/18] net: do_tcp_sendpages flag to avoid SKBTX_SHARED_FRAG John Fastabend
@ 2018-03-15 18:40   ` David Miller
  0 siblings, 0 replies; 49+ messages in thread
From: David Miller @ 2018-03-15 18:40 UTC (permalink / raw)
  To: john.fastabend; +Cc: ast, daniel, davejwatson, netdev

From: John Fastabend <john.fastabend@gmail.com>
Date: Mon, 12 Mar 2018 12:23:19 -0700

> When calling do_tcp_sendpages() from within the kernel, where we know
> the data has no references from the user side, we can omit the
> SKBTX_SHARED_FRAG flag. This patch adds an internal flag,
> NO_SKBTX_SHARED_FRAG, that can be used to omit setting
> SKBTX_SHARED_FRAG.
> 
> The flag is not exposed to userspace because the sendpage call from
> the splice logic masks out all bits except MSG_MORE.
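
For illustration, an in-kernel caller would pass the new flag along
with the usual sendpage flags; a sketch, with sk, page, offset, size
and flags assumed from the call site:

  /* kernel-internal caller: pages not visible to user space, so the
   * shared-frag marking can be skipped
   */
  ssize_t ret = do_tcp_sendpages(sk, page, offset, size,
				 flags | NO_SKBTX_SHARED_FRAG);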
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: David S. Miller <davem@davemloft.net>


* Re: [bpf-next PATCH v2 05/18] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data
  2018-03-12 19:23 ` [bpf-next PATCH v2 05/18] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data John Fastabend
@ 2018-03-15 18:41   ` David Miller
  2018-03-15 21:59   ` Alexei Starovoitov
  1 sibling, 0 replies; 49+ messages in thread
From: David Miller @ 2018-03-15 18:41 UTC (permalink / raw)
  To: john.fastabend; +Cc: ast, daniel, davejwatson, netdev

From: John Fastabend <john.fastabend@gmail.com>
Date: Mon, 12 Mar 2018 12:23:29 -0700

> This implements a BPF ULP layer to allow policy enforcement and
> monitoring at the socket layer. In order to support this, a new
> program type BPF_PROG_TYPE_SK_MSG is used to run the policy at
> the sendmsg/sendpage hook. To attach the policy to sockets, a
> sockmap is used with a new program attach type BPF_SK_MSG_VERDICT.
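
A minimal program of the new type, as a sketch (program and section
names here are illustrative; conventions follow the samples later in
the series):

  SEC("sk_msg1")
  int bpf_msg_verdict(struct sk_msg_md *msg)
  {
          /* inspect msg->data..msg->data_end here, then verdict */
          return SK_PASS;         /* or SK_DROP */
  }

  char _license[] SEC("license") = "GPL";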
 ...
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: David S. Miller <davem@davemloft.net>


* Re: [bpf-next PATCH v2 06/18] bpf: sockmap, add bpf_msg_apply_bytes() helper
  2018-03-12 19:23 ` [bpf-next PATCH v2 06/18] bpf: sockmap, add bpf_msg_apply_bytes() helper John Fastabend
@ 2018-03-15 18:41   ` David Miller
  2018-03-15 20:32   ` Daniel Borkmann
  2018-03-15 21:45   ` Alexei Starovoitov
  2 siblings, 0 replies; 49+ messages in thread
From: David Miller @ 2018-03-15 18:41 UTC (permalink / raw)
  To: john.fastabend; +Cc: ast, daniel, davejwatson, netdev

From: John Fastabend <john.fastabend@gmail.com>
Date: Mon, 12 Mar 2018 12:23:34 -0700

> A single sendmsg or sendfile system call can contain multiple logical
> messages that a BPF program may want to read and apply a verdict to.
> But, without an apply_bytes helper, any verdict on the data applies to
> all bytes in the sendmsg/sendfile. Alternatively, a BPF program may
> only care to read the first N bytes of a msg. If the payload is large,
> say MBs or even GBs, setting up and calling the BPF program repeatedly
> for all bytes, even though the verdict is already known, creates
> unnecessary overhead.
> 
> To allow BPF programs to control how many bytes a given verdict
> applies to we implement a bpf_msg_apply_bytes() helper. When called
> from within a BPF program this sets a counter, internal to the
> BPF infrastructure, that applies the last verdict to the next N
> bytes. If the N is smaller than the current data being processed
> from a sendmsg/sendfile call, the first N bytes will be sent and
> the BPF program will be re-run with start_data pointing to the N+1
> byte. If N is larger than the current data being processed the
> BPF verdict will be applied to multiple sendmsg/sendfile calls
> until N bytes are consumed.
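
As a sketch, a program that only needs the first 10 bytes (an
illustrative count) could bind each verdict to them:

  SEC("sk_msg1")
  int bpf_apply_verdict(struct sk_msg_md *msg)
  {
          /* the verdict covers the first 10 bytes only; the program
           * is re-run for the remainder with recalculated pointers
           */
          bpf_msg_apply_bytes(msg, 10);
          return SK_PASS;
  }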
> 
> Note1: if a socket closes with the apply_bytes counter non-zero this
> is not a problem, because data is not being buffered for N bytes
> and is sent as it's received.
> 
> Note2: if this is operating in the sendpage context the data
> pointers may be zeroed after this call if the apply walks beyond
> a data range specified by a msg_pull_data() call (a helper
> implemented later in this series).
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: David S. Miller <davem@davemloft.net>


* Re: [bpf-next PATCH v2 07/18] bpf: sockmap, add msg_cork_bytes() helper
  2018-03-12 19:23 ` [bpf-next PATCH v2 07/18] bpf: sockmap, add msg_cork_bytes() helper John Fastabend
@ 2018-03-15 18:41   ` David Miller
  0 siblings, 0 replies; 49+ messages in thread
From: David Miller @ 2018-03-15 18:41 UTC (permalink / raw)
  To: john.fastabend; +Cc: ast, daniel, davejwatson, netdev

From: John Fastabend <john.fastabend@gmail.com>
Date: Mon, 12 Mar 2018 12:23:39 -0700

> In the case where we need a specific number of bytes before a
> verdict can be assigned, even if the data spans multiple sendmsg
> or sendfile calls, the BPF program may use msg_cork_bytes().
> 
> The extreme case is a user can call sendmsg repeatedly with
> 1-byte msg segments. Obviously, this is bad for performance but
> is still valid. If the BPF program needs N bytes to validate
> a header it can use msg_cork_bytes to specify N bytes and the
> BPF program will not be called again until N bytes have been
> accumulated. The infrastructure will attempt to coalesce data
> if possible, so in many cases (most of my use cases at least) the
> data will be in a single scatterlist element with data pointers
> pointing to start/end of the element. However, this is dependent
> on available memory so is not guaranteed. So BPF programs must
> validate data pointer ranges, but this is the case anyways to
> convince the verifier the accesses are valid.
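
A sketch of that pattern, assuming an illustrative 8-byte header:

  SEC("sk_msg1")
  int bpf_cork_verdict(struct sk_msg_md *msg)
  {
          void *data_end = (void *)(long)msg->data_end;
          void *data = (void *)(long)msg->data;

          /* fewer than 8 bytes visible: cork until 8 accumulate; the
           * program is not run again until they have arrived
           */
          if (data + 8 > data_end)
                  return bpf_msg_cork_bytes(msg, 8) ? SK_DROP : SK_PASS;

          /* full header visible here: validate it, then verdict */
          return SK_PASS;
  }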
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: David S. Miller <davem@davemloft.net>


* Re: [bpf-next PATCH v2 08/18] bpf: sk_msg program helper bpf_sk_msg_pull_data
  2018-03-12 19:23 ` [bpf-next PATCH v2 08/18] bpf: sk_msg program helper bpf_sk_msg_pull_data John Fastabend
@ 2018-03-15 18:42   ` David Miller
  2018-03-15 20:25   ` Daniel Borkmann
  1 sibling, 0 replies; 49+ messages in thread
From: David Miller @ 2018-03-15 18:42 UTC (permalink / raw)
  To: john.fastabend; +Cc: ast, daniel, davejwatson, netdev

From: John Fastabend <john.fastabend@gmail.com>
Date: Mon, 12 Mar 2018 12:23:45 -0700

> Currently, if a bpf sk msg program is run, the program
> can only parse data that the (start,end) pointers already
> consumed. For sendmsg hooks this is likely the first
> scatterlist element. For sendpage this will be the range
> (0,0) because the data is shared with userspace and by
> default we want to avoid allowing userspace to modify
> data while (or after) BPF verdict is being decided.
> 
> To support pulling in additional bytes for parsing, use
> a new helper bpf_sk_msg_pull(start, end, flags) which
> works similarly to the tc cls logic. This helper will attempt
> to point the data start pointer at 'start' bytes offset
> into the msg and the data end pointer at 'end' bytes offset into
> the message.
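
Sketch of the expected use, with an arbitrary 0..512 range and the
bpf_msg_pull_data() spelling used in the rest of the series; after a
successful pull the data pointers must be re-read and re-checked:

  int err = bpf_msg_pull_data(msg, 0, 512, 0);

  if (!err) {
          void *data_end = (void *)(long)msg->data_end;
          void *data = (void *)(long)msg->data;

          if (data + 512 <= data_end) {
                  /* bytes 0..511 are now linear and BPF-private */
          }
  }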
 ...
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Yeah this looks really nice.

Acked-by: David S. Miller <davem@davemloft.net>


* Re: [bpf-next PATCH v2 09/18] bpf: add map tests for BPF_PROG_TYPE_SK_MSG
  2018-03-12 19:23 ` [bpf-next PATCH v2 09/18] bpf: add map tests for BPF_PROG_TYPE_SK_MSG John Fastabend
@ 2018-03-15 18:42   ` David Miller
  0 siblings, 0 replies; 49+ messages in thread
From: David Miller @ 2018-03-15 18:42 UTC (permalink / raw)
  To: john.fastabend; +Cc: ast, daniel, davejwatson, netdev

From: John Fastabend <john.fastabend@gmail.com>
Date: Mon, 12 Mar 2018 12:23:50 -0700

> Add map tests to attach BPF_PROG_TYPE_SK_MSG types to a sockmap.
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: David S. Miller <davem@davemloft.net>


* Re: [bpf-next PATCH v2 10/18] bpf: add verifier tests for BPF_PROG_TYPE_SK_MSG
  2018-03-12 19:23 ` [bpf-next PATCH v2 10/18] bpf: add verifier " John Fastabend
@ 2018-03-15 18:42   ` David Miller
  0 siblings, 0 replies; 49+ messages in thread
From: David Miller @ 2018-03-15 18:42 UTC (permalink / raw)
  To: john.fastabend; +Cc: ast, daniel, davejwatson, netdev

From: John Fastabend <john.fastabend@gmail.com>
Date: Mon, 12 Mar 2018 12:23:55 -0700

> Test read and writes for BPF_PROG_TYPE_SK_MSG.
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: David S. Miller <davem@davemloft.net>


* Re: [bpf-next PATCH v2 11/18] bpf: sockmap sample, add option to attach SK_MSG program
  2018-03-12 19:24 ` [bpf-next PATCH v2 11/18] bpf: sockmap sample, add option to attach SK_MSG program John Fastabend
@ 2018-03-15 18:42   ` David Miller
  0 siblings, 0 replies; 49+ messages in thread
From: David Miller @ 2018-03-15 18:42 UTC (permalink / raw)
  To: john.fastabend; +Cc: ast, daniel, davejwatson, netdev

From: John Fastabend <john.fastabend@gmail.com>
Date: Mon, 12 Mar 2018 12:24:00 -0700

> Add a sockmap option to use SK_MSG program types.
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: David S. Miller <davem@davemloft.net>


* Re: [bpf-next PATCH v2 12/18] bpf: sockmap sample, add sendfile test
  2018-03-12 19:24 ` [bpf-next PATCH v2 12/18] bpf: sockmap sample, add sendfile test John Fastabend
@ 2018-03-15 18:42   ` David Miller
  0 siblings, 0 replies; 49+ messages in thread
From: David Miller @ 2018-03-15 18:42 UTC (permalink / raw)
  To: john.fastabend; +Cc: ast, daniel, davejwatson, netdev

From: John Fastabend <john.fastabend@gmail.com>
Date: Mon, 12 Mar 2018 12:24:05 -0700

> To exercise the TX ULP sendpage implementation we need a test that
> does a sendfile. Add a sendfile test option here.
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: David S. Miller <davem@davemloft.net>


* Re: [bpf-next PATCH v2 13/18] bpf: sockmap sample, add data verification option
  2018-03-12 19:24 ` [bpf-next PATCH v2 13/18] bpf: sockmap sample, add data verification option John Fastabend
@ 2018-03-15 18:42   ` David Miller
  0 siblings, 0 replies; 49+ messages in thread
From: David Miller @ 2018-03-15 18:42 UTC (permalink / raw)
  To: john.fastabend; +Cc: ast, daniel, davejwatson, netdev

From: John Fastabend <john.fastabend@gmail.com>
Date: Mon, 12 Mar 2018 12:24:10 -0700

> To verify data is not being dropped or corrupted, this adds an option
> to verify test patterns on recv.
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: David S. Miller <davem@davemloft.net>


* Re: [bpf-next PATCH v2 14/18] bpf: sockmap, add sample option to test apply_bytes helper
  2018-03-12 19:24 ` [bpf-next PATCH v2 14/18] bpf: sockmap, add sample option to test apply_bytes helper John Fastabend
@ 2018-03-15 18:42   ` David Miller
  0 siblings, 0 replies; 49+ messages in thread
From: David Miller @ 2018-03-15 18:42 UTC (permalink / raw)
  To: john.fastabend; +Cc: ast, daniel, davejwatson, netdev

From: John Fastabend <john.fastabend@gmail.com>
Date: Mon, 12 Mar 2018 12:24:16 -0700

> This adds an option to test the apply_bytes helper. The option lets
> the user pass an integer on the command line specifying how much data
> each verdict should apply to.
> 
> When this is set, a map entry is written with the byte count given by
> the user, and the specified program, --txmsg or --txmsg_redir, will
> use the value to set the applied data. If no other option is set then
> a default --txmsg_apply program is run. This program drops packets if
> an error is detected on the bytes map lookup. This is useful to verify
> that the map lookup and apply helper are working, causing a hard error
> if they are not.
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: David S. Miller <davem@davemloft.net>


* Re: [bpf-next PATCH v2 15/18] bpf: sockmap sample support for bpf_msg_cork_bytes()
  2018-03-12 19:24 ` [bpf-next PATCH v2 15/18] bpf: sockmap sample support for bpf_msg_cork_bytes() John Fastabend
@ 2018-03-15 18:42   ` David Miller
  2018-03-15 20:15   ` Alexei Starovoitov
  1 sibling, 0 replies; 49+ messages in thread
From: David Miller @ 2018-03-15 18:42 UTC (permalink / raw)
  To: john.fastabend; +Cc: ast, daniel, davejwatson, netdev

From: John Fastabend <john.fastabend@gmail.com>
Date: Mon, 12 Mar 2018 12:24:21 -0700

> Add sample application support for the bpf_msg_cork_bytes helper. This
> lets the user specify how many bytes must be queued before a verdict
> is applied to the data.
> 
> Similar to the apply_bytes() tests, these can be run as a stand-alone
> test when used without other options, or inline with other tests by
> using the txmsg_cork option along with any of the basic tests: txmsg,
> txmsg_redir, txmsg_drop.
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: David S. Miller <davem@davemloft.net>


* Re: [bpf-next PATCH v2 16/18] bpf: sockmap add SK_DROP tests
  2018-03-12 19:24 ` [bpf-next PATCH v2 16/18] bpf: sockmap add SK_DROP tests John Fastabend
@ 2018-03-15 18:43   ` David Miller
  0 siblings, 0 replies; 49+ messages in thread
From: David Miller @ 2018-03-15 18:43 UTC (permalink / raw)
  To: john.fastabend; +Cc: ast, daniel, davejwatson, netdev

From: John Fastabend <john.fastabend@gmail.com>
Date: Mon, 12 Mar 2018 12:24:26 -0700

> Add tests for SK_DROP.
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: David S. Miller <davem@davemloft.net>


* Re: [bpf-next PATCH v2 17/18] bpf: sockmap sample test for bpf_msg_pull_data
  2018-03-12 19:24 ` [bpf-next PATCH v2 17/18] bpf: sockmap sample test for bpf_msg_pull_data John Fastabend
@ 2018-03-15 18:43   ` David Miller
  0 siblings, 0 replies; 49+ messages in thread
From: David Miller @ 2018-03-15 18:43 UTC (permalink / raw)
  To: john.fastabend; +Cc: ast, daniel, davejwatson, netdev

From: John Fastabend <john.fastabend@gmail.com>
Date: Mon, 12 Mar 2018 12:24:31 -0700

> This adds an option to test the msg_pull_data helper. It uses
> two options, txmsg_start and txmsg_end, to let the user specify
> the start and end bytes to pull.
> 
> The options can be used with the txmsg_apply and txmsg_cork
> options, as well as with any of the basic tests, txmsg, txmsg_redir
> and txmsg_drop (plus noisy variants), to run pull_data inline with
> those tests. By giving the user direct control over the variables
> we can easily do negative as well as positive testing.
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: David S. Miller <davem@davemloft.net>


* Re: [bpf-next PATCH v2 18/18] bpf: sockmap test script
  2018-03-12 19:24 ` [bpf-next PATCH v2 18/18] bpf: sockmap test script John Fastabend
@ 2018-03-15 18:43   ` David Miller
  0 siblings, 0 replies; 49+ messages in thread
From: David Miller @ 2018-03-15 18:43 UTC (permalink / raw)
  To: john.fastabend; +Cc: ast, daniel, davejwatson, netdev

From: John Fastabend <john.fastabend@gmail.com>
Date: Mon, 12 Mar 2018 12:24:36 -0700

> This adds the test script I am currently using to validate
> the latest sockmap changes. Shortly sockmap will be ported
> to selftests and these will be run from the infrastructure
> there. Until then add the script here so we have a coverage
> checklist when porting into selftests.
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: David S. Miller <davem@davemloft.net>


* Re: [bpf-next PATCH v2 15/18] bpf: sockmap sample support for bpf_msg_cork_bytes()
  2018-03-12 19:24 ` [bpf-next PATCH v2 15/18] bpf: sockmap sample support for bpf_msg_cork_bytes() John Fastabend
  2018-03-15 18:42   ` David Miller
@ 2018-03-15 20:15   ` Alexei Starovoitov
  2018-03-15 22:04     ` John Fastabend
  1 sibling, 1 reply; 49+ messages in thread
From: Alexei Starovoitov @ 2018-03-15 20:15 UTC (permalink / raw)
  To: John Fastabend; +Cc: davem, ast, daniel, davejwatson, netdev

On Mon, Mar 12, 2018 at 12:24:21PM -0700, John Fastabend wrote:
> Add sample application support for the bpf_msg_cork_bytes helper. This
> lets the user specify how many bytes each verdict should apply to.
> 
> Similar to apply_bytes() tests these can be run as a stand-alone test
> when used without other options or inline with other tests by using
> the txmsg_cork option along with any of the basic tests txmsg,
> txmsg_redir, txmsg_drop.
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> ---
>  include/uapi/linux/bpf_common.h           |    7 ++--
>  samples/sockmap/sockmap_kern.c            |   53 +++++++++++++++++++++++++----
>  samples/sockmap/sockmap_user.c            |   19 ++++++++++
>  tools/include/uapi/linux/bpf.h            |    3 +-
>  tools/testing/selftests/bpf/bpf_helpers.h |    2 +
>  5 files changed, 71 insertions(+), 13 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf_common.h b/include/uapi/linux/bpf_common.h
> index ee97668..18be907 100644
> --- a/include/uapi/linux/bpf_common.h
> +++ b/include/uapi/linux/bpf_common.h
> @@ -15,10 +15,9 @@
>  
>  /* ld/ldx fields */
>  #define BPF_SIZE(code)  ((code) & 0x18)
> -#define		BPF_W		0x00 /* 32-bit */
> -#define		BPF_H		0x08 /* 16-bit */
> -#define		BPF_B		0x10 /*  8-bit */
> -/* eBPF		BPF_DW		0x18    64-bit */
> +#define		BPF_W		0x00
> +#define		BPF_H		0x08
> +#define		BPF_B		0x10

this hunk seems wrong here. Botched rebase?


* Re: [bpf-next PATCH v2 08/18] bpf: sk_msg program helper bpf_sk_msg_pull_data
  2018-03-12 19:23 ` [bpf-next PATCH v2 08/18] bpf: sk_msg program helper bpf_sk_msg_pull_data John Fastabend
  2018-03-15 18:42   ` David Miller
@ 2018-03-15 20:25   ` Daniel Borkmann
  1 sibling, 0 replies; 49+ messages in thread
From: Daniel Borkmann @ 2018-03-15 20:25 UTC (permalink / raw)
  To: John Fastabend, davem, ast, davejwatson; +Cc: netdev

On 03/12/2018 08:23 PM, John Fastabend wrote:
[...]
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 2c73af0..7b9e63e 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -1956,6 +1956,134 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
>  	.arg2_type      = ARG_ANYTHING,
>  };
>  
> +BPF_CALL_4(bpf_msg_pull_data,
> +	   struct sk_msg_buff *, msg, u32, start, u32, end, u64, flags)
> +{
> +	unsigned int len = 0, offset = 0, copy = 0;
> +	struct scatterlist *sg = msg->sg_data;
> +	int first_sg, last_sg, i, shift;
> +	unsigned char *p, *to, *from;
> +	int bytes = end - start;
> +	struct page *page;
> +
> +	if (unlikely(end < start))
> +		return -EINVAL;

Actually should be:

if (unlikely(flags || end <= start))
	return -EINVAL;

> +	/* First find the starting scatterlist element */
> +	i = msg->sg_start;
> +	do {
> +		len = sg[i].length;
> +		offset += len;
> +		if (start < offset + len)
> +			break;
> +		i++;
> +		if (i == MAX_SKB_FRAGS)
> +			i = 0;
> +	} while (i != msg->sg_end);
> +
> +	if (unlikely(start >= offset + len))
> +		return -EINVAL;
> +
> +	if (!msg->sg_copy[i] && bytes <= len)
> +		goto out;
> +
> +	first_sg = i;
> +
> +	/* At this point we need to linearize multiple scatterlist
> +	 * elements or a single shared page. Either way we need to
> +	 * copy into a linear buffer exclusively owned by BPF. Then
> +	 * place the buffer in the scatterlist and fixup the original
> +	 * entries by removing the entries now in the linear buffer
> +	 * and shifting the remaining entries. For now we do not try
> +	 * to copy partial entries to avoid complexity of running out
> +	 * of sg_entry slots. The downside is reading a single byte
> +	 * will copy the entire sg entry.
> +	 */
> +	do {
> +		copy += sg[i].length;
> +		i++;
> +		if (i == MAX_SKB_FRAGS)
> +			i = 0;
> +		if (bytes < copy)
> +			break;
> +	} while (i != msg->sg_end);
> +	last_sg = i;
> +
> +	if (unlikely(copy < end - start))
> +		return -EINVAL;
> +
> +	page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC, get_order(copy));

if (unlikely(!page))
	return -ENOMEM;

> +	p = page_address(page);
> +	offset = 0;
> +
> +	i = first_sg;
> +	do {
> +		from = sg_virt(&sg[i]);
> +		len = sg[i].length;
> +		to = p + offset;
> +
> +		memcpy(to, from, len);
> +		offset += len;
> +		sg[i].length = 0;
> +		put_page(sg_page(&sg[i]));
> +
> +		i++;
> +		if (i == MAX_SKB_FRAGS)
> +			i = 0;
> +	} while (i != last_sg);
> +
> +	sg[first_sg].length = copy;
> +	sg_set_page(&sg[first_sg], page, copy, 0);
> +
> +	/* To repair sg ring we need to shift entries. If we only
> +	 * had a single entry though we can just replace it and
> +	 * be done. Otherwise walk the ring and shift the entries.
> +	 */
> +	shift = last_sg - first_sg - 1;
> +	if (!shift)
> +		goto out;
> +
> +	i = first_sg + 1;
> +	do {
> +		int move_from;
> +
> +		if (i + shift >= MAX_SKB_FRAGS)
> +			move_from = i + shift - MAX_SKB_FRAGS;
> +		else
> +			move_from = i + shift;
> +
> +		if (move_from == msg->sg_end)
> +			break;
> +
> +		sg[i] = sg[move_from];
> +		sg[move_from].length = 0;
> +		sg[move_from].page_link = 0;
> +		sg[move_from].offset = 0;
> +
> +		i++;
> +		if (i == MAX_SKB_FRAGS)
> +			i = 0;
> +	} while (1);
> +	msg->sg_end -= shift;
> +	if (msg->sg_end < 0)
> +		msg->sg_end += MAX_SKB_FRAGS;
> +out:
> +	msg->data = sg_virt(&sg[i]) + start - offset;
> +	msg->data_end = msg->data + bytes;
> +
> +	return 0;
> +}
> +
> +static const struct bpf_func_proto bpf_msg_pull_data_proto = {
> +	.func		= bpf_msg_pull_data,
> +	.gpl_only	= false,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type	= ARG_PTR_TO_CTX,
> +	.arg2_type	= ARG_ANYTHING,
> +	.arg3_type	= ARG_ANYTHING,
> +	.arg4_type	= ARG_ANYTHING,
> +};
> +
>  BPF_CALL_1(bpf_get_cgroup_classid, const struct sk_buff *, skb)
>  {
>  	return task_get_classid(skb);
> @@ -2897,7 +3025,8 @@ bool bpf_helper_changes_pkt_data(void *func)
>  	    func == bpf_l3_csum_replace ||
>  	    func == bpf_l4_csum_replace ||
>  	    func == bpf_xdp_adjust_head ||
> -	    func == bpf_xdp_adjust_meta)
> +	    func == bpf_xdp_adjust_meta ||
> +	    func == bpf_msg_pull_data)
>  		return true;
>  
>  	return false;
> @@ -3666,6 +3795,8 @@ static const struct bpf_func_proto *sk_msg_func_proto(enum bpf_func_id func_id)
>  		return &bpf_msg_apply_bytes_proto;
>  	case BPF_FUNC_msg_cork_bytes:
>  		return &bpf_msg_cork_bytes_proto;
> +	case BPF_FUNC_msg_pull_data:
> +		return &bpf_msg_pull_data_proto;
>  	default:
>  		return bpf_base_func_proto(func_id);
>  	}
> 


* Re: [bpf-next PATCH v2 06/18] bpf: sockmap, add bpf_msg_apply_bytes() helper
  2018-03-12 19:23 ` [bpf-next PATCH v2 06/18] bpf: sockmap, add bpf_msg_apply_bytes() helper John Fastabend
  2018-03-15 18:41   ` David Miller
@ 2018-03-15 20:32   ` Daniel Borkmann
  2018-03-15 22:02     ` John Fastabend
  2018-03-15 21:45   ` Alexei Starovoitov
  2 siblings, 1 reply; 49+ messages in thread
From: Daniel Borkmann @ 2018-03-15 20:32 UTC (permalink / raw)
  To: John Fastabend, davem, ast, davejwatson; +Cc: netdev

On 03/12/2018 08:23 PM, John Fastabend wrote:
> A single sendmsg or sendfile system call can contain multiple logical
> messages that a BPF program may want to read and apply a verdict. But,
> without an apply_bytes helper any verdict on the data applies to all
> bytes in the sendmsg/sendfile. Alternatively, a BPF program may only
> care to read the first N bytes of a msg. If the payload is large say
> MB or even GB setting up and calling the BPF program repeatedly for
> all bytes, even though the verdict is already known, creates
> unnecessary overhead.
> 
> To allow BPF programs to control how many bytes a given verdict
> applies to we implement a bpf_msg_apply_bytes() helper. When called
> from within a BPF program this sets a counter, internal to the
> BPF infrastructure, that applies the last verdict to the next N
> bytes. If the N is smaller than the current data being processed
> from a sendmsg/sendfile call, the first N bytes will be sent and
> the BPF program will be re-run with start_data pointing to the N+1
> byte. If N is larger than the current data being processed the
> BPF verdict will be applied to multiple sendmsg/sendfile calls
> until N bytes are consumed.
> 
> Note1 if a socket closes with apply_bytes counter non-zero this
> is not a problem because data is not being buffered for N bytes
> and is sent as it's received.
> 
> Note2 if this is operating in the sendpage context the data
> pointers may be zeroed after this call if the apply walks beyond
> a msg_pull_data() call specified data range. (helper implemented
> shortly in this series).
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> ---
>  include/uapi/linux/bpf.h |    3 ++-
>  net/core/filter.c        |   16 ++++++++++++++++
>  2 files changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index b8275f0..e50c61f 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -769,7 +769,8 @@ enum bpf_attach_type {
>  	FN(getsockopt),			\
>  	FN(override_return),		\
>  	FN(sock_ops_cb_flags_set),	\
> -	FN(msg_redirect_map),
> +	FN(msg_redirect_map),		\
> +	FN(msg_apply_bytes),
>  
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 314c311..df2a8f4 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -1928,6 +1928,20 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
>  	.arg4_type      = ARG_ANYTHING,
>  };
>  
> +BPF_CALL_2(bpf_msg_apply_bytes, struct sk_msg_buff *, msg, u64, bytes)
> +{
> +	msg->apply_bytes = bytes;

Here in bpf_msg_apply_bytes() but also in bpf_msg_cork_bytes() the signature
is u64, but in struct sk_msg_buff and struct smap_psock it's of type int, so
a large user-provided u64 will make these negative. Is there a reason to
allow a negative value here rather than using u32 everywhere?

> +	return 0;
> +}
> +
> +static const struct bpf_func_proto bpf_msg_apply_bytes_proto = {
> +	.func           = bpf_msg_apply_bytes,
> +	.gpl_only       = false,
> +	.ret_type       = RET_INTEGER,
> +	.arg1_type	= ARG_PTR_TO_CTX,
> +	.arg2_type      = ARG_ANYTHING,
> +};
> +
>  BPF_CALL_1(bpf_get_cgroup_classid, const struct sk_buff *, skb)
>  {
>  	return task_get_classid(skb);
> @@ -3634,6 +3648,8 @@ static const struct bpf_func_proto *sk_msg_func_proto(enum bpf_func_id func_id)
>  	switch (func_id) {
>  	case BPF_FUNC_msg_redirect_map:
>  		return &bpf_msg_redirect_map_proto;
> +	case BPF_FUNC_msg_apply_bytes:
> +		return &bpf_msg_apply_bytes_proto;
>  	default:
>  		return bpf_base_func_proto(func_id);
>  	}
> 


* Re: [bpf-next PATCH v2 06/18] bpf: sockmap, add bpf_msg_apply_bytes() helper
  2018-03-12 19:23 ` [bpf-next PATCH v2 06/18] bpf: sockmap, add bpf_msg_apply_bytes() helper John Fastabend
  2018-03-15 18:41   ` David Miller
  2018-03-15 20:32   ` Daniel Borkmann
@ 2018-03-15 21:45   ` Alexei Starovoitov
  2018-03-15 21:59     ` John Fastabend
  2 siblings, 1 reply; 49+ messages in thread
From: Alexei Starovoitov @ 2018-03-15 21:45 UTC (permalink / raw)
  To: John Fastabend; +Cc: davem, ast, daniel, davejwatson, netdev

On Mon, Mar 12, 2018 at 12:23:34PM -0700, John Fastabend wrote:
> A single sendmsg or sendfile system call can contain multiple logical
> messages that a BPF program may want to read and apply a verdict. But,
> without an apply_bytes helper any verdict on the data applies to all
> bytes in the sendmsg/sendfile. Alternatively, a BPF program may only
> care to read the first N bytes of a msg. If the payload is large say
> MB or even GB setting up and calling the BPF program repeatedly for
> all bytes, even though the verdict is already known, creates
> unnecessary overhead.
> 
> To allow BPF programs to control how many bytes a given verdict
> applies to we implement a bpf_msg_apply_bytes() helper. When called
> from within a BPF program this sets a counter, internal to the
> BPF infrastructure, that applies the last verdict to the next N
> bytes. If the N is smaller than the current data being processed
> from a sendmsg/sendfile call, the first N bytes will be sent and
> the BPF program will be re-run with start_data pointing to the N+1
> byte. If N is larger than the current data being processed the
> BPF verdict will be applied to multiple sendmsg/sendfile calls
> until N bytes are consumed.
> 
> Note1 if a socket closes with apply_bytes counter non-zero this
> is not a problem because data is not being buffered for N bytes
> and is sent as it's received.
> 
> Note2 if this is operating in the sendpage context the data
> pointers may be zeroed after this call if the apply walks beyond
> a msg_pull_data() call specified data range. (helper implemented
> shortly in this series).

instead of 'shortly in this series' you meant 'implemented earlier'?
patch 5 handles it, but it's set here, right?

The semantics of the helper looks great.
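
[Editor's note: given those semantics, a minimal SK_MSG program using
the helper might look like the sketch below; the 4096-byte count,
names and include path are illustrative assumptions.]

  #include <linux/bpf.h>
  #include "bpf_helpers.h"

  SEC("sk_msg")
  int msg_apply_prog(struct sk_msg_md *msg)
  {
          /* Apply the verdict returned below to the next 4096 bytes,
           * across sendmsg/sendfile calls, instead of re-running the
           * program for every chunk.
           */
          bpf_msg_apply_bytes(msg, 4096);
          return SK_PASS;
  }

  char _license[] SEC("license") = "GPL";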


* Re: [bpf-next PATCH v2 06/18] bpf: sockmap, add bpf_msg_apply_bytes() helper
  2018-03-15 21:45   ` Alexei Starovoitov
@ 2018-03-15 21:59     ` John Fastabend
  0 siblings, 0 replies; 49+ messages in thread
From: John Fastabend @ 2018-03-15 21:59 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: davem, ast, daniel, davejwatson, netdev

On 03/15/2018 02:45 PM, Alexei Starovoitov wrote:
> On Mon, Mar 12, 2018 at 12:23:34PM -0700, John Fastabend wrote:
>> A single sendmsg or sendfile system call can contain multiple logical
>> messages that a BPF program may want to read and apply a verdict. But,
>> without an apply_bytes helper any verdict on the data applies to all
>> bytes in the sendmsg/sendfile. Alternatively, a BPF program may only
>> care to read the first N bytes of a msg. If the payload is large say
>> MB or even GB setting up and calling the BPF program repeatedly for
>> all bytes, even though the verdict is already known, creates
>> unnecessary overhead.
>>
>> To allow BPF programs to control how many bytes a given verdict
>> applies to we implement a bpf_msg_apply_bytes() helper. When called
>> from within a BPF program this sets a counter, internal to the
>> BPF infrastructure, that applies the last verdict to the next N
>> bytes. If the N is smaller than the current data being processed
>> from a sendmsg/sendfile call, the first N bytes will be sent and
>> the BPF program will be re-run with start_data pointing to the N+1
>> byte. If N is larger than the current data being processed the
>> BPF verdict will be applied to multiple sendmsg/sendfile calls
>> until N bytes are consumed.
>>
>> Note1 if a socket closes with apply_bytes counter non-zero this
>> is not a problem because data is not being buffered for N bytes
>> and is sent as it's received.
>>
>> Note2 if this is operating in the sendpage context the data
>> pointers may be zeroed after this call if the apply walks beyond
>> a msg_pull_data() call specified data range. (helper implemented
>> shortly in this series).
> 
> instead of 'shortly in this seris' you meant 'implemented earlier'?
> patch 5 handles it, but it's set here, right?
> 

Yep, just a hold-over from an earlier patch description. I'll remove
that entire Note2 and fix up a couple of small things Daniel noticed
in a v3.

> The semantics of the helper looks great.
> 

Great!


* Re: [bpf-next PATCH v2 05/18] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data
  2018-03-12 19:23 ` [bpf-next PATCH v2 05/18] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data John Fastabend
  2018-03-15 18:41   ` David Miller
@ 2018-03-15 21:59   ` Alexei Starovoitov
  2018-03-15 22:08     ` John Fastabend
  2018-03-15 22:17     ` Daniel Borkmann
  1 sibling, 2 replies; 49+ messages in thread
From: Alexei Starovoitov @ 2018-03-15 21:59 UTC (permalink / raw)
  To: John Fastabend; +Cc: davem, ast, daniel, davejwatson, netdev

On Mon, Mar 12, 2018 at 12:23:29PM -0700, John Fastabend wrote:
>  
> +/* User return codes for SK_MSG prog type. */
> +enum sk_msg_action {
> +	SK_MSG_DROP = 0,
> +	SK_MSG_PASS,
> +};

do we really need new enum here?
It's the same as 'enum sk_action' and SK_DROP == SK_MSG_DROP
and there will be only drop/pass in both enums.
Also I don't see where these two new SK_MSG_* are used...
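
[Editor's note: the existing uapi enum being referred to, which
already covers both return codes:]

  /* from include/uapi/linux/bpf.h */
  enum sk_action {
          SK_DROP = 0,
          SK_PASS,
  };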

> +
> +/* user accessible metadata for SK_MSG packet hook, new fields must
> + * be added to the end of this structure
> + */
> +struct sk_msg_md {
> +	__u32 data;
> +	__u32 data_end;
> +};

I think it's time for me to ask for forgiveness :)
I used __u32 for data and data_end only because all other fields
in __sk_buff were __u32 at the time and I couldn't easily figure out
how to teach verifier to recognize 8-byte rewrites.
Unfortunately my mistake stuck and was copied over into xdp.
Since this is new struct let's do it right and add
'void *data, *data_end' here,
since bpf prog will use them as 'void *' pointers.
There are no compat issues here, since bpf is always 64-bit.

> +static int bpf_map_msg_verdict(int _rc, struct sk_msg_buff *md)
> +{
> +	return ((_rc == SK_PASS) ?
> +	       (md->map ? __SK_REDIRECT : __SK_PASS) :
> +	       __SK_DROP);

you're using old SK_PASS here too ;)
that's to my point of not adding SK_MSG_PASS...

Overall the patch set looks absolutely great.
Thank you for working on it.


* Re: [bpf-next PATCH v2 06/18] bpf: sockmap, add bpf_msg_apply_bytes() helper
  2018-03-15 20:32   ` Daniel Borkmann
@ 2018-03-15 22:02     ` John Fastabend
  0 siblings, 0 replies; 49+ messages in thread
From: John Fastabend @ 2018-03-15 22:02 UTC (permalink / raw)
  To: Daniel Borkmann, davem, ast, davejwatson; +Cc: netdev

On 03/15/2018 01:32 PM, Daniel Borkmann wrote:
> On 03/12/2018 08:23 PM, John Fastabend wrote:
>> A single sendmsg or sendfile system call can contain multiple logical
>> messages that a BPF program may want to read and apply a verdict. But,
>> without an apply_bytes helper any verdict on the data applies to all
>> bytes in the sendmsg/sendfile. Alternatively, a BPF program may only
>> care to read the first N bytes of a msg. If the payload is large say
>> MB or even GB setting up and calling the BPF program repeatedly for
>> all bytes, even though the verdict is already known, creates
>> unnecessary overhead.
>>
>> To allow BPF programs to control how many bytes a given verdict
>> applies to we implement a bpf_msg_apply_bytes() helper. When called
>> from within a BPF program this sets a counter, internal to the
>> BPF infrastructure, that applies the last verdict to the next N
>> bytes. If the N is smaller than the current data being processed
>> from a sendmsg/sendfile call, the first N bytes will be sent and
>> the BPF program will be re-run with start_data pointing to the N+1
>> byte. If N is larger than the current data being processed the
>> BPF verdict will be applied to multiple sendmsg/sendfile calls
>> until N bytes are consumed.
>>
>> Note1 if a socket closes with apply_bytes counter non-zero this
>> is not a problem because data is not being buffered for N bytes
>> and is sent as it's received.
>>
>> Note2 if this is operating in the sendpage context the data
>> pointers may be zeroed after this call if the apply walks beyond
>> a msg_pull_data() call specified data range. (helper implemented
>> shortly in this series).
>>
>> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
>> ---
>>  include/uapi/linux/bpf.h |    3 ++-
>>  net/core/filter.c        |   16 ++++++++++++++++
>>  2 files changed, 18 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index b8275f0..e50c61f 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -769,7 +769,8 @@ enum bpf_attach_type {
>>  	FN(getsockopt),			\
>>  	FN(override_return),		\
>>  	FN(sock_ops_cb_flags_set),	\
>> -	FN(msg_redirect_map),
>> +	FN(msg_redirect_map),		\
>> +	FN(msg_apply_bytes),
>>  
>>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>>   * function eBPF program intends to call
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index 314c311..df2a8f4 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -1928,6 +1928,20 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
>>  	.arg4_type      = ARG_ANYTHING,
>>  };
>>  
>> +BPF_CALL_2(bpf_msg_apply_bytes, struct sk_msg_buff *, msg, u64, bytes)
>> +{
>> +	msg->apply_bytes = bytes;
> 
> Here in bpf_msg_apply_bytes() but also in bpf_msg_cork_bytes() the signature
> is u64, but in struct sk_msg_buff and struct smap_psock it's of type int, so
> a large user-provided u64 will make these negative. Is there a reason to
> allow a negative value here rather than using u32 everywhere?
> 

Nope, no reason for negative values; we can make it consistently
u32.


* Re: [bpf-next PATCH v2 15/18] bpf: sockmap sample support for bpf_msg_cork_bytes()
  2018-03-15 20:15   ` Alexei Starovoitov
@ 2018-03-15 22:04     ` John Fastabend
  0 siblings, 0 replies; 49+ messages in thread
From: John Fastabend @ 2018-03-15 22:04 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: davem, ast, daniel, davejwatson, netdev

On 03/15/2018 01:15 PM, Alexei Starovoitov wrote:
> On Mon, Mar 12, 2018 at 12:24:21PM -0700, John Fastabend wrote:
>> Add sample application support for the bpf_msg_cork_bytes helper. This
>> lets the user specify how many bytes each verdict should apply to.
>>
>> Similar to apply_bytes() tests these can be run as a stand-alone test
>> when used without other options or inline with other tests by using
>> the txmsg_cork option along with any of the basic tests txmsg,
>> txmsg_redir, txmsg_drop.
>>
>> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
>> ---
>>  include/uapi/linux/bpf_common.h           |    7 ++--
>>  samples/sockmap/sockmap_kern.c            |   53 +++++++++++++++++++++++++----
>>  samples/sockmap/sockmap_user.c            |   19 ++++++++++
>>  tools/include/uapi/linux/bpf.h            |    3 +-
>>  tools/testing/selftests/bpf/bpf_helpers.h |    2 +
>>  5 files changed, 71 insertions(+), 13 deletions(-)
>>
>> diff --git a/include/uapi/linux/bpf_common.h b/include/uapi/linux/bpf_common.h
>> index ee97668..18be907 100644
>> --- a/include/uapi/linux/bpf_common.h
>> +++ b/include/uapi/linux/bpf_common.h
>> @@ -15,10 +15,9 @@
>>  
>>  /* ld/ldx fields */
>>  #define BPF_SIZE(code)  ((code) & 0x18)
>> -#define		BPF_W		0x00 /* 32-bit */
>> -#define		BPF_H		0x08 /* 16-bit */
>> -#define		BPF_B		0x10 /*  8-bit */
>> -/* eBPF		BPF_DW		0x18    64-bit */
>> +#define		BPF_W		0x00
>> +#define		BPF_H		0x08
>> +#define		BPF_B		0x10
> 
> this hunk seems wrong here. Botched rebase?
> 

Yep, this hunk has nothing to do with my work, so I will
remove it.


* Re: [bpf-next PATCH v2 05/18] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data
  2018-03-15 21:59   ` Alexei Starovoitov
@ 2018-03-15 22:08     ` John Fastabend
  2018-03-15 22:17     ` Daniel Borkmann
  1 sibling, 0 replies; 49+ messages in thread
From: John Fastabend @ 2018-03-15 22:08 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: davem, ast, daniel, davejwatson, netdev

On 03/15/2018 02:59 PM, Alexei Starovoitov wrote:
> On Mon, Mar 12, 2018 at 12:23:29PM -0700, John Fastabend wrote:
>>  
>> +/* User return codes for SK_MSG prog type. */
>> +enum sk_msg_action {
>> +	SK_MSG_DROP = 0,
>> +	SK_MSG_PASS,
>> +};
> 
> do we really need new enum here?

Nope, and as you noticed, the actual code uses the
SK_{DROP|PASS} enum. Will remove this.

> It's the same as 'enum sk_action' and SK_DROP == SK_MSG_DROP
> and there will be only drop/pass in both enums.
> Also I don't see where these two new SK_MSG_* are used...
> 
>> +
>> +/* user accessible metadata for SK_MSG packet hook, new fields must
>> + * be added to the end of this structure
>> + */
>> +struct sk_msg_md {
>> +	__u32 data;
>> +	__u32 data_end;
>> +};
> 
> I think it's time for me to ask for forgiveness :)
> I used __u32 for data and data_end only because all other fields
> in __sk_buff were __u32 at the time and I couldn't easily figure out
> how to teach verifier to recognize 8-byte rewrites.
> Unfortunately my mistake stuck and was copied over into xdp.
> Since this is new struct let's do it right and add
> 'void *data, *data_end' here,
> since bpf prog will use them as 'void *' pointers.
> There are no compat issues here, since bpf is always 64-bit.
> 

Aha, nice catch. Yep, let's use 'void *' here. I had forgotten about
that discussion and copied them here as well.
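
[Editor's note: the layout agreed on here would look roughly like the
sketch below. This is the v3 direction discussed in this thread, not
necessarily the final uapi, and the snippet assumes headers that
already carry the 'void *' layout.]

  /* sketch of the revised uapi struct:
   *
   *   struct sk_msg_md {
   *           void *data;
   *           void *data_end;
   *   };
   *
   * and the usual verifier-mandated bounds check when using it:
   */
  #include <linux/bpf.h>
  #include "bpf_helpers.h"

  SEC("sk_msg")
  int msg_prog(struct sk_msg_md *msg)
  {
          void *data = msg->data;
          void *data_end = msg->data_end;

          if (data + 2 > data_end)   /* explicit bounds check required */
                  return SK_DROP;
          return SK_PASS;
  }

  char _license[] SEC("license") = "GPL";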

>> +static int bpf_map_msg_verdict(int _rc, struct sk_msg_buff *md)
>> +{
>> +	return ((_rc == SK_PASS) ?
>> +	       (md->map ? __SK_REDIRECT : __SK_PASS) :
>> +	       __SK_DROP);
> 
> you're using old SK_PASS here too ;)
> that's to my point of not adding SK_MSG_PASS...
> 

+1

> Overall the patch set looks absolutely great.
> Thank you for working on it.
> 

I'll fix up a few of these small things now and should have
a v3 shortly.


* Re: [bpf-next PATCH v2 05/18] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data
  2018-03-15 21:59   ` Alexei Starovoitov
  2018-03-15 22:08     ` John Fastabend
@ 2018-03-15 22:17     ` Daniel Borkmann
  2018-03-15 22:20       ` Alexei Starovoitov
  1 sibling, 1 reply; 49+ messages in thread
From: Daniel Borkmann @ 2018-03-15 22:17 UTC (permalink / raw)
  To: Alexei Starovoitov, John Fastabend; +Cc: davem, ast, davejwatson, netdev

On 03/15/2018 10:59 PM, Alexei Starovoitov wrote:
> On Mon, Mar 12, 2018 at 12:23:29PM -0700, John Fastabend wrote:
>>  
>> +/* User return codes for SK_MSG prog type. */
>> +enum sk_msg_action {
>> +	SK_MSG_DROP = 0,
>> +	SK_MSG_PASS,
>> +};
> 
> do we really need new enum here?
> It's the same as 'enum sk_action' and SK_DROP == SK_MSG_DROP
> and there will be only drop/pass in both enums.
> Also I don't see where these two new SK_MSG_* are used...
> 
>> +
>> +/* user accessible metadata for SK_MSG packet hook, new fields must
>> + * be added to the end of this structure
>> + */
>> +struct sk_msg_md {
>> +	__u32 data;
>> +	__u32 data_end;
>> +};
> 
> I think it's time for me to ask for forgiveness :)

:-)

> I used __u32 for data and data_end only because all other fields
> in __sk_buff were __u32 at the time and I couldn't easily figure out
> how to teach verifier to recognize 8-byte rewrites.
> Unfortunately my mistake stuck and was copied over into xdp.
> Since this is new struct let's do it right and add
> 'void *data, *data_end' here,
> since bpf prog will use them as 'void *' pointers.
> There are no compat issues here, since bpf is always 64-bit.

But at least offset-wise when you do the ctx rewrite this would then
be a bit more tricky when you have 64 bit kernel with 32 bit user
space since void * members are in each case at a different offset. So
unless I'm missing something, this still should either be __u32 or
__u64 instead of void *, no?

>> +static int bpf_map_msg_verdict(int _rc, struct sk_msg_buff *md)
>> +{
>> +	return ((_rc == SK_PASS) ?
>> +	       (md->map ? __SK_REDIRECT : __SK_PASS) :
>> +	       __SK_DROP);
> 
> you're using old SK_PASS here too ;)
> that's to my point of not adding SK_MSG_PASS...
> 
> Overall the patch set looks absolutely great.
> Thank you for working on it.

+1


* Re: [bpf-next PATCH v2 05/18] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data
  2018-03-15 22:17     ` Daniel Borkmann
@ 2018-03-15 22:20       ` Alexei Starovoitov
  2018-03-15 22:55         ` Daniel Borkmann
  0 siblings, 1 reply; 49+ messages in thread
From: Alexei Starovoitov @ 2018-03-15 22:20 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: John Fastabend, davem, ast, davejwatson, netdev

On Thu, Mar 15, 2018 at 11:17:12PM +0100, Daniel Borkmann wrote:
> On 03/15/2018 10:59 PM, Alexei Starovoitov wrote:
> > On Mon, Mar 12, 2018 at 12:23:29PM -0700, John Fastabend wrote:
> >>  
> >> +/* User return codes for SK_MSG prog type. */
> >> +enum sk_msg_action {
> >> +	SK_MSG_DROP = 0,
> >> +	SK_MSG_PASS,
> >> +};
> > 
> > do we really need new enum here?
> > It's the same as 'enum sk_action' and SK_DROP == SK_MSG_DROP
> > and there will be only drop/pass in both enums.
> > Also I don't see where these two new SK_MSG_* are used...
> > 
> >> +
> >> +/* user accessible metadata for SK_MSG packet hook, new fields must
> >> + * be added to the end of this structure
> >> + */
> >> +struct sk_msg_md {
> >> +	__u32 data;
> >> +	__u32 data_end;
> >> +};
> > 
> > I think it's time for me to ask for forgiveness :)
> 
> :-)
> 
> > I used __u32 for data and data_end only because all other fields
> > in __sk_buff were __u32 at the time and I couldn't easily figure out
> > how to teach verifier to recognize 8-byte rewrites.
> > Unfortunately my mistake stuck and was copied over into xdp.
> > Since this is new struct let's do it right and add
> > 'void *data, *data_end' here,
> > since bpf prog will use them as 'void *' pointers.
> > There are no compat issues here, since bpf is always 64-bit.
> 
> But at least offset-wise when you do the ctx rewrite this would then
> be a bit more tricky when you have 64 bit kernel with 32 bit user
> space since void * members are in each case at a different offset. So
> unless I'm missing something, this still should either be __u32 or
> __u64 instead of void *, no?

there is no 32-bit user space. these structs are seen by bpf progs only
and bpf is 64-bit only too.
unless I'm missing your point.


* Re: [bpf-next PATCH v2 05/18] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data
  2018-03-15 22:20       ` Alexei Starovoitov
@ 2018-03-15 22:55         ` Daniel Borkmann
  2018-03-15 23:06           ` Alexei Starovoitov
  0 siblings, 1 reply; 49+ messages in thread
From: Daniel Borkmann @ 2018-03-15 22:55 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: John Fastabend, davem, ast, davejwatson, netdev

On 03/15/2018 11:20 PM, Alexei Starovoitov wrote:
> On Thu, Mar 15, 2018 at 11:17:12PM +0100, Daniel Borkmann wrote:
>> On 03/15/2018 10:59 PM, Alexei Starovoitov wrote:
>>> On Mon, Mar 12, 2018 at 12:23:29PM -0700, John Fastabend wrote:
>>>>  
>>>> +/* User return codes for SK_MSG prog type. */
>>>> +enum sk_msg_action {
>>>> +	SK_MSG_DROP = 0,
>>>> +	SK_MSG_PASS,
>>>> +};
>>>
>>> do we really need new enum here?
>>> It's the same as 'enum sk_action' and SK_DROP == SK_MSG_DROP
>>> and there will be only drop/pass in both enums.
>>> Also I don't see where these two new SK_MSG_* are used...
>>>
>>>> +
>>>> +/* user accessible metadata for SK_MSG packet hook, new fields must
>>>> + * be added to the end of this structure
>>>> + */
>>>> +struct sk_msg_md {
>>>> +	__u32 data;
>>>> +	__u32 data_end;
>>>> +};
>>>
>>> I think it's time for me to ask for forgiveness :)
>>
>> :-)
>>
>>> I used __u32 for data and data_end only because all other fields
>>> in __sk_buff were __u32 at the time and I couldn't easily figure out
>>> how to teach verifier to recognize 8-byte rewrites.
>>> Unfortunately my mistake stuck and was copied over into xdp.
>>> Since this is new struct let's do it right and add
>>> 'void *data, *data_end' here,
>>> since bpf prog will use them as 'void *' pointers.
>>> There are no compat issues here, since bpf is always 64-bit.
>>
>> But at least offset-wise when you do the ctx rewrite this would then
>> be a bit more tricky when you have 64 bit kernel with 32 bit user
>> space since void * members are in each case at a different offset. So
>> unless I'm missing something, this still should either be __u32 or
>> __u64 instead of void *, no?
> 
> there is no 32-bit user space. these structs are seen by bpf progs only
> and bpf is 64-bit only too.
> unless I'm missing your point.

Ok, so let's say you have a 32 bit LLVM binary and compile the prog where
you access md->data_end. Given the void * in the struct will that access
end up being BPF_W at ctx offset 4 or BPF_DW at ctx offset 8 from clang
perspective (iow, is the back end treating this special and always use
fixed BPF_DW in such case)? If not and it would be the first case with
offset 4, then we could have the case that underlying 64 bit kernel is
expecting ctx offset 8 for doing the md ctx conversion.


* Re: [bpf-next PATCH v2 05/18] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data
  2018-03-15 22:55         ` Daniel Borkmann
@ 2018-03-15 23:06           ` Alexei Starovoitov
  2018-03-16  0:37             ` Daniel Borkmann
  0 siblings, 1 reply; 49+ messages in thread
From: Alexei Starovoitov @ 2018-03-15 23:06 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: John Fastabend, davem, ast, davejwatson, netdev

On Thu, Mar 15, 2018 at 11:55:39PM +0100, Daniel Borkmann wrote:
> On 03/15/2018 11:20 PM, Alexei Starovoitov wrote:
> > On Thu, Mar 15, 2018 at 11:17:12PM +0100, Daniel Borkmann wrote:
> >> On 03/15/2018 10:59 PM, Alexei Starovoitov wrote:
> >>> On Mon, Mar 12, 2018 at 12:23:29PM -0700, John Fastabend wrote:
> >>>>  
> >>>> +/* User return codes for SK_MSG prog type. */
> >>>> +enum sk_msg_action {
> >>>> +	SK_MSG_DROP = 0,
> >>>> +	SK_MSG_PASS,
> >>>> +};
> >>>
> >>> do we really need new enum here?
> >>> It's the same as 'enum sk_action' and SK_DROP == SK_MSG_DROP
> >>> and there will be only drop/pass in both enums.
> >>> Also I don't see where these two new SK_MSG_* are used...
> >>>
> >>>> +
> >>>> +/* user accessible metadata for SK_MSG packet hook, new fields must
> >>>> + * be added to the end of this structure
> >>>> + */
> >>>> +struct sk_msg_md {
> >>>> +	__u32 data;
> >>>> +	__u32 data_end;
> >>>> +};
> >>>
> >>> I think it's time for me to ask for forgiveness :)
> >>
> >> :-)
> >>
> >>> I used __u32 for data and data_end only because all other fields
> >>> in __sk_buff were __u32 at the time and I couldn't easily figure out
> >>> how to teach verifier to recognize 8-byte rewrites.
> >>> Unfortunately my mistake stuck and was copied over into xdp.
> >>> Since this is new struct let's do it right and add
> >>> 'void *data, *data_end' here,
> >>> since bpf prog will use them as 'void *' pointers.
> >>> There are no compat issues here, since bpf is always 64-bit.
> >>
> >> But at least offset-wise when you do the ctx rewrite this would then
> >> be a bit more tricky when you have 64 bit kernel with 32 bit user
> >> space since void * members are in each case at a different offset. So
> >> unless I'm missing something, this still should either be __u32 or
> >> __u64 instead of void *, no?
> > 
> > there is no 32-bit user space. these structs are seen by bpf progs only
> > and bpf is 64-bit only too.
> > unless I'm missing your point.
> 
> Ok, so let's say you have a 32 bit LLVM binary and compile the prog where
> you access md->data_end. Given the void * in the struct will that access
> end up being BPF_W at ctx offset 4 or BPF_DW at ctx offset 8 from clang
> perspective (iow, is the back end treating this special and always use
> fixed BPF_DW in such case)? If not and it would be the first case with
> offset 4, then we could have the case that underlying 64 bit kernel is
> expecting ctx offset 8 for doing the md ctx conversion.

I'm still not quite following.
Whether llvm itself is a 32-bit binary, or an arm32 or sparc32 binary,
doesn't matter. It will produce the same 64-bit bpf code.
It will see a 'void *' deref from this struct and will emit DW.
Maybe the confusion is from the newly added -mattr=+alu32 flag?
That option doesn't change that sizeof(void *) == 8.
It only allows the backend to emit 32-bit alu insns.


* Re: [bpf-next PATCH v2 05/18] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data
  2018-03-15 23:06           ` Alexei Starovoitov
@ 2018-03-16  0:37             ` Daniel Borkmann
  2018-03-16 16:47               ` John Fastabend
  0 siblings, 1 reply; 49+ messages in thread
From: Daniel Borkmann @ 2018-03-16  0:37 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: John Fastabend, davem, ast, davejwatson, netdev

On 03/16/2018 12:06 AM, Alexei Starovoitov wrote:
> On Thu, Mar 15, 2018 at 11:55:39PM +0100, Daniel Borkmann wrote:
>> On 03/15/2018 11:20 PM, Alexei Starovoitov wrote:
>>> On Thu, Mar 15, 2018 at 11:17:12PM +0100, Daniel Borkmann wrote:
>>>> On 03/15/2018 10:59 PM, Alexei Starovoitov wrote:
>>>>> On Mon, Mar 12, 2018 at 12:23:29PM -0700, John Fastabend wrote:
>>>>>>  
>>>>>> +/* User return codes for SK_MSG prog type. */
>>>>>> +enum sk_msg_action {
>>>>>> +	SK_MSG_DROP = 0,
>>>>>> +	SK_MSG_PASS,
>>>>>> +};
>>>>>
>>>>> do we really need new enum here?
>>>>> It's the same as 'enum sk_action' and SK_DROP == SK_MSG_DROP
>>>>> and there will be only drop/pass in both enums.
>>>>> Also I don't see where these two new SK_MSG_* are used...
>>>>>
>>>>>> +
>>>>>> +/* user accessible metadata for SK_MSG packet hook, new fields must
>>>>>> + * be added to the end of this structure
>>>>>> + */
>>>>>> +struct sk_msg_md {
>>>>>> +	__u32 data;
>>>>>> +	__u32 data_end;
>>>>>> +};
>>>>>
>>>>> I think it's time for me to ask for forgiveness :)
>>>>
>>>> :-)
>>>>
>>>>> I used __u32 for data and data_end only because all other fields
>>>>> in __sk_buff were __u32 at the time and I couldn't easily figure out
>>>>> how to teach verifier to recognize 8-byte rewrites.
>>>>> Unfortunately my mistake stuck and was copied over into xdp.
>>>>> Since this is new struct let's do it right and add
>>>>> 'void *data, *data_end' here,
>>>>> since bpf prog will use them as 'void *' pointers.
>>>>> There are no compat issues here, since bpf is always 64-bit.
>>>>
>>>> But at least offset-wise when you do the ctx rewrite this would then
>>>> be a bit more tricky when you have 64 bit kernel with 32 bit user
>>>> space since void * members are in each case at a different offset. So
>>>> unless I'm missing something, this still should either be __u32 or
>>>> __u64 instead of void *, no?
>>>
>>> there is no 32-bit user space. these structs are seen by bpf progs only
>>> and bpf is 64-bit only too.
>>> unless I'm missing your point.
>>
>> Ok, so let's say you have a 32 bit LLVM binary and compile the prog where
>> you access md->data_end. Given the void * in the struct will that access
>> end up being BPF_W at ctx offset 4 or BPF_DW at ctx offset 8 from clang
>> perspective (iow, is the back end treating this special and always use
>> fixed BPF_DW in such case)? If not and it would be the first case with
>> offset 4, then we could have the case that underlying 64 bit kernel is
>> expecting ctx offset 8 for doing the md ctx conversion.
> 
> I'm still not quite following.
> Whether llvm itself is a 32-bit binary, or an arm32 or sparc32 binary,
> doesn't matter. It will produce the same 64-bit bpf code.
> It will see a 'void *' deref from this struct and will emit DW.
> Maybe the confusion is from the newly added -mattr=+alu32 flag?
> That option doesn't change that sizeof(void *) == 8.
> It only allows the backend to emit 32-bit alu insns.

Ok, so the conclusion we reached is that while the BPF target is
unconditionally 64 bit, it depends on which clang front end you use for
compilation wrt structs. E.g. a 32 bit native (e.g. arm) clang front end
would compile the ctx void * pointers as 4 byte, while clang -target bpf
would compile them as 8 byte. The native clang front end is needed in the
tracing case when accessing pt_regs for walking data structures, but not
for the networking use case, so always using -target bpf there is the
proper way. Meaning there would be no confusion on the void * since the
size will always be 8, regardless of the underlying arch being 32 or 64
bit or the clang/llvm binary being 32 bit on a 64 bit kernel. Thus,
sticking to void * would be fine, but samples/sockmap/Makefile definitely
must be fixed as well, such that people don't copy it wrongly.

Cheers,
Daniel
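
[Editor's note: a one-line way to check the point about the front end,
assuming the object file is built with clang -O2 -target bpf as the
conclusion above recommends:]

  /* Holds under -target bpf even on a 32-bit host; with a native
   * 32-bit target it would fail, which is exactly the ctx offset
   * mismatch discussed above.
   */
  _Static_assert(sizeof(void *) == 8, "BPF target pointers are 64-bit");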


* Re: [bpf-next PATCH v2 05/18] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data
  2018-03-16  0:37             ` Daniel Borkmann
@ 2018-03-16 16:47               ` John Fastabend
  0 siblings, 0 replies; 49+ messages in thread
From: John Fastabend @ 2018-03-16 16:47 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov; +Cc: davem, ast, davejwatson, netdev

On 03/15/2018 05:37 PM, Daniel Borkmann wrote:
> On 03/16/2018 12:06 AM, Alexei Starovoitov wrote:
>> On Thu, Mar 15, 2018 at 11:55:39PM +0100, Daniel Borkmann wrote:
>>> On 03/15/2018 11:20 PM, Alexei Starovoitov wrote:
>>>> On Thu, Mar 15, 2018 at 11:17:12PM +0100, Daniel Borkmann wrote:
>>>>> On 03/15/2018 10:59 PM, Alexei Starovoitov wrote:
>>>>>> On Mon, Mar 12, 2018 at 12:23:29PM -0700, John Fastabend wrote:
>>>>>>>  
>>>>>>> +/* User return codes for SK_MSG prog type. */
>>>>>>> +enum sk_msg_action {
>>>>>>> +	SK_MSG_DROP = 0,
>>>>>>> +	SK_MSG_PASS,
>>>>>>> +};
>>>>>>
>>>>>> do we really need new enum here?
>>>>>> It's the same as 'enum sk_action' and SK_DROP == SK_MSG_DROP
>>>>>> and there will be only drop/pass in both enums.
>>>>>> Also I don't see where these two new SK_MSG_* are used...
>>>>>>
>>>>>>> +
>>>>>>> +/* user accessible metadata for SK_MSG packet hook, new fields must
>>>>>>> + * be added to the end of this structure
>>>>>>> + */
>>>>>>> +struct sk_msg_md {
>>>>>>> +	__u32 data;
>>>>>>> +	__u32 data_end;
>>>>>>> +};
>>>>>>
>>>>>> I think it's time for me to ask for forgiveness :)
>>>>>
>>>>> :-)
>>>>>
>>>>>> I used __u32 for data and data_end only because all other fields
>>>>>> in __sk_buff were __u32 at the time and I couldn't easily figure out
>>>>>> how to teach verifier to recognize 8-byte rewrites.
>>>>>> Unfortunately my mistake stuck and was copied over into xdp.
>>>>>> Since this is new struct let's do it right and add
>>>>>> 'void *data, *data_end' here,
>>>>>> since bpf prog will use them as 'void *' pointers.
>>>>>> There are no compat issues here, since bpf is always 64-bit.
>>>>>
>>>>> But at least offset-wise when you do the ctx rewrite this would then
>>>>> be a bit more tricky when you have 64 bit kernel with 32 bit user
>>>>> space since void * members are in each case at a different offset. So
>>>>> unless I'm missing something, this still should either be __u32 or
>>>>> __u64 instead of void *, no?
>>>>
>>>> there is no 32-bit user space. these structs are seen by bpf progs only
>>>> and bpf is 64-bit only too.
>>>> unless I'm missing your point.
>>>
>>> Ok, so let's say you have a 32 bit LLVM binary and compile the prog where
>>> you access md->data_end. Given the void * in the struct will that access
>>> end up being BPF_W at ctx offset 4 or BPF_DW at ctx offset 8 from clang
>>> perspective (iow, is the back end treating this special and always use
>>> fixed BPF_DW in such case)? If not and it would be the first case with
>>> offset 4, then we could have the case that underlying 64 bit kernel is
>>> expecting ctx offset 8 for doing the md ctx conversion.
>>
>> I'm still not quite following.
>> Whether llvm itself is a 32-bit binary, or an arm32 or sparc32 binary,
>> doesn't matter. It will produce the same 64-bit bpf code.
>> It will see a 'void *' deref from this struct and will emit DW.
>> Maybe the confusion is from the newly added -mattr=+alu32 flag?
>> That option doesn't change that sizeof(void *) == 8.
>> It only allows the backend to emit 32-bit alu insns.
> 
> Ok, so the conclusion we reached is that while the BPF target is
> unconditionally 64 bit, it depends on which clang front end you use for
> compilation wrt structs. E.g. a 32 bit native (e.g. arm) clang front end
> would compile the ctx void * pointers as 4 byte, while clang -target bpf
> would compile them as 8 byte. The native clang front end is needed in the
> tracing case when accessing pt_regs for walking data structures, but not
> for the networking use case, so always using -target bpf there is the
> proper way. Meaning there would be no confusion on the void * since the
> size will always be 8, regardless of the underlying arch being 32 or 64
> bit or the clang/llvm binary being 32 bit on a 64 bit kernel. Thus,
> sticking to void * would be fine, but samples/sockmap/Makefile definitely
> must be fixed as well, such that people don't copy it wrongly.
> 
> Cheers,
> Daniel

I'll send a fix for sockmap/Makefile as a separate series, then, and
go ahead and change this series to use 'void *'.

Thanks for the follow-up on this.
 

