linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next v3 0/5] writable contexts for bpf raw tracepoints
@ 2019-04-19 21:04 Matt Mullins
  2019-04-19 21:04 ` [PATCH bpf-next v3 1/5] bpf: add writable context for " Matt Mullins
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Matt Mullins @ 2019-04-19 21:04 UTC (permalink / raw)
  To: hall, mmullins, ast, bpf, netdev
  Cc: linux-kernel, Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song

This adds an opt-in interface for tracepoints to expose a writable context to
BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE programs that are attached, while
supporting read-only access from existing BPF_PROG_TYPE_RAW_TRACEPOINT
programs, as well as from non-BPF-based tracepoints.

The initial motivation is to support tracing that can be observed from the
remote end of an NBD socket, e.g. by adding flags to the struct nbd_request
header.  Earlier attempts included adding an NBD-specific tracepoint fd, but in
code review, I was recommended to implement it more generically -- as a result,
this patchset is far simpler than my initial try.

v2->v3:
  * Andrew addressed Josef's comments:
    * C-style commenting in nbd.c
    * Collapsed identical events into a single DECLARE_EVENT_CLASS.
      This saves about 2kB of kernel text

v1->v2:
  * add selftests
    * sync tools/include/uapi/linux/bpf.h
  * reject variable offset into the buffer
  * add string representation of PTR_TO_TP_BUFFER to reg_type_str

Andrew Hall (1):
  nbd: add tracepoints for send/receive timing

Matt Mullins (4):
  bpf: add writable context for raw tracepoints
  nbd: trace sending nbd requests
  tools: sync bpf.h
  selftests: bpf: test writable buffers in raw tps

 MAINTAINERS                                   |   1 +
 drivers/block/nbd.c                           |  13 +++
 include/linux/bpf.h                           |   2 +
 include/linux/bpf_types.h                     |   1 +
 include/linux/tracepoint-defs.h               |   1 +
 include/trace/bpf_probe.h                     |  27 ++++-
 include/trace/events/bpf_test_run.h           |  50 ++++++++
 include/trace/events/nbd.h                    | 107 ++++++++++++++++++
 include/uapi/linux/bpf.h                      |   1 +
 kernel/bpf/syscall.c                          |   8 +-
 kernel/bpf/verifier.c                         |  31 +++++
 kernel/trace/bpf_trace.c                      |  21 ++++
 net/bpf/test_run.c                            |   4 +
 tools/include/uapi/linux/bpf.h                |   1 +
 tools/lib/bpf/libbpf.c                        |   1 +
 tools/lib/bpf/libbpf_probes.c                 |   1 +
 .../raw_tp_writable_reject_nbd_invalid.c      |  40 +++++++
 .../bpf/prog_tests/raw_tp_writable_test_run.c |  80 +++++++++++++
 .../selftests/bpf/verifier/raw_tp_writable.c  |  34 ++++++
 19 files changed, 420 insertions(+), 4 deletions(-)
 create mode 100644 include/trace/events/bpf_test_run.h
 create mode 100644 include/trace/events/nbd.h
 create mode 100644 tools/testing/selftests/bpf/prog_tests/raw_tp_writable_reject_nbd_invalid.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/raw_tp_writable_test_run.c
 create mode 100644 tools/testing/selftests/bpf/verifier/raw_tp_writable.c

-- 
2.17.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH bpf-next v3 1/5] bpf: add writable context for raw tracepoints
  2019-04-19 21:04 [PATCH bpf-next v3 0/5] writable contexts for bpf raw tracepoints Matt Mullins
@ 2019-04-19 21:04 ` Matt Mullins
  2019-04-22 18:12   ` Yonghong Song
  2019-04-19 21:04 ` [PATCH bpf-next v3 2/5] nbd: trace sending nbd requests Matt Mullins
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Matt Mullins @ 2019-04-19 21:04 UTC (permalink / raw)
  To: hall, mmullins, ast, bpf, netdev
  Cc: linux-kernel, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Steven Rostedt, Ingo Molnar

This is an opt-in interface that allows a tracepoint to provide a safe
buffer that can be written from a BPF_PROG_TYPE_RAW_TRACEPOINT program.
The size of the buffer must be a compile-time constant, and is checked
before allowing a BPF program to attach to a tracepoint that uses this
feature.

The pointer to this buffer will be the first argument of tracepoints
that opt in; the buffer is readable by both BPF_PROG_TYPE_RAW_TRACEPOINT
and BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE programs that attach to such a
tracepoint, but the buffer to which it points may only be written by the
latter.

Signed-off-by: Matt Mullins <mmullins@fb.com>
---
 include/linux/bpf.h             |  2 ++
 include/linux/bpf_types.h       |  1 +
 include/linux/tracepoint-defs.h |  1 +
 include/trace/bpf_probe.h       | 27 +++++++++++++++++++++++++--
 include/uapi/linux/bpf.h        |  1 +
 kernel/bpf/syscall.c            |  8 ++++++--
 kernel/bpf/verifier.c           | 31 +++++++++++++++++++++++++++++++
 kernel/trace/bpf_trace.c        | 21 +++++++++++++++++++++
 8 files changed, 88 insertions(+), 4 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index a2132e09dc1c..d3c71fd67476 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -263,6 +263,7 @@ enum bpf_reg_type {
 	PTR_TO_SOCK_COMMON_OR_NULL, /* reg points to sock_common or NULL */
 	PTR_TO_TCP_SOCK,	 /* reg points to struct tcp_sock */
 	PTR_TO_TCP_SOCK_OR_NULL, /* reg points to struct tcp_sock or NULL */
+	PTR_TO_TP_BUFFER,	 /* reg points to a writable raw tp's buffer */
 };
 
 /* The information passed from prog-specific *_is_valid_access
@@ -352,6 +353,7 @@ struct bpf_prog_aux {
 	u32 used_map_cnt;
 	u32 max_ctx_offset;
 	u32 max_pkt_offset;
+	u32 max_tp_access;
 	u32 stack_depth;
 	u32 id;
 	u32 func_cnt; /* used by non-func prog as the number of func progs */
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 08bf2f1fe553..c766108608cb 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -25,6 +25,7 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_KPROBE, kprobe)
 BPF_PROG_TYPE(BPF_PROG_TYPE_TRACEPOINT, tracepoint)
 BPF_PROG_TYPE(BPF_PROG_TYPE_PERF_EVENT, perf_event)
 BPF_PROG_TYPE(BPF_PROG_TYPE_RAW_TRACEPOINT, raw_tracepoint)
+BPF_PROG_TYPE(BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE, raw_tracepoint_writable)
 #endif
 #ifdef CONFIG_CGROUP_BPF
 BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_DEVICE, cg_dev)
diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h
index 49ba9cde7e4b..b29950a19205 100644
--- a/include/linux/tracepoint-defs.h
+++ b/include/linux/tracepoint-defs.h
@@ -45,6 +45,7 @@ struct bpf_raw_event_map {
 	struct tracepoint	*tp;
 	void			*bpf_func;
 	u32			num_args;
+	u32			writable_size;
 } __aligned(32);
 
 #endif
diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
index 505dae0bed80..d6e556c0a085 100644
--- a/include/trace/bpf_probe.h
+++ b/include/trace/bpf_probe.h
@@ -69,8 +69,7 @@ __bpf_trace_##call(void *__data, proto)					\
  * to make sure that if the tracepoint handling changes, the
  * bpf probe will fail to compile unless it too is updated.
  */
-#undef DEFINE_EVENT
-#define DEFINE_EVENT(template, call, proto, args)			\
+#define __DEFINE_EVENT(template, call, proto, args, size)		\
 static inline void bpf_test_probe_##call(void)				\
 {									\
 	check_trace_callback_type_##call(__bpf_trace_##template);	\
@@ -81,12 +80,36 @@ __bpf_trace_tp_map_##call = {						\
 	.tp		= &__tracepoint_##call,				\
 	.bpf_func	= (void *)__bpf_trace_##template,		\
 	.num_args	= COUNT_ARGS(args),				\
+	.writable_size	= size,						\
 };
 
+#define FIRST(x, ...) x
+
+#undef DEFINE_EVENT_WRITABLE
+#define DEFINE_EVENT_WRITABLE(template, call, proto, args, size)	\
+static inline void bpf_test_buffer_##call(void)				\
+{									\
+	/* BUILD_BUG_ON() is ignored if the code is completely eliminated, but \
+	 * BUILD_BUG_ON_ZERO() uses a different mechanism that is not	\
+	 * dead-code-eliminated.					\
+	 */								\
+	FIRST(proto);							\
+	(void)BUILD_BUG_ON_ZERO(size != sizeof(*FIRST(args)));		\
+}									\
+__DEFINE_EVENT(template, call, PARAMS(proto), PARAMS(args), size)
+
+#undef DEFINE_EVENT
+#define DEFINE_EVENT(template, call, proto, args)			\
+	__DEFINE_EVENT(template, call, PARAMS(proto), PARAMS(args), 0)
 
 #undef DEFINE_EVENT_PRINT
 #define DEFINE_EVENT_PRINT(template, name, proto, args, print)	\
 	DEFINE_EVENT(template, name, PARAMS(proto), PARAMS(args))
 
 #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
+
+#undef DEFINE_EVENT_WRITABLE
+#undef __DEFINE_EVENT
+#undef FIRST
+
 #endif /* CONFIG_BPF_EVENTS */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 3c38ac9a92a7..c5335d53ce82 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -166,6 +166,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_LIRC_MODE2,
 	BPF_PROG_TYPE_SK_REUSEPORT,
 	BPF_PROG_TYPE_FLOW_DISSECTOR,
+	BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE,
 };
 
 enum bpf_attach_type {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 62f6bced3a3c..27e2f22879a4 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1720,12 +1720,16 @@ static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
 	}
 	raw_tp->btp = btp;
 
-	prog = bpf_prog_get_type(attr->raw_tracepoint.prog_fd,
-				 BPF_PROG_TYPE_RAW_TRACEPOINT);
+	prog = bpf_prog_get(attr->raw_tracepoint.prog_fd);
 	if (IS_ERR(prog)) {
 		err = PTR_ERR(prog);
 		goto out_free_tp;
 	}
+	if (prog->type != BPF_PROG_TYPE_RAW_TRACEPOINT &&
+	    prog->type != BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE) {
+		err = -EINVAL;
+		goto out_put_prog;
+	}
 
 	err = bpf_probe_register(raw_tp->btp, prog);
 	if (err)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index ce166a002d16..32f706e99319 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -411,6 +411,7 @@ static const char * const reg_type_str[] = {
 	[PTR_TO_SOCK_COMMON_OR_NULL] = "sock_common_or_null",
 	[PTR_TO_TCP_SOCK]	= "tcp_sock",
 	[PTR_TO_TCP_SOCK_OR_NULL] = "tcp_sock_or_null",
+	[PTR_TO_TP_BUFFER]	= "tp_buffer",
 };
 
 static char slot_type_char[] = {
@@ -1958,6 +1959,32 @@ static int check_ctx_reg(struct bpf_verifier_env *env,
 	return 0;
 }
 
+static int check_tp_buffer_access(struct bpf_verifier_env *env,
+				  const struct bpf_reg_state *reg,
+				  int regno, int off, int size)
+{
+	if (off < 0) {
+		verbose(env,
+			"R%d invalid tracepoint buffer access: off=%d, size=%d",
+			regno, off, size);
+		return -EACCES;
+	}
+	if (!tnum_is_const(reg->var_off) || reg->var_off.value) {
+		char tn_buf[48];
+
+		tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off);
+		verbose(env,
+			"R%d invalid variable buffer offset: off=%d, var_off=%s",
+			regno, off, tn_buf);
+		return -EACCES;
+	}
+	if (off + size > env->prog->aux->max_tp_access)
+		env->prog->aux->max_tp_access = off + size;
+
+	return 0;
+}
+
+
 /* truncate register to smaller size (in bytes)
  * must be called with size < BPF_REG_SIZE
  */
@@ -2100,6 +2127,10 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 		err = check_sock_access(env, insn_idx, regno, off, size, t);
 		if (!err && value_regno >= 0)
 			mark_reg_unknown(env, regs, value_regno);
+	} else if (reg->type == PTR_TO_TP_BUFFER) {
+		err = check_tp_buffer_access(env, reg, regno, off, size);
+		if (!err && t == BPF_READ && value_regno >= 0)
+			mark_reg_unknown(env, regs, value_regno);
 	} else {
 		verbose(env, "R%d invalid mem access '%s'\n", regno,
 			reg_type_str[reg->type]);
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index d64c00afceb5..a2dd79dc6871 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -909,6 +909,24 @@ const struct bpf_verifier_ops raw_tracepoint_verifier_ops = {
 const struct bpf_prog_ops raw_tracepoint_prog_ops = {
 };
 
+static bool raw_tp_writable_prog_is_valid_access(int off, int size,
+						 enum bpf_access_type type,
+						 const struct bpf_prog *prog,
+						 struct bpf_insn_access_aux *info)
+{
+	if (off == 0 && size == sizeof(u64))
+		info->reg_type = PTR_TO_TP_BUFFER;
+	return raw_tp_prog_is_valid_access(off, size, type, prog, info);
+}
+
+const struct bpf_verifier_ops raw_tracepoint_writable_verifier_ops = {
+	.get_func_proto  = raw_tp_prog_func_proto,
+	.is_valid_access = raw_tp_writable_prog_is_valid_access,
+};
+
+const struct bpf_prog_ops raw_tracepoint_writable_prog_ops = {
+};
+
 static bool pe_prog_is_valid_access(int off, int size, enum bpf_access_type type,
 				    const struct bpf_prog *prog,
 				    struct bpf_insn_access_aux *info)
@@ -1198,6 +1216,9 @@ static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *
 	if (prog->aux->max_ctx_offset > btp->num_args * sizeof(u64))
 		return -EINVAL;
 
+	if (prog->aux->max_tp_access > btp->writable_size)
+		return -EINVAL;
+
 	return tracepoint_probe_register(tp, (void *)btp->bpf_func, prog);
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH bpf-next v3 2/5] nbd: trace sending nbd requests
  2019-04-19 21:04 [PATCH bpf-next v3 0/5] writable contexts for bpf raw tracepoints Matt Mullins
  2019-04-19 21:04 ` [PATCH bpf-next v3 1/5] bpf: add writable context for " Matt Mullins
@ 2019-04-19 21:04 ` Matt Mullins
  2019-04-19 21:16   ` Josef Bacik
  2019-04-19 21:04 ` [PATCH bpf-next v3 3/5] nbd: add tracepoints for send/receive timing Matt Mullins
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Matt Mullins @ 2019-04-19 21:04 UTC (permalink / raw)
  To: hall, mmullins, ast, bpf, netdev
  Cc: linux-kernel, Josef Bacik, Jens Axboe, Steven Rostedt,
	Ingo Molnar, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, David S. Miller, Mauro Carvalho Chehab,
	Greg Kroah-Hartman, Nicolas Ferre, linux-block, nbd

This adds a tracepoint that can both observe the nbd request being sent
to the server, as well as modify that request , e.g., setting a flag in
the request that will cause the server to collect detailed tracing data.

The struct request * being handled is included to permit correlation
with the block tracepoints.

Signed-off-by: Matt Mullins <mmullins@fb.com>
---
 MAINTAINERS                |  1 +
 drivers/block/nbd.c        |  5 ++++
 include/trace/events/nbd.h | 56 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 62 insertions(+)
 create mode 100644 include/trace/events/nbd.h

diff --git a/MAINTAINERS b/MAINTAINERS
index e17ebf70b548..6db583d2b0ea 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10737,6 +10737,7 @@ L:	linux-block@vger.kernel.org
 L:	nbd@other.debian.org
 F:	Documentation/blockdev/nbd.txt
 F:	drivers/block/nbd.c
+F:	include/trace/events/nbd.h
 F:	include/uapi/linux/nbd.h
 
 NETWORK DROP MONITOR
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 90ba9f4c03f3..7393d04d255c 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -44,6 +44,9 @@
 #include <linux/nbd-netlink.h>
 #include <net/genetlink.h>
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/nbd.h>
+
 static DEFINE_IDR(nbd_index_idr);
 static DEFINE_MUTEX(nbd_index_mutex);
 static int nbd_total_devices = 0;
@@ -526,6 +529,8 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd, int index)
 	handle = nbd_cmd_handle(cmd);
 	memcpy(request.handle, &handle, sizeof(handle));
 
+	trace_nbd_send_request(&request, nbd->index, blk_mq_rq_from_pdu(cmd));
+
 	dev_dbg(nbd_to_dev(nbd), "request %p: sending control (%s@%llu,%uB)\n",
 		req, nbdcmd_to_ascii(type),
 		(unsigned long long)blk_rq_pos(req) << 9, blk_rq_bytes(req));
diff --git a/include/trace/events/nbd.h b/include/trace/events/nbd.h
new file mode 100644
index 000000000000..5928255ed02e
--- /dev/null
+++ b/include/trace/events/nbd.h
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM nbd
+
+#if !defined(_TRACE_NBD_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_NBD_H
+
+#include <linux/tracepoint.h>
+
+DECLARE_EVENT_CLASS(nbd_send_request,
+
+	TP_PROTO(struct nbd_request *nbd_request, int index,
+		 struct request *rq),
+
+	TP_ARGS(nbd_request, index, rq),
+
+	TP_STRUCT__entry(
+		__field(struct nbd_request *, nbd_request)
+		__field(u64, dev_index)
+		__field(struct request *, request)
+	),
+
+	TP_fast_assign(
+		__entry->nbd_request = 0;
+		__entry->dev_index = index;
+		__entry->request = rq;
+	),
+
+	TP_printk("nbd%lld: request %p", __entry->dev_index, __entry->request)
+);
+
+#ifdef DEFINE_EVENT_WRITABLE
+#undef NBD_DEFINE_EVENT
+#define NBD_DEFINE_EVENT(template, call, proto, args, size)		\
+	DEFINE_EVENT_WRITABLE(template, call, PARAMS(proto),		\
+			      PARAMS(args), size)
+#else
+#undef NBD_DEFINE_EVENT
+#define NBD_DEFINE_EVENT(template, call, proto, args, size)		\
+	DEFINE_EVENT(template, call, PARAMS(proto), PARAMS(args))
+#endif
+
+NBD_DEFINE_EVENT(nbd_send_request, nbd_send_request,
+
+	TP_PROTO(struct nbd_request *nbd_request, int index,
+		 struct request *rq),
+
+	TP_ARGS(nbd_request, index, rq),
+
+	sizeof(struct nbd_request)
+);
+
+#endif
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH bpf-next v3 3/5] nbd: add tracepoints for send/receive timing
  2019-04-19 21:04 [PATCH bpf-next v3 0/5] writable contexts for bpf raw tracepoints Matt Mullins
  2019-04-19 21:04 ` [PATCH bpf-next v3 1/5] bpf: add writable context for " Matt Mullins
  2019-04-19 21:04 ` [PATCH bpf-next v3 2/5] nbd: trace sending nbd requests Matt Mullins
@ 2019-04-19 21:04 ` Matt Mullins
  2019-04-19 21:16   ` Josef Bacik
  2019-04-19 21:04 ` [PATCH bpf-next v3 4/5] tools: sync bpf.h Matt Mullins
  2019-04-19 21:04 ` [PATCH bpf-next v3 5/5] selftests: bpf: test writable buffers in raw tps Matt Mullins
  4 siblings, 1 reply; 16+ messages in thread
From: Matt Mullins @ 2019-04-19 21:04 UTC (permalink / raw)
  To: hall, mmullins, ast, bpf, netdev
  Cc: linux-kernel, Josef Bacik, Jens Axboe, Steven Rostedt,
	Ingo Molnar, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, linux-block, nbd

From: Andrew Hall <hall@fb.com>

This adds four tracepoints to nbd, enabling separate tracing of payload
and header sending/receipt.

In the send path for headers that have already been sent, we also
explicitly initialize the handle so it can be referenced by the later
tracepoint.

Signed-off-by: Andrew Hall <hall@fb.com>
---
 drivers/block/nbd.c        |  8 ++++++
 include/trace/events/nbd.h | 51 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 7393d04d255c..cb5ae6711ee0 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -513,6 +513,10 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd, int index)
 	if (sent) {
 		if (sent >= sizeof(request)) {
 			skip = sent - sizeof(request);
+
+			/* initialize handle for tracing purposes */
+			handle = nbd_cmd_handle(cmd);
+
 			goto send_pages;
 		}
 		iov_iter_advance(&from, sent);
@@ -536,6 +540,7 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd, int index)
 		(unsigned long long)blk_rq_pos(req) << 9, blk_rq_bytes(req));
 	result = sock_xmit(nbd, index, 1, &from,
 			(type == NBD_CMD_WRITE) ? MSG_MORE : 0, &sent);
+	trace_nbd_header_sent(req, handle);
 	if (result <= 0) {
 		if (was_interrupted(result)) {
 			/* If we havne't sent anything we can just return BUSY,
@@ -608,6 +613,7 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd, int index)
 		bio = next;
 	}
 out:
+	trace_nbd_payload_sent(req, handle);
 	nsock->pending = NULL;
 	nsock->sent = 0;
 	return 0;
@@ -655,6 +661,7 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 			tag, req);
 		return ERR_PTR(-ENOENT);
 	}
+	trace_nbd_header_received(req, handle);
 	cmd = blk_mq_rq_to_pdu(req);
 
 	mutex_lock(&cmd->lock);
@@ -708,6 +715,7 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 		}
 	}
 out:
+	trace_nbd_payload_received(req, handle);
 	mutex_unlock(&cmd->lock);
 	return ret ? ERR_PTR(ret) : cmd;
 }
diff --git a/include/trace/events/nbd.h b/include/trace/events/nbd.h
index 5928255ed02e..9849956f34d8 100644
--- a/include/trace/events/nbd.h
+++ b/include/trace/events/nbd.h
@@ -7,6 +7,57 @@
 
 #include <linux/tracepoint.h>
 
+DECLARE_EVENT_CLASS(nbd_transport_event,
+
+	TP_PROTO(struct request *req, u64 handle),
+
+	TP_ARGS(req, handle),
+
+	TP_STRUCT__entry(
+		__field(struct request *, req)
+		__field(u64, handle)
+	),
+
+	TP_fast_assign(
+		__entry->req = req;
+		__entry->handle = handle;
+	),
+
+	TP_printk(
+		"nbd transport event: request %p, handle 0x%016llx",
+		__entry->req,
+		__entry->handle
+	)
+);
+
+DEFINE_EVENT(nbd_transport_event, nbd_header_sent,
+
+	TP_PROTO(struct request *req, u64 handle),
+
+	TP_ARGS(req, handle)
+);
+
+DEFINE_EVENT(nbd_transport_event, nbd_payload_sent,
+
+	TP_PROTO(struct request *req, u64 handle),
+
+	TP_ARGS(req, handle)
+);
+
+DEFINE_EVENT(nbd_transport_event, nbd_header_received,
+
+	TP_PROTO(struct request *req, u64 handle),
+
+	TP_ARGS(req, handle)
+);
+
+DEFINE_EVENT(nbd_transport_event, nbd_payload_received,
+
+	TP_PROTO(struct request *req, u64 handle),
+
+	TP_ARGS(req, handle)
+);
+
 DECLARE_EVENT_CLASS(nbd_send_request,
 
 	TP_PROTO(struct nbd_request *nbd_request, int index,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH bpf-next v3 4/5] tools: sync bpf.h
  2019-04-19 21:04 [PATCH bpf-next v3 0/5] writable contexts for bpf raw tracepoints Matt Mullins
                   ` (2 preceding siblings ...)
  2019-04-19 21:04 ` [PATCH bpf-next v3 3/5] nbd: add tracepoints for send/receive timing Matt Mullins
@ 2019-04-19 21:04 ` Matt Mullins
  2019-04-19 21:04 ` [PATCH bpf-next v3 5/5] selftests: bpf: test writable buffers in raw tps Matt Mullins
  4 siblings, 0 replies; 16+ messages in thread
From: Matt Mullins @ 2019-04-19 21:04 UTC (permalink / raw)
  To: hall, mmullins, ast, bpf, netdev
  Cc: linux-kernel, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, Andrey Ignatov, Quentin Monnet

This adds BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE, and fixes up the

	error: enumeration value ‘BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE’ not handled in switch [-Werror=switch-enum]

build errors it would otherwise cause in libbpf.

Signed-off-by: Matt Mullins <mmullins@fb.com>
---
 tools/include/uapi/linux/bpf.h | 1 +
 tools/lib/bpf/libbpf.c         | 1 +
 tools/lib/bpf/libbpf_probes.c  | 1 +
 3 files changed, 3 insertions(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 3c38ac9a92a7..c5335d53ce82 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -166,6 +166,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_LIRC_MODE2,
 	BPF_PROG_TYPE_SK_REUSEPORT,
 	BPF_PROG_TYPE_FLOW_DISSECTOR,
+	BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE,
 };
 
 enum bpf_attach_type {
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index d5b830d60601..9b6ce703c057 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1681,6 +1681,7 @@ static bool bpf_prog_type__needs_kver(enum bpf_prog_type type)
 	case BPF_PROG_TYPE_UNSPEC:
 	case BPF_PROG_TYPE_TRACEPOINT:
 	case BPF_PROG_TYPE_RAW_TRACEPOINT:
+	case BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE:
 	case BPF_PROG_TYPE_PERF_EVENT:
 		return false;
 	case BPF_PROG_TYPE_KPROBE:
diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c
index 8c3a1c04dcb2..26eff2f39bed 100644
--- a/tools/lib/bpf/libbpf_probes.c
+++ b/tools/lib/bpf/libbpf_probes.c
@@ -93,6 +93,7 @@ probe_load(enum bpf_prog_type prog_type, const struct bpf_insn *insns,
 	case BPF_PROG_TYPE_CGROUP_DEVICE:
 	case BPF_PROG_TYPE_SK_MSG:
 	case BPF_PROG_TYPE_RAW_TRACEPOINT:
+	case BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE:
 	case BPF_PROG_TYPE_LWT_SEG6LOCAL:
 	case BPF_PROG_TYPE_LIRC_MODE2:
 	case BPF_PROG_TYPE_SK_REUSEPORT:
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH bpf-next v3 5/5] selftests: bpf: test writable buffers in raw tps
  2019-04-19 21:04 [PATCH bpf-next v3 0/5] writable contexts for bpf raw tracepoints Matt Mullins
                   ` (3 preceding siblings ...)
  2019-04-19 21:04 ` [PATCH bpf-next v3 4/5] tools: sync bpf.h Matt Mullins
@ 2019-04-19 21:04 ` Matt Mullins
  2019-04-22 18:32   ` Yonghong Song
  4 siblings, 1 reply; 16+ messages in thread
From: Matt Mullins @ 2019-04-19 21:04 UTC (permalink / raw)
  To: hall, mmullins, ast, bpf, netdev
  Cc: linux-kernel, Steven Rostedt, Ingo Molnar, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, David S. Miller,
	Shuah Khan, linux-kselftest

This tests that:
  * a BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE cannot be attached if it
    uses either:
    * a variable offset to the tracepoint buffer, or
    * an offset beyond the size of the tracepoint buffer
  * a tracer can modify the buffer provided when attached to a writable
    tracepoint in bpf_prog_test_run

Signed-off-by: Matt Mullins <mmullins@fb.com>
---
 include/trace/events/bpf_test_run.h           | 50 ++++++++++++
 net/bpf/test_run.c                            |  4 +
 .../raw_tp_writable_reject_nbd_invalid.c      | 40 ++++++++++
 .../bpf/prog_tests/raw_tp_writable_test_run.c | 80 +++++++++++++++++++
 .../selftests/bpf/verifier/raw_tp_writable.c  | 34 ++++++++
 5 files changed, 208 insertions(+)
 create mode 100644 include/trace/events/bpf_test_run.h
 create mode 100644 tools/testing/selftests/bpf/prog_tests/raw_tp_writable_reject_nbd_invalid.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/raw_tp_writable_test_run.c
 create mode 100644 tools/testing/selftests/bpf/verifier/raw_tp_writable.c

diff --git a/include/trace/events/bpf_test_run.h b/include/trace/events/bpf_test_run.h
new file mode 100644
index 000000000000..abf466839ea4
--- /dev/null
+++ b/include/trace/events/bpf_test_run.h
@@ -0,0 +1,50 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM bpf_test_run
+
+#if !defined(_TRACE_NBD_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_BPF_TEST_RUN_H
+
+#include <linux/tracepoint.h>
+
+DECLARE_EVENT_CLASS(bpf_test_finish,
+
+	TP_PROTO(int *err),
+
+	TP_ARGS(err),
+
+	TP_STRUCT__entry(
+		__field(int, err)
+	),
+
+	TP_fast_assign(
+		__entry->err = *err;
+	),
+
+	TP_printk("bpf_test_finish with err=%d", __entry->err)
+);
+
+#ifdef DEFINE_EVENT_WRITABLE
+#undef BPF_TEST_RUN_DEFINE_EVENT
+#define BPF_TEST_RUN_DEFINE_EVENT(template, call, proto, args, size)	\
+	DEFINE_EVENT_WRITABLE(template, call, PARAMS(proto),		\
+			      PARAMS(args), size)
+#else
+#undef BPF_TEST_RUN_DEFINE_EVENT
+#define BPF_TEST_RUN_DEFINE_EVENT(template, call, proto, args, size)	\
+	DEFINE_EVENT(template, call, PARAMS(proto), PARAMS(args))
+#endif
+
+BPF_TEST_RUN_DEFINE_EVENT(bpf_test_finish, bpf_test_finish,
+
+	TP_PROTO(int *err),
+
+	TP_ARGS(err),
+
+	sizeof(int)
+);
+
+#endif
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index fab142b796ef..25e757102595 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -13,6 +13,9 @@
 #include <net/sock.h>
 #include <net/tcp.h>
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/bpf_test_run.h>
+
 static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat,
 			u32 *retval, u32 *time)
 {
@@ -100,6 +103,7 @@ static int bpf_test_finish(const union bpf_attr *kattr,
 	if (err != -ENOSPC)
 		err = 0;
 out:
+	trace_bpf_test_finish(&err);
 	return err;
 }
 
diff --git a/tools/testing/selftests/bpf/prog_tests/raw_tp_writable_reject_nbd_invalid.c b/tools/testing/selftests/bpf/prog_tests/raw_tp_writable_reject_nbd_invalid.c
new file mode 100644
index 000000000000..328d5c4b084b
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/raw_tp_writable_reject_nbd_invalid.c
@@ -0,0 +1,40 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <test_progs.h>
+#include <linux/nbd.h>
+
+void test_raw_tp_writable_reject_nbd_invalid(void)
+{
+	__u32 duration = 0;
+	char error[4096];
+	int bpf_fd = -1, tp_fd = -1;
+
+	const struct bpf_insn program[] = {
+		/* r6 is our tp buffer */
+		BPF_LDX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, 0),
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_6, 128),
+		BPF_EXIT_INSN(),
+	};
+
+	struct bpf_load_program_attr load_attr = {
+		.prog_type = BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE,
+		.license = "GPL v2",
+		.insns = program,
+		.insns_cnt = sizeof(program) / sizeof(struct bpf_insn),
+		.log_level = 2,
+	};
+
+	bpf_fd = bpf_load_program_xattr(&load_attr, error, sizeof(error));
+	if (CHECK(bpf_fd < 0, "bpf_raw_tracepoint_writable loaded",
+		  "failed: %d errno %d\n", bpf_fd, errno))
+		return;
+
+	tp_fd = bpf_raw_tracepoint_open("nbd_send_request", bpf_fd);
+	if (CHECK(tp_fd >= 0, "bpf_raw_tracepoint_writable opened",
+		  "erroneously succeeded\n"))
+		goto out_bpffd;
+
+	close(tp_fd);
+out_bpffd:
+	close(bpf_fd);
+}
diff --git a/tools/testing/selftests/bpf/prog_tests/raw_tp_writable_test_run.c b/tools/testing/selftests/bpf/prog_tests/raw_tp_writable_test_run.c
new file mode 100644
index 000000000000..4145925f9cab
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/raw_tp_writable_test_run.c
@@ -0,0 +1,80 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <test_progs.h>
+#include <linux/nbd.h>
+
+void test_raw_tp_writable_test_run(void)
+{
+	__u32 duration = 0;
+	char error[4096];
+
+	const struct bpf_insn trace_program[] = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, 0),
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_6, 0),
+		BPF_LD_IMM64(BPF_REG_0, 42),
+		BPF_STX_MEM(BPF_W, BPF_REG_6, BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+
+	struct bpf_load_program_attr load_attr = {
+		.prog_type = BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE,
+		.license = "GPL v2",
+		.insns = trace_program,
+		.insns_cnt = sizeof(trace_program) / sizeof(struct bpf_insn),
+		.log_level = 2,
+	};
+
+	int bpf_fd = bpf_load_program_xattr(&load_attr, error, sizeof(error));
+	if (CHECK(bpf_fd < 0, "bpf_raw_tracepoint_writable loaded",
+		  "failed: %d errno %d\n", bpf_fd, errno))
+		return;
+
+	const struct bpf_insn skb_program[] = {
+		BPF_LD_IMM64(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+
+	struct bpf_load_program_attr skb_load_attr = {
+		.prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
+		.license = "GPL v2",
+		.insns = skb_program,
+		.insns_cnt = sizeof(skb_program) / sizeof(struct bpf_insn),
+	};
+
+	int filter_fd =
+		bpf_load_program_xattr(&skb_load_attr, error, sizeof(error));
+	if (CHECK(filter_fd < 0, "test_program_loaded", "failed: %d errno %d\n",
+		  filter_fd, errno))
+		goto out_bpffd;
+
+	int tp_fd = bpf_raw_tracepoint_open("bpf_test_finish", bpf_fd);
+	if (CHECK(tp_fd < 0, "bpf_raw_tracepoint_writable opened",
+		  "failed: %d errno %d\n", tp_fd, errno))
+		goto out_filterfd;
+
+	char test_skb[128] = {
+		0,
+	};
+
+	__u32 prog_ret;
+	int err = bpf_prog_test_run(filter_fd, 1, test_skb, sizeof(test_skb), 0,
+				    0, &prog_ret, 0);
+	CHECK(err != 42, "test_run",
+	      "tracepoint did not modify return value\n");
+	CHECK(prog_ret != 0, "test_run_ret",
+	      "socket_filter did not return 0\n");
+
+	close(tp_fd);
+
+	err = bpf_prog_test_run(filter_fd, 1, test_skb, sizeof(test_skb), 0, 0,
+				&prog_ret, 0);
+	CHECK(err != 0, "test_run_notrace",
+	      "test_run failed with %d errno %d\n", err, errno);
+	CHECK(prog_ret != 0, "test_run_ret_notrace",
+	      "socket_filter did not return 0\n");
+
+out_filterfd:
+	close(filter_fd);
+out_bpffd:
+	close(bpf_fd);
+}
diff --git a/tools/testing/selftests/bpf/verifier/raw_tp_writable.c b/tools/testing/selftests/bpf/verifier/raw_tp_writable.c
new file mode 100644
index 000000000000..95b5d70a1dc1
--- /dev/null
+++ b/tools/testing/selftests/bpf/verifier/raw_tp_writable.c
@@ -0,0 +1,34 @@
+{
+	"raw_tracepoint_writable: reject variable offset",
+	.insns = {
+		/* r6 is our tp buffer */
+		BPF_LDX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, 0),
+
+		BPF_LD_MAP_FD(BPF_REG_1, 0),
+		/* move the key (== 0) to r10-8 */
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+		BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+		BPF_STX_MEM(BPF_DW, BPF_REG_2, BPF_REG_0, 0),
+		/* lookup in the map */
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+			     BPF_FUNC_map_lookup_elem),
+
+		/* exit clean if null */
+		BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+		BPF_EXIT_INSN(),
+
+		/* shift the buffer pointer to a variable location */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, 0),
+		BPF_ALU64_REG(BPF_ADD, BPF_REG_6, BPF_REG_0),
+		/* clobber whatever's there */
+		BPF_MOV64_IMM(BPF_REG_7, 4242),
+		BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_7, 0),
+
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.fixup_map_hash_8b = { 1, },
+	.prog_type = BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE,
+	.errstr = "R6 invalid variable buffer offset: off=0, var_off=(0x0; 0xffffffff)",
+},
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next v3 2/5] nbd: trace sending nbd requests
  2019-04-19 21:04 ` [PATCH bpf-next v3 2/5] nbd: trace sending nbd requests Matt Mullins
@ 2019-04-19 21:16   ` Josef Bacik
  0 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2019-04-19 21:16 UTC (permalink / raw)
  To: Matt Mullins
  Cc: hall, ast, bpf, netdev, linux-kernel, Josef Bacik, Jens Axboe,
	Steven Rostedt, Ingo Molnar, Daniel Borkmann, Martin KaFai Lau,
	Song Liu, Yonghong Song, David S. Miller, Mauro Carvalho Chehab,
	Greg Kroah-Hartman, Nicolas Ferre, linux-block, nbd

On Fri, Apr 19, 2019 at 02:04:06PM -0700, Matt Mullins wrote:
> This adds a tracepoint that can both observe the nbd request being sent
> to the server, as well as modify that request , e.g., setting a flag in
> the request that will cause the server to collect detailed tracing data.
> 
> The struct request * being handled is included to permit correlation
> with the block tracepoints.
> 
> Signed-off-by: Matt Mullins <mmullins@fb.com>
> ---

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next v3 3/5] nbd: add tracepoints for send/receive timing
  2019-04-19 21:04 ` [PATCH bpf-next v3 3/5] nbd: add tracepoints for send/receive timing Matt Mullins
@ 2019-04-19 21:16   ` Josef Bacik
  0 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2019-04-19 21:16 UTC (permalink / raw)
  To: Matt Mullins
  Cc: hall, ast, bpf, netdev, linux-kernel, Josef Bacik, Jens Axboe,
	Steven Rostedt, Ingo Molnar, Daniel Borkmann, Martin KaFai Lau,
	Song Liu, Yonghong Song, linux-block, nbd

On Fri, Apr 19, 2019 at 02:04:07PM -0700, Matt Mullins wrote:
> From: Andrew Hall <hall@fb.com>
> 
> This adds four tracepoints to nbd, enabling separate tracing of payload
> and header sending/receipt.
> 
> In the send path for headers that have already been sent, we also
> explicitly initialize the handle so it can be referenced by the later
> tracepoint.
> 
> Signed-off-by: Andrew Hall <hall@fb.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next v3 1/5] bpf: add writable context for raw tracepoints
  2019-04-19 21:04 ` [PATCH bpf-next v3 1/5] bpf: add writable context for " Matt Mullins
@ 2019-04-22 18:12   ` Yonghong Song
  2019-04-22 19:23     ` Matt Mullins
  0 siblings, 1 reply; 16+ messages in thread
From: Yonghong Song @ 2019-04-22 18:12 UTC (permalink / raw)
  To: Matt Mullins, Andrew Hall, ast, bpf, netdev
  Cc: linux-kernel, Daniel Borkmann, Martin Lau, Song Liu,
	Steven Rostedt, Ingo Molnar



On 4/19/19 2:04 PM, Matt Mullins wrote:
> This is an opt-in interface that allows a tracepoint to provide a safe
> buffer that can be written from a BPF_PROG_TYPE_RAW_TRACEPOINT program.
> The size of the buffer must be a compile-time constant, and is checked
> before allowing a BPF program to attach to a tracepoint that uses this
> feature.
> 
> The pointer to this buffer will be the first argument of tracepoints
> that opt in; the buffer is readable by both BPF_PROG_TYPE_RAW_TRACEPOINT
> and BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE programs that attach to such a
> tracepoint, but the buffer to which it points may only be written by the
> latter.
> 
> Signed-off-by: Matt Mullins <mmullins@fb.com>
> ---
>   include/linux/bpf.h             |  2 ++
>   include/linux/bpf_types.h       |  1 +
>   include/linux/tracepoint-defs.h |  1 +
>   include/trace/bpf_probe.h       | 27 +++++++++++++++++++++++++--
>   include/uapi/linux/bpf.h        |  1 +
>   kernel/bpf/syscall.c            |  8 ++++++--
>   kernel/bpf/verifier.c           | 31 +++++++++++++++++++++++++++++++
>   kernel/trace/bpf_trace.c        | 21 +++++++++++++++++++++
>   8 files changed, 88 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index a2132e09dc1c..d3c71fd67476 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -263,6 +263,7 @@ enum bpf_reg_type {
>   	PTR_TO_SOCK_COMMON_OR_NULL, /* reg points to sock_common or NULL */
>   	PTR_TO_TCP_SOCK,	 /* reg points to struct tcp_sock */
>   	PTR_TO_TCP_SOCK_OR_NULL, /* reg points to struct tcp_sock or NULL */
> +	PTR_TO_TP_BUFFER,	 /* reg points to a writable raw tp's buffer */
>   };
>   
[...]
>   /* truncate register to smaller size (in bytes)
>    * must be called with size < BPF_REG_SIZE
>    */
> @@ -2100,6 +2127,10 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
>   		err = check_sock_access(env, insn_idx, regno, off, size, t);
>   		if (!err && value_regno >= 0)
>   			mark_reg_unknown(env, regs, value_regno);
> +	} else if (reg->type == PTR_TO_TP_BUFFER) {
> +		err = check_tp_buffer_access(env, reg, regno, off, size);
> +		if (!err && t == BPF_READ && value_regno >= 0)
> +			mark_reg_unknown(env, regs, value_regno);
>   	} else {
>   		verbose(env, "R%d invalid mem access '%s'\n", regno,
>   			reg_type_str[reg->type]);
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index d64c00afceb5..a2dd79dc6871 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -909,6 +909,24 @@ const struct bpf_verifier_ops raw_tracepoint_verifier_ops = {
>   const struct bpf_prog_ops raw_tracepoint_prog_ops = {
>   };
>   
> +static bool raw_tp_writable_prog_is_valid_access(int off, int size,
> +						 enum bpf_access_type type,
> +						 const struct bpf_prog *prog,
> +						 struct bpf_insn_access_aux *info)
> +{
> +	if (off == 0 && size == sizeof(u64))
> +		info->reg_type = PTR_TO_TP_BUFFER;

on 32bit system, the first argument pointer size could be sizeof(u32)?
Should the first argument for raw_tp_writable_prog be always 
PTR_TO_TP_BUFFER?


> +	return raw_tp_prog_is_valid_access(off, size, type, prog, info);
> +}
> +
> +const struct bpf_verifier_ops raw_tracepoint_writable_verifier_ops = {
> +	.get_func_proto  = raw_tp_prog_func_proto,
> +	.is_valid_access = raw_tp_writable_prog_is_valid_access,
> +};
> +
> +const struct bpf_prog_ops raw_tracepoint_writable_prog_ops = {
> +};
> +
>   static bool pe_prog_is_valid_access(int off, int size, enum bpf_access_type type,
>   				    const struct bpf_prog *prog,
>   				    struct bpf_insn_access_aux *info)
> @@ -1198,6 +1216,9 @@ static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *
>   	if (prog->aux->max_ctx_offset > btp->num_args * sizeof(u64))
>   		return -EINVAL;
>   
> +	if (prog->aux->max_tp_access > btp->writable_size)
> +		return -EINVAL;
> +
>   	return tracepoint_probe_register(tp, (void *)btp->bpf_func, prog);
>   }
>   
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next v3 5/5] selftests: bpf: test writable buffers in raw tps
  2019-04-19 21:04 ` [PATCH bpf-next v3 5/5] selftests: bpf: test writable buffers in raw tps Matt Mullins
@ 2019-04-22 18:32   ` Yonghong Song
  2019-04-22 19:27     ` Matt Mullins
  0 siblings, 1 reply; 16+ messages in thread
From: Yonghong Song @ 2019-04-22 18:32 UTC (permalink / raw)
  To: Matt Mullins, Andrew Hall, ast, bpf, netdev
  Cc: linux-kernel, Steven Rostedt, Ingo Molnar, Daniel Borkmann,
	Martin Lau, Song Liu, David S. Miller, Shuah Khan,
	linux-kselftest



On 4/19/19 2:04 PM, Matt Mullins wrote:
> This tests that:
>    * a BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE cannot be attached if it
>      uses either:
>      * a variable offset to the tracepoint buffer, or
>      * an offset beyond the size of the tracepoint buffer
>    * a tracer can modify the buffer provided when attached to a writable
>      tracepoint in bpf_prog_test_run
> 
> Signed-off-by: Matt Mullins <mmullins@fb.com>
> ---
>   include/trace/events/bpf_test_run.h           | 50 ++++++++++++
>   net/bpf/test_run.c                            |  4 +
>   .../raw_tp_writable_reject_nbd_invalid.c      | 40 ++++++++++
>   .../bpf/prog_tests/raw_tp_writable_test_run.c | 80 +++++++++++++++++++
>   .../selftests/bpf/verifier/raw_tp_writable.c  | 34 ++++++++
>   5 files changed, 208 insertions(+)
>   create mode 100644 include/trace/events/bpf_test_run.h
>   create mode 100644 tools/testing/selftests/bpf/prog_tests/raw_tp_writable_reject_nbd_invalid.c
>   create mode 100644 tools/testing/selftests/bpf/prog_tests/raw_tp_writable_test_run.c
>   create mode 100644 tools/testing/selftests/bpf/verifier/raw_tp_writable.c
> 
> diff --git a/include/trace/events/bpf_test_run.h b/include/trace/events/bpf_test_run.h
> new file mode 100644
> index 000000000000..abf466839ea4
> --- /dev/null
> +++ b/include/trace/events/bpf_test_run.h
> @@ -0,0 +1,50 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM bpf_test_run
> +
> +#if !defined(_TRACE_NBD_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_BPF_TEST_RUN_H
> +
> +#include <linux/tracepoint.h>
> +
> +DECLARE_EVENT_CLASS(bpf_test_finish,
> +
> +	TP_PROTO(int *err),
> +
> +	TP_ARGS(err),
> +
> +	TP_STRUCT__entry(
> +		__field(int, err)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->err = *err;
> +	),
> +
> +	TP_printk("bpf_test_finish with err=%d", __entry->err)
> +);
> +
> +#ifdef DEFINE_EVENT_WRITABLE
> +#undef BPF_TEST_RUN_DEFINE_EVENT
> +#define BPF_TEST_RUN_DEFINE_EVENT(template, call, proto, args, size)	\
> +	DEFINE_EVENT_WRITABLE(template, call, PARAMS(proto),		\
> +			      PARAMS(args), size)
> +#else
> +#undef BPF_TEST_RUN_DEFINE_EVENT
> +#define BPF_TEST_RUN_DEFINE_EVENT(template, call, proto, args, size)	\
> +	DEFINE_EVENT(template, call, PARAMS(proto), PARAMS(args))
> +#endif
> +
> +BPF_TEST_RUN_DEFINE_EVENT(bpf_test_finish, bpf_test_finish,
> +
> +	TP_PROTO(int *err),
> +
> +	TP_ARGS(err),
> +
> +	sizeof(int)
> +);
> +
> +#endif
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>
> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> index fab142b796ef..25e757102595 100644
> --- a/net/bpf/test_run.c
> +++ b/net/bpf/test_run.c
> @@ -13,6 +13,9 @@
>   #include <net/sock.h>
>   #include <net/tcp.h>
>   
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/bpf_test_run.h>
> +
>   static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat,
>   			u32 *retval, u32 *time)
>   {
> @@ -100,6 +103,7 @@ static int bpf_test_finish(const union bpf_attr *kattr,
>   	if (err != -ENOSPC)
>   		err = 0;
>   out:
> +	trace_bpf_test_finish(&err);
>   	return err;
>   }
>   
> diff --git a/tools/testing/selftests/bpf/prog_tests/raw_tp_writable_reject_nbd_invalid.c b/tools/testing/selftests/bpf/prog_tests/raw_tp_writable_reject_nbd_invalid.c
> new file mode 100644
> index 000000000000..328d5c4b084b
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/raw_tp_writable_reject_nbd_invalid.c
> @@ -0,0 +1,40 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <test_progs.h>
> +#include <linux/nbd.h>
> +
> +void test_raw_tp_writable_reject_nbd_invalid(void)
> +{
> +	__u32 duration = 0;
> +	char error[4096];
> +	int bpf_fd = -1, tp_fd = -1;
> +
> +	const struct bpf_insn program[] = {
> +		/* r6 is our tp buffer */
> +		BPF_LDX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, 0),
> +		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_6, 128),

The number "128" is a little cryptic. Maybe you can use something like
sizeof(struct nbd_request)?

> +		BPF_EXIT_INSN(),
> +	};
> +
> +	struct bpf_load_program_attr load_attr = {
> +		.prog_type = BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE,
> +		.license = "GPL v2",
> +		.insns = program,
> +		.insns_cnt = sizeof(program) / sizeof(struct bpf_insn),
> +		.log_level = 2,
> +	};
> +
> +	bpf_fd = bpf_load_program_xattr(&load_attr, error, sizeof(error));
> +	if (CHECK(bpf_fd < 0, "bpf_raw_tracepoint_writable loaded",
> +		  "failed: %d errno %d\n", bpf_fd, errno))
> +		return;
> +
> +	tp_fd = bpf_raw_tracepoint_open("nbd_send_request", bpf_fd);
> +	if (CHECK(tp_fd >= 0, "bpf_raw_tracepoint_writable opened",
> +		  "erroneously succeeded\n"))
> +		goto out_bpffd;
> +
> +	close(tp_fd);
> +out_bpffd:
> +	close(bpf_fd);
> +}
> diff --git a/tools/testing/selftests/bpf/prog_tests/raw_tp_writable_test_run.c b/tools/testing/selftests/bpf/prog_tests/raw_tp_writable_test_run.c
> new file mode 100644
> index 000000000000..4145925f9cab
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/raw_tp_writable_test_run.c
> @@ -0,0 +1,80 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <test_progs.h>
> +#include <linux/nbd.h>
> +
> +void test_raw_tp_writable_test_run(void)
> +{
> +	__u32 duration = 0;
> +	char error[4096];
> +
> +	const struct bpf_insn trace_program[] = {
> +		BPF_LDX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, 0),
> +		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_6, 0),
> +		BPF_LD_IMM64(BPF_REG_0, 42),
You can use BPF_MOV64_IMM(BPF_REG_0, 42) instead of BPF_LD_IMM64.
BPF_LD_IMM64 is fine, but probably BPF_MOV64_IMM is better.
The same for a few below instances.

> +		BPF_STX_MEM(BPF_W, BPF_REG_6, BPF_REG_0, 0),
> +		BPF_EXIT_INSN(),
> +	};
> +
> +	struct bpf_load_program_attr load_attr = {
> +		.prog_type = BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE,
> +		.license = "GPL v2",
> +		.insns = trace_program,
> +		.insns_cnt = sizeof(trace_program) / sizeof(struct bpf_insn),
> +		.log_level = 2,
> +	};
> +
> +	int bpf_fd = bpf_load_program_xattr(&load_attr, error, sizeof(error));
> +	if (CHECK(bpf_fd < 0, "bpf_raw_tracepoint_writable loaded",
> +		  "failed: %d errno %d\n", bpf_fd, errno))
> +		return;
> +
> +	const struct bpf_insn skb_program[] = {
> +		BPF_LD_IMM64(BPF_REG_0, 0),
> +		BPF_EXIT_INSN(),
> +	};
> +
> +	struct bpf_load_program_attr skb_load_attr = {
> +		.prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
> +		.license = "GPL v2",
> +		.insns = skb_program,
> +		.insns_cnt = sizeof(skb_program) / sizeof(struct bpf_insn),
> +	};
> +
> +	int filter_fd =
> +		bpf_load_program_xattr(&skb_load_attr, error, sizeof(error));
> +	if (CHECK(filter_fd < 0, "test_program_loaded", "failed: %d errno %d\n",
> +		  filter_fd, errno))
> +		goto out_bpffd;
> +
> +	int tp_fd = bpf_raw_tracepoint_open("bpf_test_finish", bpf_fd);
> +	if (CHECK(tp_fd < 0, "bpf_raw_tracepoint_writable opened",
> +		  "failed: %d errno %d\n", tp_fd, errno))
> +		goto out_filterfd;
> +
> +	char test_skb[128] = {
> +		0,
> +	};
> +
> +	__u32 prog_ret;
> +	int err = bpf_prog_test_run(filter_fd, 1, test_skb, sizeof(test_skb), 0,
> +				    0, &prog_ret, 0);
> +	CHECK(err != 42, "test_run",
> +	      "tracepoint did not modify return value\n");
> +	CHECK(prog_ret != 0, "test_run_ret",
> +	      "socket_filter did not return 0\n");
> +
> +	close(tp_fd);
> +
> +	err = bpf_prog_test_run(filter_fd, 1, test_skb, sizeof(test_skb), 0, 0,
> +				&prog_ret, 0);
> +	CHECK(err != 0, "test_run_notrace",
> +	      "test_run failed with %d errno %d\n", err, errno);
> +	CHECK(prog_ret != 0, "test_run_ret_notrace",
> +	      "socket_filter did not return 0\n");
> +
> +out_filterfd:
> +	close(filter_fd);
> +out_bpffd:
> +	close(bpf_fd);
> +}
> diff --git a/tools/testing/selftests/bpf/verifier/raw_tp_writable.c b/tools/testing/selftests/bpf/verifier/raw_tp_writable.c
> new file mode 100644
> index 000000000000..95b5d70a1dc1
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/verifier/raw_tp_writable.c
> @@ -0,0 +1,34 @@
> +{
> +	"raw_tracepoint_writable: reject variable offset",
> +	.insns = {
> +		/* r6 is our tp buffer */
> +		BPF_LDX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, 0),
> +
> +		BPF_LD_MAP_FD(BPF_REG_1, 0),
> +		/* move the key (== 0) to r10-8 */
> +		BPF_MOV32_IMM(BPF_REG_0, 0),
> +		BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> +		BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
> +		BPF_STX_MEM(BPF_DW, BPF_REG_2, BPF_REG_0, 0),
> +		/* lookup in the map */
> +		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
> +			     BPF_FUNC_map_lookup_elem),
> +
> +		/* exit clean if null */
> +		BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
> +		BPF_EXIT_INSN(),
> +
> +		/* shift the buffer pointer to a variable location */
> +		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, 0),
> +		BPF_ALU64_REG(BPF_ADD, BPF_REG_6, BPF_REG_0),
> +		/* clobber whatever's there */
> +		BPF_MOV64_IMM(BPF_REG_7, 4242),
> +		BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_7, 0),
> +
> +		BPF_MOV64_IMM(BPF_REG_0, 0),
> +		BPF_EXIT_INSN(),
> +	},
> +	.fixup_map_hash_8b = { 1, },
> +	.prog_type = BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE,
> +	.errstr = "R6 invalid variable buffer offset: off=0, var_off=(0x0; 0xffffffff)",
> +},
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next v3 1/5] bpf: add writable context for raw tracepoints
  2019-04-22 18:12   ` Yonghong Song
@ 2019-04-22 19:23     ` Matt Mullins
  2019-04-22 21:17       ` Yonghong Song
  0 siblings, 1 reply; 16+ messages in thread
From: Matt Mullins @ 2019-04-22 19:23 UTC (permalink / raw)
  To: netdev, Yonghong Song, Andrew Hall, bpf, ast
  Cc: daniel, linux-kernel, Martin Lau, rostedt, mingo, Song Liu

On Mon, 2019-04-22 at 18:12 +0000, Yonghong Song wrote:
> 
> On 4/19/19 2:04 PM, Matt Mullins wrote:
> > This is an opt-in interface that allows a tracepoint to provide a safe
> > buffer that can be written from a BPF_PROG_TYPE_RAW_TRACEPOINT program.
> > The size of the buffer must be a compile-time constant, and is checked
> > before allowing a BPF program to attach to a tracepoint that uses this
> > feature.
> > 
> > The pointer to this buffer will be the first argument of tracepoints
> > that opt in; the buffer is readable by both BPF_PROG_TYPE_RAW_TRACEPOINT
> > and BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE programs that attach to such a
> > tracepoint, but the buffer to which it points may only be written by the
> > latter.
> > 
> > Signed-off-by: Matt Mullins <mmullins@fb.com>
> > ---
> >   include/linux/bpf.h             |  2 ++
> >   include/linux/bpf_types.h       |  1 +
> >   include/linux/tracepoint-defs.h |  1 +
> >   include/trace/bpf_probe.h       | 27 +++++++++++++++++++++++++--
> >   include/uapi/linux/bpf.h        |  1 +
> >   kernel/bpf/syscall.c            |  8 ++++++--
> >   kernel/bpf/verifier.c           | 31 +++++++++++++++++++++++++++++++
> >   kernel/trace/bpf_trace.c        | 21 +++++++++++++++++++++
> >   8 files changed, 88 insertions(+), 4 deletions(-)
> > 
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index a2132e09dc1c..d3c71fd67476 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -263,6 +263,7 @@ enum bpf_reg_type {
> >   	PTR_TO_SOCK_COMMON_OR_NULL, /* reg points to sock_common or NULL */
> >   	PTR_TO_TCP_SOCK,	 /* reg points to struct tcp_sock */
> >   	PTR_TO_TCP_SOCK_OR_NULL, /* reg points to struct tcp_sock or NULL */
> > +	PTR_TO_TP_BUFFER,	 /* reg points to a writable raw tp's buffer */
> >   };
> >   
> 
> [...]
> >   /* truncate register to smaller size (in bytes)
> >    * must be called with size < BPF_REG_SIZE
> >    */
> > @@ -2100,6 +2127,10 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
> >   		err = check_sock_access(env, insn_idx, regno, off, size, t);
> >   		if (!err && value_regno >= 0)
> >   			mark_reg_unknown(env, regs, value_regno);
> > +	} else if (reg->type == PTR_TO_TP_BUFFER) {
> > +		err = check_tp_buffer_access(env, reg, regno, off, size);
> > +		if (!err && t == BPF_READ && value_regno >= 0)
> > +			mark_reg_unknown(env, regs, value_regno);
> >   	} else {
> >   		verbose(env, "R%d invalid mem access '%s'\n", regno,
> >   			reg_type_str[reg->type]);
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index d64c00afceb5..a2dd79dc6871 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -909,6 +909,24 @@ const struct bpf_verifier_ops raw_tracepoint_verifier_ops = {
> >   const struct bpf_prog_ops raw_tracepoint_prog_ops = {
> >   };
> >   
> > +static bool raw_tp_writable_prog_is_valid_access(int off, int size,
> > +						 enum bpf_access_type type,
> > +						 const struct bpf_prog *prog,
> > +						 struct bpf_insn_access_aux *info)
> > +{
> > +	if (off == 0 && size == sizeof(u64))
> > +		info->reg_type = PTR_TO_TP_BUFFER;
> 
> on 32bit system, the first argument pointer size could be sizeof(u32)?

As far as I can tell, pointers are always 64 bits wide from the
perspective of the eBPF instruction set.  I think the proper fixup is
in include/trace/events/nbd.h ... I should use a u64 instead of a
pointer type.

> Should the first argument for raw_tp_writable_prog be always 
> PTR_TO_TP_BUFFER?

That is the purpose of this patch series, yes.  My initial attempt at
this tried to add it to the end of the context structure instead, and
that ended up being quite complex to track.

> 
> > +	return raw_tp_prog_is_valid_access(off, size, type, prog, info);
> > +}
> > +
> > +const struct bpf_verifier_ops raw_tracepoint_writable_verifier_ops = {
> > +	.get_func_proto  = raw_tp_prog_func_proto,
> > +	.is_valid_access = raw_tp_writable_prog_is_valid_access,
> > +};
> > +
> > +const struct bpf_prog_ops raw_tracepoint_writable_prog_ops = {
> > +};
> > +
> >   static bool pe_prog_is_valid_access(int off, int size, enum bpf_access_type type,
> >   				    const struct bpf_prog *prog,
> >   				    struct bpf_insn_access_aux *info)
> > @@ -1198,6 +1216,9 @@ static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *
> >   	if (prog->aux->max_ctx_offset > btp->num_args * sizeof(u64))
> >   		return -EINVAL;
> >   
> > +	if (prog->aux->max_tp_access > btp->writable_size)
> > +		return -EINVAL;
> > +
> >   	return tracepoint_probe_register(tp, (void *)btp->bpf_func, prog);
> >   }
> >   
> > 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next v3 5/5] selftests: bpf: test writable buffers in raw tps
  2019-04-22 18:32   ` Yonghong Song
@ 2019-04-22 19:27     ` Matt Mullins
  2019-04-22 21:13       ` Yonghong Song
  0 siblings, 1 reply; 16+ messages in thread
From: Matt Mullins @ 2019-04-22 19:27 UTC (permalink / raw)
  To: netdev, Yonghong Song, Andrew Hall, bpf, ast
  Cc: Song Liu, linux-kernel, daniel, rostedt, mingo, shuah,
	Martin Lau, linux-kselftest, davem

On Mon, 2019-04-22 at 18:32 +0000, Yonghong Song wrote:
> 
> On 4/19/19 2:04 PM, Matt Mullins wrote:
> > This tests that:
> >    * a BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE cannot be attached if it
> >      uses either:
> >      * a variable offset to the tracepoint buffer, or
> >      * an offset beyond the size of the tracepoint buffer
> >    * a tracer can modify the buffer provided when attached to a writable
> >      tracepoint in bpf_prog_test_run
> > 
> > Signed-off-by: Matt Mullins <mmullins@fb.com>
> > ---
> >   include/trace/events/bpf_test_run.h           | 50 ++++++++++++
> >   net/bpf/test_run.c                            |  4 +
> >   .../raw_tp_writable_reject_nbd_invalid.c      | 40 ++++++++++
> >   .../bpf/prog_tests/raw_tp_writable_test_run.c | 80 +++++++++++++++++++
> >   .../selftests/bpf/verifier/raw_tp_writable.c  | 34 ++++++++
> >   5 files changed, 208 insertions(+)
> >   create mode 100644 include/trace/events/bpf_test_run.h
> >   create mode 100644 tools/testing/selftests/bpf/prog_tests/raw_tp_writable_reject_nbd_invalid.c
> >   create mode 100644 tools/testing/selftests/bpf/prog_tests/raw_tp_writable_test_run.c
> >   create mode 100644 tools/testing/selftests/bpf/verifier/raw_tp_writable.c
> > 
> > diff --git a/include/trace/events/bpf_test_run.h b/include/trace/events/bpf_test_run.h
> > new file mode 100644
> > index 000000000000..abf466839ea4
> > --- /dev/null
> > +++ b/include/trace/events/bpf_test_run.h
> > @@ -0,0 +1,50 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#undef TRACE_SYSTEM
> > +#define TRACE_SYSTEM bpf_test_run
> > +
> > +#if !defined(_TRACE_NBD_H) || defined(TRACE_HEADER_MULTI_READ)
> > +#define _TRACE_BPF_TEST_RUN_H
> > +
> > +#include <linux/tracepoint.h>
> > +
> > +DECLARE_EVENT_CLASS(bpf_test_finish,
> > +
> > +	TP_PROTO(int *err),
> > +
> > +	TP_ARGS(err),
> > +
> > +	TP_STRUCT__entry(
> > +		__field(int, err)
> > +	),
> > +
> > +	TP_fast_assign(
> > +		__entry->err = *err;
> > +	),
> > +
> > +	TP_printk("bpf_test_finish with err=%d", __entry->err)
> > +);
> > +
> > +#ifdef DEFINE_EVENT_WRITABLE
> > +#undef BPF_TEST_RUN_DEFINE_EVENT
> > +#define BPF_TEST_RUN_DEFINE_EVENT(template, call, proto, args, size)	\
> > +	DEFINE_EVENT_WRITABLE(template, call, PARAMS(proto),		\
> > +			      PARAMS(args), size)
> > +#else
> > +#undef BPF_TEST_RUN_DEFINE_EVENT
> > +#define BPF_TEST_RUN_DEFINE_EVENT(template, call, proto, args, size)	\
> > +	DEFINE_EVENT(template, call, PARAMS(proto), PARAMS(args))
> > +#endif
> > +
> > +BPF_TEST_RUN_DEFINE_EVENT(bpf_test_finish, bpf_test_finish,
> > +
> > +	TP_PROTO(int *err),
> > +
> > +	TP_ARGS(err),
> > +
> > +	sizeof(int)
> > +);
> > +
> > +#endif
> > +
> > +/* This part must be outside protection */
> > +#include <trace/define_trace.h>
> > diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> > index fab142b796ef..25e757102595 100644
> > --- a/net/bpf/test_run.c
> > +++ b/net/bpf/test_run.c
> > @@ -13,6 +13,9 @@
> >   #include <net/sock.h>
> >   #include <net/tcp.h>
> >   
> > +#define CREATE_TRACE_POINTS
> > +#include <trace/events/bpf_test_run.h>
> > +
> >   static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat,
> >   			u32 *retval, u32 *time)
> >   {
> > @@ -100,6 +103,7 @@ static int bpf_test_finish(const union bpf_attr *kattr,
> >   	if (err != -ENOSPC)
> >   		err = 0;
> >   out:
> > +	trace_bpf_test_finish(&err);
> >   	return err;
> >   }
> >   
> > diff --git a/tools/testing/selftests/bpf/prog_tests/raw_tp_writable_reject_nbd_invalid.c b/tools/testing/selftests/bpf/prog_tests/raw_tp_writable_reject_nbd_invalid.c
> > new file mode 100644
> > index 000000000000..328d5c4b084b
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/prog_tests/raw_tp_writable_reject_nbd_invalid.c
> > @@ -0,0 +1,40 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <test_progs.h>
> > +#include <linux/nbd.h>
> > +
> > +void test_raw_tp_writable_reject_nbd_invalid(void)
> > +{
> > +	__u32 duration = 0;
> > +	char error[4096];
> > +	int bpf_fd = -1, tp_fd = -1;
> > +
> > +	const struct bpf_insn program[] = {
> > +		/* r6 is our tp buffer */
> > +		BPF_LDX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, 0),
> > +		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_6, 128),
> 
> The number "128" is a little cryptic. Maybe you can use something like
> sizeof(struct nbd_request)?

That was explicitly chosen to be (far) larger than an nbd_request, as
this program should be rejected by the verifier.  If you really want, I
can do `sizeof(struct nbd_request) + some constant` and add a comment. 
But the size of an nbd request should never change, as that's a network
protocol.

> 
> > +		BPF_EXIT_INSN(),
> > +	};
> > +
> > +	struct bpf_load_program_attr load_attr = {
> > +		.prog_type = BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE,
> > +		.license = "GPL v2",
> > +		.insns = program,
> > +		.insns_cnt = sizeof(program) / sizeof(struct bpf_insn),
> > +		.log_level = 2,
> > +	};
> > +
> > +	bpf_fd = bpf_load_program_xattr(&load_attr, error, sizeof(error));
> > +	if (CHECK(bpf_fd < 0, "bpf_raw_tracepoint_writable loaded",
> > +		  "failed: %d errno %d\n", bpf_fd, errno))
> > +		return;
> > +
> > +	tp_fd = bpf_raw_tracepoint_open("nbd_send_request", bpf_fd);
> > +	if (CHECK(tp_fd >= 0, "bpf_raw_tracepoint_writable opened",
> > +		  "erroneously succeeded\n"))
> > +		goto out_bpffd;
> > +
> > +	close(tp_fd);
> > +out_bpffd:
> > +	close(bpf_fd);
> > +}
> > diff --git a/tools/testing/selftests/bpf/prog_tests/raw_tp_writable_test_run.c b/tools/testing/selftests/bpf/prog_tests/raw_tp_writable_test_run.c
> > new file mode 100644
> > index 000000000000..4145925f9cab
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/prog_tests/raw_tp_writable_test_run.c
> > @@ -0,0 +1,80 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <test_progs.h>
> > +#include <linux/nbd.h>
> > +
> > +void test_raw_tp_writable_test_run(void)
> > +{
> > +	__u32 duration = 0;
> > +	char error[4096];
> > +
> > +	const struct bpf_insn trace_program[] = {
> > +		BPF_LDX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, 0),
> > +		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_6, 0),
> > +		BPF_LD_IMM64(BPF_REG_0, 42),
> 
> You can use BPF_MOV64_IMM(BPF_REG_0, 42) instead of BPF_LD_IMM64.
> BPF_LD_IMM64 is fine, but probably BPF_MOV64_IMM is better.
> The same for a few below instances.

Ah, right.  I don't need the second opcode if the value can be zero-
extended.

> 
> > +		BPF_STX_MEM(BPF_W, BPF_REG_6, BPF_REG_0, 0),
> > +		BPF_EXIT_INSN(),
> > +	};
> > +
> > +	struct bpf_load_program_attr load_attr = {
> > +		.prog_type = BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE,
> > +		.license = "GPL v2",
> > +		.insns = trace_program,
> > +		.insns_cnt = sizeof(trace_program) / sizeof(struct bpf_insn),
> > +		.log_level = 2,
> > +	};
> > +
> > +	int bpf_fd = bpf_load_program_xattr(&load_attr, error, sizeof(error));
> > +	if (CHECK(bpf_fd < 0, "bpf_raw_tracepoint_writable loaded",
> > +		  "failed: %d errno %d\n", bpf_fd, errno))
> > +		return;
> > +
> > +	const struct bpf_insn skb_program[] = {
> > +		BPF_LD_IMM64(BPF_REG_0, 0),
> > +		BPF_EXIT_INSN(),
> > +	};
> > +
> > +	struct bpf_load_program_attr skb_load_attr = {
> > +		.prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
> > +		.license = "GPL v2",
> > +		.insns = skb_program,
> > +		.insns_cnt = sizeof(skb_program) / sizeof(struct bpf_insn),
> > +	};
> > +
> > +	int filter_fd =
> > +		bpf_load_program_xattr(&skb_load_attr, error, sizeof(error));
> > +	if (CHECK(filter_fd < 0, "test_program_loaded", "failed: %d errno %d\n",
> > +		  filter_fd, errno))
> > +		goto out_bpffd;
> > +
> > +	int tp_fd = bpf_raw_tracepoint_open("bpf_test_finish", bpf_fd);
> > +	if (CHECK(tp_fd < 0, "bpf_raw_tracepoint_writable opened",
> > +		  "failed: %d errno %d\n", tp_fd, errno))
> > +		goto out_filterfd;
> > +
> > +	char test_skb[128] = {
> > +		0,
> > +	};
> > +
> > +	__u32 prog_ret;
> > +	int err = bpf_prog_test_run(filter_fd, 1, test_skb, sizeof(test_skb), 0,
> > +				    0, &prog_ret, 0);
> > +	CHECK(err != 42, "test_run",
> > +	      "tracepoint did not modify return value\n");
> > +	CHECK(prog_ret != 0, "test_run_ret",
> > +	      "socket_filter did not return 0\n");
> > +
> > +	close(tp_fd);
> > +
> > +	err = bpf_prog_test_run(filter_fd, 1, test_skb, sizeof(test_skb), 0, 0,
> > +				&prog_ret, 0);
> > +	CHECK(err != 0, "test_run_notrace",
> > +	      "test_run failed with %d errno %d\n", err, errno);
> > +	CHECK(prog_ret != 0, "test_run_ret_notrace",
> > +	      "socket_filter did not return 0\n");
> > +
> > +out_filterfd:
> > +	close(filter_fd);
> > +out_bpffd:
> > +	close(bpf_fd);
> > +}
> > diff --git a/tools/testing/selftests/bpf/verifier/raw_tp_writable.c b/tools/testing/selftests/bpf/verifier/raw_tp_writable.c
> > new file mode 100644
> > index 000000000000..95b5d70a1dc1
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/verifier/raw_tp_writable.c
> > @@ -0,0 +1,34 @@
> > +{
> > +	"raw_tracepoint_writable: reject variable offset",
> > +	.insns = {
> > +		/* r6 is our tp buffer */
> > +		BPF_LDX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, 0),
> > +
> > +		BPF_LD_MAP_FD(BPF_REG_1, 0),
> > +		/* move the key (== 0) to r10-8 */
> > +		BPF_MOV32_IMM(BPF_REG_0, 0),
> > +		BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> > +		BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
> > +		BPF_STX_MEM(BPF_DW, BPF_REG_2, BPF_REG_0, 0),
> > +		/* lookup in the map */
> > +		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
> > +			     BPF_FUNC_map_lookup_elem),
> > +
> > +		/* exit clean if null */
> > +		BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
> > +		BPF_EXIT_INSN(),
> > +
> > +		/* shift the buffer pointer to a variable location */
> > +		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, 0),
> > +		BPF_ALU64_REG(BPF_ADD, BPF_REG_6, BPF_REG_0),
> > +		/* clobber whatever's there */
> > +		BPF_MOV64_IMM(BPF_REG_7, 4242),
> > +		BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_7, 0),
> > +
> > +		BPF_MOV64_IMM(BPF_REG_0, 0),
> > +		BPF_EXIT_INSN(),
> > +	},
> > +	.fixup_map_hash_8b = { 1, },
> > +	.prog_type = BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE,
> > +	.errstr = "R6 invalid variable buffer offset: off=0, var_off=(0x0; 0xffffffff)",
> > +},
> > 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next v3 5/5] selftests: bpf: test writable buffers in raw tps
  2019-04-22 19:27     ` Matt Mullins
@ 2019-04-22 21:13       ` Yonghong Song
  0 siblings, 0 replies; 16+ messages in thread
From: Yonghong Song @ 2019-04-22 21:13 UTC (permalink / raw)
  To: Matt Mullins, netdev, Andrew Hall, bpf, ast
  Cc: Song Liu, linux-kernel, daniel, rostedt, mingo, shuah,
	Martin Lau, linux-kselftest, davem



On 4/22/19 12:27 PM, Matt Mullins wrote:
> On Mon, 2019-04-22 at 18:32 +0000, Yonghong Song wrote:
>>
>> On 4/19/19 2:04 PM, Matt Mullins wrote:
>>> This tests that:
>>>     * a BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE cannot be attached if it
>>>       uses either:
>>>       * a variable offset to the tracepoint buffer, or
>>>       * an offset beyond the size of the tracepoint buffer
>>>     * a tracer can modify the buffer provided when attached to a writable
>>>       tracepoint in bpf_prog_test_run
>>>
>>> Signed-off-by: Matt Mullins <mmullins@fb.com>
>>> ---
>>>    include/trace/events/bpf_test_run.h           | 50 ++++++++++++
>>>    net/bpf/test_run.c                            |  4 +
>>>    .../raw_tp_writable_reject_nbd_invalid.c      | 40 ++++++++++
>>>    .../bpf/prog_tests/raw_tp_writable_test_run.c | 80 +++++++++++++++++++
>>>    .../selftests/bpf/verifier/raw_tp_writable.c  | 34 ++++++++
>>>    5 files changed, 208 insertions(+)
>>>    create mode 100644 include/trace/events/bpf_test_run.h
>>>    create mode 100644 tools/testing/selftests/bpf/prog_tests/raw_tp_writable_reject_nbd_invalid.c
>>>    create mode 100644 tools/testing/selftests/bpf/prog_tests/raw_tp_writable_test_run.c
>>>    create mode 100644 tools/testing/selftests/bpf/verifier/raw_tp_writable.c
>>>
>>> diff --git a/include/trace/events/bpf_test_run.h b/include/trace/events/bpf_test_run.h
>>> new file mode 100644
>>> index 000000000000..abf466839ea4
>>> --- /dev/null
>>> +++ b/include/trace/events/bpf_test_run.h
>>> @@ -0,0 +1,50 @@
>>> +/* SPDX-License-Identifier: GPL-2.0 */
>>> +#undef TRACE_SYSTEM
>>> +#define TRACE_SYSTEM bpf_test_run
>>> +
>>> +#if !defined(_TRACE_NBD_H) || defined(TRACE_HEADER_MULTI_READ)
>>> +#define _TRACE_BPF_TEST_RUN_H
>>> +
>>> +#include <linux/tracepoint.h>
>>> +
>>> +DECLARE_EVENT_CLASS(bpf_test_finish,
>>> +
>>> +	TP_PROTO(int *err),
>>> +
>>> +	TP_ARGS(err),
>>> +
>>> +	TP_STRUCT__entry(
>>> +		__field(int, err)
>>> +	),
>>> +
>>> +	TP_fast_assign(
>>> +		__entry->err = *err;
>>> +	),
>>> +
>>> +	TP_printk("bpf_test_finish with err=%d", __entry->err)
>>> +);
>>> +
>>> +#ifdef DEFINE_EVENT_WRITABLE
>>> +#undef BPF_TEST_RUN_DEFINE_EVENT
>>> +#define BPF_TEST_RUN_DEFINE_EVENT(template, call, proto, args, size)	\
>>> +	DEFINE_EVENT_WRITABLE(template, call, PARAMS(proto),		\
>>> +			      PARAMS(args), size)
>>> +#else
>>> +#undef BPF_TEST_RUN_DEFINE_EVENT
>>> +#define BPF_TEST_RUN_DEFINE_EVENT(template, call, proto, args, size)	\
>>> +	DEFINE_EVENT(template, call, PARAMS(proto), PARAMS(args))
>>> +#endif
>>> +
>>> +BPF_TEST_RUN_DEFINE_EVENT(bpf_test_finish, bpf_test_finish,
>>> +
>>> +	TP_PROTO(int *err),
>>> +
>>> +	TP_ARGS(err),
>>> +
>>> +	sizeof(int)
>>> +);
>>> +
>>> +#endif
>>> +
>>> +/* This part must be outside protection */
>>> +#include <trace/define_trace.h>
>>> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
>>> index fab142b796ef..25e757102595 100644
>>> --- a/net/bpf/test_run.c
>>> +++ b/net/bpf/test_run.c
>>> @@ -13,6 +13,9 @@
>>>    #include <net/sock.h>
>>>    #include <net/tcp.h>
>>>    
>>> +#define CREATE_TRACE_POINTS
>>> +#include <trace/events/bpf_test_run.h>
>>> +
>>>    static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat,
>>>    			u32 *retval, u32 *time)
>>>    {
>>> @@ -100,6 +103,7 @@ static int bpf_test_finish(const union bpf_attr *kattr,
>>>    	if (err != -ENOSPC)
>>>    		err = 0;
>>>    out:
>>> +	trace_bpf_test_finish(&err);
>>>    	return err;
>>>    }
>>>    
>>> diff --git a/tools/testing/selftests/bpf/prog_tests/raw_tp_writable_reject_nbd_invalid.c b/tools/testing/selftests/bpf/prog_tests/raw_tp_writable_reject_nbd_invalid.c
>>> new file mode 100644
>>> index 000000000000..328d5c4b084b
>>> --- /dev/null
>>> +++ b/tools/testing/selftests/bpf/prog_tests/raw_tp_writable_reject_nbd_invalid.c
>>> @@ -0,0 +1,40 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +
>>> +#include <test_progs.h>
>>> +#include <linux/nbd.h>
>>> +
>>> +void test_raw_tp_writable_reject_nbd_invalid(void)
>>> +{
>>> +	__u32 duration = 0;
>>> +	char error[4096];
>>> +	int bpf_fd = -1, tp_fd = -1;
>>> +
>>> +	const struct bpf_insn program[] = {
>>> +		/* r6 is our tp buffer */
>>> +		BPF_LDX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, 0),
>>> +		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_6, 128),
>>
>> The number "128" is a little cryptic. Maybe you can use something like
>> sizeof(struct nbd_request)?
> 
> That was explicitly chosen to be (far) larger than an nbd_request, as
> this program should be rejected by the verifier.  If you really want, I
> can do `sizeof(struct nbd_request) + some constant` and add a comment.
> But the size of an nbd request should never change, as that's a network
> protocol.

I think `sizeof(struct nbd_request) + some constant` is better than 
number `128`.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next v3 1/5] bpf: add writable context for raw tracepoints
  2019-04-22 19:23     ` Matt Mullins
@ 2019-04-22 21:17       ` Yonghong Song
  2019-04-22 23:01         ` Matt Mullins
  0 siblings, 1 reply; 16+ messages in thread
From: Yonghong Song @ 2019-04-22 21:17 UTC (permalink / raw)
  To: Matt Mullins, netdev, Andrew Hall, bpf, ast
  Cc: daniel, linux-kernel, Martin Lau, rostedt, mingo, Song Liu



On 4/22/19 12:23 PM, Matt Mullins wrote:
> On Mon, 2019-04-22 at 18:12 +0000, Yonghong Song wrote:
>>
>> On 4/19/19 2:04 PM, Matt Mullins wrote:
>>> This is an opt-in interface that allows a tracepoint to provide a safe
>>> buffer that can be written from a BPF_PROG_TYPE_RAW_TRACEPOINT program.
>>> The size of the buffer must be a compile-time constant, and is checked
>>> before allowing a BPF program to attach to a tracepoint that uses this
>>> feature.
>>>
>>> The pointer to this buffer will be the first argument of tracepoints
>>> that opt in; the buffer is readable by both BPF_PROG_TYPE_RAW_TRACEPOINT
>>> and BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE programs that attach to such a
>>> tracepoint, but the buffer to which it points may only be written by the
>>> latter.
>>>
>>> Signed-off-by: Matt Mullins <mmullins@fb.com>
>>> ---
>>>    include/linux/bpf.h             |  2 ++
>>>    include/linux/bpf_types.h       |  1 +
>>>    include/linux/tracepoint-defs.h |  1 +
>>>    include/trace/bpf_probe.h       | 27 +++++++++++++++++++++++++--
>>>    include/uapi/linux/bpf.h        |  1 +
>>>    kernel/bpf/syscall.c            |  8 ++++++--
>>>    kernel/bpf/verifier.c           | 31 +++++++++++++++++++++++++++++++
>>>    kernel/trace/bpf_trace.c        | 21 +++++++++++++++++++++
>>>    8 files changed, 88 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>>> index a2132e09dc1c..d3c71fd67476 100644
>>> --- a/include/linux/bpf.h
>>> +++ b/include/linux/bpf.h
>>> @@ -263,6 +263,7 @@ enum bpf_reg_type {
>>>    	PTR_TO_SOCK_COMMON_OR_NULL, /* reg points to sock_common or NULL */
>>>    	PTR_TO_TCP_SOCK,	 /* reg points to struct tcp_sock */
>>>    	PTR_TO_TCP_SOCK_OR_NULL, /* reg points to struct tcp_sock or NULL */
>>> +	PTR_TO_TP_BUFFER,	 /* reg points to a writable raw tp's buffer */
>>>    };
>>>    
>>
>> [...]
>>>    /* truncate register to smaller size (in bytes)
>>>     * must be called with size < BPF_REG_SIZE
>>>     */
>>> @@ -2100,6 +2127,10 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
>>>    		err = check_sock_access(env, insn_idx, regno, off, size, t);
>>>    		if (!err && value_regno >= 0)
>>>    			mark_reg_unknown(env, regs, value_regno);
>>> +	} else if (reg->type == PTR_TO_TP_BUFFER) {
>>> +		err = check_tp_buffer_access(env, reg, regno, off, size);
>>> +		if (!err && t == BPF_READ && value_regno >= 0)
>>> +			mark_reg_unknown(env, regs, value_regno);
>>>    	} else {
>>>    		verbose(env, "R%d invalid mem access '%s'\n", regno,
>>>    			reg_type_str[reg->type]);
>>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>>> index d64c00afceb5..a2dd79dc6871 100644
>>> --- a/kernel/trace/bpf_trace.c
>>> +++ b/kernel/trace/bpf_trace.c
>>> @@ -909,6 +909,24 @@ const struct bpf_verifier_ops raw_tracepoint_verifier_ops = {
>>>    const struct bpf_prog_ops raw_tracepoint_prog_ops = {
>>>    };
>>>    
>>> +static bool raw_tp_writable_prog_is_valid_access(int off, int size,
>>> +						 enum bpf_access_type type,
>>> +						 const struct bpf_prog *prog,
>>> +						 struct bpf_insn_access_aux *info)
>>> +{
>>> +	if (off == 0 && size == sizeof(u64))
>>> +		info->reg_type = PTR_TO_TP_BUFFER;
>>
>> on 32bit system, the first argument pointer size could be sizeof(u32)?
> 
> As far as I can tell, pointers are always 64 bits wide from the
> perspective of the eBPF instruction set.  I think the proper fixup is
> in include/trace/events/nbd.h ... I should use a u64 instead of a
> pointer type.

u64 is okay. You may want to double check tracepoint definition to 
ensure the assign to the first argument converting to u64 as well to 
avoid potential garbage. It would be good if this is enforced during
compilation time.

> 
>> Should the first argument for raw_tp_writable_prog be always
>> PTR_TO_TP_BUFFER?
> 
> That is the purpose of this patch series, yes.  My initial attempt at
> this tried to add it to the end of the context structure instead, and
> that ended up being quite complex to track.

So `size == sizeof(u64)` can be removed, off == 0 just implies
reg_type PTR_TO_TP_BUFFER?

> 
>>
>>> +	return raw_tp_prog_is_valid_access(off, size, type, prog, info);
>>> +}
>>> +
>>> +const struct bpf_verifier_ops raw_tracepoint_writable_verifier_ops = {
>>> +	.get_func_proto  = raw_tp_prog_func_proto,
>>> +	.is_valid_access = raw_tp_writable_prog_is_valid_access,
>>> +};
>>> +
>>> +const struct bpf_prog_ops raw_tracepoint_writable_prog_ops = {
>>> +};
>>> +
>>>    static bool pe_prog_is_valid_access(int off, int size, enum bpf_access_type type,
>>>    				    const struct bpf_prog *prog,
>>>    				    struct bpf_insn_access_aux *info)
>>> @@ -1198,6 +1216,9 @@ static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *
>>>    	if (prog->aux->max_ctx_offset > btp->num_args * sizeof(u64))
>>>    		return -EINVAL;
>>>    
>>> +	if (prog->aux->max_tp_access > btp->writable_size)
>>> +		return -EINVAL;
>>> +
>>>    	return tracepoint_probe_register(tp, (void *)btp->bpf_func, prog);
>>>    }
>>>    
>>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next v3 1/5] bpf: add writable context for raw tracepoints
  2019-04-22 21:17       ` Yonghong Song
@ 2019-04-22 23:01         ` Matt Mullins
  2019-04-22 23:16           ` Yonghong Song
  0 siblings, 1 reply; 16+ messages in thread
From: Matt Mullins @ 2019-04-22 23:01 UTC (permalink / raw)
  To: netdev, Yonghong Song, Andrew Hall, bpf, ast
  Cc: daniel, linux-kernel, Martin Lau, rostedt, mingo, Song Liu

On Mon, 2019-04-22 at 21:17 +0000, Yonghong Song wrote:
> 
> On 4/22/19 12:23 PM, Matt Mullins wrote:
> > On Mon, 2019-04-22 at 18:12 +0000, Yonghong Song wrote:
> > > 
> > > On 4/19/19 2:04 PM, Matt Mullins wrote:
> > > > This is an opt-in interface that allows a tracepoint to provide a safe
> > > > buffer that can be written from a BPF_PROG_TYPE_RAW_TRACEPOINT program.
> > > > The size of the buffer must be a compile-time constant, and is checked
> > > > before allowing a BPF program to attach to a tracepoint that uses this
> > > > feature.
> > > > 
> > > > The pointer to this buffer will be the first argument of tracepoints
> > > > that opt in; the buffer is readable by both BPF_PROG_TYPE_RAW_TRACEPOINT
> > > > and BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE programs that attach to such a
> > > > tracepoint, but the buffer to which it points may only be written by the
> > > > latter.
> > > > 
> > > > Signed-off-by: Matt Mullins <mmullins@fb.com>
> > > > ---
> > > >    include/linux/bpf.h             |  2 ++
> > > >    include/linux/bpf_types.h       |  1 +
> > > >    include/linux/tracepoint-defs.h |  1 +
> > > >    include/trace/bpf_probe.h       | 27 +++++++++++++++++++++++++--
> > > >    include/uapi/linux/bpf.h        |  1 +
> > > >    kernel/bpf/syscall.c            |  8 ++++++--
> > > >    kernel/bpf/verifier.c           | 31 +++++++++++++++++++++++++++++++
> > > >    kernel/trace/bpf_trace.c        | 21 +++++++++++++++++++++
> > > >    8 files changed, 88 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > index a2132e09dc1c..d3c71fd67476 100644
> > > > --- a/include/linux/bpf.h
> > > > +++ b/include/linux/bpf.h
> > > > @@ -263,6 +263,7 @@ enum bpf_reg_type {
> > > >    	PTR_TO_SOCK_COMMON_OR_NULL, /* reg points to sock_common or NULL */
> > > >    	PTR_TO_TCP_SOCK,	 /* reg points to struct tcp_sock */
> > > >    	PTR_TO_TCP_SOCK_OR_NULL, /* reg points to struct tcp_sock or NULL */
> > > > +	PTR_TO_TP_BUFFER,	 /* reg points to a writable raw tp's buffer */
> > > >    };
> > > >    
> > > 
> > > [...]
> > > >    /* truncate register to smaller size (in bytes)
> > > >     * must be called with size < BPF_REG_SIZE
> > > >     */
> > > > @@ -2100,6 +2127,10 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
> > > >    		err = check_sock_access(env, insn_idx, regno, off, size, t);
> > > >    		if (!err && value_regno >= 0)
> > > >    			mark_reg_unknown(env, regs, value_regno);
> > > > +	} else if (reg->type == PTR_TO_TP_BUFFER) {
> > > > +		err = check_tp_buffer_access(env, reg, regno, off, size);
> > > > +		if (!err && t == BPF_READ && value_regno >= 0)
> > > > +			mark_reg_unknown(env, regs, value_regno);
> > > >    	} else {
> > > >    		verbose(env, "R%d invalid mem access '%s'\n", regno,
> > > >    			reg_type_str[reg->type]);
> > > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > > > index d64c00afceb5..a2dd79dc6871 100644
> > > > --- a/kernel/trace/bpf_trace.c
> > > > +++ b/kernel/trace/bpf_trace.c
> > > > @@ -909,6 +909,24 @@ const struct bpf_verifier_ops raw_tracepoint_verifier_ops = {
> > > >    const struct bpf_prog_ops raw_tracepoint_prog_ops = {
> > > >    };
> > > >    
> > > > +static bool raw_tp_writable_prog_is_valid_access(int off, int size,
> > > > +						 enum bpf_access_type type,
> > > > +						 const struct bpf_prog *prog,
> > > > +						 struct bpf_insn_access_aux *info)
> > > > +{
> > > > +	if (off == 0 && size == sizeof(u64))
> > > > +		info->reg_type = PTR_TO_TP_BUFFER;
> > > 
> > > on 32bit system, the first argument pointer size could be sizeof(u32)?
> > 
> > As far as I can tell, pointers are always 64 bits wide from the
> > perspective of the eBPF instruction set.  I think the proper fixup is
> > in include/trace/events/nbd.h ... I should use a u64 instead of a
> > pointer type.
> 
> u64 is okay. You may want to double check tracepoint definition to 
> ensure the assign to the first argument converting to u64 as well to 
> avoid potential garbage. It would be good if this is enforced during
> compilation time.

Now that I've looked into this further, this is already handled in
include/trace/bpf_probe.h:

#undef DECLARE_EVENT_CLASS
#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print)	\
static notrace void							\
__bpf_trace_##call(void *__data, proto)					\
{									\
	struct bpf_prog *prog = __data;					\
	CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args));	\
}

The 32-bit value of the struct nbd_request pointer will be zero-
extended to 64-bits before the BPF program sees it, so there won't be
any garbage in the upper half of the register.  I'm going to leave the
trace_* functions taking the pointer as-is, so that I can keep the
compile-time checks that writable_size == sizeof(*first_argument).

> > > Should the first argument for raw_tp_writable_prog be always
> > > PTR_TO_TP_BUFFER?
> > 
> > That is the purpose of this patch series, yes.  My initial attempt at
> > this tried to add it to the end of the context structure instead, and
> > that ended up being quite complex to track.
> 
> So `size == sizeof(u64)` can be removed, off == 0 just implies
> reg_type PTR_TO_TP_BUFFER?

I can't get rid of the size check, because I can emit an opcode like

  0: (71) r6 = *(u8 *)(r1 +0)

and I don't want to accidentally mark a value as PTR_TO_TP_BUFFER
unless it is a whole, valid pointer.

> > > > +	return raw_tp_prog_is_valid_access(off, size, type, prog, info);
> > > > +}
> > > > +
> > > > +const struct bpf_verifier_ops raw_tracepoint_writable_verifier_ops = {
> > > > +	.get_func_proto  = raw_tp_prog_func_proto,
> > > > +	.is_valid_access = raw_tp_writable_prog_is_valid_access,
> > > > +};
> > > > +
> > > > +const struct bpf_prog_ops raw_tracepoint_writable_prog_ops = {
> > > > +};
> > > > +
> > > >    static bool pe_prog_is_valid_access(int off, int size, enum bpf_access_type type,
> > > >    				    const struct bpf_prog *prog,
> > > >    				    struct bpf_insn_access_aux *info)
> > > > @@ -1198,6 +1216,9 @@ static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *
> > > >    	if (prog->aux->max_ctx_offset > btp->num_args * sizeof(u64))
> > > >    		return -EINVAL;
> > > >    
> > > > +	if (prog->aux->max_tp_access > btp->writable_size)
> > > > +		return -EINVAL;
> > > > +
> > > >    	return tracepoint_probe_register(tp, (void *)btp->bpf_func, prog);
> > > >    }
> > > >    
> > > > 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next v3 1/5] bpf: add writable context for raw tracepoints
  2019-04-22 23:01         ` Matt Mullins
@ 2019-04-22 23:16           ` Yonghong Song
  0 siblings, 0 replies; 16+ messages in thread
From: Yonghong Song @ 2019-04-22 23:16 UTC (permalink / raw)
  To: Matt Mullins, netdev, Andrew Hall, bpf, ast
  Cc: daniel, linux-kernel, Martin Lau, rostedt, mingo, Song Liu



On 4/22/19 4:01 PM, Matt Mullins wrote:
> On Mon, 2019-04-22 at 21:17 +0000, Yonghong Song wrote:
>>
>> On 4/22/19 12:23 PM, Matt Mullins wrote:
>>> On Mon, 2019-04-22 at 18:12 +0000, Yonghong Song wrote:
>>>>
>>>> On 4/19/19 2:04 PM, Matt Mullins wrote:
>>>>> This is an opt-in interface that allows a tracepoint to provide a safe
>>>>> buffer that can be written from a BPF_PROG_TYPE_RAW_TRACEPOINT program.
>>>>> The size of the buffer must be a compile-time constant, and is checked
>>>>> before allowing a BPF program to attach to a tracepoint that uses this
>>>>> feature.
>>>>>
>>>>> The pointer to this buffer will be the first argument of tracepoints
>>>>> that opt in; the buffer is readable by both BPF_PROG_TYPE_RAW_TRACEPOINT
>>>>> and BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE programs that attach to such a
>>>>> tracepoint, but the buffer to which it points may only be written by the
>>>>> latter.
>>>>>
>>>>> Signed-off-by: Matt Mullins <mmullins@fb.com>
>>>>> ---
>>>>>     include/linux/bpf.h             |  2 ++
>>>>>     include/linux/bpf_types.h       |  1 +
>>>>>     include/linux/tracepoint-defs.h |  1 +
>>>>>     include/trace/bpf_probe.h       | 27 +++++++++++++++++++++++++--
>>>>>     include/uapi/linux/bpf.h        |  1 +
>>>>>     kernel/bpf/syscall.c            |  8 ++++++--
>>>>>     kernel/bpf/verifier.c           | 31 +++++++++++++++++++++++++++++++
>>>>>     kernel/trace/bpf_trace.c        | 21 +++++++++++++++++++++
>>>>>     8 files changed, 88 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>>>>> index a2132e09dc1c..d3c71fd67476 100644
>>>>> --- a/include/linux/bpf.h
>>>>> +++ b/include/linux/bpf.h
>>>>> @@ -263,6 +263,7 @@ enum bpf_reg_type {
>>>>>     	PTR_TO_SOCK_COMMON_OR_NULL, /* reg points to sock_common or NULL */
>>>>>     	PTR_TO_TCP_SOCK,	 /* reg points to struct tcp_sock */
>>>>>     	PTR_TO_TCP_SOCK_OR_NULL, /* reg points to struct tcp_sock or NULL */
>>>>> +	PTR_TO_TP_BUFFER,	 /* reg points to a writable raw tp's buffer */
>>>>>     };
>>>>>     
>>>>
>>>> [...]
>>>>>     /* truncate register to smaller size (in bytes)
>>>>>      * must be called with size < BPF_REG_SIZE
>>>>>      */
>>>>> @@ -2100,6 +2127,10 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
>>>>>     		err = check_sock_access(env, insn_idx, regno, off, size, t);
>>>>>     		if (!err && value_regno >= 0)
>>>>>     			mark_reg_unknown(env, regs, value_regno);
>>>>> +	} else if (reg->type == PTR_TO_TP_BUFFER) {
>>>>> +		err = check_tp_buffer_access(env, reg, regno, off, size);
>>>>> +		if (!err && t == BPF_READ && value_regno >= 0)
>>>>> +			mark_reg_unknown(env, regs, value_regno);
>>>>>     	} else {
>>>>>     		verbose(env, "R%d invalid mem access '%s'\n", regno,
>>>>>     			reg_type_str[reg->type]);
>>>>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>>>>> index d64c00afceb5..a2dd79dc6871 100644
>>>>> --- a/kernel/trace/bpf_trace.c
>>>>> +++ b/kernel/trace/bpf_trace.c
>>>>> @@ -909,6 +909,24 @@ const struct bpf_verifier_ops raw_tracepoint_verifier_ops = {
>>>>>     const struct bpf_prog_ops raw_tracepoint_prog_ops = {
>>>>>     };
>>>>>     
>>>>> +static bool raw_tp_writable_prog_is_valid_access(int off, int size,
>>>>> +						 enum bpf_access_type type,
>>>>> +						 const struct bpf_prog *prog,
>>>>> +						 struct bpf_insn_access_aux *info)
>>>>> +{
>>>>> +	if (off == 0 && size == sizeof(u64))
>>>>> +		info->reg_type = PTR_TO_TP_BUFFER;
>>>>
>>>> on 32bit system, the first argument pointer size could be sizeof(u32)?
>>>
>>> As far as I can tell, pointers are always 64 bits wide from the
>>> perspective of the eBPF instruction set.  I think the proper fixup is
>>> in include/trace/events/nbd.h ... I should use a u64 instead of a
>>> pointer type.
>>
>> u64 is okay. You may want to double check tracepoint definition to
>> ensure the assign to the first argument converting to u64 as well to
>> avoid potential garbage. It would be good if this is enforced during
>> compilation time.
> 
> Now that I've looked into this further, this is already handled in
> include/trace/bpf_probe.h:
> 
> #undef DECLARE_EVENT_CLASS
> #define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print)	\
> static notrace void							\
> __bpf_trace_##call(void *__data, proto)					\
> {									\
> 	struct bpf_prog *prog = __data;					\
> 	CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args));	\
> }
> 
> The 32-bit value of the struct nbd_request pointer will be zero-
> extended to 64-bits before the BPF program sees it, so there won't be
> any garbage in the upper half of the register.  I'm going to leave the
> trace_* functions taking the pointer as-is, so that I can keep the
> compile-time checks that writable_size == sizeof(*first_argument).
> 
>>>> Should the first argument for raw_tp_writable_prog be always
>>>> PTR_TO_TP_BUFFER?
>>>
>>> That is the purpose of this patch series, yes.  My initial attempt at
>>> this tried to add it to the end of the context structure instead, and
>>> that ended up being quite complex to track.
>>
>> So `size == sizeof(u64)` can be removed, off == 0 just implies
>> reg_type PTR_TO_TP_BUFFER?
> 
> I can't get rid of the size check, because I can emit an opcode like
> 
>    0: (71) r6 = *(u8 *)(r1 +0)
> 
> and I don't want to accidentally mark a value as PTR_TO_TP_BUFFER
> unless it is a whole, valid pointer.

We could reject if off == 0 && size != 8 to allow only whole pointer 
access. I cannot think that user will use r6 in a meaningful except the 
hash. But if you have a valid use case to permit such access, I
am okay with that.

> 
>>>>> +	return raw_tp_prog_is_valid_access(off, size, type, prog, info);
>>>>> +}
>>>>> +
>>>>> +const struct bpf_verifier_ops raw_tracepoint_writable_verifier_ops = {
>>>>> +	.get_func_proto  = raw_tp_prog_func_proto,
>>>>> +	.is_valid_access = raw_tp_writable_prog_is_valid_access,
>>>>> +};
>>>>> +
>>>>> +const struct bpf_prog_ops raw_tracepoint_writable_prog_ops = {
>>>>> +};
>>>>> +
>>>>>     static bool pe_prog_is_valid_access(int off, int size, enum bpf_access_type type,
>>>>>     				    const struct bpf_prog *prog,
>>>>>     				    struct bpf_insn_access_aux *info)
>>>>> @@ -1198,6 +1216,9 @@ static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *
>>>>>     	if (prog->aux->max_ctx_offset > btp->num_args * sizeof(u64))
>>>>>     		return -EINVAL;
>>>>>     
>>>>> +	if (prog->aux->max_tp_access > btp->writable_size)
>>>>> +		return -EINVAL;
>>>>> +
>>>>>     	return tracepoint_probe_register(tp, (void *)btp->bpf_func, prog);
>>>>>     }
>>>>>     
>>>>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2019-04-22 23:17 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-19 21:04 [PATCH bpf-next v3 0/5] writable contexts for bpf raw tracepoints Matt Mullins
2019-04-19 21:04 ` [PATCH bpf-next v3 1/5] bpf: add writable context for " Matt Mullins
2019-04-22 18:12   ` Yonghong Song
2019-04-22 19:23     ` Matt Mullins
2019-04-22 21:17       ` Yonghong Song
2019-04-22 23:01         ` Matt Mullins
2019-04-22 23:16           ` Yonghong Song
2019-04-19 21:04 ` [PATCH bpf-next v3 2/5] nbd: trace sending nbd requests Matt Mullins
2019-04-19 21:16   ` Josef Bacik
2019-04-19 21:04 ` [PATCH bpf-next v3 3/5] nbd: add tracepoints for send/receive timing Matt Mullins
2019-04-19 21:16   ` Josef Bacik
2019-04-19 21:04 ` [PATCH bpf-next v3 4/5] tools: sync bpf.h Matt Mullins
2019-04-19 21:04 ` [PATCH bpf-next v3 5/5] selftests: bpf: test writable buffers in raw tps Matt Mullins
2019-04-22 18:32   ` Yonghong Song
2019-04-22 19:27     ` Matt Mullins
2019-04-22 21:13       ` Yonghong Song

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).