linux-kernel.vger.kernel.org archive mirror
* [PATCH bpf-next 0/3] writable contexts for bpf raw tracepoints
@ 2019-03-29  0:07 Matt Mullins
  2019-03-29  0:07 ` [PATCH bpf-next 1/3] bpf: add writable context for " Matt Mullins
  2019-03-29  0:07 ` [PATCH bpf-next 3/3] nbd: add tracepoints for send/receive timing Matt Mullins
  0 siblings, 2 replies; 7+ messages in thread
From: Matt Mullins @ 2019-03-29  0:07 UTC (permalink / raw)
  To: hall, mmullins, ast, bpf, netdev
  Cc: linux-kernel, Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song

This adds an opt-in interface that lets tracepoints expose a writable context
to attached BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE programs, while preserving
read-only access for existing BPF_PROG_TYPE_RAW_TRACEPOINT programs, as well
as for non-BPF tracepoint consumers.

The initial motivation is to support tracing that can be observed from the
remote end of an NBD socket, e.g. by adding flags to the struct nbd_request
header.  Earlier attempts included adding an NBD-specific tracepoint fd, but in
code review it was suggested that I implement this more generically -- as a
result, this patchset is far simpler than my initial attempt.
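
For illustration only, a program for the new writable type might look like the
sketch below.  The struct layout, flag value, and SEC() convention are
placeholders of mine, not anything defined by this series; the only contract
the series introduces is that the first context argument of an opted-in
tracepoint points at the writable buffer.

```c
/* Hypothetical sketch of a BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE
 * program; the section name and struct are illustrative only. */
#include <stdint.h>

#ifndef SEC
#define SEC(name) __attribute__((section(name), used))
#endif

/* Hypothetical writable header a tracepoint could expose, e.g. extra
 * flags associated with an nbd request before it goes on the wire. */
struct nbd_flags_hdr {
	uint32_t flags;
};

SEC("raw_tracepoint_writable/nbd_send_request")
int set_trace_flag(struct nbd_flags_hdr *hdr)
{
	/* The verifier tracks the largest offset written through this
	 * pointer (max_tp_access) and rejects the attach if it exceeds
	 * the tracepoint's declared writable_size. */
	hdr->flags |= 0x1;
	return 0;
}
```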

Andrew Hall (1):
  nbd: add tracepoints for send/receive timing

Matt Mullins (2):
  bpf: add writable context for raw tracepoints
  nbd: trace sending nbd requests

 MAINTAINERS                     |   1 +
 drivers/block/nbd.c             |  13 +++
 include/linux/bpf.h             |   2 +
 include/linux/bpf_types.h       |   1 +
 include/linux/tracepoint-defs.h |   1 +
 include/trace/bpf_probe.h       |  27 +++++-
 include/trace/events/nbd.h      | 148 ++++++++++++++++++++++++++++++++
 include/uapi/linux/bpf.h        |   1 +
 kernel/bpf/syscall.c            |   8 +-
 kernel/bpf/verifier.c           |  11 +++
 kernel/trace/bpf_trace.c        |  21 +++++
 11 files changed, 230 insertions(+), 4 deletions(-)
 create mode 100644 include/trace/events/nbd.h

-- 
2.17.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH bpf-next 1/3] bpf: add writable context for raw tracepoints
  2019-03-29  0:07 [PATCH bpf-next 0/3] writable contexts for bpf raw tracepoints Matt Mullins
@ 2019-03-29  0:07 ` Matt Mullins
  2019-04-01 20:40   ` Daniel Borkmann
  2019-03-29  0:07 ` [PATCH bpf-next 3/3] nbd: add tracepoints for send/receive timing Matt Mullins
  1 sibling, 1 reply; 7+ messages in thread
From: Matt Mullins @ 2019-03-29  0:07 UTC (permalink / raw)
  To: hall, mmullins, ast, bpf, netdev
  Cc: linux-kernel, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Steven Rostedt, Ingo Molnar

This is an opt-in interface that allows a tracepoint to provide a safe
buffer that can be written from a BPF_PROG_TYPE_RAW_TRACEPOINT program.
The size of the buffer must be a compile-time constant, and is checked
before allowing a BPF program to attach to a tracepoint that uses this
feature.

The pointer to this buffer will be the first argument of tracepoints
that opt in; the buffer is readable by both BPF_PROG_TYPE_RAW_TRACEPOINT
and BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE programs that attach to such a
tracepoint, but the buffer to which it points may only be written by the
latter.
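
The compile-time size check can be demonstrated outside the kernel with
simplified stand-ins for the macros involved; the buffer struct below is a
placeholder, and BUILD_BUG_ON_ZERO() is reproduced in its kernel form:

```c
/* Standalone demo of the DEFINE_EVENT_WRITABLE() size check: the
 * declared size must equal sizeof(*first_arg), or the negative
 * bitfield width inside BUILD_BUG_ON_ZERO() breaks the build. */
#include <stddef.h>

#define FIRST(x, ...) x
/* Kernel-style: evaluates to 0 when !e, fails to compile when e. */
#define BUILD_BUG_ON_ZERO(e) ((size_t)sizeof(struct { int:(-!!(e)); }))

struct demo_buf { unsigned long long flags; };	/* stand-in buffer */

static size_t checked_size(struct demo_buf *buf)
{
	/* Compiles only because the two sizes agree. */
	return sizeof(*FIRST(buf)) +
	       BUILD_BUG_ON_ZERO(sizeof(struct demo_buf) !=
				 sizeof(*FIRST(buf)));
}
```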

bpf_probe: assert that writable tracepoint size is correct

Signed-off-by: Matt Mullins <mmullins@fb.com>
---
 include/linux/bpf.h             |  2 ++
 include/linux/bpf_types.h       |  1 +
 include/linux/tracepoint-defs.h |  1 +
 include/trace/bpf_probe.h       | 27 +++++++++++++++++++++++++--
 include/uapi/linux/bpf.h        |  1 +
 kernel/bpf/syscall.c            |  8 ++++++--
 kernel/bpf/verifier.c           | 11 +++++++++++
 kernel/trace/bpf_trace.c        | 21 +++++++++++++++++++++
 8 files changed, 68 insertions(+), 4 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index a2132e09dc1c..d3c71fd67476 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -263,6 +263,7 @@ enum bpf_reg_type {
 	PTR_TO_SOCK_COMMON_OR_NULL, /* reg points to sock_common or NULL */
 	PTR_TO_TCP_SOCK,	 /* reg points to struct tcp_sock */
 	PTR_TO_TCP_SOCK_OR_NULL, /* reg points to struct tcp_sock or NULL */
+	PTR_TO_TP_BUFFER,	 /* reg points to a writable raw tp's buffer */
 };
 
 /* The information passed from prog-specific *_is_valid_access
@@ -352,6 +353,7 @@ struct bpf_prog_aux {
 	u32 used_map_cnt;
 	u32 max_ctx_offset;
 	u32 max_pkt_offset;
+	u32 max_tp_access;
 	u32 stack_depth;
 	u32 id;
 	u32 func_cnt; /* used by non-func prog as the number of func progs */
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 08bf2f1fe553..c766108608cb 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -25,6 +25,7 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_KPROBE, kprobe)
 BPF_PROG_TYPE(BPF_PROG_TYPE_TRACEPOINT, tracepoint)
 BPF_PROG_TYPE(BPF_PROG_TYPE_PERF_EVENT, perf_event)
 BPF_PROG_TYPE(BPF_PROG_TYPE_RAW_TRACEPOINT, raw_tracepoint)
+BPF_PROG_TYPE(BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE, raw_tracepoint_writable)
 #endif
 #ifdef CONFIG_CGROUP_BPF
 BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_DEVICE, cg_dev)
diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h
index 49ba9cde7e4b..b29950a19205 100644
--- a/include/linux/tracepoint-defs.h
+++ b/include/linux/tracepoint-defs.h
@@ -45,6 +45,7 @@ struct bpf_raw_event_map {
 	struct tracepoint	*tp;
 	void			*bpf_func;
 	u32			num_args;
+	u32			writable_size;
 } __aligned(32);
 
 #endif
diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
index 505dae0bed80..d6e556c0a085 100644
--- a/include/trace/bpf_probe.h
+++ b/include/trace/bpf_probe.h
@@ -69,8 +69,7 @@ __bpf_trace_##call(void *__data, proto)					\
  * to make sure that if the tracepoint handling changes, the
  * bpf probe will fail to compile unless it too is updated.
  */
-#undef DEFINE_EVENT
-#define DEFINE_EVENT(template, call, proto, args)			\
+#define __DEFINE_EVENT(template, call, proto, args, size)		\
 static inline void bpf_test_probe_##call(void)				\
 {									\
 	check_trace_callback_type_##call(__bpf_trace_##template);	\
@@ -81,12 +80,36 @@ __bpf_trace_tp_map_##call = {						\
 	.tp		= &__tracepoint_##call,				\
 	.bpf_func	= (void *)__bpf_trace_##template,		\
 	.num_args	= COUNT_ARGS(args),				\
+	.writable_size	= size,						\
 };
 
+#define FIRST(x, ...) x
+
+#undef DEFINE_EVENT_WRITABLE
+#define DEFINE_EVENT_WRITABLE(template, call, proto, args, size)	\
+static inline void bpf_test_buffer_##call(void)				\
+{									\
+	/* BUILD_BUG_ON() is ignored if the code is completely eliminated, but \
+	 * BUILD_BUG_ON_ZERO() uses a different mechanism that is not	\
+	 * dead-code-eliminated.					\
+	 */								\
+	FIRST(proto);							\
+	(void)BUILD_BUG_ON_ZERO(size != sizeof(*FIRST(args)));		\
+}									\
+__DEFINE_EVENT(template, call, PARAMS(proto), PARAMS(args), size)
+
+#undef DEFINE_EVENT
+#define DEFINE_EVENT(template, call, proto, args)			\
+	__DEFINE_EVENT(template, call, PARAMS(proto), PARAMS(args), 0)
 
 #undef DEFINE_EVENT_PRINT
 #define DEFINE_EVENT_PRINT(template, name, proto, args, print)	\
 	DEFINE_EVENT(template, name, PARAMS(proto), PARAMS(args))
 
 #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
+
+#undef DEFINE_EVENT_WRITABLE
+#undef __DEFINE_EVENT
+#undef FIRST
+
 #endif /* CONFIG_BPF_EVENTS */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 3c38ac9a92a7..c5335d53ce82 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -166,6 +166,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_LIRC_MODE2,
 	BPF_PROG_TYPE_SK_REUSEPORT,
 	BPF_PROG_TYPE_FLOW_DISSECTOR,
+	BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE,
 };
 
 enum bpf_attach_type {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 62f6bced3a3c..27e2f22879a4 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1720,12 +1720,16 @@ static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
 	}
 	raw_tp->btp = btp;
 
-	prog = bpf_prog_get_type(attr->raw_tracepoint.prog_fd,
-				 BPF_PROG_TYPE_RAW_TRACEPOINT);
+	prog = bpf_prog_get(attr->raw_tracepoint.prog_fd);
 	if (IS_ERR(prog)) {
 		err = PTR_ERR(prog);
 		goto out_free_tp;
 	}
+	if (prog->type != BPF_PROG_TYPE_RAW_TRACEPOINT &&
+	    prog->type != BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE) {
+		err = -EINVAL;
+		goto out_put_prog;
+	}
 
 	err = bpf_probe_register(raw_tp->btp, prog);
 	if (err)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index ce166a002d16..b6b4a2ca9f0c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2100,6 +2100,17 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 		err = check_sock_access(env, insn_idx, regno, off, size, t);
 		if (!err && value_regno >= 0)
 			mark_reg_unknown(env, regs, value_regno);
+	} else if (reg->type == PTR_TO_TP_BUFFER) {
+		if (off < 0) {
+			verbose(env,
+				"R%d invalid tracepoint buffer access: off=%d, size=%d",
+				value_regno, off, size);
+			return -EACCES;
+		}
+		if (off + size > env->prog->aux->max_tp_access)
+			env->prog->aux->max_tp_access = off + size;
+		if (t == BPF_READ && value_regno >= 0)
+			mark_reg_unknown(env, regs, value_regno);
 	} else {
 		verbose(env, "R%d invalid mem access '%s'\n", regno,
 			reg_type_str[reg->type]);
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index d64c00afceb5..a2dd79dc6871 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -909,6 +909,24 @@ const struct bpf_verifier_ops raw_tracepoint_verifier_ops = {
 const struct bpf_prog_ops raw_tracepoint_prog_ops = {
 };
 
+static bool raw_tp_writable_prog_is_valid_access(int off, int size,
+						 enum bpf_access_type type,
+						 const struct bpf_prog *prog,
+						 struct bpf_insn_access_aux *info)
+{
+	if (off == 0 && size == sizeof(u64))
+		info->reg_type = PTR_TO_TP_BUFFER;
+	return raw_tp_prog_is_valid_access(off, size, type, prog, info);
+}
+
+const struct bpf_verifier_ops raw_tracepoint_writable_verifier_ops = {
+	.get_func_proto  = raw_tp_prog_func_proto,
+	.is_valid_access = raw_tp_writable_prog_is_valid_access,
+};
+
+const struct bpf_prog_ops raw_tracepoint_writable_prog_ops = {
+};
+
 static bool pe_prog_is_valid_access(int off, int size, enum bpf_access_type type,
 				    const struct bpf_prog *prog,
 				    struct bpf_insn_access_aux *info)
@@ -1198,6 +1216,9 @@ static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *
 	if (prog->aux->max_ctx_offset > btp->num_args * sizeof(u64))
 		return -EINVAL;
 
+	if (prog->aux->max_tp_access > btp->writable_size)
+		return -EINVAL;
+
 	return tracepoint_probe_register(tp, (void *)btp->bpf_func, prog);
 }
 
-- 
2.17.1



* [PATCH bpf-next 3/3] nbd: add tracepoints for send/receive timing
  2019-03-29  0:07 [PATCH bpf-next 0/3] writable contexts for bpf raw tracepoints Matt Mullins
  2019-03-29  0:07 ` [PATCH bpf-next 1/3] bpf: add writable context for " Matt Mullins
@ 2019-03-29  0:07 ` Matt Mullins
  1 sibling, 0 replies; 7+ messages in thread
From: Matt Mullins @ 2019-03-29  0:07 UTC (permalink / raw)
  To: hall, mmullins, ast, bpf, netdev
  Cc: linux-kernel, Josef Bacik, Jens Axboe, Steven Rostedt,
	Ingo Molnar, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, linux-block, nbd

From: Andrew Hall <hall@fb.com>

This adds four tracepoints to nbd, enabling separate tracing of payload
and header sending/receipt.

In the send path, for headers that have already been sent, we also explicitly
initialize the handle so the later tracepoint can still reference it.
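
nbd_cmd_handle() itself is not shown in this diff; purely as a hypothetical
illustration of why a single u64 handle is enough to correlate the four events
across send and receive, a handle could pack a command-reuse cookie together
with the block-layer tag (the real encoding lives in drivers/block/nbd.c):

```c
/* Hypothetical handle encoding -- illustrative only, not the actual
 * nbd_cmd_handle() implementation. */
#include <stdint.h>

#define DEMO_COOKIE_BITS 32

/* Pack a command-reuse cookie with the request tag into one u64. */
static uint64_t demo_handle(uint32_t cookie, uint32_t tag)
{
	return ((uint64_t)cookie << DEMO_COOKIE_BITS) | tag;
}

/* Recover the tag on the receive side to find the matching request. */
static uint32_t demo_handle_tag(uint64_t handle)
{
	return (uint32_t)(handle & 0xffffffffu);
}
```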

Signed-off-by: Andrew Hall <hall@fb.com>
Signed-off-by: Matt Mullins <mmullins@fb.com>
---
 drivers/block/nbd.c        |  8 ++++
 include/trace/events/nbd.h | 92 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 100 insertions(+)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 7393d04d255c..d3d914620f66 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -513,6 +513,10 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd, int index)
 	if (sent) {
 		if (sent >= sizeof(request)) {
 			skip = sent - sizeof(request);
+
+			// initialize handle for tracing purposes
+			handle = nbd_cmd_handle(cmd);
+
 			goto send_pages;
 		}
 		iov_iter_advance(&from, sent);
@@ -536,6 +540,7 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd, int index)
 		(unsigned long long)blk_rq_pos(req) << 9, blk_rq_bytes(req));
 	result = sock_xmit(nbd, index, 1, &from,
 			(type == NBD_CMD_WRITE) ? MSG_MORE : 0, &sent);
+	trace_nbd_header_sent(req, handle);
 	if (result <= 0) {
 		if (was_interrupted(result)) {
 			/* If we haven't sent anything we can just return BUSY,
@@ -608,6 +613,7 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd, int index)
 		bio = next;
 	}
 out:
+	trace_nbd_payload_sent(req, handle);
 	nsock->pending = NULL;
 	nsock->sent = 0;
 	return 0;
@@ -655,6 +661,7 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 			tag, req);
 		return ERR_PTR(-ENOENT);
 	}
+	trace_nbd_header_received(req, handle);
 	cmd = blk_mq_rq_to_pdu(req);
 
 	mutex_lock(&cmd->lock);
@@ -708,6 +715,7 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 		}
 	}
 out:
+	trace_nbd_payload_received(req, handle);
 	mutex_unlock(&cmd->lock);
 	return ret ? ERR_PTR(ret) : cmd;
 }
diff --git a/include/trace/events/nbd.h b/include/trace/events/nbd.h
index 5928255ed02e..eef476fef95a 100644
--- a/include/trace/events/nbd.h
+++ b/include/trace/events/nbd.h
@@ -7,6 +7,98 @@
 
 #include <linux/tracepoint.h>
 
+TRACE_EVENT(nbd_header_sent,
+
+	TP_PROTO(struct request *req, u64 handle),
+
+	TP_ARGS(req, handle),
+
+	TP_STRUCT__entry(
+		__field(struct request *, req)
+		__field(u64, handle)
+	),
+
+	TP_fast_assign(
+		__entry->req = req;
+		__entry->handle = handle;
+	),
+
+	TP_printk(
+		"nbd header sent: request %p, handle 0x%016llx",
+		__entry->req,
+		__entry->handle
+	)
+);
+
+TRACE_EVENT(nbd_payload_sent,
+
+	TP_PROTO(struct request *req, u64 handle),
+
+	TP_ARGS(req, handle),
+
+	TP_STRUCT__entry(
+		__field(struct request *, req)
+		__field(u64, handle)
+	),
+
+	TP_fast_assign(
+		__entry->req = req;
+		__entry->handle = handle;
+	),
+
+	TP_printk(
+		"nbd payload sent: request %p, handle 0x%016llx",
+		__entry->req,
+		__entry->handle
+	)
+);
+
+TRACE_EVENT(nbd_header_received,
+
+	TP_PROTO(struct request *req, u64 handle),
+
+	TP_ARGS(req, handle),
+
+	TP_STRUCT__entry(
+		__field(struct request *, req)
+		__field(u64, handle)
+	),
+
+	TP_fast_assign(
+		__entry->req = req;
+		__entry->handle = handle;
+	),
+
+	TP_printk(
+		"nbd header received: request %p, handle 0x%016llx",
+		__entry->req,
+		__entry->handle
+	)
+);
+
+TRACE_EVENT(nbd_payload_received,
+
+	TP_PROTO(struct request *req, u64 handle),
+
+	TP_ARGS(req, handle),
+
+	TP_STRUCT__entry(
+		__field(struct request *, req)
+		__field(u64, handle)
+	),
+
+	TP_fast_assign(
+		__entry->req = req;
+		__entry->handle = handle;
+	),
+
+	TP_printk(
+		"nbd payload received: request %p, handle 0x%016llx",
+		__entry->req,
+		__entry->handle
+	)
+);
+
 DECLARE_EVENT_CLASS(nbd_send_request,
 
 	TP_PROTO(struct nbd_request *nbd_request, int index,
-- 
2.17.1



* Re: [PATCH bpf-next 1/3] bpf: add writable context for raw tracepoints
  2019-03-29  0:07 ` [PATCH bpf-next 1/3] bpf: add writable context for " Matt Mullins
@ 2019-04-01 20:40   ` Daniel Borkmann
  2019-04-03 18:39     ` Matt Mullins
  0 siblings, 1 reply; 7+ messages in thread
From: Daniel Borkmann @ 2019-04-01 20:40 UTC (permalink / raw)
  To: Matt Mullins, hall, ast, bpf, netdev
  Cc: linux-kernel, Martin KaFai Lau, Song Liu, Yonghong Song,
	Steven Rostedt, Ingo Molnar

On 03/29/2019 01:07 AM, Matt Mullins wrote:
> This is an opt-in interface that allows a tracepoint to provide a safe
> buffer that can be written from a BPF_PROG_TYPE_RAW_TRACEPOINT program.
> The size of the buffer must be a compile-time constant, and is checked
> before allowing a BPF program to attach to a tracepoint that uses this
> feature.
> 
> The pointer to this buffer will be the first argument of tracepoints
> that opt in; the buffer is readable by both BPF_PROG_TYPE_RAW_TRACEPOINT
> and BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE programs that attach to such a
> tracepoint, but the buffer to which it points may only be written by the
> latter.
> 
> bpf_probe: assert that writable tracepoint size is correct

Maybe also add a kselftest to the bpf test suite to i) demo it and ii) make
sure it's continuously tested by the bots running the suite?

> Signed-off-by: Matt Mullins <mmullins@fb.com>
> ---
>  include/linux/bpf.h             |  2 ++
>  include/linux/bpf_types.h       |  1 +
>  include/linux/tracepoint-defs.h |  1 +
>  include/trace/bpf_probe.h       | 27 +++++++++++++++++++++++++--
>  include/uapi/linux/bpf.h        |  1 +
>  kernel/bpf/syscall.c            |  8 ++++++--
>  kernel/bpf/verifier.c           | 11 +++++++++++
>  kernel/trace/bpf_trace.c        | 21 +++++++++++++++++++++
>  8 files changed, 68 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index a2132e09dc1c..d3c71fd67476 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -263,6 +263,7 @@ enum bpf_reg_type {
>  	PTR_TO_SOCK_COMMON_OR_NULL, /* reg points to sock_common or NULL */
>  	PTR_TO_TCP_SOCK,	 /* reg points to struct tcp_sock */
>  	PTR_TO_TCP_SOCK_OR_NULL, /* reg points to struct tcp_sock or NULL */
> +	PTR_TO_TP_BUFFER,	 /* reg points to a writable raw tp's buffer */
>  };
>  
>  /* The information passed from prog-specific *_is_valid_access
> @@ -352,6 +353,7 @@ struct bpf_prog_aux {
>  	u32 used_map_cnt;
>  	u32 max_ctx_offset;
>  	u32 max_pkt_offset;
> +	u32 max_tp_access;
>  	u32 stack_depth;
>  	u32 id;
>  	u32 func_cnt; /* used by non-func prog as the number of func progs */
> diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
> index 08bf2f1fe553..c766108608cb 100644
> --- a/include/linux/bpf_types.h
> +++ b/include/linux/bpf_types.h
> @@ -25,6 +25,7 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_KPROBE, kprobe)
>  BPF_PROG_TYPE(BPF_PROG_TYPE_TRACEPOINT, tracepoint)
>  BPF_PROG_TYPE(BPF_PROG_TYPE_PERF_EVENT, perf_event)
>  BPF_PROG_TYPE(BPF_PROG_TYPE_RAW_TRACEPOINT, raw_tracepoint)
> +BPF_PROG_TYPE(BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE, raw_tracepoint_writable)
>  #endif
>  #ifdef CONFIG_CGROUP_BPF
>  BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_DEVICE, cg_dev)
> diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h
> index 49ba9cde7e4b..b29950a19205 100644
> --- a/include/linux/tracepoint-defs.h
> +++ b/include/linux/tracepoint-defs.h
> @@ -45,6 +45,7 @@ struct bpf_raw_event_map {
>  	struct tracepoint	*tp;
>  	void			*bpf_func;
>  	u32			num_args;
> +	u32			writable_size;
>  } __aligned(32);
>  
>  #endif
> diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
> index 505dae0bed80..d6e556c0a085 100644
> --- a/include/trace/bpf_probe.h
> +++ b/include/trace/bpf_probe.h
> @@ -69,8 +69,7 @@ __bpf_trace_##call(void *__data, proto)					\
>   * to make sure that if the tracepoint handling changes, the
>   * bpf probe will fail to compile unless it too is updated.
>   */
> -#undef DEFINE_EVENT
> -#define DEFINE_EVENT(template, call, proto, args)			\
> +#define __DEFINE_EVENT(template, call, proto, args, size)		\
>  static inline void bpf_test_probe_##call(void)				\
>  {									\
>  	check_trace_callback_type_##call(__bpf_trace_##template);	\
> @@ -81,12 +80,36 @@ __bpf_trace_tp_map_##call = {						\
>  	.tp		= &__tracepoint_##call,				\
>  	.bpf_func	= (void *)__bpf_trace_##template,		\
>  	.num_args	= COUNT_ARGS(args),				\
> +	.writable_size	= size,						\
>  };
>  
> +#define FIRST(x, ...) x
> +
> +#undef DEFINE_EVENT_WRITABLE
> +#define DEFINE_EVENT_WRITABLE(template, call, proto, args, size)	\
> +static inline void bpf_test_buffer_##call(void)				\
> +{									\
> +	/* BUILD_BUG_ON() is ignored if the code is completely eliminated, but \
> +	 * BUILD_BUG_ON_ZERO() uses a different mechanism that is not	\
> +	 * dead-code-eliminated.					\
> +	 */								\
> +	FIRST(proto);							\
> +	(void)BUILD_BUG_ON_ZERO(size != sizeof(*FIRST(args)));		\
> +}									\
> +__DEFINE_EVENT(template, call, PARAMS(proto), PARAMS(args), size)
> +
> +#undef DEFINE_EVENT
> +#define DEFINE_EVENT(template, call, proto, args)			\
> +	__DEFINE_EVENT(template, call, PARAMS(proto), PARAMS(args), 0)
>  
>  #undef DEFINE_EVENT_PRINT
>  #define DEFINE_EVENT_PRINT(template, name, proto, args, print)	\
>  	DEFINE_EVENT(template, name, PARAMS(proto), PARAMS(args))
>  
>  #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
> +
> +#undef DEFINE_EVENT_WRITABLE
> +#undef __DEFINE_EVENT
> +#undef FIRST
> +
>  #endif /* CONFIG_BPF_EVENTS */
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 3c38ac9a92a7..c5335d53ce82 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -166,6 +166,7 @@ enum bpf_prog_type {
>  	BPF_PROG_TYPE_LIRC_MODE2,
>  	BPF_PROG_TYPE_SK_REUSEPORT,
>  	BPF_PROG_TYPE_FLOW_DISSECTOR,
> +	BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE,
>  };
>  
>  enum bpf_attach_type {
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 62f6bced3a3c..27e2f22879a4 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1720,12 +1720,16 @@ static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
>  	}
>  	raw_tp->btp = btp;
>  
> -	prog = bpf_prog_get_type(attr->raw_tracepoint.prog_fd,
> -				 BPF_PROG_TYPE_RAW_TRACEPOINT);
> +	prog = bpf_prog_get(attr->raw_tracepoint.prog_fd);
>  	if (IS_ERR(prog)) {
>  		err = PTR_ERR(prog);
>  		goto out_free_tp;
>  	}
> +	if (prog->type != BPF_PROG_TYPE_RAW_TRACEPOINT &&
> +	    prog->type != BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE) {

I don't think we'd gain a lot by making this an extra prog type that can do the
same as BPF_PROG_TYPE_RAW_TRACEPOINT modulo the optional writing. Why not
integrate this directly into BPF_PROG_TYPE_RAW_TRACEPOINT then? The actual
opt-in comes from the DEFINE_EVENT_WRITABLE(), not from the prog type.

> +		err = -EINVAL;
> +		goto out_put_prog;
> +	}
>  
>  	err = bpf_probe_register(raw_tp->btp, prog);
>  	if (err)
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index ce166a002d16..b6b4a2ca9f0c 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -2100,6 +2100,17 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
>  		err = check_sock_access(env, insn_idx, regno, off, size, t);
>  		if (!err && value_regno >= 0)
>  			mark_reg_unknown(env, regs, value_regno);
> +	} else if (reg->type == PTR_TO_TP_BUFFER) {
> +		if (off < 0) {
> +			verbose(env,
> +				"R%d invalid tracepoint buffer access: off=%d, size=%d",
> +				value_regno, off, size);
> +			return -EACCES;
> +		}
> +		if (off + size > env->prog->aux->max_tp_access)
> +			env->prog->aux->max_tp_access = off + size;
> +		if (t == BPF_READ && value_regno >= 0)
> +			mark_reg_unknown(env, regs, value_regno);

This should also disallow variable access into the reg, I presume (see check_ctx_reg())?
Or is there a clear rationale for having it enabled?

>  	} else {
>  		verbose(env, "R%d invalid mem access '%s'\n", regno,
>  			reg_type_str[reg->type]);
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index d64c00afceb5..a2dd79dc6871 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -909,6 +909,24 @@ const struct bpf_verifier_ops raw_tracepoint_verifier_ops = {
>  const struct bpf_prog_ops raw_tracepoint_prog_ops = {
>  };
>  
> +static bool raw_tp_writable_prog_is_valid_access(int off, int size,
> +						 enum bpf_access_type type,
> +						 const struct bpf_prog *prog,
> +						 struct bpf_insn_access_aux *info)
> +{
> +	if (off == 0 && size == sizeof(u64))
> +		info->reg_type = PTR_TO_TP_BUFFER;
> +	return raw_tp_prog_is_valid_access(off, size, type, prog, info);
> +}
> +
> +const struct bpf_verifier_ops raw_tracepoint_writable_verifier_ops = {
> +	.get_func_proto  = raw_tp_prog_func_proto,
> +	.is_valid_access = raw_tp_writable_prog_is_valid_access,
> +};
> +
> +const struct bpf_prog_ops raw_tracepoint_writable_prog_ops = {
> +};
> +
>  static bool pe_prog_is_valid_access(int off, int size, enum bpf_access_type type,
>  				    const struct bpf_prog *prog,
>  				    struct bpf_insn_access_aux *info)
> @@ -1198,6 +1216,9 @@ static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *
>  	if (prog->aux->max_ctx_offset > btp->num_args * sizeof(u64))
>  		return -EINVAL;
>  
> +	if (prog->aux->max_tp_access > btp->writable_size)
> +		return -EINVAL;
> +
>  	return tracepoint_probe_register(tp, (void *)btp->bpf_func, prog);
>  }
>  
> 



* Re: [PATCH bpf-next 1/3] bpf: add writable context for raw tracepoints
  2019-04-01 20:40   ` Daniel Borkmann
@ 2019-04-03 18:39     ` Matt Mullins
  2019-04-05  1:17       ` Alexei Starovoitov
  0 siblings, 1 reply; 7+ messages in thread
From: Matt Mullins @ 2019-04-03 18:39 UTC (permalink / raw)
  To: daniel, netdev, Andrew Hall, bpf, ast
  Cc: linux-kernel, Martin Lau, Yonghong Song, rostedt, mingo, Song Liu

On Mon, 2019-04-01 at 22:40 +0200, Daniel Borkmann wrote:
> On 03/29/2019 01:07 AM, Matt Mullins wrote:
> > This is an opt-in interface that allows a tracepoint to provide a safe
> > buffer that can be written from a BPF_PROG_TYPE_RAW_TRACEPOINT program.
> > The size of the buffer must be a compile-time constant, and is checked
> > before allowing a BPF program to attach to a tracepoint that uses this
> > feature.
> > 
> > The pointer to this buffer will be the first argument of tracepoints
> > that opt in; the buffer is readable by both BPF_PROG_TYPE_RAW_TRACEPOINT
> > and BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE programs that attach to such a
> > tracepoint, but the buffer to which it points may only be written by the
> > latter.
> > 
> > bpf_probe: assert that writable tracepoint size is correct
> 
> Maybe also add a kselftest to the bpf test suite to i) demo it and ii) make
> sure it's continuously tested by the bots running the suite?

Will do.

> 
> > Signed-off-by: Matt Mullins <mmullins@fb.com>
> > ---
> >  include/linux/bpf.h             |  2 ++
> >  include/linux/bpf_types.h       |  1 +
> >  include/linux/tracepoint-defs.h |  1 +
> >  include/trace/bpf_probe.h       | 27 +++++++++++++++++++++++++--
> >  include/uapi/linux/bpf.h        |  1 +
> >  kernel/bpf/syscall.c            |  8 ++++++--
> >  kernel/bpf/verifier.c           | 11 +++++++++++
> >  kernel/trace/bpf_trace.c        | 21 +++++++++++++++++++++
> >  8 files changed, 68 insertions(+), 4 deletions(-)
> > 
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index a2132e09dc1c..d3c71fd67476 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -263,6 +263,7 @@ enum bpf_reg_type {
> >  	PTR_TO_SOCK_COMMON_OR_NULL, /* reg points to sock_common or NULL */
> >  	PTR_TO_TCP_SOCK,	 /* reg points to struct tcp_sock */
> >  	PTR_TO_TCP_SOCK_OR_NULL, /* reg points to struct tcp_sock or NULL */
> > +	PTR_TO_TP_BUFFER,	 /* reg points to a writable raw tp's buffer */
> >  };
> >  
> >  /* The information passed from prog-specific *_is_valid_access
> > @@ -352,6 +353,7 @@ struct bpf_prog_aux {
> >  	u32 used_map_cnt;
> >  	u32 max_ctx_offset;
> >  	u32 max_pkt_offset;
> > +	u32 max_tp_access;
> >  	u32 stack_depth;
> >  	u32 id;
> >  	u32 func_cnt; /* used by non-func prog as the number of func progs */
> > diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
> > index 08bf2f1fe553..c766108608cb 100644
> > --- a/include/linux/bpf_types.h
> > +++ b/include/linux/bpf_types.h
> > @@ -25,6 +25,7 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_KPROBE, kprobe)
> >  BPF_PROG_TYPE(BPF_PROG_TYPE_TRACEPOINT, tracepoint)
> >  BPF_PROG_TYPE(BPF_PROG_TYPE_PERF_EVENT, perf_event)
> >  BPF_PROG_TYPE(BPF_PROG_TYPE_RAW_TRACEPOINT, raw_tracepoint)
> > +BPF_PROG_TYPE(BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE, raw_tracepoint_writable)
> >  #endif
> >  #ifdef CONFIG_CGROUP_BPF
> >  BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_DEVICE, cg_dev)
> > diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h
> > index 49ba9cde7e4b..b29950a19205 100644
> > --- a/include/linux/tracepoint-defs.h
> > +++ b/include/linux/tracepoint-defs.h
> > @@ -45,6 +45,7 @@ struct bpf_raw_event_map {
> >  	struct tracepoint	*tp;
> >  	void			*bpf_func;
> >  	u32			num_args;
> > +	u32			writable_size;
> >  } __aligned(32);
> >  
> >  #endif
> > diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
> > index 505dae0bed80..d6e556c0a085 100644
> > --- a/include/trace/bpf_probe.h
> > +++ b/include/trace/bpf_probe.h
> > @@ -69,8 +69,7 @@ __bpf_trace_##call(void *__data, proto)					\
> >   * to make sure that if the tracepoint handling changes, the
> >   * bpf probe will fail to compile unless it too is updated.
> >   */
> > -#undef DEFINE_EVENT
> > -#define DEFINE_EVENT(template, call, proto, args)			\
> > +#define __DEFINE_EVENT(template, call, proto, args, size)		\
> >  static inline void bpf_test_probe_##call(void)				\
> >  {									\
> >  	check_trace_callback_type_##call(__bpf_trace_##template);	\
> > @@ -81,12 +80,36 @@ __bpf_trace_tp_map_##call = {						\
> >  	.tp		= &__tracepoint_##call,				\
> >  	.bpf_func	= (void *)__bpf_trace_##template,		\
> >  	.num_args	= COUNT_ARGS(args),				\
> > +	.writable_size	= size,						\
> >  };
> >  
> > +#define FIRST(x, ...) x
> > +
> > +#undef DEFINE_EVENT_WRITABLE
> > +#define DEFINE_EVENT_WRITABLE(template, call, proto, args, size)	\
> > +static inline void bpf_test_buffer_##call(void)				\
> > +{									\
> > +	/* BUILD_BUG_ON() is ignored if the code is completely eliminated, but \
> > +	 * BUILD_BUG_ON_ZERO() uses a different mechanism that is not	\
> > +	 * dead-code-eliminated.					\
> > +	 */								\
> > +	FIRST(proto);							\
> > +	(void)BUILD_BUG_ON_ZERO(size != sizeof(*FIRST(args)));		\
> > +}									\
> > +__DEFINE_EVENT(template, call, PARAMS(proto), PARAMS(args), size)
> > +
> > +#undef DEFINE_EVENT
> > +#define DEFINE_EVENT(template, call, proto, args)			\
> > +	__DEFINE_EVENT(template, call, PARAMS(proto), PARAMS(args), 0)
> >  
> >  #undef DEFINE_EVENT_PRINT
> >  #define DEFINE_EVENT_PRINT(template, name, proto, args, print)	\
> >  	DEFINE_EVENT(template, name, PARAMS(proto), PARAMS(args))
> >  
> >  #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
> > +
> > +#undef DEFINE_EVENT_WRITABLE
> > +#undef __DEFINE_EVENT
> > +#undef FIRST
> > +
> >  #endif /* CONFIG_BPF_EVENTS */
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 3c38ac9a92a7..c5335d53ce82 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -166,6 +166,7 @@ enum bpf_prog_type {
> >  	BPF_PROG_TYPE_LIRC_MODE2,
> >  	BPF_PROG_TYPE_SK_REUSEPORT,
> >  	BPF_PROG_TYPE_FLOW_DISSECTOR,
> > +	BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE,
> >  };
> >  
> >  enum bpf_attach_type {
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 62f6bced3a3c..27e2f22879a4 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -1720,12 +1720,16 @@ static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
> >  	}
> >  	raw_tp->btp = btp;
> >  
> > -	prog = bpf_prog_get_type(attr->raw_tracepoint.prog_fd,
> > -				 BPF_PROG_TYPE_RAW_TRACEPOINT);
> > +	prog = bpf_prog_get(attr->raw_tracepoint.prog_fd);
> >  	if (IS_ERR(prog)) {
> >  		err = PTR_ERR(prog);
> >  		goto out_free_tp;
> >  	}
> > +	if (prog->type != BPF_PROG_TYPE_RAW_TRACEPOINT &&
> > +	    prog->type != BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE) {
> 
> I don't think we'd gain a lot by making this an extra prog type that can do the
> same as BPF_PROG_TYPE_RAW_TRACEPOINT modulo the optional writing. Why not
> integrate this directly into BPF_PROG_TYPE_RAW_TRACEPOINT then? The actual
> opt-in comes from the DEFINE_EVENT_WRITABLE(), not from the prog type.

I did that to separate the hook into
raw_tp_writable_prog_is_valid_access, which (compared to
raw_tp_prog_is_valid_access):

  1) permits writes, and
  2) encodes the assumption that the context begins with the pointer to
that writable buffer

I'm not sure those are appropriate for all users of
BPF_PROG_TYPE_RAW_TRACEPOINT, but I can't immediately point out any
harm in doing so -- some dereferences of ctx that have historically
returned a SCALAR_VALUE would end up tagged as a PTR_TO_TP_BUFFER, but
they still won't be able to access through that pointer unless they're
attached to the right tracepoint.

I'll try to unify the two and see what I get.

> 
> > +		err = -EINVAL;
> > +		goto out_put_prog;
> > +	}
> >  
> >  	err = bpf_probe_register(raw_tp->btp, prog);
> >  	if (err)
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index ce166a002d16..b6b4a2ca9f0c 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -2100,6 +2100,17 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
> >  		err = check_sock_access(env, insn_idx, regno, off, size, t);
> >  		if (!err && value_regno >= 0)
> >  			mark_reg_unknown(env, regs, value_regno);
> > +	} else if (reg->type == PTR_TO_TP_BUFFER) {
> > +		if (off < 0) {
> > +			verbose(env,
> > +				"R%d invalid tracepoint buffer access: off=%d, size=%d",
> > +				value_regno, off, size);
> > +			return -EACCES;
> > +		}
> > +		if (off + size > env->prog->aux->max_tp_access)
> > +			env->prog->aux->max_tp_access = off + size;
> > +		if (t == BPF_READ && value_regno >= 0)
> > +			mark_reg_unknown(env, regs, value_regno);
> 
> This should also disallow variable access into the reg, I presume (see check_ctx_reg())?
> Or is there a clear rationale for having it enabled?

Nope, that was an oversight from an (incorrect) assumption that
arithmetic would be disallowed on a PTR_TO_TP_BUFFER.  I'll fix this in
v2.

> 
> >  	} else {
> >  		verbose(env, "R%d invalid mem access '%s'\n", regno,
> >  			reg_type_str[reg->type]);
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index d64c00afceb5..a2dd79dc6871 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -909,6 +909,24 @@ const struct bpf_verifier_ops raw_tracepoint_verifier_ops = {
> >  const struct bpf_prog_ops raw_tracepoint_prog_ops = {
> >  };
> >  
> > +static bool raw_tp_writable_prog_is_valid_access(int off, int size,
> > +						 enum bpf_access_type type,
> > +						 const struct bpf_prog *prog,
> > +						 struct bpf_insn_access_aux *info)
> > +{
> > +	if (off == 0 && size == sizeof(u64))
> > +		info->reg_type = PTR_TO_TP_BUFFER;
> > +	return raw_tp_prog_is_valid_access(off, size, type, prog, info);
> > +}
> > +
> > +const struct bpf_verifier_ops raw_tracepoint_writable_verifier_ops = {
> > +	.get_func_proto  = raw_tp_prog_func_proto,
> > +	.is_valid_access = raw_tp_writable_prog_is_valid_access,
> > +};
> > +
> > +const struct bpf_prog_ops raw_tracepoint_writable_prog_ops = {
> > +};
> > +
> >  static bool pe_prog_is_valid_access(int off, int size, enum bpf_access_type type,
> >  				    const struct bpf_prog *prog,
> >  				    struct bpf_insn_access_aux *info)
> > @@ -1198,6 +1216,9 @@ static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *
> >  	if (prog->aux->max_ctx_offset > btp->num_args * sizeof(u64))
> >  		return -EINVAL;
> >  
> > +	if (prog->aux->max_tp_access > btp->writable_size)
> > +		return -EINVAL;
> > +
> >  	return tracepoint_probe_register(tp, (void *)btp->bpf_func, prog);
> >  }
> >  
> > 
> 
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH bpf-next 1/3] bpf: add writable context for raw tracepoints
  2019-04-03 18:39     ` Matt Mullins
@ 2019-04-05  1:17       ` Alexei Starovoitov
  2019-04-05 21:51         ` Matt Mullins
  0 siblings, 1 reply; 7+ messages in thread
From: Alexei Starovoitov @ 2019-04-05  1:17 UTC (permalink / raw)
  To: Matt Mullins
  Cc: daniel, netdev, Andrew Hall, bpf, ast, linux-kernel, Martin Lau,
	Yonghong Song, rostedt, mingo, Song Liu

On Wed, Apr 03, 2019 at 06:39:12PM +0000, Matt Mullins wrote:
> On Mon, 2019-04-01 at 22:40 +0200, Daniel Borkmann wrote:
> > On 03/29/2019 01:07 AM, Matt Mullins wrote:
> > > This is an opt-in interface that allows a tracepoint to provide a safe
> > > buffer that can be written from a BPF_PROG_TYPE_RAW_TRACEPOINT program.
> > > The size of the buffer must be a compile-time constant, and is checked
> > > before allowing a BPF program to attach to a tracepoint that uses this
> > > feature.
> > > 
> > > The pointer to this buffer will be the first argument of tracepoints
> > > that opt in; the buffer is readable by both BPF_PROG_TYPE_RAW_TRACEPOINT
> > > and BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE programs that attach to such a
> > > tracepoint, but the buffer to which it points may only be written by the
> > > latter.
> > > 
> > > bpf_probe: assert that writable tracepoint size is correct
> > 
> > Maybe also add a kselftest into bpf test suite to i) demo it and ii) make
> > sure it's continuously been tested by bots running the suite?
> 
> Will do.
> 
> > 
> > > Signed-off-by: Matt Mullins <mmullins@fb.com>
> > > ---
> > >  include/linux/bpf.h             |  2 ++
> > >  include/linux/bpf_types.h       |  1 +
> > >  include/linux/tracepoint-defs.h |  1 +
> > >  include/trace/bpf_probe.h       | 27 +++++++++++++++++++++++++--
> > >  include/uapi/linux/bpf.h        |  1 +
> > >  kernel/bpf/syscall.c            |  8 ++++++--
> > >  kernel/bpf/verifier.c           | 11 +++++++++++
> > >  kernel/trace/bpf_trace.c        | 21 +++++++++++++++++++++
> > >  8 files changed, 68 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > index a2132e09dc1c..d3c71fd67476 100644
> > > --- a/include/linux/bpf.h
> > > +++ b/include/linux/bpf.h
> > > @@ -263,6 +263,7 @@ enum bpf_reg_type {
> > >  	PTR_TO_SOCK_COMMON_OR_NULL, /* reg points to sock_common or NULL */
> > >  	PTR_TO_TCP_SOCK,	 /* reg points to struct tcp_sock */
> > >  	PTR_TO_TCP_SOCK_OR_NULL, /* reg points to struct tcp_sock or NULL */
> > > +	PTR_TO_TP_BUFFER,	 /* reg points to a writable raw tp's buffer */
> > >  };
> > >  
> > >  /* The information passed from prog-specific *_is_valid_access
> > > @@ -352,6 +353,7 @@ struct bpf_prog_aux {
> > >  	u32 used_map_cnt;
> > >  	u32 max_ctx_offset;
> > >  	u32 max_pkt_offset;
> > > +	u32 max_tp_access;
> > >  	u32 stack_depth;
> > >  	u32 id;
> > >  	u32 func_cnt; /* used by non-func prog as the number of func progs */
> > > diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
> > > index 08bf2f1fe553..c766108608cb 100644
> > > --- a/include/linux/bpf_types.h
> > > +++ b/include/linux/bpf_types.h
> > > @@ -25,6 +25,7 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_KPROBE, kprobe)
> > >  BPF_PROG_TYPE(BPF_PROG_TYPE_TRACEPOINT, tracepoint)
> > >  BPF_PROG_TYPE(BPF_PROG_TYPE_PERF_EVENT, perf_event)
> > >  BPF_PROG_TYPE(BPF_PROG_TYPE_RAW_TRACEPOINT, raw_tracepoint)
> > > +BPF_PROG_TYPE(BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE, raw_tracepoint_writable)
> > >  #endif
> > >  #ifdef CONFIG_CGROUP_BPF
> > >  BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_DEVICE, cg_dev)
> > > diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h
> > > index 49ba9cde7e4b..b29950a19205 100644
> > > --- a/include/linux/tracepoint-defs.h
> > > +++ b/include/linux/tracepoint-defs.h
> > > @@ -45,6 +45,7 @@ struct bpf_raw_event_map {
> > >  	struct tracepoint	*tp;
> > >  	void			*bpf_func;
> > >  	u32			num_args;
> > > +	u32			writable_size;
> > >  } __aligned(32);
> > >  
> > >  #endif
> > > diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
> > > index 505dae0bed80..d6e556c0a085 100644
> > > --- a/include/trace/bpf_probe.h
> > > +++ b/include/trace/bpf_probe.h
> > > @@ -69,8 +69,7 @@ __bpf_trace_##call(void *__data, proto)					\
> > >   * to make sure that if the tracepoint handling changes, the
> > >   * bpf probe will fail to compile unless it too is updated.
> > >   */
> > > -#undef DEFINE_EVENT
> > > -#define DEFINE_EVENT(template, call, proto, args)			\
> > > +#define __DEFINE_EVENT(template, call, proto, args, size)		\
> > >  static inline void bpf_test_probe_##call(void)				\
> > >  {									\
> > >  	check_trace_callback_type_##call(__bpf_trace_##template);	\
> > > @@ -81,12 +80,36 @@ __bpf_trace_tp_map_##call = {						\
> > >  	.tp		= &__tracepoint_##call,				\
> > >  	.bpf_func	= (void *)__bpf_trace_##template,		\
> > >  	.num_args	= COUNT_ARGS(args),				\
> > > +	.writable_size	= size,						\
> > >  };
> > >  
> > > +#define FIRST(x, ...) x
> > > +
> > > +#undef DEFINE_EVENT_WRITABLE
> > > +#define DEFINE_EVENT_WRITABLE(template, call, proto, args, size)	\
> > > +static inline void bpf_test_buffer_##call(void)				\
> > > +{									\
> > > +	/* BUILD_BUG_ON() is ignored if the code is completely eliminated, but \
> > > +	 * BUILD_BUG_ON_ZERO() uses a different mechanism that is not	\
> > > +	 * dead-code-eliminated.					\
> > > +	 */								\
> > > +	FIRST(proto);							\
> > > +	(void)BUILD_BUG_ON_ZERO(size != sizeof(*FIRST(args)));		\
> > > +}									\
> > > +__DEFINE_EVENT(template, call, PARAMS(proto), PARAMS(args), size)
> > > +
> > > +#undef DEFINE_EVENT
> > > +#define DEFINE_EVENT(template, call, proto, args)			\
> > > +	__DEFINE_EVENT(template, call, PARAMS(proto), PARAMS(args), 0)
> > >  
> > >  #undef DEFINE_EVENT_PRINT
> > >  #define DEFINE_EVENT_PRINT(template, name, proto, args, print)	\
> > >  	DEFINE_EVENT(template, name, PARAMS(proto), PARAMS(args))
> > >  
> > >  #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
> > > +
> > > +#undef DEFINE_EVENT_WRITABLE
> > > +#undef __DEFINE_EVENT
> > > +#undef FIRST
> > > +
> > >  #endif /* CONFIG_BPF_EVENTS */
> > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > index 3c38ac9a92a7..c5335d53ce82 100644
> > > --- a/include/uapi/linux/bpf.h
> > > +++ b/include/uapi/linux/bpf.h
> > > @@ -166,6 +166,7 @@ enum bpf_prog_type {
> > >  	BPF_PROG_TYPE_LIRC_MODE2,
> > >  	BPF_PROG_TYPE_SK_REUSEPORT,
> > >  	BPF_PROG_TYPE_FLOW_DISSECTOR,
> > > +	BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE,
> > >  };
> > >  
> > >  enum bpf_attach_type {
> > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > index 62f6bced3a3c..27e2f22879a4 100644
> > > --- a/kernel/bpf/syscall.c
> > > +++ b/kernel/bpf/syscall.c
> > > @@ -1720,12 +1720,16 @@ static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
> > >  	}
> > >  	raw_tp->btp = btp;
> > >  
> > > -	prog = bpf_prog_get_type(attr->raw_tracepoint.prog_fd,
> > > -				 BPF_PROG_TYPE_RAW_TRACEPOINT);
> > > +	prog = bpf_prog_get(attr->raw_tracepoint.prog_fd);
> > >  	if (IS_ERR(prog)) {
> > >  		err = PTR_ERR(prog);
> > >  		goto out_free_tp;
> > >  	}
> > > +	if (prog->type != BPF_PROG_TYPE_RAW_TRACEPOINT &&
> > > +	    prog->type != BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE) {
> > 
> > I don't think we'd gain a lot by making this an extra prog type which can do the
> > same as BPF_PROG_TYPE_RAW_TRACEPOINT modulo optional writing. Why not integrating
> > this directly into BPF_PROG_TYPE_RAW_TRACEPOINT then? The actual opt-in comes from
> > the DEFINE_EVENT_WRITABLE(), not from the prog type.
> 
> I did that to separate the hook into
> raw_tp_writable_prog_is_valid_access, which (compared to
> raw_tp_prog_is_valid_access):
> 
>   1) permits writes, and
>   2) encodes the assumption that the context begins with the pointer to
> that writable buffer
> 
> I'm not sure those are appropriate for all users of
> BPF_PROG_TYPE_RAW_TRACEPOINT, but I can't immediately point out any
> harm in doing so -- some dereferences of ctx that have historically
> returned a SCALAR_VALUE would end up tagged as a PTR_TO_TP_BUFFER, but
> they still won't be able to access through that pointer unless they're
> attached to the right tracepoint.
> 
> I'll try to unify the two and see what I get.

I think combining the raw_tp prog type with raw_tp_writable is possible,
but it would be too cumbersome to use from the user's point of view:
such a raw_tp program would be accepted at prog load time,
and only rejected at attach time, in __bpf_probe_register.

raw_tp_writable_prog_is_valid_access() doesn't know the future attach point,
so we cannot use bpf_attr.expected_attach_type here.
That's why it simply does:
+       if (off == 0 && size == sizeof(u64))
+               info->reg_type = PTR_TO_TP_BUFFER;

essentially hard coding first argument of writeable tp to be the buffer.

TP invocation is also new.
Like in patch 2:
trace_nbd_send_request(&request, nbd->index, blk_mq_rq_from_pdu(cmd));

Patches 1,2,3 actually look fine to me,
but I agree that selftests are necessary.

Matt, could you add new writeable tracepoint somewhere in bpf_test_run()
and corresponding selftest?

Thanks



* Re: [PATCH bpf-next 1/3] bpf: add writable context for raw tracepoints
  2019-04-05  1:17       ` Alexei Starovoitov
@ 2019-04-05 21:51         ` Matt Mullins
  0 siblings, 0 replies; 7+ messages in thread
From: Matt Mullins @ 2019-04-05 21:51 UTC (permalink / raw)
  To: alexei.starovoitov
  Cc: Song Liu, linux-kernel, daniel, bpf, ast, rostedt, Andrew Hall,
	mingo, netdev, Martin Lau, Yonghong Song

On Thu, 2019-04-04 at 18:17 -0700, Alexei Starovoitov wrote:
> On Wed, Apr 03, 2019 at 06:39:12PM +0000, Matt Mullins wrote:
> > On Mon, 2019-04-01 at 22:40 +0200, Daniel Borkmann wrote:
> > > On 03/29/2019 01:07 AM, Matt Mullins wrote:
> > > > This is an opt-in interface that allows a tracepoint to provide a safe
> > > > buffer that can be written from a BPF_PROG_TYPE_RAW_TRACEPOINT program.
> > > > The size of the buffer must be a compile-time constant, and is checked
> > > > before allowing a BPF program to attach to a tracepoint that uses this
> > > > feature.
> > > > 
> > > > The pointer to this buffer will be the first argument of tracepoints
> > > > that opt in; the buffer is readable by both BPF_PROG_TYPE_RAW_TRACEPOINT
> > > > and BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE programs that attach to such a
> > > > tracepoint, but the buffer to which it points may only be written by the
> > > > latter.
> > > > 
> > > > bpf_probe: assert that writable tracepoint size is correct
> > > 
> > > Maybe also add a kselftest into bpf test suite to i) demo it and ii) make
> > > sure it's continuously been tested by bots running the suite?
> > 
> > Will do.
> > 
> > > 
> > > > Signed-off-by: Matt Mullins <mmullins@fb.com>
> > > > ---
> > > >  include/linux/bpf.h             |  2 ++
> > > >  include/linux/bpf_types.h       |  1 +
> > > >  include/linux/tracepoint-defs.h |  1 +
> > > >  include/trace/bpf_probe.h       | 27 +++++++++++++++++++++++++--
> > > >  include/uapi/linux/bpf.h        |  1 +
> > > >  kernel/bpf/syscall.c            |  8 ++++++--
> > > >  kernel/bpf/verifier.c           | 11 +++++++++++
> > > >  kernel/trace/bpf_trace.c        | 21 +++++++++++++++++++++
> > > >  8 files changed, 68 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > index a2132e09dc1c..d3c71fd67476 100644
> > > > --- a/include/linux/bpf.h
> > > > +++ b/include/linux/bpf.h
> > > > @@ -263,6 +263,7 @@ enum bpf_reg_type {
> > > >  	PTR_TO_SOCK_COMMON_OR_NULL, /* reg points to sock_common or NULL */
> > > >  	PTR_TO_TCP_SOCK,	 /* reg points to struct tcp_sock */
> > > >  	PTR_TO_TCP_SOCK_OR_NULL, /* reg points to struct tcp_sock or NULL */
> > > > +	PTR_TO_TP_BUFFER,	 /* reg points to a writable raw tp's buffer */
> > > >  };
> > > >  
> > > >  /* The information passed from prog-specific *_is_valid_access
> > > > @@ -352,6 +353,7 @@ struct bpf_prog_aux {
> > > >  	u32 used_map_cnt;
> > > >  	u32 max_ctx_offset;
> > > >  	u32 max_pkt_offset;
> > > > +	u32 max_tp_access;
> > > >  	u32 stack_depth;
> > > >  	u32 id;
> > > >  	u32 func_cnt; /* used by non-func prog as the number of func progs */
> > > > diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
> > > > index 08bf2f1fe553..c766108608cb 100644
> > > > --- a/include/linux/bpf_types.h
> > > > +++ b/include/linux/bpf_types.h
> > > > @@ -25,6 +25,7 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_KPROBE, kprobe)
> > > >  BPF_PROG_TYPE(BPF_PROG_TYPE_TRACEPOINT, tracepoint)
> > > >  BPF_PROG_TYPE(BPF_PROG_TYPE_PERF_EVENT, perf_event)
> > > >  BPF_PROG_TYPE(BPF_PROG_TYPE_RAW_TRACEPOINT, raw_tracepoint)
> > > > +BPF_PROG_TYPE(BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE, raw_tracepoint_writable)
> > > >  #endif
> > > >  #ifdef CONFIG_CGROUP_BPF
> > > >  BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_DEVICE, cg_dev)
> > > > diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h
> > > > index 49ba9cde7e4b..b29950a19205 100644
> > > > --- a/include/linux/tracepoint-defs.h
> > > > +++ b/include/linux/tracepoint-defs.h
> > > > @@ -45,6 +45,7 @@ struct bpf_raw_event_map {
> > > >  	struct tracepoint	*tp;
> > > >  	void			*bpf_func;
> > > >  	u32			num_args;
> > > > +	u32			writable_size;
> > > >  } __aligned(32);
> > > >  
> > > >  #endif
> > > > diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
> > > > index 505dae0bed80..d6e556c0a085 100644
> > > > --- a/include/trace/bpf_probe.h
> > > > +++ b/include/trace/bpf_probe.h
> > > > @@ -69,8 +69,7 @@ __bpf_trace_##call(void *__data, proto)					\
> > > >   * to make sure that if the tracepoint handling changes, the
> > > >   * bpf probe will fail to compile unless it too is updated.
> > > >   */
> > > > -#undef DEFINE_EVENT
> > > > -#define DEFINE_EVENT(template, call, proto, args)			\
> > > > +#define __DEFINE_EVENT(template, call, proto, args, size)		\
> > > >  static inline void bpf_test_probe_##call(void)				\
> > > >  {									\
> > > >  	check_trace_callback_type_##call(__bpf_trace_##template);	\
> > > > @@ -81,12 +80,36 @@ __bpf_trace_tp_map_##call = {						\
> > > >  	.tp		= &__tracepoint_##call,				\
> > > >  	.bpf_func	= (void *)__bpf_trace_##template,		\
> > > >  	.num_args	= COUNT_ARGS(args),				\
> > > > +	.writable_size	= size,						\
> > > >  };
> > > >  
> > > > +#define FIRST(x, ...) x
> > > > +
> > > > +#undef DEFINE_EVENT_WRITABLE
> > > > +#define DEFINE_EVENT_WRITABLE(template, call, proto, args, size)	\
> > > > +static inline void bpf_test_buffer_##call(void)				\
> > > > +{									\
> > > > +	/* BUILD_BUG_ON() is ignored if the code is completely eliminated, but \
> > > > +	 * BUILD_BUG_ON_ZERO() uses a different mechanism that is not	\
> > > > +	 * dead-code-eliminated.					\
> > > > +	 */								\
> > > > +	FIRST(proto);							\
> > > > +	(void)BUILD_BUG_ON_ZERO(size != sizeof(*FIRST(args)));		\
> > > > +}									\
> > > > +__DEFINE_EVENT(template, call, PARAMS(proto), PARAMS(args), size)
> > > > +
> > > > +#undef DEFINE_EVENT
> > > > +#define DEFINE_EVENT(template, call, proto, args)			\
> > > > +	__DEFINE_EVENT(template, call, PARAMS(proto), PARAMS(args), 0)
> > > >  
> > > >  #undef DEFINE_EVENT_PRINT
> > > >  #define DEFINE_EVENT_PRINT(template, name, proto, args, print)	\
> > > >  	DEFINE_EVENT(template, name, PARAMS(proto), PARAMS(args))
> > > >  
> > > >  #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
> > > > +
> > > > +#undef DEFINE_EVENT_WRITABLE
> > > > +#undef __DEFINE_EVENT
> > > > +#undef FIRST
> > > > +
> > > >  #endif /* CONFIG_BPF_EVENTS */
> > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > index 3c38ac9a92a7..c5335d53ce82 100644
> > > > --- a/include/uapi/linux/bpf.h
> > > > +++ b/include/uapi/linux/bpf.h
> > > > @@ -166,6 +166,7 @@ enum bpf_prog_type {
> > > >  	BPF_PROG_TYPE_LIRC_MODE2,
> > > >  	BPF_PROG_TYPE_SK_REUSEPORT,
> > > >  	BPF_PROG_TYPE_FLOW_DISSECTOR,
> > > > +	BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE,
> > > >  };
> > > >  
> > > >  enum bpf_attach_type {
> > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > > index 62f6bced3a3c..27e2f22879a4 100644
> > > > --- a/kernel/bpf/syscall.c
> > > > +++ b/kernel/bpf/syscall.c
> > > > @@ -1720,12 +1720,16 @@ static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
> > > >  	}
> > > >  	raw_tp->btp = btp;
> > > >  
> > > > -	prog = bpf_prog_get_type(attr->raw_tracepoint.prog_fd,
> > > > -				 BPF_PROG_TYPE_RAW_TRACEPOINT);
> > > > +	prog = bpf_prog_get(attr->raw_tracepoint.prog_fd);
> > > >  	if (IS_ERR(prog)) {
> > > >  		err = PTR_ERR(prog);
> > > >  		goto out_free_tp;
> > > >  	}
> > > > +	if (prog->type != BPF_PROG_TYPE_RAW_TRACEPOINT &&
> > > > +	    prog->type != BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE) {
> > > 
> > > I don't think we'd gain a lot by making this an extra prog type which can do the
> > > same as BPF_PROG_TYPE_RAW_TRACEPOINT modulo optional writing. Why not integrating
> > > this directly into BPF_PROG_TYPE_RAW_TRACEPOINT then? The actual opt-in comes from
> > > the DEFINE_EVENT_WRITABLE(), not from the prog type.
> > 
> > I did that to separate the hook into
> > raw_tp_writable_prog_is_valid_access, which (compared to
> > raw_tp_prog_is_valid_access):
> > 
> >   1) permits writes, and
> >   2) encodes the assumption that the context begins with the pointer to
> > that writable buffer
> > 
> > I'm not sure those are appropriate for all users of
> > BPF_PROG_TYPE_RAW_TRACEPOINT, but I can't immediately point out any
> > harm in doing so -- some dereferences of ctx that have historically
> > returned a SCALAR_VALUE would end up tagged as a PTR_TO_TP_BUFFER, but
> > they still won't be able to access through that pointer unless they're
> > attached to the right tracepoint.
> > 
> > I'll try to unify the two and see what I get.
> 
> I think combining the raw_tp prog type with raw_tp_writable is possible,
> but it would be too cumbersome to use from the user's point of view:
> such a raw_tp program would be accepted at prog load time,
> and only rejected at attach time, in __bpf_probe_register.
> 
> raw_tp_writable_prog_is_valid_access() doesn't know the future attach point,
> so we cannot use bpf_attr.expected_attach_type here.

We could use any load-time flag of sorts, but I overlooked
expected_attach_type since that should match the enum bpf_attach_type
values, and we don't use BPF_PROG_ATTACH for tracepoints.

> That's why it simply does:
> +       if (off == 0 && size == sizeof(u64))
> +               info->reg_type = PTR_TO_TP_BUFFER;
> 
> essentially hard coding first argument of writeable tp to be the buffer.

It appears to be possible to do this to all
BPF_PROG_TYPE_RAW_TRACEPOINTs, but I think it's clearer to keep a
separate bpf_prog_type for programs that assume ctx[0] is the pointer
to the buffer.

> 
> TP invocation is also new.
> Like in patch 2:
> trace_nbd_send_request(&request, nbd->index, blk_mq_rq_from_pdu(cmd));
> 
> Patches 1,2,3 actually look fine to me,
> but I agree that selftests are necessary.
> 
> Matt, could you add new writeable tracepoint somewhere in bpf_test_run()
> and corresponding selftest?

Yep, I've just got that working, will be included in v2 (along with
some other minor bugfixes).

> 
> Thanks
> 


end of thread, other threads:[~2019-04-05 21:52 UTC | newest]

Thread overview: 7+ messages
2019-03-29  0:07 [PATCH bpf-next 0/3] writable contexts for bpf raw tracepoints Matt Mullins
2019-03-29  0:07 ` [PATCH bpf-next 1/3] bpf: add writable context for " Matt Mullins
2019-04-01 20:40   ` Daniel Borkmann
2019-04-03 18:39     ` Matt Mullins
2019-04-05  1:17       ` Alexei Starovoitov
2019-04-05 21:51         ` Matt Mullins
2019-03-29  0:07 ` [PATCH bpf-next 3/3] nbd: add tracepoints for send/receive timing Matt Mullins
