* [PATCH net-next 0/3] BPF event output helper improvements
From: Daniel Borkmann @ 2016-07-12 22:36 UTC
  To: davem
  Cc: alexei.starovoitov, peterz, tgraf, netdev, linux-kernel, Daniel Borkmann

This set adds improvements to the BPF event output helper to
support non-linear data sampling, here specifically for skb
context. For details, please see the individual patches. The set
is based against the net-next tree.

Thanks a lot!

Daniel Borkmann (3):
  perf, events: add non-linear data support for raw records
  bpf, perf: split bpf_perf_event_output
  bpf: avoid stack copy and use skb ctx for event output

 arch/s390/kernel/perf_cpum_sf.c |  2 ++
 arch/x86/events/amd/ibs.c       |  2 ++
 include/linux/bpf.h             |  5 +++-
 include/linux/perf_event.h      |  8 ++++++
 include/uapi/linux/bpf.h        |  2 ++
 kernel/bpf/core.c               |  8 ++++--
 kernel/events/core.c            | 13 +++++++--
 kernel/events/internal.h        | 18 +++++++++---
 kernel/trace/bpf_trace.c        | 64 ++++++++++++++++++++++-------------------
 net/core/filter.c               | 43 ++++++++++++++++++++++++++-
 10 files changed, 125 insertions(+), 40 deletions(-)

-- 
1.9.3

* [PATCH net-next 1/3] perf, events: add non-linear data support for raw records
From: Daniel Borkmann @ 2016-07-12 22:36 UTC
  To: davem
  Cc: alexei.starovoitov, peterz, tgraf, netdev, linux-kernel, Daniel Borkmann

This patch adds support for non-linear data on raw records. It means
that for such data, the newly introduced __output_custom() helper will
be used instead of __output_copy(). __output_custom() will invoke
whatever custom callback is passed in via struct perf_raw_record_frag
to extract the data into the ring buffer slot.

To keep changes in perf_prepare_sample() and in perf_output_sample()
minimal, a size/size_head split was added to perf_raw_record, which
call sites fill out, so that two extra tests in the fast path can be
avoided.

The few users of raw records are adapted to initialize their size_head
and frag data; no change in behavior for them. A later patch will
extend the BPF side with a first user and callback for this facility;
future users could be things like XDP BPF programs (which work on a
different context, though, and would thus need a different callback),
etc.
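
For illustration, a call site providing a non-linear fragment would
fill out the new fields roughly as follows (a sketch based on the hunks
below; my_copy_cb, meta, ctx and the sizes are made-up placeholders):

	/* size_head bytes are linear and live in raw.data; the remaining
	 * size - size_head bytes come from frag->data via the callback.
	 * Linear-only users simply set size_head = size and frag = NULL,
	 * as the s390/ibs call sites below do.
	 */
	struct perf_raw_record_frag frag = {
		.data		= ctx,
		.copy_cb	= my_copy_cb,
	};
	struct perf_raw_record raw = {
		.size		= size_head + size_frag,
		.size_head	= size_head,
		.data		= meta,
		.frag		= &frag,
	};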

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 arch/s390/kernel/perf_cpum_sf.c |  2 ++
 arch/x86/events/amd/ibs.c       |  2 ++
 include/linux/perf_event.h      |  8 ++++++++
 kernel/events/core.c            | 13 ++++++++++---
 kernel/events/internal.h        | 18 ++++++++++++++----
 kernel/trace/bpf_trace.c        |  1 +
 6 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/arch/s390/kernel/perf_cpum_sf.c b/arch/s390/kernel/perf_cpum_sf.c
index a8e8321..99c5952 100644
--- a/arch/s390/kernel/perf_cpum_sf.c
+++ b/arch/s390/kernel/perf_cpum_sf.c
@@ -984,7 +984,9 @@ static int perf_push_sample(struct perf_event *event, struct sf_raw_sample *sfr)
 	/* Setup perf sample */
 	perf_sample_data_init(&data, 0, event->hw.last_period);
 	raw.size = sfr->size;
+	raw.size_head = raw.size;
 	raw.data = sfr;
+	raw.frag = NULL;
 	data.raw = &raw;
 
 	/* Setup pt_regs to look like an CPU-measurement external interrupt
diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index feb90f6..9b27dff 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -656,7 +656,9 @@ fail:
 
 	if (event->attr.sample_type & PERF_SAMPLE_RAW) {
 		raw.size = sizeof(u32) + ibs_data.size;
+		raw.size_head = raw.size;
 		raw.data = ibs_data.data;
+		raw.frag = NULL;
 		data.raw = &raw;
 	}
 
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 1a827ce..bf08bdf 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -69,9 +69,17 @@ struct perf_callchain_entry_ctx {
 	bool			    contexts_maxed;
 };
 
+struct perf_raw_record_frag {
+	void				*data;
+	unsigned long (*copy_cb)	(void *dst, const void *src,
+					 unsigned long n);
+};
+
 struct perf_raw_record {
 	u32				size;
+	u32				size_head;
 	void				*data;
+	struct perf_raw_record_frag	*frag;
 };
 
 /*
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9c51ec3..3e1dd7a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5553,14 +5553,20 @@ void perf_output_sample(struct perf_output_handle *handle,
 	}
 
 	if (sample_type & PERF_SAMPLE_RAW) {
-		if (data->raw) {
-			u32 raw_size = data->raw->size;
+		struct perf_raw_record *raw = data->raw;
+
+		if (raw) {
+			u32 raw_size = raw->size;
 			u32 real_size = round_up(raw_size + sizeof(u32),
 						 sizeof(u64)) - sizeof(u32);
 			u64 zero = 0;
 
 			perf_output_put(handle, real_size);
-			__output_copy(handle, data->raw->data, raw_size);
+			__output_copy(handle, raw->data, raw->size_head);
+			if (raw->frag)
+				__output_custom(handle, raw->frag->copy_cb,
+						raw->frag->data,
+						raw->size - raw->size_head);
 			if (real_size - raw_size)
 				__output_copy(handle, &zero, real_size - raw_size);
 		} else {
@@ -7388,6 +7394,7 @@ void perf_tp_event(u16 event_type, u64 count, void *record, int entry_size,
 
 	struct perf_raw_record raw = {
 		.size = entry_size,
+		.size_head = entry_size,
 		.data = record,
 	};
 
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 05f9f6d..1b08d94 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -123,10 +123,7 @@ static inline unsigned long perf_aux_size(struct ring_buffer *rb)
 	return rb->aux_nr_pages << PAGE_SHIFT;
 }
 
-#define DEFINE_OUTPUT_COPY(func_name, memcpy_func)			\
-static inline unsigned long						\
-func_name(struct perf_output_handle *handle,				\
-	  const void *buf, unsigned long len)				\
+#define __DEFINE_OUTPUT_COPY_BODY(memcpy_func)				\
 {									\
 	unsigned long size, written;					\
 									\
@@ -152,6 +149,19 @@ func_name(struct perf_output_handle *handle,				\
 	return len;							\
 }
 
+#define DEFINE_OUTPUT_COPY(func_name, memcpy_func)			\
+static inline unsigned long						\
+func_name(struct perf_output_handle *handle,				\
+	  const void *buf, unsigned long len)				\
+__DEFINE_OUTPUT_COPY_BODY(memcpy_func)
+
+static inline unsigned long
+__output_custom(struct perf_output_handle *handle,
+		unsigned long (*copy_cb)(void *dst, const void *src,
+					 unsigned long n),
+		const void *buf, unsigned long len)
+__DEFINE_OUTPUT_COPY_BODY(copy_cb)
+
 static inline unsigned long
 memcpy_common(void *dst, const void *src, unsigned long n)
 {
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 094c716..8540bd5 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -246,6 +246,7 @@ static u64 bpf_perf_event_output(u64 r1, u64 r2, u64 flags, u64 r4, u64 size)
 	struct perf_event *event;
 	struct perf_raw_record raw = {
 		.size = size,
+		.size_head = size,
 		.data = data,
 	};
 
-- 
1.9.3

* [PATCH net-next 2/3] bpf, perf: split bpf_perf_event_output
From: Daniel Borkmann @ 2016-07-12 22:36 UTC
  To: davem
  Cc: alexei.starovoitov, peterz, tgraf, netdev, linux-kernel, Daniel Borkmann

As a preparation, split the bpf_perf_event_output() helper into two
parts. The new bpf_perf_event_output() will prepare the raw record
itself and test for unknown flags from the BPF trace context, while
__bpf_perf_event_output() does the core work and will be reused
later as well.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/trace/bpf_trace.c | 32 ++++++++++++++++++++------------
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 8540bd5..4d3d5b8 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -233,25 +233,17 @@ static const struct bpf_func_proto bpf_perf_event_read_proto = {
 	.arg2_type	= ARG_ANYTHING,
 };
 
-static u64 bpf_perf_event_output(u64 r1, u64 r2, u64 flags, u64 r4, u64 size)
+static __always_inline u64
+__bpf_perf_event_output(struct pt_regs *regs, struct bpf_map *map,
+			u64 flags, struct perf_raw_record *raw)
 {
-	struct pt_regs *regs = (struct pt_regs *) (long) r1;
-	struct bpf_map *map = (struct bpf_map *) (long) r2;
 	struct bpf_array *array = container_of(map, struct bpf_array, map);
 	unsigned int cpu = smp_processor_id();
 	u64 index = flags & BPF_F_INDEX_MASK;
-	void *data = (void *) (long) r4;
 	struct perf_sample_data sample_data;
 	struct bpf_event_entry *ee;
 	struct perf_event *event;
-	struct perf_raw_record raw = {
-		.size = size,
-		.size_head = size,
-		.data = data,
-	};
 
-	if (unlikely(flags & ~(BPF_F_INDEX_MASK)))
-		return -EINVAL;
 	if (index == BPF_F_CURRENT_CPU)
 		index = cpu;
 	if (unlikely(index >= array->map.max_entries))
@@ -270,11 +262,27 @@ static u64 bpf_perf_event_output(u64 r1, u64 r2, u64 flags, u64 r4, u64 size)
 		return -EOPNOTSUPP;
 
 	perf_sample_data_init(&sample_data, 0, 0);
-	sample_data.raw = &raw;
+	sample_data.raw = raw;
 	perf_event_output(event, &sample_data, regs);
 	return 0;
 }
 
+static u64 bpf_perf_event_output(u64 r1, u64 r2, u64 flags, u64 r4, u64 size)
+{
+	struct perf_raw_record raw = {
+		.size		= size,
+		.size_head	= size,
+		.data		= (void *)(long) r4,
+	};
+
+	if (unlikely(flags & ~(BPF_F_INDEX_MASK)))
+		return -EINVAL;
+
+	return __bpf_perf_event_output((struct pt_regs *)(long) r1,
+				       (struct bpf_map *)(long) r2,
+				       flags, &raw);
+}
+
 static const struct bpf_func_proto bpf_perf_event_output_proto = {
 	.func		= bpf_perf_event_output,
 	.gpl_only	= true,
-- 
1.9.3

* [PATCH net-next 3/3] bpf: avoid stack copy and use skb ctx for event output
From: Daniel Borkmann @ 2016-07-12 22:36 UTC
  To: davem
  Cc: alexei.starovoitov, peterz, tgraf, netdev, linux-kernel, Daniel Borkmann

This work improves the bpf_skb_event_output() helper in two ways: i) it
avoids users having to unnecessarily extract sampled skb data to the
stack first via bpf_skb_load_bytes() and then copy it once again into
the ring buffer slot, and ii) it lifts the restriction that only
portions of the skb can be sampled with bpf_skb_load_bytes() due to
the stack limit.

Instead, we can make use of the skb context that is passed to the
helper anyway and use some of the reserved flag bits as a length
argument. The helper will use the new __output_custom() facility
from perf with bpf_skb_copy_cb() as the copy callback. It will pass
the setup data to bpf_event_output(), which generates and pushes the
raw record. The linear data used in the non-frag part of the record
serves as custom / programmatically defined meta data that is passed
along with the appended sample.
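
As a usage sketch from the BPF program side (section name, event_map
and the meta struct are made up for illustration), a tc program could
then sample skb payload directly without staging it on the stack:

	SEC("classifier")
	int sample(struct __sk_buff *skb)
	{
		/* event_map would be a BPF_MAP_TYPE_PERF_EVENT_ARRAY
		 * defined elsewhere; meta is program-defined meta data
		 * for the non-frag part of the record.
		 */
		struct { __u32 mark; __u32 ifindex; } meta = {
			.mark		= skb->mark,
			.ifindex	= skb->ifindex,
		};
		/* Upper flag bits carry the number of skb bytes to
		 * append; the length must fit into BPF_F_CTXLEN_MASK.
		 */
		__u64 flags = ((__u64)skb->len << 32) | BPF_F_CURRENT_CPU;

		bpf_perf_event_output(skb, &event_map, flags,
				      &meta, sizeof(meta));
		return TC_ACT_OK;
	}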

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/bpf.h      |  5 ++++-
 include/uapi/linux/bpf.h |  2 ++
 kernel/bpf/core.c        |  8 ++++++--
 kernel/trace/bpf_trace.c | 33 +++++++++++++++------------------
 net/core/filter.c        | 43 ++++++++++++++++++++++++++++++++++++++++++-
 5 files changed, 69 insertions(+), 22 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index b3336b4..afd64c8 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -209,8 +209,11 @@ u64 bpf_get_stackid(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
 bool bpf_prog_array_compatible(struct bpf_array *array, const struct bpf_prog *fp);
 
 const struct bpf_func_proto *bpf_get_trace_printk_proto(void);
-const struct bpf_func_proto *bpf_get_event_output_proto(void);
 
+u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 size_meta,
+		     void *ctx, u64 size_ctx,
+		     unsigned long (*ctx_copy_cb)(void *dst, const void *src,
+						  unsigned long n));
 #ifdef CONFIG_BPF_SYSCALL
 DECLARE_PER_CPU(int, bpf_prog_active);
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 262a7e8..c4d9224 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -401,6 +401,8 @@ enum bpf_func_id {
 /* BPF_FUNC_perf_event_output and BPF_FUNC_perf_event_read flags. */
 #define BPF_F_INDEX_MASK		0xffffffffULL
 #define BPF_F_CURRENT_CPU		BPF_F_INDEX_MASK
+/* BPF_FUNC_perf_event_output for sk_buff input context. */
+#define BPF_F_CTXLEN_MASK		(0xfffffULL << 32)
 
 /* user accessible mirror of in-kernel sk_buff.
  * new fields can only be added to the end of this structure
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index d638062..47a7054 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1054,9 +1054,13 @@ const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
 	return NULL;
 }
 
-const struct bpf_func_proto * __weak bpf_get_event_output_proto(void)
+u64 __weak
+bpf_event_output(struct bpf_map *map, u64 flags, void *meta,
+		 u64 size_meta, void *ctx, u64 size_ctx,
+		 unsigned long (*ctx_copy_cb)(void *dst, const void *src,
+					      unsigned long n))
 {
-	return NULL;
+	return -ENOTSUPP;
 }
 
 /* Always built-in helper functions. */
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 4d3d5b8..9c076d1 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -296,29 +296,26 @@ static const struct bpf_func_proto bpf_perf_event_output_proto = {
 
 static DEFINE_PER_CPU(struct pt_regs, bpf_pt_regs);
 
-static u64 bpf_event_output(u64 r1, u64 r2, u64 flags, u64 r4, u64 size)
+u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 size_meta,
+		     void *ctx, u64 size_ctx,
+		     unsigned long (*ctx_copy_cb)(void *dst, const void *src,
+						  unsigned long n))
 {
 	struct pt_regs *regs = this_cpu_ptr(&bpf_pt_regs);
+	struct perf_raw_record_frag frag = {
+		.data		= ctx,
+		.copy_cb	= ctx_copy_cb,
+	};
+	struct perf_raw_record raw = {
+		.size		= size_meta + size_ctx,
+		.size_head	= size_meta,
+		.data		= meta,
+		.frag		= size_ctx ? &frag : NULL,
+	};
 
 	perf_fetch_caller_regs(regs);
 
-	return bpf_perf_event_output((long)regs, r2, flags, r4, size);
-}
-
-static const struct bpf_func_proto bpf_event_output_proto = {
-	.func		= bpf_event_output,
-	.gpl_only	= true,
-	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_CONST_MAP_PTR,
-	.arg3_type	= ARG_ANYTHING,
-	.arg4_type	= ARG_PTR_TO_STACK,
-	.arg5_type	= ARG_CONST_STACK_SIZE,
-};
-
-const struct bpf_func_proto *bpf_get_event_output_proto(void)
-{
-	return &bpf_event_output_proto;
+	return __bpf_perf_event_output(regs, map, flags, &raw);
 }
 
 static u64 bpf_get_current_task(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
diff --git a/net/core/filter.c b/net/core/filter.c
index 10c4a2f..af48d1e 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2025,6 +2025,47 @@ bool bpf_helper_changes_skb_data(void *func)
 	return false;
 }
 
+static unsigned long bpf_skb_copy_cb(void *dst_buff, const void *skb,
+				     unsigned long len)
+{
+	void *ptr = skb_header_pointer(skb, 0, len, dst_buff);
+
+	if (unlikely(!ptr))
+		return len;
+	if (ptr != dst_buff)
+		memcpy(dst_buff, ptr, len);
+
+	return 0;
+}
+
+static u64 bpf_skb_event_output(u64 r1, u64 r2, u64 flags, u64 r4,
+				u64 size_meta)
+{
+	struct sk_buff *skb = (struct sk_buff *)(long) r1;
+	u64 size_skb = (flags & BPF_F_CTXLEN_MASK) >> 32;
+
+	if (unlikely(flags & ~(BPF_F_CTXLEN_MASK | BPF_F_INDEX_MASK)))
+		return -EINVAL;
+	if (unlikely(size_skb > skb->len))
+		return -EINVAL;
+
+	return bpf_event_output((struct bpf_map *)(long) r2,
+				flags & ~BPF_F_CTXLEN_MASK,
+				(void *)(long) r4, size_meta,
+				skb, size_skb, bpf_skb_copy_cb);
+}
+
+static const struct bpf_func_proto bpf_skb_event_output_proto = {
+	.func		= bpf_skb_event_output,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_CONST_MAP_PTR,
+	.arg3_type	= ARG_ANYTHING,
+	.arg4_type	= ARG_PTR_TO_STACK,
+	.arg5_type	= ARG_CONST_STACK_SIZE,
+};
+
 static unsigned short bpf_tunnel_key_af(u64 flags)
 {
 	return flags & BPF_F_TUNINFO_IPV6 ? AF_INET6 : AF_INET;
@@ -2357,7 +2398,7 @@ tc_cls_act_func_proto(enum bpf_func_id func_id)
 	case BPF_FUNC_get_hash_recalc:
 		return &bpf_get_hash_recalc_proto;
 	case BPF_FUNC_perf_event_output:
-		return bpf_get_event_output_proto();
+		return &bpf_skb_event_output_proto;
 	case BPF_FUNC_get_smp_processor_id:
 		return &bpf_get_smp_processor_id_proto;
 #ifdef CONFIG_SOCK_CGROUP_DATA
-- 
1.9.3

* Re: [PATCH net-next 3/3] bpf: avoid stack copy and use skb ctx for event output
From: kbuild test robot @ 2016-07-12 23:25 UTC
  To: Daniel Borkmann
  Cc: kbuild-all, davem, alexei.starovoitov, peterz, tgraf, netdev,
	linux-kernel, Daniel Borkmann

Hi,

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Daniel-Borkmann/BPF-event-output-helper-improvements/20160713-065944
config: s390-allyesconfig (attached as .config)
compiler: s390x-linux-gnu-gcc (Debian 5.3.1-8) 5.3.1 20160205
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=s390 

All warnings (new ones prefixed by >>):

   kernel/trace/bpf_trace.c: In function 'bpf_perf_event_output':
   kernel/trace/bpf_trace.c:284:1: warning: 'bpf_perf_event_output' uses dynamic stack allocation
    }
    ^
   kernel/trace/bpf_trace.c: In function 'bpf_event_output':
>> kernel/trace/bpf_trace.c:319:1: warning: 'bpf_event_output' uses dynamic stack allocation
    }
    ^

vim +/bpf_event_output +319 kernel/trace/bpf_trace.c

3644f5a9 Daniel Borkmann    2016-07-13  278  	if (unlikely(flags & ~(BPF_F_INDEX_MASK)))
3644f5a9 Daniel Borkmann    2016-07-13  279  		return -EINVAL;
3644f5a9 Daniel Borkmann    2016-07-13  280  
3644f5a9 Daniel Borkmann    2016-07-13  281  	return __bpf_perf_event_output((struct pt_regs *)(long) r1,
3644f5a9 Daniel Borkmann    2016-07-13  282  				       (struct bpf_map *)(long) r2,
3644f5a9 Daniel Borkmann    2016-07-13  283  				       flags, &raw);
3644f5a9 Daniel Borkmann    2016-07-13 @284  }
3644f5a9 Daniel Borkmann    2016-07-13  285  
a43eec30 Alexei Starovoitov 2015-10-20  286  static const struct bpf_func_proto bpf_perf_event_output_proto = {
a43eec30 Alexei Starovoitov 2015-10-20  287  	.func		= bpf_perf_event_output,
1075ef59 Alexei Starovoitov 2015-10-23  288  	.gpl_only	= true,
a43eec30 Alexei Starovoitov 2015-10-20  289  	.ret_type	= RET_INTEGER,
a43eec30 Alexei Starovoitov 2015-10-20  290  	.arg1_type	= ARG_PTR_TO_CTX,
a43eec30 Alexei Starovoitov 2015-10-20  291  	.arg2_type	= ARG_CONST_MAP_PTR,
a43eec30 Alexei Starovoitov 2015-10-20  292  	.arg3_type	= ARG_ANYTHING,
a43eec30 Alexei Starovoitov 2015-10-20  293  	.arg4_type	= ARG_PTR_TO_STACK,
a43eec30 Alexei Starovoitov 2015-10-20  294  	.arg5_type	= ARG_CONST_STACK_SIZE,
a43eec30 Alexei Starovoitov 2015-10-20  295  };
a43eec30 Alexei Starovoitov 2015-10-20  296  
bd570ff9 Daniel Borkmann    2016-04-18  297  static DEFINE_PER_CPU(struct pt_regs, bpf_pt_regs);
bd570ff9 Daniel Borkmann    2016-04-18  298  
f428cc22 Daniel Borkmann    2016-07-13  299  u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 size_meta,
f428cc22 Daniel Borkmann    2016-07-13  300  		     void *ctx, u64 size_ctx,
f428cc22 Daniel Borkmann    2016-07-13  301  		     unsigned long (*ctx_copy_cb)(void *dst, const void *src,
f428cc22 Daniel Borkmann    2016-07-13  302  						  unsigned long n))
bd570ff9 Daniel Borkmann    2016-04-18  303  {
bd570ff9 Daniel Borkmann    2016-04-18  304  	struct pt_regs *regs = this_cpu_ptr(&bpf_pt_regs);
f428cc22 Daniel Borkmann    2016-07-13  305  	struct perf_raw_record_frag frag = {
f428cc22 Daniel Borkmann    2016-07-13  306  		.data		= ctx,
f428cc22 Daniel Borkmann    2016-07-13  307  		.copy_cb	= ctx_copy_cb,
f428cc22 Daniel Borkmann    2016-07-13  308  	};
f428cc22 Daniel Borkmann    2016-07-13  309  	struct perf_raw_record raw = {
f428cc22 Daniel Borkmann    2016-07-13  310  		.size		= size_meta + size_ctx,
f428cc22 Daniel Borkmann    2016-07-13  311  		.size_head	= size_meta,
f428cc22 Daniel Borkmann    2016-07-13  312  		.data		= meta,
f428cc22 Daniel Borkmann    2016-07-13  313  		.frag		= size_ctx ? &frag : NULL,
f428cc22 Daniel Borkmann    2016-07-13  314  	};
bd570ff9 Daniel Borkmann    2016-04-18  315  
bd570ff9 Daniel Borkmann    2016-04-18  316  	perf_fetch_caller_regs(regs);
bd570ff9 Daniel Borkmann    2016-04-18  317  
f428cc22 Daniel Borkmann    2016-07-13  318  	return __bpf_perf_event_output(regs, map, flags, &raw);
bd570ff9 Daniel Borkmann    2016-04-18 @319  }
bd570ff9 Daniel Borkmann    2016-04-18  320  
606274c5 Alexei Starovoitov 2016-07-06  321  static u64 bpf_get_current_task(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
606274c5 Alexei Starovoitov 2016-07-06  322  {

:::::: The code at line 319 was first introduced by commit
:::::: bd570ff970a54df653b48ed0cfb373f2ebed083d bpf: add event output helper for notifications/sampling/logging

:::::: TO: Daniel Borkmann <daniel@iogearbox.net>
:::::: CC: David S. Miller <davem@davemloft.net>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 41717 bytes --]

* Re: [PATCH net-next 3/3] bpf: avoid stack copy and use skb ctx for event output
From: Daniel Borkmann @ 2016-07-12 23:45 UTC
  To: kbuild test robot
  Cc: kbuild-all, davem, alexei.starovoitov, peterz, tgraf, netdev,
	linux-kernel

On 07/13/2016 01:25 AM, kbuild test robot wrote:
> Hi,
>
> [auto build test WARNING on net-next/master]
>
> url:    https://github.com/0day-ci/linux/commits/Daniel-Borkmann/BPF-event-output-helper-improvements/20160713-065944
> config: s390-allyesconfig (attached as .config)
> compiler: s390x-linux-gnu-gcc (Debian 5.3.1-8) 5.3.1 20160205
> reproduce:
>          wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
>          chmod +x ~/bin/make.cross
>          # save the attached .config to linux build tree
>          make.cross ARCH=s390
>
> All warnings (new ones prefixed by >>):
>
>     kernel/trace/bpf_trace.c: In function 'bpf_perf_event_output':
>     kernel/trace/bpf_trace.c:284:1: warning: 'bpf_perf_event_output' uses dynamic stack allocation
>      }
>      ^
>     kernel/trace/bpf_trace.c: In function 'bpf_event_output':
>>> kernel/trace/bpf_trace.c:319:1: warning: 'bpf_event_output' uses dynamic stack allocation
>      }
>      ^

Hmm, searching a bit on lkml, it seems these warnings on s390 are actually
mostly harmless, I believe [1][2] ... it looks like they are there to find
structs sitting on the stack, for example; at least that's also what the
currently existing one at the line above (bpf_trace.c +284) appears to be
about.

Thanks,
Daniel

   [1] http://lkml.iu.edu/hypermail/linux/kernel/1601.2/04074.html
   [2] https://lkml.org/lkml/2013/6/25/42

* Re: [PATCH net-next 3/3] bpf: avoid stack copy and use skb ctx for event output
From: Fengguang Wu @ 2016-07-13  0:01 UTC
  To: Daniel Borkmann
  Cc: kbuild-all, davem, alexei.starovoitov, peterz, tgraf, netdev,
	linux-kernel

Hi Daniel,

On Wed, Jul 13, 2016 at 01:45:47AM +0200, Daniel Borkmann wrote:
>On 07/13/2016 01:25 AM, kbuild test robot wrote:
>> Hi,
>>
>> [auto build test WARNING on net-next/master]
>>
>> url:    https://github.com/0day-ci/linux/commits/Daniel-Borkmann/BPF-event-output-helper-improvements/20160713-065944
>> config: s390-allyesconfig (attached as .config)
>> compiler: s390x-linux-gnu-gcc (Debian 5.3.1-8) 5.3.1 20160205
>> reproduce:
>>          wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
>>          chmod +x ~/bin/make.cross
>>          # save the attached .config to linux build tree
>>          make.cross ARCH=s390
>>
>> All warnings (new ones prefixed by >>):
>>
>>     kernel/trace/bpf_trace.c: In function 'bpf_perf_event_output':
>>     kernel/trace/bpf_trace.c:284:1: warning: 'bpf_perf_event_output' uses dynamic stack allocation
>>      }
>>      ^
>>     kernel/trace/bpf_trace.c: In function 'bpf_event_output':
>>>> kernel/trace/bpf_trace.c:319:1: warning: 'bpf_event_output' uses dynamic stack allocation
>>      }
>>      ^
>
>Hmm, searching a bit on lkml, it seems these warnings on s390 are actually
>mostly harmless, I believe [1][2] ... it looks like they are there to find
>structs sitting on the stack, for example; at least that's also what the
>currently existing one at the line above (bpf_trace.c +284) appears to be
>about.

Yes, it does look so. All such warnings happen only on s390:

% g -h -o '[^ ]*config' *dynamic-stack* | sort | uniq -c | sort -nr
    118 s390-allyesconfig
     80 s390-allmodconfig

Let's ignore all of them on s390.

Thanks,
Fengguang

>   [1] http://lkml.iu.edu/hypermail/linux/kernel/1601.2/04074.html
>   [2] https://lkml.org/lkml/2013/6/25/42

* Re: [PATCH net-next 1/3] perf, events: add non-linear data support for raw records
From: Peter Zijlstra @ 2016-07-13  7:52 UTC
  To: Daniel Borkmann; +Cc: davem, alexei.starovoitov, tgraf, netdev, linux-kernel

On Wed, Jul 13, 2016 at 12:36:17AM +0200, Daniel Borkmann wrote:
> This patch adds support for non-linear data on raw records. It means
> that for such data, the newly introduced __output_custom() helper will
> be used instead of __output_copy(). __output_custom() will invoke
> whatever custom callback is passed in via struct perf_raw_record_frag
> to extract the data into the ring buffer slot.
> 
> To keep changes in perf_prepare_sample() and in perf_output_sample()
> minimal, a size/size_head split was added to perf_raw_record, which
> call sites fill out, so that two extra tests in the fast path can be
> avoided.
> 
> The few users of raw records are adapted to initialize their size_head
> and frag data; no change in behavior for them. A later patch will
> extend the BPF side with a first user and callback for this facility;
> future users could be things like XDP BPF programs (which work on a
> different context, though, and would thus need a different callback),
> etc.

Why? What problem are we solving?

* Re: [PATCH net-next 1/3] perf, events: add non-linear data support for raw records
From: Daniel Borkmann @ 2016-07-13  9:24 UTC
  To: Peter Zijlstra; +Cc: davem, alexei.starovoitov, tgraf, netdev, linux-kernel

Hi Peter,

On 07/13/2016 09:52 AM, Peter Zijlstra wrote:
> On Wed, Jul 13, 2016 at 12:36:17AM +0200, Daniel Borkmann wrote:
>> This patch adds support for non-linear data on raw records. It means
>> that for such data, the newly introduced __output_custom() helper will
>> be used instead of __output_copy(). __output_custom() will invoke
>> whatever custom callback is passed in via struct perf_raw_record_frag
>> to extract the data into the ring buffer slot.
>>
>> To keep changes in perf_prepare_sample() and in perf_output_sample()
>> minimal, a size/size_head split was added to perf_raw_record, which
>> call sites fill out, so that two extra tests in the fast path can be
>> avoided.
>>
>> The few users of raw records are adapted to initialize their size_head
>> and frag data; no change in behavior for them. A later patch will
>> extend the BPF side with a first user and callback for this facility;
>> future users could be things like XDP BPF programs (which work on a
>> different context, though, and would thus need a different callback),
>> etc.
>
> Why? What problem are we solving?

I've tried to summarize it in patch 3/3; there are a number of
improvements this set would provide: the current BPF_FUNC_perf_event_output
helper that can be used from tc/networking programs has the limitation
that if data from the skb should be part of the sample, that data first
needs to be copied to the eBPF stack with the bpf_skb_load_bytes() helper,
and only then passed into bpf_event_output().

This currently has 3 issues we'd like to resolve:

i) We need two copies instead of just a single one for the skb data. The
data can be non-linear, see also skb_copy_bits() as an example for
walking/extracting it.

ii) For static verification reasons, the bpf_skb_load_bytes() helper needs
to see a constant size on the passed buffer to make sure the BPF verifier
can do its sanity checks on it during verification time, so just passing
in skb->len (or any other non-constant value) wouldn't work; changing
bpf_skb_load_bytes() is also not the real solution, since we'd still have
the two copies we'd like to avoid.

iii) bpf_skb_load_bytes() is just for rather smaller buffers (e.g.
headers), since they need to sit on the limited eBPF stack anyway.

The set would improve the BPF helper to address all 3 at once.
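
For illustration, the pattern a tc program needs today looks roughly
like this (buffer size and map name made up):

	char buf[64];	/* constant size required, sits on the eBPF stack */

	/* 1st copy: skb payload -> stack buffer */
	if (bpf_skb_load_bytes(skb, 0, buf, sizeof(buf)) < 0)
		return TC_ACT_OK;
	/* 2nd copy: stack buffer -> perf ring buffer slot */
	bpf_perf_event_output(skb, &event_map, BPF_F_CURRENT_CPU,
			      buf, sizeof(buf));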

Thanks,
Daniel

* Re: [PATCH net-next 1/3] perf, events: add non-linear data support for raw records
From: Peter Zijlstra @ 2016-07-13 12:10 UTC
  To: Daniel Borkmann; +Cc: davem, alexei.starovoitov, tgraf, netdev, linux-kernel

On Wed, Jul 13, 2016 at 11:24:13AM +0200, Daniel Borkmann wrote:
> Hi Peter,
> 
> On 07/13/2016 09:52 AM, Peter Zijlstra wrote:
> >On Wed, Jul 13, 2016 at 12:36:17AM +0200, Daniel Borkmann wrote:
> >>This patch adds support for non-linear data on raw records. It means
> >>that for such data, the newly introduced __output_custom() helper will
> >>be used instead of __output_copy(). __output_custom() will invoke
> >>whatever custom callback is passed in via struct perf_raw_record_frag
> >>to extract the data into the ring buffer slot.
> >>
> >>To keep changes in perf_prepare_sample() and in perf_output_sample()
> >>minimal, a size/size_head split was added to perf_raw_record, which
> >>call sites fill out, so that two extra tests in the fast path can be
> >>avoided.
> >>
> >>The few users of raw records are adapted to initialize their size_head
> >>and frag data; no change in behavior for them. A later patch will
> >>extend the BPF side with a first user and callback for this facility;
> >>future users could be things like XDP BPF programs (which work on a
> >>different context, though, and would thus need a different callback),
> >>etc.
> >
> >Why? What problem are we solving?
> 
> I've tried to summarize it in patch 3/3,

Which is pretty useless if you're staring at this patch.

> This currently has 3 issues we'd like to resolve:

> i) We need two copies instead of just a single one for the skb data.
> The data can be non-linear, see also skb_copy_bits() as an example for
> walking/extracting it,

I'm not familiar enough with the network gunk to be able to read that.
But up to skb_walk_frags() it looks entirely linear to me.

> ii) for static verification reasons, the bpf_skb_load_bytes() helper
> needs to see a constant size on the passed buffer to make sure BPF
> verifier can do its sanity checks on it during verification time, so
> just passing in skb->len (or any other non-constant value) wouldn't
> work, but changing bpf_skb_load_bytes() is also not the real solution
> since we still have two copies we'd like to avoid as well, and

> iii) bpf_skb_load_bytes() is just for rather smaller buffers (e.g.
> headers) since they need to sit on the limited eBPF stack anyway. The
> set would improve the BPF helper to address all 3 at once.

Humm, maybe. Lemme go try and reverse engineer that patch, because I'm
not at all sure wth it's supposed to do, nor am I entirely sure this
clarified things :/

* Re: [PATCH net-next 1/3] perf, events: add non-linear data support for raw records
From: Daniel Borkmann @ 2016-07-13 13:16 UTC
  To: Peter Zijlstra; +Cc: davem, alexei.starovoitov, tgraf, netdev, linux-kernel

On 07/13/2016 02:10 PM, Peter Zijlstra wrote:
> On Wed, Jul 13, 2016 at 11:24:13AM +0200, Daniel Borkmann wrote:
>> On 07/13/2016 09:52 AM, Peter Zijlstra wrote:
>>> On Wed, Jul 13, 2016 at 12:36:17AM +0200, Daniel Borkmann wrote:
>>>> This patch adds support for non-linear data on raw records. It means
>>>> that for such data, the newly introduced __output_custom() helper will
>>>> be used instead of __output_copy(). __output_custom() will invoke
>>>> whatever custom callback is passed in via struct perf_raw_record_frag
>>>> to extract the data into the ring buffer slot.
>>>>
>>>> To keep changes in perf_prepare_sample() and in perf_output_sample()
>>>> minimal, a size/size_head split was added to perf_raw_record, which
>>>> call sites fill out, so that two extra tests in the fast path can be
>>>> avoided.
>>>>
>>>> The few users of raw records are adapted to initialize their size_head
>>>> and frag data; no change in behavior for them. A later patch will
>>>> extend the BPF side with a first user and callback for this facility;
>>>> future users could be things like XDP BPF programs (which work on a
>>>> different context, though, and would thus need a different callback),
>>>> etc.
>>>
>>> Why? What problem are we solving?
>>
>> I've tried to summarize it in patch 3/3,
>
> Which is pretty useless if you're staring at this patch.
>
>> This currently has 3 issues we'd like to resolve:
>
>> i) We need two copies instead of just a single one for the skb data.
>> The data can be non-linear, see also skb_copy_bits() as an example for
>> walking/extracting it,
>
> I'm not familiar enough with the network gunk to be able to read that.
> But up to skb_walk_frags() it looks entirely linear to me.

Hm, fair enough. There are three parts: the skb can have a linear part,
which is accessed via skb->data, either in its entirety or with a
non-linear part appended to it. That non-linear part can consist of
pages sitting in the shared info section (skb_shinfo(skb) -> frags[],
nr_frags members), which will be linearized, and, appended after the
frags[] data, there can additionally be further skbs chained to the
'root' skb that contain fragmented data. All of this is what
skb_copy_bits() copies, linearized, into the 'to' buffer. So depending
on the origin of the skb, its structure can be quite different, and
skb_copy_bits() covers all the cases generically. Maybe [1] summarizes
it better if you want to familiarize yourself with how skbs work,
although some parts are not up to date anymore.

   [1] http://vger.kernel.org/~davem/skb_data.html
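
Schematically (a simplified sketch, not the exact struct layout):

	skb->data, skb_headlen(skb)		/* linear part     */
	skb_shinfo(skb)->frags[0..nr_frags-1]	/* paged fragments */
	skb_shinfo(skb)->frag_list		/* chained skbs    */

	/* skb_copy_bits() hides all three behind one linear copy: */
	err = skb_copy_bits(skb, offset, to, len);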

>> ii) for static verification reasons, the bpf_skb_load_bytes() helper
>> needs to see a constant size on the passed buffer to make sure BPF
>> verifier can do its sanity checks on it during verification time, so
>> just passing in skb->len (or any other non-constant value) wouldn't
>> work, but changing bpf_skb_load_bytes() is also not the real solution
>> since we still have two copies we'd like to avoid as well, and
>
>> iii) bpf_skb_load_bytes() is just for rather smaller buffers (e.g.
>> headers) since they need to sit on the limited eBPF stack anyway. The
>> set would improve the BPF helper to address all 3 at once.
>
> Humm, maybe. Lemme go try and reverse engineer that patch, because I'm
> not at all sure wth it's supposed to do, nor am I entirely sure this
> clarified things :/

* Re: [PATCH net-next 1/3] perf, events: add non-linear data support for raw records
From: Peter Zijlstra @ 2016-07-13 13:42 UTC
  To: Daniel Borkmann; +Cc: davem, alexei.starovoitov, tgraf, netdev, linux-kernel


Ok so the nonlinear thing was it doing _two_ copies, one the regular
__output_copy() on raw->data and second the optional fragment thingy
using __output_custom().

Would something like this work instead?

It does the nonlinear thing and the custom copy function thing but
allows more than 2 fragments and allows each fragment to have a custom
copy.

It doesn't look obviously more expensive; it has the one ->copy branch
extra, but then it doesn't recompute the sizes.

---

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 1fe22032f228..83e2a83e8db3 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -69,9 +69,18 @@ struct perf_callchain_entry_ctx {
 	bool			    contexts_maxed;
 };
 
+typedef unsigned long (*perf_copy_f)(void *dst, const void *src, unsigned long len);
+
+struct perf_raw_frag {
+	struct perf_raw_frag		*next;
+	perf_copy_f			copy;
+	void				*data;
+	u32				size;
+} __packed;
+
 struct perf_raw_record {
+	struct perf_raw_frag		frag;
 	u32				size;
-	void				*data;
 };
 
 /*
diff --git a/kernel/events/core.c b/kernel/events/core.c
index fe8d49a56322..f7ad7d65317d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5617,16 +5617,21 @@ void perf_output_sample(struct perf_output_handle *handle,
 	}
 
 	if (sample_type & PERF_SAMPLE_RAW) {
-		if (data->raw) {
-			u32 raw_size = data->raw->size;
-			u32 real_size = round_up(raw_size + sizeof(u32),
-						 sizeof(u64)) - sizeof(u32);
-			u64 zero = 0;
-
-			perf_output_put(handle, real_size);
-			__output_copy(handle, data->raw->data, raw_size);
-			if (real_size - raw_size)
-				__output_copy(handle, &zero, real_size - raw_size);
+		struct perf_raw_record *raw = data->raw;
+
+		if (raw) {
+			struct perf_raw_frag *frag = &raw->frag;
+
+			perf_output_put(handle, raw->size);
+			do {
+				if (frag->copy) {
+					__output_custom(handle, frag->copy,
+							frag->data, frag->size);
+				} else {
+					__output_copy(handle, frag->data, frag->size);
+				}
+				frag = frag->next;
+			} while (frag);
 		} else {
 			struct {
 				u32	size;
@@ -5751,14 +5756,22 @@ void perf_prepare_sample(struct perf_event_header *header,
 	}
 
 	if (sample_type & PERF_SAMPLE_RAW) {
-		int size = sizeof(u32);
+		struct perf_raw_record *raw = data->raw;
+		int size = sizeof(u64);
 
-		if (data->raw)
-			size += data->raw->size;
-		else
-			size += sizeof(u32);
+		if (raw) {
+			struct perf_raw_frag *frag = &raw->frag;
 
-		header->size += round_up(size, sizeof(u64));
+			size = sizeof(u32);
+			do {
+				size += frag->size;
+				frag = frag->next;
+			} while (frag);
+			size = round_up(size, sizeof(u64));
+			raw->size = size;
+		}
+
+		header->size += size;
 	}
 
 	if (sample_type & PERF_SAMPLE_BRANCH_STACK) {

* Re: [PATCH net-next 1/3] perf, events: add non-linear data support for raw records
From: Daniel Borkmann @ 2016-07-13 14:08 UTC
  To: Peter Zijlstra; +Cc: davem, alexei.starovoitov, tgraf, netdev, linux-kernel

Hi Peter,

On 07/13/2016 03:42 PM, Peter Zijlstra wrote:
>
> Ok so the nonlinear thing was it doing _two_ copies, one the regular
> __output_copy() on raw->data and second the optional fragment thingy
> using __output_custom().
>
> Would something like this work instead?
>
> It does the nonlinear thing and the custom copy function thing but
> allows more than 2 fragments and allows each fragment to have a custom
> copy.
>
> It doesn't look obviously more expensive; it has the one ->copy branch
> extra, but then it doesn't recompute the sizes.

Yes, on a quick glance that would work as well; the diff is just a bit
bigger, but it's more generic this way. Do you want me to adapt this into
the first patch?

One question below:

> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 1fe22032f228..83e2a83e8db3 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -69,9 +69,18 @@ struct perf_callchain_entry_ctx {
>   	bool			    contexts_maxed;
>   };
>
> +typedef unsigned long (*perf_copy_f)(void *dst, const void *src, unsigned long len);
> +
> +struct perf_raw_frag {
> +	struct perf_raw_frag		*next;
> +	perf_copy_f			copy;
> +	void				*data;
> +	u32				size;
> +} __packed;
> +
>   struct perf_raw_record {
> +	struct perf_raw_frag		frag;
>   	u32				size;
> -	void				*data;
>   };
>
>   /*
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index fe8d49a56322..f7ad7d65317d 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -5617,16 +5617,21 @@ void perf_output_sample(struct perf_output_handle *handle,
>   	}
>
>   	if (sample_type & PERF_SAMPLE_RAW) {
> -		if (data->raw) {
> -			u32 raw_size = data->raw->size;
> -			u32 real_size = round_up(raw_size + sizeof(u32),
> -						 sizeof(u64)) - sizeof(u32);
> -			u64 zero = 0;
> -
> -			perf_output_put(handle, real_size);
> -			__output_copy(handle, data->raw->data, raw_size);
> -			if (real_size - raw_size)
> -				__output_copy(handle, &zero, real_size - raw_size);
> +		struct perf_raw_record *raw = data->raw;
> +
> +		if (raw) {
> +			struct perf_raw_frag *frag = &raw->frag;
> +
> +			perf_output_put(handle, raw->size);
> +			do {
> +				if (frag->copy) {
> +					__output_custom(handle, frag->copy,
> +							frag->data, frag->size);
> +				} else {
> +					__output_copy(handle, frag->data, frag->size);
> +				}
> +				frag = frag->next;
> +			} while (frag);

We still need the zero padding here from above with the computed
raw->size, right?

>   		} else {
>   			struct {
>   				u32	size;
> @@ -5751,14 +5756,22 @@ void perf_prepare_sample(struct perf_event_header *header,

Thanks,
Daniel

* Re: [PATCH net-next 1/3] perf, events: add non-linear data support for raw records
From: Peter Zijlstra @ 2016-07-13 16:40 UTC
  To: Daniel Borkmann; +Cc: davem, alexei.starovoitov, tgraf, netdev, linux-kernel

On Wed, Jul 13, 2016 at 04:08:55PM +0200, Daniel Borkmann wrote:
> Hi Peter,
> 
> On 07/13/2016 03:42 PM, Peter Zijlstra wrote:
> >
> >Ok so the nonlinear thing was it doing _two_ copies, one the regular
> >__output_copy() on raw->data and second the optional fragment thingy
> >using __output_custom().
> >
> >Would something like this work instead?
> >
> >It does the nonlinear thing and the custom copy function thing but
> >allows more than 2 fragments and allows each fragment to have a custom
> >copy.
> >
> >It doesn't look obviously more expensive; it has the one ->copy branch
> >extra, but then it doesn't recompute the sizes.
> 
> Yes, on a quick glance that would work as well; the diff is just a bit
> bigger, but it's more generic this way. Do you want me to adapt this into
> the first patch?

Please.

> One question below:
> 

> >-			u64 zero = 0;

> >-			if (real_size - raw_size)
> >-				__output_copy(handle, &zero, real_size - raw_size);

> 
> We still need the zero padding here from above with the computed
> raw->size, right?

Ah, yes, we need some __output*() in order to advance the handle offset.
We don't _need_ to copy the 0s, but I doubt __output_skip() is much
cheaper for these 1-3 bytes worth of data; we've already touched that
line anyway.

* Re: [PATCH net-next 1/3] perf, events: add non-linear data support for raw records
From: Daniel Borkmann @ 2016-07-13 16:53 UTC
  To: Peter Zijlstra; +Cc: davem, alexei.starovoitov, tgraf, netdev, linux-kernel

On 07/13/2016 06:40 PM, Peter Zijlstra wrote:
> On Wed, Jul 13, 2016 at 04:08:55PM +0200, Daniel Borkmann wrote:
>> On 07/13/2016 03:42 PM, Peter Zijlstra wrote:
>>>
>>> Ok so the nonlinear thing was it doing _two_ copies, one the regular
>>> __output_copy() on raw->data and second the optional fragment thingy
>>> using __output_custom().
>>>
>>> Would something like this work instead?
>>>
>>> It does the nonlinear thing and the custom copy function thing but
>>> allows more than 2 fragments and allows each fragment to have a custom
>>> copy.
>>>
>>> It doesn't look obviously more expensive; it has the one ->copy branch
>>> extra, but then it doesn't recompute the sizes.
>>
>> Yes, that would work as well on a quick glance with diff just a bit
>> bigger, but more generic this way. Do you want me to adapt this into
>> the first patch?
>
> Please.
>
>> One question below:
>>
>
>>> -			u64 zero = 0;
>
>>> -			if (real_size - raw_size)
>>> -				__output_copy(handle, &zero, real_size - raw_size);
>
>> We still need the zero padding here from above with the computed
>> raw->size, right?
>
> Ah, yes, we need some __output*() in order to advance the handle offset.
> We don't _need_ to copy the 0s, but I doubt __output_skip() is much
> cheaper for these 1-3 bytes worth of data; we've already touched that
> line anyway.

Okay, thanks for your input! I'll respin then.
