* [RFC PATCH bpf-next 0/4] bpf: add bpf-based bpf_trace_printk()-like support
@ 2020-08-06 14:42 Alan Maguire
From: Alan Maguire @ 2020-08-06 14:42 UTC
  To: ast, daniel, andriin, yhs
  Cc: linux, andriy.shevchenko, pmladek, kafai, songliubraving,
	john.fastabend, kpsingh, shuah, rdna, scott.branden, quentin,
	cneirabustos, jakub, mingo, rostedt, bpf, netdev, linux-kernel,
	linux-kselftest, Alan Maguire

This series attempts to provide a simple way for BPF programs (and in
future other consumers) to utilize BPF Type Format (BTF) information
to display kernel data structures in-kernel.  The use case
targeted here is rendering type information via the
bpf_trace_printk trace event.

There is already support in kernel/bpf/btf.c for "show" functionality;
the changes here generalize that support from seq file-specific
display to the more generic case and add another specific
use case: rather than seq_printf()ing the show data, it is traced
using the bpf_trace_printk trace event.  Other future uses of the
show functionality could include a bpf_printk_btf() function which
printk()s the data instead; Oops messaging in particular would
be an interesting application for such functionality.

The above potential use case hints at a reply to a reasonable
objection: that such typed display should be left to tracing
programs, where the in-kernel tracing records the data and a
userspace program renders it.  While this is certainly the
recommended approach for most cases, I believe having an
in-kernel mechanism would be valuable also.  Critically, for BPF
programs it reduces debugging and tracing of such data to
invoking a one-line helper.

One challenge raised in an earlier iteration of this work -
where the BTF printing was implemented as a printk() format
specifier - was that the amount of data printed per
printk() was large, and other format specifiers were far
simpler.  This patchset tackles this by instead displaying
data _as the data structure is traversed_, rather than copying
it to a string for later display.  The problem in doing this
however is that such output can be intermixed with other
bpf_trace_printk events.  The solution pursued here is to
associate a trace ID with the bpf_trace_printk events.  For
now, the bpf_trace_printk() helper sets this trace ID to 0,
and bpf_trace_btf() can specify non-zero values.  This allows
a BPF program to coordinate with a user-space program which
creates a separate trace instance which filters trace events
based on trace ID, allowing for clean display without pollution
from other data sources.  Future work could enhance
bpf_trace_printk() to do this too, either via a new helper
or by smuggling a 32-bit trace id value into the "fmt_size"
argument (the latter might be problematic for 32-bit platforms
though, so a new helper might be preferred).
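
For illustration, the user-space side of that coordination might
look roughly like the sketch below; the paths assume tracefs is
mounted under /sys/kernel/debug/tracing, and the trace_id event
field is the one added in patch 2:

/* Sketch: create a trace instance which only sees bpf_trace_printk
 * events tagged with trace_id 1.
 */
#include <stdio.h>
#include <sys/stat.h>

#define INST "/sys/kernel/debug/tracing/instances/btf_show"

static int write_str(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fputs(val, f);
	return fclose(f);
}

int main(void)
{
	mkdir(INST, 0700);	/* mkdir instantiates the instance */
	write_str(INST "/events/bpf_trace/bpf_trace_printk/filter",
		  "trace_id == 1");
	write_str(INST "/events/bpf_trace/bpf_trace_printk/enable", "1");
	return 0;
}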

To aid in display, the bpf_trace_btf() helper is passed a
"struct btf_ptr *" which specifies the data to be traced
(a struct sk_buff *, say) along with either the BTF id of
the type (the BTF id of "struct sk_buff") or a string
representation of the type ("struct sk_buff").  A flags field
is also present for future use.
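
As a rough sketch of helper usage from a BPF program (the
authoritative struct btf_ptr and helper definitions are in patch 3;
the argument layout below is an assumption based on the
description above):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

SEC("tp_btf/netif_receive_skb")
int BPF_PROG(trace_netif_receive_skb, struct sk_buff *skb)
{
	/* the type can be specified by name or by BTF id */
	struct btf_ptr ptr = {
		.ptr = skb,
		.type = "struct sk_buff",
	};

	/* trace id 1 tags our events for instance filtering */
	bpf_trace_btf(&ptr, sizeof(ptr), 1, 0);
	return 0;
}

char _license[] SEC("license") = "GPL";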

Separately a number of flags control how the output is
rendered, see patch 3 for more details.

A snippet of output from printing "struct sk_buff *"
(see patch 3 for the full output) looks like this:

          <idle>-0     [023] d.s.  1825.778400: bpf_trace_printk: (struct sk_buff){
          <idle>-0     [023] d.s.  1825.778409: bpf_trace_printk:  (union){
          <idle>-0     [023] d.s.  1825.778410: bpf_trace_printk:   (struct){
          <idle>-0     [023] d.s.  1825.778412: bpf_trace_printk:    .prev = (struct sk_buff *)0x00000000b2a3df7e,
          <idle>-0     [023] d.s.  1825.778413: bpf_trace_printk:    (union){
          <idle>-0     [023] d.s.  1825.778414: bpf_trace_printk:     .dev = (struct net_device *)0x000000001658808b,
          <idle>-0     [023] d.s.  1825.778416: bpf_trace_printk:     .dev_scratch = (long unsigned int)18446628460391432192,
          <idle>-0     [023] d.s.  1825.778417: bpf_trace_printk:    },
          <idle>-0     [023] d.s.  1825.778417: bpf_trace_printk:   },
          <idle>-0     [023] d.s.  1825.778418: bpf_trace_printk:   .rbnode = (struct rb_node){
          <idle>-0     [023] d.s.  1825.778419: bpf_trace_printk:    .rb_right = (struct rb_node *)0x00000000b2a3df7e,
          <idle>-0     [023] d.s.  1825.778420: bpf_trace_printk:    .rb_left = (struct rb_node *)0x000000001658808b,
          <idle>-0     [023] d.s.  1825.778420: bpf_trace_printk:   },
          <idle>-0     [023] d.s.  1825.778421: bpf_trace_printk:   .list = (struct list_head){
          <idle>-0     [023] d.s.  1825.778422: bpf_trace_printk:    .prev = (struct list_head *)0x00000000b2a3df7e,
          <idle>-0     [023] d.s.  1825.778422: bpf_trace_printk:   },
          <idle>-0     [023] d.s.  1825.778422: bpf_trace_printk:  },
          <idle>-0     [023] d.s.  1825.778426: bpf_trace_printk:  .len = (unsigned int)168,
          <idle>-0     [023] d.s.  1825.778427: bpf_trace_printk:  .mac_len = (__u16)14,

Changes since v3:

- Moved to RFC since the approach is different (and bpf-next is
  closed)
- Rather than using a printk() format specifier as the means
  of invoking BTF-enabled display, a dedicated BPF helper is
  used.  This solves the issue of printk() having to output
  large amounts of data using a complex mechanism such as
  BTF traversal, but still provides a way for the display of
  such data to be achieved via BPF programs.  Future work could
  include a bpf_printk_btf() function to invoke display via
  printk() where the elements of a data structure are printk()ed
  one at a time.  Thanks to Petr Mladek, Andy Shevchenko and
  Rasmus Villemoes who took time to look at the earlier printk()
  format-specifier-focused version of this and provided feedback
  clarifying the problems with that approach.
- Added trace id to the bpf_trace_printk events as a means of
  separating output from standard bpf_trace_printk() events,
  ensuring it can be easily parsed by the reader.
- Added bpf_trace_btf() helper tests which do simple verification
  of the various display options.

Changes since v2:

- Alexei and Yonghong suggested it would be good to use
  probe_kernel_read() on to-be-shown data to ensure safety
  during operation.  Safe copy via probe_kernel_read() to a
  buffer object in "struct btf_show" is used to support
  this.  A few different approaches were explored
  including dynamic allocation and per-cpu buffers. The
  downside of dynamic allocation is that it would be done
  during BPF program execution for bpf_trace_printk()s using
  %pT format specifiers. The problem with per-cpu buffers
  is we'd have to manage preemption and since the display
  of an object occurs over an extended period and in printk
  context where we'd rather not change preemption status,
  it seemed tricky to manage buffer safety while considering
  preemption.  The approach of utilizing stack buffer space
  via the "struct btf_show" seemed like the simplest approach.
  The stack size of the associated functions which have a
  "struct btf_show" on their stack to support show operation
  (btf_type_snprintf_show() and btf_type_seq_show()) stays
  under 500 bytes. The compromise here is that the safe buffer
  we use is small - 256 bytes - and as a result multiple
  probe_kernel_read()s are needed for larger objects. Most
  objects of interest are smaller than this (e.g.
  "struct sk_buff" is 224 bytes), and while task_struct is a
  notable exception at ~8K, performance is not the priority for
  BTF-based display; see the sketch after this list. (Alexei
  and Yonghong, patch 2).
- safe buffer use is the default behaviour (and is mandatory
  for BPF) but unsafe display - meaning no safe copy is done
  and we operate on the object itself - is supported via a
  'u' option.
- pointers are prefixed with 0x for clarity (Alexei, patch 2)
- added additional comments and explanations around BTF show
  code, especially around determining whether objects are
  zeroed.  Also tried to comment the safe object scheme used.
  (Yonghong, patch 2)
- added late_initcall() to initialize vmlinux BTF so that it would
  not have to be initialized during printk operation (Alexei,
  patch 5)
- removed CONFIG_BTF_PRINTF config option as it is not needed;
  CONFIG_DEBUG_INFO_BTF can be used to gate test behaviour and
  determining behaviour of type-based printk can be done via
  retrieval of BTF data; if it's not there BTF was unavailable
  or broken (Alexei, patches 4,6)
- fix bpf_trace_printk test to use vmlinux.h and globals via
  skeleton infrastructure, removing need for perf events
  (Andrii, patch 8)
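
To make the safe buffer sizing above concrete, below is an
illustrative sketch of the chunking rule; the authoritative logic
is btf_show_obj_safe() in patch 2, and need_new_copy() here is a
hypothetical name:

/* Illustrative only: decide whether the safe buffer must be
 * refreshed before showing the value at @data of @type_size bytes.
 * @safe is the original-data address the safe buffer currently
 * mirrors; constant names match patch 2.
 */
#define BTF_SHOW_OBJ_SAFE_SIZE		256	/* safe copy chunk */
#define BTF_SHOW_OBJ_BASE_TYPE_SIZE	16	/* 128-bit base type */

static bool need_new_copy(void *safe, void *data, int type_size)
{
	/* we need the whole value, or at least enough bytes for the
	 * largest base type we might recurse down to.
	 */
	int want = min(type_size, BTF_SHOW_OBJ_BASE_TYPE_SIZE);

	return data < safe ||
	       data + want >= safe + BTF_SHOW_OBJ_SAFE_SIZE;
}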

Changes since v1:

- changed format to be more drgn-like, rendering indented type info
  along with type names by default (Alexei)
- zeroed values are omitted (Arnaldo) by default unless the '0'
  modifier is specified (Alexei)
- added an option to print pointer values without obfuscation.
  The reason to do this is the sysctls controlling pointer display
  are likely to be irrelevant in many if not most tracing contexts.
  Some questions on this in the outstanding questions section below...
- reworked printk format specifier so that we no longer rely on format
  %pT<type> but instead use a struct * which contains type information
  (Rasmus). This simplifies the printk parsing, makes use more dynamic
  and also allows specification by BTF id as well as name.
- removed incorrect patch which tried to fix dereferencing of resolved
  BTF info for vmlinux; instead we skip modifiers for the relevant
  case (array element type determination) (Alexei).
- fixed issues with negative snprintf format length (Rasmus)
- added test cases for various data structure formats; base types,
  typedefs, structs, etc.
- tests now iterate through all typedef, enum, struct and unions
  defined for vmlinux BTF and render a version of the target dummy
  value which is either all zeros or all 0xff values; the idea is this
  exercises the "skip if zero" and "print everything" cases.
- added support in BPF for using the %pT format specifier in
  bpf_trace_printk()
- added BPF tests which ensure %pT format specifier use works (Alexei).





Alan Maguire (4):
  bpf: provide function to get vmlinux BTF information
  bpf: make BTF show support generic, apply to seq
    files/bpf_trace_printk
  bpf: add bpf_trace_btf helper
  selftests/bpf: add bpf_trace_btf helper tests

 include/linux/bpf.h                                |   5 +
 include/linux/btf.h                                |  38 +
 include/uapi/linux/bpf.h                           |  63 ++
 kernel/bpf/btf.c                                   | 962 ++++++++++++++++++---
 kernel/bpf/core.c                                  |   5 +
 kernel/bpf/helpers.c                               |   4 +
 kernel/bpf/verifier.c                              |  18 +-
 kernel/trace/bpf_trace.c                           | 121 ++-
 kernel/trace/bpf_trace.h                           |   6 +-
 scripts/bpf_helpers_doc.py                         |   2 +
 tools/include/uapi/linux/bpf.h                     |  63 ++
 tools/testing/selftests/bpf/prog_tests/trace_btf.c |  45 +
 .../selftests/bpf/progs/netif_receive_skb.c        |  43 +
 13 files changed, 1257 insertions(+), 118 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/trace_btf.c
 create mode 100644 tools/testing/selftests/bpf/progs/netif_receive_skb.c

-- 
1.8.3.1


* [RFC PATCH bpf-next 1/4] bpf: provide function to get vmlinux BTF information
@ 2020-08-06 14:42 ` Alan Maguire
From: Alan Maguire @ 2020-08-06 14:42 UTC
  To: ast, daniel, andriin, yhs
  Cc: linux, andriy.shevchenko, pmladek, kafai, songliubraving,
	john.fastabend, kpsingh, shuah, rdna, scott.branden, quentin,
	cneirabustos, jakub, mingo, rostedt, bpf, netdev, linux-kernel,
	linux-kselftest, Alan Maguire

It will be used later for BPF structure display support.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
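(Not part of the patch: for context, a later consumer might use the
new accessor roughly as below; the caller and the by-name lookup
shown are illustrative only.)

/* Hypothetical caller: resolve the BTF id of struct sk_buff from
 * vmlinux BTF.  bpf_get_btf_vmlinux() can return NULL or an
 * ERR_PTR if BTF is unavailable or broken.
 */
static s32 get_skb_btf_id(void)
{
	struct btf *btf = bpf_get_btf_vmlinux();

	if (IS_ERR_OR_NULL(btf))
		return -ENOENT;

	return btf_find_by_name_kind(btf, "sk_buff", BTF_KIND_STRUCT);
}
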
 include/linux/bpf.h   |  2 ++
 kernel/bpf/verifier.c | 18 ++++++++++++------
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index cef4ef0..55eb67d 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1290,6 +1290,8 @@ int bpf_check(struct bpf_prog **fp, union bpf_attr *attr,
 	      union bpf_attr __user *uattr);
 void bpf_patch_call_args(struct bpf_insn *insn, u32 stack_depth);
 
+struct btf *bpf_get_btf_vmlinux(void);
+
 /* Map specifics */
 struct xdp_buff;
 struct sk_buff;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index b6ccfce..05dfc41 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -11064,6 +11064,17 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 	}
 }
 
+struct btf *bpf_get_btf_vmlinux(void)
+{
+	if (!btf_vmlinux && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) {
+		mutex_lock(&bpf_verifier_lock);
+		if (!btf_vmlinux)
+			btf_vmlinux = btf_parse_vmlinux();
+		mutex_unlock(&bpf_verifier_lock);
+	}
+	return btf_vmlinux;
+}
+
 int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
 	      union bpf_attr __user *uattr)
 {
@@ -11097,12 +11108,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
 	env->ops = bpf_verifier_ops[env->prog->type];
 	is_priv = bpf_capable();
 
-	if (!btf_vmlinux && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) {
-		mutex_lock(&bpf_verifier_lock);
-		if (!btf_vmlinux)
-			btf_vmlinux = btf_parse_vmlinux();
-		mutex_unlock(&bpf_verifier_lock);
-	}
+	bpf_get_btf_vmlinux();
 
 	/* grab the mutex to protect few globals used by verifier */
 	if (!is_priv)
-- 
1.8.3.1


* [RFC PATCH bpf-next 2/4] bpf: make BTF show support generic, apply to seq files/bpf_trace_printk
@ 2020-08-06 14:42 ` Alan Maguire
From: Alan Maguire @ 2020-08-06 14:42 UTC
  To: ast, daniel, andriin, yhs
  Cc: linux, andriy.shevchenko, pmladek, kafai, songliubraving,
	john.fastabend, kpsingh, shuah, rdna, scott.branden, quentin,
	cneirabustos, jakub, mingo, rostedt, bpf, netdev, linux-kernel,
	linux-kselftest, Alan Maguire

generalize the "seq_show" seq file support in btf.c to support
a generic show callback, of which we support two instances:

- the current seq file show
- a show which triggers the bpf_trace/bpf_trace_printk tracepoint
  for each portion of the data displayed

Both classes of show function call btf_type_show() with different
targets:

- for seq_show, the seq file is the target
- for bpf_trace_printk(), no target is needed.

In the tracing case we also need to track additional data - the
length of data written - for use as the return value.

By default, show will display type information, field members and
their types and values, etc., and the information is indented
according to structure depth. Zeroed fields are omitted.

Show however supports flags which modify its behaviour:

BTF_SHOW_COMPACT - suppress newline/indent.
BTF_SHOW_NONEWLINE - suppress newline only.
BTF_SHOW_NONAME - suppress show of type and member names.
BTF_SHOW_PTR_RAW - do not obfuscate pointer values.
BTF_SHOW_UNSAFE - do not copy data to safe buffer before display.
BTF_SHOW_ZERO - show zeroed values (by default they are not shown).

The bpf_trace_printk tracepoint is augmented with a "trace_id"
field; it allows tracepoint filtering, since typed display
information can easily be interspersed with other tracing data,
making the output hard to read.  Specifying a trace_id will allow
users to selectively trace data, eliminating noise.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
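(Not part of the patch: as a usage sketch, an in-kernel caller could
combine the new interfaces as below; resolution of skb_btf_id from
vmlinux BTF is assumed to have happened already.)

/* Sketch: dump *skb via the bpf_trace_printk tracepoint, compact
 * and without member names, tagging events with trace_id 1.
 */
static void dump_skb(struct sk_buff *skb, u32 skb_btf_id)
{
	struct btf *btf = bpf_get_btf_vmlinux();

	if (IS_ERR_OR_NULL(btf))
		return;

	btf_type_trace_show(btf, skb_btf_id, skb, 1 /* trace_id */,
			    BTF_SHOW_COMPACT | BTF_SHOW_NONAME);
}
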
 include/linux/bpf.h      |   2 +
 include/linux/btf.h      |  37 ++
 kernel/bpf/btf.c         | 962 ++++++++++++++++++++++++++++++++++++++++++-----
 kernel/trace/bpf_trace.c |  19 +-
 kernel/trace/bpf_trace.h |   6 +-
 5 files changed, 916 insertions(+), 110 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 55eb67d..6143b6e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -946,6 +946,8 @@ typedef u32 (*bpf_convert_ctx_access_t)(enum bpf_access_type type,
 u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size,
 		     void *ctx, u64 ctx_size, bpf_ctx_copy_t ctx_copy);
 
+int bpf_trace_vprintk(__u32 trace_id, const char *fmt, va_list ap);
+
 /* an array of programs to be executed under rcu_lock.
  *
  * Typical usage:
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 8b81fbb..46bf9f4 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -13,6 +13,7 @@
 struct btf_member;
 struct btf_type;
 union bpf_attr;
+struct btf_show;
 
 extern const struct file_operations btf_fops;
 
@@ -46,8 +47,44 @@ int btf_get_info_by_fd(const struct btf *btf,
 const struct btf_type *btf_type_id_size(const struct btf *btf,
 					u32 *type_id,
 					u32 *ret_size);
+
+/*
+ * Options to control show behaviour.
+ *	- BTF_SHOW_COMPACT: no formatting around type information
+ *	- BTF_SHOW_NONAME: no struct/union member names/types
+ *	- BTF_SHOW_PTR_RAW: show raw (unobfuscated) pointer values;
+ *	  equivalent to %px.
+ *	- BTF_SHOW_ZERO: show zero-valued struct/union members; they
+ *	  are not displayed by default
+ *	- BTF_SHOW_NONEWLINE: include indent, but suppress newline;
+ *	  to be used when a show function implicitly includes a newline.
+ *	- BTF_SHOW_UNSAFE: skip use of bpf_probe_read() to safely read
+ *	  data before displaying it.
+ */
+#define BTF_SHOW_COMPACT	(1ULL << 0)
+#define BTF_SHOW_NONAME		(1ULL << 1)
+#define BTF_SHOW_PTR_RAW	(1ULL << 2)
+#define BTF_SHOW_ZERO		(1ULL << 3)
+#define BTF_SHOW_NONEWLINE	(1ULL << 32)
+#define BTF_SHOW_UNSAFE		(1ULL << 33)
+
 void btf_type_seq_show(const struct btf *btf, u32 type_id, void *obj,
 		       struct seq_file *m);
+
+/*
+ * Trace string representation of obj of BTF type_id.
+ *
+ * @btf: struct btf object
+ * @type_id: type id of type obj points to
+ * @obj: pointer to typed data
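+ * @trace_id: trace id tagging the emitted bpf_trace_printk events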
+ * @flags: show options (see above)
+ *
+ * Return: length that would have been/was copied as per snprintf, or
+ *	   negative error.
+ */
+int btf_type_trace_show(const struct btf *btf, u32 type_id, void *obj,
+			u32 trace_id, u64 flags);
+
 int btf_get_fd_by_id(u32 id);
 u32 btf_id(const struct btf *btf);
 bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 91afdd4..be47304 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -282,6 +282,88 @@ static const char *btf_type_str(const struct btf_type *t)
 	return btf_kind_str[BTF_INFO_KIND(t->info)];
 }
 
+/* Chunk size we use in safe copy of data to be shown. */
+#define BTF_SHOW_OBJ_SAFE_SIZE		256
+
+/*
+ * This is the maximum size of a base type value (equivalent to a
+ * 128-bit int); if we are at the end of our safe buffer and have
+ * less than 16 bytes space we can't be assured of being able
+ * to copy the next type safely, so in such cases we will initiate
+ * a new copy.
+ */
+#define BTF_SHOW_OBJ_BASE_TYPE_SIZE	16
+
+/*
+ * Common data to all BTF show operations. Private show functions can add
+ * their own data to a structure containing a struct btf_show and consult it
+ * in the show callback.  See btf_type_show() below.
+ *
+ * One challenge with showing nested data is we want to skip 0-valued
+ * data, but in order to figure out whether a nested object is all zeros
+ * we need to walk through it.  As a result, we need to make two passes
+ * when handling structs, unions and arrays; the first pass simply looks
+ * for nonzero data, while the second actually does the display.  The first
+ * pass is signalled by show->state.depth_check being set, and if we
+ * encounter a non-zero value we set show->state.depth_to_show to
+ * the depth at which we encountered it.  When we have completed the
+ * first pass, we will know if anything needs to be displayed if
+ * depth_to_show > depth.  See btf_[struct,array]_show() for the
+ * implementation of this.
+ *
+ * Another problem is we want to ensure the data for display is safe to
+ * access.  To support this, the "struct obj" is used to track the data
+ * object and our safe copy of it.  We copy portions of the data needed
+ * to the object "copy" buffer, but because its size is limited to
+ * BTF_SHOW_OBJ_SAFE_SIZE bytes, multiple copies may be required as we
+ * traverse larger objects for display.
+ *
+ * The various data type show functions all start with a call to
+ * btf_show_start_type() which returns a pointer to the safe copy
+ * of the data needed (or if BTF_SHOW_UNSAFE is specified, to the
+ * raw data itself).  btf_show_obj_safe() is responsible for
+ * using copy_from_kernel_nofault() to update the safe data if necessary
+ * as we traverse the object's data.  skbuff-like semantics are
+ * used:
+ *
+ * - obj.head points to the start of the toplevel object for display
+ * - obj.size is the size of the toplevel object
+ * - obj.data points to the current point in the original data at
+ *   which our safe data starts.  obj.data will advance as we copy
+ *   portions of the data.
+ *
+ * In most cases a single copy will suffice, but larger data structures
+ * such as "struct task_struct" will require many copies.  The logic in
+ * btf_show_obj_safe() handles the logic that determines if a new
+ * copy_from_kernel_nofault() is needed.
+ */
+struct btf_show {
+	u64 flags;
+	void *target;	/* target of show operation (seq file, buffer) */
+	void (*showfn)(struct btf_show *show, const char *fmt, ...);
+	const struct btf *btf;
+	/* below are used during iteration */
+	struct {
+		u8 depth;
+		u8 depth_to_show;
+		u8 depth_check;
+		u8 array_member:1,
+		   array_terminated:1;
+		u16 array_encoding;
+		u32 type_id;
+		int status;			/* non-zero for error */
+		const struct btf_type *type;
+		const struct btf_member *member;
+		char name[KSYM_NAME_LEN];	/* space for member name/type */
+	} state;
+	struct {
+		u32 size;
+		void *head;
+		void *data;
+		u8 safe[BTF_SHOW_OBJ_SAFE_SIZE];
+	} obj;
+};
+
 struct btf_kind_operations {
 	s32 (*check_meta)(struct btf_verifier_env *env,
 			  const struct btf_type *t,
@@ -298,9 +380,9 @@ struct btf_kind_operations {
 				  const struct btf_type *member_type);
 	void (*log_details)(struct btf_verifier_env *env,
 			    const struct btf_type *t);
-	void (*seq_show)(const struct btf *btf, const struct btf_type *t,
+	void (*show)(const struct btf *btf, const struct btf_type *t,
 			 u32 type_id, void *data, u8 bits_offsets,
-			 struct seq_file *m);
+			 struct btf_show *show);
 };
 
 static const struct btf_kind_operations * const kind_ops[NR_BTF_KINDS];
@@ -677,6 +759,458 @@ bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
 	return true;
 }
 
+/* Similar to btf_type_skip_modifiers() but does not skip typedefs. */
+static inline
+const struct btf_type *btf_type_skip_qualifiers(const struct btf *btf, u32 id)
+{
+	const struct btf_type *t = btf_type_by_id(btf, id);
+
+	while (btf_type_is_modifier(t) &&
+	       BTF_INFO_KIND(t->info) != BTF_KIND_TYPEDEF) {
+		id = t->type;
+		t = btf_type_by_id(btf, t->type);
+	}
+
+	return t;
+}
+
+#define BTF_SHOW_MAX_ITER	10
+
+#define BTF_KIND_BIT(kind)	(1ULL << kind)
+
+/*
+ * Populate show->state.name with type name information.
+ * Format of type name is
+ *
+ * [.member_name = ] (type_name)
+ */
+static inline const char *btf_show_name(struct btf_show *show)
+{
+	/* BTF_SHOW_MAX_ITER array suffixes "[]" */
+	const char *array_suffixes = "[][][][][][][][][][]";
+	const char *array_suffix = &array_suffixes[strlen(array_suffixes)];
+	/* BTF_SHOW_MAX_ITER pointer suffixes "*" */
+	const char *ptr_suffixes = "**********";
+	const char *ptr_suffix = &ptr_suffixes[strlen(ptr_suffixes)];
+	const char *name = NULL, *prefix = "", *parens = "";
+	const struct btf_member *m = show->state.member;
+	const struct btf_type *t = show->state.type;
+	const struct btf_array *array;
+	u32 id = show->state.type_id;
+	const char *member = NULL;
+	bool show_member = false;
+	u64 kinds = 0;
+	int i;
+
+	show->state.name[0] = '\0';
+
+	/*
+	 * Don't show type name if we're showing an array member;
+	 * in that case we show the array type so don't need to repeat
+	 * ourselves for each member.
+	 */
+	if (show->state.array_member)
+		return "";
+
+	/* Retrieve member name, if any. */
+	if (m) {
+		member = btf_name_by_offset(show->btf, m->name_off);
+		show_member = strlen(member) > 0;
+		id = m->type;
+	}
+
+	/*
+	 * Start with type_id, as we have resolved the struct btf_type *
+	 * via btf_modifier_show() past the parent typedef to the child
+	 * struct, int etc it is defined as.  In such cases, the type_id
+	 * still represents the starting type while the struct btf_type *
+	 * in our show->state points at the resolved type of the typedef.
+	 */
+	t = btf_type_by_id(show->btf, id);
+	if (!t)
+		return "";
+
+	/*
+	 * The goal here is to build up the right number of pointer and
+	 * array suffixes while ensuring the type name for a typedef
+	 * is represented.  Along the way we accumulate a list of
+	 * BTF kinds we have encountered, since these will inform later
+	 * display; for example, pointer types will not require an
+	 * opening "{" for struct, we will just display the pointer value.
+	 *
+	 * We also want to accumulate the right number of pointer or array
+	 * indices in the format string while iterating until we get to
+	 * the typedef/pointee/array member target type.
+	 *
+	 * We start by pointing at the end of pointer and array suffix
+	 * strings; as we accumulate pointers and arrays we move the pointer
+	 * or array string backwards so it will show the expected number of
+	 * '*' or '[]' for the type.  BTF_SHOW_MAX_ITER of nesting of pointers
+	 * and/or arrays and typedefs are supported as a precaution.
+	 *
+	 * We also want to get the typedef name while proceeding to resolve
+	 * the type it points to so that we can add parentheses if it is a
+	 * "typedef struct" etc.
+	 */
+	for (i = 0; i < BTF_SHOW_MAX_ITER; i++) {
+
+		switch (BTF_INFO_KIND(t->info)) {
+		case BTF_KIND_TYPEDEF:
+			if (!name)
+				name = btf_name_by_offset(show->btf,
+							       t->name_off);
+			kinds |= BTF_KIND_BIT(BTF_KIND_TYPEDEF);
+			id = t->type;
+			break;
+		case BTF_KIND_ARRAY:
+			kinds |= BTF_KIND_BIT(BTF_KIND_ARRAY);
+			parens = "[";
+			if (!t)
+				return "";
+			array = btf_type_array(t);
+			if (array_suffix > array_suffixes)
+				array_suffix -= 2;
+			id = array->type;
+			break;
+		case BTF_KIND_PTR:
+			kinds |= BTF_KIND_BIT(BTF_KIND_PTR);
+			if (ptr_suffix > ptr_suffixes)
+				ptr_suffix -= 1;
+			id = t->type;
+			break;
+		default:
+			id = 0;
+			break;
+		}
+		if (!id)
+			break;
+		t = btf_type_skip_qualifiers(show->btf, id);
+	}
+	/* We may not be able to represent this type; bail to be safe */
+	if (i == BTF_SHOW_MAX_ITER)
+		return "";
+
+	if (!name)
+		name = btf_name_by_offset(show->btf, t->name_off);
+
+	switch (BTF_INFO_KIND(t->info)) {
+	case BTF_KIND_STRUCT:
+	case BTF_KIND_UNION:
+		prefix = BTF_INFO_KIND(t->info) == BTF_KIND_STRUCT ?
+			 "struct" : "union";
+		/* if it's an array of struct/union, parens is already set */
+		if (!(kinds & (BTF_KIND_BIT(BTF_KIND_ARRAY))))
+			parens = "{";
+		break;
+	case BTF_KIND_ENUM:
+		prefix = "enum";
+		break;
+	default:
+		break;
+	}
+
+	/* pointer does not require parens */
+	if (kinds & BTF_KIND_BIT(BTF_KIND_PTR))
+		parens = "";
+	/* typedef does not require struct/union/enum prefix */
+	if (kinds & BTF_KIND_BIT(BTF_KIND_TYPEDEF))
+		prefix = "";
+
+	if (!name)
+		name = "";
+
+	/* Even if we don't want type name info, we want parentheses etc */
+	if (show->flags & BTF_SHOW_NONAME)
+		snprintf(show->state.name, sizeof(show->state.name), "%s",
+			 parens);
+	else
+		snprintf(show->state.name, sizeof(show->state.name),
+			 "%s%s%s(%s%s%s%s%s%s)%s",
+			 /* first 3 strings comprise ".member = " */
+			 show_member ? "." : "",
+			 show_member ? member : "",
+			 show_member ? " = " : "",
+			 /* ...next is our prefix (struct, enum, etc) */
+			 prefix,
+			 strlen(prefix) > 0 && strlen(name) > 0 ? " " : "",
+			 /* ...this is the type name itself */
+			 name,
+			 /* ...suffixed by the appropriate '*', '[]' suffixes */
+			 strlen(ptr_suffix) > 0 ? " " : "", ptr_suffix,
+			 array_suffix, parens);
+
+	return show->state.name;
+}
+
+#define btf_show(show, ...)						      \
+	do {								      \
+		if (!show->state.depth_check)				      \
+			show->showfn(show, __VA_ARGS__);		      \
+	} while (0)
+
+static inline const char *__btf_show_indent(struct btf_show *show)
+{
+	const char *indents = "                                ";
+	const char *indent = &indents[strlen(indents)];
+
+	if ((indent - show->state.depth) >= indents)
+		return indent - show->state.depth;
+	return indents;
+}
+
+#define btf_show_indent(show)						       \
+	((show->flags & BTF_SHOW_COMPACT) ? "" : __btf_show_indent(show))
+
+#define btf_show_newline(show)						       \
+	((show->flags & (BTF_SHOW_COMPACT | BTF_SHOW_NONEWLINE)) ? "" : "\n")
+
+#define btf_show_delim(show)						       \
+	(show->state.depth == 0 ? "" :					       \
+	 ((show->flags & BTF_SHOW_COMPACT) && show->state.type &&	       \
+	  BTF_INFO_KIND(show->state.type->info) == BTF_KIND_UNION) ? "|" : ",")
+
+#define btf_show_type_value(show, fmt, value)				       \
+	do {								       \
+		if ((value) != 0 || (show->flags & BTF_SHOW_ZERO) ||	       \
+		    show->state.depth == 0) {				       \
+			btf_show(show, "%s%s" fmt "%s%s",		       \
+				 btf_show_indent(show),			       \
+				 btf_show_name(show),			       \
+				 value, btf_show_delim(show),		       \
+				 btf_show_newline(show));		       \
+			if (show->state.depth > show->state.depth_to_show)     \
+				show->state.depth_to_show = show->state.depth; \
+		}							       \
+	} while (0)
+
+#define btf_show_type_values(show, fmt, ...)				       \
+	do {								       \
+		btf_show(show, "%s%s" fmt "%s%s", btf_show_indent(show),       \
+			 btf_show_name(show),				       \
+			 __VA_ARGS__, btf_show_delim(show),		       \
+			 btf_show_newline(show));			       \
+		if (show->state.depth > show->state.depth_to_show)	       \
+			show->state.depth_to_show = show->state.depth;	       \
+	} while (0)
+
+/* How much is left to copy to safe buffer after @data? */
+#define btf_show_obj_size_left(show, data)				       \
+	(show->obj.head + show->obj.size - data)
+
+/* Is object pointed to by @data of @size already copied to our safe buffer? */
+#define btf_show_obj_is_safe(show, data, size)				       \
+	(data >= show->obj.data &&					       \
+	 (data + size) < (show->obj.data + BTF_SHOW_OBJ_SAFE_SIZE))
+
+/*
+ * If object pointed to by @data of @size falls within our safe buffer, return
+ * the equivalent pointer to the same safe data.  Assumes
+ * copy_from_kernel_nofault() has already happened and our safe buffer is
+ * populated.
+ */
+#define __btf_show_obj_safe(show, data, size)				       \
+	(btf_show_obj_is_safe(show, data, size) ?			       \
+	 show->obj.safe + (data - show->obj.data) : NULL)
+
+/*
+ * Return a safe-to-access version of data pointed to by @data.
+ * We do this by copying the relevant amount of information
+ * to the struct btf_show obj.safe buffer using copy_from_kernel_nofault().
+ *
+ * If BTF_SHOW_UNSAFE is specified, just return data as-is; no
+ * safe copy is needed.
+ *
+ * Otherwise we need to determine if we have the required amount
+ * of data (determined by the @data pointer and the size of the
+ * largest base type we can encounter (represented by
+ * BTF_SHOW_OBJ_BASE_TYPE_SIZE). Having that much data ensures
+ * that we will be able to print some of the current object,
+ * and if more is needed a copy will be triggered.
+ * Some objects such as structs will not fit into the buffer;
+ * in such cases additional copies when we iterate over their
+ * members may be needed.
+ *
+ * btf_show_obj_safe() is used to return a safe buffer for
+ * btf_show_start_type(); this ensures that as we recurse into
+ * nested types we always have safe data for the given type.
+ * This approach is somewhat wasteful; it's possible for example
+ * that when iterating over a large union we'll end up copying the
+ * same data repeatedly, but the goal is safety not performance.
+ * We use stack data as opposed to per-CPU buffers because the
+ * iteration over a type can take some time, and preemption handling
+ * would greatly complicate use of the safe buffer.
+ */
+static inline void *btf_show_obj_safe(struct btf_show *show,
+				      const struct btf_type *t,
+				      void *data)
+{
+	int size_left, size;
+	void *safe = NULL;
+
+	if (show->flags & BTF_SHOW_UNSAFE)
+		return data;
+
+	(void) btf_resolve_size(show->btf, t, &size, NULL, NULL);
+
+	/*
+	 * Is this toplevel object? If so, set total object size and
+	 * initialize pointers.  Otherwise check if we still fall within
+	 * our safe object data.
+	 */
+	if (show->state.depth == 0) {
+		show->obj.size = size;
+		show->obj.head = data;
+	} else {
+		/*
+		 * If the size of the current object is > our remaining
+		 * safe buffer we _may_ need to do a new copy.  However
+		 * consider the case of a nested struct; it's size pushes
+		 * us over the safe buffer limit, but showing any individual
+		 * struct members does not.  In such cases, we don't need
+		 * to initiate a fresh copy yet; however we definitely need
+		 * at least BTF_SHOW_OBJ_BASE_TYPE_SIZE bytes left
+		 * in our buffer, regardless of the current object size.
+		 * The logic here is that as we resolve types we will
+		 * hit a base type at some point, and we need to be sure
+		 * the next chunk of data is safely available to display
+		 * that type info safely.  We cannot rely on the size of
+		 * the current object here because it may be much larger
+		 * than our current buffer (e.g. task_struct is 8k).
+		 * All we want to do here is ensure that we can print the
+		 * next basic type, which we can if either
+		 * - the current type size is within the safe buffer; or
+		 * - at least BTF_SHOW_OBJ_BASE_TYPE_SIZE bytes are left in
+		 *   the safe buffer.
+		 */
+		safe = __btf_show_obj_safe(show, data,
+					   min(size,
+					       BTF_SHOW_OBJ_BASE_TYPE_SIZE));
+	}
+
+	/*
+	 * We need a new copy to our safe object, either because we haven't
+	 * yet copied and are initializing safe data, or because the data
+	 * we want falls outside the boundaries of the safe object.
+	 */
+	if (!safe) {
+		size_left = btf_show_obj_size_left(show, data);
+		if (size_left > BTF_SHOW_OBJ_SAFE_SIZE)
+			size_left = BTF_SHOW_OBJ_SAFE_SIZE;
+		show->state.status = copy_from_kernel_nofault(show->obj.safe,
+							      data,
+							      size_left);
+		if (!show->state.status) {
+			show->obj.data = data;
+			safe = show->obj.safe;
+		}
+	}
+
+	return safe;
+}
+
+/*
+ * Set the type we are starting to show and return a safe data pointer
+ * to be used for showing the associated data.
+ */
+static inline void *btf_show_start_type(struct btf_show *show,
+					const struct btf_type *t,
+					u32 type_id,
+					void *data)
+{
+	show->state.type = t;
+	show->state.type_id = type_id;
+	show->state.name[0] = '\0';
+
+	return btf_show_obj_safe(show, t, data);
+}
+
+static inline void btf_show_end_type(struct btf_show *show)
+{
+	show->state.type = NULL;
+	show->state.type_id = 0;
+	show->state.name[0] = '\0';
+}
+
+static inline void *btf_show_start_aggr_type(struct btf_show *show,
+					     const struct btf_type *t,
+					     u32 type_id,
+					     void *data)
+{
+	void *safe_data = btf_show_start_type(show, t, type_id, data);
+
+	if (!safe_data)
+		return safe_data;
+
+	btf_show(show, "%s%s%s", btf_show_indent(show),
+		 btf_show_name(show),
+		 btf_show_newline(show));
+	show->state.depth++;
+	return safe_data;
+}
+
+static inline void btf_show_end_aggr_type(struct btf_show *show,
+					  const char *suffix)
+{
+	show->state.depth--;
+	btf_show(show, "%s%s%s%s", btf_show_indent(show), suffix,
+		 btf_show_delim(show), btf_show_newline(show));
+	btf_show_end_type(show);
+}
+
+static inline void btf_show_start_member(struct btf_show *show,
+					 const struct btf_member *m)
+{
+	show->state.member = m;
+}
+
+static inline void btf_show_start_array_member(struct btf_show *show)
+{
+	show->state.array_member = 1;
+	btf_show_start_member(show, NULL);
+}
+
+static inline void btf_show_end_member(struct btf_show *show)
+{
+	show->state.member = NULL;
+}
+
+static inline void btf_show_end_array_member(struct btf_show *show)
+{
+	show->state.array_member = 0;
+	btf_show_end_member(show);
+}
+
+static inline void *btf_show_start_array_type(struct btf_show *show,
+					      const struct btf_type *t,
+					      u32 type_id,
+					      u16 array_encoding,
+					      void *data)
+{
+	show->state.array_encoding = array_encoding;
+	show->state.array_terminated = 0;
+	return btf_show_start_aggr_type(show, t, type_id, data);
+}
+
+static inline void btf_show_end_array_type(struct btf_show *show)
+{
+	show->state.array_encoding = 0;
+	show->state.array_terminated = 0;
+	btf_show_end_aggr_type(show, "]");
+}
+
+static inline void *btf_show_start_struct_type(struct btf_show *show,
+					       const struct btf_type *t,
+					       u32 type_id,
+					       void *data)
+{
+	return btf_show_start_aggr_type(show, t, type_id, data);
+}
+
+static inline void btf_show_end_struct_type(struct btf_show *show)
+{
+	btf_show_end_aggr_type(show, "}");
+}
+
 __printf(2, 3) static void __btf_verifier_log(struct bpf_verifier_log *log,
 					      const char *fmt, ...)
 {
@@ -1250,11 +1784,11 @@ static int btf_df_resolve(struct btf_verifier_env *env,
 	return -EINVAL;
 }
 
-static void btf_df_seq_show(const struct btf *btf, const struct btf_type *t,
-			    u32 type_id, void *data, u8 bits_offsets,
-			    struct seq_file *m)
+static void btf_df_show(const struct btf *btf, const struct btf_type *t,
+			u32 type_id, void *data, u8 bits_offsets,
+			struct btf_show *show)
 {
-	seq_printf(m, "<unsupported kind:%u>", BTF_INFO_KIND(t->info));
+	btf_show(show, "<unsupported kind:%u>", BTF_INFO_KIND(t->info));
 }
 
 static int btf_int_check_member(struct btf_verifier_env *env,
@@ -1427,7 +1961,7 @@ static void btf_int_log(struct btf_verifier_env *env,
 			 btf_int_encoding_str(BTF_INT_ENCODING(int_data)));
 }
 
-static void btf_int128_print(struct seq_file *m, void *data)
+static void btf_int128_print(struct btf_show *show, void *data)
 {
 	/* data points to a __int128 number.
 	 * Suppose
@@ -1446,9 +1980,10 @@ static void btf_int128_print(struct seq_file *m, void *data)
 	lower_num = *(u64 *)data;
 #endif
 	if (upper_num == 0)
-		seq_printf(m, "0x%llx", lower_num);
+		btf_show_type_value(show, "0x%llx", lower_num);
 	else
-		seq_printf(m, "0x%llx%016llx", upper_num, lower_num);
+		btf_show_type_values(show, "0x%llx%016llx", upper_num,
+				     lower_num);
 }
 
 static void btf_int128_shift(u64 *print_num, u16 left_shift_bits,
@@ -1492,8 +2027,8 @@ static void btf_int128_shift(u64 *print_num, u16 left_shift_bits,
 #endif
 }
 
-static void btf_bitfield_seq_show(void *data, u8 bits_offset,
-				  u8 nr_bits, struct seq_file *m)
+static void btf_bitfield_show(void *data, u8 bits_offset,
+			      u8 nr_bits, struct btf_show *show)
 {
 	u16 left_shift_bits, right_shift_bits;
 	u8 nr_copy_bytes;
@@ -1513,14 +2048,14 @@ static void btf_bitfield_seq_show(void *data, u8 bits_offset,
 	right_shift_bits = BITS_PER_U128 - nr_bits;
 
 	btf_int128_shift(print_num, left_shift_bits, right_shift_bits);
-	btf_int128_print(m, print_num);
+	btf_int128_print(show, print_num);
 }
 
 
-static void btf_int_bits_seq_show(const struct btf *btf,
-				  const struct btf_type *t,
-				  void *data, u8 bits_offset,
-				  struct seq_file *m)
+static void btf_int_bits_show(const struct btf *btf,
+			      const struct btf_type *t,
+			      void *data, u8 bits_offset,
+			      struct btf_show *show)
 {
 	u32 int_data = btf_type_int(t);
 	u8 nr_bits = BTF_INT_BITS(int_data);
@@ -1533,55 +2068,77 @@ static void btf_int_bits_seq_show(const struct btf *btf,
 	total_bits_offset = bits_offset + BTF_INT_OFFSET(int_data);
 	data += BITS_ROUNDDOWN_BYTES(total_bits_offset);
 	bits_offset = BITS_PER_BYTE_MASKED(total_bits_offset);
-	btf_bitfield_seq_show(data, bits_offset, nr_bits, m);
+	btf_bitfield_show(data, bits_offset, nr_bits, show);
 }
 
-static void btf_int_seq_show(const struct btf *btf, const struct btf_type *t,
-			     u32 type_id, void *data, u8 bits_offset,
-			     struct seq_file *m)
+static void btf_int_show(const struct btf *btf, const struct btf_type *t,
+			 u32 type_id, void *data, u8 bits_offset,
+			 struct btf_show *show)
 {
 	u32 int_data = btf_type_int(t);
 	u8 encoding = BTF_INT_ENCODING(int_data);
 	bool sign = encoding & BTF_INT_SIGNED;
 	u8 nr_bits = BTF_INT_BITS(int_data);
+	void *safe_data;
+
+	safe_data = btf_show_start_type(show, t, type_id, data);
+	if (!safe_data)
+		return;
 
 	if (bits_offset || BTF_INT_OFFSET(int_data) ||
 	    BITS_PER_BYTE_MASKED(nr_bits)) {
-		btf_int_bits_seq_show(btf, t, data, bits_offset, m);
-		return;
+		btf_int_bits_show(btf, t, safe_data, bits_offset, show);
+		goto out;
 	}
 
 	switch (nr_bits) {
 	case 128:
-		btf_int128_print(m, data);
+		btf_int128_print(show, safe_data);
 		break;
 	case 64:
 		if (sign)
-			seq_printf(m, "%lld", *(s64 *)data);
+			btf_show_type_value(show, "%lld", *(s64 *)safe_data);
 		else
-			seq_printf(m, "%llu", *(u64 *)data);
+			btf_show_type_value(show, "%llu", *(u64 *)safe_data);
 		break;
 	case 32:
 		if (sign)
-			seq_printf(m, "%d", *(s32 *)data);
+			btf_show_type_value(show, "%d", *(s32 *)safe_data);
 		else
-			seq_printf(m, "%u", *(u32 *)data);
+			btf_show_type_value(show, "%u", *(u32 *)safe_data);
 		break;
 	case 16:
 		if (sign)
-			seq_printf(m, "%d", *(s16 *)data);
+			btf_show_type_value(show, "%d", *(s16 *)safe_data);
 		else
-			seq_printf(m, "%u", *(u16 *)data);
+			btf_show_type_value(show, "%u", *(u16 *)safe_data);
 		break;
 	case 8:
+		if (show->state.array_encoding == BTF_INT_CHAR) {
+			/* check for null terminator */
+			if (show->state.array_terminated)
+				break;
+			if (*(char *)data == '\0') {
+				show->state.array_terminated = 1;
+				break;
+			}
+			if (isprint(*(char *)data)) {
+				btf_show_type_value(show, "'%c'",
+						    *(char *)safe_data);
+				break;
+			}
+		}
 		if (sign)
-			seq_printf(m, "%d", *(s8 *)data);
+			btf_show_type_value(show, "%d", *(s8 *)safe_data);
 		else
-			seq_printf(m, "%u", *(u8 *)data);
+			btf_show_type_value(show, "%u", *(u8 *)safe_data);
 		break;
 	default:
-		btf_int_bits_seq_show(btf, t, data, bits_offset, m);
+		btf_int_bits_show(btf, t, safe_data, bits_offset, show);
+		break;
 	}
+out:
+	btf_show_end_type(show);
 }
 
 static const struct btf_kind_operations int_ops = {
@@ -1590,7 +2147,7 @@ static void btf_int_seq_show(const struct btf *btf, const struct btf_type *t,
 	.check_member = btf_int_check_member,
 	.check_kflag_member = btf_int_check_kflag_member,
 	.log_details = btf_int_log,
-	.seq_show = btf_int_seq_show,
+	.show = btf_int_show,
 };
 
 static int btf_modifier_check_member(struct btf_verifier_env *env,
@@ -1854,34 +2411,44 @@ static int btf_ptr_resolve(struct btf_verifier_env *env,
 	return 0;
 }
 
-static void btf_modifier_seq_show(const struct btf *btf,
-				  const struct btf_type *t,
-				  u32 type_id, void *data,
-				  u8 bits_offset, struct seq_file *m)
+static void btf_modifier_show(const struct btf *btf,
+			      const struct btf_type *t,
+			      u32 type_id, void *data,
+			      u8 bits_offset, struct btf_show *show)
 {
 	if (btf->resolved_ids)
 		t = btf_type_id_resolve(btf, &type_id);
 	else
 		t = btf_type_skip_modifiers(btf, type_id, NULL);
 
-	btf_type_ops(t)->seq_show(btf, t, type_id, data, bits_offset, m);
+	btf_type_ops(t)->show(btf, t, type_id, data, bits_offset, show);
 }
 
-static void btf_var_seq_show(const struct btf *btf, const struct btf_type *t,
-			     u32 type_id, void *data, u8 bits_offset,
-			     struct seq_file *m)
+static void btf_var_show(const struct btf *btf, const struct btf_type *t,
+			 u32 type_id, void *data, u8 bits_offset,
+			 struct btf_show *show)
 {
 	t = btf_type_id_resolve(btf, &type_id);
 
-	btf_type_ops(t)->seq_show(btf, t, type_id, data, bits_offset, m);
+	btf_type_ops(t)->show(btf, t, type_id, data, bits_offset, show);
 }
 
-static void btf_ptr_seq_show(const struct btf *btf, const struct btf_type *t,
-			     u32 type_id, void *data, u8 bits_offset,
-			     struct seq_file *m)
+static void btf_ptr_show(const struct btf *btf, const struct btf_type *t,
+			 u32 type_id, void *data, u8 bits_offset,
+			 struct btf_show *show)
 {
-	/* It is a hashed value */
-	seq_printf(m, "%p", *(void **)data);
+	void *safe_data;
+
+	safe_data = btf_show_start_type(show, t, type_id, data);
+	if (!safe_data)
+		return;
+
+	/* It is a hashed value unless BTF_SHOW_PTR_RAW is specified */
+	if (show->flags & BTF_SHOW_PTR_RAW)
+		btf_show_type_value(show, "0x%px", *(void **)safe_data);
+	else
+		btf_show_type_value(show, "0x%p", *(void **)safe_data);
+	btf_show_end_type(show);
 }
 
 static void btf_ref_type_log(struct btf_verifier_env *env,
@@ -1896,7 +2463,7 @@ static void btf_ref_type_log(struct btf_verifier_env *env,
 	.check_member = btf_modifier_check_member,
 	.check_kflag_member = btf_modifier_check_kflag_member,
 	.log_details = btf_ref_type_log,
-	.seq_show = btf_modifier_seq_show,
+	.show = btf_modifier_show,
 };
 
 static struct btf_kind_operations ptr_ops = {
@@ -1905,7 +2472,7 @@ static void btf_ref_type_log(struct btf_verifier_env *env,
 	.check_member = btf_ptr_check_member,
 	.check_kflag_member = btf_generic_check_kflag_member,
 	.log_details = btf_ref_type_log,
-	.seq_show = btf_ptr_seq_show,
+	.show = btf_ptr_show,
 };
 
 static s32 btf_fwd_check_meta(struct btf_verifier_env *env,
@@ -1946,7 +2513,7 @@ static void btf_fwd_type_log(struct btf_verifier_env *env,
 	.check_member = btf_df_check_member,
 	.check_kflag_member = btf_df_check_kflag_member,
 	.log_details = btf_fwd_type_log,
-	.seq_show = btf_df_seq_show,
+	.show = btf_df_show,
 };
 
 static int btf_array_check_member(struct btf_verifier_env *env,
@@ -2105,28 +2672,90 @@ static void btf_array_log(struct btf_verifier_env *env,
 			 array->type, array->index_type, array->nelems);
 }
 
-static void btf_array_seq_show(const struct btf *btf, const struct btf_type *t,
-			       u32 type_id, void *data, u8 bits_offset,
-			       struct seq_file *m)
+static void __btf_array_show(const struct btf *btf, const struct btf_type *t,
+			     u32 type_id, void *data, u8 bits_offset,
+			     struct btf_show *show)
 {
 	const struct btf_array *array = btf_type_array(t);
 	const struct btf_kind_operations *elem_ops;
 	const struct btf_type *elem_type;
-	u32 i, elem_size, elem_type_id;
+	u32 i, elem_size = 0, elem_type_id;
+	u16 encoding = 0;
 
 	elem_type_id = array->type;
-	elem_type = btf_type_id_size(btf, &elem_type_id, &elem_size);
+	elem_type = btf_type_skip_modifiers(btf, elem_type_id, NULL);
+	if (elem_type && btf_type_has_size(elem_type))
+		elem_size = elem_type->size;
+
+	if (elem_type && btf_type_is_int(elem_type)) {
+		u32 int_type = btf_type_int(elem_type);
+
+		encoding = BTF_INT_ENCODING(int_type);
+
+		/*
+		 * BTF_INT_CHAR encoding never seems to be set for
+		 * char arrays, so if size is 1 and element is
+		 * printable as a char, we'll do that.
+		 */
+		if (elem_size == 1)
+			encoding = BTF_INT_CHAR;
+	}
+
+	if (!btf_show_start_array_type(show, t, type_id, encoding, data))
+		return;
+
+	if (!elem_type)
+		goto out;
 	elem_ops = btf_type_ops(elem_type);
-	seq_puts(m, "[");
+
 	for (i = 0; i < array->nelems; i++) {
-		if (i)
-			seq_puts(m, ",");
 
-		elem_ops->seq_show(btf, elem_type, elem_type_id, data,
-				   bits_offset, m);
+		btf_show_start_array_member(show);
+
+		elem_ops->show(btf, elem_type, elem_type_id, data,
+			       bits_offset, show);
 		data += elem_size;
+
+		btf_show_end_array_member(show);
+
+		if (show->state.array_terminated)
+			break;
+	}
+out:
+	btf_show_end_array_type(show);
+}
+
+static void btf_array_show(const struct btf *btf, const struct btf_type *t,
+			   u32 type_id, void *data, u8 bits_offset,
+			   struct btf_show *show)
+{
+	const struct btf_member *m = show->state.member;
+
+	/*
+	 * First check if any members would be shown (are non-zero).
+	 * See comments above "struct btf_show" definition for more
+	 * details on how this works at a high-level.
+	 */
+	if (show->state.depth > 0 && !(show->flags & BTF_SHOW_ZERO)) {
+		if (!show->state.depth_check) {
+			show->state.depth_check = show->state.depth + 1;
+			show->state.depth_to_show = 0;
+		}
+		__btf_array_show(btf, t, type_id, data, bits_offset, show);
+		show->state.member = m;
+
+		if (show->state.depth_check != show->state.depth + 1)
+			return;
+		show->state.depth_check = 0;
+
+		if (show->state.depth_to_show <= show->state.depth)
+			return;
+		/*
+		 * Reaching here indicates we have recursed and found
+		 * non-zero array member(s).
+		 */
 	}
-	seq_puts(m, "]");
+	__btf_array_show(btf, t, type_id, data, bits_offset, show);
 }
 
 static struct btf_kind_operations array_ops = {
@@ -2135,7 +2764,7 @@ static void btf_array_seq_show(const struct btf *btf, const struct btf_type *t,
 	.check_member = btf_array_check_member,
 	.check_kflag_member = btf_generic_check_kflag_member,
 	.log_details = btf_array_log,
-	.seq_show = btf_array_seq_show,
+	.show = btf_array_show,
 };
 
 static int btf_struct_check_member(struct btf_verifier_env *env,
@@ -2358,15 +2987,18 @@ int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t)
 	return off;
 }
 
-static void btf_struct_seq_show(const struct btf *btf, const struct btf_type *t,
-				u32 type_id, void *data, u8 bits_offset,
-				struct seq_file *m)
+static void __btf_struct_show(const struct btf *btf, const struct btf_type *t,
+			      u32 type_id, void *data, u8 bits_offset,
+			      struct btf_show *show)
 {
-	const char *seq = BTF_INFO_KIND(t->info) == BTF_KIND_UNION ? "|" : ",";
 	const struct btf_member *member;
+	void *safe_data;
 	u32 i;
 
-	seq_puts(m, "{");
+	safe_data = btf_show_start_struct_type(show, t, type_id, data);
+	if (!safe_data)
+		return;
+
 	for_each_member(i, t, member) {
 		const struct btf_type *member_type = btf_type_by_id(btf,
 								member->type);
@@ -2375,23 +3007,60 @@ static void btf_struct_seq_show(const struct btf *btf, const struct btf_type *t,
 		u32 bytes_offset;
 		u8 bits8_offset;
 
-		if (i)
-			seq_puts(m, seq);
+		btf_show_start_member(show, member);
 
 		member_offset = btf_member_bit_offset(t, member);
 		bitfield_size = btf_member_bitfield_size(t, member);
 		bytes_offset = BITS_ROUNDDOWN_BYTES(member_offset);
 		bits8_offset = BITS_PER_BYTE_MASKED(member_offset);
 		if (bitfield_size) {
-			btf_bitfield_seq_show(data + bytes_offset, bits8_offset,
-					      bitfield_size, m);
+			btf_bitfield_show(safe_data + bytes_offset,
+					  bits8_offset,
+					  bitfield_size, show);
 		} else {
 			ops = btf_type_ops(member_type);
-			ops->seq_show(btf, member_type, member->type,
-				      data + bytes_offset, bits8_offset, m);
+			ops->show(btf, member_type, member->type,
+				  data + bytes_offset, bits8_offset, show);
 		}
+
+		btf_show_end_member(show);
 	}
-	seq_puts(m, "}");
+
+	btf_show_end_struct_type(show);
+}
+
+static void btf_struct_show(const struct btf *btf, const struct btf_type *t,
+			    u32 type_id, void *data, u8 bits_offset,
+			    struct btf_show *show)
+{
+	const struct btf_member *m = show->state.member;
+
+	/*
+	 * First check if any members would be shown (are non-zero).
+	 * See comments above "struct btf_show" definition for more
+	 * details on how this works at a high-level.
+	 */
+	if (show->state.depth > 0 && !(show->flags & BTF_SHOW_ZERO)) {
+		if (!show->state.depth_check) {
+			show->state.depth_check = show->state.depth + 1;
+			show->state.depth_to_show = 0;
+		}
+		__btf_struct_show(btf, t, type_id, data, bits_offset, show);
+		/* Restore saved member data here */
+		show->state.member = m;
+		if (show->state.depth_check != show->state.depth + 1)
+			return;
+		show->state.depth_check = 0;
+
+		if (show->state.depth_to_show <= show->state.depth)
+			return;
+		/*
+		 * Reaching here indicates we have recursed and found
+		 * non-zero child values.
+		 */
+	}
+
+	__btf_struct_show(btf, t, type_id, data, bits_offset, show);
 }
 
 static struct btf_kind_operations struct_ops = {
@@ -2400,7 +3069,7 @@ static void btf_struct_seq_show(const struct btf *btf, const struct btf_type *t,
 	.check_member = btf_struct_check_member,
 	.check_kflag_member = btf_generic_check_kflag_member,
 	.log_details = btf_struct_log,
-	.seq_show = btf_struct_seq_show,
+	.show = btf_struct_show,
 };
 
 static int btf_enum_check_member(struct btf_verifier_env *env,
@@ -2531,24 +3200,35 @@ static void btf_enum_log(struct btf_verifier_env *env,
 	btf_verifier_log(env, "size=%u vlen=%u", t->size, btf_type_vlen(t));
 }
 
-static void btf_enum_seq_show(const struct btf *btf, const struct btf_type *t,
-			      u32 type_id, void *data, u8 bits_offset,
-			      struct seq_file *m)
+static void btf_enum_show(const struct btf *btf, const struct btf_type *t,
+			  u32 type_id, void *data, u8 bits_offset,
+			  struct btf_show *show)
 {
 	const struct btf_enum *enums = btf_type_enum(t);
 	u32 i, nr_enums = btf_type_vlen(t);
-	int v = *(int *)data;
+	void *safe_data;
+	int v;
+
+	safe_data = btf_show_start_type(show, t, type_id, data);
+	if (!safe_data)
+		return;
+
+	v = *(int *)safe_data;
 
 	for (i = 0; i < nr_enums; i++) {
-		if (v == enums[i].val) {
-			seq_printf(m, "%s",
-				   __btf_name_by_offset(btf,
-							enums[i].name_off));
-			return;
-		}
+		if (v != enums[i].val)
+			continue;
+
+		btf_show_type_value(show, "%s",
+				    __btf_name_by_offset(btf,
+							 enums[i].name_off));
+
+		btf_show_end_type(show);
+		return;
 	}
 
-	seq_printf(m, "%d", v);
+	btf_show_type_value(show, "%d", v);
+	btf_show_end_type(show);
 }
 
 static struct btf_kind_operations enum_ops = {
@@ -2557,7 +3237,7 @@ static void btf_enum_seq_show(const struct btf *btf, const struct btf_type *t,
 	.check_member = btf_enum_check_member,
 	.check_kflag_member = btf_enum_check_kflag_member,
 	.log_details = btf_enum_log,
-	.seq_show = btf_enum_seq_show,
+	.show = btf_enum_show,
 };
 
 static s32 btf_func_proto_check_meta(struct btf_verifier_env *env,
@@ -2644,7 +3324,7 @@ static void btf_func_proto_log(struct btf_verifier_env *env,
 	.check_member = btf_df_check_member,
 	.check_kflag_member = btf_df_check_kflag_member,
 	.log_details = btf_func_proto_log,
-	.seq_show = btf_df_seq_show,
+	.show = btf_df_show,
 };
 
 static s32 btf_func_check_meta(struct btf_verifier_env *env,
@@ -2678,7 +3358,7 @@ static s32 btf_func_check_meta(struct btf_verifier_env *env,
 	.check_member = btf_df_check_member,
 	.check_kflag_member = btf_df_check_kflag_member,
 	.log_details = btf_ref_type_log,
-	.seq_show = btf_df_seq_show,
+	.show = btf_df_show,
 };
 
 static s32 btf_var_check_meta(struct btf_verifier_env *env,
@@ -2742,7 +3422,7 @@ static void btf_var_log(struct btf_verifier_env *env, const struct btf_type *t)
 	.check_member		= btf_df_check_member,
 	.check_kflag_member	= btf_df_check_kflag_member,
 	.log_details		= btf_var_log,
-	.seq_show		= btf_var_seq_show,
+	.show			= btf_var_show,
 };
 
 static s32 btf_datasec_check_meta(struct btf_verifier_env *env,
@@ -2868,24 +3548,28 @@ static void btf_datasec_log(struct btf_verifier_env *env,
 	btf_verifier_log(env, "size=%u vlen=%u", t->size, btf_type_vlen(t));
 }
 
-static void btf_datasec_seq_show(const struct btf *btf,
-				 const struct btf_type *t, u32 type_id,
-				 void *data, u8 bits_offset,
-				 struct seq_file *m)
+static void btf_datasec_show(const struct btf *btf,
+			     const struct btf_type *t, u32 type_id,
+			     void *data, u8 bits_offset,
+			     struct btf_show *show)
 {
 	const struct btf_var_secinfo *vsi;
 	const struct btf_type *var;
 	u32 i;
 
-	seq_printf(m, "section (\"%s\") = {", __btf_name_by_offset(btf, t->name_off));
+	if (!btf_show_start_type(show, t, type_id, data))
+		return;
+
+	btf_show_type_value(show, "section (\"%s\") = {",
+			    __btf_name_by_offset(btf, t->name_off));
 	for_each_vsi(i, t, vsi) {
 		var = btf_type_by_id(btf, vsi->type);
 		if (i)
-			seq_puts(m, ",");
-		btf_type_ops(var)->seq_show(btf, var, vsi->type,
-					    data + vsi->offset, bits_offset, m);
+			btf_show(show, ",");
+		btf_type_ops(var)->show(btf, var, vsi->type,
+					data + vsi->offset, bits_offset, show);
 	}
-	seq_puts(m, "}");
+	btf_show_end_type(show);
 }
 
 static const struct btf_kind_operations datasec_ops = {
@@ -2894,7 +3578,7 @@ static void btf_datasec_seq_show(const struct btf *btf,
 	.check_member		= btf_df_check_member,
 	.check_kflag_member	= btf_df_check_kflag_member,
 	.log_details		= btf_datasec_log,
-	.seq_show		= btf_datasec_seq_show,
+	.show			= btf_datasec_show,
 };
 
 static int btf_func_proto_check(struct btf_verifier_env *env,
@@ -4516,12 +5200,84 @@ int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog,
 	return 0;
 }
 
-void btf_type_seq_show(const struct btf *btf, u32 type_id, void *obj,
-		       struct seq_file *m)
+void btf_type_show(const struct btf *btf, u32 type_id, void *obj,
+		   struct btf_show *show)
 {
 	const struct btf_type *t = btf_type_by_id(btf, type_id);
 
-	btf_type_ops(t)->seq_show(btf, t, type_id, obj, 0, m);
+	show->btf = btf;
+	memset(&show->state, 0, sizeof(show->state));
+	memset(&show->obj, 0, sizeof(show->obj));
+
+	btf_type_ops(t)->show(btf, t, type_id, obj, 0, show);
+}
+
+static void btf_seq_show(struct btf_show *show, const char *fmt, ...)
+{
+	va_list args;
+
+	va_start(args, fmt);
+	seq_vprintf((struct seq_file *)show->target, fmt, args);
+	va_end(args);
+}
+
+void btf_type_seq_show(const struct btf *btf, u32 type_id, void *obj,
+			struct seq_file *m)
+{
+	struct btf_show sseq;
+
+	sseq.target = m;
+	sseq.showfn = btf_seq_show;
+	sseq.flags = BTF_SHOW_NONAME | BTF_SHOW_COMPACT | BTF_SHOW_ZERO |
+		     BTF_SHOW_UNSAFE;
+
+	btf_type_show(btf, type_id, obj, &sseq);
+}
+
+/* pass in trace id and record number of bytes written. */
+struct btf_show_trace {
+	struct btf_show show;
+	__u32 trace_id;
+	int len;		/* length written */
+};
+
+static void btf_trace_show(struct btf_show *show, const char *fmt, ...)
+{
+	struct btf_show_trace *strace = (struct btf_show_trace *)show;
+	va_list args;
+	int len;
+
+	va_start(args, fmt);
+	len = bpf_trace_vprintk(strace->trace_id, fmt, args);
+	va_end(args);
+
+	if (len > 0)
+		strace->len += len;
+}
+
+int btf_type_trace_show(const struct btf *btf, u32 type_id, void *obj,
+			u32 trace_id, u64 flags)
+{
+	struct btf_show_trace strace;
+
+	strace.show.showfn = btf_trace_show;
+	strace.show.target = NULL;
+	/* Each trace event in object display is already emitted on a different
+	 * line; suppress show newline output to avoid a double newline after
+	 * each event.
+	 */
+	strace.show.flags = flags | BTF_SHOW_NONEWLINE;
+	strace.trace_id = trace_id;
+	strace.len = 0;
+
+	btf_type_show(btf, type_id, obj, (struct btf_show *)&strace);
+
+	/* If we encountered an error, return it. */
+	if (strace.show.state.status)
+		return strace.show.state.status;
+
+	/* Otherwise return length we would have written */
+	return strace.len;
 }
 
 #ifdef CONFIG_PROC_FS
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index cb91ef9..6453a75 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -383,26 +383,35 @@ static void bpf_trace_copy_string(char *buf, void *unsafe_ptr, char fmt_ptype,
 
 #define BPF_TRACE_PRINTK_SIZE   1024
 
-static inline __printf(1, 0) int bpf_do_trace_printk(const char *fmt, ...)
+int bpf_trace_vprintk(__u32 trace_id, const char *fmt, va_list ap)
 {
 	static char buf[BPF_TRACE_PRINTK_SIZE];
 	unsigned long flags;
-	va_list ap;
 	int ret;
 
 	raw_spin_lock_irqsave(&trace_printk_lock, flags);
-	va_start(ap, fmt);
 	ret = vsnprintf(buf, sizeof(buf), fmt, ap);
-	va_end(ap);
 	/* vsnprintf() will not append null for zero-length strings */
 	if (ret == 0)
 		buf[0] = '\0';
-	trace_bpf_trace_printk(buf);
+	trace_bpf_trace_printk(buf, trace_id);
 	raw_spin_unlock_irqrestore(&trace_printk_lock, flags);
 
 	return ret;
 }
 
+static __printf(1, 0) int bpf_do_trace_printk(const char *fmt, ...)
+{
+	va_list ap;
+	int ret;
+
+	va_start(ap, fmt);
+	ret = bpf_trace_vprintk(0, fmt, ap);
+	va_end(ap);
+
+	return ret;
+}
+
 /*
  * Only limited trace_printk() conversion specifiers allowed:
  * %d %i %u %x %ld %li %lu %lx %lld %lli %llu %llx %p %pB %pks %pus %s
diff --git a/kernel/trace/bpf_trace.h b/kernel/trace/bpf_trace.h
index 9acbc11..5c401b3 100644
--- a/kernel/trace/bpf_trace.h
+++ b/kernel/trace/bpf_trace.h
@@ -10,16 +10,18 @@
 
 TRACE_EVENT(bpf_trace_printk,
 
-	TP_PROTO(const char *bpf_string),
+	TP_PROTO(const char *bpf_string, __u32 trace_id),
 
-	TP_ARGS(bpf_string),
+	TP_ARGS(bpf_string, trace_id),
 
 	TP_STRUCT__entry(
 		__string(bpf_string, bpf_string)
+		__field(__u32, trace_id)
 	),
 
 	TP_fast_assign(
 		__assign_str(bpf_string, bpf_string);
+		__entry->trace_id = trace_id;
 	),
 
 	TP_printk("%s", __get_str(bpf_string))
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH bpf-next 3/4] bpf: add bpf_trace_btf helper
  2020-08-06 14:42 [RFC PATCH bpf-next 0/4] bpf: add bpf-based bpf_trace_printk()-like support Alan Maguire
  2020-08-06 14:42 ` [RFC PATCH bpf-next 1/4] bpf: provide function to get vmlinux BTF information Alan Maguire
  2020-08-06 14:42 ` [RFC PATCH bpf-next 2/4] bpf: make BTF show support generic, apply to seq files/bpf_trace_printk Alan Maguire
@ 2020-08-06 14:42 ` Alan Maguire
  2020-08-20  6:36   ` Andrii Nakryiko
  2020-08-06 14:42 ` [RFC PATCH bpf-next 4/4] selftests/bpf: add bpf_trace_btf helper tests Alan Maguire
  3 siblings, 1 reply; 11+ messages in thread
From: Alan Maguire @ 2020-08-06 14:42 UTC (permalink / raw)
  To: ast, daniel, andriin, yhs
  Cc: linux, andriy.shevchenko, pmladek, kafai, songliubraving,
	john.fastabend, kpsingh, shuah, rdna, scott.branden, quentin,
	cneirabustos, jakub, mingo, rostedt, bpf, netdev, linux-kernel,
	linux-kselftest, Alan Maguire

A helper is added to support tracing kernel type information in BPF
using the BPF Type Format (BTF).  Its signature is

long bpf_trace_btf(struct btf_ptr *ptr, u32 btf_ptr_size, u32 trace_id,
                   u64 flags);

struct btf_ptr * specifies

- a pointer to the data to be traced;
- the BTF id of the type of data pointed to; or
- a string representation of the type of data pointed to
- a flags field is provided for future use; these flags
  are not to be confused with the BTF_TRACE_F_* flags
  below that control how the btf_ptr is displayed; the
  flags member of the struct btf_ptr may be used to
  disambiguate types in kernel versus module BTF, etc;
  the main distinction is that these flags relate to the
  type and the information needed to identify it, not to
  how it is displayed.

The helper also specifies a trace id which is set for the
bpf_trace_printk tracepoint; this allows users to filter
trace events on specific trace ids, ensuring output from
different traced events does not become intermixed and
hard to read.
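
For instance, a user can filter these events from a separate trace
instance (a sketch; it assumes tracefs is mounted at
/sys/kernel/debug/tracing and that the program passed trace_id 100):

  mkdir /sys/kernel/debug/tracing/instances/traceid100
  echo "trace_id == 100" > \
    /sys/kernel/debug/tracing/instances/traceid100/events/bpf_trace/bpf_trace_printk/filter
  echo 1 > \
    /sys/kernel/debug/tracing/instances/traceid100/events/bpf_trace/bpf_trace_printk/enable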

For example a BPF program with a struct sk_buff *skb
could do the following:

        static const char *skb_type = "struct sk_buff";
        static struct btf_ptr b = { };

        b.ptr = skb;
        b.type = skb_type;
        bpf_trace_btf(&b, sizeof(b), 0, 0);

Default output in the trace_pipe looks like this:

          <idle>-0     [023] d.s.  1825.778400: bpf_trace_printk: (struct sk_buff){
          <idle>-0     [023] d.s.  1825.778409: bpf_trace_printk:  (union){
          <idle>-0     [023] d.s.  1825.778410: bpf_trace_printk:   (struct){
          <idle>-0     [023] d.s.  1825.778412: bpf_trace_printk:    .prev = (struct sk_buff *)0x00000000b2a3df7e,
          <idle>-0     [023] d.s.  1825.778413: bpf_trace_printk:    (union){
          <idle>-0     [023] d.s.  1825.778414: bpf_trace_printk:     .dev = (struct net_device *)0x000000001658808b,
          <idle>-0     [023] d.s.  1825.778416: bpf_trace_printk:     .dev_scratch = (long unsigned int)18446628460391432192,
          <idle>-0     [023] d.s.  1825.778417: bpf_trace_printk:    },
          <idle>-0     [023] d.s.  1825.778417: bpf_trace_printk:   },
          <idle>-0     [023] d.s.  1825.778418: bpf_trace_printk:   .rbnode = (struct rb_node){
          <idle>-0     [023] d.s.  1825.778419: bpf_trace_printk:    .rb_right = (struct rb_node *)0x00000000b2a3df7e,
          <idle>-0     [023] d.s.  1825.778420: bpf_trace_printk:    .rb_left = (struct rb_node *)0x000000001658808b,
          <idle>-0     [023] d.s.  1825.778420: bpf_trace_printk:   },
          <idle>-0     [023] d.s.  1825.778421: bpf_trace_printk:   .list = (struct list_head){
          <idle>-0     [023] d.s.  1825.778422: bpf_trace_printk:    .prev = (struct list_head *)0x00000000b2a3df7e,
          <idle>-0     [023] d.s.  1825.778422: bpf_trace_printk:   },
          <idle>-0     [023] d.s.  1825.778422: bpf_trace_printk:  },
          <idle>-0     [023] d.s.  1825.778426: bpf_trace_printk:  .len = (unsigned int)168,
          <idle>-0     [023] d.s.  1825.778427: bpf_trace_printk:  .mac_len = (__u16)14,
          <idle>-0     [023] d.s.  1825.778428: bpf_trace_printk:  .queue_mapping = (__u16)17,
          <idle>-0     [023] d.s.  1825.778430: bpf_trace_printk:  .head_frag = (__u8)0x1,
          <idle>-0     [023] d.s.  1825.778431: bpf_trace_printk:  .ip_summed = (__u8)0x1,
          <idle>-0     [023] d.s.  1825.778432: bpf_trace_printk:  .l4_hash = (__u8)0x1,
          <idle>-0     [023] d.s.  1825.778433: bpf_trace_printk:  .hash = (__u32)1873247608,
          <idle>-0     [023] d.s.  1825.778434: bpf_trace_printk:  (union){
          <idle>-0     [023] d.s.  1825.778435: bpf_trace_printk:   .napi_id = (unsigned int)8209,
          <idle>-0     [023] d.s.  1825.778436: bpf_trace_printk:   .sender_cpu = (unsigned int)8209,
          <idle>-0     [023] d.s.  1825.778436: bpf_trace_printk:  },
          <idle>-0     [023] d.s.  1825.778437: bpf_trace_printk:  .protocol = (__be16)8,
          <idle>-0     [023] d.s.  1825.778438: bpf_trace_printk:  .transport_header = (__u16)226,
          <idle>-0     [023] d.s.  1825.778439: bpf_trace_printk:  .network_header = (__u16)206,
          <idle>-0     [023] d.s.  1825.778440: bpf_trace_printk:  .mac_header = (__u16)192,
          <idle>-0     [023] d.s.  1825.778440: bpf_trace_printk:  .tail = (sk_buff_data_t)374,
          <idle>-0     [023] d.s.  1825.778441: bpf_trace_printk:  .end = (sk_buff_data_t)1728,
          <idle>-0     [023] d.s.  1825.778442: bpf_trace_printk:  .head = (unsigned char *)0x000000009798cb6b,
          <idle>-0     [023] d.s.  1825.778443: bpf_trace_printk:  .data = (unsigned char *)0x0000000064823282,
          <idle>-0     [023] d.s.  1825.778444: bpf_trace_printk:  .truesize = (unsigned int)2304,
          <idle>-0     [023] d.s.  1825.778445: bpf_trace_printk:  .users = (refcount_t){
          <idle>-0     [023] d.s.  1825.778445: bpf_trace_printk:   .refs = (atomic_t){
          <idle>-0     [023] d.s.  1825.778447: bpf_trace_printk:    .counter = (int)1,
          <idle>-0     [023] d.s.  1825.778447: bpf_trace_printk:   },
          <idle>-0     [023] d.s.  1825.778447: bpf_trace_printk:  },
          <idle>-0     [023] d.s.  1825.778448: bpf_trace_printk: }

Flags modifying display are as follows:

- BTF_TRACE_F_COMPACT:	no formatting around type information
- BTF_TRACE_F_NONAME:	no struct/union member names/types
- BTF_TRACE_F_PTR_RAW:	show raw (unobfuscated) pointer values;
			equivalent to %px.
- BTF_TRACE_F_ZERO:	show zero-valued struct/union members;
			they are not displayed by default

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 include/linux/bpf.h            |   1 +
 include/linux/btf.h            |   9 ++--
 include/uapi/linux/bpf.h       |  63 +++++++++++++++++++++++++
 kernel/bpf/core.c              |   5 ++
 kernel/bpf/helpers.c           |   4 ++
 kernel/trace/bpf_trace.c       | 102 ++++++++++++++++++++++++++++++++++++++++-
 scripts/bpf_helpers_doc.py     |   2 +
 tools/include/uapi/linux/bpf.h |  63 +++++++++++++++++++++++++
 8 files changed, 243 insertions(+), 6 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 6143b6e..f67819d 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -934,6 +934,7 @@ struct bpf_event_entry {
 const char *kernel_type_name(u32 btf_type_id);
 
 const struct bpf_func_proto *bpf_get_trace_printk_proto(void);
+const struct bpf_func_proto *bpf_get_trace_btf_proto(void);
 
 typedef unsigned long (*bpf_ctx_copy_t)(void *dst, const void *src,
 					unsigned long off, unsigned long len);
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 46bf9f4..3d31e28 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -6,6 +6,7 @@
 
 #include <linux/types.h>
 #include <uapi/linux/btf.h>
+#include <uapi/linux/bpf.h>
 
 #define BTF_TYPE_EMIT(type) ((void)(type *)0)
 
@@ -61,10 +62,10 @@ const struct btf_type *btf_type_id_size(const struct btf *btf,
  *	- BTF_SHOW_UNSAFE: skip use of bpf_probe_read() to safely read
  *	  data before displaying it.
  */
-#define BTF_SHOW_COMPACT	(1ULL << 0)
-#define BTF_SHOW_NONAME		(1ULL << 1)
-#define BTF_SHOW_PTR_RAW	(1ULL << 2)
-#define BTF_SHOW_ZERO		(1ULL << 3)
+#define BTF_SHOW_COMPACT	BTF_TRACE_F_COMPACT
+#define BTF_SHOW_NONAME		BTF_TRACE_F_NONAME
+#define BTF_SHOW_PTR_RAW	BTF_TRACE_F_PTR_RAW
+#define BTF_SHOW_ZERO		BTF_TRACE_F_ZERO
 #define BTF_SHOW_NONEWLINE	(1ULL << 32)
 #define BTF_SHOW_UNSAFE		(1ULL << 33)
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b134e67..726fee4 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3394,6 +3394,36 @@ struct bpf_stack_build_id {
  *		A non-negative value equal to or less than *size* on success,
  *		or a negative error in case of failure.
  *
+ * long bpf_trace_btf(struct btf_ptr *ptr, u32 btf_ptr_size, u32 trace_id, u64 flags)
+ *	Description
+ *		Utilize BTF to trace a representation of *ptr*->ptr, using
+ *		either *ptr*->type or *ptr*->type_id.  *ptr*->type should
+ *		specify the name of the type *ptr*->ptr points to.
+ *		Traversing that data structure using BTF, the type
+ *		information and values are bpf_trace_printk()ed.  A safe
+ *		copy of the pointed-to data is made to avoid kernel
+ *		crashes during display.
+ *		Tracing specifies *trace_id* as the id associated with the
+ *		trace event; this can be used to filter trace events
+ *		to show a subset of all traced output, helping to avoid
+ *		BTF output becoming intermixed with other output.
+ *
+ *		*flags* is a combination of
+ *
+ *		**BTF_TRACE_F_COMPACT**
+ *			no formatting around type information
+ *		**BTF_TRACE_F_NONAME**
+ *			no struct/union member names/types
+ *		**BTF_TRACE_F_PTR_RAW**
+ *			show raw (unobfuscated) pointer values;
+ *			equivalent to printk specifier %px.
+ *		**BTF_TRACE_F_ZERO**
+ *			show zero-valued struct/union members; they
+ *			are not displayed by default
+ *
+ *	Return
+ *		The number of bytes traced, or a negative error in case
+ *		of failure.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -3538,6 +3568,7 @@ struct bpf_stack_build_id {
 	FN(skc_to_tcp_request_sock),	\
 	FN(skc_to_udp6_sock),		\
 	FN(get_task_stack),		\
+	FN(trace_btf),			\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
@@ -4446,4 +4477,36 @@ struct bpf_sk_lookup {
 	__u32 local_port;	/* Host byte order */
 };
 
+/*
+ * struct btf_ptr is used for typed pointer display; the
+ * additional type string/BTF type id are used to render the pointer
+ * data as the appropriate type via the bpf_trace_btf() helper
+ * above.  A flags field - potentially to specify additional details
+ * about the BTF pointer (rather than its mode of display) - is
+ * present for future use.  Display flags - BTF_TRACE_F_* - are
+ * passed to display functions separately.
+ */
+struct btf_ptr {
+	void *ptr;
+	const char *type;
+	__u32 type_id;
+	__u32 flags;		/* BTF ptr flags; unused at present. */
+};
+
+/*
+ * Flags to control bpf_trace_btf() behaviour.
+ *	- BTF_TRACE_F_COMPACT: no formatting around type information
+ *	- BTF_TRACE_F_NONAME: no struct/union member names/types
+ *	- BTF_TRACE_F_PTR_RAW: show raw (unobfuscated) pointer values;
+ *	  equivalent to %px.
+ *	- BTF_TRACE_F_ZERO: show zero-valued struct/union members; they
+ *	  are not displayed by default
+ */
+enum {
+	BTF_TRACE_F_COMPACT	=	(1ULL << 0),
+	BTF_TRACE_F_NONAME	=	(1ULL << 1),
+	BTF_TRACE_F_PTR_RAW	=	(1ULL << 2),
+	BTF_TRACE_F_ZERO	=	(1ULL << 3),
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index bde9334..82b3a98 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2214,6 +2214,11 @@ const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
 	return NULL;
 }
 
+const struct bpf_func_proto * __weak bpf_get_trace_btf_proto(void)
+{
+	return NULL;
+}
+
 u64 __weak
 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size,
 		 void *ctx, u64 ctx_size, bpf_ctx_copy_t ctx_copy)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index be43ab3..b9a842b 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -661,6 +661,10 @@ static int __bpf_strtoll(const char *buf, size_t buf_len, u64 flags,
 		if (!perfmon_capable())
 			return NULL;
 		return bpf_get_trace_printk_proto();
+	case BPF_FUNC_trace_btf:
+		if (!perfmon_capable())
+			return NULL;
+		return bpf_get_trace_btf_proto();
 	case BPF_FUNC_jiffies64:
 		return &bpf_jiffies64_proto;
 	default:
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 6453a75..92212a1 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -14,8 +14,12 @@
 #include <linux/spinlock.h>
 #include <linux/syscalls.h>
 #include <linux/error-injection.h>
+#include <linux/btf.h>
 #include <linux/btf_ids.h>
 
+#include <uapi/linux/bpf.h>
+#include <uapi/linux/btf.h>
+
 #include <asm/tlb.h>
 
 #include "trace_probe.h"
@@ -555,10 +559,91 @@ static __printf(1, 0) int bpf_do_trace_printk(const char *fmt, ...)
 	.arg2_type	= ARG_CONST_SIZE,
 };
 
-const struct bpf_func_proto *bpf_get_trace_printk_proto(void)
+#define BTF_TRACE_F_ALL	(BTF_TRACE_F_COMPACT | BTF_TRACE_F_NONAME | \
+			 BTF_TRACE_F_PTR_RAW | BTF_TRACE_F_ZERO)
+
+BPF_CALL_4(bpf_trace_btf, struct btf_ptr *, ptr, u32, btf_ptr_size,
+	   u32, trace_id, u64, flags)
+{
+	u8 btf_kind = BTF_KIND_TYPEDEF;
+	char type_name[KSYM_NAME_LEN];
+	const struct btf_type *t;
+	const struct btf *btf;
+	const char *btf_type;
+	s32 btf_id;
+	int ret;
+
+	if (unlikely(flags & ~(BTF_TRACE_F_ALL)))
+		return -EINVAL;
+
+	if (btf_ptr_size != sizeof(struct btf_ptr))
+		return -EINVAL;
+
+	btf = bpf_get_btf_vmlinux();
+
+	if (IS_ERR_OR_NULL(btf))
+		return PTR_ERR(btf);
+
+	if (ptr->type != NULL) {
+		ret = copy_from_kernel_nofault(type_name, ptr->type,
+					       sizeof(type_name));
+		if (ret)
+			return ret;
+
+		btf_type = type_name;
+
+		if (strncmp(btf_type, "struct ", strlen("struct ")) == 0) {
+			btf_kind = BTF_KIND_STRUCT;
+			btf_type += strlen("struct ");
+		} else if (strncmp(btf_type, "union ", strlen("union ")) == 0) {
+			btf_kind = BTF_KIND_UNION;
+			btf_type += strlen("union ");
+		} else if (strncmp(btf_type, "enum ", strlen("enum ")) == 0) {
+			btf_kind = BTF_KIND_ENUM;
+			btf_type += strlen("enum ");
+		}
+
+		if (strlen(btf_type) == 0)
+			return -EINVAL;
+
+		/*
+		 * Assume the type specified is a typedef; there's little
+		 * benefit in specifying int types, as that mostly wastes
+		 * time on BTF lookups. We optimize for the most useful path.
+		 *
+		 * Fall back to BTF_KIND_INT if this fails.
+		 */
+		btf_id = btf_find_by_name_kind(btf, btf_type, btf_kind);
+		if (btf_id < 0)
+			btf_id = btf_find_by_name_kind(btf, btf_type,
+						       BTF_KIND_INT);
+	} else if (ptr->type_id > 0)
+		btf_id = ptr->type_id;
+	else
+		return -EINVAL;
+
+	if (btf_id > 0)
+		t = btf_type_by_id(btf, btf_id);
+	if (btf_id <= 0 || !t)
+		return -ENOENT;
+
+	return btf_type_trace_show(btf, btf_id, ptr->ptr, trace_id, flags);
+}
+
+static const struct bpf_func_proto bpf_trace_btf_proto = {
+	.func		= bpf_trace_btf,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_MEM,
+	.arg2_type	= ARG_CONST_SIZE,
+	.arg3_type	= ARG_ANYTHING,
+	.arg4_type	= ARG_ANYTHING,
+};
+
+static void bpf_trace_printk_enable(void)
 {
 	/*
-	 * This program might be calling bpf_trace_printk,
+	 * This program might be calling bpf_trace_[printk|btf],
 	 * so enable the associated bpf_trace/bpf_trace_printk event.
 	 * Repeat this each time as it is possible a user has
 	 * disabled bpf_trace_printk events.  By loading a program
@@ -567,10 +652,21 @@ const struct bpf_func_proto *bpf_get_trace_printk_proto(void)
 	 */
 	if (trace_set_clr_event("bpf_trace", "bpf_trace_printk", 1))
 		pr_warn_ratelimited("could not enable bpf_trace_printk events");
+}
+const struct bpf_func_proto *bpf_get_trace_printk_proto(void)
+{
+	bpf_trace_printk_enable();
 
 	return &bpf_trace_printk_proto;
 }
 
+const struct bpf_func_proto *bpf_get_trace_btf_proto(void)
+{
+	bpf_trace_printk_enable();
+
+	return &bpf_trace_btf_proto;
+}
+
 #define MAX_SEQ_PRINTF_VARARGS		12
 #define MAX_SEQ_PRINTF_MAX_MEMCPY	6
 #define MAX_SEQ_PRINTF_STR_LEN		128
@@ -1139,6 +1235,8 @@ static int bpf_send_signal_common(u32 sig, enum pid_type type)
 		return &bpf_get_current_comm_proto;
 	case BPF_FUNC_trace_printk:
 		return bpf_get_trace_printk_proto();
+	case BPF_FUNC_trace_btf:
+		return bpf_get_trace_btf_proto();
 	case BPF_FUNC_get_smp_processor_id:
 		return &bpf_get_smp_processor_id_proto;
 	case BPF_FUNC_get_numa_node_id:
diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py
index 5bfa448..7c7384b 100755
--- a/scripts/bpf_helpers_doc.py
+++ b/scripts/bpf_helpers_doc.py
@@ -432,6 +432,7 @@ class PrinterHelpers(Printer):
             'struct __sk_buff',
             'struct sk_msg_md',
             'struct xdp_md',
+            'struct btf_ptr',
     ]
     known_types = {
             '...',
@@ -472,6 +473,7 @@ class PrinterHelpers(Printer):
             'struct tcp_request_sock',
             'struct udp6_sock',
             'struct task_struct',
+            'struct btf_ptr',
     }
     mapped_types = {
             'u8': '__u8',
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index b134e67..726fee4 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3394,6 +3394,36 @@ struct bpf_stack_build_id {
  *		A non-negative value equal to or less than *size* on success,
  *		or a negative error in case of failure.
  *
+ * long bpf_trace_btf(struct btf_ptr *ptr, u32 btf_ptr_size, u32 trace_id, u64 flags)
+ *	Description
+ *		Utilize BTF to trace a representation of *ptr*->ptr, using
+ *		either *ptr*->type or *ptr*->type_id.  *ptr*->type should
+ *		specify the name of the type *ptr*->ptr points to.
+ *		Traversing that data structure using BTF, the type
+ *		information and values are bpf_trace_printk()ed.  A safe
+ *		copy of the pointed-to data is made to avoid kernel
+ *		crashes during display.
+ *		Tracing specifies *trace_id* as the id associated with the
+ *		trace event; this can be used to filter trace events
+ *		to show a subset of all traced output, helping to avoid
+ *		BTF output becoming intermixed with other output.
+ *
+ *		*flags* is a combination of
+ *
+ *		**BTF_TRACE_F_COMPACT**
+ *			no formatting around type information
+ *		**BTF_TRACE_F_NONAME**
+ *			no struct/union member names/types
+ *		**BTF_TRACE_F_PTR_RAW**
+ *			show raw (unobfuscated) pointer values;
+ *			equivalent to printk specifier %px.
+ *		**BTF_TRACE_F_ZERO**
+ *			show zero-valued struct/union members; they
+ *			are not displayed by default
+ *
+ *	Return
+ *		The number of bytes traced, or a negative error in case
+ *		of failure.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -3538,6 +3568,7 @@ struct bpf_stack_build_id {
 	FN(skc_to_tcp_request_sock),	\
 	FN(skc_to_udp6_sock),		\
 	FN(get_task_stack),		\
+	FN(trace_btf),			\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
@@ -4446,4 +4477,36 @@ struct bpf_sk_lookup {
 	__u32 local_port;	/* Host byte order */
 };
 
+/*
+ * struct btf_ptr is used for typed pointer display; the
+ * additional type string/BTF type id are used to render the pointer
+ * data as the appropriate type via the bpf_trace_btf() helper
+ * above.  A flags field - potentially to specify additional details
+ * about the BTF pointer (rather than its mode of display) - is
+ * present for future use.  Display flags - BTF_TRACE_F_* - are
+ * passed to display functions separately.
+ */
+struct btf_ptr {
+	void *ptr;
+	const char *type;
+	__u32 type_id;
+	__u32 flags;		/* BTF ptr flags; unused at present. */
+};
+
+/*
+ * Flags to control bpf_trace_btf() behaviour.
+ *	- BTF_TRACE_F_COMPACT: no formatting around type information
+ *	- BTF_TRACE_F_NONAME: no struct/union member names/types
+ *	- BTF_TRACE_F_PTR_RAW: show raw (unobfuscated) pointer values;
+ *	  equivalent to %px.
+ *	- BTF_TRACE_F_ZERO: show zero-valued struct/union members; they
+ *	  are not displayed by default
+ */
+enum {
+	BTF_TRACE_F_COMPACT	=	(1ULL << 0),
+	BTF_TRACE_F_NONAME	=	(1ULL << 1),
+	BTF_TRACE_F_PTR_RAW	=	(1ULL << 2),
+	BTF_TRACE_F_ZERO	=	(1ULL << 3),
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH bpf-next 4/4] selftests/bpf: add bpf_trace_btf helper tests
  2020-08-06 14:42 [RFC PATCH bpf-next 0/4] bpf: add bpf-based bpf_trace_printk()-like support Alan Maguire
                   ` (2 preceding siblings ...)
  2020-08-06 14:42 ` [RFC PATCH bpf-next 3/4] bpf: add bpf_trace_btf helper Alan Maguire
@ 2020-08-06 14:42 ` Alan Maguire
  3 siblings, 0 replies; 11+ messages in thread
From: Alan Maguire @ 2020-08-06 14:42 UTC (permalink / raw)
  To: ast, daniel, andriin, yhs
  Cc: linux, andriy.shevchenko, pmladek, kafai, songliubraving,
	john.fastabend, kpsingh, shuah, rdna, scott.branden, quentin,
	cneirabustos, jakub, mingo, rostedt, bpf, netdev, linux-kernel,
	linux-kselftest, Alan Maguire

Basic tests verifying various flag combinations for bpf_trace_btf()
using a tp_btf program to trace skb data.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 tools/testing/selftests/bpf/prog_tests/trace_btf.c | 45 ++++++++++++++++++++++
 .../selftests/bpf/progs/netif_receive_skb.c        | 43 +++++++++++++++++++++
 2 files changed, 88 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/trace_btf.c
 create mode 100644 tools/testing/selftests/bpf/progs/netif_receive_skb.c

diff --git a/tools/testing/selftests/bpf/prog_tests/trace_btf.c b/tools/testing/selftests/bpf/prog_tests/trace_btf.c
new file mode 100644
index 0000000..e64b69d
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/trace_btf.c
@@ -0,0 +1,45 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+
+#include "netif_receive_skb.skel.h"
+
+void test_trace_btf(void)
+{
+	struct netif_receive_skb *skel;
+	struct netif_receive_skb__bss *bss;
+	int err, duration = 0;
+
+	skel = netif_receive_skb__open();
+	if (CHECK(!skel, "skel_open", "failed to open skeleton\n"))
+		return;
+
+	err = netif_receive_skb__load(skel);
+	if (CHECK(err, "skel_load", "failed to load skeleton: %d\n", err))
+		goto cleanup;
+
+	bss = skel->bss;
+
+	err = netif_receive_skb__attach(skel);
+	if (CHECK(err, "skel_attach", "skeleton attach failed: %d\n", err))
+		goto cleanup;
+
+	/* generate receive event */
+	system("ping -c 10 127.0.0.1");
+
+	/*
+	 * Make sure netif_receive_skb program was triggered
+	 * and it set expected return values from bpf_trace_btf()s
+	 * and all tests ran.
+	 */
+	if (CHECK(bss->ret <= 0,
+		  "bpf_trace_btf: got return value",
+		  "ret <= 0 %ld test %d\n", bss->ret, bss->num_subtests))
+		goto cleanup;
+
+	CHECK(bss->num_subtests != bss->ran_subtests, "check all subtests ran",
+	      "only ran %d of %d tests\n", bss->num_subtests,
+	      bss->ran_subtests);
+
+cleanup:
+	netif_receive_skb__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/netif_receive_skb.c b/tools/testing/selftests/bpf/progs/netif_receive_skb.c
new file mode 100644
index 0000000..cab764e
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/netif_receive_skb.c
@@ -0,0 +1,43 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020, Oracle and/or its affiliates. */
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+long ret = 0;
+int num_subtests = 0;
+int ran_subtests = 0;
+
+#define CHECK_TRACE(_p, flags)						 \
+	do {								 \
+		++num_subtests;						 \
+		if (ret >= 0) {						 \
+			++ran_subtests;					 \
+			ret = bpf_trace_btf(_p, sizeof(*(_p)), 0, flags);\
+		}							 \
+	} while (0)
+
+/* TRACE_EVENT(netif_receive_skb,
+ *	TP_PROTO(struct sk_buff *skb),
+ */
+SEC("tp_btf/netif_receive_skb")
+int BPF_PROG(trace_netif_receive_skb, struct sk_buff *skb)
+{
+	static const char skb_type[] = "struct sk_buff";
+	static struct btf_ptr p = { };
+
+	p.ptr = skb;
+	p.type = skb_type;
+
+	CHECK_TRACE(&p, 0);
+	CHECK_TRACE(&p, BTF_TRACE_F_COMPACT);
+	CHECK_TRACE(&p, BTF_TRACE_F_NONAME);
+	CHECK_TRACE(&p, BTF_TRACE_F_PTR_RAW);
+	CHECK_TRACE(&p, BTF_TRACE_F_ZERO);
+	CHECK_TRACE(&p, BTF_TRACE_F_COMPACT | BTF_TRACE_F_NONAME |
+		    BTF_TRACE_F_PTR_RAW | BTF_TRACE_F_ZERO);
+
+	return 0;
+}
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH bpf-next 2/4] bpf: make BTF show support generic, apply to seq files/bpf_trace_printk
  2020-08-06 14:42 ` [RFC PATCH bpf-next 2/4] bpf: make BTF show support generic, apply to seq files/bpf_trace_printk Alan Maguire
@ 2020-08-13  1:46   ` Alexei Starovoitov
  2020-08-14 13:06     ` Alan Maguire
  0 siblings, 1 reply; 11+ messages in thread
From: Alexei Starovoitov @ 2020-08-13  1:46 UTC (permalink / raw)
  To: Alan Maguire
  Cc: ast, daniel, andriin, yhs, linux, andriy.shevchenko, pmladek,
	kafai, songliubraving, john.fastabend, kpsingh, shuah, rdna,
	scott.branden, quentin, cneirabustos, jakub, mingo, rostedt, bpf,
	netdev, linux-kernel, linux-kselftest

On Thu, Aug 06, 2020 at 03:42:23PM +0100, Alan Maguire wrote:
> 
> The bpf_trace_printk tracepoint is augmented with a "trace_id"
> field; it is used to allow tracepoint filtering as typed display
> information can easily be interspersed with other tracing data,
> making it hard to read.  Specifying a trace_id will allow users
> to selectively trace data, eliminating noise.

Since trace_id is not seen in trace_pipe, how do you expect users
to filter by it?
It also feels like workaround. May be let bpf prog print the whole
struct in one go with multiple new lines and call
trace_bpf_trace_printk(buf) once?

Also please add interface into bpf_seq_printf.
BTF enabled struct prints is useful for iterators too
and generalization you've done in this patch pretty much takes it there.

> +/*
> + * Options to control show behaviour.
> + *	- BTF_SHOW_COMPACT: no formatting around type information
> + *	- BTF_SHOW_NONAME: no struct/union member names/types
> + *	- BTF_SHOW_PTR_RAW: show raw (unobfuscated) pointer values;
> + *	  equivalent to %px.
> + *	- BTF_SHOW_ZERO: show zero-valued struct/union members; they
> + *	  are not displayed by default
> + *	- BTF_SHOW_NONEWLINE: include indent, but suppress newline;
> + *	  to be used when a show function implicitly includes a newline.
> + *	- BTF_SHOW_UNSAFE: skip use of bpf_probe_read() to safely read
> + *	  data before displaying it.
> + */
> +#define BTF_SHOW_COMPACT	(1ULL << 0)
> +#define BTF_SHOW_NONAME		(1ULL << 1)
> +#define BTF_SHOW_PTR_RAW	(1ULL << 2)
> +#define BTF_SHOW_ZERO		(1ULL << 3)
> +#define BTF_SHOW_NONEWLINE	(1ULL << 32)
> +#define BTF_SHOW_UNSAFE		(1ULL << 33)

I could have missed it earlier, but what is the motivation to leave the gap
in bits? Just do bit 4 and 5 ?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH bpf-next 2/4] bpf: make BTF show support generic, apply to seq files/bpf_trace_printk
  2020-08-13  1:46   ` Alexei Starovoitov
@ 2020-08-14 13:06     ` Alan Maguire
  2020-08-14 17:01       ` Alexei Starovoitov
  0 siblings, 1 reply; 11+ messages in thread
From: Alan Maguire @ 2020-08-14 13:06 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alan Maguire, ast, daniel, andriin, yhs, linux,
	andriy.shevchenko, pmladek, kafai, songliubraving,
	john.fastabend, kpsingh, shuah, rdna, scott.branden, quentin,
	cneirabustos, jakub, mingo, rostedt, bpf, netdev, linux-kernel,
	linux-kselftest

On Wed, 12 Aug 2020, Alexei Starovoitov wrote:

> On Thu, Aug 06, 2020 at 03:42:23PM +0100, Alan Maguire wrote:
> > 
> > The bpf_trace_printk tracepoint is augmented with a "trace_id"
> > field; it is used to allow tracepoint filtering as typed display
> > information can easily be interspersed with other tracing data,
> > making it hard to read.  Specifying a trace_id will allow users
> > to selectively trace data, eliminating noise.
> 
> Since trace_id is not seen in trace_pipe, how do you expect users
> to filter by it?

Sorry should have specified this.  The approach is to use trace
instances and filtering such that we only see events associated
with a specific trace_id.  There's no need for the trace event to
actually display the trace_id - it's still usable as a filter.
The steps involved are:

1. create a trace instance within which we can specify a fresh
   set of trace event enablings, filters etc.

mkdir /sys/kernel/debug/tracing/instances/traceid100

2. enable the filter for the specific trace id

echo "trace_id == 100" > 
/sys/kernel/debug/tracing/instances/traceid100/events/bpf_trace/bpf_trace_printk/filter

3. enable the trace event

echo 1 > 
/sys/kernel/debug/tracing/instances/traceid100/events/bpf_trace/bpf_trace_printk/enable

4. ensure the BPF program uses a trace_id 100 when calling bpf_trace_btf()

So the above can be done for multiple programs; output can then
be separated for different programs if trace_ids and filtering are
used together. The above trace instance only sees bpf_trace_btf()
events which specify trace_id 100.
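
In the BPF program itself, step 4 is just a matter of passing 100 as
the trace_id argument, as the attached test does:

	bpf_trace_btf(&p, sizeof(p), 100, 0);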

I've attached a tweaked version of the patch 4 in the patchset that 
ensures that a trace instance with filtering enabled as above sees
the bpf_trace_btf events, but _not_ bpf_trace_printk events (since
they have trace_id 0 by default).

To me the above provides a simple way to separate BPF program output
for simple BPF programs; ringbuf and perf events require a bit more
work in both BPF and userspace to support such coordination.  What do
you think - does this approach seem worth using? If so we could also
consider extending it to bpf_trace_printk(), if we can find a way
to provide a trace_id there too.
 
> It also feels like workaround. May be let bpf prog print the whole
> struct in one go with multiple new lines and call
> trace_bpf_trace_printk(buf) once?

We can do that absolutely, but I'd be interested to get your take
on the filtering mechanism before taking that approach.  I'll add
a description of the above mechanism to the cover letter and
patch to be clearer next time too.

> 
> Also please add interface into bpf_seq_printf.
> BTF enabled struct prints is useful for iterators too
> and generalization you've done in this patch pretty much takes it there.
> 

Sure, I'll try and tackle that next time.

> > +#define BTF_SHOW_COMPACT	(1ULL << 0)
> > +#define BTF_SHOW_NONAME		(1ULL << 1)
> > +#define BTF_SHOW_PTR_RAW	(1ULL << 2)
> > +#define BTF_SHOW_ZERO		(1ULL << 3)
> > +#define BTF_SHOW_NONEWLINE	(1ULL << 32)
> > +#define BTF_SHOW_UNSAFE		(1ULL << 33)
> 
> I could have missed it earlier, but what is the motivation to leave the gap
> in bits? Just do bit 4 and 5 ?
> 

Patch 3 uses the first 4 as flags to bpf_trace_btf();
the final two are not supported for the helper as flag values
so I wanted to leave some space for additional bpf_trace_btf() flags.
BTF_SHOW_NONEWLINE is always used for bpf_trace_btf(), since the
tracing adds a newline for us and we don't want to double up on newlines, 
so it's ORed in as an implicit argument for the bpf_trace_btf() case. 
BTF_SHOW_UNSAFE isn't allowed within BPF so it's not available as
a flag for the helper.

Thanks!

Alan

From 10bd268b2585084c8f35d1b6ab0c3df76203f5cc Mon Sep 17 00:00:00 2001
From: Alan Maguire <alan.maguire@oracle.com>
Date: Thu, 6 Aug 2020 14:21:10 +0200
Subject: [PATCH] selftests/bpf: add bpf_trace_btf helper tests

Basic tests verifying various flag combinations for bpf_trace_btf()
using a tp_btf program to trace skb data.  Also verify that we
can create a trace instance to filter trace data, using the
trace_id value passed to bpf_trace/bpf_trace_printk events.

trace_id is specifiable for bpf_trace_btf() so the test ensures
the trace instance sees the filtered events only.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 tools/testing/selftests/bpf/prog_tests/trace_btf.c | 150 +++++++++++++++++++++
 .../selftests/bpf/progs/netif_receive_skb.c        |  48 +++++++
 2 files changed, 198 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/trace_btf.c
 create mode 100644 tools/testing/selftests/bpf/progs/netif_receive_skb.c

diff --git a/tools/testing/selftests/bpf/prog_tests/trace_btf.c b/tools/testing/selftests/bpf/prog_tests/trace_btf.c
new file mode 100644
index 0000000..64a920f
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/trace_btf.c
@@ -0,0 +1,150 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+
+#include "netif_receive_skb.skel.h"
+
+#define TRACEFS		"/sys/kernel/debug/tracing/"
+#define TRACEBUF	TRACEFS "trace_pipe"
+#define TRACEID		"100"
+#define TRACEINST	TRACEFS "instances/traceid" TRACEID "/"
+#define TRACEINSTBUF	TRACEINST "trace_pipe"
+#define TRACEINSTEVENT	TRACEINST "events/bpf_trace/bpf_trace_printk/"
+#define BTFMSG		"(struct sk_buff)"
+#define PRINTKMSG	"testing,testing"
+#define MAXITER		1000
+
+static bool findmsg(FILE *fp, const char *msg)
+{
+	bool found = false;
+	char *buf = NULL;
+	size_t buflen;
+	int iter = 0;
+
+	/* verify our search string is in the trace buffer */
+	while (++iter < MAXITER &&
+	       (getline(&buf, &buflen, fp) >= 0 || errno == EAGAIN)) {
+		found = buf && strstr(buf, msg) != NULL;
+		if (found)
+			break;
+	}
+	free(buf);
+
+	return found;
+}
+
+/* Demonstrate that bpf_trace_btf succeeds with non-zero return values,
+ * and that filtering by trace instance works.  bpf_trace_btf() is called
+ * with trace_id of 100, so we create a trace instance that enables 
+ * bpf_trace_printk() events and filters on trace_id.  The trace instance
+ * pipe should therefore only see the trace_id == 100 events, i.e. the
+ * bpf_trace_btf() events.  The bpf program also uses bpf_trace_printk() -
+ * we ensure the global trace_pipe sees those but the instance does not.
+ */
+void test_trace_btf(void)
+{
+	struct netif_receive_skb *skel;
+	struct netif_receive_skb__bss *bss;
+	FILE *fp = NULL, *fpinst = NULL;
+	int err, duration = 0;
+
+	skel = netif_receive_skb__open();
+	if (CHECK(!skel, "skel_open", "failed to open skeleton\n"))
+		return;
+
+	err = netif_receive_skb__load(skel);
+	if (CHECK(err, "skel_load", "failed to load skeleton: %d\n", err))
+		goto cleanup;
+
+	bss = skel->bss;
+
+	err = netif_receive_skb__attach(skel);
+	if (CHECK(err, "skel_attach", "skeleton attach failed: %d\n", err))
+		goto cleanup;
+
+	/* Enable trace instance which filters on trace id associated with
+	 * bpf_trace_btf().
+	 */
+	if (CHECK(system("mkdir " TRACEINST), "could not create trace instance",
+		  "error creating %s", TRACEINST))
+		goto cleanup;
+
+	if (CHECK(system("echo \"trace_id == " TRACEID "\" > " TRACEINSTEVENT "filter"),
+		  "could not specify trace id filter", "error setting filter %s\n", TRACEID))
+		goto cleanup;
+	if (CHECK(system("echo 1 > " TRACEINSTEVENT "enable"), "could not enable trace event",
+		  "error enabling %s", TRACEINSTEVENT))
+		goto cleanup;
+
+	fp = fopen(TRACEBUF, "r");
+	if (CHECK(fp == NULL, "could not open trace buffer",
+		  "error %d opening %s", errno, TRACEBUF))
+		goto cleanup;
+
+	/* We do not want to wait forever if this test fails... */
+	fcntl(fileno(fp), F_SETFL, O_NONBLOCK);
+
+	fpinst = fopen(TRACEINSTBUF, "r");
+	if (CHECK(fpinst == NULL, "could not open instance trace buffer",
+		  "error %d opening %s", errno, TRACEINSTBUF))
+		goto cleanup;
+
+	/* We do not want to wait forever if this test fails... */
+	fcntl(fileno(fpinst), F_SETFL, O_NONBLOCK);
+
+	/* generate receive event */
+	system("ping -c 1 127.0.0.1 > /dev/null");
+
+	/*
+	 * Make sure netif_receive_skb program was triggered
+	 * and it set expected return values from bpf_trace_btf()s
+	 * and all tests ran.
+	 */
+	if (CHECK(bss->ret <= 0,
+		  "bpf_trace_btf: got return value",
+		  "ret <= 0 %ld test %d\n", bss->ret, bss->num_subtests))
+		goto cleanup;
+
+	if (CHECK(bss->num_subtests != bss->ran_subtests, "check all subtests ran",
+	      "only ran %d of %d tests\n", bss->num_subtests,
+	      bss->ran_subtests))
+		goto cleanup;
+
+	/* All messages should be in global trace_pipe */
+	if (CHECK(!findmsg(fp, BTFMSG), "global trace pipe should have BTF",
+		  "btf data not in trace pipe"))
+		goto cleanup;
+
+	if (CHECK(!findmsg(fp, PRINTKMSG),
+		  "global trace pipe should have printk",
+		  "trace_printk data not in global trace pipe"))
+		goto cleanup;
+
+
+		  "instance trace pipe should have BTF",
+		  "btf data not in instance trace pipe"))
+		goto cleanup;
+
+
+	 * be in our trace instance log.
+	 */
+	CHECK(findmsg(fpinst, PRINTKMSG),
+	      "instance trace pipe should not have printk",
+	      "trace_printk data should not be in instance trace pipe");
+
+cleanup:
+	if (fpinst)
+		 fclose(fpinst);
+	if (fp)
+		fclose(fp);
+
+	system("echo 0 > " TRACEINSTEVENT "enable");
+	system("rmdir " TRACEINST);
+	netif_receive_skb__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/netif_receive_skb.c b/tools/testing/selftests/bpf/progs/netif_receive_skb.c
new file mode 100644
index 0000000..a14f79b
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/netif_receive_skb.c
@@ -0,0 +1,48 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020, Oracle and/or its affiliates. */
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+long ret = 0;
+int num_subtests = 0;
+int ran_subtests = 0;
+
+/* Note a trace id of 100 is used, allowing filtering. */
+#define CHECK_TRACE(_p, flags)						 	\
+	do {								 	\
+		++num_subtests;						 	\
+		if (ret >= 0) {						 	\
+			++ran_subtests;					 	\
+			ret = bpf_trace_btf(_p, sizeof(*(_p)), 100, flags);	\
+		}							 	\
+	} while (0)
+
+/* TRACE_EVENT(netif_receive_skb,
+ *	TP_PROTO(struct sk_buff *skb),
+ */
+SEC("tp_btf/netif_receive_skb")
+int BPF_PROG(trace_netif_receive_skb, struct sk_buff *skb)
+{
+	static const char printkmsg[] = "testing,testing, ran %d so far...";
+	static const char skb_type[] = "struct sk_buff";
+	static struct btf_ptr p = { };
+
+	p.ptr = skb;
+	p.type = skb_type;
+
+	/* This message will go to the global tracing log */
+	bpf_trace_printk(printkmsg, sizeof(printkmsg), ran_subtests);
+
+	CHECK_TRACE(&p, 0);
+	CHECK_TRACE(&p, BTF_TRACE_F_COMPACT);
+	CHECK_TRACE(&p, BTF_TRACE_F_NONAME);
+	CHECK_TRACE(&p, BTF_TRACE_F_PTR_RAW);
+	CHECK_TRACE(&p, BTF_TRACE_F_ZERO);
+	CHECK_TRACE(&p, BTF_TRACE_F_COMPACT | BTF_TRACE_F_NONAME |
+		    BTF_TRACE_F_PTR_RAW | BTF_TRACE_F_ZERO);
+
+	return 0;
+}
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH bpf-next 2/4] bpf: make BTF show support generic, apply to seq files/bpf_trace_printk
  2020-08-14 13:06     ` Alan Maguire
@ 2020-08-14 17:01       ` Alexei Starovoitov
  2020-08-18  9:12         ` Alan Maguire
  0 siblings, 1 reply; 11+ messages in thread
From: Alexei Starovoitov @ 2020-08-14 17:01 UTC (permalink / raw)
  To: Alan Maguire
  Cc: ast, daniel, andriin, yhs, linux, andriy.shevchenko, pmladek,
	kafai, songliubraving, john.fastabend, kpsingh, shuah, rdna,
	scott.branden, quentin, cneirabustos, jakub, mingo, rostedt, bpf,
	netdev, linux-kernel, linux-kselftest

On Fri, Aug 14, 2020 at 02:06:37PM +0100, Alan Maguire wrote:
> On Wed, 12 Aug 2020, Alexei Starovoitov wrote:
> 
> > On Thu, Aug 06, 2020 at 03:42:23PM +0100, Alan Maguire wrote:
> > > 
> > > The bpf_trace_printk tracepoint is augmented with a "trace_id"
> > > field; it is used to allow tracepoint filtering as typed display
> > > information can easily be interspersed with other tracing data,
> > > making it hard to read.  Specifying a trace_id will allow users
> > > to selectively trace data, eliminating noise.
> > 
> > Since trace_id is not seen in trace_pipe, how do you expect users
> > to filter by it?
> 
> Sorry should have specified this.  The approach is to use trace
> instances and filtering such that we only see events associated
> with a specific trace_id.  There's no need for the trace event to
> actually display the trace_id - it's still usable as a filter.
> The steps involved are:
> 
> 1. create a trace instance within which we can specify a fresh
>    set of trace event enablings, filters etc.
> 
> mkdir /sys/kernel/debug/tracing/instances/traceid100
> 
> 2. enable the filter for the specific trace id
> 
> echo "trace_id == 100" > 
> /sys/kernel/debug/tracing/instances/traceid100/events/bpf_trace/bpf_trace_printk/filter
> 
> 3. enable the trace event
> 
> echo 1 > 
> /sys/kernel/debug/tracing/instances/events/bpf_trace/bpf_trace_printk/enable
> 
> 4. ensure the BPF program uses a trace_id 100 when calling bpf_trace_btf()

ouch.
I think you interpreted the acceptance of the
commit 7fb20f9e901e ("bpf, doc: Remove references to warning message when using bpf_trace_printk()")
in the wrong way.

Everything that doc had said is still valid. In particular:
-A: This is done to nudge program authors into better interfaces when
-programs need to pass data to user space. Like bpf_perf_event_output()
-can be used to efficiently stream data via perf ring buffer.
-BPF maps can be used for asynchronous data sharing between kernel
-and user space. bpf_trace_printk() should only be used for debugging.

bpf_trace_printk is for debugging only. _debugging of bpf programs themselves_.
What you're describing above is logging and tracing. It's not debugging of programs.
perf buffer, ring buffer, and seq_file interfaces are the right
interfaces for tracing, logging, and kernel debugging.

> > It also feels like workaround. May be let bpf prog print the whole
> > struct in one go with multiple new lines and call
> > trace_bpf_trace_printk(buf) once?
> 
> We can do that absolutely, but I'd be interested to get your take
> on the filtering mechanism before taking that approach.  I'll add
> a description of the above mechanism to the cover letter and
> patch to be clearer next time too.

I think patch 3 is no go, because it takes bpf_trace_printk in
the wrong direction.
Instead please refactor it to use string buffer or seq_file as an output.
If the user happen to use bpf_trace_printk("%s", buf);
after that to print that string buffer to trace_pipe that's user's choice.
I can see such use case when program author wants to debug
their bpf program. That's fine. But for kernel debugging, on demand and
"always on" logging and tracing the documentation should point
to sustainable interfaces that don't interfere with each other,
can be run in parallel by multiple users, etc.

> > 
> > Also please add interface into bpf_seq_printf.
> > BTF enabled struct prints is useful for iterators too
> > and generalization you've done in this patch pretty much takes it there.
> > 
> 
> Sure, I'll try and tackle that next time.
> 
> > > +#define BTF_SHOW_COMPACT	(1ULL << 0)
> > > +#define BTF_SHOW_NONAME		(1ULL << 1)
> > > +#define BTF_SHOW_PTR_RAW	(1ULL << 2)
> > > +#define BTF_SHOW_ZERO		(1ULL << 3)
> > > +#define BTF_SHOW_NONEWLINE	(1ULL << 32)
> > > +#define BTF_SHOW_UNSAFE		(1ULL << 33)
> > 
> > I could have missed it earlier, but what is the motivation to leave the gap
> > in bits? Just do bit 4 and 5 ?
> > 
> 
> Patch 3 uses the first 4 as flags to bpf_trace_btf();
> the final two are not supported for the helper as flag values
> so I wanted to leave some space for additional bpf_trace_btf() flags.
> BTF_SHOW_NONEWLINE is always used for bpf_trace_btf(), since the
> tracing adds a newline for us and we don't want to double up on newlines, 
> so it's ORed in as an implicit argument for the bpf_trace_btf() case. 
> BTF_SHOW_UNSAFE isn't allowed within BPF so it's not available as
> a flag for the helper.

I see. I think such approach is more error prone than explicit filtering.
It's better to check for NONEWLINE and UNSAFE explicitly in those places
instead of playing obscure game of upper bits.
Also the choice of NONEWLINE vs NEWLINE should be available from the program
when it wants to print it into a string or into a seq_file.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH bpf-next 2/4] bpf: make BTF show support generic, apply to seq files/bpf_trace_printk
  2020-08-14 17:01       ` Alexei Starovoitov
@ 2020-08-18  9:12         ` Alan Maguire
  2020-08-20 22:16           ` Alexei Starovoitov
  0 siblings, 1 reply; 11+ messages in thread
From: Alan Maguire @ 2020-08-18  9:12 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alan Maguire, ast, daniel, andriin, yhs, linux,
	andriy.shevchenko, pmladek, kafai, songliubraving,
	john.fastabend, kpsingh, shuah, rdna, scott.branden, quentin,
	cneirabustos, jakub, mingo, rostedt, bpf, netdev, linux-kernel,
	linux-kselftest


On Fri, 14 Aug 2020, Alexei Starovoitov wrote:

> On Fri, Aug 14, 2020 at 02:06:37PM +0100, Alan Maguire wrote:
> > On Wed, 12 Aug 2020, Alexei Starovoitov wrote:
> > 
> > > On Thu, Aug 06, 2020 at 03:42:23PM +0100, Alan Maguire wrote:
> > > > 
> > > > The bpf_trace_printk tracepoint is augmented with a "trace_id"
> > > > field; it is used to allow tracepoint filtering as typed display
> > > > information can easily be interspersed with other tracing data,
> > > > making it hard to read.  Specifying a trace_id will allow users
> > > > to selectively trace data, eliminating noise.
> > > 
> > > Since trace_id is not seen in trace_pipe, how do you expect users
> > > to filter by it?
> > 
> > Sorry should have specified this.  The approach is to use trace
> > instances and filtering such that we only see events associated
> > with a specific trace_id.  There's no need for the trace event to
> > actually display the trace_id - it's still usable as a filter.
> > The steps involved are:
> > 
> > 1. create a trace instance within which we can specify a fresh
> >    set of trace event enablings, filters etc.
> > 
> > mkdir /sys/kernel/debug/tracing/instances/traceid100
> > 
> > 2. enable the filter for the specific trace id
> > 
> > echo "trace_id == 100" > 
> > /sys/kernel/debug/tracing/instances/traceid100/events/bpf_trace/bpf_trace_printk/filter
> > 
> > 3. enable the trace event
> > 
> > echo 1 > 
> > /sys/kernel/debug/tracing/instances/traceid100/events/bpf_trace/bpf_trace_printk/enable
> > 
> > 4. ensure the BPF program uses a trace_id 100 when calling bpf_trace_btf()
> 
> ouch.
> I think you interpreted the acceptance of the
> commit 7fb20f9e901e ("bpf, doc: Remove references to warning message when using bpf_trace_printk()")
> in the wrong way.
> 
> Everything that doc had said is still valid. In particular:
> -A: This is done to nudge program authors into better interfaces when
> -programs need to pass data to user space. Like bpf_perf_event_output()
> -can be used to efficiently stream data via perf ring buffer.
> -BPF maps can be used for asynchronous data sharing between kernel
> -and user space. bpf_trace_printk() should only be used for debugging.
> 
> bpf_trace_printk is for debugging only. _debugging of bpf programs themselves_.
> What you're describing above is logging and tracing. It's not debugging of programs.
> perf buffer, ring buffer, and seq_file interfaces are the right
> interfaces for tracing, logging, and kernel debugging.
> 
> > > It also feels like workaround. May be let bpf prog print the whole
> > > struct in one go with multiple new lines and call
> > > trace_bpf_trace_printk(buf) once?
> > 
> > We can do that absolutely, but I'd be interested to get your take
> > on the filtering mechanism before taking that approach.  I'll add
> > a description of the above mechanism to the cover letter and
> > patch to be clearer next time too.
> 
> I think patch 3 is no go, because it takes bpf_trace_printk in
> the wrong direction.
> Instead please refactor it to use string buffer or seq_file as an output.

Fair enough. I'm thinking a helper like

long bpf_btf_snprintf(char *str, u32 str_size, struct btf_ptr *ptr,
		      u32 ptr_size, u64 flags);

Then the user can choose perf event or ringbuf interfaces
to share the results with userspace.
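
As a rough sketch of usage (nothing here exists yet; the "events"
perf event array map and the surrounding program are assumptions):

	char buf[256];
	struct btf_ptr p = { .ptr = skb, .type = skb_type };
	long len;

	/* render the skb as a string, then stream it to userspace */
	len = bpf_btf_snprintf(buf, sizeof(buf), &p, sizeof(p), 0);
	if (len > 0)
		bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
				      buf, sizeof(buf));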

> If the user happen to use bpf_trace_printk("%s", buf);
> after that to print that string buffer to trace_pipe that's user's choice.
> I can see such use case when program author wants to debug
> their bpf program. That's fine. But for kernel debugging, on demand and
> "always on" logging and tracing the documentation should point
> to sustainable interfaces that don't interfere with each other,
> can be run in parallel by multiple users, etc.
> 

The problem with bpf_trace_printk() under this approach is
that the string size for %s arguments is very limited;
bpf_trace_printk() restricts these to 64 bytes in size.
Looks like bpf_seq_printf() restricts a %s string to 128
bytes also.  We could add an additional helper for the 
bpf_seq case which calls bpf_seq_printf() for each component
in the object, i.e.

long bpf_seq_btf_printf(struct seq_file *m, struct btf_ptr *ptr,
			u32 ptr_size, u64 flags);

This would steer users away from bpf_trace_printk()
for this use case - since it can print only a small
amount of the string - while supporting all 
the other user-space communication mechanisms.
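
In an iterator program this might look something like the following
(only a sketch against the proposed signature; the context fields
are those of the existing task iterator):

	SEC("iter/task")
	int dump_task(struct bpf_iter__task *ctx)
	{
		static const char t[] = "struct task_struct";
		struct btf_ptr p = { };

		if (!ctx->task)
			return 0;

		/* render each task_struct into the iterator's seq_file */
		p.ptr = ctx->task;
		p.type = t;
		bpf_seq_btf_printf(ctx->meta->seq, &p, sizeof(p), 0);
		return 0;
	}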

Alan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH bpf-next 3/4] bpf: add bpf_trace_btf helper
  2020-08-06 14:42 ` [RFC PATCH bpf-next 3/4] bpf: add bpf_trace_btf helper Alan Maguire
@ 2020-08-20  6:36   ` Andrii Nakryiko
  0 siblings, 0 replies; 11+ messages in thread
From: Andrii Nakryiko @ 2020-08-20  6:36 UTC (permalink / raw)
  To: Alan Maguire
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Yonghong Song, Rasmus Villemoes, andriy.shevchenko, Petr Mladek,
	Martin Lau, Song Liu, john fastabend, KP Singh, Shuah Khan,
	Andrey Ignatov, scott.branden, Quentin Monnet, Carlos Neira,
	Jakub Sitnicki, Ingo Molnar, Steven Rostedt, bpf, Networking,
	open list, open list:KERNEL SELFTEST FRAMEWORK

On Thu, Aug 6, 2020 at 1:24 PM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> A helper is added to support tracing kernel type information in BPF
> using the BPF Type Format (BTF).  Its signature is
>
> long bpf_trace_btf(struct btf_ptr *ptr, u32 btf_ptr_size, u32 trace_id,
>                    u64 flags);
>
> struct btf_ptr * specifies
>
> - a pointer to the data to be traced;
> - the BTF id of the type of data pointed to; or
> - a string representation of the type of data pointed to
> - a flags field is provided for future use; these flags
>   are not to be confused with the BTF_TRACE_F_* flags
>   below that control how the btf_ptr is displayed; the
>   flags member of the struct btf_ptr may be used to
>   disambiguate types in kernel versus module BTF, etc;
>   the main distinction is that these flags relate to the
>   type and the information needed to identify it, not to
>   how it is displayed.
>
> The helper also specifies a trace id which is set for the
> bpf_trace_printk tracepoint; this allows users to filter
> trace events on specific trace ids, ensuring output from
> different traced events does not become intermixed and
> hard to read.
>
> For example a BPF program with a struct sk_buff *skb
> could do the following:
>
>         static const char *skb_type = "struct sk_buff";
>         static struct btf_ptr b = { };
>
>         b.ptr = skb;
>         b.type = skb_type;
>         bpf_trace_btf(&b, sizeof(b), 0, 0);
>
> Default output in the trace_pipe looks like this:
>
>           <idle>-0     [023] d.s.  1825.778400: bpf_trace_printk: (struct sk_buff){
>           <idle>-0     [023] d.s.  1825.778409: bpf_trace_printk:  (union){
>           <idle>-0     [023] d.s.  1825.778410: bpf_trace_printk:   (struct){
>           <idle>-0     [023] d.s.  1825.778412: bpf_trace_printk:    .prev = (struct sk_buff *)0x00000000b2a3df7e,
>           <idle>-0     [023] d.s.  1825.778413: bpf_trace_printk:    (union){
>           <idle>-0     [023] d.s.  1825.778414: bpf_trace_printk:     .dev = (struct net_device *)0x000000001658808b,
>           <idle>-0     [023] d.s.  1825.778416: bpf_trace_printk:     .dev_scratch = (long unsigned int)18446628460391432192,
>           <idle>-0     [023] d.s.  1825.778417: bpf_trace_printk:    },
>           <idle>-0     [023] d.s.  1825.778417: bpf_trace_printk:   },
>           <idle>-0     [023] d.s.  1825.778418: bpf_trace_printk:   .rbnode = (struct rb_node){
>           <idle>-0     [023] d.s.  1825.778419: bpf_trace_printk:    .rb_right = (struct rb_node *)0x00000000b2a3df7e,
>           <idle>-0     [023] d.s.  1825.778420: bpf_trace_printk:    .rb_left = (struct rb_node *)0x000000001658808b,
>           <idle>-0     [023] d.s.  1825.778420: bpf_trace_printk:   },
>           <idle>-0     [023] d.s.  1825.778421: bpf_trace_printk:   .list = (struct list_head){
>           <idle>-0     [023] d.s.  1825.778422: bpf_trace_printk:    .prev = (struct list_head *)0x00000000b2a3df7e,
>           <idle>-0     [023] d.s.  1825.778422: bpf_trace_printk:   },
>           <idle>-0     [023] d.s.  1825.778422: bpf_trace_printk:  },
>           <idle>-0     [023] d.s.  1825.778426: bpf_trace_printk:  .len = (unsigned int)168,
>           <idle>-0     [023] d.s.  1825.778427: bpf_trace_printk:  .mac_len = (__u16)14,
>           <idle>-0     [023] d.s.  1825.778428: bpf_trace_printk:  .queue_mapping = (__u16)17,
>           <idle>-0     [023] d.s.  1825.778430: bpf_trace_printk:  .head_frag = (__u8)0x1,
>           <idle>-0     [023] d.s.  1825.778431: bpf_trace_printk:  .ip_summed = (__u8)0x1,
>           <idle>-0     [023] d.s.  1825.778432: bpf_trace_printk:  .l4_hash = (__u8)0x1,
>           <idle>-0     [023] d.s.  1825.778433: bpf_trace_printk:  .hash = (__u32)1873247608,
>           <idle>-0     [023] d.s.  1825.778434: bpf_trace_printk:  (union){
>           <idle>-0     [023] d.s.  1825.778435: bpf_trace_printk:   .napi_id = (unsigned int)8209,
>           <idle>-0     [023] d.s.  1825.778436: bpf_trace_printk:   .sender_cpu = (unsigned int)8209,
>           <idle>-0     [023] d.s.  1825.778436: bpf_trace_printk:  },
>           <idle>-0     [023] d.s.  1825.778437: bpf_trace_printk:  .protocol = (__be16)8,
>           <idle>-0     [023] d.s.  1825.778438: bpf_trace_printk:  .transport_header = (__u16)226,
>           <idle>-0     [023] d.s.  1825.778439: bpf_trace_printk:  .network_header = (__u16)206,
>           <idle>-0     [023] d.s.  1825.778440: bpf_trace_printk:  .mac_header = (__u16)192,
>           <idle>-0     [023] d.s.  1825.778440: bpf_trace_printk:  .tail = (sk_buff_data_t)374,
>           <idle>-0     [023] d.s.  1825.778441: bpf_trace_printk:  .end = (sk_buff_data_t)1728,
>           <idle>-0     [023] d.s.  1825.778442: bpf_trace_printk:  .head = (unsigned char *)0x000000009798cb6b,
>           <idle>-0     [023] d.s.  1825.778443: bpf_trace_printk:  .data = (unsigned char *)0x0000000064823282,
>           <idle>-0     [023] d.s.  1825.778444: bpf_trace_printk:  .truesize = (unsigned int)2304,
>           <idle>-0     [023] d.s.  1825.778445: bpf_trace_printk:  .users = (refcount_t){
>           <idle>-0     [023] d.s.  1825.778445: bpf_trace_printk:   .refs = (atomic_t){
>           <idle>-0     [023] d.s.  1825.778447: bpf_trace_printk:    .counter = (int)1,
>           <idle>-0     [023] d.s.  1825.778447: bpf_trace_printk:   },
>           <idle>-0     [023] d.s.  1825.778447: bpf_trace_printk:  },
>           <idle>-0     [023] d.s.  1825.778448: bpf_trace_printk: }
>
> Flags modifying display are as follows:
>
> - BTF_TRACE_F_COMPACT:  no formatting around type information
> - BTF_TRACE_F_NONAME:   no struct/union member names/types
> - BTF_TRACE_F_PTR_RAW:  show raw (unobfuscated) pointer values,
>                         equivalent to %px
> - BTF_TRACE_F_ZERO:     show zero-valued struct/union members;
>                         they are not displayed by default
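>
> These can be combined; e.g. a compact dump without member names
> (illustrative only):
>
>         bpf_trace_btf(&b, sizeof(b), 0,
>                       BTF_TRACE_F_COMPACT | BTF_TRACE_F_NONAME);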
>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
>  include/linux/bpf.h            |   1 +
>  include/linux/btf.h            |   9 ++--
>  include/uapi/linux/bpf.h       |  63 +++++++++++++++++++++++++
>  kernel/bpf/core.c              |   5 ++
>  kernel/bpf/helpers.c           |   4 ++
>  kernel/trace/bpf_trace.c       | 102 ++++++++++++++++++++++++++++++++++++++++-
>  scripts/bpf_helpers_doc.py     |   2 +
>  tools/include/uapi/linux/bpf.h |  63 +++++++++++++++++++++++++
>  8 files changed, 243 insertions(+), 6 deletions(-)
>

[...]

> +/*
> + * struct btf_ptr is used for typed pointer display; the
> + * additional type string/BTF type id are used to render the pointer
> + * data as the appropriate type via the bpf_trace_btf() helper
> + * above.  A flags field - potentially to specify additional details
> + * about the BTF pointer (rather than its mode of display) - is
> + * present for future use.  Display flags - BTF_TRACE_F_* - are
> + * passed to display functions separately.
> + */
> +struct btf_ptr {
> +       void *ptr;
> +       const char *type;
> +       __u32 type_id;
> +       __u32 flags;            /* BTF ptr flags; unused at present. */
> +};

Would it be possible to just utilize __builtin_btf_type_id() to pass
BTF type id directly, without this string -> BTF type translation?
Please check [0] to see if that would make sense here. Thanks.

  [0] https://patchwork.ozlabs.org/project/netdev/patch/20200819194519.3375898-4-andriin@fb.com/
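
If so, the string -> id step could be dropped on the BPF side too,
e.g. (a sketch; assuming flag value 1 asks for the relocatable
kernel BTF type id, as in [0]):

        struct btf_ptr b = {};

        b.ptr = skb;
        /* kernel BTF id of struct sk_buff, resolved at load time */
        b.type_id = __builtin_btf_type_id(*skb, 1);
        bpf_trace_btf(&b, sizeof(b), 0, 0);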

[...]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH bpf-next 2/4] bpf: make BTF show support generic, apply to seq files/bpf_trace_printk
  2020-08-18  9:12         ` Alan Maguire
@ 2020-08-20 22:16           ` Alexei Starovoitov
  0 siblings, 0 replies; 11+ messages in thread
From: Alexei Starovoitov @ 2020-08-20 22:16 UTC (permalink / raw)
  To: Alan Maguire
  Cc: ast, daniel, andriin, yhs, linux, andriy.shevchenko, pmladek,
	kafai, songliubraving, john.fastabend, kpsingh, shuah, rdna,
	scott.branden, quentin, cneirabustos, jakub, mingo, rostedt, bpf,
	netdev, linux-kernel, linux-kselftest

On Tue, Aug 18, 2020 at 10:12:05AM +0100, Alan Maguire wrote:
> 
> Fair enough. I'm thinking a helper like
> 
> long bpf_btf_snprintf(char *str, u32 str_size, struct btf_ptr *ptr,
> 		      u32 ptr_size, u64 flags);
> 
> Then the user can choose perf event or ringbuf interfaces
> to share the results with userspace.

makes sense.

> 
> > If the user happens to use bpf_trace_printk("%s", buf);
> > after that to print that string buffer to trace_pipe, that's the user's choice.
> > I can see such a use case when a program author wants to debug
> > their bpf program. That's fine. But for kernel debugging, and for
> > on-demand and "always on" logging and tracing, the documentation should
> > point to sustainable interfaces that don't interfere with each other,
> > can be run in parallel by multiple users, etc.
> > 
> 
> The problem with bpf_trace_printk() under this approach is
> that the string size for %s arguments is very limited;
> bpf_trace_printk() restricts these to 64 bytes.
> It looks like bpf_seq_printf() restricts a %s string to 128
> bytes as well.  We could add an additional helper for the
> bpf_seq case which calls bpf_seq_printf() for each component
> in the object, i.e.
> 
> long bpf_seq_btf_printf(struct seq_file *m, struct btf_ptr *ptr,
> 			u32 ptr_size, u64 flags);

yeah. this one is needed as well.
Please double-check that it works from the bpf iterator too,
especially the case where the output size is not a power of 2.
bpf_iter.c keeps a page-sized buffer and will restart the
iteration on overflow.
Could you please modify progs/bpf_iter_task.c to do
bpf_seq_btf_printf(seq, {task, }, 0);
and check that the output doesn't contain repeated/garbled lines.

Also I'm not sure why you see the 64-byte limit of bpf_trace_printk as a blocker.
We could do:
if (in_nmi())
  use 64 byte on stack
else
  spin_lock_irqsave and use 1k static buffer.
and/or we can introduce bpf_trace_puts(const char *buf, u32 size_of_buf);
with
        .arg1_type      = ARG_PTR_TO_MEM,
        .arg2_type      = ARG_CONST_SIZE,
add a null-termination check, and call trace_bpf_trace_printk() on that
buffer. That will avoid a bunch of copies.
But that still doesn't make bpf_trace_puts() a good interface for
dumping stuff to user space. It's still a debug-only feature.
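
Roughly, the non-NMI fallback could look like this (a sketch only,
not merged code; bpf_do_trace_vprintk() is a hypothetical name and
trace_bpf_trace_printk() is the tracepoint this series adds):

static DEFINE_RAW_SPINLOCK(trace_printk_lock);
static char trace_printk_buf[1024];

static int bpf_do_trace_vprintk(const char *fmt, va_list args)
{
        char stack_buf[64];
        unsigned long flags;
        int ret;

        if (in_nmi()) {
                /* NMI context can't take locks; use a small stack buffer */
                ret = vsnprintf(stack_buf, sizeof(stack_buf), fmt, args);
                trace_bpf_trace_printk(stack_buf);
                return ret;
        }

        raw_spin_lock_irqsave(&trace_printk_lock, flags);
        ret = vsnprintf(trace_printk_buf, sizeof(trace_printk_buf), fmt, args);
        trace_bpf_trace_printk(trace_printk_buf);
        raw_spin_unlock_irqrestore(&trace_printk_lock, flags);
        return ret;
}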

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-08-20 22:16 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-06 14:42 [RFC PATCH bpf-next 0/4] bpf: add bpf-based bpf_trace_printk()-like support Alan Maguire
2020-08-06 14:42 ` [RFC PATCH bpf-next 1/4] bpf: provide function to get vmlinux BTF information Alan Maguire
2020-08-06 14:42 ` [RFC PATCH bpf-next 2/4] bpf: make BTF show support generic, apply to seq files/bpf_trace_printk Alan Maguire
2020-08-13  1:46   ` Alexei Starovoitov
2020-08-14 13:06     ` Alan Maguire
2020-08-14 17:01       ` Alexei Starovoitov
2020-08-18  9:12         ` Alan Maguire
2020-08-20 22:16           ` Alexei Starovoitov
2020-08-06 14:42 ` [RFC PATCH bpf-next 3/4] bpf: add bpf_trace_btf helper Alan Maguire
2020-08-20  6:36   ` Andrii Nakryiko
2020-08-06 14:42 ` [RFC PATCH bpf-next 4/4] selftests/bpf: add bpf_trace_btf helper tests Alan Maguire
