* [PATCH v2 bpf-next 0/6] BPF verifier rotating log
@ 2023-03-28 23:56 Andrii Nakryiko
  2023-03-28 23:56 ` [PATCH v2 bpf-next 1/6] bpf: split off basic BPF verifier log into separate file Andrii Nakryiko
                   ` (7 more replies)
  0 siblings, 8 replies; 24+ messages in thread
From: Andrii Nakryiko @ 2023-03-28 23:56 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau, lmb, timo, robin.goegge; +Cc: andrii, kernel-team

This patch set changes the BPF verifier log to behave as a rotating log
by default. If the user-supplied log buffer is big enough to contain the
entire verifier log output, there is no effective difference. But where
previously a too-small user-supplied log buffer would yield an -ENOSPC
error and the beginning part of the verifier log, the user now gets the
ending part of the verifier log filling up the log buffer (-ENOSPC is
still returned to signal truncation; see the v1->v2 changelog). This is,
in the absolute majority of cases, exactly what's useful, relevant, and
what users want and need, as the end of the verifier log contains the
details of the verifier failure and the relevant state that led to it.
So this rotating mode is made the default, but for some niche advanced
debugging scenarios it's possible to request the old behavior by
specifying the additional BPF_LOG_FIXED (8) flag.
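
To illustrate from the user-space side, here is a minimal sketch using
libbpf's bpf_prog_load() (program name and instructions are
placeholders; passing the extra flag through libbpf relies on patch 4
of this series):

  #include <errno.h>
  #include <stdio.h>
  #include <bpf/bpf.h>

  static char log_buf[4096];

  int load_with_log(const struct bpf_insn *insns, size_t insn_cnt)
  {
          LIBBPF_OPTS(bpf_prog_load_opts, opts,
                  .log_buf = log_buf,
                  .log_size = sizeof(log_buf),
                  .log_level = 1,         /* rotating mode is the new default */
                  /* .log_level = 1 | 8 would request old BPF_LOG_FIXED mode */
          );
          int fd;

          fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, "sketch", "GPL",
                             insns, insn_cnt, &opts);
          if (fd < 0 && errno == ENOSPC)
                  /* log_buf now holds the *last* bytes of the full log
                   * (the first bytes, if BPF_LOG_FIXED was requested)
                   */
                  fprintf(stderr, "truncated verifier log:\n%s\n", log_buf);
          return fd;
  }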

This patch set adjusts libbpf to allow specifying flags beyond
1 | 2 | 4. We also add --log-size and --log-fixed options to veristat,
both to test this functionality manually and for use in various
debugging scenarios. We also add selftests that try many variants of
log buffer size to stress-test the correctness of the internal verifier
log bookkeeping code.

v1->v2:
  - return -ENOSPC even in rotating log mode for preserving backwards
    compatibility (Lorenz);

Andrii Nakryiko (6):
  bpf: split off basic BPF verifier log into separate file
  bpf: remove minimum size restrictions on verifier log buffer
  bpf: switch BPF verifier log to be a rotating log by default
  libbpf: don't enforce verifier log levels on libbpf side
  selftests/bpf: add more veristat control over verifier log options
  selftests/bpf: add fixed vs rotating verifier log tests

 include/linux/bpf_verifier.h                  |  48 ++--
 kernel/bpf/Makefile                           |   3 +-
 kernel/bpf/btf.c                              |   3 +-
 kernel/bpf/log.c                              | 262 ++++++++++++++++++
 kernel/bpf/verifier.c                         |  88 +-----
 tools/lib/bpf/bpf.c                           |   2 -
 .../selftests/bpf/prog_tests/log_fixup.c      |   1 +
 .../selftests/bpf/prog_tests/verifier_log.c   | 164 +++++++++++
 tools/testing/selftests/bpf/veristat.c        |  42 ++-
 9 files changed, 506 insertions(+), 107 deletions(-)
 create mode 100644 kernel/bpf/log.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/verifier_log.c

-- 
2.34.1



* [PATCH v2 bpf-next 1/6] bpf: split off basic BPF verifier log into separate file
  2023-03-28 23:56 [PATCH v2 bpf-next 0/6] BPF verifier rotating log Andrii Nakryiko
@ 2023-03-28 23:56 ` Andrii Nakryiko
  2023-03-30 17:12   ` Lorenz Bauer
  2023-03-28 23:56 ` [PATCH v2 bpf-next 2/6] bpf: remove minimum size restrictions on verifier log buffer Andrii Nakryiko
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2023-03-28 23:56 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau, lmb, timo, robin.goegge; +Cc: andrii, kernel-team

kernel/bpf/verifier.c is large and growing larger all the time. So it's
good to start splitting off more or less self-contained parts into
separate files to keep the source code size (somewhat) under control.

This patch is one step in this direction, moving some of the BPF
verifier log routines into a separate kernel/bpf/log.c. Right now these
are the most low-level and isolated routines for appending data to the
log, resetting the log to a previous position, etc. Eventually we could
probably move the verifier state printing logic here as well, but this
patch doesn't attempt to do that yet.

Subsequent patches will add more logic to verifier log management, so
having basics in a separate file will make sure verifier.c doesn't grow
more with new changes.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf_verifier.h | 19 +++-----
 kernel/bpf/Makefile          |  3 +-
 kernel/bpf/log.c             | 85 ++++++++++++++++++++++++++++++++++++
 kernel/bpf/verifier.c        | 69 -----------------------------
 4 files changed, 94 insertions(+), 82 deletions(-)
 create mode 100644 kernel/bpf/log.c

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 81d525d057c7..83dff25545ee 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -498,11 +498,6 @@ struct bpf_verifier_log {
 	u32 len_total;
 };
 
-static inline bool bpf_verifier_log_full(const struct bpf_verifier_log *log)
-{
-	return log->len_used >= log->len_total - 1;
-}
-
 #define BPF_LOG_LEVEL1	1
 #define BPF_LOG_LEVEL2	2
 #define BPF_LOG_STATS	4
@@ -512,6 +507,11 @@ static inline bool bpf_verifier_log_full(const struct bpf_verifier_log *log)
 #define BPF_LOG_MIN_ALIGNMENT 8U
 #define BPF_LOG_ALIGNMENT 40U
 
+static inline bool bpf_verifier_log_full(const struct bpf_verifier_log *log)
+{
+	return log->len_used >= log->len_total - 1;
+}
+
 static inline bool bpf_verifier_log_needed(const struct bpf_verifier_log *log)
 {
 	return log &&
@@ -519,13 +519,6 @@ static inline bool bpf_verifier_log_needed(const struct bpf_verifier_log *log)
 		 log->level == BPF_LOG_KERNEL);
 }
 
-static inline bool
-bpf_verifier_log_attr_valid(const struct bpf_verifier_log *log)
-{
-	return log->len_total >= 128 && log->len_total <= UINT_MAX >> 2 &&
-	       log->level && log->ubuf && !(log->level & ~BPF_LOG_MASK);
-}
-
 #define BPF_MAX_SUBPROGS 256
 
 struct bpf_subprog_info {
@@ -608,12 +601,14 @@ struct bpf_verifier_env {
 	char type_str_buf[TYPE_STR_BUF_LEN];
 };
 
+bool bpf_verifier_log_attr_valid(const struct bpf_verifier_log *log);
 __printf(2, 0) void bpf_verifier_vlog(struct bpf_verifier_log *log,
 				      const char *fmt, va_list args);
 __printf(2, 3) void bpf_verifier_log_write(struct bpf_verifier_env *env,
 					   const char *fmt, ...);
 __printf(2, 3) void bpf_log(struct bpf_verifier_log *log,
 			    const char *fmt, ...);
+void bpf_vlog_reset(struct bpf_verifier_log *log, u32 new_pos);
 
 static inline struct bpf_func_state *cur_func(struct bpf_verifier_env *env)
 {
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 02242614dcc7..1d3892168d32 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -6,7 +6,8 @@ cflags-nogcse-$(CONFIG_X86)$(CONFIG_CC_IS_GCC) := -fno-gcse
 endif
 CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)
 
-obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o prog_iter.o link_iter.o
+obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o
+obj-$(CONFIG_BPF_SYSCALL) += bpf_iter.o map_iter.o task_iter.o prog_iter.o link_iter.o
 obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
 obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
 obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
diff --git a/kernel/bpf/log.c b/kernel/bpf/log.c
new file mode 100644
index 000000000000..920061e38d2e
--- /dev/null
+++ b/kernel/bpf/log.c
@@ -0,0 +1,85 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com
+ * Copyright (c) 2016 Facebook
+ * Copyright (c) 2018 Covalent IO, Inc. http://covalent.io
+ */
+#include <uapi/linux/btf.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/bpf.h>
+#include <linux/bpf_verifier.h>
+
+bool bpf_verifier_log_attr_valid(const struct bpf_verifier_log *log)
+{
+	return log->len_total >= 128 && log->len_total <= UINT_MAX >> 2 &&
+	       log->level && log->ubuf && !(log->level & ~BPF_LOG_MASK);
+}
+
+void bpf_verifier_vlog(struct bpf_verifier_log *log, const char *fmt,
+		       va_list args)
+{
+	unsigned int n;
+
+	n = vscnprintf(log->kbuf, BPF_VERIFIER_TMP_LOG_SIZE, fmt, args);
+
+	WARN_ONCE(n >= BPF_VERIFIER_TMP_LOG_SIZE - 1,
+		  "verifier log line truncated - local buffer too short\n");
+
+	if (log->level == BPF_LOG_KERNEL) {
+		bool newline = n > 0 && log->kbuf[n - 1] == '\n';
+
+		pr_err("BPF: %s%s", log->kbuf, newline ? "" : "\n");
+		return;
+	}
+
+	n = min(log->len_total - log->len_used - 1, n);
+	log->kbuf[n] = '\0';
+	if (!copy_to_user(log->ubuf + log->len_used, log->kbuf, n + 1))
+		log->len_used += n;
+	else
+		log->ubuf = NULL;
+}
+
+void bpf_vlog_reset(struct bpf_verifier_log *log, u32 new_pos)
+{
+	char zero = 0;
+
+	if (!bpf_verifier_log_needed(log))
+		return;
+
+	log->len_used = new_pos;
+	if (put_user(zero, log->ubuf + new_pos))
+		log->ubuf = NULL;
+}
+
+/* log_level controls verbosity level of eBPF verifier.
+ * bpf_verifier_log_write() is used to dump the verification trace to the log,
+ * so the user can figure out what's wrong with the program
+ */
+__printf(2, 3) void bpf_verifier_log_write(struct bpf_verifier_env *env,
+					   const char *fmt, ...)
+{
+	va_list args;
+
+	if (!bpf_verifier_log_needed(&env->log))
+		return;
+
+	va_start(args, fmt);
+	bpf_verifier_vlog(&env->log, fmt, args);
+	va_end(args);
+}
+EXPORT_SYMBOL_GPL(bpf_verifier_log_write);
+
+__printf(2, 3) void bpf_log(struct bpf_verifier_log *log,
+			    const char *fmt, ...)
+{
+	va_list args;
+
+	if (!bpf_verifier_log_needed(log))
+		return;
+
+	va_start(args, fmt);
+	bpf_verifier_vlog(log, fmt, args);
+	va_end(args);
+}
+EXPORT_SYMBOL_GPL(bpf_log);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 20eb2015842f..9028a40107c4 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -335,61 +335,6 @@ find_linfo(const struct bpf_verifier_env *env, u32 insn_off)
 	return &linfo[i - 1];
 }
 
-void bpf_verifier_vlog(struct bpf_verifier_log *log, const char *fmt,
-		       va_list args)
-{
-	unsigned int n;
-
-	n = vscnprintf(log->kbuf, BPF_VERIFIER_TMP_LOG_SIZE, fmt, args);
-
-	WARN_ONCE(n >= BPF_VERIFIER_TMP_LOG_SIZE - 1,
-		  "verifier log line truncated - local buffer too short\n");
-
-	if (log->level == BPF_LOG_KERNEL) {
-		bool newline = n > 0 && log->kbuf[n - 1] == '\n';
-
-		pr_err("BPF: %s%s", log->kbuf, newline ? "" : "\n");
-		return;
-	}
-
-	n = min(log->len_total - log->len_used - 1, n);
-	log->kbuf[n] = '\0';
-	if (!copy_to_user(log->ubuf + log->len_used, log->kbuf, n + 1))
-		log->len_used += n;
-	else
-		log->ubuf = NULL;
-}
-
-static void bpf_vlog_reset(struct bpf_verifier_log *log, u32 new_pos)
-{
-	char zero = 0;
-
-	if (!bpf_verifier_log_needed(log))
-		return;
-
-	log->len_used = new_pos;
-	if (put_user(zero, log->ubuf + new_pos))
-		log->ubuf = NULL;
-}
-
-/* log_level controls verbosity level of eBPF verifier.
- * bpf_verifier_log_write() is used to dump the verification trace to the log,
- * so the user can figure out what's wrong with the program
- */
-__printf(2, 3) void bpf_verifier_log_write(struct bpf_verifier_env *env,
-					   const char *fmt, ...)
-{
-	va_list args;
-
-	if (!bpf_verifier_log_needed(&env->log))
-		return;
-
-	va_start(args, fmt);
-	bpf_verifier_vlog(&env->log, fmt, args);
-	va_end(args);
-}
-EXPORT_SYMBOL_GPL(bpf_verifier_log_write);
-
 __printf(2, 3) static void verbose(void *private_data, const char *fmt, ...)
 {
 	struct bpf_verifier_env *env = private_data;
@@ -403,20 +348,6 @@ __printf(2, 3) static void verbose(void *private_data, const char *fmt, ...)
 	va_end(args);
 }
 
-__printf(2, 3) void bpf_log(struct bpf_verifier_log *log,
-			    const char *fmt, ...)
-{
-	va_list args;
-
-	if (!bpf_verifier_log_needed(log))
-		return;
-
-	va_start(args, fmt);
-	bpf_verifier_vlog(log, fmt, args);
-	va_end(args);
-}
-EXPORT_SYMBOL_GPL(bpf_log);
-
 static const char *ltrim(const char *s)
 {
 	while (isspace(*s))
-- 
2.34.1



* [PATCH v2 bpf-next 2/6] bpf: remove minimum size restrictions on verifier log buffer
  2023-03-28 23:56 [PATCH v2 bpf-next 0/6] BPF verifier rotating log Andrii Nakryiko
  2023-03-28 23:56 ` [PATCH v2 bpf-next 1/6] bpf: split off basic BPF verifier log into separate file Andrii Nakryiko
@ 2023-03-28 23:56 ` Andrii Nakryiko
  2023-03-30 17:12   ` Lorenz Bauer
  2023-03-28 23:56 ` [PATCH v2 bpf-next 3/6] bpf: switch BPF verifier log to be a rotating log by default Andrii Nakryiko
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2023-03-28 23:56 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau, lmb, timo, robin.goegge; +Cc: andrii, kernel-team

It's not clear why we have 128 as the minimum size, but it makes testing
harder and seems unnecessary, as we carefully handle truncation
scenarios and use proper snprintf variants. So remove this limitation
and just enfore a positive length for the log buffer.
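
As a sketch of what this makes legal (using libbpf's
bpf_prog_load_opts; names here are illustrative), even a one-byte
buffer now passes attribute validation, with truncation reported
through the usual log machinery instead of failing upfront:

  #include <bpf/bpf.h>

  static char tiny_log[1];

  /* anything below 128 bytes used to fail validation with -EINVAL */
  void request_tiny_log(struct bpf_prog_load_opts *opts)
  {
          opts->log_buf = tiny_log;
          opts->log_size = sizeof(tiny_log);
          opts->log_level = 1;
  }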

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/log.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/log.c b/kernel/bpf/log.c
index 920061e38d2e..1974891fc324 100644
--- a/kernel/bpf/log.c
+++ b/kernel/bpf/log.c
@@ -11,7 +11,7 @@
 
 bool bpf_verifier_log_attr_valid(const struct bpf_verifier_log *log)
 {
-	return log->len_total >= 128 && log->len_total <= UINT_MAX >> 2 &&
+	return log->len_total > 0 && log->len_total <= UINT_MAX >> 2 &&
 	       log->level && log->ubuf && !(log->level & ~BPF_LOG_MASK);
 }
 
-- 
2.34.1



* [PATCH v2 bpf-next 3/6] bpf: switch BPF verifier log to be a rotating log by default
  2023-03-28 23:56 [PATCH v2 bpf-next 0/6] BPF verifier rotating log Andrii Nakryiko
  2023-03-28 23:56 ` [PATCH v2 bpf-next 1/6] bpf: split off basic BPF verifier log into separate file Andrii Nakryiko
  2023-03-28 23:56 ` [PATCH v2 bpf-next 2/6] bpf: remove minimum size restrictions on verifier log buffer Andrii Nakryiko
@ 2023-03-28 23:56 ` Andrii Nakryiko
  2023-03-30 17:12   ` Lorenz Bauer
  2023-03-28 23:56 ` [PATCH v2 bpf-next 4/6] libbpf: don't enforce verifier log levels on libbpf side Andrii Nakryiko
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2023-03-28 23:56 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau, lmb, timo, robin.goegge; +Cc: andrii, kernel-team

Currently, if the user-supplied log buffer for collecting the BPF
verifier log turns out to be too small to contain the full log, the
bpf() syscall returns -ENOSPC, fails BPF program verification/load, and
preserves the first N-1 bytes of the verifier log (where N is the size
of the user-supplied buffer).

This is problematic in a bunch of common scenarios, especially when
working with real-world BPF programs that tend to be pretty complex as
far as verification goes and require big log buffers. Typically this
comes up when debugging tricky cases at log level 2 (verbose). Also,
when a BPF program is successfully validated, log level 2 is the only
way to actually see the verifier state progression and all the
important details.

Even with log level 1, it's possible to get -ENOSPC even if the final
verifier log fits in the log buffer, if there is a code path deep
enough to fill up the entire log, even if it would normally be reset
later on (there is logic to chop off successfully validated portions of
the BPF verifier log).

In short, it's not always possible to pre-size the log buffer. What's
worse, in practice the end of the log is most often way more important
than the beginning, but the verifier stops emitting the log as soon as
the initial log buffer is filled up.

This patch switches BPF verifier log behavior to effectively behave as
a rotating log. That is, if the user-supplied log buffer turns out to
be too short, the verifier will keep overwriting previously written
log, effectively treating the user's log buffer as a ring buffer.
-ENOSPC is still going to be returned at the end, to notify the user
that the log contents were truncated, but the important last N bytes of
the log will be returned, which might be all the user really needs.
This consistent -ENOSPC behavior, regardless of rotating or fixed log
mode, avoids breaking backwards compatibility. The only user-visible
change is which portion of the verifier log the user ends up seeing *if
the buffer is too small*. Given the contents of the verifier log itself
are not an ABI, there is no breakage due to this behavior change.
Specialized tools that rely on specific contents of the verifier log in
the -ENOSPC scenario are expected to be easily adapted to accommodate
both old and new behaviors.

Importantly, though, to preserve a good user experience and not require
every user-space application to adapt to this new behavior, before
returning to user space the verifier will rotate the log (in place) so
that it starts at the very beginning of the user buffer as a continuous
zero-terminated string. The contents will be the last N-1 bytes of the
full verifier log, of course.
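
The in-place rotation itself is the classic three-reversal trick; as a
userspace model in plain C it might look like the sketch below (the
kernel variant added in this patch additionally has to stream through
log->kbuf with copy_from_user()/copy_to_user()):

  #include <stddef.h>

  static void reverse(char *buf, size_t len)
  {
          size_t i, j;

          if (len < 2)
                  return;
          for (i = 0, j = len - 1; i < j; i++, j--) {
                  char tmp = buf[i];

                  buf[i] = buf[j];
                  buf[j] = tmp;
          }
  }

  /* shift buf contents left by k bytes, in place */
  static void rotate_left(char *buf, size_t len, size_t k)
  {
          reverse(buf, len);             /* EFGHIJKABCD -> DCBAKJIHGFE */
          reverse(buf, len - k);         /* DCBA -> ABCD               */
          reverse(buf + len - k, k);     /* KJIHGFE -> EFGHIJK         */
  }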

Given the beginning of the log is sometimes important as well, we add a
BPF_LOG_FIXED (which equals 8) flag to force the old behavior, which
allows tools like veristat to request the first part of the verifier
log, if necessary. The BPF_LOG_FIXED flag is also a simple and
straightforward way to check whether the BPF verifier supports the
rotating behavior.

On the implementation side, conceptually, it's all simple. We maintain
64-bit logical start and end positions. If we need to truncate the log,
the start position is adjusted accordingly to lag the end position by N
bytes. We then use those logical positions to calculate their matching
actual positions in the user buffer and properly handle wrap-around at
the end of the buffer. Finally, right before returning from
bpf_check(), we rotate the user log buffer contents in place, as
necessary, to make the log contents contiguous. See the comments in the
relevant functions for details.
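
For intuition, here is a simplified userspace model of that position
bookkeeping (the kernel code uses div_u64_rem() rather than %, since
64-bit division isn't free on 32-bit architectures):

  #include <stdint.h>

  /* map a 64-bit logical log position into the user buffer */
  static uint32_t phys_pos(uint64_t pos, uint32_t len_total)
  {
          return (uint32_t)(pos % len_total);
  }

  /* append n bytes: advance end_pos, dragging start_pos along once the
   * window would exceed the buffer, i.e. once we start truncating
   */
  static void log_advance(uint64_t *start_pos, uint64_t *end_pos,
                          uint32_t len_total, uint32_t n)
  {
          *end_pos += n;
          if (*end_pos - *start_pos > len_total)
                  *start_pos = *end_pos - len_total;
  }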

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf_verifier.h                  |  33 ++-
 kernel/bpf/btf.c                              |   3 +-
 kernel/bpf/log.c                              | 195 +++++++++++++++++-
 kernel/bpf/verifier.c                         |  19 +-
 .../selftests/bpf/prog_tests/log_fixup.c      |   1 +
 5 files changed, 225 insertions(+), 26 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 83dff25545ee..4c926227f612 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -491,25 +491,42 @@ struct bpf_insn_aux_data {
 #define BPF_VERIFIER_TMP_LOG_SIZE	1024
 
 struct bpf_verifier_log {
-	u32 level;
-	char kbuf[BPF_VERIFIER_TMP_LOG_SIZE];
+	/* Logical start and end positions of a "log window" of the verifier log.
+	 * start_pos == 0 means we haven't truncated anything.
+	 * Once truncation starts to happen, start_pos + len_total == end_pos,
+	 * except during log reset situations, in which (end_pos - start_pos)
+	 * might get smaller than len_total (see bpf_vlog_reset()).
+	 * Generally, (end_pos - start_pos) gives number of useful data in
+	 * user log buffer.
+	 */
+	u64 start_pos;
+	u64 end_pos;
 	char __user *ubuf;
-	u32 len_used;
+	u32 level;
 	u32 len_total;
+	char kbuf[BPF_VERIFIER_TMP_LOG_SIZE];
 };
 
 #define BPF_LOG_LEVEL1	1
 #define BPF_LOG_LEVEL2	2
 #define BPF_LOG_STATS	4
+#define BPF_LOG_FIXED	8
 #define BPF_LOG_LEVEL	(BPF_LOG_LEVEL1 | BPF_LOG_LEVEL2)
-#define BPF_LOG_MASK	(BPF_LOG_LEVEL | BPF_LOG_STATS)
+#define BPF_LOG_MASK	(BPF_LOG_LEVEL | BPF_LOG_STATS | BPF_LOG_FIXED)
 #define BPF_LOG_KERNEL	(BPF_LOG_MASK + 1) /* kernel internal flag */
 #define BPF_LOG_MIN_ALIGNMENT 8U
 #define BPF_LOG_ALIGNMENT 40U
 
+static inline u32 bpf_log_used(const struct bpf_verifier_log *log)
+{
+	return log->end_pos - log->start_pos;
+}
+
 static inline bool bpf_verifier_log_full(const struct bpf_verifier_log *log)
 {
-	return log->len_used >= log->len_total - 1;
+	if (log->level & BPF_LOG_FIXED)
+		return bpf_log_used(log) >= log->len_total - 1;
+	return false;
 }
 
 static inline bool bpf_verifier_log_needed(const struct bpf_verifier_log *log)
@@ -596,7 +613,7 @@ struct bpf_verifier_env {
 	u32 scratched_regs;
 	/* Same as scratched_regs but for stack slots */
 	u64 scratched_stack_slots;
-	u32 prev_log_len, prev_insn_print_len;
+	u64 prev_log_pos, prev_insn_print_pos;
 	/* buffer used in reg_type_str() to generate reg_type string */
 	char type_str_buf[TYPE_STR_BUF_LEN];
 };
@@ -608,7 +625,9 @@ __printf(2, 3) void bpf_verifier_log_write(struct bpf_verifier_env *env,
 					   const char *fmt, ...);
 __printf(2, 3) void bpf_log(struct bpf_verifier_log *log,
 			    const char *fmt, ...);
-void bpf_vlog_reset(struct bpf_verifier_log *log, u32 new_pos);
+void bpf_vlog_reset(struct bpf_verifier_log *log, u64 new_pos);
+void bpf_vlog_finalize(struct bpf_verifier_log *log);
+bool bpf_vlog_truncated(const struct bpf_verifier_log *log);
 
 static inline struct bpf_func_state *cur_func(struct bpf_verifier_env *env)
 {
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index b7e5a5510b91..2574cc9b3e28 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -5593,7 +5593,8 @@ static struct btf *btf_parse(bpfptr_t btf_data, u32 btf_data_size,
 		}
 	}
 
-	if (log->level && bpf_verifier_log_full(log)) {
+	bpf_vlog_finalize(log);
+	if (log->level && bpf_vlog_truncated(log)) {
 		err = -ENOSPC;
 		goto errout_meta;
 	}
diff --git a/kernel/bpf/log.c b/kernel/bpf/log.c
index 1974891fc324..beab7c7f52a7 100644
--- a/kernel/bpf/log.c
+++ b/kernel/bpf/log.c
@@ -8,6 +8,7 @@
 #include <linux/types.h>
 #include <linux/bpf.h>
 #include <linux/bpf_verifier.h>
+#include <linux/math64.h>
 
 bool bpf_verifier_log_attr_valid(const struct bpf_verifier_log *log)
 {
@@ -32,23 +33,199 @@ void bpf_verifier_vlog(struct bpf_verifier_log *log, const char *fmt,
 		return;
 	}
 
-	n = min(log->len_total - log->len_used - 1, n);
-	log->kbuf[n] = '\0';
-	if (!copy_to_user(log->ubuf + log->len_used, log->kbuf, n + 1))
-		log->len_used += n;
-	else
-		log->ubuf = NULL;
+	if (log->level & BPF_LOG_FIXED) {
+		n = min(log->len_total - bpf_log_used(log) - 1, n);
+		log->kbuf[n] = '\0';
+		n += 1;
+
+		if (copy_to_user(log->ubuf + log->end_pos, log->kbuf, n))
+			goto fail;
+
+		log->end_pos += n - 1; /* don't count terminating '\0' */
+	} else {
+		u64 new_end, new_start, cur_pos;
+		u32 buf_start, buf_end, new_n;
+
+		log->kbuf[n] = '\0';
+		n += 1;
+
+		new_end = log->end_pos + n;
+		if (new_end - log->start_pos >= log->len_total)
+			new_start = new_end - log->len_total;
+		else
+			new_start = log->start_pos;
+		new_n = min(n, log->len_total);
+		cur_pos = new_end - new_n;
+
+		div_u64_rem(cur_pos, log->len_total, &buf_start);
+		div_u64_rem(new_end, log->len_total, &buf_end);
+		/* new_end and buf_end are exclusive indices, so if buf_end is
+		 * exactly zero, then it actually points right to the end of
+		 * ubuf and there is no wrap around
+		 */
+		if (buf_end == 0)
+			buf_end = log->len_total;
+
+		/* if buf_start > buf_end, we wrapped around;
+		 * if buf_start == buf_end, then we fill ubuf completely; we
+		 * can't have buf_start == buf_end to mean that there is
+		 * nothing to write, because we always write at least
+		 * something, even if terminal '\0'
+		 */
+		if (buf_start < buf_end) {
+			/* message fits within contiguous chunk of ubuf */
+			if (copy_to_user(log->ubuf + buf_start,
+					 log->kbuf + n - new_n,
+					 buf_end - buf_start))
+				goto fail;
+		} else {
+			/* message wraps around the end of ubuf, copy in two chunks */
+			if (copy_to_user(log->ubuf + buf_start,
+					 log->kbuf + n - new_n,
+					 log->len_total - buf_start))
+				goto fail;
+			if (copy_to_user(log->ubuf,
+					 log->kbuf + n - buf_end,
+					 buf_end))
+				goto fail;
+		}
+
+		log->start_pos = new_start;
+		log->end_pos = new_end - 1; /* don't count terminating '\0' */
+	}
+
+	return;
+fail:
+	log->ubuf = NULL;
 }
 
-void bpf_vlog_reset(struct bpf_verifier_log *log, u32 new_pos)
+void bpf_vlog_reset(struct bpf_verifier_log *log, u64 new_pos)
 {
 	char zero = 0;
+	u32 pos;
 
 	if (!bpf_verifier_log_needed(log))
 		return;
 
-	log->len_used = new_pos;
-	if (put_user(zero, log->ubuf + new_pos))
+	/* if position to which we reset is beyond current log window,
+	 * then we didn't preserve any useful content and should adjust
+	 * start_pos to end up with an empty log (start_pos == end_pos)
+	 */
+	log->end_pos = new_pos;
+	if (log->end_pos < log->start_pos)
+		log->start_pos = log->end_pos;
+	div_u64_rem(new_pos, log->len_total, &pos);
+	if (put_user(zero, log->ubuf + pos))
+		log->ubuf = NULL;
+}
+
+static void bpf_vlog_reverse_kbuf(char *buf, int len)
+{
+	int i, j;
+
+	for (i = 0, j = len - 1; i < j; i++, j--)
+		swap(buf[i], buf[j]);
+}
+
+static int bpf_vlog_reverse_ubuf(struct bpf_verifier_log *log, int start, int end)
+{
+	/* we split log->kbuf into two equal parts for both ends of array */
+	int n = sizeof(log->kbuf) / 2, nn;
+	char *lbuf = log->kbuf, *rbuf = log->kbuf + n;
+
+	/* Read ubuf's section [start, end) two chunks at a time, from left
+	 * and right side; within each chunk, swap all the bytes; after that
+	 * reverse the order of lbuf and rbuf and write result back to ubuf.
+	 * This way we'll end up with swapped contents of specified
+	 * [start, end) ubuf segment.
+	 */
+	while (end - start > 1) {
+		nn = min(n, (end - start) / 2);
+
+		if (copy_from_user(lbuf, log->ubuf + start, nn))
+			return -EFAULT;
+		if (copy_from_user(rbuf, log->ubuf + end - nn, nn))
+			return -EFAULT;
+
+		bpf_vlog_reverse_kbuf(lbuf, nn);
+		bpf_vlog_reverse_kbuf(rbuf, nn);
+
+		/* we write lbuf to the right end of ubuf, while rbuf to the
+		 * left one to end up with properly reversed overall ubuf
+		 */
+		if (copy_to_user(log->ubuf + start, rbuf, nn))
+			return -EFAULT;
+		if (copy_to_user(log->ubuf + end - nn, lbuf, nn))
+			return -EFAULT;
+
+		start += nn;
+		end -= nn;
+	}
+
+	return 0;
+}
+
+bool bpf_vlog_truncated(const struct bpf_verifier_log *log)
+{
+	if (log->level & BPF_LOG_FIXED)
+		return bpf_log_used(log) >= log->len_total - 1;
+	else
+		return log->start_pos > 0;
+}
+
+void bpf_vlog_finalize(struct bpf_verifier_log *log)
+{
+	u32 sublen;
+	int err;
+
+	if (!log || !log->level || !log->ubuf)
+		return;
+	if ((log->level & BPF_LOG_FIXED) || log->level == BPF_LOG_KERNEL)
+		return;
+
+	/* If we never truncated log, there is nothing to move around. */
+	if (log->start_pos == 0)
+		return;
+
+	/* Otherwise we need to rotate log contents to make it start from the
+	 * buffer beginning and be a continuous zero-terminated string. Note
+	 * that if log->start_pos != 0 then we definitely filled up entire log
+	 * buffer with no gaps, and we just need to shift buffer contents to
+	 * the left by (log->start_pos % log->len_total) bytes.
+	 *
+	 * Unfortunately, user buffer could be huge and we don't want to
+	 * allocate temporary kernel memory of the same size just to shift
+	 * contents in a straightforward fashion. Instead, we'll be clever and
+	 * do in-place array rotation. This is a leetcode-style problem, which
+	 * could be solved by three rotations.
+	 *
+	 * Let's say we have log buffer that has to be shifted left by 7 bytes
+	 * (spaces and vertical bar is just for demonstrative purposes):
+	 *   E F G H I J K | A B C D
+	 *
+	 * First, we reverse entire array:
+	 *   D C B A | K J I H G F E
+	 *
+	 * Then we rotate first 4 bytes (DCBA) and separately last 7 bytes
+	 * (KJIHGFE), resulting in a properly rotated array:
+	 *   A B C D | E F G H I J K
+	 *
+	 * We'll utilize log->kbuf to read user memory chunk by chunk, swap
+	 * bytes, and write them back. Doing it byte-by-byte would be
+	 * unnecessarily inefficient. Altogether we are going to read and
+	 * write each byte twice.
+	 */
+
+	/* length of the chopped off part that will be the beginning;
+	 * len(ABCD) in the example above
+	 */
+	div_u64_rem(log->start_pos, log->len_total, &sublen);
+	sublen = log->len_total - sublen;
+
+	err = bpf_vlog_reverse_ubuf(log, 0, log->len_total);
+	err = err ?: bpf_vlog_reverse_ubuf(log, 0, sublen);
+	err = err ?: bpf_vlog_reverse_ubuf(log, sublen, log->len_total);
+	if (err)
 		log->ubuf = NULL;
 }
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 9028a40107c4..e121c6c13635 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1439,10 +1439,10 @@ static inline u32 vlog_alignment(u32 pos)
 static void print_insn_state(struct bpf_verifier_env *env,
 			     const struct bpf_func_state *state)
 {
-	if (env->prev_log_len && env->prev_log_len == env->log.len_used) {
+	if (env->prev_log_pos && env->prev_log_pos == env->log.end_pos) {
 		/* remove new line character */
-		bpf_vlog_reset(&env->log, env->prev_log_len - 1);
-		verbose(env, "%*c;", vlog_alignment(env->prev_insn_print_len), ' ');
+		bpf_vlog_reset(&env->log, env->prev_log_pos - 1);
+		verbose(env, "%*c;", vlog_alignment(env->prev_insn_print_pos), ' ');
 	} else {
 		verbose(env, "%d:", env->insn_idx);
 	}
@@ -1750,7 +1750,7 @@ static struct bpf_verifier_state *push_stack(struct bpf_verifier_env *env,
 	elem->insn_idx = insn_idx;
 	elem->prev_insn_idx = prev_insn_idx;
 	elem->next = env->head;
-	elem->log_pos = env->log.len_used;
+	elem->log_pos = env->log.end_pos;
 	env->head = elem;
 	env->stack_size++;
 	err = copy_verifier_state(&elem->st, cur);
@@ -2286,7 +2286,7 @@ static struct bpf_verifier_state *push_async_cb(struct bpf_verifier_env *env,
 	elem->insn_idx = insn_idx;
 	elem->prev_insn_idx = prev_insn_idx;
 	elem->next = env->head;
-	elem->log_pos = env->log.len_used;
+	elem->log_pos = env->log.end_pos;
 	env->head = elem;
 	env->stack_size++;
 	if (env->stack_size > BPF_COMPLEXITY_LIMIT_JMP_SEQ) {
@@ -15564,11 +15564,11 @@ static int do_check(struct bpf_verifier_env *env)
 				print_insn_state(env, state->frame[state->curframe]);
 
 			verbose_linfo(env, env->insn_idx, "; ");
-			env->prev_log_len = env->log.len_used;
+			env->prev_log_pos = env->log.end_pos;
 			verbose(env, "%d: ", env->insn_idx);
 			print_bpf_insn(&cbs, insn, env->allow_ptr_leaks);
-			env->prev_insn_print_len = env->log.len_used - env->prev_log_len;
-			env->prev_log_len = env->log.len_used;
+			env->prev_insn_print_pos = env->log.end_pos - env->prev_log_pos;
+			env->prev_log_pos = env->log.end_pos;
 		}
 
 		if (bpf_prog_is_offloaded(env->prog->aux)) {
@@ -18786,7 +18786,8 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr)
 	print_verification_stats(env);
 	env->prog->aux->verified_insns = env->insn_processed;
 
-	if (log->level && bpf_verifier_log_full(log))
+	bpf_vlog_finalize(log);
+	if (log->level && bpf_vlog_truncated(log))
 		ret = -ENOSPC;
 	if (log->level && !log->ubuf) {
 		ret = -EFAULT;
diff --git a/tools/testing/selftests/bpf/prog_tests/log_fixup.c b/tools/testing/selftests/bpf/prog_tests/log_fixup.c
index 239e1c5753b0..bc27170bdeb0 100644
--- a/tools/testing/selftests/bpf/prog_tests/log_fixup.c
+++ b/tools/testing/selftests/bpf/prog_tests/log_fixup.c
@@ -24,6 +24,7 @@ static void bad_core_relo(size_t log_buf_size, enum trunc_type trunc_type)
 	bpf_program__set_autoload(skel->progs.bad_relo, true);
 	memset(log_buf, 0, sizeof(log_buf));
 	bpf_program__set_log_buf(skel->progs.bad_relo, log_buf, log_buf_size ?: sizeof(log_buf));
+	bpf_program__set_log_level(skel->progs.bad_relo, 1 | 8); /* BPF_LOG_FIXED to force truncation */
 
 	err = test_log_fixup__load(skel);
 	if (!ASSERT_ERR(err, "load_fail"))
-- 
2.34.1



* [PATCH v2 bpf-next 4/6] libbpf: don't enforce verifier log levels on libbpf side
  2023-03-28 23:56 [PATCH v2 bpf-next 0/6] BPF verifier rotating log Andrii Nakryiko
                   ` (2 preceding siblings ...)
  2023-03-28 23:56 ` [PATCH v2 bpf-next 3/6] bpf: switch BPF verifier log to be a rotating log by default Andrii Nakryiko
@ 2023-03-28 23:56 ` Andrii Nakryiko
  2023-03-30 17:12   ` Lorenz Bauer
  2023-03-28 23:56 ` [PATCH bpf-next 4/6] libbpf: don't enfore " Andrii Nakryiko
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2023-03-28 23:56 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau, lmb, timo, robin.goegge; +Cc: andrii, kernel-team

This check basically prevents any forward compatibility. And either way
we just return -EINVAL, which would otherwise be returned from the
bpf() syscall anyway.
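
For instance, a hedged sketch of the forward-compat effect (helper name
invented; insns and buf are supplied by the caller):

  #include <stddef.h>
  #include <bpf/bpf.h>

  int probe_log_fixed(const struct bpf_insn *insns, size_t insn_cnt,
                      char *buf, size_t buf_sz)
  {
          LIBBPF_OPTS(bpf_prog_load_opts, opts,
                  .log_buf = buf,
                  .log_size = buf_sz,
                  .log_level = 1 | 8,     /* level 1 + BPF_LOG_FIXED (8) */
          );

          /* an old kernel rejects the unknown bit with -EINVAL; either
           * way, the verdict now comes from the kernel, not from libbpf
           */
          return bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, "probe_fixed",
                               "GPL", insns, insn_cnt, &opts);
  }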

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/bpf.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 767035900354..615185226ed0 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -290,8 +290,6 @@ int bpf_prog_load(enum bpf_prog_type prog_type,
 
 	if (!!log_buf != !!log_size)
 		return libbpf_err(-EINVAL);
-	if (log_level > (4 | 2 | 1))
-		return libbpf_err(-EINVAL);
 	if (log_level && !log_buf)
 		return libbpf_err(-EINVAL);
 
-- 
2.34.1



* [PATCH bpf-next 4/6] libbpf: don't enfore verifier log levels on libbpf side
  2023-03-28 23:56 [PATCH v2 bpf-next 0/6] BPF verifier rotating log Andrii Nakryiko
                   ` (3 preceding siblings ...)
  2023-03-28 23:56 ` [PATCH v2 bpf-next 4/6] libbpf: don't enforce verifier log levels on libbpf side Andrii Nakryiko
@ 2023-03-28 23:56 ` Andrii Nakryiko
  2023-03-28 23:56 ` [PATCH v2 bpf-next 5/6] selftests/bpf: add more veristat control over verifier log options Andrii Nakryiko
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: Andrii Nakryiko @ 2023-03-28 23:56 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau, lmb, timo, robin.goegge; +Cc: andrii, kernel-team

This check basically prevents any forward compatibility. And either way
we just return -EINVAL, which would otherwise be returned from the
bpf() syscall anyway.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/bpf.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index e750b6f5fcc3..eb08a998b526 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -290,8 +290,6 @@ int bpf_prog_load(enum bpf_prog_type prog_type,
 
 	if (!!log_buf != !!log_size)
 		return libbpf_err(-EINVAL);
-	if (log_level > (4 | 2 | 1))
-		return libbpf_err(-EINVAL);
 	if (log_level && !log_buf)
 		return libbpf_err(-EINVAL);
 
-- 
2.34.1



* [PATCH v2 bpf-next 5/6] selftests/bpf: add more veristat control over verifier log options
  2023-03-28 23:56 [PATCH v2 bpf-next 0/6] BPF verifier rotating log Andrii Nakryiko
                   ` (4 preceding siblings ...)
  2023-03-28 23:56 ` [PATCH bpf-next 4/6] libbpf: don't enfore " Andrii Nakryiko
@ 2023-03-28 23:56 ` Andrii Nakryiko
  2023-03-28 23:56 ` [PATCH v2 bpf-next 6/6] selftests/bpf: add fixed vs rotating verifier log tests Andrii Nakryiko
  2023-03-28 23:59 ` [PATCH v2 bpf-next 0/6] BPF verifier rotating log Andrii Nakryiko
  7 siblings, 0 replies; 24+ messages in thread
From: Andrii Nakryiko @ 2023-03-28 23:56 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau, lmb, timo, robin.goegge; +Cc: andrii, kernel-team

Add --log-size to be able to customize the size of the log buffer sent
to the bpf() syscall for BPF program verification logging.

Add --log-fixed to enforce BPF_LOG_FIXED behavior for the BPF verifier
log. This is useful in the unlikely event that the beginning of the
truncated verifier log is more important than its end (with the
rotating verifier log behavior, which is now the default, you get the
end).
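
Example invocation (object file name is illustrative; --log-size takes
effect in verbose mode, where veristat allocates the log buffer):

  $ veristat -v -l 2 --log-size 1048576 --log-fixed prog.bpf.o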

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/testing/selftests/bpf/veristat.c | 42 +++++++++++++++++++++-----
 1 file changed, 34 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/bpf/veristat.c b/tools/testing/selftests/bpf/veristat.c
index 83231456d3c5..3d24b6b96e75 100644
--- a/tools/testing/selftests/bpf/veristat.c
+++ b/tools/testing/selftests/bpf/veristat.c
@@ -136,11 +136,14 @@ static struct env {
 	int filename_cnt;
 	bool verbose;
 	bool quiet;
-	int log_level;
 	enum resfmt out_fmt;
 	bool comparison_mode;
 	bool replay_mode;
 
+	int log_level;
+	int log_size;
+	bool log_fixed;
+
 	struct verif_stats *prog_stats;
 	int prog_stat_cnt;
 
@@ -182,10 +185,17 @@ const char argp_program_doc[] =
 "USAGE: veristat <obj-file> [<obj-file>...]\n"
 "   OR: veristat -C <baseline.csv> <comparison.csv>\n";
 
+enum {
+	OPT_LOG_FIXED = 1000,
+	OPT_LOG_SIZE = 1001,
+};
+
 static const struct argp_option opts[] = {
 	{ NULL, 'h', NULL, OPTION_HIDDEN, "Show the full help" },
 	{ "verbose", 'v', NULL, 0, "Verbose mode" },
 	{ "log-level", 'l', "LEVEL", 0, "Verifier log level (default 0 for normal mode, 1 for verbose mode)" },
+	{ "log-fixed", OPT_LOG_FIXED, NULL, 0, "Disable verifier log rotation" },
+	{ "log-size", OPT_LOG_SIZE, "BYTES", 0, "Customize verifier log size (default to 16MB)" },
 	{ "quiet", 'q', NULL, 0, "Quiet mode" },
 	{ "emit", 'e', "SPEC", 0, "Specify stats to be emitted" },
 	{ "sort", 's', "SPEC", 0, "Specify sort order" },
@@ -243,6 +253,17 @@ static error_t parse_arg(int key, char *arg, struct argp_state *state)
 			argp_usage(state);
 		}
 		break;
+	case OPT_LOG_FIXED:
+		env.log_fixed = true;
+		break;
+	case OPT_LOG_SIZE:
+		errno = 0;
+		env.log_size = strtol(arg, NULL, 10);
+		if (errno) {
+			fprintf(stderr, "invalid log size: %s\n", arg);
+			argp_usage(state);
+		}
+		break;
 	case 'C':
 		env.comparison_mode = true;
 		break;
@@ -797,8 +818,8 @@ static void fixup_obj(struct bpf_object *obj)
 static int process_prog(const char *filename, struct bpf_object *obj, struct bpf_program *prog)
 {
 	const char *prog_name = bpf_program__name(prog);
-	size_t buf_sz = sizeof(verif_log_buf);
-	char *buf = verif_log_buf;
+	char *buf;
+	int buf_sz, log_level;
 	struct verif_stats *stats;
 	int err = 0;
 	void *tmp;
@@ -816,18 +837,23 @@ static int process_prog(const char *filename, struct bpf_object *obj, struct bpf
 	memset(stats, 0, sizeof(*stats));
 
 	if (env.verbose) {
-		buf_sz = 16 * 1024 * 1024;
+		buf_sz = env.log_size ? env.log_size : 16 * 1024 * 1024;
 		buf = malloc(buf_sz);
 		if (!buf)
 			return -ENOMEM;
-		bpf_program__set_log_buf(prog, buf, buf_sz);
-		bpf_program__set_log_level(prog, env.log_level | 4); /* stats + log */
+		/* ensure we always request stats */
+		log_level = env.log_level | 4 | (env.log_fixed ? 8 : 0);
 	} else {
-		bpf_program__set_log_buf(prog, buf, buf_sz);
-		bpf_program__set_log_level(prog, 4); /* only verifier stats */
+		buf = verif_log_buf;
+		buf_sz = sizeof(verif_log_buf);
+		/* request only verifier stats */
+		log_level = 4 | (env.log_fixed ? 8 : 0);
 	}
 	verif_log_buf[0] = '\0';
 
+	bpf_program__set_log_buf(prog, buf, buf_sz);
+	bpf_program__set_log_level(prog, log_level);
+
 	/* increase chances of successful BPF object loading */
 	fixup_obj(obj);
 
-- 
2.34.1



* [PATCH v2 bpf-next 6/6] selftests/bpf: add fixed vs rotating verifier log tests
  2023-03-28 23:56 [PATCH v2 bpf-next 0/6] BPF verifier rotating log Andrii Nakryiko
                   ` (5 preceding siblings ...)
  2023-03-28 23:56 ` [PATCH v2 bpf-next 5/6] selftests/bpf: add more veristat control over verifier log options Andrii Nakryiko
@ 2023-03-28 23:56 ` Andrii Nakryiko
  2023-03-30 17:12   ` Lorenz Bauer
  2023-03-28 23:59 ` [PATCH v2 bpf-next 0/6] BPF verifier rotating log Andrii Nakryiko
  7 siblings, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2023-03-28 23:56 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau, lmb, timo, robin.goegge; +Cc: andrii, kernel-team

Add selftests validating BPF_LOG_FIXED behavior, which used to be the
only behavior, and the now-default rotating BPF verifier log, which
returns just up to the last N bytes of the full verifier log instead of
the first N bytes.

To stress-test the correctness of the in-kernel verifier log logic, we
force it to truncate the program's verifier log at every length from 1
all the way to its full size (about 450 bytes today). This was a useful
stress test while developing the feature.

For both fixed and rotating log modes we expect -ENOSPC if the log
contents don't fit in the user-supplied log buffer.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../selftests/bpf/prog_tests/verifier_log.c   | 164 ++++++++++++++++++
 1 file changed, 164 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/verifier_log.c

diff --git a/tools/testing/selftests/bpf/prog_tests/verifier_log.c b/tools/testing/selftests/bpf/prog_tests/verifier_log.c
new file mode 100644
index 000000000000..afe9e0384055
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/verifier_log.c
@@ -0,0 +1,164 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
+
+#include <test_progs.h>
+#include <bpf/btf.h>
+
+#include "test_log_buf.skel.h"
+
+
+static bool check_prog_load(int prog_fd, bool expect_err, const char *tag)
+{
+	if (expect_err) {
+		if (!ASSERT_LT(prog_fd, 0, tag)) {
+			close(prog_fd);
+			return false;
+		}
+	} else /* !expect_err */ {
+		if (!ASSERT_GT(prog_fd, 0, tag))
+			return false;
+	}
+	return true;
+}
+
+static void verif_log_subtest(const char *name, bool expect_load_error, int log_level)
+{
+	LIBBPF_OPTS(bpf_prog_load_opts, opts);
+	struct {
+		/* strategically placed before others to avoid accidental modification by kernel */
+		char filler[1024];
+		char buf[1024];
+		/* strategically placed after buf[] to catch more accidental corruptions */
+		char reference[1024];
+	} logs;
+	char *exp_log, prog_name[16], op_name[32];
+	struct test_log_buf *skel;
+	struct bpf_program *prog;
+	const struct bpf_insn *insns;
+	size_t insn_cnt, fixed_log_sz;
+	int i, err, prog_fd;
+
+	skel = test_log_buf__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		return;
+
+	bpf_object__for_each_program(prog, skel->obj) {
+		if (strcmp(bpf_program__name(prog), name) == 0)
+			bpf_program__set_autoload(prog, true);
+		else
+			bpf_program__set_autoload(prog, false);
+	}
+
+	err = test_log_buf__load(skel);
+	if (!expect_load_error && !ASSERT_OK(err, "unexpected_load_failure"))
+		goto cleanup;
+	if (expect_load_error && !ASSERT_ERR(err, "unexpected_load_success"))
+		goto cleanup;
+
+	insns = bpf_program__insns(skel->progs.good_prog);
+	insn_cnt = bpf_program__insn_cnt(skel->progs.good_prog);
+
+	opts.log_buf = logs.reference;
+	opts.log_size = sizeof(logs.reference);
+	opts.log_level = log_level | 8 /* BPF_LOG_FIXED */;
+	prog_fd = bpf_prog_load(BPF_PROG_TYPE_RAW_TRACEPOINT, "log_fixed",
+				"GPL", insns, insn_cnt, &opts);
+	if (!check_prog_load(prog_fd, expect_load_error, "fixed_buf_prog_load"))
+		goto cleanup;
+	close(prog_fd);
+
+	fixed_log_sz = strlen(logs.reference) + 1;
+	if (!ASSERT_GT(fixed_log_sz, 50, "fixed_log_sz"))
+		goto cleanup;
+	memset(logs.reference + fixed_log_sz, 0, sizeof(logs.reference) - fixed_log_sz);
+
+	/* validate BPF_LOG_FIXED works as verifier log used to work, that is:
+	 * we get -ENOSPC and beginning of the full verifier log. This only
+	 * works for log_level 2 and log_level 1 + failed program. For log
+	 * level 2 we don't reset log at all. For log_level 1 + failed program
+	 * we don't get to verification stats output. With log level 1
+	 * for a successful program, the final result will be just verifier stats.
+	 * But if provided too short log buf, kernel will NULL-out log->ubuf
+	 * and will stop emitting further log. This means we'll never see
+	 * predictable verifier stats.
+	 * Long story short, we do the following -ENOSPC test only for
+	 * predictable combinations.
+	 */
+	if (log_level >= 2 || expect_load_error) {
+		opts.log_buf = logs.buf;
+		opts.log_level = log_level | 8; /* fixed-length log */
+		opts.log_size = 25;
+
+		prog_fd = bpf_prog_load(BPF_PROG_TYPE_RAW_TRACEPOINT, "log_fixed50",
+					"GPL", insns, insn_cnt, &opts);
+		if (!ASSERT_EQ(prog_fd, -ENOSPC, "unexpected_log_fixed_prog_load_result")) {
+			if (prog_fd >= 0)
+				close(prog_fd);
+			goto cleanup;
+		}
+		if (!ASSERT_EQ(strlen(logs.buf), 24, "log_fixed_25"))
+			goto cleanup;
+		if (!ASSERT_STRNEQ(logs.buf, logs.reference, 24, op_name))
+			goto cleanup;
+	}
+
+	/* validate rolling verifier log logic: try all variations of log buf
+	 * length to force various truncation scenarios
+	 */
+	opts.log_buf = logs.buf;
+	opts.log_level = log_level; /* rolling log */
+
+	/* prefill logs.buf with 'A's to detect any write beyond allowed length */
+	memset(logs.filler, 'A', sizeof(logs.filler));
+	logs.filler[sizeof(logs.filler) - 1] = '\0';
+	memset(logs.buf, 'A', sizeof(logs.buf));
+	logs.buf[sizeof(logs.buf) - 1] = '\0';
+
+	for (i = 1; i < fixed_log_sz; i++) {
+		opts.log_size = i;
+
+		snprintf(prog_name, sizeof(prog_name), "log_roll_%d", i);
+		prog_fd = bpf_prog_load(BPF_PROG_TYPE_RAW_TRACEPOINT, prog_name,
+					"GPL", insns, insn_cnt, &opts);
+
+		snprintf(op_name, sizeof(op_name), "log_roll_prog_load_%d", i);
+		if (!ASSERT_EQ(prog_fd, -ENOSPC, op_name)) {
+			if (prog_fd >= 0)
+				close(prog_fd);
+			goto cleanup;
+		}
+
+		exp_log = logs.reference + fixed_log_sz - i;
+		snprintf(op_name, sizeof(op_name), "log_roll_contents_%d", i);
+		if (!ASSERT_STREQ(logs.buf, exp_log, op_name)) {
+			printf("CMP:%d\nS1:'%s'\nS2:'%s'\n",
+				strcmp(logs.buf, exp_log),
+				logs.buf, exp_log);
+			goto cleanup;
+		}
+
+		/* check that unused portions of logs.buf is not overwritten */
+		snprintf(op_name, sizeof(op_name), "log_roll_unused_tail_%d", i);
+		if (!ASSERT_STREQ(logs.buf + i, logs.filler + i, op_name)) {
+			printf("CMP:%d\nS1:'%s'\nS2:'%s'\n",
+				strcmp(logs.buf + i, logs.filler + i),
+				logs.buf + i, logs.filler + i);
+			goto cleanup;
+		}
+	}
+
+cleanup:
+	test_log_buf__destroy(skel);
+}
+
+void test_verifier_log(void)
+{
+	if (test__start_subtest("good_prog-level1"))
+		verif_log_subtest("good_prog", false, 1);
+	if (test__start_subtest("good_prog-level2"))
+		verif_log_subtest("good_prog", false, 2);
+	if (test__start_subtest("bad_prog-level1"))
+		verif_log_subtest("bad_prog", true, 1);
+	if (test__start_subtest("bad_prog-level2"))
+		verif_log_subtest("bad_prog", true, 2);
+}
-- 
2.34.1



* Re: [PATCH v2 bpf-next 0/6] BPF verifier rotating log
  2023-03-28 23:56 [PATCH v2 bpf-next 0/6] BPF verifier rotating log Andrii Nakryiko
                   ` (6 preceding siblings ...)
  2023-03-28 23:56 ` [PATCH v2 bpf-next 6/6] selftests/bpf: add fixed vs rotating verifier log tests Andrii Nakryiko
@ 2023-03-28 23:59 ` Andrii Nakryiko
  2023-03-30  5:38   ` Alexei Starovoitov
  7 siblings, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2023-03-28 23:59 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, daniel, martin.lau, lmb, timo, robin.goegge, kernel-team

On Tue, Mar 28, 2023 at 4:56 PM Andrii Nakryiko <andrii@kernel.org> wrote:
>
> This patch set changes the BPF verifier log to behave as a rotating
> log by default. If the user-supplied log buffer is big enough to
> contain the entire verifier log output, there is no effective
> difference. But where previously a too-small user-supplied log buffer
> would yield an -ENOSPC error and the beginning part of the verifier
> log, the user now gets the ending part of the verifier log filling up
> the log buffer (-ENOSPC is still returned to signal truncation; see
> the v1->v2 changelog). This is, in the absolute majority of cases,
> exactly what's useful, relevant, and what users want and need, as the
> end of the verifier log contains the details of the verifier failure
> and the relevant state that led to it. So this rotating mode is made
> the default, but for some niche advanced debugging scenarios it's
> possible to request the old behavior by specifying the additional
> BPF_LOG_FIXED (8) flag.
>
> This patch set adjusts libbpf to allow specifying flags beyond
> 1 | 2 | 4. We also add --log-size and --log-fixed options to veristat,
> both to test this functionality manually and for use in various
> debugging scenarios. We also add selftests that try many variants of
> log buffer size to stress-test the correctness of the internal
> verifier log bookkeeping code.
>
> v1->v2:
>   - return -ENOSPC even in rotating log mode for preserving backwards
>     compatibility (Lorenz);

I haven't implemented the feature we discussed, where the
BPF_PROG_LOAD (and BPF_BTF_LOAD) command will return back the full
size of the buffer that's necessary to contain the complete log. I'm
building it on top of this patch set and would like to send it
separately as a follow-up, as it touches UAPI some more, and I feel
like we'll have a few revisions just for this, so I didn't want to
delay these changes. Plus, I'm thinking of adding this even for
BPF_LOG_FIXED mode, so it's provided consistently. So I need a bit
more time to implement it. Hopefully this version will work for
everyone and can go in sooner.
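
To sketch the shape of the idea (the field name and buffer handling
here are provisional, not the actual UAPI yet):

  fd = bpf_prog_load(..., &opts);
  if (fd < 0 && errno == ENOSPC) {
          /* hypothetical output field filled in by the kernel */
          opts.log_buf = realloc(opts.log_buf, opts.log_true_size);
          opts.log_size = opts.log_true_size;
          fd = bpf_prog_load(..., &opts);
  }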

>
> Andrii Nakryiko (6):
>   bpf: split off basic BPF verifier log into separate file
>   bpf: remove minimum size restrictions on verifier log buffer
>   bpf: switch BPF verifier log to be a rotating log by default
>   libbpf: don't enforce verifier log levels on libbpf side
>   selftests/bpf: add more veristat control over verifier log options
>   selftests/bpf: add fixed vs rotating verifier log tests
>
>  include/linux/bpf_verifier.h                  |  48 ++--
>  kernel/bpf/Makefile                           |   3 +-
>  kernel/bpf/btf.c                              |   3 +-
>  kernel/bpf/log.c                              | 262 ++++++++++++++++++
>  kernel/bpf/verifier.c                         |  88 +-----
>  tools/lib/bpf/bpf.c                           |   2 -
>  .../selftests/bpf/prog_tests/log_fixup.c      |   1 +
>  .../selftests/bpf/prog_tests/verifier_log.c   | 164 +++++++++++
>  tools/testing/selftests/bpf/veristat.c        |  42 ++-
>  9 files changed, 506 insertions(+), 107 deletions(-)
>  create mode 100644 kernel/bpf/log.c
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/verifier_log.c
>
> --
> 2.34.1
>


* Re: [PATCH v2 bpf-next 0/6] BPF verifier rotating log
  2023-03-28 23:59 ` [PATCH v2 bpf-next 0/6] BPF verifier rotating log Andrii Nakryiko
@ 2023-03-30  5:38   ` Alexei Starovoitov
  2023-03-30  9:31     ` Lorenz Bauer
  0 siblings, 1 reply; 24+ messages in thread
From: Alexei Starovoitov @ 2023-03-30  5:38 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, lmb, Timo Beckers, robin.goegge, Kernel Team

On Tue, Mar 28, 2023 at 4:59 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Tue, Mar 28, 2023 at 4:56 PM Andrii Nakryiko <andrii@kernel.org> wrote:
> >
> > This patch set changes the BPF verifier log to behave as a rotating
> > log by default. If the user-supplied log buffer is big enough to
> > contain the entire verifier log output, there is no effective
> > difference. But where previously a too-small user-supplied log buffer
> > would yield an -ENOSPC error and the beginning part of the verifier
> > log, the user now gets the ending part of the verifier log filling up
> > the log buffer (-ENOSPC is still returned to signal truncation; see
> > the v1->v2 changelog). This is, in the absolute majority of cases,
> > exactly what's useful, relevant, and what users want and need, as the
> > end of the verifier log contains the details of the verifier failure
> > and the relevant state that led to it. So this rotating mode is made
> > the default, but for some niche advanced debugging scenarios it's
> > possible to request the old behavior by specifying the additional
> > BPF_LOG_FIXED (8) flag.
> >
> > This patch set adjusts libbpf to allow specifying flags beyond
> > 1 | 2 | 4. We also add --log-size and --log-fixed options to
> > veristat, both to test this functionality manually and for use in
> > various debugging scenarios. We also add selftests that try many
> > variants of log buffer size to stress-test the correctness of the
> > internal verifier log bookkeeping code.
> >
> > v1->v2:
> >   - return -ENOSPC even in rotating log mode for preserving backwards
> >     compatibility (Lorenz);
>
> I haven't implemented the feature we discussed, where the
> BPF_PROG_LOAD (and BPF_BTF_LOAD) command will return back the full
> size of the buffer that's necessary to contain the complete log. I'm
> building it on top of this patch set and would like to send it
> separately as a follow-up, as it touches UAPI some more, and I feel
> like we'll have a few revisions just for this, so I didn't want to
> delay these changes. Plus, I'm thinking of adding this even for
> BPF_LOG_FIXED mode, so it's provided consistently. So I need a bit
> more time to implement it. Hopefully this version will work for
> everyone and can go in sooner.

Lorenz, ping.


* Re: [PATCH v2 bpf-next 0/6] BPF verifier rotating log
  2023-03-30  5:38   ` Alexei Starovoitov
@ 2023-03-30  9:31     ` Lorenz Bauer
  0 siblings, 0 replies; 24+ messages in thread
From: Lorenz Bauer @ 2023-03-30  9:31 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, Andrii Nakryiko, bpf, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Timo Beckers, robin.goegge,
	Kernel Team

On Thu, Mar 30, 2023 at 6:39 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> Lorenz, ping.

Sorry, I've had a busy week and I'm not working tomorrow. I'll try and
get some review done for the rotating log series today at least.

Lorenz


* Re: [PATCH v2 bpf-next 1/6] bpf: split off basic BPF verifier log into separate file
  2023-03-28 23:56 ` [PATCH v2 bpf-next 1/6] bpf: split off basic BPF verifier log into separate file Andrii Nakryiko
@ 2023-03-30 17:12   ` Lorenz Bauer
  0 siblings, 0 replies; 24+ messages in thread
From: Lorenz Bauer @ 2023-03-30 17:12 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, daniel, martin.lau, timo, robin.goegge, kernel-team

On Wed, Mar 29, 2023 at 12:56 AM Andrii Nakryiko <andrii@kernel.org> wrote:
>
> kernel/bpf/verifier.c is large and growing larger all the time. So
> it's good to start splitting off more or less self-contained parts
> into separate files to keep the source code size (somewhat) under
> control.
>
> This patch is one step in this direction, moving some of the BPF
> verifier log routines into a separate kernel/bpf/log.c. Right now
> these are the most low-level and isolated routines for appending data
> to the log, resetting the log to a previous position, etc. Eventually
> we could probably move the verifier state printing logic here as
> well, but this patch doesn't attempt to do that yet.
>
> Subsequent patches will add more logic to verifier log management, so
> having basics in a separate file will make sure verifier.c doesn't grow
> more with new changes.
>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Acked-by: Lorenz Bauer <lmb@isovalent.com>


* Re: [PATCH v2 bpf-next 2/6] bpf: remove minimum size restrictions on verifier log buffer
  2023-03-28 23:56 ` [PATCH v2 bpf-next 2/6] bpf: remove minimum size restrictions on verifier log buffer Andrii Nakryiko
@ 2023-03-30 17:12   ` Lorenz Bauer
  2023-03-30 20:48     ` Andrii Nakryiko
  0 siblings, 1 reply; 24+ messages in thread
From: Lorenz Bauer @ 2023-03-30 17:12 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, daniel, martin.lau, timo, robin.goegge, kernel-team

On Wed, Mar 29, 2023 at 12:56 AM Andrii Nakryiko <andrii@kernel.org> wrote:
>
> It's not clear why we have 128 as the minimum size, but it makes
> testing harder and seems unnecessary, as we carefully handle
> truncation scenarios and use proper snprintf variants. So remove this
> limitation and just enfore a positive length for the log buffer.

Nit: enforce

>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Acked-by: Lorenz Bauer <lmb@isovalent.com>

> ---
>  kernel/bpf/log.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/log.c b/kernel/bpf/log.c
> index 920061e38d2e..1974891fc324 100644
> --- a/kernel/bpf/log.c
> +++ b/kernel/bpf/log.c
> @@ -11,7 +11,7 @@
>
>  bool bpf_verifier_log_attr_valid(const struct bpf_verifier_log *log)
>  {
> -       return log->len_total >= 128 && log->len_total <= UINT_MAX >> 2 &&
> +       return log->len_total > 0 && log->len_total <= UINT_MAX >> 2 &&
>                log->level && log->ubuf && !(log->level & ~BPF_LOG_MASK);

Probably a discussion for your second series. Thought experiment: could
this be len_total >= 0? I'm still after a way to get the correct log
buffer size from the first PROG_LOAD call. If the kernel could handle
this, I could mmap a PROT_NONE page, set len_total to 0, and later read
out the correct buffer size.

I'm guessing that the null termination logic would have to be
adjusted, anything else though?
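
Something like this is what I'm picturing, as a rough userspace sketch
(insns/insn_cnt stand in for a valid program, and log_true_size is a
made-up name for however the kernel would report the needed size back):

void *probe = mmap(NULL, 4096, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
union bpf_attr attr = {};

attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
attr.insns     = (__u64)(unsigned long)insns;
attr.insn_cnt  = insn_cnt;
attr.license   = (__u64)(unsigned long)"GPL";
attr.log_level = 1;
attr.log_buf   = (__u64)(unsigned long)probe;
attr.log_size  = 0; /* len_total == 0: kernel must not write anything,
                     * not even the terminating '\0'
                     */

syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
/* log_true_size would now tell us what buffer size to allocate for
 * the real call
 */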

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 bpf-next 3/6] bpf: switch BPF verifier log to be a rotating log by default
  2023-03-28 23:56 ` [PATCH v2 bpf-next 3/6] bpf: switch BPF verifier log to be a rotating log by default Andrii Nakryiko
@ 2023-03-30 17:12   ` Lorenz Bauer
  2023-03-30 20:48     ` Andrii Nakryiko
  0 siblings, 1 reply; 24+ messages in thread
From: Lorenz Bauer @ 2023-03-30 17:12 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, daniel, martin.lau, timo, robin.goegge, kernel-team

On Wed, Mar 29, 2023 at 12:56 AM Andrii Nakryiko <andrii@kernel.org> wrote:

I like the approach! My main problem is always following the various
offset calculations. I'm still not convinced I understand them fully.

For example, there is a bunch of accounting for the implicit 0 that is
written out to user space. Wouldn't it be easier to not terminate the
buffer during normal operation, and instead only do that in
vlog_finalize()?

Another example is start_pos in bpf_verifier_log. In a way, the
verifier log is a bit peculiar as ring buffers go, since there is no
reader we need to account for. It seems to me like we don't need to
track start_pos at all? We can deduce that truncation has occurred if
end_pos >= total_len. Similarly, the number of useful bytes is
min(end_pos, total_len). We might need to track the total required
size later (haven't had time to look at your other patch set, sorry),
but then we could do that independent of the ring buffer
implementation.

In my mind we could end up with something like this:

void bpf_verifier_vlog(...) {
  ...
  if (log->level & BPF_LOG_FIXED)
    n = min(n, bpf_vlog_available()); // TODO: Need to signal ENOSPC
    // somehow, maybe just a bool?

  bpf_vlog_append(..., n)
}

bpf_vlog_append() would only deal with writing into the buffer, no
BPF_LOG_FIXED specific code. Null termination would happen in
bpf_vlog_finalize.
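
For illustration, bpf_vlog_available() could be something like this (my
sketch, not code from your series; bpf_log_used() is as in the patch):

static u32 bpf_vlog_available(const struct bpf_verifier_log *log)
{
        u32 used = bpf_log_used(log);

        /* reserve one byte for the terminating '\0' and guard against
         * underflow once the buffer is full
         */
        if (used + 1 >= log->len_total)
                return 0;
        return log->len_total - used - 1;
}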

Some more thoughts below:

>  #define BPF_LOG_LEVEL1 1
>  #define BPF_LOG_LEVEL2 2
>  #define BPF_LOG_STATS  4
> +#define BPF_LOG_FIXED  8

Nit: how about calling this BPF_LOG_TRUNCATE_TAIL instead? FIXED only
makes sense with the context that we implement this using a (rotating)
ring buffer.

> --- a/kernel/bpf/log.c
> +++ b/kernel/bpf/log.c
> @@ -8,6 +8,7 @@
>  #include <linux/types.h>
>  #include <linux/bpf.h>
>  #include <linux/bpf_verifier.h>
> +#include <linux/math64.h>
>
>  bool bpf_verifier_log_attr_valid(const struct bpf_verifier_log *log)
>  {
> @@ -32,23 +33,199 @@ void bpf_verifier_vlog(struct bpf_verifier_log *log, const char *fmt,
>                 return;
>         }
>
> -       n = min(log->len_total - log->len_used - 1, n);
> -       log->kbuf[n] = '\0';
> -       if (!copy_to_user(log->ubuf + log->len_used, log->kbuf, n + 1))
> -               log->len_used += n;
> -       else
> -               log->ubuf = NULL;
> +       if (log->level & BPF_LOG_FIXED) {
> +               n = min(log->len_total - bpf_log_used(log) - 1, n);

Noob question, doesn't log->len_total - bpf_log_used(log) - 1
underflow if the log is full? Maybe worth turning this into
bpf_vlog_available() as well.

> +               log->kbuf[n] = '\0';
> +               n += 1;

I personally find the original style of doing copy_to_user(..., n+1)
instead of adding and subtracting from n easier to read. Still prefer
to terminate in finalize().

> +
> +               if (copy_to_user(log->ubuf + log->end_pos, log->kbuf, n))
> +                       goto fail;
> +
> +               log->end_pos += n - 1; /* don't count terminating '\0' */
> +       } else {
> +               u64 new_end, new_start, cur_pos;
> +               u32 buf_start, buf_end, new_n;
> +
> +               log->kbuf[n] = '\0';
> +               n += 1;
> +
> +               new_end = log->end_pos + n;
> +               if (new_end - log->start_pos >= log->len_total)
> +                       new_start = new_end - log->len_total;
> +               else
> +                       new_start = log->start_pos;
> +               new_n = min(n, log->len_total);
> +               cur_pos = new_end - new_n;
> +
> +               div_u64_rem(cur_pos, log->len_total, &buf_start);
> +               div_u64_rem(new_end, log->len_total, &buf_end);
> +               /* new_end and buf_end are exclusive indices, so if buf_end is
> +                * exactly zero, then it actually points right to the end of
> +                * ubuf and there is no wrap around
> +                */
> +               if (buf_end == 0)
> +                       buf_end = log->len_total;
> +
> +               /* if buf_start > buf_end, we wrapped around;
> +                * if buf_start == buf_end, then we fill ubuf completely; we
> +                * can't have buf_start == buf_end to mean that there is
> +                * nothing to write, because we always write at least
> +                * something, even if terminal '\0'
> +                */
> +               if (buf_start < buf_end) {
> +                       /* message fits within contiguous chunk of ubuf */
> +                       if (copy_to_user(log->ubuf + buf_start,
> +                                        log->kbuf + n - new_n,
> +                                        buf_end - buf_start))
> +                               goto fail;
> +               } else {
> +                       /* message wraps around the end of ubuf, copy in two chunks */
> +                       if (copy_to_user(log->ubuf + buf_start,
> +                                        log->kbuf + n - new_n,
> +                                        log->len_total - buf_start))
> +                               goto fail;
> +                       if (copy_to_user(log->ubuf,
> +                                        log->kbuf + n - buf_end,
> +                                        buf_end))
> +                               goto fail;
> +               }
> +
> +               log->start_pos = new_start;
> +               log->end_pos = new_end - 1; /* don't count terminating '\0' */
> +       }
> +
> +       return;
> +fail:
> +       log->ubuf = NULL;
>  }
>
> -void bpf_vlog_reset(struct bpf_verifier_log *log, u32 new_pos)
> +void bpf_vlog_reset(struct bpf_verifier_log *log, u64 new_pos)
>  {
>         char zero = 0;
> +       u32 pos;
>
>         if (!bpf_verifier_log_needed(log))
>                 return;
>
> -       log->len_used = new_pos;
> -       if (put_user(zero, log->ubuf + new_pos))
> +       /* if position to which we reset is beyond current log window,
> +        * then we didn't preserve any useful content and should adjust
> +        * start_pos to end up with an empty log (start_pos == end_pos)
> +        */
> +       log->end_pos = new_pos;

Is it valid for new_pos to be > log->end_pos? If not it would be nice
to bail out here (maybe with a WARN_ON_ONCE?). Then this function
could become bpf_vlog_truncate.

> +       if (log->end_pos < log->start_pos)
> +               log->start_pos = log->end_pos;
> +       div_u64_rem(new_pos, log->len_total, &pos);
> +       if (put_user(zero, log->ubuf + pos))
> +               log->ubuf = NULL;
> +}
> +
> +static void bpf_vlog_reverse_kbuf(char *buf, int len)

This isn't really kbuf specific, how about just reverse_buf?

> +{
> +       int i, j;
> +
> +       for (i = 0, j = len - 1; i < j; i++, j--)
> +               swap(buf[i], buf[j]);
> +}
> +
> +static int bpf_vlog_reverse_ubuf(struct bpf_verifier_log *log, int start, int end)

Oh boy, I spent a while trying to understand this, then trying to come
up with something simpler. Neither of which I was very successful with
:o)

> +{
> +       /* we split log->kbuf into two equal parts for both ends of array */
> +       int n = sizeof(log->kbuf) / 2, nn;
> +       char *lbuf = log->kbuf, *rbuf = log->kbuf + n;
> +
> +       /* Read ubuf's section [start, end) two chunks at a time, from left
> +        * and right side; within each chunk, swap all the bytes; after that
> +        * reverse the order of lbuf and rbuf and write result back to ubuf.
> +        * This way we'll end up with swapped contents of specified
> +        * [start, end) ubuf segment.
> +        */
> +       while (end - start > 1) {
> +               nn = min(n, (end - start ) / 2);
> +
> +               if (copy_from_user(lbuf, log->ubuf + start, nn))
> +                       return -EFAULT;
> +               if (copy_from_user(rbuf, log->ubuf + end - nn, nn))
> +                       return -EFAULT;
> +
> +               bpf_vlog_reverse_kbuf(lbuf, nn);
> +               bpf_vlog_reverse_kbuf(rbuf, nn);
> +
> +               /* we write lbuf to the right end of ubuf, while rbuf to the
> +                * left one to end up with properly reversed overall ubuf
> +                */
> +               if (copy_to_user(log->ubuf + start, rbuf, nn))
> +                       return -EFAULT;
> +               if (copy_to_user(log->ubuf + end - nn, lbuf, nn))
> +                       return -EFAULT;
> +
> +               start += nn;
> +               end -= nn;
> +       }
> +
> +       return 0;
> +}
> +
> +bool bpf_vlog_truncated(const struct bpf_verifier_log *log)
> +{
> +       if (log->level & BPF_LOG_FIXED)
> +               return bpf_log_used(log) >= log->len_total - 1;

Can this ever return true? In verifier_vlog we avoid writing more than
total - bpf_log_used. Seems like your new test case covers this, so
I'm a bit lost...

> +       else
> +               return log->start_pos > 0;
> +}
> +
> +void bpf_vlog_finalize(struct bpf_verifier_log *log)
> +{
> +       u32 sublen;

Nit: "pivot" might be more appropriate?

> +       int err;
> +
> +       if (!log || !log->level || !log->ubuf)
> +               return;
> +       if ((log->level & BPF_LOG_FIXED) || log->level == BPF_LOG_KERNEL)
> +               return;
> +
> +       /* If we never truncated log, there is nothing to move around. */
> +       if (log->start_pos == 0)
> +               return;
> +
> +       /* Otherwise we need to rotate log contents to make it start from the
> +        * buffer beginning and be a continuous zero-terminated string. Note
> +        * that if log->start_pos != 0 then we definitely filled up entire log
> +        * buffer with no gaps, and we just need to shift buffer contents to
> +        * the left by (log->start_pos % log->len_total) bytes.
> +        *
> +        * Unfortunately, user buffer could be huge and we don't want to
> +        * allocate temporary kernel memory of the same size just to shift
> +        * contents in a straightforward fashion. Instead, we'll be clever and
> +        * do in-place array rotation. This is a leetcode-style problem, which
> +        * could be solved by three rotations.

> It took me a while to understand this, but the problem isn't in-place
> rotation, right? It's that we can't directly access ubuf so we have to
use kbuf as a window.

> +        *
> +        * Let's say we have log buffer that has to be shifted left by 7 bytes
> +        * (spaces and vertical bar is just for demonstrative purposes):
> +        *   E F G H I J K | A B C D
> +        *
> +        * First, we reverse entire array:
> +        *   D C B A | K J I H G F E
> +        *
> +        * Then we rotate first 4 bytes (DCBA) and separately last 7 bytes
> +        * (KJIHGFE), resulting in a properly rotated array:
> +        *   A B C D | E F G H I J K
> +        *
> +        * We'll utilize log->kbuf to read user memory chunk by chunk, swap
> +        * bytes, and write them back. Doing it byte-by-byte would be
> +        * unnecessarily inefficient. Altogether we are going to read and
> +        * write each byte twice.

But four copies in total?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 bpf-next 4/6] libbpf: don't enforce verifier log levels on libbpf side
  2023-03-28 23:56 ` [PATCH v2 bpf-next 4/6] libbpf: don't enforce verifier log levels on libbpf side Andrii Nakryiko
@ 2023-03-30 17:12   ` Lorenz Bauer
  2023-03-30 21:05     ` Andrii Nakryiko
  0 siblings, 1 reply; 24+ messages in thread
From: Lorenz Bauer @ 2023-03-30 17:12 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, daniel, martin.lau, timo, robin.goegge, kernel-team

On Wed, Mar 29, 2023 at 12:56 AM Andrii Nakryiko <andrii@kernel.org> wrote:
>
> This basically prevents any forward compatibility. And we either way
> just return -EINVAL, which would otherwise be returned from bpf()
> syscall anyways.

In your cover letter you make the argument that applications can opt
out of the behaviour, but I think this shows that it isn't entirely true.
Apps linking old libbpf won't be able to fix their breakage without
updating libbpf. This is especially annoying when you have to support
multiple old versions where doing this isn't straightforward.

Take this as another plea to make this opt in and instead work
together to make this a default on the lib side. :)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 bpf-next 6/6] selftests/bpf: add fixed vs rotating verifier log tests
  2023-03-28 23:56 ` [PATCH v2 bpf-next 6/6] selftests/bpf: add fixed vs rotating verifier log tests Andrii Nakryiko
@ 2023-03-30 17:12   ` Lorenz Bauer
  2023-03-30 21:07     ` Andrii Nakryiko
  0 siblings, 1 reply; 24+ messages in thread
From: Lorenz Bauer @ 2023-03-30 17:12 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, daniel, martin.lau, timo, robin.goegge, kernel-team

On Wed, Mar 29, 2023 at 12:56 AM Andrii Nakryiko <andrii@kernel.org> wrote:
>
> Add selftests validating BPF_LOG_FIXED behavior, which used to be the
> only behavior, and now default rotating BPF verifier log, which returns
> just up to last N bytes of full verifier log, instead of returning
> -ENOSPC.
>
> To stress test correctness of in-kernel verifier log logic, we force it
> to truncate program's verifier log to all lengths from 1 all the way to
> its full size (about 450 bytes today). This was a useful stress test
> while developing the feature.
>
> For both fixed and rotating log modes we expect -ENOSPC if log contents
> doesn't fit in user-supplied log buffer.
>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Acked-by: Lorenz Bauer <lmb@isovalent.com>

> +       /* validate BPF_LOG_FIXED works as verifier log used to work, that is:
> +        * we get -ENOSPC and beginning of the full verifier log. This only
> +        * works for log_level 2 and log_level 1 + failed program. For log
> +        * level 2 we don't reset log at all. For log_level 1 + failed program
> +        * we don't get to verification stats output. With log level 1
> +        * for successful program  final result will be just verifier stats.
> +        * But if provided too short log buf, kernel will NULL-out log->ubuf

Out of curiosity: why is ubuf NULLed? Is that something we could change?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 bpf-next 3/6] bpf: switch BPF verifier log to be a rotating log by default
  2023-03-30 17:12   ` Lorenz Bauer
@ 2023-03-30 20:48     ` Andrii Nakryiko
  2023-04-05 17:29       ` Lorenz Bauer
  0 siblings, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2023-03-30 20:48 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, timo,
	robin.goegge, kernel-team

On Thu, Mar 30, 2023 at 10:12 AM Lorenz Bauer <lmb@isovalent.com> wrote:
>
> On Wed, Mar 29, 2023 at 12:56 AM Andrii Nakryiko <andrii@kernel.org> wrote:
>
> I like the approach! My main problem is always following the various
> offset calculations. I'm still not convinced I understand them fully.
>
> For example, there is a bunch of accounting for the implicit 0 that is
> written out to user space. Wouldn't it be easier to not terminate the
> buffer during normal operation, and instead only do that in
> vlog_finalize()?

So I'm preserving the original behavior, but I also think the original
behavior makes sense, as it tries to keep valid string contents at all
times. At least I don't really see much problem with it, do you?

>
> Another example is start_pos in bpf_verifier_log. In a way, the
> verifier log is a bit peculiar as ring buffers go, since there is no
> reader we need to account for. It seems to me like we don't need to
> track start_pos at all? We can deduce that truncation has occurred if
> end_pos >= total_len. Similarly, the number of useful bytes is
> min(end_pos, total_len). We might need to track the total required
> size later (haven't had time to look at your other patch set, sorry),
> but then we could do that independent of the ring buffer
> implementation.

Hm... start_pos definitely is necessary, due to bpf_vlog_reset() at
least. Similarly, end_pos can go back on log reset. But the second
patch set simplifies this further, as we keep track of the maximum log
content size we ever reach, and that can be straightforwardly compared
to log->len_total.

>
> In my mind we could end up with something like this:
>
> void bpf_verifier_vlog(...) {
>   ...
>   if (log->level & BPF_LOG_FIXED)
>     n = min(n, bpf_vlog_available()); // TODO: Need to signal ENOSPC
>     // somehow, maybe just a bool?
>
>   bpf_vlog_append(..., n)
> }
>
> bpf_vlog_append() would only deal with writing into the buffer, no
> BPF_LOG_FIXED specific code. Null termination would happen in
> bpf_vlog_finalize.

I'm not following. bpf_vlog_append() would have to distinguish between
BPF_LOG_FIXED and rotating mode, because in rotating mode you have to
wrap around the physical end of the buffer.
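
Even a generic helper needs the split-copy logic for rotating mode,
something like this (a sketch, ignoring copy_to_user() errors and
assuming 32-bit positions and n <= len_total for simplicity):

/* append n bytes from kbuf at logical position pos of a ring of size
 * len_total, splitting the copy at the physical end of ubuf
 */
u32 off = pos % len_total;
u32 first = min(n, len_total - off);

copy_to_user(ubuf + off, kbuf, first);
if (n > first)
        copy_to_user(ubuf, kbuf + first, n - first);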

>
> Some more thoughts below:
>
> >  #define BPF_LOG_LEVEL1 1
> >  #define BPF_LOG_LEVEL2 2
> >  #define BPF_LOG_STATS  4
> > +#define BPF_LOG_FIXED  8
>
> Nit: how about calling this BPF_LOG_TRUNCATE_TAIL instead? FIXED only
> makes sense with the context that we implement this using a (rotating)
> ring buffer.

It's more verbose, so BPF_LOG_FIXED seems more in line with existing
constant names. But note that this is not part of UAPI; user space
won't see/have a "BPF_LOG_FIXED" constant.

>
> > --- a/kernel/bpf/log.c
> > +++ b/kernel/bpf/log.c
> > @@ -8,6 +8,7 @@
> >  #include <linux/types.h>
> >  #include <linux/bpf.h>
> >  #include <linux/bpf_verifier.h>
> > +#include <linux/math64.h>
> >
> >  bool bpf_verifier_log_attr_valid(const struct bpf_verifier_log *log)
> >  {
> > @@ -32,23 +33,199 @@ void bpf_verifier_vlog(struct bpf_verifier_log *log, const char *fmt,
> >                 return;
> >         }
> >
> > -       n = min(log->len_total - log->len_used - 1, n);
> > -       log->kbuf[n] = '\0';
> > -       if (!copy_to_user(log->ubuf + log->len_used, log->kbuf, n + 1))
> > -               log->len_used += n;
> > -       else
> > -               log->ubuf = NULL;
> > +       if (log->level & BPF_LOG_FIXED) {
> > +               n = min(log->len_total - bpf_log_used(log) - 1, n);
>
> Noob question, doesn't log->len_total - bpf_log_used(log) - 1
> underflow if the log is full? Maybe worth turning this into
> bpf_vlog_available() as well.

It's a valid question. The answer is that currently
bpf_verifier_vlog() assumes the log is not completely full, because all
the callers first check bpf_verifier_log_needed(), which returns false
if the log buffer is full.

I'm removing this assumption (and the whole "is log full" check) in
the follow up patch set.


>
> > +               log->kbuf[n] = '\0';
> > +               n += 1;
>
> I personally find the original style of doing copy_to_user(..., n+1)
> instead of adding and subtracting from n easier to read. Still prefer
> to terminate in finalize().
>

This makes the rest of the math (especially for rotating mode) simpler
with no extra +1/-1. See how many uses of n there are below, and imagine
that we'd have to do +1 in each of them. Conceptually, n is "how much
stuff we write to user-space", so it makes sense for it to include the
zero-terminating byte.

> > +
> > +               if (copy_to_user(log->ubuf + log->end_pos, log->kbuf, n))
> > +                       goto fail;
> > +
> > +               log->end_pos += n - 1; /* don't count terminating '\0' */
> > +       } else {
> > +               u64 new_end, new_start, cur_pos;
> > +               u32 buf_start, buf_end, new_n;
> > +
> > +               log->kbuf[n] = '\0';
> > +               n += 1;
> > +
> > +               new_end = log->end_pos + n;
> > +               if (new_end - log->start_pos >= log->len_total)
> > +                       new_start = new_end - log->len_total;
> > +               else
> > +                       new_start = log->start_pos;
> > +               new_n = min(n, log->len_total);
> > +               cur_pos = new_end - new_n;
> > +
> > +               div_u64_rem(cur_pos, log->len_total, &buf_start);
> > +               div_u64_rem(new_end, log->len_total, &buf_end);
> > +               /* new_end and buf_end are exclusive indices, so if buf_end is
> > +                * exactly zero, then it actually points right to the end of
> > +                * ubuf and there is no wrap around
> > +                */
> > +               if (buf_end == 0)
> > +                       buf_end = log->len_total;
> > +
> > +               /* if buf_start > buf_end, we wrapped around;
> > +                * if buf_start == buf_end, then we fill ubuf completely; we
> > +                * can't have buf_start == buf_end to mean that there is
> > +                * nothing to write, because we always write at least
> > +                * something, even if terminal '\0'
> > +                */
> > +               if (buf_start < buf_end) {
> > +                       /* message fits within contiguous chunk of ubuf */
> > +                       if (copy_to_user(log->ubuf + buf_start,
> > +                                        log->kbuf + n - new_n,
> > +                                        buf_end - buf_start))
> > +                               goto fail;
> > +               } else {
> > +                       /* message wraps around the end of ubuf, copy in two chunks */
> > +                       if (copy_to_user(log->ubuf + buf_start,
> > +                                        log->kbuf + n - new_n,
> > +                                        log->len_total - buf_start))
> > +                               goto fail;
> > +                       if (copy_to_user(log->ubuf,
> > +                                        log->kbuf + n - buf_end,
> > +                                        buf_end))
> > +                               goto fail;
> > +               }
> > +
> > +               log->start_pos = new_start;
> > +               log->end_pos = new_end - 1; /* don't count terminating '\0' */
> > +       }
> > +
> > +       return;
> > +fail:
> > +       log->ubuf = NULL;
> >  }
> >
> > -void bpf_vlog_reset(struct bpf_verifier_log *log, u32 new_pos)
> > +void bpf_vlog_reset(struct bpf_verifier_log *log, u64 new_pos)
> >  {
> >         char zero = 0;
> > +       u32 pos;
> >
> >         if (!bpf_verifier_log_needed(log))
> >                 return;
> >
> > -       log->len_used = new_pos;
> > -       if (put_user(zero, log->ubuf + new_pos))
> > +       /* if position to which we reset is beyond current log window,
> > +        * then we didn't preserve any useful content and should adjust
> > +        * start_pos to end up with an empty log (start_pos == end_pos)
> > +        */
> > +       log->end_pos = new_pos;
>
> Is it valid for new_pos to be > log->end_pos? If not it would be nice
> to bail out here (maybe with a WARN_ON_ONCE?). Then this function
> could become bpf_vlog_truncate.

new_pos should always be <= end_pos, unless there is some bug
somewhere else. It's unlikely we'll have a bug with this, because the
verifier code just remembers the previous end_pos temporarily and
resets to it; it doesn't do any math on that, I think.

I can add this extra check, but this feels like it falls into the
"unnecessary defensive code" bucket, so I expect a push back for that.

>
> > +       if (log->end_pos < log->start_pos)
> > +               log->start_pos = log->end_pos;
> > +       div_u64_rem(new_pos, log->len_total, &pos);
> > +       if (put_user(zero, log->ubuf + pos))
> > +               log->ubuf = NULL;
> > +}
> > +
> > +static void bpf_vlog_reverse_kbuf(char *buf, int len)
>
> This isn't really kbuf specific, how about just reverse_buf?

kbuf as opposed to ubuf. Kernel-space manipulations, which don't
require copy_from_user. I wanted to emphasize this distinction and
keep symmetry with bpf_vlog_reverse_ubuf().

>
> > +{
> > +       int i, j;
> > +
> > +       for (i = 0, j = len - 1; i < j; i++, j--)
> > +               swap(buf[i], buf[j]);
> > +}
> > +
> > +static int bpf_vlog_reverse_ubuf(struct bpf_verifier_log *log, int start, int end)
>
> Oh boy, I spent a while trying to understand this, then trying to come
> up with something simpler. Neither of which I was very successful with
> :o)

Yeah, it's conceptually very simple, but because of the kernel-user
boundary and buffering, we need to process it two symmetrical chunks
at a time.

>
> > +{
> > +       /* we split log->kbuf into two equal parts for both ends of array */
> > +       int n = sizeof(log->kbuf) / 2, nn;
> > +       char *lbuf = log->kbuf, *rbuf = log->kbuf + n;
> > +
> > +       /* Read ubuf's section [start, end) two chunks at a time, from left
> > +        * and right side; within each chunk, swap all the bytes; after that
> > +        * reverse the order of lbuf and rbuf and write result back to ubuf.
> > +        * This way we'll end up with swapped contents of specified
> > +        * [start, end) ubuf segment.
> > +        */
> > +       while (end - start > 1) {
> > +               nn = min(n, (end - start ) / 2);
> > +
> > +               if (copy_from_user(lbuf, log->ubuf + start, nn))
> > +                       return -EFAULT;
> > +               if (copy_from_user(rbuf, log->ubuf + end - nn, nn))
> > +                       return -EFAULT;
> > +
> > +               bpf_vlog_reverse_kbuf(lbuf, nn);
> > +               bpf_vlog_reverse_kbuf(rbuf, nn);
> > +
> > +               /* we write lbuf to the right end of ubuf, while rbuf to the
> > +                * left one to end up with properly reversed overall ubuf
> > +                */
> > +               if (copy_to_user(log->ubuf + start, rbuf, nn))
> > +                       return -EFAULT;
> > +               if (copy_to_user(log->ubuf + end - nn, lbuf, nn))
> > +                       return -EFAULT;
> > +
> > +               start += nn;
> > +               end -= nn;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> > +bool bpf_vlog_truncated(const struct bpf_verifier_log *log)
> > +{
> > +       if (log->level & BPF_LOG_FIXED)
> > +               return bpf_log_used(log) >= log->len_total - 1;
>
> Can this ever return true? In verifier_vlog we avoid writing more than
> total - bpf_log_used. Seems like your new test case covers this, so
> I'm a bit lost...

I think so. bpf_log_used() returns the buffer contents excluding the
terminating zero, so it can only go up to log->len_total - 1, but if
it is at log->len_total - 1, then we treat it as overflow. I mention
this in the second patch set, but this approach doesn't distinguish a
log buffer filled up to exactly N bytes from a log buffer that was > N
bytes long but got truncated. I actually change and simplify this
logic in the second patch set and make it distinguish these two
situations. This was important to make sure that log_size_actual can
be used exactly as is, otherwise users would have to supply at least
log_size_actual + 1.
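
To illustrate with made-up numbers: with len_total == 8, a log whose
full contents are exactly 7 characters ends up with bpf_log_used() ==
7 == len_total - 1, which is indistinguishable from, say, a 20-character
log that got cut down to fit, so the >= len_total - 1 check has to
report both as truncated.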

So, basically, here, I'm preserving existing behavior, and patch set 2
changes this bit as part of overall fixed mode overhaul.

>
> > +       else
> > +               return log->start_pos > 0;
> > +}
> > +
> > +void bpf_vlog_finalize(struct bpf_verifier_log *log)
> > +{
> > +       u32 sublen;
>
> Nit: "pivot" might be more appropriate?

div_u64_rem(log->start_pos, log->len_total, &sublen);
sublen = log->len_total - sublen;

pivot implies it's the index of an element, which then raises questions
about inclusive/exclusive, right? While a length is just a length, so it
seems less ambiguous?


>
> > +       int err;
> > +
> > +       if (!log || !log->level || !log->ubuf)
> > +               return;
> > +       if ((log->level & BPF_LOG_FIXED) || log->level == BPF_LOG_KERNEL)
> > +               return;
> > +
> > +       /* If we never truncated log, there is nothing to move around. */
> > +       if (log->start_pos == 0)
> > +               return;
> > +
> > +       /* Otherwise we need to rotate log contents to make it start from the
> > +        * buffer beginning and be a continuous zero-terminated string. Note
> > +        * that if log->start_pos != 0 then we definitely filled up entire log
> > +        * buffer with no gaps, and we just need to shift buffer contents to
> > +        * the left by (log->start_pos % log->len_total) bytes.
> > +        *
> > +        * Unfortunately, user buffer could be huge and we don't want to
> > +        * allocate temporary kernel memory of the same size just to shift
> > +        * contents in a straightforward fashion. Instead, we'll be clever and
> > +        * do in-place array rotation. This is a leetcode-style problem, which
> > +        * could be solved by three rotations.
>
> > It took me a while to understand this, but the problem isn't in-place
> > rotation, right? It's that we can't directly access ubuf so we have to
> use kbuf as a window.

Hm.. no, it is the rotation in place. Even if it was in the kernel
buffer and we wanted to rotate this without creating a second large
copy of the buffer, we'd have to do this double rotation. Kernel vs
user space adds complications for sure, but it is hidden inside
bpf_vlog_reverse_ubuf(), and as far as this overall algorithm goes,
doesn't matter.

>
> > +        *
> > +        * Let's say we have log buffer that has to be shifted left by 7 bytes
> > +        * (spaces and vertical bar is just for demonstrative purposes):
> > +        *   E F G H I J K | A B C D
> > +        *
> > +        * First, we reverse entire array:
> > +        *   D C B A | K J I H G F E
> > +        *
> > +        * Then we rotate first 4 bytes (DCBA) and separately last 7 bytes
> > +        * (KJIHGFE), resulting in a properly rotated array:
> > +        *   A B C D | E F G H I J K
> > +        *
> > +        * We'll utilize log->kbuf to read user memory chunk by chunk, swap
> > +        * bytes, and write them back. Doing it byte-by-byte would be
> > +        * unnecessarily inefficient. Altogether we are going to read and
> > +        * write each byte twice.
>
> But four copies in total?

So each rotation reads each byte once and writes each byte once. So
two copies. And then the entire buffer is rotated twice (three
rotating steps, but overall contents is rotated twice), so two reads
and two writes for each byte, 4 memory copies altogether. Would you
like me to clarify this some more?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 bpf-next 2/6] bpf: remove minimum size restrictions on verifier log buffer
  2023-03-30 17:12   ` Lorenz Bauer
@ 2023-03-30 20:48     ` Andrii Nakryiko
  0 siblings, 0 replies; 24+ messages in thread
From: Andrii Nakryiko @ 2023-03-30 20:48 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, timo,
	robin.goegge, kernel-team

On Thu, Mar 30, 2023 at 10:12 AM Lorenz Bauer <lmb@isovalent.com> wrote:
>
> On Wed, Mar 29, 2023 at 12:56 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> >
> > It's not clear why we have 128 as minimum size, but it makes testing
> > harder and seems unnecessary, as we carefully handle truncation
> > scenarios and use proper snprintf variants. So remove this limitation
> > and just enfore positive length for log buffer.
>
> Nit: enforce

life is boring without typos, will fix in the next revision (if there
has to be another one) :)

>
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>
> Acked-by: Lorenz Bauer <lmb@isovalent.com>
>
> > ---
> >  kernel/bpf/log.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/bpf/log.c b/kernel/bpf/log.c
> > index 920061e38d2e..1974891fc324 100644
> > --- a/kernel/bpf/log.c
> > +++ b/kernel/bpf/log.c
> > @@ -11,7 +11,7 @@
> >
> >  bool bpf_verifier_log_attr_valid(const struct bpf_verifier_log *log)
> >  {
> > -       return log->len_total >= 128 && log->len_total <= UINT_MAX >> 2 &&
> > +       return log->len_total > 0 && log->len_total <= UINT_MAX >> 2 &&
> >                log->level && log->ubuf && !(log->level & ~BPF_LOG_MASK);
>
> Probably a discussion for your second series. Thought experiment: could
> this be len_total >= 0? I'm still after a way to get the correct log
> buffer size from the first PROG_LOAD call. If the kernel could handle
> this, I could mmap a PROT_NONE page, set len_total to 0 and later read
> out the correct buffer size.
>
> I'm guessing that the null termination logic would have to be
> adjusted; anything else, though?

I haven't tried, but I'll check if anything needs to be adjusted. But
I don't mind this semantics, it matches other places, like when
getting JIT dump for program info, etc.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 bpf-next 4/6] libbpf: don't enforce verifier log levels on libbpf side
  2023-03-30 17:12   ` Lorenz Bauer
@ 2023-03-30 21:05     ` Andrii Nakryiko
  2023-03-31  9:39       ` Timo Beckers
  0 siblings, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2023-03-30 21:05 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, timo,
	robin.goegge, kernel-team

On Thu, Mar 30, 2023 at 10:13 AM Lorenz Bauer <lmb@isovalent.com> wrote:
>
> On Wed, Mar 29, 2023 at 12:56 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> >
> > This basically prevents any forward compatibility. And we either way
> > just return -EINVAL, which would otherwise be returned from bpf()
> > syscall anyways.
>
> In your cover letter you make the argument that applications can opt
> out of the behaviour, but I think this shows that it isn't entirely true.
> Apps linking old libbpf won't be able to fix their breakage without
> updating libbpf. This is especially annoying when you have to support
> multiple old versions where doing this isn't straightforward.
>

Ok, technically, you are correct. If you somehow managed to get a
bleeding edge kernel, but outdated libbpf, you won't be able to
specify log_level = 8. This is not the only place where too old libbpf
would limit you from using bleeding edge kernel features, though, and
we have to live with that (though try our best to avoid such
dependencies, of course).

But in practice you get the freshest libbpf way before your kernel
becomes the freshest one, so I don't think this is a big deal in
practice.

> Take this as another plea to make this opt in and instead work
> together to make this a default on the lib side. :)

Please, help me understand the arguments against making rotating mode
a default, now that we return -ENOSPC on truncation. In which scenario
does this difference matter?

1. If there is no truncation and the user provides a big enough buffer
(which my second patch set makes it even easier to do for libraries),
there is no difference, they get identical log contents and behavior.

2. If there was truncation, in both cases we get -ENOSPC. The contents
will differ. In one case we get the beginning of a long log with no
details about what actually caused the failure (useless in pretty much
any circumstances) versus you get the last N bytes of log, all the way
to actual error and some history leading towards it. Which is what we
used to debug and understand verification issues.

What is the situation where the beginning of the log is preferable? I
had exactly one case where I actually wanted the beginning of the log,
that was when I was debugging some bug in the verifier when
implementing open-coded iterators. This bug was happening early and
causing an infinite loop, so I wanted to see the first few pages of
the output to catch how it all started. But that was a development bug
of a tricky feature, definitely not something we expect for end users
to deal with. And it was literally *once* that I needed this.

Why are we fighting to preserve this much less useful behavior as a
default, if there is no reduction of functionality for end-users?
Library writers have full access to union bpf_attr and can opt-out
easily (though again, why?). Normal end users will never have to ask
for BPF_LOG_FIXED behavior. Maybe some advanced tool-building users
will want BPF_LOG_FIXED (like veristat, for example), but then it's in
their best interest to have fresh enough libbpf anyways.

So instead of "I want X over Y", let's discuss "*why* X is better than Y"?

> Apps linking old libbpf won't be able to fix their breakage without
> updating libbpf. This is especially annoying when you have to support

What sort of breakage would be there to fix?

Also keep in mind that not all use cases go through a BPF library's
high-level code that does all these fancy log buf manipulations. There
are legitimate cases where tools/applications want direct access to
log_buf, so needing to do extra feature detection to get rotating mode
(while falling back to fixed mode, without failing, on an old kernel) is
just an unnecessary nuisance.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 bpf-next 6/6] selftests/bpf: add fixed vs rotating verifier log tests
  2023-03-30 17:12   ` Lorenz Bauer
@ 2023-03-30 21:07     ` Andrii Nakryiko
  0 siblings, 0 replies; 24+ messages in thread
From: Andrii Nakryiko @ 2023-03-30 21:07 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, timo,
	robin.goegge, kernel-team

On Thu, Mar 30, 2023 at 10:13 AM Lorenz Bauer <lmb@isovalent.com> wrote:
>
> On Wed, Mar 29, 2023 at 12:56 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> >
> > Add selftests validating BPF_LOG_FIXED behavior, which used to be the
> > only behavior, and now default rotating BPF verifier log, which returns
> > just up to last N bytes of full verifier log, instead of returning
> > -ENOSPC.
> >
> > To stress test correctness of in-kernel verifier log logic, we force it
> > to truncate program's verifier log to all lengths from 1 all the way to
> > its full size (about 450 bytes today). This was a useful stress test
> > while developing the feature.
> >
> > For both fixed and rotating log modes we expect -ENOSPC if log contents
> > doesn't fit in user-supplied log buffer.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>
> Acked-by: Lorenz Bauer <lmb@isovalent.com>
>
> > +       /* validate BPF_LOG_FIXED works as verifier log used to work, that is:
> > +        * we get -ENOSPC and beginning of the full verifier log. This only
> > +        * works for log_level 2 and log_level 1 + failed program. For log
> > +        * level 2 we don't reset log at all. For log_level 1 + failed program
> > +        * we don't get to verification stats output. With log level 1
> > +        * for successful program  final result will be just verifier stats.
> > +        * But if provided too short log buf, kernel will NULL-out log->ubuf
>
> Out of curiosity: why is ubuf NULLed? Is that something we could change?

This is an internal kernel implementation detail. That's how the verifier
signals that there was an error during copy_to_user(). This behavior is
not observable from user space at all. Why would you like to change
that? bpf_attr's log_buf field will never be NULLed out, if that's what
you are worried about.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 bpf-next 4/6] libbpf: don't enforce verifier log levels on libbpf side
  2023-03-30 21:05     ` Andrii Nakryiko
@ 2023-03-31  9:39       ` Timo Beckers
  2023-03-31 15:59         ` Andrii Nakryiko
  0 siblings, 1 reply; 24+ messages in thread
From: Timo Beckers @ 2023-03-31  9:39 UTC (permalink / raw)
  To: Andrii Nakryiko, Lorenz Bauer
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, robin.goegge, kernel-team

On 3/30/23 23:05, Andrii Nakryiko wrote:
> On Thu, Mar 30, 2023 at 10:13 AM Lorenz Bauer <lmb@isovalent.com> wrote:
>> On Wed, Mar 29, 2023 at 12:56 AM Andrii Nakryiko <andrii@kernel.org> wrote:
>>> This basically prevents any forward compatibility. And we either way
>>> just return -EINVAL, which would otherwise be returned from bpf()
>>> syscall anyways.
>> In your cover letter you make the argument that applications can opt
>> out of the behaviour, but I think this shows that it isn't entirely true.
>> Apps linking old libbpf won't be able to fix their breakage without
>> updating libbpf. This is especially annoying when you have to support
>> multiple old versions where doing this isn't straightforward.
>>
> Ok, technically, you are correct. If you somehow managed to get a
> bleeding edge kernel, but outdated libbpf, you won't be able to
> specify log_level = 8. This is not the only place where too old libbpf
> would limit you from using bleeding edge kernel features, though, and
> we have to live with that (though try our best to avoid such
> dependencies, of course).
>
> But in practice you get the freshest libbpf way before your kernel
> becomes the freshest one, so I don't think this is a big deal in
> practice.
Hey Andrii,

In an ideal world, yes, but not how it works out in practice. Anything
that ships in a container obviously misses out on distro packages of the
underlying host that are tied to the running kernel. This looks like it's
quickly becoming a majority use case of bpf in the wild, barring of course
Android and systemd.

In practice, we get to deal with rather large version swings in both
directions; users may want to run the latest tools on 4.19, or last
year's stable tool release on 6.1, so the need for both backwards- and
forwards-compatibility is definitely there.

Fortunately, things don't break that often. :) The biggest papercut
we've had recently was the introduction of enum64, which just flat
out requires the latest userspace if kernel btf is needed.
>> Take this as another plea to make this opt in and instead work
>> together to make this a default on the lib side. :)
> Please, help me understand the arguments against making rotating mode
> a default, now that we return -ENOSPC on truncation. In which scenario
> does this difference matter?
I think there may be a misunderstanding. As you mentioned, there's rarely
anything useful at the start of the log. I think the opt-in behaviour
under discussion here is the lack of ENOSPC when the buffer is undersized.
Userspace should set a flag if it supports receiving a partial log without
expecting ENOSPC.

I've lost track of the exact changes that are now on the table with your
second patch set floating around. The way I understand it, there are
multiple isolated behavioural changes, so let's discuss them separately:

- Log will include the tail instead of the head. This is a good change, no
   argument there, and personally I wouldn't bother with a flag until
   someone complains. Old userspace is (worst case) already used to retrying
   with bigger buffers until ENOSPC is gone, and (best case) gets a few
   pages of actually useful log if it doesn't retry.

- log_size_actual: if populated by the kernel, userspace can do a perfect
   retry with the correct buffer size. Userspace will naturally be able to
   pick this up when their bpf_attr gets updated. No opt-in/flags needed
   because not a breaking change (a sketch of such a retry loop follows
   after this list).

- No more ENOSPC: this needs to be opt-in, as old userspace relies on ENOSPC
   to drive the retry loop. If the kernel no longer returns ENOSPC by
   default, userspace will think it received a full log and won't be able
   to detect truncation if it doesn't yet know about the log_size_actual
   field. From what I gather, we're also in agreement on this. Idea for a
   flag name: BPF_LOG_PARTIAL?
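
To make the log_size_actual point concrete, the perfect retry I have in
mind would look roughly like this (a sketch; try_prog_load() stands in
for the BPF_PROG_LOAD wrapper, LOG_BUF_SIZE is whatever the app starts
with, and log_size_actual is the field from Andrii's follow-up series):

char *buf = malloc(LOG_BUF_SIZE);
int fd;

attr.log_buf  = (__u64)(unsigned long)buf;
attr.log_size = LOG_BUF_SIZE;
fd = try_prog_load(&attr);
if (fd < 0 && errno == ENOSPC) {
        /* one retry with the exact size, no doubling loop needed */
        buf = realloc(buf, attr.log_size_actual);
        attr.log_buf  = (__u64)(unsigned long)buf;
        attr.log_size = attr.log_size_actual;
        fd = try_prog_load(&attr);
}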

Hope this can move the discussion forward; it looked to me like we were
just talking past each other. Thanks for addressing our feedback so far!
> 1. If there is no truncation and the user provides a big enough buffer
> (which my second patch set makes it even easier to do for libraries),
> there is no difference, they get identical log contents and behavior.
>
> 2. If there was truncation, in both cases we get -ENOSPC. The contents
> will differ. In one case we get the beginning of a long log with no
> details about what actually caused the failure (useless in pretty much
> any circumstances) versus you get the last N bytes of log, all the way
> to actual error and some history leading towards it. Which is what we
> used to debug and understand verification issues.
>
> What is the situation where the beginning of the log is preferable? I
> had exactly one case where I actually wanted the beginning of the log,
> that was when I was debugging some bug in the verifier when
> implementing open-coded iterators. This bug was happening early and
> causing an infinite loop, so I wanted to see the first few pages of
> the output to catch how it all started. But that was a development bug
> of a tricky feature, definitely not something we expect for end users
> to deal with. And it was literally *once* that I needed this.
>
> Why are we fighting to preserve this much less useful behavior as a
> default, if there is no reduction of functionality for end-users?
> Library writers have full access to union bpf_attr and can opt-out
> easily (though again, why?). Normal end users will never have to ask
> for BPF_LOG_FIXED behavior. Maybe some advanced tool-building users
> will want BPF_LOG_FIXED (like veristat, for example), but then it's in
> their best interest to have fresh enough libbpf anyways.
>
> So instead of "I want X over Y", let's discuss "*why* X is better than Y"?
>
>> Apps linking old libbpf won't be able to fix their breakage without
>> updating libbpf. This is especially annoying when you have to support
> What sort of breakage would be there to fix?
>
> Also keep in mind that not all use cases go through a BPF library's
> high-level code that does all these fancy log buf manipulations. There
> are legitimate cases where tools/applications want direct access to
> log_buf, so needing to do extra feature detection to get rotating mode
> (while falling back to fixed mode, without failing, on an old kernel) is
> just an unnecessary nuisance.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 bpf-next 4/6] libbpf: don't enforce verifier log levels on libbpf side
  2023-03-31  9:39       ` Timo Beckers
@ 2023-03-31 15:59         ` Andrii Nakryiko
  0 siblings, 0 replies; 24+ messages in thread
From: Andrii Nakryiko @ 2023-03-31 15:59 UTC (permalink / raw)
  To: Timo Beckers
  Cc: Lorenz Bauer, Andrii Nakryiko, bpf, ast, daniel, martin.lau,
	robin.goegge, kernel-team

On Fri, Mar 31, 2023 at 2:39 AM Timo Beckers <timo@incline.eu> wrote:
>
> On 3/30/23 23:05, Andrii Nakryiko wrote:
> > On Thu, Mar 30, 2023 at 10:13 AM Lorenz Bauer <lmb@isovalent.com> wrote:
> >> On Wed, Mar 29, 2023 at 12:56 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> >>> This basically prevents any forward compatibility. And we either way
> >>> just return -EINVAL, which would otherwise be returned from bpf()
> >>> syscall anyways.
> >> In your cover letter you make the argument that applications can opt
> >> out of the behaviour, but I think this shows that it isn't entirely true.
> >> Apps linking old libbpf won't be able to fix their breakage without
> >> updating libbpf. This is especially annoying when you have to support
> >> multiple old versions where doing this isn't straightforward.
> >>
> > Ok, technically, you are correct. If you somehow managed to get a
> > bleeding edge kernel, but outdated libbpf, you won't be able to
> > specify log_level = 8. This is not the only place where too old libbpf
> > would limit you from using bleeding edge kernel features, though, and
> > we have to live with that (though try our best to avoid such
> > dependencies, of course).
> >
> > But in practice you get the freshest libbpf way before your kernel
> > becomes the freshest one, so I don't think this is a big deal in
> > practice.
> Hey Andrii,
>

Hi Timo, thanks for chiming in!

> In an ideal world, yes, but not how it works out in practice. Anything
> that ships in a container obviously misses out on distro packages of the
> underlying host that are tied to the running kernel. This looks like it's
> quickly becoming a majority use case of bpf in the wild, barring of course
> Android and systemd.
>
> In practice, we get to deal with rather large version swings in both
> directions; users may want to run the latest tools on 4.19, or last
> year's stable tool release on 6.1, so the need for both backwards- and
> forwards-compatibility is definitely there.

Yep, that sucks. And it just reminds us again and again of what I've been
consistently telling anyone who wanted to listen: statically linking
against libbpf is the best strategy to avoid unnecessary headaches.
Control your destiny (i.e., dependencies).

But in general, yes, I totally agree that unnecessary changes that
require the newest libbpf should be avoided. I think that in this case
with BPF_LOG_FIXED it's justified, though, and I think you are
agreeing as well, see below.

>
> Fortunately, things don't break that often. :) The biggest papercut
> we've had recently was the introduction of enum64, which just flat
> out requires the latest userspace if kernel btf is needed.

Yep, BTF changes are the most painful and are felt across the entire
range of tools, unfortunately. This is off topic, but it is one of the
reasons we were discussing with Alan how to make BTF more
self-describing. I still think we should do this; no one has had time
to take this on yet.

> >> Take this as another plea to make this opt in and instead work
> >> together to make this a default on the lib side. :)
> > Please, help me understand the arguments against making rotating mode
> > a default, now that we return -ENOSPC on truncation. In which scenario
> > does this difference matter?
> I think there may be a misunderstanding. As you mentioned, there's rarely
> anything useful at the start of the log. I think the opt-in behaviour
> under discussion here is the lack of ENOSPC when the buffer is undersized.
> Userspace should set a flag if it supports receiving a partial log without
> expecting ENOSPC.

Great, thanks for the specifics. I think we can finally be on the same
page. Pretty much the only change in this v2 is exactly the -ENOSPC on
log truncation in *either mode*. See the version log from the cover letter:

  - return -ENOSPC even in rotating log mode for preserving backwards
    compatibility (Lorenz);

>
> I've lost track of the exact changes that are now on the table with your
> second patch set floating around. The way I understand it, there are
> multiple isolated behavioural changes, so let's discuss them separately:
>
> - Log will include the tail instead of the head. This is a good change, no
>    argument there, and personally I wouldn't bother with a flag until
>    someone complains. Old userspace is (worst case) already used to retrying
> >    with bigger buffers until ENOSPC is gone, and (best case) gets a few
> >    pages of actually useful log if it doesn't retry.

Agreed. And that's how it behaves, you don't need a flag to get this
behavior and that's exactly what I'm arguing with Lorenz to preserve.

>
> - log_size_actual: if populated by the kernel, userspace can do a perfect
>    retry with the correct buffer size. Userspace will naturally be able to
>    pick this up when their bpf_attr gets updated. No opt-in/flags needed
>    because not a breaking change.

Yes. That's what the follow up patch set is doing. No new flag, just a
new field which kernel populates. See [0]

  [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=735213&state=*


>
> - No more ENOSPC: this needs to be opt-in, as old userspace relies on ENOSPC
>    to drive the retry loop. If the kernel no longer returns ENOSPC by
>    default, userspace will think it received a full log and won't be able
>    to detect truncation if it doesn't yet know about the log_size_actual
>    field. From what I gather, we're also in agreement on this. Idea for a
>    flag name: BPF_LOG_PARTIAL?

Ok, this is where the misunderstanding is. We don't have a mode where
the verifier succeeds on log truncation. And I don't even plan to add
a flag to enable this no-ENOSPC mode. V2 always returns -ENOSPC on
truncation.

The only difference is that on truncation the user gets the tail of
the log, and that's the default behavior. And BPF_LOG_FIXED is there
only for uncommon cases where the beginning of the log is needed in
case of truncation. That's it.


>
> Hope this can move the discussion forward, it looked to me like we were just
> talking past each other. Thanks for addressing our feedback so far!

Yep, I hope so too. I think this v2 behaves exactly like you want and
expect, and you guys missed that we do now return -ENOSPC even in the
new rotating mode. Please let me know if this clarifies things,
thanks!


> > 1. If there is no truncation and the user provides a big enough buffer
> > (which my second patch set makes it even easier to do for libraries),
> > there is no difference, they get identical log contents and behavior.
> >
> > 2. If there was truncation, in both cases we get -ENOSPC. The contents
> > will differ. In one case we get the beginning of a long log with no
> > details about what actually caused the failure (useless in pretty much
> > any circumstances) versus you get the last N bytes of log, all the way
> > to actual error and some history leading towards it. Which is what we
> > used to debug and understand verification issues.
> >
> > What is the situation where the beginning of the log is preferable? I
> > had exactly one case where I actually wanted the beginning of the log,
> > that was when I was debugging some bug in the verifier when
> > implementing open-coded iterators. This bug was happening early and
> > causing an infinite loop, so I wanted to see the first few pages of
> > the output to catch how it all started. But that was a development bug
> > of a tricky feature, definitely not something we expect for end users
> > to deal with. And it was literally *once* that I needed this.
> >
> > Why are we fighting to preserve this much less useful behavior as a
> > default, if there is no reduction of functionality for end-users?
> > Library writers have full access to union bpf_attr and can opt-out
> > easily (though again, why?). Normal end users will never have to ask
> > for BPF_LOG_FIXED behavior. Maybe some advanced tool-building users
> > will want BPF_LOG_FIXED (like veristat, for example), but then it's in
> > their best interest to have fresh enough libbpf anyways.
> >
> > So instead of "I want X over Y", let's discuss "*why* X is better than Y"?
> >
> >> Apps linking old libbpf won't be able to fix their breakage without
> >> updating libbpf. This is especially annoying when you have to support
> > What sort of breakage would be there to fix?
> >
> > Also keep in mind that not all use cases go through a BPF library's
> > high-level code that does all these fancy log buf manipulations. There
> > are legitimate cases where tools/applications want direct access to
> > log_buf, so needing to do extra feature detection to get rotating mode
> > (while falling back to fixed mode, without failing, on an old kernel) is
> > just an unnecessary nuisance.
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 bpf-next 3/6] bpf: switch BPF verifier log to be a rotating log by default
  2023-03-30 20:48     ` Andrii Nakryiko
@ 2023-04-05 17:29       ` Lorenz Bauer
  2023-04-05 18:03         ` Andrii Nakryiko
  0 siblings, 1 reply; 24+ messages in thread
From: Lorenz Bauer @ 2023-04-05 17:29 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, timo,
	robin.goegge, kernel-team

On Thu, Mar 30, 2023 at 9:48 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> So I'm preserving the original behavior, but I also think the original
> behavior makes sense, as it tries to keep valid string contents at all
> times. At least I don't really see much problem with it, do you?

Like I said, I find offset calculations hard to follow, and they are a
great source of bugs, especially in C. I believe terminating at the
end of the syscall would be easier to understand and less bug-prone.

What is the argument for keeping valid string contents during the
syscall? Peeking at the buffer while the syscall is ongoing? How would
that work with a rotating log, where it's not clear where the start
and the end are? Another observation is that during finalization the
buffer is going to be in all sorts of wonky states due to the shuffle
trick, so we're really not losing much. I'll send a prototype of what
I mean.
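
For illustration, a minimal sketch of "terminate once at the end"
(invented names, not the actual prototype):

#include <stddef.h>

/* During the syscall, appends write raw bytes with no NUL terminator;
 * the terminator is placed exactly once, right before returning to
 * user space. */
static void vlog_terminate(char *buf, size_t len_total, size_t used)
{
	size_t end = used < len_total ? used : len_total - 1;

	buf[end] = '\0';
}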

> Hm... start_pos definitely is necessary due to bpf_vlog_reset() at
> least. Similarly, end_pos can go back on log reset. But second patch
> set simplifies this further, as we keep track of maximum log content
> size we ever reach, and that could be straightforwardly compared to
> log->len_total.

Now that I've fiddled with the code more, I understand start_pos;
sorry for the noise.

> I'm not following. bpf_vlog_append() would have to distinguish between
> BPF_LOG_FIXED and rotating mode, because in rotating mode you have to
> wrap around the physical end of the buffer.

My idea is to prevent rotation by never appending more than what is
"unused". This means you have to deal with signalling truncation
separately, but your max_len makes that nice. Again, I'll try to send
something to illustrate what I mean.
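
For illustration, a rough sketch of the clamping idea (invented names,
not the actual prototype):

#include <stddef.h>
#include <string.h>

struct clamped_log {
	char *buf;
	size_t len_total;
	size_t used;    /* bytes written so far */
	size_t dropped; /* bytes that did not fit */
};

static void vlog_append_clamped(struct clamped_log *log,
				const char *data, size_t n)
{
	size_t unused = log->len_total - log->used;
	size_t copy = n < unused ? n : unused;

	/* never write past the end, so no wraparound logic is needed */
	memcpy(log->buf + log->used, data, copy);
	log->used += copy;
	/* truncation is signalled separately, e.g. as -ENOSPC at exit */
	log->dropped += n - copy;
}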

> > It's more verbose, so BPF_LOG_FIXED seems more in line with existing
> > constant names. But note that this is not part of UAPI; user space
> > won't see/have a "BPF_LOG_FIXED" constant.

Ah! How come we don't have UAPI?

> > This isn't really kbuf specific, how about just reverse_buf?
>
> kbuf as opposed to ubuf. Kernel-space manipulations, which don't
> require copy_from_user. I wanted to emphasize this distinction and
> keep symmetry with bpf_vlog_reverse_ubuf().

Ah, right. Due to the naming, I assumed it reverses log->kbuf, by
symmetry with bpf_vlog_reverse_ubuf.

> Hm... no, it is an in-place rotation. Even if the log were in a kernel
> buffer and we wanted to rotate it without creating a second large copy
> of the buffer, we'd have to do this double rotation.

I said that because in kernel space we could do
https://cplusplus.com/reference/algorithm/rotate/

> So each rotation reads each byte once and writes each byte once, i.e.
> two copies. And the entire buffer contents get rotated twice (three
> reversal steps, but overall the contents are rotated twice), so two
> reads and two writes for each byte, four memory copies altogether.
> Would you like me to clarify this some more?

I think explaining it in terms of copies (instead of read / write)
would make it easier to understand!

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 bpf-next 3/6] bpf: switch BPF verifier log to be a rotating log by default
  2023-04-05 17:29       ` Lorenz Bauer
@ 2023-04-05 18:03         ` Andrii Nakryiko
  0 siblings, 0 replies; 24+ messages in thread
From: Andrii Nakryiko @ 2023-04-05 18:03 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, timo,
	robin.goegge, kernel-team

On Wed, Apr 5, 2023 at 10:29 AM Lorenz Bauer <lmb@isovalent.com> wrote:
>
> On Thu, Mar 30, 2023 at 9:48 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > So I'm preserving the original behavior, but I also think the original
> > behavior makes sense, as it tries to keep valid string contents at all
> > times. At least I don't really see much problem with it, do you?
>
> Like I said, I find offset calculations hard to follow, and they are a
> great source of bugs, especially in C. I believe terminating the string
> at the end of the syscall would be easier to understand and less
> bug-prone.

I mean, from my POV there are no complications in always sending full
strings with zero termination. But I'll check your code and see where
the simplification comes from.

>
> What is the argument for keeping valid string contents during the
> syscall? Peeking at the buffer while the syscall is ongoing? How would

I don't know. It actually helps when debugging the kernel itself, as
we know the contents form a valid C string, so debug-dumping it is a
bit easier. I'm past that point, thankfully, but still.

Another benefit is that even if we forget to finalize properly in one
of the error-handling code paths, we still end up with a valid C
string. But again, I'll take a look at your code.

> that work with a rotating log, where it's not clear where the start
> and the end are? Another observation is that during finalization the
> buffer is going to be in all sorts of wonky states due to the shuffle
> trick, so we're really not losing much. I'll send a prototype of what
> I mean.

I'll take a look at your prototype, thanks. Not that I'm looking
forward to redoing this part of the code, of course, but oh well, I
have to do my due diligence.

>
> > Hm... start_pos definitely is necessary due to bpf_vlog_reset() at
> > least. Similarly, end_pos can go back on log reset. But second patch
> > set simplifies this further, as we keep track of maximum log content
> > size we ever reach, and that could be straightforwardly compared to
> > log->len_total.
>
> Now that I've fiddled with the code more, I understand start_pos;
> sorry for the noise.
>

No worries, this is part of the reviewing process.

> > I'm not following. bpf_vlog_append() would have to distinguish between
> > BPF_LOG_FIXED and rotating mode, because in rotating mode you have to
> > wrap around the physical end of the buffer.
>
> My idea is to prevent rotation by never appending more than what is
> "unused". This means you have to deal with signalling truncation
> separately, but your max_len makes that nice. Again, I'll try to send
> something to illustrate what I mean.
>

Ok.

> > It's more verbose, so BPF_LOG_FIXED seems more in line with existing
> > constant names. But note that this is not part of UAPI; user space
> > won't see/have a "BPF_LOG_FIXED" constant.
>
> Ah! How come we don't have UAPI?
>

Historical reasons? I'm the wrong person to ask.


> > > This isn't really kbuf specific, how about just reverse_buf?
> >
> > kbuf as opposed to ubuf. Kernel-space manipulations, which don't
> > require copy_from_user. I wanted to emphasize this distinction and
> > keep symmetry with bpf_vlog_reverse_ubuf().
>
> Ah, right. Due to the naming, I assumed it reverses log->kbuf, by
> symmetry with bpf_vlog_reverse_ubuf.
>
> > Hm... no, it is an in-place rotation. Even if the log were in a kernel
> > buffer and we wanted to rotate it without creating a second large copy
> > of the buffer, we'd have to do this double rotation.
>
> I said that because in kernel space we could do
> https://cplusplus.com/reference/algorithm/rotate/

Yep, rotating byte-by-byte makes it easier. Doing it in blocks is
significantly harder because there is no common alignment.
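
For reference, the classic byte-by-byte in-place rotation via three
reversals, as a plain C sketch (illustrative only; the actual kernel
code additionally has to shuttle user memory through
copy_from_user()/copy_to_user()):

#include <stddef.h>

/* reverse buf[a..b) in place; each byte is read once and written once */
static void reverse(char *buf, size_t a, size_t b)
{
	while (a + 1 < b) {
		char tmp = buf[a];

		buf[a++] = buf[--b];
		buf[b] = tmp;
	}
}

/* rotate buf[0..n) left by k bytes: two partial reversals plus one
 * full reversal, so every byte is copied twice overall */
static void rotate_left(char *buf, size_t n, size_t k)
{
	reverse(buf, 0, k);
	reverse(buf, k, n);
	reverse(buf, 0, n);
}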

>
> > So each rotation reads each byte once and writes each byte once, i.e.
> > two copies. And the entire buffer contents get rotated twice (three
> > reversal steps, but overall the contents are rotated twice), so two
> > reads and two writes for each byte, four memory copies altogether.
> > Would you like me to clarify this some more?
>
> I think explaining it in terms of copies (instead of read / write)
> would make it easier to understand!

Yep, I adjusted this comment in the latest version to mention the
number of copies, thanks.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2023-04-05 18:04 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-28 23:56 [PATCH v2 bpf-next 0/6] BPF verifier rotating log Andrii Nakryiko
2023-03-28 23:56 ` [PATCH v2 bpf-next 1/6] bpf: split off basic BPF verifier log into separate file Andrii Nakryiko
2023-03-30 17:12   ` Lorenz Bauer
2023-03-28 23:56 ` [PATCH v2 bpf-next 2/6] bpf: remove minimum size restrictions on verifier log buffer Andrii Nakryiko
2023-03-30 17:12   ` Lorenz Bauer
2023-03-30 20:48     ` Andrii Nakryiko
2023-03-28 23:56 ` [PATCH v2 bpf-next 3/6] bpf: switch BPF verifier log to be a rotating log by default Andrii Nakryiko
2023-03-30 17:12   ` Lorenz Bauer
2023-03-30 20:48     ` Andrii Nakryiko
2023-04-05 17:29       ` Lorenz Bauer
2023-04-05 18:03         ` Andrii Nakryiko
2023-03-28 23:56 ` [PATCH v2 bpf-next 4/6] libbpf: don't enforce verifier log levels on libbpf side Andrii Nakryiko
2023-03-30 17:12   ` Lorenz Bauer
2023-03-30 21:05     ` Andrii Nakryiko
2023-03-31  9:39       ` Timo Beckers
2023-03-31 15:59         ` Andrii Nakryiko
2023-03-28 23:56 ` [PATCH bpf-next 4/6] libbpf: don't enfore " Andrii Nakryiko
2023-03-28 23:56 ` [PATCH v2 bpf-next 5/6] selftests/bpf: add more veristat control over verifier log options Andrii Nakryiko
2023-03-28 23:56 ` [PATCH v2 bpf-next 6/6] selftests/bpf: add fixed vs rotating verifier log tests Andrii Nakryiko
2023-03-30 17:12   ` Lorenz Bauer
2023-03-30 21:07     ` Andrii Nakryiko
2023-03-28 23:59 ` [PATCH v2 bpf-next 0/6] BPF verifier rotating log Andrii Nakryiko
2023-03-30  5:38   ` Alexei Starovoitov
2023-03-30  9:31     ` Lorenz Bauer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).