All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH bpf-next 0/2] introduce bpf_strncmp() helper
@ 2021-11-06 13:28 Hou Tao
  2021-11-06 13:28 ` [RFC PATCH bpf-next 1/2] bpf: add bpf_strncmp helper Hou Tao
  2021-11-06 13:28 ` [RFC PATCH bpf-next 2/2] selftests/bpf: add benchmark bpf_strcmp Hou Tao
  0 siblings, 2 replies; 11+ messages in thread
From: Hou Tao @ 2021-11-06 13:28 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Martin KaFai Lau, Yonghong Song, Song Liu, Daniel Borkmann,
	Andrii Nakryiko, netdev, bpf, houtao1

Hi,

The motivation for introducing bpf_strncmp() helper comes from
two aspects:

(1) clang doesn't always replace strncmp() automatically
    (and don't known why)
In tracing program, sometimes we need to using a home-made
strncmp() to check whether or not the file name is expected.

(2) the performance of home-made strncmp is not so good
As shown in the benchmark of patch #2, the performance of
bpf_strncmp helper is 80% better than home-made strncmp under
x86-64, and 600% better under arm64 thanks to its arch-optimized
strncmp().

But i'm concernt about whether the API of bpf_strncmp() is OK.
Now the first argument must be a read-only null-terminated
string, it is enough for our file-name comparsion case because
the target file name is const and read-only, but may be not
usable for comparsion of two strings stored in writable-maps.

Any comments are welcome.

Regards,
Tao

Hou Tao (2):
  bpf: add bpf_strncmp helper
  selftests/bpf: add benchmark bpf_strcmp

 include/linux/bpf.h                           |   1 +
 include/uapi/linux/bpf.h                      |  11 ++
 kernel/bpf/helpers.c                          |  14 +++
 kernel/trace/bpf_trace.c                      |   2 +
 tools/include/uapi/linux/bpf.h                |  11 ++
 .../bpf/prog_tests/test_strncmp_helper.c      |  75 ++++++++++++
 .../selftests/bpf/progs/strncmp_helper.c      | 109 ++++++++++++++++++
 7 files changed, 223 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_strncmp_helper.c
 create mode 100644 tools/testing/selftests/bpf/progs/strncmp_helper.c

-- 
2.29.2


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [RFC PATCH bpf-next 1/2] bpf: add bpf_strncmp helper
  2021-11-06 13:28 [RFC PATCH bpf-next 0/2] introduce bpf_strncmp() helper Hou Tao
@ 2021-11-06 13:28 ` Hou Tao
  2021-11-06 19:26   ` Alexei Starovoitov
  2021-11-06 13:28 ` [RFC PATCH bpf-next 2/2] selftests/bpf: add benchmark bpf_strcmp Hou Tao
  1 sibling, 1 reply; 11+ messages in thread
From: Hou Tao @ 2021-11-06 13:28 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Martin KaFai Lau, Yonghong Song, Song Liu, Daniel Borkmann,
	Andrii Nakryiko, netdev, bpf, houtao1

The helper compares two strings: one string is a null-terminated
read-only string, and another one has const max storage size. And
it can be used to compare file name in tracing or LSM program.

We don't check whether or not s2 in bpf_strncmp() is null-terminated,
because its content may be changed by malicous program, and we only
ensure the memory accessed is bounded by s2_sz.

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 include/linux/bpf.h            |  1 +
 include/uapi/linux/bpf.h       | 11 +++++++++++
 kernel/bpf/helpers.c           | 14 ++++++++++++++
 kernel/trace/bpf_trace.c       |  2 ++
 tools/include/uapi/linux/bpf.h | 11 +++++++++++
 5 files changed, 39 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 2be6dfd68df9..5e8df7bda6c4 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2157,6 +2157,7 @@ extern const struct bpf_func_proto bpf_btf_find_by_name_kind_proto;
 extern const struct bpf_func_proto bpf_sk_setsockopt_proto;
 extern const struct bpf_func_proto bpf_sk_getsockopt_proto;
 extern const struct bpf_func_proto bpf_kallsyms_lookup_name_proto;
+extern const struct bpf_func_proto bpf_strncmp_proto;
 
 const struct bpf_func_proto *tracing_prog_func_proto(
   enum bpf_func_id func_id, const struct bpf_prog *prog);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index ba5af15e25f5..899274d3906e 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4938,6 +4938,16 @@ union bpf_attr {
  *		**-ENOENT** if symbol is not found.
  *
  *		**-EPERM** if caller does not have permission to obtain kernel address.
+ *
+ * long bpf_strncmp(const char *s1, const char *s2, u32 s2_sz)
+ *	Description
+ *		Do strncmp() between **s1** and **s2**. **s1** must be a
+ *		read-only string. **s2_sz** is the maximum storage size	of
+ *		**s2**.
+ *	Return
+ *		Return an integer less than, equal to, or greater than zero
+ *		if the first **s2_sz** bytes of **s2** is found to be
+ *		less than, to match, or be greater than **s1**.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5120,6 +5130,7 @@ union bpf_attr {
 	FN(trace_vprintk),		\
 	FN(skc_to_unix_sock),		\
 	FN(kallsyms_lookup_name),	\
+	FN(strncmp),			\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 1ffd469c217f..e73ebd8e2623 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -565,6 +565,20 @@ const struct bpf_func_proto bpf_strtoul_proto = {
 };
 #endif
 
+BPF_CALL_3(bpf_strncmp, const char *, s1, const char *, s2, size_t, s2_sz)
+{
+	return strncmp(s1, s2, s2_sz);
+}
+
+const struct bpf_func_proto bpf_strncmp_proto = {
+	.func		= bpf_strncmp,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CONST_STR,
+	.arg2_type	= ARG_PTR_TO_MEM,
+	.arg3_type	= ARG_CONST_SIZE,
+};
+
 BPF_CALL_4(bpf_get_ns_current_pid_tgid, u64, dev, u64, ino,
 	   struct bpf_pidns_info *, nsdata, u32, size)
 {
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 7396488793ff..37d2a52ddcb4 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1210,6 +1210,8 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_branch_snapshot_proto;
 	case BPF_FUNC_trace_vprintk:
 		return bpf_get_trace_vprintk_proto();
+	case BPF_FUNC_strncmp:
+		return &bpf_strncmp_proto;
 	default:
 		return bpf_base_func_proto(func_id);
 	}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index ba5af15e25f5..899274d3906e 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -4938,6 +4938,16 @@ union bpf_attr {
  *		**-ENOENT** if symbol is not found.
  *
  *		**-EPERM** if caller does not have permission to obtain kernel address.
+ *
+ * long bpf_strncmp(const char *s1, const char *s2, u32 s2_sz)
+ *	Description
+ *		Do strncmp() between **s1** and **s2**. **s1** must be a
+ *		read-only string. **s2_sz** is the maximum storage size	of
+ *		**s2**.
+ *	Return
+ *		Return an integer less than, equal to, or greater than zero
+ *		if the first **s2_sz** bytes of **s2** is found to be
+ *		less than, to match, or be greater than **s1**.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5120,6 +5130,7 @@ union bpf_attr {
 	FN(trace_vprintk),		\
 	FN(skc_to_unix_sock),		\
 	FN(kallsyms_lookup_name),	\
+	FN(strncmp),			\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH bpf-next 2/2] selftests/bpf: add benchmark bpf_strcmp
  2021-11-06 13:28 [RFC PATCH bpf-next 0/2] introduce bpf_strncmp() helper Hou Tao
  2021-11-06 13:28 ` [RFC PATCH bpf-next 1/2] bpf: add bpf_strncmp helper Hou Tao
@ 2021-11-06 13:28 ` Hou Tao
  2021-11-06 18:43   ` Alexei Starovoitov
  1 sibling, 1 reply; 11+ messages in thread
From: Hou Tao @ 2021-11-06 13:28 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Martin KaFai Lau, Yonghong Song, Song Liu, Daniel Borkmann,
	Andrii Nakryiko, netdev, bpf, houtao1

The benchmark runs a loop 5000 times. In the loop it reads the file name
from kprobe argument into stack by using bpf_probe_read_kernel_str(),
and compares the file name with a target character or string.

Three cases are compared: only compare one character, compare the whole
string by a home-made strncmp() and compare the whole string by
bpf_strcmp().

The following is the result:

x86-64 host:

one character: 2613499 ns
whole str by strncmp: 2920348 ns
whole str by helper: 2779332 ns

arm64 host:

one character: 3898867 ns
whole str by strncmp: 4396787 ns
whole str by helper: 3968113 ns

Compared with home-made strncmp, the performance of bpf_strncmp helper
improves 80% under x86-64 and 600% under arm64. The big performance win
on arm64 may comes from its arch-optimized strncmp().

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 .../bpf/prog_tests/test_strncmp_helper.c      |  75 ++++++++++++
 .../selftests/bpf/progs/strncmp_helper.c      | 109 ++++++++++++++++++
 2 files changed, 184 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_strncmp_helper.c
 create mode 100644 tools/testing/selftests/bpf/progs/strncmp_helper.c

diff --git a/tools/testing/selftests/bpf/prog_tests/test_strncmp_helper.c b/tools/testing/selftests/bpf/prog_tests/test_strncmp_helper.c
new file mode 100644
index 000000000000..1b48ef3402ff
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/test_strncmp_helper.c
@@ -0,0 +1,75 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2021. Huawei Technologies Co., Ltd */
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <time.h>
+#include <test_progs.h>
+
+#include "strncmp_helper.skel.h"
+
+static void run_strncmp_bench(struct strncmp_helper *skel, const char *name,
+			      int fd, int loop)
+{
+	struct timespec begin, end;
+	struct stat stat;
+	double nsecs;
+	int i;
+
+	skel->bss->equal = 0;
+	clock_gettime(CLOCK_MONOTONIC, &begin);
+	for (i = 0; i < loop; i++)
+		fstat(fd, &stat);
+	clock_gettime(CLOCK_MONOTONIC, &end);
+
+	nsecs = (end.tv_sec - begin.tv_sec) * 1e9 + (end.tv_nsec - begin.tv_nsec);
+	fprintf(stdout, "%s: loop %d nsecs %.0f\n", name, loop, nsecs);
+	fprintf(stdout, "equal nr %u\n", skel->bss->equal);
+}
+
+void test_test_strncmp_helper(void)
+{
+	const char *fpath = "/tmp/1234123412341234123412341234123412341234";
+	struct strncmp_helper *skel;
+	struct bpf_link *link;
+	int fd, loop;
+
+	skel = strncmp_helper__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "helper load"))
+		return;
+
+	fd = open(fpath, O_CREAT | O_RDONLY, 0644);
+	if (!ASSERT_GE(fd, 0, "create file"))
+		goto close_prog;
+
+	loop = 5000;
+	skel->bss->pid = getpid();
+
+	link = bpf_program__attach(skel->progs.vfs_getattr_nocmp);
+	if (!ASSERT_EQ(libbpf_get_error(link), 0, "attach nocmp"))
+		goto clean_file;
+
+	run_strncmp_bench(skel, "nocmp", fd, loop);
+	bpf_link__destroy(link);
+
+	link = bpf_program__attach(skel->progs.vfs_getattr_cmp);
+	if (!ASSERT_EQ(libbpf_get_error(link), 0, "attach cmp"))
+		goto clean_file;
+
+	run_strncmp_bench(skel, "cmp", fd, loop);
+	bpf_link__destroy(link);
+
+	link = bpf_program__attach(skel->progs.vfs_getattr_cmp_v2);
+	if (!ASSERT_EQ(libbpf_get_error(link), 0, "attach cmp_v2"))
+		goto clean_file;
+
+	run_strncmp_bench(skel, "cmp_v2", fd, loop);
+	bpf_link__destroy(link);
+
+clean_file:
+	close(fd);
+	unlink(fpath);
+close_prog:
+	strncmp_helper__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/strncmp_helper.c b/tools/testing/selftests/bpf/progs/strncmp_helper.c
new file mode 100644
index 000000000000..f212c1f50099
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/strncmp_helper.c
@@ -0,0 +1,109 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2021. Huawei Technologies Co., Ltd */
+#include <linux/types.h>
+#include <linux/ptrace.h>
+#include <linux/bpf.h>
+#include <bpf/bpf_core_read.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+#define TARGET_FILE_NAME "1234123412341234123412341234123412341234"
+
+char _license[] SEC("license") = "GPL";
+
+__u32 pid = 0;
+__u32 equal = 0;
+
+struct qstr {
+	const char *name;
+} __attribute__((preserve_access_index));
+
+struct dentry {
+	struct qstr d_name;
+} __attribute__((preserve_access_index));
+
+struct path {
+	struct dentry *dentry;
+} __attribute__((preserve_access_index));
+
+static __always_inline int read_file_name(const struct path *path, char *buf,
+					  unsigned int len)
+{
+	const char *name;
+	int err;
+
+	if ((bpf_get_current_pid_tgid() >> 32) != pid)
+		return -1;
+
+	name = BPF_CORE_READ(path, dentry, d_name.name);
+	err = bpf_probe_read_kernel_str(buf, len, name);
+	if (err <= 0)
+		return -1;
+
+	return err;
+}
+
+static __always_inline int local_strncmp(const char *s1, const char *s2,
+					 unsigned int sz)
+{
+	int ret = 0;
+	unsigned int i;
+
+	for (i = 0; i < sz; i++) {
+		ret = s1[i] - s2[i];
+		if (ret || !s1[i])
+			break;
+	}
+
+	return ret;
+}
+
+SEC("kprobe/vfs_getattr")
+int BPF_KPROBE(vfs_getattr_nocmp, const struct path *path)
+{
+	char buf[64] = {0};
+	int err;
+
+	err = read_file_name(path, buf, sizeof(buf));
+	if (err < 0)
+		return 0;
+
+	if (buf[0] == '1')
+		equal++;
+
+	return 0;
+}
+
+SEC("kprobe/vfs_getattr")
+int BPF_KPROBE(vfs_getattr_cmp, const struct path *path)
+{
+	char buf[64] = {0};
+	int err;
+
+	err = read_file_name(path, buf, sizeof(buf));
+	if (err < 0)
+		return 0;
+
+	err = local_strncmp(TARGET_FILE_NAME, buf, sizeof(buf));
+	if (!err)
+		equal++;
+
+	return 0;
+}
+
+SEC("kprobe/vfs_getattr")
+int BPF_KPROBE(vfs_getattr_cmp_v2, const struct path *path)
+{
+	char buf[64] = {0};
+	int err;
+
+	err = read_file_name(path, buf, sizeof(buf));
+	if (err < 0)
+		return 0;
+
+	err = bpf_strncmp(TARGET_FILE_NAME, buf, sizeof(buf));
+	if (!err)
+		equal++;
+
+	return 0;
+}
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH bpf-next 2/2] selftests/bpf: add benchmark bpf_strcmp
  2021-11-06 13:28 ` [RFC PATCH bpf-next 2/2] selftests/bpf: add benchmark bpf_strcmp Hou Tao
@ 2021-11-06 18:43   ` Alexei Starovoitov
  2021-11-08 14:05     ` Hou Tao
  0 siblings, 1 reply; 11+ messages in thread
From: Alexei Starovoitov @ 2021-11-06 18:43 UTC (permalink / raw)
  To: Hou Tao
  Cc: Alexei Starovoitov, Martin KaFai Lau, Yonghong Song, Song Liu,
	Daniel Borkmann, Andrii Nakryiko, netdev, bpf

On Sat, Nov 06, 2021 at 09:28:22PM +0800, Hou Tao wrote:
> The benchmark runs a loop 5000 times. In the loop it reads the file name
> from kprobe argument into stack by using bpf_probe_read_kernel_str(),
> and compares the file name with a target character or string.
> 
> Three cases are compared: only compare one character, compare the whole
> string by a home-made strncmp() and compare the whole string by
> bpf_strcmp().
> 
> The following is the result:
> 
> x86-64 host:
> 
> one character: 2613499 ns
> whole str by strncmp: 2920348 ns
> whole str by helper: 2779332 ns
> 
> arm64 host:
> 
> one character: 3898867 ns
> whole str by strncmp: 4396787 ns
> whole str by helper: 3968113 ns
> 
> Compared with home-made strncmp, the performance of bpf_strncmp helper
> improves 80% under x86-64 and 600% under arm64. The big performance win
> on arm64 may comes from its arch-optimized strncmp().

80% and 600% improvement?!
I don't understand how this math works.

Why one char is barely different in total nsec than the whole string?
The string shouldn't miscompare on the first char as far as I understand the test.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH bpf-next 1/2] bpf: add bpf_strncmp helper
  2021-11-06 13:28 ` [RFC PATCH bpf-next 1/2] bpf: add bpf_strncmp helper Hou Tao
@ 2021-11-06 19:26   ` Alexei Starovoitov
  2021-11-06 20:07     ` Andrii Nakryiko
  2021-11-08 13:45     ` Hou Tao
  0 siblings, 2 replies; 11+ messages in thread
From: Alexei Starovoitov @ 2021-11-06 19:26 UTC (permalink / raw)
  To: Hou Tao
  Cc: Alexei Starovoitov, Martin KaFai Lau, Yonghong Song, Song Liu,
	Daniel Borkmann, Andrii Nakryiko, netdev, bpf

On Sat, Nov 06, 2021 at 09:28:21PM +0800, Hou Tao wrote:
> The helper compares two strings: one string is a null-terminated
> read-only string, and another one has const max storage size. And
> it can be used to compare file name in tracing or LSM program.
> 
> We don't check whether or not s2 in bpf_strncmp() is null-terminated,
> because its content may be changed by malicous program, and we only
> ensure the memory accessed is bounded by s2_sz.

I think "malicous" adjective is unnecessary and misleading.
It's also misspelled.
Just mention that 2nd argument doesn't have to be null terminated.

> + * long bpf_strncmp(const char *s1, const char *s2, u32 s2_sz)
...
> +BPF_CALL_3(bpf_strncmp, const char *, s1, const char *, s2, size_t, s2_sz)

probably should match u32 instead of size_t.

> @@ -1210,6 +1210,8 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>  		return &bpf_get_branch_snapshot_proto;
>  	case BPF_FUNC_trace_vprintk:
>  		return bpf_get_trace_vprintk_proto();
> +	case BPF_FUNC_strncmp:
> +		return &bpf_strncmp_proto;

why tracing only?
Should probably be in bpf_base_func_proto.

I was thinking whether the proto could be:
long bpf_strncmp(const char *s1, u32 s1_sz, const char *s2)
but I think your version is better though having const string as 1st arg
is a bit odd in normal C.

Would it make sense to add bpf_memchr as well while we are at it?
And
static inline bpf_strnlen(const char *s, u32 sz)
{
  return bpf_memchr(s, sz, 0);
}
to bpf_helpers.h ?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH bpf-next 1/2] bpf: add bpf_strncmp helper
  2021-11-06 19:26   ` Alexei Starovoitov
@ 2021-11-06 20:07     ` Andrii Nakryiko
  2021-11-06 20:32       ` Alexei Starovoitov
  2021-11-08 13:45     ` Hou Tao
  1 sibling, 1 reply; 11+ messages in thread
From: Andrii Nakryiko @ 2021-11-06 20:07 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Hou Tao, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song,
	Song Liu, Daniel Borkmann, Andrii Nakryiko, Networking, bpf

On Sat, Nov 6, 2021 at 12:26 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Sat, Nov 06, 2021 at 09:28:21PM +0800, Hou Tao wrote:
> > The helper compares two strings: one string is a null-terminated
> > read-only string, and another one has const max storage size. And
> > it can be used to compare file name in tracing or LSM program.
> >
> > We don't check whether or not s2 in bpf_strncmp() is null-terminated,
> > because its content may be changed by malicous program, and we only
> > ensure the memory accessed is bounded by s2_sz.
>
> I think "malicous" adjective is unnecessary and misleading.
> It's also misspelled.
> Just mention that 2nd argument doesn't have to be null terminated.
>
> > + * long bpf_strncmp(const char *s1, const char *s2, u32 s2_sz)
> ...
> > +BPF_CALL_3(bpf_strncmp, const char *, s1, const char *, s2, size_t, s2_sz)
>
> probably should match u32 instead of size_t.
>
> > @@ -1210,6 +1210,8 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> >               return &bpf_get_branch_snapshot_proto;
> >       case BPF_FUNC_trace_vprintk:
> >               return bpf_get_trace_vprintk_proto();
> > +     case BPF_FUNC_strncmp:
> > +             return &bpf_strncmp_proto;
>
> why tracing only?
> Should probably be in bpf_base_func_proto.
>
> I was thinking whether the proto could be:
> long bpf_strncmp(const char *s1, u32 s1_sz, const char *s2)
> but I think your version is better though having const string as 1st arg
> is a bit odd in normal C.

Why do you think it's better? This is equivalent to `123 == x` if it
was integer comparison, so it feels like bpf_strncmp(s, sz, "blah") is
indeed more natural. No big deal, just curious what's better about it.

>
> Would it make sense to add bpf_memchr as well while we are at it?
> And
> static inline bpf_strnlen(const char *s, u32 sz)
> {
>   return bpf_memchr(s, sz, 0);
> }
> to bpf_helpers.h ?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH bpf-next 1/2] bpf: add bpf_strncmp helper
  2021-11-06 20:07     ` Andrii Nakryiko
@ 2021-11-06 20:32       ` Alexei Starovoitov
  2021-11-08 14:08         ` Hou Tao
  0 siblings, 1 reply; 11+ messages in thread
From: Alexei Starovoitov @ 2021-11-06 20:32 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Hou Tao, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song,
	Song Liu, Daniel Borkmann, Andrii Nakryiko, Networking, bpf

On Sat, Nov 6, 2021 at 1:07 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Sat, Nov 6, 2021 at 12:26 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Sat, Nov 06, 2021 at 09:28:21PM +0800, Hou Tao wrote:
> > > The helper compares two strings: one string is a null-terminated
> > > read-only string, and another one has const max storage size. And
> > > it can be used to compare file name in tracing or LSM program.
> > >
> > > We don't check whether or not s2 in bpf_strncmp() is null-terminated,
> > > because its content may be changed by malicous program, and we only
> > > ensure the memory accessed is bounded by s2_sz.
> >
> > I think "malicous" adjective is unnecessary and misleading.
> > It's also misspelled.
> > Just mention that 2nd argument doesn't have to be null terminated.
> >
> > > + * long bpf_strncmp(const char *s1, const char *s2, u32 s2_sz)
> > ...
> > > +BPF_CALL_3(bpf_strncmp, const char *, s1, const char *, s2, size_t, s2_sz)
> >
> > probably should match u32 instead of size_t.
> >
> > > @@ -1210,6 +1210,8 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> > >               return &bpf_get_branch_snapshot_proto;
> > >       case BPF_FUNC_trace_vprintk:
> > >               return bpf_get_trace_vprintk_proto();
> > > +     case BPF_FUNC_strncmp:
> > > +             return &bpf_strncmp_proto;
> >
> > why tracing only?
> > Should probably be in bpf_base_func_proto.
> >
> > I was thinking whether the proto could be:
> > long bpf_strncmp(const char *s1, u32 s1_sz, const char *s2)
> > but I think your version is better though having const string as 1st arg
> > is a bit odd in normal C.
>
> Why do you think it's better? This is equivalent to `123 == x` if it
> was integer comparison, so it feels like bpf_strncmp(s, sz, "blah") is
> indeed more natural. No big deal, just curious what's better about it.

Only that helper implementation has two less register moves.
which makes it 51%/49% win for me.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH bpf-next 1/2] bpf: add bpf_strncmp helper
  2021-11-06 19:26   ` Alexei Starovoitov
  2021-11-06 20:07     ` Andrii Nakryiko
@ 2021-11-08 13:45     ` Hou Tao
  1 sibling, 0 replies; 11+ messages in thread
From: Hou Tao @ 2021-11-08 13:45 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Martin KaFai Lau, Yonghong Song, Song Liu,
	Daniel Borkmann, Andrii Nakryiko, netdev, bpf

Hi,

On 11/7/2021 3:26 AM, Alexei Starovoitov wrote:
> On Sat, Nov 06, 2021 at 09:28:21PM +0800, Hou Tao wrote:
>> The helper compares two strings: one string is a null-terminated
>> read-only string, and another one has const max storage size. And
>> it can be used to compare file name in tracing or LSM program.
>>
>> We don't check whether or not s2 in bpf_strncmp() is null-terminated,
>> because its content may be changed by malicous program, and we only
>> ensure the memory accessed is bounded by s2_sz.
> I think "malicous" adjective is unnecessary and misleading.
> It's also misspelled.
> Just mention that 2nd argument doesn't have to be null terminated.
Will do.
>> + * long bpf_strncmp(const char *s1, const char *s2, u32 s2_sz)
> ...
>> +BPF_CALL_3(bpf_strncmp, const char *, s1, const char *, s2, size_t, s2_sz)
> probably should match u32 instead of size_t.
Yes, will use u32 instead. I forgot to synchronize between these two declarations.
>
>> @@ -1210,6 +1210,8 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>>  		return &bpf_get_branch_snapshot_proto;
>>  	case BPF_FUNC_trace_vprintk:
>>  		return bpf_get_trace_vprintk_proto();
>> +	case BPF_FUNC_strncmp:
>> +		return &bpf_strncmp_proto;
> why tracing only?
> Should probably be in bpf_base_func_proto.
Because in our use case, bpf_strncmp() is only used by tracing program, but moving
it to bpf_base_func_proto() incurs no harm, so will do it.
>
> I was thinking whether the proto could be:
> long bpf_strncmp(const char *s1, u32 s1_sz, const char *s2)
> but I think your version is better though having const string as 1st arg
> is a bit odd in normal C.
>
> Would it make sense to add bpf_memchr as well while we are at it?
> And
> static inline bpf_strnlen(const char *s, u32 sz)
> {
>   return bpf_memchr(s, sz, 0);
> }
> to bpf_helpers.h ?
> .
It is OK to add it, although I don't have a use case for it.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH bpf-next 2/2] selftests/bpf: add benchmark bpf_strcmp
  2021-11-06 18:43   ` Alexei Starovoitov
@ 2021-11-08 14:05     ` Hou Tao
  2021-11-08 18:00       ` Andrii Nakryiko
  0 siblings, 1 reply; 11+ messages in thread
From: Hou Tao @ 2021-11-08 14:05 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Martin KaFai Lau, Yonghong Song, Song Liu,
	Daniel Borkmann, Andrii Nakryiko, netdev, bpf

HI,

On 11/7/2021 2:43 AM, Alexei Starovoitov wrote:
> On Sat, Nov 06, 2021 at 09:28:22PM +0800, Hou Tao wrote:
>> The benchmark runs a loop 5000 times. In the loop it reads the file name
>> from kprobe argument into stack by using bpf_probe_read_kernel_str(),
>> and compares the file name with a target character or string.
>>
>> Three cases are compared: only compare one character, compare the whole
>> string by a home-made strncmp() and compare the whole string by
>> bpf_strcmp().
>>
>> The following is the result:
>>
>> x86-64 host:
>>
>> one character: 2613499 ns
>> whole str by strncmp: 2920348 ns
>> whole str by helper: 2779332 ns
>>
>> arm64 host:
>>
>> one character: 3898867 ns
>> whole str by strncmp: 4396787 ns
>> whole str by helper: 3968113 ns
>>
>> Compared with home-made strncmp, the performance of bpf_strncmp helper
>> improves 80% under x86-64 and 600% under arm64. The big performance win
>> on arm64 may comes from its arch-optimized strncmp().
> 80% and 600% improvement?!
> I don't understand how this math works.
> Why one char is barely different in total nsec than the whole string?
> The string shouldn't miscompare on the first char as far as I understand the test.
Because the result of "one character" includes the overhead of process filtering and
string read.
My bad, I should explain the tests results in more details.

Three tests are exercised:

(1) one character
Filter unexpected caller by bpf_get_current_pid_tgid()
Use bpf_probe_read_kernel_str() to read the file name into 64-bytes sized-buffer
in stack
Only compare the first character of file name

(2) whole str by strncmp
Filter unexpected caller by bpf_get_current_pid_tgid()
Use bpf_probe_read_kernel_str() to read the file name into 64-bytes sized-buffer
in stack
Compare by using home-made strncmp(): the compared two strings are the same, so
the whole string is compared

(3) whole str by helper
Filter unexpected caller by bpf_get_current_pid_tgid()
Use bpf_probe_read_kernel_str() to read the file name into 64-bytes sized-buffer
in stack
Compare by using bpf_strncmp: the compared two strings are the same, so
the whole string is compared

Now "(1) one character" is used to calculate the overhead of process filtering and
string read. So under x86-64, the overhead of strncmp() is

  total time of whole str by strncmp  test  - total time of no character test =
306849 ns.

The overhead of bpf_strncmp() is:
  total time of whole str by helper test - total time of no character test =
165833 ns

So the performance win is about (306849  / 165833 ) * 100 - 100 = ~85%

And the win under arm64 is about (497920 / 69246) * 100 - 100 = ~600%

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH bpf-next 1/2] bpf: add bpf_strncmp helper
  2021-11-06 20:32       ` Alexei Starovoitov
@ 2021-11-08 14:08         ` Hou Tao
  0 siblings, 0 replies; 11+ messages in thread
From: Hou Tao @ 2021-11-08 14:08 UTC (permalink / raw)
  To: Alexei Starovoitov, Andrii Nakryiko
  Cc: Martin KaFai Lau, Yonghong Song, Song Liu, Daniel Borkmann,
	Networking, bpf

Hi,

On 11/7/2021 4:32 AM, Alexei Starovoitov wrote:
> On Sat, Nov 6, 2021 at 1:07 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
snip
>>> I was thinking whether the proto could be:
>>> long bpf_strncmp(const char *s1, u32 s1_sz, const char *s2)
>>> but I think your version is better though having const string as 1st arg
>>> is a bit odd in normal C.
>> Why do you think it's better? This is equivalent to `123 == x` if it
>> was integer comparison, so it feels like bpf_strncmp(s, sz, "blah") is
>> indeed more natural. No big deal, just curious what's better about it.
> Only that helper implementation has two less register moves.
> which makes it 51%/49% win for me.
> .
I agree with Andrii that bpf_strncmp(s, sz, "blah") is more nature. I can run
some simple benchmarks to show whether or not the difference matters.

Regards,
Tao.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH bpf-next 2/2] selftests/bpf: add benchmark bpf_strcmp
  2021-11-08 14:05     ` Hou Tao
@ 2021-11-08 18:00       ` Andrii Nakryiko
  0 siblings, 0 replies; 11+ messages in thread
From: Andrii Nakryiko @ 2021-11-08 18:00 UTC (permalink / raw)
  To: Hou Tao
  Cc: Alexei Starovoitov, Alexei Starovoitov, Martin KaFai Lau,
	Yonghong Song, Song Liu, Daniel Borkmann, Andrii Nakryiko,
	Networking, bpf

On Mon, Nov 8, 2021 at 6:05 AM Hou Tao <houtao1@huawei.com> wrote:
>
> HI,
>
> On 11/7/2021 2:43 AM, Alexei Starovoitov wrote:
> > On Sat, Nov 06, 2021 at 09:28:22PM +0800, Hou Tao wrote:
> >> The benchmark runs a loop 5000 times. In the loop it reads the file name
> >> from kprobe argument into stack by using bpf_probe_read_kernel_str(),
> >> and compares the file name with a target character or string.
> >>
> >> Three cases are compared: only compare one character, compare the whole
> >> string by a home-made strncmp() and compare the whole string by
> >> bpf_strcmp().
> >>
> >> The following is the result:
> >>
> >> x86-64 host:
> >>
> >> one character: 2613499 ns
> >> whole str by strncmp: 2920348 ns
> >> whole str by helper: 2779332 ns
> >>
> >> arm64 host:
> >>
> >> one character: 3898867 ns
> >> whole str by strncmp: 4396787 ns
> >> whole str by helper: 3968113 ns
> >>
> >> Compared with home-made strncmp, the performance of bpf_strncmp helper
> >> improves 80% under x86-64 and 600% under arm64. The big performance win
> >> on arm64 may comes from its arch-optimized strncmp().
> > 80% and 600% improvement?!
> > I don't understand how this math works.
> > Why one char is barely different in total nsec than the whole string?
> > The string shouldn't miscompare on the first char as far as I understand the test.
> Because the result of "one character" includes the overhead of process filtering and
> string read.
> My bad, I should explain the tests results in more details.

Maybe use bench framework for your benchmark? It allows to setup the
benchmark and collect measurements in a more structured way. Check
some existing benchmarks under benchs/ in selftests/bpf directory.

To actually test just bpf_strncmp() don't add
bpf_probe_read_kernel_str() into the loop logic, set your data in
global variable and just search it. This will give you more accurate
microbenchmark data.

>
> Three tests are exercised:
>
> (1) one character
> Filter unexpected caller by bpf_get_current_pid_tgid()
> Use bpf_probe_read_kernel_str() to read the file name into 64-bytes sized-buffer
> in stack
> Only compare the first character of file name
>
> (2) whole str by strncmp
> Filter unexpected caller by bpf_get_current_pid_tgid()
> Use bpf_probe_read_kernel_str() to read the file name into 64-bytes sized-buffer
> in stack
> Compare by using home-made strncmp(): the compared two strings are the same, so
> the whole string is compared
>
> (3) whole str by helper
> Filter unexpected caller by bpf_get_current_pid_tgid()
> Use bpf_probe_read_kernel_str() to read the file name into 64-bytes sized-buffer
> in stack
> Compare by using bpf_strncmp: the compared two strings are the same, so
> the whole string is compared
>
> Now "(1) one character" is used to calculate the overhead of process filtering and
> string read. So under x86-64, the overhead of strncmp() is
>
>   total time of whole str by strncmp  test  - total time of no character test =
> 306849 ns.
>
> The overhead of bpf_strncmp() is:
>   total time of whole str by helper test - total time of no character test =
> 165833 ns
>
> So the performance win is about (306849  / 165833 ) * 100 - 100 = ~85%
>
> And the win under arm64 is about (497920 / 69246) * 100 - 100 = ~600%

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-11-08 18:00 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-06 13:28 [RFC PATCH bpf-next 0/2] introduce bpf_strncmp() helper Hou Tao
2021-11-06 13:28 ` [RFC PATCH bpf-next 1/2] bpf: add bpf_strncmp helper Hou Tao
2021-11-06 19:26   ` Alexei Starovoitov
2021-11-06 20:07     ` Andrii Nakryiko
2021-11-06 20:32       ` Alexei Starovoitov
2021-11-08 14:08         ` Hou Tao
2021-11-08 13:45     ` Hou Tao
2021-11-06 13:28 ` [RFC PATCH bpf-next 2/2] selftests/bpf: add benchmark bpf_strcmp Hou Tao
2021-11-06 18:43   ` Alexei Starovoitov
2021-11-08 14:05     ` Hou Tao
2021-11-08 18:00       ` Andrii Nakryiko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.