linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 bpf-next 00/21] bpf: Sysctl hook
@ 2019-04-05 19:35 Andrey Ignatov
  2019-04-05 19:35 ` [PATCH v3 bpf-next 01/21] bpf: Add base proto function for cgroup-bpf programs Andrey Ignatov
                   ` (23 more replies)
  0 siblings, 24 replies; 33+ messages in thread
From: Andrey Ignatov @ 2019-04-05 19:35 UTC (permalink / raw)
  To: netdev
  Cc: Andrey Ignatov, ast, daniel, guro, kernel-team, Luis Chamberlain,
	Kees Cook, Alexey Dobriyan, linux-kernel, linux-fsdevel

v2->v3:
- simplify C based selftests by relying on variable offset stack access.

v1->v2:
- add fs/proc/proc_sysctl.c mainteners to Cc:.

The patch set introduces new BPF hook for sysctl.

It adds new program type BPF_PROG_TYPE_CGROUP_SYSCTL and attach type
BPF_CGROUP_SYSCTL.

BPF_CGROUP_SYSCTL hook is placed before calling to sysctl's proc_handler so
that accesses (read/write) to sysctl can be controlled for specific cgroup
and either allowed or denied, or traced.

The hook has access to sysctl name, current sysctl value and (on write
only) to new sysctl value via corresponding helpers. New sysctl value can
be overridden by program. Both name and values (current/new) are
represented as strings same way they're visible in /proc/sys/. It is up to
program to parse these strings.

To help with parsing the most common kind of sysctl value, vector of
integers, two new helpers are provided: bpf_strtol and bpf_strtoul with
semantic similar to user space strtol(3) and strtoul(3).

The hook also provides bpf_sysctl context with two fields:
* @write indicates whether sysctl is being read (= 0) or written (= 1);
* @file_pos is sysctl file position to read from or write to, can be
  overridden.

The hook allows to make better isolation for containerized applications
that are run as root so that one container can't change a sysctl and affect
all other containers on a host, make changes to allowed sysctl in a safer
way and simplify sysctl tracing for cgroups.

Patch 1 is preliminary refactoring.
Patch 2 adds new program and attach types.
Patches 3-5 implement helpers to access sysctl name and value.
Patch 6 adds file_pos field to bpf_sysctl context.
Patch 7 updates UAPI in tools.
Patches 8-9 add support for the new hook to libbpf and corresponding test.
Patches 10-14 add selftests for the new hook.
Patch 15 adds support for new arg types to verifier: pointer to integer.
Patch 16 adds bpf_strto{l,ul} helpers to parse integers from sysctl value.
Patch 17 updates UAPI in tools.
Patch 18 updates bpf_helpers.h.
Patch 19 adds selftests for pointer to integer in verifier.
Patches 20-21 add selftests for bpf_strto{l,ul}, including integration
              C based test for sysctl value parsing.


Andrey Ignatov (21):
  bpf: Add base proto function for cgroup-bpf programs
  bpf: Sysctl hook
  bpf: Introduce bpf_sysctl_get_name helper
  bpf: Introduce bpf_sysctl_get_current_value helper
  bpf: Introduce bpf_sysctl_{get,set}_new_value helpers
  bpf: Add file_pos field to bpf_sysctl ctx
  bpf: Sync bpf.h to tools/
  libbpf: Support sysctl hook
  selftests/bpf: Test sysctl section name
  selftests/bpf: Test BPF_CGROUP_SYSCTL
  selftests/bpf: Test bpf_sysctl_get_name helper
  selftests/bpf: Test sysctl_get_current_value helper
  selftests/bpf: Test bpf_sysctl_{get,set}_new_value helpers
  selftests/bpf: Test file_pos field in bpf_sysctl ctx
  bpf: Introduce ARG_PTR_TO_{INT,LONG} arg types
  bpf: Introduce bpf_strtol and bpf_strtoul helpers
  bpf: Sync bpf.h to tools/
  selftests/bpf: Add sysctl and strtoX helpers to bpf_helpers.h
  selftests/bpf: Test ARG_PTR_TO_LONG arg type
  selftests/bpf: Test bpf_strtol and bpf_strtoul helpers
  selftests/bpf: C based test for sysctl and strtoX

 fs/proc/proc_sysctl.c                         |   25 +-
 include/linux/bpf-cgroup.h                    |   21 +
 include/linux/bpf.h                           |    4 +
 include/linux/bpf_types.h                     |    1 +
 include/linux/filter.h                        |   16 +
 include/uapi/linux/bpf.h                      |  139 +-
 kernel/bpf/cgroup.c                           |  364 +++-
 kernel/bpf/helpers.c                          |  131 ++
 kernel/bpf/syscall.c                          |    7 +
 kernel/bpf/verifier.c                         |   30 +
 tools/include/uapi/linux/bpf.h                |  139 +-
 tools/lib/bpf/libbpf.c                        |    3 +
 tools/lib/bpf/libbpf_probes.c                 |    1 +
 tools/testing/selftests/bpf/Makefile          |    3 +-
 tools/testing/selftests/bpf/bpf_helpers.h     |   19 +
 .../selftests/bpf/progs/test_sysctl_prog.c    |   70 +
 .../selftests/bpf/test_section_names.c        |    5 +
 tools/testing/selftests/bpf/test_sysctl.c     | 1567 +++++++++++++++++
 .../testing/selftests/bpf/verifier/int_ptr.c  |  160 ++
 19 files changed, 2697 insertions(+), 8 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_sysctl_prog.c
 create mode 100644 tools/testing/selftests/bpf/test_sysctl.c
 create mode 100644 tools/testing/selftests/bpf/verifier/int_ptr.c

-- 
2.17.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v3 bpf-next 01/21] bpf: Add base proto function for cgroup-bpf programs
  2019-04-05 19:35 [PATCH v3 bpf-next 00/21] bpf: Sysctl hook Andrey Ignatov
@ 2019-04-05 19:35 ` Andrey Ignatov
  2019-04-05 19:35 ` [PATCH v3 bpf-next 02/21] bpf: Sysctl hook Andrey Ignatov
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 33+ messages in thread
From: Andrey Ignatov @ 2019-04-05 19:35 UTC (permalink / raw)
  To: netdev
  Cc: Andrey Ignatov, ast, daniel, guro, kernel-team, Luis Chamberlain,
	Kees Cook, Alexey Dobriyan, linux-kernel, linux-fsdevel

Currently kernel/bpf/cgroup.c contains only one program type and one
proto function cgroup_dev_func_proto(). It'd be useful to have base
proto function that can be reused for new cgroup-bpf program types
coming soon.

Introduce cgroup_base_func_proto().

Signed-off-by: Andrey Ignatov <rdna@fb.com>
---
 kernel/bpf/cgroup.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 4e807973aa80..f6cd38746df2 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -701,7 +701,7 @@ int __cgroup_bpf_check_dev_permission(short dev_type, u32 major, u32 minor,
 EXPORT_SYMBOL(__cgroup_bpf_check_dev_permission);
 
 static const struct bpf_func_proto *
-cgroup_dev_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+cgroup_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_map_lookup_elem:
@@ -725,6 +725,12 @@ cgroup_dev_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	}
 }
 
+static const struct bpf_func_proto *
+cgroup_dev_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+	return cgroup_base_func_proto(func_id, prog);
+}
+
 static bool cgroup_dev_is_valid_access(int off, int size,
 				       enum bpf_access_type type,
 				       const struct bpf_prog *prog,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 bpf-next 02/21] bpf: Sysctl hook
  2019-04-05 19:35 [PATCH v3 bpf-next 00/21] bpf: Sysctl hook Andrey Ignatov
  2019-04-05 19:35 ` [PATCH v3 bpf-next 01/21] bpf: Add base proto function for cgroup-bpf programs Andrey Ignatov
@ 2019-04-05 19:35 ` Andrey Ignatov
  2019-04-09 16:54   ` Kees Cook
  2019-04-05 19:35 ` [PATCH v3 bpf-next 03/21] bpf: Introduce bpf_sysctl_get_name helper Andrey Ignatov
                   ` (21 subsequent siblings)
  23 siblings, 1 reply; 33+ messages in thread
From: Andrey Ignatov @ 2019-04-05 19:35 UTC (permalink / raw)
  To: netdev
  Cc: Andrey Ignatov, ast, daniel, guro, kernel-team, Luis Chamberlain,
	Kees Cook, Alexey Dobriyan, linux-kernel, linux-fsdevel

Containerized applications may run as root and it may create problems
for whole host. Specifically such applications may change a sysctl and
affect applications in other containers.

Furthermore in existing infrastructure it may not be possible to just
completely disable writing to sysctl, instead such a process should be
gradual with ability to log what sysctl are being changed by a
container, investigate, limit the set of writable sysctl to currently
used ones (so that new ones can not be changed) and eventually reduce
this set to zero.

The patch introduces new program type BPF_PROG_TYPE_CGROUP_SYSCTL and
attach type BPF_CGROUP_SYSCTL to solve these problems on cgroup basis.

New program type has access to following minimal context:
	struct bpf_sysctl {
		__u32	write;
	};

Where @write indicates whether sysctl is being read (= 0) or written (=
1).

Helpers to access sysctl name and value will be introduced separately.

BPF_CGROUP_SYSCTL attach point is added to sysctl code right before
passing control to ctl_table->proc_handler so that BPF program can
either allow or deny access to sysctl.

Suggested-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrey Ignatov <rdna@fb.com>
---
 fs/proc/proc_sysctl.c      |  5 +++
 include/linux/bpf-cgroup.h | 18 ++++++++
 include/linux/bpf_types.h  |  1 +
 include/linux/filter.h     |  8 ++++
 include/uapi/linux/bpf.h   |  9 ++++
 kernel/bpf/cgroup.c        | 92 ++++++++++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c       |  7 +++
 kernel/bpf/verifier.c      |  1 +
 8 files changed, 141 insertions(+)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 4d598a399bbf..72f4a096c146 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -13,6 +13,7 @@
 #include <linux/namei.h>
 #include <linux/mm.h>
 #include <linux/module.h>
+#include <linux/bpf-cgroup.h>
 #include "internal.h"
 
 static const struct dentry_operations proc_sys_dentry_operations;
@@ -588,6 +589,10 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
 	if (!table->proc_handler)
 		goto out;
 
+	error = BPF_CGROUP_RUN_PROG_SYSCTL(head, table, write);
+	if (error)
+		goto out;
+
 	/* careful: calling conventions are nasty here */
 	res = count;
 	error = table->proc_handler(table, write, buf, &res, ppos);
diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index a4c644c1c091..b1c45da20a26 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -17,6 +17,8 @@ struct bpf_map;
 struct bpf_prog;
 struct bpf_sock_ops_kern;
 struct bpf_cgroup_storage;
+struct ctl_table;
+struct ctl_table_header;
 
 #ifdef CONFIG_CGROUP_BPF
 
@@ -109,6 +111,10 @@ int __cgroup_bpf_run_filter_sock_ops(struct sock *sk,
 int __cgroup_bpf_check_dev_permission(short dev_type, u32 major, u32 minor,
 				      short access, enum bpf_attach_type type);
 
+int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
+				   struct ctl_table *table, int write,
+				   enum bpf_attach_type type);
+
 static inline enum bpf_cgroup_storage_type cgroup_storage_type(
 	struct bpf_map *map)
 {
@@ -253,6 +259,17 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
 									      \
 	__ret;								      \
 })
+
+
+#define BPF_CGROUP_RUN_PROG_SYSCTL(head, table, write)			       \
+({									       \
+	int __ret = 0;							       \
+	if (cgroup_bpf_enabled)						       \
+		__ret = __cgroup_bpf_run_filter_sysctl(head, table, write,     \
+						       BPF_CGROUP_SYSCTL);     \
+	__ret;								       \
+})
+
 int cgroup_bpf_prog_attach(const union bpf_attr *attr,
 			   enum bpf_prog_type ptype, struct bpf_prog *prog);
 int cgroup_bpf_prog_detach(const union bpf_attr *attr,
@@ -321,6 +338,7 @@ static inline int bpf_percpu_cgroup_storage_update(struct bpf_map *map,
 #define BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, uaddr, t_ctx) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(type,major,minor,access) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_SYSCTL(head, table, write) ({ 0; })
 
 #define for_each_cgroup_storage_type(stype) for (; false; )
 
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 08bf2f1fe553..d26991a16894 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -28,6 +28,7 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_RAW_TRACEPOINT, raw_tracepoint)
 #endif
 #ifdef CONFIG_CGROUP_BPF
 BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_DEVICE, cg_dev)
+BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SYSCTL, cg_sysctl)
 #endif
 #ifdef CONFIG_BPF_LIRC_MODE2
 BPF_PROG_TYPE(BPF_PROG_TYPE_LIRC_MODE2, lirc_mode2)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 6074aa064b54..a17732057880 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -33,6 +33,8 @@ struct bpf_prog_aux;
 struct xdp_rxq_info;
 struct xdp_buff;
 struct sock_reuseport;
+struct ctl_table;
+struct ctl_table_header;
 
 /* ArgX, context and stack frame pointer register positions. Note,
  * Arg1, Arg2, Arg3, etc are used as argument mappings of function
@@ -1177,4 +1179,10 @@ struct bpf_sock_ops_kern {
 					 */
 };
 
+struct bpf_sysctl_kern {
+	struct ctl_table_header *head;
+	struct ctl_table *table;
+	int write;
+};
+
 #endif /* __LINUX_FILTER_H__ */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 837024512baf..4cfda9c16327 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -166,6 +166,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_LIRC_MODE2,
 	BPF_PROG_TYPE_SK_REUSEPORT,
 	BPF_PROG_TYPE_FLOW_DISSECTOR,
+	BPF_PROG_TYPE_CGROUP_SYSCTL,
 };
 
 enum bpf_attach_type {
@@ -187,6 +188,7 @@ enum bpf_attach_type {
 	BPF_CGROUP_UDP6_SENDMSG,
 	BPF_LIRC_MODE2,
 	BPF_FLOW_DISSECTOR,
+	BPF_CGROUP_SYSCTL,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -3275,4 +3277,11 @@ struct bpf_line_info {
 struct bpf_spin_lock {
 	__u32	val;
 };
+
+struct bpf_sysctl {
+	__u32	write;		/* Sysctl is being read (= 0) or written (= 1).
+				 * Allows 1,2,4-byte read, but no write.
+				 */
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index f6cd38746df2..610491b5f0aa 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -11,7 +11,9 @@
 #include <linux/kernel.h>
 #include <linux/atomic.h>
 #include <linux/cgroup.h>
+#include <linux/filter.h>
 #include <linux/slab.h>
+#include <linux/sysctl.h>
 #include <linux/bpf.h>
 #include <linux/bpf-cgroup.h>
 #include <net/sock.h>
@@ -768,3 +770,93 @@ const struct bpf_verifier_ops cg_dev_verifier_ops = {
 	.get_func_proto		= cgroup_dev_func_proto,
 	.is_valid_access	= cgroup_dev_is_valid_access,
 };
+
+/**
+ * __cgroup_bpf_run_filter_sysctl - Run a program on sysctl
+ *
+ * @head: sysctl table header
+ * @table: sysctl table
+ * @write: sysctl is being read (= 0) or written (= 1)
+ * @type: type of program to be executed
+ *
+ * Program is run when sysctl is being accessed, either read or written, and
+ * can allow or deny such access.
+ *
+ * This function will return %-EPERM if an attached program is found and
+ * returned value != 1 during execution. In all other cases 0 is returned.
+ */
+int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
+				   struct ctl_table *table, int write,
+				   enum bpf_attach_type type)
+{
+	struct bpf_sysctl_kern ctx = {
+		.head = head,
+		.table = table,
+		.write = write,
+	};
+	struct cgroup *cgrp;
+	int ret;
+
+	rcu_read_lock();
+	cgrp = task_dfl_cgroup(current);
+	ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[type], &ctx, BPF_PROG_RUN);
+	rcu_read_unlock();
+
+	return ret == 1 ? 0 : -EPERM;
+}
+EXPORT_SYMBOL(__cgroup_bpf_run_filter_sysctl);
+
+static const struct bpf_func_proto *
+sysctl_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+	return cgroup_base_func_proto(func_id, prog);
+}
+
+static bool sysctl_is_valid_access(int off, int size, enum bpf_access_type type,
+				   const struct bpf_prog *prog,
+				   struct bpf_insn_access_aux *info)
+{
+	const int size_default = sizeof(__u32);
+
+	if (off < 0 || off + size > sizeof(struct bpf_sysctl) ||
+	    off % size || type != BPF_READ)
+		return false;
+
+	switch (off) {
+	case offsetof(struct bpf_sysctl, write):
+		bpf_ctx_record_field_size(info, size_default);
+		return bpf_ctx_narrow_access_ok(off, size, size_default);
+	default:
+		return false;
+	}
+}
+
+static u32 sysctl_convert_ctx_access(enum bpf_access_type type,
+				     const struct bpf_insn *si,
+				     struct bpf_insn *insn_buf,
+				     struct bpf_prog *prog, u32 *target_size)
+{
+	struct bpf_insn *insn = insn_buf;
+
+	switch (si->off) {
+	case offsetof(struct bpf_sysctl, write):
+		*insn++ = BPF_LDX_MEM(
+			BPF_SIZE(si->code), si->dst_reg, si->src_reg,
+			bpf_target_off(struct bpf_sysctl_kern, write,
+				       FIELD_SIZEOF(struct bpf_sysctl_kern,
+						    write),
+				       target_size));
+		break;
+	}
+
+	return insn - insn_buf;
+}
+
+const struct bpf_verifier_ops cg_sysctl_verifier_ops = {
+	.get_func_proto		= sysctl_func_proto,
+	.is_valid_access	= sysctl_is_valid_access,
+	.convert_ctx_access	= sysctl_convert_ctx_access,
+};
+
+const struct bpf_prog_ops cg_sysctl_prog_ops = {
+};
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 1d65e56594db..0b8cefcc1f38 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1828,6 +1828,9 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	case BPF_FLOW_DISSECTOR:
 		ptype = BPF_PROG_TYPE_FLOW_DISSECTOR;
 		break;
+	case BPF_CGROUP_SYSCTL:
+		ptype = BPF_PROG_TYPE_CGROUP_SYSCTL;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -1906,6 +1909,9 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 		return lirc_prog_detach(attr);
 	case BPF_FLOW_DISSECTOR:
 		return skb_flow_dissector_bpf_prog_detach(attr);
+	case BPF_CGROUP_SYSCTL:
+		ptype = BPF_PROG_TYPE_CGROUP_SYSCTL;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -1939,6 +1945,7 @@ static int bpf_prog_query(const union bpf_attr *attr,
 	case BPF_CGROUP_UDP6_SENDMSG:
 	case BPF_CGROUP_SOCK_OPS:
 	case BPF_CGROUP_DEVICE:
+	case BPF_CGROUP_SYSCTL:
 		break;
 	case BPF_LIRC_MODE2:
 		return lirc_prog_query(attr, uattr);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 48718e1da16d..56baf21b57c2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5219,6 +5219,7 @@ static int check_return_code(struct bpf_verifier_env *env)
 	case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
 	case BPF_PROG_TYPE_SOCK_OPS:
 	case BPF_PROG_TYPE_CGROUP_DEVICE:
+	case BPF_PROG_TYPE_CGROUP_SYSCTL:
 		break;
 	default:
 		return 0;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 bpf-next 03/21] bpf: Introduce bpf_sysctl_get_name helper
  2019-04-05 19:35 [PATCH v3 bpf-next 00/21] bpf: Sysctl hook Andrey Ignatov
  2019-04-05 19:35 ` [PATCH v3 bpf-next 01/21] bpf: Add base proto function for cgroup-bpf programs Andrey Ignatov
  2019-04-05 19:35 ` [PATCH v3 bpf-next 02/21] bpf: Sysctl hook Andrey Ignatov
@ 2019-04-05 19:35 ` Andrey Ignatov
  2019-04-05 19:35 ` [PATCH v3 bpf-next 04/21] bpf: Introduce bpf_sysctl_get_current_value helper Andrey Ignatov
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 33+ messages in thread
From: Andrey Ignatov @ 2019-04-05 19:35 UTC (permalink / raw)
  To: netdev
  Cc: Andrey Ignatov, ast, daniel, guro, kernel-team, Luis Chamberlain,
	Kees Cook, Alexey Dobriyan, linux-kernel, linux-fsdevel

Add bpf_sysctl_get_name() helper to copy sysctl name (/proc/sys/ entry)
into provided by BPF_PROG_TYPE_CGROUP_SYSCTL program buffer.

By default full name (w/o /proc/sys/) is copied, e.g. "net/ipv4/tcp_mem".

If BPF_F_SYSCTL_BASE_NAME flag is set, only base name will be copied,
e.g. "tcp_mem".

Documentation for the new helper is provided in bpf.h UAPI.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
---
 include/uapi/linux/bpf.h | 22 ++++++++++++-
 kernel/bpf/cgroup.c      | 70 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 90 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 4cfda9c16327..c9e8a1f22c14 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2479,6 +2479,22 @@ union bpf_attr {
  * 	Return
  * 		0 if iph and th are a valid SYN cookie ACK, or a negative error
  * 		otherwise.
+ *
+ * int bpf_sysctl_get_name(struct bpf_sysctl *ctx, char *buf, size_t buf_len, u64 flags)
+ *	Description
+ *		Get name of sysctl in /proc/sys/ and copy it into provided by
+ *		program buffer *buf* of size *buf_len*.
+ *
+ *		The buffer is always NUL terminated, unless it's zero-sized.
+ *
+ *		If *flags* is zero, full name (e.g. "net/ipv4/tcp_mem") is
+ *		copied. Use **BPF_F_SYSCTL_BASE_NAME** flag to copy base name
+ *		only (e.g. "tcp_mem").
+ *	Return
+ *		Number of character copied (not including the trailing NUL).
+ *
+ *		**-E2BIG** if the buffer wasn't big enough (*buf* will contain
+ *		truncated name in this case).
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2581,7 +2597,8 @@ union bpf_attr {
 	FN(skb_ecn_set_ce),		\
 	FN(get_listener_sock),		\
 	FN(skc_lookup_tcp),		\
-	FN(tcp_check_syncookie),
+	FN(tcp_check_syncookie),	\
+	FN(sysctl_get_name),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -2648,6 +2665,9 @@ enum bpf_func_id {
 #define BPF_F_ADJ_ROOM_ENCAP_L4_GRE	(1ULL << 3)
 #define BPF_F_ADJ_ROOM_ENCAP_L4_UDP	(1ULL << 4)
 
+/* BPF_FUNC_sysctl_get_name flags. */
+#define BPF_F_SYSCTL_BASE_NAME		(1ULL << 0)
+
 /* Mode for BPF_FUNC_skb_adjust_room helper. */
 enum bpf_adj_room_mode {
 	BPF_ADJ_ROOM_NET,
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 610491b5f0aa..a68387043244 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -14,6 +14,7 @@
 #include <linux/filter.h>
 #include <linux/slab.h>
 #include <linux/sysctl.h>
+#include <linux/string.h>
 #include <linux/bpf.h>
 #include <linux/bpf-cgroup.h>
 #include <net/sock.h>
@@ -806,10 +807,77 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
 }
 EXPORT_SYMBOL(__cgroup_bpf_run_filter_sysctl);
 
+static ssize_t sysctl_cpy_dir(const struct ctl_dir *dir, char **bufp,
+			      size_t *lenp)
+{
+	ssize_t tmp_ret = 0, ret;
+
+	if (dir->header.parent) {
+		tmp_ret = sysctl_cpy_dir(dir->header.parent, bufp, lenp);
+		if (tmp_ret < 0)
+			return tmp_ret;
+	}
+
+	ret = strscpy(*bufp, dir->header.ctl_table[0].procname, *lenp);
+	if (ret < 0)
+		return ret;
+	*bufp += ret;
+	*lenp -= ret;
+	ret += tmp_ret;
+
+	/* Avoid leading slash. */
+	if (!ret)
+		return ret;
+
+	tmp_ret = strscpy(*bufp, "/", *lenp);
+	if (tmp_ret < 0)
+		return tmp_ret;
+	*bufp += tmp_ret;
+	*lenp -= tmp_ret;
+
+	return ret + tmp_ret;
+}
+
+BPF_CALL_4(bpf_sysctl_get_name, struct bpf_sysctl_kern *, ctx, char *, buf,
+	   size_t, buf_len, u64, flags)
+{
+	ssize_t tmp_ret = 0, ret;
+
+	if (!buf)
+		return -EINVAL;
+
+	if (!(flags & BPF_F_SYSCTL_BASE_NAME)) {
+		if (!ctx->head)
+			return -EINVAL;
+		tmp_ret = sysctl_cpy_dir(ctx->head->parent, &buf, &buf_len);
+		if (tmp_ret < 0)
+			return tmp_ret;
+	}
+
+	ret = strscpy(buf, ctx->table->procname, buf_len);
+
+	return ret < 0 ? ret : tmp_ret + ret;
+}
+
+static const struct bpf_func_proto bpf_sysctl_get_name_proto = {
+	.func		= bpf_sysctl_get_name,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_PTR_TO_MEM,
+	.arg3_type	= ARG_CONST_SIZE,
+	.arg4_type	= ARG_ANYTHING,
+};
+
 static const struct bpf_func_proto *
 sysctl_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
-	return cgroup_base_func_proto(func_id, prog);
+	switch (func_id) {
+	case BPF_FUNC_sysctl_get_name:
+		return &bpf_sysctl_get_name_proto;
+	default:
+		return cgroup_base_func_proto(func_id, prog);
+	}
 }
 
 static bool sysctl_is_valid_access(int off, int size, enum bpf_access_type type,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 bpf-next 04/21] bpf: Introduce bpf_sysctl_get_current_value helper
  2019-04-05 19:35 [PATCH v3 bpf-next 00/21] bpf: Sysctl hook Andrey Ignatov
                   ` (2 preceding siblings ...)
  2019-04-05 19:35 ` [PATCH v3 bpf-next 03/21] bpf: Introduce bpf_sysctl_get_name helper Andrey Ignatov
@ 2019-04-05 19:35 ` Andrey Ignatov
  2019-04-05 19:35 ` [PATCH v3 bpf-next 05/21] bpf: Introduce bpf_sysctl_{get,set}_new_value helpers Andrey Ignatov
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 33+ messages in thread
From: Andrey Ignatov @ 2019-04-05 19:35 UTC (permalink / raw)
  To: netdev
  Cc: Andrey Ignatov, ast, daniel, guro, kernel-team, Luis Chamberlain,
	Kees Cook, Alexey Dobriyan, linux-kernel, linux-fsdevel

Add bpf_sysctl_get_current_value() helper to copy current sysctl value
into provided by BPF_PROG_TYPE_CGROUP_SYSCTL program buffer.

It provides same string as user space can see by reading corresponding
file in /proc/sys/, including new line, etc.

Documentation for the new helper is provided in bpf.h UAPI.

Since current value is kept in ctl_table->data in a parsed form,
ctl_table->proc_handler() with write=0 is called to read that data and
convert it to a string. Such a string can later be parsed by a program
using helpers that will be introduced separately.

Unfortunately it's not trivial to provide API to access parsed data due to
variety of data representations (string, intvec, uintvec, ulongvec,
custom structures, even NULL, etc). Instead it's assumed that user know
how to handle specific sysctl they're interested in and appropriate
helpers can be used.

Since ctl_table->proc_handler() expects __user buffer, conversion to
__user happens for kernel allocated one where the value is stored.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
---
 include/linux/filter.h   |  2 ++
 include/uapi/linux/bpf.h | 22 +++++++++++++-
 kernel/bpf/cgroup.c      | 65 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index a17732057880..f254ff92819f 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1182,6 +1182,8 @@ struct bpf_sock_ops_kern {
 struct bpf_sysctl_kern {
 	struct ctl_table_header *head;
 	struct ctl_table *table;
+	void *cur_val;
+	size_t cur_len;
 	int write;
 };
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c9e8a1f22c14..481e66cce8dc 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2495,6 +2495,25 @@ union bpf_attr {
  *
  *		**-E2BIG** if the buffer wasn't big enough (*buf* will contain
  *		truncated name in this case).
+ *
+ * int bpf_sysctl_get_current_value(struct bpf_sysctl *ctx, char *buf, size_t buf_len)
+ *	Description
+ *		Get current value of sysctl as it is presented in /proc/sys
+ *		(incl. newline, etc), and copy it as a string into provided
+ *		by program buffer *buf* of size *buf_len*.
+ *
+ *		The whole value is copied, no matter what file position user
+ *		space issued e.g. sys_read at.
+ *
+ *		The buffer is always NUL terminated, unless it's zero-sized.
+ *	Return
+ *		Number of character copied (not including the trailing NUL).
+ *
+ *		**-E2BIG** if the buffer wasn't big enough (*buf* will contain
+ *		truncated name in this case).
+ *
+ *		**-EINVAL** if current value was unavailable, e.g. because
+ *		sysctl is uninitialized and read returns -EIO for it.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2598,7 +2617,8 @@ union bpf_attr {
 	FN(get_listener_sock),		\
 	FN(skc_lookup_tcp),		\
 	FN(tcp_check_syncookie),	\
-	FN(sysctl_get_name),
+	FN(sysctl_get_name),		\
+	FN(sysctl_get_current_value),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index a68387043244..c6b2cf29a54b 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -794,15 +794,37 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
 		.head = head,
 		.table = table,
 		.write = write,
+		.cur_val = NULL,
+		.cur_len = PAGE_SIZE,
 	};
 	struct cgroup *cgrp;
 	int ret;
 
+	ctx.cur_val = kmalloc_track_caller(ctx.cur_len, GFP_KERNEL);
+	if (ctx.cur_val) {
+		mm_segment_t old_fs;
+		loff_t pos = 0;
+
+		old_fs = get_fs();
+		set_fs(KERNEL_DS);
+		if (table->proc_handler(table, 0, (void __user *)ctx.cur_val,
+					&ctx.cur_len, &pos)) {
+			/* Let BPF program decide how to proceed. */
+			ctx.cur_len = 0;
+		}
+		set_fs(old_fs);
+	} else {
+		/* Let BPF program decide how to proceed. */
+		ctx.cur_len = 0;
+	}
+
 	rcu_read_lock();
 	cgrp = task_dfl_cgroup(current);
 	ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[type], &ctx, BPF_PROG_RUN);
 	rcu_read_unlock();
 
+	kfree(ctx.cur_val);
+
 	return ret == 1 ? 0 : -EPERM;
 }
 EXPORT_SYMBOL(__cgroup_bpf_run_filter_sysctl);
@@ -869,12 +891,55 @@ static const struct bpf_func_proto bpf_sysctl_get_name_proto = {
 	.arg4_type	= ARG_ANYTHING,
 };
 
+static int copy_sysctl_value(char *dst, size_t dst_len, char *src,
+			     size_t src_len)
+{
+	if (!dst)
+		return -EINVAL;
+
+	if (!dst_len)
+		return -E2BIG;
+
+	if (!src || !src_len) {
+		memset(dst, 0, dst_len);
+		return -EINVAL;
+	}
+
+	memcpy(dst, src, min(dst_len, src_len));
+
+	if (dst_len > src_len) {
+		memset(dst + src_len, '\0', dst_len - src_len);
+		return src_len;
+	}
+
+	dst[dst_len - 1] = '\0';
+
+	return -E2BIG;
+}
+
+BPF_CALL_3(bpf_sysctl_get_current_value, struct bpf_sysctl_kern *, ctx,
+	   char *, buf, size_t, buf_len)
+{
+	return copy_sysctl_value(buf, buf_len, ctx->cur_val, ctx->cur_len);
+}
+
+static const struct bpf_func_proto bpf_sysctl_get_current_value_proto = {
+	.func		= bpf_sysctl_get_current_value,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type	= ARG_CONST_SIZE,
+};
+
 static const struct bpf_func_proto *
 sysctl_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_sysctl_get_name:
 		return &bpf_sysctl_get_name_proto;
+	case BPF_FUNC_sysctl_get_current_value:
+		return &bpf_sysctl_get_current_value_proto;
 	default:
 		return cgroup_base_func_proto(func_id, prog);
 	}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 bpf-next 05/21] bpf: Introduce bpf_sysctl_{get,set}_new_value helpers
  2019-04-05 19:35 [PATCH v3 bpf-next 00/21] bpf: Sysctl hook Andrey Ignatov
                   ` (3 preceding siblings ...)
  2019-04-05 19:35 ` [PATCH v3 bpf-next 04/21] bpf: Introduce bpf_sysctl_get_current_value helper Andrey Ignatov
@ 2019-04-05 19:35 ` Andrey Ignatov
  2019-04-05 19:35 ` [PATCH v3 bpf-next 06/21] bpf: Add file_pos field to bpf_sysctl ctx Andrey Ignatov
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 33+ messages in thread
From: Andrey Ignatov @ 2019-04-05 19:35 UTC (permalink / raw)
  To: netdev
  Cc: Andrey Ignatov, ast, daniel, guro, kernel-team, Luis Chamberlain,
	Kees Cook, Alexey Dobriyan, linux-kernel, linux-fsdevel

Add helpers to work with new value being written to sysctl by user
space.

bpf_sysctl_get_new_value() copies value being written to sysctl into
provided buffer.

bpf_sysctl_set_new_value() overrides new value being written by user
space with a one from provided buffer. Buffer should contain string
representation of the value, similar to what can be seen in /proc/sys/.

Both helpers can be used only on sysctl write.

File position matters and can be managed by an interface that will be
introduced separately. E.g. if user space calls sys_write to a file in
/proc/sys/ at file position = X, where X > 0, then the value set by
bpf_sysctl_set_new_value() will be written starting from X. If program
wants to override whole value with specified buffer, file position has
to be set to zero.

Documentation for the new helpers is provided in bpf.h UAPI.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
---
 fs/proc/proc_sysctl.c      | 22 ++++++++---
 include/linux/bpf-cgroup.h |  8 ++--
 include/linux/filter.h     |  3 ++
 include/uapi/linux/bpf.h   | 38 +++++++++++++++++-
 kernel/bpf/cgroup.c        | 81 +++++++++++++++++++++++++++++++++++++-
 5 files changed, 142 insertions(+), 10 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 72f4a096c146..4d1ab22774f7 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -570,8 +570,8 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
 	struct inode *inode = file_inode(filp);
 	struct ctl_table_header *head = grab_header(inode);
 	struct ctl_table *table = PROC_I(inode)->sysctl_entry;
+	void *new_buf = NULL;
 	ssize_t error;
-	size_t res;
 
 	if (IS_ERR(head))
 		return PTR_ERR(head);
@@ -589,15 +589,27 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
 	if (!table->proc_handler)
 		goto out;
 
-	error = BPF_CGROUP_RUN_PROG_SYSCTL(head, table, write);
+	error = BPF_CGROUP_RUN_PROG_SYSCTL(head, table, write, buf, &count,
+					   &new_buf);
 	if (error)
 		goto out;
 
 	/* careful: calling conventions are nasty here */
-	res = count;
-	error = table->proc_handler(table, write, buf, &res, ppos);
+	if (new_buf) {
+		mm_segment_t old_fs;
+
+		old_fs = get_fs();
+		set_fs(KERNEL_DS);
+		error = table->proc_handler(table, write, (void __user *)new_buf,
+					    &count, ppos);
+		set_fs(old_fs);
+		kfree(new_buf);
+	} else {
+		error = table->proc_handler(table, write, buf, &count, ppos);
+	}
+
 	if (!error)
-		error = res;
+		error = count;
 out:
 	sysctl_head_finish(head);
 
diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index b1c45da20a26..1e97271f9a10 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -113,7 +113,8 @@ int __cgroup_bpf_check_dev_permission(short dev_type, u32 major, u32 minor,
 
 int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
 				   struct ctl_table *table, int write,
-				   enum bpf_attach_type type);
+				   void __user *buf, size_t *pcount,
+				   void **new_buf, enum bpf_attach_type type);
 
 static inline enum bpf_cgroup_storage_type cgroup_storage_type(
 	struct bpf_map *map)
@@ -261,11 +262,12 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
 })
 
 
-#define BPF_CGROUP_RUN_PROG_SYSCTL(head, table, write)			       \
+#define BPF_CGROUP_RUN_PROG_SYSCTL(head, table, write, buf, count, nbuf)       \
 ({									       \
 	int __ret = 0;							       \
 	if (cgroup_bpf_enabled)						       \
 		__ret = __cgroup_bpf_run_filter_sysctl(head, table, write,     \
+						       buf, count, nbuf,       \
 						       BPF_CGROUP_SYSCTL);     \
 	__ret;								       \
 })
@@ -338,7 +340,7 @@ static inline int bpf_percpu_cgroup_storage_update(struct bpf_map *map,
 #define BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, uaddr, t_ctx) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(type,major,minor,access) ({ 0; })
-#define BPF_CGROUP_RUN_PROG_SYSCTL(head, table, write) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_SYSCTL(head,table,write,buf,count,nbuf) ({ 0; })
 
 #define for_each_cgroup_storage_type(stype) for (; false; )
 
diff --git a/include/linux/filter.h b/include/linux/filter.h
index f254ff92819f..a23653f9460c 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1184,6 +1184,9 @@ struct bpf_sysctl_kern {
 	struct ctl_table *table;
 	void *cur_val;
 	size_t cur_len;
+	void *new_val;
+	size_t new_len;
+	int new_updated;
 	int write;
 };
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 481e66cce8dc..fed5b605449a 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2514,6 +2514,40 @@ union bpf_attr {
  *
  *		**-EINVAL** if current value was unavailable, e.g. because
  *		sysctl is uninitialized and read returns -EIO for it.
+ *
+ * int bpf_sysctl_get_new_value(struct bpf_sysctl *ctx, char *buf, size_t buf_len)
+ *	Description
+ *		Get new value being written by user space to sysctl (before
+ *		the actual write happens) and copy it as a string into
+ *		provided by program buffer *buf* of size *buf_len*.
+ *
+ *		User space may write new value at file position > 0.
+ *
+ *		The buffer is always NUL terminated, unless it's zero-sized.
+ *	Return
+ *		Number of character copied (not including the trailing NUL).
+ *
+ *		**-E2BIG** if the buffer wasn't big enough (*buf* will contain
+ *		truncated name in this case).
+ *
+ *		**-EINVAL** if sysctl is being read.
+ *
+ * int bpf_sysctl_set_new_value(struct bpf_sysctl *ctx, const char *buf, size_t buf_len)
+ *	Description
+ *		Override new value being written by user space to sysctl with
+ *		value provided by program in buffer *buf* of size *buf_len*.
+ *
+ *		*buf* should contain a string in same form as provided by user
+ *		space on sysctl write.
+ *
+ *		User space may write new value at file position > 0. To override
+ *		the whole sysctl value file position should be set to zero.
+ *	Return
+ *		0 on success.
+ *
+ *		**-E2BIG** if the *buf_len* is too big.
+ *
+ *		**-EINVAL** if sysctl is being read.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2618,7 +2652,9 @@ union bpf_attr {
 	FN(skc_lookup_tcp),		\
 	FN(tcp_check_syncookie),	\
 	FN(sysctl_get_name),		\
-	FN(sysctl_get_current_value),
+	FN(sysctl_get_current_value),	\
+	FN(sysctl_get_new_value),	\
+	FN(sysctl_set_new_value),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index c6b2cf29a54b..ba4e21986760 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -778,6 +778,13 @@ const struct bpf_verifier_ops cg_dev_verifier_ops = {
  * @head: sysctl table header
  * @table: sysctl table
  * @write: sysctl is being read (= 0) or written (= 1)
+ * @buf: pointer to buffer passed by user space
+ * @pcount: value-result argument: value is size of buffer pointed to by @buf,
+ *	result is size of @new_buf if program set new value, initial value
+ *	otherwise
+ * @new_buf: pointer to pointer to new buffer that will be allocated if program
+ *	overrides new value provided by user space on sysctl write
+ *	NOTE: it's caller responsibility to free *new_buf if it was set
  * @type: type of program to be executed
  *
  * Program is run when sysctl is being accessed, either read or written, and
@@ -788,7 +795,8 @@ const struct bpf_verifier_ops cg_dev_verifier_ops = {
  */
 int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
 				   struct ctl_table *table, int write,
-				   enum bpf_attach_type type)
+				   void __user *buf, size_t *pcount,
+				   void **new_buf, enum bpf_attach_type type)
 {
 	struct bpf_sysctl_kern ctx = {
 		.head = head,
@@ -796,6 +804,9 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
 		.write = write,
 		.cur_val = NULL,
 		.cur_len = PAGE_SIZE,
+		.new_val = NULL,
+		.new_len = 0,
+		.new_updated = 0,
 	};
 	struct cgroup *cgrp;
 	int ret;
@@ -818,6 +829,18 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
 		ctx.cur_len = 0;
 	}
 
+	if (write && buf && *pcount) {
+		/* BPF program should be able to override new value with a
+		 * buffer bigger than provided by user.
+		 */
+		ctx.new_val = kmalloc_track_caller(PAGE_SIZE, GFP_KERNEL);
+		ctx.new_len = min(PAGE_SIZE, *pcount);
+		if (!ctx.new_val ||
+		    copy_from_user(ctx.new_val, buf, ctx.new_len))
+			/* Let BPF program decide how to proceed. */
+			ctx.new_len = 0;
+	}
+
 	rcu_read_lock();
 	cgrp = task_dfl_cgroup(current);
 	ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[type], &ctx, BPF_PROG_RUN);
@@ -825,6 +848,13 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
 
 	kfree(ctx.cur_val);
 
+	if (ret == 1 && ctx.new_updated) {
+		*new_buf = ctx.new_val;
+		*pcount = ctx.new_len;
+	} else {
+		kfree(ctx.new_val);
+	}
+
 	return ret == 1 ? 0 : -EPERM;
 }
 EXPORT_SYMBOL(__cgroup_bpf_run_filter_sysctl);
@@ -932,6 +962,51 @@ static const struct bpf_func_proto bpf_sysctl_get_current_value_proto = {
 	.arg3_type	= ARG_CONST_SIZE,
 };
 
+BPF_CALL_3(bpf_sysctl_get_new_value, struct bpf_sysctl_kern *, ctx, char *, buf,
+	   size_t, buf_len)
+{
+	if (!ctx->write) {
+		if (buf && buf_len)
+			memset(buf, '\0', buf_len);
+		return -EINVAL;
+	}
+	return copy_sysctl_value(buf, buf_len, ctx->new_val, ctx->new_len);
+}
+
+static const struct bpf_func_proto bpf_sysctl_get_new_value_proto = {
+	.func		= bpf_sysctl_get_new_value,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type	= ARG_CONST_SIZE,
+};
+
+BPF_CALL_3(bpf_sysctl_set_new_value, struct bpf_sysctl_kern *, ctx,
+	   const char *, buf, size_t, buf_len)
+{
+	if (!ctx->write || !ctx->new_val || !ctx->new_len || !buf || !buf_len)
+		return -EINVAL;
+
+	if (buf_len > PAGE_SIZE - 1)
+		return -E2BIG;
+
+	memcpy(ctx->new_val, buf, buf_len);
+	ctx->new_len = buf_len;
+	ctx->new_updated = 1;
+
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_sysctl_set_new_value_proto = {
+	.func		= bpf_sysctl_set_new_value,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_PTR_TO_MEM,
+	.arg3_type	= ARG_CONST_SIZE,
+};
+
 static const struct bpf_func_proto *
 sysctl_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -940,6 +1015,10 @@ sysctl_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_sysctl_get_name_proto;
 	case BPF_FUNC_sysctl_get_current_value:
 		return &bpf_sysctl_get_current_value_proto;
+	case BPF_FUNC_sysctl_get_new_value:
+		return &bpf_sysctl_get_new_value_proto;
+	case BPF_FUNC_sysctl_set_new_value:
+		return &bpf_sysctl_set_new_value_proto;
 	default:
 		return cgroup_base_func_proto(func_id, prog);
 	}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 bpf-next 06/21] bpf: Add file_pos field to bpf_sysctl ctx
  2019-04-05 19:35 [PATCH v3 bpf-next 00/21] bpf: Sysctl hook Andrey Ignatov
                   ` (4 preceding siblings ...)
  2019-04-05 19:35 ` [PATCH v3 bpf-next 05/21] bpf: Introduce bpf_sysctl_{get,set}_new_value helpers Andrey Ignatov
@ 2019-04-05 19:35 ` Andrey Ignatov
  2019-04-05 19:35 ` [PATCH v3 bpf-next 07/21] bpf: Sync bpf.h to tools/ Andrey Ignatov
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 33+ messages in thread
From: Andrey Ignatov @ 2019-04-05 19:35 UTC (permalink / raw)
  To: netdev
  Cc: Andrey Ignatov, ast, daniel, guro, kernel-team, Luis Chamberlain,
	Kees Cook, Alexey Dobriyan, linux-kernel, linux-fsdevel

Add file_pos field to bpf_sysctl context to read and write sysctl file
position at which sysctl is being accessed (read or written).

The field can be used to e.g. override whole sysctl value on write to
sysctl even when sys_write is called by user space with file_pos > 0. Or
BPF program may reject such accesses.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
---
 fs/proc/proc_sysctl.c      |  2 +-
 include/linux/bpf-cgroup.h |  9 ++++---
 include/linux/filter.h     |  3 +++
 include/uapi/linux/bpf.h   |  3 +++
 kernel/bpf/cgroup.c        | 54 +++++++++++++++++++++++++++++++++++---
 5 files changed, 63 insertions(+), 8 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 4d1ab22774f7..8371bbca1105 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -590,7 +590,7 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
 		goto out;
 
 	error = BPF_CGROUP_RUN_PROG_SYSCTL(head, table, write, buf, &count,
-					   &new_buf);
+					   ppos, &new_buf);
 	if (error)
 		goto out;
 
diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 1e97271f9a10..cb3c6b3b89c8 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -114,7 +114,8 @@ int __cgroup_bpf_check_dev_permission(short dev_type, u32 major, u32 minor,
 int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
 				   struct ctl_table *table, int write,
 				   void __user *buf, size_t *pcount,
-				   void **new_buf, enum bpf_attach_type type);
+				   loff_t *ppos, void **new_buf,
+				   enum bpf_attach_type type);
 
 static inline enum bpf_cgroup_storage_type cgroup_storage_type(
 	struct bpf_map *map)
@@ -262,12 +263,12 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
 })
 
 
-#define BPF_CGROUP_RUN_PROG_SYSCTL(head, table, write, buf, count, nbuf)       \
+#define BPF_CGROUP_RUN_PROG_SYSCTL(head, table, write, buf, count, pos, nbuf)  \
 ({									       \
 	int __ret = 0;							       \
 	if (cgroup_bpf_enabled)						       \
 		__ret = __cgroup_bpf_run_filter_sysctl(head, table, write,     \
-						       buf, count, nbuf,       \
+						       buf, count, pos, nbuf,  \
 						       BPF_CGROUP_SYSCTL);     \
 	__ret;								       \
 })
@@ -340,7 +341,7 @@ static inline int bpf_percpu_cgroup_storage_update(struct bpf_map *map,
 #define BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, uaddr, t_ctx) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(type,major,minor,access) ({ 0; })
-#define BPF_CGROUP_RUN_PROG_SYSCTL(head,table,write,buf,count,nbuf) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_SYSCTL(head,table,write,buf,count,pos,nbuf) ({ 0; })
 
 #define for_each_cgroup_storage_type(stype) for (; false; )
 
diff --git a/include/linux/filter.h b/include/linux/filter.h
index a23653f9460c..fb0edad75971 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1188,6 +1188,9 @@ struct bpf_sysctl_kern {
 	size_t new_len;
 	int new_updated;
 	int write;
+	loff_t *ppos;
+	/* Temporary "register" for indirect stores to ppos. */
+	u64 tmp_reg;
 };
 
 #endif /* __LINUX_FILTER_H__ */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index fed5b605449a..01faa286bb3c 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3358,6 +3358,9 @@ struct bpf_sysctl {
 	__u32	write;		/* Sysctl is being read (= 0) or written (= 1).
 				 * Allows 1,2,4-byte read, but no write.
 				 */
+	__u32	file_pos;	/* Sysctl file position to read from, write to.
+				 * Allows 1,2,4-byte read an 4-byte write.
+				 */
 };
 
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index ba4e21986760..b2adf22139b3 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -782,6 +782,9 @@ const struct bpf_verifier_ops cg_dev_verifier_ops = {
  * @pcount: value-result argument: value is size of buffer pointed to by @buf,
  *	result is size of @new_buf if program set new value, initial value
  *	otherwise
+ * @ppos: value-result argument: value is position at which read from or write
+ *	to sysctl is happening, result is new position if program overrode it,
+ *	initial value otherwise
  * @new_buf: pointer to pointer to new buffer that will be allocated if program
  *	overrides new value provided by user space on sysctl write
  *	NOTE: it's caller responsibility to free *new_buf if it was set
@@ -796,12 +799,14 @@ const struct bpf_verifier_ops cg_dev_verifier_ops = {
 int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
 				   struct ctl_table *table, int write,
 				   void __user *buf, size_t *pcount,
-				   void **new_buf, enum bpf_attach_type type)
+				   loff_t *ppos, void **new_buf,
+				   enum bpf_attach_type type)
 {
 	struct bpf_sysctl_kern ctx = {
 		.head = head,
 		.table = table,
 		.write = write,
+		.ppos = ppos,
 		.cur_val = NULL,
 		.cur_len = PAGE_SIZE,
 		.new_val = NULL,
@@ -1030,14 +1035,22 @@ static bool sysctl_is_valid_access(int off, int size, enum bpf_access_type type,
 {
 	const int size_default = sizeof(__u32);
 
-	if (off < 0 || off + size > sizeof(struct bpf_sysctl) ||
-	    off % size || type != BPF_READ)
+	if (off < 0 || off + size > sizeof(struct bpf_sysctl) || off % size)
 		return false;
 
 	switch (off) {
 	case offsetof(struct bpf_sysctl, write):
+		if (type != BPF_READ)
+			return false;
 		bpf_ctx_record_field_size(info, size_default);
 		return bpf_ctx_narrow_access_ok(off, size, size_default);
+	case offsetof(struct bpf_sysctl, file_pos):
+		if (type == BPF_READ) {
+			bpf_ctx_record_field_size(info, size_default);
+			return bpf_ctx_narrow_access_ok(off, size, size_default);
+		} else {
+			return size == size_default;
+		}
 	default:
 		return false;
 	}
@@ -1059,6 +1072,41 @@ static u32 sysctl_convert_ctx_access(enum bpf_access_type type,
 						    write),
 				       target_size));
 		break;
+	case offsetof(struct bpf_sysctl, file_pos):
+		/* ppos is a pointer so it should be accessed via indirect
+		 * loads and stores. Also for stores additional temporary
+		 * register is used since neither src_reg nor dst_reg can be
+		 * overridden.
+		 */
+		if (type == BPF_WRITE) {
+			int treg = BPF_REG_9;
+
+			if (si->src_reg == treg || si->dst_reg == treg)
+				--treg;
+			if (si->src_reg == treg || si->dst_reg == treg)
+				--treg;
+			*insn++ = BPF_STX_MEM(
+				BPF_DW, si->dst_reg, treg,
+				offsetof(struct bpf_sysctl_kern, tmp_reg));
+			*insn++ = BPF_LDX_MEM(
+				BPF_FIELD_SIZEOF(struct bpf_sysctl_kern, ppos),
+				treg, si->dst_reg,
+				offsetof(struct bpf_sysctl_kern, ppos));
+			*insn++ = BPF_STX_MEM(
+				BPF_SIZEOF(u32), treg, si->src_reg, 0);
+			*insn++ = BPF_LDX_MEM(
+				BPF_DW, treg, si->dst_reg,
+				offsetof(struct bpf_sysctl_kern, tmp_reg));
+		} else {
+			*insn++ = BPF_LDX_MEM(
+				BPF_FIELD_SIZEOF(struct bpf_sysctl_kern, ppos),
+				si->dst_reg, si->src_reg,
+				offsetof(struct bpf_sysctl_kern, ppos));
+			*insn++ = BPF_LDX_MEM(
+				BPF_SIZE(si->code), si->dst_reg, si->dst_reg, 0);
+		}
+		*target_size = sizeof(u32);
+		break;
 	}
 
 	return insn - insn_buf;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 bpf-next 07/21] bpf: Sync bpf.h to tools/
  2019-04-05 19:35 [PATCH v3 bpf-next 00/21] bpf: Sysctl hook Andrey Ignatov
                   ` (5 preceding siblings ...)
  2019-04-05 19:35 ` [PATCH v3 bpf-next 06/21] bpf: Add file_pos field to bpf_sysctl ctx Andrey Ignatov
@ 2019-04-05 19:35 ` Andrey Ignatov
  2019-04-05 19:35 ` [PATCH v3 bpf-next 08/21] libbpf: Support sysctl hook Andrey Ignatov
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 33+ messages in thread
From: Andrey Ignatov @ 2019-04-05 19:35 UTC (permalink / raw)
  To: netdev
  Cc: Andrey Ignatov, ast, daniel, guro, kernel-team, Luis Chamberlain,
	Kees Cook, Alexey Dobriyan, linux-kernel, linux-fsdevel

Sync BPF_PROG_TYPE_CGROUP_SYSCTL related bpf UAPI changes to tools/.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
---
 tools/include/uapi/linux/bpf.h | 90 +++++++++++++++++++++++++++++++++-
 1 file changed, 89 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 837024512baf..01faa286bb3c 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -166,6 +166,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_LIRC_MODE2,
 	BPF_PROG_TYPE_SK_REUSEPORT,
 	BPF_PROG_TYPE_FLOW_DISSECTOR,
+	BPF_PROG_TYPE_CGROUP_SYSCTL,
 };
 
 enum bpf_attach_type {
@@ -187,6 +188,7 @@ enum bpf_attach_type {
 	BPF_CGROUP_UDP6_SENDMSG,
 	BPF_LIRC_MODE2,
 	BPF_FLOW_DISSECTOR,
+	BPF_CGROUP_SYSCTL,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -2477,6 +2479,75 @@ union bpf_attr {
  * 	Return
  * 		0 if iph and th are a valid SYN cookie ACK, or a negative error
  * 		otherwise.
+ *
+ * int bpf_sysctl_get_name(struct bpf_sysctl *ctx, char *buf, size_t buf_len, u64 flags)
+ *	Description
+ *		Get name of sysctl in /proc/sys/ and copy it into provided by
+ *		program buffer *buf* of size *buf_len*.
+ *
+ *		The buffer is always NUL terminated, unless it's zero-sized.
+ *
+ *		If *flags* is zero, full name (e.g. "net/ipv4/tcp_mem") is
+ *		copied. Use **BPF_F_SYSCTL_BASE_NAME** flag to copy base name
+ *		only (e.g. "tcp_mem").
+ *	Return
+ *		Number of character copied (not including the trailing NUL).
+ *
+ *		**-E2BIG** if the buffer wasn't big enough (*buf* will contain
+ *		truncated name in this case).
+ *
+ * int bpf_sysctl_get_current_value(struct bpf_sysctl *ctx, char *buf, size_t buf_len)
+ *	Description
+ *		Get current value of sysctl as it is presented in /proc/sys
+ *		(incl. newline, etc), and copy it as a string into provided
+ *		by program buffer *buf* of size *buf_len*.
+ *
+ *		The whole value is copied, no matter what file position user
+ *		space issued e.g. sys_read at.
+ *
+ *		The buffer is always NUL terminated, unless it's zero-sized.
+ *	Return
+ *		Number of character copied (not including the trailing NUL).
+ *
+ *		**-E2BIG** if the buffer wasn't big enough (*buf* will contain
+ *		truncated name in this case).
+ *
+ *		**-EINVAL** if current value was unavailable, e.g. because
+ *		sysctl is uninitialized and read returns -EIO for it.
+ *
+ * int bpf_sysctl_get_new_value(struct bpf_sysctl *ctx, char *buf, size_t buf_len)
+ *	Description
+ *		Get new value being written by user space to sysctl (before
+ *		the actual write happens) and copy it as a string into
+ *		provided by program buffer *buf* of size *buf_len*.
+ *
+ *		User space may write new value at file position > 0.
+ *
+ *		The buffer is always NUL terminated, unless it's zero-sized.
+ *	Return
+ *		Number of character copied (not including the trailing NUL).
+ *
+ *		**-E2BIG** if the buffer wasn't big enough (*buf* will contain
+ *		truncated name in this case).
+ *
+ *		**-EINVAL** if sysctl is being read.
+ *
+ * int bpf_sysctl_set_new_value(struct bpf_sysctl *ctx, const char *buf, size_t buf_len)
+ *	Description
+ *		Override new value being written by user space to sysctl with
+ *		value provided by program in buffer *buf* of size *buf_len*.
+ *
+ *		*buf* should contain a string in same form as provided by user
+ *		space on sysctl write.
+ *
+ *		User space may write new value at file position > 0. To override
+ *		the whole sysctl value file position should be set to zero.
+ *	Return
+ *		0 on success.
+ *
+ *		**-E2BIG** if the *buf_len* is too big.
+ *
+ *		**-EINVAL** if sysctl is being read.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2579,7 +2650,11 @@ union bpf_attr {
 	FN(skb_ecn_set_ce),		\
 	FN(get_listener_sock),		\
 	FN(skc_lookup_tcp),		\
-	FN(tcp_check_syncookie),
+	FN(tcp_check_syncookie),	\
+	FN(sysctl_get_name),		\
+	FN(sysctl_get_current_value),	\
+	FN(sysctl_get_new_value),	\
+	FN(sysctl_set_new_value),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -2646,6 +2721,9 @@ enum bpf_func_id {
 #define BPF_F_ADJ_ROOM_ENCAP_L4_GRE	(1ULL << 3)
 #define BPF_F_ADJ_ROOM_ENCAP_L4_UDP	(1ULL << 4)
 
+/* BPF_FUNC_sysctl_get_name flags. */
+#define BPF_F_SYSCTL_BASE_NAME		(1ULL << 0)
+
 /* Mode for BPF_FUNC_skb_adjust_room helper. */
 enum bpf_adj_room_mode {
 	BPF_ADJ_ROOM_NET,
@@ -3275,4 +3353,14 @@ struct bpf_line_info {
 struct bpf_spin_lock {
 	__u32	val;
 };
+
+struct bpf_sysctl {
+	__u32	write;		/* Sysctl is being read (= 0) or written (= 1).
+				 * Allows 1,2,4-byte read, but no write.
+				 */
+	__u32	file_pos;	/* Sysctl file position to read from, write to.
+				 * Allows 1,2,4-byte read an 4-byte write.
+				 */
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 bpf-next 08/21] libbpf: Support sysctl hook
  2019-04-05 19:35 [PATCH v3 bpf-next 00/21] bpf: Sysctl hook Andrey Ignatov
                   ` (6 preceding siblings ...)
  2019-04-05 19:35 ` [PATCH v3 bpf-next 07/21] bpf: Sync bpf.h to tools/ Andrey Ignatov
@ 2019-04-05 19:35 ` Andrey Ignatov
  2019-04-05 19:35 ` [PATCH v3 bpf-next 09/21] selftests/bpf: Test sysctl section name Andrey Ignatov
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 33+ messages in thread
From: Andrey Ignatov @ 2019-04-05 19:35 UTC (permalink / raw)
  To: netdev
  Cc: Andrey Ignatov, ast, daniel, guro, kernel-team, Luis Chamberlain,
	Kees Cook, Alexey Dobriyan, linux-kernel, linux-fsdevel

Support BPF_PROG_TYPE_CGROUP_SYSCTL program in libbpf: identifying
program and attach types by section name, probe.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
---
 tools/lib/bpf/libbpf.c        | 3 +++
 tools/lib/bpf/libbpf_probes.c | 1 +
 2 files changed, 4 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index e1e4d35cf08e..f36020e5207b 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1705,6 +1705,7 @@ static bool bpf_prog_type__needs_kver(enum bpf_prog_type type)
 	case BPF_PROG_TYPE_TRACEPOINT:
 	case BPF_PROG_TYPE_RAW_TRACEPOINT:
 	case BPF_PROG_TYPE_PERF_EVENT:
+	case BPF_PROG_TYPE_CGROUP_SYSCTL:
 		return false;
 	case BPF_PROG_TYPE_KPROBE:
 	default:
@@ -2642,6 +2643,8 @@ static const struct {
 						BPF_CGROUP_UDP4_SENDMSG),
 	BPF_EAPROG_SEC("cgroup/sendmsg6",	BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
 						BPF_CGROUP_UDP6_SENDMSG),
+	BPF_EAPROG_SEC("cgroup/sysctl",		BPF_PROG_TYPE_CGROUP_SYSCTL,
+						BPF_CGROUP_SYSCTL),
 };
 
 #undef BPF_PROG_SEC_IMPL
diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c
index 8c3a1c04dcb2..0f25541632e3 100644
--- a/tools/lib/bpf/libbpf_probes.c
+++ b/tools/lib/bpf/libbpf_probes.c
@@ -97,6 +97,7 @@ probe_load(enum bpf_prog_type prog_type, const struct bpf_insn *insns,
 	case BPF_PROG_TYPE_LIRC_MODE2:
 	case BPF_PROG_TYPE_SK_REUSEPORT:
 	case BPF_PROG_TYPE_FLOW_DISSECTOR:
+	case BPF_PROG_TYPE_CGROUP_SYSCTL:
 	default:
 		break;
 	}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 bpf-next 09/21] selftests/bpf: Test sysctl section name
  2019-04-05 19:35 [PATCH v3 bpf-next 00/21] bpf: Sysctl hook Andrey Ignatov
                   ` (7 preceding siblings ...)
  2019-04-05 19:35 ` [PATCH v3 bpf-next 08/21] libbpf: Support sysctl hook Andrey Ignatov
@ 2019-04-05 19:35 ` Andrey Ignatov
  2019-04-05 19:35 ` [PATCH v3 bpf-next 10/21] selftests/bpf: Test BPF_CGROUP_SYSCTL Andrey Ignatov
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 33+ messages in thread
From: Andrey Ignatov @ 2019-04-05 19:35 UTC (permalink / raw)
  To: netdev
  Cc: Andrey Ignatov, ast, daniel, guro, kernel-team, Luis Chamberlain,
	Kees Cook, Alexey Dobriyan, linux-kernel, linux-fsdevel

Add unit test to verify that program and attach types are properly
identified for "cgroup/sysctl" section name.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
---
 tools/testing/selftests/bpf/test_section_names.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_section_names.c b/tools/testing/selftests/bpf/test_section_names.c
index 7c4f41572b1c..bebd4fbca1f4 100644
--- a/tools/testing/selftests/bpf/test_section_names.c
+++ b/tools/testing/selftests/bpf/test_section_names.c
@@ -119,6 +119,11 @@ static struct sec_name_test tests[] = {
 		{0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_UDP6_SENDMSG},
 		{0, BPF_CGROUP_UDP6_SENDMSG},
 	},
+	{
+		"cgroup/sysctl",
+		{0, BPF_PROG_TYPE_CGROUP_SYSCTL, BPF_CGROUP_SYSCTL},
+		{0, BPF_CGROUP_SYSCTL},
+	},
 };
 
 static int test_prog_type_by_name(const struct sec_name_test *test)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 bpf-next 10/21] selftests/bpf: Test BPF_CGROUP_SYSCTL
  2019-04-05 19:35 [PATCH v3 bpf-next 00/21] bpf: Sysctl hook Andrey Ignatov
                   ` (8 preceding siblings ...)
  2019-04-05 19:35 ` [PATCH v3 bpf-next 09/21] selftests/bpf: Test sysctl section name Andrey Ignatov
@ 2019-04-05 19:35 ` Andrey Ignatov
  2019-04-05 19:35 ` [PATCH v3 bpf-next 11/21] selftests/bpf: Test bpf_sysctl_get_name helper Andrey Ignatov
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 33+ messages in thread
From: Andrey Ignatov @ 2019-04-05 19:35 UTC (permalink / raw)
  To: netdev
  Cc: Andrey Ignatov, ast, daniel, guro, kernel-team, Luis Chamberlain,
	Kees Cook, Alexey Dobriyan, linux-kernel, linux-fsdevel

Add unit test for BPF_PROG_TYPE_CGROUP_SYSCTL program type.

Test that program can allow/deny access.
Test both valid and invalid accesses to ctx->write.

Example of output:
  # ./test_sysctl
  Test case: sysctl wrong attach_type .. [PASS]
  Test case: sysctl:read allow all .. [PASS]
  Test case: sysctl:read deny all .. [PASS]
  Test case: ctx:write sysctl:read read ok .. [PASS]
  Test case: ctx:write sysctl:write read ok .. [PASS]
  Test case: ctx:write sysctl:read write reject .. [PASS]
  Summary: 6 PASSED, 0 FAILED

Signed-off-by: Andrey Ignatov <rdna@fb.com>
---
 tools/testing/selftests/bpf/Makefile      |   3 +-
 tools/testing/selftests/bpf/test_sysctl.c | 291 ++++++++++++++++++++++
 2 files changed, 293 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/test_sysctl.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 078283d073b0..f9d83ba7843e 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -23,7 +23,7 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test
 	test_align test_verifier_log test_dev_cgroup test_tcpbpf_user \
 	test_sock test_btf test_sockmap test_lirc_mode2_user get_cgroup_id_user \
 	test_socket_cookie test_cgroup_storage test_select_reuseport test_section_names \
-	test_netcnt test_tcpnotify_user test_sock_fields
+	test_netcnt test_tcpnotify_user test_sock_fields test_sysctl
 
 BPF_OBJ_FILES = $(patsubst %.c,%.o, $(notdir $(wildcard progs/*.c)))
 TEST_GEN_FILES = $(BPF_OBJ_FILES)
@@ -93,6 +93,7 @@ $(OUTPUT)/get_cgroup_id_user: cgroup_helpers.c
 $(OUTPUT)/test_cgroup_storage: cgroup_helpers.c
 $(OUTPUT)/test_netcnt: cgroup_helpers.c
 $(OUTPUT)/test_sock_fields: cgroup_helpers.c
+$(OUTPUT)/test_sysctl: cgroup_helpers.c
 
 .PHONY: force
 
diff --git a/tools/testing/selftests/bpf/test_sysctl.c b/tools/testing/selftests/bpf/test_sysctl.c
new file mode 100644
index 000000000000..6d0ab09789ad
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_sysctl.c
@@ -0,0 +1,291 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Facebook
+
+#include <fcntl.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#include <linux/filter.h>
+
+#include <bpf/bpf.h>
+
+#include "bpf_rlimit.h"
+#include "bpf_util.h"
+#include "cgroup_helpers.h"
+
+#define CG_PATH			"/foo"
+#define MAX_INSNS		512
+
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
+
+struct sysctl_test {
+	const char *descr;
+	struct bpf_insn	insns[MAX_INSNS];
+	enum bpf_attach_type attach_type;
+	const char *sysctl;
+	int open_flags;
+	const char *newval;
+	enum {
+		LOAD_REJECT,
+		ATTACH_REJECT,
+		OP_EPERM,
+		SUCCESS,
+	} result;
+};
+
+static struct sysctl_test tests[] = {
+	{
+		.descr = "sysctl wrong attach_type",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = 0,
+		.sysctl = "kernel/ostype",
+		.open_flags = O_RDONLY,
+		.result = ATTACH_REJECT,
+	},
+	{
+		.descr = "sysctl:read allow all",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "kernel/ostype",
+		.open_flags = O_RDONLY,
+		.result = SUCCESS,
+	},
+	{
+		.descr = "sysctl:read deny all",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "kernel/ostype",
+		.open_flags = O_RDONLY,
+		.result = OP_EPERM,
+	},
+	{
+		.descr = "ctx:write sysctl:read read ok",
+		.insns = {
+			/* If (write) */
+			BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_1,
+				    offsetof(struct bpf_sysctl, write)),
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 1, 2),
+
+			/* return DENY; */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_JMP_A(1),
+
+			/* else return ALLOW; */
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "kernel/ostype",
+		.open_flags = O_RDONLY,
+		.result = SUCCESS,
+	},
+	{
+		.descr = "ctx:write sysctl:write read ok",
+		.insns = {
+			/* If (write) */
+			BPF_LDX_MEM(BPF_B, BPF_REG_7, BPF_REG_1,
+				    offsetof(struct bpf_sysctl, write)),
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 1, 2),
+
+			/* return DENY; */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_JMP_A(1),
+
+			/* else return ALLOW; */
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "kernel/domainname",
+		.open_flags = O_WRONLY,
+		.newval = "(none)", /* same as default, should fail anyway */
+		.result = OP_EPERM,
+	},
+	{
+		.descr = "ctx:write sysctl:read write reject",
+		.insns = {
+			/* write = X */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+				    offsetof(struct bpf_sysctl, write)),
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "kernel/ostype",
+		.open_flags = O_RDONLY,
+		.result = LOAD_REJECT,
+	},
+};
+
+static size_t probe_prog_length(const struct bpf_insn *fp)
+{
+	size_t len;
+
+	for (len = MAX_INSNS - 1; len > 0; --len)
+		if (fp[len].code != 0 || fp[len].imm != 0)
+			break;
+	return len + 1;
+}
+
+static int load_sysctl_prog(struct sysctl_test *test, const char *sysctl_path)
+{
+	struct bpf_insn *prog = test->insns;
+	struct bpf_load_program_attr attr;
+	int ret;
+
+	memset(&attr, 0, sizeof(struct bpf_load_program_attr));
+	attr.prog_type = BPF_PROG_TYPE_CGROUP_SYSCTL;
+	attr.insns = prog;
+	attr.insns_cnt = probe_prog_length(attr.insns);
+	attr.license = "GPL";
+
+	ret = bpf_load_program_xattr(&attr, bpf_log_buf, BPF_LOG_BUF_SIZE);
+	if (ret < 0 && test->result != LOAD_REJECT) {
+		log_err(">>> Loading program error.\n"
+			">>> Verifier output:\n%s\n-------\n", bpf_log_buf);
+	}
+
+	return ret;
+}
+
+static int access_sysctl(const char *sysctl_path,
+			 const struct sysctl_test *test)
+{
+	int err = 0;
+	int fd;
+
+	fd = open(sysctl_path, test->open_flags | O_CLOEXEC);
+	if (fd < 0)
+		return fd;
+
+	if (test->open_flags == O_RDONLY) {
+		char buf[128];
+
+		if (read(fd, buf, sizeof(buf)) == -1)
+			goto err;
+	} else if (test->open_flags == O_WRONLY) {
+		if (!test->newval) {
+			log_err("New value for sysctl is not set");
+			goto err;
+		}
+		if (write(fd, test->newval, strlen(test->newval)) == -1)
+			goto err;
+	} else {
+		log_err("Unexpected sysctl access: neither read nor write");
+		goto err;
+	}
+
+	goto out;
+err:
+	err = -1;
+out:
+	close(fd);
+	return err;
+}
+
+static int run_test_case(int cgfd, struct sysctl_test *test)
+{
+	enum bpf_attach_type atype = test->attach_type;
+	char sysctl_path[128];
+	int progfd = -1;
+	int err = 0;
+
+	printf("Test case: %s .. ", test->descr);
+
+	snprintf(sysctl_path, sizeof(sysctl_path), "/proc/sys/%s",
+		 test->sysctl);
+
+	progfd = load_sysctl_prog(test, sysctl_path);
+	if (progfd < 0) {
+		if (test->result == LOAD_REJECT)
+			goto out;
+		else
+			goto err;
+	}
+
+	if (bpf_prog_attach(progfd, cgfd, atype, BPF_F_ALLOW_OVERRIDE) == -1) {
+		if (test->result == ATTACH_REJECT)
+			goto out;
+		else
+			goto err;
+	}
+
+	if (access_sysctl(sysctl_path, test) == -1) {
+		if (test->result == OP_EPERM && errno == EPERM)
+			goto out;
+		else
+			goto err;
+	}
+
+	if (test->result != SUCCESS) {
+		log_err("Unexpected failure");
+		goto err;
+	}
+
+	goto out;
+err:
+	err = -1;
+out:
+	/* Detaching w/o checking return code: best effort attempt. */
+	if (progfd != -1)
+		bpf_prog_detach(cgfd, atype);
+	close(progfd);
+	printf("[%s]\n", err ? "FAIL" : "PASS");
+	return err;
+}
+
+static int run_tests(int cgfd)
+{
+	int passes = 0;
+	int fails = 0;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(tests); ++i) {
+		if (run_test_case(cgfd, &tests[i]))
+			++fails;
+		else
+			++passes;
+	}
+	printf("Summary: %d PASSED, %d FAILED\n", passes, fails);
+	return fails ? -1 : 0;
+}
+
+int main(int argc, char **argv)
+{
+	int cgfd = -1;
+	int err = 0;
+
+	if (setup_cgroup_environment())
+		goto err;
+
+	cgfd = create_and_get_cgroup(CG_PATH);
+	if (cgfd < 0)
+		goto err;
+
+	if (join_cgroup(CG_PATH))
+		goto err;
+
+	if (run_tests(cgfd))
+		goto err;
+
+	goto out;
+err:
+	err = -1;
+out:
+	close(cgfd);
+	cleanup_cgroup_environment();
+	return err;
+}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 bpf-next 11/21] selftests/bpf: Test bpf_sysctl_get_name helper
  2019-04-05 19:35 [PATCH v3 bpf-next 00/21] bpf: Sysctl hook Andrey Ignatov
                   ` (9 preceding siblings ...)
  2019-04-05 19:35 ` [PATCH v3 bpf-next 10/21] selftests/bpf: Test BPF_CGROUP_SYSCTL Andrey Ignatov
@ 2019-04-05 19:35 ` Andrey Ignatov
  2019-04-05 19:35 ` [PATCH v3 bpf-next 12/21] selftests/bpf: Test sysctl_get_current_value helper Andrey Ignatov
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 33+ messages in thread
From: Andrey Ignatov @ 2019-04-05 19:35 UTC (permalink / raw)
  To: netdev
  Cc: Andrey Ignatov, ast, daniel, guro, kernel-team, Luis Chamberlain,
	Kees Cook, Alexey Dobriyan, linux-kernel, linux-fsdevel

Test w/ and w/o BPF_F_SYSCTL_BASE_NAME, buffers with enough space and
too small buffers to get E2BIG and truncated result, etc.

  # ./test_sysctl
  ...
  Test case: sysctl_get_name sysctl_value:base ok .. [PASS]
  Test case: sysctl_get_name sysctl_value:base E2BIG truncated .. [PASS]
  Test case: sysctl_get_name sysctl:full ok .. [PASS]
  Test case: sysctl_get_name sysctl:full E2BIG truncated .. [PASS]
  Test case: sysctl_get_name sysctl:full E2BIG truncated small .. [PASS]
  Summary: 11 PASSED, 0 FAILED

Signed-off-by: Andrey Ignatov <rdna@fb.com>
---
 tools/testing/selftests/bpf/test_sysctl.c | 222 ++++++++++++++++++++++
 1 file changed, 222 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_sysctl.c b/tools/testing/selftests/bpf/test_sysctl.c
index 6d0ab09789ad..2e7ef04503a8 100644
--- a/tools/testing/selftests/bpf/test_sysctl.c
+++ b/tools/testing/selftests/bpf/test_sysctl.c
@@ -128,6 +128,228 @@ static struct sysctl_test tests[] = {
 		.open_flags = O_RDONLY,
 		.result = LOAD_REJECT,
 	},
+	{
+		.descr = "sysctl_get_name sysctl_value:base ok",
+		.insns = {
+			/* sysctl_get_name arg2 (buf) */
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_7),
+
+			/* sysctl_get_name arg3 (buf_len) */
+			BPF_MOV64_IMM(BPF_REG_3, 8),
+
+			/* sysctl_get_name arg4 (flags) */
+			BPF_MOV64_IMM(BPF_REG_4, BPF_F_SYSCTL_BASE_NAME),
+
+			/* sysctl_get_name(ctx, buf, buf_len, flags) */
+			BPF_EMIT_CALL(BPF_FUNC_sysctl_get_name),
+
+			/* if (ret == expected && */
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, sizeof("tcp_mem") - 1, 6),
+			/*     buf == "tcp_mem\0") */
+			BPF_LD_IMM64(BPF_REG_8, 0x006d656d5f706374ULL),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_9, BPF_REG_7, 0),
+			BPF_JMP_REG(BPF_JNE, BPF_REG_8, BPF_REG_9, 2),
+
+			/* return ALLOW; */
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_JMP_A(1),
+
+			/* else return DENY; */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "net/ipv4/tcp_mem",
+		.open_flags = O_RDONLY,
+		.result = SUCCESS,
+	},
+	{
+		.descr = "sysctl_get_name sysctl_value:base E2BIG truncated",
+		.insns = {
+			/* sysctl_get_name arg2 (buf) */
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_7),
+
+			/* sysctl_get_name arg3 (buf_len) too small */
+			BPF_MOV64_IMM(BPF_REG_3, 7),
+
+			/* sysctl_get_name arg4 (flags) */
+			BPF_MOV64_IMM(BPF_REG_4, BPF_F_SYSCTL_BASE_NAME),
+
+			/* sysctl_get_name(ctx, buf, buf_len, flags) */
+			BPF_EMIT_CALL(BPF_FUNC_sysctl_get_name),
+
+			/* if (ret == expected && */
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, -E2BIG, 6),
+
+			/*     buf[0:7] == "tcp_me\0") */
+			BPF_LD_IMM64(BPF_REG_8, 0x00656d5f706374ULL),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_9, BPF_REG_7, 0),
+			BPF_JMP_REG(BPF_JNE, BPF_REG_8, BPF_REG_9, 2),
+
+			/* return ALLOW; */
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_JMP_A(1),
+
+			/* else return DENY; */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "net/ipv4/tcp_mem",
+		.open_flags = O_RDONLY,
+		.result = SUCCESS,
+	},
+	{
+		.descr = "sysctl_get_name sysctl:full ok",
+		.insns = {
+			/* sysctl_get_name arg2 (buf) */
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -24),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 8),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 16),
+
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_7),
+
+			/* sysctl_get_name arg3 (buf_len) */
+			BPF_MOV64_IMM(BPF_REG_3, 17),
+
+			/* sysctl_get_name arg4 (flags) */
+			BPF_MOV64_IMM(BPF_REG_4, 0),
+
+			/* sysctl_get_name(ctx, buf, buf_len, flags) */
+			BPF_EMIT_CALL(BPF_FUNC_sysctl_get_name),
+
+			/* if (ret == expected && */
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 16, 14),
+
+			/*     buf[0:8] == "net/ipv4" && */
+			BPF_LD_IMM64(BPF_REG_8, 0x347670692f74656eULL),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_9, BPF_REG_7, 0),
+			BPF_JMP_REG(BPF_JNE, BPF_REG_8, BPF_REG_9, 10),
+
+			/*     buf[8:16] == "/tcp_mem" && */
+			BPF_LD_IMM64(BPF_REG_8, 0x6d656d5f7063742fULL),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_9, BPF_REG_7, 8),
+			BPF_JMP_REG(BPF_JNE, BPF_REG_8, BPF_REG_9, 6),
+
+			/*     buf[16:24] == "\0") */
+			BPF_LD_IMM64(BPF_REG_8, 0x0ULL),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_9, BPF_REG_7, 16),
+			BPF_JMP_REG(BPF_JNE, BPF_REG_8, BPF_REG_9, 2),
+
+			/* return ALLOW; */
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_JMP_A(1),
+
+			/* else return DENY; */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "net/ipv4/tcp_mem",
+		.open_flags = O_RDONLY,
+		.result = SUCCESS,
+	},
+	{
+		.descr = "sysctl_get_name sysctl:full E2BIG truncated",
+		.insns = {
+			/* sysctl_get_name arg2 (buf) */
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -16),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 8),
+
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_7),
+
+			/* sysctl_get_name arg3 (buf_len) */
+			BPF_MOV64_IMM(BPF_REG_3, 16),
+
+			/* sysctl_get_name arg4 (flags) */
+			BPF_MOV64_IMM(BPF_REG_4, 0),
+
+			/* sysctl_get_name(ctx, buf, buf_len, flags) */
+			BPF_EMIT_CALL(BPF_FUNC_sysctl_get_name),
+
+			/* if (ret == expected && */
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, -E2BIG, 10),
+
+			/*     buf[0:8] == "net/ipv4" && */
+			BPF_LD_IMM64(BPF_REG_8, 0x347670692f74656eULL),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_9, BPF_REG_7, 0),
+			BPF_JMP_REG(BPF_JNE, BPF_REG_8, BPF_REG_9, 6),
+
+			/*     buf[8:16] == "/tcp_me\0") */
+			BPF_LD_IMM64(BPF_REG_8, 0x00656d5f7063742fULL),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_9, BPF_REG_7, 8),
+			BPF_JMP_REG(BPF_JNE, BPF_REG_8, BPF_REG_9, 2),
+
+			/* return ALLOW; */
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_JMP_A(1),
+
+			/* else return DENY; */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "net/ipv4/tcp_mem",
+		.open_flags = O_RDONLY,
+		.result = SUCCESS,
+	},
+	{
+		.descr = "sysctl_get_name sysctl:full E2BIG truncated small",
+		.insns = {
+			/* sysctl_get_name arg2 (buf) */
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_7),
+
+			/* sysctl_get_name arg3 (buf_len) */
+			BPF_MOV64_IMM(BPF_REG_3, 7),
+
+			/* sysctl_get_name arg4 (flags) */
+			BPF_MOV64_IMM(BPF_REG_4, 0),
+
+			/* sysctl_get_name(ctx, buf, buf_len, flags) */
+			BPF_EMIT_CALL(BPF_FUNC_sysctl_get_name),
+
+			/* if (ret == expected && */
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, -E2BIG, 6),
+
+			/*     buf[0:8] == "net/ip\0") */
+			BPF_LD_IMM64(BPF_REG_8, 0x000070692f74656eULL),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_9, BPF_REG_7, 0),
+			BPF_JMP_REG(BPF_JNE, BPF_REG_8, BPF_REG_9, 2),
+
+			/* return ALLOW; */
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_JMP_A(1),
+
+			/* else return DENY; */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "net/ipv4/tcp_mem",
+		.open_flags = O_RDONLY,
+		.result = SUCCESS,
+	},
 };
 
 static size_t probe_prog_length(const struct bpf_insn *fp)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 bpf-next 12/21] selftests/bpf: Test sysctl_get_current_value helper
  2019-04-05 19:35 [PATCH v3 bpf-next 00/21] bpf: Sysctl hook Andrey Ignatov
                   ` (10 preceding siblings ...)
  2019-04-05 19:35 ` [PATCH v3 bpf-next 11/21] selftests/bpf: Test bpf_sysctl_get_name helper Andrey Ignatov
@ 2019-04-05 19:35 ` Andrey Ignatov
  2019-04-05 19:35 ` [PATCH v3 bpf-next 13/21] selftests/bpf: Test bpf_sysctl_{get,set}_new_value helpers Andrey Ignatov
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 33+ messages in thread
From: Andrey Ignatov @ 2019-04-05 19:35 UTC (permalink / raw)
  To: netdev
  Cc: Andrey Ignatov, ast, daniel, guro, kernel-team, Luis Chamberlain,
	Kees Cook, Alexey Dobriyan, linux-kernel, linux-fsdevel

Test sysctl_get_current_value on sysctl read and write, buffers with
enough space and too small buffers to get E2BIG and truncated result,
etc.

  # ./test_sysctl
  ...
  Test case: sysctl_get_current_value sysctl:read ok, gt .. [PASS]
  Test case: sysctl_get_current_value sysctl:read ok, eq .. [PASS]
  Test case: sysctl_get_current_value sysctl:read E2BIG truncated ..  [PASS]
  Test case: sysctl_get_current_value sysctl:read EINVAL .. [PASS]
  Test case: sysctl_get_current_value sysctl:write ok .. [PASS]
  Summary: 16 PASSED, 0 FAILED

Signed-off-by: Andrey Ignatov <rdna@fb.com>
---
 tools/testing/selftests/bpf/test_sysctl.c | 228 ++++++++++++++++++++++
 1 file changed, 228 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_sysctl.c b/tools/testing/selftests/bpf/test_sysctl.c
index 2e7ef04503a8..9fb6560fa775 100644
--- a/tools/testing/selftests/bpf/test_sysctl.c
+++ b/tools/testing/selftests/bpf/test_sysctl.c
@@ -18,11 +18,13 @@
 
 #define CG_PATH			"/foo"
 #define MAX_INSNS		512
+#define FIXUP_SYSCTL_VALUE	0
 
 char bpf_log_buf[BPF_LOG_BUF_SIZE];
 
 struct sysctl_test {
 	const char *descr;
+	size_t fixup_value_insn;
 	struct bpf_insn	insns[MAX_INSNS];
 	enum bpf_attach_type attach_type;
 	const char *sysctl;
@@ -350,6 +352,190 @@ static struct sysctl_test tests[] = {
 		.open_flags = O_RDONLY,
 		.result = SUCCESS,
 	},
+	{
+		.descr = "sysctl_get_current_value sysctl:read ok, gt",
+		.insns = {
+			/* sysctl_get_current_value arg2 (buf) */
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_7),
+
+			/* sysctl_get_current_value arg3 (buf_len) */
+			BPF_MOV64_IMM(BPF_REG_3, 8),
+
+			/* sysctl_get_current_value(ctx, buf, buf_len) */
+			BPF_EMIT_CALL(BPF_FUNC_sysctl_get_current_value),
+
+			/* if (ret == expected && */
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 6, 6),
+
+			/*     buf[0:6] == "Linux\n\0") */
+			BPF_LD_IMM64(BPF_REG_8, 0x000a78756e694cULL),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_9, BPF_REG_7, 0),
+			BPF_JMP_REG(BPF_JNE, BPF_REG_8, BPF_REG_9, 2),
+
+			/* return ALLOW; */
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_JMP_A(1),
+
+			/* else return DENY; */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "kernel/ostype",
+		.open_flags = O_RDONLY,
+		.result = SUCCESS,
+	},
+	{
+		.descr = "sysctl_get_current_value sysctl:read ok, eq",
+		.insns = {
+			/* sysctl_get_current_value arg2 (buf) */
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_STX_MEM(BPF_B, BPF_REG_7, BPF_REG_0, 7),
+
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_7),
+
+			/* sysctl_get_current_value arg3 (buf_len) */
+			BPF_MOV64_IMM(BPF_REG_3, 7),
+
+			/* sysctl_get_current_value(ctx, buf, buf_len) */
+			BPF_EMIT_CALL(BPF_FUNC_sysctl_get_current_value),
+
+			/* if (ret == expected && */
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 6, 6),
+
+			/*     buf[0:6] == "Linux\n\0") */
+			BPF_LD_IMM64(BPF_REG_8, 0x000a78756e694cULL),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_9, BPF_REG_7, 0),
+			BPF_JMP_REG(BPF_JNE, BPF_REG_8, BPF_REG_9, 2),
+
+			/* return ALLOW; */
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_JMP_A(1),
+
+			/* else return DENY; */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "kernel/ostype",
+		.open_flags = O_RDONLY,
+		.result = SUCCESS,
+	},
+	{
+		.descr = "sysctl_get_current_value sysctl:read E2BIG truncated",
+		.insns = {
+			/* sysctl_get_current_value arg2 (buf) */
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_STX_MEM(BPF_H, BPF_REG_7, BPF_REG_0, 6),
+
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_7),
+
+			/* sysctl_get_current_value arg3 (buf_len) */
+			BPF_MOV64_IMM(BPF_REG_3, 6),
+
+			/* sysctl_get_current_value(ctx, buf, buf_len) */
+			BPF_EMIT_CALL(BPF_FUNC_sysctl_get_current_value),
+
+			/* if (ret == expected && */
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, -E2BIG, 6),
+
+			/*     buf[0:6] == "Linux\0") */
+			BPF_LD_IMM64(BPF_REG_8, 0x000078756e694cULL),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_9, BPF_REG_7, 0),
+			BPF_JMP_REG(BPF_JNE, BPF_REG_8, BPF_REG_9, 2),
+
+			/* return ALLOW; */
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_JMP_A(1),
+
+			/* else return DENY; */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "kernel/ostype",
+		.open_flags = O_RDONLY,
+		.result = SUCCESS,
+	},
+	{
+		.descr = "sysctl_get_current_value sysctl:read EINVAL",
+		.insns = {
+			/* sysctl_get_current_value arg2 (buf) */
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_7),
+
+			/* sysctl_get_current_value arg3 (buf_len) */
+			BPF_MOV64_IMM(BPF_REG_3, 8),
+
+			/* sysctl_get_current_value(ctx, buf, buf_len) */
+			BPF_EMIT_CALL(BPF_FUNC_sysctl_get_current_value),
+
+			/* if (ret == expected && */
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, -EINVAL, 4),
+
+			/*     buf[0:8] is NUL-filled) */
+			BPF_LDX_MEM(BPF_DW, BPF_REG_9, BPF_REG_7, 0),
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_9, 0, 2),
+
+			/* return DENY; */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_JMP_A(1),
+
+			/* else return ALLOW; */
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "net/ipv6/conf/lo/stable_secret", /* -EIO */
+		.open_flags = O_RDONLY,
+		.result = OP_EPERM,
+	},
+	{
+		.descr = "sysctl_get_current_value sysctl:write ok",
+		.fixup_value_insn = 6,
+		.insns = {
+			/* sysctl_get_current_value arg2 (buf) */
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_7),
+
+			/* sysctl_get_current_value arg3 (buf_len) */
+			BPF_MOV64_IMM(BPF_REG_3, 8),
+
+			/* sysctl_get_current_value(ctx, buf, buf_len) */
+			BPF_EMIT_CALL(BPF_FUNC_sysctl_get_current_value),
+
+			/* if (ret == expected && */
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 4, 6),
+
+			/*     buf[0:4] == expected) */
+			BPF_LD_IMM64(BPF_REG_8, FIXUP_SYSCTL_VALUE),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_9, BPF_REG_7, 0),
+			BPF_JMP_REG(BPF_JNE, BPF_REG_8, BPF_REG_9, 2),
+
+			/* return DENY; */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_JMP_A(1),
+
+			/* else return ALLOW; */
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "net/ipv4/route/mtu_expires",
+		.open_flags = O_WRONLY,
+		.newval = "600", /* same as default, should fail anyway */
+		.result = OP_EPERM,
+	},
 };
 
 static size_t probe_prog_length(const struct bpf_insn *fp)
@@ -362,6 +548,27 @@ static size_t probe_prog_length(const struct bpf_insn *fp)
 	return len + 1;
 }
 
+static int fixup_sysctl_value(const char *buf, size_t buf_len,
+			      struct bpf_insn *prog, size_t insn_num)
+{
+	uint32_t value_num = 0;
+	uint8_t c, i;
+
+	if (buf_len > sizeof(value_num)) {
+		log_err("Value is too big (%zd) to use in fixup", buf_len);
+		return -1;
+	}
+
+	for (i = 0; i < buf_len; ++i) {
+		c = buf[i];
+		value_num |= (c << i * 8);
+	}
+
+	prog[insn_num].imm = value_num;
+
+	return 0;
+}
+
 static int load_sysctl_prog(struct sysctl_test *test, const char *sysctl_path)
 {
 	struct bpf_insn *prog = test->insns;
@@ -374,6 +581,27 @@ static int load_sysctl_prog(struct sysctl_test *test, const char *sysctl_path)
 	attr.insns_cnt = probe_prog_length(attr.insns);
 	attr.license = "GPL";
 
+	if (test->fixup_value_insn) {
+		char buf[128];
+		ssize_t len;
+		int fd;
+
+		fd = open(sysctl_path, O_RDONLY | O_CLOEXEC);
+		if (fd < 0) {
+			log_err("open(%s) failed", sysctl_path);
+			return -1;
+		}
+		len = read(fd, buf, sizeof(buf));
+		if (len == -1) {
+			log_err("read(%s) failed", sysctl_path);
+			close(fd);
+			return -1;
+		}
+		close(fd);
+		if (fixup_sysctl_value(buf, len, prog, test->fixup_value_insn))
+			return -1;
+	}
+
 	ret = bpf_load_program_xattr(&attr, bpf_log_buf, BPF_LOG_BUF_SIZE);
 	if (ret < 0 && test->result != LOAD_REJECT) {
 		log_err(">>> Loading program error.\n"
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 bpf-next 13/21] selftests/bpf: Test bpf_sysctl_{get,set}_new_value helpers
  2019-04-05 19:35 [PATCH v3 bpf-next 00/21] bpf: Sysctl hook Andrey Ignatov
                   ` (11 preceding siblings ...)
  2019-04-05 19:35 ` [PATCH v3 bpf-next 12/21] selftests/bpf: Test sysctl_get_current_value helper Andrey Ignatov
@ 2019-04-05 19:35 ` Andrey Ignatov
  2019-04-05 19:35 ` [PATCH v3 bpf-next 14/21] selftests/bpf: Test file_pos field in bpf_sysctl ctx Andrey Ignatov
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 33+ messages in thread
From: Andrey Ignatov @ 2019-04-05 19:35 UTC (permalink / raw)
  To: netdev
  Cc: Andrey Ignatov, ast, daniel, guro, kernel-team, Luis Chamberlain,
	Kees Cook, Alexey Dobriyan, linux-kernel, linux-fsdevel

Test that new value provided by user space on sysctl write can be read
by bpf_sysctl_get_new_value and overridden by bpf_sysctl_set_new_value.

  # ./test_sysctl
  ...
  Test case: sysctl_get_new_value sysctl:read EINVAL .. [PASS]
  Test case: sysctl_get_new_value sysctl:write ok .. [PASS]
  Test case: sysctl_get_new_value sysctl:write ok long .. [PASS]
  Test case: sysctl_get_new_value sysctl:write E2BIG .. [PASS]
  Test case: sysctl_set_new_value sysctl:read EINVAL .. [PASS]
  Test case: sysctl_set_new_value sysctl:write ok .. [PASS]
  Summary: 22 PASSED, 0 FAILED

Signed-off-by: Andrey Ignatov <rdna@fb.com>
---
 tools/testing/selftests/bpf/test_sysctl.c | 222 ++++++++++++++++++++++
 1 file changed, 222 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_sysctl.c b/tools/testing/selftests/bpf/test_sysctl.c
index 9fb6560fa775..95437b72404f 100644
--- a/tools/testing/selftests/bpf/test_sysctl.c
+++ b/tools/testing/selftests/bpf/test_sysctl.c
@@ -536,6 +536,228 @@ static struct sysctl_test tests[] = {
 		.newval = "600", /* same as default, should fail anyway */
 		.result = OP_EPERM,
 	},
+	{
+		.descr = "sysctl_get_new_value sysctl:read EINVAL",
+		.insns = {
+			/* sysctl_get_new_value arg2 (buf) */
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_7),
+
+			/* sysctl_get_new_value arg3 (buf_len) */
+			BPF_MOV64_IMM(BPF_REG_3, 8),
+
+			/* sysctl_get_new_value(ctx, buf, buf_len) */
+			BPF_EMIT_CALL(BPF_FUNC_sysctl_get_new_value),
+
+			/* if (ret == expected) */
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, -EINVAL, 2),
+
+			/* return ALLOW; */
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_JMP_A(1),
+
+			/* else return DENY; */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "net/ipv4/tcp_mem",
+		.open_flags = O_RDONLY,
+		.result = SUCCESS,
+	},
+	{
+		.descr = "sysctl_get_new_value sysctl:write ok",
+		.insns = {
+			/* sysctl_get_new_value arg2 (buf) */
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_7),
+
+			/* sysctl_get_new_value arg3 (buf_len) */
+			BPF_MOV64_IMM(BPF_REG_3, 4),
+
+			/* sysctl_get_new_value(ctx, buf, buf_len) */
+			BPF_EMIT_CALL(BPF_FUNC_sysctl_get_new_value),
+
+			/* if (ret == expected && */
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 3, 4),
+
+			/*     buf[0:4] == "606\0") */
+			BPF_LDX_MEM(BPF_W, BPF_REG_9, BPF_REG_7, 0),
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_9, 0x00363036, 2),
+
+			/* return DENY; */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_JMP_A(1),
+
+			/* else return ALLOW; */
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "net/ipv4/route/mtu_expires",
+		.open_flags = O_WRONLY,
+		.newval = "606",
+		.result = OP_EPERM,
+	},
+	{
+		.descr = "sysctl_get_new_value sysctl:write ok long",
+		.insns = {
+			/* sysctl_get_new_value arg2 (buf) */
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -24),
+
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_7),
+
+			/* sysctl_get_new_value arg3 (buf_len) */
+			BPF_MOV64_IMM(BPF_REG_3, 24),
+
+			/* sysctl_get_new_value(ctx, buf, buf_len) */
+			BPF_EMIT_CALL(BPF_FUNC_sysctl_get_new_value),
+
+			/* if (ret == expected && */
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 23, 14),
+
+			/*     buf[0:8] == "3000000 " && */
+			BPF_LD_IMM64(BPF_REG_8, 0x2030303030303033ULL),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_9, BPF_REG_7, 0),
+			BPF_JMP_REG(BPF_JNE, BPF_REG_8, BPF_REG_9, 10),
+
+			/*     buf[8:16] == "4000000 " && */
+			BPF_LD_IMM64(BPF_REG_8, 0x2030303030303034ULL),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_9, BPF_REG_7, 8),
+			BPF_JMP_REG(BPF_JNE, BPF_REG_8, BPF_REG_9, 6),
+
+			/*     buf[16:24] == "6000000\0") */
+			BPF_LD_IMM64(BPF_REG_8, 0x0030303030303036ULL),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_9, BPF_REG_7, 16),
+			BPF_JMP_REG(BPF_JNE, BPF_REG_8, BPF_REG_9, 2),
+
+			/* return DENY; */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_JMP_A(1),
+
+			/* else return ALLOW; */
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "net/ipv4/tcp_mem",
+		.open_flags = O_WRONLY,
+		.newval = "3000000 4000000 6000000",
+		.result = OP_EPERM,
+	},
+	{
+		.descr = "sysctl_get_new_value sysctl:write E2BIG",
+		.insns = {
+			/* sysctl_get_new_value arg2 (buf) */
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_STX_MEM(BPF_B, BPF_REG_7, BPF_REG_0, 3),
+
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_7),
+
+			/* sysctl_get_new_value arg3 (buf_len) */
+			BPF_MOV64_IMM(BPF_REG_3, 3),
+
+			/* sysctl_get_new_value(ctx, buf, buf_len) */
+			BPF_EMIT_CALL(BPF_FUNC_sysctl_get_new_value),
+
+			/* if (ret == expected && */
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, -E2BIG, 4),
+
+			/*     buf[0:3] == "60\0") */
+			BPF_LDX_MEM(BPF_W, BPF_REG_9, BPF_REG_7, 0),
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_9, 0x003036, 2),
+
+			/* return DENY; */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_JMP_A(1),
+
+			/* else return ALLOW; */
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "net/ipv4/route/mtu_expires",
+		.open_flags = O_WRONLY,
+		.newval = "606",
+		.result = OP_EPERM,
+	},
+	{
+		.descr = "sysctl_set_new_value sysctl:read EINVAL",
+		.insns = {
+			/* sysctl_set_new_value arg2 (buf) */
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+			BPF_MOV64_IMM(BPF_REG_0, 0x00303036),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_7),
+
+			/* sysctl_set_new_value arg3 (buf_len) */
+			BPF_MOV64_IMM(BPF_REG_3, 3),
+
+			/* sysctl_set_new_value(ctx, buf, buf_len) */
+			BPF_EMIT_CALL(BPF_FUNC_sysctl_set_new_value),
+
+			/* if (ret == expected) */
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, -EINVAL, 2),
+
+			/* return ALLOW; */
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_JMP_A(1),
+
+			/* else return DENY; */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "net/ipv4/route/mtu_expires",
+		.open_flags = O_RDONLY,
+		.result = SUCCESS,
+	},
+	{
+		.descr = "sysctl_set_new_value sysctl:write ok",
+		.fixup_value_insn = 2,
+		.insns = {
+			/* sysctl_set_new_value arg2 (buf) */
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+			BPF_MOV64_IMM(BPF_REG_0, FIXUP_SYSCTL_VALUE),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_7),
+
+			/* sysctl_set_new_value arg3 (buf_len) */
+			BPF_MOV64_IMM(BPF_REG_3, 3),
+
+			/* sysctl_set_new_value(ctx, buf, buf_len) */
+			BPF_EMIT_CALL(BPF_FUNC_sysctl_set_new_value),
+
+			/* if (ret == expected) */
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 2),
+
+			/* return ALLOW; */
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_JMP_A(1),
+
+			/* else return DENY; */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "net/ipv4/route/mtu_expires",
+		.open_flags = O_WRONLY,
+		.newval = "606",
+		.result = SUCCESS,
+	},
 };
 
 static size_t probe_prog_length(const struct bpf_insn *fp)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 bpf-next 14/21] selftests/bpf: Test file_pos field in bpf_sysctl ctx
  2019-04-05 19:35 [PATCH v3 bpf-next 00/21] bpf: Sysctl hook Andrey Ignatov
                   ` (12 preceding siblings ...)
  2019-04-05 19:35 ` [PATCH v3 bpf-next 13/21] selftests/bpf: Test bpf_sysctl_{get,set}_new_value helpers Andrey Ignatov
@ 2019-04-05 19:35 ` Andrey Ignatov
  2019-04-05 19:35 ` [PATCH v3 bpf-next 15/21] bpf: Introduce ARG_PTR_TO_{INT,LONG} arg types Andrey Ignatov
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 33+ messages in thread
From: Andrey Ignatov @ 2019-04-05 19:35 UTC (permalink / raw)
  To: netdev
  Cc: Andrey Ignatov, ast, daniel, guro, kernel-team, Luis Chamberlain,
	Kees Cook, Alexey Dobriyan, linux-kernel, linux-fsdevel

Test access to file_pos field of bpf_sysctl context, both read (incl.
narrow read) and write.

  # ./test_sysctl
  ...
  Test case: ctx:file_pos sysctl:read read ok .. [PASS]
  Test case: ctx:file_pos sysctl:read read ok narrow .. [PASS]
  Test case: ctx:file_pos sysctl:read write ok .. [PASS]
  ...

Signed-off-by: Andrey Ignatov <rdna@fb.com>
---
 tools/testing/selftests/bpf/test_sysctl.c | 64 +++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_sysctl.c b/tools/testing/selftests/bpf/test_sysctl.c
index 95437b72404f..43008aae32d3 100644
--- a/tools/testing/selftests/bpf/test_sysctl.c
+++ b/tools/testing/selftests/bpf/test_sysctl.c
@@ -30,6 +30,7 @@ struct sysctl_test {
 	const char *sysctl;
 	int open_flags;
 	const char *newval;
+	const char *oldval;
 	enum {
 		LOAD_REJECT,
 		ATTACH_REJECT,
@@ -130,6 +131,64 @@ static struct sysctl_test tests[] = {
 		.open_flags = O_RDONLY,
 		.result = LOAD_REJECT,
 	},
+	{
+		.descr = "ctx:file_pos sysctl:read read ok",
+		.insns = {
+			/* If (file_pos == X) */
+			BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_1,
+				    offsetof(struct bpf_sysctl, file_pos)),
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 0, 2),
+
+			/* return ALLOW; */
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_JMP_A(1),
+
+			/* else return DENY; */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "kernel/ostype",
+		.open_flags = O_RDONLY,
+		.result = SUCCESS,
+	},
+	{
+		.descr = "ctx:file_pos sysctl:read read ok narrow",
+		.insns = {
+			/* If (file_pos == X) */
+			BPF_LDX_MEM(BPF_B, BPF_REG_7, BPF_REG_1,
+				    offsetof(struct bpf_sysctl, file_pos)),
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 0, 2),
+
+			/* return ALLOW; */
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_JMP_A(1),
+
+			/* else return DENY; */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "kernel/ostype",
+		.open_flags = O_RDONLY,
+		.result = SUCCESS,
+	},
+	{
+		.descr = "ctx:file_pos sysctl:read write ok",
+		.insns = {
+			/* file_pos = X */
+			BPF_MOV64_IMM(BPF_REG_0, 2),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+				    offsetof(struct bpf_sysctl, file_pos)),
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "kernel/ostype",
+		.open_flags = O_RDONLY,
+		.oldval = "nux\n",
+		.result = SUCCESS,
+	},
 	{
 		.descr = "sysctl_get_name sysctl_value:base ok",
 		.insns = {
@@ -848,6 +907,11 @@ static int access_sysctl(const char *sysctl_path,
 
 		if (read(fd, buf, sizeof(buf)) == -1)
 			goto err;
+		if (test->oldval &&
+		    strncmp(buf, test->oldval, strlen(test->oldval))) {
+			log_err("Read value %s != %s", buf, test->oldval);
+			goto err;
+		}
 	} else if (test->open_flags == O_WRONLY) {
 		if (!test->newval) {
 			log_err("New value for sysctl is not set");
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 bpf-next 15/21] bpf: Introduce ARG_PTR_TO_{INT,LONG} arg types
  2019-04-05 19:35 [PATCH v3 bpf-next 00/21] bpf: Sysctl hook Andrey Ignatov
                   ` (13 preceding siblings ...)
  2019-04-05 19:35 ` [PATCH v3 bpf-next 14/21] selftests/bpf: Test file_pos field in bpf_sysctl ctx Andrey Ignatov
@ 2019-04-05 19:35 ` Andrey Ignatov
  2019-04-05 19:35 ` [PATCH v3 bpf-next 16/21] bpf: Introduce bpf_strtol and bpf_strtoul helpers Andrey Ignatov
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 33+ messages in thread
From: Andrey Ignatov @ 2019-04-05 19:35 UTC (permalink / raw)
  To: netdev
  Cc: Andrey Ignatov, ast, daniel, guro, kernel-team, Luis Chamberlain,
	Kees Cook, Alexey Dobriyan, linux-kernel, linux-fsdevel

Currently the way to pass result from BPF helper to BPF program is to
provide memory area defined by pointer and size: func(void *, size_t).

It works great for generic use-case, but for simple types, such as int,
it's overkill and consumes two arguments when it could use just one.

Introduce new argument types ARG_PTR_TO_INT and ARG_PTR_TO_LONG to be
able to pass result from helper to program via pointer to int and long
correspondingly: func(int *) or func(long *).

New argument types are similar to ARG_PTR_TO_MEM with the following
differences:
* they don't require corresponding ARG_CONST_SIZE argument, predefined
  access sizes are used instead (32bit for int, 64bit for long);
* it's possible to use more than one such an argument in a helper;
* provided pointers have to be aligned.

It's easy to introduce similar ARG_PTR_TO_CHAR and ARG_PTR_TO_SHORT
argument types. It's not done due to lack of use-case though.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
---
 include/linux/bpf.h   |  2 ++
 kernel/bpf/verifier.c | 29 +++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index a445194b5fb6..986f89d9de63 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -195,6 +195,8 @@ enum bpf_arg_type {
 	ARG_ANYTHING,		/* any (initialized) argument is ok */
 	ARG_PTR_TO_SPIN_LOCK,	/* pointer to bpf_spin_lock */
 	ARG_PTR_TO_SOCK_COMMON,	/* pointer to sock_common */
+	ARG_PTR_TO_INT,		/* pointer to int */
+	ARG_PTR_TO_LONG,	/* pointer to long */
 };
 
 /* type of values returned from helper functions */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 56baf21b57c2..6ff5c0e048b1 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2433,6 +2433,22 @@ static bool arg_type_is_mem_size(enum bpf_arg_type type)
 	       type == ARG_CONST_SIZE_OR_ZERO;
 }
 
+static bool arg_type_is_int_ptr(enum bpf_arg_type type)
+{
+	return type == ARG_PTR_TO_INT ||
+	       type == ARG_PTR_TO_LONG;
+}
+
+static int int_ptr_type_to_size(enum bpf_arg_type type)
+{
+	if (type == ARG_PTR_TO_INT)
+		return sizeof(u32);
+	else if (type == ARG_PTR_TO_LONG)
+		return sizeof(u64);
+
+	return -EINVAL;
+}
+
 static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 			  enum bpf_arg_type arg_type,
 			  struct bpf_call_arg_meta *meta)
@@ -2525,6 +2541,12 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 			 type != expected_type)
 			goto err_type;
 		meta->raw_mode = arg_type == ARG_PTR_TO_UNINIT_MEM;
+	} else if (arg_type_is_int_ptr(arg_type)) {
+		expected_type = PTR_TO_STACK;
+		if (!type_is_pkt_pointer(type) &&
+		    type != PTR_TO_MAP_VALUE &&
+		    type != expected_type)
+			goto err_type;
 	} else {
 		verbose(env, "unsupported arg_type %d\n", arg_type);
 		return -EFAULT;
@@ -2606,6 +2628,13 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 		err = check_helper_mem_access(env, regno - 1,
 					      reg->umax_value,
 					      zero_size_allowed, meta);
+	} else if (arg_type_is_int_ptr(arg_type)) {
+		int size = int_ptr_type_to_size(arg_type);
+
+		err = check_helper_mem_access(env, regno, size, false, meta);
+		if (err)
+			return err;
+		err = check_ptr_alignment(env, reg, 0, size, true);
 	}
 
 	return err;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 bpf-next 16/21] bpf: Introduce bpf_strtol and bpf_strtoul helpers
  2019-04-05 19:35 [PATCH v3 bpf-next 00/21] bpf: Sysctl hook Andrey Ignatov
                   ` (14 preceding siblings ...)
  2019-04-05 19:35 ` [PATCH v3 bpf-next 15/21] bpf: Introduce ARG_PTR_TO_{INT,LONG} arg types Andrey Ignatov
@ 2019-04-05 19:35 ` Andrey Ignatov
  2019-04-05 19:35 ` [PATCH v3 bpf-next 17/21] bpf: Sync bpf.h to tools/ Andrey Ignatov
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 33+ messages in thread
From: Andrey Ignatov @ 2019-04-05 19:35 UTC (permalink / raw)
  To: netdev
  Cc: Andrey Ignatov, ast, daniel, guro, kernel-team, Luis Chamberlain,
	Kees Cook, Alexey Dobriyan, linux-kernel, linux-fsdevel

Add bpf_strtol and bpf_strtoul to convert a string to long and unsigned
long correspondingly. It's similar to user space strtol(3) and
strtoul(3) with a few changes to the API:

* instead of NUL-terminated C string the helpers expect buffer and
  buffer length;

* resulting long or unsigned long is returned in a separate
  result-argument;

* return value is used to indicate success or failure, on success number
  of consumed bytes is returned that can be used to identify position to
  read next if the buffer is expected to contain multiple integers;

* instead of *base* argument, *flags* is used that provides base in 5
  LSB, other bits are reserved for future use;

* number of supported bases is limited.

Documentation for the new helpers is provided in bpf.h UAPI.

The helpers are made available to BPF_PROG_TYPE_CGROUP_SYSCTL programs to
be able to convert string input to e.g. "ulongvec" output.

E.g. "net/ipv4/tcp_mem" consists of three ulong integers. They can be
parsed by calling to bpf_strtoul three times.

Implementation notes:

Implementation includes "../../lib/kstrtox.h" to reuse integer parsing
functions. It's done exactly same way as fs/proc/base.c already does.

Unfortunately existing kstrtoX function can't be used directly since
they fail if any invalid character is present right after integer in the
string. Existing simple_strtoX functions can't be used either since
they're obsolete and don't handle overflow properly.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
---
 include/linux/bpf.h      |   2 +
 include/uapi/linux/bpf.h |  51 ++++++++++++++-
 kernel/bpf/cgroup.c      |   4 ++
 kernel/bpf/helpers.c     | 131 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 187 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 986f89d9de63..d38cc6215df7 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -933,6 +933,8 @@ extern const struct bpf_func_proto bpf_sk_redirect_map_proto;
 extern const struct bpf_func_proto bpf_spin_lock_proto;
 extern const struct bpf_func_proto bpf_spin_unlock_proto;
 extern const struct bpf_func_proto bpf_get_local_storage_proto;
+extern const struct bpf_func_proto bpf_strtol_proto;
+extern const struct bpf_func_proto bpf_strtoul_proto;
 
 /* Shared helpers among cBPF and eBPF. */
 void bpf_user_rnd_init_once(void);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 01faa286bb3c..6611f5ce276c 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2548,6 +2548,53 @@ union bpf_attr {
  *		**-E2BIG** if the *buf_len* is too big.
  *
  *		**-EINVAL** if sysctl is being read.
+ *
+ * int bpf_strtol(const char *buf, size_t buf_len, u64 flags, long *res)
+ *	Description
+ *		Convert the initial part of the string from buffer *buf* of
+ *		size *buf_len* to a long integer according to the given base
+ *		and save the result in *res*.
+ *
+ *		The string may begin with an arbitrary amount of white space
+ *		(as determined by isspace(3)) followed by a single optional '-'
+ *		sign.
+ *
+ *		Five least significant bits of *flags* encode base, other bits
+ *		are currently unused.
+ *
+ *		Base must be either 8, 10, 16 or 0 to detect it automatically
+ *		similar to user space strtol(3).
+ *	Return
+ *		Number of characters consumed on success. Must be positive but
+ *		no more than buf_len.
+ *
+ *		**-EINVAL** if no valid digits were found or unsupported base
+ *		was provided.
+ *
+ *		**-ERANGE** if resulting value was out of range.
+ *
+ * int bpf_strtoul(const char *buf, size_t buf_len, u64 flags, unsigned long *res)
+ *	Description
+ *		Convert the initial part of the string from buffer *buf* of
+ *		size *buf_len* to an unsigned long integer according to the
+ *		given base and save the result in *res*.
+ *
+ *		The string may begin with an arbitrary amount of white space
+ *		(as determined by isspace(3)).
+ *
+ *		Five least significant bits of *flags* encode base, other bits
+ *		are currently unused.
+ *
+ *		Base must be either 8, 10, 16 or 0 to detect it automatically
+ *		similar to user space strtoul(3).
+ *	Return
+ *		Number of characters consumed on success. Must be positive but
+ *		no more than buf_len.
+ *
+ *		**-EINVAL** if no valid digits were found or unsupported base
+ *		was provided.
+ *
+ *		**-ERANGE** if resulting value was out of range.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2654,7 +2701,9 @@ union bpf_attr {
 	FN(sysctl_get_name),		\
 	FN(sysctl_get_current_value),	\
 	FN(sysctl_get_new_value),	\
-	FN(sysctl_set_new_value),
+	FN(sysctl_set_new_value),	\
+	FN(strtol),			\
+	FN(strtoul),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index b2adf22139b3..789d4ab2336e 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1016,6 +1016,10 @@ static const struct bpf_func_proto *
 sysctl_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
 	switch (func_id) {
+	case BPF_FUNC_strtol:
+		return &bpf_strtol_proto;
+	case BPF_FUNC_strtoul:
+		return &bpf_strtoul_proto;
 	case BPF_FUNC_sysctl_get_name:
 		return &bpf_sysctl_get_name_proto;
 	case BPF_FUNC_sysctl_get_current_value:
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index a411fc17d265..4266ffde07ca 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -18,6 +18,9 @@
 #include <linux/sched.h>
 #include <linux/uidgid.h>
 #include <linux/filter.h>
+#include <linux/ctype.h>
+
+#include "../../lib/kstrtox.h"
 
 /* If kernel subsystem is allowing eBPF programs to call this function,
  * inside its own verifier_ops->get_func_proto() callback it should return
@@ -363,4 +366,132 @@ const struct bpf_func_proto bpf_get_local_storage_proto = {
 	.arg2_type	= ARG_ANYTHING,
 };
 #endif
+
+#define BPF_STRTOX_BASE_MASK 0x1F
+
+static int __bpf_strtoull(const char *buf, size_t buf_len, u64 flags,
+			  unsigned long long *res, bool *is_negative)
+{
+	unsigned int base = flags & BPF_STRTOX_BASE_MASK;
+	const char *cur_buf = buf;
+	size_t cur_len = buf_len;
+	unsigned int consumed;
+	size_t val_len;
+	char str[64];
+
+	if (!buf || !buf_len || !res || !is_negative)
+		return -EINVAL;
+
+	if (base != 0 && base != 8 && base != 10 && base != 16)
+		return -EINVAL;
+
+	if (flags & ~BPF_STRTOX_BASE_MASK)
+		return -EINVAL;
+
+	while (cur_buf < buf + buf_len && isspace(*cur_buf))
+		++cur_buf;
+
+	*is_negative = (cur_buf < buf + buf_len && *cur_buf == '-');
+	if (*is_negative)
+		++cur_buf;
+
+	consumed = cur_buf - buf;
+	cur_len -= consumed;
+	if (!cur_len)
+		return -EINVAL;
+
+	cur_len = min(cur_len, sizeof(str) - 1);
+	memcpy(str, cur_buf, cur_len);
+	str[cur_len] = '\0';
+	cur_buf = str;
+
+	cur_buf = _parse_integer_fixup_radix(cur_buf, &base);
+	val_len = _parse_integer(cur_buf, base, res);
+
+	if (val_len & KSTRTOX_OVERFLOW)
+		return -ERANGE;
+
+	if (val_len == 0)
+		return -EINVAL;
+
+	cur_buf += val_len;
+	consumed += cur_buf - str;
+
+	return consumed;
+}
+
+static int __bpf_strtoll(const char *buf, size_t buf_len, u64 flags,
+			 long long *res)
+{
+	unsigned long long _res;
+	bool is_negative;
+	int err;
+
+	err = __bpf_strtoull(buf, buf_len, flags, &_res, &is_negative);
+	if (err < 0)
+		return err;
+	if (is_negative) {
+		if ((long long)-_res > 0)
+			return -ERANGE;
+		*res = -_res;
+	} else {
+		if ((long long)_res < 0)
+			return -ERANGE;
+		*res = _res;
+	}
+	return err;
+}
+
+BPF_CALL_4(bpf_strtol, const char *, buf, size_t, buf_len, u64, flags,
+	   long *, res)
+{
+	long long _res;
+	int err;
+
+	err = __bpf_strtoll(buf, buf_len, flags, &_res);
+	if (err < 0)
+		return err;
+	if (_res != (long)_res)
+		return -ERANGE;
+	*res = _res;
+	return err;
+}
+
+const struct bpf_func_proto bpf_strtol_proto = {
+	.func		= bpf_strtol,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_MEM,
+	.arg2_type	= ARG_CONST_SIZE,
+	.arg3_type	= ARG_ANYTHING,
+	.arg4_type	= ARG_PTR_TO_LONG,
+};
+
+BPF_CALL_4(bpf_strtoul, const char *, buf, size_t, buf_len, u64, flags,
+	   unsigned long *, res)
+{
+	unsigned long long _res;
+	bool is_negative;
+	int err;
+
+	err = __bpf_strtoull(buf, buf_len, flags, &_res, &is_negative);
+	if (err < 0)
+		return err;
+	if (is_negative)
+		return -EINVAL;
+	if (_res != (unsigned long)_res)
+		return -ERANGE;
+	*res = _res;
+	return err;
+}
+
+const struct bpf_func_proto bpf_strtoul_proto = {
+	.func		= bpf_strtoul,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_MEM,
+	.arg2_type	= ARG_CONST_SIZE,
+	.arg3_type	= ARG_ANYTHING,
+	.arg4_type	= ARG_PTR_TO_LONG,
+};
 #endif
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 bpf-next 17/21] bpf: Sync bpf.h to tools/
  2019-04-05 19:35 [PATCH v3 bpf-next 00/21] bpf: Sysctl hook Andrey Ignatov
                   ` (15 preceding siblings ...)
  2019-04-05 19:35 ` [PATCH v3 bpf-next 16/21] bpf: Introduce bpf_strtol and bpf_strtoul helpers Andrey Ignatov
@ 2019-04-05 19:35 ` Andrey Ignatov
  2019-04-05 19:35 ` [PATCH v3 bpf-next 18/21] selftests/bpf: Add sysctl and strtoX helpers to bpf_helpers.h Andrey Ignatov
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 33+ messages in thread
From: Andrey Ignatov @ 2019-04-05 19:35 UTC (permalink / raw)
  To: netdev
  Cc: Andrey Ignatov, ast, daniel, guro, kernel-team, Luis Chamberlain,
	Kees Cook, Alexey Dobriyan, linux-kernel, linux-fsdevel

Sync bpf_strtoX related bpf UAPI changes to tools/.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
---
 tools/include/uapi/linux/bpf.h | 51 +++++++++++++++++++++++++++++++++-
 1 file changed, 50 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 01faa286bb3c..6611f5ce276c 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2548,6 +2548,53 @@ union bpf_attr {
  *		**-E2BIG** if the *buf_len* is too big.
  *
  *		**-EINVAL** if sysctl is being read.
+ *
+ * int bpf_strtol(const char *buf, size_t buf_len, u64 flags, long *res)
+ *	Description
+ *		Convert the initial part of the string from buffer *buf* of
+ *		size *buf_len* to a long integer according to the given base
+ *		and save the result in *res*.
+ *
+ *		The string may begin with an arbitrary amount of white space
+ *		(as determined by isspace(3)) followed by a single optional '-'
+ *		sign.
+ *
+ *		Five least significant bits of *flags* encode base, other bits
+ *		are currently unused.
+ *
+ *		Base must be either 8, 10, 16 or 0 to detect it automatically
+ *		similar to user space strtol(3).
+ *	Return
+ *		Number of characters consumed on success. Must be positive but
+ *		no more than buf_len.
+ *
+ *		**-EINVAL** if no valid digits were found or unsupported base
+ *		was provided.
+ *
+ *		**-ERANGE** if resulting value was out of range.
+ *
+ * int bpf_strtoul(const char *buf, size_t buf_len, u64 flags, unsigned long *res)
+ *	Description
+ *		Convert the initial part of the string from buffer *buf* of
+ *		size *buf_len* to an unsigned long integer according to the
+ *		given base and save the result in *res*.
+ *
+ *		The string may begin with an arbitrary amount of white space
+ *		(as determined by isspace(3)).
+ *
+ *		Five least significant bits of *flags* encode base, other bits
+ *		are currently unused.
+ *
+ *		Base must be either 8, 10, 16 or 0 to detect it automatically
+ *		similar to user space strtoul(3).
+ *	Return
+ *		Number of characters consumed on success. Must be positive but
+ *		no more than buf_len.
+ *
+ *		**-EINVAL** if no valid digits were found or unsupported base
+ *		was provided.
+ *
+ *		**-ERANGE** if resulting value was out of range.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2654,7 +2701,9 @@ union bpf_attr {
 	FN(sysctl_get_name),		\
 	FN(sysctl_get_current_value),	\
 	FN(sysctl_get_new_value),	\
-	FN(sysctl_set_new_value),
+	FN(sysctl_set_new_value),	\
+	FN(strtol),			\
+	FN(strtoul),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 bpf-next 18/21] selftests/bpf: Add sysctl and strtoX helpers to bpf_helpers.h
  2019-04-05 19:35 [PATCH v3 bpf-next 00/21] bpf: Sysctl hook Andrey Ignatov
                   ` (16 preceding siblings ...)
  2019-04-05 19:35 ` [PATCH v3 bpf-next 17/21] bpf: Sync bpf.h to tools/ Andrey Ignatov
@ 2019-04-05 19:35 ` Andrey Ignatov
  2019-04-05 19:35 ` [PATCH v3 bpf-next 19/21] selftests/bpf: Test ARG_PTR_TO_LONG arg type Andrey Ignatov
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 33+ messages in thread
From: Andrey Ignatov @ 2019-04-05 19:35 UTC (permalink / raw)
  To: netdev
  Cc: Andrey Ignatov, ast, daniel, guro, kernel-team, Luis Chamberlain,
	Kees Cook, Alexey Dobriyan, linux-kernel, linux-fsdevel

Add bpf_sysctl_* and bpf_strtoX helpers to bpf_helpers.h.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
---
 tools/testing/selftests/bpf/bpf_helpers.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 97d140961438..1082f3c0e7df 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -192,6 +192,25 @@ static int (*bpf_skb_ecn_set_ce)(void *ctx) =
 static int (*bpf_tcp_check_syncookie)(struct bpf_sock *sk,
 	    void *ip, int ip_len, void *tcp, int tcp_len) =
 	(void *) BPF_FUNC_tcp_check_syncookie;
+static int (*bpf_sysctl_get_name)(void *ctx, char *buf,
+				  unsigned long long buf_len,
+				  unsigned long long flags) =
+	(void *) BPF_FUNC_sysctl_get_name;
+static int (*bpf_sysctl_get_current_value)(void *ctx, char *buf,
+					   unsigned long long buf_len) =
+	(void *) BPF_FUNC_sysctl_get_current_value;
+static int (*bpf_sysctl_get_new_value)(void *ctx, char *buf,
+				       unsigned long long buf_len) =
+	(void *) BPF_FUNC_sysctl_get_new_value;
+static int (*bpf_sysctl_set_new_value)(void *ctx, const char *buf,
+				       unsigned long long buf_len) =
+	(void *) BPF_FUNC_sysctl_set_new_value;
+static int (*bpf_strtol)(const char *buf, unsigned long long buf_len,
+			 unsigned long long flags, long *res) =
+	(void *) BPF_FUNC_strtol;
+static int (*bpf_strtoul)(const char *buf, unsigned long long buf_len,
+			  unsigned long long flags, unsigned long *res) =
+	(void *) BPF_FUNC_strtoul;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 bpf-next 19/21] selftests/bpf: Test ARG_PTR_TO_LONG arg type
  2019-04-05 19:35 [PATCH v3 bpf-next 00/21] bpf: Sysctl hook Andrey Ignatov
                   ` (17 preceding siblings ...)
  2019-04-05 19:35 ` [PATCH v3 bpf-next 18/21] selftests/bpf: Add sysctl and strtoX helpers to bpf_helpers.h Andrey Ignatov
@ 2019-04-05 19:35 ` Andrey Ignatov
  2019-04-05 19:35 ` [PATCH v3 bpf-next 20/21] selftests/bpf: Test bpf_strtol and bpf_strtoul helpers Andrey Ignatov
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 33+ messages in thread
From: Andrey Ignatov @ 2019-04-05 19:35 UTC (permalink / raw)
  To: netdev
  Cc: Andrey Ignatov, ast, daniel, guro, kernel-team, Luis Chamberlain,
	Kees Cook, Alexey Dobriyan, linux-kernel, linux-fsdevel

Test that verifier handles new argument types properly, including
uninitialized or partially initialized value, misaligned stack access,
etc.

Example of output:
  #456/p ARG_PTR_TO_LONG uninitialized OK
  #457/p ARG_PTR_TO_LONG half-uninitialized OK
  #458/p ARG_PTR_TO_LONG misaligned OK
  #459/p ARG_PTR_TO_LONG size < sizeof(long) OK
  #460/p ARG_PTR_TO_LONG initialized OK

Signed-off-by: Andrey Ignatov <rdna@fb.com>
---
 .../testing/selftests/bpf/verifier/int_ptr.c  | 160 ++++++++++++++++++
 1 file changed, 160 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/verifier/int_ptr.c

diff --git a/tools/testing/selftests/bpf/verifier/int_ptr.c b/tools/testing/selftests/bpf/verifier/int_ptr.c
new file mode 100644
index 000000000000..ca3b4729df66
--- /dev/null
+++ b/tools/testing/selftests/bpf/verifier/int_ptr.c
@@ -0,0 +1,160 @@
+{
+	"ARG_PTR_TO_LONG uninitialized",
+	.insns = {
+		/* bpf_strtoul arg1 (buf) */
+		BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+		BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+		BPF_MOV64_IMM(BPF_REG_0, 0x00303036),
+		BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+
+		BPF_MOV64_REG(BPF_REG_1, BPF_REG_7),
+
+		/* bpf_strtoul arg2 (buf_len) */
+		BPF_MOV64_IMM(BPF_REG_2, 4),
+
+		/* bpf_strtoul arg3 (flags) */
+		BPF_MOV64_IMM(BPF_REG_3, 0),
+
+		/* bpf_strtoul arg4 (res) */
+		BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+		BPF_MOV64_REG(BPF_REG_4, BPF_REG_7),
+
+		/* bpf_strtoul() */
+		BPF_EMIT_CALL(BPF_FUNC_strtoul),
+
+		BPF_MOV64_IMM(BPF_REG_0, 1),
+		BPF_EXIT_INSN(),
+	},
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_CGROUP_SYSCTL,
+	.errstr = "invalid indirect read from stack off -16+0 size 8",
+},
+{
+	"ARG_PTR_TO_LONG half-uninitialized",
+	.insns = {
+		/* bpf_strtoul arg1 (buf) */
+		BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+		BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+		BPF_MOV64_IMM(BPF_REG_0, 0x00303036),
+		BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+
+		BPF_MOV64_REG(BPF_REG_1, BPF_REG_7),
+
+		/* bpf_strtoul arg2 (buf_len) */
+		BPF_MOV64_IMM(BPF_REG_2, 4),
+
+		/* bpf_strtoul arg3 (flags) */
+		BPF_MOV64_IMM(BPF_REG_3, 0),
+
+		/* bpf_strtoul arg4 (res) */
+		BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+		BPF_STX_MEM(BPF_W, BPF_REG_7, BPF_REG_0, 0),
+		BPF_MOV64_REG(BPF_REG_4, BPF_REG_7),
+
+		/* bpf_strtoul() */
+		BPF_EMIT_CALL(BPF_FUNC_strtoul),
+
+		BPF_MOV64_IMM(BPF_REG_0, 1),
+		BPF_EXIT_INSN(),
+	},
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_CGROUP_SYSCTL,
+	.errstr = "invalid indirect read from stack off -16+4 size 8",
+},
+{
+	"ARG_PTR_TO_LONG misaligned",
+	.insns = {
+		/* bpf_strtoul arg1 (buf) */
+		BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+		BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+		BPF_MOV64_IMM(BPF_REG_0, 0x00303036),
+		BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+
+		BPF_MOV64_REG(BPF_REG_1, BPF_REG_7),
+
+		/* bpf_strtoul arg2 (buf_len) */
+		BPF_MOV64_IMM(BPF_REG_2, 4),
+
+		/* bpf_strtoul arg3 (flags) */
+		BPF_MOV64_IMM(BPF_REG_3, 0),
+
+		/* bpf_strtoul arg4 (res) */
+		BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -12),
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_STX_MEM(BPF_W, BPF_REG_7, BPF_REG_0, 0),
+		BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 4),
+		BPF_MOV64_REG(BPF_REG_4, BPF_REG_7),
+
+		/* bpf_strtoul() */
+		BPF_EMIT_CALL(BPF_FUNC_strtoul),
+
+		BPF_MOV64_IMM(BPF_REG_0, 1),
+		BPF_EXIT_INSN(),
+	},
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_CGROUP_SYSCTL,
+	.errstr = "misaligned stack access off (0x0; 0x0)+-20+0 size 8",
+},
+{
+	"ARG_PTR_TO_LONG size < sizeof(long)",
+	.insns = {
+		/* bpf_strtoul arg1 (buf) */
+		BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+		BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -16),
+		BPF_MOV64_IMM(BPF_REG_0, 0x00303036),
+		BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+
+		BPF_MOV64_REG(BPF_REG_1, BPF_REG_7),
+
+		/* bpf_strtoul arg2 (buf_len) */
+		BPF_MOV64_IMM(BPF_REG_2, 4),
+
+		/* bpf_strtoul arg3 (flags) */
+		BPF_MOV64_IMM(BPF_REG_3, 0),
+
+		/* bpf_strtoul arg4 (res) */
+		BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, 12),
+		BPF_STX_MEM(BPF_W, BPF_REG_7, BPF_REG_0, 0),
+		BPF_MOV64_REG(BPF_REG_4, BPF_REG_7),
+
+		/* bpf_strtoul() */
+		BPF_EMIT_CALL(BPF_FUNC_strtoul),
+
+		BPF_MOV64_IMM(BPF_REG_0, 1),
+		BPF_EXIT_INSN(),
+	},
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_CGROUP_SYSCTL,
+	.errstr = "invalid stack type R4 off=-4 access_size=8",
+},
+{
+	"ARG_PTR_TO_LONG initialized",
+	.insns = {
+		/* bpf_strtoul arg1 (buf) */
+		BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+		BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+		BPF_MOV64_IMM(BPF_REG_0, 0x00303036),
+		BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+
+		BPF_MOV64_REG(BPF_REG_1, BPF_REG_7),
+
+		/* bpf_strtoul arg2 (buf_len) */
+		BPF_MOV64_IMM(BPF_REG_2, 4),
+
+		/* bpf_strtoul arg3 (flags) */
+		BPF_MOV64_IMM(BPF_REG_3, 0),
+
+		/* bpf_strtoul arg4 (res) */
+		BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+		BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+		BPF_MOV64_REG(BPF_REG_4, BPF_REG_7),
+
+		/* bpf_strtoul() */
+		BPF_EMIT_CALL(BPF_FUNC_strtoul),
+
+		BPF_MOV64_IMM(BPF_REG_0, 1),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_CGROUP_SYSCTL,
+},
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 bpf-next 20/21] selftests/bpf: Test bpf_strtol and bpf_strtoul helpers
  2019-04-05 19:35 [PATCH v3 bpf-next 00/21] bpf: Sysctl hook Andrey Ignatov
                   ` (18 preceding siblings ...)
  2019-04-05 19:35 ` [PATCH v3 bpf-next 19/21] selftests/bpf: Test ARG_PTR_TO_LONG arg type Andrey Ignatov
@ 2019-04-05 19:35 ` Andrey Ignatov
  2019-04-05 19:35 ` [PATCH v3 bpf-next 21/21] selftests/bpf: C based test for sysctl and strtoX Andrey Ignatov
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 33+ messages in thread
From: Andrey Ignatov @ 2019-04-05 19:35 UTC (permalink / raw)
  To: netdev
  Cc: Andrey Ignatov, ast, daniel, guro, kernel-team, Luis Chamberlain,
	Kees Cook, Alexey Dobriyan, linux-kernel, linux-fsdevel

Test that bpf_strtol and  bpf_strtoul helpers can be used to convert
provided buffer to long or unsigned long correspondingly and return both
correct result and number of consumed bytes, or proper errno.

Example of output:
  # ./test_sysctl
  ..
  Test case: bpf_strtoul one number string .. [PASS]
  Test case: bpf_strtoul multi number string .. [PASS]
  Test case: bpf_strtoul buf_len = 0, reject .. [PASS]
  Test case: bpf_strtoul supported base, ok .. [PASS]
  Test case: bpf_strtoul unsupported base, EINVAL .. [PASS]
  Test case: bpf_strtoul buf with spaces only, EINVAL .. [PASS]
  Test case: bpf_strtoul negative number, EINVAL .. [PASS]
  Test case: bpf_strtol negative number, ok .. [PASS]
  Test case: bpf_strtol hex number, ok .. [PASS]
  Test case: bpf_strtol max long .. [PASS]
  Test case: bpf_strtol overflow, ERANGE .. [PASS]
  Summary: 36 PASSED, 0 FAILED

Signed-off-by: Andrey Ignatov <rdna@fb.com>
---
 tools/testing/selftests/bpf/test_sysctl.c | 485 ++++++++++++++++++++++
 1 file changed, 485 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_sysctl.c b/tools/testing/selftests/bpf/test_sysctl.c
index 43008aae32d3..885675480af9 100644
--- a/tools/testing/selftests/bpf/test_sysctl.c
+++ b/tools/testing/selftests/bpf/test_sysctl.c
@@ -817,6 +817,491 @@ static struct sysctl_test tests[] = {
 		.newval = "606",
 		.result = SUCCESS,
 	},
+	{
+		"bpf_strtoul one number string",
+		.insns = {
+			/* arg1 (buf) */
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+			BPF_MOV64_IMM(BPF_REG_0, 0x00303036),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_7),
+
+			/* arg2 (buf_len) */
+			BPF_MOV64_IMM(BPF_REG_2, 4),
+
+			/* arg3 (flags) */
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+
+			/* arg4 (res) */
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+			BPF_MOV64_REG(BPF_REG_4, BPF_REG_7),
+
+			BPF_EMIT_CALL(BPF_FUNC_strtoul),
+
+			/* if (ret == expected && */
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 3, 4),
+			/*     res == expected) */
+			BPF_LDX_MEM(BPF_DW, BPF_REG_9, BPF_REG_7, 0),
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_9, 600, 2),
+
+			/* return ALLOW; */
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_JMP_A(1),
+
+			/* else return DENY; */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "net/ipv4/route/mtu_expires",
+		.open_flags = O_RDONLY,
+		.result = SUCCESS,
+	},
+	{
+		"bpf_strtoul multi number string",
+		.insns = {
+			/* arg1 (buf) */
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+			/* "600 602\0" */
+			BPF_LD_IMM64(BPF_REG_0, 0x0032303620303036ULL),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_7),
+
+			/* arg2 (buf_len) */
+			BPF_MOV64_IMM(BPF_REG_2, 8),
+
+			/* arg3 (flags) */
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+
+			/* arg4 (res) */
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+			BPF_MOV64_REG(BPF_REG_4, BPF_REG_7),
+
+			BPF_EMIT_CALL(BPF_FUNC_strtoul),
+
+			/* if (ret == expected && */
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 3, 18),
+			/*     res == expected) */
+			BPF_LDX_MEM(BPF_DW, BPF_REG_9, BPF_REG_7, 0),
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_9, 600, 16),
+
+			/*     arg1 (buf) */
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+			BPF_ALU64_REG(BPF_ADD, BPF_REG_7, BPF_REG_0),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_7),
+
+			/*     arg2 (buf_len) */
+			BPF_MOV64_IMM(BPF_REG_2, 8),
+			BPF_ALU64_REG(BPF_SUB, BPF_REG_2, BPF_REG_0),
+
+			/*     arg3 (flags) */
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+
+			/*     arg4 (res) */
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -16),
+			BPF_MOV64_REG(BPF_REG_4, BPF_REG_7),
+
+			BPF_EMIT_CALL(BPF_FUNC_strtoul),
+
+			/*     if (ret == expected && */
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 4, 4),
+			/*         res == expected) */
+			BPF_LDX_MEM(BPF_DW, BPF_REG_9, BPF_REG_7, 0),
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_9, 602, 2),
+
+			/* return ALLOW; */
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_JMP_A(1),
+
+			/* else return DENY; */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "net/ipv4/tcp_mem",
+		.open_flags = O_RDONLY,
+		.result = SUCCESS,
+	},
+	{
+		"bpf_strtoul buf_len = 0, reject",
+		.insns = {
+			/* arg1 (buf) */
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+			BPF_MOV64_IMM(BPF_REG_0, 0x00303036),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_7),
+
+			/* arg2 (buf_len) */
+			BPF_MOV64_IMM(BPF_REG_2, 0),
+
+			/* arg3 (flags) */
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+
+			/* arg4 (res) */
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+			BPF_MOV64_REG(BPF_REG_4, BPF_REG_7),
+
+			BPF_EMIT_CALL(BPF_FUNC_strtoul),
+
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "net/ipv4/route/mtu_expires",
+		.open_flags = O_RDONLY,
+		.result = LOAD_REJECT,
+	},
+	{
+		"bpf_strtoul supported base, ok",
+		.insns = {
+			/* arg1 (buf) */
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+			BPF_MOV64_IMM(BPF_REG_0, 0x00373730),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_7),
+
+			/* arg2 (buf_len) */
+			BPF_MOV64_IMM(BPF_REG_2, 4),
+
+			/* arg3 (flags) */
+			BPF_MOV64_IMM(BPF_REG_3, 8),
+
+			/* arg4 (res) */
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+			BPF_MOV64_REG(BPF_REG_4, BPF_REG_7),
+
+			BPF_EMIT_CALL(BPF_FUNC_strtoul),
+
+			/* if (ret == expected && */
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 3, 4),
+			/*     res == expected) */
+			BPF_LDX_MEM(BPF_DW, BPF_REG_9, BPF_REG_7, 0),
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_9, 63, 2),
+
+			/* return ALLOW; */
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_JMP_A(1),
+
+			/* else return DENY; */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "net/ipv4/route/mtu_expires",
+		.open_flags = O_RDONLY,
+		.result = SUCCESS,
+	},
+	{
+		"bpf_strtoul unsupported base, EINVAL",
+		.insns = {
+			/* arg1 (buf) */
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+			BPF_MOV64_IMM(BPF_REG_0, 0x00303036),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_7),
+
+			/* arg2 (buf_len) */
+			BPF_MOV64_IMM(BPF_REG_2, 4),
+
+			/* arg3 (flags) */
+			BPF_MOV64_IMM(BPF_REG_3, 3),
+
+			/* arg4 (res) */
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+			BPF_MOV64_REG(BPF_REG_4, BPF_REG_7),
+
+			BPF_EMIT_CALL(BPF_FUNC_strtoul),
+
+			/* if (ret == expected) */
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, -EINVAL, 2),
+
+			/* return ALLOW; */
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_JMP_A(1),
+
+			/* else return DENY; */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "net/ipv4/route/mtu_expires",
+		.open_flags = O_RDONLY,
+		.result = SUCCESS,
+	},
+	{
+		"bpf_strtoul buf with spaces only, EINVAL",
+		.insns = {
+			/* arg1 (buf) */
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+			BPF_MOV64_IMM(BPF_REG_0, 0x090a0c0d),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_7),
+
+			/* arg2 (buf_len) */
+			BPF_MOV64_IMM(BPF_REG_2, 4),
+
+			/* arg3 (flags) */
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+
+			/* arg4 (res) */
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+			BPF_MOV64_REG(BPF_REG_4, BPF_REG_7),
+
+			BPF_EMIT_CALL(BPF_FUNC_strtoul),
+
+			/* if (ret == expected) */
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, -EINVAL, 2),
+
+			/* return ALLOW; */
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_JMP_A(1),
+
+			/* else return DENY; */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "net/ipv4/route/mtu_expires",
+		.open_flags = O_RDONLY,
+		.result = SUCCESS,
+	},
+	{
+		"bpf_strtoul negative number, EINVAL",
+		.insns = {
+			/* arg1 (buf) */
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+			BPF_MOV64_IMM(BPF_REG_0, 0x00362d0a), /* " -6\0" */
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_7),
+
+			/* arg2 (buf_len) */
+			BPF_MOV64_IMM(BPF_REG_2, 4),
+
+			/* arg3 (flags) */
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+
+			/* arg4 (res) */
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+			BPF_MOV64_REG(BPF_REG_4, BPF_REG_7),
+
+			BPF_EMIT_CALL(BPF_FUNC_strtoul),
+
+			/* if (ret == expected) */
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, -EINVAL, 2),
+
+			/* return ALLOW; */
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_JMP_A(1),
+
+			/* else return DENY; */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "net/ipv4/route/mtu_expires",
+		.open_flags = O_RDONLY,
+		.result = SUCCESS,
+	},
+	{
+		"bpf_strtol negative number, ok",
+		.insns = {
+			/* arg1 (buf) */
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+			BPF_MOV64_IMM(BPF_REG_0, 0x00362d0a), /* " -6\0" */
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_7),
+
+			/* arg2 (buf_len) */
+			BPF_MOV64_IMM(BPF_REG_2, 4),
+
+			/* arg3 (flags) */
+			BPF_MOV64_IMM(BPF_REG_3, 10),
+
+			/* arg4 (res) */
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+			BPF_MOV64_REG(BPF_REG_4, BPF_REG_7),
+
+			BPF_EMIT_CALL(BPF_FUNC_strtol),
+
+			/* if (ret == expected && */
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 3, 4),
+			/*     res == expected) */
+			BPF_LDX_MEM(BPF_DW, BPF_REG_9, BPF_REG_7, 0),
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_9, -6, 2),
+
+			/* return ALLOW; */
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_JMP_A(1),
+
+			/* else return DENY; */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "net/ipv4/route/mtu_expires",
+		.open_flags = O_RDONLY,
+		.result = SUCCESS,
+	},
+	{
+		"bpf_strtol hex number, ok",
+		.insns = {
+			/* arg1 (buf) */
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+			BPF_MOV64_IMM(BPF_REG_0, 0x65667830), /* "0xfe" */
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_7),
+
+			/* arg2 (buf_len) */
+			BPF_MOV64_IMM(BPF_REG_2, 4),
+
+			/* arg3 (flags) */
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+
+			/* arg4 (res) */
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+			BPF_MOV64_REG(BPF_REG_4, BPF_REG_7),
+
+			BPF_EMIT_CALL(BPF_FUNC_strtol),
+
+			/* if (ret == expected && */
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 4, 4),
+			/*     res == expected) */
+			BPF_LDX_MEM(BPF_DW, BPF_REG_9, BPF_REG_7, 0),
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_9, 254, 2),
+
+			/* return ALLOW; */
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_JMP_A(1),
+
+			/* else return DENY; */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "net/ipv4/route/mtu_expires",
+		.open_flags = O_RDONLY,
+		.result = SUCCESS,
+	},
+	{
+		"bpf_strtol max long",
+		.insns = {
+			/* arg1 (buf) 9223372036854775807 */
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -24),
+			BPF_LD_IMM64(BPF_REG_0, 0x3032373333323239ULL),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+			BPF_LD_IMM64(BPF_REG_0, 0x3537373435383633ULL),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 8),
+			BPF_LD_IMM64(BPF_REG_0, 0x0000000000373038ULL),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 16),
+
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_7),
+
+			/* arg2 (buf_len) */
+			BPF_MOV64_IMM(BPF_REG_2, 19),
+
+			/* arg3 (flags) */
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+
+			/* arg4 (res) */
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+			BPF_MOV64_REG(BPF_REG_4, BPF_REG_7),
+
+			BPF_EMIT_CALL(BPF_FUNC_strtol),
+
+			/* if (ret == expected && */
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 19, 6),
+			/*     res == expected) */
+			BPF_LD_IMM64(BPF_REG_8, 0x7fffffffffffffffULL),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_9, BPF_REG_7, 0),
+			BPF_JMP_REG(BPF_JNE, BPF_REG_8, BPF_REG_9, 2),
+
+			/* return ALLOW; */
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_JMP_A(1),
+
+			/* else return DENY; */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "net/ipv4/route/mtu_expires",
+		.open_flags = O_RDONLY,
+		.result = SUCCESS,
+	},
+	{
+		"bpf_strtol overflow, ERANGE",
+		.insns = {
+			/* arg1 (buf) 9223372036854775808 */
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -24),
+			BPF_LD_IMM64(BPF_REG_0, 0x3032373333323239ULL),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+			BPF_LD_IMM64(BPF_REG_0, 0x3537373435383633ULL),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 8),
+			BPF_LD_IMM64(BPF_REG_0, 0x0000000000383038ULL),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 16),
+
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_7),
+
+			/* arg2 (buf_len) */
+			BPF_MOV64_IMM(BPF_REG_2, 19),
+
+			/* arg3 (flags) */
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+
+			/* arg4 (res) */
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
+			BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_0, 0),
+			BPF_MOV64_REG(BPF_REG_4, BPF_REG_7),
+
+			BPF_EMIT_CALL(BPF_FUNC_strtol),
+
+			/* if (ret == expected) */
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, -ERANGE, 2),
+
+			/* return ALLOW; */
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_JMP_A(1),
+
+			/* else return DENY; */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "net/ipv4/route/mtu_expires",
+		.open_flags = O_RDONLY,
+		.result = SUCCESS,
+	},
 };
 
 static size_t probe_prog_length(const struct bpf_insn *fp)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 bpf-next 21/21] selftests/bpf: C based test for sysctl and strtoX
  2019-04-05 19:35 [PATCH v3 bpf-next 00/21] bpf: Sysctl hook Andrey Ignatov
                   ` (19 preceding siblings ...)
  2019-04-05 19:35 ` [PATCH v3 bpf-next 20/21] selftests/bpf: Test bpf_strtol and bpf_strtoul helpers Andrey Ignatov
@ 2019-04-05 19:35 ` Andrey Ignatov
  2019-04-06 16:43 ` [PATCH v3 bpf-next 00/21] bpf: Sysctl hook Kees Cook
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 33+ messages in thread
From: Andrey Ignatov @ 2019-04-05 19:35 UTC (permalink / raw)
  To: netdev
  Cc: Andrey Ignatov, ast, daniel, guro, kernel-team, Luis Chamberlain,
	Kees Cook, Alexey Dobriyan, linux-kernel, linux-fsdevel

Add C based test for a few bpf_sysctl_* helpers and bpf_strtoul.

Make sure that sysctl can be identified by name and that multiple
integers can be parsed from sysctl value with bpf_strtoul.

net/ipv4/tcp_mem is chosen as a testing sysctl, it contains 3 unsigned
longs, they all are parsed and compared (val[0] < val[1] < val[2]).

Example of output:
  # ./test_sysctl
  ...
  Test case: C prog: deny all writes .. [PASS]
  Test case: C prog: deny access by name .. [PASS]
  Test case: C prog: read tcp_mem .. [PASS]
  Summary: 39 PASSED, 0 FAILED

Signed-off-by: Andrey Ignatov <rdna@fb.com>
---
 .../selftests/bpf/progs/test_sysctl_prog.c    | 70 +++++++++++++++++++
 tools/testing/selftests/bpf/test_sysctl.c     | 57 ++++++++++++++-
 2 files changed, 126 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_sysctl_prog.c

diff --git a/tools/testing/selftests/bpf/progs/test_sysctl_prog.c b/tools/testing/selftests/bpf/progs/test_sysctl_prog.c
new file mode 100644
index 000000000000..a295cad805d7
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_sysctl_prog.c
@@ -0,0 +1,70 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Facebook
+
+#include <stdint.h>
+#include <string.h>
+
+#include <linux/stddef.h>
+#include <linux/bpf.h>
+
+#include "bpf_helpers.h"
+#include "bpf_util.h"
+
+/* Max supported length of a string with unsigned long in base 10 (pow2 - 1). */
+#define MAX_ULONG_STR_LEN 0xF
+
+/* Max supported length of sysctl value string (pow2). */
+#define MAX_VALUE_STR_LEN 0x40
+
+static __always_inline int is_tcp_mem(struct bpf_sysctl *ctx)
+{
+	char tcp_mem_name[] = "net/ipv4/tcp_mem";
+	unsigned char i;
+	char name[64];
+	int ret;
+
+	memset(name, 0, sizeof(name));
+	ret = bpf_sysctl_get_name(ctx, name, sizeof(name), 0);
+	if (ret < 0 || ret != sizeof(tcp_mem_name) - 1)
+		return 0;
+
+#pragma clang loop unroll(full)
+	for (i = 0; i < sizeof(tcp_mem_name); ++i)
+		if (name[i] != tcp_mem_name[i])
+			return 0;
+
+	return 1;
+}
+
+SEC("cgroup/sysctl")
+int sysctl_tcp_mem(struct bpf_sysctl *ctx)
+{
+	unsigned long tcp_mem[3] = {0, 0, 0};
+	char value[MAX_VALUE_STR_LEN];
+	unsigned char i, off = 0;
+	int ret;
+
+	if (ctx->write)
+		return 0;
+
+	if (!is_tcp_mem(ctx))
+		return 0;
+
+	ret = bpf_sysctl_get_current_value(ctx, value, MAX_VALUE_STR_LEN);
+	if (ret < 0 || ret >= MAX_VALUE_STR_LEN)
+		return 0;
+
+#pragma clang loop unroll(full)
+	for (i = 0; i < ARRAY_SIZE(tcp_mem); ++i) {
+		ret = bpf_strtoul(value + off, MAX_ULONG_STR_LEN, 0,
+				  tcp_mem + i);
+		if (ret <= 0 || ret > MAX_ULONG_STR_LEN)
+			return 0;
+		off += ret & MAX_ULONG_STR_LEN;
+	}
+
+
+	return tcp_mem[0] < tcp_mem[1] && tcp_mem[1] < tcp_mem[2];
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_sysctl.c b/tools/testing/selftests/bpf/test_sysctl.c
index 885675480af9..a3bebd7c68dd 100644
--- a/tools/testing/selftests/bpf/test_sysctl.c
+++ b/tools/testing/selftests/bpf/test_sysctl.c
@@ -11,6 +11,7 @@
 #include <linux/filter.h>
 
 #include <bpf/bpf.h>
+#include <bpf/libbpf.h>
 
 #include "bpf_rlimit.h"
 #include "bpf_util.h"
@@ -26,6 +27,7 @@ struct sysctl_test {
 	const char *descr;
 	size_t fixup_value_insn;
 	struct bpf_insn	insns[MAX_INSNS];
+	const char *prog_file;
 	enum bpf_attach_type attach_type;
 	const char *sysctl;
 	int open_flags;
@@ -1302,6 +1304,31 @@ static struct sysctl_test tests[] = {
 		.open_flags = O_RDONLY,
 		.result = SUCCESS,
 	},
+	{
+		"C prog: deny all writes",
+		.prog_file = "./test_sysctl_prog.o",
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "net/ipv4/tcp_mem",
+		.open_flags = O_WRONLY,
+		.newval = "123 456 789",
+		.result = OP_EPERM,
+	},
+	{
+		"C prog: deny access by name",
+		.prog_file = "./test_sysctl_prog.o",
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "net/ipv4/route/mtu_expires",
+		.open_flags = O_RDONLY,
+		.result = OP_EPERM,
+	},
+	{
+		"C prog: read tcp_mem",
+		.prog_file = "./test_sysctl_prog.o",
+		.attach_type = BPF_CGROUP_SYSCTL,
+		.sysctl = "net/ipv4/tcp_mem",
+		.open_flags = O_RDONLY,
+		.result = SUCCESS,
+	},
 };
 
 static size_t probe_prog_length(const struct bpf_insn *fp)
@@ -1335,7 +1362,8 @@ static int fixup_sysctl_value(const char *buf, size_t buf_len,
 	return 0;
 }
 
-static int load_sysctl_prog(struct sysctl_test *test, const char *sysctl_path)
+static int load_sysctl_prog_insns(struct sysctl_test *test,
+				  const char *sysctl_path)
 {
 	struct bpf_insn *prog = test->insns;
 	struct bpf_load_program_attr attr;
@@ -1377,6 +1405,33 @@ static int load_sysctl_prog(struct sysctl_test *test, const char *sysctl_path)
 	return ret;
 }
 
+static int load_sysctl_prog_file(struct sysctl_test *test)
+{
+	struct bpf_prog_load_attr attr;
+	struct bpf_object *obj;
+	int prog_fd;
+
+	memset(&attr, 0, sizeof(struct bpf_prog_load_attr));
+	attr.file = test->prog_file;
+	attr.prog_type = BPF_PROG_TYPE_CGROUP_SYSCTL;
+
+	if (bpf_prog_load_xattr(&attr, &obj, &prog_fd)) {
+		if (test->result != LOAD_REJECT)
+			log_err(">>> Loading program (%s) error.\n",
+				test->prog_file);
+		return -1;
+	}
+
+	return prog_fd;
+}
+
+static int load_sysctl_prog(struct sysctl_test *test, const char *sysctl_path)
+{
+		return test->prog_file
+			? load_sysctl_prog_file(test)
+			: load_sysctl_prog_insns(test, sysctl_path);
+}
+
 static int access_sysctl(const char *sysctl_path,
 			 const struct sysctl_test *test)
 {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 bpf-next 00/21] bpf: Sysctl hook
  2019-04-05 19:35 [PATCH v3 bpf-next 00/21] bpf: Sysctl hook Andrey Ignatov
                   ` (20 preceding siblings ...)
  2019-04-05 19:35 ` [PATCH v3 bpf-next 21/21] selftests/bpf: C based test for sysctl and strtoX Andrey Ignatov
@ 2019-04-06 16:43 ` Kees Cook
  2019-04-06 17:02   ` Alexei Starovoitov
  2019-04-09 20:41 ` Jann Horn
  2019-04-12 21:27 ` Alexei Starovoitov
  23 siblings, 1 reply; 33+ messages in thread
From: Kees Cook @ 2019-04-06 16:43 UTC (permalink / raw)
  To: Andrey Ignatov
  Cc: Network Development, Alexei Starovoitov, Daniel Borkmann,
	Roman Gushchin, kernel-team, Luis Chamberlain, Alexey Dobriyan,
	LKML, linux-fsdevel, linux-security-module, Jann Horn

On Fri, Apr 5, 2019 at 12:36 PM Andrey Ignatov <rdna@fb.com> wrote:
>
> v2->v3:
> - simplify C based selftests by relying on variable offset stack access.
>
> v1->v2:
> - add fs/proc/proc_sysctl.c mainteners to Cc:.
>
> The patch set introduces new BPF hook for sysctl.
>
> It adds new program type BPF_PROG_TYPE_CGROUP_SYSCTL and attach type
> BPF_CGROUP_SYSCTL.
>
> BPF_CGROUP_SYSCTL hook is placed before calling to sysctl's proc_handler so
> that accesses (read/write) to sysctl can be controlled for specific cgroup
> and either allowed or denied, or traced.
>
> The hook has access to sysctl name, current sysctl value and (on write
> only) to new sysctl value via corresponding helpers. New sysctl value can
> be overridden by program. Both name and values (current/new) are
> represented as strings same way they're visible in /proc/sys/. It is up to
> program to parse these strings.
>
> To help with parsing the most common kind of sysctl value, vector of
> integers, two new helpers are provided: bpf_strtol and bpf_strtoul with
> semantic similar to user space strtol(3) and strtoul(3).
>
> The hook also provides bpf_sysctl context with two fields:
> * @write indicates whether sysctl is being read (= 0) or written (= 1);
> * @file_pos is sysctl file position to read from or write to, can be
>   overridden.
>
> The hook allows to make better isolation for containerized applications
> that are run as root so that one container can't change a sysctl and affect
> all other containers on a host, make changes to allowed sysctl in a safer
> way and simplify sysctl tracing for cgroups.

This sounds more like an LSM than BPF. So sysctls can get blocked when
new BPF is added to a cgroup? Can the BPF be removed (or rather,
what's the lifetime of such BPF?)

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 bpf-next 00/21] bpf: Sysctl hook
  2019-04-06 16:43 ` [PATCH v3 bpf-next 00/21] bpf: Sysctl hook Kees Cook
@ 2019-04-06 17:02   ` Alexei Starovoitov
  2019-04-09 16:50     ` Kees Cook
  0 siblings, 1 reply; 33+ messages in thread
From: Alexei Starovoitov @ 2019-04-06 17:02 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrey Ignatov, Network Development, Alexei Starovoitov,
	Daniel Borkmann, Roman Gushchin, kernel-team, Luis Chamberlain,
	Alexey Dobriyan, LKML, linux-fsdevel, linux-security-module,
	Jann Horn

On Sat, Apr 06, 2019 at 09:43:50AM -0700, Kees Cook wrote:
> On Fri, Apr 5, 2019 at 12:36 PM Andrey Ignatov <rdna@fb.com> wrote:
> >
> > v2->v3:
> > - simplify C based selftests by relying on variable offset stack access.
> >
> > v1->v2:
> > - add fs/proc/proc_sysctl.c mainteners to Cc:.
> >
> > The patch set introduces new BPF hook for sysctl.
> >
> > It adds new program type BPF_PROG_TYPE_CGROUP_SYSCTL and attach type
> > BPF_CGROUP_SYSCTL.
> >
> > BPF_CGROUP_SYSCTL hook is placed before calling to sysctl's proc_handler so
> > that accesses (read/write) to sysctl can be controlled for specific cgroup
> > and either allowed or denied, or traced.
> >
> > The hook has access to sysctl name, current sysctl value and (on write
> > only) to new sysctl value via corresponding helpers. New sysctl value can
> > be overridden by program. Both name and values (current/new) are
> > represented as strings same way they're visible in /proc/sys/. It is up to
> > program to parse these strings.
> >
> > To help with parsing the most common kind of sysctl value, vector of
> > integers, two new helpers are provided: bpf_strtol and bpf_strtoul with
> > semantic similar to user space strtol(3) and strtoul(3).
> >
> > The hook also provides bpf_sysctl context with two fields:
> > * @write indicates whether sysctl is being read (= 0) or written (= 1);
> > * @file_pos is sysctl file position to read from or write to, can be
> >   overridden.
> >
> > The hook allows to make better isolation for containerized applications
> > that are run as root so that one container can't change a sysctl and affect
> > all other containers on a host, make changes to allowed sysctl in a safer
> > way and simplify sysctl tracing for cgroups.
> 
> This sounds more like an LSM than BPF. 

not at all. the key difference is being cgroup scoped.
essentially for different containers.

> So sysctls can get blocked when
> new BPF is added to a cgroup? 

bpf prog is attached to this hook in a particular cgroup
and executed for sysctls for tasks that belong to that cgroup.

> Can the BPF be removed (or rather,
> what's the lifetime of such BPF?)

same as all other cgroup-bpf hooks.
Do you have a specific concern or just asking how life time of programs
is managed?
High level description of lifetime is here:
https://facebookmicrosites.github.io/bpf/blog/2018/08/31/object-lifetime.html


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 bpf-next 00/21] bpf: Sysctl hook
  2019-04-06 17:02   ` Alexei Starovoitov
@ 2019-04-09 16:50     ` Kees Cook
  2019-04-09 23:17       ` Andrey Ignatov
  0 siblings, 1 reply; 33+ messages in thread
From: Kees Cook @ 2019-04-09 16:50 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrey Ignatov, Network Development, Alexei Starovoitov,
	Daniel Borkmann, Roman Gushchin, kernel-team, Luis Chamberlain,
	Alexey Dobriyan, LKML, linux-fsdevel, linux-security-module,
	Jann Horn

On Sat, Apr 6, 2019 at 10:03 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Sat, Apr 06, 2019 at 09:43:50AM -0700, Kees Cook wrote:
> > On Fri, Apr 5, 2019 at 12:36 PM Andrey Ignatov <rdna@fb.com> wrote:
> > > BPF_CGROUP_SYSCTL hook is placed before calling to sysctl's proc_handler so
> > > that accesses (read/write) to sysctl can be controlled for specific cgroup
> > > and either allowed or denied, or traced.
> >
> > This sounds more like an LSM than BPF.
>
> not at all. the key difference is being cgroup scoped.
> essentially for different containers.

Okay, works for me. I was looking at it from the perspective of
something providing resource access control policy, which usually
falls into the LSM world.

> bpf prog is attached to this hook in a particular cgroup
> and executed for sysctls for tasks that belong to that cgroup.

So it's root limiting root-in-a-container? Nice to have some
boundaries there, for sure.

> > Can the BPF be removed (or rather,
> > what's the lifetime of such BPF?)
>
> same as all other cgroup-bpf hooks.
> Do you have a specific concern or just asking how life time of programs
> is managed?
> High level description of lifetime is here:
> https://facebookmicrosites.github.io/bpf/blog/2018/08/31/object-lifetime.html

I'm mostly curious about the access control stacking. i.e. can
in-container root add new eBPF to its own cgroup, and if so, can it
undo the restrictions already present? (I assume it can't, but figured
I'd ask...)

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 bpf-next 02/21] bpf: Sysctl hook
  2019-04-05 19:35 ` [PATCH v3 bpf-next 02/21] bpf: Sysctl hook Andrey Ignatov
@ 2019-04-09 16:54   ` Kees Cook
  2019-04-09 20:16     ` Andrey Ignatov
  0 siblings, 1 reply; 33+ messages in thread
From: Kees Cook @ 2019-04-09 16:54 UTC (permalink / raw)
  To: Andrey Ignatov
  Cc: Network Development, Alexei Starovoitov, Daniel Borkmann,
	Roman Gushchin, kernel-team, Luis Chamberlain, Alexey Dobriyan,
	LKML, linux-fsdevel, Jann Horn, Mickaël Salaün

On Fri, Apr 5, 2019 at 12:36 PM Andrey Ignatov <rdna@fb.com> wrote:
> Containerized applications may run as root and it may create problems
> for whole host. Specifically such applications may change a sysctl and
> affect applications in other containers.
>
> Furthermore in existing infrastructure it may not be possible to just
> completely disable writing to sysctl, instead such a process should be
> gradual with ability to log what sysctl are being changed by a
> container, investigate, limit the set of writable sysctl to currently
> used ones (so that new ones can not be changed) and eventually reduce
> this set to zero.

Actual-root-in-a-container is pretty powerful. What about module
loading, or /dev files? Instead of sysctl-specific hooks, what about
VFS hooks, which would be able to cover all file-based APIs. This is
what, for example, Landlock was working on doing (also with eBPF).

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 bpf-next 02/21] bpf: Sysctl hook
  2019-04-09 16:54   ` Kees Cook
@ 2019-04-09 20:16     ` Andrey Ignatov
  0 siblings, 0 replies; 33+ messages in thread
From: Andrey Ignatov @ 2019-04-09 20:16 UTC (permalink / raw)
  To: Kees Cook
  Cc: Network Development, Alexei Starovoitov, Daniel Borkmann,
	Roman Gushchin, Kernel Team, Luis Chamberlain, Alexey Dobriyan,
	LKML, linux-fsdevel, Jann Horn, Mickaël Salaün

Kees Cook <keescook@chromium.org> [Tue, 2019-04-09 09:54 -0700]:
> On Fri, Apr 5, 2019 at 12:36 PM Andrey Ignatov <rdna@fb.com> wrote:
> > Containerized applications may run as root and it may create problems
> > for whole host. Specifically such applications may change a sysctl and
> > affect applications in other containers.
> >
> > Furthermore in existing infrastructure it may not be possible to just
> > completely disable writing to sysctl, instead such a process should be
> > gradual with ability to log what sysctl are being changed by a
> > container, investigate, limit the set of writable sysctl to currently
> > used ones (so that new ones can not be changed) and eventually reduce
> > this set to zero.
> 
> Actual-root-in-a-container is pretty powerful. What about module
> loading, or /dev files? Instead of sysctl-specific hooks, what about
> VFS hooks, which would be able to cover all file-based APIs. This is
> what, for example, Landlock was working on doing (also with eBPF).

This sysctl hook doesn't try to solve all possible problems that
root-in-a-container may impose, but rather focuses on one specific
problem.

Generic BPF hooks in VFS can be a good idea and in fact there was a try
to add BPF hook for file_open [1], but apparently it needs more work.

Flexibility of generic hooks has its disadvantages though when it comes
to making what this sysctl-focused hook does. E.g. with sysctl hook
administrator shouldn't care about sys_open, sys_read, sys_write
separately if they want to implement policies (or just tracing) based on
sysctl name / value for cgroup, but can have just one BPF program
instead that has all necessary context to make decisions.

Also accesses to sysctl is usually just a very small part of all calls
to sys_open/sys_read/sys_write on a system, outside of fast-path (e.g.
nobody should care if write to sysctl is a bit slower), and calling BPF
hook for every sys_open/sys_read/sys_write when administrator wants to
limit just sysctl changes can be too expensive.

Furthermore sysctl focused hook allows to tailor its API to sysctl needs
and avoid exposing APIs that make sense only for sysctl to all
file-based accesses and vise versa.

[1] https://lwn.net/Articles/767615/

-- 
Andrey Ignatov

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 bpf-next 00/21] bpf: Sysctl hook
  2019-04-05 19:35 [PATCH v3 bpf-next 00/21] bpf: Sysctl hook Andrey Ignatov
                   ` (21 preceding siblings ...)
  2019-04-06 16:43 ` [PATCH v3 bpf-next 00/21] bpf: Sysctl hook Kees Cook
@ 2019-04-09 20:41 ` Jann Horn
  2019-04-09 23:04   ` Andrey Ignatov
  2019-04-12 21:27 ` Alexei Starovoitov
  23 siblings, 1 reply; 33+ messages in thread
From: Jann Horn @ 2019-04-09 20:41 UTC (permalink / raw)
  To: Andrey Ignatov
  Cc: Network Development, Alexei Starovoitov, Daniel Borkmann, guro,
	kernel-team, Luis Chamberlain, Kees Cook, Alexey Dobriyan,
	kernel list, linux-fsdevel, linux-security-module

On Tue, Apr 9, 2019 at 10:26 PM Andrey Ignatov <rdna@fb.com> wrote:
> The patch set introduces new BPF hook for sysctl.
>
> It adds new program type BPF_PROG_TYPE_CGROUP_SYSCTL and attach type
> BPF_CGROUP_SYSCTL.
>
> BPF_CGROUP_SYSCTL hook is placed before calling to sysctl's proc_handler so
> that accesses (read/write) to sysctl can be controlled for specific cgroup
> and either allowed or denied, or traced.

Don't look at the credentials of "current" in a read or write handler.
Consider what happens if, for example, someone inside a cgroup opens a
sysctl file and passes the file descriptor to another process outside
the cgroup over a unix domain socket, and that other process then
writes to it. Either do your access check on open, or use the
credentials that were saved during open() in the read/write handler.

> The hook has access to sysctl name, current sysctl value and (on write
> only) to new sysctl value via corresponding helpers. New sysctl value can
> be overridden by program. Both name and values (current/new) are
> represented as strings same way they're visible in /proc/sys/. It is up to
> program to parse these strings.

But even if a filter is installed that prevents all access to a
sysctl, you can still read it by installing your own filter that, when
a read is attempted the next time, dumps the value into a map or
something like that, right?

> To help with parsing the most common kind of sysctl value, vector of
> integers, two new helpers are provided: bpf_strtol and bpf_strtoul with
> semantic similar to user space strtol(3) and strtoul(3).
>
> The hook also provides bpf_sysctl context with two fields:
> * @write indicates whether sysctl is being read (= 0) or written (= 1);
> * @file_pos is sysctl file position to read from or write to, can be
>   overridden.
>
> The hook allows to make better isolation for containerized applications
> that are run as root so that one container can't change a sysctl and affect
> all other containers on a host, make changes to allowed sysctl in a safer
> way and simplify sysctl tracing for cgroups.

Why can't you use a user namespace and isolate things properly that
way? That would be much cleaner, wouldn't it?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 bpf-next 00/21] bpf: Sysctl hook
  2019-04-09 20:41 ` Jann Horn
@ 2019-04-09 23:04   ` Andrey Ignatov
  2019-04-09 23:22     ` Jann Horn
  0 siblings, 1 reply; 33+ messages in thread
From: Andrey Ignatov @ 2019-04-09 23:04 UTC (permalink / raw)
  To: Jann Horn
  Cc: Network Development, Alexei Starovoitov, Daniel Borkmann,
	Roman Gushchin, Kernel Team, Luis Chamberlain, Kees Cook,
	Alexey Dobriyan, kernel list, linux-fsdevel,
	linux-security-module

Jann Horn <jannh@google.com> [Tue, 2019-04-09 13:42 -0700]:
> On Tue, Apr 9, 2019 at 10:26 PM Andrey Ignatov <rdna@fb.com> wrote:
> > The patch set introduces new BPF hook for sysctl.
> >
> > It adds new program type BPF_PROG_TYPE_CGROUP_SYSCTL and attach type
> > BPF_CGROUP_SYSCTL.
> >
> > BPF_CGROUP_SYSCTL hook is placed before calling to sysctl's proc_handler so
> > that accesses (read/write) to sysctl can be controlled for specific cgroup
> > and either allowed or denied, or traced.
> 
> Don't look at the credentials of "current" in a read or write handler.
> Consider what happens if, for example, someone inside a cgroup opens a
> sysctl file and passes the file descriptor to another process outside
> the cgroup over a unix domain socket, and that other process then
> writes to it. Either do your access check on open, or use the
> credentials that were saved during open() in the read/write handler.

This way this someone inside cgroup should already have control over
something running as root [1] outside of this cgroup, i.e. the game is
already lost, even without this hook.

[1] Since proc_sys_read() / proc_sys_write() check sysctl_perm() before
    execution reaches the hook.

This patch set doesn't look at credentials at all and relies on what
checks were already done at sys_open time or in proc_sys_call_handler()
before execution reaches the hook.

> > The hook has access to sysctl name, current sysctl value and (on write
> > only) to new sysctl value via corresponding helpers. New sysctl value can
> > be overridden by program. Both name and values (current/new) are
> > represented as strings same way they're visible in /proc/sys/. It is up to
> > program to parse these strings.
> 
> But even if a filter is installed that prevents all access to a
> sysctl, you can still read it by installing your own filter that, when
> a read is attempted the next time, dumps the value into a map or
> something like that, right?

No. This can be controlled by cgroup hierarchy and appropriate attach
flags, same way as with any other cgroup-bpf hook.

E.g. imagine there is a cgroup hierarchy:
  root/slice/container/

and container application runs in root/slice/container/ in a cgroup
namespace (CLONE_NEWCGROUP) that makes visible only "container/" part of
the hierarchy, i.e. from inside container application can't even see
"root/slice/".

Administrator can then attach sysctl hook to "root/slice/" with attach
flag NONE (bpf_attr.attach_flags = 0) what means nobody down the
hierarchy can override the program attached by administrator.

> > To help with parsing the most common kind of sysctl value, vector of
> > integers, two new helpers are provided: bpf_strtol and bpf_strtoul with
> > semantic similar to user space strtol(3) and strtoul(3).
> >
> > The hook also provides bpf_sysctl context with two fields:
> > * @write indicates whether sysctl is being read (= 0) or written (= 1);
> > * @file_pos is sysctl file position to read from or write to, can be
> >   overridden.
> >
> > The hook allows to make better isolation for containerized applications
> > that are run as root so that one container can't change a sysctl and affect
> > all other containers on a host, make changes to allowed sysctl in a safer
> > way and simplify sysctl tracing for cgroups.
> 
> Why can't you use a user namespace and isolate things properly that
> way? That would be much cleaner, wouldn't it?

I'm not sure I understand how user namespace helps here. From my
understanding it can only completely deny access to sysctl and can't do
fine-grained control for specific sysctl knobs. It also can't make
allow/deny decision based on sysctl value being written.

Basically user namespace is all or nothing. This sysctl hook provides a
way to implement fine-grained access control for sysctl knobs based on
sysctl name or value being written or whatever else policy administrator
can come up with.

-- 
Andrey Ignatov

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 bpf-next 00/21] bpf: Sysctl hook
  2019-04-09 16:50     ` Kees Cook
@ 2019-04-09 23:17       ` Andrey Ignatov
  0 siblings, 0 replies; 33+ messages in thread
From: Andrey Ignatov @ 2019-04-09 23:17 UTC (permalink / raw)
  To: Kees Cook
  Cc: Alexei Starovoitov, Network Development, Alexei Starovoitov,
	Daniel Borkmann, Roman Gushchin, Kernel Team, Luis Chamberlain,
	Alexey Dobriyan, LKML, linux-fsdevel, linux-security-module,
	Jann Horn

Kees Cook <keescook@chromium.org> [Tue, 2019-04-09 09:51 -0700]:
> On Sat, Apr 6, 2019 at 10:03 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Sat, Apr 06, 2019 at 09:43:50AM -0700, Kees Cook wrote:
> > > On Fri, Apr 5, 2019 at 12:36 PM Andrey Ignatov <rdna@fb.com> wrote:
> > > Can the BPF be removed (or rather,
> > > what's the lifetime of such BPF?)
> >
> > same as all other cgroup-bpf hooks.
> > Do you have a specific concern or just asking how life time of programs
> > is managed?
> > High level description of lifetime is here:
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__facebookmicrosites.github.io_bpf_blog_2018_08_31_object-2Dlifetime.html&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=3jAokpHyGuCuJ834j-tttQ&m=ZJJ4QMXnksL1b4VPoBM0NJ0i6OWysGc2Om26pcoJpxA&s=6dIZ788hOzoDWVif5XQ-9Mqf9ijko9O7TOWArLzblxU&e=
> 
> I'm mostly curious about the access control stacking. i.e. can
> in-container root add new eBPF to its own cgroup, and if so, can it
> undo the restrictions already present? (I assume it can't, but figured
> I'd ask...)

Since I answered similar question from Jann below, I'll answer it here
as well (even though it was addressed to Alexei).

Stacking can be controlled by attach flags (NONE, BPF_F_ALLOW_OVERRIDE,
BPF_F_ALLOW_MULTI) described in include/uapi/linux/bpf.h.

Basically if one attaches a program to a cgroup with
`bpf_attr.attach_flags = 0` (0 is "NONE"), then nobody can override it
by their own programs of same type in any sub-cgroup. It can be hardened
further by cgroup namespace so that in-container root doesn't even see
part of cgroup hierarchy where cgroup-bpf program is attached to with
attach flags NONE.


-- 
Andrey Ignatov

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 bpf-next 00/21] bpf: Sysctl hook
  2019-04-09 23:04   ` Andrey Ignatov
@ 2019-04-09 23:22     ` Jann Horn
  2019-04-09 23:34       ` Alexei Starovoitov
  0 siblings, 1 reply; 33+ messages in thread
From: Jann Horn @ 2019-04-09 23:22 UTC (permalink / raw)
  To: Andrey Ignatov
  Cc: Network Development, Alexei Starovoitov, Daniel Borkmann,
	Roman Gushchin, Kernel Team, Luis Chamberlain, Kees Cook,
	Alexey Dobriyan, kernel list, linux-fsdevel,
	linux-security-module

On Wed, Apr 10, 2019 at 1:04 AM Andrey Ignatov <rdna@fb.com> wrote:
> Jann Horn <jannh@google.com> [Tue, 2019-04-09 13:42 -0700]:
> > On Tue, Apr 9, 2019 at 10:26 PM Andrey Ignatov <rdna@fb.com> wrote:
> > > The patch set introduces new BPF hook for sysctl.
> > >
> > > It adds new program type BPF_PROG_TYPE_CGROUP_SYSCTL and attach type
> > > BPF_CGROUP_SYSCTL.
> > >
> > > BPF_CGROUP_SYSCTL hook is placed before calling to sysctl's proc_handler so
> > > that accesses (read/write) to sysctl can be controlled for specific cgroup
> > > and either allowed or denied, or traced.
> >
> > Don't look at the credentials of "current" in a read or write handler.
> > Consider what happens if, for example, someone inside a cgroup opens a
> > sysctl file and passes the file descriptor to another process outside
> > the cgroup over a unix domain socket, and that other process then
> > writes to it. Either do your access check on open, or use the
> > credentials that were saved during open() in the read/write handler.
>
> This way this someone inside cgroup should already have control over
> something running as root [1] outside of this cgroup, i.e. the game is
> already lost, even without this hook.
>
> [1] Since proc_sys_read() / proc_sys_write() check sysctl_perm() before
>     execution reaches the hook.

You don't need to have _control_ over something running as root. You
only need to be able to communicate with something that expects to be
passed in file descriptors for some purpose.

> This patch set doesn't look at credentials at all and relies on what
> checks were already done at sys_open time or in proc_sys_call_handler()
> before execution reaches the hook.

You're looking at the cgroup though.

> > > The hook has access to sysctl name, current sysctl value and (on write
> > > only) to new sysctl value via corresponding helpers. New sysctl value can
> > > be overridden by program. Both name and values (current/new) are
> > > represented as strings same way they're visible in /proc/sys/. It is up to
> > > program to parse these strings.
> >
> > But even if a filter is installed that prevents all access to a
> > sysctl, you can still read it by installing your own filter that, when
> > a read is attempted the next time, dumps the value into a map or
> > something like that, right?
>
> No. This can be controlled by cgroup hierarchy and appropriate attach
> flags, same way as with any other cgroup-bpf hook.
>
> E.g. imagine there is a cgroup hierarchy:
>   root/slice/container/
>
> and container application runs in root/slice/container/ in a cgroup
> namespace (CLONE_NEWCGROUP) that makes visible only "container/" part of
> the hierarchy, i.e. from inside container application can't even see
> "root/slice/".
>
> Administrator can then attach sysctl hook to "root/slice/" with attach
> flag NONE (bpf_attr.attach_flags = 0) what means nobody down the
> hierarchy can override the program attached by administrator.

Ah, okay.

> > > To help with parsing the most common kind of sysctl value, vector of
> > > integers, two new helpers are provided: bpf_strtol and bpf_strtoul with
> > > semantic similar to user space strtol(3) and strtoul(3).
> > >
> > > The hook also provides bpf_sysctl context with two fields:
> > > * @write indicates whether sysctl is being read (= 0) or written (= 1);
> > > * @file_pos is sysctl file position to read from or write to, can be
> > >   overridden.
> > >
> > > The hook allows to make better isolation for containerized applications
> > > that are run as root so that one container can't change a sysctl and affect
> > > all other containers on a host, make changes to allowed sysctl in a safer
> > > way and simplify sysctl tracing for cgroups.
> >
> > Why can't you use a user namespace and isolate things properly that
> > way? That would be much cleaner, wouldn't it?
>
> I'm not sure I understand how user namespace helps here. From my
> understanding it can only completely deny access to sysctl and can't do
> fine-grained control for specific sysctl knobs. It also can't make
> allow/deny decision based on sysctl value being written.
>
> Basically user namespace is all or nothing. This sysctl hook provides a
> way to implement fine-grained access control for sysctl knobs based on
> sysctl name or value being written or whatever else policy administrator
> can come up with.

But there's a reason why user namespaces are all-or-nothing on these
things. If the kernel does not explicitly make a sysctl available to a
container, the sysctl has global effects, and therefore probably
shouldn't be exposed to anything other than someone with
administrative privileges across the whole system. If the kernel does
make it available to a container, the sysctl's effects are limited to
the container (or otherwise it's a kernel bug).

Can you give examples of sysctls that you want to permit using from
containers, that wouldn't be accessible in a user namespace?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 bpf-next 00/21] bpf: Sysctl hook
  2019-04-09 23:22     ` Jann Horn
@ 2019-04-09 23:34       ` Alexei Starovoitov
  0 siblings, 0 replies; 33+ messages in thread
From: Alexei Starovoitov @ 2019-04-09 23:34 UTC (permalink / raw)
  To: Jann Horn
  Cc: Andrey Ignatov, Network Development, Alexei Starovoitov,
	Daniel Borkmann, Roman Gushchin, Kernel Team, Luis Chamberlain,
	Kees Cook, Alexey Dobriyan, kernel list, linux-fsdevel,
	linux-security-module

On Tue, Apr 9, 2019 at 4:22 PM Jann Horn <jannh@google.com> wrote:
>
> But there's a reason why user namespaces are all-or-nothing on these
> things. If the kernel does not explicitly make a sysctl available to a
> container, the sysctl has global effects, and therefore probably
> shouldn't be exposed to anything other than someone with
> administrative privileges across the whole system. If the kernel does
> make it available to a container, the sysctl's effects are limited to
> the container (or otherwise it's a kernel bug).
>
> Can you give examples of sysctls that you want to permit using from
> containers, that wouldn't be accessible in a user namespace?

I think this discussion has started with incorrect
assumptions about the goal of the patch set.
There is no _security_ part here.
The sysctl hook is to prevent silly things to be done by chef
and apps. Most interesting sysctls need root anyway.
The root can detach all progs and do its thing.
Consider tcp_mem sysctl. We've seen it's been misconfigured
and caused performance issues. bpf prog can track what is
being written, alarm, etc.
User namespaces are not applicable here.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 bpf-next 00/21] bpf: Sysctl hook
  2019-04-05 19:35 [PATCH v3 bpf-next 00/21] bpf: Sysctl hook Andrey Ignatov
                   ` (22 preceding siblings ...)
  2019-04-09 20:41 ` Jann Horn
@ 2019-04-12 21:27 ` Alexei Starovoitov
  23 siblings, 0 replies; 33+ messages in thread
From: Alexei Starovoitov @ 2019-04-12 21:27 UTC (permalink / raw)
  To: Andrey Ignatov
  Cc: netdev, ast, daniel, guro, kernel-team, Luis Chamberlain,
	Kees Cook, Alexey Dobriyan, linux-kernel, linux-fsdevel, jannh

On Fri, Apr 05, 2019 at 12:35:22PM -0700, Andrey Ignatov wrote:
> v2->v3:
> - simplify C based selftests by relying on variable offset stack access.
> 
> v1->v2:
> - add fs/proc/proc_sysctl.c mainteners to Cc:.
> 
> The patch set introduces new BPF hook for sysctl.
> 
> It adds new program type BPF_PROG_TYPE_CGROUP_SYSCTL and attach type
> BPF_CGROUP_SYSCTL.
> 
> BPF_CGROUP_SYSCTL hook is placed before calling to sysctl's proc_handler so
> that accesses (read/write) to sysctl can be controlled for specific cgroup
> and either allowed or denied, or traced.
> 
> The hook has access to sysctl name, current sysctl value and (on write
> only) to new sysctl value via corresponding helpers. New sysctl value can
> be overridden by program. Both name and values (current/new) are
> represented as strings same way they're visible in /proc/sys/. It is up to
> program to parse these strings.
> 
> To help with parsing the most common kind of sysctl value, vector of
> integers, two new helpers are provided: bpf_strtol and bpf_strtoul with
> semantic similar to user space strtol(3) and strtoul(3).
> 
> The hook also provides bpf_sysctl context with two fields:
> * @write indicates whether sysctl is being read (= 0) or written (= 1);
> * @file_pos is sysctl file position to read from or write to, can be
>   overridden.
> 
> The hook allows to make better isolation for containerized applications
> that are run as root so that one container can't change a sysctl and affect
> all other containers on a host, make changes to allowed sysctl in a safer
> way and simplify sysctl tracing for cgroups.

Applied to bpf-next. Thanks!

Andrey,
as a follow up please add a doc describing that this bpf hook cannot be used
as a security mechanism to limit sysctl usage.
Like: explaining that task_dfl_cgroup(current) is checked at the time of read/write,
it's not a replacement for sysctl_perm, root can detach bpf progs, etc.
I think the commit 7568f4cbbeae ("selftests/bpf: C based test for sysctl and strtoX")
gives an idea of what is possible with this hook and intended usage,
but it needs to be clearly documented that it's for 'trusted root' environment.


^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2019-04-12 21:27 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-05 19:35 [PATCH v3 bpf-next 00/21] bpf: Sysctl hook Andrey Ignatov
2019-04-05 19:35 ` [PATCH v3 bpf-next 01/21] bpf: Add base proto function for cgroup-bpf programs Andrey Ignatov
2019-04-05 19:35 ` [PATCH v3 bpf-next 02/21] bpf: Sysctl hook Andrey Ignatov
2019-04-09 16:54   ` Kees Cook
2019-04-09 20:16     ` Andrey Ignatov
2019-04-05 19:35 ` [PATCH v3 bpf-next 03/21] bpf: Introduce bpf_sysctl_get_name helper Andrey Ignatov
2019-04-05 19:35 ` [PATCH v3 bpf-next 04/21] bpf: Introduce bpf_sysctl_get_current_value helper Andrey Ignatov
2019-04-05 19:35 ` [PATCH v3 bpf-next 05/21] bpf: Introduce bpf_sysctl_{get,set}_new_value helpers Andrey Ignatov
2019-04-05 19:35 ` [PATCH v3 bpf-next 06/21] bpf: Add file_pos field to bpf_sysctl ctx Andrey Ignatov
2019-04-05 19:35 ` [PATCH v3 bpf-next 07/21] bpf: Sync bpf.h to tools/ Andrey Ignatov
2019-04-05 19:35 ` [PATCH v3 bpf-next 08/21] libbpf: Support sysctl hook Andrey Ignatov
2019-04-05 19:35 ` [PATCH v3 bpf-next 09/21] selftests/bpf: Test sysctl section name Andrey Ignatov
2019-04-05 19:35 ` [PATCH v3 bpf-next 10/21] selftests/bpf: Test BPF_CGROUP_SYSCTL Andrey Ignatov
2019-04-05 19:35 ` [PATCH v3 bpf-next 11/21] selftests/bpf: Test bpf_sysctl_get_name helper Andrey Ignatov
2019-04-05 19:35 ` [PATCH v3 bpf-next 12/21] selftests/bpf: Test sysctl_get_current_value helper Andrey Ignatov
2019-04-05 19:35 ` [PATCH v3 bpf-next 13/21] selftests/bpf: Test bpf_sysctl_{get,set}_new_value helpers Andrey Ignatov
2019-04-05 19:35 ` [PATCH v3 bpf-next 14/21] selftests/bpf: Test file_pos field in bpf_sysctl ctx Andrey Ignatov
2019-04-05 19:35 ` [PATCH v3 bpf-next 15/21] bpf: Introduce ARG_PTR_TO_{INT,LONG} arg types Andrey Ignatov
2019-04-05 19:35 ` [PATCH v3 bpf-next 16/21] bpf: Introduce bpf_strtol and bpf_strtoul helpers Andrey Ignatov
2019-04-05 19:35 ` [PATCH v3 bpf-next 17/21] bpf: Sync bpf.h to tools/ Andrey Ignatov
2019-04-05 19:35 ` [PATCH v3 bpf-next 18/21] selftests/bpf: Add sysctl and strtoX helpers to bpf_helpers.h Andrey Ignatov
2019-04-05 19:35 ` [PATCH v3 bpf-next 19/21] selftests/bpf: Test ARG_PTR_TO_LONG arg type Andrey Ignatov
2019-04-05 19:35 ` [PATCH v3 bpf-next 20/21] selftests/bpf: Test bpf_strtol and bpf_strtoul helpers Andrey Ignatov
2019-04-05 19:35 ` [PATCH v3 bpf-next 21/21] selftests/bpf: C based test for sysctl and strtoX Andrey Ignatov
2019-04-06 16:43 ` [PATCH v3 bpf-next 00/21] bpf: Sysctl hook Kees Cook
2019-04-06 17:02   ` Alexei Starovoitov
2019-04-09 16:50     ` Kees Cook
2019-04-09 23:17       ` Andrey Ignatov
2019-04-09 20:41 ` Jann Horn
2019-04-09 23:04   ` Andrey Ignatov
2019-04-09 23:22     ` Jann Horn
2019-04-09 23:34       ` Alexei Starovoitov
2019-04-12 21:27 ` Alexei Starovoitov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).