* [PATCH v4 0/7] Introduce CAP_SYS_PERFMON to secure system performance monitoring and observability
@ 2019-12-18  9:16 Alexey Budankov
  2019-12-18  9:24 ` [PATCH v4 1/9] capabilities: introduce CAP_SYS_PERFMON to kernel and user space Alexey Budankov
                   ` (8 more replies)
  0 siblings, 9 replies; 27+ messages in thread
From: Alexey Budankov @ 2019-12-18  9:16 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
	jani.nikula, joonas.lahtinen, rodrigo.vivi, Alexei Starovoitov,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	james.bottomley, Serge Hallyn, James Morris, Will Deacon,
	Mark Rutland, Casey Schaufler, Robert Richter
  Cc: Jiri Olsa, Andi Kleen, Stephane Eranian, Igor Lubashev,
	Alexander Shishkin, Namhyung Kim, Kees Cook, Jann Horn,
	Thomas Gleixner, Tvrtko Ursulin, Lionel Landwerlin, Song Liu,
	linux-kernel, linux-security-module, selinux, intel-gfx, bpf,
	linux-parisc, linuxppc-dev, linux-perf-users, linux-arm-kernel,
	oprofile-list


Currently, access to perf_events, i915_perf and other performance monitoring and
observability subsystems of the kernel is open to privileged processes [1], i.e.
processes with the CAP_SYS_ADMIN capability enabled in their effective set [2].

This patch set introduces the CAP_SYS_PERFMON capability, dedicated to secure
system performance monitoring and observability operations, so that
CAP_SYS_PERFMON can assist CAP_SYS_ADMIN in its governing role for perf_events,
i915_perf and other performance monitoring and observability subsystems of the
kernel.

CAP_SYS_PERFMON is intended to meet the demand for secure system performance
monitoring and observability operations in security-sensitive, restricted
production environments (e.g. HPC clusters, cloud and virtual compute
environments) where root or CAP_SYS_ADMIN credentials are not available to the
mass of users of a system for security reasons.

CAP_SYS_PERFMON is intended to harden system security and integrity during
system performance monitoring and observability operations by decreasing the
attack surface that is available to CAP_SYS_ADMIN privileged processes [2].

CAP_SYS_PERFMON is intended to take over the CAP_SYS_ADMIN credentials related
to system performance monitoring and observability operations and so rebalance
the amount of credentials covered by CAP_SYS_ADMIN, following the
recommendations in the capabilities man page [2] for CAP_SYS_ADMIN: "Note: this
capability is overloaded; see Notes to kernel developers, below."

For backward compatibility, access to the system performance monitoring and
observability subsystems of the kernel remains open to CAP_SYS_ADMIN privileged
processes, but use of CAP_SYS_ADMIN for secure system performance monitoring
and observability operations is discouraged in favor of the newly introduced
CAP_SYS_PERFMON capability.

The patch set is for tip perf/core repository:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip perf/core
sha1: ceb9e77324fa661b1001a0ae66f061b5fcb4e4e6

---
Changes in v4:
- converted perfmon_capable() into an inline function
- made perf_events kprobes, uprobes, hw breakpoints and namespaces data available
  to CAP_SYS_PERFMON privileged processes
- applied perfmon_capable() to drivers/perf and drivers/oprofile
- extended __cmd_ftrace() with support of CAP_SYS_PERFMON
Changes in v3:
- implemented perfmon_capable() macros aggregating required capabilities checks
Changes in v2:
- made perf_events trace points available to CAP_SYS_PERFMON privileged processes
- made perf_event_paranoid_check() treat CAP_SYS_PERFMON equally to CAP_SYS_ADMIN
- applied CAP_SYS_PERFMON to i915_perf, bpf_trace, powerpc and parisc system
  performance monitoring and observability related subsystems

---
Alexey Budankov (9):
  capabilities: introduce CAP_SYS_PERFMON to kernel and user space
  perf/core: open access for CAP_SYS_PERFMON privileged process
  perf tool: extend Perf tool with CAP_SYS_PERFMON capability support
  drm/i915/perf: open access for CAP_SYS_PERFMON privileged process
  trace/bpf_trace: open access for CAP_SYS_PERFMON privileged process
  powerpc/perf: open access for CAP_SYS_PERFMON privileged process
  parisc/perf: open access for CAP_SYS_PERFMON privileged process
  drivers/perf: open access for CAP_SYS_PERFMON privileged process
  drivers/oprofile: open access for CAP_SYS_PERFMON privileged process

 arch/parisc/kernel/perf.c           |  2 +-
 arch/powerpc/perf/imc-pmu.c         |  4 ++--
 drivers/gpu/drm/i915/i915_perf.c    | 13 ++++++-------
 drivers/oprofile/event_buffer.c     |  2 +-
 drivers/perf/arm_spe_pmu.c          |  4 ++--
 include/linux/capability.h          |  4 ++++
 include/linux/perf_event.h          |  6 +++---
 include/uapi/linux/capability.h     |  8 +++++++-
 kernel/events/core.c                |  6 +++---
 kernel/trace/bpf_trace.c            |  2 +-
 security/selinux/include/classmap.h |  4 ++--
 tools/perf/builtin-ftrace.c         |  5 +++--
 tools/perf/design.txt               |  3 ++-
 tools/perf/util/cap.h               |  4 ++++
 tools/perf/util/evsel.c             | 10 +++++-----
 tools/perf/util/util.c              |  1 +
 16 files changed, 47 insertions(+), 31 deletions(-)

---
Testing and validation (Intel Skylake, 8 cores, Fedora 29, 5.4.0-rc8+, x86_64):

The libcap library [3], [4] and the Perf tool can be used to apply the
CAP_SYS_PERFMON capability for secure system performance monitoring and
observability beyond the scope permitted by the system-wide perf_event_paranoid
kernel setting [5]. The steps for evaluation are:

  - patch, build and boot the kernel
  - patch, build Perf tool e.g. to /home/user/perf
  ...
  # git clone git://git.kernel.org/pub/scm/libs/libcap/libcap.git libcap
  # pushd libcap
  # patch libcap/include/uapi/linux/capability.h with [PATCH 1]
  # make
  # pushd progs
  # ./setcap "cap_sys_perfmon,cap_sys_ptrace,cap_syslog=ep" /home/user/perf
  # ./setcap -v "cap_sys_perfmon,cap_sys_ptrace,cap_syslog=ep" /home/user/perf
  /home/user/perf: OK
  # ./getcap /home/user/perf
  /home/user/perf = cap_sys_ptrace,cap_syslog,cap_sys_perfmon+ep
  # echo 2 > /proc/sys/kernel/perf_event_paranoid
  # cat /proc/sys/kernel/perf_event_paranoid 
  2
  ...
  $ /home/user/perf top
    ... works as expected ...
  $ cat /proc/`pidof perf`/status
  Name:	perf
  Umask:	0002
  State:	S (sleeping)
  Tgid:	2958
  Ngid:	0
  Pid:	2958
  PPid:	9847
  TracerPid:	0
  Uid:	500	500	500	500
  Gid:	500	500	500	500
  FDSize:	256
  ...
  CapInh:	0000000000000000
  CapPrm:	0000004400080000
  CapEff:	0000004400080000 => 01000100 00000000 00001000 00000000 00000000
                                     cap_sys_perfmon,cap_sys_ptrace,cap_syslog
  CapBnd:	0000007fffffffff
  CapAmb:	0000000000000000
  NoNewPrivs:	0
  Seccomp:	0
  Speculation_Store_Bypass:	thread vulnerable
  Cpus_allowed:	ff
  Cpus_allowed_list:	0-7
  ...

Using cap_sys_perfmon effectively avoids granting an excess of unused credentials:

- with cap_sys_admin:
  CapEff:	0000007fffffffff => 01111111 11111111 11111111 11111111 11111111

- with cap_sys_perfmon:
  CapEff:	0000004400080000 => 01000100 00000000 00001000 00000000 00000000
                                    38   34               19
                           sys_perfmon   syslog           sys_ptrace

---

[1] https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
[2] http://man7.org/linux/man-pages/man7/capabilities.7.html
[3] http://man7.org/linux/man-pages/man8/setcap.8.html
[4] https://git.kernel.org/pub/scm/libs/libcap/libcap.git
[5] http://man7.org/linux/man-pages/man2/perf_event_open.2.html
[6] https://sites.google.com/site/fullycapable/, posix_1003.1e-990310.pdf

-- 
2.20.1



* [PATCH v4 1/9] capabilities: introduce CAP_SYS_PERFMON to kernel and user space
  2019-12-18  9:16 [PATCH v4 0/7] Introduce CAP_SYS_PERFMON to secure system performance monitoring and observability Alexey Budankov
@ 2019-12-18  9:24 ` Alexey Budankov
  2019-12-18 19:56   ` Stephen Smalley
  2019-12-18  9:25 ` [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process Alexey Budankov
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 27+ messages in thread
From: Alexey Budankov @ 2019-12-18  9:24 UTC (permalink / raw)


Introduce the CAP_SYS_PERFMON capability, dedicated to secure system
performance monitoring and observability operations, so that CAP_SYS_PERFMON
can assist CAP_SYS_ADMIN in its governing role for perf_events, i915_perf
and other subsystems of the kernel.

CAP_SYS_PERFMON is intended to harden system security and integrity during
system performance monitoring and observability operations by decreasing
the attack surface that is available to CAP_SYS_ADMIN privileged processes.

CAP_SYS_PERFMON is intended to take over the CAP_SYS_ADMIN credentials
related to system performance monitoring and observability operations and
so rebalance the amount of credentials covered by CAP_SYS_ADMIN, in
accordance with the recommendations in the man page for CAP_SYS_ADMIN [1]:
"Note: this capability is overloaded; see Notes to kernel developers,
below."

[1] http://man7.org/linux/man-pages/man7/capabilities.7.html

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 include/linux/capability.h          | 4 ++++
 include/uapi/linux/capability.h     | 8 +++++++-
 security/selinux/include/classmap.h | 4 ++--
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/linux/capability.h b/include/linux/capability.h
index ecce0f43c73a..883c879baa4b 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -251,6 +251,10 @@ extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct
 extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
 extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
 extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
+static inline bool perfmon_capable(void)
+{
+	return capable(CAP_SYS_PERFMON) || capable(CAP_SYS_ADMIN);
+}
 
 /* audit system wants to get cap info from files as well */
 extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps);
diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
index 240fdb9a60f6..98e03cc76c7c 100644
--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -366,8 +366,14 @@ struct vfs_ns_cap_data {
 
 #define CAP_AUDIT_READ		37
 
+/*
+ * Allow system performance and observability privileged operations
+ * using perf_events, i915_perf and other kernel subsystems
+ */
+
+#define CAP_SYS_PERFMON		38
 
-#define CAP_LAST_CAP         CAP_AUDIT_READ
+#define CAP_LAST_CAP         CAP_SYS_PERFMON
 
 #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)
 
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index 7db24855e12d..bae602c623b0 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -27,9 +27,9 @@
 	    "audit_control", "setfcap"
 
 #define COMMON_CAP2_PERMS  "mac_override", "mac_admin", "syslog", \
-		"wake_alarm", "block_suspend", "audit_read"
+		"wake_alarm", "block_suspend", "audit_read", "sys_perfmon"
 
-#if CAP_LAST_CAP > CAP_AUDIT_READ
+#if CAP_LAST_CAP > CAP_SYS_PERFMON
 #error New capability defined, please update COMMON_CAP2_PERMS.
 #endif
 
-- 
2.20.1




* [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process
  2019-12-18  9:16 [PATCH v4 0/7] Introduce CAP_SYS_PERFMON to secure system performance monitoring and observability Alexey Budankov
  2019-12-18  9:24 ` [PATCH v4 1/9] capabilities: introduce CAP_SYS_PERFMON to kernel and user space Alexey Budankov
@ 2019-12-18  9:25 ` Alexey Budankov
  2020-01-08 16:07   ` Peter Zijlstra
  2019-12-18  9:26 ` [PATCH v4 3/9] perf tool: extend Perf tool with CAP_SYS_PERFMON capability support Alexey Budankov
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 27+ messages in thread
From: Alexey Budankov @ 2019-12-18  9:25 UTC (permalink / raw)


Open access to perf_events monitoring for CAP_SYS_PERFMON privileged
processes. For backward compatibility, access to the perf_events
subsystem remains open to CAP_SYS_ADMIN privileged processes, but use
of CAP_SYS_ADMIN for secure perf_events monitoring is discouraged in
favor of the CAP_SYS_PERFMON capability.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 include/linux/perf_event.h | 6 +++---
 kernel/events/core.c       | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 34c7c6910026..f46acd69425f 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1285,7 +1285,7 @@ static inline int perf_is_paranoid(void)
 
 static inline int perf_allow_kernel(struct perf_event_attr *attr)
 {
-	if (sysctl_perf_event_paranoid > 1 && !capable(CAP_SYS_ADMIN))
+	if (sysctl_perf_event_paranoid > 1 && !perfmon_capable())
 		return -EACCES;
 
 	return security_perf_event_open(attr, PERF_SECURITY_KERNEL);
@@ -1293,7 +1293,7 @@ static inline int perf_allow_kernel(struct perf_event_attr *attr)
 
 static inline int perf_allow_cpu(struct perf_event_attr *attr)
 {
-	if (sysctl_perf_event_paranoid > 0 && !capable(CAP_SYS_ADMIN))
+	if (sysctl_perf_event_paranoid > 0 && !perfmon_capable())
 		return -EACCES;
 
 	return security_perf_event_open(attr, PERF_SECURITY_CPU);
@@ -1301,7 +1301,7 @@ static inline int perf_allow_cpu(struct perf_event_attr *attr)
 
 static inline int perf_allow_tracepoint(struct perf_event_attr *attr)
 {
-	if (sysctl_perf_event_paranoid > -1 && !capable(CAP_SYS_ADMIN))
+	if (sysctl_perf_event_paranoid > -1 && !perfmon_capable())
 		return -EPERM;
 
 	return security_perf_event_open(attr, PERF_SECURITY_TRACEPOINT);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 059ee7116008..d9db414f2197 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9056,7 +9056,7 @@ static int perf_kprobe_event_init(struct perf_event *event)
 	if (event->attr.type != perf_kprobe.type)
 		return -ENOENT;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!perfmon_capable())
 		return -EACCES;
 
 	/*
@@ -9116,7 +9116,7 @@ static int perf_uprobe_event_init(struct perf_event *event)
 	if (event->attr.type != perf_uprobe.type)
 		return -ENOENT;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!perfmon_capable())
 		return -EACCES;
 
 	/*
@@ -11157,7 +11157,7 @@ SYSCALL_DEFINE5(perf_event_open,
 	}
 
 	if (attr.namespaces) {
-		if (!capable(CAP_SYS_ADMIN))
+		if (!perfmon_capable())
 			return -EACCES;
 	}
 
-- 
2.20.1



* [PATCH v4 3/9] perf tool: extend Perf tool with CAP_SYS_PERFMON capability support
  2019-12-18  9:16 [PATCH v4 0/7] Introduce CAP_SYS_PERFMON to secure system performance monitoring and observability Alexey Budankov
  2019-12-18  9:24 ` [PATCH v4 1/9] capabilities: introduce CAP_SYS_PERFMON to kernel and user space Alexey Budankov
  2019-12-18  9:25 ` [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process Alexey Budankov
@ 2019-12-18  9:26 ` Alexey Budankov
  2019-12-18  9:27 ` [PATCH v4 4/9] drm/i915/perf: open access for CAP_SYS_PERFMON privileged process Alexey Budankov
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 27+ messages in thread
From: Alexey Budankov @ 2019-12-18  9:26 UTC (permalink / raw)


Extend error messages to mention the CAP_SYS_PERFMON capability as an
alternative to CAP_SYS_ADMIN for secure system performance monitoring and
observability operations. Make perf_event_paranoid_check() and
__cmd_ftrace() aware of the CAP_SYS_PERFMON capability.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 tools/perf/builtin-ftrace.c |  5 +++--
 tools/perf/design.txt       |  3 ++-
 tools/perf/util/cap.h       |  4 ++++
 tools/perf/util/evsel.c     | 10 +++++-----
 tools/perf/util/util.c      |  1 +
 5 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index d5adc417a4ca..8096e9b5f4f9 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -284,10 +284,11 @@ static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv)
 		.events = POLLIN,
 	};
 
-	if (!perf_cap__capable(CAP_SYS_ADMIN)) {
+	if (!(perf_cap__capable(CAP_SYS_PERFMON) ||
+	      perf_cap__capable(CAP_SYS_ADMIN))) {
 		pr_err("ftrace only works for %s!\n",
 #ifdef HAVE_LIBCAP_SUPPORT
-		"users with the SYS_ADMIN capability"
+		"users with the CAP_SYS_PERFMON or CAP_SYS_ADMIN capability"
 #else
 		"root"
 #endif
diff --git a/tools/perf/design.txt b/tools/perf/design.txt
index 0453ba26cdbd..71755b3e1303 100644
--- a/tools/perf/design.txt
+++ b/tools/perf/design.txt
@@ -258,7 +258,8 @@ gets schedule to. Per task counters can be created by any user, for
 their own tasks.
 
 A 'pid == -1' and 'cpu == x' counter is a per CPU counter that counts
-all events on CPU-x. Per CPU counters need CAP_SYS_ADMIN privilege.
+all events on CPU-x. Per CPU counters need CAP_SYS_PERFMON or
+CAP_SYS_ADMIN privilege.
 
 The 'flags' parameter is currently unused and must be zero.
 
diff --git a/tools/perf/util/cap.h b/tools/perf/util/cap.h
index 051dc590ceee..0f79fbf6638b 100644
--- a/tools/perf/util/cap.h
+++ b/tools/perf/util/cap.h
@@ -29,4 +29,8 @@ static inline bool perf_cap__capable(int cap __maybe_unused)
 #define CAP_SYSLOG	34
 #endif
 
+#ifndef CAP_SYS_PERFMON
+#define CAP_SYS_PERFMON 38
+#endif
+
 #endif /* __PERF_CAP_H */
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index f4dea055b080..3a46325e3702 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2468,14 +2468,14 @@ int perf_evsel__open_strerror(struct evsel *evsel, struct target *target,
 		 "You may not have permission to collect %sstats.\n\n"
 		 "Consider tweaking /proc/sys/kernel/perf_event_paranoid,\n"
 		 "which controls use of the performance events system by\n"
-		 "unprivileged users (without CAP_SYS_ADMIN).\n\n"
+		 "unprivileged users (without CAP_SYS_PERFMON or CAP_SYS_ADMIN).\n\n"
 		 "The current value is %d:\n\n"
 		 "  -1: Allow use of (almost) all events by all users\n"
 		 "      Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK\n"
-		 ">= 0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN\n"
-		 "      Disallow raw tracepoint access by users without CAP_SYS_ADMIN\n"
-		 ">= 1: Disallow CPU event access by users without CAP_SYS_ADMIN\n"
-		 ">= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN\n\n"
+		 ">= 0: Disallow ftrace function tracepoint by users without CAP_SYS_PERFMON or CAP_SYS_ADMIN\n"
+		 "      Disallow raw tracepoint access by users without CAP_SYS_PERFMON or CAP_SYS_ADMIN\n"
+		 ">= 1: Disallow CPU event access by users without CAP_SYS_PERFMON or CAP_SYS_ADMIN\n"
+		 ">= 2: Disallow kernel profiling by users without CAP_SYS_PERFMON or CAP_SYS_ADMIN\n\n"
 		 "To make this setting permanent, edit /etc/sysctl.conf too, e.g.:\n\n"
 		 "	kernel.perf_event_paranoid = -1\n" ,
 				 target->system_wide ? "system-wide " : "",
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index 969ae560dad9..9981db0d8d09 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -272,6 +272,7 @@ int perf_event_paranoid(void)
 bool perf_event_paranoid_check(int max_level)
 {
 	return perf_cap__capable(CAP_SYS_ADMIN) ||
+			perf_cap__capable(CAP_SYS_PERFMON) ||
 			perf_event_paranoid() <= max_level;
 }
 
-- 
2.20.1




* [PATCH v4 4/9] drm/i915/perf: open access for CAP_SYS_PERFMON privileged process
  2019-12-18  9:16 [PATCH v4 0/7] Introduce CAP_SYS_PERFMON to secure system performance monitoring and observability Alexey Budankov
                   ` (2 preceding siblings ...)
  2019-12-18  9:26 ` [PATCH v4 3/9] perf tool: extend Perf tool with CAP_SYS_PERFMON capability support Alexey Budankov
@ 2019-12-18  9:27 ` Alexey Budankov
  2019-12-19  9:10   ` Lionel Landwerlin
  2019-12-18  9:28 ` [PATCH v4 5/9] trace/bpf_trace: " Alexey Budankov
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 27+ messages in thread
From: Alexey Budankov @ 2019-12-18  9:27 UTC (permalink / raw)


Open access to i915_perf monitoring for CAP_SYS_PERFMON privileged
processes. For backward compatibility, access to the i915_perf
subsystem remains open to CAP_SYS_ADMIN privileged processes, but use
of CAP_SYS_ADMIN for secure i915_perf monitoring is discouraged in
favor of the CAP_SYS_PERFMON capability.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index e42b86827d6b..e2697f8d04de 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -2748,10 +2748,10 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv,
 	/* Similar to perf's kernel.perf_paranoid_cpu sysctl option
 	 * we check a dev.i915.perf_stream_paranoid sysctl option
 	 * to determine if it's ok to access system wide OA counters
-	 * without CAP_SYS_ADMIN privileges.
+	 * without CAP_SYS_PERFMON or CAP_SYS_ADMIN privileges.
 	 */
 	if (privileged_op &&
-	    i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
+	    i915_perf_stream_paranoid && !perfmon_capable()) {
 		DRM_DEBUG("Insufficient privileges to open system-wide i915 perf stream\n");
 		ret = -EACCES;
 		goto err_ctx;
@@ -2939,9 +2939,8 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
 			} else
 				oa_freq_hz = 0;
 
-			if (oa_freq_hz > i915_oa_max_sample_rate &&
-			    !capable(CAP_SYS_ADMIN)) {
-				DRM_DEBUG("OA exponent would exceed the max sampling frequency (sysctl dev.i915.oa_max_sample_rate) %uHz without root privileges\n",
+			if (oa_freq_hz > i915_oa_max_sample_rate && !perfmon_capable()) {
+				DRM_DEBUG("OA exponent would exceed the max sampling frequency (sysctl dev.i915.oa_max_sample_rate) %uHz without CAP_SYS_PERFMON or CAP_SYS_ADMIN privileges\n",
 					  i915_oa_max_sample_rate);
 				return -EACCES;
 			}
@@ -3328,7 +3327,7 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data,
 		return -EINVAL;
 	}
 
-	if (i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
+	if (i915_perf_stream_paranoid && !perfmon_capable()) {
 		DRM_DEBUG("Insufficient privileges to add i915 OA config\n");
 		return -EACCES;
 	}
@@ -3474,7 +3473,7 @@ int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data,
 		return -ENOTSUPP;
 	}
 
-	if (i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
+	if (i915_perf_stream_paranoid && !perfmon_capable()) {
 		DRM_DEBUG("Insufficient privileges to remove i915 OA config\n");
 		return -EACCES;
 	}
-- 
2.20.1



* [PATCH v4 5/9] trace/bpf_trace: open access for CAP_SYS_PERFMON privileged process
  2019-12-18  9:16 [PATCH v4 0/7] Introduce CAP_SYS_PERFMON to secure system performance monitoring and observability Alexey Budankov
                   ` (3 preceding siblings ...)
  2019-12-18  9:27 ` [PATCH v4 4/9] drm/i915/perf: open access for CAP_SYS_PERFMON privileged process Alexey Budankov
@ 2019-12-18  9:28 ` " Alexey Budankov
  2019-12-18  9:28 ` [PATCH v4 6/9] powerpc/perf: " Alexey Budankov
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 27+ messages in thread
From: Alexey Budankov @ 2019-12-18  9:28 UTC (permalink / raw)


Open access to bpf_trace monitoring for CAP_SYS_PERFMON privileged
processes. For backward compatibility, access to bpf_trace monitoring
remains open to CAP_SYS_ADMIN privileged processes, but use of
CAP_SYS_ADMIN for secure bpf_trace monitoring is discouraged in favor
of the CAP_SYS_PERFMON capability.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 kernel/trace/bpf_trace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 44bd08f2443b..bafe21ac6d92 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1272,7 +1272,7 @@ int perf_event_query_prog_array(struct perf_event *event, void __user *info)
 	u32 *ids, prog_cnt, ids_len;
 	int ret;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!perfmon_capable())
 		return -EPERM;
 	if (event->attr.type != PERF_TYPE_TRACEPOINT)
 		return -EINVAL;
-- 
2.20.1




* [PATCH v4 6/9] powerpc/perf: open access for CAP_SYS_PERFMON privileged process
  2019-12-18  9:16 [PATCH v4 0/7] Introduce CAP_SYS_PERFMON to secure system performance monitoring and observability Alexey Budankov
                   ` (4 preceding siblings ...)
  2019-12-18  9:28 ` [PATCH v4 5/9] trace/bpf_trace: " Alexey Budankov
@ 2019-12-18  9:28 ` " Alexey Budankov
  2019-12-18  9:29 ` [PATCH v4 7/9] parisc/perf: " Alexey Budankov
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 27+ messages in thread
From: Alexey Budankov @ 2019-12-18  9:28 UTC (permalink / raw)


Open access to monitoring for CAP_SYS_PERFMON privileged processes.
For backward compatibility, access to the monitoring remains open to
CAP_SYS_ADMIN privileged processes, but use of CAP_SYS_ADMIN for secure
monitoring is discouraged in favor of the CAP_SYS_PERFMON capability.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 arch/powerpc/perf/imc-pmu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index cb50a9e1fd2d..e837717492e4 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -898,7 +898,7 @@ static int thread_imc_event_init(struct perf_event *event)
 	if (event->attr.type != event->pmu->type)
 		return -ENOENT;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!perfmon_capable())
 		return -EACCES;
 
 	/* Sampling not supported */
@@ -1307,7 +1307,7 @@ static int trace_imc_event_init(struct perf_event *event)
 	if (event->attr.type != event->pmu->type)
 		return -ENOENT;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!perfmon_capable())
 		return -EACCES;
 
 	/* Return if this is a couting event */
-- 
2.20.1




* [PATCH v4 7/9] parisc/perf: open access for CAP_SYS_PERFMON privileged process
  2019-12-18  9:16 [PATCH v4 0/7] Introduce CAP_SYS_PERFMON to secure system performance monitoring and observability Alexey Budankov
                   ` (5 preceding siblings ...)
  2019-12-18  9:28 ` [PATCH v4 6/9] powerpc/perf: " Alexey Budankov
@ 2019-12-18  9:29 ` " Alexey Budankov
  2019-12-18  9:30 ` [PATCH v4 8/9] drivers/perf: " Alexey Budankov
  2019-12-18  9:31 ` [PATCH v4 9/9] drivers/oprofile: " Alexey Budankov
  8 siblings, 0 replies; 27+ messages in thread
From: Alexey Budankov @ 2019-12-18  9:29 UTC (permalink / raw)


Open access to monitoring for CAP_SYS_PERFMON privileged processes.
For backward compatibility, access to the monitoring remains open to
CAP_SYS_ADMIN privileged processes, but use of CAP_SYS_ADMIN for secure
monitoring is discouraged in favor of the CAP_SYS_PERFMON capability.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 arch/parisc/kernel/perf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/parisc/kernel/perf.c b/arch/parisc/kernel/perf.c
index 676683641d00..c4208d027794 100644
--- a/arch/parisc/kernel/perf.c
+++ b/arch/parisc/kernel/perf.c
@@ -300,7 +300,7 @@ static ssize_t perf_write(struct file *file, const char __user *buf,
 	else
 		return -EFAULT;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!perfmon_capable())
 		return -EACCES;
 
 	if (count != sizeof(uint32_t))
-- 
2.20.1



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v4 8/9] drivers/perf: open access for CAP_SYS_PERFMON privileged process
  2019-12-18  9:16 [PATCH v4 0/7] Introduce CAP_SYS_PERFMON to secure system performance monitoring and observability Alexey Budankov
                   ` (6 preceding siblings ...)
  2019-12-18  9:29 ` [PATCH v4 7/9] parisc/perf: " Alexey Budankov
@ 2019-12-18  9:30 ` " Alexey Budankov
       [not found]   ` <20200117105153.GB6144@willie-the-truck>
  2019-12-18  9:31 ` [PATCH v4 9/9] drivers/oprofile: " Alexey Budankov
  8 siblings, 1 reply; 27+ messages in thread
From: Alexey Budankov @ 2019-12-18  9:30 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
	jani.nikula, joonas.lahtinen, rodrigo.vivi, Alexei Starovoitov,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	james.bottomley, Serge Hallyn, James Morris, Will Deacon,
	Mark Rutland, Casey Schaufler, Robert Richter
  Cc: Jiri Olsa, Andi Kleen, Stephane Eranian, Igor Lubashev,
	Alexander Shishkin, Namhyung Kim, Kees Cook, Jann Horn,
	Thomas Gleixner, Tvrtko Ursulin, Lionel Landwerlin, Song Liu,
	linux-kernel, linux-security-module, selinux, intel-gfx, bpf,
	linux-parisc, linuxppc-dev, linux-perf-users, linux-arm-kernel,
	oprofile-list


Open access to monitoring for CAP_SYS_PERFMON privileged processes.
For backward compatibility reasons access to the monitoring remains open
for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure
monitoring is discouraged with respect to CAP_SYS_PERFMON capability.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 drivers/perf/arm_spe_pmu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index 4e4984a55cd1..5dff81bc3324 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -274,7 +274,7 @@ static u64 arm_spe_event_to_pmscr(struct perf_event *event)
 	if (!attr->exclude_kernel)
 		reg |= BIT(SYS_PMSCR_EL1_E1SPE_SHIFT);
 
-	if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && capable(CAP_SYS_ADMIN))
+	if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && perfmon_capable())
 		reg |= BIT(SYS_PMSCR_EL1_CX_SHIFT);
 
 	return reg;
@@ -700,7 +700,7 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
 		return -EOPNOTSUPP;
 
 	reg = arm_spe_event_to_pmscr(event);
-	if (!capable(CAP_SYS_ADMIN) &&
+	if (!perfmon_capable() &&
 	    (reg & (BIT(SYS_PMSCR_EL1_PA_SHIFT) |
 		    BIT(SYS_PMSCR_EL1_CX_SHIFT) |
 		    BIT(SYS_PMSCR_EL1_PCT_SHIFT))))
-- 
2.20.1
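
The two arm_spe hunks above work together: the CX bit is only requested for a privileged caller, and event init then refuses any register value that carries a privileged bit when the caller lacks the capability. A minimal userspace sketch of that logic follows. All model_* names are hypothetical, the bit positions are placeholders standing in for the SYS_PMSCR_EL1_*_SHIFT constants, and the perfmon_capable argument stands in for the kernel's perfmon_capable() call.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions, assumed for illustration only. */
#define MODEL_CX_SHIFT  3
#define MODEL_PA_SHIFT  4
#define MODEL_PCT_SHIFT 6
#define MODEL_BIT(n)    (UINT64_C(1) << (n))

/* Mirrors the first hunk: CX is set only when the config option is
 * enabled and the caller is perfmon capable. */
static uint64_t model_spe_cx_bit(bool pid_in_contextidr, bool perfmon_capable)
{
	if (pid_in_contextidr && perfmon_capable)
		return MODEL_BIT(MODEL_CX_SHIFT);
	return 0;
}

/* Mirrors the second hunk: an unprivileged caller may not request
 * any of the PA, CX or PCT bits. */
static bool model_spe_event_allowed(uint64_t reg, bool perfmon_capable)
{
	const uint64_t privileged_bits = MODEL_BIT(MODEL_PA_SHIFT) |
					 MODEL_BIT(MODEL_CX_SHIFT) |
					 MODEL_BIT(MODEL_PCT_SHIFT);

	if (!perfmon_capable && (reg & privileged_bits))
		return false;
	return true;
}
```

In this model an unprivileged caller never gets CX from model_spe_cx_bit(), so only an explicitly requested PA or PCT bit can cause a rejection.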



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v4 9/9] drivers/oprofile: open access for CAP_SYS_PERFMON privileged process
  2019-12-18  9:16 [PATCH v4 0/7] Introduce CAP_SYS_PERFMON to secure system performance monitoring and observability Alexey Budankov
                   ` (7 preceding siblings ...)
  2019-12-18  9:30 ` [PATCH v4 8/9] drivers/perf: " Alexey Budankov
@ 2019-12-18  9:31 ` " Alexey Budankov
  8 siblings, 0 replies; 27+ messages in thread
From: Alexey Budankov @ 2019-12-18  9:31 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
	jani.nikula, joonas.lahtinen, rodrigo.vivi, Alexei Starovoitov,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	james.bottomley, Serge Hallyn, James Morris, Will Deacon,
	Mark Rutland, Casey Schaufler, Robert Richter
  Cc: Jiri Olsa, Andi Kleen, Stephane Eranian, Igor Lubashev,
	Alexander Shishkin, Namhyung Kim, Kees Cook, Jann Horn,
	Thomas Gleixner, Tvrtko Ursulin, Lionel Landwerlin, Song Liu,
	linux-kernel, linux-security-module, selinux, intel-gfx, bpf,
	linux-parisc, linuxppc-dev, linux-perf-users, linux-arm-kernel,
	oprofile-list


Open access to monitoring for CAP_SYS_PERFMON privileged processes.
For backward compatibility reasons access to the monitoring remains open
for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure
monitoring is discouraged with respect to CAP_SYS_PERFMON capability.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 drivers/oprofile/event_buffer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/oprofile/event_buffer.c b/drivers/oprofile/event_buffer.c
index 12ea4a4ad607..6c9edc8bbc95 100644
--- a/drivers/oprofile/event_buffer.c
+++ b/drivers/oprofile/event_buffer.c
@@ -113,7 +113,7 @@ static int event_buffer_open(struct inode *inode, struct file *file)
 {
 	int err = -EPERM;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!perfmon_capable())
 		return -EPERM;
 
 	if (test_and_set_bit_lock(0, &buffer_opened))
-- 
2.20.1



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 1/9] capabilities: introduce CAP_SYS_PERFMON to kernel and user space
  2019-12-18  9:24 ` [PATCH v4 1/9] capabilities: introduce CAP_SYS_PERFMON to kernel and user space Alexey Budankov
@ 2019-12-18 19:56   ` Stephen Smalley
  0 siblings, 0 replies; 27+ messages in thread
From: Stephen Smalley @ 2019-12-18 19:56 UTC (permalink / raw)
  To: Alexey Budankov, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Ingo Molnar, jani.nikula, joonas.lahtinen, rodrigo.vivi,
	Alexei Starovoitov, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, james.bottomley, Serge Hallyn, James Morris,
	Will Deacon, Mark Rutland, Casey Schaufler, Robert Richter
  Cc: Jiri Olsa, Andi Kleen, Stephane Eranian, Igor Lubashev,
	Alexander Shishkin, Namhyung Kim, Kees Cook, Jann Horn,
	Thomas Gleixner, Tvrtko Ursulin, Lionel Landwerlin, Song Liu,
	linux-kernel, linux-security-module, selinux, intel-gfx, bpf,
	linux-parisc, linuxppc-dev, linux-perf-users, linux-arm-kernel,
	oprofile-list

On 12/18/19 4:24 AM, Alexey Budankov wrote:
> 
> Introduce CAP_SYS_PERFMON capability devoted to secure system performance
> monitoring and observability operations so that CAP_SYS_PERFMON would
> assist CAP_SYS_ADMIN capability in its governing role for perf_events,
> i915_perf and other subsystems of the kernel.
> 
> CAP_SYS_PERFMON intends to harden system security and integrity during
> system performance monitoring and observability operations by decreasing
> attack surface that is available to CAP_SYS_ADMIN privileged processes.
> 
> CAP_SYS_PERFMON intends to take over CAP_SYS_ADMIN credentials related
> to system performance monitoring and observability operations and balance
> amount of CAP_SYS_ADMIN credentials in accordance with the recommendations
> provided in the man page for CAP_SYS_ADMIN [1]: "Note: this capability
> is overloaded; see Notes to kernel developers, below."
> 
> [1] http://man7.org/linux/man-pages/man7/capabilities.7.html
> 
> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>

Acked-by: Stephen Smalley <sds@tycho.nsa.gov>

Note for selinux developers: we will need to update the 
selinux-testsuite tests for perf_event when/if this change lands upstream.

> ---
>   include/linux/capability.h          | 4 ++++
>   include/uapi/linux/capability.h     | 8 +++++++-
>   security/selinux/include/classmap.h | 4 ++--
>   3 files changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/capability.h b/include/linux/capability.h
> index ecce0f43c73a..883c879baa4b 100644
> --- a/include/linux/capability.h
> +++ b/include/linux/capability.h
> @@ -251,6 +251,10 @@ extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct
>   extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
>   extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
>   extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
> +static inline bool perfmon_capable(void)
> +{
> +	return capable(CAP_SYS_PERFMON) || capable(CAP_SYS_ADMIN);
> +}
>   
>   /* audit system wants to get cap info from files as well */
>   extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps);
> diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
> index 240fdb9a60f6..98e03cc76c7c 100644
> --- a/include/uapi/linux/capability.h
> +++ b/include/uapi/linux/capability.h
> @@ -366,8 +366,14 @@ struct vfs_ns_cap_data {
>   
>   #define CAP_AUDIT_READ		37
>   
> +/*
> + * Allow system performance and observability privileged operations
> + * using perf_events, i915_perf and other kernel subsystems
> + */
> +
> +#define CAP_SYS_PERFMON		38
>   
> -#define CAP_LAST_CAP         CAP_AUDIT_READ
> +#define CAP_LAST_CAP         CAP_SYS_PERFMON
>   
>   #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)
>   
> diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
> index 7db24855e12d..bae602c623b0 100644
> --- a/security/selinux/include/classmap.h
> +++ b/security/selinux/include/classmap.h
> @@ -27,9 +27,9 @@
>   	    "audit_control", "setfcap"
>   
>   #define COMMON_CAP2_PERMS  "mac_override", "mac_admin", "syslog", \
> -		"wake_alarm", "block_suspend", "audit_read"
> +		"wake_alarm", "block_suspend", "audit_read", "sys_perfmon"
>   
> -#if CAP_LAST_CAP > CAP_AUDIT_READ
> +#if CAP_LAST_CAP > CAP_SYS_PERFMON
>   #error New capability defined, please update COMMON_CAP2_PERMS.
>   #endif
>   
> 
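
The perfmon_capable() helper quoted above grants access when either capability is present. A minimal userspace sketch of that decision follows, assuming the effective capability set is available as a 64-bit mask. The model_* names are hypothetical. CAP_SYS_PERFMON is 38 as proposed in this patch and CAP_SYS_ADMIN is 21 in the uapi header. The real capable() also goes through the LSM and audit paths, which this sketch leaves out.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_CAP_SYS_ADMIN   21 /* value from include/uapi/linux/capability.h */
#define MODEL_CAP_SYS_PERFMON 38 /* value proposed by this patch */

/* Hypothetical stand-in for capable(): tests one bit of an effective set. */
static bool model_capable(uint64_t effective, int cap)
{
	return (effective >> cap) & 1u;
}

/* Same shape as the proposed helper: either capability is sufficient. */
static bool model_perfmon_capable(uint64_t effective)
{
	return model_capable(effective, MODEL_CAP_SYS_PERFMON) ||
	       model_capable(effective, MODEL_CAP_SYS_ADMIN);
}
```

A process holding only CAP_SYS_PERFMON passes the check, and so does one holding only CAP_SYS_ADMIN, which is what keeps existing CAP_SYS_ADMIN users working.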


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 4/9] drm/i915/perf: open access for CAP_SYS_PERFMON privileged process
  2019-12-18  9:27 ` [PATCH v4 4/9] drm/i915/perf: open access for CAP_SYS_PERFMON privileged process Alexey Budankov
@ 2019-12-19  9:10   ` Lionel Landwerlin
  0 siblings, 0 replies; 27+ messages in thread
From: Lionel Landwerlin @ 2019-12-19  9:10 UTC (permalink / raw)
  To: Alexey Budankov, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Ingo Molnar, jani.nikula, joonas.lahtinen, rodrigo.vivi,
	Alexei Starovoitov, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, james.bottomley, Serge Hallyn, James Morris,
	Will Deacon, Mark Rutland, Casey Schaufler, Robert Richter
  Cc: Jiri Olsa, Andi Kleen, Stephane Eranian, Igor Lubashev,
	Alexander Shishkin, Namhyung Kim, Kees Cook, Jann Horn,
	Thomas Gleixner, Tvrtko Ursulin, Song Liu, linux-kernel,
	linux-security-module, selinux, intel-gfx, bpf, linux-parisc,
	linuxppc-dev, linux-perf-users, linux-arm-kernel, oprofile-list

On 18/12/2019 11:27, Alexey Budankov wrote:
> Open access to i915_perf monitoring for CAP_SYS_PERFMON privileged
> processes. For backward compatibility reasons access to i915_perf
> subsystem remains open for CAP_SYS_ADMIN privileged processes but
> CAP_SYS_ADMIN usage for secure i915_perf monitoring is discouraged
> with respect to CAP_SYS_PERFMON capability.
>
> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>

Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

> ---
>   drivers/gpu/drm/i915/i915_perf.c | 13 ++++++-------
>   1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index e42b86827d6b..e2697f8d04de 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -2748,10 +2748,10 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv,
>   	/* Similar to perf's kernel.perf_paranoid_cpu sysctl option
>   	 * we check a dev.i915.perf_stream_paranoid sysctl option
>   	 * to determine if it's ok to access system wide OA counters
> -	 * without CAP_SYS_ADMIN privileges.
> +	 * without CAP_SYS_PERFMON or CAP_SYS_ADMIN privileges.
>   	 */
>   	if (privileged_op &&
> -	    i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
> +	    i915_perf_stream_paranoid && !perfmon_capable()) {
>   		DRM_DEBUG("Insufficient privileges to open system-wide i915 perf stream\n");
>   		ret = -EACCES;
>   		goto err_ctx;
> @@ -2939,9 +2939,8 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
>   			} else
>   				oa_freq_hz = 0;
>   
> -			if (oa_freq_hz > i915_oa_max_sample_rate &&
> -			    !capable(CAP_SYS_ADMIN)) {
> -				DRM_DEBUG("OA exponent would exceed the max sampling frequency (sysctl dev.i915.oa_max_sample_rate) %uHz without root privileges\n",
> +			if (oa_freq_hz > i915_oa_max_sample_rate && !perfmon_capable()) {
> +				DRM_DEBUG("OA exponent would exceed the max sampling frequency (sysctl dev.i915.oa_max_sample_rate) %uHz without CAP_SYS_PERFMON or CAP_SYS_ADMIN privileges\n",
>   					  i915_oa_max_sample_rate);
>   				return -EACCES;
>   			}
> @@ -3328,7 +3327,7 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data,
>   		return -EINVAL;
>   	}
>   
> -	if (i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
> +	if (i915_perf_stream_paranoid && !perfmon_capable()) {
>   		DRM_DEBUG("Insufficient privileges to add i915 OA config\n");
>   		return -EACCES;
>   	}
> @@ -3474,7 +3473,7 @@ int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data,
>   		return -ENOTSUPP;
>   	}
>   
> -	if (i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
> +	if (i915_perf_stream_paranoid && !perfmon_capable()) {
>   		DRM_DEBUG("Insufficient privileges to remove i915 OA config\n");
>   		return -EACCES;
>   	}



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process
  2019-12-18  9:25 ` [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process Alexey Budankov
@ 2020-01-08 16:07   ` Peter Zijlstra
  2020-01-09 11:36     ` Alexey Budankov
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2020-01-08 16:07 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, jani.nikula,
	joonas.lahtinen, rodrigo.vivi, Alexei Starovoitov,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	james.bottomley, Serge Hallyn, James Morris, Will Deacon,
	Mark Rutland, Casey Schaufler, Robert Richter, Jiri Olsa,
	Andi Kleen, Stephane Eranian, Igor Lubashev, Alexander Shishkin,
	Namhyung Kim, Kees Cook, Jann Horn, Thomas Gleixner,
	Tvrtko Ursulin, Lionel Landwerlin, Song Liu, linux-kernel,
	linux-security-module, selinux, intel-gfx, bpf, linux-parisc,
	linuxppc-dev, linux-perf-users, linux-arm-kernel, oprofile-list

On Wed, Dec 18, 2019 at 12:25:35PM +0300, Alexey Budankov wrote:
> 
> Open access to perf_events monitoring for CAP_SYS_PERFMON privileged
> processes. For backward compatibility reasons access to perf_events
> subsystem remains open for CAP_SYS_ADMIN privileged processes but
> CAP_SYS_ADMIN usage for secure perf_events monitoring is discouraged
> with respect to CAP_SYS_PERFMON capability.
> 
> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
> ---
>  include/linux/perf_event.h | 6 +++---
>  kernel/events/core.c       | 6 +++---
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 34c7c6910026..f46acd69425f 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -1285,7 +1285,7 @@ static inline int perf_is_paranoid(void)
>  
>  static inline int perf_allow_kernel(struct perf_event_attr *attr)
>  {
> -	if (sysctl_perf_event_paranoid > 1 && !capable(CAP_SYS_ADMIN))
> +	if (sysctl_perf_event_paranoid > 1 && !perfmon_capable())
>  		return -EACCES;
>  
>  	return security_perf_event_open(attr, PERF_SECURITY_KERNEL);
> @@ -1293,7 +1293,7 @@ static inline int perf_allow_kernel(struct perf_event_attr *attr)
>  
>  static inline int perf_allow_cpu(struct perf_event_attr *attr)
>  {
> -	if (sysctl_perf_event_paranoid > 0 && !capable(CAP_SYS_ADMIN))
> +	if (sysctl_perf_event_paranoid > 0 && !perfmon_capable())
>  		return -EACCES;
>  
>  	return security_perf_event_open(attr, PERF_SECURITY_CPU);
> @@ -1301,7 +1301,7 @@ static inline int perf_allow_cpu(struct perf_event_attr *attr)
>  
>  static inline int perf_allow_tracepoint(struct perf_event_attr *attr)
>  {
> -	if (sysctl_perf_event_paranoid > -1 && !capable(CAP_SYS_ADMIN))
> +	if (sysctl_perf_event_paranoid > -1 && !perfmon_capable())
>  		return -EPERM;
>  
>  	return security_perf_event_open(attr, PERF_SECURITY_TRACEPOINT);

These are OK I suppose.

> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 059ee7116008..d9db414f2197 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -9056,7 +9056,7 @@ static int perf_kprobe_event_init(struct perf_event *event)
>  	if (event->attr.type != perf_kprobe.type)
>  		return -ENOENT;
>  
> -	if (!capable(CAP_SYS_ADMIN))
> +	if (!perfmon_capable())
>  		return -EACCES;
>  
>  	/*

This one only allows attaching to already extant kprobes, right? It does
not allow creation of kprobes.

> @@ -9116,7 +9116,7 @@ static int perf_uprobe_event_init(struct perf_event *event)
>  	if (event->attr.type != perf_uprobe.type)
>  		return -ENOENT;
>  
> -	if (!capable(CAP_SYS_ADMIN))
> +	if (!perfmon_capable())
>  		return -EACCES;
>  
>  	/*

Idem, I presume.

> @@ -11157,7 +11157,7 @@ SYSCALL_DEFINE5(perf_event_open,
>  	}
>  
>  	if (attr.namespaces) {
> -		if (!capable(CAP_SYS_ADMIN))
> +		if (!perfmon_capable())
>  			return -EACCES;
>  	}

And given we basically make the entire kernel observable with this CAP,
busting namespaces shouldn't be a problem either.

So yeah, I suppose that works.
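
For readers following the three perf_allow_*() hunks reviewed above, here is a small userspace sketch of how the perf_event_paranoid level and the capability check combine. The model_* names are hypothetical, and the paranoid level and capability result are passed in as plain arguments. Where the kernel goes on to call security_perf_event_open(), the sketch simply returns 0.

```c
#include <errno.h>
#include <stdbool.h>

/* Kernel profiling: restricted when paranoid > 1. */
static int model_perf_allow_kernel(int paranoid, bool perfmon_capable)
{
	if (paranoid > 1 && !perfmon_capable)
		return -EACCES;
	return 0; /* the kernel would call security_perf_event_open() here */
}

/* CPU-wide events: restricted when paranoid > 0. */
static int model_perf_allow_cpu(int paranoid, bool perfmon_capable)
{
	if (paranoid > 0 && !perfmon_capable)
		return -EACCES;
	return 0;
}

/* Raw tracepoint access: restricted when paranoid > -1; note -EPERM. */
static int model_perf_allow_tracepoint(int paranoid, bool perfmon_capable)
{
	if (paranoid > -1 && !perfmon_capable)
		return -EPERM;
	return 0;
}
```

With paranoid set to 2, an unprivileged caller is refused by all three checks, while a caller for which perfmon_capable() is true passes them regardless of the sysctl value.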

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process
  2020-01-08 16:07   ` Peter Zijlstra
@ 2020-01-09 11:36     ` Alexey Budankov
       [not found]       ` <20200110140234.GO2844@hirez.programming.kicks-ass.net>
  0 siblings, 1 reply; 27+ messages in thread
From: Alexey Budankov @ 2020-01-09 11:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, jani.nikula,
	joonas.lahtinen, rodrigo.vivi, Alexei Starovoitov,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	james.bottomley, Serge Hallyn, James Morris, Will Deacon,
	Mark Rutland, Casey Schaufler, Robert Richter, Jiri Olsa,
	Andi Kleen, Stephane Eranian, Igor Lubashev, Alexander Shishkin,
	Namhyung Kim, Kees Cook, Jann Horn, Thomas Gleixner,
	Tvrtko Ursulin, Lionel Landwerlin, Song Liu, linux-kernel,
	linux-security-module, selinux, intel-gfx, bpf, linux-parisc,
	linuxppc-dev, linux-perf-users, linux-arm-kernel, oprofile-list


On 08.01.2020 19:07, Peter Zijlstra wrote:
> On Wed, Dec 18, 2019 at 12:25:35PM +0300, Alexey Budankov wrote:
>>
>> Open access to perf_events monitoring for CAP_SYS_PERFMON privileged
>> processes. For backward compatibility reasons access to perf_events
>> subsystem remains open for CAP_SYS_ADMIN privileged processes but
>> CAP_SYS_ADMIN usage for secure perf_events monitoring is discouraged
>> with respect to CAP_SYS_PERFMON capability.
>>
>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>> ---
>>  include/linux/perf_event.h | 6 +++---
>>  kernel/events/core.c       | 6 +++---
>>  2 files changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>> index 34c7c6910026..f46acd69425f 100644
>> --- a/include/linux/perf_event.h
>> +++ b/include/linux/perf_event.h
>> @@ -1285,7 +1285,7 @@ static inline int perf_is_paranoid(void)
>>  
>>  static inline int perf_allow_kernel(struct perf_event_attr *attr)
>>  {
>> -	if (sysctl_perf_event_paranoid > 1 && !capable(CAP_SYS_ADMIN))
>> +	if (sysctl_perf_event_paranoid > 1 && !perfmon_capable())
>>  		return -EACCES;
>>  
>>  	return security_perf_event_open(attr, PERF_SECURITY_KERNEL);
>> @@ -1293,7 +1293,7 @@ static inline int perf_allow_kernel(struct perf_event_attr *attr)
>>  
>>  static inline int perf_allow_cpu(struct perf_event_attr *attr)
>>  {
>> -	if (sysctl_perf_event_paranoid > 0 && !capable(CAP_SYS_ADMIN))
>> +	if (sysctl_perf_event_paranoid > 0 && !perfmon_capable())
>>  		return -EACCES;
>>  
>>  	return security_perf_event_open(attr, PERF_SECURITY_CPU);
>> @@ -1301,7 +1301,7 @@ static inline int perf_allow_cpu(struct perf_event_attr *attr)
>>  
>>  static inline int perf_allow_tracepoint(struct perf_event_attr *attr)
>>  {
>> -	if (sysctl_perf_event_paranoid > -1 && !capable(CAP_SYS_ADMIN))
>> +	if (sysctl_perf_event_paranoid > -1 && !perfmon_capable())
>>  		return -EPERM;
>>  
>>  	return security_perf_event_open(attr, PERF_SECURITY_TRACEPOINT);
> 
> These are OK I suppose.
> 
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 059ee7116008..d9db414f2197 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -9056,7 +9056,7 @@ static int perf_kprobe_event_init(struct perf_event *event)
>>  	if (event->attr.type != perf_kprobe.type)
>>  		return -ENOENT;
>>  
>> -	if (!capable(CAP_SYS_ADMIN))
>> +	if (!perfmon_capable())
>>  		return -EACCES;
>>  
>>  	/*
> 
> This one only allows attaching to already extant kprobes, right? It does
> not allow creation of kprobes.

This unblocks creation of local trace kprobes and uprobes by a CAP_SYS_PERFMON
privileged process, exactly as for a CAP_SYS_ADMIN privileged process.

> 
>> @@ -9116,7 +9116,7 @@ static int perf_uprobe_event_init(struct perf_event *event)
>>  	if (event->attr.type != perf_uprobe.type)
>>  		return -ENOENT;
>>  
>> -	if (!capable(CAP_SYS_ADMIN))
>> +	if (!perfmon_capable())
>>  		return -EACCES;
>>  
>>  	/*
> 
> Idem, I presume.
> 
>> @@ -11157,7 +11157,7 @@ SYSCALL_DEFINE5(perf_event_open,
>>  	}
>>  
>>  	if (attr.namespaces) {
>> -		if (!capable(CAP_SYS_ADMIN))
>> +		if (!perfmon_capable())
>>  			return -EACCES;
>>  	}
> 
> And given we basically make the entire kernel observable with this CAP,
> busting namespaces shouldn't be a problem either.
> 
> So yeah, I suppose that works.
> 
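
To make the "creation of local trace kprobes" point above concrete, here is a sketch of how userspace might prepare a perf_event_attr for a kprobe created through the perf_event interface. It assumes a kernel and uapi headers that expose the kprobe PMU under /sys/bus/event_source/devices/kprobe and the kprobe_func and probe_offset fields. The helper names are hypothetical, and the sketch stops short of issuing the perf_event_open() syscall.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Read the dynamically assigned PMU type for kprobes from sysfs.
 * Returns the type, or -1 if the file is missing or unreadable. */
static int read_kprobe_pmu_type(void)
{
	FILE *f = fopen("/sys/bus/event_source/devices/kprobe/type", "r");
	int type = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &type) != 1)
		type = -1;
	fclose(f);
	return type;
}

/* Fill attr for a local kprobe at the entry of the named kernel function. */
static void fill_kprobe_attr(struct perf_event_attr *attr, uint32_t pmu_type,
			     const char *func)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = pmu_type;
	attr->kprobe_func = (uint64_t)(uintptr_t)func; /* symbol name */
	attr->probe_offset = 0;                        /* offset into symbol */
}
```

Passing such an attr to perf_event_open() is what reaches perf_kprobe_event_init(), so the capability check changed in this patch decides whether the call fails with -EACCES.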

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process
       [not found]                 ` <5e191833.1c69fb81.8bc25.a88c@mx.google.com>
@ 2020-01-11  9:57                   ` Alexey Budankov
  2020-01-13 20:39                     ` Song Liu
  2020-01-14  3:25                     ` Masami Hiramatsu
  0 siblings, 2 replies; 27+ messages in thread
From: Alexey Budankov @ 2020-01-11  9:57 UTC (permalink / raw)
  To: arnaldo.melo, Song Liu, Masami Hiramatsu
  Cc: Peter Zijlstra, Ingo Molnar, jani.nikula, joonas.lahtinen,
	rodrigo.vivi, Alexei Starovoitov, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, james.bottomley, Serge Hallyn,
	James Morris, Will Deacon, Mark Rutland, Casey Schaufler,
	Robert Richter, Jiri Olsa, Andi Kleen, Stephane Eranian,
	Igor Lubashev, Alexander Shishkin, Namhyung Kim, linux-kernel


On 11.01.2020 3:35, arnaldo.melo@gmail.com wrote:
> From: Arnaldo Carvalho de Melo <acme@kernel.org>
> Message-ID: <A7F0BF73-9189-44BA-9264-C88F2F51CBF3@kernel.org>
> 
> On January 10, 2020 9:23:27 PM GMT-03:00, Song Liu <songliubraving@fb.com> wrote:
>>
>>
>>> On Jan 10, 2020, at 3:47 PM, Masami Hiramatsu <mhiramat@kernel.org>
>> wrote:
>>>
>>> On Fri, 10 Jan 2020 13:45:31 -0300
>>> Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
>>>
>>>> Em Sat, Jan 11, 2020 at 12:52:13AM +0900, Masami Hiramatsu escreveu:
>>>>> On Fri, 10 Jan 2020 15:02:34 +0100 Peter Zijlstra
>> <peterz@infradead.org> wrote:
>>>>>> Again, this only allows attaching to previously created kprobes,
>> it does
>>>>>> not allow creating kprobes, right?
>>>>
>>>>>> That is; I don't think CAP_SYS_PERFMON should be allowed to create
>>>>>> kprobes.
>>>>
>>>>>> As might be clear; I don't actually know what the user-ABI is for
>>>>>> creating kprobes.
>>>>
>>>>> There are 2 ABIs nowadays, ftrace and ebpf. perf-probe uses ftrace
>> interface to
>>>>> define new kprobe events, and those events are treated as
>> completely same as
>>>>> tracepoint events. On the other hand, ebpf tries to define new
>> probe event
>>>>> via perf_event interface. Above one is that interface. IOW, it
>> creates new kprobe.
>>>>
>>>> Masami, any plans to make 'perf probe' use the perf_event_open()
>>>> interface for creating kprobes/uprobes?
>>>
>>> Would you mean perf probe to switch to perf_event_open()?
>>> No, perf probe is for setting up the ftrace probe events. I think we
>> can add an
>>> option to use perf_event_open(). But current kprobe creation from
>> perf_event_open()
>>> is separated from ftrace by design.
>>
>> I guess we can extend event parser to understand kprobe directly.
>> Instead of
>>
>> 	perf probe kernel_func
>> 	perf stat/record -e probe:kernel_func ...
>>
>> We can just do 
>>
>> 	perf stat/record -e kprobe:kernel_func ...
> 
> 
> You took the words from my mouth, exactly, that is a perfect use case, an alternative to the 'perf probe' one of making a disabled event that then gets activated via record/stat/trace, in many cases it's better, removes the explicit probe setup case.

Arnaldo, Masami, Song,

What do you think about making this also open to CAP_SYS_PERFMON privileged processes?
Could you please also review and comment on patch 5/9 for bpf_trace.c?

Thanks,
Alexey

> 
> Regards, 
> 
> - Arnaldo
> 
>>
>> Thanks,
>> Song
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process
  2020-01-11  9:57                   ` Alexey Budankov
@ 2020-01-13 20:39                     ` Song Liu
  2020-01-14  3:25                     ` Masami Hiramatsu
  1 sibling, 0 replies; 27+ messages in thread
From: Song Liu @ 2020-01-13 20:39 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Masami Hiramatsu, Peter Zijlstra,
	Ingo Molnar, jani.nikula, joonas.lahtinen, rodrigo.vivi,
	Alexei Starovoitov, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, james.bottomley, Serge Hallyn, James Morris,
	Will Deacon, Mark Rutland, Casey Schaufler, Robert Richter,
	Jiri Olsa, Andi Kleen, Stephane Eranian, Igor Lubashev,
	Alexander Shishkin, Namhyung Kim, linux-kernel



> On Jan 11, 2020, at 1:57 AM, Alexey Budankov <alexey.budankov@linux.intel.com> wrote:
> 
> 
> On 11.01.2020 3:35, arnaldo.melo@gmail.com wrote:
>> From: Arnaldo Carvalho de Melo <acme@kernel.org>
>> Message-ID: <A7F0BF73-9189-44BA-9264-C88F2F51CBF3@kernel.org>
>> 
>> On January 10, 2020 9:23:27 PM GMT-03:00, Song Liu <songliubraving@fb.com> wrote:
>>> 
>>> 
>>>> On Jan 10, 2020, at 3:47 PM, Masami Hiramatsu <mhiramat@kernel.org>
>>> wrote:
>>>> 
>>>> On Fri, 10 Jan 2020 13:45:31 -0300
>>>> Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
>>>> 
>>>>> Em Sat, Jan 11, 2020 at 12:52:13AM +0900, Masami Hiramatsu escreveu:
>>>>>> On Fri, 10 Jan 2020 15:02:34 +0100 Peter Zijlstra
>>> <peterz@infradead.org> wrote:
>>>>>>> Again, this only allows attaching to previously created kprobes,
>>> it does
>>>>>>> not allow creating kprobes, right?
>>>>> 
>>>>>>> That is; I don't think CAP_SYS_PERFMON should be allowed to create
>>>>>>> kprobes.
>>>>> 
>>>>>>> As might be clear; I don't actually know what the user-ABI is for
>>>>>>> creating kprobes.
>>>>> 
>>>>>> There are 2 ABIs nowadays, ftrace and ebpf. perf-probe uses ftrace
>>> interface to
>>>>>> define new kprobe events, and those events are treated as
>>> completely same as
>>>>>> tracepoint events. On the other hand, ebpf tries to define new
>>> probe event
>>>>>> via perf_event interface. Above one is that interface. IOW, it
>>> creates new kprobe.
>>>>> 
>>>>> Masami, any plans to make 'perf probe' use the perf_event_open()
>>>>> interface for creating kprobes/uprobes?
>>>> 
>>>> Would you mean perf probe to switch to perf_event_open()?
>>>> No, perf probe is for setting up the ftrace probe events. I think we
>>> can add an
>>>> option to use perf_event_open(). But current kprobe creation from
>>> perf_event_open()
>>>> is separated from ftrace by design.
>>> 
>>> I guess we can extend event parser to understand kprobe directly.
>>> Instead of
>>> 
>>> 	perf probe kernel_func
>>> 	perf stat/record -e probe:kernel_func ...
>>> 
>>> We can just do 
>>> 
>>> 	perf stat/record -e kprobe:kernel_func ...
>> 
>> 
>> You took the words from my mouth, exactly, that is a perfect use case, an alternative to the 'perf probe' one of making a disabled event that then gets activated via record/stat/trace, in many cases it's better, removes the explicit probe setup case.
> 
> Arnaldo, Masami, Song,
> 
> What do you think about making this also open to CAP_SYS_PERFMON privileged processes?

I think we should at least allow CAP_SYS_PERFMON to create some kprobes. Maybe we can
limit that to per-task kprobes, and the task should be owned by the user?

Thanks,
Song



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process
  2020-01-11  9:57                   ` Alexey Budankov
  2020-01-13 20:39                     ` Song Liu
@ 2020-01-14  3:25                     ` Masami Hiramatsu
  2020-01-14  5:17                       ` Alexei Starovoitov
  1 sibling, 1 reply; 27+ messages in thread
From: Masami Hiramatsu @ 2020-01-14  3:25 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: arnaldo.melo, Song Liu, Peter Zijlstra, Ingo Molnar, jani.nikula,
	joonas.lahtinen, rodrigo.vivi, Alexei Starovoitov,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	james.bottomley, Serge Hallyn, James Morris, Will Deacon,
	Mark Rutland, Casey Schaufler, Robert Richter, Jiri Olsa,
	Andi Kleen, Stephane Eranian, Igor Lubashev, Alexander Shishkin,
	Namhyung Kim, linux-kernel

On Sat, 11 Jan 2020 12:57:18 +0300
Alexey Budankov <alexey.budankov@linux.intel.com> wrote:

> 
> On 11.01.2020 3:35, arnaldo.melo@gmail.com wrote:

> > Message-ID: <A7F0BF73-9189-44BA-9264-C88F2F51CBF3@kernel.org>
> > 
> > On January 10, 2020 9:23:27 PM GMT-03:00, Song Liu <songliubraving@fb.com> wrote:
> >>
> >>
> >>> On Jan 10, 2020, at 3:47 PM, Masami Hiramatsu <mhiramat@kernel.org>
> >> wrote:
> >>>
> >>> On Fri, 10 Jan 2020 13:45:31 -0300
> >>> Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> >>>
> >>>> Em Sat, Jan 11, 2020 at 12:52:13AM +0900, Masami Hiramatsu escreveu:
> >>>>> On Fri, 10 Jan 2020 15:02:34 +0100 Peter Zijlstra
> >> <peterz@infradead.org> wrote:
> >>>>>> Again, this only allows attaching to previously created kprobes,
> >> it does
> >>>>>> not allow creating kprobes, right?
> >>>>
> >>>>>> That is; I don't think CAP_SYS_PERFMON should be allowed to create
> >>>>>> kprobes.
> >>>>
> >>>>>> As might be clear; I don't actually know what the user-ABI is for
> >>>>>> creating kprobes.
> >>>>
> >>>>> There are 2 ABIs nowadays, ftrace and ebpf. perf-probe uses ftrace
> >> interface to
> >>>>> define new kprobe events, and those events are treated as
> >> completely same as
> >>>>> tracepoint events. On the other hand, ebpf tries to define new
> >> probe event
> >>>>> via perf_event interface. Above one is that interface. IOW, it
> >> creates new kprobe.
> >>>>
> >>>> Masami, any plans to make 'perf probe' use the perf_event_open()
> >>>> interface for creating kprobes/uprobes?
> >>>
> >>> Would you mean perf probe to switch to perf_event_open()?
> >>> No, perf probe is for setting up the ftrace probe events. I think we
> >> can add an
> >>> option to use perf_event_open(). But current kprobe creation from
> >> perf_event_open()
> >>> is separated from ftrace by design.
> >>
> >> I guess we can extend event parser to understand kprobe directly.
> >> Instead of
> >>
> >> 	perf probe kernel_func
> >> 	perf stat/record -e probe:kernel_func ...
> >>
> >> We can just do 
> >>
> >> 	perf stat/record -e kprobe:kernel_func ...
> > 
> > 
> > You took the words from my mouth, exactly, that is a perfect use case, an alternative to the 'perf probe' one of making a disabled event that then gets activated via record/stat/trace, in many cases it's better, removes the explicit probe setup case.
> 
> Arnaldo, Masami, Song,
> 
> What do you think about making this also open to CAP_SYS_PERFMON privileged processes?
> Could you please also review and comment on patch 5/9 for bpf_trace.c?

As we discussed in the CAP_SYS_TRACING RFC series last year, I only expected
it to be opened for enabling/disabling kprobes, not for creation.

If we can accept that a user who has no admin privilege but has CAP_SYS_PERFMON
may shoot themselves in the foot at their own risk, I'm OK to allow it. (Even so,
it should check the maximum number of probes that can be created, with something
like ulimit.)
I think we have by now fixed all such kprobe-related kernel crash problems on x86,
but I'm not sure about other archs, especially on devices I cannot reach.
I need more help to stabilize it.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process
  2020-01-14  3:25                     ` Masami Hiramatsu
@ 2020-01-14  5:17                       ` Alexei Starovoitov
  2020-01-14  9:47                         ` Alexey Budankov
  2020-01-14 12:04                         ` Masami Hiramatsu
  0 siblings, 2 replies; 27+ messages in thread
From: Alexei Starovoitov @ 2020-01-14  5:17 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Alexey Budankov, Arnaldo Carvalho de Melo, Song Liu,
	Peter Zijlstra, Ingo Molnar, jani.nikula, joonas.lahtinen,
	rodrigo.vivi, Alexei Starovoitov, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, james.bottomley, Serge Hallyn,
	James Morris, Will Deacon, Mark Rutland, Casey Schaufler,
	Robert Richter, Jiri Olsa, Andi Kleen, Stephane Eranian,
	Igor Lubashev, Alexander Shishkin, Namhyung Kim, linux-kernel

On Mon, Jan 13, 2020 at 7:25 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> On Sat, 11 Jan 2020 12:57:18 +0300
> Alexey Budankov <alexey.budankov@linux.intel.com> wrote:
>
> >
> > On 11.01.2020 3:35, arnaldo.melo@gmail.com wrote:
>
> > > Message-ID: <A7F0BF73-9189-44BA-9264-C88F2F51CBF3@kernel.org>
> > >
> > > On January 10, 2020 9:23:27 PM GMT-03:00, Song Liu <songliubraving@fb.com> wrote:
> > >>
> > >>
> > >>> On Jan 10, 2020, at 3:47 PM, Masami Hiramatsu <mhiramat@kernel.org>
> > >> wrote:
> > >>>
> > >>> On Fri, 10 Jan 2020 13:45:31 -0300
> > >>> Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > >>>
> > >>>> Em Sat, Jan 11, 2020 at 12:52:13AM +0900, Masami Hiramatsu escreveu:
> > >>>>> On Fri, 10 Jan 2020 15:02:34 +0100 Peter Zijlstra
> > >> <peterz@infradead.org> wrote:
> > >>>>>> Again, this only allows attaching to previously created kprobes,
> > >> it does
> > >>>>>> not allow creating kprobes, right?
> > >>>>
> > >>>>>> That is; I don't think CAP_SYS_PERFMON should be allowed to create
> > >>>>>> kprobes.
> > >>>>
> > >>>>>> As might be clear; I don't actually know what the user-ABI is for
> > >>>>>> creating kprobes.
> > >>>>
> > >>>>> There are 2 ABIs nowadays, ftrace and ebpf. perf-probe uses ftrace
> > >> interface to
> > >>>>> define new kprobe events, and those events are treated as
> > >> completely same as
> > >>>>> tracepoint events. On the other hand, ebpf tries to define new
> > >> probe event
> > >>>>> via perf_event interface. Above one is that interface. IOW, it
> > >> creates new kprobe.
> > >>>>
> > >>>> Masami, any plans to make 'perf probe' use the perf_event_open()
> > >>>> interface for creating kprobes/uprobes?
> > >>>
> > >>> Would you mean perf probe to switch to perf_event_open()?
> > >>> No, perf probe is for setting up the ftrace probe events. I think we
> > >> can add an
> > >>> option to use perf_event_open(). But current kprobe creation from
> > >> perf_event_open()
> > >>> is separated from ftrace by design.
> > >>
> > >> I guess we can extend event parser to understand kprobe directly.
> > >> Instead of
> > >>
> > >>    perf probe kernel_func
> > >>    perf stat/record -e probe:kernel_func ...
> > >>
> > >> We can just do
> > >>
> > >>    perf stat/record -e kprobe:kernel_func ...
> > >
> > >
> > > You took the words from my mouth, exactly, that is a perfect use case, an alternative to the 'perf probe' one of making a disabled event that then gets activated via record/stat/trace, in many cases it's better, removes the explicit probe setup case.
> >
> > Arnaldo, Masami, Song,
> >
> > What do you think about making this also open to CAP_SYS_PERFMON privileged processes?
> > Could you please also review and comment on patch 5/9 for bpf_trace.c?
>
> As we talked at RFC series of CAP_SYS_TRACING last year, I just expected
> to open it for enabling/disabling kprobes, not for creation.
>
> If we can accept user who has no admin priviledge but the CAP_SYS_PERFMON,
> to shoot their foot by their own risk, I'm OK to allow it. (Even though,
> it should check the max number of probes to be created by something like
> ulimit)
> I think nowadays we have fixed all such kernel crash problems on x86,
> but not sure for other archs, especially on the devices I can not reach.
> I need more help to stabilize it.

I don't see how enable/disable is any safer than creation.
If there are kernel bugs in kprobes the kernel will crash anyway.
I think such partial CAP_SYS_PERFMON would be very confusing to the users.
CAP_* is about delegation of root privileges to non-root.
Delegating some of it is ok, but disallowing creation makes it useless
for bpf tracing, so we would need to add another CAP later.
Hence I suggest doing it right away instead of splitting
sys_perf_event_open() access into two CAPs.
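
For reference, the two operations being debated (creating a kprobe and enabling
it) both go through the same perf_event_open() file descriptor. Below is a
minimal userspace sketch, assuming a kernel (4.17 or later) that exposes the
kprobe PMU under /sys/bus/event_source/devices/kprobe. The helper names
read_kprobe_pmu_type(), init_kprobe_attr(), open_kprobe() and enable_kprobe()
are made up for illustration and are not part of the patch set.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the dynamic PMU type id of the kprobe PMU; returns -1 if unavailable. */
int read_kprobe_pmu_type(void)
{
	FILE *f = fopen("/sys/bus/event_source/devices/kprobe/type", "r");
	int type = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &type) != 1)
		type = -1;
	fclose(f);
	return type;
}

/* Fill attr for a kprobe on kernel function func, created disabled. */
void init_kprobe_attr(struct perf_event_attr *attr, int pmu_type,
		      const char *func)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = (uint32_t)pmu_type;
	attr->kprobe_func = (uint64_t)(uintptr_t)func;	/* symbol name */
	attr->probe_offset = 0;				/* function entry */
	attr->disabled = 1;				/* enabled later */
}

/*
 * Create the probe. pid >= 0 with cpu == -1 gives a per-task event;
 * pid == -1 with cpu >= 0 gives a system-wide event on that CPU.
 * Returns an fd, or -1 on failure (for example when the caller lacks
 * the required privilege).
 */
int open_kprobe(const char *func, pid_t pid, int cpu)
{
	struct perf_event_attr attr;
	int type = read_kprobe_pmu_type();

	if (type < 0)
		return -1;
	init_kprobe_attr(&attr, type, func);
	return (int)syscall(SYS_perf_event_open, &attr, pid, cpu, -1,
			    PERF_FLAG_FD_CLOEXEC);
}

/* Enabling is a separate ioctl on the fd that creation returned. */
int enable_kprobe(int fd)
{
	return ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
}
```

Calling open_kprobe() with pid set to a task and cpu set to -1 is the per-task
form Song mentions. Which capability the kernel demands for that call is what
this thread is deciding, so the sketch takes no position on it.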

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process
  2020-01-14  5:17                       ` Alexei Starovoitov
@ 2020-01-14  9:47                         ` Alexey Budankov
  2020-01-14 18:06                           ` Alexei Starovoitov
  2020-01-14 12:04                         ` Masami Hiramatsu
  1 sibling, 1 reply; 27+ messages in thread
From: Alexey Budankov @ 2020-01-14  9:47 UTC (permalink / raw)
  To: Alexei Starovoitov, Masami Hiramatsu
  Cc: Arnaldo Carvalho de Melo, Song Liu, Peter Zijlstra, Ingo Molnar,
	jani.nikula, joonas.lahtinen, rodrigo.vivi, Alexei Starovoitov,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	james.bottomley, Serge Hallyn, James Morris, Will Deacon,
	Mark Rutland, Casey Schaufler, Robert Richter, Jiri Olsa,
	Andi Kleen, Stephane Eranian, Igor Lubashev, Alexander Shishkin,
	Namhyung Kim, linux-kernel


On 14.01.2020 8:17, Alexei Starovoitov wrote:
> On Mon, Jan 13, 2020 at 7:25 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>>
>> On Sat, 11 Jan 2020 12:57:18 +0300
>> Alexey Budankov <alexey.budankov@linux.intel.com> wrote:
>>
>>>
>>> On 11.01.2020 3:35, arnaldo.melo@gmail.com wrote:
>>
>>>> Message-ID: <A7F0BF73-9189-44BA-9264-C88F2F51CBF3@kernel.org>
>>>>
>>>> On January 10, 2020 9:23:27 PM GMT-03:00, Song Liu <songliubraving@fb.com> wrote:
>>>>>
>>>>>
>>>>>> On Jan 10, 2020, at 3:47 PM, Masami Hiramatsu <mhiramat@kernel.org>
>>>>> wrote:
>>>>>>
>>>>>> On Fri, 10 Jan 2020 13:45:31 -0300
>>>>>> Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
>>>>>>
>>>>>>> Em Sat, Jan 11, 2020 at 12:52:13AM +0900, Masami Hiramatsu escreveu:
>>>>>>>> On Fri, 10 Jan 2020 15:02:34 +0100 Peter Zijlstra
>>>>> <peterz@infradead.org> wrote:
>>>>>>>>> Again, this only allows attaching to previously created kprobes,
>>>>> it does
>>>>>>>>> not allow creating kprobes, right?
>>>>>>>
>>>>>>>>> That is; I don't think CAP_SYS_PERFMON should be allowed to create
>>>>>>>>> kprobes.
>>>>>>>
>>>>>>>>> As might be clear; I don't actually know what the user-ABI is for
>>>>>>>>> creating kprobes.
>>>>>>>
>>>>>>>> There are 2 ABIs nowadays, ftrace and ebpf. perf-probe uses ftrace
>>>>> interface to
>>>>>>>> define new kprobe events, and those events are treated as
>>>>> completely same as
>>>>>>>> tracepoint events. On the other hand, ebpf tries to define new
>>>>> probe event
>>>>>>>> via perf_event interface. Above one is that interface. IOW, it
>>>>> creates new kprobe.
>>>>>>>
>>>>>>> Masami, any plans to make 'perf probe' use the perf_event_open()
>>>>>>> interface for creating kprobes/uprobes?
>>>>>>
>>>>>> Would you mean perf probe to switch to perf_event_open()?
>>>>>> No, perf probe is for setting up the ftrace probe events. I think we
>>>>> can add an
>>>>>> option to use perf_event_open(). But current kprobe creation from
>>>>> perf_event_open()
>>>>>> is separated from ftrace by design.
>>>>>
>>>>> I guess we can extend event parser to understand kprobe directly.
>>>>> Instead of
>>>>>
>>>>>    perf probe kernel_func
>>>>>    perf stat/record -e probe:kernel_func ...
>>>>>
>>>>> We can just do
>>>>>
>>>>>    perf stat/record -e kprobe:kernel_func ...
>>>>
>>>>
>>>> You took the words from my mouth, exactly, that is a perfect use case, an alternative to the 'perf probe' one of making a disabled event that then gets activated via record/stat/trace, in many cases it's better, removes the explicit probe setup case.
>>>
>>> Arnaldo, Masami, Song,
>>>
>>> What do you think about making this also open to CAP_SYS_PERFMON privileged processes?
>>> Could you please also review and comment on patch 5/9 for bpf_trace.c?
>>
>> As we talked at RFC series of CAP_SYS_TRACING last year, I just expected
>> to open it for enabling/disabling kprobes, not for creation.
>>
>> If we can accept user who has no admin priviledge but the CAP_SYS_PERFMON,
>> to shoot their foot by their own risk, I'm OK to allow it. (Even though,
>> it should check the max number of probes to be created by something like
>> ulimit)
>> I think nowadays we have fixed all such kernel crash problems on x86,
>> but not sure for other archs, especially on the devices I can not reach.
>> I need more help to stabilize it.
> 
> I don't see how enable/disable is any safer than creation.
> If there are kernel bugs in kprobes the kernel will crash anyway.
> I think such partial CAP_SYS_PERFMON would be very confusing to the users.
> CAP_* is about delegation of root privileges to non-root.
> Delegating some of it is ok, but disallowing creation makes it useless
> for bpf tracing, so we would need to add another CAP later.
> Hence I suggest to do it right away instead of breaking
> sys_perf_even_open() access into two CAPs.
> 

Alexei, Masami,

Thanks for your meaningful input.
If we know in advance that it can still crash the system in some cases and on
some archs, even though root fully controls delegation through CAP_SYS_PERFMON,
such delegation looks premature until the crashes are avoided. So it looks like
access to eBPF for CAP_SYS_PERFMON privileged processes is a subject for
a separate patch set.

Thanks,
Alexey

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process
  2020-01-14  5:17                       ` Alexei Starovoitov
  2020-01-14  9:47                         ` Alexey Budankov
@ 2020-01-14 12:04                         ` Masami Hiramatsu
  1 sibling, 0 replies; 27+ messages in thread
From: Masami Hiramatsu @ 2020-01-14 12:04 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexey Budankov, Arnaldo Carvalho de Melo, Song Liu,
	Peter Zijlstra, Ingo Molnar, jani.nikula, joonas.lahtinen,
	rodrigo.vivi, Alexei Starovoitov, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, james.bottomley, Serge Hallyn,
	James Morris, Will Deacon, Mark Rutland, Casey Schaufler,
	Robert Richter, Jiri Olsa, Andi Kleen, Stephane Eranian,
	Igor Lubashev, Alexander Shishkin, Namhyung Kim, linux-kernel

On Mon, 13 Jan 2020 21:17:49 -0800
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Mon, Jan 13, 2020 at 7:25 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> >
> > On Sat, 11 Jan 2020 12:57:18 +0300
> > Alexey Budankov <alexey.budankov@linux.intel.com> wrote:
> >
> > >
> > > On 11.01.2020 3:35, arnaldo.melo@gmail.com wrote:
> >
> > > > Message-ID: <A7F0BF73-9189-44BA-9264-C88F2F51CBF3@kernel.org>
> > > >
> > > > On January 10, 2020 9:23:27 PM GMT-03:00, Song Liu <songliubraving@fb.com> wrote:
> > > >>
> > > >>
> > > >>> On Jan 10, 2020, at 3:47 PM, Masami Hiramatsu <mhiramat@kernel.org>
> > > >> wrote:
> > > >>>
> > > >>> On Fri, 10 Jan 2020 13:45:31 -0300
> > > >>> Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > > >>>
> > > >>>> Em Sat, Jan 11, 2020 at 12:52:13AM +0900, Masami Hiramatsu escreveu:
> > > >>>>> On Fri, 10 Jan 2020 15:02:34 +0100 Peter Zijlstra
> > > >> <peterz@infradead.org> wrote:
> > > >>>>>> Again, this only allows attaching to previously created kprobes,
> > > >> it does
> > > >>>>>> not allow creating kprobes, right?
> > > >>>>
> > > >>>>>> That is; I don't think CAP_SYS_PERFMON should be allowed to create
> > > >>>>>> kprobes.
> > > >>>>
> > > >>>>>> As might be clear; I don't actually know what the user-ABI is for
> > > >>>>>> creating kprobes.
> > > >>>>
> > > >>>>> There are 2 ABIs nowadays, ftrace and ebpf. perf-probe uses ftrace
> > > >> interface to
> > > >>>>> define new kprobe events, and those events are treated as
> > > >> completely same as
> > > >>>>> tracepoint events. On the other hand, ebpf tries to define new
> > > >> probe event
> > > >>>>> via perf_event interface. Above one is that interface. IOW, it
> > > >> creates new kprobe.
> > > >>>>
> > > >>>> Masami, any plans to make 'perf probe' use the perf_event_open()
> > > >>>> interface for creating kprobes/uprobes?
> > > >>>
> > > >>> Would you mean perf probe to switch to perf_event_open()?
> > > >>> No, perf probe is for setting up the ftrace probe events. I think we
> > > >> can add an
> > > >>> option to use perf_event_open(). But current kprobe creation from
> > > >> perf_event_open()
> > > >>> is separated from ftrace by design.
> > > >>
> > > >> I guess we can extend event parser to understand kprobe directly.
> > > >> Instead of
> > > >>
> > > >>    perf probe kernel_func
> > > >>    perf stat/record -e probe:kernel_func ...
> > > >>
> > > >> We can just do
> > > >>
> > > >>    perf stat/record -e kprobe:kernel_func ...
> > > >
> > > >
> > > > You took the words from my mouth, exactly, that is a perfect use case, an alternative to the 'perf probe' one of making a disabled event that then gets activated via record/stat/trace, in many cases it's better, removes the explicit probe setup case.
> > >
> > > Arnaldo, Masami, Song,
> > >
> > > What do you think about making this also open to CAP_SYS_PERFMON privileged processes?
> > > Could you please also review and comment on patch 5/9 for bpf_trace.c?
> >
> > As we talked at RFC series of CAP_SYS_TRACING last year, I just expected
> > to open it for enabling/disabling kprobes, not for creation.
> >
> > If we can accept user who has no admin priviledge but the CAP_SYS_PERFMON,
> > to shoot their foot by their own risk, I'm OK to allow it. (Even though,
> > it should check the max number of probes to be created by something like
> > ulimit)
> > I think nowadays we have fixed all such kernel crash problems on x86,
> > but not sure for other archs, especially on the devices I can not reach.
> > I need more help to stabilize it.
> 
> I don't see how enable/disable is any safer than creation.
> If there are kernel bugs in kprobes the kernel will crash anyway.

Why? The admin can test the probes before they are used via bpf.

My point was that only the admin can make the decision to allow (or delegate) the
privilege to a user, and if that is OK, I don't mind it.
(Maybe it is better to provide a knob that allows this CAP only for the admin.)

> I think such partial CAP_SYS_PERFMON would be very confusing to the users.
> CAP_* is about delegation of root privileges to non-root.
> Delegating some of it is ok, but disallowing creation makes it useless
> for bpf tracing, so we would need to add another CAP later.
> Hence I suggest to do it right away instead of breaking
> sys_perf_even_open() access into two CAPs.

I understand that a single strong CAP will be useful anyway (even if
it is CAP_SYS_ADMIN). My only concern is that if it causes an issue and
someone wants to mitigate it, it would be sad if the only way were to disable
all tracing facilities.

What about providing a sysctl to control the power of the CAP? Maybe
it is also good from the viewpoint of system security.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process
  2020-01-14  9:47                         ` Alexey Budankov
@ 2020-01-14 18:06                           ` Alexei Starovoitov
  2020-01-14 18:50                             ` Alexey Budankov
  0 siblings, 1 reply; 27+ messages in thread
From: Alexei Starovoitov @ 2020-01-14 18:06 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Masami Hiramatsu, Arnaldo Carvalho de Melo, Song Liu,
	Peter Zijlstra, Ingo Molnar, jani.nikula, joonas.lahtinen,
	rodrigo.vivi, Alexei Starovoitov, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, james.bottomley, Serge Hallyn,
	James Morris, Will Deacon, Mark Rutland, Casey Schaufler,
	Robert Richter, Jiri Olsa, Andi Kleen, Stephane Eranian,
	Igor Lubashev, Alexander Shishkin, Namhyung Kim, linux-kernel

On Tue, Jan 14, 2020 at 1:47 AM Alexey Budankov
<alexey.budankov@linux.intel.com> wrote:
> >>
> >> As we talked at RFC series of CAP_SYS_TRACING last year, I just expected
> >> to open it for enabling/disabling kprobes, not for creation.
> >>
> >> If we can accept user who has no admin priviledge but the CAP_SYS_PERFMON,
> >> to shoot their foot by their own risk, I'm OK to allow it. (Even though,
> >> it should check the max number of probes to be created by something like
> >> ulimit)
> >> I think nowadays we have fixed all such kernel crash problems on x86,
> >> but not sure for other archs, especially on the devices I can not reach.
> >> I need more help to stabilize it.
> >
> > I don't see how enable/disable is any safer than creation.
> > If there are kernel bugs in kprobes the kernel will crash anyway.
> > I think such partial CAP_SYS_PERFMON would be very confusing to the users.
> > CAP_* is about delegation of root privileges to non-root.
> > Delegating some of it is ok, but disallowing creation makes it useless
> > for bpf tracing, so we would need to add another CAP later.
> > Hence I suggest to do it right away instead of breaking
> > sys_perf_even_open() access into two CAPs.
> >
>
> Alexei, Masami,
>
> Thanks for your meaningful input.
> If we know in advance that it still can crash the system in some cases and on
> some archs, even though root fully controls delegation thru CAP_SYS_PERFMON,
> such delegation looks premature until the crashes are avoided. So it looks like
> access to eBPF for CAP_SYS_PERFMON privileged processes is the subject for
> a separate patch set.

perf_event_open is always dangerous. sw cannot guarantee non-bugginess of hw.
imo adding a cap just for pmc is pointless.
if you add a new cap it should cover all of sys_perf_event_open syscall.
subdividing it into sw vs hw counters, kprobe create vs enable, etc will
be the source of ongoing confusion. nack to such cap.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process
  2020-01-14 18:06                           ` Alexei Starovoitov
@ 2020-01-14 18:50                             ` Alexey Budankov
  2020-01-15  1:52                               ` Alexei Starovoitov
  2020-01-15  9:45                               ` Masami Hiramatsu
  0 siblings, 2 replies; 27+ messages in thread
From: Alexey Budankov @ 2020-01-14 18:50 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Masami Hiramatsu, Arnaldo Carvalho de Melo, Song Liu,
	Peter Zijlstra, Ingo Molnar, jani.nikula, joonas.lahtinen,
	rodrigo.vivi, Alexei Starovoitov, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, james.bottomley, Serge Hallyn,
	James Morris, Will Deacon, Mark Rutland, Casey Schaufler,
	Robert Richter, Jiri Olsa, Andi Kleen, Stephane Eranian,
	Igor Lubashev, Alexander Shishkin, Namhyung Kim, linux-kernel


On 14.01.2020 21:06, Alexei Starovoitov wrote:
> On Tue, Jan 14, 2020 at 1:47 AM Alexey Budankov
> <alexey.budankov@linux.intel.com> wrote:
>>>>
>>>> As we talked at RFC series of CAP_SYS_TRACING last year, I just expected
>>>> to open it for enabling/disabling kprobes, not for creation.
>>>>
>>>> If we can accept user who has no admin priviledge but the CAP_SYS_PERFMON,
>>>> to shoot their foot by their own risk, I'm OK to allow it. (Even though,
>>>> it should check the max number of probes to be created by something like
>>>> ulimit)
>>>> I think nowadays we have fixed all such kernel crash problems on x86,
>>>> but not sure for other archs, especially on the devices I can not reach.
>>>> I need more help to stabilize it.
>>>
>>> I don't see how enable/disable is any safer than creation.
>>> If there are kernel bugs in kprobes the kernel will crash anyway.
>>> I think such partial CAP_SYS_PERFMON would be very confusing to the users.
>>> CAP_* is about delegation of root privileges to non-root.
>>> Delegating some of it is ok, but disallowing creation makes it useless
>>> for bpf tracing, so we would need to add another CAP later.
>>> Hence I suggest to do it right away instead of breaking
>>> sys_perf_even_open() access into two CAPs.
>>>
>>
>> Alexei, Masami,
>>
>> Thanks for your meaningful input.
>> If we know in advance that it still can crash the system in some cases and on
>> some archs, even though root fully controls delegation thru CAP_SYS_PERFMON,
>> such delegation looks premature until the crashes are avoided. So it looks like
>> access to eBPF for CAP_SYS_PERFMON privileged processes is the subject for
>> a separate patch set.
> 
> perf_event_open is always dangerous. sw cannot guarantee non-bugginess of hw.

Sure, software cannot guarantee that, but known software bugs can still be fixed;
that's what I meant.

> imo adding a cap just for pmc is pointless.
> if you add a new cap it should cover all of sys_perf_event_open syscall.
> subdividing it into sw vs hw counters, kprobe create vs enable, etc will
> be the source of ongoing confusion. nack to such cap.
> 

Well, as this patch set already covers the complete perf_event_open functionality,
and the eBPF related parts too, could you please review and comment on it?
Do patches 2/9 and 5/9 already bring all the required extensions?

Thanks,
Alexey

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process
  2020-01-14 18:50                             ` Alexey Budankov
@ 2020-01-15  1:52                               ` Alexei Starovoitov
  2020-01-15  5:15                                 ` Alexey Budankov
  2020-01-15  9:45                               ` Masami Hiramatsu
  1 sibling, 1 reply; 27+ messages in thread
From: Alexei Starovoitov @ 2020-01-15  1:52 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Masami Hiramatsu, Arnaldo Carvalho de Melo, Song Liu,
	Peter Zijlstra, Ingo Molnar, jani.nikula, joonas.lahtinen,
	rodrigo.vivi, Alexei Starovoitov, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, james.bottomley, Serge Hallyn,
	James Morris, Will Deacon, Mark Rutland, Casey Schaufler,
	Robert Richter, Jiri Olsa, Andi Kleen, Stephane Eranian,
	Igor Lubashev, Alexander Shishkin, Namhyung Kim, linux-kernel,
	Andy Lutomirski

On Tue, Jan 14, 2020 at 10:50 AM Alexey Budankov
<alexey.budankov@linux.intel.com> wrote:
>
>
> On 14.01.2020 21:06, Alexei Starovoitov wrote:
> > On Tue, Jan 14, 2020 at 1:47 AM Alexey Budankov
> > <alexey.budankov@linux.intel.com> wrote:
> >>>>
> >>>> As we talked at RFC series of CAP_SYS_TRACING last year, I just expected
> >>>> to open it for enabling/disabling kprobes, not for creation.
> >>>>
> >>>> If we can accept user who has no admin priviledge but the CAP_SYS_PERFMON,
> >>>> to shoot their foot by their own risk, I'm OK to allow it. (Even though,
> >>>> it should check the max number of probes to be created by something like
> >>>> ulimit)
> >>>> I think nowadays we have fixed all such kernel crash problems on x86,
> >>>> but not sure for other archs, especially on the devices I can not reach.
> >>>> I need more help to stabilize it.
> >>>
> >>> I don't see how enable/disable is any safer than creation.
> >>> If there are kernel bugs in kprobes the kernel will crash anyway.
> >>> I think such partial CAP_SYS_PERFMON would be very confusing to the users.
> >>> CAP_* is about delegation of root privileges to non-root.
> >>> Delegating some of it is ok, but disallowing creation makes it useless
> >>> for bpf tracing, so we would need to add another CAP later.
> >>> Hence I suggest to do it right away instead of breaking
> >>> sys_perf_even_open() access into two CAPs.
> >>>
> >>
> >> Alexei, Masami,
> >>
> >> Thanks for your meaningful input.
> >> If we know in advance that it still can crash the system in some cases and on
> >> some archs, even though root fully controls delegation thru CAP_SYS_PERFMON,
> >> such delegation looks premature until the crashes are avoided. So it looks like
> >> access to eBPF for CAP_SYS_PERFMON privileged processes is the subject for
> >> a separate patch set.
> >
> > perf_event_open is always dangerous. sw cannot guarantee non-bugginess of hw.
>
> Sure, software cannot guarantee, but known software bugs could still be fixed,
> that's what I meant.
>
> > imo adding a cap just for pmc is pointless.
> > if you add a new cap it should cover all of sys_perf_event_open syscall.
> > subdividing it into sw vs hw counters, kprobe create vs enable, etc will
> > be the source of ongoing confusion. nack to such cap.
> >
>
> Well, as this patch set already covers complete perf_event_open functionality,
> and also eBPF related parts too, could you please review and comment on it?
> Does the patches 2/9 and 5/9 already bring all required extentions?

yes. the current patches 2 and 5 look good to me.
I would only change patch 1 to what Andy was proposing earlier:

static inline bool perfmon_capable(void)
{
	if (capable_noaudit(CAP_PERFMON))
		return capable(CAP_PERFMON);
	if (capable_noaudit(CAP_SYS_ADMIN))
		return capable(CAP_SYS_ADMIN);

	return capable(CAP_PERFMON);
}

I think Andy was trying to preserve the order of audit events.

I'm also suggesting to drop SYS from the cap name. It doesn't add any value
to the name.
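
To make the audit-ordering point concrete, here is a small userspace model of
the perfmon_capable() proposal quoted above. It is only a sketch: the
model_* names, the capability ids and the audit array are invented for
illustration, capable_noaudit() is taken from the proposal as written, and the
model assumes that every audited check leaves exactly one record.

```c
#include <stdbool.h>

/* Model-only capability ids; the values are arbitrary, not the kernel's. */
enum model_cap { MODEL_CAP_SYS_ADMIN, MODEL_CAP_PERFMON };

#define MODEL_AUDIT_MAX 8

struct model_task {
	bool has_cap[2];			/* indexed by enum model_cap */
	enum model_cap audit[MODEL_AUDIT_MAX];	/* capabilities that were audited */
	int audit_len;
};

/* Silent check: leaves no audit record, like the proposed capable_noaudit(). */
bool model_capable_noaudit(const struct model_task *t, enum model_cap cap)
{
	return t->has_cap[cap];
}

/* Audited check: records which capability was tested, then answers. */
bool model_capable(struct model_task *t, enum model_cap cap)
{
	if (t->audit_len < MODEL_AUDIT_MAX)
		t->audit[t->audit_len++] = cap;
	return t->has_cap[cap];
}

/* Same control flow as the perfmon_capable() proposal quoted above. */
bool model_perfmon_capable(struct model_task *t)
{
	if (model_capable_noaudit(t, MODEL_CAP_PERFMON))
		return model_capable(t, MODEL_CAP_PERFMON);
	if (model_capable_noaudit(t, MODEL_CAP_SYS_ADMIN))
		return model_capable(t, MODEL_CAP_SYS_ADMIN);

	return model_capable(t, MODEL_CAP_PERFMON);
}
```

In this model every path produces exactly one audit record. A task holding only
CAP_SYS_ADMIN is audited against CAP_SYS_ADMIN, and a task holding neither
capability gets a single denial against CAP_PERFMON rather than against
CAP_SYS_ADMIN.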

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process
  2020-01-15  1:52                               ` Alexei Starovoitov
@ 2020-01-15  5:15                                 ` Alexey Budankov
  0 siblings, 0 replies; 27+ messages in thread
From: Alexey Budankov @ 2020-01-15  5:15 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Masami Hiramatsu, Arnaldo Carvalho de Melo, Song Liu,
	Peter Zijlstra, Ingo Molnar, jani.nikula, joonas.lahtinen,
	rodrigo.vivi, Alexei Starovoitov, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, james.bottomley, Serge Hallyn,
	James Morris, Will Deacon, Mark Rutland, Casey Schaufler,
	Robert Richter, Jiri Olsa, Andi Kleen, Stephane Eranian,
	Igor Lubashev, Alexander Shishkin, Namhyung Kim, linux-kernel,
	Andy Lutomirski


On 15.01.2020 4:52, Alexei Starovoitov wrote:
> On Tue, Jan 14, 2020 at 10:50 AM Alexey Budankov
> <alexey.budankov@linux.intel.com> wrote:
>>
>>
>> On 14.01.2020 21:06, Alexei Starovoitov wrote:
>>> On Tue, Jan 14, 2020 at 1:47 AM Alexey Budankov
>>> <alexey.budankov@linux.intel.com> wrote:
>>>>>>
>>>>>> As we talked at RFC series of CAP_SYS_TRACING last year, I just expected
>>>>>> to open it for enabling/disabling kprobes, not for creation.
>>>>>>
>>>>>> If we can accept user who has no admin priviledge but the CAP_SYS_PERFMON,
>>>>>> to shoot their foot by their own risk, I'm OK to allow it. (Even though,
>>>>>> it should check the max number of probes to be created by something like
>>>>>> ulimit)
>>>>>> I think nowadays we have fixed all such kernel crash problems on x86,
>>>>>> but not sure for other archs, especially on the devices I can not reach.
>>>>>> I need more help to stabilize it.
>>>>>
>>>>> I don't see how enable/disable is any safer than creation.
>>>>> If there are kernel bugs in kprobes the kernel will crash anyway.
>>>>> I think such partial CAP_SYS_PERFMON would be very confusing to the users.
>>>>> CAP_* is about delegation of root privileges to non-root.
>>>>> Delegating some of it is ok, but disallowing creation makes it useless
>>>>> for bpf tracing, so we would need to add another CAP later.
>>>>> Hence I suggest to do it right away instead of breaking
>>>>> sys_perf_even_open() access into two CAPs.
>>>>>
>>>>
>>>> Alexei, Masami,
>>>>
>>>> Thanks for your meaningful input.
>>>> If we know in advance that it still can crash the system in some cases and on
>>>> some archs, even though root fully controls delegation thru CAP_SYS_PERFMON,
>>>> such delegation looks premature until the crashes are avoided. So it looks like
>>>> access to eBPF for CAP_SYS_PERFMON privileged processes is the subject for
>>>> a separate patch set.
>>>
>>> perf_event_open is always dangerous. sw cannot guarantee non-bugginess of hw.
>>
>> Sure, software cannot guarantee, but known software bugs could still be fixed,
>> that's what I meant.
>>
>>> imo adding a cap just for pmc is pointless.
>>> if you add a new cap it should cover all of sys_perf_event_open syscall.
>>> subdividing it into sw vs hw counters, kprobe create vs enable, etc will
>>> be the source of ongoing confusion. nack to such cap.
>>>
>>
>> Well, as this patch set already covers complete perf_event_open functionality,
>> and also eBPF related parts too, could you please review and comment on it?
>> Does the patches 2/9 and 5/9 already bring all required extentions?
> 
> yes. the current patches 2 and 5 look good to me.

Thanks. I appreciate your cooperation.

> I would only change patch 1 to what Andy was proposing earlier:

Could you please share a link to the proposal so I can get more details?
In the discussion of this patch set there was only this [1], from Andi Kleen,
on a more generic name for the PERFMON cap.

> 
> static inline bool perfmon_capable(void)
> {
> 	if (capable_noaudit(CAP_PERFMON))
> 		return capable(CAP_PERFMON);
> 	if (capable_noaudit(CAP_SYS_ADMIN))
> 		return capable(CAP_SYS_ADMIN);
>
> 	return capable(CAP_PERFMON);
> }

Yes, this makes sense and adds up.

> I think Andy was trying to preserve the order of audit events.
> 
> I'm also suggesting to drop SYS from the cap name. It doesn't add any value
> to the name.

Agreed, CAP_PERFMON sounds more generic, as it actually is.
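
To illustrate the audit ordering that the quoted helper seems to aim for, here
is a minimal user-space sketch. It is only a model under stated assumptions:
capable() and capable_noaudit() are mocks named after the calls in the quoted
proposal (not real kernel code), the DEMO_* capability indices and the
reset_task() helper are hypothetical, and "auditing" is reduced to appending to
an array.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical indices for this model only; not the kernel's CAP_* values. */
enum { DEMO_CAP_SYS_ADMIN, DEMO_CAP_PERFMON, DEMO_CAP_COUNT };

static bool held[DEMO_CAP_COUNT]; /* capabilities the mock task holds */
static int audit_log[8];          /* capabilities whose check was audited */
static int audit_len;

/* Mock: silent check, leaves no audit record. */
static bool capable_noaudit(int cap)
{
	assert(cap >= 0 && cap < DEMO_CAP_COUNT);
	return held[cap];
}

/* Mock: audited check, records which capability was examined. */
static bool capable(int cap)
{
	assert(audit_len < 8);
	audit_log[audit_len++] = cap;
	return capable_noaudit(cap);
}

/* Same control flow as the helper quoted above. */
static bool perfmon_capable(void)
{
	if (capable_noaudit(DEMO_CAP_PERFMON))
		return capable(DEMO_CAP_PERFMON);
	if (capable_noaudit(DEMO_CAP_SYS_ADMIN))
		return capable(DEMO_CAP_SYS_ADMIN);

	return capable(DEMO_CAP_PERFMON);
}

/* Hypothetical helper: set up the mock task and clear the audit log. */
static void reset_task(bool admin, bool perfmon)
{
	held[DEMO_CAP_SYS_ADMIN] = admin;
	held[DEMO_CAP_PERFMON] = perfmon;
	audit_len = 0;
}
```

In this model every call produces exactly one audited check: against the
PERFMON capability when the task holds it or holds neither capability, and
against SYS_ADMIN only when that is the capability actually granting access.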

Gratefully,
Alexey

[1] https://lore.kernel.org/lkml/20191211203648.GA862919@tassilo.jf.intel.com/


* Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process
  2020-01-14 18:50                             ` Alexey Budankov
  2020-01-15  1:52                               ` Alexei Starovoitov
@ 2020-01-15  9:45                               ` Masami Hiramatsu
  2020-01-15 12:11                                 ` Alexey Budankov
  1 sibling, 1 reply; 27+ messages in thread
From: Masami Hiramatsu @ 2020-01-15  9:45 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Alexei Starovoitov, Masami Hiramatsu, Arnaldo Carvalho de Melo,
	Song Liu, Peter Zijlstra, Ingo Molnar, jani.nikula,
	joonas.lahtinen, rodrigo.vivi, Alexei Starovoitov,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	james.bottomley, Serge Hallyn, James Morris, Will Deacon,
	Mark Rutland, Casey Schaufler, Robert Richter, Jiri Olsa,
	Andi Kleen, Stephane Eranian, Igor Lubashev, Alexander Shishkin,
	Namhyung Kim, linux-kernel

On Tue, 14 Jan 2020 21:50:33 +0300
Alexey Budankov <alexey.budankov@linux.intel.com> wrote:

> 
> On 14.01.2020 21:06, Alexei Starovoitov wrote:
> > On Tue, Jan 14, 2020 at 1:47 AM Alexey Budankov
> > <alexey.budankov@linux.intel.com> wrote:
> >>>>
> >>>> As we talked at RFC series of CAP_SYS_TRACING last year, I just expected
> >>>> to open it for enabling/disabling kprobes, not for creation.
> >>>>
> >>>> If we can accept that a user who has no admin privilege but has CAP_SYS_PERFMON
> >>>> may shoot themselves in the foot at their own risk, I'm OK to allow it. (Even though,
> >>>> it should check the max number of probes to be created by something like
> >>>> ulimit)
> >>>> I think nowadays we have fixed all such kernel crash problems on x86,
> >>>> but not sure for other archs, especially on the devices I can not reach.
> >>>> I need more help to stabilize it.
> >>>
> >>> I don't see how enable/disable is any safer than creation.
> >>> If there are kernel bugs in kprobes the kernel will crash anyway.
> >>> I think such partial CAP_SYS_PERFMON would be very confusing to the users.
> >>> CAP_* is about delegation of root privileges to non-root.
> >>> Delegating some of it is ok, but disallowing creation makes it useless
> >>> for bpf tracing, so we would need to add another CAP later.
> >>> Hence I suggest to do it right away instead of breaking
> >>> sys_perf_event_open() access into two CAPs.
> >>>
> >>
> >> Alexei, Masami,
> >>
> >> Thanks for your meaningful input.
> >> If we know in advance that it still can crash the system in some cases and on
> >> some archs, even though root fully controls delegation thru CAP_SYS_PERFMON,
> >> such delegation looks premature until the crashes are avoided. So it looks like
> >> access to eBPF for CAP_SYS_PERFMON privileged processes is the subject for
> >> a separate patch set.
> > 
> > perf_event_open is always dangerous. sw cannot guarantee non-bugginess of hw.
> 

OK, anyway, for higher security, an admin may choose not to give CAP_SYS_PERFMON
to unprivileged users, since it might allow them to analyze the kernel, which
can lead to security concerns.

> Sure, software cannot guarantee, but known software bugs could still be fixed,
> that's what I meant.

Agreed, bugs must be fixed anyway.

Thank you,

> > imo adding a cap just for pmc is pointless.
> > if you add a new cap it should cover all of sys_perf_event_open syscall.
> > subdividing it into sw vs hw counters, kprobe create vs enable, etc will
> > be the source of ongoing confusion. nack to such cap.
> > 
> 
> Well, as this patch set already covers complete perf_event_open functionality,
> and also eBPF related parts too, could you please review and comment on it?
> Do patches 2/9 and 5/9 already bring all required extensions?
> 
> Thanks,
> Alexey


-- 
Masami Hiramatsu <mhiramat@kernel.org>


* Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process
  2020-01-15  9:45                               ` Masami Hiramatsu
@ 2020-01-15 12:11                                 ` Alexey Budankov
  0 siblings, 0 replies; 27+ messages in thread
From: Alexey Budankov @ 2020-01-15 12:11 UTC (permalink / raw)
  To: Masami Hiramatsu, Alexei Starovoitov, Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Song Liu, Ingo Molnar, jani.nikula,
	joonas.lahtinen, rodrigo.vivi, Alexei Starovoitov,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	james.bottomley, Serge Hallyn, James Morris, Will Deacon,
	Mark Rutland, Casey Schaufler, Robert Richter, Jiri Olsa,
	Andi Kleen, Stephane Eranian, Igor Lubashev, Alexander Shishkin,
	Namhyung Kim, linux-kernel


On 15.01.2020 12:45, Masami Hiramatsu wrote:
> On Tue, 14 Jan 2020 21:50:33 +0300
> Alexey Budankov <alexey.budankov@linux.intel.com> wrote:
> 
>>
>> On 14.01.2020 21:06, Alexei Starovoitov wrote:
>>> On Tue, Jan 14, 2020 at 1:47 AM Alexey Budankov
>>> <alexey.budankov@linux.intel.com> wrote:
>>>>>>
>>>>>> As we talked at RFC series of CAP_SYS_TRACING last year, I just expected
>>>>>> to open it for enabling/disabling kprobes, not for creation.
>>>>>>
>>>>>> If we can accept that a user who has no admin privilege but has CAP_SYS_PERFMON
>>>>>> may shoot themselves in the foot at their own risk, I'm OK to allow it. (Even though,
>>>>>> it should check the max number of probes to be created by something like
>>>>>> ulimit)
>>>>>> I think nowadays we have fixed all such kernel crash problems on x86,
>>>>>> but not sure for other archs, especially on the devices I can not reach.
>>>>>> I need more help to stabilize it.
>>>>>
>>>>> I don't see how enable/disable is any safer than creation.
>>>>> If there are kernel bugs in kprobes the kernel will crash anyway.
>>>>> I think such partial CAP_SYS_PERFMON would be very confusing to the users.
>>>>> CAP_* is about delegation of root privileges to non-root.
>>>>> Delegating some of it is ok, but disallowing creation makes it useless
>>>>> for bpf tracing, so we would need to add another CAP later.
>>>>> Hence I suggest to do it right away instead of breaking
>>>>> sys_perf_event_open() access into two CAPs.
>>>>>
>>>>
>>>> Alexei, Masami,
>>>>
>>>> Thanks for your meaningful input.
>>>> If we know in advance that it still can crash the system in some cases and on
>>>> some archs, even though root fully controls delegation thru CAP_SYS_PERFMON,
>>>> such delegation looks premature until the crashes are avoided. So it looks like
>>>> access to eBPF for CAP_SYS_PERFMON privileged processes is the subject for
>>>> a separate patch set.
>>>
>>> perf_event_open is always dangerous. sw cannot guarantee non-bugginess of hw.
>>
> 
> OK, anyway, for higher security, an admin may choose not to give CAP_SYS_PERFMON
> to unprivileged users, since it might allow them to analyze the kernel, which
> can lead to security concerns.

FWIW,
Discovered security-related hardware issues can be mitigated in software, and
the official procedure for following up on them is documented here [1]. This could
serve as a draft plan for approaching eBPF and perf_events related hardware issues,
if required.

[1] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html

> 
>> Sure, software cannot guarantee, but known software bugs could still be fixed,
>> that's what I meant.
> 
> Agreed, bugs must be fixed anyway.
> 
> Thank you,
> 
>>> imo adding a cap just for pmc is pointless.
>>> if you add a new cap it should cover all of sys_perf_event_open syscall.
>>> subdividing it into sw vs hw counters, kprobe create vs enable, etc will
>>> be the source of ongoing confusion. nack to such cap.
>>>
>>
>> Well, as this patch set already covers complete perf_event_open functionality,
>> and also eBPF related parts too, could you please review and comment on it?
>> Do patches 2/9 and 5/9 already bring all required extensions?
>>
>> Thanks,
>> Alexey
> 
> 


* Re: [PATCH v4 8/9] drivers/perf: open access for CAP_SYS_PERFMON privileged process
       [not found]   ` <20200117105153.GB6144@willie-the-truck>
@ 2020-01-18 18:48     ` Alexey Budankov
  0 siblings, 0 replies; 27+ messages in thread
From: Alexey Budankov @ 2020-01-18 18:48 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
	jani.nikula, joonas.lahtinen, rodrigo.vivi, Alexei Starovoitov,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	james.bottomley, Serge Hallyn, James Morris, Will Deacon,
	Mark Rutland, Casey Schaufler, Robert Richter, Song Liu,
	Alexander Shishkin, Stephane Eranian, Jiri Olsa, Andi Kleen,
	Igor Lubashev, linux-kernel, Kees Cook, Jann Horn,
	linux-arm-kernel, Namhyung Kim, Thomas Gleixner


On 17.01.2020 13:51, Will Deacon wrote:
> On Wed, Dec 18, 2019 at 12:30:29PM +0300, Alexey Budankov wrote:
>>
>> Open access to monitoring for CAP_SYS_PERFMON privileged processes.
>> For backward compatibility reasons access to the monitoring remains open
>> for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure
>> monitoring is discouraged with respect to CAP_SYS_PERFMON capability.
>>
>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>> ---
>>  drivers/perf/arm_spe_pmu.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
>> index 4e4984a55cd1..5dff81bc3324 100644
>> --- a/drivers/perf/arm_spe_pmu.c
>> +++ b/drivers/perf/arm_spe_pmu.c
>> @@ -274,7 +274,7 @@ static u64 arm_spe_event_to_pmscr(struct perf_event *event)
>>  	if (!attr->exclude_kernel)
>>  		reg |= BIT(SYS_PMSCR_EL1_E1SPE_SHIFT);
>>  
>> -	if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && capable(CAP_SYS_ADMIN))
>> +	if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && perfmon_capable())
>>  		reg |= BIT(SYS_PMSCR_EL1_CX_SHIFT);
>>  
>>  	return reg;
>> @@ -700,7 +700,7 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
>>  		return -EOPNOTSUPP;
>>  
>>  	reg = arm_spe_event_to_pmscr(event);
>> -	if (!capable(CAP_SYS_ADMIN) &&
>> +	if (!perfmon_capable() &&
>>  	    (reg & (BIT(SYS_PMSCR_EL1_PA_SHIFT) |
>>  		    BIT(SYS_PMSCR_EL1_CX_SHIFT) |
>>  		    BIT(SYS_PMSCR_EL1_PCT_SHIFT))))
> 
> Acked-by: Will Deacon <will@kernel.org>
> 
> Worth noting that this allows profiling of *physical* addresses used by
> memory access instructions and so probably has some security implications
> beyond the usual "but perf is buggy" line of reasoning.

Good to know. Thank you!
The data on physical addresses used by memory access instructions can already be
obtained with CAP_SYS_ADMIN privileges [1], so I suppose the implications you
mention already exist. I believe that providing the data under CAP_PERFMON
alone, without the rest of the CAP_SYS_ADMIN credentials, reduces the chance of
the data being misused and makes the monitoring more secure.
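
As a rough illustration of "CAP_PERFMON alone", a process can inspect its own
effective capability set from user space. The sketch below is a minimal example
under assumptions: the helper names are hypothetical, the capability is assumed
to have number 38 (the value CAP_PERFMON has in mainline kernels since v5.8;
check linux/capability.h on the target system), and the effective set is read
from the CapEff line of /proc/self/status.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_CAP_SYS_ADMIN 21
#define DEMO_CAP_PERFMON   38 /* assumption: matches the running kernel */

/* Test whether bit `cap` is set in a 64-bit capability mask. */
static bool mask_has_cap(uint64_t mask, int cap)
{
	assert(cap >= 0 && cap < 64);
	return (mask >> cap) & 1u;
}

/* Parse a "CapEff:\t<hex>" line; returns false for any other line. */
static bool parse_capeff_line(const char *line, uint64_t *mask)
{
	return sscanf(line, "CapEff: %" SCNx64, mask) == 1;
}

/* Read the effective capability mask of the calling process. */
static bool read_capeff(uint64_t *mask)
{
	FILE *f = fopen("/proc/self/status", "r");
	char line[256];
	bool found = false;

	if (!f)
		return false;
	while (!found && fgets(line, sizeof(line), f))
		found = parse_capeff_line(line, mask);
	fclose(f);
	return found;
}

/* True when the process has the PERFMON bit but not the SYS_ADMIN bit. */
static bool has_perfmon_without_admin(uint64_t mask)
{
	return mask_has_cap(mask, DEMO_CAP_PERFMON) &&
	       !mask_has_cap(mask, DEMO_CAP_SYS_ADMIN);
}
```

A caller would use read_capeff() to fill in a mask and pass it to
has_perfmon_without_admin(); a true result corresponds to the more restricted
setup described above.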

~Alexey

[1] https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html

> 
> Will
> 


Thread overview: 27+ messages
2019-12-18  9:16 [PATCH v4 0/7] Introduce CAP_SYS_PERFMON to secure system performance monitoring and observability Alexey Budankov
2019-12-18  9:24 ` [PATCH v4 1/9] capabilities: introduce CAP_SYS_PERFMON to kernel and user space Alexey Budankov
2019-12-18 19:56   ` Stephen Smalley
2019-12-18  9:25 ` [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process Alexey Budankov
2020-01-08 16:07   ` Peter Zijlstra
2020-01-09 11:36     ` Alexey Budankov
     [not found]       ` <20200110140234.GO2844@hirez.programming.kicks-ass.net>
     [not found]         ` <20200111005213.6dfd98fb36ace098004bde0e@kernel.org>
     [not found]           ` <20200110164531.GA2598@kernel.org>
     [not found]             ` <20200111084735.0ff01c758bfbfd0ae2e1f24e@kernel.org>
     [not found]               ` <2B79131A-3F76-47F5-AAB4-08BCA820473F@fb.com>
     [not found]                 ` <5e191833.1c69fb81.8bc25.a88c@mx.google.com>
2020-01-11  9:57                   ` Alexey Budankov
2020-01-13 20:39                     ` Song Liu
2020-01-14  3:25                     ` Masami Hiramatsu
2020-01-14  5:17                       ` Alexei Starovoitov
2020-01-14  9:47                         ` Alexey Budankov
2020-01-14 18:06                           ` Alexei Starovoitov
2020-01-14 18:50                             ` Alexey Budankov
2020-01-15  1:52                               ` Alexei Starovoitov
2020-01-15  5:15                                 ` Alexey Budankov
2020-01-15  9:45                               ` Masami Hiramatsu
2020-01-15 12:11                                 ` Alexey Budankov
2020-01-14 12:04                         ` Masami Hiramatsu
2019-12-18  9:26 ` [PATCH v4 3/9] perf tool: extend Perf tool with CAP_SYS_PERFMON capability support Alexey Budankov
2019-12-18  9:27 ` [PATCH v4 4/9] drm/i915/perf: open access for CAP_SYS_PERFMON privileged process Alexey Budankov
2019-12-19  9:10   ` Lionel Landwerlin
2019-12-18  9:28 ` [PATCH v4 5/9] trace/bpf_trace: " Alexey Budankov
2019-12-18  9:28 ` [PATCH v4 6/9] powerpc/perf: " Alexey Budankov
2019-12-18  9:29 ` [PATCH v4 7/9] parisc/perf: " Alexey Budankov
2019-12-18  9:30 ` [PATCH v4 8/9] drivers/perf: " Alexey Budankov
     [not found]   ` <20200117105153.GB6144@willie-the-truck>
2020-01-18 18:48     ` Alexey Budankov
2019-12-18  9:31 ` [PATCH v4 9/9] drivers/oprofile: " Alexey Budankov
