> One of the strengths of linux is applications of features the authors of
> the software had not imagined.  Your proposals seem to be trying to put
> the world a tiny little box where if someone had not imagined and
> preapproved a use of a feature it should not happen.   Let's please
> avoid implementing totalitarianism to avoid malicious code exploiting
> bugs in the kernel.  I am not interested in that future.

You're describing operating systems like Android, ChromeOS and iOS. That
future is already here, and the Linux kernel is the major weak point in
the attempts to build those systems on top of Linux. Even for the very
restricted Chrome sandbox, the kernel is the easiest way out. Android
similarly allows apps near-zero access to /sys and little access to
/proc beyond the /proc/PID directories belonging to the app itself.

> Especially when dealing with disabling code to reduce attack surface,
> when then are no known attacks what we are actually dealing with
> is a small percentage probability reduction that a malicious attacker
> will be able to exploit the attack.

There are perf events vulnerabilities being exploited in the wild to
gain root on Android. It's not a theoretical attack vector. They're used
in both malware and rooting tools. Local privilege escalation bugs in
the kernel are common, so attackers have plenty of alternatives, but
perf events is one of the major sources of vulnerabilities. There's a
lot of architecture- and vendor-specific perf events code and lots of
bleeding-edge features. On Android, a lot of the perf events
vulnerabilities have been specific to the Qualcomm SoC platform. Other
platforms are likely just receiving a lot less attention.

> Remember security is as much about availability as it is about
> integrity.  You keep imagining features that are great big denial of
> service attacks on legitimate users.

Only developers care about perf events, and they still have access to
it. JIT compilers have other ways to do tracing, and even if they
consider this to be the ideal API, it's not particularly important if
they have to settle for something else. In reality, it's a small
compromise.

> I vote for sandboxes.  Perhaps seccomp.  Perhaps a per userns sysctl.
> Perhaps something else.

It's not possible to use the current incarnation of seccomp for this,
since a seccomp filter can't be dynamically granted or revoked. Perhaps
it would be possible to support adding/removing, or at least toggling,
seccomp filters for groups of processes. That would be good enough to
take care of user ns, ptrace, perf events, etc.
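
To make the seccomp point concrete, here is a rough sketch of what a "deny perf events" filter looks like as a classic BPF program. It is illustration only: it assumes x86_64 (syscall number 298 for perf_event_open), the helper names `insn`, `deny_perf_filter` and `evaluate` are made up for this example, and nothing is installed into the kernel. The program is only built and run through a toy evaluator. Installing it for real would go through prctl(PR_SET_NO_NEW_PRIVS) and seccomp(2), and from that point the filter stays attached to the process and its children with no way to remove or toggle it, which is the limitation described above.

```python
import struct

# Constants as defined in <linux/bpf_common.h>, <linux/seccomp.h>, <linux/audit.h>.
BPF_LD_W_ABS = 0x20               # BPF_LD | BPF_W | BPF_ABS
BPF_JEQ_K = 0x15                  # BPF_JMP | BPF_JEQ | BPF_K
BPF_RET_K = 0x06                  # BPF_RET | BPF_K
SECCOMP_RET_ALLOW = 0x7FFF0000
SECCOMP_RET_ERRNO = 0x00050000
SECCOMP_RET_KILL_PROCESS = 0x80000000
AUDIT_ARCH_X86_64 = 0xC000003E
NR_PERF_EVENT_OPEN = 298          # assumption: x86_64 syscall table only
EPERM = 1


def insn(code, jt, jf, k):
    """Pack one struct sock_filter: u16 code, u8 jt, u8 jf, u32 k."""
    return struct.pack("=HBBI", code, jt, jf, k)


def deny_perf_filter():
    """Build a filter that fails perf_event_open with EPERM, allows the rest."""
    return [
        insn(BPF_LD_W_ABS, 0, 0, 4),                      # load seccomp_data.arch
        insn(BPF_JEQ_K, 1, 0, AUDIT_ARCH_X86_64),         # expected arch: skip the kill
        insn(BPF_RET_K, 0, 0, SECCOMP_RET_KILL_PROCESS),  # unexpected arch
        insn(BPF_LD_W_ABS, 0, 0, 0),                      # load seccomp_data.nr
        insn(BPF_JEQ_K, 0, 1, NR_PERF_EVENT_OPEN),        # other syscalls: skip the deny
        insn(BPF_RET_K, 0, 0, SECCOMP_RET_ERRNO | EPERM), # perf_event_open -> EPERM
        insn(BPF_RET_K, 0, 0, SECCOMP_RET_ALLOW),
    ]


def evaluate(prog, arch, nr):
    """Toy evaluator covering only the three opcodes used above."""
    data = struct.pack("=iI", nr, arch)  # first 8 bytes of struct seccomp_data
    acc, pc = 0, 0
    while True:
        code, jt, jf, k = struct.unpack("=HBBI", prog[pc])
        pc += 1
        if code == BPF_LD_W_ABS:
            acc = struct.unpack_from("=I", data, k)[0]
        elif code == BPF_JEQ_K:
            pc += jt if acc == k else jf
        elif code == BPF_RET_K:
            return k
        else:
            raise ValueError("opcode not supported by this sketch")


if __name__ == "__main__":
    prog = deny_perf_filter()
    print(hex(evaluate(prog, AUDIT_ARCH_X86_64, NR_PERF_EVENT_OPEN)))
    print(hex(evaluate(prog, AUDIT_ARCH_X86_64, 0)))
```

The filter itself is trivial. The missing piece is a kernel interface for flipping the decision at runtime for a group of processes.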