> One of the strengths of linux is applications of features the authors
> of the software had not imagined.  Your proposals seem to be trying to
> put the world in a tiny little box where if someone had not imagined
> and preapproved a use of a feature it should not happen.  Let's please
> avoid implementing totalitarianism to avoid malicious code exploiting
> bugs in the kernel.  I am not interested in that future.

You're describing operating systems like Android, ChromeOS and iOS.

That future is already here, and the Linux kernel is the major weak point
in the attempts to build those systems on top of Linux. Even for the very
restricted Chrome sandbox, the kernel is the easiest way out.

Android similarly allows near zero access to /sys for apps and little
access to /proc beyond the /proc/PID directories belonging to an app.

> Especially when dealing with disabling code to reduce attack surface,
> when there are no known attacks what we are actually dealing with
> is a small percentage probability reduction that a malicious attacker
> will be able to exploit the attack.

There are perf events vulnerabilities being exploited in the wild to
gain root on Android. It's not a theoretical attack vector: they're used
in both malware and rooting tools. Local privilege escalation bugs in
the kernel are common, so attackers have plenty of alternatives, but
perf events is one of the major sources of vulnerabilities. There's a
lot of architecture-specific and vendor-specific perf events code, and
lots of bleeding-edge features. On Android, a lot of the perf events
vulnerabilities have been specific to the Qualcomm SoC platform. Other
platforms are likely just receiving a lot less attention.

> Remember security is as much about availability as it is about
> integrity.  You keep imagining features that are great big denial of
> service attacks on legitimate users.

Only developers care about perf events and they still have access to it.
JIT compilers have other ways to do tracing and even if they consider
this to be the ideal API, it's not particularly important if they have
to settle for something else. In reality, it's a small compromise.
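
For anyone who wants to see where their own machine stands, here is a
rough sketch. It assumes the restriction is exposed through the
kernel.perf_event_paranoid sysctl; the function names are made up for
illustration, and level 3 is not a mainline value as far as I know
(some downstream kernels, Android's among them, carry a patch for it).

```python
from pathlib import Path
from typing import Optional

# Assumed location of the knob on kernels built with perf events.
PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def read_paranoid_level(path: Path = PARANOID_PATH) -> Optional[int]:
    """Return the current level, or None if the sysctl is unavailable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def unprivileged_access(level: int) -> str:
    """Summarise what a process without extra capabilities may do."""
    if level >= 3:
        # Out-of-tree level (assumption: a downstream patch is applied).
        return "no perf_event_open() at all"
    if level == 2:
        return "user-space profiling only"
    if level == 1:
        return "user and kernel profiling, no CPU-wide events"
    if level == 0:
        return "CPU-wide events allowed, no raw tracepoints"
    return "almost everything allowed"


if __name__ == "__main__":
    current = read_paranoid_level()
    if current is None:
        print("perf_event_paranoid is not available here")
    else:
        print(f"level {current}: {unprivileged_access(current)}")
```

A developer with root (or the relevant capability) can lower the value
again, which is the sense in which they keep their access.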

> I vote for sandboxes.  Perhaps seccomp.  Perhaps a per userns sysctl.
> Perhaps something else.

It's not possible to use the current incarnation of seccomp for this,
since seccomp filters can't be dynamically granted or revoked. Perhaps
it would be possible to support adding/removing, or at least toggling,
seccomp filters for groups of processes. That would be good enough to
take care of user namespaces, ptrace, perf events, etc.
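
To make the limitation concrete, here is a toy model of the behaviour
(plain Python, not the real prctl()/seccomp() interface). Task, deny and
the three action constants are invented for the sketch; the real kernel
has more actions and evaluates BPF programs, but the shape is the same
as far as I understand it: filters only stack, children inherit them,
and the most restrictive verdict wins.

```python
from typing import Callable, List, Optional

# Actions ordered from least to most restrictive (simplified subset).
ALLOW, ERRNO, KILL = 0, 1, 2

Filter = Callable[[str], int]  # maps a syscall name to an action


class Task:
    """Toy stand-in for a process and its stack of seccomp filters."""

    def __init__(self, filters: Optional[List[Filter]] = None) -> None:
        self._filters: List[Filter] = list(filters or [])

    def install(self, flt: Filter) -> None:
        # Filters can only be appended; there is deliberately no remove().
        self._filters.append(flt)

    def fork(self) -> "Task":
        # Children inherit every filter installed so far.
        return Task(self._filters)

    def check(self, syscall: str) -> int:
        # Every filter runs and the most restrictive verdict wins.
        return max((f(syscall) for f in self._filters), default=ALLOW)


def deny(name: str) -> Filter:
    """Build a filter that fails one syscall and allows the rest."""
    return lambda syscall: ERRNO if syscall == name else ALLOW


if __name__ == "__main__":
    app = Task()
    app.install(deny("perf_event_open"))
    child = app.fork()
    child.install(lambda syscall: ALLOW)  # a later "allow" changes nothing
    print(child.check("perf_event_open") == ERRNO)
```

Nothing outside the task can lift the restriction later, and the task
itself can only tighten it. That is why a sysctl-style toggle, or some
way of swapping filters for a group of processes, would be needed to
grant access back to a developer on demand.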