[PATCH v5 0/3] Implement fast refcount overflow protection

From: Kees Cook @ 2017-05-30 21:39 UTC
  To: linux-kernel
  Cc: Kees Cook, Christoph Hellwig, Peter Zijlstra, Eric W. Biederman,
	Andrew Morton, Josh Poimboeuf, PaX Team, Jann Horn, Eric Biggers,
	Elena Reshetova, Hans Liljestrand, David Windsor, Greg KH,
	Ingo Molnar, Alexey Dobriyan, Serge E. Hallyn, arozansk,
	Davidlohr Bueso, Manfred Spraul, axboe, James Bottomley, x86,
	Ingo Molnar, Arnd Bergmann, David S. Miller, Rik van Riel,
	linux-arch, kernel-hardening

A new patch has been added at the start of this series to make the default
refcount_t implementation use an unchecked atomic_t implementation,
because many kernel subsystems want to be able to opt out of the full
validation and its small performance overhead. Enabling
CONFIG_REFCOUNT_FULL selects the full validation.
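
As a rough illustration of that compile-time split, here is a user-space
sketch. SKETCH_REFCOUNT_FULL and the sketch_* names are hypothetical
stand-ins, not the kernel's symbols, and the checked branch only
approximates full validation by refusing to move past INT_MAX.

```c
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical stand-in for refcount_t. */
typedef struct { atomic_int refs; } sketch_refcount_t;

#ifdef SKETCH_REFCOUNT_FULL	/* stand-in for CONFIG_REFCOUNT_FULL */
/* Checked variant: refuse to move past INT_MAX (simplified validation). */
static inline void sketch_refcount_inc(sketch_refcount_t *r)
{
	int old = atomic_load(&r->refs);

	while (old != INT_MAX &&
	       !atomic_compare_exchange_weak(&r->refs, &old, old + 1))
		;	/* 'old' was refreshed by the failed exchange; retry */
}
#else
/* Unchecked variant: a plain atomic increment with no validation. */
static inline void sketch_refcount_inc(sketch_refcount_t *r)
{
	atomic_fetch_add(&r->refs, 1);
}
#endif
```

Built without SKETCH_REFCOUNT_FULL, an increment at INT_MAX silently wraps
to INT_MIN, which is the behavior the other two patches aim to catch.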

The other two patches provide overflow protection on x86 without incurring
a performance penalty. The changelog for patch 3 is reproduced here for
details:

This protection is a modified version of the x86 PAX_REFCOUNT defense
from PaX/grsecurity. It speeds up the refcount_t API by duplicating the
existing atomic_t implementation and adding a single instruction that
detects when the refcount has wrapped past INT_MAX (or dropped below 0),
producing a negative value. The handler then either restores the
refcount_t to INT_MAX or saturates it to INT_MIN / 2. With this overflow
protection, the use-after-free that would follow a refcount_t wrap cannot
happen, avoiding the vulnerability entirely.

While this defense only perfectly protects the overflow case, as that
can be detected and stopped before the reference is freed and left to be
abused by an attacker, it also notices some of the "inc from 0" and "below
0" cases. However, these only indicate that a use-after-free has already
happened. Such notifications are likely avoidable by an attacker who has
already exploited a use-after-free vulnerability, but it's better to have
them than to allow such conditions to remain universally silent.

On overflow detection (actually "negative value" detection), the refcount
value is reset to INT_MAX, the offending process is killed, and a report
and stack trace are generated. This allows the system to attempt to
keep operating. In the case of a below-zero decrement or any other
negative result, the refcount is saturated to INT_MIN / 2 to keep it from
reaching zero again. (For the INT_MAX reset, another option would be to
choose (INT_MAX - N) with some small N to provide some headroom for
legitimate users of the reference counter.)
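
The reset policy described above can be modeled in user space with C11
atomics. This is a minimal sketch, not the kernel code: the sketch_* names
are hypothetical, reporting and process killing are omitted, and treating
"overflow" as "the increment started at INT_MAX" is an assumption made
for illustration.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for refcount_t. */
typedef struct { atomic_int refs; } sketch_refcount_t;

/* True when old + delta, wrapped to the width of int, has its sign bit set. */
static bool sketch_result_negative(int old, int delta)
{
	unsigned int sum = (unsigned int)old + (unsigned int)delta;

	return sum > (unsigned int)INT_MAX;
}

/* Models only the handler's reset; reporting and killing are omitted. */
static void sketch_refcount_reset(sketch_refcount_t *r, bool overflow)
{
	atomic_store(&r->refs, overflow ? INT_MAX : INT_MIN / 2);
}

static void sketch_refcount_inc(sketch_refcount_t *r)
{
	int old = atomic_fetch_add(&r->refs, 1);

	/* Assumption: "overflow" means the increment started at INT_MAX. */
	if (sketch_result_negative(old, 1))
		sketch_refcount_reset(r, old == INT_MAX);
}

/* Returns true only when the count legitimately drops to zero. */
static bool sketch_refcount_dec_and_test(sketch_refcount_t *r)
{
	int old = atomic_fetch_sub(&r->refs, 1);

	if (sketch_result_negative(old, -1)) {
		sketch_refcount_reset(r, false);
		return false;
	}
	return old == 1;
}
```

In this model an increment at INT_MAX leaves the counter at INT_MAX, a
decrement at 0 pins it at INT_MIN / 2, and later increments keep landing
on negative values, so the counter stays pinned there.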

On the matter of races, since the entire range beyond INT_MAX but before 0
is negative, every inc will trap, leaving no overflow-only race condition.
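
The arithmetic behind that claim can be checked with a small helper. This
is only a sketch assuming a 32-bit int; the function name is hypothetical.
It computes the sign of the counter after some number of increments beyond
INT_MAX, using unsigned math so the wrap is well defined.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Would the counter be negative after 'extra' increments past INT_MAX?
 * Unsigned addition wraps modulo 2^32 here (32-bit int assumed).
 */
static bool sketch_negative_after(unsigned int extra)
{
	unsigned int v = (unsigned int)INT_MAX + extra;

	return v > (unsigned int)INT_MAX;	/* sign bit set */
}
```

Under that assumption, every count from 1 up to 2^31 extra increments
yields a negative value, so racing CPUs would need 2^31 + 1 increments
past INT_MAX before any of them could observe 0 without trapping.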

As for performance, this implementation adds a single "js" instruction to
the regular execution flow of a copy of the regular atomic_t operations.
Since this is a forward jump, it is statically predicted as not taken,
which will be reinforced by dynamic branch prediction. As a result, this
protection shows no measurable change in performance compared to standard
atomic_t operations. The error path, located in .text.unlikely, saves
the refcount location and then uses UD0 to fire a refcount exception
handler, which resets the refcount, reports the error, marks the process
to be killed, and returns to regular execution. This keeps the changes to
.text size minimal, avoiding return jumps and open-coded calls to the
error reporting routine.

Assembly comparison:

atomic_inc
.text:
ffffffff81546149:       f0 ff 45 f4             lock incl -0xc(%rbp)

refcount_inc
.text:
ffffffff81546149:       f0 ff 45 f4             lock incl -0xc(%rbp)
ffffffff8154614d:       0f 88 80 d5 17 00       js     ffffffff816c36d3
...
.text.unlikely:
ffffffff816c36d3:       48 8d 4d f4             lea    -0xc(%rbp),%rcx
ffffffff816c36d7:       0f ff                   (bad)

Thanks to PaX Team for various suggestions for improvement.

-Kees

v5:
- add unchecked atomic_t implementation when !CONFIG_REFCOUNT_FULL
- use "leal" again, as in v3 for more flexible reset handling
- provide better underflow detection, with saturation

v4:
- switch to js from jns to gain static branch prediction benefits
- use .text.unlikely for js target, effectively making handler __cold
- use UD0 with refcount exception handler instead of int 0x81
- Kconfig defaults on when arch has support

v3:
- drop named text sections until we need to distinguish sizes/directions
- reset value immediately instead of passing back to handler
- drop needless export; josh

v2:
- fix instruction pointer decrement bug; thejh
- switch to js; pax-team
- improve commit log
- extract rmwcc macro helpers for better readability
- implement checks in inc_not_zero interface
- adjust reset values
