Re: [PATCH v4 0/2] Control over userfaultfd kernel-fault handling

From: Andrea Arcangeli <aarcange@redhat.com>
To: Nick Kralevich <nnk@google.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>,
	Kees Cook <keescook@chromium.org>,
	Jonathan Corbet <corbet@lwn.net>, Peter Xu <peterx@redhat.com>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Stephen Smalley <stephen.smalley.work@gmail.com>,
	Eric Biggers <ebiggers@kernel.org>,
	Daniel Colascione <dancol@dancol.org>,
	"Joel Fernandes (Google)" <joel@joelfernandes.org>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-doc@vger.kernel.org, Kalesh Singh <kaleshsingh@google.com>,
	Calin Juravle <calin@google.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Jeffrey Vander Stoep <jeffv@google.com>,
	"Cc: Android Kernel" <kernel-team@android.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>, Shaohua Li <shli@fb.com>,
	Jerome Glisse <jglisse@redhat.com>,
	Mauro Carvalho Chehab <mchehab+huawei@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Nitin Gupta <nigupta@nvidia.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Iurii Zaikin <yzaikin@google.com>,
	Luis Chamberlain <mcgrof@kernel.org>
Subject: Re: [PATCH v4 0/2] Control over userfaultfd kernel-fault handling
Date: Sat, 24 Oct 2020 01:28:16 -0400	[thread overview]
Message-ID: <20201024052816.GD19707@redhat.com> (raw)
In-Reply-To: <CAFJ0LnGoD9NaKhbsohdXo5zt5nyMOX=g1aMRX0b0W1zBSNaSBg@mail.gmail.com>

Hello,

On Thu, Oct 08, 2020 at 04:22:36PM -0700, Nick Kralevich wrote:
> I haven't tried to verify this myself. I wonder if the usermode
> hardening changes also impacted this exploit? See
> https://lkml.org/lkml/2017/1/16/468

My plan was to:

1) reproduce with the old buggy kernel

2) forward port the bug to the very first version that had both the
   slub and page freelist randomization available and keep them
   disabled

3) enable the freelist randomization features (which are already
   enabled by default in the current enterprise kernels) and see if
   that makes the attack not workable

The hardening of the usermode helper you mentioned is spot on, but it
would have been something to worry about and possibly roll back at
point 2), but I couldn't get past point 1)..

Plenty other hardening techniques (just like the usermode helper) are
very specific to a single attack, but the randomization looks generic
enough to cover the entire class.

> But again, focusing on an exploit, which is inherently fragile in
> nature and dependent on the state of the kernel tree at a particular
> time, is unlikely to be useful to analyze this patch.

Agreed. A single exploit using userfaultfd to enlarge the race window
of the use-after-free, not being workable anymore with randomized slub
and page freelist enabled, wouldn't have meant a thing by itself.

As opposed if that single exploit was still fairly reproducible, it
would have been enough to consider the sysctl default to zero as
something providing a more tangible long term benefit. That would have
been good information to have too, if that's actually the case.

I was merely attempting to get a first data point.. overall it would
be nice to initiate some research to verify the exact statistical
effects that slub/page randomization has on those use-after-free race
conditions that can be enlarged by blocking kernel faults, given we're
already paying the price for it. I don't think anybody has a sure
answer at this point, if we can entirely rely on those features or not.

> Seccomp causes more problems than just performance. Seccomp is not
> designed for whole-of-system protections. Please see my other writeup
> at https://lore.kernel.org/lkml/CAFJ0LnEo-7YUvgOhb4pHteuiUW+wPfzqbwXUCGAA35ZMx11A-w@mail.gmail.com/

Whole-of-system protection I guess helps primarily because it requires
no change to userland I guess.

An example of a task not running as root (and without ptrace
capability) that could use more seccomp blocking:

# cat /proc/1517/cmdline ; echo ; grep CapEff /proc/1517/status; grep Seccomp /proc/1517/status
/vendor/bin/hw/qcrild
CapEff: 0000001000003000
Seccomp:        0

My view is that if the various binaries started by init.rc are run
without a strict seccomp filter there would be more things to worry
about, than kernel initiated userfaults for those.

Still the solution in the v5 patchset looks the safest for all until
we'll be able to tell if the slub/page randomizaton (or any other
generic enough robustness feature) is already effective against an
enlarged race window of kernel initiated userfaults and at the same
time it provides the main benefit of avoiding divergence in the
behavior of the userfaultfd syscall if invoked within the Android
userland.

Thanks,
Andrea