All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Christian Brauner <christian@brauner.io>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Seth Forshee <seth.forshee@canonical.com>,
	Serge Hallyn <serge@hallyn.com>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 4.19 02/46] Revert "vfs: Allow userns root to call mknod on owned filesystems."
Date: Fri, 28 Dec 2018 12:51:56 +0100	[thread overview]
Message-ID: <20181228113125.063988618@linuxfoundation.org> (raw)
In-Reply-To: <20181228113124.971620049@linuxfoundation.org>

4.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Christian Brauner <christian@brauner.io>

commit 94f82008ce30e2624537d240d64ce718255e0b80 upstream.

This reverts commit 55956b59df336f6738da916dbb520b6e37df9fbd.

commit 55956b59df33 ("vfs: Allow userns root to call mknod on owned filesystems.")
enabled mknod() in user namespaces for userns root if CAP_MKNOD is
available. However, these device nodes are useless since any filesystem
mounted from a non-initial user namespace will set the SB_I_NODEV flag on
the filesystem. Now, when a device node s created in a non-initial user
namespace a call to open() on said device node will fail due to:

bool may_open_dev(const struct path *path)
{
        return !(path->mnt->mnt_flags & MNT_NODEV) &&
                !(path->mnt->mnt_sb->s_iflags & SB_I_NODEV);
}

The problem with this is that as of the aforementioned commit mknod()
creates partially functional device nodes in non-initial user namespaces.
In particular, it has the consequence that as of the aforementioned commit
open() will be more privileged with respect to device nodes than mknod().
Before it was the other way around. Specifically, if mknod() succeeded
then it was transparent for any userspace application that a fatal error
must have occured when open() failed.

All of this breaks multiple userspace workloads and a widespread assumption
about how to handle mknod(). Basically, all container runtimes and systemd
live by the slogan "ask for forgiveness not permission" when running user
namespace workloads. For mknod() the assumption is that if the syscall
succeeds the device nodes are useable irrespective of whether it succeeds
in a non-initial user namespace or not. This logic was chosen explicitly
to allow for the glorious day when mknod() will actually be able to create
fully functional device nodes in user namespaces.
A specific problem people are already running into when running 4.18 rc
kernels are failing systemd services. For any distro that is run in a
container systemd services started with the PrivateDevices= property set
will fail to start since the device nodes in question cannot be
opened (cf. the arguments in [1]).

Full disclosure, Seth made the very sound argument that it is already
possible to end up with partially functional device nodes. Any filesystem
mounted with MS_NODEV set will allow mknod() to succeed but will not allow
open() to succeed. The difference to the case here is that the MS_NODEV
case is transparent to userspace since it is an explicitly set mount option
while the SB_I_NODEV case is an implicit property enforced by the kernel
and hence opaque to userspace.

[1]: https://github.com/systemd/systemd/pull/9483

Signed-off-by: Christian Brauner <christian@brauner.io>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Seth Forshee <seth.forshee@canonical.com>
Cc: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/namei.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3701,8 +3701,7 @@ int vfs_mknod(struct inode *dir, struct
 	if (error)
 		return error;
 
-	if ((S_ISCHR(mode) || S_ISBLK(mode)) &&
-	    !ns_capable(dentry->d_sb->s_user_ns, CAP_MKNOD))
+	if ((S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD))
 		return -EPERM;
 
 	if (!dir->i_op->mknod)



  parent reply	other threads:[~2018-12-28 11:53 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-28 11:51 [PATCH 4.19 00/46] 4.19.13-stable review Greg Kroah-Hartman
2018-12-28 11:51 ` [PATCH 4.19 01/46] iomap: Revert "fs/iomap.c: get/put the page in iomap_page_create/release()" Greg Kroah-Hartman
2018-12-28 11:51 ` Greg Kroah-Hartman [this message]
2018-12-28 11:51 ` [PATCH 4.19 03/46] USB: hso: Fix OOB memory access in hso_probe/hso_get_config_data Greg Kroah-Hartman
2018-12-28 11:51 ` [PATCH 4.19 04/46] xhci: Dont prevent USB2 bus suspend in state check intended for USB3 only Greg Kroah-Hartman
2018-12-28 11:51 ` [PATCH 4.19 05/46] USB: xhci: fix broken_suspend placement in struct xchi_hcd Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 06/46] USB: serial: option: add GosunCn ZTE WeLink ME3630 Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 07/46] USB: serial: option: add HP lt4132 Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 08/46] USB: serial: option: add Simcom SIM7500/SIM7600 (MBIM mode) Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 09/46] USB: serial: option: add Fibocom NL668 series Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 10/46] USB: serial: option: add Telit LN940 series Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 11/46] ubifs: Handle re-linking of inodes correctly while recovery Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 12/46] scsi: t10-pi: Return correct ref tag when queue has no integrity profile Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 13/46] scsi: sd: use mempool for discard special page Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 14/46] mmc: core: Reset HPI enabled state during re-init and in case of errors Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 15/46] mmc: core: Allow BKOPS and CACHE ctrl even if no HPI support Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 16/46] mmc: core: Use a minimum 1600ms timeout when enabling CACHE ctrl Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 17/46] mmc: omap_hsmmc: fix DMA API warning Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 18/46] gpio: max7301: fix driver for use with CONFIG_VMAP_STACK Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 19/46] gpiolib-acpi: Only defer request_irq for GpioInt ACPI event handlers Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 20/46] posix-timers: Fix division by zero bug Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 21/46] KVM: X86: Fix NULL deref in vcpu_scan_ioapic Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 22/46] kvm: x86: Add AMDs EX_CFG to the list of ignored MSRs Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 23/46] KVM: Fix UAF in nested posted interrupt processing Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 24/46] Drivers: hv: vmbus: Return -EINVAL for the sys files for unopened channels Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 25/46] futex: Cure exit race Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 26/46] x86/mtrr: Dont copy uninitialized gentry fields back to userspace Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 27/46] x86/mm: Fix decoy address handling vs 32-bit builds Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 28/46] x86/vdso: Pass --eh-frame-hdr to the linker Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 29/46] x86/intel_rdt: Ensure a CPU remains online for the regions pseudo-locking sequence Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 30/46] panic: avoid deadlocks in re-entrant console drivers Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 31/46] mm: add mm_pxd_folded checks to pgtable_bytes accounting functions Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 32/46] mm: make the __PAGETABLE_PxD_FOLDED defines non-empty Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 33/46] mm: introduce mm_[p4d|pud|pmd]_folded Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 34/46] xfrm_user: fix freeing of xfrm states on acquire Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 35/46] rtlwifi: Fix leak of skb when processing C2H_BT_INFO Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 36/46] iwlwifi: mvm: dont send GEO_TX_POWER_LIMIT to old firmwares Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 37/46] Revert "mwifiex: restructure rx_reorder_tbl_lock usage" Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 38/46] iwlwifi: add new cards for 9560, 9462, 9461 and killer series Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 39/46] media: ov5640: Fix set format regression Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 40/46] mm, memory_hotplug: initialize struct pages for the full memory section Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 41/46] mm: thp: fix flags for pmd migration when split Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 42/46] mm, page_alloc: fix has_unmovable_pages for HugePages Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 43/46] mm: dont miss the last page because of round-off error Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 44/46] Input: elantech - disable elan-i2c for P52 and P72 Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 45/46] proc/sysctl: dont return ENOMEM on lookup when a table is unregistering Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 46/46] drm/ioctl: Fix Spectre v1 vulnerabilities Greg Kroah-Hartman
2018-12-28 17:54 ` [PATCH 4.19 00/46] 4.19.13-stable review Dan Rue
2018-12-29 12:20   ` Greg Kroah-Hartman
2018-12-28 20:08 ` shuah
2018-12-29  9:55   ` Greg Kroah-Hartman
2018-12-28 21:29 ` Guenter Roeck
2018-12-29 12:20   ` Greg Kroah-Hartman
2018-12-29  9:01 ` Harsh Shandilya
2018-12-29 12:20   ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181228113125.063988618@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=christian@brauner.io \
    --cc=ebiederm@xmission.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=serge@hallyn.com \
    --cc=seth.forshee@canonical.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.