linux-unionfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix
@ 2021-11-17  1:58 David Anderson
  2021-11-17  1:58 ` [PATCH v19 1/4] Add flags option to get xattr method paired to __vfs_getxattr David Anderson
                   ` (5 more replies)
  0 siblings, 6 replies; 29+ messages in thread
From: David Anderson @ 2021-11-17  1:58 UTC (permalink / raw)
  Cc: David Anderson, Mark Salyzyn, Miklos Szeredi, Jonathan Corbet,
	Vivek Goyal, Eric W . Biederman, Amir Goldstein, Randy Dunlap,
	Stephen Smalley, John Stultz, linux-doc, linux-kernel,
	linux-fsdevel, linux-unionfs, linux-security-module, kernel-team,
	selinux, paulmoore, Luca.Boccassi

Mark Salyzyn (3):
  Add flags option to get xattr method paired to __vfs_getxattr
  overlayfs: handle XATTR_NOSECURITY flag for get xattr method
  overlayfs: override_creds=off option bypass creator_cred

Mark Salyzyn + John Stultz (1):
  overlayfs: inode_owner_or_capable called during execv

The first three patches address fundamental security issues that should
be solved regardless of the override_creds=off feature.

The fourth adds the feature depends on these other fixes.

By default, all access to the upper, lower and work directories is the
recorded mounter's MAC and DAC credentials.  The incoming accesses are
checked against the caller's credentials.

If the principles of least privilege are applied for sepolicy, the
mounter's credentials might not overlap the credentials of the caller's
when accessing the overlayfs filesystem.  For example, a file that a
lower DAC privileged caller can execute, is MAC denied to the
generally higher DAC privileged mounter, to prevent an attack vector.

We add the option to turn off override_creds in the mount options; all
subsequent operations after mount on the filesystem will be only the
caller's credentials.  The module boolean parameter and mount option
override_creds is also added as a presence check for this "feature",
existence of /sys/module/overlay/parameters/overlay_creds

Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: David Anderson <dvander@google.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: John Stultz <john.stultz@linaro.org>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-unionfs@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
Cc: kernel-team@android.com
Cc: selinux@vger.kernel.org
Cc: paulmoore@microsoft.com
Cc: Luca.Boccassi@microsoft.com

---

v19
- rebase.

v18
- rebase + fix minor cut and paste error for inode argument in __vfs_getxattr

v17
- correct some zero-day build failures.
- fix up documentation

v16
- rebase and merge of two patches.
- add adjustment to deal with execv when overrides is off.

v15
- Revert back to v4 with fixes from on the way from v5-v14. The single
  structure argument passing to address the complaints about too many
  arguments was rejected by the community.
- Drop the udner discussion fix for an additional CAP_DAC_READ_SEARCH
  check. Can address that independently.
- ToDo: upstream test frame for thes security fixes (currently testing
  is all in Android).

v14:
- Rejoin, rebase and a few adjustments.

v13:
- Pull out first patch and try to get it in alone feedback, some
  Acks, and then <crickets> because people forgot why we were doing i.

v12:
- Restore squished out patch 2 and 3 in the series,
  then change algorithm to add flags argument.
  Per-thread flag is a large security surface.

v11:
- Squish out v10 introduced patch 2 and 3 in the series,
  then and use per-thread flag instead for nesting.
- Switch name to ovl_do_vds_getxattr for __vds_getxattr wrapper.
- Add sb argument to ovl_revert_creds to match future work.

v10:
- Return NULL on CAP_DAC_READ_SEARCH
- Add __get xattr method to solve sepolicy logging issue
- Drop unnecessary sys_admin sepolicy checking for administrative
  driver internal xattr functions.

v6:
- Drop CONFIG_OVERLAY_FS_OVERRIDE_CREDS.
- Do better with the documentation, drop rationalizations.
- pr_warn message adjusted to report consequences.

v5:
- beefed up the caveats in the Documentation
- Is dependent on
  "overlayfs: check CAP_DAC_READ_SEARCH before issuing exportfs_decode_fh"
  "overlayfs: check CAP_MKNOD before issuing vfs_whiteout"
- Added prwarn when override_creds=off

v4:
- spelling and grammar errors in text

v3:
- Change name from caller_credentials / creator_credentials to the
  boolean override_creds.
- Changed from creator to mounter credentials.
- Updated and fortified the documentation.
- Added CONFIG_OVERLAY_FS_OVERRIDE_CREDS

v2:
- Forward port changed attr to stat, resulting in a build error.
- altered commit message.

David Anderson (4):
  Add flags option to get xattr method paired to __vfs_getxattr
  overlayfs: handle XATTR_NOSECURITY flag for get xattr method
  overlayfs: override_creds=off option bypass creator_cred
  overlayfs: inode_owner_or_capable called during execv

 Documentation/filesystems/locking.rst   |  2 +-
 Documentation/filesystems/overlayfs.rst | 26 ++++++++++++++-
 fs/9p/acl.c                             |  3 +-
 fs/9p/xattr.c                           |  3 +-
 fs/afs/xattr.c                          | 10 +++---
 fs/attr.c                               |  2 +-
 fs/btrfs/xattr.c                        |  3 +-
 fs/ceph/xattr.c                         |  3 +-
 fs/cifs/xattr.c                         |  2 +-
 fs/ecryptfs/inode.c                     |  6 ++--
 fs/ecryptfs/mmap.c                      |  5 +--
 fs/erofs/xattr.c                        |  3 +-
 fs/ext2/xattr_security.c                |  2 +-
 fs/ext2/xattr_trusted.c                 |  2 +-
 fs/ext2/xattr_user.c                    |  2 +-
 fs/ext4/xattr_hurd.c                    |  2 +-
 fs/ext4/xattr_security.c                |  2 +-
 fs/ext4/xattr_trusted.c                 |  2 +-
 fs/ext4/xattr_user.c                    |  2 +-
 fs/f2fs/xattr.c                         |  4 +--
 fs/fuse/xattr.c                         |  4 +--
 fs/gfs2/xattr.c                         |  3 +-
 fs/hfs/attr.c                           |  2 +-
 fs/hfsplus/xattr.c                      |  3 +-
 fs/hfsplus/xattr_security.c             |  3 +-
 fs/hfsplus/xattr_trusted.c              |  3 +-
 fs/hfsplus/xattr_user.c                 |  3 +-
 fs/inode.c                              |  7 +++--
 fs/internal.h                           |  3 +-
 fs/jffs2/security.c                     |  3 +-
 fs/jffs2/xattr_trusted.c                |  3 +-
 fs/jffs2/xattr_user.c                   |  3 +-
 fs/jfs/xattr.c                          |  5 +--
 fs/kernfs/inode.c                       |  3 +-
 fs/nfs/nfs4proc.c                       |  9 ++++--
 fs/ntfs3/xattr.c                        |  2 +-
 fs/ocfs2/xattr.c                        |  9 ++++--
 fs/open.c                               |  2 +-
 fs/orangefs/xattr.c                     |  3 +-
 fs/overlayfs/copy_up.c                  |  2 +-
 fs/overlayfs/dir.c                      | 17 +++++-----
 fs/overlayfs/file.c                     | 25 ++++++++-------
 fs/overlayfs/inode.c                    | 29 ++++++++---------
 fs/overlayfs/namei.c                    |  6 ++--
 fs/overlayfs/overlayfs.h                |  7 +++--
 fs/overlayfs/ovl_entry.h                |  1 +
 fs/overlayfs/readdir.c                  |  8 ++---
 fs/overlayfs/super.c                    | 34 ++++++++++++++++----
 fs/overlayfs/util.c                     | 13 ++++++--
 fs/posix_acl.c                          |  2 +-
 fs/reiserfs/xattr_security.c            |  3 +-
 fs/reiserfs/xattr_trusted.c             |  3 +-
 fs/reiserfs/xattr_user.c                |  3 +-
 fs/squashfs/xattr.c                     |  2 +-
 fs/ubifs/xattr.c                        |  3 +-
 fs/xattr.c                              | 42 +++++++++++++------------
 fs/xfs/xfs_xattr.c                      |  3 +-
 include/linux/lsm_hook_defs.h           |  3 +-
 include/linux/security.h                |  6 ++--
 include/linux/xattr.h                   |  6 ++--
 include/uapi/linux/xattr.h              |  7 +++--
 mm/shmem.c                              |  3 +-
 net/socket.c                            |  3 +-
 security/commoncap.c                    | 11 ++++---
 security/integrity/evm/evm_main.c       | 13 +++++---
 security/security.c                     |  5 +--
 security/selinux/hooks.c                | 19 ++++++-----
 security/smack/smack_lsm.c              | 18 ++++++-----
 68 files changed, 289 insertions(+), 167 deletions(-)

-- 
2.34.0.rc1.387.gb447b232ab-goog


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v19 1/4] Add flags option to get xattr method paired to __vfs_getxattr
  2021-11-17  1:58 [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix David Anderson
@ 2021-11-17  1:58 ` David Anderson
  2021-11-17 16:13   ` kernel test robot
  2022-03-25 11:02   ` Luca Weiss
  2021-11-17  1:58 ` [PATCH v19 2/4] overlayfs: handle XATTR_NOSECURITY flag for get xattr method David Anderson
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 29+ messages in thread
From: David Anderson @ 2021-11-17  1:58 UTC (permalink / raw)
  Cc: David Anderson, Mark Salyzyn, Jan Kara, Jeff Layton,
	David Sterba, Darrick J . Wong, Mike Marshall, linux-fsdevel,
	linux-unionfs, Stephen Smalley, linux-kernel,
	linux-security-module, kernel-team, selinux, paulmoore,
	Luca.Boccassi

Add a flag option to get xattr method that could have a bit flag of
XATTR_NOSECURITY passed to it.  XATTR_NOSECURITY is generally then
set in the __vfs_getxattr path when called by security
infrastructure.

This handles the case of a union filesystem driver that is being
requested by the security layer to report back the xattr data.

For the use case where access is to be blocked by the security layer.

The path then could be security(dentry) ->
__vfs_getxattr(dentry...XATTR_NOSECURITY) ->
handler->get(dentry...XATTR_NOSECURITY) ->
__vfs_getxattr(lower_dentry...XATTR_NOSECURITY) ->
lower_handler->get(lower_dentry...XATTR_NOSECURITY)
which would report back through the chain data and success as
expected, the logging security layer at the top would have the
data to determine the access permissions and report back the target
context that was blocked.

Without the get handler flag, the path on a union filesystem would be
the errant security(dentry) -> __vfs_getxattr(dentry) ->
handler->get(dentry) -> vfs_getxattr(lower_dentry) -> nested ->
security(lower_dentry, log off) -> lower_handler->get(lower_dentry)
which would report back through the chain no data, and -EACCES.

For selinux for both cases, this would translate to a correctly
determined blocked access. In the first case with this change a correct avc
log would be reported, in the second legacy case an incorrect avc log
would be reported against an uninitialized u:object_r:unlabeled:s0
context making the logs cosmetically useless for audit2allow.

This patch series is inert and is the wide-spread addition of the
flags option for xattr functions, and a replacement of __vfs_getxattr
with __vfs_getxattr(...XATTR_NOSECURITY).

Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: David Anderson <dvander@google.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Jeff Layton <jlayton@kernel.org>
Acked-by: David Sterba <dsterba@suse.com>
Acked-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Mike Marshall <hubcap@omnibond.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-unionfs@vger.kernel.org
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: linux-kernel@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
Cc: kernel-team@android.com
Cc: selinux@vger.kernel.org
Cc: paulmoore@microsoft.com
Cc: Luca.Boccassi@microsoft.com

v19
- rebase

v18
- rebase

v17
- rebase with additional change to fs/ext4/xattr_hurd.c

v16
- rebase

v15
- revert back to v4 as struct xattr_gs_args was not acceptable by the
  wider audience. Incorporate any relevant fixes on the way.

v14 (new series):
- Reincorporate back into the bugfix series for overlayfs

v8:
- Documentation reported 'struct xattr_gs_flags' rather than
'struct xattr_gs_flags *args' as argument to get and set methods.

v7:
- missed spots in fs/9p/acl.c, fs/afs/xattr.c, fs/ecryptfs/crypto.c,
fs/ubifs/xattr.c, fs/xfs/libxfs/xfs_attr.c,
security/integrity/evm/evm_main.c and security/smack/smack_lsm.c.

v6:
- kernfs missed a spot

v5:
- introduce struct xattr_gs_args for get and set methods,
__vfs_getxattr and __vfs_setxattr functions.
- cover a missing spot in ext2.
- switch from snprintf to scnprintf for correctness.

v4:
- ifdef __KERNEL__ around XATTR_NOSECURITY to
keep it colocated in uapi headers.

v3:
- poor aim on ubifs not ubifs_xattr_get, but static xattr_get

v2:
- Missed a spot: ubifs, erofs and afs.

v1:
- Removed from an overlayfs patch set, and made independent.
  Expect this to be the basis of some security improvements.
---
 Documentation/filesystems/locking.rst |  2 +-
 fs/9p/acl.c                           |  3 +-
 fs/9p/xattr.c                         |  3 +-
 fs/afs/xattr.c                        | 10 +++----
 fs/attr.c                             |  2 +-
 fs/btrfs/xattr.c                      |  3 +-
 fs/ceph/xattr.c                       |  3 +-
 fs/cifs/xattr.c                       |  2 +-
 fs/ecryptfs/inode.c                   |  6 ++--
 fs/ecryptfs/mmap.c                    |  5 ++--
 fs/erofs/xattr.c                      |  3 +-
 fs/ext2/xattr_security.c              |  2 +-
 fs/ext2/xattr_trusted.c               |  2 +-
 fs/ext2/xattr_user.c                  |  2 +-
 fs/ext4/xattr_hurd.c                  |  2 +-
 fs/ext4/xattr_security.c              |  2 +-
 fs/ext4/xattr_trusted.c               |  2 +-
 fs/ext4/xattr_user.c                  |  2 +-
 fs/f2fs/xattr.c                       |  4 +--
 fs/fuse/xattr.c                       |  4 +--
 fs/gfs2/xattr.c                       |  3 +-
 fs/hfs/attr.c                         |  2 +-
 fs/hfsplus/xattr.c                    |  3 +-
 fs/hfsplus/xattr_security.c           |  3 +-
 fs/hfsplus/xattr_trusted.c            |  3 +-
 fs/hfsplus/xattr_user.c               |  3 +-
 fs/inode.c                            |  7 +++--
 fs/internal.h                         |  3 +-
 fs/jffs2/security.c                   |  3 +-
 fs/jffs2/xattr_trusted.c              |  3 +-
 fs/jffs2/xattr_user.c                 |  3 +-
 fs/jfs/xattr.c                        |  5 ++--
 fs/kernfs/inode.c                     |  3 +-
 fs/nfs/nfs4proc.c                     |  9 ++++--
 fs/ntfs3/xattr.c                      |  2 +-
 fs/ocfs2/xattr.c                      |  9 ++++--
 fs/open.c                             |  2 +-
 fs/orangefs/xattr.c                   |  3 +-
 fs/overlayfs/super.c                  |  8 +++--
 fs/posix_acl.c                        |  2 +-
 fs/reiserfs/xattr_security.c          |  3 +-
 fs/reiserfs/xattr_trusted.c           |  3 +-
 fs/reiserfs/xattr_user.c              |  3 +-
 fs/squashfs/xattr.c                   |  2 +-
 fs/ubifs/xattr.c                      |  3 +-
 fs/xattr.c                            | 42 ++++++++++++++-------------
 fs/xfs/xfs_xattr.c                    |  3 +-
 include/linux/lsm_hook_defs.h         |  3 +-
 include/linux/security.h              |  6 ++--
 include/linux/xattr.h                 |  6 ++--
 include/uapi/linux/xattr.h            |  7 +++--
 mm/shmem.c                            |  3 +-
 net/socket.c                          |  3 +-
 security/commoncap.c                  | 11 ++++---
 security/integrity/evm/evm_main.c     | 13 +++++----
 security/security.c                   |  5 ++--
 security/selinux/hooks.c              | 19 +++++++-----
 security/smack/smack_lsm.c            | 18 +++++++-----
 58 files changed, 178 insertions(+), 118 deletions(-)

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index d36fe79167b3..322466cbb650 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -130,7 +130,7 @@ prototypes::
 	bool (*list)(struct dentry *dentry);
 	int (*get)(const struct xattr_handler *handler, struct dentry *dentry,
 		   struct inode *inode, const char *name, void *buffer,
-		   size_t size);
+		   size_t size, int flags);
 	int (*set)(const struct xattr_handler *handler,
                    struct user_namespace *mnt_userns,
                    struct dentry *dentry, struct inode *inode, const char *name,
diff --git a/fs/9p/acl.c b/fs/9p/acl.c
index 4dac4a0dc5f4..66389aaa2385 100644
--- a/fs/9p/acl.c
+++ b/fs/9p/acl.c
@@ -210,7 +210,8 @@ int v9fs_acl_mode(struct inode *dir, umode_t *modep,
 
 static int v9fs_xattr_get_acl(const struct xattr_handler *handler,
 			      struct dentry *dentry, struct inode *inode,
-			      const char *name, void *buffer, size_t size)
+			      const char *name, void *buffer, size_t size,
+			      int flags)
 {
 	struct v9fs_session_info *v9ses;
 	struct posix_acl *acl;
diff --git a/fs/9p/xattr.c b/fs/9p/xattr.c
index a824441b95a2..213ca59b1c35 100644
--- a/fs/9p/xattr.c
+++ b/fs/9p/xattr.c
@@ -141,7 +141,8 @@ ssize_t v9fs_listxattr(struct dentry *dentry, char *buffer, size_t buffer_size)
 
 static int v9fs_xattr_handler_get(const struct xattr_handler *handler,
 				  struct dentry *dentry, struct inode *inode,
-				  const char *name, void *buffer, size_t size)
+				  const char *name, void *buffer, size_t size,
+				  int flags)
 {
 	const char *full_name = xattr_full_name(handler, name);
 
diff --git a/fs/afs/xattr.c b/fs/afs/xattr.c
index 7751b0b3f81d..172013d3e0dd 100644
--- a/fs/afs/xattr.c
+++ b/fs/afs/xattr.c
@@ -36,7 +36,7 @@ static const struct afs_operation_ops afs_fetch_acl_operation = {
 static int afs_xattr_get_acl(const struct xattr_handler *handler,
 			     struct dentry *dentry,
 			     struct inode *inode, const char *name,
-			     void *buffer, size_t size)
+			     void *buffer, size_t size, int flags)
 {
 	struct afs_operation *op;
 	struct afs_vnode *vnode = AFS_FS_I(inode);
@@ -138,7 +138,7 @@ static const struct afs_operation_ops yfs_fetch_opaque_acl_operation = {
 static int afs_xattr_get_yfs(const struct xattr_handler *handler,
 			     struct dentry *dentry,
 			     struct inode *inode, const char *name,
-			     void *buffer, size_t size)
+			     void *buffer, size_t size, int flags)
 {
 	struct afs_operation *op;
 	struct afs_vnode *vnode = AFS_FS_I(inode);
@@ -268,7 +268,7 @@ static const struct xattr_handler afs_xattr_yfs_handler = {
 static int afs_xattr_get_cell(const struct xattr_handler *handler,
 			      struct dentry *dentry,
 			      struct inode *inode, const char *name,
-			      void *buffer, size_t size)
+			      void *buffer, size_t size, int flags)
 {
 	struct afs_vnode *vnode = AFS_FS_I(inode);
 	struct afs_cell *cell = vnode->volume->cell;
@@ -295,7 +295,7 @@ static const struct xattr_handler afs_xattr_afs_cell_handler = {
 static int afs_xattr_get_fid(const struct xattr_handler *handler,
 			     struct dentry *dentry,
 			     struct inode *inode, const char *name,
-			     void *buffer, size_t size)
+			     void *buffer, size_t size, int flags)
 {
 	struct afs_vnode *vnode = AFS_FS_I(inode);
 	char text[16 + 1 + 24 + 1 + 8 + 1];
@@ -333,7 +333,7 @@ static const struct xattr_handler afs_xattr_afs_fid_handler = {
 static int afs_xattr_get_volume(const struct xattr_handler *handler,
 			      struct dentry *dentry,
 			      struct inode *inode, const char *name,
-			      void *buffer, size_t size)
+			      void *buffer, size_t size, int flags)
 {
 	struct afs_vnode *vnode = AFS_FS_I(inode);
 	const char *volname = vnode->volume->name;
diff --git a/fs/attr.c b/fs/attr.c
index 473d21b3a86d..caa6662cb14a 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -342,7 +342,7 @@ int notify_change(struct user_namespace *mnt_userns, struct dentry *dentry,
 		attr->ia_mtime = timestamp_truncate(attr->ia_mtime, inode);
 
 	if (ia_valid & ATTR_KILL_PRIV) {
-		error = security_inode_need_killpriv(dentry);
+		error = security_inode_need_killpriv(mnt_userns, dentry);
 		if (error < 0)
 			return error;
 		if (error == 0)
diff --git a/fs/btrfs/xattr.c b/fs/btrfs/xattr.c
index 2837b4c8424d..7d9f7a56e7c0 100644
--- a/fs/btrfs/xattr.c
+++ b/fs/btrfs/xattr.c
@@ -378,7 +378,8 @@ ssize_t btrfs_listxattr(struct dentry *dentry, char *buffer, size_t size)
 
 static int btrfs_xattr_handler_get(const struct xattr_handler *handler,
 				   struct dentry *unused, struct inode *inode,
-				   const char *name, void *buffer, size_t size)
+				   const char *name, void *buffer, size_t size,
+				   int flags)
 {
 	name = xattr_full_name(handler, name);
 	return btrfs_getxattr(inode, name, buffer, size);
diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
index fcf7dfdecf96..0f4971025387 100644
--- a/fs/ceph/xattr.c
+++ b/fs/ceph/xattr.c
@@ -1256,7 +1256,8 @@ int __ceph_setxattr(struct inode *inode, const char *name,
 
 static int ceph_get_xattr_handler(const struct xattr_handler *handler,
 				  struct dentry *dentry, struct inode *inode,
-				  const char *name, void *value, size_t size)
+				  const char *name, void *value, size_t size,
+				  int flags)
 {
 	if (!ceph_is_valid_xattr(name))
 		return -EOPNOTSUPP;
diff --git a/fs/cifs/xattr.c b/fs/cifs/xattr.c
index 7d8b72d67c80..e55bb399916f 100644
--- a/fs/cifs/xattr.c
+++ b/fs/cifs/xattr.c
@@ -279,7 +279,7 @@ static int cifs_creation_time_get(struct dentry *dentry, struct inode *inode,
 
 static int cifs_xattr_get(const struct xattr_handler *handler,
 			  struct dentry *dentry, struct inode *inode,
-			  const char *name, void *value, size_t size)
+			  const char *name, void *value, size_t size, int flags)
 {
 	ssize_t rc = -EOPNOTSUPP;
 	unsigned int xid;
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 16d50dface59..f3a20a14dd8b 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -1050,7 +1050,8 @@ ecryptfs_getxattr_lower(struct dentry *lower_dentry, struct inode *lower_inode,
 		goto out;
 	}
 	inode_lock(lower_inode);
-	rc = __vfs_getxattr(lower_dentry, lower_inode, name, value, size);
+	rc = __vfs_getxattr(&init_user_ns, lower_dentry, lower_inode, name,
+			    value, size, XATTR_NOSECURITY);
 	inode_unlock(lower_inode);
 out:
 	return rc;
@@ -1156,7 +1157,8 @@ const struct inode_operations ecryptfs_main_iops = {
 
 static int ecryptfs_xattr_get(const struct xattr_handler *handler,
 			      struct dentry *dentry, struct inode *inode,
-			      const char *name, void *buffer, size_t size)
+			      const char *name, void *buffer, size_t size,
+			      int flags)
 {
 	return ecryptfs_getxattr(dentry, inode, name, buffer, size);
 }
diff --git a/fs/ecryptfs/mmap.c b/fs/ecryptfs/mmap.c
index 7d85e64ea62f..8635390f2bcb 100644
--- a/fs/ecryptfs/mmap.c
+++ b/fs/ecryptfs/mmap.c
@@ -422,8 +422,9 @@ static int ecryptfs_write_inode_size_to_xattr(struct inode *ecryptfs_inode)
 		goto out;
 	}
 	inode_lock(lower_inode);
-	size = __vfs_getxattr(lower_dentry, lower_inode, ECRYPTFS_XATTR_NAME,
-			      xattr_virt, PAGE_SIZE);
+	size = __vfs_getxattr(&init_user_ns, lower_dentry, lower_inode,
+			      ECRYPTFS_XATTR_NAME, xattr_virt, PAGE_SIZE,
+			      XATTR_NOSECURITY);
 	if (size < 0)
 		size = 8;
 	put_unaligned_be64(i_size_read(ecryptfs_inode), xattr_virt);
diff --git a/fs/erofs/xattr.c b/fs/erofs/xattr.c
index 01c581e93c5f..a900976b0f77 100644
--- a/fs/erofs/xattr.c
+++ b/fs/erofs/xattr.c
@@ -470,7 +470,8 @@ int erofs_getxattr(struct inode *inode, int index,
 
 static int erofs_xattr_generic_get(const struct xattr_handler *handler,
 				   struct dentry *unused, struct inode *inode,
-				   const char *name, void *buffer, size_t size)
+				   const char *name, void *buffer, size_t size,
+				   int flags)
 {
 	struct erofs_sb_info *const sbi = EROFS_I_SB(inode);
 
diff --git a/fs/ext2/xattr_security.c b/fs/ext2/xattr_security.c
index ebade1f52451..91c716ec140f 100644
--- a/fs/ext2/xattr_security.c
+++ b/fs/ext2/xattr_security.c
@@ -11,7 +11,7 @@
 static int
 ext2_xattr_security_get(const struct xattr_handler *handler,
 			struct dentry *unused, struct inode *inode,
-			const char *name, void *buffer, size_t size)
+			const char *name, void *buffer, size_t size, int flags)
 {
 	return ext2_xattr_get(inode, EXT2_XATTR_INDEX_SECURITY, name,
 			      buffer, size);
diff --git a/fs/ext2/xattr_trusted.c b/fs/ext2/xattr_trusted.c
index 18a87d5dd1ab..362f83e43c4c 100644
--- a/fs/ext2/xattr_trusted.c
+++ b/fs/ext2/xattr_trusted.c
@@ -18,7 +18,7 @@ ext2_xattr_trusted_list(struct dentry *dentry)
 static int
 ext2_xattr_trusted_get(const struct xattr_handler *handler,
 		       struct dentry *unused, struct inode *inode,
-		       const char *name, void *buffer, size_t size)
+		       const char *name, void *buffer, size_t size, int flags)
 {
 	return ext2_xattr_get(inode, EXT2_XATTR_INDEX_TRUSTED, name,
 			      buffer, size);
diff --git a/fs/ext2/xattr_user.c b/fs/ext2/xattr_user.c
index 58092449f8ff..37936d5979c8 100644
--- a/fs/ext2/xattr_user.c
+++ b/fs/ext2/xattr_user.c
@@ -20,7 +20,7 @@ ext2_xattr_user_list(struct dentry *dentry)
 static int
 ext2_xattr_user_get(const struct xattr_handler *handler,
 		    struct dentry *unused, struct inode *inode,
-		    const char *name, void *buffer, size_t size)
+		    const char *name, void *buffer, size_t size, int flags)
 {
 	if (!test_opt(inode->i_sb, XATTR_USER))
 		return -EOPNOTSUPP;
diff --git a/fs/ext4/xattr_hurd.c b/fs/ext4/xattr_hurd.c
index c78df5790377..7cce47e56a10 100644
--- a/fs/ext4/xattr_hurd.c
+++ b/fs/ext4/xattr_hurd.c
@@ -21,7 +21,7 @@ ext4_xattr_hurd_list(struct dentry *dentry)
 static int
 ext4_xattr_hurd_get(const struct xattr_handler *handler,
 		    struct dentry *unused, struct inode *inode,
-		    const char *name, void *buffer, size_t size)
+		    const char *name, void *buffer, size_t size, int flags)
 {
 	if (!test_opt(inode->i_sb, XATTR_USER))
 		return -EOPNOTSUPP;
diff --git a/fs/ext4/xattr_security.c b/fs/ext4/xattr_security.c
index 8213f66f7b2d..f85699f6ad17 100644
--- a/fs/ext4/xattr_security.c
+++ b/fs/ext4/xattr_security.c
@@ -15,7 +15,7 @@
 static int
 ext4_xattr_security_get(const struct xattr_handler *handler,
 			struct dentry *unused, struct inode *inode,
-			const char *name, void *buffer, size_t size)
+			const char *name, void *buffer, size_t size, int flags)
 {
 	return ext4_xattr_get(inode, EXT4_XATTR_INDEX_SECURITY,
 			      name, buffer, size);
diff --git a/fs/ext4/xattr_trusted.c b/fs/ext4/xattr_trusted.c
index 7c21ffb26d25..9b550a8a5c0b 100644
--- a/fs/ext4/xattr_trusted.c
+++ b/fs/ext4/xattr_trusted.c
@@ -22,7 +22,7 @@ ext4_xattr_trusted_list(struct dentry *dentry)
 static int
 ext4_xattr_trusted_get(const struct xattr_handler *handler,
 		       struct dentry *unused, struct inode *inode,
-		       const char *name, void *buffer, size_t size)
+		       const char *name, void *buffer, size_t size, int flags)
 {
 	return ext4_xattr_get(inode, EXT4_XATTR_INDEX_TRUSTED,
 			      name, buffer, size);
diff --git a/fs/ext4/xattr_user.c b/fs/ext4/xattr_user.c
index 2fe7ff0a479c..5483b158c5b5 100644
--- a/fs/ext4/xattr_user.c
+++ b/fs/ext4/xattr_user.c
@@ -21,7 +21,7 @@ ext4_xattr_user_list(struct dentry *dentry)
 static int
 ext4_xattr_user_get(const struct xattr_handler *handler,
 		    struct dentry *unused, struct inode *inode,
-		    const char *name, void *buffer, size_t size)
+		    const char *name, void *buffer, size_t size, int flags)
 {
 	if (!test_opt(inode->i_sb, XATTR_USER))
 		return -EOPNOTSUPP;
diff --git a/fs/f2fs/xattr.c b/fs/f2fs/xattr.c
index e348f33bcb2b..a73a3d0510f1 100644
--- a/fs/f2fs/xattr.c
+++ b/fs/f2fs/xattr.c
@@ -45,7 +45,7 @@ static void xattr_free(struct f2fs_sb_info *sbi, void *xattr_addr,
 
 static int f2fs_xattr_generic_get(const struct xattr_handler *handler,
 		struct dentry *unused, struct inode *inode,
-		const char *name, void *buffer, size_t size)
+		const char *name, void *buffer, size_t size, int flags)
 {
 	struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
 
@@ -101,7 +101,7 @@ static bool f2fs_xattr_trusted_list(struct dentry *dentry)
 
 static int f2fs_xattr_advise_get(const struct xattr_handler *handler,
 		struct dentry *unused, struct inode *inode,
-		const char *name, void *buffer, size_t size)
+		const char *name, void *buffer, size_t size, int flags)
 {
 	if (buffer)
 		*((char *)buffer) = F2FS_I(inode)->i_advise;
diff --git a/fs/fuse/xattr.c b/fs/fuse/xattr.c
index 0d3e7177fce0..d77d5844e6c2 100644
--- a/fs/fuse/xattr.c
+++ b/fs/fuse/xattr.c
@@ -180,7 +180,7 @@ int fuse_removexattr(struct inode *inode, const char *name)
 
 static int fuse_xattr_get(const struct xattr_handler *handler,
 			 struct dentry *dentry, struct inode *inode,
-			 const char *name, void *value, size_t size)
+			 const char *name, void *value, size_t size, int flags)
 {
 	if (fuse_is_bad(inode))
 		return -EIO;
@@ -210,7 +210,7 @@ static bool no_xattr_list(struct dentry *dentry)
 
 static int no_xattr_get(const struct xattr_handler *handler,
 			struct dentry *dentry, struct inode *inode,
-			const char *name, void *value, size_t size)
+			const char *name, void *value, size_t size, int flags)
 {
 	return -EOPNOTSUPP;
 }
diff --git a/fs/gfs2/xattr.c b/fs/gfs2/xattr.c
index 0c5650fe1fd1..d4e8fb14b18e 100644
--- a/fs/gfs2/xattr.c
+++ b/fs/gfs2/xattr.c
@@ -602,7 +602,8 @@ static int __gfs2_xattr_get(struct inode *inode, const char *name,
 
 static int gfs2_xattr_get(const struct xattr_handler *handler,
 			  struct dentry *unused, struct inode *inode,
-			  const char *name, void *buffer, size_t size)
+			  const char *name, void *buffer, size_t size,
+			  int flags)
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
 	struct gfs2_holder gh;
diff --git a/fs/hfs/attr.c b/fs/hfs/attr.c
index 2bd54efaf416..c42385c33d2a 100644
--- a/fs/hfs/attr.c
+++ b/fs/hfs/attr.c
@@ -115,7 +115,7 @@ static ssize_t __hfs_getxattr(struct inode *inode, enum hfs_xattr_type type,
 
 static int hfs_xattr_get(const struct xattr_handler *handler,
 			 struct dentry *unused, struct inode *inode,
-			 const char *name, void *value, size_t size)
+			 const char *name, void *value, size_t size, int flags)
 {
 	return __hfs_getxattr(inode, handler->flags, value, size);
 }
diff --git a/fs/hfsplus/xattr.c b/fs/hfsplus/xattr.c
index e2855ceefd39..65e625aa4fa9 100644
--- a/fs/hfsplus/xattr.c
+++ b/fs/hfsplus/xattr.c
@@ -838,7 +838,8 @@ static int hfsplus_removexattr(struct inode *inode, const char *name)
 
 static int hfsplus_osx_getxattr(const struct xattr_handler *handler,
 				struct dentry *unused, struct inode *inode,
-				const char *name, void *buffer, size_t size)
+				const char *name, void *buffer, size_t size,
+				int flags)
 {
 	/*
 	 * Don't allow retrieving properly prefixed attributes
diff --git a/fs/hfsplus/xattr_security.c b/fs/hfsplus/xattr_security.c
index c1c7a16cbf21..1b28b83c201e 100644
--- a/fs/hfsplus/xattr_security.c
+++ b/fs/hfsplus/xattr_security.c
@@ -15,7 +15,8 @@
 
 static int hfsplus_security_getxattr(const struct xattr_handler *handler,
 				     struct dentry *unused, struct inode *inode,
-				     const char *name, void *buffer, size_t size)
+				     const char *name, void *buffer,
+				     size_t size, int flags)
 {
 	return hfsplus_getxattr(inode, name, buffer, size,
 				XATTR_SECURITY_PREFIX,
diff --git a/fs/hfsplus/xattr_trusted.c b/fs/hfsplus/xattr_trusted.c
index e150372ec564..7caeff6243d7 100644
--- a/fs/hfsplus/xattr_trusted.c
+++ b/fs/hfsplus/xattr_trusted.c
@@ -14,7 +14,8 @@
 
 static int hfsplus_trusted_getxattr(const struct xattr_handler *handler,
 				    struct dentry *unused, struct inode *inode,
-				    const char *name, void *buffer, size_t size)
+				    const char *name, void *buffer,
+				    size_t size, int flags)
 {
 	return hfsplus_getxattr(inode, name, buffer, size,
 				XATTR_TRUSTED_PREFIX,
diff --git a/fs/hfsplus/xattr_user.c b/fs/hfsplus/xattr_user.c
index a6b60b153916..ca74443eb123 100644
--- a/fs/hfsplus/xattr_user.c
+++ b/fs/hfsplus/xattr_user.c
@@ -14,7 +14,8 @@
 
 static int hfsplus_user_getxattr(const struct xattr_handler *handler,
 				 struct dentry *unused, struct inode *inode,
-				 const char *name, void *buffer, size_t size)
+				 const char *name, void *buffer, size_t size,
+				 int flags)
 {
 
 	return hfsplus_getxattr(inode, name, buffer, size,
diff --git a/fs/inode.c b/fs/inode.c
index 3eba0940ffcf..a8234909ca5a 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1907,7 +1907,8 @@ EXPORT_SYMBOL(should_remove_suid);
  * response to write or truncate. Return 0 if nothing has to be changed.
  * Negative value on error (change should be denied).
  */
-int dentry_needs_remove_privs(struct dentry *dentry)
+int dentry_needs_remove_privs(struct user_namespace *mnt_userns,
+			      struct dentry *dentry)
 {
 	struct inode *inode = d_inode(dentry);
 	int mask = 0;
@@ -1917,7 +1918,7 @@ int dentry_needs_remove_privs(struct dentry *dentry)
 		return 0;
 
 	mask = should_remove_suid(dentry);
-	ret = security_inode_need_killpriv(dentry);
+	ret = security_inode_need_killpriv(mnt_userns, dentry);
 	if (ret < 0)
 		return ret;
 	if (ret)
@@ -1958,7 +1959,7 @@ int file_remove_privs(struct file *file)
 	if (IS_NOSEC(inode) || !S_ISREG(inode->i_mode))
 		return 0;
 
-	kill = dentry_needs_remove_privs(dentry);
+	kill = dentry_needs_remove_privs(file_mnt_user_ns(file), dentry);
 	if (kill < 0)
 		return kill;
 	if (kill)
diff --git a/fs/internal.h b/fs/internal.h
index 7979ff8d168c..2ed729447c4d 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -138,7 +138,8 @@ extern int vfs_open(const struct path *, struct file *);
  * inode.c
  */
 extern long prune_icache_sb(struct super_block *sb, struct shrink_control *sc);
-extern int dentry_needs_remove_privs(struct dentry *dentry);
+extern int dentry_needs_remove_privs(struct user_namespace *mnt_userns,
+				     struct dentry *dentry);
 
 /*
  * fs-writeback.c
diff --git a/fs/jffs2/security.c b/fs/jffs2/security.c
index aef5522551db..c443c4e47208 100644
--- a/fs/jffs2/security.c
+++ b/fs/jffs2/security.c
@@ -50,7 +50,8 @@ int jffs2_init_security(struct inode *inode, struct inode *dir,
 /* ---- XATTR Handler for "security.*" ----------------- */
 static int jffs2_security_getxattr(const struct xattr_handler *handler,
 				   struct dentry *unused, struct inode *inode,
-				   const char *name, void *buffer, size_t size)
+				   const char *name, void *buffer, size_t size,
+				   int flags)
 {
 	return do_jffs2_getxattr(inode, JFFS2_XPREFIX_SECURITY,
 				 name, buffer, size);
diff --git a/fs/jffs2/xattr_trusted.c b/fs/jffs2/xattr_trusted.c
index cc3f24883e7d..95ed9ce1eaaf 100644
--- a/fs/jffs2/xattr_trusted.c
+++ b/fs/jffs2/xattr_trusted.c
@@ -18,7 +18,8 @@
 
 static int jffs2_trusted_getxattr(const struct xattr_handler *handler,
 				  struct dentry *unused, struct inode *inode,
-				  const char *name, void *buffer, size_t size)
+				  const char *name, void *buffer, size_t size,
+				  int flags)
 {
 	return do_jffs2_getxattr(inode, JFFS2_XPREFIX_TRUSTED,
 				 name, buffer, size);
diff --git a/fs/jffs2/xattr_user.c b/fs/jffs2/xattr_user.c
index fb945977c013..418bb8d2758f 100644
--- a/fs/jffs2/xattr_user.c
+++ b/fs/jffs2/xattr_user.c
@@ -18,7 +18,8 @@
 
 static int jffs2_user_getxattr(const struct xattr_handler *handler,
 			       struct dentry *unused, struct inode *inode,
-			       const char *name, void *buffer, size_t size)
+			       const char *name, void *buffer, size_t size,
+			       int flags)
 {
 	return do_jffs2_getxattr(inode, JFFS2_XPREFIX_USER,
 				 name, buffer, size);
diff --git a/fs/jfs/xattr.c b/fs/jfs/xattr.c
index f9273f6901c8..8728df337090 100644
--- a/fs/jfs/xattr.c
+++ b/fs/jfs/xattr.c
@@ -925,7 +925,7 @@ static int __jfs_xattr_set(struct inode *inode, const char *name,
 
 static int jfs_xattr_get(const struct xattr_handler *handler,
 			 struct dentry *unused, struct inode *inode,
-			 const char *name, void *value, size_t size)
+			 const char *name, void *value, size_t size, int flags)
 {
 	name = xattr_full_name(handler, name);
 	return __jfs_getxattr(inode, name, value, size);
@@ -943,7 +943,8 @@ static int jfs_xattr_set(const struct xattr_handler *handler,
 
 static int jfs_xattr_get_os2(const struct xattr_handler *handler,
 			     struct dentry *unused, struct inode *inode,
-			     const char *name, void *value, size_t size)
+			     const char *name, void *value, size_t size,
+			     int flags)
 {
 	if (is_known_namespace(name))
 		return -EOPNOTSUPP;
diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
index c0eae1725435..d56210c657f0 100644
--- a/fs/kernfs/inode.c
+++ b/fs/kernfs/inode.c
@@ -313,7 +313,8 @@ int kernfs_xattr_set(struct kernfs_node *kn, const char *name,
 
 static int kernfs_vfs_xattr_get(const struct xattr_handler *handler,
 				struct dentry *unused, struct inode *inode,
-				const char *suffix, void *value, size_t size)
+				const char *suffix, void *value, size_t size,
+				int flags)
 {
 	const char *name = xattr_full_name(handler, suffix);
 	struct kernfs_node *kn = inode->i_private;
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index ee3bc79f6ca3..4e5f66623cda 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7542,7 +7542,8 @@ static int nfs4_xattr_set_nfs4_acl(const struct xattr_handler *handler,
 
 static int nfs4_xattr_get_nfs4_acl(const struct xattr_handler *handler,
 				   struct dentry *unused, struct inode *inode,
-				   const char *key, void *buf, size_t buflen)
+				   const char *key, void *buf, size_t buflen,
+				   int flags)
 {
 	return nfs4_proc_get_acl(inode, buf, buflen);
 }
@@ -7568,7 +7569,8 @@ static int nfs4_xattr_set_nfs4_label(const struct xattr_handler *handler,
 
 static int nfs4_xattr_get_nfs4_label(const struct xattr_handler *handler,
 				     struct dentry *unused, struct inode *inode,
-				     const char *key, void *buf, size_t buflen)
+				     const char *key, void *buf, size_t buflen,
+				     int flags)
 {
 	if (security_ismaclabel(key))
 		return nfs4_get_security_label(inode, buf, buflen);
@@ -7646,7 +7648,8 @@ static int nfs4_xattr_set_nfs4_user(const struct xattr_handler *handler,
 
 static int nfs4_xattr_get_nfs4_user(const struct xattr_handler *handler,
 				    struct dentry *unused, struct inode *inode,
-				    const char *key, void *buf, size_t buflen)
+				    const char *key, void *buf, size_t buflen,
+				    int flags)
 {
 	struct nfs_access_entry cache;
 	ssize_t ret;
diff --git a/fs/ntfs3/xattr.c b/fs/ntfs3/xattr.c
index afd0ddad826f..fdc54f7573d5 100644
--- a/fs/ntfs3/xattr.c
+++ b/fs/ntfs3/xattr.c
@@ -710,7 +710,7 @@ ssize_t ntfs_listxattr(struct dentry *dentry, char *buffer, size_t size)
 
 static int ntfs_getxattr(const struct xattr_handler *handler, struct dentry *de,
 			 struct inode *inode, const char *name, void *buffer,
-			 size_t size)
+			 size_t size, int flags)
 {
 	int err;
 	struct ntfs_inode *ni = ntfs_i(inode);
diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index dd784eb0cd7c..bd019ed56555 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -7240,7 +7240,8 @@ int ocfs2_init_security_and_acl(struct inode *dir,
  */
 static int ocfs2_xattr_security_get(const struct xattr_handler *handler,
 				    struct dentry *unused, struct inode *inode,
-				    const char *name, void *buffer, size_t size)
+				    const char *name, void *buffer, size_t size,
+				    int flags)
 {
 	return ocfs2_xattr_get(inode, OCFS2_XATTR_INDEX_SECURITY,
 			       name, buffer, size);
@@ -7313,7 +7314,8 @@ const struct xattr_handler ocfs2_xattr_security_handler = {
  */
 static int ocfs2_xattr_trusted_get(const struct xattr_handler *handler,
 				   struct dentry *unused, struct inode *inode,
-				   const char *name, void *buffer, size_t size)
+				   const char *name, void *buffer, size_t size,
+				   int flags)
 {
 	return ocfs2_xattr_get(inode, OCFS2_XATTR_INDEX_TRUSTED,
 			       name, buffer, size);
@@ -7340,7 +7342,8 @@ const struct xattr_handler ocfs2_xattr_trusted_handler = {
  */
 static int ocfs2_xattr_user_get(const struct xattr_handler *handler,
 				struct dentry *unused, struct inode *inode,
-				const char *name, void *buffer, size_t size)
+				const char *name, void *buffer, size_t size,
+				int flags)
 {
 	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
 
diff --git a/fs/open.c b/fs/open.c
index f732fb94600c..0591189a169a 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -53,7 +53,7 @@ int do_truncate(struct user_namespace *mnt_userns, struct dentry *dentry,
 	}
 
 	/* Remove suid, sgid, and file capabilities on truncate too */
-	ret = dentry_needs_remove_privs(dentry);
+	ret = dentry_needs_remove_privs(mnt_userns, dentry);
 	if (ret < 0)
 		return ret;
 	if (ret)
diff --git a/fs/orangefs/xattr.c b/fs/orangefs/xattr.c
index 9a5b757fbd2f..2cc3bc61235b 100644
--- a/fs/orangefs/xattr.c
+++ b/fs/orangefs/xattr.c
@@ -542,7 +542,8 @@ static int orangefs_xattr_get_default(const struct xattr_handler *handler,
 				      struct inode *inode,
 				      const char *name,
 				      void *buffer,
-				      size_t size)
+				      size_t size,
+				      int flags)
 {
 	return orangefs_inode_getxattr(inode, name, buffer, size);
 
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 265181c110ae..359aa5772cb7 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1004,7 +1004,7 @@ static unsigned int ovl_split_lowerdirs(char *str)
 static int __maybe_unused
 ovl_posix_acl_xattr_get(const struct xattr_handler *handler,
 			struct dentry *dentry, struct inode *inode,
-			const char *name, void *buffer, size_t size)
+			const char *name, void *buffer, size_t size, int flags)
 {
 	return ovl_xattr_get(dentry, inode, handler->name, buffer, size);
 }
@@ -1067,7 +1067,8 @@ ovl_posix_acl_xattr_set(const struct xattr_handler *handler,
 
 static int ovl_own_xattr_get(const struct xattr_handler *handler,
 			     struct dentry *dentry, struct inode *inode,
-			     const char *name, void *buffer, size_t size)
+			     const char *name, void *buffer, size_t size,
+			     int flags)
 {
 	return -EOPNOTSUPP;
 }
@@ -1083,7 +1084,8 @@ static int ovl_own_xattr_set(const struct xattr_handler *handler,
 
 static int ovl_other_xattr_get(const struct xattr_handler *handler,
 			       struct dentry *dentry, struct inode *inode,
-			       const char *name, void *buffer, size_t size)
+			       const char *name, void *buffer, size_t size,
+			       int flags)
 {
 	return ovl_xattr_get(dentry, inode, name, buffer, size);
 }
diff --git a/fs/posix_acl.c b/fs/posix_acl.c
index 9323a854a60a..e6cd6065ddd0 100644
--- a/fs/posix_acl.c
+++ b/fs/posix_acl.c
@@ -887,7 +887,7 @@ EXPORT_SYMBOL (posix_acl_to_xattr);
 static int
 posix_acl_xattr_get(const struct xattr_handler *handler,
 		    struct dentry *unused, struct inode *inode,
-		    const char *name, void *value, size_t size)
+		    const char *name, void *value, size_t size, int flags)
 {
 	struct posix_acl *acl;
 	int error;
diff --git a/fs/reiserfs/xattr_security.c b/fs/reiserfs/xattr_security.c
index 8965c8e5e172..1fd1359a98ba 100644
--- a/fs/reiserfs/xattr_security.c
+++ b/fs/reiserfs/xattr_security.c
@@ -11,7 +11,8 @@
 
 static int
 security_get(const struct xattr_handler *handler, struct dentry *unused,
-	     struct inode *inode, const char *name, void *buffer, size_t size)
+	     struct inode *inode, const char *name, void *buffer, size_t size,
+	     int flags)
 {
 	if (IS_PRIVATE(inode))
 		return -EPERM;
diff --git a/fs/reiserfs/xattr_trusted.c b/fs/reiserfs/xattr_trusted.c
index d853cea2afcd..f54b848e9504 100644
--- a/fs/reiserfs/xattr_trusted.c
+++ b/fs/reiserfs/xattr_trusted.c
@@ -10,7 +10,8 @@
 
 static int
 trusted_get(const struct xattr_handler *handler, struct dentry *unused,
-	    struct inode *inode, const char *name, void *buffer, size_t size)
+	    struct inode *inode, const char *name, void *buffer, size_t size,
+	    int flags)
 {
 	if (!capable(CAP_SYS_ADMIN) || IS_PRIVATE(inode))
 		return -EPERM;
diff --git a/fs/reiserfs/xattr_user.c b/fs/reiserfs/xattr_user.c
index 65d9cd10a5ea..ddd98cdab24d 100644
--- a/fs/reiserfs/xattr_user.c
+++ b/fs/reiserfs/xattr_user.c
@@ -9,7 +9,8 @@
 
 static int
 user_get(const struct xattr_handler *handler, struct dentry *unused,
-	 struct inode *inode, const char *name, void *buffer, size_t size)
+	 struct inode *inode, const char *name, void *buffer, size_t size,
+	 int flags)
 {
 	if (!reiserfs_xattrs_user(inode->i_sb))
 		return -EOPNOTSUPP;
diff --git a/fs/squashfs/xattr.c b/fs/squashfs/xattr.c
index e1e3f3dd5a06..d8d58c990652 100644
--- a/fs/squashfs/xattr.c
+++ b/fs/squashfs/xattr.c
@@ -204,7 +204,7 @@ static int squashfs_xattr_handler_get(const struct xattr_handler *handler,
 				      struct dentry *unused,
 				      struct inode *inode,
 				      const char *name,
-				      void *buffer, size_t size)
+				      void *buffer, size_t size, int flags)
 {
 	return squashfs_xattr_get(inode, handler->flags, name,
 		buffer, size);
diff --git a/fs/ubifs/xattr.c b/fs/ubifs/xattr.c
index e4f193eae4b2..681663f5bfa1 100644
--- a/fs/ubifs/xattr.c
+++ b/fs/ubifs/xattr.c
@@ -689,7 +689,8 @@ int ubifs_init_security(struct inode *dentry, struct inode *inode,
 
 static int xattr_get(const struct xattr_handler *handler,
 			   struct dentry *dentry, struct inode *inode,
-			   const char *name, void *buffer, size_t size)
+			   const char *name, void *buffer, size_t size,
+			   int flags)
 {
 	dbg_gen("xattr '%s', ino %lu ('%pd'), buf size %zd", name,
 		inode->i_ino, dentry, size);
diff --git a/fs/xattr.c b/fs/xattr.c
index 5c8c5175b385..4c36ddd6ac3c 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -369,7 +369,7 @@ vfs_getxattr_alloc(struct user_namespace *mnt_userns, struct dentry *dentry,
 		return PTR_ERR(handler);
 	if (!handler->get)
 		return -EOPNOTSUPP;
-	error = handler->get(handler, dentry, inode, name, NULL, 0);
+	error = handler->get(handler, dentry, inode, name, NULL, 0, 0);
 	if (error < 0)
 		return error;
 
@@ -380,33 +380,22 @@ vfs_getxattr_alloc(struct user_namespace *mnt_userns, struct dentry *dentry,
 		memset(value, 0, error + 1);
 	}
 
-	error = handler->get(handler, dentry, inode, name, value, error);
+	error = handler->get(handler, dentry, inode, name, value, error, 0);
 	*xattr_value = value;
 	return error;
 }
 
 ssize_t
-__vfs_getxattr(struct dentry *dentry, struct inode *inode, const char *name,
-	       void *value, size_t size)
+__vfs_getxattr(struct user_namespace *mnt_userns, struct dentry *dentry,
+	       struct inode *inode, const char *name, void *value,
+	       size_t size, int flags)
 {
 	const struct xattr_handler *handler;
-
-	handler = xattr_resolve_name(inode, &name);
-	if (IS_ERR(handler))
-		return PTR_ERR(handler);
-	if (!handler->get)
-		return -EOPNOTSUPP;
-	return handler->get(handler, dentry, inode, name, value, size);
-}
-EXPORT_SYMBOL(__vfs_getxattr);
-
-ssize_t
-vfs_getxattr(struct user_namespace *mnt_userns, struct dentry *dentry,
-	     const char *name, void *value, size_t size)
-{
-	struct inode *inode = dentry->d_inode;
 	int error;
 
+	if (flags & XATTR_NOSECURITY)
+		goto nolsm;
+
 	error = xattr_permission(mnt_userns, inode, name, MAY_READ);
 	if (error)
 		return error;
@@ -429,7 +418,20 @@ vfs_getxattr(struct user_namespace *mnt_userns, struct dentry *dentry,
 		return ret;
 	}
 nolsm:
-	return __vfs_getxattr(dentry, inode, name, value, size);
+	handler = xattr_resolve_name(inode, &name);
+	if (IS_ERR(handler))
+		return PTR_ERR(handler);
+	if (!handler->get)
+		return -EOPNOTSUPP;
+	return handler->get(handler, dentry, inode, name, value, size, flags);
+}
+EXPORT_SYMBOL(__vfs_getxattr);
+
+ssize_t
+vfs_getxattr(struct user_namespace *mnt_userns, struct dentry *dentry,
+	     const char *name, void *value, size_t size)
+{
+	return __vfs_getxattr(mnt_userns, dentry, dentry->d_inode, name, value, size, 0);
 }
 EXPORT_SYMBOL_GPL(vfs_getxattr);
 
diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
index 0d050f8829ef..296b494e67c6 100644
--- a/fs/xfs/xfs_xattr.c
+++ b/fs/xfs/xfs_xattr.c
@@ -21,7 +21,8 @@
 
 static int
 xfs_xattr_get(const struct xattr_handler *handler, struct dentry *unused,
-		struct inode *inode, const char *name, void *value, size_t size)
+		struct inode *inode, const char *name, void *value, size_t size,
+		int flags)
 {
 	struct xfs_da_args	args = {
 		.dp		= XFS_I(inode),
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index df8de62f4710..999375766635 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -147,7 +147,8 @@ LSM_HOOK(int, 0, inode_getxattr, struct dentry *dentry, const char *name)
 LSM_HOOK(int, 0, inode_listxattr, struct dentry *dentry)
 LSM_HOOK(int, 0, inode_removexattr, struct user_namespace *mnt_userns,
 	 struct dentry *dentry, const char *name)
-LSM_HOOK(int, 0, inode_need_killpriv, struct dentry *dentry)
+LSM_HOOK(int, 0, inode_need_killpriv, struct user_namespace *mnt_userns,
+	 struct dentry *dentry)
 LSM_HOOK(int, 0, inode_killpriv, struct user_namespace *mnt_userns,
 	 struct dentry *dentry)
 LSM_HOOK(int, -EOPNOTSUPP, inode_getsecurity, struct user_namespace *mnt_userns,
diff --git a/include/linux/security.h b/include/linux/security.h
index bbf44a466832..c48992c71295 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -150,7 +150,8 @@ int cap_inode_setxattr(struct dentry *dentry, const char *name,
 		       const void *value, size_t size, int flags);
 int cap_inode_removexattr(struct user_namespace *mnt_userns,
 			  struct dentry *dentry, const char *name);
-int cap_inode_need_killpriv(struct dentry *dentry);
+int cap_inode_need_killpriv(struct user_namespace *mnt_userns,
+			    struct dentry *dentry);
 int cap_inode_killpriv(struct user_namespace *mnt_userns,
 		       struct dentry *dentry);
 int cap_inode_getsecurity(struct user_namespace *mnt_userns,
@@ -364,7 +365,8 @@ int security_inode_getxattr(struct dentry *dentry, const char *name);
 int security_inode_listxattr(struct dentry *dentry);
 int security_inode_removexattr(struct user_namespace *mnt_userns,
 			       struct dentry *dentry, const char *name);
-int security_inode_need_killpriv(struct dentry *dentry);
+int security_inode_need_killpriv(struct user_namespace *mnt_userns,
+				 struct dentry *dentry);
 int security_inode_killpriv(struct user_namespace *mnt_userns,
 			    struct dentry *dentry);
 int security_inode_getsecurity(struct user_namespace *mnt_userns,
diff --git a/include/linux/xattr.h b/include/linux/xattr.h
index 4c379d23ec6e..663a04ae6223 100644
--- a/include/linux/xattr.h
+++ b/include/linux/xattr.h
@@ -34,7 +34,7 @@ struct xattr_handler {
 	bool (*list)(struct dentry *dentry);
 	int (*get)(const struct xattr_handler *, struct dentry *dentry,
 		   struct inode *inode, const char *name, void *buffer,
-		   size_t size);
+		   size_t size, int flags);
 	int (*set)(const struct xattr_handler *,
 		   struct user_namespace *mnt_userns, struct dentry *dentry,
 		   struct inode *inode, const char *name, const void *buffer,
@@ -49,7 +49,9 @@ struct xattr {
 	size_t value_len;
 };
 
-ssize_t __vfs_getxattr(struct dentry *, struct inode *, const char *, void *, size_t);
+ssize_t __vfs_getxattr(struct user_namespace *mnt_userns, struct dentry *dentry,
+		       struct inode *inode, const char *name, void *buffer,
+		       size_t size, int flags);
 ssize_t vfs_getxattr(struct user_namespace *, struct dentry *, const char *,
 		     void *, size_t);
 ssize_t vfs_listxattr(struct dentry *d, char *list, size_t size);
diff --git a/include/uapi/linux/xattr.h b/include/uapi/linux/xattr.h
index 9463db2dfa9d..d22191a3cf09 100644
--- a/include/uapi/linux/xattr.h
+++ b/include/uapi/linux/xattr.h
@@ -18,8 +18,11 @@
 #if __UAPI_DEF_XATTR
 #define __USE_KERNEL_XATTR_DEFS
 
-#define XATTR_CREATE	0x1	/* set value, fail if attr already exists */
-#define XATTR_REPLACE	0x2	/* set value, fail if attr does not exist */
+#define XATTR_CREATE	 0x1	/* set value, fail if attr already exists */
+#define XATTR_REPLACE	 0x2	/* set value, fail if attr does not exist */
+#ifdef __KERNEL__ /* following is kernel internal, colocated for maintenance */
+#define XATTR_NOSECURITY 0x4	/* get value, do not involve security check */
+#endif
 #endif
 
 /* Namespaces */
diff --git a/mm/shmem.c b/mm/shmem.c
index dc038ce78700..ece9a84c7701 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3152,7 +3152,8 @@ static int shmem_initxattrs(struct inode *inode,
 
 static int shmem_xattr_handler_get(const struct xattr_handler *handler,
 				   struct dentry *unused, struct inode *inode,
-				   const char *name, void *buffer, size_t size)
+				   const char *name, void *buffer, size_t size,
+				   int flags)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
 
diff --git a/net/socket.c b/net/socket.c
index 7f64a6eccf63..8704bbf55eda 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -364,7 +364,8 @@ static const struct dentry_operations sockfs_dentry_operations = {
 
 static int sockfs_xattr_get(const struct xattr_handler *handler,
 			    struct dentry *dentry, struct inode *inode,
-			    const char *suffix, void *value, size_t size)
+			    const char *suffix, void *value, size_t size,
+			    int flags)
 {
 	if (value) {
 		if (dentry->d_name.len + 1 > size)
diff --git a/security/commoncap.c b/security/commoncap.c
index 3f810d37b71b..05a5bc2c0db9 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -292,12 +292,14 @@ int cap_capset(struct cred *new,
  * Return: 1 if security.capability has a value, meaning inode_killpriv()
  * is required, 0 otherwise, meaning inode_killpriv() is not required.
  */
-int cap_inode_need_killpriv(struct dentry *dentry)
+int cap_inode_need_killpriv(struct user_namespace *mnt_userns,
+			    struct dentry *dentry)
 {
 	struct inode *inode = d_backing_inode(dentry);
 	int error;
 
-	error = __vfs_getxattr(dentry, inode, XATTR_NAME_CAPS, NULL, 0);
+	error = __vfs_getxattr(mnt_userns, dentry, inode, XATTR_NAME_CAPS,
+			       NULL, 0, XATTR_NOSECURITY);
 	return error > 0;
 }
 
@@ -660,8 +662,9 @@ int get_vfs_caps_from_disk(struct user_namespace *mnt_userns,
 		return -ENODATA;
 
 	fs_ns = inode->i_sb->s_user_ns;
-	size = __vfs_getxattr((struct dentry *)dentry, inode,
-			      XATTR_NAME_CAPS, &data, XATTR_CAPS_SZ);
+	size = __vfs_getxattr(mnt_userns, (struct dentry *)dentry, inode,
+			      XATTR_NAME_CAPS, &data, XATTR_CAPS_SZ,
+			      XATTR_NOSECURITY);
 	if (size == -ENODATA || size == -EOPNOTSUPP)
 		/* no data, that's ok */
 		return -ENODATA;
diff --git a/security/integrity/evm/evm_main.c b/security/integrity/evm/evm_main.c
index 08f907382c61..e8335fb04c7a 100644
--- a/security/integrity/evm/evm_main.c
+++ b/security/integrity/evm/evm_main.c
@@ -145,7 +145,8 @@ static int evm_find_protected_xattrs(struct dentry *dentry)
 		return -EOPNOTSUPP;
 
 	list_for_each_entry_lockless(xattr, &evm_config_xattrnames, list) {
-		error = __vfs_getxattr(dentry, inode, xattr->name, NULL, 0);
+		error = __vfs_getxattr(&init_user_ns, dentry, inode,
+				       xattr->name, NULL, 0, XATTR_NOSECURITY);
 		if (error < 0) {
 			if (error == -ENODATA)
 				continue;
@@ -343,8 +344,9 @@ int evm_read_protected_xattrs(struct dentry *dentry, u8 *buffer,
 	int rc, size, total_size = 0;
 
 	list_for_each_entry_lockless(xattr, &evm_config_xattrnames, list) {
-		rc = __vfs_getxattr(dentry, d_backing_inode(dentry),
-				    xattr->name, NULL, 0);
+		rc = __vfs_getxattr(&init_user_ns, dentry,
+				    d_backing_inode(dentry), xattr->name, NULL,
+				    0, XATTR_NOSECURITY);
 		if (rc < 0 && rc == -ENODATA)
 			continue;
 		else if (rc < 0)
@@ -372,10 +374,11 @@ int evm_read_protected_xattrs(struct dentry *dentry, u8 *buffer,
 		case 'v':
 			size = rc;
 			if (buffer) {
-				rc = __vfs_getxattr(dentry,
+				rc = __vfs_getxattr(&init_user_ns, dentry,
 					d_backing_inode(dentry), xattr->name,
 					buffer + total_size,
-					buffer_size - total_size);
+					buffer_size - total_size,
+					XATTR_NOSECURITY);
 				if (rc < 0)
 					return rc;
 			}
diff --git a/security/security.c b/security/security.c
index c88167a414b4..cc75b37dedc0 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1404,9 +1404,10 @@ int security_inode_removexattr(struct user_namespace *mnt_userns,
 	return evm_inode_removexattr(mnt_userns, dentry, name);
 }
 
-int security_inode_need_killpriv(struct dentry *dentry)
+int security_inode_need_killpriv(struct user_namespace *mnt_userns,
+				 struct dentry *dentry)
 {
-	return call_int_hook(inode_need_killpriv, 0, dentry);
+	return call_int_hook(inode_need_killpriv, 0, mnt_userns, dentry);
 }
 
 int security_inode_killpriv(struct user_namespace *mnt_userns,
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 62d30c0a30c2..84c12072e38a 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -490,7 +490,7 @@ static int selinux_is_sblabel_mnt(struct super_block *sb)
 	}
 }
 
-static int sb_check_xattr_support(struct super_block *sb)
+static int sb_check_xattr_support(struct user_namespace *mnt_userns, struct super_block *sb)
 {
 	struct superblock_security_struct *sbsec = sb->s_security;
 	struct dentry *root = sb->s_root;
@@ -511,7 +511,8 @@ static int sb_check_xattr_support(struct super_block *sb)
 		goto fallback;
 	}
 
-	rc = __vfs_getxattr(root, root_inode, XATTR_NAME_SELINUX, NULL, 0);
+	rc = __vfs_getxattr(mnt_userns, root, root_inode, XATTR_NAME_SELINUX, NULL, 0,
+			    XATTR_NOSECURITY);
 	if (rc < 0 && rc != -ENODATA) {
 		if (rc == -EOPNOTSUPP) {
 			pr_warn("SELinux: (dev %s, type %s) has no security xattr handler\n",
@@ -547,7 +548,7 @@ static int sb_finish_set_opts(struct super_block *sb)
 	int rc = 0;
 
 	if (sbsec->behavior == SECURITY_FS_USE_XATTR) {
-		rc = sb_check_xattr_support(sb);
+		rc = sb_check_xattr_support(sb->s_user_ns, sb);
 		if (rc)
 			return rc;
 	}
@@ -1371,12 +1372,15 @@ static int inode_doinit_use_xattr(struct inode *inode, struct dentry *dentry,
 		return -ENOMEM;
 
 	context[len] = '\0';
-	rc = __vfs_getxattr(dentry, inode, XATTR_NAME_SELINUX, context, len);
+	rc = __vfs_getxattr(&init_user_ns, dentry, inode, XATTR_NAME_SELINUX,
+			    context, len, XATTR_NOSECURITY);
 	if (rc == -ERANGE) {
 		kfree(context);
 
 		/* Need a larger buffer.  Query for the right size. */
-		rc = __vfs_getxattr(dentry, inode, XATTR_NAME_SELINUX, NULL, 0);
+		rc = __vfs_getxattr(&init_user_ns, dentry, inode,
+				    XATTR_NAME_SELINUX, NULL, 0,
+				    XATTR_NOSECURITY);
 		if (rc < 0)
 			return rc;
 
@@ -1386,8 +1390,9 @@ static int inode_doinit_use_xattr(struct inode *inode, struct dentry *dentry,
 			return -ENOMEM;
 
 		context[len] = '\0';
-		rc = __vfs_getxattr(dentry, inode, XATTR_NAME_SELINUX,
-				    context, len);
+		rc = __vfs_getxattr(&init_user_ns, dentry, inode,
+				    XATTR_NAME_SELINUX, context, len,
+				    XATTR_NOSECURITY);
 	}
 	if (rc < 0) {
 		kfree(context);
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index efd35b07c7f8..5ea91eae1054 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -277,8 +277,9 @@ static int smk_bu_credfile(const struct cred *cred, struct file *file,
  * Returns a pointer to the master list entry for the Smack label,
  * NULL if there was no label to fetch, or an error code.
  */
-static struct smack_known *smk_fetch(const char *name, struct inode *ip,
-					struct dentry *dp)
+static struct smack_known *smk_fetch(struct user_namespace *mnt_userns,
+				     const char *name, struct inode *ip,
+				     struct dentry *dp)
 {
 	int rc;
 	char *buffer;
@@ -291,7 +292,8 @@ static struct smack_known *smk_fetch(const char *name, struct inode *ip,
 	if (buffer == NULL)
 		return ERR_PTR(-ENOMEM);
 
-	rc = __vfs_getxattr(dp, ip, name, buffer, SMK_LONGLABEL);
+	rc = __vfs_getxattr(mnt_userns, dp, ip, name, buffer, SMK_LONGLABEL,
+			    XATTR_NOSECURITY);
 	if (rc < 0)
 		skp = ERR_PTR(rc);
 	else if (rc == 0)
@@ -3414,7 +3416,7 @@ static void smack_d_instantiate(struct dentry *opt_dentry, struct inode *inode)
 		 * Get the dentry for xattr.
 		 */
 		dp = dget(opt_dentry);
-		skp = smk_fetch(XATTR_NAME_SMACK, inode, dp);
+		skp = smk_fetch(&init_user_ns, XATTR_NAME_SMACK, inode, dp);
 		if (!IS_ERR_OR_NULL(skp))
 			final = skp;
 
@@ -3438,9 +3440,9 @@ static void smack_d_instantiate(struct dentry *opt_dentry, struct inode *inode)
 					TRANS_TRUE, TRANS_TRUE_SIZE,
 					0);
 			} else {
-				rc = __vfs_getxattr(dp, inode,
+				rc = __vfs_getxattr(&init_user_ns, dp, inode,
 					XATTR_NAME_SMACKTRANSMUTE, trattr,
-					TRANS_TRUE_SIZE);
+					TRANS_TRUE_SIZE, XATTR_NOSECURITY);
 				if (rc >= 0 && strncmp(trattr, TRANS_TRUE,
 						       TRANS_TRUE_SIZE) != 0)
 					rc = -EINVAL;
@@ -3451,13 +3453,13 @@ static void smack_d_instantiate(struct dentry *opt_dentry, struct inode *inode)
 		/*
 		 * Don't let the exec or mmap label be "*" or "@".
 		 */
-		skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
+		skp = smk_fetch(&init_user_ns, XATTR_NAME_SMACKEXEC, inode, dp);
 		if (IS_ERR(skp) || skp == &smack_known_star ||
 		    skp == &smack_known_web)
 			skp = NULL;
 		isp->smk_task = skp;
 
-		skp = smk_fetch(XATTR_NAME_SMACKMMAP, inode, dp);
+		skp = smk_fetch(&init_user_ns, XATTR_NAME_SMACKMMAP, inode, dp);
 		if (IS_ERR(skp) || skp == &smack_known_star ||
 		    skp == &smack_known_web)
 			skp = NULL;
-- 
2.34.0.rc1.387.gb447b232ab-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v19 2/4] overlayfs: handle XATTR_NOSECURITY flag for get xattr method
  2021-11-17  1:58 [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix David Anderson
  2021-11-17  1:58 ` [PATCH v19 1/4] Add flags option to get xattr method paired to __vfs_getxattr David Anderson
@ 2021-11-17  1:58 ` David Anderson
  2021-11-17  1:58 ` [PATCH v19 3/4] overlayfs: override_creds=off option bypass creator_cred David Anderson
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 29+ messages in thread
From: David Anderson @ 2021-11-17  1:58 UTC (permalink / raw)
  Cc: David Anderson, Mark Salyzyn, linux-fsdevel, linux-unionfs,
	Stephen Smalley, linux-kernel, linux-security-module,
	kernel-team, Miklos Szeredi, Jonathan Corbet, Vivek Goyal,
	Eric W . Biederman, Amir Goldstein, linux-doc, selinux,
	paulmoore, Luca.Boccassi

Because of the overlayfs getxattr recursion, the incoming inode fails
to update the selinux sid resulting in avc denials being reported
against a target context of u:object_r:unlabeled:s0.

Solution is to respond to the XATTR_NOSECURITY flag in get xattr
method that calls the __vfs_getxattr handler instead so that the
context can be read in, rather than being denied with an -EACCES
when vfs_getxattr handler is called.

For the use case where access is to be blocked by the security layer.

The path then would be security(dentry) ->
__vfs_getxattr({dentry...XATTR_NOSECURITY}) ->
handler->get({dentry...XATTR_NOSECURITY}) ->
__vfs_getxattr({realdentry...XATTR_NOSECURITY}) ->
lower_handler->get({realdentry...XATTR_NOSECURITY}) which
would report back through the chain data and success as expected,
the logging security layer at the top would have the data to
determine the access permissions and report back to the logs and
the caller that the target context was blocked.

For selinux this would solve the cosmetic issue of the selinux log
and allow audit2allow to correctly report the rule needed to address
the access problem.

Check impure, opaque, origin & meta xattr with no sepolicy audit
(using __vfs_getxattr) since these operations are internal to
overlayfs operations and do not disclose any data.  This became
an issue for credential override off since sys_admin would have
been required by the caller; whereas would have been inherently
present for the creator since it performed the mount.

This is a change in operations since we do not check in the new
ovl_do_getxattr function if the credential override is off or not.
Reasoning is that the sepolicy check is unnecessary overhead,
especially since the check can be expensive.

Because for override credentials off, this affects _everyone_ that
underneath performs private xattr calls without the appropriate
sepolicy permissions and sys_admin capability.  Providing blanket
support for sys_admin would be bad for all possible callers.

For the override credentials on, this will affect only the mounter,
should it lack sepolicy permissions. Not considered a security
problem since mounting by definition has sys_admin capabilities,
but sepolicy contexts would still need to be crafted.

It should be noted that there is precedence, __vfs_getxattr is used
in other filesystems for their own internal trusted xattr management.

Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: David Anderson <dvander@google.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-unionfs@vger.kernel.org
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: linux-kernel@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
Cc: kernel-team@android.com
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: linux-doc@vger.kernel.org
Cc: selinux@vger.kernel.org
Cc: paulmoore@microsoft.com
Cc: Luca.Boccassi@microsoft.com

v19 - rebase

v18 - correct inode argument to __vfs_getxattr

v17 - rebase and add inode argument to __vfs_getxattr

v16 - rebase and merge internal getxattr operations patch

v15 - revert to v13 because xattr_gs_args rejected.

v14 - rebase to use xattr_gs_args.

v13 - rebase to use __vfs_getxattr flags option.

v12 - Added back to patch series as get xattr with flag option.

v11 - Squashed out of patch series and replaced with per-thread flag
      solution.

v10 - Added to patch series as __get xattr method.
---
 fs/overlayfs/inode.c     | 5 +++--
 fs/overlayfs/overlayfs.h | 6 ++++--
 fs/overlayfs/super.c     | 4 ++--
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 1f36158c7dbe..49bfa33bb682 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -386,7 +386,7 @@ int ovl_xattr_set(struct dentry *dentry, struct inode *inode, const char *name,
 }
 
 int ovl_xattr_get(struct dentry *dentry, struct inode *inode, const char *name,
-		  void *value, size_t size)
+		  void *value, size_t size, int flags)
 {
 	ssize_t res;
 	const struct cred *old_cred;
@@ -394,7 +394,8 @@ int ovl_xattr_get(struct dentry *dentry, struct inode *inode, const char *name,
 		ovl_i_dentry_upper(inode) ?: ovl_dentry_lower(dentry);
 
 	old_cred = ovl_override_creds(dentry->d_sb);
-	res = vfs_getxattr(&init_user_ns, realdentry, name, value, size);
+	res = __vfs_getxattr(&init_user_ns, realdentry, d_inode(realdentry),
+			     name, value, size, flags);
 	revert_creds(old_cred);
 	return res;
 }
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 2cd5741c873b..3fcd62e72aad 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -187,7 +187,9 @@ static inline ssize_t ovl_do_getxattr(struct ovl_fs *ofs, struct dentry *dentry,
 				      size_t size)
 {
 	const char *name = ovl_xattr(ofs, ox);
-	int err = vfs_getxattr(&init_user_ns, dentry, name, value, size);
+	struct inode *ip = d_inode(dentry);
+	int err = __vfs_getxattr(&init_user_ns, dentry, ip, name, value, size,
+				 XATTR_NOSECURITY);
 	int len = (value && err > 0) ? err : 0;
 
 	pr_debug("getxattr(%pd2, \"%s\", \"%*pE\", %zu, 0) = %i\n",
@@ -496,7 +498,7 @@ int ovl_permission(struct user_namespace *mnt_userns, struct inode *inode,
 int ovl_xattr_set(struct dentry *dentry, struct inode *inode, const char *name,
 		  const void *value, size_t size, int flags);
 int ovl_xattr_get(struct dentry *dentry, struct inode *inode, const char *name,
-		  void *value, size_t size);
+		  void *value, size_t size, int flags);
 ssize_t ovl_listxattr(struct dentry *dentry, char *list, size_t size);
 struct posix_acl *ovl_get_acl(struct inode *inode, int type, bool rcu);
 int ovl_update_time(struct inode *inode, struct timespec64 *ts, int flags);
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 359aa5772cb7..973644af1288 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1006,7 +1006,7 @@ ovl_posix_acl_xattr_get(const struct xattr_handler *handler,
 			struct dentry *dentry, struct inode *inode,
 			const char *name, void *buffer, size_t size, int flags)
 {
-	return ovl_xattr_get(dentry, inode, handler->name, buffer, size);
+	return ovl_xattr_get(dentry, inode, handler->name, buffer, size, flags);
 }
 
 static int __maybe_unused
@@ -1087,7 +1087,7 @@ static int ovl_other_xattr_get(const struct xattr_handler *handler,
 			       const char *name, void *buffer, size_t size,
 			       int flags)
 {
-	return ovl_xattr_get(dentry, inode, name, buffer, size);
+	return ovl_xattr_get(dentry, inode, name, buffer, size, flags);
 }
 
 static int ovl_other_xattr_set(const struct xattr_handler *handler,
-- 
2.34.0.rc1.387.gb447b232ab-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v19 3/4] overlayfs: override_creds=off option bypass creator_cred
  2021-11-17  1:58 [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix David Anderson
  2021-11-17  1:58 ` [PATCH v19 1/4] Add flags option to get xattr method paired to __vfs_getxattr David Anderson
  2021-11-17  1:58 ` [PATCH v19 2/4] overlayfs: handle XATTR_NOSECURITY flag for get xattr method David Anderson
@ 2021-11-17  1:58 ` David Anderson
  2021-11-17  1:58 ` [PATCH v19 4/4] overlayfs: inode_owner_or_capable called during execv David Anderson
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 29+ messages in thread
From: David Anderson @ 2021-11-17  1:58 UTC (permalink / raw)
  Cc: David Anderson, Mark Salyzyn, linux-fsdevel, linux-unionfs,
	Miklos Szeredi, Jonathan Corbet, Vivek Goyal, Eric W . Biederman,
	Amir Goldstein, Randy Dunlap, Stephen Smalley,
	linux-security-module, linux-doc, linux-kernel, kernel-team,
	selinux, paulmoore, Luca.Boccassi

By default, all access to the upper, lower and work directories is the
recorded mounter's MAC and DAC credentials.  The incoming accesses are
checked against the caller's credentials.

If the principles of least privilege are applied, the mounter's
credentials might not overlap the credentials of the caller's when
accessing the overlayfs filesystem.  For example, a file that a lower
DAC privileged caller can execute, is MAC denied to the generally
higher DAC privileged mounter, to prevent an attack vector.

We add the option to turn off override_creds in the mount options; all
subsequent operations after mount on the filesystem will be only the
caller's credentials.  The module boolean parameter and mount option
override_creds is also added as a presence check for this "feature",
existence of /sys/module/overlay/parameters/override_creds.

It was not always this way.  Circa 4.6 there was no recorded mounter's
credentials, instead privileged access to upper or work directories
were temporarily increased to perform the operations.  The MAC
(selinux) policies were caller's in all cases.  override_creds=off
partially returns us to this older access model minus the insecure
temporary credential increases.  This is to permit use in a system
with non-overlapping security models for each executable including
the agent that mounts the overlayfs filesystem.  In Android
this is the case since init, which performs the mount operations,
has a minimal MAC set of privileges to reduce any attack surface,
and services that use the content have a different set of MAC
privileges (eg: read, for vendor labelled configuration, execute for
vendor libraries and modules).  The caveats are not a problem in
the Android usage model, however they should be fixed for
completeness and for general use in time.

Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: David Anderson <dvander@google.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-unionfs@vger.kernel.org
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: linux-security-module@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: kernel-team@android.com
Cc: selinux@vger.kernel.org
Cc: paulmoore@microsoft.com
Cc: Luca.Boccassi@microsoft.com

v19
- rebase

v18
- rebase

v17
- move credential section for override_creds=off as a level 3 heading
  subsection of the permissions section.

v16
- Rebase, cover a few more new ovl_revert_creds callpoints.

v15
- Rebase

v14:
- fix an issue in ovl_create_or_link which leaks credentials.

v12 + v13
- Rebase

v11:
- add sb argument to ovl_revert_creds to match future work in progress
  in other commiter's hands.

v10:
- Rebase (and expand because of increased revert_cred usage)

v9:
- Add to the caveats

v8:
- drop pr_warn message after straw poll to remove it.
- added a use case in the commit message

v7:
- change name of internal parameter to ovl_override_creds_def
- report override_creds only if different than default

v6:
- Drop CONFIG_OVERLAY_FS_OVERRIDE_CREDS.
- Do better with the documentation.
- pr_warn message adjusted to report consequences.

v5:
- beefed up the caveats in the Documentation
- Is dependent on
  "overlayfs: check CAP_DAC_READ_SEARCH before issuing exportfs_decode_fh"
  "overlayfs: check CAP_MKNOD before issuing vfs_whiteout"
- Added prwarn when override_creds=off

v4:
- spelling and grammar errors in text

v3:
- Change name from caller_credentials / creator_credentials to the
  boolean override_creds.
- Changed from creator to mounter credentials.
- Updated and fortified the documentation.
- Added CONFIG_OVERLAY_FS_OVERRIDE_CREDS

v2:
- Forward port changed attr to stat, resulting in a build error.
- altered commit message.
---
 Documentation/filesystems/overlayfs.rst | 26 ++++++++++++++++++++++++-
 fs/overlayfs/copy_up.c                  |  2 +-
 fs/overlayfs/dir.c                      | 17 +++++++++-------
 fs/overlayfs/file.c                     | 22 ++++++++++-----------
 fs/overlayfs/inode.c                    | 24 +++++++++++------------
 fs/overlayfs/namei.c                    |  6 +++---
 fs/overlayfs/overlayfs.h                |  1 +
 fs/overlayfs/ovl_entry.h                |  1 +
 fs/overlayfs/readdir.c                  |  8 ++++----
 fs/overlayfs/super.c                    | 22 ++++++++++++++++++++-
 fs/overlayfs/util.c                     | 13 +++++++++++--
 11 files changed, 100 insertions(+), 42 deletions(-)

diff --git a/Documentation/filesystems/overlayfs.rst b/Documentation/filesystems/overlayfs.rst
index 7da6c30ed596..d7cd4032134a 100644
--- a/Documentation/filesystems/overlayfs.rst
+++ b/Documentation/filesystems/overlayfs.rst
@@ -195,7 +195,7 @@ handle it in two different ways:
 
 1. return EXDEV error: this error is returned by rename(2) when trying to
    move a file or directory across filesystem boundaries.  Hence
-   applications are usually prepared to hande this error (mv(1) for example
+   applications are usually prepared to handle this error (mv(1) for example
    recursively copies the directory tree).  This is the default behavior.
 
 2. If the "redirect_dir" feature is enabled, then the directory will be
@@ -324,6 +324,30 @@ and
 The resulting access permissions should be the same.  The difference is in
 the time of copy (on-demand vs. up-front).
 
+### Non overlapping credentials
+
+As noted above, all access to the upper, lower and work directories is the
+recorded mounter's MAC and DAC credentials.  The incoming accesses are
+checked against the caller's credentials.
+
+In the case where caller MAC or DAC credentials do not overlap the mounter, a
+use case available in older versions of the driver, the override_creds mount
+flag can be turned off.  For when the use pattern has caller with legitimate
+credentials where the mounter does not.  For example init may have been the
+mounter, but the caller would have execute or read MAC permissions where
+init would not.  override_creds off means all access, incoming, upper, lower
+or working, will be tested against the caller.
+
+Several unintended side effects will occur though.  The caller without certain
+key capabilities or lower privilege will not always be able to delete files or
+directories, create nodes, or search some restricted directories.  The ability
+to search and read a directory entry is spotty as a result of the cache
+mechanism not re-testing the credentials because of the assumption, a
+privileged caller can fill cache, then a lower privilege can read the directory
+cache.  The uneven security model where cache, upperdir and workdir are opened
+at privilege, but accessed without creating a form of privilege escalation,
+should only be used with strict understanding of the side effects and of the
+security policies.
 
 Multiple lower layers
 ---------------------
diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index b193d08a3dc3..de7fdcb3eb6c 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -1032,7 +1032,7 @@ static int ovl_copy_up_flags(struct dentry *dentry, int flags)
 		dput(parent);
 		dput(next);
 	}
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 
 	return err;
 }
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index f18490813170..7449825b6cbd 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -569,7 +569,7 @@ static int ovl_create_or_link(struct dentry *dentry, struct inode *inode,
 			      struct ovl_cattr *attr, bool origin)
 {
 	int err;
-	const struct cred *old_cred;
+	const struct cred *old_cred, *hold_cred = NULL;
 	struct cred *override_cred;
 	struct dentry *parent = dentry->d_parent;
 
@@ -596,14 +596,15 @@ static int ovl_create_or_link(struct dentry *dentry, struct inode *inode,
 		override_cred->fsgid = inode->i_gid;
 		if (!attr->hardlink) {
 			err = security_dentry_create_files_as(dentry,
-					attr->mode, &dentry->d_name, old_cred,
+					attr->mode, &dentry->d_name,
+					old_cred ? old_cred : current_cred(),
 					override_cred);
 			if (err) {
 				put_cred(override_cred);
 				goto out_revert_creds;
 			}
 		}
-		put_cred(override_creds(override_cred));
+		hold_cred = override_creds(override_cred);
 		put_cred(override_cred);
 
 		if (!ovl_dentry_is_whiteout(dentry))
@@ -612,7 +613,9 @@ static int ovl_create_or_link(struct dentry *dentry, struct inode *inode,
 			err = ovl_create_over_whiteout(dentry, inode, attr);
 	}
 out_revert_creds:
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred ?: hold_cred);
+	if (old_cred && hold_cred)
+		put_cred(hold_cred);
 	return err;
 }
 
@@ -689,7 +692,7 @@ static int ovl_set_link_redirect(struct dentry *dentry)
 
 	old_cred = ovl_override_creds(dentry->d_sb);
 	err = ovl_set_redirect(dentry, false);
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 
 	return err;
 }
@@ -908,7 +911,7 @@ static int ovl_do_remove(struct dentry *dentry, bool is_dir)
 		err = ovl_remove_upper(dentry, is_dir, &list);
 	else
 		err = ovl_remove_and_whiteout(dentry, &list);
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 	if (!err) {
 		if (is_dir)
 			clear_nlink(dentry->d_inode);
@@ -1283,7 +1286,7 @@ static int ovl_rename(struct user_namespace *mnt_userns, struct inode *olddir,
 out_unlock:
 	unlock_rename(new_upperdir, old_upperdir);
 out_revert_creds:
-	revert_creds(old_cred);
+	ovl_revert_creds(old->d_sb, old_cred);
 	if (update_nlink)
 		ovl_nlink_end(new);
 out_drop_write:
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index fa125feed0ff..11d8277c94cd 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -61,7 +61,7 @@ static struct file *ovl_open_realfile(const struct file *file,
 		realfile = open_with_fake_path(&file->f_path, flags, realinode,
 					       current_cred());
 	}
-	revert_creds(old_cred);
+	ovl_revert_creds(inode->i_sb, old_cred);
 
 	pr_debug("open(%p[%pD2/%c], 0%o) -> (%p, 0%o)\n",
 		 file, file, ovl_whatisit(inode, realinode), file->f_flags,
@@ -205,7 +205,7 @@ static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
 
 	old_cred = ovl_override_creds(inode->i_sb);
 	ret = vfs_llseek(real.file, offset, whence);
-	revert_creds(old_cred);
+	ovl_revert_creds(inode->i_sb, old_cred);
 
 	file->f_pos = real.file->f_pos;
 	ovl_inode_unlock(inode);
@@ -334,7 +334,7 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 			ovl_aio_cleanup_handler(aio_req);
 	}
 out:
-	revert_creds(old_cred);
+	ovl_revert_creds(file_inode(file)->i_sb, old_cred);
 	ovl_file_accessed(file);
 out_fdput:
 	fdput(real);
@@ -407,7 +407,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 			ovl_aio_cleanup_handler(aio_req);
 	}
 out:
-	revert_creds(old_cred);
+	ovl_revert_creds(file_inode(file)->i_sb, old_cred);
 out_fdput:
 	fdput(real);
 
@@ -453,7 +453,7 @@ static ssize_t ovl_splice_write(struct pipe_inode_info *pipe, struct file *out,
 	file_end_write(real.file);
 	/* Update size */
 	ovl_copyattr(realinode, inode);
-	revert_creds(old_cred);
+	ovl_revert_creds(inode->i_sb, old_cred);
 	fdput(real);
 
 out_unlock:
@@ -480,7 +480,7 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 	if (file_inode(real.file) == ovl_inode_upper(file_inode(file))) {
 		old_cred = ovl_override_creds(file_inode(file)->i_sb);
 		ret = vfs_fsync_range(real.file, start, end, datasync);
-		revert_creds(old_cred);
+		ovl_revert_creds(file_inode(file)->i_sb, old_cred);
 	}
 
 	fdput(real);
@@ -504,7 +504,7 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
 
 	old_cred = ovl_override_creds(file_inode(file)->i_sb);
 	ret = call_mmap(vma->vm_file, vma);
-	revert_creds(old_cred);
+	ovl_revert_creds(file_inode(file)->i_sb, old_cred);
 	ovl_file_accessed(file);
 
 	return ret;
@@ -523,7 +523,7 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
 
 	old_cred = ovl_override_creds(file_inode(file)->i_sb);
 	ret = vfs_fallocate(real.file, mode, offset, len);
-	revert_creds(old_cred);
+	ovl_revert_creds(file_inode(file)->i_sb, old_cred);
 
 	/* Update size */
 	ovl_copyattr(ovl_inode_real(inode), inode);
@@ -545,7 +545,7 @@ static int ovl_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
 
 	old_cred = ovl_override_creds(file_inode(file)->i_sb);
 	ret = vfs_fadvise(real.file, offset, len, advice);
-	revert_creds(old_cred);
+	ovl_revert_creds(file_inode(file)->i_sb, old_cred);
 
 	fdput(real);
 
@@ -595,7 +595,7 @@ static loff_t ovl_copyfile(struct file *file_in, loff_t pos_in,
 						flags);
 		break;
 	}
-	revert_creds(old_cred);
+	ovl_revert_creds(file_inode(file_out)->i_sb, old_cred);
 
 	/* Update size */
 	ovl_copyattr(ovl_inode_real(inode_out), inode_out);
@@ -654,7 +654,7 @@ static int ovl_flush(struct file *file, fl_owner_t id)
 	if (real.file->f_op->flush) {
 		old_cred = ovl_override_creds(file_inode(file)->i_sb);
 		err = real.file->f_op->flush(real.file, id);
-		revert_creds(old_cred);
+		ovl_revert_creds(file_inode(file)->i_sb, old_cred);
 	}
 	fdput(real);
 
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 49bfa33bb682..0123270e295f 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -78,7 +78,7 @@ int ovl_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
 		inode_lock(upperdentry->d_inode);
 		old_cred = ovl_override_creds(dentry->d_sb);
 		err = notify_change(&init_user_ns, upperdentry, attr, NULL);
-		revert_creds(old_cred);
+		ovl_revert_creds(dentry->d_sb, old_cred);
 		if (!err)
 			ovl_copyattr(upperdentry->d_inode, dentry->d_inode);
 		inode_unlock(upperdentry->d_inode);
@@ -270,7 +270,7 @@ int ovl_getattr(struct user_namespace *mnt_userns, const struct path *path,
 		stat->nlink = dentry->d_inode->i_nlink;
 
 out:
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 
 	return err;
 }
@@ -305,7 +305,7 @@ int ovl_permission(struct user_namespace *mnt_userns,
 		mask |= MAY_READ;
 	}
 	err = inode_permission(&init_user_ns, realinode, mask);
-	revert_creds(old_cred);
+	ovl_revert_creds(inode->i_sb, old_cred);
 
 	return err;
 }
@@ -322,7 +322,7 @@ static const char *ovl_get_link(struct dentry *dentry,
 
 	old_cred = ovl_override_creds(dentry->d_sb);
 	p = vfs_get_link(ovl_dentry_real(dentry), done);
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 	return p;
 }
 
@@ -353,7 +353,7 @@ int ovl_xattr_set(struct dentry *dentry, struct inode *inode, const char *name,
 	if (!value && !upperdentry) {
 		old_cred = ovl_override_creds(dentry->d_sb);
 		err = vfs_getxattr(&init_user_ns, realdentry, name, NULL, 0);
-		revert_creds(old_cred);
+		ovl_revert_creds(dentry->d_sb, old_cred);
 		if (err < 0)
 			goto out_drop_write;
 	}
@@ -374,7 +374,7 @@ int ovl_xattr_set(struct dentry *dentry, struct inode *inode, const char *name,
 		WARN_ON(flags != XATTR_REPLACE);
 		err = vfs_removexattr(&init_user_ns, realdentry, name);
 	}
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 
 	/* copy c/mtime */
 	ovl_copyattr(d_inode(realdentry), inode);
@@ -396,7 +396,7 @@ int ovl_xattr_get(struct dentry *dentry, struct inode *inode, const char *name,
 	old_cred = ovl_override_creds(dentry->d_sb);
 	res = __vfs_getxattr(&init_user_ns, realdentry, d_inode(realdentry),
 			     name, value, size, flags);
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 	return res;
 }
 
@@ -424,7 +424,7 @@ ssize_t ovl_listxattr(struct dentry *dentry, char *list, size_t size)
 
 	old_cred = ovl_override_creds(dentry->d_sb);
 	res = vfs_listxattr(realdentry, list, size);
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 	if (res <= 0 || size == 0)
 		return res;
 
@@ -462,7 +462,7 @@ struct posix_acl *ovl_get_acl(struct inode *inode, int type, bool rcu)
 
 	old_cred = ovl_override_creds(inode->i_sb);
 	acl = get_acl(realinode, type);
-	revert_creds(old_cred);
+	ovl_revert_creds(inode->i_sb, old_cred);
 
 	return acl;
 }
@@ -496,7 +496,7 @@ static int ovl_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 
 	old_cred = ovl_override_creds(inode->i_sb);
 	err = realinode->i_op->fiemap(realinode, fieinfo, start, len);
-	revert_creds(old_cred);
+	ovl_revert_creds(inode->i_sb, old_cred);
 
 	return err;
 }
@@ -567,7 +567,7 @@ int ovl_fileattr_set(struct user_namespace *mnt_userns,
 		err = ovl_set_protattr(inode, upperpath.dentry, fa);
 		if (!err)
 			err = ovl_real_fileattr_set(&upperpath, fa);
-		revert_creds(old_cred);
+		ovl_revert_creds(inode->i_sb, old_cred);
 
 		/*
 		 * Merge real inode flags with inode flags read from
@@ -629,7 +629,7 @@ int ovl_fileattr_get(struct dentry *dentry, struct fileattr *fa)
 	old_cred = ovl_override_creds(inode->i_sb);
 	err = ovl_real_fileattr_get(&realpath, fa);
 	ovl_fileattr_prot_flags(inode, fa);
-	revert_creds(old_cred);
+	ovl_revert_creds(inode->i_sb, old_cred);
 
 	return err;
 }
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 1a9b515fc45d..0ecb6bab0f2a 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -1106,7 +1106,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	ovl_dentry_update_reval(dentry, upperdentry,
 			DCACHE_OP_REVALIDATE | DCACHE_OP_WEAK_REVALIDATE);
 
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 	if (origin_path) {
 		dput(origin_path->dentry);
 		kfree(origin_path);
@@ -1133,7 +1133,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	kfree(upperredirect);
 out:
 	kfree(d.redirect);
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 	return ERR_PTR(err);
 }
 
@@ -1185,7 +1185,7 @@ bool ovl_lower_positive(struct dentry *dentry)
 			dput(this);
 		}
 	}
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 
 	return positive;
 }
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 3fcd62e72aad..16c6280d8201 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -282,6 +282,7 @@ int ovl_want_write(struct dentry *dentry);
 void ovl_drop_write(struct dentry *dentry);
 struct dentry *ovl_workdir(struct dentry *dentry);
 const struct cred *ovl_override_creds(struct super_block *sb);
+void ovl_revert_creds(struct super_block *sb, const struct cred *oldcred);
 int ovl_can_decode_fh(struct super_block *sb);
 struct dentry *ovl_indexdir(struct super_block *sb);
 bool ovl_index_all(struct super_block *sb);
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 63efee554f69..31aebf6d2ea4 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -20,6 +20,7 @@ struct ovl_config {
 	bool metacopy;
 	bool userxattr;
 	bool ovl_volatile;
+	bool override_creds;
 };
 
 struct ovl_sb {
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index 150fdf3bc68d..c6038e6ad753 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -286,7 +286,7 @@ static int ovl_check_whiteouts(struct dentry *dir, struct ovl_readdir_data *rdd)
 		}
 		inode_unlock(dir->d_inode);
 	}
-	revert_creds(old_cred);
+	ovl_revert_creds(rdd->dentry->d_sb, old_cred);
 
 	return err;
 }
@@ -789,7 +789,7 @@ static int ovl_iterate(struct file *file, struct dir_context *ctx)
 	}
 	err = 0;
 out:
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 	return err;
 }
 
@@ -841,7 +841,7 @@ static struct file *ovl_dir_open_realfile(const struct file *file,
 
 	old_cred = ovl_override_creds(file_inode(file)->i_sb);
 	res = ovl_path_open(realpath, O_RDONLY | (file->f_flags & O_LARGEFILE));
-	revert_creds(old_cred);
+	ovl_revert_creds(file_inode(file)->i_sb, old_cred);
 
 	return res;
 }
@@ -967,7 +967,7 @@ int ovl_check_empty_dir(struct dentry *dentry, struct list_head *list)
 
 	old_cred = ovl_override_creds(dentry->d_sb);
 	err = ovl_dir_read_merged(dentry, list, &root);
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 	if (err)
 		return err;
 
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 973644af1288..d1c6e883790d 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -53,6 +53,11 @@ module_param_named(xino_auto, ovl_xino_auto_def, bool, 0644);
 MODULE_PARM_DESC(xino_auto,
 		 "Auto enable xino feature");
 
+static bool __read_mostly ovl_override_creds_def = true;
+module_param_named(override_creds, ovl_override_creds_def, bool, 0644);
+MODULE_PARM_DESC(ovl_override_creds_def,
+		 "Use mounter's credentials for accesses");
+
 static void ovl_entry_stack_free(struct ovl_entry *oe)
 {
 	unsigned int i;
@@ -382,6 +387,9 @@ static int ovl_show_options(struct seq_file *m, struct dentry *dentry)
 		seq_puts(m, ",volatile");
 	if (ofs->config.userxattr)
 		seq_puts(m, ",userxattr");
+	if (ofs->config.override_creds != ovl_override_creds_def)
+		seq_show_option(m, "override_creds",
+				ofs->config.override_creds ? "on" : "off");
 	return 0;
 }
 
@@ -437,6 +445,8 @@ enum {
 	OPT_METACOPY_ON,
 	OPT_METACOPY_OFF,
 	OPT_VOLATILE,
+	OPT_OVERRIDE_CREDS_ON,
+	OPT_OVERRIDE_CREDS_OFF,
 	OPT_ERR,
 };
 
@@ -459,6 +469,8 @@ static const match_table_t ovl_tokens = {
 	{OPT_METACOPY_ON,		"metacopy=on"},
 	{OPT_METACOPY_OFF,		"metacopy=off"},
 	{OPT_VOLATILE,			"volatile"},
+	{OPT_OVERRIDE_CREDS_ON,		"override_creds=on"},
+	{OPT_OVERRIDE_CREDS_OFF,	"override_creds=off"},
 	{OPT_ERR,			NULL}
 };
 
@@ -518,6 +530,7 @@ static int ovl_parse_opt(char *opt, struct ovl_config *config)
 	config->redirect_mode = kstrdup(ovl_redirect_mode_def(), GFP_KERNEL);
 	if (!config->redirect_mode)
 		return -ENOMEM;
+	config->override_creds = ovl_override_creds_def;
 
 	while ((p = ovl_next_opt(&opt)) != NULL) {
 		int token;
@@ -619,6 +632,14 @@ static int ovl_parse_opt(char *opt, struct ovl_config *config)
 			config->userxattr = true;
 			break;
 
+		case OPT_OVERRIDE_CREDS_ON:
+			config->override_creds = true;
+			break;
+
+		case OPT_OVERRIDE_CREDS_OFF:
+			config->override_creds = false;
+			break;
+
 		default:
 			pr_err("unrecognized mount option \"%s\" or missing value\n",
 					p);
@@ -2144,7 +2165,6 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	kfree(splitlower);
 
 	sb->s_root = root_dentry;
-
 	return 0;
 
 out_free_oe:
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index f48284a2a896..32eceb495202 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -38,9 +38,18 @@ const struct cred *ovl_override_creds(struct super_block *sb)
 {
 	struct ovl_fs *ofs = sb->s_fs_info;
 
+	if (!ofs->config.override_creds)
+		return NULL;
 	return override_creds(ofs->creator_cred);
 }
 
+void ovl_revert_creds(struct super_block *sb, const struct cred *old_cred)
+{
+	if (old_cred)
+		revert_creds(old_cred);
+}
+
+
 /*
  * Check if underlying fs supports file handles and try to determine encoding
  * type, in order to deduce maximum inode number used by fs.
@@ -899,7 +908,7 @@ int ovl_nlink_start(struct dentry *dentry)
 	 * value relative to the upper inode nlink in an upper inode xattr.
 	 */
 	err = ovl_set_nlink_upper(dentry);
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 
 out:
 	if (err)
@@ -917,7 +926,7 @@ void ovl_nlink_end(struct dentry *dentry)
 
 		old_cred = ovl_override_creds(dentry->d_sb);
 		ovl_cleanup_index(dentry);
-		revert_creds(old_cred);
+		ovl_revert_creds(dentry->d_sb, old_cred);
 	}
 
 	ovl_inode_unlock(inode);
-- 
2.34.0.rc1.387.gb447b232ab-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v19 4/4] overlayfs: inode_owner_or_capable called during execv
  2021-11-17  1:58 [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix David Anderson
                   ` (2 preceding siblings ...)
  2021-11-17  1:58 ` [PATCH v19 3/4] overlayfs: override_creds=off option bypass creator_cred David Anderson
@ 2021-11-17  1:58 ` David Anderson
  2021-11-17  2:18 ` [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix Casey Schaufler
  2021-11-17  7:36 ` Amir Goldstein
  5 siblings, 0 replies; 29+ messages in thread
From: David Anderson @ 2021-11-17  1:58 UTC (permalink / raw)
  Cc: David Anderson, John Stultz, Mark Salyzyn, linux-fsdevel,
	linux-unionfs, Stephen Smalley, linux-kernel,
	linux-security-module, kernel-team, selinux, paulmoore,
	Luca.Boccassi

From: John Stultz <john.stultz@linaro.org>

Using old_creds as an indication that we are not overriding the
credentials, bypass call to inode_owner_or_capable.  This solves
a problem with all execv calls being blocked when using the caller's
credentials.

Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: David Anderson <dvander@google.com>
Fixes: 05acefb4872da ("ovl: check permission to open real file")
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-unionfs@vger.kernel.org
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: linux-kernel@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
Cc: kernel-team@android.com
Cc: selinux@vger.kernel.org
Cc: paulmoore@microsoft.com
Cc: Luca.Boccassi@microsoft.com

v19 - rebase

v18 - rebase

v17 - rebase

v16 - introduced fix over rebased series
---
 fs/overlayfs/file.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 11d8277c94cd..586de55bba79 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -55,7 +55,8 @@ static struct file *ovl_open_realfile(const struct file *file,
 	if (err) {
 		realfile = ERR_PTR(err);
 	} else {
-		if (!inode_owner_or_capable(&init_user_ns, realinode))
+		if (old_cred && !inode_owner_or_capable(&init_user_ns,
+							realinode))
 			flags &= ~O_NOATIME;
 
 		realfile = open_with_fake_path(&file->f_path, flags, realinode,
-- 
2.34.0.rc1.387.gb447b232ab-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix
  2021-11-17  1:58 [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix David Anderson
                   ` (3 preceding siblings ...)
  2021-11-17  1:58 ` [PATCH v19 4/4] overlayfs: inode_owner_or_capable called during execv David Anderson
@ 2021-11-17  2:18 ` Casey Schaufler
  2021-11-18  7:59   ` David Anderson
  2021-11-17  7:36 ` Amir Goldstein
  5 siblings, 1 reply; 29+ messages in thread
From: Casey Schaufler @ 2021-11-17  2:18 UTC (permalink / raw)
  To: David Anderson
  Cc: Mark Salyzyn, Miklos Szeredi, Jonathan Corbet, Vivek Goyal,
	Eric W . Biederman, Amir Goldstein, Randy Dunlap,
	Stephen Smalley, John Stultz, linux-doc, linux-kernel,
	linux-fsdevel, linux-unionfs, linux-security-module, kernel-team,
	selinux, paulmoore, Luca.Boccassi, Casey Schaufler

On 11/16/2021 5:58 PM, David Anderson wrote:
> Mark Salyzyn (3):
>    Add flags option to get xattr method paired to __vfs_getxattr
>    overlayfs: handle XATTR_NOSECURITY flag for get xattr method
>    overlayfs: override_creds=off option bypass creator_cred
>
> Mark Salyzyn + John Stultz (1):
>    overlayfs: inode_owner_or_capable called during execv
>
> The first three patches address fundamental security issues that should
> be solved regardless of the override_creds=off feature.
>
> The fourth adds the feature depends on these other fixes.
>
> By default, all access to the upper, lower and work directories is the
> recorded mounter's MAC and DAC credentials.  The incoming accesses are
> checked against the caller's credentials.

This isn't very clear. Are you saying that the security attributes
of the upper, lower, and work directories are determined by the
attributes of the process that mounted the filesystem? What is an
"incoming access"? I'm sure that means something if you're steeped
in the lore of overlayfs, but it isn't obvious to me.

> If the principles of least privilege are applied for sepolicy, the
> mounter's credentials might not overlap the credentials of the caller's
> when accessing the overlayfs filesystem.

I'm sorry, but I've tried pretty hard, and can't puzzle that one out.

>    For example, a file that a
> lower DAC privileged caller can execute, is MAC denied to the
> generally higher DAC privileged mounter, to prevent an attack vector.

DAC privileges are not hierarchical. This doesn't make any sense.

> We add the option to turn off override_creds in the mount options; all
> subsequent operations after mount on the filesystem will be only the
> caller's credentials.

I think I might have figured that one out, but in order to do so
I have to make way too many assumptions about the earlier paragraph.
Could you please try to explain what you're doing with more context?

>    The module boolean parameter and mount option
> override_creds is also added as a presence check for this "feature",
> existence of /sys/module/overlay/parameters/overlay_creds
>
> Signed-off-by: Mark Salyzyn <salyzyn@android.com>
> Signed-off-by: David Anderson <dvander@google.com>
> Cc: Miklos Szeredi <miklos@szeredi.hu>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Vivek Goyal <vgoyal@redhat.com>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Cc: Amir Goldstein <amir73il@gmail.com>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Stephen Smalley <sds@tycho.nsa.gov>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: linux-doc@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-unionfs@vger.kernel.org
> Cc: linux-security-module@vger.kernel.org
> Cc: kernel-team@android.com
> Cc: selinux@vger.kernel.org
> Cc: paulmoore@microsoft.com
> Cc: Luca.Boccassi@microsoft.com
>
> ---
>
> v19
> - rebase.
>
> v18
> - rebase + fix minor cut and paste error for inode argument in __vfs_getxattr
>
> v17
> - correct some zero-day build failures.
> - fix up documentation
>
> v16
> - rebase and merge of two patches.
> - add adjustment to deal with execv when overrides is off.
>
> v15
> - Revert back to v4 with fixes from on the way from v5-v14. The single
>    structure argument passing to address the complaints about too many
>    arguments was rejected by the community.
> - Drop the udner discussion fix for an additional CAP_DAC_READ_SEARCH
>    check. Can address that independently.
> - ToDo: upstream test frame for thes security fixes (currently testing
>    is all in Android).
>
> v14:
> - Rejoin, rebase and a few adjustments.
>
> v13:
> - Pull out first patch and try to get it in alone feedback, some
>    Acks, and then <crickets> because people forgot why we were doing i.
>
> v12:
> - Restore squished out patch 2 and 3 in the series,
>    then change algorithm to add flags argument.
>    Per-thread flag is a large security surface.
>
> v11:
> - Squish out v10 introduced patch 2 and 3 in the series,
>    then and use per-thread flag instead for nesting.
> - Switch name to ovl_do_vds_getxattr for __vds_getxattr wrapper.
> - Add sb argument to ovl_revert_creds to match future work.
>
> v10:
> - Return NULL on CAP_DAC_READ_SEARCH
> - Add __get xattr method to solve sepolicy logging issue
> - Drop unnecessary sys_admin sepolicy checking for administrative
>    driver internal xattr functions.
>
> v6:
> - Drop CONFIG_OVERLAY_FS_OVERRIDE_CREDS.
> - Do better with the documentation, drop rationalizations.
> - pr_warn message adjusted to report consequences.
>
> v5:
> - beefed up the caveats in the Documentation
> - Is dependent on
>    "overlayfs: check CAP_DAC_READ_SEARCH before issuing exportfs_decode_fh"
>    "overlayfs: check CAP_MKNOD before issuing vfs_whiteout"
> - Added prwarn when override_creds=off
>
> v4:
> - spelling and grammar errors in text
>
> v3:
> - Change name from caller_credentials / creator_credentials to the
>    boolean override_creds.
> - Changed from creator to mounter credentials.
> - Updated and fortified the documentation.
> - Added CONFIG_OVERLAY_FS_OVERRIDE_CREDS
>
> v2:
> - Forward port changed attr to stat, resulting in a build error.
> - altered commit message.
>
> David Anderson (4):
>    Add flags option to get xattr method paired to __vfs_getxattr
>    overlayfs: handle XATTR_NOSECURITY flag for get xattr method
>    overlayfs: override_creds=off option bypass creator_cred
>    overlayfs: inode_owner_or_capable called during execv
>
>   Documentation/filesystems/locking.rst   |  2 +-
>   Documentation/filesystems/overlayfs.rst | 26 ++++++++++++++-
>   fs/9p/acl.c                             |  3 +-
>   fs/9p/xattr.c                           |  3 +-
>   fs/afs/xattr.c                          | 10 +++---
>   fs/attr.c                               |  2 +-
>   fs/btrfs/xattr.c                        |  3 +-
>   fs/ceph/xattr.c                         |  3 +-
>   fs/cifs/xattr.c                         |  2 +-
>   fs/ecryptfs/inode.c                     |  6 ++--
>   fs/ecryptfs/mmap.c                      |  5 +--
>   fs/erofs/xattr.c                        |  3 +-
>   fs/ext2/xattr_security.c                |  2 +-
>   fs/ext2/xattr_trusted.c                 |  2 +-
>   fs/ext2/xattr_user.c                    |  2 +-
>   fs/ext4/xattr_hurd.c                    |  2 +-
>   fs/ext4/xattr_security.c                |  2 +-
>   fs/ext4/xattr_trusted.c                 |  2 +-
>   fs/ext4/xattr_user.c                    |  2 +-
>   fs/f2fs/xattr.c                         |  4 +--
>   fs/fuse/xattr.c                         |  4 +--
>   fs/gfs2/xattr.c                         |  3 +-
>   fs/hfs/attr.c                           |  2 +-
>   fs/hfsplus/xattr.c                      |  3 +-
>   fs/hfsplus/xattr_security.c             |  3 +-
>   fs/hfsplus/xattr_trusted.c              |  3 +-
>   fs/hfsplus/xattr_user.c                 |  3 +-
>   fs/inode.c                              |  7 +++--
>   fs/internal.h                           |  3 +-
>   fs/jffs2/security.c                     |  3 +-
>   fs/jffs2/xattr_trusted.c                |  3 +-
>   fs/jffs2/xattr_user.c                   |  3 +-
>   fs/jfs/xattr.c                          |  5 +--
>   fs/kernfs/inode.c                       |  3 +-
>   fs/nfs/nfs4proc.c                       |  9 ++++--
>   fs/ntfs3/xattr.c                        |  2 +-
>   fs/ocfs2/xattr.c                        |  9 ++++--
>   fs/open.c                               |  2 +-
>   fs/orangefs/xattr.c                     |  3 +-
>   fs/overlayfs/copy_up.c                  |  2 +-
>   fs/overlayfs/dir.c                      | 17 +++++-----
>   fs/overlayfs/file.c                     | 25 ++++++++-------
>   fs/overlayfs/inode.c                    | 29 ++++++++---------
>   fs/overlayfs/namei.c                    |  6 ++--
>   fs/overlayfs/overlayfs.h                |  7 +++--
>   fs/overlayfs/ovl_entry.h                |  1 +
>   fs/overlayfs/readdir.c                  |  8 ++---
>   fs/overlayfs/super.c                    | 34 ++++++++++++++++----
>   fs/overlayfs/util.c                     | 13 ++++++--
>   fs/posix_acl.c                          |  2 +-
>   fs/reiserfs/xattr_security.c            |  3 +-
>   fs/reiserfs/xattr_trusted.c             |  3 +-
>   fs/reiserfs/xattr_user.c                |  3 +-
>   fs/squashfs/xattr.c                     |  2 +-
>   fs/ubifs/xattr.c                        |  3 +-
>   fs/xattr.c                              | 42 +++++++++++++------------
>   fs/xfs/xfs_xattr.c                      |  3 +-
>   include/linux/lsm_hook_defs.h           |  3 +-
>   include/linux/security.h                |  6 ++--
>   include/linux/xattr.h                   |  6 ++--
>   include/uapi/linux/xattr.h              |  7 +++--
>   mm/shmem.c                              |  3 +-
>   net/socket.c                            |  3 +-
>   security/commoncap.c                    | 11 ++++---
>   security/integrity/evm/evm_main.c       | 13 +++++---
>   security/security.c                     |  5 +--
>   security/selinux/hooks.c                | 19 ++++++-----
>   security/smack/smack_lsm.c              | 18 ++++++-----
>   68 files changed, 289 insertions(+), 167 deletions(-)
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix
  2021-11-17  1:58 [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix David Anderson
                   ` (4 preceding siblings ...)
  2021-11-17  2:18 ` [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix Casey Schaufler
@ 2021-11-17  7:36 ` Amir Goldstein
  2021-11-18  9:53   ` David Anderson
  2021-12-03 15:37   ` Vivek Goyal
  5 siblings, 2 replies; 29+ messages in thread
From: Amir Goldstein @ 2021-11-17  7:36 UTC (permalink / raw)
  To: David Anderson
  Cc: Mark Salyzyn, Miklos Szeredi, Jonathan Corbet, Vivek Goyal,
	Eric W . Biederman, Randy Dunlap, Stephen Smalley, John Stultz,
	linux-doc, linux-kernel, linux-fsdevel, overlayfs, LSM List,
	kernel-team, selinux, paulmoore, Luca.Boccassi

On Wed, Nov 17, 2021 at 3:58 AM David Anderson <dvander@google.com> wrote:
>
> Mark Salyzyn (3):
>   Add flags option to get xattr method paired to __vfs_getxattr
>   overlayfs: handle XATTR_NOSECURITY flag for get xattr method
>   overlayfs: override_creds=off option bypass creator_cred
>
> Mark Salyzyn + John Stultz (1):
>   overlayfs: inode_owner_or_capable called during execv
>
> The first three patches address fundamental security issues that should
> be solved regardless of the override_creds=off feature.
>
> The fourth adds the feature depends on these other fixes.
>
> By default, all access to the upper, lower and work directories is the
> recorded mounter's MAC and DAC credentials.  The incoming accesses are
> checked against the caller's credentials.
>
> If the principles of least privilege are applied for sepolicy, the
> mounter's credentials might not overlap the credentials of the caller's
> when accessing the overlayfs filesystem.  For example, a file that a
> lower DAC privileged caller can execute, is MAC denied to the
> generally higher DAC privileged mounter, to prevent an attack vector.
>
> We add the option to turn off override_creds in the mount options; all
> subsequent operations after mount on the filesystem will be only the
> caller's credentials.  The module boolean parameter and mount option
> override_creds is also added as a presence check for this "feature",
> existence of /sys/module/overlay/parameters/overlay_creds
>
> Signed-off-by: Mark Salyzyn <salyzyn@android.com>
> Signed-off-by: David Anderson <dvander@google.com>
> Cc: Miklos Szeredi <miklos@szeredi.hu>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Vivek Goyal <vgoyal@redhat.com>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Cc: Amir Goldstein <amir73il@gmail.com>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Stephen Smalley <sds@tycho.nsa.gov>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: linux-doc@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-unionfs@vger.kernel.org
> Cc: linux-security-module@vger.kernel.org
> Cc: kernel-team@android.com
> Cc: selinux@vger.kernel.org
> Cc: paulmoore@microsoft.com
> Cc: Luca.Boccassi@microsoft.com
>
> ---
>
> v19
> - rebase.
>

Hi David,

I see that the patch set has changed hands (presumably to Android upstreaming
team), but you just rebased v18 without addressing the maintainers concerns [1].

Specifically, the patch 2/4 is very wrong for unprivileged mount and
I think that the very noisy patch 1/4 could be completely avoided:
Can't you use -o userxattr mount option for Android use case and limit
the manipulation of user.ovrelay.* xattr based on sepolicy for actors
that are allowed
to make changes in overlayfs mount? or not limit at all?
The access to those xattr is forbidden via "incoming" xattr ops on
overlay inodes.

Also, IMO, the Documentation section about "Non overlapping credentials" does
not hold the same standards as the section about "Permission model" -
it does not
state the requirements clear enough for my non-security-oriented brain to be
able to understand if the "Ignore mounter's credentials" model can be exploited.

Can an unprivileged user create an overlay over a directory that they have
access to and redirect an innocent looking file name to an underlying file that
said the mounting user has no access to and by doing that, tricking a privileged
user to modify the innocent looking file on the  mounter's behalf?
Of course this could be avoided by forbidding unprivileged mount with
override_creds=off, but there could be other scenarios, so a clear model
would help to understand the risks.

For example:
If user 1 was able to read in lower dir A, now the content of overlay dir A
is cached and user 2, that has permissions to read upper dir A and does
not have read permissions on lower dir A will see the content of lower dir A.

I think that the core problem with the approach is using Non-uniform
credentials to access underlying layers. I don't see a simple way around
a big audit that checks all those cases, but maybe I'm missing some quick
shortcut or maybe your use case can add some restrictions about the
users that could access this overlay that would simplify the generic problem.

Thanks,
Amir.

[1] https://lore.kernel.org/linux-unionfs/CAJfpegtMoD85j5namV592sJD23QeUMD=+tq4SvFDqjVxsAszYQ@mail.gmail.com/

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v19 1/4] Add flags option to get xattr method paired to __vfs_getxattr
  2021-11-17  1:58 ` [PATCH v19 1/4] Add flags option to get xattr method paired to __vfs_getxattr David Anderson
@ 2021-11-17 16:13   ` kernel test robot
  2022-03-25 11:02   ` Luca Weiss
  1 sibling, 0 replies; 29+ messages in thread
From: kernel test robot @ 2021-11-17 16:13 UTC (permalink / raw)
  To: David Anderson
  Cc: kbuild-all, David Anderson, Mark Salyzyn, Jan Kara, Jeff Layton,
	David Sterba, Darrick J . Wong, Mike Marshall, linux-fsdevel,
	linux-unionfs, Stephen Smalley

[-- Attachment #1: Type: text/plain, Size: 18570 bytes --]

Hi David,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.16-rc1 next-20211117]
[cannot apply to mszeredi-vfs/overlayfs-next tytso-ext4/dev jmorris-security/next-testing]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/David-Anderson/overlayfs-override_creds-off-nested-get-xattr-fix/20211117-100030
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 8ab774587903771821b59471cc723bba6d893942
config: arm-randconfig-c002-20211116 (attached as .config)
compiler: arm-linux-gnueabi-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/73277167dc9cad1e636a76e9f993d8f30b289d02
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review David-Anderson/overlayfs-override_creds-off-nested-get-xattr-fix/20211117-100030
        git checkout 73277167dc9cad1e636a76e9f993d8f30b289d02
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross ARCH=arm 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from fs/open.c:19:
   include/linux/security.h: In function 'security_inode_need_killpriv':
>> include/linux/security.h:901:40: error: passing argument 1 of 'cap_inode_need_killpriv' from incompatible pointer type [-Werror=incompatible-pointer-types]
     901 |         return cap_inode_need_killpriv(dentry);
         |                                        ^~~~~~
         |                                        |
         |                                        struct dentry *
   include/linux/security.h:153:52: note: expected 'struct user_namespace *' but argument is of type 'struct dentry *'
     153 | int cap_inode_need_killpriv(struct user_namespace *mnt_userns,
         |                             ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~
>> include/linux/security.h:901:16: error: too few arguments to function 'cap_inode_need_killpriv'
     901 |         return cap_inode_need_killpriv(dentry);
         |                ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/security.h:153:5: note: declared here
     153 | int cap_inode_need_killpriv(struct user_namespace *mnt_userns,
         |     ^~~~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors
--
   In file included from include/linux/fs_context.h:14,
                    from include/linux/pseudo_fs.h:4,
                    from fs/pipe.c:17:
   include/linux/security.h: In function 'security_inode_need_killpriv':
>> include/linux/security.h:901:40: error: passing argument 1 of 'cap_inode_need_killpriv' from incompatible pointer type [-Werror=incompatible-pointer-types]
     901 |         return cap_inode_need_killpriv(dentry);
         |                                        ^~~~~~
         |                                        |
         |                                        struct dentry *
   include/linux/security.h:153:52: note: expected 'struct user_namespace *' but argument is of type 'struct dentry *'
     153 | int cap_inode_need_killpriv(struct user_namespace *mnt_userns,
         |                             ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~
>> include/linux/security.h:901:16: error: too few arguments to function 'cap_inode_need_killpriv'
     901 |         return cap_inode_need_killpriv(dentry);
         |                ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/security.h:153:5: note: declared here
     153 | int cap_inode_need_killpriv(struct user_namespace *mnt_userns,
         |     ^~~~~~~~~~~~~~~~~~~~~~~
   fs/pipe.c: At top level:
   fs/pipe.c:755:15: warning: no previous prototype for 'account_pipe_buffers' [-Wmissing-prototypes]
     755 | unsigned long account_pipe_buffers(struct user_struct *user,
         |               ^~~~~~~~~~~~~~~~~~~~
   fs/pipe.c:761:6: warning: no previous prototype for 'too_many_pipe_buffers_soft' [-Wmissing-prototypes]
     761 | bool too_many_pipe_buffers_soft(unsigned long user_bufs)
         |      ^~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/pipe.c:768:6: warning: no previous prototype for 'too_many_pipe_buffers_hard' [-Wmissing-prototypes]
     768 | bool too_many_pipe_buffers_hard(unsigned long user_bufs)
         |      ^~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/pipe.c:775:6: warning: no previous prototype for 'pipe_is_unprivileged_user' [-Wmissing-prototypes]
     775 | bool pipe_is_unprivileged_user(void)
         |      ^~~~~~~~~~~~~~~~~~~~~~~~~
   fs/pipe.c:1245:5: warning: no previous prototype for 'pipe_resize_ring' [-Wmissing-prototypes]
    1245 | int pipe_resize_ring(struct pipe_inode_info *pipe, unsigned int nr_slots)
         |     ^~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors
--
   In file included from fs/inode.c:12:
   include/linux/security.h: In function 'security_inode_need_killpriv':
>> include/linux/security.h:901:40: error: passing argument 1 of 'cap_inode_need_killpriv' from incompatible pointer type [-Werror=incompatible-pointer-types]
     901 |         return cap_inode_need_killpriv(dentry);
         |                                        ^~~~~~
         |                                        |
         |                                        struct dentry *
   include/linux/security.h:153:52: note: expected 'struct user_namespace *' but argument is of type 'struct dentry *'
     153 | int cap_inode_need_killpriv(struct user_namespace *mnt_userns,
         |                             ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~
>> include/linux/security.h:901:16: error: too few arguments to function 'cap_inode_need_killpriv'
     901 |         return cap_inode_need_killpriv(dentry);
         |                ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/security.h:153:5: note: declared here
     153 | int cap_inode_need_killpriv(struct user_namespace *mnt_userns,
         |     ^~~~~~~~~~~~~~~~~~~~~~~
   fs/inode.c: In function 'dentry_needs_remove_privs':
>> fs/inode.c:1921:44: error: passing argument 1 of 'security_inode_need_killpriv' from incompatible pointer type [-Werror=incompatible-pointer-types]
    1921 |         ret = security_inode_need_killpriv(mnt_userns, dentry);
         |                                            ^~~~~~~~~~
         |                                            |
         |                                            struct user_namespace *
   In file included from fs/inode.c:12:
   include/linux/security.h:899:63: note: expected 'struct dentry *' but argument is of type 'struct user_namespace *'
     899 | static inline int security_inode_need_killpriv(struct dentry *dentry)
         |                                                ~~~~~~~~~~~~~~~^~~~~~
>> fs/inode.c:1921:15: error: too many arguments to function 'security_inode_need_killpriv'
    1921 |         ret = security_inode_need_killpriv(mnt_userns, dentry);
         |               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
   In file included from fs/inode.c:12:
   include/linux/security.h:899:19: note: declared here
     899 | static inline int security_inode_need_killpriv(struct dentry *dentry)
         |                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/security.h: In function 'security_inode_need_killpriv':
   include/linux/security.h:902:1: error: control reaches end of non-void function [-Werror=return-type]
     902 | }
         | ^
   cc1: some warnings being treated as errors
--
   In file included from fs/attr.c:17:
   include/linux/security.h: In function 'security_inode_need_killpriv':
>> include/linux/security.h:901:40: error: passing argument 1 of 'cap_inode_need_killpriv' from incompatible pointer type [-Werror=incompatible-pointer-types]
     901 |         return cap_inode_need_killpriv(dentry);
         |                                        ^~~~~~
         |                                        |
         |                                        struct dentry *
   include/linux/security.h:153:52: note: expected 'struct user_namespace *' but argument is of type 'struct dentry *'
     153 | int cap_inode_need_killpriv(struct user_namespace *mnt_userns,
         |                             ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~
>> include/linux/security.h:901:16: error: too few arguments to function 'cap_inode_need_killpriv'
     901 |         return cap_inode_need_killpriv(dentry);
         |                ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/security.h:153:5: note: declared here
     153 | int cap_inode_need_killpriv(struct user_namespace *mnt_userns,
         |     ^~~~~~~~~~~~~~~~~~~~~~~
   fs/attr.c: In function 'notify_change':
>> fs/attr.c:345:54: error: passing argument 1 of 'security_inode_need_killpriv' from incompatible pointer type [-Werror=incompatible-pointer-types]
     345 |                 error = security_inode_need_killpriv(mnt_userns, dentry);
         |                                                      ^~~~~~~~~~
         |                                                      |
         |                                                      struct user_namespace *
   In file included from fs/attr.c:17:
   include/linux/security.h:899:63: note: expected 'struct dentry *' but argument is of type 'struct user_namespace *'
     899 | static inline int security_inode_need_killpriv(struct dentry *dentry)
         |                                                ~~~~~~~~~~~~~~~^~~~~~
>> fs/attr.c:345:25: error: too many arguments to function 'security_inode_need_killpriv'
     345 |                 error = security_inode_need_killpriv(mnt_userns, dentry);
         |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
   In file included from fs/attr.c:17:
   include/linux/security.h:899:19: note: declared here
     899 | static inline int security_inode_need_killpriv(struct dentry *dentry)
         |                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/security.h: In function 'security_inode_need_killpriv':
   include/linux/security.h:902:1: error: control reaches end of non-void function [-Werror=return-type]
     902 | }
         | ^
   cc1: some warnings being treated as errors
--
   In file included from include/linux/perf_event.h:59,
                    from include/linux/trace_events.h:10,
                    from include/trace/syscall.h:7,
                    from include/linux/syscalls.h:88,
                    from fs/d_path.c:2:
   include/linux/security.h: In function 'security_inode_need_killpriv':
>> include/linux/security.h:901:40: error: passing argument 1 of 'cap_inode_need_killpriv' from incompatible pointer type [-Werror=incompatible-pointer-types]
     901 |         return cap_inode_need_killpriv(dentry);
         |                                        ^~~~~~
         |                                        |
         |                                        struct dentry *
   include/linux/security.h:153:52: note: expected 'struct user_namespace *' but argument is of type 'struct dentry *'
     153 | int cap_inode_need_killpriv(struct user_namespace *mnt_userns,
         |                             ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~
>> include/linux/security.h:901:16: error: too few arguments to function 'cap_inode_need_killpriv'
     901 |         return cap_inode_need_killpriv(dentry);
         |                ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/security.h:153:5: note: declared here
     153 | int cap_inode_need_killpriv(struct user_namespace *mnt_userns,
         |     ^~~~~~~~~~~~~~~~~~~~~~~
   fs/d_path.c: At top level:
   fs/d_path.c:318:7: warning: no previous prototype for 'simple_dname' [-Wmissing-prototypes]
     318 | char *simple_dname(struct dentry *dentry, char *buffer, int buflen)
         |       ^~~~~~~~~~~~
   cc1: some warnings being treated as errors
--
   In file included from kernel/trace/trace.c:20:
   include/linux/security.h: In function 'security_inode_need_killpriv':
>> include/linux/security.h:901:40: error: passing argument 1 of 'cap_inode_need_killpriv' from incompatible pointer type [-Werror=incompatible-pointer-types]
     901 |         return cap_inode_need_killpriv(dentry);
         |                                        ^~~~~~
         |                                        |
         |                                        struct dentry *
   include/linux/security.h:153:52: note: expected 'struct user_namespace *' but argument is of type 'struct dentry *'
     153 | int cap_inode_need_killpriv(struct user_namespace *mnt_userns,
         |                             ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~
>> include/linux/security.h:901:16: error: too few arguments to function 'cap_inode_need_killpriv'
     901 |         return cap_inode_need_killpriv(dentry);
         |                ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/security.h:153:5: note: declared here
     153 | int cap_inode_need_killpriv(struct user_namespace *mnt_userns,
         |     ^~~~~~~~~~~~~~~~~~~~~~~
   kernel/trace/trace.c: In function 'trace_check_vprintf':
   kernel/trace/trace.c:3813:17: warning: function 'trace_check_vprintf' might be a candidate for 'gnu_printf' format attribute [-Wsuggest-attribute=format]
    3813 |                 trace_seq_vprintf(&iter->seq, iter->fmt, ap);
         |                 ^~~~~~~~~~~~~~~~~
   kernel/trace/trace.c:3868:17: warning: function 'trace_check_vprintf' might be a candidate for 'gnu_printf' format attribute [-Wsuggest-attribute=format]
    3868 |                 trace_seq_vprintf(&iter->seq, p, ap);
         |                 ^~~~~~~~~~~~~~~~~
   At top level:
   kernel/trace/trace.c:1668:37: warning: 'tracing_max_lat_fops' defined but not used [-Wunused-const-variable=]
    1668 | static const struct file_operations tracing_max_lat_fops;
         |                                     ^~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors
--
   In file included from include/linux/perf_event.h:59,
                    from include/linux/hw_breakpoint.h:5,
                    from kernel/trace/trace.h:15,
                    from kernel/trace/trace_output.h:6,
                    from kernel/trace/trace_output.c:15:
   include/linux/security.h: In function 'security_inode_need_killpriv':
>> include/linux/security.h:901:40: error: passing argument 1 of 'cap_inode_need_killpriv' from incompatible pointer type [-Werror=incompatible-pointer-types]
     901 |         return cap_inode_need_killpriv(dentry);
         |                                        ^~~~~~
         |                                        |
         |                                        struct dentry *
   include/linux/security.h:153:52: note: expected 'struct user_namespace *' but argument is of type 'struct dentry *'
     153 | int cap_inode_need_killpriv(struct user_namespace *mnt_userns,
         |                             ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~
>> include/linux/security.h:901:16: error: too few arguments to function 'cap_inode_need_killpriv'
     901 |         return cap_inode_need_killpriv(dentry);
         |                ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/security.h:153:5: note: declared here
     153 | int cap_inode_need_killpriv(struct user_namespace *mnt_userns,
         |     ^~~~~~~~~~~~~~~~~~~~~~~
   kernel/trace/trace_output.c: In function 'trace_output_raw':
   kernel/trace/trace_output.c:332:9: warning: function 'trace_output_raw' might be a candidate for 'gnu_printf' format attribute [-Wsuggest-attribute=format]
     332 |         trace_seq_vprintf(s, trace_event_format(iter, fmt), ap);
         |         ^~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors
--
   In file included from include/linux/perf_event.h:59,
                    from include/linux/hw_breakpoint.h:5,
                    from kernel/trace/trace.h:15,
                    from kernel/trace/trace_preemptirq.c:13:
   include/linux/security.h: In function 'security_inode_need_killpriv':
>> include/linux/security.h:901:40: error: passing argument 1 of 'cap_inode_need_killpriv' from incompatible pointer type [-Werror=incompatible-pointer-types]
     901 |         return cap_inode_need_killpriv(dentry);
         |                                        ^~~~~~
         |                                        |
         |                                        struct dentry *
   include/linux/security.h:153:52: note: expected 'struct user_namespace *' but argument is of type 'struct dentry *'
     153 | int cap_inode_need_killpriv(struct user_namespace *mnt_userns,
         |                             ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~
>> include/linux/security.h:901:16: error: too few arguments to function 'cap_inode_need_killpriv'
     901 |         return cap_inode_need_killpriv(dentry);
         |                ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/security.h:153:5: note: declared here
     153 | int cap_inode_need_killpriv(struct user_namespace *mnt_userns,
         |     ^~~~~~~~~~~~~~~~~~~~~~~
   kernel/trace/trace_preemptirq.c: At top level:
   kernel/trace/trace_preemptirq.c:88:16: warning: no previous prototype for 'trace_hardirqs_on_caller' [-Wmissing-prototypes]
      88 | __visible void trace_hardirqs_on_caller(unsigned long caller_addr)
         |                ^~~~~~~~~~~~~~~~~~~~~~~~
   kernel/trace/trace_preemptirq.c:103:16: warning: no previous prototype for 'trace_hardirqs_off_caller' [-Wmissing-prototypes]
     103 | __visible void trace_hardirqs_off_caller(unsigned long caller_addr)
         |                ^~~~~~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +/cap_inode_need_killpriv +901 include/linux/security.h

^1da177e4c3f415 Linus Torvalds  2005-04-16  898  
b53767719b6cd87 Serge E. Hallyn 2007-10-16  899  static inline int security_inode_need_killpriv(struct dentry *dentry)
b53767719b6cd87 Serge E. Hallyn 2007-10-16  900  {
b53767719b6cd87 Serge E. Hallyn 2007-10-16 @901  	return cap_inode_need_killpriv(dentry);
b53767719b6cd87 Serge E. Hallyn 2007-10-16  902  }
b53767719b6cd87 Serge E. Hallyn 2007-10-16  903  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 38968 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix
  2021-11-17  2:18 ` [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix Casey Schaufler
@ 2021-11-18  7:59   ` David Anderson
  0 siblings, 0 replies; 29+ messages in thread
From: David Anderson @ 2021-11-18  7:59 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: Mark Salyzyn, Miklos Szeredi, Jonathan Corbet, Vivek Goyal,
	Eric W . Biederman, Amir Goldstein, Randy Dunlap,
	Stephen Smalley, John Stultz, linux-doc, linux-kernel,
	linux-fsdevel, linux-unionfs, linux-security-module, kernel-team,
	selinux, paulmoore, luca.boccassi

On Tue, Nov 16, 2021 at 6:18 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> On 11/16/2021 5:58 PM, David Anderson wrote:
> > Mark Salyzyn (3):
> >
> > By default, all access to the upper, lower and work directories is the
> > recorded mounter's MAC and DAC credentials.  The incoming accesses are
> > checked against the caller's credentials.
>
> This isn't very clear. Are you saying that the security attributes
> of the upper, lower, and work directories are determined by the
> attributes of the process that mounted the filesystem? What is an
> "incoming access"? I'm sure that means something if you're steeped
> in the lore of overlayfs, but it isn't obvious to me.

(Sorry, hitting "Reply All" this time...)

Thanks for taking a look - Yes. An "incoming access" is the user
application security context accessing the filesystem.

> > If the principles of least privilege are applied for sepolicy, the
> > mounter's credentials might not overlap the credentials of the caller's
> > when accessing the overlayfs filesystem.
>
> I'm sorry, but I've tried pretty hard, and can't puzzle that one out.

If your sepolicy is designed to give processes minimal privileges (as ours is),
then "init" might lack privileges even though other processes have them. For
example, init can mount /x but not access /x/y/z. But, process XYZ can access
/x/y/z. In our system processes have no privileges to anything by default,
and permissions are granted as needed, as narrowly as possible.

> DAC privileges are not hierarchical. This doesn't make any sense.

Sorry, that was probably not the right word. The intent was to say that a
process with minimal DAC privileges might be able to access a file, but
a process with expansive DAC privileges might be denied access to the
same file due to MAC restrictions.

> I think I might have figured that one out, but in order to do so
> I have to make way too many assumptions about the earlier paragraph.
> Could you please try to explain what you're doing with more context?

Hopefully the above helps explain: overlayfs uses the mounter's privileges,
which does not work on a system where the mounter does not have a
superset of child processes' privileges. That's the crux of the issue and
I'll keep working on how it's communicated in the patch description.

-David

On Tue, Nov 16, 2021 at 6:18 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> On 11/16/2021 5:58 PM, David Anderson wrote:
> > Mark Salyzyn (3):
> >    Add flags option to get xattr method paired to __vfs_getxattr
> >    overlayfs: handle XATTR_NOSECURITY flag for get xattr method
> >    overlayfs: override_creds=off option bypass creator_cred
> >
> > Mark Salyzyn + John Stultz (1):
> >    overlayfs: inode_owner_or_capable called during execv
> >
> > The first three patches address fundamental security issues that should
> > be solved regardless of the override_creds=off feature.
> >
> > The fourth adds the feature depends on these other fixes.
> >
> > By default, all access to the upper, lower and work directories is the
> > recorded mounter's MAC and DAC credentials.  The incoming accesses are
> > checked against the caller's credentials.
>
> This isn't very clear. Are you saying that the security attributes
> of the upper, lower, and work directories are determined by the
> attributes of the process that mounted the filesystem? What is an
> "incoming access"? I'm sure that means something if you're steeped
> in the lore of overlayfs, but it isn't obvious to me.
>
> > If the principles of least privilege are applied for sepolicy, the
> > mounter's credentials might not overlap the credentials of the caller's
> > when accessing the overlayfs filesystem.
>
> I'm sorry, but I've tried pretty hard, and can't puzzle that one out.
>
> >    For example, a file that a
> > lower DAC privileged caller can execute, is MAC denied to the
> > generally higher DAC privileged mounter, to prevent an attack vector.
>
> DAC privileges are not hierarchical. This doesn't make any sense.
>
> > We add the option to turn off override_creds in the mount options; all
> > subsequent operations after mount on the filesystem will be only the
> > caller's credentials.
>
> I think I might have figured that one out, but in order to do so
> I have to make way too many assumptions about the earlier paragraph.
> Could you please try to explain what you're doing with more context?
>
> >    The module boolean parameter and mount option
> > override_creds is also added as a presence check for this "feature",
> > existence of /sys/module/overlay/parameters/overlay_creds
> >
> > Signed-off-by: Mark Salyzyn <salyzyn@android.com>
> > Signed-off-by: David Anderson <dvander@google.com>
> > Cc: Miklos Szeredi <miklos@szeredi.hu>
> > Cc: Jonathan Corbet <corbet@lwn.net>
> > Cc: Vivek Goyal <vgoyal@redhat.com>
> > Cc: Eric W. Biederman <ebiederm@xmission.com>
> > Cc: Amir Goldstein <amir73il@gmail.com>
> > Cc: Randy Dunlap <rdunlap@infradead.org>
> > Cc: Stephen Smalley <sds@tycho.nsa.gov>
> > Cc: John Stultz <john.stultz@linaro.org>
> > Cc: linux-doc@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Cc: linux-fsdevel@vger.kernel.org
> > Cc: linux-unionfs@vger.kernel.org
> > Cc: linux-security-module@vger.kernel.org
> > Cc: kernel-team@android.com
> > Cc: selinux@vger.kernel.org
> > Cc: paulmoore@microsoft.com
> > Cc: Luca.Boccassi@microsoft.com
> >
> > ---
> >
> > v19
> > - rebase.
> >
> > v18
> > - rebase + fix minor cut and paste error for inode argument in __vfs_getxattr
> >
> > v17
> > - correct some zero-day build failures.
> > - fix up documentation
> >
> > v16
> > - rebase and merge of two patches.
> > - add adjustment to deal with execv when overrides is off.
> >
> > v15
> > - Revert back to v4 with fixes from on the way from v5-v14. The single
> >    structure argument passing to address the complaints about too many
> >    arguments was rejected by the community.
> > - Drop the udner discussion fix for an additional CAP_DAC_READ_SEARCH
> >    check. Can address that independently.
> > - ToDo: upstream test frame for thes security fixes (currently testing
> >    is all in Android).
> >
> > v14:
> > - Rejoin, rebase and a few adjustments.
> >
> > v13:
> > - Pull out first patch and try to get it in alone feedback, some
> >    Acks, and then <crickets> because people forgot why we were doing i.
> >
> > v12:
> > - Restore squished out patch 2 and 3 in the series,
> >    then change algorithm to add flags argument.
> >    Per-thread flag is a large security surface.
> >
> > v11:
> > - Squish out v10 introduced patch 2 and 3 in the series,
> >    then and use per-thread flag instead for nesting.
> > - Switch name to ovl_do_vds_getxattr for __vds_getxattr wrapper.
> > - Add sb argument to ovl_revert_creds to match future work.
> >
> > v10:
> > - Return NULL on CAP_DAC_READ_SEARCH
> > - Add __get xattr method to solve sepolicy logging issue
> > - Drop unnecessary sys_admin sepolicy checking for administrative
> >    driver internal xattr functions.
> >
> > v6:
> > - Drop CONFIG_OVERLAY_FS_OVERRIDE_CREDS.
> > - Do better with the documentation, drop rationalizations.
> > - pr_warn message adjusted to report consequences.
> >
> > v5:
> > - beefed up the caveats in the Documentation
> > - Is dependent on
> >    "overlayfs: check CAP_DAC_READ_SEARCH before issuing exportfs_decode_fh"
> >    "overlayfs: check CAP_MKNOD before issuing vfs_whiteout"
> > - Added prwarn when override_creds=off
> >
> > v4:
> > - spelling and grammar errors in text
> >
> > v3:
> > - Change name from caller_credentials / creator_credentials to the
> >    boolean override_creds.
> > - Changed from creator to mounter credentials.
> > - Updated and fortified the documentation.
> > - Added CONFIG_OVERLAY_FS_OVERRIDE_CREDS
> >
> > v2:
> > - Forward port changed attr to stat, resulting in a build error.
> > - altered commit message.
> >
> > David Anderson (4):
> >    Add flags option to get xattr method paired to __vfs_getxattr
> >    overlayfs: handle XATTR_NOSECURITY flag for get xattr method
> >    overlayfs: override_creds=off option bypass creator_cred
> >    overlayfs: inode_owner_or_capable called during execv
> >
> >   Documentation/filesystems/locking.rst   |  2 +-
> >   Documentation/filesystems/overlayfs.rst | 26 ++++++++++++++-
> >   fs/9p/acl.c                             |  3 +-
> >   fs/9p/xattr.c                           |  3 +-
> >   fs/afs/xattr.c                          | 10 +++---
> >   fs/attr.c                               |  2 +-
> >   fs/btrfs/xattr.c                        |  3 +-
> >   fs/ceph/xattr.c                         |  3 +-
> >   fs/cifs/xattr.c                         |  2 +-
> >   fs/ecryptfs/inode.c                     |  6 ++--
> >   fs/ecryptfs/mmap.c                      |  5 +--
> >   fs/erofs/xattr.c                        |  3 +-
> >   fs/ext2/xattr_security.c                |  2 +-
> >   fs/ext2/xattr_trusted.c                 |  2 +-
> >   fs/ext2/xattr_user.c                    |  2 +-
> >   fs/ext4/xattr_hurd.c                    |  2 +-
> >   fs/ext4/xattr_security.c                |  2 +-
> >   fs/ext4/xattr_trusted.c                 |  2 +-
> >   fs/ext4/xattr_user.c                    |  2 +-
> >   fs/f2fs/xattr.c                         |  4 +--
> >   fs/fuse/xattr.c                         |  4 +--
> >   fs/gfs2/xattr.c                         |  3 +-
> >   fs/hfs/attr.c                           |  2 +-
> >   fs/hfsplus/xattr.c                      |  3 +-
> >   fs/hfsplus/xattr_security.c             |  3 +-
> >   fs/hfsplus/xattr_trusted.c              |  3 +-
> >   fs/hfsplus/xattr_user.c                 |  3 +-
> >   fs/inode.c                              |  7 +++--
> >   fs/internal.h                           |  3 +-
> >   fs/jffs2/security.c                     |  3 +-
> >   fs/jffs2/xattr_trusted.c                |  3 +-
> >   fs/jffs2/xattr_user.c                   |  3 +-
> >   fs/jfs/xattr.c                          |  5 +--
> >   fs/kernfs/inode.c                       |  3 +-
> >   fs/nfs/nfs4proc.c                       |  9 ++++--
> >   fs/ntfs3/xattr.c                        |  2 +-
> >   fs/ocfs2/xattr.c                        |  9 ++++--
> >   fs/open.c                               |  2 +-
> >   fs/orangefs/xattr.c                     |  3 +-
> >   fs/overlayfs/copy_up.c                  |  2 +-
> >   fs/overlayfs/dir.c                      | 17 +++++-----
> >   fs/overlayfs/file.c                     | 25 ++++++++-------
> >   fs/overlayfs/inode.c                    | 29 ++++++++---------
> >   fs/overlayfs/namei.c                    |  6 ++--
> >   fs/overlayfs/overlayfs.h                |  7 +++--
> >   fs/overlayfs/ovl_entry.h                |  1 +
> >   fs/overlayfs/readdir.c                  |  8 ++---
> >   fs/overlayfs/super.c                    | 34 ++++++++++++++++----
> >   fs/overlayfs/util.c                     | 13 ++++++--
> >   fs/posix_acl.c                          |  2 +-
> >   fs/reiserfs/xattr_security.c            |  3 +-
> >   fs/reiserfs/xattr_trusted.c             |  3 +-
> >   fs/reiserfs/xattr_user.c                |  3 +-
> >   fs/squashfs/xattr.c                     |  2 +-
> >   fs/ubifs/xattr.c                        |  3 +-
> >   fs/xattr.c                              | 42 +++++++++++++------------
> >   fs/xfs/xfs_xattr.c                      |  3 +-
> >   include/linux/lsm_hook_defs.h           |  3 +-
> >   include/linux/security.h                |  6 ++--
> >   include/linux/xattr.h                   |  6 ++--
> >   include/uapi/linux/xattr.h              |  7 +++--
> >   mm/shmem.c                              |  3 +-
> >   net/socket.c                            |  3 +-
> >   security/commoncap.c                    | 11 ++++---
> >   security/integrity/evm/evm_main.c       | 13 +++++---
> >   security/security.c                     |  5 +--
> >   security/selinux/hooks.c                | 19 ++++++-----
> >   security/smack/smack_lsm.c              | 18 ++++++-----
> >   68 files changed, 289 insertions(+), 167 deletions(-)
> >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix
  2021-11-17  7:36 ` Amir Goldstein
@ 2021-11-18  9:53   ` David Anderson
  2021-11-18 10:20     ` Amir Goldstein
  2021-12-03 15:37   ` Vivek Goyal
  1 sibling, 1 reply; 29+ messages in thread
From: David Anderson @ 2021-11-18  9:53 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Mark Salyzyn, Miklos Szeredi, Jonathan Corbet, Vivek Goyal,
	Eric W . Biederman, Randy Dunlap, Stephen Smalley, John Stultz,
	linux-doc, linux-kernel, linux-fsdevel, overlayfs, LSM List,
	kernel-team, selinux, paulmoore, luca.boccassi

On Tue, Nov 16, 2021 at 11:36 PM Amir Goldstein <amir73il@gmail.com> wrote:
> Hi David,
>
> I see that the patch set has changed hands (presumably to Android upstreaming
> team), but you just rebased v18 without addressing the maintainers concerns [1].

Indeed I'm carrying this forward as Mark is no longer working on it.
My apologies for
missing those comments!

> Specifically, the patch 2/4 is very wrong for unprivileged mount and
> I think that the very noisy patch 1/4 could be completely avoided:
> Can't you use -o userxattr mount option for Android use case and limit
> the manipulation of user.ovrelay.* xattr based on sepolicy for actors
> that are allowed
> to make changes in overlayfs mount? or not limit at all?
> The access to those xattr is forbidden via "incoming" xattr ops on
> overlay inodes.

Can you clarify a bit more? The patch is definitely super noisy and I'd love
to have a better solution. The problem it's trying to solve is:
 1. Kernel-privileged init mounts /mnt/blah-lower and /mnt/blah-upper.
 2. Kernel-privileged init mounts /blah with overlayfs using the above dirs.
 2. Kernel-privileged init loads sepolicy off /blah/policy. Enforcing begins.
 3. Kernel-privileged init tries to execute /blah/init to initiate a
domain transition.
 4. exec() fails because the overlayfs mounter creds (kernel domain) does
     not have getxattr permission to /blah/init.

Eg, we're hitting this problem without even making changes to the mount, and
without anything being written to /mnt/blah-upper.

> Can an unprivileged user create an overlay over a directory that they have
> access to and redirect an innocent looking file name to an underlying file that
> said the mounting user has no access to and by doing that, tricking a privileged
> user to modify the innocent looking file on the  mounter's behalf?
> Of course this could be avoided by forbidding unprivileged mount with
> override_creds=off, but there could be other scenarios, so a clear model
> would help to understand the risks.
>
> For example:
> If user 1 was able to read in lower dir A, now the content of overlay dir A
> is cached and user 2, that has permissions to read upper dir A and does
> not have read permissions on lower dir A will see the content of lower dir A.

I'll need to think about this more and test to verify. It's not a scenario that
would come up in our use case (both dirs effectively have the same permissions).

If the answer is "yes, that can happen" - do you see this as a problem of
clarifying the model, or a problem of fixing that loophole?

>> I think that the core problem with the approach is using Non-uniform
> credentials to access underlying layers. I don't see a simple way around
> a big audit that checks all those cases, but maybe I'm missing some quick
> shortcut or maybe your use case can add some restrictions about the
> users that could access this overlay that would simplify the generic problem.

In a security model like ours, I think there's no way around it, that
we really need
accesses to be from the caller's credentials and not the mounter's. It's even
worse than earlier iterations of this patch perhaps let on: we mount
before sepolicy
is loaded (so we can overlay the policy itself), and thus the
mounter's creds are
effectively "the kernel". This domain is highly restricted in our
sepolicy for obvious
reasons. There's no way our security team will let us unrestrict it.

Best,

-David




>
> Thanks,
> Amir.
>
> [1] https://lore.kernel.org/linux-unionfs/CAJfpegtMoD85j5namV592sJD23QeUMD=+tq4SvFDqjVxsAszYQ@mail.gmail.com/

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix
  2021-11-18  9:53   ` David Anderson
@ 2021-11-18 10:20     ` Amir Goldstein
  2021-11-18 20:15       ` David Anderson
  0 siblings, 1 reply; 29+ messages in thread
From: Amir Goldstein @ 2021-11-18 10:20 UTC (permalink / raw)
  To: David Anderson
  Cc: Mark Salyzyn, Miklos Szeredi, Jonathan Corbet, Vivek Goyal,
	Eric W . Biederman, Randy Dunlap, Stephen Smalley, John Stultz,
	linux-doc, linux-kernel, linux-fsdevel, overlayfs, LSM List,
	kernel-team, selinux, paulmoore, luca.boccassi

On Thu, Nov 18, 2021 at 11:53 AM David Anderson <dvander@google.com> wrote:
>
> On Tue, Nov 16, 2021 at 11:36 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > Hi David,
> >
> > I see that the patch set has changed hands (presumably to Android upstreaming
> > team), but you just rebased v18 without addressing the maintainers concerns [1].
>
> Indeed I'm carrying this forward as Mark is no longer working on it.
> My apologies for
> missing those comments!
>
> > Specifically, the patch 2/4 is very wrong for unprivileged mount and
> > I think that the very noisy patch 1/4 could be completely avoided:
> > Can't you use -o userxattr mount option for Android use case and limit
> > the manipulation of user.ovrelay.* xattr based on sepolicy for actors
> > that are allowed
> > to make changes in overlayfs mount? or not limit at all?
> > The access to those xattr is forbidden via "incoming" xattr ops on
> > overlay inodes.
>
> Can you clarify a bit more? The patch is definitely super noisy and I'd love
> to have a better solution. The problem it's trying to solve is:
>  1. Kernel-privileged init mounts /mnt/blah-lower and /mnt/blah-upper.
>  2. Kernel-privileged init mounts /blah with overlayfs using the above dirs.
>  2. Kernel-privileged init loads sepolicy off /blah/policy. Enforcing begins.
>  3. Kernel-privileged init tries to execute /blah/init to initiate a
> domain transition.
>  4. exec() fails because the overlayfs mounter creds (kernel domain) does
>      not have getxattr permission to /blah/init.
>
> Eg, we're hitting this problem without even making changes to the mount, and
> without anything being written to /mnt/blah-upper.
>

So what is your solution?
Remove the security check from overlayfs setting xattr?
How does that work for the person who composed the security policy?
You will need to grant some basic privileges to the mounter.
If you do not want to grant the mounter privileges to set trusted.overlay xattr
you may use mount option -o userxattr and grant it permissions to set
user.overlay xattrs.

> > Can an unprivileged user create an overlay over a directory that they have
> > access to and redirect an innocent looking file name to an underlying file that
> > said the mounting user has no access to and by doing that, tricking a privileged
> > user to modify the innocent looking file on the  mounter's behalf?
> > Of course this could be avoided by forbidding unprivileged mount with
> > override_creds=off, but there could be other scenarios, so a clear model
> > would help to understand the risks.
> >
> > For example:
> > If user 1 was able to read in lower dir A, now the content of overlay dir A
> > is cached and user 2, that has permissions to read upper dir A and does
> > not have read permissions on lower dir A will see the content of lower dir A.
>
> I'll need to think about this more and test to verify. It's not a scenario that
> would come up in our use case (both dirs effectively have the same permissions).
>

Your argument is taking the wrong POV.
The reason that previous posts of this patch set have been rejected
is not because it doesn't work for your use case.
It is because other players can exploit the feature to bypass security
policies, so the feature cannot be merged as is.

> If the answer is "yes, that can happen" - do you see this as a problem of
> clarifying the model, or a problem of fixing that loophole?
>

It is something that is not at all easy to fix.
In the example above, instead of checking permissions against the
overlay inode (on "incoming" readdir) will need to check permissions of every
accessing user against all layers, before allowing access to the merged
directory content (which is cached).
A lot more work - and this is just for this one example.

> >> I think that the core problem with the approach is using Non-uniform
> > credentials to access underlying layers. I don't see a simple way around
> > a big audit that checks all those cases, but maybe I'm missing some quick
> > shortcut or maybe your use case can add some restrictions about the
> > users that could access this overlay that would simplify the generic problem.
>
> In a security model like ours, I think there's no way around it, that
> we really need
> accesses to be from the caller's credentials and not the mounter's. It's even
> worse than earlier iterations of this patch perhaps let on: we mount
> before sepolicy
> is loaded (so we can overlay the policy itself), and thus the
> mounter's creds are
> effectively "the kernel". This domain is highly restricted in our
> sepolicy for obvious
> reasons. There's no way our security team will let us unrestrict it.
>

Not sure what that means or what I can do with this information.
If I had a simple suggestion on how to solve your problem I would have
suggested it, but I cannot think of any right now.
The best I can do is to try to make you understand the problems that your
patch causes to others, so you can figure out a way that meets your goals
without breaking other use cases.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix
  2021-11-18 10:20     ` Amir Goldstein
@ 2021-11-18 20:15       ` David Anderson
  2021-11-18 20:32         ` Amir Goldstein
  0 siblings, 1 reply; 29+ messages in thread
From: David Anderson @ 2021-11-18 20:15 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Mark Salyzyn, Miklos Szeredi, Jonathan Corbet, Vivek Goyal,
	Eric W . Biederman, Randy Dunlap, Stephen Smalley, John Stultz,
	linux-doc, linux-kernel, linux-fsdevel, overlayfs, LSM List,
	kernel-team, selinux, paulmoore, luca.boccassi

On Thu, Nov 18, 2021 at 2:20 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Thu, Nov 18, 2021 at 11:53 AM David Anderson <dvander@google.com> wrote:
> >
> > On Tue, Nov 16, 2021 at 11:36 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > Hi David,
> > >
> > > I see that the patch set has changed hands (presumably to Android upstreaming
> > > team), but you just rebased v18 without addressing the maintainers concerns [1].
> >
> > Indeed I'm carrying this forward as Mark is no longer working on it.
> > My apologies for
> > missing those comments!
> >
> > > Specifically, the patch 2/4 is very wrong for unprivileged mount and
> > > I think that the very noisy patch 1/4 could be completely avoided:
> > > Can't you use -o userxattr mount option for Android use case and limit
> > > the manipulation of user.ovrelay.* xattr based on sepolicy for actors
> > > that are allowed
> > > to make changes in overlayfs mount? or not limit at all?
> > > The access to those xattr is forbidden via "incoming" xattr ops on
> > > overlay inodes.
> >
> > Can you clarify a bit more? The patch is definitely super noisy and I'd love
> > to have a better solution. The problem it's trying to solve is:
> >  1. Kernel-privileged init mounts /mnt/blah-lower and /mnt/blah-upper.
> >  2. Kernel-privileged init mounts /blah with overlayfs using the above dirs.
> >  2. Kernel-privileged init loads sepolicy off /blah/policy. Enforcing begins.
> >  3. Kernel-privileged init tries to execute /blah/init to initiate a
> > domain transition.
> >  4. exec() fails because the overlayfs mounter creds (kernel domain) does
> >      not have getxattr permission to /blah/init.
> >
> > Eg, we're hitting this problem without even making changes to the mount, and
> > without anything being written to /mnt/blah-upper.
> >
>
> So what is your solution?
> Remove the security check from overlayfs setting xattr?
> How does that work for the person who composed the security policy?
> You will need to grant some basic privileges to the mounter.
> If you do not want to grant the mounter privileges to set trusted.overlay xattr
> you may use mount option -o userxattr and grant it permissions to set
> user.overlay xattrs.

Thanks for the tips. I think it might be possible for us to work
around this one in
userspace, either by granting the mounter this one permission across
all contexts,
or with a much narrower kernel fix that doesn't require us keeping the big noisy
patch in our tree.

> > > Can an unprivileged user create an overlay over a directory that they have
> > > access to and redirect an innocent looking file name to an underlying file that
> > > said the mounting user has no access to and by doing that, tricking a privileged
> > > user to modify the innocent looking file on the  mounter's behalf?
> > > Of course this could be avoided by forbidding unprivileged mount with
> > > override_creds=off, but there could be other scenarios, so a clear model
> > > would help to understand the risks.
> > >
> > > For example:
> > > If user 1 was able to read in lower dir A, now the content of overlay dir A
> > > is cached and user 2, that has permissions to read upper dir A and does
> > > not have read permissions on lower dir A will see the content of lower dir A.
> >
> > I'll need to think about this more and test to verify. It's not a scenario that
> > would come up in our use case (both dirs effectively have the same permissions).
> >
>
> Your argument is taking the wrong POV.
> The reason that previous posts of this patch set have been rejected
> is not because it doesn't work for your use case.
> It is because other players can exploit the feature to bypass security
> policies, so the feature cannot be merged as is.

Ack, understood.

> > If the answer is "yes, that can happen" - do you see this as a problem of
> > clarifying the model, or a problem of fixing that loophole?
> >
>
> It is something that is not at all easy to fix.
> In the example above, instead of checking permissions against the
> overlay inode (on "incoming" readdir) will need to check permissions of every
> accessing user against all layers, before allowing access to the merged
> directory content (which is cached).
> A lot more work - and this is just for this one example.

I see your point. If we could implement that, behind a mount flag, would that be
an acceptable solution?

> > >> I think that the core problem with the approach is using Non-uniform
> > > credentials to access underlying layers. I don't see a simple way around
> > > a big audit that checks all those cases, but maybe I'm missing some quick
> > > shortcut or maybe your use case can add some restrictions about the
> > > users that could access this overlay that would simplify the generic problem.
> >
> > In a security model like ours, I think there's no way around it, that
> > we really need
> > accesses to be from the caller's credentials and not the mounter's. It's even
> > worse than earlier iterations of this patch perhaps let on: we mount
> > before sepolicy
> > is loaded (so we can overlay the policy itself), and thus the
> > mounter's creds are
> > effectively "the kernel". This domain is highly restricted in our
> > sepolicy for obvious
> > reasons. There's no way our security team will let us unrestrict it.
> >
>
> Not sure what that means or what I can do with this information.
> If I had a simple suggestion on how to solve your problem I would have
> suggested it, but I cannot think of any right now.
> The best I can do is to try to make you understand the problems that your
> patch causes to others, so you can figure out a way that meets your goals
> without breaking other use cases.

To help clarify, we do not have any processes which have access to everything.
Almost every file in the system has a specific selabel and processes are only
granted access to the labels they need.

To give the mounter (init in kernel domain) access to every single label doesn't
make sense. It only makes sense to have the access be "passthrough", making
overlayfs as transparent as possible.

Best,

-David

>
> Thanks,
> Amir.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix
  2021-11-18 20:15       ` David Anderson
@ 2021-11-18 20:32         ` Amir Goldstein
  0 siblings, 0 replies; 29+ messages in thread
From: Amir Goldstein @ 2021-11-18 20:32 UTC (permalink / raw)
  To: David Anderson
  Cc: Mark Salyzyn, Miklos Szeredi, Jonathan Corbet, Vivek Goyal,
	Eric W . Biederman, Randy Dunlap, Stephen Smalley, John Stultz,
	linux-doc, linux-kernel, linux-fsdevel, overlayfs, LSM List,
	kernel-team, selinux, paulmoore, luca.boccassi

> > It is something that is not at all easy to fix.
> > In the example above, instead of checking permissions against the
> > overlay inode (on "incoming" readdir) will need to check permissions of every
> > accessing user against all layers, before allowing access to the merged
> > directory content (which is cached).
> > A lot more work - and this is just for this one example.
>
> I see your point. If we could implement that, behind a mount flag, would that be
> an acceptable solution?
>

As I wrote, this is one specific problem that I identified.
If you propose a different behavior base on mount flag you should
be able to argue that is cannot be exploited to circumvent security
access policies, by peaking into cached copies of objects that the user
has no access to, or by any other way.

I have no idea how to implement what you want and prove that
it is safe.
Maybe if you explained the use case in greater details with some
examples someone could help you reach a possible solution.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix
  2021-11-17  7:36 ` Amir Goldstein
  2021-11-18  9:53   ` David Anderson
@ 2021-12-03 15:37   ` Vivek Goyal
  2021-12-03 16:04     ` Paul Moore
  2021-12-03 16:31     ` Amir Goldstein
  1 sibling, 2 replies; 29+ messages in thread
From: Vivek Goyal @ 2021-12-03 15:37 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: David Anderson, Mark Salyzyn, Miklos Szeredi, Jonathan Corbet,
	Eric W . Biederman, Randy Dunlap, Stephen Smalley, John Stultz,
	linux-doc, linux-kernel, linux-fsdevel, overlayfs, LSM List,
	kernel-team, selinux, paulmoore, Luca.Boccassi

On Wed, Nov 17, 2021 at 09:36:42AM +0200, Amir Goldstein wrote:
> On Wed, Nov 17, 2021 at 3:58 AM David Anderson <dvander@google.com> wrote:
> >
> > Mark Salyzyn (3):
> >   Add flags option to get xattr method paired to __vfs_getxattr
> >   overlayfs: handle XATTR_NOSECURITY flag for get xattr method
> >   overlayfs: override_creds=off option bypass creator_cred
> >
> > Mark Salyzyn + John Stultz (1):
> >   overlayfs: inode_owner_or_capable called during execv
> >
> > The first three patches address fundamental security issues that should
> > be solved regardless of the override_creds=off feature.
> >
> > The fourth adds the feature depends on these other fixes.
> >
> > By default, all access to the upper, lower and work directories is the
> > recorded mounter's MAC and DAC credentials.  The incoming accesses are
> > checked against the caller's credentials.
> >
> > If the principles of least privilege are applied for sepolicy, the
> > mounter's credentials might not overlap the credentials of the caller's
> > when accessing the overlayfs filesystem.  For example, a file that a
> > lower DAC privileged caller can execute, is MAC denied to the
> > generally higher DAC privileged mounter, to prevent an attack vector.
> >
> > We add the option to turn off override_creds in the mount options; all
> > subsequent operations after mount on the filesystem will be only the
> > caller's credentials.  The module boolean parameter and mount option
> > override_creds is also added as a presence check for this "feature",
> > existence of /sys/module/overlay/parameters/overlay_creds
> >
> > Signed-off-by: Mark Salyzyn <salyzyn@android.com>
> > Signed-off-by: David Anderson <dvander@google.com>
> > Cc: Miklos Szeredi <miklos@szeredi.hu>
> > Cc: Jonathan Corbet <corbet@lwn.net>
> > Cc: Vivek Goyal <vgoyal@redhat.com>
> > Cc: Eric W. Biederman <ebiederm@xmission.com>
> > Cc: Amir Goldstein <amir73il@gmail.com>
> > Cc: Randy Dunlap <rdunlap@infradead.org>
> > Cc: Stephen Smalley <sds@tycho.nsa.gov>
> > Cc: John Stultz <john.stultz@linaro.org>
> > Cc: linux-doc@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Cc: linux-fsdevel@vger.kernel.org
> > Cc: linux-unionfs@vger.kernel.org
> > Cc: linux-security-module@vger.kernel.org
> > Cc: kernel-team@android.com
> > Cc: selinux@vger.kernel.org
> > Cc: paulmoore@microsoft.com
> > Cc: Luca.Boccassi@microsoft.com
> >
> > ---
> >
> > v19
> > - rebase.
> >
> 
> Hi David,
> 
> I see that the patch set has changed hands (presumably to Android upstreaming
> team), but you just rebased v18 without addressing the maintainers concerns [1].
> 

BTW, where is patch 1 of the series. I can't seem to find it.

I think I was running into issues with getxattr() on underlying filesystem
as well (if mounter did not have sufficient privileges) and tried to fix
it. But did not find a good solution at that point of time.

https://lore.kernel.org/linux-unionfs/1467733854-6314-6-git-send-email-vgoyal@redhat.com/

So basically when overlay inode is being initialized, code will try to
query "security.selinux" xattr on underlying file to initialize selinux
label on the overlay inode. For regular filesystems, they bypass the
security check by calling __vfs_getxattr() when trying to initialize
this selinux security label. But with layered filesystem, it still
ends up calling vfs_getxattr() on underlying filesyste. Which means
it checks for caller's creds and if caller is not priviliged enough,
access will be denied.

To solve this problem, looks like this patch set is passing a flag
XATTR_NOSECUROTY so that permission checks are skipped in getxattr()
path in underlying filesystem. As long as this information is
not leaked to user space (and remains in overlayfs), it probably is
fine? And if information is not going to user space, then it probably
is fine for unprivileged overlayfs mounts as well?

I see a comment from Miklos as well as you that it is not safe to
do for unprivileged mounts. Can you help me understand why that's
the case.


> Specifically, the patch 2/4 is very wrong for unprivileged mount and

Can you help me understand why it is wrong. (/me should spend more
time reading the patch. But I am taking easy route of asking you. :-)).

> I think that the very noisy patch 1/4 could be completely avoided:

How can it completely avoided. If mounter is not privileged then 
vfs_getxattr() on underlying filesystem will fail. Or if
override_creds=off, then caller might not be privileged enough to
do getxattr() but we still should be able to initialize overlay
inode security label. 

> Can't you use -o userxattr mount option

user xattrs done't work for device nodes and symlinks.

BTW, how will userxattr solve the problem completely. It can be used
to store overlay specific xattrs but accessing security xattrs on
underlying filesystem will still be a problem?

Thanks
Vivek

> for Android use case and limit
> the manipulation of user.ovrelay.* xattr based on sepolicy for actors
> that are allowed
> to make changes in overlayfs mount? or not limit at all?
> The access to those xattr is forbidden via "incoming" xattr ops on
> overlay inodes.
> 
> Also, IMO, the Documentation section about "Non overlapping credentials" does
> not hold the same standards as the section about "Permission model" -
> it does not
> state the requirements clear enough for my non-security-oriented brain to be
> able to understand if the "Ignore mounter's credentials" model can be exploited.
> 
> Can an unprivileged user create an overlay over a directory that they have
> access to and redirect an innocent looking file name to an underlying file that
> said the mounting user has no access to and by doing that, tricking a privileged
> user to modify the innocent looking file on the  mounter's behalf?
> Of course this could be avoided by forbidding unprivileged mount with
> override_creds=off, but there could be other scenarios, so a clear model
> would help to understand the risks.
> 
> For example:
> If user 1 was able to read in lower dir A, now the content of overlay dir A
> is cached and user 2, that has permissions to read upper dir A and does
> not have read permissions on lower dir A will see the content of lower dir A.
> 
> I think that the core problem with the approach is using Non-uniform
> credentials to access underlying layers. I don't see a simple way around
> a big audit that checks all those cases, but maybe I'm missing some quick
> shortcut or maybe your use case can add some restrictions about the
> users that could access this overlay that would simplify the generic problem.
> 
> Thanks,
> Amir.
> 
> [1] https://lore.kernel.org/linux-unionfs/CAJfpegtMoD85j5namV592sJD23QeUMD=+tq4SvFDqjVxsAszYQ@mail.gmail.com/
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix
  2021-12-03 15:37   ` Vivek Goyal
@ 2021-12-03 16:04     ` Paul Moore
  2021-12-03 16:31     ` Amir Goldstein
  1 sibling, 0 replies; 29+ messages in thread
From: Paul Moore @ 2021-12-03 16:04 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Amir Goldstein, David Anderson, Mark Salyzyn, Miklos Szeredi,
	Jonathan Corbet, Eric W . Biederman, Randy Dunlap,
	Stephen Smalley, John Stultz, linux-doc, linux-kernel,
	linux-fsdevel, overlayfs, LSM List, kernel-team, selinux,
	paulmoore, Luca.Boccassi

On Fri, Dec 3, 2021 at 10:38 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> On Wed, Nov 17, 2021 at 09:36:42AM +0200, Amir Goldstein wrote:
> > On Wed, Nov 17, 2021 at 3:58 AM David Anderson <dvander@google.com> wrote:
> > >
> > > Mark Salyzyn (3):
> > >   Add flags option to get xattr method paired to __vfs_getxattr
> > >   overlayfs: handle XATTR_NOSECURITY flag for get xattr method
> > >   overlayfs: override_creds=off option bypass creator_cred
> > >
> > > Mark Salyzyn + John Stultz (1):
> > >   overlayfs: inode_owner_or_capable called during execv
> > >
> > > The first three patches address fundamental security issues that should
> > > be solved regardless of the override_creds=off feature.
> > >
> > > The fourth adds the feature depends on these other fixes.
> > >
> > > By default, all access to the upper, lower and work directories is the
> > > recorded mounter's MAC and DAC credentials.  The incoming accesses are
> > > checked against the caller's credentials.
> > >
> > > If the principles of least privilege are applied for sepolicy, the
> > > mounter's credentials might not overlap the credentials of the caller's
> > > when accessing the overlayfs filesystem.  For example, a file that a
> > > lower DAC privileged caller can execute, is MAC denied to the
> > > generally higher DAC privileged mounter, to prevent an attack vector.
> > >
> > > We add the option to turn off override_creds in the mount options; all
> > > subsequent operations after mount on the filesystem will be only the
> > > caller's credentials.  The module boolean parameter and mount option
> > > override_creds is also added as a presence check for this "feature",
> > > existence of /sys/module/overlay/parameters/overlay_creds
>
> BTW, where is patch 1 of the series. I can't seem to find it.

Lore to the rescue ...

https://lore.kernel.org/selinux/20211117015806.2192263-2-dvander@google.com/

-- 
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix
  2021-12-03 15:37   ` Vivek Goyal
  2021-12-03 16:04     ` Paul Moore
@ 2021-12-03 16:31     ` Amir Goldstein
  2021-12-03 18:34       ` Vivek Goyal
  1 sibling, 1 reply; 29+ messages in thread
From: Amir Goldstein @ 2021-12-03 16:31 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: David Anderson, Mark Salyzyn, Miklos Szeredi, Jonathan Corbet,
	Eric W . Biederman, Randy Dunlap, Stephen Smalley, John Stultz,
	linux-doc, linux-kernel, linux-fsdevel, overlayfs, LSM List,
	kernel-team, selinux, paulmoore, Luca.Boccassi

On Fri, Dec 3, 2021 at 5:38 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Wed, Nov 17, 2021 at 09:36:42AM +0200, Amir Goldstein wrote:
> > On Wed, Nov 17, 2021 at 3:58 AM David Anderson <dvander@google.com> wrote:
> > >
> > > Mark Salyzyn (3):
> > >   Add flags option to get xattr method paired to __vfs_getxattr
> > >   overlayfs: handle XATTR_NOSECURITY flag for get xattr method
> > >   overlayfs: override_creds=off option bypass creator_cred
> > >
> > > Mark Salyzyn + John Stultz (1):
> > >   overlayfs: inode_owner_or_capable called during execv
> > >
> > > The first three patches address fundamental security issues that should
> > > be solved regardless of the override_creds=off feature.
> > >
> > > The fourth adds the feature depends on these other fixes.
> > >
> > > By default, all access to the upper, lower and work directories is the
> > > recorded mounter's MAC and DAC credentials.  The incoming accesses are
> > > checked against the caller's credentials.
> > >
> > > If the principles of least privilege are applied for sepolicy, the
> > > mounter's credentials might not overlap the credentials of the caller's
> > > when accessing the overlayfs filesystem.  For example, a file that a
> > > lower DAC privileged caller can execute, is MAC denied to the
> > > generally higher DAC privileged mounter, to prevent an attack vector.
> > >
> > > We add the option to turn off override_creds in the mount options; all
> > > subsequent operations after mount on the filesystem will be only the
> > > caller's credentials.  The module boolean parameter and mount option
> > > override_creds is also added as a presence check for this "feature",
> > > existence of /sys/module/overlay/parameters/overlay_creds
> > >
> > > Signed-off-by: Mark Salyzyn <salyzyn@android.com>
> > > Signed-off-by: David Anderson <dvander@google.com>
> > > Cc: Miklos Szeredi <miklos@szeredi.hu>
> > > Cc: Jonathan Corbet <corbet@lwn.net>
> > > Cc: Vivek Goyal <vgoyal@redhat.com>
> > > Cc: Eric W. Biederman <ebiederm@xmission.com>
> > > Cc: Amir Goldstein <amir73il@gmail.com>
> > > Cc: Randy Dunlap <rdunlap@infradead.org>
> > > Cc: Stephen Smalley <sds@tycho.nsa.gov>
> > > Cc: John Stultz <john.stultz@linaro.org>
> > > Cc: linux-doc@vger.kernel.org
> > > Cc: linux-kernel@vger.kernel.org
> > > Cc: linux-fsdevel@vger.kernel.org
> > > Cc: linux-unionfs@vger.kernel.org
> > > Cc: linux-security-module@vger.kernel.org
> > > Cc: kernel-team@android.com
> > > Cc: selinux@vger.kernel.org
> > > Cc: paulmoore@microsoft.com
> > > Cc: Luca.Boccassi@microsoft.com
> > >
> > > ---
> > >
> > > v19
> > > - rebase.
> > >
> >
> > Hi David,
> >
> > I see that the patch set has changed hands (presumably to Android upstreaming
> > team), but you just rebased v18 without addressing the maintainers concerns [1].
> >
>
> BTW, where is patch 1 of the series. I can't seem to find it.
>
> I think I was running into issues with getxattr() on underlying filesystem
> as well (if mounter did not have sufficient privileges) and tried to fix
> it. But did not find a good solution at that point of time.
>
> https://lore.kernel.org/linux-unionfs/1467733854-6314-6-git-send-email-vgoyal@redhat.com/
>
> So basically when overlay inode is being initialized, code will try to
> query "security.selinux" xattr on underlying file to initialize selinux
> label on the overlay inode. For regular filesystems, they bypass the
> security check by calling __vfs_getxattr() when trying to initialize
> this selinux security label. But with layered filesystem, it still
> ends up calling vfs_getxattr() on underlying filesyste. Which means
> it checks for caller's creds and if caller is not priviliged enough,
> access will be denied.
>
> To solve this problem, looks like this patch set is passing a flag
> XATTR_NOSECUROTY so that permission checks are skipped in getxattr()
> path in underlying filesystem. As long as this information is
> not leaked to user space (and remains in overlayfs), it probably is
> fine? And if information is not going to user space, then it probably
> is fine for unprivileged overlayfs mounts as well?
>
> I see a comment from Miklos as well as you that it is not safe to
> do for unprivileged mounts. Can you help me understand why that's
> the case.
>
>
> > Specifically, the patch 2/4 is very wrong for unprivileged mount and
>
> Can you help me understand why it is wrong. (/me should spend more
> time reading the patch. But I am taking easy route of asking you. :-)).
>

I should have spent more time reading the patch too :-)
I was not referring to the selinux part. That looks fine I guess.

I was referring to the part of:
"Check impure, opaque, origin & meta xattr with no sepolicy audit
(using __vfs_getxattr) since these operations are internal to
overlayfs operations and do not disclose any data."
I don't know how safe that really is to ignore the security checks
for reading trusted xattr and allow non-privileged mounts to do that.
Certainly since non privileged mounts are likely to use userxattr
anyway, so what's the reason to bypass security?

> > I think that the very noisy patch 1/4 could be completely avoided:
>
> How can it completely avoided. If mounter is not privileged then
> vfs_getxattr() on underlying filesystem will fail. Or if
> override_creds=off, then caller might not be privileged enough to
> do getxattr() but we still should be able to initialize overlay
> inode security label.
>

My bad. I didn't read the description of the selinux problem
with the re-post and forgot about it.

> > Can't you use -o userxattr mount option
>
> user xattrs done't work for device nodes and symlinks.
>
> BTW, how will userxattr solve the problem completely. It can be used
> to store overlay specific xattrs but accessing security xattrs on
> underlying filesystem will still be a problem?

It cannot.
As long as the patch sticks with passing through the
getxattr flags, it looks fine to me.
passing security for trusted.overlay seems dodgy.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix
  2021-12-03 16:31     ` Amir Goldstein
@ 2021-12-03 18:34       ` Vivek Goyal
  2022-03-01  1:09         ` Paul Moore
  0 siblings, 1 reply; 29+ messages in thread
From: Vivek Goyal @ 2021-12-03 18:34 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: David Anderson, Mark Salyzyn, Miklos Szeredi, Jonathan Corbet,
	Eric W . Biederman, Randy Dunlap, Stephen Smalley, John Stultz,
	linux-doc, linux-kernel, linux-fsdevel, overlayfs, LSM List,
	kernel-team, selinux, paulmoore, Luca.Boccassi

On Fri, Dec 03, 2021 at 06:31:01PM +0200, Amir Goldstein wrote:
> On Fri, Dec 3, 2021 at 5:38 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Wed, Nov 17, 2021 at 09:36:42AM +0200, Amir Goldstein wrote:
> > > On Wed, Nov 17, 2021 at 3:58 AM David Anderson <dvander@google.com> wrote:
> > > >
> > > > Mark Salyzyn (3):
> > > >   Add flags option to get xattr method paired to __vfs_getxattr
> > > >   overlayfs: handle XATTR_NOSECURITY flag for get xattr method
> > > >   overlayfs: override_creds=off option bypass creator_cred
> > > >
> > > > Mark Salyzyn + John Stultz (1):
> > > >   overlayfs: inode_owner_or_capable called during execv
> > > >
> > > > The first three patches address fundamental security issues that should
> > > > be solved regardless of the override_creds=off feature.
> > > >
> > > > The fourth adds the feature depends on these other fixes.
> > > >
> > > > By default, all access to the upper, lower and work directories is the
> > > > recorded mounter's MAC and DAC credentials.  The incoming accesses are
> > > > checked against the caller's credentials.
> > > >
> > > > If the principles of least privilege are applied for sepolicy, the
> > > > mounter's credentials might not overlap the credentials of the caller's
> > > > when accessing the overlayfs filesystem.  For example, a file that a
> > > > lower DAC privileged caller can execute, is MAC denied to the
> > > > generally higher DAC privileged mounter, to prevent an attack vector.
> > > >
> > > > We add the option to turn off override_creds in the mount options; all
> > > > subsequent operations after mount on the filesystem will be only the
> > > > caller's credentials.  The module boolean parameter and mount option
> > > > override_creds is also added as a presence check for this "feature",
> > > > existence of /sys/module/overlay/parameters/overlay_creds
> > > >
> > > > Signed-off-by: Mark Salyzyn <salyzyn@android.com>
> > > > Signed-off-by: David Anderson <dvander@google.com>
> > > > Cc: Miklos Szeredi <miklos@szeredi.hu>
> > > > Cc: Jonathan Corbet <corbet@lwn.net>
> > > > Cc: Vivek Goyal <vgoyal@redhat.com>
> > > > Cc: Eric W. Biederman <ebiederm@xmission.com>
> > > > Cc: Amir Goldstein <amir73il@gmail.com>
> > > > Cc: Randy Dunlap <rdunlap@infradead.org>
> > > > Cc: Stephen Smalley <sds@tycho.nsa.gov>
> > > > Cc: John Stultz <john.stultz@linaro.org>
> > > > Cc: linux-doc@vger.kernel.org
> > > > Cc: linux-kernel@vger.kernel.org
> > > > Cc: linux-fsdevel@vger.kernel.org
> > > > Cc: linux-unionfs@vger.kernel.org
> > > > Cc: linux-security-module@vger.kernel.org
> > > > Cc: kernel-team@android.com
> > > > Cc: selinux@vger.kernel.org
> > > > Cc: paulmoore@microsoft.com
> > > > Cc: Luca.Boccassi@microsoft.com
> > > >
> > > > ---
> > > >
> > > > v19
> > > > - rebase.
> > > >
> > >
> > > Hi David,
> > >
> > > I see that the patch set has changed hands (presumably to Android upstreaming
> > > team), but you just rebased v18 without addressing the maintainers concerns [1].
> > >
> >
> > BTW, where is patch 1 of the series. I can't seem to find it.
> >
> > I think I was running into issues with getxattr() on underlying filesystem
> > as well (if mounter did not have sufficient privileges) and tried to fix
> > it. But did not find a good solution at that point of time.
> >
> > https://lore.kernel.org/linux-unionfs/1467733854-6314-6-git-send-email-vgoyal@redhat.com/
> >
> > So basically when overlay inode is being initialized, code will try to
> > query "security.selinux" xattr on underlying file to initialize selinux
> > label on the overlay inode. For regular filesystems, they bypass the
> > security check by calling __vfs_getxattr() when trying to initialize
> > this selinux security label. But with layered filesystem, it still
> > ends up calling vfs_getxattr() on underlying filesyste. Which means
> > it checks for caller's creds and if caller is not priviliged enough,
> > access will be denied.
> >
> > To solve this problem, looks like this patch set is passing a flag
> > XATTR_NOSECUROTY so that permission checks are skipped in getxattr()
> > path in underlying filesystem. As long as this information is
> > not leaked to user space (and remains in overlayfs), it probably is
> > fine? And if information is not going to user space, then it probably
> > is fine for unprivileged overlayfs mounts as well?
> >
> > I see a comment from Miklos as well as you that it is not safe to
> > do for unprivileged mounts. Can you help me understand why that's
> > the case.
> >
> >
> > > Specifically, the patch 2/4 is very wrong for unprivileged mount and
> >
> > Can you help me understand why it is wrong. (/me should spend more
> > time reading the patch. But I am taking easy route of asking you. :-)).
> >
> 
> I should have spent more time reading the patch too :-)
> I was not referring to the selinux part. That looks fine I guess.
> 
> I was referring to the part of:
> "Check impure, opaque, origin & meta xattr with no sepolicy audit
> (using __vfs_getxattr) since these operations are internal to
> overlayfs operations and do not disclose any data."
> I don't know how safe that really is to ignore the security checks
> for reading trusted xattr and allow non-privileged mounts to do that.

I am also concerned about this.

> Certainly since non privileged mounts are likely to use userxattr
> anyway, so what's the reason to bypass security?

I am not sure. In the early version of patches I think argument was
that do not switch to mounter's creds and use caller's creds on 
underlying filesystem as well. And each caller will be privileged
enough to be able to perform the operation.

Our take was that how is this model better because in current model
only mounter needs to be privileged while in this new model each
caller will have to be privileged. But Android guys seemed to be ok
with that. So has this assumption changed since early days. If callers
are privileged, then vfs_getxattr() on underlying filesystem for
overaly internal xattrs should succeed and there is no need for this
change.

I suspect patches have evolved since then and callers are not as
privileged as we expect them to and that's why we are bypassing this
check on all overlayfs internal trusted xattrs? This definitely requires
much close scrutiny. My initial reaction is that this sounds very scary.

In general I would think overlayfs should not bypass the check on
underlying fs. Either checks should be done in mounter's context or
caller's context (depending on override_creds=on/off).

Thanks
Vivek

> 
> > > I think that the very noisy patch 1/4 could be completely avoided:
> >
> > How can it completely avoided. If mounter is not privileged then
> > vfs_getxattr() on underlying filesystem will fail. Or if
> > override_creds=off, then caller might not be privileged enough to
> > do getxattr() but we still should be able to initialize overlay
> > inode security label.
> >
> 
> My bad. I didn't read the description of the selinux problem
> with the re-post and forgot about it.
> 
> > > Can't you use -o userxattr mount option
> >
> > user xattrs done't work for device nodes and symlinks.
> >
> > BTW, how will userxattr solve the problem completely. It can be used
> > to store overlay specific xattrs but accessing security xattrs on
> > underlying filesystem will still be a problem?
> 
> It cannot.
> As long as the patch sticks with passing through the
> getxattr flags, it looks fine to me.
> passing security for trusted.overlay seems dodgy.
> 
> Thanks,
> Amir.
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix
  2021-12-03 18:34       ` Vivek Goyal
@ 2022-03-01  1:09         ` Paul Moore
       [not found]           ` <CA+FmFJA-r+JgMqObNCvE_X+L6jxWtDrczM9Jh0L38Fq-6mnbbA@mail.gmail.com>
  0 siblings, 1 reply; 29+ messages in thread
From: Paul Moore @ 2022-03-01  1:09 UTC (permalink / raw)
  To: Vivek Goyal, Amir Goldstein, David Anderson
  Cc: Mark Salyzyn, Miklos Szeredi, Jonathan Corbet,
	Eric W . Biederman, Randy Dunlap, Stephen Smalley, John Stultz,
	linux-doc, linux-kernel, linux-fsdevel, overlayfs, LSM List,
	kernel-team, selinux, paulmoore, Luca.Boccassi

I wanted to try and bring this thread back from the dead (?) as I
believe the use-case is still valid and worth supporting.  Some more
brief comments below ...

On Fri, Dec 3, 2021 at 1:34 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> I am not sure. In the early version of patches I think argument was
> that do not switch to mounter's creds and use caller's creds on
> underlying filesystem as well. And each caller will be privileged
> enough to be able to perform the operation.

The basic idea is that we can now build Linux systems with enough
access control granularity such that a given process can have the
necessary privileges to mount a filesystem, but not necessarily access
all of the data on the filesystem, while other processes, with
different access rights, are allowed to read and write data on the
mounted filesystem.  Granted, this is a bit different from how things
are usually done, but in my opinion it's a valid and interesting use
case in that it allows us to remove unneeded access rights from
historically very privileged system startup services/scripts: the
service that runs to mount my homedir shouldn't be allowed to access
my files just to mount the directory.

Unfortunately, this idea falls apart when we attempt to use overlayfs
due to the clever/usual way it caches the mounting processes
credentials and uses that in place of the current process' credentials
when accessing certain parts of the underlying filesystems.  The
current overlayfs implementation assumes that the mounter will always
be more privileged than the processes accessing the filesystem, it
would be nice if we could build a mechanism that didn't have this
assumption baked into the implementation.

This patchset may not have been The Answer, but surely there is
something we can do to support this use-case.

> Our take was that how is this model better because in current model
> only mounter needs to be privileged while in this new model each
> caller will have to be privileged. But Android guys seemed to be ok
> with that. So has this assumption changed since early days. If callers
> are privileged, then vfs_getxattr() on underlying filesystem for
> overaly internal xattrs should succeed and there is no need for this
> change.
>
> I suspect patches have evolved since then and callers are not as
> privileged as we expect them to and that's why we are bypassing this
> check on all overlayfs internal trusted xattrs? This definitely requires
> much close scrutiny. My initial reaction is that this sounds very scary.
>
> In general I would think overlayfs should not bypass the check on
> underlying fs. Either checks should be done in mounter's context or
> caller's context (depending on override_creds=on/off).
>
> Thanks
> Vivek

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix
       [not found]           ` <CA+FmFJA-r+JgMqObNCvE_X+L6jxWtDrczM9Jh0L38Fq-6mnbbA@mail.gmail.com>
@ 2022-03-09 21:13             ` Paul Moore
  2022-03-10 22:11               ` Paul Moore
  0 siblings, 1 reply; 29+ messages in thread
From: Paul Moore @ 2022-03-09 21:13 UTC (permalink / raw)
  To: David Anderson
  Cc: Vivek Goyal, Amir Goldstein, Mark Salyzyn, Miklos Szeredi,
	Jonathan Corbet, Eric W . Biederman, Randy Dunlap,
	Stephen Smalley, John Stultz, linux-doc, linux-kernel,
	linux-fsdevel, overlayfs, LSM List, kernel-team, selinux,
	paulmoore, luca.boccassi

On Tue, Mar 1, 2022 at 12:05 AM David Anderson <dvander@google.com> wrote:
> On Mon, Feb 28, 2022 at 5:09 PM Paul Moore <paul@paul-moore.com> wrote:
>>
>> I wanted to try and bring this thread back from the dead (?) as I
>> believe the use-case is still valid and worth supporting.  Some more
>> brief comments below ...
>>
>> On Fri, Dec 3, 2021 at 1:34 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>> > I am not sure. In the early version of patches I think argument was
>> > that do not switch to mounter's creds and use caller's creds on
>> > underlying filesystem as well. And each caller will be privileged
>> > enough to be able to perform the operation.
>
> Indeed that was the argument - though, "userxattr" eliminated the need for patches 1 & 2 completely for us, which is great. We're no longer carrying those in our 5.15 tree.
>
>> Unfortunately, this idea falls apart when we attempt to use overlayfs
>> due to the clever/usual way it caches the mounting processes
>> credentials and uses that in place of the current process' credentials
>> when accessing certain parts of the underlying filesystems.  The
>> current overlayfs implementation assumes that the mounter will always
>> be more privileged than the processes accessing the filesystem, it
>> would be nice if we could build a mechanism that didn't have this
>> assumption baked into the implementation.
>>
>> This patchset may not have been The Answer, but surely there is
>> something we can do to support this use-case.
>
> Yup exactly, and we still need patches 3 & 4 to deal with this. My current plan is to try and rework our sepolicy (we have some ideas on how it could be made compatible with how overlayfs works). If that doesn't pan out we'll revisit these patches and think harder about how to deal with the coherency issues.

Can you elaborate a bit more on the coherency issues?  Is this the dir
cache issue that is alluded to in the patchset?  Anything else that
has come up on review?

Before I start looking at the dir cache in any detail, did you have
any thoughts on how to resolve the problems that have arisen?

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix
  2022-03-09 21:13             ` Paul Moore
@ 2022-03-10 22:11               ` Paul Moore
  2022-03-11  4:09                 ` Amir Goldstein
  0 siblings, 1 reply; 29+ messages in thread
From: Paul Moore @ 2022-03-10 22:11 UTC (permalink / raw)
  To: Vivek Goyal, Amir Goldstein, Miklos Szeredi, David Anderson
  Cc: Mark Salyzyn, Jonathan Corbet, Eric W . Biederman, Randy Dunlap,
	Stephen Smalley, John Stultz, linux-doc, linux-kernel,
	linux-fsdevel, overlayfs, LSM List, kernel-team, selinux,
	paulmoore, luca.boccassi

On Wed, Mar 9, 2022 at 4:13 PM Paul Moore <paul@paul-moore.com> wrote:
> On Tue, Mar 1, 2022 at 12:05 AM David Anderson <dvander@google.com> wrote:
> > On Mon, Feb 28, 2022 at 5:09 PM Paul Moore <paul@paul-moore.com> wrote:

...

> >> This patchset may not have been The Answer, but surely there is
> >> something we can do to support this use-case.
> >
> > Yup exactly, and we still need patches 3 & 4 to deal with this. My current plan is to try and rework our sepolicy (we have some ideas on how it could be made compatible with how overlayfs works). If that doesn't pan out we'll revisit these patches and think harder about how to deal with the coherency issues.
>
> Can you elaborate a bit more on the coherency issues?  Is this the dir
> cache issue that is alluded to in the patchset?  Anything else that
> has come up on review?
>
> Before I start looking at the dir cache in any detail, did you have
> any thoughts on how to resolve the problems that have arisen?

David, Vivek, Amir, Miklos, or anyone for that matter, can you please
go into more detail on the cache issues?  I *think* I may have found a
potential solution for an issue that could arise when the credential
override is not in place, but I'm not certain it's the only issue :)

There is motivation on our part to try and get the
"override_creds=off" portion of the patchset working and suitable for
upstreaming, but I need some help in making sure I understand all the
objections/problems.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix
  2022-03-10 22:11               ` Paul Moore
@ 2022-03-11  4:09                 ` Amir Goldstein
  2022-03-11 14:01                   ` Vivek Goyal
  0 siblings, 1 reply; 29+ messages in thread
From: Amir Goldstein @ 2022-03-11  4:09 UTC (permalink / raw)
  To: Paul Moore
  Cc: Vivek Goyal, Miklos Szeredi, David Anderson, Mark Salyzyn,
	Jonathan Corbet, Eric W . Biederman, Randy Dunlap,
	Stephen Smalley, John Stultz, linux-doc, linux-kernel,
	linux-fsdevel, overlayfs, LSM List, kernel-team, selinux,
	paulmoore, luca.boccassi

On Fri, Mar 11, 2022 at 12:11 AM Paul Moore <paul@paul-moore.com> wrote:
>
> On Wed, Mar 9, 2022 at 4:13 PM Paul Moore <paul@paul-moore.com> wrote:
> > On Tue, Mar 1, 2022 at 12:05 AM David Anderson <dvander@google.com> wrote:
> > > On Mon, Feb 28, 2022 at 5:09 PM Paul Moore <paul@paul-moore.com> wrote:
>
> ...
>
> > >> This patchset may not have been The Answer, but surely there is
> > >> something we can do to support this use-case.
> > >
> > > Yup exactly, and we still need patches 3 & 4 to deal with this. My current plan is to try and rework our sepolicy (we have some ideas on how it could be made compatible with how overlayfs works). If that doesn't pan out we'll revisit these patches and think harder about how to deal with the coherency issues.
> >
> > Can you elaborate a bit more on the coherency issues?  Is this the dir
> > cache issue that is alluded to in the patchset?  Anything else that
> > has come up on review?
> >
> > Before I start looking at the dir cache in any detail, did you have
> > any thoughts on how to resolve the problems that have arisen?
>
> David, Vivek, Amir, Miklos, or anyone for that matter, can you please
> go into more detail on the cache issues?  I *think* I may have found a
> potential solution for an issue that could arise when the credential
> override is not in place, but I'm not certain it's the only issue :)
>

Hi Paul,

In this thread I claimed that the authors of the patches did not present
a security model for overlayfs, such as the one currently in overlayfs.rst.
If we had a model we could have debated its correctness and review its
implementation.

As a proof that there is no solid model, I gave an *example* regarding
the overlay readdir cache.

When listing a merged dir, meaning, a directory containing entries from
several overlay layers, ovl_permission() is called to check user's permission,
but ovl_permission() does not currently check permissions to read all layers,
because that is not the current overlayfs model.

Overlayfs has a readdir cache, so without override_cred, a user with high
credentials can populate the readdir cache and then a user will fewer
credentials, not enough to access the lower layers, but enough to access
the upper most layer, will pass ovl_permission() check and be allowed to
read from readdir cache.

This specific problem can be solved in several ways - disable readdir
cache with override_cred=off, check all layers in ovl_permission().
That's not my point. My point is that I provided a proof that the current
model of override_cred=off is flawed and it is up to the authors of the
patch to fix the model and provide the analysis of overlayfs code to
prove the model's correctness.

The core of the matter is there is no easy way to "merge" the permissions
from all layers into a single permission blob that could be checked once.

Maybe the example I gave is the only flaw in the model, maybe not
I am not sure. I will be happy to help you in review of a model and the
solution that you may have found.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix
  2022-03-11  4:09                 ` Amir Goldstein
@ 2022-03-11 14:01                   ` Vivek Goyal
  2022-03-11 20:52                     ` Paul Moore
  0 siblings, 1 reply; 29+ messages in thread
From: Vivek Goyal @ 2022-03-11 14:01 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Paul Moore, Miklos Szeredi, David Anderson, Mark Salyzyn,
	Jonathan Corbet, Eric W . Biederman, Randy Dunlap,
	Stephen Smalley, John Stultz, linux-doc, linux-kernel,
	linux-fsdevel, overlayfs, LSM List, kernel-team, selinux,
	paulmoore, luca.boccassi

On Fri, Mar 11, 2022 at 06:09:56AM +0200, Amir Goldstein wrote:
> On Fri, Mar 11, 2022 at 12:11 AM Paul Moore <paul@paul-moore.com> wrote:
> >
> > On Wed, Mar 9, 2022 at 4:13 PM Paul Moore <paul@paul-moore.com> wrote:
> > > On Tue, Mar 1, 2022 at 12:05 AM David Anderson <dvander@google.com> wrote:
> > > > On Mon, Feb 28, 2022 at 5:09 PM Paul Moore <paul@paul-moore.com> wrote:
> >
> > ...
> >
> > > >> This patchset may not have been The Answer, but surely there is
> > > >> something we can do to support this use-case.
> > > >
> > > > Yup exactly, and we still need patches 3 & 4 to deal with this. My current plan is to try and rework our sepolicy (we have some ideas on how it could be made compatible with how overlayfs works). If that doesn't pan out we'll revisit these patches and think harder about how to deal with the coherency issues.
> > >
> > > Can you elaborate a bit more on the coherency issues?  Is this the dir
> > > cache issue that is alluded to in the patchset?  Anything else that
> > > has come up on review?
> > >
> > > Before I start looking at the dir cache in any detail, did you have
> > > any thoughts on how to resolve the problems that have arisen?
> >
> > David, Vivek, Amir, Miklos, or anyone for that matter, can you please
> > go into more detail on the cache issues?  I *think* I may have found a
> > potential solution for an issue that could arise when the credential
> > override is not in place, but I'm not certain it's the only issue :)
> >
> 
> Hi Paul,
> 
> In this thread I claimed that the authors of the patches did not present
> a security model for overlayfs, such as the one currently in overlayfs.rst.
> If we had a model we could have debated its correctness and review its
> implementation.

Agreed. After going through the patch set, I was wondering what's the
overall security model and how to visualize that.

So probably there needs to be a documentation patch which explains
what's the new security model and how does it work.

Also think both in terms of DAC and MAC. (Instead of just focussing too
hard on SELinux).

My understanding is that in current model, some of the overlayfs
operations require priviliges. So mounter is supposed to be priviliged
and does the operation on underlying layers.

Now in this new model, there will be two levels of check. Both overlay
level and underlying layer checks will happen in the context of task
which is doing the operation. So first of all, all tasks will need
to have enough priviliges to be able to perform various operations
on lower layer. 

If we do checks at both the levels in with the creds of calling task,
I guess that probably is fine. (But will require a closer code inspection
to make sure there is no privilege escalation both for mounter as well
calling task).

> 
> As a proof that there is no solid model, I gave an *example* regarding
> the overlay readdir cache.
> 
> When listing a merged dir, meaning, a directory containing entries from
> several overlay layers, ovl_permission() is called to check user's permission,
> but ovl_permission() does not currently check permissions to read all layers,
> because that is not the current overlayfs model.
> 
> Overlayfs has a readdir cache, so without override_cred, a user with high
> credentials can populate the readdir cache and then a user will fewer
> credentials, not enough to access the lower layers, but enough to access
> the upper most layer, will pass ovl_permission() check and be allowed to
> read from readdir cache.

I am not very familiar with dir caching code. When I read through the
overlayfs.rst, it gave the impression that caching is per "struct file".

"This merged name list is cached in the
'struct file' and so remains as long as the file is kept open."

And I was wondering if that's the case, then one user should not be able
to access the cache built by another priviliged user (until and unless
privileged user passed fd).

But looks like we build this cache and store in ovl inode and that's
why this issue of cache built by higher privileged process will be
accessible to lower privileged process.

With current model this is not an issue because "mounter" is providing
those privileges to unprivileged process. So while unprivileged process
can't do "readdir" on an underlying lower dir, it might still be able
to do that through an overlay mount. But if we don't switch to mounter's
creds, then we probably can't rely on this dir caching. Agreed that
disabling dir caching seems simplest solution if we were to do
override_creds=off.

Thanks
Vivek

> 
> This specific problem can be solved in several ways - disable readdir
> cache with override_cred=off, check all layers in ovl_permission().
> That's not my point. My point is that I provided a proof that the current
> model of override_cred=off is flawed and it is up to the authors of the
> patch to fix the model and provide the analysis of overlayfs code to
> prove the model's correctness.
> 
> The core of the matter is there is no easy way to "merge" the permissions
> from all layers into a single permission blob that could be checked once.
> 
> Maybe the example I gave is the only flaw in the model, maybe not
> I am not sure. I will be happy to help you in review of a model and the
> solution that you may have found.
> 
> Thanks,
> Amir.
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix
  2022-03-11 14:01                   ` Vivek Goyal
@ 2022-03-11 20:52                     ` Paul Moore
  2023-03-22  7:28                       ` Johannes Segitz
  0 siblings, 1 reply; 29+ messages in thread
From: Paul Moore @ 2022-03-11 20:52 UTC (permalink / raw)
  To: Amir Goldstein, Vivek Goyal
  Cc: Miklos Szeredi, David Anderson, Mark Salyzyn, Jonathan Corbet,
	Eric W . Biederman, Randy Dunlap, Stephen Smalley, John Stultz,
	linux-doc, linux-kernel, linux-fsdevel, overlayfs, LSM List,
	kernel-team, selinux, paulmoore, luca.boccassi

On Fri, Mar 11, 2022 at 9:01 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> On Fri, Mar 11, 2022 at 06:09:56AM +0200, Amir Goldstein wrote:
> > Hi Paul,

Hi Amir, Vivek,

Thanks for the replies, I think I now have a better understanding of
the concerns which is starting to make the path forward a bit more
clear.  A few more comments below ...

> > In this thread I claimed that the authors of the patches did not present
> > a security model for overlayfs, such as the one currently in overlayfs.rst.
> > If we had a model we could have debated its correctness and review its
> > implementation.
>
> Agreed. After going through the patch set, I was wondering what's the
> overall security model and how to visualize that.
>
> So probably there needs to be a documentation patch which explains
> what's the new security model and how does it work.

Yes, of course.  I'll be sure to add a section to the existing docs.

> Also think both in terms of DAC and MAC. (Instead of just focussing too
> hard on SELinux).

Definitely.  Most of what I've been thinking about the past day or so
has been how to properly handle some of the DAC/capability issues; I
have yet to start playing with the code, but for the most part I think
the MAC/SELinux bits are already working properly.

> My understanding is that in current model, some of the overlayfs
> operations require priviliges. So mounter is supposed to be priviliged
> and does the operation on underlying layers.
>
> Now in this new model, there will be two levels of check. Both overlay
> level and underlying layer checks will happen in the context of task
> which is doing the operation. So first of all, all tasks will need
> to have enough priviliges to be able to perform various operations
> on lower layer.
>
> If we do checks at both the levels in with the creds of calling task,
> I guess that probably is fine. (But will require a closer code inspection
> to make sure there is no privilege escalation both for mounter as well
> calling task).

I have thoughts on this, but I don't think I'm yet in a position to
debate this in depth just yet; I still need to finish poking around
the code and playing with a few things :)

It may take some time before I'm back with patches, but I appreciate
all of the tips and insight - thank you!

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v19 1/4] Add flags option to get xattr method paired to __vfs_getxattr
  2021-11-17  1:58 ` [PATCH v19 1/4] Add flags option to get xattr method paired to __vfs_getxattr David Anderson
  2021-11-17 16:13   ` kernel test robot
@ 2022-03-25 11:02   ` Luca Weiss
  2022-03-25 14:44     ` Paul Moore
  1 sibling, 1 reply; 29+ messages in thread
From: Luca Weiss @ 2022-03-25 11:02 UTC (permalink / raw)
  To: dvander
  Cc: Luca.Boccassi, darrick.wong, dsterba, hubcap, jack, jlayton,
	kernel-team, linux-fsdevel, linux-kernel, linux-security-module,
	linux-unionfs, paulmoore, salyzyn, sds, selinux

Hi David,

this patch doesn't compile with CONFIG_SECURITY=n:

./include/linux/security.h: In function 'security_inode_need_killpriv':
./include/linux/security.h:893:40: error: passing argument 1 of 'cap_inode_need_killpriv' from incompatible pointer type [-Werror=incompatible-pointer-types]
  893 |         return cap_inode_need_killpriv(dentry);
      |                                        ^~~~~~
      |                                        |
      |                                        struct dentry *
./include/linux/security.h:153:52: note: expected 'struct user_namespace *' but argument is of type 'struct dentry *'
  153 | int cap_inode_need_killpriv(struct user_namespace *mnt_userns,
      |                             ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~

I applied the patch on linux-next tag next-20220318, but the relevant part
doesn't seem to have changed lately.

Regards
Luca

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v19 1/4] Add flags option to get xattr method paired to __vfs_getxattr
  2022-03-25 11:02   ` Luca Weiss
@ 2022-03-25 14:44     ` Paul Moore
  0 siblings, 0 replies; 29+ messages in thread
From: Paul Moore @ 2022-03-25 14:44 UTC (permalink / raw)
  To: Luca Weiss
  Cc: dvander, Luca.Boccassi, darrick.wong, dsterba, hubcap, jack,
	jlayton, kernel-team, linux-fsdevel, linux-kernel,
	linux-security-module, linux-unionfs, paulmoore, salyzyn, sds,
	selinux

On Fri, Mar 25, 2022 at 7:02 AM Luca Weiss <luca.weiss@fairphone.com> wrote:
>
> Hi David,
>
> this patch doesn't compile with CONFIG_SECURITY=n:
>
> ./include/linux/security.h: In function 'security_inode_need_killpriv':
> ./include/linux/security.h:893:40: error: passing argument 1 of 'cap_inode_need_killpriv' from incompatible pointer type [-Werror=incompatible-pointer-types]
>   893 |         return cap_inode_need_killpriv(dentry);
>       |                                        ^~~~~~
>       |                                        |
>       |                                        struct dentry *
> ./include/linux/security.h:153:52: note: expected 'struct user_namespace *' but argument is of type 'struct dentry *'
>   153 | int cap_inode_need_killpriv(struct user_namespace *mnt_userns,
>       |                             ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~
>
> I applied the patch on linux-next tag next-20220318, but the relevant part
> doesn't seem to have changed lately.

I believe David (and Google) have abandoned this patchset in favor of
another approach.  I'm possibly going to recycle some of the ideas in
this patchset for some future work, but the details are still TBD.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix
  2022-03-11 20:52                     ` Paul Moore
@ 2023-03-22  7:28                       ` Johannes Segitz
  2023-03-22 12:48                         ` Amir Goldstein
  2023-03-22 14:05                         ` Paul Moore
  0 siblings, 2 replies; 29+ messages in thread
From: Johannes Segitz @ 2023-03-22  7:28 UTC (permalink / raw)
  To: Paul Moore
  Cc: Amir Goldstein, Vivek Goyal, Miklos Szeredi, David Anderson,
	Mark Salyzyn, Jonathan Corbet, Eric W . Biederman, Randy Dunlap,
	Stephen Smalley, John Stultz, linux-doc, linux-kernel,
	linux-fsdevel, overlayfs, LSM List, kernel-team, selinux,
	paulmoore, luca.boccassi

[-- Attachment #1: Type: text/plain, Size: 3058 bytes --]

On Fri, Mar 11, 2022 at 03:52:54PM -0500, Paul Moore wrote:
> On Fri, Mar 11, 2022 at 9:01 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > Agreed. After going through the patch set, I was wondering what's the
> > overall security model and how to visualize that.
> >
> > So probably there needs to be a documentation patch which explains
> > what's the new security model and how does it work.
> 
> Yes, of course.  I'll be sure to add a section to the existing docs.
> 
> > Also think both in terms of DAC and MAC. (Instead of just focussing too
> > hard on SELinux).
> 
> Definitely.  Most of what I've been thinking about the past day or so
> has been how to properly handle some of the DAC/capability issues; I
> have yet to start playing with the code, but for the most part I think
> the MAC/SELinux bits are already working properly.
> 
> > My understanding is that in current model, some of the overlayfs
> > operations require priviliges. So mounter is supposed to be priviliged
> > and does the operation on underlying layers.
> >
> > Now in this new model, there will be two levels of check. Both overlay
> > level and underlying layer checks will happen in the context of task
> > which is doing the operation. So first of all, all tasks will need
> > to have enough priviliges to be able to perform various operations
> > on lower layer.
> >
> > If we do checks at both the levels in with the creds of calling task,
> > I guess that probably is fine. (But will require a closer code inspection
> > to make sure there is no privilege escalation both for mounter as well
> > calling task).
> 
> I have thoughts on this, but I don't think I'm yet in a position to
> debate this in depth just yet; I still need to finish poking around
> the code and playing with a few things :)
> 
> It may take some time before I'm back with patches, but I appreciate
> all of the tips and insight - thank you!

Let me resurrect this discussion. With 
https://github.com/fedora-selinux/selinux-policy/commit/1e8688ea694393c9d918939322b72dfb44a01792
the Fedora policy changed kernel_t to a confined domain. This means that
many overlayfs setups that are created in initrd will now run into issues,
as it will have kernel_t as part of the saved credentials. So while the
original use case that inspired the patch set was probably not very common
that now changed.

It's tricky to work around this. Loading a policy in initrd causes a lot of
issues now that kernel_t isn't unconfined anymore. Once the policy is
loaded by systemd changing the mounts is tough since we use it for /etc and
at this time systemd already has open file handles for policy files in
/etc.

Johannes
-- 
GPG Key                EE16 6BCE AD56 E034 BFB3  3ADD 7BF7 29D5 E7C8 1FA0
Subkey fingerprint:    250F 43F5 F7CE 6F1E 9C59  4F95 BC27 DD9D 2CC4 FD66
SUSE Software Solutions Germany GmbH, Frankenstraße 146, 90461 Nürnberg, Germany
Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
(HRB 36809, AG Nürnberg)

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix
  2023-03-22  7:28                       ` Johannes Segitz
@ 2023-03-22 12:48                         ` Amir Goldstein
  2023-03-22 14:07                           ` Paul Moore
  2023-03-22 14:05                         ` Paul Moore
  1 sibling, 1 reply; 29+ messages in thread
From: Amir Goldstein @ 2023-03-22 12:48 UTC (permalink / raw)
  To: Johannes Segitz
  Cc: Paul Moore, Vivek Goyal, Miklos Szeredi, David Anderson,
	Mark Salyzyn, Jonathan Corbet, Eric W . Biederman, Randy Dunlap,
	Stephen Smalley, John Stultz, linux-doc, linux-kernel,
	linux-fsdevel, overlayfs, LSM List, kernel-team, selinux,
	paulmoore, luca.boccassi

On Wed, Mar 22, 2023 at 9:28 AM Johannes Segitz <jsegitz@suse.com> wrote:
>
> On Fri, Mar 11, 2022 at 03:52:54PM -0500, Paul Moore wrote:
> > On Fri, Mar 11, 2022 at 9:01 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > Agreed. After going through the patch set, I was wondering what's the
> > > overall security model and how to visualize that.
> > >
> > > So probably there needs to be a documentation patch which explains
> > > what's the new security model and how does it work.
> >
> > Yes, of course.  I'll be sure to add a section to the existing docs.
> >
> > > Also think both in terms of DAC and MAC. (Instead of just focussing too
> > > hard on SELinux).
> >
> > Definitely.  Most of what I've been thinking about the past day or so
> > has been how to properly handle some of the DAC/capability issues; I
> > have yet to start playing with the code, but for the most part I think
> > the MAC/SELinux bits are already working properly.
> >
> > > My understanding is that in current model, some of the overlayfs
> > > operations require priviliges. So mounter is supposed to be priviliged
> > > and does the operation on underlying layers.
> > >
> > > Now in this new model, there will be two levels of check. Both overlay
> > > level and underlying layer checks will happen in the context of task
> > > which is doing the operation. So first of all, all tasks will need
> > > to have enough priviliges to be able to perform various operations
> > > on lower layer.
> > >
> > > If we do checks at both the levels in with the creds of calling task,
> > > I guess that probably is fine. (But will require a closer code inspection
> > > to make sure there is no privilege escalation both for mounter as well
> > > calling task).
> >
> > I have thoughts on this, but I don't think I'm yet in a position to
> > debate this in depth just yet; I still need to finish poking around
> > the code and playing with a few things :)
> >
> > It may take some time before I'm back with patches, but I appreciate
> > all of the tips and insight - thank you!
>
> Let me resurrect this discussion. With
> https://github.com/fedora-selinux/selinux-policy/commit/1e8688ea694393c9d918939322b72dfb44a01792
> the Fedora policy changed kernel_t to a confined domain. This means that
> many overlayfs setups that are created in initrd will now run into issues,
> as it will have kernel_t as part of the saved credentials. So while the
> original use case that inspired the patch set was probably not very common
> that now changed.

I don't remember anyone rejecting the patches on the account that
the Android use case is not important. It was never the issue.

>
> It's tricky to work around this. Loading a policy in initrd causes a lot of
> issues now that kernel_t isn't unconfined anymore. Once the policy is
> loaded by systemd changing the mounts is tough since we use it for /etc and
> at this time systemd already has open file handles for policy files in
> /etc.
>

I've already explained several times on this thread what needs to be
done in order to move forward - express the security model and
explain why it is safe.

If the security guys are going to be in LSS in Vancouver, perhaps
we can have a meetup with overlayfs developers on the overlap
day with LSFMM (May 10) to try and figure out a path forward.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix
  2023-03-22  7:28                       ` Johannes Segitz
  2023-03-22 12:48                         ` Amir Goldstein
@ 2023-03-22 14:05                         ` Paul Moore
  1 sibling, 0 replies; 29+ messages in thread
From: Paul Moore @ 2023-03-22 14:05 UTC (permalink / raw)
  To: Johannes Segitz
  Cc: Amir Goldstein, Vivek Goyal, Miklos Szeredi, David Anderson,
	Mark Salyzyn, Jonathan Corbet, Eric W . Biederman, Randy Dunlap,
	Stephen Smalley, John Stultz, linux-doc, linux-kernel,
	linux-fsdevel, overlayfs, LSM List, kernel-team, selinux,
	paulmoore, luca.boccassi, brauner

On Wed, Mar 22, 2023 at 3:28 AM Johannes Segitz <jsegitz@suse.com> wrote:
> On Fri, Mar 11, 2022 at 03:52:54PM -0500, Paul Moore wrote:
> > On Fri, Mar 11, 2022 at 9:01 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > Agreed. After going through the patch set, I was wondering what's the
> > > overall security model and how to visualize that.
> > >
> > > So probably there needs to be a documentation patch which explains
> > > what's the new security model and how does it work.
> >
> > Yes, of course.  I'll be sure to add a section to the existing docs.
> >
> > > Also think both in terms of DAC and MAC. (Instead of just focussing too
> > > hard on SELinux).
> >
> > Definitely.  Most of what I've been thinking about the past day or so
> > has been how to properly handle some of the DAC/capability issues; I
> > have yet to start playing with the code, but for the most part I think
> > the MAC/SELinux bits are already working properly.
> >
> > > My understanding is that in current model, some of the overlayfs
> > > operations require priviliges. So mounter is supposed to be priviliged
> > > and does the operation on underlying layers.
> > >
> > > Now in this new model, there will be two levels of check. Both overlay
> > > level and underlying layer checks will happen in the context of task
> > > which is doing the operation. So first of all, all tasks will need
> > > to have enough priviliges to be able to perform various operations
> > > on lower layer.
> > >
> > > If we do checks at both the levels in with the creds of calling task,
> > > I guess that probably is fine. (But will require a closer code inspection
> > > to make sure there is no privilege escalation both for mounter as well
> > > calling task).
> >
> > I have thoughts on this, but I don't think I'm yet in a position to
> > debate this in depth just yet; I still need to finish poking around
> > the code and playing with a few things :)
> >
> > It may take some time before I'm back with patches, but I appreciate
> > all of the tips and insight - thank you!
>
> Let me resurrect this discussion. With
> https://github.com/fedora-selinux/selinux-policy/commit/1e8688ea694393c9d918939322b72dfb44a01792
> the Fedora policy changed kernel_t to a confined domain. This means that
> many overlayfs setups that are created in initrd will now run into issues,
> as it will have kernel_t as part of the saved credentials.

Regardless of any overlayfs cred work, it seems like it would also be
worth spending some time to see if the kernel_t mounter creds
situation can also be improved.  I'm guessing this is due to mounts
happening before the SELinux policy is loaded?  Has anyone looked into
mounting the SELinux policy even earlier in these cases (may not be
possible) and/or umount/mount/remounting the affected overlayfs-based
filesystems after the policy has been loaded?

I can't say I'm the best person to comment on how the Fedora SELinux
policy is structured, but I do know a *little* about SELinux and I
think that accepting kernel_t as an overlayfs mounter cred is a
mistake.

> So while the
> original use case that inspired the patch set was probably not very common
> that now changed.
>
> It's tricky to work around this. Loading a policy in initrd causes a lot of
> issues now that kernel_t isn't unconfined anymore. Once the policy is
> loaded by systemd changing the mounts is tough since we use it for /etc and
> at this time systemd already has open file handles for policy files in
> /etc.

It's been a while since I worked on this, but I pretty much had to
give up on the read-write case, the overlayfs copy-up/work-dir
approach made this impractical, or at least I couldn't think of a sane
way to handle this without some sort of credential override.  However,
I did have a quick-and-dirty prototype that appeared to work well in
the read-only/no-work-dir case; I think I still have it in a
development branch somewhere, I can dig it back up and get it ported
to a modern kernel if there is any interest.

However, when discussing the prototype with Christian Brauner off-list
(added to the CC line) he still objected to the no-cred-override
approach and said it wasn't something he could support, so I dropped
it and focused on the other piles of fire lying about my desk (my
apologies to Christian if I'm mis-remembering/understanding the
conversation).  I still think there is value in supporting a
no-creds-override option, and if there is basic support for getting
this upstream I'm happy to pick the work back up, but I can't invest a
lot more time in this if there isn't an agreement from the
overlayfs/VFS maintainers that this is something that would consider.

--
paul-moore.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix
  2023-03-22 12:48                         ` Amir Goldstein
@ 2023-03-22 14:07                           ` Paul Moore
  0 siblings, 0 replies; 29+ messages in thread
From: Paul Moore @ 2023-03-22 14:07 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Johannes Segitz, Vivek Goyal, Miklos Szeredi, David Anderson,
	Mark Salyzyn, Jonathan Corbet, Eric W . Biederman, Randy Dunlap,
	Stephen Smalley, John Stultz, linux-doc, linux-kernel,
	linux-fsdevel, overlayfs, LSM List, kernel-team, selinux,
	paulmoore, luca.boccassi

On Wed, Mar 22, 2023 at 8:48 AM Amir Goldstein <amir73il@gmail.com> wrote:
> If the security guys are going to be in LSS in Vancouver, perhaps
> we can have a meetup with overlayfs developers on the overlap
> day with LSFMM (May 10) to try and figure out a path forward.

At the very least I currently plan to be at LSS-NA in Vancouver and
would be very happy to discuss this with anyone who wants to talk
about it.  I'm also happy to continue to conversation here too, for
the sake of those who might not be able to travel to LSS-NA and/or
LSFMM.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2023-03-22 14:08 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-17  1:58 [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix David Anderson
2021-11-17  1:58 ` [PATCH v19 1/4] Add flags option to get xattr method paired to __vfs_getxattr David Anderson
2021-11-17 16:13   ` kernel test robot
2022-03-25 11:02   ` Luca Weiss
2022-03-25 14:44     ` Paul Moore
2021-11-17  1:58 ` [PATCH v19 2/4] overlayfs: handle XATTR_NOSECURITY flag for get xattr method David Anderson
2021-11-17  1:58 ` [PATCH v19 3/4] overlayfs: override_creds=off option bypass creator_cred David Anderson
2021-11-17  1:58 ` [PATCH v19 4/4] overlayfs: inode_owner_or_capable called during execv David Anderson
2021-11-17  2:18 ` [PATCH v19 0/4] overlayfs override_creds=off & nested get xattr fix Casey Schaufler
2021-11-18  7:59   ` David Anderson
2021-11-17  7:36 ` Amir Goldstein
2021-11-18  9:53   ` David Anderson
2021-11-18 10:20     ` Amir Goldstein
2021-11-18 20:15       ` David Anderson
2021-11-18 20:32         ` Amir Goldstein
2021-12-03 15:37   ` Vivek Goyal
2021-12-03 16:04     ` Paul Moore
2021-12-03 16:31     ` Amir Goldstein
2021-12-03 18:34       ` Vivek Goyal
2022-03-01  1:09         ` Paul Moore
     [not found]           ` <CA+FmFJA-r+JgMqObNCvE_X+L6jxWtDrczM9Jh0L38Fq-6mnbbA@mail.gmail.com>
2022-03-09 21:13             ` Paul Moore
2022-03-10 22:11               ` Paul Moore
2022-03-11  4:09                 ` Amir Goldstein
2022-03-11 14:01                   ` Vivek Goyal
2022-03-11 20:52                     ` Paul Moore
2023-03-22  7:28                       ` Johannes Segitz
2023-03-22 12:48                         ` Amir Goldstein
2023-03-22 14:07                           ` Paul Moore
2023-03-22 14:05                         ` Paul Moore

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).