linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Miklos Szeredi <miklos@szeredi.hu>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Miklos Szeredi <mszeredi@redhat.com>,
	linux-fsdevel@vger.kernel.org,
	overlayfs <linux-unionfs@vger.kernel.org>,
	LSM <linux-security-module@vger.kernel.org>,
	linux-kernel@vger.kernel.org,
	"Serge E. Hallyn" <serge@hallyn.com>
Subject: Re: [PATCH v2 01/10] vfs: move cap_convert_nscap() call into vfs_setxattr()
Date: Tue, 12 Jan 2021 10:43:05 +0100	[thread overview]
Message-ID: <CAJfpegtKMwTZwENX7hrVGUVRWgNTf4Tr_bRxYrPpPAH_D2fH-Q@mail.gmail.com> (raw)
In-Reply-To: <874kjnm2p2.fsf@x220.int.ebiederm.org>

On Tue, Jan 12, 2021 at 1:15 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Miklos Szeredi <miklos@szeredi.hu> writes:
>
> > On Fri, Jan 01, 2021 at 11:35:16AM -0600, Eric W. Biederman wrote:

> > For one: a v2 fscap is supposed to be equivalent to a v3 fscap with a rootid of
> > zero, right?
>
> Yes.  This assumes that everything is translated into the uids of the
> target filesystem.
>
> > If so, why does cap_inode_getsecurity() treat them differently (v2 fscap
> > succeeding unconditionally while v3 one being either converted to v2, rejected
> > or left as v3 depending on current_user_ns())?
>
> As I understand it v2 fscaps have always succeeded unconditionally.  The
> only case I can see for a v2 fscap might not succeed when read is if the
> filesystem is outside of the initial user namespace.

Looking again, it's rather confusing.  cap_inode_getsecurity()
currently handles the following cases:

v1: -> fails with -EINVAL

v2: -> returns unconverted xattr

v3:
 a) rootid is mapped in the current namespace to non-zero:
     -> convert rootid

 b) rootid owns the current or ancerstor namespace:
     -> convert to v2

 c) rootid is not mapped and is not owner:
     -> return -EOPNOTSUPP -> falls back to unconverted v3

So lets take the example, where a tmpfs is created in a private user
namespace and one file has a v2 cap and the other an equivalent v3 cap
with a zero rootid.  This is the result when looking at it from

1) the namespace of the fs:
---------------------------------------
t = cap_dac_override+eip
tt = cap_dac_override+eip

2) the initial namespace:
---------------------------------------
t = cap_dac_override+eip
tt = cap_dac_override+eip [rootid=1000]

3) an unrelated namespace:
---------------------------------------
t = cap_dac_override+eip
tt = cap_dac_override+eip

Note: in this last case getxattr will actually return a v3 cap with
zero rootid for "tt" which getcap does not display due to being zero.
I could do a setup with a nested namespaces that better demonstrate
the confusing nature of this, but I think this also proves the point.

At this point userspace simply cannot determine whether the returned
cap is in any way valid or not.

The following semantics would make a ton more sense, since getting a
v2 would indicate that rootid is unknown:

- if cap is v2 convert to v3 with zero rootid
- after this, check if rootid needs to be translated, if not return v3
- if yes, try to translate to current ns, if succeeds return translated v3
- if not mappable, return v2

Hmm?

> > Anyway, here's a patch that I think fixes getxattr() layering for
> > security.capability.  Does basically what you suggested.  Slight change of
> > semantics vs. v1 caps, not sure if that is still needed, getxattr()/setxattr()
> > hasn't worked for these since the introduction of v3 in 4.14.
> > Untested.
>
> Taking a look.  The goal of change how these operate is to make it so
> that layered filesystems can just pass through the data if they don't
> want to change anything (even with the user namespaces of the
> filesystems in question are different).
>
> Feedback on the code below:
> - cap_get should be in inode_operations like get_acl and set_acl.

So it's not clear to me why xattr ops are per-sb and acl ops are per-inode.

> - cap_get should return a cpu_vfs_cap_data.
>
>   Which means that only make_kuid is needed when reading the cap from
>   disk.

It also means translating the cap bits back and forth between disk and
cpu endian.  Not a big deal, but...

>   Which means that except for the rootid_owns_currentns check (which
>   needs to happen elsewhere) default_cap_get should be today's
>   get_vfs_cap_from_disk.

That's true.   So what's the deal with v1 caps?  Support was silently
dropped for getxattr/setxattr but remained in get_vfs_caps_from_disk()
(I guess to not break legacy disk images), but maybe it's time to
deprecate v1 caps completely?

> - With the introduction of cap_get I believe commoncap should stop
>   implementing the security_inode_getsecurity hook, and rather have
>   getxattr observe is the file capability xatter and call the new
>   vfs_cap_get then translate to a v2 or v3 cap as appropriate when
>   returning the cap to userspace.

Confused.  vfs_cap_get() is the one the layered filesystem will
recurse with, so it must not translate the cap.   The one to do that
would be __vfs_getxattr(), right?

Thanks,
Miklos

  reply	other threads:[~2021-01-12  9:44 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-07 16:32 [PATCH v2 00/10] allow unprivileged overlay mounts Miklos Szeredi
2020-12-07 16:32 ` [PATCH v2 01/10] vfs: move cap_convert_nscap() call into vfs_setxattr() Miklos Szeredi
2020-12-09  1:53   ` James Morris
2021-01-01 17:35   ` Eric W. Biederman
2021-01-11 13:49     ` Miklos Szeredi
2021-01-12  0:14       ` Eric W. Biederman
2021-01-12  9:43         ` Miklos Szeredi [this message]
2021-01-12 10:04           ` Miklos Szeredi
2021-01-12 18:36           ` Eric W. Biederman
2021-01-12 18:49             ` Eric W. Biederman
2020-12-07 16:32 ` [PATCH v2 02/10] vfs: verify source area in vfs_dedupe_file_range_one() Miklos Szeredi
2020-12-07 16:32 ` [PATCH v2 03/10] ovl: check privs before decoding file handle Miklos Szeredi
2020-12-08 13:49   ` Amir Goldstein
2020-12-09 10:13     ` Miklos Szeredi
2020-12-09 16:20       ` Miklos Szeredi
2020-12-09 18:16         ` Amir Goldstein
2020-12-07 16:32 ` [PATCH v2 04/10] ovl: make ioctl() safe Miklos Szeredi
2020-12-08 11:11   ` Amir Goldstein
2020-12-10 15:18     ` Miklos Szeredi
2020-12-14  5:44       ` Amir Goldstein
2020-12-14 13:23         ` Miklos Szeredi
2020-12-14 14:47           ` Amir Goldstein
2020-12-09  1:57   ` James Morris
2020-12-10 15:19     ` Miklos Szeredi
2020-12-07 16:32 ` [PATCH v2 05/10] ovl: simplify file splice Miklos Szeredi
2020-12-07 16:32 ` [PATCH v2 06/10] ovl: user xattr Miklos Szeredi
2020-12-08 13:10   ` Amir Goldstein
2020-12-11 14:55     ` Miklos Szeredi
2020-12-07 16:32 ` [PATCH v2 07/10] ovl: do not fail when setting origin xattr Miklos Szeredi
2020-12-07 16:32 ` [PATCH v2 08/10] ovl: do not fail because of O_NOATIME Miklos Szeredi
2020-12-08 11:29   ` Amir Goldstein
2020-12-11 14:44     ` Miklos Szeredi
2020-12-14  5:49       ` Amir Goldstein
2020-12-07 16:32 ` [PATCH v2 09/10] ovl: do not get metacopy for userxattr Miklos Szeredi
2020-12-07 16:32 ` [PATCH v2 10/10] ovl: unprivieged mounts Miklos Szeredi
2020-12-08 10:27 ` [PATCH v2 00/10] allow unprivileged overlay mounts Tetsuo Handa
2020-12-10  8:56   ` John Johansen
2020-12-10  9:39     ` Miklos Szeredi
2020-12-15 11:03       ` John Johansen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJfpegtKMwTZwENX7hrVGUVRWgNTf4Tr_bRxYrPpPAH_D2fH-Q@mail.gmail.com \
    --to=miklos@szeredi.hu \
    --cc=ebiederm@xmission.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=linux-unionfs@vger.kernel.org \
    --cc=mszeredi@redhat.com \
    --cc=serge@hallyn.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).