From: Miklos Szeredi <miklos@szeredi.hu>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Miklos Szeredi <mszeredi@redhat.com>,
linux-fsdevel@vger.kernel.org,
overlayfs <linux-unionfs@vger.kernel.org>,
LSM <linux-security-module@vger.kernel.org>,
linux-kernel@vger.kernel.org,
"Serge E. Hallyn" <serge@hallyn.com>
Subject: Re: [PATCH v2 01/10] vfs: move cap_convert_nscap() call into vfs_setxattr()
Date: Tue, 12 Jan 2021 10:43:05 +0100 [thread overview]
Message-ID: <CAJfpegtKMwTZwENX7hrVGUVRWgNTf4Tr_bRxYrPpPAH_D2fH-Q@mail.gmail.com> (raw)
In-Reply-To: <874kjnm2p2.fsf@x220.int.ebiederm.org>
On Tue, Jan 12, 2021 at 1:15 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Miklos Szeredi <miklos@szeredi.hu> writes:
>
> > On Fri, Jan 01, 2021 at 11:35:16AM -0600, Eric W. Biederman wrote:
> > For one: a v2 fscap is supposed to be equivalent to a v3 fscap with a rootid of
> > zero, right?
>
> Yes. This assumes that everything is translated into the uids of the
> target filesystem.
>
> > If so, why does cap_inode_getsecurity() treat them differently (v2 fscap
> > succeeding unconditionally while v3 one being either converted to v2, rejected
> > or left as v3 depending on current_user_ns())?
>
> As I understand it v2 fscaps have always succeeded unconditionally. The
> only case I can see for a v2 fscap might not succeed when read is if the
> filesystem is outside of the initial user namespace.
Looking again, it's rather confusing. cap_inode_getsecurity()
currently handles the following cases:
v1: -> fails with -EINVAL
v2: -> returns unconverted xattr
v3:
a) rootid is mapped in the current namespace to non-zero:
-> convert rootid
b) rootid owns the current or ancerstor namespace:
-> convert to v2
c) rootid is not mapped and is not owner:
-> return -EOPNOTSUPP -> falls back to unconverted v3
So lets take the example, where a tmpfs is created in a private user
namespace and one file has a v2 cap and the other an equivalent v3 cap
with a zero rootid. This is the result when looking at it from
1) the namespace of the fs:
---------------------------------------
t = cap_dac_override+eip
tt = cap_dac_override+eip
2) the initial namespace:
---------------------------------------
t = cap_dac_override+eip
tt = cap_dac_override+eip [rootid=1000]
3) an unrelated namespace:
---------------------------------------
t = cap_dac_override+eip
tt = cap_dac_override+eip
Note: in this last case getxattr will actually return a v3 cap with
zero rootid for "tt" which getcap does not display due to being zero.
I could do a setup with a nested namespaces that better demonstrate
the confusing nature of this, but I think this also proves the point.
At this point userspace simply cannot determine whether the returned
cap is in any way valid or not.
The following semantics would make a ton more sense, since getting a
v2 would indicate that rootid is unknown:
- if cap is v2 convert to v3 with zero rootid
- after this, check if rootid needs to be translated, if not return v3
- if yes, try to translate to current ns, if succeeds return translated v3
- if not mappable, return v2
Hmm?
> > Anyway, here's a patch that I think fixes getxattr() layering for
> > security.capability. Does basically what you suggested. Slight change of
> > semantics vs. v1 caps, not sure if that is still needed, getxattr()/setxattr()
> > hasn't worked for these since the introduction of v3 in 4.14.
> > Untested.
>
> Taking a look. The goal of change how these operate is to make it so
> that layered filesystems can just pass through the data if they don't
> want to change anything (even with the user namespaces of the
> filesystems in question are different).
>
> Feedback on the code below:
> - cap_get should be in inode_operations like get_acl and set_acl.
So it's not clear to me why xattr ops are per-sb and acl ops are per-inode.
> - cap_get should return a cpu_vfs_cap_data.
>
> Which means that only make_kuid is needed when reading the cap from
> disk.
It also means translating the cap bits back and forth between disk and
cpu endian. Not a big deal, but...
> Which means that except for the rootid_owns_currentns check (which
> needs to happen elsewhere) default_cap_get should be today's
> get_vfs_cap_from_disk.
That's true. So what's the deal with v1 caps? Support was silently
dropped for getxattr/setxattr but remained in get_vfs_caps_from_disk()
(I guess to not break legacy disk images), but maybe it's time to
deprecate v1 caps completely?
> - With the introduction of cap_get I believe commoncap should stop
> implementing the security_inode_getsecurity hook, and rather have
> getxattr observe is the file capability xatter and call the new
> vfs_cap_get then translate to a v2 or v3 cap as appropriate when
> returning the cap to userspace.
Confused. vfs_cap_get() is the one the layered filesystem will
recurse with, so it must not translate the cap. The one to do that
would be __vfs_getxattr(), right?
Thanks,
Miklos
next prev parent reply other threads:[~2021-01-12 9:44 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-12-07 16:32 [PATCH v2 00/10] allow unprivileged overlay mounts Miklos Szeredi
2020-12-07 16:32 ` [PATCH v2 01/10] vfs: move cap_convert_nscap() call into vfs_setxattr() Miklos Szeredi
2020-12-09 1:53 ` James Morris
2021-01-01 17:35 ` Eric W. Biederman
2021-01-11 13:49 ` Miklos Szeredi
2021-01-12 0:14 ` Eric W. Biederman
2021-01-12 9:43 ` Miklos Szeredi [this message]
2021-01-12 10:04 ` Miklos Szeredi
2021-01-12 18:36 ` Eric W. Biederman
2021-01-12 18:49 ` Eric W. Biederman
2020-12-07 16:32 ` [PATCH v2 02/10] vfs: verify source area in vfs_dedupe_file_range_one() Miklos Szeredi
2020-12-07 16:32 ` [PATCH v2 03/10] ovl: check privs before decoding file handle Miklos Szeredi
2020-12-08 13:49 ` Amir Goldstein
2020-12-09 10:13 ` Miklos Szeredi
2020-12-09 16:20 ` Miklos Szeredi
2020-12-09 18:16 ` Amir Goldstein
2020-12-07 16:32 ` [PATCH v2 04/10] ovl: make ioctl() safe Miklos Szeredi
2020-12-08 11:11 ` Amir Goldstein
2020-12-10 15:18 ` Miklos Szeredi
2020-12-14 5:44 ` Amir Goldstein
2020-12-14 13:23 ` Miklos Szeredi
2020-12-14 14:47 ` Amir Goldstein
2020-12-09 1:57 ` James Morris
2020-12-10 15:19 ` Miklos Szeredi
2020-12-07 16:32 ` [PATCH v2 05/10] ovl: simplify file splice Miklos Szeredi
2020-12-07 16:32 ` [PATCH v2 06/10] ovl: user xattr Miklos Szeredi
2020-12-08 13:10 ` Amir Goldstein
2020-12-11 14:55 ` Miklos Szeredi
2020-12-07 16:32 ` [PATCH v2 07/10] ovl: do not fail when setting origin xattr Miklos Szeredi
2020-12-07 16:32 ` [PATCH v2 08/10] ovl: do not fail because of O_NOATIME Miklos Szeredi
2020-12-08 11:29 ` Amir Goldstein
2020-12-11 14:44 ` Miklos Szeredi
2020-12-14 5:49 ` Amir Goldstein
2020-12-07 16:32 ` [PATCH v2 09/10] ovl: do not get metacopy for userxattr Miklos Szeredi
2020-12-07 16:32 ` [PATCH v2 10/10] ovl: unprivieged mounts Miklos Szeredi
2020-12-08 10:27 ` [PATCH v2 00/10] allow unprivileged overlay mounts Tetsuo Handa
2020-12-10 8:56 ` John Johansen
2020-12-10 9:39 ` Miklos Szeredi
2020-12-15 11:03 ` John Johansen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAJfpegtKMwTZwENX7hrVGUVRWgNTf4Tr_bRxYrPpPAH_D2fH-Q@mail.gmail.com \
--to=miklos@szeredi.hu \
--cc=ebiederm@xmission.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-security-module@vger.kernel.org \
--cc=linux-unionfs@vger.kernel.org \
--cc=mszeredi@redhat.com \
--cc=serge@hallyn.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).