All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [RFC PATCH 07/10] ceph: update cap message struct version to 9
@ 2016-11-07 21:21 Sage Weil
  2016-11-07 21:51 ` Jeff Layton
  2016-11-07 23:15 ` Gregory Farnum
  0 siblings, 2 replies; 11+ messages in thread
From: Sage Weil @ 2016-11-07 21:21 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Gregory Farnum, Yan, Zheng, ceph-devel, Ilya Dryomov, Zheng Yan

[-- Attachment #1: Type: TEXT/PLAIN, Size: 11551 bytes --]

On Mon, 7 Nov 2016, Jeff Layton wrote:
> On Mon, 2016-11-07 at 20:09 +0000, Sage Weil wrote:
> > On Mon, 7 Nov 2016, Gregory Farnum wrote:
> > > On Mon, Nov 7, 2016 at 10:39 AM, Jeff Layton <jlayton@redhat.com> wrote:
> > > > On Mon, 2016-11-07 at 14:36 +0000, Sage Weil wrote:
> > > >> On Mon, 7 Nov 2016, Jeff Layton wrote:
> > > >> > On Mon, 2016-11-07 at 14:05 +0000, Sage Weil wrote:
> > > >> > > On Mon, 7 Nov 2016, Jeff Layton wrote:
> > > >> > > > On Mon, 2016-11-07 at 16:43 +0800, Yan, Zheng wrote:
> > > >> > > > > On Fri, Nov 4, 2016 at 8:57 PM, Jeff Layton <jlayton@redhat.com> wrote:
> > > >> > > > > >
> > > >> > > > > > On Fri, 2016-11-04 at 07:34 -0400, Jeff Layton wrote:
> > > >> > > > > > >
> > > >> > > > > > > The userland ceph has MClientCaps at struct version 9. This brings the
> > > >> > > > > > > kernel up the same version.
> > > >> > > > > > >
> > > >> > > > > > > With this change, we have to start tracking the btime and change_attr,
> > > >> > > > > > > so that the client can pass back sane values in cap messages. The
> > > >> > > > > > > client doesn't care about the btime at all, so this is just passed
> > > >> > > > > > > around, but the change_attr is used when ceph is exported via NFS.
> > > >> > > > > > >
> > > >> > > > > > > For now, the new "sync" parm is left at 0, to preserve the existing
> > > >> > > > > > > behavior of the client.
> > > >> > > > > > >
> > > >> > > > > > > Signed-off-by: Jeff Layton <jlayton@redhat.com>
> > > >> > > > > > > ---
> > > >> > > > > > >  fs/ceph/caps.c | 33 +++++++++++++++++++++++++--------
> > > >> > > > > > >  1 file changed, 25 insertions(+), 8 deletions(-)
> > > >> > > > > > >
> > > >> > > > > > > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > > >> > > > > > > index 6e99866b1946..452f5024589f 100644
> > > >> > > > > > > --- a/fs/ceph/caps.c
> > > >> > > > > > > +++ b/fs/ceph/caps.c
> > > >> > > > > > > @@ -991,9 +991,9 @@ struct cap_msg_args {
> > > >> > > > > > >       struct ceph_mds_session *session;
> > > >> > > > > > >       u64                     ino, cid, follows;
> > > >> > > > > > >       u64                     flush_tid, oldest_flush_tid, size, max_size;
> > > >> > > > > > > -     u64                     xattr_version;
> > > >> > > > > > > +     u64                     xattr_version, change_attr;
> > > >> > > > > > >       struct ceph_buffer      *xattr_buf;
> > > >> > > > > > > -     struct timespec         atime, mtime, ctime;
> > > >> > > > > > > +     struct timespec         atime, mtime, ctime, btime;
> > > >> > > > > > >       int                     op, caps, wanted, dirty;
> > > >> > > > > > >       u32                     seq, issue_seq, mseq, time_warp_seq;
> > > >> > > > > > >       kuid_t                  uid;
> > > >> > > > > > > @@ -1026,13 +1026,13 @@ static int send_cap_msg(struct cap_msg_args *arg)
> > > >> > > > > > >
> > > >> > > > > > >       /* flock buffer size + inline version + inline data size +
> > > >> > > > > > >        * osd_epoch_barrier + oldest_flush_tid */
> > > >> > > > > > > -     extra_len = 4 + 8 + 4 + 4 + 8;
> > > >> > > > > > > +     extra_len = 4 + 8 + 4 + 4 + 8 + 4 + 4 + 4 + 8 + 8 + 1;
> > > >> > > > > > >       msg = ceph_msg_new(CEPH_MSG_CLIENT_CAPS, sizeof(*fc) + extra_len,
> > > >> > > > > > >                          GFP_NOFS, false);
> > > >> > > > > > >       if (!msg)
> > > >> > > > > > >               return -ENOMEM;
> > > >> > > > > > >
> > > >> > > > > > > -     msg->hdr.version = cpu_to_le16(6);
> > > >> > > > > > > +     msg->hdr.version = cpu_to_le16(9);
> > > >> > > > > > >       msg->hdr.tid = cpu_to_le64(arg->flush_tid);
> > > >> > > > > > >
> > > >> > > > > > >       fc = msg->front.iov_base;
> > > >> > > > > > > @@ -1068,17 +1068,30 @@ static int send_cap_msg(struct cap_msg_args *arg)
> > > >> > > > > > >       }
> > > >> > > > > > >
> > > >> > > > > > >       p = fc + 1;
> > > >> > > > > > > -     /* flock buffer size */
> > > >> > > > > > > +     /* flock buffer size (version 2) */
> > > >> > > > > > >       ceph_encode_32(&p, 0);
> > > >> > > > > > > -     /* inline version */
> > > >> > > > > > > +     /* inline version (version 4) */
> > > >> > > > > > >       ceph_encode_64(&p, arg->inline_data ? 0 : CEPH_INLINE_NONE);
> > > >> > > > > > >       /* inline data size */
> > > >> > > > > > >       ceph_encode_32(&p, 0);
> > > >> > > > > > > -     /* osd_epoch_barrier */
> > > >> > > > > > > +     /* osd_epoch_barrier (version 5) */
> > > >> > > > > > >       ceph_encode_32(&p, 0);
> > > >> > > > > > > -     /* oldest_flush_tid */
> > > >> > > > > > > +     /* oldest_flush_tid (version 6) */
> > > >> > > > > > >       ceph_encode_64(&p, arg->oldest_flush_tid);
> > > >> > > > > > >
> > > >> > > > > > > +     /* caller_uid/caller_gid (version 7) */
> > > >> > > > > > > +     ceph_encode_32(&p, (u32)-1);
> > > >> > > > > > > +     ceph_encode_32(&p, (u32)-1);
> > > >> > > > > >
> > > >> > > > > > A bit of self-review...
> > > >> > > > > >
> > > >> > > > > > Not sure if we want to set the above to something else -- maybe 0 or to
> > > >> > > > > > current's creds? That may not always make sense though (during e.g.
> > > >> > > > > > writeback).
> > > >> > > > > >
> > > >> > > >
> > > >> > > > Looking further, I'm not quite sure I understand why we send creds at
> > > >> > > > all in cap messages. Can you clarify where that matters?
> > > >> > > >
> > > >> > > > The way I look at it, would be to consider caps to be something like a
> > > >> > > > more granular NFS delegation or SMB oplock.
> > > >> > > >
> > > >> > > > In that light, a cap flush is just the client sending updated attrs for
> > > >> > > > the exclusive caps that it has already been granted. Is there a
> > > >> > > > situation where we would ever want to refuse that update?
> > > >> > >
> > > >> > > A chmod or chown can be done locally if you have excl caps and flushed
> > > >> > > back to the MDS via a caps message.  We need to verify the user has
> > > >> > > permission to make the change.
> > > >> > >
> > > >> >
> > > >> > My take is that once the MDS has delegated Ax to the client, then it's
> > > >> > effectively trusting the client to handle permissions enforcement
> > > >> > correctly. I don't see why we should second guess that.
> > > >> >
> > > >> > > > Note that nothing ever checks the return code for _do_cap_update in the
> > > >> > > > userland code. If the permissions check fails, then we'll end up
> > > >> > > > silently dropping the updated attrs on the floor.
> > > >> > >
> > > >> > > Yeah.  This was mainly for expediency... the protocol assumes that flushes
> > > >> > > don't fail.  Given that the client does it's own permissions check, I
> > > >> > > think the way to improve this is to have it prevent the flush in the first
> > > >> > > place, so that it's only nefarious clients that are effected (and who
> > > >> > > cares if they get confused).  I don't think we have a particularly good
> > > >> > > way to tell the client it can't, say, sudo chmod 0:0 a file, though.
> > > >> > >
> > > >> >
> > > >> > Sorry, I don't quite follow. How would we prevent the flush from a
> > > >> > nefarious client (which is not something we can really control)?
> > > >> >
> > > >> > In any case...ISTM that the permissions check in _do_cap_update ought to
> > > >> > be replaced by a cephx key check. IOW, what we really want to know is
> > > >> > whether the client is truly the one to which we delegated the caps. If
> > > >> > so, then we sort of have to trust that it's doing the right thing with
> > > >> > respect to permissions checking here.
> > > >>
> > > >> The capability can say "you are allowed to be uid 1000 or uid 1020." We
> > > >> want to delegate the EXCL caps to the client so that a create + chmod +
> > > >> chown + write can all happen efficiently, but we still need to ensure that
> > > >> the values they set are legal (a permitted uid/gid combo).
> > > >>
> > > >> A common example would be user workstations that are allowed access to
> > > >> /home/user and restricted via their mds caps to their uid/gid.  We need to
> > > >> prevent them from doing a 'sudo chown 0:0 foo'...
> > > >>
> > > >>
> > > >
> > > >
> > > > On what basis do you make such a decision though? For instance, NFS does
> > > > root-squashing which is (generally) a per-export+per-client thing.
> > > > It sounds like you're saying that ceph has different semantics here?
> > > >
> > > > (cc'ing Greg here)
> > > 
> > > As Sage says, we definitely avoid the root squash semantics. We
> > > discussed them last year and concluded they were an inappropriate
> > > match for Ceph's permission model.
> > > 
> > > >
> > > > Also, chown (at least under POSIX) is reserved for superuser only, and
> > > > now that I look, I think this check in MDSAuthCaps::is_capable may be
> > > > wrong:
> > > >
> > > >       // chown/chgrp
> > > >       if (mask & MAY_CHOWN) {
> > > >         if (new_uid != caller_uid ||   // you can't chown to someone else
> > > >             inode_uid != caller_uid) { // you can't chown from someone else
> > > >           continue;
> > > >         }
> > > >       }
> > > >
> > > > Shouldn't this just be a check for whether the caller_uid is 0 (or
> > > > whatever the correct check for the equivalent to the kernel's CAP_CHOWN
> > > > would be)?
> > 
> > Oops, I skipped over this part ^
> >  
> > > Without context, this does look a little weird — does it allow *any*
> > > change, given caller_uid needs to match both new and inode uid?
> > > Sort of the common case would be that the admin cap gets hit toward
> > > the beginning of the function and just allows it without ever reaching
> > > this point.
> > 
> > Yeah, the check is a bit weird.  It looks like
> > 
> > 1- A normal cap that specifies a uid can't ever change the uid.  This 
> > conditional could be simplified/clarified...
> > 
> > 2- If you have a pair of caps, like
> > 
> >   allow * uid=1, allow * uid=2
> > 
> > we still don't let you chown between uid 1 and 2.  Well, not as caller_uid 
> > 1 or 2 (which is fine), but
> > 
> > 3- Jeff is right, we don't allow root to chown between allowed uids.  
> > Like if you had
> > 
> >   allow * uid=0
> > 
> > shouldn't that let you chown anything?  I didn't really consider this 
> > case since most users would just do
> > 
> >   allow *
> > 
> > which can do anything (including chown).  But probably the 'allow * uid=0' 
> > case should be handled properly.
> > 
> > sage
> 
> It still seems to me like that should just be a check for superuser
> status. Something like:
> 
>       if (mask & MAY_CHOWN) {
> 	// only root can chown
>         if (i->match.uid != 0 || caller_uid != 0)
>           continue;
>         }
>       }
> 
> i.e. only allow chown if the capability has a uid of 0 and the
> caller_uid is also 0.
> 
> I don't think we want to ever grant an unprivileged user the ability to
> chown, do we?

Ah, yep.  Except that the Locker.cc caller needs to be fixed to only ask 
for MAY_CHOWN if the uid is changing.  Right now it's only passing 
MAY_WRITE which looks wrong too...

sage

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 07/10] ceph: update cap message struct version to 9
  2016-11-07 21:21 [RFC PATCH 07/10] ceph: update cap message struct version to 9 Sage Weil
@ 2016-11-07 21:51 ` Jeff Layton
  2016-11-07 23:15 ` Gregory Farnum
  1 sibling, 0 replies; 11+ messages in thread
From: Jeff Layton @ 2016-11-07 21:51 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, Yan, Zheng, ceph-devel, Ilya Dryomov, Zheng Yan

On Mon, 2016-11-07 at 21:21 +0000, Sage Weil wrote:
> On Mon, 7 Nov 2016, Jeff Layton wrote:
> > On Mon, 2016-11-07 at 20:09 +0000, Sage Weil wrote:
> > > On Mon, 7 Nov 2016, Gregory Farnum wrote:
> > > > On Mon, Nov 7, 2016 at 10:39 AM, Jeff Layton <jlayton@redhat.com> wrote:
> > > > > On Mon, 2016-11-07 at 14:36 +0000, Sage Weil wrote:
> > > > >> On Mon, 7 Nov 2016, Jeff Layton wrote:
> > > > >> > On Mon, 2016-11-07 at 14:05 +0000, Sage Weil wrote:
> > > > >> > > On Mon, 7 Nov 2016, Jeff Layton wrote:
> > > > >> > > > On Mon, 2016-11-07 at 16:43 +0800, Yan, Zheng wrote:
> > > > >> > > > > On Fri, Nov 4, 2016 at 8:57 PM, Jeff Layton <jlayton@redhat.com> wrote:
> > > > >> > > > > >
> > > > >> > > > > > On Fri, 2016-11-04 at 07:34 -0400, Jeff Layton wrote:
> > > > >> > > > > > >
> > > > >> > > > > > > The userland ceph has MClientCaps at struct version 9. This brings the
> > > > >> > > > > > > kernel up the same version.
> > > > >> > > > > > >
> > > > >> > > > > > > With this change, we have to start tracking the btime and change_attr,
> > > > >> > > > > > > so that the client can pass back sane values in cap messages. The
> > > > >> > > > > > > client doesn't care about the btime at all, so this is just passed
> > > > >> > > > > > > around, but the change_attr is used when ceph is exported via NFS.
> > > > >> > > > > > >
> > > > >> > > > > > > For now, the new "sync" parm is left at 0, to preserve the existing
> > > > >> > > > > > > behavior of the client.
> > > > >> > > > > > >
> > > > >> > > > > > > Signed-off-by: Jeff Layton <jlayton@redhat.com>
> > > > >> > > > > > > ---
> > > > >> > > > > > >  fs/ceph/caps.c | 33 +++++++++++++++++++++++++--------
> > > > >> > > > > > >  1 file changed, 25 insertions(+), 8 deletions(-)
> > > > >> > > > > > >
> > > > >> > > > > > > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > > > >> > > > > > > index 6e99866b1946..452f5024589f 100644
> > > > >> > > > > > > --- a/fs/ceph/caps.c
> > > > >> > > > > > > +++ b/fs/ceph/caps.c
> > > > >> > > > > > > @@ -991,9 +991,9 @@ struct cap_msg_args {
> > > > >> > > > > > >       struct ceph_mds_session *session;
> > > > >> > > > > > >       u64                     ino, cid, follows;
> > > > >> > > > > > >       u64                     flush_tid, oldest_flush_tid, size, max_size;
> > > > >> > > > > > > -     u64                     xattr_version;
> > > > >> > > > > > > +     u64                     xattr_version, change_attr;
> > > > >> > > > > > >       struct ceph_buffer      *xattr_buf;
> > > > >> > > > > > > -     struct timespec         atime, mtime, ctime;
> > > > >> > > > > > > +     struct timespec         atime, mtime, ctime, btime;
> > > > >> > > > > > >       int                     op, caps, wanted, dirty;
> > > > >> > > > > > >       u32                     seq, issue_seq, mseq, time_warp_seq;
> > > > >> > > > > > >       kuid_t                  uid;
> > > > >> > > > > > > @@ -1026,13 +1026,13 @@ static int send_cap_msg(struct cap_msg_args *arg)
> > > > >> > > > > > >
> > > > >> > > > > > >       /* flock buffer size + inline version + inline data size +
> > > > >> > > > > > >        * osd_epoch_barrier + oldest_flush_tid */
> > > > >> > > > > > > -     extra_len = 4 + 8 + 4 + 4 + 8;
> > > > >> > > > > > > +     extra_len = 4 + 8 + 4 + 4 + 8 + 4 + 4 + 4 + 8 + 8 + 1;
> > > > >> > > > > > >       msg = ceph_msg_new(CEPH_MSG_CLIENT_CAPS, sizeof(*fc) + extra_len,
> > > > >> > > > > > >                          GFP_NOFS, false);
> > > > >> > > > > > >       if (!msg)
> > > > >> > > > > > >               return -ENOMEM;
> > > > >> > > > > > >
> > > > >> > > > > > > -     msg->hdr.version = cpu_to_le16(6);
> > > > >> > > > > > > +     msg->hdr.version = cpu_to_le16(9);
> > > > >> > > > > > >       msg->hdr.tid = cpu_to_le64(arg->flush_tid);
> > > > >> > > > > > >
> > > > >> > > > > > >       fc = msg->front.iov_base;
> > > > >> > > > > > > @@ -1068,17 +1068,30 @@ static int send_cap_msg(struct cap_msg_args *arg)
> > > > >> > > > > > >       }
> > > > >> > > > > > >
> > > > >> > > > > > >       p = fc + 1;
> > > > >> > > > > > > -     /* flock buffer size */
> > > > >> > > > > > > +     /* flock buffer size (version 2) */
> > > > >> > > > > > >       ceph_encode_32(&p, 0);
> > > > >> > > > > > > -     /* inline version */
> > > > >> > > > > > > +     /* inline version (version 4) */
> > > > >> > > > > > >       ceph_encode_64(&p, arg->inline_data ? 0 : CEPH_INLINE_NONE);
> > > > >> > > > > > >       /* inline data size */
> > > > >> > > > > > >       ceph_encode_32(&p, 0);
> > > > >> > > > > > > -     /* osd_epoch_barrier */
> > > > >> > > > > > > +     /* osd_epoch_barrier (version 5) */
> > > > >> > > > > > >       ceph_encode_32(&p, 0);
> > > > >> > > > > > > -     /* oldest_flush_tid */
> > > > >> > > > > > > +     /* oldest_flush_tid (version 6) */
> > > > >> > > > > > >       ceph_encode_64(&p, arg->oldest_flush_tid);
> > > > >> > > > > > >
> > > > >> > > > > > > +     /* caller_uid/caller_gid (version 7) */
> > > > >> > > > > > > +     ceph_encode_32(&p, (u32)-1);
> > > > >> > > > > > > +     ceph_encode_32(&p, (u32)-1);
> > > > >> > > > > >
> > > > >> > > > > > A bit of self-review...
> > > > >> > > > > >
> > > > >> > > > > > Not sure if we want to set the above to something else -- maybe 0 or to
> > > > >> > > > > > current's creds? That may not always make sense though (during e.g.
> > > > >> > > > > > writeback).
> > > > >> > > > > >
> > > > >> > > >
> > > > >> > > > Looking further, I'm not quite sure I understand why we send creds at
> > > > >> > > > all in cap messages. Can you clarify where that matters?
> > > > >> > > >
> > > > >> > > > The way I look at it, would be to consider caps to be something like a
> > > > >> > > > more granular NFS delegation or SMB oplock.
> > > > >> > > >
> > > > >> > > > In that light, a cap flush is just the client sending updated attrs for
> > > > >> > > > the exclusive caps that it has already been granted. Is there a
> > > > >> > > > situation where we would ever want to refuse that update?
> > > > >> > >
> > > > >> > > A chmod or chown can be done locally if you have excl caps and flushed
> > > > >> > > back to the MDS via a caps message.  We need to verify the user has
> > > > >> > > permission to make the change.
> > > > >> > >
> > > > >> >
> > > > >> > My take is that once the MDS has delegated Ax to the client, then it's
> > > > >> > effectively trusting the client to handle permissions enforcement
> > > > >> > correctly. I don't see why we should second guess that.
> > > > >> >
> > > > >> > > > Note that nothing ever checks the return code for _do_cap_update in the
> > > > >> > > > userland code. If the permissions check fails, then we'll end up
> > > > >> > > > silently dropping the updated attrs on the floor.
> > > > >> > >
> > > > >> > > Yeah.  This was mainly for expediency... the protocol assumes that flushes
> > > > >> > > don't fail.  Given that the client does it's own permissions check, I
> > > > >> > > think the way to improve this is to have it prevent the flush in the first
> > > > >> > > place, so that it's only nefarious clients that are effected (and who
> > > > >> > > cares if they get confused).  I don't think we have a particularly good
> > > > >> > > way to tell the client it can't, say, sudo chmod 0:0 a file, though.
> > > > >> > >
> > > > >> >
> > > > >> > Sorry, I don't quite follow. How would we prevent the flush from a
> > > > >> > nefarious client (which is not something we can really control)?
> > > > >> >
> > > > >> > In any case...ISTM that the permissions check in _do_cap_update ought to
> > > > >> > be replaced by a cephx key check. IOW, what we really want to know is
> > > > >> > whether the client is truly the one to which we delegated the caps. If
> > > > >> > so, then we sort of have to trust that it's doing the right thing with
> > > > >> > respect to permissions checking here.
> > > > >>
> > > > >> The capability can say "you are allowed to be uid 1000 or uid 1020." We
> > > > >> want to delegate the EXCL caps to the client so that a create + chmod +
> > > > >> chown + write can all happen efficiently, but we still need to ensure that
> > > > >> the values they set are legal (a permitted uid/gid combo).
> > > > >>
> > > > >> A common example would be user workstations that are allowed access to
> > > > >> /home/user and restricted via their mds caps to their uid/gid.  We need to
> > > > >> prevent them from doing a 'sudo chown 0:0 foo'...
> > > > >>
> > > > >>
> > > > >
> > > > >
> > > > > On what basis do you make such a decision though? For instance, NFS does
> > > > > root-squashing which is (generally) a per-export+per-client thing.
> > > > > It sounds like you're saying that ceph has different semantics here?
> > > > >
> > > > > (cc'ing Greg here)
> > > > 
> > > > As Sage says, we definitely avoid the root squash semantics. We
> > > > discussed them last year and concluded they were an inappropriate
> > > > match for Ceph's permission model.
> > > > 
> > > > >
> > > > > Also, chown (at least under POSIX) is reserved for superuser only, and
> > > > > now that I look, I think this check in MDSAuthCaps::is_capable may be
> > > > > wrong:
> > > > >
> > > > >       // chown/chgrp
> > > > >       if (mask & MAY_CHOWN) {
> > > > >         if (new_uid != caller_uid ||   // you can't chown to someone else
> > > > >             inode_uid != caller_uid) { // you can't chown from someone else
> > > > >           continue;
> > > > >         }
> > > > >       }
> > > > >
> > > > > Shouldn't this just be a check for whether the caller_uid is 0 (or
> > > > > whatever the correct check for the equivalent to the kernel's CAP_CHOWN
> > > > > would be)?
> > > 
> > > Oops, I skipped over this part ^
> > >  
> > > > Without context, this does look a little weird — does it allow *any*
> > > > change, given caller_uid needs to match both new and inode uid?
> > > > Sort of the common case would be that the admin cap gets hit toward
> > > > the beginning of the function and just allows it without ever reaching
> > > > this point.
> > > 
> > > Yeah, the check is a bit weird.  It looks like
> > > 
> > > 1- A normal cap that specifies a uid can't ever change the uid.  This 
> > > conditional could be simplified/clarified...
> > > 
> > > 2- If you have a pair of caps, like
> > > 
> > >   allow * uid=1, allow * uid=2
> > > 
> > > we still don't let you chown between uid 1 and 2.  Well, not as caller_uid 
> > > 1 or 2 (which is fine), but
> > > 
> > > 3- Jeff is right, we don't allow root to chown between allowed uids.  
> > > Like if you had
> > > 
> > >   allow * uid=0
> > > 
> > > shouldn't that let you chown anything?  I didn't really consider this 
> > > case since most users would just do
> > > 
> > >   allow *
> > > 
> > > which can do anything (including chown).  But probably the 'allow * uid=0' 
> > > case should be handled properly.
> > > 
> > > sage
> > 
> > It still seems to me like that should just be a check for superuser
> > status. Something like:
> > 
> >       if (mask & MAY_CHOWN) {
> >       // only root can chown
> >         if (i->match.uid != 0 || caller_uid != 0)
> >           continue;
> >         }
> >       }
> > 
> > i.e. only allow chown if the capability has a uid of 0 and the
> > caller_uid is also 0.
> > 
> > I don't think we want to ever grant an unprivileged user the ability to
> > chown, do we?
> 
> Ah, yep.  Except that the Locker.cc caller needs to be fixed to only ask 
> for MAY_CHOWN if the uid is changing.  Right now it's only passing 
> MAY_WRITE which looks wrong too...
> 
> 

Ouch, yeah... ok, so I guess we just need to roll a function to set the
MAY_* mask properly for what fields are dirty in the cap flush?

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 07/10] ceph: update cap message struct version to 9
  2016-11-07 21:21 [RFC PATCH 07/10] ceph: update cap message struct version to 9 Sage Weil
  2016-11-07 21:51 ` Jeff Layton
@ 2016-11-07 23:15 ` Gregory Farnum
  2016-11-07 23:21   ` Sage Weil
  1 sibling, 1 reply; 11+ messages in thread
From: Gregory Farnum @ 2016-11-07 23:15 UTC (permalink / raw)
  To: Sage Weil; +Cc: Jeff Layton, Yan, Zheng, ceph-devel, Ilya Dryomov, Zheng Yan

On Mon, Nov 7, 2016 at 1:21 PM, Sage Weil <sweil@redhat.com> wrote:
> On Mon, 7 Nov 2016, Jeff Layton wrote:
>> On Mon, 2016-11-07 at 20:09 +0000, Sage Weil wrote:
>> > On Mon, 7 Nov 2016, Gregory Farnum wrote:
>> > > On Mon, Nov 7, 2016 at 10:39 AM, Jeff Layton <jlayton@redhat.com> wrote:
>> > > > On Mon, 2016-11-07 at 14:36 +0000, Sage Weil wrote:
>> > > >> On Mon, 7 Nov 2016, Jeff Layton wrote:
>> > > >> > On Mon, 2016-11-07 at 14:05 +0000, Sage Weil wrote:
>> > > >> > > On Mon, 7 Nov 2016, Jeff Layton wrote:
>> > > >> > > > On Mon, 2016-11-07 at 16:43 +0800, Yan, Zheng wrote:
>> > > >> > > > > On Fri, Nov 4, 2016 at 8:57 PM, Jeff Layton <jlayton@redhat.com> wrote:
>> > > >> > > > > >
>> > > >> > > > > > On Fri, 2016-11-04 at 07:34 -0400, Jeff Layton wrote:
>> > > >> > > > > > >
>> > > >> > > > > > > The userland ceph has MClientCaps at struct version 9. This brings the
>> > > >> > > > > > > kernel up the same version.
>> > > >> > > > > > >
>> > > >> > > > > > > With this change, we have to start tracking the btime and change_attr,
>> > > >> > > > > > > so that the client can pass back sane values in cap messages. The
>> > > >> > > > > > > client doesn't care about the btime at all, so this is just passed
>> > > >> > > > > > > around, but the change_attr is used when ceph is exported via NFS.
>> > > >> > > > > > >
>> > > >> > > > > > > For now, the new "sync" parm is left at 0, to preserve the existing
>> > > >> > > > > > > behavior of the client.
>> > > >> > > > > > >
>> > > >> > > > > > > Signed-off-by: Jeff Layton <jlayton@redhat.com>
>> > > >> > > > > > > ---
>> > > >> > > > > > >  fs/ceph/caps.c | 33 +++++++++++++++++++++++++--------
>> > > >> > > > > > >  1 file changed, 25 insertions(+), 8 deletions(-)
>> > > >> > > > > > >
>> > > >> > > > > > > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
>> > > >> > > > > > > index 6e99866b1946..452f5024589f 100644
>> > > >> > > > > > > --- a/fs/ceph/caps.c
>> > > >> > > > > > > +++ b/fs/ceph/caps.c
>> > > >> > > > > > > @@ -991,9 +991,9 @@ struct cap_msg_args {
>> > > >> > > > > > >       struct ceph_mds_session *session;
>> > > >> > > > > > >       u64                     ino, cid, follows;
>> > > >> > > > > > >       u64                     flush_tid, oldest_flush_tid, size, max_size;
>> > > >> > > > > > > -     u64                     xattr_version;
>> > > >> > > > > > > +     u64                     xattr_version, change_attr;
>> > > >> > > > > > >       struct ceph_buffer      *xattr_buf;
>> > > >> > > > > > > -     struct timespec         atime, mtime, ctime;
>> > > >> > > > > > > +     struct timespec         atime, mtime, ctime, btime;
>> > > >> > > > > > >       int                     op, caps, wanted, dirty;
>> > > >> > > > > > >       u32                     seq, issue_seq, mseq, time_warp_seq;
>> > > >> > > > > > >       kuid_t                  uid;
>> > > >> > > > > > > @@ -1026,13 +1026,13 @@ static int send_cap_msg(struct cap_msg_args *arg)
>> > > >> > > > > > >
>> > > >> > > > > > >       /* flock buffer size + inline version + inline data size +
>> > > >> > > > > > >        * osd_epoch_barrier + oldest_flush_tid */
>> > > >> > > > > > > -     extra_len = 4 + 8 + 4 + 4 + 8;
>> > > >> > > > > > > +     extra_len = 4 + 8 + 4 + 4 + 8 + 4 + 4 + 4 + 8 + 8 + 1;
>> > > >> > > > > > >       msg = ceph_msg_new(CEPH_MSG_CLIENT_CAPS, sizeof(*fc) + extra_len,
>> > > >> > > > > > >                          GFP_NOFS, false);
>> > > >> > > > > > >       if (!msg)
>> > > >> > > > > > >               return -ENOMEM;
>> > > >> > > > > > >
>> > > >> > > > > > > -     msg->hdr.version = cpu_to_le16(6);
>> > > >> > > > > > > +     msg->hdr.version = cpu_to_le16(9);
>> > > >> > > > > > >       msg->hdr.tid = cpu_to_le64(arg->flush_tid);
>> > > >> > > > > > >
>> > > >> > > > > > >       fc = msg->front.iov_base;
>> > > >> > > > > > > @@ -1068,17 +1068,30 @@ static int send_cap_msg(struct cap_msg_args *arg)
>> > > >> > > > > > >       }
>> > > >> > > > > > >
>> > > >> > > > > > >       p = fc + 1;
>> > > >> > > > > > > -     /* flock buffer size */
>> > > >> > > > > > > +     /* flock buffer size (version 2) */
>> > > >> > > > > > >       ceph_encode_32(&p, 0);
>> > > >> > > > > > > -     /* inline version */
>> > > >> > > > > > > +     /* inline version (version 4) */
>> > > >> > > > > > >       ceph_encode_64(&p, arg->inline_data ? 0 : CEPH_INLINE_NONE);
>> > > >> > > > > > >       /* inline data size */
>> > > >> > > > > > >       ceph_encode_32(&p, 0);
>> > > >> > > > > > > -     /* osd_epoch_barrier */
>> > > >> > > > > > > +     /* osd_epoch_barrier (version 5) */
>> > > >> > > > > > >       ceph_encode_32(&p, 0);
>> > > >> > > > > > > -     /* oldest_flush_tid */
>> > > >> > > > > > > +     /* oldest_flush_tid (version 6) */
>> > > >> > > > > > >       ceph_encode_64(&p, arg->oldest_flush_tid);
>> > > >> > > > > > >
>> > > >> > > > > > > +     /* caller_uid/caller_gid (version 7) */
>> > > >> > > > > > > +     ceph_encode_32(&p, (u32)-1);
>> > > >> > > > > > > +     ceph_encode_32(&p, (u32)-1);
>> > > >> > > > > >
>> > > >> > > > > > A bit of self-review...
>> > > >> > > > > >
>> > > >> > > > > > Not sure if we want to set the above to something else -- maybe 0 or to
>> > > >> > > > > > current's creds? That may not always make sense though (during e.g.
>> > > >> > > > > > writeback).
>> > > >> > > > > >
>> > > >> > > >
>> > > >> > > > Looking further, I'm not quite sure I understand why we send creds at
>> > > >> > > > all in cap messages. Can you clarify where that matters?
>> > > >> > > >
>> > > >> > > > The way I look at it, would be to consider caps to be something like a
>> > > >> > > > more granular NFS delegation or SMB oplock.
>> > > >> > > >
>> > > >> > > > In that light, a cap flush is just the client sending updated attrs for
>> > > >> > > > the exclusive caps that it has already been granted. Is there a
>> > > >> > > > situation where we would ever want to refuse that update?
>> > > >> > >
>> > > >> > > A chmod or chown can be done locally if you have excl caps and flushed
>> > > >> > > back to the MDS via a caps message.  We need to verify the user has
>> > > >> > > permission to make the change.
>> > > >> > >
>> > > >> >
>> > > >> > My take is that once the MDS has delegated Ax to the client, then it's
>> > > >> > effectively trusting the client to handle permissions enforcement
>> > > >> > correctly. I don't see why we should second guess that.
>> > > >> >
>> > > >> > > > Note that nothing ever checks the return code for _do_cap_update in the
>> > > >> > > > userland code. If the permissions check fails, then we'll end up
>> > > >> > > > silently dropping the updated attrs on the floor.
>> > > >> > >
>> > > >> > > Yeah.  This was mainly for expediency... the protocol assumes that flushes
>> > > >> > > don't fail.  Given that the client does it's own permissions check, I
>> > > >> > > think the way to improve this is to have it prevent the flush in the first
>> > > >> > > place, so that it's only nefarious clients that are effected (and who
>> > > >> > > cares if they get confused).  I don't think we have a particularly good
>> > > >> > > way to tell the client it can't, say, sudo chmod 0:0 a file, though.
>> > > >> > >
>> > > >> >
>> > > >> > Sorry, I don't quite follow. How would we prevent the flush from a
>> > > >> > nefarious client (which is not something we can really control)?
>> > > >> >
>> > > >> > In any case...ISTM that the permissions check in _do_cap_update ought to
>> > > >> > be replaced by a cephx key check. IOW, what we really want to know is
>> > > >> > whether the client is truly the one to which we delegated the caps. If
>> > > >> > so, then we sort of have to trust that it's doing the right thing with
>> > > >> > respect to permissions checking here.
>> > > >>
>> > > >> The capability can say "you are allowed to be uid 1000 or uid 1020." We
>> > > >> want to delegate the EXCL caps to the client so that a create + chmod +
>> > > >> chown + write can all happen efficiently, but we still need to ensure that
>> > > >> the values they set are legal (a permitted uid/gid combo).
>> > > >>
>> > > >> A common example would be user workstations that are allowed access to
>> > > >> /home/user and restricted via their mds caps to their uid/gid.  We need to
>> > > >> prevent them from doing a 'sudo chown 0:0 foo'...
>> > > >>
>> > > >>
>> > > >
>> > > >
>> > > > On what basis do you make such a decision though? For instance, NFS does
>> > > > root-squashing which is (generally) a per-export+per-client thing.
>> > > > It sounds like you're saying that ceph has different semantics here?
>> > > >
>> > > > (cc'ing Greg here)
>> > >
>> > > As Sage says, we definitely avoid the root squash semantics. We
>> > > discussed them last year and concluded they were an inappropriate
>> > > match for Ceph's permission model.
>> > >
>> > > >
>> > > > Also, chown (at least under POSIX) is reserved for superuser only, and
>> > > > now that I look, I think this check in MDSAuthCaps::is_capable may be
>> > > > wrong:
>> > > >
>> > > >       // chown/chgrp
>> > > >       if (mask & MAY_CHOWN) {
>> > > >         if (new_uid != caller_uid ||   // you can't chown to someone else
>> > > >             inode_uid != caller_uid) { // you can't chown from someone else
>> > > >           continue;
>> > > >         }
>> > > >       }
>> > > >
>> > > > Shouldn't this just be a check for whether the caller_uid is 0 (or
>> > > > whatever the correct check for the equivalent to the kernel's CAP_CHOWN
>> > > > would be)?
>> >
>> > Oops, I skipped over this part ^
>> >
>> > > Without context, this does look a little weird — does it allow *any*
>> > > change, given caller_uid needs to match both new and inode uid?
>> > > Sort of the common case would be that the admin cap gets hit toward
>> > > the beginning of the function and just allows it without ever reaching
>> > > this point.
>> >
>> > Yeah, the check is a bit weird.  It looks like
>> >
>> > 1- A normal cap that specifies a uid can't ever change the uid.  This
>> > conditional could be simplified/clarified...
>> >
>> > 2- If you have a pair of caps, like
>> >
>> >   allow * uid=1, allow * uid=2
>> >
>> > we still don't let you chown between uid 1 and 2.  Well, not as caller_uid
>> > 1 or 2 (which is fine), but
>> >
>> > 3- Jeff is right, we don't allow root to chown between allowed uids.
>> > Like if you had
>> >
>> >   allow * uid=0
>> >
>> > shouldn't that let you chown anything?  I didn't really consider this
>> > case since most users would just do
>> >
>> >   allow *
>> >
>> > which can do anything (including chown).  But probably the 'allow * uid=0'
>> > case should be handled properly.
>> >
>> > sage
>>
>> It still seems to me like that should just be a check for superuser
>> status. Something like:
>>
>>       if (mask & MAY_CHOWN) {
>>       // only root can chown
>>         if (i->match.uid != 0 || caller_uid != 0)
>>           continue;
>>         }
>>       }
>>
>> i.e. only allow chown if the capability has a uid of 0 and the
>> caller_uid is also 0.
>>
>> I don't think we want to ever grant an unprivileged user the ability to
>> chown, do we?
>
> Ah, yep.  Except that the Locker.cc caller needs to be fixed to only ask
> for MAY_CHOWN if the uid is changing.  Right now it's only passing
> MAY_WRITE which looks wrong too...

Don't we want to let users chown between their own UIDs? A POSIX
superuser — ie root — really has very little meaning in terms of CephX
permissions. But it's perfectly legitimate for a tenant with 3 users
to chmod files between those 3. :/
-Greg

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 07/10] ceph: update cap message struct version to 9
  2016-11-07 23:15 ` Gregory Farnum
@ 2016-11-07 23:21   ` Sage Weil
  2016-11-11 12:45     ` Jeff Layton
  0 siblings, 1 reply; 11+ messages in thread
From: Sage Weil @ 2016-11-07 23:21 UTC (permalink / raw)
  To: Gregory Farnum
  Cc: Jeff Layton, Yan, Zheng, ceph-devel, Ilya Dryomov, Zheng Yan

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2142 bytes --]

On Mon, 7 Nov 2016, Gregory Farnum wrote:
> On Mon, Nov 7, 2016 at 1:21 PM, Sage Weil <sweil@redhat.com> wrote:
> > On Mon, 7 Nov 2016, Jeff Layton wrote:
> >> On Mon, 2016-11-07 at 20:09 +0000, Sage Weil wrote:
> >> > On Mon, 7 Nov 2016, Gregory Farnum wrote:
> >> > > On Mon, Nov 7, 2016 at 10:39 AM, Jeff Layton <jlayton@redhat.com> wrote:
> >> > > > On Mon, 2016-11-07 at 14:36 +0000, Sage Weil wrote:
> >> > > >> On Mon, 7 Nov 2016, Jeff Layton wrote:
> >> > > >> > On Mon, 2016-11-07 at 14:05 +0000, Sage Weil wrote:
> >> > > >> > > On Mon, 7 Nov 2016, Jeff Layton wrote:
> >> > > >> > > > On Mon, 2016-11-07 at 16:43 +0800, Yan, Zheng wrote:
> >> > > >> > > > > On Fri, Nov 4, 2016 at 8:57 PM, Jeff Layton <jlayton@redhat.com> wrote:
> >> > > >> > > > > >

[okay, time to prune this a bit]

> >> It still seems to me like that should just be a check for superuser
> >> status. Something like:
> >>
> >>       if (mask & MAY_CHOWN) {
> >>       // only root can chown
> >>         if (i->match.uid != 0 || caller_uid != 0)
> >>           continue;
> >>         }
> >>       }
> >>
> >> i.e. only allow chown if the capability has a uid of 0 and the
> >> caller_uid is also 0.
> >>
> >> I don't think we want to ever grant an unprivileged user the ability to
> >> chown, do we?
> >
> > Ah, yep.  Except that the Locker.cc caller needs to be fixed to only ask
> > for MAY_CHOWN if the uid is changing.  Right now it's only passing
> > MAY_WRITE which looks wrong too...
> 
> Don't we want to let users chown between their own UIDs? A POSIX
> superuser — ie root — really has very little meaning in terms of CephX
> permissions. But it's perfectly legitimate for a tenant with 3 users
> to chmod files between those 3. :/

Maybe, but it'd be a different kind of check though, because it depends on 
multiple caps in order to permit the operation.  So we'd need a couple 
function-global bools like chown_src_allowed and chown_dst_allowed or 
something like that.

I'm not sure it would work even then, though, because on the client the 
user would still have to sudo chown ... to get past the client kernel's 
checks (I assume?).

sage

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 07/10] ceph: update cap message struct version to 9
  2016-11-07 23:21   ` Sage Weil
@ 2016-11-11 12:45     ` Jeff Layton
  2016-11-11 14:48       ` Sage Weil
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff Layton @ 2016-11-11 12:45 UTC (permalink / raw)
  To: Sage Weil, Gregory Farnum; +Cc: Yan, Zheng, ceph-devel, Ilya Dryomov, Zheng Yan

On Mon, 2016-11-07 at 23:21 +0000, Sage Weil wrote:
> On Mon, 7 Nov 2016, Gregory Farnum wrote:
> > On Mon, Nov 7, 2016 at 1:21 PM, Sage Weil <sweil@redhat.com> wrote:
> > > On Mon, 7 Nov 2016, Jeff Layton wrote:
> > >> On Mon, 2016-11-07 at 20:09 +0000, Sage Weil wrote:
> > >> > On Mon, 7 Nov 2016, Gregory Farnum wrote:
> > >> > > On Mon, Nov 7, 2016 at 10:39 AM, Jeff Layton <jlayton@redhat.com> wrote:
> > >> > > > On Mon, 2016-11-07 at 14:36 +0000, Sage Weil wrote:
> > >> > > >> On Mon, 7 Nov 2016, Jeff Layton wrote:
> > >> > > >> > On Mon, 2016-11-07 at 14:05 +0000, Sage Weil wrote:
> > >> > > >> > > On Mon, 7 Nov 2016, Jeff Layton wrote:
> > >> > > >> > > > On Mon, 2016-11-07 at 16:43 +0800, Yan, Zheng wrote:
> > >> > > >> > > > > On Fri, Nov 4, 2016 at 8:57 PM, Jeff Layton <jlayton@redhat.com> wrote:
> > >> > > >> > > > > >
> 
> [okay, time to prune this a bit]
> 
> > >> It still seems to me like that should just be a check for superuser
> > >> status. Something like:
> > >>
> > >>       if (mask & MAY_CHOWN) {
> > >>       // only root can chown
> > >>         if (i->match.uid != 0 || caller_uid != 0)
> > >>           continue;
> > >>         }
> > >>       }
> > >>
> > >> i.e. only allow chown if the capability has a uid of 0 and the
> > >> caller_uid is also 0.
> > >>
> > >> I don't think we want to ever grant an unprivileged user the ability to
> > >> chown, do we?
> > >
> > > Ah, yep.  Except that the Locker.cc caller needs to be fixed to only ask
> > > for MAY_CHOWN if the uid is changing.  Right now it's only passing
> > > MAY_WRITE which looks wrong too...
> > 
> > Don't we want to let users chown between their own UIDs? A POSIX
> > superuser — ie root — really has very little meaning in terms of CephX
> > permissions. But it's perfectly legitimate for a tenant with 3 users
> > to chmod files between those 3. :/
> 
> Maybe, but it'd be a different kind of check though, because it depends on 
> multiple caps in order to permit the operation.  So we'd need a couple 
> function-global bools like chown_src_allowed and chown_dst_allowed or 
> something like that.
> 
> I'm not sure it would work even then, though, because on the client the 
> user would still have to sudo chown ... to get past the client kernel's 
> checks (I assume?).
> 
> 

Not necessarily. Most kernel filesystems call setattr_prepare, which is
where the CAP_CHOWN check happens, but (e.g.) NFS doesn't call it as it
always delegates the permission checking for setattr to the server.

Looks like both the in-kernel and Ceph client and FUSE always call that
function, so CAP_CHOWN permissions are always checked client-side there.
Still, with libcephfs a program need not deal with the kernel at all, so
you can't really rely on that.

Isn't this potentially silent metadata corruption? You could end up with
a program that issues an ownership change that appears to work, only to
find later that it got reverted because the mds decided it didn't want
to apply it once the caps got updated.

I think the simplest solution would be to just not issue exclusive caps
at all if you aren't prepared to accept cached updates from the client
for the corresponding metadata. Those clients would get shared caps
only. So, you could have clients to which you allow Fx caps, but only
As?

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 07/10] ceph: update cap message struct version to 9
  2016-11-11 12:45     ` Jeff Layton
@ 2016-11-11 14:48       ` Sage Weil
  2016-11-11 15:00         ` handling MDS permission check failures in cap updates (was: [RFC PATCH 07/10] ceph: update cap message struct version to 9) Jeff Layton
  0 siblings, 1 reply; 11+ messages in thread
From: Sage Weil @ 2016-11-11 14:48 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Gregory Farnum, Yan, Zheng, ceph-devel, Ilya Dryomov, Zheng Yan

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3999 bytes --]

On Fri, 11 Nov 2016, Jeff Layton wrote:
> On Mon, 2016-11-07 at 23:21 +0000, Sage Weil wrote:
> > On Mon, 7 Nov 2016, Gregory Farnum wrote:
> > > On Mon, Nov 7, 2016 at 1:21 PM, Sage Weil <sweil@redhat.com> wrote:
> > > > On Mon, 7 Nov 2016, Jeff Layton wrote:
> > > >> On Mon, 2016-11-07 at 20:09 +0000, Sage Weil wrote:
> > > >> > On Mon, 7 Nov 2016, Gregory Farnum wrote:
> > > >> > > On Mon, Nov 7, 2016 at 10:39 AM, Jeff Layton <jlayton@redhat.com> wrote:
> > > >> > > > On Mon, 2016-11-07 at 14:36 +0000, Sage Weil wrote:
> > > >> > > >> On Mon, 7 Nov 2016, Jeff Layton wrote:
> > > >> > > >> > On Mon, 2016-11-07 at 14:05 +0000, Sage Weil wrote:
> > > >> > > >> > > On Mon, 7 Nov 2016, Jeff Layton wrote:
> > > >> > > >> > > > On Mon, 2016-11-07 at 16:43 +0800, Yan, Zheng wrote:
> > > >> > > >> > > > > On Fri, Nov 4, 2016 at 8:57 PM, Jeff Layton <jlayton@redhat.com> wrote:
> > > >> > > >> > > > > >
> > 
> > [okay, time to prune this a bit]
> > 
> > > >> It still seems to me like that should just be a check for superuser
> > > >> status. Something like:
> > > >>
> > > >>       if (mask & MAY_CHOWN) {
> > > >>       // only root can chown
> > > >>         if (i->match.uid != 0 || caller_uid != 0)
> > > >>           continue;
> > > >>         }
> > > >>       }
> > > >>
> > > >> i.e. only allow chown if the capability has a uid of 0 and the
> > > >> caller_uid is also 0.
> > > >>
> > > >> I don't think we want to ever grant an unprivileged user the ability to
> > > >> chown, do we?
> > > >
> > > > Ah, yep.  Except that the Locker.cc caller needs to be fixed to only ask
> > > > for MAY_CHOWN if the uid is changing.  Right now it's only passing
> > > > MAY_WRITE which looks wrong too...
> > > 
> > > Don't we want to let users chown between their own UIDs? A POSIX
> > > superuser — ie root — really has very little meaning in terms of CephX
> > > permissions. But it's perfectly legitimate for a tenant with 3 users
> > > to chmod files between those 3. :/
> > 
> > Maybe, but it'd be a different kind of check though, because it depends on 
> > multiple caps in order to permit the operation.  So we'd need a couple 
> > function-global bools like chown_src_allowed and chown_dst_allowed or 
> > something like that.
> > 
> > I'm not sure it would work even then, though, because on the client the 
> > user would still have to sudo chown ... to get past the client kernel's 
> > checks (I assume?).
> > 
> > 
> 
> Not necessarily. Most kernel filesystems call setattr_prepare, which is
> where the CAP_CHOWN check happens, but (e.g.) NFS doesn't call it as it
> always delegates the permission checking for setattr to the server.
> 
> Looks like both the in-kernel and Ceph client and FUSE always call that
> function, so CAP_CHOWN permissions are always checked client-side there.
> Still, with libcephfs a program need not deal with the kernel at all, so
> you can't really rely on that.

In the libcephfs case it's still doing the Client.cc checks, though, 
right?  Both the Linux VFS (on ceph.ko) and Client.cc are implementing 
client-side checks...

> Isn't this potentially silent metadata corruption? You could end up with
> a program that issues an ownership change that appears to work, only to
> find later that it got reverted because the mds decided it didn't want
> to apply it once the caps got updated.
> 
> I think the simplest solution would be to just not issue exclusive caps
> at all if you aren't prepared to accept cached updates from the client
> for the corresponding metadata. Those clients would get shared caps
> only. So, you could have clients to which you allow Fx caps, but only
> As?

That would be safe, but it would kill performance for any user writing 
files and restricted to a particular user by their caps.  I think it's 
definitely worth maintaining the Client code do the checks so that 
unpermitted updates aren't dropped... which, if I'm not mistaken, 
is already in place.

sage

^ permalink raw reply	[flat|nested] 11+ messages in thread

* handling MDS permission check failures in cap updates (was: [RFC PATCH 07/10] ceph: update cap message struct version to 9)
  2016-11-11 14:48       ` Sage Weil
@ 2016-11-11 15:00         ` Jeff Layton
  2016-11-11 15:07           ` Sage Weil
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff Layton @ 2016-11-11 15:00 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, Yan, Zheng, ceph-devel, Ilya Dryomov

On Fri, 2016-11-11 at 14:48 +0000, Sage Weil wrote:
> On Fri, 11 Nov 2016, Jeff Layton wrote:
> > On Mon, 2016-11-07 at 23:21 +0000, Sage Weil wrote:
> > > On Mon, 7 Nov 2016, Gregory Farnum wrote:
> > > > On Mon, Nov 7, 2016 at 1:21 PM, Sage Weil <sweil@redhat.com> wrote:
> > > > > On Mon, 7 Nov 2016, Jeff Layton wrote:
> > > > >> On Mon, 2016-11-07 at 20:09 +0000, Sage Weil wrote:
> > > > >> > On Mon, 7 Nov 2016, Gregory Farnum wrote:
> > > > >> > > On Mon, Nov 7, 2016 at 10:39 AM, Jeff Layton <jlayton@redhat.com> wrote:
> > > > >> > > > On Mon, 2016-11-07 at 14:36 +0000, Sage Weil wrote:
> > > > >> > > >> On Mon, 7 Nov 2016, Jeff Layton wrote:
> > > > >> > > >> > On Mon, 2016-11-07 at 14:05 +0000, Sage Weil wrote:
> > > > >> > > >> > > On Mon, 7 Nov 2016, Jeff Layton wrote:
> > > > >> > > >> > > > On Mon, 2016-11-07 at 16:43 +0800, Yan, Zheng wrote:
> > > > >> > > >> > > > > On Fri, Nov 4, 2016 at 8:57 PM, Jeff Layton <jlayton@redhat.com> wrote:
> > > > >> > > >> > > > > >
> > > 
> > > [okay, time to prune this a bit]
> > > 
> > > > >> It still seems to me like that should just be a check for superuser
> > > > >> status. Something like:
> > > > >>
> > > > >>       if (mask & MAY_CHOWN) {
> > > > >>       // only root can chown
> > > > >>         if (i->match.uid != 0 || caller_uid != 0)
> > > > >>           continue;
> > > > >>         }
> > > > >>       }
> > > > >>
> > > > >> i.e. only allow chown if the capability has a uid of 0 and the
> > > > >> caller_uid is also 0.
> > > > >>
> > > > >> I don't think we want to ever grant an unprivileged user the ability to
> > > > >> chown, do we?
> > > > >
> > > > > Ah, yep.  Except that the Locker.cc caller needs to be fixed to only ask
> > > > > for MAY_CHOWN if the uid is changing.  Right now it's only passing
> > > > > MAY_WRITE which looks wrong too...
> > > > 
> > > > Don't we want to let users chown between their own UIDs? A POSIX
> > > > superuser — ie root — really has very little meaning in terms of CephX
> > > > permissions. But it's perfectly legitimate for a tenant with 3 users
> > > > to chmod files between those 3. :/
> > > 
> > > Maybe, but it'd be a different kind of check though, because it depends on 
> > > multiple caps in order to permit the operation.  So we'd need a couple 
> > > function-global bools like chown_src_allowed and chown_dst_allowed or 
> > > something like that.
> > > 
> > > I'm not sure it would work even then, though, because on the client the 
> > > user would still have to sudo chown ... to get past the client kernel's 
> > > checks (I assume?).
> > > 
> > > 
> > 
> > Not necessarily. Most kernel filesystems call setattr_prepare, which is
> > where the CAP_CHOWN check happens, but (e.g.) NFS doesn't call it as it
> > always delegates the permission checking for setattr to the server.
> > 
> > Looks like both the in-kernel and Ceph client and FUSE always call that
> > function, so CAP_CHOWN permissions are always checked client-side there.
> > Still, with libcephfs a program need not deal with the kernel at all, so
> > you can't really rely on that.
> 
> In the libcephfs case it's still doing the Client.cc checks, though, 
> right?  Both the Linux VFS (on ceph.ko) and Client.cc are implementing 
> client-side checks...
> 

Only if fuse_default_permissions is set (which is a misnomer since it
affects more than just ceph-fuse). It's not set by default, but probably
should be.


> > Isn't this potentially silent metadata corruption? You could end up with
> > a program that issues an ownership change that appears to work, only to
> > find later that it got reverted because the mds decided it didn't want
> > to apply it once the caps got updated.
> > 
> > I think the simplest solution would be to just not issue exclusive caps
> > at all if you aren't prepared to accept cached updates from the client
> > for the corresponding metadata. Those clients would get shared caps
> > only. So, you could have clients to which you allow Fx caps, but only
> > As?
> 
> That would be safe, but it would kill performance for any user writing 
> files and restricted to a particular user by their caps.  I think it's 
> definitely worth maintaining the Client code do the checks so that 
> unpermitted updates aren't dropped... which, if I'm not mistaken, 
> is already in place.
> 

I don't see why that would kill performance. chown/chgrp/chmod would all
come under Auth caps, whereas read/write (which is where we really care
about performance) come under File caps.

chown/chgrp/chmod are all pretty rare in most workloads, so having to
call the mds to apply them in those cases doesn't seem like a heavy
burden.
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: handling MDS permission check failures in cap updates (was: [RFC PATCH 07/10] ceph: update cap message struct version to 9)
  2016-11-11 15:00         ` handling MDS permission check failures in cap updates (was: [RFC PATCH 07/10] ceph: update cap message struct version to 9) Jeff Layton
@ 2016-11-11 15:07           ` Sage Weil
  2016-11-11 15:39             ` Jeff Layton
  0 siblings, 1 reply; 11+ messages in thread
From: Sage Weil @ 2016-11-11 15:07 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Gregory Farnum, Yan, Zheng, ceph-devel, Ilya Dryomov

[-- Attachment #1: Type: TEXT/PLAIN, Size: 5245 bytes --]

On Fri, 11 Nov 2016, Jeff Layton wrote:
> On Fri, 2016-11-11 at 14:48 +0000, Sage Weil wrote:
> > On Fri, 11 Nov 2016, Jeff Layton wrote:
> > > On Mon, 2016-11-07 at 23:21 +0000, Sage Weil wrote:
> > > > On Mon, 7 Nov 2016, Gregory Farnum wrote:
> > > > > On Mon, Nov 7, 2016 at 1:21 PM, Sage Weil <sweil@redhat.com> wrote:
> > > > > > On Mon, 7 Nov 2016, Jeff Layton wrote:
> > > > > >> On Mon, 2016-11-07 at 20:09 +0000, Sage Weil wrote:
> > > > > >> > On Mon, 7 Nov 2016, Gregory Farnum wrote:
> > > > > >> > > On Mon, Nov 7, 2016 at 10:39 AM, Jeff Layton <jlayton@redhat.com> wrote:
> > > > > >> > > > On Mon, 2016-11-07 at 14:36 +0000, Sage Weil wrote:
> > > > > >> > > >> On Mon, 7 Nov 2016, Jeff Layton wrote:
> > > > > >> > > >> > On Mon, 2016-11-07 at 14:05 +0000, Sage Weil wrote:
> > > > > >> > > >> > > On Mon, 7 Nov 2016, Jeff Layton wrote:
> > > > > >> > > >> > > > On Mon, 2016-11-07 at 16:43 +0800, Yan, Zheng wrote:
> > > > > >> > > >> > > > > On Fri, Nov 4, 2016 at 8:57 PM, Jeff Layton <jlayton@redhat.com> wrote:
> > > > > >> > > >> > > > > >
> > > > 
> > > > [okay, time to prune this a bit]
> > > > 
> > > > > >> It still seems to me like that should just be a check for superuser
> > > > > >> status. Something like:
> > > > > >>
> > > > > >>       if (mask & MAY_CHOWN) {
> > > > > >>       // only root can chown
> > > > > >>         if (i->match.uid != 0 || caller_uid != 0)
> > > > > >>           continue;
> > > > > >>         }
> > > > > >>       }
> > > > > >>
> > > > > >> i.e. only allow chown if the capability has a uid of 0 and the
> > > > > >> caller_uid is also 0.
> > > > > >>
> > > > > >> I don't think we want to ever grant an unprivileged user the ability to
> > > > > >> chown, do we?
> > > > > >
> > > > > > Ah, yep.  Except that the Locker.cc caller needs to be fixed to only ask
> > > > > > for MAY_CHOWN if the uid is changing.  Right now it's only passing
> > > > > > MAY_WRITE which looks wrong too...
> > > > > 
> > > > > Don't we want to let users chown between their own UIDs? A POSIX
> > > > > superuser — ie root — really has very little meaning in terms of CephX
> > > > > permissions. But it's perfectly legitimate for a tenant with 3 users
> > > > > to chmod files between those 3. :/
> > > > 
> > > > Maybe, but it'd be a different kind of check though, because it depends on 
> > > > multiple caps in order to permit the operation.  So we'd need a couple 
> > > > function-global bools like chown_src_allowed and chown_dst_allowed or 
> > > > something like that.
> > > > 
> > > > I'm not sure it would work even then, though, because on the client the 
> > > > user would still have to sudo chown ... to get past the client kernel's 
> > > > checks (I assume?).
> > > > 
> > > > 
> > > 
> > > Not necessarily. Most kernel filesystems call setattr_prepare, which is
> > > where the CAP_CHOWN check happens, but (e.g.) NFS doesn't call it as it
> > > always delegates the permission checking for setattr to the server.
> > > 
> > > Looks like both the in-kernel and Ceph client and FUSE always call that
> > > function, so CAP_CHOWN permissions are always checked client-side there.
> > > Still, with libcephfs a program need not deal with the kernel at all, so
> > > you can't really rely on that.
> > 
> > In the libcephfs case it's still doing the Client.cc checks, though, 
> > right?  Both the Linux VFS (on ceph.ko) and Client.cc are implementing 
> > client-side checks...
> > 
> 
> Only if fuse_default_permissions is set (which is a misnomer since it
> affects more than just ceph-fuse). It's not set by default, but probably
> should be.
> 
> 
> > > Isn't this potentially silent metadata corruption? You could end up with
> > > a program that issues an ownership change that appears to work, only to
> > > find later that it got reverted because the mds decided it didn't want
> > > to apply it once the caps got updated.
> > > 
> > > I think the simplest solution would be to just not issue exclusive caps
> > > at all if you aren't prepared to accept cached updates from the client
> > > for the corresponding metadata. Those clients would get shared caps
> > > only. So, you could have clients to which you allow Fx caps, but only
> > > As?
> > 
> > That would be safe, but it would kill performance for any user writing 
> > files and restricted to a particular user by their caps.  I think it's 
> > definitely worth maintaining the Client code do the checks so that 
> > unpermitted updates aren't dropped... which, if I'm not mistaken, 
> > is already in place.
> > 
> 
> I don't see why that would kill performance. chown/chgrp/chmod would all
> come under Auth caps, whereas read/write (which is where we really care
> about performance) come under File caps.
> 
> chown/chgrp/chmod are all pretty rare in most workloads, so having to
> call the mds to apply them in those cases doesn't seem like a heavy
> burden.

It's tar xf that I'm worried about.. that'll do a create following by 
chmod, maybe chown, and then file write, which are currently all flushed 
async, and would now turn synchronous.

But AFAICS it's not needed, since both the kernel and userspace clients 
are doing the client-side check...

sage

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: handling MDS permission check failures in cap updates (was: [RFC PATCH 07/10] ceph: update cap message struct version to 9)
  2016-11-11 15:07           ` Sage Weil
@ 2016-11-11 15:39             ` Jeff Layton
  2016-11-11 16:02               ` Sage Weil
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff Layton @ 2016-11-11 15:39 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, Yan, Zheng, ceph-devel, Ilya Dryomov

On Fri, 2016-11-11 at 15:07 +0000, Sage Weil wrote:
> On Fri, 11 Nov 2016, Jeff Layton wrote:
> > On Fri, 2016-11-11 at 14:48 +0000, Sage Weil wrote:
> > > On Fri, 11 Nov 2016, Jeff Layton wrote:
> > > > On Mon, 2016-11-07 at 23:21 +0000, Sage Weil wrote:
> > > > > On Mon, 7 Nov 2016, Gregory Farnum wrote:
> > > > > > On Mon, Nov 7, 2016 at 1:21 PM, Sage Weil <sweil@redhat.com> wrote:
> > > > > > > On Mon, 7 Nov 2016, Jeff Layton wrote:
> > > > > > >> On Mon, 2016-11-07 at 20:09 +0000, Sage Weil wrote:
> > > > > > >> > On Mon, 7 Nov 2016, Gregory Farnum wrote:
> > > > > > >> > > On Mon, Nov 7, 2016 at 10:39 AM, Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > >> > > > On Mon, 2016-11-07 at 14:36 +0000, Sage Weil wrote:
> > > > > > >> > > >> On Mon, 7 Nov 2016, Jeff Layton wrote:
> > > > > > >> > > >> > On Mon, 2016-11-07 at 14:05 +0000, Sage Weil wrote:
> > > > > > >> > > >> > > On Mon, 7 Nov 2016, Jeff Layton wrote:
> > > > > > >> > > >> > > > On Mon, 2016-11-07 at 16:43 +0800, Yan, Zheng wrote:
> > > > > > >> > > >> > > > > On Fri, Nov 4, 2016 at 8:57 PM, Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > >> > > >> > > > > >
> > > > > 
> > > > > [okay, time to prune this a bit]
> > > > > 
> > > > > > >> It still seems to me like that should just be a check for superuser
> > > > > > >> status. Something like:
> > > > > > >>
> > > > > > >>       if (mask & MAY_CHOWN) {
> > > > > > >>       // only root can chown
> > > > > > >>         if (i->match.uid != 0 || caller_uid != 0)
> > > > > > >>           continue;
> > > > > > >>         }
> > > > > > >>       }
> > > > > > >>
> > > > > > >> i.e. only allow chown if the capability has a uid of 0 and the
> > > > > > >> caller_uid is also 0.
> > > > > > >>
> > > > > > >> I don't think we want to ever grant an unprivileged user the ability to
> > > > > > >> chown, do we?
> > > > > > >
> > > > > > > Ah, yep.  Except that the Locker.cc caller needs to be fixed to only ask
> > > > > > > for MAY_CHOWN if the uid is changing.  Right now it's only passing
> > > > > > > MAY_WRITE which looks wrong too...
> > > > > > 
> > > > > > Don't we want to let users chown between their own UIDs? A POSIX
> > > > > > superuser — ie root — really has very little meaning in terms of CephX
> > > > > > permissions. But it's perfectly legitimate for a tenant with 3 users
> > > > > > to chmod files between those 3. :/
> > > > > 
> > > > > Maybe, but it'd be a different kind of check though, because it depends on 
> > > > > multiple caps in order to permit the operation.  So we'd need a couple 
> > > > > function-global bools like chown_src_allowed and chown_dst_allowed or 
> > > > > something like that.
> > > > > 
> > > > > I'm not sure it would work even then, though, because on the client the 
> > > > > user would still have to sudo chown ... to get past the client kernel's 
> > > > > checks (I assume?).
> > > > > 
> > > > > 
> > > > 
> > > > Not necessarily. Most kernel filesystems call setattr_prepare, which is
> > > > where the CAP_CHOWN check happens, but (e.g.) NFS doesn't call it as it
> > > > always delegates the permission checking for setattr to the server.
> > > > 
> > > > Looks like both the in-kernel and Ceph client and FUSE always call that
> > > > function, so CAP_CHOWN permissions are always checked client-side there.
> > > > Still, with libcephfs a program need not deal with the kernel at all, so
> > > > you can't really rely on that.
> > > 
> > > In the libcephfs case it's still doing the Client.cc checks, though, 
> > > right?  Both the Linux VFS (on ceph.ko) and Client.cc are implementing 
> > > client-side checks...
> > > 
> > 
> > Only if fuse_default_permissions is set (which is a misnomer since it
> > affects more than just ceph-fuse). It's not set by default, but probably
> > should be.
> > 
> > 
> > > > Isn't this potentially silent metadata corruption? You could end up with
> > > > a program that issues an ownership change that appears to work, only to
> > > > find later that it got reverted because the mds decided it didn't want
> > > > to apply it once the caps got updated.
> > > > 
> > > > I think the simplest solution would be to just not issue exclusive caps
> > > > at all if you aren't prepared to accept cached updates from the client
> > > > for the corresponding metadata. Those clients would get shared caps
> > > > only. So, you could have clients to which you allow Fx caps, but only
> > > > As?
> > > 
> > > That would be safe, but it would kill performance for any user writing 
> > > files and restricted to a particular user by their caps.  I think it's 
> > > definitely worth maintaining the Client code do the checks so that 
> > > unpermitted updates aren't dropped... which, if I'm not mistaken, 
> > > is already in place.
> > > 
> > 
> > I don't see why that would kill performance. chown/chgrp/chmod would all
> > come under Auth caps, whereas read/write (which is where we really care
> > about performance) come under File caps.
> > 
> > chown/chgrp/chmod are all pretty rare in most workloads, so having to
> > call the mds to apply them in those cases doesn't seem like a heavy
> > burden.
> 
> It's tar xf that I'm worried about.. that'll do a create following by 
> chmod, maybe chown, and then file write, which are currently all flushed 
> async, and would now turn synchronous.
> 

Good point, that workload would suck there.

> But AFAICS it's not needed, since both the kernel and userspace clients 
> are doing the client-side check...
> 
> 

Erm...that's just it though...all the client is doing is enforcing
normal POSIX permissions, but those don't necessarily align exactly with
the ceph user permissions.

Can we end up in a situation where the POSIX permissions would allow
root to chown, but then the client ends up not being able to do it
because root's permissions are limited?

If not, do we need the client to try and enforce those as well? 

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: handling MDS permission check failures in cap updates (was: [RFC PATCH 07/10] ceph: update cap message struct version to 9)
  2016-11-11 15:39             ` Jeff Layton
@ 2016-11-11 16:02               ` Sage Weil
  2016-11-12  1:16                 ` Jeff Layton
  0 siblings, 1 reply; 11+ messages in thread
From: Sage Weil @ 2016-11-11 16:02 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Gregory Farnum, Yan, Zheng, ceph-devel, Ilya Dryomov

[-- Attachment #1: Type: TEXT/PLAIN, Size: 7307 bytes --]

On Fri, 11 Nov 2016, Jeff Layton wrote:
> On Fri, 2016-11-11 at 15:07 +0000, Sage Weil wrote:
> > On Fri, 11 Nov 2016, Jeff Layton wrote:
> > > On Fri, 2016-11-11 at 14:48 +0000, Sage Weil wrote:
> > > > On Fri, 11 Nov 2016, Jeff Layton wrote:
> > > > > On Mon, 2016-11-07 at 23:21 +0000, Sage Weil wrote:
> > > > > > On Mon, 7 Nov 2016, Gregory Farnum wrote:
> > > > > > > On Mon, Nov 7, 2016 at 1:21 PM, Sage Weil <sweil@redhat.com> wrote:
> > > > > > > > On Mon, 7 Nov 2016, Jeff Layton wrote:
> > > > > > > >> On Mon, 2016-11-07 at 20:09 +0000, Sage Weil wrote:
> > > > > > > >> > On Mon, 7 Nov 2016, Gregory Farnum wrote:
> > > > > > > >> > > On Mon, Nov 7, 2016 at 10:39 AM, Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > >> > > > On Mon, 2016-11-07 at 14:36 +0000, Sage Weil wrote:
> > > > > > > >> > > >> On Mon, 7 Nov 2016, Jeff Layton wrote:
> > > > > > > >> > > >> > On Mon, 2016-11-07 at 14:05 +0000, Sage Weil wrote:
> > > > > > > >> > > >> > > On Mon, 7 Nov 2016, Jeff Layton wrote:
> > > > > > > >> > > >> > > > On Mon, 2016-11-07 at 16:43 +0800, Yan, Zheng wrote:
> > > > > > > >> > > >> > > > > On Fri, Nov 4, 2016 at 8:57 PM, Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > >> > > >> > > > > >
> > > > > > 
> > > > > > [okay, time to prune this a bit]
> > > > > > 
> > > > > > > >> It still seems to me like that should just be a check for superuser
> > > > > > > >> status. Something like:
> > > > > > > >>
> > > > > > > >>       if (mask & MAY_CHOWN) {
> > > > > > > >>       // only root can chown
> > > > > > > >>         if (i->match.uid != 0 || caller_uid != 0)
> > > > > > > >>           continue;
> > > > > > > >>         }
> > > > > > > >>       }
> > > > > > > >>
> > > > > > > >> i.e. only allow chown if the capability has a uid of 0 and the
> > > > > > > >> caller_uid is also 0.
> > > > > > > >>
> > > > > > > >> I don't think we want to ever grant an unprivileged user the ability to
> > > > > > > >> chown, do we?
> > > > > > > >
> > > > > > > > Ah, yep.  Except that the Locker.cc caller needs to be fixed to only ask
> > > > > > > > for MAY_CHOWN if the uid is changing.  Right now it's only passing
> > > > > > > > MAY_WRITE which looks wrong too...
> > > > > > > 
> > > > > > > Don't we want to let users chown between their own UIDs? A POSIX
> > > > > > > superuser — ie root — really has very little meaning in terms of CephX
> > > > > > > permissions. But it's perfectly legitimate for a tenant with 3 users
> > > > > > > to chmod files between those 3. :/
> > > > > > 
> > > > > > Maybe, but it'd be a different kind of check though, because it depends on 
> > > > > > multiple caps in order to permit the operation.  So we'd need a couple 
> > > > > > function-global bools like chown_src_allowed and chown_dst_allowed or 
> > > > > > something like that.
> > > > > > 
> > > > > > I'm not sure it would work even then, though, because on the client the 
> > > > > > user would still have to sudo chown ... to get past the client kernel's 
> > > > > > checks (I assume?).
> > > > > > 
> > > > > > 
> > > > > 
> > > > > Not necessarily. Most kernel filesystems call setattr_prepare, which is
> > > > > where the CAP_CHOWN check happens, but (e.g.) NFS doesn't call it as it
> > > > > always delegates the permission checking for setattr to the server.
> > > > > 
> > > > > Looks like both the in-kernel and Ceph client and FUSE always call that
> > > > > function, so CAP_CHOWN permissions are always checked client-side there.
> > > > > Still, with libcephfs a program need not deal with the kernel at all, so
> > > > > you can't really rely on that.
> > > > 
> > > > In the libcephfs case it's still doing the Client.cc checks, though, 
> > > > right?  Both the Linux VFS (on ceph.ko) and Client.cc are implementing 
> > > > client-side checks...
> > > > 
> > > 
> > > Only if fuse_default_permissions is set (which is a misnomer since it
> > > affects more than just ceph-fuse). It's not set by default, but probably
> > > should be.
> > > 
> > > 
> > > > > Isn't this potentially silent metadata corruption? You could end up with
> > > > > a program that issues an ownership change that appears to work, only to
> > > > > find later that it got reverted because the mds decided it didn't want
> > > > > to apply it once the caps got updated.
> > > > > 
> > > > > I think the simplest solution would be to just not issue exclusive caps
> > > > > at all if you aren't prepared to accept cached updates from the client
> > > > > for the corresponding metadata. Those clients would get shared caps
> > > > > only. So, you could have clients to which you allow Fx caps, but only
> > > > > As?
> > > > 
> > > > That would be safe, but it would kill performance for any user writing 
> > > > files and restricted to a particular user by their caps.  I think it's 
> > > > definitely worth maintaining the Client code do the checks so that 
> > > > unpermitted updates aren't dropped... which, if I'm not mistaken, 
> > > > is already in place.
> > > > 
> > > 
> > > I don't see why that would kill performance. chown/chgrp/chmod would all
> > > come under Auth caps, whereas read/write (which is where we really care
> > > about performance) come under File caps.
> > > 
> > > chown/chgrp/chmod are all pretty rare in most workloads, so having to
> > > call the mds to apply them in those cases doesn't seem like a heavy
> > > burden.
> > 
> > It's tar xf that I'm worried about.. that'll do a create following by 
> > chmod, maybe chown, and then file write, which are currently all flushed 
> > async, and would now turn synchronous.
> > 
> 
> Good point, that workload would suck there.
> 
> > But AFAICS it's not needed, since both the kernel and userspace clients 
> > are doing the client-side check...
> > 
> > 
> 
> Erm...that's just it though...all the client is doing is enforcing
> normal POSIX permissions, but those don't necessarily align exactly with
> the ceph user permissions.
> 
> Can we end up in a situation where the POSIX permissions would allow
> root to chown, but then the client ends up not being able to do it
> because root's permissions are limited?
> 
> If not, do we need the client to try and enforce those as well? 

Ah, right.  The 'sudo chown ...' case is the trivial example, I think.

We've tried to keep the capabilities opaque to the client until now, and I 
think that's still probably the right choice... I wouldn't want to 
deal with it in the kernel case.

Even if the async metadata update returned an error to the client, they'd 
only see it on, say, fsync (or maybe close, sometimes).  I don't think 
that's particularly helpful.

Perhaps we could communicate a few "imporant" flags to the client about 
it's permissions.  In this case, a flag indicating there are restrictions 
on uid/gid, so that any inode updates touching those fields should be done 
synchronously...

Or, we could separate out the authlock caps into two locks, one with 
uid/gid and one with mode.  This is a case where it's actually harmful to 
conflate them.

Are there cases where a mode change would have to be validated by the MDS?  
Like, setting the setuid bit or something?  I don't think we are doing 
anything special on the MDS side there, but maybe we should be?

sage


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: handling MDS permission check failures in cap updates (was: [RFC PATCH 07/10] ceph: update cap message struct version to 9)
  2016-11-11 16:02               ` Sage Weil
@ 2016-11-12  1:16                 ` Jeff Layton
  0 siblings, 0 replies; 11+ messages in thread
From: Jeff Layton @ 2016-11-12  1:16 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, Yan, Zheng, ceph-devel, Ilya Dryomov

On Fri, 2016-11-11 at 16:02 +0000, Sage Weil wrote:
> On Fri, 11 Nov 2016, Jeff Layton wrote:
> > On Fri, 2016-11-11 at 15:07 +0000, Sage Weil wrote:
> > > On Fri, 11 Nov 2016, Jeff Layton wrote:
> > > > On Fri, 2016-11-11 at 14:48 +0000, Sage Weil wrote:
> > > > > On Fri, 11 Nov 2016, Jeff Layton wrote:
> > > > > > On Mon, 2016-11-07 at 23:21 +0000, Sage Weil wrote:
> > > > > > > On Mon, 7 Nov 2016, Gregory Farnum wrote:
> > > > > > > > On Mon, Nov 7, 2016 at 1:21 PM, Sage Weil <sweil@redhat.com> wrote:
> > > > > > > > > On Mon, 7 Nov 2016, Jeff Layton wrote:
> > > > > > > > >> On Mon, 2016-11-07 at 20:09 +0000, Sage Weil wrote:
> > > > > > > > >> > On Mon, 7 Nov 2016, Gregory Farnum wrote:
> > > > > > > > >> > > On Mon, Nov 7, 2016 at 10:39 AM, Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > > >> > > > On Mon, 2016-11-07 at 14:36 +0000, Sage Weil wrote:
> > > > > > > > >> > > >> On Mon, 7 Nov 2016, Jeff Layton wrote:
> > > > > > > > >> > > >> > On Mon, 2016-11-07 at 14:05 +0000, Sage Weil wrote:
> > > > > > > > >> > > >> > > On Mon, 7 Nov 2016, Jeff Layton wrote:
> > > > > > > > >> > > >> > > > On Mon, 2016-11-07 at 16:43 +0800, Yan, Zheng wrote:
> > > > > > > > >> > > >> > > > > On Fri, Nov 4, 2016 at 8:57 PM, Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > > >> > > >> > > > > >
> > > > > > > 
> > > > > > > [okay, time to prune this a bit]
> > > > > > > 
> > > > > > > > >> It still seems to me like that should just be a check for superuser
> > > > > > > > >> status. Something like:
> > > > > > > > >>
> > > > > > > > >>       if (mask & MAY_CHOWN) {
> > > > > > > > >>       // only root can chown
> > > > > > > > >>         if (i->match.uid != 0 || caller_uid != 0)
> > > > > > > > >>           continue;
> > > > > > > > >>         }
> > > > > > > > >>       }
> > > > > > > > >>
> > > > > > > > >> i.e. only allow chown if the capability has a uid of 0 and the
> > > > > > > > >> caller_uid is also 0.
> > > > > > > > >>
> > > > > > > > >> I don't think we want to ever grant an unprivileged user the ability to
> > > > > > > > >> chown, do we?
> > > > > > > > >
> > > > > > > > > Ah, yep.  Except that the Locker.cc caller needs to be fixed to only ask
> > > > > > > > > for MAY_CHOWN if the uid is changing.  Right now it's only passing
> > > > > > > > > MAY_WRITE which looks wrong too...
> > > > > > > > 
> > > > > > > > Don't we want to let users chown between their own UIDs? A POSIX
> > > > > > > > superuser — ie root — really has very little meaning in terms of CephX
> > > > > > > > permissions. But it's perfectly legitimate for a tenant with 3 users
> > > > > > > > to chmod files between those 3. :/
> > > > > > > 
> > > > > > > Maybe, but it'd be a different kind of check though, because it depends on 
> > > > > > > multiple caps in order to permit the operation.  So we'd need a couple 
> > > > > > > function-global bools like chown_src_allowed and chown_dst_allowed or 
> > > > > > > something like that.
> > > > > > > 
> > > > > > > I'm not sure it would work even then, though, because on the client the 
> > > > > > > user would still have to sudo chown ... to get past the client kernel's 
> > > > > > > checks (I assume?).
> > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > > Not necessarily. Most kernel filesystems call setattr_prepare, which is
> > > > > > where the CAP_CHOWN check happens, but (e.g.) NFS doesn't call it as it
> > > > > > always delegates the permission checking for setattr to the server.
> > > > > > 
> > > > > > Looks like both the in-kernel and Ceph client and FUSE always call that
> > > > > > function, so CAP_CHOWN permissions are always checked client-side there.
> > > > > > Still, with libcephfs a program need not deal with the kernel at all, so
> > > > > > you can't really rely on that.
> > > > > 
> > > > > In the libcephfs case it's still doing the Client.cc checks, though, 
> > > > > right?  Both the Linux VFS (on ceph.ko) and Client.cc are implementing 
> > > > > client-side checks...
> > > > > 
> > > > 
> > > > Only if fuse_default_permissions is set (which is a misnomer since it
> > > > affects more than just ceph-fuse). It's not set by default, but probably
> > > > should be.
> > > > 
> > > > 
> > > > > > Isn't this potentially silent metadata corruption? You could end up with
> > > > > > a program that issues an ownership change that appears to work, only to
> > > > > > find later that it got reverted because the mds decided it didn't want
> > > > > > to apply it once the caps got updated.
> > > > > > 
> > > > > > I think the simplest solution would be to just not issue exclusive caps
> > > > > > at all if you aren't prepared to accept cached updates from the client
> > > > > > for the corresponding metadata. Those clients would get shared caps
> > > > > > only. So, you could have clients to which you allow Fx caps, but only
> > > > > > As?
> > > > > 
> > > > > That would be safe, but it would kill performance for any user writing 
> > > > > files and restricted to a particular user by their caps.  I think it's 
> > > > > definitely worth maintaining the Client code do the checks so that 
> > > > > unpermitted updates aren't dropped... which, if I'm not mistaken, 
> > > > > is already in place.
> > > > > 
> > > > 
> > > > I don't see why that would kill performance. chown/chgrp/chmod would all
> > > > come under Auth caps, whereas read/write (which is where we really care
> > > > about performance) come under File caps.
> > > > 
> > > > chown/chgrp/chmod are all pretty rare in most workloads, so having to
> > > > call the mds to apply them in those cases doesn't seem like a heavy
> > > > burden.
> > > 
> > > It's tar xf that I'm worried about.. that'll do a create following by 
> > > chmod, maybe chown, and then file write, which are currently all flushed 
> > > async, and would now turn synchronous.
> > > 
> > 
> > Good point, that workload would suck there.
> > 
> > > But AFAICS it's not needed, since both the kernel and userspace clients 
> > > are doing the client-side check...
> > > 
> > > 
> > 
> > Erm...that's just it though...all the client is doing is enforcing
> > normal POSIX permissions, but those don't necessarily align exactly with
> > the ceph user permissions.
> > 
> > Can we end up in a situation where the POSIX permissions would allow
> > root to chown, but then the client ends up not being able to do it
> > because root's permissions are limited?
> > 
> > If not, do we need the client to try and enforce those as well? 
> 
> Ah, right.  The 'sudo chown ...' case is the trivial example, I think.
> 
> We've tried to keep the capabilities opaque to the client until now, and I 
> think that's still probably the right choice... I wouldn't want to 
> deal with it in the kernel case.
> 
> Even if the async metadata update returned an error to the client, they'd 
> only see it on, say, fsync (or maybe close, sometimes).  I don't think 
> that's particularly helpful.
> 
> Perhaps we could communicate a few "imporant" flags to the client about 
> it's permissions.  In this case, a flag indicating there are restrictions 
> on uid/gid, so that any inode updates touching those fields should be done 
> synchronously...
> 
> Or, we could separate out the authlock caps into two locks, one with 
> uid/gid and one with mode.  This is a case where it's actually harmful to 
> conflate them.
> 

That seems like it would be the cleanest solution. Then you could just
not hand out exclusive caps for the uid/gid to those clients. Mode
changes could still be cached. 

That's a bit more work to engineer though, and it's probably also worth
thinking about whether we need to move other metadata to different caps
as well.

On a slightly related note, in hindsight the btime might have been
better under xattr caps. It's probably not too late for that if we want
to make that change since kraken hasn't shipped yet, and the kernel
patches are on hold.

> Are there cases where a mode change would have to be validated by the MDS?  
> Like, setting the setuid bit or something?  I don't think we are doing 
> anything special on the MDS side there, but maybe we should be?
> 
> 

Yeah. Probably the MDS ought to clear the setuid and setgid bits
whenever there is an ownership change (user or group). POSIX requires
that, IIRC.

I'll look over the code to make sure we aren't already doing that and
open a bug if not.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-11-12  1:16 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-07 21:21 [RFC PATCH 07/10] ceph: update cap message struct version to 9 Sage Weil
2016-11-07 21:51 ` Jeff Layton
2016-11-07 23:15 ` Gregory Farnum
2016-11-07 23:21   ` Sage Weil
2016-11-11 12:45     ` Jeff Layton
2016-11-11 14:48       ` Sage Weil
2016-11-11 15:00         ` handling MDS permission check failures in cap updates (was: [RFC PATCH 07/10] ceph: update cap message struct version to 9) Jeff Layton
2016-11-11 15:07           ` Sage Weil
2016-11-11 15:39             ` Jeff Layton
2016-11-11 16:02               ` Sage Weil
2016-11-12  1:16                 ` Jeff Layton

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.