All of lore.kernel.org
 help / color / mirror / Atom feed
* [ksmbd] racy uses of ->d_parent and ->d_name
@ 2022-01-31  1:57 Al Viro
  2022-02-02 23:16 ` Namjae Jeon
  0 siblings, 1 reply; 6+ messages in thread
From: Al Viro @ 2022-01-31  1:57 UTC (permalink / raw)
  To: linux-cifs; +Cc: linux-fsdevel

	Folks, ->d_name and ->d_parent are *NOT* stable unless the
appropriate locks are held.  In particular, locking a directory that
might not be our parent is obviously not going to prevent anything.
Even if it had been our parent at some earlier point.

	->d_lock would suffice, but it can't be held over blocking
operation and it can't be held over dcache lookups anyway (instant
deadlocks).  IOW, the following is racy:

int ksmbd_vfs_lock_parent(struct user_namespace *user_ns, struct dentry *parent,
                          struct dentry *child)
{
        struct dentry *dentry;
        int ret = 0;

        inode_lock_nested(d_inode(parent), I_MUTEX_PARENT);
        dentry = lookup_one(user_ns, child->d_name.name, parent,
                            child->d_name.len);
        if (IS_ERR(dentry)) {
                ret = PTR_ERR(dentry);
                goto out_err;
        }

        if (dentry != child) {
                ret = -ESTALE;
                dput(dentry);
                goto out_err;
        }

        dput(dentry);
        return 0;
out_err:
        inode_unlock(d_inode(parent));
        return ret;
}


	Some of that might be fixable - verifying that ->d_parent points
to parent immediately after inode_lock would stabilize ->d_name in case
of match.  However, a quick look through the callers shows e.g. this:
                write_lock(&ci->m_lock);
                if (ci->m_flags & (S_DEL_ON_CLS | S_DEL_PENDING)) {
                        dentry = filp->f_path.dentry;
                        dir = dentry->d_parent;
                        ci->m_flags &= ~(S_DEL_ON_CLS | S_DEL_PENDING);
                        write_unlock(&ci->m_lock);
                        ksmbd_vfs_unlink(file_mnt_user_ns(filp), dir, dentry);
                        write_lock(&ci->m_lock);
                }
                write_unlock(&ci->m_lock);

	What's to keep dir from getting freed right under us, just as
ksmbd_vfs_lock_parent() (from ksmbd_vfs_unlink()) tries to grab ->i_rwsem
on its inode?

	Have the file moved to other directory and apply memory pressure.
What's to prevent dir from being evicted, its memory recycled, etc.?

	For another fun example, consider e.g. smb2_rename():
                if (file_present &&
                    strncmp(old_name, path.dentry->d_name.name, strlen(old_name))) {
                        rc = -EEXIST;
                        ksmbd_debug(SMB,
                                    "cannot rename already existing file\n");
                        goto out;
                }

Suppose path.dentry has a name longer than 32 bytes (i.e. too large to
fit into ->d_iname and thus allocated separately).  At this point you
are not holding any locks (otherwise ksmbd_vfs_fp_rename() immediately
downstream would deadlock).  So what's to prevent rename(2) on host
ending up with path.dentry getting renamed and old name getting freed?

	More of the same: ksmbd_vfs_fp_rename().  In this one
dget_parent() will at least avoid parent getting freed.  It won't do
a damn thing to stabilize src_dent->d_name after lock_rename(),
though, since we are not guaranteed that the thing we locked is
still the parent...

	Why is so much tied to "open, then figure out where it is" model?
Is it a legacy of userland implementation, or a network fs protocol that
manages to outsuck NFS, or...?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [ksmbd] racy uses of ->d_parent and ->d_name
  2022-01-31  1:57 [ksmbd] racy uses of ->d_parent and ->d_name Al Viro
@ 2022-02-02 23:16 ` Namjae Jeon
  2022-02-03  4:02   ` Al Viro
  0 siblings, 1 reply; 6+ messages in thread
From: Namjae Jeon @ 2022-02-02 23:16 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-cifs, linux-fsdevel

2022-01-31 10:57 GMT+09:00, Al Viro <viro@zeniv.linux.org.uk>:
> 	Folks, ->d_name and ->d_parent are *NOT* stable unless the
> appropriate locks are held.  In particular, locking a directory that
> might not be our parent is obviously not going to prevent anything.
> Even if it had been our parent at some earlier point.
Hi Al,

First, Thanks for pointing that out!
>
> 	->d_lock would suffice, but it can't be held over blocking
> operation and it can't be held over dcache lookups anyway (instant
> deadlocks).  IOW, the following is racy:
>
> int ksmbd_vfs_lock_parent(struct user_namespace *user_ns, struct dentry
> *parent,
>                           struct dentry *child)
> {
>         struct dentry *dentry;
>         int ret = 0;
>
>         inode_lock_nested(d_inode(parent), I_MUTEX_PARENT);
>         dentry = lookup_one(user_ns, child->d_name.name, parent,
>                             child->d_name.len);
>         if (IS_ERR(dentry)) {
>                 ret = PTR_ERR(dentry);
>                 goto out_err;
>         }
>
>         if (dentry != child) {
>                 ret = -ESTALE;
>                 dput(dentry);
>                 goto out_err;
>         }
>
>         dput(dentry);
>         return 0;
> out_err:
>         inode_unlock(d_inode(parent));
>         return ret;
> }
>
>
> 	Some of that might be fixable - verifying that ->d_parent points
> to parent immediately after inode_lock would stabilize ->d_name in case
> of match.  However, a quick look through the callers shows e.g. this:
>                 write_lock(&ci->m_lock);
>                 if (ci->m_flags & (S_DEL_ON_CLS | S_DEL_PENDING)) {
>                         dentry = filp->f_path.dentry;
>                         dir = dentry->d_parent;
>                         ci->m_flags &= ~(S_DEL_ON_CLS | S_DEL_PENDING);
>                         write_unlock(&ci->m_lock);
>                         ksmbd_vfs_unlink(file_mnt_user_ns(filp), dir,
> dentry);
>                         write_lock(&ci->m_lock);
>                 }
>                 write_unlock(&ci->m_lock);
>
> 	What's to keep dir from getting freed right under us, just as
> ksmbd_vfs_lock_parent() (from ksmbd_vfs_unlink()) tries to grab ->i_rwsem
> on its inode?
Right. We need to get parent using dget_parent().
>
> 	Have the file moved to other directory and apply memory pressure.
> What's to prevent dir from being evicted, its memory recycled, etc.?
Let me check it.
>
> 	For another fun example, consider e.g. smb2_rename():
>                 if (file_present &&
>                     strncmp(old_name, path.dentry->d_name.name,
> strlen(old_name))) {
>                         rc = -EEXIST;
>                         ksmbd_debug(SMB,
>                                     "cannot rename already existing
> file\n");
>                         goto out;
>                 }
>
> Suppose path.dentry has a name longer than 32 bytes (i.e. too large to
> fit into ->d_iname and thus allocated separately).  At this point you
> are not holding any locks (otherwise ksmbd_vfs_fp_rename() immediately
> downstream would deadlock).  So what's to prevent rename(2) on host
> ending up with path.dentry getting renamed and old name getting freed?
>
> 	More of the same: ksmbd_vfs_fp_rename().  In this one
> dget_parent() will at least avoid parent getting freed.  It won't do
> a damn thing to stabilize src_dent->d_name after lock_rename(),
> though, since we are not guaranteed that the thing we locked is
> still the parent...
Okay, I will check it.
>
> 	Why is so much tied to "open, then figure out where it is" model?
> Is it a legacy of userland implementation, or a network fs protocol that
> manages to outsuck NFS, or...?
It need to use absolute based path given from request.

Thanks!
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [ksmbd] racy uses of ->d_parent and ->d_name
  2022-02-02 23:16 ` Namjae Jeon
@ 2022-02-03  4:02   ` Al Viro
  2022-02-03  4:29     ` Jeremy Allison
  2022-02-03  8:00     ` Namjae Jeon
  0 siblings, 2 replies; 6+ messages in thread
From: Al Viro @ 2022-02-03  4:02 UTC (permalink / raw)
  To: Namjae Jeon; +Cc: linux-cifs, linux-fsdevel

On Thu, Feb 03, 2022 at 08:16:21AM +0900, Namjae Jeon wrote:

> > 	Why is so much tied to "open, then figure out where it is" model?
> > Is it a legacy of userland implementation, or a network fs protocol that
> > manages to outsuck NFS, or...?
> It need to use absolute based path given from request.

Er... yes?  Normal syscalls also have to be able to deal with pathnames;
the sane way for e.g. unlink() is to traverse everything except the last
component, then do inode_lock() on the directory you've arrived at, do
lookup for the final component and do the operation.

What we do not attempt is "walk the entire path, then try to lock the
parent of whatever we'd arrived at, then do operation".  Otherwise
we'd have the same kind of headache in all directory-manipulating
syscalls...

What's the problem with doing the same thing here?  Lack of convenient
exported helpers for some of those?  Then let's sort _that_ out...
If there's something else, I'd like to know what exactly it is.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [ksmbd] racy uses of ->d_parent and ->d_name
  2022-02-03  4:02   ` Al Viro
@ 2022-02-03  4:29     ` Jeremy Allison
  2022-02-03  7:10       ` Al Viro
  2022-02-03  8:00     ` Namjae Jeon
  1 sibling, 1 reply; 6+ messages in thread
From: Jeremy Allison @ 2022-02-03  4:29 UTC (permalink / raw)
  To: Al Viro; +Cc: Namjae Jeon, linux-cifs, linux-fsdevel

On Thu, Feb 03, 2022 at 04:02:45AM +0000, Al Viro wrote:
>On Thu, Feb 03, 2022 at 08:16:21AM +0900, Namjae Jeon wrote:
>
>> > 	Why is so much tied to "open, then figure out where it is" model?
>> > Is it a legacy of userland implementation, or a network fs protocol that
>> > manages to outsuck NFS, or...?
>> It need to use absolute based path given from request.
>
>Er... yes?  Normal syscalls also have to be able to deal with pathnames;
>the sane way for e.g. unlink() is to traverse everything except the last
>component, then do inode_lock() on the directory you've arrived at, do
>lookup for the final component and do the operation.
>
>What we do not attempt is "walk the entire path, then try to lock the
>parent of whatever we'd arrived at, then do operation".  Otherwise
>we'd have the same kind of headache in all directory-manipulating
>syscalls...
>
>What's the problem with doing the same thing here?  Lack of convenient
>exported helpers for some of those?  Then let's sort _that_ out...
>If there's something else, I'd like to know what exactly it is.

Samba recently did a complete VFS rewrite to remove almost
*all* pathname-based calls for exactly this reason (remove
all possibility of symlink races).

https://www.samba.org/samba/security/CVE-2021-20316.html

Is this essentially the same bug affecting ksmbd here ?

If so, the pathname processing will need to be re-worked
in the same way we had to do for Samba.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [ksmbd] racy uses of ->d_parent and ->d_name
  2022-02-03  4:29     ` Jeremy Allison
@ 2022-02-03  7:10       ` Al Viro
  0 siblings, 0 replies; 6+ messages in thread
From: Al Viro @ 2022-02-03  7:10 UTC (permalink / raw)
  To: Jeremy Allison; +Cc: Namjae Jeon, linux-cifs, linux-fsdevel

On Wed, Feb 02, 2022 at 08:29:42PM -0800, Jeremy Allison wrote:
> On Thu, Feb 03, 2022 at 04:02:45AM +0000, Al Viro wrote:
> > On Thu, Feb 03, 2022 at 08:16:21AM +0900, Namjae Jeon wrote:
> > 
> > > > 	Why is so much tied to "open, then figure out where it is" model?
> > > > Is it a legacy of userland implementation, or a network fs protocol that
> > > > manages to outsuck NFS, or...?
> > > It need to use absolute based path given from request.
> > 
> > Er... yes?  Normal syscalls also have to be able to deal with pathnames;
> > the sane way for e.g. unlink() is to traverse everything except the last
> > component, then do inode_lock() on the directory you've arrived at, do
> > lookup for the final component and do the operation.
> > 
> > What we do not attempt is "walk the entire path, then try to lock the
> > parent of whatever we'd arrived at, then do operation".  Otherwise
> > we'd have the same kind of headache in all directory-manipulating
> > syscalls...
> > 
> > What's the problem with doing the same thing here?  Lack of convenient
> > exported helpers for some of those?  Then let's sort _that_ out...
> > If there's something else, I'd like to know what exactly it is.
> 
> Samba recently did a complete VFS rewrite to remove almost
> *all* pathname-based calls for exactly this reason (remove
> all possibility of symlink races).
> 
> https://www.samba.org/samba/security/CVE-2021-20316.html
> 
> Is this essentially the same bug affecting ksmbd here ?

It's not about symlinks; it's about rename racing with pathname
resolution and tearing the object you've looked up away from the
directory you are preparing to modify.

ksmbd does pathwalk + attempt to lock the parent + check if we
had been moved away while we'd been grabbing the lock.  It *can*
be made to work, but it's bloody painful - you need to grab
a reference to parent (take a look at the games dget_parent()
has to do, and that one has a luxury of not needing to grab a blocking
lock), *then* you need to lock it, check that our object is still
its child (child->d_parent might change, but its comparison with
dentry of locked directory is stable), then check that child is
still hashed and either proceed with the operation, or unlock
the directory you'd locked, drop the reference to it and repeat
the entire dance starting at dget_parent().

It's doable, but really unpleasant.  A much simpler approach is
to find the parent as we are looking for child, lock it and
then look for child in an already locked directory.  Which
guarantees the stability of name and parent of whatever you
find there, for as long as directory remains locked.  None of
such retry loops are needed (and dget_parent() *is* that
internally), no need to bother about unexpected errors, etc.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [ksmbd] racy uses of ->d_parent and ->d_name
  2022-02-03  4:02   ` Al Viro
  2022-02-03  4:29     ` Jeremy Allison
@ 2022-02-03  8:00     ` Namjae Jeon
  1 sibling, 0 replies; 6+ messages in thread
From: Namjae Jeon @ 2022-02-03  8:00 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-cifs, linux-fsdevel

2022-02-03 13:02 GMT+09:00, Al Viro <viro@zeniv.linux.org.uk>:
> On Thu, Feb 03, 2022 at 08:16:21AM +0900, Namjae Jeon wrote:
>
>> > 	Why is so much tied to "open, then figure out where it is" model?
>> > Is it a legacy of userland implementation, or a network fs protocol
>> > that
>> > manages to outsuck NFS, or...?
>> It need to use absolute based path given from request.
>
> Er... yes?  Normal syscalls also have to be able to deal with pathnames;
> the sane way for e.g. unlink() is to traverse everything except the last
> component, then do inode_lock() on the directory you've arrived at, do
> lookup for the final component and do the operation.
>
> What we do not attempt is "walk the entire path, then try to lock the
> parent of whatever we'd arrived at, then do operation".  Otherwise
> we'd have the same kind of headache in all directory-manipulating
> syscalls...
>
> What's the problem with doing the same thing here?  Lack of convenient
> exported helpers for some of those?  Then let's sort _that_ out...
> If there's something else, I'd like to know what exactly it is.
I have fully understood what you point out. if filename_parentat() and
__lookup_hash() can be exported in vfs, they can be used in ksmbd for
this issue.
I'll check it more and get your ack if I need more change in vfs.

Thanks!
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-02-03  8:00 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-31  1:57 [ksmbd] racy uses of ->d_parent and ->d_name Al Viro
2022-02-02 23:16 ` Namjae Jeon
2022-02-03  4:02   ` Al Viro
2022-02-03  4:29     ` Jeremy Allison
2022-02-03  7:10       ` Al Viro
2022-02-03  8:00     ` Namjae Jeon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.