* [PATCHSET] AF_UNIX bind(2) cleanup on failures after mknod @ 2021-06-19 3:48 Al Viro 2021-06-19 3:50 ` [PATCH 1/8] af_unix: take address assignment/hash insertion into a new helper Al Viro 0 siblings, 1 reply; 12+ messages in thread From: Al Viro @ 2021-06-19 3:48 UTC (permalink / raw) To: netdev; +Cc: David Miller Failures in unix_bind() coming after we'd created a socket node in filesystem are possible and can leave an unpleasant mess behind. That had been reported back in February by Denis Kirjanov, but his proposed solution had holes. This is a rebase of the series I'd ended up with back then; I'd prefer to have it go through the regular net.git path. 8 commits total, preliminary massage in the first 6, the fix is #7 and #8 is a minor followup cleanup. Branch is based at net.git/master, lives in vfs.git #misc.af_unix. Individual patches in followups. Al Viro (8): af_unix: take address assignment/hash insertion into a new helper unix_bind(): allocate addr earlier unix_bind(): separate BSD and abstract cases unix_bind(): take BSD and abstract address cases into new helpers fold unix_mknod() into unix_bind_bsd() unix_bind_bsd(): move done_path_create() call after dealing with ->bindlock unix_bind_bsd(): unlink if we fail after successful mknod __unix_find_socket_byname(): don't pass hash and type separately net/unix/af_unix.c | 188 +++++++++++++++++++++++++++-------------------------- 1 file changed, 96 insertions(+), 92 deletions(-) ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/8] af_unix: take address assignment/hash insertion into a new helper 2021-06-19 3:48 [PATCHSET] AF_UNIX bind(2) cleanup on failures after mknod Al Viro @ 2021-06-19 3:50 ` Al Viro 2021-06-19 3:50 ` [PATCH 2/8] unix_bind(): allocate addr earlier Al Viro ` (7 more replies) 0 siblings, 8 replies; 12+ messages in thread From: Al Viro @ 2021-06-19 3:50 UTC (permalink / raw) To: netdev; +Cc: David Miller Duplicated logics in all bind variants (autobind, bind-to-path, bind-to-abstract) gets taken into a common helper. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> --- net/unix/af_unix.c | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 4d4f24cbd86b..464473a78b05 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -262,6 +262,14 @@ static void __unix_insert_socket(struct hlist_head *list, struct sock *sk) sk_add_node(sk, list); } +static void __unix_set_addr(struct sock *sk, struct unix_address *addr, + unsigned hash) +{ + __unix_remove_socket(sk); + smp_store_release(&unix_sk(sk)->addr, addr); + __unix_insert_socket(&unix_socket_table[hash], sk); +} + static inline void unix_remove_socket(struct sock *sk) { spin_lock(&unix_table_lock); @@ -912,9 +920,7 @@ static int unix_autobind(struct socket *sock) } addr->hash ^= sk->sk_type; - __unix_remove_socket(sk); - smp_store_release(&u->addr, addr); - __unix_insert_socket(&unix_socket_table[addr->hash], sk); + __unix_set_addr(sk, addr, addr->hash); spin_unlock(&unix_table_lock); err = 0; @@ -1017,7 +1023,6 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) int err; unsigned int hash; struct unix_address *addr; - struct hlist_head *list; struct path path = { }; err = -EINVAL; @@ -1069,25 +1074,20 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) hash = d_backing_inode(path.dentry)->i_ino & (UNIX_HASH_SIZE - 1); spin_lock(&unix_table_lock); u->path = path; - list = &unix_socket_table[hash]; } else { spin_lock(&unix_table_lock); err = -EADDRINUSE; if (__unix_find_socket_byname(net, sunaddr, addr_len, sk->sk_type, hash)) { + spin_unlock(&unix_table_lock); unix_release_addr(addr); - goto out_unlock; + goto out_up; } - - list = &unix_socket_table[addr->hash]; + hash = addr->hash; } err = 0; - __unix_remove_socket(sk); - smp_store_release(&u->addr, addr); - __unix_insert_socket(list, sk); - -out_unlock: + __unix_set_addr(sk, addr, hash); spin_unlock(&unix_table_lock); out_up: mutex_unlock(&u->bindlock); -- 2.11.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 2/8] unix_bind(): allocate addr earlier 2021-06-19 3:50 ` [PATCH 1/8] af_unix: take address assignment/hash insertion into a new helper Al Viro @ 2021-06-19 3:50 ` Al Viro 2021-06-19 3:50 ` [PATCH 3/8] unix_bind(): separate BSD and abstract cases Al Viro ` (6 subsequent siblings) 7 siblings, 0 replies; 12+ messages in thread From: Al Viro @ 2021-06-19 3:50 UTC (permalink / raw) To: netdev; +Cc: David Miller makes it easier to massage; we do pay for that by extra work (kmalloc+memcpy+kfree) in some error cases, but those are not on the hot paths anyway. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> --- net/unix/af_unix.c | 28 +++++++++++++++------------- 1 file changed, 15 insertions(+), 13 deletions(-) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 464473a78b05..185f868db998 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1039,6 +1039,15 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) if (err < 0) goto out; addr_len = err; + err = -ENOMEM; + addr = kmalloc(sizeof(*addr)+addr_len, GFP_KERNEL); + if (!addr) + goto out; + + memcpy(addr->name, sunaddr, addr_len); + addr->len = addr_len; + addr->hash = hash ^ sk->sk_type; + refcount_set(&addr->refcnt, 1); if (sun_path[0]) { umode_t mode = S_IFSOCK | @@ -1047,7 +1056,7 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) if (err) { if (err == -EEXIST) err = -EADDRINUSE; - goto out; + goto out_addr; } } @@ -1059,16 +1068,6 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) if (u->addr) goto out_up; - err = -ENOMEM; - addr = kmalloc(sizeof(*addr)+addr_len, GFP_KERNEL); - if (!addr) - goto out_up; - - memcpy(addr->name, sunaddr, addr_len); - addr->len = addr_len; - addr->hash = hash ^ sk->sk_type; - refcount_set(&addr->refcnt, 1); - if (sun_path[0]) { addr->hash = UNIX_HASH_SIZE; hash = d_backing_inode(path.dentry)->i_ino & (UNIX_HASH_SIZE - 1); @@ -1080,20 +1079,23 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) if (__unix_find_socket_byname(net, sunaddr, addr_len, sk->sk_type, hash)) { spin_unlock(&unix_table_lock); - unix_release_addr(addr); goto out_up; } hash = addr->hash; } - err = 0; __unix_set_addr(sk, addr, hash); spin_unlock(&unix_table_lock); + addr = NULL; + err = 0; out_up: mutex_unlock(&u->bindlock); out_put: if (err) path_put(&path); +out_addr: + if (addr) + unix_release_addr(addr); out: return err; } -- 2.11.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 3/8] unix_bind(): separate BSD and abstract cases 2021-06-19 3:50 ` [PATCH 1/8] af_unix: take address assignment/hash insertion into a new helper Al Viro 2021-06-19 3:50 ` [PATCH 2/8] unix_bind(): allocate addr earlier Al Viro @ 2021-06-19 3:50 ` Al Viro 2021-06-19 3:50 ` [PATCH 4/8] unix_bind(): take BSD and abstract address cases into new helpers Al Viro ` (5 subsequent siblings) 7 siblings, 0 replies; 12+ messages in thread From: Al Viro @ 2021-06-19 3:50 UTC (permalink / raw) To: netdev; +Cc: David Miller We do get some duplication that way, but it's minor compared to parts that are different. What we get is an ability to change locking in BSD case without making failure exits very hard to follow. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> --- net/unix/af_unix.c | 55 +++++++++++++++++++++++++++++++++--------------------- 1 file changed, 34 insertions(+), 21 deletions(-) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 185f868db998..65a2545d969d 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1023,7 +1023,6 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) int err; unsigned int hash; struct unix_address *addr; - struct path path = { }; err = -EINVAL; if (addr_len < offsetofend(struct sockaddr_un, sun_family) || @@ -1050,6 +1049,7 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) refcount_set(&addr->refcnt, 1); if (sun_path[0]) { + struct path path = { }; umode_t mode = S_IFSOCK | (SOCK_INODE(sock)->i_mode & ~current_umask()); err = unix_mknod(sun_path, mode, &path); @@ -1058,41 +1058,54 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) err = -EADDRINUSE; goto out_addr; } - } - err = mutex_lock_interruptible(&u->bindlock); - if (err) - goto out_put; + err = mutex_lock_interruptible(&u->bindlock); + if (err) { + path_put(&path); + goto out_addr; + } - err = -EINVAL; - if (u->addr) - goto out_up; + err = -EINVAL; + if (u->addr) { + mutex_unlock(&u->bindlock); + path_put(&path); + goto out_addr; + } - if (sun_path[0]) { addr->hash = UNIX_HASH_SIZE; hash = d_backing_inode(path.dentry)->i_ino & (UNIX_HASH_SIZE - 1); spin_lock(&unix_table_lock); u->path = path; + __unix_set_addr(sk, addr, hash); + spin_unlock(&unix_table_lock); + mutex_unlock(&u->bindlock); + addr = NULL; + err = 0; } else { + err = mutex_lock_interruptible(&u->bindlock); + if (err) + goto out_addr; + + err = -EINVAL; + if (u->addr) { + mutex_unlock(&u->bindlock); + goto out_addr; + } + spin_lock(&unix_table_lock); err = -EADDRINUSE; if (__unix_find_socket_byname(net, sunaddr, addr_len, sk->sk_type, hash)) { spin_unlock(&unix_table_lock); - goto out_up; + mutex_unlock(&u->bindlock); + goto out_addr; } - hash = addr->hash; + __unix_set_addr(sk, addr, addr->hash); + spin_unlock(&unix_table_lock); + mutex_unlock(&u->bindlock); + addr = NULL; + err = 0; } - - __unix_set_addr(sk, addr, hash); - spin_unlock(&unix_table_lock); - addr = NULL; - err = 0; -out_up: - mutex_unlock(&u->bindlock); -out_put: - if (err) - path_put(&path); out_addr: if (addr) unix_release_addr(addr); -- 2.11.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 4/8] unix_bind(): take BSD and abstract address cases into new helpers 2021-06-19 3:50 ` [PATCH 1/8] af_unix: take address assignment/hash insertion into a new helper Al Viro 2021-06-19 3:50 ` [PATCH 2/8] unix_bind(): allocate addr earlier Al Viro 2021-06-19 3:50 ` [PATCH 3/8] unix_bind(): separate BSD and abstract cases Al Viro @ 2021-06-19 3:50 ` Al Viro 2021-06-19 3:50 ` [PATCH 5/8] fold unix_mknod() into unix_bind_bsd() Al Viro ` (4 subsequent siblings) 7 siblings, 0 replies; 12+ messages in thread From: Al Viro @ 2021-06-19 3:50 UTC (permalink / raw) To: netdev; +Cc: David Miller unix_bind_bsd() and unix_bind_abstract() respectively. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> --- net/unix/af_unix.c | 147 +++++++++++++++++++++++++++-------------------------- 1 file changed, 74 insertions(+), 73 deletions(-) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 65a2545d969d..04ec7be04909 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1013,104 +1013,105 @@ static int unix_mknod(const char *sun_path, umode_t mode, struct path *res) return err; } +static int unix_bind_bsd(struct sock *sk, struct unix_address *addr) +{ + struct unix_sock *u = unix_sk(sk); + struct path path = { }; + umode_t mode = S_IFSOCK | + (SOCK_INODE(sk->sk_socket)->i_mode & ~current_umask()); + unsigned int hash; + int err; + + err = unix_mknod(addr->name->sun_path, mode, &path); + if (err) + return err; + + err = mutex_lock_interruptible(&u->bindlock); + if (err) { + path_put(&path); + return err; + } + + if (u->addr) { + mutex_unlock(&u->bindlock); + path_put(&path); + return -EINVAL; + } + + addr->hash = UNIX_HASH_SIZE; + hash = d_backing_inode(path.dentry)->i_ino & (UNIX_HASH_SIZE - 1); + spin_lock(&unix_table_lock); + u->path = path; + __unix_set_addr(sk, addr, hash); + spin_unlock(&unix_table_lock); + mutex_unlock(&u->bindlock); + return 0; +} + +static int unix_bind_abstract(struct sock *sk, unsigned hash, + struct unix_address *addr) +{ + struct unix_sock *u = unix_sk(sk); + int err; + + err = mutex_lock_interruptible(&u->bindlock); + if (err) + return err; + + if (u->addr) { + mutex_unlock(&u->bindlock); + return -EINVAL; + } + + spin_lock(&unix_table_lock); + if (__unix_find_socket_byname(sock_net(sk), addr->name, addr->len, + sk->sk_type, hash)) { + spin_unlock(&unix_table_lock); + mutex_unlock(&u->bindlock); + return -EADDRINUSE; + } + __unix_set_addr(sk, addr, addr->hash); + spin_unlock(&unix_table_lock); + mutex_unlock(&u->bindlock); + return 0; +} + static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) { struct sock *sk = sock->sk; - struct net *net = sock_net(sk); - struct unix_sock *u = unix_sk(sk); struct sockaddr_un *sunaddr = (struct sockaddr_un *)uaddr; char *sun_path = sunaddr->sun_path; int err; unsigned int hash; struct unix_address *addr; - err = -EINVAL; if (addr_len < offsetofend(struct sockaddr_un, sun_family) || sunaddr->sun_family != AF_UNIX) - goto out; + return -EINVAL; - if (addr_len == sizeof(short)) { - err = unix_autobind(sock); - goto out; - } + if (addr_len == sizeof(short)) + return unix_autobind(sock); err = unix_mkname(sunaddr, addr_len, &hash); if (err < 0) - goto out; + return err; addr_len = err; - err = -ENOMEM; addr = kmalloc(sizeof(*addr)+addr_len, GFP_KERNEL); if (!addr) - goto out; + return -ENOMEM; memcpy(addr->name, sunaddr, addr_len); addr->len = addr_len; addr->hash = hash ^ sk->sk_type; refcount_set(&addr->refcnt, 1); - if (sun_path[0]) { - struct path path = { }; - umode_t mode = S_IFSOCK | - (SOCK_INODE(sock)->i_mode & ~current_umask()); - err = unix_mknod(sun_path, mode, &path); - if (err) { - if (err == -EEXIST) - err = -EADDRINUSE; - goto out_addr; - } - - err = mutex_lock_interruptible(&u->bindlock); - if (err) { - path_put(&path); - goto out_addr; - } - - err = -EINVAL; - if (u->addr) { - mutex_unlock(&u->bindlock); - path_put(&path); - goto out_addr; - } - - addr->hash = UNIX_HASH_SIZE; - hash = d_backing_inode(path.dentry)->i_ino & (UNIX_HASH_SIZE - 1); - spin_lock(&unix_table_lock); - u->path = path; - __unix_set_addr(sk, addr, hash); - spin_unlock(&unix_table_lock); - mutex_unlock(&u->bindlock); - addr = NULL; - err = 0; - } else { - err = mutex_lock_interruptible(&u->bindlock); - if (err) - goto out_addr; - - err = -EINVAL; - if (u->addr) { - mutex_unlock(&u->bindlock); - goto out_addr; - } - - spin_lock(&unix_table_lock); - err = -EADDRINUSE; - if (__unix_find_socket_byname(net, sunaddr, addr_len, - sk->sk_type, hash)) { - spin_unlock(&unix_table_lock); - mutex_unlock(&u->bindlock); - goto out_addr; - } - __unix_set_addr(sk, addr, addr->hash); - spin_unlock(&unix_table_lock); - mutex_unlock(&u->bindlock); - addr = NULL; - err = 0; - } -out_addr: - if (addr) + if (sun_path[0]) + err = unix_bind_bsd(sk, addr); + else + err = unix_bind_abstract(sk, hash, addr); + if (err) unix_release_addr(addr); -out: - return err; + return err == -EEXIST ? -EADDRINUSE : err; } static void unix_state_double_lock(struct sock *sk1, struct sock *sk2) -- 2.11.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 5/8] fold unix_mknod() into unix_bind_bsd() 2021-06-19 3:50 ` [PATCH 1/8] af_unix: take address assignment/hash insertion into a new helper Al Viro ` (2 preceding siblings ...) 2021-06-19 3:50 ` [PATCH 4/8] unix_bind(): take BSD and abstract address cases into new helpers Al Viro @ 2021-06-19 3:50 ` Al Viro 2021-06-19 3:50 ` [PATCH 6/8] unix_bind_bsd(): move done_path_create() call after dealing with ->bindlock Al Viro ` (3 subsequent siblings) 7 siblings, 0 replies; 12+ messages in thread From: Al Viro @ 2021-06-19 3:50 UTC (permalink / raw) To: netdev; +Cc: David Miller Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> --- net/unix/af_unix.c | 42 +++++++++++++++++------------------------- 1 file changed, 17 insertions(+), 25 deletions(-) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 04ec7be04909..d5ec0abd389d 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -983,46 +983,38 @@ static struct sock *unix_find_other(struct net *net, return NULL; } -static int unix_mknod(const char *sun_path, umode_t mode, struct path *res) +static int unix_bind_bsd(struct sock *sk, struct unix_address *addr) { + struct unix_sock *u = unix_sk(sk); + umode_t mode = S_IFSOCK | + (SOCK_INODE(sk->sk_socket)->i_mode & ~current_umask()); + struct path parent, path; + struct user_namespace *ns; // barf... struct dentry *dentry; - struct path path; - int err = 0; + unsigned int hash; + int err; + /* * Get the parent directory, calculate the hash for last * component. */ - dentry = kern_path_create(AT_FDCWD, sun_path, &path, 0); - err = PTR_ERR(dentry); + dentry = kern_path_create(AT_FDCWD, addr->name->sun_path, &parent, 0); if (IS_ERR(dentry)) - return err; + return PTR_ERR(dentry); + ns = mnt_user_ns(parent.mnt); /* * All right, let's create it. */ - err = security_path_mknod(&path, dentry, mode, 0); + err = security_path_mknod(&parent, dentry, mode, 0); if (!err) { - err = vfs_mknod(mnt_user_ns(path.mnt), d_inode(path.dentry), - dentry, mode, 0); + err = vfs_mknod(ns, d_inode(parent.dentry), dentry, mode, 0); if (!err) { - res->mnt = mntget(path.mnt); - res->dentry = dget(dentry); + path.mnt = mntget(parent.mnt); + path.dentry = dget(dentry); } } - done_path_create(&path, dentry); - return err; -} - -static int unix_bind_bsd(struct sock *sk, struct unix_address *addr) -{ - struct unix_sock *u = unix_sk(sk); - struct path path = { }; - umode_t mode = S_IFSOCK | - (SOCK_INODE(sk->sk_socket)->i_mode & ~current_umask()); - unsigned int hash; - int err; - - err = unix_mknod(addr->name->sun_path, mode, &path); + done_path_create(&parent, dentry); if (err) return err; -- 2.11.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 6/8] unix_bind_bsd(): move done_path_create() call after dealing with ->bindlock 2021-06-19 3:50 ` [PATCH 1/8] af_unix: take address assignment/hash insertion into a new helper Al Viro ` (3 preceding siblings ...) 2021-06-19 3:50 ` [PATCH 5/8] fold unix_mknod() into unix_bind_bsd() Al Viro @ 2021-06-19 3:50 ` Al Viro 2021-06-19 3:50 ` [PATCH 7/8] unix_bind_bsd(): unlink if we fail after successful mknod Al Viro ` (2 subsequent siblings) 7 siblings, 0 replies; 12+ messages in thread From: Al Viro @ 2021-06-19 3:50 UTC (permalink / raw) To: netdev; +Cc: David Miller Final preparations for doing unlink on failure past the successful mknod. We can't hold ->bindlock over ->mknod() or ->unlink(), since either might do sb_start_write() (e.g. on overlayfs). However, we can do it while holding filesystem and VFS locks - doing kern_path_create() vfs_mknod() grab ->bindlock if u->addr had been set drop ->bindlock done_path_create return -EINVAL else assign the address to socket drop ->bindlock done_path_create return 0 would be deadlock-free. Here we massage unix_bind_bsd() to that form. We are still doing equivalent transformations. Next commit will *not* be an equivalent transformation - it will add a call of vfs_unlink() before done_path_create() in "alread bound" case. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> --- net/unix/af_unix.c | 26 +++++++++++--------------- 1 file changed, 11 insertions(+), 15 deletions(-) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index d5ec0abd389d..3c389a6bf670 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -988,8 +988,8 @@ static int unix_bind_bsd(struct sock *sk, struct unix_address *addr) struct unix_sock *u = unix_sk(sk); umode_t mode = S_IFSOCK | (SOCK_INODE(sk->sk_socket)->i_mode & ~current_umask()); - struct path parent, path; struct user_namespace *ns; // barf... + struct path parent; struct dentry *dentry; unsigned int hash; int err; @@ -1007,36 +1007,32 @@ static int unix_bind_bsd(struct sock *sk, struct unix_address *addr) * All right, let's create it. */ err = security_path_mknod(&parent, dentry, mode, 0); - if (!err) { + if (!err) err = vfs_mknod(ns, d_inode(parent.dentry), dentry, mode, 0); - if (!err) { - path.mnt = mntget(parent.mnt); - path.dentry = dget(dentry); - } - } - done_path_create(&parent, dentry); - if (err) + if (err) { + done_path_create(&parent, dentry); return err; - + } err = mutex_lock_interruptible(&u->bindlock); if (err) { - path_put(&path); + done_path_create(&parent, dentry); return err; } - if (u->addr) { mutex_unlock(&u->bindlock); - path_put(&path); + done_path_create(&parent, dentry); return -EINVAL; } addr->hash = UNIX_HASH_SIZE; - hash = d_backing_inode(path.dentry)->i_ino & (UNIX_HASH_SIZE - 1); + hash = d_backing_inode(dentry)->i_ino & (UNIX_HASH_SIZE - 1); spin_lock(&unix_table_lock); - u->path = path; + u->path.mnt = mntget(parent.mnt); + u->path.dentry = dget(dentry); __unix_set_addr(sk, addr, hash); spin_unlock(&unix_table_lock); mutex_unlock(&u->bindlock); + done_path_create(&parent, dentry); return 0; } -- 2.11.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 7/8] unix_bind_bsd(): unlink if we fail after successful mknod 2021-06-19 3:50 ` [PATCH 1/8] af_unix: take address assignment/hash insertion into a new helper Al Viro ` (4 preceding siblings ...) 2021-06-19 3:50 ` [PATCH 6/8] unix_bind_bsd(): move done_path_create() call after dealing with ->bindlock Al Viro @ 2021-06-19 3:50 ` Al Viro 2021-06-19 3:50 ` [PATCH 8/8] __unix_find_socket_byname(): don't pass hash and type separately Al Viro 2021-06-21 19:30 ` [PATCH 1/8] af_unix: take address assignment/hash insertion into a new helper patchwork-bot+netdevbpf 7 siblings, 0 replies; 12+ messages in thread From: Al Viro @ 2021-06-19 3:50 UTC (permalink / raw) To: netdev; +Cc: David Miller We can do that more or less safely, since the parent is held locked all along. Yes, somebody might observe the object via dcache, only to have it disappear afterwards, but there's really no good way to prevent that. It won't race with other bind(2) or attempts to move the sucker elsewhere, or put something else in its place - locked parent prevents that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> --- net/unix/af_unix.c | 29 ++++++++++++++++------------- 1 file changed, 16 insertions(+), 13 deletions(-) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 3c389a6bf670..fde3175aa3d3 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1009,20 +1009,13 @@ static int unix_bind_bsd(struct sock *sk, struct unix_address *addr) err = security_path_mknod(&parent, dentry, mode, 0); if (!err) err = vfs_mknod(ns, d_inode(parent.dentry), dentry, mode, 0); - if (err) { - done_path_create(&parent, dentry); - return err; - } + if (err) + goto out; err = mutex_lock_interruptible(&u->bindlock); - if (err) { - done_path_create(&parent, dentry); - return err; - } - if (u->addr) { - mutex_unlock(&u->bindlock); - done_path_create(&parent, dentry); - return -EINVAL; - } + if (err) + goto out_unlink; + if (u->addr) + goto out_unlock; addr->hash = UNIX_HASH_SIZE; hash = d_backing_inode(dentry)->i_ino & (UNIX_HASH_SIZE - 1); @@ -1034,6 +1027,16 @@ static int unix_bind_bsd(struct sock *sk, struct unix_address *addr) mutex_unlock(&u->bindlock); done_path_create(&parent, dentry); return 0; + +out_unlock: + mutex_unlock(&u->bindlock); + err = -EINVAL; +out_unlink: + /* failed after successful mknod? unlink what we'd created... */ + vfs_unlink(ns, d_inode(parent.dentry), dentry, NULL); +out: + done_path_create(&parent, dentry); + return err; } static int unix_bind_abstract(struct sock *sk, unsigned hash, -- 2.11.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 8/8] __unix_find_socket_byname(): don't pass hash and type separately 2021-06-19 3:50 ` [PATCH 1/8] af_unix: take address assignment/hash insertion into a new helper Al Viro ` (5 preceding siblings ...) 2021-06-19 3:50 ` [PATCH 7/8] unix_bind_bsd(): unlink if we fail after successful mknod Al Viro @ 2021-06-19 3:50 ` Al Viro 2021-06-21 19:30 ` [PATCH 1/8] af_unix: take address assignment/hash insertion into a new helper patchwork-bot+netdevbpf 7 siblings, 0 replies; 12+ messages in thread From: Al Viro @ 2021-06-19 3:50 UTC (permalink / raw) To: netdev; +Cc: David Miller We only care about exclusive or of those, so pass that directly. Makes life simpler for callers as well... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> --- net/unix/af_unix.c | 23 ++++++++++------------- 1 file changed, 10 insertions(+), 13 deletions(-) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index fde3175aa3d3..52c685814685 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -286,11 +286,11 @@ static inline void unix_insert_socket(struct hlist_head *list, struct sock *sk) static struct sock *__unix_find_socket_byname(struct net *net, struct sockaddr_un *sunname, - int len, int type, unsigned int hash) + int len, unsigned int hash) { struct sock *s; - sk_for_each(s, &unix_socket_table[hash ^ type]) { + sk_for_each(s, &unix_socket_table[hash]) { struct unix_sock *u = unix_sk(s); if (!net_eq(sock_net(s), net)) @@ -305,13 +305,12 @@ static struct sock *__unix_find_socket_byname(struct net *net, static inline struct sock *unix_find_socket_byname(struct net *net, struct sockaddr_un *sunname, - int len, int type, - unsigned int hash) + int len, unsigned int hash) { struct sock *s; spin_lock(&unix_table_lock); - s = __unix_find_socket_byname(net, sunname, len, type, hash); + s = __unix_find_socket_byname(net, sunname, len, hash); if (s) sock_hold(s); spin_unlock(&unix_table_lock); @@ -898,12 +897,12 @@ static int unix_autobind(struct socket *sock) retry: addr->len = sprintf(addr->name->sun_path+1, "%05x", ordernum) + 1 + sizeof(short); addr->hash = unix_hash_fold(csum_partial(addr->name, addr->len, 0)); + addr->hash ^= sk->sk_type; spin_lock(&unix_table_lock); ordernum = (ordernum+1)&0xFFFFF; - if (__unix_find_socket_byname(net, addr->name, addr->len, sock->type, - addr->hash)) { + if (__unix_find_socket_byname(net, addr->name, addr->len, addr->hash)) { spin_unlock(&unix_table_lock); /* * __unix_find_socket_byname() may take long time if many names @@ -918,7 +917,6 @@ static int unix_autobind(struct socket *sock) } goto retry; } - addr->hash ^= sk->sk_type; __unix_set_addr(sk, addr, addr->hash); spin_unlock(&unix_table_lock); @@ -965,7 +963,7 @@ static struct sock *unix_find_other(struct net *net, } } else { err = -ECONNREFUSED; - u = unix_find_socket_byname(net, sunname, len, type, hash); + u = unix_find_socket_byname(net, sunname, len, type ^ hash); if (u) { struct dentry *dentry; dentry = unix_sk(u)->path.dentry; @@ -1039,8 +1037,7 @@ static int unix_bind_bsd(struct sock *sk, struct unix_address *addr) return err; } -static int unix_bind_abstract(struct sock *sk, unsigned hash, - struct unix_address *addr) +static int unix_bind_abstract(struct sock *sk, struct unix_address *addr) { struct unix_sock *u = unix_sk(sk); int err; @@ -1056,7 +1053,7 @@ static int unix_bind_abstract(struct sock *sk, unsigned hash, spin_lock(&unix_table_lock); if (__unix_find_socket_byname(sock_net(sk), addr->name, addr->len, - sk->sk_type, hash)) { + addr->hash)) { spin_unlock(&unix_table_lock); mutex_unlock(&u->bindlock); return -EADDRINUSE; @@ -1099,7 +1096,7 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) if (sun_path[0]) err = unix_bind_bsd(sk, addr); else - err = unix_bind_abstract(sk, hash, addr); + err = unix_bind_abstract(sk, addr); if (err) unix_release_addr(addr); return err == -EEXIST ? -EADDRINUSE : err; -- 2.11.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 1/8] af_unix: take address assignment/hash insertion into a new helper 2021-06-19 3:50 ` [PATCH 1/8] af_unix: take address assignment/hash insertion into a new helper Al Viro ` (6 preceding siblings ...) 2021-06-19 3:50 ` [PATCH 8/8] __unix_find_socket_byname(): don't pass hash and type separately Al Viro @ 2021-06-21 19:30 ` patchwork-bot+netdevbpf 7 siblings, 0 replies; 12+ messages in thread From: patchwork-bot+netdevbpf @ 2021-06-21 19:30 UTC (permalink / raw) To: Al Viro; +Cc: netdev, davem Hello: This series was applied to netdev/net-next.git (refs/heads/master): On Sat, 19 Jun 2021 03:50:26 +0000 you wrote: > Duplicated logics in all bind variants (autobind, bind-to-path, > bind-to-abstract) gets taken into a common helper. > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> > --- > net/unix/af_unix.c | 26 +++++++++++++------------- > 1 file changed, 13 insertions(+), 13 deletions(-) Here is the summary with links: - [1/8] af_unix: take address assignment/hash insertion into a new helper https://git.kernel.org/netdev/net-next/c/185ab886d3fb - [2/8] unix_bind(): allocate addr earlier https://git.kernel.org/netdev/net-next/c/c34d4582518f - [3/8] unix_bind(): separate BSD and abstract cases https://git.kernel.org/netdev/net-next/c/aee515170576 - [4/8] unix_bind(): take BSD and abstract address cases into new helpers https://git.kernel.org/netdev/net-next/c/fa42d910a38e - [5/8] fold unix_mknod() into unix_bind_bsd() https://git.kernel.org/netdev/net-next/c/71e6be6f7d2b - [6/8] unix_bind_bsd(): move done_path_create() call after dealing with ->bindlock https://git.kernel.org/netdev/net-next/c/56c1731b280d - [7/8] unix_bind_bsd(): unlink if we fail after successful mknod https://git.kernel.org/netdev/net-next/c/c0c3b8d380a8 - [8/8] __unix_find_socket_byname(): don't pass hash and type separately https://git.kernel.org/netdev/net-next/c/be752283a2a2 You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCHSET] making unix_bind() undo mknod on failure
@ 2021-02-22 19:06 Al Viro
2021-02-22 19:12 ` [PATCH 1/8] af_unix: take address assignment/hash insertion into a new helper Al Viro
0 siblings, 1 reply; 12+ messages in thread
From: Al Viro @ 2021-02-22 19:06 UTC (permalink / raw)
To: netdev
Cc: Denis Kirjanov, Christoph Hellwig, LKML, Jakub Kicinski,
linux-fsdevel, Cong Wang
On Sat, Feb 20, 2021 at 09:08:56PM +0000, Al Viro wrote:
> *shrug*
>
> If anything, __unix_complete_bind() might make a better name for that,
> with dropping ->bindlock also pulled in, but TBH I don't have sufficiently
> strong preferences - might as well leave dropping the lock to caller.
>
> I'll post that series to netdev tonight.
Took longer than I hoped... Anyway, here's the current variant;
it's 5.11-based, lives in
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git misc.af_unix
Shortlog:
Al Viro (8):
af_unix: take address assignment/hash insertion into a new helper
unix_bind(): allocate addr earlier
unix_bind(): separate BSD and abstract cases
unix_bind(): take BSD and abstract address cases into new helpers
fold unix_mknod() into unix_bind_bsd()
unix_bind_bsd(): move done_path_create() call after dealing with ->bindlock
unix_bind_bsd(): unlink if we fail after successful mknod
__unix_find_socket_byname(): don't pass hash and type separately
Diffstat:
net/unix/af_unix.c | 186 +++++++++++++++++++++++++++--------------------------
1 file changed, 94 insertions(+), 92 deletions(-)
The actual fix is in #7/8, the first 6 are massage in preparation to that
and #8/8 is a minor followup cleanup. Individual patches in followups.
Please, review.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/8] af_unix: take address assignment/hash insertion into a new helper 2021-02-22 19:06 [PATCHSET] making unix_bind() undo mknod on failure Al Viro @ 2021-02-22 19:12 ` Al Viro 2021-02-22 19:12 ` [PATCH 4/8] unix_bind(): take BSD and abstract address cases into new helpers Al Viro 0 siblings, 1 reply; 12+ messages in thread From: Al Viro @ 2021-02-22 19:12 UTC (permalink / raw) To: netdev Cc: linux-kernel, Christoph Hellwig, Jakub Kicinski, Denis Kirjanov, linux-fsdevel, Cong Wang Duplicated logics in all bind variants (autobind, bind-to-path, bind-to-abstract) gets taken into a common helper. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> --- net/unix/af_unix.c | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 41c3303c3357..45a40cf7b6af 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -262,6 +262,14 @@ static void __unix_insert_socket(struct hlist_head *list, struct sock *sk) sk_add_node(sk, list); } +static void __unix_set_addr(struct sock *sk, struct unix_address *addr, + unsigned hash) +{ + __unix_remove_socket(sk); + smp_store_release(&unix_sk(sk)->addr, addr); + __unix_insert_socket(&unix_socket_table[hash], sk); +} + static inline void unix_remove_socket(struct sock *sk) { spin_lock(&unix_table_lock); @@ -912,9 +920,7 @@ static int unix_autobind(struct socket *sock) } addr->hash ^= sk->sk_type; - __unix_remove_socket(sk); - smp_store_release(&u->addr, addr); - __unix_insert_socket(&unix_socket_table[addr->hash], sk); + __unix_set_addr(sk, addr, addr->hash); spin_unlock(&unix_table_lock); err = 0; @@ -1016,7 +1022,6 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) int err; unsigned int hash; struct unix_address *addr; - struct hlist_head *list; struct path path = { }; err = -EINVAL; @@ -1068,25 +1073,20 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) hash = d_backing_inode(path.dentry)->i_ino & (UNIX_HASH_SIZE - 1); spin_lock(&unix_table_lock); u->path = path; - list = &unix_socket_table[hash]; } else { spin_lock(&unix_table_lock); err = -EADDRINUSE; if (__unix_find_socket_byname(net, sunaddr, addr_len, sk->sk_type, hash)) { + spin_unlock(&unix_table_lock); unix_release_addr(addr); - goto out_unlock; + goto out_up; } - - list = &unix_socket_table[addr->hash]; + hash = addr->hash; } err = 0; - __unix_remove_socket(sk); - smp_store_release(&u->addr, addr); - __unix_insert_socket(list, sk); - -out_unlock: + __unix_set_addr(sk, addr, hash); spin_unlock(&unix_table_lock); out_up: mutex_unlock(&u->bindlock); -- 2.11.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 4/8] unix_bind(): take BSD and abstract address cases into new helpers 2021-02-22 19:12 ` [PATCH 1/8] af_unix: take address assignment/hash insertion into a new helper Al Viro @ 2021-02-22 19:12 ` Al Viro 0 siblings, 0 replies; 12+ messages in thread From: Al Viro @ 2021-02-22 19:12 UTC (permalink / raw) To: netdev Cc: linux-kernel, Christoph Hellwig, Jakub Kicinski, Denis Kirjanov, linux-fsdevel, Cong Wang unix_bind_bsd() and unix_bind_abstract() respectively. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> --- net/unix/af_unix.c | 147 +++++++++++++++++++++++++++-------------------------- 1 file changed, 74 insertions(+), 73 deletions(-) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 496b069c99fe..56443f05ed9d 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1012,104 +1012,105 @@ static int unix_mknod(const char *sun_path, umode_t mode, struct path *res) return err; } +static int unix_bind_bsd(struct sock *sk, struct unix_address *addr) +{ + struct unix_sock *u = unix_sk(sk); + struct path path = { }; + umode_t mode = S_IFSOCK | + (SOCK_INODE(sk->sk_socket)->i_mode & ~current_umask()); + unsigned int hash; + int err; + + err = unix_mknod(addr->name->sun_path, mode, &path); + if (err) + return err; + + err = mutex_lock_interruptible(&u->bindlock); + if (err) { + path_put(&path); + return err; + } + + if (u->addr) { + mutex_unlock(&u->bindlock); + path_put(&path); + return -EINVAL; + } + + addr->hash = UNIX_HASH_SIZE; + hash = d_backing_inode(path.dentry)->i_ino & (UNIX_HASH_SIZE - 1); + spin_lock(&unix_table_lock); + u->path = path; + __unix_set_addr(sk, addr, hash); + spin_unlock(&unix_table_lock); + mutex_unlock(&u->bindlock); + return 0; +} + +static int unix_bind_abstract(struct sock *sk, unsigned hash, + struct unix_address *addr) +{ + struct unix_sock *u = unix_sk(sk); + int err; + + err = mutex_lock_interruptible(&u->bindlock); + if (err) + return err; + + if (u->addr) { + mutex_unlock(&u->bindlock); + return -EINVAL; + } + + spin_lock(&unix_table_lock); + if (__unix_find_socket_byname(sock_net(sk), addr->name, addr->len, + sk->sk_type, hash)) { + spin_unlock(&unix_table_lock); + mutex_unlock(&u->bindlock); + return -EADDRINUSE; + } + __unix_set_addr(sk, addr, addr->hash); + spin_unlock(&unix_table_lock); + mutex_unlock(&u->bindlock); + return 0; +} + static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) { struct sock *sk = sock->sk; - struct net *net = sock_net(sk); - struct unix_sock *u = unix_sk(sk); struct sockaddr_un *sunaddr = (struct sockaddr_un *)uaddr; char *sun_path = sunaddr->sun_path; int err; unsigned int hash; struct unix_address *addr; - err = -EINVAL; if (addr_len < offsetofend(struct sockaddr_un, sun_family) || sunaddr->sun_family != AF_UNIX) - goto out; + return -EINVAL; - if (addr_len == sizeof(short)) { - err = unix_autobind(sock); - goto out; - } + if (addr_len == sizeof(short)) + return unix_autobind(sock); err = unix_mkname(sunaddr, addr_len, &hash); if (err < 0) - goto out; + return err; addr_len = err; - err = -ENOMEM; addr = kmalloc(sizeof(*addr)+addr_len, GFP_KERNEL); if (!addr) - goto out; + return -ENOMEM; memcpy(addr->name, sunaddr, addr_len); addr->len = addr_len; addr->hash = hash ^ sk->sk_type; refcount_set(&addr->refcnt, 1); - if (sun_path[0]) { - struct path path = { }; - umode_t mode = S_IFSOCK | - (SOCK_INODE(sock)->i_mode & ~current_umask()); - err = unix_mknod(sun_path, mode, &path); - if (err) { - if (err == -EEXIST) - err = -EADDRINUSE; - goto out_addr; - } - - err = mutex_lock_interruptible(&u->bindlock); - if (err) { - path_put(&path); - goto out_addr; - } - - err = -EINVAL; - if (u->addr) { - mutex_unlock(&u->bindlock); - path_put(&path); - goto out_addr; - } - - addr->hash = UNIX_HASH_SIZE; - hash = d_backing_inode(path.dentry)->i_ino & (UNIX_HASH_SIZE - 1); - spin_lock(&unix_table_lock); - u->path = path; - __unix_set_addr(sk, addr, hash); - spin_unlock(&unix_table_lock); - mutex_unlock(&u->bindlock); - addr = NULL; - err = 0; - } else { - err = mutex_lock_interruptible(&u->bindlock); - if (err) - goto out_addr; - - err = -EINVAL; - if (u->addr) { - mutex_unlock(&u->bindlock); - goto out_addr; - } - - spin_lock(&unix_table_lock); - err = -EADDRINUSE; - if (__unix_find_socket_byname(net, sunaddr, addr_len, - sk->sk_type, hash)) { - spin_unlock(&unix_table_lock); - mutex_unlock(&u->bindlock); - goto out_addr; - } - __unix_set_addr(sk, addr, addr->hash); - spin_unlock(&unix_table_lock); - mutex_unlock(&u->bindlock); - addr = NULL; - err = 0; - } -out_addr: - if (addr) + if (sun_path[0]) + err = unix_bind_bsd(sk, addr); + else + err = unix_bind_abstract(sk, hash, addr); + if (err) unix_release_addr(addr); -out: - return err; + return err == -EEXIST ? -EADDRINUSE : err; } static void unix_state_double_lock(struct sock *sk1, struct sock *sk2) -- 2.11.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH] fs: export kern_path_locked @ 2021-01-25 15:49 Denis Kirjanov 2021-01-27 17:57 ` Christoph Hellwig 0 siblings, 1 reply; 12+ messages in thread From: Denis Kirjanov @ 2021-01-25 15:49 UTC (permalink / raw) To: linux-kernel; +Cc: viro, kuba the function is used outside and we have a prototype defined in namei.h Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org> --- fs/namei.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/namei.c b/fs/namei.c index 78443a85480a..3de3b3642302 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -2450,6 +2450,7 @@ struct dentry *kern_path_locked(const char *name, struct path *path) putname(filename); return d; } +EXPORT_SYMBOL(kern_path_locked); int kern_path(const char *name, unsigned int flags, struct path *path) { -- 2.16.4 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] fs: export kern_path_locked 2021-01-25 15:49 [PATCH] fs: export kern_path_locked Denis Kirjanov @ 2021-01-27 17:57 ` Christoph Hellwig [not found] ` <CAOJe8K0MC-TCURE2Gpci1SLnLXCbUkE7q6SS0fznzBA+Pf-B8Q@mail.gmail.com> 0 siblings, 1 reply; 12+ messages in thread From: Christoph Hellwig @ 2021-01-27 17:57 UTC (permalink / raw) To: Denis Kirjanov; +Cc: linux-kernel, viro, kuba On Mon, Jan 25, 2021 at 06:49:37PM +0300, Denis Kirjanov wrote: > the function is used outside and we have a prototype > defined in namei.h Huh? It is only used in drivers/base/devtmpfs.c, and kernel/audit_*.c. Absolutely no need to export it for that. ^ permalink raw reply [flat|nested] 12+ messages in thread
[parent not found: <CAOJe8K0MC-TCURE2Gpci1SLnLXCbUkE7q6SS0fznzBA+Pf-B8Q@mail.gmail.com>]
[parent not found: <20210129082524.GA2282796@infradead.org>]
[parent not found: <CAOJe8K0iG91tm8YBRmE_rdMMMbc4iRsMGYNxJk0p9vEedNHEkg@mail.gmail.com>]
[parent not found: <20210129131855.GA2346744@infradead.org>]
* Re: [PATCH] fs: export kern_path_locked [not found] ` <20210129131855.GA2346744@infradead.org> @ 2021-02-14 18:17 ` Al Viro 2021-02-16 14:31 ` Denis Kirjanov 0 siblings, 1 reply; 12+ messages in thread From: Al Viro @ 2021-02-14 18:17 UTC (permalink / raw) To: Christoph Hellwig Cc: Denis Kirjanov, linux-kernel, Jakub Kicinski, linux-fsdevel On Fri, Jan 29, 2021 at 01:18:55PM +0000, Christoph Hellwig wrote: > On Fri, Jan 29, 2021 at 04:11:05PM +0300, Denis Kirjanov wrote: > > Do you mean just: > > We'll still need to lock the parent inode. Not just "lock", we wouldd need to have the lock _held_ across the entire sequence. Without that there's no warranty that it will refer to the same object we'd created. In any case, unlink in any potentially public area is pretty much never the right approach. Once mknod has happened, that's it - too late to bail out. IIRC, most of the PITA in that area is due to unix_autobind() iteractions. Basically, we try to bind() an unbound socket and another thread does sendmsg() on the same while we are in the middle of ->mknod(). Who should wait for whom? ->mknod() really should be a point of no return - any games with "so we unlink it" are unreliable in the best case, and that's only if we do _not_ unlock the parent through the entire sequence. Seeing that we have separate bindlock and iolock now... How about this (completely untested) delta? diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 41c3303c3357..c21038b15836 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1034,6 +1034,14 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) goto out; addr_len = err; + err = mutex_lock_interruptible(&u->bindlock); + if (err) + goto out; + + err = -EINVAL; + if (u->addr) + goto out_up; + if (sun_path[0]) { umode_t mode = S_IFSOCK | (SOCK_INODE(sock)->i_mode & ~current_umask()); @@ -1041,18 +1049,10 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) if (err) { if (err == -EEXIST) err = -EADDRINUSE; - goto out; + goto out_up; } } - err = mutex_lock_interruptible(&u->bindlock); - if (err) - goto out_put; - - err = -EINVAL; - if (u->addr) - goto out_up; - err = -ENOMEM; addr = kmalloc(sizeof(*addr)+addr_len, GFP_KERNEL); if (!addr) @@ -1090,7 +1090,6 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) spin_unlock(&unix_table_lock); out_up: mutex_unlock(&u->bindlock); -out_put: if (err) path_put(&path); out: ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] fs: export kern_path_locked 2021-02-14 18:17 ` Al Viro @ 2021-02-16 14:31 ` Denis Kirjanov 2021-02-16 18:00 ` Al Viro 0 siblings, 1 reply; 12+ messages in thread From: Denis Kirjanov @ 2021-02-16 14:31 UTC (permalink / raw) To: Al Viro; +Cc: Christoph Hellwig, linux-kernel, Jakub Kicinski, linux-fsdevel On 2/14/21, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Fri, Jan 29, 2021 at 01:18:55PM +0000, Christoph Hellwig wrote: >> On Fri, Jan 29, 2021 at 04:11:05PM +0300, Denis Kirjanov wrote: >> > Do you mean just: >> >> We'll still need to lock the parent inode. > > Not just "lock", we wouldd need to have the lock _held_ across the > entire sequence. Without that there's no warranty that it will refer > to the same object we'd created. > > In any case, unlink in any potentially public area is pretty much > never the right approach. Once mknod has happened, that's it - too > late to bail out. > > IIRC, most of the PITA in that area is due to unix_autobind() > iteractions. Basically, we try to bind() an unbound socket and > another thread does sendmsg() on the same while we are in the > middle of ->mknod(). Who should wait for whom? > > ->mknod() really should be a point of no return - any games with > "so we unlink it" are unreliable in the best case, and that's > only if we do _not_ unlock the parent through the entire sequence. > > Seeing that we have separate bindlock and iolock now... How about > this (completely untested) delta? We had a change like that: Author: WANG Cong <xiyou.wangcong@gmail.com> Date: Mon Jan 23 11:17:35 2017 -0800 af_unix: move unix_mknod() out of bindlock Dmitry reported a deadlock scenario: unix_bind() path: u->bindlock ==> sb_writer do_splice() path: sb_writer ==> pipe->mutex ==> u->bindlock In the unix_bind() code path, unix_mknod() does not have to be done with u->bindlock held, since it is a pure fs operation, so we can just move unix_mknod() out. > > diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c > index 41c3303c3357..c21038b15836 100644 > --- a/net/unix/af_unix.c > +++ b/net/unix/af_unix.c > @@ -1034,6 +1034,14 @@ static int unix_bind(struct socket *sock, struct > sockaddr *uaddr, int addr_len) > goto out; > addr_len = err; > > + err = mutex_lock_interruptible(&u->bindlock); > + if (err) > + goto out; > + > + err = -EINVAL; > + if (u->addr) > + goto out_up; > + > if (sun_path[0]) { > umode_t mode = S_IFSOCK | > (SOCK_INODE(sock)->i_mode & ~current_umask()); > @@ -1041,18 +1049,10 @@ static int unix_bind(struct socket *sock, struct > sockaddr *uaddr, int addr_len) > if (err) { > if (err == -EEXIST) > err = -EADDRINUSE; > - goto out; > + goto out_up; > } > } > > - err = mutex_lock_interruptible(&u->bindlock); > - if (err) > - goto out_put; > - > - err = -EINVAL; > - if (u->addr) > - goto out_up; > - > err = -ENOMEM; > addr = kmalloc(sizeof(*addr)+addr_len, GFP_KERNEL); > if (!addr) > @@ -1090,7 +1090,6 @@ static int unix_bind(struct socket *sock, struct > sockaddr *uaddr, int addr_len) > spin_unlock(&unix_table_lock); > out_up: > mutex_unlock(&u->bindlock); > -out_put: > if (err) > path_put(&path); > out: > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] fs: export kern_path_locked 2021-02-16 14:31 ` Denis Kirjanov @ 2021-02-16 18:00 ` Al Viro 2021-02-19 4:11 ` Al Viro 0 siblings, 1 reply; 12+ messages in thread From: Al Viro @ 2021-02-16 18:00 UTC (permalink / raw) To: Denis Kirjanov Cc: Christoph Hellwig, linux-kernel, Jakub Kicinski, linux-fsdevel On Tue, Feb 16, 2021 at 05:31:33PM +0300, Denis Kirjanov wrote: > We had a change like that: > Author: WANG Cong <xiyou.wangcong@gmail.com> > Date: Mon Jan 23 11:17:35 2017 -0800 > > af_unix: move unix_mknod() out of bindlock > > Dmitry reported a deadlock scenario: > > unix_bind() path: > u->bindlock ==> sb_writer > > do_splice() path: > sb_writer ==> pipe->mutex ==> u->bindlock > > In the unix_bind() code path, unix_mknod() does not have to > be done with u->bindlock held, since it is a pure fs operation, > so we can just move unix_mknod() out. *cringe* I remember now... Process set: P1: bind() of AF_UNIX socket to /mnt/sock P2: splice() from pipe to /mnt/foo P3: freeze /mnt P4: splice() from pipe to AF_UNIX socket P1 grabs ->bindlock P2 sb_start_write() for what's on /mnt P2 grabs rwsem shared P3 blocks in sb_wait_write() trying to grab the same rwsem exclusive P1 sb_start_write() blocks trying to grab the same rwsem shared P4 calls ->splice_write(), aka generic_splice_sendpage() P4 grabs pipe->mutex P4 calls ->sendpage(), aka sock_no_sendpage() P4 calls ->sendmsg(), aka unix_dgram_sendmsg() P4 calls unix_autobind() P4 blocks trying to grab ->bindlock P2 ->splice_write(), aka iter_file_splice_write() P2 blocks trying to grab pipe->mutex DEADLOCK Sigh... OK, so we want something like user_path_create() vfs_mknod() created = true grab bindlock .... drop bindlock if failed && created vfs_unlink() done_path_create() in unix_bind()... That would push ->bindlock all way down in the hierarchy, so that should be deadlock-free, but it looks like that'll be fucking ugly ;-/ Let me try and play with that a bit, maybe it can be massaged to something relatively sane... ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] fs: export kern_path_locked 2021-02-16 18:00 ` Al Viro @ 2021-02-19 4:11 ` Al Viro 2021-02-19 4:21 ` [PATCH 4/8] unix_bind(): take BSD and abstract address cases into new helpers Al Viro 0 siblings, 1 reply; 12+ messages in thread From: Al Viro @ 2021-02-19 4:11 UTC (permalink / raw) To: Denis Kirjanov Cc: Christoph Hellwig, linux-kernel, Jakub Kicinski, linux-fsdevel On Tue, Feb 16, 2021 at 06:00:34PM +0000, Al Viro wrote: > Sigh... OK, so we want something like > user_path_create() > vfs_mknod() > created = true > grab bindlock > .... > drop bindlock > if failed && created > vfs_unlink() > done_path_create() > in unix_bind()... That would push ->bindlock all way down in the hierarchy, > so that should be deadlock-free, but it looks like that'll be fucking ugly ;-/ > > Let me try and play with that a bit, maybe it can be massaged to something > relatively sane... OK... Completely untested series follows. Preliminary massage in first 6 patches, then actual "add cleanup on failure", then minor followup cleanup. af_unix: take address assignment/hash insertion into a new helper unix_bind(): allocate addr earlier unix_bind(): separate BSD and abstract cases unix_bind(): take BSD and abstract address cases into new helpers fold unix_mknod() into unix_bind_bsd() unix_bind_bsd(): move done_path_create() call after dealing with ->bindlock unix_bind_bsd(): unlink if we fail after successful mknod __unix_find_socket_byname(): don't pass hash and type separately Branch is in git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #misc.af_unix, individual patches in followups ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 4/8] unix_bind(): take BSD and abstract address cases into new helpers 2021-02-19 4:11 ` Al Viro @ 2021-02-19 4:21 ` Al Viro 0 siblings, 0 replies; 12+ messages in thread From: Al Viro @ 2021-02-19 4:21 UTC (permalink / raw) To: Denis Kirjanov Cc: Christoph Hellwig, linux-kernel, Jakub Kicinski, linux-fsdevel unix_bind_bsd() and unix_bind_abstract() respectively. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> --- net/unix/af_unix.c | 144 +++++++++++++++++++++++++++-------------------------- 1 file changed, 74 insertions(+), 70 deletions(-) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 11e18b0efbc6..d7aeb4827747 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1013,101 +1013,105 @@ static int unix_mknod(const char *sun_path, umode_t mode, struct path *res) return err; } +static int unix_bind_bsd(struct sock *sk, struct unix_address *addr) +{ + struct unix_sock *u = unix_sk(sk); + struct path path = { }; + umode_t mode = S_IFSOCK | + (SOCK_INODE(sk->sk_socket)->i_mode & ~current_umask()); + unsigned int hash; + int err; + + err = unix_mknod(addr->name->sun_path, mode, &path); + if (err) { + if (err == -EEXIST) + err = -EADDRINUSE; + return err; + } + + err = mutex_lock_interruptible(&u->bindlock); + if (err) { + path_put(&path); + return err; + } + + if (u->addr) { + mutex_unlock(&u->bindlock); + path_put(&path); + return -EINVAL; + } + + addr->hash = UNIX_HASH_SIZE; + hash = d_backing_inode(path.dentry)->i_ino & (UNIX_HASH_SIZE - 1); + spin_lock(&unix_table_lock); + u->path = path; + __unix_set_addr(sk, addr, hash); + mutex_unlock(&u->bindlock); + return 0; +} + +static int unix_bind_abstract(struct sock *sk, unsigned hash, + struct unix_address *addr) +{ + struct unix_sock *u = unix_sk(sk); + int err; + + err = mutex_lock_interruptible(&u->bindlock); + if (err) + return err; + + if (u->addr) { + mutex_unlock(&u->bindlock); + return -EINVAL; + } + + spin_lock(&unix_table_lock); + if (__unix_find_socket_byname(sock_net(sk), addr->name, addr->len, + sk->sk_type, hash)) { + spin_unlock(&unix_table_lock); + mutex_unlock(&u->bindlock); + return -EADDRINUSE; + } + __unix_set_addr(sk, addr, addr->hash); + mutex_unlock(&u->bindlock); + return 0; +} + static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) { struct sock *sk = sock->sk; - struct net *net = sock_net(sk); - struct unix_sock *u = unix_sk(sk); struct sockaddr_un *sunaddr = (struct sockaddr_un *)uaddr; char *sun_path = sunaddr->sun_path; int err; unsigned int hash; struct unix_address *addr; - err = -EINVAL; if (addr_len < offsetofend(struct sockaddr_un, sun_family) || sunaddr->sun_family != AF_UNIX) - goto out; + return -EINVAL; - if (addr_len == sizeof(short)) { - err = unix_autobind(sock); - goto out; - } + if (addr_len == sizeof(short)) + return unix_autobind(sock); err = unix_mkname(sunaddr, addr_len, &hash); if (err < 0) - goto out; + return err; addr_len = err; - err = -ENOMEM; addr = kmalloc(sizeof(*addr)+addr_len, GFP_KERNEL); if (!addr) - goto out; + return -ENOMEM; memcpy(addr->name, sunaddr, addr_len); addr->len = addr_len; addr->hash = hash ^ sk->sk_type; refcount_set(&addr->refcnt, 1); - if (sun_path[0]) { - struct path path = { }; - umode_t mode = S_IFSOCK | - (SOCK_INODE(sock)->i_mode & ~current_umask()); - err = unix_mknod(sun_path, mode, &path); - if (err) { - if (err == -EEXIST) - err = -EADDRINUSE; - goto out_addr; - } - - err = mutex_lock_interruptible(&u->bindlock); - if (err) { - path_put(&path); - goto out_addr; - } - - err = -EINVAL; - if (u->addr) { - mutex_unlock(&u->bindlock); - path_put(&path); - goto out_addr; - } - - addr->hash = UNIX_HASH_SIZE; - hash = d_backing_inode(path.dentry)->i_ino & (UNIX_HASH_SIZE - 1); - spin_lock(&unix_table_lock); - u->path = path; - __unix_set_addr(sk, addr, hash); - mutex_unlock(&u->bindlock); - addr = NULL; - err = 0; - } else { - err = mutex_lock_interruptible(&u->bindlock); - if (err) - goto out_addr; - - err = -EINVAL; - if (u->addr) { - mutex_unlock(&u->bindlock); - goto out_addr; - } - - spin_lock(&unix_table_lock); - err = -EADDRINUSE; - if (__unix_find_socket_byname(net, sunaddr, addr_len, - sk->sk_type, hash)) { - spin_unlock(&unix_table_lock); - mutex_unlock(&u->bindlock); - goto out_addr; - } - __unix_set_addr(sk, addr, addr->hash); - mutex_unlock(&u->bindlock); - addr = NULL; - err = 0; - } -out_addr: - if (addr) + if (sun_path[0]) + err = unix_bind_bsd(sk, addr); + else + err = unix_bind_abstract(sk, hash, addr); + if (err) unix_release_addr(addr); -out: return err; } -- 2.11.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
end of thread, other threads:[~2021-06-21 19:30 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-06-19 3:48 [PATCHSET] AF_UNIX bind(2) cleanup on failures after mknod Al Viro 2021-06-19 3:50 ` [PATCH 1/8] af_unix: take address assignment/hash insertion into a new helper Al Viro 2021-06-19 3:50 ` [PATCH 2/8] unix_bind(): allocate addr earlier Al Viro 2021-06-19 3:50 ` [PATCH 3/8] unix_bind(): separate BSD and abstract cases Al Viro 2021-06-19 3:50 ` [PATCH 4/8] unix_bind(): take BSD and abstract address cases into new helpers Al Viro 2021-06-19 3:50 ` [PATCH 5/8] fold unix_mknod() into unix_bind_bsd() Al Viro 2021-06-19 3:50 ` [PATCH 6/8] unix_bind_bsd(): move done_path_create() call after dealing with ->bindlock Al Viro 2021-06-19 3:50 ` [PATCH 7/8] unix_bind_bsd(): unlink if we fail after successful mknod Al Viro 2021-06-19 3:50 ` [PATCH 8/8] __unix_find_socket_byname(): don't pass hash and type separately Al Viro 2021-06-21 19:30 ` [PATCH 1/8] af_unix: take address assignment/hash insertion into a new helper patchwork-bot+netdevbpf -- strict thread matches above, loose matches on Subject: below -- 2021-02-22 19:06 [PATCHSET] making unix_bind() undo mknod on failure Al Viro 2021-02-22 19:12 ` [PATCH 1/8] af_unix: take address assignment/hash insertion into a new helper Al Viro 2021-02-22 19:12 ` [PATCH 4/8] unix_bind(): take BSD and abstract address cases into new helpers Al Viro 2021-01-25 15:49 [PATCH] fs: export kern_path_locked Denis Kirjanov 2021-01-27 17:57 ` Christoph Hellwig [not found] ` <CAOJe8K0MC-TCURE2Gpci1SLnLXCbUkE7q6SS0fznzBA+Pf-B8Q@mail.gmail.com> [not found] ` <20210129082524.GA2282796@infradead.org> [not found] ` <CAOJe8K0iG91tm8YBRmE_rdMMMbc4iRsMGYNxJk0p9vEedNHEkg@mail.gmail.com> [not found] ` <20210129131855.GA2346744@infradead.org> 2021-02-14 18:17 ` Al Viro 2021-02-16 14:31 ` Denis Kirjanov 2021-02-16 18:00 ` Al Viro 2021-02-19 4:11 ` Al Viro 2021-02-19 4:21 ` [PATCH 4/8] unix_bind(): take BSD and abstract address cases into new helpers Al Viro
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.