* About unix_autobind()
@ 2010-08-21 12:01 Tetsuo Handa
  2010-08-21 12:34 ` Changli Gao
  0 siblings, 1 reply; 16+ messages in thread
From: Tetsuo Handa @ 2010-08-21 12:01 UTC (permalink / raw)
  To: netdev

I was browsing unix_autobind() and wondered what happens if all names in
the abstract namespace of Unix domain sockets are in use.

Below is the relevant part of unix_autobind() from linux-2.6.36-rc1/net/unix/af_unix.c:

710 retry:
711         addr->len = sprintf(addr->name->sun_path+1, "%05x", ordernum) + 1 + sizeof(short);
712         addr->hash = unix_hash_fold(csum_partial(addr->name, addr->len, 0));
713 
714         spin_lock(&unix_table_lock);
715         ordernum = (ordernum+1)&0xFFFFF;
716 
717         if (__unix_find_socket_byname(net, addr->name, addr->len, sock->type,
718                                       addr->hash)) {
719                 spin_unlock(&unix_table_lock);
720                 /* Sanity yield. It is unusual case, but yet... */
721                 if (!(ordernum&0xFF))
722                         yield();
723                 goto retry;
724         }

We can see that unix_autobind() allows 1048576 names.
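(Five hex digits and the 0xFFFFF mask on ordernum give 16^5 = 2^20 = 1048576
possible names.)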

A machine with 256MB RAM:

  # cat /proc/sys/fs/file-max
  24109
  # cat /proc/sys/fs/file-nr
  608 0 24109

A machine with 1736MB RAM:

  # cat /proc/sys/fs/file-max
  174347
  # cat /proc/sys/fs/file-nr
  96 0 174347

/proc/sys/fs/file-max seems to be proportional to the amount of RAM.
I don't have access to a machine with 10GB of RAM (where /proc/sys/fs/file-max
exceeds 1048576 by default), so I manually set

  # echo 1050000 > /proc/sys/fs/file-max

and executed the program below as a non-root user.

---------- Test program start ----------
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	int i;
	for (i = 0; i < 1030; i++) {
		switch (fork()) {
		case 0:
			sleep(5);
			close(0);
			close(1);
			close(2);
			for (i = 0; i < 1024; i++) {
				struct sockaddr_un addr;
				int fd = socket(PF_UNIX, SOCK_DGRAM, 0);
				addr.sun_family = AF_UNIX;
				bind(fd, (struct sockaddr *) &addr, sizeof(addr.sun_family));
			}
			while (1)
				sleep(1000);
		case -1:
			write(1, "fork() failed\n", 14);
			return 1;
		}
	}
	return 0;
}
---------- Test program end ----------
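
(For reference, a rough accounting based on the constants in the program above:
it forks 1030 children, each of which autobinds 1024 datagram sockets by
calling bind() with only sun_family filled in, i.e. up to 1030 * 1024 = 1054720
autobound names, slightly more than the 1048576 names the abstract namespace
can hold, and a comparable number of open socket descriptors, hence the need to
raise /proc/sys/fs/file-max first.)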

When there was not enough memory, the OOM killer was invoked. The OOM killer
killed the /usr/sbin/httpd process rather than the test program above. (Oops. A
non-root user was able to terminate another user's processes via the OOM
killer.)

If there is enough memory (I can't test it), the OOM killer will not be
invoked. But if all names were occupied, I guess a subsequent unix_autobind()
by other users would loop forever, because it loops until a name becomes
available. (I had to type "killall a.out" before the above program occupied all
names, because the machine became very sluggish due to unix_autobind().)

Maybe some safeguard is needed.


* Re: About unix_autobind()
  2010-08-21 12:01 About unix_autobind() Tetsuo Handa
@ 2010-08-21 12:34 ` Changli Gao
  2010-08-30 13:27   ` [PATCH] UNIX: Do not loop forever at unix_autobind() Tetsuo Handa
  0 siblings, 1 reply; 16+ messages in thread
From: Changli Gao @ 2010-08-21 12:34 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: netdev

On Sat, Aug 21, 2010 at 8:01 PM, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
> I was browsing unix_autobind() and wondered what happens if all names in
> the abstract namespace of Unix domain sockets are in use.
>
> Below is the relevant part of unix_autobind() from linux-2.6.36-rc1/net/unix/af_unix.c:
>
> 710 retry:
> 711         addr->len = sprintf(addr->name->sun_path+1, "%05x", ordernum) + 1 + sizeof(short);
> 712         addr->hash = unix_hash_fold(csum_partial(addr->name, addr->len, 0));
> 713
> 714         spin_lock(&unix_table_lock);
> 715         ordernum = (ordernum+1)&0xFFFFF;
> 716
> 717         if (__unix_find_socket_byname(net, addr->name, addr->len, sock->type,
> 718                                       addr->hash)) {
> 719                 spin_unlock(&unix_table_lock);
> 720                 /* Sanity yield. It is unusual case, but yet... */
> 721                 if (!(ordernum&0xFF))
> 722                         yield();
> 723                 goto retry;
> 724         }
>
> We can see that unix_autobind() allows 1048576 names.
>
> A machine with 256MB RAM:
>
>  # cat /proc/sys/fs/file-max
>  24109
>  # cat /proc/sys/fs/file-nr
>  608 0 24109
>
> A machine with 1736MB RAM:
>
>  # cat /proc/sys/fs/file-max
>  174347
>  # cat /proc/sys/fs/file-nr
>  96 0 174347
>
> /proc/sys/fs/file-max seems to be proportional to the amount of RAM.
> I don't have access to a machine with 10GB of RAM (where /proc/sys/fs/file-max
> exceeds 1048576 by default), so I manually set
>
>  # echo 1050000 > /proc/sys/fs/file-max
>
> and executed the program below as a non-root user.
>
> ---------- Test program start ----------
> #include <sys/socket.h>
> #include <sys/un.h>
> #include <unistd.h>
>
> int main(int argc, char *argv[])
> {
>        int i;
>        for (i = 0; i < 1030; i++) {
>                switch (fork()) {
>                case 0:
>                        sleep(5);
>                        close(0);
>                        close(1);
>                        close(2);
>                        for (i = 0; i < 1024; i++) {
>                                struct sockaddr_un addr;
>                                int fd = socket(PF_UNIX, SOCK_DGRAM, 0);
>                                addr.sun_family = AF_UNIX;
>                                bind(fd, (struct sockaddr *) &addr, sizeof(addr.sun_family));
>                        }
>                        while (1)
>                                sleep(1000);
>                case -1:
>                        write(1, "fork() failed\n", 14);
>                        return 1;
>                }
>        }
>        return 0;
> }
> ---------- Test program end ----------
>
> When there was not enough memory, the OOM killer was invoked. The OOM killer
> killed the /usr/sbin/httpd process rather than the test program above. (Oops. A
> non-root user was able to terminate another user's processes via the OOM
> killer.)
>
> If there is enough memory (I can't test it), the OOM killer will not be
> invoked. But if all names were occupied, I guess a subsequent unix_autobind()
> by other users would loop forever, because it loops until a name becomes
> available. (I had to type "killall a.out" before the above program occupied all
> names, because the machine became very sluggish due to unix_autobind().)
>
> Maybe some safeguard is needed.

You could add a counter of the names in use and check it before allocating a new name.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
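
(For illustration, a minimal sketch of the counter idea above; the counter name
and the exact places where it would be incremented and decremented are
assumptions, not an actual proposed patch:)

	/* hypothetical global count of names currently autobound */
	static atomic_t unix_autobind_names = ATOMIC_INIT(0);

	/* in unix_autobind(), before entering the retry loop: refuse early
	 * when the 2^20-name namespace is already fully used */
	if (atomic_read(&unix_autobind_names) >= (1 << 20)) {
		err = -ENOSPC;
		kfree(addr);
		goto out;
	}

	/* atomic_inc() once the new name has been inserted into the hash
	 * table, atomic_dec() when an autobound socket releases its name */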


* [PATCH] UNIX: Do not loop forever at unix_autobind().
  2010-08-21 12:34 ` Changli Gao
@ 2010-08-30 13:27   ` Tetsuo Handa
  2010-09-01 19:47     ` Eric Dumazet
  2010-09-01 21:33     ` How can OOM killer detect process consuming much kernel memory? Tetsuo Handa
  0 siblings, 2 replies; 16+ messages in thread
From: Tetsuo Handa @ 2010-08-30 13:27 UTC (permalink / raw)
  To: netdev

I tried to create and autobind 1048576 UNIX domain sockets using
http://I-love.SAKURA.ne.jp/tmp/unixloop.asm on 2.6.18-194.11.1.el5.x86_64,
and found that it needs about 2GB of RAM. Thus, on systems where
/proc/sys/fs/file-max is larger than 1048576, a local user can create and
autobind 1048576 UNIX domain sockets in order to make other applications fall
into a

  while (1)
  	yield();

loop. Below is a patch (compile-tested only) to avoid falling into this loop.
Is there any reason a name has to be '\0' + 5 letters?
Should I give up after checking 1048576 names rather than after checking
4294967296 names?
----------------------------------------
 From 703b0677bf03afda19665f05fae4850ecf88afe3 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Mon, 30 Aug 2010 22:07:13 +0900
Subject: [PATCH] UNIX: Do not loop forever at unix_autobind().

We assumed that unix_autobind() never fails if kzalloc() succeeded.
But unix_autobind() allows only 1048576 names. If /proc/sys/fs/file-max is
larger than 1048576 (e.g. systems with more than 10GB of RAM), a local user can
consume all names using fork()/socket()/bind().

If all names are in use, those who call bind() with addr_len == sizeof(short)
or connect()/sendmsg() with setsockopt(SO_PASSCRED) will continue

  while (1)
  	yield();

loop at unix_autobind() till a name becomes available.

This patch changes unix_autobind() to check 4294967296 names and also give up
unix_autobind() if all names are in use.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 net/unix/af_unix.c |   13 ++++++++++---
 1 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 4414a18..1271e98 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -692,6 +692,7 @@ static int unix_autobind(struct socket *sock)
 	static u32 ordernum = 1;
 	struct unix_address *addr;
 	int err;
+	u32 loop = 0;
 
 	mutex_lock(&u->readlock);
 
@@ -700,7 +701,7 @@ static int unix_autobind(struct socket *sock)
 		goto out;
 
 	err = -ENOMEM;
-	addr = kzalloc(sizeof(*addr) + sizeof(short) + 16, GFP_KERNEL);
+	addr = kzalloc(sizeof(*addr) + sizeof(short) + 19, GFP_KERNEL);
 	if (!addr)
 		goto out;
 
@@ -708,11 +709,11 @@ static int unix_autobind(struct socket *sock)
 	atomic_set(&addr->refcnt, 1);
 
 retry:
-	addr->len = sprintf(addr->name->sun_path+1, "%05x", ordernum) + 1 + sizeof(short);
+	addr->len = sprintf(addr->name->sun_path+1, "%08x", ordernum) + 1 + sizeof(short);
 	addr->hash = unix_hash_fold(csum_partial(addr->name, addr->len, 0));
 
 	spin_lock(&unix_table_lock);
-	ordernum = (ordernum+1)&0xFFFFF;
+	ordernum++;
 
 	if (__unix_find_socket_byname(net, addr->name, addr->len, sock->type,
 				      addr->hash)) {
@@ -720,6 +721,12 @@ retry:
 		/* Sanity yield. It is unusual case, but yet... */
 		if (!(ordernum&0xFF))
 			yield();
+		/* Give up if all names are in use. */
+		if (unlikely(!++loop)) {
+			err = -EAGAIN;
+			kfree(addr);
+			goto out;
+		}
 		goto retry;
 	}
 	addr->hash ^= sk->sk_type;
-- 
1.6.1


* Re: [PATCH] UNIX: Do not loop forever at unix_autobind().
  2010-08-30 13:27   ` [PATCH] UNIX: Do not loop forever at unix_autobind() Tetsuo Handa
@ 2010-09-01 19:47     ` Eric Dumazet
  2010-09-04  6:58       ` Tetsuo Handa
  2010-09-01 21:33     ` How can OOM killer detect process consuming much kernel memory? Tetsuo Handa
  1 sibling, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2010-09-01 19:47 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: netdev

On Monday, 30 August 2010 at 22:27 +0900, Tetsuo Handa wrote:
> I tried to create and autobind 1048576 UNIX domain sockets using
> http://I-love.SAKURA.ne.jp/tmp/unixloop.asm on 2.6.18-194.11.1.el5.x86_64,
> and found that it needs about 2GB of RAM. Thus, on systems where
> /proc/sys/fs/file-max is larger than 1048576, a local user can create and
> autobind 1048576 UNIX domain sockets in order to make other applications fall
> into a
> 
>   while (1)
>   	yield();
> 
> loop. Below is a patch (compile-tested only) to avoid falling into this loop.
> Is there any reason a name has to be '\0' + 5 letters?
> Should I give up after checking 1048576 names rather than after checking
> 4294967296 names?

Yes please. Fix the bug first.

Then, a follow-up patch to increase the current limit, if necessary.

> ----------------------------------------

>  
>  	err = -ENOMEM;
> -	addr = kzalloc(sizeof(*addr) + sizeof(short) + 16, GFP_KERNEL);
> +	addr = kzalloc(sizeof(*addr) + sizeof(short) + 19, GFP_KERNEL);

IMHO, this 16 or 19 value is wrong; we need less memory than that.

(But this will be addressed in a 2nd patch)

Thanks
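
(For reference, a rough accounting of why less would suffice, assuming the
autobind name layout shown earlier: the stored name is sun_family
(sizeof(short) = 2 bytes) plus a leading '\0' plus five hex digits, so
addr->len = 2 + 1 + 5 = 8, and sprintf() writes one extra trailing NUL.
sizeof(*addr) + sizeof(short) + 7 would therefore already be enough for the
"%05x" format, and + 10 for "%08x", both smaller than the + 16 and + 19 used
above.)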




* How can OOM killer detect process consuming much kernel memory?
  2010-08-30 13:27   ` [PATCH] UNIX: Do not loop forever at unix_autobind() Tetsuo Handa
  2010-09-01 19:47     ` Eric Dumazet
@ 2010-09-01 21:33     ` Tetsuo Handa
  2010-09-01 22:25       ` David Rientjes
  2010-09-03  6:32       ` KOSAKI Motohiro
  1 sibling, 2 replies; 16+ messages in thread
From: Tetsuo Handa @ 2010-09-01 21:33 UTC (permalink / raw)
  To: linux-kernel, netdev, kosaki.motohiro, spender, eric.dumazet

I contacted Brad Spengler regarding the topic below

  http://kerneltrap.org/mailarchive/linux-netdev/2010/8/21/6283491

and received some comments.

Brad Spengler wrote:
> Tetsuo Handa wrote:
> > Brad Spengler wrote:
> > > Nice find, seems like the problem isn't just the exhaustion of the space 
> > > itself, but that a single non-root user can completely exhaust it with 
> > > no limitation (given enough RAM).  There should be some per-user 
> > > resource limit for this (similar to RLIMIT_NPROC) or the space should be 
> > > made less rigid.
> > > 
> > I posted a patch in the third message of that thread, but no response so far.
> > I don't know whether we should allow 4294967296 names or not,
> > because more RAM will be consumed by an attacker if more names are allowed.
> 
> The max size for sun_path is considerably large; why couldn't we encode 
> the uid/namespace into it somehow and reduce the amount each user can 
> allocate?  (conceptually it'd be something like, <namespace 
> uuid><uid><hex to operate on as previously>)  The OOM killer needs to 
> make sure creators of large numbers of these get killed, but 
> implementing a per-user limit on this keeps it from turning into a DoS.

Is it possible to make the OOM killer kill processes that consume kernel
memory rather than userspace memory?

I would be happy if the OOM killer could select the victim process based on both
kernel memory usage and userspace memory usage. For example, opening a file
requires kernel memory allocation (e.g. /sys/kernel/security/tomoyo/self_domain
allocates 8KB of kernel memory). Therefore, I refuse to allow everybody to open
that file even if the content is public (because an attacker could open
/sys/kernel/security/tomoyo/self_domain as many times as possible using
fork() and open() in order to exhaust kernel memory).


* Re: How can OOM killer detect process consuming much kernel memory?
  2010-09-01 21:33     ` How can OOM killer detect process consuming much kernel memory? Tetsuo Handa
@ 2010-09-01 22:25       ` David Rientjes
  2010-09-03  6:32       ` KOSAKI Motohiro
  1 sibling, 0 replies; 16+ messages in thread
From: David Rientjes @ 2010-09-01 22:25 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: linux-kernel, netdev, kosaki.motohiro, spender, eric.dumazet

On Thu, 2 Sep 2010, Tetsuo Handa wrote:

> Is it possible to make the OOM killer kill processes that consume kernel
> memory rather than userspace memory?
>
> I would be happy if the OOM killer could select the victim process based on both
> kernel memory usage and userspace memory usage. For example, opening a file
> requires kernel memory allocation (e.g. /sys/kernel/security/tomoyo/self_domain
> allocates 8KB of kernel memory). Therefore, I refuse to allow everybody to open
> that file even if the content is public (because an attacker could open
> /sys/kernel/security/tomoyo/self_domain as many times as possible using
> fork() and open() in order to exhaust kernel memory).

The oom killer has been rewritten for 2.6.36, so it'd probably help to 
frame it in the context of its new behavior, which you can try out in 
2.6.36-rc3.

The case you're describing would be difficult to kill with the oom killer 
because each fork() and exec() would be considered as separate users of 
memory, so an aggregate of kernel memory amongst these threads wouldn't be 
considered.

The oom killer isn't really concerned with whether the memory was allocated in 
userspace or kernelspace; the only thing that matters is that it's used 
and is unreclaimable (and non-migratable).

The memory controller accounts for user memory, so it's probably not a 
sufficient alternative either in this case.  Is it not possible for the 
kernel to intelligently limit the amount of memory that it can allocate 
through an interface at any given time?


* Re: How can OOM killer detect process consuming much kernel memory?
  2010-09-01 21:33     ` How can OOM killer detect process consuming much kernel memory? Tetsuo Handa
  2010-09-01 22:25       ` David Rientjes
@ 2010-09-03  6:32       ` KOSAKI Motohiro
  1 sibling, 0 replies; 16+ messages in thread
From: KOSAKI Motohiro @ 2010-09-03  6:32 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: kosaki.motohiro, linux-kernel, netdev, spender, eric.dumazet

> Is it possible to make the OOM killer kill processes that consume kernel
> memory rather than userspace memory?
>
> I would be happy if the OOM killer could select the victim process based on both
> kernel memory usage and userspace memory usage. For example, opening a file
> requires kernel memory allocation (e.g. /sys/kernel/security/tomoyo/self_domain
> allocates 8KB of kernel memory). Therefore, I refuse to allow everybody to open
> that file even if the content is public (because an attacker could open
> /sys/kernel/security/tomoyo/self_domain as many times as possible using
> fork() and open() in order to exhaust kernel memory).

Unfortunately, it's impossible. We have no information about which kernel
memory should be bound to the caller task. Almost all kernel memory is
unrelated to any particular task, so we can't blindly bind it to the caller task.

Thanks.






* [PATCH] UNIX: Do not loop forever at unix_autobind().
  2010-09-01 19:47     ` Eric Dumazet
@ 2010-09-04  6:58       ` Tetsuo Handa
  2010-09-04  7:11         ` Eric Dumazet
  0 siblings, 1 reply; 16+ messages in thread
From: Tetsuo Handa @ 2010-09-04  6:58 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev

From a67ccbb8033993df29f26bde9944e37bffe4fc1b Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Sat, 4 Sep 2010 15:22:22 +0900
Subject: [PATCH] UNIX: Do not loop forever at unix_autobind().

We assumed that unix_autobind() never fails if kzalloc() succeeded.
But unix_autobind() allows only 1048576 names. If /proc/sys/fs/file-max is
larger than 1048576 (e.g. systems with more than 10GB of RAM), a local user can
consume all names using fork()/socket()/bind().

If all names are in use, those who call bind() with addr_len == sizeof(short)
or connect()/sendmsg() with setsockopt(SO_PASSCRED) will continue

  while (1)
  	yield();

loop at unix_autobind() till a name becomes available.
This patch changes unix_autobind() to fail if all names are in use.

Note that currently a local user can consume 2GB of kernel memory if the user
is allowed to create and autobind 1048576 UNIX domain sockets. We should
consider adding some restriction for autobind operation.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 net/unix/af_unix.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 4414a18..46fc6b2 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -692,6 +692,7 @@ static int unix_autobind(struct socket *sock)
 	static u32 ordernum = 1;
 	struct unix_address *addr;
 	int err;
+	u32 stop_ordernum;
 
 	mutex_lock(&u->readlock);
 
@@ -706,6 +707,7 @@ static int unix_autobind(struct socket *sock)
 
 	addr->name->sun_family = AF_UNIX;
 	atomic_set(&addr->refcnt, 1);
+	stop_ordernum = ordernum;
 
 retry:
 	addr->len = sprintf(addr->name->sun_path+1, "%05x", ordernum) + 1 + sizeof(short);
@@ -720,6 +722,12 @@ retry:
 		/* Sanity yield. It is unusual case, but yet... */
 		if (!(ordernum&0xFF))
 			yield();
+		/* Give up if all names are in use. */
+		if (ordernum == stop_ordernum) {
+			err = -ENOMEM;
+			kfree(addr);
+			goto out;
+		}
 		goto retry;
 	}
 	addr->hash ^= sk->sk_type;
-- 
1.6.1



* Re: [PATCH] UNIX: Do not loop forever at unix_autobind().
  2010-09-04  6:58       ` Tetsuo Handa
@ 2010-09-04  7:11         ` Eric Dumazet
  2010-09-04  7:40           ` Tetsuo Handa
  2010-09-04 11:52           ` Michał Mirosław
  0 siblings, 2 replies; 16+ messages in thread
From: Eric Dumazet @ 2010-09-04  7:11 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: netdev

On Saturday, 4 September 2010 at 15:58 +0900, Tetsuo Handa wrote:
> From a67ccbb8033993df29f26bde9944e37bffe4fc1b Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Sat, 4 Sep 2010 15:22:22 +0900
> Subject: [PATCH] UNIX: Do not loop forever at unix_autobind().
> 
> We assumed that unix_autobind() never fails if kzalloc() succeeded.
> But unix_autobind() allows only 1048576 names. If /proc/sys/fs/file-max is
> larger than 1048576 (e.g. systems with more than 10GB of RAM), a local user can
> consume all names using fork()/socket()/bind().
> 
> If all names are in use, those who call bind() with addr_len == sizeof(short)
> or connect()/sendmsg() with setsockopt(SO_PASSCRED) will continue
> 
>   while (1)
>   	yield();
> 
> loop at unix_autobind() till a name becomes available.
> This patch changes unix_autobind() to fail if all names are in use.
> 
> Note that currently a local user can consume 2GB of kernel memory if the user
> is allowed to create and autobind 1048576 UNIX domain sockets. We should
> consider adding some restriction for autobind operation.
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> ---
>  net/unix/af_unix.c |    8 ++++++++
>  1 files changed, 8 insertions(+), 0 deletions(-)
> 
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 4414a18..46fc6b2 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -692,6 +692,7 @@ static int unix_autobind(struct socket *sock)
>  	static u32 ordernum = 1;
>  	struct unix_address *addr;
>  	int err;
> +	u32 stop_ordernum;
>  
>  	mutex_lock(&u->readlock);
>  
> @@ -706,6 +707,7 @@ static int unix_autobind(struct socket *sock)
>  
>  	addr->name->sun_family = AF_UNIX;
>  	atomic_set(&addr->refcnt, 1);
> +	stop_ordernum = ordernum;
>  
>  retry:
>  	addr->len = sprintf(addr->name->sun_path+1, "%05x", ordernum) + 1 + sizeof(short);
> @@ -720,6 +722,12 @@ retry:
>  		/* Sanity yield. It is unusual case, but yet... */
>  		if (!(ordernum&0xFF))
>  			yield();
> +		/* Give up if all names are in use. */
> +		if (ordernum == stop_ordernum) {
> +			err = -ENOMEM;
> +			kfree(addr);
> +			goto out;
> +		}
>  		goto retry;
>  	}
>  	addr->hash ^= sk->sk_type;


Sorry, this won't work very well if you have many processes using
autobind(). Some of them will loop many times before hitting
"stop_ordernum".

unsigned int counter;

...

if (++counter == 1<<20) {
	...
}


This is a pathological situation. We are not forced to give a successful
autobind() when so many sockets are in use, even if some slots are
available.

Thanks




* Re: [PATCH] UNIX: Do not loop forever at unix_autobind().
  2010-09-04  7:11         ` Eric Dumazet
@ 2010-09-04  7:40           ` Tetsuo Handa
  2010-09-04  8:24             ` Eric Dumazet
  2010-09-04 11:52           ` Michał Mirosław
  1 sibling, 1 reply; 16+ messages in thread
From: Tetsuo Handa @ 2010-09-04  7:40 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev

Eric Dumazet wrote:
> Sorry, this won't work very well if you have many processes using
> autobind(). Some of them will loop many times before hitting
> "stop_ordernum".

I see. Then we should use a local counter rather than the global counter for
the yield() check, in case there are multiple threads hitting this loop.
----------------------------------------
From 57a49c7b5a39de58d4538b85c60c758be3d2ce4f Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Sat, 4 Sep 2010 16:23:46 +0900
Subject: [PATCH] UNIX: Do not loop forever at unix_autobind().

We assumed that unix_autobind() never fails if kzalloc() succeeded.
But unix_autobind() allows only 1048576 names. If /proc/sys/fs/file-max is
larger than 1048576 (e.g. systems with more than 10GB of RAM), a local user can
consume all names using fork()/socket()/bind().

If all names are in use, those who call bind() with addr_len == sizeof(short)
or connect()/sendmsg() with setsockopt(SO_PASSCRED) will continue

  while (1)
  	yield();

loop at unix_autobind() till a name becomes available.
This patch adds a loop counter in order to give up after 1048576 attempts and
use the loop counter in order to count yield() interval more reliably when
there are many threads hitting this loop.

Note that currently a local user can consume 2GB of kernel memory if the user
is allowed to create and autobind 1048576 UNIX domain sockets. We should
consider adding some restriction for autobind operation.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 net/unix/af_unix.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 4414a18..eedfe50 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -692,6 +692,7 @@ static int unix_autobind(struct socket *sock)
 	static u32 ordernum = 1;
 	struct unix_address *addr;
 	int err;
+	unsigned int retries = 0;
 
 	mutex_lock(&u->readlock);
 
@@ -717,8 +718,14 @@ retry:
 	if (__unix_find_socket_byname(net, addr->name, addr->len, sock->type,
 				      addr->hash)) {
 		spin_unlock(&unix_table_lock);
+		/* Give up if all names seems to be in use. */
+		if (retries++ == 0xFFFFF) {
+			err = -ENOMEM;
+			kfree(addr);
+			goto out;
+		}
 		/* Sanity yield. It is unusual case, but yet... */
-		if (!(ordernum&0xFF))
+		if (!(retries & 0xFF))
 			yield();
 		goto retry;
 	}
-- 
1.6.1



* Re: [PATCH] UNIX: Do not loop forever at unix_autobind().
  2010-09-04  7:40           ` Tetsuo Handa
@ 2010-09-04  8:24             ` Eric Dumazet
  2010-09-04  9:31               ` Tetsuo Handa
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2010-09-04  8:24 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: netdev

On Saturday, 4 September 2010 at 16:40 +0900, Tetsuo Handa wrote:
> Eric Dumazet wrote:
> > Sorry, this wont work very well if you have many processes using
> > autobind(). Some of them will loop many time before hitting
> > "stop_ordernum".
> 
> I see. Then, we should use local counter rather than global counter for yield()
> checking in case there are multiple threads hitting this loop.
> ----------------------------------------
> From 57a49c7b5a39de58d4538b85c60c758be3d2ce4f Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Sat, 4 Sep 2010 16:23:46 +0900
> Subject: [PATCH] UNIX: Do not loop forever at unix_autobind().
> 
> We assumed that unix_autobind() never fails if kzalloc() succeeded.
> But unix_autobind() allows only 1048576 names. If /proc/sys/fs/file-max is
> larger than 1048576 (e.g. systems with more than 10GB of RAM), a local user can
> consume all names using fork()/socket()/bind().
> 
> If all names are in use, those who call bind() with addr_len == sizeof(short)
> or connect()/sendmsg() with setsockopt(SO_PASSCRED) will continue
> 
>   while (1)
>   	yield();
> 
> loop at unix_autobind() till a name becomes available.
> This patch adds a loop counter in order to give up after 1048576 attempts and
> use the loop counter in order to count yield() interval more reliably when
> there are many threads hitting this loop.
> 
> Note that currently a local user can consume 2GB of kernel memory if the user
> is allowed to create and autobind 1048576 UNIX domain sockets. We should
> consider adding some restriction for autobind operation.
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> ---
>  net/unix/af_unix.c |    9 ++++++++-
>  1 files changed, 8 insertions(+), 1 deletions(-)
> 
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 4414a18..eedfe50 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -692,6 +692,7 @@ static int unix_autobind(struct socket *sock)
>  	static u32 ordernum = 1;
>  	struct unix_address *addr;
>  	int err;
> +	unsigned int retries = 0;
>  
>  	mutex_lock(&u->readlock);
>  
> @@ -717,8 +718,14 @@ retry:
>  	if (__unix_find_socket_byname(net, addr->name, addr->len, sock->type,
>  				      addr->hash)) {
>  		spin_unlock(&unix_table_lock);
> +		/* Give up if all names seems to be in use. */
> +		if (retries++ == 0xFFFFF) {
> +			err = -ENOMEM;
> +			kfree(addr);
> +			goto out;
> +		}
>  		/* Sanity yield. It is unusual case, but yet... */
> -		if (!(ordernum&0xFF))
> +		if (!(retries & 0xFF))
>  			yield();
>  		goto retry;
>  	}


Quite frankly, given that __unix_find_socket_byname() can take quite a long
time with one million entries in the table, we should just remove

if (whatever)
	yield();

and use a more friendly :

cond_resched();





* Re: [PATCH] UNIX: Do not loop forever at unix_autobind().
  2010-09-04  8:24             ` Eric Dumazet
@ 2010-09-04  9:31               ` Tetsuo Handa
  2010-09-04 10:55                 ` Eric Dumazet
  0 siblings, 1 reply; 16+ messages in thread
From: Tetsuo Handa @ 2010-09-04  9:31 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev

Eric Dumazet wrote:
> Quite frankly, given that __unix_find_socket_byname() can take quite a long
> time with one million entries in the table, we should just remove
> 
> if (whatever)
> 	yield();
> 
> and use a more friendly :
> 
> cond_resched();
> 
I see. What about this one?
----------------------------------------
From 5c11dcc7dd5063584bc3fbdb059ecdaf12aa38ce Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Sat, 4 Sep 2010 18:02:26 +0900
Subject: [PATCH] UNIX: Do not loop forever at unix_autobind().

We assumed that unix_autobind() never fails if kzalloc() succeeded.
But unix_autobind() allows only 1048576 names. If /proc/sys/fs/file-max is
larger than 1048576 (e.g. systems with more than 10GB of RAM), a local user can
consume all names using fork()/socket()/bind().

If all names are in use, those who call bind() with addr_len == sizeof(short)
or connect()/sendmsg() with setsockopt(SO_PASSCRED) will continue

  while (1)
        yield();

loop at unix_autobind() till a name becomes available.
This patch adds a loop counter in order to give up after 1048576 attempts.

Calling yield() for once per 256 attempts may not be sufficient when many names
are already in use, for __unix_find_socket_byname() can take long time under
such circumstance. Therefore, this patch also adds cond_resched() call.

Note that currently a local user can consume 2GB of kernel memory if the user
is allowed to create and autobind 1048576 UNIX domain sockets. We should
consider adding some restriction for autobind operation.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 net/unix/af_unix.c |   15 ++++++++++++---
 1 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 4414a18..1ef37ca 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -692,6 +692,7 @@ static int unix_autobind(struct socket *sock)
 	static u32 ordernum = 1;
 	struct unix_address *addr;
 	int err;
+	unsigned int retries = 0;
 
 	mutex_lock(&u->readlock);
 
@@ -717,9 +718,17 @@ retry:
 	if (__unix_find_socket_byname(net, addr->name, addr->len, sock->type,
 				      addr->hash)) {
 		spin_unlock(&unix_table_lock);
-		/* Sanity yield. It is unusual case, but yet... */
-		if (!(ordernum&0xFF))
-			yield();
+		/*
+		 * __unix_find_socket_byname() may take long time if many names
+		 * are already in use.
+		 */
+		cond_resched();
+		/* Give up if all names seems to be in use. */
+		if (retries++ == 0xFFFFF) {
+			err = -ENOMEM;
+			kfree(addr);
+			goto out;
+		}
 		goto retry;
 	}
 	addr->hash ^= sk->sk_type;
-- 
1.6.1



* Re: [PATCH] UNIX: Do not loop forever at unix_autobind().
  2010-09-04  9:31               ` Tetsuo Handa
@ 2010-09-04 10:55                 ` Eric Dumazet
  2010-09-04 11:34                   ` Tetsuo Handa
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2010-09-04 10:55 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: netdev

On Saturday, 4 September 2010 at 18:31 +0900, Tetsuo Handa wrote:
> +		cond_resched();
> +		/* Give up if all names seems to be in use. */
> +		if (retries++ == 0xFFFFF) {
> +			err = -ENOMEM;
> +			kfree(addr);
> +			goto out;
> +		}
>  		goto retry;
>  	}
>  	addr->hash ^= sk->sk_type;

Yes, but please use a different error code, it's not ENOMEM...
maybe EBUSY or ENOSPC ...





* Re: [PATCH] UNIX: Do not loop forever at unix_autobind().
  2010-09-04 10:55                 ` Eric Dumazet
@ 2010-09-04 11:34                   ` Tetsuo Handa
  2010-09-07  1:45                     ` David Miller
  0 siblings, 1 reply; 16+ messages in thread
From: Tetsuo Handa @ 2010-09-04 11:34 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev

Eric Dumazet wrote:
> On Saturday, 4 September 2010 at 18:31 +0900, Tetsuo Handa wrote:
> > +		cond_resched();
> > +		/* Give up if all names seems to be in use. */
> > +		if (retries++ == 0xFFFFF) {
> > +			err = -ENOMEM;
> > +			kfree(addr);
> > +			goto out;
> > +		}
> >  		goto retry;
> >  	}
> >  	addr->hash ^= sk->sk_type;
> 
> Yes, but please use a different error code, it's not ENOMEM...
> maybe EBUSY or ENOSPC ...

OK. I choose -ENOSPC.
----------------------------------------
From c3ea4d8b28618dff235621ff1cb62e6a17aab423 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Sat, 4 Sep 2010 20:27:31 +0900
Subject: [PATCH] UNIX: Do not loop forever at unix_autobind().

We assumed that unix_autobind() never fails if kzalloc() succeeded.
But unix_autobind() allows only 1048576 names. If /proc/sys/fs/file-max is
larger than 1048576 (e.g. systems with more than 10GB of RAM), a local user can
consume all names using fork()/socket()/bind().

If all names are in use, those who call bind() with addr_len == sizeof(short)
or connect()/sendmsg() with setsockopt(SO_PASSCRED) will continue

  while (1)
        yield();

loop at unix_autobind() till a name becomes available.
This patch adds a loop counter in order to give up after 1048576 attempts.

Calling yield() for once per 256 attempts may not be sufficient when many names
are already in use, for __unix_find_socket_byname() can take long time under
such circumstance. Therefore, this patch also adds cond_resched() call.

Note that currently a local user can consume 2GB of kernel memory if the user
is allowed to create and autobind 1048576 UNIX domain sockets. We should
consider adding some restriction for autobind operation.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 net/unix/af_unix.c |   15 ++++++++++++---
 1 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 4414a18..0b39b24 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -692,6 +692,7 @@ static int unix_autobind(struct socket *sock)
 	static u32 ordernum = 1;
 	struct unix_address *addr;
 	int err;
+	unsigned int retries = 0;
 
 	mutex_lock(&u->readlock);
 
@@ -717,9 +718,17 @@ retry:
 	if (__unix_find_socket_byname(net, addr->name, addr->len, sock->type,
 				      addr->hash)) {
 		spin_unlock(&unix_table_lock);
-		/* Sanity yield. It is unusual case, but yet... */
-		if (!(ordernum&0xFF))
-			yield();
+		/*
+		 * __unix_find_socket_byname() may take long time if many names
+		 * are already in use.
+		 */
+		cond_resched();
+		/* Give up if all names seems to be in use. */
+		if (retries++ == 0xFFFFF) {
+			err = -ENOSPC;
+			kfree(addr);
+			goto out;
+		}
 		goto retry;
 	}
 	addr->hash ^= sk->sk_type;
-- 
1.6.1



* Re: [PATCH] UNIX: Do not loop forever at unix_autobind().
  2010-09-04  7:11         ` Eric Dumazet
  2010-09-04  7:40           ` Tetsuo Handa
@ 2010-09-04 11:52           ` Michał Mirosław
  1 sibling, 0 replies; 16+ messages in thread
From: Michał Mirosław @ 2010-09-04 11:52 UTC (permalink / raw)
  To: Eric Dumazet, Tetsuo Handa, netdev

2010/9/4 Eric Dumazet <eric.dumazet@gmail.com>:
> On Saturday, 4 September 2010 at 15:58 +0900, Tetsuo Handa wrote:
>> From a67ccbb8033993df29f26bde9944e37bffe4fc1b Mon Sep 17 00:00:00 2001
>> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>> Date: Sat, 4 Sep 2010 15:22:22 +0900
>> Subject: [PATCH] UNIX: Do not loop forever at unix_autobind().
>>
>> We assumed that unix_autobind() never fails if kzalloc() succeeded.
>> But unix_autobind() allows only 1048576 names. If /proc/sys/fs/file-max is
>> larger than 1048576 (e.g. systems with more than 10GB of RAM), a local user can
>> consume all names using fork()/socket()/bind().
>>
>> If all names are in use, those who call bind() with addr_len == sizeof(short)
>> or connect()/sendmsg() with setsockopt(SO_PASSCRED) will continue
>>
>>   while (1)
>>       yield();
>>
>> loop at unix_autobind() till a name becomes available.
>> This patch changes unix_autobind() to fail if all names are in use.
>>
>> Note that currently a local user can consume 2GB of kernel memory if the user
>> is allowed to create and autobind 1048576 UNIX domain sockets. We should
>> consider adding some restriction for autobind operation.
[cut patch]
> Sorry, this wont work very well if you have many processes using
> autobind(). Some of them will loop many time before hitting
> "stop_ordernum".
>
> unsigned int counter;
>
> ...
>
> if (++maxtries == 1<<20) {
>        ...
> }
>
>
> This is a pathological situation. We are not forced to give a successful
> autobind() when so many sockets are in use, even if some slots are
> available.

Is there any specific requirement on the names generated for autobind?
Wouldn't it be easier and more efficient to use some pseudo-random name,
e.g. derived from the current time and/or the owning process's state?

Best Regards,
Michał Mirosław
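
(For illustration, a minimal sketch of the pseudo-random variant suggested
above, using get_random_bytes(); this is not a proposed patch, and the
collision check would still be needed, only with far fewer expected retries:)

	u32 rnd;

	get_random_bytes(&rnd, sizeof(rnd));
	addr->len = sprintf(addr->name->sun_path + 1, "%08x", rnd) +
		    1 + sizeof(short);
	addr->hash = unix_hash_fold(csum_partial(addr->name, addr->len, 0));
	/* look the name up as before; on a (now unlikely) collision, pick a
	 * new random value and retry a bounded number of times */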


* Re: [PATCH] UNIX: Do not loop forever at unix_autobind().
  2010-09-04 11:34                   ` Tetsuo Handa
@ 2010-09-07  1:45                     ` David Miller
  0 siblings, 0 replies; 16+ messages in thread
From: David Miller @ 2010-09-07  1:45 UTC (permalink / raw)
  To: penguin-kernel; +Cc: eric.dumazet, netdev

From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Sat, 04 Sep 2010 20:34:28 +0900

> Subject: [PATCH] UNIX: Do not loop forever at unix_autobind().
> 
> We assumed that unix_autobind() never fails if kzalloc() succeeded.
> But unix_autobind() allows only 1048576 names. If /proc/sys/fs/file-max is
> larger than 1048576 (e.g. systems with more than 10GB of RAM), a local user can
> consume all names using fork()/socket()/bind().
> 
> If all names are in use, those who call bind() with addr_len == sizeof(short)
> or connect()/sendmsg() with setsockopt(SO_PASSCRED) will continue
> 
>   while (1)
>         yield();
> 
> loop at unix_autobind() till a name becomes available.
> This patch adds a loop counter in order to give up after 1048576 attempts.
> 
> Calling yield() for once per 256 attempts may not be sufficient when many names
> are already in use, for __unix_find_socket_byname() can take long time under
> such circumstance. Therefore, this patch also adds cond_resched() call.
> 
> Note that currently a local user can consume 2GB of kernel memory if the user
> is allowed to create and autobind 1048576 UNIX domain sockets. We should
> consider adding some restriction for autobind operation.
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

Applied and queued up for -stable, thanks!
