linux-kernel.vger.kernel.org archive mirror
* Re: [RFC PATCH 05/27] containers: Open a socket inside a container
       [not found] <m2o8z7t2w5.fsf@badgerous.net>
@ 2019-09-27 14:46 ` Eric W. Biederman
  2019-09-28 22:29   ` Alun Evans
  0 siblings, 1 reply; 5+ messages in thread
From: Eric W. Biederman @ 2019-09-27 14:46 UTC
  To: Alun Evans; +Cc: linux-kernel

Alun Evans <alun@badgerous.net> writes:

> Hi Eric,
>
>
> On Tue, 19 Feb 2019, Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> David Howells <dhowells@redhat.com> writes:
>>
>> > Provide a system call to open a socket inside of a container, using that
>> > container's network namespace.  This allows netlink to be used to manage
>> > the container.
>> >
>> > 	fd = container_socket(int container_fd,
>> > 			      int domain, int type, int protocol);
>> >
>>
>> Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>
>> Use a namespace file descriptor if you need this.  So far we have not
>> added this system call as it is just a performance optimization.  And it
>> has been too niche to matter.
>>
>> If that has changed we can add this separately from everything else
>> you are doing here.
>
> I think I've found the niche.
>
>
> I'm trying to use network namespaces from Go.

Yes. Go sucks for this.

> Since setns is thread
> specific, I'm forced to use this pattern:
>
>     runtime.LockOSThread()
>     defer runtime.UnlockOSThread()
>     …
>     err = netns.Set(newns)
>
>
> This is only safe recently:
> https://github.com/vishvananda/netns/issues/17#issuecomment-367325770
>
> - but is still less than ideal performance wise, as it locks out other
>   socket operations.
>
> The socketat() / socketns() would be ideal:
>
>   https://lwn.net/Articles/406684/
>   https://lwn.net/Articles/407495/
>   https://lkml.org/lkml/2011/10/3/220
>
>
> One thing that is interesting, the LockOSThread works pretty well for
> receiving, since I can wrap it around the socket()/bind()/listen() at
> startup. Then accept() can run outside of the lock.
>
> It's creating new outbound tcp connections via socket()/connect() pairs
> that is the issue.

As I understand it, you should be able to write socketat in Go something like:

	runtime.LockOSThread()
	err = netns.Set(newns);
	fd = socket(...);
	err = netns.Set(defaultns);
	runtime.UnlockOSThread()

I have no real objections to a kernel system call doing that.  It has
just never risen to the level where it was necessary to optimize
userspace yet.
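
For comparison, the same setns()/socket()/setns() sequence can be
sketched in C, where a thread stays on its kernel thread without any
runtime support.  This is only a sketch under assumptions:
socketat_emul() is a hypothetical name, not an existing libc or kernel
interface, and the setns() calls need CAP_SYS_ADMIN.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical userspace emulation of socketat(): create a socket in the
 * network namespace referred to by netns_fd, then switch the calling
 * thread back to the namespace it started in.
 * Returns the new socket fd, or -1 with errno set.
 */
int socketat_emul(int netns_fd, int domain, int type, int protocol)
{
	int fd, saved_errno;
	/* Remember the calling thread's current network namespace. */
	int orig = open("/proc/thread-self/ns/net", O_RDONLY | O_CLOEXEC);

	if (orig < 0)
		return -1;
	/* open() returns the lowest free descriptor, so equality here would
	 * mean netns_fd was not open when we were called. */
	assert(orig != netns_fd);

	if (setns(netns_fd, CLONE_NEWNET) < 0) {
		saved_errno = errno;
		close(orig);
		errno = saved_errno;
		return -1;
	}

	fd = socket(domain, type, protocol);
	saved_errno = errno;

	/* A thread left in the wrong namespace cannot safely continue. */
	if (setns(orig, CLONE_NEWNET) < 0)
		abort();

	close(orig);
	errno = saved_errno;
	return fd;
}
```

A caller would typically obtain netns_fd by opening a namespace file
such as /proc/<pid>/ns/net.  Each call costs an open(), two setns()
calls and a close() on top of socket(), which is the overhead a
dedicated system call would remove.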

Eric





* Re: [RFC PATCH 05/27] containers: Open a socket inside a container
  2019-09-27 14:46 ` [RFC PATCH 05/27] containers: Open a socket inside a container Eric W. Biederman
@ 2019-09-28 22:29   ` Alun Evans
  2019-09-30 10:02     ` Eric W. Biederman
  0 siblings, 1 reply; 5+ messages in thread
From: Alun Evans @ 2019-09-28 22:29 UTC
  To: Eric W. Biederman; +Cc: linux-kernel



On Fri 27 Sep '19 at 07:46 ebiederm@xmission.com (Eric W. Biederman) wrote:
> 
> Alun Evans <alun@badgerous.net> writes:
>
>> Hi Eric,
>>
>>
>> On Tue, 19 Feb 2019, Eric W. Biederman <ebiederm@xmission.com> wrote:
>>>
>>> David Howells <dhowells@redhat.com> writes:
>>>
>>> > Provide a system call to open a socket inside of a container, using that
>>> > container's network namespace.  This allows netlink to be used to manage
>>> > the container.
>>> >
>>> > 	fd = container_socket(int container_fd,
>>> > 			      int domain, int type, int protocol);
>>> >
>>>
>>> Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>>
>>> Use a namespace file descriptor if you need this.  So far we have not
>>> added this system call as it is just a performance optimization.  And it
>>> has been too niche to matter.
>>>
>>> If that has changed we can add this separately from everything else
>>> you are doing here.
>>
>> I think I've found the niche.
>>
>>
>> I'm trying to use network namespaces from Go.
>
> Yes. Go sucks for this.

Haha... Neither confirm nor deny.

>> Since setns is thread
>> specific, I'm forced to use this pattern:
>>
>>     runtime.LockOSThread()
>>     defer runtime.UnlockOSThread()
>>     …
>>     err = netns.Set(newns)
>>
>>
>> This is only safe recently:
>> https://github.com/vishvananda/netns/issues/17#issuecomment-367325770
>>
>> - but is still less than ideal performance wise, as it locks out other
>>   socket operations.
>>
>> The socketat() / socketns() would be ideal:
>>
>>   https://lwn.net/Articles/406684/
>>   https://lwn.net/Articles/407495/
>>   https://lkml.org/lkml/2011/10/3/220
>>
>>
>> One thing that is interesting, the LockOSThread works pretty well for
>> receiving, since I can wrap it around the socket()/bind()/listen() at
>> startup. Then accept() can run outside of the lock.
>>
>> It's creating new outbound tcp connections via socket()/connect() pairs
>> that is the issue.
>
> As I understand it, you should be able to write socketat in Go something like:
>
>         runtime.LockOSThread()
>         err = netns.Set(newns);
>         fd = socket(...);
>         err = netns.Set(defaultns);
>         runtime.UnlockOSThread()

Yeah, this is currently what I'm having to do. It's painful because,
due to the Go runtime model of a single OS netpoller thread, locking the
OS thread to the current goroutine blocks out the other goroutines doing
network I/O.

> I have no real objections to a kernel system call doing that.  It has
> just never risen to the level where it was necessary to optimize
> userspace yet.

Would you be able to accept the patch from this thread with the
container API?

    fd = container_socket(int container_fd,
                          int domain, int type, int protocol);

I think that seems more coherent with the rest of the container world
than a follow up of https://lkml.org/lkml/2011/10/3/220 :

    int socketns(int namespace, int domain, int type, int protocol)


I could also put some up if required.


A.


-- 
Alun Evans.


* Re: [RFC PATCH 05/27] containers: Open a socket inside a container
  2019-09-28 22:29   ` Alun Evans
@ 2019-09-30 10:02     ` Eric W. Biederman
  0 siblings, 0 replies; 5+ messages in thread
From: Eric W. Biederman @ 2019-09-30 10:02 UTC
  To: Alun Evans; +Cc: linux-kernel

Alun Evans <alun@badgerous.net> writes:

> On Fri 27 Sep '19 at 07:46 ebiederm@xmission.com (Eric W. Biederman) wrote:
>> 
>> Alun Evans <alun@badgerous.net> writes:
>>
>>> Hi Eric,
>>>
>>>
>>> On Tue, 19 Feb 2019, Eric W. Biederman <ebiederm@xmission.com> wrote:
>>>>
>>>> David Howells <dhowells@redhat.com> writes:
>>>>
>>>> > Provide a system call to open a socket inside of a container, using that
>>>> > container's network namespace.  This allows netlink to be used to manage
>>>> > the container.
>>>> >
>>>> > 	fd = container_socket(int container_fd,
>>>> > 			      int domain, int type, int protocol);
>>>> >
>>>>
>>>> Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>>>
>>>> Use a namespace file descriptor if you need this.  So far we have not
>>>> added this system call as it is just a performance optimization.  And it
>>>> has been too niche to matter.
>>>>
>>>> If that has changed we can add this separately from everything else
>>>> you are doing here.
>>>
>>> I think I've found the niche.
>>>
>>>
>>> I'm trying to use network namespaces from Go.
>>
>> Yes. Go sucks for this.
>
> Haha... Neither confirm nor deny.
>
>>> Since setns is thread
>>> specific, I'm forced to use this pattern:
>>>
>>>     runtime.LockOSThread()
>>>     defer runtime.UnlockOSThread()
>>>     …
>>>     err = netns.Set(newns)
>>>
>>>
>>> This is only safe recently:
>>> https://github.com/vishvananda/netns/issues/17#issuecomment-367325770
>>>
>>> - but is still less than ideal performance wise, as it locks out other
>>>   socket operations.
>>>
>>> The socketat() / socketns() would be ideal:
>>>
>>>   https://lwn.net/Articles/406684/
>>>   https://lwn.net/Articles/407495/
>>>   https://lkml.org/lkml/2011/10/3/220
>>>
>>>
>>> One thing that is interesting, the LockOSThread works pretty well for
>>> receiving, since I can wrap it around the socket()/bind()/listen() at
>>> startup. Then accept() can run outside of the lock.
>>>
>>> It's creating new outbound tcp connections via socket()/connect() pairs
>>> that is the issue.
>>
>> As I understand it, you should be able to write socketat in Go something like:
>>
>>         runtime.LockOSThread()
>>         err = netns.Set(newns);
>>         fd = socket(...);
>>         err = netns.Set(defaultns);
>>         runtime.UnlockOSThread()
>
> Yeah, this is currently what I'm having to do. It's painful because,
> due to the Go runtime model of a single OS netpoller thread, locking the
> OS thread to the current goroutine blocks out the other goroutines doing
> network I/O.

Just to be clear, you know that only the setns and the socket calls need
to block out switching threads, and all of those should currently be
quite fast.

Hmm.  So this is a global Go lock and not simply locking the current
goroutine onto its current kernel thread?  Yes, that does sound quite
painful.

It would be very nice if Go could provide an idiom where a series of
calls could be fixed to a single kernel thread.

>> I have no real objections to a kernel system call doing that.  It has
>> just never risen to the level where it was necessary to optimize
>> userspace yet.
>
> Would you be able to accept the patch from this thread with the
> container API?
>
>     fd = container_socket(int container_fd,
>                           int domain, int type, int protocol);
>
> I think that seems more coherent with the rest of the container world
> than a follow up of https://lkml.org/lkml/2011/10/3/220 :
>

Given container_socket implies the need to create a namespace of
namespaces. No.

Given that container_socket can't be used in iptools because it has
a different concept of container.  No.

Given that no one has ever proposed solving the entire migration story
when they have wanted to define a container, and thus all of this
implies breaking CRIU.  No.

>     int socketns(int netns_fd, int domain, int type, int protocol)
>

Yes please.

I suspect in the current world where system calls are much more
expensive (because of mitigations for speculative execution bugs) with a
little bit of timing we could come up with a reasonable case even for
non-Go runtimes.

To that end I would like to see performance numbers of at least a micro
benchmark in C.  Just so we can quantify the improvement.
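
One possible shape for such a micro benchmark is sketched below.  All
names in it (plain_socket, setns_socket, struct ns_pair,
bench_ns_per_call) are hypothetical, and it only measures the existing
userspace sequence against a plain socket() call; it cannot measure a
socketns() system call that does not exist yet.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Namespace fds opened once, outside the timed loop. */
struct ns_pair {
	int target_fd;	/* namespace to create the socket in */
	int orig_fd;	/* namespace to return to afterwards */
};

/* Baseline: one TCP socket in the thread's current namespace. */
int plain_socket(void *ctx)
{
	(void)ctx;
	return socket(AF_INET, SOCK_STREAM, 0);
}

/* Userspace path: setns(), socket(), setns() back. */
int setns_socket(void *ctx)
{
	struct ns_pair *p = ctx;
	int fd;

	if (setns(p->target_fd, CLONE_NEWNET) < 0)
		return -1;
	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (setns(p->orig_fd, CLONE_NEWNET) < 0) {
		/* Could not switch back: report failure to stop the run. */
		if (fd >= 0)
			close(fd);
		return -1;
	}
	return fd;
}

/*
 * Average wall-clock nanoseconds per iteration of fn() followed by
 * close(), or -1.0 if any call to fn() fails.
 */
double bench_ns_per_call(int (*fn)(void *), void *ctx, int iters)
{
	struct timespec t0, t1;
	int i;

	assert(iters > 0);
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < iters; i++) {
		int fd = fn(ctx);

		if (fd < 0)
			return -1.0;
		close(fd);
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	return ((t1.tv_sec - t0.tv_sec) * 1e9 +
		(t1.tv_nsec - t0.tv_nsec)) / iters;
}
```

Both paths include the close() in the timed loop, so the difference
between the two averages approximates the cost of the two extra setns()
calls.  Running setns_socket() requires CAP_SYS_ADMIN and two open
namespace descriptors, for example from /proc/thread-self/ns/net and
from another process's /proc/<pid>/ns/net.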

Eric


* Re: [RFC PATCH 05/27] containers: Open a socket inside a container
  2019-02-15 16:07 ` [RFC PATCH 05/27] containers: Open a socket inside a container David Howells
@ 2019-02-19 16:41   ` Eric W. Biederman
  0 siblings, 0 replies; 5+ messages in thread
From: Eric W. Biederman @ 2019-02-19 16:41 UTC
  To: David Howells
  Cc: keyrings, trond.myklebust, sfrench, linux-security-module,
	linux-nfs, linux-cifs, linux-fsdevel, rgb, linux-kernel

David Howells <dhowells@redhat.com> writes:

> Provide a system call to open a socket inside of a container, using that
> container's network namespace.  This allows netlink to be used to manage
> the container.
>
> 	fd = container_socket(int container_fd,
> 			      int domain, int type, int protocol);
>

Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>

Use a namespace file descriptor if you need this.  So far we have not
added this system call as it is just a performance optimization.  And it
has been too niche to matter.

If that has changed we can add this separately from everything else
you are doing here.



* [RFC PATCH 05/27] containers: Open a socket inside a container
  2019-02-15 16:07 [RFC PATCH 00/27] Containers and using authenticated filesystems David Howells
@ 2019-02-15 16:07 ` David Howells
  2019-02-19 16:41   ` Eric W. Biederman
  0 siblings, 1 reply; 5+ messages in thread
From: David Howells @ 2019-02-15 16:07 UTC
  To: keyrings, trond.myklebust, sfrench
  Cc: linux-security-module, linux-nfs, linux-cifs, linux-fsdevel, rgb,
	dhowells, linux-kernel

Provide a system call to open a socket inside of a container, using that
container's network namespace.  This allows netlink to be used to manage
the container.

	fd = container_socket(int container_fd,
			      int domain, int type, int protocol);

Signed-off-by: David Howells <dhowells@redhat.com>
---

 arch/x86/entry/syscalls/syscall_32.tbl |    1 +
 arch/x86/entry/syscalls/syscall_64.tbl |    1 +
 include/linux/socket.h                 |    3 ++-
 include/linux/syscalls.h               |    2 ++
 kernel/sys_ni.c                        |    1 +
 net/compat.c                           |    2 +-
 net/socket.c                           |   34 +++++++++++++++++++++++++++-----
 7 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 8666693510f9..f4c9beff77a6 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -409,3 +409,4 @@
 395	i386	sb_notify		sys_sb_notify			__ia32_sys_sb_notify
 396	i386	container_create	sys_container_create		__ia32_sys_container_create
 397	i386	fork_into_container	sys_fork_into_container		__ia32_sys_fork_into_container
+398	i386	container_socket	sys_container_socket		__ia32_sys_container_socket
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index d40d4790fcb2..e20cdf7b5527 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -354,6 +354,7 @@
 343	common	sb_notify		__x64_sys_sb_notify
 344	common	container_create	__x64_sys_container_create
 345	common	fork_into_container	__x64_sys_fork_into_container
+346	common	container_socket	__x64_sys_container_socket
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/socket.h b/include/linux/socket.h
index ab2041a00e01..154ac900a8a5 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -10,6 +10,7 @@
 #include <linux/compiler.h>		/* __user			*/
 #include <uapi/linux/socket.h>
 
+struct net;
 struct pid;
 struct cred;
 
@@ -376,7 +377,7 @@ extern int __sys_sendto(int fd, void __user *buff, size_t len,
 			int addr_len);
 extern int __sys_accept4(int fd, struct sockaddr __user *upeer_sockaddr,
 			 int __user *upeer_addrlen, int flags);
-extern int __sys_socket(int family, int type, int protocol);
+extern int __sys_socket(struct net *net, int family, int type, int protocol);
 extern int __sys_bind(int fd, struct sockaddr __user *umyaddr, int addrlen);
 extern int __sys_connect(int fd, struct sockaddr __user *uservaddr,
 			 int addrlen);
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 15e5cc704df3..547334c6ffc2 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -947,6 +947,8 @@ asmlinkage long sys_container_create(const char __user *name, unsigned int flags
 				     unsigned long spare3, unsigned long spare4,
 				     unsigned long spare5);
 asmlinkage long sys_fork_into_container(int containerfd);
+asmlinkage long sys_container_socket(int containerfd,
+				     int domain, int type, int protocol);
 
 /*
  * Architecture-specific system calls
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index a23ad529d548..ce9c5bb30e7f 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -236,6 +236,7 @@ COND_SYSCALL(shmdt);
 /* net/socket.c */
 COND_SYSCALL(socket);
 COND_SYSCALL(socketpair);
+COND_SYSCALL(container_socket);
 COND_SYSCALL(bind);
 COND_SYSCALL(listen);
 COND_SYSCALL(accept);
diff --git a/net/compat.c b/net/compat.c
index 959d1c51826d..1b2db740fd33 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -856,7 +856,7 @@ COMPAT_SYSCALL_DEFINE2(socketcall, int, call, u32 __user *, args)
 
 	switch (call) {
 	case SYS_SOCKET:
-		ret = __sys_socket(a0, a1, a[2]);
+		ret = __sys_socket(current->nsproxy->net_ns, a0, a1, a[2]);
 		break;
 	case SYS_BIND:
 		ret = __sys_bind(a0, compat_ptr(a1), a[2]);
diff --git a/net/socket.c b/net/socket.c
index 7d271a1d0c7e..7406580598b9 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -80,6 +80,7 @@
 #include <linux/highmem.h>
 #include <linux/mount.h>
 #include <linux/fs_context.h>
+#include <linux/container.h>
 #include <linux/security.h>
 #include <linux/syscalls.h>
 #include <linux/compat.h>
@@ -1326,9 +1327,9 @@ int sock_create_kern(struct net *net, int family, int type, int protocol, struct
 }
 EXPORT_SYMBOL(sock_create_kern);
 
-int __sys_socket(int family, int type, int protocol)
+int __sys_socket(struct net *net, int family, int type, int protocol)
 {
-	int retval;
+	long retval;
 	struct socket *sock;
 	int flags;
 
@@ -1346,7 +1347,7 @@ int __sys_socket(int family, int type, int protocol)
 	if (SOCK_NONBLOCK != O_NONBLOCK && (flags & SOCK_NONBLOCK))
 		flags = (flags & ~SOCK_NONBLOCK) | O_NONBLOCK;
 
-	retval = sock_create(family, type, protocol, &sock);
+	retval = __sock_create(net, family, type, protocol, &sock, 0);
 	if (retval < 0)
 		return retval;
 
@@ -1355,9 +1356,32 @@ int __sys_socket(int family, int type, int protocol)
 
 SYSCALL_DEFINE3(socket, int, family, int, type, int, protocol)
 {
-	return __sys_socket(family, type, protocol);
+	return __sys_socket(current->nsproxy->net_ns, family, type, protocol);
 }
 
+/*
+ * Create a socket inside a container.
+ */
+#ifdef CONFIG_CONTAINERS
+SYSCALL_DEFINE4(container_socket,
+		int, containerfd, int, family, int, type, int, protocol)
+{
+	struct fd f = fdget(containerfd);
+	long ret;
+
+	if (!f.file)
+		return -EBADF;
+	ret = -EINVAL;
+	if (is_container_file(f.file)) {
+		struct container *c = f.file->private_data;
+
+		ret = __sys_socket(c->ns->net_ns, family, type, protocol);
+	}
+	fdput(f);
+	return ret;
+}
+#endif
+
 /*
  *	Create a pair of connected sockets.
  */
@@ -2555,7 +2579,7 @@ SYSCALL_DEFINE2(socketcall, int, call, unsigned long __user *, args)
 
 	switch (call) {
 	case SYS_SOCKET:
-		err = __sys_socket(a0, a1, a[2]);
+		err = __sys_socket(current->nsproxy->net_ns, a0, a1, a[2]);
 		break;
 	case SYS_BIND:
 		err = __sys_bind(a0, (struct sockaddr __user *)a1, a[2]);


