* Retrieving the network namespace of a socket
@ 2021-10-20  9:57 Sargun Dhillon
  2021-10-20 14:03 ` Sergey Ryazanov
  0 siblings, 1 reply; 6+ messages in thread
From: Sargun Dhillon @ 2021-10-20  9:57 UTC (permalink / raw)
  To: LKML, netdev, Christian Brauner

I'm working on a problem where I need to determine which network namespace a 
given socket is in. I can currently bruteforce this by using INET_DIAG, and 
enumerating namespaces and working backwards. 

I was wondering if anyone had suggestions on where to add this capability, or on
how it might look. Using network namespace IDs appears to be the thing to do, but
I'm unsure whether this API fits better into ioctl_ns, getsockopt, netlink, or
even somewhere in proc. I'm curious what people think. I think the "easiest" way
would be to drop it into fdinfo in proc.

Alternatively, I may have overlooked an existing API.

-Thanks,
Sargun

* Re: Retrieving the network namespace of a socket
  2021-10-20  9:57 Retrieving the network namespace of a socket Sargun Dhillon
@ 2021-10-20 14:03 ` Sergey Ryazanov
  2021-10-20 16:34   ` Sargun Dhillon
  0 siblings, 1 reply; 6+ messages in thread
From: Sergey Ryazanov @ 2021-10-20 14:03 UTC (permalink / raw)
  To: Sargun Dhillon; +Cc: LKML, netdev, Christian Brauner

Hello Sargun,

On Wed, Oct 20, 2021 at 12:57 PM Sargun Dhillon <sargun@sargun.me> wrote:
> I'm working on a problem where I need to determine which network namespace a
> given socket is in. I can currently bruteforce this by using INET_DIAG, and
> enumerating namespaces and working backwards.

Namespace is not a per-socket, but a per-process attribute. So each
socket of a process belongs to the same namespace.

Could you elaborate what kind of problem you are trying to solve?
Maybe there is a simpler solution for it.

-- 
Sergey

* Re: Retrieving the network namespace of a socket
  2021-10-20 14:03 ` Sergey Ryazanov
@ 2021-10-20 16:34   ` Sargun Dhillon
  2021-10-20 19:24     ` Sargun Dhillon
  2021-10-20 20:02     ` Sergey Ryazanov
  0 siblings, 2 replies; 6+ messages in thread
From: Sargun Dhillon @ 2021-10-20 16:34 UTC (permalink / raw)
  To: Sergey Ryazanov; +Cc: LKML, netdev, Christian Brauner

On Wed, Oct 20, 2021 at 05:03:56PM +0300, Sergey Ryazanov wrote:
> Hello Sargun,
> 
> On Wed, Oct 20, 2021 at 12:57 PM Sargun Dhillon <sargun@sargun.me> wrote:
> > I'm working on a problem where I need to determine which network namespace a
> > given socket is in. I can currently bruteforce this by using INET_DIAG, and
> > enumerating namespaces and working backwards.
> 
> Namespace is not a per-socket, but a per-process attribute. So each
> socket of a process belongs to the same namespace.
> 
> Could you elaborate what kind of problem you are trying to solve?
> Maybe there is a simpler solution for it.
> 
> -- 
> Sergey

That's not entirely true. See the following code:

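#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/socket.h>

int main(void) {
	int fd1, fd2;
	/* Created in the original network namespace. */
	fd1 = socket(AF_INET, SOCK_STREAM, 0);
	assert(fd1 >= 0);
	/* Move to a new network namespace; fd1 stays in the old one. */
	assert(unshare(CLONE_NEWNET) == 0);
	fd2 = socket(AF_INET, SOCK_STREAM, 0);
	assert(fd2 >= 0);
}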

fd1 and fd2 have different sock_net.

The context for this is:
https://linuxplumbersconf.org/event/11/contributions/932/

We need to figure out, for a given socket, if it has reachability to a given IP.

* Re: Retrieving the network namespace of a socket
  2021-10-20 16:34   ` Sargun Dhillon
@ 2021-10-20 19:24     ` Sargun Dhillon
  2021-10-21  9:20       ` Christian Brauner
  2021-10-20 20:02     ` Sergey Ryazanov
  1 sibling, 1 reply; 6+ messages in thread
From: Sargun Dhillon @ 2021-10-20 19:24 UTC (permalink / raw)
  To: Sergey Ryazanov; +Cc: LKML, netdev, Christian Brauner

On Wed, Oct 20, 2021 at 04:34:18PM +0000, Sargun Dhillon wrote:
> On Wed, Oct 20, 2021 at 05:03:56PM +0300, Sergey Ryazanov wrote:
> > Hello Sargun,
> > 
> > On Wed, Oct 20, 2021 at 12:57 PM Sargun Dhillon <sargun@sargun.me> wrote:
> > > I'm working on a problem where I need to determine which network namespace a
> > > given socket is in. I can currently bruteforce this by using INET_DIAG, and
> > > enumerating namespaces and working backwards.
> > 
> > Namespace is not a per-socket, but a per-process attribute. So each
> > socket of a process belongs to the same namespace.
> > 
> > Could you elaborate what kind of problem you are trying to solve?
> > Maybe there is a simpler solution for it.
> > 
> > -- 
> > Sergey
> 
> That's not entirely true. See the following code:
> 
> int main() {
> 	int fd1, fd2;
> 	fd1 = socket(AF_INET, SOCK_STREAM, 0);
> 	assert(fd1 >= 0);
> 	assert(unshare(CLONE_NEWNET) == 0);
> 	fd2 = socket(AF_INET, SOCK_STREAM, 0);
> 	assert(fd2 >= 0);
> }
> 
> fd1 and fd2 have different sock_net.
> 
> The context for this is:
> https://linuxplumbersconf.org/event/11/contributions/932/
> 
> We need to figure out, for a given socket, if it has reachability to a given IP.

So, I was lazy / misread documentation. It turns out SIOCGSKNS does exactly
what I need.
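
For anyone finding this thread later, here is a minimal sketch of how SIOCGSKNS
can be used from userspace. It assumes a kernel that supports the ioctl and a
caller holding CAP_NET_ADMIN in the user namespace that owns the socket's
network namespace. The helper names socket_netns_id() and current_netns_id()
are made up for illustration and are not an existing API.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/sockios.h>	/* SIOCGSKNS */

/*
 * Hypothetical helper: report the (device, inode) pair that identifies the
 * network namespace of socket `sockfd`.
 * Returns 0 on success, -1 with errno set on failure.
 */
int socket_netns_id(int sockfd, dev_t *dev, ino_t *ino)
{
	struct stat st;
	int nsfd, ret, saved_errno;

	/* On success the ioctl returns a new fd referring to the netns. */
	nsfd = ioctl(sockfd, SIOCGSKNS);
	if (nsfd < 0)
		return -1;

	ret = fstat(nsfd, &st);
	saved_errno = errno;
	close(nsfd);
	errno = saved_errno;	/* do not let close() hide an fstat() error */
	if (ret < 0)
		return -1;

	*dev = st.st_dev;
	*ino = st.st_ino;
	return 0;
}

/*
 * Hypothetical helper: the same identity for the caller's current network
 * namespace, read through /proc (which must be mounted).
 */
int current_netns_id(dev_t *dev, ino_t *ino)
{
	struct stat st;

	/* stat() follows the magic link to the namespace inode. */
	if (stat("/proc/self/ns/net", &st) < 0)
		return -1;

	*dev = st.st_dev;
	*ino = st.st_ino;
	return 0;
}
```

Two sockets are in the same network namespace when both the device and the
inode number match. The fd returned by the ioctl could also be passed to
setns() instead of being closed, if the goal is to enter the socket's
namespace rather than just identify it.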

Nonetheless, it's a little weird and awkward that it exists. I was wondering
if this functionality made sense as part of kcmp. I wrote up a quick patch
to see if anyone was interested:

diff --git a/include/uapi/linux/kcmp.h b/include/uapi/linux/kcmp.h
index ef1305010925..d6b9c3923d20 100644
--- a/include/uapi/linux/kcmp.h
+++ b/include/uapi/linux/kcmp.h
@@ -14,6 +14,7 @@ enum kcmp_type {
 	KCMP_IO,
 	KCMP_SYSVSEM,
 	KCMP_EPOLL_TFD,
+	KCMP_NETNS,
 
 	KCMP_TYPES,
 };
diff --git a/kernel/kcmp.c b/kernel/kcmp.c
index 5353edfad8e1..8fadae4b588f 100644
--- a/kernel/kcmp.c
+++ b/kernel/kcmp.c
@@ -18,6 +18,8 @@
 #include <linux/file.h>
 
 #include <asm/unistd.h>
+#include <net/net_namespace.h>
+#include <net/sock.h>
 
 /*
  * We don't expose the real in-memory order of objects for security reasons.
@@ -132,6 +134,58 @@ static int kcmp_epoll_target(struct task_struct *task1,
 }
 #endif
 
+#ifdef CONFIG_NET
+static int __kcmp_netns_target(struct task_struct *task1,
+			       struct task_struct *task2,
+			       struct file *filp1,
+			       struct file *filp2)
+{
+	struct socket *sock1, *sock2;
+	struct net *net1, *net2;
+
+	sock1 = sock_from_file(filp1);
+	sock2 = sock_from_file(filp2);
+	if (!sock1 || !sock2)
+		return -ENOTSOCK;
+
+	net1 = sock_net(sock1->sk);
+	net2 = sock_net(sock2->sk);
+
+	return kcmp_ptr(net1, net2, KCMP_NETNS);
+}
+
+static int kcmp_netns_target(struct task_struct *task1,
+			     struct task_struct *task2,
+			     unsigned long idx1,
+			     unsigned long idx2)
+{
+	struct file *filp1, *filp2;
+
+	int ret = -EBADF;
+
+	filp1 = fget_task(task1, idx1);
+	if (filp1) {
+		filp2 = fget_task(task2, idx2);
+		if (filp2) {
+			ret = __kcmp_netns_target(task1, task2, filp1, filp2);
+			fput(filp2);
+		}
+
+		fput(filp1);
+	}
+
+	return ret;
+}
+#else
+static int kcmp_netns_target(struct task_struct *task1,
+			     struct task_struct *task2,
+			     unsigned long idx1,
+			     unsigned long idx2)
+{
+	return -EOPNOTSUPP;
+}
+#endif
+
 SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t, pid2, int, type,
 		unsigned long, idx1, unsigned long, idx2)
 {
@@ -206,6 +260,9 @@ SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t, pid2, int, type,
 	case KCMP_EPOLL_TFD:
 		ret = kcmp_epoll_target(task1, task2, idx1, (void *)idx2);
 		break;
+	case KCMP_NETNS:
+		ret = kcmp_netns_target(task1, task2, idx1, idx2);
+		break;
 	default:
 		ret = -EINVAL;
 		break;


* Re: Retrieving the network namespace of a socket
  2021-10-20 16:34   ` Sargun Dhillon
  2021-10-20 19:24     ` Sargun Dhillon
@ 2021-10-20 20:02     ` Sergey Ryazanov
  1 sibling, 0 replies; 6+ messages in thread
From: Sergey Ryazanov @ 2021-10-20 20:02 UTC (permalink / raw)
  To: Sargun Dhillon; +Cc: LKML, netdev, Christian Brauner

On Wed, Oct 20, 2021 at 7:34 PM Sargun Dhillon <sargun@sargun.me> wrote:
> On Wed, Oct 20, 2021 at 05:03:56PM +0300, Sergey Ryazanov wrote:
>> On Wed, Oct 20, 2021 at 12:57 PM Sargun Dhillon <sargun@sargun.me> wrote:
>>> I'm working on a problem where I need to determine which network namespace a
>>> given socket is in. I can currently bruteforce this by using INET_DIAG, and
>>> enumerating namespaces and working backwards.
>>
>> Namespace is not a per-socket, but a per-process attribute. So each
>> socket of a process belongs to the same namespace.
>>
>> Could you elaborate what kind of problem you are trying to solve?
>> Maybe there is a simpler solution for it.
>
> That's not entirely true. See the following code:
>
> int main() {
>         int fd1, fd2;
>         fd1 = socket(AF_INET, SOCK_STREAM, 0);
>         assert(fd1 >= 0);
>         assert(unshare(CLONE_NEWNET) == 0);
>         fd2 = socket(AF_INET, SOCK_STREAM, 0);
>         assert(fd2 >= 0);
> }
>
> fd1 and fd2 have different sock_net.

Ouch, I totally missed this case. Thank you for reminding me.

--
Sergey

* Re: Retrieving the network namespace of a socket
  2021-10-20 19:24     ` Sargun Dhillon
@ 2021-10-21  9:20       ` Christian Brauner
  0 siblings, 0 replies; 6+ messages in thread
From: Christian Brauner @ 2021-10-21  9:20 UTC (permalink / raw)
  To: Sargun Dhillon; +Cc: Sergey Ryazanov, LKML, netdev

On Wed, Oct 20, 2021 at 07:24:57PM +0000, Sargun Dhillon wrote:
> On Wed, Oct 20, 2021 at 04:34:18PM +0000, Sargun Dhillon wrote:
> > On Wed, Oct 20, 2021 at 05:03:56PM +0300, Sergey Ryazanov wrote:
> > > Hello Sargun,
> > > 
> > > On Wed, Oct 20, 2021 at 12:57 PM Sargun Dhillon <sargun@sargun.me> wrote:
> > > > I'm working on a problem where I need to determine which network namespace a
> > > > given socket is in. I can currently bruteforce this by using INET_DIAG, and
> > > > enumerating namespaces and working backwards.
> > > 
> > > Namespace is not a per-socket, but a per-process attribute. So each
> > > socket of a process belongs to the same namespace.
> > > 
> > > Could you elaborate what kind of problem you are trying to solve?
> > > Maybe there is a simpler solution for it.
> > > 
> > > -- 
> > > Sergey
> > 
> > That's not entirely true. See the following code:
> > 
> > int main() {
> > 	int fd1, fd2;
> > 	fd1 = socket(AF_INET, SOCK_STREAM, 0);
> > 	assert(fd1 >= 0);
> > 	assert(unshare(CLONE_NEWNET) == 0);
> > 	fd2 = socket(AF_INET, SOCK_STREAM, 0);
> > 	assert(fd2 >= 0);
> > }
> > 
> > fd1 and fd2 have different sock_net.
> > 
> > The context for this is:
> > https://linuxplumbersconf.org/event/11/contributions/932/
> > 
> > We need to figure out, for a given socket, if it has reachability to a given IP.
> 
> So, I was lazy / misread documentation. It turns out SIOCGSKNS does exactly
> what I need.

I was about to reply with this. :) It's heavily used in CRIU and we use
it in LXC/LXD as well.

> 
> Nonetheless, it's a little weird and awkward that it exists. I was wondering
> if this functionality made sense as part of kcmp. I wrote up a quick patch
> to see if anyone was interested:

Per se I don't see a reason why this shouldn't exist as an extension to
kcmp(). It seems useful.
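
To make the proposal concrete, a userspace caller might look roughly like the
sketch below. This is only an illustration: KCMP_NETNS_PROPOSED is a
hypothetical constant standing in for the KCMP_NETNS value the posted patch
would add (the slot right after KCMP_EPOLL_TFD), and kcmp_raw() and
sockets_share_netns() are made-up helper names. On a kernel without the patch
the call is expected to fail with EINVAL.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/kcmp.h>

/*
 * HYPOTHETICAL: the value KCMP_NETNS would get if the posted patch were
 * applied. It is not part of the mainline uapi headers.
 */
#define KCMP_NETNS_PROPOSED (KCMP_EPOLL_TFD + 1)

/* Thin wrapper around the raw kcmp system call. */
long kcmp_raw(pid_t pid1, pid_t pid2, int type,
	      unsigned long idx1, unsigned long idx2)
{
	return syscall(SYS_kcmp, pid1, pid2, type, idx1, idx2);
}

/*
 * Hypothetical helper: compare the network namespaces of two sockets owned
 * by the calling process.
 * Returns 1 if they match, 0 if they differ, -1 with errno set on error.
 */
int sockets_share_netns(int fd1, int fd2)
{
	pid_t self = getpid();
	long r = kcmp_raw(self, self, KCMP_NETNS_PROPOSED,
			  (unsigned long)fd1, (unsigned long)fd2);

	if (r < 0)
		return -1;
	/* kcmp() returns 0 when both objects are the same. */
	return r == 0;
}
```

A caller would need to treat EINVAL as "not supported" and fall back to
SIOCGSKNS, which already works today.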

