* [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level
@ 2017-03-12 23:01 Hannes Frederic Sowa
  2017-03-12 23:01 ` [PATCH net-next RFC v1 01/27] afnetns: add CLONE_NEWAFNET flag Hannes Frederic Sowa
                   ` (27 more replies)
  0 siblings, 28 replies; 34+ messages in thread
From: Hannes Frederic Sowa @ 2017-03-12 23:01 UTC
  To: netdev

--- >8 ---
Note:
* BE CAREFUL WITH SOURCE ADDRESS SELECTION
--- >8 ---

afnetns behaves like ordinary namespaces: the clone, unshare and setns
syscalls work with afnetns, with one limitation: one cannot cross the
boundary of a network namespace while changing the afnetns compartment.
To get into an afnetns in a different net namespace, one must first
switch to that net namespace and afterwards to the desired afnetns.
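
A minimal user-space sketch of that ordering, assuming a kernel with this
series applied: open both namespace files first, join the net namespace
with setns(2), then join the afnetns. CLONE_NEWAFNET is the value proposed
in patch 01/27 and is not provided by mainline headers (on Linux 5.2 and
later the same bit is CLONE_PIDFD), so the sketch defines it itself. The
paths are whatever the caller passes, e.g. /proc/PID/ns/net and
/proc/PID/ns/afnet. With this series, joining a net namespace also resets
the task's afnetns to that netns' initial afnetns (see netns_install() in
patch 02/27), so the second setns() only matters for a non-default afnetns.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

/* ASSUMPTION: value proposed by patch 01/27 of this RFC; not in mainline. */
#ifndef CLONE_NEWAFNET
#define CLONE_NEWAFNET 0x00001000
#endif

/*
 * Join the net namespace first, then an afnetns inside it.
 * Returns 0 on success or a negative errno value.
 */
int enter_afnetns(const char *netns_path, const char *afnetns_path)
{
	int netfd, afnetfd, err = 0;

	/* Open both files up front, before any namespace switch. */
	netfd = open(netns_path, O_RDONLY | O_CLOEXEC);
	if (netfd < 0)
		return -errno;

	afnetfd = open(afnetns_path, O_RDONLY | O_CLOEXEC);
	if (afnetfd < 0) {
		err = -errno;
		close(netfd);
		return err;
	}

	/* Step 1: switch the network namespace. */
	if (setns(netfd, CLONE_NEWNET) < 0)
		err = -errno;
	/* Step 2: switch the afnetns; an afnetns that belongs to a
	 * different net namespace is rejected by the kernel. */
	else if (setns(afnetfd, CLONE_NEWAFNET) < 0)
		err = -errno;

	close(afnetfd);
	close(netfd);
	return err;
}
```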

The primitive kernel objects an afnetns relates to are:
    - process
    - socket
    - ipv4 address
    - ipv6 address.

An afnetns basically forms a namespace around socket binds. While not
strictly necessary, it also affects source routing, so firewall rules
are easier to maintain. It does in no way deal with the reception and
handling of multicast or broadcast sockets. As all afnetns namespaces
share the same L2 network, it does not make sense to try to build up
separation rules here, as they can be broken anyway.
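
To make the bind rule concrete, here is a toy user-space model (not
kernel code; all names are hypothetical): every local address is tagged
with the inode number of its afnetns, and a socket may bind only to an
address tagged with its own afnetns. Wildcard binds, ports and source
address selection are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a local address and the afnetns that owns it. */
struct addr_entry {
	uint32_t addr;                /* IPv4 address, host byte order */
	unsigned long long afnet_ino; /* inode number of the owning afnetns */
};

/* Return true if a socket living in afnetns `sock_ino` may bind `addr`. */
bool may_bind(const struct addr_entry *table, size_t n,
	      uint32_t addr, unsigned long long sock_ino)
{
	for (size_t i = 0; i < n; i++) {
		if (table[i].addr == addr)
			return table[i].afnet_ino == sock_ino;
	}
	return false; /* address is not configured at all */
}
```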

Unlike ipvlan, afnetns still allows the use of early socket demuxing.

Loopback communication is not possible within an afnetns until it has
been given its own loopback address or uses its private IP address.

The easiest way to use afnetns is to use the iproute2 interface, which
very much follows the style of ip-netns.

$ ip afnetns help
Usage: ip afnetns list
       ip afnetns add NAME
       ip afnetns del NAME
       ip afnetns exec NAME cmd ...

IP addresses carry an afnetns identifier, too. It is visible with the -d
(details) option:

$ ip -d a l dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 numtxqueues 1 numrxqueues 1
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever afnet afnet:[4026531958],self
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever afnet afnet:[4026531958],self

This shows the afnetns inode number and indicates (",self") that we are
currently in the same afnetns as the two listed IP addresses. If a name
was assigned to the namespace with ip-afnetns, it is shown here, too.
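
User space that wants to match such an address to a namespace has to
extract the inode number from the afnet:[N] token. A small parsing sketch,
assuming the input is just that token (without iproute2's ",self"
suffix); the same NAME:[N] form is what readlink(2) returns for the files
under /proc/self/ns/.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse a namespace link target such as "afnet:[4026531958]" and return
 * the inode number, or 0 if the string does not have the form TYPE:[N].
 */
unsigned long long parse_ns_inode(const char *link, const char *type)
{
	size_t n = strlen(type);
	const char *p;
	char *end;
	unsigned long long ino;

	if (strncmp(link, type, n) != 0 || link[n] != ':' || link[n + 1] != '[')
		return 0;

	p = link + n + 2;
	if (*p < '0' || *p > '9') /* strtoull would also accept '-' or spaces */
		return 0;

	errno = 0;
	ino = strtoull(p, &end, 10);
	if (errno != 0 || end[0] != ']' || end[1] != '\0')
		return 0;
	return ino;
}
```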

$ ip a a 10.0.0.1/24 dev lo afnetns test

This command adds a new IP address to the loopback device and makes it
available in the "test" afnetns. Processes in this namespace can bind to
this IP address and use it for outgoing communication.
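
A minimal sketch of how a program could check whether it may use a given
address: it binds a UDP socket to that IPv4 address on an ephemeral port
and reports the result. Run under `ip afnetns exec test ...`, binding
10.0.0.1 is expected to succeed; run from another afnetns it is expected
to fail. The exact errno the series returns in that case is not stated
here, so the sketch simply passes it through.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Bind a UDP socket to `ip` on an ephemeral port and close it again.
 * Returns 0 if the bind succeeded or a negative errno value.
 */
int try_bind_udp4(const char *ip)
{
	struct sockaddr_in sin;
	int fd, err = 0;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(0);            /* let the kernel pick a port */
	if (inet_pton(AF_INET, ip, &sin.sin_addr) != 1)
		return -EINVAL;             /* not a dotted-quad address */

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -errno;
	if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0)
		err = -errno;
	close(fd);
	return err;
}
```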

The same commands work for IPv6; I only used IPv4 as an example.

This is still work in progress.

Changelog:
v1) first published version

Hannes Frederic Sowa (27):
  afnetns: add CLONE_NEWAFNET flag
  afnetns: basic namespace operations and representations
  afnetns: prepare for integration into ipv4
  afnetns: add net_afnetns
  afnetns: ipv6 integration
  afnetns: put afnetns pointer into struct sock
  ipv4: introduce ifa_find_rcu
  afnetns: factor out inet_allow_bind
  afnetns: add sock_afnetns
  afnetns: add ifa_find_afnetns_rcu
  afnetns: validate afnetns in inet_allow_bind
  afnetns: ipv4/udp integration
  afnetns: use inet_allow_bind in inet6_bind
  afnetns: check for afnetns in inet6_bind
  afnetns: add ipv6_get_ifaddr_afnetns_rcu
  afnetns: add udpv6 support
  afnetns: introduce __inet_select_addr
  afnetns: afnetns should influence source address selection
  afnetns: add afnetns support for tcpv4
  ipv6: move ipv6_get_ifaddr to vmlinux in case ipv6 is build as module
  afnetns: add support for tcpv6
  afnetns: track owning namespace for inet_bind
  afnetns: use user_ns from afnetns for checking for binding to port <
    1024
  afnetns: check afnetns user_ns in inet6_bind
  afnetns: ipv4: inherit afnetns from calling application
  afnetns: ipv6: inherit afnetns from calling application
  afnetns: allow only whitelisted protocols to operate inside afnetns

 Documentation/networking/afnetns.txt    |  64 +++++++++++++
 drivers/target/iscsi/cxgbit/cxgbit_cm.c |   2 +-
 fs/proc/namespaces.c                    |   3 +
 include/linux/inetdevice.h              |  22 ++++-
 include/linux/nsproxy.h                 |   3 +
 include/linux/proc_ns.h                 |   1 +
 include/net/addrconf.h                  |  26 +++++-
 include/net/afnetns.h                   |  47 ++++++++++
 include/net/if_inet6.h                  |   3 +
 include/net/inet_common.h               |   1 +
 include/net/inet_sock.h                 |   1 +
 include/net/net_namespace.h             |  12 +++
 include/net/protocol.h                  |   1 +
 include/net/route.h                     |  10 +-
 include/net/sock.h                      |  13 +++
 include/uapi/linux/if_addr.h            |   2 +
 include/uapi/linux/sched.h              |   1 +
 kernel/fork.c                           |  12 ++-
 kernel/nsproxy.c                        |  24 ++++-
 net/Kconfig                             |  10 ++
 net/core/Makefile                       |   1 +
 net/core/afnetns.c                      | 159 ++++++++++++++++++++++++++++++++
 net/core/net_namespace.c                |  25 +++++
 net/core/sock.c                         |  18 +++-
 net/ipv4/af_inet.c                      | 101 ++++++++++++++------
 net/ipv4/devinet.c                      | 104 ++++++++++++++++++---
 net/ipv4/icmp.c                         |   4 +-
 net/ipv4/igmp.c                         |   2 +-
 net/ipv4/inet_hashtables.c              |  17 +++-
 net/ipv4/route.c                        |  21 +++--
 net/ipv4/tcp_input.c                    |   3 +
 net/ipv4/udp.c                          |  22 ++++-
 net/ipv4/udplite.c                      |   3 +-
 net/ipv4/xfrm4_policy.c                 |   2 +-
 net/ipv6/addrconf.c                     | 117 +++++++++++++----------
 net/ipv6/af_inet6.c                     |  78 ++++++++++------
 net/ipv6/datagram.c                     |   6 +-
 net/ipv6/inet6_hashtables.c             |  55 ++++++++++-
 net/ipv6/ndisc.c                        |   4 +-
 net/ipv6/route.c                        |   2 +-
 net/ipv6/tcp_ipv6.c                     |   3 +-
 net/ipv6/udp.c                          |  21 +++--
 net/ipv6/udplite.c                      |   3 +-
 net/sctp/protocol.c                     |   4 +-
 net/tipc/udp_media.c                    |   2 +-
 45 files changed, 864 insertions(+), 171 deletions(-)
 create mode 100644 Documentation/networking/afnetns.txt
 create mode 100644 include/net/afnetns.h
 create mode 100644 net/core/afnetns.c

-- 
2.9.3

* [PATCH net-next RFC v1 01/27] afnetns: add CLONE_NEWAFNET flag
  2017-03-12 23:01 [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level Hannes Frederic Sowa
@ 2017-03-12 23:01 ` Hannes Frederic Sowa
  2017-03-12 23:01 ` [PATCH net-next RFC v1 02/27] afnetns: basic namespace operations and representations Hannes Frederic Sowa
                   ` (26 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Hannes Frederic Sowa @ 2017-03-12 23:01 UTC
  To: netdev

This patch adds a new clone flag, used when a clone should also create a
new afnetns namespace. The only restriction placed on this new flag is
that it cannot be combined with CLONE_NEWNET.

Flag value 0x00001000 was previously used for CLONE_IDLETASK until 2004.
That flag could only be used by the kernel itself, so I consider reusing
the value safe.
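
The restriction can be expressed as a small flag check. The sketch below
mirrors the test this series adds to check_unshare_flags() in patch
02/27; CLONE_NEWAFNET is defined locally because it never reached
mainline headers (mainline later reused bit 0x00001000 for CLONE_PIDFD).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>

/* ASSUMPTION: value proposed by this patch; not in mainline headers. */
#ifndef CLONE_NEWAFNET
#define CLONE_NEWAFNET 0x00001000
#endif

/*
 * A new afnetns is always created inside the current net namespace,
 * so requesting a new netns and a new afnetns at once is rejected.
 */
int check_afnet_flags(unsigned long flags)
{
	if ((flags & CLONE_NEWNET) && (flags & CLONE_NEWAFNET))
		return -EINVAL;
	return 0;
}
```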

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/uapi/linux/sched.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 5f0fe019a7204e..b48dea58f55524 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -9,6 +9,7 @@
 #define CLONE_FS	0x00000200	/* set if fs info shared between processes */
 #define CLONE_FILES	0x00000400	/* set if open files shared between processes */
 #define CLONE_SIGHAND	0x00000800	/* set if signal handlers and blocked signals shared */
+#define CLONE_NEWAFNET	0x00001000	/* Clone new afnet context */
 #define CLONE_PTRACE	0x00002000	/* set if we want to let tracing continue on the child too */
 #define CLONE_VFORK	0x00004000	/* set if the parent wants the child to wake it up on mm_release */
 #define CLONE_PARENT	0x00008000	/* set if we want to have the same parent as the cloner */
-- 
2.9.3

* [PATCH net-next RFC v1 02/27] afnetns: basic namespace operations and representations
  2017-03-12 23:01 [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level Hannes Frederic Sowa
  2017-03-12 23:01 ` [PATCH net-next RFC v1 01/27] afnetns: add CLONE_NEWAFNET flag Hannes Frederic Sowa
@ 2017-03-12 23:01 ` Hannes Frederic Sowa
  2017-03-12 23:01 ` [PATCH net-next RFC v1 03/27] afnetns: prepare for integration into ipv4 Hannes Frederic Sowa
                   ` (25 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Hannes Frederic Sowa @ 2017-03-12 23:01 UTC
  To: netdev

This patch adds the basic afnetns operations. Specifically, it implements
the /proc/self/ns/afnet operations, which allow managing afnetns
namespaces, plus support for clone, unshare and setns.

The afnetns is tracked in the nsproxy structure for each task_struct.
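
The reference-counting rules can be modelled in user space. This sketch
(hypothetical names, C11 atomics instead of refcount_t) follows
afnetns_get()/afnetns_put() and afnet_install(): installing an afnetns
refuses to cross net namespaces, drops the reference on the old afnetns
and takes one on the new. The caller is assumed to hold its own reference
to the afnetns being installed.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct afnetns and struct nsproxy. */
struct model_afnetns {
	atomic_int ref;
	int net_id;             /* stands in for the owning struct net */
};

struct model_nsproxy {
	int net_id;
	struct model_afnetns *afnet_ns;
};

static int freed;               /* counts frees so behaviour can be checked */

struct model_afnetns *model_new(int net_id)
{
	struct model_afnetns *a = malloc(sizeof(*a));

	if (!a)
		return NULL;
	atomic_init(&a->ref, 1);
	a->net_id = net_id;
	return a;
}

struct model_afnetns *model_get(struct model_afnetns *a)
{
	atomic_fetch_add(&a->ref, 1);
	return a;
}

void model_put(struct model_afnetns *a)
{
	/* Free on the transition from 1 to 0, like refcount_dec_and_test(). */
	if (atomic_fetch_sub(&a->ref, 1) == 1) {
		free(a);
		freed++;
	}
}

/* Like afnet_install(): same-netns check, then put old, get new. */
int model_install(struct model_nsproxy *nsp, struct model_afnetns *a)
{
	if (nsp->net_id != a->net_id)
		return -EINVAL;
	model_put(nsp->afnet_ns);
	nsp->afnet_ns = model_get(a);
	return 0;
}
```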

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 Documentation/networking/afnetns.txt |  64 ++++++++++++++++++
 fs/proc/namespaces.c                 |   3 +
 include/linux/nsproxy.h              |   3 +
 include/linux/proc_ns.h              |   1 +
 include/net/afnetns.h                |  42 ++++++++++++
 include/net/net_namespace.h          |   4 ++
 kernel/fork.c                        |  12 +++-
 kernel/nsproxy.c                     |  24 ++++++-
 net/Kconfig                          |  10 +++
 net/core/Makefile                    |   1 +
 net/core/afnetns.c                   | 124 +++++++++++++++++++++++++++++++++++
 net/core/net_namespace.c             |  25 +++++++
 12 files changed, 308 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/networking/afnetns.txt
 create mode 100644 include/net/afnetns.h
 create mode 100644 net/core/afnetns.c

diff --git a/Documentation/networking/afnetns.txt b/Documentation/networking/afnetns.txt
new file mode 100644
index 00000000000000..cede4564f8c396
--- /dev/null
+++ b/Documentation/networking/afnetns.txt
@@ -0,0 +1,64 @@
+Address-family net namespace
+============================
+
+Support for afnetns is enabled in the kernel via CONFIG_AFNETNS.
+
+afnetns allows putting address-family addresses into separate
+namespaces.
+
+afnetns behaves like all other namespaces: clone, unshare, setns
+syscalls can work with afnetns with one limitation: one cannot cross
+the realm of a network namespace while changing the afnetns
+compartment. To get into a new afnetns in a different net namespace,
+one must first change to the net namespace and afterwards switch to
+the desired afnetns.
+
+The primitive objects in the kernel an afnetns relates to are:
+    - process
+    - socket
+    - ipv4 address
+    - ipv6 address.
+
+An afnetns basically forms a namespace around socket binds. While not
+strictly necessary, it also affects source routing, so firewall rules
+are easier to maintain. It does in no way deal with the reception and
+handling of multicast or broadcast sockets. As all afnetns namespaces
+share the same L2 network, it does not make sense to try to build up
+separation rules here, as they can be broken anyway.
+
+afnetns doesn't allow sharing of the 127.0.0.1/32 loopback
+address. Instead each afnetns must be provided with a loopback address
+from the 127.0.0.0/8 range if needed.
+
+The easiest way to use afnetns is to use the iproute2 interface, which
+very much follows the style of ip-netns.
+
+$ ip afnetns help
+Usage: ip afnetns list
+       ip afnetns add NAME
+       ip afnetns del NAME
+       ip afnetns exec NAME cmd ...
+
+IP addresses carry an afnetns identifier, too. It is visible with the
+-d (details) option:
+
+$ ip -d a l dev lo
+1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
+    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 numtxqueues 1 numrxqueues 1 
+    inet 127.0.0.1/8 scope host lo
+       valid_lft forever preferred_lft forever afnet afnet:[4026531958],self
+    inet6 ::1/128 scope host 
+       valid_lft forever preferred_lft forever afnet afnet:[4026531958],self
+
+This shows the afnetns inode number and indicates (",self") that we
+are currently in the same afnetns as the two listed IP addresses.
+If a name was assigned to the namespace with ip-afnetns, it is
+shown here, too.
+
+$ ip a a 10.0.0.1/24 dev lo afnetns test
+
+This command adds a new IP address to the loopback device and makes it
+available in the "test" afnetns. Processes in this namespace can bind
+to this IP address and use it for outgoing communication.
+
+The same commands work for IPv6; IPv4 is only used as an example.
diff --git a/fs/proc/namespaces.c b/fs/proc/namespaces.c
index 766f0c637ad1b4..f1ccef97ce9861 100644
--- a/fs/proc/namespaces.c
+++ b/fs/proc/namespaces.c
@@ -31,6 +31,9 @@ static const struct proc_ns_operations *ns_entries[] = {
 #ifdef CONFIG_CGROUPS
 	&cgroupns_operations,
 #endif
+#if IS_ENABLED(CONFIG_AFNETNS)
+	&afnetns_operations,
+#endif
 };
 
 static const char *proc_ns_get_link(struct dentry *dentry,
diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h
index ac0d65bef5d086..0c0e48dca4b744 100644
--- a/include/linux/nsproxy.h
+++ b/include/linux/nsproxy.h
@@ -35,6 +35,9 @@ struct nsproxy {
 	struct pid_namespace *pid_ns_for_children;
 	struct net 	     *net_ns;
 	struct cgroup_namespace *cgroup_ns;
+#if IS_ENABLED(CONFIG_AFNETNS)
+	struct afnetns *afnet_ns;
+#endif
 };
 extern struct nsproxy init_nsproxy;
 
diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
index 12cb8bd81d2d12..45f103098ab0c1 100644
--- a/include/linux/proc_ns.h
+++ b/include/linux/proc_ns.h
@@ -29,6 +29,7 @@ extern const struct proc_ns_operations pidns_operations;
 extern const struct proc_ns_operations userns_operations;
 extern const struct proc_ns_operations mntns_operations;
 extern const struct proc_ns_operations cgroupns_operations;
+extern const struct proc_ns_operations afnetns_operations;
 
 /*
  * We always define these enumerators
diff --git a/include/net/afnetns.h b/include/net/afnetns.h
new file mode 100644
index 00000000000000..d5fbb83023acd6
--- /dev/null
+++ b/include/net/afnetns.h
@@ -0,0 +1,42 @@
+#pragma once
+
+#include <linux/atomic.h>
+#include <linux/refcount.h>
+#include <linux/ns_common.h>
+#include <linux/nsproxy.h>
+
+struct afnetns {
+#if IS_ENABLED(CONFIG_AFNETNS)
+	refcount_t ref;
+	struct ns_common ns;
+	struct net *net;
+#endif
+};
+
+extern struct afnetns init_afnetns;
+
+int afnet_ns_init(void);
+
+struct afnetns *afnetns_new(struct net *net);
+struct afnetns *copy_afnet_ns(unsigned long flags, struct nsproxy *old);
+void afnetns_free(struct afnetns *afnetns);
+
+static inline struct afnetns *afnetns_get(struct afnetns *afnetns)
+{
+#if IS_ENABLED(CONFIG_AFNETNS)
+	refcount_inc(&afnetns->ref);
+#else
+	BUILD_BUG();
+#endif
+	return afnetns;
+}
+
+static inline void afnetns_put(struct afnetns *afnetns)
+{
+#if IS_ENABLED(CONFIG_AFNETNS)
+	if (refcount_dec_and_test(&afnetns->ref))
+		afnetns_free(afnetns);
+#else
+	BUILD_BUG();
+#endif
+}
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index af8fe8a909dc0c..c59fb018da5e46 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -30,6 +30,7 @@
 #include <linux/ns_common.h>
 #include <linux/idr.h>
 #include <linux/skbuff.h>
+#include <net/afnetns.h>
 
 struct user_namespace;
 struct proc_dir_entry;
@@ -61,6 +62,9 @@ struct net {
 
 	struct user_namespace   *user_ns;	/* Owning user namespace */
 	struct ucounts		*ucounts;
+#if IS_ENABLED(CONFIG_AFNETNS)
+	struct afnetns		*afnet_ns;
+#endif
 	spinlock_t		nsid_lock;
 	struct idr		netns_ids;
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 6c463c80e93de8..d3ab9f050adfe8 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2180,10 +2180,16 @@ void __init proc_caches_init(void)
 static int check_unshare_flags(unsigned long unshare_flags)
 {
 	if (unshare_flags & ~(CLONE_THREAD|CLONE_FS|CLONE_NEWNS|CLONE_SIGHAND|
-				CLONE_VM|CLONE_FILES|CLONE_SYSVSEM|
-				CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWNET|
-				CLONE_NEWUSER|CLONE_NEWPID|CLONE_NEWCGROUP))
+			      CLONE_VM|CLONE_FILES|CLONE_SYSVSEM|
+			      CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWNET|
+			      CLONE_NEWAFNET|CLONE_NEWUSER|CLONE_NEWPID|
+			      CLONE_NEWCGROUP))
 		return -EINVAL;
+
+	if ((unshare_flags & CLONE_NEWNET) &&
+	    (unshare_flags & CLONE_NEWAFNET))
+		return -EINVAL;
+
 	/*
 	 * Not implemented, but pretend it works if there is nothing
 	 * to unshare.  Note that unsharing the address space or the
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index 782102e59eed5b..f99ecbdd506137 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -26,6 +26,7 @@
 #include <linux/file.h>
 #include <linux/syscalls.h>
 #include <linux/cgroup.h>
+#include <net/afnetns.h>
 
 static struct kmem_cache *nsproxy_cachep;
 
@@ -43,6 +44,9 @@ struct nsproxy init_nsproxy = {
 #ifdef CONFIG_CGROUPS
 	.cgroup_ns		= &init_cgroup_ns,
 #endif
+#if IS_ENABLED(CONFIG_AFNETNS)
+	.afnet_ns		= &init_afnetns,
+#endif
 };
 
 static inline struct nsproxy *create_nsproxy(void)
@@ -109,8 +113,20 @@ static struct nsproxy *create_new_namespaces(unsigned long flags,
 		goto out_net;
 	}
 
+#if IS_ENABLED(CONFIG_AFNETNS)
+	new_nsp->afnet_ns = copy_afnet_ns(flags, tsk->nsproxy);
+	if (IS_ERR(new_nsp->afnet_ns)) {
+		err = PTR_ERR(new_nsp->afnet_ns);
+		goto out_afnet;
+	}
+#endif
+
 	return new_nsp;
 
+#if IS_ENABLED(CONFIG_AFNETNS)
+out_afnet:
+	put_net(new_nsp->net_ns);
+#endif
 out_net:
 	put_cgroup_ns(new_nsp->cgroup_ns);
 out_cgroup:
@@ -141,7 +157,7 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
 	struct nsproxy *new_ns;
 
 	if (likely(!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
-			      CLONE_NEWPID | CLONE_NEWNET |
+			      CLONE_NEWPID | CLONE_NEWNET | CLONE_NEWAFNET |
 			      CLONE_NEWCGROUP)))) {
 		get_nsproxy(old_ns);
 		return 0;
@@ -181,6 +197,9 @@ void free_nsproxy(struct nsproxy *ns)
 		put_pid_ns(ns->pid_ns_for_children);
 	put_cgroup_ns(ns->cgroup_ns);
 	put_net(ns->net_ns);
+#if IS_ENABLED(CONFIG_AFNETNS)
+	afnetns_put(ns->afnet_ns);
+#endif
 	kmem_cache_free(nsproxy_cachep, ns);
 }
 
@@ -195,7 +214,8 @@ int unshare_nsproxy_namespaces(unsigned long unshare_flags,
 	int err = 0;
 
 	if (!(unshare_flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
-			       CLONE_NEWNET | CLONE_NEWPID | CLONE_NEWCGROUP)))
+			       CLONE_NEWNET | CLONE_NEWAFNET | CLONE_NEWPID |
+			       CLONE_NEWCGROUP)))
 		return 0;
 
 	user_ns = new_cred ? new_cred->user_ns : current_user_ns();
diff --git a/net/Kconfig b/net/Kconfig
index 102f781a0131af..8496df4372705f 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -84,6 +84,16 @@ config INET
 	  Short answer: say Y.
 
 if INET
+
+config AFNETNS
+	bool "Address family net namespace"
+	depends on NAMESPACES
+	select NET_NS
+	---help---
+	  This option enables support for afnetns. It allows putting
+	  address-family (currently IPv4/IPv6) addresses into separate
+	  namespaces.
+
 source "net/ipv4/Kconfig"
 source "net/ipv6/Kconfig"
 source "net/netlabel/Kconfig"
diff --git a/net/core/Makefile b/net/core/Makefile
index 79f9479e965812..c0e703307425c2 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -29,3 +29,4 @@ obj-$(CONFIG_DST_CACHE) += dst_cache.o
 obj-$(CONFIG_HWBM) += hwbm.o
 obj-$(CONFIG_NET_DEVLINK) += devlink.o
 obj-$(CONFIG_GRO_CELLS) += gro_cells.o
+obj-$(CONFIG_AFNETNS) += afnetns.o
diff --git a/net/core/afnetns.c b/net/core/afnetns.c
new file mode 100644
index 00000000000000..997623e4dc5078
--- /dev/null
+++ b/net/core/afnetns.c
@@ -0,0 +1,124 @@
+#include <net/afnetns.h>
+#include <net/net_namespace.h>
+#include <linux/sched.h>
+#include <linux/sched/task.h>
+#include <linux/nsproxy.h>
+#include <linux/proc_ns.h>
+
+const struct proc_ns_operations afnetns_operations;
+
+struct afnetns init_afnetns = {
+	.ref = REFCOUNT_INIT(1),
+};
+
+static struct afnetns *ns_to_afnet(struct ns_common *ns)
+{
+	return container_of(ns, struct afnetns, ns);
+}
+
+static int afnet_setup(struct afnetns *afnetns, struct net *net)
+{
+	int err;
+
+	afnetns->ns.ops = &afnetns_operations;
+	err = ns_alloc_inum(&afnetns->ns);
+	if (err)
+		return err;
+
+	refcount_set(&afnetns->ref, 1);
+	afnetns->net = get_net(net);
+
+	return err;
+}
+
+struct afnetns *afnetns_new(struct net *net)
+{
+	int err;
+	struct afnetns *afnetns;
+
+	afnetns = kzalloc(sizeof(*afnetns), GFP_KERNEL);
+	if (!afnetns)
+		return ERR_PTR(-ENOMEM);
+
+	err = afnet_setup(afnetns, net);
+	if (err) {
+		kfree(afnetns);
+		return ERR_PTR(err);
+	}
+
+	return afnetns;
+}
+
+void afnetns_free(struct afnetns *afnetns)
+{
+	ns_free_inum(&afnetns->ns);
+	put_net(afnetns->net);
+	kfree(afnetns);
+}
+
+struct afnetns *copy_afnet_ns(unsigned long flags, struct nsproxy *old)
+{
+	if (flags & CLONE_NEWNET)
+		return afnetns_get(old->net_ns->afnet_ns);
+
+	if (!(flags & CLONE_NEWAFNET))
+		return afnetns_get(old->afnet_ns);
+
+	return afnetns_new(old->net_ns);
+}
+
+static struct ns_common *afnet_get(struct task_struct *task)
+{
+	struct afnetns *afnetns = NULL;
+	struct nsproxy *nsproxy;
+
+	task_lock(task);
+	nsproxy = task->nsproxy;
+	if (nsproxy)
+		afnetns = afnetns_get(nsproxy->afnet_ns);
+	task_unlock(task);
+	return afnetns ? &afnetns->ns : NULL;
+}
+
+static void afnet_put(struct ns_common *ns)
+{
+	afnetns_put(ns_to_afnet(ns));
+}
+
+static int afnet_install(struct nsproxy *nsproxy, struct ns_common *ns)
+{
+	struct afnetns *afnetns = ns_to_afnet(ns);
+
+	if (!ns_capable(afnetns->net->user_ns, CAP_SYS_ADMIN) ||
+	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
+		return -EPERM;
+
+	/* don't allow cross netns setns */
+	if (!net_eq(nsproxy->net_ns, afnetns->net))
+		return -EINVAL;
+
+	afnetns_put(nsproxy->afnet_ns);
+	nsproxy->afnet_ns = afnetns_get(afnetns);
+
+	return 0;
+}
+
+const struct proc_ns_operations afnetns_operations = {
+	.name		= "afnet",
+	.type		= CLONE_NEWAFNET,
+	.get		= afnet_get,
+	.put		= afnet_put,
+	.install	= afnet_install,
+};
+
+int __init afnet_ns_init(void)
+{
+	int err;
+
+	err = afnet_setup(&init_afnetns, &init_net);
+	if (err)
+		return err;
+
+	pr_info("afnetns: address family namespaces available\n");
+	return err;
+}
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 652468ff65b79d..1b11883d8cdbbd 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -36,6 +36,9 @@ EXPORT_SYMBOL_GPL(net_namespace_list);
 
 struct net init_net = {
 	.dev_base_head = LIST_HEAD_INIT(init_net.dev_base_head),
+#if IS_ENABLED(CONFIG_AFNETNS)
+	.afnet_ns      = &init_afnetns,
+#endif
 };
 EXPORT_SYMBOL(init_net);
 
@@ -282,6 +285,16 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)
 	int error = 0;
 	LIST_HEAD(net_exit_list);
 
+#if IS_ENABLED(CONFIG_AFNETNS)
+	if (likely(!net_eq(&init_net, net))) {
+		net->afnet_ns = afnetns_new(net);
+		if (IS_ERR(net->afnet_ns)) {
+			error = PTR_ERR(net->afnet_ns);
+			goto out;
+		}
+	}
+#endif
+
 	atomic_set(&net->count, 1);
 	atomic_set(&net->passive, 1);
 	net->dev_base_seq = 1;
@@ -353,6 +366,9 @@ static struct net *net_alloc(void)
 
 static void net_free(struct net *net)
 {
+#if IS_ENABLED(CONFIG_AFNETNS)
+	afnetns_put(net->afnet_ns);
+#endif
 	kfree(rcu_access_pointer(net->gen));
 	kmem_cache_free(net_cachep, net);
 }
@@ -795,6 +811,11 @@ static int __init net_ns_init(void)
 	rtnl_register(PF_UNSPEC, RTM_GETNSID, rtnl_net_getid, rtnl_net_dumpid,
 		      NULL);
 
+#if IS_ENABLED(CONFIG_AFNETNS)
+	if (afnet_ns_init())
+		panic("Could not setup the initial address family namespace");
+#endif
+
 	return 0;
 }
 
@@ -1035,6 +1056,10 @@ static int netns_install(struct nsproxy *nsproxy, struct ns_common *ns)
 
 	put_net(nsproxy->net_ns);
 	nsproxy->net_ns = get_net(net);
+#if IS_ENABLED(CONFIG_AFNETNS)
+	afnetns_put(nsproxy->afnet_ns);
+	nsproxy->afnet_ns = afnetns_get(net->afnet_ns);
+#endif
 	return 0;
 }
 
-- 
2.9.3

* [PATCH net-next RFC v1 03/27] afnetns: prepare for integration into ipv4
  2017-03-12 23:01 [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level Hannes Frederic Sowa
  2017-03-12 23:01 ` [PATCH net-next RFC v1 01/27] afnetns: add CLONE_NEWAFNET flag Hannes Frederic Sowa
  2017-03-12 23:01 ` [PATCH net-next RFC v1 02/27] afnetns: basic namespace operations and representations Hannes Frederic Sowa
@ 2017-03-12 23:01 ` Hannes Frederic Sowa
  2017-03-12 23:01 ` [PATCH net-next RFC v1 04/27] afnetns: add net_afnetns Hannes Frederic Sowa
                   ` (24 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Hannes Frederic Sowa @ 2017-03-12 23:01 UTC
  To: netdev

Each IPv4 address now has an associated afnet namespace and is only used
by applications in that same afnet namespace.

To place an IP address into a specific afnet namespace, user space opens
a file descriptor for that namespace and passes it in the IFA_AFNETNS_FD
attribute of an RTM_NEWADDR rtnetlink request.

Dumping the addresses also returns the afnetns inode number
(IFA_AFNETNS_INODE), so user space can match each address to its afnet
namespace.
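
A sketch of building such a request in C, assuming the attribute
numbering of this RFC, where IFA_AFNETNS_FD directly follows IFA_FLAGS
(value 9). That attribute is not part of mainline <linux/if_addr.h>, so
the sketch defines a hypothetical SKETCH_IFA_AFNETNS_FD and only builds
the message; sending it over a NETLINK_ROUTE socket is left out.

```c
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* HYPOTHETICAL: IFA_AFNETNS_FD as numbered in this RFC (right after
 * IFA_FLAGS). Not a mainline attribute; do not send to a stock kernel. */
#define SKETCH_IFA_AFNETNS_FD 9

/* Append one rtattr to the message; returns 0, or -1 if it does not fit. */
static int add_attr(struct nlmsghdr *nlh, size_t maxlen, unsigned short type,
		    const void *data, size_t len)
{
	size_t off = NLMSG_ALIGN(nlh->nlmsg_len);
	struct rtattr *rta;

	if (off + RTA_SPACE(len) > maxlen)
		return -1;
	rta = (struct rtattr *)((char *)nlh + off);
	rta->rta_type = type;
	rta->rta_len = RTA_LENGTH(len);
	memcpy(RTA_DATA(rta), data, len);
	nlh->nlmsg_len = off + RTA_SPACE(len);
	return 0;
}

/* Build (but do not send) an RTM_NEWADDR request for an IPv4 address,
 * tagged with the afnetns referred to by afnetns_fd. Returns the message
 * length, or 0 if buf is too small. */
size_t build_newaddr_afnetns(void *buf, size_t maxlen, int ifindex,
			     struct in_addr addr, unsigned char prefixlen,
			     int32_t afnetns_fd)
{
	struct nlmsghdr *nlh = buf;
	struct ifaddrmsg *ifa;

	if (maxlen < NLMSG_SPACE(sizeof(*ifa)))
		return 0;
	memset(buf, 0, maxlen);
	nlh->nlmsg_len = NLMSG_LENGTH(sizeof(*ifa));
	nlh->nlmsg_type = RTM_NEWADDR;
	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE | NLM_F_EXCL;

	ifa = NLMSG_DATA(nlh);
	ifa->ifa_family = AF_INET;
	ifa->ifa_prefixlen = prefixlen;
	ifa->ifa_index = (unsigned int)ifindex;

	/* The patch's policy reads the fd as NLA_S32 (ifa_ipv4_policy). */
	if (add_attr(nlh, maxlen, IFA_LOCAL, &addr, sizeof(addr)) ||
	    add_attr(nlh, maxlen, IFA_ADDRESS, &addr, sizeof(addr)) ||
	    add_attr(nlh, maxlen, SKETCH_IFA_AFNETNS_FD,
		     &afnetns_fd, sizeof(afnetns_fd)))
		return 0;
	return nlh->nlmsg_len;
}
```

The afnetns file descriptor would typically come from opening
/proc/PID/ns/afnet of a process inside the target namespace.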

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/linux/inetdevice.h   |  3 +++
 include/net/afnetns.h        |  2 ++
 include/uapi/linux/if_addr.h |  2 ++
 net/core/afnetns.c           | 26 ++++++++++++++++++++++++++
 net/ipv4/devinet.c           | 39 ++++++++++++++++++++++++++++++++++++++-
 5 files changed, 71 insertions(+), 1 deletion(-)

diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index ee971f335a8b65..d5ac959e90baa1 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -141,6 +141,9 @@ struct in_ifaddr {
 	unsigned char		ifa_scope;
 	unsigned char		ifa_prefixlen;
 	__u32			ifa_flags;
+#if IS_ENABLED(CONFIG_AFNETNS)
+	struct afnetns		*afnetns;
+#endif
 	char			ifa_label[IFNAMSIZ];
 
 	/* In seconds, relative to tstamp. Expiry is at tstamp + HZ * lft. */
diff --git a/include/net/afnetns.h b/include/net/afnetns.h
index d5fbb83023acd6..9039086717c356 100644
--- a/include/net/afnetns.h
+++ b/include/net/afnetns.h
@@ -19,6 +19,8 @@ int afnet_ns_init(void);
 
 struct afnetns *afnetns_new(struct net *net);
 struct afnetns *copy_afnet_ns(unsigned long flags, struct nsproxy *old);
+struct afnetns *afnetns_get_by_fd(int fd);
+unsigned int afnetns_to_inode(struct afnetns *afnetns);
 void afnetns_free(struct afnetns *afnetns);
 
 static inline struct afnetns *afnetns_get(struct afnetns *afnetns)
diff --git a/include/uapi/linux/if_addr.h b/include/uapi/linux/if_addr.h
index 4318ab1635cedf..c67703808584eb 100644
--- a/include/uapi/linux/if_addr.h
+++ b/include/uapi/linux/if_addr.h
@@ -32,6 +32,8 @@ enum {
 	IFA_CACHEINFO,
 	IFA_MULTICAST,
 	IFA_FLAGS,
+	IFA_AFNETNS_FD,
+	IFA_AFNETNS_INODE,
 	__IFA_MAX,
 };
 
diff --git a/net/core/afnetns.c b/net/core/afnetns.c
index 997623e4dc5078..12b823ae780796 100644
--- a/net/core/afnetns.c
+++ b/net/core/afnetns.c
@@ -2,6 +2,7 @@
 #include <net/net_namespace.h>
 #include <linux/sched.h>
 #include <linux/sched/task.h>
+#include <linux/file.h>
 #include <linux/nsproxy.h>
 #include <linux/proc_ns.h>
 
@@ -56,6 +57,31 @@ void afnetns_free(struct afnetns *afnetns)
 	kfree(afnetns);
 }
 
+struct afnetns *afnetns_get_by_fd(int fd)
+{
+	struct file *file;
+	struct ns_common *ns;
+	struct afnetns *afnetns;
+
+	file = proc_ns_fget(fd);
+	if (IS_ERR(file))
+		return ERR_CAST(file);
+
+	ns = get_proc_ns(file_inode(file));
+	if (ns->ops == &afnetns_operations)
+		afnetns = afnetns_get(ns_to_afnet(ns));
+	else
+		afnetns = ERR_PTR(-EINVAL);
+
+	fput(file);
+	return afnetns;
+}
+
+unsigned int afnetns_to_inode(struct afnetns *afnetns)
+{
+	return afnetns->ns.inum;
+}
+
 struct afnetns *copy_afnet_ns(unsigned long flags, struct nsproxy *old)
 {
 	if (flags & CLONE_NEWNET)
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index cebedd545e5e28..d4a38b6e9adb79 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -99,6 +99,7 @@ static const struct nla_policy ifa_ipv4_policy[IFA_MAX+1] = {
 	[IFA_LABEL]     	= { .type = NLA_STRING, .len = IFNAMSIZ - 1 },
 	[IFA_CACHEINFO]		= { .len = sizeof(struct ifa_cacheinfo) },
 	[IFA_FLAGS]		= { .type = NLA_U32 },
+	[IFA_AFNETNS_FD]	= { .type = NLA_S32 },
 };
 
 #define IN4_ADDR_HSIZE_SHIFT	8
@@ -203,6 +204,9 @@ static void inet_rcu_free_ifa(struct rcu_head *head)
 	struct in_ifaddr *ifa = container_of(head, struct in_ifaddr, rcu_head);
 	if (ifa->ifa_dev)
 		in_dev_put(ifa->ifa_dev);
+#if IS_ENABLED(CONFIG_AFNETNS)
+	afnetns_put(ifa->afnetns);
+#endif
 	kfree(ifa);
 }
 
@@ -805,6 +809,26 @@ static struct in_ifaddr *rtm_to_ifaddr(struct net *net, struct nlmsghdr *nlh,
 	else
 		memcpy(ifa->ifa_label, dev->name, IFNAMSIZ);
 
+#if IS_ENABLED(CONFIG_AFNETNS)
+	if (tb[IFA_AFNETNS_FD]) {
+		int fd = nla_get_s32(tb[IFA_AFNETNS_FD]);
+
+		ifa->afnetns = afnetns_get_by_fd(fd);
+		if (IS_ERR(ifa->afnetns)) {
+			err = PTR_ERR(ifa->afnetns);
+			ifa->afnetns = afnetns_get(net->afnet_ns);
+			goto errout_free;
+		}
+	} else {
+		ifa->afnetns = afnetns_get(net->afnet_ns);
+	}
+#else
+	if (tb[IFA_AFNETNS_FD]) {
+		err = -EOPNOTSUPP;
+		goto errout_free;
+	}
+#endif
+
 	if (tb[IFA_CACHEINFO]) {
 		struct ifa_cacheinfo *ci;
 
@@ -1089,6 +1113,9 @@ int devinet_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 			ifa->ifa_mask = inet_make_mask(32);
 		}
 		set_ifa_lifetime(ifa, INFINITY_LIFE_TIME, INFINITY_LIFE_TIME);
+#if IS_ENABLED(CONFIG_AFNETNS)
+		ifa->afnetns = afnetns_get(net->afnet_ns);
+#endif
 		ret = inet_set_ifa(dev, ifa);
 		break;
 
@@ -1444,6 +1471,9 @@ static int inetdev_event(struct notifier_block *this, unsigned long event,
 				in_dev_hold(in_dev);
 				ifa->ifa_dev = in_dev;
 				ifa->ifa_scope = RT_SCOPE_HOST;
+#if IS_ENABLED(CONFIG_AFNETNS)
+				ifa->afnetns = afnetns_get(dev_net(dev)->afnet_ns);
+#endif
 				memcpy(ifa->ifa_label, dev->name, IFNAMSIZ);
 				set_ifa_lifetime(ifa, INFINITY_LIFE_TIME,
 						 INFINITY_LIFE_TIME);
@@ -1504,7 +1534,8 @@ static size_t inet_nlmsg_size(void)
 	       + nla_total_size(4) /* IFA_BROADCAST */
 	       + nla_total_size(IFNAMSIZ) /* IFA_LABEL */
 	       + nla_total_size(4)  /* IFA_FLAGS */
-	       + nla_total_size(sizeof(struct ifa_cacheinfo)); /* IFA_CACHEINFO */
+	       + nla_total_size(sizeof(struct ifa_cacheinfo)) /* IFA_CACHEINFO */
+	       + nla_total_size(4); /* IFA_AFNETNS_INODE */
 }
 
 static inline u32 cstamp_delta(unsigned long cstamp)
@@ -1577,6 +1608,12 @@ static int inet_fill_ifaddr(struct sk_buff *skb, struct in_ifaddr *ifa,
 			  preferred, valid))
 		goto nla_put_failure;
 
+#if IS_ENABLED(CONFIG_AFNETNS)
+	if (nla_put_u32(skb, IFA_AFNETNS_INODE,
+			afnetns_to_inode(ifa->afnetns)))
+		goto nla_put_failure;
+#endif
+
 	nlmsg_end(skb, nlh);
 	return 0;
 
-- 
2.9.3

* [PATCH net-next RFC v1 04/27] afnetns: add net_afnetns
  2017-03-12 23:01 [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level Hannes Frederic Sowa
                   ` (2 preceding siblings ...)
  2017-03-12 23:01 ` [PATCH net-next RFC v1 03/27] afnetns: prepare for integration into ipv4 Hannes Frederic Sowa
@ 2017-03-12 23:01 ` Hannes Frederic Sowa
  2017-03-12 23:01 ` [PATCH net-next RFC v1 05/27] afnetns: ipv6 integration Hannes Frederic Sowa
                   ` (23 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Hannes Frederic Sowa @ 2017-03-12 23:01 UTC
  To: netdev

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/net/net_namespace.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index c59fb018da5e46..9be39b8315a6f9 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -244,6 +244,14 @@ int net_eq(const struct net *net1, const struct net *net2)
 #define net_drop_ns NULL
 #endif
 
+static inline struct afnetns *net_afnetns(struct net *net)
+{
+#if IS_ENABLED(CONFIG_AFNETNS)
+	return net->afnet_ns;
+#else
+	return NULL;
+#endif
+}
 
 typedef struct {
 #ifdef CONFIG_NET_NS
-- 
2.9.3

* [PATCH net-next RFC v1 05/27] afnetns: ipv6 integration
  2017-03-12 23:01 [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level Hannes Frederic Sowa
                   ` (3 preceding siblings ...)
  2017-03-12 23:01 ` [PATCH net-next RFC v1 04/27] afnetns: add net_afnetns Hannes Frederic Sowa
@ 2017-03-12 23:01 ` Hannes Frederic Sowa
  2017-03-12 23:01 ` [PATCH net-next RFC v1 06/27] afnetns: put afnetns pointer into struct sock Hannes Frederic Sowa
                   ` (22 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Hannes Frederic Sowa @ 2017-03-12 23:01 UTC
  To: netdev

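Like the IPv4 counterpart earlier in this series, this patch associates
every IPv6 address with an afnet namespace. The namespace can be chosen
through a file descriptor (IFA_AFNETNS_FD) when an address is added over
netlink. Without that attribute, the network namespace's default afnetns is
used. The namespace's inode number is reported (IFA_AFNETNS_INODE) when
addresses are dumped.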
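
Below is a minimal sketch of how an address's afnetns reference is
managed across inet6_rtm_newaddr(), __ipv6_add_addr() and
inet6_ifa_finish_destroy(). It assumes afnetns_get_by_fd() returns a
referenced pointer, which the afnetns_put() at the out: label suggests.
struct afns, struct addr6 and the helper functions are hypothetical
stand-ins, not kernel types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct afnetns: only the reference count. */
struct afns {
	int refcnt;
};

static struct afns *afns_get(struct afns *ns)
{
	ns->refcnt++;
	return ns;
}

static void afns_put(struct afns *ns)
{
	assert(ns->refcnt > 0);	/* catch unbalanced puts */
	ns->refcnt--;
}

/* Hypothetical stand-in for struct inet6_ifaddr. */
struct addr6 {
	struct afns *afnetns;
};

/* Shape of inet6_rtm_newaddr(). First take a temporary reference on the
 * namespace named by IFA_AFNETNS_FD, or on the netns default if the
 * attribute is absent. Then let the new address take its own reference,
 * as __ipv6_add_addr() does. Finally drop the temporary reference, as
 * the out: label does. */
static void addr6_add(struct addr6 *a, struct afns *from_fd,
		      struct afns *net_default)
{
	struct afns *ns = afns_get(from_fd ? from_fd : net_default);

	a->afnetns = afns_get(ns);
	afns_put(ns);
}

/* Shape of inet6_ifa_finish_destroy(): release the address's reference. */
static void addr6_destroy(struct addr6 *a)
{
	afns_put(a->afnetns);
	a->afnetns = NULL;
}
```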

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/net/if_inet6.h |  3 +++
 net/core/afnetns.c     |  3 +++
 net/ipv6/addrconf.c    | 70 +++++++++++++++++++++++++++++++++++++++++---------
 3 files changed, 64 insertions(+), 12 deletions(-)

diff --git a/include/net/if_inet6.h b/include/net/if_inet6.h
index f656f9051acafa..cad645851501f4 100644
--- a/include/net/if_inet6.h
+++ b/include/net/if_inet6.h
@@ -41,6 +41,9 @@ enum {
 struct inet6_ifaddr {
 	struct in6_addr		addr;
 	__u32			prefix_len;
+#if IS_ENABLED(CONFIG_AFNETNS)
+	struct afnetns		*afnetns;
+#endif
 
 	/* In seconds, relative to tstamp. Expiry is at tstamp + HZ * lft. */
 	__u32			valid_lft;
diff --git a/net/core/afnetns.c b/net/core/afnetns.c
index 12b823ae780796..b96c25b5ebe30d 100644
--- a/net/core/afnetns.c
+++ b/net/core/afnetns.c
@@ -56,6 +56,7 @@ void afnetns_free(struct afnetns *afnetns)
 	put_net(afnetns->net);
 	kfree(afnetns);
 }
+EXPORT_SYMBOL(afnetns_free);
 
 struct afnetns *afnetns_get_by_fd(int fd)
 {
@@ -76,11 +77,13 @@ struct afnetns *afnetns_get_by_fd(int fd)
 	fput(file);
 	return afnetns;
 }
+EXPORT_SYMBOL(afnetns_get_by_fd);
 
 unsigned int afnetns_to_inode(struct afnetns *afnetns)
 {
 	return afnetns->ns.inum;
 }
+EXPORT_SYMBOL(afnetns_to_inode);
 
 struct afnetns *copy_afnet_ns(unsigned long flags, struct nsproxy *old)
 {
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 8c69768a5c4606..c67f6d3c5b9a7a 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -910,7 +910,9 @@ void inet6_ifa_finish_destroy(struct inet6_ifaddr *ifp)
 		return;
 	}
 	ip6_rt_put(ifp->rt);
-
+#if IS_ENABLED(CONFIG_AFNETNS)
+	afnetns_put(ifp->afnetns);
+#endif
 	kfree_rcu(ifp, rcu);
 }
 
@@ -942,9 +944,10 @@ static u32 inet6_addr_hash(const struct in6_addr *addr)
 /* On success it returns ifp with increased reference count */
 
 static struct inet6_ifaddr *
-ipv6_add_addr(struct inet6_dev *idev, const struct in6_addr *addr,
-	      const struct in6_addr *peer_addr, int pfxlen,
-	      int scope, u32 flags, u32 valid_lft, u32 prefered_lft)
+__ipv6_add_addr(struct inet6_dev *idev, const struct in6_addr *addr,
+		const struct in6_addr *peer_addr, int pfxlen,
+		int scope, u32 flags, u32 valid_lft, u32 prefered_lft,
+		struct afnetns *afnetns)
 {
 	struct net *net = dev_net(idev->dev);
 	struct inet6_ifaddr *ifa = NULL;
@@ -1002,7 +1005,9 @@ ipv6_add_addr(struct inet6_dev *idev, const struct in6_addr *addr,
 	ifa->addr = *addr;
 	if (peer_addr)
 		ifa->peer_addr = *peer_addr;
-
+#if IS_ENABLED(CONFIG_AFNETNS)
+	ifa->afnetns = afnetns_get(afnetns);
+#endif
 	spin_lock_init(&ifa->lock);
 	INIT_DELAYED_WORK(&ifa->dad_work, addrconf_dad_work);
 	INIT_HLIST_NODE(&ifa->addr_lst);
@@ -1054,6 +1059,17 @@ ipv6_add_addr(struct inet6_dev *idev, const struct in6_addr *addr,
 	goto out2;
 }
 
+static struct inet6_ifaddr *ipv6_add_addr(struct inet6_dev *idev,
+					  const struct in6_addr *addr,
+					  const struct in6_addr *peer_addr,
+					  int pfxlen, int scope, u32 flags,
+					  u32 valid_lft, u32 prefered_lft)
+{
+	return __ipv6_add_addr(idev, addr, peer_addr, pfxlen, scope, flags,
+			       valid_lft, prefered_lft,
+			       net_afnetns(dev_net(idev->dev)));
+}
+
 enum cleanup_prefix_rt_t {
 	CLEANUP_PREFIX_RT_NOP,    /* no cleanup action for prefix route */
 	CLEANUP_PREFIX_RT_DEL,    /* delete the prefix route */
@@ -2741,7 +2757,8 @@ static int inet6_addr_add(struct net *net, int ifindex,
 			  const struct in6_addr *pfx,
 			  const struct in6_addr *peer_pfx,
 			  unsigned int plen, __u32 ifa_flags,
-			  __u32 prefered_lft, __u32 valid_lft)
+			  __u32 prefered_lft, __u32 valid_lft,
+			  struct afnetns *afnetns)
 {
 	struct inet6_ifaddr *ifp;
 	struct inet6_dev *idev;
@@ -2799,8 +2816,8 @@ static int inet6_addr_add(struct net *net, int ifindex,
 		prefered_lft = timeout;
 	}
 
-	ifp = ipv6_add_addr(idev, pfx, peer_pfx, plen, scope, ifa_flags,
-			    valid_lft, prefered_lft);
+	ifp = __ipv6_add_addr(idev, pfx, peer_pfx, plen, scope, ifa_flags,
+			      valid_lft, prefered_lft, afnetns);
 
 	if (!IS_ERR(ifp)) {
 		if (!(ifa_flags & IFA_F_NOPREFIXROUTE)) {
@@ -2885,7 +2902,8 @@ int addrconf_add_ifaddr(struct net *net, void __user *arg)
 	rtnl_lock();
 	err = inet6_addr_add(net, ireq.ifr6_ifindex, &ireq.ifr6_addr, NULL,
 			     ireq.ifr6_prefixlen, IFA_F_PERMANENT,
-			     INFINITY_LIFE_TIME, INFINITY_LIFE_TIME);
+			     INFINITY_LIFE_TIME, INFINITY_LIFE_TIME,
+			     net_afnetns(net));
 	rtnl_unlock();
 	return err;
 }
@@ -4502,6 +4520,7 @@ inet6_rtm_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh)
 	struct nlattr *tb[IFA_MAX+1];
 	struct in6_addr *pfx, *peer_pfx;
 	struct inet6_ifaddr *ifa;
+	struct afnetns *afnetns = NULL;
 	struct net_device *dev;
 	u32 valid_lft = INFINITY_LIFE_TIME, preferred_lft = INFINITY_LIFE_TIME;
 	u32 ifa_flags;
@@ -4537,15 +4556,31 @@ inet6_rtm_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh)
 	ifa_flags &= IFA_F_NODAD | IFA_F_HOMEADDRESS | IFA_F_MANAGETEMPADDR |
 		     IFA_F_NOPREFIXROUTE | IFA_F_MCAUTOJOIN;
 
+#if IS_ENABLED(CONFIG_AFNETNS)
+	if (tb[IFA_AFNETNS_FD]) {
+		int fd = nla_get_s32(tb[IFA_AFNETNS_FD]);
+
+		afnetns = afnetns_get_by_fd(fd);
+		if (IS_ERR(afnetns))
+			return PTR_ERR(afnetns);
+	} else {
+		afnetns = afnetns_get(net_afnetns(net));
+	}
+#else
+	if (tb[IFA_AFNETNS_FD])
+		return -EOPNOTSUPP;
+#endif
+
 	ifa = ipv6_get_ifaddr(net, pfx, dev, 1);
 	if (!ifa) {
 		/*
 		 * It would be best to check for !NLM_F_CREATE here but
 		 * userspace already relies on not having to provide this.
 		 */
-		return inet6_addr_add(net, ifm->ifa_index, pfx, peer_pfx,
+		err =  inet6_addr_add(net, ifm->ifa_index, pfx, peer_pfx,
 				      ifm->ifa_prefixlen, ifa_flags,
-				      preferred_lft, valid_lft);
+				      preferred_lft, valid_lft, afnetns);
+		goto out;
 	}
 
 	if (nlh->nlmsg_flags & NLM_F_EXCL ||
@@ -4555,6 +4590,10 @@ inet6_rtm_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh)
 		err = inet6_addr_modify(ifa, ifa_flags, preferred_lft, valid_lft);
 
 	in6_ifa_put(ifa);
+out:
+#if IS_ENABLED(CONFIG_AFNETNS)
+	afnetns_put(afnetns);
+#endif
 
 	return err;
 }
@@ -4603,7 +4642,8 @@ static inline int inet6_ifaddr_msgsize(void)
 	       + nla_total_size(16) /* IFA_LOCAL */
 	       + nla_total_size(16) /* IFA_ADDRESS */
 	       + nla_total_size(sizeof(struct ifa_cacheinfo))
-	       + nla_total_size(4)  /* IFA_FLAGS */;
+	       + nla_total_size(4)  /* IFA_FLAGS */
+	       + nla_total_size(4); /* IFA_AFNETNS_INODE */
 }
 
 static int inet6_fill_ifaddr(struct sk_buff *skb, struct inet6_ifaddr *ifa,
@@ -4655,6 +4695,12 @@ static int inet6_fill_ifaddr(struct sk_buff *skb, struct inet6_ifaddr *ifa,
 	if (nla_put_u32(skb, IFA_FLAGS, ifa->flags) < 0)
 		goto error;
 
+#if IS_ENABLED(CONFIG_AFNETNS)
+	if (nla_put_u32(skb, IFA_AFNETNS_INODE,
+			afnetns_to_inode(ifa->afnetns)))
+		goto error;
+#endif
+
 	nlmsg_end(skb, nlh);
 	return 0;
 
-- 
2.9.3

* [PATCH net-next RFC v1 06/27] afnetns: put afnetns pointer into struct sock
  2017-03-12 23:01 [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level Hannes Frederic Sowa
                   ` (4 preceding siblings ...)
  2017-03-12 23:01 ` [PATCH net-next RFC v1 05/27] afnetns: ipv6 integration Hannes Frederic Sowa
@ 2017-03-12 23:01 ` Hannes Frederic Sowa
  2017-03-12 23:01 ` [PATCH net-next RFC v1 07/27] ipv4: introduce ifa_find_rcu Hannes Frederic Sowa
                   ` (21 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Hannes Frederic Sowa @ 2017-03-12 23:01 UTC
  To: netdev

Every socket is associated with its creator's afnet namespace.

In-kernel socket creation needs some care. Kernel sockets (those without
sk_net_refcnt set) are associated with the afnet namespace of their
network namespace, not with the afnetns of the current process context,
and no reference is taken on it.
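
Below is a minimal sketch of the ownership rules in sk_alloc(),
sk_clone_lock() and __sk_destruct() as modified by this patch. struct
afns, struct sock_model and the helpers are hypothetical stand-ins that
track only a reference count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct afnetns: only the reference count. */
struct afns {
	int refcnt;
};

static struct afns *afns_get(struct afns *ns)
{
	ns->refcnt++;
	return ns;
}

static void afns_put(struct afns *ns)
{
	assert(ns->refcnt > 0);	/* catch unbalanced puts */
	ns->refcnt--;
}

/* Hypothetical model of the two struct sock fields involved. */
struct sock_model {
	bool net_refcnt;	/* sk->sk_net_refcnt; false for kernel sockets */
	struct afns *afnet;	/* sk->sk_afnet */
};

/* sk_alloc(): user sockets pin the creator's afnetns. Kernel sockets
 * borrow the netns default without taking a reference. */
static void sock_model_alloc(struct sock_model *sk, bool net_refcnt,
			     struct afns *current_ns, struct afns *net_ns)
{
	sk->net_refcnt = net_refcnt;
	sk->afnet = net_refcnt ? afns_get(current_ns) : net_ns;
}

/* sk_clone_lock(): the copy shares the pointer and, if it is
 * refcounted, takes its own reference. */
static void sock_model_clone(struct sock_model *newsk,
			     const struct sock_model *sk)
{
	*newsk = *sk;
	if (newsk->net_refcnt)
		afns_get(newsk->afnet);
}

/* __sk_destruct(): only refcounted sockets drop a reference. */
static void sock_model_destruct(struct sock_model *sk)
{
	if (sk->net_refcnt)
		afns_put(sk->afnet);
}
```

This mirrors how kernel sockets treat the network namespace itself: with
sk_net_refcnt unset they do not call get_net(), so they likewise borrow
the afnetns pointer without pinning it.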

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/net/sock.h |  4 ++++
 net/core/sock.c    | 18 ++++++++++++++++--
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 6db7693b9e6185..1e05d497db2520 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -183,6 +183,9 @@ struct sock_common {
 	};
 	struct proto		*skc_prot;
 	possible_net_t		skc_net;
+#if IS_ENABLED(CONFIG_AFNETNS)
+	struct afnetns		*skc_afnet;
+#endif
 
 #if IS_ENABLED(CONFIG_IPV6)
 	struct in6_addr		skc_v6_daddr;
@@ -337,6 +340,7 @@ struct sock {
 #define sk_bind_node		__sk_common.skc_bind_node
 #define sk_prot			__sk_common.skc_prot
 #define sk_net			__sk_common.skc_net
+#define sk_afnet		__sk_common.skc_afnet
 #define sk_v6_daddr		__sk_common.skc_v6_daddr
 #define sk_v6_rcv_saddr	__sk_common.skc_v6_rcv_saddr
 #define sk_cookie		__sk_common.skc_cookie
diff --git a/net/core/sock.c b/net/core/sock.c
index 768aedf238f5b4..542d496858f993 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1458,6 +1458,12 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 		if (likely(sk->sk_net_refcnt))
 			get_net(net);
 		sock_net_set(sk, net);
+#if IS_ENABLED(CONFIG_AFNETNS)
+		if (likely(sk->sk_net_refcnt))
+			sk->sk_afnet = afnetns_get(current->nsproxy->afnet_ns);
+		else
+			sk->sk_afnet = net->afnet_ns;
+#endif
 		atomic_set(&sk->sk_wmem_alloc, 1);
 
 		mem_cgroup_sk_alloc(sk);
@@ -1499,8 +1505,12 @@ static void __sk_destruct(struct rcu_head *head)
 	if (sk->sk_peer_cred)
 		put_cred(sk->sk_peer_cred);
 	put_pid(sk->sk_peer_pid);
-	if (likely(sk->sk_net_refcnt))
+	if (likely(sk->sk_net_refcnt)) {
 		put_net(sock_net(sk));
+#if IS_ENABLED(CONFIG_AFNETNS)
+		afnetns_put(sk->sk_afnet);
+#endif
+	}
 	sk_prot_free(sk->sk_prot_creator, sk);
 }
 
@@ -1572,8 +1582,12 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		sock_copy(newsk, sk);
 
 		/* SANITY */
-		if (likely(newsk->sk_net_refcnt))
+		if (likely(newsk->sk_net_refcnt)) {
 			get_net(sock_net(newsk));
+#if IS_ENABLED(CONFIG_AFNETNS)
+			afnetns_get(newsk->sk_afnet);
+#endif
+		}
 		sk_node_init(&newsk->sk_node);
 		sock_lock_init(newsk);
 		bh_lock_sock(newsk);
-- 
2.9.3

* [PATCH net-next RFC v1 07/27] ipv4: introduce ifa_find_rcu
  2017-03-12 23:01 [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level Hannes Frederic Sowa
                   ` (5 preceding siblings ...)
  2017-03-12 23:01 ` [PATCH net-next RFC v1 06/27] afnetns: put afnetns pointer into struct sock Hannes Frederic Sowa
@ 2017-03-12 23:01 ` Hannes Frederic Sowa
  2017-03-12 23:01 ` [PATCH net-next RFC v1 08/27] afnetns: factor out inet_allow_bind Hannes Frederic Sowa
                   ` (20 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Hannes Frederic Sowa @ 2017-03-12 23:01 UTC
  To: netdev

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/linux/inetdevice.h |  1 +
 net/ipv4/devinet.c         | 29 +++++++++++++++++------------
 2 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index d5ac959e90baa1..eb1b662f62626f 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -159,6 +159,7 @@ int unregister_inetaddr_notifier(struct notifier_block *nb);
 void inet_netconf_notify_devconf(struct net *net, int type, int ifindex,
 				 struct ipv4_devconf *devconf);
 
+struct in_ifaddr *ifa_find_rcu(struct net *net, __be32 addr);
 struct net_device *__ip_dev_find(struct net *net, __be32 addr, bool devref);
 static inline struct net_device *ip_dev_find(struct net *net, __be32 addr)
 {
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index d4a38b6e9adb79..cc15afefa1df0a 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -128,6 +128,20 @@ static void inet_hash_remove(struct in_ifaddr *ifa)
 	hlist_del_init_rcu(&ifa->hash);
 }
 
+struct in_ifaddr *ifa_find_rcu(struct net *net, __be32 addr)
+{
+	u32 hash = inet_addr_hash(net, addr);
+	struct in_ifaddr *ifa;
+
+	hlist_for_each_entry_rcu(ifa, &inet_addr_lst[hash], hash) {
+		if (ifa->ifa_local == addr &&
+		    net_eq(dev_net(ifa->ifa_dev->dev), net))
+			return ifa;
+	}
+
+	return NULL;
+}
+
 /**
  * __ip_dev_find - find the first device with a given source address.
  * @net: the net namespace
@@ -138,21 +152,12 @@ static void inet_hash_remove(struct in_ifaddr *ifa)
  */
 struct net_device *__ip_dev_find(struct net *net, __be32 addr, bool devref)
 {
-	u32 hash = inet_addr_hash(net, addr);
-	struct net_device *result = NULL;
+	struct net_device *result;
 	struct in_ifaddr *ifa;
 
 	rcu_read_lock();
-	hlist_for_each_entry_rcu(ifa, &inet_addr_lst[hash], hash) {
-		if (ifa->ifa_local == addr) {
-			struct net_device *dev = ifa->ifa_dev->dev;
-
-			if (!net_eq(dev_net(dev), net))
-				continue;
-			result = dev;
-			break;
-		}
-	}
+	ifa = ifa_find_rcu(net, addr);
+	result = ifa ? ifa->ifa_dev->dev : NULL;
 	if (!result) {
 		struct flowi4 fl4 = { .daddr = addr };
 		struct fib_result res = { 0 };
-- 
2.9.3

* [PATCH net-next RFC v1 08/27] afnetns: factor out inet_allow_bind
  2017-03-12 23:01 [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level Hannes Frederic Sowa
                   ` (6 preceding siblings ...)
  2017-03-12 23:01 ` [PATCH net-next RFC v1 07/27] ipv4: introduce ifa_find_rcu Hannes Frederic Sowa
@ 2017-03-12 23:01 ` Hannes Frederic Sowa
  2017-03-12 23:01 ` [PATCH net-next RFC v1 09/27] afnetns: add sock_afnetns Hannes Frederic Sowa
                   ` (19 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Hannes Frederic Sowa @ 2017-03-12 23:01 UTC
  To: netdev

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/net/inet_common.h |  1 +
 net/ipv4/af_inet.c        | 51 ++++++++++++++++++++++++++++++-----------------
 2 files changed, 34 insertions(+), 18 deletions(-)

diff --git a/include/net/inet_common.h b/include/net/inet_common.h
index b7952d55b9c000..4ac8229dca6af4 100644
--- a/include/net/inet_common.h
+++ b/include/net/inet_common.h
@@ -30,6 +30,7 @@ int inet_shutdown(struct socket *sock, int how);
 int inet_listen(struct socket *sock, int backlog);
 void inet_sock_destruct(struct sock *sk);
 int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len);
+int inet_allow_bind(struct sock *sk, __be32 addr);
 int inet_getname(struct socket *sock, struct sockaddr *uaddr, int *uaddr_len,
 		 int peer);
 int inet_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 602d40f43687c9..aee599e23137e7 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -428,6 +428,35 @@ int inet_release(struct socket *sock)
 }
 EXPORT_SYMBOL(inet_release);
 
+int inet_allow_bind(struct sock *sk, __be32 addr)
+{
+	struct inet_sock *inet = inet_sk(sk);
+	struct net *net = sock_net(sk);
+	u32 tb_id = RT_TABLE_LOCAL;
+	int chk_addr_ret;
+
+	tb_id = l3mdev_fib_table_by_index(net, sk->sk_bound_dev_if) ? : tb_id;
+	chk_addr_ret = inet_addr_type_table(net, addr, tb_id);
+
+	/* Not specified by any standard per-se, however it breaks too
+	 * many applications when removed.  It is unfortunate since
+	 * allowing applications to make a non-local bind solves
+	 * several problems with systems using dynamic addressing.
+	 * (ie. your servers still start up even if your ISDN link
+	 *  is temporarily down)
+	 */
+	if (!net->ipv4.sysctl_ip_nonlocal_bind &&
+	    !(inet->freebind || inet->transparent) &&
+	    addr != htonl(INADDR_ANY) &&
+	    chk_addr_ret != RTN_LOCAL &&
+	    chk_addr_ret != RTN_MULTICAST &&
+	    chk_addr_ret != RTN_BROADCAST)
+		return -EADDRNOTAVAIL;
+
+	return chk_addr_ret;
+}
+EXPORT_SYMBOL(inet_allow_bind);
+
 int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 {
 	struct sockaddr_in *addr = (struct sockaddr_in *)uaddr;
@@ -436,7 +465,6 @@ int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 	struct net *net = sock_net(sk);
 	unsigned short snum;
 	int chk_addr_ret;
-	u32 tb_id = RT_TABLE_LOCAL;
 	int err;
 
 	/* If the socket has its own bind function then use it. (RAW) */
@@ -458,24 +486,11 @@ int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 			goto out;
 	}
 
-	tb_id = l3mdev_fib_table_by_index(net, sk->sk_bound_dev_if) ? : tb_id;
-	chk_addr_ret = inet_addr_type_table(net, addr->sin_addr.s_addr, tb_id);
-
-	/* Not specified by any standard per-se, however it breaks too
-	 * many applications when removed.  It is unfortunate since
-	 * allowing applications to make a non-local bind solves
-	 * several problems with systems using dynamic addressing.
-	 * (ie. your servers still start up even if your ISDN link
-	 *  is temporarily down)
-	 */
-	err = -EADDRNOTAVAIL;
-	if (!net->ipv4.sysctl_ip_nonlocal_bind &&
-	    !(inet->freebind || inet->transparent) &&
-	    addr->sin_addr.s_addr != htonl(INADDR_ANY) &&
-	    chk_addr_ret != RTN_LOCAL &&
-	    chk_addr_ret != RTN_MULTICAST &&
-	    chk_addr_ret != RTN_BROADCAST)
+	chk_addr_ret = inet_allow_bind(sk, addr->sin_addr.s_addr);
+	if (chk_addr_ret < 0) {
+		err = chk_addr_ret;
 		goto out;
+	}
 
 	snum = ntohs(addr->sin_port);
 	err = -EACCES;
-- 
2.9.3

* [PATCH net-next RFC v1 09/27] afnetns: add sock_afnetns
  2017-03-12 23:01 [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level Hannes Frederic Sowa
                   ` (7 preceding siblings ...)
  2017-03-12 23:01 ` [PATCH net-next RFC v1 08/27] afnetns: factor out inet_allow_bind Hannes Frederic Sowa
@ 2017-03-12 23:01 ` Hannes Frederic Sowa
  2017-03-12 23:01 ` [PATCH net-next RFC v1 10/27] afnetns: add ifa_find_afnetns_rcu Hannes Frederic Sowa
                   ` (18 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Hannes Frederic Sowa @ 2017-03-12 23:01 UTC
  To: netdev

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/net/sock.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/net/sock.h b/include/net/sock.h
index 1e05d497db2520..aa204bf3537ba0 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2293,6 +2293,15 @@ struct net *sock_net(const struct sock *sk)
 	return read_pnet(&sk->sk_net);
 }
 
+static inline struct afnetns *sock_afnetns(const struct sock *sk)
+{
+#if IS_ENABLED(CONFIG_AFNETNS)
+	return sk->sk_afnet;
+#else
+	return NULL;
+#endif
+}
+
 static inline
 void sock_net_set(struct sock *sk, struct net *net)
 {
-- 
2.9.3

* [PATCH net-next RFC v1 10/27] afnetns: add ifa_find_afnetns_rcu
  2017-03-12 23:01 [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level Hannes Frederic Sowa
                   ` (8 preceding siblings ...)
  2017-03-12 23:01 ` [PATCH net-next RFC v1 09/27] afnetns: add sock_afnetns Hannes Frederic Sowa
@ 2017-03-12 23:01 ` Hannes Frederic Sowa
  2017-03-12 23:01 ` [PATCH net-next RFC v1 11/27] afnetns: validate afnetns in inet_allow_bind Hannes Frederic Sowa
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Hannes Frederic Sowa @ 2017-03-12 23:01 UTC
  To: netdev

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/linux/inetdevice.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index eb1b662f62626f..01cbcfe93383b7 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -180,6 +180,17 @@ static __inline__ bool inet_ifa_match(__be32 addr, struct in_ifaddr *ifa)
 	return !((addr^ifa->ifa_address)&ifa->ifa_mask);
 }
 
+static inline struct afnetns *ifa_find_afnetns_rcu(struct net *net, __be32 addr)
+{
+#if IS_ENABLED(CONFIG_AFNETNS)
+	struct in_ifaddr *ifa = ifa_find_rcu(net, addr);
+
+	return ifa ? ifa->afnetns : net->afnet_ns;
+#else
+	return NULL;
+#endif
+}
+
 /*
  *	Check if a mask is acceptable.
  */
-- 
2.9.3
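
ifa_find_rcu() (patch 07) and ifa_find_afnetns_rcu() (patch 10) together
map a local IPv4 address to the afnetns that owns it. If no address
matches, they fall back to the network namespace's default afnetns. Below
is a minimal userspace sketch of that lookup. The toy hash, the
single-threaded list (no RCU) and all *_model names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; addresses are kept in host byte order. */
struct afns_model {
	int id;
};

struct net_model {
	int id;
	const struct afns_model *default_afns;	/* net->afnet_ns */
};

struct ifa_model {
	uint32_t local;				/* ifa->ifa_local */
	const struct net_model *net;		/* dev_net(ifa->ifa_dev->dev) */
	const struct afns_model *afnetns;	/* ifa->afnetns */
	struct ifa_model *next;			/* hash chain link */
};

/* One table shared by every netns, like inet_addr_lst. Unlike
 * inet_addr_hash(), this toy hash ignores the netns entirely. */
#define SKETCH_HASH_SIZE 16
static struct ifa_model *sketch_table[SKETCH_HASH_SIZE];

static unsigned int sketch_hash(uint32_t addr)
{
	return addr % SKETCH_HASH_SIZE;
}

static void ifa_model_insert(struct ifa_model *ifa)
{
	unsigned int h = sketch_hash(ifa->local);

	ifa->next = sketch_table[h];
	sketch_table[h] = ifa;
}

/* ifa_find_rcu(): a hit must match both the address and the netns. */
static struct ifa_model *ifa_model_find(const struct net_model *net,
					uint32_t addr)
{
	struct ifa_model *ifa;

	for (ifa = sketch_table[sketch_hash(addr)]; ifa; ifa = ifa->next)
		if (ifa->local == addr && ifa->net == net)
			return ifa;
	return NULL;
}

/* ifa_find_afnetns_rcu(): unknown addresses map to the netns default. */
static const struct afns_model *
ifa_model_find_afnetns(const struct net_model *net, uint32_t addr)
{
	const struct ifa_model *ifa = ifa_model_find(net, addr);

	return ifa ? ifa->afnetns : net->default_afns;
}
```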

* [PATCH net-next RFC v1 11/27] afnetns: validate afnetns in inet_allow_bind
  2017-03-12 23:01 [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level Hannes Frederic Sowa
                   ` (9 preceding siblings ...)
  2017-03-12 23:01 ` [PATCH net-next RFC v1 10/27] afnetns: add ifa_find_afnetns_rcu Hannes Frederic Sowa
@ 2017-03-12 23:01 ` Hannes Frederic Sowa
  2017-03-12 23:01 ` [PATCH net-next RFC v1 12/27] afnetns: ipv4/udp integration Hannes Frederic Sowa
                   ` (16 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Hannes Frederic Sowa @ 2017-03-12 23:01 UTC
  To: netdev

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/ipv4/af_inet.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index aee599e23137e7..5f11399bafd16f 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -453,6 +453,17 @@ int inet_allow_bind(struct sock *sk, __be32 addr)
 	    chk_addr_ret != RTN_BROADCAST)
 		return -EADDRNOTAVAIL;
 
+	if (chk_addr_ret == RTN_LOCAL &&
+	    net_afnetns(net) != sock_afnetns(sk)) {
+		struct afnetns *afnetns;
+
+		rcu_read_lock();
+		afnetns = ifa_find_afnetns_rcu(net, addr);
+		if (afnetns != sock_afnetns(sk))
+			chk_addr_ret = -EADDRNOTAVAIL;
+		rcu_read_unlock();
+	}
+
 	return chk_addr_ret;
 }
 EXPORT_SYMBOL(inet_allow_bind);
-- 
2.9.3
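
With patches 08 and 11 applied, inet_allow_bind() returns either a
negative errno or the address type. It also rejects a bind to a local
address owned by a different afnetns. Below is a minimal sketch of that
decision, with the inputs flattened into a struct. bind_ctx, allow_bind()
and the RTN_M_* values are hypothetical stand-ins for the kernel's state
and its RTN_* constants, and namespaces are compared by integer id.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the RTN_* address types. */
enum { RTN_M_UNICAST = 1, RTN_M_LOCAL = 2, RTN_M_BROADCAST = 3,
       RTN_M_MULTICAST = 5 };

/* Hypothetical flattened inputs; namespaces are compared by id. */
struct bind_ctx {
	bool nonlocal_bind;	/* net->ipv4.sysctl_ip_nonlocal_bind */
	bool freebind;		/* inet->freebind || inet->transparent */
	bool addr_any;		/* addr == htonl(INADDR_ANY) */
	int addr_type;		/* inet_addr_type_table(net, addr, tb_id) */
	int net_afns;		/* net_afnetns(net) */
	int sock_afns;		/* sock_afnetns(sk) */
	int addr_afns;		/* ifa_find_afnetns_rcu(net, addr) */
};

/* Returns a negative errno, or the non-negative address type on success. */
static int allow_bind(const struct bind_ctx *c)
{
	/* Non-local bind rule factored out of inet_bind() (patch 08). */
	if (!c->nonlocal_bind && !c->freebind && !c->addr_any &&
	    c->addr_type != RTN_M_LOCAL &&
	    c->addr_type != RTN_M_MULTICAST &&
	    c->addr_type != RTN_M_BROADCAST)
		return -EADDRNOTAVAIL;

	/* afnetns ownership rule (patch 11): only checked for sockets
	 * outside the netns' default afnetns. */
	if (c->addr_type == RTN_M_LOCAL && c->net_afns != c->sock_afns &&
	    c->addr_afns != c->sock_afns)
		return -EADDRNOTAVAIL;

	return c->addr_type;
}
```

As written, the ownership check applies only when the socket's afnetns
differs from the network namespace's default one. A socket in the default
afnetns can therefore still bind to an address that belongs to another
afnetns.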

* [PATCH net-next RFC v1 12/27] afnetns: ipv4/udp integration
  2017-03-12 23:01 [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level Hannes Frederic Sowa
                   ` (10 preceding siblings ...)
  2017-03-12 23:01 ` [PATCH net-next RFC v1 11/27] afnetns: validate afnetns in inet_allow_bind Hannes Frederic Sowa
@ 2017-03-12 23:01 ` Hannes Frederic Sowa
  2017-03-12 23:01 ` [PATCH net-next RFC v1 13/27] afnetns: use inet_allow_bind in inet6_bind Hannes Frederic Sowa
                   ` (15 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Hannes Frederic Sowa @ 2017-03-12 23:01 UTC
  To: netdev

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/ipv4/udp.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index ea6e4cff9fafe9..5bfe2d9f5583da 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -155,6 +155,7 @@ static int udp_lib_lport_inuse(struct net *net, __u16 num,
 
 	sk_for_each(sk2, &hslot->head) {
 		if (net_eq(sock_net(sk2), net) &&
+		    sock_afnetns(sk) == sock_afnetns(sk2) &&
 		    sk2 != sk &&
 		    (bitmap || udp_sk(sk2)->udp_port_hash == num) &&
 		    (!sk2->sk_reuse || !sk->sk_reuse) &&
@@ -192,6 +193,7 @@ static int udp_lib_lport_inuse2(struct net *net, __u16 num,
 	spin_lock(&hslot2->lock);
 	udp_portaddr_for_each_entry(sk2, &hslot2->head) {
 		if (net_eq(sock_net(sk2), net) &&
+		    sock_afnetns(sk) == sock_afnetns(sk2) &&
 		    sk2 != sk &&
 		    (udp_sk(sk2)->udp_port_hash == num) &&
 		    (!sk2->sk_reuse || !sk->sk_reuse) &&
@@ -220,6 +222,7 @@ static int udp_reuseport_add_sock(struct sock *sk, struct udp_hslot *hslot)
 
 	sk_for_each(sk2, &hslot->head) {
 		if (net_eq(sock_net(sk2), net) &&
+		    sock_afnetns(sk) == sock_afnetns(sk2) &&
 		    sk2 != sk &&
 		    sk2->sk_family == sk->sk_family &&
 		    ipv6_only_sock(sk2) == ipv6_only_sock(sk) &&
@@ -379,6 +382,7 @@ int udp_v4_get_port(struct sock *sk, unsigned short snum)
 }
 
 static int compute_score(struct sock *sk, struct net *net,
+			 struct afnetns *afnetns,
 			 __be32 saddr, __be16 sport,
 			 __be32 daddr, unsigned short hnum, int dif,
 			 bool exact_dif)
@@ -391,6 +395,9 @@ static int compute_score(struct sock *sk, struct net *net,
 	    ipv6_only_sock(sk))
 		return -1;
 
+	if (sock_afnetns(sk) != afnetns)
+		return -1;
+
 	score = (sk->sk_family == PF_INET) ? 2 : 1;
 	inet = inet_sk(sk);
 
@@ -436,6 +443,7 @@ static u32 udp_ehashfn(const struct net *net, const __be32 laddr,
 
 /* called with rcu_read_lock() */
 static struct sock *udp4_lib_lookup2(struct net *net,
+		struct afnetns *afnetns,
 		__be32 saddr, __be16 sport,
 		__be32 daddr, unsigned int hnum, int dif, bool exact_dif,
 		struct udp_hslot *hslot2,
@@ -448,7 +456,7 @@ static struct sock *udp4_lib_lookup2(struct net *net,
 	result = NULL;
 	badness = 0;
 	udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
-		score = compute_score(sk, net, saddr, sport,
+		score = compute_score(sk, net, afnetns, saddr, sport,
 				      daddr, hnum, dif, exact_dif);
 		if (score > badness) {
 			reuseport = sk->sk_reuseport;
@@ -486,8 +494,11 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
 	struct udp_hslot *hslot2, *hslot = &udptable->hash[slot];
 	bool exact_dif = udp_lib_exact_dif_match(net, skb);
 	int score, badness, matches = 0, reuseport = 0;
+	struct afnetns *afnetns;
 	u32 hash = 0;
 
+	afnetns = ifa_find_afnetns_rcu(net, daddr);
+
 	if (hslot->count > 10) {
 		hash2 = udp4_portaddr_hash(net, daddr, hnum);
 		slot2 = hash2 & udptable->mask;
@@ -495,7 +506,7 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
 		if (hslot->count < hslot2->count)
 			goto begin;
 
-		result = udp4_lib_lookup2(net, saddr, sport,
+		result = udp4_lib_lookup2(net, afnetns, saddr, sport,
 					  daddr, hnum, dif,
 					  exact_dif, hslot2, skb);
 		if (!result) {
@@ -510,7 +521,7 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
 			if (hslot->count < hslot2->count)
 				goto begin;
 
-			result = udp4_lib_lookup2(net, saddr, sport,
+			result = udp4_lib_lookup2(net, afnetns, saddr, sport,
 						  daddr, hnum, dif,
 						  exact_dif, hslot2, skb);
 		}
@@ -520,7 +531,7 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
 	result = NULL;
 	badness = 0;
 	sk_for_each_rcu(sk, &hslot->head) {
-		score = compute_score(sk, net, saddr, sport,
+		score = compute_score(sk, net, afnetns, saddr, sport,
 				      daddr, hnum, dif, exact_dif);
 		if (score > badness) {
 			reuseport = sk->sk_reuseport;
@@ -2031,9 +2042,12 @@ static struct sock *__udp4_lib_demux_lookup(struct net *net,
 	struct udp_hslot *hslot2 = &udp_table.hash2[slot2];
 	INET_ADDR_COOKIE(acookie, rmt_addr, loc_addr);
 	const __portpair ports = INET_COMBINED_PORTS(rmt_port, hnum);
+	struct afnetns *afnetns = ifa_find_afnetns_rcu(net, loc_addr);
 	struct sock *sk;
 
 	udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
+		if (afnetns != sock_afnetns(sk))
+			continue;
 		if (INET_MATCH(sk, net, acookie, rmt_addr,
 			       loc_addr, ports, dif))
 			return sk;
-- 
2.9.3
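
On receive, ifa_find_afnetns_rcu() derives the afnetns from the packet's
destination address. compute_score() then discards any socket from a
different afnetns before the usual scoring. Below is a reduced sketch of
that lookup. struct usock, score() and lookup() are hypothetical, and
they consider only the port, the bound address and the afnetns. The real
code also checks the peer address and port and the bound device, and it
handles reuseport.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced UDP socket; addresses in host byte order. */
struct usock {
	int afns;		/* sock_afnetns(sk), as an id */
	uint32_t rcv_saddr;	/* bound local address, 0 for wildcard */
	uint16_t port;		/* bound local port */
};

/* Reduced compute_score(): -1 means no match; a higher score means a
 * more specific binding. */
static int score(const struct usock *sk, int afns, uint32_t daddr,
		 uint16_t hnum)
{
	int s = 2;		/* base score for an AF_INET socket */

	if (sk->port != hnum)
		return -1;
	if (sk->afns != afns)	/* the test added by this patch */
		return -1;
	if (sk->rcv_saddr) {
		if (sk->rcv_saddr != daddr)
			return -1;
		s += 4;
	}
	return s;
}

/* Reduced __udp4_lib_lookup(): afns is what ifa_find_afnetns_rcu()
 * returned for the destination address, and the best score wins. */
static const struct usock *lookup(const struct usock *socks, size_t n,
				  int afns, uint32_t daddr, uint16_t hnum)
{
	const struct usock *result = NULL;
	int badness = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		int s = score(&socks[i], afns, daddr, hnum);

		if (s > badness) {
			result = &socks[i];
			badness = s;
		}
	}
	return result;
}
```

udp_lib_lport_inuse() and its variants also skip sockets from other
afnetns instances. As a result, two afnetns compartments can each bind the
same wildcard address and port, and the destination address then decides
which socket receives a datagram.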

* [PATCH net-next RFC v1 13/27] afnetns: use inet_allow_bind in inet6_bind
  2017-03-12 23:01 [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level Hannes Frederic Sowa
                   ` (11 preceding siblings ...)
  2017-03-12 23:01 ` [PATCH net-next RFC v1 12/27] afnetns: ipv4/udp integration Hannes Frederic Sowa
@ 2017-03-12 23:01 ` Hannes Frederic Sowa
  2017-03-12 23:01 ` [PATCH net-next RFC v1 14/27] afnetns: check for afnetns " Hannes Frederic Sowa
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Hannes Frederic Sowa @ 2017-03-12 23:01 UTC
  To: netdev

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/ipv6/af_inet6.c | 17 ++++-------------
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 04db40620ea65c..f9367c507573bc 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -316,8 +316,6 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 
 	/* Check if the address belongs to the host. */
 	if (addr_type == IPV6_ADDR_MAPPED) {
-		int chk_addr_ret;
-
 		/* Binding to v4-mapped address on a v6-only socket
 		 * makes no sense
 		 */
@@ -326,18 +324,11 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 			goto out;
 		}
 
-		/* Reproduce AF_INET checks to make the bindings consistent */
-		v4addr = addr->sin6_addr.s6_addr32[3];
-		chk_addr_ret = inet_addr_type(net, v4addr);
-		if (!net->ipv4.sysctl_ip_nonlocal_bind &&
-		    !(inet->freebind || inet->transparent) &&
-		    v4addr != htonl(INADDR_ANY) &&
-		    chk_addr_ret != RTN_LOCAL &&
-		    chk_addr_ret != RTN_MULTICAST &&
-		    chk_addr_ret != RTN_BROADCAST) {
-			err = -EADDRNOTAVAIL;
+		err = inet_allow_bind(sk, addr->sin6_addr.s6_addr32[3]);
+		if (err < 0)
 			goto out;
-		}
+		else
+			err = 0;
 	} else {
 		if (addr_type != IPV6_ADDR_ANY) {
 			struct net_device *dev = NULL;
-- 
2.9.3

* [PATCH net-next RFC v1 14/27] afnetns: check for afnetns in inet6_bind
  2017-03-12 23:01 [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level Hannes Frederic Sowa
                   ` (12 preceding siblings ...)
  2017-03-12 23:01 ` [PATCH net-next RFC v1 13/27] afnetns: use inet_allow_bind in inet6_bind Hannes Frederic Sowa
@ 2017-03-12 23:01 ` Hannes Frederic Sowa
  2017-03-12 23:01 ` [PATCH net-next RFC v1 15/27] afnetns: add ipv6_get_ifaddr_afnetns_rcu Hannes Frederic Sowa
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Hannes Frederic Sowa @ 2017-03-12 23:01 UTC
  To: netdev

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/net/addrconf.h |  3 ++-
 net/ipv6/addrconf.c    | 12 ++++++++++--
 net/ipv6/af_inet6.c    |  7 +++++--
 net/ipv6/ndisc.c       |  4 ++--
 net/ipv6/route.c       |  2 +-
 5 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 17c6fd84e28780..e3f1920ca57968 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -63,7 +63,8 @@ int addrconf_set_dstaddr(struct net *net, void __user *arg);
 
 int ipv6_chk_addr(struct net *net, const struct in6_addr *addr,
 		  const struct net_device *dev, int strict);
-int ipv6_chk_addr_and_flags(struct net *net, const struct in6_addr *addr,
+int ipv6_chk_addr_and_flags(struct net *net, struct afnetns *afnetns,
+			    const struct in6_addr *addr,
 			    const struct net_device *dev, int strict,
 			    u32 banned_flags);
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index c67f6d3c5b9a7a..2e546584695118 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1776,11 +1776,13 @@ static int ipv6_count_addresses(struct inet6_dev *idev)
 int ipv6_chk_addr(struct net *net, const struct in6_addr *addr,
 		  const struct net_device *dev, int strict)
 {
-	return ipv6_chk_addr_and_flags(net, addr, dev, strict, IFA_F_TENTATIVE);
+	return ipv6_chk_addr_and_flags(net, NULL, addr, dev, strict,
+				       IFA_F_TENTATIVE);
 }
 EXPORT_SYMBOL(ipv6_chk_addr);
 
-int ipv6_chk_addr_and_flags(struct net *net, const struct in6_addr *addr,
+int ipv6_chk_addr_and_flags(struct net *net, struct afnetns *afnetns,
+			    const struct in6_addr *addr,
 			    const struct net_device *dev, int strict,
 			    u32 banned_flags)
 {
@@ -1792,6 +1794,12 @@ int ipv6_chk_addr_and_flags(struct net *net, const struct in6_addr *addr,
 	hlist_for_each_entry_rcu(ifp, &inet6_addr_lst[hash], addr_lst) {
 		if (!net_eq(dev_net(ifp->idev->dev), net))
 			continue;
+
+#if IS_ENABLED(CONFIG_AFNETNS)
+		if (afnetns && ifp->afnetns != afnetns)
+			continue;
+#endif
+
 		/* Decouple optimistic from tentative for evaluation here.
 		 * Ban optimistic addresses explicitly, when required.
 		 */
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index f9367c507573bc..ffb116297c0950 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -362,8 +362,11 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 			if (!(addr_type & IPV6_ADDR_MULTICAST))	{
 				if (!net->ipv6.sysctl.ip_nonlocal_bind &&
 				    !(inet->freebind || inet->transparent) &&
-				    !ipv6_chk_addr(net, &addr->sin6_addr,
-						   dev, 0)) {
+				    !ipv6_chk_addr_and_flags(net,
+							     sock_afnetns(sk),
+							     &addr->sin6_addr,
+							     dev, 0,
+							     IFA_F_TENTATIVE)) {
 					err = -EADDRNOTAVAIL;
 					goto out_unlock;
 				}
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 7ebac630d3c603..4415659f8cfb0d 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -693,8 +693,8 @@ static void ndisc_solicit(struct neighbour *neigh, struct sk_buff *skb)
 	struct in6_addr *target = (struct in6_addr *)&neigh->primary_key;
 	int probes = atomic_read(&neigh->probes);
 
-	if (skb && ipv6_chk_addr_and_flags(dev_net(dev), &ipv6_hdr(skb)->saddr,
-					   dev, 1,
+	if (skb && ipv6_chk_addr_and_flags(dev_net(dev), NULL,
+					   &ipv6_hdr(skb)->saddr, dev, 1,
 					   IFA_F_TENTATIVE|IFA_F_OPTIMISTIC))
 		saddr = &ipv6_hdr(skb)->saddr;
 	probes -= NEIGH_VAR(neigh->parms, UCAST_PROBES);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 229bfcc451ef50..87d87c5413d71e 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2007,7 +2007,7 @@ static struct rt6_info *ip6_route_info_create(struct fib6_config *cfg)
 		 * prefix route was assigned to, which might be non-loopback.
 		 */
 		err = -EINVAL;
-		if (ipv6_chk_addr_and_flags(net, gw_addr,
+		if (ipv6_chk_addr_and_flags(net, NULL, gw_addr,
 					    gwa_type & IPV6_ADDR_LINKLOCAL ?
 					    dev : NULL, 0, 0))
 			goto out;
-- 
2.9.3
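This patch gives ipv6_chk_addr_and_flags() an optional afnetns filter. A NULL afnetns means "do not filter", which is how ipv6_chk_addr(), ndisc_solicit() and ip6_route_info_create() keep their old behaviour. inet6_bind() passes sock_afnetns(sk), so only addresses owned by the socket's afnetns count as local. The sketch below models just that filter together with the banned-flags test. The table layout, `struct afnetns` body and `model_*` names are hypothetical. The real function also handles the device/strict logic and optimistic addresses, which the sketch omits. `MODEL_F_TENTATIVE` uses the same value as `IFA_F_TENTATIVE` in `<linux/if_addr.h>`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <assert.h>

struct afnetns { int id; };		/* hypothetical stand-in */

#define MODEL_F_TENTATIVE 0x40u		/* same value as IFA_F_TENTATIVE */

struct model_ifaddr {
	unsigned char addr[16];
	struct afnetns *afnetns;	/* owning afnetns of the address */
	unsigned int flags;
};

static bool model_chk_addr(const struct model_ifaddr *tbl, size_t n,
			   const struct afnetns *afnetns,
			   const unsigned char addr[16], unsigned int banned)
{
	for (size_t i = 0; i < n; i++) {
		/* NULL afnetns: the caller does not filter by afnetns */
		if (afnetns && tbl[i].afnetns != afnetns)
			continue;
		if (memcmp(tbl[i].addr, addr, 16) != 0)
			continue;
		if (tbl[i].flags & banned)
			continue;	/* e.g. tentative addresses */
		return true;
	}
	return false;
}
```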

* [PATCH net-next RFC v1 15/27] afnetns: add ipv6_get_ifaddr_afnetns_rcu
  2017-03-12 23:01 [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level Hannes Frederic Sowa
                   ` (13 preceding siblings ...)
  2017-03-12 23:01 ` [PATCH net-next RFC v1 14/27] afnetns: check for afnetns " Hannes Frederic Sowa
@ 2017-03-12 23:01 ` Hannes Frederic Sowa
  2017-03-12 23:01 ` [PATCH net-next RFC v1 16/27] afnetns: add udpv6 support Hannes Frederic Sowa
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Hannes Frederic Sowa @ 2017-03-12 23:01 UTC (permalink / raw)
  To: netdev

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/net/addrconf.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index e3f1920ca57968..644fa68bb4ddef 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -104,6 +104,23 @@ int addrconf_prefix_rcv_add_addr(struct net *net, struct net_device *dev,
 				 u32 addr_flags, bool sllao, bool tokenized,
 				 __u32 valid_lft, u32 prefered_lft);
 
+static inline
+struct afnetns *ipv6_get_ifaddr_afnetns_rcu(struct net *net,
+					    const struct in6_addr *addr,
+					    struct net_device *dev)
+{
+#if IS_ENABLED(CONFIG_AFNETNS)
+	struct inet6_ifaddr *ifp;
+
+	ifp = ipv6_get_ifaddr(net, addr, dev, 1);
+	if (ifp)
+		return ifp->afnetns;
+	return net->afnet_ns;
+#else
+	return NULL;
+#endif
+}
+
 static inline int addrconf_ifid_eui48(u8 *eui, struct net_device *dev)
 {
 	if (dev->addr_len != ETH_ALEN)
-- 
2.9.3
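ipv6_get_ifaddr_afnetns_rcu() maps a destination address to the afnetns that owns it. If the address is not configured, it falls back to the network namespace's default afnetns (net->afnet_ns). If CONFIG_AFNETNS is disabled, it returns NULL. The sketch below models that mapping. It pairs the lookup with an explicit release, because ipv6_get_ifaddr() returns the address with its reference count raised via in6_ifa_hold() (see the function body moved in patch 20). The structures and `model_*` names are hypothetical; `refcnt` stands in for in6_ifa_hold()/in6_ifa_put().

```c
#include <stddef.h>
#include <string.h>
#include <assert.h>

struct afnetns { int id; };		/* hypothetical stand-in */

struct model_ifaddr6 {
	unsigned char addr[16];
	struct afnetns *afnetns;
	int refcnt;			/* models in6_ifa_hold()/in6_ifa_put() */
};

struct model_net {
	struct afnetns *afnet_ns;	/* default afnetns of the netns */
	struct model_ifaddr6 *addrs;
	size_t naddrs;
};

/* Like ipv6_get_ifaddr(): a match is returned with its refcount raised. */
static struct model_ifaddr6 *model_get_ifaddr(struct model_net *net,
					      const unsigned char addr[16])
{
	for (size_t i = 0; i < net->naddrs; i++) {
		if (memcmp(net->addrs[i].addr, addr, 16) == 0) {
			net->addrs[i].refcnt++;
			return &net->addrs[i];
		}
	}
	return NULL;
}

static struct afnetns *model_get_ifaddr_afnetns(struct model_net *net,
						const unsigned char addr[16])
{
	struct model_ifaddr6 *ifp = model_get_ifaddr(net, addr);
	struct afnetns *afnetns;

	if (!ifp)
		return net->afnet_ns;	/* unknown address: default afnetns */
	afnetns = ifp->afnetns;
	ifp->refcnt--;			/* drop the reference taken by the lookup */
	return afnetns;
}
```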

* [PATCH net-next RFC v1 16/27] afnetns: add udpv6 support
  2017-03-12 23:01 [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level Hannes Frederic Sowa
                   ` (14 preceding siblings ...)
  2017-03-12 23:01 ` [PATCH net-next RFC v1 15/27] afnetns: add ipv6_get_ifaddr_afnetns_rcu Hannes Frederic Sowa
@ 2017-03-12 23:01 ` Hannes Frederic Sowa
  2017-03-12 23:01 ` [PATCH net-next RFC v1 17/27] afnetns: introduce __inet_select_addr Hannes Frederic Sowa
                   ` (11 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Hannes Frederic Sowa @ 2017-03-12 23:01 UTC (permalink / raw)
  To: netdev

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/ipv6/datagram.c |  6 ++++--
 net/ipv6/udp.c      | 18 +++++++++++++-----
 2 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index eec27f87efaca1..cd811e8b1ba824 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -804,8 +804,10 @@ int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
 			if (addr_type != IPV6_ADDR_ANY) {
 				int strict = __ipv6_addr_src_scope(addr_type) <= IPV6_ADDR_SCOPE_LINKLOCAL;
 				if (!(inet_sk(sk)->freebind || inet_sk(sk)->transparent) &&
-				    !ipv6_chk_addr(net, &src_info->ipi6_addr,
-						   strict ? dev : NULL, 0) &&
+				    !ipv6_chk_addr_and_flags(net, sock_afnetns(sk),
+							     &src_info->ipi6_addr,
+							     strict ? dev : NULL, 0,
+							     IFA_F_TENTATIVE) &&
 				    !ipv6_chk_acast_addr_src(net, dev,
 							     &src_info->ipi6_addr))
 					err = -EINVAL;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 4e4c401e3bc690..d63e0e362fe72b 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -126,6 +126,7 @@ static void udp_v6_rehash(struct sock *sk)
 }
 
 static int compute_score(struct sock *sk, struct net *net,
+			 struct afnetns *afnetns,
 			 const struct in6_addr *saddr, __be16 sport,
 			 const struct in6_addr *daddr, unsigned short hnum,
 			 int dif, bool exact_dif)
@@ -138,6 +139,9 @@ static int compute_score(struct sock *sk, struct net *net,
 	    sk->sk_family != PF_INET6)
 		return -1;
 
+	if (sock_afnetns(sk) != afnetns)
+		return -1;
+
 	score = 0;
 	inet = inet_sk(sk);
 
@@ -173,6 +177,7 @@ static int compute_score(struct sock *sk, struct net *net,
 
 /* called with rcu_read_lock() */
 static struct sock *udp6_lib_lookup2(struct net *net,
+		struct afnetns *afnetns,
 		const struct in6_addr *saddr, __be16 sport,
 		const struct in6_addr *daddr, unsigned int hnum, int dif,
 		bool exact_dif, struct udp_hslot *hslot2,
@@ -185,7 +190,7 @@ static struct sock *udp6_lib_lookup2(struct net *net,
 	result = NULL;
 	badness = -1;
 	udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
-		score = compute_score(sk, net, saddr, sport,
+		score = compute_score(sk, net, afnetns, saddr, sport,
 				      daddr, hnum, dif, exact_dif);
 		if (score > badness) {
 			reuseport = sk->sk_reuseport;
@@ -224,8 +229,11 @@ struct sock *__udp6_lib_lookup(struct net *net,
 	struct udp_hslot *hslot2, *hslot = &udptable->hash[slot];
 	bool exact_dif = udp6_lib_exact_dif_match(net, skb);
 	int score, badness, matches = 0, reuseport = 0;
+	struct afnetns *afnetns;
 	u32 hash = 0;
 
+	afnetns = ipv6_get_ifaddr_afnetns_rcu(net, daddr, skb->dev);
+
 	if (hslot->count > 10) {
 		hash2 = udp6_portaddr_hash(net, daddr, hnum);
 		slot2 = hash2 & udptable->mask;
@@ -233,7 +241,7 @@ struct sock *__udp6_lib_lookup(struct net *net,
 		if (hslot->count < hslot2->count)
 			goto begin;
 
-		result = udp6_lib_lookup2(net, saddr, sport,
+		result = udp6_lib_lookup2(net, afnetns, saddr, sport,
 					  daddr, hnum, dif, exact_dif,
 					  hslot2, skb);
 		if (!result) {
@@ -248,7 +256,7 @@ struct sock *__udp6_lib_lookup(struct net *net,
 			if (hslot->count < hslot2->count)
 				goto begin;
 
-			result = udp6_lib_lookup2(net, saddr, sport,
+			result = udp6_lib_lookup2(net, afnetns, saddr, sport,
 						  daddr, hnum, dif,
 						  exact_dif, hslot2,
 						  skb);
@@ -259,8 +267,8 @@ struct sock *__udp6_lib_lookup(struct net *net,
 	result = NULL;
 	badness = -1;
 	sk_for_each_rcu(sk, &hslot->head) {
-		score = compute_score(sk, net, saddr, sport, daddr, hnum, dif,
-				      exact_dif);
+		score = compute_score(sk, net, afnetns, saddr, sport, daddr,
+				      hnum, dif, exact_dif);
 		if (score > badness) {
 			reuseport = sk->sk_reuseport;
 			if (reuseport) {
-- 
2.9.3
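With this patch, __udp6_lib_lookup() resolves the afnetns that owns daddr once, via ipv6_get_ifaddr_afnetns_rcu(). compute_score() then rejects (-1) any socket whose sock_afnetns() differs. As a result, a wildcard-bound socket in one afnetns cannot receive datagrams addressed to another afnetns. The sketch below models this with reduced scoring rules. The real function also considers the remote address, the bound device and the incoming CPU, and handles reuseport. Addresses are plain integer ids, and the structures and `model_*` names are hypothetical.

```c
#include <stddef.h>
#include <assert.h>

struct afnetns { int id; };		/* hypothetical stand-in */

struct model_udp_sock {
	struct afnetns *afnetns;
	unsigned short port;		/* local (bound) port */
	int bound_addr;			/* 0 = wildcard, else an address id */
	unsigned short dport;		/* 0 = unconnected */
};

static int model_compute_score(const struct model_udp_sock *sk,
			       const struct afnetns *afnetns, int daddr,
			       unsigned short hnum, unsigned short sport)
{
	int score = 0;

	if (sk->port != hnum)
		return -1;
	/* the check added by this patch */
	if (sk->afnetns != afnetns)
		return -1;
	if (sk->dport) {
		if (sk->dport != sport)
			return -1;
		score++;
	}
	if (sk->bound_addr) {
		if (sk->bound_addr != daddr)
			return -1;
		score++;
	}
	return score;
}

/* Best-scoring socket or NULL; on ties the first match is kept. */
static const struct model_udp_sock *
model_lookup(const struct model_udp_sock *socks, size_t n,
	     const struct afnetns *afnetns, int daddr,
	     unsigned short hnum, unsigned short sport)
{
	const struct model_udp_sock *result = NULL;
	int badness = -1;

	for (size_t i = 0; i < n; i++) {
		int score = model_compute_score(&socks[i], afnetns, daddr,
						hnum, sport);
		if (score > badness) {
			badness = score;
			result = &socks[i];
		}
	}
	return result;
}
```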

* [PATCH net-next RFC v1 17/27] afnetns: introduce __inet_select_addr
  2017-03-12 23:01 [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level Hannes Frederic Sowa
                   ` (15 preceding siblings ...)
  2017-03-12 23:01 ` [PATCH net-next RFC v1 16/27] afnetns: add udpv6 support Hannes Frederic Sowa
@ 2017-03-12 23:01 ` Hannes Frederic Sowa
  2017-03-12 23:01 ` [PATCH net-next RFC v1 18/27] afnetns: afnetns should influence source address selection Hannes Frederic Sowa
                   ` (10 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Hannes Frederic Sowa @ 2017-03-12 23:01 UTC (permalink / raw)
  To: netdev

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/linux/inetdevice.h |  2 ++
 net/ipv4/devinet.c         | 27 ++++++++++++++++++++++++---
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index 01cbcfe93383b7..a41bfce099e0a1 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -170,6 +170,8 @@ int inet_addr_onlink(struct in_device *in_dev, __be32 a, __be32 b);
 int devinet_ioctl(struct net *net, unsigned int cmd, void __user *);
 void devinet_init(void);
 struct in_device *inetdev_by_index(struct net *, int);
+__be32 __inet_select_addr(const struct net_device *dev, __be32 dst, int scope,
+			  struct afnetns *afnetns);
 __be32 inet_select_addr(const struct net_device *dev, __be32 dst, int scope);
 __be32 inet_confirm_addr(struct net *net, struct in_device *in_dev, __be32 dst,
 			 __be32 local, int scope);
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index cc15afefa1df0a..0844d917aa8d7d 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1224,7 +1224,17 @@ static int inet_gifconf(struct net_device *dev, char __user *buf, int len)
 	return done;
 }
 
-__be32 inet_select_addr(const struct net_device *dev, __be32 dst, int scope)
+static struct afnetns *ifa_afnetns(struct in_ifaddr *ifa)
+{
+#if IS_ENABLED(CONFIG_AFNETNS)
+	return ifa->afnetns;
+#else
+	return NULL;
+#endif
+}
+
+__be32 __inet_select_addr(const struct net_device *dev, __be32 dst,
+			  int scope, struct afnetns *afnetns)
 {
 	__be32 addr = 0;
 	struct in_device *in_dev;
@@ -1237,6 +1247,8 @@ __be32 inet_select_addr(const struct net_device *dev, __be32 dst, int scope)
 		goto no_in_dev;
 
 	for_primary_ifa(in_dev) {
+		if (afnetns && afnetns != ifa_afnetns(ifa))
+			continue;
 		if (ifa->ifa_scope > scope)
 			continue;
 		if (!dst || inet_ifa_match(dst, ifa)) {
@@ -1262,7 +1274,8 @@ __be32 inet_select_addr(const struct net_device *dev, __be32 dst, int scope)
 	    (in_dev = __in_dev_get_rcu(dev))) {
 		for_primary_ifa(in_dev) {
 			if (ifa->ifa_scope != RT_SCOPE_LINK &&
-			    ifa->ifa_scope <= scope) {
+			    ifa->ifa_scope <= scope &&
+			    (!afnetns || afnetns == ifa_afnetns(ifa))) {
 				addr = ifa->ifa_local;
 				goto out_unlock;
 			}
@@ -1283,7 +1296,8 @@ __be32 inet_select_addr(const struct net_device *dev, __be32 dst, int scope)
 
 		for_primary_ifa(in_dev) {
 			if (ifa->ifa_scope != RT_SCOPE_LINK &&
-			    ifa->ifa_scope <= scope) {
+			    ifa->ifa_scope <= scope &&
+			    (!afnetns || afnetns == ifa_afnetns(ifa))) {
 				addr = ifa->ifa_local;
 				goto out_unlock;
 			}
@@ -1293,6 +1307,13 @@ __be32 inet_select_addr(const struct net_device *dev, __be32 dst, int scope)
 	rcu_read_unlock();
 	return addr;
 }
+EXPORT_SYMBOL(__inet_select_addr);
+
+__be32 inet_select_addr(const struct net_device *dev, __be32 dst,
+			int scope)
+{
+	return __inet_select_addr(dev, dst, scope, NULL);
+}
 EXPORT_SYMBOL(inet_select_addr);
 
 static __be32 confirm_addr_indev(struct in_device *in_dev, __be32 dst,
-- 
2.9.3
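__inet_select_addr() adds an optional afnetns filter to each source-address selection pass. inet_select_addr() keeps the old behaviour by passing NULL. The sketch below models only the first pass over the device's primary addresses. An address of matching scope in the destination's subnet wins; otherwise the first address of acceptable scope is remembered. The fallback passes over other devices are omitted. inet_ifa_match() compares against ifa_address, which equals ifa_local for non-point-to-point addresses, so the sketch uses a single field. Addresses are host-order integers, and the structures and `model_*` names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

struct afnetns { int id; };		/* hypothetical stand-in */

struct model_in_ifaddr {
	uint32_t local;			/* ifa_local */
	uint32_t mask;			/* ifa_mask */
	int scope;			/* 0 = universe, 253 = link, 254 = host */
	struct afnetns *afnetns;
};

/* Like inet_ifa_match(): is dst inside the address's subnet? */
static int model_ifa_match(uint32_t dst, const struct model_in_ifaddr *ifa)
{
	return ((dst ^ ifa->local) & ifa->mask) == 0;
}

static uint32_t model_select_addr(const struct model_in_ifaddr *ifas,
				  size_t n, uint32_t dst, int scope,
				  const struct afnetns *afnetns)
{
	uint32_t addr = 0;

	for (size_t i = 0; i < n; i++) {
		const struct model_in_ifaddr *ifa = &ifas[i];

		if (afnetns && afnetns != ifa->afnetns)
			continue;		/* filter added by this patch */
		if (ifa->scope > scope)
			continue;		/* narrower than requested */
		if (!dst || model_ifa_match(dst, ifa))
			return ifa->local;	/* same-subnet address wins */
		if (!addr)
			addr = ifa->local;	/* remember first candidate */
	}
	return addr;	/* 0: the kernel would try other devices next */
}
```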

* [PATCH net-next RFC v1 18/27] afnetns: afnetns should influence source address selection
  2017-03-12 23:01 [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level Hannes Frederic Sowa
                   ` (16 preceding siblings ...)
  2017-03-12 23:01 ` [PATCH net-next RFC v1 17/27] afnetns: introduce __inet_select_addr Hannes Frederic Sowa
@ 2017-03-12 23:01 ` Hannes Frederic Sowa
  2017-03-12 23:01 ` [PATCH net-next RFC v1 19/27] afnetns: add afnetns support for tcpv4 Hannes Frederic Sowa
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Hannes Frederic Sowa @ 2017-03-12 23:01 UTC (permalink / raw)
  To: netdev

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 drivers/target/iscsi/cxgbit/cxgbit_cm.c |  2 +-
 include/linux/inetdevice.h              |  5 +++--
 include/net/route.h                     | 10 ++++++----
 net/ipv4/devinet.c                      | 19 ++++++++++++++++---
 net/ipv4/icmp.c                         |  4 ++--
 net/ipv4/igmp.c                         |  2 +-
 net/ipv4/route.c                        | 21 ++++++++++++---------
 net/ipv4/xfrm4_policy.c                 |  2 +-
 net/sctp/protocol.c                     |  4 ++--
 net/tipc/udp_media.c                    |  2 +-
 10 files changed, 45 insertions(+), 26 deletions(-)

diff --git a/drivers/target/iscsi/cxgbit/cxgbit_cm.c b/drivers/target/iscsi/cxgbit/cxgbit_cm.c
index 37a05185dcbe0e..4ae59d20d8e260 100644
--- a/drivers/target/iscsi/cxgbit/cxgbit_cm.c
+++ b/drivers/target/iscsi/cxgbit/cxgbit_cm.c
@@ -266,7 +266,7 @@ static struct net_device *cxgbit_ipv4_netdev(__be32 saddr)
 {
 	struct net_device *ndev;
 
-	ndev = __ip_dev_find(&init_net, saddr, false);
+	ndev = __ip_dev_find(&init_net, NULL, saddr, false);
 	if (!ndev)
 		return NULL;
 
diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index a41bfce099e0a1..9411270cb0fe64 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -160,10 +160,11 @@ void inet_netconf_notify_devconf(struct net *net, int type, int ifindex,
 				 struct ipv4_devconf *devconf);
 
 struct in_ifaddr *ifa_find_rcu(struct net *net, __be32 addr);
-struct net_device *__ip_dev_find(struct net *net, __be32 addr, bool devref);
+struct net_device *__ip_dev_find(struct net *net, struct afnetns *afnetns,
+				 __be32 addr, bool devref);
 static inline struct net_device *ip_dev_find(struct net *net, __be32 addr)
 {
-	return __ip_dev_find(net, addr, true);
+	return __ip_dev_find(net, NULL, addr, true);
 }
 
 int inet_addr_onlink(struct in_device *in_dev, __be32 a, __be32 b);
diff --git a/include/net/route.h b/include/net/route.h
index c0874c87c17371..d29449d1863636 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -113,13 +113,15 @@ struct in_device;
 int ip_rt_init(void);
 void rt_cache_flush(struct net *net);
 void rt_flush_dev(struct net_device *dev);
-struct rtable *__ip_route_output_key_hash(struct net *, struct flowi4 *flp,
-					  int mp_hash);
+struct rtable *__ip_route_output_key_hash(struct net *net,
+					  struct afnetns *afnetns,
+					  struct flowi4 *flp, int mp_hash);
 
 static inline struct rtable *__ip_route_output_key(struct net *net,
+						   struct afnetns *afnetns,
 						   struct flowi4 *flp)
 {
-	return __ip_route_output_key_hash(net, flp, -1);
+	return __ip_route_output_key_hash(net, afnetns, flp, -1);
 }
 
 struct rtable *ip_route_output_flow(struct net *, struct flowi4 *flp,
@@ -286,7 +288,7 @@ static inline struct rtable *ip_route_connect(struct flowi4 *fl4,
 			      sport, dport, sk);
 
 	if (!dst || !src) {
-		rt = __ip_route_output_key(net, fl4);
+		rt = __ip_route_output_key(net, NULL, fl4);
 		if (IS_ERR(rt))
 			return rt;
 		ip_rt_put(rt);
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 0844d917aa8d7d..82a7389ec86faa 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -150,14 +150,27 @@ struct in_ifaddr *ifa_find_rcu(struct net *net, __be32 addr)
  *
  * If a caller uses devref=false, it should be protected by RCU, or RTNL
  */
-struct net_device *__ip_dev_find(struct net *net, __be32 addr, bool devref)
+struct net_device *__ip_dev_find(struct net *net, struct afnetns *afnetns,
+				 __be32 addr, bool devref)
 {
-	struct net_device *result;
+	struct net_device *result = NULL;
 	struct in_ifaddr *ifa;
 
 	rcu_read_lock();
 	ifa = ifa_find_rcu(net, addr);
-	result = ifa ? ifa->ifa_dev->dev : NULL;
+#if IS_ENABLED(CONFIG_AFNETNS)
+	if (afnetns && afnetns != net->afnet_ns) {
+		/* we are in a child namespace, thus only allow
+		 * explicitly configured addresses
+		 */
+		if (!ifa || ifa->afnetns != afnetns) {
+			rcu_read_unlock();
+			return NULL;
+		}
+	}
+#endif
+	if (ifa)
+		result = ifa->ifa_dev->dev;
 	if (!result) {
 		struct flowi4 fl4 = { .daddr = addr };
 		struct fib_result res = { 0 };
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index fc310db2708bf6..74261d6b86e4fc 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -505,7 +505,7 @@ static struct rtable *icmp_route_lookup(struct net *net,
 	fl4->flowi4_oif = l3mdev_master_ifindex(skb_dst(skb_in)->dev);
 
 	security_skb_classify_flow(skb_in, flowi4_to_flowi(fl4));
-	rt = __ip_route_output_key_hash(net, fl4,
+	rt = __ip_route_output_key_hash(net, NULL, fl4,
 					icmp_multipath_hash_skb(skb_in));
 	if (IS_ERR(rt))
 		return rt;
@@ -529,7 +529,7 @@ static struct rtable *icmp_route_lookup(struct net *net,
 
 	if (inet_addr_type_dev_table(net, skb_dst(skb_in)->dev,
 				     fl4_dec.saddr) == RTN_LOCAL) {
-		rt2 = __ip_route_output_key(net, &fl4_dec);
+		rt2 = __ip_route_output_key(net, NULL, &fl4_dec);
 		if (IS_ERR(rt2))
 			err = PTR_ERR(rt2);
 	} else {
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 44fd86de2823dd..d246bf1704f4d8 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -1754,7 +1754,7 @@ static struct in_device *ip_mc_find_dev(struct net *net, struct ip_mreqn *imr)
 		return idev;
 	}
 	if (imr->imr_address.s_addr) {
-		dev = __ip_dev_find(net, imr->imr_address.s_addr, false);
+		dev = __ip_dev_find(net, NULL, imr->imr_address.s_addr, false);
 		if (!dev)
 			return NULL;
 	}
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 8471dd11677146..f3304647082182 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1045,7 +1045,7 @@ void ipv4_update_pmtu(struct sk_buff *skb, struct net *net, u32 mtu,
 
 	__build_flow_key(net, &fl4, NULL, iph, oif,
 			 RT_TOS(iph->tos), protocol, mark, flow_flags);
-	rt = __ip_route_output_key(net, &fl4);
+	rt = __ip_route_output_key(net, NULL, &fl4);
 	if (!IS_ERR(rt)) {
 		__ip_rt_update_pmtu(rt, &fl4, mtu);
 		ip_rt_put(rt);
@@ -1064,7 +1064,7 @@ static void __ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu)
 	if (!fl4.flowi4_mark)
 		fl4.flowi4_mark = IP4_REPLY_MARK(sock_net(sk), skb->mark);
 
-	rt = __ip_route_output_key(sock_net(sk), &fl4);
+	rt = __ip_route_output_key(sock_net(sk), NULL, &fl4);
 	if (!IS_ERR(rt)) {
 		__ip_rt_update_pmtu(rt, &fl4, mtu);
 		ip_rt_put(rt);
@@ -1134,7 +1134,7 @@ void ipv4_redirect(struct sk_buff *skb, struct net *net,
 
 	__build_flow_key(net, &fl4, NULL, iph, oif,
 			 RT_TOS(iph->tos), protocol, mark, flow_flags);
-	rt = __ip_route_output_key(net, &fl4);
+	rt = __ip_route_output_key(net, NULL, &fl4);
 	if (!IS_ERR(rt)) {
 		__ip_do_redirect(rt, skb, &fl4, false);
 		ip_rt_put(rt);
@@ -1150,7 +1150,7 @@ void ipv4_sk_redirect(struct sk_buff *skb, struct sock *sk)
 	struct net *net = sock_net(sk);
 
 	__build_flow_key(net, &fl4, sk, iph, 0, 0, 0, 0, 0);
-	rt = __ip_route_output_key(net, &fl4);
+	rt = __ip_route_output_key(net, NULL, &fl4);
 	if (!IS_ERR(rt)) {
 		__ip_do_redirect(rt, skb, &fl4, false);
 		ip_rt_put(rt);
@@ -2202,8 +2202,9 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
  * Major route resolver routine.
  */
 
-struct rtable *__ip_route_output_key_hash(struct net *net, struct flowi4 *fl4,
-					  int mp_hash)
+struct rtable *__ip_route_output_key_hash(struct net *net,
+					  struct afnetns *afnetns,
+					  struct flowi4 *fl4, int mp_hash)
 {
 	struct net_device *dev_out = NULL;
 	__u8 tos = RT_FL_TOS(fl4);
@@ -2244,7 +2245,7 @@ struct rtable *__ip_route_output_key_hash(struct net *net, struct flowi4 *fl4,
 		    (ipv4_is_multicast(fl4->daddr) ||
 		     ipv4_is_lbcast(fl4->daddr))) {
 			/* It is equivalent to inet_addr_type(saddr) == RTN_LOCAL */
-			dev_out = __ip_dev_find(net, fl4->saddr, false);
+			dev_out = __ip_dev_find(net, NULL, fl4->saddr, false);
 			if (!dev_out)
 				goto out;
 
@@ -2269,7 +2270,7 @@ struct rtable *__ip_route_output_key_hash(struct net *net, struct flowi4 *fl4,
 
 		if (!(fl4->flowi4_flags & FLOWI_FLAG_ANYSRC)) {
 			/* It is equivalent to inet_addr_type(saddr) == RTN_LOCAL */
-			if (!__ip_dev_find(net, fl4->saddr, false))
+			if (!__ip_dev_find(net, afnetns, fl4->saddr, false))
 				goto out;
 		}
 	}
@@ -2458,7 +2459,9 @@ struct dst_entry *ipv4_blackhole_route(struct net *net, struct dst_entry *dst_or
 struct rtable *ip_route_output_flow(struct net *net, struct flowi4 *flp4,
 				    const struct sock *sk)
 {
-	struct rtable *rt = __ip_route_output_key(net, flp4);
+	struct rtable *rt;
+
+	rt = __ip_route_output_key(net, sk ? sock_afnetns(sk) : NULL, flp4);
 
 	if (IS_ERR(rt))
 		return rt;
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 71b4ecc195c707..c8d9eaa59be8fc 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -33,7 +33,7 @@ static struct dst_entry *__xfrm4_dst_lookup(struct net *net, struct flowi4 *fl4,
 
 	fl4->flowi4_flags = FLOWI_FLAG_SKIP_NH_OIF;
 
-	rt = __ip_route_output_key(net, fl4);
+	rt = __ip_route_output_key(net, NULL, fl4);
 	if (!IS_ERR(rt))
 		return &rt->dst;
 
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 1b6d4574d2b02a..cd77ec87c5f9ef 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -520,8 +520,8 @@ static void sctp_v4_get_dst(struct sctp_transport *t, union sctp_addr *saddr,
 		/* Ensure the src address belongs to the output
 		 * interface.
 		 */
-		odev = __ip_dev_find(sock_net(sk), laddr->a.v4.sin_addr.s_addr,
-				     false);
+		odev = __ip_dev_find(sock_net(sk), NULL,
+				     laddr->a.v4.sin_addr.s_addr, false);
 		if (!odev || odev->ifindex != fl4->flowi4_oif) {
 			if (&rt->dst != dst)
 				dst_release(&rt->dst);
diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
index 46061cf48cd135..98bc29e63058a2 100644
--- a/net/tipc/udp_media.c
+++ b/net/tipc/udp_media.c
@@ -688,7 +688,7 @@ static int tipc_udp_enable(struct net *net, struct tipc_bearer *b,
 	if (local.proto == htons(ETH_P_IP)) {
 		struct net_device *dev;
 
-		dev = __ip_dev_find(net, local.ipv4.s_addr, false);
+		dev = __ip_dev_find(net, NULL, local.ipv4.s_addr, false);
 		if (!dev) {
 			err = -ENODEV;
 			goto err;
-- 
2.9.3
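In this patch only ip_route_output_flow() passes sock_afnetns(sk) down. Inside __ip_route_output_key_hash(), the afnetns reaches the source-address check, so __ip_dev_find() only accepts a source address in a child afnetns if that address was explicitly configured there. All other call sites pass NULL and keep the old behaviour. The sketch below models that rule for the address-table part of __ip_dev_find(). The FIB-based fallback for addresses missing from the table is omitted. The structures and `model_*` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <assert.h>

struct afnetns { int id; };		/* hypothetical stand-in */

struct model_v4addr {
	uint32_t addr;
	int ifindex;
	struct afnetns *afnetns;
};

/* Returns the ifindex owning addr, or 0 if none is acceptable. */
static int model_ip_dev_find(const struct model_v4addr *tbl, size_t n,
			     const struct afnetns *default_ns,
			     const struct afnetns *afnetns, uint32_t addr)
{
	const struct model_v4addr *ifa = NULL;

	for (size_t i = 0; i < n; i++) {
		if (tbl[i].addr == addr) {
			ifa = &tbl[i];
			break;
		}
	}
	/* child afnetns: only addresses explicitly configured in it count */
	if (afnetns && afnetns != default_ns) {
		if (!ifa || ifa->afnetns != afnetns)
			return 0;
	}
	return ifa ? ifa->ifindex : 0;
}
```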

* [PATCH net-next RFC v1 19/27] afnetns: add afnetns support for tcpv4
  2017-03-12 23:01 [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level Hannes Frederic Sowa
                   ` (17 preceding siblings ...)
  2017-03-12 23:01 ` [PATCH net-next RFC v1 18/27] afnetns: afnetns should influence source address selection Hannes Frederic Sowa
@ 2017-03-12 23:01 ` Hannes Frederic Sowa
  2017-03-12 23:01 ` [PATCH net-next RFC v1 20/27] ipv6: move ipv6_get_ifaddr to vmlinux in case ipv6 is build as module Hannes Frederic Sowa
                   ` (8 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Hannes Frederic Sowa @ 2017-03-12 23:01 UTC (permalink / raw)
  To: netdev

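This commit adds the necessary checks to inet_hashtables so that listener
and established-socket lookups only match sockets in the afnetns that owns
the destination address, and so that sockets only join a reuseport group
within the same afnetns. Request sockets inherit the afnetns of their
listener.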

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/net/inet_sock.h    |  1 +
 net/ipv4/inet_hashtables.c | 17 +++++++++++++++--
 net/ipv4/tcp_input.c       |  3 +++
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index aa95053dfc78d3..d348f150e8e2c9 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -81,6 +81,7 @@ struct inet_request_sock {
 #define ir_iif			req.__req_common.skc_bound_dev_if
 #define ir_cookie		req.__req_common.skc_cookie
 #define ireq_net		req.__req_common.skc_net
+#define ireq_afnet		req.__req_common.skc_afnet
 #define ireq_state		req.__req_common.skc_state
 #define ireq_family		req.__req_common.skc_family
 
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 8bea74298173f5..813a8fa1331944 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -28,6 +28,8 @@
 #include <net/tcp.h>
 #include <net/sock_reuseport.h>
 
+#include <linux/inetdevice.h>
+
 static u32 inet_ehashfn(const struct net *net, const __be32 laddr,
 			const __u16 lport, const __be32 faddr,
 			const __be16 fport)
@@ -169,6 +171,7 @@ int __inet_inherit_port(const struct sock *sk, struct sock *child)
 EXPORT_SYMBOL_GPL(__inet_inherit_port);
 
 static inline int compute_score(struct sock *sk, struct net *net,
+				struct afnetns *afnetns,
 				const unsigned short hnum, const __be32 daddr,
 				const int dif, bool exact_dif)
 {
@@ -176,7 +179,7 @@ static inline int compute_score(struct sock *sk, struct net *net,
 	struct inet_sock *inet = inet_sk(sk);
 
 	if (net_eq(sock_net(sk), net) && inet->inet_num == hnum &&
-			!ipv6_only_sock(sk)) {
+	    afnetns == sock_afnetns(sk) && !ipv6_only_sock(sk)) {
 		__be32 rcv_saddr = inet->inet_rcv_saddr;
 		score = sk->sk_family == PF_INET ? 2 : 1;
 		if (rcv_saddr) {
@@ -215,10 +218,14 @@ struct sock *__inet_lookup_listener(struct net *net,
 	int score, hiscore = 0, matches = 0, reuseport = 0;
 	bool exact_dif = inet_exact_dif_match(net, skb);
 	struct sock *sk, *result = NULL;
+	struct afnetns *afnetns;
 	u32 phash = 0;
 
+	afnetns = ifa_find_afnetns_rcu(net, daddr);
+
 	sk_for_each_rcu(sk, &ilb->head) {
-		score = compute_score(sk, net, hnum, daddr, dif, exact_dif);
+		score = compute_score(sk, net, afnetns, hnum, daddr, dif,
+				      exact_dif);
 		if (score > hiscore) {
 			reuseport = sk->sk_reuseport;
 			if (reuseport) {
@@ -272,6 +279,7 @@ struct sock *__inet_lookup_established(struct net *net,
 {
 	INET_ADDR_COOKIE(acookie, saddr, daddr);
 	const __portpair ports = INET_COMBINED_PORTS(sport, hnum);
+	struct afnetns *afnetns;
 	struct sock *sk;
 	const struct hlist_nulls_node *node;
 	/* Optimize here for direct hit, only listening connections can
@@ -281,10 +289,14 @@ struct sock *__inet_lookup_established(struct net *net,
 	unsigned int slot = hash & hashinfo->ehash_mask;
 	struct inet_ehash_bucket *head = &hashinfo->ehash[slot];
 
+	afnetns = ifa_find_afnetns_rcu(net, daddr);
+
 begin:
 	sk_nulls_for_each_rcu(sk, node, &head->chain) {
 		if (sk->sk_hash != hash)
 			continue;
+		if (afnetns != sock_afnetns(sk))
+			continue;
 		if (likely(INET_MATCH(sk, net, acookie,
 				      saddr, daddr, ports, dif))) {
 			if (unlikely(!atomic_inc_not_zero(&sk->sk_refcnt)))
@@ -445,6 +457,7 @@ static int inet_reuseport_add_sock(struct sock *sk,
 		    sk2->sk_bound_dev_if == sk->sk_bound_dev_if &&
 		    inet_csk(sk2)->icsk_bind_hash == tb &&
 		    sk2->sk_reuseport && uid_eq(uid, sock_i_uid(sk2)) &&
+		    sock_afnetns(sk) == sock_afnetns(sk2) &&
 		    inet_rcv_saddr_equal(sk, sk2, false))
 			return reuseport_add_sock(sk, sk2);
 	}
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 96b67a8b18c3c3..0fc69a32c9faea 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6211,6 +6211,9 @@ struct request_sock *inet_reqsk_alloc(const struct request_sock_ops *ops,
 		atomic64_set(&ireq->ir_cookie, 0);
 		ireq->ireq_state = TCP_NEW_SYN_RECV;
 		write_pnet(&ireq->ireq_net, sock_net(sk_listener));
+#if IS_ENABLED(CONFIG_AFNETNS)
+		ireq->ireq_afnet = sock_afnetns(sk_listener);
+#endif
 		ireq->ireq_family = sk_listener->sk_family;
 	}
 
-- 
2.9.3
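Besides the lookup checks, the patch adds `sock_afnetns(sk) == sock_afnetns(sk2)` to the condition in inet_reuseport_add_sock(). Sockets in different afnetns therefore never share a reuseport group, even when bound to the same port and address. The sketch below models that join condition in simplified form. The family, ipv6_only and bind-bucket checks are omitted. The address comparison is reduced to exact equality, which is what inet_rcv_saddr_equal(sk, sk2, false) amounts to for two IPv4 sockets. The structures and `model_*` names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <assert.h>

struct afnetns { int id; };		/* hypothetical stand-in */

struct model_tcp_sock {
	int bound_dev_if;
	unsigned int uid;
	bool reuseport;
	uint32_t rcv_saddr;		/* 0 = INADDR_ANY */
	struct afnetns *afnetns;
};

/* May sk join the reuseport group of the already-hashed sk2? */
static bool model_can_join_reuseport(const struct model_tcp_sock *sk,
				     const struct model_tcp_sock *sk2)
{
	return sk2 != sk &&
	       sk2->bound_dev_if == sk->bound_dev_if &&
	       sk2->reuseport &&
	       sk2->uid == sk->uid &&
	       sk2->afnetns == sk->afnetns &&	/* check added by this patch */
	       sk2->rcv_saddr == sk->rcv_saddr;
}
```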

* [PATCH net-next RFC v1 20/27] ipv6: move ipv6_get_ifaddr to vmlinux in case ipv6 is build as module
  2017-03-12 23:01 [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level Hannes Frederic Sowa
                   ` (18 preceding siblings ...)
  2017-03-12 23:01 ` [PATCH net-next RFC v1 19/27] afnetns: add afnetns support for tcpv4 Hannes Frederic Sowa
@ 2017-03-12 23:01 ` Hannes Frederic Sowa
  2017-03-12 23:01 ` [PATCH net-next RFC v1 21/27] afnetns: add support for tcpv6 Hannes Frederic Sowa
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Hannes Frederic Sowa @ 2017-03-12 23:01 UTC (permalink / raw)
  To: netdev

inet6_hashtables is built into vmlinux even when ipv6 is built as a
module. As the inet6_hashtables functions depend on ipv6_get_ifaddr
via ipv6_get_ifaddr_afnetns_rcu, the lookup function (together with
inet6_addr_lst and inet6_addr_hash) must always be available.
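For reference, the moved ipv6_get_ifaddr() walks the bucket inet6_addr_lst[inet6_addr_hash(addr)]. That is why the table and the hash helper must leave addrconf.c together with the function. The sketch below reproduces the hash in user space. hash_32() multiplies by GOLDEN_RATIO_32 and keeps the top bits. ipv6_addr_hash() folds the address into 32 bits, which is equivalent to XOR-ing its four 32-bit words. The shift value of 4 is an assumption about this tree, so check IN6_ADDR_HSIZE_SHIFT in include/net/addrconf.h. The `model_*` names are hypothetical.

```c
#include <stdint.h>
#include <assert.h>

/* Assumption: value of IN6_ADDR_HSIZE_SHIFT in this tree. */
#define MODEL_IN6_ADDR_HSIZE_SHIFT 4
#define MODEL_IN6_ADDR_HSIZE (1u << MODEL_IN6_ADDR_HSIZE_SHIFT)

/* GOLDEN_RATIO_32 as used by hash_32() in <linux/hash.h>. */
#define MODEL_GOLDEN_RATIO_32 0x61C88647u

static uint32_t model_hash_32(uint32_t val, unsigned int bits)
{
	return (uint32_t)(val * MODEL_GOLDEN_RATIO_32) >> (32 - bits);
}

/* Portable equivalent of ipv6_addr_hash(): XOR of the four words. */
static uint32_t model_ipv6_addr_hash(const uint32_t w[4])
{
	return w[0] ^ w[1] ^ w[2] ^ w[3];
}

/* Bucket index into the (model) inet6_addr_lst table. */
static uint32_t model_inet6_addr_hash(const uint32_t w[4])
{
	return model_hash_32(model_ipv6_addr_hash(w),
			     MODEL_IN6_ADDR_HSIZE_SHIFT);
}
```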

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/net/addrconf.h      |  6 ++++++
 net/ipv6/addrconf.c         | 35 +----------------------------------
 net/ipv6/inet6_hashtables.c | 39 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 46 insertions(+), 34 deletions(-)

diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 644fa68bb4ddef..dcb17f88fd2875 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -78,6 +78,7 @@ bool ipv6_chk_custom_prefix(const struct in6_addr *addr,
 
 int ipv6_chk_prefix(const struct in6_addr *addr, struct net_device *dev);
 
+extern struct hlist_head inet6_addr_lst[IN6_ADDR_HSIZE];
 struct inet6_ifaddr *ipv6_get_ifaddr(struct net *net,
 				     const struct in6_addr *addr,
 				     struct net_device *dev, int strict);
@@ -416,6 +417,11 @@ static inline bool ipv6_addr_is_solict_mult(const struct in6_addr *addr)
 #endif
 }
 
+static inline u32 inet6_addr_hash(const struct in6_addr *addr)
+{
+	return hash_32(ipv6_addr_hash(addr), IN6_ADDR_HSIZE_SHIFT);
+}
+
 #ifdef CONFIG_PROC_FS
 int if6_proc_init(void);
 void if6_proc_exit(void);
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 2e546584695118..319f83a7d29dd5 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -160,7 +160,6 @@ static int ipv6_generate_stable_address(struct in6_addr *addr,
 /*
  *	Configured unicast address hash table
  */
-static struct hlist_head inet6_addr_lst[IN6_ADDR_HSIZE];
 static DEFINE_SPINLOCK(addrconf_hash_lock);
 
 static void addrconf_verify(void);
@@ -936,11 +935,6 @@ ipv6_link_dev_addr(struct inet6_dev *idev, struct inet6_ifaddr *ifp)
 	list_add_tail(&ifp->if_list, p);
 }
 
-static u32 inet6_addr_hash(const struct in6_addr *addr)
-{
-	return hash_32(ipv6_addr_hash(addr), IN6_ADDR_HSIZE_SHIFT);
-}
-
 /* On success it returns ifp with increased reference count */
 
 static struct inet6_ifaddr *
@@ -1888,30 +1882,6 @@ int ipv6_chk_prefix(const struct in6_addr *addr, struct net_device *dev)
 }
 EXPORT_SYMBOL(ipv6_chk_prefix);
 
-struct inet6_ifaddr *ipv6_get_ifaddr(struct net *net, const struct in6_addr *addr,
-				     struct net_device *dev, int strict)
-{
-	struct inet6_ifaddr *ifp, *result = NULL;
-	unsigned int hash = inet6_addr_hash(addr);
-
-	rcu_read_lock_bh();
-	hlist_for_each_entry_rcu_bh(ifp, &inet6_addr_lst[hash], addr_lst) {
-		if (!net_eq(dev_net(ifp->idev->dev), net))
-			continue;
-		if (ipv6_addr_equal(&ifp->addr, addr)) {
-			if (!dev || ifp->idev->dev == dev ||
-			    !(ifp->scope&(IFA_LINK|IFA_HOST) || strict)) {
-				result = ifp;
-				in6_ifa_hold(ifp);
-				break;
-			}
-		}
-	}
-	rcu_read_unlock_bh();
-
-	return result;
-}
-
 /* Gets referenced address, destroys ifaddr */
 
 static void addrconf_dad_stop(struct inet6_ifaddr *ifp, int dad_failed)
@@ -6518,7 +6488,7 @@ static struct rtnl_af_ops inet6_ops __read_mostly = {
 int __init addrconf_init(void)
 {
 	struct inet6_dev *idev;
-	int i, err;
+	int err;
 
 	err = ipv6_addr_label_init();
 	if (err < 0) {
@@ -6563,9 +6533,6 @@ int __init addrconf_init(void)
 		goto errlo;
 	}
 
-	for (i = 0; i < IN6_ADDR_HSIZE; i++)
-		INIT_HLIST_HEAD(&inet6_addr_lst[i]);
-
 	register_netdevice_notifier(&ipv6_dev_notf);
 
 	addrconf_verify();
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index d0900918a19e5e..8570e0e3016b65 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -25,6 +25,9 @@
 #include <net/ip.h>
 #include <net/sock_reuseport.h>
 
+struct hlist_head inet6_addr_lst[IN6_ADDR_HSIZE];
+EXPORT_SYMBOL(inet6_addr_lst);
+
 u32 inet6_ehashfn(const struct net *net,
 		  const struct in6_addr *laddr, const u16 lport,
 		  const struct in6_addr *faddr, const __be16 fport)
@@ -44,6 +47,32 @@ u32 inet6_ehashfn(const struct net *net,
 			       inet6_ehash_secret + net_hash_mix(net));
 }
 
+struct inet6_ifaddr *ipv6_get_ifaddr(struct net *net,
+				     const struct in6_addr *addr,
+				     struct net_device *dev, int strict)
+{
+	struct inet6_ifaddr *ifp, *result = NULL;
+	unsigned int hash = inet6_addr_hash(addr);
+
+	rcu_read_lock_bh();
+	hlist_for_each_entry_rcu_bh(ifp, &inet6_addr_lst[hash], addr_lst) {
+		if (!net_eq(dev_net(ifp->idev->dev), net))
+			continue;
+		if (ipv6_addr_equal(&ifp->addr, addr)) {
+			if (!dev || ifp->idev->dev == dev ||
+			    !(ifp->scope & (IFA_LINK | IFA_HOST) || strict)) {
+				result = ifp;
+				in6_ifa_hold(ifp);
+				break;
+			}
+		}
+	}
+	rcu_read_unlock_bh();
+
+	return result;
+}
+EXPORT_SYMBOL(ipv6_get_ifaddr);
+
 /*
  * Sockets in TCP_CLOSE state are _always_ taken out of the hash, so
  * we need not check it for TCP lookups anymore, thanks Alexey. -DaveM
@@ -275,3 +304,13 @@ int inet6_hash(struct sock *sk)
 	return err;
 }
 EXPORT_SYMBOL_GPL(inet6_hash);
+
+int __init inet6_hashtables_init(void)
+{
+	int i;
+
+	for (i = 0; i < IN6_ADDR_HSIZE; i++)
+		INIT_HLIST_HEAD(&inet6_addr_lst[i]);
+	return 0;
+}
+early_initcall(inet6_hashtables_init);
-- 
2.9.3

* [PATCH net-next RFC v1 21/27] afnetns: add support for tcpv6
  2017-03-12 23:01 [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level Hannes Frederic Sowa
                   ` (19 preceding siblings ...)
  2017-03-12 23:01 ` [PATCH net-next RFC v1 20/27] ipv6: move ipv6_get_ifaddr to vmlinux in case ipv6 is build as module Hannes Frederic Sowa
@ 2017-03-12 23:01 ` Hannes Frederic Sowa
  2017-03-12 23:01 ` [PATCH net-next RFC v1 22/27] afnetns: track owning namespace for inet_bind Hannes Frederic Sowa
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Hannes Frederic Sowa @ 2017-03-12 23:01 UTC (permalink / raw)
  To: netdev

As with the tcpv4 support, we simply add the necessary checks so that
socket lookups only match sockets in our own afnetns.
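
The sketch below is a simplified userspace model of the extra filter this patch adds to compute_score() and the listener loop. The structs are hypothetical stand-ins. A specific-address bind is assumed to already match the packet's destination address. Device binding and SO_REUSEPORT handling are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct net, struct afnetns, struct sock. */
struct net { int id; };
struct afnetns { int id; };

struct sock_model {
	const struct net *net;
	const struct afnetns *afnetns;  /* what sock_afnetns(sk) would return */
	unsigned short num;             /* inet_num: bound local port */
	int bound_to_addr;              /* nonzero: bound to (matching) daddr */
};

/*
 * Shaped like compute_score() after this patch: a listener only scores if it
 * is in the same netns, the same afnetns and on the same port. A bind to a
 * specific address outranks a wildcard bind.
 */
static int compute_score_model(const struct sock_model *sk,
			       const struct net *net,
			       const struct afnetns *afnetns,
			       unsigned short hnum)
{
	int score = -1;

	if (sk->net == net && sk->afnetns == afnetns && sk->num == hnum) {
		score = 1;
		if (sk->bound_to_addr)
			score++;
	}
	return score;
}

/* Pick the best-scoring listener, like the loop in inet6_lookup_listener(). */
static const struct sock_model *
lookup_listener_model(const struct sock_model *socks, size_t n,
		      const struct net *net, const struct afnetns *afnetns,
		      unsigned short hnum)
{
	const struct sock_model *result = NULL;
	int hiscore = 0;

	assert(socks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		int score = compute_score_model(&socks[i], net, afnetns, hnum);

		if (score > hiscore) {
			hiscore = score;
			result = &socks[i];
		}
	}
	return result;
}
```

With two listeners on port 80 in different afnetns, each afnetns only ever finds its own socket. A port with no listener in the caller's afnetns yields NULL.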

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/ipv6/inet6_hashtables.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index 8570e0e3016b65..05b71f0937e676 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -87,6 +87,7 @@ struct sock *__inet6_lookup_established(struct net *net,
 					   const u16 hnum,
 					   const int dif)
 {
+	struct afnetns *afnetns;
 	struct sock *sk;
 	const struct hlist_nulls_node *node;
 	const __portpair ports = INET_COMBINED_PORTS(sport, hnum);
@@ -97,11 +98,15 @@ struct sock *__inet6_lookup_established(struct net *net,
 	unsigned int slot = hash & hashinfo->ehash_mask;
 	struct inet_ehash_bucket *head = &hashinfo->ehash[slot];
 
+	afnetns = ipv6_get_ifaddr_afnetns_rcu(net, daddr,
+					      dev_get_by_index_rcu(net, dif));
 
 begin:
 	sk_nulls_for_each_rcu(sk, node, &head->chain) {
 		if (sk->sk_hash != hash)
 			continue;
+		if (sock_afnetns(sk) != afnetns)
+			continue;
 		if (!INET6_MATCH(sk, net, saddr, daddr, ports, dif))
 			continue;
 		if (unlikely(!atomic_inc_not_zero(&sk->sk_refcnt)))
@@ -123,14 +128,15 @@ struct sock *__inet6_lookup_established(struct net *net,
 EXPORT_SYMBOL(__inet6_lookup_established);
 
 static inline int compute_score(struct sock *sk, struct net *net,
+				struct afnetns *afnetns,
 				const unsigned short hnum,
 				const struct in6_addr *daddr,
 				const int dif, bool exact_dif)
 {
 	int score = -1;
 
-	if (net_eq(sock_net(sk), net) && inet_sk(sk)->inet_num == hnum &&
-	    sk->sk_family == PF_INET6) {
+	if (net_eq(sock_net(sk), net) && sock_afnetns(sk) == afnetns &&
+	    inet_sk(sk)->inet_num == hnum && sk->sk_family == PF_INET6) {
 
 		score = 1;
 		if (!ipv6_addr_any(&sk->sk_v6_rcv_saddr)) {
@@ -162,10 +168,14 @@ struct sock *inet6_lookup_listener(struct net *net,
 	int score, hiscore = 0, matches = 0, reuseport = 0;
 	bool exact_dif = inet6_exact_dif_match(net, skb);
 	struct sock *sk, *result = NULL;
+	struct afnetns *afnetns;
 	u32 phash = 0;
 
+	afnetns = ipv6_get_ifaddr_afnetns_rcu(net, daddr, skb->dev);
+
 	sk_for_each(sk, &ilb->head) {
-		score = compute_score(sk, net, hnum, daddr, dif, exact_dif);
+		score = compute_score(sk, net, afnetns, hnum, daddr, dif,
+				      exact_dif);
 		if (score > hiscore) {
 			reuseport = sk->sk_reuseport;
 			if (reuseport) {
-- 
2.9.3

* [PATCH net-next RFC v1 22/27] afnetns: track owning namespace for inet_bind
  2017-03-12 23:01 [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level Hannes Frederic Sowa
                   ` (20 preceding siblings ...)
  2017-03-12 23:01 ` [PATCH net-next RFC v1 21/27] afnetns: add support for tcpv6 Hannes Frederic Sowa
@ 2017-03-12 23:01 ` Hannes Frederic Sowa
  2017-03-12 23:01 ` [PATCH net-next RFC v1 23/27] afnetns: use user_ns from afnetns for checking for binding to port < 1024 Hannes Frederic Sowa
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Hannes Frederic Sowa @ 2017-03-12 23:01 UTC (permalink / raw)
  To: netdev

To allow processes in a newly created afnetns to bind to ports below
1024, we need to track the user namespace the afnetns is created in,
so that the permission check for such binds can be done against it.
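
The following is a minimal sketch of the permission decision this tracking enables. The actual check is added to inet_allow_bind() and inet6_bind() by the following patches. The structs are hypothetical stand-ins. The field caller_has_net_bind_service models the result of ns_capable(user_ns, CAP_NET_BIND_SERVICE) and is not a real kernel field.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real structs live in the kernel. */
struct user_namespace {
	bool caller_has_net_bind_service; /* models ns_capable(ns, CAP_NET_BIND_SERVICE) */
};
struct net {
	struct user_namespace *user_ns;
	unsigned short prot_sock;         /* models inet_prot_sock(net), usually 1024 */
};
struct afnetns {
	struct user_namespace *user_ns;   /* the field added by this patch */
};

/* Prefer the user namespace recorded in the afnetns, else fall back to the netns one. */
static struct user_namespace *bind_user_ns(const struct afnetns *afnetns,
					   const struct net *net)
{
	return afnetns ? afnetns->user_ns : net->user_ns;
}

/* Returns 0 if binding to port @snum is allowed, -EACCES otherwise. */
static int check_low_port(const struct afnetns *afnetns, const struct net *net,
			  unsigned short snum)
{
	struct user_namespace *user_ns = bind_user_ns(afnetns, net);

	assert(user_ns != NULL);
	if (snum && snum < net->prot_sock && !user_ns->caller_has_net_bind_service)
		return -EACCES;
	return 0;
}
```

A process that is privileged only in the user namespace owning its afnetns can then bind port 80 there. The same process without an afnetns is still checked against the netns' user namespace.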

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/net/afnetns.h    |  7 +++++--
 kernel/nsproxy.c         |  2 +-
 net/core/afnetns.c       | 18 ++++++++++++------
 net/core/net_namespace.c |  2 +-
 4 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/include/net/afnetns.h b/include/net/afnetns.h
index 9039086717c356..9db49551fff714 100644
--- a/include/net/afnetns.h
+++ b/include/net/afnetns.h
@@ -8,6 +8,7 @@
 struct afnetns {
 #if IS_ENABLED(CONFIG_AFNETNS)
 	refcount_t ref;
+	struct user_namespace *user_ns;
 	struct ns_common ns;
 	struct net *net;
 #endif
@@ -17,8 +18,10 @@ extern struct afnetns init_afnetns;
 
 int afnet_ns_init(void);
 
-struct afnetns *afnetns_new(struct net *net);
-struct afnetns *copy_afnet_ns(unsigned long flags, struct nsproxy *old);
+struct afnetns *afnetns_new(struct net *net, struct user_namespace *user_ns);
+struct afnetns *copy_afnet_ns(unsigned long flags,
+			      struct user_namespace *user_ns,
+			      struct nsproxy *old);
 struct afnetns *afnetns_get_by_fd(int fd);
 unsigned int afnetns_to_inode(struct afnetns *afnetns);
 void afnetns_free(struct afnetns *afnetns);
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index f99ecbdd506137..90462012aecf78 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -114,7 +114,7 @@ static struct nsproxy *create_new_namespaces(unsigned long flags,
 	}
 
 #if IS_ENABLED(CONFIG_AFNETNS)
-	new_nsp->afnet_ns = copy_afnet_ns(flags, tsk->nsproxy);
+	new_nsp->afnet_ns = copy_afnet_ns(flags, user_ns, tsk->nsproxy);
 	if (IS_ERR(new_nsp->afnet_ns)) {
 		err = PTR_ERR(new_nsp->afnet_ns);
 		goto out_afnet;
diff --git a/net/core/afnetns.c b/net/core/afnetns.c
index b96c25b5ebe30d..69d776564c69be 100644
--- a/net/core/afnetns.c
+++ b/net/core/afnetns.c
@@ -5,6 +5,7 @@
 #include <linux/file.h>
 #include <linux/nsproxy.h>
 #include <linux/proc_ns.h>
+#include <linux/user_namespace.h>
 
 const struct proc_ns_operations afnetns_operations;
 
@@ -17,7 +18,8 @@ static struct afnetns *ns_to_afnet(struct ns_common *ns)
 	return container_of(ns, struct afnetns, ns);
 }
 
-static int afnet_setup(struct afnetns *afnetns, struct net *net)
+static int afnet_setup(struct afnetns *afnetns, struct net *net,
+		       struct user_namespace *user_ns)
 {
 	int err;
 
@@ -28,11 +30,12 @@ static int afnet_setup(struct afnetns *afnetns, struct net *net)
 
 	refcount_set(&afnetns->ref, 1);
 	afnetns->net = get_net(net);
+	afnetns->user_ns = get_user_ns(user_ns);
 
 	return err;
 }
 
-struct afnetns *afnetns_new(struct net *net)
+struct afnetns *afnetns_new(struct net *net, struct user_namespace *user_ns)
 {
 	int err;
 	struct afnetns *afnetns;
@@ -41,7 +44,7 @@ struct afnetns *afnetns_new(struct net *net)
 	if (!afnetns)
 		return ERR_PTR(-ENOMEM);
 
-	err = afnet_setup(afnetns, net);
+	err = afnet_setup(afnetns, net, user_ns);
 	if (err) {
 		kfree(afnetns);
 		return ERR_PTR(err);
@@ -54,6 +57,7 @@ void afnetns_free(struct afnetns *afnetns)
 {
 	ns_free_inum(&afnetns->ns);
 	put_net(afnetns->net);
+	put_user_ns(afnetns->user_ns);
 	kfree(afnetns);
 }
 EXPORT_SYMBOL(afnetns_free);
@@ -85,7 +89,9 @@ unsigned int afnetns_to_inode(struct afnetns *afnetns)
 }
 EXPORT_SYMBOL(afnetns_to_inode);
 
-struct afnetns *copy_afnet_ns(unsigned long flags, struct nsproxy *old)
+struct afnetns *copy_afnet_ns(unsigned long flags,
+			      struct user_namespace *user_ns,
+			      struct nsproxy *old)
 {
 	if (flags & CLONE_NEWNET)
 		return afnetns_get(old->net_ns->afnet_ns);
@@ -93,7 +99,7 @@ struct afnetns *copy_afnet_ns(unsigned long flags, struct nsproxy *old)
 	if (!(flags & CLONE_NEWAFNET))
 		return afnetns_get(old->afnet_ns);
 
-	return afnetns_new(old->net_ns);
+	return afnetns_new(old->net_ns, user_ns);
 }
 
 static struct ns_common *afnet_get(struct task_struct *task)
@@ -144,7 +150,7 @@ int __init afnet_ns_init(void)
 {
 	int err;
 
-	err = afnet_setup(&init_afnetns, &init_net);
+	err = afnet_setup(&init_afnetns, &init_net, &init_user_ns);
 	if (err)
 		return err;
 
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 1b11883d8cdbbd..6bb1c87e72dcc0 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -287,7 +287,7 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)
 
 #if IS_ENABLED(CONFIG_AFNETNS)
 	if (likely(!net_eq(&init_net, net))) {
-		net->afnet_ns = afnetns_new(net);
+		net->afnet_ns = afnetns_new(net, user_ns);
 		if (IS_ERR(net->afnet_ns)) {
 			error = PTR_ERR(net->afnet_ns);
 			goto out;
-- 
2.9.3

* [PATCH net-next RFC v1 23/27] afnetns: use user_ns from afnetns for checking for binding to port < 1024
  2017-03-12 23:01 [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level Hannes Frederic Sowa
                   ` (21 preceding siblings ...)
  2017-03-12 23:01 ` [PATCH net-next RFC v1 22/27] afnetns: track owning namespace for inet_bind Hannes Frederic Sowa
@ 2017-03-12 23:01 ` Hannes Frederic Sowa
  2017-03-12 23:01 ` [PATCH net-next RFC v1 24/27] afnetns: check afnetns user_ns in inet6_bind Hannes Frederic Sowa
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Hannes Frederic Sowa @ 2017-03-12 23:01 UTC (permalink / raw)
  To: netdev

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/net/inet_common.h |  2 +-
 net/ipv4/af_inet.c        | 37 ++++++++++++++++++++++---------------
 net/ipv6/af_inet6.c       |  2 +-
 3 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/include/net/inet_common.h b/include/net/inet_common.h
index 4ac8229dca6af4..16dfbb02296be6 100644
--- a/include/net/inet_common.h
+++ b/include/net/inet_common.h
@@ -30,7 +30,7 @@ int inet_shutdown(struct socket *sock, int how);
 int inet_listen(struct socket *sock, int backlog);
 void inet_sock_destruct(struct sock *sk);
 int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len);
-int inet_allow_bind(struct sock *sk, __be32 addr);
+int inet_allow_bind(struct sock *sk, __be32 addr, unsigned short snum);
 int inet_getname(struct socket *sock, struct sockaddr *uaddr, int *uaddr_len,
 		 int peer);
 int inet_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 5f11399bafd16f..da7e6299073743 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -428,12 +428,14 @@ int inet_release(struct socket *sock)
 }
 EXPORT_SYMBOL(inet_release);
 
-int inet_allow_bind(struct sock *sk, __be32 addr)
+int inet_allow_bind(struct sock *sk, __be32 addr, unsigned short snum)
 {
 	struct inet_sock *inet = inet_sk(sk);
 	struct net *net = sock_net(sk);
+	struct afnetns *afnetns = NULL;
 	u32 tb_id = RT_TABLE_LOCAL;
 	int chk_addr_ret;
+	int err = 0;
 
 	tb_id = l3mdev_fib_table_by_index(net, sk->sk_bound_dev_if) ? : tb_id;
 	chk_addr_ret = inet_addr_type_table(net, addr, tb_id);
@@ -453,18 +455,29 @@ int inet_allow_bind(struct sock *sk, __be32 addr)
 	    chk_addr_ret != RTN_BROADCAST)
 		return -EADDRNOTAVAIL;
 
+	rcu_read_lock();
 	if (chk_addr_ret == RTN_LOCAL &&
 	    net_afnetns(net) != sock_afnetns(sk)) {
-		struct afnetns *afnetns;
-
-		rcu_read_lock();
 		afnetns = ifa_find_afnetns_rcu(net, addr);
 		if (afnetns != sock_afnetns(sk))
-			chk_addr_ret = -EADDRNOTAVAIL;
-		rcu_read_unlock();
+			err = -EADDRNOTAVAIL;
+	}
+
+	if (!err && snum && snum < inet_prot_sock(net)) {
+		struct user_namespace *user_ns;
+
+#if IS_ENABLED(CONFIG_AFNETNS)
+		user_ns = afnetns ? afnetns->user_ns : net->user_ns;
+#else
+		user_ns = net->user_ns;
+#endif
+		if (!ns_capable(user_ns, CAP_NET_BIND_SERVICE))
+			err = -EACCES;
 	}
 
-	return chk_addr_ret;
+	rcu_read_unlock();
+
+	return err;
 }
 EXPORT_SYMBOL(inet_allow_bind);
 
@@ -473,7 +486,6 @@ int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 	struct sockaddr_in *addr = (struct sockaddr_in *)uaddr;
 	struct sock *sk = sock->sk;
 	struct inet_sock *inet = inet_sk(sk);
-	struct net *net = sock_net(sk);
 	unsigned short snum;
 	int chk_addr_ret;
 	int err;
@@ -497,18 +509,13 @@ int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 			goto out;
 	}
 
-	chk_addr_ret = inet_allow_bind(sk, addr->sin_addr.s_addr);
+	snum = ntohs(addr->sin_port);
+	chk_addr_ret = inet_allow_bind(sk, addr->sin_addr.s_addr, snum);
 	if (chk_addr_ret < 0) {
 		err = chk_addr_ret;
 		goto out;
 	}
 
-	snum = ntohs(addr->sin_port);
-	err = -EACCES;
-	if (snum && snum < inet_prot_sock(net) &&
-	    !ns_capable(net->user_ns, CAP_NET_BIND_SERVICE))
-		goto out;
-
 	/*      We keep a pair of addresses. rcv_saddr is the one
 	 *      used by hash lookups, and saddr is used for transmit.
 	 *
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index ffb116297c0950..30aff01eba5be0 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -324,7 +324,7 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 			goto out;
 		}
 
-		err = inet_allow_bind(sk, addr->sin6_addr.s6_addr32[3]);
+		err = inet_allow_bind(sk, addr->sin6_addr.s6_addr32[3], snum);
 		if (err < 0)
 			goto out;
 		else
-- 
2.9.3

* [PATCH net-next RFC v1 24/27] afnetns: check afnetns user_ns in inet6_bind
  2017-03-12 23:01 [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level Hannes Frederic Sowa
                   ` (22 preceding siblings ...)
  2017-03-12 23:01 ` [PATCH net-next RFC v1 23/27] afnetns: use user_ns from afnetns for checking for binding to port < 1024 Hannes Frederic Sowa
@ 2017-03-12 23:01 ` Hannes Frederic Sowa
  2017-03-12 23:01 ` [PATCH net-next RFC v1 25/27] afnetns: ipv4: inherit afnetns from calling application Hannes Frederic Sowa
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Hannes Frederic Sowa @ 2017-03-12 23:01 UTC (permalink / raw)
  To: netdev

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/ipv6/af_inet6.c | 40 ++++++++++++++++++++++++++++++++--------
 1 file changed, 32 insertions(+), 8 deletions(-)

diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 30aff01eba5be0..4aa221826e753c 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -273,6 +273,26 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol,
 	goto out;
 }
 
+static int inet6_allow_bind(struct net *net, struct in6_addr *addr,
+			    unsigned short snum, struct net_device *dev)
+{
+	struct user_namespace *user_ns;
+#if IS_ENABLED(CONFIG_AFNETNS)
+	struct afnetns *afnetns;
+
+	afnetns = ipv6_get_ifaddr_afnetns_rcu(net, addr, dev);
+	user_ns = afnetns ? afnetns->user_ns : net->user_ns;
+#else
+	user_ns = net->user_ns;
+#endif
+
+	if (snum && snum < inet_prot_sock(net) &&
+	    !ns_capable(user_ns, CAP_NET_BIND_SERVICE))
+		return -EADDRNOTAVAIL;
+
+	return 0;
+}
+
 
 /* bind for INET6 API */
 int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
@@ -301,11 +321,6 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 	if ((addr_type & IPV6_ADDR_MULTICAST) && sock->type == SOCK_STREAM)
 		return -EINVAL;
 
-	snum = ntohs(addr->sin6_port);
-	if (snum && snum < inet_prot_sock(net) &&
-	    !ns_capable(net->user_ns, CAP_NET_BIND_SERVICE))
-		return -EACCES;
-
 	lock_sock(sk);
 
 	/* Check these errors (active socket, double bind). */
@@ -314,6 +329,8 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 		goto out;
 	}
 
+	snum = ntohs(addr->sin6_port);
+
 	/* Check if the address belongs to the host. */
 	if (addr_type == IPV6_ADDR_MAPPED) {
 		/* Binding to v4-mapped address on a v6-only socket
@@ -330,10 +347,12 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 		else
 			err = 0;
 	} else {
+		struct net_device *dev = NULL;
+
+		rcu_read_lock();
+
 		if (addr_type != IPV6_ADDR_ANY) {
-			struct net_device *dev = NULL;
 
-			rcu_read_lock();
 			if (__ipv6_addr_needs_scope_id(addr_type)) {
 				if (addr_len >= sizeof(struct sockaddr_in6) &&
 				    addr->sin6_scope_id) {
@@ -371,8 +390,13 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 					goto out_unlock;
 				}
 			}
-			rcu_read_unlock();
 		}
+
+		err = inet6_allow_bind(net, &addr->sin6_addr, snum, dev);
+		if (err)
+			goto out_unlock;
+
+		rcu_read_unlock();
 	}
 
 	inet->inet_rcv_saddr = v4addr;
-- 
2.9.3

* [PATCH net-next RFC v1 25/27] afnetns: ipv4: inherit afnetns from calling application
  2017-03-12 23:01 [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level Hannes Frederic Sowa
                   ` (23 preceding siblings ...)
  2017-03-12 23:01 ` [PATCH net-next RFC v1 24/27] afnetns: check afnetns user_ns in inet6_bind Hannes Frederic Sowa
@ 2017-03-12 23:01 ` Hannes Frederic Sowa
  2017-03-12 23:01 ` [PATCH net-next RFC v1 26/27] afnetns: ipv6: " Hannes Frederic Sowa
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Hannes Frederic Sowa @ 2017-03-12 23:01 UTC (permalink / raw)
  To: netdev

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/ipv4/devinet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 82a7389ec86faa..01bdff8a957ae1 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -838,7 +838,7 @@ static struct in_ifaddr *rtm_to_ifaddr(struct net *net, struct nlmsghdr *nlh,
 			goto errout_free;
 		}
 	} else {
-		ifa->afnetns = afnetns_get(net->afnet_ns);
+		ifa->afnetns = afnetns_get(current->nsproxy->afnet_ns);
 	}
 #else
 	if (tb[IFA_AFNETNS_FD]) {
-- 
2.9.3

* [PATCH net-next RFC v1 26/27] afnetns: ipv6: inherit afnetns from calling application
  2017-03-12 23:01 [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level Hannes Frederic Sowa
                   ` (24 preceding siblings ...)
  2017-03-12 23:01 ` [PATCH net-next RFC v1 25/27] afnetns: ipv4: inherit afnetns from calling application Hannes Frederic Sowa
@ 2017-03-12 23:01 ` Hannes Frederic Sowa
  2017-03-12 23:01 ` [PATCH net-next RFC v1 27/27] afnetns: allow only whitelisted protocols to operate inside afnetns Hannes Frederic Sowa
  2017-03-12 23:26 ` [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level David Miller
  27 siblings, 0 replies; 34+ messages in thread
From: Hannes Frederic Sowa @ 2017-03-12 23:01 UTC (permalink / raw)
  To: netdev

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/ipv6/addrconf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 319f83a7d29dd5..3d9d24ec066a67 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4542,7 +4542,7 @@ inet6_rtm_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh)
 		if (IS_ERR(afnetns))
 			return PTR_ERR(afnetns);
 	} else {
-		afnetns = afnetns_get(net_afnetns(net));
+		afnetns = afnetns_get(current->nsproxy->afnet_ns);
 	}
 #else
 	if (tb[IFA_AFNETNS_FD])
-- 
2.9.3

* [PATCH net-next RFC v1 27/27] afnetns: allow only whitelisted protocols to operate inside afnetns
  2017-03-12 23:01 [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level Hannes Frederic Sowa
                   ` (25 preceding siblings ...)
  2017-03-12 23:01 ` [PATCH net-next RFC v1 26/27] afnetns: ipv6: " Hannes Frederic Sowa
@ 2017-03-12 23:01 ` Hannes Frederic Sowa
  2017-03-12 23:26 ` [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level David Miller
  27 siblings, 0 replies; 34+ messages in thread
From: Hannes Frederic Sowa @ 2017-03-12 23:01 UTC (permalink / raw)
  To: netdev

We only care about inet protocols (that is, IPv4 and IPv6). Other
protocols, like netlink, are not under the control of afnetns and thus
must be hardened with capabilities.
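
The sketch below reduces the gate added to inet_create()/inet6_create() to a pure function. The flag values are copied from this series. afnetns_create_check() is hypothetical, and so is its task_in_other_afnetns parameter, which stands in for current->nsproxy->afnet_ns != net->afnet_ns.

```c
#include <assert.h>  /* used by the usage checks */
#include <errno.h>
#include <stdbool.h>

/* Flag values as defined in include/net/protocol.h by this series. */
#define INET_PROTOSW_REUSE      0x01
#define INET_PROTOSW_PERMANENT  0x02
#define INET_PROTOSW_ICSK       0x04
#define INET_PROTOSW_AFNETNS_OK 0x08

/*
 * Models the check added to inet_create()/inet6_create(): a non-kernel
 * socket created by a task whose afnetns differs from the netns default
 * may only use protocols flagged INET_PROTOSW_AFNETNS_OK.
 */
static int afnetns_create_check(bool kern, bool task_in_other_afnetns,
				unsigned char answer_flags)
{
	if (!kern && task_in_other_afnetns &&
	    !(answer_flags & INET_PROTOSW_AFNETNS_OK))
		return -EPERM;
	return 0;
}
```

A non-kernel SOCK_RAW socket created from inside an afnetns is therefore refused. TCP, UDP and UDP-Lite, which are flagged INET_PROTOSW_AFNETNS_OK, keep working. Kernel sockets are never affected.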

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/net/protocol.h |  1 +
 net/ipv4/af_inet.c     | 20 +++++++++++++++-----
 net/ipv4/udplite.c     |  3 ++-
 net/ipv6/af_inet6.c    | 14 +++++++++++---
 net/ipv6/tcp_ipv6.c    |  3 ++-
 net/ipv6/udp.c         |  3 ++-
 net/ipv6/udplite.c     |  3 ++-
 7 files changed, 35 insertions(+), 12 deletions(-)

diff --git a/include/net/protocol.h b/include/net/protocol.h
index bf36ca34af7ad2..7b64f71b16ccc0 100644
--- a/include/net/protocol.h
+++ b/include/net/protocol.h
@@ -91,6 +91,7 @@ struct inet_protosw {
 #define INET_PROTOSW_REUSE 0x01	     /* Are ports automatically reusable? */
 #define INET_PROTOSW_PERMANENT 0x02  /* Permanent protocols are unremovable. */
 #define INET_PROTOSW_ICSK      0x04  /* Is this an inet_connection_sock? */
+#define INET_PROTOSW_AFNETNS_OK 0x08 /* Is this proto afnetns compatible? */
 
 extern const struct net_protocol __rcu *inet_protos[MAX_INET_PROTOS];
 extern const struct net_offload __rcu *inet_offloads[MAX_INET_PROTOS];
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index da7e6299073743..1eb8a8ea49f56c 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -302,14 +302,22 @@ static int inet_create(struct net *net, struct socket *sock, int protocol,
 			goto out_rcu_unlock;
 	}
 
+	sock->ops = answer->ops;
+	answer_prot = answer->prot;
+	answer_flags = answer->flags;
+
 	err = -EPERM;
 	if (sock->type == SOCK_RAW && !kern &&
 	    !ns_capable(net->user_ns, CAP_NET_RAW))
 		goto out_rcu_unlock;
 
-	sock->ops = answer->ops;
-	answer_prot = answer->prot;
-	answer_flags = answer->flags;
+#if IS_ENABLED(CONFIG_AFNETNS)
+	if (unlikely(!kern &&
+		     current->nsproxy->afnet_ns != net->afnet_ns &&
+		     !(answer_flags & INET_PROTOSW_AFNETNS_OK)))
+		goto out_rcu_unlock;
+#endif
+
 	rcu_read_unlock();
 
 	WARN_ON(!answer_prot->slab);
@@ -1060,7 +1068,8 @@ static struct inet_protosw inetsw_array[] =
 		.prot =       &tcp_prot,
 		.ops =        &inet_stream_ops,
 		.flags =      INET_PROTOSW_PERMANENT |
-			      INET_PROTOSW_ICSK,
+			      INET_PROTOSW_ICSK |
+			      INET_PROTOSW_AFNETNS_OK,
 	},
 
 	{
@@ -1068,7 +1077,8 @@ static struct inet_protosw inetsw_array[] =
 		.protocol =   IPPROTO_UDP,
 		.prot =       &udp_prot,
 		.ops =        &inet_dgram_ops,
-		.flags =      INET_PROTOSW_PERMANENT,
+		.flags =      INET_PROTOSW_PERMANENT |
+			      INET_PROTOSW_AFNETNS_OK,
        },
 
        {
diff --git a/net/ipv4/udplite.c b/net/ipv4/udplite.c
index 59f10fe9782e57..fbdb4208ebc483 100644
--- a/net/ipv4/udplite.c
+++ b/net/ipv4/udplite.c
@@ -69,7 +69,8 @@ static struct inet_protosw udplite4_protosw = {
 	.protocol	=  IPPROTO_UDPLITE,
 	.prot		=  &udplite_prot,
 	.ops		=  &inet_dgram_ops,
-	.flags		=  INET_PROTOSW_PERMANENT,
+	.flags		=  INET_PROTOSW_PERMANENT |
+			   INET_PROTOSW_AFNETNS_OK,
 };
 
 #ifdef CONFIG_PROC_FS
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 4aa221826e753c..e21804b24be408 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -167,14 +167,22 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol,
 			goto out_rcu_unlock;
 	}
 
+	sock->ops = answer->ops;
+	answer_prot = answer->prot;
+	answer_flags = answer->flags;
+
 	err = -EPERM;
 	if (sock->type == SOCK_RAW && !kern &&
 	    !ns_capable(net->user_ns, CAP_NET_RAW))
 		goto out_rcu_unlock;
 
-	sock->ops = answer->ops;
-	answer_prot = answer->prot;
-	answer_flags = answer->flags;
+#if IS_ENABLED(CONFIG_AFNETNS)
+	if (unlikely(!kern &&
+		     current->nsproxy->afnet_ns != net->afnet_ns &&
+		     !(answer_flags & INET_PROTOSW_AFNETNS_OK)))
+		goto out_rcu_unlock;
+#endif
+
 	rcu_read_unlock();
 
 	WARN_ON(!answer_prot->slab);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 56f742fff96723..5b3b34495d4538 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1944,7 +1944,8 @@ static struct inet_protosw tcpv6_protosw = {
 	.prot		=	&tcpv6_prot,
 	.ops		=	&inet6_stream_ops,
 	.flags		=	INET_PROTOSW_PERMANENT |
-				INET_PROTOSW_ICSK,
+				INET_PROTOSW_ICSK |
+				INET_PROTOSW_AFNETNS_OK,
 };
 
 static int __net_init tcpv6_net_init(struct net *net)
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index d63e0e362fe72b..8707aab65872f9 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1475,7 +1475,8 @@ static struct inet_protosw udpv6_protosw = {
 	.protocol =  IPPROTO_UDP,
 	.prot =      &udpv6_prot,
 	.ops =       &inet6_dgram_ops,
-	.flags =     INET_PROTOSW_PERMANENT,
+	.flags =     INET_PROTOSW_PERMANENT |
+		     INET_PROTOSW_AFNETNS_OK,
 };
 
 int __init udpv6_init(void)
diff --git a/net/ipv6/udplite.c b/net/ipv6/udplite.c
index 2784cc363f2b53..331a6eb7a278da 100644
--- a/net/ipv6/udplite.c
+++ b/net/ipv6/udplite.c
@@ -63,7 +63,8 @@ static struct inet_protosw udplite6_protosw = {
 	.protocol	= IPPROTO_UDPLITE,
 	.prot		= &udplitev6_prot,
 	.ops		= &inet6_dgram_ops,
-	.flags		= INET_PROTOSW_PERMANENT,
+	.flags		= INET_PROTOSW_PERMANENT |
+			  INET_PROTOSW_AFNETNS_OK,
 };
 
 int __init udplitev6_init(void)
-- 
2.9.3

* Re: [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level
  2017-03-12 23:01 [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level Hannes Frederic Sowa
                   ` (26 preceding siblings ...)
  2017-03-12 23:01 ` [PATCH net-next RFC v1 27/27] afnetns: allow only whitelisted protocols to operate inside afnetns Hannes Frederic Sowa
@ 2017-03-12 23:26 ` David Miller
  2017-03-12 23:44   ` Hannes Frederic Sowa
  27 siblings, 1 reply; 34+ messages in thread
From: David Miller @ 2017-03-12 23:26 UTC (permalink / raw)
  To: hannes; +Cc: netdev

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Mon, 13 Mar 2017 00:01:24 +0100

> afnetns behaves like ordinary namespaces: clone, unshare, setns syscalls
> can work with afnetns with one limitation: one cannot cross the realm
> of a network namespace while changing the afnetns compartement. To get
> into a new afnetns in a different net namespace, one must first change
> to the net namespace and afterwards switch to the desired afnetns.

Please explain why this is useful, who wants this kind of facility,
and how it will be used.

Thank you.

* Re: [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level
  2017-03-12 23:26 ` [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level David Miller
@ 2017-03-12 23:44   ` Hannes Frederic Sowa
       [not found]     ` <1489362279.2283.1.camel-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: Hannes Frederic Sowa @ 2017-03-12 23:44 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Hi,

On Sun, 2017-03-12 at 16:26 -0700, David Miller wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Mon, 13 Mar 2017 00:01:24 +0100
> 
> > afnetns behaves like ordinary namespaces: clone, unshare, setns syscalls
> > can work with afnetns with one limitation: one cannot cross the realm
> > of a network namespace while changing the afnetns compartement. To get
> > into a new afnetns in a different net namespace, one must first change
> > to the net namespace and afterwards switch to the desired afnetns.
> 
> Please explain why this is useful, who wants this kind of facility,
> and how it will be used.

Yes, I have to enhance the cover letter:

The motivation behind all this is denser container hosting.
Right now we lose performance because every packet must either be
forwarded through a bridge or be routed before it reaches the
containers. For example, we can't make use of early demuxing for
incoming packets. In effect, every packet traverses the networking
stack twice.

The usage is very much in line with how network namespaces are used
nowadays:

ip afnetns add afns-1
ip address add 192.168.1.1/24 dev eth0 afnetns afns-1
ip afnetns exec afns-1 /usr/sbin/httpd

This runs /usr/sbin/httpd inside afns-1: it and all of its child
processes only have access to the IP addresses assigned to afns-1, even
if they do a wildcard bind. Source address selection also uses only the
IP addresses available to those processes.
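
To make the wildcard-bind semantics concrete, here is a toy model in C.
It is a hedged sketch, not code from this series: the integer afnetns
ids, struct toy_ifaddr, struct toy_sock and the toy_* helpers are all
hypothetical, and only IPv4 is modelled. Each local address carries an
afnetns tag, and a socket only ever sees addresses carrying its own tag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model, not code from the patch series: an afnetns
 * is identified by a plain integer. */
typedef int afnetns_id;

struct toy_ifaddr {
    uint32_t addr;        /* IPv4 address, host byte order */
    afnetns_id afnetns;   /* afnetns the address was assigned to */
};

struct toy_sock {
    uint32_t bound_addr;  /* 0 means INADDR_ANY (wildcard bind) */
    afnetns_id afnetns;   /* afnetns of the creating process */
};

/* Return the afnetns tag of a local address, or -1 if it is not local. */
static afnetns_id toy_addr_afnetns(const struct toy_ifaddr *tbl, size_t n,
                                   uint32_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].addr == addr)
            return tbl[i].afnetns;
    return -1;
}

/* A wildcard bind is always allowed; a specific address only if it is
 * tagged with the socket's own afnetns. */
static bool toy_allow_bind(const struct toy_ifaddr *tbl, size_t n,
                           const struct toy_sock *sk, uint32_t addr)
{
    assert(sk != NULL);
    if (addr == 0)
        return true;
    return toy_addr_afnetns(tbl, n, addr) == sk->afnetns;
}

/* Deliver a packet for daddr to sk? Even a wildcard-bound socket only
 * sees destinations tagged with its own afnetns. */
static bool toy_deliver(const struct toy_ifaddr *tbl, size_t n,
                        const struct toy_sock *sk, uint32_t daddr)
{
    assert(sk != NULL);
    if (toy_addr_afnetns(tbl, n, daddr) != sk->afnetns)
        return false;
    return sk->bound_addr == 0 || sk->bound_addr == daddr;
}
```

For example, with 192.168.1.1 (0xC0A80101) tagged afnetns 1 and
192.168.1.2 tagged afnetns 2, a wildcard-bound httpd in afnetns 1 gets
traffic for the first address only and may not bind to the second.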

In some sense it shares many characteristics with ipvlan, allowing a
single MAC address to host many IP addresses that end up in different
namespaces. Unlike ipvlan, however, it also solves the problems around
duplicate address detection and multiplexing packets to the IGMP or
MLD state machines.

Resource consumption compared with ordinary network namespaces will be
much lower. All in all, packets cross far fewer networking subsystems
than with normal netns solutions.

More information is in the first patch, which adds documentation.

Bye,
Hannes

* Re: [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level
       [not found]     ` <1489362279.2283.1.camel-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org>
@ 2017-03-13 19:56       ` Michael Kerrisk
       [not found]         ` <CAHO5Pa1s949dohzEEE68Ux=mXA7N7sR-U98Jwjvx1a_A5AhFEw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: Michael Kerrisk @ 2017-03-13 19:56 UTC
  To: Hannes Frederic Sowa; +Cc: David Miller, netdev, Linux API

[CC += linux-api@vger.kernel.org]

Hannes,

Since this is a kernel-user-space API change, please CC linux-api@
(and on future iterations of the series). The kernel source file
Documentation/SubmitChecklist notes that all Linux kernel patches that
change userspace interfaces should be CCed to
linux-api@vger.kernel.org, so that the various parties who are
interested in API changes are informed. For further information, see
https://www.kernel.org/doc/man-pages/linux-api-ml.html

Thanks,

Michael


-- 
Michael Kerrisk Linux man-pages maintainer;
http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface", http://blog.man7.org/

* Re: [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level
       [not found]         ` <CAHO5Pa1s949dohzEEE68Ux=mXA7N7sR-U98Jwjvx1a_A5AhFEw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-03-13 22:06           ` Eric W. Biederman
  2017-03-14 10:18             ` Hannes Frederic Sowa
  0 siblings, 1 reply; 34+ messages in thread
From: Eric W. Biederman @ 2017-03-13 22:06 UTC
  To: Michael Kerrisk; +Cc: Hannes Frederic Sowa, David Miller, netdev, Linux API

Michael Kerrisk <mtk.manpages@gmail.com> writes:

> On Mon, Mar 13, 2017 at 12:44 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
>> [...]

If the goal is one IP address per network namespace with a network
device and MAC address on the network, I have something that I was
working on that I believe is, in the end, a much simpler solution.

Add routes in the routing table between network namespaces.

That is: in the initial network namespace, the one holding the network
device, have an input route that points not at the local loopback
device but at the target network namespace's loopback device.

Before other issues took precedence I got halfway through implementing
that. The IP input path won't get confused if the destination network
device is not in the same network namespace as the ingress device. Last
I looked, the IP output path still had a few places where confusion was
possible between the network socket and the output device.

As long as installing such routes is conditional upon having
CAP_NET_ADMIN in both network namespaces, you should be fine, and things
should be very simple and very fast, because this doesn't require a
special case in the network stack.

Given that performance is your primary motive, I suspect this will yield
the fastest possible path through the network stack as no extra steps
need to be taken, and can benefit from any routing improvements to the
ordinary network stack.
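
A toy sketch of the proposed routing, under the assumption that a
route's target can be either an ordinary output device or another
network namespace's loopback device. struct toy_route, TOY_NETNS_LO and
toy_lookup() are hypothetical names; the real FIB is not a linear scan,
and this only models the longest-prefix-match decision.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: a route either leaves through a device or is handed to
 * the loopback device of another network namespace. */
enum toy_target_kind { TOY_DEV, TOY_NETNS_LO };

struct toy_route {
    uint32_t prefix;            /* host byte order */
    int plen;                   /* prefix length, 0..32 */
    enum toy_target_kind kind;
    int target;                 /* ifindex or netns id, depending on kind */
};

static uint32_t toy_mask(int plen)
{
    assert(plen >= 0 && plen <= 32);
    /* Shifting a 32-bit value by 32 is undefined, so special-case /0. */
    return plen == 0 ? 0 : 0xFFFFFFFFu << (32 - plen);
}

/* Linear longest-prefix match; returns NULL if no route matches. */
static const struct toy_route *toy_lookup(const struct toy_route *rt,
                                          size_t n, uint32_t daddr)
{
    const struct toy_route *best = NULL;

    for (size_t i = 0; i < n; i++) {
        uint32_t m = toy_mask(rt[i].plen);

        if ((daddr & m) == (rt[i].prefix & m) &&
            (best == NULL || rt[i].plen > best->plen))
            best = &rt[i];
    }
    return best;
}
```

A /32 route for a container's address pointing at that namespace's
loopback simply wins over the on-link /24, so no special case is needed
in the lookup itself.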

Eric

* Re: [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level
  2017-03-13 22:06           ` Eric W. Biederman
@ 2017-03-14 10:18             ` Hannes Frederic Sowa
       [not found]               ` <cc9229f8-a389-87cc-2512-ee00e200a7c3-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: Hannes Frederic Sowa @ 2017-03-14 10:18 UTC
  To: Eric W. Biederman, Michael Kerrisk; +Cc: David Miller, netdev, Linux API

On 13.03.2017 23:06, Eric W. Biederman wrote:
> If the goal is one ip address per network namespace with a network
> device and mac address on the network I have something that I was
> working on that I believe is in the end is a much simpler solution.

Actually, it should be possible to use more than one IP address per
namespace: proper source address selection should handle that and
correctly pick the higher-scoring ones, based on the output device and
the distance to the remote IP address.
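
A toy sketch of that selection, assuming candidates are first filtered
by afnetns and then ranked by two rules loosely modelled on RFC 6724
(rule 5, prefer the outgoing interface; rule 8, use the longest matching
prefix). RFC 6724 actually governs IPv6 and the kernel applies more
rules than this; struct toy_cand and the helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_cand {
    uint32_t addr;   /* IPv4 address, host byte order */
    int ifindex;     /* device the address is configured on */
    int afnetns;     /* hypothetical afnetns id */
};

/* Number of leading bits a and b have in common (0..32). */
static int toy_common_prefix(uint32_t a, uint32_t b)
{
    uint32_t x = a ^ b;
    int n = 0;

    while (n < 32 && !(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Pick a source address for daddr leaving via out_ifindex, considering
 * only addresses in afnetns ns. Returns an index into c, or -1. */
static int toy_select_saddr(const struct toy_cand *c, size_t n, int ns,
                            int out_ifindex, uint32_t daddr)
{
    int best = -1, best_dev = -1, best_plen = -1;

    assert(n == 0 || c != NULL);
    for (size_t i = 0; i < n; i++) {
        if (c[i].afnetns != ns)
            continue;  /* not visible inside this afnetns */
        int dev = c[i].ifindex == out_ifindex;            /* rule 5 */
        int plen = toy_common_prefix(c[i].addr, daddr);   /* rule 8 */
        if (dev > best_dev || (dev == best_dev && plen > best_plen)) {
            best = (int)i;
            best_dev = dev;
            best_plen = plen;
        }
    }
    return best;
}
```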

> Add routes in the routing table between network namespaces.
> 
> AKA in the initial network namespace with the network device have
> an input route not towards the local loopback device but towards
> the network namespaces loopback device.
> 
> Before other issues took precedence I made it half way to implementing
> that.   The ip input path won't get confused if the destination network
> device is not in the same network namespace as the device.  Last I
> looked the ip output path still had a few places where confusion was
> possible between the network socket and the output device.

The afnetns IP input path is also of no concern to me and works
quite easily. Right now, the differing semantics and rules for selecting
a source address are the more problematic part. I think that in the
case of routing directly from one ns into another, this will be the same
and the most complex case to deal with?

> As long as installing such routes is conditional upon having
> CAP_NET_ADMIN in both network namespaces you should be fine and things
> should be very simple and very fast.  Because that won't take a special
> case through the network stack.
> 
> Given that performance is your primary motive I suspect this will yield
> the fastest possible path through the network stack as no extra steps
> need to be taken, and can benefit from any routing improvements to the
> ordinary network stack.

The major performance improvements come from socket early demuxing,
which requires the sockets of the other namespace to be visible in the
initial netns's established-socket tables. We need the same for the
representations of IP addresses to have ARP/NDISC work correctly. As
soon as you try to move even one data structure from one netns into
another, it gets really difficult to keep track of all the
dependencies. It felt far more complex than this approach.
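
A toy model of what early demux does, to show why the sockets have to
be in the initial namespace's tables: an established socket is looked
up by its 4-tuple before the routing decision, so its cached input
route can be reused. This is a simplified sketch, not the kernel's
established-hash implementation; the hash function and all names are
made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_tuple {
    uint32_t saddr, daddr;   /* remote, local (host byte order) */
    uint16_t sport, dport;
};

struct toy_esock {
    struct toy_tuple t;
    int id;                  /* stands in for a struct sock pointer */
    struct toy_esock *next;  /* hash chain */
};

#define TOY_BUCKETS 64

struct toy_ehash {
    struct toy_esock *bucket[TOY_BUCKETS];
};

static unsigned toy_hash(const struct toy_tuple *t)
{
    uint32_t h = t->saddr * 2654435761u;

    h ^= t->daddr + 0x9e3779b9u + (h << 6) + (h >> 2);
    h ^= ((uint32_t)t->sport << 16) | t->dport;
    return h % TOY_BUCKETS;
}

static void toy_insert(struct toy_ehash *ht, struct toy_esock *sk)
{
    unsigned b = toy_hash(&sk->t);

    sk->next = ht->bucket[b];
    ht->bucket[b] = sk;
}

/* Returns the socket id, or -1 so the caller falls back to a full
 * route lookup. With afnetns all sockets share one table. */
static int toy_early_demux(const struct toy_ehash *ht,
                           const struct toy_tuple *t)
{
    assert(ht != NULL && t != NULL);
    for (const struct toy_esock *sk = ht->bucket[toy_hash(t)]; sk;
         sk = sk->next)
        if (sk->t.saddr == t->saddr && sk->t.daddr == t->daddr &&
            sk->t.sport == t->sport && sk->t.dport == t->dport)
            return sk->id;
    return -1;
}
```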

Thanks for your comments!

Bye,
Hannes

* Re: [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level
       [not found]               ` <cc9229f8-a389-87cc-2512-ee00e200a7c3-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org>
@ 2017-03-14 17:46                 ` Eric W. Biederman
  0 siblings, 0 replies; 34+ messages in thread
From: Eric W. Biederman @ 2017-03-14 17:46 UTC
  To: Hannes Frederic Sowa; +Cc: Michael Kerrisk, David Miller, netdev, Linux API

Hannes Frederic Sowa <hannes@stressinduktion.org> writes:

> On 13.03.2017 23:06, Eric W. Biederman wrote:
>> If the goal is one ip address per network namespace with a network
>> device and mac address on the network I have something that I was
>> working on that I believe is in the end is a much simpler solution.
>
> Actually, it should be possible to use more than one IP address per
> namespace, proper source address selection should deal with that and
> also correctly select the higher scored ones, based on output device and
> distance to the remote ip address.

Definitely. I should have said "at least one". Some people want address
sharing, which precludes several kinds of optimizations.

>> Add routes in the routing table between network namespaces.
>> 
>> AKA in the initial network namespace with the network device have
>> an input route not towards the local loopback device but towards
>> the network namespaces loopback device.
>> 
>> Before other issues took precedence I made it half way to implementing
>> that.   The ip input path won't get confused if the destination network
>> device is not in the same network namespace as the device.  Last I
>> looked the ip output path still had a few places where confusion was
>> possible between the network socket and the output device.
>
> The ip afnetns input path is also of no concern to me and will work
> quite easily. Right now, the different semantics and rules for selecting
> a source address are the more problematic ones. I think, that in the
> case of directly routing from one ns into another this will be the same
> and the most complex case to deal with?

With what I am proposing, that case should be drop-dead simple. The
extra routes look like ordinary routes for forwarding packets, not like
local addresses, so they should cause no confusion, and source address
selection should work perfectly as is.

>> As long as installing such routes is conditional upon having
>> CAP_NET_ADMIN in both network namespaces you should be fine and things
>> should be very simple and very fast.  Because that won't take a special
>> case through the network stack.
>> 
>> Given that performance is your primary motive I suspect this will yield
>> the fastest possible path through the network stack as no extra steps
>> need to be taken, and can benefit from any routing improvements to the
>> ordinary network stack.
>
> The major performance improvements come from socket early demuxing,
> which actually requires the remote netns socket being visible in the
> initial netns esock tables. We need the same for the representations for
> IP addresses to have ARP/NDISC work correctly. As soon as you try to
> just cross one data structure from one netns to another one, it gets
> really difficult to keep track of all the dependencies. It felt way more
> complex than this approach.

So I will grant that I don't see how to perform early demuxing into the
namespaces. Fundamentally that is hard because the general case allows
network addresses to be repeated in different namespaces.
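
A small illustration of that ambiguity, assuming two namespaces that
both use 10.1.0.2 locally and each hold a connection from the same
remote endpoint. struct toy_nsock and toy_count_matches() are
hypothetical; the point is only that the 4-tuple carried by the packet
matches more than one socket, so a lookup done before routing cannot
tell which namespace the packet belongs to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_nsock {
    int netns;               /* owning network namespace */
    uint32_t saddr, daddr;   /* remote, local (host byte order) */
    uint16_t sport, dport;
};

/* Count sockets, across all namespaces, matching a packet's 4-tuple.
 * More than one hit means the tuple alone cannot pick the socket. */
static size_t toy_count_matches(const struct toy_nsock *s, size_t n,
                                uint32_t saddr, uint32_t daddr,
                                uint16_t sport, uint16_t dport)
{
    size_t hits = 0;

    assert(n == 0 || s != NULL);
    for (size_t i = 0; i < n; i++)
        if (s[i].saddr == saddr && s[i].daddr == daddr &&
            s[i].sport == sport && s[i].dport == dport)
            hits++;
    return hits;
}
```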

However, there should be a very nice performance gain, as a second
trip through the network stack is avoided and the code to perform the
input or output work is fundamentally simple.

As for ARP/NDISC: to get the ARP/NDISC replies working you will need
to enable proxy ARP/NDISC, which is what you usually have to do with
that kind of routing; nothing special there. On the output path the
ARP/NDISC tables of the outgoing device will be used, so nothing
special needs to happen there either. That just falls out of how the
code is designed.
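
A simplified sketch of the proxy-ARP decision, assuming the proposed
cross-namespace route shows up as a route whose output device differs
from the device the request arrived on. It loosely follows the Linux
proxy_arp behaviour (net.ipv4.conf.<dev>.proxy_arp) but ignores other
checks the kernel also makes, such as whether forwarding is enabled;
struct toy_route_res and toy_should_proxy_arp() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_route_res {
    bool found;       /* did the route lookup for the target succeed? */
    int out_ifindex;  /* device (or netns loopback) the route points to */
};

/* Should we answer an ARP request for a target address that arrived
 * on in_ifindex? */
static bool toy_should_proxy_arp(bool proxy_arp_enabled, int in_ifindex,
                                 bool target_is_local,
                                 struct toy_route_res rt)
{
    assert(in_ifindex > 0);
    if (target_is_local)
        return true;   /* ordinary ARP reply, no proxying involved */
    if (!proxy_arp_enabled || !rt.found)
        return false;
    /* Only proxy if the target lives "behind" some other device. */
    return rt.out_ifindex != in_ifindex;
}
```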

Similarly, we will need proxying for IGMP and MLD to enable subscribing
to multicast groups. But all of that is ordinary routing.

So I believe it will take a little bit of care to get things going, but
fundamentally it really looks to me like the only new case that needs
to be supported by the network stack is adding a route to an existing
routing table that spans network namespaces. That includes using the
ARP/neighbour table of that network device, which is very well defined
and trivial to maintain.

The only downside I see is the loss of early_demux but that is
fundamental as the network addresses may potentially overlap.

So for the best performance to containers, disabling early_demux looks
like the way to go. But I would be really surprised if the route table
lookup turned out to be expensive, unless there is a huge number of
containers or routes in the system, especially as that code uses an
efficient data structure and was seriously optimized about two years
ago.
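
For intuition about why that lookup stays cheap, here is a plain
unibit binary trie doing IPv4 longest-prefix match in at most 32 steps,
independent of the number of routes. It is a deliberately simplified
stand-in, not the level-compressed trie (fib_trie) the kernel uses; the
names and next-hop ids are hypothetical, and nodes are never freed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct toy_trie_node {
    struct toy_trie_node *child[2];
    int has_route;
    int nexthop;  /* hypothetical next-hop id */
};

static struct toy_trie_node *toy_node_new(void)
{
    struct toy_trie_node *n = calloc(1, sizeof(*n));

    assert(n != NULL);
    return n;
}

/* Insert prefix/plen, walking one address bit per trie level. */
static void toy_trie_insert(struct toy_trie_node *root, uint32_t prefix,
                            int plen, int nexthop)
{
    struct toy_trie_node *n = root;

    assert(plen >= 0 && plen <= 32);
    for (int i = 0; i < plen; i++) {
        int bit = (prefix >> (31 - i)) & 1;

        if (!n->child[bit])
            n->child[bit] = toy_node_new();
        n = n->child[bit];
    }
    n->has_route = 1;
    n->nexthop = nexthop;
}

/* Follow the address bits, remembering the deepest node that carries a
 * route. Returns -1 if nothing matches. */
static int toy_trie_lookup(const struct toy_trie_node *root, uint32_t addr)
{
    const struct toy_trie_node *n = root;
    int best = root->has_route ? root->nexthop : -1;

    for (int i = 0; i < 32 && n; i++) {
        n = n->child[(addr >> (31 - i)) & 1];
        if (n && n->has_route)
            best = n->nexthop;
    }
    return best;
}
```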

Eric
