* [RFC PATCH net-next 00/11] netns: don't switch namespace while creating kernel sockets
@ 2015-05-07  8:52 Ying Xue
From: Ying Xue @ 2015-05-07  8:52 UTC (permalink / raw)
  To: netdev
  Cc: cwang, herbert, xemul, davem, eric.dumazet, ebiederm, maxk,
	stephen, tgraf, nicolas.dichtel, tom, jchapman, erik.hugne,
	jon.maloy, horms

When commit 23fe18669e7f ("[NETNS]: Fix race between put_net() and
netlink_kernel_create().") addressed the race between put_net() and
kernel socket creation, it adopted a complex solution: the netlink
socket is created inside the init_net namespace and re-attached to the
desired namespace right after it is created. Similarly, when the
socket is closed, it is moved back to init_net so that it is destroyed
in the same namespace in which it was created.
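
As a rough illustration only, the old create/switch/release sequence
can be modeled in userspace. Everything here is hypothetical and
simplified (struct net_model, struct sock_model, init_net_model and the
old_kernel_sock_* helpers are not kernel APIs); the point is the
reference accounting: the socket is allocated against init_net, then
re-attached to the target namespace without pinning it, and pointed
back at init_net before it is destroyed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model, not kernel code. */
struct net_model {
	int count;              /* stands in for the namespace refcount */
};

struct sock_model {
	struct net_model *net;  /* namespace the socket currently belongs to */
};

static struct net_model init_net_model = { 1 };

static void model_get_net(struct net_model *net) { net->count++; }
static void model_put_net(struct net_model *net) { net->count--; }

/* Old scheme: allocate in init_net (pinning it), then switch to the
 * target namespace without holding a reference on it, so the kernel
 * socket does not keep the target namespace alive. */
static void old_kernel_sock_create(struct sock_model *sk,
				   struct net_model *target)
{
	sk->net = &init_net_model;
	model_get_net(sk->net);           /* allocation pins init_net */
	if (target != &init_net_model) {
		model_put_net(sk->net);   /* drop the init_net reference */
		sk->net = target;         /* switch without pinning target */
	}
}

/* Old release: point the socket back at init_net and re-take a
 * reference on it, then drop that reference as part of destruction. */
static void old_kernel_sock_release(struct sock_model *sk)
{
	if (sk->net != &init_net_model) {
		model_get_net(&init_net_model);
		sk->net = &init_net_model;
	}
	model_put_net(sk->net);           /* destruction drops the reference */
	sk->net = NULL;
}
```

In this model the target namespace's count never changes, which is
what lets the namespace be torn down while its kernel socket exists;
the cost is that creation and destruction see init_net rather than the
namespace the socket actually serves.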

But this solution makes the whole thing artificially complex. Its
design is odd, and it forces every kernel module that creates kernel
sockets to follow the same namespace-switch model. More importantly,
kernel sockets are created in init_net but released in another
namespace, and this inconsistency causes problems for some modules.
For example, a TIPC socket is inserted into a rhashtable when it is
created, and each namespace has its own rhashtable for TIPC sockets.
With the current approach, a TIPC kernel socket is inserted into the
rhashtable of init_net, but because it is released in another
namespace, it cannot be found in that namespace's rhashtable.

Therefore, we propose a simpler way to avoid the race: if __put_net()
finds that a cleanup work is still pending, it does not queue a new
cleanup work, thereby stopping the cleanup process. This proposal not
only solves the race but also lets us avoid unnecessary namespace
switches when creating kernel sockets. Moreover, it guarantees that
kernel sockets are always created and released in the same namespace.
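
A minimal userspace sketch of the guard as described above, assuming
a simple flag records that a cleanup work is pending. All names
(struct netns_model, model_put_net() and so on) are hypothetical; this
is not the code of patch #1 and it ignores locking and the workqueue
itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model, not kernel code. */
struct netns_model {
	int count;             /* namespace reference count */
	bool cleanup_pending;  /* a cleanup work is queued but has not run */
	int cleanups_queued;   /* number of cleanup works queued so far */
};

static void model_queue_cleanup(struct netns_model *net)
{
	net->cleanup_pending = true;
	net->cleanups_queued++;
}

static void model_get_net(struct netns_model *net)
{
	net->count++;
}

/* Analogue of put_net()/__put_net(): when the last reference is
 * dropped, queue a cleanup only if none is pending already. */
static void model_put_net(struct netns_model *net)
{
	if (--net->count > 0)
		return;
	if (net->cleanup_pending)
		return;  /* cleanup already pending: do not queue another */
	model_queue_cleanup(net);
}
```

In the model, if the count drops to zero, is briefly raised again (for
example by a kernel socket taking and releasing a reference) and then
drops to zero a second time, only one cleanup is ever queued.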

In this series, patch #1 first resolves the race, and patches #2 to #9
then remove the namespace switches from all relevant kernel modules one
by one. Once all dependencies on sk_change_net() are gone, patch #10
deletes the interface completely. Finally, we simplify the creation of
kernel sockets by changing the behaviour of sock_create_kern() and
sk_release_kernel(). If a kernel socket is created in a namespace other
than init_net, the namespace's reference counter must be dropped once
the socket has been successfully allocated in sk_alloc(); otherwise the
namespace may never be shut down. Therefore, sock_create_kern()
decreases the namespace's reference counter once it has successfully
created a kernel socket in a namespace other than init_net.
Correspondingly, sk_release_kernel() increases the reference counter
again before the socket is destroyed.
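
The proposed reference accounting can be sketched with a small
userspace model. The names below (struct ns_model, init_ns,
model_sk_alloc() and so on) are hypothetical stand-ins, not the real
sk_alloc(), sock_create_kern() or sk_release_kernel(); only the
get/put pairing described above is modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; names are illustrative, not kernel APIs. */
struct ns_model { int count; };
struct ksock_model { struct ns_model *net; };

static struct ns_model init_ns = { 1 };

/* sk_alloc() analogue: every new socket takes a reference on its
 * namespace. */
static void model_sk_alloc(struct ksock_model *sk, struct ns_model *net)
{
	net->count++;
	sk->net = net;
}

/* Proposed sock_create_kern() analogue: after a successful allocation,
 * drop the reference again unless the namespace is init_net, so the
 * kernel socket does not prevent the namespace from being shut down. */
static void model_sock_create_kern(struct ksock_model *sk,
				   struct ns_model *net)
{
	model_sk_alloc(sk, net);
	if (net != &init_ns)
		net->count--;
}

/* Proposed sk_release_kernel() analogue: re-take the reference before
 * destruction, because destroying the socket drops one reference. */
static void model_sk_release_kernel(struct ksock_model *sk)
{
	if (sk->net != &init_ns)
		sk->net->count++;
	sk->net->count--;  /* sk_free() analogue drops the reference */
	sk->net = NULL;
}
```

The extra increment in the release path exists only to balance the
decrement performed on destruction, so a non-init_net namespace ends
with the same count it started with, and the socket stays in its own
namespace for its whole lifetime.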

Any comments are welcome.

Ying Xue (11):
  netns: Fix race between put_net() and netlink_kernel_create()
  netlink: avoid unnecessary namespace switch when create netlink
    kernel sockets
  tun: avoid unnecessary namespace switch during kernel socket creation
  inet: avoid unnecessary namespace switch during kernel socket
    creation
  udp_tunnel: avoid to switch namespace for tunnel socket
  ip6_udp_tunnel: avoid to switch namespace for tunnel socket
  l2tp: avoid to switch namespace for l2tp tunnel socket
  ipvs: avoid to switch namespace for ipvs kernel socket
  tipc: fix net leak issue
  tipc: remove sk_change_net interface
  net: change behaviours of functions of creating and releasing kernel
    sockets

 drivers/block/drbd/drbd_receiver.c |    6 ++++--
 drivers/net/tun.c                  |   14 +++++++++-----
 fs/afs/rxrpc.c                     |    3 ++-
 fs/dlm/lowcomms.c                  |   16 ++++++++--------
 include/linux/net.h                |    3 ++-
 include/net/sock.h                 |   16 ----------------
 net/bluetooth/rfcomm/core.c        |    3 ++-
 net/ceph/messenger.c               |    4 ++--
 net/core/net_namespace.c           |   10 ++++++++--
 net/core/sock.c                    |    5 ++---
 net/ipv4/af_inet.c                 |    4 +---
 net/ipv4/udp_tunnel.c              |    4 +---
 net/ipv6/ip6_udp_tunnel.c          |    4 +---
 net/l2tp/l2tp_core.c               |   12 ++++--------
 net/netfilter/ipvs/ip_vs_sync.c    |   16 ++--------------
 net/netlink/af_netlink.c           |    7 ++++---
 net/rxrpc/ar-local.c               |    4 ++--
 net/socket.c                       |    9 +++++++--
 net/tipc/server.c                  |    5 +++++
 19 files changed, 66 insertions(+), 79 deletions(-)

-- 
1.7.9.5

Thread overview: 56+ messages
2015-05-07  8:52 [RFC PATCH net-next 00/11] netns: don't switch namespace while creating kernel sockets Ying Xue
2015-05-07  8:52 ` [RFC PATCH net-next 01/11] netns: Fix race between put_net() and netlink_kernel_create() Ying Xue
2015-05-07  9:04   ` Herbert Xu
2015-05-07 17:19     ` Cong Wang
2015-05-07 17:28       ` Eric W. Biederman
2015-05-08 11:20       ` Eric W. Biederman
2015-05-08 11:20       ` Ying Xue
2015-05-07  8:52 ` [RFC PATCH net-next 02/11] netlink: avoid unnecessary namespace switch when create netlink kernel sockets Ying Xue
2015-05-07  8:52 ` [RFC PATCH net-next 03/11] tun: avoid unnecessary namespace switch during kernel socket creation Ying Xue
2015-05-07  8:52 ` [RFC PATCH net-next 04/11] inet: " Ying Xue
2015-05-07  8:52 ` [RFC PATCH net-next 05/11] udp_tunnel: avoid to switch namespace for tunnel socket Ying Xue
2015-05-07  8:52 ` [RFC PATCH net-next 06/11] ip6_udp_tunnel: " Ying Xue
2015-05-07  8:52 ` [RFC PATCH net-next 07/11] l2tp: avoid to switch namespace for l2tp " Ying Xue
2015-05-07  8:52 ` [RFC PATCH net-next 08/11] ipvs: avoid to switch namespace for ipvs kernel socket Ying Xue
2015-05-07  8:52 ` [RFC PATCH net-next 09/11] tipc: fix net leak issue Ying Xue
2015-05-07  8:52 ` [RFC PATCH net-next 10/11] tipc: remove sk_change_net interface Ying Xue
2015-05-07  8:52 ` [RFC PATCH net-next 11/11] net: change behaviours of functions of creating and releasing kernel sockets Ying Xue
2015-05-07 16:14 ` [RFC PATCH net-next 00/11] netns: don't switch namespace while creating " Eric W. Biederman
2015-05-07 18:19   ` Cong Wang
2015-05-07 18:26     ` Eric W. Biederman
2015-05-07 18:53       ` Cong Wang
2015-05-07 18:58         ` Eric W. Biederman
2015-05-07 19:29           ` Cong Wang
2015-05-07 20:01             ` Eric W. Biederman
2015-05-08  9:10               ` Ying Xue
2015-05-08 11:15                 ` Eric W. Biederman
2015-05-08  8:50   ` Ying Xue
2015-05-08  9:25     ` Ying Xue
2015-05-08 11:07     ` Eric W. Biederman
2015-05-08 16:33       ` Cong Wang
2015-05-08 14:07   ` Herbert Xu
2015-05-08 17:36     ` Eric W. Biederman
2015-05-08 20:27       ` Cong Wang
2015-05-08 21:13         ` Cong Wang
2015-05-08 22:08           ` Eric W. Biederman
2015-05-09  1:13       ` Herbert Xu
2015-05-09  1:53         ` Eric W. Biederman
2015-05-09  2:05         ` [PATCH 0/6] Cleanup the " Eric W. Biederman
2015-05-09  2:07           ` [PATCH 1/6] tun: Utilize the normal socket network namespace refcounting Eric W. Biederman
2015-05-09  2:08           ` [PATCH 2/6] net: Add a struct net parameter to sock_create_kern Eric W. Biederman
2015-05-12  8:24             ` David Laight
2015-05-12  8:55               ` Eric W. Biederman
2015-05-12 11:48                 ` David Laight
2015-05-12 12:28                   ` Nicolas Dichtel
2015-05-12 13:16                     ` David Laight
2015-05-12 14:15                       ` Nicolas Dichtel
2015-05-12 15:58                       ` Eric W. Biederman
2015-05-12 14:45               ` David Miller
2015-05-09  2:09           ` [PATCH 3/6] net: Pass kern from net_proto_family.create to sk_alloc Eric W. Biederman
2015-05-09 16:51             ` Eric Dumazet
2015-05-09 17:31               ` Eric W. Biederman
2015-05-09  2:10           ` [PATCH 4/6] net: Modify sk_alloc to not reference count the netns of kernel sockets Eric W. Biederman
2015-05-09  2:11           ` [PATCH 5/6] netlink: Create kernel netlink sockets in the proper network namespace Eric W. Biederman
2015-05-09  2:12           ` [PATCH 6/6] net: kill sk_change_net and sk_release_kernel Eric W. Biederman
2015-05-09  2:38           ` [PATCH 0/6] Cleanup the kernel sockets Herbert Xu
2015-05-11 14:53           ` David Miller