From: jamal <hadi@cyberus.ca>
To: Daniel Lezcano <daniel.lezcano@free.fr>
Cc: Pavel Emelyanov <xemul@parallels.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	linux-kernel@vger.kernel.org,
	Linux Containers <containers@lists.osdl.org>,
	netdev@vger.kernel.org, netfilter-devel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Ulrich Drepper <drepper@gmail.com>,
	Al Viro <viro@ZenIV.linux.org.uk>,
	David Miller <davem@davemloft.net>,
	"Serge E. Hallyn" <serge@hallyn.com>,
	Pavel Emelyanov <xemul@openvz.org>,
	Ben Greear <greearb@candelatech.com>,
	Matt Helsley <matthltc@us.ibm.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>,
	Jan Engelhardt <jengelh@medozas.de>,
	Patrick McHardy <kaber@trash.net>
Subject: Re: [PATCH 8/8] net: Implement socketat.
Date: Sun, 03 Oct 2010 09:44:01 -0400
Message-ID: <1286113441.3812.229.camel@bigi>
In-Reply-To: <4CA7A07C.5030504@free.fr>

Hi Daniel,

Thanks for clarifying this.

On Sat, 2010-10-02 at 23:13 +0200, Daniel Lezcano wrote:
> Just to clarify this point. You enter the namespace, create the socket
> and go back to the initial namespace (or create a new one). Further 
> operations can be made against this fd because it is the network 
> namespace stored in the sock struct which is used, not the current 
> process network namespace which is used at the socket creation only.
> 
> We can actually already do that by unsharing and then creating a socket.
> This socket will pin the namespace and can be used as a control socket
> for the namespace (assuming the socket domain will be ok for all the 
> operations).
>
> Jamal, I don't know what kind of application you want to use but if I 
> assume you want to create a process controlling 1024 netns, 

At the moment I am looking at 8K netns on a Nehalem with lots of RAM. They
will mostly be created at startup, but some could be created afterwards.
Each will have its own netdevs etc., also created at startup (plus some
other config that may happen later).
Because startup time may accumulate, it is clearly important to me
to pick whichever scheme reduces the number of calls.

> let's try to identify what happens with setns and with socketat:
> 
> With setns:
> 
>      * open /proc/self/ns/net (1)
>      * unshare the netns
>      * open /proc/self/ns/net (2)
>      * setns (1)
>      * create a virtual network device
>      * move the virtual device to (2) (using the set netns by fd)
>      * unshare the netns
>      ...
> 
> With socketat:
> 
>      * open a socket (1)
>      * unshare the netns
>      * open a netlink with socketat(1) => (2)
>      * create a virtual device using (2) (at this point it is init_net_ns)
>      * move the virtual device to the current netns (using the set netns by pid)
>      * open a socket (3)
>      * unshare the netns
>      ...
> 
> We have the same number of file descriptors kept open. Except, with
> setns we can bind mount the directory somewhere, that will pin the 
> namespace and then we can close the /proc/self/ns/net file descriptors
> and reopen them later.
> 

Ok, so a wrapper such as create_socket_on(namespaceid)
will generally need fewer system calls with socketat().
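
For comparison, a rough sketch of what such a wrapper might look like when
only setns() is available. The name create_socket_on() comes from the line
above, but its signature is an assumption, as is passing the namespace as an
fd opened on a /proc/<pid>/ns/net file (or a bind mount of one). It also
assumes a single-threaded caller with CAP_SYS_ADMIN. This is not code from
the patch set.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: open a socket inside the network namespace
 * referred to by ns_fd, then return the caller to the namespace it
 * started in. Returns the socket fd, or -1 with errno set.
 */
int create_socket_on(int ns_fd, int domain, int type, int protocol)
{
    int sock = -1, saved_errno;
    /* Remember where we came from so we can switch back. */
    int home_fd = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);

    if (home_fd < 0)
        return -1;

    if (setns(ns_fd, CLONE_NEWNET) == 0) {
        /* The socket records the namespace it was created in. */
        sock = socket(domain, type, protocol);
        saved_errno = errno;
        if (setns(home_fd, CLONE_NEWNET) < 0) {
            /* Could not get back: treat the whole call as failed. */
            saved_errno = errno;
            if (sock >= 0) {
                close(sock);
                sock = -1;
            }
        }
        errno = saved_errno;
    }

    saved_errno = errno;
    close(home_fd);
    errno = saved_errno;
    return sock;
}
```

That is open, setns, socket, setns, close for every socket, where a
socketat() call would be a single system call given an fd for the target
namespace.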

> If your application has to do a lot of specific network processing, 
> during its life cycle, in different namespaces, the socketat syscall 
> will be better because it will reduce the number of syscalls but at
> the cost of keeping the file descriptors open (potentially a big
> number). Otherwise, setns should fit your needs.

Makes sense. 

One thing still confuses me...
The app control point is in namespace0. I still want to be able to
"boot" namespaces first and maybe a few seconds later do a socketat()...
and create devices, tcp sockets etc. I suspect create_ns(namespace-name)
would involve:
     * open /proc/self/ns/net (namespace-name)
     * unshare the netns
Is this correct?
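
One possible reading of those steps, as a sketch only. It takes the fd on
the new namespace after the unshare, which is an assumption about the
ordering, and it returns that fd instead of taking a name. The helper
create_ns() and its signature are hypothetical. It assumes a single-threaded
caller with CAP_SYS_ADMIN and uses only open(), unshare() and setns().

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a new network namespace and return an fd
 * that refers to it (and keeps it alive), leaving the caller in its
 * original namespace. Returns -1 with errno set on failure.
 */
int create_ns(void)
{
    int new_fd = -1, saved_errno;
    /* fd on the namespace we start in, i.e. the control point. */
    int old_fd = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);

    if (old_fd < 0)
        return -1;

    if (unshare(CLONE_NEWNET) == 0) {
        /* We are now inside the new namespace; grab an fd on it. */
        new_fd = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);
        saved_errno = errno;
        /* Go back to the control namespace. */
        if (setns(old_fd, CLONE_NEWNET) < 0) {
            saved_errno = errno;
            if (new_fd >= 0) {
                close(new_fd);
                new_fd = -1;
            }
        }
        errno = saved_errno;
    }

    saved_errno = errno;
    close(old_fd);
    errno = saved_errno;
    return new_fd;
}
```

The returned fd could later be handed to setns() or to a socketat() style
call to create devices and sockets in that namespace.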

cheers,
jamal

