From: Daniel Lezcano <daniel.lezcano@free.fr>
To: hadi@cyberus.ca
Cc: Pavel Emelyanov <xemul@parallels.com>,
"Eric W. Biederman" <ebiederm@xmission.com>,
linux-kernel@vger.kernel.org,
Linux Containers <containers@lists.osdl.org>,
netdev@vger.kernel.org, netfilter-devel@vger.kernel.org,
linux-fsdevel@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>,
Michael Kerrisk <mtk.manpages@gmail.com>,
Ulrich Drepper <drepper@gmail.com>,
Al Viro <viro@ZenIV.linux.org.uk>,
David Miller <davem@davemloft.net>,
"Serge E. Hallyn" <serge@hallyn.com>,
Pavel Emelyanov <xemul@openvz.org>,
Ben Greear <greearb@candelatech.com>,
Matt Helsley <matthltc@us.ibm.com>,
Jonathan Corbet <corbet@lwn.net>,
Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>,
Jan Engelhardt <jengelh@medozas.de>,
Patrick McHardy <kaber@trash.net>
Subject: Re: [PATCH 8/8] net: Implement socketat.
Date: Mon, 04 Oct 2010 12:13:58 +0200
Message-ID: <4CA9A8E6.8070600@free.fr>
In-Reply-To: <1286113441.3812.229.camel@bigi>
On 10/03/2010 03:44 PM, jamal wrote:
> Hi Daniel,
>
> Thanks for clarifying this ..
>
> On Sat, 2010-10-02 at 23:13 +0200, Daniel Lezcano wrote:
>
>> Just to clarify this point. You enter the namespace, create the socket
>> and go back to the initial namespace (or create a new one). Further
>> operations can be made against this fd because it is the network
>> namespace stored in the sock struct which is used, not the current
>> process network namespace which is used at the socket creation only.
>>
>> We can actually already do that by unsharing and then creating a socket.
>> This socket will pin the namespace and can be used as a control socket
>> for the namespace (assuming the socket domain will be ok for all the
>> operations).
>>
>> Jamal, I don't know what kind of application you want to use but if I
>> assume you want to create a process controlling 1024 netns,
>>
> At the moment i am looking at 8K on a Nehalem with lots of RAM. They
> will mostly be created at startup but some could be created afterwards.
> Each will have its own netdevs etc. also created at startup (and some
> other config that may happen later).
> Because startup time may accumulate, it is clearly important to me
> to pick whatever scheme that reduces the number of calls...
>
8K! Wow! :)
>> let's try to identify what happens with setns and with socketat:
>>
>> With setns:
>>
>> * open /proc/self/ns/net (1)
>> * unshare the netns
>> * open /proc/self/ns/net (2)
>> * setns (1)
>> * create a virtual network device
>> * move the virtual device to (2) (using the set netns by fd)
>> * unshare the netns
>> ...
>>
>> With socketat:
>>
>> * open a socket (1)
>> * unshare the netns
>> * open a netlink with socketat(1) => (2)
>> * create a virtual device using (2) (at this point it is init_net_ns)
>> * move the virtual device to the current netns (using the set netns
>> by pid)
>> * open a socket (3)
>> * unshare the netns
>> ...
>>
>> We have the same number of file descriptors kept opened. Except, with
>> setns we can bind mount the directory somewhere, that will pin the
>> namespace and then we can close the /proc/self/ns/net file descriptors
>> and reopen them later.
>>
>>
> Ok, so a wrapper such as: create_socket_on(namespaceid)
> will have generally less system calls with socketat()
>
Yes, I think so.
>> If your application has to do a lot of specific network processing,
>> during its life cycle, in different namespaces, the socketat syscall
>> will be better because it will reduce the number of syscalls but at
>> the cost of keeping the file descriptors opened (potentially a big
>> number). Otherwise, setns should fit your needs.
>>
> Makes sense.
>
> One thing still confuses me...
> The app control point is in namespace0. I still want to be able to
> "boot" namespaces first and maybe a few seconds later do a socketat()...
> and create devices, tcp sockets etc. I suspect create_ns(namespace-name)
> would involve:
> * open /proc/self/ns/net (namespace-name)
> * unshare the netns
> Is this correct?
>
Maybe I am misunderstanding, but if you are trying to save some syscalls,
you should use socketat only and keep a socket from the app control
namespace0 for it. The process will be in the last netns you unshared
(maybe you can use one setns syscall here to return to namespace0).
(1) socketat:
    * pros: 1 syscall to create a socket
    * cons: a file descriptor per namespace; the namespace is only
      manageable via a socket
(2) setns:
    * pros: the namespace is fully manageable with generic code
    * cons: 2 syscalls to create a socket (setns + socket), or 3 if we
      want to return to the initial netns (setns + socket + setns);
      a file descriptor per namespace
(3) setns + bind mount:
    * pros: no file descriptor needs to be kept open
    * cons: longer startup (unshare + mount --bind); 4 syscalls to
      create a socket in the namespace (open, setns, socket, close),
      or 5 if we want to return to the initial netns
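As a rough sketch of the per-socket cost in scheme (2), assuming the setns(fd, nstype) interface found in mainline Linux, a helper could look like the one below. The names enter_netns and socket_in_ns are made up for illustration, and the raw syscall() is used only so the sketch does not depend on libc feature-test macros (glibc also exposes setns() under _GNU_SOURCE). Passing a negative homefd skips the optional third syscall.

```c
#include <errno.h>
#include <linux/sched.h>	/* CLONE_NEWNET */
#include <sys/socket.h>
#include <sys/syscall.h>	/* SYS_setns */
#include <unistd.h>

/* Hypothetical wrapper: switch the calling thread to the netns behind nsfd. */
static int enter_netns(int nsfd)
{
	return (int)syscall(SYS_setns, nsfd, CLONE_NEWNET);
}

/*
 * Hypothetical helper: create a socket inside the netns referred to by
 * nsfd, then optionally return to the netns referred to by homefd.
 * Both fds are assumed to come from opening /proc/self/ns/net.
 * Returns the socket fd, or -1 with errno set.
 */
int socket_in_ns(int nsfd, int homefd, int domain, int type, int protocol)
{
	int sock, saved_errno;

	if (enter_netns(nsfd) < 0)		/* syscall 1: enter the target netns */
		return -1;

	sock = socket(domain, type, protocol);	/* syscall 2: socket belongs to that netns */
	saved_errno = errno;

	/* optional syscall 3: go back to the initial netns */
	if (homefd >= 0 && enter_netns(homefd) < 0) {
		saved_errno = errno;
		if (sock >= 0)
			close(sock);
		sock = -1;
	}

	errno = saved_errno;
	return sock;
}
```

The returned socket stays bound to the target netns even after the caller has switched back, which is what makes it usable as a control socket.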
Depending on the scheme you choose, the startup will be:

(1) socketat:

    * open /proc/self/ns/net (one time, to 'save' and pin the
      initial netns)

and then

    /* needs _GNU_SOURCE, <sched.h>, <sys/socket.h> */
    int create_ns(void)
    {
            if (unshare(CLONE_NEWNET) < 0)
                    return -1;
            return socket(...);
    }

and,

    for (i = 0; i < 8192; i++)
            mynsfd[i] = create_ns();

(2) setns:

    * open /proc/self/ns/net (one time, to 'save' and pin the
      initial netns)

and then

    /* needs _GNU_SOURCE, <sched.h>, <fcntl.h> */
    int create_ns(void)
    {
            if (unshare(CLONE_NEWNET) < 0)
                    return -1;
            return open("/proc/self/ns/net", O_RDONLY);
    }

and,

    for (i = 0; i < 8192; i++)
            mynsfd[i] = create_ns();

(3) setns + mount:

    * open /proc/self/ns/net (one time, to 'save' and pin the
      initial netns)

and then

    /* needs _GNU_SOURCE, <sched.h>, <fcntl.h>, <unistd.h>, <sys/mount.h> */
    int create_ns(const char *nspath)
    {
            int fd;
            if (unshare(CLONE_NEWNET) < 0)
                    return -1;
            fd = creat(nspath, 0444);   /* empty file used as mount point */
            if (fd < 0)
                    return -1;
            close(fd);
            return mount("/proc/self/ns/net", nspath, NULL, MS_BIND, NULL);
    }

and,

    for (i = 0; i < 8192; i++)
            create_ns(mynspath[i]);

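For scheme (3), the four syscalls needed later to create a socket in a bind-mounted namespace (open, setns, socket, close) could be sketched as follows. This again assumes the setns(fd, nstype) interface found in mainline Linux; socket_in_named_ns is a hypothetical name, and nspath is assumed to be one of the bind-mount paths created at startup. The caller is left in the target netns, so returning to the initial one would be the fifth syscall.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/sched.h>	/* CLONE_NEWNET */
#include <sys/socket.h>
#include <sys/syscall.h>	/* SYS_setns */
#include <unistd.h>

/*
 * Hypothetical helper: create a socket in the netns pinned by the bind
 * mount at nspath. Returns the socket fd, or -1 with errno set.
 */
int socket_in_named_ns(const char *nspath, int domain, int type, int protocol)
{
	int nsfd, sock, saved_errno;

	nsfd = open(nspath, O_RDONLY);			/* 1: open the pinned netns */
	if (nsfd < 0)
		return -1;

	if (syscall(SYS_setns, nsfd, CLONE_NEWNET) < 0) {	/* 2: enter it */
		saved_errno = errno;
		close(nsfd);
		errno = saved_errno;
		return -1;
	}

	sock = socket(domain, type, protocol);		/* 3: create the socket */
	saved_errno = errno;

	close(nsfd);					/* 4: the mount keeps the netns alive */
	errno = saved_errno;
	return sock;
}
```

No namespace fd outlives the call here; only the returned socket and the bind mount keep a reference to the netns.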
Hope that helps.
-- Daniel