linux-kernel.vger.kernel.org archive mirror
From: Daniel Lezcano <daniel.lezcano@free.fr>
To: Pavel Emelyanov <xemul@parallels.com>
Cc: hadi@cyberus.ca, "Eric W. Biederman" <ebiederm@xmission.com>,
	linux-kernel@vger.kernel.org,
	Linux Containers <containers@lists.osdl.org>,
	netdev@vger.kernel.org, netfilter-devel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Ulrich Drepper <drepper@gmail.com>,
	Al Viro <viro@ZenIV.linux.org.uk>,
	David Miller <davem@davemloft.net>,
	"Serge E. Hallyn" <serge@hallyn.com>,
	Pavel Emelyanov <xemul@openvz.org>,
	Ben Greear <greearb@candelatech.com>,
	Matt Helsley <matthltc@us.ibm.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>,
	Jan Engelhardt <jengelh@medozas.de>,
	Patrick McHardy <kaber@trash.net>
Subject: Re: [PATCH 8/8] net: Implement socketat.
Date: Sat, 02 Oct 2010 23:13:32 +0200
Message-ID: <4CA7A07C.5030504@free.fr>
In-Reply-To: <4C9B3F9C.8080506@parallels.com>

On 09/23/2010 01:53 PM, Pavel Emelyanov wrote:
> On 09/23/2010 03:40 PM, jamal wrote:
>    
>> On Thu, 2010-09-23 at 15:33 +0400, Pavel Emelyanov wrote:
>>
>>      
>>> This particular usecase is unneeded once you have the "enter" ability.
>>>        
>> Is that cheaper from a syscall count/cost?
>>      
> Why does it matter? You told, that the usage scenario was to
> add routes to container. If I do 2 syscalls instead of 1, is
> it THAT worse?
>
>    
>> i.e do I have to enter every time i want to write/read this fd?
>>      
> No - you enter once, create a socket and do whatever you need
> within the entered namespace.
>    

Just to clarify this point: you enter the namespace, create the socket,
and go back to the initial namespace (or create a new one). Further
operations can be made on this fd because the network namespace that is
used is the one stored in the sock struct, not the current process's
network namespace, which is consulted at socket creation only.
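
A minimal sketch of that enter / create / go back sequence, assuming the
setns(2) wrapper as it was later merged (Linux 3.0, glibc 2.14) and a caller
with CAP_SYS_ADMIN. The helper name socket_in_netns is hypothetical and not
part of the patch set; nsfd is assumed to be an open /proc/<pid>/ns/net
descriptor.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a socket inside the network namespace that
 * nsfd refers to, then return to the namespace we started in.
 * Returns the socket fd, or -1 with errno set.
 */
static int socket_in_netns(int nsfd, int domain, int type, int protocol)
{
    int sock, saved;
    /* Handle on the namespace we are in now, so we can come back. */
    int home = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);

    if (home < 0)
        return -1;

    /* Enter the target namespace. */
    if (setns(nsfd, CLONE_NEWNET) != 0) {
        saved = errno;
        close(home);
        errno = saved;
        return -1;
    }

    /* The socket records the namespace that is current right now. */
    sock = socket(domain, type, protocol);
    saved = errno;

    /* Go back to the initial namespace. */
    if (setns(home, CLONE_NEWNET) != 0) {
        /* Stuck in the target namespace: report that as a failure. */
        saved = errno;
        if (sock >= 0)
            close(sock);
        sock = -1;
    }

    close(home);
    errno = saved;
    return sock;
}
```

After the helper returns, the caller is back in its initial namespace, but
operations on the returned fd still act on the namespace recorded at creation
time.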

We can actually already do that by unsharing and then creating a socket.
This socket will pin the namespace and can be used as a control socket
for the namespace (assuming the socket domain is suitable for all the
operations).

Jamal, I don't know what kind of application you have in mind, but if I
assume you want a process controlling 1024 netns, let's try to
identify what happens with setns and with socketat:

With setns:

     * open /proc/self/ns/net (1)
     * unshare the netns
     * open /proc/self/ns/net (2)
     * setns (1)
     * create a virtual network device
     * move the virtual device to (2) (using the set netns by fd)
     * unshare the netns
     ...
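
The namespace handling in the setns steps above can be sketched as follows.
This is only an illustration under assumptions: the helper name create_netns
is hypothetical, the caller needs CAP_SYS_ADMIN, and the creation and move of
the virtual device (done over netlink, using the set netns by fd) is left
out.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

/*
 * Hypothetical helper: create one new network namespace and come back.
 * On success returns 0, with *orig_fd referring to the initial netns (1)
 * and *new_fd to the new netns (2); the caller is back in (1).
 * On failure returns -1 with errno set and both fds set to -1.
 */
static int create_netns(int *orig_fd, int *new_fd)
{
    int saved;

    *orig_fd = *new_fd = -1;

    /* open /proc/self/ns/net (1) */
    *orig_fd = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);
    if (*orig_fd < 0)
        return -1;

    /* unshare the netns */
    if (unshare(CLONE_NEWNET) != 0)
        goto fail;

    /* open /proc/self/ns/net (2) */
    *new_fd = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);

    /* setns (1): always try to go back, even if the open above failed */
    if (setns(*orig_fd, CLONE_NEWNET) != 0 || *new_fd < 0)
        goto fail;

    return 0;

fail:
    saved = errno;
    if (*new_fd >= 0)
        close(*new_fd);
    close(*orig_fd);
    *orig_fd = *new_fd = -1;
    errno = saved;
    return -1;
}
```

A controller for many namespaces would call this once per namespace and keep
each new_fd, which is where the count of open file descriptors comes from.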

With socketat:

     * open a socket (1)
     * unshare the netns
     * open a netlink with socketat(1) => (2)
     * create a virtual device using (2) (at this point it is init_net_ns)
     * move the virtual device to the current netns (using the set netns
       by pid)
     * open a socket (3)
     * unshare the netns
     ...

We have the same number of file descriptors kept open. However, with
setns we can bind mount the directory somewhere, which pins the
namespace; we can then close the /proc/self/ns/net file descriptors
and reopen them later.

If your application has to do a lot of specific network processing
in different namespaces during its life cycle, the socketat syscall
will be better because it reduces the number of syscalls, but at the
cost of keeping the file descriptors open (potentially a large number).
Otherwise, setns should fit your needs.



>> How does poll/select work in that enter scenario?
>>      
> Just like it used to before the enter.
>
>    
>> cheers,
>> jamal
>>
>>
>>      