* 2.6.35: unshare(NEWNS) does not work inside a container anymore?
@ 2010-08-31 11:02 Michael Tokarev
From: Michael Tokarev @ 2010-08-31 11:02 UTC
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

I just noticed a regression - immediately after updating
kernel from 2.6.32 to 2.6.35 (I skipped .33 and .34).
Namely, unshare(CLONE_NEWNS) stopped working from within
a container, like this:

unshare(CLONE_NEWNS)              = -1 EINVAL (Invalid argument)

There's no other fancy stuff going on around, just plain
unshare and exec a new shell.

What's wrong with 2.6.35 in this context?

Thanks.

/mjt
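
For readers who want to reproduce the failing call without a separate
helper binary, here is a minimal sketch. It assumes a Linux system whose
libc exports unshare(); the function name try_unshare is made up for this
illustration, and the CLONE_NEWNS value is the one from <linux/sched.h>.
The call normally needs CAP_SYS_ADMIN, so an EPERM result only means the
test was run without enough privilege.

```python
import ctypes
import errno

CLONE_NEWNS = 0x00020000  # new mount namespace, value from <linux/sched.h>


def try_unshare(flags=CLONE_NEWNS):
    """Call unshare(2) via libc; return (ok, errno_name_or_None)."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.unshare.argtypes = [ctypes.c_int]
    if libc.unshare(flags) == 0:
        return True, None
    err = ctypes.get_errno()
    # Map the numeric errno to its symbolic name, e.g. EINVAL or EPERM.
    return False, errno.errorcode.get(err, str(err))


if __name__ == "__main__":
    ok, name = try_unshare()
    print("unshare(CLONE_NEWNS):", "ok" if ok else "failed with " + name)
```

Running it once on the host and once inside the container should show
whether the EINVAL is specific to the container.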


* Re: 2.6.35: unshare(NEWNS) does not work inside a container anymore?
@ 2010-09-01 16:28 Serge E. Hallyn
From: Serge E. Hallyn @ 2010-09-01 16:28 UTC
  To: Michael Tokarev; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Quoting Michael Tokarev (mjt-XAri/EZa3C4vJsYlp49lxw@public.gmane.org):
> I just noticed a regression - immediately after updating
> kernel from 2.6.32 to 2.6.35 (I skipped .33 and .34).
> Namely, unshare(CLONE_NEWNS) stopped working from within
> a container, like this:
> 
> unshare(CLONE_NEWNS)              = -1 EINVAL (Invalid argument)
> 
> There's no other fancy stuff going on around, just plain
> unshare and exec a new shell.

I'm not seeing this behavior.  I'm on 2.6.35-19-generic (ubuntu
maverick), created a lucid container with the standard template,
and tested with ns_exec
	(git clone git://git.sr71.net/~hallyn/cr_tests.git;
	 git checkout ns_exec; make ns_exec;
	 ns_exec -m /bin/bash;  play with mounts; exit)

Can you give us /proc/self/status and capsh --print output
from inside the container before you try to unshare, and
maybe strace output from the program you were using?

> What's wrong with 2.6.35 in this context?
> 
> Thanks.
> 
> /mjt


* Re: 2.6.35: unshare(NEWNS) does not work inside a container anymore?
@ 2010-09-01 17:27 Michael Tokarev
From: Michael Tokarev @ 2010-09-01 17:27 UTC
  To: Serge E. Hallyn; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

01.09.2010 20:28, Serge E. Hallyn wrote:
> Quoting Michael Tokarev (mjt-XAri/EZa3C4vJsYlp49lxw@public.gmane.org):
>> I just noticed a regression - immediately after updating
>> kernel from 2.6.32 to 2.6.35 (I skipped .33 and .34).
>> Namely, unshare(CLONE_NEWNS) stopped working from within
>> a container, like this:
>>
>> unshare(CLONE_NEWNS)              = -1 EINVAL (Invalid argument)
>>
>> There's no other fancy stuff going on around, just plain
>> unshare and exec a new shell.
> 
> I'm not seeing this behavior.  I'm on 2.6.35-19-generic (ubuntu
> maverick), created a lucid container with the standard template,
> and tested with ns_exec
> 	(git clone git://git.sr71.net/~hallyn/cr_tests.git;
> 	 git checkout ns_exec; make ns_exec;
> 	 ns_exec -m /bin/bash;  play with mounts; exit)

This one is not using unshare(2), it is using clone(2) syscall.

I asked about unshare.  In particular, lxc-unshare fails within
the container the same way too -- it too uses unshare().

> Can you give us /proc/self/status and capsh --print output
> from inside the container before you try to unshare, and
> maybe strace output from the program you were using?

Sure.

# cat /proc/self/status
Name:	cat
State:	R (running)
Tgid:	2663
Pid:	2663
PPid:	2660
TracerPid:	0
Uid:	0	0	0	0
Gid:	0	0	0	0
FDSize:	256
Groups:	0
VmPeak:	    4944 kB
VmSize:	    4944 kB
VmLck:	       0 kB
VmHWM:	     232 kB
VmRSS:	     232 kB
VmData:	     160 kB
VmStk:	     136 kB
VmExe:	      40 kB
VmLib:	    1388 kB
VmPTE:	      24 kB
VmSwap:	       0 kB
Threads:	1
SigQ:	4/63178
SigPnd:	0000000000000000
ShdPnd:	0000000000000000
SigBlk:	0000000000000000
SigIgn:	0000000000000000
SigCgt:	0000000000000000
CapInh:	0000000000000000
CapPrm:	ffffffffffbfffff
CapEff:	ffffffffffbfffff
CapBnd:	ffffffffffbfffff
Cpus_allowed:	f
Cpus_allowed_list:	0-3
Mems_allowed:	1
Mems_allowed_list:	0
voluntary_ctxt_switches:	3
nonvoluntary_ctxt_switches:	2

# capsh --print
Current: =ep cap_sys_boot-ep
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin
Securebits: 00/0x0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
uid=0

# strace clone --fs bash
execve("/usr/sbin/clone", ["clone", "--fs", "bash"], [/* 15 vars */]) = 0
brk(0)                                  = 0x834c000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf76f1000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=18528, ...}) = 0
mmap2(NULL, 18528, PROT_READ, MAP_PRIVATE, 3, 0) = 0xf76ec000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/i686/cmov/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\320m\1\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1327556, ...}) = 0
mmap2(NULL, 1337704, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xf75a5000
mprotect(0xf76e5000, 4096, PROT_NONE)   = 0
mmap2(0xf76e6000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x140) = 0xf76e6000
mmap2(0xf76e9000, 10600, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xf76e9000
close(3)                                = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf75a4000
set_thread_area({entry_number:-1 -> 12, base_addr:0xf75a46c0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xf76e6000, 8192, PROT_READ)   = 0
mprotect(0xf770f000, 4096, PROT_READ)   = 0
munmap(0xf76ec000, 18528)               = 0
unshare(CLONE_NEWNS)                    = -1 EINVAL (Invalid argument)
write(2, "clone: unshare: Invalid argument"..., 33clone: unshare: Invalid argument
) = 33
exit_group(1)                           = ?

The source of this clone program is available at
http://www.corpit.ru/mjt/clone.c - I have used it for
a long time, it works on this same machine
outside of containers, and it worked in 2.6.32.

Thanks!

/mjt
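
The capability masks in the /proc/self/status output above can be checked
by hand. The sketch below decodes the CapEff value; the helper names
has_cap and missing_caps are invented for this illustration, the capability
numbers are the ones from <linux/capability.h>, and the upper bound of 33
(cap_mac_admin) is assumed from the last entry in the capsh bounding set
shown above.

```python
# Capability numbers from <linux/capability.h>; only the two needed here.
CAP_SYS_ADMIN = 21
CAP_SYS_BOOT = 22


def has_cap(mask_hex, cap):
    """True if bit `cap` is set in a hex mask taken from /proc/<pid>/status."""
    return bool((int(mask_hex, 16) >> cap) & 1)


def missing_caps(mask_hex, last_cap=33):
    """Capability numbers in 0..last_cap whose bit is cleared in the mask."""
    mask = int(mask_hex, 16)
    return [n for n in range(last_cap + 1) if not (mask >> n) & 1]


cap_eff = "ffffffffffbfffff"  # CapEff line from the status output
print("CAP_SYS_ADMIN present:", has_cap(cap_eff, CAP_SYS_ADMIN))
print("cleared capability numbers:", missing_caps(cap_eff))
```

The only cleared bit is 22 (cap_sys_boot), which agrees with the
"cap_sys_boot-ep" in the capsh line, and CAP_SYS_ADMIN is present, so a
missing capability does not look like the reason for the EINVAL.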


* Re: 2.6.35: unshare(NEWNS) does not work inside a container anymore?
@ 2010-09-01 19:41 Serge E. Hallyn
From: Serge E. Hallyn @ 2010-09-01 19:41 UTC
  To: Michael Tokarev; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Quoting Michael Tokarev (mjt-XAri/EZa3C4vJsYlp49lxw@public.gmane.org):
> 01.09.2010 20:28, Serge E. Hallyn wrote:
> > Quoting Michael Tokarev (mjt-XAri/EZa3C4vJsYlp49lxw@public.gmane.org):
> >> I just noticed a regression - immediately after updating
> >> kernel from 2.6.32 to 2.6.35 (I skipped .33 and .34).
> >> Namely, unshare(CLONE_NEWNS) stopped working from within
> >> a container, like this:
> >>
> >> unshare(CLONE_NEWNS)              = -1 EINVAL (Invalid argument)
> >>
> >> There's no other fancy stuff going on around, just plain
> >> unshare and exec a new shell.
> > 
> > I'm not seeing this behavior.  I'm on 2.6.35-19-generic (ubuntu
> > maverick), created a lucid container with the standard template,
> > and tested with ns_exec
> > 	(git clone git://git.sr71.net/~hallyn/cr_tests.git;
> > 	 git checkout ns_exec; make ns_exec;
> > 	 ns_exec -m /bin/bash;  play with mounts; exit)
> 
> This one is not using unshare(2), it is using clone(2) syscall.

That's only the case if you do 'ns_exec -cm'.

> I asked about unshare.  In particular, lxc-unshare fails within
> the container the same way too -- it too uses unshare().

lxc-unshare -s MOUNT /bin/bash passes here too.

> > Can you give us /proc/self/status and capsh --print output
> > from inside the container before you try to unshare, and
> > maybe strace output from the program you were using?
> 
> Sure.
> 
> [...]
> 
> The source of this clone program is available at
> http://www.corpit.ru/mjt/clone.c - I use it for
> a long time, it works on this same machine
> outside of containers, and it worked in 2.6.32.

Hm, it is working for me.  You're on a plain upstream 2.6.35, as in commit ID
9fe6206f400646a2322096b56c59891d530e8d51?

I see nothing obvious in your output, unfortunately.

-serge


* Re: 2.6.35: unshare(NEWNS) does not work inside a container anymore?
@ 2010-09-01 19:53 Michael Tokarev
From: Michael Tokarev @ 2010-09-01 19:53 UTC
  To: Serge E. Hallyn; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

01.09.2010 23:41, Serge E. Hallyn wrote:
[]
>>>> unshare(CLONE_NEWNS)              = -1 EINVAL (Invalid argument)
[]
>>> 	 ns_exec -m /bin/bash;  play with mounts; exit)
>> This one is not using unshare(2), it is using clone(2) syscall.
> 
> That's only the case if you do 'ns_exec -cm'.

Oh.  I missed that.
[]
>> The source of this clone program is available at
>> http://www.corpit.ru/mjt/clone.c - I use it for
>> a long time, it works on this same machine
>> outside of containers, and it worked in 2.6.32.
> 
> Hm, is working for me.  You're on a plain upstream 2.6.35, as in commitid
> 9fe6206f400646a2322096b56c59891d530e8d51 ?

No, it's 2.6.35.4 - the latest stable.  Plain 2.6.35 behaves (fails)
the same for me as 2.6.35.4 - this one:
http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.35.tar.bz2

But I see at least one possible difference: I run 64bit kernel
and a 32bit userspace, including lxc tools and unshare code.
Lemme check with 64bit (native) userspace....

/mjt
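
A quick way to confirm a mixed 64-bit kernel / 32-bit userspace setup is
sketched below. The strace output earlier in the thread already hints at
it (mmap2, fstat64, /lib/i686/cmov/libc.so.6). The function names are
invented for this illustration, and the result only describes the
interpreter that runs the script, so it has to be run with the same
userspace as the failing tool. Note that uname(2) may report a 32-bit
machine name if the process runs under a 32-bit personality.

```python
import os
import struct


def userspace_bits():
    """Pointer width, in bits, of the interpreter running this script."""
    return struct.calcsize("P") * 8


def kernel_machine():
    """Machine name as reported by uname(2), e.g. x86_64 or i686."""
    return os.uname().machine


if __name__ == "__main__":
    print("userspace:", userspace_bits(), "bit")
    print("uname machine:", kernel_machine())
```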


* Re: 2.6.35: unshare(NEWNS) does not work inside a container anymore?
@ 2010-09-02  9:20 Michael Tokarev
From: Michael Tokarev @ 2010-09-02  9:20 UTC
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

31.08.2010 15:02, Michael Tokarev wrote:
> I just noticed a regression - immediately after updating
> kernel from 2.6.32 to 2.6.35 (I skipped .33 and .34).
> Namely, unshare(CLONE_NEWNS) stopped working from within
> a container, like this:
> 
> unshare(CLONE_NEWNS)              = -1 EINVAL (Invalid argument)
> 
> There's no other fancy stuff going on around, just plain
> unshare and exec a new shell.
> 
> What's wrong with 2.6.35 in this context?

So, after discussing this on IRC and doing some digging,
the cause turned out to be a cgroup subsystem that is new
in 2.6.35: the block I/O controller (CONFIG_BLK_CGROUP).
It does not allow more than one level of nesting, so, for
example, it is impossible to create a subdirectory inside
another cgroup directory in cgroupfs:

 mkdir /dev/cgroup/foo      -- this one succeeds, but
 mkdir /dev/cgroup/foo/bar  -- this one fails as long as
the blkio mount option is enabled.  Once it is disabled,
the nested mkdir works again.
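
The two mkdir steps above can be scripted so that the errno of the failing
step is visible. This is only a sketch: probe_nesting is a name made up
here, /dev/cgroup is the mount point used in this message and may differ
on other systems, and the test directories are removed again afterwards.

```python
import errno
import os


def probe_nesting(root, names=("foo", "bar")):
    """mkdir root/foo, then root/foo/bar; return [(path, "ok" or errno name)]."""
    results, created, path = [], [], root
    for name in names:
        path = os.path.join(path, name)
        try:
            os.mkdir(path)
        except OSError as exc:
            # Record the symbolic errno and stop at the first failure.
            results.append((path, errno.errorcode.get(exc.errno, str(exc.errno))))
            break
        created.append(path)
        results.append((path, "ok"))
    for made in reversed(created):  # clean up, deepest directory first
        os.rmdir(made)
    return results


if __name__ == "__main__":
    for path, status in probe_nesting("/dev/cgroup"):
        print(path, status)
```

On an ordinary directory both steps report "ok"; on a cgroupfs mounted
with blkio the second step would be expected to report the error instead.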

In 2.6.35 block/blk-cgroup.c, blkiocg_create() there's the
following code:

  /* Currently we do not support hierarchy deeper than two level (0,1) */
  if (parent != cgroup->top_cgroup)
          return ERR_PTR(-EINVAL);

In the upcoming 2.6.36 it was changed to

          return ERR_PTR(-EPERM);

but the issue remains anyway.  What is problematic here
is that blkio differs from all other cgroup subsystems in
this very respect (not allowing nesting), but there's
no indication of this fact anywhere.  At the very least,
the place quoted above warrants a WARN() or WARN_ONCE()
to tell the user what's going on - otherwise it's very
difficult to debug.
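
To make the effect of the quoted check concrete, here is a toy userspace
model of it. This is not kernel code: the Cgroup class and blkio_create
function are invented for this illustration and only mimic the
"parent must be the top cgroup" test, with the errno passed in so that
both the 2.6.35 (EINVAL) and the later (EPERM) behaviour can be shown.

```python
import errno


class Cgroup:
    """Toy stand-in for a cgroup directory; it only records its parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def blkio_create(parent, top, err=errno.EINVAL):
    """Model of the check: 0 if the new group's parent is the top cgroup,
    otherwise a negative errno (EINVAL in 2.6.35, EPERM after the change)."""
    if parent is not top:
        return -err
    return 0


top = Cgroup("/")
foo = Cgroup("foo", parent=top)               # mkdir /dev/cgroup/foo
print("create foo:", blkio_create(top, top))      # parent is the top cgroup
print("create foo/bar:", blkio_create(foo, top))  # parent is foo, refused
```

A process that already lives in a first-level cgroup, as a container does,
hits the refused case as soon as anything tries to create a child group on
its behalf, which would explain why the error surfaces from unshare().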

As for a real solution, it looks like disallowing
nesting should be done in a different way.  Maybe
allow creation of a subcontainer but reset the limits
in there and catch attempts to set them - I dunno.
Or don't clone the whole cgroup hierarchy when only
CLONE_NEWNS is requested.

The current situation is too restrictive IMHO - the blkio
controller is useful for a container system like LXC, but
currently it implies that one can't even create a
new filesystem namespace within a container.

Thanks.

/mjt
