* pid namespace bug ?
       [not found]             ` <87ocgt12fb.fsf-/U8DR9OPLL8grVaPS+uXcA@public.gmane.org>
@ 2010-05-06 20:13               ` Daniel Lezcano
       [not found]                 ` <4BE322F1.5030500-GANU6spQydw@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel Lezcano @ 2010-05-06 20:13 UTC (permalink / raw)
  To: Ferenc Wagner; +Cc: Linux Containers, sukadev Bhattiprolu

Ferenc Wagner wrote:

> I noticed something strange:
>
> # lxc-start -n jail -s lxc.mount.entry="/ /tmp/jail none bind 0 0" -s lxc.rootfs=/tmp/jail -s lxc.pivotdir=/mnt /bin/sleep 1000
> (in another terminal)
> # lxc-ps --lxc
> CONTAINER    PID TTY          TIME CMD
> jail        4173 pts/1    00:00:00 sleep
> # kill 4173
> (this does not kill the sleep!)
> # strace -p 4173
> Process 4173 attached - interrupt to quit
> restart_syscall(<... resuming interrupted call ...> = ? ERESTART_RESTARTBLOCK (To be restarted)
> --- SIGTERM (Terminated) @ 0 (0) ---
> Process 4173 detached
> # lxc-ps --lxc
> CONTAINER    PID TTY          TIME CMD
> jail        4173 pts/1    00:00:00 sleep
> # fgrep -i sig /proc/4173/status 
> SigQ:	1/16382
> SigPnd:	0000000000000000
> SigBlk:	0000000000000000
> SigIgn:	0000000000000000
> SigCgt:	0000000000000000
> # kill -9 4173
>
> That is, the jailed sleep process could be killed by SIGKILL only, even
> though (according to strace) SIGTERM was delivered and it isn't handled
> specially.  Why does this happen?
>   

Wow, weird!

I tried with lxc-unshare -s PID sleep 3600, which does nothing more than
unshare a new pid namespace, and I noticed the same behavior.

I know that process 1 has special properties concerning signals: it is
immune to signals coming from inside the container. Maybe there is a
problem in this area of the kernel.

Suka, does this behavior sound familiar to you ?

Happens on 2.6.31-20-generic (ubuntu) and 2.6.33 vanilla kernel.

Thanks
  -- Daniel


* Re: pid namespace bug ?
       [not found]                 ` <4BE322F1.5030500-GANU6spQydw@public.gmane.org>
@ 2010-05-06 20:52                   ` Sukadev Bhattiprolu
       [not found]                     ` <20100506205233.GA23542-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Sukadev Bhattiprolu @ 2010-05-06 20:52 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: Linux Containers, Ferenc Wagner

Daniel Lezcano [daniel.lezcano-GANU6spQydw@public.gmane.org] wrote:
> Ferenc Wagner wrote:
>
>> I noticed something strange:
>>
>> # lxc-start -n jail -s lxc.mount.entry="/ /tmp/jail none bind 0 0" -s lxc.rootfs=/tmp/jail -s lxc.pivotdir=/mnt /bin/sleep 1000
>> (in another terminal)
>> # lxc-ps --lxc
>> CONTAINER    PID TTY          TIME CMD
>> jail        4173 pts/1    00:00:00 sleep
>> # kill 4173
>> (this does not kill the sleep!)
>> # strace -p 4173
>> Process 4173 attached - interrupt to quit
>> restart_syscall(<... resuming interrupted call ...> = ? ERESTART_RESTARTBLOCK (To be restarted)
>> --- SIGTERM (Terminated) @ 0 (0) ---
>> Process 4173 detached
>> # lxc-ps --lxc
>> CONTAINER    PID TTY          TIME CMD
>> jail        4173 pts/1    00:00:00 sleep
>> # fgrep -i sig /proc/4173/status
>> SigQ:	1/16382
>> SigPnd:	0000000000000000
>> SigBlk:	0000000000000000
>> SigIgn:	0000000000000000
>> SigCgt:	0000000000000000
>> # kill -9 4173
>>
>> That is, the jailed sleep process could be killed by SIGKILL only, even
>> though (according to strace) SIGTERM was delivered and it isn't handled
>> specially.  Why does this happen?

Yes, SIGKILL is the only reliable way to terminate a container-init.
container-init needs to be immune to signals from within the container
but be open to receiving signals from parent container.  These requirements
complicate the implementation of allowing SIGINT/SIGTERM etc to
container-init from parent container.

Besides a realistic container-init would block such signals, in which case
the complexity in the kernel could be viewed as unnecessary.

Hope that helps,

Sukadev


* Re: pid namespace bug ?
       [not found]                     ` <20100506205233.GA23542-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2010-05-07  8:51                       ` Daniel Lezcano
       [not found]                         ` <4BE3D4AD.1030705-GANU6spQydw@public.gmane.org>
  2010-05-07 14:10                       ` Ferenc Wagner
  1 sibling, 1 reply; 12+ messages in thread
From: Daniel Lezcano @ 2010-05-07  8:51 UTC (permalink / raw)
  To: Sukadev Bhattiprolu; +Cc: Linux Containers, Ferenc Wagner

Sukadev Bhattiprolu wrote:
> Daniel Lezcano [daniel.lezcano-GANU6spQydw@public.gmane.org] wrote:
>   
>> Ferenc Wagner wrote:
>>
>>     
>>> I noticed something strange:
>>>
>>> # lxc-start -n jail -s lxc.mount.entry="/ /tmp/jail none bind 0 0" -s lxc.rootfs=/tmp/jail -s lxc.pivotdir=/mnt /bin/sleep 1000
>>> (in another terminal)
>>> # lxc-ps --lxc
>>> CONTAINER    PID TTY          TIME CMD
>>> jail        4173 pts/1    00:00:00 sleep
>>> # kill 4173
>>> (this does not kill the sleep!)
>>> # strace -p 4173
>>> Process 4173 attached - interrupt to quit
>>> restart_syscall(<... resuming interrupted call ...> = ? ERESTART_RESTARTBLOCK (To be restarted)
>>> --- SIGTERM (Terminated) @ 0 (0) ---
>>> Process 4173 detached
>>> # lxc-ps --lxc
>>> CONTAINER    PID TTY          TIME CMD
>>> jail        4173 pts/1    00:00:00 sleep
>>> # fgrep -i sig /proc/4173/status
>>> SigQ:	1/16382
>>> SigPnd:	0000000000000000
>>> SigBlk:	0000000000000000
>>> SigIgn:	0000000000000000
>>> SigCgt:	0000000000000000
>>> # kill -9 4173
>>>
>>> That is, the jailed sleep process could be killed by SIGKILL only, even
>>> though (according to strace) SIGTERM was delivered and it isn't handled
>>> specially.  Why does this happen?
>>>       
>
> Yes, SIGKILL is the only reliable way to terminate a container-init.
> container-init needs to be immune to signals from within the container
> but be open to receiving signals from parent container.  These requirements
> complicate the implementation of allowing SIGINT/SIGTERM etc to
> container-init from parent container.
>
> Besides a realistic container-init would block such signals, in which case
> the complexity in the kernel could be viewed as unnecessary.
>   

I am not sure it is good to have pid 1 immune to signals sent from
outside the container.
From the POV of the parent process, the container init is like any
other process, and the parent may want to send it a signal (as a
notification, or to terminate it gracefully instead of killing it).

If the container init is a real init, these signals will be blocked,
but if we launch something different, e.g. a 'sleep', Ctrl+C won't work:
lxc-start -n foo sleep 3600 is not interruptible.

That's a bit annoying if we need to plug the container into batch
managers or use it for HPC jobs.


* Re: pid namespace bug ?
       [not found]                     ` <20100506205233.GA23542-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2010-05-07  8:51                       ` Daniel Lezcano
@ 2010-05-07 14:10                       ` Ferenc Wagner
       [not found]                         ` <87aasbsszn.fsf-/U8DR9OPLL8grVaPS+uXcA@public.gmane.org>
  1 sibling, 1 reply; 12+ messages in thread
From: Ferenc Wagner @ 2010-05-07 14:10 UTC (permalink / raw)
  To: Sukadev Bhattiprolu; +Cc: Linux Containers

Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> writes:

> Daniel Lezcano [daniel.lezcano-GANU6spQydw@public.gmane.org] wrote:
>
>> Ferenc Wagner wrote:
>>
>>> That is, the jailed sleep process could be killed by SIGKILL only, even
>>> though (according to strace) SIGTERM was delivered and it isn't handled
>>> specially.  Why does this happen?
>
> Yes, SIGKILL is the only reliable way to terminate a container-init.
> container-init needs to be immune to signals from within the container
> but be open to receiving signals from parent container.  These requirements
> complicate the implementation of allowing SIGINT/SIGTERM etc to
> container-init from parent container.
>
> Besides a realistic container-init would block such signals, in which case
> the complexity in the kernel could be viewed as unnecessary.

For full-system containers this is acceptable, but for running batch
jobs this may prove problematic.  Is this behaviour documented somewhere?
Is this specific to SIGINT/SIGTERM or are other signals affected as well?
They are used for communication (job control) with the container running
the job.  Such batch jobs are typically run under the supervision of
some kind of "shepherd" process, which acts as "init" for the job
environment; in my case it's the container-init.  It's the reaper of
possibly orphaned processes and at the same time it communicates with the
job scheduler (outside of the container) via signals.  So I'd consider
at least some kernel complexity necessary for Linux containers becoming
a viable tool for batch job segregation.
-- 
Thanks,
Feri.


* Re: pid namespace bug ?
       [not found]                         ` <87aasbsszn.fsf-/U8DR9OPLL8grVaPS+uXcA@public.gmane.org>
@ 2010-05-07 17:46                           ` Sukadev Bhattiprolu
       [not found]                             ` <20100507174646.GA3484-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Sukadev Bhattiprolu @ 2010-05-07 17:46 UTC (permalink / raw)
  To: Ferenc Wagner; +Cc: Linux Containers

Ferenc Wagner [wferi-eEbw3PyuezQ@public.gmane.org] wrote:
| Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> writes:
| 
| > Daniel Lezcano [daniel.lezcano-GANU6spQydw@public.gmane.org] wrote:
| >
| >> Ferenc Wagner wrote:
| >>
| >>> That is, the jailed sleep process could be killed by SIGKILL only, even
| >>> though (according to strace) SIGTERM was delivered and it isn't handled
| >>> specially.  Why does this happen?
| >
| > Yes, SIGKILL is the only reliable way to terminate a container-init.
| > container-init needs to be immune to signals from within the container
| > but be open to receiving signals from parent container.  These requirements
| > complicate the implementation of allowing SIGINT/SIGTERM etc to
| > container-init from parent container.
| >
| > Besides a realistic container-init would block such signals, in which case
| > the complexity in the kernel could be viewed as unnecessary.
| 
| For full-system containers this is acceptable, but for running batch
| jobs this may prove problematic.  Is this behaviour documented somewhere?
| Is this specific to SIGINT/SIGTERM or are other signals affected as well?

Let me clarify: if the container-init has a handler for the signal, the
signal will be delivered. _Unhandled_ signals whose default action is to
terminate/stop the process will be ignored by cinit, unless the signal is
SIGKILL/SIGSTOP and the sender is in the parent container.

So to terminate a cinit from the parent namespace you need SIGKILL. Other
signals will be delivered to cinit only if it has a handler.
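
The rule above can be written down as a small decision function. This is
only a toy model of the description in this thread, not kernel code: the
names sig_fate and cinit_signal_fate are made up for illustration, and the
model only covers unblocked signals whose default action is to terminate
or stop the process.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical names, for illustration only; this is not kernel code. */
enum sig_fate {
    SIG_FATE_HANDLER,   /* the handler installed by cinit runs */
    SIG_FATE_DEFAULT,   /* the default action is taken (terminate or stop) */
    SIG_FATE_IGNORED    /* the signal is dropped */
};

/*
 * What happens to an unblocked signal sent to a container-init,
 * following the description in this thread.
 */
enum sig_fate cinit_signal_fate(int sig, bool has_handler, bool from_parent_ns)
{
    /*
     * SIGKILL and SIGSTOP cannot be caught; they only take effect
     * when the sender is in the parent namespace.
     */
    if (sig == SIGKILL || sig == SIGSTOP)
        return from_parent_ns ? SIG_FATE_DEFAULT : SIG_FATE_IGNORED;

    /* A handled signal is delivered regardless of the sender. */
    if (has_handler)
        return SIG_FATE_HANDLER;

    /* Unhandled terminate/stop signals are ignored by cinit. */
    return SIG_FATE_IGNORED;
}
```

For the original report, cinit_signal_fate(SIGTERM, false, true) yields
SIG_FATE_IGNORED, which matches the sleep that survived a plain kill.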

| They are used for communication (job control) with the container running
| the job.  Such batch jobs are typically run under the supervision of
| some kind of "shepherd" process, which acts as "init" for the job
| environment; in my case it's the container-init.  It's the reaper of
| possibly orphaned processes and at the same time it communicates with the
| job scheduler (outside of the container) via signals.

So can this job scheduler install handlers for SIGINT/SIGTERM/SIGQUIT ?

| So I'd consider
| at least some kernel complexity necessary for Linux containers becoming
| a viable tool for batch job segregation.

Yes, it is annoying that we can't CTRL-C a cinit running /bin/sleep, but
this behavior should not be too limiting to a more functional cinit.

I had submitted a verbose man page patch for kill(2) to describe these
semantics, but the following paragraph in the notes section of kill(2)
does allude to this behavior:

       The only signals that can be sent to process ID 1, the init
       process, are those for which init has explicitly installed signal
       handlers.  This is done to assure the system is not brought down
       accidentally.

See: 
	http://www.kernel.org/doc/man-pages/online/pages/man2/kill.2.html


Thanks,

Sukadev


* Re: pid namespace bug ?
       [not found]                         ` <4BE3D4AD.1030705-GANU6spQydw@public.gmane.org>
@ 2010-05-07 19:44                           ` Sukadev Bhattiprolu
       [not found]                             ` <20100507194426.GB14799-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Sukadev Bhattiprolu @ 2010-05-07 19:44 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: Linux Containers, Ferenc Wagner

Daniel Lezcano [daniel.lezcano-GANU6spQydw@public.gmane.org] wrote:
>> Besides a realistic container-init would block such signals, in which case
>> the complexity in the kernel could be viewed as unnecessary.
>>   
>
> I am not sure it is good to have the pid 1 immune against signals sent  
> from outside of the container.

cinit is only immune to unhandled signals that terminate/stop the cinit.
If a handler is defined for SIGINT, a SIGINT from parent-ns will still be
delivered but a SIGINT from a descendant of cinit will be ignored.

> From the POV of the parent process, the container init is like any other 
> process and it may want to kill it with a signal (for notification or 
> just terminate instead of killing it).
>
> If the container init is a real init pid, these signals will be blocked  
> but if we launch something different, eg a 'sleep', Ctrl+C won't work.  
> eg: lxc-start -n foo sleep 3600 is not interruptible.

Yes, it is annoying, but a mysleep.c that defines a handler which exits
on SIGINT/SIGSEGV/SIGTERM/SIGQUIT etc. should still work as expected
(if not, it is a bug).
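
A minimal sketch of what such a mysleep.c could look like. No such file
was posted in this thread, so the function names (exit_on_signal,
install_exit_handlers, mysleep) and the choice of signals are assumptions
made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Handler: leave immediately. _exit() is async-signal-safe, exit() is not. */
void exit_on_signal(int sig)
{
    (void)sig;
    _exit(0);
}

/*
 * Install the handler for the signals Ctrl+C or a job scheduler would
 * typically send. Returns 0 on success, -1 on error.
 */
int install_exit_handlers(void)
{
    static const int sigs[] = { SIGINT, SIGTERM, SIGQUIT };
    struct sigaction sa;
    size_t i;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = exit_on_signal;
    sigemptyset(&sa.sa_mask);

    for (i = 0; i < sizeof(sigs) / sizeof(sigs[0]); i++)
        if (sigaction(sigs[i], &sa, NULL) != 0)
            return -1;
    return 0;
}

/* Sleep for the requested time; the handlers above end the process early. */
int mysleep(unsigned int seconds)
{
    if (install_exit_handlers() != 0)
        return -1;

    /* sleep() returns the number of seconds left if it was interrupted. */
    while (seconds > 0)
        seconds = sleep(seconds);
    return 0;
}
```

A main() that calls mysleep(3600) and is started as the container init
has explicit handlers for these signals, so per the explanation above a
SIGINT or SIGTERM should reach it and end it.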

>
> That's a bit annoying if we need to plug the container with batch  
> managers or use them with HPC jobs.


* Re: pid namespace bug ?
       [not found]                             ` <20100507174646.GA3484-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2010-05-07 20:54                               ` Ferenc Wagner
       [not found]                                 ` <87d3x7mnzz.fsf-/U8DR9OPLL8grVaPS+uXcA@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Ferenc Wagner @ 2010-05-07 20:54 UTC (permalink / raw)
  To: Sukadev Bhattiprolu; +Cc: Linux Containers

Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> writes:

> Ferenc Wagner [wferi-eEbw3PyuezQ@public.gmane.org] wrote:
>
>| Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> writes:
>| 
>|> Daniel Lezcano [daniel.lezcano-GANU6spQydw@public.gmane.org] wrote:
>|>
>|>> Ferenc Wagner wrote:
>|>>
>|>>> That is, the jailed sleep process could be killed by SIGKILL only, even
>|>>> though (according to strace) SIGTERM was delivered and it isn't handled
>|>>> specially.  Why does this happen?
>|>
>|> Yes, SIGKILL is the only reliable way to terminate a container-init.
>|> container-init needs to be immune to signals from within the container
>|> but be open to receiving signals from parent container.  These requirements
>|> complicate the implementation of allowing SIGINT/SIGTERM etc to
>|> container-init from parent container.
>|>
>|> Besides a realistic container-init would block such signals, in which case
>|> the complexity in the kernel could be viewed as unnecessary.
>| 
>| For full-system containers this is acceptable, but for running batch
>| jobs this may prove problematic.  Is this behaviour documented somewhere?
>| Is this specific to SIGINT/SIGTERM or are other signals affected as well?
>
> Let me clarify - if the container-init has a handler for the signal, the
> signal will be delivered. _Unhandled_ signals whose default is to terminate/
> stop the process will be ignored by cinit unless the signal is SIGKILL/SIGSTOP
> and sender is from parent container.
>
> So to terminate a cinit from parent namespace you need SIGKILL. But other
> signals will be delivered to cinit only if it has a handler.

Thanks for clarifying.  How does the above apply to signalfds?  Will
those deliver the signals which would otherwise have been ignored by
cinit, having no handler installed?

>| They are used for communication (job control) with the container running
>| the job.  Such batch jobs are typically run under the supervision of
>| some kind of "shepherd" process, which acts as "init" for the job
>| environment; in my case it's the container-init.  It's the reaper of
>| possibly orphaned processes and at the same time it communicates with the
>| job scheduler (outside of the container) via signals.
>
> So can this job scheduler install handlers for SIGINT/SIGTERM/SIGQUIT ?

The scheduler is outside of the container, so I suppose you mean the
shepherd process, which is the container init.  Yes, it already has
handlers for each signal it's interested in, so according to the above,
everything should work as expected (once we get the signals forwarded to
it).

>| So I'd consider at least some kernel complexity necessary for Linux
>| containers becoming a viable tool for batch job segregation.
>
> Yes, it is annoying that we can't CTRL-C a cinit running /bin/sleep, but
> this behavior should not be too limiting to a more functional cinit.

Indeed.  I misunderstood you on first read.

> I had submitted a verbose man page patch for kill(2) to describe these
> semantics, but the following paragraph in the notes section of kill(2)
> does allude to this behavior:
>
>        The only signals that can be sent to process ID 1, the init
>        process, are those for which init has explicitly installed signal
>        handlers.  This is done to assure the system is not brought down
>        accidentally.

I even read that paragraph recently.  I didn't think it would apply,
though, as I was trying to kill cinit in the outer namespace, where it
had a generic PID, not 1.  Your effort to expand the man page of kill(2)
is most appreciated, I hope it will land soon!
-- 
Thanks,
Feri.


* Re: pid namespace bug ?
       [not found]                             ` <20100507194426.GB14799-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2010-05-07 21:01                               ` Ferenc Wagner
       [not found]                                 ` <878w7vmnnn.fsf-/U8DR9OPLL8grVaPS+uXcA@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Ferenc Wagner @ 2010-05-07 21:01 UTC (permalink / raw)
  To: Sukadev Bhattiprolu; +Cc: Linux Containers

Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> writes:

> Daniel Lezcano [daniel.lezcano-GANU6spQydw@public.gmane.org] wrote:
>
>>> Besides a realistic container-init would block such signals, in which case
>>> the complexity in the kernel could be viewed as unnecessary.
>>
>> I am not sure it is good to have the pid 1 immune against signals sent  
>> from outside of the container.
>
> cinit is only immune to unhandled signals that terminate/stop the cinit.
> If a handler is defined for SIGINT, a SIGINT from parent-ns will still be
> delivered but a SIGINT from a descendant of cinit will be ignored.

So is it impossible to send SIGINT to cinit from the container, even if
it has a handler for that?  Your other reply (and the note in kill(2))
seems to say it's possible, so I'm confused again...
-- 
Thanks,
Feri.


* Re: pid namespace bug ?
       [not found]                                 ` <878w7vmnnn.fsf-/U8DR9OPLL8grVaPS+uXcA@public.gmane.org>
@ 2010-05-07 21:30                                   ` Sukadev Bhattiprolu
       [not found]                                     ` <20100507213037.GA3305-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Sukadev Bhattiprolu @ 2010-05-07 21:30 UTC (permalink / raw)
  To: Ferenc Wagner; +Cc: Linux Containers

Ferenc Wagner [wferi-eEbw3PyuezQ@public.gmane.org] wrote:
| Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> writes:
| 
| > Daniel Lezcano [daniel.lezcano-GANU6spQydw@public.gmane.org] wrote:
| >
| >>> Besides a realistic container-init would block such signals, in which case
| >>> the complexity in the kernel could be viewed as unnecessary.
| >>
| >> I am not sure it is good to have the pid 1 immune against signals sent  
| >> from outside of the container.
| >
| > cinit is only immune to unhandled signals that terminate/stop the cinit.
| > If a handler is defined for SIGINT, a SIGINT from parent-ns will still be
| > delivered but a SIGINT from a descendant of cinit will be ignored.

Sorry. Bad sentence.

Yes, if a handler is defined, the signal will be delivered regardless of
sender's namespace. 

Thanks,

Suka


* Re: pid namespace bug ?
       [not found]                                     ` <20100507213037.GA3305-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2010-05-07 21:43                                       ` Ferenc Wagner
  2010-05-08 12:52                                       ` Daniel Lezcano
  1 sibling, 0 replies; 12+ messages in thread
From: Ferenc Wagner @ 2010-05-07 21:43 UTC (permalink / raw)
  To: Sukadev Bhattiprolu; +Cc: Linux Containers

Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> writes:

>| Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> writes:
>| 
>|> Daniel Lezcano [daniel.lezcano-GANU6spQydw@public.gmane.org] wrote:
>|>
>|>>> Besides a realistic container-init would block such signals, in which case
>|>>> the complexity in the kernel could be viewed as unnecessary.
>|>>
>|>> I am not sure it is good to have the pid 1 immune against signals sent  
>|>> from outside of the container.
>|>
>|> cinit is only immune to unhandled signals that terminate/stop the cinit.
>|> If a handler is defined for SIGINT, a SIGINT from parent-ns will still be
>|> delivered but a SIGINT from a descendant of cinit will be ignored.
>
> Sorry. Bad sentence.
>
> Yes, if a handler is defined, the signal will be delivered regardless of
> sender's namespace. 

Great, thanks!
-- 
Feri.


* Re: pid namespace bug ?
       [not found]                                 ` <87d3x7mnzz.fsf-/U8DR9OPLL8grVaPS+uXcA@public.gmane.org>
@ 2010-05-08  2:11                                   ` Sukadev Bhattiprolu
  0 siblings, 0 replies; 12+ messages in thread
From: Sukadev Bhattiprolu @ 2010-05-08  2:11 UTC (permalink / raw)
  To: Ferenc Wagner; +Cc: Linux Containers

Ferenc Wagner [wferi-eEbw3PyuezQ@public.gmane.org] wrote:
| > So to terminate a cinit from parent namespace you need SIGKILL. But other
| > signals will be delivered to cinit only if it has a handler.
| 
| Thanks for clarifying.  How does the above apply to signalfds?  Will
| those deliver the signals which would otherwise have been ignored by
| cinit, having no handler installed?

Yes, if the signal is blocked, it will still be queued regardless of the
sender's namespace[1]. In this case the blocked, pending signal will be
available via signalfd() until the signal is unblocked.

If the signal is not blocked and its disposition is either SIG_DFL or
SIG_IGN, the signal is not queued and will not be available via signalfd().

[1] Blocked signals have some special cases even without signalfd(): if
the signal is queued and later unblocked while the disposition is
SIG_DFL/SIG_IGN, the signal will be silently discarded (regardless of the
sender's namespace).

If the user installs a handler before unblocking the signal, the signal
will be delivered (regardless of the sender's namespace).
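
A minimal sketch of the blocked-signal case described above, assuming a
single-threaded shepherd process. The helper names open_blocked_signalfd
and read_one_signal are made up for illustration; sigprocmask(),
signalfd() and struct signalfd_siginfo are the standard Linux interfaces.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

/* Block `sig` and return a signalfd that reports it, or -1 on error. */
int open_blocked_signalfd(int sig)
{
    sigset_t mask;

    sigemptyset(&mask);
    sigaddset(&mask, sig);

    /*
     * The signal has to be blocked first; an unblocked signal is
     * handled according to its normal disposition instead.
     */
    if (sigprocmask(SIG_BLOCK, &mask, NULL) != 0)
        return -1;

    return signalfd(-1, &mask, SFD_CLOEXEC);
}

/* Wait for one signal on the descriptor; returns its number, or -1 on error. */
int read_one_signal(int fd)
{
    struct signalfd_siginfo info;
    ssize_t n = read(fd, &info, sizeof(info));

    if (n != (ssize_t)sizeof(info))
        return -1;
    return (int)info.ssi_signo;
}
```

A shepherd could call open_blocked_signalfd(SIGTERM) once at startup and
then poll the descriptor together with its other file descriptors. A
multi-threaded program would use pthread_sigmask() instead of
sigprocmask().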

| 
| >| They are used for communication (job control) with the container running
| >| the job.  Such batch jobs are typically run under the supervision of
| >| some kind of "shepherd" process, which acts as "init" for the job
| >| environment; in my case it's the container-init.  It's the reaper of
| >| possibly orphaned processes and at the same time it communicates with the
| >| job scheduler (outside of the container) via signals.
| >
| > So can this job scheduler install handlers for SIGINT/SIGTERM/SIGQUIT ?
| 
| The scheduler is outside of the container, so I suppose you mean the
| shepherd process, which is the container init.  Yes, it already has
| handlers for each signal it's interested in, so according to the above,
| everything should work as expected (once we get the signals forwarded to
| it).

Yes, I meant the shepherd process.

| 
| >| So I'd consider at least some kernel complexity necessary for Linux
| >| containers becoming a viable tool for batch job segregation.
| >
| > Yes, it is annoying that we can't CTRL-C a cinit running /bin/sleep, but
| > this behavior should not be too limiting to a more functional cinit.
| 
| Indeed.  I misunderstood you on first read.
| 
| > I had submitted a verbose man page patch for kill(2) to describe these
| > semantics, but the following paragraph in the notes section of kill(2)
| > does allude to this behavior:
| >
| >        The only signals that can be sent to process ID 1, the init
| >        process, are those for which init has explicitly installed signal
| >        handlers.  This is done to assure the system is not brought down
| >        accidentally.
| 
| I even read that paragraph recently.  I didn't think it would apply,
| though, as I was trying to kill cinit in the outer namespace, where it
| had a generic PID, not 1.  Your effort to expand the man page of kill(2)
| is most appreciated, I hope it will land soon!

I do see now that it is ambiguous and incomplete. First, the special
handling applies to a process if it has pid == 1 in *any* namespace.

Second, it does not mention that SIGKILL/SIGSTOP are the only reliable
signals to a container-init from the parent namespace.

I will submit a patch for the man page change.

Thanks,

Sukadev


* Re: pid namespace bug ?
       [not found]                                     ` <20100507213037.GA3305-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2010-05-07 21:43                                       ` Ferenc Wagner
@ 2010-05-08 12:52                                       ` Daniel Lezcano
  1 sibling, 0 replies; 12+ messages in thread
From: Daniel Lezcano @ 2010-05-08 12:52 UTC (permalink / raw)
  To: Sukadev Bhattiprolu; +Cc: Linux Containers, Ferenc Wagner

Sukadev Bhattiprolu wrote:
> Ferenc Wagner [wferi-eEbw3PyuezQ@public.gmane.org] wrote:
> | Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> writes:
> | 
> | > Daniel Lezcano [daniel.lezcano-GANU6spQydw@public.gmane.org] wrote:
> | >
> | >>> Besides a realistic container-init would block such signals, in which case
> | >>> the complexity in the kernel could be viewed as unnecessary.
> | >>
> | >> I am not sure it is good to have the pid 1 immune against signals sent  
> | >> from outside of the container.
> | >
> | > cinit is only immune to unhandled signals that terminate/stop the cinit.
> | > If a handler is defined for SIGINT, a SIGINT from parent-ns will still be
> | > delivered but a SIGINT from a descendant of cinit will be ignored.
>
> Sorry. Bad sentence.
>
> Yes, if a handler is defined, the signal will be delivered regardless of
> sender's namespace. 
>   
Thanks Suka for the clarification.

  -- Daniel

