* [PATCH v2] /proc/pid/status: show all sets of pid according to ns
@ 2014-05-28 10:24 ` Chen Hanxiao
  0 siblings, 0 replies; 40+ messages in thread
From: Chen Hanxiao @ 2014-05-28 10:24 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Richard Weinberger, Serge Hallyn, Oleg Nesterov, David Howells,
	Eric W. Biederman, Andrew Morton, Al Viro

We need a direct method of getting the pid inside containers.
If an issue occurs inside a container, the host user cannot
tell which process is in trouble just from the guest pid:
users inside the container only know the pid as seen inside it.
This is an obstacle for troubleshooting.

This patch adds two fields:

NStgid and NSpid.

a) In init_pid_ns, nothing changes;

b) Inside a pid namespace, they show the pid at every level:
NStgid:	1628 	9 	3
NSpid:	1628 	9 	3
** The process id is 1628 at level 0, 9 at level 1 and 3 at level 2.

c) If the pidns is nested, the output depends on which pidns you are in.
NStgid:	9 	3
NSpid:	9 	3
** View from level 1
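
For illustration only (this helper is not part of the patch, just a sketch
written against the field names added above), a minimal host-side reader
could look like this:

/* Hypothetical example reader, not part of the patch. */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
	char path[64], line[256];
	FILE *f;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <pid>\n", argv[0]);
		return 1;
	}
	snprintf(path, sizeof(path), "/proc/%s/status", argv[1]);
	f = fopen(path, "r");
	if (!f) {
		perror(path);
		return 1;
	}
	while (fgets(line, sizeof(line), f))
		if (!strncmp(line, "NStgid:", 7) ||
		    !strncmp(line, "NSpid:", 6))
			fputs(line, stdout);	/* one column per pidns level */
	fclose(f);
	return 0;
}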

Signed-off-by: Chen Hanxiao <chenhanxiao-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
---
v2: add two new fields: NStgid and NSpid.
    keep the existing Tgid and Pid fields unchanged for backward compatibility.

 fs/proc/array.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index 64db2bc..9b7e65c 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -193,6 +193,15 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
 		from_kgid_munged(user_ns, cred->egid),
 		from_kgid_munged(user_ns, cred->sgid),
 		from_kgid_munged(user_ns, cred->fsgid));
+	seq_puts(m, "NStgid:");
+	for (g = ns->level; g <= pid->level; g++)
+		seq_printf(m, "\t%d ",
+			task_tgid_nr_ns(p, pid->numbers[g].ns));
+	seq_puts(m, "\nNSpid:");
+	for (g = ns->level; g <= pid->level; g++)
+		seq_printf(m, "\t%d ",
+			task_pid_nr_ns(p, pid->numbers[g].ns));
+	seq_putc(m, '\n');
 
 	task_lock(p);
 	if (p->files)
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2] /proc/pid/status: show all sets of pid according to ns
@ 2014-05-28 10:24 ` Chen Hanxiao
  0 siblings, 0 replies; 40+ messages in thread
From: Chen Hanxiao @ 2014-05-28 10:24 UTC (permalink / raw)
  To: containers, linux-kernel
  Cc: Andrew Morton, Eric W. Biederman, Serge Hallyn,
	Daniel P. Berrange, Oleg Nesterov, Al Viro, David Howells,
	Richard Weinberger, Chen Hanxiao

We need a direct method of getting the pid inside containers.
If an issue occurs inside a container, the host user cannot
tell which process is in trouble just from the guest pid:
users inside the container only know the pid as seen inside it.
This is an obstacle for troubleshooting.

This patch adds two fields:

NStgid and NSpid.

a) In init_pid_ns, nothing changes;

b) Inside a pid namespace, they show the pid at every level:
NStgid:	1628 	9 	3
NSpid:	1628 	9 	3
** The process id is 1628 at level 0, 9 at level 1 and 3 at level 2.

c) If the pidns is nested, the output depends on which pidns you are in.
NStgid:	9 	3
NSpid:	9 	3
** View from level 1

Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
---
v2: add two new fields: NStgid and NSpid.
    keep the existing Tgid and Pid fields unchanged for backward compatibility.

 fs/proc/array.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index 64db2bc..9b7e65c 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -193,6 +193,15 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
 		from_kgid_munged(user_ns, cred->egid),
 		from_kgid_munged(user_ns, cred->sgid),
 		from_kgid_munged(user_ns, cred->fsgid));
+	seq_puts(m, "NStgid:");
+	for (g = ns->level; g <= pid->level; g++)
+		seq_printf(m, "\t%d ",
+			task_tgid_nr_ns(p, pid->numbers[g].ns));
+	seq_puts(m, "\nNSpid:");
+	for (g = ns->level; g <= pid->level; g++)
+		seq_printf(m, "\t%d ",
+			task_pid_nr_ns(p, pid->numbers[g].ns));
+	seq_putc(m, '\n');
 
 	task_lock(p);
 	if (p->files)
-- 
1.9.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] /proc/pid/status: show all sets of pid according to ns
  2014-05-28 10:24 ` Chen Hanxiao
@ 2014-05-28 12:44     ` Pavel Emelyanov
  -1 siblings, 0 replies; 40+ messages in thread
From: Pavel Emelyanov @ 2014-05-28 12:44 UTC (permalink / raw)
  To: Chen Hanxiao
  Cc: Richard Weinberger,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Oleg Nesterov,
	David Howells, Eric W. Biederman, Andrew Morton, Al Viro

On 05/28/2014 02:24 PM, Chen Hanxiao wrote:
> We need a direct method of getting the pid inside containers.

But there's a more generic issue -- some day we'll need to know not only
PIDs as seen from different namespaces, but also SIDs and PGIDs.

> If some issues occurred inside container guest, host user
> could not know which process is in trouble just by guest pid:
> the users of container guest only knew the pid inside containers.
> This will bring obstacle for trouble shooting.
> 
> This patch adds two fields:
> 
> NStgid and NSpid.
> 
> a) In init_pid_ns, nothing changed;
> 
> b) In one pidns, will tell the pid inside containers:
> NStgid:	1628 	9 	3
> NSpid:	1628 	9 	3
> ** Process id is 1628 in level 0, 9 in level 1, 3 in level 2.
> 
> c) If pidns is nested, it depends on which pidns are you in.
> NStgid:	9 	3
> NSpid:	9 	3
> ** Views from level 1
> 
> Signed-off-by: Chen Hanxiao <chenhanxiao-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
> ---
> v2: add two new fields: NStgid and NSpid.
>     keep fields of Tgid and Pid unchanged for back compatibility.
> 
>  fs/proc/array.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/fs/proc/array.c b/fs/proc/array.c
> index 64db2bc..9b7e65c 100644
> --- a/fs/proc/array.c
> +++ b/fs/proc/array.c
> @@ -193,6 +193,15 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
>  		from_kgid_munged(user_ns, cred->egid),
>  		from_kgid_munged(user_ns, cred->sgid),
>  		from_kgid_munged(user_ns, cred->fsgid));
> +	seq_puts(m, "NStgid:");
> +	for (g = ns->level; g <= pid->level; g++)
> +		seq_printf(m, "\t%d ",
> +			task_tgid_nr_ns(p, pid->numbers[g].ns));
> +	seq_puts(m, "\nNSpid:");
> +	for (g = ns->level; g <= pid->level; g++)
> +		seq_printf(m, "\t%d ",
> +			task_pid_nr_ns(p, pid->numbers[g].ns));
> +	seq_putc(m, '\n');
>  
>  	task_lock(p);
>  	if (p->files)
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] /proc/pid/status: show all sets of pid according to ns
@ 2014-05-28 12:44     ` Pavel Emelyanov
  0 siblings, 0 replies; 40+ messages in thread
From: Pavel Emelyanov @ 2014-05-28 12:44 UTC (permalink / raw)
  To: Chen Hanxiao
  Cc: containers, linux-kernel, Richard Weinberger, Serge Hallyn,
	Oleg Nesterov, David Howells, Eric W. Biederman, Andrew Morton,
	Al Viro

On 05/28/2014 02:24 PM, Chen Hanxiao wrote:
> We need a direct method of getting the pid inside containers.

But there's a more generic issue -- some day we'll need to know not only
PIDs as seen from different namespaces, but also SIDs and PGIDs.

> If some issues occurred inside container guest, host user
> could not know which process is in trouble just by guest pid:
> the users of container guest only knew the pid inside containers.
> This will bring obstacle for trouble shooting.
> 
> This patch adds two fields:
> 
> NStgid and NSpid.
> 
> a) In init_pid_ns, nothing changed;
> 
> b) In one pidns, will tell the pid inside containers:
> NStgid:	1628 	9 	3
> NSpid:	1628 	9 	3
> ** Process id is 1628 in level 0, 9 in level 1, 3 in level 2.
> 
> c) If pidns is nested, it depends on which pidns are you in.
> NStgid:	9 	3
> NSpid:	9 	3
> ** Views from level 1
> 
> Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
> ---
> v2: add two new fields: NStgid and NSpid.
>     keep fields of Tgid and Pid unchanged for back compatibility.
> 
>  fs/proc/array.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/fs/proc/array.c b/fs/proc/array.c
> index 64db2bc..9b7e65c 100644
> --- a/fs/proc/array.c
> +++ b/fs/proc/array.c
> @@ -193,6 +193,15 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
>  		from_kgid_munged(user_ns, cred->egid),
>  		from_kgid_munged(user_ns, cred->sgid),
>  		from_kgid_munged(user_ns, cred->fsgid));
> +	seq_puts(m, "NStgid:");
> +	for (g = ns->level; g <= pid->level; g++)
> +		seq_printf(m, "\t%d ",
> +			task_tgid_nr_ns(p, pid->numbers[g].ns));
> +	seq_puts(m, "\nNSpid:");
> +	for (g = ns->level; g <= pid->level; g++)
> +		seq_printf(m, "\t%d ",
> +			task_pid_nr_ns(p, pid->numbers[g].ns));
> +	seq_putc(m, '\n');
>  
>  	task_lock(p);
>  	if (p->files)
> 



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] /proc/pid/status: show all sets of pid according to ns
  2014-05-28 12:44     ` Pavel Emelyanov
@ 2014-05-28 18:28         ` Vasily Kulikov
  -1 siblings, 0 replies; 40+ messages in thread
From: Vasily Kulikov @ 2014-05-28 18:28 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: Richard Weinberger,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Oleg Nesterov,
	David Howells, Eric W. Biederman, Andrew Morton, Al Viro

On Wed, May 28, 2014 at 16:44 +0400, Pavel Emelyanov wrote:
> On 05/28/2014 02:24 PM, Chen Hanxiao wrote:
> > We need a direct method of getting the pid inside containers.
> 
> But there's more generic issue -- some day we'll need to know not only
> PIDs as seen from different namespaces, but also SIDs and PGIDs.

Maybe include all per-ns IDs in a separate file?  Then the old 'status'
file would include IDs from the current namespace only, while the new file
(e.g. 'ids' or 'ns_ids') would contain only the hierarchical IDs which
differ from namespace to namespace, for all possible namespaces.  It would
also be simpler to parse -- if the 'ns_ids' file contains some ID, then
that ID can be obtained for every ns regardless of the specific ID name
(SID, PID, PGID, etc.).
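
Purely as a sketch of that idea (this is not a real patch, just a fragment
reusing the per-level loop from the NSpid hunk together with the existing
task_session_nr_ns()/task_pgrp_nr_ns() helpers; the field names and the
placement are made up), such a file could print one line per ID class, one
column per visible pidns level:

/* Hypothetical sketch only -- field names and placement are invented. */
static void ns_ids_show(struct seq_file *m, struct pid_namespace *ns,
			struct pid *pid, struct task_struct *p)
{
	int g;

	seq_puts(m, "NSsid:");
	for (g = ns->level; g <= pid->level; g++)
		seq_printf(m, "\t%d", task_session_nr_ns(p, pid->numbers[g].ns));
	seq_puts(m, "\nNSpgid:");
	for (g = ns->level; g <= pid->level; g++)
		seq_printf(m, "\t%d", task_pgrp_nr_ns(p, pid->numbers[g].ns));
	seq_putc(m, '\n');
}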

> 
> > If some issues occurred inside container guest, host user
> > could not know which process is in trouble just by guest pid:
> > the users of container guest only knew the pid inside containers.
> > This will bring obstacle for trouble shooting.
> > 
> > This patch adds two fields:
> > 
> > NStgid and NSpid.
> > 
> > a) In init_pid_ns, nothing changed;
> > 
> > b) In one pidns, will tell the pid inside containers:
> > NStgid:	1628 	9 	3
> > NSpid:	1628 	9 	3
> > ** Process id is 1628 in level 0, 9 in level 1, 3 in level 2.
> > 
> > c) If pidns is nested, it depends on which pidns are you in.
> > NStgid:	9 	3
> > NSpid:	9 	3
> > ** Views from level 1

Thanks,

-- 
Vasily Kulikov
http://www.openwall.com - bringing security into open computing environments

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] /proc/pid/status: show all sets of pid according to ns
@ 2014-05-28 18:28         ` Vasily Kulikov
  0 siblings, 0 replies; 40+ messages in thread
From: Vasily Kulikov @ 2014-05-28 18:28 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: Chen Hanxiao, Richard Weinberger, containers, Serge Hallyn,
	linux-kernel, Oleg Nesterov, David Howells, Eric W. Biederman,
	Andrew Morton, Al Viro

On Wed, May 28, 2014 at 16:44 +0400, Pavel Emelyanov wrote:
> On 05/28/2014 02:24 PM, Chen Hanxiao wrote:
> > We need a direct method of getting the pid inside containers.
> 
> But there's more generic issue -- some day we'll need to know not only
> PIDs as seen from different namespaces, but also SIDs and PGIDs.

Maybe include all per-ns IDs in a separate file?  Then the old 'status'
file would include IDs from the current namespace only, while the new file
(e.g. 'ids' or 'ns_ids') would contain only the hierarchical IDs which
differ from namespace to namespace, for all possible namespaces.  It would
also be simpler to parse -- if the 'ns_ids' file contains some ID, then
that ID can be obtained for every ns regardless of the specific ID name
(SID, PID, PGID, etc.).

> 
> > If some issues occurred inside container guest, host user
> > could not know which process is in trouble just by guest pid:
> > the users of container guest only knew the pid inside containers.
> > This will bring obstacle for trouble shooting.
> > 
> > This patch adds two fields:
> > 
> > NStgid and NSpid.
> > 
> > a) In init_pid_ns, nothing changed;
> > 
> > b) In one pidns, will tell the pid inside containers:
> > NStgid:	1628 	9 	3
> > NSpid:	1628 	9 	3
> > ** Process id is 1628 in level 0, 9 in level 1, 3 in level 2.
> > 
> > c) If pidns is nested, it depends on which pidns are you in.
> > NStgid:	9 	3
> > NSpid:	9 	3
> > ** Views from level 1

Thanks,

-- 
Vasily Kulikov
http://www.openwall.com - bringing security into open computing environments

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] /proc/pid/status: show all sets of pid according to ns
  2014-05-28 18:28         ` Vasily Kulikov
  (?)
@ 2014-05-28 19:27         ` Pavel Emelyanov
  -1 siblings, 0 replies; 40+ messages in thread
From: Pavel Emelyanov @ 2014-05-28 19:27 UTC (permalink / raw)
  To: Vasily Kulikov
  Cc: Richard Weinberger,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Oleg Nesterov,
	David Howells, Eric W. Biederman, Andrew Morton, Al Viro

On 05/28/2014 10:28 PM, Vasily Kulikov wrote:
> On Wed, May 28, 2014 at 16:44 +0400, Pavel Emelyanov wrote:
>> On 05/28/2014 02:24 PM, Chen Hanxiao wrote:
>>> We need a direct method of getting the pid inside containers.
>>
>> But there's more generic issue -- some day we'll need to know not only
>> PIDs as seen from different namespaces, but also SIDs and PGIDs.
> 
> Maybe include all per-ns ID in a separate file?

This looks reasonable, but wouldn't this file be too big for a loaded system?

> Then the old 'status'
> file includes IDs from the current namespace only, the new file (e.g.
> 'ids' or 'ns_ids') contains only hierarchical IDs which differ from
> namespace to namespace for all possible namespaces.  

For all visible namespaces. I.e. -- if a task lives in a container and reads
its /proc/self/status it should _not_ see its host pid. Just like it is now
in the current patch. Otherwise it would bring blockers to live migration :(

> It will be simplier
> to parse the file -- if 'ns_ids' file contains some ID then this ID for
> every ns can be obtained regardless of the specific ID name (SID, PID,
> PGID, etc.).

True, but given a task PID, how do we determine which pid namespaces it lives
in, to get an idea of how the PIDs map to each other? Maybe we need some
explicit API for converting (ID, NS1, NS2) into (ID)?

Thanks,
Pavel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] /proc/pid/status: show all sets of pid according to ns
  2014-05-28 18:28         ` Vasily Kulikov
  (?)
  (?)
@ 2014-05-28 19:27         ` Pavel Emelyanov
       [not found]           ` <53863889.9080509-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
  -1 siblings, 1 reply; 40+ messages in thread
From: Pavel Emelyanov @ 2014-05-28 19:27 UTC (permalink / raw)
  To: Vasily Kulikov
  Cc: Chen Hanxiao, Richard Weinberger, containers, Serge Hallyn,
	linux-kernel, Oleg Nesterov, David Howells, Eric W. Biederman,
	Andrew Morton, Al Viro

On 05/28/2014 10:28 PM, Vasily Kulikov wrote:
> On Wed, May 28, 2014 at 16:44 +0400, Pavel Emelyanov wrote:
>> On 05/28/2014 02:24 PM, Chen Hanxiao wrote:
>>> We need a direct method of getting the pid inside containers.
>>
>> But there's more generic issue -- some day we'll need to know not only
>> PIDs as seen from different namespaces, but also SIDs and PGIDs.
> 
> Maybe include all per-ns ID in a separate file?

This looks reasonable, but wouldn't this file be too big for a loaded system?

> Then the old 'status'
> file includes IDs from the current namespace only, the new file (e.g.
> 'ids' or 'ns_ids') contains only hierarchical IDs which differ from
> namespace to namespace for all possible namespaces.  

For all visible namespaces. I.e. -- if a task lives in a container and reads
its /proc/self/status it should _not_ see its host pid. Just like it is now
in the current patch. Otherwise it would bring blockers to live migration :(

> It will be simplier
> to parse the file -- if 'ns_ids' file contains some ID then this ID for
> every ns can be obtained regardless of the specific ID name (SID, PID,
> PGID, etc.).

True, but given a task PID, how do we determine which pid namespaces it lives
in, to get an idea of how the PIDs map to each other? Maybe we need some
explicit API for converting (ID, NS1, NS2) into (ID)?

Thanks,
Pavel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] /proc/pid/status: show all sets of pid according to ns
  2014-05-28 19:27         ` Pavel Emelyanov
@ 2014-05-29  5:59               ` Vasily Kulikov
  0 siblings, 0 replies; 40+ messages in thread
From: Vasily Kulikov @ 2014-05-29  5:59 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: Richard Weinberger,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Oleg Nesterov,
	David Howells, Eric W. Biederman, Andrew Morton, Al Viro

On Wed, May 28, 2014 at 23:27 +0400, Pavel Emelyanov wrote:
> On 05/28/2014 10:28 PM, Vasily Kulikov wrote:
> > On Wed, May 28, 2014 at 16:44 +0400, Pavel Emelyanov wrote:
> > It will be simplier
> > to parse the file -- if 'ns_ids' file contains some ID then this ID for
> > every ns can be obtained regardless of the specific ID name (SID, PID,
> > PGID, etc.).
> 
> True, but given a task PID how to determine which pid namespaces it lives in
> to get the idea of how PIDs map to each other? Maybe we need some explicit
> API for converting (ID, NS1, NS2) into (ID)?

AFAIU the idea of the patch is to add new debugging information which
can be trivially obtained via 'cat /proc/...':

] We need a direct method of getting the pid inside containers.
] If some issues occurred inside container guest, host user
] could not know which process is in trouble just by guest pid:
] the users of container guest only knew the pid inside containers.
] This will bring obstacle for trouble shooting.

A new syscall might complicate trouble shooting by admin.

-- 
Vasily Kulikov
http://www.openwall.com - bringing security into open computing environments

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] /proc/pid/status: show all sets of pid according to ns
@ 2014-05-29  5:59               ` Vasily Kulikov
  0 siblings, 0 replies; 40+ messages in thread
From: Vasily Kulikov @ 2014-05-29  5:59 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: Richard Weinberger, containers, Serge Hallyn, linux-kernel,
	Oleg Nesterov, David Howells, Eric W. Biederman, Andrew Morton,
	Al Viro

On Wed, May 28, 2014 at 23:27 +0400, Pavel Emelyanov wrote:
> On 05/28/2014 10:28 PM, Vasily Kulikov wrote:
> > On Wed, May 28, 2014 at 16:44 +0400, Pavel Emelyanov wrote:
> > It will be simplier
> > to parse the file -- if 'ns_ids' file contains some ID then this ID for
> > every ns can be obtained regardless of the specific ID name (SID, PID,
> > PGID, etc.).
> 
> True, but given a task PID how to determine which pid namespaces it lives in
> to get the idea of how PIDs map to each other? Maybe we need some explicit
> API for converting (ID, NS1, NS2) into (ID)?

AFAIU the idea of the patch is to add new debugging information which
can be trivially obtained via 'cat /proc/...':

] We need a direct method of getting the pid inside containers.
] If some issues occurred inside container guest, host user
] could not know which process is in trouble just by guest pid:
] the users of container guest only knew the pid inside containers.
] This will bring obstacle for trouble shooting.

A new syscall might complicate trouble shooting by admin.

-- 
Vasily Kulikov
http://www.openwall.com - bringing security into open computing environments

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] /proc/pid/status: show all sets of pid according to ns
  2014-05-29  5:59               ` Vasily Kulikov
@ 2014-05-29  9:07                 ` Pavel Emelyanov
  -1 siblings, 0 replies; 40+ messages in thread
From: Pavel Emelyanov @ 2014-05-29  9:07 UTC (permalink / raw)
  To: Vasily Kulikov
  Cc: Richard Weinberger,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Oleg Nesterov,
	David Howells, Eric W. Biederman, Andrew Morton, Al Viro

On 05/29/2014 09:59 AM, Vasily Kulikov wrote:
> On Wed, May 28, 2014 at 23:27 +0400, Pavel Emelyanov wrote:
>> On 05/28/2014 10:28 PM, Vasily Kulikov wrote:
>>> On Wed, May 28, 2014 at 16:44 +0400, Pavel Emelyanov wrote:
>>> It will be simplier
>>> to parse the file -- if 'ns_ids' file contains some ID then this ID for
>>> every ns can be obtained regardless of the specific ID name (SID, PID,
>>> PGID, etc.).
>>
>> True, but given a task PID how to determine which pid namespaces it lives in
>> to get the idea of how PIDs map to each other? Maybe we need some explicit
>> API for converting (ID, NS1, NS2) into (ID)?
> 
> AFAIU the idea of the patch is to add a new debugging information which
> can be trivially obtained via 'cat /proc/...':

I agree, but this ability will be very useful for the checkpoint-restore
project too, and I'd really appreciate it if the API we have for that were
scalable enough. A per-task proc file works for me, but what about sids
and pgids?

> ] We need a direct method of getting the pid inside containers.
> ] If some issues occurred inside container guest, host user
> ] could not know which process is in trouble just by guest pid:
> ] the users of container guest only knew the pid inside containers.
> ] This will bring obstacle for trouble shooting.
> 
> A new syscall might complicate trouble shooting by admin.

Pure syscall -- yes. What if we teach the ps and top utilities to show additional
info? I think that would help.

Thanks,
Pavel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] /proc/pid/status: show all sets of pid according to ns
@ 2014-05-29  9:07                 ` Pavel Emelyanov
  0 siblings, 0 replies; 40+ messages in thread
From: Pavel Emelyanov @ 2014-05-29  9:07 UTC (permalink / raw)
  To: Vasily Kulikov
  Cc: Richard Weinberger, containers, Serge Hallyn, linux-kernel,
	Oleg Nesterov, David Howells, Eric W. Biederman, Andrew Morton,
	Al Viro

On 05/29/2014 09:59 AM, Vasily Kulikov wrote:
> On Wed, May 28, 2014 at 23:27 +0400, Pavel Emelyanov wrote:
>> On 05/28/2014 10:28 PM, Vasily Kulikov wrote:
>>> On Wed, May 28, 2014 at 16:44 +0400, Pavel Emelyanov wrote:
>>> It will be simplier
>>> to parse the file -- if 'ns_ids' file contains some ID then this ID for
>>> every ns can be obtained regardless of the specific ID name (SID, PID,
>>> PGID, etc.).
>>
>> True, but given a task PID how to determine which pid namespaces it lives in
>> to get the idea of how PIDs map to each other? Maybe we need some explicit
>> API for converting (ID, NS1, NS2) into (ID)?
> 
> AFAIU the idea of the patch is to add a new debugging information which
> can be trivially obtained via 'cat /proc/...':

I agree, but this ability will be very useful for the checkpoint-restore
project too, and I'd really appreciate it if the API we have for that were
scalable enough. A per-task proc file works for me, but what about sids
and pgids?

> ] We need a direct method of getting the pid inside containers.
> ] If some issues occurred inside container guest, host user
> ] could not know which process is in trouble just by guest pid:
> ] the users of container guest only knew the pid inside containers.
> ] This will bring obstacle for trouble shooting.
> 
> A new syscall might complicate trouble shooting by admin.

Pure syscall -- yes. What if we teach the ps and top utilities to show additional
info? I think that would help.

Thanks,
Pavel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] /proc/pid/status: show all sets of pid according to ns
       [not found]                 ` <5386F8EA.8050501-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2014-05-29  9:21                   ` Richard Weinberger
  2014-05-29  9:53                   ` chenhanxiao-BthXqXjhjHXQFUHtdCDX3A
  2014-05-29 11:12                     ` Vasily Kulikov
  2 siblings, 0 replies; 40+ messages in thread
From: Richard Weinberger @ 2014-05-29  9:21 UTC (permalink / raw)
  To: Pavel Emelyanov, Vasily Kulikov
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Oleg Nesterov,
	David Howells, Eric W. Biederman, Andrew Morton, Al Viro

On 29.05.2014 11:07, Pavel Emelyanov wrote:
> On 05/29/2014 09:59 AM, Vasily Kulikov wrote:
>> On Wed, May 28, 2014 at 23:27 +0400, Pavel Emelyanov wrote:
>>> On 05/28/2014 10:28 PM, Vasily Kulikov wrote:
>>>> On Wed, May 28, 2014 at 16:44 +0400, Pavel Emelyanov wrote:
>>>> It will be simplier
>>>> to parse the file -- if 'ns_ids' file contains some ID then this ID for
>>>> every ns can be obtained regardless of the specific ID name (SID, PID,
>>>> PGID, etc.).
>>>
>>> True, but given a task PID how to determine which pid namespaces it lives in
>>> to get the idea of how PIDs map to each other? Maybe we need some explicit
>>> API for converting (ID, NS1, NS2) into (ID)?
>>
>> AFAIU the idea of the patch is to add a new debugging information which
>> can be trivially obtained via 'cat /proc/...':
> 
> I agree, but this ability will be very useful by checkpoint-restore project
> too and I'd really appreciate if the API we have for that would be scalable
> enough. Per-task proc file works for me, but how about sid-s and pgid-s?

What kind of information does CRIU need?

Thanks,
//richard

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] /proc/pid/status: show all sets of pid according to ns
  2014-05-29  9:07                 ` Pavel Emelyanov
  (?)
@ 2014-05-29  9:21                 ` Richard Weinberger
       [not found]                   ` <5386FC0C.9000307-/L3Ra7n9ekc@public.gmane.org>
  -1 siblings, 1 reply; 40+ messages in thread
From: Richard Weinberger @ 2014-05-29  9:21 UTC (permalink / raw)
  To: Pavel Emelyanov, Vasily Kulikov
  Cc: containers, Serge Hallyn, linux-kernel, Oleg Nesterov,
	David Howells, Eric W. Biederman, Andrew Morton, Al Viro

On 29.05.2014 11:07, Pavel Emelyanov wrote:
> On 05/29/2014 09:59 AM, Vasily Kulikov wrote:
>> On Wed, May 28, 2014 at 23:27 +0400, Pavel Emelyanov wrote:
>>> On 05/28/2014 10:28 PM, Vasily Kulikov wrote:
>>>> On Wed, May 28, 2014 at 16:44 +0400, Pavel Emelyanov wrote:
>>>> It will be simplier
>>>> to parse the file -- if 'ns_ids' file contains some ID then this ID for
>>>> every ns can be obtained regardless of the specific ID name (SID, PID,
>>>> PGID, etc.).
>>>
>>> True, but given a task PID how to determine which pid namespaces it lives in
>>> to get the idea of how PIDs map to each other? Maybe we need some explicit
>>> API for converting (ID, NS1, NS2) into (ID)?
>>
>> AFAIU the idea of the patch is to add a new debugging information which
>> can be trivially obtained via 'cat /proc/...':
> 
> I agree, but this ability will be very useful by checkpoint-restore project
> too and I'd really appreciate if the API we have for that would be scalable
> enough. Per-task proc file works for me, but how about sid-s and pgid-s?

What kind of information does CRIU need?

Thanks,
//richard


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] /proc/pid/status: show all sets of pid according to ns
  2014-05-29  9:21                 ` Richard Weinberger
@ 2014-05-29  9:41                       ` Pavel Emelyanov
  0 siblings, 0 replies; 40+ messages in thread
From: Pavel Emelyanov @ 2014-05-29  9:41 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, Oleg Nesterov, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	David Howells, Eric W. Biederman, Andrew Morton, Vasily Kulikov,
	Al Viro

On 05/29/2014 01:21 PM, Richard Weinberger wrote:
> Am 29.05.2014 11:07, schrieb Pavel Emelyanov:
>> On 05/29/2014 09:59 AM, Vasily Kulikov wrote:
>>> On Wed, May 28, 2014 at 23:27 +0400, Pavel Emelyanov wrote:
>>>> On 05/28/2014 10:28 PM, Vasily Kulikov wrote:
>>>>> On Wed, May 28, 2014 at 16:44 +0400, Pavel Emelyanov wrote:
>>>>> It will be simplier
>>>>> to parse the file -- if 'ns_ids' file contains some ID then this ID for
>>>>> every ns can be obtained regardless of the specific ID name (SID, PID,
>>>>> PGID, etc.).
>>>>
>>>> True, but given a task PID how to determine which pid namespaces it lives in
>>>> to get the idea of how PIDs map to each other? Maybe we need some explicit
>>>> API for converting (ID, NS1, NS2) into (ID)?
>>>
>>> AFAIU the idea of the patch is to add a new debugging information which
>>> can be trivially obtained via 'cat /proc/...':
>>
>> I agree, but this ability will be very useful by checkpoint-restore project
>> too and I'd really appreciate if the API we have for that would be scalable
>> enough. Per-task proc file works for me, but how about sid-s and pgid-s?
> 
> What kind of information does CRIU need?

We need to know which pid namespaces a task lives in and how its pid, sid and
pgid look in each of them. A short example with pids only:

Task t1 with pid 2 lives in the init pid ns and calls clone(CLONE_NEWPID),
creating ns1 with task t2 having pids (3, 1); then t2 calls clone(CLONE_NEWPID)
again and creates ns2 with task t3 having pids (4, 5, 1). I.e. the trees look
like this:

    init_pid_ns    ns1         ns2
t1  2
t2   `- 3          1 
t3       `- 4      `- 5        1

Also note that /proc/pid/ns will show us that t1 lives in init_pid_ns,
t2 lives in ns1 and t3 lives in ns2.

Now if we come from init pid ns with criu and try to dump task with pid 3
(i.e. the t2), the existing kernel API can tell us that:

a) t2 lives in ns1 != init_pid_ns (via /proc/pid/ns link)
b) t3 lives in ns2 != init_pid_ns
c) t2 has pid 3 (via init's /proc) in init ns and pid 1 in its ns (via t2's /proc)
d) t3 has pid 4 in init ns and pid 1 in its ns

What we also need to know, and don't yet have an API for, is:

e) ns2 is the child of ns1
f) t3 has pid 5 in ns1
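
A userspace sketch that reproduces this layout (illustrative only; it assumes
root and is not how criu actually does it) -- each nesting step calls
unshare(CLONE_NEWPID) and forks, so every new child is pid 1 in the namespace
it opens while its parent sees it under a different number, which is exactly
the set of per-level values NSpid would list:

/* Hypothetical example, not taken from criu; assumes CAP_SYS_ADMIN. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void spawn_level(int depth)
{
	pid_t child;

	if (depth == 0)
		return;
	if (unshare(CLONE_NEWPID) < 0) {	/* next fork lands in a new pid ns */
		perror("unshare");
		exit(1);
	}
	fflush(stdout);
	child = fork();
	if (child == 0) {
		/* pid 1 in the namespace just created */
		printf("depth %d: my own getpid() = %d\n", depth, getpid());
		spawn_level(depth - 1);
		_exit(0);
	}
	/* the same task, as numbered in the parent's namespace */
	printf("parent: the child opening depth %d is pid %d here\n", depth, child);
	waitpid(child, NULL, 0);
}

int main(void)
{
	printf("init_pid_ns: getpid() = %d\n", getpid());
	spawn_level(2);
	return 0;
}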

Thanks,
Pavel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] /proc/pid/status: show all sets of pid according to ns
@ 2014-05-29  9:41                       ` Pavel Emelyanov
  0 siblings, 0 replies; 40+ messages in thread
From: Pavel Emelyanov @ 2014-05-29  9:41 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Vasily Kulikov, containers, Serge Hallyn, linux-kernel,
	Oleg Nesterov, David Howells, Eric W. Biederman, Andrew Morton,
	Al Viro

On 05/29/2014 01:21 PM, Richard Weinberger wrote:
> Am 29.05.2014 11:07, schrieb Pavel Emelyanov:
>> On 05/29/2014 09:59 AM, Vasily Kulikov wrote:
>>> On Wed, May 28, 2014 at 23:27 +0400, Pavel Emelyanov wrote:
>>>> On 05/28/2014 10:28 PM, Vasily Kulikov wrote:
>>>>> On Wed, May 28, 2014 at 16:44 +0400, Pavel Emelyanov wrote:
>>>>> It will be simplier
>>>>> to parse the file -- if 'ns_ids' file contains some ID then this ID for
>>>>> every ns can be obtained regardless of the specific ID name (SID, PID,
>>>>> PGID, etc.).
>>>>
>>>> True, but given a task PID how to determine which pid namespaces it lives in
>>>> to get the idea of how PIDs map to each other? Maybe we need some explicit
>>>> API for converting (ID, NS1, NS2) into (ID)?
>>>
>>> AFAIU the idea of the patch is to add a new debugging information which
>>> can be trivially obtained via 'cat /proc/...':
>>
>> I agree, but this ability will be very useful by checkpoint-restore project
>> too and I'd really appreciate if the API we have for that would be scalable
>> enough. Per-task proc file works for me, but how about sid-s and pgid-s?
> 
> What kind of information does CRIU need?

We need to know which pid namespaces a task lives in and how its pid, sid and
pgid look in each of them. A short example with pids only:

Task t1 with pid 2 lives in the init pid ns and calls clone(CLONE_NEWPID),
creating ns1 with task t2 having pids (3, 1); then t2 calls clone(CLONE_NEWPID)
again and creates ns2 with task t3 having pids (4, 5, 1). I.e. the trees look
like this:

    init_pid_ns    ns1         ns2
t1  2
t2   `- 3          1 
t3       `- 4      `- 5        1

Also note that /proc/pid/ns will show us that t1 lives in init_pid_ns,
t2 lives in ns1 and t3 lives in ns2.

Now if we come from init pid ns with criu and try to dump task with pid 3
(i.e. the t2), the existing kernel API can tell us that:

a) t2 lives in ns1 != init_pid_ns (via /proc/pid/ns link)
b) t3 lives in ns2 != init_pid_ns
c) t2 has pid 3 (via init's /proc) in init ns and pid 1 in its ns (via t2's /proc)
d) t3 has pid 4 in init ns and pid 1 in its ns

What we also need to know, and don't yet have an API for, is:

e) ns2 is the child of ns1
f) t3 has pid 5 in ns1

Thanks,
Pavel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH v2] /proc/pid/status: show all sets of pid according to ns
       [not found]                 ` <5386F8EA.8050501-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
  2014-05-29  9:21                   ` Richard Weinberger
@ 2014-05-29  9:53                   ` chenhanxiao-BthXqXjhjHXQFUHtdCDX3A
  2014-05-29 11:12                     ` Vasily Kulikov
  2 siblings, 0 replies; 40+ messages in thread
From: chenhanxiao-BthXqXjhjHXQFUHtdCDX3A @ 2014-05-29  9:53 UTC (permalink / raw)
  To: Pavel Emelyanov, Vasily Kulikov
  Cc: Richard Weinberger,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Oleg Nesterov,
	David Howells, Eric W. Biederman, Andrew Morton, Al Viro



> -----Original Message-----
> From: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> On 05/29/2014 09:59 AM, Vasily Kulikov wrote:
> > On Wed, May 28, 2014 at 23:27 +0400, Pavel Emelyanov wrote:
> >> On 05/28/2014 10:28 PM, Vasily Kulikov wrote:
> >>> On Wed, May 28, 2014 at 16:44 +0400, Pavel Emelyanov wrote:
> >>> It will be simplier
> >>> to parse the file -- if 'ns_ids' file contains some ID then this ID for
> >>> every ns can be obtained regardless of the specific ID name (SID, PID,
> >>> PGID, etc.).
> >>
> >> True, but given a task PID how to determine which pid namespaces it lives in
> >> to get the idea of how PIDs map to each other? Maybe we need some explicit
> >> API for converting (ID, NS1, NS2) into (ID)?
> >
> > AFAIU the idea of the patch is to add a new debugging information which
> > can be trivially obtained via 'cat /proc/...':
> 
> I agree, but this ability will be very useful by checkpoint-restore project
> too and I'd really appreciate if the API we have for that would be scalable
> enough. Per-task proc file works for me, but how about sid-s and pgid-s?
> 

Yes, a new syscall would be very useful, but that should be a separate task.
Just for pids, I think a proc file is good enough.

> > ] We need a direct method of getting the pid inside containers.
> > ] If some issues occurred inside container guest, host user
> > ] could not know which process is in trouble just by guest pid:
> > ] the users of container guest only knew the pid inside containers.
> > ] This will bring obstacle for trouble shooting.
> >
> > A new syscall might complicate trouble shooting by admin.
> 
> Pure syscall -- yes. What if we teach the ps and top utilities to show additional
> info? I think that would help.
>

Thanks,
- Chen

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH v2] /proc/pid/status: show all sets of pid according to ns
  2014-05-29  9:07                 ` Pavel Emelyanov
  (?)
  (?)
@ 2014-05-29  9:53                 ` chenhanxiao
       [not found]                   ` <5871495633F38949900D2BF2DC04883E52A481-ZEd+hNNJ6a5ZYpXjqAkB5jz3u5zwRJJDAzI0kPv9QBlmR6Xm/wNWPw@public.gmane.org>
  -1 siblings, 1 reply; 40+ messages in thread
From: chenhanxiao @ 2014-05-29  9:53 UTC (permalink / raw)
  To: Pavel Emelyanov, Vasily Kulikov
  Cc: Richard Weinberger, containers, Serge Hallyn, linux-kernel,
	Oleg Nesterov, David Howells, Eric W. Biederman, Andrew Morton,
	Al Viro, Gotou, Yasunori




> -----Original Message-----
> From: containers-bounces@lists.linux-foundation.org
> On 05/29/2014 09:59 AM, Vasily Kulikov wrote:
> > On Wed, May 28, 2014 at 23:27 +0400, Pavel Emelyanov wrote:
> >> On 05/28/2014 10:28 PM, Vasily Kulikov wrote:
> >>> On Wed, May 28, 2014 at 16:44 +0400, Pavel Emelyanov wrote:
> >>> It will be simplier
> >>> to parse the file -- if 'ns_ids' file contains some ID then this ID for
> >>> every ns can be obtained regardless of the specific ID name (SID, PID,
> >>> PGID, etc.).
> >>
> >> True, but given a task PID how to determine which pid namespaces it lives in
> >> to get the idea of how PIDs map to each other? Maybe we need some explicit
> >> API for converting (ID, NS1, NS2) into (ID)?
> >
> > AFAIU the idea of the patch is to add a new debugging information which
> > can be trivially obtained via 'cat /proc/...':
> 
> I agree, but this ability will be very useful by checkpoint-restore project
> too and I'd really appreciate if the API we have for that would be scalable
> enough. Per-task proc file works for me, but how about sid-s and pgid-s?
> 

Yes, a new syscall would be very useful, but that should be a separate task.
Just for pids, I think a proc file is good enough.

> > ] We need a direct method of getting the pid inside containers.
> > ] If some issues occurred inside container guest, host user
> > ] could not know which process is in trouble just by guest pid:
> > ] the users of container guest only knew the pid inside containers.
> > ] This will bring obstacle for trouble shooting.
> >
> > A new syscall might complicate trouble shooting by admin.
> 
> Pure syscall -- yes. What if we teach the ps and top utilities to show additional
> info? I think that would help.
>

Thanks,
- Chen

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] /proc/pid/status: show all sets of pid according to ns
  2014-05-29  9:41                       ` Pavel Emelyanov
@ 2014-05-29  9:54                           ` Richard Weinberger
  -1 siblings, 0 replies; 40+ messages in thread
From: Richard Weinberger @ 2014-05-29  9:54 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, Oleg Nesterov, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	David Howells, Eric W. Biederman, Andrew Morton, Vasily Kulikov,
	Al Viro

On 29.05.2014 11:41, Pavel Emelyanov wrote:
> On 05/29/2014 01:21 PM, Richard Weinberger wrote:
>> Am 29.05.2014 11:07, schrieb Pavel Emelyanov:
>>> On 05/29/2014 09:59 AM, Vasily Kulikov wrote:
>>>> On Wed, May 28, 2014 at 23:27 +0400, Pavel Emelyanov wrote:
>>>>> On 05/28/2014 10:28 PM, Vasily Kulikov wrote:
>>>>>> On Wed, May 28, 2014 at 16:44 +0400, Pavel Emelyanov wrote:
>>>>>> It will be simplier
>>>>>> to parse the file -- if 'ns_ids' file contains some ID then this ID for
>>>>>> every ns can be obtained regardless of the specific ID name (SID, PID,
>>>>>> PGID, etc.).
>>>>>
>>>>> True, but given a task PID how to determine which pid namespaces it lives in
>>>>> to get the idea of how PIDs map to each other? Maybe we need some explicit
>>>>> API for converting (ID, NS1, NS2) into (ID)?
>>>>
>>>> AFAIU the idea of the patch is to add a new debugging information which
>>>> can be trivially obtained via 'cat /proc/...':
>>>
>>> I agree, but this ability will be very useful by checkpoint-restore project
>>> too and I'd really appreciate if the API we have for that would be scalable
>>> enough. Per-task proc file works for me, but how about sid-s and pgid-s?
>>
>> What kind of information does CRIU need?
> 
> We need to know what pid namespaces a task lives in and how pid, sid and
> pgid look in all of them. A short example with pids only

So use case is to checkpoint/restore nested containers? :)

Thanks,
//richard

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] /proc/pid/status: show all sets of pid according to ns
@ 2014-05-29  9:54                           ` Richard Weinberger
  0 siblings, 0 replies; 40+ messages in thread
From: Richard Weinberger @ 2014-05-29  9:54 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: Vasily Kulikov, containers, Serge Hallyn, linux-kernel,
	Oleg Nesterov, David Howells, Eric W. Biederman, Andrew Morton,
	Al Viro

On 29.05.2014 11:41, Pavel Emelyanov wrote:
> On 05/29/2014 01:21 PM, Richard Weinberger wrote:
>> Am 29.05.2014 11:07, schrieb Pavel Emelyanov:
>>> On 05/29/2014 09:59 AM, Vasily Kulikov wrote:
>>>> On Wed, May 28, 2014 at 23:27 +0400, Pavel Emelyanov wrote:
>>>>> On 05/28/2014 10:28 PM, Vasily Kulikov wrote:
>>>>>> On Wed, May 28, 2014 at 16:44 +0400, Pavel Emelyanov wrote:
>>>>>> It will be simplier
>>>>>> to parse the file -- if 'ns_ids' file contains some ID then this ID for
>>>>>> every ns can be obtained regardless of the specific ID name (SID, PID,
>>>>>> PGID, etc.).
>>>>>
>>>>> True, but given a task PID how to determine which pid namespaces it lives in
>>>>> to get the idea of how PIDs map to each other? Maybe we need some explicit
>>>>> API for converting (ID, NS1, NS2) into (ID)?
>>>>
>>>> AFAIU the idea of the patch is to add a new debugging information which
>>>> can be trivially obtained via 'cat /proc/...':
>>>
>>> I agree, but this ability will be very useful by checkpoint-restore project
>>> too and I'd really appreciate if the API we have for that would be scalable
>>> enough. Per-task proc file works for me, but how about sid-s and pgid-s?
>>
>> What kind of information does CRIU need?
> 
> We need to know what pid namespaces a task lives in and how pid, sid and
> pgid look in all of them. A short example with pids only

So use case is to checkpoint/restore nested containers? :)

Thanks,
//richard

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] /proc/pid/status: show all sets of pid according to ns
  2014-05-29  9:54                           ` Richard Weinberger
@ 2014-05-29 10:02                               ` Pavel Emelyanov
  -1 siblings, 0 replies; 40+ messages in thread
From: Pavel Emelyanov @ 2014-05-29 10:02 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, Oleg Nesterov, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	David Howells, Eric W. Biederman, Andrew Morton, Vasily Kulikov,
	Al Viro

On 05/29/2014 01:54 PM, Richard Weinberger wrote:
> Am 29.05.2014 11:41, schrieb Pavel Emelyanov:
>> On 05/29/2014 01:21 PM, Richard Weinberger wrote:
>>> Am 29.05.2014 11:07, schrieb Pavel Emelyanov:
>>>> On 05/29/2014 09:59 AM, Vasily Kulikov wrote:
>>>>> On Wed, May 28, 2014 at 23:27 +0400, Pavel Emelyanov wrote:
>>>>>> On 05/28/2014 10:28 PM, Vasily Kulikov wrote:
>>>>>>> On Wed, May 28, 2014 at 16:44 +0400, Pavel Emelyanov wrote:
>>>>>>> It will be simplier
>>>>>>> to parse the file -- if 'ns_ids' file contains some ID then this ID for
>>>>>>> every ns can be obtained regardless of the specific ID name (SID, PID,
>>>>>>> PGID, etc.).
>>>>>>
>>>>>> True, but given a task PID how to determine which pid namespaces it lives in
>>>>>> to get the idea of how PIDs map to each other? Maybe we need some explicit
>>>>>> API for converting (ID, NS1, NS2) into (ID)?
>>>>>
>>>>> AFAIU the idea of the patch is to add a new debugging information which
>>>>> can be trivially obtained via 'cat /proc/...':
>>>>
>>>> I agree, but this ability will be very useful by checkpoint-restore project
>>>> too and I'd really appreciate if the API we have for that would be scalable
>>>> enough. Per-task proc file works for me, but how about sid-s and pgid-s?
>>>
>>> What kind of information does CRIU need?
>>
>> We need to know what pid namespaces a task lives in and how pid, sid and
>> pgid look in all of them. A short example with pids only
> 
> So use case is to checkpoint/restore nested containers? :)

Yes, but there's one more scenario. AFAIK some applications create pid namespaces
themselves, without starting what is typically called "a container" :) And when
such applications are run inside a, well ... "more real" container (e.g. one using
the openvz, lxc or docker tools), we face this issue.

Thanks,
Pavel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] /proc/pid/status: show all sets of pid according to ns
@ 2014-05-29 10:02                               ` Pavel Emelyanov
  0 siblings, 0 replies; 40+ messages in thread
From: Pavel Emelyanov @ 2014-05-29 10:02 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Vasily Kulikov, containers, Serge Hallyn, linux-kernel,
	Oleg Nesterov, David Howells, Eric W. Biederman, Andrew Morton,
	Al Viro

On 05/29/2014 01:54 PM, Richard Weinberger wrote:
> Am 29.05.2014 11:41, schrieb Pavel Emelyanov:
>> On 05/29/2014 01:21 PM, Richard Weinberger wrote:
>>> Am 29.05.2014 11:07, schrieb Pavel Emelyanov:
>>>> On 05/29/2014 09:59 AM, Vasily Kulikov wrote:
>>>>> On Wed, May 28, 2014 at 23:27 +0400, Pavel Emelyanov wrote:
>>>>>> On 05/28/2014 10:28 PM, Vasily Kulikov wrote:
>>>>>>> On Wed, May 28, 2014 at 16:44 +0400, Pavel Emelyanov wrote:
>>>>>>> It will be simplier
>>>>>>> to parse the file -- if 'ns_ids' file contains some ID then this ID for
>>>>>>> every ns can be obtained regardless of the specific ID name (SID, PID,
>>>>>>> PGID, etc.).
>>>>>>
>>>>>> True, but given a task PID how to determine which pid namespaces it lives in
>>>>>> to get the idea of how PIDs map to each other? Maybe we need some explicit
>>>>>> API for converting (ID, NS1, NS2) into (ID)?
>>>>>
>>>>> AFAIU the idea of the patch is to add a new debugging information which
>>>>> can be trivially obtained via 'cat /proc/...':
>>>>
>>>> I agree, but this ability will be very useful by checkpoint-restore project
>>>> too and I'd really appreciate if the API we have for that would be scalable
>>>> enough. Per-task proc file works for me, but how about sid-s and pgid-s?
>>>
>>> What kind of information does CRIU need?
>>
>> We need to know what pid namespaces a task lives in and how pid, sid and
>> pgid look in all of them. A short example with pids only
> 
> So use case is to checkpoint/restore nested containers? :)

Yes, but there's one more scenario. AFAIK some applications create pid namespaces
themselves, without starting what is typically called "a container" :) And when
such applications are run inside a, well ... "more real" container (e.g. one using
the openvz, lxc or docker tools), we face this issue.

Thanks,
Pavel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] /proc/pid/status: show all sets of pid according to ns
  2014-05-29 10:02                               ` Pavel Emelyanov
@ 2014-05-29 10:19                                   ` Richard Weinberger
  -1 siblings, 0 replies; 40+ messages in thread
From: Richard Weinberger @ 2014-05-29 10:19 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, Oleg Nesterov, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	David Howells, Eric W. Biederman, Andrew Morton, Vasily Kulikov,
	Al Viro

On 29.05.2014 12:02, Pavel Emelyanov wrote:
>>> We need to know what pid namespaces a task lives in and how pid, sid and
>>> pgid look in all of them. A short example with pids only
>>
>> So use case is to checkpoint/restore nested containers? :)
> 
> Yes, but there's one more scenario. AFAIK some applications create pid namespaces 
> themselves, without starting what is typically called "a container" :) And when 
> such an applications are run inside, well ... "more real" container (e.g. using
> openvz, lxc or docker tools) we face this issue.

Do you know such an application?
I'm aware of systemd, which uses CLONE_NEWNET/NS to implement security features.

We could add a directory like /proc/<pidX>/ns/proc/ which would contain everything
from /proc/<pidX inside the namespace>/.

This definitely needs more discussion and must not be solved by ad-hoc solutions.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] /proc/pid/status: show all sets of pid according to ns
@ 2014-05-29 10:19                                   ` Richard Weinberger
  0 siblings, 0 replies; 40+ messages in thread
From: Richard Weinberger @ 2014-05-29 10:19 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: Vasily Kulikov, containers, Serge Hallyn, linux-kernel,
	Oleg Nesterov, David Howells, Eric W. Biederman, Andrew Morton,
	Al Viro

On 29.05.2014 12:02, Pavel Emelyanov wrote:
>>> We need to know what pid namespaces a task lives in and how pid, sid and
>>> pgid look in all of them. A short example with pids only
>>
>> So use case is to checkpoint/restore nested containers? :)
> 
> Yes, but there's one more scenario. AFAIK some applications create pid namespaces 
> themselves, without starting what is typically called "a container" :) And when 
> such an applications are run inside, well ... "more real" container (e.g. using
> openvz, lxc or docker tools) we face this issue.

Do you know such an application?
I'm aware of systemd, which uses CLONE_NEWNET/NS to implement security features.

We could add a directory like /proc/<pidX>/ns/proc/ which would contain everything
from /proc/<pidX inside the namespace>/.

This definitely needs more discussion and must not be solved by ad-hoc solutions.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] /proc/pid/status: show all sets of pid according to ns
  2014-05-29 10:19                                   ` Richard Weinberger
@ 2014-05-29 10:36                                       ` Pavel Emelyanov
  -1 siblings, 0 replies; 40+ messages in thread
From: Pavel Emelyanov @ 2014-05-29 10:36 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, Oleg Nesterov, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	David Howells, Eric W. Biederman, Andrew Morton, Vasily Kulikov,
	Al Viro

On 05/29/2014 02:19 PM, Richard Weinberger wrote:
> Am 29.05.2014 12:02, schrieb Pavel Emelyanov:
>>>> We need to know what pid namespaces a task lives in and how pid, sid and
>>>> pgid look in all of them. A short example with pids only
>>>
>>> So use case is to checkpoint/restore nested containers? :)
>>
>> Yes, but there's one more scenario. AFAIK some applications create pid namespaces 
>> themselves, without starting what is typically called "a container" :) And when 
>> such an applications are run inside, well ... "more real" container (e.g. using
>> openvz, lxc or docker tools) we face this issue.
> 
> Do you know such an application?

There were a couple of them reported on the criu mailing list, but I didn't
track those :(

> I'm a aware of systemd which uses CLONE_NEWNET/NS to implement security features.

Yup, this is its typical behavior.

> We could add a directory like /proc/<pidX>/ns/proc/ which would contain everything
> from /proc/<pidX inside the namespace>/.

But how would it help to find out which $pid directories correspond to which
to properly collect the pid mappings?

> This needs definitely more discussion and must not solved by ad-hoc solutions.

Absolutely.

Thanks,
Pavel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] /proc/pid/status: show all sets of pid according to ns
@ 2014-05-29 10:36                                       ` Pavel Emelyanov
  0 siblings, 0 replies; 40+ messages in thread
From: Pavel Emelyanov @ 2014-05-29 10:36 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Vasily Kulikov, containers, Serge Hallyn, linux-kernel,
	Oleg Nesterov, David Howells, Eric W. Biederman, Andrew Morton,
	Al Viro

On 05/29/2014 02:19 PM, Richard Weinberger wrote:
> Am 29.05.2014 12:02, schrieb Pavel Emelyanov:
>>>> We need to know what pid namespaces a task lives in and how pid, sid and
>>>> pgid look in all of them. A short example with pids only
>>>
>>> So use case is to checkpoint/restore nested containers? :)
>>
>> Yes, but there's one more scenario. AFAIK some applications create pid namespaces 
>> themselves, without starting what is typically called "a container" :) And when 
>> such an applications are run inside, well ... "more real" container (e.g. using
>> openvz, lxc or docker tools) we face this issue.
> 
> Do you know such an application?

There were a couple of them reported on the criu mailing list, but I didn't
track those :(

> I'm a aware of systemd which uses CLONE_NEWNET/NS to implement security features.

Yup, this is its typical behavior.

> We could add a directory like /proc/<pidX>/ns/proc/ which would contain everything
> from /proc/<pidX inside the namespace>/.

But how would it help to find out which $pid directories correspond to which
to properly collect the pid mappings?

> This needs definitely more discussion and must not solved by ad-hoc solutions.

Absolutely.

Thanks,
Pavel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] /proc/pid/status: show all sets of pid according to ns
  2014-05-29  9:53                 ` chenhanxiao
@ 2014-05-29 10:40                       ` Pavel Emelyanov
  0 siblings, 0 replies; 40+ messages in thread
From: Pavel Emelyanov @ 2014-05-29 10:40 UTC (permalink / raw)
  To: chenhanxiao-BthXqXjhjHXQFUHtdCDX3A
  Cc: Richard Weinberger,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, Oleg Nesterov, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	David Howells, Eric W. Biederman, Andrew Morton, Vasily Kulikov,
	Al Viro

On 05/29/2014 01:53 PM, chenhanxiao-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org wrote:
> 
> 
>> -----Original Message-----
>> From: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
>> On 05/29/2014 09:59 AM, Vasily Kulikov wrote:
>>> On Wed, May 28, 2014 at 23:27 +0400, Pavel Emelyanov wrote:
>>>> On 05/28/2014 10:28 PM, Vasily Kulikov wrote:
>>>>> On Wed, May 28, 2014 at 16:44 +0400, Pavel Emelyanov wrote:
>>>>> It will be simplier
>>>>> to parse the file -- if 'ns_ids' file contains some ID then this ID for
>>>>> every ns can be obtained regardless of the specific ID name (SID, PID,
>>>>> PGID, etc.).
>>>>
>>>> True, but given a task PID how to determine which pid namespaces it lives in
>>>> to get the idea of how PIDs map to each other? Maybe we need some explicit
>>>> API for converting (ID, NS1, NS2) into (ID)?
>>>
>>> AFAIU the idea of the patch is to add a new debugging information which
>>> can be trivially obtained via 'cat /proc/...':
>>
>> I agree, but this ability will be very useful by checkpoint-restore project
>> too and I'd really appreciate if the API we have for that would be scalable
>> enough. Per-task proc file works for me, but how about sid-s and pgid-s?
>>
> 
> Yes, a new syscall is very useful, but it should be another task.
> Just for Pids, I think proc file is good enough.

It is, but since we're going to think about a more generic API that would serve
your needs as well, why do we need two APIs?

Thanks,
Pavel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] /proc/pid/status: show all sets of pid according to ns
@ 2014-05-29 10:40                       ` Pavel Emelyanov
  0 siblings, 0 replies; 40+ messages in thread
From: Pavel Emelyanov @ 2014-05-29 10:40 UTC (permalink / raw)
  To: chenhanxiao
  Cc: Vasily Kulikov, Richard Weinberger, containers, Serge Hallyn,
	linux-kernel, Oleg Nesterov, David Howells, Eric W. Biederman,
	Andrew Morton, Al Viro, Gotou, Yasunori

On 05/29/2014 01:53 PM, chenhanxiao@cn.fujitsu.com wrote:
> 
> 
>> -----Original Message-----
>> From: containers-bounces@lists.linux-foundation.org
>> On 05/29/2014 09:59 AM, Vasily Kulikov wrote:
>>> On Wed, May 28, 2014 at 23:27 +0400, Pavel Emelyanov wrote:
>>>> On 05/28/2014 10:28 PM, Vasily Kulikov wrote:
>>>>> On Wed, May 28, 2014 at 16:44 +0400, Pavel Emelyanov wrote:
>>>>> It will be simplier
>>>>> to parse the file -- if 'ns_ids' file contains some ID then this ID for
>>>>> every ns can be obtained regardless of the specific ID name (SID, PID,
>>>>> PGID, etc.).
>>>>
>>>> True, but given a task PID how to determine which pid namespaces it lives in
>>>> to get the idea of how PIDs map to each other? Maybe we need some explicit
>>>> API for converting (ID, NS1, NS2) into (ID)?
>>>
>>> AFAIU the idea of the patch is to add a new debugging information which
>>> can be trivially obtained via 'cat /proc/...':
>>
>> I agree, but this ability will be very useful by checkpoint-restore project
>> too and I'd really appreciate if the API we have for that would be scalable
>> enough. Per-task proc file works for me, but how about sid-s and pgid-s?
>>
> 
> Yes, a new syscall is very useful, but it should be another task.
> Just for Pids, I think proc file is good enough.

It is, but since we're going to think about a more generic API that would serve
your needs as well, why would we need two APIs?

Thanks,
Pavel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] /proc/pid/status: show all sets of pid according to ns
@ 2014-05-29 11:12                     ` Vasily Kulikov
  0 siblings, 0 replies; 40+ messages in thread
From: Vasily Kulikov @ 2014-05-29 11:12 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: Richard Weinberger, containers, Serge Hallyn, linux-kernel,
	Oleg Nesterov, David Howells, Eric W. Biederman, Andrew Morton,
	Al Viro

On Thu, May 29, 2014 at 13:07 +0400, Pavel Emelyanov wrote:
> On 05/29/2014 09:59 AM, Vasily Kulikov wrote:
> > On Wed, May 28, 2014 at 23:27 +0400, Pavel Emelyanov wrote:
> > ] We need a direct method of getting the pid inside containers.
> > ] If some issues occurred inside container guest, host user
> > ] could not know which process is in trouble just by guest pid:
> > ] the users of container guest only knew the pid inside containers.
> > ] This will bring obstacle for trouble shooting.
> > 
> > A new syscall might complicate trouble shooting by admin.
> 
> Pure syscall -- yes. What if we teach the ps and top utilities to show additional
> info? I think that would help.

I like the idea of a low-level non-shell API which can be used by a
utility like ps (or by a new tool written to work with complex
namespace hierarchies).  It should fit for troubleshooting.  Then there
should be no reason to implement two different APIs for observation from
the shell via the FS and from applications.

However, maybe it is possible to implement this not via a new syscall but
by adding a new symlink in sysfs?  Then both a ps-like tool and a
CRIU-like tool are able to obtain the ns information by the same means.
Maybe sort of a symlink to the parent namespace, or to a process which is
inside of the parent namespace?  Then a process may identify the IDs using
the following steps (a rough sketch of step 1 follows below):

1) identify target NS by walking current procfs
2) do setns(2)/chroot(2)
3) look at procfs to identify target IDs in the target NS

It would be impossible to identify foreign IDs for unprivileged
processes, however.
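
A minimal sketch of step 1, assuming an ordinary /proc mount and simply
skipping entries the caller is not allowed to read:

/*
 * Step 1 above, sketched in C: walk /proc and read every task's
 * /proc/<pid>/ns/pid link so that tasks can be grouped by the pid
 * namespace they live in.
 */
#include <ctype.h>
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	DIR *proc = opendir("/proc");
	struct dirent *de;
	char path[PATH_MAX], target[PATH_MAX];
	ssize_t n;

	if (!proc) {
		perror("opendir(/proc)");
		return 1;
	}
	while ((de = readdir(proc)) != NULL) {
		if (!isdigit((unsigned char)de->d_name[0]))
			continue;		/* not a $pid directory */
		snprintf(path, sizeof(path), "/proc/%s/ns/pid", de->d_name);
		n = readlink(path, target, sizeof(target) - 1);
		if (n < 0)
			continue;		/* task exited, or not readable */
		target[n] = '\0';
		printf("%s %s\n", de->d_name, target);	/* e.g. "1234 pid:[4026531836]" */
	}
	closedir(proc);
	return 0;
}

Grouping the output by the pid:[...] value gives the per-namespace task
sets; as discussed below, it still does not say which in-namespace pid
belongs to which task.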

Thanks,

-- 
Vasily Kulikov
http://www.openwall.com - bringing security into open computing environments

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] /proc/pid/status: show all sets of pid according to ns
@ 2014-05-29 11:31                       ` Pavel Emelyanov
  0 siblings, 0 replies; 40+ messages in thread
From: Pavel Emelyanov @ 2014-05-29 11:31 UTC (permalink / raw)
  To: Vasily Kulikov
  Cc: Richard Weinberger, containers, Serge Hallyn, linux-kernel,
	Oleg Nesterov, David Howells, Eric W. Biederman, Andrew Morton,
	Al Viro

On 05/29/2014 03:12 PM, Vasily Kulikov wrote:
> On Thu, May 29, 2014 at 13:07 +0400, Pavel Emelyanov wrote:
>> On 05/29/2014 09:59 AM, Vasily Kulikov wrote:
>>> On Wed, May 28, 2014 at 23:27 +0400, Pavel Emelyanov wrote:
>>> ] We need a direct method of getting the pid inside containers.
>>> ] If some issues occurred inside container guest, host user
>>> ] could not know which process is in trouble just by guest pid:
>>> ] the users of container guest only knew the pid inside containers.
>>> ] This will bring obstacle for trouble shooting.
>>>
>>> A new syscall might complicate trouble shooting by admin.
>>
>> Pure syscall -- yes. What if we teach the ps and top utilities to show additional
>> info? I think that would help.
> 
> I like the idea with low level non-shell API which can be used by
> utility like ps (or implementation of a new tool to work with complex
> namespace hierarchies).  It should fit for troublesooting.  Then there
> should be no reason to implement two different APIs for observation from
> shell via FS and from applications.

Maybe we can reuse the existing kcmp() system call? We would have to store
the collected pid values in some hash/tree anyway, and kcmp() provides us
with a good comparison function for doing this.

For example, we could call kcmp(pid1, pid2, KCMP_PID, nsfd1, nsfd2), which
would mean "are the task with pid1 in the namespace pointed to by nsfd1 and
the task with pid2 in the namespace pointed to by nsfd2 the same?"

What do you think?
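
Just to make the proposal concrete, a rough sketch of how such a call could
look from userspace.  KCMP_PID and the (pid1, pid2, type, nsfd1, nsfd2)
interpretation do not exist in mainline kcmp(2); the constant below is made
up purely for illustration, and on a current kernel the call simply fails
with EINVAL:

/*
 * Hypothetical use of the kcmp() extension proposed above.  KCMP_PID
 * and the ns-fd arguments are NOT part of the mainline kcmp(2) API;
 * they only illustrate the idea, so the value below is made up.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#define KCMP_PID	100	/* hypothetical type, not in <linux/kcmp.h> */

static long kcmp_pid(pid_t pid1, pid_t pid2, int nsfd1, int nsfd2)
{
	return syscall(SYS_kcmp, pid1, pid2, KCMP_PID,
		       (unsigned long)nsfd1, (unsigned long)nsfd2);
}

int main(void)
{
	/* "is pid 11 in the init pidns the same task as pid 2 in the container pidns?" */
	int nsfd1 = open("/proc/1/ns/pid", O_RDONLY);	/* init pid namespace */
	int nsfd2 = open("/proc/11/ns/pid", O_RDONLY);	/* container's pid namespace */

	if (nsfd1 < 0 || nsfd2 < 0) {
		perror("open ns link");
		return 1;
	}
	/* 0 would mean "same task"; today's kernels return -1/EINVAL instead */
	printf("kcmp(KCMP_PID) -> %ld\n", kcmp_pid(11, 2, nsfd1, nsfd2));
	close(nsfd1);
	close(nsfd2);
	return 0;
}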

> However, maybe it is possible to implement not via new syscall but
> by implementation of new symlink in sysfs?  Then both ps-like tool and
> CRIU-like tool is able to obtain the ns information by the same means.
> Maybe sort of a symlink to a parent namespace or a process which is
> inside of the parent namespace?  Then a process may identify IDs using
> following steps:
> 
> 1) identify target NS by walking current procfs
> 2) do setns(2)/chroot(2)
> 3) look at procfs to identify target IDs in the target NS

Can you elaborate on this? Let's imagine we have picked two tasks with
init_pid_ns' PIDs being 11 and 12 and we've found out using /proc/pid/ns/pid
links that they both live in some non-init pid namespace.

Then we have to look at this ns' proc. It says that there are also two 
tasks -- 2 and 3. How can we determine which pid is which?


By the way, calling setns() doesn't change the pids we see in any proc -- the
namespace that proc shows pids for is pinned at proc mount time. Nor can you
mount this other pidns' proc into a temporary location after setns(): the kernel
uses the pidns the task really lives in -- task_active_pid_ns(). But nonetheless,
we have the proc we want.

> It would be impossible to identify foreign IDs for unprivileged
> processes, however.

Yes, that would be a useful limitation.

Thanks,
Pavel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] /proc/pid/status: show all sets of pid according to ns
@ 2014-05-29 11:59                           ` Vasily Kulikov
  0 siblings, 0 replies; 40+ messages in thread
From: Vasily Kulikov @ 2014-05-29 11:59 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: Richard Weinberger, containers, Serge Hallyn, linux-kernel,
	Oleg Nesterov, David Howells, Eric W. Biederman, Andrew Morton,
	Al Viro

On Thu, May 29, 2014 at 15:31 +0400, Pavel Emelyanov wrote:
> On 05/29/2014 03:12 PM, Vasily Kulikov wrote:
> > On Thu, May 29, 2014 at 13:07 +0400, Pavel Emelyanov wrote:
> >> On 05/29/2014 09:59 AM, Vasily Kulikov wrote:
> >>> On Wed, May 28, 2014 at 23:27 +0400, Pavel Emelyanov wrote:
> >>> ] We need a direct method of getting the pid inside containers.
> >>> ] If some issues occurred inside container guest, host user
> >>> ] could not know which process is in trouble just by guest pid:
> >>> ] the users of container guest only knew the pid inside containers.
> >>> ] This will bring obstacle for trouble shooting.
> >>>
> >>> A new syscall might complicate trouble shooting by admin.
> >>
> >> Pure syscall -- yes. What if we teach the ps and top utilities to show additional
> >> info? I think that would help.
> > 
> > I like the idea with low level non-shell API which can be used by
> > utility like ps (or implementation of a new tool to work with complex
> > namespace hierarchies).  It should fit for troublesooting.  Then there
> > should be no reason to implement two different APIs for observation from
> > shell via FS and from applications.
> 
> Maybe we can reuse the existing kcmp() system call? We would have to store
> the collected pid values in some hash/tree anyway, and kcmp() provides us
> good comparing function for doing this.
> 
> Like we can call kcmp(pid1, pid2, KCMP_PID, nsfd1, nsfd2) which will mean
> "Are tasks with pid1 in namespace pointed by nsfd1 and with pid2 in namespace
> nsfd2 the same?"
> 
> What do you think?

kcmp() is not needed, just compare inode numbers:

    # ls -il /proc/{43,self}/ns/mnt
    208182 lrwxrwxrwx 1 root root 0 мая   29 15:52 /proc/43/ns/mnt -> mnt:[4026531856]
    216556 lrwxrwxrwx 1 root root 0 мая   29 15:57 /proc/self/ns/mnt -> mnt:[4026531840]
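
From C the same check is usually done by stat()ing the two /proc/<pid>/ns/*
files and comparing st_ino (and st_dev) -- a minimal sketch, using the pid
namespace links and the same pid 43 as in the listing above:

/* Compare the pid namespaces of task 43 and the caller by the inode
 * (and device) numbers of their /proc/<pid>/ns/pid entries. */
#include <stdio.h>
#include <sys/stat.h>

static int same_ns(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) || stat(b, &sb))
		return -1;			/* no such task, or not readable */
	return sa.st_ino == sb.st_ino && sa.st_dev == sb.st_dev;
}

int main(void)
{
	int same = same_ns("/proc/43/ns/pid", "/proc/self/ns/pid");

	if (same < 0)
		perror("stat");
	else
		printf("same pid namespace: %s\n", same ? "yes" : "no");
	return same < 0;
}

As with the ls output, this only tells whether two tasks share a namespace.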

> > However, maybe it is possible to implement not via new syscall but
> > by implementation of new symlink in sysfs?  Then both ps-like tool and
> > CRIU-like tool is able to obtain the ns information by the same means.
> > Maybe sort of a symlink to a parent namespace or a process which is
> > inside of the parent namespace?  Then a process may identify IDs using
> > following steps:
> > 
> > 1) identify target NS by walking current procfs
> > 2) do setns(2)/chroot(2)
> > 3) look at procfs to identify target IDs in the target NS
> 
> Can you elaborate on this? Let's imagine we have picked two tasks with
> init_pid_ns' PIDs being 11 and 12 and we've found out using /proc/pid/ns/pid
> links that they both live in some non-init pid namespace.
> 
> Then we have to look at this ns' proc. It says that there are also two 
> tasks -- 2 and 3. How can we determine which pid is which?

Oh, right.  My idea is broken.

-- 
Vasily Kulikov
http://www.openwall.com - bringing security into open computing environments

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] /proc/pid/status: show all sets of pid according to ns
@ 2014-05-29 12:53                             ` Pavel Emelyanov
  0 siblings, 0 replies; 40+ messages in thread
From: Pavel Emelyanov @ 2014-05-29 12:53 UTC (permalink / raw)
  To: Vasily Kulikov
  Cc: Richard Weinberger, containers, Serge Hallyn, linux-kernel,
	Oleg Nesterov, David Howells, Eric W. Biederman, Andrew Morton,
	Al Viro

On 05/29/2014 03:59 PM, Vasily Kulikov wrote:
> On Thu, May 29, 2014 at 15:31 +0400, Pavel Emelyanov wrote:
>> On 05/29/2014 03:12 PM, Vasily Kulikov wrote:
>>> On Thu, May 29, 2014 at 13:07 +0400, Pavel Emelyanov wrote:
>>>> On 05/29/2014 09:59 AM, Vasily Kulikov wrote:
>>>>> On Wed, May 28, 2014 at 23:27 +0400, Pavel Emelyanov wrote:
>>>>> ] We need a direct method of getting the pid inside containers.
>>>>> ] If some issues occurred inside container guest, host user
>>>>> ] could not know which process is in trouble just by guest pid:
>>>>> ] the users of container guest only knew the pid inside containers.
>>>>> ] This will bring obstacle for trouble shooting.
>>>>>
>>>>> A new syscall might complicate trouble shooting by admin.
>>>>
>>>> Pure syscall -- yes. What if we teach the ps and top utilities to show additional
>>>> info? I think that would help.
>>>
>>> I like the idea with low level non-shell API which can be used by
>>> utility like ps (or implementation of a new tool to work with complex
>>> namespace hierarchies).  It should fit for troublesooting.  Then there
>>> should be no reason to implement two different APIs for observation from
>>> shell via FS and from applications.
>>
>> Maybe we can reuse the existing kcmp() system call? We would have to store
>> the collected pid values in some hash/tree anyway, and kcmp() provides us
>> good comparing function for doing this.
>>
>> Like we can call kcmp(pid1, pid2, KCMP_PID, nsfd1, nsfd2) which will mean
>> "Are tasks with pid1 in namespace pointed by nsfd1 and with pid2 in namespace
>> nsfd2 the same?"
>>
>> What do you think?
> 
> kcmp() is not needed, just compare inode numbers:
> 
>     # ls -il /proc/{43,self}/ns/mnt
>     208182 lrwxrwxrwx 1 root root 0 мая   29 15:52 /proc/43/ns/mnt -> mnt:[4026531856]
>     216556 lrwxrwxrwx 1 root root 0 мая   29 15:57 /proc/self/ns/mnt -> mnt:[4026531840]

But that's for comparing the namespaces, while I'm proposing kcmp() for
checking PIDs.

Thanks,
Pavel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] /proc/pid/status: show all sets of pid according to ns
@ 2014-05-31  6:07                                 ` Vasily Kulikov
  0 siblings, 0 replies; 40+ messages in thread
From: Vasily Kulikov @ 2014-05-31  6:07 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: Richard Weinberger, containers, Serge Hallyn, linux-kernel,
	Oleg Nesterov, David Howells, Eric W. Biederman, Andrew Morton,
	Al Viro

On Thu, May 29, 2014 at 16:53 +0400, Pavel Emelyanov wrote:
> On 05/29/2014 03:59 PM, Vasily Kulikov wrote:
> > On Thu, May 29, 2014 at 15:31 +0400, Pavel Emelyanov wrote:
> >> On 05/29/2014 03:12 PM, Vasily Kulikov wrote:
> >>> On Thu, May 29, 2014 at 13:07 +0400, Pavel Emelyanov wrote:
> >>>> On 05/29/2014 09:59 AM, Vasily Kulikov wrote:
> >>>>> On Wed, May 28, 2014 at 23:27 +0400, Pavel Emelyanov wrote:
> >>>>> ] We need a direct method of getting the pid inside containers.
> >>>>> ] If some issues occurred inside container guest, host user
> >>>>> ] could not know which process is in trouble just by guest pid:
> >>>>> ] the users of container guest only knew the pid inside containers.
> >>>>> ] This will bring obstacle for trouble shooting.
> >>>>>
> >>>>> A new syscall might complicate trouble shooting by admin.
> >>>>
> >>>> Pure syscall -- yes. What if we teach the ps and top utilities to show additional
> >>>> info? I think that would help.
> >>>
> >>> I like the idea with low level non-shell API which can be used by
> >>> utility like ps (or implementation of a new tool to work with complex
> >>> namespace hierarchies).  It should fit for troublesooting.  Then there
> >>> should be no reason to implement two different APIs for observation from
> >>> shell via FS and from applications.
> >>
> >> Maybe we can reuse the existing kcmp() system call? We would have to store
> >> the collected pid values in some hash/tree anyway, and kcmp() provides us
> >> good comparing function for doing this.
> >>
> >> Like we can call kcmp(pid1, pid2, KCMP_PID, nsfd1, nsfd2) which will mean
> >> "Are tasks with pid1 in namespace pointed by nsfd1 and with pid2 in namespace
> >> nsfd2 the same?"
> >>
> >> What do you think?
> > 
> > kcmp() is not needed, just compare inode numbers:
> > 
> >     # ls -il /proc/{43,self}/ns/mnt
> >     208182 lrwxrwxrwx 1 root root 0 мая   29 15:52 /proc/43/ns/mnt -> mnt:[4026531856]
> >     216556 lrwxrwxrwx 1 root root 0 мая   29 15:57 /proc/self/ns/mnt -> mnt:[4026531840]
> 
> But that's for comparing the namespaces, while I'm proposing the kcmp to
> check for PIDs.

Hm, right.

What about the following solution: export a global process ID (the PID in the
init ns) which is visible inside any namespace.  Then you can compare
numbers regardless of which namespace you are in.

-- 
Vasily Kulikov
http://www.openwall.com - bringing security into open computing environments

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] /proc/pid/status: show all sets of pid according to ns
@ 2014-05-31 20:08                                   ` Eric W. Biederman
  0 siblings, 0 replies; 40+ messages in thread
From: Eric W. Biederman @ 2014-05-31 20:08 UTC (permalink / raw)
  To: Vasily Kulikov
  Cc: Pavel Emelyanov, Richard Weinberger, containers, Serge Hallyn,
	linux-kernel, Oleg Nesterov, David Howells, Andrew Morton,
	Al Viro

Vasily Kulikov <segoon@openwall.com> writes:

> On Thu, May 29, 2014 at 16:53 +0400, Pavel Emelyanov wrote:
>> On 05/29/2014 03:59 PM, Vasily Kulikov wrote:
>> > On Thu, May 29, 2014 at 15:31 +0400, Pavel Emelyanov wrote:
>> >> On 05/29/2014 03:12 PM, Vasily Kulikov wrote:
>> >>> On Thu, May 29, 2014 at 13:07 +0400, Pavel Emelyanov wrote:
>> >>>> On 05/29/2014 09:59 AM, Vasily Kulikov wrote:
>> >>>>> On Wed, May 28, 2014 at 23:27 +0400, Pavel Emelyanov wrote:
>> >>>>> ] We need a direct method of getting the pid inside containers.
>> >>>>> ] If some issues occurred inside container guest, host user
>> >>>>> ] could not know which process is in trouble just by guest pid:
>> >>>>> ] the users of container guest only knew the pid inside containers.
>> >>>>> ] This will bring obstacle for trouble shooting.
>> >>>>>
>> >>>>> A new syscall might complicate trouble shooting by admin.
>> >>>>
>> >>>> Pure syscall -- yes. What if we teach the ps and top utilities to show additional
>> >>>> info? I think that would help.
>> >>>
>> >>> I like the idea with low level non-shell API which can be used by
>> >>> utility like ps (or implementation of a new tool to work with complex
>> >>> namespace hierarchies).  It should fit for troublesooting.  Then there
>> >>> should be no reason to implement two different APIs for observation from
>> >>> shell via FS and from applications.
>> >>
>> >> Maybe we can reuse the existing kcmp() system call? We would have to store
>> >> the collected pid values in some hash/tree anyway, and kcmp() provides us
>> >> good comparing function for doing this.
>> >>
>> >> Like we can call kcmp(pid1, pid2, KCMP_PID, nsfd1, nsfd2) which will mean
>> >> "Are tasks with pid1 in namespace pointed by nsfd1 and with pid2 in namespace
>> >> nsfd2 the same?"
>> >>
>> >> What do you think?
>> > 
>> > kcmp() is not needed, just compare inode numbers:
>> > 
>> >     # ls -il /proc/{43,self}/ns/mnt
>> >     208182 lrwxrwxrwx 1 root root 0 мая   29 15:52 /proc/43/ns/mnt -> mnt:[4026531856]
>> >     216556 lrwxrwxrwx 1 root root 0 мая   29 15:57 /proc/self/ns/mnt -> mnt:[4026531840]
>> 
>> But that's for comparing the namespaces, while I'm proposing the kcmp to
>> check for PIDs.
>
> Hm, right.
>
> What about the following solution: export global process ID (PID in
> init ns) which is visible inside of any namespace.  Then you can compare
> numbers regardless in what namespace you are.

Which then defeats the point of having pid namespaces in the first
place.

How do you get that same global pid after you have migrated your
container?

Eric


^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2014-05-31 20:09 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-28 10:24 [PATCH v2] /proc/pid/status: show all sets of pid according to ns Chen Hanxiao
2014-05-28 10:24 ` Chen Hanxiao
     [not found] ` <1401272683-1659-1-git-send-email-chenhanxiao-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
2014-05-28 12:44   ` Pavel Emelyanov
2014-05-28 12:44     ` Pavel Emelyanov
     [not found]     ` <5385DA19.2060008-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2014-05-28 18:28       ` Vasily Kulikov
2014-05-28 18:28         ` Vasily Kulikov
2014-05-28 19:27         ` Pavel Emelyanov
2014-05-28 19:27         ` Pavel Emelyanov
     [not found]           ` <53863889.9080509-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2014-05-29  5:59             ` Vasily Kulikov
2014-05-29  5:59               ` Vasily Kulikov
2014-05-29  9:07               ` Pavel Emelyanov
2014-05-29  9:07                 ` Pavel Emelyanov
2014-05-29  9:21                 ` Richard Weinberger
     [not found]                   ` <5386FC0C.9000307-/L3Ra7n9ekc@public.gmane.org>
2014-05-29  9:41                     ` Pavel Emelyanov
2014-05-29  9:41                       ` Pavel Emelyanov
     [not found]                       ` <538700B5.5070601-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2014-05-29  9:54                         ` Richard Weinberger
2014-05-29  9:54                           ` Richard Weinberger
     [not found]                           ` <538703D0.7030308-/L3Ra7n9ekc@public.gmane.org>
2014-05-29 10:02                             ` Pavel Emelyanov
2014-05-29 10:02                               ` Pavel Emelyanov
     [not found]                               ` <5387059E.9010105-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2014-05-29 10:19                                 ` Richard Weinberger
2014-05-29 10:19                                   ` Richard Weinberger
     [not found]                                   ` <538709A5.60000-/L3Ra7n9ekc@public.gmane.org>
2014-05-29 10:36                                     ` Pavel Emelyanov
2014-05-29 10:36                                       ` Pavel Emelyanov
2014-05-29  9:53                 ` chenhanxiao
     [not found]                   ` <5871495633F38949900D2BF2DC04883E52A481-ZEd+hNNJ6a5ZYpXjqAkB5jz3u5zwRJJDAzI0kPv9QBlmR6Xm/wNWPw@public.gmane.org>
2014-05-29 10:40                     ` Pavel Emelyanov
2014-05-29 10:40                       ` Pavel Emelyanov
     [not found]                 ` <5386F8EA.8050501-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2014-05-29  9:21                   ` Richard Weinberger
2014-05-29  9:53                   ` chenhanxiao-BthXqXjhjHXQFUHtdCDX3A
2014-05-29 11:12                   ` Vasily Kulikov
2014-05-29 11:12                     ` Vasily Kulikov
2014-05-29 11:31                     ` Pavel Emelyanov
2014-05-29 11:31                       ` Pavel Emelyanov
     [not found]                       ` <53871A92.9000004-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2014-05-29 11:59                         ` Vasily Kulikov
2014-05-29 11:59                           ` Vasily Kulikov
2014-05-29 12:53                           ` Pavel Emelyanov
2014-05-29 12:53                             ` Pavel Emelyanov
     [not found]                             ` <53872DAD.1070502-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2014-05-31  6:07                               ` Vasily Kulikov
2014-05-31  6:07                                 ` Vasily Kulikov
2014-05-31 20:08                                 ` Eric W. Biederman
2014-05-31 20:08                                   ` Eric W. Biederman
