* [PATCH 0/2] namespaces: log namespaces per task
@ 2014-04-22 18:12 ` Richard Guy Briggs
  0 siblings, 0 replies; 58+ messages in thread
From: Richard Guy Briggs @ 2014-04-22 18:12 UTC (permalink / raw)
  To: linux-audit, linux-kernel, containers
  Cc: Richard Guy Briggs, arozansk, serge.hallyn, ebiederm, eparis, sgrubb

I saw no replies to my questions when I replied a year after Aris' posting, so
I don't know whether they were ignored or got lost in stale threads:
        https://www.redhat.com/archives/linux-audit/2013-March/msg00020.html
        https://www.redhat.com/archives/linux-audit/2013-March/msg00033.html
	(https://lists.linux-foundation.org/pipermail/containers/2013-March/032063.html)
        https://www.redhat.com/archives/linux-audit/2014-January/msg00180.html

I've tried to answer a number of questions that were raised in that thread.

The goal is not quite identical to Aris' patchset.

The purpose is to track namespaces in use by logged processes from the
perspective of init_*_ns.  The first patch gives each namespace a serial
number.  The second patch defines a function to log them and calls it from
audit_log_task_info(), which is used by syscall audits, among others.
audit_log_task() and audit_common_recv_message() would be other potential use
cases.

Use a serial number per namespace (unique across one boot of one kernel)
instead of the inode number (for which the right to change it is claimed to
have been reserved, and which is not necessarily unique if there is more than
one proc fs).  It could be argued that the inode numbers have by now become a
de facto interface and can't change, but I'm proposing this approach to see
whether it helps address some of the objections to the earlier patchset.
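
As a rough illustration of the serial-number idea, here is a userspace sketch
(not the kernel code from patch 1).  A counter starts just past the values
reserved for the statically initialised init namespaces and is bumped for each
new namespace, skipping 0 on wraparound.  The name next_ns_serial() and the use
of C11 atomics in place of a spinlock are assumptions made for this example.

```c
#include <assert.h>
#include <stdatomic.h>

/* Serials 1..4 are preassigned to the init IPC, UTS, user and PID namespaces. */
#define NS_RESERVED_SERIALS 4u

/* Hypothetical userspace stand-in for the counter behind ns_serial(). */
static atomic_uint serial = NS_RESERVED_SERIALS;

/* Return the next serial number; 0 is never handed out, even after a wrap. */
unsigned int next_ns_serial(void)
{
	unsigned int ret;

	do {
		/* atomic_fetch_add() returns the old value, so add 1 for the new one. */
		ret = atomic_fetch_add(&serial, 1u) + 1u;
	} while (ret == 0);

	assert(ret != 0);
	return ret;
}
```

The first two calls return 5 and 6.  Uniqueness only holds until the counter
wraps; after that, values can repeat, including the reserved ones.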

Messages could also be added to track the creation and destruction of
namespaces, listing the parent for hierarchical namespaces such as pidns and
userns, and listing other ids for non-hierarchical namespaces, as well as
other information to help identify a namespace.

There has been some progress made for audit in net namespaces and pid
namespaces since that previous thread.  Net namespaces are now served as peers
by one auditd in the init_net namespace, and processes in a non-init_net
namespace can write records if they are in the init_user_ns and have
CAP_AUDIT_WRITE.  Processes in a non-init_pid_ns can now similarly write
records.  As for CAP_AUDIT_READ, I just posted a patchset to check capabilities
of userspace processes that try to join netlink broadcast groups.


Questions:
Is there a way to link serial numbers of namespaces involved in migration of a
container to another kernel?  (I had a brief look at CRIU.)  Is there a unique
identifier for each running instance of a kernel?  Or at least some identifier
within the container migration realm?

What additional events should list this information?

Does this present any kind of information leak?  Only processes with
CAP_AUDIT_CONTROL (and the proposed CAP_AUDIT_READ) in init_user_ns can get to
this information in the init namespace at the moment.


Proposed output format:
This differs slightly from Aristeu's patch because of the label conflict with
"pid=", which arises from including the fields in existing records rather than
in a separate record:
        type=SYSCALL msg=audit(1398112249.996:65): arch=c000003e syscall=272 success=yes exit=0 a0=40000000 a1=ffffffffffffffff a2=0 a3=22 items=0 ppid=1 pid=566 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="(t-daemon)" exe="/usr/lib/systemd/systemd" mntns=5 netns=97 utsns=2 ipcns=1 pidns=4 userns=3 subj=system_u:system_r:init_t:s0 key=(null)
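
To show how a log consumer might pick these fields out of such a record, here
is a minimal userspace sketch.  The helper name audit_field_uint() is
hypothetical and not part of any audit library.  It matches a label only at a
token boundary, so "pid" does not match inside "ppid" or "pidns".  Patch 2
formats the values with %x, so the numeric base is left to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Find "name=" in an audit record and parse its value as an unsigned number
 * in the given base.  Returns 0 on success, -1 if the field is absent or its
 * value is not a number that fits in an unsigned int.
 */
int audit_field_uint(const char *record, const char *name, int base,
		     unsigned int *out)
{
	size_t len;
	const char *p = record;

	assert(record != NULL && name != NULL && out != NULL);
	len = strlen(name);
	if (len == 0)
		return -1;

	while ((p = strstr(p, name)) != NULL) {
		/* Accept the label only at the start of a space-separated token. */
		int at_start = (p == record) || (p[-1] == ' ');

		if (at_start && p[len] == '=') {
			const char *val = p + len + 1;
			char *end;
			unsigned long v;

			errno = 0;
			v = strtoul(val, &end, base);
			if (end == val || errno != 0 || v > UINT_MAX)
				return -1;
			/* The number must run to the end of the token. */
			if (*end != ' ' && *end != '\0')
				return -1;
			*out = (unsigned int)v;
			return 0;
		}
		p += len;
	}
	return -1;
}
```

For the record above, looking up "pid" in base 10 yields 566 and "pidns" yields
4, while a non-numeric field such as "tty" is reported as an error.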


Note: This set does not try to solve the non-init namespace audit messages and
auditd problem yet.  That will come later, likely with additional auditd
instances running in another namespace with a limited ability to influence the
master auditd.  I echo Eric B's idea that messages destined for different
namespaces would have to be tailored for that namespace with references that
make sense (such as the right pid number reported to that pid namespace, and
not leaking info about parents or peers).


Richard Guy Briggs (2):
  namespaces: give each namespace a serial number
  audit: log namespace serial numbers

 fs/mount.h                     |    1 +
 fs/namespace.c                 |    1 +
 include/linux/audit.h          |    7 +++++++
 include/linux/ipc_namespace.h  |    1 +
 include/linux/nsproxy.h        |    8 ++++++++
 include/linux/pid_namespace.h  |    1 +
 include/linux/user_namespace.h |    1 +
 include/linux/utsname.h        |    1 +
 include/net/net_namespace.h    |    1 +
 init/version.c                 |    1 +
 ipc/msgutil.c                  |    1 +
 ipc/namespace.c                |    2 ++
 kernel/audit.c                 |   38 ++++++++++++++++++++++++++++++++++++++
 kernel/nsproxy.c               |   24 ++++++++++++++++++++++++
 kernel/pid.c                   |    1 +
 kernel/pid_namespace.c         |    2 ++
 kernel/user.c                  |    1 +
 kernel/user_namespace.c        |    2 ++
 kernel/utsname.c               |    2 ++
 net/core/net_namespace.c       |    4 +++-
 20 files changed, 99 insertions(+), 1 deletions(-)

* [PATCH 1/2] namespaces: give each namespace a serial number
  2014-04-22 18:12 ` Richard Guy Briggs
@ 2014-04-22 18:12     ` Richard Guy Briggs
  -1 siblings, 0 replies; 58+ messages in thread
From: Richard Guy Briggs @ 2014-04-22 18:12 UTC (permalink / raw)
  To: linux-audit, linux-kernel, containers
  Cc: Richard Guy Briggs, arozansk, serge.hallyn, ebiederm, eparis, sgrubb

Assign a serial number per namespace since boot.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
---
 fs/mount.h                     |    1 +
 fs/namespace.c                 |    1 +
 include/linux/ipc_namespace.h  |    1 +
 include/linux/nsproxy.h        |    8 ++++++++
 include/linux/pid_namespace.h  |    1 +
 include/linux/user_namespace.h |    1 +
 include/linux/utsname.h        |    1 +
 include/net/net_namespace.h    |    1 +
 init/version.c                 |    1 +
 ipc/msgutil.c                  |    1 +
 ipc/namespace.c                |    2 ++
 kernel/nsproxy.c               |   24 ++++++++++++++++++++++++
 kernel/pid.c                   |    1 +
 kernel/pid_namespace.c         |    2 ++
 kernel/user.c                  |    1 +
 kernel/user_namespace.c        |    2 ++
 kernel/utsname.c               |    2 ++
 net/core/net_namespace.c       |    4 +++-
 18 files changed, 54 insertions(+), 1 deletions(-)

diff --git a/fs/mount.h b/fs/mount.h
index b29e42f..23d041b 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -5,6 +5,7 @@
 struct mnt_namespace {
 	atomic_t		count;
 	unsigned int		proc_inum;
+	unsigned int	serial_num;
 	struct mount *	root;
 	struct list_head	list;
 	struct user_namespace	*user_ns;
diff --git a/fs/namespace.c b/fs/namespace.c
index 2ffc5a2..b4a31aa 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2472,6 +2472,7 @@ static struct mnt_namespace *alloc_mnt_ns(struct user_namespace *user_ns)
 		kfree(new_ns);
 		return ERR_PTR(ret);
 	}
+	new_ns->serial_num = ns_serial();
 	new_ns->seq = atomic64_add_return(1, &mnt_ns_seq);
 	atomic_set(&new_ns->count, 1);
 	new_ns->root = NULL;
diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index 35e7eca..ee1444f 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -69,6 +69,7 @@ struct ipc_namespace {
 	struct user_namespace *user_ns;
 
 	unsigned int	proc_inum;
+	unsigned int	serial_num;
 };
 
 extern struct ipc_namespace init_ipc_ns;
diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h
index b4ec59d..12e1250 100644
--- a/include/linux/nsproxy.h
+++ b/include/linux/nsproxy.h
@@ -66,6 +66,14 @@ static inline struct nsproxy *task_nsproxy(struct task_struct *tsk)
 	return rcu_dereference(tsk->nsproxy);
 }
 
+unsigned int ns_serial(void);
+enum {
+	NS_IPC_INIT_SN	= 1,
+	NS_UTS_INIT_SN	= 2,
+	NS_USER_INIT_SN	= 3,
+	NS_PID_INIT_SN	= 4,
+};
+
 int copy_namespaces(unsigned long flags, struct task_struct *tsk);
 void exit_task_namespaces(struct task_struct *tsk);
 void switch_task_namespaces(struct task_struct *tsk, struct nsproxy *new);
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 7246ef3..4606e8c 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -43,6 +43,7 @@ struct pid_namespace {
 	int hide_pid;
 	int reboot;	/* group exit code if this pidns was rebooted */
 	unsigned int proc_inum;
+	unsigned int	serial_num;
 };
 
 extern struct pid_namespace init_pid_ns;
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 4836ba3..c70ed6b 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -27,6 +27,7 @@ struct user_namespace {
 	kuid_t			owner;
 	kgid_t			group;
 	unsigned int		proc_inum;
+	unsigned int	serial_num;
 
 	/* Register of per-UID persistent keyrings for this namespace */
 #ifdef CONFIG_PERSISTENT_KEYRINGS
diff --git a/include/linux/utsname.h b/include/linux/utsname.h
index 239e277..ccee588 100644
--- a/include/linux/utsname.h
+++ b/include/linux/utsname.h
@@ -24,6 +24,7 @@ struct uts_namespace {
 	struct new_utsname name;
 	struct user_namespace *user_ns;
 	unsigned int proc_inum;
+	unsigned int	serial_num;
 };
 extern struct uts_namespace init_uts_ns;
 
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 991dcd9..1d2e2a5 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -59,6 +59,7 @@ struct net {
 	struct user_namespace   *user_ns;	/* Owning user namespace */
 
 	unsigned int		proc_inum;
+	unsigned int	serial_num;
 
 	struct proc_dir_entry 	*proc_net;
 	struct proc_dir_entry 	*proc_net_stat;
diff --git a/init/version.c b/init/version.c
index 1a4718e..cfdcb85 100644
--- a/init/version.c
+++ b/init/version.c
@@ -36,6 +36,7 @@ struct uts_namespace init_uts_ns = {
 	},
 	.user_ns = &init_user_ns,
 	.proc_inum = PROC_UTS_INIT_INO,
+	.serial_num = NS_UTS_INIT_SN /* ns_serial() */,
 };
 EXPORT_SYMBOL_GPL(init_uts_ns);
 
diff --git a/ipc/msgutil.c b/ipc/msgutil.c
index 7e70959..9aa66ae 100644
--- a/ipc/msgutil.c
+++ b/ipc/msgutil.c
@@ -32,6 +32,7 @@ struct ipc_namespace init_ipc_ns = {
 	.count		= ATOMIC_INIT(1),
 	.user_ns = &init_user_ns,
 	.proc_inum = PROC_IPC_INIT_INO,
+	.serial_num = NS_IPC_INIT_SN /* ns_serial() */,
 };
 
 atomic_t nr_ipc_ns = ATOMIC_INIT(1);
diff --git a/ipc/namespace.c b/ipc/namespace.c
index 59451c1..76dac5c 100644
--- a/ipc/namespace.c
+++ b/ipc/namespace.c
@@ -41,6 +41,8 @@ static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns,
 	}
 	atomic_inc(&nr_ipc_ns);
 
+	ns->serial_num = ns_serial();
+
 	sem_init_ns(ns);
 	msg_init_ns(ns);
 	shm_init_ns(ns);
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index 8e78110..212fa63 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -41,6 +41,30 @@ struct nsproxy init_nsproxy = {
 #endif
 };
 
+/**
+ * ns_serial - compute a serial number for the namespace
+ *
+ * Compute a serial number for the namespace to uniquely identify it in
+ * audit records.
+ */
+unsigned int ns_serial(void)
+{
+	static DEFINE_SPINLOCK(serial_lock);
+	static unsigned int serial = 4; /* reserved for IPC, UTS, user, PID */
+
+	unsigned long flags;
+	unsigned int ret;
+
+	spin_lock_irqsave(&serial_lock, flags);
+	do {
+		ret = ++serial;
+	} while (unlikely(!ret));
+	spin_unlock_irqrestore(&serial_lock, flags);
+
+	return ret;
+}
+
+
 static inline struct nsproxy *create_nsproxy(void)
 {
 	struct nsproxy *nsproxy;
diff --git a/kernel/pid.c b/kernel/pid.c
index 9b9a266..3bf7127 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -80,6 +80,7 @@ struct pid_namespace init_pid_ns = {
 	.child_reaper = &init_task,
 	.user_ns = &init_user_ns,
 	.proc_inum = PROC_PID_INIT_INO,
+	.serial_num = NS_PID_INIT_SN /* ns_serial() */,
 };
 EXPORT_SYMBOL_GPL(init_pid_ns);
 
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 06c62de..c24f207 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -109,6 +109,8 @@ static struct pid_namespace *create_pid_namespace(struct user_namespace *user_ns
 	if (err)
 		goto out_free_map;
 
+	ns->serial_num = ns_serial();
+
 	kref_init(&ns->kref);
 	ns->level = level;
 	ns->parent = get_pid_ns(parent_pid_ns);
diff --git a/kernel/user.c b/kernel/user.c
index c006131..fb16754 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -51,6 +51,7 @@ struct user_namespace init_user_ns = {
 	.owner = GLOBAL_ROOT_UID,
 	.group = GLOBAL_ROOT_GID,
 	.proc_inum = PROC_USER_INIT_INO,
+	.serial_num = NS_USER_INIT_SN /* ns_serial() */,
 #ifdef CONFIG_PERSISTENT_KEYRINGS
 	.persistent_keyring_register_sem =
 	__RWSEM_INITIALIZER(init_user_ns.persistent_keyring_register_sem),
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index dd06439..750241c 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -92,6 +92,8 @@ int create_user_ns(struct cred *new)
 		return ret;
 	}
 
+	ns->serial_num = ns_serial();
+
 	atomic_set(&ns->count, 1);
 	/* Leave the new->user_ns reference with the new user namespace. */
 	ns->parent = parent_ns;
diff --git a/kernel/utsname.c b/kernel/utsname.c
index fd39312..74fa737 100644
--- a/kernel/utsname.c
+++ b/kernel/utsname.c
@@ -47,6 +47,8 @@ static struct uts_namespace *clone_uts_ns(struct user_namespace *user_ns,
 		kfree(ns);
 		return ERR_PTR(err);
 	}
+
+	ns->serial_num = ns_serial();
 
 	down_read(&uts_sem);
 	memcpy(&ns->name, &old_ns->name, sizeof(ns->name));
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 81d3a9a..e0b8528 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -104,8 +104,10 @@ static int ops_init(const struct pernet_operations *ops, struct net *net)
 	err = 0;
 	if (ops->init)
 		err = ops->init(net);
-	if (!err)
+	if (!err) {
+		net->serial_num = ns_serial();
 		return 0;
+	}
 
 cleanup:
 	kfree(data);
-- 
1.7.1

* [PATCH 2/2] audit: log namespace serial numbers
  2014-04-22 18:12 ` Richard Guy Briggs
@ 2014-04-22 18:12     ` Richard Guy Briggs
  -1 siblings, 0 replies; 58+ messages in thread
From: Richard Guy Briggs @ 2014-04-22 18:12 UTC (permalink / raw)
  To: linux-audit, linux-kernel, containers
  Cc: Richard Guy Briggs, arozansk, serge.hallyn, ebiederm, eparis, sgrubb

Log the namespace details of a task.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
---
 include/linux/audit.h |    7 +++++++
 kernel/audit.c        |   38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+), 0 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 22cfddb..0ef404a 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -101,6 +101,13 @@ extern int __weak audit_classify_compat_syscall(int abi, unsigned syscall);
 struct filename;
 
 extern void audit_log_session_info(struct audit_buffer *ab);
+#ifdef CONFIG_NAMESPACES
+extern void audit_log_namespace_info(struct audit_buffer *ab, struct task_struct *tsk);
+#else
+static inline void audit_log_namespace_info(struct audit_buffer *ab, struct task_struct *tsk)
+{
+}
+#endif
 
 #ifdef CONFIG_AUDIT_COMPAT_GENERIC
 #define audit_is_compat(arch)  (!((arch) & __AUDIT_ARCH_64BIT))
diff --git a/kernel/audit.c b/kernel/audit.c
index 59c0bbe..9049049 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -64,7 +64,15 @@
 #endif
 #include <linux/freezer.h>
 #include <linux/tty.h>
+#include <linux/nsproxy.h>
+#include <linux/utsname.h>
+#include <linux/ipc_namespace.h>
+#include "../fs/mount.h"
+#include <linux/mount.h>
+#include <linux/mnt_namespace.h>
 #include <linux/pid_namespace.h>
+#include <net/net_namespace.h>
+#include <linux/user_namespace.h>
 #include <net/netns/generic.h>
 
 #include "audit.h"
@@ -1617,6 +1625,35 @@ void audit_log_session_info(struct audit_buffer *ab)
 	audit_log_format(ab, " auid=%u ses=%u", auid, sessionid);
 }
 
+#ifdef CONFIG_NAMESPACES
+void audit_log_namespace_info(struct audit_buffer *ab, struct task_struct *tsk)
+{
+	struct nsproxy *nsproxy;
+
+	rcu_read_lock();
+	nsproxy = task_nsproxy(tsk);
+	if (nsproxy != NULL) {
+		audit_log_format(ab, " mntns=%x", nsproxy->mnt_ns->serial_num);
+#ifdef CONFIG_NET_NS
+		audit_log_format(ab, " netns=%x", nsproxy->net_ns->serial_num);
+#endif
+#ifdef CONFIG_UTS_NS
+		audit_log_format(ab, " utsns=%x", nsproxy->uts_ns->serial_num);
+#endif
+#ifdef CONFIG_IPC_NS
+		audit_log_format(ab, " ipcns=%x", nsproxy->ipc_ns->serial_num);
+#endif
+	}
+#ifdef CONFIG_PID_NS
+	audit_log_format(ab, " pidns=%x", task_active_pid_ns(tsk)->serial_num);
+#endif
+#ifdef CONFIG_USER_NS
+	audit_log_format(ab, " userns=%x", task_cred_xxx(tsk, user_ns)->serial_num);
+#endif
+	rcu_read_unlock();
+}
+#endif /* CONFIG_NAMESPACES */
+
 void audit_log_key(struct audit_buffer *ab, char *key)
 {
 	audit_log_format(ab, " key=");
@@ -1861,6 +1898,7 @@ void audit_log_task_info(struct audit_buffer *ab, struct task_struct *tsk)
 		up_read(&mm->mmap_sem);
 	} else
 		audit_log_format(ab, " exe=(null)");
+	audit_log_namespace_info(ab, tsk);
 	audit_log_task_context(ab);
 }
 EXPORT_SYMBOL(audit_log_task_info);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] namespaces: log namespaces per task
  2014-04-22 18:12 ` Richard Guy Briggs
@ 2014-05-01 22:32     ` Serge E. Hallyn
  -1 siblings, 0 replies; 58+ messages in thread
From: Serge E. Hallyn @ 2014-05-01 22:32 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: linux-audit, linux-kernel, containers, serge.hallyn, eparis,
	sgrubb, ebiederm

Quoting Richard Guy Briggs (rgb@redhat.com):
> I saw no replies to my questions when I replied a year after Aris' posting, so
> I don't know if it was ignored or got lost in stale threads:
>         https://www.redhat.com/archives/linux-audit/2013-March/msg00020.html
>         https://www.redhat.com/archives/linux-audit/2013-March/msg00033.html
> 	(https://lists.linux-foundation.org/pipermail/containers/2013-March/032063.html)
>         https://www.redhat.com/archives/linux-audit/2014-January/msg00180.html
> 
> I've tried to answer a number of questions that were raised in that thread.
> 
> The goal is not quite identical to Aris' patchset.
> 
> The purpose is to track namespaces in use by logged processes from the
> perspective of init_*_ns.  The first patch defines a function to list them.
> The second patch provides an example of usage for audit_log_task_info() which
> is used by syscall audits, among others.  audit_log_task() and
> audit_common_recv_message() would be other potential use cases.
> 
> Use a serial number per namespace (unique across one boot of one kernel)
> instead of the inode number (which is claimed to have had the right to change
> reserved and is not necessarily unique if there is more than one proc fs).  It
> could be argued that the inode numbers have now become a de facto interface and
> can't change now, but I'm proposing this approach to see if this helps address
> some of the objections to the earlier patchset.
> 
> Messages could also be added to track the creation and the destruction
> of namespaces, listing the parent for hierarchical namespaces such as pidns,
> userns, and listing other ids for non-hierarchical namespaces, as well as other
> information to help identify a namespace.
> 
> There has been some progress made for audit in net namespaces and pid
> namespaces since this previous thread.  net namespaces are now served as peers
> by one auditd in the init_net namespace with processes in a non-init_net
> namespace being able to write records if they are in the init_user_ns and have
> CAP_AUDIT_WRITE.  Processes in a non-init_pid_ns can now similarly write
> records.  As for CAP_AUDIT_READ, I just posted a patchset to check capabilities
> of userspace processes that try to join netlink broadcast groups.
> 
> 
> Questions:
> Is there a way to link serial numbers of namespaces involved in migration of a
> container to another kernel?  (I had a brief look at CRIU.)  Is there a unique
> identifier for each running instance of a kernel?  Or at least some identifier
> within the container migration realm?

Eric Biederman has always been adamantly opposed to adding new namespaces
of namespaces, so the fact that you're asking this question concerns me.

The way things are right now, since audit belongs to the init userns,
we can get away with saying if a container 'migrates', the new kernel
will see a different set of serials, and no one should care.  However,
if we're going to be allowing containers to have their own audit
namespace/layer/whatever, then this becomes more of a concern.

That said, I'll now look at the patches while pretending that problem
does not exist :)  If I ack, it'll be on correctness of the code, but
we'll still have to deal with this issue.

> What additional events should list this information?
> 
> Does this present any kind of information leak?  Only CAP_AUDIT_CONTROL (and
> proposed CAP_AUDIT_READ) in init_user_ns can get to this information in the
> init namespace at the moment.
> 
> 
> Proposed output format:
> This differs slightly from Aristeu's patch because of the label conflict with
> "pid=" due to including it in existing records rather than it being a separate
> record:
>         type=SYSCALL msg=audit(1398112249.996:65): arch=c000003e syscall=272 success=yes exit=0 a0=40000000 a1=ffffffffffffffff a2=0 a3=22 items=0 ppid=1 pid=566 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="(t-daemon)" exe="/usr/lib/systemd/systemd" mntns=5 netns=97 utsns=2 ipcns=1 pidns=4 userns=3 subj=system_u:system_r:init_t:s0 key=(null)
> 
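
For anyone consuming records in the proposed format, here is a rough userspace sketch of pulling one namespace serial out of a record line. It assumes the values are hexadecimal, since patch 2/2 prints them with "%x" (so "netns=97" would mean 0x97). demo_ns_field() is a hypothetical helper, not part of any audit library, and the naive substring search ignores the possibility of a matching string inside a quoted field.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look for " <key>=" in an audit record and parse what follows as a
 * hexadecimal number.  Returns 0 on success, -1 if the key is absent,
 * too long for the scratch buffer, or not followed by a hex digit.
 */
static int demo_ns_field(const char *record, const char *key,
			 unsigned int *out)
{
	char pattern[32];
	const char *p;
	char *end;
	unsigned long val;
	int n;

	assert(record != NULL && key != NULL && out != NULL);

	/* The leading space keeps "pid" from matching inside "ppid". */
	n = snprintf(pattern, sizeof(pattern), " %s=", key);
	if (n < 0 || n >= (int)sizeof(pattern))
		return -1;

	p = strstr(record, pattern);
	if (p == NULL)
		return -1;
	p += strlen(pattern);

	val = strtoul(p, &end, 16);
	if (end == p)
		return -1;	/* no digits after '=' */

	*out = (unsigned int)val;
	return 0;
}
```

Run against the sample record above, the "netns" lookup yields 0x97 (151 decimal) and a lookup for a key that is not present returns -1.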
> 
> Note: This set does not try to solve the non-init namespace audit messages and
> auditd problem yet.  That will come later, likely with additional auditd
> instances running in another namespace with a limited ability to influence the
> master auditd.  I echo Eric B's idea that messages destined for different
> namespaces would have to be tailored for that namespace with references that
> make sense (such as the right pid number reported to that pid namespace, and
> not leaking info about parents or peers).
> 
> 
> Richard Guy Briggs (2):
>   namespaces: give each namespace a serial number
>   audit: log namespace serial numbers
> 
>  fs/mount.h                     |    1 +
>  fs/namespace.c                 |    1 +
>  include/linux/audit.h          |    7 +++++++
>  include/linux/ipc_namespace.h  |    1 +
>  include/linux/nsproxy.h        |    8 ++++++++
>  include/linux/pid_namespace.h  |    1 +
>  include/linux/user_namespace.h |    1 +
>  include/linux/utsname.h        |    1 +
>  include/net/net_namespace.h    |    1 +
>  init/version.c                 |    1 +
>  ipc/msgutil.c                  |    1 +
>  ipc/namespace.c                |    2 ++
>  kernel/audit.c                 |   38 ++++++++++++++++++++++++++++++++++++++
>  kernel/nsproxy.c               |   24 ++++++++++++++++++++++++
>  kernel/pid.c                   |    1 +
>  kernel/pid_namespace.c         |    2 ++
>  kernel/user.c                  |    1 +
>  kernel/user_namespace.c        |    2 ++
>  kernel/utsname.c               |    2 ++
>  net/core/net_namespace.c       |    4 +++-
>  20 files changed, 99 insertions(+), 1 deletions(-)
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] namespaces: give each namespace a serial number
  2014-04-22 18:12     ` Richard Guy Briggs
@ 2014-05-01 22:51         ` Serge E. Hallyn
  -1 siblings, 0 replies; 58+ messages in thread
From: Serge E. Hallyn @ 2014-05-01 22:51 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: linux-audit, linux-kernel, containers, serge.hallyn, eparis,
	sgrubb, ebiederm

Quoting Richard Guy Briggs (rgb@redhat.com):

Most of this looks reasonable, but I'm curious about something,

> +/**
> + * ns_serial - compute a serial number for the namespace
> + *
> + * Compute a serial number for the namespace to uniquely identify it in
> + * audit records.
> + */
> +unsigned int ns_serial(void)
> +{
> +	static DEFINE_SPINLOCK(serial_lock);
> +	static unsigned int serial = 4; /* reserved for IPC, UTS, user, PID */
> +
> +	unsigned long flags;
> +	unsigned int ret;
> +
> +	spin_lock_irqsave(&serial_lock, flags);
> +	do {
> +		ret = ++serial;
> +	} while (unlikely(!ret));

Why exactly are you doing this?  Surely if serial is going to
wrap around we've got a bigger problem than just wanting to
bump one more time?

> +	spin_unlock_irqrestore(&serial_lock, flags);
> +
> +	return ret;
> +}

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/2] audit: log namespace serial numbers
  2014-04-22 18:12     ` Richard Guy Briggs
@ 2014-05-01 23:01         ` Serge E. Hallyn
  -1 siblings, 0 replies; 58+ messages in thread
From: Serge E. Hallyn @ 2014-05-01 23:01 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: linux-audit, linux-kernel, containers, serge.hallyn, eparis,
	sgrubb, ebiederm

Quoting Richard Guy Briggs (rgb@redhat.com):
> Log the namespace details of a task.
> 
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

Looks good, and does look useful.  We'll certainly want to also dump
netns and userns for some target objects eventually.

> ---
>  include/linux/audit.h |    7 +++++++
>  kernel/audit.c        |   38 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 45 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index 22cfddb..0ef404a 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -101,6 +101,13 @@ extern int __weak audit_classify_compat_syscall(int abi, unsigned syscall);
>  struct filename;
>  
>  extern void audit_log_session_info(struct audit_buffer *ab);
> +#ifdef CONFIG_NAMESPACES
> +extern void audit_log_namespace_info(struct audit_buffer *ab, struct task_struct *tsk);
> +#else
> +void audit_log_namespace_info(struct audit_buffer *ab, struct task_struct *tsk)
> +{
> +}
> +#endif
>  
>  #ifdef CONFIG_AUDIT_COMPAT_GENERIC
>  #define audit_is_compat(arch)  (!((arch) & __AUDIT_ARCH_64BIT))
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 59c0bbe..9049049 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -64,7 +64,15 @@
>  #endif
>  #include <linux/freezer.h>
>  #include <linux/tty.h>
> +#include <linux/nsproxy.h>
> +#include <linux/utsname.h>
> +#include <linux/ipc_namespace.h>
> +#include "../fs/mount.h"
> +#include <linux/mount.h>
> +#include <linux/mnt_namespace.h>
>  #include <linux/pid_namespace.h>
> +#include <net/net_namespace.h>
> +#include <linux/user_namespace.h>
>  #include <net/netns/generic.h>
>  
>  #include "audit.h"
> @@ -1617,6 +1625,35 @@ void audit_log_session_info(struct audit_buffer *ab)
>  	audit_log_format(ab, " auid=%u ses=%u", auid, sessionid);
>  }
>  
> +#ifdef CONFIG_NAMESPACES
> +void audit_log_namespace_info(struct audit_buffer *ab, struct task_struct *tsk)
> +{
> +	struct nsproxy *nsproxy;
> +
> +	rcu_read_lock();
> +	nsproxy = task_nsproxy(tsk);
> +	if (nsproxy != NULL) {
> +		audit_log_format(ab, " mntns=%x", nsproxy->mnt_ns->serial_num);
> +#ifdef CONFIG_NET_NS
> +		audit_log_format(ab, " netns=%x", nsproxy->net_ns->serial_num);
> +#endif
> +#ifdef CONFIG_UTS_NS
> +		audit_log_format(ab, " utsns=%x", nsproxy->uts_ns->serial_num);
> +#endif
> +#ifdef CONFIG_IPC_NS
> +		audit_log_format(ab, " ipcns=%x", nsproxy->ipc_ns->serial_num);
> +#endif
> +	}
> +#ifdef CONFIG_PID_NS
> +	audit_log_format(ab, " pidns=%x", task_active_pid_ns(tsk)->serial_num);
> +#endif
> +#ifdef CONFIG_USER_NS
> +	audit_log_format(ab, " userns=%x", task_cred_xxx(tsk, user_ns)->serial_num);
> +#endif
> +	rcu_read_unlock();
> +}
> +#endif /* CONFIG_NAMESPACES */
> +
>  void audit_log_key(struct audit_buffer *ab, char *key)
>  {
>  	audit_log_format(ab, " key=");
> @@ -1861,6 +1898,7 @@ void audit_log_task_info(struct audit_buffer *ab, struct task_struct *tsk)
>  		up_read(&mm->mmap_sem);
>  	} else
>  		audit_log_format(ab, " exe=(null)");
> +	audit_log_namespace_info(ab, tsk);
>  	audit_log_task_context(ab);
>  }
>  EXPORT_SYMBOL(audit_log_task_info);
> -- 
> 1.7.1
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] namespaces: give each namespace a serial number
  2014-05-01 22:51         ` Serge E. Hallyn
@ 2014-05-02 14:15             ` Richard Guy Briggs
  -1 siblings, 0 replies; 58+ messages in thread
From: Richard Guy Briggs @ 2014-05-02 14:15 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: linux-audit, linux-kernel, containers, serge.hallyn, eparis,
	sgrubb, ebiederm

On 14/05/02, Serge E. Hallyn wrote:
> Quoting Richard Guy Briggs (rgb@redhat.com):
> 
> Most of this looks reasonable, but I'm curious about something,
> 
> > +/**
> > + * ns_serial - compute a serial number for the namespace
> > + *
> > + * Compute a serial number for the namespace to uniquely identify it in
> > + * audit records.
> > + */
> > +unsigned int ns_serial(void)
> > +{
> > +	static DEFINE_SPINLOCK(serial_lock);
> > +	static unsigned int serial = 4; /* reserved for IPC, UTS, user, PID */
> > +
> > +	unsigned long flags;
> > +	unsigned int ret;
> > +
> > +	spin_lock_irqsave(&serial_lock, flags);
> > +	do {
> > +		ret = ++serial;
> > +	} while (unlikely(!ret));
> 
> Why exactly are you doing this?  Surely if serial is going to
> wrap around we've got a bigger problem than just wanting to
> bump one more time?

Thanks for catching this.
The code was templated off audit_serial(), which tries to solve a
different problem and where rollover is much more likely.  I hadn't
noticed that rollover protection.  However, I *had* thought of making it
a long (which would be the same size on 32-bit arches, but larger on
64-bit), since a 64-bit system is more likely to roll it over through
sheer speed and resource availability.  But perhaps a long long would be
safer.
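
As a rough illustration of the "long long" idea, here is a userspace sketch using C11 atomics in place of the spinlock. It is not the posted patch. The demo_* names and DEMO_RESERVED_SERIALS are hypothetical; in the kernel the likely analogue would be an atomic64_t with atomic64_inc_return(), which is an assumption about how one might write it, not something proposed in this thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Mirrors "serial = 4" in the quoted patch: 1..4 are treated as reserved. */
#define DEMO_RESERVED_SERIALS 4

static _Atomic uint64_t demo_serial = DEMO_RESERVED_SERIALS;

/* Hand out the next serial; the atomic add replaces lock/increment/unlock. */
static uint64_t demo_ns_serial(void)
{
	/* atomic_fetch_add returns the old value, so add 1 for the new one. */
	return atomic_fetch_add(&demo_serial, 1) + 1;
}

/* Years until a 64-bit counter wraps at a steady allocation rate. */
static double demo_years_to_wrap(double allocations_per_second)
{
	const double two_pow_64 = 18446744073709551616.0;
	const double seconds_per_year = 365.25 * 24.0 * 3600.0;

	assert(allocations_per_second > 0.0);
	return two_pow_64 / allocations_per_second / seconds_per_year;
}
```

At a hypothetical one million namespace creations per second, a 32-bit counter wraps in a little over an hour (2^32 / 10^6 is about 4295 seconds), while the 64-bit counter lasts on the order of half a million years. That would make the zero-skipping loop unnecessary.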

> > +	spin_unlock_irqrestore(&serial_lock, flags);
> > +
> > +	return ret;
> > +}

- RGB

--
Richard Guy Briggs <rbriggs@redhat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] namespaces: log namespaces per task
  2014-05-01 22:32     ` Serge E. Hallyn
@ 2014-05-02 14:28         ` Richard Guy Briggs
  -1 siblings, 0 replies; 58+ messages in thread
From: Richard Guy Briggs @ 2014-05-02 14:28 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: linux-audit, linux-kernel, containers, serge.hallyn, eparis,
	sgrubb, ebiederm

On 14/05/02, Serge E. Hallyn wrote:
> Quoting Richard Guy Briggs (rgb@redhat.com):
> > I saw no replies to my questions when I replied a year after Aris' posting, so
> > I don't know if it was ignored or got lost in stale threads:
> >         https://www.redhat.com/archives/linux-audit/2013-March/msg00020.html
> >         https://www.redhat.com/archives/linux-audit/2013-March/msg00033.html
> > 	(https://lists.linux-foundation.org/pipermail/containers/2013-March/032063.html)
> >         https://www.redhat.com/archives/linux-audit/2014-January/msg00180.html
> > 
> > I've tried to answer a number of questions that were raised in that thread.
> > 
> > The goal is not quite identical to Aris' patchset.
> > 
> > The purpose is to track namespaces in use by logged processes from the
> > perspective of init_*_ns.  The first patch defines a function to list them.
> > The second patch provides an example of usage for audit_log_task_info() which
> > is used by syscall audits, among others.  audit_log_task() and
> > audit_common_recv_message() would be other potential use cases.
> > 
> > Use a serial number per namespace (unique across one boot of one kernel)
> > instead of the inode number (for which the right to change is said to have
> > been reserved, and which is not necessarily unique if there is more than one
> > proc fs).  It could be argued that the inode numbers have now become a de facto
> > interface and can't change now, but I'm proposing this approach to see if this
> > helps address some of the objections to the earlier patchset.
> > 
> > Messages could also be added to track the creation and the destruction
> > of namespaces, listing the parent for hierarchical namespaces such as pidns,
> > userns, and listing other ids for non-hierarchical namespaces, as well as other
> > information to help identify a namespace.
> > 
> > There has been some progress made for audit in net namespaces and pid
> > namespaces since this previous thread.  net namespaces are now served as peers
> > by one auditd in the init_net namespace with processes in a non-init_net
> > namespace being able to write records if they are in the init_user_ns and have
> > CAP_AUDIT_WRITE.  Processes in a non-init_pid_ns can now similarly write
> > records.  As for CAP_AUDIT_READ, I just posted a patchset to check capabilities
> > of userspace processes that try to join netlink broadcast groups.
> > 
> > 
> > Questions:
> > Is there a way to link serial numbers of namespaces involved in migration of a
> > container to another kernel?  (I had a brief look at CRIU.)  Is there a unique
> > identifier for each running instance of a kernel?  Or at least some identifier
> > within the container migration realm?
> 
> Eric Biederman has always been adamantly opposed to adding new namespaces
> of namespaces, so the fact that you're asking this question concerns me.

I have seen that position and I don't fully understand the justification
for it other than added complexity.

One way that occurred to me to be able to identify a kernel instance was
to look at CPU serial numbers or other CPU entity intended to be
globally unique, but that isn't universally available.

Another possibility was RTC reading at time of boot, but that isn't good
enough either.

Both are dubious in VMs anyways.

> The way things are right now, since audit belongs to the init userns,
> we can get away with saying if a container 'migrates', the new kernel
> will see a different set of serials, and no one should care.  However,
> if we're going to be allowing containers to have their own audit
> namespace/layer/whatever, then this becomes more of a concern.

Having a container have its own audit daemon (partitioned appropriately
in the kernel) would be a long-term goal.

> That said, I'll now look at the patches while pretending that problem
> does not exist :)  If I ack, it'll be on correctness of the code, but
> we'll still have to deal with this issue.

Getting some discussion about this migration challenge was a significant
motivation for posting this patch, so I'm hoping others will weigh in.

Thanks for your review, Serge.

> > What additional events should list this information?
> > 
> > Does this present any kind of information leak?  Only CAP_AUDIT_CONTROL (and
> > proposed CAP_AUDIT_READ) in init_user_ns can get to this information in the
> > init namespace at the moment.
> > 
> > 
> > Proposed output format:
> > This differs slightly from Aristeu's patch because of the label conflict with
> > "pid=" due to including it in existing records rather than it being a separate
> > record:
> >         type=SYSCALL msg=audit(1398112249.996:65): arch=c000003e syscall=272 success=yes exit=0 a0=40000000 a1=ffffffffffffffff a2=0 a3=22 items=0 ppid=1 pid=566 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="(t-daemon)" exe="/usr/lib/systemd/systemd" mntns=5 netns=97 utsns=2 ipcns=1 pidns=4 userns=3 subj=system_u:system_r:init_t:s0 key=(null)
> > 
> > 
> > Note: This set does not try to solve the non-init namespace audit messages and
> > auditd problem yet.  That will come later, likely with additional auditd
> > instances running in another namespace with a limited ability to influence the
> > master auditd.  I echo Eric B's idea that messages destined for different
> > namespaces would have to be tailored for that namespace with references that
> > make sense (such as the right pid number reported to that pid namespace, and
> > not leaking info about parents or peers).
> > 
> > 
> > Richard Guy Briggs (2):
> >   namespaces: give each namespace a serial number
> >   audit: log namespace serial numbers
> > 
> >  fs/mount.h                     |    1 +
> >  fs/namespace.c                 |    1 +
> >  include/linux/audit.h          |    7 +++++++
> >  include/linux/ipc_namespace.h  |    1 +
> >  include/linux/nsproxy.h        |    8 ++++++++
> >  include/linux/pid_namespace.h  |    1 +
> >  include/linux/user_namespace.h |    1 +
> >  include/linux/utsname.h        |    1 +
> >  include/net/net_namespace.h    |    1 +
> >  init/version.c                 |    1 +
> >  ipc/msgutil.c                  |    1 +
> >  ipc/namespace.c                |    2 ++
> >  kernel/audit.c                 |   38 ++++++++++++++++++++++++++++++++++++++
> >  kernel/nsproxy.c               |   24 ++++++++++++++++++++++++
> >  kernel/pid.c                   |    1 +
> >  kernel/pid_namespace.c         |    2 ++
> >  kernel/user.c                  |    1 +
> >  kernel/user_namespace.c        |    2 ++
> >  kernel/utsname.c               |    2 ++
> >  net/core/net_namespace.c       |    4 +++-
> >  20 files changed, 99 insertions(+), 1 deletions(-)
> > 
> > _______________________________________________
> > Containers mailing list
> > Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> > https://lists.linuxfoundation.org/mailman/listinfo/containers
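
For illustration, the namespace fields in the proposed record format quoted
above (mntns=, netns=, utsns=, ipcns=, pidns=, userns=) could be pulled out
by a log consumer roughly as follows.  This is a minimal userspace sketch,
not part of the patchset: the function name audit_field_u64() is
hypothetical, and it assumes fields are separated by single spaces as in the
sample record.  It matches a key only at the start of a field, so looking up
"pid" does not accidentally hit "ppid=".

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: look up "key=<unsigned decimal>" in a
 * space-separated audit record.  Returns 0 and stores the value in *out
 * on success, or -1 if the key is absent or its value is not numeric.
 */
int audit_field_u64(const char *rec, const char *key, unsigned long long *out)
{
	size_t klen = strlen(key);
	const char *p = rec;

	if (klen == 0)
		return -1;

	while ((p = strstr(p, key)) != NULL) {
		/* Only accept a match that begins a field ("pid", not "ppid"). */
		int at_field_start = (p == rec) || (p[-1] == ' ');

		if (at_field_start && p[klen] == '=') {
			const char *val = p + klen + 1;
			char *end;
			unsigned long long v;

			/* Reject values such as "(none)" or negative numbers. */
			if (*val < '0' || *val > '9')
				return -1;

			errno = 0;
			v = strtoull(val, &end, 10);
			if (errno != 0 || (*end != ' ' && *end != '\0'))
				return -1;

			*out = v;
			return 0;
		}
		p++;
	}
	return -1;
}
```

Requiring the match to start a field is what keeps "pid" and "pidns"
distinct here, which is the same label-conflict concern the cover letter
raises about "pid=".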

- RGB

--
Richard Guy Briggs <rbriggs-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] namespaces: give each namespace a serial number
  2014-05-02 14:15             ` Richard Guy Briggs
@ 2014-05-02 20:50                 ` Serge Hallyn
  -1 siblings, 0 replies; 58+ messages in thread
From: Serge Hallyn @ 2014-05-02 20:50 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	eparis-H+wXaHxf7aLQT0dZR+AlfA,
	linux-audit-H+wXaHxf7aLQT0dZR+AlfA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w, sgrubb-H+wXaHxf7aLQT0dZR+AlfA

Quoting Richard Guy Briggs (rgb-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org):
> On 14/05/02, Serge E. Hallyn wrote:
> > Quoting Richard Guy Briggs (rgb-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org):
> > 
> > Most of this looks reasonable, but I'm curious about something,
> > 
> > > +/**
> > > + * ns_serial - compute a serial number for the namespace
> > > + *
> > > + * Compute a serial number for the namespace to uniquely identify it in
> > > + * audit records.
> > > + */
> > > +unsigned int ns_serial(void)
> > > +{
> > > +	static DEFINE_SPINLOCK(serial_lock);
> > > +	static unsigned int serial = 4; /* reserved for IPC, UTS, user, PID */
> > > +
> > > +	unsigned long flags;
> > > +	unsigned int ret;
> > > +
> > > +	spin_lock_irqsave(&serial_lock, flags);
> > > +	do {
> > > +		ret = ++serial;
> > > +	} while (unlikely(!ret));
> > 
> > Why exactly are you doing this?  Surely if serial is going to
> > > wrap around we've got a bigger problem than just wanting to
> > bump one more time?
> 
> Thanks for catching this.
> The code was templated off audit_serial() which tries to solve a
> different problem and rolling it is much more likely.  I hadn't noticed
> that rollover protection.  However, I *had* thought of making it a long
> (which would be the same size on 32-bit arches, but larger on 64-bit)
> since a 64-bit system is more likely to roll it out of sheer speed and
> resource availability.  But perhaps a long long would be safer.

Sounds good, and perhaps a BUG_ON(!serial) for good measure.
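
For illustration, the direction discussed above (a 64-bit counter with no
skip-zero loop, plus a hard check in place of silent wraparound) might look
roughly like this.  It is a userspace sketch using C11 atomics, not the
patch itself: ns_serial_next() and NS_SERIAL_RESERVED are hypothetical
names, assert() stands in for BUG_ON(), and an in-kernel version would more
likely be built on the kernel's own 64-bit atomics (for example
atomic64_inc_return()) or keep the spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Assumption: the first four serials are reserved for the initial
 * namespaces, mirroring the "serial = 4" initialiser in the quoted patch.
 */
#define NS_SERIAL_RESERVED 4ULL

static _Atomic uint64_t ns_serial_counter = NS_SERIAL_RESERVED;

/* Hand out the next namespace serial number (hypothetical name). */
uint64_t ns_serial_next(void)
{
	/* fetch_add returns the old value, so add 1 to get the new serial. */
	uint64_t ret = atomic_fetch_add(&ns_serial_counter, 1) + 1;

	/* Userspace stand-in for BUG_ON(!serial): a wrap to 0 is fatal. */
	assert(ret != 0);

	return ret;
}
```

With a 64-bit counter the wrap check should never fire in practice, so it
documents the invariant rather than handling an expected case.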

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] namespaces: log namespaces per task
  2014-05-02 14:28         ` Richard Guy Briggs
@ 2014-05-02 21:00             ` Serge Hallyn
  -1 siblings, 0 replies; 58+ messages in thread
From: Serge Hallyn @ 2014-05-02 21:00 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	eparis-H+wXaHxf7aLQT0dZR+AlfA,
	linux-audit-H+wXaHxf7aLQT0dZR+AlfA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w, sgrubb-H+wXaHxf7aLQT0dZR+AlfA

Quoting Richard Guy Briggs (rgb-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org):
> On 14/05/02, Serge E. Hallyn wrote:
> > Quoting Richard Guy Briggs (rgb-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org):
> > > I saw no replies to my questions when I replied a year after Aris' posting, so
> > > I don't know if it was ignored or got lost in stale threads:
> > >         https://www.redhat.com/archives/linux-audit/2013-March/msg00020.html
> > >         https://www.redhat.com/archives/linux-audit/2013-March/msg00033.html
> > > 	(https://lists.linux-foundation.org/pipermail/containers/2013-March/032063.html)
> > >         https://www.redhat.com/archives/linux-audit/2014-January/msg00180.html
> > > 
> > > I've tried to answer a number of questions that were raised in that thread.
> > > 
> > > The goal is not quite identical to Aris' patchset.
> > > 
> > > The purpose is to track namespaces in use by logged processes from the
> > > perspective of init_*_ns.  The first patch defines a function to list them.
> > > The second patch provides an example of usage for audit_log_task_info() which
> > > is used by syscall audits, among others.  audit_log_task() and
> > > audit_common_recv_message() would be other potential use cases.
> > > 
> > > Use a serial number per namespace (unique across one boot of one kernel)
> > > instead of the inode number (for which the right to change is said to have
> > > been reserved, and which is not necessarily unique if there is more than one
> > > proc fs).  It could be argued that the inode numbers have now become a de facto
> > > interface and can't change now, but I'm proposing this approach to see if this
> > > helps address some of the objections to the earlier patchset.
> > > 
> > > Messages could also be added to track the creation and the destruction
> > > of namespaces, listing the parent for hierarchical namespaces such as pidns,
> > > userns, and listing other ids for non-hierarchical namespaces, as well as other
> > > information to help identify a namespace.
> > > 
> > > There has been some progress made for audit in net namespaces and pid
> > > namespaces since this previous thread.  net namespaces are now served as peers
> > > by one auditd in the init_net namespace with processes in a non-init_net
> > > namespace being able to write records if they are in the init_user_ns and have
> > > CAP_AUDIT_WRITE.  Processes in a non-init_pid_ns can now similarly write
> > > records.  As for CAP_AUDIT_READ, I just posted a patchset to check capabilities
> > > of userspace processes that try to join netlink broadcast groups.
> > > 
> > > 
> > > Questions:
> > > Is there a way to link serial numbers of namespaces involved in migration of a
> > > container to another kernel?  (I had a brief look at CRIU.)  Is there a unique
> > > identifier for each running instance of a kernel?  Or at least some identifier
> > > within the container migration realm?
> > 
> > Eric Biederman has always been adamantly opposed to adding new namespaces
> > of namespaces, so the fact that you're asking this question concerns me.
> 
> I have seen that position and I don't fully understand the justification
> for it other than added complexity.
> 
> One way that occurred to me to be able to identify a kernel instance was
> to look at CPU serial numbers or other CPU entity intended to be
> globally unique, but that isn't universally available.

That's one issue, which is uniqueness of namespaces cross-machines.

But it gets worse if we consider that after allowing in-container audit,
we'll have a nested container running, then have the parent container
migrated to another host (or just checkpointed and restarted).  Now the
nested container's indexes will all be changed.  Is there any way audit
can track who's who after the migration?

That's not an indictment of the serial # approach, since (a) we don't
have in-container audit yet and (b) we don't have c/r/migration of nested
containers.  But it's worth considering whether we can solve the issue
with serial #s, and, if not, whether we can solve it with any other
approach.

I guess one approach to solve it would be to allow userspace to request
a next serial #.  Which will immediately lead us to a namespace of serial
#s (since the requested # might be lower than the last used one on the
new host).
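
To make the collision concrete, here is a minimal sketch of what "request a
specific serial #" runs into if the host keeps a single, strictly increasing
counter.  The struct and function names are hypothetical and the allocator
model is an assumption for illustration only, not anything in the patchset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-host allocator state: the highest serial issued so far. */
struct serial_alloc {
	uint64_t last;
};

/*
 * Try to hand out a caller-chosen serial, as a restore tool might want
 * after migration.  It can only succeed if the requested value is above
 * everything this host has already issued; otherwise it could duplicate a
 * serial that is (or was) in use here.
 */
bool serial_request(struct serial_alloc *a, uint64_t wanted)
{
	if (wanted <= a->last)
		return false;

	a->last = wanted;
	return true;
}
```

A container that held serial 5 on the old host cannot get 5 back on a new
host that has already issued 97, which is why honouring such requests
appears to need a separate numbering scope per container, i.e. a namespace
of serial numbers.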

As you've said inode #s for /proc/self/ns/* probably aren't sufficiently
unique, though perhaps we could attach a generation # for the sake of
audit.  Then after a c/r/migration the generation # may be different,
but we may have a better shot at at least using the same ino#.

> Another possibility was RTC reading at time of boot, but that isn't good
> enough either.
> 
> Both are dubious in VMs anyways.
> 
> > The way things are right now, since audit belongs to the init userns,
> > we can get away with saying if a container 'migrates', the new kernel
> > will see a different set of serials, and no one should care.  However,
> > if we're going to be allowing containers to have their own audit
> > namespace/layer/whatever, then this becomes more of a concern.
> 
> Having a container have its own audit daemon (partitioned appropriately
> in the kernel) would be a long-term goal.

Agreed, fwiw.

> > That said, I'll now look at the patches while pretending that problem
> > does not exist :)  If I ack, it'll be on correctness of the code, but
> > we'll still have to deal with this issue.
> 
> Getting some discussion about this migration challenge was a significant
> motivation for posting this patch, so I'm hoping others will weigh in.
> 
> Thanks for your review, Serge.
> 
> > > What additional events should list this information?
> > > 
> > > Does this present any kind of information leak?  Only CAP_AUDIT_CONTROL (and
> > > proposed CAP_AUDIT_READ) in init_user_ns can get to this information in the
> > > init namespace at the moment.
> > > 
> > > 
> > > Proposed output format:
> > > This differs slightly from Aristeu's patch because of the label conflict with
> > > "pid=" due to including it in existing records rather than it being a separate
> > > record:
> > >         type=SYSCALL msg=audit(1398112249.996:65): arch=c000003e syscall=272 success=yes exit=0 a0=40000000 a1=ffffffffffffffff a2=0 a3=22 items=0 ppid=1 pid=566 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="(t-daemon)" exe="/usr/lib/systemd/systemd" mntns=5 netns=97 utsns=2 ipcns=1 pidns=4 userns=3 subj=system_u:system_r:init_t:s0 key=(null)
> > > 
> > > 
> > > Note: This set does not try to solve the non-init namespace audit messages and
> > > auditd problem yet.  That will come later, likely with additional auditd
> > > instances running in another namespace with a limited ability to influence the
> > > master auditd.  I echo Eric B's idea that messages destined for different
> > > namespaces would have to be tailored for that namespace with references that
> > > make sense (such as the right pid number reported to that pid namespace, and
> > > not leaking info about parents or peers).
> > > 
> > > 
> > > Richard Guy Briggs (2):
> > >   namespaces: give each namespace a serial number
> > >   audit: log namespace serial numbers
> > > 
> > >  fs/mount.h                     |    1 +
> > >  fs/namespace.c                 |    1 +
> > >  include/linux/audit.h          |    7 +++++++
> > >  include/linux/ipc_namespace.h  |    1 +
> > >  include/linux/nsproxy.h        |    8 ++++++++
> > >  include/linux/pid_namespace.h  |    1 +
> > >  include/linux/user_namespace.h |    1 +
> > >  include/linux/utsname.h        |    1 +
> > >  include/net/net_namespace.h    |    1 +
> > >  init/version.c                 |    1 +
> > >  ipc/msgutil.c                  |    1 +
> > >  ipc/namespace.c                |    2 ++
> > >  kernel/audit.c                 |   38 ++++++++++++++++++++++++++++++++++++++
> > >  kernel/nsproxy.c               |   24 ++++++++++++++++++++++++
> > >  kernel/pid.c                   |    1 +
> > >  kernel/pid_namespace.c         |    2 ++
> > >  kernel/user.c                  |    1 +
> > >  kernel/user_namespace.c        |    2 ++
> > >  kernel/utsname.c               |    2 ++
> > >  net/core/net_namespace.c       |    4 +++-
> > >  20 files changed, 99 insertions(+), 1 deletions(-)
> > > 
> > > _______________________________________________
> > > Containers mailing list
> > > Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> > > https://lists.linuxfoundation.org/mailman/listinfo/containers
> 
> - RGB
> 
> --
> Richard Guy Briggs <rbriggs-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
> Remote, Ottawa, Canada
> Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] namespaces: log namespaces per task
@ 2014-05-02 21:00             ` Serge Hallyn
  0 siblings, 0 replies; 58+ messages in thread
From: Serge Hallyn @ 2014-05-02 21:00 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: Serge E. Hallyn, linux-audit, linux-kernel, containers, eparis,
	sgrubb, ebiederm

Quoting Richard Guy Briggs (rgb@redhat.com):
> On 14/05/02, Serge E. Hallyn wrote:
> > Quoting Richard Guy Briggs (rgb@redhat.com):
> > > I saw no replies to my questions when I replied a year after Aris' posting, so
> > > I don't know if it was ignored or got lost in stale threads:
> > >         https://www.redhat.com/archives/linux-audit/2013-March/msg00020.html
> > >         https://www.redhat.com/archives/linux-audit/2013-March/msg00033.html
> > > 	(https://lists.linux-foundation.org/pipermail/containers/2013-March/032063.html)
> > >         https://www.redhat.com/archives/linux-audit/2014-January/msg00180.html
> > > 
> > > I've tried to answer a number of questions that were raised in that thread.
> > > 
> > > The goal is not quite identical to Aris' patchset.
> > > 
> > > The purpose is to track namespaces in use by logged processes from the
> > > perspective of init_*_ns.  The first patch defines a function to list them.
> > > The second patch provides an example of usage for audit_log_task_info() which
> > > is used by syscall audits, among others.  audit_log_task() and
> > > audit_common_recv_message() would be other potential use cases.
> > > 
> > > Use a serial number per namespace (unique across one boot of one kernel)
> > > instead of the inode number (for which the right to change is said to have
> > > been reserved, and which is not necessarily unique if there is more than one
> > > proc fs).  It could be argued that the inode numbers have now become a de facto
> > > interface and can't change now, but I'm proposing this approach to see if this
> > > helps address some of the objections to the earlier patchset.
> > > 
> > > Messages could also be added to track the creation and the destruction
> > > of namespaces, listing the parent for hierarchical namespaces such as pidns,
> > > userns, and listing other ids for non-hierarchical namespaces, as well as other
> > > information to help identify a namespace.
> > > 
> > > There has been some progress made for audit in net namespaces and pid
> > > namespaces since this previous thread.  net namespaces are now served as peers
> > > by one auditd in the init_net namespace with processes in a non-init_net
> > > namespace being able to write records if they are in the init_user_ns and have
> > > CAP_AUDIT_WRITE.  Processes in a non-init_pid_ns can now similarly write
> > > records.  As for CAP_AUDIT_READ, I just posted a patchset to check capabilities
> > > of userspace processes that try to join netlink broadcast groups.
> > > 
> > > 
> > > Questions:
> > > Is there a way to link serial numbers of namespaces involved in migration of a
> > > container to another kernel?  (I had a brief look at CRIU.)  Is there a unique
> > > identifier for each running instance of a kernel?  Or at least some identifier
> > > within the container migration realm?
> > 
> > Eric Biederman has always been adamantly opposed to adding new namespaces
> > of namespaces, so the fact that you're asking this question concerns me.
> 
> I have seen that position and I don't fully understand the justification
> for it other than added complexity.
> 
> One way that occurred to me to be able to identify a kernel instance was
> to look at CPU serial numbers or other CPU entity intended to be
> globally unique, but that isn't universally available.

That's one issue, which is uniqueness of namespaces cross-machines.

But it gets worse if we consider that after allowing in-container audit,
we'll have a nested container running, then have the parent container
migrated to another host (or just checkpointed and restarted).  Now the
nested container's indexes will all be changed.  Is there any way audit
can track who's who after the migration?

That's not an indictment of the serial # approach, since (a) we don't
have in-container audit yet and (b) we don't have c/r/migration of nested
containers.  But it's worth considering whether we can solve the issue
with serial #s, and, if not, whether we can solve it with any other
approach.

I guess one approach to solve it would be to allow userspace to request
a next serial #.  Which will immediately lead us to a namespace of serial
#s (since the requested # might be lower than the last used one on the
new host).

As you've said inode #s for /proc/self/ns/* probably aren't sufficiently
unique, though perhaps we could attach a generation # for the sake of
audit.  Then after a c/r/migration the generation # may be different,
but we may have a better shot at at least using the same ino#.
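
The generation # idea above can be sketched as a small userspace model.
This is only an illustration under stated assumptions: the thread leaves
open when the generation would be bumped, so the sketch assumes it is
bumped only when an ino# is reused; NsId and NsIdAllocator are
hypothetical names, not kernel interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NsId:
    """Hypothetical audit identifier: inode number plus a generation."""
    ino: int         # inode number as seen under /proc/<pid>/ns/
    generation: int  # assumed disambiguator, not an existing kernel field


class NsIdAllocator:
    """Hands out (ino, generation) pairs, bumping generation on ino reuse."""

    def __init__(self):
        self._next_gen = {}  # ino -> next unused generation

    def assign(self, ino):
        gen = self._next_gen.get(ino, 0)
        self._next_gen[ino] = gen + 1
        return NsId(ino, gen)


alloc = NsIdAllocator()
first = alloc.assign(4026531840)
restored = alloc.assign(4026531840)  # same ino# requested after a restore
other = alloc.assign(4026532000)
```

With this scheme the ino# stays stable across a c/r/migration and only
the generation tells the two instances apart.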

> Another possibility was RTC reading at time of boot, but that isn't good
> enough either.
> 
> Both are dubious in VMs anyways.
> 
> > The way things are right now, since audit belongs to the init userns,
> > we can get away with saying if a container 'migrates', the new kernel
> > will see a different set of serials, and no one should care.  However,
> > if we're going to be allowing containers to have their own audit
> > namespace/layer/whatever, then this becomes more of a concern.
> 
> Having a container have its own audit daemon (partitioned appropriately
> in the kernel) would be a long-term goal.

Agreed, fwiw.

> > That said, I'll now look at the patches while pretending that problem
> > does not exist :)  If I ack, it'll be on correctness of the code, but
> > we'll still have to deal with this issue.
> 
> Getting some discussion about this migration challenge was a significant
> motivation for posting this patch, so I'm hoping others will weigh in.
> 
> Thanks for your review, Serge.
> 
> > > What additional events should list this information?
> > > 
> > > Does this present any kind of information leak?  Only CAP_AUDIT_CONTROL (and
> > > proposed CAP_AUDIT_READ) in init_user_ns can get to this information in the
> > > init namespace at the moment.
> > > 
> > > 
> > > Proposed output format:
> > > This differs slightly from Aristeu's patch because of the label conflict with
> > > "pid=" due to including it in existing records rather than it being a separate
> > > record:
> > >         type=SYSCALL msg=audit(1398112249.996:65): arch=c000003e syscall=272 success=yes exit=0 a0=40000000 a1=ffffffffffffffff a2=0 a3=22 items=0 ppid=1 pid=566 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="(t-daemon)" exe="/usr/lib/systemd/systemd" mntns=5 netns=97 utsns=2 ipcns=1 pidns=4 userns=3 subj=system_u:system_r:init_t:s0 key=(null)
> > > 
> > > 
> > > Note: This set does not try to solve the non-init namespace audit messages and
> > > auditd problem yet.  That will come later, likely with additional auditd
> > > instances running in another namespace with a limited ability to influence the
> > > master auditd.  I echo Eric B's idea that messages destined for different
> > > namespaces would have to be tailored for that namespace with references that
> > > make sense (such as the right pid number reported to that pid namespace, and
> > > not leaking info about parents or peers).
> > > 
> > > 
> > > Richard Guy Briggs (2):
> > >   namespaces: give each namespace a serial number
> > >   audit: log namespace serial numbers
> > > 
> > >  fs/mount.h                     |    1 +
> > >  fs/namespace.c                 |    1 +
> > >  include/linux/audit.h          |    7 +++++++
> > >  include/linux/ipc_namespace.h  |    1 +
> > >  include/linux/nsproxy.h        |    8 ++++++++
> > >  include/linux/pid_namespace.h  |    1 +
> > >  include/linux/user_namespace.h |    1 +
> > >  include/linux/utsname.h        |    1 +
> > >  include/net/net_namespace.h    |    1 +
> > >  init/version.c                 |    1 +
> > >  ipc/msgutil.c                  |    1 +
> > >  ipc/namespace.c                |    2 ++
> > >  kernel/audit.c                 |   38 ++++++++++++++++++++++++++++++++++++++
> > >  kernel/nsproxy.c               |   24 ++++++++++++++++++++++++
> > >  kernel/pid.c                   |    1 +
> > >  kernel/pid_namespace.c         |    2 ++
> > >  kernel/user.c                  |    1 +
> > >  kernel/user_namespace.c        |    2 ++
> > >  kernel/utsname.c               |    2 ++
> > >  net/core/net_namespace.c       |    4 +++-
> > >  20 files changed, 99 insertions(+), 1 deletions(-)
> > > 
> > > _______________________________________________
> > > Containers mailing list
> > > Containers@lists.linux-foundation.org
> > > https://lists.linuxfoundation.org/mailman/listinfo/containers
> 
> - RGB
> 
> --
> Richard Guy Briggs <rbriggs@redhat.com>
> Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
> Remote, Ottawa, Canada
> Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
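
For anyone wanting to consume the proposed record format quoted above,
here is a minimal sketch of pulling the six namespace fields out of a
SYSCALL record.  It assumes the serials are printed in decimal, as they
appear in the sample; namespace_serials is a hypothetical helper, and
the record below is a shortened copy of the quoted example.

```python
import re

# Field names taken from the proposed output format quoted above.
NS_FIELD = re.compile(r"\b(mntns|netns|utsns|ipcns|pidns|userns)=(\d+)")


def namespace_serials(record):
    """Return {field name: serial} for the namespace fields in one record."""
    return {name: int(value) for name, value in NS_FIELD.findall(record)}


record = (
    'type=SYSCALL msg=audit(1398112249.996:65): arch=c000003e syscall=272 '
    'success=yes exit=0 ppid=1 pid=566 comm="(t-daemon)" '
    'exe="/usr/lib/systemd/systemd" mntns=5 netns=97 utsns=2 ipcns=1 '
    'pidns=4 userns=3 subj=system_u:system_r:init_t:s0 key=(null)'
)
serials = namespace_serials(record)
```

The word boundary in the pattern keeps "pid=" and "ppid=" from being
mistaken for "pidns=".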

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] namespaces: log namespaces per task
  2014-04-22 18:12 ` Richard Guy Briggs
@ 2014-05-03 21:58     ` James Bottomley
  -1 siblings, 0 replies; 58+ messages in thread
From: James Bottomley @ 2014-05-03 21:58 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	eparis-H+wXaHxf7aLQT0dZR+AlfA,
	linux-audit-H+wXaHxf7aLQT0dZR+AlfA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w, sgrubb-H+wXaHxf7aLQT0dZR+AlfA

On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote:
> Questions:
> Is there a way to link serial numbers of namespaces involved in migration of a
> container to another kernel?  (I had a brief look at CRIU.)  Is there a unique
> identifier for each running instance of a kernel?  Or at least some identifier
> within the container migration realm?

Are you asking for a way of distinguishing a migrated container from an
unmigrated one?  The answer is pretty much "no" because the job of
migration is to restore to the same state as much as possible.

Reading between the lines, I think your goal is to correlate audit
information across a container migration, right?  Ideally the management
system should be able to cough up an audit trail for a container
wherever it's running and however many times it's been migrated?

In that case, I think your idea of a numeric serial number in a dense
range is wrong.  Because the range is dense you're obviously never going
to be able to use the same serial number across a migration.  However,
if you look at all the management systems for containers, they all have
a concept of some unique ID per container, be it name, UUID or even
GUID.  I suspect that's what you should be using to tag the audit
trail.
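
The dense-range argument can be illustrated with a toy model.  This is
a sketch only: Host stands in for one kernel boot handing out serial
numbers from a counter, and the "container" tag is a hypothetical ID
assigned by a management layer, not something the kernel provides.

```python
import itertools
import uuid


class Host:
    """Toy model of one kernel boot handing out dense serial numbers."""

    def __init__(self):
        self._serial = itertools.count(1)

    def new_namespace(self):
        return next(self._serial)


source, destination = Host(), Host()
source_serials = [source.new_namespace() for _ in range(3)]
destination_serials = [destination.new_namespace() for _ in range(3)]

# Every serial in use on the source is already taken on the destination.
clash = set(source_serials) & set(destination_serials)

# A management-layer ID travels with the container instead.
container_id = uuid.uuid4()
records = [
    {"host": "source", "serial": source_serials[2], "container": container_id},
    {"host": "destination", "serial": destination.new_namespace(),
     "container": container_id},
]
trail = [r for r in records if r["container"] == container_id]
```

The serial differs on each host, but filtering on the container ID
still yields one continuous trail.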

James

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] namespaces: log namespaces per task
  2014-05-03 21:58     ` James Bottomley
  (?)
@ 2014-05-05  3:48     ` Serge E. Hallyn
  2014-05-05 21:48       ` Richard Guy Briggs
  -1 siblings, 1 reply; 58+ messages in thread
From: Serge E. Hallyn @ 2014-05-05  3:48 UTC (permalink / raw)
  To: James Bottomley
  Cc: Richard Guy Briggs, linux-audit, linux-kernel, containers,
	serge.hallyn, eparis, sgrubb, ebiederm

Quoting James Bottomley (James.Bottomley@HansenPartnership.com):
> On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote:
> > Questions:
> > Is there a way to link serial numbers of namespaces involved in migration of a
> > container to another kernel?  (I had a brief look at CRIU.)  Is there a unique
> > identifier for each running instance of a kernel?  Or at least some identifier
> > within the container migration realm?
> 
> Are you asking for a way of distinguishing a migrated container from an
> unmigrated one?  The answer is pretty much "no" because the job of
> migration is to restore to the same state as much as possible.
> 
> Reading between the lines, I think your goal is to correlate audit
> information across a container migration, right?  Ideally the management
> system should be able to cough up an audit trail for a container
> wherever it's running and however many times it's been migrated?
> 
> In that case, I think your idea of a numeric serial number in a dense
> range is wrong.  Because the range is dense you're obviously never going
> to be able to use the same serial number across a migration.  However,

Ah, but I was being silly before, we can actually address this pretty
simply.  If we just (for instance) add
/proc/self/ns/{ipc,mnt,net,pid,user,uts}_seq containing the serial number
for the relevant ns for the task, then criu can dump this info at
checkpoint.  Then at restart it can dump an audit message per task and
ns saying old_serial=%x,new_serial=%x.  That way the audit log reader
can if it cares keep track.
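
A rough sketch of what the audit log reader side might look like.  The
record layout here is an assumption: only the old_serial=%x,new_serial=%x
part comes from the suggestion above, while the "ns=" field and the
sample lines are made up for illustration.

```python
import re

# Hypothetical restore record: "ns=<type> old_serial=<hex>,new_serial=<hex>"
REMAP = re.compile(r"ns=(\w+) old_serial=([0-9a-f]+),new_serial=([0-9a-f]+)")


def build_remap(lines):
    """Map (ns type, old serial) to the new serial seen after restart."""
    remap = {}
    for line in lines:
        match = REMAP.search(line)
        if match:
            ns_type, old, new = match.groups()
            remap[(ns_type, int(old, 16))] = int(new, 16)
    return remap


log = [
    "type=USER msg=restore ns=pid old_serial=4,new_serial=1f",
    "type=USER msg=restore ns=net old_serial=61,new_serial=20",
    "type=SYSCALL pid=566 pidns=31",
]
remap = build_remap(log)
```

A reader that cares can then translate serials in pre-checkpoint
records to their post-restart values; one that doesn't can ignore the
restore records entirely.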

-serge

(Another, more heavyweight approach would be to track all ns hierarchies
and make the serial numbers per-namespace-instance.  So my container's
pidns serial might be 0x2, and if it clones a new pidns that would be
"(0x2,0x1)" on the host, or just 0x1 inside the container.  But we don't
need that if the simple userspace approach suffices)
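
The heavyweight variant can be modelled in a few lines as well.  This
is a sketch under the assumption that each namespace numbers its direct
children from 1; PidNs and path_from are hypothetical names, not kernel
structures.

```python
class PidNs:
    """Toy pid namespace whose serial is relative to its parent."""

    def __init__(self, parent=None):
        self.parent = parent
        self._children = 0
        if parent is None:
            self.serial = None  # the root has no parent-relative serial
        else:
            parent._children += 1
            self.serial = parent._children

    def path_from(self, viewer):
        """Serials from just below `viewer` down to this namespace."""
        path = []
        node = self
        while node is not viewer:
            if node is None:
                raise ValueError("viewer is not an ancestor")
            path.append(node.serial)
            node = node.parent
        return tuple(reversed(path))


host = PidNs()
first = PidNs(host)        # serial 1 on the host
container = PidNs(host)    # serial 2 on the host
nested = PidNs(container)  # serial 1 inside the container
```

Here nested.path_from(host) gives (2, 1) and nested.path_from(container)
gives (1,), matching the "(0x2,0x1)" versus 0x1 views described above.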

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] namespaces: log namespaces per task
  2014-05-02 14:28         ` Richard Guy Briggs
  (?)
  (?)
@ 2014-05-05  9:23         ` Nicolas Dichtel
       [not found]           ` <5367587B.20801-pdR9zngts4EAvxtiuMwx3w@public.gmane.org>
  -1 siblings, 1 reply; 58+ messages in thread
From: Nicolas Dichtel @ 2014-05-05  9:23 UTC (permalink / raw)
  To: Richard Guy Briggs, Serge E. Hallyn
  Cc: containers, serge.hallyn, linux-kernel, linux-audit, ebiederm

On 02/05/2014 16:28, Richard Guy Briggs wrote:
> On 14/05/02, Serge E. Hallyn wrote:
>> Quoting Richard Guy Briggs (rgb@redhat.com):
>>> I saw no replies to my questions when I replied a year after Aris' posting, so
>>> I don't know if it was ignored or got lost in stale threads:
>>>          https://www.redhat.com/archives/linux-audit/2013-March/msg00020.html
>>>          https://www.redhat.com/archives/linux-audit/2013-March/msg00033.html
>>> 	(https://lists.linux-foundation.org/pipermail/containers/2013-March/032063.html)
>>>          https://www.redhat.com/archives/linux-audit/2014-January/msg00180.html
>>>
>>> I've tried to answer a number of questions that were raised in that thread.
>>>
>>> The goal is not quite identical to Aris' patchset.
>>>
>>> The purpose is to track namespaces in use by logged processes from the
>>> perspective of init_*_ns.  The first patch defines a function to list them.
>>> The second patch provides an example of usage for audit_log_task_info() which
>>> is used by syscall audits, among others.  audit_log_task() and
>>> audit_common_recv_message() would be other potential use cases.
>>>
>>> Use a serial number per namespace (unique across one boot of one kernel)
>>> instead of the inode number (which is claimed to have had the right to change
>>> reserved and is not necessarily unique if there is more than one proc fs).  It
>>> could be argued that the inode numbers have now become a defacto interface and
>>> can't change now, but I'm proposing this approach to see if this helps address
>>> some of the objections to the earlier patchset.
>>>
>>> There could also have messages added to track the creation and the destruction
>>> of namespaces, listing the parent for hierarchical namespaces such as pidns,
>>> userns, and listing other ids for non-hierarchical namespaces, as well as other
>>> information to help identify a namespace.
>>>
>>> There has been some progress made for audit in net namespaces and pid
>>> namespaces since this previous thread.  net namespaces are now served as peers
>>> by one auditd in the init_net namespace with processes in a non-init_net
>>> namespace being able to write records if they are in the init_user_ns and have
>>> CAP_AUDIT_WRITE.  Processes in a non-init_pid_ns can now similarly write
>>> records.  As for CAP_AUDIT_READ, I just posted a patchset to check capabilities
>>> of userspace processes that try to join netlink broadcast groups.
>>>
>>>
>>> Questions:
>>> Is there a way to link serial numbers of namespaces involved in migration of a
>>> container to another kernel?  (I had a brief look at CRIU.)  Is there a unique
>>> identifier for each running instance of a kernel?  Or at least some identifier
>>> within the container migration realm?
>>
>> Eric Biederman has always been adamantly opposed to adding new namespaces
>> of namespaces, so the fact that you're asking this question concerns me.
>
> I have seen that position and I don't fully understand the justification
> for it other than added complexity.
Just FYI, have you seen this thread:
http://thread.gmane.org/gmane.linux.network/286572/

There are some explanations/examples about this topic.


Regards,
Nicolas

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] namespaces: log namespaces per task
  2014-05-02 21:00             ` Serge Hallyn
  (?)
@ 2014-05-05 21:29             ` Richard Guy Briggs
  -1 siblings, 0 replies; 58+ messages in thread
From: Richard Guy Briggs @ 2014-05-05 21:29 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: Serge E. Hallyn, linux-audit, linux-kernel, containers, eparis,
	sgrubb, ebiederm

On 14/05/02, Serge Hallyn wrote:
> Quoting Richard Guy Briggs (rgb@redhat.com):
> > On 14/05/02, Serge E. Hallyn wrote:
> > > Quoting Richard Guy Briggs (rgb@redhat.com):
> > > > I saw no replies to my questions when I replied a year after Aris' posting, so
> > > > I don't know if it was ignored or got lost in stale threads:
> > > >         https://www.redhat.com/archives/linux-audit/2013-March/msg00020.html
> > > >         https://www.redhat.com/archives/linux-audit/2013-March/msg00033.html
> > > > 	(https://lists.linux-foundation.org/pipermail/containers/2013-March/032063.html)
> > > >         https://www.redhat.com/archives/linux-audit/2014-January/msg00180.html
> > > > 
> > > > I've tried to answer a number of questions that were raised in that thread.
> > > > 
> > > > The goal is not quite identical to Aris' patchset.
> > > > 
> > > > The purpose is to track namespaces in use by logged processes from the
> > > > perspective of init_*_ns.  The first patch defines a function to list them.
> > > > The second patch provides an example of usage for audit_log_task_info() which
> > > > is used by syscall audits, among others.  audit_log_task() and
> > > > audit_common_recv_message() would be other potential use cases.
> > > > 
> > > > Use a serial number per namespace (unique across one boot of one kernel)
> > > > instead of the inode number (which is claimed to have had the right to change
> > > > reserved and is not necessarily unique if there is more than one proc fs).  It
> > > > could be argued that the inode numbers have now become a defacto interface and
> > > > can't change now, but I'm proposing this approach to see if this helps address
> > > > some of the objections to the earlier patchset.
> > > > 
> > > > There could also have messages added to track the creation and the destruction
> > > > of namespaces, listing the parent for hierarchical namespaces such as pidns,
> > > > userns, and listing other ids for non-hierarchical namespaces, as well as other
> > > > information to help identify a namespace.
> > > > 
> > > > There has been some progress made for audit in net namespaces and pid
> > > > namespaces since this previous thread.  net namespaces are now served as peers
> > > > by one auditd in the init_net namespace with processes in a non-init_net
> > > > namespace being able to write records if they are in the init_user_ns and have
> > > > CAP_AUDIT_WRITE.  Processes in a non-init_pid_ns can now similarly write
> > > > records.  As for CAP_AUDIT_READ, I just posted a patchset to check capabilities
> > > > of userspace processes that try to join netlink broadcast groups.
> > > > 
> > > > 
> > > > Questions:
> > > > Is there a way to link serial numbers of namespaces involved in migration of a
> > > > container to another kernel?  (I had a brief look at CRIU.)  Is there a unique
> > > > identifier for each running instance of a kernel?  Or at least some identifier
> > > > within the container migration realm?
> > > 
> > > Eric Biederman has always been adamantly opposed to adding new namespaces
> > > of namespaces, so the fact that you're asking this question concerns me.
> > 
> > I have seen that position and I don't fully understand the justification
> > for it other than added complexity.
> > 
> > One way that occurred to me to be able to identify a kernel instance was
> > to look at CPU serial numbers or other CPU entity intended to be
> > globally unique, but that isn't universally available.
> 
> That's one issue, which is uniqueness of namespaces cross-machines.
> 
> But it gets worse if we consider that after allowing in-container audit,
> we'll have a nested container running, then have the parent container
> migrated to another host (or just checkpointed and restarted).  Now the
> nested container's indexes will all be changed.  Is there any way audit
> can track who's who after the migration?

Presumably the namespace serial numbers before and after would be logged
in one message to tie them together.

> That's not an indictment of the serial # approach, since (a) we don't
> have in-container audit yet and (b) we don't have c/r/migration of nested
> containers.  But it's worth considering whether we can solve the issue
> with serial #s, and, if not, whether we can solve it with any other
> approach.
> 
> I guess one approach to solve it would be to allow userspace to request
> a next serial #.  Which will immediately lead us to a namespace of serial
> #s (since the requested # might be lower than the last used one on the
> new host).

:P

> As you've said inode #s for /proc/self/ns/* probably aren't sufficiently
> unique, though perhaps we could attach a generation # for the sake of
> audit.  Then after a c/r/migration the generation # may be different,
> but we may have a better shot at at least using the same ino#.

A generation number is an interesting idea.  Would it get incremented
every time a namespace is c/r/migrated?  Or just if there is a conflict?

Same ino#?  Or same sn?

> > - RGB

- RGB

--
Richard Guy Briggs <rbriggs@redhat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] namespaces: log namespaces per task
  2014-05-03 21:58     ` James Bottomley
  (?)
  (?)
@ 2014-05-05 21:44     ` Richard Guy Briggs
  2014-05-06  3:33       ` Serge Hallyn
  -1 siblings, 1 reply; 58+ messages in thread
From: Richard Guy Briggs @ 2014-05-05 21:44 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-audit, linux-kernel, containers, serge.hallyn, eparis,
	sgrubb, ebiederm

On 14/05/03, James Bottomley wrote:
> On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote:
> > Questions:
> > Is there a way to link serial numbers of namespaces involved in migration of a
> > container to another kernel?  (I had a brief look at CRIU.)  Is there a unique
> > identifier for each running instance of a kernel?  Or at least some identifier
> > within the container migration realm?
> 
> Are you asking for a way of distinguishing a migrated container from an
> unmigrated one?  The answer is pretty much "no" because the job of
> migration is to restore to the same state as much as possible.

I hadn't thought to distinguish a migrated container from an unmigrated
one, but rather I'm more interested in the underlying namespaces.  The
use of a generation number to identify a migrated namespace may be
useful along with the logging to tie them together.

> Reading between the lines, I think your goal is to correlate audit
> information across a container migration, right?  Ideally the management
> system should be able to cough up an audit trail for a container
> wherever it's running and however many times it's been migrated?

The original intent was to track the underlying namespaces themselves.
This sounds like another layer on top of that which sounds useful but
that I had not yet considered.

But yes, that sounds like a good eventual goal.

> In that case, I think your idea of a numeric serial number in a dense
> range is wrong.  Because the range is dense you're obviously never going
> to be able to use the same serial number across a migration.  However,
> if you look at all the management systems for containers, they all have
> a concept of some unique ID per container, be it name, UUID or even
> GUID.  I suspect it's that you should be using to tag the audit trail
> with.

That does sound potentially useful but for the fact that several
containers could share one or more types of namespaces.

Would logging just a container ID be sufficient for audit purposes?  I'm
going to have to dig a bit to understand that one because I was unaware
each container had a unique ID.

I did originally consider a UUID/GUID for namespaces.

> James

- RGB

--
Richard Guy Briggs <rbriggs@redhat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] namespaces: log namespaces per task
  2014-05-05  3:48     ` Serge E. Hallyn
@ 2014-05-05 21:48       ` Richard Guy Briggs
  2014-05-05 21:51         ` James Bottomley
  0 siblings, 1 reply; 58+ messages in thread
From: Richard Guy Briggs @ 2014-05-05 21:48 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: James Bottomley, linux-audit, linux-kernel, containers,
	serge.hallyn, eparis, sgrubb, ebiederm

On 14/05/05, Serge E. Hallyn wrote:
> Quoting James Bottomley (James.Bottomley@HansenPartnership.com):
> > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote:
> > > Questions:
> > > Is there a way to link serial numbers of namespaces involved in migration of a
> > > container to another kernel?  (I had a brief look at CRIU.)  Is there a unique
> > > identifier for each running instance of a kernel?  Or at least some identifier
> > > within the container migration realm?
> > 
> > Are you asking for a way of distinguishing a migrated container from an
> > unmigrated one?  The answer is pretty much "no" because the job of
> > migration is to restore to the same state as much as possible.
> > 
> > Reading between the lines, I think your goal is to correlate audit
> > information across a container migration, right?  Ideally the management
> > system should be able to cough up an audit trail for a container
> > wherever it's running and however many times it's been migrated?
> > 
> > In that case, I think your idea of a numeric serial number in a dense
> > range is wrong.  Because the range is dense you're obviously never going
> > to be able to use the same serial number across a migration.  However,
> 
> Ah, but I was being silly before, we can actually address this pretty
> simply.  If we just (for instance) add
> /proc/self/ns/{ipc,mnt,net,pid,user,uts}_seq containing the serial number
> for the relevant ns for the task, then criu can dump this info at
> checkpoint.  Then at restart it can dump an audit message per task and
> ns saying old_serial=%x,new_serial=%x.  That way the audit log reader
> can if it cares keep track.

This is the sort of idea I had in mind...

> -serge
> 
> (Another, more heavyweight approach would be to track all ns hierarchies
> and make the serial numbers per-namespace-instance.  So my container's
> pidns serial might be 0x2, and if it clones a new pidns that would be
> "(0x2,0x1)" on the host, or just 0x1 inside the container.  But we don't
> need that if the simple userspace approach suffices)

This sounds manageable...

- RGB

--
Richard Guy Briggs <rbriggs@redhat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] namespaces: log namespaces per task
  2014-05-05 21:48       ` Richard Guy Briggs
@ 2014-05-05 21:51         ` James Bottomley
  2014-05-05 22:11           ` Richard Guy Briggs
  2014-05-05 22:27           ` Serge Hallyn
  0 siblings, 2 replies; 58+ messages in thread
From: James Bottomley @ 2014-05-05 21:51 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: Serge E. Hallyn, linux-audit, linux-kernel, containers,
	serge.hallyn, eparis, sgrubb, ebiederm

On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote:
> On 14/05/05, Serge E. Hallyn wrote:
> > Quoting James Bottomley (James.Bottomley@HansenPartnership.com):
> > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote:
> > > > Questions:
> > > > Is there a way to link serial numbers of namespaces involved in migration of a
> > > > container to another kernel?  (I had a brief look at CRIU.)  Is there a unique
> > > > identifier for each running instance of a kernel?  Or at least some identifier
> > > > within the container migration realm?
> > > 
> > > Are you asking for a way of distinguishing a migrated container from an
> > > unmigrated one?  The answer is pretty much "no" because the job of
> > > migration is to restore to the same state as much as possible.
> > > 
> > > Reading between the lines, I think your goal is to correlate audit
> > > information across a container migration, right?  Ideally the management
> > > system should be able to cough up an audit trail for a container
> > > wherever it's running and however many times it's been migrated?
> > > 
> > > In that case, I think your idea of a numeric serial number in a dense
> > > range is wrong.  Because the range is dense you're obviously never going
> > > to be able to use the same serial number across a migration.  However,
> > 
> > Ah, but I was being silly before, we can actually address this pretty
> > simply.  If we just (for instance) add
> > /proc/self/ns/{ipc,mnt,net,pid,user,uts}_seq containing the serial number
> > for the relevant ns for the task, then criu can dump this info at
> > checkpoint.  Then at restart it can dump an audit message per task and
> > ns saying old_serial=%x,new_serial=%x.  That way the audit log reader
> > can if it cares keep track.
> 
> This is the sort of idea I had in mind...

OK, but I don't understand then why you need a serial number.  There are
plenty of things we preserve across a migration, like namespace name for
instance.  Could you explain what function it performs because I think I
might be missing something.

Thanks,

James


> > -serge
> > 
> > (Another, more heavyweight approach would be to track all ns hierarchies
> > and make the serial numbers per-namespace-instance.  So my container's
> > pidns serial might be 0x2, and if it clones a new pidns that would be
> > "(0x2,0x1)" on the host, or just 0x1 inside the container.  But we don't
> > need that if the simple userspace approach suffices)
> 
> This sounds manageable...
> 
> - RGB
> 
> --
> Richard Guy Briggs <rbriggs@redhat.com>
> Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
> Remote, Ottawa, Canada
> Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] namespaces: log namespaces per task
  2014-05-05 21:51         ` James Bottomley
@ 2014-05-05 22:11           ` Richard Guy Briggs
  2014-05-05 22:24             ` James Bottomley
  2014-05-05 22:27           ` Serge Hallyn
  1 sibling, 1 reply; 58+ messages in thread
From: Richard Guy Briggs @ 2014-05-05 22:11 UTC (permalink / raw)
  To: James Bottomley
  Cc: Serge E. Hallyn, linux-audit, linux-kernel, containers,
	serge.hallyn, eparis, sgrubb, ebiederm

On 14/05/05, James Bottomley wrote:
> On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote:
> > On 14/05/05, Serge E. Hallyn wrote:
> > > Quoting James Bottomley (James.Bottomley@HansenPartnership.com):
> > > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote:
> > > > > Questions:
> > > > > Is there a way to link serial numbers of namespaces involved in migration of a
> > > > > container to another kernel?  (I had a brief look at CRIU.)  Is there a unique
> > > > > identifier for each running instance of a kernel?  Or at least some identifier
> > > > > within the container migration realm?
> > > > 
> > > > Are you asking for a way of distinguishing a migrated container from an
> > > > unmigrated one?  The answer is pretty much "no" because the job of
> > > > migration is to restore to the same state as much as possible.
> > > > 
> > > > Reading between the lines, I think your goal is to correlate audit
> > > > information across a container migration, right?  Ideally the management
> > > > system should be able to cough up an audit trail for a container
> > > > wherever it's running and however many times it's been migrated?
> > > > 
> > > > In that case, I think your idea of a numeric serial number in a dense
> > > > range is wrong.  Because the range is dense you're obviously never going
> > > > to be able to use the same serial number across a migration.  However,
> > > 
> > > Ah, but I was being silly before, we can actually address this pretty
> > > simply.  If we just (for instance) add
> > > /proc/self/ns/{ipc,mnt,net,pid,user,uts}_seq containing the serial number
> > > for the relevant ns for the task, then criu can dump this info at
> > > checkpoint.  Then at restart it can dump an audit message per task and
> > > ns saying old_serial=%x,new_serial=%x.  That way the audit log reader
> > > can if it cares keep track.
> > 
> > This is the sort of idea I had in mind...
> 
> OK, but I don't understand then why you need a serial number.  There are
> plenty of things we preserve across a migration, like namespace name for
> instance.  Could you explain what function it performs because I think I
> might be missing something.

If a container was defined as an entity with 6 namespaces to itself,
this would make sense.  As Eric P. put it, containers/namespaces seem to
be a bucket of semi-related nuts and bolts, with any namespace being
optional depending on the application.  My understanding is a
container could be migrated to another host requiring the creation of
(none,) some or all of its namespaces, potentially leaving behind some
of its shared namespaces and/or clashing names of namespaces on the
destination host.

> James
> 
> > > -serge
> > > 
> > > (Another, more heavyweight approach would be to track all ns hierarchies
> > > and make the serial numbers per-namespace-instance.  So my container's
> > > pidns serial might be 0x2, and if it clones a new pidns that would be
> > > "(0x2,0x1)" on the host, or just 0x1 inside the container.  But we don't
> > > need that if the simple userspace approach suffices)
> > 
> > This sounds manageable...
> > 
> > - RGB
> > 
> > --
> > Richard Guy Briggs <rbriggs@redhat.com>
> > Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
> > Remote, Ottawa, Canada
> > Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
> 
> 
> 

- RGB

--
Richard Guy Briggs <rbriggs@redhat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] namespaces: log namespaces per task
  2014-05-05 22:11           ` Richard Guy Briggs
@ 2014-05-05 22:24             ` James Bottomley
  0 siblings, 0 replies; 58+ messages in thread
From: James Bottomley @ 2014-05-05 22:24 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: Serge E. Hallyn, linux-audit, linux-kernel, containers,
	serge.hallyn, eparis, sgrubb, ebiederm

On Mon, 2014-05-05 at 18:11 -0400, Richard Guy Briggs wrote:
> On 14/05/05, James Bottomley wrote:
> > On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote:
> > > On 14/05/05, Serge E. Hallyn wrote:
> > > > Quoting James Bottomley (James.Bottomley@HansenPartnership.com):
> > > > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote:
> > > > > > Questions:
> > > > > > Is there a way to link serial numbers of namespaces involved in migration of a
> > > > > > container to another kernel?  (I had a brief look at CRIU.)  Is there a unique
> > > > > > identifier for each running instance of a kernel?  Or at least some identifier
> > > > > > within the container migration realm?
> > > > > 
> > > > > Are you asking for a way of distinguishing an migrated container from an
> > > > > unmigrated one?  The answer is pretty much "no" because the job of
> > > > > migration is to restore to the same state as much as possible.
> > > > > 
> > > > > Reading between the lines, I think your goal is to correlate audit
> > > > > information across a container migration, right?  Ideally the management
> > > > > system should be able to cough up an audit trail for a container
> > > > > wherever it's running and however many times it's been migrated?
> > > > > 
> > > > > In that case, I think your idea of a numeric serial number in a dense
> > > > > range is wrong.  Because the range is dense you're obviously never going
> > > > > to be able to use the same serial number across a migration.  However,
> > > > 
> > > > Ah, but I was being silly before, we can actually address this pretty
> > > > simply.  If we just (for instance) add
> > > > /proc/self/ns/{ic,mnt,net,pid,user,uts}_seq containing the serial number
> > > > for the relevant ns for the task, then criu can dump this info at
> > > > checkpoint.  Then at restart it can dump an audit message per task and
> > > > ns saying old_serial=%x,new_serial=%x.  That way the audit log reader
> > > > can if it cares keep track.
> > > 
> > > This is the sort of idea I had in mind...
> > 
> > OK, but I don't understand then why you need a serial number.  There are
> > plenty of things we preserve across a migration, like namespace name for
> > instance.  Could you explain what function it performs because I think I
> > might be missing something.
> 
> If a container was defined as an entity with 6 namespaces to itself,
> this would make sense.  As Eric P. put it, containers/namespaces seem to
> be a bucket of semi-related nuts and bolts, with any namespace being
> optional depending on the application.

That's right.  An IaaS container has a well defined composition, since
it has to contain a full OS, but an application container is variable.

It's the usual procedure with container management systems to have one
name for the container and give this name to all the namespaces, but I
agree, it doesn't have to.

>   My understanding is a
> container could be migrated to another host requiring the creation of
> (none,) some or all of its namespaces, potentially leaving behind some
> of its shared namespaces and/or clashing names of namespaces on the
> destination host.

Well, no, the environment gets migrated as well so when the migration is
over, the namespaces the migrated entity is in will look the same as
before the migration ... if they didn't exist on the recipient, they'll
be created.  If a namespace already exists the restore fails ... this is
because we support the usual container case where you're migrating to a
disjoint set of namespaces.
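
To make the restore rule described above concrete, here is a minimal sketch, assuming namespaces are identified purely by name. The function and exception names are hypothetical illustrations and are not CRIU interfaces.

```python
class NamespaceClash(Exception):
    """Raised when a namespace name to restore already exists on the target."""


def restore_namespaces(existing, incoming):
    """Return the destination's namespace names after a restore.

    existing: names already present on the destination host.
    incoming: names carried over from the source host.
    """
    incoming = list(incoming)
    clashes = sorted(set(incoming) & set(existing))
    if clashes:
        # Disjoint-set assumption: refuse the restore instead of sharing
        # or renaming a namespace that already exists.
        raise NamespaceClash(", ".join(clashes))
    # Names missing on the destination are simply created.
    return set(existing) | set(incoming)
```

Under this model the names are the same before and after migration, which is the invariant the next paragraph relies on.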

Even if there were some reason for supporting shared namespaces, the
fundamental invariant is still the namespace names (i.e. the namespaces
have the same names before and after migration), so how does a serial
number help?

James

> > James
> > 
> > > > -serge
> > > > 
> > > > (Another, more heavyweight approach would be to track all ns hierarchies
> > > > and make the serial numbers per-namespace-instance.  So my container's
> > > > pidns serial might be 0x2, and if it clones a new pidns that would be
> > > > "(0x2,0x1)" on the host, or just 0x1 inside the container.  But we don't
> > > > need that if the simple userspace approach suffices)
> > > 
> > > This sounds manageable...
> > > 
> > > - RGB
> > > 
> > > --
> > > Richard Guy Briggs <rbriggs@redhat.com>
> > > Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
> > > Remote, Ottawa, Canada
> > > Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
> > 
> > 
> > 
> 
> - RGB
> 
> --
> Richard Guy Briggs <rbriggs@redhat.com>
> Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
> Remote, Ottawa, Canada
> Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] namespaces: log namespaces per task
  2014-05-05 21:51         ` James Bottomley
  2014-05-05 22:11           ` Richard Guy Briggs
@ 2014-05-05 22:27           ` Serge Hallyn
  2014-05-05 22:30             ` James Bottomley
  1 sibling, 1 reply; 58+ messages in thread
From: Serge Hallyn @ 2014-05-05 22:27 UTC (permalink / raw)
  To: James Bottomley
  Cc: Richard Guy Briggs, Serge E. Hallyn, linux-audit, linux-kernel,
	containers, eparis, sgrubb, ebiederm

Quoting James Bottomley (James.Bottomley@HansenPartnership.com):
> On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote:
> > On 14/05/05, Serge E. Hallyn wrote:
> > > Quoting James Bottomley (James.Bottomley@HansenPartnership.com):
> > > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote:
> > > > > Questions:
> > > > > Is there a way to link serial numbers of namespaces involved in migration of a
> > > > > container to another kernel?  (I had a brief look at CRIU.)  Is there a unique
> > > > > identifier for each running instance of a kernel?  Or at least some identifier
> > > > > within the container migration realm?
> > > > 
> > > > Are you asking for a way of distinguishing an migrated container from an
> > > > unmigrated one?  The answer is pretty much "no" because the job of
> > > > migration is to restore to the same state as much as possible.
> > > > 
> > > > Reading between the lines, I think your goal is to correlate audit
> > > > information across a container migration, right?  Ideally the management
> > > > system should be able to cough up an audit trail for a container
> > > > wherever it's running and however many times it's been migrated?
> > > > 
> > > > In that case, I think your idea of a numeric serial number in a dense
> > > > range is wrong.  Because the range is dense you're obviously never going
> > > > to be able to use the same serial number across a migration.  However,
> > > 
> > > Ah, but I was being silly before, we can actually address this pretty
> > > simply.  If we just (for instance) add
> > > /proc/self/ns/{ic,mnt,net,pid,user,uts}_seq containing the serial number
> > > for the relevant ns for the task, then criu can dump this info at
> > > checkpoint.  Then at restart it can dump an audit message per task and
> > > ns saying old_serial=%x,new_serial=%x.  That way the audit log reader
> > > can if it cares keep track.
> > 
> > This is the sort of idea I had in mind...
> 
> OK, but I don't understand then why you need a serial number.  There are
> plenty of things we preserve across a migration, like namespace name for
> instance.  Could you explain what function it performs because I think I
> might be missing something.

We're looking ahead to a time when audit is namespaced, and a container
can keep its own audit logs (without limiting what the host audits of
course).  So if a container is auditing suspicious activity by some
task in a sub-namespace, then the whole parent container gets migrated,
after migration we want to continue being able to correlate the namespaces.

We're also looking at audit trails on a host that is up for years.  We
would like every namespace to be uniquely logged there.  That is why
inode #s on /proc/self/ns/* are not sufficient, unless we add a generation
# (which would end up more complicated, not less, than a serial #).
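
As a rough illustration of the correlation step, an audit log reader could collect the restart-time records and translate source-host serials into destination-host serials. This is only a sketch: the old_serial=%x,new_serial=%x fields come from the proposal quoted above, while the record prefix in the usage note and the function names are assumptions.

```python
import re

# Matches the hex fields proposed in the thread: old_serial=%x,new_serial=%x
REMAP = re.compile(r"old_serial=([0-9a-fA-F]+),new_serial=([0-9a-fA-F]+)")


def build_remap(lines):
    """Collect old -> new serial mappings from restart-time records."""
    remap = {}
    for line in lines:
        match = REMAP.search(line)
        if match:
            remap[int(match.group(1), 16)] = int(match.group(2), 16)
    return remap


def translate(serial, remap):
    """Map a source-host serial to its destination-host serial, if known."""
    return remap.get(serial, serial)
```

For example, build_remap(["type=HYPOTHETICAL old_serial=2a,new_serial=7"]) yields {42: 7}. A reader that follows several migrations would also need to key each mapping by host, since serials are only unique within one boot of one kernel.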

-serge

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] namespaces: log namespaces per task
  2014-05-05 22:27           ` Serge Hallyn
@ 2014-05-05 22:30             ` James Bottomley
  2014-05-05 22:36               ` Serge Hallyn
  0 siblings, 1 reply; 58+ messages in thread
From: James Bottomley @ 2014-05-05 22:30 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: Richard Guy Briggs, Serge E. Hallyn, linux-audit, linux-kernel,
	containers, eparis, sgrubb, ebiederm

On Mon, 2014-05-05 at 22:27 +0000, Serge Hallyn wrote:
> Quoting James Bottomley (James.Bottomley@HansenPartnership.com):
> > On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote:
> > > On 14/05/05, Serge E. Hallyn wrote:
> > > > Quoting James Bottomley (James.Bottomley@HansenPartnership.com):
> > > > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote:
> > > > > > Questions:
> > > > > > Is there a way to link serial numbers of namespaces involved in migration of a
> > > > > > container to another kernel?  (I had a brief look at CRIU.)  Is there a unique
> > > > > > identifier for each running instance of a kernel?  Or at least some identifier
> > > > > > within the container migration realm?
> > > > > 
> > > > > Are you asking for a way of distinguishing an migrated container from an
> > > > > unmigrated one?  The answer is pretty much "no" because the job of
> > > > > migration is to restore to the same state as much as possible.
> > > > > 
> > > > > Reading between the lines, I think your goal is to correlate audit
> > > > > information across a container migration, right?  Ideally the management
> > > > > system should be able to cough up an audit trail for a container
> > > > > wherever it's running and however many times it's been migrated?
> > > > > 
> > > > > In that case, I think your idea of a numeric serial number in a dense
> > > > > range is wrong.  Because the range is dense you're obviously never going
> > > > > to be able to use the same serial number across a migration.  However,
> > > > 
> > > > Ah, but I was being silly before, we can actually address this pretty
> > > > simply.  If we just (for instance) add
> > > > /proc/self/ns/{ic,mnt,net,pid,user,uts}_seq containing the serial number
> > > > for the relevant ns for the task, then criu can dump this info at
> > > > checkpoint.  Then at restart it can dump an audit message per task and
> > > > ns saying old_serial=%x,new_serial=%x.  That way the audit log reader
> > > > can if it cares keep track.
> > > 
> > > This is the sort of idea I had in mind...
> > 
> > OK, but I don't understand then why you need a serial number.  There are
> > plenty of things we preserve across a migration, like namespace name for
> > instance.  Could you explain what function it performs because I think I
> > might be missing something.
> 
> We're looking ahead to a time when audit is namespaced, and a container
> can keep its own audit logs (without limiting what the host audits of
> course).  So if a container is auditing suspicious activity by some
> task in a sub-namespace, then the whole parent container gets migrated,
> after migration we want to continue being able to correlate the namespaces.
> 
> We're also looking at audit trails on a host that is up for years.  We
> would like every namespace to be uniquely logged there.  That is why
> inode #s on /proc/self/ns/* are not sufficient, unless we add a generation
> # (which would end up more complicated, not less, than a serial #).

Right, but when the container has an audit namespace, that namespace has
a name, which CRIU would migrate, so why not use that name for the
log .. no need for numbers (unless you make the name a number, of
course)?

James



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] namespaces: log namespaces per task
  2014-05-05 22:30             ` James Bottomley
@ 2014-05-05 22:36               ` Serge Hallyn
  2014-05-05 23:23                   ` James Bottomley
  0 siblings, 1 reply; 58+ messages in thread
From: Serge Hallyn @ 2014-05-05 22:36 UTC (permalink / raw)
  To: James Bottomley
  Cc: containers, linux-kernel, eparis, linux-audit, ebiederm, sgrubb

Quoting James Bottomley (James.Bottomley@HansenPartnership.com):
> On Mon, 2014-05-05 at 22:27 +0000, Serge Hallyn wrote:
> > Quoting James Bottomley (James.Bottomley@HansenPartnership.com):
> > > On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote:
> > > > On 14/05/05, Serge E. Hallyn wrote:
> > > > > Quoting James Bottomley (James.Bottomley@HansenPartnership.com):
> > > > > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote:
> > > > > > > Questions:
> > > > > > > Is there a way to link serial numbers of namespaces involved in migration of a
> > > > > > > container to another kernel?  (I had a brief look at CRIU.)  Is there a unique
> > > > > > > identifier for each running instance of a kernel?  Or at least some identifier
> > > > > > > within the container migration realm?
> > > > > > 
> > > > > > Are you asking for a way of distinguishing an migrated container from an
> > > > > > unmigrated one?  The answer is pretty much "no" because the job of
> > > > > > migration is to restore to the same state as much as possible.
> > > > > > 
> > > > > > Reading between the lines, I think your goal is to correlate audit
> > > > > > information across a container migration, right?  Ideally the management
> > > > > > system should be able to cough up an audit trail for a container
> > > > > > wherever it's running and however many times it's been migrated?
> > > > > > 
> > > > > > In that case, I think your idea of a numeric serial number in a dense
> > > > > > range is wrong.  Because the range is dense you're obviously never going
> > > > > > to be able to use the same serial number across a migration.  However,
> > > > > 
> > > > > Ah, but I was being silly before, we can actually address this pretty
> > > > > simply.  If we just (for instance) add
> > > > > /proc/self/ns/{ic,mnt,net,pid,user,uts}_seq containing the serial number
> > > > > for the relevant ns for the task, then criu can dump this info at
> > > > > checkpoint.  Then at restart it can dump an audit message per task and
> > > > > ns saying old_serial=%x,new_serial=%x.  That way the audit log reader
> > > > > can if it cares keep track.
> > > > 
> > > > This is the sort of idea I had in mind...
> > > 
> > > OK, but I don't understand then why you need a serial number.  There are
> > > plenty of things we preserve across a migration, like namespace name for
> > > instance.  Could you explain what function it performs because I think I
> > > might be missing something.
> > 
> > We're looking ahead to a time when audit is namespaced, and a container
> > can keep its own audit logs (without limiting what the host audits of
> > course).  So if a container is auditing suspicious activity by some
> > task in a sub-namespace, then the whole parent container gets migrated,
> > after migration we want to continue being able to correlate the namespaces.
> > 
> > We're also looking at audit trails on a host that is up for years.  We
> > would like every namespace to be uniquely logged there.  That is why
> > inode #s on /proc/self/ns/* are not sufficient, unless we add a generation
> > # (which would end up more complicated, not less, than a serial #).
> 
> Right, but when the container has an audit namespace, that namespace has
> a name,

What ns has a name?  The audit ns can be tied to 50 pid namespaces, and we
want to log which pidns is responsible for something.

If you mean the pidns has a name, that's the problem...  it does not, it
only has an inode # which may later be reused.
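
For reference, the only per-namespace identifier userspace can read today is the inode number embedded in the /proc/<pid>/ns/* symlink targets, which look like "pid:[4026531836]". A minimal sketch of reading them follows; the helper names are made up for illustration, and current_ns_inodes() assumes a Linux /proc.

```python
import os
import re

# Symlink targets under /proc/<pid>/ns look like "pid:[4026531836]".
NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a namespace symlink target into (type, inode number)."""
    match = NS_LINK.match(target)
    if match is None:
        raise ValueError("unexpected namespace link: %r" % (target,))
    return match.group(1), int(match.group(2))


def current_ns_inodes(pid="self"):
    """Return {entry name: inode number} for one task (Linux only)."""
    base = "/proc/%s/ns" % pid
    inodes = {}
    for name in os.listdir(base):
        _kind, inode = parse_ns_link(os.readlink(os.path.join(base, name)))
        inodes[name] = inode
    return inodes
```

Nothing in this value records which incarnation of the namespace it referred to, which is the reuse problem described above.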

> which CRIU would migrate, so why not use that name for the
> log .. no need for numbers (unless you make the name a number, of
> course)?
> 
> James

Sorry if I'm being dense...

-serge

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] namespaces: log namespaces per task
  2014-05-05 22:36               ` Serge Hallyn
@ 2014-05-05 23:23                   ` James Bottomley
  0 siblings, 0 replies; 58+ messages in thread
From: James Bottomley @ 2014-05-05 23:23 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: containers, linux-kernel, eparis, linux-audit, ebiederm, sgrubb



On May 5, 2014 3:36:38 PM PDT, Serge Hallyn <serge.hallyn@ubuntu.com> wrote:
>Quoting James Bottomley (James.Bottomley@HansenPartnership.com):
>> On Mon, 2014-05-05 at 22:27 +0000, Serge Hallyn wrote:
>> > Quoting James Bottomley (James.Bottomley@HansenPartnership.com):
>> > > On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote:
>> > > > On 14/05/05, Serge E. Hallyn wrote:
>> > > > > Quoting James Bottomley
>(James.Bottomley@HansenPartnership.com):
>> > > > > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs
>wrote:
>> > > > > > > Questions:
>> > > > > > > Is there a way to link serial numbers of namespaces
>involved in migration of a
>> > > > > > > container to another kernel?  (I had a brief look at
>CRIU.)  Is there a unique
>> > > > > > > identifier for each running instance of a kernel?  Or at
>least some identifier
>> > > > > > > within the container migration realm?
>> > > > > > 
>> > > > > > Are you asking for a way of distinguishing an migrated
>container from an
>> > > > > > unmigrated one?  The answer is pretty much "no" because the
>job of
>> > > > > > migration is to restore to the same state as much as
>possible.
>> > > > > > 
>> > > > > > Reading between the lines, I think your goal is to
>correlate audit
>> > > > > > information across a container migration, right?  Ideally
>the management
>> > > > > > system should be able to cough up an audit trail for a
>container
>> > > > > > wherever it's running and however many times it's been
>migrated?
>> > > > > > 
>> > > > > > In that case, I think your idea of a numeric serial number
>in a dense
>> > > > > > range is wrong.  Because the range is dense you're
>obviously never going
>> > > > > > to be able to use the same serial number across a
>migration.  However,
>> > > > > 
>> > > > > Ah, but I was being silly before, we can actually address
>this pretty
>> > > > > simply.  If we just (for instance) add
>> > > > > /proc/self/ns/{ic,mnt,net,pid,user,uts}_seq containing the
>serial number
>> > > > > for the relevant ns for the task, then criu can dump this
>info at
>> > > > > checkpoint.  Then at restart it can dump an audit message per
>task and
>> > > > > ns saying old_serial=%x,new_serial=%x.  That way the audit
>log reader
>> > > > > can if it cares keep track.
>> > > > 
>> > > > This is the sort of idea I had in mind...
>> > > 
>> > > OK, but I don't understand then why you need a serial number. 
>There are
>> > > plenty of things we preserve across a migration, like namespace
>name for
>> > > instance.  Could you explain what function it performs because I
>think I
>> > > might be missing something.
>> > 
>> > We're looking ahead to a time when audit is namespaced, and a
>container
>> > can keep its own audit logs (without limiting what the host audits
>of
>> > course).  So if a container is auditing suspicious activity by some
>> > task in a sub-namespace, then the whole parent container gets
>migrated,
>> > after migration we want to continue being able to correlate the
>namespaces.
>> > 
>> > We're also looking at audit trails on a host that is up for years. 
>We
>> > would like every namespace to be uniquely logged there.  That is
>why
>> > inode #s on /proc/self/ns/* are not sufficient, unless we add a
>generation
>> > # (which would end up more complicated, not less, than a serial #).
>> 
>> Right, but when the container has an audit namespace, that namespace
>has
>> a name,
>
>What ns has a name?

The netns for instance.

>  The audit ns can be tied to 50 pid namespaces, and
>we
>want to log which pidns is responsible for something.
>
>If you mean the pidns has a name, that's the problem...  it does not,
>it
>only has an inode # which may later be reused.

I still think there's a miscommunication somewhere: I believe you just need a stable id to tie the audit to, so why not just give the audit namespace a name like net?  The id would then be durable across migrations.

>> which CRIU would migrate, so why not use that name for the
>> log .. no need for numbers (unless you make the name a number, of
>> course)?
>> 
>> James
>
>Sorry if I'm being dense...

No, I think our assumptions are mismatched. I just can't figure out where.

James

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] namespaces: log namespaces per task
  2014-05-05 23:23                   ` James Bottomley
@ 2014-05-06  3:27                     ` Serge Hallyn
  -1 siblings, 0 replies; 58+ messages in thread
From: Serge Hallyn @ 2014-05-06  3:27 UTC (permalink / raw)
  To: James Bottomley
  Cc: containers, linux-kernel, eparis, linux-audit, ebiederm, sgrubb

Quoting James Bottomley (James.Bottomley@HansenPartnership.com):
> 
> 
> On May 5, 2014 3:36:38 PM PDT, Serge Hallyn <serge.hallyn@ubuntu.com> wrote:
> >Quoting James Bottomley (James.Bottomley@HansenPartnership.com):
> >> On Mon, 2014-05-05 at 22:27 +0000, Serge Hallyn wrote:
> >> > Quoting James Bottomley (James.Bottomley@HansenPartnership.com):
> >> > > On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote:
> >> > > > On 14/05/05, Serge E. Hallyn wrote:
> >> > > > > Quoting James Bottomley
> >(James.Bottomley@HansenPartnership.com):
> >> > > > > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs
> >wrote:
> >> > > > > > > Questions:
> >> > > > > > > Is there a way to link serial numbers of namespaces
> >involved in migration of a
> >> > > > > > > container to another kernel?  (I had a brief look at
> >CRIU.)  Is there a unique
> >> > > > > > > identifier for each running instance of a kernel?  Or at
> >least some identifier
> >> > > > > > > within the container migration realm?
> >> > > > > > 
> >> > > > > > Are you asking for a way of distinguishing an migrated
> >container from an
> >> > > > > > unmigrated one?  The answer is pretty much "no" because the
> >job of
> >> > > > > > migration is to restore to the same state as much as
> >possible.
> >> > > > > > 
> >> > > > > > Reading between the lines, I think your goal is to
> >correlate audit
> >> > > > > > information across a container migration, right?  Ideally
> >the management
> >> > > > > > system should be able to cough up an audit trail for a
> >container
> >> > > > > > wherever it's running and however many times it's been
> >migrated?
> >> > > > > > 
> >> > > > > > In that case, I think your idea of a numeric serial number
> >in a dense
> >> > > > > > range is wrong.  Because the range is dense you're
> >obviously never going
> >> > > > > > to be able to use the same serial number across a
> >migration.  However,
> >> > > > > 
> >> > > > > Ah, but I was being silly before, we can actually address
> >this pretty
> >> > > > > simply.  If we just (for instance) add
> >> > > > > /proc/self/ns/{ic,mnt,net,pid,user,uts}_seq containing the
> >serial number
> >> > > > > for the relevant ns for the task, then criu can dump this
> >info at
> >> > > > > checkpoint.  Then at restart it can dump an audit message per
> >task and
> >> > > > > ns saying old_serial=%x,new_serial=%x.  That way the audit
> >log reader
> >> > > > > can if it cares keep track.
> >> > > > 
> >> > > > This is the sort of idea I had in mind...
> >> > > 
> >> > > OK, but I don't understand then why you need a serial number. 
> >There are
> >> > > plenty of things we preserve across a migration, like namespace
> >name for
> >> > > instance.  Could you explain what function it performs because I
> >think I
> >> > > might be missing something.
> >> > 
> >> > We're looking ahead to a time when audit is namespaced, and a
> >container
> >> > can keep its own audit logs (without limiting what the host audits
> >of
> >> > course).  So if a container is auditing suspicious activity by some
> >> > task in a sub-namespace, then the whole parent container gets
> >migrated,
> >> > after migration we want to continue being able to correlate the
> >namespaces.
> >> > 
> >> > We're also looking at audit trails on a host that is up for years. 
> >We
> >> > would like every namespace to be uniquely logged there.  That is
> >why
> >> > inode #s on /proc/self/ns/* are not sufficient, unless we add a
> >generation
> >> > # (which would end up more complicated, not less, than a serial #).
> >> 
> >> Right, but when the container has an audit namespace, that namespace
> >has
> >> a name,
> >
> >What ns has a name?
> 
> The netns for instance.

And what is its name?  The only name I know that we could log in an
audit message is the /proc/self/ns/net inode number (which does not
suffice).

> >  The audit ns can be tied to 50 pid namespaces, and
> >we
> >want to log which pidns is responsible for something.
> >
> >If you mean the pidns has a name, that's the problem...  it does not,
> >it
> >only has an inode # which may later be reused.
> 
> I still think there's a miscommunication somewhere: I believe you just need a stable id to tie the audit to, so why not just give the audit namespace a name like net?  The id would then be durable across migrations.

Maybe this is where we're confusing each other - I'm not talking
about giving the audit ns a name.  I'm talking about being able to
identify the other namespaces inside an audit message.  In a way
that (a) is unique across the bare-metal host's entire uptime, and (b)
can be tracked across migrations.

And again we don't need to actually implement all that now - all
I wanted to make sure of was that the serial # as proposed by Richard
could be made to work for those purposes, and I now believe it can.
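
Property (a) can be pictured with a toy allocator, sketched here under the assumption that serials come from a single counter that only ever increases during one boot. The class is a userspace illustration, not the code in Richard's patch.

```python
import itertools


class SerialAllocator:
    """Hands out namespace serial numbers that are never reused."""

    def __init__(self):
        self._next = itertools.count(1)
        self.live = {}  # serial -> namespace type, for namespaces still alive

    def create(self, kind):
        serial = next(self._next)
        self.live[serial] = kind
        return serial

    def destroy(self, serial):
        # The serial is retired for good; unlike an inode number it is
        # never handed out again during this boot.
        del self.live[serial]
```

Creating a pidns, destroying it and creating another yields serials 1 and 2, so an old log entry mentioning serial 1 can never be confused with the newer namespace. Property (b) then only needs the old-to-new mapping logged at restart.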

> >> which CRIU would migrate, so why not use that name for the
> >> log .. no need for numbers (unless you make the name a number, of
> >> course)?
> >> 
> >> James
> >
> >Sorry if I'm being dense...
> 
> No I think our assumptions are mismatched. I just can't figure out where.
> 
> James

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] namespaces: log namespaces per task
  2014-05-05 21:44     ` Richard Guy Briggs
@ 2014-05-06  3:33       ` Serge Hallyn
  2014-05-06 14:03         ` Richard Guy Briggs
  2014-05-06 14:03           ` Richard Guy Briggs
  0 siblings, 2 replies; 58+ messages in thread
From: Serge Hallyn @ 2014-05-06  3:33 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: James Bottomley, containers, linux-kernel, eparis, linux-audit,
	ebiederm, sgrubb

Quoting Richard Guy Briggs (rgb@redhat.com):
> On 14/05/03, James Bottomley wrote:
> > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote:
> > > Questions:
> > > Is there a way to link serial numbers of namespaces involved in migration of a
> > > container to another kernel?  (I had a brief look at CRIU.)  Is there a unique
> > > identifier for each running instance of a kernel?  Or at least some identifier
> > > within the container migration realm?
> > 
> > Are you asking for a way of distinguishing an migrated container from an
> > unmigrated one?  The answer is pretty much "no" because the job of
> > migration is to restore to the same state as much as possible.
> 
> I hadn't thought to distinguish a migrated container from an unmigrated
> one, but rather I'm more interested in the underlying namespaces.  The
> use of a generation number to identify a migrated namespace may be
> useful along with the logging to tie them together.
> 
> > Reading between the lines, I think your goal is to correlate audit
> > information across a container migration, right?  Ideally the management
> > system should be able to cough up an audit trail for a container
> > wherever it's running and however many times it's been migrated?
> 
> The original intent was to track the underlying namespaces themselves.
> This sounds like another layer on top of that which sounds useful but
> that I had not yet considered.
> 
> But yes, that sounds like a good eventual goal.

Right, and we don't need that now; all *I* wanted to convince myself of
was that a serial # as you were using it was not going to be a roadblock
to that, since once we introduce a serial #, we're stuck with that as
a user-space-facing API.

> > In that case, I think your idea of a numeric serial number in a dense
> > range is wrong.  Because the range is dense you're obviously never going
> > to be able to use the same serial number across a migration.  However,
> > if you look at all the management systems for containers, they all have
> > a concept of some unique ID per container, be it name, UUID or even
> > GUID.  I suspect it's that you should be using to tag the audit trail
> > with.
> 
> That does sound potentially useful but for the fact that several
> containers could share one or more types of namespaces.
> 
> Would logging just a container ID be sufficient for audit purposes?  I'm
> going to have to dig a bit to understand that one because I was unaware
> each container had a unique ID.

They don't :)

> I did originally consider a UUID/GUID for namespaces.

So I think that apart from resending to address the serial # overflow
comment, I'm happy to ack the patches.  Then we probably need to convince
Eric that we're not torturing kittens.

-serge

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] namespaces: log namespaces per task
  2014-05-06  3:27                     ` Serge Hallyn
  (?)
@ 2014-05-06  4:59                     ` James Bottomley
       [not found]                       ` <1399352350.2164.91.camel-sFMDBYUN5F8GjUHQrlYNx2Wm91YjaHnnhRte9Li2A+AAvxtiuMwx3w@public.gmane.org>
  -1 siblings, 1 reply; 58+ messages in thread
From: James Bottomley @ 2014-05-06  4:59 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: containers, linux-kernel, eparis, linux-audit, ebiederm, sgrubb

On Tue, 2014-05-06 at 03:27 +0000, Serge Hallyn wrote:
> Quoting James Bottomley (James.Bottomley@HansenPartnership.com):
> > >> Right, but when the container has an audit namespace, that namespace has
> > >> a name,
> > >
> > >What ns has a name?
> > 
> > The netns for instance.
> 
> And what is its name?

As I think you know, "ip netns list" will show you all of them.  The way
they're applied is via mapped files in /var/run/netns/ which hold the
names.

>   The only name I know that we could log in an
> audit message is the /proc/self/ns/net inode number (which does not
> suffice)

OK, so I think this is the confusion: You're thinking the container
itself doesn't know what name the namespace has been given by the
system, all it knows is the inode number corresponding to a file which
it may or may not be able to see, right?  I'm thinking that the system
that set up the container gave those files names and usually they're the
same name for all the namespaces.  The point is that the orchestration
system (whatever set up the container) will be responsible for the
migration.  It will be the thing that has a unique handle for the
container.  The handle is usually ascii representable, either a human
readable name or some uuid/guid.  It's that handle that we should be
using to prefix the audit message, so when you set up an audit
namespace, it gets supplied with a prefix string corresponding to the
well known name for the container.  This is the string we'd preserve
across migration as part of the audit namespace state ... so the audit
messages all correlate to the container wherever it's migrated to; no
need to do complex tracking of changes to serial numbers.

> > >  The audit ns can be tied to 50 pid namespaces, and we
> > >want to log which pidns is responsible for something.
> > >
> > >If you mean the pidns has a name, that's the problem...  it does not, it
> > >only has an inode # which may later be re-used.
> > 
> > I still think there's a miscommunication somewhere: I believe you just
> > need a stable id to tie the audit to, so why not just give the audit
> > namespace a name like net?  The id would then be durable across migrations.
> 
> Maybe this is where we're confusing each other - I'm not talking
> about giving the audit ns a name.  I'm talking about being able to
> identify the other namespaces inside an audit message.  In a way
> that (a) is unique across bare metals' entire uptime, and (b)
> can be tracked across migrations.

OK, so that is different from what I'm thinking.  I'm thinking of a unique
name for the migratable entity; you want a unique name for each component
of the migratable entity?  My instinct still tells me the orchestration
system is going to have a unique identifier for each different sub
container.

However, I have to point out that a serial number isn't what you want
either if you really mean bare metal.  We do a lot of deployments where
the containers run in a hypervisor; there the serial numbers won't be
unique per box (only per VM) and we'll have to do VM correlation
separately, whereas a scheme which allows the orchestration system to
supply the names would still be unique in that situation.
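
To make the per-VM uniqueness problem concrete, here is a minimal sketch of
how a log consumer might qualify a serial number with a per-boot identifier
so that two kernels on the same box do not collide.  This is not part of the
proposed patches.  It assumes the existing /proc/sys/kernel/random/boot_id
file (a random UUID generated once per boot) is an acceptable qualifier; the
key format and the function names are purely illustrative.

```python
from pathlib import Path

# Existing procfs file holding a random UUID generated once per boot.
BOOT_ID_PATH = Path("/proc/sys/kernel/random/boot_id")


def read_boot_id(path: Path = BOOT_ID_PATH) -> str:
    """Return the per-boot UUID of the running kernel as a string."""
    return path.read_text().strip()


def qualify(boot_id: str, ns_type: str, serial: int) -> str:
    """Build a log key that stays distinct across kernels and reboots.

    The "boot/type/serial" layout is a hypothetical format chosen for
    this sketch; the serial is printed in hex to match the %x style
    used elsewhere in the thread.
    """
    return f"{boot_id}/{ns_type}/{serial:x}"
```

With this, the same serial seen on two different VMs produces two different
keys, at the cost of the aggregator having to learn each kernel's boot_id.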

James



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] namespaces: log namespaces per task
       [not found]                   ` <a09ed85b-d6ef-4472-853b-84057d5957c2-2ueSQiBKiTY7tOexoI0I+QC/G2K4zDHf@public.gmane.org>
@ 2014-05-06 12:35                     ` Nicolas Dichtel
  2014-05-06 21:41                       ` Richard Guy Briggs
  1 sibling, 0 replies; 58+ messages in thread
From: Nicolas Dichtel @ 2014-05-06 12:35 UTC (permalink / raw)
  To: James Bottomley, Serge Hallyn
  Cc: containers, linux-audit, linux-kernel, ebiederm

Le 06/05/2014 01:23, James Bottomley a écrit :
>
>
> On May 5, 2014 3:36:38 PM PDT, Serge Hallyn <serge.hallyn@ubuntu.com> wrote:
>> Quoting James Bottomley (James.Bottomley@HansenPartnership.com):
>>> On Mon, 2014-05-05 at 22:27 +0000, Serge Hallyn wrote:
>>>> Quoting James Bottomley (James.Bottomley@HansenPartnership.com):
>>>>> On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote:
>>>>>> On 14/05/05, Serge E. Hallyn wrote:
>>>>>>> Quoting James Bottomley (James.Bottomley@HansenPartnership.com):
>>>>>>>> On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote:
>>>>>>>>> Questions:
>>>>>>>>> Is there a way to link serial numbers of namespaces involved in migration of a
>>>>>>>>> container to another kernel?  (I had a brief look at CRIU.)  Is there a unique
>>>>>>>>> identifier for each running instance of a kernel?  Or at least some identifier
>>>>>>>>> within the container migration realm?
>>>>>>>>
>>>>>>>> Are you asking for a way of distinguishing a migrated container from an
>>>>>>>> unmigrated one?  The answer is pretty much "no" because the job of
>>>>>>>> migration is to restore to the same state as much as possible.
>>>>>>>>
>>>>>>>> Reading between the lines, I think your goal is to correlate audit
>>>>>>>> information across a container migration, right?  Ideally the management
>>>>>>>> system should be able to cough up an audit trail for a container
>>>>>>>> wherever it's running and however many times it's been migrated?
>>>>>>>>
>>>>>>>> In that case, I think your idea of a numeric serial number in a dense
>>>>>>>> range is wrong.  Because the range is dense you're obviously never going
>>>>>>>> to be able to use the same serial number across a migration.  However,
>>>>>>>
>>>>>>> Ah, but I was being silly before, we can actually address this pretty
>>>>>>> simply.  If we just (for instance) add
>>>>>>> /proc/self/ns/{ipc,mnt,net,pid,user,uts}_seq containing the serial number
>>>>>>> for the relevant ns for the task, then criu can dump this info at
>>>>>>> checkpoint.  Then at restart it can dump an audit message per task and
>>>>>>> ns saying old_serial=%x,new_serial=%x.  That way the audit log reader
>>>>>>> can if it cares keep track.
>>>>>>
>>>>>> This is the sort of idea I had in mind...
>>>>>
>>>>> OK, but I don't understand then why you need a serial number.  There are
>>>>> plenty of things we preserve across a migration, like namespace name for
>>>>> instance.  Could you explain what function it performs because I think I
>>>>> might be missing something.
>>>>
>>>> We're looking ahead to a time when audit is namespaced, and a container
>>>> can keep its own audit logs (without limiting what the host audits, of
>>>> course).  So if a container is auditing suspicious activity by some
>>>> task in a sub-namespace, and then the whole parent container gets migrated,
>>>> after migration we want to continue being able to correlate the namespaces.
>>>>
>>>> We're also looking at audit trails on a host that is up for years.  We
>>>> would like every namespace to be uniquely logged there.  That is why
>>>> inode #s on /proc/self/ns/* are not sufficient, unless we add a generation
>>>> # (which would end up more complicated, not less, than a serial #).
>>>
>>> Right, but when the container has an audit namespace, that namespace has
>>> a name,
>>
>> What ns has a name?
>
> The netns for instance.
netns does not have names. iproute2 uses names (a filename in fact, to hold a
reference on the netns), but the kernel never sees this name. It only gets a
file descriptor (or a pid).
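
As an illustration of the point that the name lives only in the filesystem,
the sketch below compares two paths by the (device, inode) pair that stat()
reports.  The helper names are made up for this example; only os.stat() and
its st_dev/st_ino fields are real.  On a host where a named netns exists,
its /var/run/netns/<name> entry and the /proc/<pid>/ns/net file of a task in
that namespace should compare equal, which shows that the only identity
visible here is the device/inode pair, not the name.

```python
import os
from typing import Tuple


def file_identity(path: str) -> Tuple[int, int]:
    """Return (device, inode) for whatever object the path resolves to."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def same_object(path_a: str, path_b: str) -> bool:
    """True when both names resolve to the same underlying object."""
    return file_identity(path_a) == file_identity(path_b)
```

Note that the inode alone is not enough for this comparison, which is the
same reason an inode number in an audit record would need a qualifying
device number.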


Regards,
Nicolas

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] namespaces: log namespaces per task
  2014-05-06  3:33       ` Serge Hallyn
@ 2014-05-06 14:03         ` Richard Guy Briggs
  2014-05-06 14:03           ` Richard Guy Briggs
  1 sibling, 0 replies; 58+ messages in thread
From: Richard Guy Briggs @ 2014-05-06 14:03 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: containers, linux-kernel, eparis, linux-audit,
	ebiederm, sgrubb

On 14/05/06, Serge Hallyn wrote:
> Quoting Richard Guy Briggs (rgb@redhat.com):
> > On 14/05/03, James Bottomley wrote:
> > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote:
> > > > Questions:
> > > > Is there a way to link serial numbers of namespaces involved in migration of a
> > > > container to another kernel?  (I had a brief look at CRIU.)  Is there a unique
> > > > identifier for each running instance of a kernel?  Or at least some identifier
> > > > within the container migration realm?
> > > 
> > > Are you asking for a way of distinguishing an migrated container from an
> > > unmigrated one?  The answer is pretty much "no" because the job of
> > > migration is to restore to the same state as much as possible.
> > 
> > I hadn't thought to distinguish a migrated container from an unmigrated
> > one, but rather I'm more interested in the underlying namespaces.  The
> > use of a generation number to identify a migrated namespace may be
> > useful along with the logging to tie them together.
> > 
> > > Reading between the lines, I think your goal is to correlate audit
> > > information across a container migration, right?  Ideally the management
> > > system should be able to cough up an audit trail for a container
> > > wherever it's running and however many times it's been migrated?
> > 
> > The original intent was to track the underlying namespaces themselves.
> > This sounds like another layer on top of that which sounds useful but
> > that I had not yet considered.
> > 
> > But yes, that sounds like a good eventual goal.
> 
> Right, and we don't need that now; all *I* wanted to convince myself of
> was that a serial # as you were using it was not going to be a roadblock
> to that, since once we introduce a serial #, we're stuck with that as
> a user-space-facing API.

Understood.  If a container gets migrated somewhere along with its
namespace, the namespace elsewhere is going to have a new serial number,
but the migration log is going to hopefully show both serial numbers.
If that container gets migrated back, the supporting namespace will get
yet a new serial number, with its log trail connecting the previous
remote one.  Those logs can be used by a higher layer audit aggregator
to piece together those log crumbs.
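
A minimal sketch of what such a higher-layer aggregator could do with those
log crumbs follows.  It assumes each migration produces one record pairing
the old and new serial numbers (the old_serial/new_serial idea suggested
earlier in this thread), and that each serial is qualified by some host
label, since a serial is only unique within one boot of one kernel.  The
record layout, the host labels and the function names are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# A namespace instance is identified here by (host, serial); the host
# part is an assumption of this sketch, not something the patches log.
NsId = Tuple[str, int]


def build_links(records: Iterable[Tuple[NsId, NsId]]) -> Dict[NsId, NsId]:
    """Map each post-migration id to the id it replaced."""
    return {new: old for old, new in records}


def history(ns: NsId, links: Dict[NsId, NsId]) -> List[NsId]:
    """Return the chain of ids for one namespace, oldest first."""
    chain = [ns]
    seen = {ns}
    while chain[-1] in links:
        prev = links[chain[-1]]
        if prev in seen:  # stop on a corrupt, cyclic log
            break
        chain.append(prev)
        seen.add(prev)
    chain.reverse()
    return chain
```

For a container migrated from hostA to hostB and back, history() on the
final id walks back through both migration records to the original serial.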

The serial number was intended to be an alternative to the inode numbers,
which had the issue of needing a qualifying device number to accompany
them, plus the reservation that the inode number could change in the
future to solve unforeseen technical problems.  I saw no other stable
identifiers common to all namespace types with which I could work.

Containers may have their own names, but I didn't see any consistent way
to identify namespace instances.

> > > In that case, I think your idea of a numeric serial number in a dense
> > > range is wrong.  Because the range is dense you're obviously never going
> > > to be able to use the same serial number across a migration.  However,
> > > if you look at all the management systems for containers, they all have
> > > a concept of some unique ID per container, be it name, UUID or even
> > > GUID.  I suspect it's that you should be using to tag the audit trail
> > > with.
> > 
> > That does sound potentially useful but for the fact that several
> > containers could share one or more types of namespaces.
> > 
> > Would logging just a container ID be sufficient for audit purposes?  I'm
> > going to have to dig a bit to understand that one because I was unaware
> > each container had a unique ID.
> 
> They don't :)

Ok, so I'd be looking in vain...

> > I did originally consider a UUID/GUID for namespaces.
> 
> So I think that apart from resending to address the serial # overflow
> comment, I'm happy to ack the patches.  Then we probably need to convince
> Eric that we're not torturing kittens.

I've already fixed the overflow issues.  I'll resend with the fixes.

This patch pair was intended to sort out some of my understanding of the
problem I perceived, and has helped me understand there are other layers
that need work too to make it useful, but this is a good base.

A subsequent piece would be to expose that serial number in the proc
filesystem.

> -serge

- RGB

--
Richard Guy Briggs <rbriggs@redhat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] namespaces: log namespaces per task
  2014-05-06  3:33       ` Serge Hallyn
@ 2014-05-06 14:03           ` Richard Guy Briggs
  2014-05-06 14:03           ` Richard Guy Briggs
  1 sibling, 0 replies; 58+ messages in thread
From: Richard Guy Briggs @ 2014-05-06 14:03 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: James Bottomley, containers, linux-kernel, eparis, linux-audit,
	ebiederm, sgrubb

On 14/05/06, Serge Hallyn wrote:
> Quoting Richard Guy Briggs (rgb@redhat.com):
> > On 14/05/03, James Bottomley wrote:
> > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote:
> > > > Questions:
> > > > Is there a way to link serial numbers of namespaces involved in migration of a
> > > > container to another kernel?  (I had a brief look at CRIU.)  Is there a unique
> > > > identifier for each running instance of a kernel?  Or at least some identifier
> > > > within the container migration realm?
> > > 
> > > Are you asking for a way of distinguishing a migrated container from an
> > > unmigrated one?  The answer is pretty much "no" because the job of
> > > migration is to restore to the same state as much as possible.
> > 
> > I hadn't thought to distinguish a migrated container from an unmigrated
> > one, but rather I'm more interested in the underlying namespaces.  The
> > use of a generation number to identify a migrated namespace may be
> > useful along with the logging to tie them together.
> > 
> > > Reading between the lines, I think your goal is to correlate audit
> > > information across a container migration, right?  Ideally the management
> > > system should be able to cough up an audit trail for a container
> > > wherever it's running and however many times it's been migrated?
> > 
> > The original intent was to track the underlying namespaces themselves.
> > This sounds like another layer on top of that which sounds useful but
> > that I had not yet considered.
> > 
> > But yes, that sounds like a good eventual goal.
> 
> Right and we don't need that now, all *I* wanted to convince myself of
> was that a serial # as you were using it was not going to be a roadblock
> to that, since once we introduce a serial #, we're stuck with that as
> user-space facing api.

Understood.  If a container gets migrated somewhere along with its
namespace, the namespace elsewhere is going to have a new serial number,
but the migration log is going to hopefully show both serial numbers.
If that container gets migrated back, the supporting namespace will get
yet a new serial number, with its log trail connecting the previous
remote one.  Those logs can be used by a higher layer audit aggregator
to piece together those log crumbs.
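
As a rough illustration of how such an aggregator might follow the crumbs,
the sketch below assumes a purely hypothetical migration record of the form
"from=<kernel> to=<kernel> old_serial=%x,new_serial=%x".  The serial pair
follows the format suggested elsewhere in this thread; the from/to kernel
identifiers, the record layout and both function names are invented for the
example and are not part of the patches.  Each serial is qualified by the
kernel instance that issued it, since a serial number is only unique within
one boot of one kernel.

```python
import re

# Hypothetical record layout, assumed for this sketch only.
LINE = re.compile(
    r"from=(?P<src>\S+) to=(?P<dst>\S+) "
    r"old_serial=(?P<old>[0-9a-f]+),new_serial=(?P<new>[0-9a-f]+)"
)


def parse_migrations(lines):
    """Map each (kernel, serial) created by a migration to its predecessor."""
    links = {}
    for line in lines:
        m = LINE.search(line)
        if m is None:
            continue  # not a migration record
        old_id = (m["src"], int(m["old"], 16))
        new_id = (m["dst"], int(m["new"], 16))
        links[new_id] = old_id
    return links


def lineage(links, ns_id):
    """Walk back from ns_id through every earlier incarnation of the namespace."""
    chain = [ns_id]
    seen = {ns_id}
    while chain[-1] in links:
        prev = links[chain[-1]]
        if prev in seen:  # guard against malformed, cyclic logs
            break
        chain.append(prev)
        seen.add(prev)
    return chain
```

Given one record for a move from hostA to hostB and another for the move
back, lineage() returns the newest identifier first, followed by the remote
one and then the original.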

The serial number was intended to be an alternative to the inode numbers
which had the issues of needing a qualifying device number accompanying
it, plus the reservation that that inode number could change in the
future to solve unforeseen technical problems.  I saw no other stable
identifiers common to all namespace types with which I could work.
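
To illustrate the first issue: an inode number read from /proc/<pid>/ns/*
only means something together with the device number of the filesystem it
came from.  A minimal sketch, assuming a Linux system with /proc mounted;
the helper name ns_identity is made up for this example.

```python
import os


def ns_identity(path="/proc/self/ns/net"):
    """Return the (device, inode) pair that identifies a namespace file."""
    st = os.stat(path)  # follows the link to the namespace object itself
    # The inode alone is ambiguous; it must be qualified by the device.
    return (st.st_dev, st.st_ino)
```

Two tasks share a namespace only if both members of the pair match, which
is the extra qualification a single serial number was meant to avoid.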

Containers may have their own names, but I didn't see any consistent way
to identify namespace instances.

> > > In that case, I think your idea of a numeric serial number in a dense
> > > range is wrong.  Because the range is dense you're obviously never going
> > > to be able to use the same serial number across a migration.  However,
> > > if you look at all the management systems for containers, they all have
> > > a concept of some unique ID per container, be it name, UUID or even
> > > GUID.  I suspect it's that you should be using to tag the audit trail
> > > with.
> > 
> > That does sound potentially useful but for the fact that several
> > containers could share one or more types of namespaces.
> > 
> > Would logging just a container ID be sufficient for audit purposes?  I'm
> > going to have to dig a bit to understand that one because I was unaware
> > each container had a unique ID.
> 
> They don't :)

Ok, so I'd be looking in vain...

> > I did originally consider a UUID/GUID for namespaces.
> 
> So I think that apart from resending to address the serial # overflow
> comment, I'm happy to ack the patches.  Then we probably need to convince
> Eric that we're not torturing kittens.

I've already fixed the overflow issues.  I'll resend with the fixes.

This patch pair was intended to sort out some of my understanding of the
problem I perceived, and has helped me understand there are other layers
that need work too to make it useful, but this is a good base.

A subsequent piece would be to expose that serial number in the proc
filesystem.

> -serge

- RGB

--
Richard Guy Briggs <rbriggs@redhat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] namespaces: log namespaces per task
  2014-05-06  4:59                     ` James Bottomley
@ 2014-05-06 14:50                           ` Serge Hallyn
  0 siblings, 0 replies; 58+ messages in thread
From: Serge Hallyn @ 2014-05-06 14:50 UTC (permalink / raw)
  To: James Bottomley
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	eparis-H+wXaHxf7aLQT0dZR+AlfA,
	linux-audit-H+wXaHxf7aLQT0dZR+AlfA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w, sgrubb-H+wXaHxf7aLQT0dZR+AlfA

Quoting James Bottomley (James.Bottomley-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org):
> On Tue, 2014-05-06 at 03:27 +0000, Serge Hallyn wrote:
> > Quoting James Bottomley (James.Bottomley-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org):
> > > >> Right, but when the container has an audit namespace, that namespace
> > > >has
> > > >> a name,
> > > >
> > > >What ns has a name?
> > > 
> > > The netns for instance.
> > 
> > And what is its name?
> 
> As I think you know ip netns list will show you all of them.  The way

Ah.  Now I see, thanks :)  I never actually use that feature (other
than when debugging how mounts propagation affects how that's implemented)
which is why it completely did not occur to me that this might be what you
meant.

However these names are (a) not in the kernel, (b) not unique per-boot,
and (c) not applicable to other namespaces (without more userspace
tweaking).  So these are not a substitute for what Richard is proposing.
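
A small sketch of point (a), assuming the names are simply files in a
directory such as /var/run/netns: the name lives in userspace, and the only
kernel-side identity behind it is the device/inode pair of the file.
Nothing in this sketch is part of the proposed patches, and the function
name is hypothetical.

```python
import os
from collections import defaultdict


def names_by_namespace(directory="/var/run/netns"):
    """Group userspace-assigned names by the object they actually refer to."""
    groups = defaultdict(list)
    for name in sorted(os.listdir(directory)):
        st = os.stat(os.path.join(directory, name))
        # Several names may resolve to the same (device, inode) pair.
        groups[(st.st_dev, st.st_ino)].append(name)
    return dict(groups)
```

Any group holding more than one name shows that a name is a label chosen
by userspace rather than an identifier of the namespace.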

> they're applied is via mapped files in /var/run/netns/ which hold the
> names.
> 
> >   The only name I know that we could log in an
> > audit message is the /proc/self/ns/net inode number (which does not
> > suffice)
> 
> OK, so I think this is the confusion: You're thinking the container
> itself doesn't know what name the namespace has been given by the
> system, all it knows is the inode number corresponding to a file which
> it may or may not be able to see, right?  I'm thinking that the system
> that set up the container gave those files names and usually they're the
> same name for all the namespaces.  The point is that the orchestration
> system (whatever set up the container) will be responsible for the
> migration.  It will be the thing that has a unique handle for the
> container.

(Several things to reply to there but I'll pick just one,)

We are not looking for a unique name for a container, that's far too
coarse.  Within that container there may be many daemons which have
unshared their own namespaces, e.g. cgmanager unshared a mntns,
vsftpd unshared a netns, etc.  We want the namespace identified in
the audit messages.  We want, within an audit record for a system
boot, for each namespace to be *uniquely* identified.  I don't know
how many people are still doing capp/lspp type installs, but that's
the level I'm thinking at for this.  It's not syslog, it's audit.

> The handle is usually ascii representable, either a human
> readable name or some uuid/guid.  It's that handle that we should be
> using to prefix the audit message, so when you set up an audit
> namespace, it gets supplied with a prefix string corresponding to the
> well known name for the container.  This is the string we'd preserve
> across migration as part of the audit namespace state ... so the audit
> messages all correlate to the container wherever it's migrated to; no
> need to do complex tracking of changes to serial numbers.
> 
> > > >  The audit ns can be tied to 50 pid namespaces, and
> > > >we
> > > >want to log which pidns is responsible for something.
> > > >
> > > >If you mean the pidns has a name, that's the problem...  it does not,
> > > >it
> > > >only has an inode # which may later be re-used.
> > > 
> > > I still think there's a miscommunication somewhere: I believe you just need a stable id to tie the audit to, so why not just give the audit namespace a name like net?  The id would then be durable across migrations.
> > 
> > Maybe this is where we're confusing each other - I'm not talking
> > about giving the audit ns a name.  I'm talking about being able to
> > identify the other namespaces inside an audit message.  In a way
> > that (a) is unique across bare metals' entire uptime, and (b)
> > can be tracked across migrations.
> 
> OK, so that is different from what I'm thinking.  I'm thinking unique
> name for migrateable entity, you want a unique name for each component
> of the migrateable entity?  My instinct still tells me the orchestration
> system is going to have a unique identifier for each different sub
> container.
> 
> However, I have to point out that a serial number isn't what you want
> either if you really mean bare metal.  We do a lot of deployments where
> the containers run in a hypervisor, there the serial numbers won't be
> unique per box (only per vm) and we'll have to do vm correlation
> separately.  whereas a scheme which allows the orchestration system to
> supply the names would still be unique in that situation.
> 
> James
> 
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] namespaces: log namespaces per task
  2014-05-05  9:23         ` Nicolas Dichtel
@ 2014-05-06 21:15               ` Richard Guy Briggs
  0 siblings, 0 replies; 58+ messages in thread
From: Richard Guy Briggs @ 2014-05-06 21:15 UTC (permalink / raw)
  To: Nicolas Dichtel
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-audit-H+wXaHxf7aLQT0dZR+AlfA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w

On 14/05/05, Nicolas Dichtel wrote:
> On 02/05/2014 16:28, Richard Guy Briggs wrote:
> >On 14/05/02, Serge E. Hallyn wrote:
> >>Quoting Richard Guy Briggs (rgb-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org):
> >>>I saw no replies to my questions when I replied a year after Aris' posting, so
> >>>I don't know if it was ignored or got lost in stale threads:
> >>>         https://www.redhat.com/archives/linux-audit/2013-March/msg00020.html
> >>>         https://www.redhat.com/archives/linux-audit/2013-March/msg00033.html
> >>>	(https://lists.linux-foundation.org/pipermail/containers/2013-March/032063.html)
> >>>         https://www.redhat.com/archives/linux-audit/2014-January/msg00180.html
> >>>
> >>>I've tried to answer a number of questions that were raised in that thread.
> >>>
> >>>The goal is not quite identical to Aris' patchset.
> >>>
> >>>The purpose is to track namespaces in use by logged processes from the
> >>>perspective of init_*_ns.  The first patch defines a function to list them.
> >>>The second patch provides an example of usage for audit_log_task_info() which
> >>>is used by syscall audits, among others.  audit_log_task() and
> >>>audit_common_recv_message() would be other potential use cases.
> >>>
> >>>Use a serial number per namespace (unique across one boot of one kernel)
> >>>instead of the inode number (which is claimed to have had the right to change
> >>>reserved and is not necessarily unique if there is more than one proc fs).  It
> >>>could be argued that the inode numbers have now become a defacto interface and
> >>>can't change now, but I'm proposing this approach to see if this helps address
> >>>some of the objections to the earlier patchset.
> >>>
> >>>There could also be messages added to track the creation and the destruction
> >>>of namespaces, listing the parent for hierarchical namespaces such as pidns,
> >>>userns, and listing other ids for non-hierarchical namespaces, as well as other
> >>>information to help identify a namespace.
> >>>
> >>>There has been some progress made for audit in net namespaces and pid
> >>>namespaces since this previous thread.  net namespaces are now served as peers
> >>>by one auditd in the init_net namespace with processes in a non-init_net
> >>>namespace being able to write records if they are in the init_user_ns and have
> >>>CAP_AUDIT_WRITE.  Processes in a non-init_pid_ns can now similarly write
> >>>records.  As for CAP_AUDIT_READ, I just posted a patchset to check capabilities
> >>>of userspace processes that try to join netlink broadcast groups.
> >>>
> >>>
> >>>Questions:
> >>>Is there a way to link serial numbers of namespaces involved in migration of a
> >>>container to another kernel?  (I had a brief look at CRIU.)  Is there a unique
> >>>identifier for each running instance of a kernel?  Or at least some identifier
> >>>within the container migration realm?
> >>
> >>Eric Biederman has always been adamantly opposed to adding new namespaces
> >>of namespaces, so the fact that you're asking this question concerns me.
> >
> >I have seen that position and I don't fully understand the justification
> >for it other than added complexity.
> Just FYI, have you seen this thread:
> http://thread.gmane.org/gmane.linux.network/286572/
> 
> There are some explanations/examples about this topic.

Thanks for that reference.  I read it through, but will need to do so
again to get it to sink in.

> Nicolas

- RGB

--
Richard Guy Briggs <rbriggs-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] namespaces: log namespaces per task
  2014-05-05 23:23                   ` James Bottomley
  (?)
@ 2014-05-06 21:41                       ` Richard Guy Briggs
  -1 siblings, 0 replies; 58+ messages in thread
From: Richard Guy Briggs @ 2014-05-06 21:41 UTC (permalink / raw)
  To: James Bottomley
  Cc: ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-audit-H+wXaHxf7aLQT0dZR+AlfA

On 14/05/05, James Bottomley wrote:
> On May 5, 2014 3:36:38 PM PDT, Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> wrote:
> >Quoting James Bottomley (James.Bottomley-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org):
> >> On Mon, 2014-05-05 at 22:27 +0000, Serge Hallyn wrote:
> >> > Quoting James Bottomley (James.Bottomley-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org):
> >> > > On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote:
> >> > > > On 14/05/05, Serge E. Hallyn wrote:
> >> > > > > Quoting James Bottomley
> >(James.Bottomley-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org):
> >> > > > > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs
> >wrote:
> >> > > > > > > Questions:
> >> > > > > > > Is there a way to link serial numbers of namespaces
> >involved in migration of a
> >> > > > > > > container to another kernel?  (I had a brief look at
> >CRIU.)  Is there a unique
> >> > > > > > > identifier for each running instance of a kernel?  Or at
> >least some identifier
> >> > > > > > > within the container migration realm?
> >> > > > > > 
> >> > > > > > Are you asking for a way of distinguishing a migrated
> >container from an
> >> > > > > > unmigrated one?  The answer is pretty much "no" because the
> >job of
> >> > > > > > migration is to restore to the same state as much as
> >possible.
> >> > > > > > 
> >> > > > > > Reading between the lines, I think your goal is to
> >correlate audit
> >> > > > > > information across a container migration, right?  Ideally
> >the management
> >> > > > > > system should be able to cough up an audit trail for a
> >container
> >> > > > > > wherever it's running and however many times it's been
> >migrated?
> >> > > > > > 
> >> > > > > > In that case, I think your idea of a numeric serial number
> >in a dense
> >> > > > > > range is wrong.  Because the range is dense you're
> >obviously never going
> >> > > > > > to be able to use the same serial number across a
> >migration.  However,
> >> > > > > 
> >> > > > > Ah, but I was being silly before, we can actually address
> >this pretty
> >> > > > > simply.  If we just (for instance) add
> >> > > > > /proc/self/ns/{ipc,mnt,net,pid,user,uts}_seq containing the
> >serial number
> >> > > > > for the relevant ns for the task, then criu can dump this
> >info at
> >> > > > > checkpoint.  Then at restart it can dump an audit message per
> >task and
> >> > > > > ns saying old_serial=%x,new_serial=%x.  That way the audit
> >log reader
> >> > > > > can if it cares keep track.
> >> > > > 
> >> > > > This is the sort of idea I had in mind...
> >> > > 
> >> > > OK, but I don't understand then why you need a serial number. 
> >There are
> >> > > plenty of things we preserve across a migration, like namespace
> >name for
> >> > > instance.  Could you explain what function it performs because I
> >think I
> >> > > might be missing something.
> >> > 
> >> > We're looking ahead to a time when audit is namespaced, and a
> >container
> >> > can keep its own audit logs (without limiting what the host audits
> >of
> >> > course).  So if a container is auditing suspicious activity by some
> >> > task in a sub-namespace, then the whole parent container gets
> >migrated,
> >> > after migration we want to continue being able to correlate the
> >namespaces.
> >> > 
> >> > We're also looking at audit trails on a host that is up for years. 
> >We
> >> > would like every namespace to be uniquely logged there.  That is
> >why
> >> > inode #s on /proc/self/ns/* are not sufficient, unless we add a
> >generation
> >> > # (which would end up more complicated, not less, than a serial #).
> >> 
> >> Right, but when the container has an audit namespace, that namespace
> >has
> >> a name,
> >
> >What ns has a name?
> 
> The netns for instance.
> 
> >  The audit ns can be tied to 50 pid namespaces, and
> >we
> >want to log which pidns is responsible for something.
> >
> >If you mean the pidns has a name, that's the problem...  it does not,
> >it
> >only has an inode # which may later be re-used.
> 
> I still think there's a miscommunication somewhere: I believe you just
> need a stable id to tie the audit to, so why not just give the audit
> namespace a name like net?  The id would then be durable across
> migrations.

Audit does not have its own namespace (yet).  That idea is being
considered, but we would prefer to avoid it if it makes sense to tie it
in with an existing namespace.  The pid and user namespaces, being
hierarchical, seem to make the most sense so far, but we are proceeding
very carefully to avoid creating a security nightmare in the process.

From the kernel's perspective, none of the namespaces have a name.  A
container concept of a group of namespaces may have been assigned one,
but that isn't apparent to the layer that is logging this information.
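
As a point of reference, the identifier userspace can read for a
namespace today is the link target under /proc/<pid>/ns/, which embeds
the inode number and nothing else.  A minimal Python sketch of reading
it is below.  The helper names parse_ns_link() and current_namespaces()
are invented for this illustration, and the listing assumes a Linux
system with /proc mounted (it returns an empty dict otherwise).

```python
import os
import re

# Link targets under /proc/<pid>/ns/ look like "net:[4026531956]".
_NS_LINK = re.compile(r"^([a-z_]+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a namespace link target into (type, inode number)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError("not a namespace link: %r" % (target,))
    return match.group(1), int(match.group(2))


def current_namespaces(ns_dir="/proc/self/ns"):
    """Return {entry: inode} for the calling task, or {} without /proc."""
    result = {}
    if not os.path.isdir(ns_dir):
        return result
    for entry in sorted(os.listdir(ns_dir)):
        try:
            target = os.readlink(os.path.join(ns_dir, entry))
            _, inode = parse_ns_link(target)
        except (OSError, ValueError):
            continue  # entry unreadable or in an unexpected format
        result[entry] = inode
    return result


if __name__ == "__main__":
    for name, inode in current_namespaces().items():
        print("%s inode=%d" % (name, inode))
```

Nothing in that output says which container the task belongs to, which
is the gap described above.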

> >> which CRIU would migrate, so why not use that name for the
> >> log .. no need for numbers (unless you make the name a number, of
> >> course)?

There would certainly need to be a way to tie these namespace
identifiers to container names in log messages.

> >> James
> >
> >Sorry if I'm being dense...
> 
> No I think our assumptions are mismatched. I just can't figure out where.
> 
> James

- RGB

--
Richard Guy Briggs <rbriggs-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] namespaces: log namespaces per task
  2014-05-06  4:59                     ` James Bottomley
@ 2014-05-06 21:59                           ` Richard Guy Briggs
  0 siblings, 0 replies; 58+ messages in thread
From: Richard Guy Briggs @ 2014-05-06 21:59 UTC (permalink / raw)
  To: James Bottomley
  Cc: ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-audit-H+wXaHxf7aLQT0dZR+AlfA

On 14/05/05, James Bottomley wrote:
> On Tue, 2014-05-06 at 03:27 +0000, Serge Hallyn wrote:
> > Quoting James Bottomley (James.Bottomley-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org):
> > > >> Right, but when the container has an audit namespace, that namespace
> > > >> has a name,
> > > >
> > > >What ns has a name?
> > > 
> > > The netns for instance.
> > 
> > And what is its name?
> 
> As I think you know ip netns list will show you all of them.  The way
> they're applied is via mapped files in /var/run/netns/ which hold the
> names.
> 
> >   The only name I know that we could log in an
> > audit message is the /proc/self/ns/net inode number (which does not
> > suffice)
> 
> OK, so I think this is the confusion: You're thinking the container
> itself doesn't know what name the namespace has been given by the
> system, all it knows is the inode number corresponding to a file which
> it may or may not be able to see, right?

I guess if that container hasn't mounted /proc, it couldn't find out.
The same would be true of /proc/self/ns/{ipc,mnt,net,pid,user,uts}_seq to
find out its namespace serial numbers, but that doesn't stop that
container from initiating an audit message with the information it
knows, which can be supplemented by information the kernel already knows
about it.
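
A rough Python sketch of how a checkpoint/restore tool might use such
files, assuming they existed, is below.  The *_seq files are only a
proposal in this thread, so the file reader is passed in by the caller;
read_serials(), migration_records(), the hex file format and the "ns="
field are all assumptions made for illustration, not an existing kernel
or CRIU interface.

```python
# The six namespace types under /proc/<pid>/ns/ at the time of this thread.
NS_TYPES = ("ipc", "mnt", "net", "pid", "user", "uts")


def read_serials(read_file, pid="self"):
    """Collect {ns_type: serial} via a caller-supplied file reader.

    The /proc/<pid>/ns/<type>_seq files are hypothetical; no kernel is
    assumed to provide them, hence the injected reader.
    """
    serials = {}
    for ns_type in NS_TYPES:
        text = read_file("/proc/%s/ns/%s_seq" % (pid, ns_type))
        serials[ns_type] = int(text.strip(), 16)  # assumed hex format
    return serials


def migration_records(old, new):
    """Build one 'old_serial=%x,new_serial=%x' line per namespace type."""
    return [
        "ns=%s old_serial=%x,new_serial=%x"
        % (ns_type, old[ns_type], new[ns_type])
        for ns_type in NS_TYPES
        if ns_type in old and ns_type in new
    ]
```

The tool would call read_serials() once at checkpoint and once after
restore, then hand the resulting lines to whatever emits audit messages.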

> I'm thinking that the system
> that set up the container gave those files names and usually they're the
> same name for all the namespaces.  The point is that the orchestration
> system (whatever set up the container) will be responsible for the
> migration.  It will be the thing that has a unique handle for the
> container.  The handle is usually ascii representable, either a human
> readable name or some uuid/guid.  It's that handle that we should be
> using to prefix the audit message,

It is now possible to send audit messages while in another non-init
namespace, so from there, it could record that handle and have the
namespace serial numbers from the kernel logged with that message.  This
would be recorded by the host audit daemon, not the container audit
daemon.  The container management system could talk to this host audit
daemon to re-assemble an audit record trail for that container.
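
To make the re-assembly step concrete, here is a small Python sketch of
how a log reader could follow one namespace through several migrations.
It assumes each migration produced a record naming the source and
destination kernel plus the old and new serial; that record layout and
the names build_links() and trail() are hypothetical.

```python
def build_links(migrations):
    """Map (kernel, ns_type, old_serial) -> (kernel, ns_type, new_serial)."""
    links = {}
    for m in migrations:
        src = (m["from_kernel"], m["ns"], m["old_serial"])
        dst = (m["to_kernel"], m["ns"], m["new_serial"])
        links[src] = dst
    return links


def trail(start, links):
    """Follow a namespace identity through successive migrations."""
    seen = [start]
    current = start
    # Stop at the last known location, or if the records form a loop.
    while current in links and links[current] not in seen:
        current = links[current]
        seen.append(current)
    return seen
```

Audit records from each host could then be selected by matching their
(kernel, type, serial) triple against the list trail() returns.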

> so when you set up an audit
> namespace, it gets supplied with a prefix string corresponding to the
> well known name for the container.  This is the string we'd preserve
> across migration as part of the audit namespace state ... so the audit
> messages all correlate to the container wherever it's migrated to; no
> need to do complex tracking of changes to serial numbers.

That is a further step: having a container have its own audit daemon.

> > > >  The audit ns can be tied to 50 pid namespaces, and we
> > > >want to log which pidns is responsible for something.
> > > >
> > > >If you mean the pidns has a name, that's the problem...  it does not,
> > > >it only has an inode # which may later be re-used.
> > > 
> > > > I still think there's a miscommunication somewhere: I believe you just
> > > > need a stable id to tie the audit to, so why not just give the audit
> > > > namespace a name like net?  The id would then be durable across
> > > > migrations.
> > 
> > Maybe this is where we're confusing each other - I'm not talking
> > about giving the audit ns a name.  I'm talking about being able to
> > identify the other namespaces inside an audit message.  In a way
> > > that (a) is unique across the bare metal's entire uptime, and (b)
> > can be tracked across migrations.
> 
> OK, so that is different from what I'm thinking.  I'm thinking unique
> name for migrateable entity, you want a unique name for each component
> of the migrateable entity?

Yes.

>  My instinct still tells me the orchestration
> system is going to have a unique identifier for each different sub
> container.

So what is a sub container?  A nested container?  We still want to track
component namespaces of each nested container.

> However, I have to point out that a serial number isn't what you want
> either if you really mean bare metal.  We do a lot of deployments where
> the containers run in a hypervisor; there the serial numbers won't be
> unique per box (only per vm) and we'll have to do vm correlation
> separately, whereas a scheme which allows the orchestration system to
> supply the names would still be unique in that situation.

Unique per _running kernel_ was my intention.  I don't care if it is
bare metal or not.
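
One way to get from "unique per running kernel" to something unique
across hosts and VMs would be to qualify the serial with an identifier
for the kernel instance.  The Python sketch below uses
/proc/sys/kernel/random/boot_id, which Linux regenerates at each boot.
Pairing it with the serial is an assumption made for illustration and is
not part of the patch set; qualified_serial() is a made-up name.

```python
BOOT_ID_PATH = "/proc/sys/kernel/random/boot_id"


def read_boot_id(path=BOOT_ID_PATH):
    """Return this kernel instance's boot ID, or None if unavailable."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None


def qualified_serial(boot_id, ns_type, serial):
    """Join a per-kernel serial with a kernel-instance ID into one string."""
    return "%s/%s/%x" % (boot_id, ns_type, serial)
```

Two guests that both hand out pid namespace serial 0x1a would then log
different strings, because each guest kernel has its own boot ID.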

> James

- RGB

--
Richard Guy Briggs <rbriggs-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] namespaces: log namespaces per task
  2014-05-06 21:41                       ` Richard Guy Briggs
@ 2014-05-06 23:57                           ` James Bottomley
  -1 siblings, 0 replies; 58+ messages in thread
From: James Bottomley @ 2014-05-06 23:57 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-audit-H+wXaHxf7aLQT0dZR+AlfA

On Tue, 2014-05-06 at 17:41 -0400, Richard Guy Briggs wrote:
> On 14/05/05, James Bottomley wrote:
> > On May 5, 2014 3:36:38 PM PDT, Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> wrote:
> > >Quoting James Bottomley (James.Bottomley-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org):
> > >> On Mon, 2014-05-05 at 22:27 +0000, Serge Hallyn wrote:
> > >> > Quoting James Bottomley (James.Bottomley-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org):
> > >> > > On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote:
> > >> > > > On 14/05/05, Serge E. Hallyn wrote:
> > >> > > > > Quoting James Bottomley (James.Bottomley-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org):
> > >> > > > > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote:
> > >> > > > > > > Questions:
> > >> > > > > > > Is there a way to link serial numbers of namespaces involved in
> > >> > > > > > > migration of a container to another kernel?  (I had a brief look
> > >> > > > > > > at CRIU.)  Is there a unique identifier for each running instance
> > >> > > > > > > of a kernel?  Or at least some identifier within the container
> > >> > > > > > > migration realm?
> > >> > > > > > 
> > >> > > > > > Are you asking for a way of distinguishing a migrated container
> > >> > > > > > from an unmigrated one?  The answer is pretty much "no" because
> > >> > > > > > the job of migration is to restore to the same state as much as
> > >> > > > > > possible.
> > >> > > > > > 
> > >> > > > > > Reading between the lines, I think your goal is to correlate audit
> > >> > > > > > information across a container migration, right?  Ideally the
> > >> > > > > > management system should be able to cough up an audit trail for a
> > >> > > > > > container wherever it's running and however many times it's been
> > >> > > > > > migrated?
> > >> > > > > > 
> > >> > > > > > In that case, I think your idea of a numeric serial number in a
> > >> > > > > > dense range is wrong.  Because the range is dense you're obviously
> > >> > > > > > never going to be able to use the same serial number across a
> > >> > > > > > migration.  However,
> > >> > > > > 
> > >> > > > > Ah, but I was being silly before, we can actually address this
> > >> > > > > pretty simply.  If we just (for instance) add
> > >> > > > > /proc/self/ns/{ipc,mnt,net,pid,user,uts}_seq containing the serial
> > >> > > > > number for the relevant ns for the task, then criu can dump this
> > >> > > > > info at checkpoint.  Then at restart it can dump an audit message
> > >> > > > > per task and ns saying old_serial=%x,new_serial=%x.  That way the
> > >> > > > > audit log reader can if it cares keep track.
> > >> > > > 
> > >> > > > This is the sort of idea I had in mind...
> > >> > > 
> > >> > > OK, but I don't understand then why you need a serial number.  There
> > >> > > are plenty of things we preserve across a migration, like namespace
> > >> > > name for instance.  Could you explain what function it performs
> > >> > > because I think I might be missing something.
> > >> > 
> > >> > We're looking ahead to a time when audit is namespaced, and a
> > >> > container can keep its own audit logs (without limiting what the host
> > >> > audits of course).  So if a container is auditing suspicious activity
> > >> > by some task in a sub-namespace, then the whole parent container gets
> > >> > migrated, after migration we want to continue being able to correlate
> > >> > the namespaces.
> > >> > 
> > >> > We're also looking at audit trails on a host that is up for years.  We
> > >> > would like every namespace to be uniquely logged there.  That is why
> > >> > inode #s on /proc/self/ns/* are not sufficient, unless we add a
> > >> > generation # (which would end up more complicated, not less, than a
> > >> > serial #).
> > >> 
> > >> Right, but when the container has an audit namespace, that namespace
> > >> has a name,
> > >
> > >What ns has a name?
> > 
> > The netns for instance.
> > 
> > >  The audit ns can be tied to 50 pid namespaces, and we
> > >want to log which pidns is responsible for something.
> > >
> > >If you mean the pidns has a name, that's the problem...  it does not,
> > >it only has an inode # which may later be re-used.
> > 
> > I still think there's a miscommunication somewhere: I believe you just
> > need a stable id to tie the audit to, so why not just give the audit
> > namespace a name like net?  The id would then be durable across
> > migrations.
> 
> Audit does not have its own namespace (yet).

So it would make the most sense to do this if audit were a separately
attachable capability the orchestrator would like to control.  I'm not
sure about that so I'll consider some use cases below.

>   That idea is being
> considered, but we would prefer to avoid it if it makes sense to tie it
> in with an existing namespace.  The pid and user namespaces, being
> hierarchical, seem to make the most sense so far, but we are proceeding
> very carefully to avoid creating a security nightmare in the process.

pid ns might be.  You need that on almost everything that runs in an
OS-like container, but it might not be present for an application.  For
an IaaS container, it doesn't much matter: we attach every namespace.
For application type containers, it depends.  The lightest weight
container setup is the containerised apache one where you have a shared
web hosting service and you spawn the apache thread into a task cgroup
connected to a mount namespace ... do you need to audit that?  Probably
not; apache has reasonable logging on its own.

The next class of applications is discrete service ones ... annoying
apps that try to bind to 0.0.0.0; you containerise them by placing them
in a net namespace only with their own net device.  Mostly you trust
them to run, you just want to restrict their IP attachment.  Do you want
to audit these ... possibly.

Then there are the fully containerised applications, mostly used for
multi-tenant services.  These often have a net namespace (separate IP
devices) and a mount namespace (separate data stores); they may have a
pid namespace and they might have a user one (but probably only if the
application needs to run as root) ... they probably need auditing.

Finally, of course, there's the full OS containers with one of every
namespace and cgroup going ... they'll want to appear to run their own
audit daemon (although we can make it a dummy and just pull it into the
host).

> From the kernel's perspective, none of the namespaces have a name.  A
> container concept of a group of namespaces may have been assigned one,
> but that isn't apparent to the layer that is logging this information.

That's why an audit namespace with a settable prefix looks potentially
interesting: the orchestration system decides what stuff it cares about
being audited separately and slaps it into its own audit namespace.
Stuff you don't care about you leave to the host audit (no separate ns).
It also gets you out of trying to decide which other namespace should be
paired with audit, because now it's fully configurable.
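
The settable-prefix idea could look roughly like the Python sketch
below.  No audit namespace exists, so the AuditNamespace class, its
methods and the "prefix: message" layout are all hypothetical; the only
point shown is that the prefix is ordinary state that a checkpoint can
carry to the destination host.

```python
class AuditNamespace:
    """Hypothetical audit namespace carrying an orchestrator-chosen prefix."""

    def __init__(self, prefix):
        self.prefix = prefix

    def tag(self, message):
        """Return the message as it would appear in the audit log."""
        return "%s: %s" % (self.prefix, message)

    def checkpoint(self):
        """State saved when the container is checkpointed."""
        return {"prefix": self.prefix}

    @classmethod
    def restore(cls, state):
        """Recreate the namespace on the destination host."""
        return cls(state["prefix"])
```

Messages tagged before and after a migration then share the same
prefix, with no serial numbers to remap.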

> > >> which CRIU would migrate, so why not use that name for the
> > >> log .. no need for numbers (unless you make the name a number, of
> > >> course)?
> 
> There would certainly need to be a way to tie these namespace
> identifiers to container names in log messages.

Right, and coming from a company that produces orchestration systems,
all we really care about is "what's coming out of this entity fred I
configured up yesterday", so we don't care about labelling the
individual namespaces and cgroups, but we do care which of them
correspond to fred.  I suppose there are other use cases, though, and I
just didn't notice when people described them.

James

* Re: [PATCH 0/2] namespaces: log namespaces per task
  2014-05-06 21:15               ` Richard Guy Briggs
@ 2014-05-07  9:35                   ` Nicolas Dichtel
  -1 siblings, 0 replies; 58+ messages in thread
From: Nicolas Dichtel @ 2014-05-07  9:35 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-audit-H+wXaHxf7aLQT0dZR+AlfA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w

On 06/05/2014 23:15, Richard Guy Briggs wrote:
> On 14/05/05, Nicolas Dichtel wrote:
>> On 02/05/2014 16:28, Richard Guy Briggs wrote:
>>> On 14/05/02, Serge E. Hallyn wrote:
>>>> Quoting Richard Guy Briggs (rgb@redhat.com):
>>>>> I saw no replies to my questions when I replied a year after Aris' posting, so
>>>>> I don't know if it was ignored or got lost in stale threads:
>>>>>          https://www.redhat.com/archives/linux-audit/2013-March/msg00020.html
>>>>>          https://www.redhat.com/archives/linux-audit/2013-March/msg00033.html
>>>>> 	(https://lists.linux-foundation.org/pipermail/containers/2013-March/032063.html)
>>>>>          https://www.redhat.com/archives/linux-audit/2014-January/msg00180.html
>>>>>
>>>>> I've tried to answer a number of questions that were raised in that thread.
>>>>>
>>>>> The goal is not quite identical to Aris' patchset.
>>>>>
>>>>> The purpose is to track namespaces in use by logged processes from the
>>>>> perspective of init_*_ns.  The first patch defines a function to list them.
>>>>> The second patch provides an example of usage for audit_log_task_info() which
>>>>> is used by syscall audits, among others.  audit_log_task() and
>>>>> audit_common_recv_message() would be other potential use cases.
>>>>>
>>>>> Use a serial number per namespace (unique across one boot of one kernel)
>>>>> instead of the inode number (which is claimed to have had the right to change
>>>>> reserved and is not necessarily unique if there is more than one proc fs).  It
>>>>> could be argued that the inode numbers have now become a defacto interface and
>>>>> can't change now, but I'm proposing this approach to see if this helps address
>>>>> some of the objections to the earlier patchset.
>>>>>
>>>>> There could also have messages added to track the creation and the destruction
>>>>> of namespaces, listing the parent for hierarchical namespaces such as pidns,
>>>>> userns, and listing other ids for non-hierarchical namespaces, as well as other
>>>>> information to help identify a namespace.
>>>>>
>>>>> There has been some progress made for audit in net namespaces and pid
>>>>> namespaces since this previous thread.  net namespaces are now served as peers
>>>>> by one auditd in the init_net namespace with processes in a non-init_net
>>>>> namespace being able to write records if they are in the init_user_ns and have
>>>>> CAP_AUDIT_WRITE.  Processes in a non-init_pid_ns can now similarly write
>>>>> records.  As for CAP_AUDIT_READ, I just posted a patchset to check capabilities
>>>>> of userspace processes that try to join netlink broadcast groups.
>>>>>
>>>>>
>>>>> Questions:
>>>>> Is there a way to link serial numbers of namespaces involved in migration of a
>>>>> container to another kernel?  (I had a brief look at CRIU.)  Is there a unique
>>>>> identifier for each running instance of a kernel?  Or at least some identifier
>>>>> within the container migration realm?
>>>>
>>>> Eric Biederman has always been adamantly opposed to adding new namespaces
>>>> of namespaces, so the fact that you're asking this question concerns me.
>>>
>>> I have seen that position and I don't fully understand the justification
>>> for it other than added complexity.
>> Just FYI, have you seen this thread:
>> http://thread.gmane.org/gmane.linux.network/286572/
>>
>> There is some explanations/examples about this topic.
>
> Thanks for that reference.  I read it through, but will need to do so
> again to get it to sink in.

I think audit has the same problem as x-netns netdevices: being able to
identify a peer netns when a userland app "reads" a message from the kernel.

The main problem with file descriptors is that you cannot use them when you
broadcast a message from the kernel to userland.

Maybe we can use the local names concept (like file descriptors but without
their constraints), i.e. having an identifier for a peer (net)ns which is only
valid in the current (net)ns. When the kernel needs to identify a peer (net)ns,
it uses this identifier (or allocates it the first time). After that, userland
apps may reuse this identifier to configure things in the peer (net)ns.
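
A minimal sketch of the local names idea, assuming each (net)ns keeps its
own table that hands out a small integer the first time a peer has to be
named and returns the same integer afterwards.  The class LocalNsIds and
its methods are hypothetical, and the strings stand in for whatever
kernel object represents a peer namespace.

```python
import itertools


class LocalNsIds:
    """Table owned by one namespace; its ids mean nothing elsewhere."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._id_by_peer = {}
        self._peer_by_id = {}

    def id_for(self, peer):
        """Return the local id for peer, allocating it on first use."""
        if peer not in self._id_by_peer:
            local_id = next(self._next_id)
            self._id_by_peer[peer] = local_id
            self._peer_by_id[local_id] = peer
        return self._id_by_peer[peer]

    def resolve(self, local_id):
        """Map an id seen in a message back to the peer it names."""
        return self._peer_by_id[local_id]
```

Because the id is a plain number rather than a file descriptor, it can be
carried in a broadcast message, and a listener in the same namespace can
pass it back later to refer to the same peer.  Two namespaces may use the
same number for different peers, which is the point of keeping the names
local.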

Eric, any thoughts about this?

Regards,
Nicolas
